Visualization of the xy_grid
Visualization of the rays
Visualization of point samples from camera 0
Visualization of Volume Renderer with BoxSDF
Obtained depth map
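The depth map above can be computed from the same compositing weights that volume rendering uses for color: the expected termination depth of each ray under the per-sample weights. A minimal numpy sketch (function name and sample layout are illustrative, not the project's actual API):

```python
import numpy as np

def render_depth(sigmas, ts):
    """Composite per-sample densities into an expected depth along each ray.

    sigmas: (n_rays, n_samples) volume densities
    ts:     (n_samples,) sample depths along the ray
    """
    # spacing between consecutive samples (last interval repeated)
    deltas = np.diff(ts, append=ts[-1] + (ts[-1] - ts[-2]))
    alphas = 1.0 - np.exp(-sigmas * deltas)          # per-sample opacity
    # transmittance: probability the ray reaches each sample unoccluded
    trans = np.cumprod(1.0 - alphas + 1e-10, axis=-1)
    trans = np.concatenate([np.ones_like(trans[:, :1]), trans[:, :-1]], axis=-1)
    weights = alphas * trans                         # compositing weights
    return (weights * ts).sum(-1)                    # expected depth per ray
```

Replacing `ts` with per-sample colors in the final line gives the rendered RGB instead of depth, which is why the two maps come out of the same renderer.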
Box center: (0.2502, 0.2505, -0.0005); box side lengths: (2.005, 1.5035, 1.5032)
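The optimized parameters above parameterize an axis-aligned box SDF. For reference, the standard signed-distance formula for such a box, shown here in numpy with the fitted values plugged in (the function name is illustrative):

```python
import numpy as np

def box_sdf(points, center, side_lengths):
    """Signed distance from `points` (n, 3) to an axis-aligned box;
    negative inside the box, positive outside."""
    q = np.abs(points - center) - np.asarray(side_lengths) / 2.0
    outside = np.linalg.norm(np.maximum(q, 0.0), axis=-1)
    inside = np.minimum(q.max(-1), 0.0)
    return outside + inside

center = np.array([0.2502, 0.2505, -0.0005])   # fitted box center from above
sides = np.array([2.005, 1.5035, 1.5032])      # fitted side lengths from above
```

At the box center the distance equals minus half the smallest side length, which is a quick sanity check on a fitted box.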
NeRF trained on positions only (left) vs. with view directions added (right)
By adding view directions to the input, the network must also learn view-dependent effects. For a given network capacity and number of training views, this added dependence can cause the network to generalize poorly to novel views, so image quality at unseen poses can be lower. For example, the grille at the front of the tractor is rendered at lower quality in the right image.
To avoid overfitting to the training views, the positional encoding of the view directions is fed into the network only after the opacity (density) prediction is made. This way, the network can decouple the intrinsic color of a surface from its view-dependent emission.
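This late injection of the view direction can be sketched as a tiny forward pass: density is a function of position alone, while color additionally sees the encoded direction. The weights are random and the layer sizes are illustrative, so this is a structural sketch rather than the actual model:

```python
import numpy as np

rng = np.random.default_rng(0)

def posenc(x, n_freqs):
    """Sinusoidal positional encoding of x (n, d) with n_freqs frequency bands."""
    freqs = (2.0 ** np.arange(n_freqs)) * np.pi
    ang = x[..., None] * freqs                          # (n, d, n_freqs)
    return np.concatenate([np.sin(ang), np.cos(ang)], -1).reshape(x.shape[0], -1)

def nerf_forward(xyz, dirs, n_freq_xyz=4, n_freq_dir=2, hidden=32):
    # trunk: encoded position -> hidden features (random weights for the sketch)
    h = np.tanh(posenc(xyz, n_freq_xyz) @ rng.normal(size=(3 * 2 * n_freq_xyz, hidden)))
    # density head: depends on position only
    sigma = np.maximum(h @ rng.normal(size=(hidden, 1)), 0.0)
    # view direction enters only after the density head
    h_dir = np.concatenate([h, posenc(dirs, n_freq_dir)], -1)
    rgb = 1.0 / (1.0 + np.exp(-(h_dir @ rng.normal(size=(hidden + 3 * 2 * n_freq_dir, 3)))))
    return sigma, rgb
```

Because `sigma` is computed before `dirs` is concatenated in, the geometry cannot vary with viewpoint; only the emitted color can.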
The hierarchical sampling strategy uses two networks, coarse and fine. The densities predicted by the coarse network are converted into per-sample weights along each ray, and the PDF for the fine samples is defined by these normalized weights.
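Drawing the fine samples from this PDF amounts to inverse-transform sampling on the piecewise-constant CDF of the coarse weights. A minimal sketch (function name and argument layout are assumptions):

```python
import numpy as np

def sample_pdf(bin_edges, weights, n_fine, rng=None):
    """Draw n_fine depths from the PDF given by coarse `weights` over `bin_edges`.

    bin_edges: (n_bins + 1,) depths bounding each coarse interval
    weights:   (n_bins,) coarse-network weights (need not be normalized)
    """
    rng = rng or np.random.default_rng(0)
    pdf = weights / (weights.sum() + 1e-10)           # normalize to a PDF
    cdf = np.concatenate([[0.0], np.cumsum(pdf)])     # piecewise-constant CDF
    u = rng.uniform(size=n_fine)                      # uniform samples to invert
    idx = np.searchsorted(cdf, u, side="right") - 1   # bin containing each u
    idx = np.clip(idx, 0, len(weights) - 1)
    # linear interpolation inside the selected bin
    denom = np.where(pdf[idx] > 0, pdf[idx], 1.0)
    frac = (u - cdf[idx]) / denom
    return bin_edges[idx] + frac * (bin_edges[idx + 1] - bin_edges[idx])
```

Concentrating the fine samples where the coarse weights are large is what lets the fine network spend its capacity near surfaces rather than in empty space.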
The results of the high-resolution model trained with different sampling parameters are shown below.
Left: Model trained with 32 samples per ray. Right: Model trained with 64 samples per ray.
With fewer samples per ray, the visual quality degrades. Additionally, spurious artifacts can be seen in the image on the right. This is mostly due to the network predicting arbitrary color values along certain directions in the novel views.