Visualization of the xy_grid
Visualization of the rays
Visualization of point samples from camera 0
Visualization of Volume Renderer with BoxSDF
Obtained depth map
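The depth map above can be computed from the same compositing weights that volume rendering uses for color: the expected termination depth of each ray under the per-sample weights. A minimal numpy sketch (function name and sample layout are illustrative, not the project's actual API):

```python
import numpy as np

def render_depth(sigmas, ts):
    """Composite per-sample densities into an expected depth along each ray.

    sigmas: (n_rays, n_samples) volume densities
    ts:     (n_samples,) sample depths along the ray
    """
    # spacing between consecutive samples (last interval repeated)
    deltas = np.diff(ts, append=ts[-1] + (ts[-1] - ts[-2]))
    alphas = 1.0 - np.exp(-sigmas * deltas)          # per-sample opacity
    # transmittance: probability the ray reaches each sample unoccluded
    trans = np.cumprod(1.0 - alphas + 1e-10, axis=-1)
    trans = np.concatenate([np.ones_like(trans[:, :1]), trans[:, :-1]], axis=-1)
    weights = alphas * trans                         # compositing weights
    return (weights * ts).sum(-1)                    # expected depth per ray
```

Replacing `ts` with per-sample colors in the final line gives the rendered RGB instead of depth, which is why the two maps come out of the same renderer.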
Box center: (0.2502, 0.2505, -0.0005); box side lengths: (2.005, 1.5035, 1.5032)
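The optimized parameters above parameterize an axis-aligned box SDF. For reference, the standard signed-distance formula for such a box, shown here in numpy with the fitted values plugged in (the function name is illustrative):

```python
import numpy as np

def box_sdf(points, center, side_lengths):
    """Signed distance from `points` (n, 3) to an axis-aligned box;
    negative inside the box, positive outside."""
    q = np.abs(points - center) - np.asarray(side_lengths) / 2.0
    outside = np.linalg.norm(np.maximum(q, 0.0), axis=-1)
    inside = np.minimum(q.max(-1), 0.0)
    return outside + inside

center = np.array([0.2502, 0.2505, -0.0005])   # fitted box center from above
sides = np.array([2.005, 1.5035, 1.5032])      # fitted side lengths from above
```

At the box center the distance equals minus half the smallest side length, which is a quick sanity check on a fitted box.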
NeRF trained on positions only (left) vs. with view directions added (right)
By adding view directions to the input, the network must also learn view-dependent effects. For a given network capacity and number of training views, this added dependence can cause the network to generalize poorly to novel views, so image quality at unseen poses can be lower. For example, the grille at the front of the tractor is rendered at lower quality in the right image.
To avoid overfitting to the training views, the positional encoding of the view directions is fed into the network only after the opacity (density) prediction is made. This way, the network can decouple the intrinsic color of a surface from its view-dependent emission.
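This late injection of the view direction can be sketched as a tiny forward pass: density is a function of position alone, while color additionally sees the encoded direction. The weights are random and the layer sizes are illustrative, so this is a structural sketch rather than the actual model:

```python
import numpy as np

rng = np.random.default_rng(0)

def posenc(x, n_freqs):
    """Sinusoidal positional encoding of x (n, d) with n_freqs frequency bands."""
    freqs = (2.0 ** np.arange(n_freqs)) * np.pi
    ang = x[..., None] * freqs                          # (n, d, n_freqs)
    return np.concatenate([np.sin(ang), np.cos(ang)], -1).reshape(x.shape[0], -1)

def nerf_forward(xyz, dirs, n_freq_xyz=4, n_freq_dir=2, hidden=32):
    # trunk: encoded position -> hidden features (random weights for the sketch)
    h = np.tanh(posenc(xyz, n_freq_xyz) @ rng.normal(size=(3 * 2 * n_freq_xyz, hidden)))
    # density head: depends on position only
    sigma = np.maximum(h @ rng.normal(size=(hidden, 1)), 0.0)
    # view direction enters only after the density head
    h_dir = np.concatenate([h, posenc(dirs, n_freq_dir)], -1)
    rgb = 1.0 / (1.0 + np.exp(-(h_dir @ rng.normal(size=(hidden + 3 * 2 * n_freq_dir, 3)))))
    return sigma, rgb
```

Because `sigma` is computed before `dirs` is concatenated in, the geometry cannot vary with viewpoint; only the emitted color can.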
The hierarchical sampling strategy uses two networks, coarse and fine. The densities predicted by the coarse network are converted into per-sample weights along each ray, and the PDF for the fine samples is defined by these normalized weights.
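Drawing the fine samples from this PDF amounts to inverse-transform sampling on the piecewise-constant CDF of the coarse weights. A minimal sketch (function name and argument layout are assumptions):

```python
import numpy as np

def sample_pdf(bin_edges, weights, n_fine, rng=None):
    """Draw n_fine depths from the PDF given by coarse `weights` over `bin_edges`.

    bin_edges: (n_bins + 1,) depths bounding each coarse interval
    weights:   (n_bins,) coarse-network weights (need not be normalized)
    """
    rng = rng or np.random.default_rng(0)
    pdf = weights / (weights.sum() + 1e-10)           # normalize to a PDF
    cdf = np.concatenate([[0.0], np.cumsum(pdf)])     # piecewise-constant CDF
    u = rng.uniform(size=n_fine)                      # uniform samples to invert
    idx = np.searchsorted(cdf, u, side="right") - 1   # bin containing each u
    idx = np.clip(idx, 0, len(weights) - 1)
    # linear interpolation inside the selected bin
    denom = np.where(pdf[idx] > 0, pdf[idx], 1.0)
    frac = (u - cdf[idx]) / denom
    return bin_edges[idx] + frac * (bin_edges[idx + 1] - bin_edges[idx])
```

Concentrating the fine samples where the coarse weights are large is what lets the fine network spend its capacity near surfaces rather than in empty space.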
The results of the high-resolution model trained with different sampling parameters are shown below.
Left: Model trained with 32 samples per ray. Right: Model trained with 64 samples per ray.
With fewer samples per ray, the visual quality degrades. Additionally, spurious artifacts can be seen in the image on the right. This is mostly due to the network predicting arbitrary color values along certain directions in the novel views.