Assignment 3: Volume Rendering & Neural Radiance Fields

Name: Edward Li
Andrew ID: edwardli
Late Days Used: Three

1. Differentiable Volume Rendering

1.3. Ray Sampling (10 points)

We shoot rays through camera pixels. Run python main.py --config-name=box to generate images.

xy_grid rays
xy_grid rays
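As a rough sketch of what this step does, assuming a PyTorch3D-style camera with `unproject_points` and `get_camera_center` (the function and variable names below are illustrative, not the exact starter-code API, and NDC sign conventions are glossed over):

```python
import torch

def get_pixel_rays_sketch(image_size, camera):
    """Rough sketch of per-pixel ray generation, assuming a PyTorch3D-style camera."""
    H, W = image_size
    # Pixel centers mapped to normalized device coordinates in [-1, 1].
    xs = torch.linspace(-1, 1, W)
    ys = torch.linspace(-1, 1, H)
    x, y = torch.meshgrid(xs, ys, indexing="xy")
    xy_grid = torch.stack([x, y], dim=-1).reshape(-1, 2)          # (H * W, 2)

    # Unproject each pixel at unit depth into world space, then build directions
    # from the camera center through those points.
    xy_depth = torch.cat([xy_grid, torch.ones_like(xy_grid[:, :1])], dim=-1)
    world_points = camera.unproject_points(xy_depth, world_coordinates=True)
    origins = camera.get_camera_center().expand(xy_grid.shape[0], -1)
    directions = torch.nn.functional.normalize(world_points - origins, dim=-1)
    return xy_grid, origins, directions
```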

1.4. Point Sampling (10 points)

Run python main.py --config-name=box to generate point sample visualizations:

points
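A minimal sketch of the depth sampling along each ray, assuming evenly spaced depths (names are illustrative; any stratified jitter is omitted):

```python
import torch

def sample_points_along_rays_sketch(origins, directions, n_pts, min_depth, max_depth):
    """Sketch of evenly spaced depth sampling along each ray (illustrative names)."""
    depths = torch.linspace(min_depth, max_depth, n_pts, device=origins.device)   # (n_pts,)
    # points[i, j] = origin_i + depth_j * direction_i  ->  (n_rays, n_pts, 3)
    points = origins[:, None, :] + depths[None, :, None] * directions[:, None, :]
    return points, depths.expand(origins.shape[0], n_pts)
```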

1.5. Volume Rendering (30 points)

A few cumprods later, we have our volume rendered: the per-sample weights are the product of the transmittance (an exclusive cumulative product of $1 - \alpha$ along the ray) and the local opacity $\alpha$. The depth map is rendered similarly to the image, but with depth values in place of colors, i.e. a weighted average of the sample depths using the same weights.
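A sketch of that compositing step, assuming per-sample densities, colors, depths, and segment lengths are already available (names are illustrative):

```python
import torch

def composite_sketch(densities, colors, depths, deltas):
    """Sketch of volume compositing.

    densities: (n_rays, n_pts, 1), colors: (n_rays, n_pts, 3),
    depths:    (n_rays, n_pts, 1), deltas: (n_rays, n_pts, 1) segment lengths.
    """
    # Per-sample opacity, then transmittance via an exclusive cumulative product.
    alpha = 1.0 - torch.exp(-densities * deltas)
    trans = torch.cumprod(1.0 - alpha + 1e-10, dim=1)
    trans = torch.cat([torch.ones_like(trans[:, :1]), trans[:, :-1]], dim=1)  # shift right
    weights = trans * alpha                                   # (n_rays, n_pts, 1)

    # Colors and depth are both weighted sums with the same weights.
    rgb = (weights * colors).sum(dim=1)                       # (n_rays, 3)
    depth = (weights * depths).sum(dim=1)                     # (n_rays, 1)
    return rgb, depth, weights
```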

Run python main.py --config-name=box to generate images.

Render Depth
Render Depth

2. Optimizing a basic implicit volume

2.1. Random ray sampling (5 points)

This is implemented at L110 of ray_utils.py. We use some indexing tricks to gather the rays for randomly chosen pixels, although the approach might not be as memory efficient as it could be.
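A minimal sketch of the idea, assuming the full per-pixel ray bundle has already been built and flattened to shape (H * W, ...), which is where the memory cost mentioned above comes from (names are illustrative):

```python
import torch

def sample_random_rays_sketch(xy_grid, origins, directions, n_rays):
    """Sketch of random ray selection by indexing a flattened (H * W, ...) ray bundle."""
    idx = torch.randperm(xy_grid.shape[0], device=xy_grid.device)[:n_rays]
    return xy_grid[idx], origins[idx], directions[idx]
```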

2.2. Loss and training (5 points)

We use torch's built-in mean squared error loss. Run python main.py --config-name=train_box to fit the implicit volume.
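Roughly, one training step looks like the following sketch (the model/renderer/optimizer names are assumptions, not the exact starter-code API):

```python
import torch.nn.functional as F

def train_step_sketch(model, renderer, rays, target_rgb, optimizer):
    """One optimization step as a sketch; model/renderer/optimizer names are assumptions."""
    optimizer.zero_grad()
    pred_rgb = renderer(model, rays)           # render colors for the sampled rays
    loss = F.mse_loss(pred_rgb, target_rgb)    # torch's built-in mean squared error
    loss.backward()
    optimizer.step()
    return loss.item()
```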

We get the box parameters:

Box center: $(0.25, 0.25, 0.00)$
Box side lengths: $(2.01, 1.50, 1.50)$

2.3. Visualization

The rendered GIF seems almost identical to the TA solution:

Render

3. Optimizing a Neural Radiance Field (NeRF) (30 points)

I implemented the standard NeRF architecture from the NeRF paper, with the parameters taken from the Hydra config. More concretely, we use 6 XYZ harmonic (positional-encoding) functions and 6 XYZ layers, with an input skip connection at layer 3.

One nice trick I use that diverges from the parameters in the config is adding density noise. This helps prevent the model from collapsing to all zeros and also improves generalization to novel views, as claimed in the NeRF paper. I add Gaussian noise with standard deviation 1.0 to the raw density during training and turn it off during eval.
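A rough sketch of this architecture, with the density noise applied during training only; the hidden width and head sizes here are assumptions, not values from the config:

```python
import torch
import torch.nn as nn

class NeRFSketch(nn.Module):
    """Rough sketch of the MLP described above; hidden sizes are assumptions."""

    def __init__(self, n_harmonic=6, hidden=128, n_layers=6, skip_at=3, noise_std=1.0):
        super().__init__()
        self.n_harmonic = n_harmonic
        self.skip_at = skip_at
        self.noise_std = noise_std
        embed_dim = 3 * 2 * n_harmonic           # sin and cos per frequency per coordinate
        layers, in_dim = [], embed_dim
        for i in range(n_layers):
            if i == skip_at:
                in_dim += embed_dim              # input skip connection at layer `skip_at`
            layers.append(nn.Linear(in_dim, hidden))
            in_dim = hidden
        self.layers = nn.ModuleList(layers)
        self.density_head = nn.Linear(hidden, 1)
        self.color_head = nn.Sequential(nn.Linear(hidden, 3), nn.Sigmoid())

    def harmonic_embed(self, x):
        freqs = 2.0 ** torch.arange(self.n_harmonic, device=x.device, dtype=x.dtype)
        angles = x[..., None] * freqs            # (..., 3, n_harmonic)
        return torch.cat([angles.sin(), angles.cos()], dim=-1).flatten(-2)

    def forward(self, points):
        embed = self.harmonic_embed(points)
        h = embed
        for i, layer in enumerate(self.layers):
            if i == self.skip_at:
                h = torch.cat([h, embed], dim=-1)
            h = torch.relu(layer(h))
        raw_density = self.density_head(h)
        if self.training and self.noise_std > 0:
            # Density noise: perturb the raw density before the activation, training only.
            raw_density = raw_density + torch.randn_like(raw_density) * self.noise_std
        return torch.relu(raw_density), self.color_head(h)
```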

Training with this setup gives us quite a clean image:

NeRF

A few spurious pixels appear in places, which can probably be attributed to the density noise: even where NeRF predicts zero density, the background still receives some noise during training, so the gradients in those regions are noisy.

Run python main.py --config-name=nerf_lego to train the model.

4. NeRF Extras

4.1. View Dependence (10 points)

I implemented view dependence with the nice LinearWithRepeat layer provided. This is more computationally efficient than explicitly duplicating the direction over all ray points. I also follow the original NeRF paper's suggestion to insert the direction encoding only at the second-to-last layer, after density has been predicted. This lets the network use the viewing direction only for emission color prediction, since density is not a view-dependent quantity, which acts as a nice regularizer against overfitting.
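A sketch of the view-dependent color head; for clarity it explicitly expands the per-ray direction embedding over every sample point, which is exactly the duplication that LinearWithRepeat avoids (layer sizes are assumptions):

```python
import torch
import torch.nn as nn

class ViewDependentColorHeadSketch(nn.Module):
    """Sketch of the view-dependent color head; layer sizes are assumptions."""

    def __init__(self, hidden=128, dir_embed_dim=24):
        super().__init__()
        self.pre_color = nn.Linear(hidden + dir_embed_dim, hidden // 2)
        self.color = nn.Sequential(nn.Linear(hidden // 2, 3), nn.Sigmoid())

    def forward(self, features, dir_embed):
        # features:  (n_rays, n_pts, hidden) trunk features (density already predicted)
        # dir_embed: (n_rays, dir_embed_dim), one encoded direction per ray
        dirs = dir_embed[:, None, :].expand(-1, features.shape[1], -1)
        h = torch.relu(self.pre_color(torch.cat([features, dirs], dim=-1)))
        return self.color(h)                    # view-dependent RGB; density is untouched
```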

I have a new config to enable view dependence - run python main.py --config-name=nerf_lego_view to train.

View Dependence

The rendered image looks mostly similar to part 3, but with some specularity that changes as the scene rotates. It's a bit too low resolution to inspect in detail, but we can observe the Lego baseplate getting darker and lighter as it rotates, probably because the real-world baseplate has studs that reflect light directionally.

In general, increasing view dependence hurts generalization quality: since we're adding an input that isn't globally consistent, the network can overfit to a small number of views. In the most degenerate case, the network could output a flat plane for each image and distinguish the images purely by view direction. However, we make sure to condition only color on the view direction, not density, which mostly counteracts this.

4.2. Hierarchical Sampling (10 points)

This was quite a lot more annoying to implement than the previous section. I make sure to .detach() the weights when resampling; otherwise autograd tries to backprop through the hierarchical sampling, which hurts performance. We also modify the loss function to optimize over both the coarse and fine images.
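A sketch of the resampling step, using inverse-CDF sampling over the (detached) coarse weights; the bin-edge construction and names are illustrative:

```python
import torch

def sample_pdf_sketch(bin_edges, coarse_weights, n_fine):
    """Sketch of inverse-CDF resampling from the coarse pass (illustrative names).

    bin_edges:      (n_rays, n_bins + 1) depth edges of the coarse bins
    coarse_weights: (n_rays, n_bins) compositing weights from the coarse pass
    """
    # Detach so autograd does not try to backprop through the resampling itself.
    weights = coarse_weights.detach() + 1e-5
    pdf = weights / weights.sum(dim=-1, keepdim=True)
    cdf = torch.cumsum(pdf, dim=-1)
    cdf = torch.cat([torch.zeros_like(cdf[:, :1]), cdf], dim=-1)   # (n_rays, n_bins + 1)

    # Uniform samples, inverted through the CDF so new depths land in high-weight bins.
    u = torch.rand(cdf.shape[0], n_fine, device=cdf.device)
    idx = torch.searchsorted(cdf, u, right=True)
    below = (idx - 1).clamp(min=0)
    above = idx.clamp(max=cdf.shape[-1] - 1)

    cdf_lo, cdf_hi = torch.gather(cdf, -1, below), torch.gather(cdf, -1, above)
    edge_lo, edge_hi = torch.gather(bin_edges, -1, below), torch.gather(bin_edges, -1, above)
    t = (u - cdf_lo) / (cdf_hi - cdf_lo).clamp(min=1e-8)
    return edge_lo + t * (edge_hi - edge_lo)                       # (n_rays, n_fine) depths
```

The fine depths are then merged with the coarse ones and rendered again, and both the coarse and fine renders contribute an MSE term to the loss.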

Run python main.py --config-name=nerf_lego_view_hierarchical to train.

Hierarchical

We see increased clarity, with the red Lego stick on top staying a consistent width over the entire rotation.

However, this training takes about twice as long: we perform two rendering passes per ray instead of one, plus an extra resampling step. Still, we probably get better quality than running the non-hierarchical technique for twice as long, since the resampled points are concentrated in high-density regions and therefore produce more useful gradients.