16-889 Assignment 3

Name: Adithya Sampath
Andrew ID: adithyas

Late days used:

Late Days

NOTE:

I updated the website only to remove code I had previously posted here.

I hope this does not count as using a late day.

My last Canvas submission is still before the assignment deadline and I haven't made any changes to my code submission on Canvas.

Piazza Post: https://piazza.com/class/kvtncsfwx8768l?cid=145

1.3. Ray sampling (10 points)

Visualization

You can run the code for part 1.3 with:

python main.py --config-name=box

By default, the results will be written out to images/xy_grid.png and images/rays.png.

Results

| Feature | My Output |
| --- | --- |
| Grid | Grid |
| Rays | Rays |
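
For reference, a minimal sketch of the ray-generation step is shown below, assuming a PyTorch3D-style camera with `unproject_points` and `get_camera_center`; the helper names and the normalized-coordinate convention are illustrative, not the exact starter-code API. Each pixel is mapped to a normalized grid coordinate, unprojected to a world-space point at depth 1, and turned into a unit ray direction from the camera center.

```python
import torch

def get_pixel_grid(W, H):
    # Normalized pixel coordinates in [-1, 1]; the exact sign/axis convention
    # depends on the camera library being used.
    xs = torch.linspace(-1, 1, W)
    ys = torch.linspace(-1, 1, H)
    x, y = torch.meshgrid(xs, ys, indexing="xy")
    return torch.stack([x, y], dim=-1).reshape(-1, 2)              # (H*W, 2)

def get_rays(camera, xy_grid):
    # Unproject each pixel to a world-space point at depth 1, then form a
    # unit direction from the camera center to that point.
    xy_depth = torch.cat([xy_grid, torch.ones_like(xy_grid[..., :1])], dim=-1)
    world_points = camera.unproject_points(xy_depth, world_coordinates=True)
    origins = camera.get_camera_center().expand(world_points.shape)
    directions = torch.nn.functional.normalize(world_points - origins, dim=-1)
    return origins, directions                                      # (H*W, 3) each
```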

1.4. Point sampling (10 points)

Visualization

You can run the code for part 1.4 with:

python main.py --config-name=box

By default, the results will be written out to images/points.png.

Results

| Feature | My Output |
| --- | --- |
| Points | Grid |
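
A minimal sketch of the point sampler is shown below, assuming uniformly spaced depth values between the near and far planes that are shared across all rays; the near/far defaults are illustrative.

```python
import torch

def sample_points_along_rays(origins, directions, n_pts_per_ray=64, near=0.1, far=5.0):
    # Uniformly spaced depth values between the near and far planes.
    z_vals = torch.linspace(near, far, n_pts_per_ray)               # (P,)
    # Each sample point is origin + depth * direction (broadcast over rays).
    points = origins[:, None, :] + z_vals[None, :, None] * directions[:, None, :]
    return points, z_vals                                            # (R, P, 3), (P,)
```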

1.5. Volume rendering (30 points)

Visualization

You can run the code for part 1.5 (for box) with:

python main.py --config-name=box

You can run the code for part 1.5 (for sphere) with:

python main.py --config-name=sphere

By default, the results will be written out to images/part_1.gif and depth.png.

Results for Box

| Feature | My Output |
| --- | --- |
| Color Features | Color |
| Depth Features | Depth |

Results for Sphere

| Feature | My Output |
| --- | --- |
| Color Features | Color |
| Depth Features | Depth |
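
The rendering step composites the per-point densities and colors along each ray into a single color and an expected depth. Below is a minimal sketch of the standard discrete volume-rendering weights (alpha compositing with accumulated transmittance), assuming the uniformly spaced samples from the part 1.4 sketch.

```python
import torch

def volume_render(densities, colors, z_vals):
    # densities: (R, P), colors: (R, P, 3), z_vals: (P,) sample depths per ray.
    deltas = z_vals[1:] - z_vals[:-1]
    deltas = torch.cat([deltas, deltas[-1:]])                        # pad the last interval
    alphas = 1.0 - torch.exp(-densities * deltas)                    # opacity of each segment
    # Transmittance: probability that the ray reaches sample i without being absorbed.
    trans = torch.cumprod(1.0 - alphas + 1e-10, dim=-1)
    trans = torch.cat([torch.ones_like(trans[:, :1]), trans[:, :-1]], dim=-1)
    weights = alphas * trans                                          # (R, P)
    rgb = (weights[..., None] * colors).sum(dim=1)                    # (R, 3)
    depth = (weights * z_vals).sum(dim=1)                             # (R,)
    return rgb, depth
```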

2. Optimizing a basic implicit volume

2.1. Random ray sampling (5 points)

2.2. Loss and training (5 points)

Visualization

You can run the code for part 2.2 for box with:

python main.py --config-name=train_box

By default, the results will be written out to images/part_2.gif.

Box Center and Side Lengths

| Value | My Output | Rounded Output |
| --- | --- | --- |
| Center | (0.25022584199905396, 0.2505739629268646, -0.0005213514086790383) | (0.25, 0.25, 0.00) |
| Side Lengths | (2.005093812942505, 1.503567099571228, 1.5033282041549683) | (2.00, 1.50, 1.50) |
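
A minimal sketch of one training step for parts 2.1 and 2.2 is shown below, reusing the ray, point-sampling, and volume-rendering sketches from part 1. The MSE objective between rendered and ground-truth pixel colors and the `implicit_fn(points) -> (densities, colors)` interface are assumptions for illustration, not the exact starter-code API.

```python
import torch

def training_step(implicit_fn, camera, image, optimizer, n_rays=1024, n_pts=64):
    H, W, _ = image.shape
    # Part 2.1: randomly pick a subset of pixels and build rays only for those.
    xy_grid = get_pixel_grid(W, H)
    idx = torch.randint(0, H * W, (n_rays,))
    origins, directions = get_rays(camera, xy_grid[idx])
    gt_rgb = image.reshape(-1, 3)[idx]

    # Part 2.2: render the sampled rays and minimize the MSE against the image.
    points, z_vals = sample_points_along_rays(origins, directions, n_pts)
    densities, colors = implicit_fn(points)      # assumed interface of the implicit volume
    pred_rgb, _ = volume_render(densities, colors, z_vals)
    loss = torch.nn.functional.mse_loss(pred_rgb, gt_rgb)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```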

2.3. Visualization

Results for Box

| Feature | My Output |
| --- | --- |
| Color Features | Color |

3. Optimizing a Neural Radiance Field (NeRF) (30 points)

Implementation

I used the network defined in the NeRF paper as a reference for my model. However, unlike the NeRF paper, I used 6 MLP layers instead of 8 and added the skip connection at the 4th layer instead of the 5th. I also set the number of neurons in the linear layers to 128 instead of the paper's 256. I found this configuration sufficient since the images are low resolution (128x128). The architecture is summarized below.

In each of the config files I added a flag for whether to add view dependence, which made the implementation of section 4.1 more convenient. I also compute the higher-frequency harmonic embeddings of the sample points and directions in the forward pass, to be used as input to the model and for the skip connection. I use an MSE loss to minimise the error between the RGB output of the NeRF and the ground-truth RGB sampled from the image.
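
Below is a minimal sketch of this kind of architecture (harmonic position/direction embeddings, 6 linear layers of width 128 with a skip connection at the 4th layer, and a flag for view dependence), written against plain PyTorch. Layer names, embedding frequencies, and the exact head structure are illustrative rather than the submitted code.

```python
import torch
import torch.nn as nn

class HarmonicEmbedding(nn.Module):
    """Maps x to (sin(2^k * pi * x), cos(2^k * pi * x)) for k = 0..n_freqs-1."""
    def __init__(self, n_freqs=6):
        super().__init__()
        self.register_buffer("freqs", (2.0 ** torch.arange(n_freqs)) * torch.pi)

    def forward(self, x):                                   # x: (..., 3)
        xb = x[..., None, :] * self.freqs[:, None]          # (..., n_freqs, 3)
        return torch.cat([xb.sin(), xb.cos()], dim=-1).flatten(-2)

class NeRFMLP(nn.Module):
    def __init__(self, n_layers=6, hidden=128, skip_at=3,
                 n_freqs_xyz=6, n_freqs_dir=2, view_dependence=False):
        super().__init__()
        self.view_dependence = view_dependence
        self.skip_at = skip_at                               # skip connection at the 4th layer
        self.embed_xyz = HarmonicEmbedding(n_freqs_xyz)
        self.embed_dir = HarmonicEmbedding(n_freqs_dir)
        in_xyz, in_dir = 3 * 2 * n_freqs_xyz, 3 * 2 * n_freqs_dir
        self.layers = nn.ModuleList(
            nn.Linear(in_xyz if i == 0 else hidden + (in_xyz if i == skip_at else 0), hidden)
            for i in range(n_layers)
        )
        self.density_head = nn.Linear(hidden, 1)
        rgb_in = hidden + (in_dir if view_dependence else 0)
        self.rgb_head = nn.Sequential(
            nn.Linear(rgb_in, hidden // 2), nn.ReLU(),
            nn.Linear(hidden // 2, 3), nn.Sigmoid(),
        )

    def forward(self, points, directions=None):
        x = self.embed_xyz(points)                           # positional encoding
        h = x
        for i, layer in enumerate(self.layers):
            if i == self.skip_at and i > 0:
                h = torch.cat([h, x], dim=-1)                # re-inject the position embedding
            h = torch.relu(layer(h))
        density = torch.relu(self.density_head(h))           # density depends on position only
        if self.view_dependence:
            h = torch.cat([h, self.embed_dir(directions)], dim=-1)
        rgb = self.rgb_head(h)
        return density, rgb
```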

Visualization

You can train a NeRF on the lego bulldozer dataset:

python main.py --config-name=nerf_lego

This will train a NeRF for 250 epochs on 128x128 images.

After training, the results will be written out to images/part_3.gif.

Results

| Train Arguments | My Output |
| --- | --- |
| Image size: 128 x 128<br>chunk_size: 32768<br>n_pts_per_ray: 128<br>n_hidden_neurons_xyz: 128<br>n_layers_xyz: 6<br>append_xyz: [3]<br>view_dependence: False | Color |

4. NeRF Extras (Choose at least one! More than one is extra credit)

4.1 View Dependence (10 pts)

Implementation

I used the network defined in the previous section, but defined separate config files for view dependence on low-res and high-res images. As above, I used 6 MLP layers instead of the 8 in the NeRF paper, added the skip connection at the 4th layer instead of the 5th, and set the number of neurons in the initial linear layers to 128 instead of the paper's 256. The model itself is described in the section above. In each of the config files I added a flag for whether to add view dependence, which made the implementation of this section more convenient. I also compute the higher-frequency harmonic embeddings of the sample points and directions in the forward pass, to be used as input to the model and for the skip connections. In this case, I concatenate the harmonic embedding of the direction input to the feature output after 7 MLP layers (i.e. only after getting the density output). As before, I use an MSE loss to minimise the error between the RGB output of the NeRF and the ground-truth RGB sampled from the image.
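
As a usage sketch (reusing the hypothetical `NeRFMLP` from the Section 3 sketch), the only difference in the view-dependent case is that normalized view directions are passed in and concatenated after the density output:

```python
import torch

# Hypothetical usage of the NeRFMLP sketch from Section 3: density is computed
# from position features alone, and the direction embedding only affects color.
points = torch.rand(1024, 3)
directions = torch.nn.functional.normalize(torch.rand(1024, 3) - 0.5, dim=-1)

model_vd = NeRFMLP(n_layers=6, hidden=128, skip_at=3, view_dependence=True)
density, rgb = model_vd(points, directions)
print(density.shape, rgb.shape)   # torch.Size([1024, 1]) torch.Size([1024, 3])
```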

Visualization

You can train a NeRF on the lego bulldozer dataset with view dependence:

python main.py --config-name=nerf_lego_vd

This will train a NeRF for 250 epochs on 128x128 images.

After training, the results will be written out to images/part_4_vd.gif.

You can also train a NeRF on the lego bulldozer dataset with view dependence and high-resolution output:

python main.py --config-name=nerf_lego_highres_vd

This will train a NeRF for 250 epochs on 400x400 images.

After training, the results will be written out to images/part_4_highres_vd.gif.

Results

Low Resolution Images (128x128)

| Train Arguments | Outputs |
| --- | --- |
| Image size: 128 x 128<br>chunk_size: 32768<br>n_pts_per_ray: 128<br>n_hidden_neurons_xyz: 128<br>n_layers_xyz: 6<br>append_xyz: [3]<br>view_dependence: False | Color |
| Image size: 128 x 128<br>chunk_size: 32768<br>n_pts_per_ray: 128<br>n_hidden_neurons_xyz: 128<br>n_layers_xyz: 6<br>append_xyz: [3]<br>view_dependence: True | Color |

High Resolution Images (400x400)

| Train Arguments | Outputs |
| --- | --- |
| Image size: 400 x 400<br>chunk_size: 8192<br>n_pts_per_ray: 128<br>n_hidden_neurons_xyz: 256<br>n_layers_xyz: 8<br>append_xyz: [4]<br>view_dependence: False | Color |
| Image size: 400 x 400<br>chunk_size: 8192<br>n_pts_per_ray: 128<br>n_hidden_neurons_xyz: 256<br>n_layers_xyz: 8<br>append_xyz: [4]<br>view_dependence: True | Color |

Discussion

The NeRF paper uses only the position encoding to compute density, and uses the direction encoding (together with the position features) only for the RGB output. If we passed both the position and direction encodings as inputs to everything, the model might learn to represent only the input views and thus overfit: the output might look good for the views in the training data but would not generalise to unseen views. In fact, as we increase the model's dependence on the direction embeddings, we tend to observe additional artifacts in the outputs for views not present in the training data. For these reasons, i.e. to keep the model from over-fitting to the input views and failing to generalise to unseen views, we add view dependence only when estimating the RGB output and use only the position encoding for the density output.

Although the difference in outputs isn't very apparent for low-res images, I could observe small improvements for high-res images. One observation, especially in the high-res view-dependent output, is that the changes in color are more realistic as the view direction changes. There are also improvements when n_pts_per_ray is increased; these are discussed in the section below.

4.3 High Resolution Imagery (10 pts)

Implementation

I used the network defined in the previous section, but defined separate config files with and without view dependence for high-res (400x400) images. For this setup I used the network from the paper: 8 MLP layers, with the skip connection at the 5th layer, and 256 neurons in the initial linear layers, as in the paper. I experimented with increasing n_pts_per_ray (both with and without view dependence on high-res images). Due to hardware restrictions, I also had to reduce the chunk size to 8192 (especially when I increased n_pts_per_ray). However, in my experiments with n_pts_per_ray of 32 or 64, I realised that as deep a network wasn't required: for n_pts_per_ray=64 I used n_hidden_neurons_xyz=128 while keeping n_layers_xyz=8, and for n_pts_per_ray=32 I used n_hidden_neurons_xyz=64 with only 6 layers instead of 8.
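
Since the number of network queries grows as (number of rays) x n_pts_per_ray, rendering in chunks keeps peak GPU memory bounded. Below is a minimal sketch of such a chunked renderer, reusing the earlier sampling and rendering sketches and the assumed `model(points, directions) -> (density, color)` interface; it is illustrative, not the actual renderer code.

```python
import torch

def render_image_in_chunks(model, origins, directions, z_vals, chunk_size=8192):
    # Process rays in chunks of `chunk_size` so the activations for
    # chunk_size * n_pts_per_ray samples fit in GPU memory; a smaller chunk
    # size lowers peak memory at the cost of more iterations (slower rendering).
    rgbs, depths = [], []
    for i in range(0, origins.shape[0], chunk_size):
        o = origins[i:i + chunk_size]
        d = directions[i:i + chunk_size]
        points = o[:, None, :] + z_vals[None, :, None] * d[:, None, :]   # (C, P, 3)
        dirs = d[:, None, :].expand_as(points)
        density, color = model(points, dirs)
        rgb, depth = volume_render(density.squeeze(-1), color, z_vals)
        rgbs.append(rgb)
        depths.append(depth)
    return torch.cat(rgbs), torch.cat(depths)
```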

Visualization

You can train a NeRF on the lego bulldozer dataset for high-resolution output:

python main.py --config-name=nerf_lego_highres

This will train a NeRF for 250 epochs on 400x400 images.

After training, the results will be written out to images/part_4_highres.gif.

You can also train a NeRF on the lego bulldozer dataset with view dependence and high-resolution output:

python main.py --config-name=nerf_lego_highres_vd

This will train a NeRF for 250 epochs on 400x400 images.

After training, the results will be written out to images/part_4_highres_vd.gif.

Results

Without View Dependence

| Train Arguments | Results |
| --- | --- |
| chunk_size: 8192<br>n_pts_per_ray: 32<br>n_hidden_neurons_xyz: 64<br>n_layers_xyz: 6<br>append_xyz: [3]<br>view_dependence: False | Color |
| chunk_size: 8192<br>n_pts_per_ray: 64<br>n_hidden_neurons_xyz: 128<br>n_layers_xyz: 8<br>append_xyz: [4]<br>view_dependence: False | Color |
| chunk_size: 8192<br>n_pts_per_ray: 128<br>n_hidden_neurons_xyz: 256<br>n_layers_xyz: 8<br>append_xyz: [4]<br>view_dependence: False | Color |
| chunk_size: 8192<br>n_pts_per_ray: 256<br>n_hidden_neurons_xyz: 256<br>n_layers_xyz: 8<br>append_xyz: [4]<br>view_dependence: False | Color |

With View Dependence

| Train Arguments | Results |
| --- | --- |
| chunk_size: 8192<br>n_pts_per_ray: 32<br>n_hidden_neurons_xyz: 64<br>n_layers_xyz: 6<br>append_xyz: [3]<br>view_dependence: True | Color |
| chunk_size: 8192<br>n_pts_per_ray: 64<br>n_hidden_neurons_xyz: 128<br>n_layers_xyz: 8<br>append_xyz: [4]<br>view_dependence: True | Color |
| chunk_size: 8192<br>n_pts_per_ray: 128<br>n_hidden_neurons_xyz: 256<br>n_layers_xyz: 8<br>append_xyz: [4]<br>view_dependence: True | Color |
| chunk_size: 8192<br>n_pts_per_ray: 192<br>n_hidden_neurons_xyz: 256<br>n_layers_xyz: 8<br>append_xyz: [4]<br>view_dependence: True | Color |

Discussion

Intuitively, when we increase n_pts_per_ray we sample more points along each ray, so we get better estimates of density and color along the ray and the model learns to predict more precise features of the object. This was especially evident for occluded surfaces, like the area behind the bulldozer (where there's a shadow): with 128 points, the floor looks smoothed out and the lego projections aren't visible. However, when I increased n_pts_per_ray to 192/256 points (with/without view dependence, respectively), the lego projections become visible on the floor behind the bulldozer as well. The results are in fact better with view dependence, as the color changes are more realistic, especially for the occluded back region of the bulldozer. Clearly, for n_pts_per_ray of 32 or 64 the model isn't able to capture the high-frequency features, and the output is very smoothed out; the result for n_pts_per_ray=32 in particular is very blurry, with most of the high-frequency detail lost.

However, one downside of increasing n_pts_per_ray is the additional compute required: the model takes longer to train and significantly longer to render the output, which is also why I had to reduce the chunk size when working with high-res images.