16-889 Assignment 3
Name: Adithya Sampath
Andrew ID: adithyas
Late days used:

NOTE:
Updated website only to remove code I'd put up here previously.
I hope this does not count as using a late day.
My last Canvas submission is still before the assignment deadline and I haven't made any changes to my code submission on Canvas.
Piazza Post: https://piazza.com/class/kvtncsfwx8768l?cid=145
1.3. Ray sampling (10 points)
Visualization
You can run the code for part 1.3 with:
python main.py --config-name=box
By default, the results will be written out to images/xy_grid.png and images/rays.png.
Results
Feature | My Output |
---|---|
Grid | ![]() |
Rays | ![]() |
1.4. Point sampling (10 points)
Visualization
You can run the code for part 1.4 with:
python main.py --config-name=box
By default, the results will be written out to images/points.png.
Results
Feature | My Output |
---|---|
Points | ![]() |
1.5. Volume rendering (30 points)
Visualization
You can run the code for part 1.5 (for box) with:
python main.py --config-name=box
You can run the code for part 1.5 (for sphere) with:
python main.py --config-name=sphere
By default, the results will be written out to images/part_1.gif and depth.png.
Results for Box
Feature | My Output |
---|---|
Color Features | ![]() |
Depth Features | ![]() |
Results for Sphere
Feature | My Output |
---|---|
Color Features | ![]() |
Depth Features | ![]() |
2. Optimizing a basic implicit volume
2.1. Random ray sampling (5 points)
2.2. Loss and training (5 points)
Visualization
You can run the code for part 2.2 for box with:
python main.py --config-name=train_box
By default, the results will be written out to images/part_2.gif.
Box Center and Side Lengths
Value | My Output | Rounded Output |
---|---|---|
Center | (0.25022584199905396, 0.2505739629268646, -0.0005213514086790383) | (0.25, 0.25, 0.00) |
Side Lengths | (2.005093812942505, 1.503567099571228, 1.5033282041549683) | (2.00, 1.50, 1.50) |
2.3. Visualization
Results for Box
Feature | My Output |
---|---|
Color Features | ![]() |
3. Optimizing a Neural Radiance Field (NeRF) (30 points)
Implementation
I used the network defined in the NeRF paper as a reference for my model. However, unlike the NeRF paper, I used 6 MLP layers instead of 8 and added the skip connection at the 4th layer instead of the 5th. I also set the number of neurons in the linear layers to 128 instead of the 256 used in the paper. I found this configuration sufficient since the images are low resolution (128x128). In each config file I added a flag for whether to use view dependence, which made the implementation of section 4.1 more convenient. I compute the higher-frequency harmonic embeddings of the sample points and directions in the forward pass, and use them as the model input and for the skip connection. I use an MSE loss to minimise the error between the RGB output of the NeRF and the ground-truth RGB sampled from the image.
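As an illustrative sketch (not the submitted code), the architecture described above could look roughly like the following in PyTorch; the harmonic_embedding helper, the activation choices, and the output heads are assumptions based on the description rather than the actual implementation.

```python
import torch
import torch.nn as nn


def harmonic_embedding(x, n_freqs=6):
    # Map each coordinate to [sin(2^k * x), cos(2^k * x)] for k = 0 .. n_freqs-1.
    freqs = 2.0 ** torch.arange(n_freqs, device=x.device, dtype=x.dtype)
    angles = x[..., None] * freqs                        # (..., 3, n_freqs)
    emb = torch.cat([angles.sin(), angles.cos()], dim=-1)
    return emb.reshape(*x.shape[:-1], -1)                # (..., 3 * 2 * n_freqs)


class NeRFMLP(nn.Module):
    """6-layer MLP with 128 hidden units; the harmonic point embedding is
    re-injected at layer index 3 (the 4th layer), matching append_xyz: [3]."""

    def __init__(self, n_layers=6, hidden=128, n_freqs_xyz=6, skip_at=3):
        super().__init__()
        self.skip_at = skip_at
        in_xyz = 3 * 2 * n_freqs_xyz
        layers = []
        for i in range(n_layers):
            d_in = in_xyz if i == 0 else hidden
            if i == skip_at:
                d_in += in_xyz                           # skip connection re-adds the embedding
            layers.append(nn.Linear(d_in, hidden))
        self.layers = nn.ModuleList(layers)
        self.density_head = nn.Linear(hidden, 1)
        self.rgb_head = nn.Linear(hidden, 3)

    def forward(self, points):
        emb_xyz = harmonic_embedding(points)             # computed in the forward pass
        h = emb_xyz
        for i, layer in enumerate(self.layers):
            if i == self.skip_at:
                h = torch.cat([h, emb_xyz], dim=-1)
            h = torch.relu(layer(h))
        density = torch.relu(self.density_head(h))       # non-negative density
        rgb = torch.sigmoid(self.rgb_head(h))            # colours in [0, 1]
        return density, rgb
```

Training would then minimise torch.nn.functional.mse_loss between the rendered RGB and the ground-truth RGB sampled from the training images, matching the MSE objective described above.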
Visualization
You can train a NeRF on the lego bulldozer dataset:
python main.py --config-name=nerf_lego
This will train a NeRF for 250 epochs on 128x128 images.
After training, the results will be written out to images/part_3.gif.
Results
Train Arguments | My Output |
---|---|
Image size: 128 x 128, chunk_size: 32768, n_pts_per_ray: 128, n_hidden_neurons_xyz: 128, n_layers_xyz: 6, append_xyz: [3], view_dependence: False | ![]() |
4. NeRF Extras (Choose at least one! More than one is extra credit)
4.1 View Dependence (10 pts)
Implementation
I used the network from the previous section, but defined separate config files for view dependence on low-res and high-res images. As above, I used 6 MLP layers instead of the 8 in the NeRF paper, added the skip connection at the 4th layer instead of the 5th, and set the number of neurons in the initial linear layers to 128 instead of the 256 used in the paper. The architecture is described in the section above. In each config file I added a flag for whether to use view dependence, which made the implementation of this section more convenient. I compute the higher-frequency harmonic embeddings of the sample points and directions in the forward pass, for use as the model input and for the skip connections. With view dependence enabled, I concatenate the harmonic embedding of the view direction with the feature output after 7 MLP layers, i.e. only after the density output has been obtained. As before, I use an MSE loss to minimise the error between the NeRF RGB output and the ground-truth RGB sampled from the image.
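A rough sketch of the view-dependent colour head described above is given below; it reuses the harmonic_embedding helper from the sketch in section 3, and the layer widths and number of direction frequencies are assumptions, not the actual implementation.

```python
import torch
import torch.nn as nn


class ViewDependentHead(nn.Module):
    """Colour head used when view_dependence is True: the direction embedding
    is concatenated with the trunk features only after density has been
    predicted, so density remains a function of position alone."""

    def __init__(self, hidden=128, n_freqs_dir=4):
        super().__init__()
        in_dir = 3 * 2 * n_freqs_dir
        self.feature = nn.Linear(hidden, hidden)
        self.rgb = nn.Sequential(
            nn.Linear(hidden + in_dir, hidden // 2), nn.ReLU(),
            nn.Linear(hidden // 2, 3), nn.Sigmoid(),
        )

    def forward(self, trunk_features, directions):
        # harmonic_embedding is the helper from the sketch in section 3.
        emb_dir = harmonic_embedding(directions, n_freqs=4)
        h = self.feature(trunk_features)
        return self.rgb(torch.cat([h, emb_dir], dim=-1))
```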
Visualization
You can train a NeRF on the lego bulldozer dataset with view dependence:
python main.py --config-name=nerf_lego_vd
This will train a NeRF for 250 epochs on 128x128 images.
After training, the results will be written out to images/part_4_vd.gif
You can also train a NeRF on the lego bulldozer dataset with view dependence and high-resolution output:
python main.py --config-name=nerf_lego_highres_vd
This will train a NeRF for 250 epochs on 400x400 images.
After training, the results will be written out to images/part_4_highres_vd.gif
Results
Low Resolution Images (128x128)
Train Arguments | Outputs |
---|---|
Image size: 128 x 128, chunk_size: 32768, n_pts_per_ray: 128, n_hidden_neurons_xyz: 128, n_layers_xyz: 6, append_xyz: [3], view_dependence: False | ![]() |
Image size: 128 x 128, chunk_size: 32768, n_pts_per_ray: 128, n_hidden_neurons_xyz: 128, n_layers_xyz: 6, append_xyz: [3], view_dependence: True | ![]() |
High Resolution Images (400x400)
Train Arguments | Outputs |
---|---|
Image size: 400 x 400, chunk_size: 8192, n_pts_per_ray: 128, n_hidden_neurons_xyz: 256, n_layers_xyz: 8, append_xyz: [4], view_dependence: False | ![]() |
Image size: 400 x 400, chunk_size: 8192, n_pts_per_ray: 128, n_hidden_neurons_xyz: 256, n_layers_xyz: 8, append_xyz: [4], view_dependence: True | ![]() |
Discussion
The NeRF paper uses only the position encoding to predict density, and uses the direction encoding (together with the position features) only for the RGB output. If we passed both the position and direction encodings as inputs to the density prediction, the model might learn to represent only the input views and thus overfit: the outputs would look good for the training views but would not generalise to unseen views. In fact, as we increase the dependence on the direction embeddings, we tend to observe additional artifacts in the output for views not present in the training data. For these reasons, i.e. overfitting to the input views and failing to generalise to unseen views, view dependence is added only when estimating the RGB output, while the density output uses only the position encoding.
Although the differences in output are not very apparent for low-res images, I could observe small improvements for high-res images. One observation, especially in the high-res view-dependent output, is that the changes in color as the view direction changes are more realistic. There are also improvements when n_pts_per_ray is increased; these are discussed in the section below.
4.3 High Resolution Imagery (10 pts)
Implementation
I used the network from the previous section, but defined separate config files for high-res (400x400) images with and without view dependence. For this setup I used the network as defined in the NeRF paper: 8 MLP layers with the skip connection at the 5th layer, and 256 neurons in the initial linear layers, as in the paper. I experimented with increasing n_pts_per_ray, both with and without view dependence, on the high-res images. Due to hardware restrictions I also had to reduce the chunk size to 8192, especially when I increased n_pts_per_ray. For my experiments with n_pts_per_ray of 32 or 64, I found that such a deep network wasn't required: for n_pts_per_ray=64 I used n_hidden_neurons_xyz=128 while keeping n_layers_xyz=8, and for n_pts_per_ray=32 I used n_hidden_neurons_xyz=64 with only 6 layers instead of 8.
Visualization
You can train a NeRF on the lego bulldozer dataset for high-resolution output:
python main.py --config-name=nerf_lego_highres
This will train a NeRF for 250 epochs on 400x400 images.
After training, the results will be written out to images/part_4_highres.gif
You can also train a NeRF on the lego bulldozer dataset with view dependence and high-resolution output:
python main.py --config-name=nerf_lego_highres_vd
This will train a NeRF for 250 epochs on 400x400 images.
After training, the results will be written out to images/part_4_highres_vd.gif
Results
Without View Dependence
Train Arguments | Results |
---|---|
chunk_size: 8192, n_pts_per_ray: 32, n_hidden_neurons_xyz: 64, n_layers_xyz: 6, append_xyz: [3], view_dependence: False | ![]() |
chunk_size: 8192, n_pts_per_ray: 64, n_hidden_neurons_xyz: 128, n_layers_xyz: 8, append_xyz: [4], view_dependence: False | ![]() |
chunk_size: 8192, n_pts_per_ray: 128, n_hidden_neurons_xyz: 256, n_layers_xyz: 8, append_xyz: [4], view_dependence: False | ![]() |
chunk_size: 8192, n_pts_per_ray: 256, n_hidden_neurons_xyz: 256, n_layers_xyz: 8, append_xyz: [4], view_dependence: False | ![]() |
With View Dependence
Train Arguments | Results |
---|---|
chunk_size: 8192, n_pts_per_ray: 32, n_hidden_neurons_xyz: 64, n_layers_xyz: 6, append_xyz: [3], view_dependence: True | ![]() |
chunk_size: 8192, n_pts_per_ray: 64, n_hidden_neurons_xyz: 128, n_layers_xyz: 8, append_xyz: [4], view_dependence: True | ![]() |
chunk_size: 8192, n_pts_per_ray: 128, n_hidden_neurons_xyz: 256, n_layers_xyz: 8, append_xyz: [4], view_dependence: True | ![]() |
chunk_size: 8192, n_pts_per_ray: 192, n_hidden_neurons_xyz: 256, n_layers_xyz: 8, append_xyz: [4], view_dependence: True | ![]() |
Discussion
Intuitively, when we increase n_pts_per_ray we sample more points along each ray, so we get better estimates of density and color along the ray and the model learns to predict more precise features of the object. This was especially evident for occluded surfaces, such as the shadowed floor behind the bulldozer: with 128 points the floor looks smoothed out and the Lego projections aren't visible, but when I increased n_pts_per_ray to 192/256 points (with/without view dependence respectively), the projections become visible on the floor at the back as well. The results are in fact better with view dependence, as the color changes are more realistic, especially in the occluded region behind the bulldozer. For n_pts_per_ray of 32 or 64, the model clearly cannot capture the high-frequency features and the output is smoothed out; the result for n_pts_per_ray=32 is very blurry, with most of the high-frequency detail lost.
However, one downside of increasing n_pts_per_ray is the additional compute required. The model takes longer to train, and rendering the output in particular takes significantly more time. This is also why I had to reduce the chunk size when working with high-res images.
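To make the compute trade-off concrete, below is a minimal sketch of uniform point sampling along a batch of rays (an illustrative helper, not the assignment's sampler). Every sampled point becomes a separate MLP query, so the per-chunk cost scales with chunk_size × n_pts_per_ray, which is why the chunk size had to drop to 8192 for the denser configurations.

```python
import torch


def sample_points_along_rays(origins, directions, near, far, n_pts_per_ray):
    """Uniformly sample depths along each ray and lift them to 3D points.
    origins, directions: (n_rays, 3); returns points of shape
    (n_rays, n_pts_per_ray, 3) plus the sampled depths."""
    t = torch.linspace(near, far, n_pts_per_ray, device=origins.device)
    points = origins[:, None, :] + t[None, :, None] * directions[:, None, :]
    return points, t


# Every sampled point is one MLP query, so each chunk costs
# chunk_size * n_pts_per_ray evaluations:
#   8192 rays  * 256 points ~ 2.1M queries per chunk
#   32768 rays * 128 points ~ 4.2M queries per chunk (the low-res setting)
```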