Assignment 3
1. Differentiable Volume Rendering
1.3. Ray sampling
The visualization is shown below:


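The ray sampling step can be illustrated with a minimal sketch. This is a generic pinhole-camera version, not the assignment's exact API: `get_rays`, its arguments, and the camera convention (x right, y up, looking down -z) are all assumptions for illustration.

```python
import numpy as np

def get_rays(H, W, focal, c2w):
    """Generate one world-space ray per pixel for a pinhole camera.

    H, W  : image height/width in pixels
    focal : focal length in pixels
    c2w   : 4x4 camera-to-world matrix
    (Names and conventions are illustrative; the assignment's code may differ.)
    """
    # Pixel grid -> directions on the camera's image plane.
    i, j = np.meshgrid(np.arange(W, dtype=np.float32),
                       np.arange(H, dtype=np.float32), indexing="xy")
    dirs = np.stack([(i - W * 0.5) / focal,
                     -(j - H * 0.5) / focal,
                     -np.ones_like(i)], axis=-1)        # (H, W, 3), camera frame
    # Rotate directions into world space; all origins are the camera center.
    rays_d = dirs @ c2w[:3, :3].T                       # (H, W, 3)
    rays_o = np.broadcast_to(c2w[:3, 3], rays_d.shape)  # (H, W, 3)
    return rays_o, rays_d
```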
1.4. Point sampling
The visualization is shown below:

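A stratified point sampler along each ray can be sketched as follows. The function name, signature, and the jitter-per-bin scheme are assumptions here, chosen to mirror the standard NeRF sampler rather than the assignment's exact implementation.

```python
import numpy as np

def sample_points_along_rays(rays_o, rays_d, near, far, n_pts, rng=None):
    """Stratified sampling: one (optionally jittered) sample per depth bin.

    rays_o, rays_d : (N, 3) ray origins and directions
    Returns points (N, n_pts, 3) and their depths (N, n_pts).
    (A generic sketch; the assignment's sampler may differ in details.)
    """
    N = rays_o.shape[0]
    edges = np.linspace(near, far, n_pts + 1)       # bin edges along each ray
    lower, upper = edges[:-1], edges[1:]
    if rng is None:
        t = 0.5 * np.ones((N, n_pts))               # deterministic mid-bin fallback
    else:
        t = rng.random((N, n_pts))                  # jitter within each bin
    z = lower + t * (upper - lower)                 # (N, n_pts) depths
    pts = rays_o[:, None, :] + z[..., None] * rays_d[:, None, :]
    return pts, z
```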
1.5. Volume rendering
The visualizations are shown below (left: RGB; right: depth):


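The volume rendering step follows the standard discrete emission-absorption model: per-segment opacities, accumulated transmittance, and a weighted sum over the sampled points. A framework-agnostic sketch (function name and argument layout are my own, not the assignment's API):

```python
import numpy as np

def volume_render(sigmas, rgbs, z, deltas):
    """Composite colors and depths along rays.

    sigmas : (N, P) densities     rgbs   : (N, P, 3) colors
    z      : (N, P) sample depths deltas : (N, P) segment lengths
    Returns composited rgb (N, 3), expected depth (N,), and weights (N, P).
    """
    alpha = 1.0 - np.exp(-sigmas * deltas)          # per-segment opacity
    # Transmittance: probability the ray reaches segment i unoccluded.
    trans = np.cumprod(1.0 - alpha + 1e-10, axis=-1)
    trans = np.concatenate([np.ones_like(trans[..., :1]), trans[..., :-1]], axis=-1)
    weights = alpha * trans                         # (N, P)
    rgb = (weights[..., None] * rgbs).sum(axis=-2)  # (N, 3)
    depth = (weights * z).sum(axis=-1)              # (N,)
    return rgb, depth, weights
```

These same weights are what the hierarchical sampler in part 4.2 reuses for importance sampling.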
2. Optimizing a basic implicit volume
2.2 Loss and training
The optimized box position is: (0.25, 0.25, -0.00)
The optimized box side lengths are: (2.01, 1.50, 1.50)
2.3. Visualization
The visualization after training is shown below:

3. Optimizing a Neural Radiance Field (NeRF)
I changed the model config to use 8 layers for layers_xyz, with append_xyz (the skip connection) at layer [5]. The other parts remain the same as the default. I did not use view dependence for this part, but I did apply positional encoding to the xyz positions to achieve better results.
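The positional encoding mentioned above can be sketched as below. The exact frequency convention varies between implementations (some multiply by pi, some omit the raw input); this version uses frequencies 2^0 .. 2^(L-1) and keeps the raw coordinates, which may differ from the assignment's encoder.

```python
import numpy as np

def positional_encoding(x, n_freqs, include_input=True):
    """Encode coordinates with sin/cos at exponentially spaced frequencies.

    x : (..., D) coordinates. Returns (..., D * 2 * n_freqs [+ D]).
    (Frequency convention is an assumption; implementations differ.)
    """
    out = [x] if include_input else []
    for k in range(n_freqs):
        for fn in (np.sin, np.cos):
            out.append(fn((2.0 ** k) * x))      # one sin and one cos band per freq
    return np.concatenate(out, axis=-1)
```

With this skip-connection design, the encoded xyz vector is concatenated back into the feature map at layer 5 of the 8-layer MLP, so depth does not wash out the input signal.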
The visualizations at 10, 100, and 240 epochs are shown below:



4. NeRF Extras
4.1 View Dependence
To improve the NeRF model, I added view dependence: the viewing directions are positionally encoded and appended to the intermediate feature map used for the color output.
The visualizations at 10, 100, and 240 epochs are shown below:



View dependence provides direction information for the color predictions, which can make the final rendering more accurate and detailed. At the same time, conditioning on direction might cause the model to overfit to specific viewing directions instead of learning general features. Therefore, to avoid overfitting, I increased the model capacity (feature map width) before injecting the direction encoding vector, adding the direction information only near the very end of the network.
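The color head described above can be sketched in PyTorch. The class name, layer widths, and the extra bottleneck layer are illustrative assumptions; the point is only that the direction encoding enters after the density branch, near the end of the network.

```python
import torch
import torch.nn as nn

class ColorHead(nn.Module):
    """Sketch of a view-dependent color head (names/sizes are illustrative).

    The direction encoding is injected only here, so the density branch
    stays direction-independent and the model sees directions late.
    """
    def __init__(self, feat_dim=256, dir_enc_dim=24):
        super().__init__()
        self.bottleneck = nn.Linear(feat_dim, feat_dim)  # extra capacity pre-injection
        self.rgb = nn.Sequential(
            nn.Linear(feat_dim + dir_enc_dim, feat_dim // 2),
            nn.ReLU(),
            nn.Linear(feat_dim // 2, 3),
            nn.Sigmoid(),                                # colors in [0, 1]
        )

    def forward(self, feat, dir_enc):
        # Concatenate encoded view direction onto the (widened) feature map.
        h = torch.cat([self.bottleneck(feat), dir_enc], dim=-1)
        return self.rgb(h)
```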
4.2 Hierarchical Sampling
My hierarchical sampling implementation follows the official NeRF implementation. There are two model passes: coarse and fine. For the coarse network, I sample only 64 points per ray. After obtaining the weights, I pass them into the importance sampler to draw 128 new points along each ray based on the weights, then concatenate these 128 points with the original 64. All 128 + 64 points per ray go through the implicit model (fine network) to obtain new densities and colors, and the object is rendered from those. The visualizations at 10, 100, and 240 epochs are shown below:



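The importance-sampling step of the hierarchical scheme is inverse-CDF sampling over the coarse weights. A simplified per-ray sketch in the spirit of the official `sample_pdf` (the real version is batched and handles edge cases more carefully):

```python
import numpy as np

def sample_pdf(bins, weights, n_samples, rng=None):
    """Draw new depth samples proportional to the coarse weights.

    bins    : (B+1,) depth bin edges along one ray
    weights : (B,) coarse volume-rendering weights per bin
    (Single-ray simplification of the official NeRF sample_pdf.)
    """
    pdf = weights / (weights.sum() + 1e-10)
    cdf = np.concatenate([[0.0], np.cumsum(pdf)])
    if rng is None:
        u = (np.arange(n_samples) + 0.5) / n_samples   # deterministic midpoints
    else:
        u = rng.random(n_samples)                      # uniform samples in [0, 1)
    idx = np.searchsorted(cdf, u, side="right") - 1    # bin containing each u
    idx = np.clip(idx, 0, len(weights) - 1)
    # Linear interpolation within the chosen bin.
    denom = np.where(pdf[idx] > 0, pdf[idx], 1.0)
    t = (u - cdf[idx]) / denom
    return bins[idx] + t * (bins[idx + 1] - bins[idx])
```

Because all the mass sits where the coarse weights are large, the 128 fine samples cluster around the surface instead of empty space.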
Training the hierarchical model takes twice as long, since two networks run in sequence. Specifically, on my own laptop (2070 Super), the view-dependence network trained at 7 s per epoch, while hierarchical sampling takes 14 s per epoch. Since we have more points per ray, and those points are sampled according to the weights, we are more likely to place points on the object itself, so the output rendering is more accurate.
4.3 High Resolution Imagery
To train on high-resolution images, I had to modify the config a little to fit into memory while preserving quality. Previously we had only 128x128 rays, but now we have 400x400. To let the model learn about the whole object, I increased the batch_size (the number of rays sampled in the random_sample function) from 1024 to 4096, so that each iteration samples more rays from the ray bundle. Another change concerns the 'chunk_size'. It matters when we render the object (rendering takes all 400x400 rays into account, which causes OOM on my laptop), so I had to decrease the 'chunk_size' to 16384. This also increases training time considerably, since each rendering is now divided into chunks, and each chunk goes through the whole network and aggregates its colors separately. This might be improved with parallelization, but I did not have time to explore it.

Also, to save some training time, I did not apply hierarchical sampling in this case, but I did add view dependence to ensure quality. The number of layers is also increased to 8, with append_xyz at the 5th layer.
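The chunked rendering described above can be sketched as a simple loop. The function and argument names are illustrative, not the assignment's renderer; the point is that peak memory is bounded by `chunk_size` rays per forward pass instead of all 400x400 at once.

```python
import numpy as np

def render_in_chunks(rays, model_fn, chunk_size=16384):
    """Render a large ray batch in fixed-size chunks to bound peak memory.

    rays     : (N, 6) packed ray origins + directions
    model_fn : maps a (n, 6) chunk to its (n, 3) colors
    (A generic sketch; the assignment's renderer differs in details.)
    """
    outputs = []
    for start in range(0, rays.shape[0], chunk_size):
        chunk = rays[start:start + chunk_size]   # at most chunk_size rays
        outputs.append(model_fn(chunk))          # one forward pass per chunk
    return np.concatenate(outputs, axis=0)       # reassemble the full image
```

The trade-off is exactly the one noted above: smaller chunks mean lower memory but more sequential forward passes, hence longer renders.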
The visualizations at 10, 100, and 240 epochs are shown below:



The training time per epoch is 20 s on my laptop, which is even longer than hierarchical sampling. The increase is mainly due to the larger batch_size. Also, rendering happens every 10 epochs and takes much longer than before. All of this leads to longer training compared to the low-resolution images. On the quality side, as the visualizations above show, the higher-resolution rendering contains more detail and is more accurate, since we have more rays (400x400 compared to 128x128), leading to better quality.