[16-889] Assignment 4 Submission: Aarush Gupta (aarushg3)

Late days used: 2



1. Sphere Tracing (30pts)

For each ray, the algorithm initializes a t value (based on the near point provided in the initialization of the SphereTracingRenderer), where t is the scalar in the origin + t * direction ray parameterization. At every iteration, the algorithm increments t by the SDF value at the point origin + t * direction. The algorithm also maintains a mask tensor which keeps track of which rays have intersected a surface and which haven't. When a point on a cast ray comes close enough to a surface (closeness defined by a threshold, here 0.005), the mask is updated to reflect the contact. Once the provided maximum number of iterations has been performed, the scene is rendered using the obtained points and masks.
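A rough sketch of this loop (function and argument names are my own; the actual implementation lives in the SphereTracingRenderer):

```python
import torch

def sphere_trace(origins, directions, sdf, near=0.0, max_iters=64, eps=0.005):
    # origins, directions: (N, 3); sdf: callable mapping (N, 3) points -> (N, 1) distances.
    N = origins.shape[0]
    t = torch.full((N, 1), near, device=origins.device)
    hit = torch.zeros((N, 1), dtype=torch.bool, device=origins.device)
    for _ in range(max_iters):
        points = origins + t * directions
        dist = sdf(points)
        hit = hit | (dist < eps)   # mark rays that are within the threshold of a surface
        t = t + dist * (~hit)      # advance only the rays that haven't hit yet
    return origins + t * directions, hit
```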

2. Optimizing a Neural SDF (30pts)





The MLP consists of an initial set of 6 FC layers of size 128, each followed by a ReLU activation. The output of this initial set of layers is a 128-d feature vector which is fed into a distance head, dist_layer (see line 346 in a4/implicit.py under the NeuralSurface class), that outputs a single distance value for each point. This head consists of a single FC layer (with no activation) which projects the 128-d input to a single value. I also use harmonic embeddings for the input points, as in the NeRF paper.
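A self-contained sketch of this architecture (the layer count and widths follow the description; the number of harmonic functions is an assumption on my part):

```python
import torch
import torch.nn as nn

def harmonic_embedding(x, n_harmonic=4):
    # NeRF-style positional encoding: [x, sin(2^k x), cos(2^k x)], k = 0..n_harmonic-1
    freqs = 2.0 ** torch.arange(n_harmonic, device=x.device)
    xf = (x[..., None] * freqs).flatten(start_dim=-2)
    return torch.cat([x, xf.sin(), xf.cos()], dim=-1)

class DistanceMLP(nn.Module):
    # 6 FC + ReLU layers of width 128, then a single linear distance head (no activation).
    def __init__(self, n_harmonic=4, hidden=128, n_layers=6):
        super().__init__()
        self.n_harmonic = n_harmonic
        in_dim = 3 * (2 * n_harmonic + 1)       # embedded dimension of a 3-D point
        layers = []
        for _ in range(n_layers):
            layers += [nn.Linear(in_dim, hidden), nn.ReLU()]
            in_dim = hidden
        self.trunk = nn.Sequential(*layers)
        self.dist_layer = nn.Linear(hidden, 1)  # single distance value per point

    def forward(self, points):                  # points: (N, 3)
        return self.dist_layer(self.trunk(harmonic_embedding(points, self.n_harmonic)))
```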

The eikonal loss computes the norm of the SDF gradients at the sampled points and encourages it to be as close to 1 as possible. I use an L2 loss on the difference between the gradient norm and the value 1.
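Concretely (a minimal sketch; the gradients come from torch.autograd.grad on the predicted distances):

```python
import torch

def eikonal_loss(gradients: torch.Tensor) -> torch.Tensor:
    # gradients: (N, 3) spatial gradients of the SDF at the sampled points.
    # L2 penalty on the deviation of the gradient norm from 1.
    return ((gradients.norm(dim=-1) - 1.0) ** 2).mean()
```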

The model was trained for 5k epochs with a batch size of 4096.

In my opinion, the above model and loss produce a better rendering of the bunny.

3. VolSDF (30 pts)




For this part, I build on top of the MLP from the previous question. I use the 6-layer "initial" FC network to extract features shared by the distance and colour predictions, saving computation as suggested. The 128-d output of the "initial" network is fed into a couple of FC layers of size 128 with a Sigmoid activation (just at the end) which produce a normalized RGB value for each point.
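A sketch of the colour head under my reading of the description above (the intermediate ReLU is an assumption):

```python
import torch.nn as nn

# Takes the shared 128-d trunk feature and produces a normalized RGB value.
color_head = nn.Sequential(
    nn.Linear(128, 128),
    nn.ReLU(),        # intermediate activation: my assumption
    nn.Linear(128, 3),
    nn.Sigmoid(),     # the activation "just at the end": normalizes RGB to [0, 1]
)
```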

$\alpha$ controls the output range of the density (essentially acting as a scaling factor for the density values) whereas $\beta$ controls how the density drops off as we move away from the surface (see below for explanation).
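For reference, this is the formulation from the VolSDF paper, where the density is the scaled CDF of a zero-mean Laplace distribution with scale $\beta$, and $d_\Omega(\mathbf{x})$ is the signed distance (positive outside the surface):

$$
\sigma(\mathbf{x}) = \alpha \, \Psi_\beta\bigl(-d_\Omega(\mathbf{x})\bigr), \qquad
\Psi_\beta(s) =
\begin{cases}
\frac{1}{2}\exp\left(\frac{s}{\beta}\right) & s \le 0 \\
1 - \frac{1}{2}\exp\left(-\frac{s}{\beta}\right) & s > 0
\end{cases}
$$

So the density approaches $\alpha$ deep inside the object and decays to 0 outside, at a rate set by $\beta$.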

The following curves show how the density varies with the SDF value for different $\beta$ values. SDF values are plotted on the x-axis and density values on the y-axis.

Clearly, as the value of $\beta$ increases, the density distribution becomes smoother.
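A minimal snippet to reproduce these curves ($\alpha = 1$ and the plotted x-range are assumptions on my part):

```python
import numpy as np
import matplotlib.pyplot as plt

def volsdf_density(sdf, alpha=1.0, beta=0.05):
    # alpha * Laplace-CDF(-sdf): high inside the surface, decaying outside
    s = -sdf
    return alpha * np.where(s <= 0, 0.5 * np.exp(s / beta),
                            1.0 - 0.5 * np.exp(-s / beta))

d = np.linspace(-0.3, 0.3, 1000)
for beta in (0.005, 0.05, 0.5):
    plt.plot(d, volsdf_density(d, beta=beta), label=f"beta = {beta}")
plt.xlabel("SDF value")
plt.ylabel("density")
plt.legend()
plt.show()
```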

  1. How does high $\beta$ bias your learned SDF? What about low $\beta$?

I tried running the experiment with $\beta$ values in {0.005, 0.05, 0.5}.

$\beta$ = 0.05 seems to work best: it strikes a balance between $\beta$ being too high and too low. A high $\beta$ spreads the density over a wide band around the zero level set, biasing the learned SDF toward a smoother, blurrier surface, while a low $\beta$ concentrates density tightly at the surface, giving sharper geometry but a harder, less stable optimization. Accordingly, $\beta$ = 0.5 doesn't produce a sharp and accurate rendering, which is expected. $\beta$ = 0.005 produces a slightly better rendering of the scene; see the views below for comparison (left is $\beta$ = 0.005 and right is $\beta$ = 0.05).

beta ss 1

beta ss 2

However, there are some artifacts in the views with $\beta$ = 0.005. Hence, $\beta$ = 0.05 seems to work best overall.

4. Neural Surface Extras (CHOOSE ONE! More than one is extra credit)

4.1. Render a Large Scene with Sphere Tracing (10 pts)

For this part, I tried rendering 27 small spheres placed on the edges/corners of a 3D cube. The rendering of this scene using sphere tracing looks like:

Spheres on Cube Sphere Tracing

I also tried rendering round boxes (alone and mixed with spheres), but they didn't look visually appealing at small sizes, so I stuck with spheres.

For this question, I modified the SDF surface class in a4/implicit.py (changes made in a duplicate class, MultiSDFSurface) to accept multiple SDFs during initialization. MultiSDFSurface returns the minimum of the individual SDF values (i.e., the union of the shapes) and is rendered using the same sphere tracing algorithm used for the torus in Q1.
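A standalone sketch of the min-union idea (the real class lives in a4/implicit.py; the names, sphere radius, and grid placement below are illustrative):

```python
import torch

class MultiSDFSurface:
    """Union of several shapes: the pointwise minimum of their SDFs."""
    def __init__(self, sdfs):
        self.sdfs = sdfs                   # list of callables (N, 3) -> (N, 1)

    def __call__(self, points):
        dists = torch.stack([sdf(points) for sdf in self.sdfs], dim=0)
        return dists.min(dim=0).values     # closest surface wins

def sphere_sdf(center, radius):
    center = torch.tensor(center, dtype=torch.float32)
    return lambda p: (p - center.to(p.device)).norm(dim=-1, keepdim=True) - radius

# e.g. 27 small spheres on a 3x3x3 grid spanning a cube
centers = [(x, y, z) for x in (-1, 0, 1) for y in (-1, 0, 1) for z in (-1, 0, 1)]
scene = MultiSDFSurface([sphere_sdf(c, 0.25) for c in centers])
```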

4.2 Fewer Training Views (10 pts)

Note: The views have been randomly sampled for all the experiments stated below.

Interestingly, the NeRF-based rendering is sharper (maybe because of differences in hyperparameters and slight differences in the network configurations) than the SDF-based one. However, the NeRF representation has a kind of reflection of the lego in the lower part of the gif, which is not present in the SDF rendering.

Further reducing the views to 10 yields the following results:



Also, reducing the views to 5 yields the following results:


The quality of the rendered scenes does degrade as the number of views decreases for both models (which is expected).

The geometry from NeRF is sharper for 10 and 5 views as well, but with 5 views NeRF isn't able to generate consistent views from every direction (the GIF almost blanks out for some views), whereas VolSDF, despite a poorer geometric reconstruction, generates consistent renderings from almost every direction.

Hence, for consistent renderings, VolSDF does seem better than NeRF given a small number of views.

4.3 Alternate SDF to Density Conversions (10 pts)

I used the 'naive' solution from the NeuS paper as an alternative to the equation from the VolSDF paper for converting signed distance to volumetric density. Setting s = 50, the function yields the following graph:

s 50 distribution

Here, the signed distance is plotted on the x-axis and the density is plotted on the y-axis.
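For reference, the 'naive' solution in NeuS applies the logistic density $\phi_s$ (the derivative of the sigmoid $\Phi_s$) to the SDF; a minimal sketch:

```python
import torch

def neus_naive_density(sdf: torch.Tensor, s: float = 50.0) -> torch.Tensor:
    # phi_s(d) = s * e^{-s d} / (1 + e^{-s d})^2 = s * sigmoid(s d) * (1 - sigmoid(s d)):
    # a bell-shaped density peaking at the surface (d = 0) with height s / 4.
    sig = torch.sigmoid(s * sdf)
    return s * sig * (1 - sig)
```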

Training the model using the above function and s value, I obtain the following result:



The results from this alternative from the NeuS paper are comparable (if anything, slightly better), but contain some artifacts that the VolSDF results don't have. However, the NeuS approach produces a poorer geometry compared to the VolSDF-based approach.