16-889 Assignment 4

Name: Adithya Sampath
Andrew ID: adithyas

Late days used:


1. Sphere Tracing (30pts)

Visualization

You can run the code for part 1 with:

mkdir images  # only needed on the first run
python -m a4.main --config-name=torus

By default, the results will be written out to part_1.gif in the images folder.

Results

| Feature | My Output |
| --- | --- |
| Sphere Tracing | Grid |

Implementation

We are given ray origins and directions as input, so I initialise the points to the given origins. Then, as described in the lecture, I march along each ray direction to find the point with the smallest distance to a surface (i.e. closest to the surface). The implicit function returns signed distances when given a set of points as input. The steps are as follows (a minimal code sketch follows the list):

  1. Define a (small enough) threshold epsilon.
  2. Initialise the points to the ray origins.
  3. Iterate while (a) any of the distances returned by the implicit function is > epsilon and (b) num_iters < max_iters:
    1. As described in the slides, update t <- t + implicit_fn(points).
    2. Update points <- origins + t * directions.
    3. Increment num_iters.
  4. Compute the mask by (a) evaluating implicit_fn on the points from the loop above and (b) checking which of these distances are less than epsilon.
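
A minimal sketch of this loop, assuming implicit_fn maps points of shape (N, 3) to signed distances of shape (N, 1); the function name, defaults, and shapes are illustrative, not the exact assignment API:

```python
import torch

def sphere_trace(implicit_fn, origins, directions, max_iters=64, eps=1e-5):
    # t is the distance marched along each ray; points start at the origins.
    t = torch.zeros_like(origins[..., :1])
    points = origins.clone()
    for _ in range(max_iters):
        dists = implicit_fn(points)
        # Stop early once every ray is within eps of a surface.
        if (dists < eps).all():
            break
        # The SDF value is a safe step size: no surface lies closer than it.
        t = t + dists
        points = origins + t * directions
    # Rays whose final point lies within eps of a surface count as hits.
    mask = implicit_fn(points) < eps
    return points, mask
```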

2. Optimizing a Neural SDF (30pts)

Visualization

You can run the code for part 2 with:

python -m a4.main --config-name=points

By default, the results will be written out to part_2_input.gif and part_2.gif in the images folder.

Results

| n_layers_distance | n_hidden_neurons_distance | append_distance | My Output |
| --- | --- | --- | --- |
| 6 | 128 | [3] | Grid |
| 6 | 192 | [3] | Grid |
| 6 | 256 | [3] | Grid |
| 8 | 128 | [4] | Grid |
| 8 | 192 | [4] | Grid |
| 8 | 256 | [4] | Grid |

Implementation

MLP

For the input, I use the HarmonicEmbedding class to obtain a higher-frequency positional encoding of the points. For the signed-distance MLP, I use an architecture very similar to NeRF from the previous assignment, built with the provided MLPWithInputSkips class. I experimented with 6 layers (input skip at layer 3) and 8 layers (input skip at layer 4), and with 128, 192, and 256 neurons in the Linear layers. After the MLP with skips, two Linear layers produce the distance output. Unlike NeRF, no activation is applied to the final Linear layer output, since distances can be negative (unlike density). Empirically, increasing the number of layers and/or the number of neurons per Linear layer improved results.
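
A rough sketch of this architecture, assuming the HarmonicEmbedding and MLPWithInputSkips helpers provided with the assignment (their exact signatures and the output_dim attribute are assumptions here, so treat this as illustrative):

```python
import torch

class NeuralSDF(torch.nn.Module):
    def __init__(self, n_layers=8, n_hidden=256, skips=(4,)):
        super().__init__()
        # Positional encoding of the 3D input points.
        self.embedding = HarmonicEmbedding(3)
        embed_dim = self.embedding.output_dim
        # MLP that re-injects the embedding at the skip layer(s).
        self.mlp = MLPWithInputSkips(
            n_layers, embed_dim, n_hidden, embed_dim, n_hidden, input_skips=skips
        )
        # Two Linear layers produce the scalar distance; no activation on
        # the output, since signed distances can be negative.
        self.distance_head = torch.nn.Sequential(
            torch.nn.Linear(n_hidden, n_hidden),
            torch.nn.Linear(n_hidden, 1),
        )

    def forward(self, points):
        embed = self.embedding(points)
        features = self.mlp(embed, embed)
        return self.distance_head(features)
```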

Eikonal Loss

The Eikonal constraint encourages the gradient of the SDF to have unit 2-norm everywhere. I implement it as an L1-style penalty: (a) take the absolute value of (the 2-norm of the gradients minus 1), and (b) return the mean of that quantity as the loss.
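
In code, this is essentially a one-liner; gradients is assumed to be an (N, 3) tensor of SDF gradients at sampled points:

```python
import torch

def eikonal_loss(gradients):
    # L1 penalty on the deviation of the gradient 2-norm from 1:
    # mean over | ||g||_2 - 1 |.
    return (gradients.norm(2, dim=-1) - 1.0).abs().mean()
```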

3. VolSDF (30 pts)

Visualization

You can run the code for part 3 with:

python -m a4.main --config-name=volsdf

By default, the results will be written out to part_3_geometry.gif and part_3.gif in the images folder.

Results

Best results

| Parameters | Output RGB | Output Geometry |
| --- | --- | --- |
| beta=0.05, alpha=10, n_layers_distance=8, n_hidden_neurons_distance=256, append_distance=[4] | Grid | Grid |

Experiments varying beta, with alpha=10, n_layers_distance=8, n_hidden_neurons_distance=256, append_distance=[4] fixed:

| beta | Output RGB | Output Geometry |
| --- | --- | --- |
| 0.0001 | Grid | Grid |
| 0.0005 | Grid | Grid |
| 0.001 | Grid | Grid |
| 0.005 | Grid | Grid |
| 0.01 | Grid | Grid |
| 0.05 | Grid | Grid |
| 0.1 | Grid | Grid |
| 0.5 | Grid | Grid |
| 1.0 | Grid | Grid |
| 2.0 | Grid | Grid |
| 5.0 | Grid | Grid |

Experiments varying alpha, with beta=0.05, n_layers_distance=8, n_hidden_neurons_distance=256, append_distance=[4] fixed:

| alpha | Output RGB | Output Geometry |
| --- | --- | --- |
| 10 | Grid | Grid |
| 50 | Grid | Grid |
| 100 | Grid | Grid |

Comment on the settings you chose, and why they seem to work well.

Answer: Using positional encodings on the input points and an MLP with 8 Linear layers of 256 neurons each works well, as shown in the previous question. For alpha and beta, the experiments above show empirically that alpha=10 and beta=0.05 give good results. This can be explained using the equation below: the SDF-to-density curve should fall off at the surface (since the SDF is 0 on the surface). Intuitively, the density models a homogeneous object with a constant density alpha that smoothly decreases near the object's boundary, where the amount of smoothing is controlled by beta.

Grid
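
For reference, the equation in the image above (the VolSDF SDF-to-density conversion) can be written as:

```latex
\sigma(\mathbf{x}) = \alpha \, \Psi_\beta\bigl(-d(\mathbf{x})\bigr),
\qquad
\Psi_\beta(s) =
\begin{cases}
\frac{1}{2}\exp\left(\frac{s}{\beta}\right) & s \le 0 \\
1 - \frac{1}{2}\exp\left(-\frac{s}{\beta}\right) & s > 0
\end{cases}
```

where d(x) is the signed distance to the surface and Psi_beta is the CDF of a zero-mean Laplace distribution with scale beta.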

In your write-up, give an intuitive explanation of what the parameters alpha and beta are doing here. Also, answer the following questions:

1. How does high beta bias your learned SDF? What about low beta?

Answer: Beta controls the smoothness of the density transition across the surface boundary. A higher beta makes the transition very smooth, whereas a smaller beta makes it steeper. This is especially clear from the geometry outputs: for high beta (e.g. beta=5), the density is high even in regions outside the object (since the transition is heavily smoothed), and the outputs are smoothed out and extremely blurry. For lower beta, the features are sharper and the model captures the finer details of the object.

2. Would an SDF be easier to train with volume rendering and low beta or high beta? Why?

Answer: It would be easier to train with a higher beta. Although lower beta values give sharper renderings, very small values can cause the density to blow up (beta appears in the denominator of the exponent in both the s<=0 and s>0 cases). Since beta controls the smoothness of the transition at surfaces, a higher beta also gives smoother gradients, so the model can learn well. However, very high beta values are not good either: regions around the surface are then treated as nearly the same, and we end up with overly smooth, blurry outputs.

3. Would you be more likely to learn an accurate surface with high beta or low beta? Why?

Answer: We would learn a more accurate surface with a lower beta, as explained above. At the surface, where the SDF becomes 0, the density shifts sharply from outside to inside the object, so the model can capture the finer details of the object more accurately. However, one must be careful not to use a very low beta, since the exponential term might explode, resulting in NaN values.

4. Neural Surface Extras (CHOOSE ONE! More than one is extra credit)

4.1. Render a Large Scene with Sphere Tracing (10 pts)

Implementation

I used the following shapes:

a. 1 large torus for the outline of the face

b. 2 tori for the eyes

c. 2 small tori for the outlines of the eyeballs (may not be distinctly visible since their color is similar to the eyeballs)

d. 2 spheres for the eyeballs

e. 2 spheres for the cheek blush

f. 5 boxes for the nose

g. 16 spheres for the smile on the lips (totally worth it)

I created a custom config file called large_scene.yaml with the configuration for all the shapes above, and a new class called LargeSceneSDF in implicit.py to render them. I assign a color to each shape based on its index; each of the 30 objects has its own color. A sketch of the idea follows.
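
A minimal sketch, assuming each primitive SDF maps points of shape (N, 3) to distances of shape (N, 1); everything except the class name LargeSceneSDF is illustrative:

```python
import torch

class LargeSceneSDF(torch.nn.Module):
    def __init__(self, primitives, colors):
        super().__init__()
        self.primitives = torch.nn.ModuleList(primitives)  # tori, spheres, boxes
        self.register_buffer("colors", colors)             # (n_primitives, 3) RGB

    def forward(self, points):
        # Union of all shapes: the scene SDF is the pointwise minimum.
        dists = torch.cat([p(points) for p in self.primitives], dim=-1)
        return dists.min(dim=-1, keepdim=True).values

    def get_color(self, points):
        # Color each point by the index of the nearest primitive.
        dists = torch.cat([p(points) for p in self.primitives], dim=-1)
        return self.colors[dists.argmin(dim=-1)]
```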

Results

Keep smiling! Don't worry, be happy :)

| Experiment | My Output |
| --- | --- |
| v0 | Grid |
| v1 | Grid |
| v2 | Grid |
| v3 | Grid |
| v4 | Grid |
| v5 | Grid |

4.2 Fewer Training Views (10 pts)

Implementation

Answer: I decreased the number of training views in dataset.py and obtained the results below.

Results

| Number of views | SDF | NeRF | SDF Geometry |
| --- | --- | --- | --- |
| 100 | Grid | Grid | Grid |
| 50 | Grid | Grid | Grid |
| 20 | Grid | Grid | Grid |
| 10 | Grid | Grid | Grid |
| 5 | Grid | Grid | Grid |
| 2 | Grid | Grid | Grid |

Discussion

Answer: VolSDF and NeRF give comparable results when sufficient views are provided; for num_views = 20, 50, or 100, the outputs are very similar. However, VolSDF is clearly better when the number of views is reduced to 10 or 5: for novel views, the NeRF outputs are very blurred and contain artifacts. In fact, with num_views=2, NeRF fails to produce a result at all, while the VolSDF model still captures the rough geometry of the object.

4.3 Alternate SDF to Density Conversions (10 pts)

Implementation

Grid

Answer: I implemented the logistic density distribution shown above, as described in the NeuS paper; x is the signed distance and s is the scale factor (analogous to the alpha value described above). I created a separate config file called volsdf_density.yaml with a flag called alternate_density that, when set to True, uses this formula to estimate the density.
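
A minimal sketch of this conversion (the function name is illustrative); it computes the NeuS logistic density phi_s(x) = s e^{-sx} / (1 + e^{-sx})^2, written via the sigmoid for numerical stability:

```python
import torch

def logistic_density(signed_distance, s):
    # phi_s(x) = s * exp(-s x) / (1 + exp(-s x))^2
    #          = s * sigmoid(s x) * (1 - sigmoid(s x))
    sig = torch.sigmoid(s * signed_distance)
    return s * sig * (1.0 - sig)
```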

Best results

| Parameters | Output RGB | Output Geometry |
| --- | --- | --- |
| alpha=10, n_layers_distance=8, n_hidden_neurons_distance=256, append_distance=[4] | Grid | Grid |

Experiments varying alpha (the scale factor s), with n_layers_distance=8, n_hidden_neurons_distance=256, append_distance=[4] fixed:

| alpha | Output RGB | Output Geometry |
| --- | --- | --- |
| 200 | Grid | Grid |
| 100 | Grid | Grid |
| 50 | Grid | Grid |
| 20 | Grid | Grid |
| 10 | Grid | Grid |
| 5 | Grid | Grid |
| 2 | Grid | Grid |
| 1 | Grid | Grid |

Discussion

Answer: Alpha clearly affects the output geometry. For large values, the geometry was sparse and had gaps. For small values (e.g. alpha=1 or 2), the results were blurry, heavily smoothed, and full of artifacts. Empirically, the best results were obtained with alpha=10.