Surface Rendering: Assignment 4 (16-889)

Name: Shefali Srivastava

Andrew ID: shefalis

Late Days:

image

1. Sphere Tracing (30pts)

Visualisation:

image

Write Up for Implementation:

For implementing sphere tracing, I used the algorithm discussed in lecture. At any point on a ray, we advance along the ray by the current SDF value at that point. The reasoning, as discussed, is that stepping by the SDF value is conservative: since the SDF is the distance to the nearest point on the surface, a step of that size can never cross the surface.

To handle multiple rays at once, I ran sphere tracing for all rays in parallel for max_steps iterations (set to 100), starting every ray at the camera origin. After those steps, I evaluated the implicit function at all N points of the N rays; any point whose implicit function evaluates to a very small value (epsilon taken as 1e-5 here) is considered to lie on the surface. points is therefore an array of N points of the form x + t * d, where x is the camera origin, d is the ray direction, and the parameter t locates the point in space. mask is an array of size N recording whether each point lies on the implicit surface, defined with respect to this threshold. A minimal sketch of the loop is shown below.
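Below is a minimal sketch of this loop in PyTorch. Function and argument names are illustrative, not the exact starter-code API:

```python
import torch

def sphere_tracing_sketch(implicit_fn, origins, directions, max_steps=100, eps=1e-5):
    # origins, directions: (N, 3) tensors; implicit_fn maps (N, 3) points to (N, 1) SDF values.
    # Each point is parameterised as x + t * d; start every ray at t = 0 (the camera origin).
    t = torch.zeros(origins.shape[0], 1, device=origins.device)
    for _ in range(max_steps):
        points = origins + t * directions
        sdf = implicit_fn(points)
        # Stepping forward by the SDF value is conservative: it can never cross the surface.
        t = t + sdf
    points = origins + t * directions
    # A point is on the surface if its SDF is (almost) zero.
    mask = implicit_fn(points).abs() < eps
    return points, mask
```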

2. Optimizing a Neural SDF (30pts)

Visualisation:

image image

Brief Description of MLP:

Similar to NeRF in the previous assignment, I have implemented a backbone network with Linear layers on top.

  1. The first step is Harmonic Embeddings of the points.
  2. The backbone network contains 6 linear layers; the first layer takes the harmonic embedding of the input point (x, y, z), and all intermediate layers have 128 neurons.
  3. On top of the backbone, another linear layer is applied with 128 input and 128 output neurons; ReLU activations are used throughout the network. A final linear layer with 128 input neurons and 1 output neuron predicts the SDF. The difference with respect to the previous assignment, where we predicted density instead of an SDF, is that the SDF is a distance and is not constrained to lie in [0, 1), so I have not applied a sigmoid at the end. A rough sketch of this architecture follows the list.
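The sketch below shows roughly how this architecture can be written in PyTorch. It is a sketch of the description above, not the exact assignment code; in particular, I assume the harmonic embedding (of dimension 3 × 2 × n_harmonic_functions_xyz) is computed before the first linear layer.

```python
import torch
import torch.nn as nn

class NeuralSDFSketch(nn.Module):
    # Rough sketch of the distance MLP described above; layer and argument names are illustrative.
    def __init__(self, n_harmonic_functions_xyz=4, hidden=128, n_layers=6):
        super().__init__()
        embed_dim = 3 * 2 * n_harmonic_functions_xyz  # sin/cos per frequency per coordinate
        layers = [nn.Linear(embed_dim, hidden), nn.ReLU()]
        for _ in range(n_layers - 1):
            layers += [nn.Linear(hidden, hidden), nn.ReLU()]
        self.backbone = nn.Sequential(*layers)
        # Head: one more 128 -> 128 layer, then a single unbounded SDF output (no sigmoid).
        self.head = nn.Sequential(
            nn.Linear(hidden, hidden), nn.ReLU(), nn.Linear(hidden, 1)
        )

    def forward(self, embedded_points):
        # embedded_points: (N, embed_dim) harmonic embedding of the (x, y, z) inputs
        return self.head(self.backbone(embedded_points))
```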

Brief Description of Eikonal Loss:

The eikonal loss is a geometric regulariser that constrains the network to represent a function that is indeed a signed distance function. The loss penalises deviation from $\|\nabla f\| = 1$; that is, the norm of the gradient of the predicted SDF should equal 1 everywhere.

This is intuitive. Take the x-y plane as our surface and the 3D point (0, 0, 1). The direction of steepest ascent of the SDF is along the z axis, and the distance moved along the z axis equals the change in SDF from the surface. Therefore, for a function to be an SDF, its rate of change with respect to distance moved must be 1, which is exactly what the eikonal loss enforces. A sketch of this loss is shown below.
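A minimal sketch of this loss, assuming the model maps (N, 3) points to (N, 1) SDF values:

```python
import torch

def eikonal_loss_sketch(model, points):
    # Penalise deviation of the SDF gradient norm from 1 at the sampled points.
    points = points.clone().requires_grad_(True)
    sdf = model(points)
    grad = torch.autograd.grad(
        outputs=sdf,
        inputs=points,
        grad_outputs=torch.ones_like(sdf),
        create_graph=True,  # keep the graph so the loss itself can be backpropagated
    )[0]
    # ||grad f|| should be 1 everywhere for a valid SDF.
    return ((grad.norm(dim=-1) - 1.0) ** 2).mean()
```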

Network Hyperparameter Details:

| Hyperparameter | Value |
| --- | --- |
| n_layers_distance | 6 |
| n_hidden_neurons_distance | 128 |
| epochs | 250 |
| n_harmonic_functions_xyz | 4 |
| eikonal_weight | 0.04 |

I experimented with the eikonal_weight hyperparameter to see how that affects training. The results are as shown below:

| eikonal_weight | Result |
| --- | --- |
| 0.02 | image |
| 0.03 | image |
| 0.04 | image |
| 0.05 | image |
| 0.06 | image |
| 0.08 | image |

I found the best results with eikonal_weight = 0.04. This hyperparameter controls how much weight the eikonal term carries in the total loss. Visually, a moderate weight such as 0.02 to 0.04 yields a good SDF with well-defined geometry, whereas raising it to 0.05 to 0.08 leaves too little weight on the point-cloud distance loss: the eikonal term dominates, and the reconstruction spreads out across the whole region. A sketch of how this weighting enters the loss is shown below.
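A sketch of how this weighting enters the total loss (variable names are illustrative; `eikonal_loss_sketch` is the helper sketched in the previous section):

```python
def total_loss_sketch(model, surface_points, random_points, eikonal_weight=0.04):
    # Points sampled from the point cloud should have SDF == 0.
    distance_loss = model(surface_points).abs().mean()
    # The eikonal term keeps the gradient norm at 1 on randomly sampled points;
    # eikonal_weight trades it off against the point-cloud distance term.
    return distance_loss + eikonal_weight * eikonal_loss_sketch(model, random_points)
```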

3. VolSDF (30 pts)

Visualisation:

image image

Intuitive Explanation of alpha and beta

Alpha models the density value at the implicit surface; the density exactly on the surface is $\frac{\alpha}{2}$. The GIF below shows alpha varying from 0 to 10 in steps of 0.1 (beta kept constant at 1).

image

Beta defines how smoothly the density transitions from inside to outside the surface: the larger the beta, the smoother the shift at the surface. The GIF below shows beta varying from 0 to 30 in steps of 0.001 (alpha kept constant at 1). A sketch of the underlying mapping follows the GIF.

image
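The sketch below shows my understanding of the VolSDF mapping from signed distance to density, $\sigma(d) = \alpha\,\Psi_\beta(-d)$, where $\Psi_\beta$ is the CDF of a zero-mean Laplace distribution with scale $\beta$. This is a sketch of the formula, not the exact starter-code implementation:

```python
import torch

def volsdf_density_sketch(signed_distance, alpha=1.0, beta=1.0):
    # sigma = alpha * Psi_beta(-sdf), with Psi_beta the zero-mean Laplace CDF of scale beta.
    s = -signed_distance
    psi = torch.where(
        s <= 0,
        0.5 * torch.exp(s / beta),          # outside the surface: density decays towards 0
        1.0 - 0.5 * torch.exp(-s / beta),   # inside the surface: density saturates at alpha
    )
    # Exactly on the surface (sdf == 0), Psi_beta(0) = 1/2, so the density is alpha / 2.
    return alpha * psi
```

This matches the GIFs above: alpha scales the whole curve, while beta stretches the transition region around the surface.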

Q1. How does high beta bias your learned SDF? What about low beta?

For a very high beta, a nearly constant density of $\frac{\alpha}{2}$ is predicted regardless of the SDF, as visible in the image below. Predicting a constant density biases the SDF towards predicting no changes, since the gradients carry no information back to the network.

image

A low beta instead models a sharp density change at the surface: the lower the beta, the sharper the shift. Since a very sharp shift has a near-infinite gradient, the loss goes to NaN and nothing is predicted.

image
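With the Laplace-CDF mapping sketched above, both regimes follow directly from the limits in $\beta$:

$$
\lim_{\beta \to \infty} \alpha\,\Psi_\beta(-d) = \frac{\alpha}{2}
\quad \text{(constant density, independent of the SDF)},
\qquad
\lim_{\beta \to 0} \alpha\,\Psi_\beta(-d) =
\begin{cases}
\alpha & d < 0 \\
\frac{\alpha}{2} & d = 0 \\
0 & d > 0
\end{cases}
\quad \text{(a step function with an unbounded gradient at the surface).}
$$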

Q2. Would an SDF be easier to train with volume rendering and low beta or high beta? Why?

An SDF is easier to train with a high beta, since the gradients are smooth and the rendering stays well behaved. A low beta produces a very large gradient at the surface and the network fails to learn.

Q3. Would you be more likely to learn an accurate surface with high beta or low beta? Why?

An SDF should learn a more accurate surface with a low beta, because the density shift at the surface is sharper. The SDF changes sign at the surface, and a low beta models this more accurately, with a better-localised gradient around the geometry.

Experiments:

For the purpose of hyper-parameter tuning, I varied the beta parameter to see the rendering results.

| Experiment Number | beta | Rendered Image | Rendered Geometry |
| --- | --- | --- | --- |
| 1 | 0.0001 | image | Empty Mesh Predicted |
| 2 | 0.005 | image | Empty Mesh Predicted |
| 3 | 0.02 | image | Empty Mesh Predicted |
| 4 | 0.03 | image | image |
| 5 | 0.04 | image | image |
| 6 | 0.05 | image | image |
| 7 | 0.06 | image | image |
| 8 | 0.07 | image | image |
| 9 | 0.1 | image | image |
| 10 | 1 | image | image |
| 11 | 10 | image | image |
| 12 | 100 | image | image |

Explanation:

For very low beta, as expected, the loss goes to NaN, no mesh is predicted, and the rendered density is 0. As beta increases, the geometry and rendering become better and smoother, as expected. At very high beta there is almost no gradient and the network learns very little, as can be seen from the corresponding geometry. Looking closely at the very-high-beta renders, the density is predicted as nearly constant (close to $\frac{\alpha}{2}$, consistent with the discussion above), so only a very slight change in density is visible at the boundary.

4. Neural Surface Extras (CHOOSE ONE! More than one is extra credit)

4.1. Render a Large Scene with Sphere Tracing (10 pts)

Rendering a Scene using Sphere Tracing:

For this part, I rendered 24 primitives as shown below. The scene is traced as a composition of these primitives: its SDF is defined as the minimum over the SDFs of all primitives, i.e. the distance to the nearest primitive. A sketch of this composition follows the image below.

image
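A minimal sketch of this composition, assuming each primitive exposes a callable that maps (N, 3) points to (N, 1) signed distances:

```python
import torch

def scene_sdf_sketch(points, primitive_sdfs):
    # The union of SDFs is the element-wise minimum over all primitive SDFs,
    # i.e. the distance to the nearest primitive.
    per_primitive = torch.stack([sdf(points) for sdf in primitive_sdfs], dim=0)  # (K, N, 1)
    return per_primitive.min(dim=0).values  # (N, 1)
```

The composed SDF can be fed directly to the sphere tracer from Part 1, since the minimum over valid SDFs still gives a conservative step size.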

Rendered Primitives:

Spheres:

center=[5.0, 5.0, 5.0], radius=1.0
center=[-5.0, 5.0, 5.0], radius=1.0
center=[5.0, -5.0, 5.0], radius=1.0
center=[-5.0, -5.0, 5.0], radius=1.0
center=[5.0, 5.0, -5.0], radius=1.0
center=[-5.0, 5.0, -5.0], radius=1.0
center=[5.0, -5.0, -5.0], radius=1.0
center=[-5.0, -5.0, -5.0], radius=1.0

Toruses:

center=[3.0, 3.0, 3.0], radii=[1.0, 0.25]
center=[-3.0, -3.0, 3.0], radii=[1.0, 0.25]
center=[3.0, -3.0, 3.0], radii=[1.0, 0.25]
center=[-3.0, 3.0, 3.0], radii=[1.0, 0.25]
center=[3.0, 3.0, -3.0], radii=[1.0, 0.25]
center=[-3.0, -3.0, -3.0], radii=[1.0, 0.25]
center=[3.0, -3.0, -3.0], radii=[1.0, 0.25]
center=[-3.0, 3.0, -3.0], radii=[1.0, 0.25]

Boxes:

center=[7.0, 7.0, 7.0], side_lengths=[1, 1, 1]
center=[7.0, -7.0, 7.0], side_lengths=[1, 1, 1]
center=[-7.0, 7.0, 7.0], side_lengths=[1, 1, 1]
center=[-7.0, -7.0, 7.0], side_lengths=[1, 1, 1]
center=[7.0, 7.0, -7.0], side_lengths=[1, 1, 1]
center=[7.0, -7.0, -7.0], side_lengths=[1, 1, 1]
center=[-7.0, 7.0, -7.0], side_lengths=[1, 1, 1]
center=[-7.0, -7.0, -7.0], side_lengths=[1, 1, 1]

4.2 Fewer Training Views (10 pts)

I experimented with the number of training views; the results are shown in the table below.

| Rendered Image (100 views) | Rendered Image (20 views) | NeRF Rendering (20 views) | Rendered Geometry (100 views) | Rendered Geometry (20 views) |
| --- | --- | --- | --- | --- |
| image | image | image | image | image |

Comparison for VolSDF based on number of views:

For VolSDF, the rendered image is noticeably better with more views, and the geometry improves as well. Looking closely, the rendered geometry for 20 views has holes due to the limited information in those 20 viewpoints; the 100-view result performs better in this respect.

Comparison between VolSDF and NeRF:

With 20 views, NeRF renders noticeably better; with 100 views, VolSDF renders better. Intuitively, with more views the network has more information to backpropagate through the predicted SDF and the density derived from it, so VolSDF benefits; with fewer views, NeRF copes better since it predicts density directly.

4.3 Alternate SDF to Density Conversions (10 pts)

Here, I implemented the SDF-to-density conversion from the NeuS paper and experimented with the hyperparameter s.

Experiments:

| Experiment Number | s | Rendered Image | Rendered Geometry |
| --- | --- | --- | --- |
| 1 | 10 | image | image |
| 2 | 100 | image | image |
| 3 | 1000 | image | Empty Mesh Predicted |

Explanation:

From what I understand, just as a very low value of beta gives a very large gradient and drives the loss to NaN, a very large s also gives a very large gradient and the loss indeed goes to NaN. For a reasonable value of s such as 100, the rendered image shows a good, smooth density transition, as can be seen above. The geometry looks similar for s = 10 and s = 100, since s essentially controls the scale and sharpness of the density around the surface. A sketch of this conversion is shown below.
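A sketch of the conversion, assuming the "naive" NeuS-style variant where the density is the logistic density of the SDF, $\phi_s(d) = \frac{s\,e^{-sd}}{(1 + e^{-sd})^2}$ (my reading of the paper; the exact variant used in the code may differ):

```python
import torch

def neus_density_sketch(signed_distance, s=100.0):
    # Logistic density of the SDF: peaks at the surface (sdf == 0) with value s / 4,
    # and sharpens as s grows. Written with sigmoids for numerical stability:
    # s * e^{-sx} / (1 + e^{-sx})^2 == s * sigmoid(s x) * sigmoid(-s x).
    x = s * signed_distance
    return s * torch.sigmoid(x) * torch.sigmoid(-x)
```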

Variation in gradient for s from 0 to 20 in steps of 1:

image