Surface Rendering: Assignment 4 (16-889)
Name: Shefali Srivastava
Andrew ID: shefalis
Late Days:
1. Sphere Tracing (30pts)
Visualisation:
Write Up for Implementation:
For implementing sphere tracing, I used the algorithm discussed in the lecture. At any given point on a ray, we step along the ray by the current SDF value at that point. The logic, as discussed, is that stepping by the SDF is conservative: since the SDF is the distance to the nearest point on the surface, the step will not cross the surface.
For the implementation over multiple rays at once, I ran sphere tracing for all rays for a total of `max_steps` iterations, which I set to 100, starting at the origin of every ray. After those steps, I evaluated the implicit function at the N points of the N rays: any point whose implicit function evaluates to a very small value (epsilon taken as 1e-5 here) is declared to be on the surface. So `points` is an array of N points of the form `x + t * d`, where `x` is the camera origin and `d` is the ray direction; parameterised by `t`, this defines the point in space. `mask` is also an array of size N and records, with respect to the chosen threshold, whether each point lies on the implicit surface.
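A minimal sketch of this batched sphere tracer is shown below, assuming an SDF callable `implicit_fn` and `(N, 3)` tensors `origins` and `directions`; the names and signature are illustrative, not the exact starter-code interface.

```python
import torch

def sphere_trace(implicit_fn, origins, directions, max_steps=100, eps=1e-5):
    """Batched sphere tracing: march each ray forward by its current SDF value."""
    # t holds the distance travelled along each of the N rays, starting at the ray origin.
    t = torch.zeros(origins.shape[0], 1, device=origins.device)
    for _ in range(max_steps):
        points = origins + t * directions   # current point x + t * d on every ray
        t = t + implicit_fn(points)         # conservative step: the SDF never overshoots the surface
    points = origins + t * directions
    mask = implicit_fn(points).abs() < eps  # on-surface test against the epsilon threshold
    return points, mask
```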
2. Optimizing a Neural SDF (30pts)
Visualisation:
Brief Description of MLP:
Similar to NeRF in the previous assignment, I implemented a backbone network with linear layers on top.
- The first step is a harmonic embedding of the input points.
- The backbone network contains 6 linear layers, with the first layer taking 3 input neurons for the input point (x, y, z) and all intermediate layers having 128 neurons.
- On top of this, another linear layer is applied with 128 input neurons and 128 output neurons. ReLU activation is used everywhere in the network.
- Finally, a linear layer outputs the predicted SDF, so it has 128 input neurons and 1 output neuron. The difference from the previous assignment, where we predicted density instead of an SDF, is that the SDF is a distance and is not constrained to lie in [0, 1), so I have not applied a sigmoid layer at the end.
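A sketch of this architecture is shown below. The `HarmonicEmbedding` module here is a hypothetical stand-in for the positional encoding used in the assignment, and the layer counts and widths follow the description above; treat it as an illustration rather than the exact submitted code.

```python
import torch
import torch.nn as nn

class HarmonicEmbedding(nn.Module):
    """Positional encoding: x -> [sin(2^k x), cos(2^k x)] for k = 0 .. L-1."""
    def __init__(self, in_dim=3, n_harmonic=4):
        super().__init__()
        self.register_buffer("freqs", 2.0 ** torch.arange(n_harmonic))
        self.output_dim = 2 * n_harmonic * in_dim

    def forward(self, x):
        scaled = (x[..., None] * self.freqs).flatten(-2)         # (..., in_dim * L)
        return torch.cat([scaled.sin(), scaled.cos()], dim=-1)   # (..., 2 * in_dim * L)

class NeuralSDF(nn.Module):
    """Backbone MLP with ReLU activations; the SDF head is left unbounded (no sigmoid)."""
    def __init__(self, n_layers=6, hidden=128, n_harmonic=4):
        super().__init__()
        self.embedding = HarmonicEmbedding(3, n_harmonic)
        dims = [self.embedding.output_dim] + [hidden] * n_layers
        self.backbone = nn.Sequential(
            *[m for i in range(n_layers)
              for m in (nn.Linear(dims[i], dims[i + 1]), nn.ReLU())]
        )
        # Head: 128 -> 128 (ReLU) -> 1 scalar SDF value per point.
        self.head = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, points):
        return self.head(self.backbone(self.embedding(points)))
```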
Brief Description of Eikonal Loss:
The Eikonal loss is a geometric regulariser that constrains the network to represent a function that is indeed a signed distance function. It enforces $\lVert \nabla f \rVert = 1$, i.e. the norm of the gradient of the predicted SDF should be 1.
This is intuitive. Take the x-y plane as our 2D surface and consider the 3D point (0, 0, 1). The direction of steepest ascent of the distance is along the z axis, and the number of units moved along the z axis is exactly the change in signed distance from the surface. Therefore, for a function to be an SDF, its derivative with respect to the distance moved must be 1, and this is precisely what the Eikonal loss enforces.
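A sketch of this regulariser in PyTorch is below; `model` is the SDF network and `points` are sampled 3D locations (the names are illustrative, not the assignment's exact ones).

```python
import torch

def eikonal_loss(model, points):
    """Penalise deviation of the SDF gradient norm from 1 at the sampled points."""
    points = points.clone().requires_grad_(True)
    sdf = model(points)
    # Gradient of the predicted signed distance with respect to the input coordinates.
    (grad,) = torch.autograd.grad(
        sdf, points, grad_outputs=torch.ones_like(sdf), create_graph=True
    )
    return ((grad.norm(dim=-1) - 1.0) ** 2).mean()
```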
Network Hyperparameter Details:
Hyperparameter | Value |
---|---|
n_layers_distance | 6 |
n_hidden_neurons_distance | 128 |
epochs | 250 |
n_harmonic_functions_xyz | 4 |
eikonal_weight | 0.04 |
I experimented with the `eikonal_weight` hyperparameter to see how it affects training. The results are shown below:
eikonal_weight | Result |
---|---|
0.02 | ![]() |
0.03 | ![]() |
0.04 | ![]() |
0.05 | ![]() |
0.06 | ![]() |
0.08 | ![]() |
I found the best results with `eikonal_weight = 0.04`. This hyperparameter controls how much weight the eikonal term carries in the total loss. Visually, a moderate weight (0.02 to 0.04) yields a good SDF with well-defined geometry, while increasing it further (0.05 to 0.08) leaves too little weight on the distance loss for the point cloud; the eikonal term dominates the objective and the reconstructed points end up spread over the whole region.
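For reference, a hypothetical sketch of how the two terms are combined in a single training step; the sampling of `surface_points` and `space_points` and the exact form of the distance loss are assumptions rather than the assignment's exact code, and it reuses the `eikonal_loss` helper sketched above.

```python
def sdf_training_step(model, surface_points, space_points, optimizer, eikonal_weight=0.04):
    """One optimisation step: fit the point cloud, regularise with the eikonal term."""
    optimizer.zero_grad()
    distance_loss = model(surface_points).abs().mean()       # SDF should be ~0 on the point cloud
    reg = eikonal_loss(model, space_points)                  # unit-gradient penalty at random points
    loss = distance_loss + eikonal_weight * reg              # eikonal_weight trades the two terms off
    loss.backward()
    optimizer.step()
    return loss.item()
```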
3. VolSDF (30 pts)
Visualisation:
Intuitive Explanation of alpha and beta:
Alpha models the density value at the implicit surface: the density at the surface is $\frac{\alpha}{2}$. The gif below shows the effect of varying alpha from 0 to 10 in steps of 0.1 (beta kept constant at 1).
Beta defines how smoothly the density transitions from inside to outside the surface: the greater the beta, the smoother the shift at the surface. The gif below shows the effect of varying beta from 0 to 30 in steps of 0.001 (alpha kept constant at 1).
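The conversion behind these plots is the VolSDF formula: the density is the Laplace CDF of the negated SDF, scaled by alpha, which gives exactly $\frac{\alpha}{2}$ at the surface. A minimal sketch (not the exact assignment code) is below.

```python
import torch

def volsdf_density(signed_distance, alpha=1.0, beta=1.0):
    """VolSDF: sigma = alpha * LaplaceCDF(-sdf; scale=beta); equals alpha / 2 at the surface."""
    s = -signed_distance / beta
    # CDF of a zero-mean Laplace distribution with scale beta, evaluated at -sdf.
    cdf = torch.where(s <= 0, 0.5 * torch.exp(s), 1.0 - 0.5 * torch.exp(-s))
    return alpha * cdf
```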
Q1. How does high beta bias your learned SDF? What about low beta?
For a very high beta, a constant density value is predicted no matter what the SDF is. This constant value is $\frac{\alpha}{2}$, as visible in the image below. Predicting a constant density biases the SDF towards predicting no changes, since the gradients vanish and carry no information back to the network.
A low beta models a sharp density change at the surface (the smaller the beta, the sharper the shift). Since a very sharp shift has a near-infinite gradient, the loss goes to NaN and nothing is predicted.
Q2. Would an SDF be easier to train with volume rendering and low beta or high beta? Why?
The SDF will be easier to train with a high beta, since the gradients are smooth and the rendering is well behaved. A low beta gives very large gradients, and the network does not learn.
Q3. Would you be more likely to learn an accurate surface with high beta or low beta? Why?
The SDF should learn a more accurate surface with a low beta, because the density shift at the surface is sharper. The SDF changes sign at the surface, and I believe a low beta models this more accurately, with a better-defined gradient to capture the geometry.
Experiments:
For the purpose of hyperparameter tuning, I varied the beta parameter and observed the rendering results.
Experiment Number | beta | Rendered Image | Rendered Geometry |
---|---|---|---|
1 | 0.0001 | ![]() | Empty Mesh Predicted |
2 | 0.005 | ![]() | Empty Mesh Predicted |
3 | 0.02 | ![]() | Empty Mesh Predicted |
4 | 0.03 | ![]() | ![]() |
5 | 0.04 | ![]() | ![]() |
6 | 0.05 | ![]() | ![]() |
7 | 0.06 | ![]() | ![]() |
8 | 0.07 | ![]() | ![]() |
9 | 0.1 | ![]() | ![]() |
10 | 1 | ![]() | ![]() |
11 | 10 | ![]() | ![]() |
12 | 100 | ![]() | ![]() |
Explanation:
For a very low beta, as expected, the loss goes to NaN, no mesh is predicted, and the rendered density is 0. As beta increases, the geometry and rendering get better and smoother, as expected. At a very high beta, there is almost no gradient and the network does not learn anything, as can be seen from the geometry at very high beta. Looking closely, at very high beta the predicted density is also nearly constant (close to $\frac{\alpha}{2}$), so only an extremely slight change in density is visible at the boundary.
4. Neural Surface Extras (CHOOSE ONE! More than one is extra credit)
4.1. Render a Large Scene with Sphere Tracing (10 pts)
Rendering a Scene using Sphere Tracing:
For this part, I rendered 24 primitives as shown below. A scene is traced as a composition of these primitives: the scene SDF at a point is the minimum of the SDFs over ALL the primitives, i.e. the distance to the nearest primitive.
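A minimal sketch of this union SDF is below; `primitive_sdfs` stands in for the list of sphere, torus, and box SDF callables, and the interface is illustrative rather than the exact starter-code one.

```python
import torch

def scene_sdf(points, primitive_sdfs):
    """Union of primitives: the scene SDF is the element-wise minimum over all primitive SDFs."""
    # Each callable maps (N, 3) points to (N, 1) signed distances.
    distances = torch.stack([sdf(points) for sdf in primitive_sdfs], dim=0)  # (P, N, 1)
    return distances.min(dim=0).values                                       # (N, 1) nearest primitive
```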
Rendered Primitives:
Spheres:
center=[5.0, 5.0, 5.0], radius=1.0
center=[-5.0, 5.0, 5.0], radius=1.0
center=[5.0, -5.0, 5.0], radius=1.0
center=[-5.0, -5.0, 5.0], radius=1.0
center=[5.0, 5.0, -5.0], radius=1.0
center=[-5.0, 5.0, -5.0], radius=1.0
center=[5.0, -5.0, -5.0], radius=1.0
center=[-5.0, -5.0, -5.0], radius=1.0
Toruses:
center=[3.0, 3.0, 3.0], radii=[1.0, 0.25]
center=[-3.0, -3.0, 3.0], radii=[1.0, 0.25]
center=[3.0, -3.0, 3.0], radii=[1.0, 0.25]
center=[-3.0, 3.0, 3.0], radii=[1.0, 0.25]
center=[3.0, 3.0, -3.0], radii=[1.0, 0.25]
center=[-3.0, -3.0, -3.0], radii=[1.0, 0.25]
center=[3.0, -3.0, -3.0], radii=[1.0, 0.25]
center=[-3.0, 3.0, -3.0], radii=[1.0, 0.25]
Boxes:
center=[7.0, 7.0, 7.0], side_lengths=[1, 1, 1]
center=[7.0, -7.0, 7.0], side_lengths=[1, 1, 1]
center=[-7.0, 7.0, 7.0], side_lengths=[1, 1, 1]
center=[-7.0, -7.0, 7.0], side_lengths=[1, 1, 1]
center=[7.0, 7.0, -7.0], side_lengths=[1, 1, 1]
center=[7.0, -7.0, -7.0], side_lengths=[1, 1, 1]
center=[-7.0, 7.0, -7.0], side_lengths=[1, 1, 1]
center=[-7.0, -7.0, -7.0], side_lengths=[1, 1, 1]
4.2 Fewer Training Views (10 pts)
I experimented with the number of views; the results are shown in the table below.
Rendered Image (100 views) | Rendered Image (20 views) | NeRF Rendering (20 views) | Rendered Geometry (100 views) | Rendered Geometry (20 views) |
---|---|---|---|---|
![]() | ![]() | ![]() | ![]() | ![]() |
Comparison for VolSDF based on number of views:
For VolSDF, the rendered image is much better with more views, and the geometry is better as well. Looking closely, the rendered geometry for 20 views has holes due to the limited information contained in the 20 viewpoints; the 100-view model performs better in this respect.
Comparison between VolSDF and NeRF:
The rendering looks much better with NeRF when only 20 views are used, but with 100 views the rendering looks better for VolSDF. Intuitively, with more views the network has more information to backpropagate through the predicted SDF and the transformed density, so VolSDF works better; with fewer views, NeRF renders better since it predicts density directly.
4.3 Alternate SDF to Density Conversions (10 pts)
Here, I have implemented the SDF-to-density conversion from the NeuS paper and experimented with the hyperparameter `s`.
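One plausible form of this conversion (the "naive" logistic-density variant from the NeuS paper) is sketched below; the exact variant implemented here may differ in detail, so treat this as an illustration of the role of `s`.

```python
import torch

def neus_density(signed_distance, s=100.0):
    """Logistic density phi_s(d) = s * e^(-s d) / (1 + e^(-s d))^2, peaked at the surface (d = 0)."""
    sig = torch.sigmoid(-s * signed_distance)
    # Equivalent to s * sigmoid(-s d) * sigmoid(s d); larger s gives a sharper, taller peak.
    return s * sig * (1.0 - sig)
```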
Experiments:
Experiment Number | s | Rendered Image | Rendered Geometry |
---|---|---|---|
1 | 10 | ![]() | ![]() |
2 | 100 | ![]() | ![]() |
3 | 1000 | ![]() | Empty Mesh Predicted |
Explanation:
From what I understand, similar to how a very low value of beta gives a very large gradient and the loss goes to NaN, a very large s also gives a very large gradient and the loss again goes to NaN. For a reasonable value of s such as 100, the rendered image shows a very good and smooth density gradient, as can be seen above. The geometry looks similar for both s values that produce a mesh, since s essentially controls the scale of the density.
Variation in gradient for s from 0 to 20 in steps of 1: