Command to run:
python -m a4.main --config-name=torus
This should save `part_1.gif` in the `images` folder. Please include this in your submission along with a short writeup describing your implementation.
Implementation: We initialize the current position of every point at its ray origin and repeatedly march along the given direction vector by the distance returned by the implicit function at the current position. The output of the implicit function is essentially the distance from the current position of each point to the closest surface. This is performed iteratively while both of the following conditions hold:
a) max_iters != cfg.max_iters: the iteration counter has not yet reached the maximum number of iterations specified in the config.
b) (implicit_fn(points) > epsilon).any(): at least one point is still more than epsilon away from the surface, i.e. not all points have converged onto (or very close to) the surface.
Once the iterations end, we compute a mask over all the points: points satisfying implicit_fn(points) < epsilon are assigned a mask of 1, indicating an intersection with the surface.
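A minimal sketch of this sphere-tracing loop, assuming `implicit_fn` returns a per-point distance of shape (N, 1) and that `origins`/`directions` are (N, 3) tensors (names are illustrative, not the exact assignment API):

```python
import torch

def sphere_trace(implicit_fn, origins, directions, max_iters, epsilon=1e-5):
    # Start every point at its ray origin; t accumulates the marched distance.
    points = origins.clone()
    t = torch.zeros(origins.shape[0], 1, device=origins.device)
    for _ in range(max_iters):
        dists = implicit_fn(points)        # distance to the closest surface
        if not (dists > epsilon).any():    # all points are within epsilon of a surface
            break
        t = t + dists                      # march each point by its current distance
        points = origins + t * directions
    mask = implicit_fn(points) < epsilon   # 1 where the ray intersected the surface
    return points, mask
```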
In this part you need to fit a neural SDF to the input point cloud. This should save `part_2_input.gif` and `part_2.gif` in the `images` folder. The former visualizes the input point cloud used for training, and the latter shows your prediction, which you should include on the webpage along with brief descriptions of your MLP and eikonal loss. You might need to tune hyperparameters (e.g. number of layers, epochs, weight of regularization, etc.) for good results.
MLP Implementation:
I have tried two different MLP setups:
Setup 1 (default cfg file): The MLP consists of 6 linear layers with 128 hidden units in each layer. I have used positional embedding with 4 PE levels.
n_harmonic_functions_xyz: 6
n_layers_distance: 6
n_hidden_neurons_distance: 128
append_distance: []
n_layers_color: 2
n_hidden_neurons_color: 128
append_color: []
Command to run with default hyperparameters:
python -m a4.main --config-name=points
Setup 2 (Paper implementation): The MLP consists of 8 linear layers with a skip connection at the fourth layer. Each layer consists of 256 hidden units. I have used positional embedding with 6 PE levels in this setup.
n_harmonic_functions_xyz: 6
n_layers_distance: 8
n_hidden_neurons_distance: 256
append_distance: [4]
n_layers_color: 4
n_hidden_neurons_color: 256
append_color: []
Command to run with the custom hyperparameters (following the paper):
python -m a4.main --config-name=points_1
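A minimal sketch of the distance MLP for Setup 2 (8 layers, 256 hidden units, skip connection at layer 4); `embed_dim` stands for the dimensionality of the harmonic embedding of the input xyz, and the module/argument names are assumptions rather than the assignment's exact API:

```python
import torch
import torch.nn as nn

class DistanceMLP(nn.Module):
    # Sketch of Setup 2: 8 linear layers with 256 hidden units and a skip
    # connection that re-appends the embedded input at layer 4.
    def __init__(self, embed_dim, n_layers=8, hidden=256, skip=(4,)):
        super().__init__()
        self.skip = set(skip)
        layers, in_dim = [], embed_dim
        for i in range(n_layers):
            if i in self.skip:
                in_dim += embed_dim
            layers.append(nn.Linear(in_dim, hidden))
            in_dim = hidden
        self.layers = nn.ModuleList(layers)
        self.out = nn.Linear(hidden, 1)  # signed distance; no final activation

    def forward(self, x_embedded):
        h = x_embedded
        for i, layer in enumerate(self.layers):
            if i in self.skip:
                h = torch.cat([h, x_embedded], dim=-1)
            h = torch.relu(layer(h))
        return self.out(h)
```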
Eikonal Loss implementation: The norm of the SDF gradient should be close to 1 everywhere, hence we minimize the squared difference between the gradient norm and 1:
loss = torch.mean(torch.square(torch.linalg.norm(gradients, dim=1) - 1))
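A sketch of how this loss could be computed end-to-end with autograd; the sampling range and the function name are illustrative, not the assignment's exact code:

```python
import torch

def eikonal_loss(implicit_fn, num_points=4096, bound=2.0):
    # Sample random points in a box and penalize deviations of the SDF
    # gradient norm from 1 (the eikonal regularizer).
    pts = (torch.rand(num_points, 3) * 2.0 - 1.0) * bound
    pts.requires_grad_(True)
    sdf = implicit_fn(pts)
    gradients, = torch.autograd.grad(
        outputs=sdf,
        inputs=pts,
        grad_outputs=torch.ones_like(sdf),
        create_graph=True,  # keep the graph so the loss can be backpropagated
    )
    return torch.mean(torch.square(torch.linalg.norm(gradients, dim=1) - 1))
```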
As we can see from the visualizations above, Setup 2 works noticeably better than Setup 1.
In this part, you will implement a function converting SDF -> volume density and extend the NeuralSurface class to predict color. Experiment with hyper-parameters and attach your best results on your webpage. Comment on the settings you chose, and why they seem to work well.
In your write-up, give an intuitive explanation of what the parameters alpha and beta are doing here. Also, answer the following questions:
A. The ideal SDF-to-density curve is one that decreases the density across the surface boundary. alpha can be interpreted as the constant (maximum) density, and beta controls how smoothly the density decreases at the boundary of the surface. Another intuitive explanation, as mentioned in class, is that the degree of sharpness is controlled by 1/beta.
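A minimal sketch of the Laplace-CDF conversion used in VolSDF, which makes the roles of the two parameters concrete (alpha scales the density, beta sets the width of the transition):

```python
import torch

def sdf_to_density(signed_distance, alpha, beta):
    # VolSDF-style conversion: density = alpha * Laplace_CDF(-sdf; scale=beta).
    # alpha is the (maximum) density inside the object; beta controls how
    # smoothly the density falls off across the surface boundary.
    s = -signed_distance  # positive inside the surface
    psi = torch.where(
        s <= 0,
        0.5 * torch.exp(s / beta),
        1.0 - 0.5 * torch.exp(-s / beta),
    )
    return alpha * psi
```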
How does high beta bias your learned SDF? What about low beta?
A. A high beta does not produce sharp renderings, whereas a low beta does, since the density transition at the surface is smeared out for high beta and tightly concentrated for low beta. We can observe this in the visualizations below.
Would an SDF be easier to train with volume rendering and low beta or high beta? Why?
A. A high beta value is easier to train with. With a high beta, the rendering averages the density over a wider region around the surface, so the mean-squared error stays low and the gradients are smoother, which ensures faster and more stable convergence.
Would you be more likely to learn an accurate surface with high beta or low beta? Why?
A. We are more likely to learn an accurate surface with a low beta, since the density is concentrated tightly around the zero level set, giving a clear distinction between inside and outside the surface.
Visualizations with different beta values:
Beta value | Bulldozer Geometry | Bulldozer Color |
---|---|---|
beta = 0.5 | ![]() | ![]() |
beta = 0.05 | ![]() | ![]() |
beta = 0.005 | ![]() | ![]() |
beta = 0.0005 | ![]() | ![]() |
With MLP Setup 1 (as described in q2), command to run:
python -m a4.main --config-name=volsdf
With MLP Setup 2 (as described in q2), command to run:
python -m a4.main --config-name=volsdf_1
Implementation: The architecture I have implemented is very similar to the NeRF implementation (except that here we do not have view dependence). There are two heads: a) a color head and b) a distance head. The distance head MLP is described in q2. The color head takes the output of the first 8 linear layers (the same backbone as the distance head) and passes it through four more layers with 256 hidden neurons each; after the last linear layer of the color head, a sigmoid outputs the (r, g, b) values.
One important point is that there is no ReLU after the last linear layer of the distance head: otherwise every point with a negative signed distance would be clamped to zero, i.e. points inside the surface would also get a signed distance of 0.
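A sketch of the color head under these assumptions (the 256-d backbone feature is passed in; module and argument names are illustrative):

```python
import torch
import torch.nn as nn

class ColorHead(nn.Module):
    # Maps the 256-d feature from the shared distance backbone to RGB through
    # four linear layers, ending with a sigmoid so the output lies in [0, 1].
    def __init__(self, feature_dim=256, hidden=256, n_layers=4):
        super().__init__()
        layers, in_dim = [], feature_dim
        for _ in range(n_layers - 1):
            layers += [nn.Linear(in_dim, hidden), nn.ReLU()]
            in_dim = hidden
        layers += [nn.Linear(in_dim, 3), nn.Sigmoid()]  # (r, g, b)
        self.net = nn.Sequential(*layers)

    def forward(self, features):
        return self.net(features)
```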
Observations from the above hyperparameter tuning: the renderings are better for lower values of beta, as discussed above; beta = 0.05 acts as a reasonable tradeoff in our case.
Command to run:
python -m a4.main --config-name=custom
I have made a custom SDF function (the CustomSDF class in implicit.py) for rendering a solar system, with four main groups of objects:
a) Orbits: a group of 8 torus objects, each centered at (0.0, 0.0, 0.0).
b) Planets: a group of 8 spheres, each with its center lying on one of the orbits.
c) Stars: a group of 100 tiny spheres located at randomly generated centers between (0.0, 0.0, 0.0) and (9.0, 9.0, 9.0).
d) Rings for Saturn and Uranus: (3 + 1) tori centered on Saturn and Uranus.
When computing the SDF value of a point, we take the minimum distance over all the surfaces; the point is considered to belong to the surface it is closest to, and its color is assigned based on that surface.
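A minimal sketch of this union-of-primitives logic (assuming each primitive SDF returns a per-point distance of shape (N,) and `primitive_colors` is a (K, 3) tensor; names are illustrative, not the exact CustomSDF code):

```python
import torch

def scene_sdf(points, primitive_sdfs, primitive_colors):
    # The scene SDF is the per-point minimum over all primitive SDFs; each
    # point inherits the color of the primitive it is closest to.
    dists = torch.stack([sdf(points) for sdf in primitive_sdfs], dim=-1)  # (N, K)
    min_dists, idx = dists.min(dim=-1)
    colors = primitive_colors[idx]                                        # (N, 3)
    return min_dists, colors
```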
We can see that VolSDF performs really well even with a small number of views, in comparison to NeRF. I randomly sampled 20 views from the 100 views in the train set; if the views were sampled uniformly, I think VolSDF's performance would be even better than the results below. We can observe from the visualizations below that the NeRF results are not great for the unseen views.
Number of training views: 20
VolSDF | NeRF |
---|---|
![]() | ![]() |
Number of training views: 10
VolSDF | NeRF |
---|---|
![]() | ![]() |
Number of training views: 5
VolSDF | NeRF |
---|---|
![]() | ![]() |
‘Naive’ solution from the NeuS paper
To run the naive implementation, I have added another parameter to the config: sdf_q4. Setting it to True in the config runs the following piece of code. I have run the experiments with different values of s; as expected, larger values of s give sharper results.
psi = s * torch.exp(- s * signed_distance)/(1 + torch.exp(- s * signed_distance))**2
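For reference, the same expression wrapped as a small function (a sketch; only the sdf_q4 flag name comes from the config described above):

```python
import torch

def naive_sdf_to_density(signed_distance, s):
    # 'Naive' NeuS-style conversion: the logistic density of the signed
    # distance. Its peak sits on the zero level set and larger s makes the
    # peak sharper; used in place of the Laplace-CDF density when sdf_q4 is True.
    return s * torch.exp(-s * signed_distance) / (1 + torch.exp(-s * signed_distance)) ** 2
```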
s value | Bulldozer Geometry | Bulldozer Color |
---|---|---|
s = 10 | ![]() | ![]() |
s = 50 | ![]() | ![]() |
s = 80 | ![]() | ![]() |