16-889 Assignment 4: Neural Surfaces

In this assignment, I learned how to implement sphere tracing to render different shapes. I also learned how to optimize a Neural SDF to predict a mesh from a point cloud, and how to convert an SDF to volume density to predict colour.


One late day was used.

1. Sphere Tracing

Sphere tracing is a technique for rendering implicit surfaces using geometric distances. Rays are cast from the camera and marched along their directions, at each step advancing by the signed distance returned by the implicit function, until they come into (close) contact with the surface. The process repeats for a maximum number of iterations to determine which rays hit a surface and at what distance that surface lies.
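
As a rough sketch (not the exact code I used), the marching loop could look like the following, assuming an `implicit_fn` that maps points to signed distances; the tensor names and the `far` cutoff are placeholders:

```python
import torch

def sphere_trace(implicit_fn, origins, directions, max_iters=64, eps=1e-4, far=10.0):
    """March each ray forward by the signed distance until it hits a surface.

    origins, directions: (N, 3) tensors; implicit_fn maps (N, 3) points -> (N, 1) distances.
    Returns the final points and a boolean mask of rays that converged.
    """
    t = torch.zeros(origins.shape[0], 1, device=origins.device)   # distance travelled per ray
    points = origins.clone()
    mask = torch.zeros(origins.shape[0], 1, dtype=torch.bool, device=origins.device)

    for _ in range(max_iters):
        dist = implicit_fn(points)                 # signed distance at the current points
        mask = mask | (dist.abs() < eps)           # rays that are (close to) on the surface
        step = torch.where(mask, torch.zeros_like(dist), dist)
        t = t + step                               # advance only rays that have not hit yet
        points = origins + t * directions
        if mask.all() or (t > far).all():
            break

    return points, mask & (t <= far)
```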


To render the torus, I used a maximum iteration count of 64 and determined that a ray came into contact with the surface when the implicit function distance was less than 0.0001. Below is the result.


2. Optimizing a Neural SDF

To render meshes from point clouds, we can train a Neural SDF to predict distances from points in space. A point that lies on the surface of the mesh should have a distance output of 0. To implement the SDF, I used a network similar to the one from Assignment 3, except with only a positional harmonic embedding and no view dependence. There is also no activation after the final layer. The full architecture can be seen below.

Model Architecture
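
As a minimal sketch of that kind of network (the layer count, hidden width, and embedding size here are illustrative guesses, not necessarily the values I used):

```python
import torch
import torch.nn as nn

class NeuralSDF(nn.Module):
    """MLP that maps an embedded 3D point to a signed distance.

    Only a positional harmonic embedding is used (no view dependence), and
    there is no activation after the final layer, so the output can be any
    real-valued distance.
    """
    def __init__(self, embed_dim=39, hidden=128, n_layers=6):
        super().__init__()
        layers, in_dim = [], embed_dim
        for _ in range(n_layers):
            layers += [nn.Linear(in_dim, hidden), nn.ReLU()]
            in_dim = hidden
        layers += [nn.Linear(hidden, 1)]          # no final activation
        self.mlp = nn.Sequential(*layers)

    def forward(self, embedded_points):
        # embedded_points: harmonic embedding of (N, 3) points -> (N, embed_dim)
        return self.mlp(embedded_points)
```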

To train the network, mean-squared error (MSE) loss was used for the points in the point cloud, where the predicted distance should be zero. In addition, an inverse exponential loss was used for additional sampled points, where the predicted distance should be greater than 0. Lastly, eikonal regularization was used to force the gradients to have unit L2 norm; this is done by taking the norm of the SDF gradients and applying MSE loss against a target value of 1 (L1 loss was also tested). The network was trained for 5000 epochs with a batch size of 4096 and an annealing learning rate. Below is a plot of the loss over epochs as well as the final result (which I think looks better than the provided example).
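A rough sketch of how these three terms could be combined; the exponential scale and the implicit equal weighting are assumptions, not necessarily the values I used:

```python
import torch

def sdf_losses(dist_surface, dist_free, gradients, exp_scale=100.0):
    """Combine the three loss terms described above (scale/weights are illustrative).

    dist_surface: predicted SDF at point-cloud points (should be 0)
    dist_free:    predicted SDF at randomly sampled points (should be > 0)
    gradients:    d(SDF)/d(point) at sampled points, shape (N, 3)
    """
    # 1) surface points: plain MSE against zero
    loss_surface = (dist_surface ** 2).mean()

    # 2) free-space points: inverse exponential penalty, large when the
    #    predicted distance is near zero, vanishing as it grows
    loss_free = torch.exp(-exp_scale * dist_free.abs()).mean()

    # 3) eikonal term: gradient norms should be 1 (MSE variant; L1 also works)
    grad_norm = gradients.norm(dim=-1)
    loss_eikonal = ((grad_norm - 1.0) ** 2).mean()

    return loss_surface + loss_free + loss_eikonal
```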



3. VolSDF

Lastly, we train a VolSDF network that both converts the SDF output into density and predicts colour. We then use this colour and density to render an image as we did in Assignment 3.


First, the network is adjusted to add a colour head. The colour head is the same as in Assignment 3. This can be seen below.

Model Architecture

Next, we convert SDF to density. This is done by using the CDF of the Laplace distribution, as described in section 3.1 of VolSDF. The two parameters used are α and β. α is a scaling factor that weights the output of the Laplace CDF, and β is the scale (mean absolute deviation) of the Laplace distribution, which controls the smoothness of the density.
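
In code, the conversion might look like the following sketch (the tensor shapes and the exact handling of α and β are assumptions):

```python
import torch

def sdf_to_density(signed_distance, alpha, beta):
    """VolSDF-style conversion: density = alpha * Psi_beta(-s), where Psi_beta
    is the CDF of a zero-mean Laplace distribution with scale beta.

    Negative distances (inside the object) map to densities near alpha,
    positive distances decay towards zero; beta controls the smoothness.
    """
    s = signed_distance
    psi = torch.where(
        s <= 0,
        1.0 - 0.5 * torch.exp(s / beta),   # inside/on the surface: CDF(-s) with -s >= 0
        0.5 * torch.exp(-s / beta),        # outside: CDF(-s) with -s < 0
    )
    return alpha * psi
```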


How does high beta bias your learned SDF? What about low beta?: The smaller β is, the sharper and less smooth the density becomes, converging to a scaled indicator function of the object. The larger β is, the smoother the density, which causes regions outside the object to receive higher density.


Would an SDF be easier to train with volume rendering and low beta or high beta? Why?: It would be easier to train with a higher β, for two reasons. First, β appears in the denominator of the exponent; as β shrinks, the exponent grows in magnitude, and while the value of the exponential approaches 0, the loss can become NaN. I actually saw this while training when using β as a learnable parameter, and had to reparameterize the equation in terms of 1/β. Second, a higher β produces less sharp results and therefore demands a lower level of accuracy from the SDF. The cost, however, is high density in areas where there should be no content, producing a less realistic render.


Would you be more likely to learn an accurate surface with high beta or low beta? Why?: An accurate surface is more likely with a lower β, because a lower β gives a sharper density transition at the surface, whereas a higher β is more likely to place density in areas where there should be none. However, a lower β is harder to train, as explained above.


The network I implemented is trained for 250 epochs using a batch size of 1024 and an annealing learning rate starting at 0.0005. I found that increasing the initial learning rate made little noticeable difference to the end result, since it decayed as training progressed, and decreasing it either had no noticeable effect or caused training to take longer, so I kept the default provided to us. Increasing the batch size greatly increased training time, and decreasing it produced slightly worse visuals.


For the α and β values, I tested training a VolSDF with them fixed and with them as trainable parameters. The provided values of α = 10 and β = 0.05 produced a reasonable result when fixed. However, utilizing α and β as trainable parameters, with 10 and 0.05 as initializations, produced better results as seen below.
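
As a rough sketch of one way to make α and β trainable, using the 1/β reparameterization noted earlier (the clamping details here are assumptions):

```python
import torch
import torch.nn as nn

class LearnableDensityParams(nn.Module):
    """Alpha and beta as trainable parameters (one possible parameterization).

    Beta is stored as its inverse so that the 1/beta factor in the Laplace CDF
    stays finite even as beta shrinks during training.
    """
    def __init__(self, alpha_init=10.0, beta_init=0.05):
        super().__init__()
        self.alpha = nn.Parameter(torch.tensor(alpha_init))
        self.inv_beta = nn.Parameter(torch.tensor(1.0 / beta_init))

    def forward(self, signed_distance):
        s = signed_distance
        inv_beta = self.inv_beta.abs() + 1e-6     # keep 1/beta positive
        psi = torch.where(
            s <= 0,
            1.0 - 0.5 * torch.exp(s * inv_beta),
            0.5 * torch.exp(-s * inv_beta),
        )
        return self.alpha.abs() * psi
```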





4.1. Render a Large Scene with Sphere Tracing

First, I wanted to see if I could use a distance function for a different primitive, so I attempted to perform Sphere Tracing on a cone using the distance functions from here. Below is the result.

Sphere Traced Cone

I then took this one step further to see if I could use multiple primitives by combining the distance functions from a sphere, torus, and cone.
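
Combining primitives amounts to taking the minimum of their individual distance functions (a union), so the same sphere-tracing loop works unchanged. Below is a small sketch with a sphere and a torus (the cone SDF is omitted for brevity); the centres and radii are arbitrary placeholders:

```python
import torch

def sphere_sdf(points, center, radius):
    return (points - center).norm(dim=-1) - radius

def torus_sdf(points, center, radii):
    # radii = (major_radius, minor_radius); torus lies in the xz-plane
    p = points - center
    q = torch.stack([p[..., [0, 2]].norm(dim=-1) - radii[0], p[..., 1]], dim=-1)
    return q.norm(dim=-1) - radii[1]

def scene_sdf(points):
    """Union of primitives: the scene distance is the minimum of the
    distances to each primitive."""
    d_sphere = sphere_sdf(points, torch.tensor([0.0, 0.0, 0.0]), 1.0)
    d_torus = torus_sdf(points, torch.tensor([0.0, 0.0, 0.0]), (2.0, 0.25))
    return torch.minimum(d_sphere, d_torus)
```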

Sphere Tracing with Multiple Primitives

Then I thought it would be fun to create a model of a solar system: not our solar system, but one with more planets. The centre sphere is the main star, and the orbiting spheres are all planets. There are two asteroid belts represented by the tori. A total of 20 primitives are used.

Solar System (not ours)

4.2. Fewer Training Views

I first reduced the number of training views to 20 and compared the VolSDF rendered image to that of NeRF from Assignment 3, also trained with 20 views. As expected, the VolSDF results were worse than when trained with 100 views. What was interesting was that the NeRF rendering looked better.



Then I reduced the number of training views to 10. The results were similar: the renderings were worse, and NeRF performed slightly better.



Lastly, I trained on 5 views. The VolSDF rendering was worse but respectable given that only 5 views were used. What was interesting was that NeRF could not render an image at all with only 5 views. The conclusion is that while NeRF produces higher-quality renderings than VolSDF as the number of views shrinks, past a certain point it breaks down much more quickly.



4.3. Alternate SDF to Density Conversions

We compared the SDF-to-density conversion method from VolSDF to the naive approach from NeuS. The naive approach applies the logistic density distribution (the derivative of the sigmoid function) to the SDF value to compute density.
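
A minimal sketch of that conversion, assuming the sharpness parameter is the inverse of the "standard deviation" mentioned below (so 1/0.05 = 20):

```python
import torch

def naive_sdf_to_density(signed_distance, s=20.0):
    """Naive NeuS-style conversion: apply the logistic density (the derivative
    of the sigmoid) to the SDF value. s controls the sharpness and can be a
    fixed float or a learnable parameter."""
    sig = torch.sigmoid(s * signed_distance)
    # derivative of sigmoid(s * d) with respect to d
    return s * sig * (1.0 - sig)
```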


I tested this with both fixed and trainable standard deviations, both initialized to 1/0.05. Both performed worse than the VolSDF method.




Conclusion

Through this assignment, I learned a lot about sphere tracing and SDFs. I was able to implement a neural SDF and train a VolSDF network, while evaluating the effect of modifying different hyperparameters and comparing learnable α and β parameters to fixed ones.