In my implementation I followed the equation from the lecture slides. I first initialized all points to the ray origins.
At each step, for each point, I query the implicit function for the distance to the nearest surface point, then move
the point along its ray direction by that distance. I repeat this until each point's distance to the surface falls
below a threshold or max_step is reached. The mask is 1 for points that hit the surface within the range
[self.near, self.far].
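A minimal sketch of this loop, assuming implicit_fn returns signed distances; the function and argument names here are my own:

```python
import torch

def sphere_trace(implicit_fn, origins, directions, near, far,
                 max_step=64, eps=1e-5):
    # Start every point at its ray origin (t = 0).
    t = torch.zeros(origins.shape[0], 1, device=origins.device)
    points = origins.clone()
    for _ in range(max_step):
        # Distance to the nearest surface point, from the implicit function.
        dist = implicit_fn(points)
        # March each point along its ray by that distance.
        t = t + dist
        points = origins + t * directions
        # Stop early once every point is within the threshold.
        if (dist.abs() < eps).all():
            break
    # Mask is 1 for points that hit the surface within [near, far].
    mask = (implicit_fn(points).abs() < eps) & (t >= near) & (t <= far)
    return points, mask
```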
2. Optimizing a Neural SDF (30pts)
part_2_input.gif
part_2.gif
My feature extractor is a 6-layer MLP (FFN) with ReLU activations and a hidden size of 256, with a skip connection that re-injects the input at layer 3.
The distance decoder is a single linear layer.
The eikonal loss is the MSE between the norm of the distance gradient and 1.
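A sketch of that loss, assuming points was created with requires_grad=True and distances is the predicted SDF at those points:

```python
import torch

def eikonal_loss(points, distances):
    # Gradient of the predicted distances w.r.t. the input points
    # (points must have requires_grad=True).
    grad = torch.autograd.grad(
        outputs=distances, inputs=points,
        grad_outputs=torch.ones_like(distances),
        create_graph=True,
    )[0]
    # MSE between the gradient norm and 1 (a true SDF has unit gradient).
    return ((grad.norm(dim=-1) - 1.0) ** 2).mean()
```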
3. VolSDF (30 pts)
part_3_geometry.gif
part_3.gif
With a lower beta the edges can be sharper: a point a small distance outside the surface already gets near-zero density, while a point just inside gets density close to the maximum.
Conversely, a higher beta blurs the edge.
The SDF would be easier to train with volume rendering with a higher beta. A higher beta models smoother edges, so multiple points
along a ray contribute to the pixel value and receive gradients.
It is more likely to learn an accurate surface with a lower beta. A lower beta models sharper edges, so we can enforce the geometry
better by avoiding blurred surfaces.
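For reference, a sketch of the Q3 SDF-to-density conversion (the Laplace-CDF density from VolSDF) that this discussion refers to; the function name is my own:

```python
import torch

def sdf_to_density(sdf, alpha=10.0, beta=0.05):
    # VolSDF: density = alpha * Psi_beta(-sdf), where Psi_beta is the CDF
    # of a zero-mean Laplace distribution with scale beta.
    psi = torch.where(
        sdf > 0,
        0.5 * torch.exp(-sdf / beta),        # outside: decays toward 0
        1.0 - 0.5 * torch.exp(sdf / beta),   # inside: saturates toward 1
    )
    return alpha * psi
```

A smaller beta shrinks the transition band around the surface, which is exactly why the edges get sharper.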
I used the default alpha (10.0) and beta (0.05) because a maximum density of 10 suffices and a beta of 0.05 gives a sharp edge.
Empirically, this setting also gives good results.
I tuned num_epochs to 150, lr_scheduler_gamma to 0.1, and inter_weight to 1.0, based on the empirical results and to
better regularize the shape.
For the model, similar to NeRF, my feature extractor is a 6-layer MLP (FFN) with ReLU activations and a hidden size of 256,
with a skip connection that re-injects the input at layer 3. The distance decoder is a single linear layer, and the color
decoder is a three-layer FFN with ReLU in between and a sigmoid at the end.
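A minimal sketch of this architecture, with positional encoding of the input omitted and all module names my own:

```python
import torch
import torch.nn as nn

class VolSDFNetwork(nn.Module):
    def __init__(self, d_in=3, hidden=256):
        super().__init__()
        # 6-layer feature FFN with ReLU; the input is re-injected at layer 3.
        self.stage1 = nn.Sequential(
            nn.Linear(d_in, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.stage2 = nn.Sequential(
            nn.Linear(hidden + d_in, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        # One-layer linear distance decoder.
        self.distance = nn.Linear(hidden, 1)
        # Three-layer color decoder: ReLU in between, sigmoid at the end.
        self.color = nn.Sequential(
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 3), nn.Sigmoid(),
        )

    def forward(self, x):
        feat = self.stage1(x)
        feat = self.stage2(torch.cat([feat, x], dim=-1))  # skip connection
        return self.distance(feat), self.color(feat)
```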
4. Neural Surface Extras (CHOOSE ONE! More than one is extra credit)
4.2 Fewer Training Views (10 pts)
I compared NeRF and VolSDF on the Lego scene with 20, 10, and 5 views below.
nerf_20views.gif
part_3_20views.gif
part_3_20views_geometry.gif
nerf_10views.gif
part_3_10views.gif
part_3_10views_geometry.gif
nerf_5views.gif
part_3_5views.gif
part_3_5views_geometry.gif
Since NeRF has no geometric constraints, it shows some unexpected patterns at the bottom of the image (they look like reflections).
Moreover, as the number of views is reduced to 5, NeRF's performance drops significantly.
In contrast, VolSDF constrains the model to the object's geometry, so there are no random patterns outside the object, and the
performance does not drop much as we reduce the number of views.
Since NeRF does not have the geometric constraints, it has more freedom to fit the visual data (images),
so with enough views its appearance on the Lego body is slightly clearer.
4.3 Alternate SDF to Density Conversions (10 pts)
I tried the 'naive' solution from the NeuS paper with s=30, and its results are shown below.
part_3_naive_geometry.gif
part_3_naive.gif
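A sketch of this conversion, using the logistic density from the NeuS paper (the function name is my own):

```python
import torch

def naive_sdf_to_density(sdf, s=30.0):
    # 'Naive' NeuS conversion: density is the logistic density phi_s(sdf),
    # which peaks at the surface (sdf = 0) and decays to ~0 on both sides,
    # so the inside of the geometry is empty as well.
    sig = torch.sigmoid(s * sdf)
    return s * sig * (1.0 - sig)
```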
Compared to the equation in Q3, the 'naive' solution has near-zero density inside the geometry. As a result, the inner part of
the Lego is non-solid, and on thin rods, such as the two arms, the 'naive' solution may predict discontinuous blobs due to network error.
Thus, in this case, the equation from Q3 performs better. However, the 'naive' solution may be better at representing hollow objects
like pipes, because the inside of those objects correctly has a density of 0.