Assignment 4
Name: Mayank Agarwal
Andrew ID: mayankag
Late Days Used: 1
1. Sphere Tracing (30pts)
You can run the code for part 1 with:

```bash
mkdir images  # only needed when running for the first time
python -m a4.main --config-name=torus
```

This should save `part_1.gif` in the `images` folder.
Implementation
Points along a ray are parametrized by the variable `t`, as discussed in the lecture. Precisely, for a ray defined by its origin and direction, we define `point = origin + t * direction`. For all input rays, defined by `origins` of shape `(N_rays, 3)` and `directions` of shape `(N_rays, 3)`, we initialize `points` to `origins` by initializing `t` with zeros. Next, we iteratively evaluate the SDF at the current points on all rays and move by that amount along the respective ray directions; this is achieved by adding the evaluated SDF values to each ray's `t` value. We perform this sphere-tracing update for `max_iters` iterations.

At the end of the `max_iters` iterations, we evaluate the SDF at the updated points given by `point = origin + t * direction`. Points with `sdf(point) < eps` lie on the foreground; the rest are classified as background.

Instead of iteratively updating each ray until its SDF value falls below `eps`, I implemented sphere tracing for a fixed number of iterations over all `N_rays` rays in a vectorized manner to speed up the implementation.
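A minimal sketch of this vectorized loop in PyTorch is below; the `sphere_trace` name, the default values, and the exact `sdf` callable signature are illustrative assumptions, not the assignment's exact API.

```python
import torch

def sphere_trace(sdf, origins, directions, max_iters=64, eps=1e-5):
    # Assumed shapes: `origins` and `directions` are (N_rays, 3);
    # `sdf` maps (N, 3) points to (N, 1) signed distances.
    t = torch.zeros(origins.shape[0], 1, device=origins.device)
    for _ in range(max_iters):
        points = origins + t * directions  # current point on every ray
        t = t + sdf(points)                # march each ray by its SDF value
    points = origins + t * directions
    mask = sdf(points) < eps               # foreground if the ray converged
    return points, mask
```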
2. Optimizing a Neural SDF (30pts)
Command to train a NeuralSurface representation:

```bash
python -m a4.main --config-name=points
```

This should save `part_2.gif` in the `images` folder.
Implementation
The SDF MLP used to predict the distance for an input point is inspired by the NeRF MLP architecture that we implemented in the previous assignment. The MLP is a fully connected network with intermediate skip connections; the architecture is defined by the parameters in `a4/configs/points.yaml`. To improve the learning of high-frequency details, I also use harmonic embeddings to encode the input points into a non-linear, high-frequency encoding, which helps overcome the inherent low-frequency bias of neural networks. Since the SDF output can be any real value, the final layer has a linear activation function (unlike volumetric rendering, where the predicted density must be non-negative).
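A minimal sketch of this design is shown below; the class names, layer sizes, and number of harmonic frequencies are placeholders, with the real values coming from `a4/configs/points.yaml`.

```python
import torch
import torch.nn as nn

class HarmonicEmbedding(nn.Module):
    # Maps x -> (sin(2^k x), cos(2^k x)) for k = 0 .. n_levels - 1.
    def __init__(self, n_levels=4):
        super().__init__()
        self.register_buffer("freqs", 2.0 ** torch.arange(n_levels))

    def forward(self, x):                                   # x: (N, 3)
        xf = (x[..., None] * self.freqs).flatten(-2)        # (N, 3 * n_levels)
        return torch.cat([xf.sin(), xf.cos()], dim=-1)      # (N, 6 * n_levels)

class NeuralSDF(nn.Module):
    def __init__(self, n_levels=4, hidden=128):
        super().__init__()
        in_dim = 6 * n_levels
        self.embed = HarmonicEmbedding(n_levels)
        self.stage1 = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU())
        # Skip connection: the embedding is re-injected midway through.
        self.stage2 = nn.Sequential(
            nn.Linear(hidden + in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1))  # linear output: SDF can be any real value

    def forward(self, points):
        e = self.embed(points)
        h = self.stage1(e)
        return self.stage2(torch.cat([h, e], dim=-1))
```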
Since we train this network on points sampled from a point cloud, we regress the predicted SDF at these points to its ground-truth value of 0.
The eikonal loss enforces the norm of the gradient of the SDF (implemented as an MLP) to be 1: we take the L1 loss (mean across all points) between the gradient norm and 1. Additionally, the penalty `torch.exp(-1e2 * torch.abs(eikonal_distances)).mean()` is used to prevent the network from predicting all distances as 0. The eikonal loss is taken over random points sampled in a bounding box.
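The sketch below illustrates both loss terms together; the function name, the use of an L1 surface loss, and the unit-cube bounding box are illustrative assumptions.

```python
import torch

def sdf_losses(model, surface_points, n_random=4096):
    # Surface loss: points sampled from the point cloud should have SDF = 0.
    surface_loss = model(surface_points).abs().mean()

    # Eikonal samples: random points in a (hypothetical) [-1, 1]^3 box.
    pts = 2.0 * torch.rand(n_random, 3, device=surface_points.device) - 1.0
    pts.requires_grad_(True)
    d = model(pts)
    grad = torch.autograd.grad(d.sum(), pts, create_graph=True)[0]

    # Eikonal loss: L1 distance between the gradient norm and 1.
    eikonal_loss = (grad.norm(dim=-1) - 1.0).abs().mean()

    # Penalty that discourages the degenerate all-zero-distance solution.
    zero_penalty = torch.exp(-1e2 * d.abs()).mean()
    return surface_loss, eikonal_loss, zero_penalty
```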
3. VolSDF (30 pts)
Command to train an SDF on the lego bulldozer model:

```bash
python -m a4.main --config-name=volsdf
```
Explanation:

- `alpha`: a constant scaling factor that governs the density at a point.
- `beta`: controls how smoothly the density value changes at the surface. A higher `beta` implies a smoother change at the surface, while a small `beta` implies a sharp change at the boundary (see the sketch after this list).
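For reference, here is a minimal sketch of the VolSDF-style SDF-to-density conversion, in which the density is `alpha` times the CDF of a zero-mean Laplace distribution with scale `beta`, evaluated at the negated signed distance; the function name and the numerically stable form are my own choices.

```python
import torch

def sdf_to_density(signed_distance, alpha, beta):
    # VolSDF: sigma(x) = alpha * Psi_beta(-d(x)), where Psi_beta is the CDF
    # of a zero-mean Laplace distribution with scale beta.
    s = -signed_distance
    half = 0.5 * torch.exp(-s.abs() / beta)      # stable: exponent is <= 0
    psi = torch.where(s <= 0, half, 1.0 - half)
    return alpha * psi  # ~alpha inside, ~0 outside, alpha/2 at the surface
```

As `beta` approaches 0, `psi` approaches a step function at the surface, which matches the sharp-versus-smooth behavior discussed below.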
1. How does high `beta` bias your learned SDF? What about low `beta`?

For high values of `beta`, the network predicts almost the same density for all points. As a result, the network cannot capture high-frequency details in the scene and produces a smoothed-out result; this is observed in the results below for `beta = 0.5`. On the contrary, for low values of `beta` such as `0.05`, there is a sharp change in density at the surface boundaries, which results in sharper transitions at the boundaries and sharper renderings of the scene.
2. Would an SDF be easier to train with volume rendering and low `beta` or high `beta`? Why?

For low values of `beta` we might observe very large gradients due to the sharp density changes at the boundary, which can make training unstable. High `beta` values give smoother gradients, so the SDF would be easier to train with high `beta`.
3. Would you be more likely to learn an accurate surface with high `beta` or low `beta`? Why?

Since low `beta` biases the model toward learning sharper boundaries, I believe it is more likely to learn an accurate surface with low `beta` values.
Below are some renderings generated at different values of the hyperparameter `beta`. We can observe that for very high or very low values of `beta`, the network is not able to learn meaningful 3D geometry.
| beta | Rendered Geometry | Rendered Color |
|---|---|---|
| 0.005 | Empty Mesh | ![]() |
| 0.025 | ![]() | ![]() |
| 0.05 | ![]() | ![]() |
| 0.1 | ![]() | ![]() |
| 0.5 | ![]() | ![]() |
| 5 | Empty Mesh | ![]() |
4. Neural Surface Extras (CHOOSE ONE! More than one is extra credit)
4.1. Render a Large Scene with Sphere Tracing (10 pts)
I have defined a scene as a composition of 15 tori, 3 spheres, and 12 squares.

```bash
python -m a4.main --config-name=scene
```
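Composing a scene from primitives is commonly done by taking the union of their SDFs, i.e., a pointwise minimum; below is a minimal sketch under that assumption, with an illustrative function name.

```python
import torch

def scene_sdf(points, primitive_sdfs):
    # Union of primitives: the scene's signed distance at each point is the
    # minimum over all primitive SDFs evaluated at that point.
    dists = torch.stack([sdf(points) for sdf in primitive_sdfs], dim=0)
    return dists.min(dim=0).values
```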

4.2 Fewer Training Views (10 pts)
Added functionality for training with a configurable number of views via the config files `configs/nerf_lego.yaml` and `a4/configs/volsdf.yaml`. Leave `num_views` empty to use all training views.

```yaml
data:
  image_size: [128, 128]
  dataset_name: lego
  num_views:
```
Command to train an SDF on the lego bulldozer model:

```bash
python -m a4.main --config-name=volsdf
```

Command to train a NeRF on the lego bulldozer model:

```bash
python main.py --config-name=nerf_lego
```
| num_views | VolSDF Geometry | VolSDF Image Rendering | NeRF Image Rendering |
|---|---|---|---|
| 10 | ![]() | ![]() | ![]() |
| 20 | ![]() | ![]() | ![]() |
| 100 | ![]() | ![]() | ![]() |
Observation:
For a large (100) or moderate (20) number of views, we can clearly see that NeRF does a better job at novel-view image rendering. However, for fewer views (10), VolSDF generalizes better than NeRF: with 10 views, a black cloud is visible in the NeRF renderings.
4.3 Alternate SDF to Density Conversions (10 pts)
Command to train an SDF (NeuS paper) on the lego bulldozer model:

```bash
python -m a4.main --config-name=neus
```
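For reference, here is a minimal sketch of the alternate conversion from the NeuS paper's "naive" solution, where the density is the logistic density `phi_s` applied to the signed distance and larger `s` concentrates density more tightly around the zero level set; the function name is my own, and whether the assignment uses this exact variant is an assumption.

```python
import torch

def sdf_to_density_neus(signed_distance, s):
    # Logistic density phi_s(d) = s * e^{-s d} / (1 + e^{-s d})^2,
    # written with sigmoids for numerical stability.
    sd = s * signed_distance
    return s * torch.sigmoid(sd) * torch.sigmoid(-sd)
```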
| s | Rendered Geometry | Rendered Color |
|---|---|---|
| 10 | ![]() | ![]() |
| 50 | ![]() | ![]() |
| 100 | ![]() | ![]() |
| 200 | Empty Mesh | ![]() |