python -m a4.main --config-name=torus
The output is written to `images/part_1.gif`.
I initialize `t` for all rays to `near` (the specified lower limit). Then, I keep incrementing `t` for each ray (i.e., marching along the ray) by the SDF at the point `p = origin + t * direction`. I keep marching along each ray in this way until we hit a terminating condition, which is at least one of the following:

- `max_iterations` is reached.
- `t` exceeds the set upper limit `far`.
- The SDF at `p = origin + t * direction` is less than $\epsilon$.

This gives us the value of `t` for every ray. The mask for every ray can then be computed as:

- `true` for any ray at which the SDF at the point `p = origin + t * direction` is less than $\epsilon$, `t` < `far`, and this value was achieved before running out of our computational budget (`max_iterations`).
- `false` for every other ray.

Then, for every ray where the mask is `true`, we can compute the intersection of the ray with the surface as the point `p = origin + t * direction`.
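The marching loop above can be sketched as follows (a minimal NumPy version; the function name and signature are illustrative, not the assignment's actual API):

```python
import numpy as np

def sphere_trace(sdf, origins, directions, near=0.0, far=10.0,
                 max_iterations=64, eps=1e-5):
    """March each ray from t = near, stepping by the SDF value at p."""
    t = np.full(origins.shape[0], near)
    for _ in range(max_iterations):
        p = origins + t[:, None] * directions
        d = sdf(p)
        # advance only rays that have neither converged nor escaped
        t = np.where((d > eps) & (t < far), t + d, t)
    p = origins + t[:, None] * directions
    mask = (sdf(p) < eps) & (t < far)  # hit before far plane / budget
    return p, mask, t
```

For a unit sphere at the origin, a ray from `(0, 0, -3)` along `+z` converges to `t = 2` (the point `(0, 0, -1)`), while a ray that misses the sphere keeps advancing until `t` exceeds `far` and its mask is `false`.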
python -m a4.main --config-name=points
The output is written to `images/part_2_input.gif` and `images/part_2.gif`.
MLP: A simple feedforward network with only fully connected layers.
Eikonal Loss: $$ \frac{1}{N} \sum_p | \|\nabla_p f\|_2 - 1 | $$
where $f$ = SDF, $p$ = 3D point, $N$ = number of points.
The eikonal loss ensures that the norm of the gradient at every point is unity. The further the norm of the gradient is from one, the larger this loss is. This constrains the network to learn an SDF instead of an arbitrary function.
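As a sanity check, the loss above can be computed as follows (a NumPy sketch; for a true SDF such as a sphere's, the analytic gradient has unit norm everywhere, so the loss vanishes):

```python
import numpy as np

def eikonal_loss(grads):
    # (1/N) * sum_p | ||grad_p f||_2 - 1 |
    return np.mean(np.abs(np.linalg.norm(grads, axis=-1) - 1.0))

# analytic gradient of the sphere SDF f(p) = ||p|| - r is p / ||p||,
# which has unit norm at every point
rng = np.random.default_rng(0)
points = rng.normal(size=(100, 3))
grads = points / np.linalg.norm(points, axis=-1, keepdims=True)
```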
Intuitive explanation of `alpha`, `beta`:

- `alpha` is the maximum density inside the surface.
- `beta` controls the steepness of the fall of the density as we move from inside the surface (with maximum density `alpha`) to outside the surface (where the density decays to zero).

How does high `beta` bias your learned SDF? What about low `beta`?

A high value of `beta` means that the drop in density from inside the surface (where it is `alpha`) to outside the surface (where it decays to zero) is very gradual: a slight change in the SDF value causes only a slight change in the density value. The SDF will therefore be biased to be smoother/thicker/more diffuse, and regions around the "real" surface will be more opaque (have non-zero density).

In contrast, a low value of `beta` means that the drop in density from inside to outside the surface is very sharp (in fact, as `beta` tends to zero, the density approaches a step function from `alpha` inside the surface to zero outside). A low `beta` will thus bias the SDF to be thinner/more concentrated and make regions around the "real" surface less opaque, because a slight change in the SDF value leads to a large change in the density value.
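Concretely, VolSDF converts the SDF to density via the CDF of a zero-mean Laplace distribution with scale `beta`, scaled by `alpha`. A minimal sketch of that conversion (variable names are illustrative):

```python
import numpy as np

def sdf_to_density(sdf, alpha, beta):
    # sigma = alpha * Psi_beta(-sdf), where Psi_beta is the CDF of a
    # zero-mean Laplace distribution with scale beta
    s = -np.asarray(sdf, dtype=float)
    # clip the exponents so neither branch of np.where overflows
    psi = np.where(s <= 0,
                   0.5 * np.exp(np.minimum(s, 0.0) / beta),
                   1.0 - 0.5 * np.exp(-np.maximum(s, 0.0) / beta))
    return alpha * psi
```

The density is `alpha / 2` exactly at the surface, approaches `alpha` deep inside, decays to zero outside, and a smaller `beta` makes the transition sharper, which matches the intuition above.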
Would an SDF be easier to train with volume rendering with low `beta` or high `beta`? Why?

An SDF is easier to train with volume rendering with a high `beta`. Volume rendering relies on sampling multiple points along each ray and combining/weighting their appearances to form the appearance of a single pixel. With a high `beta`, a larger number of points have non-zero density, so a larger number of points contribute to each pixel's value. Gradients are therefore backpropagated through more points at a time, leading to a denser learning signal and faster convergence.
Note: Here, I've only reasoned about which SDF would be easier to train -- that is, I've focused on the ease of optimization and not necessarily on which SDF would be more accurate.
Would you be able to learn an accurate surface with high `beta` or low `beta`? Why?

An accurate surface would be modeled by a low `beta`, since in the limit of vanishing `beta` the density approaches a step function. The optimization may be tricky, but in principle a lower value of `beta` encourages a sharp and accurate boundary/surface.
I created a new class `NeuralSurfaceWithColor` for this question.
python -m a4.main --config-name=volsdf
The results will be saved to `images/<cfg.training.checkpoint_path>/part_3_<epoch>.gif` and `images/<cfg.training.checkpoint_path>/part_3_geometry_<epoch>.gif`.
The values of `alpha` and `beta` can be controlled by setting them in the `volsdf.yaml` config. `cfg.neus` should be set to `False` for this question.
Comment on the settings you chose, and why they seem to work well:
For the MLP hyperparameters, I used the same hyperparameter values as for NeRF (for number of hidden layers, number of units in hidden layers for distance and color, etc).
As in NeRF, I added a skip connection that feeds the input point back into the network further along when predicting color. I experimented with using the raw coordinates versus the harmonic encoding of the coordinates in the skip connection. As we know, harmonic encoding of the input coordinates is crucial for learning to represent high-frequency details. However, it seems that in the skip connection, the network can model high-frequency details even with the raw coordinates themselves. This behavior is controlled by `cfg.embedding_in_skip`, which should be set to `False` to pass raw coordinates in the skip connection and `True` to pass the harmonic embedding.
Raw coordinates in skip connection | Harmonic embedding in skip connection |
---|---|
![]() | ![]() |
![]() | ![]() |
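The skip connection described above can be sketched as follows (a minimal NumPy forward pass; the layer sizes and skip position are illustrative, not the exact configuration I used):

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_h = 3, 128  # 3-D input point, hidden width

w0 = rng.normal(size=(d_in, d_h)) * 0.1
w1 = rng.normal(size=(d_h, d_h)) * 0.1
# the layer after the skip consumes the hidden features concatenated
# with the (raw or harmonically embedded) input coordinates
w2 = rng.normal(size=(d_h + d_in, d_h)) * 0.1

def forward(x):
    h = np.maximum(x @ w0, 0.0)
    h = np.maximum(h @ w1, 0.0)
    h = np.concatenate([h, x], axis=-1)  # skip: feed the input back in
    return np.maximum(h @ w2, 0.0)
```

Swapping the `x` in the concatenation for its harmonic embedding (and widening `w2` accordingly) gives the `cfg.embedding_in_skip = True` variant.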
I experimented with different values of alpha ($\alpha$) in the density conversion. $\alpha = 10$ seems to give the best representation of geometry as well as texture. $\alpha = 50$ has a crisper image but poor geometry.
$\alpha = 5$ | $\alpha = 10$ | $\alpha = 50$ |
---|---|---|
![]() | ![]() | ![]() |
![]() | ![]() | ![]() |
$\beta = 0.05$ was held fixed for the above experiments.
I also experimented with different values of beta ($\beta$) in the density conversion. $\beta = 0.05$ seems to give the best representation of geometry as well as texture.
$\beta = 0.05$ | $\beta = 0.1$ | $\beta = 0.2$ |
---|---|---|
![]() | ![]() | ![]() |
![]() | ![]() | ![]() |
$\alpha = 10$ was held fixed for the above experiments.
python -m a4.main --config-name=composite
The output will be stored in `images/part_1.gif`.
I defined a `SphereCollectionSDF`, which consists of `cfg.n_spheres` spheres sampled randomly within a certain interval (`min` and `max` specified in the config).
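The SDF of such a collection is the pointwise minimum over the individual sphere SDFs (a union of shapes). A sketch, where the random-sampling bounds stand in for the config's `min` and `max`:

```python
import numpy as np

def sphere_collection_sdf(points, centers, radii):
    # union of spheres: the SDF is the min over the per-sphere SDFs
    dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=-1)
    return (dists - radii[None, :]).min(axis=1)

# e.g. n_spheres random centers within [lo, hi] (illustrative sampling)
rng = np.random.default_rng(0)
n_spheres, lo, hi = 40, -1.0, 1.0
centers = rng.uniform(lo, hi, size=(n_spheres, 3))
radii = np.full(n_spheres, 0.1)
```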
40 spheres rendered in different locations:
100 spheres:
For this part, uncomment lines 125-128 in `dataset.py`. The reduced number of views can be set via `num_views` on line 125 of `dataset.py`.
SDF: python -m a4.main --config-name=volsdf
NeRF: python main.py --config-name=nerf_lego
num_views = 10 | num_views = 20 |
---|---|
![]() | ![]() |
![]() | ![]() |
![]() | ![]() |
First row: NeRF renderings. Second row: VolSDF renderings. Third row: VolSDF geometry.
We see that the surface-based VolSDF can learn from as few as 10 views, whereas NeRF fails completely.
I implemented the 'naive' solution from the NeuS paper in the `sdf_to_neus_density` function inside `a4/renderer.py`.
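The 'naive' conversion applies the logistic density $\phi_s$ directly to the SDF. A sketch (the function name mirrors mine, but treat the body as illustrative):

```python
import numpy as np

def sdf_to_neus_density(sdf, s):
    # naive NeuS density: phi_s(f(x)) = s * e^{-s f} / (1 + e^{-s f})^2,
    # the density of a logistic distribution with scale 1/s
    e = np.exp(-s * np.asarray(sdf, dtype=float))
    return s * e / (1.0 + e) ** 2
```

The density peaks at `s / 4` on the zero level set and is symmetric about it; larger `s` concentrates the density in a thinner shell around the surface.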
Here are the learned SDF and geometry for different values of the hyperparameter `s`:
$s = 10$ | $s = 50$ | $s = 100$ |
---|---|---|
![]() | ![]() | ![]() |
![]() | ![]() | ![]() |
The rendered images seem better compared to the VolSDF renderings, but the learned geometry is more error-prone.