python -m a4.main --config-name=torus
The output is written to `images/part_1.gif`.
I initialize `t` for all rays to `near` (the specified lower limit). Then, I keep incrementing `t` for each ray (i.e., marching along the ray) by the SDF at the point `p = origin + t * direction`. I keep marching along each ray in this way until we hit a terminating condition, which is at least one of the following:

- `max_iterations` is reached.
- `t` exceeds the set upper limit `far`.
- The SDF at `p = origin + t * direction` is less than $\epsilon$.

This gives us the value of `t` for every ray. The mask for every ray can then be computed as:

- `true` for any ray at which the SDF at the point `p = origin + t * direction` is less than $\epsilon$, `t` < `far`, and this value was achieved before running out of our computational budget (`max_iterations`).
- `false` for every other ray.

Then, for every ray where the mask is `true`, we can compute the intersection of the ray with the surface as the point `p = origin + t * direction`.
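The marching loop above can be sketched as follows (a minimal NumPy version; the function name and signature are illustrative, not the assignment's actual API):

```python
import numpy as np

def sphere_trace(sdf, origins, directions, near=0.0, far=10.0,
                 max_iterations=64, eps=1e-5):
    """March each ray from t = near, stepping by the SDF value at p."""
    t = np.full(origins.shape[0], near)
    for _ in range(max_iterations):
        p = origins + t[:, None] * directions
        d = sdf(p)
        # advance only rays that have neither converged nor escaped
        t = np.where((d > eps) & (t < far), t + d, t)
    p = origins + t[:, None] * directions
    mask = (sdf(p) < eps) & (t < far)  # hit before far plane / budget
    return p, mask, t
```

For a unit sphere at the origin, a ray from `(0, 0, -3)` along `+z` converges to `t = 2` (the point `(0, 0, -1)`), while a ray that misses the sphere keeps advancing until `t` exceeds `far` and its mask is `false`.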
python -m a4.main --config-name=points
The output is written to `images/part_2_input.gif` and `images/part_2.gif`.
MLP: A simple feedforward network with only fully connected layers.
Eikonal Loss: $$ \frac{1}{N} \sum_p | \|\nabla_p f\|_2 - 1 | $$
where $f$ = SDF, $p$ = 3D point, $N$ = number of points.
The eikonal loss ensures that the norm of the gradient at every point is unity. The further the norm of the gradient is from one, the larger this loss is. This constrains the network to learn an SDF instead of an arbitrary function.
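As a sanity check, the loss above can be computed as follows (a NumPy sketch; for a true SDF such as a sphere's, the analytic gradient has unit norm everywhere, so the loss vanishes):

```python
import numpy as np

def eikonal_loss(grads):
    # (1/N) * sum_p | ||grad_p f||_2 - 1 |
    return np.mean(np.abs(np.linalg.norm(grads, axis=-1) - 1.0))

# analytic gradient of the sphere SDF f(p) = ||p|| - r is p / ||p||,
# which has unit norm at every point
rng = np.random.default_rng(0)
points = rng.normal(size=(100, 3))
grads = points / np.linalg.norm(points, axis=-1, keepdims=True)
```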
Intuitive explanation of `alpha`, `beta`:

- `alpha` is the maximum density inside the surface.
- `beta` controls the steepness of the fall of the density as we move from inside the surface (with maximum density `alpha`) to outside the surface (where the density decays to zero).

How does high `beta` bias your learned SDF? What about low `beta`?

A high value of `beta` means that the drop in density from inside the surface (where it is `alpha`) to outside the surface (where it decays to zero) is very gradual: a slight change in the SDF value causes only a slight change in the density value. The SDF will therefore be biased to be smoother/thicker/more diffuse, and regions around the "real" surface will be more opaque (have non-zero density).

In contrast, a low value of `beta` means that the drop in density from inside to outside the surface is very sharp (in fact, as `beta` tends to zero, the density approaches a step function from `alpha` inside the surface to zero outside). A low `beta` will thus bias the SDF to be thinner/more concentrated and make regions around the "real" surface less opaque, because a slight change in the SDF value leads to a large change in the density value.
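Concretely, VolSDF converts the SDF to density via the CDF of a zero-mean Laplace distribution with scale `beta`, scaled by `alpha`. A minimal sketch of that conversion (variable names are illustrative):

```python
import numpy as np

def sdf_to_density(sdf, alpha, beta):
    # sigma = alpha * Psi_beta(-sdf), where Psi_beta is the CDF of a
    # zero-mean Laplace distribution with scale beta
    s = -np.asarray(sdf, dtype=float)
    # clip the exponents so neither branch of np.where overflows
    psi = np.where(s <= 0,
                   0.5 * np.exp(np.minimum(s, 0.0) / beta),
                   1.0 - 0.5 * np.exp(-np.maximum(s, 0.0) / beta))
    return alpha * psi
```

The density is `alpha / 2` exactly at the surface, approaches `alpha` deep inside, decays to zero outside, and a smaller `beta` makes the transition sharper, which matches the intuition above.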
Would an SDF be easier to train with volume rendering with low `beta` or high `beta`? Why?

An SDF is easier to train with volume rendering with a high `beta`. Volume rendering relies on sampling multiple points along each ray and combining/weighting their appearances to form the appearance of a single pixel. With a high `beta`, a larger number of points have non-zero density, so a larger number of points contribute to each pixel's value. Gradients are therefore backpropagated through more points at a time, leading to a denser learning signal and faster convergence.
Note: Here, I've only reasoned about which SDF would be easier to train -- that is, I've focused on the ease of optimization and not necessarily on which SDF would be more accurate.
Would you be able to learn an accurate surface with high `beta` or low `beta`? Why?

An accurate surface would be modeled by a low `beta`, since in the limit of vanishing `beta` the density approaches a step function. The optimization may be tricky, but in principle a lower value of `beta` encourages a sharp and accurate boundary/surface.
I created a new class `NeuralSurfaceWithColor` for this question.
python -m a4.main --config-name=volsdf
The results will be saved to `images/<cfg.training.checkpoint_path>/part_3_<epoch>.gif` and `images/<cfg.training.checkpoint_path>/part_3_geometry_<epoch>.gif`.
The values of `alpha` and `beta` can be controlled by setting them in the `volsdf.yaml` config. `cfg.neus` should be set to `False` for this question.
Comment on the settings you chose, and why they seem to work well:
For the MLP hyperparameters, I used the same hyperparameter values as for NeRF (for number of hidden layers, number of units in hidden layers for distance and color, etc).
As in NeRF, I added a skip connection that feeds the input point back into the network further along when predicting color. I experimented with using the raw coordinates versus the harmonic encoding of the coordinates in the skip connection. As we know, harmonic encoding of the input coordinates is crucial for learning to represent high-frequency details. However, it seems that in the skip connection, the network can model high-frequency details even with the raw coordinates themselves. This behavior is controlled by `cfg.embedding_in_skip`, which should be set to `False` to pass raw coordinates in the skip connection and `True` to pass the harmonic embedding.
Raw coordinates in skip connection | Harmonic embedding in skip connection |
---|---|
![]() | ![]() |
![]() | ![]() |
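The skip connection described above can be sketched as follows (a minimal NumPy forward pass; the layer sizes and skip position are illustrative, not the exact configuration I used):

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_h = 3, 128  # 3-D input point, hidden width

w0 = rng.normal(size=(d_in, d_h)) * 0.1
w1 = rng.normal(size=(d_h, d_h)) * 0.1
# the layer after the skip consumes the hidden features concatenated
# with the (raw or harmonically embedded) input coordinates
w2 = rng.normal(size=(d_h + d_in, d_h)) * 0.1

def forward(x):
    h = np.maximum(x @ w0, 0.0)
    h = np.maximum(h @ w1, 0.0)
    h = np.concatenate([h, x], axis=-1)  # skip: feed the input back in
    return np.maximum(h @ w2, 0.0)
```

Swapping the `x` in the concatenation for its harmonic embedding (and widening `w2` accordingly) gives the `cfg.embedding_in_skip = True` variant.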
I experimented with different values of alpha ($\alpha$) in the density conversion. $\alpha = 10$ seems to give the best representation of geometry as well as texture. $\alpha = 50$ has a crisper image but poor geometry.
$\alpha = 5$ | $\alpha = 10$ | $\alpha = 50$ |
---|---|---|
![]() | ![]() | ![]() |
![]() | ![]() | ![]() |
$\beta = 0.05$ was held fixed for the above experiments.
I also experimented with different values of beta ($\beta$) in the density conversion. $\beta = 0.05$ seems to give the best representation of geometry as well as texture.
$\beta = 0.05$ | $\beta = 0.1$ | $\beta = 0.2$ |
---|---|---|
![]() | ![]() | ![]() |
![]() | ![]() | ![]() |
$\alpha = 10$ was held fixed for the above experiments.
python -m a4.main --config-name=composite
The output will be stored in `images/part_1.gif`.
I defined a `SphereCollectionSDF`, which consists of `cfg.n_spheres` spheres sampled randomly within a certain interval (`min` and `max` specified in the config).
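The SDF of such a collection is the pointwise minimum over the individual sphere SDFs (a union of shapes). A sketch, where the random-sampling bounds stand in for the config's `min` and `max`:

```python
import numpy as np

def sphere_collection_sdf(points, centers, radii):
    # union of spheres: the SDF is the min over the per-sphere SDFs
    dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=-1)
    return (dists - radii[None, :]).min(axis=1)

# e.g. n_spheres random centers within [lo, hi] (illustrative sampling)
rng = np.random.default_rng(0)
n_spheres, lo, hi = 40, -1.0, 1.0
centers = rng.uniform(lo, hi, size=(n_spheres, 3))
radii = np.full(n_spheres, 0.1)
```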
40 spheres rendered in different locations:
100 spheres:
For this part, uncomment lines 125-128 in `dataset.py`. The reduced number of views can be set via `num_views` on line 125 of `dataset.py`.
SDF: python -m a4.main --config-name=volsdf
NeRF: python main.py --config-name=nerf_lego
num_views = 10 | num_views = 20 |
---|---|
![]() | ![]() |
![]() | ![]() |
![]() | ![]() |
First row: NeRF renderings. Second row: VolSDF renderings. Third row: VolSDF geometry.
We see that the surface-based VolSDF can learn from as few as 10 views, whereas NeRF fails completely.
I implemented the 'naive' solution from the NeuS paper in the `sdf_to_neus_density` function inside `a4/renderer.py`.
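The 'naive' conversion applies the logistic density $\phi_s$ directly to the SDF. A sketch (the function name mirrors mine, but treat the body as illustrative):

```python
import numpy as np

def sdf_to_neus_density(sdf, s):
    # naive NeuS density: phi_s(f(x)) = s * e^{-s f} / (1 + e^{-s f})^2,
    # the density of a logistic distribution with scale 1/s
    e = np.exp(-s * np.asarray(sdf, dtype=float))
    return s * e / (1.0 + e) ** 2
```

The density peaks at `s / 4` on the zero level set and is symmetric about it; larger `s` concentrates the density in a thinner shell around the surface.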
Here are the learned SDF and geometry for different values of the hyperparameter `s`:
$s = 10$ | $s = 50$ | $s = 100$ |
---|---|---|
![]() | ![]() | ![]() |
![]() | ![]() | ![]() |
The rendered images seem better compared to the VolSDF renderings, but the learned geometry is more error-prone.