16-889 Assignment 4

Name: Chonghyuk Song

Andrew ID: chonghys

Late days used

image

1. Sphere Tracing (30pts)

For a maximum number of iterations, specified by `self.max_iters`, we march each ray forward by the SDF value at its current point. Since the SDF value lower-bounds the distance to the nearest surface, this is the largest step guaranteed not to overshoot the surface.

After the maximum number of iterations, the algorithm selects the rays whose converged points have SDF values smaller than a pre-defined threshold (1e-6); these rays are deemed to have intersected the object surface defined by the SDF.
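A minimal PyTorch sketch of this loop (the names `implicit_fn`, `origins`, and `directions` are illustrative: an SDF callable, per-ray origins, and unit ray directions):

```python
import torch

def sphere_trace(implicit_fn, origins, directions, max_iters=64, eps=1e-6):
    # origins, directions: (n_rays, 3); implicit_fn: (n, 3) -> (n, 1) SDF values.
    t = torch.zeros(origins.shape[0], 1, device=origins.device)
    for _ in range(max_iters):
        points = origins + t * directions
        # Step forward by the SDF value: it bounds the distance to the
        # nearest surface, so the ray can never pass through the surface.
        t = t + implicit_fn(points)
    points = origins + t * directions
    # Rays whose final SDF value is below the threshold hit the surface.
    mask = (implicit_fn(points) < eps).squeeze(-1)
    return points, mask
```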

image

2. Optimizing a Neural SDF (30pts)

image image

The Neural SDF used to produce the first rendering is an MLP $F(x; \theta) \in \mathbb{R}$ consisting of 6 hidden layers of width 128 with Softplus activations ($\beta = 100$), followed by a final linear layer. We use a sinusoidal mapping of the input coordinates $x$ à la [1] (commonly referred to as a "positional encoding") as the input to the MLP to better represent high-frequency aspects of the scene.
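A sketch of this architecture is shown below (class and argument names such as `HarmonicEmbedding` and `n_freqs` are illustrative, and the number of frequencies is an assumption):

```python
import torch
import torch.nn as nn

class HarmonicEmbedding(nn.Module):
    # Sinusoidal mapping of the input coordinates ("positional encoding").
    def __init__(self, n_freqs=6):
        super().__init__()
        self.register_buffer("freqs", 2.0 ** torch.arange(n_freqs) * torch.pi)

    def forward(self, x):                          # x: (n, 3)
        xb = x[..., None] * self.freqs             # (n, 3, n_freqs)
        return torch.cat([xb.sin(), xb.cos()], dim=-1).flatten(-2)

class NeuralSDF(nn.Module):
    # 6 hidden layers of width 128 with Softplus(beta=100), linear head.
    def __init__(self, n_freqs=6, width=128, n_hidden=6):
        super().__init__()
        self.embed = HarmonicEmbedding(n_freqs)
        layers, in_dim = [], 3 * 2 * n_freqs
        for _ in range(n_hidden):
            layers += [nn.Linear(in_dim, width), nn.Softplus(beta=100)]
            in_dim = width
        layers.append(nn.Linear(in_dim, 1))        # final linear layer
        self.mlp = nn.Sequential(*layers)

    def forward(self, x):                          # x: (n, 3) -> (n, 1) SDF
        return self.mlp(self.embed(x))
```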

In order to encourage the MLP $F$ to actually behave as an SDF, we use the following eikonal loss as regularization, where $N$ denotes the batch size and $\|\cdot\|$ denotes the L2 norm:

$$\textrm{loss}_{\textrm{eikonal}} = \dfrac{1}{N}\sum_{i=1}^{N}\left(\|\nabla_{x_{i}}F(x_{i};\theta)\| - 1\right)^{2}$$
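A sketch of this regularizer using autograd (assuming `model` is the Neural SDF above and `points` is a batch of sampled 3D coordinates):

```python
import torch

def eikonal_loss(model, points):
    # Penalize deviation of the SDF's gradient norm from 1.
    points = points.clone().requires_grad_(True)
    sdf = model(points)
    # Gradient of the SDF w.r.t. the input coordinates.
    (grad,) = torch.autograd.grad(
        sdf, points, grad_outputs=torch.ones_like(sdf), create_graph=True
    )
    return ((grad.norm(dim=-1) - 1.0) ** 2).mean()
```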

[1] Ben Mildenhall, Pratul P. Srinivasan, Matthew Tancik, Jonathan T. Barron, Ravi Ramamoorthi, and Ren Ng. NeRF: Representing scenes as neural radiance fields for view synthesis. In Proceedings of the European Conference on Computer Vision (ECCV), 2020.

3. VolSDF (30pts)

VolSDF [2] proposes a novel parametrization of volume density: the cumulative distribution function (CDF) of the Laplace distribution applied to a signed distance function (SDF) representation:

$$\sigma(x) = \alpha \Psi_{\beta}(-d_{\Omega}(x))$$

where $\Psi_{\beta}$ is the CDF of the Laplace distribution with zero mean and scale $\beta$:

$$\Psi_{\beta}(s) = \begin{cases} \frac{1}{2}\exp\left(\frac{s}{\beta}\right) & \textrm{if } s \leq 0 \\ 1 - \frac{1}{2}\exp\left(-\frac{s}{\beta}\right) & \textrm{if } s > 0 \end{cases}$$

As demonstrated in the following figure, the proposed volume density parametrization can be understood intuitively as follows: the object whose geometry is modeled by this density has a constant interior density of $\alpha$, which smoothly decays in the neighbourhood of the object's boundary, with the smoothness of the transition controlled by $\beta$:

image
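A minimal PyTorch sketch of this SDF-to-density conversion, using the built-in Laplace distribution (the function name `sdf_to_density` is illustrative):

```python
import torch

def sdf_to_density(sdf, alpha=10.0, beta=0.05):
    # sigma(x) = alpha * Psi_beta(-d(x)), where Psi_beta is the CDF of a
    # zero-mean Laplace distribution with scale beta.
    laplace = torch.distributions.Laplace(loc=0.0, scale=beta)
    return alpha * laplace.cdf(-sdf)
```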

  1. Since $\beta$ controls the smoothness of Laplace's CDF, a high $\beta$ biases the learned SDF to be smoother, whereas a low $\beta$ biases the learned SDF to better model the high frequency details of the object's surface:

$\beta = 0.05$ ($\alpha = 10.0$)

image image

$\beta = 0.10$ ($\alpha = 10.0$)

image image

$\beta = 0.20$ ($\alpha = 10.0$)

image image

$\beta = 0.50$ ($\alpha = 10.0$)

image image

  2. However, that doesn't necessarily mean that a lower $\beta$ makes it easier to train the Neural SDF. From the diagram above, we can see that as $\beta$ approaches 0, the density curve converges to a scaled indicator function (which equals 1 for negative SDF values and 0 for non-negative SDF values), foreshadowing vanishing-gradient issues during backpropagation. In other words, as $\beta$ approaches 0, the range of SDF values that receive a strong enough gradient signal through the Laplace CDF grows smaller, making it more difficult to train the Neural SDF.

  3. Therefore, in order to *learn* an accurate surface with a Neural SDF, one must choose an intermediate value of $\beta$ that balances the Neural SDF's ability to model high-frequency details of the object's surface against the difficulty of training that same network. We choose the intermediate value of $\beta = 0.05$ and train the Neural SDF for 250 epochs ($\alpha = 10$), resulting in the following free-viewpoint renderings:

image image

[2] Lior Yariv, Jiatao Gu, Yoni Kasten, and Yaron Lipman. Volume rendering of neural implicit surfaces. In 35th Conference on Neural Information Processing Systems (NeurIPS), 2021.

4.2 Fewer Training Views (10 pts)

VolSDF ($\alpha = 10, \beta = 0.05$) trained on 10 views:

image image

NeRF trained on 10 views:

image

It can be seen that while the reconstructed scene content for VolSDF is restricted to within the object boundary, the scene reconstructed by NeRF in the sparse-view setting is relatively dispersed, extending outside the object boundary, as evidenced by the colored artifacts throughout the free-viewpoint renderings. In other words, the lack of regularization on NeRF's scene representation makes it inferior in sparse-view reconstruction to VolSDF, whose SDF parametrization provides explicit surface regularization.

4.3 Alternate SDF to Density Conversions (10 pts)

In this section, we experiment with an alternative SDF-to-density transform. One example of such a transform is the logistic density distribution (the derivative of the sigmoid function), which was proposed as a naive baseline SDF-to-density transform in NeuS [3]:

$$\sigma(x) = \dfrac{se^{-sf(x)}}{(1 + e^{-sf(x)})^{2}}$$

where $f(x)$ denotes the signed distance function. The standard deviation of the logistic density distribution is given by $1/s$, which we set to 0.05 (i.e., $s = 20$) for the sake of a fair comparison with VolSDF. A sketch of this transform and the renderings from both parametrizations follow:
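A minimal PyTorch sketch of this transform (the function name `naive_neus_density` is illustrative; `sdf` holds the signed distances $f(x)$). It uses the identity $\frac{se^{-sf}}{(1+e^{-sf})^{2}} = s\,\sigma_{\textrm{sig}}(-sf)\,(1-\sigma_{\textrm{sig}}(-sf))$, where $\sigma_{\textrm{sig}}$ is the sigmoid:

```python
import torch

def naive_neus_density(sdf, s=20.0):
    # Logistic density (derivative of the sigmoid with slope s) evaluated
    # at the SDF value; its spread is controlled by 1/s.
    sig = torch.sigmoid(-s * sdf)      # e^{-s f} / (1 + e^{-s f})
    return s * sig * (1.0 - sig)       # s e^{-s f} / (1 + e^{-s f})^2
```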

VolSDF $(\alpha = 10, \beta = 0.05)$

image image

NeuS-naive $(s = 20)$

image image

As can be seen, the NeuS naive SDF-to-density transform results in geometry with severe artifacts and hence less accurate renderings compared to VolSDF. This can potentially be attributed to the fact that the logistic density distribution yields a biased weight function in the volume rendering equation, i.e., the weight function is not locally maximized at the intersection of the camera ray with the zero-level set of the SDF, so that intersection is not the point contributing most to the rendered pixel color. One way to check this empirically is to compute the rendering weights along a ray and inspect where they peak relative to the SDF's zero crossing, as sketched below.
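A minimal sketch of the discrete rendering weights $w_{i} = T_{i}\left(1 - e^{-\sigma_{i}\delta_{i}}\right)$, with transmittance $T_{i} = \exp(-\sum_{j<i}\sigma_{j}\delta_{j})$ (the inputs `density` and `deltas` for per-sample densities and spacings are hypothetical names):

```python
import torch

def rendering_weights(density, deltas):
    # density, deltas: (n_rays, n_samples) per-sample densities / spacings.
    tau = density * deltas
    alpha = 1.0 - torch.exp(-tau)                 # per-sample opacity
    # Exclusive cumulative sum gives transmittance up to (not including) i.
    tau_shifted = torch.cat([torch.zeros_like(tau[..., :1]), tau[..., :-1]], dim=-1)
    trans = torch.exp(-torch.cumsum(tau_shifted, dim=-1))
    return trans * alpha
```

Comparing `rendering_weights(...).argmax(dim=-1)` against the sample index of the SDF's zero crossing makes the bias of the naive transform visible.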

[3] Peng Wang, Lingjie Liu, Yuan Liu, Christian Theobalt, Taku Komura, and Wenping Wang. NeuS: Learning neural implicit surfaces by volume rendering for multi-view reconstruction. In 35th Conference on Neural Information Processing Systems (NeurIPS), 2021.