CMU-16889 Learning for 3D Vision - HW4


1. Sphere Tracing

Sphere tracing is an iterative process. I first compute the signed distance of the origin point using the implicit function. The signed distance is the distance from the current point to the surface of the closest object in the scene, and is therefore a safe distance to move along the given ray direction. A new point is then computed by stepping from the current point along the direction by the signed distance. With the new current point, I repeat the process until the distance is smaller than a threshold or the maximum number of iterations is reached.
On the implementation side, I keep two arrays: points with shape (N_rays, 3) (initialized with the ray origin coordinates) and masks (initialized with zeros). The mask tracks which rays have finished updating (mask value = 1). A ray's mask is set to 1 once its SDF value drops below 1e-5, and only the points of rays with mask value 0 keep being updated. The iterative loop ends when all rays have finished updating or the maximum number of iterations is reached.
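A minimal sketch of this loop is below. Here `implicit_fn` is a placeholder for the assignment's SDF module and is assumed to map (N, 3) points to (N, 1) signed distances:

```python
import torch

def sphere_trace(implicit_fn, origins, directions, max_iters=64, eps=1e-5):
    # origins, directions: (N_rays, 3) tensors.
    points = origins.clone()  # current point on each ray
    mask = torch.zeros(origins.shape[0], 1, dtype=torch.bool,
                       device=origins.device)  # True = ray has converged

    for _ in range(max_iters):
        dist = implicit_fn(points)      # (N_rays, 1) signed distances
        mask = mask | (dist < eps)      # freeze rays that reached the surface
        if mask.all():                  # every ray has converged
            break
        # March only the unfinished rays forward by their signed distance.
        points = points + (~mask).float() * dist * directions
    return points, mask
```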

2. Optimizing a Neural SDF

My MLP: The input points are first transformed into harmonic embeddings. The embedding then passes through 3 fully connected layers with hidden dimension 128. Following the design of NeRF, the output feature is concatenated with the harmonic embedding and fed to another 3 fully connected layers with hidden dimension 128, followed by a final fully connected layer with output dimension 1 that predicts the distance.
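A sketch of this architecture is below. `HarmonicEmbedding` here is a simplified stand-in for the embedding used in the assignment code, and the layer sizes follow the description above:

```python
import torch
import torch.nn as nn

class HarmonicEmbedding(nn.Module):
    # Simple sin/cos positional encoding at powers-of-two frequencies.
    def __init__(self, n_freqs=6):
        super().__init__()
        self.register_buffer("freqs", 2.0 ** torch.arange(n_freqs))

    def forward(self, x):                   # x: (N, 3)
        xf = x[..., None] * self.freqs      # (N, 3, n_freqs)
        emb = torch.cat([xf.sin(), xf.cos()], dim=-1)
        return emb.flatten(start_dim=-2)    # (N, 3 * 2 * n_freqs)

class NeuralSDF(nn.Module):
    def __init__(self, n_freqs=6, hidden=128):
        super().__init__()
        self.embed = HarmonicEmbedding(n_freqs)
        d_emb = 3 * 2 * n_freqs
        self.stage1 = nn.Sequential(
            nn.Linear(d_emb, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        # NeRF-style skip: concatenate the embedding back in halfway through.
        self.stage2 = nn.Sequential(
            nn.Linear(hidden + d_emb, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.out = nn.Linear(hidden, 1)     # predicted signed distance

    def forward(self, x):
        emb = self.embed(x)
        feat = self.stage1(emb)
        feat = self.stage2(torch.cat([feat, emb], dim=-1))
        return self.out(feat)
```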
Eikonal loss: The eikonal constraint requires the norm of the gradient of the distance with respect to the input points to equal 1. The implementation therefore computes the gradient norm and penalizes its absolute deviation from 1, i.e. | ||grad|| - 1 |.
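A minimal sketch of this loss, assuming a `model` that maps (N, 3) points to (N, 1) signed distances, using autograd to get the gradient:

```python
import torch

def eikonal_loss(model, points):
    # | ||grad f(x)|| - 1 | averaged over sampled points.
    points = points.clone().requires_grad_(True)
    dist = model(points)
    grad, = torch.autograd.grad(
        outputs=dist, inputs=points,
        grad_outputs=torch.ones_like(dist),
        create_graph=True,  # keep the graph so the loss can be backpropagated
    )
    return (grad.norm(dim=-1) - 1.0).abs().mean()
```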
I train for 10000 epochs to get the result below.

3. VolSDF

The parameter alpha sets the overall scale of the density function (the density value approached inside the object).
The parameter beta controls how sharply the density drops off around the surface.
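For reference, below is a sketch of this conversion as formulated in the VolSDF paper: the density is alpha times the CDF of a zero-mean Laplace distribution with scale beta, evaluated at the negated SDF (the function name is mine, not the handout's):

```python
import torch

def sdf_to_density(sdf, alpha, beta):
    # Laplace CDF evaluated at -sdf: approaches 1 inside the object
    # (sdf < 0) and 0 far outside; beta sets the transition width.
    s = -sdf
    psi = torch.where(
        s <= 0,
        0.5 * torch.exp(s / beta),
        1.0 - 0.5 * torch.exp(-s / beta),
    )
    return alpha * psi
```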
Q1. A high beta leads to a smooth drop, while a low beta leads to a sharp drop. In other words, a high beta may bias the learned SDF toward a less sharp object boundary, while a low beta may bias it toward a sharper boundary.
Q2. An SDF can be easier to train with a high beta. As the SDF-to-density function is smoother, its gradient at large and small SDF values is larger than with a low beta. With larger gradients, the SDF may converge faster and be easier to train.
Q3. It's more likely to learn an accurate surface with a small beta, because the density function only rises from near zero when the point is close to the surface, so the supervision concentrates around the true surface location.
The first column of the results shown below uses beta = 0.005. The second column uses the default setting, beta = 0.05. The first row is the 360 degree view of the geometry. With a smaller beta, the geometry is sharper in some fine structures, such as the bucket (second row) and the wheels (third row). This matches the answers to the questions above: a lower beta can yield a more accurate surface because the loss only becomes low when the query point is close to the surface. Surprisingly, however, the lower-beta result has less accurate floor geometry.
Below I compare the color output. The lower beta also gives a more accurate color output. This is possibly because, with better regularization of the distance function, the model learns better features to share with the color branch of the implicit model.

4. Neural Surface Extras

4.2 Fewer Training Views
I use 20 training views to train the VolSDF solution and a NeRF solution. The network settings are identical except that VolSDF outputs a distance that is converted to density, while NeRF outputs density directly.
Below I first compare the geometry using 20 views and 100 views in the VolSDF setting (left image: 20 views; right image: 100 views). The result shows that using fewer views can still produce nice geometry -- the details are less accurate, but the overall structure is good.
Next, I compare the color results using 20 views in the VolSDF setting and the NeRF setting (left image: VolSDF; right image: NeRF). I set beta = 0.01 in the VolSDF setting. Theoretically, the VolSDF result should generalize better. As one can see, VolSDF's color result is less blurry, which supports the theory that it generalizes better across views.
4.3 Alternate SDF to Density Conversions
I implement the SDF-to-density function from the NeuS paper, which is also known as the logistic density function. Below is the result using this function with s = 80 and eikonal weight = 0.03. Compared to Q3, the geometry seems less satisfying: it captures the overall structure but struggles with completion and fine details. The reason the geometry is less "complete" is that the NeuS function assigns zero density, meaning empty space, inside the object. Therefore, a surface is rendered only when the query point lies on the surface of an object. The sigmoid-shaped Laplace-CDF conversion used in Q3, on the other hand, treats an object as solid inside: a surface can still be rendered even when the query point is predicted to be inside the object.
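A sketch of this conversion: the logistic density is the derivative of sigmoid(s * sdf), which peaks at the surface (sdf = 0) and decays to zero on both sides. NeuS itself embeds this in a more involved opacity computation; this shows only the density term:

```python
import torch

def sdf_to_density_neus(sdf, s=80.0):
    # Logistic density: s * e^{-s x} / (1 + e^{-s x})^2, written via the
    # identity d/dx sigmoid(s x) = s * sigmoid(s x) * (1 - sigmoid(s x)).
    sig = torch.sigmoid(s * sdf)
    return s * sig * (1.0 - sig)
```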