Name: Chih-Wei Wu
Email: chihweiw@andrew.cmu.edu
This part implements the sphere tracing algorithm as discussed in the lecture. The algorithm is as follows: we traverse each ray starting from the camera origin and search for a zero crossing of the implicit function (the signed distance function). If the function value at a point drops below 0, a zero crossing has occurred and that point is taken as the surface point. If not, we step along the ray direction with a step size equal to the function value.
In practice, I use a points array to record the surface points and a mask array to keep track of which rays have reached the surface. At each iteration, I only sample the SDF value for points that haven't reached a surface, and then perform the zero-crossing test, which compares the SDF value against a threshold of 0.00001.
The pseudocode of my implementation is as follows:
For _ in range(max_iter):
    Sample the SDF value for rays that haven't reached a surface
    Perform the zero-crossing test and update the mask
    For rays that haven't reached a surface, update t
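The loop above can be sketched in NumPy as follows. The function names, the iteration cap, and the unit-sphere SDF are illustrative assumptions; the 1e-5 threshold mirrors the zero-crossing test described above.

```python
import numpy as np

def sphere_trace(sdf, origins, directions, max_iter=64, eps=1e-5):
    # origins, directions: (N, 3); directions assumed to be unit length.
    n = origins.shape[0]
    t = np.zeros(n)                    # distance travelled along each ray
    mask = np.zeros(n, dtype=bool)     # True once a ray has hit the surface
    points = origins.copy()
    for _ in range(max_iter):
        active = np.flatnonzero(~mask)  # rays that haven't reached a surface
        if active.size == 0:
            break
        points[active] = origins[active] + t[active, None] * directions[active]
        d = sdf(points[active])         # sample SDF only for active rays
        hit = d < eps                   # zero-crossing test
        mask[active[hit]] = True
        t[active[~hit]] += d[~hit]      # step forward by the SDF value
    return points, mask

# Illustrative SDF: unit sphere centred at the origin.
unit_sphere = lambda p: np.linalg.norm(p, axis=-1) - 1.0
origins = np.array([[0.0, 0.0, -3.0]])
directions = np.array([[0.0, 0.0, 1.0]])
points, mask = sphere_trace(unit_sphere, origins, directions)
```

For the example ray, the trace converges to the sphere surface at z = -1 after two steps.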
Above, on the left is the input point cloud, and on the right is the network prediction.
Regarding the MLP to predict distance, I use an architecture similar to the one NeRF uses to predict density. The network consists of 6 MLP layers, takes the positionally encoded xyz coordinate as input, and outputs the distance value. There is no activation function after the last linear layer, because the distance value can range from -infinity to +infinity. All network and optimization parameters use the defaults provided in the TA's config.
Regarding the Eikonal loss, it is | ||gradient||_2 - 1 |, i.e. the absolute value of the gradient's L2 norm minus one.
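As a sketch of this penalty (in practice the gradients come from autograd; the batch here is an illustrative stand-in):

```python
import numpy as np

def eikonal_loss(grads):
    # grads: (N, 3) gradients of the predicted SDF w.r.t. input points.
    # Penalizes deviation of the gradient's L2 norm from 1, averaged
    # over the batch.
    norms = np.linalg.norm(grads, axis=-1)
    return np.abs(norms - 1.0).mean()

# Illustrative batch: one unit-norm gradient, one with norm 2.
grads = np.array([[1.0, 0.0, 0.0],
                  [0.0, 2.0, 0.0]])
loss = eikonal_loss(grads)  # (|1 - 1| + |2 - 1|) / 2 = 0.5
```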
Regarding the settings I chose, I follow the default parameters provided in the TA's config, as they already generate satisfactory results.
Network Architecture
For the MLP in this question, I use an architecture similar to the one NeRF uses to predict color. The color network and the distance network share the 6-layer MLP feature network. I add 1 linear layer and 2 fully-connected layers after the feature network, followed by a sigmoid function, to predict the RGB color. I also experimented with the network described in the VolSDF paper, which places 3 fully-connected layers after the feature network and concatenates the positionally encoded xyz coordinate before these 3 layers. The results are shown below. Clearly, there are some artifacts in the background of the gif on the right. This could be caused by overfitting to the training views due to the excessive number of network parameters. That is to say, the original network works better because it is a carefully designed network with fewer parameters.
Questions related to the SDF-to-density function:
How does high beta bias your learned SDF? What about low beta?
Results with high and low beta are shown below. It can be seen that a small beta yields crisp results, while a high beta yields blurry results. Also, judging from the surface visualization, a large beta loses thin details of the object, such as the arm of the shovel and the pillar of the driver's cab. This is because a small beta can learn a much more accurate surface than a high beta, and can in turn render details more accurately; see the third point for detailed reasoning. Regarding convergence speed, a high beta converges much faster. This is because a large beta results in an SDF-to-density function that is easier for the network to learn; see the second point for detailed reasoning.
Beta = 0.5
Beta = 0.05
Beta = 0.005
Would an SDF be easier to train with volume rendering and low beta or high beta? Why?
A large beta would be easier to train with in theory. This is because beta controls the steepness of the SDF-to-density function at SDF = 0: the larger beta is, the less steep the function. In other words, the transition from high density to low density is smoother, and the gradient at large SDF magnitudes (very positive or very negative SDF values) is larger than with a small beta. A larger gradient at large SDF values means faster convergence, and thus likely easier training.
Would you be more likely to learn an accurate surface with high beta or low beta? Why?
A small beta would more likely learn an accurate surface in theory. Beta controls how steep the SDF-to-density function is at SDF = 0: the smaller beta is, the steeper the function, which more closely resembles a step function. This means the density drops drastically immediately after crossing the surface. If the network manages to learn the SDF under this function with a small beta, the surface can be localized more accurately than with a network trained with a larger beta.
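For reference, the SDF-to-density function discussed in these questions can be sketched as a scaled Laplace CDF of the negated SDF, in the style of VolSDF. The choice alpha = 1/beta is a common convention assumed here, not necessarily the config's value.

```python
import numpy as np

def sdf_to_density(d, beta, alpha=None):
    # VolSDF-style density: alpha * LaplaceCDF(-d; scale=beta).
    # Density approaches alpha inside the object (d < 0) and decays
    # to 0 outside; beta sets how sharp the transition is at d = 0.
    if alpha is None:
        alpha = 1.0 / beta
    d = np.asarray(d, dtype=float)
    return alpha * np.where(
        d >= 0,
        0.5 * np.exp(-d / beta),
        1.0 - 0.5 * np.exp(d / beta),
    )

# Smaller beta -> sharper fall-off just outside the surface:
near = 0.1  # point slightly outside the surface
ratio_small = sdf_to_density(near, 0.005) / sdf_to_density(0.0, 0.005)
ratio_large = sdf_to_density(near, 0.5) / sdf_to_density(0.0, 0.5)
```

Comparing the two ratios shows the step-function-like behavior of a small beta: the density just outside the surface is a vanishing fraction of its on-surface value.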
It is said that a benefit of using surface representations is that the geometry is better regularized and can be inferred from fewer views. To experiment with this, I train both NeRF and VolSDF with different numbers of training views. For the results below, NeRF is on the left and VolSDF is on the right.
20 views
10 views
5 views
From the results, we can see that NeRF starts to generate artifacts when trained with 10 views. In the gif, we can clearly see some flickering while the object turns, indicating that NeRF has a difficult time generalizing to novel views when trained with only 10 views. On the other hand, VolSDF still generalizes well when trained with 10 views; it only starts to break down when trained with 5 views. This is attributable to the SDF representation of the volume, which is better regularized than NeRF and is therefore able to infer novel views from fewer training views.
For this section, I implement the SDF-to-density function introduced in the NeuS paper. Specifically, I implement the "naive" solution from it, which uses s * exp(-s*x) / (1 + exp(-s*x))**2 as the SDF-to-density function. To improve numerical stability, I instead use s * sigmoid(s*x) * (1 - sigmoid(s*x)), which is an equivalent calculation. The idea behind this function is that density should be concentrated on the surface itself, because there is no way of telling whether the object beneath the surface is hollow or solid.
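The equivalence and the stability trick can be sketched as follows; the stable sigmoid only ever exponentiates non-positive arguments, so it never overflows.

```python
import numpy as np

def stable_sigmoid(x):
    # Evaluate via exp(-|x|) so the exponential never overflows.
    z = np.exp(-np.abs(x))
    return np.where(x >= 0, 1.0 / (1.0 + z), z / (1.0 + z))

def neus_density(d, s):
    # "Naive" NeuS density: s * exp(-s*d) / (1 + exp(-s*d))**2,
    # rewritten equivalently as s * sigmoid(s*d) * (1 - sigmoid(s*d)).
    q = stable_sigmoid(s * np.asarray(d, dtype=float))
    return s * q * (1.0 - q)
```

The density is a symmetric bell centred on the surface (d = 0), peaking at s/4; a larger s concentrates it in a narrower band around the surface.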
Below are results from NeuS using different values of s, along with results from VolSDF (Q3) for comparison. The s parameter controls how concentrated the density is around the surface: the larger it is, the more the density concentrates in the narrow region where the surface resides. We can observe from the results that a larger s gives better rendering results, probably because a large s localizes the surface better.
Also, the best result using the new SDF-to-density function generates crisper 2D renderings than VolSDF. This might be attributable to the fact that the color and density information is concentrated on the surface itself, instead of spread throughout the whole object as in VolSDF. Concentrating density and color on the surface gives the model a higher capacity for modeling fine details.
However, the surface visualization shows that it is more difficult to learn a good surface with this SDF-to-density function than with VolSDF. This might be because it is harder to learn an SDF that represents a solid object under this SDF-to-density function.
VolSDF
NeuS
s = 10
s = 100
s = 1000