The code is short enough that I can include it here, but I'll also describe it below.
# Total distance marched along each ray so far
dist_traveled = torch.zeros_like(origins[:, 0])
# Boolean mask of rays that have hit a surface
mask = torch.zeros(
    (origins.shape[0], 1), dtype=torch.bool, device=origins.device
)
points = origins.clone()
for _ in range(self.max_iters):
    # Signed distance from each current point to the nearest surface
    dist_to_next_surface = implicit_fn(points)
    dist_traveled += dist_to_next_surface.view(-1)
    # Step each point along its ray by the total distance marched
    points = origins + directions * dist_traveled[:, None]
    # Mark rays that are within tolerance of a surface as hits
    mask[dist_to_next_surface.view(-1) < 1e-4] = True
return points, mask
First, I keep a tensor of the distance traveled along each ray, which is incremented on every iteration by the distance from the current point to the nearest surface. Notably, once a point gets sufficiently close to a surface, these increments become small enough that we can keep adding them without the point moving appreciably. Furthermore, if a ray eventually overshoots (say, because a neural SDF does not provide exact distances) and enters the interior of an object, the signed distance becomes negative and the point is pulled back toward the surface. The points themselves are simply the origin plus the distance traveled times the direction vector.
One might note that this could be inefficient: we could instead check, on each iteration, which rays are already close enough to a surface (or have exceeded a maximum distance) and stop updating them.
However, in my tests with the 64 iterations set by the configuration, the extra per-iteration checks themselves cost time, so skipping them results in only a negligible increase in render time while keeping the code simpler. With a more expensive SDF, such as a neural SDF where most of the time is spent in the network's forward pass, terminating converged rays early would likely be faster.
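For reference, here is a rough sketch of what early termination could look like in that case. It assumes the same origins, directions, and implicit_fn interface as my code above; it is illustrative only and not the implementation I used.
def sphere_trace_early_exit(implicit_fn, origins, directions, max_iters=64, eps=1e-4):
    # Same state as before, plus a record of which rays are still marching.
    dist_traveled = torch.zeros_like(origins[:, 0])
    converged = torch.zeros(origins.shape[0], dtype=torch.bool, device=origins.device)
    points = origins.clone()
    for _ in range(max_iters):
        active = ~converged
        if not active.any():
            break  # every ray has hit a surface
        # Query the (expensive) SDF only for rays that have not converged yet.
        dist = implicit_fn(points[active]).view(-1)
        dist_traveled[active] = dist_traveled[active] + dist
        points[active] = origins[active] + directions[active] * dist_traveled[active][:, None]
        converged[active] = dist < eps
    return points, converged[:, None]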
For the MLP (point) loss I use an L1 loss (MAE) with a target of 0, and for the eikonal loss I use an L2 loss (MSE) with a target of 1. I also tried an L2 loss for the MLP, but it did not give results as good - my guess is that as the SDF tended toward zero, the squared loss was no longer strong enough to penalize the MLP. The code for both is straightforward.
# Point loss: the SDF should evaluate to zero at the sampled surface points
point_loss = F.l1_loss(distances, torch.zeros_like(distances))
# Eikonal loss: the SDF's gradient should have unit norm everywhere
norm = torch.linalg.norm(gradients, dim=1)
eikonal_loss = F.mse_loss(norm, torch.ones_like(norm))
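For context, the gradients in the eikonal term are the per-point gradients of the SDF with respect to the input coordinates. One way to obtain them is via autograd; the sketch below illustrates that idea and is not necessarily how the starter code computes them.
def sdf_gradient(implicit_fn, points):
    # Track gradients with respect to the query points themselves.
    points = points.clone().requires_grad_(True)
    distances = implicit_fn(points)
    # d(sdf)/d(xyz) for every point, shape (N, 3).
    (gradients,) = torch.autograd.grad(
        outputs=distances,
        inputs=points,
        grad_outputs=torch.ones_like(distances),
        create_graph=True,  # keep the graph so the eikonal loss is differentiable
    )
    return gradients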
I answer the questions from part 3.2 here. Given the large number of hyperparameters, and that the default settings already work quite well, I decided to explore modifying beta. I also noticed that training the model for longer than 50 epochs did not seem to improve results, and would likely increase the chance of overfitting to the provided viewing perspectives.
Beta .05 (default), 50 epochs
Beta .05 (default), 250 epochs
As we can see, there isn't much improvement, so I run all further experiments with 50 epochs. Below are the results for different values of beta, in increasing order.
Beta .001
Beta .01
Beta .05
Beta .25
We can see that a smaller beta results in cleaner geometry and images, while a high beta produces more holes, artifacts, and a blurrier render. I discuss why in the answers to the questions, but in short, a lower beta gives a more precise surface because the density is close to 1 (or alpha) at the surface and near 0 elsewhere. This also means that, when rendering, essentially a single point along each ray is responsible for the color (the rest of the points have near-zero density), which results in less blur.
In the paper, beta and alpha are learnable parameters; here, we fix them instead. Fixing the SDF-to-density conversion with a higher beta makes the network learn a more spread-out density: points near the surface share density that drops off gradually with distance. When beta is low, the density tends toward an indicator function, becoming 1 (or alpha) at the surface and 0 elsewhere.
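For concreteness, the conversion in the VolSDF paper applies the CDF of a zero-mean Laplace distribution (scale beta) to the negated SDF and scales by alpha. I assume the fixed conversion used here takes the same form; the sketch below is written from the paper rather than copied from the starter code.
def sdf_to_density(signed_distance, alpha, beta):
    # Laplace CDF of the negated SDF: saturates at alpha inside the object,
    # decays like exp(-sdf / beta) outside, and transitions over a band of
    # width ~beta around the surface. Smaller beta -> sharper transition.
    s = -signed_distance
    return alpha * torch.where(
        s <= 0,
        0.5 * torch.exp(s / beta),
        1.0 - 0.5 * torch.exp(-s / beta),
    )
As a rough illustration with this formula, a point 0.1 units outside the surface has density of roughly 0.07 * alpha when beta = 0.05 but roughly 0.34 * alpha when beta = 0.25, which is why the larger beta spreads density (and color contributions) over a thicker shell around the surface.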
I think the SDF would be harder to train with a very low beta. Near the surface, a tiny step in any XYZ direction would produce a very large change in density, yet we want the SDF to be smooth across the surface and outward from it in order to give accurate distances.
The surface will be more accurate with a very low beta, for the same reasons as above: as the SDF tends to zero we get a very sharp peak in density, which drops back to near zero even for small positive values of the SDF.
I create a config named "scene.yaml" and modify the code to accept an SDF type of "scene". This config has arguments for the number of rendered objects (chosen at random from boxes, spheres, and tori) and for whether to take the union or the intersection of them (intersection is selected with union: False), as described below. A sketch of the primitive distance functions follows; after that, I render the scene with different numbers of objects.
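The distance functions for these primitives are the standard analytic SDFs; below is a sketch of the three I sample from, written directly from the usual closed-form expressions, so the exact parameterization may differ from the starter code's classes.
def sphere_sdf(points, center, radius):
    # Distance to a sphere: distance to the center minus the radius.
    return torch.linalg.norm(points - center, dim=-1) - radius

def box_sdf(points, center, half_sides):
    # Distance to an axis-aligned box with the given half side lengths.
    q = torch.abs(points - center) - half_sides
    outside = torch.linalg.norm(torch.clamp(q, min=0.0), dim=-1)
    inside = torch.clamp(q.max(dim=-1).values, max=0.0)
    return outside + inside

def torus_sdf(points, center, radii):
    # radii = (major_radius, minor_radius); the torus lies in the xz-plane.
    p = points - center
    xz = torch.linalg.norm(p[..., [0, 2]], dim=-1) - radii[0]
    return torch.sqrt(xz ** 2 + p[..., 1] ** 2) - radii[1]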
"scene.yaml", count: 25
"scene.yaml", count: 50
"scene.yaml", count: 75
As we can see, with 75 objects the scene starts to get cluttered. This gave me the idea of computing intersections to obtain new shapes.
However, naively taking the intersection of all objects results in a blank scene, as no region of space lies inside ALL 75 objects. What I want instead is to preserve any region where at least two objects intersect.
While there are likely more efficient approaches using spatial data structures or rendering-specific algorithms, I use a brute-force O(N^2) method: for each object, I compute its intersection with every other object and take the union of those intersections; the union of all of these pieces is then taken at the end.
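In SDF terms, a union is a pointwise min over the per-object distances and an intersection is a pointwise max. Here is a sketch of the brute-force pairwise scheme just described, assuming sdfs is a list of callables like the primitives above; my actual code operates on the assignment's SDF classes, so the details differ.
def scene_sdf(points, sdfs, union=True):
    # Distances from every object to every query point, shape (num_objects, N).
    dists = torch.stack([sdf(points) for sdf in sdfs], dim=0)
    if union:
        # Plain union of all objects: the closest surface wins.
        return dists.min(dim=0).values
    # Pairwise intersections (assumes at least two objects): for each object i,
    # intersect it (max) with every other object j, keep the closest such
    # intersection (min over j), then union all of those pieces together.
    pieces = []
    for i in range(dists.shape[0]):
        others = torch.cat([dists[:i], dists[i + 1:]], dim=0)
        pairwise = torch.maximum(dists[i:i + 1], others)  # intersect i with each j
        pieces.append(pairwise.min(dim=0).values)         # union over j
    return torch.stack(pieces, dim=0).min(dim=0).values   # union over i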
Here is the result with intersection on, and 100 objects.
"scene.yaml", count: 100, union: False
We see very strange, complex partial shapes. This is a good illustration of how, by taking intersections and unions of simple objects, we can produce very complex final shapes and renderings.