Luyuan Wang (luyuanw@andrew.cmu.edu)
vis_grid:

vis_rays:


Rendered color:

Depth map:
I can generate a depth map similar to the given one with the following code:
depth = self._aggregate(weights, depth_values.view(-1, n_pts, 1))  # expected depth: weighted sum of sample depths over all samples along each ray

The result is shown below:

However, this approach doesn't look correct to me. For a depth map, we shouldn't consider the weights behind the nearest surface. If we say the dark purple color represents infinity and yellow represents 0 (the opposite of matplotlib's default colormap, so this looks more like an inverse depth map), then the edge facing toward the camera should be yellow, but it shows green in the figure.
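For context, here is a minimal sketch of how the volume-rendering weights are typically computed; it follows the standard NeRF formulation and assumes density and deltas have shape (n_rays, n_pts, 1), so it is not necessarily identical to the course code:

import torch

def compute_weights(deltas, density, eps=1e-10):
    # alpha_i = 1 - exp(-sigma_i * delta_i): opacity contributed by sample i
    alphas = 1.0 - torch.exp(-density * deltas)
    # T_i = prod_{j < i} (1 - alpha_j): fraction of light that survives to sample i
    ones = torch.ones_like(alphas[:, :1])
    transmittance = torch.cumprod(torch.cat([ones, 1.0 - alphas + eps], dim=1), dim=1)[:, :-1]
    # w_i = T_i * alpha_i: nonzero for every sample, including those behind the first surface
    return transmittance * alphas

The first depth map above is exactly the weighted sum of the sample depths with these weights, i.e. an expected depth over the whole ray, which is why samples behind the first surface still contribute.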
A correct way to generate a depth map should be: measure the distance from the camera center and keep accumulating distance along the ray until it reaches the first obstacle (density > some threshold). My code is shown below:
dens = density.view(-1, n_pts, 1)
indices = torch.ones_like(dens, dtype=torch.int) * torch.arange(0, dens.shape[1]).reshape(1, -1, 1).to(dens.device)
idx = torch.where(dens > 1, indices, dens.shape[1] - 1)  # sample index where density exceeds the threshold, else n_pts - 1
idx = idx.min(dim=1, keepdim=False)[0]  # index of the first above-threshold sample along each ray
idx = idx.squeeze()
indices = indices.squeeze()
depth_weight = torch.zeros_like(indices)
depth_weight[indices <= idx.unsqueeze(-1)] = 1  # weight 1 for every sample up to and including the first hit
depth_weight = depth_weight.unsqueeze(-1)
depth = self._aggregate(depth_weight, deltas).squeeze()
depth = 1 / depth  # inverse depth map

The generated depth map:

I think this should be the correct answer.
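As a side note, the same first-hit weights can also be built without explicit index tensors. This is only a compact sketch under the same assumptions as above (dens of shape (n_rays, n_pts, 1), a density threshold of 1, and _aggregate being a weighted sum over the sample dimension):

hit = (dens > 1).float()                      # 1 where density exceeds the threshold
past_first_hit = hit.cumsum(dim=1) - hit      # number of hits strictly before each sample
depth_weight = (past_first_hit == 0).float()  # 1 up to and including the first hit (all ones if a ray never hits)
depth = self._aggregate(depth_weight, deltas).squeeze()
depth = 1 / depth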

I used a network architecture similar to the one in the original NeRF paper (a rough sketch of what I mean is included after the result image). The result on low-resolution input is shown below. It looks nice and similar to the given GIF image.

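For reference, this is roughly the shape of the network I mean: an 8-layer, 256-unit trunk with a skip connection that re-injects the embedded position, a view-independent density head, and a view-dependent color head. It is only a sketch following the original paper; the class and parameter names here are illustrative rather than copied from my actual code.

import torch
import torch.nn as nn

class HarmonicEmbedding(nn.Module):
    # Positional encoding: x -> (sin(2^k * pi * x), cos(2^k * pi * x)) for k = 0..n_freqs-1.
    def __init__(self, n_freqs):
        super().__init__()
        self.register_buffer("freqs", (2.0 ** torch.arange(n_freqs)) * torch.pi)

    def forward(self, x):                       # x: (..., 3)
        xb = x[..., None] * self.freqs          # (..., 3, n_freqs)
        return torch.cat([xb.sin(), xb.cos()], dim=-1).flatten(-2)

class NeRFMLP(nn.Module):
    def __init__(self, n_freqs_xyz=10, n_freqs_dir=4, hidden=256):
        super().__init__()
        self.embed_xyz = HarmonicEmbedding(n_freqs_xyz)
        self.embed_dir = HarmonicEmbedding(n_freqs_dir)
        d_xyz = 3 * 2 * n_freqs_xyz
        d_dir = 3 * 2 * n_freqs_dir
        # First half of the trunk (4 layers); the skip connection re-injects the embedded position.
        self.trunk1 = nn.Sequential(
            nn.Linear(d_xyz, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.trunk2 = nn.Sequential(
            nn.Linear(hidden + d_xyz, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.density_head = nn.Linear(hidden, 1)        # view-independent density
        self.feature = nn.Linear(hidden, hidden)
        self.color_head = nn.Sequential(                # view-dependent RGB
            nn.Linear(hidden + d_dir, hidden // 2), nn.ReLU(),
            nn.Linear(hidden // 2, 3), nn.Sigmoid(),
        )

    def forward(self, xyz, directions):
        x = self.embed_xyz(xyz)
        h = self.trunk2(torch.cat([self.trunk1(x), x], dim=-1))
        density = torch.relu(self.density_head(h))      # keep density non-negative
        rgb = self.color_head(torch.cat([self.feature(h), self.embed_dir(directions)], dim=-1))
        return density, rgb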
With the default settings:

After I changed the number of sampled points to 56, which is smaller than the default setting:

It significantly speeds up the training process. However, the image quality is lower and some of the details are missing.
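The speed-up makes sense: the MLP is evaluated once per sample, so the cost per ray scales roughly linearly with the number of sampled points. The change itself is just a sampler setting, along the lines of the snippet below (the names are hypothetical; the exact config key depends on the codebase):

sampler_config = {
    "n_pts_per_ray": 56,  # reduced from the default; fewer samples means fewer MLP evaluations per ray
}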