Luyuan Wang (luyuanw@andrew.cmu.edu)
vis_grid:
vis_rays:
Rendered color:
Depth map:
I can generate a depth map similar to the given one with the following code:
```python
depth = self._aggregate(weights, depth_values.view(-1, n_pts, 1))
```
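Here `_aggregate` is the helper that takes the per-sample volume-rendering weights and forms a weighted sum along each ray. For reference, a minimal sketch of such a helper (the standalone name `aggregate` and the tensor shapes are my assumptions, based only on how it is called above):

```python
import torch

def aggregate(weights: torch.Tensor, values: torch.Tensor) -> torch.Tensor:
    """
    Weighted sum of per-sample values along each ray.

    weights: (n_rays, n_pts, 1) volume-rendering weights
    values:  (n_rays, n_pts, C) per-sample values (colors, depths, ...)
    returns: (n_rays, C)
    """
    return torch.sum(weights * values, dim=1)
```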
The result is shown below:
However, this approach doesn't look correct to me. For a depth map, we shouldn't consider the weights behind the nearest surface point. If we say dark purple represents infinity and yellow represents 0 (the opposite of matplotlib's default color map, so this looks more like an inverse depth map), then the edge facing toward the camera should be yellow, but it shows up green in the figure.
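To make this concrete, here is a toy single-ray example (the numbers are purely illustrative): when the weights are spread over samples behind the first surface, the weighted-average depth comes out larger than the actual distance to that surface, which is exactly the too-green edge seen above.

```python
import torch

# Toy single ray, illustrative numbers only: the first surface sits at depth 2.0,
# but the volume-rendering weights also cover samples behind it.
depth_values = torch.tensor([1.0, 2.0, 3.0, 4.0])
weights      = torch.tensor([0.0, 0.4, 0.4, 0.2])

weighted_depth = torch.sum(weights * depth_values)
print(weighted_depth)  # 2.8 -- behind the actual surface at 2.0
```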
A correct way to generate a depth map should be like this: measure the distance from the camera center and keep accumulating it along the ray until we reach the first obstacle (density > some threshold). My code is shown below:
```python
# Per-sample densities along each ray: (n_rays, n_pts, 1)
dens = density.view(-1, n_pts, 1)

# Sample indices 0..n_pts-1, broadcast to the same shape as dens
indices = torch.ones_like(dens, dtype=torch.int) * torch.arange(0, dens.shape[1]).reshape(1, -1, 1).to(dens.device)

# Keep a sample's index if its density exceeds the threshold, otherwise fall back
# to the last index; the minimum over samples is the index of the first obstacle.
idx = torch.where(dens > 1, indices, dens.shape[1] - 1)
idx = idx.min(dim=1, keepdim=False)[0]
idx = idx.squeeze()
indices = indices.squeeze()

# Binary weights: 1 for every sample up to and including the first obstacle
depth_weight = torch.zeros_like(indices)
depth_weight[indices <= idx.unsqueeze(-1)] = 1
depth_weight = depth_weight.unsqueeze(-1)

# Accumulate the per-sample step lengths (deltas) up to the first obstacle
depth = self._aggregate(depth_weight, deltas).squeeze()
depth = 1 / depth  # inverse depth map, to match the given visualization
```
The generated depth map:
I think this should be the correct answer.
I used a network architecture similar to the one in the original NeRF paper; a rough sketch of the kind of MLP I mean is included below. The result on the low-resolution input looks good and is similar to the given GIF image.
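The sketch below only illustrates the general structure; the layer widths, number of encoding frequencies, and skip-connection position are illustrative assumptions, not my exact settings:

```python
import torch
import torch.nn as nn

def positional_encoding(x: torch.Tensor, n_freqs: int) -> torch.Tensor:
    # Map each coordinate to [sin(2^k * x), cos(2^k * x)] for k = 0..n_freqs-1.
    freqs = 2.0 ** torch.arange(n_freqs, device=x.device)
    xf = x[..., None] * freqs                       # (..., dim, n_freqs)
    enc = torch.cat([torch.sin(xf), torch.cos(xf)], dim=-1)
    return enc.flatten(-2, -1)                      # (..., dim * 2 * n_freqs)

class NeRFMLP(nn.Module):
    def __init__(self, n_freqs_xyz: int = 10, n_freqs_dir: int = 4, hidden: int = 256):
        super().__init__()
        in_xyz = 3 * 2 * n_freqs_xyz
        in_dir = 3 * 2 * n_freqs_dir
        self.n_freqs_xyz, self.n_freqs_dir = n_freqs_xyz, n_freqs_dir

        # Trunk: 8 hidden layers, with the encoded position re-injected halfway (skip connection).
        self.trunk1 = nn.Sequential(
            nn.Linear(in_xyz, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.trunk2 = nn.Sequential(
            nn.Linear(hidden + in_xyz, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.density_head = nn.Linear(hidden, 1)
        self.feature = nn.Linear(hidden, hidden)
        # Color depends on both the trunk features and the encoded view direction.
        self.color_head = nn.Sequential(
            nn.Linear(hidden + in_dir, hidden // 2), nn.ReLU(),
            nn.Linear(hidden // 2, 3), nn.Sigmoid(),
        )

    def forward(self, xyz: torch.Tensor, viewdir: torch.Tensor):
        enc_xyz = positional_encoding(xyz, self.n_freqs_xyz)
        enc_dir = positional_encoding(viewdir, self.n_freqs_dir)
        h = self.trunk1(enc_xyz)
        h = self.trunk2(torch.cat([h, enc_xyz], dim=-1))
        density = torch.relu(self.density_head(h))   # non-negative density
        color = self.color_head(torch.cat([self.feature(h), enc_dir], dim=-1))
        return density, color
```

The pieces carried over from the paper are the positional encoding of positions and view directions, the skip connection in the middle of the trunk, and the view-direction-dependent color head.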
With the default settings:
After reducing the number of sampled points per ray to 56, which is smaller than the default setting:
This significantly sped up the training process. However, the image quality is lower and some of the details are missing.