Author: Zhe Huang (zhehuang@andrew.cmu.edu)
!pwd
/home/zheh/Documents/CMU_16_889_22spring/assignment3
!which python
/home/zheh/anaconda3/envs/l3d/bin/python
import torch
import pytorch3d
import imageio
import numpy as np
!pip install -q mediapy
import mediapy as media
from pytorch3d.ops import sample_points_from_meshes
grid_ray = {
'grid': media.read_image('outputs/p1_3_grid.jpg'),
'rays': media.read_image('outputs/p1_3_ray.jpg'),
}
media.show_images(grid_ray, height=500)
media.show_image(media.read_image('outputs/p1_4.jpg'), height=500)
media.show_video(media.read_video('images/part_1.gif'), codec='gif', title='feature', height=500)
media.show_image(media.read_image('outputs/p1_5.jpg'), title='depth', height=375)
After training and rounding to the nearest hundredth:
Box center: (0.25, 0.25, 0.00)
Box side lengths: (2.01, 1.50, 1.50)
ours_theirs = {
'my result': media.read_video('images/part_2.gif'),
'ref result': media.read_video('ta_images/part_2.gif'),
}
media.show_videos(ours_theirs, codec='gif', height=500)
The network structure (e.g. layers, types) of my NeRF implementation follows the original NeRF paper, including the positional embeddings. The model is not view-dependent, so the output layer is just an nn.Linear layer with output size $(N_{points}, 4)$, where each point has 3 feature values (with sigmoid activation) and 1 density value (with ReLU activation to avoid negative values). All hyperparameters are the default values from nerf_lego.yaml.
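For reference, below is a minimal sketch of such a view-independent model. The class and argument names (PositionalEncoding, ViewIndependentNeRF, n_frequencies, hidden, n_layers) are placeholders of my own, the layer count and hidden width are not the actual values, and the xyz skip connection from the paper is omitted for brevity; only the output head (a single nn.Linear producing 3 sigmoid-activated features and 1 ReLU-activated density per point) mirrors the description above.

```python
import torch
import torch.nn as nn

class PositionalEncoding(nn.Module):
    """Maps xyz to [sin(2^k x), cos(2^k x)] features, as in the NeRF paper."""
    def __init__(self, n_frequencies: int):
        super().__init__()
        self.register_buffer(
            "freqs", 2.0 ** torch.arange(n_frequencies, dtype=torch.float32)
        )

    def forward(self, x):
        # x: (N_points, 3) -> (N_points, 3 * 2 * n_frequencies)
        scaled = x[..., None] * self.freqs                  # (N_points, 3, n_freq)
        enc = torch.cat([scaled.sin(), scaled.cos()], dim=-1)
        return enc.flatten(start_dim=-2)

class ViewIndependentNeRF(nn.Module):
    """Sketch of a view-independent NeRF: MLP over embedded xyz, no direction input."""
    def __init__(self, n_frequencies=6, hidden=128, n_layers=6):
        super().__init__()
        self.embed = PositionalEncoding(n_frequencies)
        layers, dim = [], 3 * 2 * n_frequencies
        for _ in range(n_layers):
            layers += [nn.Linear(dim, hidden), nn.ReLU()]
            dim = hidden
        self.mlp = nn.Sequential(*layers)
        self.out = nn.Linear(hidden, 4)          # 3 feature values + 1 density per point

    def forward(self, points):
        h = self.mlp(self.embed(points))
        raw = self.out(h)                        # (N_points, 4)
        color = torch.sigmoid(raw[..., :3])      # features squashed to [0, 1]
        density = torch.relu(raw[..., 3:])       # density kept non-negative
        return color, density
```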
p3_results = {
'after 50 epochs': media.read_video('images/part_3_50.gif'),
'after 100 epochs': media.read_video('images/part_3_100.gif'),
'after 150 epochs': media.read_video('images/part_3_150.gif'),
'after 200 epochs': media.read_video('images/part_3_200.gif'),
'after 250 epochs': media.read_video('images/part_3_250.gif'),
}
media.show_videos(p3_results, codec='gif', height=200)
Here I tweak the network structure from part 3 slightly by removing the simple output layer and adding directional embedding layers. I also adopt a new set of hyperparameters, listed below; an illustrative sketch of the directional color branch follows the list.
n_harmonic_functions_xyz: 9
n_harmonic_functions_dir: 9
n_hidden_neurons_xyz: 256
n_hidden_neurons_dir: 128
density_noise_std: 0.01
n_layers_xyz: 9
append_xyz: [4]
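The sketch below shows one way the directional branch can be wired under these hyperparameters. The ViewDependentHead name, its argument names, and the exact wiring are my own simplification (the xyz trunk, skip connections, and density noise are omitted), not the actual implementation; with n_harmonic_functions_dir = 9, the direction embedding has 2 * 3 * 9 = 54 dimensions, and the color branch sees the 256-dim xyz feature concatenated with that embedding.

```python
import torch
import torch.nn as nn

class ViewDependentHead(nn.Module):
    """Illustrative color/density head conditioned on the view direction.

    Assumes the xyz trunk already produced a per-point feature of size
    n_hidden_xyz; density is predicted from that feature alone, while the
    color also sees the harmonic embedding of the ray direction.
    """
    def __init__(self, n_hidden_xyz=256, n_hidden_dir=128, dir_embed_dim=2 * 3 * 9):
        super().__init__()
        self.density = nn.Sequential(nn.Linear(n_hidden_xyz, 1), nn.ReLU())
        self.color = nn.Sequential(
            nn.Linear(n_hidden_xyz + dir_embed_dim, n_hidden_dir),
            nn.ReLU(),
            nn.Linear(n_hidden_dir, 3),
            nn.Sigmoid(),
        )

    def forward(self, xyz_feature, dir_embedding):
        sigma = self.density(xyz_feature)                            # (N_points, 1)
        rgb = self.color(torch.cat([xyz_feature, dir_embedding], dim=-1))
        return rgb, sigma
```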
Here is the result after training for 100 epochs. It is a little blurry due to the limited training, but we can still clearly see that the luminance of the ground changes as the viewpoint changes, thanks to the view dependence.
media.show_video(media.read_video('images/part_4_dir.gif'), codec='gif', height=500)
Here I still use the view-independent NeRF, as it is faster to train. For the full model, I use the same hyperparameters as in part 4.1. For the lite model, I use the default settings from nerf_lego_highres.yaml. This is the result after training for 100 epochs.
highres_results = {
'lite model': media.read_video('images/part_4_lite.gif'),
'full model': media.read_video('images/part_4_full.gif'),
}
media.show_videos(highres_results, codec='gif', height=500)