Assignment 1. Rendering basics with Pytorch3D

16-889 Learning for 3D Vision

Soyong Shin

Due by Feb. 10 (Thu)

Introduction

In this assignment, we learn basic rendering techniques using Pytorch3D.



Task 1. Practicing with Cameras

Subtask 1. 360-degree Render (5 points)

In this section, we practiced rendering a given mesh from varying camera views. This task was done with the following steps:

  1. Load vertices and faces
  2. Build the mesh with a basic color
  3. Create the list of camera poses (R, T) to render the mesh
  4. Render the mesh from each camera view and stack the images
  5. Create the .gif file

I created 30 views by varying the azimuth while fixing the elevation at 30 degrees. I also fixed the distance and FoV of the cameras to 2.7 and 60, respectively. This is done by creating 30 sets of rotation matrices (R) and translations (T) using pytorch3d.renderer.look_at_view_transform. The generation of multiple views is implemented in the code below:

# Set elevation and azimuth of the views
elev = torch.ones(num_views) * 30
azim = torch.linspace(-180, 180, num_views)

# Create corresponding camera extrinsics
Rs, Ts = pytorch3d.renderer.look_at_view_transform(
    dist=2.7, elev=elev, azim=azim, device=self.device)

# Every iteration, create a pytorch3d camera instance with a different view
for view in tqdm(range(num_views), desc='Rendering ...'):
    R = Rs[view].unsqueeze(0)
    T = Ts[view].unsqueeze(0)
    cameras = pytorch3d.renderer.FoVPerspectiveCameras(
        R=R, T=T, fov=60, device=self.device)

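A minimal sketch of the remaining steps (rendering each view and writing the GIF), reusing the get_mesh_renderer helper that also appears in Task 6; the output path is a placeholder and the exact rendering loop in my code may differ slightly:

import imageio
import numpy as np

renderer = get_mesh_renderer(image_size=256)  # starter helper, as in Task 6
images = []
for view in range(num_views):
    cameras = pytorch3d.renderer.FoVPerspectiveCameras(
        R=Rs[view].unsqueeze(0), T=Ts[view].unsqueeze(0), fov=60, device=self.device)
    image = renderer(mesh, cameras=cameras, lights=self.lights)
    image = image.cpu().numpy()[0, ..., :3]          # (H, W, 3) in [0, 1]
    images.append((image * 255).astype(np.uint8))

imageio.mimsave('output/task1/subtask1.gif', images, fps=15)
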
Figure 1 below shows the results.

Figure 1. Results of Task 1.1.


Subtask 2. Recreating the Dolly Zoom (10 points)

In this section, we create a Dolly Zoom video that gradually zooms into the cow mesh using different sets of FoV and T. Under a fixed camera distance (d), the mesh appears zoomed out as the FoV increases (Figure 2). Therefore, we need to adjust d to offset the effect of the FoV and even make the object look closer at a large FoV.

Figure 2. Camera field of view.

Thus, I made $d$ proportional to $(FoV)^{-2}$. To make the results similar to the sample GIF file, I set $d = 1.75 \times 10^4 \times (FoV)^{-2}$.
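For reference, a pinhole-camera sketch of the exact compensation that keeps the subject's image size constant is given below; the $(FoV)^{-2}$ form above is simply an empirical fit to the sample GIF over the sampled FoV range:

$$\frac{w_{img}}{w_{obj}} \propto \frac{1}{d\,\tan(FoV/2)} \quad\Rightarrow\quad d \propto \frac{1}{\tan(FoV/2)}$$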

fovs = torch.linspace(5, 120, num_frames)
for fov in tqdm(fovs, desc='Rendering ...'):
		distance = (1.75 * 1e4)/(fov ** 2)
		T = [[0, 0, distance]]
		cameras = pytorch3d.renderer.FoVPerspectiveCameras(
				fov=fov, T=T, device=self.device)

The result is shown below (Figure 3).

Figure 3. Results of Task 1.2.



Task 2. Practicing with Meshes

Subtask 1. Constructing a Tetrahedron (5 points)

In this section, instead of using the given mesh, we practiced building our own geometries and rendering them. Here, I created a tetrahedron and rendered it.

To create the tetrahedral vertices, I set three vertices on the X-Y plane and one vertex with a non-zero Z value. Since every vertex in a tetrahedron is connected to every other vertex, the faces are simply all combinations of three vertices.

vertices = torch.tensor([[[0, 2, 0], [1, 0, 0], [-1, 0, 0], [0, 1, 2]]],
                        dtype=torch.float32) * 0.5
faces = torch.tensor([[[0, 1, 2], [0, 1, 3], [0, 2, 3], [1, 2, 3]]],
                     dtype=torch.int64)
vertices -= vertices.mean(1, keepdims=True)  # Center vertices at the origin

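A minimal sketch of wrapping these tensors into a renderable mesh with a single base color (the particular color here is an arbitrary choice):

# Paint every vertex with one base color and build the Meshes object
color = torch.tensor([0.7, 0.7, 1.0])
textures = pytorch3d.renderer.TexturesVertex(verts_features=torch.ones_like(vertices) * color)
mesh = pytorch3d.structures.Meshes(verts=vertices, faces=faces, textures=textures)
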
I used the same camera configuration as in Task 1.1. The results are shown in Figure 4.

Figure 4. Results of Task 2.1.


Subtask 2. Constructing a Cube (5 points)

Similar to Task 2.1, in this section I created a cube consisting of 8 vertices and 12 triangular faces. Although each face of a cube is a square with 4 vertices, meshes in pytorch3d must be composed of triangles. Therefore, I used 2 triangular faces for each square face. The code and results are as below:

vertices = torch.tensor([[[1, 1, 1], [1, 1, -1], [1, -1, 1], [1, -1, -1],
                          [-1, 1, 1], [-1, 1, -1], [-1, -1, 1], [-1, -1, -1]]],
                        dtype=torch.float32) * 0.5
faces = torch.tensor([[[0, 1, 2], [1, 2, 3], [0, 4, 6], [0, 2, 6],
                       [0, 4, 1], [4, 1, 5], [2, 3, 6], [6, 7, 3],
                       [5, 4, 6], [6, 7, 5], [5, 7, 1], [3, 7, 1]]],
                     dtype=torch.int64)
vertices -= vertices.mean(1, keepdims=True)  # Center vertices at the origin

Figure 5. Results of Task 2.2.



Task 3. Re-texturing a mesh (10 points)

In this section, we generated a texture for a mesh as a gradation of two colors. As described in the assignment, the Z axis of the cow is the axis the cow is facing along. Therefore, in order to color the cow with a gradation in that direction, we need to assign color values based on the Z coordinate. I used red (1, 0, 0) as the front color and blue (0, 0, 1) as the back color. This is implemented as below:

color1, color2 = torch.tensor([1, 0, 0]), torch.tensor([0, 0, 1])
z = vertices[0, :, -1]
alpha = ((z - z.min()) / (z.max() - z.min())).unsqueeze(-1).repeat(1, 3)
color = alpha * color2 + (1 - alpha) * color1

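A minimal sketch of attaching the per-vertex color tensor back to the mesh, assuming vertices and faces were loaded from the cow mesh (the exact helper in my code may differ):

# Wrap the per-vertex colors as a vertex texture and rebuild the mesh
textures = pytorch3d.renderer.TexturesVertex(verts_features=color.unsqueeze(0))
mesh = pytorch3d.structures.Meshes(verts=vertices, faces=faces, textures=textures)
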
I compared the result with uniformly colored cows.

Figure 6. Results of Task 3.

I also re-textured the cow mesh using the X and Y coordinates, as shown in Figure 7; this is helpful for checking the coordinate system of the cow. This information is used in Task 4.

Figure 7. Texturing the cow by its X and Y coordinates, respectively



Task 4. Camera Transformations (20 points)

In this task, we learn the camera transformation rules by rendering the given mesh from certain views. From Figure 7, we learned the coordinate system of the cow mesh. The coordinate systems of the camera and the object (cow) are shown below:

Figure 8. Coordinate systems of the camera and the object
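Throughout this task, the relative rotation and translation (R_rel and T_rel in the snippets below) are composed with a base camera placed 3 units in front of the cow along Z. A minimal sketch of this composition, assuming the starter code's convention (the exact composition code may differ):

# Compose the relative transform with the base camera pose (distance 3 along Z)
R_base = torch.eye(3)
T_base = torch.tensor([0., 0., 3.])
R = R_rel @ R_base
T = R_rel @ T_base + T_rel
cameras = pytorch3d.renderer.FoVPerspectiveCameras(
    R=R.unsqueeze(0), T=T.unsqueeze(0), device=self.device)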

The first scenario is to rotate the cow about the Z axis. Following the coordinate systems above, this is a $-90^\circ$ rotation of the relative camera coordinates about the Z axis, so R_rel and T_rel are given as:

import numpy as np
from scipy.spatial.transform import Rotation as _R

r = _R.from_rotvec(np.pi / 2 * np.array([0, 0, -1]))  # -90 deg rotation about Z
R_rel = torch.from_numpy(r.as_matrix()).float()
T_rel = torch.tensor([0, 0, 0])
Figure 9. The results of Task 4, scenario 1.

The second scenario is to move the cow backward. This translation is along the positive Z axis in camera coordinates. Therefore, the solution is obtained as:

R_rel = torch.eye(3)
T_rel = torch.tensor([0, 0, 3])
Figure 10. The results of Task 4, scenario 2.

The next scenario involves several operations. First of all, the cow appears slightly tilted (this might not be intended, but I included this rotation) and is then moved to the bottom-left corner of the camera plane. The rotation is in the positive direction of the Y axis, and the relative translation of the camera should be opposite to the cow's displacement: negative Y and positive X.

r = _R.from_rotvec(np.pi/60 * np.array([0, 1, 0]))
R_rel = torch.from_numpy(r.as_matrix()).float()
T_rel = torch.tensor([0.2, -0.6, 0]).float()
Figure 11. The results of Task 4, scenario 3.

To achieve scenario 4, the camera should be relatively rotated in the positive Y direction. Once the camera is rotated, in order to keep the cow in view, the camera should move along the negative X and positive Z axes. The initial relative location of the cow is (0, 0, 3), and it changes to (3, 0, 0) after the 90-degree rotation. Therefore, we need to translate it back to (0, 0, 3).

r = _R.from_rotvec(np.pi / 2 * np.array([0, 1, 0]))  # +90 deg rotation about Y
R_rel = torch.from_numpy(r.as_matrix()).float()
T_rel = torch.tensor([-3, 0, 3]).float()
Figure 12. The results of Task 4, scenario 4.



Task 5. Rendering Generic 3D Representations

Subtask 1. Rendering Point Clouds from RGB-D Images (10 points)

Here, we learned how to render point clouds using pytorch3d. Following the starter code, I extracted point and color information using the unproject_depth_image function. Since some points are given as NaN values, I removed them. Once I obtained the point clouds from the two images, I stacked them into a single point cloud.

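A minimal sketch of the per-image unprojection and NaN filtering, assuming the starter's unproject_depth_image signature (the exact argument names and return shapes may differ):

# Unproject one RGB-D image into world-space points with per-point colors
points, colors = unproject_depth_image(image, mask, depth, camera)
valid = ~torch.isnan(points).any(dim=-1)   # drop NaN points
verts1 = points[valid].unsqueeze(0)        # (1, N, 3)
color1 = colors[valid, :3].unsqueeze(0)    # (1, N, 3), keep RGB only
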
# vertices = [verts1, verts2]
# colors = [color1, color2]
verts = torch.cat(vertices, dim=1)
color = torch.cat(colors, dim=1)
verts -= verts.mean(1, keepdims=True)
point_cloud = pytorch3d.structures.Pointclouds(points=verts, features=color)

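To render the joined point cloud, a short sketch assuming the starter provides a get_points_renderer helper analogous to get_mesh_renderer, with a placeholder camera distance:

# Render the joined point cloud from one view
renderer = get_points_renderer(image_size=256, background_color=(1, 1, 1))
R, T = pytorch3d.renderer.look_at_view_transform(dist=7, elev=0, azim=0)
cameras = pytorch3d.renderer.FoVPerspectiveCameras(R=R, T=T, device=self.device)
image = renderer(point_cloud.to(self.device), cameras=cameras)
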
The results are shown below.

Figure 13. Point cloud from first image
Figure 14. Point cloud from the second image
Figure 15. Point cloud of the union of two images

Subtask 2. Parametric Functions (10 points)

In this task, we created a point cloud using a parametric function and rendered it with different numbers of samples. The parametric function for a torus is given as:

$$x(\theta, \psi) = (R + r \cos\theta)\cos\psi$$
$$y(\theta, \psi) = (R + r \cos\theta)\sin\psi$$
$$z(\theta, \psi) = r \sin\theta$$

where $\theta, \psi$ are angles that sweep a full circle, $R$ is the distance from the center of the torus to the center of the tube, and $r$ is the radius of the tube.

The code implementation is as below:

phi = torch.linspace(0, 2 * np.pi, num_samples)
theta = torch.linspace(0, 2 * np.pi, num_samples)
Phi, Theta = torch.meshgrid(phi, theta)

x = torch.cos(Phi) * (R + r * torch.cos(Theta))
y = torch.sin(Phi) * (R + r * torch.cos(Theta))
z = r * torch.sin(Theta)

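A minimal sketch of flattening these grids into a colored point cloud (the position-based coloring is an arbitrary choice):

# Flatten the grid into a (1, N, 3) point set and color points by position
points = torch.stack([x.flatten(), y.flatten(), z.flatten()], dim=-1).unsqueeze(0)
colors = (points - points.min()) / (points.max() - points.min())
torus_pc = pytorch3d.structures.Pointclouds(points=points, features=colors)
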
Setting R = 1 and r = 0.5, I generated GIF files while varying the number of samples.

Figure 16. Results of Task 5.2.

I also changed the radius of the tube.

Figure 17. Results of Task 5.2 (2)


Subtask 3. Implicit Surfaces (15 points)

Now, I created the torus using an implicit surface. In this case, we need to define the surface function of the torus, which is given as:

$$f(x, y, z) = (\sqrt{x^2 + y^2} - R)^2 + (z^2 - r^2)$$

The code is as below:

min_value = -1.6
max_value = 1.6
X, Y, Z = torch.meshgrid([torch.linspace(min_value, max_value, voxel_size)] * 3)
voxels = ((X * X + Y * Y) ** (0.5) - R) ** 2 + (Z * Z - r * r)

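A minimal sketch of extracting the zero level set as a mesh with marching cubes, mirroring the mcubes usage in Task 6 below (the uniform white vertex texture is an arbitrary choice):

import mcubes

# Extract the zero level set and rescale vertices from voxel indices to world coordinates
verts, faces = mcubes.marching_cubes(mcubes.smooth(voxels), isovalue=0)
verts = torch.tensor(verts).float().unsqueeze(0)
faces = torch.tensor(faces.astype(int)).unsqueeze(0)
verts = verts / voxel_size * (max_value - min_value) + min_value
textures = pytorch3d.renderer.TexturesVertex(verts_features=torch.ones_like(verts))
torus_mesh = pytorch3d.structures.Meshes(verts=verts, faces=faces, textures=textures)
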
Here, by creating volumes with different numbers of voxels, I analyzed the resolution with respect to the number of voxels. The numbers of voxels per dimension are 16, 32, and 64, respectively.

Figure 18. Results of Task 5.3.

Compared to the point cloud, the implicit surface has the benefit that, once converted to a mesh, it provides connectivity between vertices.



Task 6. Do Something Fun! (10 points)

In this section, I created a colorful tunnel and moved the camera through it. To make the tunnel, I used the implicit function defined as:

$$f(x, y, z) = x^2 + y^2 - R^2, \qquad 0 \leq z \leq H$$

Also, I re-textured the tunnel using a gradation over a set of colors. This is done with a method similar to Task 3. To move the camera, I changed the translation $T$. The code implementation is as below.

# Scale of tunnel
R = 1; H = 30; volume_size = 32

# Color streams
colors = [(1, 1, 1), (1, 1, 0), (1, 0, 0), (0.5, 0.5, 1), (0, 1, 0.5),
        (0, 0, 1), (0, 1, 1), (0, 1, 0), (1, 1, 0),
        (1, 0, 0), (1, 0, 1), (1, 1, 1), (1, 1, 1)]

# Build tunnel's implicit surface
min_xy = -1.1; max_xy = 1.1; min_z = -0.1; max_z = H + 1;
x_grid = torch.linspace(min_xy, max_xy, volume_size)
y_grid = torch.linspace(min_xy, max_xy, volume_size)
z_grid = torch.linspace(min_z, max_z, volume_size * 15)
X, Y, Z = torch.meshgrid([x_grid, y_grid, z_grid])

# Create marching surface of the voxels
voxels = (X * X + Y * Y) - R * R
vertices, faces = mcubes.marching_cubes(mcubes.smooth(voxels), isovalue=0)
vertices = torch.tensor(vertices).float().unsqueeze(0)
faces = torch.tensor(faces.astype(int)).unsqueeze(0)

# Normalize tunnel and set Z-axis starts from 0
vertices = (vertices / volume_size) * (max_xy - min_xy) + min_xy
vertices[..., 2] -= min_xy

# Gradually adding colors
textures = torch.ones_like(vertices)
for idx, (color1, color2) in enumerate(zip(colors[:-1], colors[1:])):
  mask1 = vertices[..., -1] > (H * idx / len(colors) - 1e-4)
  mask2 = vertices[..., -1] < (H * (idx + 1) / len(colors) + 1e-4)
  mask = mask1 * mask2

  alpha = vertices[..., -1][mask]
  alpha = (alpha - alpha.min()) / (alpha.max() - alpha.min())
  alpha = alpha.unsqueeze(-1).repeat(1, 3)
  color_map = torch.tensor(color2) * alpha + torch.tensor(color1) * (1-alpha)
  textures[mask] = color_map

mesh = self._get_mesh(vertices, faces, textures).to(self.device)
renderer = get_mesh_renderer(image_size=self.image_size)

fov = 60
dists = torch.linspace(5, -1.2 * H, num_frames)
images = []
for dist in tqdm(dists, desc='Rendering ...'):
  T = [[0, 0, dist]]
  cameras = pytorch3d.renderer.FoVPerspectiveCameras(
      fov=fov, T=T, device=self.device)

  image = renderer(mesh, cameras=cameras, lights=self.lights)
  image = image.cpu().numpy()[0, ..., :3]
  images.append((image * 255).astype(np.uint8))

for i, r in enumerate(images):
  image= Image.fromarray(r)
  draw = ImageDraw.Draw(image)
  images[i] = np.array(image)

imageio.mimsave('output/task6/subtask1.gif', images, fps=25)

The result is quite fancy. The resulting GIF file is shown below.

References

  1. Field of view image from https://videoguys.com/blogs/news-and-sales/check-out-the-cool-field-of-view-calculator-for-panasonic-pro-ptz-cameras
  2. Implicit function and sampling of a torus from https://en.wikipedia.org/wiki/Torus