Assignment 1. Rendering basics with Pytorch3D
16-889 Learning for 3D Vision
Soyong Shin
Due by Feb. 10 (Thu)
Introduction
In this assignment, we learn basic rendering techniques using Pytorch3D.
Task 1. Practicing with Cameras
Subtask 1. 360-degree Render (5 points)
In this section, we practice rendering a given mesh while varying the camera view. This task was done with the following steps:
- Load vertices and faces
- Build a mesh with a basic color (a sketch of these two steps follows this list)
- Create the list of camera poses (R, T) from which to render the mesh
- Render the mesh from each camera view and stack the images
- Create the .gif file
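A minimal sketch of the first two steps, assuming the cow mesh path data/cow.obj from the starter code (the path and the basic color are my placeholders):
import torch
import pytorch3d.io, pytorch3d.renderer, pytorch3d.structures

vertices, face_props, _ = pytorch3d.io.load_obj('data/cow.obj')
faces = face_props.verts_idx
# Per-vertex texture: one basic color repeated for every vertex
textures = torch.ones_like(vertices).unsqueeze(0) * torch.tensor([0.7, 0.7, 1.0])
mesh = pytorch3d.structures.Meshes(
    verts=vertices.unsqueeze(0),
    faces=faces.unsqueeze(0),
    textures=pytorch3d.renderer.TexturesVertex(verts_features=textures))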
I created 30 views by varying the azimuth while fixing the elevation at 30 degrees. I also fixed the camera distance and fov to 2.7 and 60 degrees, respectively. This is done by creating 30 sets of rotation matrices (R) and translations (T) using pytorch3d.renderer.look_at_view_transform. The generation of multiple views is implemented in the code below:
# Set elevation and azimuth of the views
elev = torch.ones(num_views) * 30
azim = torch.linspace(-180, 180, num_views)
# Create corresponding camera extrinsics
Rs, Ts = pytorch3d.renderer.look_at_view_transform(
    dist=2.7, elev=elev, azim=azim, device=self.device)
# Every iteration, create a pytorch3d camera instance with a different view
for view in tqdm(range(num_views), desc='Rendering ...'):
    R = Rs[view].unsqueeze(0)
    T = Ts[view].unsqueeze(0)
    cameras = pytorch3d.renderer.FoVPerspectiveCameras(
        R=R, T=T, fov=60, device=self.device)
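The loop body then renders each view, and the stacked frames are written to a GIF. A minimal sketch of these remaining steps, assuming the starter code's get_mesh_renderer helper; the light position, output path, and fps are my choices:
renderer = get_mesh_renderer(image_size=256)
lights = pytorch3d.renderer.PointLights(location=[[0, 0, -3]], device=self.device)
images = []
for view in tqdm(range(num_views), desc='Rendering ...'):
    cameras = pytorch3d.renderer.FoVPerspectiveCameras(
        R=Rs[view].unsqueeze(0), T=Ts[view].unsqueeze(0),
        fov=60, device=self.device)
    # Render, drop the alpha channel, and convert to uint8 for the GIF
    image = renderer(mesh, cameras=cameras, lights=lights)
    image = image.cpu().numpy()[0, ..., :3]
    images.append((image * 255).astype(np.uint8))
imageio.mimsave('output/task1/subtask1.gif', images, fps=15)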
Figure 1 below shows the results.

Subtask 2. Recreating the Dolly Zoom (10 points)
In this section, we create a Dolly Zoom video that gradually zooms into the cow mesh using different sets of fov and T. Under a fixed camera distance (T), the mesh is zoomed out as we increase the fov (Figure 2). Therefore, we need to adjust T to offset the effect of the fov, and even make the object look closer at a large fov.

Thus, I made the distance inversely proportional to the square of the fov. To make the results similar to the sample GIF file, I set distance = (1.75 × 10^4) / fov^2.
fovs = torch.linspace(5, 120, num_frames)
for fov in tqdm(fovs, desc='Rendering ...'):
    distance = (1.75 * 1e4) / (fov ** 2)
    T = [[0, 0, distance]]
    cameras = pytorch3d.renderer.FoVPerspectiveCameras(
        fov=fov, T=T, device=self.device)
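For intuition, the visible width of the frustum at distance d is 2·d·tan(fov/2), so keeping the subject exactly the same size would require d proportional to 1/tan(fov/2). A small sketch of that constant-size schedule (my illustration; the rendering in this section instead uses the 1/fov^2 schedule above, which makes the cow grow slightly):
import numpy as np

def constant_size_distance(fov_deg, subject_width=2.0):
    # Distance at which an object of the given width exactly fills the frame
    return subject_width / (2 * np.tan(np.radians(fov_deg) / 2))

for fov in (5, 60, 120):
    print(fov, round(constant_size_distance(fov), 2))  # 22.9, 1.73, 0.58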
The result is shown below (Figure 3).

Task 2. Practicing with Meshes
Subtask 1. Constructing a Tetrahedron (5 points)
In this section, instead of using the given mesh, we practiced building our own geometries and rendering them. Here, I created a tetrahedron and rendered it.
To create the tetrahedral vertices, I set three vertices to lie on the z = 0 plane and one vertex to have a nonzero z value. Since every vertex of a tetrahedron is connected to all the others, the faces are simply the set of all combinations of three vertices.
vertices = torch.tensor([[[0, 2, 0], [1, 0, 0], [-1, 0, 0], [0, 1, 2]]],
                        dtype=torch.float32) * 0.5
faces = torch.tensor([[[0, 1, 2], [0, 1, 3], [0, 2, 3], [1, 2, 3]]],
                     dtype=torch.int64)
vertices -= vertices.mean(1, keepdim=True)  # Center the mesh at the origin
I used the same camera configuration as in Task 1.1. The result is shown in Figure 4.

Subtask 2. Constructing a Cube (5 points)
Similar to Task 2.1., in this section I created a cube consisting of 8 vertices and 12 triangular faces. Although a cube has 6 faces with 4 vertices each, meshes in pytorch3d should be composed of faces with 3 vertices. Therefore, I designed 2 triangular faces for each square face. The code and the results are as below:
vertices = torch.tensor([[[1, 1, 1], [1, 1, -1], [1, -1, 1], [1, -1, -1],
                          [-1, 1, 1], [-1, 1, -1], [-1, -1, 1], [-1, -1, -1]]],
                        dtype=torch.float32) * 0.5
faces = torch.tensor([[[0, 1, 2], [1, 2, 3], [0, 4, 6], [0, 2, 6],
                       [0, 4, 1], [4, 1, 5], [2, 3, 6], [6, 7, 3],
                       [5, 4, 6], [6, 7, 5], [5, 7, 1], [3, 7, 1]]],
                     dtype=torch.int64)
vertices -= vertices.mean(1, keepdim=True)

Task 3. Re-texturing a mesh (10 points)
In this section, we generated the texture of a mesh as a gradation between two colors. As described in the assignment, the z-axis of the cow is the axis the cow is facing. Therefore, in order to color the cow with a gradation in that direction, we need to assign color values using the z-axis coordinate. I used red (1, 0, 0) as the front color and blue (0, 0, 1) as the back color. This is implemented as below:
color1, color2 = torch.tensor([1, 0, 0]), torch.tensor([0, 0, 1])
z = vertices[0, :, -1]
alpha = ((z - z.min()) / (z.max() - z.min())).unsqueeze(-1).repeat(1, 3)
color = alpha * color2 + (1 - alpha) * color1
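To apply this per-vertex color as a texture, a short sketch assuming vertices and faces are the batched cow tensors loaded as in Task 1:
textures = color.unsqueeze(0)  # (1, N_v, 3)
mesh = pytorch3d.structures.Meshes(
    verts=vertices, faces=faces,
    textures=pytorch3d.renderer.TexturesVertex(verts_features=textures))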
I compared the results with cows of other uniform colors.
I also re-textured the cow mesh using the x and y coordinates, shown in Figure 7; this is helpful for checking the coordinate system of the cow. This information is used in Task 4.
Task 4. Camera Transformations (20 points)
In this task, we learn the camera transformation rules by rendering the given mesh from specified views. From Figure 7, we learned the coordinate system of the cow mesh. The coordinate systems of the camera and the object (cow) are shown below:

- Scenario 1.
The first scenario is to rotate the cow about the z-axis. Following the coordinate systems above, this is a rotation of the relative camera coordinates about the negative z-axis, so R_relative and T_relative are given as:
import numpy as np
from scipy.spatial.transform import Rotation as _R

r = _R.from_rotvec(np.pi / 2 * np.array([0, 0, -1]))
R_rel = torch.from_numpy(r.as_matrix()).float()
T_rel = torch.tensor([0, 0, 0]).float()
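For context, these relative transforms are composed with the initial camera pose (which looks at the cow from a distance of 3). A sketch of my understanding of how the starter code applies them; the exact composition may differ:
R_0 = torch.eye(3)                # initial camera rotation
T_0 = torch.tensor([0., 0., 3.])  # initial camera translation
R = R_rel @ R_0                   # relative rotation acts in the camera frame
T = R_rel @ T_0 + T_rel
cameras = pytorch3d.renderer.FoVPerspectiveCameras(
    R=R.unsqueeze(0), T=T.unsqueeze(0), device=self.device)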

- Scenario 2.
This scenario is to move the cow backward. This translation is along the positive z-axis in camera coordinates. Therefore, the solution is obtained as:
R_rel = torch.eye(3)
T_rel = torch.tensor([0, 0, 3])

- Scenario 3.
The next scenario includes several operations. First of all, the cow has been slightly tilted (this might not be true, but I included this rotation) and then moved to the bottom left corner of the camera plane. The rotation is in the positive direction of the y-axis, and the relative translation of the camera should be the opposite of the cow's location: negative y and positive x.
r = _R.from_rotvec(np.pi/60 * np.array([0, 1, 0]))
R_rel = torch.from_numpy(r.as_matrix()).float()
T_rel = torch.tensor([0.2, -0.6, 0]).float()

- Scenario 4.
To achieve scenario 4, the camera should be relatively rotated in the positive y direction. Once the camera is rotated, in order to capture the cow, the camera should move along the negative x and positive z axes. The initial relative location of the cow is (0, 0, 3), and it changes to (3, 0, 0) under the 90-degree rotation. Therefore, we need to translate it back to (0, 0, 3).
r = _R.from_rotvec(np.pi / 2 * np.array([0, 1, 0]))
R_rel = torch.from_numpy(r.as_matrix()).float()
T_rel = torch.tensor([-3, 0, 3]).float()
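As a quick sanity check of this reasoning (hypothetical snippet, not part of the submission), applying the relative transform to the cow's initial location should return it to (0, 0, 3):
p = torch.tensor([0., 0., 3.])  # initial relative location of the cow
print(R_rel @ p + T_rel)        # -> tensor([0., 0., 3.])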

Task 5. Rendering Generic 3D Representations
Subtask 1. Rendering Point Clouds from RGB-D Images (10 points)
Here, we learned how to render point clouds using pytorch3d. Following the starter code, I extracted point and color information using the unproject_depth_image function. Since some points are given as nan values, I removed them (a sketch of this step is below). Once I obtained the point clouds from the two images, I stacked them into a single point cloud.
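A sketch of the unprojection and nan filtering for one image, assuming the starter code's unproject_depth_image helper; the variable names rgb, mask, depth, and camera are my placeholders for the loaded data:
points, rgba = unproject_depth_image(rgb, mask, depth, camera)
valid = ~torch.isnan(points).any(dim=-1)  # drop pixels with invalid depth
verts1 = points[valid].unsqueeze(0)       # (1, N, 3)
color1 = rgba[valid, :3].unsqueeze(0)     # keep RGB (assuming an RGBA return)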
# vertices = [verts1, verts2]
# colors = [color1, color2]
verts = torch.cat(vertices, dim=1)
color = torch.cat(colors, dim=1)
verts -= verts.mean(1, keepdim=True)
point_cloud = pytorch3d.structures.Pointclouds(points=verts, features=color)
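Rendering then follows the same pattern as the mesh tasks, with a point-cloud renderer; a short sketch assuming the starter code's get_points_renderer helper:
renderer = get_points_renderer(image_size=256, background_color=(1, 1, 1))
image = renderer(point_cloud.to(self.device), cameras=cameras)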
The results are shown below.
Subtask 2. Parametric Functions (10 points)
In this task, we created a point cloud using a parametric function and rendered it with different numbers of samples. The parametric function for a torus is given as:

x(θ, φ) = (R + r·cos θ)·cos φ
y(θ, φ) = (R + r·cos θ)·sin φ
z(θ, φ) = r·sin θ

where θ and φ are angles which make a full circle, R is the distance from the center of the tube to the center of the torus, and r is the radius of the tube.
The code implementation is as below:
phi = torch.linspace(0, 2 * np.pi, num_samples)
theta = torch.linspace(0, 2 * np.pi, num_samples)
Phi, Theta = torch.meshgrid(phi, theta)
x = torch.cos(Phi) * (R + r * torch.cos(Theta))
y = torch.sin(Phi) * (R + r * torch.cos(Theta))
z = r * torch.sin(Theta)
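A sketch of packing these samples into a renderable point cloud, coloring each point by its normalized position (the coloring scheme is my choice):
points = torch.stack([x.flatten(), y.flatten(), z.flatten()], dim=-1).unsqueeze(0)
color = (points - points.min()) / (points.max() - points.min())
torus_pc = pytorch3d.structures.Pointclouds(points=points, features=color)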
Setting R = 1 and r = 0.5, I generated GIF files varying the number of samples:
- Left top corner: 50
- Right top corner: 100
- Left bottom corner: 250
- Right bottom corner: 500
I also changed the radius of the tube:
- Left top corner: r = 0.2
- Right top corner: r = 0.4
- Left bottom corner: r = 0.6
- Right bottom corner: r = 0.8
Subtask 3. Implicit Surfaces (15 points)
Now, I created the torus using an implicit surface. In this case, we need to build a surface function of the torus, which is given as:

(√(x² + y²) − R)² + z² − r² = 0
The code is as below:
min_value = -1.6
max_value = 1.6
X, Y, Z = torch.meshgrid([torch.linspace(min_value, max_value, voxel_size)] * 3)
voxels = ((X * X + Y * Y) ** (0.5) - R) ** 2 + (Z * Z - r * r)
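The mesh is then extracted from the voxel grid with marching cubes and rescaled from voxel indices back to world coordinates; a sketch assuming the mcubes package also used in Task 6:
import mcubes

verts_np, faces_np = mcubes.marching_cubes(voxels.numpy(), isovalue=0)
vertices = torch.tensor(verts_np).float().unsqueeze(0)
faces = torch.tensor(faces_np.astype(int)).unsqueeze(0)
# Voxel indices -> world coordinates in [min_value, max_value]
vertices = (vertices / voxel_size) * (max_value - min_value) + min_value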
Here, by creating volumes with different numbers of voxels, I analyzed the rendering resolution with respect to the voxel count. The numbers of voxels per dimension are 16, 32, and 64, respectively.
Compared to the point cloud, the implicit surface has the benefit that the extracted mesh has connectivity between vertices.
Task 6. Do Something Fun! (10 points)
In this section, I created a colorful tunnel and moved the camera through it. To make the tunnel, I used the implicit function of a cylinder of radius R along the z-axis, defined as:

x² + y² − R² = 0

Also, I re-textured the tunnel using a gradation over a set of colors, by a method similar to Task 3. To move the camera, I changed the translation T along the z-axis. The code implementation is as below.
# Scale of tunnel
R = 1; H = 30; volume_size = 32
# Color streams
colors = [(1, 1, 1), (1, 1, 0), (1, 0, 0), (0.5, 0.5, 1), (0, 1, 0.5),
          (0, 0, 1), (0, 1, 1), (0, 1, 0), (1, 1, 0),
          (1, 0, 0), (1, 0, 1), (1, 1, 1), (1, 1, 1)]
# Build tunnel's implicit surface
min_xy = -1.1; max_xy = 1.1; min_z = -0.1; max_z = H + 1
x_grid = torch.linspace(min_xy, max_xy, volume_size)
y_grid = torch.linspace(min_xy, max_xy, volume_size)
z_grid = torch.linspace(min_z, max_z, volume_size * 15)
X, Y, Z = torch.meshgrid([x_grid, y_grid, z_grid])
# Create marching surface of the voxels
voxels = (X * X + Y * Y) - R * R
vertices, faces = mcubes.marching_cubes(mcubes.smooth(voxels), isovalue=0)
vertices = torch.tensor(vertices).float().unsqueeze(0)
faces = torch.tensor(faces.astype(int)).unsqueeze(0)
# Normalize tunnel and make the Z-axis start from 0
vertices = (vertices / volume_size) * (max_xy - min_xy) + min_xy
vertices[..., 2] -= min_xy
# Gradually add colors: blend linearly between consecutive color stops along z
textures = torch.ones_like(vertices)
for idx, (color1, color2) in enumerate(zip(colors[:-1], colors[1:])):
    # Select the band of vertices between two color stops
    mask1 = vertices[..., -1] > (H * idx / len(colors) - 1e-4)
    mask2 = vertices[..., -1] < (H * (idx + 1) / len(colors) + 1e-4)
    mask = mask1 * mask2
    # Blend weight grows linearly with z inside the band
    alpha = vertices[..., -1][mask]
    alpha = (alpha - alpha.min()) / (alpha.max() - alpha.min())
    alpha = alpha.unsqueeze(-1).repeat(1, 3)
    color_map = torch.tensor(color2) * alpha + torch.tensor(color1) * (1 - alpha)
    textures[mask] = color_map
mesh = self._get_mesh(vertices, faces, textures).to(self.device)
renderer = get_mesh_renderer(image_size=self.image_size)
# Fly the camera through the tunnel along the z-axis
fov = 60
dists = torch.linspace(5, -1.2 * H, num_frames)
images = []
for dist in tqdm(dists, desc='Rendering ...'):
    T = [[0, 0, dist]]
    cameras = pytorch3d.renderer.FoVPerspectiveCameras(
        fov=fov, T=T, device=self.device)
    image = renderer(mesh, cameras=cameras, lights=self.lights)
    image = image.cpu().numpy()[0, ..., :3]
    images.append((image * 255).astype(np.uint8))
imageio.mimsave('output/task6/subtask1.gif', images, fps=25)
The result is quite fancy. The resulting GIF file is shown below.

References
- Field of view image from https://videoguys.com/blogs/news-and-sales/check-out-the-cool-field-of-view-calculator-for-panasonic-pro-ptz-cameras
- Implicit function and sampling of Torus from https://en.wikipedia.org/wiki/Torus