Caroline Ai
Your first task is to create a 360-degree gif showing continuous views of the provided cow mesh. For many of your results this semester, you will be expected to show full turntable views of your outputs.
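A minimal sketch of one way to produce such a turntable, assuming renderer, mesh, and lights have already been set up as in the starter code (those names and the distance/elevation values are assumptions):

import imageio
import numpy as np
from pytorch3d.renderer import FoVPerspectiveCameras, look_at_view_transform

images = []
for azim in np.linspace(0, 360, num=36, endpoint=False):
    # Camera orbits the origin at a fixed distance and elevation.
    R, T = look_at_view_transform(dist=3.0, elev=0.0, azim=azim)
    cameras = FoVPerspectiveCameras(R=R, T=T, device=mesh.device)
    rend = renderer(mesh, cameras=cameras, lights=lights)  # (1, H, W, 4)
    images.append((rend[0, ..., :3].cpu().numpy() * 255).astype(np.uint8))

# fps is accepted by older imageio versions; newer ones use
# duration (milliseconds per frame) instead.
imageio.mimsave("cow_turntable.gif", images, fps=15)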
The Dolly Zoom is a famous camera effect, first used in the Alfred Hitchcock film Vertigo. The core idea is to change the focal length of the camera while moving the camera such that the subject stays the same size in the frame, producing a rather unsettling effect. In this task, you will recreate this effect in PyTorch3D.
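The constraint can be made concrete: for the subject to keep the same apparent size, distance * tan(fov / 2) must stay constant, so the camera distance follows directly from each field of view. A small sketch (the subject width of 5.0 is an assumed value):

import torch

fovs = torch.linspace(5, 120, steps=30)  # field of view in degrees
width = 5.0  # assumed extent of the subject to keep fixed in frame
# distance * tan(fov / 2) = width / 2  =>  solve for distance.
distances = width / (2 * torch.tan(torch.deg2rad(fovs) / 2))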
In this part, you will practice working with the geometry of 3D meshes. Construct a tetrahedron mesh and then render it from multiple viewpoints. Your tetrahedron does not need to be a regular tetrahedron (i.e. not all faces need to be equilateral triangles) as long as it is obvious from the renderings that the shape is a tetrahedron.
vertices = [[-1, 2, 1], [0, -1, 2], [2, 0, 0], [-2, -1, -1]]
faces = [[0, 1, 2], [0, 1, 3], [0, 2, 3], [1, 2, 3]]
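A minimal sketch of how such vertices and faces become a renderable PyTorch3D mesh; the cube below is constructed the same way (the single color is an arbitrary choice):

import torch
from pytorch3d.renderer import TexturesVertex
from pytorch3d.structures import Meshes

vertices = torch.tensor(
    [[-1, 2, 1], [0, -1, 2], [2, 0, 0], [-2, -1, -1]], dtype=torch.float32
)
faces = torch.tensor(
    [[0, 1, 2], [0, 1, 3], [0, 2, 3], [1, 2, 3]], dtype=torch.int64
)

# One uniform color per vertex; PyTorch3D expects a batch dimension,
# hence the unsqueeze(0) calls.
colors = torch.ones_like(vertices) * torch.tensor([0.7, 0.7, 1.0])
mesh = Meshes(
    verts=vertices.unsqueeze(0),
    faces=faces.unsqueeze(0),
    textures=TexturesVertex(verts_features=colors.unsqueeze(0)),
)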
Construct a cube mesh and then render it from multiple viewpoints. Remember that we are still working with triangle meshes, so you will need two triangles to represent each square face of the cube.
vertices = [[0, 0, 0], [1, 0, 0], [1, 1, 0], [0, 1, 0],
            [0, 0, 1], [1, 0, 1], [1, 1, 1], [0, 1, 1]]
faces = [[0, 1, 3], [1, 2, 3], [1, 2, 5], [2, 5, 6], [2, 3, 6], [3, 6, 7],
         [5, 6, 7], [4, 5, 7], [0, 3, 4], [3, 4, 7], [0, 1, 5], [0, 4, 5]]
Now let's practice re-texturing a mesh. For this task, we will re-texture the cow mesh so that its color smoothly changes from the front of the cow to the back.
In this case, color1 = [255/255, 248/255, 220/255] and color2 = [205/255, 133/255, 63/255], which are beige and brown respectively.
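One way to implement this is to linearly interpolate between the two colors along the cow's front-to-back axis, assumed here to be z (swap in the correct axis for your mesh):

import torch
from pytorch3d.renderer import TexturesVertex

verts = mesh.verts_packed()  # (N, 3)
z = verts[:, 2]
# Per-vertex interpolation weight in [0, 1], front to back.
alpha = ((z - z.min()) / (z.max() - z.min())).unsqueeze(1)

color1 = torch.tensor([255 / 255, 248 / 255, 220 / 255])  # beige
color2 = torch.tensor([205 / 255, 133 / 255, 63 / 255])   # brown
colors = alpha * color2 + (1 - alpha) * color1  # (N, 3)

mesh.textures = TexturesVertex(verts_features=colors.unsqueeze(0))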
When working in 3D, finding a reasonable camera pose is often the first step to producing a useful visualization, and an important aid in debugging.
Running python -m starter.camera_transforms produces the following image using the camera extrinsics rotation R_0 and translation T_0:
The following sets of R_relative and T_relative produce the transformations:
R_relative = [[ 0., 1., 0.],
              [-1., 0., 0.],
              [ 0., 0., 1.]]

T_relative = [0, 0, 2]

T_relative = [0.5000, -0.5000, 0.0000]

R_relative = [[ 4.3711e-08, 0.0000e+00, 1.0000e+00],
              [ 0.0000e+00, 1.0000e+00, 0.0000e+00],
              [-1.0000e+00, 0.0000e+00, 4.3711e-08]]
T_relative = [-3.0000, -0.0000, 3.0000]
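For reference, the 4.3711e-08 entries above are floating-point zeros (they come from cos(pi/2)), so the last matrix is a 90-degree rotation about the y-axis. The relative transform is assumed here to compose with the default extrinsics as in the starter code; a sketch:

import torch

def compose(R_relative, T_relative, R_0, T_0):
    # Apply the relative transform on top of the default camera pose.
    R_relative = torch.tensor(R_relative, dtype=torch.float32)
    T_relative = torch.tensor(T_relative, dtype=torch.float32)
    R = R_relative @ R_0
    T = R_relative @ T_0 + T_relative
    return R, T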
The simplest 3D representation is a collection of 3D points, each possibly associated with a color feature. PyTorch3D provides functionality for rendering point clouds.
Similar to mesh rendering, we will need a Pointclouds object consisting of 3D points and colors, a camera from which to view the point cloud, and a PyTorch3D points renderer, which we have wrapped similarly to the mesh renderer.
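A sketch of such a wrapper, mirroring the structure of the mesh renderer (the helper name and default arguments here are our own choices):

from pytorch3d.renderer import (
    AlphaCompositor,
    PointsRasterizationSettings,
    PointsRasterizer,
    PointsRenderer,
)
from pytorch3d.structures import Pointclouds

def get_points_renderer(image_size=256, radius=0.01, background_color=(1.0, 1.0, 1.0)):
    # Rasterize each point as a small disk, then alpha-composite.
    raster_settings = PointsRasterizationSettings(image_size=image_size, radius=radius)
    return PointsRenderer(
        rasterizer=PointsRasterizer(raster_settings=raster_settings),
        compositor=AlphaCompositor(background_color=background_color),
    )

# Usage: points and rgb are (N, 3) tensors of coordinates and colors.
# point_cloud = Pointclouds(points=[points], features=[rgb])
# rend = get_points_renderer()(point_cloud, cameras=cameras)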
In this part, we will practice rendering point clouds constructed from 2 RGB-D images from the Common Objects in 3D Dataset.
In render_generic.py, the load_rgbd_data function will load the data for two images of the same plant. The dictionary should contain the RGB image, a depth map, a mask, and a PyTorch3D camera corresponding to the pose that the image was taken from.
You should use the unproject_depth_image function in utils.py to convert a depth image into a point cloud (parameterized as a set of 3D coordinates and corresponding color values). The unproject_depth_image function uses the camera intrinsics and extrinsics to cast a ray from every pixel in the image into world coordinate space. The ray's final distance is the depth value at that pixel, and the color of each point can be determined from the corresponding image pixel.
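For intuition, the underlying math looks roughly like the sketch below; the provided unproject_depth_image already handles this (along with PyTorch3D's camera conventions), and all input names here are hypothetical:

import torch

def unproject(depth, mask, K_inv, R_wc, t_wc):
    # depth: (H, W); mask: (H, W); K_inv: inverse intrinsics (3, 3);
    # R_wc, t_wc: camera-to-world rotation and translation.
    H, W = depth.shape
    v, u = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    pix = torch.stack([u, v, torch.ones_like(u)], dim=-1).float()  # (H, W, 3)
    # Ray through each pixel, scaled by that pixel's depth.
    cam_pts = (pix @ K_inv.T) * depth.unsqueeze(-1)
    # Camera coordinates -> world coordinates.
    world_pts = cam_pts @ R_wc.T + t_wc
    # Keep only the masked (foreground) pixels; the matching colors
    # come from indexing the RGB image with the same mask.
    return world_pts[mask > 0.5]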
Construct 3 different point clouds: (1) the point cloud corresponding to the first image, (2) the point cloud corresponding to the second image, and (3) the point cloud formed by the union of the first two.
A parametric function generates a 3D point for each point in the source domain.
Your task is to render a torus point cloud by sampling its parametric function.
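A minimal sketch of such a sampling, using the standard torus parameterization (the radii R and r and the grid resolution are assumed values):

import math
import torch
from pytorch3d.structures import Pointclouds

# x = (R + r cos(theta)) cos(phi)
# y = (R + r cos(theta)) sin(phi)
# z = r sin(theta)
R, r, n = 1.0, 0.5, 200
theta = torch.linspace(0, 2 * math.pi, n)
phi = torch.linspace(0, 2 * math.pi, n)
Theta, Phi = torch.meshgrid(theta, phi, indexing="ij")

x = (R + r * torch.cos(Theta)) * torch.cos(Phi)
y = (R + r * torch.cos(Theta)) * torch.sin(Phi)
z = r * torch.sin(Theta)
points = torch.stack([x, y, z], dim=-1).reshape(-1, 3)

# Simple coordinate-based coloring, normalized to [0, 1].
color = (points - points.min()) / (points.max() - points.min())
torus_pc = Pointclouds(points=[points], features=[color])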
In this part, we will explore representing geometry as an implicit function. In general, given a function F(x, y, z), we can define the surface to be the zero level-set of F, i.e., the set of points (x, y, z) such that F(x, y, z) = 0. The function F can be a mathematical expression or even a neural network. To visualize such a representation, we can discretize 3D space and evaluate the implicit function, storing the values in a voxel grid. Finally, to recover the mesh, we can run the marching cubes algorithm to extract the zero level-set.
In practice, we can generate our voxel coordinates using torch.meshgrid, which we then use to query our function (in this case a mathematical one). Once we have our voxel grid, we can use the mcubes library to convert it into a mesh.
Your task is to render a torus again, this time as a mesh defined by an implicit function.
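A sketch of the full pipeline for the implicit torus, assuming the same radii as before (the grid bounds and resolution are choices, not requirements):

import mcubes
import torch

# Implicit torus: F(x, y, z) = (sqrt(x^2 + y^2) - R)^2 + z^2 - r^2,
# which is zero exactly on the surface.
R, r = 1.0, 0.5
min_v, max_v, n = -1.6, 1.6, 64
xs = torch.linspace(min_v, max_v, n)
X, Y, Z = torch.meshgrid(xs, xs, xs, indexing="ij")
F = (torch.sqrt(X ** 2 + Y ** 2) - R) ** 2 + Z ** 2 - r ** 2

# Extract the zero level-set as a triangle mesh.
vertices, faces = mcubes.marching_cubes(F.numpy(), 0)
# Rescale vertices from voxel indices back to world coordinates,
# then wrap into a textured Meshes object as before.
vertices = torch.from_numpy(vertices).float() / (n - 1) * (max_v - min_v) + min_v
faces = torch.from_numpy(faces.astype("int64"))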
Comparing the point cloud rendering and the mesh rendering of the torus:
Rendering speed: For the same object at a comparable size, rendering point clouds is generally faster than rendering meshes, though this also depends on the density of points.
Rendering quality: Meshes render the object as a complete, connected surface. Point clouds can look smooth but are not fully connected, leaving gaps between points.
Ease of use: In terms of writing the code (a parametric function for the point cloud versus an implicit function for the mesh), rendering meshes is slightly easier than rendering point clouds.
Memory usage: Point clouds use less memory than meshes, since they store no face connectivity.
Now that you have learned to work with various 3D representations and render them, it is time to try something fun. Create your own 3D structures, render something in an interesting way, texture a mesh creatively, or do anything else that appeals to you - the (3D) world is your oyster! If you wish to download additional meshes, Free3D is a good place to start.
We will explore how to obtain point clouds from triangle meshes. One obvious way to do this is to simply discard the face information and treat the vertices as a point cloud. However, this can be unreasonable if the faces are not of equal size, since regions of the mesh with denser triangulation would be over-represented in the point cloud.
Instead, as we saw in the lectures, a solution to this problem is to sample the surface uniformly using stratified sampling. The procedure is as follows: (1) sample a face of the mesh with probability proportional to its area; (2) sample a point uniformly within the chosen face using barycentric coordinates (see the sketch below).
For this part, write a function that takes a triangle mesh and the number of samples and outputs a point cloud. Then, using the cow mesh, randomly sample 10, 100, 1000, and 10000 points.
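A sketch of such a function, following the stratified procedure above and PyTorch3D's packed mesh accessors:

import torch

def sample_points_from_mesh(mesh, num_samples):
    verts = mesh.verts_packed()  # (V, 3)
    faces = mesh.faces_packed()  # (F, 3)
    v0, v1, v2 = verts[faces[:, 0]], verts[faces[:, 1]], verts[faces[:, 2]]

    # Step 1: pick faces with probability proportional to their area
    # (half the norm of the cross product of two edge vectors).
    areas = 0.5 * torch.cross(v1 - v0, v2 - v0, dim=1).norm(dim=1)
    face_idx = torch.multinomial(areas, num_samples, replacement=True)

    # Step 2: pick a uniform point inside each chosen face using
    # barycentric coordinates (the sqrt keeps the sampling uniform).
    r1, r2 = torch.rand(num_samples, 1), torch.rand(num_samples, 1)
    u, v = 1 - r1.sqrt(), r2 * r1.sqrt()
    return u * v0[face_idx] + v * v1[face_idx] + (1 - u - v) * v2[face_idx]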