Question 1
Q1.1


Q1.2


Q1.3


Question 2
Q2.1
The architecture for the Voxel Grid network was as follows:

- ConvTranspose3D $64 \rightarrow 128$, stride=2, kernel=4
- ConvTranspose3D $128 \rightarrow 256$, stride=2, kernel=4
- ConvTranspose3D $256 \rightarrow 128$, stride=2, kernel=4
- ConvTranspose3D $128 \rightarrow 64$, stride=2, kernel=4
- ConvTranspose3D $64 \rightarrow 1$, stride=1, kernel=3
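Below is a minimal PyTorch sketch of this decoder. Only the ConvTranspose3D channel, stride, and kernel settings come from the list above; the 512-dimensional input feature, the initial reshape to a $2^3$ grid, the padding of 1, and the ReLU placement are assumptions chosen so the output works out to a $32^3$ occupancy grid.

```python
import torch
import torch.nn as nn

class VoxelDecoder(nn.Module):
    """Sketch of the voxel decoder; padding and activations are assumptions."""
    def __init__(self, feat_dim=512):
        super().__init__()
        # Project the image feature to a small 2x2x2 grid of 64-channel features (assumed).
        self.fc = nn.Linear(feat_dim, 64 * 2 * 2 * 2)
        self.net = nn.Sequential(
            nn.ConvTranspose3d(64, 128, kernel_size=4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose3d(128, 256, kernel_size=4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose3d(256, 128, kernel_size=4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose3d(128, 64, kernel_size=4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose3d(64, 1, kernel_size=3, stride=1, padding=1),  # occupancy logits
        )

    def forward(self, feat):
        x = self.fc(feat).view(-1, 64, 2, 2, 2)
        return self.net(x)  # (B, 1, 32, 32, 32)

# Shape check: VoxelDecoder()(torch.randn(2, 512)).shape -> torch.Size([2, 1, 32, 32, 32])
```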



Q2.2
The architecture for the Point Cloud fitting network was as follows:

- Linear $512 \rightarrow 1024$
- Linear $1024 \rightarrow 2048$
- Linear $2048 \rightarrow 3n$
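A minimal sketch of this decoder follows, where the layer sizes come from the list above ($3n$ outputs giving an $(x, y, z)$ per point); the 512-dimensional input feature, the ReLU/Tanh activations, and the default `n_points` are my assumptions.

```python
import torch
import torch.nn as nn

class PointDecoder(nn.Module):
    """Sketch of the point cloud decoder; activations and n_points are assumptions."""
    def __init__(self, feat_dim=512, n_points=5000):
        super().__init__()
        self.n_points = n_points
        self.net = nn.Sequential(
            nn.Linear(feat_dim, 1024), nn.ReLU(),
            nn.Linear(1024, 2048), nn.ReLU(),
            nn.Linear(2048, 3 * n_points),  # 3n outputs: (x, y, z) per point
            nn.Tanh(),                      # assumed: keep coordinates in [-1, 1]
        )

    def forward(self, feat):
        return self.net(feat).view(-1, self.n_points, 3)  # (B, n, 3)
```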



Q2.3
The architecture for the mesh fitting network was as follows:

- Linear $512 \rightarrow 1024$
- Linear $1024 \rightarrow 1024$
- Linear $1024 \rightarrow 3v$
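A minimal sketch of this decoder, interpreted as predicting per-vertex offsets that deform the initial ico-sphere: only the layer sizes come from the list above ($3v$ outputs for the $v$ template vertices); the offset formulation, activations, and vertex count are assumptions.

```python
import torch
import torch.nn as nn

class MeshDecoder(nn.Module):
    """Sketch of the mesh decoder; offsets, activations, and vertex count are assumptions."""
    def __init__(self, feat_dim=512, n_verts=642):  # 642 = ico-sphere at subdivision level 3
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim, 1024), nn.ReLU(),
            nn.Linear(1024, 1024), nn.ReLU(),
            nn.Linear(1024, 3 * n_verts),  # 3v outputs: an (x, y, z) offset per vertex
        )

    def forward(self, feat, template_verts):
        # template_verts: (V, 3) vertices of the initial ico-sphere
        offsets = self.net(feat).view(-1, template_verts.shape[0], 3)
        return template_verts.unsqueeze(0) + offsets  # deformed vertices, (B, V, 3)
```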



Unfortunately, I couldn't train this network for very long because a memory leak would cause my GPU to run out of memory. I wasn't able to fix it even after following instructions posted by another student on Piazza.
Q2.4
The F1@0.05 scores for each type of prediction are shown below:

| Data Type | F1@0.05 Score |
|---|---|
| Pointcloud | 91.3% |
| Mesh | 84.1% |
| Voxel Grid | 38.2% |
I believe the point cloud performed best because it doesn't need to satisfy connectivity constraints the way the mesh does. Even with the smoothing constraint, the mesh fit struggles to deform properly. The F1@0.05 score is also a bit misleading, because it only measures point-to-point proximity: how many predicted points fall within 0.05 m of a ground truth point, and vice versa. In theory I could produce a mesh whose vertices are all concentrated within 0.05 m of a single ground truth vertex; it would score perfect precision despite not resembling a chair (the recall term keeps the F1 from actually reaching 100%, but the metric still says nothing about connectivity or surface quality). That said, the resolution of the voxel grid is likely not high enough in this example to produce a competitive score. From visual inspection we can see that the voxel grid seems to overestimate the shapes of the chairs, "puffing" them up.
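For reference, this is my reading of how an F1@threshold metric of this kind is computed; the exact evaluation code may differ, but it shows why a clumped prediction still gets perfect precision, with recall rather than shape quality being the only thing pushing back:

```python
import torch

def f1_at_threshold(pred_pts, gt_pts, threshold=0.05):
    """F1 between two point sets of shape (N, 3) and (M, 3).

    Precision: fraction of predicted points within `threshold` of some GT point.
    Recall:    fraction of GT points within `threshold` of some predicted point.
    Neither term looks at connectivity or surface quality.
    """
    dists = torch.cdist(pred_pts, gt_pts)  # (N, M) pairwise distances
    precision = (dists.min(dim=1).values < threshold).float().mean()
    recall = (dists.min(dim=0).values < threshold).float().mean()
    return 2 * precision * recall / (precision + recall + 1e-8)
```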
Q2.5
Increasing the number of vertices on the ico-sphere used as the initial mesh for mesh fitting seems to improve the results. I wasn't able to get the mesh to stop looking jagged just by varying this parameter, which suggests the jaggedness has more to do with the architecture. I also tried to increase the amount of smoothing by increasing the w_smooth parameter. If smoothness is weighted too heavily, we end up with a blob-like chair shape that doesn't take on the form of the target chair very well.
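For context, here is a minimal sketch of the kind of weighted loss being tuned, assuming a standard Chamfer + Laplacian smoothing setup with PyTorch3D (the specific loss terms and default weight are assumptions, not necessarily my exact training code):

```python
import torch
from pytorch3d.loss import chamfer_distance, mesh_laplacian_smoothing
from pytorch3d.ops import sample_points_from_meshes

def mesh_fitting_loss(pred_mesh, gt_points, w_smooth=0.1, n_samples=5000):
    # Chamfer term: points sampled from the predicted surface should match the GT cloud.
    pred_points = sample_points_from_meshes(pred_mesh, n_samples)
    loss_chamfer, _ = chamfer_distance(pred_points, gt_points)
    # Smoothness term: Laplacian regularizer; cranking w_smooth up too far is what
    # produces the blob-like chairs described above.
    loss_smooth = mesh_laplacian_smoothing(pred_mesh)
    return loss_chamfer + w_smooth * loss_smooth
```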


Varying the number of points in the point cloud meant that I needed to train for different amounts of time. Since increasing the number of points effectively increases the number of parameters in the model (the final layer outputs $3n$ values), the network needed to be trained for longer before the loss converged.
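To make that concrete, the output layer of the Q2.2 decoder is a Linear layer with $3n$ outputs, so its parameter count scales linearly with the number of points:

```python
# Parameter count of the final Linear(2048, 3 * n_points) layer (weights + biases).
for n_points in (1000, 5000, 10000):
    params = 2048 * 3 * n_points + 3 * n_points
    print(f"n_points={n_points:>6}: {params:,} parameters in the output layer alone")
```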
Q2.6
For my point cloud model, I was curious how general the learned features were. To test this, I passed images of several different shapes to the model to see what it outputs.
Sphere
I first wanted to try an easy shape for the network. I created an ico-sphere and rendered an image of it, and got this output after passing it to the network:
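The rendering and inference calls below are a rough sketch of this experiment using PyTorch3D; the renderer settings, image size, and the model's input interface are assumptions rather than my exact pipeline:

```python
import torch
from pytorch3d.utils import ico_sphere
from pytorch3d.renderer import (
    FoVPerspectiveCameras, RasterizationSettings, MeshRasterizer, MeshRenderer,
    SoftPhongShader, PointLights, TexturesVertex, look_at_view_transform,
)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Render a plain grey ico-sphere (all renderer settings here are assumptions).
mesh = ico_sphere(level=3, device=device)
mesh.textures = TexturesVertex(verts_features=torch.ones_like(mesh.verts_padded()) * 0.7)
R, T = look_at_view_transform(dist=2.5, elev=20, azim=30)
cameras = FoVPerspectiveCameras(R=R, T=T, device=device)
renderer = MeshRenderer(
    MeshRasterizer(cameras=cameras, raster_settings=RasterizationSettings(image_size=137)),
    SoftPhongShader(device=device, cameras=cameras, lights=PointLights(device=device)),
)
image = renderer(mesh)[..., :3]  # (1, 137, 137, 3) RGB

# Feed the rendering to the trained single-view model (hypothetical interface):
# pred_points = model(image.permute(0, 3, 1, 2))
```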
Chair Images from the Internet
Next, I wanted to see how well the network would approximate chairs from the internet, without any exposure to their ground truth. First I passed it an image of multiple chairs to see what would happen. It outputs a single chair that fits the shape of the pictured chairs quite well, which leads me to believe the network has learned a "mean shape" that it deforms as needed.

Nothing
Lastly, I passed the network a blank image, which resulted in a clear representation of how the network sees the world: versions of chairs.