16825 Project 2

Pengliang Ji (pengliaj)

1. Exploring loss functions

1.1. Fitting a voxel grid (5 points)

GT:1_1_tgt Prediction: 1_1_src
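
For reference, a minimal sketch of the kind of objective used for this part: per-cell binary cross-entropy between predicted occupancy logits and the target 32^3 grid, with the grid optimized directly. The tensor shapes, iteration count, and learning rate below are assumptions, not the exact values from my run.

```python
import torch
import torch.nn.functional as F

def voxel_loss(voxel_src, voxel_tgt):
    # voxel_src: (B, 32, 32, 32) raw logits; voxel_tgt: (B, 32, 32, 32) binary occupancy.
    # Each cell is treated as an independent occupied/empty classification.
    return F.binary_cross_entropy_with_logits(voxel_src, voxel_tgt)

def fit_voxel(voxel_tgt, n_iters=1000, lr=1e-1):
    # Directly optimize a free tensor of logits to match the target grid.
    voxel_src = torch.randn_like(voxel_tgt, requires_grad=True)
    opt = torch.optim.Adam([voxel_src], lr=lr)
    for _ in range(n_iters):
        opt.zero_grad()
        voxel_loss(voxel_src, voxel_tgt).backward()
        opt.step()
    return torch.sigmoid(voxel_src)  # occupancy probabilities for visualization
```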

1.2. Fitting a point cloud (5 points)

GT:1_1_tgt Prediction: 1_1_src
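
A minimal sketch of a symmetric chamfer objective for the point-cloud fit, written with torch.cdist; pytorch3d.loss.chamfer_distance is an equivalent off-the-shelf option.

```python
import torch

def chamfer_loss(points_src, points_tgt):
    # points_src: (B, N, 3), points_tgt: (B, M, 3).
    # For every point, squared distance to its nearest neighbour in the other
    # cloud, averaged over both directions.
    dists = torch.cdist(points_src, points_tgt)          # (B, N, M) pairwise distances
    loss_src = dists.min(dim=2).values.pow(2).mean()     # src -> tgt
    loss_tgt = dists.min(dim=1).values.pow(2).mean()     # tgt -> src
    return loss_src + loss_tgt
```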

1.3. Fitting a mesh (5 points)

GT:1_1_tgt Prediction: 1_1_src
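
A minimal sketch of a mesh-fitting objective of this kind: chamfer distance on points sampled from the mesh plus a Laplacian smoothness regularizer. The sample count and weights below are assumptions.

```python
import torch
from pytorch3d.loss import chamfer_distance, mesh_laplacian_smoothing
from pytorch3d.ops import sample_points_from_meshes

def mesh_fit_loss(mesh_src, points_tgt, w_chamfer=1.0, w_smooth=0.1):
    # mesh_src: a PyTorch3D Meshes object being deformed; points_tgt: (B, M, 3).
    points_src = sample_points_from_meshes(mesh_src, num_samples=5000)
    loss_chamfer, _ = chamfer_distance(points_src, points_tgt)
    loss_smooth = mesh_laplacian_smoothing(mesh_src)      # uniform Laplacian by default
    return w_chamfer * loss_chamfer + w_smooth * loss_smooth
```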

2. Reconstructing 3D from single view

2.1. Image to voxel grid (20 points)

Image: GT:1_1_tgt Prediction: 1_1_src

Image: GT:1_1_tgt Prediction: 1_1_src

Image: GT:1_1_tgt Prediction: 1_1_src
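
The single-view voxel model follows an encoder-decoder pattern; the sketch below shows one plausible decoder head that maps a global image feature to 32^3 occupancy logits. The feature dimension and layer sizes are assumptions, not the exact architecture used here.

```python
import torch
import torch.nn as nn

class VoxelDecoder(nn.Module):
    """Sketch: global image feature -> 32^3 grid of occupancy logits."""
    def __init__(self, feat_dim=512):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(feat_dim, 1024), nn.ReLU(),
            nn.Linear(1024, 32 * 32 * 32),
        )

    def forward(self, feat):                 # feat: (B, feat_dim) from the image encoder
        return self.fc(feat).view(-1, 32, 32, 32)
```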

2.2. Image to point cloud (20 points)

Image: GT:1_1_tgt Prediction: 1_1_src

Image: GT:1_1_tgt Prediction: 1_1_src

Image: GT:1_1_tgt Prediction: 1_1_src
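
A plausible decoder head for the point-cloud model: regress N x 3 coordinates from the global image feature. The point count, feature dimension, and tanh output range are assumptions.

```python
import torch
import torch.nn as nn

class PointDecoder(nn.Module):
    """Sketch: global image feature -> (N, 3) point coordinates."""
    def __init__(self, feat_dim=512, n_points=1000):
        super().__init__()
        self.n_points = n_points
        self.fc = nn.Sequential(
            nn.Linear(feat_dim, 1024), nn.ReLU(),
            nn.Linear(1024, n_points * 3), nn.Tanh(),    # keep coordinates in [-1, 1]
        )

    def forward(self, feat):                 # feat: (B, feat_dim)
        return self.fc(feat).view(-1, self.n_points, 3)
```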

2.3. Image to mesh (20 points)

Image: GT:1_1_tgt Prediction: 1_1_src

Image: GT:1_1_tgt Prediction: 1_1_src

Image: GT:1_1_tgt Prediction: 1_1_src
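
A plausible decoder head for the mesh model: predict per-vertex offsets that deform an initial ico-sphere template. The sphere level and feature dimension are assumptions.

```python
import torch
import torch.nn as nn
from pytorch3d.utils import ico_sphere

class MeshDecoder(nn.Module):
    """Sketch: global image feature -> per-vertex offsets applied to an ico-sphere."""
    def __init__(self, feat_dim=512, level=4):
        super().__init__()
        self.template = ico_sphere(level)                       # unit-sphere template mesh
        self.n_verts = self.template.verts_packed().shape[0]
        self.fc = nn.Sequential(
            nn.Linear(feat_dim, 1024), nn.ReLU(),
            nn.Linear(1024, self.n_verts * 3), nn.Tanh(),
        )

    def forward(self, feat):                                    # feat: (B, feat_dim)
        b = feat.shape[0]
        offsets = self.fc(feat).view(b * self.n_verts, 3)       # packed (B*V, 3) offsets
        meshes = self.template.extend(b).to(feat.device)
        return meshes.offset_verts(offsets)                     # deformed batch of meshes
```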

2.4. Quantitative comparisons (10 points)

Given the same training duration, the F1 scores attained by the voxel-based and mesh-based approaches fall markedly below the F1 score achieved by the point-cloud approach.

This discrepancy is primarily attributable to the limited resolution of the voxel and mesh representations, capped at a mere 32^3, which is insufficient for accurate 3D object reconstruction. Among the three techniques, point clouds are the most effective at reconstructing 3D chair models, followed by meshes. Increasing the voxel resolution could therefore raise the F1 score and improve model performance.
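
For context, the F1 scores above compare point samples from the prediction and the ground truth at a fixed distance threshold; below is a sketch of that metric (the 0.05 threshold is an assumption).

```python
import torch

def f1_score(points_pred, points_gt, threshold=0.05):
    # points_pred: (N, 3), points_gt: (M, 3) point samples from the two shapes.
    dists = torch.cdist(points_pred[None], points_gt[None])[0]            # (N, M)
    precision = (dists.min(dim=1).values < threshold).float().mean()      # pred covered by GT
    recall = (dists.min(dim=0).values < threshold).float().mean()         # GT covered by pred
    return 2 * precision * recall / (precision + recall + 1e-8)
```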

Point:

2_4_point

Voxel:

2_4_vox

Mesh:

2_4_mesh

2.5. Analyse effects of hyperparameter variations (10 points)

I conducted experiments with the smoothing weight (w_smooth) set to 0.1, 0.5, and 5.0. Qualitatively, any amount of smoothing subtly alters the appearance of the mesh reconstructions, primarily by softening edges and promoting more uniform surfaces, though these changes do not substantially affect the F1 score. At the largest smoothing weight, performance declines noticeably, which can be attributed to over-smoothing: the excessive regularization blurs the geometric detail needed for faithful reconstruction.
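
The swept weight enters the mesh training objective as the coefficient on the Laplacian term; a sketch reusing the same loss structure as in section 1.3:

```python
from pytorch3d.loss import chamfer_distance, mesh_laplacian_smoothing
from pytorch3d.ops import sample_points_from_meshes

def mesh_objective(mesh_pred, points_gt, w_smooth):
    # w_smooth is the value swept below (0.1 / 0.5 / 5.0); large values let the
    # Laplacian term dominate and wash out geometric detail.
    points_pred = sample_points_from_meshes(mesh_pred, num_samples=5000)
    loss_chamfer, _ = chamfer_distance(points_pred, points_gt)
    return loss_chamfer + w_smooth * mesh_laplacian_smoothing(mesh_pred)
```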

w_smooth: 0.1

2_5_0_1

w_smooth: 0.5

2_5_0_5

w_smooth: 5.0

2_5_0_5

2.6. Interpret your model (15 points)

For further analysis, I examine the model's robustness across the three representations (mesh, voxel, and point cloud) by varying the noise weight applied to the input data and observing how accuracy changes for each format.

The exploration reveals that the model's sensitivity varies with the representation type. On mesh data it tends to capture intricate surface details more precisely, reflecting its strength on complex geometric structures. On voxel data its performance wanes slightly, likely due to the limited ability of voxel grids to convey fine geometric detail. Point clouds fall in between: the model captures detailed geometry reasonably well while contending with the sparsity of the data.

These findings suggest the potential advantage of integrating specialized preprocessing or encoding techniques tailored to each representation. For meshes, techniques focusing on surface smoothing and feature enhancement could be beneficial. For voxel data, employing methods that increase resolution or incorporate spatial hierarchies might improve model performance. For point clouds, leveraging advanced sampling methods or point feature encoding could bridge the gap in geometric detail capture. Such strategic enhancements are poised to bolster the model's overall robustness and adaptability across different 3D data representations.
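
The robustness probe perturbs the input images before the encoder; a minimal sketch, assuming images normalized to [0, 1] and zero-mean Gaussian noise scaled by the noise weight:

```python
import torch

def add_input_noise(images, noise_weight):
    # images: (B, C, H, W) in [0, 1]; noise_weight is the factor varied below (0.1 / 0.5 / 1.0).
    noisy = images + noise_weight * torch.randn_like(images)
    return noisy.clamp(0.0, 1.0)
```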

Below are the inputs and results for the three representations: Mesh, Points, and Voxel.

GT:

2_6_gt_0_0.1_mesh 2_6_gt_0_0.1_point 2_6_gt_0_0.1_vox

Prediction with Noise weight: 0.1

2_6_img_0_0.1 2_6_pred_0_0.1_mesh

Prediction with Noise weight: 0.5

2_6_img_0_0.1 2_6_pred_0_0.1_mesh

Prediction with Noise weight: 1

2_6_img_0_0.1 2_6_pred_0_0.1_mesh

3. Exploring other architectures / datasets.

3.1. Implicit network (10 points)

Under the same settings, the performance of the implicit network and the baseline network is shown below:
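
A sketch of the implicit variant: instead of predicting a fixed grid, an MLP maps a 3D query coordinate plus the global image feature to an occupancy logit, and the 32^3 grid is obtained by querying every cell center at inference time. The layer sizes and feature dimension below are assumptions.

```python
import torch
import torch.nn as nn

class ImplicitDecoder(nn.Module):
    """Sketch: (xyz query, image feature) -> occupancy logit."""
    def __init__(self, feat_dim=512, hidden=256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3 + feat_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, xyz, feat):
        # xyz: (B, P, 3) query points; feat: (B, feat_dim) global image feature.
        feat = feat[:, None, :].expand(-1, xyz.shape[1], -1)
        return self.mlp(torch.cat([xyz, feat], dim=-1)).squeeze(-1)   # (B, P) logits
```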

Implicit Network:

Baseline:

3.3. Extended dataset for training (10 points)

I compared training a 3D reconstruction model on an extended dataset covering multiple classes (chair, car, plane) against training on a single-class dataset. Quantitatively, the model trained on the extended dataset generalizes better across classes and achieves higher overall performance metrics. However, this comes at the cost of slightly reduced performance on any individual class, evidenced by a lower F1 score on chairs than the model trained only on chairs. This points to a trade-off between specialization and generalization: expanding the training set to include a variety of classes improves the model's ability to understand and reconstruct a wider range of objects, but may reduce its precision on class-specific details.
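
To make the comparison concrete, a sketch of one way to compute a per-class F1 breakdown; the loader fields, batch layout, and model call here are placeholders, not the exact evaluation code.

```python
from collections import defaultdict

def per_class_f1(model, loader, f1_fn):
    # Average F1 per class so the single-class and multi-class models can be
    # compared both on chairs alone and on the full extended set.
    scores = defaultdict(list)
    for images, points_gt, cls_name in loader:          # placeholder batch layout
        points_pred = model(images)
        scores[cls_name].append(f1_fn(points_pred[0], points_gt[0]))
    return {c: sum(v) / len(v) for c, v in scores.items()}
```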

Below are the detailed results:

Comparison of Performance on a Single Class:

Train on Small Dataset:

eval_point_3_3

Train on Large Dataset:

eval_point_3_3

Comparison of Performance on Multiple Classes:

Train on Small Dataset:

eval_point_3_3

Train on Large Dataset:

eval_point_3_3

Qualitative results of the model trained on multiple classes [Input, GT, Prediction]:

3_3_img_1200 3_3_gt_point_1200 3_3_gt_point_1200

3_3_img_1200 3_3_gt_point_1200 3_3_gt_point_1200

3_3_img_1200 3_3_gt_point_1200 3_3_gt_point_1200

Qualitative Comparison on a Single Class [Input, GT, Pred_model_trained_single, Pred_model_trained_mul]:

img_0 3_3_gt_point_0 3_3_pred_point_0 3_3_pred_point_0_mul

img_0 3_3_gt_point_0 3_3_pred_point_0 3_3_pred_point_0_mul

img_0 3_3_gt_point_0 3_3_pred_point_0 3_3_pred_point_0_mul