Author: Zhe Huang (zhehuang@andrew.cmu.edu)
!pwd
/home/zheh/Documents/CMU_16_889_22spring/assignment5
!which python
/home/zheh/anaconda3/envs/l3d/bin/python
import os
# !pip install -q mediapy
import mediapy as media
def cls_to_dict_entry(gif_path):
    # Parse a result filename like 'res_cls_121_chair_predicted_as_chair_1.00.gif'
    # into a display key plus the loaded GIF frames.
    filename = os.path.basename(gif_path)
    filename, prob = filename.rsplit('_', 1)
    prob = prob[:4]  # strip the trailing '.gif'
    id_, filename = filename[8:].split('_', 1)  # drop the 'res_cls_' prefix
    filename, pred = filename.rsplit('_', 1)
    gt, filename = filename.split('_', 1)
    key = f'id: {id_}, {gt} predicted as {pred}, prob: {prob}'
    return key, media.read_video(gif_path)
def seg_to_dict_entry(gif_prefix):
    # Parse a result prefix like 'res_seg__333_10000_1.0_0.96' and load the
    # paired ground-truth and prediction GIFs.
    fileprefix = os.path.basename(gif_prefix)
    fileprefix, prob = fileprefix.rsplit('_', 1)
    id_, fileprefix = fileprefix[9:].split('_', 1)  # drop the 'res_seg__' prefix
    key_gt = f'id: {id_}, gt'
    val_gt = media.read_video(gif_prefix + '_gt.gif')
    key_pred = f'id: {id_}, pred, {prob}'
    val_pred = media.read_video(gif_prefix + '_pred.gif')
    return key_gt, val_gt, key_pred, val_pred
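As a quick sanity check of the filename parsing above (without any GIF I/O), the key-building logic of `cls_to_dict_entry` can be exercised on a sample path; `cls_key` below is a hypothetical helper that mirrors just the string-parsing part:

```python
import os

def cls_key(gif_path):
    # Mirror of the key-parsing logic in cls_to_dict_entry, minus the video read.
    filename = os.path.basename(gif_path)
    filename, prob = filename.rsplit('_', 1)
    prob = prob[:4]                               # strip the trailing '.gif'
    id_, filename = filename[8:].split('_', 1)    # drop the 'res_cls_' prefix
    filename, pred = filename.rsplit('_', 1)
    gt, _ = filename.split('_', 1)
    return f'id: {id_}, {gt} predicted as {pred}, prob: {prob}'

print(cls_key('./output/res_cls_121_chair_predicted_as_chair_1.00.gif'))
# id: 121, chair predicted as chair, prob: 1.00
```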
The best test accuracy is 97.59%.
succ_filelist = [
'./output/res_cls_121_chair_predicted_as_chair_1.00.gif',
'./output/res_cls_621_vase_predicted_as_vase_0.57.gif',
'./output/res_cls_731_lamp_predicted_as_lamp_0.99.gif',
'./output/res_cls_19_chair_predicted_as_chair_1.00.gif',
'./output/res_cls_710_vase_predicted_as_vase_1.00.gif',
'./output/res_cls_884_lamp_predicted_as_lamp_1.00.gif',
'./output/res_cls_491_chair_predicted_as_chair_1.00.gif',
'./output/res_cls_700_vase_predicted_as_vase_0.59.gif',
'./output/res_cls_883_lamp_predicted_as_lamp_0.66.gif',
]
success_dict = dict(cls_to_dict_entry(x) for x in succ_filelist)
media.show_videos(success_dict, codec='gif', columns=3, height=500)
id: 121, chair predicted as chair, prob: 1.00 | id: 621, vase predicted as vase, prob: 0.57 | id: 731, lamp predicted as lamp, prob: 0.99 |
id: 19, chair predicted as chair, prob: 1.00 | id: 710, vase predicted as vase, prob: 1.00 | id: 884, lamp predicted as lamp, prob: 1.00 |
id: 491, chair predicted as chair, prob: 1.00 | id: 700, vase predicted as vase, prob: 0.59 | id: 883, lamp predicted as lamp, prob: 0.66 |
fail_filelist = [
'./output/res_cls_406_chair_predicted_as_lamp_0.54.gif',
'./output/res_cls_707_vase_predicted_as_lamp_0.98.gif',
'./output/res_cls_787_lamp_predicted_as_vase_0.86.gif',
]
failure_dict = dict(cls_to_dict_entry(x) for x in fail_filelist)
media.show_videos(failure_dict, codec='gif', columns=3, height=500)
id: 406, chair predicted as lamp, prob: 0.54 | id: 707, vase predicted as lamp, prob: 0.98 | id: 787, lamp predicted as vase, prob: 0.86 |
Interpretation: Our model performs very well, with only one misclassified chair in the test set. The three misclassifications are all outliers within their categories. The first chair is folded, which would be confusing even to a human observer. The second vase has an unusual rectangular shape at the top. The third lamp closely resembles a vase with a handle. These are genuinely hard, effectively out-of-distribution samples.
The best test accuracy is 89.37%.
succ_filelist = [
'./output/res_seg__333_10000_1.0_0.96',
'./output/res_seg__562_10000_1.0_0.99',
'./output/res_seg__123_10000_1.0_0.91'
]
seg_list = [seg_to_dict_entry(x) for x in succ_filelist]
# Put all ground-truth entries first, then all predictions, so that each
# 3-column row of the grid shows gt directly above the matching pred.
seg_list = [(x[0], x[1]) for x in seg_list] + [(x[2], x[3]) for x in seg_list]
success_dict = dict(seg_list)
media.show_videos(success_dict, codec='gif', columns=3, height=500)
id: 333, gt | id: 562, gt | id: 123, gt |
id: 333, pred, 0.96 | id: 562, pred, 0.99 | id: 123, pred, 0.91 |
fail_filelist = [
'./output/res_seg__235_10000_1.0_0.44',
'./output/res_seg__351_10000_1.0_0.46',
'./output/res_seg__26_10000_1.0_0.42'
]
seg_list = [seg_to_dict_entry(x) for x in fail_filelist]
seg_list = [(x[0], x[1]) for x in seg_list] + [(x[2], x[3]) for x in seg_list]
failure_dict = dict(seg_list)
media.show_videos(failure_dict, codec='gif', columns=3, height=500)
id: 235, gt | id: 351, gt | id: 26, gt |
id: 235, pred, 0.44 | id: 351, pred, 0.46 | id: 26, pred, 0.42 |
Interpretation: Almost all failure cases involve a boxy couch. These are rare in the dataset, as they are not technically chairs. Because of their unconventional shapes, our model struggles to segment the legs and arms, causing a significant performance drop.
We sample different numbers of points to test the robustness of our best trained models on both the classification and segmentation tasks. Specifically, we vary the number of points over 10, 100, 1,000, and 10,000 (original), and report the overall accuracy on the corresponding test sets. Here are our results.
# of points | Cls. Acc. | Seg. Acc.
---|---|---
10 | 27.70% | 45.54%
100 | 92.34% | 78.42%
1,000 | 97.06% | 88.43%
10,000 (original) | 97.59% | 89.37%
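The subsampling step for this experiment can be sketched as follows. This is a minimal sketch assuming each object is an (N, 3) array; the actual evaluation loop (model, data loader) is not shown, and `subsample_points` is a hypothetical helper, not the assignment's code:

```python
import numpy as np

def subsample_points(points, num_points, seed=0):
    # Randomly keep `num_points` of the original points, without replacement.
    # `points` is assumed to be an (N, 3) array with N >= num_points.
    rng = np.random.default_rng(seed)
    idx = rng.choice(points.shape[0], size=num_points, replace=False)
    return points[idx]

cloud = np.random.default_rng(1).random((10000, 3))
for n in (10, 100, 1000, 10000):
    assert subsample_points(cloud, n).shape == (n, 3)
```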
We enlarge or shrink the point-cloud objects to see how our model behaves at different object scales, again testing the robustness of our best trained models on both the classification and segmentation tasks. Specifically, we scale the point clouds by factors of 0.25, 0.5, 1 (original), 2, and 4, with the number of points fixed at 10,000, and report the overall accuracy on the corresponding test sets. Here are our results.
scale of points | Cls. Acc. | Seg. Acc.
---|---|---
0.25 | 24.55% | 20.43%
0.5 | 43.76% | 54.99%
1 (original) | 97.59% | 89.37%
2 | 89.09% | 62.08%
4 | 71.88% | 45.13%
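The scaling step can be sketched as below, assuming each object is an (N, 3) array and that scaling is applied about the origin (scaling about the centroid would be an alternative if the clouds were not centered); `scale_cloud` is a hypothetical helper, not the assignment's code:

```python
import numpy as np

def scale_cloud(points, scale):
    # Uniformly scale an (N, 3) point cloud about the origin.
    return points * scale

cloud = np.random.default_rng(0).random((10000, 3))
for s in (0.25, 0.5, 1, 2, 4):
    scaled = scale_cloud(cloud, s)
    assert scaled.shape == cloud.shape
    assert np.allclose(scaled, cloud * s)
```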