2.6. Interpret your model
[Concept]
Rather than looking only at the F1 score, I visualize the precision and recall components to get a better idea of how the model performs: it is important to know where false positives and false negatives occur.
[Visualization explanation]
In the visualization below, I show example voxel, mesh, and point cloud predictions. The left column
is the input single-view image. The middle column shows the points sampled from the predictions,
which are used to compute the F1 score. The right column shows the points sampled
from the ground truth mesh.
In the middle images, red and yellow points are predicted points whose distance to the closest
ground truth point is below and above the threshold of 0.05, respectively.
In the right images, red and yellow points are ground truth points whose distance to the closest
predicted point is below and above the threshold, respectively.
In short, more yellow points in the middle images mean more false positives and lower precision;
more yellow points in the right images mean more false negatives and lower recall.
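The precision/recall/F1 computation described above can be sketched as follows. This is a minimal brute-force version, not the exact implementation used here: the function name `f1_score_points` and the use of a full pairwise distance matrix are illustrative assumptions.

```python
import numpy as np

def f1_score_points(pred, gt, threshold=0.05):
    """F1 score between two point sets of shape (N, 3) and (M, 3).

    Precision: fraction of predicted points whose nearest ground truth
    point lies within the threshold. Recall: the symmetric quantity,
    computed from the ground truth side.
    """
    # Pairwise Euclidean distances, shape (N, M).
    d = np.linalg.norm(pred[:, None, :] - gt[None, :, :], axis=-1)
    precision = (d.min(axis=1) < threshold).mean()  # pred -> nearest gt
    recall = (d.min(axis=0) < threshold).mean()     # gt -> nearest pred
    f1 = 2 * precision * recall / max(precision + recall, 1e-8)
    return precision, recall, f1

# Toy example: predictions are the ground truth plus small noise.
rng = np.random.default_rng(0)
gt = rng.random((100, 3))
pred = gt + rng.normal(scale=0.02, size=gt.shape)
p, r, f1 = f1_score_points(pred, gt)
```

A k-d tree (e.g. `scipy.spatial.cKDTree`) would replace the O(N*M) distance matrix for larger point clouds, but the brute-force form makes the precision/recall asymmetry easy to see.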
[Interpretation]
Overall, the voxel prediction has more false positives in the hollow part of the chair back
and around the back chair legs, and all three predictions have difficulty reconstructing the
joints between the chair legs. The visualization is consistent with the F1 scores (point cloud > mesh > voxel) and
gives a better idea of what the model learns to do well and poorly.
This visualization also raises a question: why are most of the points inside the hollow part
of the chair back red in the middle images, when we would expect them to be yellow?
There could be two reasons. First, the threshold may be too high. Second,
different predicted points can share the same closest ground truth point. Based on the visualization and
these two (or more) reasons, we can try more methods to find a metric that better reflects human perception.