Recognizing and generating object-state compositions is challenging, especially when generalizing to unseen compositions. In this paper, we study the task of cutting objects in different styles and the resulting object state changes. We propose a new benchmark suite, Chop & Learn, to accommodate the needs of learning objects and different cut styles using multiple viewpoints. We also propose a new task, Compositional Image Generation, which transfers learned cut styles to different objects by generating novel object-state images. Moreover, we use the videos for Compositional Action Recognition and show valuable uses of this dataset for multiple video tasks.
ChopNLearn is collected with:
We show the number of samples for each object-style composition in a color-coded manner: orange represents 12 samples, green represents 8 samples, and blue represents 4 samples.
Compositional Image Generation: Given training images of various objects in different states, generate new images of unseen pairs of objects and states.
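To make the seen/unseen distinction concrete, the following is a minimal sketch of how object-state pairs might be divided so that every object and every state still appears in training while some pairs are held out. It is not the dataset's official split procedure: the function name `split_compositions`, the greedy hold-out strategy, the `n_unseen` parameter, and the assumption that every object occurs with every listed cut style are illustrative only.

```python
import random
from itertools import product


def split_compositions(objects, states, n_unseen, seed=0):
    """Hold out up to n_unseen (object, state) pairs for testing.

    Hypothetical helper: a pair is held out only if its object and its
    state each remain in at least one other training pair, so a model
    sees every object and every state, just not every combination.
    """
    pairs = list(product(objects, states))
    rng = random.Random(seed)
    rng.shuffle(pairs)

    train = set(pairs)
    unseen = []
    for obj, state in pairs:
        if len(unseen) == n_unseen:
            break
        remaining = train - {(obj, state)}
        object_still_seen = any(o == obj for o, _ in remaining)
        state_still_seen = any(s == state for _, s in remaining)
        if object_still_seen and state_still_seen:
            train = remaining
            unseen.append((obj, state))
    return sorted(train), sorted(unseen)


# Object and cut-style names taken from the examples on this page;
# the full cross product is an assumption made for illustration.
objects = ["pear", "squash", "watermelon", "potato", "onion"]
states = ["julienne", "small cut", "half round slices", "baton", "round slices"]

train_pairs, unseen_pairs = split_compositions(objects, states, n_unseen=5)
print(len(train_pairs), "training pairs;", len(unseen_pairs), "unseen pairs")
```

A generation model would then be trained on images from `train_pairs` only and asked to synthesize images for each pair in `unseen_pairs`.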
We consider these methods:
Ground Truth (GT) real images are shown in the first row for reference.
Results of 3D reconstruction using RealFusion.
Input images are shown on the top row, and the corresponding 3D reconstructions are shown on the bottom row.
Object and cut style pairs shown: Pear (Julienne), Squash (Small Cut), Watermelon (Half Round Slices), Potato (Baton), Onion (Round Slices).