You may search for interactions by specifying a query composed of:
- Visual Action: any of the Visual VerbNet actions (each having a unique and unambiguous visual connotation), performed by a subject upon a certain object.
- Visual Adverb: a property associated with a Visual Action, such as the relative location or distance between subject and object, or with the subject performing the action, such as its emotional state.
- Object Category: any of the 80 object categories annotated in the MS COCO dataset, on which a certain Visual Action is being performed.
You may also submit a query combining the three elements above. This will retrieve all annotations (if present) containing a certain action, with a certain property, acted upon a specific object.
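For illustration, here is a minimal sketch of how such a combined query could be evaluated against the annotation data, assuming a hypothetical JSON export in which each annotation records its visual actions, adverbs, and object category (the field and file names are assumptions, not the actual COCO-a schema):

```python
import json

def search(annotations, action=None, adverb=None, object_category=None):
    """Return every annotation matching all of the specified query elements."""
    results = []
    for ann in annotations:
        if action and action not in ann.get("visual_actions", []):
            continue
        if adverb and adverb not in ann.get("visual_adverbs", []):
            continue
        if object_category and ann.get("object_category") != object_category:
            continue
        results.append(ann)
    return results

# Example query: all annotations of a subject riding a horse at full contact.
with open("cocoa_train.json") as f:  # hypothetical export of the training set
    matches = search(json.load(f), action="ride",
                     adverb="full contact", object_category="horse")
print(f"{len(matches)} matching annotations")
```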
There are two options for composing the desired query:
1) Manual Selection
- Click the button corresponding to the query element you would like to select.
- Browse the specified categories for the element (or elements) you are interested in and select it manually.
- Once clicked, the element is highlighted for confirmation and automatically appears in the search bar.
The selectable Visual Actions, Visual Adverbs and Object Categories are listed below, grouped by category:
Communication (6)
call
shout
signal
talk
whistle
wink
Contact (22)
avoid
bite
bump
caress
hit
hold
hug
kick
kiss
lick
lift
massage
pet
pinch
poke
pull
punch
push
reach
slap
squeeze
tickle
Nutrition (7)
chew
cook
devour
drink
eat
prepare
spread
Objects (34)
bend
break
brush
build
carry
catch
clear
cut
disassemble
drive
drop
exchange
fill
get
lay
light
mix
pour
put
read
remove
repair
ride
row
sail
separate
show
spill
spray
steal
throw
use
Perception (5)
listen
look
sniff
taste
touch
Posture / Motion (23)
balance
bend
bow
climb
crouch
fall
float
fly
hang
jump
kneel
lean
lie
perch
recline
roll
run
sit
squat
stand
straddle
swim
walk
Social (24)
accompany
be with
chase
dance
dine
dress
feed
fight
follow
give
groom
help
hunt
kill
meet
pay
play baseball
play basketball
play frisbee
play soccer
play tennis
precede
shake hands
teach
Solo (24)
blow
clap
cry
draw
groan
laugh
paint
photograph
play
play baseball
play basketball
play frisbee
play instrument
play soccer
play tennis
pose
sing
skate
ski
sleep
smile
snowboard
surf
write
Emotion (7)
anger
disgust
fear
happiness
neutral
sadness
surprise
Relative Location (6)
above
behind
below
in front
left
right
Relative Distance (4)
far
full contact
light contact
near
Accessory (5)
backpack
handbag
suitcase
tie
umbrella
Animal (10)
bear
bird
cat
cow
dog
elephant
giraffe
horse
sheep
zebra
Appliance (5)
microwave
oven
refrigerator
sink
toaster
Electronic (6)
cell phone
keyboard
laptop
mouse
remote
tv
Food (10)
apple
banana
broccoli
cake
carrot
donut
hot dog
orange
pizza
sandwich
Furniture (6)
bed
chair
couch
dining table
potted plant
toilet
Indoor (7)
book
clock
hair drier
scissors
teddy bear
toothbrush
vase
Kitchen (7)
bottle
bowl
cup
fork
knife
spoon
wine glass
Outdoor (5)
bench
fire hydrant
parking meter
stop sign
traffic light
Sports (10)
baseball bat
baseball glove
frisbee
kite
skateboard
skis
snowboard
sports ball
surfboard
tennis racket
Vehicle (8)
airplane
bicycle
boat
bus
car
motorcycle
train
truck
2) Search Bar Autocomplete
- Start typing a Visual Action, Visual Adverb or Object Category in the search bar.
- All Visual VerbNet and Object Category entries sharing a common root with the typed word will appear and become selectable.
- Once a valid word is entered, it is added to the search bar and a new search can be performed.
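The matching behaviour can be pictured as a simple prefix match over the combined vocabulary of Visual Actions, Visual Adverbs and Object Categories. A minimal sketch, with an abbreviated vocabulary (the widget's actual logic may differ):

```python
# Abbreviated sample of Visual VerbNet actions and COCO object categories.
VOCABULARY = ["reach", "read", "recline", "refrigerator",
              "remote", "remove", "repair", "ride", "right"]

def autocomplete(typed, vocabulary=VOCABULARY):
    """Return all entries sharing a common root (prefix) with the typed word."""
    typed = typed.strip().lower()
    return sorted(w for w in vocabulary if w.startswith(typed)) if typed else []

print(autocomplete("re"))
# ['reach', 'read', 'recline', 'refrigerator', 'remote', 'remove', 'repair']
```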
Once you have composed your query and pressed the Search button, you will be presented with results retrieved from the COCO-a training set, if any. The results are organized in the following elements:
1) Number of retrieved annotations
The current number of annotations in the COCO-a training set matching the selected query.
2) Image Id Button
Hover the mouse over this button to view the original image from the MS COCO dataset, without the subject/object interaction annotations.
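If you prefer to resolve an image id outside the browser, the standard pycocotools API can do so. A minimal sketch, assuming the MS COCO 2014 training annotations are available locally (the path and image id below are placeholders):

```python
from pycocotools.coco import COCO

# Placeholder path to the MS COCO 2014 training annotation file.
coco = COCO("annotations/instances_train2014.json")
img_info = coco.loadImgs([262148])[0]  # placeholder image id
print(img_info["file_name"], img_info["coco_url"])
```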
3) Give Feedback Button
Click on this button to give feedback about a specific annotation. Please use it to report errors or inappropriate material, or simply to share doubts and suggestions regarding an interaction annotation.
4) Annotation Results
Each image represents a single interaction or solo action annotation. It contains either both a subject (highlighted in blue) and an object (highlighted in green), or only a subject if the annotation represents a solo action.
Each interaction can thus be composed of two parts:
- Subject properties: describe the subject's state and solo actions, independently of the interaction carried out with the object (if present). Hover the mouse inside the subject's highlight in the image to view these annotations.
- Interaction properties: describe the subject's interaction with the highlighted object and the properties of this interaction, such as relative location and distance. Hover the mouse inside the object's highlight in the image to view these annotations.
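Putting the two parts together, each retrieved annotation can be thought of as two groups of fields, one per hover target. The layout below is purely illustrative (the field names are assumptions, not the actual COCO-a schema):

```python
annotation = {
    "image_id": 262148,  # placeholder id
    "subject": {  # shown when hovering inside the blue subject highlight
        "emotion": "happiness",
        "posture": ["sit"],
        "solo_actions": ["smile"],
    },
    "interaction": {  # shown when hovering inside the green object highlight
        "object_category": "horse",
        "visual_actions": ["ride", "hold"],
        "relative_location": "above",
        "relative_distance": "full contact",
    },
}
```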