# ssv2-contact-release-interaction-dataset

A **metadata and annotation repository** for **contact and release interaction events** in videos from the **Something-Something V2 (SSv2)** dataset. Includes human-annotated spatiotemporal labels for object–agent interactions.

---

## Dataset Label Structure

This repository follows the **Something-Something V2** interaction schema, where each video in the original dataset is associated with a **template** and **placeholders** representing the objects involved in the interaction.

> Note: This repository contains only metadata and annotations — not the original videos.

---

## Metadata Folder

All dataset-related metadata files are stored under the `metadata/` directory. These files contain structured information about labeled video events, dataset splits, and mappings between templates and video IDs.

---

### 1. `metadata/video_events_labels.json`

Contains detailed labels for each annotated event, including the type of interaction and its frame-level attributes. Each key represents a video ID, and the value is a list of labeled events.

**Format example:**

```json
{
  "20": [
    {
      "action": "release",
      "agent": "hand-object",
      "frameNumber": 9,
      "pointX": 113,
      "pointY": 169
    }
  ]
}
```

**Description:**

- `action` – Type of event (e.g., `contact`, `release`)
- `agent` – Interacting entities (e.g., `hand-object`, `object-surface`)
- `frameNumber` – The frame in which the event occurs
- `pointX`, `pointY` – Pixel coordinates of the annotated event in the frame

---

### 2. `metadata/template_to_video_ids_map.json`

Maps each action template to all video IDs that were manually labeled and contain interaction events (contact/release). Only videos with valid annotations are included.

**Format example:**

```json
{
  "Putting something on a surface": ["5845", "8627", "19469"],
  "Lifting something": ["7001", "7154", "7320"]
}
```

---

### 3. `metadata/train_videos_ids_labeled.json`, `metadata/validation_videos_ids_labeled.json`, `metadata/test_videos_ids_labeled.json`

These three files list the video IDs from the corresponding dataset split (train/validation/test) that were selected for labeling and contain at least one interaction event.

**Format example:**

```json
[
  "5845",
  "8627",
  "19469",
  "20251"
]
```

---

## Frame Extraction

We extracted frames from all videos using **OpenCV 4.7.0** at their original FPS. Each frame was saved as a `.jpg` image (default quality = 95) in **BGR color format** — the default channel order used internally by OpenCV.

**Example:**

```python
import os

import cv2


def video_to_frames(video_path, output_dir):
    """Extract every frame of a video and save it as a numbered .jpg file."""
    os.makedirs(output_dir, exist_ok=True)  # ensure output folder exists
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS)
    count = 0
    while cap.isOpened():
        ret, frame = cap.read()
        if not ret:  # no more frames (or read error)
            break
        cv2.imwrite(os.path.join(output_dir, f"frame_{count:05d}.jpg"), frame)
        count += 1
    cap.release()
    print(f"Extracted {count} frames at {fps:.2f} FPS.")
```

The same procedure was applied to all videos in the dataset on a computing cluster, using individual jobs per video.
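For reference, a minimal local sketch of how the extraction above might be driven over a folder of videos; the `videos/` and `frames/` directory names are hypothetical, SSv2 clips are assumed to be stored as `.webm` files, and it reuses `video_to_frames` from the snippet above (on the cluster, each call ran as its own job).

```python
from pathlib import Path

VIDEO_DIR = Path("videos")   # hypothetical folder holding the SSv2 .webm clips
FRAMES_DIR = Path("frames")  # hypothetical output root, one subfolder per video ID

for video_path in sorted(VIDEO_DIR.glob("*.webm")):
    # Reuses video_to_frames() defined above, e.g. videos/5845.webm -> frames/5845/
    video_to_frames(str(video_path), str(FRAMES_DIR / video_path.stem))
```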
---

## Annotating Frames

![Annotated Frame Example](images/annotated_frame.png)

Human annotations of interactions were collected using the Amazon Mechanical Turk platform. Human subjects were asked to annotate **core interaction events** in videos from the SSv2 dataset. Shown here are example annotations for **“contact”** and **“release”** events, where the target object (white candle) becomes attached to a hand (left) and a surface (middle), or detached from the hand (right).

Each annotation includes:

- The **event type** (e.g., contact, release)
- The **agent–object pair** (hand–object, object–surface, etc.)
- The **spatiotemporal location** of the event (frame number and image coordinates)

---

**Example code snippet:**

```python
from io import BytesIO
from pathlib import Path
from urllib.request import urlopen

import numpy as np
from PIL import Image, ImageDraw


def annotate_frame(lbl, wrkr_id_save_dir, point_x, point_y, frame_number,
                   agent, action, point_radius=5):
    """
    Draws annotation info (point, frame number, agent, and action) on an image
    and saves it locally.

    Args:
        lbl (dict): Label data containing 'imageURL'.
        wrkr_id_save_dir (Path or str): Directory where the annotated image will be saved.
        point_x (int): X coordinate of the clicked point.
        point_y (int): Y coordinate of the clicked point.
        frame_number (int): Frame number within the video.
        agent (str): Agent type (e.g., 'Hand-Object', 'Object-Surface').
        action (str): Action type (e.g., 'Contact', 'Release').
        point_radius (int, optional): Radius of the point marker. Defaults to 5.

    Returns:
        Path: Path to the saved annotated image, or None if saving failed.
    """
    # Ensure save directory exists
    save_dir = Path(wrkr_id_save_dir)
    save_dir.mkdir(parents=True, exist_ok=True)

    # Load the image from the provided URL and force a 3-channel RGB image
    img_url = lbl.get('imageURL')
    with urlopen(img_url) as response:
        img = Image.open(BytesIO(response.read())).convert('RGB')

    # Draw the annotations on the image
    draw = ImageDraw.Draw(img)

    # Draw the red point
    draw.ellipse(
        (point_x - point_radius, point_y - point_radius,
         point_x + point_radius, point_y + point_radius),
        fill=(255, 0, 0)
    )

    # Pick a contrasting text color: the inverse of the mean color of the
    # top-left 150x150 patch, as a plain RGB tuple of Python ints
    patch_mean = np.asarray(img)[:150, :150].mean(axis=(0, 1))
    mean_color = tuple(int(c) for c in (255 - patch_mean))

    # Add textual information
    draw.text((10, 10), f'Frame: {frame_number}', fill=mean_color)
    draw.text((10, 25), f'Point: ({point_x}, {point_y})', fill=mean_color)
    draw.text((10, 40), f'Agent: {agent}', fill=mean_color)
    draw.text((10, 55), f'Action: {action}', fill=mean_color)

    # Save the annotated image
    output_path = save_dir / f'frame_{frame_number}.jpg'
    try:
        img.save(str(output_path))
        print(f"Saved: {output_path}")
        return output_path
    except Exception as err:
        print(f"ERROR: Could not save image {img_url}: {err}")
        return None
```

---

### Summary Table

| File | Description |
|------|--------------|
| `metadata/video_events_labels.json` | Frame-level annotations for each video, including action type, coordinates, and frame number. |
| `metadata/template_to_video_ids_map.json` | Mapping of each action template to all labeled video IDs with interactions. |
| `metadata/train_videos_ids_labeled.json` | IDs of labeled videos in the training set. |
| `metadata/validation_videos_ids_labeled.json` | IDs of labeled videos in the validation set. |
| `metadata/test_videos_ids_labeled.json` | IDs of labeled videos in the test set. |
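As a quick usage reference, here is a minimal sketch of reading these metadata files together and listing the labeled events for one template; it assumes the repository's `metadata/` directory is in the current working directory and uses the template name from the format example above.

```python
import json
from pathlib import Path

METADATA_DIR = Path("metadata")  # assumes the repository root is the working directory

# Frame-level contact/release events, keyed by video ID
with open(METADATA_DIR / "video_events_labels.json") as f:
    video_events = json.load(f)

# Labeled video IDs per action template, and the labeled training-split IDs
with open(METADATA_DIR / "template_to_video_ids_map.json") as f:
    template_to_videos = json.load(f)
with open(METADATA_DIR / "train_videos_ids_labeled.json") as f:
    train_ids = set(json.load(f))

# Print every labeled event for training-split videos of one template
for video_id in template_to_videos.get("Putting something on a surface", []):
    if video_id not in train_ids:
        continue
    for event in video_events.get(video_id, []):
        print(video_id, event["action"], event["agent"],
              "frame", event["frameNumber"],
              "point", (event["pointX"], event["pointY"]))
```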