List of RGBD datasets

This is an incomplete list of datasets which were captured using a Kinect or similar devices. I initially began it to keep track of semantically labelled datasets, but I have now also included some camera tracking and object pose estimation datasets. I ultimately aim to keep track of every Kinect-style RGB-D dataset available for researchers to use.

Where possible, links have been added to project or personal pages. Where I have not been able to find these, I have used a direct link to the data.

Please send suggestions for additions and corrections to me at m.firman <at> cs.ucl.ac.uk.

This page is automatically generated from a YAML file, and was last updated on 09 October, 2018.
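
As a rough illustration of that generation step, here is a minimal Python sketch that parses a YAML list of entries and prints them. The field names (name, introduced, device, description) are hypothetical stand-ins, not the actual schema of the file behind this page.

# Minimal sketch of rendering dataset entries from a YAML list.
# The schema used here is a hypothetical stand-in.
import yaml  # PyYAML

ENTRIES = """
- name: RGBD Object dataset
  introduced: ICRA 2011
  device: Kinect v1
  description: 300 instances of household objects, in 51 categories.
"""

for entry in yaml.safe_load(ENTRIES):
    print(entry["name"])
    for field in ("introduced", "device", "description"):
        print(f"  {field.capitalize()}: {entry[field]}")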

2023 update: Unfortunately I am no longer maintaining this list. Please consider it a snapshot of datasets up to around 2017.

RGBD Datasets: Past, Present and Future

This web page is now available in PDF format!

This eight-page publication has been accepted for publication in the CVPR 2016 workshop 'Large Scale 3D Data: Acquisition, Modelling and Analysis'.


@inproceedings{firman-cvprw-2016,
   author = {Michael Firman},
   title = {{RGBD Datasets: Past, Present and Future}},
   booktitle = {CVPR Workshop on Large Scale 3D Data: Acquisition, Modelling and Analysis},
   year = {2016}
}

Datasets capturing single objects

These datasets capture objects under fairly controlled conditions. Bigbird is the most advanced in terms of quality of image data and camera poses, while the RGB-D object dataset is the most extensive.

RGBD Object dataset

Introduced: ICRA 2011

Device: Kinect v1

Description: 300 instances of household objects, in 51 categories. 250,000 frames in total

Labelling: Category and instance labelling. Includes auto-generated masks, but no exact 6DOF pose information.

Download: Project page

Bigbird dataset

Introduced: ICRA 2014

Device: Kinect v1 and DSLR

Description: 100 household objects

Labelling: Instance labelling. Masks, ground truth poses, registered mesh.

Download: Project page

A large dataset of object scans

Introduced: 2016

Device: PrimeSense Carmine

Description: Over 10,000 objects densely scanned and reconstructed. Data captured from the real world by non-technical operators.

Labelling: Object present in each scan.

Download: Project page

Segmentation, detection and pose estimation under controlled conditions

These datasets include objects arranged in controlled conditions. Clutter may be present. CAD or meshed models of the objects may or may not be provided. Most provide 6DOF ground truth pose for each object.

Object segmentation dataset

Introduced: IROS 2012

Device: Kinect v1

Description: 111 RGBD images of stacked and occluding objects on table.

Labelling: Per-pixel segmentation into objects.

Download: Project page

Willow Garage Dataset

Introduced: 2011

Device: Kinect v1

Description: 353 frames of 110 different household objects on a board in controlled environment.

Labelling: 6DOF pose for each object, taken from board calibration. Per-pixel labelling.

Download: Project page

TUW Dataset

Introduced: IROS 2014

Device: Kinect v1

Description: 15 multi-view sequences of indoor scenes, totalling 163 frames. Also 3 dynamic scenes. 162 different objects.

Labelling: 6DOF pose for each object

Download: Project page

'3D Model-based Object Recognition and Segmentation in Cluttered Scenes'

Introduced: IJCV 2009

Device: Minolta Vivid 910 (only depth, no RGB!)

Description: 50 frames depicting five objects in various occluding poses. No background clutter in any image.

Labelling: Pose and per-point labelling information. 3D mesh models of each of the 5 objects.

Download: Project page

'A Global Hypotheses Verification Method for 3D Object Recognition'

Introduced: ECCV 2012

Device: Kinect v1

Description: 50 Kinect frames, library of 35 objects

Labelling: 6DOF pose of each object (unsure how this was gathered). No per-pixel labelling.

Download: Direct link

'Model Based Training, Detection and Pose Estimation of Texture-Less 3D Objects in Heavily Cluttered Scenes'

Introduced: ACCV 2012

Device: Kinect v1

Description: 18,000 Kinect images, library of 15 objects.

Labelling: 6DOF pose for each object in each image. No per-pixel labelling.

Download: Project page

'RGB-D Semantic Segmentation Dataset'

Introduced: IROS 2011

Device: Kinect v1

Description: 16 test scenes of household objects, plus 3D training models for each category.

Labelling: Semantic segmentation of each scene.

Download: Project page

T-LESS: An RGB-D Dataset for 6D Pose Estimation of Texture-less Objects

Introduced: WACV 2017

Device: PrimeSense Carmine 1.09, Microsoft Kinect v2, Canon IXUS 950 IS (the sensors were synchronized)

Description: 30 texture-less objects. 39K training and 10K test images from each sensor. Two types of 3D models for each object - a manually created CAD model and a semi-automatically reconstructed one.

Labelling: 6DOF pose for each object in each image. Per-pixel labelling can be obtained by rendering the object models at the ground truth poses (a projection sketch follows this entry).

Download: Project page
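
As the labelling note above says, per-pixel labels come from rendering the object models at their ground-truth poses. Below is a minimal numpy sketch of the underlying projection, splatting mesh vertices rather than rasterising triangles; the intrinsics, pose and vertices are all placeholder values, not T-LESS calibration.

# Sketch: project mesh vertices at a ground-truth 6DOF pose into the image
# to obtain an approximate per-pixel object mask. All values are placeholders.
import numpy as np

K = np.array([[572.4, 0.0, 325.3],
              [0.0, 573.6, 242.0],
              [0.0, 0.0, 1.0]])      # placeholder intrinsics
R = np.eye(3)                        # placeholder ground-truth rotation
t = np.array([0.0, 0.0, 0.8])        # placeholder translation (metres)
verts = np.random.default_rng(0).uniform(0.0, 0.1, (1000, 3))  # stand-in mesh vertices

cam = verts @ R.T + t                # model -> camera coordinates
uvw = cam @ K.T                      # pinhole projection
u = np.round(uvw[:, 0] / uvw[:, 2]).astype(int)
v = np.round(uvw[:, 1] / uvw[:, 2]).astype(int)

mask = np.zeros((480, 640), dtype=np.uint8)
ok = (u >= 0) & (u < 640) & (v >= 0) & (v < 480)
mask[v[ok], u[ok]] = 1               # pixels covered by this object

A real pipeline would rasterise the triangles with a z-buffer so that occlusions between objects are handled correctly; the splat above only conveys the geometry of the idea.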

Kinect data from the real world

RGBD Scenes dataset

Introduced: ICRA 2011

Device: Kinect v1

Description: Real indoor scenes, featuring objects from the RGBD object dataset 'arranged' on tables, countertops etc. Video sequences of 8 scenes.

Labelling: Per-frame bounding boxes for objects from RGBD object dataset. Other objects not labelled.

Download: Project page

RGBD Scenes dataset v2

Introduced: ICRA 2014

Device: Kinect v1

Description: A second set of real indoor scenes featuring objects from the RGBD object dataset. Video sequences of 14 scenes, together with stitched point clouds and camera pose estimates (a back-projection sketch follows this entry).

Labelling: Labelling of points in stitched cloud into one of 9 classes (objects and furniture), plus background.

Download: Project page
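
Stitched clouds like these come from back-projecting each depth frame through the camera intrinsics and transforming it by the per-frame pose. A minimal numpy sketch of that step, with placeholder intrinsics, depth and pose values rather than the dataset's actual calibration:

# Sketch: back-project a depth image to a point cloud and move it into a
# common world frame using the per-frame camera pose. Values are placeholders.
import numpy as np

fx, fy, cx, cy = 570.3, 570.3, 320.0, 240.0           # placeholder intrinsics
depth = np.random.rand(480, 640).astype(np.float32)   # stand-in depth map (metres)
T_world_cam = np.eye(4)                               # placeholder camera-to-world pose

v, u = np.mgrid[0:480, 0:640]
z = depth.ravel()
x = (u.ravel() - cx) * z / fx
y = (v.ravel() - cy) * z / fy
pts_cam = np.stack([x, y, z, np.ones_like(z)], axis=1)

pts_world = (T_world_cam @ pts_cam.T).T[:, :3]        # accumulate over frames to stitch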

'Object Disappearance for Object Discovery'

Introduced: IROS 2012

Device: Kinect v1

Description: Three datasets: small (still images), medium (video from an office environment) and large (video over several rooms). The large dataset has 7 unique objects seen in 397 frames. Data is in ROS bag format (a reading sketch follows this entry).

Labelling: Ground truth object segmentations.

Download: Project page
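
Since the data ships as ROS bags, the ROS 1 rosbag Python API is one way to pull frames out. A minimal sketch; the topic name below is a guess, so list the real ones with 'rosbag info' first.

# Sketch: iterate over image messages in a ROS 1 bag. The topic name is an
# assumption; check the actual topics with `rosbag info file.bag`.
import rosbag

with rosbag.Bag('file.bag') as bag:                  # path is a placeholder
    for topic, msg, stamp in bag.read_messages(
            topics=['/camera/rgb/image_color']):     # hypothetical topic name
        print(stamp.to_sec(), msg.width, msg.height) # assumes sensor_msgs/Image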

'Object Discovery in 3D scenes via Shape Analysis'

Introduced: ICRA 2014

Device: Kinect v1

Description: KinFu meshes of 58 very cluttered indoor scenes.

Labelling: Ground truth binary labelling (object/not object) performed on segments proposed by the algorithm, with no labelling on the mesh.

Download: Project page

Cornell-RGBD-Dataset

Introduced: NIPS 2011

Device: Kinect v1

Description: Multiple RGBD frames from 52 indoor scenes. Stitched point clouds (using RGBDSLAM).

Labelling: Per-point object-level labelling on the stitched clouds.

Download: Project page

NYU Dataset v1

Introduced: ICCV 2011 Workshop on 3D Representation and Recognition

Device: Kinect v1

Description: Around 51,000 RGBD frames from indoor scenes such as bedrooms and living rooms. Note that the updated NYU v2 dataset is typically used instead of this earlier version.

Labelling: Dense multi-class labelling for 2283 frames.

Download: Project page

NYU Dataset v2

Introduced: ECCV 2012

Device: Kinect v1

Description: ~408,000 RGBD images from 464 indoor scenes, somewhat more diverse than those in NYU v1. Per-frame accelerometer data.

Labelling: Dense labelling of objects at a class and instance level for 1449 frames. Instance labelling is not carried across scenes. This 1449-frame subset is the one typically used in experiments (a loading sketch follows this entry).

Download: Project page
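
The 1449-frame labelled subset is distributed as a single MATLAB v7.3 file, which h5py can read directly. A minimal sketch; the key names match the commonly distributed nyu_depth_v2_labeled.mat, but verify them on your copy with f.keys().

# Sketch: load the NYU v2 labelled subset with h5py. Key names and the
# transposed layout below should be verified against your copy of the file.
import h5py
import numpy as np

with h5py.File('nyu_depth_v2_labeled.mat', 'r') as f:
    images = np.array(f['images'])   # stored transposed, roughly (1449, 3, W, H)
    depths = np.array(f['depths'])   # roughly (1449, W, H), metres
    labels = np.array(f['labels'])   # roughly (1449, W, H), per-pixel class ids

print(images.shape, depths.shape, labels.shape)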

'Object Detection and Classification from Large-Scale Cluttered Indoor Scans'

Introduced: Eurographics 2014

Device: Faro Lidar scanner

Description: Faro lidar scans of ~40 academic offices, with 2-3 scans per office. Each scan is 0.25GB-2GB. Scans include depth and RGB.

Labelling: No labelling present. The labelling shown in the exemplar image is their algorithm output.

Download: Project page

SUN3D

Introduced: ICCV 2013

Device: Kinect v1

Description: Videos of indoor scenes, registered into point clouds.

Labelling: Polygons of semantic class and instance labels on frames propagated through video.

Download: Project page

SUN RGB-D

Introduced: CVPR 2015

Device: Kinect v1, Kinect v2, Intel RealSense and Asus Xtion Live Pro

Description: New images, plus images taken from NYUv2, B3DO and SUN3D, all of indoor scenes.

Labelling: 10,335 images with polygon annotation, and 3D bounding boxes around objects

Download: Project page

B3DO: Berkeley 3-D Object Dataset

Introduced: ICCV Workshop on Consumer Depth Cameras in Computer Vision 2011

Device: Kinect v1

Description: The aim is to crowdsource the collection of Kinect data for inclusion in future releases. Version 1 has 849 images from 75 scenes.

Labelling: Bounding box labelling at a class level.

Download: Project page

Kinect RGBD Dataset for Category Modeling

Introduced: CVPR 2013

Device: Kinect v1

Description: 900 RGBD images from seven different categories. Some images naturally captured, others with specifically arranged objects.

Labelling: Category of dominant object in each image

Download: Project page

ViDRILO: The Visual and Depth Robot Indoor Localization with Objects information dataset

Introduced: IJRR 2015

Device: Kinect v1

Description: Five sequences (total 22454 frames) captured from a robot moving through an office environment

Labelling: Scene type of each frame, plus presence/absence of each of a set of 15 objects.

Download: Project page

SceneNN

Introduced: 3DV 2016

Device: Asus Xtion PRO

Description: Videos of indoor scenes, registered into triangle meshes.

Labelling: Per-vertex and per-pixel instance segmentation, bounding boxes and object poses.

Download: Project page

GMU Kitchen Dataset

Introduced: 3DV 2016

Device: Kinect v2

Description: 9 video sequences captured from 4 different kitchens, each containing objects from the BigBIRD dataset.

Labelling: Per-frame camera pose, 3D point clouds, object bounding box annotations and point labels.

Download: Project page

Stanford 2D-3D-Semantics Dataset

Introduced: arXiv 2017

Device: Matterport Camera (360 degree rotation RGBD sensor)

Description: 360 degree RGBD images captured from 6 large areas in municipal buildings, together with mesh and point cloud reconstructions.

Labelling: Semantic labelling on the mesh (13 classes, plus instance labels), and 3D volumetric reconstruction labels

Download: Project page

Active Vision Dataset (AVD)

Introduced: ICRA 2017

Device: Kinect v2

Description: Dense sampling of images in home and office scenes, captured from a robot. Dataset designed for simulation of motion and instance detection.

Labelling: Per-frame camera pose, object instance bounding boxes, movement pointers between images.

Download: Project page

ScanNet

Introduced: CVPR 2017

Device: Structure sensor

Description: 2.5 million frames from 1513 scenes

Labelling: Automatically computed (and human verified) camera poses and surface reconstructions. Instance and semantic segmentations provided on reconstructed mesh. 3D CAD models + alignment also provided for each scene.

Download: Project page

SLAM, registration and reconstruction

TUM Benchmark Dataset

Introduced: IROS 2012

Device: Kinect v1

Description: Many different scenes and scenarios for tracking and mapping, including reconstruction, robot kidnap etc.

Labelling: 6DOF ground truth from a motion capture system with 10 cameras (a trajectory-evaluation sketch follows this entry).

Download: Project page
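
With ground truth like this, tracking output is usually scored by absolute trajectory error (ATE): rigidly align the estimated trajectory to the ground truth, then take the RMSE of the position residuals. A minimal numpy sketch over already time-associated positions; the benchmark's own tools also handle the timestamp association that this sketch assumes has been done.

# Sketch: absolute trajectory error (ATE) between time-associated ground-truth
# and estimated camera positions, after rigid (rotation + translation, no
# scale) alignment via the Kabsch/Umeyama construction.
import numpy as np

def ate_rmse(gt, est):
    """gt, est: (N, 3) arrays of matched camera positions."""
    gt_c = gt - gt.mean(axis=0)
    est_c = est - est.mean(axis=0)
    U, _, Vt = np.linalg.svd(est_c.T @ gt_c)        # cross-covariance SVD
    d = np.sign(np.linalg.det(Vt.T @ U.T))          # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T         # rotates est onto gt
    aligned = est_c @ R.T + gt.mean(axis=0)
    return np.sqrt(np.mean(np.sum((gt - aligned) ** 2, axis=1)))

rng = np.random.default_rng(0)
gt = np.cumsum(rng.normal(size=(100, 3)), axis=0)   # toy trajectory
est = gt + rng.normal(scale=0.01, size=gt.shape)    # noisy estimate
print(ate_rmse(gt, est))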

Microsoft 7-scenes dataset

Introduced: CVPR 2013

Device: Kinect v1

Description: Kinect video from 7 indoor scenes.

Labelling: 6DOF 'ground truth' from Kinect Fusion.

Download: Project page

IROS 2011 Paper Kinect Dataset

Introduced: IROS 2011

Device: Kinect v1

Description: Lab-based setup. The aim seems to be to track the motion of the camera.

Labelling: 6DOF ground truth from Vicon system

Download: Project page

'When Can We Use KinectFusion for Ground Truth Acquisition?'

Introduced: Workshop on Color-Depth Camera Fusion in Robotics, IROS 2012

Device: Kinect v1

Description: A set of 57 scenes, captured from natural environments and from artificial shapes. Each scene has a 3D mesh, volumetric data and registered depth maps.

Labelling: Frame-to-frame transformations as computed from KinectFusion. The 'office' and 'statue' scenes have LiDAR ground truth.

Download: Project page

DAFT Dataset

Introduced: ICPR 2012

Device: Kinect v1

Description: A few short sequences of different planar scenes captured under various camera motions. Used to demonstrate repeatability of feature points under transformations.

Labelling: Camera motion type. 2D homographies between the planar scene in different images.

Download: Project page

'Automatic Registration of RGB-D Scans via Salient Directions'

Introduced: ICCV 2013

Device: RGBD Laser scanner

Description: Several laser scans of each of three scenes: a European church, a city and a castle.

Labelling: Results of the authors' registration algorithm.

Download: Project page

Stanford 3D Scene Dataset

Introduced: SIGGRAPH 2013

Device: Xtion Pro Live (Kinect v1 equivalent)

Description: RGBD videos of six indoor and outdoor scenes, together with a dense reconstruction of each scene.

Labelling: Estimated camera pose for each frame. No ground truth pose, so not ideal for quantitative evaluation.

Download: Project page

CoRBS: Comprehensive RGB-D Benchmark for SLAM using Kinect v2

Introduced: WACV 2016

Device: Kinect v2

Description: Twenty sequences from four scenes, with ground truth for trajectory and geometry.

Labelling: 6DOF ground truth trajectory from motion capture system and ground truth geometry from active scanner.

Download: Project page

'MobileRGBD, An open benchmark corpus for mobile RGB-D related algorithms'

Introduced: ICARCV 2014

Device: Kinect v2, lidar

Description: 9.5 hours of recording in 4 different environments, comprising RGBD, infrared and LIDAR. Environments have dummies placed to simulate humans.

Labelling: Position, orientation and speed of the robot at each frame; the actual ground plane; the height and angle of the Kinect; and the 3D positions of the dummies in the room.

Download: Project page

Depth Reconstruction Occlusionless Temporal (DROT) Dataset

Introduced: 3DV 2016

Device: Kinect v1, v2 and RealSense R200

Description: Five stop-motion sequences of 11-30 frames each

Labelling: Registrations between each camera, together with ground truth depth from David SLS-2 3D scanner.

Download: Project page

CVSSP Dynamic RGBD Modelling

Introduced: Circuits and Systems for Video Technology 2018

Device: Kinect v1, v2 and synthetic

Description: Eight RGBD sequences of general dynamic scenes captured using the Kinect v1/v2, as well as two synthetic sequences. Designed for non-rigid reconstruction.

Labelling: None

Download: Project page

'Shading-based Refinement on Volumetric Signed Distance Functions'

Introduced: TOG 2015

Device: PrimeSense

Description: Four RGBD sequences of small statues and artefacts.

Labelling: 6DOF inferred camera trajectory, plus fused (and refined) reconstructions.

Download: Project page

Synthetic

Synthetic datasets get their own section, as they can typically be used for multiple purposes.

ICL-NUIM Dataset

Introduced: ICRA 2014

Device: Kinect v1 (synthesised)

Description: Eight synthetic RGBD video sequences: four from an office scene and four from a living room scene. Simulated camera trajectories are taken from Kintinuous output from a sensor moved around a real-world room.

Labelling: Camera trajectories for each video. Geometry of the living room scene as an .obj file.

Download: Project page

Augmented ICL-NUIM Dataset

Introduced: CVPR 2015

Device: Kinect v1 (synthesised)

Description: An augmentation of the ICL-NUIM dataset, with camera paths added to allow it to be used for scene reconstruction.

Labelling: In addition to the ICL-NUIM labelling: new camera paths for each scene, plus a noise model and a point-based surface model to enable reconstruction evaluation (a simple noise-model sketch follows this entry).

Download: Project page
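
The noise model is worth a sketch: synthetic depth is typically corrupted with noise whose standard deviation grows with depth. The quadratic-in-depth model below is a commonly used Kinect approximation, and only a stand-in for the dataset's actual model.

# Sketch: corrupt clean synthetic depth with depth-dependent Gaussian noise.
# The quadratic standard deviation is a common Kinect approximation, not a
# reproduction of this dataset's published noise model.
import numpy as np

def add_depth_noise(depth_m, rng):
    """depth_m: (H, W) clean depth in metres."""
    std = 0.0012 + 0.0019 * (depth_m - 0.4) ** 2   # metres, grows with depth
    return depth_m + rng.normal(size=depth_m.shape) * std

rng = np.random.default_rng(0)
clean = np.full((480, 640), 2.0)                   # stand-in synthetic depth
noisy = add_depth_noise(clean, rng)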

SceneNet RGB-D

Introduced: arXiv 2016

Device: Kinect v1 (synthesised)

Description: 5 million images rendered from 16,895 indoor scenes. Room configurations randomly generated with a physics simulator.

Labelling: Camera pose, plus per-pixel instance, class labelling and optical flow.

Download: Project page

SUNCG

Introduced: arXiv 2016

Device: User choice

Description: 45,622 scenes with manually created room and furniture layouts. Images can be rendered from the geometry, but are not provided by default.

Labelling: Object semantic class and instance labelling.

Download: Project page

Tracking

See also some of the human datasets for body and face tracking.

Princeton Tracking Benchmark

Introduced: ICCV 2013

Device: Kinect v1

Description: 100 RGBD videos of moving objects such as humans, balls and cars.

Labelling: Per-frame bounding box covering target object only.

Download: Project page

Datasets involving humans: Body and hands

Cornell Activity Datasets: CAD-60 and CAD-120

Introduced: PAIR 2011/IJRR 2013

Device: Kinect v1

Description: Videos of humans performing activities

Labelling: Each video given at least one label, such as eating, opening or working on computer. Skeleton joint position and orientation labelled on each frame.

Download: Project page

RGB-D Person Re-identification Dataset

Introduced: First International Workshop on Re-Identification 2012

Device: Kinect v1

Description: Front and back views of 79 people walking forward in different poses.

Labelling: In addition to the per-person label, the dataset provides foreground masks, skeletons, 3D meshes and an estimate of the floor.

Download: Project page

Sheffield KInect Gesture (SKIG) Dataset

Introduced: IJCAI 2013

Device: Kinect v1

Description: Total of 1080 Kinect videos of six people performing one of 10 hand gesture sequences, such as 'triangle' or 'comehere'. Sequences captured under a variety of illumination and background conditions.

Labelling: The gesture being performed in each sequence.

Download: Project page

RGB-D People Dataset

Introduced: IROS 2011

Device: Kinect v1

Description: 3000+ frames of people walking and standing in a university hallway, captured from three Kinects.

Labelling: Per-frame bounding box annotations of individual people, together with a 'visibility' measure.

Download: Project page

50 Salads

Introduced: UbiComp 2013

Device: Kinect v1

Description: Over 4 hours of video of 25 people preparing 2 mixed salads each

Labelling: Accelerometer data from sensors attached to cooking utensils, and labelling of steps in the recipes.

Download: Project page

Microsoft Research Cambridge-12 Kinect gesture data set

Introduced: CHI 2012

Device: Kinect v1

Description: 594 sequences and 719,359 frames of 30 people performing 12 gestures.

Labelling: Gesture performed in each video sequence, plus motion tracking of human joint locations.

Download: Project page

UR Fall Detection Dataset

Introduced: Computer Vision Theory and Applications 2014

Device: Kinect v1

Description: Videos of people falling over. Consists of 60 sequences recorded with two Kinects.

Labelling: Accelerometer data from device attached to subject.

Download: Project page

RGBD-HuDaAct

Introduced: ICCV Workshops 2011

Device: Kinect v1

Description: 30 different humans each performing the same 12 activities, e.g. 'eat a meal'. Also includes a random 'background' activity. All performed in a lab environment. Around 5,000,000 frames in total.

Labelling: Which activity being performed in each sequence.

Download: Project page

Human3.6M

Introduced: PAMI 2014

Device: SwissRanger time-of-flight (+ 2D cameras)

Description: 11 different humans performing 17 different activities. Data comes from four calibrated video cameras, 1 time-of-flight camera and (static) 3D laser scans of the actors.

Labelling: 2D and 3D human joint positions, obtained from a Vicon motion capture system.

Download: Project page

TST Fall detection dataBase

Introduced: ICT Innovations 2015

Device: Kinect v2

Description: Videos of 11 different humans performing activities of daily living and falling over in various ways.

Labelling: Activity performed, acceleration data, skeleton joint locations.

Download: Project page

TST TUG dataBase

Introduced: IEEE ICC 2015

Device: Kinect v2

Description: Videos of 20 different humans standing up and walking around

Labelling: Acceleration data, skeleton joint locations.

Download: Project page

TST Intake Monitoring dataset

Introduced: None

Device: Kinect v1

Description: Videos of 35 different humans simulating food intake actions

Labelling: Skeleton joint locations estimated by three different algorithms. Ground truth positions of hands and head joints.

Download: Project page

MSR 3D Online Action Dataset

Introduced: ACCV 2014

Device: Kinect v1

Description: Videos of human-object interaction, in seven categories, plus a negative class.

Labelling: Activity being performed in each video.

Download: Project page

MSRGesture3D

Introduced: EUSIPCO 2012, ECCV 2012

Device: Kinect v1

Description: 10 humans performing 12 American Sign Language gestures, each gesture being performed 2-3 times. The hands have been segmented.

Labelling: The gesture being performed in each video.

Download: Project page

MSRDailyActivity3D

Introduced: CVPR 2012

Device: Kinect v1

Description: 10 humans performing 16 activities, e.g. read book, play guitar. Each activity performed in sitting and standing positions.

Labelling: Activity being performed, plus 20 joint locations of skeleton positions.

Download: Project page

MSR Action3D Dataset

Introduced: None

Device: ?? (similar to Kinect, with 320x240 resolution)

Description: Videos of 10 humans performing 20 action types. Each subject performs each action 2 or 3 times.

Labelling: Activity being performed, plus 20 joint locations of skeleton positions.

Download: Project page

Northwestern-UCLA Multiview Action 3D Dataset

Introduced: None

Device: Kinect v1

Description: Three Kinects used to simultaneously record 10 actions, each performed by 10 humans

Labelling: Activity being performed

Download: Project page

UTD Multimodal Human Action Dataset (UTD-MHAD)

Introduced: ICIP 2015

Device: Kinect v1

Description: Eight different humans performing 27 actions in a controlled environment, each action repeated 4 times. The humans wore accelerometers.

Labelling: Action being performed, accelerometer data associated with each video.

Download: Project page

Dataset of a human performing daily life activities in a scene with occlusions

Introduced: IROS 2015

Device: Kinect v1

Description: 12 RGB-D video sequences of a person performing activities with obstacles occluding the view from the Kinect

Labelling: 3D locations of 15 joint position markers from a MoCap system

Download: Project page

Background activity dataset

Introduced: 2015

Device: Kinect v1

Description: Humans in TV-watching setup, performing occasional gestures. 52 person-hours of video in total, with 13 groups of 4 humans.

Labelling: Gestures performed. Mocap for all humans.

Download: Project page

ChaLearn gesture challenge dataset

Introduced: 2012

Device: Kinect v1

Description: Originally designed for one-shot learning, for a Kaggle competition.

Labelling: Action being performed in a subset of the videos. Body part annotations

Download: Project page

Montalbano gesture dataset

Introduced: ECCV 2014

Device: Kinect v1

Description: 13858 sequences each depicting one of 27 humans performing one of 20 Italian gestures.

Labelling: Gesture being performed in each sequence.

Download: Project page

Berkeley Multimodal Human Action Database

Introduced: WACV 2014

Device: Kinect v1

Description: 660 videos each of one of 12 humans, each performing one of 11 actions in a MoCap studio.

Labelling: Action being performed. 3D skeleton positions from MoCap, also stereo cameras and accelerometer data.

Download: Project page

Grasp Understanding Dataset (GUN-71)

Introduced: ICCV 2015

Device: Kinect v2

Description: 12,000 images of human hands manipulating one of 28 objects, captured from a chest-mounted RGBD camera. Eight different subjects (4 males and 4 females) in 5 different houses.

Labelling: Each image labelled with one of 71 fine-grained grasps.

Download: Project page

Manipulation Action (MANIAC) Dataset

Introduced: RAS 2014

Device: Kinect v1

Description: Videos of eight manipulation actions, each recorded 15 times (with 5 different humans). Also videos of chained sequences of actions.

Labelling: Action being performed, per-pixel frame labelling into objects, hands etc.

Download: Project page

TVPR (Top View Person Re-identification) Dataset

Introduced: ICPR 2016

Device: Asus Xtion PRO Live

Description: Videos of 100 humans recorded in a top-down configuration.

Labelling: Person ID in each video

Download: Project page

Datasets involving humans: Head and face

Biwi Kinect Head Pose Database

Introduced: IJCV 2013

Device: Kinect v1

Description: 15K images of 20 different people moving their heads in different directions.

Labelling: 3D position of the head and its rotation, acquired using 'faceshift' software.

Download: Project page

Eurecom Kinect Face Dataset

Introduced: ACCV Workshop on Computer Vision with Local Binary Pattern Variants 2012

Device: Kinect v1

Description: Images of faces captured under laboratory conditions, with different levels of occlusion and illumination, and with different facial expressions.

Labelling: In addition to occlusion and expression type, each image is manually labelled with the position of six facial landmarks.

Download: Project page

3D Mask Attack Dataset

Introduced: Biometrics: Theory, Applications and Systems 2013

Device: Kinect v1

Description: 76500 frames of 17 different people facing the camera against a plain background. Two sets were captured of the real subjects, two weeks apart, while the final set consists of a single person wearing fake face masks of the 17 different people.

Labelling: Which user is in each frame. Which images are real and which are spoofed. Manually labelled eye positions.

Download: Project page

Biwi 3D Audiovisual Corpus of Affective Communication - B3D(AC)^2

Introduced: IEEE Transactions on Multimedia 2010

Device: Custom active stereo setup

Description: Simultaneous audio and visual recordings of 1109 sentences spoken by 14 different people. Each sentence spoken neutrally and with an emotion. Depth images converted to 3D mesh.

Labelling: Perceived emotions for each recording. Audio labelled with phonemes.

Download: Project page

ETH Face Pose Range Image Data Set

Introduced: CVPR 2008

Device: Custom active stereo setup

Description: 10,545 images of 20 different people turning their head.

Labelling: Nose position and a coordinate frame at the nose.

Download: Project page