###########################################################################
#       THE KITTI VISION BENCHMARK SUITE: RAW DATA RECORDINGS             #
#                                                                         #
#           Andreas Geiger    Philip Lenz    Raquel Urtasun               #
#                  Karlsruhe Institute of Technology                      #
#             Toyota Technological Institute at Chicago                   #
#                           www.cvlibs.net                                #
###########################################################################

This file gives more information about the KITTI raw data recordings.

General information about streams and timestamps
================================================

Each sensor stream is stored in a single folder. The main folder contains
meta information and a timestamp file, listing the timestamp of each frame
of the sequence to nanosecond precision. Frame numbers in one data stream
correspond to the same frame numbers in every other data stream and to
line numbers in the timestamp file (0-based index), as all data has been
synchronized. All cameras have been triggered directly by the Velodyne
laser scanner, while from the GPS/IMU system (recording at 100 Hz) we have
taken the measurement closest in time to the respective reference frame.
For all sequences, 'image_00' has been used as the synchronization
reference stream.

Rectified color + grayscale stereo sequences
============================================

Our vehicle has been equipped with four cameras: one color stereo camera
pair and one grayscale stereo camera pair. The color and grayscale cameras
are mounted close to each other (~6 cm); the baseline of both stereo rigs
is approximately 54 cm. We have chosen this setup so that we can provide
both color and grayscale information for the left and the right camera.
While the color cameras (obviously) come with color information, the
grayscale camera images have higher contrast and slightly less noise.

All cameras are synchronized at about 10 Hz with respect to the Velodyne
laser scanner. The trigger is mounted such that camera images coincide
roughly with the Velodyne lasers facing forward (in driving direction).
All camera images are provided as lossless compressed and rectified png
sequences. The native image resolution is 1382x512 pixels; after
rectification it is somewhat smaller, for details see the calibration
section below. The opening angle of the cameras (left-right) is
approximately 90 degrees. The camera images are stored in the following
directories:

- 'image_00': left rectified grayscale image sequence
- 'image_01': right rectified grayscale image sequence
- 'image_02': left rectified color image sequence
- 'image_03': right rectified color image sequence

Velodyne 3D laser scan data
===========================

The velodyne point clouds are stored in the folder 'velodyne_points'. To
save space, all scans have been stored as an Nx4 float matrix in a binary
file using the following code:

  stream = fopen (dst_file.c_str(),"wb");
  fwrite(data,sizeof(float),4*num,stream);
  fclose(stream);

Here, data contains 4*num values, where the first 3 values of each point
correspond to x, y and z, and the last value is the reflectance
information. All scans are stored row-aligned, meaning that the first 4
values correspond to the first measurement. Since each scan might
potentially have a different number of points, this number must be
determined from the file size when reading the file, where 1e6 is a good
enough upper bound on the number of values:

  // allocate 4 MB buffer (only ~130*4*4 KB are needed)
  int32_t num = 1000000;
  float *data = (float*)malloc(num*sizeof(float));

  // pointers
  float *px = data+0;
  float *py = data+1;
  float *pz = data+2;
  float *pr = data+3;

  // load point cloud
  FILE *stream;
  stream = fopen (currFilenameBinary.c_str(),"rb");
  num = fread(data,sizeof(float),num,stream)/4;
  for (int32_t i=0; i<num; i++) {
    point_cloud.points.push_back(tPoint(*px,*py,*pz,*pr));
    px+=4; py+=4; pz+=4; pr+=4;
  }
  fclose(stream);

Example transformations
=======================

To project a point from the Velodyne coordinate system into the image
plane of a camera, the following matrices can be chained:

- P_rect_00 (3x4):         rectified cam 0 coordinates -> image plane
- R_rect_00 (4x4):         cam 0 coordinates -> rectified cam 0 coord.
- (R|T)_velo_to_cam (4x4): velodyne coordinates -> cam 0 coordinates
- (R|T)_imu_to_velo (4x4): imu coordinates -> velodyne coordinates

Note that the (4x4) matrices above are padded with zeros and:
R_rect_00(4,4) = (R|T)_velo_to_cam(4,4) = (R|T)_imu_to_velo(4,4) = 1.

Tracklet Labels
===============

Tracklet labels are stored in XML and can be read / written using the
C++/MATLAB source code provided with this development kit. For compiling
the code you will need a recent version of the boost libraries.

Each tracklet is stored as a 3D bounding box of given height, width and
length, spanning multiple frames. For each frame, the 3D location and the
rotation in bird's eye view have been labeled. Additionally, occlusion /
truncation information is provided in the form of averaged Mechanical
Turk label outputs. All tracklets are represented in Velodyne
coordinates. Object categories are classified as follows:

- 'Car'
- 'Van'
- 'Truck'
- 'Pedestrian'
- 'Person (sitting)'
- 'Cyclist'
- 'Tram'
- 'Misc'

Here, 'Misc' denotes all other categories, e.g., 'Trailers' or 'Segways'.

Reading the Tracklet Label XML Files
====================================

This toolkit provides the header 'cpp/tracklets.h', which can be used to
parse a tracklet XML file into the corresponding data structures. Its
usage is simple: include the header file into your code as follows:

  #include "tracklets.h"

  Tracklets *tracklets = new Tracklets();
  if (!tracklets->loadFromFile(filename)) {
    // handle the error
  }
  // ... work with the tracklets ...
  delete tracklets;

In order to compile this code you will need a recent version of the boost
libraries and you need to link against 'libboost_serialization'.

'matlab/readTrackletsMex.cpp' is a MATLAB wrapper for 'cpp/tracklets.h'.
It can be built using make.m. Again you need to link against
'libboost_serialization', which might be problematic on newer MATLAB
versions due to MATLAB's internal definitions of libstdc, etc.
The latest MATLAB version that we know to work on Linux is 2008b; this is
because MATLAB has changed its pointer representation in later releases.
Of course, you can also directly parse the XML file using your preferred
XML parser. If you create another useful wrapper for the header file
(e.g., for Python), we would be more than happy if you could share it
with us.

Demo Utility for projecting Tracklets into Images
=================================================

In 'matlab/run_demoTracklets.m' you find a demonstration script that
reads tracklets and projects them as 2D/3D bounding boxes into the
images. You will need to compile the MATLAB wrapper above in order to
read the tracklets. For further instructions, please have a look at the
comments in the respective MATLAB scripts and functions.