Our 2nd generation Motion Contrast 3D prototype scanner.


Project Description

Structured light 3D scanning systems are fundamentally constrained by limited sensor bandwidth and light source power, which hinders their performance in real-world applications where depth information is essential, such as industrial automation, autonomous transportation, robotic surgery, and entertainment. We present a novel structured light technique called Motion Contrast 3D scanning (MC3D) that maximizes the use of available bandwidth and light source power to avoid these performance trade-offs. The technique uses motion contrast cameras, which sense temporal gradients asynchronously (i.e., independently for each pixel), a property that minimizes redundant sampling. This enables laser scanning resolution at single-shot speed, even in the presence of strong ambient illumination, significant inter-reflections, and highly reflective surfaces. The proposed approach will allow 3D vision systems to be deployed in challenging and hitherto inaccessible real-world scenarios that demand high performance from limited power and bandwidth.

Publications

"MC3D: Motion Contrast 3D Scanning"
N. Matsuda, O. Cossairt, and M. Gupta
IEEE Conference on Computational Photography (ICCP), April 2015

Images

Teaser Video


A brief introduction to Motion Contrast 3D scanning.

Video Overview


Supplemental video overview of Motion Contrast 3D scanning.

Structured Light Method Trade-offs:


Structured light (SL) systems face trade-offs among acquisition speed, resolution, and light efficiency. Laser scanning (upper left) achieves high resolution at slow speeds. Single-shot methods (mid-right) obtain lower resolution from a single exposure. Other methods such as Gray coding and phase shifting (mid-bottom) balance speed and resolution but degrade in the presence of strong ambient light, scene inter-reflections, and dense participating media. Hybrid techniques from Gupta et al. (curve shown in green) and Taguchi et al. (curve shown in red) strike a balance between these extremes. This paper proposes a new SL method, motion contrast 3D scanning (denoted by the point in the center), that simultaneously achieves high resolution, high acquisition speed, and robust performance in exceptionally challenging 3D scanning environments.

Traditional vs Motion Contrast Sensors:


(Left) The space-time volume output of a conventional camera consists of a series of discrete full-frame images (here, a black circle on a pendulum). (Right) The output of a motion contrast camera for the same scene consists of a small number of pixel change events scattered in time and space. In both cameras, the sampling rate along the time axis is limited by the camera bandwidth. Because pixel change events are naturally sparse, the motion contrast camera achieves a far higher temporal sampling rate within that bandwidth.
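To make the event-based output concrete, here is a minimal Python sketch of a common motion contrast (DVS-style) pixel model. It assumes each pixel fires an event when its log intensity changes by at least a fixed contrast threshold since that pixel's last event. The threshold value, the frame-based simulation (a real sensor responds asynchronously rather than at frame times), and the names `frames_to_events` and `moving_spot` are illustrative assumptions, not the MC3D implementation.

```python
import math
from typing import List, Tuple

# One event: (column x, row y, timestamp t, polarity +1 brighter / -1 darker)
Event = Tuple[int, int, float, int]
Frame = List[List[float]]


def frames_to_events(frames: List[Frame], timestamps: List[float],
                     threshold: float = 0.15, eps: float = 1e-6) -> List[Event]:
    """Simulate a DVS-style motion contrast sensor from sampled frames.

    Each pixel remembers the log intensity at its last event and emits a new
    event only when the current log intensity differs from it by at least
    `threshold`. Pixels are evaluated independently; static pixels stay silent.
    """
    height, width = len(frames[0]), len(frames[0][0])
    # Per-pixel reference level, initialized from the first frame.
    ref = [[math.log(frames[0][y][x] + eps) for x in range(width)]
           for y in range(height)]
    events: List[Event] = []
    for frame, t in zip(frames[1:], timestamps[1:]):
        for y in range(height):
            for x in range(width):
                level = math.log(frame[y][x] + eps)
                diff = level - ref[y][x]
                if abs(diff) >= threshold:
                    events.append((x, y, t, 1 if diff > 0 else -1))
                    ref[y][x] = level  # reset only this pixel's reference
    return events


def moving_spot(width: int, height: int, n_frames: int,
                dark: float = 0.1, bright: float = 1.0) -> List[Frame]:
    """Hypothetical test scene: a bright spot moving one column per frame along row 1."""
    frames = []
    for k in range(n_frames):
        frame = [[dark] * width for _ in range(height)]
        frame[1][k % width] = bright
        frames.append(frame)
    return frames


frames = moving_spot(4, 4, 4)
events = frames_to_events(frames, [0.0, 0.01, 0.02, 0.03])
print(len(events), "events vs", 4 * 4 * 4, "frame samples")  # 6 events vs 64 frame samples
```

Only the two pixels the spot leaves and enters at each step generate events, so in this toy scene the output size scales with scene change rather than with resolution times frame count, which is the sparsity the caption describes.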

System Diagram:


A scanning source illuminates projector positions α1 and α2 at times t1 and t2, striking scene points s1 and s2. The correspondence between projector and camera coordinates is not known at runtime. The DVS (dynamic vision sensor) registers changing pixels at columns i1 and i2 at times t1 and t2, which are output as events containing the location/timestamp pairs [i1, τ1] and [i2, τ2]. We recover the estimated projector positions j1 and j2 from the event times. Depth is then calculated from the correspondence between each event's location and its estimated projector position.
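The depth-recovery step can be sketched as follows, under two assumptions not stated above: the projector sweeps its columns at a constant, known rate, so an event's timestamp τ maps linearly to a projector column j; and the camera and projector form a rectified pair, so depth follows from standard triangulation, Z = f·b / (i − j). The `SweepModel` class and its calibration parameters (sweep period, focal length in pixels, baseline) are hypothetical. The actual MC3D calibration may differ, and the sign of the disparity depends on the system geometry.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SweepModel:
    """Hypothetical calibration for a projector that sweeps its columns at a constant rate."""
    t_start: float       # time at which the sweep passes projector column 0 (s)
    sweep_period: float  # time to sweep across all projector columns once (s)
    num_columns: int     # number of projector columns per sweep
    focal_px: float      # focal length in pixels (rectified camera-projector pair)
    baseline_m: float    # camera-projector baseline (m)

    def projector_column(self, tau: float) -> float:
        """Estimate projector column j from event time tau, assuming a linear sweep."""
        # Fraction of the current sweep elapsed; % wraps into [0, sweep_period).
        phase = ((tau - self.t_start) % self.sweep_period) / self.sweep_period
        return phase * self.num_columns

    def depth(self, i: float, tau: float) -> Optional[float]:
        """Triangulate depth (m) from event column i and event time tau.

        Returns None when the disparity is not positive (no valid intersection
        under this sketch's sign convention).
        """
        disparity = i - self.projector_column(tau)
        if disparity <= 0:
            return None
        return self.focal_px * self.baseline_m / disparity


# Example: 128-column sweep at 30 sweeps per second (illustrative values).
model = SweepModel(t_start=0.0, sweep_period=1 / 30, num_columns=128,
                   focal_px=100.0, baseline_m=0.1)
print(model.depth(i=80, tau=1 / 60))  # 0.625 (j = 64, disparity = 16 px)
```

In this sketch each event is triangulated independently, using only its own column and timestamp, which mirrors the per-pixel, asynchronous nature of the sensor.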

Comparison with Laser Scanning and Microsoft Kinect:


Laser scanning was performed with a laser galvanometer and a traditional sensor cropped to 128×128, with a total exposure time of 28.5 s. Kinect and MC3D results were captured with a 1 second exposure at 128×128 resolution (Kinect output cropped to match) and median filtered. The object was placed 1 m from the sensor under ∼150 lux ambient illuminance measured at the object. Note that while the image-space resolution of all three methods is matched, MC3D produces depth resolution equivalent to laser scanning, whereas the Kinect depth is more coarsely quantized.

Output Under Ambient Illumination:


Disparity output for both methods was captured with a 1 second exposure at 128×128 resolution (Kinect output cropped to match) under ambient illumination increasing from 150 lux to 5000 lux, measured at the middle of the sphere surface. The illuminance from our projector pattern was measured at 150 lux. Note that in addition to outperforming the Kinect, MC3D returns usable data at ambient illuminance levels an order of magnitude higher than the projector illuminance.

Scenes with Interreflection


The image on the left depicts a test scene consisting of two pieces of white foam board meeting at a 30-degree angle. The plot on the right shows the middle row of the depth output from Gray coding and MC3D. Both scans were captured with an exposure time of 1/30 s. Gray coding used 22 consecutive coded frames, while MC3D results were averaged over 22 frames. MC3D faithfully recovers the V-groove shape, while the Gray code output contains gross errors.

Scenes with Reflective Surfaces


The image on the left depicts a reflective test scene consisting of a shiny steel sphere. The plot on the right shows the depth output from Gray coding and MC3D. Both scans were captured with an exposure time of 1/30 s. Gray coding used 22 consecutive coded frames, while MC3D results were averaged over 22 frames. The Gray code output contains significant artifacts not present in the MC3D output.

Video Output: Spinning Coin


This spinning coin video was produced using 30 Hz MC3D scanning to demonstrate the technique's ability to capture a variety of surfaces, such as the hand and the highly reflective coin, at high speed.

Video Output: Paper Pinwheel


This spinning pinwheel video compares output from MC3D with output from the first-generation Kinect. Because the Kinect relies on smoothness assumptions to capture depth in real time, its output has lower resolution than that of MC3D.

Acknowledgements

This work was supported by funding through the Biological Systems Science Division, Office of Biological and Environmental Research, Office of Science, U.S. Dept. of Energy, under Contract DE-AC02-06CH11357. Additionally, this work was supported by ONR award number 1(GG010550)//N00014-14-1-0741, NSF CAREER grant IIS-1453192, and a Google Faculty Research award.