Structured light 3D scanning systems are fundamentally constrained by limited sensor bandwidth and light source power, hindering their performance in real-world applications where depth information is essential, such as industrial automation, autonomous transportation, robotic surgery, and entertainment. We present a novel structured light technique called Motion Contrast 3D scanning (MC3D) that maximizes bandwidth and light source power to avoid performance trade-offs. The technique utilizes motion contrast cameras that sense temporal gradients asynchronously, i.e., independently for each pixel, a property that minimizes redundant sampling. This allows laser scanning resolution with single-shot speed, even in the presence of strong ambient illumination, significant inter-reflections, and highly reflective surfaces. The proposed approach will allow 3D vision systems to be deployed in challenging and hitherto inaccessible real-world scenarios requiring high performance using limited power and bandwidth.
"MC3D: Motion Contrast 3D Scanning"
N. Matsuda, O. Cossairt, and M. Gupta
IEEE Conference on Computational Photography (ICCP), April 2015
- Prof. Cossairt gives talk on Computational Imaging and Illumination at Oculus Research and Microsoft Research
- Prof. Cossairt gives talk on Computational Imaging and Illumination at Argonne National Labs
- Prof. Cossairt gives talk on MC3D at the European Conference on Lasers and Electro-Optics (CLEO ’15)
- Prof. Cossairt gives invited talk at annual COSI conference
- Prof. Cossairt gives talk on MC3D at 2015 Dagstuhl Seminar on Computational Imaging
- MC3D covered by photonics.com Light Matters newscast
- MC3D Coverage in the Media: Gizmag, Image Sensors World, Science Daily, More…
- MC3D wins second best demo at ICCP 2015
- Prof. Cossairt gives invited talk on MC3D at OMRON Corporation in Kyoto, Japan
- Comp Photo Lab Wins Google Research Award
- Prof. Cossairt gives invited talk at IPAM Workshop on Computational Photography and Intelligent Cameras
Structured Light Method Trade-offs:
SL systems face trade-offs in acquisition speed, resolution, and light efficiency. Laser scanning (upper left) achieves high resolution at slow speeds. Single-shot methods (mid-right) obtain lower resolution with a single exposure. Other methods such as Gray coding and phase shifting (mid-bottom) balance speed and resolution but degrade in the presence of strong ambient light, scene inter-reflections, and dense participating media. Hybrid techniques from Gupta et al. (curve shown in green) and Taguchi et al. (curve shown in red) strike a balance between these extremes. This paper proposes a new SL method, motion contrast 3D scanning (denoted by the point in the center), that simultaneously achieves high resolution, high acquisition speed, and robust performance in exceptionally challenging 3D scanning environments.
Traditional vs Motion Contrast Sensors:
(Left) The space-time volume output of a conventional camera consists of a series of discrete full-frame images (here a black circle on a pendulum). (Right) The output of a motion contrast camera for the same scene consists of a small number of pixel change events scattered in time and space. Both cameras are limited by sensor bandwidth, but the effective sampling rate along the time axis is far higher for motion contrast because bandwidth is spent only on the naturally sparse pixel change events rather than on redundant full frames.
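The event-generation principle described above can be sketched in a few lines. This is an illustrative simulation, not the DVS hardware pipeline: each pixel remembers the log intensity at its last event and fires a new event whenever the current log intensity differs from that memory by more than a contrast threshold (the function name and threshold value are assumptions for this sketch).

```python
import numpy as np

def motion_contrast_events(frames, times, threshold=0.15):
    """Simulate a motion contrast (DVS-style) sensor.

    Emits (x, y, t, polarity) events wherever the log intensity at a
    pixel changes by more than `threshold` since that pixel's last
    event. `frames` is a (T, H, W) stack of linear-intensity images
    and `times` gives the timestamp of each frame.
    """
    log_frames = np.log(np.asarray(frames, dtype=np.float64) + 1e-6)
    reference = log_frames[0].copy()  # per-pixel memorized log level
    events = []
    for t_idx in range(1, len(log_frames)):
        diff = log_frames[t_idx] - reference
        fired = np.abs(diff) >= threshold
        ys, xs = np.nonzero(fired)
        for x, y in zip(xs, ys):
            polarity = 1 if diff[y, x] > 0 else -1
            events.append((x, y, times[t_idx], polarity))
        # Only pixels that fired update their memorized level,
        # so static regions of the scene produce no further events.
        reference[fired] = log_frames[t_idx][fired]
    return events
```

For a two-frame sequence in which a single pixel doubles in brightness, only that pixel produces an event; the three static pixels consume no bandwidth at all, which is the sparsity the caption describes.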
MC3D System Overview:
A scanning source illuminates projector positions α1 and α2 at times t1 and t2, striking scene points s1 and s2. Correspondence between projector and camera coordinates is not known at runtime. The DVS sensor registers changing pixels at columns i1 and i2, which are output as events containing the location/event time pairs [i1, τ1] and [i2, τ2]. We recover the estimated projector positions j1 and j2 from the event times. Depth can then be calculated using the correspondence between event location and estimated projector location.
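The depth recovery step above reduces to two lines of arithmetic per event. The sketch below assumes a linear sweep (projector column proportional to time within the scan period) and a rectified projector-camera geometry; the function name, parameter names, and numeric values are illustrative, not from the paper.

```python
def event_to_depth(i, tau, t_start, t_end, num_columns,
                   baseline, focal_length):
    """Recover depth from a single DVS event (illustrative sketch).

    i            : camera column of the event
    tau          : event timestamp
    t_start/_end : start and end times of one laser sweep
    num_columns  : projector columns swept per pass
    baseline     : projector-camera separation (same units as depth)
    focal_length : camera focal length in pixels
    """
    # Estimated projector column j from the event time,
    # assuming the scan moves linearly across columns.
    j = (tau - t_start) / (t_end - t_start) * (num_columns - 1)
    # Standard triangulation from the projector-camera disparity.
    disparity = j - i
    return baseline * focal_length / disparity
```

For example, an event at camera column 40 with a timestamp midway through a 128-column sweep implies projector column j = 63.5, disparity 23.5, and depth = baseline * focal_length / 23.5. Because each event carries its own timestamp, every pixel yields an independent correspondence, which is why a single sweep attains laser-scan depth resolution.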
Comparison with Laser Scanning and Microsoft Kinect:
Laser scanning performed with a laser galvanometer and traditional sensor cropped to 128×128, with a total exposure time of 28.5s. Kinect and MC3D methods captured with 1 second exposure at 128×128 resolution (Kinect output cropped to match) and median filtered. Object placed 1m from sensor under ∼150 lux ambient illuminance measured at the object. Note that while the image-space resolution for all three methods is matched, MC3D produces depth resolution equivalent to laser scanning, whereas the Kinect depth is more coarsely quantized.
Output Under Ambient Illumination:
Disparity output for both methods captured with 1 second exposure at 128×128 resolution (Kinect output cropped to match) under increasing illumination from 150 lux to 5000 lux measured at middle of the sphere surface. The illuminance from our projector pattern was measured at 150 lux. Note that in addition to outperforming the Kinect, MC3D returns usable data at ambient illuminance levels an order of magnitude higher than the projector power.
Scenes with Interreflection
The image on the left depicts a test scene consisting of two pieces of white foam board meeting at a 30-degree angle. The middle row of the depth output from Gray coding and MC3D is shown in the plot on the right. Both scans were captured with an exposure time of 1/30th second. Gray coding used 22 consecutive coded frames, while MC3D results were averaged over 22 frames. MC3D faithfully recovers the V-groove shape, while the Gray code output contains gross errors.
Scenes with Reflective Surfaces
The image on the left depicts a reflective test scene consisting of a shiny steel sphere. The plot on the right shows the depth output from Gray coding and MC3D. Both scans were captured with an exposure time of 1/30th second. The Gray coding method used 22 consecutive coded frames, while MC3D results were averaged over 22 frames. The Gray code output produces significant artifacts not present in the MC3D output.
Video Output: Spinning Coin
This spinning coin video was produced using 30Hz MC3D scanning to demonstrate the technique's ability to capture a variety of surfaces, such as the hand and the highly reflective coin, at high speed.
Video Output: Paper Pinwheel
This spinning pinwheel video compares video output from MC3D with output from the 1st generation Kinect. Since the Kinect relies on smoothness assumptions to capture depth in real time, its output suffers from lower resolution than MC3D's.
This work was supported by funding through the Biological Systems Science Division, Office of Biological and Environmental Research, Office of Science, U.S. Dept. of Energy, under Contract DE-AC02-06CH11357. Additionally, this work was supported by ONR award number 1(GG010550)//N00014-14-1-0741, NSF CAREER grant IIS-1453192, and a Google Faculty Research award.