Home 3D Scans from noisy image and range data

Look at email dated 17th August 2011


The 3D shape of the human body is useful for applications in fitness, games and apparel. Accurate body scanners, however, are expensive, limiting the availability of 3D body models. We present a method for human shape reconstruction from noisy monocular image and range data using a single inexpensive commodity sensor. The approach combines low-resolution image silhouettes with coarse range data to estimate a parametric model of the body. Accurate 3D shape estimates are obtained by combining multiple monocular views of a person moving in front of the sensor. To cope with varying body pose, we use a SCAPE body model which factors 3D body shape and pose variations. This enables the estimation of a single consistent shape while allowing pose to vary. Additionally, we describe a novel method to minimize the distance between the projected 3D body contour and the image silhouette that uses analytic derivatives of the objective function. We propose a simple method to estimate standard body measurements from the recovered SCAPE model and show that the accuracy of our method is competitive with commercial body scanning systems costing orders of magnitude more.

1. Introduction

For many applications an accurate 3D model of the human body is needed. The standard approach involves scanning the body using a commercial system such as a laser range scanner or special-purpose structured-light system.

Several such body scanners exist, costing anywhere from $35,000 to $500,000. The size and cost of such scanners limit the applications for 3D body models. Many computer vision solutions suffer the same problems and require calibrated multi-camera capture systems. Here we describe a solution that produces accurate body scans using consumer hardware that can work in a person’s living room (Fig. 1).

This opens the door to a wide range of new applications.

Recently there have been several approaches to capturing 3D body shape from a monocular image [15, 16, 19, 26], a small number of synchronized camera images [5], or from several unsynchronized cameras [17]. We restrict our attention to the monocular case, where the common approach is to segment the person from the background and to estimate the 3D shape of the body such that the silhouette of the body matches the image silhouette. The wide variation in body shape, the articulated nature of the body, and self-occlusions in a single view, however, all limit the usefulness of image silhouettes alone. To cope with these issues we combine image silhouettes with coarse monocular range data captured with a single Microsoft Kinect sensor [1].


The resolution and accuracy of the sensor are relatively poor, and our key contribution is a method to accurately estimate human body pose and shape from a set of monocular low-resolution images with aligned but noisy depth information. To be scanned, a person moves in front of a single sensor to capture a sequence of monocular images and depth maps that show the body from multiple angles (Fig. 2). As the person moves, their body shape changes, making rigid 3D alignment impossible. We solve for the pose in each frame and for a single common shape across all frames. To do so, we use the SCAPE model [4], which is a parametric 3D model that factors the complex non-rigid deformations induced by both pose and shape variation and is learned from a database of several thousand laser scans. We estimate model parameters in a generative framework using an objective function that combines a silhouette overlap term, the difference between the observed range data and the depth predicted by our model, and an optional pose prior that favors similarity of poses between frames.
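The structure of such an objective (one shared shape, one pose per frame, and an optional prior linking neighbouring poses) can be sketched as follows. This is only an illustration under stated assumptions: the names total_objective and pose_smoothness, the weights, and the squared-distance pose prior are hypothetical, and the silhouette and depth terms are passed in as placeholder callables rather than taken from the paper.

```python
from typing import Any, Callable, Sequence

Vector = Sequence[float]
# Hypothetical signature: cost(shape, pose, frame) -> float
FrameCost = Callable[[Vector, Vector, Any], float]


def pose_smoothness(pose_a: Vector, pose_b: Vector) -> float:
    """Squared Euclidean distance between two pose vectors (assumed prior)."""
    return sum((a - b) ** 2 for a, b in zip(pose_a, pose_b))


def total_objective(
    shape: Vector,
    poses: Sequence[Vector],
    frames: Sequence[Any],
    silhouette_cost: FrameCost,
    depth_cost: FrameCost,
    w_sil: float = 1.0,
    w_depth: float = 1.0,
    w_pose: float = 0.0,
) -> float:
    """Sum per-frame data terms for one shared shape, plus an optional pose prior."""
    if len(poses) != len(frames):
        raise ValueError("need exactly one pose per frame")
    total = 0.0
    # Data terms: the same shape vector is used in every frame; only pose varies.
    for pose, frame in zip(poses, frames):
        total += w_sil * silhouette_cost(shape, pose, frame)
        total += w_depth * depth_cost(shape, pose, frame)
    # Optional prior: penalise pose changes between consecutive frames.
    for prev_pose, next_pose in zip(poses, poses[1:]):
        total += w_pose * pose_smoothness(prev_pose, next_pose)
    return total


if __name__ == "__main__":
    # Toy stand-ins for the real terms, used only to exercise the function.
    toy_sil = lambda shape, pose, frame: 1.0
    toy_depth = lambda shape, pose, frame: 2.0
    toy_poses = [[0.0], [1.0], [3.0]]
    print(total_objective([0.0], toy_poses, [None] * 3, toy_sil, toy_depth, w_pose=0.5))
```

Because the shape vector appears in every per-frame term while each pose appears only in its own frame and in the prior, minimising this sum yields a single consistent shape while allowing pose to vary.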

The silhouette term uses a novel symmetric shape dissimilarity function that we locally minimize using a standard quasi-Newton method. Our silhouette formulation has significant advantages over previous methods (such as ICP) and enables accurate optimization of body shape and pose in a very high-dimensional space.
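To make "symmetric" and "correspondence-free" concrete, here is a minimal sketch of one possible bidirectional silhouette dissimilarity on binary masks. It is an assumption for illustration, not the function used in the paper: each foreground pixel of one mask is charged its distance to the nearest foreground pixel of the other mask, in both directions. The brute-force search is meant for tiny masks only, and this discrete version is not the differentiable formulation that a quasi-Newton optimizer would require.

```python
import math
from typing import List, Sequence, Tuple

Mask = Sequence[Sequence[int]]  # rows of 0/1 values


def foreground(mask: Mask) -> List[Tuple[int, int]]:
    """Return (row, col) coordinates of all non-zero pixels."""
    return [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]


def one_way(src: Mask, dst: Mask) -> float:
    """Sum over src pixels of the distance to the nearest dst pixel.

    Pixels of src that also lie inside dst contribute zero.
    """
    dst_px = foreground(dst)
    if not dst_px:
        raise ValueError("destination silhouette is empty")
    return sum(
        min(math.hypot(r - r2, c - c2) for r2, c2 in dst_px)
        for r, c in foreground(src)
    )


def symmetric_dissimilarity(a: Mask, b: Mask) -> float:
    """Bidirectional cost: penalise a outside b and b outside a."""
    return one_way(a, b) + one_way(b, a)


if __name__ == "__main__":
    model = [[1, 1, 0],
             [0, 0, 0]]
    image = [[0, 1, 1],
             [0, 0, 0]]
    print(symmetric_dissimilarity(model, image))
```

No pixel-to-pixel matching is stored or updated here, which is the sense in which the cost is correspondence-free; the two one-way sums make it insensitive to which silhouette is treated as the model and which as the observation.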

In summary our contributions are:

1) A system for at-home body scanning;
2) The combination of multiple low-resolution, noisy, monocular views (range and/or silhouettes) to estimate a consistent 3D body shape with varying pose;
3) A new method for matching 3D models to silhouettes using an objective function that is correspondence-free, bidirectional, and can be optimized with standard methods requiring derivatives of the objective function;
4) A simple method to predict 3D body measurements from SCAPE model parameters using linear regression;
5) A quantitative comparison with a commercial state-of-the-art solution for scanning and measuring the body.
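
Contribution 4 describes predicting measurements from model parameters by linear regression. A minimal sketch of that idea, under assumptions, is shown below: ordinary least squares maps a vector of shape coefficients to one scalar measurement. The function names, the two-coefficient shape vectors, and the numbers are all synthetic and hypothetical; the paper's actual features and training data are not given in this section.

```python
from typing import List, Sequence


def solve_linear_system(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: features are linearly dependent")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(m[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_linear(features: Sequence[Sequence[float]], targets: Sequence[float]) -> List[float]:
    """Least-squares weights (bias first) via the normal equations."""
    rows = [[1.0] + list(f) for f in features]  # prepend a bias column
    d = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(d)] for i in range(d)]
    xty = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(d)]
    return solve_linear_system(xtx, xty)


def predict(weights: Sequence[float], feature: Sequence[float]) -> float:
    """Apply the fitted model to one shape-coefficient vector."""
    return weights[0] + sum(w * x for w, x in zip(weights[1:], feature))


if __name__ == "__main__":
    # Synthetic training set: made-up shape coefficients and a made-up measurement.
    shapes = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, -1.0]]
    measurement = [90.0, 95.0, 88.0, 93.0, 102.0]
    weights = fit_linear(shapes, measurement)
    print(predict(weights, [3.0, 2.0]))
```

One such regressor would be fitted per measurement of interest; the normal-equation solver is used here only to keep the sketch dependency-free.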

