
Most single-image person animation has focused primarily on 2D or pseudo-3D animation, while we aim to provide a fully 3D experience. Most methods for 3D body shape estimation focus on reconstructing semi-nude bodies, and their results are not necessarily ready for animation. We instead take clothing into account and aim for an animatable result. The most closely related 3D reconstruction work takes a video as input, whereas we start from a single photo.

Given a photo, person detection, 2D pose estimation, and person segmentation are performed using off-the-shelf algorithms. Then, a SMPL template model is fit to the 2D pose and projected into the image as a normal map and a skinning map. The core of our system consists of three steps: find a mapping between the person's silhouette and the SMPL silhouette, warp the SMPL normal and skinning maps to the output, and build a depth map by integrating the warped normal map. This process is repeated to simulate the model's back view, and the depth and skinning maps are combined to create a complete, rigged 3D mesh. The mesh is then textured and animated using motion capture sequences on an inpainted background.

The warping procedure typically stretches the geometry in the image plane (the SMPL model is usually thinner than the clothed subject, and often thinner than even the unclothed subject) without correspondingly stretching, which would typically mean inflating, the depth. We experimented with setting $Z(x) = Z_{\mathrm{SMPL}}(f(x))$, where $f$ maps a pixel $x$ of the person's silhouette to the SMPL silhouette, but the resulting meshes were usually too flat in the z direction (see Fig.).
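Warping SMPL maps through the silhouette correspondence $f$, including the naive depth assignment $Z(x) = Z_{\mathrm{SMPL}}(f(x))$, amounts to a per-pixel lookup. The sketch below is a minimal illustration under stated assumptions. Maps are nested lists. $f$ returns integer (row, column) coordinates in the SMPL rendering, so sampling is nearest-neighbor. The names `warp_map` and `fill` are hypothetical and do not come from the original system. The same function applies to a normal, skinning, or depth map because it copies values unchanged.

```python
from typing import Callable, List, Tuple, TypeVar

T = TypeVar("T")
Pixel = Tuple[int, int]  # (row, column)


def warp_map(
    smpl_map: List[List[T]],
    f: Callable[[Pixel], Pixel],
    out_shape: Tuple[int, int],
    fill: T,
) -> List[List[T]]:
    """Pull-back warp: out[x] = smpl_map[f(x)] for every output pixel x.

    Hypothetical helper: f maps an output (photo) pixel to an integer pixel of
    the SMPL rendering (nearest-neighbor sampling). Output pixels whose image
    under f falls outside the SMPL map receive `fill`.
    """
    h, w = out_shape
    src_h, src_w = len(smpl_map), len(smpl_map[0])
    out = [[fill] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            sr, sc = f((r, c))
            if 0 <= sr < src_h and 0 <= sc < src_w:
                # Values are copied unchanged: the warp does not rescale depth.
                out[r][c] = smpl_map[sr][sc]
    return out


if __name__ == "__main__":
    # Naive Z(x) = Z_SMPL(f(x)) for a photo silhouette twice as wide as SMPL's.
    smpl_depth = [[1.0, 2.0]]
    print(warp_map(smpl_depth, lambda p: (p[0], p[1] // 2), (1, 4), 0.0))
    # prints [[1.0, 1.0, 2.0, 2.0]]
```

In the example, the outline doubles in width, but each depth value is only copied. The surface therefore becomes wider without becoming any deeper. This illustrates why warping depth directly produces meshes that are too flat in z.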
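Building a depth map by integrating the warped normal map can be illustrated with a very simple integrator. This sketch makes several assumptions:

- The camera is orthographic, and pixel spacing is one unit.
- Normals are parallel to $(-\partial Z/\partial x, -\partial Z/\partial y, 1)$ and have a positive z component.
- The silhouette mask is ignored.

The function uses trapezoidal path integration, first down the first column and then along each row. It recovers depth only up to an additive constant. The name `integrate_normals` is hypothetical, and this is not necessarily the solver used in the original work. With noisy normals, path integration depends on the chosen path, so a least-squares (Poisson) solve is the more robust choice.

```python
from typing import List, Tuple

Normal = Tuple[float, float, float]  # (nx, ny, nz); nz > 0 is assumed


def integrate_normals(normals: List[List[Normal]]) -> List[List[float]]:
    """Recover a depth map Z (up to an additive constant) from a normal map.

    Assumes an orthographic camera, unit pixel spacing, and normals parallel to
    (-dZ/dx, -dZ/dy, 1), so that dZ/dx = -nx/nz and dZ/dy = -ny/nz.
    """
    h, w = len(normals), len(normals[0])
    # Per-pixel depth gradients implied by the normals.
    gx = [[-nx / nz for nx, _, nz in row] for row in normals]
    gy = [[-ny / nz for _, ny, nz in row] for row in normals]

    z = [[0.0] * w for _ in range(h)]
    # Trapezoidal integration down the first column (uses dZ/dy) ...
    for y in range(1, h):
        z[y][0] = z[y - 1][0] + 0.5 * (gy[y - 1][0] + gy[y][0])
    # ... then along each row, starting from that column (uses dZ/dx).
    for y in range(h):
        for x in range(1, w):
            z[y][x] = z[y][x - 1] + 0.5 * (gx[y][x - 1] + gx[y][x])
    return z


if __name__ == "__main__":
    # The plane Z = 2x + 3y has the (unnormalized) normal (-2, -3, 1) everywhere.
    plane = [[(-2.0, -3.0, 1.0)] * 4 for _ in range(3)]
    print(integrate_normals(plane)[2][3])  # prints 12.0 (= 2*3 + 3*2)
```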