Six Degrees of Freedom (6DoF) Video: Standards, Challenges, and Practical Implementation
Youku’s new 6DoF video approach replaces low‑resolution, bandwidth‑heavy VR video: multi‑camera footage is captured on site, full‑depth scenes are reconstructed in the cloud, and standards‑compliant streams let viewers freely translate and rotate within the content, yielding higher visual quality, native AR integration, and new business opportunities.
Hello, I am Houteng from Youku. Our team originally started with VR and has since transformed its technical direction. In this talk I will explain why we are pursuing a new direction, what "6DoF video" is, its core technical points, and its future trends.
When Youku first worked on VR (around 2016‑2017), the technology and content were promising, but we encountered severe pain points that remain unsolved. Traditional VR captures a 360° sphere and projects it onto a 2D rectangle, which lacks true 3D depth and scene reconstruction. Consequently, the resulting video contains little useful information, consumes large bandwidth, and offers a poor visual experience compared to ordinary video.
The main problems of current VR video are:
1. Traditional VR capture (even with high‑end cameras like Nokia Ozo) records a spherical image that is flattened into a rectangle, providing no true stereoscopic information or scene reconstruction. Therefore, there is little room for further processing.
2. The information density is low. A fixed VR camera captures only a small portion of the sphere at any moment, resulting in low visible resolution. When a VR version and a normal version of the same show are offered, users overwhelmingly prefer the normal version because the VR experience is visually inferior.
3. VR Live video consumes massive bandwidth and compute resources for content that offers limited value.
We also observed that 5G and 8K VR are often touted as flagship applications, yet with deep optimization the bandwidth these applications actually require is far lower than the headline figures suggest. Youku proposed a solution based on the Chinese AVS VR standard and IEEE standards, using an asymmetric mapping that allocates a larger share of the pixel budget to the region of interest, thereby saving bandwidth and improving visible resolution.
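To make the asymmetric-mapping idea concrete, here is a small arithmetic sketch. The numbers, the 90°×90° viewport, and the 50/50 pixel split are hypothetical illustrations, not values from the AVS VR standard:

```python
# Illustrative only: shows why giving the region of interest (ROI) a larger
# share of a fixed pixel budget raises its visible resolution. All numbers
# here are hypothetical, not taken from the AVS VR specification.

def asymmetric_budget(total_pixels: int, roi_fraction_of_sphere: float,
                      roi_pixel_share: float) -> dict:
    """Split a fixed pixel budget between the ROI and the rest of the
    sphere, and report pixel density relative to a uniform mapping."""
    roi_pixels = total_pixels * roi_pixel_share
    rest_pixels = total_pixels - roi_pixels
    # Density gain = (share of pixels) / (share of sphere area covered).
    roi_density_gain = roi_pixel_share / roi_fraction_of_sphere
    rest_density_gain = (1 - roi_pixel_share) / (1 - roi_fraction_of_sphere)
    return {"roi_pixels": roi_pixels, "rest_pixels": rest_pixels,
            "roi_density_gain": roi_density_gain,
            "rest_density_gain": rest_density_gain}

# Toy example: the viewport covers roughly 1/8 of the sphere but receives
# half the pixel budget, so its visible density is 4x a uniform allocation.
stats = asymmetric_budget(total_pixels=8_294_400,
                          roi_fraction_of_sphere=0.125,
                          roi_pixel_share=0.5)
```

The same budget, redistributed, is how asymmetric mappings trade invisible background detail for sharper pixels where the viewer is actually looking.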
We discussed the need for a video that offers higher visual quality, true 3D scene reconstruction, and higher bandwidth efficiency. To achieve this, we must address three dimensions: visual appeal, stereoscopic scene reconstruction, and bandwidth utilization.
We then introduced the concept of six degrees of freedom (6DoF). In mechanics, a rigid body has six degrees of freedom: three translations (X, Y, Z) and three rotations (roll, yaw, pitch). Current VR video provides only the three rotational degrees, so the viewer can look around but never move from a fixed position.
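The distinction can be captured in a few lines. This is a minimal sketch assuming a simple pose record; the type and function names are my own, not part of any standard:

```python
from dataclasses import dataclass


@dataclass
class Pose6DoF:
    """A rigid-body pose: three translations plus three rotations."""
    x: float = 0.0      # translation (e.g. metres)
    y: float = 0.0
    z: float = 0.0
    roll: float = 0.0   # rotation (radians)
    yaw: float = 0.0
    pitch: float = 0.0


def clamp_to_3dof(pose: Pose6DoF) -> Pose6DoF:
    """What conventional VR video supports: rotations pass through,
    translations are discarded (the viewpoint cannot move)."""
    return Pose6DoF(0.0, 0.0, 0.0, pose.roll, pose.yaw, pose.pitch)


full = Pose6DoF(x=1.0, y=0.5, z=2.0, roll=0.1, yaw=0.2, pitch=0.3)
vr_only = clamp_to_3dof(full)   # head can turn, but cannot walk
```

6DoF video is, in effect, the promise that all six fields of this pose are honored by the player.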
Our goal for 6DoF video is to allow the viewer’s viewpoint to move freely within a scene—walking around, changing height, and even moving onto a stage—offering an experience stronger than physically attending the event.
The closest existing example is the "bullet time" effect from the 1999 movie *The Matrix*, where multiple cameras capture a scene from different angles and the footage is stitched together. We reproduced a similar effect in Youku’s variety show *This Is Street Dance* using multi‑camera rigs.
In 2017 we advanced beyond the original multi‑camera approach: most of the final view is generated in post‑production rather than relying on raw camera angles. We capture with multiple cameras, then reconstruct the scene on the cloud, allowing flexible time‑shift and creative effects.
Our 6DoF demo lets viewers rotate left/right, move up/down, and translate within the video, selecting their preferred viewpoint.
We also created a simplified version for the 2020 Double‑11 shopping festival, offering multi‑angle viewing of stage performances. Because it was a stripped‑down implementation, this early version shows noticeable stutter when switching between angles.
In summary, we aim to let users freely choose any of the six mechanical degrees of freedom within a predefined range while watching video.
The production pipeline consists of three main parts:
1. On‑site multi‑camera capture, requiring synchronization and scheduling.
2. Cloud‑side processing, where we reconstruct the scene, compute occlusions, generate reference viewpoints, and encode the data into a standard video stream.
3. Client‑side rendering, which involves decoding, re‑assembling frames according to the chosen viewpoint, interpolating new frames, and displaying the result.
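The three stages above can be sketched end to end. This is a stub pipeline for illustration only; the function names, data shapes, and neighbor-selection logic are hypothetical, not Youku's actual system:

```python
# Hypothetical sketch of the capture -> cloud -> client pipeline described
# above. Frames are represented as strings so the flow stays readable.

def capture(num_cameras: int) -> list:
    """Stage 1: synchronized multi-camera capture (stubbed as frame IDs)."""
    return [f"cam{i}_frame" for i in range(num_cameras)]


def cloud_process(frames: list) -> dict:
    """Stage 2: reconstruct the scene, compute occlusions, and derive
    reference views plus depth maps, packed for a standard encoder."""
    return {"reference_views": frames,
            "depth_maps": [f + "_depth" for f in frames]}


def client_render(stream: dict, viewpoint: tuple) -> str:
    """Stage 3: decode, pick reference views near the requested viewpoint,
    and interpolate the displayed frame (neighbor choice stubbed)."""
    left, right = stream["reference_views"][:2]
    return f"interpolate({left}, {right}, at={viewpoint})"


frames = capture(4)
stream = cloud_process(frames)
output = client_render(stream, (0.0, 1.6, 2.0))
```

The key design point is that only standard video streams cross the network; all 6DoF-specific work happens before encoding (cloud) or after decoding (client).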
Our cloud processing generates virtual color views and corresponding depth maps. By stitching these together, we create a 3D representation of the scene. When the user drags the view, new frames are interpolated using neighboring frames and depth information.
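Depth-based interpolation of a new viewpoint can be illustrated at the single-pixel level. This is a minimal sketch of standard pinhole-camera disparity warping, not Youku's renderer; the focal length and baseline values are hypothetical:

```python
# Minimal depth-image-based rendering sketch: a sideways camera move
# shifts each pixel by its disparity d = focal * baseline / depth.
# Parameters below are hypothetical placeholders.

def warp_pixel(u: float, depth: float, baseline: float, focal: float) -> float:
    """Horizontal pixel position after moving the virtual camera sideways
    by `baseline`; nearer pixels (small depth) shift more."""
    disparity = focal * baseline / depth
    return u - disparity


def interpolate_views(u_left: float, u_right: float, t: float) -> float:
    """Blend pixel positions from two neighboring reference views;
    t in [0, 1] selects a virtual viewpoint between them."""
    return (1 - t) * u_left + t * u_right


# A pixel at depth 2 m shifts by 50 px for a 0.1 m move (focal = 1000 px).
shifted = warp_pixel(100.0, depth=2.0, baseline=0.1, focal=1000.0)
midpoint = interpolate_views(40.0, 60.0, t=0.5)
```

Stitching many such warped reference views, with occlusions filled from neighbors, is what lets the client synthesize frames for viewpoints no camera ever occupied.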
Because 6DoF video provides full depth information, it naturally integrates with AR. We demonstrated a simple AR demo where a virtual object (a rabbit) is placed on the ground and remains fixed regardless of the viewer’s angle, enabling personalized advertising and brand placement.
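The "fixed rabbit" behavior follows directly from having world coordinates for the anchor. A toy projection sketch, assuming a translation-only pinhole camera with hypothetical numbers:

```python
# Toy illustration: a world-anchored object (e.g. the virtual rabbit) keeps
# fixed world coordinates; only its screen projection changes as the viewer
# moves. Translation-only pinhole model; all values are hypothetical.

def project(world_xyz, cam_xyz, focal=1000.0):
    """Project a world point into a camera translated to cam_xyz:
    screen = focal * (world - camera) / depth."""
    x, y, z = (w - c for w, c in zip(world_xyz, cam_xyz))
    return (focal * x / z, focal * y / z)


rabbit = (0.0, -1.0, 5.0)                     # fixed anchor on the ground
view_a = project(rabbit, (0.0, 0.0, 0.0))     # initial viewpoint
view_b = project(rabbit, (0.5, 0.0, 0.0))     # viewer steps sideways
# The rabbit's world position never changes; only its projection does,
# which is exactly what makes the placement look physically anchored.
```

Without per-pixel depth, the renderer could not resolve where "the ground" is, which is why flat 360° video cannot support this kind of stable AR placement.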
From an academic perspective, standards bodies such as MPEG and AVS are working on next‑generation codecs (e.g., H.266), features such as HDR, and immersive media formats (VR, point clouds, sound fields, light fields). Youku has contributed to the Chinese AVS standard since 2016 and is now collaborating with Peking University on a 6DoF video standard.
From a business standpoint, 6DoF video is suitable for content that benefits from fixed‑point viewing (e.g., sports, concerts, dance performances) and for providing a sense of presence where users can explore the scene freely. It also enables personalized sharing, where each user can generate a unique viewpoint‑based clip.
Finally, an upcoming talk at LiveVideoStack 2019 Shanghai will cover "6DoF Video Standards and Practice: Towards the Next‑Generation High‑Freedom Video Experience".
Youku Technology