OpenAI’s Sora Video Model Is Hyped—But Here Are the Flaws OpenAI Itself Acknowledges
This article surveys the shortcomings OpenAI itself acknowledges in Sora, including unrealistic physics, misplaced spatial details, and erratic object behavior. It illustrates them with concrete demo failures and additional observations, and closes with technical notes on the model's diffusion-based transformer architecture and its embedded provenance metadata.
OpenAI’s official Sora page lists a set of current shortcomings for the model.
Flaws
OpenAI describes difficulty simulating complex physical behavior, inability to understand certain causal relationships (e.g., a person bites a cookie but no bite mark remains), and confusion of spatial details such as left/right. The model also struggles with events that evolve over time, for example following a specific camera trajectory.
Case 1
Prompt: A person running, Step‑printing style, 35 mm film shooting.
Flaw: The generated motion can be physiologically implausible: the running direction is reversed and the limb rhythm is unnatural.
Case 2
Prompt: Five wolf pups playing on a remote gravel road surrounded by grass, leaping and chasing each other.
Flaw: In scenes with many entities, animals or characters can appear or vanish unpredictably; here, wolf pups spontaneously split apart and merge into one another.
Case 3
Prompt: A basketball passes through the hoop and then explodes.
Flaw: Physical modeling is inaccurate and object deformation is unnatural: the ball passes through the net, materializes out of thin air, and clips through the hoop.
Case 4
Prompt: An archaeologist discovers an ordinary plastic chair in a desert and carefully excavates and cleans it.
Flaw: The chair is not modeled as a rigid object, leading to floating, splitting, or deformation during interaction.
Case 5
Prompt: A tidy‑looking elderly grandmother stands behind a wooden dining table with a colorful birthday cake, blowing out candles while friends and family celebrate.
Flaw: Complex interactions among multiple objects and characters are often simulated incorrectly, producing comical results: candle flames point in odd directions, the candles do not respond to being blown out, and the characters' motions look unnatural.
Other Observations
Additional demos on the official site show foot sliding while walking, unnatural eye gaze, and a slight uncanny‑valley effect in facial expressions.
Technical Details
Sora is still in private beta; access is limited to a small set of safety and creative experts.
Generated videos embed C2PA metadata, the same open provenance standard used in DALL·E 3 images.
The model is a diffusion video generator: it starts from static‑noise video frames and iteratively denoises over many steps, analogous to Stable Diffusion for images.
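The iterative-denoising idea can be sketched in a few lines. This is a purely conceptual toy, not Sora's actual schedule or network: the `toy_denoiser` function below stands in for a learned noise-prediction model, and the 1-D `target` array stands in for a clean video frame.

```python
import numpy as np

rng = np.random.default_rng(0)
target = np.linspace(0.0, 1.0, 8)   # hypothetical "clean" signal a trained model would recover

def toy_denoiser(x, target, strength=0.2):
    # A real diffusion model predicts and removes noise with a learned network;
    # here we simply nudge the sample toward the target to illustrate the
    # many-small-steps refinement loop.
    return x + strength * (target - x)

x = rng.normal(size=8)              # start from pure static noise
for step in range(50):              # iteratively denoise over many steps
    x = toy_denoiser(x, target)

error = float(np.abs(x - target).max())
```

After enough steps the sample converges from noise to the target, which is the same coarse-to-fine behavior diffusion image models such as Stable Diffusion exhibit.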
Sora uses a Transformer architecture, the core technology behind GPT.
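The core operation of a Transformer is scaled dot-product attention, in which every token attends to every other token. The minimal sketch below uses illustrative shapes and names; it is not Sora's internals, only the generic mechanism the article refers to.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # q, k, v: (tokens, dim). Each token's output is a weighted mix of all
    # value vectors, with weights from query-key similarity; this is how a
    # Transformer relates elements across a sequence (or, for video models,
    # patches across space and time).
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = softmax(scores, axis=-1)
    return weights @ v

rng = np.random.default_rng(1)
tokens, dim = 4, 8
q, k, v = (rng.normal(size=(tokens, dim)) for _ in range(3))
out = attention(q, k, v)
```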
Beyond text‑to‑video, Sora supports image‑to‑video generation and can extend or transition existing videos.
OpenAI positions Sora as a foundational model for understanding and simulating the real world, viewing it as a milestone toward AGI.
Appendix
Sora official site: https://openai.com/sora
Sora technical report: https://openai.com/research/video-generation-models-as-world-simulators
