What Data Powers OpenAI’s Upcoming Video Model Sora?

OpenAI CTO Mira Murati provided vague answers about Sora’s training data, confirming the use of publicly available, licensed, and Shutterstock content while acknowledging uncertainty about social‑media sources, amid ongoing legal disputes over AI model data usage.

21CTO

In a Wall Street Journal interview, OpenAI CTO Mira Murati gave vague answers about the data sources for the upcoming video‑generation model Sora, stating that the model is trained on publicly available and licensed data.

The journalist asked whether Sora was trained on content from platforms such as YouTube, Instagram, or Facebook. Murati replied that she was not sure, adding that if the data were publicly available it could have been used.

“You know, if they’re publicly available—publicly available. But I’m not sure. I don’t have confidence in that.”

When questioned about OpenAI’s partnership with Shutterstock, Murati confirmed that Shutterstock data was indeed used for training Sora, though she declined to detail the dataset.

AI models rely on large training datasets to learn patterns, make predictions, and understand language.

Murati has been with OpenAI since 2018, leading projects such as DALL‑E 3, Whisper, and GPT‑4, and briefly served as interim CEO after the board’s removal of Sam Altman in November 2023.

OpenAI faces multiple lawsuits over the use of copyrighted material in its training data, including cases filed by authors in July 2023 and a December 2023 lawsuit by The New York Times alleging that its content was used to train chatbots. A separate class action in California accuses OpenAI of scraping private user information without consent.

