Just when we were wondering what comes next, OpenAI has introduced Sora, a model that generates videos, and I must confess the results are pretty neat, and very impressive to say the least.
OpenAI first launched ChatGPT, which was widely adopted around the world for all manner of content-related purposes, before going on to release GPT-4, the paid version with more features attached. ChatGPT had already stirred controversy among writers who felt threatened by the AI's abilities.
Now, given Sora's anticipated rise, how do you expect video editors to respond to this development from OpenAI? While we await their reactions, let us explore Sora's abilities and learn as much as we can about the model.
OpenAI Introduces Sora
OpenAI is entering the video-generating space, following in the footsteps of IT behemoths like Google and Meta as well as startups like Runway.
Sora is a generative AI model that turns text into video. According to OpenAI, Sora can create 1080p movie-like scenes with several people, various sorts of motion, and background features given a brief or extensive description or a still image.
To fill in the blanks, Sora can also “extend” already-existing video segments.
In a blog post, OpenAI claims that “Sora has a deep understanding of language, enabling it to accurately interpret prompts and generate compelling characters that express vibrant emotions.”
“The model understands not only what the user has asked for in the prompt, but also how those things exist in the physical world.”
Now, the demo page for Sora on OpenAI contains a lot of hyperbole, like the remark above. Still, the model’s selectively shown samples do appear rather good—at least when contrasted with other text-to-video systems we’ve seen.
First off, Sora is significantly more capable than most text-to-video models, producing videos up to one minute long in a variety of styles (such as photorealistic, animated, and black and white). Furthermore, these videos retain a fair amount of coherence, since they don't always succumb to what I like to call “AI weirdness,” such as objects moving in directions that aren't physically feasible.
View this tour of an art museum created by Sora (please excuse the graininess; it was caused by compression from my video-to-GIF converter):
It’s true that some of Sora’s videos featuring human subjects, such as a robot standing against a cityscape or a person strolling down a snowy path, have a somewhat video game-like feel to them, perhaps because there isn’t a lot of background activity. In addition, AI oddities manage to infiltrate numerous other clips, like cars driving in one direction and then abruptly turning around, or arms melting into a duvet cover.
OpenAI — for all its superlatives — acknowledges the model isn’t perfect. It writes:
“[Sora] may struggle with accurately simulating the physics of a complex scene, and may not understand specific instances of cause and effect. For example, a person might take a bite out of a cookie, but afterward, the cookie may not have a bite mark. The model may also confuse spatial details of a prompt, for example, mixing up left and right, and may struggle with precise descriptions of events that take place over time, like following a specific camera trajectory.”
Positioning Sora primarily as a research preview, OpenAI withholds information about the training data (beyond that it amounts to less than 10,000 hours of “high-quality” video) and is not making Sora publicly accessible. It cites the potential for abuse as justification, accurately noting that bad actors could misuse a model like Sora in a variety of ways.
According to OpenAI, it is collaborating with experts to probe the model for vulnerabilities and developing tools to detect whether a video was generated by Sora. The company further states that, should it decide to build the model into a public-facing product, it will ensure provenance metadata is included in the generated outputs.
“We’ll be engaging policymakers, educators and artists around the world to understand their concerns and to identify positive use cases for this new technology,” OpenAI writes. “Despite extensive research and testing, we cannot predict all of the beneficial ways people will use our technology, nor all the ways people will abuse it. That’s why we believe that learning from real-world use is a critical component of creating and releasing increasingly safe AI systems over time.”