A still from Runway’s “Text to Video” teaser promo suggesting image-generation capabilities.
In a tweet posted this morning, artificial intelligence company Runway teased a new feature of its AI-powered web-based video editor that can edit video from written descriptions, often called “prompts.” A promotional video appears to show very early steps toward commercial video editing or generation, echoing the hype over recent text-to-image synthesis models like Stable Diffusion but with some optimistic framing to cover up current limitations.
Runway’s “Text to Video” demonstration reel shows a text input box that allows editing commands such as “import city street” (suggesting the video clip already existed) or “make it look more cinematic” (applying an effect). It depicts someone typing “remove object” and selecting a streetlight with a drawing tool; the streetlight then disappears (in our testing, Runway can already perform a similar effect using its “inpainting” tool, with mixed results). The promotional video also showcases what looks like still-image text-to-image generation similar to Stable Diffusion (note that the video does not depict any of these generated scenes in motion) and demonstrates text overlay, character masking (using its “Green Screen” feature, also already present in Runway), and more.
Video generation promises aside, what seems most novel about Runway’s Text to Video announcement is the text-based command interface. Whether video editors will want to work with natural language prompts in the future remains to be seen, but the demonstration shows that people in the video production industry are actively working toward a future in which synthesizing or editing video is as easy as writing a command.
Runway’s web-based video editor already uses AI to mask objects to create a “Green Screen” effect.
Raw AI-based video generation (sometimes called “text2video”) is in a primitive state due to its high computational demands and the lack of a large open video training set with metadata, equivalent to LAION-5B for still images, that could be used to train video-generation models. One of the most promising public text2video models, called CogVideo, can generate simple videos in low resolution with choppy frame rates. But considering the primitive state of text-to-image models just one year ago versus today, it seems reasonable to expect the quality of synthetic video generation to increase by leaps and bounds over the next few years.
Runway is available as a web-based commercial product that runs in the Google Chrome browser for a monthly fee, which includes cloud storage for about $35 per year. But the Text to Video feature is in closed “Early Access” testing, and you can sign up for the waitlist on Runway’s website.