Still image from an AI-generated video of a teddy bear painting a portrait.
Today, Meta announced Make-A-Video, an AI-powered video generator that can create novel video content from text or image prompts, similar to existing image-synthesis tools like DALL-E and Stable Diffusion. It can also make variations of existing videos, though it isn't yet available for public use.
On Make-A-Video's announcement page, Meta shows example videos generated from text, including "a young couple walking in heavy rain" and "a teddy bear painting a portrait." It also showcases Make-A-Video's ability to take a static source image and animate it. For example, a still photo of a sea turtle, once processed by the AI model, can appear to be swimming.
The key technology behind Make-A-Video (and why it has arrived sooner than some experts expected) is that it builds on existing work in text-to-image synthesis used by image generators like OpenAI's DALL-E. In July, Meta announced its own text-to-image AI model called Make-A-Scene.
Instead of training the Make-A-Video model on labeled video data (for example, captioned descriptions of the actions depicted), Meta took image-synthesis data (still images trained with captions) and applied unlabeled video training data so the model learns a sense of where a text or image prompt might exist in time and space. It can then predict what comes after the image and display the scene in motion for a short period.
A video of a teddy bear painting a portrait, created with Meta's Make-A-Video AI model (converted to GIF for display here).
A video of "a young couple walking in a heavy rain," created with Make-A-Video.
Video of a sea turtle, animated from a still image with Make-A-Video.
“Using function-preserving transformations, we extend the spatial layers at the model initialization stage to include temporal information,” Meta wrote in a white paper. “The extended spatial-temporal network includes new attention modules that learn temporal world dynamics from a collection of videos.”
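The "function-preserving" idea Meta describes can be illustrated with a toy sketch: a pretrained per-frame (spatial) layer is extended with a temporal mixing layer whose weights are initialized to the identity, so the extended network initially produces the same output as the spatial-only model and only learns cross-frame dynamics during further training. This is a minimal illustration of the general principle, not Meta's actual architecture; all function names and shapes here are hypothetical.

```python
import numpy as np

def spatial_layer(frames, w):
    # Hypothetical pretrained spatial transform, applied to each frame independently.
    # frames: (num_frames, features); w: (features, features)
    return frames @ w

def make_temporal_layer(num_frames):
    # Function-preserving initialization: the temporal mixing matrix starts as
    # the identity, so each output frame initially depends only on the same
    # input frame and the extended model reproduces the spatial-only model.
    return np.eye(num_frames)

def extended_network(frames, w_spatial, w_temporal):
    # Spatial pass per frame, then mix information across the time axis.
    spatial_out = spatial_layer(frames, w_spatial)
    return w_temporal @ spatial_out

rng = np.random.default_rng(0)
frames = rng.standard_normal((4, 8))     # 4 frames, 8 features each (toy sizes)
w_spatial = rng.standard_normal((8, 8))  # stands in for pretrained spatial weights

w_temporal = make_temporal_layer(4)
out = extended_network(frames, w_spatial, w_temporal)

# At initialization, the extended spatial-temporal network matches the
# spatial-only network exactly; temporal dynamics are learned afterward.
assert np.allclose(out, spatial_layer(frames, w_spatial))
```

The design point is that nothing learned from image-text data is destroyed when the temporal layers are bolted on; training on unlabeled video can then move the temporal weights away from the identity.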
Meta has not said how or when Make-A-Video might become available to the public or who would have access to it. Meta provides a sign-up form people can fill out if they are interested in trying it in the future.
Meta acknowledges that the ability to create photorealistic videos on demand presents certain social hazards. At the bottom of the announcement page, Meta says that all AI-generated video content from Make-A-Video contains a watermark to "help ensure viewers know the video was generated with AI and is not a captured video."
If history is any guide, competitive open source text-to-video models may follow (some, like CogVideo, already exist), which could make Meta's watermark safeguard irrelevant.