Soon after Meta announced Make-A-Video, which lets users create AI-generated video from text descriptions and even from existing video clips, Google announced its own AI video-generating program, Imagen Video. Like Meta’s program, Imagen Video takes a text description from the user and generates original video content based on it. And like its Meta counterpart, its results aren’t always perfect and are often not what the prompt writer expects.
Even so, these advances in AI programs that can create content from text descriptions alone are major. As we saw during the summer of 2022, AI-powered image creators such as DALL-E 2 and ROBOMOJO took the internet by storm, and countless users have used these programs not only to create original work but even to win art contests.
In a paper, the Google research team behind Imagen Video explains how the program works. Imagen Video takes a text description and generates a 16-frame video at three frames per second and a resolution of 24 by 48 pixels. The program then upscales the video and predicts additional frames. The result is a final 128-frame video at 24 frames per second and a resolution of 1280×768, roughly 720p quality.
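As a quick sanity check on these figures, the arithmetic can be sketched in a few lines of Python. Only the frame counts, frame rates, and resolutions come from the numbers quoted above; the dictionary layout and the `duration_seconds` helper are illustrative names chosen for this sketch, not anything from Google’s paper or code.

```python
from fractions import Fraction

# Figures quoted in the article (height x width is an assumed ordering).
base = {"frames": 16, "fps": 3, "resolution": (24, 48)}
final = {"frames": 128, "fps": 24, "resolution": (768, 1280)}


def duration_seconds(video):
    """Clip length in seconds: frame count divided by frames per second."""
    return Fraction(video["frames"], video["fps"])


# How many times more frames, and how much faster playback, the final clip has.
frame_factor = final["frames"] // base["frames"]
fps_factor = final["fps"] // base["fps"]

# Pixels per frame before and after upscaling.
base_pixels = base["resolution"][0] * base["resolution"][1]
final_pixels = final["resolution"][0] * final["resolution"][1]

print(f"base clip length:  {float(duration_seconds(base)):.2f} s")
print(f"final clip length: {float(duration_seconds(final)):.2f} s")
print(f"frames x{frame_factor}, frame rate x{fps_factor}")
print(f"pixels per frame: {base_pixels} -> {final_pixels}")
```

Both clips work out to the same length, a little over five seconds. The predicted frames therefore make the motion smoother (eight times as many frames, played eight times as fast) without making the video any longer.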
Imagen Video was trained on the publicly available LAION-400M image-text dataset, along with 14 million video-text pairs and 60 million image-text pairs. Because of this, it is able to generalize across a range of aesthetics. One interesting claim in the team’s paper is that Imagen Video shows an understanding of depth and three-dimensional structure when generating video. This has allowed it to create videos that capture objects from different angles with minimal or no distortion.
But there is an issue that researchers at Google are still trying to work out. Imagen Video was trained on data containing what the company calls “problematic content,” which could lead the program to generate graphically violent or sexually explicit clips when given the right text prompt. Likely because of this and other issues, Google isn’t going to release Imagen Video’s source code or model. Also, unlike with Meta’s program, there won’t be an opportunity for users to sign up and use it.
For now, the program will stay behind a curtain while Google continues to research and develop the technology. Though this is a step forward for AI tools that generate content from text, there is still a way to go before it can create videos at the quality one sees in images from DALL-E 2, Midjourney, or ROBOMOJO.