Opinion/CINE-REVOLUTION?

AI videos just learned to talk. Filmmaking will never be the same

With the launch of Google’s Veo 3, AI video tools now write, direct, and speak, shaking the foundations of filmmaking as we know it.

Synchronised sound was introduced to cinema in the late 1920s, ushering in a period of profound change. Last week, film history completed a full loop, delivering round two of the synchronised sound revolution. This time via the launch of Google’s unsexily titled Veo 3, which for the first time allows users to create AI-generated videos with audio—including background noises, sound effects, and (most significantly) dialogue.

At first blush, this sounds like an ordinary event: just another initiative from a tech industry doubling down on artificial intelligence. But there’s no underestimating the impact of Veo 3 and similar software coming down the pipeline. The impact of these video creation tools will vastly eclipse what Hollywood underwent in the 1920s. Most filmmakers seem unaware of the seismic change that’s coming.

From talkies to techies: the second sound revolution

A fresh deluge of AI-generated videos has arrived on social media platforms, giving slack-jawed viewers another taste of a technology certain to progress in leaps and bounds. The general consensus is that these videos have passed (with some room for improvement) the “Will Smith eating spaghetti test.” This refers to a preposterously unconvincing AI-generated video from just a couple of years ago, depicting the actor wolfing down a plate of pasta. Back then, naysayers laughed at how silly it looked. They’re not laughing now.

Viewers come to this test suspicious, hunting for flaws. A better way to judge the tech is to look at ordinary, digitally rendered Joes and Janes and ask: do they look real? Why yes, yes they do. The software is breathtakingly impressive and improving at an exponential rate. But we won’t be reacting with “WOW!” for all that much longer. During the initial shock and awe period, all major new media technologies feel like magic, before becoming ordinary—as ingrained in daily life as toasters and showerheads.

These technologies also generate utopian and dystopian reactions—the giddily excited versus the apoplectic. The invention of photography was proclaimed by Edgar Allan Poe as “the most important and perhaps the most extraordinary triumph of modern science.” Painter J.M.W. Turner had a different take, famously saying: “This is the end of art. I am glad I have had my day.” Even the invention of writing was not universally well received. Socrates argued it would negatively impact human memory by “introducing forgetfulness into the soul of those who learn it.”

Every technology creates moral panic

In this context, it’s not surprising to encounter news articles reporting on Veo 3 with headlines like “You Are Not Prepared for This Terrifying New Wave of AI-Generated Videos,” “Google’s New Video-Generating AI May Be the End of Reality as We Know It” and “AI video just took a startling leap in realism. Are we doomed?” I’m not saying I don’t share some of the concerns raised in these articles; I do. I’m particularly concerned about the dawning of a world (which has been coming for a long time) in which we can no longer trust anything we see or hear online, because there’s simply no way to ascertain whether it’s real.

Pushing aside that conversation, for now, let’s look at why this technology will catch on, from a business case point-of-view. Everybody knows that filmmaking is extremely expensive: a movie that costs two or three million dollars is considered cheap (and such movies are not usually the kind of productions associated with striking visual effects). The “ultra” tier of a monthly Veo 3 subscription, which can produce the kind of content that, just a few years ago, only Hollywood studios could make, currently costs US$249.

Price-wise, there are some caveats; the subscription, for instance, only buys you so many “credits” before you have to pay extra. But you can multiply that price by 10 and it’d still be a fraction of what traditional filmmaking costs. There are some limitations, too. For instance, in the current iteration of Veo 3, each rendered sequence is limited to a maximum of eight seconds (though any number can be stitched together).

 


When that cap is increased to 30 seconds, or a minute, creators will have enormous freedom to produce, visually speaking, anything they can think of. It’s hard to imagine the impact this will have, and how the film industry will respond. Hollywood has a way of adapting to the times, and protecting its lucrative business models. But you can be sure that massive change is coming.

The auteur of the future is a prompt engineer

We have, of course, entered the crystal ball realm. One simple fact about the future is that it never plays out exactly how you think it will. My long-term prediction is that, saturated by each and every kind of two-dimensional content available, traditional motion pictures will slowly lose their lustre and audiences will gravitate towards other mediums, such as augmented and virtual realities (technologies I’m fascinated with, running a VR reviews website in my spare time).

For the foreseeable future, at least, creating an AI-generated movie will still require a huge amount of work and human input. Some may be disgusted by the suggestion that there’s an art and craft to writing input prompts, and directing AI—but it’s true. And that’s without mentioning the need to tell great stories, which, for a while at least, will remain a human-centric prerogative.

The auteur of the future will be a great coder and a great creative thinker, merging the so-called left and right sides of the brain. It’s going to be a wild ride.