Thing 13

AI video generation

Last reviewed: March 2026 · 30–45 minutes

You've now generated images from text, created synthetic speech, edited audio by editing text, and composed entire songs from a short description. In each case, the results were impressive enough to be immediately useful, even on free tiers. Video is the final creative modality, and it's the one where the gap between "that's amazing" and "that's not quite right" is most fascinating to explore.

AI video generation has made extraordinary progress. You can type a sentence describing a scene and, within a minute or two, watch a short video clip that didn't exist before, complete with camera movement, lighting, and convincing physics. A year ago, AI-generated video was mostly a novelty: people shared clips of melting faces and impossible physics as curiosities. Today, the best tools produce clips that could pass for footage shot on a real camera, at least for a few seconds.

But video is also where AI's current limitations are most visible. Hands still do strange things. Objects appear and disappear between frames. A person walking might suddenly have three legs, or the camera might do something physically impossible. The clips are short (typically five to ten seconds) and the longer you ask the AI to maintain a scene, the more likely things are to drift into the uncanny. This is a snapshot of where the technology sits right now: useful for some things, not yet reliable enough for others.

Understanding this matters beyond just video. Video generation is the most computationally demanding creative AI task, which is why it's the most expensive, the most limited on free tiers, and the most obviously imperfect. It's also the fastest-moving. The tools you'll use today are noticeably better than what was available six months ago, and what's available six months from now will likely make today's outputs look rough. Learning to evaluate AI video critically is a skill that transfers to every other AI modality you've explored so far.


How AI video generation works

[Image: an abstract illustration representing AI video generation, with visual elements suggesting motion, frames, and digital creation]
AI video generation can produce short clips from text descriptions or still images, but maintaining consistency across frames remains a core challenge.

The basic process will feel familiar from image generation. You write a text prompt describing a scene, and the AI generates a short video clip. Some tools also accept a still image as a starting point, animating it into motion. This image-to-video approach often produces more predictable results because the AI has a visual reference to work from rather than building everything from text alone.

Under the surface, video generation models are trained on enormous datasets of video footage. They've learned patterns: how water flows, how fabric moves in wind, how a camera pans across a landscape. When you give them a prompt, they draw on those learned patterns to generate a sequence of frames that, ideally, look like a coherent piece of video.

"Ideally" is doing a lot of work in that sentence, though. Video is far more complex than a still image. An image generator produces one consistent frame. A video generator produces dozens of frames per second, and each one has to be consistent with the ones before and after it. Every object needs to move in a physically plausible way. Lighting needs to stay consistent. If there's a person in the scene, their face, body, and clothing need to look the same in every frame. This temporal consistency (maintaining coherence over time) is the central challenge of video generation, and it's why video AI lags behind image AI in reliability.

This is also why the clips are short. Most free-tier generations produce five to ten seconds of video. That's enough to showcase a mood, demonstrate a concept, or create a social media clip, but it's a long way from generating a full scene or a narrative sequence. Some paid tiers allow extensions up to several minutes, but quality tends to degrade with longer generation, and the costs climb steeply.


The current tool landscape

Video generation is the most expensive AI creative category, which means free tiers are more limited than what you've encountered with image or music tools. That said, several platforms offer enough free access to give you a proper feel for the technology.


Ethics, copyright, and deepfakes

The ethical questions around AI video are the same ones you've encountered with images, voice, and music, but amplified.

Copyright and training data. Like image and music generators, video models are trained on existing video content. The legal status of this training is still being worked out, and the same tensions between AI companies and content creators apply here.

Deepfakes. Video generation makes the creation of convincing fake footage significantly easier. While the tools discussed here have safeguards against generating realistic depictions of identifiable individuals, the underlying capability exists and is a serious concern. Being able to recognise AI-generated video, and knowing that it exists, is an important part of media literacy in 2026.

Provenance and transparency. Many video generation platforms now embed metadata in their outputs to identify them as AI-created. If you share AI-generated video, being transparent about its origin is both good practice and, increasingly, expected.

Environmental cost. Video generation requires significantly more computing power than image generation, which translates to higher energy consumption. This isn't a reason to avoid using the tools, but it's worth being aware that generating dozens of video clips has a real environmental footprint; another reason to be thoughtful rather than indiscriminate about what you generate.


Resources to explore

Pika

Free tier with limited monthly credits. Image-to-video on free tier; text-to-video on paid tiers. Web-based, no download required. The most accessible starting point for this activity.

Kling AI

Free tier with 66 daily credits. Impressive realism and longer video capability. Web-based.

Runway

Limited one-time free credits (125). Industry standard for professional AI video. Web-based.

Luma Dream Machine

Free credits available. Strong physics simulation and high-fidelity output.

AI video generation comparison (PXZ AI)

A regularly updated comparison of the major platforms, useful for understanding the current state of the field.


Activity: your first AI videos

30–45 minutes · Pika (free tier) + optionally a second tool

You're going to generate AI video clips using at least two different approaches, evaluate the results critically, and reflect on where this technology is useful today and where it falls short. Think of this as the video version of the image generation exercise from Thing 9: you're learning by making, comparing, and evaluating.

  1. Set up your account. Create a free account at pika.art. If you'd like to try a second tool for comparison (recommended if you have the patience for two sign-ups), create an account at klingai.com as well.
  2. Start with image-to-video. Find or create a still image to use as your starting point. You could use one of the AI-generated images you created in Thing 9, a photograph you've taken yourself (of a landscape, a pet, a still life; avoid photos of other people), or any image you have the right to use. Upload it to Pika and add a short motion prompt describing how you'd like the scene to come alive.
  3. Try text-to-video (if available). If your chosen tool supports text-to-video on its free tier, try generating a clip entirely from a text description. Write two different prompts: one naturalistic, one creative or abstract. If text-to-video isn't available on your free tier, try two more image-to-video generations with different source images and more ambitious motion prompts.
  4. Evaluate what you've made. Watch each clip several times and assess the results: does the motion match your prompt, do objects and people stay consistent from frame to frame, is the physics plausible, and are there visible glitches or artefacts?
  5. Compare tools (optional but recommended). If you signed up for a second tool, run the same prompt or upload the same image and compare the results.
Privacy reminder: use personal images, personal examples, or fictional scenarios. Never use actual work materials, client content, or confidential images.

Your output

A document containing:

  • The prompts (and source images, if used) for each video you generated
  • The video clips themselves (downloaded or screen-recorded), or screenshots from the clips if downloading isn't available on your tier
  • Your evaluation of each clip, covering at least the points listed in the evaluation criteria above
  • A short reflection (a few paragraphs) on your overall impression: what surprised you, what disappointed you, where you can see AI video being useful in its current state, and what would need to improve before you'd use it for something that mattered

Why this matters

Video generation rounds out your tour of AI's creative capabilities. You've now experienced AI working across text, images, speech, audio, music, and video, and the pattern is consistent: AI produces impressive results quickly, but the quality varies and human judgement is essential for evaluating the output.

More importantly, video is where the pace of improvement is fastest. The limitations you notice today are likely to be significantly reduced within months. By forming a clear, honest assessment of where things stand now, you're creating a baseline you can measure future progress against. That habit of critical evaluation is one of the most useful things you can take from this programme.


Claim your Open Badge

Submit your prompts, generated video clips (or screenshots), evaluation notes, and written reflection as evidence for your Thing 13 badge via cred.scot.



What's next

You've now explored AI's creative toolkit across every major modality: text, images, speech, audio, music, and video. In Thing 14, we're changing gear entirely. Instead of exploring new tools on your computer, you're going to pick up your phone and discover the AI features already built into it. Whether you use an iPhone or Android, your phone is almost certainly doing more with AI than you realise, from writing tools and photo editing to real-time translation and visual search. Thing 14 is about finding the AI that's already in your pocket.