Thing 11

AI audio editing

Last reviewed: March 2026 · 30–45 minutes

In Thing 10, you generated speech from text and converted speech back into text. In Thing 11, you're going to try something that might change how you think about recording: editing audio by changing the words on screen.

If you've ever tried to edit audio or video using traditional software (Audacity, GarageBand, Adobe Premiere, or anything similar), you'll know it involves staring at a waveform. That wobbly line represents your audio, and to cut a section, you have to find the right spot by eye, play it back, scrub through the timeline, and hope you don't accidentally chop the beginning off a word. It works, but it's fiddly, time-consuming, and feels completely disconnected from what was actually said.

AI audio editing flips this. The tool transcribes your audio, shows you the text, and lets you edit the audio by editing the text. Want to remove that "umm" in the middle of your sentence? Delete it from the transcript. Want to cut an entire paragraph you rambled through? Select the text and press delete. The audio follows the text. It sounds too simple to be real until you try it, at which point you wonder why audio editing ever worked any other way.

Text-based editing is only part of the picture, though. AI can also clean up your audio quality after the fact: removing background noise, reducing echo, and making a recording from your kitchen sound like it was made in a proper studio. These enhancement tools have changed what counts as "good enough" recording conditions. You no longer need a quiet room and an expensive microphone to produce audio that sounds professional.
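To get an intuition for the enhancement side, here is a deliberately simplified noise gate in Python: any sample quieter than a threshold is treated as background hiss and silenced. Real tools like Adobe's Enhance Speech use learned models rather than a fixed threshold, so treat this as a toy illustration of the idea, not a description of how those products work.

```python
# Toy noise gate: a much-simplified version of one idea behind noise removal.
# Real AI enhancement tools use trained models, not a fixed amplitude cutoff.

def noise_gate(samples, threshold=0.05):
    """Silence any audio sample whose amplitude falls below the threshold."""
    return [s if abs(s) >= threshold else 0.0 for s in samples]

# Quiet background hiss (small values) is zeroed; speech (large values) passes:
print(noise_gate([0.01, -0.02, 0.6, -0.7, 0.03]))  # [0.0, 0.0, 0.6, -0.7, 0.0]
```

Even this crude version shows why results vary: set the threshold too high and you clip quiet speech, too low and the hiss stays. The AI versions make that judgement adaptively.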

This Thing gives you hands-on experience with both: editing audio through text, and enhancing audio quality with AI. By the end, you'll have a good sense of how capable these tools are, where they fall short, and whether they're useful in your own life.


How AI audio editing works

[Image: an abstract illustration suggesting audio editing, with geometric shapes, waveforms, and editing tools arranged on a desk workspace]
AI audio editing lets you work with speech as text rather than waveforms, while enhancement tools clean up recordings automatically.

As with image generation and voice synthesis, you don't need a deep technical understanding to use these tools well. But knowing the basics helps you understand what you're hearing and why results vary.
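One basic worth knowing: when a tool transcribes your audio, it records a start and end time for every word. Editing the text then reduces to arithmetic on those timestamps. The sketch below (plain Python, with made-up timings; Descript's internals will differ) shows how deleting a word from a transcript becomes a list of audio spans to splice back together.

```python
# Toy sketch of text-based editing: hypothetical word-level timestamps,
# not Descript's actual data format.

transcript = [
    {"word": "So,",   "start": 0.0, "end": 0.4},
    {"word": "umm,",  "start": 0.4, "end": 0.9},
    {"word": "today", "start": 0.9, "end": 1.3},
    {"word": "we",    "start": 1.3, "end": 1.5},
    {"word": "start", "start": 1.5, "end": 2.0},
]

def keep_spans(words, deleted_indices):
    """Merge the timestamps of the words we keep into contiguous audio spans."""
    spans = []
    for i, w in enumerate(words):
        if i in deleted_indices:
            continue
        if spans and abs(spans[-1][1] - w["start"]) < 1e-9:
            spans[-1][1] = w["end"]               # word continues the current span
        else:
            spans.append([w["start"], w["end"]])  # gap found: start a new span
    return [tuple(s) for s in spans]

# Deleting the "umm" (index 1) leaves two spans the editor splices together:
print(keep_spans(transcript, {1}))  # [(0.0, 0.4), (0.9, 2.0)]
```

This is also why results vary: if the transcription mis-times a word boundary, the cut lands mid-syllable, which is exactly the glitch you'll listen for in the activity below.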


The current tool landscape

You've got two main categories of tool here: one for text-based editing, one for audio enhancement. Here's what's available and worth trying.


Resources to explore

Descript

Your primary text-based editing tool for the activity. Free tier with 60 media minutes per month. Requires desktop app (Windows/macOS).

Adobe Podcast Enhance Speech

One-click audio enhancement. Free tier processes files up to 30 minutes with one hour of daily processing. No software to install.

Auphonic

Automated audio post-production with levelling, noise reduction, and loudness normalisation. Free tier of two hours per month.

Audacity

Free, open-source audio editor with traditional waveform editing. Steeper learning curve but full control over your audio.

Krisp

Real-time background noise removal during calls and recordings. Useful if you frequently work from noisy environments.


Activity: edit and enhance your audio

30–45 minutes · Descript + Adobe Podcast

This activity has four parts. You'll record a short audio clip, then use two different AI tools to process it: one for text-based editing and one for audio enhancement. The comparison will show you two distinct ways AI is changing audio editing.

What you'll need: a computer with a microphone (built-in is fine), a free Descript account, and a free Adobe account.

Part 1: Record your source material

Record yourself talking for two to three minutes. Use your computer's built-in microphone or your phone. Don't worry about finding a quiet room or using good equipment. A slightly imperfect recording is actually better for this exercise because it gives the AI something to work with.

Don't script it; just talk naturally. The goal is a recording that sounds like a real person speaking off the cuff, complete with the "ums," pauses, restarts, and background noise that come with that.

Part 2: Text-based editing in Descript

  1. Install and sign up. Download and install Descript from descript.com if you haven't already, and create a free account.
  2. Upload your recording. Create a new project and upload your audio file.
  3. Wait for transcription. Descript will transcribe your audio. This usually takes a minute or two.
  4. Explore the transcript. Read through it and notice how it maps to your recording. Click on any word and the audio will play from that point.
  5. Remove filler words. Descript highlights "ums" and "uhs" automatically. Try removing them individually or use the filler word removal feature to clear them in bulk.
  6. Try a bigger edit. Delete a sentence or two from the middle of your recording. Play back the result and listen for how smooth (or not) the edit sounds.
  7. Try Studio Sound (optional). If you have AI credits remaining, apply Studio Sound to hear the audio enhancement.
  8. Export your edit. Export the edited audio.

Part 3: Audio enhancement with Adobe Podcast

  1. Open the tool. Go to podcast.adobe.com/enhance and sign in with a free Adobe account.
  2. Upload your original. Upload your original recording (the unedited version, not the Descript export).
  3. Wait for processing. The AI typically takes under a minute for a short recording.
  4. Compare the versions. Listen to the enhanced version and toggle between the original and enhanced audio to hear the difference.
  5. Download the result. Download the enhanced version.

Part 4: Compare and reflect

You should now have three versions of your recording: the original, the Descript-edited version, and the Adobe-enhanced version. Listen to all three and write a short reflection (200–300 words) covering these questions:

  • What did the text-based editing in Descript change about the content of your recording? Did removing filler words and pauses make it sound more polished, or did it lose some natural character?
  • What did the Adobe Podcast enhancement change about the sound quality? Could you hear a difference in background noise, clarity, or overall polish?
  • Which type of improvement (content editing or quality enhancement) made the bigger difference to your recording?
  • Can you think of situations in your own life where either of these tools would be useful? Think about any audio or video content you create, even informally: voice messages, presentation recordings, training materials, or social media content.

Privacy reminder: use personal examples or fictional scenarios, never actual work materials or confidential documents.

Your output

A document or blog post containing:

  • Your original recording
  • The Descript-edited version (with filler words removed and at least one section cut)
  • The Adobe Podcast-enhanced version
  • A written reflection (200–300 words) comparing the three versions

Things to notice

As you work through the activity, pay attention to a few things that reveal how these tools work and where they have limitations.


Why this matters

Audio and video content is increasingly part of professional life. Organisations use it for training, internal communications, social media, and knowledge sharing. But the traditional barrier to creating polished audio has always been the editing. It was slow, required specialist software knowledge, and felt like a completely separate skill from the actual content creation.

AI audio editing removes that barrier in two ways. Text-based editing means you can edit a recording as naturally as you'd edit a document, a skill everyone already has. And AI enhancement means you don't need professional equipment or a treated recording space to produce audio that sounds clean and clear.

You don't need to be a "content creator" for this to be relevant. If you've ever recorded a voice note and wished you could tidy it up before sending it, or given a presentation that was recorded and wished the audio quality was better, or thought about starting a podcast but felt put off by the editing, these tools are directly useful.

You'll see this same pattern throughout the programme: AI is steadily removing the technical barriers between having an idea and executing it. You don't need to learn audio engineering to produce clean audio, just as you don't need to learn graphic design to create images (Thing 9) or voice acting to generate narration (Thing 10). The skills that matter more and more are the creative and editorial ones: knowing what you want to say, recognising what sounds good, and making thoughtful decisions about the tools you use.


Claim your Open Badge

Submit your original recording, the Descript-edited version, the Adobe Podcast-enhanced version, and your written reflection as evidence for your Thing 11 badge via cred.scot.


What's next

You've generated images from text (Thing 9), created and transcribed speech (Thing 10), and edited audio using AI (Thing 11). In Thing 12, you'll move into what might be the most unexpectedly fun part of the programme: AI music generation. You'll use tools that can compose complete songs (lyrics, vocals, instrumentation, the lot) from a short text description. If the image generators in Thing 9 made you do a double-take, wait until you hear what AI can do with music.