Podcastle Pro Review: AI Software Clones Your Voice For Podcasting
We all know by now that AI is coming for our jobs, or if we're "lucky," we'll get to keep our jobs and AI will somehow make us even busier and more stressed — the way some other technologies have. That's the cynical view, but some AI will certainly help us create more and do things we've never done before. That's the motivation behind Podcastle Pro, a largely AI-driven platform for creators to record and edit audio and video from its cloud-synced web browser interface, so you don't have to install or update any software.
The Podcastle Pro subscription ($23.99/month when billed annually) focuses on editing voice recordings like podcasts rather than, say, music or sound design. It uses AI to transcribe recordings, create written episode summaries, automate tedious editing tasks, and remove background noise. Most intriguingly, its "Revoice" feature creates an entire AI voice from specific recordings of your own voice, which you can then use for text-to-speech in audio projects.
Audio recording and editing basics
To test Podcastle's audio recording and editing capabilities, I recorded a short public domain audiobook and some video tutorial overdubs, and also uploaded some previously recorded interviews. When you record audio or video, Podcastle saves the files locally on your own computer so that Internet glitches or dropouts shouldn't mar the results, and then uploads the finished recordings to its cloud so that you can access and edit them from anywhere with a web browser. The platform recognized any USB mic or audio interface I used, and I never experienced any interruptions or crashes during recording.
Podcastle tries to simplify audio editing for content creators who don't specialize in it, but there are still enough editing tools and AI-based processing to prepare most voice recordings for distribution. Essential functions include trimming, splitting, and merging audio clips, fading in and out for each clip, and muting or deleting sections of clips. You can create as many audio tracks as you need, each with its own volume and panning (stereo position left to right) settings.
Keyboard shortcuts are available for most editing functions, and you can look them up in the many easily accessible help files. Many of them, such as undo, redo, zoom in, and zoom out, use the same familiar keystrokes that are standard in desktop software.
Using the shortcuts, I developed a nice little editing workflow. For example, selecting the Split tool, splitting the clip twice to cut out a sentence, going back to the Selection tool, and deleting that clip became a quick little bit of audio surgery. And audio fade-ins and fade-outs that you drag from the ends of clips were quick and convenient for editing out unwanted breath sounds at the beginnings and ends of sentences.
Operational hiccups
The editing interface was not rock-solid all the time. For example, the Split tool occasionally just stopped working. When that happened, other editing functions dropped out too, such as moving audio clips on the timeline, trimming their start and end points, and applying fade-ins or fade-outs.
Simply refreshing the browser page would reliably restore the proper editing functions, and happily, I would not lose any of my work. Unfortunately, refreshing the page resets the cursor's place in the project, so it could be time-consuming to get back to my previous spot.
The only other glitches I experienced happened when creating a digital Revoice. Podcastle stopped recognizing my external USB mic when recording the Revoice training sentences. On that occasion, I had to close the browser tab and start again with a new tab. Fortunately again, I didn't lose what I'd already done and just resumed recording. Then when the voice was finished, I had to close out of Podcastle and start it again before the AI voice became available in projects. Most of the time, however, my Podcastle audio editing workflow was fast and smooth.
AI audio processing
An audio clip's contextual menu shows the AI-based audio processing tools, as well as other options like setting the clip playback speed from 0.5x to 2x.
Podcastle offers a few main AI shortcuts for an audio clip or group of clips, including auto-leveling, which creates a consistent volume level within a recording, and silence removal, which cuts any silence longer than a very brief pause. Both are great time-saving measures in theory, and they also worked very well in practice. Auto-leveling helps with a problem still commonly heard in podcasts and videos, often even from big-name brands, where different speakers' loudness varies too much. Both leveling and silence removal can be done manually, but it's a big time- and wrist-saver to have them done with a single click. For example, it took Podcastle about five minutes to auto-level 95 minutes of audio clips.
The "Magic Dust" function combines several processes into one step: background noise removal, auto-leveling, and applying an EQ to voices. A couple of times the tool made the white noise louder rather than removing it, but those occasions were the exception rather than the rule. Most of the time its noise removal worked very well, but I didn't always like its EQ choices on my voice. As an alternative, the Audio Assistant analyzes clips for background noise, uneven volume levels, and long pauses, and then you can just click on Enhance for each of those problems to let the AI tools go to work.
Podcastle's noise-removing ability was impressive and should have its own standalone command in the menu. Remove Silence also helped immensely to speed up the editing time for my audiobook recordings, where I often paused to reset mentally, drink some water, and read ahead for familiarity.
Transcription and editing audio from text
Podcastle Pro includes 25 hours a month of transcribing audio into English, Spanish, German, French, or Italian. It took about six minutes to transcribe a 37-minute interview with two people speaking English. The results were not perfect, but they were as good as or better than those from other AI-based transcription services I've used, such as Temi.com.
After transcription, you can use the Text Editor to delete words from the transcript, which will also delete the words from the audio. It can also "detect filler words," which highlights things like "uh" and "um" in yellow, so it's easier to see and delete them. This audio editing from text is an excellent feature that compares to popular services like Descript. It's easy to do, and Podcastle succeeds in cutting the deleted words out of the audio recording. But for really clean results, you should still check the audio file and do any additional micro edits needed to smooth out the pacing of the audio.
In addition, the transcript shows periods of silence longer than one second, which you can also delete straight from the transcript. Again, you should double-check the results in the audio, but this method of editing helps greatly to save time and some tedious editing tasks.
Editing audio from the text transcript won't always be ideal during interviews or podcast conversations, particularly if there's any amount of cross-talk between people. However, it was perfect for editing my audiobook material, because I often pause between sentences before picking back up. So deleting the periods of silence from the transcription made things easy. And when I made mistakes and had to start a sentence over, or repeated multiple takes of sentences, the text editor made the job a lot faster than if I were using Audacity or another audio editor without this transcription editing ability.
Revoice AI and text-to-speech projects
Podcastle's Convert Text to Speech projects let you type or paste words into a text editor and convert them into spoken audio with 20 preset AI voices. You can export the audio and reopen it in a different project to edit it, or from an existing project, click the Text to Podcast button to open the text editor.
Podcastle's AI-generated voices sound fairly natural for the most part, but they did not handle most mid-sentence punctuation very well, leading to a stunted delivery. And certain voices mispronounced random words like "video" that other voices pronounced correctly. You can go back and make changes to the text, click Generate again, and the audio speech quickly adjusts to your changes.
The real fun begins, however, when you generate your Digital Voices, or your Revoice, based on recordings of yourself. You have to read 70 sentences that Podcastle feeds you. The first is a legal disclaimer stating that you will not use the Revoice for illegal purposes. You start and stop a recording for each of the sentences, and you can choose to redo the sentence if you wish before moving to the next. It took me about 20 minutes to finish all 70 sentences, and I ended up saying some things I would never say as my human self, such as "You are truly special to me." But once it's done, you can name your voice and it goes into analysis and processing. Podcastle says it could take up to 24 hours for it to finish, but I got an email after only six hours telling me my first Revoice was ready.
Revoice results
The prospect of creating an AI version of my own voice attracted me to Podcastle in the first place, and it's one of the things the company can offer now that competitors like Descript do not.
So when I first wrote some blather into the text editor and assigned it to my own AI voice, hearing it did not disappoint. That's not to say that it sounded exactly like me. I mean, it was me, but it was AI me. It had all the hallmarks of Podcastle's 20 other Turing test-failing AI voices: the somewhat off-kilter delivery, the intermittent digital artifacts in the edges of vowel sounds, and the occasional dead-giveaway mispronunciation of words. Even though it was very close to my voice, it was still not all that close. But it was glorious just the same. I highly recommend trying it for yourself, if only just for fun. I for one enjoyed feeding my AI voice goofy lines and absurd soliloquies more than messing with ChatGPT.
However, while AI voice clones will get better over time, exact replication is not really Podcastle's goal for the moment. Rather, Revoice is meant to help creators in fixing mistakes while editing their content. If you have points in your recordings when you said a wrong phrase, left out a brief detail, or maybe need to add an interstitial like "we'll be right back," you can click Text to Podcast in the timeline, type something, and insert it as another audio track or replace a portion of an existing track. For that purpose, the Revoice can be a convincing substitute for making a whole other audio recording just for a few words.
Video recording and nascent video editing
For video projects, Podcastle can record the primary user's webcam, the primary user's screen, and invited guests' webcams. So you can record multi-person podcasts and meetings, solo videos, tutorials including screen capture, and so on. The screen recording can capture a particular Chrome browser tab, any software window open on your computer, or the entire screen of any available display.
To include guests in a video recording, you can schedule the recording time and connect it to a Google Calendar event. Adding guest email addresses will send them an invite. Or you can just enter the studio spontaneously and invite people by email or by messaging them a link.
Podcastle's video editing was still in beta at the time of this writing, and it was much more limited than audio editing. To cut up the video, you have to string together video segments by adding Markers to the timeline. Each Marker highlights an adjustable portion of the video. Then in the Highlights tab, all the color-coded highlight clips are shown in order. You can adjust their lengths and reorder them. You can then export the full video or the rendered highlights in HD, Full HD, or 4K quality.
The Highlights editor could be good for creators to make quick outtake videos for social media, but to do any serious editing, you'd have to export your full video and open it in a different program – at least for now.
As Podcastle develops its video features, it would be good if you could start recording new video clips from the video editor, the way you can with audio. It would also help creators to be able to record both a microphone and the computer's audio, the way you can with more robust screen-recording/video editing programs like Screenflow.
Media libraries and AI episode summaries
To spruce up your projects, Podcastle provides a library of more than 7,000 royalty-free music tracks and sound effects that you can search and drag right into your projects. You can filter the music by many genres from classical to synthwave or by descriptors like calm, quirky, and suspenseful. Alternatively, you can upload your own music if you prefer. The video editor also has many thousands of photos and graphics from Unsplash to set as your video background image.
When you're finished with a project, the Export button lets you choose your audio format from compressed and uncompressed file types like WAV, FLAC, MP3, AAC, and WMA. You can also opt to download a transcript in one of five languages as a Word doc or PDF.
For Pro-level subscribers, the export function has the ability to include an episode summary with the transcript. The AI-generated episode summaries could potentially save you some of the time needed to write your own, but in my testing, they needed to be heavily edited.
For example, one of them added a list at the end of "ten things you may know about Charlie," when no one named Charlie was ever mentioned. One also stated that I was a bodybuilder (not true) and referred to me with both female and male pronouns. That said, portions of the summaries did manage to sum up the actual content of the audio, which could serve as a foundation for an edited episode summary.
Podcastle Pro verdict
In many ways, Podcastle still feels like a new tool, which it is. For example, the video editing feature, which is key to many creators who plan to distribute their video recordings, is in beta. Podcastle video recording is limited to either the Chrome browser or the Podcastle iOS app on an iPhone. The several functionality bugs I experienced that required either a page refresh or starting over in a new browser tab were far from catastrophic, but they were also more pervasive than those I have experienced in browser-based audio programs aimed at musicians, such as Soundtrap and Soundation.
With that said, Podcastle has a lot going for it that creators can enjoy. Its AI tools for quick and easy audio editing and polishing, transcribing, editing audio from the transcription, and creating audio speech from text all work quite well, and the machine learning algorithms for the Revoice digital clone of your voice will only get better with time. Its price of $29.99 per month when billed monthly or $23.99 per month when billed annually for Podcastle Pro compares directly to Descript Pro's price, but Podcastle has the extra cool cachet of including Revoice clones, which you can create for as many vocal styles as you can muster. If you're already paying for transcription services, Podcastle Pro's 25 hours of transcription per month could alone be worth the price. That much transcription would cost $375 at $0.25 per minute for Temi.com's AI transcription.
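For readers who want to check that math against their own usage, here is a minimal Python sketch of the comparison. It uses only the prices quoted in this review; the function names are made up for illustration, and it assumes per-minute billing with no minimums or volume discounts.

```python
# Prices quoted in this review (US dollars).
PRO_MONTHLY = 29.99           # Podcastle Pro, billed monthly
PRO_ANNUAL_PER_MONTH = 23.99  # Podcastle Pro, billed annually
TEMI_RATE_PER_MIN = 0.25      # Temi.com per-minute rate cited above
HOURS_INCLUDED = 25           # transcription hours included per month


def transcription_value(hours, rate_per_min):
    """Cost of transcribing `hours` of audio at a per-minute rate."""
    return hours * 60 * rate_per_min


def break_even_minutes(monthly_price, rate_per_min):
    """Minutes of transcription per month at which the subscription
    costs the same as paying per minute (hypothetical helper)."""
    return monthly_price / rate_per_min


if __name__ == "__main__":
    value = transcription_value(HOURS_INCLUDED, TEMI_RATE_PER_MIN)
    print(f"25 hours at Temi's rate: ${value:.2f}")
    for label, price in [("annual plan", PRO_ANNUAL_PER_MONTH),
                         ("monthly plan", PRO_MONTHLY)]:
        minutes = break_even_minutes(price, TEMI_RATE_PER_MIN)
        print(f"Break-even on the {label}: about {minutes:.0f} minutes a month")
```

By this rough measure, anyone transcribing more than about an hour and a half to two hours of audio a month would come out ahead on transcription alone.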
Podcastle will need much better video editing to challenge Descript, but it's working on that. Some incremental improvements were made within the month or so since I first started testing it. It could also use some publishing features for sending exported projects directly to YouTube and social media platforms. But I really enjoyed using Podcastle Pro, and it helped inspire me to start and finish some projects I'd been putting off. That's what a good productivity app should do.