Which is the best AI for Automated Transcription?

David French • Jan 29, 2023

Now Updated!

Additional transcription engines added after requests.

Introduction

It's often useful or plain good practice to include closed captions or subtitles with video content. It makes your content accessible not only to people who may have hearing difficulties, but also to those who are viewing without audio enabled - in public, on the train, in an open-plan office.


Tools for transcribing audio yourself, and services where humans manually transcribe it for you, have been around for years. However, Machine Learning or AI, whichever you choose to call it, is reaching the stage where it may be a viable alternative to manual human transcription.


By the way, I'm not going to get into an argument about the terminology. Whether you want to call it AI, Machine Learning, cloud services or whatever, we're talking about a computer that can interpret speech into text. Realistically we're not talking about true artificial intelligence here - the service won't be mulling over the deep meaning of your content. Calling it AI is as much about branding as anything else.


I had several hours' worth of video content from a seminar that needed transcribing with a fast turnaround, so I wanted to know which of the many services out there would give the most accurate results.


Specifically, rather than feed the AI beautifully enunciated clear speech, I wanted to see how it fared with real-world, noisy, flawed audio, which is exactly when you'd normally reach for the human transcribers.

Benefits of AI for Automated Transcription

AI or Machine Learning-based transcription tools have the benefit of being quicker than human-based transcription. If you submit an hour's worth of spoken audio, a human transcription service may take a day or more to turn around the transcription. An AI-based service may be able to transcribe the same content in near-realtime, or even faster than realtime.


AI transcription may also be more cost-effective than human-based transcription. Humans get paid by the hour; the service that employs them may charge you per hour of transcription, or per word. AI can work 24x7 without breaks, unionisation or holidays, and it doesn't have a mortgage to pay, so most AI-based transcription services are cheaper than their human counterparts.


On the other hand, AI is not currently as accurate as a skilled human transcriber, particularly in cases where several voices interact, or there's a lot of background noise, or the vocabulary is technical or specific. AI can't currently summarise or reduce redundancy in content to fit reading speeds, etc.


But given what AI is potentially capable of achieving, I wanted to set up a test to find out the best the AI could do, given a challenging real-life audio recording.

Method

I signed up for free demo accounts on the following services:

  • Simon Says
  • Cockatoo
  • SpeechText.ai
  • trint
  • Otter.ai
  • Sonix
  • Rev
  • Speak AI
  • Beey
  • MacWhisper Pro
  • Descript
  • SpeedScriber
  • I also let Premiere Pro have a crack at it


I tried Lumberjack Builder but found it incomprehensible and gave up. Note that Descript and SpeedScriber both require an app to be installed on your Mac; with the others I used the web interfaces, which are of course cross-platform.


I uploaded a 40-second video clip. I deliberately chose a clip where several voices could be heard (not at the same time), with some speakers on-mic and some picked up by a background mic. Before uploading I manually tweaked the audio to obtain the best quality that I could, given the recording - I normalised the levels, cut out some background noise, removed pops and applied some compression.
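For anyone who would rather script that clean-up than do it by hand, here is a rough sketch of a similar chain using ffmpeg's built-in audio filters. This is not what was used for the test clip, which was tweaked manually. The function name and file names are made up for illustration, the filters are left at their default settings, and the order of the chain is just one reasonable choice; real recordings will almost certainly need tuning.

```python
def build_cleanup_command(src, dst):
    """Build an ffmpeg command that applies a basic audio clean-up chain.

    Hypothetical helper: filter choice and order are assumptions,
    and all filters run with their default parameters.
    """
    filters = ",".join([
        "afftdn",       # FFT-based denoiser for steady background noise
        "adeclick",     # removes impulsive clicks and pops
        "acompressor",  # dynamic range compression
        "loudnorm",     # EBU R128 loudness normalisation
    ])
    # Copy the video stream untouched; only the audio is re-encoded.
    return ["ffmpeg", "-i", src, "-af", filters, "-c:v", "copy", dst]


command = build_cleanup_command("clip.mp4", "clip_clean.mp4")
print(" ".join(command))
```

To actually run it, pass the list to `subprocess.run(command, check=True)` on a machine with ffmpeg installed.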


I wanted to give the AI platforms a good test; the audio I submitted is typical of the type of events I film, where not everybody remembers to speak into the mic all the time, or the audience contributes without waiting for a roving mic, so you rely on on-camera mics or other mic sources to hear some of the speakers.


Some of the AI platforms warn you in advance that if audio isn't clear, single-voice and properly recorded with no background noise, the platform can only do the best it can. Some, such as Rev, then offer to use their human-based transcription services to process the audio instead. The point of my test was to give these platforms a challenging task, so I overruled their objections and continued.


As a reminder, I'm not interested in functionality for these tests - just accuracy. All the platforms tested have a range of features, including outputting timecoded transcriptions; some can also cope with language translation, identify and manage multiple speakers, offer online editing and so on. Prices also vary, but I was only interested in performance.

Results

Original Text - transcribed by me

[camera mic] "Hiya, I'm [name redacted for privacy], and I joined the communications, erm, team here two weeks ago. My role is to help communicate what's happening in this transformation."


[camera mic] "Morning, I'm [unusual unintelligible name] and I'm the Head of Psychological Professions at Community Mental Health, so I'm representing all the psychological professional groups throughout, erm, the community and specialist services. Morning everyone, [unintelligible]."


[camera mic] "Hello, I'm [name redacted for privacy], I know a lot of people in the room and there's some new faces, uh, my role is the psychology lead on the transformation team."


[handheld mic, good quality] "Lovely, thank you very much everybody. And may I just ask the, erm, service leads and managers to stand up."

Each entry below gives the service, its transcript of Speakers 1 to 4 in order, and the number of speakers it correctly distinguished.

Human transcription (me)
Transcript: Hiya, I'm [name], and I joined the communications, erm, team here two weeks ago. My role is to help communicate what's happening in this transformation. Morning, I'm [name] and I'm the Head of Psychological Professions at Community Mental Health, so I'm representing all the psychological professional groups throughout, erm, the community and specialist services. Morning everyone, [unintelligible]. Hello, I'm [name], I know a lot of people in the room and there's some new faces, uh, my role is the psychology lead on the transformation team. Lovely, thank you very much everybody. And may I just ask the, erm, service leads and managers to stand up.

Simon Says
Transcript: There, and I join the communications team here to expand my role to help community people's transformations. This morning I am going to Paso and I'm the head of Psychological Transactions for Community Mental Health. So I'm representing all the psychological professional groups throughout the community and especially since this morning of what I've done. Hello, everyone. I'm [name vaguely correct]. And we can begin with some new faces. My role is to support you lead on the transformation take place. Thank you very much, everybody. And I just us, the service leaders and managers to stand up.
Speakers correctly distinguished: N/A

Cockatoo
Transcript: Hi, I'm the joint communications team here to go. My role is help community people to have new transformations. Morning, I'm [name vaguely correct] and I'm a head of psychological professions for community mental health. I'm representing roles like psychological professional groups throughout the community and specialist agencies. Morning, everyone. Hello everyone, I'm [name nearly correct]. I've got a lot of people reading the summary thesis. My role is to support you lead on the transformation team. Lovely, thank you very much everybody. And just to ask the service leader manager for standup.
Speakers correctly distinguished: N/A

SpeechText.ai
Transcript: Hi a [word instead of name] and I joined the communications and team here two weeks ago. My roles help communicate what's happening transformation morning. I am gonna castle and I'm the head of psychological directions for Community Mental Health. So, I'm representing all psychological Congressional groups throughout the community in specialist Services morning, everyone. [gave up] thank you very much, everybody and may just ask the service needs of managers to
Speakers correctly distinguished: N/A

trint
Transcript: And I joined the communications team here two weeks ago. And my role is to help me people's happiness transformations. Morning. I'm going to ask so and I'm the head of Psychological professions for Queensland Health, so I'm representing all psychological professional groups throughout the community and special services. Morning. Hello, everyone. I'm [name nearly correct]. We can bring some new faces. My role is to support you. Lead on the transformation take. Thank you very much, everybody. And may I just ask, the service leads and manages to stand up.
Speakers correctly distinguished: 4

Otter.ai
Transcript: Hi, I'm [name wrong] and I joined the communications team here two weeks ago. My role is to help communicate what's happening in this transformation. Morning I am [name wrong] and I'm the head of psychological depression. So community mental health. So I'm representing log psychological professional groups throughout meeting specialist services, warning everyone I've ever been I've done robbing people in the room with some new faces. My role is in psychology lead on the transformation team. Thank you very much everybody, and may just ask the service leads and managers to stand up
Speakers correctly distinguished: 3

Sonix
Transcript: Hiya. I'm [correct name], and I joined the communications team here two weeks ago. My role is to help communicate what's happening with this transformation. Good morning. I'm [nearly correct name], and I'm the head of Psychological Directions for Community Mental Health. So I'm representing all psychological professional groups throughout the community and specialist services. Morning, everyone. Hello, everybody. I'm [nearly correct name]. I know a lot of people in the room. There's some new faces. My role is psychology. Lead on the transformation team. Thank you very much, everybody. And may just ask the service leaders and managers to stand up.
Speakers correctly distinguished: 3

Rev
Transcript: You. Hi, uh, I'm, and I joined the communications, um, team here two weeks ago. My role is to help communicate what's happening in this transformation. Good morning. I'm [correct name] and um, I'm the head of psychological professions for community mental health. So I'm representing all psychological professional groups throughout, um, the Community Specialist services. Morning, everyone. We done? Yeah. Hello everyone. I'm [correct name]. I have a lot of people in the room with some new faces. Uh, my role is the psychology lead on the transformation team. Lovely. Thank you very much everybody. And may I just ask the, um, service leads and managers to stand up?
Speakers correctly distinguished: 4

Speak AI
Transcript: Hiya, I'm [nearly correct name] and I joined the communications team here 2 weeks ago. My roles help communicate what's happening in this transformation. Good morning. I'm [nearly correct name], and I'm the head of psychological professions for community mental health. So I'm representing all the psychological professional groups throughout the community and specialist services. Morning, everyone. I'm [nearly correct name] and a lot of people in the room with some new faces. My role is to psychology lead on the transformation team. Speaker 4: Lovely. Thank you very much everybody. And may I just ask the service leads and managers to stand up?
Speakers correctly distinguished: 4

Beey
Transcript: Pay up and I joined communications and team here 2 weeks ago my roles help. Communicates have these transformations. Morning I'm [roughly correct name] and I'm the head of psychological profession so community mental health, so I'm representing all psychological professional groups throughout. The comedian specialist services. Morning. Everyone' Hello everyone I'm going [incorrect name]. Lovely thank you very much everybody, and may just ask the um service, leads and managers to stand.
Speakers correctly distinguished: 2?

MacWhisper Pro
Transcript: Hiya, I'm [correct name] and I joined the communications team here two weeks ago. My role is to help communicate what's happening with these transformations. Morning, I'm [correct name] and I'm the Head of Psychological Professions for Communities Mental Health. So I'm representing all psychological professional groups throughout the community and specialist services. Morning everyone. Hello everyone, I'm [correct name]. I know there are a lot of people in the room with some new faces. My role is the psychology lead on the transformation team. Thank you very much everybody. And may I just ask the service leads and managers to stand up?
Speakers correctly distinguished: N/A

Descript
Transcript: Hi, uh, I'm, and I joined the communications, um, team here two weeks ago. My role is to help communicate what's happening in this transformation. Good morning, I'm [name vaguely correct], and um, I'm the head of psychological professions for community mental health. So I'm representing all psychological professional group. Throughout, um, the Community Specialist Services. Morning, everyone. We done? Yeah. I'm [name correct]. I have a lot of people in the room with some new faces. Uh, my role is the psychology lead on the transformation team. Thank you very much everybody. And may I just ask the, um, service leads and managers to stand.
Speakers correctly distinguished: 1

SpeedScriber
Transcript: [Not transcribed - trial version won’t transcribe whole 38s clip] This morning, I’m going to Purcell and I’m the head of psychological operations for Queensland Health. So I’m representing all the psychological professional groups throughout the community in specialist services. What you. Done? Hello, everybody. I’m [incorrect name]. We can begin with some. [Not transcribed - trial version won’t transcribe whole 38s clip]
Speakers correctly distinguished: 1

Adobe Premiere Pro
Transcript: Yeah. And I joined the communications team here two weeks ago. And my roles help me with people's happiness transformations. Morning. I'm going to ask. So. And I'm the head of Psychological professions for Queensland Health, so I'm representing all the psychological professional groups throughout the community and special services. Morning. Right. Hello, everyone. I'm [name vaguely correct] and we can bring some new faces. My role psychologist on the transformation take place. Thank you very much, everybody. And just to us, the service leaders and managers to stand up.
Speakers correctly distinguished: 3
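The comparison above was judged by eye. If you want to put a number on it, the usual measure is word error rate (WER): the number of word substitutions, insertions and deletions needed to turn the machine transcript into the reference, divided by the number of words in the reference. What follows is a minimal sketch rather than the method used for this test; the function names are my own, and the tokenisation (lowercase, punctuation stripped) is a simplifying assumption. The example sentences are taken from the transcripts above.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into words, dropping punctuation."""
    text = text.lower().replace("’", "'")
    return re.findall(r"[a-z0-9']+", text)


def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the reference word count."""
    ref = tokenize(reference)
    hyp = tokenize(hypothesis)
    if not ref:
        raise ValueError("reference transcript contains no words")

    # Classic dynamic-programming edit distance, one row at a time.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)


reference = "My role is to help communicate what's happening in this transformation."
macwhisper = "My role is to help communicate what's happening with these transformations."

# 3 substituted words out of 11 reference words
print(f"{word_error_rate(reference, macwhisper):.2f}")  # 0.27
```

A lower score is better, and 0 means a word-for-word match. Bear in mind that WER treats every word equally, so a mangled name counts the same as a dropped "erm".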

Conclusion

I gave these AIs a deliberately tough challenge, and most of them couldn't cope, coming up with completely unusable transcripts. I'm not particularly blaming the AIs here - I deliberately chose speech I thought they'd struggle with, to see if any could step up to the mark.


And a handful did manage pretty well. In particular I found Sonix, Rev, Speak AI, MacWhisper Pro and Descript all came up with fairly usable transcriptions. Each has its own strengths and weaknesses and I was in the process of submitting more audio for them to transcribe before I made a purchasing decision.


But then I slung another 30 minutes of difficult, technical, jargon-riddled, acronym-rich, accented, off-mic voice into MacWhisper Pro and I was blown away. It uses the OpenAI Whisper engine to do the transcription on your local machine. You just drag and drop a video into its window and it does its thing, outputting a text file or timecoded SRT. It was phenomenally accurate. Not quite 100%, but the amount of manual editing I needed to do was absolutely minimal. It coped with ridiculously complicated acronyms, slang, laughter, and somehow it even knew when to capitalise words or phrases.
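If you're curious what a timecoded SRT actually looks like, it's just numbered blocks of text, each with a start and end time in `HH:MM:SS,mmm` format. The sketch below turns a list of timed segments into that format. It says nothing about how MacWhisper Pro works internally; the helper names are my own, and I'm assuming each segment is a dictionary with `start` and `end` times in seconds plus a `text` field, which is the shape the open-source `whisper` Python package returns from `transcribe()`. The timings in the example are invented.

```python
def srt_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, remainder = divmod(total_ms, 3_600_000)
    minutes, remainder = divmod(remainder, 60_000)
    secs, millis = divmod(remainder, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments):
    """Convert timed segments into the text of an SRT subtitle file."""
    blocks = []
    for index, segment in enumerate(segments, start=1):
        start = srt_timestamp(segment["start"])
        end = srt_timestamp(segment["end"])
        text = segment["text"].strip()
        blocks.append(f"{index}\n{start} --> {end}\n{text}\n")
    # SRT blocks are separated by a blank line.
    return "\n".join(blocks)


# Hypothetical segments: the timings are made up for illustration.
example_segments = [
    {"start": 0.0, "end": 4.2, "text": " Lovely, thank you very much everybody."},
    {"start": 4.2, "end": 9.0, "text": " And may I just ask the service leads and managers to stand up?"},
]
print(segments_to_srt(example_segments))
```

Save the result with a `.srt` extension and most editing packages and video players will read it as a caption track.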


It also transcribes around 3x faster than realtime on my Mac Studio. It doesn't try to categorise different voices, but for simple closed captions it's astonishingly good. It means there's no reason not to routinely provide closed captions to every client for every project. And the pricing model means it'll pay for itself in minutes, and keep earning for years.


At the risk of offending literally everybody who has their own pet transcription engine whose honour they defend to the death, I will however also give a special mention to Rev, whose web interface makes it very easy to pay more and go for human transcription when things get too much for the AI.


So AI might have a little way to go before it entirely replaces human transcription - but if you want a quick turnaround or have price-sensitive projects, go and download MacWhisper Pro right now for the ridiculously low price of €9. You might be pleasantly surprised.
