How to transcribe audio to text: 8 methods, most private first
Most transcription tools make you upload your recording to a stranger's server first. Here are eight ways to turn audio into text, ranked from most private to least, starting with the one that never sends a byte off your device.

Preface
You have a recording. An interview, a lecture, a client call, a voice memo you left yourself on the walk home. Now you need it as text you can search, quote, edit, or paste into something else.
The internet has a hundred tools that will do this in two minutes. Almost all of them share one quiet step: first, upload your audio to our servers. That step is fine for a podcast you are about to publish anyway. It is a problem for a therapy session, a deposition, a patient note, a confidential product meeting, or any recording you would not email to a stranger.
So this guide ranks the methods by the one thing most guides skip: whether your audio leaves your device at all. The most private option comes first. The cloud services come last, recommended for the specific cases where they honestly win.
The one decision that changes everything
Before you pick a tool, answer a single question: does your audio leave your device?
Every transcription method falls into one of two camps. On-device tools do the work on the computer or phone in front of you. The audio is read off your disk, turned into text by a model that already lives on the machine, and that is the end of it. Cloud tools do the opposite. They send your recording over the internet to a company's servers, run the transcription there, and send the text back.
The cloud camp is genuinely good at convenience. Upload, wait, download. It is also where every privacy problem lives. Your audio now sits on someone else's hard drive. It may be retained for "quality and training." It is reachable by that company's staff, its sub-processors, and anyone who breaches them. You cannot un-share it once it is gone.
Here is the part that surprises people. You do not have to trade privacy for capability anymore. The machine on your desk is fast enough to transcribe an hour of audio in minutes, fully offline, with accuracy that holds its own against the cloud. The hardware you already paid for can do the job.
Privacy by architecture beats privacy by policy. If the audio never leaves the device, no retention setting, breach, or change of terms can expose what was never uploaded.
The principle behind Yaps

That is the lens for everything below. The list runs from "nothing leaves your machine" to "everything goes to the cloud." Start at the top, and only move down when a feature you actually need is missing.
The 8 best ways to transcribe audio to text (most private first)
1. Yaps: transcribe a recording offline, export text or subtitles
Yaps is the pick when the recording is yours to protect. It imports an audio file and transcribes it on your Mac, using a speech model that already lives on the machine. The audio is never uploaded. There is no account to sign into for the transcription itself, no server in the loop, and no way for the file to end up in someone's training set, because it never travels.
This is the part most on-device tools get half right. Yaps does the full job. It reads your file, produces a clean transcript with word-level timing, lets you read and search it, and then exports it in the format you actually need.
What makes it the most complete on-device option, point by point:
It exports real subtitles. This is the feature the built-in Apple tools do not have. Yaps keeps word-level and sentence-level timing, so it can write a proper .srt subtitle file with timecodes, not just a wall of text. If you caption videos, that single capability skips a whole separate tool. More on this in the subtitles section below.
It learns your words. Names, jargon, product names, and acronyms are where generic transcription falls apart. Yaps applies your own custom vocabulary to the transcript, so "Awoyemi" and "Kubernetes" and "Ozempic" come out spelled the way you mean them, every time. You teach it once.
It keeps a local library. Every transcript you make stays in Yaps on your machine, searchable, until you delete it. Nothing syncs to a cloud you do not control.
It works on older Macs than Apple's own feature. Apple's built-in audio transcription needs a recent Apple Silicon Mac and the latest macOS. Yaps runs on macOS 13 and later, which covers a much wider range of machines still in daily use.
Here is the honest scope. The audio-file transcription described here is a Mac feature, inside the Yaps Studio. On Android, Yaps is a live voice-typing keyboard, not a file importer, so if your goal is turning an existing recording into text, reach for the Mac. Yaps is also focused on English. If your audio is in another language, skip to the cloud options further down, where multilingual support is the whole point.
How to transcribe an audio file in Yaps
The whole flow takes about five minutes for a typical recording, most of which is the transcription itself running.
Install the speech model once (one-time)
In Settings and then Features, install the on-device model. After this single download, every transcription runs with the internet switched off.
Open Studio and choose your file (10 sec)
Pick your recording from the file dialog. MP3, M4A, WAV, AAC, OGG, and FLAC all work.
Let Yaps transcribe on your Mac (offline)
Yaps turns the audio into a timed transcript on your own machine. Progress shows inline. The file never leaves your disk.
Read, fix, and export text or subtitles (.txt / .srt)
Click any word to hear it, correct by ear, then export plain text or an SRT subtitle file with timecodes.
Yaps is the same app you use for live voice typing, so the recording-to-text workflow sits next to the speak-to-type workflow. Capture a thought by voice on the move, transcribe a recorded interview at your desk, and clean both into your notes, all without an account standing between you and your own words. If you want the broader picture, the offline dictation use case and the Mac dictation guide cover the live side.
2. MacWhisper and Buzz: open models in a simple wrapper
If you want offline transcription and Yaps is not on your platform, the next stop is a desktop app built on the open Whisper speech model. On Mac that is MacWhisper. On Windows it is Buzz. Both take an audio or video file, run an open model locally, and give you a transcript without uploading anything.
These tools are private and capable, and they are a reasonable choice. The trade is that each one does the single job of producing a transcript and little else. There is no live voice typing, no shared custom vocabulary across your daily writing, and no broader notes workflow. You transcribe, you export, you move on. For a one-off file on a Windows machine, Buzz is the strongest free, private option. We compare the Mac side in detail in our MacWhisper alternative writeup.
3. Apple Voice Memos and Notes: free, built in, and limited
Recent Apple devices can transcribe audio without any extra app. On an iPhone, open Voice Memos, tap a recording, and tap the transcript button. On a Mac, drag an M4A file into Voice Memos or Notes and the transcript appears. It is free, it is mostly on-device, and for a quick personal recording it is genuinely convenient.
Then you hit the walls. Built-in transcription is plain text only, so there is no subtitle export and no timecodes for video. It covers English and a short list of other languages, not the long tail. It needs a recent Apple Silicon Mac or a newer iPhone and the latest operating system, which leaves a lot of working machines out. And there is no custom vocabulary, so specialist terms come back misspelled with no way to teach the tool.
It is the right answer for "what did I mumble into my phone this morning." It is the wrong answer for captioning a video or transcribing forty interviews with consistent terminology. We break down where the system tool stops and a dedicated app starts in our Apple Dictation comparison.
Built-in Apple transcription
Fine for a quick memo
Plain text only, no subtitle export, English plus a handful of languages, newest hardware required, and no way to teach it your vocabulary. Good enough for a note to self, frustrating for real work.
Yaps on the same Mac
Built for the real job
Exports SRT subtitles and plain text, applies your custom vocabulary, keeps a searchable local library, and runs on macOS 13 and later. Still fully offline, still nothing uploaded.
4. OpenAI Whisper: free, private, and do it yourself
Whisper is the open speech model that quietly powers a large share of the tools on this page, including MacWhisper and Buzz. You can also run it yourself, directly, for free. Install it, point it at an audio file from the command line, and it writes out a transcript and even subtitle files, entirely on your own machine.
For a developer or a tinkerer, this is the purest version of private transcription. Nothing is uploaded, nothing is paid, and you control every setting. The catch is the setup. You are installing a runtime, managing model downloads, and living on the command line. There is no friendly window, no click-a-word-to-hear-it, and no vocabulary memory across your other writing. If you want the full walkthrough, we wrote one: running Whisper locally on a Mac. For most people, a tool that wraps the same idea in a real interface is worth more than the zero-dollar price of doing it by hand.
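As a rough illustration of the command-line route, here is a minimal sketch that assembles a call to the open-source `whisper` command. It assumes the `openai-whisper` package and ffmpeg are already installed. The file name `interview.m4a`, the `base` model size, and the `transcripts` output folder are placeholders chosen for the example, not recommendations.

```python
import shlex
from pathlib import Path


def whisper_command(audio_path, model="base", language="en", output_dir="transcripts"):
    """Build the argument list for the open-source `whisper` command-line tool."""
    return [
        "whisper",
        str(Path(audio_path)),                   # the recording to transcribe
        "--model", model,                        # model size; larger is slower but usually more accurate
        "--language", language,                  # skip language auto-detection
        "--output_format", "srt",                # write a subtitle file with timecodes
        "--output_dir", str(Path(output_dir)),   # where the output file is written
    ]


cmd = whisper_command("interview.m4a")
print(shlex.join(cmd))  # the command as you would type it in a terminal

# To actually run it (hypothetical file; requires whisper and ffmpeg on your PATH):
#   import subprocess
#   subprocess.run(cmd, check=True)
```

The sketch only prints the command, so nothing runs until you uncomment the last lines. Everything it would do happens on your own machine.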
5. Microsoft 365 Transcribe: good if you live in Word
If your day already runs through Microsoft 365, Word for the web has a Transcribe feature that takes an uploaded audio file and drops a speaker-separated transcript straight into your document. Go to Home, then Dictate, then Transcribe, and upload your file. It handles common formats and is a tidy fit when the transcript is headed into a Word document anyway.
This is a cloud tool. Your audio uploads to Microsoft's servers for processing, which is the line in the sand this guide is drawn around. There are also monthly limits on upload transcription depending on your plan. For internal, non-sensitive recordings inside an organization that already trusts Microsoft with its data, it is convenient. For anything confidential, the on-device options above keep the file in your hands.
6. Google Docs Voice Typing: free, but live only
Google Docs has a free Voice Typing tool under the Tools menu in Chrome. It is genuinely useful, with one catch that trips people up: it only listens to live speech through your microphone. It cannot import a recorded file. People try to transcribe a recording by playing it aloud into the microphone, and the result is noticeably worse, because the audio is now traveling through the air and your room instead of straight from the file.
Use Google Docs Voice Typing for dictating fresh text into a document. Do not reach for it to turn an existing recording into text. For that, any of the file-based methods on this list will give you a cleaner result. And like Word, it is a cloud tool, so your speech is processed on Google's servers.
7. Otter.ai: the right call for meetings and speaker labels
Some jobs genuinely need the cloud, and meetings are the clearest example. When you need to know who said what, you need speaker separation, and that is where a meeting-focused service like Otter earns its place. Otter records or imports a conversation, labels each speaker, and produces a summary with action items. For multi-person calls where the speaker labels are the point, it is the strongest pick on this list.
Be clear-eyed about the trade. Otter is cloud-based, so your conversation is uploaded and stored, and its data practices have drawn real scrutiny, which we covered in "Is your transcription service safe?". Yaps does not do speaker labeling today, so for a four-person board call where attribution matters, Otter is the honest recommendation. For a one-voice recording you want to keep private, it is the wrong tool, and the top of this list is the right one. If you want the broader alternative landscape, see our Otter alternative guide.
8. Rev, Descript, and Happy Scribe: cloud power for the biggest jobs
At the far end sit the full cloud services. Rev offers both AI and human transcription, with humans available when you need certified, near-perfect accuracy on a legal or medical record. Descript turns the transcript into an editing surface for podcasts and video. Happy Scribe, Trint, and Sonix handle 100-plus languages and polished subtitle workflows. These are powerful, mature products.
They are also the least private option here, by design. Everything uploads. You are choosing them precisely when a capability outweighs keeping the file local: a language Yaps does not cover, a guaranteed human accuracy bar, a heavy editing workflow, or a volume of work that wants a team dashboard. For those cases, they are worth it. For the confidential single recording that started this article, they are exactly what you were trying to avoid.
A side-by-side comparison
Five representative tools, scored on what matters when you are turning a recording into text. The full list above has more options, but these cover the shape of the choice.
| What matters | Yaps | Apple Voice Memos | MacWhisper | Otter.ai | Rev |
|---|---|---|---|---|---|
| Best for | Private files + subtitles | Quick personal memos | One-off offline files | Meetings, speaker labels | Human accuracy, editing |
| Runs on-device (private) | Yes | Mostly | Yes | No (cloud) | No (cloud) |
| Works fully offline | Yes | Partial | Yes | No | No |
| Imports a recorded file | Yes | Yes | Yes | Yes | Yes |
| Exports SRT subtitles | Yes | No | Yes | Yes | Yes |
| Custom vocabulary | Yes | No | Yes | Yes | Yes |
| Speaker labels | Not yet | No | Partial | Yes | Yes |
| Languages | English | ~10 | Many | Many | 100+ |
The pattern is the point. Yaps wins on privacy and on the practical export features for English work. The cloud tools win when you need many languages, certified human accuracy, or speaker labels. The built-in Apple tool is the convenient floor, not the ceiling.
Special cases worth solving
Most "how to transcribe audio" questions are really one of a handful of specific jobs. Here is the quick guidance for each.
How to transcribe a voice memo
A voice memo is just an audio file, usually an M4A. On a recent iPhone or Mac you can read the transcript inside the Voice Memos app itself, which is the fastest route for a throwaway note. When the memo matters, when you want it spelled correctly, searchable, and exportable, move the file into a real tool. On a Mac, open it in Yaps and you get a clean transcript, your vocabulary applied, and a proper export. The phone is great for capture. The Mac is where the memo becomes usable text.
How to make subtitles (SRT) from audio

This is where the built-in tools quietly fail you. Apple's Voice Memos and Notes produce plain text with no timecodes, so they cannot make subtitles at all. To caption a video you need an .srt file, which pairs each line of text with a start and end time.
The slow way is to write that file by hand in a text editor, typing timecodes line by line. Do not do this. The fast and private way is to transcribe the audio in a tool that keeps timing and exports SRT directly. Yaps does this offline: transcribe the file, then export as SRT, and drop the resulting file into your video editor. Most cloud subtitle generators do it too, with the usual upload. Either way, let the tool write the timecodes.
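To make the format concrete, here is a minimal sketch of what a tool does when it writes the timecodes for you. It assumes you already have timed segments as `(start_seconds, end_seconds, text)` tuples. The function names and the sample lines are illustrative, not taken from any particular app.

```python
def srt_timestamp(seconds):
    """Format a time in seconds as an SRT timecode: HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments):
    """Render (start, end, text) tuples as the contents of an .srt file."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        # Each cue is: a running number, a timecode line, then the caption text.
        blocks.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text.strip()}\n"
        )
    # Cues are separated by a blank line.
    return "\n".join(blocks)


segments = [
    (0.0, 2.5, "Thanks for joining."),
    (2.5, 6.04, "Let's start with the budget."),
]
srt_text = to_srt(segments)
print(srt_text)
```

Saving `srt_text` to a file ending in `.srt` gives a subtitle file most video editors can import.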
How to transcribe a YouTube video or any video file
Two routes. If it is your own video file, extract the audio first (QuickTime Player can export Audio Only as an M4A) and transcribe the M4A. If it is a YouTube video, the platform auto-generates captions you can often open from the transcript panel, though their accuracy varies and they are not always available. For a clean, correctable transcript of a video you own, extracting the audio and running it through an on-device tool gives you the best result and keeps the file private.
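For a video file you own, the audio-extraction step can also be scripted. The sketch below builds an ffmpeg command that drops the video stream and writes the audio as an M4A. ffmpeg is an assumption here: this guide does not otherwise rely on it, and it must be installed separately. The file name `talk.mp4` is a placeholder.

```python
from pathlib import Path


def extract_audio_command(video_path):
    """Build an ffmpeg argument list that saves a video's audio track as .m4a."""
    video = Path(video_path)
    audio = video.with_suffix(".m4a")  # same name, audio extension
    return [
        "ffmpeg",
        "-i", str(video),   # input video file
        "-vn",              # leave out the video stream
        "-c:a", "aac",      # encode the audio as AAC, the usual codec inside M4A
        str(audio),         # output audio file
    ]


cmd = extract_audio_command("talk.mp4")
print(" ".join(cmd))

# To actually run it (hypothetical file; requires ffmpeg on your PATH):
#   import subprocess
#   subprocess.run(cmd, check=True)
```

The extraction happens locally, so the video never has to leave your machine before you transcribe it.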
How to transcribe an interview, lecture, or podcast
These are the long-form jobs where accuracy and terminology matter most. Record the cleanest audio you can, transcribe the file, then correct the handful of mistakes by ear. The custom vocabulary feature pays for itself here, because a research interview or a technical podcast is full of names and terms a generic model will guess wrong. If you do this work professionally, two deeper guides apply: offline transcription for qualitative research for interviews under an ethics board, and secure transcription software for Mac for legal, medical, and corporate recordings. Students and journalists can start with Yaps for students and Yaps for journalists.
How to transcribe audio in another language
Be honest with yourself about the language first. Yaps and Apple's built-in tools are English-first. If your audio is in Spanish, Mandarin, Arabic, or any of dozens of others, a cloud service built for many languages, like Happy Scribe or Rev, will serve you far better than forcing an English-first tool to guess. This is a clear case where the cloud is the right answer, and the privacy trade is one you make knowingly.
How to get an accurate transcript, whatever tool you use
The tool matters less than the input. A clean recording transcribed by an average model beats a noisy recording transcribed by the best model on the market. Three habits do most of the work.
Record close and quiet. Get the microphone near the speaker and kill the background noise you can. A phone on the table beside someone beats a laptop across the room. Hard surfaces echo, so a room with soft furnishings transcribes better than a bathroom.
Teach the tool your words. If your tool supports custom vocabulary, as Yaps does, load it with the names, acronyms, and jargon specific to your work before you transcribe. This is the single highest-leverage fix for the errors that annoy you most.
Correct by ear, not by eye. When you review, play the audio and read along. A tool that lets you click a word to jump to that moment in the recording, the way Yaps does, turns proofreading from a chore into a two-minute pass. Our full checklist lives in dictation accuracy tips, and the broader tool roundup is the best dictation apps for Mac.
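If your tool has no vocabulary feature, the "teach the tool your words" habit can be roughly approximated with a find-and-replace pass over the finished transcript. This is a minimal sketch of that idea only. It is not how Yaps or any other app implements custom vocabulary, and the misheard phrases in the example are invented for illustration.

```python
import re


def apply_vocabulary(text, vocabulary):
    """Replace known misrecognitions with the preferred spelling (whole phrases only)."""
    for wrong, right in vocabulary.items():
        # \b keeps the match on word boundaries; IGNORECASE catches "Cooper Netties" too.
        pattern = re.compile(r"\b" + re.escape(wrong) + r"\b", flags=re.IGNORECASE)
        text = pattern.sub(lambda match, right=right: right, text)
    return text


# Hypothetical mapping: what the model tends to write -> what you meant.
vocabulary = {
    "cooper netties": "Kubernetes",
    "a way emmy": "Awoyemi",
}

raw = "Dr. a way emmy moved the service to cooper netties last week."
fixed = apply_vocabulary(raw, vocabulary)
print(fixed)
```

A pass like this only fixes errors you have already seen. A vocabulary the tool applies during transcription can also help with terms it has never misheard before.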
Final thoughts
Start with the most private tool that does the job, and only move toward the cloud when a specific feature forces you to.
For turning a recording into text on a Mac, that default is Yaps. It transcribes your file on your own machine, never uploads a byte, applies your vocabulary, and exports the plain text or the subtitle file you actually need. It is the rare on-device tool that does the whole job rather than half of it. Reach for the cloud knowingly, in the two honest cases where it wins: Otter when you need speaker labels on a multi-person meeting, and a service like Rev or Happy Scribe when you need a language Yaps does not cover or a certified human accuracy bar.
The recording you have right now is probably too sensitive to hand to a stranger, if you stop to think about it. The good news is that you no longer have to. The machine in front of you can do the work, quietly, with the file staying exactly where it is. Download Yaps and transcribe your first recording without sending it anywhere.
Frequently asked questions
How do I transcribe audio to text for free?
The most capable free routes are on-device. On a recent Apple device, Voice Memos and Notes transcribe recordings at no cost, though only to plain text. On any Mac, running the open Whisper model yourself is free if you are comfortable with the command line. Yaps offers free on-device transcription of your recordings on Mac as part of the app. Free cloud tools exist too, but most retain your uploads, so read the terms before sending anything sensitive.
How can I transcribe audio without uploading it to the cloud?
Use an on-device tool, where the transcription happens on your own machine and the audio never travels. On Mac, Yaps imports your file and transcribes it locally, exporting text or subtitles with nothing sent to a server. MacWhisper, Buzz on Windows, and running Whisper directly are other offline options. Apple's built-in Voice Memos transcription is also largely on-device. Any tool that asks you to upload a file is, by definition, not one of these.
What is the best way to transcribe audio on a Mac?
For a private, correctable transcript with subtitle export, Yaps is the strongest pick, because it runs offline, applies your custom vocabulary, and writes proper SRT files, which the built-in tools cannot. For a quick throwaway memo, Apple's Voice Memos is the fastest. For a recording in another language or one needing certified human accuracy, a cloud service is the better fit despite the upload.
Can I transcribe an audio file on my iPhone?
Yes. On an iPhone 12 or newer running a recent iOS, open Voice Memos, tap the recording, and tap the transcript button to read it on-device. For a recording you want to keep, spell correctly, and export properly, move the file to a Mac and transcribe it in a dedicated tool. The phone is ideal for capture, less so for turning a long recording into polished, exportable text.
How do I make an SRT subtitle file from audio?
Use a transcription tool that keeps timing and exports SRT directly, rather than writing timecodes by hand. Yaps does this offline: transcribe the audio file, then export the result as an SRT, which you drop into your video editor. The built-in Apple tools cannot do this, since they produce plain text with no timecodes. Cloud subtitle generators can, with the usual upload of your media.
Why can the built-in Mac transcription not create subtitles?
Apple's Voice Memos and Notes transcription produces plain text only. It does not retain the per-word timing that a subtitle file needs to sync each line to the right moment in the video. To get an SRT with timecodes you need a tool that keeps word-level timing, such as Yaps offline or a cloud subtitle service. This is one of the clearest gaps between the convenient built-in feature and a dedicated transcription app.
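To see why per-word timing matters, here is a minimal sketch of how word timings could be grouped into subtitle lines. It assumes words arrive as `(start_seconds, end_seconds, word)` tuples. The 42-character line limit and the 0.8-second pause threshold are illustrative defaults, not values taken from any specific tool.

```python
def group_words(words, max_chars=42, max_gap=0.8):
    """Group (start, end, word) tuples into (start, end, text) subtitle cues."""
    cues = []
    current = []

    def flush():
        # A cue starts with its first word and ends with its last word.
        text = " ".join(word for _, _, word in current)
        cues.append((current[0][0], current[-1][1], text))

    for start, end, word in words:
        if current:
            length_if_added = len(" ".join(w for _, _, w in current)) + 1 + len(word)
            pause = start - current[-1][1]
            # Start a new cue when the line would get too long or the speaker paused.
            if length_if_added > max_chars or pause > max_gap:
                flush()
                current = []
        current.append((start, end, word))

    if current:
        flush()
    return cues


words = [
    (0.0, 0.3, "Thanks"), (0.35, 0.6, "for"), (0.65, 1.1, "joining."),
    (2.4, 2.7, "Let's"), (2.75, 3.1, "start."),
]
cues = group_words(words)
print(cues)
```

Without the start and end time of each word there is nothing to group, which is why a plain-text transcript cannot be turned into a synced subtitle file after the fact.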
Is on-device transcription as accurate as cloud transcription?
For clear English audio, on-device transcription is now competitive with the cloud. The open speech models that run locally have closed most of the gap, and the quality of your recording matters more than where the model runs. The cloud still leads in two areas: a wide range of languages, and speaker separation for multi-person recordings. For a single clear English voice, you give up very little by staying on-device, and you gain complete privacy.
What audio formats can Yaps transcribe?
Yaps imports the common audio formats: MP3, M4A, WAV, AAC, OGG, and FLAC. If you have a video file, export its audio to one of these first, for example by using QuickTime Player to export Audio Only as an M4A. If you have an unusual or compressed format, convert it to WAV or M4A before importing.
Does Yaps work offline for transcription?
Yes. After a one-time download of the on-device speech model from Settings and then Features, all transcription runs on your machine with no internet connection required. You can transcribe a recording on a plane with the audio never leaving your laptop. This is the core of the design: the file stays on your disk from import to export.
Can Yaps label who said what in a recording?
Not today. Speaker labeling, also called diarization, is not a feature Yaps offers yet, so for a multi-person meeting where attribution is the point, a meeting-focused cloud tool like Otter is the better choice. For single-speaker recordings, interviews where you already know the two voices, or any audio you need to keep private, Yaps is the stronger pick.
How long can a recording be when I transcribe it?
On-device tools transcribe in proportion to the recording length and your hardware, so a longer file simply takes longer, and there is no upload cap to worry about. A modern Mac transcribes an hour of audio in a handful of minutes. Cloud services often impose monthly limits or per-file caps depending on your plan, which is one more reason the local route suits long interviews and lectures.
Should I use Otter, Rev, or Yaps?
Use Yaps when the recording is yours to protect and you want it to stay on your Mac, especially for English audio you plan to caption or quote. Use Otter when you need speaker labels and summaries for a multi-person meeting. Use Rev when you need a language Yaps does not cover, certified human accuracy for a legal or medical record, or a heavy editing workflow. The privacy cost of the last two is the upload, which is worth it only when their specific strength is what you need.
Is it legal to transcribe a recording of a conversation?
Transcribing is just converting audio you already have into text, so the legal question is about the recording itself, not the transcription. Recording laws vary by region, and some places require all parties to consent. If you lawfully hold the audio, transcribing it is generally fine. Transcribing it on your own device, where the audio never reaches a third party, also keeps you clear of the data-sharing issues that cloud uploads can raise for confidential material.
What is the difference between dictation and transcription?
Dictation turns your live speech into text as you talk, which is what you do when you voice-type a message or a note. Transcription turns an existing recording into text after the fact. Yaps does both: live voice typing across your apps, and offline transcription of recorded files in the Studio. Many tools do only one, so it is worth checking which job a given app is built for before you rely on it.