Auto Subtitles — Private, No Upload

About Auto Subtitles

Auto Subtitles keeps the complete file workflow inside this browser tab. Your selected file, its name, contents, settings, and generated output are not sent to AntiUpload or a cloud conversion service. Ads and aggregate traffic analytics stay outside the processing boundary and receive no file object or working buffer.

AntiUpload's Auto Subtitles extracts a 16 kHz audio track with FFmpeg and runs Whisper locally in a Web Worker. Your media stays on your device; only the selected model may need to be downloaded before transcription begins.
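
As a rough illustration of the extraction step, the sketch below builds an FFmpeg argument list that drops video, downmixes to mono, and resamples to 16 kHz. The helper name, the file names, and the raw 32-bit float output format are assumptions made for this example; the exact command the tool runs is not documented here.

```typescript
// Hypothetical helper: builds FFmpeg arguments for a 16 kHz mono speech track.
// The output container (raw f32le PCM) is an assumption, not the tool's confirmed choice.
function buildExtractArgs(inputName: string, outputName: string): string[] {
  return [
    "-i", inputName, // source audio or video file
    "-vn",           // ignore any video stream
    "-ac", "1",      // downmix to a single (mono) channel
    "-ar", "16000",  // resample to 16 kHz
    "-f", "f32le",   // raw 32-bit float PCM, little-endian
    outputName,
  ];
}

console.log(buildExtractArgs("input.mp4", "speech.pcm").join(" "));
// -i input.mp4 -vn -ac 1 -ar 16000 -f f32le speech.pcm
```

An argument array like this is the shape most FFmpeg wrappers accept, which keeps the flags easy to inspect before anything runs.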

Choose Tiny for the lighter workflow or Base when difficult speech may benefit from a larger model. Use the English-only path, auto-detect a multilingual recording, provide a language hint, or intentionally translate supported non-English speech into English.

The tool exports SRT, VTT, TXT, timestamped JSON, or ASS with word-level karaoke timing. Voice Activity Detection can skip detected long silences, and a -30 to +30 second offset corrects a caption track that consistently appears early or late.

How it works

Choose an audio or video file

The file is read locally and FFmpeg extracts a mono speech track for Whisper. No media upload is required.

Choose a workflow, model, language, and task

Start from a quick workflow or set Tiny versus Base, English-only versus multilingual, and source-language transcription versus English translation.

Set output and timing behavior

Pick SRT, VTT, TXT, JSON, or word-timed ASS. Keep silence skipping on for long pauses, disable it for unusually quiet voices, and correct consistent timing drift with the offset control.

Review the local processing plan

Confirm the input duration, selected model, task, output, silence behavior, and timing shift. If the model is not cached, the browser downloads it before local processing.

Generate and download

Whisper transcribes in a worker, detected-silence timestamps are mapped back to the original media, and the selected caption or transcript file downloads when ready.

When to use Auto Subtitles

Caption a podcast or interview

Create an SRT or VTT track while keeping unreleased or sensitive audio on the device.

Create animated word-highlight captions

Export word-timed ASS, then combine it with Subtitle Burn-In when captions must be permanently visible in the video.
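
For readers curious what word-timed ASS looks like, here is a minimal sketch that turns a list of timed words into one Dialogue event with \k karaoke tags, whose durations are given in centiseconds. The Word type, the function names, and the "Default" style name are assumptions for illustration; the tool's actual ASS output may be structured differently.

```typescript
interface Word { text: string; start: number; end: number; } // times in seconds

// ASS timestamps use H:MM:SS.cc, where cc is centiseconds.
function assTime(seconds: number): string {
  const totalCs = Math.round(seconds * 100);
  const cs = totalCs % 100;
  const s = Math.floor(totalCs / 100) % 60;
  const m = Math.floor(totalCs / 6000) % 60;
  const h = Math.floor(totalCs / 360000);
  const two = (n: number): string => (n < 10 ? "0" + n : String(n));
  return `${h}:${two(m)}:${two(s)}.${two(cs)}`;
}

// Builds one Dialogue event. Each word's \k duration runs until the next word
// starts, so pauses between words do not make the highlight drift.
function karaokeLine(words: Word[], style: string = "Default"): string {
  if (words.length === 0) return "";
  const lineStart = Math.min(...words.map((w) => w.start));
  const lineEnd = Math.max(...words.map((w) => w.end));
  const text = words
    .map((w, i) => {
      const next = words[i + 1];
      const until = next ? next.start : w.end;
      const cs = Math.round((until - w.start) * 100);
      return `{\\k${cs}}${w.text}`;
    })
    .join(" ");
  return `Dialogue: 0,${assTime(lineStart)},${assTime(lineEnd)},${style},,0,0,0,,${text}`;
}

console.log(karaokeLine([
  { text: "Hello", start: 1, end: 1.4 },
  { text: "world", start: 1.5, end: 2 },
]));
// Dialogue: 0,0:00:01.00,0:00:02.00,Default,,0,0,0,,{\k50}Hello {\k50}world
```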

Build a searchable research transcript

Export plain TXT for reading or timestamped JSON for a custom search, review, or indexing workflow.

Translate multilingual speech into English captions

Choose the Translate workflow and optionally provide the source language when auto-detection needs help.

Correct a consistently early or late caption track

Apply a precise positive or negative timing offset before exporting instead of editing every cue manually.

Frequently asked questions

Is my audio really never uploaded?

The file is processed locally by FFmpeg and Whisper in your browser. Network requests can still fetch the application, processing engine, and selected model, but the transcription workflow does not send your media file to a processing server. Input files are read from browser memory and the output is created locally for your download. AntiUpload does not send file bytes, names, sizes, page counts, contents, passwords, or generated outputs through analytics or advertising requests.

Should I choose the Tiny or Base model?

Tiny is the lighter choice and is useful for a fast first pass on clear speech. Base uses more download, memory, and processing time, but may improve difficult accents, noise, or specialist vocabulary. Review important captions regardless of model.

Why can the first run take longer?

The selected Whisper model may need to download before processing. A browser cache can reuse model files on later runs, but storage cleanup, private browsing, or cache settings can require another download.

What's the difference between transcribe and translate mode?

Transcribe keeps the source language. Translate asks the multilingual Whisper model to produce English text from supported non-English speech. Choosing Translate automatically avoids the English-only model.
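
The selection rule described above can be sketched as a small function. The type names and the checkpoint names ("tiny.en", "base", and so on, following the naming of the public Whisper checkpoints) are assumptions for illustration and may not match the identifiers the tool uses internally.

```typescript
type ModelSize = "tiny" | "base";
type Task = "transcribe" | "translate";

interface ModelRequest {
  size: ModelSize;
  task: Task;
  englishOnly: boolean; // the user asked for the English-only path
}

// Returns a Whisper-style checkpoint name such as "tiny.en" or "base".
function pickModel(req: ModelRequest): string {
  // Translation needs the multilingual model, so the English-only
  // preference is ignored whenever the task is "translate".
  const useEnglishOnly = req.englishOnly && req.task === "transcribe";
  return useEnglishOnly ? `${req.size}.en` : req.size;
}

console.log(pickModel({ size: "base", task: "translate", englishOnly: true }));
// base
```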

What does Skip long silence do?

It detects long quiet regions, omits them from the audio sent to Whisper, then maps generated timestamps back to the original timeline. Leave it on for recordings with long pauses; turn it off if very quiet speech is being mistaken for silence.
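
One way to picture the timestamp mapping: if the kept (non-silent) spans are concatenated before transcription, a time on the trimmed audio can be walked back through those spans to find its place on the original timeline. The sketch below assumes that layout; the Span type and function name are hypothetical and the tool's own mapping may differ in detail.

```typescript
interface Span { start: number; end: number; } // seconds on the original timeline

// Maps a time on the silence-trimmed audio back to the original timeline.
// `kept` lists the non-silent spans in the order they were concatenated.
function toOriginalTime(t: number, kept: Span[]): number {
  let elapsed = 0; // trimmed-audio seconds consumed by earlier spans
  let lastEnd: number | undefined;
  for (const span of kept) {
    const length = span.end - span.start;
    if (t <= elapsed + length) {
      return span.start + (t - elapsed);
    }
    elapsed += length;
    lastEnd = span.end;
  }
  // Past the end of the trimmed audio: pin to the end of the last kept span.
  return lastEnd ?? t;
}

// Silence between 10 s and 25 s was skipped.
const kept: Span[] = [{ start: 0, end: 10 }, { start: 25, end: 40 }];
console.log(toOriginalTime(4, kept));  // 4
console.log(toOriginalTime(12, kept)); // 27
```

In the example, 12 seconds into the trimmed audio lands 2 seconds into the second kept span, which starts at 25 seconds in the original recording.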

How do I fix captions that are consistently early or late?

Use the timing offset from -30 to +30 seconds. Positive values move every cue later and negative values move every cue earlier. The engine clamps shifted timestamps to the media duration.
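
A minimal sketch of that behavior, under stated assumptions: the Cue type and function name are hypothetical, and the lower bound of 0 seconds is an assumption added here, since the description above only mentions clamping to the media duration.

```typescript
interface Cue { start: number; end: number; text: string; } // times in seconds

// Shifts every cue by the same offset and keeps timestamps inside the media.
function shiftCues(cues: Cue[], offsetSeconds: number, mediaDuration: number): Cue[] {
  // Keep the offset inside the documented -30 to +30 second range.
  const offset = Math.min(30, Math.max(-30, offsetSeconds));
  // Assumption: timestamps are also floored at 0 so none become negative.
  const clamp = (t: number): number => Math.min(mediaDuration, Math.max(0, t));
  return cues.map((cue) => ({
    ...cue,
    start: clamp(cue.start + offset),
    end: clamp(cue.end + offset),
  }));
}

const shifted = shiftCues([{ start: 1, end: 3, text: "Hello" }], 2, 4);
console.log(shifted[0]); // { start: 3, end: 4, text: 'Hello' }
```

Here the cue moves 2 seconds later, and its end is held at the 4-second media duration instead of running to 5.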

Which output format should I choose?

Use SRT for broad editor and platform compatibility, VTT for HTML video, TXT for a readable transcript, JSON for timestamped structured data, or ASS when you need word-level karaoke timing.
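
The practical difference between SRT and VTT is small. SRT numbers each cue and writes milliseconds after a comma, while VTT starts with a WEBVTT header and uses a period. The sketch below shows both serializations; the CaptionCue type and function names are hypothetical and this is not the tool's own exporter.

```typescript
interface CaptionCue { start: number; end: number; text: string; } // times in seconds

function pad(n: number, width: number): string {
  let s = String(n);
  while (s.length < width) s = "0" + s;
  return s;
}

// Formats seconds as HH:MM:SS<sep>mmm ("," for SRT, "." for VTT).
function formatTimestamp(seconds: number, msSeparator: "," | "."): string {
  const totalMs = Math.round(seconds * 1000);
  const ms = totalMs % 1000;
  const s = Math.floor(totalMs / 1000) % 60;
  const m = Math.floor(totalMs / 60000) % 60;
  const h = Math.floor(totalMs / 3600000);
  return `${pad(h, 2)}:${pad(m, 2)}:${pad(s, 2)}${msSeparator}${pad(ms, 3)}`;
}

function toSrt(cues: CaptionCue[]): string {
  return cues
    .map((c, i) =>
      `${i + 1}\n${formatTimestamp(c.start, ",")} --> ${formatTimestamp(c.end, ",")}\n${c.text}\n`)
    .join("\n");
}

function toVtt(cues: CaptionCue[]): string {
  const body = cues
    .map((c) =>
      `${formatTimestamp(c.start, ".")} --> ${formatTimestamp(c.end, ".")}\n${c.text}\n`)
    .join("\n");
  return `WEBVTT\n\n${body}`;
}

console.log(formatTimestamp(3661.5, ",")); // 01:01:01,500
```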

Is it really free, with no sign-up or watermark?

Yes. There's no account, no sign-up, and no watermark or branding added to your subtitles. The tool runs entirely in your browser, so there's nothing to pay for and nothing to install.

Does it work on Windows, Mac, and Linux without installing anything?

Yes. It runs in any modern browser, so it works the same on Windows, Mac, Linux, and Chromebooks with no software to install. Once the page and the selected model have loaded, you can disconnect from the internet, because transcription happens locally on your machine.
