About Auto-Cut Silence
AntiUpload's Auto-Cut Silence finds the silent stretches in a podcast or video recording and cuts them out, leaving the speech (or any other non-silent content) intact. It is the same kind of silence removal offered by Descript, Adobe Podcast Enhance, and Riverside, but free and running locally in your browser via FFmpeg's `silencedetect` filter. The workflow has two passes. Pass 1 scans the audio for stretches quieter than your threshold (-30 dB default) that last longer than your minimum duration (0.5s default), and emits `silence_start` / `silence_end` timestamp pairs. Pass 2 inverts those pairs into speech ranges (with a configurable padding so words don't get clipped at cut points), builds a `filter_complex` graph that trims to each speech range, and concatenates the trimmed segments back together.
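To make the inversion step concrete, here is a minimal Python sketch of how `silence_start` / `silence_end` pairs could be turned into padded speech ranges. It is an illustration under assumptions, not our actual in-browser code: the function names `parse_silences` and `speech_ranges` are hypothetical, the sample log only mimics the shape of `silencedetect` output, and the total duration is assumed to be known from elsewhere.

```python
import re

# Patterns for the timestamp lines that silencedetect writes to FFmpeg's log.
SILENCE_START = re.compile(r"silence_start:\s*(-?\d+(?:\.\d+)?)")
SILENCE_END = re.compile(r"silence_end:\s*(-?\d+(?:\.\d+)?)")


def parse_silences(log: str, duration: float) -> list[tuple[float, float]]:
    """Extract (start, end) silence pairs from a log (hypothetical helper)."""
    starts = [float(s) for s in SILENCE_START.findall(log)]
    ends = [float(e) for e in SILENCE_END.findall(log)]
    # Assumption: a silence still open at end of file has no silence_end line,
    # so it is closed at the file's duration.
    if len(starts) > len(ends):
        ends.append(duration)
    return list(zip(starts, ends))


def speech_ranges(silences, duration: float, padding: float = 0.15):
    """Invert silences into the ranges to keep, with padding on each side."""
    ranges = []
    cursor = 0.0
    for start, end in silences:
        keep_until = min(start + padding, duration)  # keep a little past the last word
        resume = max(end - padding, 0.0)             # resume a little before the next word
        if resume <= keep_until:
            continue  # silence shorter than twice the padding: nothing to cut
        if keep_until > cursor:
            ranges.append((cursor, keep_until))
        cursor = resume
    if cursor < duration:
        ranges.append((cursor, duration))
    return ranges


sample_log = """
[silencedetect @ 0x1] silence_start: 2.5
[silencedetect @ 0x1] silence_end: 4.0 | silence_duration: 1.5
[silencedetect @ 0x1] silence_start: 7.0
[silencedetect @ 0x1] silence_end: 8.0 | silence_duration: 1.0
"""
silences = parse_silences(sample_log, duration=10.0)
print(speech_ranges(silences, duration=10.0, padding=0.25))
# -> [(0.0, 2.75), (3.75, 7.25), (7.75, 10.0)]
```

With a padding of 0.25s, each 1 to 1.5 second pause shrinks to half a second of breathing room instead of disappearing entirely, which is the effect the padding setting is meant to have.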
The economics matter: Descript charges $24/month for the editing suite that includes this feature; Adobe Podcast (the closest free competitor) caps its free tier at 1 hour/month and has intermittent quality issues. Our tool runs locally, has no time cap, and produces predictable output (you control the threshold and padding, not an opaque ML model). The trade-off: we use a simple energy-based silence detector (FFmpeg `silencedetect`), not the speech-aware detector Descript uses. If you have background music that drops below the threshold in places, our tool will cut it; Descript's model knows "there's still music underneath, don't cut." For pure-voice content (podcasts without background music, voicemails, meeting recordings) the simple detector generally gives results comparable to the smart detector, at zero cost.
The threshold (-30 dB default) and minimum silence duration (0.5s default) are the two main knobs. A lower threshold (more negative, e.g. -40 dB) cuts only the truly silent parts, which is the safer, more conservative choice. A higher threshold (less negative, e.g. -20 dB) also treats low-level ambient noise as "silent", which cuts more aggressively. The padding (0.15s default) is the buffer kept on each side of every cut so the first and last word of each segment aren't clipped. The tool works on both audio and video files. For video, the picture stays in sync with the audio cuts because we trim both streams at the same ranges and re-encode the result.
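If the dB numbers feel abstract, the standard amplitude conversion (20 · log10 of the ratio) shows what each threshold means as a fraction of the loudest possible signal. The snippet below is plain arithmetic, not part of the tool; `db_to_amplitude` and `amplitude_to_db` are hypothetical helper names.

```python
import math


def db_to_amplitude(db: float) -> float:
    """Convert a dB threshold to a linear amplitude ratio (1.0 = full scale)."""
    return 10 ** (db / 20)


def amplitude_to_db(amplitude: float) -> float:
    """Inverse conversion: linear amplitude ratio back to dB."""
    return 20 * math.log10(amplitude)


for db in (-20, -30, -40):
    print(f"{db} dB -> {db_to_amplitude(db):.4f} of full scale")
# -20 dB -> 0.1000 of full scale
# -30 dB -> 0.0316 of full scale
# -40 dB -> 0.0100 of full scale
```

So moving from -30 dB to -40 dB lowers the cutoff from roughly 3% to 1% of full-scale amplitude, which is why the more negative setting leaves quiet room noise alone.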
How it works
- Drop your audio or video file: Accepts every common video container (MP4 / MOV / WebM / MKV / AVI) and every common audio format (MP3 / WAV / M4A / OGG / FLAC / AAC / OPUS). Video stays in sync with audio cuts.
- Set silence threshold (dB): -30 dB default works for typical podcast / Zoom voice. -40 dB for very quiet recordings (kid asleep nearby, ambient noise floor needs respecting). -25 dB if your audio is loud and you want aggressive cuts.
- Set minimum silence (seconds): 0.5s default keeps natural beat-pauses ("uh", thinking time) and cuts only longer dead air. Increase to 1.0s for more conservative cuts. Decrease to 0.3s for aggressive pacing.
- Set padding (seconds): 0.15s default keeps a small buffer of speech on each side of every cut so words don't clip. Bump to 0.25s if you hear word fragments at cut points. Drop to 0.05s for tighter pacing if your speech is clean.
- Click Remove silences: Pass 1 scans the audio (~10% of total time). Pass 2 trims + concats the speech segments (~90%). Output preserves the source format for audio inputs; video inputs always output as MP4.
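For the curious, the two passes map onto FFmpeg roughly as sketched below. This is a hedged Python illustration of how the arguments and filter graph could be assembled, not a copy of our in-browser code: `build_detect_args` and `build_cut_filter` are hypothetical names, and the file name and ranges are made up. The FFmpeg pieces themselves (`silencedetect`, `trim` / `atrim`, `setpts` / `asetpts`, `concat`) are standard filters.

```python
def build_detect_args(src: str, threshold_db: float = -30.0,
                      min_silence: float = 0.5) -> list[str]:
    """Pass 1: run silencedetect and discard the output, keeping only the log."""
    detect = f"silencedetect=noise={threshold_db}dB:d={min_silence}"
    return ["ffmpeg", "-hide_banner", "-i", src, "-af", detect, "-f", "null", "-"]


def build_cut_filter(ranges: list[tuple[float, float]], has_video: bool) -> str:
    """Pass 2: build a filter_complex graph that trims each range and concats them."""
    if not ranges:
        raise ValueError("no speech ranges to keep")
    chains = []
    concat_inputs = []
    for i, (start, end) in enumerate(ranges):
        if has_video:
            # Trim the picture to the same range as the audio so they stay in sync.
            chains.append(
                f"[0:v]trim=start={start:.3f}:end={end:.3f},setpts=PTS-STARTPTS[v{i}]"
            )
            concat_inputs.append(f"[v{i}]")
        chains.append(
            f"[0:a]atrim=start={start:.3f}:end={end:.3f},asetpts=PTS-STARTPTS[a{i}]"
        )
        concat_inputs.append(f"[a{i}]")
    joined = "".join(concat_inputs)
    video_streams = 1 if has_video else 0
    outputs = "[outv][outa]" if has_video else "[outa]"
    chains.append(f"{joined}concat=n={len(ranges)}:v={video_streams}:a=1{outputs}")
    return ";".join(chains)


print(" ".join(build_detect_args("episode.mp3")))
print(build_cut_filter([(0.0, 2.75), (3.75, 10.0)], has_video=False))
```

The resulting graph would be passed to `-filter_complex`, with `-map "[outa]"` (plus `-map "[outv]"` for video) selecting the concatenated streams for the output file. Resetting timestamps with `setpts` / `asetpts` after each trim is what lets `concat` join the segments without gaps.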