PDF to Markdown

About PDF to Markdown

PDF to Markdown serializes a PDF into a single self-contained .md file: tables become GFM pipe tables, lists keep their markers and nesting, hyperlinks survive as `[text](url)`, and images embed as base64 data URIs, so the Markdown renders anywhere without a sidecar assets folder. It suits publishing a PDF report as a GitHub README, piping into a static-site generator, or round-tripping through a Markdown-aware note app.
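To illustrate what "images embed as base64 data URIs" means in practice, here is a minimal Python sketch. It is not the converter's own code (which runs in the browser); the function name `image_to_markdown` and its parameters are hypothetical, chosen only for illustration.

```python
import base64


def image_to_markdown(image_bytes: bytes, mime_type: str, alt_text: str = "") -> str:
    """Return a Markdown image tag whose source is a base64 data URI.

    Hypothetical helper: the real converter's internals are not documented here.
    """
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"![{alt_text}](data:{mime_type};base64,{encoded})"


if __name__ == "__main__":
    # A tiny placeholder payload stands in for real image bytes.
    print(image_to_markdown(b"hello", "image/png", "demo"))
```

Because the image bytes travel inside the link target itself, the .md file needs no companion folder. The trade-off is file size: base64 encoding inflates binary data by roughly a third.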

Extraction happens entirely in your browser, via the same semantic pipeline that powers our PDF to Word converter. Nothing is uploaded. Output is deterministic: the same PDF always produces the same Markdown, character for character.

Bold and italic runs inside paragraphs are preserved (`**bold**`, `*italic*`, `***both***`). Captions and pull quotes are rendered as blockquotes, since GFM has no dedicated caption style. The document's first line becomes an H1 (`#`), and the PDF's title and author metadata, if present, are prepended to the output.
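The emphasis mapping described above can be sketched as follows. This is an illustrative Python sketch, assuming each paragraph arrives as a list of `(text, bold, italic)` tuples; that representation and the names `render_run` and `render_paragraph` are assumptions, not the converter's actual data model.

```python
def render_run(text: str, bold: bool = False, italic: bool = False) -> str:
    """Wrap one styled run in GFM emphasis markers (hypothetical helper)."""
    if not text.strip():
        # Whitespace-only runs are passed through; wrapping them would
        # produce markers that Markdown does not treat as emphasis.
        return text
    marker = "*" * ((2 if bold else 0) + (1 if italic else 0))
    return f"{marker}{text}{marker}"


def render_paragraph(runs) -> str:
    """Join (text, bold, italic) runs into one Markdown paragraph."""
    return "".join(render_run(text, bold, italic) for text, bold, italic in runs)


if __name__ == "__main__":
    runs = [
        ("Plain ", False, False),
        ("bold", True, False),
        (" and ", False, False),
        ("both", True, True),
    ]
    print(render_paragraph(runs))
```

A production serializer would also need to handle runs with leading or trailing spaces inside the styled span, which this sketch leaves out.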

How it works

Drop your PDF: Drag a PDF onto the zone or browse for it. Up to 100 MB. Local.
Conversion runs in your browser: The engine parses structure, merges soft hyphens, detects tables, and stitches cross-page continuations, using the same machinery as the Word converter. Emits GFM Markdown at the end.
Download the .md: One self-contained file with embedded images. Paste into any Markdown editor, renderer, or docs platform.
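As one concrete example of the cleanup in the conversion step, merging soft hyphens could look roughly like the Python sketch below. It assumes the extracted lines carry the Unicode soft hyphen (U+00AD) where a word was broken across lines; the function name `merge_lines` is hypothetical, and the real engine may use different heuristics.

```python
SOFT_HYPHEN = "\u00ad"


def merge_lines(lines) -> str:
    """Join extracted text lines into one paragraph string.

    A line ending in a soft hyphen is glued directly to the next line;
    all other lines are joined with a single space.
    """
    out = ""
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if out.endswith(SOFT_HYPHEN):
            out = out[:-1] + line  # rejoin the split word
        elif out:
            out += " " + line
        else:
            out = line
    # Drop any soft hyphens left inside or at the end of the text.
    return out.replace(SOFT_HYPHEN, "")


if __name__ == "__main__":
    print(merge_lines(["The conver\u00ad", "sion runs", "locally."]))
```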

When to use PDF to Markdown

Publishing a PDF report as a GitHub README

GitHub renders Markdown natively. Convert your PDF once, commit the .md, done — no hosted PDF viewer needed.

Feeding a technical PDF into a docs site

Jekyll, Hugo, Docusaurus, MkDocs — all consume Markdown. Skip the manual "paste and reformat" step.

Editing a legacy PDF manual in Obsidian / Logseq / Bear

Markdown-native note apps can't edit PDF content as text. The converter bridges the gap and preserves the structure.

Frequently asked questions

Why are images inline (data URIs) and not separate files?

Self-contained output. A single .md file works without an assets/ folder, is easier to email or commit, and renders anywhere — even environments that don't resolve relative paths. If you need separate image files, extract them from the data URIs later.
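If you do want separate image files, a short script can pull them out of the data URIs. The following Python sketch assumes images appear as `![alt](data:<mime>;base64,<data>)`; the function name `extract_images`, the `assets` folder name, and the `image-N` naming scheme are all hypothetical choices for illustration.

```python
import base64
import re

# Matches Markdown images whose target is a base64 data URI.
DATA_URI_IMAGE = re.compile(
    r"!\[(?P<alt>[^\]]*)\]"
    r"\(data:(?P<mime>[\w/+.-]+);base64,(?P<data>[A-Za-z0-9+/=]+)\)"
)


def extract_images(markdown: str, asset_dir: str = "assets"):
    """Return (rewritten_markdown, {relative_path: image_bytes}).

    Nothing is written to disk here; the caller decides where the bytes go.
    """
    images = {}

    def replace(match):
        # "image/svg+xml" -> "svg", "image/png" -> "png"
        ext = match.group("mime").split("/")[-1].split("+")[0]
        name = f"{asset_dir}/image-{len(images) + 1}.{ext}"
        images[name] = base64.b64decode(match.group("data"))
        return f"![{match.group('alt')}]({name})"

    return DATA_URI_IMAGE.sub(replace, markdown), images


if __name__ == "__main__":
    md = "Intro ![demo](data:image/png;base64,aGVsbG8=) end"
    rewritten, images = extract_images(md)
    print(rewritten)
    print(sorted(images))
```

Writing each entry of the returned dictionary to its path, then saving the rewritten Markdown next to the asset folder, gives the conventional "file plus assets/" layout.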

Does this handle tables well?

Yes. The engine runs both alignment-based and ruling-line-based table detection, then emits GFM pipe tables (`| col1 | col2 |`). Cells that span rows or columns in the source PDF are detected; GFM itself doesn't support true merged cells, but the content still comes through.
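For readers unfamiliar with the output format, the sketch below shows how a detected grid of cells might be written as a GFM pipe table. It is a simplified Python illustration, assuming the first row is the header; the function name `to_pipe_table` is hypothetical and table detection itself is out of scope.

```python
def to_pipe_table(rows) -> str:
    """Serialize a list of rows (first row = header) as a GFM pipe table."""
    if not rows:
        return ""
    width = max(len(row) for row in rows)

    def cell(value) -> str:
        # Escape pipes and flatten newlines so a cell stays on one line.
        return str(value).replace("|", "\\|").replace("\n", " ").strip()

    # Pad short rows with empty cells so every row has the same width.
    padded = [[cell(c) for c in row] + [""] * (width - len(row)) for row in rows]
    header, *body = padded
    lines = [
        "| " + " | ".join(header) + " |",
        "| " + " | ".join(["---"] * width) + " |",
    ]
    lines += ["| " + " | ".join(row) + " |" for row in body]
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_pipe_table([["Name", "Qty"], ["Bolt", 4], ["Nut"]]))
```

Padding short rows is one simple way to keep content from a merged cell visible: the spanning cell's text lands in a single cell and the remaining positions are left empty.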

Is the order of content preserved?

Reading order, yes: zones are decomposed first, so a two-column article flows one column at a time. Running headers and footers are dropped, since they are not part of the content. Page boundaries are not preserved as explicit breaks; content flows continuously.
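The idea of flowing a two-column page one column at a time can be sketched very roughly as below. This Python example assumes text blocks with known positions and a fixed split at the page midpoint; the `Block` type and `reading_order` function are hypothetical, and real zone decomposition is considerably more involved.

```python
from typing import NamedTuple


class Block(NamedTuple):
    text: str
    x: float  # left edge of the block
    y: float  # top edge, increasing down the page


def reading_order(blocks, page_width: float):
    """Order blocks left column first, then right, each top to bottom.

    Simplifying assumption: exactly two columns split at the page midpoint.
    """
    mid = page_width / 2
    left = sorted((b for b in blocks if b.x < mid), key=lambda b: b.y)
    right = sorted((b for b in blocks if b.x >= mid), key=lambda b: b.y)
    return [b.text for b in left + right]


if __name__ == "__main__":
    page = [
        Block("right top", 320, 50),
        Block("left bottom", 40, 400),
        Block("left top", 40, 50),
        Block("right bottom", 320, 400),
    ]
    print(reading_order(page, page_width=600))
```

Sorting purely by vertical position would interleave the two columns line by line, which is the failure mode that zone decomposition avoids.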
