Supported Document Formats
Firecrawl currently supports the following document formats:
- Excel Spreadsheets (.xlsx, .xls)
  - Each worksheet is converted to an HTML table
  - Worksheets are separated by H2 headings with the sheet name
  - Preserves cell formatting and data types
- Word Documents (.docx, .doc, .odt, .rtf)
  - Extracts text content while preserving document structure
  - Maintains headings, paragraphs, lists, and tables
  - Preserves basic formatting and styling
- PDF Documents (.pdf)
  - Extracts text content with layout information
  - Preserves document structure including sections and paragraphs
  - Handles both text-based and scanned PDFs (with OCR support)
  - Supports a mode option to control the parsing strategy: fast (text-only), auto (text with OCR fallback, default), or ocr (force OCR)
  - Priced at 1 credit per page. See Pricing for details.
PDF Parsing Modes
Use the parsers option to control how PDFs are processed:
| Mode | Description |
|---|---|
| auto | Attempts fast text-based extraction first, falls back to OCR if needed. This is the default. |
| fast | Text-based parsing only (embedded text). Fastest option, but will not extract text from scanned or image-heavy pages. |
| ocr | Forces OCR parsing on every page. Use for scanned documents or when auto misclassifies a page. |
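
As a rough illustration of choosing a mode, the sketch below builds a JSON body for a /v2/scrape request. The helper name `buildPdfScrapeBody` is hypothetical, and the exact shape of each `parsers` entry (an object with `type` and `mode` keys) and the `formats` field are assumptions that should be checked against the API reference before use.

```javascript
// Mode names taken from the table above.
const PDF_MODES = ["auto", "fast", "ocr"];

// Hypothetical helper: builds a request body for POST /v2/scrape.
// Assumption: a parsers entry looks like { type: "pdf", mode }.
function buildPdfScrapeBody(url, mode = "auto") {
  if (!PDF_MODES.includes(mode)) {
    throw new Error(
      `Unknown PDF mode "${mode}"; expected one of ${PDF_MODES.join(", ")}`
    );
  }
  return {
    url,
    formats: ["markdown"], // assumed output format name
    parsers: [{ type: "pdf", mode }],
  };
}

// A scanned document: force OCR on every page.
console.log(
  JSON.stringify(buildPdfScrapeBody("https://example.com/scanned.pdf", "ocr"))
);
```

Validating the mode on the client side means a typo such as "orc" fails before any request is sent.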
How to Use Document Parsing
Document parsing in Firecrawl works in two ways:
- URL-based parsing (/v2/scrape): provide a URL that points to a supported document type.
- File upload parsing (/v2/parse): upload file bytes directly with multipart/form-data.
Upload documents with /v2/parse
Use /v2/parse when the source document is local or not publicly accessible by URL.
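
A minimal sketch of such an upload from Node 18 or later, using the built-in `fetch`, `FormData`, and `Blob` globals. The helper names are hypothetical, and the form field name `file`, the `https://api.firecrawl.dev` host, and the bearer-token header are assumptions rather than details confirmed here.

```javascript
// Hypothetical helper: wraps raw file bytes in a multipart form.
// Assumption: the upload field is named "file".
function buildParseForm(bytes, filename) {
  const form = new FormData();
  form.append("file", new Blob([bytes]), filename);
  return form;
}

// Hypothetical helper: posts the form to /v2/parse and returns the JSON reply.
// Content-Type is deliberately not set: fetch adds the multipart boundary itself.
async function parseLocalDocument(bytes, filename, apiKey) {
  const response = await fetch("https://api.firecrawl.dev/v2/parse", {
    method: "POST",
    headers: { Authorization: `Bearer ${apiKey}` }, // assumed auth scheme
    body: buildParseForm(bytes, filename),
  });
  if (!response.ok) {
    throw new Error(`Parse request failed with status ${response.status}`);
  }
  return response.json();
}
```

In practice the bytes would typically come from `fs.promises.readFile(path)`, whose Buffer result can be passed to `parseLocalDocument` directly.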
Example: Scraping an Excel File
Node
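
The following is a sketch only, not the official sample. It requests the HTML rendering of a spreadsheet URL and then splits it into worksheets using the H2 headings described under Supported Document Formats. The function names, the `html` format name, the API host, and the `data.html` response field are assumptions.

```javascript
// Splits converted spreadsheet HTML into one entry per worksheet,
// relying on each sheet being introduced by an <h2> with the sheet name.
function splitSheets(html) {
  const headingPattern = /<h2[^>]*>([\s\S]*?)<\/h2>/gi;
  const matches = [...html.matchAll(headingPattern)];
  return matches.map((match, i) => {
    const start = match.index + match[0].length;
    const end = i + 1 < matches.length ? matches[i + 1].index : html.length;
    return { name: match[1].trim(), html: html.slice(start, end).trim() };
  });
}

// Hypothetical helper: scrapes a spreadsheet URL through /v2/scrape.
// Assumptions: host, "html" format name, and response shape { data: { html } }.
async function scrapeSpreadsheet(url, apiKey) {
  const response = await fetch("https://api.firecrawl.dev/v2/scrape", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ url, formats: ["html"] }),
  });
  if (!response.ok) {
    throw new Error(`Scrape request failed with status ${response.status}`);
  }
  const payload = await response.json();
  return splitSheets(payload.data.html);
}
```

Calling `scrapeSpreadsheet("https://example.com/data.xlsx", apiKey)` (a placeholder URL) would resolve to an array of `{ name, html }` objects, one per worksheet, if the assumptions above hold.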
Example: Scraping a Word Document
Node
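
Again as a sketch rather than the official sample: the code below scrapes a Word document as markdown and builds a simple outline from the headings that the conversion preserves. The function names, the API host, the `markdown` format name, and the `data.markdown` response field are assumptions.

```javascript
// Builds an outline from ATX-style markdown headings (# through ######).
function outlineFromMarkdown(markdown) {
  return markdown
    .split("\n")
    .map((line) => /^(#{1,6})\s+(.+?)\s*$/.exec(line))
    .filter(Boolean)
    .map((match) => ({ level: match[1].length, title: match[2] }));
}

// Hypothetical helper: scrapes a Word document URL through /v2/scrape.
// Assumptions: host, "markdown" format name, response shape { data: { markdown } }.
async function scrapeWordDocument(url, apiKey) {
  const response = await fetch("https://api.firecrawl.dev/v2/scrape", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ url, formats: ["markdown"] }),
  });
  if (!response.ok) {
    throw new Error(`Scrape request failed with status ${response.status}`);
  }
  const payload = await response.json();
  const markdown = payload.data.markdown;
  return { markdown, outline: outlineFromMarkdown(markdown) };
}
```

The outline step is optional; it is included to show one way the preserved heading structure can be used downstream.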

