Does it work on scanned PDFs?

No. Scanned PDFs contain images, not a text layer. Use an OCR tool first, then extract text here.
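
One way to tell whether a PDF needs OCR first is to check whether its pages yield any text items at all. The sketch below assumes each page's items come from pdf.js's getTextContent(), where text items carry a str field. PageTextItem, hasTextLayer, and pagesNeedingOcr are hypothetical names for illustration, not part of this tool or of pdf.js.

```typescript
// Minimal stand-in for a pdf.js text item; only `str` matters here.
interface PageTextItem {
  str?: string;
}

// True when the page's items contain at least one non-whitespace character.
function hasTextLayer(items: PageTextItem[]): boolean {
  return items.some((item) => (item.str ?? "").trim().length > 0);
}

// Returns the 1-based numbers of pages that yielded no extractable text.
function pagesNeedingOcr(pages: PageTextItem[][]): number[] {
  const result: number[] = [];
  pages.forEach((items, index) => {
    if (!hasTextLayer(items)) result.push(index + 1);
  });
  return result;
}
```

A scanned PDF that has already been through OCR usually carries a hidden text layer, so it would pass this check.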

Is rich formatting preserved?

No. Only the raw text characters are extracted. Fonts, colours, columns, and layout are not preserved in the .txt output.

What languages are supported?

Any language present in the PDF's embedded text layer is supported — the extraction is character-level, not language-specific.

No upload
Local CPU
Works offline
Automatic cleanup

0 outbound requests

PDF to text

Extract the content as .txt.

Methodology and Technical Transparency

Libraries used

pdf-lib: core logic for building and editing PDFs
pdf.js: PDF rendering and page rasterization

Memory strategy

After each operation, URL.revokeObjectURL() is called immediately. All pdf.js document handles are destroyed via pdfDoc.destroy(). Workers are terminated when processing completes or when the component unmounts.
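
The cleanup steps described above could be wrapped in a try/finally so they also run when processing fails. This is a minimal sketch, not the tool's actual code: withCleanup, Resources, and the three small interfaces are hypothetical, and they stand in for the browser's URL object, a pdf.js document handle, and a Web Worker.

```typescript
// Structural stand-ins so the sketch needs no DOM or pdf.js typings.
interface ObjectUrlApi {
  revokeObjectURL(url: string): void;
}
interface Destroyable {
  destroy(): Promise<void> | void;
}
interface Terminable {
  terminate(): void;
}

// Everything that must be released once an operation finishes.
interface Resources {
  objectUrl?: string;
  pdfDoc?: Destroyable;
  worker?: Terminable;
}

// Runs `work`, then releases every resource, even if `work` throws.
async function withCleanup<T>(
  resources: Resources,
  urlApi: ObjectUrlApi,
  work: () => Promise<T>,
): Promise<T> {
  try {
    return await work();
  } finally {
    if (resources.objectUrl !== undefined) {
      urlApi.revokeObjectURL(resources.objectUrl);
    }
    if (resources.pdfDoc) await resources.pdfDoc.destroy();
    if (resources.worker) resources.worker.terminate();
  }
}
```

In a browser, the global URL object can be passed as urlApi.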

We do not guarantee permanent file storage, because we do not store files at all. Local processing of password-protected PDFs is not supported.

Key Features

pdf.js text layer extraction
Extracts the embedded text layer from digitally created PDFs with full UTF-8 support.
One-click .txt download
The extracted content is saved as a plain .txt file with page breaks indicated by section dividers.
Instant preview
Read the extracted text in the browser before downloading to verify the content.
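
The text layer extraction and page dividers described above might look roughly like the following. It is a sketch under assumptions: the interfaces mirror only the parts of the pdf.js API used here (numPages, getPage(), getTextContent(), and the str and hasEOL fields of text items), while extractText and the divider format are hypothetical, since the tool's real divider is not documented here.

```typescript
// Structural types covering only the pdf.js members this sketch touches.
// Items without `str` represent marked-content entries and are skipped.
interface TextItemLike {
  str?: string;
  hasEOL?: boolean;
  type?: string;
}
interface PageLike {
  getTextContent(): Promise<{ items: TextItemLike[] }>;
}
interface DocLike {
  numPages: number;
  getPage(pageNumber: number): Promise<PageLike>;
}

// Hypothetical divider format, chosen only for illustration.
const pageDivider = (n: number): string => `\n\n----- Page ${n} -----\n\n`;

async function extractText(doc: DocLike): Promise<string> {
  let output = "";
  // pdf.js page numbers are 1-based.
  for (let n = 1; n <= doc.numPages; n++) {
    const page = await doc.getPage(n);
    const { items } = await page.getTextContent();
    let pageText = "";
    for (const item of items) {
      if (item.str === undefined) continue; // not a text item
      // hasEOL marks the end of a line in the text layer.
      pageText += item.str + (item.hasEOL ? "\n" : "");
    }
    output += pageDivider(n) + pageText.trim();
  }
  return output.trim();
}
```

Because only characters are concatenated, fonts, columns, and layout are lost, which matches the plain .txt output described here.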

Common Use Cases

Handy for feeding PDF content into LLMs, building full-text search indexes, copying long passages into word processors, or auditing the accessibility of a document.
