rpdf


rpdf is a free project written mainly in Perl and shell.

extract text content and layout information from PDF documents

rpdf extracts text and layout from PDF documents, using pdftohtml and cuneiform.

version 0.3 (2010-06-05)

INSTALLATION

rpdf requires

No further installation required. To test if it works, run

./rpdf test/simple.pdf
./rpdf test/ocr.pdf

USAGE

rpdf [-d int] [-p int] file.pdf

-d int : debug level
-p int : max. number of pages to parse

If -p is not specified, what happens depends on whether rpdf can use pdftohtml or has to fall back on OCR.
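Since the default page limit depends on which backend is used, a batch script may want to pass -p explicitly. The following is a minimal sketch, not part of rpdf itself: the function name extract_all, the RPDF variable and the .out file naming are hypothetical, and it assumes that rpdf writes its result to standard output and exits non-zero on failure, neither of which is stated above.

```shell
# Path to the rpdf script; set RPDF to override (hypothetical variable).
RPDF="${RPDF:-./rpdf}"

# extract_all MAX_PAGES file.pdf...
# Runs rpdf with an explicit -p limit on each file and stores the result
# next to the input as file.out.
# Assumption: rpdf prints its result on stdout and returns non-zero on error.
extract_all() {
  max_pages="$1"
  shift
  for pdf in "$@"; do
    out="${pdf%.pdf}.out"
    if "$RPDF" -p "$max_pages" "$pdf" > "$out"; then
      echo "ok: $pdf"
    else
      echo "failed: $pdf" >&2
    fi
  done
}
```

For example, `extract_all 5 test/simple.pdf test/ocr.pdf` would parse at most five pages of each test file, whichever backend rpdf ends up using.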
