Rpdf is a project mainly written in PERL and SHELL, it's free.
extract text content and layout information from PDF documents
rpdf extracts text and layout from PDF documents, using pdftohtml and cuneiform.
version 0.3 (2010-06-05)
rpdf requires
No further installation required. To test if it works, run
./rpdf test/simple.pdf ./rpdf test/ocr.pdf
rpdf [-d int] [-p int] file.pdf
-d int : debug level -p int : max. number of pages to parse
If -p is not specified, what happens depends on whether rpdf can use pdftohtml or has to fall back on OCR.