Threaded-Crawler

Threaded-Crawler is a free project written mainly in Python.

Threaded Crawler

This web crawler is designed to be generic and highly configurable. It can quickly traverse sites and pull content based on regular expressions and other selection criteria.
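As an illustration of the idea only, here is a minimal sketch of a threaded traversal that pulls content with a regular expression. It is not this project's API: the names `crawl`, `fake_fetch`, `LINK_RE`, and `SITE` are hypothetical, the code assumes Python 3 and the standard library, and the page fetcher is passed in as a function so the example runs against an in-memory site instead of the network.

```python
import re
from concurrent.futures import ThreadPoolExecutor

# Hypothetical link pattern; a real crawler would likely use an HTML parser.
LINK_RE = re.compile(r'href="([^"]+)"')


def crawl(start_url, fetch, content_re, max_pages=50, workers=4):
    """Breadth-first crawl. Returns {url: [regex matches found on that page]}.

    fetch is any callable that takes a URL and returns the page text.
    """
    seen = {start_url}
    frontier = [start_url]
    results = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while frontier and len(results) < max_pages:
            # Fetch one level of the site in parallel, capped at max_pages.
            batch = frontier[: max_pages - len(results)]
            pages = list(pool.map(fetch, batch))
            frontier = []
            for url, html in zip(batch, pages):
                results[url] = content_re.findall(html)
                # Queue links that have not been visited yet.
                for link in LINK_RE.findall(html):
                    if link not in seen:
                        seen.add(link)
                        frontier.append(link)
    return results


# In-memory stand-in for a web site (assumption, for demonstration only).
SITE = {
    "/a": '<a href="/b">b</a> <a href="/c">c</a> <h1>Alpha</h1>',
    "/b": '<a href="/a">a</a> <h1>Beta</h1>',
    "/c": "<h1>Gamma</h1>",
}


def fake_fetch(url):
    return SITE.get(url, "")


found = crawl("/a", fake_fetch, re.compile(r"<h1>(.*?)</h1>"))
```

Here `found` maps each visited URL to the headings matched on that page. Swapping `fake_fetch` for a function that performs a real HTTP request would be the main change needed to try the sketch against a live site.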

Requirements

Uses BeautifulSoup to parse HTML pages (http://www.crummy.com/software/BeautifulSoup/)

Uses epydoc for documentation

Uses the JobSite common package

python-psycopg2 2.0.8

Development

The 'cmd' script can be used to clean and build docs. Documentation is in doc/API.

Install

python setup.py install

Running

The $COMMON environment variable should be set to the path of the common/patterns.py library, or the library should be installed on the default Python path.
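One way a launcher script might honor this setting is sketched below. This is an assumption rather than the project's actual startup code: the helper `add_common_to_path` is hypothetical, and the sketch treats the value of $COMMON as a directory to place on the import path. If the variable is unset, nothing changes and imports fall back to the default Python path.

```python
import os
import sys


def add_common_to_path(environ=None, path=None):
    """Prepend the directory named by $COMMON to the import path, if set.

    Returns the directory that was used, or None when $COMMON is unset.
    """
    environ = os.environ if environ is None else environ
    path = sys.path if path is None else path
    common = environ.get("COMMON")
    if common and common not in path:
        path.insert(0, common)
    return common or None


# Demonstration with a stand-in environment and path list (hypothetical values).
demo_path = []
used = add_common_to_path({"COMMON": "/opt/jobsite"}, demo_path)
```

Called with no arguments, the helper reads the real environment and modifies `sys.path`, so it would need to run before the first import of the common package.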
