Sitemap-Crawler

Sitemap-Crawler is a free project written mainly in Python.

Crawls a site to find every unique page URL. In Python & Django.

AUTHOR: Darren Nix
Version: 0.1
Date:   2011-9-7
Site: www.darrennix.com
License: Apache 2.0

Crawls a site to find unique page URLs and returns them as a list.
Ignores query strings, badly formed URLs, and links to domains
outside of the starting domain.
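
The behaviour described above can be illustrated with a minimal sketch. This is not the project's actual code: the names LinkExtractor, normalize_link, and crawl are hypothetical, the page fetcher is passed in as a callable so the example runs without network access, and the standard library's html.parser stands in for BeautifulSoup to keep the example dependency-free.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit, urlunsplit


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def normalize_link(page_url, href, root_host):
    """Return a cleaned absolute URL, or None if the link should be ignored."""
    try:
        parts = urlsplit(urljoin(page_url, href.strip()))
    except ValueError:
        return None  # badly formed URL
    if parts.scheme not in ("http", "https") or not parts.netloc:
        return None  # mailto:, javascript:, and similar non-page links
    if parts.netloc.lower() != root_host:
        return None  # link points outside the starting domain
    # Drop the query string and fragment so each page is counted once.
    return urlunsplit((parts.scheme, parts.netloc.lower(), parts.path or "/", "", ""))


def crawl(start_url, fetch):
    """Breadth-first crawl; fetch(url) is assumed to return HTML text or None."""
    root_host = urlsplit(start_url).netloc.lower()
    start = normalize_link(start_url, start_url, root_host)
    if start is None:
        return []
    seen = {start}
    queue = deque([start])
    while queue:
        url = queue.popleft()
        html = fetch(url)
        if not html:
            continue
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.hrefs:
            link = normalize_link(url, href, root_host)
            if link and link not in seen:
                seen.add(link)
                queue.append(link)
    return sorted(seen)


if __name__ == "__main__":
    # A tiny in-memory "site" used in place of real HTTP requests.
    pages = {
        "http://example.com/": '<a href="/about?ref=home">About</a> '
                               '<a href="http://other.com/x">Elsewhere</a>',
        "http://example.com/about": '<a href="/">Home</a>',
    }
    print(crawl("http://example.com", pages.get))
```

In real use, fetch would wrap an HTTP client and return the response body; injecting it keeps the URL filtering logic easy to test in isolation.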

Inspired by sitemap_gen from Vladimir Toncar

DEPENDENCIES:
BeautifulSoup HTML parsing library