Heritrix3 is a free project written mainly in Java.
Mirror of Heritrix 3 (the Internet Archive's crawler)