PyQuery is a free project written mainly in Python.
A simple command-line tool for filtering bad HTML.
PyQuery is a simple tool for parsing and querying bad HTML directly from the command line. It is written in Python, works on Python 2.5 to 2.6, and requires BeautifulSoup to be installed (http://www.crummy.com/software/BeautifulSoup/).
License: BSD (see source)
++Short Tutorial++
Download pyquery.py and install BeautifulSoup. Put pyquery.py in any directory, make it executable with """ chmod +x pyquery.py """, and then run """ ./pyquery.py --help """ to see the available options. Then feed in HTML through stdin (pipes), for example """ cat some.html | ./pyquery.py -t div """, and watch the magic ;)
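To give a rough idea of what filtering by tag can look like, here is a minimal sketch. It is not pyquery.py's actual code: it assumes that """ -t div """ selects elements by tag name (the text above does not spell this out), and it uses the standard library's html.parser on a current Python 3 instead of BeautifulSoup so that it runs without extra installs. The names TagTextCollector and filter_tag are hypothetical.

```python
from html.parser import HTMLParser


class TagTextCollector(HTMLParser):
    """Collect the text inside every element with a given tag name.

    Hypothetical helper for illustration; not part of pyquery.py.
    """

    def __init__(self, tag):
        super().__init__()
        self.tag = tag.lower()  # HTMLParser reports tag names in lowercase
        self.depth = 0          # number of matching tags currently open
        self.current = []       # text pieces of the outermost open match
        self.results = []       # one entry per outermost matching element

    def handle_starttag(self, tag, attrs):
        if tag == self.tag:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == self.tag and self.depth > 0:
            self.depth -= 1
            if self.depth == 0:
                self.results.append("".join(self.current).strip())
                self.current = []

    def handle_data(self, data):
        # Keep text only while we are inside a matching element.
        if self.depth > 0:
            self.current.append(data)


def filter_tag(html, tag):
    """Return the text of each outermost <tag> element in html."""
    parser = TagTextCollector(tag)
    parser.feed(html)
    parser.close()
    return parser.results


# Sloppy input: the <b> element is never closed.
sample = "<div>first <b>bold</div><span>skipped</span><div>second</div>"
print(filter_tag(sample, "div"))  # → ['first bold', 'second']
```

To use the sketch on piped input, pass """ sys.stdin.read() """ to filter_tag instead of the sample string. The parser does not stop on unclosed tags, which is the property that matters for bad HTML.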