Minescraper: a screen-scraper for gathering data on mine safety and violations
This is a Python script for scraping data on U.S. mines, inspections, accidents, violations, penalties and contractors from the Department of Labor's enforcement site. The site provides a limited search interface, allowing users to search up to five states at a time, with limited sort, drill-down and export capabilities.
This script downloads all of the data available on the site in a rawer form, as CSV files.
Python >= 2.5 (If you are using an earlier version of Python and need to run this script, please contact me and I will try to assist you.)
BeautifulSoup
The script can be run from the command line as is, and will download data for every mine in every state:
$ ./mines.py
This will, by default, create six CSV files in the current directory:
mines.csv - A list of all mines, along with the following information about each one:
inspections.csv - A list of all inspections, along with the following information about each one:
accidents.csv - A list of all mine accidents, along with the following information about each one:
violations.csv - A list of all mine violations, along with the following information about each one:
assessments.csv - A list of all penalties proposed and assessed against mines, along with the following information about each one:
contractors.csv - A list of all mine contractors, along with the following information about each one:
See the Mine Safety and Health Administration's data dictionary for more detailed data definitions.
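After a run, a quick sanity check is to read each file's header row and count its records. The following is a minimal sketch, not part of minescraper: summarize_csv is a hypothetical helper, it assumes Python 3 (it is separate from the scraper itself), and it assumes each file starts with a header row. Column names are printed as found in the file rather than assumed.

```python
import csv
import os

# File names from the default run described above.
OUTPUT_FILES = [
    "mines.csv", "inspections.csv", "accidents.csv",
    "violations.csv", "assessments.csv", "contractors.csv",
]


def summarize_csv(path):
    """Return (column_names, record_count) for one CSV file.

    Assumes the first row is a header row; an empty file yields ([], 0).
    """
    with open(path, newline="") as handle:
        reader = csv.reader(handle)
        header = next(reader, [])
        count = sum(1 for _ in reader)
    return header, count


if __name__ == "__main__":
    for name in OUTPUT_FILES:
        if not os.path.exists(name):
            print(name + ": not found (has the scraper been run here?)")
            continue
        columns, records = summarize_csv(name)
        print("%s: %d records, columns: %s" % (name, records, ", ".join(columns)))
```

Run it from the directory that holds the CSV files.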
Please note: This script can take a long time to run. If you're interested in downloading all the data available, I suggest running separate instances of the script on several different computers, each responsible for a state or set of states.
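One way to divide the work is to split the list of state codes into groups of roughly equal size and give each computer one group. The sketch below is an illustration only: split_states and scrape_group are hypothetical helpers, the state list is a short example subset, and it assumes scrape() can be called once per state on the same MineScraper instance, which this README does not confirm.

```python
def split_states(states, machines):
    """Split state codes into `machines` groups of near-equal size (round-robin)."""
    if machines < 1:
        raise ValueError("machines must be at least 1")
    return [states[i::machines] for i in range(machines)]


def scrape_group(scraper, states):
    """Write CSV headers once, then scrape each state in turn.

    `scraper` is expected to be a MineScraper; calling scrape() repeatedly
    on one instance is an assumption, not documented behaviour.
    """
    scraper.write_headers()
    for state in states:
        scraper.scrape(state)


if __name__ == "__main__":
    # Example subset of postal codes; extend to the states you need.
    states = ["WV", "KY", "PA", "VA", "AL"]
    groups = split_states(states, 2)
    for number, group in enumerate(groups):
        print("machine %d: %s" % (number + 1, " ".join(group)))
```

On each computer, a MineScraper would be constructed as in the usage scenarios and passed to scrape_group together with that computer's group. Each computer then writes its own set of CSV files, which would need to be combined afterwards.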
Other usage scenarios:
Download data for a single state:
from minescraper.mines import MineScraper
scraper = MineScraper('mines.csv', 'inspections.csv', 'accidents.csv', 'contractors.csv', 'violations.csv', 'assessments.csv')
scraper.write_headers()
scraper.scrape('WV') # Just download data for West Virginia
Get only a list of mines (and the rest of the data in mines.csv) for a state:
from minescraper.mines import MineScraper
# The five False arguments switch off the five data sets other than mines.
scraper = MineScraper('mines.csv', 'inspections.csv', 'accidents.csv', 'contractors.csv', 'violations.csv', 'assessments.csv', False, False, False, False, False)
scraper.write_headers()
scraper.scrape('WV')
Get all data for active mines in a single state:
from minescraper.mines import MineScraper
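# The final True restricts the run to active mines. Note that the five False
# flags are the same as in the mines-only example above; to collect the other
# data sets as well, they would presumably need to be True.
scraper = MineScraper('mines.csv', 'inspections.csv', 'accidents.csv', 'contractors.csv', 'violations.csv', 'assessments.csv', False, False, False, False, False, True)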
scraper.write_headers()
scraper.scrape('WV')
Two-clause BSD. See LICENSE.
The Mine Safety and Health Administration has released other mine data here