Home > archive-twitter-search

archive-twitter-search

Archive-twitter-search is a project mainly written in Python, based on the GPL-3.0 license.

Short script to archive twitter searches

Copyright 2010 Dave Challis [email protected]

1 Summary

archive_twitter_search.py is a short script which performs a twitter search using twitter's search API, and saves each tweet (including all metadata) to a SQLite database.

It's designed to require almost no configuration (apart from specifying a search term!).

It only works using twitter's search API, i.e. it will not get/archive an individual's statuses (getting these uses twitter's REST API, which is tightly rate limited).

2 Getting Started

2.1 Archiving tweets from twitter search terms

  1. Edit 'archive_twitter_search.py' and change the 'query' variable
  2. Run the script!

Assuming all goes well, you should see a message about the number of tweets added, and a SQLite database file named 'tweets.db' will have been created.

Ideally you should add a cron job to run the script every N minutes (where N depends on how often new tweets for your search term(s) are likely to turn up).

Only tweets added since your last search will be searched for, so queries after the inital one should be a lot faster.

2.2 Viewing saved tweets

  1. Open the database in SQLite: sqlite3 tweets.db

  2. View the database schema by entering: .schema

  3. Get any/all tweets using an SQL query, e.g.: SELECT * FROM tweets LIMIT 5;

Pretty much every language has SQLite bindings these days, so you should be able to open and access the archived tweets from there.

3 System Requirements

Software:

  • python 2.6+ (hasn't been tested with earlier releases, but might work)
  • sqlite3

Non-standard Python libraries:

  • dateutil
  • sqlalchemy

4 About twitter's search API

Full details on the twitter search API at: http://apiwiki.twitter.com/Twitter-API-Documentation

Twitter's search API allows the last 1500 results for a search term to be queried for. In order to capture all tweets for a given search term, you'll need to experiment with running it often enough that you get <1500 new tweets each time (this may not be possible for trending items!).

While the search API is rate limited, limits for the search API are apparently much more generous than the 150 requests per hour imposed on the REST API.

Rate limiting details: http://apiwiki.twitter.com/Rate-limiting