type-level-tagger

type-level-tagger is a project written mainly in Clojure and distributed under the EPL-1.0 license.

State-of-The-Art Unsupervised Part-Of-Speech Type-Level Tagger in 300 Lines of Clojure

Simple Type-Level Unsupervised Part-Of-Speech Tagger

This is a short self-contained Clojure implementation of:

Simple Type-Level Unsupervised POS Tagging. Yoong Keok Lee, Aria Haghighi, and Regina Barzilay. To appear in Proceedings of EMNLP 2010.

Running

Run the script with the following arguments:

infile: path to a file where each line is a sentence and tokens are space-separated
outfile: path to write mapping of words to tags (represented by an integer)
num-iters: number of iterations to run the Gibbs sampler
K: number of tag states to use
alpha: hyper-parameter for type-level distributions (try 1)
beta: hyper-parameter for token-level distributions (try 0.1)
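
To make the infile format concrete, here is a minimal sketch of how such a file could be read in Clojure: one sentence per line, tokens separated by whitespace. The function names (parse-sentence, parse-corpus, word-types) and the sample text are hypothetical and are not taken from the tagger's source; the script itself may read its input differently.

```clojure
(require '[clojure.string :as str])

;; Hypothetical helper: split one line into its whitespace-separated tokens.
(defn parse-sentence
  [line]
  (vec (remove str/blank? (str/split (str/trim line) #"\s+"))))

;; Hypothetical helper: turn the text of an input file into a vector of
;; sentences (each a vector of tokens), dropping blank lines.
(defn parse-corpus
  [text]
  (->> (str/split-lines text)
       (map parse-sentence)
       (remove empty?)
       vec))

;; Distinct words (types) in the corpus. A type-level tagger maps each
;; word type, rather than each token occurrence, to a tag.
(defn word-types
  [sentences]
  (set (apply concat sentences)))

;; Made-up two-sentence sample in the expected format.
(def sample "the dog barks\nthe cat sleeps\n")

(def sentences (parse-corpus sample))

(println (count (apply concat sentences)) "tokens,"
         (count (word-types sentences)) "types")
;; prints: 6 tokens, 5 types
```

For a real file, (parse-corpus (slurp infile)) would produce the same structure, assuming the file fits in memory.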

If you want to test this on a reasonably sized corpus, you can use my copy of the Brown corpus (approximately 1 million tokens) as input: http://db.tt/aYl0tfx. It is available for non-commercial use only.

Author

Aria Haghighi ([email protected])

My Website

Support

Email the author with any issues.

License

Copyright (C) 2010 Aria Haghighi

Distributed under the Eclipse Public License, the same license that Clojure uses. See the file License.
