
docstruct

Docstruct is a project written mainly in Python and released under the BSD-3-Clause license.

A tool to create Document Structure trees from XHTML websites.

DocStruct - A Document Structure Parser

A tool to create Document Structure[1] (DS) trees from XHTML websites. It was created as a term project for CSI 5386 at the University of Ottawa in Fall 2009. More detailed information on the project can be found in the paper at http://cloud.github.com/downloads/cfournie/docstruct/paper.pdf
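To give a rough idea of what turning an XHTML page into a tree can look like, here is a minimal sketch using only Python's standard library. It is not the project's parser: the function names (`build_tree`, `parse_xhtml`), the tag-to-label mapping, and the node layout are all assumptions made for illustration, and the real DS levels are defined in the paper and in the schema under spec.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "{http://www.w3.org/1999/xhtml}"

# Hypothetical mapping from XHTML tags to DS-like node labels.
# This is an assumption for illustration, not the project's schema.
TAG_TO_LABEL = {
    "body": "document",
    "div": "section",
    "p": "paragraph",
    "ul": "list",
    "li": "list-item",
}


def build_tree(element):
    """Return a list of tree nodes (dicts) built from one XHTML element."""
    tag = element.tag.replace(XHTML_NS, "")
    label = TAG_TO_LABEL.get(tag)

    # Recurse first so mapped descendants can be attached or hoisted.
    children = []
    for child in element:
        children.extend(build_tree(child))

    if label is None:
        # Unmapped tags (e.g. <em>) are transparent: pass their mapped
        # descendants up to the nearest mapped ancestor.
        return children

    # Only leaf nodes carry text; whitespace is normalised.
    text = "" if children else " ".join("".join(element.itertext()).split())
    return [{"label": label, "text": text, "children": children}]


def parse_xhtml(source):
    """Parse a well-formed XHTML string and return the tree rooted at <body>."""
    root = ET.fromstring(source)
    body = root.find(f"{XHTML_NS}body")
    if body is None:
        raise ValueError("no <body> element found in the XHTML namespace")
    return build_tree(body)[0]


SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Example</title></head>
  <body>
    <div>
      <p>First paragraph.</p>
      <p>Second <em>emphasised</em> paragraph.</p>
    </div>
  </body>
</html>"""

tree = parse_xhtml(SAMPLE)
```

In this sketch `tree` is a nested dictionary with a `document` node at the root, one `section` child, and two `paragraph` leaves. Because XHTML is well-formed XML, a plain XML parser is enough here; pages that are not well-formed would need a more tolerant HTML parser.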

Directories

module - Contains the Python parser tool

spec - Contains example DS trees and the DS XML Schema

References

[1] R. Power, D. Scott, and N. Bouayad-Agha, "Document structure," Comput. Linguist., vol. 29, no. 2, pp. 211-260, 2003. Accessible at http://www.mitpressjournals.org/doi/abs/10.1162/089120103322145315
