WARNING: Do not put non-GPL resources in the repository, e.g. TnT or the French Treebank.
This is a toolkit for training and evaluating statistical parsers acquired from the French Treebank.
It is also a treebank manipulation toolkit designed to work specifically with the French Treebank, though it can be used with other treebanks such as the Penn Treebank.
The goal is to prototype, from existing tools, an accurate statistical parser for constituent and functional dependency parsing of French.
There is a Wiki for further information (link)
How to use the tools

Some bits of docs on the tools developed so far...

- Tree manipulation
- Parsers (TODO)
- Machine Learning

Some features

Main functionalities: treebank query/manipulation

- recode facilities: utilities to convert different formats into each other: French TreeBank_, PennTreeBank_ and Ims (basic)
- tgrep: quick concordances from corpora (basic)
- tsed: quick predefined transformation of the corpus (basic)
- twc: stats from corpora with outputs easy to use with e.g. R (basic)
- tdiff: find differences between trees (not implemented yet)

Reuse existing tools, e.g.

- The Berkeley parser
- Collins/Bikel parser (not yet implemented)
- TnT
- LNCKY (CKY parser implemented by M. Johnson)
- evalb
- XLE dependency annotation/evaluation tools
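
To give an idea of the kind of corpus statistics described for twc, here is a minimal Python sketch. It is not the toolkit's code: it assumes trees are stored as Penn-style bracketed strings, and the function names (tokenize, count_labels, to_tsv) and the example tree are hypothetical, chosen only for illustration. It counts every label that opens a bracket (phrase categories and part-of-speech tags alike) and prints a tab-separated table.

```python
from collections import Counter


def tokenize(bracketed):
    """Split a bracketed tree string into '(', ')' and symbol tokens."""
    return bracketed.replace("(", " ( ").replace(")", " ) ").split()


def count_labels(bracketed):
    """Count node labels, i.e. the symbol that directly follows each '('."""
    tokens = tokenize(bracketed)
    counts = Counter()
    for prev, tok in zip(tokens, tokens[1:]):
        if prev == "(" and tok not in ("(", ")"):
            counts[tok] += 1
    return counts


def to_tsv(counts):
    """Render counts as a tab-separated table, most frequent label first."""
    lines = ["label\tcount"]
    for label, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        lines.append(f"{label}\t{n}")
    return "\n".join(lines)


# Illustrative tree only; the labels are not taken from the real corpus.
tree = "(SENT (NP (DET le) (NC chat)) (VN (V dort)))"
counts = count_labels(tree)
table = to_tsv(counts)
print(table)
```

A table in this shape, saved to a file, can be loaded in R with read.delim, which expects tab-separated input with a header row.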

Informal Schedule/Plan

- Maximise the constituent parsing accuracy
- Given constituent structure, perform some functional role labelling