add cowinterleave |
|
almost 11 years ago
|
minor addition to cowsplit |
|
about 11 years ago
|
adding cowsplit tool |
|
about 11 years ago
|
adding hydra resources |
|
about 11 years ago
|
polishing first neuedimensionen release |
|
about 11 years ago
|
new milestone: texrex-neuedimensionen; finalized new geolocator |
|
about 11 years ago
|
UNFINISHED work on new geolocator |
|
about 11 years ago
|
adding hydra |
|
about 11 years ago
|
add HyDRA skeleton |
|
about 11 years ago
|
adding multi-pass mode to rofl + more refac |
|
about 11 years ago
|
simplifications and speed-ups for rofl |
|
about 11 years ago
|
more fixes to rofl |
|
about 11 years ago
|
minor fixes to rofl |
|
about 11 years ago
|
adding phpBB emoticon fix and updated databases to rofl |
|
about 11 years ago
|
final candidate for rofl with data |
|
about 11 years ago
|
adding rofl (Run-On Fixer with Lists) |
|
about 11 years ago
|
documentation update |
|
about 11 years ago
|
undoing incorrect fix in TTrwriter(Pool); the 32-bit indexed TGZFileStreams are to blame |
|
about 11 years ago
|
minimal fix in TTrwriter(Pool) |
|
about 11 years ago
|
rewrite of texnet as tenet, added Swedish boilerplate data |
|
over 11 years ago
|
fixes in URL extraction/processing |
|
over 11 years ago
|
implemented external gzip for Heritrix multi-record .arc.gz reading |
|
over 11 years ago
|
syncing work on file reading with external gunzip |
|
over 11 years ago
|
small improvements to meta data handling |
|
over 11 years ago
|
finishing TARC writer + user documentation update |
|
over 11 years ago
|
adapt boilerplate detector for training data generation |
|
over 11 years ago
|
improved document ID scheme + adapted shingling tools |
|
over 11 years ago
|
updating documentation + minor fixes |
|
over 11 years ago
|
finalized meta data normalization; fixed + improved link extraction |
|
over 11 years ago
|
finished refactoring of second pass cleanser + cleansing HTML meta information now |
|
over 11 years ago
|