wordTabulator is a program for text analysis. It generates an index of word elements extracted from a defined set of texts. Word elements may be:
wordTabulator correctly processes any Cyrillic text and takes into account the Russian letters abolished by the orthographic reform of 1918: І, Ї (yi), Ѣ (yat), Ѳ (phita), Ѵ (izhitsa). The program also correctly processes diacritics in European and Scandinavian languages (letters with grave, acute, tilde, diaeresis, etc.). Text in UTF-8 may contain any letters at all, even Ancient Egyptian or Chinese hieroglyphs. As an additional feature, the program correctly recognizes abbreviations such as U.S.A. or a.b.c.
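The tokenization described above can be illustrated with a minimal Python sketch. It is not wordTabulator's actual code (the program itself is written in Unicon); the names `TOKEN_RE` and `tokenize` are hypothetical, and the rule "two or more single letters, each followed by a period" is only an assumption about how abbreviations might be detected.

```python
import re
import unicodedata

# [^\W\d_] matches any single Unicode letter: a word character that is
# neither a digit nor an underscore. This covers Cyrillic (including
# pre-reform letters such as Ѣ), accented Latin letters, CJK, and so on.
TOKEN_RE = re.compile(
    r"(?:[^\W\d_]\.){2,}"              # abbreviations such as U.S.A. or a.b.c.
    r"|[^\W\d_]+(?:[-'][^\W\d_]+)*"    # ordinary words, with inner hyphen or apostrophe
)

def tokenize(text):
    """Return the list of word elements found in text (hypothetical helper)."""
    # Compose base letters with combining diacritics first, so that
    # "e" + U+0301 is treated as the single letter "é".
    text = unicodedata.normalize("NFC", text)
    return TOKEN_RE.findall(text)

print(tokenize("The U.S.A. and a.b.c. are abbreviations."))
# ['The', 'U.S.A.', 'and', 'a.b.c.', 'are', 'abbreviations']
```

The abbreviation alternative is listed first so that `U.S.A.` is kept as one element instead of being split into three one-letter words.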
Source texts may be defined as a set of plain text files or HTML/XML/SGML documents. In the latter case the program can separate the content from the markup. Moreover, you can restrict processing to the content within selected paired tags, or exclude that content from processing.
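As a rough illustration of this kind of markup filtering, the following sketch uses Python's standard `html.parser` module. It is not how wordTabulator itself does it; the class `TextExtractor`, the function `extract_text`, and the parameters `only` and `skip` are hypothetical names chosen for the example.

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect text content, optionally limited to or excluding given tags."""

    def __init__(self, only=None, skip=()):
        super().__init__(convert_charrefs=True)
        self.only = only          # keep text only inside this paired tag (or None)
        self.skip = set(skip)     # drop text inside any of these paired tags
        self.only_depth = 0
        self.skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == self.only:
            self.only_depth += 1
        if tag in self.skip:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag == self.only and self.only_depth:
            self.only_depth -= 1
        if tag in self.skip and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        wanted = self.only is None or self.only_depth > 0
        if wanted and self.skip_depth == 0:
            self.chunks.append(data)

def extract_text(markup, only=None, skip=()):
    parser = TextExtractor(only, skip)
    parser.feed(markup)
    parser.close()
    # Chunks are joined with a space so that adjacent blocks do not run
    # together; this also splits a word that has markup in the middle.
    return " ".join(" ".join(parser.chunks).split())

page = "<html><head><title>T</title></head><body><p>Hello <b>world</b></p><div>note</div></body></html>"
print(extract_text(page, only="p"))
print(extract_text(page, skip=("title", "div")))
```

Tag names are given in lowercase because `HTMLParser` lowercases them before calling the handlers. The depth counters assume the selected tags are properly paired.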
As an additional feature, you can analyse a pair of text sets and compare them by their common or differing elements.
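Conceptually, such a comparison comes down to set operations on the word elements of each text set. The sketch below assumes the elements have already been extracted into plain Python lists; the function name `compare_sets` is hypothetical and does not correspond to any wordTabulator option.

```python
def compare_sets(elements_a, elements_b):
    """Return (common, only_in_a, only_in_b) as sorted lists."""
    a, b = set(elements_a), set(elements_b)
    common = sorted(a & b)       # elements present in both text sets
    only_in_a = sorted(a - b)    # elements found only in the first set
    only_in_b = sorted(b - a)    # elements found only in the second set
    return common, only_in_a, only_in_b

common, only_a, only_b = compare_sets(
    ["war", "and", "peace"],
    ["crime", "and", "punishment"],
)
print(common)   # ['and']
print(only_a)   # ['peace', 'war']
print(only_b)   # ['crime', 'punishment']
```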
For Russian texts you can search for words in normalized form according to the rules of Russian morphology and find all case endings. You can also search by regular expressions.
The output of the program is a word index of all text elements found. The index may be generated in HTML format, containing the frequency of each text element and links to the original content, or as a plain text file. Words in the index may be ordered by alphabet, value or frequency.
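The idea of a frequency index with different orderings can be sketched in a few lines of Python. This is an illustration only: `build_index` and its `order` parameter are hypothetical names, case folding is an assumption, and only the alphabetical and frequency orderings are shown.

```python
from collections import Counter

def build_index(elements, order="alphabet"):
    """Return a list of (element, frequency) pairs in the requested order."""
    # Fold case so that "Word" and "word" are counted as one entry
    # (an assumption made for this example).
    counts = Counter(e.casefold() for e in elements)
    if order == "alphabet":
        # Plain code-point order; real alphabetical collation for a given
        # language would need locale-aware sorting.
        return sorted(counts.items())
    if order == "frequency":
        # Most frequent first; ties are broken alphabetically.
        return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    raise ValueError(f"unknown order: {order!r}")

words = ["b", "a", "b", "C", "c", "c"]
print(build_index(words, "alphabet"))   # [('a', 1), ('b', 2), ('c', 3)]
print(build_index(words, "frequency"))  # [('c', 3), ('b', 2), ('a', 1)]
```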
wordTabulator is free and open source software. The console processing module was initially written in the Icon programming language and later migrated to Unicon. The graphical UI was initially developed in Delphi 7 and later migrated to the open source Lazarus.
wordTabulator was born in 1997 as an amateur project and was then widely used at the Russian Virtual Library. In the last few years wordTabulator has been incorporated into my other project, xMarkup. The standalone version of the program has not been changed for quite a long time: the last version was released in 2012 and slightly modernized in 2016.
However, the xMarkup version of wordTabulator has gained many new features, such as graphical visualisation of results. A new GUI still has to be implemented.
Download:
Last updated: 2020-09-19
© Sergey Logichev, 1997-2020