wordTabulator

wordTabulator is a program for text analysis. It generates an index of word elements extracted from a defined set of texts. Word elements may be:

The format of word elements is defined by the user. The program can process texts in either single-byte ANSI encoding or multibyte UTF-8 encoding. It was originally created for processing Russian text exclusively, but it can be used successfully for other languages, for example Ukrainian, Icelandic or Swedish. The language setting for source texts is quite formal: in practice the choice is between Cyrillic and non-Cyrillic.
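To illustrate the two encoding families mentioned above, here is a minimal Python sketch (not the program's own Icon/Unicon code). It tries UTF-8 first and falls back to a single-byte code page. The function name `decode_text` and the choice of Windows-1251 as the "ANSI" code page are assumptions made for the example; the page does not say which single-byte encodings wordTabulator expects.

```python
def decode_text(raw, fallback="cp1251"):
    """Decode bytes as UTF-8, falling back to a single-byte code page.

    Assumption: cp1251 (Windows Cyrillic) stands in for "ANSI" here.
    """
    try:
        # Strict UTF-8 decoding rejects most single-byte Cyrillic text,
        # because its bytes rarely form valid multibyte sequences.
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode(fallback)
```

The heuristic is not foolproof: a short single-byte text can happen to be valid UTF-8, so an explicit encoding setting is still the safer choice when the encoding is known.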

wordTabulator correctly processes any Cyrillic text and takes into account the Russian letters abolished by the orthographic reform of 1918: І, Ї (yi), Ѣ (yat), Ѳ (phita), Ѵ (izhitsa). The program also correctly processes diacritics in European and Scandinavian languages (letters with grave, acute, tilde, diaeresis, etc.). Text in UTF-8 may contain any characters at all, even Ancient Egyptian hieroglyphs or Chinese characters. As an additional feature, the program correctly recognizes abbreviations such as U.S.A. or a.b.c.
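The behaviour described above can be approximated in a few lines of Python. This is only a sketch under assumptions, not wordTabulator's actual algorithm: the names `TOKEN_RE` and `tokenize` are invented for the example, and an "abbreviation" is assumed to be two or more single letters each followed by a period.

```python
import re
import unicodedata

# Any Unicode letter: a word character that is neither a digit nor "_".
_LETTER = r"[^\W\d_]"

# Try dotted abbreviations first (U.S.A., a.b.c.), then plain words.
TOKEN_RE = re.compile(rf"(?:{_LETTER}\.){{2,}}|{_LETTER}+")


def tokenize(text):
    """Split text into words, keeping dotted abbreviations whole."""
    # NFC composes "e" + combining acute into a single letter, so that
    # decomposed diacritics do not split a word in two.
    text = unicodedata.normalize("NFC", text)
    return TOKEN_RE.findall(text)
```

Because the pattern relies on Unicode letter classes rather than a fixed alphabet, pre-reform letters such as Ѣ and accented Latin letters are treated like any other letter.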

Source texts may be defined as a set of plain text files or HTML/XML/SGML documents. In the latter case the program can separate content from markup. Moreover, you can process only the content within selected paired tags, or exclude that content from processing.
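The idea of keeping or skipping the content of selected paired tags can be sketched with Python's standard `html.parser` module. This is an illustration only, not the program's implementation; the class `TagScopedText`, the function `extract_text` and the parameters `only` and `skip` are hypothetical names chosen for the example.

```python
from html.parser import HTMLParser


class TagScopedText(HTMLParser):
    """Collect text, optionally limited to or excluding certain tags."""

    def __init__(self, only=None, skip=()):
        super().__init__(convert_charrefs=True)
        self.only = only          # keep text only inside this tag (or None)
        self.skip = set(skip)     # drop text inside any of these tags
        self.only_depth = 0       # nesting depth inside the "only" tag
        self.skip_depth = 0       # nesting depth inside skipped tags
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == self.only:
            self.only_depth += 1
        if tag in self.skip:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag == self.only and self.only_depth:
            self.only_depth -= 1
        if tag in self.skip and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        in_scope = self.only is None or self.only_depth > 0
        if in_scope and self.skip_depth == 0:
            self.chunks.append(data)


def extract_text(markup, only=None, skip=()):
    """Return the text content of markup with whitespace collapsed."""
    parser = TagScopedText(only, skip)
    parser.feed(markup)
    parser.close()
    return " ".join("".join(parser.chunks).split())
```

For example, `extract_text(doc, skip={"note"})` would drop everything inside `<note>...</note>` pairs, while `extract_text(doc, only="note")` would keep nothing else. Tag names are compared in lower case, because `HTMLParser` lowercases them.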

As an additional feature, you can analyse a pair of text sets and compare them by their common or differing elements.
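Comparing two text sets by common and differing elements amounts to set operations on their vocabularies. The following Python sketch shows the idea; the function name `compare_vocabularies` and the result keys are assumptions for the example, and case folding is a design choice that the program may or may not make.

```python
def compare_vocabularies(words_a, words_b):
    """Compare two word lists by their shared and unique elements."""
    # Case-fold so that "Dog" and "dog" count as the same element.
    a = {w.casefold() for w in words_a}
    b = {w.casefold() for w in words_b}
    return {
        "common": sorted(a & b),   # elements found in both sets
        "only_a": sorted(a - b),   # elements found only in the first set
        "only_b": sorted(b - a),   # elements found only in the second set
    }
```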

For Russian texts you can search for words in normalized form according to the rules of Russian morphology and find all case endings. You can also search by regular expressions.
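A regular expression search over a list of indexed words might look like the Python sketch below. It is not the program's search engine, and matching a stem plus any ending is only a crude stand-in for true morphological normalization, which needs a dictionary of Russian word forms. The name `search_index` is hypothetical.

```python
import re


def search_index(words, pattern):
    """Return the words that match the whole pattern, ignoring case."""
    rx = re.compile(pattern, re.IGNORECASE)
    return [w for w in words if rx.fullmatch(w)]


# A stem followed by any ending finds several case forms of one noun.
forms = search_index(["книга", "книги", "книгой", "стол", "Книгу"], r"книг\w*")
```

Here `forms` holds the four forms of "книга" and leaves out "стол". A stem pattern also matches unrelated words that merely share the stem, which is exactly the gap that morphological rules are meant to close.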

The output of the program is a word index of all text elements found. The word index may be generated in HTML format, containing the frequency of each text element and links to the original content, or as a plain text file. Words in the index may be ordered alphabetically, by value or by frequency.
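A frequency index with the alphabetical and frequency orderings named above can be sketched in Python as follows. The function `build_index` and its `order` parameter are assumptions for the example, and the sketch omits the HTML output and the links back to the source texts.

```python
import re
from collections import Counter


def build_index(texts, order="alphabet"):
    """Count words across texts and return (word, frequency) pairs."""
    counts = Counter()
    for text in texts:
        # A word is a run of Unicode letters; case is folded before counting.
        counts.update(w.casefold() for w in re.findall(r"[^\W\d_]+", text))
    if order == "frequency":
        # Most frequent first; ties are broken alphabetically.
        return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return sorted(counts.items())
```

Note that plain `sorted` orders words by code point, which matches dictionary order only approximately for most alphabets; a locale-aware collation would be needed for a true alphabetical index.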

wordTabulator is free and open source software. The console processing module was initially written in the Icon programming language and later migrated to Unicon. The graphical UI was initially developed in Delphi 7 and later migrated to the open source Lazarus.

wordTabulator was born in 1997 as an amateur project and was then widely used at the Russian Virtual Library. In the last few years wordTabulator has been incorporated into my other project, xMarkup.

The standalone version of the program has not changed for quite a long time. The last version was released in 2012 and slightly modernized in 2016.

However, the xMarkup version of wordTabulator has gained many new features, such as graphical visualisation of results. A new GUI still has to be implemented.

wordTabulator user's guide.

Download:

Last updated: 2020-09-19
© Sergey Logichev, 1997-2020