
Lexica Societatis Fenno-Ugricae XXXI:1–2

Electronic Word Lists: Mari, Mordvin and Udmurt
With SFOu WordListTool 1.3
Ed. Jorma Luutonen et al.
Lexica Societatis Fenno-Ugricae XXXI:1

See also http://www.sgr.fi/lexica/lexicaxxxi1.html.

Electronic Word Lists: Komi, Chuvash and Tatar
With SFOu WordListTool 1.4.

Ed. Jorma Luutonen et al.
Lexica Societatis Fenno-Ugricae XXXI:2

Short description (translated from Finnish)
The second collection of electronic word lists published by the Finno-Ugrian Society contains Komi, Chuvash and Tatar vocabulary. The total number of words in the lists is about 150,000. Each word is accompanied by information on its language, word class and lexical source. The principal purpose of these word lists is to serve as aids in the study of derivation and word structure. Electronic word lists are also used in developing various language technology applications. The word list package includes a new version of the SFOu WordListTool computer program, which has been specially developed for handling word lists of this type. The user interface languages of the program are English, Russian and Finnish.



For full bibliographical data, see QuickStartManual_en.pdf, p. 2.

Introduction

The Finno-Ugrian Society has published electronic word lists of the following languages: Mari, Mordvin, Udmurt, Komi, Chuvash and Tatar. The electronic word lists are intended to be sources for the study of word derivation and word structure. The total number of entries in the six lists is ca. 327,000. Each entry word is provided with labels that indicate language, word class and dictionary sources. SFOu WordListTool is a computer program that has been specially developed for handling such lists. The alternative user interface languages of the program are English, Russian and Finnish.

There are two versions of each word list, one with normal alphabetisation (beginning from the first letter of the word) and the other with reverse alphabetical order (beginning from the end of the word). The names of the files are as follows:

• mari_alph.txt
• mordva_alph.txt
• udmurt_alph.txt
• komi_alph.txt
• chuvash_alph.txt
• tatar_alph.txt

• mari_rev.txt
• mordva_rev.txt
• udmurt_rev.txt
• komi_rev.txt
• chuvash_rev.txt
• tatar_rev.txt
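The reverse ordering described above can be illustrated with a minimal Python sketch. It is only an illustration: the sample words are made up and not taken from the lists, and the sort uses plain Unicode code point order, which may differ from the language-specific alphabetisation actually used in the `_rev` files.

```python
def reverse_sort(words):
    """Order words by their spelling read from the last character backwards."""
    # Sorting on the reversed string groups words that share an ending.
    # Assumption: plain code point comparison, not a language-specific collation.
    return sorted(words, key=lambda w: w[::-1])


# Made-up sample words (hypothetical, not from the published lists).
sample_words = ["kala", "pala", "talo", "sana", "kana"]
reverse_sorted = reverse_sort(sample_words)
print(reverse_sorted)
```

In a list ordered this way, words with the same final characters (for example the same suffix) end up next to each other, which is what makes a reverse list useful for studying derivation.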

The character encoding of the files is Unicode (UTF-8). In the files, the material is arranged in four columns:

1) the word
2) language
3) word class
4) sources

The meanings of the words are not given in the word list. Technically, the files are plain text Comma Separated Value (CSV) files. This simply means that a comma character (,) separates the fields for different types of information (word, language, word class, sources) in each line of the file.
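For readers who want to process the files outside SFOu WordListTool, the four-column layout can be read with a short Python sketch such as the one below. The `Entry` type, the `read_entries` function and the sample lines are hypothetical; the label values shown are placeholders, not real labels from the lists. The sketch also assumes that no field itself contains a comma and that every data line has exactly four fields; the Descriptions documents should be consulted for the actual conventions.

```python
import csv
import io
from typing import NamedTuple


class Entry(NamedTuple):
    """One line of a word list file (hypothetical helper type)."""
    word: str
    language: str
    word_class: str
    sources: str


def read_entries(stream):
    """Yield an Entry for each line that has exactly four comma-separated fields."""
    for row in csv.reader(stream):
        if len(row) == 4:  # assumption: lines with another shape are skipped
            yield Entry(*(field.strip() for field in row))


# Placeholder data standing in for a real file; the labels are invented.
sample = io.StringIO("word1,lang,N,src1\nword2,lang,V,src2\n")
entries = list(read_entries(sample))
print(entries[0].word, entries[0].word_class)

# With a real file, the same function could be used like this:
# with open("komi_alph.txt", encoding="utf-8", newline="") as f:
#     for entry in read_entries(f):
#         ...
```

Opening the file with `encoding="utf-8"` matches the encoding stated above; if a file turns out to begin with a byte order mark, `encoding="utf-8-sig"` would be the safer choice.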

The word list files can be opened in word processors that handle Unicode characters, or in programs such as Microsoft Excel. Users are, however, advised to use the SFOu WordListTool program, which has been developed specifically for word lists of this kind.

The documentation of the word lists

The following documents contain detailed descriptions of the word lists:

• Mari, Mordvin and Udmurt: Descriptions_en.pdf (in Russian Descriptions_ru.pdf)
• Komi, Chuvash and Tatar: Descriptions_en_2016.pdf (in Russian Descriptions_ru_2016.pdf)
• There is also a Finnish description of the Tatar word list: Tat_sanalista_kuvaus.pdf

Summary descriptions, as well as advice for the use of the word lists, can be found in the following documents:

• Mari, Mordvin and Udmurt: Booklet_en.pdf (in Russian Booklet_ru.pdf)
• Komi, Chuvash and Tatar: QuickStartManual_en.pdf
  (in Finnish QuickStartManual_fi.pdf, in Russian QuickStartManual_ru.pdf)

Although the Booklet and QuickStartManual documents were written as manuals for using the word lists with the help of the SFOu WordListTool program, they also contain practical information for those not using this program.

The word list and program packages

The word lists, the accompanying documentation and the SFOu WordListTool program are here made available in three packages. The files with names ending in .exe are self-extracting archives. Please read the instructions and licences before extracting the materials. Note that the new word lists licence (2016) also covers the lists in the earlier package (2007).

1) Electronic Word Lists: Komi, Chuvash and Tatar. With SFOu WordListTool 1.4. Lexica Societatis Fenno-Ugricae XXXI:2 (2016)

The first package contains the word lists of the Komi, Chuvash and Tatar languages, as well as the new Windows version of the program SFOu WordListTool. System Requirements: free hard disk space 50 MB, 512 MB of memory, Windows 2000/XP/Vista/7/8. The contents of this package constitute the publication Lexica Societatis Fenno-Ugricae XXXI:2 (ISBN 978-952-5667-79-0).

Instructions_WLT1.4.pdf
WordLists_licence_2016.pdf
SFOuWLT_licence_2016.pdf
SFOu_WLT_1.4_Win.exe

2) Only word lists and their documentation: Mari, Mordvin, Udmurt, Komi, Chuvash and Tatar (2007 + 2016)

The second package combines the word lists of the 2007 and 2016 publications (Lexica Societatis Fenno-Ugricae XXXI:1-2) and makes them available without an accompanying program. The materials include the word lists of the Mari, Mordvin, Udmurt, Komi, Chuvash and Tatar languages, together with all relevant documents.

Instructions_Lists.pdf
WordLists_licence_2016.pdf
Wordlists&Documents.exe

3) Electronic Word Lists: Mari, Mordvin and Udmurt. With SFOu WordListTool 1.3. Lexica Societatis Fenno-Ugricae XXXI:1 (2007)

The Mari, Mordvin and Udmurt word lists were originally published on a CD in 2007; see http://www.sgr.fi/lexica/lexicaxxxi1.html. That CD included the first version (1.3) of the SFOu WordListTool program. The contents of the CD are now made available through the internet. The licences are shown during the installation procedure.

Instructions_WLT1.3.pdf
SFOu WordListTool 1.3 CD contents.exe


Electronic Word Lists: Mari, Mordvin and Udmurt. With SFOu WordListTool 1.3. Ed. Jorma Luutonen et al. Lexica Societatis Fenno-Ugricae XXXI:1. ISBN 978-952-5150-98-8. Helsinki 2007.
Electronic Word Lists: Komi, Chuvash and Tatar. With SFOu WordListTool 1.4. Ed. Jorma Luutonen et al. Lexica Societatis Fenno-Ugricae XXXI:2. ISBN 978-952-5667-79-0. Helsinki 2016.



9 February, 2017

verkko[a]sgr.fi