Editorial Principles

Editorial Principles of Manuscripta Castreniana Jurak-Samoiedica

The Manuscript Materials and the Digital Edition

Manuscripta Castreniana Jurak-Samoiedica (MCJS) refers to the notes recorded in and about the Tundra Nenets and Forest Nenets languages by Matthias Alexander Castrén (1813–1852) during his field expeditions and now archived in the National Library of Finland. Castrén carried out studies among the Tundra Nenets in 1842–1844 and again in 1846, and among the Forest Nenets briefly in 1845. The manuscripts contain a large collection of Tundra Nenets materials and a much smaller Forest Nenets corpus. Castrén made grammatical and lexical notes on both Nenets languages, but only the Tundra Nenets corpus includes folklore texts.

MCJS is by far the largest section of the entire Manuscripta Castreniana, and its publication history is accordingly very diverse. The manuscript materials themselves are heterogeneous: they range from lexical data, grammatical paradigms, and linguistic commentary to folklore notes with translations into Swedish and Russian and accounts of ethnology, history, and statistics. Consequently, slightly different editorial principles have been applied to the linguistic materials (Manuscripta Castreniana Jurak-Samoiedica Linguistica), to the folklore notes (Manuscripta Castreniana Jurak-Samoiedica Folkloristica), and to the section containing ethnological and other data (Manuscripta Castreniana Jurak-Samoiedica Ethnographica), the editorial process of which is currently less advanced.

The manuscript of the grammar of the Samoyed languages (volume MC VII Samoiedica 1 in the manuscripts), which Castrén had nearly finished for publication and which was posthumously edited and published by Anton von Schiefner (Castrén 1855b), is not included in MCJS, as it deals not only with the Nenets languages but also with the other Samoyed languages recorded by Castrén.

The digital edition of MCJS consists of the following sections within Castrén’s manuscripts:

MC VIII SAMOIEDICA 2: JURAK-SAMOIEDICA 1

  1. Grammaticalia pagg. 1–148
  2. Ordförteckningar pagg. 149–546
  3. Samojediska sagor pagg. 547–590

MC IX SAMOIEDICA 3: JURAK-SAMOIEDICA 2 (mikrofiches 69–78)

  1. Samojediska talöfningar samt Grammatikaliska anteckningar (Kanin, Timan). Poemata et notae musicae pagg. 1–190
  2. Samojediskt Vocabularium (Archangelsk, Mesen, Nes, Izma, Kolva) pagg. 191–286
  3. Vocabularia pagg. 287–348
  4. (Utkast till jurak-samoj. ljudläran) pagg. 349–426
  5. Fragmenta grammaticalia pagg. 427–432
  6. Grammatikaliska Anmärkningar öfwer Samojed-språket pagg. 433–747
  7. Grammaticalia pagg. 749–867
  8. Utdrag ur Arkimandriten Vinjamens Samojediska Grammatik pagg. 869–906

MC X SAMOIEDICA 4: JURAK-SAMOIEDICA 3 (mikrofiches 79–81)

  1. Grammaticalia et Vocabularia Pustosersk, Obdorsk etc. pagg. 1–248

MC XI SAMOIEDICA 5: JURAK-SAMOIEDICA 4 (mikrofiches 82–85)

  1. Anmärkningar öfwer Jurakiskan (Plachina, Chantaiko) pagg. 1–194
  2. Anteckningar öfwer Jurakiskan i Dudinka och Tolstoj Nos pagg. 195–218
  3. Anmärkningar öfver den Kondinska dialecten af Samojediskan, gjorda i byn Toropkowa wid Obfloden 1845 om sommaren pagg. 219–294

MC XII SAMOIEDICA 6: JURAK-SAMOIEDICA 5

  1. Samojedisk ordbok (jur. samoj. - svensk) (resp. rysk.) alfabetisk ordbok; Tas, Dud, B, Ob, Tob, Knd, Bs) pagg. 1–410
  2. Samojedisk ordbok (Renskrivet exemplar) pagg. 411–518
  3. Vocabularia pagg. 519–532

MC XIII SAMOIEDICA 7: JURAK-SAMOIEDICA 6

  1. Etnographiska, Historiska och Statistiska Anmärkningar pagg. 1–272
  2. Varia pagg. 273–280
  3. Rysk handling (angående samojedernas språk och språkkunskap) pagg. 281–284
  4. Rysk handling (angående samojedernas från Bolsesemelska tundran omvändelse till kristendomen) pagg. 285–310
  5. Rysk handling (angående samojediska namn) pagg. 311–340

MCJS Folkloristica consists of data in sections 2 (Ordförteckningar) and 3 (Samojediska sagor) of MC VIII SAMOIEDICA 2: JURAK-SAMOIEDICA 1, and sections 1 (Samojediska talöfningar samt Grammatikaliska anteckningar [Kanin, Timan]. Poemata et notae musicae) and 2 (Samojediskt Vocabularium [Archangelsk, Mesen, Nes, Izma, Kolva]) of MC IX SAMOIEDICA 3: JURAK-SAMOIEDICA 2. MCJS Ethnographica consists of section 1 (Etnographiska, Historiska och Statistiska Anmärkningar) of MC XIII SAMOIEDICA 7: JURAK-SAMOIEDICA 6 (sections 2–5 of this volume have not been edited yet). The remaining data belong to MCJS Linguistica.

The divisions of the MCJS Digital Edition

MCJS Linguistica deals with the entire corpus of Castrén’s published and unpublished linguistic field materials on Tundra Nenets and Forest Nenets. In the course of the project, all of the original manuscripts will be published electronically and accompanied by commentary on both the earlier publications and the current understanding of the data.

MCJS Folkloristica contains the Tundra Nenets folklore manuscripts as well as the accompanying notes. Castrén’s records of sung, metrically organized Tundra Nenets folklore have been published by Schiefner in Castrén (1854) and in an edited form by Lehtisalo in Castrén (1940). Lehtisalo made minor changes to the transcription and the line structure of Castrén’s records. These changes have not yet been systematically compared with the manuscripts in the digital edition, and doing so remains a major future task. Lehtisalo also made a number of changes in the order of the texts, which are discussed in the appendix article to the digital edition (Lukin 2017). The digital edition aims to organize the texts in the way Castrén had intended, following his contemporary understanding of the classification of folklore and reflecting their acquisition through the fieldwork process. In addition to the sung poetry, five Samoyed tales, included in the posthumous publications in Swedish (1857a) and German (1857b), are also published in the digital edition. Furthermore, the digital edition contains a number of previously unpublished texts, namely short individual songs, two translations of Orthodox liturgical texts, and a manuscript on Nenets songs. The value of Castrén’s folklore notes, from the point of view of the evolving paradigm of ethnology, of the Nenets genre system, and of the practice of performing Nenets epic poetry, is discussed in the appendix article (Lukin 2017), where the epic poems and tales are summarized in the section called the Description of the Material in Manuscripta Jurak-Samoiedica Folkloristica.

MCJS Ethnographica comprises a manuscript on Samoyed ethnology, which Castrén prepared in 1852 for the Russian Academy of Sciences (cf. Sjögren 1854). He never managed to complete the task in question, but even unfinished, it is an unprecedented document on the development of ethnology in Russia and generally in Europe.

Changes in the text

Although the editorial principles vary between the sections of Manuscripta Castreniana Jurak-Samoiedica, the point of departure of the editorial work in the digital edition has always been to follow Castrén’s manuscripts as accurately as possible. The digital edition is, therefore, a ‘diplomatic’ documentary edition, which intends to present Castrén’s original records to the international academic audience for future research. At the same time, the digital edition incorporates texts in the current Tundra Nenets orthography in order to make the material accessible to a wider Tundra Nenets readership.

While the digital edition of Manuscripta Castreniana Jurak-Samoiedica aims to present Castrén’s records to the readers as close to the original manuscript as possible, some changes proved to be inevitable when hand-written text was digitized.

In MCJS Linguistica, corrections that were in all likelihood made by Castrén himself at the time the material was written down have been incorporated into the digital edition without further comment, and a limited number of purely graphic modifications have been carried out. The Swedish and Russian glosses appear, as a rule, in their original form, and any changes to them are noted in a separate commentary.

As for Manuscripta Castreniana Jurak-Samoiedica Folkloristica, special care has been taken in the digital edition to include the numerous deletions, additions, and clarifications, even if they may make the records somewhat arduous to read. The main concern here involves the records made in the initial phase of Castrén’s fieldwork, when he was still learning the Tundra Nenets language and developing a coherent transcription system for it. The so-called ‘first hand’ of Castrén has been taken as the starting-point for the sake of consistency, and all of the many deletions and additions are presented as such, even, in contrast with the practice of MCJS Linguistica, in cases where it is clear that they were made immediately to correct a mistake. Numerous notes and clarifications made by Castrén are included in the digital edition in the form of comments, including both contextual information related to the interpretation of the folklore texts, most likely made by the informants, and lexical clarifications, notably adding the basic form or some inflected forms as well as the meaning of a particular word. Consequently, the Tundra Nenets folklore texts have not been ‘normalized’ or ‘standardized’ in any manner, insofar as one of the aims of the project is to represent the development of Castrén’s transcription system and the process of recording ethnological material in the course of his fieldwork.

The Principles of Encoding Manuscripta Castreniana Jurak-Samoiedica

Manuscripta Castreniana Jurak-Samoiedica has been encoded in XML (eXtensible Markup Language) according to the Guidelines of the TEI Consortium (TEI: P5 Guidelines 2007). The XML/TEI files are rendered in the edition as HTML.

The starting point of the encoding has been firstly to represent the structure of Castrén’s notes and secondly to avoid complicated structures. One folklore text has been encoded in a single file. The columns that have been used by Castrén to separate the Tundra Nenets version of the sung poetry and its translation into Swedish or Russian are treated as different divisions (<div>), which are further divided into pages and lines (<pb>; <l>). Each line has been given a number using the xml:id attribute so that the line numbers in both divisions correspond to each other. The <linkGrp> element has been used to link the corresponding lines to each other in the xml/TEI file.
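
The structure described above might look roughly as follows. This is a minimal, well-formed sketch rather than an excerpt from the edition's files: the xml:id values, the type attributes, and the bracketed placeholder text are assumptions made for illustration, and the fragment has not been validated against the TEI schema. The target attribute on <link> follows current TEI P5 usage; the attribute actually used in the edition is not stated here.

```xml
<!-- Hypothetical sketch: ids, type values and bracketed text are placeholders -->
<body xmlns="http://www.tei-c.org/ns/1.0">
  <!-- Column 1: the Tundra Nenets text -->
  <div type="original">
    <pb n="1"/>
    <l xml:id="o1">[Tundra Nenets line 1]</l>
    <l xml:id="o2">[Tundra Nenets line 2]</l>
  </div>
  <!-- Column 2: the Swedish or Russian translation -->
  <div type="translation">
    <pb n="1"/>
    <l xml:id="t1">[translation of line 1]</l>
    <l xml:id="t2">[translation of line 2]</l>
  </div>
  <!-- Each link pairs a line with its counterpart in the other division -->
  <linkGrp type="translation">
    <link target="#o1 #t1"/>
    <link target="#o2 #t2"/>
  </linkGrp>
</body>
```

Because the numeric part of each identifier is the same in both divisions, a reader or a stylesheet can align the columns either by the identifiers alone or through the explicit links.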

In the encoding of the texts themselves, the elements <del> and <add> have been used to encode the deletions and additions, and Castrén’s comments are encoded with the <note> element and the xml:id attribute MAC. Underlining has been encoded with the element <hi>. Passages that could not be transcribed reliably because of the quality of the text have been marked with <unclear>.
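
A single line using these elements could be sketched as below. The line identifier, the rend value, and all bracketed content are placeholders assumed for illustration, not transcriptions from the manuscripts. The <note> carries the xml:id value MAC as described above; since XML identifiers must be unique within a file, the sketch contains only one note.

```xml
<!-- Hypothetical sketch: bracketed text stands in for manuscript readings -->
<l xmlns="http://www.tei-c.org/ns/1.0" xml:id="o12">
  [unchanged reading]
  <del>[reading struck out by Castrén]</del><add>[reading added in its place]</add>
  <hi rend="underline">[underlined word]</hi>
  <unclear>[uncertain reading]</unclear>
  <note xml:id="MAC">[Castrén's comment, such as the basic form of a word]</note>
</l>
```

Keeping <del> and <add> side by side preserves both the ‘first hand’ and the later correction, in line with the diplomatic approach described for MCJS Folkloristica.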

In the prose texts the elements <p> (paragraph) and <ab> (anonymous block) have been used.

Karina Lukin & Tapani Salminen 2018

English revision by Uldis Balodis