recordings, transcriptions, translations, glossing, analyses
Ilia State University
recordings, transcriptions, translations
Ivane Javakishvili Tbilisi State University
Most recordings were made between 2014 and 2016, with a small number dating back to 2005.
Data were collected in Georgia (Tsikhisjvari, Manglisi, Santa, Shua Kharaba, Tetritskaro, Tbilisi) and in Greece (Athens and Thessaloniki, from Pontic speakers who had emigrated from Georgia). The map shows the recording locations in Georgia as well as the places of birth of the speakers in our sample.
The speakers were instructed in Pontic Greek by a native instructor. The instructions (TXT, VA1 subcollections) are listed below in English translation.
| Abbreviation | Text | Instruction |
|---|---|---|
| | General instruction | Please answer the following questions spontaneously. Just speak normally, as if you were speaking to a friend. It does not matter if you are not sure about details; just give a natural answer. |
| AN | Ancestor Story | How did your ancestors come to Georgia? |
| FM | Family | Tell us the history of your family. (For speakers who do not live in the original settlements: How did your family come from the villages to Tbilisi, and from Tbilisi to further destinations?) |
| VL | Village | Please describe the village your family comes from. |
| CD | Comparative Description | Please tell us how your people differ from the other people in the village/city (Russian, Greek). |
| CL | Culture | Please tell me a fairy tale or a poem in your native language. (If you do not know any fairy tale or poem, please tell me what you find most important in the culture of your people.) |
| MR | Marriage | Please tell us how your people celebrate an engagement/marriage and how this differs from the way other people in this village/city celebrate a marriage. |
| FE | Feast | Tell us about a difference between the way your group celebrates a particular feast and the way the other groups in your environment celebrate it (Christmas, Easter, Panajia). |
| LG | Language | Please tell me how you perceive the major differences between your language and Russian. |
The corpus files (.eaf) contain the following annotation tiers:
| Tier name | Content |
|---|---|
| ref | Identifier of the sentence: contains the name of the text and, in the last two digits, the number of the sentence; it provides the reference for citing examples and for storing references in search results. |
| tx | PARENT=ref. Corrected transcription at word level: makes it possible to search for single words containing particular glosses, which would not be possible if only the tx_a tier existed; in addition, the word boundaries can later be used to align words with the sound. |
| mb | PARENT=tx. Corrected transcription at morpheme level. |
| ge | PARENT=mb. Morpheme-aligned glosses in English. |
| ps | PARENT=mb. Parts of speech. |
| ft | PARENT=ref. Free translation. |
| nt | PARENT=ref. Comments/notes about the sentence. |
| tx_a | PARENT=ref. Transcription at sentence level: makes it possible to search for discontinuous strings of words within a single ref entry, with an arbitrary sequence of words in between; it also makes it easier to read the whole sentence in the object language. |
| ge_a | PARENT=ref (a = associated). Sentence-aligned glosses in English: makes it possible to search for discontinuous strings within a single ref entry, with an arbitrary sequence of words in between, at the functional level (unlike tx_a, which works at the level of forms). |
| id | No content; needed for back-conversion to Toolbox. |
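Because every dependent tier names its parent, the hierarchy can be read directly from an .eaf file. The following is a minimal sketch, assuming the standard ELAN XML layout in which each `TIER` element carries `TIER_ID` and, for dependent tiers, `PARENT_REF`; the `SAMPLE_EAF` fragment and the helper names are hypothetical and only mirror the tier table above.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional

# Hypothetical minimal .eaf fragment mirroring the tier hierarchy above.
# Real ELAN files also contain a header, time slots and the annotations.
SAMPLE_EAF = """<ANNOTATION_DOCUMENT>
  <TIER TIER_ID="ref" LINGUISTIC_TYPE_REF="ref"/>
  <TIER TIER_ID="tx" PARENT_REF="ref" LINGUISTIC_TYPE_REF="tx"/>
  <TIER TIER_ID="mb" PARENT_REF="tx" LINGUISTIC_TYPE_REF="mb"/>
  <TIER TIER_ID="ge" PARENT_REF="mb" LINGUISTIC_TYPE_REF="ge"/>
  <TIER TIER_ID="ps" PARENT_REF="mb" LINGUISTIC_TYPE_REF="ps"/>
  <TIER TIER_ID="ft" PARENT_REF="ref" LINGUISTIC_TYPE_REF="ft"/>
</ANNOTATION_DOCUMENT>"""


def tier_parents(eaf_xml: str) -> Dict[str, Optional[str]]:
    """Map every TIER_ID to its PARENT_REF attribute (None for top-level tiers)."""
    root = ET.fromstring(eaf_xml)
    return {tier.get("TIER_ID"): tier.get("PARENT_REF") for tier in root.iter("TIER")}


def path_to_root(parents: Dict[str, Optional[str]], tier: str) -> List[str]:
    """Follow PARENT_REF links from a tier up to its top-level ancestor."""
    path = [tier]
    while parents.get(path[-1]) is not None:
        path.append(parents[path[-1]])
    return path


parents = tier_parents(SAMPLE_EAF)
print(path_to_root(parents, "ge"))  # ['ge', 'mb', 'tx', 'ref']
```

For a real file, `ET.parse(path).getroot()` would replace `ET.fromstring`; in multi-speaker files the tier IDs would carry a speaker suffix.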
In files with more than one speaker, the speaker label is attached to the tier name with "@" (this applies to subcollections VA1 and VA2):
...@speaker1
...@speaker2
...@speaker3
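When processing multi-speaker files, the combined labels can be split back into tier and speaker. A minimal sketch, assuming the label has the form `tier@speaker` with a single "@" and that single-speaker files use plain tier names; the function names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def split_tier_name(name: str) -> Tuple[str, Optional[str]]:
    """Split 'tx@speaker1' into ('tx', 'speaker1'); names without '@' get None."""
    tier, sep, speaker = name.partition("@")
    return tier, (speaker if sep else None)


def group_by_speaker(names: List[str]) -> Dict[Optional[str], List[str]]:
    """Collect the tier names used by each speaker label."""
    groups: Dict[Optional[str], List[str]] = {}
    for name in names:
        tier, speaker = split_tier_name(name)
        groups.setdefault(speaker, []).append(tier)
    return groups


# {'speaker1': ['ref', 'tx'], 'speaker2': ['ref', 'tx']}
print(group_by_speaker(["ref@speaker1", "tx@speaker1", "ref@speaker2", "tx@speaker2"]))
```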
| Class | Orthography | IPA |
|---|---|---|
| vowels | a | a |
| | ä | æ |
| | e | ɛ |
| | i | i |
| | o | ɔ |
| | u | u |
| plosives | p | p |
| | t | t |
| | k | k/c |
| | b | b |
| | d | d |
| | g | g/ɟ |
| fricatives | f | f |
| | θ | θ |
| | s | s |
| | sh | ʃ |
| | x | χ |
| | v | v |
| | ð | ð |
| | z | z |
| | zh | ʒ |
| | j | j |
| | ɣ | ɣ |
| affricates | ts | ʦ |
| | ch | ʧ |
| | dz | dz |
| | dzh | ʤ |
| nasals | m | m |
| | n | n/ŋ |
| liquids | r | ɾ |
| | l | l/ł |
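Since some orthographic units span several letters (sh, zh, ch, ts, dz, dzh), a transliteration into broad IPA has to match the longest unit first. The sketch below is an illustration, not the project's tool. It assumes that ts stands for the alveolar affricate ʦ (ch already covers ʧ). For two-valued entries (k/c, g/ɟ, n/ŋ, l/ł) it takes the first value, since the conditioning contexts are not given here. Stress accents such as the acute in "aðaká" are kept as combining marks rather than converted to IPA stress marks.

```python
import unicodedata

# Orthography -> broad IPA after the table above. For two-valued entries
# (k/c, g/ɟ, n/ŋ, l/ł) only the first value is used.
# ASSUMPTION: "ts" is the alveolar affricate ʦ.
ORTH_TO_IPA = {
    "a": "a", "a\u0308": "æ", "e": "ɛ", "i": "i", "o": "ɔ", "u": "u",
    "p": "p", "t": "t", "k": "k", "b": "b", "d": "d", "g": "g",
    "f": "f", "θ": "θ", "s": "s", "sh": "ʃ", "x": "χ", "v": "v",
    "ð": "ð", "z": "z", "zh": "ʒ", "j": "j", "ɣ": "ɣ",
    "ts": "ʦ", "ch": "ʧ", "dz": "dz", "dzh": "ʤ",
    "m": "m", "n": "n", "r": "ɾ", "l": "l",
}
MAX_LEN = max(len(key) for key in ORTH_TO_IPA)


def to_ipa(word: str) -> str:
    """Greedy longest-match transliteration. The input is decomposed (NFD) so
    that 'ä' becomes 'a' + diaeresis and accents become separate combining
    marks; characters not in the table are copied through unchanged."""
    s = unicodedata.normalize("NFD", word.lower())
    out, i = [], 0
    while i < len(s):
        for n in range(min(MAX_LEN, len(s) - i), 0, -1):
            chunk = s[i:i + n]
            if chunk in ORTH_TO_IPA:
                out.append(ORTH_TO_IPA[chunk])
                i += n
                break
        else:  # no table entry starts here: keep the character as is
            out.append(s[i])
            i += 1
    return unicodedata.normalize("NFC", "".join(out))


print(to_ipa("dzhe"))  # ʤɛ
```

Trying longer keys first is what keeps "dzh" from being read as d + zh, or "sh" as s + h.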
Notes:
The abbreviations for glosses follow the Leipzig Glossing Rules.
Nominal template (adjectives, substantives, pronouns, adjectival participles): [Gender, Number, Case], e.g.:
Verbal template: [Voice, Mood, Aspect, Tense, Finiteness, Person/Number], e.g.:
Abbreviations:
| Category | Abbreviation | Meaning |
|---|---|---|
| Gloss | 0 | epenthesis |
| Gloss | 1 | first person |
| Gloss | 2 | second person |
| Gloss | 3 | third person |
| Gloss | ABIL | ability |
| Gloss | ABL | ablative |
| Gloss | ACC | accusative |
| Gloss | ADJR | adjectivalizer |
| Gloss | AOR | aorist |
| Gloss | COND | conditional |
| Gloss | COND.COP | conditional copula |
| Gloss | CONV | converb |
| Gloss | DAT | dative, genitive |
| Gloss | EPST.COP | epistemic copula |
| Gloss | EV.PST | evidential past |
| Gloss | FUT | future |
| Gloss | GEN | genitive |
| Gloss | GER | gerund |
| Gloss | INF | infinitive |
| Gloss | INSTR | instrumental |
| Gloss | IPFV | imperfective |
| Gloss | NEG | negative |
| Gloss | NEG.COP | negative copula |
| Gloss | NEG.EXIST | negative existential |
| Gloss | NR | nominalizer |
| Gloss | OPT | optative |
| Gloss | PASS | passive |
| Gloss | PL | plural |
| Gloss | POSS | possessive |
| Gloss | POT | potential |
| Gloss | PROC | procedural |
| Gloss | PST | past |
| Gloss | PTCP | participle |
| Gloss | SG | singular |
| Gloss | VOC | vocative |
| Part of speech | N | noun |
| Part of speech | V | verb |
| Part of speech | A | adjective |
| Part of speech | Adv | adverb |
| Part of speech | P | adposition |
| Part of speech | Q | quantifier |
| Part of speech | AQ | ordinal nominal |
| Part of speech | C | conjunction |
| Part of speech | PN | pronoun |
| Part of speech | PRT | particle |
| Part of speech | X | unclear |
| Miscellaneous | xxx | unidentified words (mb layer) |
| Miscellaneous | xxx | unknown meaning (ge layer) |
| Miscellaneous | HESIT | hesitation |
| Miscellaneous | ((coughs)) | coughing |
| Miscellaneous | ((laughs)) | laughing |
| Miscellaneous | ((smiles)) | smiling |
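The abbreviation table can be used to expand a gloss string into readable labels. A minimal sketch, assuming (from the query example "be:PST:3.PL" given for ANNIS) that ":" separates the parts of a gloss and "." joins categories within a part. Segments containing lower-case letters are treated as lexical glosses. Dotted labels such as NEG.EXIST are matched before their components. Labels missing from the table (for example LOC, which appears in the query examples) come back as None.

```python
from typing import Dict, List, Optional, Tuple

# Gloss abbreviations copied from the table above.
GLOSSES: Dict[str, str] = {
    "0": "epenthesis", "1": "first person", "2": "second person", "3": "third person",
    "ABIL": "ability", "ABL": "ablative", "ACC": "accusative", "ADJR": "adjectivalizer",
    "AOR": "aorist", "COND": "conditional", "COND.COP": "conditional copula",
    "CONV": "converb", "DAT": "dative, genitive", "EPST.COP": "epistemic copula",
    "EV.PST": "evidential past", "FUT": "future", "GEN": "genitive", "GER": "gerund",
    "INF": "infinitive", "INSTR": "instrumental", "IPFV": "imperfective",
    "NEG": "negative", "NEG.COP": "negative copula", "NEG.EXIST": "negative existential",
    "NR": "nominalizer", "OPT": "optative", "PASS": "passive", "PL": "plural",
    "POSS": "possessive", "POT": "potential", "PROC": "procedural", "PST": "past",
    "PTCP": "participle", "SG": "singular", "VOC": "vocative",
}


def expand_gloss(gloss: str) -> List[Tuple[str, Optional[str]]]:
    """Expand e.g. 'be:PST:3.PL' into (label, meaning) pairs."""
    result: List[Tuple[str, Optional[str]]] = []
    for segment in gloss.split(":"):
        if segment != segment.upper():  # contains lower-case letters: lexical gloss
            result.append((segment, "lexical"))
            continue
        parts = segment.split(".")
        i = 0
        while i < len(parts):
            # try the longest dotted label first, e.g. NEG.EXIST before NEG
            for j in range(len(parts), i, -1):
                label = ".".join(parts[i:j])
                if label in GLOSSES:
                    result.append((label, GLOSSES[label]))
                    i = j
                    break
            else:  # not in the abbreviation table
                result.append((parts[i], None))
                i += 1
    return result


# [('be', 'lexical'), ('PST', 'past'), ('3', 'third person'), ('PL', 'plural')]
print(expand_gloss("be:PST:3.PL"))
```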
The Pontic Greek corpus is available online in ANNIS, which supports visualizing and querying multimodal annotations. ANNIS comes with a powerful query language (AQL, the ANNIS Query Language) that can retrieve complex data patterns from multilayer annotations (Krause, Thomas & Amir Zeldes. 2016. ANNIS3: A new architecture for generic corpus query and visualization. Digital Scholarship in the Humanities 31(1). http://dsh.oxfordjournals.org/content/31/1/118).
You can access the corpus in the SPW installation at: https://spw.uni-goettingen.de/annis/.
Before running a query, select a corpus, e.g., "PNT-TXT-0.1-mp3", and then type your query into the query window.
Each query specifies an annotation layer and the expression you are looking for. Note that a plain query only retrieves annotations whose value is exactly equal to the queried expression, not annotations that merely contain it.
| Query | Explanation |
|---|---|
| tok | all tokens of the corpus (not very useful, but illustrative) |
| mb="ke" | all tokens in the morpheme layer (mb: words with morphemic boundaries) that contain exactly the form ke 'and'. |
| mb="aðaká" | all tokens in the morpheme layer (mb) that contain exactly the form "aðaká". |
| ge="be:PST:3.PL" | all tokens in the gloss layer (ge) that contain exactly the string "be:PST:3.PL". |
| ge="LOC" | all tokens in the gloss layer (ge) that contain exactly the string "LOC" (= locative). |
| ps="V" | all tokens in the POS layer (ps) that contain exactly "V" (=verb) |
| mb="s" _=_ ge="LOC" | all tokens annotated with exactly "s" in the morpheme layer and exactly "LOC" in the gloss layer, where both annotations cover the same span (_=_, identical coverage). |
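The difference between exact matching and containment can be illustrated outside ANNIS. The sketch below is a toy emulation, not AQL itself. It assumes a flat model in which each dict is one morpheme slot holding its mb and ge annotations, so that "identical coverage" (_=_) reduces to "same slot". The forms come from the examples above. The gloss "xxx" assigned to ekséro is only a placeholder, not corpus data.

```python
from typing import Dict, List

# Hypothetical toy data: one dict per morpheme slot (mb = form, ge = gloss).
TOKENS: List[Dict[str, str]] = [
    {"mb": "ke", "ge": "and"},
    {"mb": "s", "ge": "LOC"},
    {"mb": "ekséro", "ge": "xxx"},
]


def exact(tokens: List[Dict[str, str]], layer: str, value: str) -> List[Dict[str, str]]:
    """Emulate layer="value": keep slots whose annotation equals value exactly."""
    return [t for t in tokens if t.get(layer) == value]


def exact_same_slot(tokens: List[Dict[str, str]], layer1: str, value1: str,
                    layer2: str, value2: str) -> List[Dict[str, str]]:
    """Emulate layer1="value1" _=_ layer2="value2" in this flat model."""
    return [t for t in tokens
            if t.get(layer1) == value1 and t.get(layer2) == value2]


# Only the slot "s" is returned; "ekséro" contains an s but is not equal to "s".
print(exact(TOKENS, "mb", "s"))
print(exact_same_slot(TOKENS, "mb", "s", "ge", "LOC"))
```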
Regular expressions are enclosed in slashes. Some illustrative examples are given below; more details about regular expressions in AQL can be found in the AQL documentation.
| Query | Regular expression | Explanation |
|---|---|---|
| mb=/eksér[oi]/ | "[...]" contains alternative characters | all tokens in the morpheme layer (mb) whose value is "ekséro" or "ekséri". |
| mb=/(pos\|pu)/ | "(...\|...)" contains alternative strings | all tokens in the morpheme layer (mb) whose value is "pos" or "pu". |
| mb=/ekséro?/ | "?" makes the preceding character optional | all tokens in the morpheme layer (mb) whose value is "ekséro" or "eksér". |
| mb=/a+/ | "+" stands for "one or more occurrences" of the preceding character | all tokens in the morpheme layer (mb) that consist of one or more "a" characters, e.g., "a", "aa". |
| mb=/a*/ | "*" stands for "zero or more occurrences" of the preceding character | all tokens in the morpheme layer (mb) whose value is "a", "aa", "aaa", etc. (zero occurrences would also match an empty value). |
| mb=/ti./ | "." stands for "any single character" | all tokens in the morpheme layer (mb) that consist of "ti" followed by exactly one character, e.g., "tin", "tis". |
| ge=/.*3.PL/ | ".*" stands for "zero or more occurrences of any character" | all tokens in the gloss layer (ge) whose value ends in "3.PL", e.g., "be:PST:3.PL". Strictly, the unescaped "." in "3.PL" matches any character; /.*3\.PL/ matches only a literal dot. |
| ge=/.*PFV.*/ | ".*" stands for "zero or more occurrences of any character" | all tokens in the gloss layer (ge) that contain "PFV" anywhere in their value (note that this also matches IPFV). |
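The examples above use ".*" whenever a substring is wanted, because an AQL regular expression has to match the whole annotation value. As we understand ANNIS, the pattern is implicitly anchored at both ends. The sketch below emulates that behaviour with Python's `re.fullmatch` over a hypothetical list of annotation values. Python's regex dialect may differ from the one ANNIS uses in edge cases, so it is meant for trying out simple patterns only.

```python
import re
from typing import List

# Hypothetical annotation values, reusing forms from the table above.
VALUES: List[str] = ["ekséro", "ekséri", "eksér", "a", "aa", "", "tin", "tis", "ti",
                     "be:PST:3.PL", "IPFV"]


def regex_match(values: List[str], pattern: str) -> List[str]:
    """Keep the values that the pattern matches in full (anchored at both ends)."""
    compiled = re.compile(pattern)
    return [v for v in values if compiled.fullmatch(v) is not None]


print(regex_match(VALUES, r"ti."))      # ['tin', 'tis']
print(regex_match(VALUES, r"ti"))       # ['ti']: no substring matches without .*
print(regex_match(VALUES, r".*PFV.*"))  # ['IPFV']
```

Running `regex_match(VALUES, r"a*")` also returns the empty value, which is the "zero occurrences" case of the * operator.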
More about AQL: AQL documentation site.
Berikashvili, Svetlana. 2022. Contact-Induced Change in the Domain of Grammatical Gender in Pontic Greek spoken in Georgia. Languages 7(2): 79. https://doi.org/10.3390/languages7020079.
Berikashvili, Svetlana. 2019. Verb Adaptation in Pontic Greek spoken in Georgia. In Tzitzilis, Ch. & G. Papanastassiou (eds.), Language Contact in the Balkans and Asia Minor (Greek Language: Synchrony and Diachrony 2), 262-279. Thessaloniki: Institute of Modern Greek Studies (M. Triandaphyllidis Foundation).
Berikashvili, Svetlana. 2018. Several Features of Aorist and Verbal System in Pontic Greek spoken in Georgia. Arxeion Pontou (Pontic Archive) 58, 195-229. Athens: Committee for Pontic Studies.
Berikashvili, Svetlana. 2017. Morphological Aspects of Pontic Greek spoken in Georgia (Languages of the World 54). Munich: LINCOM. 168 pp.
Berikashvili, Svetlana. 2016. Morphological Integration of Russian and Turkish Nouns in Pontic Greek. Language Typology and Universals 69(2), 255-276. https://doi.org/10.1515/stuf-2016-0012.
Berikashvili, Svetlana & Irina Lobzhanidze. 2017. Number in Pontic Greek spoken in Georgia. In M. Chondrogianni, S. Courtenage, G. Horrocks, A. Arvaniti & I. Tsimpli (eds.), Proceedings of the 13th International Conference on Greek Linguistics, 51-61. London: University of Westminster.