about

Tush, Tsova-Tush, Batsbi, Bats

Tush is a Nakh-Daghestanian language spoken by a small community in the Tusheti region of northeastern Georgia (in the village of Zemo Alvani). Tush is especially well known among linguists for its complex ergative alignment, which manifests in both its case-marking and verb agreement systems. With fewer than 500 speakers today, Tush is considered severely endangered, which makes its documentation urgent.

Contributors

Diana Kakashvili

conception, instructions, recordings, transcriptions, translations, analyses

Ivane Javakishvili Tbilisi State University, University of Göttingen

Stavros Skopeteas

conception, advice

University of Göttingen

Tinatin Tsiskarishvili

language expert, philological advice

Tbilisi

Elias Duddey

syntactic annotations

University of Göttingen

Binglong Gao

syntactic annotations

University of Göttingen

Léa George

syntactic annotations

University of Göttingen

Kay Klein

syntactic annotations

University of Göttingen

Charlotte Löwe

syntactic annotations

University of Göttingen

Mansuer Simayi

syntactic annotations

University of Göttingen

Language

Genetic affiliation: Nakh-Daghestanian/East Caucasian, Nakh
Place: Georgia
Language Code: bbl
Population: below 500
Endangerment: severely endangered (Endangered Languages Project)

Glottolog 4.8 edited by Hammarström, Harald & Forkel, Robert & Haspelmath, Martin & Bank, Sebastian, licensed under a Creative Commons Attribution 4.0 International License.

data

Place and Time

data recorded in Zemo Alvani
13–19 March 2022

Speakers

  • women: 12
  • men: 4

age

year of birth, range: 1935–1967
average: 1948.25
median: 1949

language use

How often do you use Batsbi with your parents/children/neighbors/neighbors' children?

Instructions

The speakers produced the narratives in response to the following instructions. The instructions were presented in Georgian.

Abbreviation | Text | Instruction
AN | Ancestor Story | Please tell me how you imagine your ancestors lived. It is not a problem if you are not sure about the details. Just tell me the story of your ancestors as far as you know it. If you do not know anything, please tell me how you imagine these people lived.
AC | Activity Description | Please tell me how you make a K’ot’ori. Do not worry if there are some details that you do not know; just give me a clear description, such that another person can do the same.
CD | Comparative Description | Please tell me how you perceive the major differences between the Georgian and Tsova-Tush languages.
EV | Event Description | Please tell me how you enjoyed the last Dadaloba: what did you prepare for the feast, who was there, what did you do, what did you think, what did you feel, what happened.
PA | Path Description | Please describe to me the path from Zemo Alvani to Tsovata (or Tbatana). Please give exact descriptions, so that we can recognize the path that we have to follow (by telling me about all the important places on the way).

Recordings

  • Recordings generally took place in quiet places (not under laboratory conditions).
  • They were made with a ZOOM H6 recorder and interchangeable clip-on microphones at a sampling frequency of 96 kHz (96,000 Hz); audio files were saved in WAV format.
  • Total sound files: 80
  • Total duration: 138 min
Duration of our recordings (in seconds):
speaker AC AN CD EV PA
01403124106276
0268952511162
0344725314490
04175957999105
05608080158334
066293408062
079926968337203
087312874220159
0912616412822091
10100101505396
11137187136274195
12982141847794
131261259919191
1464925411742
159911468233220
16111124157175225

File names

  • Example: BBL-TXT-AN-00000-01
  • BBL- language code: BBL="Batsbi"
  • TXT- subcollection code: TXT="Texts"
  • AN- instruction code; 5 options: AN="Ancestors", AC="Activity", CD="Comparison", EV="Event", PA="Path" (see instructions above)
  • 00000- does not play a role in this subcollection
  • 01 Speaker identifier, ranges from 01 to 16
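
The naming scheme above can be checked mechanically. The following is a minimal sketch, not part of the corpus tooling: it assumes that the base name (without file extension) always has exactly the five hyphen-separated fields listed above, and that the fourth field is always five digits. The names `CorpusFile` and `parse_file_name` are hypothetical.

```python
import re
from typing import NamedTuple

# Instruction codes as listed in the file-name description above.
INSTRUCTIONS = {
    "AN": "Ancestors",
    "AC": "Activity",
    "CD": "Comparison",
    "EV": "Event",
    "PA": "Path",
}

# language - subcollection - instruction - five-digit field - speaker
# (assumption: the fourth field always consists of five digits)
FILE_NAME = re.compile(r"(BBL)-(TXT)-(AN|AC|CD|EV|PA)-(\d{5})-(\d{2})")


class CorpusFile(NamedTuple):
    language: str
    subcollection: str
    instruction: str
    unused: str
    speaker: int


def parse_file_name(name: str) -> CorpusFile:
    """Split a base name such as 'BBL-TXT-AN-00000-01' into its fields."""
    match = FILE_NAME.fullmatch(name)
    if match is None:
        raise ValueError(f"not a corpus file name: {name!r}")
    language, subcollection, instruction, unused, speaker = match.groups()
    speaker_id = int(speaker)
    # Speaker identifiers range from 01 to 16.
    if not 1 <= speaker_id <= 16:
        raise ValueError(f"speaker identifier out of range: {speaker}")
    return CorpusFile(language, subcollection, instruction, unused, speaker_id)


parsed = parse_file_name("BBL-TXT-AN-00000-01")
print(parsed.instruction, INSTRUCTIONS[parsed.instruction], parsed.speaker)
# prints: AN Ancestors 1
```

A helper of this kind is mainly useful for grouping the 80 sound files by speaker or by instruction before further processing.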

Annotations

Layers

The annotation layers of the corpus files follow the conventions used in ELAN. "A_" stands for the speaker, the further layers are associated with phrase/word/morph intervals, and either contain text (txt) in Tush (bbl) or translations (gls)/part of speech (pos) in English (en). Precisely:

layer | speaker | level | content | language
interlinear-txt-title | - | text | title | -
A_phrase-segnum | A | time-aligned interval | number | -
A_phrase-txt-bbl | A | sentence | transcription | Batsbi
A_phrase-gls-en | A | sentence | translation | English
A_word-txt-bbl | A | word | transcription | Batsbi
A_word-pos-en | A | word | part of speech | English
A_morph-txt-bbl | A | morpheme | transcription | Batsbi
A_morph-msa-en | A | morpheme | part-of-speech | English
A_morph-type-en | A | morpheme | type | English
A_morph-gls-en | A | morpheme | translation | English
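
Because the speaker-prefixed layer names are built from a fixed set of parts, they can be decomposed programmatically, for example when iterating over tiers of an exported annotation file. The sketch below is only an illustration under the assumption that every such layer follows the pattern `speaker_level-content[-language]`; the function name `split_layer` is hypothetical and the title layer (which has no speaker prefix) is deliberately rejected.

```python
def split_layer(name: str) -> dict:
    """Split a layer name such as 'A_morph-gls-en' into its raw codes."""
    speaker, sep, rest = name.partition("_")
    if not sep:
        # Layers without a speaker prefix (e.g. the title layer) are not handled.
        raise ValueError(f"no speaker prefix in layer name: {name!r}")
    parts = rest.split("-")
    if len(parts) not in (2, 3):
        raise ValueError(f"unexpected layer name: {name!r}")
    return {
        "speaker": speaker,                                  # e.g. 'A'
        "level": parts[0],                                   # phrase / word / morph
        "content": parts[1],                                 # txt / gls / pos / msa / type / segnum
        "language": parts[2] if len(parts) == 3 else None,   # bbl / en, if present
    }


print(split_layer("A_morph-gls-en"))
# prints: {'speaker': 'A', 'level': 'morph', 'content': 'gls', 'language': 'en'}
```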

Transcriptions

local orthography (Georgian alphabet) romanized orthography IPA
aa
აჼaⁿã
ააā
bb
gg
dd
ee
ეჼ
̆
vv
zz
t
თთtttʰː
ii
იჼĩĩ
̆
jj
ll
ლლll
ლ‘ɬɬ
mm
nn
oo
ოჼõõ
̆
žʒ
rr
ss
სსss
ტტṭṭtʼː
uu
უჼũũ
̆ǔǔ
p
k
ğʁ
ყყq̇q̇qʼː
šʃ
čtʃʰ
cts
ʒdz
tsʼ
č̣tʃʼ
xx
ხხxx
q
ჴჴqqqʰː
ǯ
hh
̣ћћ
ɦɦ
Ɂʔ
ʕʕ

Categories

The abbreviations for glosses follow the Leipzig Glossing Rules.

Noun classes are annotated as lexical property of nouns or as a property of the agreement prefixes.

Category | Value | Abbreviation
case | absolutive | ABS
case | adessive | ADESS
case | adverbial | ADV
case | allative | ALL
case | contact | CONT
case | dative | DAT
case | ergative | ERG
case | genitive | GEN
case | illative | ILL
case | instrumental | INSTR
case | locative | LOC
case | nominative | NOM
case | oblique stem | OBL
case | vocative | VOC
pronominal | first | 1
pronominal | second | 2
pronominal | third | 3
pronominal | inclusive | INCL
pronominal | exclusive | EXCL
pronominal | possessive | POSS
pronominal | reflexive | REFL
pronominal | medial (demonstrative) | MED
pronominal | proximal (demonstrative) | PROX
pronominal | distal (demonstrative) | DISTAL
pronominal | indefinite | INDF
noun class | singular V, plural B (masculine) | VB
noun class | singular J, plural D (feminine) | JD
noun class | singular B, plural B | BB
noun class | singular D, plural D | DD
noun class | singular J, plural J | JJ
noun class | singular B, plural D | BD
noun class | singular B, plural J | BJ
noun class | singular D, plural J | DJ
number | singular | SG
number | plural | PL
number | associative (plural) | ASC
adjectival | comparative | COMP
adjectival | superlative | SUP
adjectival | intensifier | INTS
adjectival | multiplicative | MULT
adjectival | privative | PRIV
adjectival | distributive | DISTR
tense/aspect | aorist | AOR
tense/aspect | non-past | NPST
tense/aspect | past | PST
tense/aspect | future | FUT
tense/aspect | imperfective | IPFV
tense/aspect | perfective | PFV
tense/aspect | habitual | HAB
tense/aspect | present | PRS
verbal | subjunctive | SUBJ
verbal | imperative | IMP
verbal | polite (imperative) | POL
verbal | auxiliary | AUX
verbal | causative | CAUS
verbal | non-witnessed | NW
verbal | object | O
verbal | subject | S
verbal | optative | OPT
verbal | preverb | PV
verbal | infinitive | INF
verbal | conditional | COND
verbal | converb | CVB
verbal | verbal noun | VN
verbal | participle | PTCP
derivation | nominalizer | NMLZ
derivation | adverbializer | ADVZ
derivation | adjectivalizer | ADJZ
derivation | transitivizer | TR
derivation | intransitivizer | INTR
derivation | abstract (noun) | ABSTR
derivation | similative | SIMV
clausal | relativizer | REL
clausal | additive (particle) | ADD
clausal | affirmative (particle) | AFF
clausal | interjection | INTRJ
clausal | negation | NEG
clausal | question (particle) | Q
clausal | quotative (particle) | QUOT

Texts

The following texts illustrate the types of elicited narratives in the present corpus. The entire corpus (sound files in .wav and annotations in ELAN) is archived and available for download on Zenodo:
DOI: 10.5281/zenodo.15863417.

The corpus is also accessible for queries online through the ANNIS database; see below.

syntax


Syntactic annotations of the Tush corpus are based on SUD (= Surface-Syntactic Universal Dependencies); see Kim Gerdes, Bruno Guillaume, Sylvain Kahane, and Guy Perrier. 2018. SUD or Surface-Syntactic Universal Dependencies: An annotation scheme near-isomorphic to UD. In Proceedings of the Second Workshop on Universal Dependencies (UDW 2018), pages 66–74, Brussels, Belgium. Association for Computational Linguistics. The basic properties of our corpus are summarized in the following. For a more detailed discussion, we refer to the SUD website.

You can visit and download the Tush treebanks in ArboratorGrew (Treebank BBL-SUD). The current version of the treebanks was created semi-automatically with a parser written in R. The data still contain errors, which will be corrected manually. The next release is scheduled for 1.11.2025.

Part of Speech

Part-of-speech information was converted to the UPOS categories following UD. The XPOS layer contains the abbreviations used in FLEx, which are sometimes more detailed (e.g., they distinguish subclasses of pronouns). The inflectional features (FEATS) are only used for the inflectional categories of nouns and the different verb forms.
UPOS | XPOS | FEATS | form | gloss
ADJ | adj | - | aṭṭã | easy
ADJ | adj | - | psare-lũ | yesterday-ADJZ
ADJ | ordnum | - | qa-lğe-č | three.OBL-ORD-OBL
ADJ | quant | - | meɬ | several
ADP | adp | - | mak | on
ADV | adv | - | laxuš | below
ADV | adv | - | doḳšor-uš | heart.DD-wide-ADVZ
ADV | verbprt | - | xolme | HAB
DET | det | - | marã | such
CCONJ | coordconn | - | je | and
CCONJ | coordconn | - | le | or
INTJ | interj | - | ai | look
INTJ | interj | - | vaime | oh
NOUN | n | Case=Abs|Number=Sing|Gender=BB | com | dough.B
NOUN | n | Case=Ins|Number=Sing|Gender=DD | drož-e-v | yeast.D-OBL-INSTR
NUM | cardnum | - | atsi | thousand
NUM | cardnum | - | barɬ | eight
NUM | cardnum | - | ši-š | two-DISTR
NUM | ordnum | - | ši-lğẽ | two-ORD
NUM | multipnum | - | qo-c̣ | three-MULT
PART | prt | - | =ḳi | =indeed
PART | prt | - | ahaɁ | yes
PRON | pers | Case=All|Number=Sing | so-gŏ | 1SG-ALL
PRON | dem | Case=Abs|Number=Plur | oqar | DIST.PL
PRON | indfpro | Case=Abs|Number=Sing | vum | something
PRON | interrog | Case=Abs|Number=Sing | vux | what
PRON | recp | Case=Abs|Number=Sing | vašar | each_other
PRON | refl | Case=Dat|Number=Sing | šarn | REFL.3SG.DAT
PRON | poss | Number=Plur | šuĩ | POSS.REFL.3PL
PROPN | nprop | Case=Abs|Number=Sing | agvisṭŏ | August
SCONJ | comp | - | me | SUBORD
SCONJ | subordconn | - | daxeɁ | before of
VERB | v | VerbForm=Inf | b-iv-ã | BB-sow-INF
VERB | v | VerbForm=Fin | b-ix | BB-go.IPFV
VERB | v | VerbForm=Vnoun | b-iḳ-ar | BB-take-VN
VERB | v | VerbForm=Part | b-iḳ-en | BB-take-PST.PTCP
X | X | - | im | [DISFLUENCY]
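
FEATS values such as `Case=Abs|Number=Sing|Gender=BB` follow the usual CoNLL-U convention of `Name=Value` pairs joined by `|`, with `_` for an empty field. A minimal sketch of reading such a value into a dictionary is given below; the function name `parse_feats` is hypothetical, and the sketch assumes that each feature has a single value.

```python
def parse_feats(feats: str) -> dict:
    """Turn a CoNLL-U FEATS string into a {feature: value} dictionary."""
    # In CoNLL-U, an underscore marks an empty FEATS column.
    if feats in ("", "_"):
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


noun = parse_feats("Case=Abs|Number=Sing|Gender=BB")
print(noun["Case"], noun["Gender"])
# prints: Abs BB
```

With the features in a dictionary, it is straightforward to filter tokens, for example all nouns with `Case` equal to `Ins`.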

Root

The root of the dependency tree is the finite verb of a clause or – in the absence of a finite verb – the highest head of the annotated unit.
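
In a CoNLL-U file, the root of a tree is the token whose HEAD column is 0. The following sketch shows how the root and its dependents could be read from a CoNLL-U sentence. The sample sentence is a placeholder with invented forms (`w1`, `w2`, `w3`), not corpus data, and the function names are hypothetical; the relation label `root` in the sample is likewise an assumption about the export.

```python
# Placeholder sentence in CoNLL-U format (10 tab-separated columns):
# ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC
SAMPLE = "\n".join([
    "# text = (placeholder sentence, not corpus data)",
    "1\tw1\t_\tNOUN\tn\t_\t3\tsubj\t_\t_",
    "2\tw2\t_\tNOUN\tn\t_\t3\tcomp:obj\t_\t_",
    "3\tw3\t_\tVERB\tv\tVerbForm=Fin\t0\troot\t_\t_",
])


def read_tokens(block: str) -> list:
    """Read the word lines of one CoNLL-U sentence into dictionaries."""
    tokens = []
    for line in block.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        cols = line.split("\t")
        if not cols[0].isdigit():
            continue  # skip multiword ranges (1-2) and empty nodes (1.1)
        tokens.append({
            "id": int(cols[0]),
            "form": cols[1],
            "upos": cols[3],
            "head": int(cols[6]),
            "deprel": cols[7],
        })
    return tokens


def find_root(tokens: list) -> dict:
    """Return the single token whose HEAD is 0."""
    roots = [t for t in tokens if t["head"] == 0]
    if len(roots) != 1:
        raise ValueError(f"expected exactly one root, found {len(roots)}")
    return roots[0]


def dependents(tokens: list, head_id: int) -> list:
    """Return all tokens that are directly governed by the given head."""
    return [t for t in tokens if t["head"] == head_id]


tokens = read_tokens(SAMPLE)
root = find_root(tokens)
print(root["form"], [t["deprel"] for t in dependents(tokens, root["id"])])
# prints: w3 ['subj', 'comp:obj']
```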

You can view our illustrative annotations by selecting the provided examples. The Viewer is compatible with any CoNLL-U file: you may also upload your own for visualization if you wish.

Modifiers

SOURCE: head, TARGET: modifier

  • adjective/quantifier [mod]← noun
  • adverbs [mod]← verb
  • comitative, instrumental, locative, adverbial cases or ablative, superlative, etc. postpositions [mod]← verb (occasionally also other)
  • degree adverb [mod]← adverb or adjective
  • subordinators of adverbial clauses [mod]← element in matrix clause
  • genitive noun [udep]← noun
  • "possessive pronouns" (=genitive of personal pronouns) [udep]← noun

Complements

SOURCE: head, TARGET: complement

  • absolutive argument [comp:obj]← transitive verb
  • infinitival complement [comp:obj]← transitive verb
  • nominal complement [comp:obj]← postposition
  • highest verb of embedded clause [comp:obj]← subordinating conjunction
  • comparative conjunction [comp:obj]← comparative marker (degree head) (see comparative constructions)
  • standard of comparison [comp:obj]← comparative conjunction
  • reported speech FEATS [Reported=Yes] [comp:obj]← verb of saying (see reported speech)

In our annotation, "obliques" refer to nominal arguments that are selected by certain (classes of) verbs, such as datives/allatives as well as absolutives as goals of motion verbs.

  • dative/allative of indirect object [comp:obl]← verb
  • experiencer dative [comp:obl]← psych verb
  • controller dative [comp:obl]← modal verb
  • absolutive of goal of motion [comp:obl]← motion verb
  • allative/directional case [comp:obl]← motion verb
  • causee dative/allative [comp:obl]← causative verb
  • predicative adjective [comp:pred]← copula

Note: The subject is a dependent of the copula, not of the predicative adjective.

Specifiers

SOURCE: head, TARGET: specifier

Subjects and determiners have separate labels in SUD. Note that subjects (target) are dependents of verbs (source), and determiners (target) are dependents of nouns (source).

  • demonstrative [det]← noun (if the demonstrative pronoun is used attributively)

  • ergative argument [subj]← transitive verb
  • single argument (ergative or absolutive) [subj]← intransitive verb
  • infinitive as subject [subj]← transitive/intransitive verb

Other

  • conjunction [cc]← last conjunct
  • preceding conjunct [conj:coord]← subsequent conjunct

Note: This implies that the syntactic relation of the coordinated phrase to the head is only annotated at the leftmost conjunct.

The first expression (of disfluencies, repetitions, reformulations) is treated as the head, which means that it hosts the dependency relation of the entire unit to the head in the clause or is the root.

  • false start [conj:dicto]← revised unit
  • expression [conj:dicto]← repeated or reformulated unit
  • punctuation [punct]← highest element of the preceding punctuation domain

queries

The Tush corpus is available online in ANNIS, which allows for visualizations and queries in multimodal annotations. ANNIS comes with a powerful query language (AQL = ANNIS Query Language) that makes it possible to retrieve complex data patterns in multilayered annotations (Krause, Thomas & Zeldes, Amir 2016: ANNIS3: A new architecture for generic corpus query and visualization. In: Digital Scholarship in the Humanities 2016 (31). http://dsh.oxfordjournals.org/content/31/1/118).

You can access the corpus in the SPW installation at: https://spw.uni-goettingen.de/annis/.

Before querying, you need to select the corpus "BBL-0.1-mp3". You then write your query in the query window.


Plain-text queries in AQL

In your queries, you need to specify the annotation layer and define the expression that you are looking for. Note that a plain-text query only retrieves units whose annotation is equal to the queried expression (and not units whose annotation merely contains the queried expression).

Query | Explanation
tok | all tokens of the corpus (not very useful, but illustrative)
A_morph-txt-bbl="b-" | all tokens in the form layer (A_morph-txt-bbl) that contain exactly the prefix "b-" (class prefix)
A_morph-txt-bbl="-o" | all tokens in the form layer (A_morph-txt-bbl) that contain exactly the suffix "-o" (suffix for oblique stems, etc.)
A_morph-txt-bbl="c?ovate" | all tokens in the form layer (A_morph-txt-bbl) that contain exactly the word "c?ovate" (the valley where the Tush people come from)
A_morph-gls-en="Tsovata" | all tokens in the gloss layer (A_morph-gls-en) that contain exactly the word "Tsovata" (the valley where the Tush people come from)
A_morph-gls-en="ERG" | all tokens in the gloss layer (A_morph-gls-en) that contain exactly the string "ERG" (ergative case)
A_morph-gls-en="1SG.ERG" | all tokens in the gloss layer (A_morph-gls-en) that contain exactly the string "1SG.ERG" (1st person singular ergative)
A_word-txt-bbl="badri" | all tokens in the word form layer (A_word-txt-bbl: words without morphemic boundaries) that contain exactly the form "badri" (=children)
A_word-pos-en="v" | all tokens in the POS layer (A_word-pos-en) that contain exactly "v" (=verb)
A_morph-txt-bbl="-i" _=_ A_morph-gls-en="OBL" | all tokens that contain exactly "-i" in the form layer and exactly "OBL" in the gloss layer, in the same slot (_=_)
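
When many similar queries are needed (for example, one per gloss), the query strings can be generated rather than typed by hand. The sketch below only assembles AQL strings of the two shapes shown above (`layer="value"` and two such terms joined by `_=_`); it does not contact the ANNIS server, and the helper names `exact` and `same_slot` are hypothetical.

```python
def exact(layer: str, value: str) -> str:
    """Build an exact-match term such as A_morph-gls-en="ERG"."""
    if '"' in value:
        # Quoting rules are not covered by this sketch, so reject such values.
        raise ValueError("value must not contain a double quote")
    return f'{layer}="{value}"'


def same_slot(left: str, right: str) -> str:
    """Join two terms with _=_ so that both must hold for the same slot."""
    return f"{left} _=_ {right}"


query = same_slot(exact("A_morph-txt-bbl", "-i"), exact("A_morph-gls-en", "OBL"))
print(query)
# prints: A_morph-txt-bbl="-i" _=_ A_morph-gls-en="OBL"

# One query per case gloss, ready to be pasted into the query window:
for gloss in ("ERG", "DAT", "GEN"):
    print(exact("A_morph-gls-en", gloss))
```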


Regular expressions in AQL

Regular expressions are enclosed in slashes. Some illustrative examples are given below. More details about regular expressions in AQL can be found in the AQL documentation.

Query | Regular expression | Explanation
A_morph-gls-en=/B[BD]/ | "[...]" contains alternative characters | all tokens in the gloss layer (A_morph-gls-en) that contain "BB" or "BD"
A_morph-gls-en=/(ABL|ALL)/ | "(...|...)" contains alternative strings | all tokens in the gloss layer (A_morph-gls-en) that contain "ABL" or "ALL"
A_morph-txt-bbl=/badri?/ | "?" stands for "the last character is optional" | all tokens in the form layer (A_morph-txt-bbl) that contain the string "badri" or "badr"
A_morph-gls-en=/B+/ | "+" stands for "at least one occurrence" | all tokens in the gloss layer (A_morph-gls-en) that contain at least one occurrence of the character "B", which includes "B" and "BB"
A_morph-gls-en=/AL*/ | "*" stands for "zero or more occurrences" | all tokens in the gloss layer (A_morph-gls-en) that contain "A", "AL", "ALL", "ALLL", etc.
A_morph-gls-en=/.BL/ | "." stands for "any character" | all tokens in the gloss layer (A_morph-gls-en) that contain one character (.) followed by the string "BL", e.g., "ABL", "OBL", etc.
A_phrase-gls-en=/.*vanilla.*/ | ".*" stands for "zero or more occurrences of any character" | all units in the free translation (A_phrase-gls-en) that contain "vanilla"; precisely, any sequence of characters (.*), the string "vanilla", and any sequence of characters (.*)
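
The patterns above can be tried out offline before running them in ANNIS. The sketch below uses Python's `re.fullmatch`, on the assumption that for these simple patterns it behaves like an AQL regular expression, which has to match the whole annotation value (this is why `.*` is needed on both sides of "vanilla"). The list of values and the example sentence are invented test strings, not corpus data.

```python
import re

# Invented test values (not all of them are glosses used in the corpus).
VALUES = ["B", "BB", "BD", "ABL", "ALL", "OBL", "A", "AL", "ERG"]


def matching(pattern: str, values: list) -> list:
    """Return the values that the pattern matches as a whole."""
    return [v for v in values if re.fullmatch(pattern, v)]


print(matching(r"B[BD]", VALUES))      # ['BB', 'BD']
print(matching(r"(ABL|ALL)", VALUES))  # ['ABL', 'ALL']
print(matching(r"B+", VALUES))         # ['B', 'BB']
print(matching(r"AL*", VALUES))        # ['ALL', 'A', 'AL']
print(matching(r".BL", VALUES))        # ['ABL', 'OBL']

# Whole-value matching: a bare word does not match a longer sentence.
sentence = "then you add some vanilla"  # invented example
print(bool(re.fullmatch(r"vanilla", sentence)))      # False
print(bool(re.fullmatch(r".*vanilla.*", sentence)))  # True
```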

More about AQL: AQL documentation site.

sources


grammars

Chrelashvili, K. (2002). C’ova-Tušuri ena [Tsova-Tush Language]. Tbilisi: Tbilisi State University.

Chrelashvili, K. (2007). Tsova-Tushinskij (Bacbijskij) jazyk. Moscow: Nauka.

Desheriev, J.D. (1953). Batsbijskij jazyk: fonetika, morfologija, sintaksis, leksika. Moskva: Izdatel'stvo Akademii nauk SSSR.

Hauk, Bryn and Alice C. Harris (2018). Batsbi. To appear in: Y. Koryakov, Y. Lander, and T. Maisak (eds.). The Caucasian Languages: An International Handbook. Berlin/New York: Mouton.

Holisky, Dee Ann and Rusudan Gagua (1994). Tsova-Tush (Batsbi). In Rieks Smeets (ed.), North East Caucasian Languages, Part 2, 147-212. Delmar, NY: Caravan Books.

Sanikidze, L. (2010). Bacburi (Tsova-Tushuri) ena [Batsbi (Tsova-Tush) Language]. Tbilisi.

Schiefner, Anton (1856). Versuch über die Thusch-Sprache: oder, Die khistische Mundart in Thuschetien [Essay on the Tush language: or, the Kist dialect in Tusheti]. Buchdruckerei der Kaiserlichen Akademie der Wissenschaften.

Shanidze, A. (1970). The tush. Mnatobi 2, Tbilisi.

morphosyntax

Gagua, Rusudan (1943). dziritadi da erttandebuliani brunvebi bacburshi [simple and compound cases in Batsbi]. PhD thesis, Tbilisi State University.

Gagua, Rusudan (1962). Bacburi zmnis asp'ekt'i da ricxvis gamoxatVis saSualebani [Batsbi verbal aspect and the means of depicting number]. Iberiul-k'avk'asiuri Enatmecniereba 13, 261-66.

Hauk, Bryn (2020). Deixis and Reference Tracking in Tsova-Tush. PhD dissertation, University of Hawai'i at Mānoa.

Hauk, Bryn & Bradley Rentz (2019). Tsova-Tush language attitudes and use. Poster presented at the 6th International Conference on Language Documentation and Conservation, Honolulu, HI.

Harris, Alice C. (2009). Exuberant exponence in Batsbi. Natural Language & Linguistic Theory 27, 267–303.

Harris, Alice (2011). Clitics and affixes in Batsbi. In Rodrigo Gutiérrez Bravo et al. (eds.), Representing Language: Essays in Honor of Judith Aissen. Santa Cruz: University of California, 137-155.

Holisky, Dee Ann (1985). A Stone's Throw from Aspect to Number in Tsova-Tush. International Journal of American Linguistics 51(4), 453-455.

Holisky, Dee Ann (1994). Notes on Auxiliary Verbs in Tsova-Tush (Batsbi). In Howard I. Aronson (ed.), Non-Slavic languages of the USSR: Papers from the fourth conference. Ohio: Slavica Publishers, 143-159.

Holisky, Dee Ann (1987). The case of the intransitive subject in Tsova-Tush (Batsbi). Lingua 71, 103-132.

Wichers Schreur, Jesse (2021). Nominal borrowings in Tsova-Tush (Nakh-Daghestanian, Georgia) and their gender assignment. In Diana Forker & Lenore A. Grenoble (eds.), Language contact in the territory of the former Soviet Union, 15–33. Amsterdam: John Benjamins.

Wichers Schreur, Jesse (2025). Intense language contact in the Caucasus: The case of Tsova-Tush. Berlin: Language Science Press.

Wichers Schreur, Jesse, Marc Allassonière-Tang, Kate Bellamy, Neike Rochant (2022). Predicting grammatical gender in Nakh languages: Three methods compared. Linguistic Typology at the Crossroads 2-2, 93-126.

sounds

Gagua, Rusudan (1956). Zogierti ponet’ikuri p’rocesi batsburi enis xmovnebši. Ibero-Caucasian linguistics, V8.

Imnaishvili, D. (1977). Istoriko-sravnitelnij analiz fonetiki naxskix jazikov [Historical-comparative analysis of the phonetics of the Nakh languages]. Tbilisi.

Mikeladze, M. (1977). Xmovanta redukcia bacbur enaši [Reduction of vowels in the Batsbi language]. Macne 3, 118-127.

words

Bertlani, A., Mikeladze, A., K. Gigashvili (2012-2019). Tsovatush-Georgian-Russian-English dictionary, vol. 1-4. Tbilisi: Saari.

Fähnrich, H. (2001). Batsisch (Zowatuschisch)-Deutsches Wörterbuch. Jena: Friedrich-Schiller Universität.

Kadagidze, D. & N. Kadagidze (1984). C’ova-tušur-kartul-rusuli leksik’oni [Tsova Tush-Georgian-Russian dictionary]. Tbilisi: Mecniereba. Volume 1, Volume 2, Volume 3, Volume 4.

data

Hauk, Bryn (2020). Batsbi (Tsova-Tush) Repository at Scholar Space of the University of Hawaiʻi at Mānoa.

Kakashvili, Diana, and Stavros Skopeteas. (2025). Tsova-Tush (Bats/bi) spoken data corpus (1.0.0) [Data set]. Zenodo. (doi: 10.5281/zenodo.15863418)

Tsiskarishvili, Tinatin. (2025). Common Voice Scripted Speech 24.0 - Tush.

events

Kakashvili, Diana, Léa Nash, and Jérémy Pasquereau. (2025). CAUcasian LAnguages in GEorgia: Linguistic fieldwork summer program. August 18 to 31, 2025, Zemo Alvani, Georgia.

citation

Kakashvili, D., & Skopeteas, S. (2025). Tsova-Tush (Bats/bi) spoken data corpus (1.0.0) [Data set]. Zenodo. (doi: 10.5281/zenodo.15863418)