Atlas of Yucatec Maya Online

Yucatán

This was the peninsula of Yucatán as found by Frederick Catherwood and John Lloyd Stephens during their expedition in 1839-1840 (published as Map of Yucatán in Stephens, J.L. 1843, Incidents of Travel in Yucatán. London.):

Yucatec Maya is currently spoken by 796,405 speakers in the peninsula of Yucatán, in the Mexican states of Yucatán, Quintana Roo, and Campeche (census 2010; INEGI, 2011), as well as by a small group of speakers (estimated 2,000) in the northern part of Belize.

Yucatán is a nearly flat peninsula without significant geographical barriers such as rivers or mountains. The central, southern and eastern parts of the peninsula are covered by tropical forest that until the beginning of the 20th century was only traversed by forest trails. A different situation developed in the northwestern part of the peninsula, which was exploited for the production of henequen (sisal hemp) from the second half of the 19th century onward. This gave rise to labor migration and to the creation of a network of roads and railway lines connecting the larger henequen fincas, the capital Mérida, and the ports for shipment (Huntington, 1912:801, 820). Up to the beginning of the 20th century, the major road connections were between Mérida and Valladolid in the north, Mérida and Campeche in the west, and Mérida and Peto in the center of the peninsula (Huntington, 1912; Redfield & Villa Rojas, 1962:3; Stephens, 1843). The southeastern part of the peninsula (the present state of Quintana Roo) was less accessible through road connections in the past and was also administratively separated from the rest of the peninsula during the Caste War (1847-1933).

Atlas

How does linguistic variation evolve in geographical space? Is the language situation of indigenous American languages reflected in the patterns of variation? Does grammatical and lexical variation display spatial biases?

Our aim is to understand the sources of variation in contemporary Yucatec Maya, with a major focus on variation in geographical space. For this purpose, we created a corpus resource with data from different locations in the Yucatecan peninsula. The data was elicited with a questionnaire covering various areas of variation in this language.

Key data: 2026-03-06

The present resource provides various possibilities to explore, summarize, and visualize this data in order to answer questions about dialectal variation in Yucatec Maya. The aim of this dataset is to offer a basis for the study of variation of Yucatec Maya in geographical space. The speaker sample contains 176 native speakers from sample locations in the peninsula of Yucatán. The data was collected with Spanish prompts (either single words or complex expressions) that were part of a questionnaire with sections on lexicon, phonology, morphology, and syntax. The native speakers were instructed to translate the Spanish prompts into Yucatec Maya.

Locations

The data was collected in 86 locations in the peninsula of Yucatán covering all areas in which the language is currently spoken.

The gradient red values display the proportion of Yucatec Mayan speakers in the location at issue (dark red = high proportions of native speakers; light red = low proportion of native speakers), based on the data from the INEGI census 2010.

Click on a location to see the exact census numbers in a popup. Total: population total of the location, according to the 2010 census; Indigenous: number of speakers declaring that they speak an indigenous language, and their proportion out of the population total.

Speakers

The speaker sample contains 176 speakers, who were born and lived (at least 30 years) in the sample locations. The instructors took care to interview speakers who were maximally competent in Yucatec Maya. The data were collected between 2000 and 2007.

The sample contained 117 women and 59 men. The birth years of the speakers range between 1906 and 1989, almost normally distributed around the median 1953 (mean 1953.2).

Questions

The questions were part of a long questionnaire containing sections on Phonology, Morphonology, Morphology, Syntax, and Lexicon. The prompts were maximally simple expressions in order to elicit the material of interest and to reduce the complexity that would arise in more naturalistic contexts. Nominal concepts were translated with simple word-by-word elicitation. Further categories and grammatical formatives were elicited within minimal expressions that are potential clauses.

Difficulties in interpretation arise when the translation between Spanish and Yucatec Maya is not one-to-one; these cases are discussed within the database. In some cases in which more than one Yucatec Maya concept corresponds to the Spanish prompt, we observe geographical spaces for certain options, which is very informative for the layers of variation (and indirectly for differences in markedness asymmetries between geographical spaces).

The selected material contained elements known to vary between dialects or further instances of variation at the lexical or grammatical level that were included to be checked for geographical biases.

Explore data

navigate through the database by prompt.

Choose Question

Choose Annotation

Gather data

aggregated data based on annotations.

You may select a phenomenon (i.e., an annotation layer) from the dropdown list:

Layer:

restrict the selection to a subset of the data:

Subsets:

and restrict the selection to certain prompts:

Questions:

Create data

wizard assisting you creating maps.


Choose a set of questions:

Download the data (without duplicates):

Download not-annotated file

open the yucdata.csv file (encoding: UTF-8, separator: comma)

label the column D as "Value" and save your annotations there; empty cells and values labeled NV (="no value") are ignored.

save your file (filetype: csv, encoding: UTF-8)
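The annotation steps above can be checked locally before uploading. The following is a minimal sketch, assuming that the fourth column is headed "Value" as instructed; the names of the first three columns and the sample rows are hypothetical and only serve as an illustration.

```python
import csv
import io

# Hypothetical file content: the first three column names are assumptions;
# only the fourth column ("Value") and the NV convention come from the steps above.
SAMPLE = (
    "Question,Speaker,Form,Value\n"
    "Q106,S030,alux,l\n"
    "Q106,S031,arux,r\n"
    "Q106,S032,alux,\n"
    "Q106,S033,alux,NV\n"
)

def usable_rows(handle, separator=","):
    """Return the rows whose Value cell is neither empty nor NV."""
    reader = csv.DictReader(handle, delimiter=separator)
    return [row for row in reader if row["Value"].strip() not in ("", "NV")]

rows = usable_rows(io.StringIO(SAMPLE))
for row in rows:
    print(row["Speaker"], row["Value"])
```

To run the same check on a saved file, the handle could be opened with open("yucdata.csv", encoding="utf-8", newline="") and passed to usable_rows together with the separator chosen in the upload form.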

Separator

Comma

Semicolon

Tab

Quote

None

Double Quote

Single Quote

upload your CSV File

Browse...

You may download a sample annotated file here and upload it to check the procedure:

Download example file

More information about importing/exporting .csv

Order colors

blue

red

lightblue

purple

yellow

black

white

orange

darkred

darkblue

antiquewhite

cornsilk

chocolate

gold

grey

darkorange

Space plot

points by Speaker; pies by Location; color gradients by Location (only for binary features)

Time plot

Estimates

Population-size plot

Estimates

Indigenous-Population density plot

Estimates

Spanish-bias plot

Estimates

Gender plot

Data Table

Notes

Dictionaries

Data

Transcriptions are nearly phonological, rendering the realized segments by the speakers (i.e., not the orthographic representation of the word) in the Mayan orthography. The data were transcribed by four linguists who are native speakers of Yucatec Maya (see menu "Team").

Vowel inventory

Five vowel qualities are represented in the Yucatec Mayan orthography (italics: orthography; []: phonetics):

i [i]

u [u]

e [ɛ]

o [ɔ]

a [a]

Schwa is not contrastive and not represented in the orthography. For instance, it appears in epenthesis, preceding roots with central vowels (while non-low vowels have [i] epenthesis). Since the transcriptions of the present database only contain phonological entities, schwa epenthesis is not transcribed in the data. Epenthetic vowels (Bricker & Orie 2014: 182):

k chamal-o’ob → k[ə] chamal-o’ob ‘(1PL cigarette-PL) our cigarettes’

Syllable types

Yucatec Maya distinguishes between four syllable types, depending on the realization of the vowel and applying to all vowels. The four realizations are distinctive (examples from Bricker et al. 1998: 123, 254; see also Lehmann 1990: 34f.). Short vowels are non-tone-bearing units, i.e., they do not host a lexical tone. Long vowels (plain bimoraic vowels) display a phonological contrast between high and low tonic vowels. High vowels contain a high pitch target aligned either with the first or the second mora, while low vowels do not contain a high pitch target and are realized with a low pitch plateau in careful speech. Rearticulated vowels are realized with an intervening glottal stop (in careful speech) or as glottalized vowels (Lehmann 1990: 35). The rearticulated vowels are always realized with a high pitch target in the first mora (see Gussenhoven and Teeuw 2007), which is a possible prosodic realization for the high long vowels. This means that in the absence of glottalization, which may happen in spontaneous data, rearticulated vowels are indistinguishable from high tone vowels.

short: a, e.g., xan 'too'

long/low: aa, e.g., xaan 'slowly'

long/high: áa, e.g., xáan 'delay'

long/rearticulated: a'a, e.g., xa'an 'palm'
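The four orthographic patterns listed above can be recognized mechanically when working with the transcriptions. The sketch below is an illustration only, not part of the database tooling: the function name is hypothetical, and it assumes that accented vowels are stored as single precomposed characters and that only the first vowel nucleus of the word is of interest.

```python
import re

# Longer nuclei are tried before shorter ones, so that "aa" is not read as two short vowels.
NUCLEUS = re.compile(
    r"(?P<rearticulated>a['’]a|e['’]e|i['’]i|o['’]o|u['’]u)"
    r"|(?P<high>áa|ée|íi|óo|úu)"
    r"|(?P<low>aa|ee|ii|oo|uu)"
    r"|(?P<short>[aeiou])"
)

LABELS = {
    "rearticulated": "long/rearticulated",
    "high": "long/high",
    "low": "long/low",
    "short": "short",
}

def first_nucleus_type(word):
    """Classify the first vowel nucleus of a transcribed word, or return None."""
    match = NUCLEUS.search(word)
    if match is None:
        return None
    for name, label in LABELS.items():
        if match.group(name):
            return label
    return None

for example in ["xan", "xaan", "xáan", "xa'an"]:
    print(example, first_nucleus_type(example))
```

Since glottalized consonants such as k' are written with the same apostrophe, the pattern only treats an apostrophe as part of the nucleus when it stands between two identical vowels.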

Minimal pairs exist (see examples above) but are generally rare, which means that tonal realization is rarely crucial for communicating meaning. There is a lot of variation in the tonal properties of certain words, also attested in dictionaries.

The transcribed data reflect the auditory impression of the transcribers, which is certainly not totally reliable, given the fine-grained cues of tonal perception and the fact that tones are realized with substantial variation in spoken Yucatec Maya. The intuitions of the transcribers may be informative if they display areal biases, in which case it would be worth examining the recordings more carefully for more precise reports. Some illustrative examples:

Question Q005 ‘monte’:

S028: k'ax

S008: k'áax

S009: k'aax

S113: k'a'ax

Question Q009 ‘sombrero’:

S001: p'ok

S003: p'óok

S145: p'ook

Consonant inventory

The inventory of consonants that appear in native words is listed below (italics = orthography; [x] = corresponding IPA value).

plosive, voiceless: p [p] t [t] k [k] ' [ʔ]

plosive, ejective: p’ [p’] t’ [t’] k’ [k’]

plosive, implosive: b [ɓ]

plosive, voiced: b [b]

nasal: m [m] n [n] n [ŋ]

affricate, simple: ts [ʦ] ch [ʧ]

affricate, ejective: ts’ [ʦ’] ch’ [ʧ’]

fricative: s [s] x [ʃ] j [h]

approximant: w [ʋ] l [l] y [j]

trill: r [r]
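Because several consonants are written with two or three characters (ts, ch, and the ejectives), counting or comparing segments in the transcriptions requires splitting words into orthographic units first. The following is a minimal sketch of a greedy longest-match segmentation; the function name is hypothetical, and it assumes that every sequence t+s or c+h is an affricate and that an apostrophe after p, t, k, ts, or ch always marks an ejective.

```python
import re

# Alternatives are ordered from longest to shortest so that ts' wins over ts and t.
GRAPHEME = re.compile(r"ts'|ch'|[ptk]'|ts|ch|.")

def graphemes(word):
    """Split a transcribed word into orthographic units."""
    # Normalize the typographic apostrophe to the plain one before matching.
    return GRAPHEME.findall(word.replace("’", "'"))

print(graphemes("k'áax"))    # ejective k' is kept as one unit
print(graphemes("chamal"))   # affricate ch is kept as one unit
print(graphemes("xa'an"))    # the glottal stop between vowels is a unit of its own
```

Vowel length and tone are deliberately left unanalyzed here: long vowels come out as two vowel letters.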

Velar nasal

Velar nasals appear in word-final contexts. Table 3 summarizes some occurrences of the velar nasal in the data – all found in the data from Campeche (S020, S027, S033, S037), which suggests a possible dialectal bias. Since the velar nasal is not contrastive in Yucatec Maya, it is not represented in the transcriptions of the data (all these cases are transcribed as nasal <n>).

Voicing

Voicing of unvoiced consonants due to assimilation is not transcribed in the data, e.g., jaantik may correspond to jaan[t]ik or jaan[d]ik.

Trill

A trill r appears in Spanish words, e.g., kwáatrooj ‘four’. In native roots, a trill r appears only in intervocalic contexts in a set of onomatopoetic roots, e.g., arux – alux ‘forest spirit’, x turix - x tulix ‘dragon fly’ (Bastarrachea Manzano & Canto Rosado 2003: 6, 213-214; Bricker et al. 1998: xii). There is no evidence for a contrast between [l] and [r] in the native vocabulary.

The transcriptions of the data are based on the perception of the native transcribers. The variation in native words is relevant for the dialectal variation in Yucatec Maya, since the realization with a trill appears more often in the central and eastern part of the peninsula. Therefore this distinction is faithfully rendered in the transcribed data.

Question Q106 ‘duende’:

S030: alux

S031: arux

Coda weakening

A well-studied phenomenon of Yucatec Mayan phonology is coda weakening in root final contexts (either word internal or word final). Ejective codas reduce to glottal stops (u láak’ vs. u láa’ ‘3SG other’) and non-ejectives to the glottal fricative j (xíimbal vs. xíimbaj ‘walk’); see Orie & Bricker (2000: 296-297). Either type of coda can be further weakened to zero.

The transcriptions in the data reflect the auditory impression of the transcribers and reveal cases of variation that are not expected under current phonological assumptions. This variation must be treated with caution, also given the fact that these realizations are not always distinguishable in non-laboratory recordings:

Question Q508 ‘estoy caminando’:

S001: tin xiimba

S015: tin xiimba'

S016: tin xíimbaj

S017: táan in xíimbal
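The expected weakening pattern (ejective coda to glottal stop, non-ejective coda to j, either one to zero) can be spelled out as a small helper that lists the predicted variants of a root, for comparison with the transcribed forms. This is a sketch under simplifying assumptions: the function name is hypothetical, the input is taken to be a bare root in the Mayan orthography, and roots already ending in j, in a glottal stop, or in a vowel are left without predictions.

```python
import re

EJECTIVE_CODA = re.compile(r"(ts|ch|p|t|k)'$")
PLAIN_CODA = re.compile(r"(ts|ch|[bptkmnsxwlyr])$")

def weakened_variants(root):
    """Return the predicted weakened forms of a root: [reduced coda, zero coda]."""
    root = root.replace("’", "'")
    if EJECTIVE_CODA.search(root):
        stem = EJECTIVE_CODA.sub("", root)
        return [stem + "'", stem]   # ejective -> glottal stop -> zero
    if PLAIN_CODA.search(root):
        stem = PLAIN_CODA.sub("", root)
        return [stem + "j", stem]   # non-ejective -> j -> zero
    return []

print(weakened_variants("láak'"))
print(weakened_variants("xíimbal"))
```

Forms that fall outside these predictions are the ones that call for a closer look at the recordings.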

Consonants in Spanish words

Spanish words are rendered in the Mayan orthography:

Spanish: apellido, Yucatec Maya: áapeyiido

Spanish: México, Yucatec Maya: Meejiko

The following letters of the Spanish orthography were retained, since there are no corresponding letters in the Yucatec Mayan orthography: rr, f, d, g, and gu. They appear only in words of Spanish origin:

rr: Káarriyo

f: Feliipe, fíinka

d: saandia, Peedro

g: igwal, gavilán

gu: aguila, oso jormiguero

Conventions for word segmentations follow a principle of maximal transparency:

Epenthesis

Epenthetic y-/w- is written with the word to which it is syllabified:

tin wilik, m-transcription: t=in w-il-ik (PFV=A.1SG 0-see-CMPL) 'I saw'

tu yilik, m-transcription: t=u y-il-ik (PFV=A.3SG 0-see-CMPL) 'he saw'

Definite article

The definite article is separated from preceding material, even if it is syllabified with it in the phonetic realization, except if the preceding material cannot form a phonological word, as with the reduced form of the preposition t(i') 'LOC'. If a word ending in l is followed by the definite article le, only one of the l's survives. We assign the surviving l to the preceding word and represent the article by e, because the latter can also appear after consonants other than l.

te k'ano', m-transcription: t=e k'an=o' (LOC=DEF =D2) 'in the hammock'

in ti'l e yao' (instead of in ti' le yao')

Possessors

Possessors are separated from the possessed noun.

u baj (instead of ubaj)

tu láak' (instead of tuláak')

Auxiliaries

The aspectual auxiliaries k-/t-, which cannot form a phonological word, are written as one word with the person clitic. Auxiliaries that may form separate phonological words are separated from the person clitic.

tin wilik (instead of t in wilik)

táan in wilik

Noun Classifiers

Noun classifiers are written as prefixes of the noun they specify.

jmiis (instead of j miis)

Numeral classifiers

Numerals are separated from the classifiers.

jun túul áak (instead of juntúul áak)

Postverbal material

Postverbal objects are written as separate words, also in cases in which they may be incorporated – unless there is morphological evidence for incorporation (when the verbal suffix follows the incorporated object), in which case they are not separated.

tin chak si' (instead of tin chaksi')

in wala' peek' (instead of inw ala' peek')

Postverbal material is separated from the inflected stem, e.g., the reflexive táan k machk u baj (instead of táan k machkubaj), the PP tio’

Alternative translations

Whenever speakers offer more than one translation, their alternative translations are given in the order of presentation and are separated by a hash sign (#):

Q002 (uno), S016: uno p'éej # u p'éej

Q006 (¿cómo?), S176: bixíij # bíix

Q019 (incienso), S037: inseensiyóoj # pom

Self corrections

Whenever the native speaker informs the instructor that she is not satisfied with the reply at issue, the data is not included in the database.

Q159 (siete), S086: o' p'eej... no, Form: -

Whenever the native speaker gives an inappropriate translation in a first trial and then corrects her/himself, only the revised trial is displayed:

Q002 (uno), S110: uuno # um p'eej, Form: um p'eej

Q003 (¿lo estás viendo?), S004: táan in wilik # tan wa wilik tech, Form: tan wa wilik tech
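When processing the downloaded data, the two conventions above amount to splitting a response at the hash sign and, for self-corrections, keeping only the last part. The sketch below illustrates this; the function names and the self_corrected flag are hypothetical, since the response string alone does not indicate whether the parts are alternatives or a correction.

```python
def split_alternatives(response):
    """Split a response into its parts, in the order of presentation."""
    return [part.strip() for part in response.split("#") if part.strip()]

def displayed_forms(response, self_corrected=False):
    """Return all alternatives, or only the revised trial for a self-correction."""
    parts = split_alternatives(response)
    if self_corrected:
        return parts[-1:]   # only the last (revised) trial is kept
    return parts

print(displayed_forms("inseensiyóoj # pom"))
print(displayed_forms("uuno # um p'eej", self_corrected=True))
```

Whether a given response counts as a self-correction has to come from the annotation, not from the transcription.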

Snippets

Dialectal variation

Variation in Yucatec Maya has the form of a dialectal continuum with regional varieties arising through dialectal variants that are diffused with different patterns in geographical space (Blaha Pfeiler & Hofling 2006). Some studies report that the major axis of dialectal differentiation is between the western and the eastern part of the peninsula (Tozzer 1921, Edmonson 1986). Dialectal regions have been identified in earlier research mainly by means of phonological and lexicological variants (Bastarrachea Manzano & Canto Rosado 2003, Briceño 2002, Pfeiler 2014):

North: (a) the Agave Area and (b) the Metropolitan area (around Mérida);

East: (a) the northeastern part (around Valladolid) and (b) the central eastern part of Quintana Roo;

Center: (a) the southern part of the State of Yucatán and (b) the center of Quintana Roo;

West: (a) the region Los Chenes and (b) the zone of Camino Real.

Based on the present resource, Blaha Pfeiler and Skopeteas (2022) analyzed the similarity between locations, based on the variation in lexical choice. The colors of the map on the right visualize the aggregated results, whereby similarity in color means similarity in lexical choice.

Numeral Classifiers

In Yucatec Maya (as in many other Mayan languages), a numeral classifier is generalized and replaces more specific sortal classifiers. The general classifier is also added to mensural classifiers, rendering complex classifier constructions. Do these phenomena result from the same process of language change?

The dispersion of the data in geographical space indicates that these developments are only partially correlated; another part of the variation is explained by developments in the mensural classifiers (see Blaha Pfeiler & Skopeteas 2024).

Endangered languages and patterns of convergence

What is special about the language situation of minority languages? The diffusion of linguistic features in space is captured by Gravity Models, which assess the role of urban centers in determining social interactions and their concomitant reflexes on the dispersion of linguistic features in geographical space.

The predictions of the Gravity Models are presented by the simulation on the right, which displays the similarities between locations as predicted by a gravity model with random parameters.

However, in the case of minority languages that are more strongly preserved in rural areas, the role of urban centers is diminished. The properties of variation in these language situations are reflected in the estimates of the Gravity Model (see Blaha Pfeiler & Skopeteas 2022).
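For readers unfamiliar with gravity models, the basic idea can be sketched in a few lines: the expected interaction between two locations grows with the product of their population sizes and shrinks with the distance between them. The code below uses the classic textbook form with a squared distance; this is an assumption made for illustration and not necessarily the parametrization estimated in the cited study, and the population and distance figures are invented.

```python
def gravity_interaction(pop_i, pop_j, distance, distance_exponent=2.0):
    """Classic gravity score: product of populations over a power of the distance."""
    if distance <= 0:
        raise ValueError("distance must be positive")
    return (pop_i * pop_j) / distance ** distance_exponent

# Hypothetical figures: a city of 800,000 and two villages of 2,000 inhabitants each.
city_village = gravity_interaction(800_000, 2_000, distance=50)
village_village = gravity_interaction(2_000, 2_000, distance=10)
print(city_village, village_village)
```

In this form, the large city dominates the interaction even at a greater distance, which is the prediction that is weakened when the language is mainly preserved in rural areas.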

Project publications

Blaha Pfeiler B, Hofling A. 2006. Apuntes sobre la variación dialectal en el maya yucateco. Península; 1:27–44.

Blaha Pfeiler B, Skopeteas S. 2022. Sources of convergence in indigenous languages: Lexical variation in Yucatec Maya. PLoS ONE 17(5): e0268448. https://doi.org/10.1371/journal.pone.0268448

Blaha Pfeiler B, Skopeteas S. 2024. Numeral Classifiers in Yucatec Maya: Microvariation and syntactic change. Journal of Historical Syntax 8.6.

Montañez Giustinianovic A. 2024. Atlas del maya yucateco en línea: mapas dialectológicos.

Cited Literature

Programming

This website is written in pure R (just adding some css):

R Core Team (2022). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL https://www.R-project.org/. R version 4.2.1 (2022-06-23 ucrt)

Web apps package:

Chang W, Cheng J, Allaire J, Sievert C, Schloerke B, Xie Y, Allen J, McPherson J, Dipert A, Borges B (2022). _shiny: Web Application Framework for R_. R package version 1.7.2, <https://CRAN.R-project.org/package=shiny>.

Maps:

Cheng J, Karambelkar B, Xie Y (2022). _leaflet: Create Interactive Web Maps with the JavaScript 'Leaflet' Library_. R package version 2.1.1, <https://CRAN.R-project.org/package=leaflet>.

Various widgets (picker inputs, button ordering, minicharts, etc.) and facilities:

Perrier V, Meyer F, Granjon D (2022). _shinyWidgets: Custom Inputs Widgets for Shiny_. R package version 0.7.1, <https://CRAN.R-project.org/package=shinyWidgets>.

Tang Y (2022). _shinyjqui: 'jQuery UI' Interactions and Effects for Shiny_. R package version 0.4.1, <https://CRAN.R-project.org/package=shinyjqui>.

Bachelier V, ZAWAM J, Thieurmel B, Guillem F (2021). _leaflet.minicharts: Mini Charts for Interactive Maps_. R package version 0.6.2, <https://CRAN.R-project.org/package=leaflet.minicharts>.

Vaidyanathan R, Xie Y, Allaire J, Cheng J, Sievert C, Russell K (2021). _htmlwidgets: HTML Widgets for R_. R package version 1.5.4, <https://CRAN.R-project.org/package=htmlwidgets>.

Cheng J, Sievert C, Schloerke B, Chang W, Xie Y, Allen J (2022). _htmltools: Tools for HTML_. R package version 0.5.3, <https://CRAN.R-project.org/package=htmltools>.

Data processing:

Wickham H, François R, Henry L, Müller K (2022). _dplyr: A Grammar of Data Manipulation_. R package version 1.0.9, <https://CRAN.R-project.org/package=dplyr>.

Plots:

Wickham H (2016). _ggplot2: Elegant Graphics for Data Analysis_. Springer-Verlag New York. ISBN 978-3-319-24277-4, <https://ggplot2.tidyverse.org>.

Questionnaire

Barbara Blaha Pfeiler

Andrew Hofling

contributions by Fidencio Briceño Chel and Domingo Dzul

Data collection

Yuri Balam

Barbara Blaha Pfeiler

Ernesto Aké Ciau

Flor Canche Teh

Carlos Carrillo Carreón

Cessia Chuc Uc

Evaristo Dzul Caamal

Andrés Hofling

Israel Naím Corripio

Ismael May May

Jorge Monforte

Lorena Pool Balam

Ricardo Santos

Martín Sobrino Gómez

Ceydi

Daysi

Wilbert

Speakers

The core actors of this resource must remain anonymous: grateful thanks to S001, S002, ... S176 for their contribution to this project.

Digitalization

Phonogrammarchiv der Österreichischen Akademie der Wissenschaften, Wien.

(www.oeaw.ac.at/phonogrammarchiv/home)

Transcriptions

Ernesto Aké Ciau

Flor Canche Teh

Jaime Chi

Ismael May May

Revisions

Barbara Blaha Pfeiler

Christian Lehmann

Stavros Skopeteas

Elisabeth Verhoeven

Data curation

Alina Sementsova

Fernando García Mendivil

Anna Pessarrodona Marfà

Paulien Veenstra

Annotations

Barbara Blaha Pfeiler

Christian Lehmann

Stavros Skopeteas

Elisabeth Verhoeven

Advice

Antonio Gonzalez Poot

Rodrigo Gutiérrez Bravo

Christian Lehmann

Funding

CONACyT (Barbara Blaha Pfeiler)

DFG (Stavros Skopeteas)

DFG (Elisabeth Verhoeven)

Webmaster

abeja maya

Content on this site is licensed under a
Creative Commons Attribution-ShareAlike 2.0 Generic (CC BY-SA 2.0) License.

Cite as:
