The summer school offers four four-day courses covering statistical modelling and corpus annotation. There will then be four courses on special research topics (two days each): sign-language corpora, geo-localized data, syntax annotation, and eye-tracking. There are both theoretical and experimental courses covering the topic from cross-linguistic and cross-modular perspectives. Participants will have the opportunity to present their work as a talk or a poster at a workshop at the end of the first week.
foundational courses
Foundational Course 1: statistics
Introduction to Inferential Statistics
by Maik Thalmann (University of Göttingen)
19–22 September, lecture 9:30–10:30, practice 13:00-14:00
Course Information and Materials
This class will take as its starting point the t-test and the χ2-test, which, together with how to compute them in R, will also form the assumed background knowledge. From there, we will work our way through other common ways to analyze data from linguistic experiments and corpus studies alike. Along the way, we will discuss not only how to perform these tests in R but also frequent problems analysts face when deciding which test to run and what kinds of parameters to use. Our central goal will be to equip attendees with the knowledge necessary for Alexandra Lorson and Vinicius Macuch’s class Introduction to generalized linear modeling for linguists in the second week of this summer school.
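As a rough illustration of the assumed background, the sketch below computes Pearson's χ2 statistic for a small contingency table by hand. It is written in Python with the standard library only, whereas the class itself works in R (where `chisq.test` does this job, by default with a continuity correction for 2×2 tables that this sketch does not apply). The function names and the counts are invented for the example.

```python
import math


def chi_square_independence(table):
    """Return (statistic, degrees_of_freedom) for Pearson's chi-square test."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df


def p_value_df1(statistic):
    """Upper-tail p-value; this shortcut is valid only for one degree of freedom."""
    return math.erfc(math.sqrt(statistic / 2))


# Hypothetical counts: rows = two corpora, columns = construction A vs. B.
counts = [[30, 10], [20, 40]]
stat, df = chi_square_independence(counts)
p = p_value_df1(stat)
print(f"chi2 = {stat:.3f}, df = {df}, p = {p:.2g}")
```

For tables with more than one degree of freedom, the p-value would need the general χ2 distribution, which a statistics package provides.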
Requirements and materials
Requirements and materials will be offered at the following repository: https://mkthalmann.github.io/intro-stats/.
Foundational Course 2: corpus analysis
Information- and discourse-structure analysis with questions under discussion
by Arndt Riester (Universität Bielefeld) & Kordula De Kuthy (University of Tübingen)
19–22 September, lecture 11:00–12:00, practice 13:00-14:00
Course Information and Materials
This class is an introduction to the QUD-tree framework (Riester, Brunetti and De Kuthy 2018). Determining the so-called *questions under discussion* of a discourse is a means to analyze both the information structure (i.e. focus-background divide) of the discourse segments and the overall topical organization of the discourse itself. QUD trees are, therefore, a means to represent discourse structure, different from but compatible with rhetorical relations. The class is also devoted to the practical task of annotating information-structural categories in texts from different languages, and to distinguishing at-issue from non-at-issue segments.
Requirements/Preparation
In the class we will make use of the QUDA tool, which can be freely downloaded at: https://github.com/MMLangner/QUDA. Participants are kindly asked to install this tool on their laptops prior to the class, which should be possible based on the installation guidelines. The process is a bit cumbersome, and some people might experience technical problems. In this case, do not worry: we will address technical problems during the practice session. But please give it a try!
Materials
Additional materials will be offered at the following webpage: https://intro-qud.github.io/
Foundational Course 3: statistics
Introduction to generalized linear modeling for linguists
by Alexandra Lorson (University of Birmingham) and Vinicius Macuch (University of Birmingham)
27–30 September, lecture 9:30–10:30, practice 13:00-14:00
Course Information and Materials
Welcome to the linear modelling course for the Göttingen Summer School!
In this workshop, you will learn how to analyse linguistic data with R, RStudio, and the tidyverse. We will introduce you to statistical models with an emphasis on the (generalized) linear model framework, including mixed models. More specifically, we will be covering topics such as interpreting interactions, fitting logistic and Poisson regression models, random intercepts and random slopes, as well as convergence issues. Importantly, you will also learn how to do data analysis in a way that is open and reproducible.
Requirements/Preparation
In this course we will be using R and RStudio, which you will have to install prior to the start of the course. Even if you are an experienced R user, it may still be good to re-install R, RStudio, and the specified R packages to make sure that we're all working with the most up-to-date version.
If you are new to R, it would be worth checking out this free DataCamp course:
www.datacamp.com/courses/free-introduction-to-r
For keen beans, the following video will help you with installing R and RStudio and introduce you to the world of R:
https://www.youtube.com/watch?v=lVKMsaWju8w
Materials
We will provide you with more information on how to install R and RStudio in due time here:
https://osf.io/4cpyz/
This will also be the place where you can find the course materials (data files, resources etc.).
Foundational Course 4: corpus analysis
Automatic methods for corpus-based linguistic research
by Stefanie Dipper (Ruhr-University Bochum)
27–30 September, lecture 11:00–12:00, practice 13:00-14:00
Course information and Materials
Materials
research topics
Research topic 1: sign languages
Using sign language corpora for linguistic research
by Marloes Oomen (University of Amsterdam)
19–20 September, 15:30–17:00
Course Information and Materials
Recent years have seen a boom in corpus-based linguistic research on sign languages, following the creation of multiple sign language corpora. These corpora are invaluable for the documentation of sign languages as well as continued linguistic research on their grammatical structure and degree of intra-linguistic variation. At the same time, given the relatively small size of all existing sign language corpora, the time-consuming annotation process, and the present scarcity of automatic annotation tools, using sign language corpora for linguistic research involves different challenges than working with (large) (majority) spoken language corpora. We will look at examples from the literature to explore what type of research questions are ideally tackled by corpus-based studies on sign languages. You will then get some hands-on experience working with data from the German Sign Language Corpus. Using ELAN Linguistic Annotator, you will create and analyze your own annotations in a small subset of the data in this corpus to answer a basic research question. Familiarity with sign languages and sign language linguistics is not required.
Requirements/Preparation
In this course, we will be using ELAN Linguistic Annotator, which you will have to install before the course starts. The latest version can be downloaded here: https://archive.mpi.nl/tla/elan. While you’re at it, feel free to peruse the full manual and/or how-to guide under the ‘documentation’ tab.
We will be working with data from the German Sign Language Corpus (DGS Corpus), available at https://www.sign-lang.uni-hamburg.de/meinedgs/ling/start_en.html. I recommend reading the general information on the homepage prior to the start of the course.
Materials
Some 50 hours of material from the DGS Corpus, including transcription files, are freely available on the DGS Corpus website. A selection of these data will be used during the course and will be made available to you in due time.
Research topic 2: language and space
Language variation and geo-localized data
by Olga Kellert (University of Göttingen)
21–22 September, 15:30-17:00
Course Information and Materials
Abstract
This class will give an introduction to geolinguistics, which is a branch of linguistics and geography. We will look into social media text messages that are associated with location information. This location information can be expressed by an exact address, e.g. Humboldtallee 19, Göttingen. We will first look at language distribution in multilingual countries to trace language borders, e.g. where French and where Dutch is spoken in Belgium. We will do the same with multilingual cities such as New York and see what languages are spoken (the most) in Chinatown and other parts of Manhattan, The Bronx, and Queens. We will then look at the distribution of regional language varieties or dialects and trace the border between the dialectal words Semmel ‘bread roll’ and Brötchen ‘bread roll’ in German. Finally, we will visualise the distribution of loanwords from indigenous languages in South America (e.g. pucho ‘cigarette’, cancha ‘pitch’, palta ‘avocado’, etc.). Students will learn the techniques required to visualise language and dialect distribution on geographic maps.
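One simple way to approach the Semmel/Brötchen question is to bin geo-tagged messages into grid cells and record which variant dominates in each cell. The sketch below shows that idea with the Python standard library only. It is an illustration, not the method used in the class: the function names, the one-degree cell size, the coordinates, and the messages are all invented for the example.

```python
import math
from collections import Counter, defaultdict


def grid_cell(lat, lon, cell_size=1.0):
    """Map a coordinate to the south-west corner of its grid cell."""
    return (math.floor(lat / cell_size) * cell_size,
            math.floor(lon / cell_size) * cell_size)


def dominant_variant_per_cell(messages, variants, cell_size=1.0):
    """Return {cell: most frequent variant} for (lat, lon, text) messages."""
    counts = defaultdict(Counter)
    for lat, lon, text in messages:
        tokens = text.lower().split()  # naive whitespace tokenisation
        for variant in variants:
            if variant in tokens:
                counts[grid_cell(lat, lon, cell_size)][variant] += 1
    return {cell: c.most_common(1)[0][0] for cell, c in counts.items()}


# Invented messages with invented coordinates (roughly south vs. north Germany).
messages = [
    (48.14, 11.58, "Frische Semmel vom Bäcker"),
    (48.20, 11.60, "eine Semmel bitte"),
    (48.50, 11.20, "Brötchen oder Semmel"),
    (53.55, 9.99, "Brötchen zum Frühstück"),
    (53.60, 9.90, "noch ein Brötchen"),
]
dominant = dominant_variant_per_cell(messages, ["semmel", "brötchen"])
for cell, variant in sorted(dominant.items()):
    print(cell, variant)
```

The resulting cell-to-variant table is the kind of data a mapping library would then colour on a map; real data would also need proper tokenisation and a rule for ties.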
Materials and recommendations
In this particular course, we are going to learn how to work with geolocation associated with natural language data from social media platforms like Twitter. I recommend getting familiar with geolocation information on Twitter:
https://www.tweetbinder.com/blog/twitter-geolocation-map/
https://developer.twitter.com/en/developer-terms/geo-guidelines
The introduction of the article at https://par.nsf.gov/servlets/purl/10106302
For application of geolocation in linguistics, I recommend reading at least Mocanu et al. 2013:
Bland, Justin & Terrel A. Morgan, 2020, Geographic variation of voseo on Spanish Twitter. Guillermo Lorenzo (ed.) Issues in Hispanic and Lusophone Linguistics 27. 7-38. John Benjamins.
Gonçalves, Bruno & David Sánchez, 2014, ‘Crowdsourcing dialect characterization through Twitter’, PloS ONE 9: e112074.
Grieve, Jack; Chris Montgomery; Andrea Nini; Akira Murakami & Diansheng Guo, 2019, ‘Mapping lexical dialect variation in British English Using Twitter’, Front. Artif. Intell. 2(11). doi: 10.3389/frai.2019.00011.
Lansley G, Longley PA, 2016, ‘The geography of Twitter topics in London.’ Comput Environ Urban Syst. 58:85–96.
Leemann, Adrian; Marie-José Kolly; Ross Purves; David Britain & Elvira Glaser, 2016, ‘Crowdsourcing Language Change with Smartphone Applications’, PLoS ONE 11(1): e0143060. doi: 10.1371/journal.pone.0143060.
Levy Abitbol, Jacob; Márton Karsai; Jean-Philippe Magué; Jean-Pierre Chevrot & Eric Fleury, 2018, ‘Socioeconomic dependencies of linguistic patterns in Twitter: a multivariate analysis’, Proceedings of the 2018 World Wide Web Conference WWW’18, 1125–1134.
Mocanu, Delia; Baronchelli, Andrea; Perra, Nicola; Gonçalves, Bruno; Zhang, Qian & Vespignani, Alessandro, 2013, The Twitter of Babel: Mapping World Languages through Microblogging Platforms. PLoS ONE 8(4): e61981. DOI: 10.1371/journal.pone.0061981
Wieling M, Nerbonne J, Baayen RH, 2011, Quantitative Social Dialectology: Explaining Linguistic Variation Geographically and Socially. PLoS ONE 6(9): e23613. https://doi.org/10.1371/journal.pone.0023613
I also recommend studying the maps and software packages we will use for geolocation analysis:
OpenStreetMap. OpenStreetMap License; 2017. Available from: http://wiki.openstreetmap.org/wiki/Open_Database_License.
Met Office. 2010–2015. Cartopy: a cartographic python library with a Matplotlib interface. Online: https://scitools.org.uk/cartopy/ (last access: 24.01.2022).
https://developer.twitter.com/en/docs/tutorials/filtering-tweets-by-location
Research topic 3: eye tracking
An overview of eye-tracking experiments with silent reading and the visual world paradigm
by Daniele Panizza (University of Göttingen)
27–28 September, 15:30-17:00
Course information and Materials
Research topic 4: syntactic annotation
Introduction to Computational Corpus Linguistics: Making Tagged Corpora, Universal Dependencies Treebanking, and Natural Language Processing with Deep Learning via BERT
by So Miyagawa (National Institute for Japanese Language and Linguistics)
29–30 September, 15:30-17:00
Course information and Materials
Participants will acquire knowledge of the history of computational corpus linguistics and of current trends in the field and in related fields such as natural language processing. Students will become accustomed to using and making linguistically tagged corpora, Universal Dependencies treebanks, and natural language processing tools such as Transformer and BERT models.
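Universal Dependencies treebanks are distributed in the CoNLL-U format, in which each token is a line of ten tab-separated columns (ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC). As a minimal sketch of what working with such a treebank involves, the Python code below reads one sentence and lists its dependency edges. The function names and the example sentence are made up for illustration; real projects would typically use an existing CoNLL-U reader.

```python
FIELDS = ("id", "form", "lemma", "upos", "xpos",
          "feats", "head", "deprel", "deps", "misc")


def parse_conllu_sentence(block):
    """Parse one CoNLL-U sentence into a list of token dictionaries."""
    tokens = []
    for line in block.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and sentence-level comments
        columns = line.split("\t")
        if len(columns) != len(FIELDS):
            raise ValueError(f"expected 10 columns, got {len(columns)}")
        if "-" in columns[0] or "." in columns[0]:
            continue  # skip multiword-token ranges and empty nodes
        token = dict(zip(FIELDS, columns))
        token["id"] = int(token["id"])
        token["head"] = int(token["head"])
        tokens.append(token)
    return tokens


def dependency_edges(tokens):
    """Return (head form, relation, dependent form) triples; head 0 is ROOT."""
    forms = {token["id"]: token["form"] for token in tokens}
    forms[0] = "ROOT"
    return [(forms[t["head"]], t["deprel"], t["form"]) for t in tokens]


# Made-up example sentence, built with explicit tab separators.
rows = [
    ["1", "The", "the", "DET", "DT", "Definite=Def|PronType=Art", "2", "det", "_", "_"],
    ["2", "dog", "dog", "NOUN", "NN", "Number=Sing", "3", "nsubj", "_", "_"],
    ["3", "barks", "bark", "VERB", "VBZ", "Number=Sing|Person=3|Tense=Pres", "0", "root", "_", "_"],
    ["4", ".", ".", "PUNCT", ".", "_", "3", "punct", "_", "_"],
]
sample = "# text = The dog barks.\n" + "\n".join("\t".join(row) for row in rows)

tokens = parse_conllu_sentence(sample)
edges = dependency_edges(tokens)
for edge in edges:
    print(edge)
```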
workshop
A workshop on "Corpus annotation and data analysis" will take place on Friday 23.09 and Saturday 24.09, hosting talks by invited speakers and talks/poster presentations by the participants of the summer school.
social events
Guided Tour through Göttingen on Tu 20.09, 17:30-19:30
(free of charge)
Meeting point, 17:30: in front of the tourist information at the Gänseliesel, Markt 8
Route in Google Maps
Dinner on Tu 20.09, 20:00: Villa Cuba
(at your own expense)
Zindelstr. 2 | 37073 Göttingen
https://www.villacuba.de/
Route in Google Maps
Linguists' Tour through Göttingen on Fr 23.09, 18:00
(free of charge)
Meeting point in front of the observatory
Dinner on Fr 23.09, 20:30: Gamie
(at your own expense)
Weender Str. 29 | 37073 Göttingen
https://gamie-restaurant.de/
Route in Google Maps
Excursion to the Grimmwelt, etc., Kassel, on Su 25.09
(at your own expense)
Further details will be announced during the summer school.
Dinner on Th 29.09, 19:00: Le Feu
(at your own expense)
Weender Landstraße 23 | 37073 Göttingen
https://www.lefeu.de/le-feu-flammkuchen-goettingen/
Route in Google Maps
announcements
Communication
- You may also use the e-mail address of the summer school (expired) during the event.
- Summer school desk, every day during the morning break, 10:30-11:00
- Talk to Nermin Gürkan, who coordinates our group and will pass your request on to the right person.
Covid-19 regulations
Current covid-19 regulations at the University of Göttingen: level 0
"...masks are optional in buildings and at other events. From 13 June 2022, there is just a recommendation to wear a mask at teaching events and committee meetings. Until the end of the semester, we still recommend wearing a mask as well as getting tested regularly at Campus Covid Screen (CCS)."
Beyond the summer school
- Student Life in Göttingen (University website)
- City of Göttingen Event Calendar: festivals, events, music, cinema, theater, galleries, museums, nightlife...