The Centre for Research on the Linguistic and Cultural Heritage of Istria hereby informs the scientific community and the interested public of the publication of The Istriot Corpus, a digital language corpus dedicated to the Istriot dialect, one of the most endangered Istro-Romance idioms in the European linguistic landscape. The corpus is publicly available through the international research platform TalkBank, widely recognised as one of the world's leading infrastructures for the study of spoken language, language development and discourse practices. The corpus was developed within the Croatian Science Foundation (HRZZ) project Multidisciplinary Approaches to Linguistic and Cultural Heritage (MULTIDIS).
The corpus is the result of long-term, systematic and methodologically grounded fieldwork aimed at documenting the living speech of the remaining speakers of the Istriot dialect in several local Istrian communities. Given the very small number of remaining speakers and the severe disruption of intergenerational transmission, the corpus is both a scientific and a socially responsible intervention, and it underscores the urgent need to document endangered linguistic varieties.
The corpus includes audio recordings of spontaneous speech, narrative accounts, autobiographical memories and descriptions of traditional practices, accompanied by detailed linguistic transcription and rich metadata. Particular emphasis has been placed on preserving the phonetic, lexical and morphosyntactic features of Istriot, so that the material remains usable in the long term for Romance linguistics, dialectology, historical linguistics and anthropological research. Combining digital documentation with open access helps preserve local linguistic and cultural heritage and provides an internationally relevant resource for the study of endangered Romance languages.
The linguistic data have been transcribed according to the standardised CHAT (Codes for the Human Analysis of Transcripts) protocol, and computational processing and analysis are carried out with the CLAN (Computerized Language Analysis) software package. Because the corpus shares this framework with the other CHAT-formatted corpora in TalkBank, it can be compared directly with them and analysed with a wide range of quantitative and qualitative methods.
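To illustrate what the CHAT layout looks like to a program, here is a minimal Python sketch. In CHAT, header lines begin with `@`, main-tier utterances begin with `*` followed by a speaker code, and dependent tiers (comments, codes) begin with `%`. The sample transcript below is hypothetical and written in English; it is not taken from the Istriot Corpus and omits several headers that the full CHAT specification requires. The function names `parse_chat` and `utterances_per_speaker` are illustrative only and are not part of CLAN, which remains the intended tool for real analyses.

```python
from collections import Counter

# Hypothetical, simplified CHAT fragment for illustration only.
# It is NOT from the Istriot Corpus and omits required headers such as @ID.
SAMPLE = """@UTF8
@Begin
@Languages:\teng
@Participants:\tINV Investigator, PAR Participant
*INV:\twhere did you grow up ?
*PAR:\tin the old town near the harbour .
%com:\tspeaker laughs
@End
"""


def parse_chat(text):
    """Split a CHAT transcript into headers and utterances with dependent tiers.

    Simplified sketch: lines starting with a tab are treated as continuations
    of the previous line, as in CHAT; everything else is classified by its
    first character (@ header, * main tier, % dependent tier).
    """
    # Join continuation lines onto the line they continue.
    logical_lines = []
    for raw in text.splitlines():
        if raw.startswith("\t") and logical_lines:
            logical_lines[-1] += " " + raw.strip()
        elif raw.strip():
            logical_lines.append(raw.rstrip())

    headers = {}
    utterances = []
    for line in logical_lines:
        if line.startswith("@"):
            # e.g. "@Languages:\teng" -> ("Languages", "eng"); "@Begin" -> ("Begin", "")
            key, _, value = line[1:].partition(":")
            headers[key] = value.strip()
        elif line.startswith("*"):
            # e.g. "*PAR:\t..." -> speaker "PAR" and the utterance text
            speaker, _, content = line[1:].partition(":")
            utterances.append({"speaker": speaker, "text": content.strip(), "tiers": {}})
        elif line.startswith("%") and utterances:
            # Dependent tiers attach to the most recent main-tier utterance.
            tier, _, content = line[1:].partition(":")
            utterances[-1]["tiers"][tier] = content.strip()
    return headers, utterances


def utterances_per_speaker(utterances):
    """Count main-tier utterances per speaker code."""
    return dict(Counter(u["speaker"] for u in utterances))


if __name__ == "__main__":
    headers, utts = parse_chat(SAMPLE)
    print(headers["Participants"])
    print(utterances_per_speaker(utts))
```

Keeping headers, main tiers and dependent tiers on separately marked lines is what lets generic tools like this one, and CLAN itself, process any CHAT transcript in the same way.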
The Istriot Corpus is openly accessible on the TalkBank platform:
https://ca.talkbank.org/access/Istriot.html
Moscarda Mirković, E., Poropat Jeletić, N., & Hržica, G. (2024). CABank Istriot Corpus. doi:10.21415/NR0X-ZZ76
Project financed by the Croatian Science Foundation (HRZZ)
Project title: Multidisciplinary Approaches to Linguistic and Cultural Heritage (MULTIDIS) / A multi-level approach to spoken discourse in language development
http://multidis.erf.hr
Principal Investigator: Assist. Prof. Gordana Hržica, PhD
Project start date: 01/01/2018
Project end date: 31/12/2022
Project number: UIP-2017-05-6603
Host institution: Faculty of Education and Rehabilitation Sciences, University of Zagreb
Scientific area: Social Sciences
Scientific field: Speech and Language Pathology