HELADA Language Repository

The Heritage Language Data (HELADA) Repository aims to provide an open-access online platform to preserve heritage language data and allow researchers both to access it and contribute their own data. The platform will provide an easily accessible interface to search various data formats, including speech samples, audio recordings, videos, and writing samples. The repository is expected to be ready for public use in the coming months.

Dr. Mikhail Kopotev, a linguistics professor at the University of Helsinki, Finland, and HELADA's project adviser, highlighted the importance of the data that will be collected and stored on the new platform: "[The repository will collect essential data on] the first language, age of acquisition, age of recording, other languages known to the speaker, etc. This information is crucial and is not available in any other language repository."

Professor Mikhail Kopotev

A principal motivation for creating HELADA has been the continuing growth of heritage language research and researchers' need for streamlined data to support further studies.

Maria Polinsky, a professor of linguistics at the University of Maryland and director of research at the National Heritage Language Resource Center (NHLRC), said that much of the growth in the study of heritage languages has occurred in the past 20 years. However, she noted that compiling the proper data for research has been difficult, especially as more data has been collected.

Technical factors add to this difficulty, chief among them the need for an intuitive interface. Olesya Kisselev, an assistant professor in bilingual bicultural studies at the University of Texas and the NHLRC's project leader for HELADA, explained: "A lot of these platforms are clunky and not as user-friendly as we would ideally like to see…we hope to attract people who collect data…who may not be well-versed in technical applications." She added that new data storage and processing capabilities have also driven the creation of the repository.

Another challenge has been resolving the licensing issues that govern data collected by different researchers. Dr. Kisselev emphasized that tackling these issues early in the project helped the team understand the ethics of public access and data sharing.

Professor Olesya Kisselev

Dr. Kopotev noted that discussions on heritage language preservation encouraged the creation of an organized data repository. Another factor driving the study of heritage languages has been the impact of cross-cultural contact on the vitality of certain languages.

"Currently, we are witnessing a large-scale migration of people across the world, resulting in the formation of new linguistic communities in various countries," said Dr. Kopotev in a written statement. "Linguistic extinction is also a growing concern. Some indigenous languages exist as heritage only; they are endangered and may soon be lost forever if not preserved."

Dr. Polinsky added that heritage language studies offer people in bilingual households, especially in English-speaking societies, a welcoming environment to learn their heritage languages.

“As we look at languages in contact, particularly heritage languages, we can see which areas of language structure are very solid or robust and which are vulnerable and undergo change when there is contact [with other languages],” said Polinsky.

The development of the HELADA Repository has been a collaborative effort drawing on the core team's home institutions and several UCLA campus units. The project leaders have built on their experience collaborating with academics around the world who work on a range of languages and linguistic phenomena. The initial design was implemented by the IT team of the UCLA International Institute and was later migrated to the UCLA Library's Dataverse.

“One of the strengths of the center and how it operates and functions is that it pulls on the expertise of researchers and language practitioners from across the United States,” Dr. Kisselev said.

HELADA is set to be released for limited public use in June 2023. The release will coincide with the NHLRC's Fourteenth Annual Heritage Research Institute, which will take place on the UCLA campus and will feature a workshop on the open-access repository.

In the initial stages, contributors will submit their data to the repository in close collaboration with its lead researchers and designers. Drs. Polinsky and Kisselev emphasized that some form of quality control on the raw data, such as limiting modifications to transcriptions, must be implemented. As the project develops, however, the ultimate goal is for researchers to upload their own heritage language data independently for wide public access.