Data Sets - Linguistics
Data Set Sources by Subject - Linguistics
- These links lead to downloadable data sets, which may be available in a variety of formats: CSV, Excel, SAS, SPSS, Stata, etc. Some are available in multiple formats, others in only one (e.g., Excel). Much linguistics data takes the form of audio or audiovisual files.
- *Site names preceded by an asterisk (*) represent archives and depositories where you may be able to deposit your own research data.
*DoBeS (Dokumentation Bedrohter Sprachen/Documentation of Endangered Languages)
- This site includes annotated audio and video recordings (though not all recordings are analyzed and annotated); multimedia lexica; textual materials of all sorts, such as field notes, notes on phonetics and prosody, and sketch grammars; and (annotated) images and photos.
ELAR (Endangered Languages Archive at SOAS, London)
- ELAR preserves and disseminates digital documentation on endangered languages around the world. You must register to access data.
IMDI (ISLE Metadata Initiative) Browser from the Max Planck Institute for Psycholinguistics
- Lets you search and browse the entire domain of linked IMDI metadata descriptions registered at the IMDI Portal at the MPI for Psycholinguistics. All metadata descriptions are openly accessible; for many of the data resources themselves, however, you must request access permission.
OLAC (Open Language Archives Community)
- This online catalog from the Open Language Archives Community provides access to a wealth of information about thousands of languages, including details of text collections, audio recordings, dictionaries, and software, sourced from dozens of digital and traditional archives.