The Lazy Linguist Language Documenter
Have you ever wondered how linguistics, anthropology, and computer science can come together? Probably not; however, if you have, you’ve come to the right place. I’m a first-year Master’s student here at UofC, and I’m going to go through how linguists (well, me) have used modern computational methods to document endangered languages.
An endangered language is one that (obviously) has relatively few speakers, but more importantly, is not being taught to children as a first language. This can happen for many different reasons, some anthropological, some linguistic, and some educational. Typically, however, the language being squeezed into extinction (the ‘squeezee’?) is on the lower end of a power dynamic with the squeezer language. A good example is found in the Pyrénées of southwestern France and northern Spain, in a region called Basque Country: Basque, a language isolate that has been spoken there for thousands of years, has progressively come to be spoken by fewer people across a shrinking area, and nearly all Basque speakers are also fluent in French or Spanish. And this is one of the best-case scenarios.
As languages become extinct, we lose the unique worldview, cultural heritage, and knowledge carried by each one. So, as linguists, we try our best to document and preserve them. However, this is easier said than done. Who has the time or money to travel to the often-remote (but not always!) places where endangered languages are spoken? Thus, this post will outline the Lazy Linguist Language Documenter™, or the LLLD. I’m still checking the name with the people who actually win the grants to support this sort of stuff.
Basically, we worked with heritage speakers of a Franco-Provençal language called Faetar, spoken in the town of Faeto in Northern Italy. This heritage speaker community is found in Toronto, Ontario (#BestProvinceEver). I should note that by ‘we’, I don’t include myself – the people were found and the data was collected by the wonderful people of the Heritage Language Variation and Change (HLVC) project, led by Dr. Naomi Nagy at the University of Toronto. Thanks to a great linguistics Masters student at the University of Western Ontario (Michael Iannozzi, my favourite Southern Ontario sociolinguist), I was able to get my hands on this data.
My role, however, was to document the phonetic characteristics of Faetar, based solely on a short article from the mid-1990s and the speech recordings themselves. Specifically, I was trying to find out where the vowels were pronounced in the mouth, and how they were changing across generations of heritage speakers. Honestly, I could have done this by simply listening to all the recordings and acoustically analysing each vowel; however, I didn’t want to do that because it takes forever and is mind-numbingly boring.
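For the curious: “where a vowel is pronounced in the mouth” has a standard acoustic proxy, the first two formants, F1 (inversely related to vowel height) and F2 (related to frontness). A “vowel space” is then just the mean (F1, F2) per vowel category. Here is a minimal sketch of that idea; the measurements are made-up illustrative values, not Faetar data:

```python
from collections import defaultdict

# (vowel label, F1 in Hz, F2 in Hz) -- hypothetical tokens, not real data
tokens = [
    ("i", 310, 2200), ("i", 290, 2300),  # high front vowel: low F1, high F2
    ("a", 750, 1300), ("a", 720, 1250),  # low central vowel: high F1
    ("u", 320, 800),  ("u", 340, 750),   # high back vowel: low F1, low F2
]

def vowel_space(tokens):
    """Average the F1/F2 measurements per vowel category."""
    acc = defaultdict(list)
    for vowel, f1, f2 in tokens:
        acc[vowel].append((f1, f2))
    return {
        v: (sum(f1 for f1, _ in pts) / len(pts),
            sum(f2 for _, f2 in pts) / len(pts))
        for v, pts in acc.items()
    }

space = vowel_space(tokens)
# Each entry is one point in the vowel space, e.g. space["i"] -> (300.0, 2250.0)
```

Plotting those mean points (conventionally with F1 and F2 axes reversed) gives the familiar vowel-triangle chart, and comparing the charts of different generations of speakers is how a vowel shift shows up.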
So, I decided to automatically look for patterns in the data, and put my feet up and relax while my computer did all the work. I did this by giving my computer a sample set of vowels, which basically amounted to me listening to about 40 minutes of speech (much less than the 50+ hours of data we had). Using this seed data, I trained a neural network to recognize phones that showed the characteristics of the seed vowels (based on intensity, duration, and pitch). From this, I built a vowel space without having to listen to hours and hours of speech. Cool how computing can make language documentation easier, cheaper, and faster, eh?
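To make the seed-data idea concrete: you hand-label a small sample, summarize each class, then let the machine sort the rest. The actual project trained a neural network; as a self-contained stand-in, this sketch uses a nearest-centroid classifier over the same three features named above (intensity, duration, pitch), with made-up values:

```python
import math

# Hand-labelled seed tokens: (label, (intensity_dB, duration_s, pitch_Hz)).
# All values are invented for illustration.
seed = [
    ("vowel",     (72.0, 0.120, 190.0)),
    ("vowel",     (70.0, 0.150, 210.0)),
    ("non-vowel", (55.0, 0.040, 0.0)),   # e.g. a voiceless consonant
    ("non-vowel", (58.0, 0.060, 0.0)),
]

def centroids(seed):
    """Mean feature vector per label, computed from the hand-labelled seed."""
    by_label = {}
    for label, feats in seed:
        by_label.setdefault(label, []).append(feats)
    return {
        label: tuple(sum(dim) / len(pts) for dim in zip(*pts))
        for label, pts in by_label.items()
    }

def classify(features, cents):
    """Assign an unlabelled token to the nearest class centroid.

    In real use the features would be standardized first, since raw
    pitch in Hz would dominate the Euclidean distance.
    """
    return min(cents, key=lambda label: math.dist(features, cents[label]))

cents = centroids(seed)
classify((71.0, 0.13, 200.0), cents)  # a vowel-like token
```

The neural network used in the actual work learns a more flexible boundary than these straight-line centroids, but the workflow is the same: a small labelled seed stands in for the 50+ hours you would rather not transcribe by hand.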