Using Twitter to Preserve Minority Languages

Of the approximately 6,000 languages alive in the world today, 60 percent or more are said to be dying out. The majority of the world’s languages are, in fact, “minority” languages, used in the shadow of a more politically powerful tongue.

On St. Patrick’s Day, Prof. Kevin Scannell of St. Louis University launched a project called Indigenous Tweets. Using statistical web-crawling software he wrote called An Crúbadán, Scannell identifies which minority languages are being tweeted, by whom and how.

Michael Schade, one of Scannell’s students, explains the need.

“Twitter describes itself as ‘the best way to discover what’s new in your world,’ but there is a fundamental issue with this: ‘world’ is presently limited by the inclusion of only a handful of languages. Although people can tweet in any language on Twitter, finding users who speak the same language is a difficult, or even a seemingly impossible, task. This is especially true for…minority languages.”

Scannell’s web-crawling software, An Crúbadán, first seeds Twitter searches with common but distinctive words from each of his 500 languages and then crawls Twitter. It finds users who speak these languages and ranks them. Scannell then recalculates trending topics for each language specifically, and the results are imported into the Indigenous Tweets site.
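The seed-and-rank idea can be illustrated with a minimal Python sketch. This is not An Crúbadán's code. The seed words, the sample tweets, and the `rank_users` function are all hypothetical, and ranking by the number of matching tweets is an assumption made here for illustration.

```python
from collections import Counter

# Hypothetical seed words: common but distinctive words for one language.
# These Welsh words are illustrative only, not taken from An Crúbadán.
SEED_WORDS = {"mae", "yn", "wedi", "ddim"}

# Hypothetical (user, text) pairs standing in for search results.
SAMPLE_TWEETS = [
    ("alis", "Mae hi yn braf heddiw"),
    ("alis", "Dw i wedi blino"),
    ("bob", "Nice weather today"),
    ("carys", "Dw i ddim yn gwybod"),
]

def rank_users(tweets, seed_words):
    """Rank users by how many of their tweets contain a seed word.

    `tweets` is an iterable of (user, text) pairs.
    Returns a list of (user, matching_tweet_count), highest count first.
    """
    counts = Counter()
    for user, text in tweets:
        words = set(text.lower().split())
        if words & seed_words:
            counts[user] += 1
    # Sort by count descending, then by user name for a stable order.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))

if __name__ == "__main__":
    for user, count in rank_users(SAMPLE_TWEETS, SEED_WORDS):
        print(user, count)
```

A real crawler would also need to confirm that a matching tweet is actually written in the target language, since a short seed word can appear in other languages too.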

The top minority languages on Twitter are currently Haitian Creole, Basque and Welsh.

On the project blog, also called Indigenous Tweets, Scannell says he hopes the project will help minority speakers keep their languages vital by making “it easier for speakers of indigenous and minority languages to find each other in the vast sea of English, French, Spanish, and other global languages that dominate Twitter.”

As Schade points out, Twitter attempts to classify the language of all tweets, but it sometimes does a poor job of it, especially with minority languages. Because Scannell has amassed large corpora and developed technology geared toward identifying languages that might have little readily available data to start with, he has seen much higher accuracy in language identification and analysis.
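The article does not say how Scannell's software identifies languages. As a rough illustration of one common approach, the sketch below compares character trigrams in a text against a small profile built for each language. The technique, the function names, and the tiny training sentences are all assumptions chosen for the example, not details of An Crúbadán.

```python
from collections import Counter

def trigrams(text):
    """Return a Counter of character trigrams in lowercased, space-padded text."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))

def identify(text, profiles):
    """Return the language whose profile overlaps most with the text's trigrams."""
    sample = trigrams(text)

    def overlap(profile):
        # Count shared trigrams, capped by how often each occurs in the profile.
        return sum(min(count, profile[gram]) for gram, count in sample.items())

    return max(profiles, key=lambda lang: overlap(profiles[lang]))

# Hypothetical profiles built from one sentence each; a real system
# would build them from much larger corpora.
PROFILES = {
    "welsh": trigrams("mae hi yn braf heddiw ac mae'r haul yn gwenu"),
    "english": trigrams("the weather is nice today and the sun is shining"),
}

if __name__ == "__main__":
    print(identify("mae hi yn gwenu", PROFILES))
    print(identify("the sun is nice", PROFILES))
```

Character-level features like these need relatively little training text, which is one reason they are often tried for languages with few online resources.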

This is research that could help Twitter differentiate its users with greater specificity, as well as allow the growth of individual users’ communities based on, among other things, language use.
