Submissions/Editing challenges on multi-script wikis


This is an accepted submission for Wikimania 2017.

Submission no. 3045 - T1, T3, TC
Presentation given at Wikimania 2017 on LanguageConverter and editing on multi-script wikis.
Title of the submission
Editing challenges on multi-script wikis
Type of submission (lecture, panel, tutorial/workshop, roundtable discussion, lightning talk, poster, birds of a feather discussion)
Author of the submission
C. Scott Ananian (cscott)
Language of presentation
E-mail address
Country of origin
Affiliation, if any (organisation, company etc.)
Wikimedia Foundation
Personal homepage or blog
Abstract (up to 300 words to describe your proposal)
Editors who write in English, French, or another Western European language probably don't stop to think about the fact they share the same writing system, Latin script. But that's not the only way to write. Many may have heard of Cyrillic script, used across eastern Europe and north and central Asia, including Russia. But there are hundreds of scripts in use around the world, and we have Wikipedia projects in over 50 different scripts.
This talk is concerned with the subset of these projects where multiple scripts are used on the same wiki. In some places the same language is written in different ways by different speakers. For example, Serbian is written in either Latin or Cyrillic script. Standard Chinese is written in traditional or simplified scripts. Kurdish uses Latin, Arabic, or Cyrillic scripts.
MediaWiki uses a technology called LanguageConverter to automatically transliterate between scripts, so that you can read one of these wikis in your choice of writing system. This avoids unnecessary forks of the wikis and makes our content available to more readers.
LanguageConverter also allows you (to a slightly lesser degree) to create and edit articles in your choice of writing system. However, this quickly leads to wikitext that is a mixture of different writing systems. Unless all editors can read (and proofread) all the writing systems, article editability suffers as the number of interleaved contributions increases.
This talk will describe how the Parsing team is updating LanguageConverter to better integrate it into core, how we are translating LanguageConverter markup into Parsoid's HTML5-based representation, and how we hope to use Parsoid technology to make a substantial improvement in native script editing, finally untangling the jumble of scripts.
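As a rough illustration of the two ideas above (automatic conversion between scripts, and text that ends up mixing scripts), here is a toy Python sketch. It is an assumption-laden example, not how LanguageConverter itself works: the per-character table `CYR_TO_LAT` and the helpers `to_latin` and `scripts_used` are hypothetical names made up for this illustration, and the real converter relies on its own conversion tables and markup rules, which are not shown here.

```python
import unicodedata

# Hypothetical per-character table for Serbian Cyrillic -> Latin.
# (Illustrative only; not taken from LanguageConverter.)
CYR_TO_LAT = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "ђ": "đ",
    "е": "e", "ж": "ž", "з": "z", "и": "i", "ј": "j", "к": "k",
    "л": "l", "љ": "lj", "м": "m", "н": "n", "њ": "nj", "о": "o",
    "п": "p", "р": "r", "с": "s", "т": "t", "ћ": "ć", "у": "u",
    "ф": "f", "х": "h", "ц": "c", "ч": "č", "џ": "dž", "ш": "š",
}


def to_latin(text):
    """Transliterate Serbian Cyrillic letters to Latin, leaving the rest untouched."""
    out = []
    for ch in text:
        lower = ch.lower()
        if lower in CYR_TO_LAT:
            latin = CYR_TO_LAT[lower]
            # An uppercase Cyrillic letter becomes a capitalized Latin letter
            # or digraph, e.g. "Љ" -> "Lj".
            out.append(latin.capitalize() if ch != lower else latin)
        else:
            out.append(ch)
    return "".join(out)


def scripts_used(text):
    """Return a rough set of script names, taken from Unicode character names."""
    found = set()
    for ch in text:
        if ch.isalpha():
            # e.g. "CYRILLIC CAPITAL LETTER BE" -> "CYRILLIC"
            found.add(unicodedata.name(ch, "UNKNOWN").split(" ")[0])
    return found


print(to_latin("Србија"))
print(scripts_used("Beograd је град"))
```

A check like `scripts_used` hints at the editing problem described above: once a page contains contributions in more than one script, an editor who reads only one of them cannot proofread the whole text. A simple table like this also ignores the harder cases (context-dependent conversion, text that must not be converted) that real conversion has to deal with.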
What will attendees take away from this session?
Attendees whose native language uses a single (Latin) script will learn a bit about our projects which use multiple writing systems.
Everyone will come away with an understanding of the technology which lets MediaWiki convert between writing systems, some of its limitations, and some exciting improvements planned for the future!
Theme of presentation
Technology, Interface & Infrastructure
For workshops and discussions, what level is the intended audience? Intermediate
Length of session (if other than 25 minutes, specify how long)
25 minutes
Will you attend Wikimania if your submission is not accepted?
Slides or further information (optional)
Presentation slides, Slides w/ speaker notes, phab:T17161, phab:T113002, phab:T87652
Google doc link to slides
Special requests
Is this Submission a Draft or Final?

Interested attendees

If you are interested in attending this session, please sign with your username below. This will help reviewers to decide which sessions are of high interest. Sign with a hash and four tildes. (# ~~~~).

  1. Amir É. Aharoni (talk) 09:56, 9 April 2017 (UTC)
  2. Birgit Müller (WMDE) (talk) 21:39, 25 April 2017 (UTC)
  3. Christoph Jauera (WMDE) (talk) 07:02, 26 April 2017 (UTC)
  4. SSastry (WMF) (talk) 18:43, 16 May 2017 (UTC)
  5. --Elitre (WMF) (talk) 13:34, 23 June 2017 (UTC)
  6. --Ziko (talk) 13:56, 12 August 2017 (UTC)
  7. -Krish Dulal (talk) 15:01, 12 August 2017 (UTC)