Submissions/Kathabhidhana: Recording words for Wiktionary and preparing for an AI assistant
This is an Open submission for Wikimania 2017 that has not yet been reviewed by a member of the Programme Committee.
- Submission no. 3007 Subject - T3
- Title of the submission
- Kathabhidhana: Recording words for Wiktionary and preparing for an AI assistant
- Type of submission (lecture, panel, tutorial/workshop, roundtable discussion, lightning talk, poster, birds of a feather discussion)
- Tutorial/workshop
- Author of the submission
- type of submission
- workshops
- Language of presentation
- English
- E-mail address
- psubhashish
gmail.com
- Username
- psubhashish
- Prateek Pattanaik
- Country of origin
- India
- Affiliation, if any (organisation, company etc.)
- Mozilla (Subhashish Panigrahi)
- Personal homepage or blog
- http://psubhashish.com (Subhashish Panigrahi)
- https://medium.com/prateek-pattanaik (Prateek Pattanaik)
- Abstract (up to 300 words to describe your proposal)
Artificial Intelligence is now taking over the Internet. However, be it Jarvis, Google Assistant, or Siri, there is a human voice behind all of them. We, the Wikimedia community, can draw two takeaways from this: a) we need to think beyond what our Wikimedia projects are currently capable of, and b) we need to work out how to beat the proprietary giants in the open way.
Vast libraries of many kinds (word lists, phoneme libraries, word recordings and more) are at the core of most natural language processing projects, and Wikimedia projects are inherently content-rich repositories of both contemporary and old vocabulary. When it comes to audio recordings, however, very little exists under open licenses. The state of text-to-speech and speech-to-text engines for many languages, especially open source ones, is bleak. Wikimedians can play a major role by contributing more good-quality audio that will enrich Wiktionary, open a way for accessibility, and help improve or create AI solutions for their languages.
Kathabhidhana, a new open source project that aims to create large collections of audio recordings in any language, is in its infancy. Building on source code inherited from another open source project, and mindful of the sparse user documentation of other projects, Kathabhidhana is trying to bridge the gap that exists between common users and developers in the community.
This workshop aims to discuss what is needed in the next millennium for Wiktionary to find more AI uses, to assess some of the identified needs, and to demonstrate a complete workflow in a home studio setup: recording tools such as Kathabhidhana and Kathabhidhana for iOS that can be used to record large batches of pronunciations, free software like Audacity to clean up recorded audio, tools like Pattypan to upload the recordings to Commons, and curating metadata.
- What will attendees take away from this session?
Participants will learn about some zero-to-minimal-investment home-studio setups for recording audio suitable for Wikimedia projects, about curating metadata, and about batch automation that saves time when recording words, either on a Linux device using a Python script or on iOS using a simple workflow.
- Theme of presentation
- Technology, Interface & Infrastructure
- For workshops and discussions, what level is the intended audience?
- Intermediate
- Length of session (if other than 25 minutes, specify how long)
- 45 minutes - 1 hr
- Will you attend Wikimania if your submission is not accepted?
- Yes
- Slides or further information (optional)
- video presentation
- Special requests
- A microphone-mixer-speaker setup would be very useful
- Submission is a Draft
- Yes.
This is a Completed submission for Wikimania 2017 ready to be reviewed by a member of the Programme Committee.
Interested attendees
If you are interested in attending this session, please sign with your username below. This will help reviewers to decide which sessions are of high interest. Sign with a hash and four tildes. (# ~~~~).
- John Andersson (WMSE) (talk) 08:53, 14 March 2017 (UTC)
- Amir É. Aharoni (talk) 17:33, 8 April 2017 (UTC)
- Daniel Mietchen (talk) 03:11, 9 April 2017 (UTC)
- Gtaf (talk) 02:47, 20 May 2017 (UTC)
- Noé (talk) 15:09, 22 May 2017 (UTC)
- MathieuMD (talk) 19:03, 25 May 2017 (UTC)
- Yug (talk) 07:54, 4 June 2017 (UTC)
- ...