
Submissions/Creating articles from CC-BY content; how hard can it be?

From Wikimania


This is an Open submission for Wikimania 2017 that has not yet been reviewed by a member of the Programme Committee.

Submission no. 5018 - GT
Title of the submission
Creating articles from CC-BY content; how hard can it be?
Type of submission (lecture, panel, tutorial/workshop, roundtable discussion, lightning talk, poster, birds of a feather discussion)
lecture (and would be willing to participate in any related panels, roundtables etc)
Author of the submission
Kerry Raymond (User:Kerry Raymond)
Language of presentation
E-mail address
User:Kerry Raymond
Country of origin
Affiliation, if any (organisation, company etc.)
Member of Wikimedia Australia
Personal homepage or blog

Status update on the QHR project

Abstract (up to 300 words to describe your proposal)

Re-using openly licensed material in Wikipedia articles is a lot harder than one might expect. In 2013, I negotiated with the Queensland Government to release the Queensland Heritage Register under a CC-BY license. The register consists of approximately 1,700 sites with associated metadata, narratives about the history and architecture of each site, and the reasons for its heritage listing. This rich text-and-data source had the potential to create 1,500 substantial new articles on English Wikipedia and to expand 200 existing articles (many little more than stubs).

The challenge was then to create or expand those articles, a task that seemed easy at the outset but proved more difficult in practice. As the project is now almost complete, it is appropriate to reflect on:

  • how the task was undertaken
  • what difficulties were encountered and how they were overcome
  • the use of automation to assist in the task

While the presentation will be informed by the Queensland Heritage Register project, the intention is to draw out the generic issues likely to arise in other projects that re-use existing material, so that others attempting a similar task can tackle their projects more effectively.

Topics to be covered include the issues to consider when creating the articles: article titles, infoboxes, adding wikilinks to narratives, geolocation, categorisation, and photos. Other issues relate to creating in-bound links to the new articles (to avoid orphans), handling the internal cross-references and external references within the source material, and the challenges that arise when the source text was written at a different time and place, for a different purpose, and for a different audience than a Wikipedia article.
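As an illustration of how register metadata might be turned into an infobox, here is a minimal Python sketch. It is not the tooling used in the project. The record keys (`name`, `address`, `latitude`, `longitude`), the function name `build_infobox`, and the choice of infobox parameters are all assumptions made for the example; a real infobox template should be checked against its own documentation.

```python
def build_infobox(record):
    """Render one register record (a plain dict) as infobox wikitext.

    The dict keys and the infobox parameter names are hypothetical,
    chosen only to illustrate the mapping from metadata to wikitext.
    """
    lat, lon = record["latitude"], record["longitude"]
    fields = {
        "name": record["name"],
        "location": record["address"],
        # Quadrupled braces in an f-string produce literal "{{" and "}}".
        "coordinates": f"{{{{coord|{lat}|{lon}|display=inline,title}}}}",
    }
    lines = ["{{Infobox historic site"]
    lines += [f"| {key} = {value}" for key, value in fields.items()]
    lines.append("}}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Made-up sample record, not taken from the register.
    sample = {
        "name": "Example Hall",
        "address": "1 Example Street, Brisbane",
        "latitude": -27.5,
        "longitude": 153.0,
    }
    print(build_infobox(sample))
```

Keeping the metadata-to-parameter mapping in one dictionary makes it easy to review the generated fields before any article is saved.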

Expanding existing articles presents additional challenges, especially when the topic is notable for reasons unrelated to the newly available openly licensed material.

What will attendees take away from this session?

Attendees will learn which issues to watch for when re-using open content, what preparation a large re-use project needs (both technically and within the community), where and how automation can minimise manual work (e.g. creating wikilinks in articles), and where manual work will almost certainly be needed. Manual work is the biggest time consumer and is best minimised. Strategies for expanding existing articles will also be provided.
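The wikilink automation mentioned above could look something like the following Python sketch. It is an assumed approach for illustration, not a description of the project's actual tools: given a list of known article titles, it links only the first occurrence of each title in a narrative. The function name `add_wikilinks` and the sample text are hypothetical.

```python
import re


def add_wikilinks(text, titles):
    """Wrap the first occurrence of each known title in [[...]].

    Later occurrences are left as plain text, in line with the usual
    practice of linking a term once per article.
    """
    if not titles:
        return text
    # Try longer titles first so "Brisbane River" wins over "Brisbane".
    ordered = sorted(titles, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in ordered) + r")\b"
    )
    seen = set()

    def link(match):
        title = match.group(1)
        if title in seen:
            return title  # already linked once; leave unlinked
        seen.add(title)
        return f"[[{title}]]"

    return pattern.sub(link, text)


if __name__ == "__main__":
    narrative = (
        "The hall faces the Brisbane River. "
        "Brisbane grew quickly; Brisbane is the capital."
    )
    print(add_wikilinks(narrative, ["Brisbane", "Brisbane River"]))
```

A sketch like this only handles exact title matches. Ambiguous names, redirects, and text that is already linked would still need manual review.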

Theme of presentation

Sister Projects; Legal & Free Culture; Technology, Interface & Infrastructure

For workshops and discussions, what level is the intended audience?

Intermediate and advanced. Also relevant to GLAMs as providers of re-usable source material.

Length of session (if other than 25 minutes, specify how long)
25 minutes
Will you attend Wikimania if your submission is not accepted?
Slides or further information (optional)

Presentation slides and any other associated material will be made available later.

Special requests
Is this Submission a Draft or Final?

This is a Completed submission for Wikimania 2017 ready to be reviewed by a member of the Programme Committee.

Interested attendees

If you are interested in attending this session, please sign with your username below. This will help reviewers decide which sessions are of high interest. Sign with a hash and four tildes (# ~~~~).