Submissions/Wikitext: upcoming changes, available tools, what you can do./notes

From Wikimania

SESSION OVERVIEW

Title
Wikitext: Upcoming changes, available tools, what you can do.
Day & time
Friday Aug 11 11:00
Session link
Submissions/Wikitext:_upcoming_changes,_available_tools,_what_you_can_do.
Speaker
Subbu Sastry, James Forrester
Note takers
  • User:Slashme
  • User:Econterms
  • User:Catrope

SESSION SUMMARY

What's being done

  • The older PHP parser translated from wikitext to HTML; the newer Parsoid does that and also translates from HTML to wikitext
  • Want to migrate everything to Parsoid.
  • Want to make wikitext easier for editors to write and less error-prone.
  • Want to make the parsing more efficient. (This seems to mean easier for the computer, less CPU-consuming, and maybe faster for the user.)
  • Output: want to make it easier for tools to understand the HTML (what HTML comes from what Wikitext); expose semantics
  • Want to unify the parsers: use the same parser for reads and edits.
  • wikitext syntax will have to change a bit for this to work; there are computer-science criteria involved in making the languages more completely translatable
  • Structure semantics will make it easier to reason about wikitext.
  • Edits to popular templates really hit performance right now; need to change that.
  • Want to support more finely granular edits, to reduce edit conflicts
  • Make templates safer
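
The round-trip idea in the first bullet (wikitext to HTML and back again) can be illustrated with a toy sketch. This is not Parsoid and does not use any MediaWiki API; the function names wikitext_to_html and html_to_wikitext are hypothetical, and only bold and italic markup are handled.

```python
import re

# Toy converters, assumed for illustration only. Real wikitext has many
# edge cases (nesting, unbalanced quotes) that this sketch ignores.

def wikitext_to_html(text: str) -> str:
    """Convert '''bold''' and ''italic'' wikitext to HTML tags."""
    # Bold must be handled first so ''' is not read as '' plus a stray '.
    text = re.sub(r"'''(.+?)'''", r"<b>\1</b>", text)
    text = re.sub(r"''(.+?)''", r"<i>\1</i>", text)
    return text

def html_to_wikitext(html: str) -> str:
    """Convert <b> and <i> tags back to wikitext quotes."""
    html = re.sub(r"<b>(.+?)</b>", r"'''\1'''", html)
    html = re.sub(r"<i>(.+?)</i>", r"''\1''", html)
    return html

sample = "A '''bold''' word and an ''italic'' word."
as_html = wikitext_to_html(sample)
round_tripped = html_to_wikitext(as_html)
```

A parser that supports both directions should give back the original wikitext after a round trip, which is what lets the same parser serve both reading and editing.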

How does this affect users?

  • Changes to wikitext to fix edge cases, improve semantics, etc.
  • Changes to the HTML will affect gadget creators: HTML5; semantic markup
  • Extension authors rely on extension hooks. This will change, so that parser internals will not be exposed to extensions.

Finished

  • Preprocessor code has been changed to make it more [?] - fixing edge cases in language converter code.

Ongoing

  • Announced on WikiTech etc.: a switch from one library that cleans up HTML before it is output to another. Tidy (HTML4-based) will be replaced with RemexHTML, which is HTML5-based.

Upcoming

  • Changes to how images are marked up; using <figure> tags instead of <div>
  • heredoc syntax for blocks of multiple templates
  • Balanced templates: right now templates can return almost anything, even broken markup without closing tags. Want to ensure that the output of templates is valid, to give more structured and predictable output.

Tidy replacement

  • HTML4 spec is 90s tech. Want to replace it with HTML5. For more info see https://www.mediawiki.org/wiki/Parsing/Replacing_Tidy/FAQ
  • Some wikitext might display differently!
  • Wrapping may change, in navboxes for example. Tidy has bugs which affect whitespace rendering and can give unlimited line lengths.

How to help


Q&A

Q: Is there something in HTML5 features that will shift? Anything else beyond <figure>, maybe srcset, picture elements, things that would help make the site more responsive? We only have fixed pixel sizes for images currently; this makes infoboxes look ugly on mobile. Making that fluid in terms of images would help a lot.

A (James): We transitioned from HTML4 to HTML5 10 years ago when Firefox started scraping our site assuming that it was HTML5. Users were making assumptions; HTML4 doesn't allow block content in table captions. If you put a list in a table caption, Tidy will then break the list for you, and it will appear outside the table. Remex doesn't do that. That doesn't change the client rendering: Tidy was rendering it correctly already. In terms of video/audio embedding: separate work in that space is mostly done by volunteers; may have significant impacts on the cleanness of the HTML, but not for readers. Dynamically sized videos will be more reader side?? Technology exists, but not sure whether that will happen. James says HTML5 is overwhelmingly common in browsers now.

Q follow up: images are a major problem due to fixed image sizes, with no other possibility to set the sizes. Want this to be fixed in wikitext: give an option to make that fluid (e.g. image should fit box)

A (James): In many ways it's profoundly inaccessible to allow users to set image sizes (bad for visually impaired and mobile). There are proposals to change wikitext image transclusion, e.g. to set an image role (e.g. hero / highlight) instead of a size, associated with content, e.g. text or description.

Proposals for specifying image roles: https://phabricator.wikimedia.org/T90914 (discussed at Wikimania in Esino Lario).

Q follow-up: even if the user doesn't set the image size by hand, it's hard-coded in the HTML. You have no chance to say "render this image as wide as the reader's screen". Shouldn't be improved from the ? side but from the render engine side.

A (Subbu): There are a bunch of proposals to improve wikitext for images, we will have consultations to talk through those details.

Q follow-up: If you put a bigger image in an infobox, it makes the infobox bigger. So there's a contradiction: make the image as wide as possible, but keep it within the infobox. Doable, but needs thinking about.

Q (Elitre): More details about balanced templates?

A (Subbu): [backup slide about balanced templates] You can add an annotation to the template saying how it's going to be used. As a string, attribute, inline context, in a table, as a block element, etc. Template authors know how they intend for the template to be used. If you provide these annotations, we can enforce that use case. If it's going to be used inline, we can fix markup errors. If it's an attribute, we can do escaping without you having to do nowikis. More predictable rendering. Also allows us to improve performance on pages with lots of templates.
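
The point about attribute context can be sketched as follows. This is only an illustration of context-aware escaping using Python's standard html.escape; the function name emit_in_attribute is hypothetical and is not part of MediaWiki or of the balanced templates proposal.

```python
import html

def emit_in_attribute(value: str) -> str:
    """Escape a template's output for use inside a quoted HTML attribute.

    Assumption for this sketch: the template author has declared that the
    template is used as an attribute value, so the system can escape
    quotes and ampersands automatically.
    """
    return html.escape(value, quote=True)

raw = 'He said "hi" & left'
escaped = emit_in_attribute(raw)
tag = f'<span title="{escaped}">note</span>'
```

Because the declared context is known, the quotes in the value cannot terminate the attribute early, and the author does not have to add manual escaping.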

Balance also helps ensure that if you forget a close tag inside your template (or an argument to the template) it doesn't break the rest of the page.
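
A minimal sketch of what checking for balance could look like, using Python's standard html.parser module. This is an assumption-laden illustration, not the algorithm in the Balance task: the names BalanceChecker and unbalanced_tags are hypothetical, and the void-tag list is abbreviated.

```python
from html.parser import HTMLParser

# Tags that never take a closing tag (abbreviated list for this sketch).
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link", "wbr"}

class BalanceChecker(HTMLParser):
    """Track open tags on a stack and record mismatched closing tags."""

    def __init__(self):
        super().__init__()
        self.open_tags = []  # stack of tags opened but not yet closed
        self.errors = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.open_tags.append(tag)

    def handle_startendtag(self, tag, attrs):
        pass  # self-closing tags such as <br/> never stay open

    def handle_endtag(self, tag):
        if self.open_tags and self.open_tags[-1] == tag:
            self.open_tags.pop()
        else:
            self.errors.append(f"unexpected </{tag}>")

def unbalanced_tags(fragment: str) -> list[str]:
    """Return a list of balance problems found in an HTML fragment."""
    checker = BalanceChecker()
    checker.feed(fragment)
    checker.close()
    return checker.errors + [f"unclosed <{t}>" for t in checker.open_tags]
```

For example, unbalanced_tags("<div><span>text</span>") reports the unclosed <div>, which is the kind of template output that would otherwise swallow the rest of the page.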

Balance task: https://phabricator.wikimedia.org/T114445

Q (econterms): Any particular projects or sites that will get the new software first?

A (James): Probably. Likely first on sites where few pages would be affected, i.e. where the change induces few errors or the wikitext contains few errors. Haven't worked out a threshold for what level of disruption is OK. Some templates will only change by 1-2 pixels, which is OK, but in some cases (e.g. CSS overlays) we don't want to break that. We will not switch it off if it breaks 10 million pages, but somewhere between that and 0, depending on how active the community is in fixing stuff, we can do this. Some wikis have many breakages which are really all one template. A particular template in Swedish was shown on the slides. We're not here to break wikis, but we have built a system to help people to migrate wikitext. It's been changing since we started, and previously when we broke it, we'd just send out mails saying "we've changed something, locked the wiki". Now communities will not be happy with this. Better to find the problems first.

PRESENTATION / SLIDES / LINKS TO MATERIAL SHARED

https://wikimania2017.wikimedia.org/wiki/File:Wikitext.changes.wikimania2017.pdf