mwparserfromhell seems to have trouble extracting text from "File:" wikilinks that carry additional attributes.
```
In [1]: import mwparserfromhell
In [2]: w = "[[File:test.jpg|thumb|Label text]]"
In [3]: mwparserfromhell.parse(w).nodes[0].text
Out[3]: u'thumb|Label text'
```
I think the desired output is not "thumb|Label text" but "Label text".
@ikuyamada I would actually expect it to return a list containing ("thumb", "Label text"). I'm guessing it just hasn't evolved to that yet, and without that kind of support, "thumb|Label text" seems correct to me.
"thumb|Label text" is correct, since the parser treats all wikilink-like things the same way. Ideally, we would understand what a file is and treat its caption specially (so you could use node.caption instead of node.text, which gives the entire chunk), but this is problematic because we don't have a reliable way to tell a file link from any other link, due to site- and language-specific namespace aliases. I suppose we could just have .caption exist for all links, but that would require new parsing rules. I'm willing to add this since it's been requested before.
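Until something like `.caption` exists, a caller can split the text on its own. The sketch below is a hypothetical helper (`split_link_text` is not part of mwparserfromhell) that splits only on top-level pipes, so pipes inside a nested link or template in the caption stay in one piece. It returns a list like the one suggested above, but it does not decide which piece is the caption.

```python
def split_link_text(text: str) -> list[str]:
    """Split wikilink text on top-level '|' characters (hypothetical helper).

    Pipes nested inside [[...]] or {{...}} are kept, so a caption such as
    "A [[Tokyo|city]] view" is not broken apart.
    """
    parts = []
    current = []
    depth = 0  # how many [[ or {{ openers are currently unclosed
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair in ("[[", "{{"):
            depth += 1
            current.append(pair)
            i += 2
        elif pair in ("]]", "}}") and depth > 0:
            depth -= 1
            current.append(pair)
            i += 2
        elif text[i] == "|" and depth == 0:
            # A top-level pipe ends the current option.
            parts.append("".join(current))
            current = []
            i += 1
        else:
            current.append(text[i])
            i += 1
    parts.append("".join(current))
    return parts
```

For the original example, pass `str(node.text)`; note that `node.text` is `None` for a bare `[[File:test.jpg]]`, so check for that first.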
Feel free to 🐟 me if it is already in there, but does this mean you are going to have it parse the whole string and output node.height, node.width, node.align, node.valign, node.mode (thumb, frameless, etc.), and node.link? If you are going to parse out each chunk, you might as well put each one in its own attribute.
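A hypothetical sketch of that idea: `classify_file_options` (not an mwparserfromhell API) takes the already split pieces of a file link's text and sorts them into the keys named above. It recognizes only an assumed, English-only subset of MediaWiki's image options; as noted earlier, real wikis add localized aliases. Anything unrecognized becomes the caption, following MediaWiki's rule that the last unrecognized option is used as the caption.

```python
import re

# Assumed, English-only subset of MediaWiki image options. Wikis in other
# languages add localized aliases, so this is illustrative, not exhaustive.
MODES = {"thumb", "thumbnail", "frame", "framed", "frameless"}
ALIGN = {"left", "right", "center", "centre", "none"}
VALIGN = {"baseline", "sub", "super", "top", "text-top",
          "middle", "bottom", "text-bottom"}
KEYED = {"link", "alt", "upright"}
SIZE_RE = re.compile(r"(\d*)(?:x(\d+))?px")


def classify_file_options(parts):
    """Sort split file-link options into a dict (hypothetical helper)."""
    result = {}
    for raw in parts:
        part = raw.strip()
        size = SIZE_RE.fullmatch(part)
        key, sep, value = part.partition("=")
        if part in MODES:
            result["mode"] = part
        elif part in ALIGN:
            result["align"] = part
        elif part in VALIGN:
            result["valign"] = part
        elif size and (size.group(1) or size.group(2)):
            # "220px" sets width, "x100px" sets height, "220x100px" sets both.
            if size.group(1):
                result["width"] = int(size.group(1))
            if size.group(2):
                result["height"] = int(size.group(2))
        elif sep and key.strip() in KEYED:
            # e.g. "link=Tokyo"; an empty value ("link=") is kept as "".
            result[key.strip()] = value.strip()
        elif part == "upright":
            result["upright"] = ""
        else:
            # The last unrecognized option wins as the caption.
            result["caption"] = part
    return result
```

For the original example, `classify_file_options("thumb|Label text".split("|"))` gives `{'mode': 'thumb', 'caption': 'Label text'}`. A plain `split("|")` is enough here only because the caption has no nested links or templates.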