Arrays type and rels should not contain duplicate items. #30
Also from http://pin13.net/mf2-dev/ (PHP parser output):
{
"items": [
{
"type": [
"h-cite",
"h-entry",
"h-entry"
],
"properties": {
"name": [
""
],
"url": [
"#"
]
}
}
],
"rels": {
"me": [
"#",
"#"
],
"bookmark": [
"#"
]
},
"rel-urls": {
"#": {
"rels": [
"me",
"bookmark",
"me"
]
}
}
}
Since HTML 'class' and 'rel' attributes are defined as unordered sets, we must preserve that semantic across any parsing transformations, in order to avoid introducing meaningless noise like artificial ordering which could accidentally cause a consuming application to erroneously infer and depend on such. The parsing spec must require uniqueness in the 'type' array accordingly, and since JSON arrays do have a defined ordering (whether you want it or not), the best we can do is to define a canonical ordering that does not imply anything about the unordered source, such as alphabetical ordering of unique h-* classnames. For 'rel' attributes, the spec http://microformats.org/wiki/microformats2-parsing#parse_a_hyperlink_element_for_rel_microformats already says the right things for treating them as sets, and notably does preserve source order in the URL sub-arrays for each rel key, which is intentional. (Originally published at: http://tantek.com/2018/079/t2/)
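A minimal Python sketch of the canonical ordering suggested above (unique h-* class names, sorted alphabetically). It assumes the 'class' value is split on ASCII whitespace as the WHATWG Infra standard defines it. The helper names split_on_ascii_whitespace and canonical_types are hypothetical, and treating every token that starts with "h-" as a root class name is a simplification of the parsing spec's stricter naming rules.

```python
import re

# ASCII whitespace per the WHATWG Infra standard: TAB, LF, FF, CR, SPACE.
_ASCII_WS = re.compile(r"[\t\n\f\r ]+")


def split_on_ascii_whitespace(value: str) -> list[str]:
    """Split an attribute value into tokens, dropping empty strings."""
    return [token for token in _ASCII_WS.split(value) if token]


def canonical_types(class_attr: str) -> list[str]:
    """Return the unique h-* class names of a 'class' value, sorted.

    Hypothetical helper. Simplification: any token starting with "h-"
    (and longer than the prefix) counts as a root class name.
    """
    roots = {
        token
        for token in split_on_ascii_whitespace(class_attr)
        if token.startswith("h-") and len(token) > 2
    }
    # Sorting gives an order that says nothing about the source order.
    return sorted(roots)


print(canonical_types("h-cite h-entry h-entry"))
print(canonical_types("h-entry\th-cite  h-entry"))
```

Because the result is a sorted set, "h-cite h-entry h-entry" and "h-entry h-cite" produce the same 'type' array, so neither repetition nor source order leaks into the output.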
+1
* Parse the rel attribute in accordance with the WHATWG spec: https://infra.spec.whatwg.org/#split-on-ascii-whitespace
* Only list unique rel values in the rel-urls output, fixes microformats#159: microformats/microformats2-parsing#30
* Sort the unique rel values alphabetically: microformats/microformats2-parsing#29
* Correctly merge attribute values into the resulting object.
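The steps in this commit message can be sketched in Python roughly as follows. This is not the code from that commit: build_rel_outputs is a hypothetical name, each link is assumed to arrive as an (href, rel attribute value) pair, and URL resolution plus other rel-urls keys (such as the link text) are left out. Rel values are treated as a set and sorted in 'rel-urls', while each URL list under 'rels' keeps document order without repeats, as described in the comment above.

```python
import re
from collections.abc import Iterable

# ASCII whitespace per the WHATWG Infra standard: TAB, LF, FF, CR, SPACE.
_ASCII_WS = re.compile(r"[\t\n\f\r ]+")


def split_on_ascii_whitespace(value: str) -> list[str]:
    """Split an attribute value into tokens, dropping empty strings."""
    return [token for token in _ASCII_WS.split(value) if token]


def build_rel_outputs(links: Iterable[tuple[str, str]]) -> tuple[dict, dict]:
    """Build 'rels' and 'rel-urls' from (href, rel attribute) pairs.

    Hypothetical helper: URL resolution and extra rel-urls keys are omitted.
    """
    rels: dict[str, list[str]] = {}
    rel_urls: dict[str, dict[str, list[str]]] = {}
    for href, rel_attr in links:
        # The rel attribute is a set: duplicate terms carry no meaning.
        rel_values = set(split_on_ascii_whitespace(rel_attr))
        if not rel_values:
            continue
        entry = rel_urls.setdefault(href, {"rels": []})
        # Merge with rel values already recorded for this URL, then sort
        # so the output order implies nothing about the source order.
        entry["rels"] = sorted(set(entry["rels"]) | rel_values)
        for rel in sorted(rel_values):
            urls = rels.setdefault(rel, [])
            # URL lists keep document order, without repeats.
            if href not in urls:
                urls.append(href)
    return rels, rel_urls


rels, rel_urls = build_rel_outputs([("#", "me bookmark me")])
print(rels)
print(rel_urls)
```

For the 'a' element discussed in this issue, a rel value of "me bookmark me" ends up as ["bookmark", "me"] under 'rel-urls', instead of the ["me", "bookmark", "me"] shown in the PHP output.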
The 'div' element in the following example really only specifies two classes on itself, even though its 'class' attribute contains three terms. Likewise, the 'a' element creates two different relations between the source document and the URL in 'href', even with three terms in the 'rel' attribute.

If we compare the development version of the Python parser with the Go parser, the issue becomes clear. The Python parser only shows unique values for ["items"][0].type and ["rel-urls"]["#"].rels, while the Go parser shows duplicate h-entry and me values there.

The 'class' and 'rel' attributes are the only HTML attributes that microformats parsing depends on which are sets in the source HTML, where duplicate terms have no effect. They are mapped to the arrays 'type' and 'rels' respectively.

The proposed solution is to:
* not allow duplicate items in the 'type' and 'rels' arrays.

This is actually already the case for 'rels'.

Parser output (Python, Go)
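One way to picture the proposal is as a post-processing pass over parser output, such as the PHP output earlier in this thread. The sketch below is hypothetical (dedupe_mf2 is not an existing parser API). It removes duplicates from every item's 'type' array, including embedded property values and children, and from each 'rels' array under 'rel-urls', using the alphabetical order suggested in this thread. The top-level 'rels' object is left untouched.

```python
import copy
import json


def _dedupe_item(item: dict) -> None:
    """Deduplicate 'type' in place for an item and everything nested in it."""
    # Unique types, in alphabetical order.
    item["type"] = sorted(set(item.get("type", [])))
    for values in item.get("properties", {}).values():
        for value in values:
            # Embedded microformats are dicts that carry their own 'type'.
            if isinstance(value, dict) and "type" in value:
                _dedupe_item(value)
    for child in item.get("children", []):
        _dedupe_item(child)


def dedupe_mf2(doc: dict) -> dict:
    """Return a copy of parsed mf2 JSON without duplicate types or rel values."""
    result = copy.deepcopy(doc)  # leave the caller's data unchanged
    for item in result.get("items", []):
        _dedupe_item(item)
    for entry in result.get("rel-urls", {}).values():
        entry["rels"] = sorted(set(entry.get("rels", [])))
    return result


if __name__ == "__main__":
    # The PHP output quoted in this thread.
    php_output = {
        "items": [{
            "type": ["h-cite", "h-entry", "h-entry"],
            "properties": {"name": [""], "url": ["#"]},
        }],
        "rels": {"me": ["#", "#"], "bookmark": ["#"]},
        "rel-urls": {"#": {"rels": ["me", "bookmark", "me"]}},
    }
    print(json.dumps(dedupe_mf2(php_output), indent=2))
```

In a real parser it is probably cleaner to deduplicate at the point where 'class' and 'rel' are split into terms; the separate pass only makes the intended result easy to compare with the PHP output above.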