Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Abbreviated decade (e.g. ’80s) breaks smart punctuation #1030

Closed
barrymieny opened this issue Jul 14, 2024 · 1 comment · Fixed by #1034
Closed

Abbreviated decade (e.g. ’80s) breaks smart punctuation #1030

barrymieny opened this issue Jul 14, 2024 · 1 comment · Fixed by #1034
Labels
bug Something isn't working right spec compliance Issues or question about compliance with the CommonMark or GFM specs

Comments

@barrymieny
Copy link

Version(s) affected

2.4.2

Description

When the QuoteParser encounters a string such as:

In the middle to late ’90s, it was chaos. "We couldn't get out of that rut."

... it seemingly includes the apostrophe before 90 in its calculations, and you end up with a string like:

In the middle to late ‘90s, it was chaos. “We couldn’t get out of that rut.“

... where it replaces the already formatted closing single quote before 90 (which is how abbreviated decades should be formatted) with an opening single quote, and then the final double quote ends up as another opening double quote instead of a closing double quote.

I don't know whether the solution lies in the QuoteParser being aware of the abbreviated decade format or having the option to instruct it to disregard already formatted quotes or specify it to only format unformatted quotes.

I'm migrating a Node-based site that uses markdown-it to PHP, and markdown-it handles this situation properly, so there does seem to be a way to interpret this.

How to reproduce

I tested this with the GrahamCampbell/Laravel-Markdown package as well as using thephpleague/commonmark directly, so the problem does seem to lie within this package in either the QuoteParser or QuoteProcessor.

To reproduce, use any decade abbreviation (e.g. '60s unstyled or ’70s styled) in a paragraph with other quoted text. Both will result in the wrong formatting.

@colinodell colinodell added bug Something isn't working right spec compliance Issues or question about compliance with the CommonMark or GFM specs labels Jul 22, 2024
@colinodell
Copy link
Member

Thanks for the highly detailed bug report!

The smart punctuation spec doesn't dictate how already-formatted quotes should be handled. It seems that I must have decided at some point that we should try to correct bad formatting, like ”backwards quotes“, and I implemented this by normalizing all quotes (including the already-formatted ones) into " or ' and then applying the algorithm.

In retrospect, I think this was the wrong approach, for three reasons:

  1. We should respect existing formatting choices (as you suggest)
  2. The spec doesn't call for this extra normalization
  3. The commonmark.js library which introduced that spec doesn't apply that extra normalization

So I think the correct change would be to ignore any already-formatted quotes. I'll put together a PR for this.

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
bug Something isn't working right spec compliance Issues or question about compliance with the CommonMark or GFM specs
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants