feat(gatsby-transformer-remark): Better timeToRead for Chinese/Japanese texts #21312

Merged
merged 5 commits into gatsbyjs:master
Feb 11, 2020

Conversation

jlkiri
Contributor

@jlkiri jlkiri commented Feb 9, 2020

Description

This PR addresses issue #21311.
It provides a better Chinese/Japanese character-counting heuristic for gatsby-transformer-remark, replacing the current one, which outputs timeToRead values about twice as high as a native reader would expect.

I considered using actual morphological parsers like kuromoji but they only target one language at a time, so dealing with both Chinese/Japanese automatically would require two new libraries and possibly dictionary files for morphological analysis. I feel like that is too much for this particular function.

Instead, we can use the fact that most words in both Chinese and Japanese consist of two characters (slightly more for Japanese). After playing with different texts, I found that simply multiplying the non-Latin character count by 0.56 gives almost the same result as analyzing the text with an actual morphological parser (±10 words on average). No libraries needed. This is what I'm doing in this PR.
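
For readers who want to see the idea concretely, here is a minimal sketch of the heuristic described above. It is not the code merged in this PR: the function names, the simplified Latin word pattern, and the 265 words-per-minute reading speed are assumptions made for illustration. Only the 0.56 factor comes from the description.

```javascript
// Illustrative sketch only; names and constants other than 0.56 are assumptions.

// Matches a single Han, Hiragana, or Katakana character (Unicode script property).
const CJ_CHAR = /[\p{sc=Han}\p{sc=Hiragana}\p{sc=Katakana}]/gu;

// Simplified pattern for whitespace-delimited Latin words (assumption).
const LATIN_WORD = /[A-Za-z0-9]+(?:['-][A-Za-z0-9]+)*/g;

// Factor quoted in the PR description: words per Chinese/Japanese character.
const CJ_WORDS_PER_CHAR = 0.56;

// Assumed average reading speed in words per minute (hypothetical default).
const ASSUMED_WPM = 265;

function estimateWordCount(text) {
  const cjChars = (text.match(CJ_CHAR) || []).length;
  const latinWords = (text.match(LATIN_WORD) || []).length;
  // Latin words are counted directly; CJ characters are scaled to words.
  return latinWords + cjChars * CJ_WORDS_PER_CHAR;
}

function estimateTimeToRead(text, wpm = ASSUMED_WPM) {
  // Never report less than one minute.
  return Math.max(1, Math.round(estimateWordCount(text) / wpm));
}

console.log(estimateWordCount("Gatsby は速い"));
console.log(estimateTimeToRead("日本語のテキスト"));
```

Scaling the character count keeps the estimate dependency-free, at the cost of treating every Chinese/Japanese character the same regardless of the word it belongs to.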

Note that Korean (which uses whitespace) is already perfectly countable by _.words so I am not dealing with it.

Here is a CodeSandbox that shows how the different approaches count words (gatsby is the current one, smart is the one in this PR, and morphological is the most accurate one):
https://codesandbox.io/s/better-word-count-2uziu

Documentation

https://www.gatsbyjs.org/packages/gatsby-transformer-remark/

Related Issues

#21311
#17988

@jlkiri jlkiri requested a review from a team as a code owner February 9, 2020 14:09
@jlkiri jlkiri changed the title from "feat(gatsby-transformer-remark): Better time to read" to "feat(gatsby-transformer-remark): Better timeToRead for Chinese/Japanese texts" Feb 9, 2020
@jlkiri
Contributor Author

jlkiri commented Feb 11, 2020

What are the starters_validate tests and why are they failing?

@jlkiri jlkiri requested a review from pieh February 11, 2020 10:35
@pieh
Contributor

pieh commented Feb 11, 2020

> What are the starters_validate tests and why are they failing?

We run npm audit on our starters, and it sometimes fails on unrelated pull requests when a new advisory is published. This one has already been fixed in master (#21354), so don't worry about it, but you can merge master in to get rid of the failing check here.

Contributor

@pieh pieh left a comment

Looks good! Thanks @jlkiri!

@pieh pieh added the "bot: merge on green" label (Gatsbot will merge these PRs automatically when all tests pass) Feb 11, 2020
@gatsbybot gatsbybot merged commit d677deb into gatsbyjs:master Feb 11, 2020
@gatsbot

gatsbot bot commented Feb 11, 2020

Holy buckets, @jlkiri — we just merged your PR to Gatsby! 💪💜

Gatsby is built by awesome people like you. Let us say “thanks” in two ways:

  1. We’d like to send you some Gatsby swag. As a token of our appreciation, you can go to the Gatsby Swag Store and log in with your GitHub account to get a coupon code good for one free piece of swag. We’ve got Gatsby t-shirts, stickers, hats, scrunchies, and much more. (You can also unlock even more free swag with 5 contributions — wink wink nudge nudge.) See gatsby.dev/swag for details.
  2. We just invited you to join the Gatsby organization on GitHub. This will add you to our team of maintainers. Accept the invite by visiting https://github.com/orgs/gatsbyjs/invitation. By joining the team, you’ll be able to label issues, review pull requests, and merge approved pull requests.

If there’s anything we can do to help, please don’t hesitate to reach out to us: tweet at @gatsbyjs and we’ll come a-runnin’.

Thanks again!

3 participants