Improve intra-site links (1) #19526

Closed (1 commit proposed for merge)

@tengqm (Contributor) commented Mar 7, 2020

This attempts to improve intra-site links so that the links can be automatically translated to localized versions.

xref: #18403

This PR is about the concepts/architecture and concepts/cluster-administration directories.

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Mar 7, 2020
@k8s-ci-robot k8s-ci-robot added language/en Issues or PRs related to English language sig/docs Categorizes an issue or PR as relevant to SIG Docs. labels Mar 7, 2020
@tengqm (Contributor, Author) commented Mar 7, 2020

/assign @zacharysarah @jimangel @kbarnard10

Assigning this to SIG chairs because this PR has some site wide implications.

@netlify bot commented Mar 7, 2020

Deploy preview for kubernetes-io-master-staging ready!

Built with commit d1a5b61

https://deploy-preview-19526--kubernetes-io-master-staging.netlify.app

@remyleone (Contributor) commented

As a reference for how Hugo's ref works: https://gohugo.io/content-management/cross-references/

@sftim (Contributor) commented Mar 7, 2020

This kind of commit is likely to trigger a conflict with work for v1.18
How about holding off until that release is out?

@tengqm you could have this PR target just content/en/docs/concepts/example-concept-template.md and then do more pages in a future PR. Does that approach work for you?

@tengqm tengqm force-pushed the improve-links-1 branch 2 times, most recently from c6ad252 to 4002252 Compare March 8, 2020 01:22
@tengqm (Contributor, Author) commented Mar 8, 2020

> This kind of commit is likely to trigger a conflict with work for v1.18.
> How about holding off until that release is out?

I don't think this PR is any different from a typo fix. It breaks nothing.

> you could have this PR target just content/en/docs/concepts/example-concept-template.md and
> then do more pages in a future PR. Does that approach work for you?

I was hoping to find out how many gotchas are out there when applying this change to more docs. A change to one page is not enough for this purpose. With this PR, if you review carefully, you will find that:

  • Links to images are not changed, although they are "intra-site" links. The reason is that Hugo doesn't process the ref function in that case.
  • Links to generated API docs are not changed, because converting them would leave us with embedded (nested) shortcodes. Keeping them unchanged seems reasonable because ... no one will translate the auto-generated API reference, right?
  • Links to some missing pages were previously handled as redirections (or even redirections of redirections). This does not work for ref. We have to point the link to the real Markdown file.
  • Missing links are found immediately when building the site. This is a bonus.

@tengqm tengqm force-pushed the improve-links-1 branch from 4002252 to 0f7877a Compare March 8, 2020 01:41
@sftim (Contributor) commented Mar 8, 2020

Thanks @tengqm—you've answered and addressed my concerns around waiting for the v1.18 release.

I endorse the aim of this PR: /approve
It seems worth discussing this at a SIG Docs weekly meeting too.

@kbhawkey (Contributor) commented

> This attempts to improve intra-site links so that the links can be automatically translated to localized versions.
>
> xref: #18403
>
> This PR is about the concepts/architecture and concepts/cluster-administration directories.

@tengqm, I am reading through the changes. While this is an isolated PR, it seems as if you are proposing this as an option for changing all intra-site links (and that more PRs would follow)?
Could you explain a bit more about how the shortcode (ref or relref) works throughout the site? I will look at the generated pages to view the format of the links.
Are there other sites using Hugo that have the same issue (or have solved it)? What does Hugo recommend in this case?
I only see changes to the English content. What would the changes look like for the localized versions of the same files?

From https://gohugo.io/content-management/cross-references/#use-rel-and-relref:

> Hugo is flexible in how we search for documents, so the file suffix may be omitted.

{{< relref path="document.md" lang="ja" >}}

@sftim (Contributor) commented Mar 11, 2020

> I only see changes to the English content. What would the changes look like to the same files only localized?

That's a really good question. Maybe for a follow up PR? Maybe it needs to be in this one?

@tengqm (Contributor, Author) commented Mar 11, 2020

@kbhawkey There was a PR for testing (#18374). If we use the ref way of spelling out URLs, no localization team needs to change '/doc/something' to '/zh/doc/something' or '/fr/doc/something'. Also, please see the discussion in #18403.

@kbhawkey (Contributor) commented

To recap my understanding (which may be inaccurate) 😄 :

Currently, each localized page has to modify its intra-site links so that they point to the localized version of the target page. If a link is updated to point to a page that does not exist (not translated), that is a problem.
If the link is not updated, then it points to the English version of the page (which is desirable if the localized page does not exist, but not desirable if there is a localized version of the page)?

Will the use of ref or relref cause a (Hugo) build failure if the page referred to by the intra-site link has not been translated? Will the link fall back to the English version of the page? What is the solution in this case?

@tengqm (Contributor, Author) commented Mar 12, 2020

@kbhawkey It seems your understanding has been verified by the experiment I did in #18403. A missing page won't pass Hugo compilation. This might be a good thing, because it means all dangling links in the localized sites are now detected as early as possible. In the experiment in #18403, the Chinese version of the Deployment page was missing. Hugo detected that and complained.
Falling back to the English version could be an option. However, from the user's perspective, the only difference is that we are replacing a 404 page with an English page that the user cannot read. (They might prefer reading the English site if they could.)
To make sure Hugo can still compile your current localization PR when it links to a page that doesn't exist yet, you may want to commit a "placeholder" page. For example, it can be an empty page with a "header 1" saying "TO BE TRANSLATED".

@sftim (Contributor) commented Mar 12, 2020

> an empty page with a "header 1" saying "TO BE TRANSLATED".

At the moment, the style we have fills in the navigation with (marked) links to the English originals. I'd like to keep that behaviour. To me, serving an “under construction” type page with a 200 response wouldn't be an improvement.

It'd be neat to use CSS or some other technique to highlight if a link is going to switch localization on the reader.

@tengqm (Contributor, Author) commented Mar 12, 2020

@sftim If there is a strong opinion for auto-redirecting missing pages in localization to their corresponding English version, we can do that. Actually, this technology has been proposed before. We will need to introduce some shortcodes for that. I was not voting for that approach because I'm unsure whether the website team welcomes such code.
No "under construction" page is supposed to last long. Eventually, the localization team will catch up, hopefully, unless we keep restructuring the English site.

@sftim (Contributor) commented Mar 12, 2020

Perhaps (separate PR): the 404 page can say something like “sorry, /ko/docs/home/foobar/ is 404. [AJAX query] If you speak English, there's /docs/home/foobar/ though?”

@kbhawkey (Contributor) commented

> To make sure Hugo can still compile your current localization PR which links a page that doesn't exist yet, you may want to commit a "placeholder" page. For example, it can be an empty page with a "header 1" saying "TO BE TRANSLATED".

Some more thoughts:
With the ref shortcode, it is not possible to "switch" between localized content and the en pages. Is this what readers want?
What is the percentage of translated pages per localization? Some localizations may have a larger number of "empty" pages.

Back to the Hugo build question. If the en pages (in a docs section) are updated to use ref links, then all corresponding localized pages would also be updated with ref links, correct?
From what I read, I would expect a largish number of build failures unless "substitute" pages are created (for when a link points to a non-existent localized page).
Do you want to configure Hugo to produce this localized "placeholder" page, using the variables refLinksErrorLevel ("ERROR") and refLinksNotFoundURL described at
https://gohugo.io/content-management/cross-references/#ref-and-relref-configuration

OR create a static, localized "substitute" page for every missing page?

@sftim (Contributor) commented Mar 12, 2020

> a static, localized "substitute" page for every missing page

The trouble then is that, as I understand it, Netlify will always serve that with a 200 status. If it were possible to serve a statically rendered, relevant 404 page then I'd be really fine with that.

It's very likely that there will be localizations with missing content and in the future I expect that to become more common.

@tengqm (Contributor, Author) commented Mar 13, 2020

@kbhawkey

> With the ref shortcode, it is not possible to "switch" between localized content and the en pages. Is this what readers want?

Not sure about this. ref was not designed with an option that allows you to fall back to a default language. Readers can always switch between languages using the language dropdown list in the main navigation bar. Falling back to the English site can be achieved by introducing a shortcode (see #18678). However, as a reader, my experience is not as good: why am I redirected to an English page here? Is this a bug?

> What is the percentage of translated pages per localization? Some localizations may have a larger number of "empty" pages.

No idea. The current situation is that if we don't add the /<lang>/ prefix, all links in the localized pages seem to work and they bring you back to the English site. We have no idea how many pages are not translated because 1) at build time, Hugo does not complain; 2) at runtime, the links are "working".

> Back to the Hugo build question. If the en pages (in a docs section) update to use ref links then all corresponding localized pages would also update with ref links, correct?

No. Using ref in the English site is an attempt to ease localization rather than to break anything. None of the existing translations will break. It is up to each localization team to use ref or the more traditional Markdown links. You can actually think of this as a typo fix. Every localization team can make this transition at its own pace. Once all transitions are done, no team needs to add the /<lang>/ prefix in the localized pages.

> From what I read, I would expect a largish number of build failures unless "substitute" pages are set to create (when a link points to a non-existent localized page).

The Hugo build will fail if and only if you are using ref in a localized page and the referenced target is missing. This is not going to happen automatically for all pages at once.

> Do you want to configure Hugo to produce this localized "placeholder" page, OR create a static, localized "substitute" page for every missing page?

I'd go with placing a "to be localized" page. If such a page can be reused by customizing Hugo, that would be great. I agree that creating "placeholder" pages for every missing target is less ideal. If we can use refLinksNotFoundURL, it may address the concerns from @sftim as well.

@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 18, 2020
@kbhawkey (Contributor) commented

Hi. I am reading about upgrading Hugo to take advantage of "portable links". Would this work or not? Introducing the ref shortcode into several English files starts the process of converting the entire site (Are we suggesting that authors should use this shortcode for intra-site links?). I'd rather see a more complete prototype or test out the links/site with the latest version of Hugo.

@tengqm (Contributor, Author) commented Mar 19, 2020

@kbhawkey Thanks for the suggestion. I did take a look at the markdown render hooks and the example snippet. My current impression (could be wrong) is that it is not directly applicable to our multilingual case.

The main concern to the approach proposed here, as commented previously, is about the handling of missing links. If we use the "markdown render hooks", we will be introducing some shortcode which has to deal with all kinds of links and all languages, while at the same time solving the missing links issue. Consider the following cases:

I'm not convinced that a simple render hook can resolve all of the above problems.

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 2, 2020
@sftim (Contributor) commented Apr 18, 2020

@tengqm - in a SIG Docs meeting, @kbhawkey made a compelling case that Hugo could render traditional Markdown hyperlinks in a localization-friendly way:

  • if the localization has the target page, link to that
  • if not, link to the English page instead
  • keep the navigation working too
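
A minimal sketch of the first two rules in the list above, written in Python only to make the fallback logic concrete. It is not how Hugo or the Kubernetes website implements link rendering; the function name `resolve_link` and the `existing_pages` set of published URL paths are assumptions made for illustration. The third rule (navigation) is not covered.

```python
def resolve_link(target: str, lang: str, existing_pages: set,
                 default_lang: str = "en") -> str:
    """Pick the URL a Markdown link should render to for one localization.

    `existing_pages` is a hypothetical set of published URL paths,
    e.g. {"/zh/docs/concepts/storage/volumes/"}.
    """
    # Only intra-site docs links are candidates for localization.
    if not target.startswith("/docs/"):
        return target

    # Split off the fragment so the page lookup uses the bare path.
    path, sep, fragment = target.partition("#")
    localized = f"/{lang}{path}"

    # Rule 1: the localization has the target page, so link to it.
    if lang != default_lang and localized in existing_pages:
        return localized + sep + fragment

    # Rule 2: otherwise keep pointing at the English page.
    return target
```

The fragment is carried over unchanged, which matches the caveat that localizations may still need to adjust links that include fragment identifiers.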

Localizations might still want to adjust links that include fragment identifiers. However, most of the time when a localization adds a new page the rendering approach that @kbhawkey suggested sounds like it will let links to that new page “just work”.

I hope you're convinced now that a different approach from this PR is the way forward. What are your thoughts?

@kbhawkey (Contributor) commented

> @tengqm - in a SIG Docs meeting, @kbhawkey made a compelling case that Hugo could render traditional Markdown hyperlinks in a localization-friendly way:
>
> * if the localization has the target page, link to that
> * if not, link to the English page instead
> * keep the navigation working too
>
> Localizations might still want to adjust links that include fragment identifiers. However, most of the time when a localization adds a new page the rendering approach that @kbhawkey suggested sounds like it will let links to that new page “just work”.
>
> I hope you're convinced now that a different approach from this PR is the way forward. What are your thoughts?

Hi @sftim @tengqm . I looked into this a bit further (PR #20114) and left some comments.
I still think that using a shortcode to publish links is not user/author friendly.
Is there a question about publishing a completely localized version of the content versus continuing to publish content that is a mix of English and localized content?
The English pages are marked (EN) in the left nav tree. It is very clear which pages are localized. I think everyone agrees that updating links can be difficult and reading content with incorrect links is not a great user experience.
I discovered that automatically changing/publishing a link from what is explicitly written in the Markdown file is not straightforward.
Ideally, we want the Markdown file to be the source of truth and generate from this source.
I commented that perhaps what is needed is a script to notify/alert the author that there are links that should be updated to the localized version (a localized version exists, the script lists the required link changes). The script could also check that the English version of the link exists (or suggest a change). The author needs to update the page. What do you think?
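
The notification script suggested above could look roughly like the following Python sketch. It is not the script from any PR referenced in this thread; `suggest_link_updates`, the regular expression, and the `localized_pages` set are all hypothetical names introduced for illustration, and only inline Markdown links of the form `[text](/docs/...)` are considered.

```python
import re

# Inline Markdown links whose target starts with /docs/; the optional
# second group captures a fragment such as "#csi".
LINK_RE = re.compile(r"\]\((/docs/[^)\s#]*)(#[^)\s]*)?\)")


def suggest_link_updates(markdown: str, lang: str, localized_pages: set) -> list:
    """List (current, suggested) link targets for one localized page.

    A suggestion is emitted only when a localized version of the
    target already exists in `localized_pages` (a set of URL paths).
    """
    suggestions = []
    for match in LINK_RE.finditer(markdown):
        path = match.group(1)
        fragment = match.group(2) or ""
        localized = f"/{lang}{path}"
        if localized in localized_pages:
            suggestions.append((path + fragment, localized + fragment))
    return suggestions
```

The function only reports; the author still edits the page, which keeps the Markdown file as the source of truth.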

@tengqm (Contributor, Author) commented Apr 19, 2020

Thanks for keeping the ball rolling ...

> I still think that using a shortcode to publish links is not user/author friendly.

Totally agree.

> Is there a question about publishing a completely localized version of the content versus continuing to publish content that is a mix of English and localized content?

There have been opinions that a mixture of English and localized content will be the case for a long time. That was also the reason why people wanted the "auto falling back to English" option.

> I discovered that automatically changing/publishing a link from what is explicitly written in the Markdown file is not straightforward.

Yup. Automation is difficult, if possible at all. That is why I tried to do all these conversions once and for all, so that at least no localization team needs to handle redirected links or links to "foo/bar/_index.md" files.

> I commented that perhaps what is needed is a script to notify/alert the author that there are links that should be updated to the localized version (a localized version exists, the script lists the required link changes).

So ... you mean that for all French translations, we warn them that there are links not starting with "https://" or "#" and that they should be revised to "/fr/something"?

> The script could also check that the English version of the link exists (or suggest a change).

This is going to be hard. We have many redirections in the English Markdown.

With all the above observations, I'm NOT implying that this PR is the best option going forward. The advantages of this PR are:

  • it works
  • it introduces no new shortcode to maintain
  • as a happy side effect, it cleans up the redirected links in the existing content

The cons, as I see it:

  • converting all existing links is a HUGE job;
  • we are forcing authors to use the ref shortcode for all intrasite links;
  • when a ref used in a localized page points to something not translated, an error is thrown. This can be made into a warning, so not a huge barrier;

@tengqm (Contributor, Author) commented Jun 13, 2020

abandoned.

@k8s-ci-robot (Contributor) commented

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
To complete the pull request process, please assign jimangel
You can assign the PR to them by writing /assign @jimangel in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@irvifa (Member) commented Jul 10, 2020

@tengqm Hi, I'm wondering how you changed the links across the whole directory here. Are there any scripts for changing the links in each localization directory?

@tengqm (Contributor, Author) commented Jul 10, 2020

@irvifa When I was working on this, I had to check each and every link in a Markdown file because I didn't know how many variants were out there. With this PR, I think I gained a better understanding of the types of links we need to deal with and the "proper" way to deal with them.

Later on, @kbhawkey proposed a script (#21996) for identifying link problems. That is a good starting point. Maybe we can leverage that script to semi-automate this site-wide change.

WDYT?

@irvifa (Member) commented Jul 10, 2020

@tengqm Yes, I absolutely agree with semi-automating this. However, a site-wide change will produce a massive PR because it affects every available localization. How about using that script separately for each localization instead? That way each localization team can run the script autonomously and we avoid one massive incoming PR. WDYT?

@tengqm (Contributor, Author) commented Jul 11, 2020

> using site wide will cause a massive PR because it will affect all available localization

Yes, it will be a pain in the short term, just like what we did for the site theme changes. Once the change to the English content has landed, no localization team needs to customize each and every intra-site link. The revised links just work for all languages.

> how about using that script only for each of the localization? This way each localization team can use the script autonomously and refrain the incoming massive PR. wdyt?

That brings us back to the intent of this PR. We can leave the English site as is. Then, across all localization teams (there are 14 of them now), the same link will be changed 14 times, once by each individual team. Yes, a script can help detect issues, hopefully. What I am trying to argue is that the primary reason we switched to Hugo was that Hugo has "better" support for localization. Simply by replacing links like this

[CSI](/docs/concepts/storage/volumes/#csi)

with something like this

[CSI]({{< ref "/docs/concepts/storage/volumes.md#csi" >}})

ALL localization teams will benefit from it. Again, we have 14 localization teams now.
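
For the simple case shown above, the rewrite can be sketched in Python as follows. This is an illustration under stated assumptions, not the tooling used for this PR: `to_ref_links` is a hypothetical helper, and it assumes every `/docs/.../` target maps to a same-named `.md` file. It deliberately ignores `_index.md` pages, redirects, generated API reference links, and targets that contain shortcodes.

```python
import re

# Inline Markdown links to /docs/ pages: group 1 is the path without
# its trailing slash, group 2 is an optional fragment.
LINK_RE = re.compile(r"\]\((/docs/[^)\s#]+?)/?(#[^)\s]*)?\)")


def to_ref_links(markdown: str) -> str:
    """Rewrite plain intra-site links to use Hugo's ref shortcode."""

    def repl(match):
        path = match.group(1)
        fragment = match.group(2) or ""
        # Assumption: the page lives at <path>.md in the content tree.
        return "](" + '{{< ref "' + path + ".md" + fragment + '" >}}' + ")"

    return LINK_RE.sub(repl, markdown)
```

Applied to the `[CSI]` link above, the helper produces the `ref` form shown in the comment; external links are left untouched.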

@irvifa (Member) commented Jul 11, 2020

@tengqm Understood. Would you like to rebase your branch first? I see there are lots of conflicts.

@tengqm tengqm force-pushed the improve-links-1 branch from f403c58 to d1a5b61 Compare July 11, 2020 10:29
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 11, 2020
@@ -26,7 +26,7 @@ closer to the desired state, by turning equipment on or off.
 ## Controller pattern
 
 A controller tracks at least one Kubernetes resource type.
-These [objects](/docs/concepts/overview/working-with-objects/kubernetes-objects/#kubernetes-objects)
+These [objects]({{< ref "/docs/concepts/overview/working-with-objects/kubernetes-objects.md#kubernetes-objects" >}})
Review comment (Member):

I'm wondering whether this literally should be the absolute path of the link. Another nit: we could probably document the best practices for writing links somewhere so that others can adopt this way of writing a link. WDYT?

Review comment (Contributor, Author):

Yes. The ref shortcode requires the Markdown file name. So there are pros and cons:

The benefits as I see it:

  • By explicitly using file names in the links, we can avoid dangling links. For every link written this way, Hugo can help ensure the target does exist, and it can do so for every language translation.
  • As a side effect, we can kill some link targets that are actually redirects. For example, [kube-apiserver](/docs/admin/kube-apiserver/) has to be rewritten into [kube-apiserver]({{< ref "/docs/reference/command-line-tools-reference/kube-apiserver.md" >}}).

The drawback is that rewriting the links is a tedious task, because for each link we need to check:

  1. If the link is [foo](/docs/bar/), check whether /docs/bar.md exists in the file system, so that the link can become [foo](/docs/bar.md). If not,
  2. check whether the target file /docs/bar/_index.md exists, and if so revise the link to [foo](/docs/bar/_index.md). Otherwise,
  3. check whether there is an entry in /static/_redirects such as "/docs/foo/ /docs/zoo/ 301". If so, check /docs/zoo/ by repeating steps 1 and 2. Otherwise,
  4. check whether the target matches the pattern /docs/reference/generated/kube*. In this case we have to leave the target as is, because it is not meant to be localized. The target file lives in the /static/docs/... directory.
  5. Finally, if the target itself contains a shortcode, we cannot change it and have to leave it as is. For example, [latest version](/docs/path1/{{< latest-version >}}/path2).
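
The five checks above can be sketched as one Python function. This is a hedged illustration, not a tool from this PR: `classify_link`, the `files` set of content-relative Markdown paths, and the `redirects` mapping (standing in for parsed /static/_redirects entries) are all hypothetical. The two "leave as is" cases (steps 4 and 5) are tested first because they are cheap exclusions; the outcome is the same as checking them last.

```python
def classify_link(target: str, files: set, redirects: dict, max_hops: int = 5):
    """Return ("ref", md_path), ("keep", target) or ("missing", target)."""
    # Step 5: targets containing a shortcode cannot be rewritten.
    if "{{<" in target:
        return ("keep", target)

    path, sep, fragment = target.partition("#")

    # Step 4: generated API reference pages are not localized.
    if path.startswith("/docs/reference/generated/kube"):
        return ("keep", target)

    # Follow at most `max_hops` redirects to avoid looping forever.
    for _ in range(max_hops + 1):
        stem = path.rstrip("/")
        # Step 1: a regular page, /docs/bar.md.
        if stem + ".md" in files:
            return ("ref", stem + ".md" + sep + fragment)
        # Step 2: a section page, /docs/bar/_index.md.
        if stem + "/_index.md" in files:
            return ("ref", stem + "/_index.md" + sep + fragment)
        # Step 3: a redirect entry; repeat steps 1 and 2 on its target.
        redirected = redirects.get(path) or redirects.get(stem + "/")
        if redirected is None:
            break
        path = redirected

    # No file and no redirect: a dangling link for a human to fix.
    return ("missing", target)
```

Returning a label instead of rewriting in place leaves the final decision to the author, which suits the cases where a redirect chain ends somewhere unexpected.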

Review comment (Contributor, Author):

I have given this some more thought these days. Maybe I was too ambitious in starting this thread. I'm not sure such logic can find its home in GoHugo. Even if I can complete this in a week or so, how difficult will it be for doc maintainers to check that all future link changes obey such a complicated rule, so that they are both valid and friendly to localization teams?

I'm kinda split on this myself. Maybe we should develop a script for this instead? Since we know almost all the gotchas out there, we can come up with a tool for all localization teams to use. Maybe such a tool can be combined with the link checking for the English site as well? I don't know. I'm just dumping my brain here.

Review comment (Member):

I second the idea of automation and a link checker for this. Since we know the patterns and the possible gotchas, automation will be easier, and we can back it with rigorous testing. We need your input on this, @sftim. WDYT?

Review comment (Contributor):

I'm minded to look a little more to see if we can convert a hyperlink within a localization, e.g. one that appears in Markdown as (say) /docs/tasks/tools/install-kubectl/, so that the rendered HTML has <a href="/eg/docs/concepts/docs/tasks/tools/install-kubectl/">…</a>. In other words, the site handles the conversion automatically.

If that's feasible, that's my favorite option. There might be a chance to get advice on this from the people behind Hugo, too.

If we had the automatic conversion in place, we could run a one-time script to clean up all of the links that manually include the localization code.
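
Such a one-time cleanup could be as small as the Python sketch below. It is an assumption-laden illustration rather than an existing script: `strip_lang_prefix` is a hypothetical name, it handles only inline Markdown links, and it presumes the automatic conversion is already in place so that dropping the prefix does not send readers to the English page.

```python
import re


def strip_lang_prefix(markdown: str, lang: str) -> str:
    """Remove a manually added /<lang> prefix from intra-site docs links."""
    # Match "](/<lang>/docs/...)" and keep only the "/docs/..." part.
    pattern = re.compile(r"\]\(/" + re.escape(lang) + r"(/docs/[^)\s]*)\)")
    return pattern.sub(r"](\1)", markdown)
```

For example, with `lang="ko"` the link `[home](/ko/docs/home/)` becomes `[home](/docs/home/)`, while links for other languages and external links are left alone.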

Review comment (Member):

Do you know whom should we contact regarding this Hugo config?

Review comment (Contributor):

Hugo has its own page about requesting help so that might be one place to turn.

I'd be delighted to learn that some other group that uses Hugo has already found this problem and prepared a good solution.

@tengqm (Contributor, Author) commented Jul 20, 2020

/close
This doesn't seem to be a viable approach for the docs community.
We are tracking this in a different way (#22541).

@tengqm tengqm closed this Jul 20, 2020
@tengqm tengqm deleted the improve-links-1 branch July 20, 2020 03:26
Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. language/en Issues or PRs related to English language sig/docs Categorizes an issue or PR as relevant to SIG Docs. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
10 participants