[FR] check_config() should check for valid URLs #616

Closed
apreshill opened this issue Apr 20, 2021 · 4 comments
Labels
next to consider for next release

Comments

@apreshill
Contributor

Again from my debugging diaries:

> blogdown::check_config()
― Checking config.toml
| Checking "baseURL" setting for Hugo...
○ Found baseURL = "openscapes.org"; nothing to do here!

The issue was that there was something to do! The baseURL should start with http. Is there a way to check whether the baseURL is a valid URL?

@yihui
Member

yihui commented Apr 20, 2021

This is actually a hard problem, because in theory, there is no way to tell if openscapes.org is valid or not. This baseURL string can be a valid subpath (gohugoio/hugo#7823 (comment)) under another domain, e.g., https://example.org/openscapes.org/. baseURL doesn't have to start with http.

We'll have to use some heuristics to detect "obviously wrong" base URLs, which I think will be helpful, because this seemingly simple problem just confuses too many people (https://yihui.org/en/2018/01/valid-url/).

@yihui yihui added the next to consider for next release label Apr 20, 2021
@apreshill
Contributor Author

Oof yes. I wonder if the check could be:

  1. Level 0: do you have something there that is not just a slash?
  2. Level 1: is it a valid URL? Maybe use something like https://search.r-project.org/CRAN/refmans/RCurl/html/url.exists.html
  3. Level 2: if it is not just a slash AND not a valid URL, print essentially: "not a valid URL, but carry on if using subpaths on purpose!"
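
The three levels above could be sketched roughly as follows. This is only an illustration: classify_base_url() and report_base_url() are hypothetical names, not blogdown functions, and level 1 uses a purely syntactic regex check in place of RCurl::url.exists(), so that nothing here needs an internet connection.

```r
# Hypothetical helpers, not part of blogdown: sort a baseURL string into
# the three proposed levels using string checks only.
classify_base_url = function(x) {
  x = trimws(x)
  # Level 0: nothing there, or just a slash
  if (x %in% c('', '/')) return('empty')
  # Level 1: syntactic check only (http/https scheme followed by a host);
  # this is an assumption standing in for RCurl::url.exists()
  if (grepl('^https?://[^/[:space:]]+', x)) return('url')
  # Level 2: not just a slash and not a full URL; may be an intentional subpath
  'other'
}

report_base_url = function(x) {
  switch(
    classify_base_url(x),
    empty = 'baseURL is empty or "/"',
    url   = 'baseURL looks like a full URL',
    other = 'baseURL is not a full URL; carry on if you are using a subpath on purpose'
  )
}
```

With this sketch, report_base_url("openscapes.org") lands in the third branch, so the user gets a nudge rather than "nothing to do here".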

@mcanouil

mcanouil commented May 1, 2021

Maybe r-lib/urlchecker could check the base URL?

@yihui
Member

yihui commented Jul 21, 2021

@mcanouil As I said above, the base URL can be just a subpath without the http + domain (e.g., baseURL: foo/). That's a valid base URL. In fact, the special case, a single slash /, is also a valid base URL. In theory, there is nothing wrong with it and it has its advantages. The main problem is that some RSS readers can't understand it (e.g., R-bloggers).

I tend to fix the most common and confusing issue by heuristics, i.e., when the base URL looks like a domain without the protocol (http). I tend not to validate the URL with url.exists() in other cases, since pretty much anything that requires an internet connection won't be robust.
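
For illustration, a "looks like a domain without the protocol" heuristic might look something like the sketch below. It is not the code from the commit that closed this issue; looks_like_bare_domain() is a hypothetical name, and the regular expressions are assumptions about what counts as "obviously wrong". It works offline and deliberately leaves subpaths such as foo/ and the single slash / alone.

```r
# Hypothetical heuristic, not blogdown's actual implementation: flag a baseURL
# that looks like a bare domain (e.g., "openscapes.org") with no protocol.
looks_like_bare_domain = function(x) {
  x = trimws(x)
  # already has a protocol (http://, https://) or is protocol-relative (//host)
  if (grepl('^([a-zA-Z][a-zA-Z0-9+.-]*:)?//', x)) return(FALSE)
  # keep only the first path segment and test whether it resembles host.tld
  first = sub('/.*$', '', x)
  grepl('^([a-zA-Z0-9-]+[.])+[a-zA-Z]{2,}$', first)
}

# Try it on the base URLs mentioned in this thread
urls = c('openscapes.org', 'https://openscapes.org/', 'foo/', '/')
print(vapply(urls, looks_like_bare_domain, logical(1)))
```

Being a heuristic, it can misfire: a subpath whose first segment happens to contain a dot (say, a directory literally named openscapes.org) would also be flagged, which is why a warning seems more appropriate than an error.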

@yihui yihui closed this as completed in 8210825 Jul 21, 2021

3 participants