clean_duplicates() is now aware of blogdown rendering method #629

Merged
cderv merged 3 commits into master from clean-markdown-method on May 26, 2021

cderv (Collaborator) commented on May 26, 2021

This should fix #628

@apreshill could you test that this works as expected for you?

Code using x1, x2, i1, i2 is not always easy to follow when reading, but I think I got it right.

When there are duplicates (two files with the same name, one .md and one .html) and an associated .Rmd file exists, the file to delete depends on blogdown.method: with the markdown method the .html file is deleted; otherwise (html method) the .md file is deleted.
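
The rule described above can be sketched in a few lines of R. This is an illustration only, not the code from this PR: `pick_duplicates()` is a hypothetical helper, and it assumes the rendering method is read from the `blogdown.method` option with `'html'` as the fallback. It only covers the .Rmd case mentioned here.

```r
# Hypothetical helper (not part of blogdown): given a vector of file paths,
# return the output files that the rule above would delete.
pick_duplicates = function(files, method = getOption('blogdown.method', 'html')) {
  base = tools::file_path_sans_ext(files)
  ext  = tolower(tools::file_ext(files))
  # base names that have both an .md and an .html output
  dup = intersect(base[ext == 'md'], base[ext == 'html'])
  # keep only those with an associated .Rmd source
  dup = intersect(dup, base[ext == 'rmd'])
  # return early: paste0() would otherwise turn an empty vector into ".html"
  if (length(dup) == 0) return(character())
  # markdown method keeps .md (so drop .html); html method keeps .html
  drop = if (identical(method, 'markdown')) 'html' else 'md'
  paste0(dup, '.', drop)
}

files = c(
  'content/post/a.Rmd', 'content/post/a.md', 'content/post/a.html',
  'content/post/b.md', 'content/post/b.html'  # no .Rmd source: left alone
)
pick_duplicates(files, method = 'markdown')
pick_duplicates(files, method = 'html')
```

With the example vector, only the `a` post qualifies because `b` has no .Rmd source; the method then decides which of the two `a` outputs is returned.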

If this works, I'll add the new bullet.

Unit tests are not easy to write for blogdown projects: they often require a dummy project, as in this case. 😟

cderv requested a review from apreshill on May 26, 2021 16:06

yihui (Member) left a comment

Looks good to me. Thanks!

apreshill (Contributor) commented on May 26, 2021

Testing:

> blogdown::check_content()
― Checking content files
| Checking for validity of YAML metadata in posts...
○ All YAML metadata appears to be syntactically valid.
| Checking for previewed content that will not be published...
○ Found 0 files with future publish dates.
○ Found 0 files marked as drafts.
| Checking your R Markdown content...
○ All R Markdown files have been knitted.
○ All R Markdown output files are up to date with their source files.
| Checking for .html/.md files to clean up...
● [TODO] Found 27 duplicate output files:

  content/blog/2017-08-11-want-to-work-with-data-don-t-wait/2017-08-11-want-to-work-with-data-don-t-wait.html
  content/blog/2017-08-20-join-the-r-for-data-science-online-learning-community/2017-08-20-join-the-r-for-data-science-online-learning-community.html
  content/blog/2017-09-05-learning-to-learn/2017-09-05-learning-to-learn.html
  content/blog/2017-12-22-r4ds-the-next-iteration/2017-12-22-r4ds-the-next-iteration.html
  content/blog/2018-01-03-data-science-with-r-how-do-i-start/2018-01-03-data-science-with-r-how-do-i-start.html
  content/blog/2018-01-28-r4ds-february-challenge-winning/2018-01-28-r4ds-february-challenge-winning.html
  content/blog/2018-02-12-so-you-ve-been-asked-to-make-a-reprex/2018-02-12-so-you-ve-been-asked-to-make-a-reprex.html
  content/blog/2018-02-24-r4ds-january-challenge-get-involved/2018-02-24-r4ds-january-challenge-get-involved.html
  content/blog/2018-02-24-r4ds-march-challenge-participate-in-a-viewing-party/2018-02-24-r4ds-march-challenge-participate-in-a-viewing-party.html
  content/blog/2018-02-26-gif-it-getting-gifs-in-blogdown/2018-02-26-gif-it-getting-gifs-in-blogdown.html
  content/blog/2018-03-26-r4ds-april-challenge-time-for-some-spring-cleaning/2018-03-26-r4ds-april-challenge-time-for-some-spring-cleaning.html
  content/blog/2018-04-01-ymmv-non-profit-data-science/2018-04-01-ymmv-non-profit-data-science.html
  content/blog/2018-04-04-kaggle-panel-recap-my-data-science-journey/2018-04-04-kaggle-panel-recap-my-data-science-journey.html
  content/blog/2018-05-05-r4ds-may-challenge-sign-up-for-office-hours/2018-05-05-r4ds-may-challenge-sign-up-for-office-hours.html
  content/blog/2018-05-23-r4ds-june-challenge-summer-of-data-science-2018/2018-05-23-r4ds-june-challenge-summer-of-data-science-2018.html
  content/blog/2018-07-04-when-in-doubt-optimize-for-joy/2018-07-04-when-in-doubt-optimize-for-joy.html
  content/blog/2018-08-12-learning-to-learn-process-over-product/2018-08-12-learning-to-learn-process-over-product.html
  content/blog/2018-09-16-r4ds-v1-v2-a-retrospective/2018-09-16-r4ds-v1-v2-a-retrospective.html
  content/blog/2018-10-03-til-locf/2018-10-03-til-locf.html
  content/blog/2019-03-28-learning-to-learn-metacognition-and-the-coalesce-function/2019-03-28-learning-to-learn-metacognition-and-the-coalesce-function.html
  content/blog/2019-10-24-programming-parallels-a-lesson-in-empathy/2019-10-24-programming-parallels-a-lesson-in-empathy.html
  content/blog/2020-03-18-learning-machine-learning-personal-inventory-and-roadmap/2020-03-18-learning-machine-learning-personal-inventory-and-roadmap.html
  content/blog/2020-03-29-finding-your-just-right-learning-resources/2020-03-29-finding-your-just-right-learning-resources.html
  content/blog/2020-04-05-reach-for-good-enough/2020-04-05-reach-for-good-enough.html
  content/blog/2020-04-25-there-s-no-crying-in-data-science/2020-04-25-there-s-no-crying-in-data-science.html
  content/blog/2020-05-03-doesn-t-this-make-you-miss-dplyr-tho/2020-05-03-doesn-t-this-make-you-miss-dplyr-tho.html
  content/blog/2021-05-24-3-2-1-mario-kart-tidytuesday-unfiltered/index.html

  To fix, run blogdown::clean_duplicates(preview = FALSE).
○ Found 0 incompatible .html files to clean up.
| Checking for the unnecessary 'content/' directory in theme...
○ Great! Your theme does not contain the content/ directory.
― Check complete: Content
> blogdown::clean_duplicates(preview = FALSE)
 [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[24] TRUE TRUE TRUE TRUE
> blogdown::check_content()
― Checking content files
| Checking for validity of YAML metadata in posts...
○ All YAML metadata appears to be syntactically valid.
| Checking for previewed content that will not be published...
○ Found 0 files with future publish dates.
○ Found 0 files marked as drafts.
| Checking your R Markdown content...
○ All R Markdown files have been knitted.
○ All R Markdown output files are up to date with their source files.
| Checking for .html/.md files to clean up...
○ Found 0 duplicate .html output files.
○ Found 0 incompatible .html files to clean up.
| Checking for the unnecessary 'content/' directory in theme...
○ Great! Your theme does not contain the content/ directory.
― Check complete: Content

I should note that visual inspection shows only the .md files are left! One thing that does appear to be left behind is the index_files folders, which contain things like header-attrs left over from the .html output.

cderv added 2 commits on May 26, 2021 18:18 (both marked [skip ci])
cderv merged commit 00a2090 into master on May 26, 2021
cderv deleted the clean-markdown-method branch on May 26, 2021 16:19
cderv (Collaborator, Author) commented on May 26, 2021

Quoting apreshill: "I should note visual inspection shows only the md files left! One thing that does appear left behind are the index_file folders which contain like header-attrs left over from the html."

Oh shoot, you edited your comment... 🤕

Yes, the index_files folder should probably be deleted too if it exists. I don't think that is done for duplicated .Rmarkdown or .md files either.

So this would be a feature to add and to test against the different scenarios.
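
A possible starting point for that feature is to list the leftover folders before removing anything. The following is a minimal sketch under assumptions, not existing blogdown behavior: `leftover_files_dirs()` is a hypothetical name, and it assumes each deleted `foo.html` may have a sibling `foo_files` directory. It only lists candidates, because such a folder may also hold figures that the remaining .md output still references.

```r
# Hypothetical helper (not part of blogdown): for each removed .html file,
# return the sibling "*_files" directory if it exists on disk.
leftover_files_dirs = function(html) {
  d = paste0(tools::file_path_sans_ext(html), '_files')
  d[dir.exists(d)]
}

# Demo in a temporary directory that mimics one post with leftovers
tmp = tempfile()
dir.create(file.path(tmp, 'post', 'index_files', 'header-attrs'), recursive = TRUE)
removed = c(
  file.path(tmp, 'post', 'index.html'),   # has an index_files folder
  file.path(tmp, 'other', 'index.html')   # has none
)
candidates = leftover_files_dirs(removed)
print(basename(candidates))
```

After reviewing the candidates, something like `unlink(candidates, recursive = TRUE)` would remove them; whether that is safe for every rendering method is exactly what would need testing.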

I am guessing you need this because you are converting a blog to use another method?

apreshill (Contributor) commented

Yes, I was trying to test a workflow for my Hugo Apero workshop to help convert an existing academic site. Because of the choice I made with syntax highlighters, I was testing converting a site (where the posts have no need for Pandoc) to full markdown mode. I'll file an issue to track this!

Development

Successfully merging this pull request may close these issues.

[FR] clean_duplicates with more control?
3 participants