
Converting Sweave to Rmarkdown using texor package

Madawa Samarathunga edited this page Apr 1, 2024 · 7 revisions

Background

Sweave supports reproducible research by generating LaTeX/PDF documents that contain executable code chunks. It is a literate-programming tool built into R as the `Sweave()` function in the utils package. As a precursor to R Markdown, Sweave has been widely used to write package vignettes and research papers for the Journal of Statistical Software (JSS). This project proposes extending the texor package with features to convert Sweave (.Rnw) files to R Markdown while preserving their executable code chunks.

Related work

During the past two years of Google Summer of Code (GSoC), work has focused on converting LaTeX-based R Journal articles to R Markdown and then to HTML. The resulting texor package provides functions and tools for converting LaTeX articles to Markdown/R Markdown.

Within the Bioconductor community, several packages have switched from Sweave to R Markdown vignettes through the work of Outreachy interns on a similar conversion project, "sweave2rmd". However, their approach is only partly automated and makes limited use of Pandoc. The guide on how to contribute to sweave2rmd describes an iterative process of manual editing and review after the initial automated processing. The helper package Rnw2Rmd is acknowledged to fail occasionally, which requires even more manual editing. Specific limitations of the automated processing are listed in the help file for Rnw2Rmd::Rnw2RmdPandoc.

Details of your coding project

The project has the following specific goals:

  1. Develop a custom Pandoc reader in Lua, or another suitable alternative, that converts the code chunks of a Sweave file to R Markdown code chunks. The reader should preserve the Sweave chunk options enclosed in <<...>>=.

  2. Integrate the conversion workflow into the texor package.

  3. Enhance and optimize the existing functions and methods in the texor package.

  4. Thoroughly test and document the entire conversion process.
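The chunk mapping in goal 1 can be illustrated with a short sketch. It is written in R using regular expressions, not as the Lua reader the goal describes. The function name `rnw_chunks_to_rmd()` is hypothetical and is not part of texor.

The sketch assumes two things:

- Chunk headers of the form `<<options>>=` sit on their own line.
- Code chunks end at a line starting with `@`, or at the next chunk header.

The sketch copies the options through unchanged. Sweave options with no knitr equivalent (for example `fig=TRUE`) would therefore still need translating or dropping. It also leaves `\Sexpr{}` and the surrounding LaTeX untouched.

```r
# Hypothetical helper (not part of texor): a regex-based sketch that rewrites
# Sweave code chunks in a character vector of .Rnw lines as R Markdown chunks.
rnw_chunks_to_rmd <- function(lines) {
  fence <- "```"
  out <- character(0)
  in_chunk <- FALSE
  for (line in lines) {
    # Capture the options between "<<" and ">>=", e.g. "setup, echo=FALSE"
    header <- regmatches(line, regexec("^\\s*<<(.*)>>=\\s*$", line))[[1]]
    if (length(header) == 2) {
      # A new header also ends any chunk that was not closed with "@"
      if (in_chunk) out <- c(out, fence)
      opts <- trimws(header[2])
      chunk_header <- if (nzchar(opts)) paste0("{r ", opts, "}") else "{r}"
      out <- c(out, paste0(fence, chunk_header))
      in_chunk <- TRUE
    } else if (in_chunk && startsWith(line, "@")) {
      # "@" at the start of a line ends the code chunk; keep any text after it
      out <- c(out, fence)
      rest <- sub("^@\\s*", "", line)
      if (nzchar(rest)) out <- c(out, rest)
      in_chunk <- FALSE
    } else {
      # Code inside a chunk, or LaTeX text outside one, is copied as-is
      out <- c(out, line)
    }
  }
  # Close a chunk left open at the end of the file
  if (in_chunk) out <- c(out, fence)
  out
}
```

A whole file could be processed with `writeLines(rnw_chunks_to_rmd(readLines("example.Rnw")), "example.Rmd")`. A Pandoc reader would go further than this line-by-line pass, because it also parses the LaTeX between chunks into Pandoc's document model.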

As a contributor, you have the flexibility to propose and contribute any changes or additional features you find valuable to the project.

Expected impact

This project could help the many R packages that currently use Sweave-based vignettes to migrate to R Markdown. A particular target will be packages that used the Journal of Statistical Software template to create their vignettes; there are over 200 such examples on CRAN. These vignettes are expected to resemble R Journal articles in making heavy use of LaTeX markup for mathematics, algorithms, tables, etc., so they would benefit from the functionality of texor. The outputs of this project could therefore also be valuable to journals or authors who wish to produce HTML versions of articles written with Sweave.

Mentors

  • EVALUATING MENTOR: Heather Turner heather.turner@r-project.org is a past editor of the R Journal and contributed to its infrastructure during her time on the editorial board. She is the author of several CRAN packages, notably the statistical modelling packages gnm, BradleyTerry2 and PlackettLuce. She has successfully mentored GSoC projects in 2021, 2022 and 2023.
  • Abhishek Ulayil abhishek.ulayil.m@gmail.com has been a Google Summer of Code (GSoC) contributor for the past two years. During that time he published R packages including texor and rebib, which provide tools for converting LaTeX articles to R Markdown.
  • Dianne Cook dicook@monash.edu is a past editor of the R Journal. She is a co-author of numerous R packages, including rjtools, tourr, GGally, brolgar, woylier and ferrn. Some of these packages were originally developed as GSoC projects.

Tests

Contributors, please do one or more of the following tests before contacting the mentors above.

  • Easy: Select any sub-directory apart from bibliography and lua-filters in the supplementary materials on the texor-rjarticle repository. Open the example LaTeX source file (not the wrapper RJwrapper.tex), add the phrase "TEST BY " at the start of the abstract (substituting your name for the placeholder!) and save. Follow these instructions to convert the LaTeX article to HTML. Compare the PDF and HTML versions side by side to understand how LaTeX articles are converted. Include the generated Rmd file as the outcome of the test.

  • Medium: Read the supplementary article on Lua Filters. This introduces the image_numbering_filter.lua filter, which can be used with Pandoc as follows:

    pandoc example.md --from markdown --to html5 --output filtered-example.html --lua-filter image_numbering_filter.lua

    Write an R function that takes an example Markdown file with multiple figures and converts it to HTML, with an option to apply image_numbering_filter.lua or not. Document the function using Roxygen2 notation. Write an R script and an example Markdown file to demonstrate the use of your function. HINT: you may use rmarkdown::pandoc_convert.

  • Hard: Write a custom Pandoc reader in Lua that only extracts code chunks from Sweave files and treats them as CodeBlocks. You can clone some example Sweave files from the knitr-examples repository to test and demonstrate your reader. HINT: the plain text reader example can give a starting point, while the wiki Creole reader gives an example of defining how CodeBlocks should be read.
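For the Medium test, the optional filter can be expressed as a choice of extra Pandoc arguments. The following is a minimal R sketch, not a reference solution. It assumes the rmarkdown package and a Pandoc installation, and the function names `pandoc_filter_args()` and `md_to_html()` are hypothetical.

`rmarkdown::pandoc_convert()` runs Pandoc from the input file's directory by default. For that reason the sketch normalises the input and filter paths first, and a relative `output` is written relative to that directory.

```r
# Hypothetical helper: the extra Pandoc arguments for an optional Lua filter.
# Returns no arguments when filter is NULL.
pandoc_filter_args <- function(filter = NULL) {
  if (is.null(filter)) character(0) else c("--lua-filter", filter)
}

#' Convert a Markdown file to HTML, optionally applying a Lua filter
#'
#' @param input Path to the Markdown file.
#' @param output Path of the HTML file to write; a relative path is resolved
#'   against the directory of `input`.
#' @param filter Path to a Lua filter such as image_numbering_filter.lua,
#'   or NULL to convert without a filter.
#' @return `output`, invisibly.
#' @export
md_to_html <- function(input, output, filter = NULL) {
  # Absolute paths, because pandoc_convert() changes the working directory
  if (!is.null(filter)) filter <- normalizePath(filter, mustWork = TRUE)
  rmarkdown::pandoc_convert(
    input   = normalizePath(input, mustWork = TRUE),
    to      = "html5",
    from    = "markdown",
    output  = output,
    options = pandoc_filter_args(filter)
  )
  invisible(output)
}
```

Assuming `example.md` and the filter sit in the current directory, you could compare the two outputs as follows:

- Without the filter: `md_to_html("example.md", "plain.html")`
- With the filter: `md_to_html("example.md", "numbered.html", filter = "image_numbering_filter.lua")`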

References: Sweave usage, Custom Reader reference and LPeg reference.

Solutions of tests

Contributors, please post a link to your test results here.

  • EXAMPLE CONTRIBUTOR 1 NAME, LINK TO GITHUB PROFILE, LINK TO TEST RESULTS.
| Contributor Name | GitHub Profile | Test Results |
|---|---|---|
| Yinxiang Huang | https://github.com/fzyxh | Easy/Medium/Hard test results |
| Madawa Samarathunga | https://github.com/MadawaSamarathunga | Test Results |