
Convert your LaTeX (back) to Word Files

Wout Dillen edited this page Jan 7, 2025 · 2 revisions

Why do this?

Since moving to OpenEdition for the publication of Variants (issues 12/13 and up), our journal exists in three parallel forms: a web version (e.g. that of Variants 14), an ePub version (e.g. this version of the same issue), and a PDF version (currently only for individual articles, like this PDF of Martinez et al. 2019). For Variants, we use a PDF-first workflow. In practice, this means that we first prepare everything in LaTeX (using this template), trying to make it look perfect in the resulting PDF. Then, we use that fine-tuned LaTeX file as a base text to produce the files that OpenEdition's publication platform requires for its web and ePub versions. At the moment, OpenEdition's platform can only process documents that are saved in the .doc format (using very specific Word Styles), or in the .xml format (following a specific TEI-XML stylesheet). In the future, our goal is to automatically transform our LaTeX into XML using the command line tool LaTeXML, and then to write an XSLT stylesheet to transform the result into a valid TEI-XML document. But since we do not have such an XSLT stylesheet yet, we currently have to work with the .docx format to upload the journal's articles.

For Issues 12/13 and 14, these .docx files were prepared in parallel to the .tex files. But this workflow produced too much extra work, because any time a small change was made in one document, that same change had to be made in the parallel document. From Issue 15 onwards, we therefore tried to chain these steps in our workflow, using the finished, proofread text of the LaTeX project as the basis for the .docx files we needed to upload to the OpenEdition platform. If you are in a similar situation, or have any other reason to turn your VarianTeX LaTeX files into Word documents (e.g. because your publisher only accepts Word files, and no PDFs), you can follow these steps to produce the same result, using the open source command line tool pandoc to manage the transformation.

The following assumes that you have a copy of your journal project (perhaps downloaded from Overleaf, as we do) on your local machine.

Step 1: Prepare the .tex files for transformation

pandoc doesn't work well with complex documents like our LaTeX project (where the main.tex file includes a bunch of articles), nor with all the custom environments we designed (for papers, reviews, etc.). Such customisations are great for producing nice-looking PDF documents, but include a lot of information that is redundant for the online versions (which don't need headers, footers, etc.). So we have to simplify our articles a little before we can use pandoc to transform them into Word files.

1.1 Rename variantex.sty to variantex.tex

Instead of using our own document class, we will use the more straightforward \documentclass{article} and \include{} all the necessary packages in the preamble of our individual contribution .tex files. Since they are all listed in our .sty file, we can just change that extension to .tex to \include{} that file later (see 1.3.4 below).

1.2 Move your files to the root of your LaTeX project

It is easiest if the files you're transforming are all in the root of your LaTeX project. So if your articles are in subfolders (such as /essays/ or /reviews/), move them into the root of your project first.

1.3 Simplify the preamble

The VarianTeX template uses a lot of custom classes and environments that pandoc has trouble with. To solve these issues, we will simplify the preamble of our documents, and swap custom commands for more mainstream alternatives. In short, we will make the following changes:

  1. Get rid of the short forms for contribution metadata (\shortcontribution{} and \shortcontributor{}).
  2. Turn the long forms into mainstream alternatives (\title{} and \author{})
  3. Declare that we are using the \documentclass{article} (rather than KOMA-script's scrbook)
  4. Include any packages pandoc may need, by adding \include{variantex}.
  5. Make sure the author and title are also printed in the new Word document by running the \maketitle command inside the {document} environment

We do this by changing the preamble from something like this:

\contributor{Merisa Martinez, Wout Dillen, Elli Bleeker, Anna-Maria Sichani, and Aodhán Kelly}
\contribution{Refining our Conceptions of
Access in Digital Scholarly Editing: Reflections on a Qualitative Survey on Inclusive Design and Dissemination.}
\shortcontributor{Merisa Martinez et al.}
\shortcontribution{Refining our Conceptions of
Access}

into this:

\documentclass{article}
\include{variantex}
\author{Merisa Martinez[...]}
\title{Refining our Conceptions[...]}

We also exchange the contribution's custom environment (such as {paper} or {review}) for {document}. So this:

\begin{paper}

[...]

\end{paper}

Will turn into this:

\begin{document}
\maketitle

[...]

\end{document}
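
The changes in 1.3 can also be scripted. The following is a minimal Python sketch, not part of the VarianTeX template: the function names (find_group_end, drop_command, simplify_article) are hypothetical, it assumes the metadata commands are written exactly as in the example above, and its brace matching ignores escaped braces such as \{. Check the result by hand before running pandoc.

```python
def find_group_end(tex: str, open_idx: int) -> int:
    """Return the index just past the brace group that opens at open_idx."""
    depth = 0
    for i in range(open_idx, len(tex)):
        if tex[i] == "{":
            depth += 1
        elif tex[i] == "}":
            depth -= 1
            if depth == 0:
                return i + 1
    raise ValueError("unbalanced braces")


def drop_command(tex: str, name: str) -> str:
    """Delete every call of the named command, including multi-line arguments."""
    marker = "\\" + name + "{"
    while (start := tex.find(marker)) != -1:
        end = find_group_end(tex, start + len(marker) - 1)
        if tex[end:end + 1] == "\n":  # also remove the now-empty line
            end += 1
        tex = tex[:start] + tex[end:]
    return tex


def simplify_article(tex: str, env: str = "paper") -> str:
    """Apply the five preamble changes listed in 1.3 (hypothetical helper)."""
    # 1. Remove the short forms of the metadata.
    for name in ("shortcontributor", "shortcontribution"):
        tex = drop_command(tex, name)
    # 2. Turn the long forms into \author{} and \title{}.
    tex = tex.replace("\\contributor{", "\\author{")
    tex = tex.replace("\\contribution{", "\\title{")
    # 5. Swap the custom environment for {document} and print the title block.
    tex = tex.replace("\\begin{" + env + "}", "\\begin{document}\n\\maketitle")
    tex = tex.replace("\\end{" + env + "}", "\\end{document}")
    # 3. and 4. Declare the article class and include the renamed style file.
    return "\\documentclass{article}\n\\include{variantex}\n" + tex
```

For a review, pass env="review" instead of the default "paper".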

1.4 Remove \frame{} from \begin{figure}

We have been using \frame{} to put a nice 1px black border around the images (which is especially useful for images with white backgrounds). But pandoc can’t process this, and decides to skip the images instead. Just removing \frame{} resolves this issue.

So, inside a \begin{figure} environment, something like this:

\frame{\includegraphics[width=\textwidth]{media/italia1.png}}    

Becomes simply:

\includegraphics[width=\textwidth]{media/italia1.png}

To do this automatically, you can use the following RegEx:

Find: \\frame\{\\inc(.*)\}\}
Replace: \\inc$1}

You’ll want to use Replace, rather than Replace All, just to make sure you are finding the right lines of LaTeX.
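
If you would rather script this step, here is a minimal Python sketch of the same idea. The function name unframe_images is hypothetical, and the pattern is deliberately stricter than the editor RegEx above: it only matches a \frame{} that wraps a single \includegraphics with an optional [...] argument, and it returns the number of replacements so that you can compare it with what you expected to find.

```python
import re

# \frame{ \includegraphics [optional options] {path} }
FRAMED = re.compile(r"\\frame\{(\\includegraphics(?:\[[^\]]*\])?\{[^{}]*\})\}")


def unframe_images(tex):
    """Return (new_tex, count) with the frame wrapper removed around images."""
    # A function is used as the replacement so that backslashes in the
    # matched LaTeX are copied literally.
    return FRAMED.subn(lambda m: m.group(1), tex)
```

Anything the pattern does not match (for example a \frame{} around other content) is left untouched, so a manual Find for \frame afterwards is still worthwhile.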

1.5 Get rid of \begin{minipage} inside \begin{figure}

Sometimes, a \begin{figure} environment contains one or more \begin{minipage} environment(s), for example to keep subfigures in place for the PDF layout. Sadly, pandoc does not fully grasp what’s going on here, and so we have to convert what was one figure with several minipages into several figures. Otherwise, pandoc will produce faulty image captions, and won’t be able to properly convert the figure numbers.

This means that something like this:

\begin{figure}[H]
  \centering
  \begin{minipage}[b]{0.49\textwidth}
    \includegraphics[width=\textwidth]{media/escobar1.jpg}
    \caption{(BPL 20, fol. 15v).}
    \label{fig:escobar1}
  \end{minipage}
  \hfill
  \begin{minipage}[b]{0.48\textwidth}
    \includegraphics[width=\textwidth]{media/escobar2.jpg}
    \caption{(BPL, 20, fol. 15r).}
    \label{fig:escobar2}
  \end{minipage}
\end{figure}

Needs to be turned into something like this:

\begin{figure}[H]
  \centering
  \includegraphics[width=\textwidth]{media/escobar1.jpg}
  \caption{(BPL 20, fol. 15v).}
  \label{fig:escobar1}
\end{figure}

\begin{figure}
  \includegraphics[width=\textwidth]{media/escobar2.jpg}
  \caption{(BPL, 20, fol. 15r).}
  \label{fig:escobar2}
\end{figure}

The easiest way to do this is to Find all occurrences of minipage, and then (for all relevant cases, i.e., when they are inside a figure environment) delete the first \begin{minipage} and last \end{minipage} and switch all {minipage} to {figure} in between (also deleting all the options for minipage, and the \hfill while you’re at it).

When you’re done, do a Find for [b] to see if you forgot to take out any of the minipage options (which will prevent captions from displaying properly).

Don’t try to automate this, because some occurrences of minipage outside of figure environments don’t cause any issues.

When doing this manually, you may sometimes produce a typo while trying to type figure (quickly; over and over again). pandoc will alert you to this by saying that the document ended unexpectedly (which will stop it from completing the transformation). Have a close look at the error message: it will probably show you your typo, which you can then use to Find it and correct it. For example you may receive an error message like:

Error at "lorente.tex" (line 1213, column 2):
unexpected end of input
expecting \end{figiure}

 ^

In a case like this, try to Find the typo figiure and correct it to figure. This should fix the issue.
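
The edits themselves are best done by hand, but finding the places that need attention can be scripted. The following Python sketch only reports and never changes the file. Both function names are hypothetical, and the checks are simplistic: they ignore LaTeX comments and starred environments such as figure*.

```python
import re


def figures_with_minipage(tex):
    """Line numbers (1-based) of figure environments that contain a minipage."""
    hits = []
    pattern = r"\\begin\{figure\}.*?\\end\{figure\}"
    for m in re.finditer(pattern, tex, flags=re.DOTALL):
        if "\\begin{minipage}" in m.group(0):
            hits.append(tex.count("\n", 0, m.start()) + 1)
    return hits


def unmatched_environments(tex):
    """(line, name) pairs for \\begin/\\end commands without a matching partner."""
    stack, problems = [], []
    for m in re.finditer(r"\\(begin|end)\{([^{}]+)\}", tex):
        kind, name = m.groups()
        line = tex.count("\n", 0, m.start()) + 1
        if kind == "begin":
            stack.append((name, line))
        elif stack and stack[-1][0] == name:
            stack.pop()
        else:
            problems.append((line, name))
    # Anything still open at the end never found its \end{}.
    problems.extend((line, name) for name, line in stack)
    return sorted(problems)
```

Running unmatched_environments on the file before pandoc should surface a typo like figiure together with its line number, and figures_with_minipage gives a list of figures to convert manually.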

1.7 Fix figure numbers and their captions

Pandoc handles captions somewhat differently than our LaTeX to PDF compiler in Overleaf: for example, it does not automatically render figure numbers in the captions. Also, we often need to include more \figure{} elements in the ‘pre-pandoc-LaTeX’ than we do in the Overleaf version, for example to resolve the issue mentioned in 1.5, when it’s easier to include a \table{} as a \figure{}, or when there are other, more complex transcriptions etc. that can be visualised nicely with Overleaf, but that pandoc fails to render properly.

As a result, it’s best to Find each \figure{} in the .tex file (after completing step 1.5 above), and fix the following issues where relevant.

1.7.1 Make sure \caption{} is in the right place

For pandoc to work properly, the \caption{} needs to follow the \includegraphics{} on the next line. If there is a \label{} (or anything else?) in between the two, the \caption{} will not be rendered in the resulting .docx file.

1.7.2 For images that should not be counted as ‘Figures’:

  1. Lift the \caption{} out of the \figure{}, placing it above the figure
  2. (Optional): Turn \caption{} into \textbf{} so that it pops out more later when you style the .docx file according to the Lodel styles
  3. If the ‘figure’ needs some other number (e.g. ’Table 1’), you can already add it to the ‘caption’.

For example, this:

\begin{figure}
  \centering
  \includegraphics[width=\textwidth]{media/lorente0.png}
  \caption{Number of plant descriptions in the six books of the printed edition and in the manuscript}
  \label{fig:lorente0}
\end{figure}

Should be turned into something like:

\textbf{Table 1: Number of plant descriptions in the six books of the printed edition and in the manuscript}

\begin{figure}
  \centering
  \includegraphics[width=\textwidth]{media/lorente0.png}
  \label{fig:lorente0}
\end{figure}

This will make sure that the figure numbers in the PDF line up with those in the web version.

1.7.3 For images that should be counted as ‘Figures’:

  1. Make sure that the \caption{} is positioned after \includegraphics{}, and before \label{} (otherwise, it will not be rendered)
  2. Add the figure number to the caption, by pointing to the \label{} in a \ref{}

So, for example, something like this:

\begin{figure}[H]
  \centering
  \includegraphics[width=\textwidth]{media/lorente2.jpg}
  \caption{47v stained with red ink.}
  \label{fig:lorente2}
\end{figure}

Should become:

\begin{figure}[H]
  \centering
  \includegraphics[width=\textwidth]{media/lorente2.jpg}
  \caption{Figure \ref{fig:lorente2}: 47v stained with red ink.}
  \label{fig:lorente2}
\end{figure}
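
Adding the figure number to every caption can be scripted once the captions of the non-‘Figure’ images have been lifted out. This is a minimal Python sketch with a hypothetical function name, number_captions. It assumes that each \caption{} is directly followed by its \label{}, and it skips captions that contain nested braces (such as \emph{}), including captions that already hold a \ref{}. Those need to be handled by hand.

```python
import re

# A caption without nested braces, followed (after whitespace) by its label.
CAPTION = re.compile(r"\\caption\{([^{}]*)\}(\s*)\\label\{([^{}]+)\}")


def number_captions(tex):
    """Prefix each simple caption with 'Figure \\ref{label}: '."""
    def add_number(m):
        text, gap, label = m.groups()
        return ("\\caption{Figure \\ref{" + label + "}: " + text + "}"
                + gap + "\\label{" + label + "}")
    return CAPTION.sub(add_number, tex)
```

Because an already numbered caption contains braces, running the function twice does not add a second ‘Figure’ prefix.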

1.8 Turn \footnotemark constructions into regular \footnote{}s

Sometimes, we use a combination of \protect\footnotemark and \footnotetext{} in order to split up the footnote and the text, so that we can still use footnotes inside \caption{} etc.

This works well for LaTeX, which has issues turning \footnote{} inside \caption{} into a PDF, but it does not work for pandoc. Thankfully, pandoc does not have an issue processing \caption{\footnote{a footnote}} when performing a .tex to .docx conversion, so just turning our workaround into regular footnotes should do the trick. For example, something like:

\begin{figure}
    \centering
    \includegraphics[width=.75\textwidth]{media/defenu3-new.png}
    \caption{Fernando Pessoa's ``Hora Absurda'' manuscript; line 50.\protect\footnotemark} 
    \label{fig:defenu3-new}
\end{figure}

\footnotetext{My footnote.}

Should become:

\begin{figure}
    \centering
    \includegraphics[width=.75\textwidth]{media/defenu3-new.png}
    \caption{Fernando Pessoa's ``Hora Absurda'' manuscript; line 50.\footnote{My footnote.}}
    \label{fig:defenu3-new}
\end{figure}

The easiest way to do this is to delete the letters text from \footnotetext{My footnote.}, to turn it into \footnote{My footnote.}, and then copy-paste that line onto the corresponding \protect\footnotemark to replace it.
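
For files with many such pairs, the same copy-paste can be sketched in Python. The function name inline_footnotes is hypothetical, and the sketch assumes that the nth \protect\footnotemark belongs to the nth \footnotetext{} in the file and that the footnote texts contain no nested braces. If the counts do not match, it stops rather than guess.

```python
import re

MARK = "\\protect\\footnotemark"
# A \footnotetext{} without nested braces, plus the rest of its line.
TEXT = re.compile(r"\\footnotetext\{([^{}]*)\}[ \t]*\n?")


def inline_footnotes(tex):
    """Replace each footnote mark with a \\footnote{} built from its text."""
    notes = TEXT.findall(tex)
    if len(notes) != tex.count(MARK):
        raise ValueError("marks and footnote texts do not pair up; fix by hand")
    tex = TEXT.sub("", tex)  # remove the detached footnote texts
    for note in notes:
        # Replace only the first remaining mark, so pairs stay in order.
        tex = tex.replace(MARK, "\\footnote{" + note + "}", 1)
    return tex
```

Footnotes with nested braces (for example an \emph{} inside the note) trigger the error on purpose, since pairing them up safely needs a closer look.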

Step 2: Execute the transformation command

Now, we should be ready to use pandoc to transform the .tex file into a .docx file. In the command line, cd into the root of your LaTeX folder, which now contains the adapted .tex files, and enter the following command (exchanging martinez with the name of the relevant document):

pandoc martinez.tex -o martinez.docx

Including external .bib-liographies

If you are using BibTeX to manage your references, pandoc will not find the bibliography file you've referenced in the \bibliography{} command. You can make this work by adding the path to the relevant .bib file to the transformation command, like so:

pandoc martinez.tex --bibliography=references/martinez.bib -o martinez.docx

Using BibLaTeX (instead of BibTeX)

On 18 December 2024, VarianTeX switched from BibTeX to the more versatile BibLaTeX (which will be an integral part of the template’s upcoming 3.0 release). This has helped us simplify the formatting of our references immensely. But it does require us to add a couple more flags to our pandoc command to make the external referencing work. This means that the command should now read:

pandoc martinez.tex --biblatex --bibliography=references/martinez.bib -o martinez.docx --citeproc
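
When converting a whole issue, it can help to build these commands in a script. The sketch below is a hypothetical Python wrapper (pandoc_command and convert are illustrative names) that only reproduces the three command variants shown above. It assumes pandoc is installed and on your PATH, and that you run it from the root of the LaTeX project.

```python
import subprocess


def pandoc_command(stem, bib=None, biblatex=False):
    """Build the pandoc call for stem.tex as a list of arguments."""
    cmd = ["pandoc", stem + ".tex"]
    if biblatex:
        cmd.append("--biblatex")
    if bib:
        cmd.append("--bibliography=" + bib)
    cmd += ["-o", stem + ".docx"]
    if biblatex:
        cmd.append("--citeproc")
    return cmd


def convert(stem, bib=None, biblatex=False):
    """Run pandoc; raises CalledProcessError if the conversion fails."""
    subprocess.run(pandoc_command(stem, bib, biblatex), check=True)
```

For example, convert("martinez", "references/martinez.bib", biblatex=True) runs the BibLaTeX variant of the command for martinez.tex.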

When pandoc reports ‘Could not convert TeX Math’

Sometimes we need to include TeX Math in our contributions, e.g. to write mathematical formulas, or to use symbols for more complex tables. When this happens, try to save the ‘math’ as an image, and include it with \includegraphics{} instead.