Convert your LaTeX (back) to Word Files
Since moving to OpenEdition for the publication of Variants (issues 12/13 and up), our journal exists in three parallel forms: a web-version (e.g. that of Variants 14), an ePub version (e.g. this version of the same issue), and a PDF version (currently only for individual articles, like this PDF of Martinez et al. 2019). For Variants, we use a PDF-first workflow. This means that in practice, we first like to prepare everything in LaTeX (using this template), trying to make it look perfect in the resulting PDF. Then, we use that fine-tuned LaTeX file as a base text to produce the files that OpenEdition's publication platform requires for its web and ePub versions. At the moment, OpenEdition's platform can only process documents that are saved in the .doc
format (using very specific Word Styles), or in the .xml
format (following a specific TEI-XML stylesheet). In the future, it is our goal to automatically transform our LaTeX into XML
using the command line tool LaTeXML, and then to write an XSLT
stylesheet to transform the result into a valid TEI-XML document. But since we do not have such an XSLT stylesheet yet, we have to work with the .docx
format to upload the journal's articles at the moment.
For Issues 12/13 and 14, these .docx
files were prepared in parallel to the .tex
files. But this workflow produced too much extra work, because any time a small change was made in one document, that same change had to be made in the parallel document. From Issue 15 onwards, we therefore tried to chain these steps in our workflow: to use the finished, proofread text of the LaTeX project as a basis for the .docx
files we needed to upload to the OpenEdition platform. If you are in a similar situation, or have any other reason to turn your VariantX LaTeX files into Word documents (e.g. because your publisher only accepts Word files, and no PDFs), you can follow these steps to produce the same result, using the open source command line tool pandoc to manage the transformation.
The following assumes that you have a copy of your journal project (perhaps downloaded from Overleaf, like we do) on your local machine.
pandoc
doesn't work well with complex documents like our LaTeX project (where the main.tex
file includes a bunch of articles), nor with all the custom environments we designed (for papers, reviews, etc.). Such customisations are great for producing nice-looking PDF documents, but include a lot of redundant information for the online versions (that don't need headers, footers, etc.). So we have to simplify our articles a little before we can use pandoc
to transform them into Word files.
Instead of using our own document class, we will use the more straightforward \documentclass{article}
and \include{}
all the necessary packages in the preamble of our individual contribution .tex
files. Since they are all listed in our .sty
file, we can just change that extension to .tex
to \include{}
that file later (see 1.3.4 below).
It's the easiest if the files you're transforming are all in the root of your LaTeX project. So if your articles are in subfolders (such as /essays/
or /reviews/
), move them into the root of your project first.
The VariantX template uses a lot of custom classes and environments, that pandoc
has trouble with. To solve these issues, we will simplify the preamble of our documents, and switch custom commands for more mainstream alternatives. In short, we will make the following changes:
- Get rid of the short forms for contribution metadata (\shortcontribution{} and \shortcontributor{}).
- Turn the long forms into mainstream alternatives (\title{} and \author{}).
- Declare that we are using \documentclass{article} (rather than KOMA-script's scrbook).
- Include any packages pandoc may need, by adding \include{variantex}.
- Make sure the author and title are also printed in the new Word document by running the \maketitle command inside the {document} environment.
We do this by changing the preamble from something like this:
\contributor{Merisa Martinez, Wout Dillen, Elli Bleeker, Anna-Maria Sichani, and Aodhán Kelly}
\contribution{Refining our Conceptions of
Access in Digital Scholarly Editing: Reflections on a Qualitative Survey on Inclusive Design and Dissemination.}
\shortcontributor{Merisa Martinez et al.}
\shortcontribution{Refining our Conceptions of
Access}
into this:
\documentclass{article}
\include{variantex}
\author{Merisa Martinez[...]}
\title{Refining our Conceptions[...]}
We also exchange the contribution's custom environment (such as {paper} or {review}) for {document}. So this:
\begin{paper}
[...]
\end{paper}
Will turn into this:
\begin{document}
\maketitle
[...]
\end{document}
We have been using \frame{}
to put a nice 1px black
border around the images (which is especially useful for images with white backgrounds). But pandoc
can’t process this, and decides to skip the images instead. Just removing \frame{}
resolves this issue.
So, inside a \begin{figure}
environment, something like this:
\frame{\includegraphics[width=\textwidth]{media/italia1.png}}
Becomes simply:
\includegraphics[width=\textwidth]{media/italia1.png}
To do this automatically, you can use the following RegEx:
Find: \\frame\{\\inc(.*)\}\}
Replace: \\inc$1}
You’ll want to use Replace, rather than Replace All, just to make sure you are finding the right lines of LaTeX.
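If you would rather script this step, the same substitution can be sketched in Python. This is only an illustration: the name strip_frame is ours, and the pattern assumes that each \frame{\includegraphics...} sits on a single line, as in the example above.

```python
import re

# Matches \frame{\includegraphics...} on one line. The non-greedy group
# stops at the first "}}", i.e. where \includegraphics and \frame both close.
FRAME_RE = re.compile(r"\\frame\{(\\includegraphics.*?\})\}")


def strip_frame(line: str) -> str:
    # Keep only the captured \includegraphics command, dropping the wrapper.
    return FRAME_RE.sub(lambda m: m.group(1), line)
```

Run it line by line over a copy of your file and check the differences before overwriting anything, for the same reason we recommend Replace over Replace All.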
Sometimes, a \begin{figure}
environment contains one or more \begin{minipage}
environment(s), for example to keep subfigures in place for the PDF layout. Sadly, pandoc
does not fully grasp what’s going on here, and so we have to convert what was one figure
with several minipage
s into several figures. Otherwise, pandoc
will produce faulty image captions, and won’t be able to properly convert the figure numbers.
This means that something like this:
\begin{figure}[H]
\centering
\begin{minipage}[b]{0.49\textwidth}
\includegraphics[width=\textwidth]{media/escobar1.jpg}
\caption{(BPL 20, fol. 15v).}
\label{fig:escobar1}
\end{minipage}
\hfill
\begin{minipage}[b]{0.48\textwidth}
\includegraphics[width=\textwidth]{media/escobar2.jpg}
\caption{(BPL, 20, fol. 15r).}
\label{fig:escobar2}
\end{minipage}
\end{figure}
Needs to be turned into something like this:
\begin{figure}[H]
\centering
\includegraphics[width=\textwidth]{media/escobar1.jpg}
\caption{(BPL 20, fol. 15v).}
\label{fig:escobar1}
\end{figure}
\begin{figure}
\includegraphics[width=\textwidth]{media/escobar2.jpg}
\caption{(BPL, 20, fol. 15r).}
\label{fig:escobar2}
\end{figure}
The easiest way to do this is to Find all occurrences of minipage
, and then (for all relevant cases, i.e., when they are inside a figure
environment) delete the first \begin{minipage}
and last \end{minipage}
and switch all {minipage}
to {figure}
in between (also deleting all the options for minipage
, and the \hfill
while you’re at it).
When you’re done, do a Find for [b]
to see if you forgot to take out any of the minipage
options (which will prevent captions from displaying properly).
Don’t try to automate this, because some occurrences of minipage
outside of figure
environments don’t cause any issues.
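While the rewrite itself is best done by hand, locating the affected figures can be scripted. The following is a minimal sketch (the function name figures_with_minipage is ours) that only reports where a figure environment contains a minipage; it changes nothing and ignores minipages outside figures. It assumes that figure environments are not nested.

```python
import re

# One figure environment, from \begin{figure} to the nearest \end{figure}.
FIGURE_RE = re.compile(r"\\begin\{figure\}.*?\\end\{figure\}", re.DOTALL)


def figures_with_minipage(tex: str) -> list[int]:
    """Return the 1-based line numbers where such figure environments start."""
    hits = []
    for match in FIGURE_RE.finditer(tex):
        if r"\begin{minipage}" in match.group(0):
            # Count the newlines before the match to get its line number.
            hits.append(tex.count("\n", 0, match.start()) + 1)
    return hits
```

The returned line numbers tell you where to start the manual Find described above.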
When doing this manually, you may sometimes produce a typo while trying to type figure
(quickly; over and over again). pandoc
will alert you to this by saying that the document ended unexpectedly (which will stop it from completing the transformation). Have a close look at the error message: it will probably show you your typo, which you can then use to Find it and correct it. For example you may receive an error message like:
Error at "lorente.tex" (line 1213, column 2):
unexpected end of input
expecting \end{figiure}
^
In a case like this, try to Find the typo figiure
and correct it to figure
. This should fix the issue.
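You can also catch this kind of typo before running pandoc. The sketch below (with a hypothetical function name, mismatched_environments) walks through all \begin{...} and \end{...} commands and reports pairs whose names differ. It is a rough check: it does not skip commented-out lines or verbatim material.

```python
import re

ENV_RE = re.compile(r"\\(begin|end)\{([^}]*)\}")


def mismatched_environments(tex: str) -> list[tuple]:
    """Return (line, opened_name, closed_name) for every pair that does not match."""
    problems = []
    stack = []  # names and lines of environments that are still open
    for match in ENV_RE.finditer(tex):
        kind, name = match.groups()
        line = tex.count("\n", 0, match.start()) + 1
        if kind == "begin":
            stack.append((name, line))
        elif not stack:
            problems.append((line, None, name))  # \end without any \begin
        else:
            opened, _ = stack.pop()
            if opened != name:
                problems.append((line, opened, name))
    # Anything left on the stack was opened but never closed.
    problems.extend((line, name, None) for name, line in stack)
    return problems
```

For a file in which \begin{figiure} is closed by \end{figure}, this reports the line of the \end together with both names, which is usually enough to spot the typo.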
Pandoc handles captions somewhat differently than our LaTeX to PDF compiler in Overleaf: for example, it does not automatically render figure numbers in the captions. Also, we often need to include more \figure{}
elements in the ‘pre-pandoc-LaTeX’ than we do in the Overleaf version. For example to resolve the issue mentioned in 1.7, or when it’s easier to include a \table{}
as a \figure{}
, or when there are other more complex transcriptions etc. that can be visualised nicely with Overleaf, but that pandoc
fails to render properly.
As a result, it’s best to Find each \figure{}
in the .tex
file (after completing step 1.6 above), and fix the following issues where relevant.
For pandoc
to work properly, the \caption{}
needs to follow the \includegraphics{}
on the next line. If there is a \label{}
(or anything else?) in between the two, the \caption{}
will not be rendered in the resulting .docx
file.
- Lift the \caption{} out of the \figure{}, placing it above the figure
- (Optional): Turn \caption{} into \textbf{} so that it pops out more later when you style the .docx file according to the Lodel styles
- If the ‘figure’ needs some other number (e.g. ’Table 1’), you can already add it to the ‘caption’.
For example, this:
\begin{figure}
\centering
\includegraphics[width=\textwidth]{media/lorente0.png}
\caption{Number of plant descriptions in the six books of the printed edition and in the manuscript}
\label{fig:lorente0}
\end{figure}
Should be turned into something like:
\textbf{Table 1: Number of plant descriptions in the six books of the printed edition and in the manuscript}
\begin{figure}
\centering
\includegraphics[width=\textwidth]{media/lorente0.png}
\label{fig:lorente0}
\end{figure}
This will make sure that the figure numbers in the PDF line up with those in the web version.
- Make sure that the \caption{} is positioned after \includegraphics{}, and before \label{} (otherwise, it will not be rendered)
- Add the figure number to the caption, by pointing to the \label{} in a \ref{}
So, for example, something like this:
\begin{figure}[H]
\centering
\includegraphics[width=\textwidth]{media/lorente2.jpg}
\caption{47v stained with red ink.}
\label{fig:lorente2}
\end{figure}
Should become:
\begin{figure}[H]
\centering
\includegraphics[width=\textwidth]{media/lorente2.jpg}
\caption{Figure \ref{fig:lorente2}: 47v stained with red ink.}
\label{fig:lorente2}
\end{figure}
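To find figures that break the caption rule described above, a small check along the following lines may help. It is a sketch under two assumptions: every command sits on its own line (as in our examples), and figure environments are not nested. The name misplaced_captions is ours.

```python
import re

FIGURE_RE = re.compile(r"\\begin\{figure\}.*?\\end\{figure\}", re.DOTALL)


def misplaced_captions(tex: str) -> list[int]:
    """Return start lines of figures whose \\caption does not directly follow \\includegraphics."""
    hits = []
    for match in FIGURE_RE.finditer(tex):
        # Ignore blank lines and indentation inside the figure.
        lines = [ln.strip() for ln in match.group(0).splitlines() if ln.strip()]
        for i, ln in enumerate(lines):
            if ln.startswith(r"\caption") and not lines[i - 1].startswith(r"\includegraphics"):
                hits.append(tex.count("\n", 0, match.start()) + 1)
                break
    return hits
```

Figures without any \caption{} (for instance those whose caption was lifted out and turned into \textbf{}) are not reported.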
Sometimes, we use a combination of \protect\footnotemark
and \footnotetext{}
in order to split up the footnote and the text, so that we can still use footnotes inside \caption{}
etc.
This works well for LaTeX, which has issues turning \footnote{}
inside \caption{}
into a PDF, but it does not work for pandoc
. Thankfully, pandoc
does not have an issue processing \caption{\footnote{a footnote}}
when performing a .tex
to .docx
conversion, so just turning our workaround into regular footnotes should do the trick. For example, something like:
\begin{figure}
\centering
\includegraphics[width=.75\textwidth]{media/defenu3-new.png}
\caption{Fernando Pessoa's ``Hora Absurda'' manuscript; line 50.\protect\footnotemark}
\label{fig:defenu3-new}
\end{figure}
\footnotetext{My footnote.}
Should become:
\begin{figure}
\centering
\includegraphics[width=.75\textwidth]{media/defenu3-new.png}
\caption{Fernando Pessoa's ``Hora Absurda'' manuscript; line 50.\footnote{My footnote.}}
\label{fig:defenu3-new}
\end{figure}
The easiest way to do this is to delete the letters text
from \footnotetext{My footnote.}
, to turn it into \footnote{My footnote.}
, and then copy-paste that line onto the corresponding \protect\footnotemark
to replace it.
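If a contribution contains many of these, the copy-paste step can be sketched in Python. The function name inline_footnotes is hypothetical, and the sketch makes some assumptions: each \protect\footnotemark is paired with the first \footnotetext{} that follows it, the footnote text contains no escaped braces such as \{, and no optional arguments (like \footnotemark[1]) are used. Check the result by hand.

```python
MARK = r"\protect\footnotemark"
TEXT = r"\footnotetext{"


def inline_footnotes(tex: str) -> str:
    """Replace each mark with a regular footnote holding the matching footnote text."""
    while True:
        mark = tex.find(MARK)
        start = tex.find(TEXT, mark + len(MARK)) if mark != -1 else -1
        if start == -1:
            return tex  # no (further) mark/text pair found
        # Walk forward to the brace that closes \footnotetext{...}.
        depth, end = 1, start + len(TEXT)
        while end < len(tex) and depth:
            depth += {"{": 1, "}": -1}.get(tex[end], 0)
            end += 1
        if depth:
            return tex  # unbalanced braces: leave the rest untouched
        body = tex[start + len(TEXT):end - 1]
        tail = tex[end:]
        if tail.startswith("\n"):
            tail = tail[1:]  # drop the line that only held \footnotetext{...}
        tex = tex[:mark] + r"\footnote{" + body + "}" + tex[mark + len(MARK):start] + tail
```

Applied to the example above, this moves My footnote. into the caption as a regular \footnote{} and removes the separate \footnotetext{} line.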
Now, we should be ready to use pandoc
to transform the .tex
file into a .docx
file. In the command line, cd
into the root of your LaTeX folder, that now contains the adapted .tex
files, and enter the following command (exchanging martinez
with the name of the relevant document):
pandoc martinez.tex -o martinez.docx
If you are using BibTeX to manage your references, pandoc
will not find the bibliography file you've referenced in the \bibliography{}
command. You can make this work by adding the path to the relevant .bib
file into the transformation command as such:
pandoc martinez.tex --bibliography=references/martinez.bib -o martinez.docx
On 18 December 2024, VarianTeX switched from BibTeX to the more versatile BibLaTeX (which will be an integral part of the template’s upcoming 3.0 release). This has helped us simplify the formatting of our references immensely. But it does require us to add a couple more flags to our pandoc
command to make the external referencing work. This means that the command should now read:
pandoc martinez.tex --biblatex --bibliography=references/martinez.bib -o martinez.docx --citeproc
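When an issue contains many contributions, typing this command for every file gets tedious. Below is a minimal sketch of a Python wrapper that builds the same command as above; the function names pandoc_command and convert are ours, and the .bib path you pass in is assumed to follow the layout of the example (references/martinez.bib).

```python
import subprocess
from pathlib import Path


def pandoc_command(tex_file, bib_file=None):
    """Build the pandoc call shown above as a list of arguments."""
    tex = Path(tex_file)
    cmd = ["pandoc", str(tex)]
    if bib_file is not None:
        cmd += ["--biblatex", f"--bibliography={bib_file}"]
    # The output file gets the same name as the input, with a .docx extension.
    cmd += ["-o", str(tex.with_suffix(".docx"))]
    if bib_file is not None:
        cmd.append("--citeproc")
    return cmd


def convert(tex_file, bib_file=None):
    # Requires pandoc on your PATH; raises CalledProcessError if pandoc fails.
    subprocess.run(pandoc_command(tex_file, bib_file), check=True)
```

Calling convert("martinez.tex", "references/martinez.bib") from the root of the LaTeX folder then runs the command from the previous paragraph.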
Sometimes we need to include TeX Math in our contributions, e.g. to write mathematical formulas, or to use symbols for more complex tables. When this happens, try to save the ‘math’ as an image, and include it with \includegraphics{}
instead.