Nice and clean formatting of LaTeX Table Sourcecode

Format LaTeX Table Sourcecode

The sourcecode of LaTeX tables tends to be a bit messy and difficult to read. This is particularly obvious when tables are generated by Stata’s Estout, e.g. something like:

&\multicolumn{6}{c}{Car type}
Repair Record 1978&\multicolumn{2}{c}{Domestic}&\multicolumn{2}{c}{Foreign}&\multicolumn{2}{c}{Total}\\
 & N& Row Pct& N& Row Pct& N& Row Pct\\
Result 1 & 2& 100.00& 0& 0.00& 2& 100.00\\
Result 2 & 8& 100.00& 0& 0.00& 8& 100.00\\

It is nearly impossible to read the contents of this table by just looking at the sourcecode. The LaTeX Editor WinEDT offers the TeXtab Plugin, which formats the sourcecode so that each column is clearly separated, but you have to manually select the part of the table that you want to format. The result looks something like this:

                   & \multicolumn{6}{c}{Car type}
Repair Record 1978 & \multicolumn{2}{c}{Domestic} & \multicolumn{2}{c}{Foreign} & \multicolumn{2}{c}{Total} \\
                   & N                            & Row Pct                     & N                         & Row Pct & N & Row Pct \\
Result 1           & 2                            & 100.00                      & 0                         & 0.00    & 2 & 100.00  \\
Result 2           & 8                            & 100.00                      & 0                         & 0.00    & 8 & 100.00  \\

A Batch File for automatic Formatting

I want to integrate the sourcecode formatting directly into my Stata-Estout routine so that all tables are automatically formatted. You can do this in a batch file in the following steps:

  1. Download the TeXtab plugin from the WinEDT homepage including the APL2 runtime module. Extract the archive and install the runtime module (“apl2wr20.exe”). A restart may be required.
  2. The file “textfosp.ans” is the script that does the actual formatting. Copy it to an obvious folder, e.g. “C:\tableformat” (if you use another folder you need to adjust it in the Batch file below).
  3. Create an empty Batch file (e.g. tableformat.bat) and copy the following lines into it:
    DIR *.tex /b > tablelist.lst
    for /F "tokens=*" %%A in (tablelist.lst) do (
     "apl2run.exe" -rns LxTxFo:"C:\tableformat\textfosp" -input "'3 & \\ -- \&' '%%A' 'xX'"
    )
    REM del *.B*K
    REM del *.err
    REM del B*K.*
  4. Copy the Batch file to the folder where your tables are stored. For example, all my Stata-Estout tables are generated to the folder “Project A\Stata\tables” and this folder only contains the Estout tables.

The batch does the following:

  • The textfosp script does not allow multiple file inputs (as far as I know), hence the first step is to create a list of all TeX files in that particular folder (“tablelist.lst”).
  • Next we loop through the list of tables line by line, i.e. each table in the list is formatted by the script.
  • The script generates some backup files. The last three lines delete those backup files, but as I don’t want to be responsible for any data loss I have commented them out (remove “REM” if you want to delete the backups).

Call the Script in Stata

To execute the script from Stata run the following after you created the tables:

winexec tableformat.bat

Optional: If you are annoyed by the popping up of the command line interface when the batch is executed create a *.vbs file with the following content:

Set WshShell = CreateObject("WScript.Shell" ) 
WshShell.Run chr(34) & "tableformat.bat" & Chr(34), 0 
Set WshShell = Nothing

And run from Stata with:

winexec wscript.exe tableformat.vbs

PS: If you know a better way, e.g. with regular expressions and something like SED, let me know.
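
One possible answer to the PS, as a minimal sketch in Python rather than sed. It only illustrates the idea with a regular expression and is not a replacement for the TeXtab script: the function name align_table is made up, and I assume that every table row sits on its own line, that \& is the only escaped ampersand, and that a \multicolumn cell simply counts as one cell (which is also how the TeXtab output above treats it).

```python
import re

# Split on '&' unless it is escaped as '\&'.
_CELL_SEP = re.compile(r"(?<!\\)&")


def align_table(source: str) -> str:
    """Pad the cells of LaTeX table rows so that the '&' separators line up."""
    rows = []
    for line in source.splitlines():
        body = line.rstrip()
        has_break = body.endswith("\\\\")  # row ends with a LaTeX line break
        if has_break:
            body = body[:-2]
        cells = [cell.strip() for cell in _CELL_SEP.split(body)]
        rows.append((cells, has_break))
    if not rows:
        return ""

    # Width of each column = longest cell found in that column.
    widths = [0] * max(len(cells) for cells, _ in rows)
    for cells, _ in rows:
        for i, cell in enumerate(cells):
            widths[i] = max(widths[i], len(cell))

    out = []
    for cells, has_break in rows:
        padded = " & ".join(cell.ljust(widths[i]) for i, cell in enumerate(cells))
        out.append(padded + " \\\\" if has_break else padded.rstrip())
    return "\n".join(out)
```

To format a file you would read it, pass the text through align_table and write it back, keeping a backup copy just as the TeXtab script does.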

Stata and LaTeX ‘Longtables’ – A solution is coming

Due to popular demand I am currently working on a way to implement the Stata/LaTeX workflow with long tables, i.e. tables spanning over multiple pages. It looks like almost every other question I get is about long tables, so I am finally giving in…

Good news: it works, in principle.

Bad news: things like table notes and a “one-size-fits-all” code as before are not so easy to implement, hence I need some time to think about use cases. In any case, long tables will definitely require more user interaction than normal tables.

I will also start working on a unifying documentation that is going to provide the code and examples of everything that I have covered so far, including issues that came up in the comments.

Have a good Christmas time, everyone!



LaTeX and Stata integration (3): Improving the Design

If you followed my previous posts regarding automated Stata and LaTeX integration you might already have a good idea how estout works and how a table can be printed in LaTeX in an aesthetically pleasing way. This post is about improving the print quality in LaTeX even further. If you are new to LaTeX and Stata integration please read my introductory post and my follow up post that solves some problems.

The Problem

You might have encountered two problems following the instructions of my previous posts:

  1. You generate hundreds of overfull hbox warnings in LaTeX if you decimal-align the results.
  2. If you use different math- and text-fonts, symbols in the table are set in the math-font, which can look ugly.

Both are relatively serious problems. The first occurs because siunitx, the package we use to decimal-align the values, requires precise information about how much space there is in each column before and after the decimal marker. Consider the example table: we have a maximum of 2 characters to the left of the decimal marker (a minus sign plus one integer digit) and a maximum of 5 characters to the right (two decimals plus three stars). Hence, we need to specify this in siunitx with S[table-format=-1.5] to avoid overfull hbox warnings. But I don’t like this, because then we cannot create really compact tables, and furthermore the column heading may not be perfectly centered relative to the numbers.

The second problem occurs because siunitx sets symbols that are specified as input-symbols (i.e. they do not interfere with the decimal-alignment) in math-mode. If you use a complete font such as Latin Modern you won’t notice this because both text- and math-font look the same. But if you use distinct fonts, the result will be as the one in the example above: minus symbol and brackets are set in “Euler”, a very pretty math-font, but the rest is set in Linux Libertine, my favourite free LaTeX font. Obviously, this result is not optimal.

The Solution

It is easiest to download the Sample Document and the Sample Table used to generate the example to see how the solution works. I load the following fonts:

\usepackage{libertine}% Linux Libertine, my favourite text font
\usepackage[euler-digits]{eulervm}% A pretty math font

In the next lines we create the well-known \sym command and load and customise siunitx, now simpler than in my previous posts.

% *****************************************************************
% siunitx
% *****************************************************************
\newcommand{\sym}[1]{\rlap{#1}}% Thanks David Carlisle

\usepackage{siunitx}
	\sisetup{
		group-digits		= false,
		input-symbols		= ( ) [ ] - +,
		table-align-text-post	= false,
		input-signs             = ,
	}

Now begins the hacky bit: as I mentioned before, I don’t like to reserve space for all characters in siunitx. What I want to do is tell siunitx only how many integers and decimals there are – I don’t care about minuses, brackets or stars. Apart from being simpler, this also makes really compact tables possible and makes sure that the column title is centered relative to the numbers. If you reserve space for all characters, the title might look offset, because it is centered relative to all characters, while “the eye” focuses on the number to create an aligning point.

The solution to the first problem, avoiding overfull hbox warnings, is to tell LaTeX not to reserve any space for minuses, brackets and stars. The \llap{...} and \rlap{...} commands will do this; we just need to tell LaTeX to wrap all symbols in the table in those commands. The solution to the second problem is to substitute the specific characters that are set in math-mode (in my example minuses and brackets) with their text-mode equivalents. This is done in the following lines:

% Character substitution that prints brackets and the minus symbol in text mode and does not reserve any space. Thanks to David Carlisle
  \mathcode\expandafter`\string-"8000 }

\mathcode\expandafter`\string#1"8000 }

\mathcode\expandafter`\string#1"8000 }


Here we create a new command \textsymbols that incorporates all issues discussed above. To make sure that our tables are printed correctly, we have to adapt the \estwide and \estauto commands accordingly. All that has changed is the added \textsymbols command.

		\textsymbols% Note the added command here

		\textsymbols% Note the added command here

Finally, the command to print the table in LaTeX:

    \caption{Table with Better Notes and Better Symbols}
    \Figtext{Some basic text about the table.}
    \Fignote{With `threeparttables' even long notes don't get wider than the table. The result is much more typographically pleasing.}
    \Figsource{We got the data from here.}

Note the specification of the S-column, where we only specify the maximum number of integers and decimals, and we fix the column-width to 20mm: S[table-format=1.2,table-column-width=20mm].
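
If you do not want to count the digits by hand, a small script can read them off the cell values. The following is only a rough sketch in Python: the function name table_format is made up for this illustration, and it assumes every cell holds at most one number written with a decimal point. Minus signs, brackets and stars are ignored on purpose, because we no longer reserve space for them.

```python
import re

# First number in a cell: integer digits, optionally followed by decimals.
_NUMBER = re.compile(r"(\d+)(?:\.(\d+))?")


def table_format(cells):
    """Return 'I.D': the largest count of integer and decimal digits in the cells."""
    max_int = max_dec = 0
    for cell in cells:
        match = _NUMBER.search(cell)
        if match is None:
            continue  # cell without a number, e.g. a text label
        max_int = max(max_int, len(match.group(1)))
        max_dec = max(max_dec, len(match.group(2) or ""))
    return f"{max_int}.{max_dec}"


# Cells as estout would write them, with signs, stars and parentheses.
print(table_format(["-0.12***", "(0.05)", "1.30*"]))
```

The printed value is what goes into S[table-format=...] for that column.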

The result is a pretty table without any LaTeX warnings and where the minus and brackets are set in the correct font.

In the next post we go a bit further: if you are a fan of typography you might use old-style figures; however, it is not recommended to use those in tables. Using OpenType fonts with XeLaTeX or LuaLaTeX we can print numbers in the text as old-style figures, but numbers in the table as lining figures.

Issues with Amsmath

If you load the amsmath package the above solution will not work. You need to add the following lines after you load the package (thanks to David Carlisle for this and many other problems that he helped to solve!):

    \expandafter\@tempb\meaning\@tempa \relax
  \ht\Mathstrutbox@\ht\z@ \dp\Mathstrutbox@\dp\z@

LaTeX and Stata integration (2): Solving some problems

I have noticed a definite increase in the number of questions I receive regarding my Stata and LaTeX integration post (maybe deadlines are approaching?). I guess it’s a good idea to address some of these questions in a new post and also show some changes that improve the LaTeX code.

Wrapping of column titles

Bert suggests an alternative to my \specialcell command to wrap column headings. \specialcell requires setting the line break manually, which some might find a bit tedious. He suggests using estout’s prefix and suffix options to insert a minipage of a fixed width. The idea is that, by telling LaTeX exactly how wide the column is, it can automatically wrap the text accordingly. The estout command would look something like

mlabels(titles prefix("\begin{minipage}{0.5in}") suffix("\end{minipage}"))

I have tried this and it works, but I am not completely satisfied with the result for two reasons: First, I find that an automatic line-break might be in places where I don’t want it and the column titles might be too close to each other if it is a table with lots of information. Second, my preference is to avoid putting LaTeX design code into estout as much as possible. I just prefer to change the layout in LaTeX.

However, it is a good idea to adjust the column width manually if you are not filling the table to the textwidth. Fortunately, this is extremely easy with siunitx: all you have to do is add table-column-width=20mm to your S-column, which fixes the width of each data column to 20mm. E.g., the full code for the table would look like:

\caption{Table with decimal alignment and fixed column width}
\fignote{A little note below the text}

Notes that have the same width as the table

Another suggestion from Bert (thank you!). Estout has a serious bug that stretches the first column of the table when you use estout’s note function. This is why I created the custom \figtext, \fignote, \figsource and \starnote commands that just add a custom caption below the table. This has one big weakness, as the picture shows: a long note is set as wide as the page dimensions allow. This does not look very nice; it would be better if the note were as wide as the table.

Fortunately, this can be easily solved with the threeparttable package. The package uses different commands for notes that set them at the exact width of the table. Hence, we need to create new custom note commands: \Figtext, \Fignote, \Figsource and \Starnote (note the capitalisation!). See the code below to see how the table has been generated. Looks nicer, doesn’t it? Don’t forget to add \usepackage{threeparttable} to your preamble.

\usepackage{threeparttable}% Alternative for Notes below table

% Note/Source/Text after Tables

\newcommand{\Starnote}{\Figtext{* p < 0.1, ** p < 0.05, *** p < 0.01. Standard errors in parentheses.}}% Add significance note with \Starnote


\caption{Table with better notes and decimal alignment}
\Fignote{This is a very long row. As you can see it is longer than the actual width of the table. This does not look very nice.}


Problem with caption position

Maria has the problem that the caption is “too high” and cuts into the table. When I created the note commands I used the scrartcl class and adjusted the margins accordingly. If you use something like article then all notes will be too high.

That’s easily solved: you can either use the new approach to create notes with the threeparttable package (see above) or, if you prefer the “caption” approach, just adjust the vspace that I insert. For the article class you may comment out the vspace completely, and everything looks fine.

 %\vspace{-1.9ex}% Comment here or adjust accordingly
 \caption*{\hspace{6pt}\hangindent=1.5em #1}

Ugly significance stars

I made a mistake in the original post and set the significance stars in math mode. This sometimes causes the significance stars to be too large and not raised. To solve this, just change the \sym command to the one below (previously #1 was wrapped in $#1$, which sets the stars in math mode).

\newcommand{\sym}[1]{\rlap{#1}}

 Print very long tables on several pages

Torben has a problem with very long descriptive tables that do not fit on one page. There are two ways to solve this in LaTeX: the longtable package or the longtabu environment from the tabu package. The tabu package is a bit more modern and flexible, so this might be the way to go.

I have been playing around for a bit, but it is a bit more complex than I anticipated. So, unfortunately, I cannot provide a solution until I encounter a long table myself :(. If anyone has a suggestion, I am happy to hear it!

Further enhancements

In another post I will write something about improving the typography of the table further. This involves some character substitution so that brackets ( ) and the minus sign do not cause any overfull hbox warnings in LaTeX.


Automated Table generation in Stata and integration into LaTeX (1)

I use estout to generate tables of summary statistics and regression results that can be easily imported into LaTeX. The advantage is that the whole system is dynamic. If you change something in your do-file (e.g. you omit a particular group), you don’t have to change anything: the results get automatically updated in LaTeX. That has saved me a lot of time, but setting everything up took a long long time. So hopefully this post is helpful for aspiring applied econometricians who want to automate output reporting from Stata into LaTeX.

I think it is easiest to explain everything with examples, for which I use some tables from my current working paper. First install estout in Stata (ssc install estout), then we can jump right into the examples. I explain three things: creating tables for 1. descriptive statistics and 2. regression output and 3. putting everything into LaTeX. Ready? Let’s start.

Edit: Have a look at my follow-up post if you encounter any problems!
Edit 2: Another follow-up to improve the design.

1. Descriptive Statistics

The principle of estout is simple: you run a command in Stata that generates some statistics, you tell estout to (temporarily) store those results and then you create a table.

Consider Table 2, which is simply a bunch of summarised variables, split into three categories. You cannot see it, but these are actually three tables appended together. Why? Because the first part (Age to Housing) contains percentages and therefore has 2 decimal places, the second part (Household Finances) contains income variables with 0 decimal places, and the last part has 2 decimal places again. The complete code that generates this table is:

* TABLE 2 

estpost su $dem if coholder100==1
est store A

estpost su $dem if coholder500==1
est store B

estpost su $dem if coholder1500==1
est store C

	esttab A B C using table2.tex, replace ///
		refcat(age18 "\emph{Age}" male "\emph{Demographics}" educationage "\emph{Education}" employeddummy "\emph{Employment}" oowner "\emph{Housing}", nolabel) ///
		mtitle("> \pounds100" "> \pounds500" "> \pounds1500") ///
		cells(mean(fmt(2))) label booktabs nonum collabels(none) gaps f noobs

estpost su $fin if coholder100==1
est store A

estpost su $fin if coholder500==1
est store B

estpost su $fin if coholder1500==1
est store C

	esttab A B C using table2.tex, append ///
		refcat(hhincome "\emph{Household Finances}", nolabel) ///
		nomtitles ///
		cells(mean(fmt(0))) label booktabs nonum f collabels(none) gaps noobs plain

estpost su $risk if coholder100==1
est store A

estpost su $risk if coholder500==1
est store B

estpost su $risk if coholder1500==1
est store C

	esttab A B C using table2.tex, append ///
		nomtitles ///
		refcat(redundant "\emph{Income and Expenditure Risk}" literacyscore "\emph{Behavioural Characteristics}", nolabel) ///
		stats(N, fmt(%18.0g) labels("\midrule Observations")) ///
		cells(mean(fmt(2))) label booktabs nonum f collabels(none) gaps plain

A brief explanation (for a full list of options see the estout/esttab manual):

  • refcat includes a heading for a group of variables
  • mtitle specifies the column headings
  • cells specifies the cell content (in the first part “mean” with 2 decimal places)
  • f creates a fragment of the table, i.e. only the table content is exported to the .tex (see below for more information)

2. Regression Results

Reporting regression results is not as simple, but we are jumping right into a fairly complicated example (Table 4).

As you can see, we are reporting coefficients, standard errors and marginal effects, hence each specification has two columns and two rows. At the bottom we add a few additional statistics (estout can add any statistic that is stored in e()).

This table is generated by the following code:


quietly probit coholder $dem $fin $bev $risk
	predict pr2_coholder, pr
	quietly su pr2_coholder
	estadd scalar pr = r(mean)
	estadd margins, dydx(*) atmeans

est store A

quietly probit coholder500 $dem $fin $bev $risk
	predict pr2_coholder500, pr
	quietly su pr2_coholder500
	estadd scalar pr = r(mean)
	estadd margins, dydx(*) atmeans

est store B

quietly probit coholder1500 $dem $fin $bev $risk
	predict pr2_coholder1500, pr
	quietly su pr2_coholder1500
	estadd scalar pr = r(mean)
	estadd margins, dydx(*) atmeans

est store C

esttab A B C using table4.tex, replace f ///
	label booktabs b(3) p(3) eqlabels(none) alignment(S S) collabels("\multicolumn{1}{c}{$\beta$ / SE}" "\multicolumn{1}{c}{Mfx}") ///
	drop(_cons spouse*  ) ///
	star(* 0.10 ** 0.05 *** 0.01) ///
	cells("b(fmt(3)star) margins_b(star)" "se(fmt(3)par)") ///
	refcat(age18 "\emph{Age}" male "\emph{Demographics}" educationage "\emph{Education}" employeddummy "\emph{Employment}" oowner "\emph{Housing}" hhincome_thou "\emph{Household Finances}" redundant "\emph{Income and Expenditure Risk}" literacyscore "\emph{Behavioural Characteristics}", nolabel) ///
	stats(N r2_p chi2 p pr, fmt(0 3) layout("\multicolumn{1}{c}{@}" "\multicolumn{1}{S}{@}") labels(`"Observations"' `"Pseudo \(R^{2}\)"' `"LR chi2"' `"Prob > chi2"' `"Baseline predicted probability"'))

What do we do here? First we quietly run a probit model, then we generate the baseline predicted probability and store this using estadd. Finally, we calculate marginal effects and store them again using estadd.

As you can see, there are many commands that generate that table. Step-by-step:

  • b(3) and p(3): 3 decimal places for coefficients and p-values (the standard errors get their format from the cells option below)
  • alignment(S S): alignment of the decimal places. As we have two data columns per specification (one for β/SE and one for Mfx), we need to specify the alignment for each column. Here we use the siunitx package (see below) to align the results at the decimal point. The alternative would be alignment(c c) for centered data entries.
  • collabels: labels for the data columns. As we want those centered we need the multicolumn option
  • drop: drop some results from the table
  • star: specify how you want to report significance levels
  • cells: specify the content for each cell
  • stats: adds statistics below the results. We add N (observations), r2_p (pseudo R2), chi2 (LR chi2) p (prob > chi2) and pr (the baseline predicted probability, created before). Furthermore: 
    • fmt specifies the number of decimal places (here: N has 0, all the following 3)
    • layout: specify alignment. We want N to be centered and all the following to be decimal aligned.
    • labels: create some nice names


If you are grouping variables with the refcat command, you may want to indent the variables to create a nicer design as in my example. The following Stata command creates a 0.1cm indent for all variable labels. If you want some labels not to have this indent, make sure to label (or relabel) these variables after you have run this command.

foreach v of varlist * {
	label variable `v' `"\hspace{0.1cm} `: variable label `v''"'
}

If you have some long column labels you might have to insert a manual line break to prevent the column from becoming too wide. The usual command in LaTeX for this is \\, but that does not work in table columns. We create a special LaTeX command (see below) that takes care of that issue. If you need to insert a line break in your table, wrap the text into a \specialcell field. Then you can use \\ as usual:

mtitle("\specialcell{Co-Holding\\> \pounds100}" "\specialcell{Co-Holding\\> \pounds500}" "\specialcell{Co-Holding\\> \pounds1500}") ///

3. Tables into LaTeX

Now begins the LaTeX hacking part. By default, estout generates a complete table and has the ability to include table titles above and notes below the table. But we are using the fragment option to generate the pure table content only. Why? Because it allows for much more flexibility, plus adding notes below the table with estout almost certainly breaks the width of the first column.

In order for this to work you need to add the following to your LaTeX preamble (i.e. before \begin{document}):

% *****************************************************************
% Estout related things
% *****************************************************************
\newcommand{\sym}[1]{\rlap{#1}}% Thanks to David Carlisle

\let\estinput=\input% define a new input command so that we can still flatten the document



% Allow line breaks with \\ in specialcells

% *****************************************************************
% Custom subcaptions
% *****************************************************************
% Note/Source/Text after Tables
	\caption*{\hspace{6pt}\hangindent=1.5em #1}


% Add significance note with \starnote
\newcommand{\starnote}{\figtext{* p < 0.1, ** p < 0.05, *** p < 0.01. Standard errors in parentheses.}}

% *****************************************************************
% siunitx
% *****************************************************************
\usepackage{siunitx} % centering in tables
	\sisetup{
		tight-spacing		= true,
		group-digits		= false ,
		input-signs		= ,
		input-symbols		= ( ) [ ] - + *,
		input-open-uncertainty	= ,
		input-close-uncertainty	= ,
		table-align-text-post	= false
	}

These commands do the following:

  • Create two wrappers for estout generated tables. \estwide uses tabular* and fills the table to the width of the text, \estauto uses tabular and uses the “standard” table width (i.e. width adjusted to your content).
    • You need to specify three arguments immediately after the command (in the curly brackets): \estwide{the .tex file of the table}{the number of data columns}{alignment}. Have a look at my comment below to clarify the syntax further.
    • In the second curly bracket enter the number of data columns, excluding the label column. In the example of Table 2 we have 3 data columns, in Table 4 we have 3 data columns as well, as the subcolumns “β/SE” and “Mfx” are considered to be one column.
    • For alignment use the S option for decimal alignment using siunitx or c to centre the data. (Hint: decimal alignment looks much, much better. Just look at any journal.)
  • Add the \specialcell command for manual line break (see above).
  • Add commands for subcaptions to include simple notes below the table using the caption package.
    • \figtext adds some basic text
    • \fignote adds text with “Note: ” before.
    • \figsource adds text with “Source: ” before.
    • \starnote adds information about significance levels

You can then include the tables into your latex document as follows:

% Table 2
\caption{Sample Characteristics by the Amount of Co-Holding (\pounds)}

% Table 4
\caption{Probit Model for Characteristics of Co-Holders with Income Risk}
\fignote{Omitted groups: \emph{Employment:} Student/Housewife/Disabled. \emph{Housing:} Private renter/Social renter. Further controls for spouse employment status.}

That’s it. Not so difficult after all! I should add that there is a typographical issue if you are using different text and math fonts, as the brackets and asterisks are set in math font, but the digits in text font. A workaround is provided here, thanks to David Carlisle.

Edit: Have a look at my comment below regarding the syntax of the \estauto \estwide commands. The number of data columns (the second entry) and the alignment (third entry) might be a bit confusing at first, I hope that comment clarifies things a bit.


Workflow with Subversion and LaTeX

In the previous post I showed how easy it is to install your own Subversion server nowadays. But that only gets you so far, because you need to integrate SVN into your daily working routine. Here I show how I do that with my LaTeX projects.

The basic workflow with Subversion is as follows:

  • Update the working copy (right-click and ‘SVN Update’). This updates the locally stored file with the one from the repository. You need to do this when working on the project with others.
  • Make some changes to the document.
  • Commit the changes to the repository (right-click and ‘SVN Commit’). Make sure to add a note about what you changed to make version tracking easier.

When working on large documents it is really important for me to visualise changes. MiKTeX and TeX Live come with the great tool latexdiff, which highlights differences between two documents. The syntax is simply latexdiff old.tex new.tex > diff.tex from the command line. SVN is a perfect fit for the job as we can access old revisions easily. The syntax is then, for example, latexdiff-vc --svn -r HEAD file.tex, which compares file.tex with the HEAD revision (i.e. the latest revision in the repository).

I generally work with multi-file LaTeX projects, i.e. the main file is called master.tex and all sections of the project (introduction, chapter 1, chapter 2 etc.) are included via \include or \input. For this, latexdiff offers the --flatten option that replaces the \include and \input commands with the actual content. Unfortunately, latexdiff currently does not support --flatten in combination with latexdiff-vc, so we have to do that manually.

The idea is as follows:

  • Flatten the document, i.e. merge the contents of all .tex files into a single document. This includes the bibliography.
  • Compare the flattened document with a previous version with latexdiff and generate a PDF automatically.
  • Commit the flattened document and the PDF.

I also want to create two more versions of the flattened document:

  • One where each sentence starts on a separate line. This is useful if you check differences in the source rather than in the compiled document, because latexdiff changes the source by including \DIFdelbegin and \DIFdel tags to highlight changes. If you want to copy & paste some of the text later, this version is more convenient.
  • One where all comments and to-dos (using the todonotes package, \todo{}) are removed. This file can be sent to publishers (or just people who want to have a look) who are not supposed to read your comments.

That sounds like a handful, but fortunately there are scripts around for every task I mentioned above. As my programming skills are just at caveman level, I am only able to combine these scripts into a single batch file that does all these steps in one go.

What you need:

  • flatex flattens the document. It is quite old but works perfectly. It also inputs the bibliography so that you truly need only one .tex file for the whole project. Get the single flatten.c source here and compile it, or download the executable flatex.exe.
  • by . A perl script that removes comments and to-dos (see post). In principle it flattens the document as well, but for some reason it messes up the bibliography and tables in my files. So we don’t use that feature and use flatex for flattening.
  • by Andrew Stacey. Another perl script that moves each sentence to a new line.

For the last two scripts to work you need to install Perl.

Now it’s time for my batch file cleandiff.bat, which combines all scripts. To run it you need to copy it to where your master/main tex-file is (in my script it is called ‘master.tex’). You also need to specify where you saved the three scripts above (I just copied them to C:).

Here’s what it does:

Create new working folder. Cleandiff contains all new files, temp will be deleted

mkdir cleandiff
mkdir cleandiff\temp

Call the ancient flatex. It does its job perfectly, but for some reason it does not name the new file as it is supposed to; instead the new file is called master.flt. So we just rename it to clean-comments.tex and move it to the new folder.

C:\flatex.exe master.tex
rename master.flt clean-comments.tex
move clean-comments.tex cleandiff\clean-comments.tex

cd cleandiff

We add and commit this file which is then the HEAD Revision.

svn add clean-comments.tex
svn commit -m "" clean-comments.tex

Remove comments and \todo{} remarks. On its own the script does the same as flatex, but it does not work well with my documents (tables and the bibliography get messed up). So we just call it, if necessary, after flatex.

perl C:\ clean-comments.tex > clean-nocomments.tex

Put each sentence on a new line. Useful if using svn diff as it makes it easier to track changes. It should be run on clean-comments.tex as it is useful to keep track of your comments and to-dos as well.

perl C:\ clean-comments.tex > clean-linebreak.tex
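
To show what “one sentence per line” means in practice, here is a rough sketch of the idea in Python. It is not the Perl script used above, just an illustration under simple assumptions: the function name one_sentence_per_line is made up, the input is a single paragraph without % comments, and a sentence is taken to end with ".", "!" or "?" followed by whitespace and then a capital letter or a LaTeX command. Abbreviations followed by a capital letter will be split wrongly.

```python
import re

# Sentence boundary: '.', '!' or '?', then whitespace, then the start of
# the next sentence (a capital letter or a LaTeX command).
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+(?=[A-Z\\])")


def one_sentence_per_line(paragraph: str) -> str:
    """Re-wrap one paragraph so that each sentence starts on its own line."""
    joined = " ".join(paragraph.split())  # undo the existing line wrapping
    return _SENTENCE_END.sub("\n", joined)


print(one_sentence_per_line("First one. Second\none! \\emph{Third} one?"))
```

With one sentence per line, a line-based diff such as svn diff only flags the sentences that actually changed instead of whole re-wrapped paragraphs.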

Compare the HEAD revision with the PREV (previous) revision. First fetch both revisions with svn cat, then run latexdiff. You can also specify the revisions you want to compare, e.g. -r 28 uses revision 28.

svn cat -r PREV clean-comments.tex > temp\PREV.tex
svn cat -r HEAD clean-comments.tex > temp\HEAD.tex

latexdiff -t UNDERLINE temp\PREV.tex temp\HEAD.tex > rev_head_prev.tex

Compile the document. To get the bibliography and cross-references right, run pdflatex once, then bibtex, then pdflatex twice more, in that order. nonstopmode keeps the run going when errors occur.

pdflatex -interaction=nonstopmode rev_head_prev.tex
bibtex rev_head_prev
pdflatex -interaction=nonstopmode rev_head_prev.tex
pdflatex -interaction=nonstopmode rev_head_prev.tex
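If you prefer to drive the compilation from a script rather than a batch file, the same sequence could look roughly like this in Python. This is only a sketch: build_commands and compile_document are hypothetical helper names, and it assumes pdflatex and bibtex are on the PATH. BibTeX is given the job name without extension here, since it reads the .aux file rather than the .tex file.

```python
import subprocess
from pathlib import Path


def build_commands(tex_file: str) -> list[list[str]]:
    """Return the pdflatex / bibtex / pdflatex / pdflatex sequence."""
    stem = Path(tex_file).stem  # "rev_head_prev.tex" -> "rev_head_prev"
    latex = ["pdflatex", "-interaction=nonstopmode", tex_file]
    return [latex, ["bibtex", stem], latex, latex]


def compile_document(tex_file: str) -> None:
    """Run all passes in order, in the current working directory."""
    for cmd in build_commands(tex_file):
        # check=False: carry on with the next pass even if one reports
        # errors, which mirrors what the batch file does.
        subprocess.run(cmd, check=False)
```

compile_document("rev_head_prev.tex") would then run the four passes one after the other.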

We delete some files that are not required to track changes. The last command deletes empty 0 byte files that may be generated.

del *.lof
del *.lot
del *.toc
del *.log
del *.dvi
del *.aux
del *.bbl
del *.blg
del *.brf
del *.out
rmdir /Q /S temp
for /r %%F in (*) do if %%~zF==0 del "%%F"
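The last line is a bit cryptic, so here is what it does, written as a small Python sketch. The function name is made up, and skipping the .svn folder is my own precaution in this sketch, not something the batch line does explicitly.

```python
from pathlib import Path


def delete_empty_files(root: str) -> list[Path]:
    """Delete all 0-byte files below root and return the deleted paths."""
    removed = []
    # Collect the paths first so that deleting does not disturb the walk.
    for path in list(Path(root).rglob("*")):
        if ".svn" in path.parts:
            continue  # assumption: never touch Subversion's own metadata
        if path.is_file() and path.stat().st_size == 0:
            path.unlink()
            removed.append(path)
    return removed
```

delete_empty_files(".") run inside the cleandiff folder would correspond to the for /r loop above.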

Finally we add and commit the remaining files.

svn add *
svn commit -m ""
svn commit cleandiff

If all goes well you should have five committed files in the cleandiff folder:

  • clean-comments.tex: the flattened document that is used for latexdiff
  • clean-linebreak.tex: flattened document including comments where all sentences start on a new line.
  • clean-nocomments.tex: flattened document where comments and to-dos are removed
  • rev_head_prev.pdf and rev_head_prev.tex: PDF and .tex highlighting the changes to the previous version.

After a long working day in which I have made lots of improvements to my papers, I run the script so that I can always keep track of my changes.


Version Control with Subversion for LaTeX & Stata (and more…)

Before starting a large project (such as a PhD) it is crucial to have some form of version control system for the text and files you are going to produce. I am starting a small series of blog posts about how I (hope to) achieve this for LaTeX and Stata .do files using the free version control system Subversion (SVN).

One of the main advantages of Microsoft Word is its brilliant system to track changes and make comments, a thing that LaTeX lacks, so it is always a bit difficult to collaborate and be aware of changes. I have only recently switched to LaTeX because of its ability to integrate result tables from Stata automatically. In my opinion that is the only big advantage of LaTeX; you can do amazing things in Word as well, but more about that in some future posts. So I started looking into version control for LaTeX projects. I was always hesitant to try out Subversion because it requires setting up an Apache server, something I had horrible experiences with in the past under Windows. But I have discovered uberSVN by WANdisco, a free and preconfigured Subversion server system: all you have to do is install it like a normal Windows programme. The whole process of setting up an SVN server should now take less than 20 minutes (if you are slow).

If you are unsure what Subversion actually is and does, I recommend a look into the LaTeX Wiki. In a nutshell, it allows you to keep track of, compare, revise and merge different versions of the same file. It is absolutely crucial if you work on the same project with several authors, but even when you are working alone it is extremely useful. Imagine your supervisor tells you to delete one paragraph. You promptly do so, but the next day (s)he wants it back. You have probably saved the updated document under a new filename, or haven’t, but this system of version control gets cumbersome after a while, even for small projects. This is where Subversion jumps in. Every time you make a (major) change, you ‘check in’ the revised version and SVN stores it as a new revision. If you want to go back, you simply ‘check out’ the older revision.

I will post more about the workflow in one of the next posts, but now let’s have a little ‘Installing Subversion for Dummies’ tutorial. One last piece of advice: Subversion is not a backup tool. If you lose the repository (the database where your files are stored), you lose everything except your local files. For backup purposes you could use something like Crashplan or Dropbox (that’s what I do).


  • Download uberSVN. That is the Subversion server. While you are at it, make sure to download the Subversion 1.7.x Client because we will be using the 1.7 version of Subversion which cannot be accessed with older clients.
  • Install uberSVN. I am not going to guide you through the installation process because it is so simple. Make a note of the URL of the uberSVN control panel. If you cannot access the server, make sure to restart the browser or your computer. After that, install the Subversion 1.7.x client.
  • Now open the control panel in your browser, set up a password etc. Once you are asked where to store the SVN files, choose something like C:\SVN. This folder contains the repository, so it should be nowhere near your locally stored project files.
  • Switch to Subversion 1.7 under Administration > SVN Switch.
  • Now install TortoiseSVN (the link is provided on the dashboard, or download it here). That’s the interface that allows you to manage your files, rather than doing it with a command-line tool. A restart might be required to see the overlays that indicate the file status in Windows Explorer.

Setting up your Data

Now we are going to set up the repository, i.e. the database to store your data. If you have existing data that you want to check in, e.g. a project that is a work in progress, make sure to follow these steps closely. Most importantly: make a backup of your project files!

  • Set up the repository in uberSVN > Repositories > Add.
  • Give your repository a snappy name, e.g. ‘research’, and make a note of the URL of the new repository.
  • Don’t change anything on the import or permission screens, just click save. The server will then restart and the repository is set up.
  • Switch to Windows Explorer and go to the folder where your original data is stored (i.e. your research/project folder). Right-click on it, select TortoiseSVN > Import and enter the URL of your repository. This copies the whole folder into the repository.
  • If you want to make sure that everything is copied click on ‘include ignored files’, as by default Tortoise ignores certain (often useless) file-types. After importing you can delete the folder (again, make a backup).
  • This whole process is necessary so that SVN knows which folders to watch. Therefore go to the folder where you want your project files now to be stored. Right-click on empty space in the Explorer and select SVN Checkout. This copies the content of the repository, i.e. what you just imported, to the new location on your hard-drive.

That’s it! Now Subversion is set up and your project files are being watched. Note that you always have to manually check in any changes you made to the files, otherwise they are not updated in the repository. If you have changed a file or folder, this is indicated by a red exclamation mark overlay in the Explorer. To update the repository, just right-click on the item and select SVN Commit.

A version control system is only as good as its weakest link: the user. So I will be posting more about the workflow with LaTeX and Stata soon.