Stata and LaTeX ‘Longtables’ – A solution is coming

Due to popular demand I am currently working on a way to implement the Stata/LaTeX workflow with long tables, i.e. tables spanning over multiple pages. It looks like almost every other question I get is about long tables, so I am finally giving in…

Good news: it works, in principle.

Bad news: Things like table notes and a “one-size-fits-all” code as before are not so easy to implement, hence I need some time to think about use cases. In any case, long tables will definitely require more user interaction than normal tables.

I will also start working on unified documentation that provides the code and examples for everything that I have covered so far, including issues that came up in the comments.

Have a good Christmas time, everyone!

 

 

Automated Table generation in Stata and integration into LaTeX (1)

I use estout to generate tables of summary statistics and regression results that can be easily imported into LaTeX. The advantage is that the whole system is dynamic. If you change something in your do-file (e.g. you omit a particular group), you don’t have to change anything: the results get automatically updated in LaTeX. That has saved me a lot of time, but setting everything up took a long long time. So hopefully this post is helpful for aspiring applied econometricians who want to automate output reporting from Stata into LaTeX.

I think it is easiest to explain everything with examples, for which I use some tables from my current working paper. First install estout in Stata (ssc install estout), then we can jump right into the examples. I explain three things: creating tables for 1. descriptive statistics and 2. regression output and 3. putting everything into LaTeX. Ready? Let’s start.

Edit: Have a look at my follow-up post if you encounter any problems!
Edit 2: Another follow-up to improve the design.

1. Descriptive Statistics

The principle of estout is simple: you run a command in Stata that generates some statistics, you tell estout to (temporarily) store those results and then you create a table.
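
To make the three steps concrete, here is a minimal sketch. It assumes Stata's built-in auto dataset and an arbitrary storage name S, neither of which comes from the paper; it only illustrates the run, store, tabulate sequence.

```stata
* Minimal sketch using Stata's built-in auto dataset (not the paper's data)
sysuse auto, clear

* 1. Run a command that generates statistics and post them to e()
estpost summarize price mpg weight

* 2. Store the posted results under a name (S is an arbitrary choice)
est store S

* 3. Build a table from the stored results
esttab S, cells("mean(fmt(2)) sd(fmt(2))") label nonumber noobs
```

Without a using clause esttab prints the table to the Results window, which is handy for checking the content before exporting it to a .tex file.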

Consider Table 2, which is simply a bunch of summarised variables, split into three categories. You cannot see it, but these are actually three tables appended together. Why? Because the first part (Age to Housing) consists of percentages and therefore has 2 decimal places, the second part (Household Finances) consists of income variables with 0 decimal places, and the last part has 2 decimal places again. The complete code that generates this table is then:

* TABLE 2 

estpost su $dem if coholder100==1
est store A

estpost su $dem if coholder500==1
est store B

estpost su $dem if coholder1500==1
est store C

	esttab A B C using table2.tex, replace ///
		refcat(age18 "\emph{Age}" male "\emph{Demographics}" educationage "\emph{Education}" employeddummy "\emph{Employment}" oowner "\emph{Housing}", nolabel) ///
		mtitle("> \pounds100" "> \pounds500" "> \pounds1500") ///
		cells(mean(fmt(2))) label booktabs nonum collabels(none) gaps f noobs

estpost su $fin if coholder100==1
est store A

estpost su $fin if coholder500==1
est store B

estpost su $fin if coholder1500==1
est store C

	esttab A B C using table2.tex, append ///
		refcat(hhincome "\emph{Household Finances}", nolabel) ///
		nomtitles ///
		cells(mean(fmt(0))) label booktabs nonum f collabels(none) gaps noobs plain

estpost su $risk if coholder100==1
est store A

estpost su $risk if coholder500==1
est store B

estpost su $risk if coholder1500==1
est store C

	esttab A B C using table2.tex, append ///
		nomtitles ///
		refcat(redundant "\emph{Income and Expenditure Risk}" literacyscore "\emph{Behavioural Characteristics}", nolabel) ///
		stats(N, fmt(%18.0g) labels("\midrule Observations")) ///
		cells(mean(fmt(2))) label booktabs nonum f collabels(none) gaps plain

A brief explanation (for a full list of options see the estout/esttab manual):

  • refcat includes a heading for a group of variables
  • mtitle specifies the column headings
  • cells specifies the cell content (in the first part “mean” with 2 decimal places)
  • f creates a fragment of the table, i.e. only the table content is exported to the .tex (see below for more information)

2. Regression Results

Reporting regression results is not as simple, but we are jumping right into a fairly complicated example (Table 4).

As you can see, we are reporting coefficients, standard errors and marginal effects, hence each specification has two columns and two rows. At the bottom we add a few additional statistics (estout can add any scalar that is stored in e()).

This table is generated by the following code:

* TABLE 4

quietly probit coholder $dem $fin $bev $risk
	predict pr2_coholder, pr
	quietly su pr2_coholder
	estadd scalar pr = r(mean)
	estadd margins, dydx(*) atmeans

est store A

quietly probit coholder500 $dem $fin $bev $risk
	predict pr2_coholder500, pr
	quietly su pr2_coholder500
	estadd scalar pr = r(mean)
	estadd margins, dydx(*) atmeans

est store B

quietly probit coholder1500 $dem $fin $bev $risk
	predict pr2_coholder1500, pr
	quietly su pr2_coholder1500
	estadd scalar pr = r(mean)
	estadd margins, dydx(*) atmeans

est store C

esttab A B C using table4.tex, replace f ///
	label booktabs b(3) p(3) eqlabels(none) alignment(S S) collabels("\multicolumn{1}{c}{$\beta$ / SE}" "\multicolumn{1}{c}{Mfx}") ///
	drop(_cons spouse*  ) ///
	star(* 0.10 ** 0.05 *** 0.01) ///
	cells("b(fmt(3)star) margins_b(star)" "se(fmt(3)par)") ///
	refcat(age18 "\emph{Age}" male "\emph{Demographics}" educationage "\emph{Education}" employeddummy "\emph{Employment}" oowner "\emph{Housing}" hhincome_thou "\emph{Household Finances}" redundant "\emph{Income and Expenditure Risk}" literacyscore "\emph{Behavioural Characteristics}", nolabel) ///
	stats(N r2_p chi2 p pr, fmt(0 3) layout("\multicolumn{1}{c}{@}" "\multicolumn{1}{S}{@}") labels(`"Observations"' `"Pseudo \(R^{2}\)"' `"LR chi2"' `"Prob > chi2"' `"Baseline predicted probability"'))

What do we do here? First we quietly run a probit model, then we generate the baseline predicted probability and store this using estadd. Finally, we calculate marginal effects and store them again using estadd.

As you can see, there are many commands that generate that table. Step-by-step:

  • b(3) and p(3): 3 decimal places for coefficients and p-values (the standard errors are formatted separately in cells)
  • alignment(S S): alignment of the decimal places. As we have two data columns per specification (one for ß/SE and one for Mfx), we need to specify the alignment for each column. Here we use the siunitx package (see below) to align the results at the decimal point. The alternative would be alignment(c c) for centered data entries.
  • collabels: labels for the data columns. As we want those centered we need the multicolumn option
  • drop: drop some results from the table
  • star: specify how you want to report significance levels
  • cells: specify the content for each cell
  • stats: adds statistics below the results. We add N (observations), r2_p (pseudo R2), chi2 (LR chi2) p (prob > chi2) and pr (the baseline predicted probability, created before). Furthermore: 
    • fmt specifies the number of decimal places (here: N has 0, all the following 3)
    • layout: specify alignment. We want N to be centered and all the following to be decimal aligned.
    • labels: create some nice names

Refinements

If you are grouping variables with the refcat option, you may want to indent the variables to create a nicer design as in my example. The following Stata command creates a 0.1cm indent for all variable labels. If you want some labels not to have this indent, make sure to label (or relabel) these variables after you have run this command.

foreach v of varlist * {
	label variable `v' `"\hspace{0.1cm} `: variable label `v''"'
	}
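
As a small illustration of the relabelling step, the sketch below applies the loop to Stata's built-in auto dataset (an assumption for demonstration only) and then removes the indent from a single variable.

```stata
* Minimal sketch on the built-in auto dataset (not the paper's data)
sysuse auto, clear

* Indent every variable label by 0.1cm (same loop as above)
foreach v of varlist * {
	label variable `v' `"\hspace{0.1cm} `: variable label `v''"'
}

* Relabel afterwards to drop the indent for one variable
label variable price "Price"

* Inspect the result: price has a plain label, mpg keeps the \hspace prefix
describe price mpg
```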

If you have some long column labels you might have to insert a manual line break to prevent the column from becoming too wide. The usual command in LaTeX for this is \\, but that does not work in table columns. We create a special LaTeX command (see below) that takes care of that issue. If you need to insert a line break in your table, wrap the text in a \specialcell field. Then you can use \\ as usual:

mtitle("\specialcell{Co-Holding\\> \pounds100}" "\specialcell{Co-Holding\\> \pounds500}" "\specialcell{Co-Holding\\> \pounds1500}") ///

3. Tables into LaTeX

Now begins the LaTeX hacking part. By default, estout generates a complete table and has the ability to include table titles above and notes below the table. But we are using the fragment option to generate the pure table content only. Why? Because it allows for much more flexibility, plus adding notes below the table with estout almost certainly breaks the width of the first column.

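In order for this to work you need to add the following to your LaTeX preamble (i.e. before \begin{document}). The macros rely on the booktabs package (for \toprule, \bottomrule and \addlinespace) and the caption package (for \captionsetup and \caption*), so make sure both are loaded as well: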

% *****************************************************************
% Estout related things
% *****************************************************************
\newcommand{\sym}[1]{\rlap{#1}}% Thanks to David Carlisle

\let\estinput=\input% define a new input command so that we can still flatten the document

\newcommand{\estwide}[3]{
		\vspace{.75ex}{
			\begin{tabular*}
			{\textwidth}{@{\hskip\tabcolsep\extracolsep\fill}l*{#2}{#3}}
			\toprule
			\estinput{#1}
			\bottomrule
			\addlinespace[.75ex]
			\end{tabular*}
			}
		}	

\newcommand{\estauto}[3]{
		\vspace{.75ex}{
			\begin{tabular}{l*{#2}{#3}}
			\toprule
			\estinput{#1}
			\bottomrule
			\addlinespace[.75ex]
			\end{tabular}
			}
		}

% Allow line breaks with \\ in specialcells
	\newcommand{\specialcell}[2][c]{%
	\begin{tabular}[#1]{@{}c@{}}#2\end{tabular}}

% *****************************************************************
% Custom subcaptions
% *****************************************************************
% Note/Source/Text after Tables
\newcommand{\figtext}[1]{
	\vspace{-1.9ex}
	\captionsetup{justification=justified,font=footnotesize}
	\caption*{\hspace{6pt}\hangindent=1.5em #1}
	}
\newcommand{\fignote}[1]{\figtext{\emph{Note:~}~#1}}

\newcommand{\figsource}[1]{\figtext{\emph{Source:~}~#1}}

% Add significance note with \starnote
\newcommand{\starnote}{\figtext{* p < 0.1, ** p < 0.05, *** p < 0.01. Standard errors in parentheses.}}

% *****************************************************************
% siunitx
% *****************************************************************
\usepackage{siunitx} % centering in tables
	\sisetup{
		detect-mode,
		tight-spacing		= true,
		group-digits		= false ,
		input-signs		= ,
		input-symbols		= ( ) [ ] - + *,
		input-open-uncertainty	= ,
		input-close-uncertainty	= ,
		table-align-text-post	= false
        }

These commands do the following:

  • Create two wrappers for estout generated tables. \estwide uses tabular* and fills the table to the width of the text, \estauto uses tabular and uses the “standard” table width (i.e. width adjusted to your content).
    • You need to specify three arguments immediately after the command (in the curly brackets): \estwide{the .tex of the table}{the number of data columns}{alignment}. Have a look at my comment below to clarify the syntax further.
    • In the second curly bracket enter the number of data columns, excluding the label column. In the example of Table 2 we have 3 data columns, in Table 4 we have 3 data columns as well, as the subcolumns “ß/SE” and “Mfx” are considered to be one column.
    • For alignment use the S option for decimal alignment using siunitx or c to centre the data. (Hint: decimal alignment looks much, much better. Just look at any journal.)
  • Add the \specialcell command for manual line break (see above).
  • Add commands for subcaptions to include simple notes below the table using the caption package.
    • \figtext adds some basic text
    • \fignote adds text with “Note: ” before.
    • \figsource adds text with “Source: ” before.
    • \starnote adds information about significance levels

You can then include the tables into your latex document as follows:

% Table 2
\begin{table}
\caption{Sample Characteristics by the Amount of Co-Holding (\pounds)}
\estwide{table2.tex}{3}{c}
\label{table2}
\end{table}

% Table 4
\begin{table}
\caption{Probit Model for Characteristics of Co-Holders with Income Risk}
\estauto{table4}{3}{S[table-format=4.4]S[table-format=4.4]}
\starnote
\fignote{Omitted groups: \emph{Employment:} Student/Housewife/Disabled. \emph{Housing:} Private renter/Social renter. Further controls for spouse employment status.}
\label{table4}
\end{table}

That’s it. Not so difficult after all! I should add that there is a typographical issue if you are using different text and math fonts, as the brackets and asterisks are set in math font, but the digits in text font. A workaround is provided here, thanks to David Carlisle.

Edit: Have a look at my comment below regarding the syntax of the \estauto \estwide commands. The number of data columns (the second entry) and the alignment (third entry) might be a bit confusing at first, I hope that comment clarifies things a bit.

Version Control with Subversion for LaTeX & Stata (and more…)

Before starting a large project (such as a PhD) it is crucial to have some form of version control system for the text and files you are going to produce. I am starting a small series of blog posts about how I (hope to) achieve this for LaTeX and Stata .do files using the free version control system Subversion (SVN).

One of the main advantages of Microsoft Word is its brilliant system to track changes and make comments, a thing that LaTeX lacks, so it is always a bit difficult to collaborate and be aware of changes. I have only recently switched to LaTeX because of its ability to integrate result tables from Stata automatically (in my opinion the only big advantage of LaTeX. You can do amazing things in Word as well, but more about that in some future posts), so I started looking into version controlling for LaTeX projects. I was always hesitant to try out Subversion because it requires setting up an Apache server, something I had horrible experiences with in the past under Windows, but I have discovered uberSVN by WANdisco, a free and preconfigured Subversion server system – all you have to do is install it like a normal Windows programme. The whole process of setting up an SVN should now take less than 20 minutes (if you are slow).

If you are unsure what Subversion actually is and does, I recommend a look into the LaTeX Wiki. In a nutshell, it allows you to keep track of, compare, revise and merge different versions of the same file. It is absolutely crucial if you work on the same project with several authors, but even when you are working alone it is extremely useful. Imagine your supervisor tells you to delete one paragraph. You promptly do so, but the next day (s)he wants it back. You have probably saved the updated document under a new filename, or haven’t, but this system of version control gets cumbersome after a while, even for small projects. This is where Subversion jumps in. Every time you make a (major) change, you ‘check in’ the revised version and SVN stores it under a new revision. If you want to go back, you simply ‘check out’ the older revision.

I will post more about the workflow in one of the next posts, but now let’s have a little ‘Installing Subversion for Dummies’ tutorial. One last piece of advice: Subversion is not a backup tool. If you lose the repository (the database where your files are stored), you lose everything except your local files. For backup purposes you could use something like Crashplan or Dropbox (that’s what I do).

Installation

  • Download uberSVN. That is the Subversion server. While you are at it, make sure to download the Subversion 1.7.x Client because we will be using the 1.7 version of Subversion which cannot be accessed with older clients.
  • Install uberSVN. I am not going to guide you through the installation process because it is so simple. Make a note of the URL (the default should be 127.0.0.1:9890/ubersvn). If you cannot access the server make sure to restart the browser or your computer. After that install the Subversion 1.7.x Client.
  • Now open the control panel in your browser, set up a password etc. Once you are asked where to store the SVN files, choose something like C:\SVN. This folder contains the repository, so it should be nowhere near your locally stored project files.
  • Switch to Subversion 1.7 under Administration > SVN Switch.
  • Now install TortoiseSVN (the link is provided on the dashboard, or download it here). That’s the interface that allows you to manage your files, rather than doing it with a command-line tool. A restart might be required to see the overlays that indicate the file status in Windows Explorer.

Setting up your Data

Now we are going to set up the repository, i.e. the database to store your data. If you have existing data that you want to check in, i.e. a project that is a work in progress, make sure to follow these steps closely. Most importantly: make a backup of your project files!

  • Set up the repository in uberSVN > Repositories > Add.
  • Give your repository a snappy name, e.g. ‘research’. The URL of the repository will then be, for instance, http://127.0.0.1:9800/research
  • Don’t change anything on the import or permission screens, just click save. The server will then restart and the repository is set up.
  • Switch to Windows Explorer and go to the folder where your original data is stored (i.e. your research/project folder). Right-click on it and select TortoiseSVN > Import. This copies the whole folder into the repository. Enter the URL of your repository (e.g. http://127.0.0.1:9800/research).
  • If you want to make sure that everything is copied click on ‘include ignored files’, as by default Tortoise ignores certain (often useless) file-types. After importing you can delete the folder (again, make a backup).
  • This whole process is necessary so that SVN knows which folders to watch. Therefore go to the folder where you want your project files now to be stored. Right-click on empty space in the Explorer and select SVN Checkout. This copies the content of the repository, i.e. what you just imported, to the new location on your hard-drive.

That’s it! Now Subversion is set up and your project files are being watched. Note that you always have to manually check in any changes you made to the files, otherwise they are not updated in the repository. If you have changed a file or folder, this is indicated by a red exclamation mark overlay in the Explorer. To update the repository, just right-click on the item and select SVN Commit.

A version control system is only as good as its weakest link: the user. So I will be posting more about the workflow with LaTeX and Stata soon.

New Paper: ‘Self-Control, Financial Literacy and the Co-Holding Puzzle’

J. Gathergood and I have released a new paper titled ‘Self-Control, Financial Literacy and the Co-Holding Puzzle’. It is concerned with the empirical phenomenon that a substantial proportion of the population hold liquid savings and unsecured credit simultaneously – and not small sums, but in the region of £6,000 or more. It is available on SSRN and the abstract tells you more:

We use UK household survey data incorporating measures of financial literacy and behavioural characteristics to analyse the puzzling co-existence of high cost revolving consumer credit alongside low yield liquid savings in household balance sheets, which we term the ‘co-holding puzzle’. Approximately 20% of households in our sample co-hold, on average, £6,500 of revolving consumer credit alongside £8,000 of liquid savings. Co-holders are typically more financially literate, with above average income and education. However, we show co-holding is also associated with impulsive spending behaviour on the part of the household. Our results lend empirical support to theoretical models in which sophisticated households co-hold as a means of managing a self-control problem.

So why do these households not simply pay off their credit and save a substantial amount of money? The intuitive answer is that households want to keep cash as a measure of last resort, for example for (unexpected) expenditure that cannot be paid for using credit. This paper by Telyukova mentions medical and housing expenses that cannot be paid for using a credit card. But is it reasonable to assume that households expect to pay such large sums in the future? Also, Telyukova’s argument focuses on credit card co-holding, but in the UK consumers hold complex credit portfolios, including payday loans, car loans, overdrafts etc. These forms of credit can be used to pay for almost all expenditure, and are readily available.

We focus on another explanation: we find that households that are more likely to co-hold are typically more sophisticated, i.e. have larger incomes and higher financial literacy, but score high on a self-reported measure of impulsiveness. The intuition is that they anticipate impulsive spending, but – because they are sophisticated – deliberately hold outstanding consumer credit balances so as to limit the spending possibilities.