LaTeX and Stata integration (3): Improving the Design

If you followed my previous posts on automated Stata and LaTeX integration, you probably already have a good idea of how estout works and how a table can be printed in LaTeX in an aesthetically pleasing way. This post is about improving the print quality in LaTeX even further. If you are new to LaTeX and Stata integration, please read my introductory post and my follow-up post, which solves some problems.

The Problem

You might have encountered two problems when following the instructions in my previous posts:

  1. LaTeX produces hundreds of overfull hbox warnings if you decimal-align the results.
  2. If you use different math and text fonts, symbols in the table are set in the math font, which can look ugly.

Both are relatively serious problems. The first occurs because siunitx, the package we use to decimal-align the values, requires precise information about how much space each column needs before and after the decimal marker. Consider the table on the right: there are at most 2 characters to the left of the decimal marker (the minus sign and one integer digit) and at most 5 characters to the right of it (two decimal digits and three stars). Hence, to avoid overfull hbox warnings we need to tell siunitx S[table-format=-1.5]. I don't like this, because it prevents really compact tables, and the column heading may not be perfectly centered relative to the digits.

The second problem occurs because siunitx sets the characters specified as input-symbols (i.e. characters that do not take part in the decimal alignment) in math mode. If you use a complete font family such as Latin Modern you won't notice this, because the text and math fonts look the same. But if you use distinct fonts, the result looks like the example above: the minus sign and brackets are set in Euler, a very pretty math font, while the rest is set in Linux Libertine, my favourite free LaTeX font. Obviously, this result is not optimal.
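To see the font mismatch in isolation, you can compile the small document below. It is a sketch that uses only the two font packages loaded later in this post, and the numbers are arbitrary. In the first line the brackets, the minus and (because of the euler-digits option) the digits come from the math font; in the second line everything comes from the text font, which is the effect the \textendash substitution further down aims for.

\documentclass{article}
\usepackage{libertine}              % Linux Libertine for text
\usepackage[euler-digits]{eulervm}  % Euler for math, including math digits
\begin{document}
Math mode: $(-0.52)$            % brackets, minus and digits set in Euler

Text mode: (\textendash0.52)    % brackets, en dash and digits set in Libertine
\end{document}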

The Solution

The easiest way to see how the solution works is to download the Sample Document and the Sample Table used to generate the example. I load the following fonts:

\usepackage{libertine}% Linux Libertine, my favourite text font
\usepackage[euler-digits]{eulervm}% A pretty math font

In the next lines we define the well-known \sym command and load and customise siunitx, which is now simpler than in my previous posts.

% *****************************************************************
% siunitx
% *****************************************************************
\newcommand{\sym}[1]{\rlap{#1}}% Thanks David Carlisle

\usepackage{siunitx}
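	\sisetup{
		detect-mode,                            % detect the surrounding mode (text or math)
		group-digits            = false,        % no thousands separators
		input-symbols           = ( ) [ ] - +,  % passed through, not parsed as part of the number
		table-align-text-post   = false,        % do not align text after the number (the stars)
		input-signs             = ,             % no sign characters: - and + are handled as symbols
	}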
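The \sym command above works because \rlap (and its counterpart \llap) typesets its argument in a box of zero width, so the material overhangs to the right (or left) without taking up any space. A minimal stand-alone check, using \fbox only to make the occupied width visible, might look like this; the number is arbitrary.

\documentclass{article}
\begin{document}
% \rlap{...} and \llap{...} produce zero-width boxes: the frame drawn by
% \fbox only encloses the material that actually takes up space.
\fbox{0.52***}              % frame encloses the number and the stars

\fbox{0.52\rlap{***}}       % frame hugs 0.52; the stars overhang on the right

\fbox{\llap{(}0.52\rlap{)}} % frame hugs 0.52; the brackets overhang on both sides
\end{document}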

Now comes the hacky bit: as I mentioned before, I don't like reserving space for all characters in siunitx. What I want is to tell siunitx only how many integer and decimal digits there are; I don't care about minuses, brackets or stars. Apart from being simpler, this lets us create really compact tables and makes sure that the column title is centered relative to the numbers. If you reserve space for all characters, the title might look offset, because it is centered relative to all characters, while “the eye” uses the number itself as the alignment point.

The solution to the first problem, avoiding overfull hbox warnings, is to tell LaTeX not to reserve any space for minuses, brackets and stars. The \llap{...} and \rlap{...} commands do this; we just need to tell LaTeX to wrap all of these symbols in the table in those commands. The solution to the second problem is to substitute the characters that are set in math mode (in my example, minuses and brackets) with their text-mode equivalents. This is done in the following lines:

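% Character substitution that prints brackets and the minus symbol in text mode and does not reserve any space. Thanks to David Carlisle.
% \yyy: makes "-" math-active; in math mode it is replaced by an en dash
% set in text mode that overhangs to the left and takes up no width
\def\yyy{%
  \bgroup\uccode`\~\expandafter`\string-%
  \uppercase{\egroup\edef~{\noexpand\text{\llap{\textendash}\relax}}}%
  \mathcode\expandafter`\string-"8000 }

% \xxxl<char>: makes <char> math-active; it is printed in text mode,
% overhanging to the left without taking up any width
\def\xxxl#1{%
  \bgroup\uccode`\~\expandafter`\string#1%
  \uppercase{\egroup\edef~{\noexpand\text{\noexpand\llap{\string#1}}}}%
  \mathcode\expandafter`\string#1"8000 }

% \xxxr<char>: the same, but the character overhangs to the right
\def\xxxr#1{%
  \bgroup\uccode`\~\expandafter`\string#1%
  \uppercase{\egroup\edef~{\noexpand\text{\noexpand\rlap{\string#1}}}}%
  \mathcode\expandafter`\string#1"8000 }

% Opening brackets overhang to the left, closing brackets to the right,
% and the minus sign is replaced as defined in \yyy
\def\textsymbols{\xxxl[\xxxr]\xxxl(\xxxr)\yyy}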

Here we create a new command, \textsymbols, that applies all of the substitutions discussed above. To make sure that our tables are printed correctly, we have to adapt the \estwide and \estauto commands accordingly. Compared with my previous posts, all that has changed is the added \textsymbols command.

\newcommand{\estwide}[3]{
	\vspace{.75ex}{
		\textsymbols% Note the added command here
		\begin{tabular*}
		{\textwidth}{@{\hskip\tabcolsep\extracolsep\fill}l*{#2}{#3}}
		\toprule
		\estinput{#1}
		\bottomrule
		\addlinespace[.75ex]
		\end{tabular*}
		}
	}	

\newcommand{\estauto}[3]{
	\vspace{.75ex}{
		\textsymbols% Note the added command here
		\begin{tabular}{l*{#2}{#3}}
		\toprule
		\estinput{#1}
		\bottomrule
		\addlinespace[.75ex]
		\end{tabular}
		}
	}
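The mechanism is not tied to estout. If you type a small table by hand, you can call \textsymbols yourself inside a group, just as \estauto does; the closing brace ends the math-code changes, so the rest of the document is unaffected. The fragment below is a sketch that assumes the preamble from this post (booktabs, siunitx with the settings above, \sym and \textsymbols); the row label and numbers are invented.

{% group: the changes made by \textsymbols end with the closing brace
  \textsymbols% apply the substitutions for this table only
  \begin{tabular}{l*{2}{S[table-format=1.2]}}
    \toprule
                & {(1)}           & {(2)}          \\
    \midrule
    Age         & -0.52\sym{***}  & 0.13           \\
                & (0.04)          & (0.08)         \\
    \bottomrule
  \end{tabular}%
}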

Finally, the command to print the table in LaTeX:

\begin{table}\centering
  \begin{threeparttable}
    \caption{Table with Better Notes and Better Symbols}
    \estauto{table}{3}{S[table-format=1.2,table-column-width=20mm]}
    \Figtext{Some basic text about the table.}
    \Fignote{With `threeparttable' even long notes don't get wider than the table. The result is much more typographically pleasing.}
    \Figsource{We got the data from here.}
    \Starnote
  \end{threeparttable}
\end{table}

Note the specification of the S column, S[table-format=1.2,table-column-width=20mm]: we specify only the maximum number of integer and decimal digits, and we fix the column width at 20mm.

The result is a pretty table without any LaTeX warnings, in which the minus sign and brackets are set in the correct font.
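If you do not have the Sample Table at hand, the file read by \estinput{table} could contain rows roughly like the ones below. This is an illustration only: the variable names and numbers are invented, the exact spacing estout writes will differ, and it is assumed (as the \estauto definition suggests) that the file holds only the body of the table, since \estauto adds \toprule and \bottomrule itself. Every value has at most one integer digit and two decimal digits, which is all that table-format=1.2 has to accommodate; the minus signs, brackets and stars take up no space, and the observation counts are centered with \multicolumn so that siunitx does not parse them.

% Illustrative body of table.tex (invented values); \estauto adds the rules
                    &\multicolumn{1}{c}{(1)}&\multicolumn{1}{c}{(2)}&\multicolumn{1}{c}{(3)}\\
\midrule
Age                 &       -0.52\sym{***}&       -0.48\sym{***}&       -0.45\sym{**} \\
                    &      (0.04)         &      (0.05)         &      (0.18)         \\
Female              &        0.13         &        0.11\sym{*}  &        0.09         \\
                    &      (0.08)         &      (0.06)         &      (0.07)         \\
\midrule
Observations        &\multicolumn{1}{c}{1200}&\multicolumn{1}{c}{1200}&\multicolumn{1}{c}{1187}\\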

In the next post we go a bit further. If you are a fan of typography, you might use old-style figures in your text; however, they are not recommended in tables. Using OpenType fonts with XeLaTeX or LuaLaTeX, we can print numbers in the running text as old-style figures and numbers in tables as lining figures.
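As a rough preview, a minimal sketch with fontspec might look like the following. It assumes compilation with XeLaTeX or LuaLaTeX and an installed OpenType version of Linux Libertine (named “Linux Libertine O” here; the name on your system may differ), and it is not necessarily the approach the next post takes. Old-style figures are set for the whole document, and lining figures are switched on locally, inside a group, for the table.

% Compile with XeLaTeX or LuaLaTeX; assumes "Linux Libertine O" is installed
\documentclass{article}
\usepackage{fontspec}
\setmainfont[Numbers=OldStyle]{Linux Libertine O} % old-style figures by default
\begin{document}
In 1984 the sample contained 1200 respondents.    % old-style figures in the running text

{\addfontfeatures{Numbers=Lining}% lining figures, local to this group
\begin{tabular}{lr}
  Observations & 1200 \\
  Mean age     & 38.5 \\
\end{tabular}}
\end{document}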

Issues with Amsmath

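If you load the amsmath package, the above solution will not work: amsmath's internal \resetMathstrut@ reads the math code of the opening parenthesis, which \textsymbols changes. You need to add the following lines after loading the package; they store the original math code in the preamble and use it instead (thanks to David Carlisle for this and many other problems that he helped to solve!):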

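\makeatletter
% Store the math code that "(" has at this point in the preamble, i.e.
% before \textsymbols makes "(" math-active inside a table
\edef\originalbmathcode{%
    \noexpand\mathchardef\noexpand\@tempa\the\mathcode`\(\relax}
% amsmath's \resetMathstrut@, changed to use the stored math code
% instead of the current one
\def\resetMathstrut@{%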
  \setbox\z@\hbox{%
    \originalbmathcode
    \def\@tempb##1"##2##3{\the\textfont"##3\char"}%
    \expandafter\@tempb\meaning\@tempa \relax
  }%
  \ht\Mathstrutbox@\ht\z@ \dp\Mathstrutbox@\dp\z@
}
\makeatother


27 thoughts on “LaTeX and Stata integration (3): Improving the Design”

  1. I tried this very pretty solution, and everything seems to work… except that when I get a negative t-stat (not a standard error! It is displayed between parentheses by Stata), the minus sign crosses the left parenthesis… how can we avoid that? Many thanks!

      • Yes, the whole new specification that you propose here… Could you reproduce the same problem as I do? If I am right, you just need to tell esttab to display parameter values and t-stats instead of parameter values and standard errors as in your example ;-) Many thanks!

        • I think I’ve got a solution, but I don’t know how stable it is yet. For now, you could just delete \yyy from the “textsymbols” definition. You will get “overfull hbox” warnings, but “(-” won’t be overlaid.

  2. Hi! I hope everything is fine! I have another question, if you have some time. When I use the package \usepackage{pdflscape} in order to display a wide table in landscape mode, I have a problem using your Latex tricks : the horizontal lines of the table do not reach the right end of the table… they end before. Do you know how to fix that ? Many thanks!

    • Hi, I’m afraid I don’t have an answer to your problem. If you send me a fully compilable minimum working example via email I might be able to look at it.

      • Use estauto and they’ll stretch as far as you need them. If you use estwide, it’ll set it to the width of a landscape page.

  3. Hi! When I try to insert a table (of 7 columns) in latex, Latex gives me the error:

    ! Undefined control sequence.
    \estwide …tracolsep \fill }l*{#2}{#3}} \toprule
    \estinput {#1} \bottomrule…
    l.95 \estwide{table2.tex}{7}{c}

    Does anyone know what’s wrong?
    Thanks

    • The error says that a command you are using is not defined. My guess is that you are not loading “booktabs” or have not defined the “\estwide” command properly.

  4. These posts have been very helpful for me to transfer my Stata results into LaTeX, thank you so much for sharing.

    Like others, I am having trouble with long tables for my regression results. I hope that in time you will come across a solution that fits with the work you have done so far.

    I have come across a couple of other issues where my limited LaTeX skills have hit a barrier.

    1. By using the ‘siunitx’ package (which generally works quite well) I have found that some of my summary stats get aligned strangely. For example if I have the number of observations (integer) and R2 values (4 decimal) where previously these were centred in the column. Is it possible to disable the digit alignment for specific cells?

    2. One of my columns contains variable descriptions which I would prefer to wrap while not upsetting the positions of the content in other cells in the row, like \multirow. Have you experienced the need to format cells in that way using your methods? I have wondered if I could incorporate this into my ‘estout’ commands in Stata but do not know where to begin.

    I could prepare a MWE if my points are unclear.

    Kind Regards
    Kevin

    • Thanks for your comments.

      @1: Your first problem is discussed in my first post, the example of regression results. “Observations” have no decimal points, hence we need this (and only this cell!) to be centered. The relevant code is:

      layout("\multicolumn{1}{c}{@}" "\multicolumn{1}{S}{@}") labels(`"Observations"' `"Pseudo \(R^{2}\)"' `"LR chi2"' `"Prob > chi2"' `"Baseline predicted probability"'))

      You see, we are telling estout to center “Observations” with \multicolumn{1}{c}{@} and to use siunitx for all following cells with \multicolumn{1}{S}{@}.

      @2: That should be covered in my second post, i.e. including a minipage. I don’t like this approach very much, because text-wrapping in a LaTeX table always requires specifying some width. I want to do all the layout in LaTeX, not in my do-file, hence I did not spend much time on this.

      @Longtables: I was unsuccessful before. Unless I really need this myself I’m afraid I don’t have the time to figure it out at the moment. So a cry for help: Let me know if you find a solution!

      • Thanks Jörg, I really appreciate the quick reply.

        @1: I reviewed your first post again and better understood how you have centred the Observations cell. Unfortunately this still doesn’t quite look right in my table against the other cells which rely on siunitx for alignment. I noticed in your example this is also the case but it is less pronounced than my result. My workaround has been to centre all of the summary stats at the expense of being slightly offset with the rest of the table. Not satisfied with this I also reviewed the package documentation and came across the following code in section 7.5 (p. 68).

        \begin{table}
        \caption{Formatting unrelated numbers}
        \label{tbl:xmpl:unrel}
        \centering
        \begin{tabular}
        {
        S[
        table-format = 5.0,
        parse-numbers = false,
        input-symbols=.,
        input-decimal-markers = x
        ]
        }
        \toprule
        \multicolumn{1}{c}{Header} \\
        \midrule
        120 \\
        12.3 \\
        12340 \\
        12.02 \\
        123 \\
        1 \\
        \bottomrule
        \end{tabular}
        \end{table}

        Unfortunately I have been unable to adapt this, as I do not want to apply it to the whole column. Maybe it is possible to incorporate the tabular parameters into the estout ‘layout’ command from your first post.

        @2: I prefer your approach as well, although I think Bert’s solution will suit my needs. Apologies I was not thorough enough in my reading of your posts before posting my comments.

        @Longtables: I do have a need for this and will hopefully have something to add to your growing tutorial if I have any success.

        Cheers
        Kevin

        • An improvement to @1 which achieves my objectives is to add ‘table-parse-only’ to the alignment element of \estauto:

          \estauto{table}{3}{S[table-parse-only,table-format=1.2,table-column-width=20mm]}

          I have also added to @2 by making the long description cell span two rows from the Stata estout command:

          varlabels( , prefix("\multirow{2}{*}{\begin{minipage}{5cm}{") suffix("}\end{minipage}}")) ///

          This incorporates Bert’s minipage improvement within a multirow environment. I accept this moves some formatting to Stata which would preferably be retained in LaTeX but the results are pretty good in my case.

          Next to attack the longtable issue, although I expect that may be well beyond my skills.

          • I don’t quite understand how table-parse-only can be the solution as it turns off all decimal alignment. That’s definitely not suitable when you have the objective to create a nice table with regression results.

            But perhaps I misunderstood your issue. With my approach the numbers of observations are exactly centered according to the width of the column – the decimal alignment is disabled here, so it can’t be any other way. Is that not what you wanted? Perhaps you can email me a screenshot of how it looks when you compile.

  5. Jörg, this code is incredibly helpful. Thanks a lot!

    A useful modification of the \estwide command is to add an optional argument for adjusting the textwidth of the table:

    \newcommand{\estwide}[4][1]{
    \vspace{-1ex}{
    \textsymbols% Note the added command here
    \begin{tabular*}
    {#1\textwidth}{@{\hskip\tabcolsep\extracolsep\fill}l*{#3}{#4}}
    \toprule
    \estinput{#2}
    \bottomrule
    \addlinespace[.75ex]
    \end{tabular*}
    }
    }

    Edit: I should have added that to use the modified estwide command, for a table as wide as 75% of the text width, you type something like

    \estwide[0.75]{tablename}{4}{c}

    The default of the command is set to 100%, so both
    \estwide{tablename}{4}{c}
    and
    \estwide[1]{tablename}{4}{c}
    produce the same output.

    • Hi Ben, thanks for your comment. Defining the width of the table is a bit against my philosophy of letting LaTeX handle most of the layout issues. The first column with variable names has to be as long as the longest label; if necessary, I change the other column widths by using the siunitx functionality “S[table-column-width=20mm]”.

      • I fail to understand your point that it is not kosher to define the width of the table, since that is exactly what your command \estwide does.

        Your command \estauto allows LaTeX to determine the table width. Your command \estwide explicitly defines the width of the table to be equal to \textwidth.

        The modification allows the width to be set to possibilities other than \textwidth, say if your table needs to be more narrow than \textwidth. The modification does not explicitly set any column width. The column widths remain to be calculated by LaTeX; in particular the first column satisfies the longest label requirement you mentioned above.

        In any case, thanks a lot for your code and documentation, it’s really useful!

        • Ben, yes you are right. I’ve kind of forgotten what my \estwide command actually does, and I guess I didn’t appreciate what you said as it is not a use case I normally have: all ‘normal’ tables that I produce are ‘1\textwidth’ using \estwide. For wider tables I let LaTeX calculate the required width using \estauto, possibly tweaking it using siunitx ‘table-column-width’.

          But your suggestion to set textwidth using an optional command is great – I’ll include this when I revise the code the next time.

  6. Hello Jorg, many thanks for this wonderful tutorial! I have a little problem: I am unable to make it work with Beamer… is there a solution? Many thanks!

    • Yes: you need to remove the ‘threeparttable’ package, don’t load it at all when using ‘beamer’. Then, of course, remove the ‘threeparttable’ environment within the ‘table’ environment.

  7. Hi Jörg,
    Thank you so much for these posts!
    I was just wondering: have you by any chance solved the longtable problem?

    • Hi Paul, unfortunately not. I tried a few things without success. I’m sure it is possible somehow, but as I have no need for longtables at the moment I did not look too deeply into it.

  8. Jorg —

    I have two rows below my table indicating fixed effects and controls. I like it when the row for fixed effects/controls says “no no yes”; however, LaTeX does not like these, probably because it tries to align them by the decimal point, and since these strings are not numbers it does not work.

    I’ve changed my labels from yes to 1 and no to 0 for the time being, but I am very interested in how this might work moving forward.

    Cheers,
    -Derek

    Also: have you checked out longtablex? I have the need for long tables and would love to use your environment.

    • Quick answer: you need to wrap the “Yes” “No” in ‘multicolumn’ so that it is not parsed by siunitx, same as “AIC” and “Observations” in my sample table.

      More complete answer: The “indicate” function of estout is what you are after. Include the following in the esttab code, and adjust the variable names (here: age* and education*) accordingly.

      indicate("Age controls = age*" "Education controls = education*", labels("\multicolumn{1}{c}{Yes}" "\multicolumn{1}{c}{No}" ))

      Re longtables: I still did not investigate this further besides a quick test that did not work.

      • Thanks a bunch — I got it worked out for now. The longtables will have to wait. I’ll keep an eye on your page to see if you or someone else comes up with a workaround.
