How I manage my BibTeX references, and why I prefer not initializing first names

July 4, 2021

This is both a how-to guide, and a discussion of the rationale for some of my preferences.

I use BibTeX for compiling my bibliographic references when writing math papers in LaTeX. BibTeX, when used correctly, eases the burden of bibliographic management. For example, BibTeX automatically excludes uncited references and alphabetizes the entries.

Since a single paper can have a large number of references, it is useful to have a simple system to

- retrieve complete and accurate bibliographic records, and
- maintain consistency.

Here are some of my own processes and conventions, along with my rationale behind some of the choices. This blog post is not a beginner’s guide on BibTeX (here is a good guide). Rather, I focus on my own conventions and choices, specifically tailored for math research papers.

Include author names as they appear in the original publication

It is common practice to shorten first names to initials in references. I do not agree with this practice. In many Asian cultures, a large fraction of the population use a fairly small number of surnames (e.g., Wang, Li, Kim). So first name initialization can lead to ambiguity. Readers are also less likely to distinguish common names and associate them with specific individuals, leading to less recognition and exposure, and this can have an adverse effect especially on early-career academics.

(Aside) As a grad student, every once in a while I was sent some combinatorics paper to referee, but these papers weren’t exactly close to anything I worked on. Looking back, I think perhaps the editors had sometimes intended to ask the other Y. Zhao in combinatorics.

(Another aside) While we are on the topic of names, it’s also worth pointing out that Chinese surnames tend to cluster late in the alphabet. This creates an interesting effect when you scroll to the bottom of a math department directory.

Economists, like mathematicians, also have co-authorship alphabetized by default. A 2006 research paper by economics professors from Stanford and Caltech showed that faculty in top 35 U.S. economics departments “with earlier surname initials are significantly more likely to receive tenure at top ten economics departments, are significantly more likely to become fellows of the Econometric Society, and, to a lesser extent, are more likely to receive the Clark Medal and the Nobel Prize,” even after controlling for “country of origin, ethnicity, religion or departmental fixed effects.” The same effects were not observed in the psychology profession, where authorship is not alphabetized.

To be clear, I am not at all advocating for abolishing the practice of ordering authors alphabetically. Quite the opposite. Ordering authors by contribution invites other conflicts and politics, which seems a lot worse. Ordering “randomly” doesn’t really make sense to me unless one can somehow enforce an independent random permutation of the authors every time the paper is read or mentioned. (End asides)

The practice of initializing first names was perhaps originally due to antiquated needs such as saving ink and space. Many journals still enforce it today in their house style.

(Example) First names are initialized not just in the references, but also sometimes in the author line, even in modern times. Up to the end of the 2006 volume, the highly regarded journal Geometric and Functional Analysis (GAFA) would typically initialize authors’ first names in the byline. There were occasional exceptions where the authors’ names were printed in full. For example, in the 2005 volume, exactly two out of 42 papers had the authors’ full names printed, and these were precisely the two papers having authors with Chinese names. It is likely that these authors requested to have their full names published.

When I first started writing math papers I followed what other papers did. But later on I realized that this practice does not make as much sense as I initially assumed it would. Nowadays, whenever I have control over my references (e.g., my arXiv uploads), I follow this rule:

Include author names as they appear in the original publication

Rationale and caveats:

Why not just always use full names? Not all authors actually want their initializations expanded. Some authors may have chosen a specific stylization of their pen name (e.g., J. K. Rowling), or some specific capitalization. Also, sometimes, it is difficult to look up their full names. It is not my role to guess (just as it would never be proper for me to permute the order of authors in a paper I am citing). Following how they write their own name on their paper is the simplest and most consistent practice.
Some authors may prefer to use different pseudonyms (or pen names) for different publications. I should not be “correcting” them. Some notable examples:
- Noga Alon published several papers under “A. Nilli.” Accompanying the discussion of one such paper in Proofs from THE BOOK is a photo of a young girl with the caption “A. Nilli.”
- Victor Kac, after emigrating from the Soviet Union, had to publish his work with his Soviet advisor Ernest Vinberg under the Italian pseudonyms Gatti and Viniberghi due to Soviet restrictions.
  (I learned this fact from another fascinating economics paper on the effect of the collapse of the Soviet Union on the productivity of American mathematicians.)
Amazingly, the AMS Mathematical Reviews (MathSciNet) actually knows about author pseudonyms to high accuracy, including the examples above. The Math Reviews team follows a painstaking manual process to assign each author to the correct person.
In rare instances, the publisher makes a mistake in printing the authors’ names. For example, someone told me that a journal once erroneously expanded his name: the copy editor mistakenly thought that the author’s first name had been provided in a shortened form, and made the change after the final galley proof was returned to the publisher, so the author had no further opportunity to correct this last-minute error. This error arose from someone deciding how someone else’s name should be written.

When there is a publication error, which name should be used in bibliographic citations? Should it matter whether or not an erratum is published? In this specific case, I would use the correct name as intended by the author, but in general, I might not have access to the extra information. These situations are annoying but rare, and can be decided case by case.

So the simplest and most consistent solution is to just include author names as they appear in the original publication.

It is okay if some entries use full names and some others use first initials.
It is okay if the same person appears in the references multiple times differently.
It is not okay to change how someone’s name is intended to be written.
Initializing first names leads to ambiguity.

I admit that I have not always followed this practice, but lately I have been making an effort to do so at least when I am in charge of managing the references. The BibTeX how-to guide here makes it easy to implement. The entries imported from MathSciNet always have author names as they were originally printed.
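
As a small illustration of the rule, here is a sketch of two entries in which the same person is recorded differently because the two papers printed the name differently. The author, titles, journal, and keys are placeholders of my own choosing, not real papers.

```bibtex
% Hypothetical entries for illustration only.

% The first paper printed the author's full first name.
@article{Smi20,
    AUTHOR  = {Smith, John},
    TITLE   = {I proved a lemma},
    JOURNAL = {Math. J.},
    VOLUME  = {1},
    YEAR    = {2020},
    PAGES   = {1--10},
}

% The second paper printed only an initial, so the entry keeps it.
@article{Smi21,
    AUTHOR  = {Smith, J.},
    TITLE   = {I proved another lemma},
    JOURNAL = {Math. J.},
    VOLUME  = {2},
    YEAR    = {2021},
    PAGES   = {11--20},
}
```

Both entries can sit side by side in the same .bib file; nothing needs to be harmonized.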

BibTeX style file

I made a custom modification of the amsplain style file:

amsplain0.bst (click to download)

Compared to the standard amsplain style file, I made the following modifications:

No longer displaying the MR number in the bibliographic entry;
No longer replacing the authors by a long dash when the authors are identical to the previous entry.

To apply the new style file, I include the .bst file in the same folder as my LaTeX file, and insert the references in LaTeX using:

\bibliographystyle{amsplain0}
\bibliography{bib file name (without the .bib extension)}

Optionally, before uploading the LaTeX source to the arXiv or a journal, I copy and paste the contents of the auxiliary .bbl output file to replace the above two lines. Then I only have to upload a single .tex file rather than both .tex and .bbl files.

Retrieving BibTeX entries from MathSciNet

For the most part, you do not need to enter BibTeX entries by hand. MathSciNet has already compiled all this information.

Accessing MathSciNet requires a subscription, which many universities have. Common ways to access MathSciNet:
- From a device on campus network;
- Using university VPN;
- Using university library’s proxy url;
- Using AMS Remote access to first pair your browser with AMS when you’re on campus or on VPN, so that you can continue accessing MathSciNet and other AMS resources later even when off-campus and off-VPN.
- You can configure your browser to search MathSciNet directly from the search bar. For example, in Chrome, in Settings / Search Engine you can add a new search engine with the following configuration
  - Search engine: MR search by title
  - Keyword: mr
  - Query url: https://mathscinet.ams.org/mathscinet/search/publications.html?pg1=TI&s1=%s&extend=1
  to search MathSciNet by title. For instance, typing mr large graph into the search bar yields a search for articles with both words “large” and “graph” in the title, whereas typing mr "large graph" into the search bar yields a search for papers with the phrase “large graph” in the title. You can also configure other custom searches by modifying the query url.
- A free lite version, MR Lookup, can be used without subscription, though it only displays the top three search results. It also gives the results in BibTeX format.

All articles listed on MathSciNet can be extracted as BibTeX entries, either individually or in bulk (this can be done on the MathSciNet website by selecting BibTeX from the drop down menu).

I like using BibDesk (only on Mac; free and comes with TeXShop), which offers a nice interface for BibTeX management, including automatic import from MathSciNet (from the External–Web tab on the left). BibDesk also has other useful automated tools such as batch renaming entry keys according to authors-year. (Update: BibDesk is not currently compatible with the updated interface of MathSciNet. You can get around this issue by using the older version of MathSciNet.)

I imagine there must be other similar BibTeX management tools with automatic MathSciNet import, but I have not personally used any of them. See this discussion.

Most entries from MathSciNet are accurate and consistently formatted, especially for published journal articles. MathSciNet has a dedicated staff for maintaining the quality of its entries. They follow the official AMS journal title abbreviations (best not to make up these abbreviations yourself). In contrast, other sources (e.g., Google Scholar search) often produce inconsistently formatted and inaccurate bibliographic entries.

Manual additions and edits

Occasionally some entries need to be added or edited by hand. Here are some common scenarios in my experience.

Additions:

Pre-print and pre-published articles (including those appearing first on the publisher website but not yet assigned an issue) do not appear in MathSciNet. I add them manually as follows:
- For arXiv preprints:
```
@misc{Smi-ar,
    AUTHOR = {John Smith},
    TITLE  = {I proved a theorem},
    NOTE   = {arXiv:2999.9999},
}
```
- For papers to appear (use official AMS journal title abbreviations)
```
@article{Smi-mj,
    AUTHOR  = {John Smith},
    TITLE   = {I proved an excellent theorem},
    JOURNAL = {Math. J.},
    NOTE    = {to appear},
}
```
  (It might annoy some people to have an @article entry without a year, but shrug, it compiles, and I haven’t found a more satisfactory solution.)
Really old (like, pre-1940) and some foreign language articles/books are not listed on MathSciNet — I copy the citations from somewhere else and enter them as an @article/@book by hand.
For non-standard entries like websites and blog posts, I add them as @misc similar to arXiv entries, using the note field as a catch-all. (Yes I know that there is specifically a @url entry type, but I find it easier and more predictable to use @misc. In any case, the choice of entry type does not show up in the output.)
Capitalization. By default, BibTeX changes the entire title to lower case except for the very first letter. To keep the capitalization of proper nouns, LaTeX commands, etc., surround those letters/words by {...}. BibTeX entries from MathSciNet usually get this right automatically, but more care is needed when manually adding entries such as those from the arXiv.
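
To make the last two items concrete, here is a sketch of a website entry and of an arXiv entry with protected capitals. Everything in it is a placeholder: the author, titles, URL, and arXiv identifier are hypothetical, and the \url command assumes that the url or hyperref package is loaded in the LaTeX file.

```bibtex
% Hypothetical blog post, recorded as a misc entry with the note
% field used as a catch-all. \url needs the url or hyperref package.
@misc{Smi-blog,
    AUTHOR = {John Smith},
    TITLE  = {Notes on {R}amsey numbers},
    NOTE   = {Blog post, \url{https://example.org/ramsey-notes}},
}

% Hypothetical arXiv preprint. The braces protect the proper noun
% and the math symbol from being changed to lower case.
@misc{Smi-k4,
    AUTHOR = {John Smith},
    TITLE  = {Counting copies of {$K_4$} in {C}ayley graphs},
    NOTE   = {arXiv:2999.9999},
}
```

Without the braces, a style that lowercases titles, such as amsplain, would be expected to print “cayley” and $k_4$ in the second title.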

Modifications:

MathSciNet usually gets it right, but sometimes accented names need some manual editing to compile correctly. A notable example is that when “Erdős” appears in a title, one needs to surround the ő (\H{o}) by {...}, e.g., {E}rd{\H{o}}s, or else BibTeX changes \H to \h, thereby causing a compilation error. Similar issues also arise in the author field, e.g., with the Polish Ł (\L). They can always be fixed by surrounding the problematic snippets by {...}.
Some MathSciNet entries for books have extraneous data that should be deleted. It is usually clear what’s going on after seeing the output PDF.
For some journals with reboots in their publication history, the official AMS journal title abbreviation sometimes includes a “series” number, e.g., Annals of Mathematics is actually Annals of Mathematics. Second Series abbreviated as Ann. of Math. (2), which is consistent with MathSciNet imports. Some people don’t like to keep the (2). I’m okay either way.
Conference proceedings are the worst offenders on consistent formatting. They are not covered by AMS guidance and it is a free-for-all. Basically, I import from MathSciNet, compile and see what the PDF output looks like, and then delete extraneous data. If a conference proceeding article is superseded by a journal article, I use the journal article instead.
MathSciNet BibTeX entries include a bunch of other extraneous information, like the reviewer’s name. I could delete these fields, but they do not show up in the output anyway, so I don’t bother. The only piece that does show up in the amsplain style is the MR number, which does not show up in my modified amsplain0 style.
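
Here is a sketch of what an entry might look like after the edits described above. It is a made-up entry (the author, title, volume, year, and pages are placeholders, not a real paper), shown only to illustrate where the extra braces go.

```bibtex
% Hypothetical entry for illustration; not a real paper.
% 1. The accent in the title is wrapped in braces so that BibTeX
%    does not turn \H into \h when it changes the title's case.
% 2. The leading \L in the author field is wrapped in braces too.
% 3. The journal keeps the series number (2) from the official
%    AMS abbreviation; drop it if you prefer.
@article{Luk99,
    AUTHOR  = {{\L}ukasz, Jan},
    TITLE   = {On a problem of {E}rd{\H{o}}s},
    JOURNAL = {Ann. of Math. (2)},
    VOLUME  = {1},
    YEAR    = {2999},
    PAGES   = {1--10},
}
```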

Entry keys

The simplest cases:

Single author: up to the first three letters of the surname (preserving capitalization), followed by the two-digit year. Example: Wil95 in

@article {Wil95,
    AUTHOR = {Wiles, Andrew},
     TITLE = {Modular elliptic curves and {F}ermat's last theorem},
   JOURNAL = {Ann. of Math.},
    VOLUME = {141},
      YEAR = {1995},
    NUMBER = {3},
     PAGES = {443--551},
}

Multiple authors: the initials of the authors’ surnames, followed by the two-digit year. Example: ER60 in

@article {ER60,
    AUTHOR = {Erd\H{o}s, P. and R\'{e}nyi, A.},
     TITLE = {On the evolution of random graphs},
   JOURNAL = {Magyar Tud. Akad. Mat. Kutat\'{o} Int. K\"{o}zl.},
    VOLUME = {5},
      YEAR = {1960},
     PAGES = {17--61},
}

In BibDesk, I automatically assign keys via the following configuration: in BibDesk Preferences / Cite Key Format, select

Preset Format: Custom
Format String: %a91%y%u0

Select one or more rows/entries and press Cmd-K to automatically generate keys. For multi-author entries, this correctly generates the key, e.g., ER60 (unless there are more than nine authors). For single-author entries, a bit of manual work is needed to change from W95 to Wil95 (I prefer not leaving it like W95 since it leads to more conflicts and confusion).

Where it gets a bit more tricky:

If the above rules produce conflicting keys:
- Add a suffix to the key with a few letters containing some informative data such as journal name or topic, e.g., [Wil95ann], [ER60rg], or
- Distinguish the keys by letters, e.g., [Wil95a] [Wil95b] (this is what BibDesk does automatically; but this system could potentially be more confusing later on).
Pre-prints or pre-publications:
- Forget the year, e.g., [XYZ], and if disambiguation is needed, add a dash followed by some informative letters, e.g., [XYZ-gcd]
Not sure which year to use? E.g., a proceeding published in 2000 for a conference held in 1999. Use the year field in BibTeX as imported by MathSciNet.
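
The conventions for the trickier cases might look as follows in a .bib file. These are hypothetical stub entries (the author, titles, journal names, and arXiv identifier are placeholders), meant only to show how the keys are formed.

```bibtex
% Two hypothetical papers by the same author in the same year.
% An informative suffix tells the keys apart: "mj" comes from the
% journal name, "rg" from the topic (random graphs).
@article{Smi21mj,
    AUTHOR  = {Smith, John},
    TITLE   = {I proved a theorem},
    JOURNAL = {Math. J.},
    VOLUME  = {3},
    YEAR    = {2021},
    PAGES   = {1--10},
}

@article{Smi21rg,
    AUTHOR  = {Smith, John},
    TITLE   = {I proved a theorem about random graphs},
    JOURNAL = {J. Graphs},
    VOLUME  = {7},
    YEAR    = {2021},
    PAGES   = {21--30},
}

% A hypothetical preprint: no year in the key, and a dash followed
% by a few informative letters for disambiguation.
@misc{Smi-gcd,
    AUTHOR = {John Smith},
    TITLE  = {I proved a theorem about greatest common divisors},
    NOTE   = {arXiv:2999.9999},
}
```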

What the BibTeX output looks like

Using the amsplain0 style file described above, a citation \cite{Wil95} produces

References

[1] Andrew Wiles, Modular elliptic curves and Fermat’s last theorem, Ann. of Math. 141 (1995), 443–551.

And that’s it! This is how I manage my BibTeX.
