## Research papers

A list of publications can also be found in my CV | arXiv | MathSciNet | Google Scholar

#### Cayley graphs

• Quasirandom Cayley graphs (with David Conlon)
abstract | arXiv | slides | Oberwolfach report

We prove that the properties of having small discrepancy and having small second eigenvalue are equivalent in Cayley graphs, extending a result of Kohayakawa, Rödl and Schacht, who treated the abelian case. The proof relies on a number of ingredients, including Grothendieck's inequality and non-abelian Fourier analysis. As a corollary, we also prove that a similar result holds in all vertex-transitive graphs.

#### The Green-Tao theorem and a relative Szemerédi theorem

slides | videos: Simons, IAS

• A relative Szemerédi theorem (with David Conlon and Jacob Fox)
Geometric and Functional Analysis 25 (2015), 733–762.
abstract | arXiv | blog | Oberwolfach report

The celebrated Green-Tao theorem states that there are arbitrarily long arithmetic progressions in the primes. One of the main ingredients in their proof is a relative Szemerédi theorem which says that any subset of a pseudorandom set of integers of positive relative density contains long arithmetic progressions.

In this paper, we give a simple proof of a strengthening of the relative Szemerédi theorem, showing that a much weaker pseudorandomness condition is sufficient. Our strengthened version can be applied to give the first relative Szemerédi theorem for $k$-term arithmetic progressions in pseudorandom subsets of $\mathbb{Z}_N$ of density $N^{-c_k}$.

The key component in our proof is an extension of the regularity method to sparse pseudorandom hypergraphs, which we believe to be interesting in its own right. From this we derive a relative extension of the hypergraph removal lemma. This is a strengthening of an earlier theorem used by Tao in his proof that the Gaussian primes contain arbitrarily shaped constellations and, by standard arguments, allows us to deduce the relative Szemerédi theorem.

Companion note: Linear forms from the Gowers uniformity norm (arXiv)
• An arithmetic transference proof of a relative Szemerédi theorem
Mathematical Proceedings of the Cambridge Philosophical Society 156 (2014), 255–261.
abstract | arXiv

Recently Conlon, Fox, and the author gave a new proof of a relative Szemerédi theorem, which was the main novel ingredient in the proof of the celebrated Green-Tao theorem that the primes contain arbitrarily long arithmetic progressions. Roughly speaking, a relative Szemerédi theorem says that if $S$ is a set of integers satisfying certain conditions, and $A$ is a subset of $S$ with positive relative density, then $A$ contains long arithmetic progressions, and our recent results show that $S$ only needs to satisfy a so-called linear forms condition.

This note contains an alternative proof of the new relative Szemerédi theorem, where we directly transfer Szemerédi's theorem, instead of going through the hypergraph removal lemma. This approach provides a somewhat more direct route to establishing the result, and it gives better quantitative bounds.

The proof has three main ingredients: (1) a transference principle/dense model theorem of Green-Tao and Tao-Ziegler (with simplified proofs given later by Gowers, and independently, Reingold-Trevisan-Tulsiani-Vadhan) applied with a discrepancy/cut-type norm (instead of a Gowers uniformity norm as it was applied in earlier works), (2) a counting lemma established by Conlon, Fox, and the author, and (3) Szemerédi's theorem as a black box.

• A short proof of the multidimensional Szemerédi theorem in the primes (with Jacob Fox)
American Journal of Mathematics 137 (2015), 1139—1145.
abstract | arXiv | blog

Tao conjectured that every dense subset of $\mathcal{P}^d$, the $d$-tuples of primes, contains constellations of any given shape. This was very recently proved by Cook, Magyar, and Titichetrakun and independently by Tao and Ziegler. Here we give a simple proof using the Green-Tao theorem on linear equations in primes and the Furstenberg-Katznelson multidimensional Szemerédi theorem.

#### Large deviations in random graphs

• On replica symmetry of large deviations in random graphs (with Eyal Lubetzky)
Random Structures & Algorithms 47 (2015) 109–146.
abstract | arXiv | blog | Oberwolfach report

The following question is due to Chatterjee and Varadhan (2011). Fix $0 \lt p \lt r \lt 1$ and take $G\sim \mathcal G(n,p)$, the Erdős-Rényi random graph with edge density $p$, conditioned to have at least as many triangles as the typical $\mathcal G(n,r)$. Is $G$ close in cut-distance to a typical $\mathcal G(n,r)$? Via a beautiful new framework for large deviation principles in $\mathcal G(n,p)$, Chatterjee and Varadhan gave bounds on the replica symmetric phase, the region of $(p,r)$ where the answer is positive. They further showed that for any small enough $p$ there are at least two phase transitions as $r$ varies.

We settle this question by identifying the replica symmetric phase for triangles and more generally for any fixed $d$-regular graph. By analyzing the variational problem arising from the framework of Chatterjee and Varadhan we show that the replica symmetry phase consists of all $(p,r)$ such that $(r^d,h_p(r))$ lies on the convex minorant of $x\mapsto h_p(x^{1/d})$ where $h_p$ is the rate function of a binomial with parameter $p$. In particular, the answer for triangles involves $h_p(\sqrt{x})$ rather than the natural guess of $h_p(x^{1/3})$ where symmetry was previously known. Analogous results are obtained for linear hypergraphs as well as the setting where the largest eigenvalue of $G\sim\mathcal G(n,p)$ is conditioned to exceed the typical value of the largest eigenvalue of $\mathcal G(n,r)$. Building on the work of Chatterjee and Diaconis (2012) we obtain additional results on a class of exponential random graphs including a new range of parameters where symmetry breaking occurs. En route we give a short alternative proof of a graph homomorphism inequality due to Kahn (2001) and Galvin and Tetali (2004).

• On the variational problem for upper tails in sparse random graphs (with Eyal Lubetzky)
Random Structures & Algorithms, to appear.
abstract | arXiv | blog

What is the probability that the number of triangles in $\mathcal{G}_{n,p}$, the Erdős-Rényi random graph with edge density $p$, is at least twice its mean? Writing it as $\exp[- r(n,p)]$, already the order of the rate function $r(n,p)$ was a longstanding open problem when $p=o(1)$, finally settled in 2012 by Chatterjee and by DeMarco and Kahn, who independently showed that $r(n,p)\asymp n^2p^2 \log (1/p)$ for $p \gtrsim \frac{\log n}n$; the exact asymptotics of $r(n,p)$ remained unknown.

The following variational problem can be related to this large deviation question at $p\gtrsim \frac{\log n}n$: for $\delta>0$ fixed, what is the minimum asymptotic $p$-relative entropy of a weighted graph on $n$ vertices with triangle density at least $(1+\delta)p^3$? A beautiful large deviation framework of Chatterjee and Varadhan (2011) reduces upper tails for triangles to a limiting version of this problem for fixed $p$. A very recent breakthrough of Chatterjee and Dembo extended its validity to $n^{-\alpha}\ll p \ll 1$ for an explicit $\alpha>0$, and plausibly it holds in all of the above sparse regime.

In this note we show that the solution to the variational problem is $\min\{\frac12 \delta^{2/3}\,,\, \frac13 \delta\}$ when $n^{-1/2}\ll p \ll 1$ vs. $\frac12 \delta^{2/3}$ when $n^{-1} \ll p\ll n^{-1/2}$ (the transition between these regimes is expressed in the count of triangles minus an edge in the minimizer). From the results of Chatterjee and Dembo, this shows for instance that the probability that $\mathcal{G}_{n,p}$ for $n^{-\alpha} \leq p \ll 1$ has twice as many triangles as its expectation is $\exp[-r(n,p)]$ where $r(n,p)\sim \frac13 n^2 p^2\log(1/p)$. Our results further extend to $k$-cliques for any fixed $k$, as well as give the order of the upper tail rate function for an arbitrary fixed subgraph when $p\geq n^{-\alpha}$.

• On the lower tail variational problem for random graphs
Combinatorics, Probability and Computing, to appear.
abstract | arXiv

We study the lower tail large deviation problem for subgraph counts in a random graph. Let $X_H$ denote the number of copies of $H$ in an Erdős-Rényi random graph $\mathcal{G}(n,p)$. We are interested in estimating the lower tail probability $\mathbb{P}(X_H \le (1-\delta) \mathbb{E} X_H)$ for fixed $0 < \delta < 1$.

Thanks to the results of Chatterjee, Dembo, and Varadhan, this large deviation problem has been reduced to a natural variational problem over graphons, at least for $p \ge n^{-\alpha_H}$ (and conjecturally for a larger range of $p$). We study this variational problem and provide a partial characterization of the so-called “replica symmetric” phase. Informally, our main result says that for every $H$, and $0 < \delta < \delta_H$ for some $\delta_H > 0$, as $p \to 0$ slowly, the main contribution to the lower tail probability comes from Erdős-Rényi random graphs with a uniformly tilted edge density. On the other hand, this is false for non-bipartite $H$ and $\delta$ close to 1.

• Upper tails and independence polynomials in random graphs (with Bhaswar B. Bhattacharya, Shirshendu Ganguly, and Eyal Lubetzky)
abstract | arXiv

The upper tail problem in the Erdős-Rényi random graph $G\sim\mathcal{G}_{n,p}$ asks to estimate the probability that the number of copies of a graph $H$ in $G$ exceeds its expectation by a factor $1+\delta$. Chatterjee and Dembo (2014) showed that in the sparse regime of $p\to 0$ as $n\to\infty$ with $p \geq n^{-\alpha}$ for an explicit $\alpha=\alpha_H>0$, this problem reduces to a natural variational problem on weighted graphs, which was thereafter asymptotically solved by two of the authors in the case where $H$ is a clique.
Here we extend the latter work to any fixed graph $H$ and determine a function $c_H(\delta)$ such that, for $p$ as above and any fixed $\delta>0$, the upper tail probability is $\exp[-(c_H(\delta)+o(1))n^2 p^\Delta \log(1/p)]$, where $\Delta$ is the maximum degree of $H$. As it turns out, the leading order constant in the large deviation rate function, $c_H(\delta)$, is governed by the independence polynomial of $H$, defined as $P_H(x)=\sum i_H(k) x^k$ where $i_H(k)$ is the number of independent sets of size $k$ in $H$. For instance, if $H$ is a regular graph on $m$ vertices, then $c_H(\delta)$ is the minimum between $\frac12 \delta^{2/m}$ and the unique positive solution of $P_H(x) = 1+\delta$.

• Upper tails for arithmetic progressions in a random set (with Bhaswar B. Bhattacharya, Shirshendu Ganguly, and Xuancheng Shao)
abstract | arXiv

Let $X_k$ denote the number of $k$-term arithmetic progressions in a random subset of $\mathbb{Z}/N\mathbb{Z}$ or $\{1, \dots, N\}$ where every element is included independently with probability $p$. We determine, in certain ranges of parameters, the asymptotics of $\log \mathbb{P}(X_k \ge (1+\delta) \mathbb{E} X_k)$. Recently, Chatterjee and Dembo showed that for $p=p_N\to 0$ sufficiently slowly, this problem, for $k=3$, reduces to a natural variational problem. We solve this variational problem asymptotically, thereby obtaining the large deviation rate. More generally, for $k \ge 4$, using the inverse theorem for the Gowers norms, we determine the rate as long as $p \to 0$ extremely slowly. We solve the variational problem by reducing it to the combinatorial problem of maximizing the number of $k$-APs in a set of given size, which was solved by Green and Sisask when $k = 3$, and extended to all $k$ here. Our results complement those of Warnke, who used completely different methods to estimate, for the full range of $p$, the large deviation rate up to a constant factor.

#### Graph limits

• Hypergraph limits: a regularity approach
Random Structures & Algorithms 47 (2015), 205–226.
abstract | arXiv

A sequence of $k$-uniform hypergraphs $H_1, H_2, \dots$ is convergent if the sequence of homomorphism densities $t(F, H_1), t(F, H_2), \dots$ converges for every $k$-uniform hypergraph $F$. For graphs, Lováasz and Szegedy showed that every convergent sequence has a limit in the form of a symmetric measurable function $W \colon [0,1]^2 \to [0,1]$. For hypergraphs, analogous limits $W \colon [0,1]^{2^k-2} \to [0,1]$ were constructed by Elek and Szegedy using ultraproducts. These limits had also been studied earlier by Hoover, Aldous, and Kallenberg in the setting of exchangeable random arrays.

In this paper, we give a new proof and construction of hypergraph limits. Our approach is inspired by the original approach of Lovász and Szegedy, with the key ingredient being a weak Frieze-Kannan type regularity lemma.

• An $L^p$ theory of sparse graph convergence I: limits, sparse random graph models, and power law distributions (with Christian Borgs, Jennifer T. Chayes, and Henry Cohn)
abstract | arXiv

We introduce and develop a theory of limits for sequences of sparse graphs based on $L^p$ graphons, which generalizes both the existing $L^\infty$ theory of dense graph limits and its extension by Bollobáas and Riordan to sparse graphs without dense spots. In doing so, we replace the no dense spots hypothesis with weaker assumptions, which allow us to analyze graphs with power law degree distributions. This gives the first broadly applicable limit theory for sparse graphs with unbounded average degrees. In this paper, we lay the foundations of the $L^p$ theory of graphons, characterize convergence, and develop corresponding random graph models, while we prove the equivalence of several alternative metrics in a companion paper.

• An $L^p$ theory of sparse graph convergence II: LD convergence, quotients, and right convergence (with Christian Borgs, Jennifer T. Chayes, and Henry Cohn)
abstract | arXiv

We extend the $L^p$ theory of sparse graph limits, which was introduced in a companion paper, by analyzing different notions of convergence. Under suitable restrictions on node weights, we prove the equivalence of metric convergence, quotient convergence, microcanonical ground state energy convergence, microcanonical free energy convergence, and large deviation convergence. Our theorems extend the broad applicability of dense graph convergence to all sparse graphs with unbounded average degree, while the proofs require new techniques based on uniform upper regularity. Examples to which our theory applies include stochastic block models, power law graphs, and sparse versions of $W$-random graphs.

• On derivatives of graphon parameters (with László Miklós Lovász)
Journal of Combinatorial Theory, Series A 145 (2017) 364–368.
abstract | arXiv

We give a short elementary proof of the main theorem in the paper "Differential calculus on graph on space" by Diao et al. (JCTA 2015), which says that any graphon parameters whose $(N+1)$-th derivatives all vanish must be a linear combination of homomorphism densities $t(H, -)$ over graphs $H$ on at most $N$ edges.

#### Graph regularity lemma and applications

• On regularity lemmas and their algorithmic applications (with Jacob Fox and László Miklós Lovász)
abstract | arXiv

Szemerédi's regularity lemma and its variants are some of the most powerful tools in combinatorics. In this paper, we establish several results around the regularity lemma. First, we prove that whether or not we include the condition that the desired vertex partition in the regularity lemma is equitable has a minimal effect on the number of parts of the partition. Second, we use an algorithmic version of the (weak) Frieze—Kannan regularity lemma to give a substantially faster deterministic approximation algorithm for counting subgraphs in a graph. Previously, only an exponential dependence for the running time on the error parameter was known, and we improve it to a polynomial dependence. Third, we revisit the problem of finding an algorithmic regularity lemma, giving approximation algorithms for several co-NP-complete problems. We show how to use the weak Frieze—Kannan regularity lemma to approximate the regularity of a pair of vertex subsets. We also show how to quickly find, for each $\epsilon' > \epsilon$, an $\epsilon'$-regular partition with $k$ parts if there exists an $\epsilon$-regular partition with $k$ parts. Finally, we give a simple proof of the permutation regularity lemma which improves the tower-type bound on the number of parts in the previous proofs to a single exponential bound.

#### Sphere packing and energy minimization

• Sphere packing bounds via spherical codes (with Henry Cohn)
Duke Mathematical Journal 163 (2014), 1965–2002.
abstract | arXiv | blog

The sphere packing problem asks for the greatest density of a packing of congruent balls in Euclidean space. The current best upper bound in all sufficiently high dimensions is due to Kabatiansky and Levenshtein in 1978. We revisit their argument and improve their bound by a constant factor using a simple geometric argument, and we extend the argument to packings in hyperbolic space, for which it gives an exponential improvement over the previously known bounds. Additionally, we show that the Cohn-Elkies linear programming bound is always at least as strong as the Kabatiansky-Levenshtein bound; this result is analogous to Rodemich's theorem in coding theory. Finally, we develop hyperbolic linear programming bounds and prove the analogue of Rodemich's theorem there as well.

• Energy-minimizing error-correcting codes (with Henry Cohn)
IEEE Transactions on Information Theory 60 (2014), 7442–7450.
abstract | arXiv | blog

We study a discrete model of repelling particles, and we show using linear programming bounds that many familiar families of error-correcting codes minimize a broad class of potential energies when compared with all other codes of the same size and block length. Examples of these universally optimal codes include Hamming, Golay, and Reed-Solomon codes, among many others, and this helps explain their robustness as the channel model varies. Universal optimality of these codes is equivalent to minimality of their binomial moments, which has been proved in many cases by Ashikhmin and Barg. We highlight connections with mathematical physics and the analogy between these results and previous work by Cohn and Kumar in the continuous setting, and we develop a framework for optimizing the linear programming bounds. Furthermore, we show that if these bounds prove a code is universally optimal, then the code remains universally optimal even if one codeword is removed.

#### Extremal and Ramsey graph theory

• Extremal results in sparse pseudorandom graphs (with David Conlon and Jacob Fox)
Advances in Mathematics 256 (2014), 206–290.
abstract | arXiv | slides | blog

Szemerédi’s regularity lemma is a fundamental tool in extremal combinatorics. However, the original version is only helpful in studying dense graphs. In the 1990s, Kohayakawa and Rödl proved an analogue of Szemerédi’s regularity lemma for sparse graphs as part of a general program toward extending extremal results to sparse graphs. Many of the key applications of Szemerédi’s regularity lemma use an associated counting lemma. In order to prove extensions of these results which also apply to sparse graphs, it remained a well-known open problem to prove a counting lemma in sparse graphs.

The main advance of this paper lies in a new counting lemma, proved following the functional approach of Gowers, which complements the sparse regularity lemma of Kohayakawa and Rödl, allowing us to count small graphs in regular subgraphs of a sufficiently pseudorandom graph. We use this to prove sparse extensions of several well-known combinatorial theorems, including the removal lemmas for graphs and groups, the Erdős-Stone-Simonovits theorem and Ramsey’s theorem. These results extend and improve upon a substantial body of previous work.

• The critical window for the classical Ramsey-Turán problem (with Jacob Fox and Po-Shen Loh)
Combinatorica 35 (2015) 435—476.
abstract | arXiv | blog

In the decades since its introduction, Szemerédi’s regularity lemma has been widely adopted by the combinatorial community as a powerful microscope for studying the asymptotic regimes of extremal problems. Yet this power comes at the cost of limited resolution outside of the very far asymptotic regime, as the regularity lemma’s quantitative bounds necessarily involve tower-type dependencies. We investigate the very first application of Szemerédi’s regularity lemma, which was the following celebrated Ramsey-Turán result proved by Szemerédi in 1972: any $K_4$-free graph on $n$ vertices with independence number $o(n)$ has at most $(1/8+o(1))n^2$ edges. Four years later, Bollobás and Erdős gave a surprising geometric construction, utilizing the isoperimetric inequality for the high dimensional sphere, of a $K_4$-free graph on $n$ vertices with independence number $o(n)$ and $(1/8-o(1))n^2$ edges.

Bollobás and Erdős asked to estimate the minimum possible independence number in the critical window, when the number of edges is about $n^2/8$. This required a level of accuracy beyond the reach of regularity-based approaches, and remained one of the main open problems in this area, receiving considerable attention. In this paper, we develop new regularity-free methods which give rise to nearly best-possible dependencies, and solve several longstanding open problems concerning this critical window. Along the way, we introduce a new twist on another influential combinatorial technique, known as dependent random choice, which produces substantially better bounds.

slides

## Expositions and surveys

• The Green-Tao theorem: an exposition (with David Conlon and Jacob Fox)
EMS Surveys in Mathematical Sciences 1 (2014), 249–282.
abstract | arXiv

The celebrated Green-Tao theorem states that the prime numbers contain arbitrarily long arithmetic progressions. We give an exposition of the proof, incorporating several simplifica- tions that have been discovered since the original paper.

• Extremal regular graphs: independent sets and graph homomorphisms
abstract | arXiv | blog

This survey concerns regular graphs that are extremal with respect to the number of independent sets, and more generally, graph homomorphisms. The main motivating problem is the following: in the family of $d$-regular graphs, which graph $G$ maximize/minimize the quantity $i(G)^{1/v(G)}$, the number of independent sets in $G$ normalized exponentially by the size of $G$? What if $i(G)$ is replaced by some other graph parameter?

We review existing techniques, highlight some exciting recent developments, and discuss open problems and conjectures for future research.