1. Introduction
Szemerédi's famous theorem on arithmetic progressions, which states that any subset of the integers with positive upper density contains arbitrarily long arithmetic progressions, has the following multidimensional generalization due to Furstenberg and Katznelson [Reference Furstenberg and KatznelsonFK78].
Theorem 1.1 Let $X$ be a finite, nonempty subset of
$\mathbb {Z}^d$. If
$S\subset [N]^d$ contains no nontrivial homothetic copy
$a+bX$ of
$X$, then
$|S|=o(N^d)$.
Here we use the standard notation $[N]:=\{1,\ldots,N\}$. There has been great interest over the past few decades in proving a quantitative version of this theorem with reasonable bounds, i.e. with an upper bound for
$|S|$ whose savings over the trivial bound of
$N^d$ grows at least as quickly as a finite number of iterated logarithms. Indeed, Gowers has posed the problem of proving such a result on several occasions [Reference GowersGow98a, Reference GowersGow01a, Reference GowersGow01b], and others, such as Graham [Reference GrahamGra97], have asked for bounds for sets lacking particular multidimensional configurations. While reasonable bounds are known in Szemerédi's theorem due to work of Gowers [Reference GowersGow98b, Reference GowersGow01b], none are known in the Furstenberg–Katznelson theorem in general. Furstenberg and Katznelson's original proof, which was via ergodic theory, produces no explicit bounds, while the hypergraph regularity proofs of Nagle, Rödl, Schacht, and Skokan [Reference Nagle, Rödl and SchachtNRS06, Reference Rödl and SkokanRS04], Gowers [Reference GowersGow07], and Tao [Reference TaoTao06] each give a saving over the trivial bound of inverse Ackermann type.
Reasonable bounds in Theorem 1.1 are currently known for only one genuinely multidimensional configuration: two-dimensional corners

(and, thus, their linear images) due to the work of Shkredov [Reference ShkredovShk06a, Reference ShkredovShk06b], who proved that any subset of $[N]\times [N]$ containing no nontrivial corners has size at most
$\ll N^2/(\log \log {N})^c$ for some absolute constant
$c>0$. No reasonable bounds are known for any two-or-more-dimensional four-point configuration, such as three-dimensional corners,

or axis-aligned squares,

The latter of these two configurations is the topic of a conjecture of Graham [Reference GrahamGra97], which states that any subset $S\subset \mathbb {N}\times \mathbb {N}$ for which
$\sum _{(x,y)\in S} {1}/({x^2+y^2})$ diverges must contain an axis-aligned square. Graham also conjectured, more generally, that if
$\sum _{(x,y)\in S} {1}/({x^2+y^2})$ diverges, then
$S$ must contain a homothetic copy of
$[m]\times [m]$ for every positive integer
$m$. This is a two-dimensional generalization of the famous and still unresolved conjecture of Erdős that every subset
$T\subset \mathbb {N}$ for which
$\sum _{n\in T} {1}/{n}$ diverges must contain arbitrarily long arithmetic progressions.
Proving reasonable bounds for sets lacking the four-point configurations (2) and (3) seems to be out of reach. This is because no one has managed yet to prove anything useful about a certain two-dimensional directional uniformity norm that naturally appears in the study of these configurations. Details on this difficulty can be found in the work of Austin [Reference AustinAus13a, Reference AustinAus13b], where he demonstrates how enormously complicated and difficult even 100% and 99% inverse theorems can be for directional uniformity norms.
The purpose of this paper is to identify the first two-dimensional four-point configuration for which reasonable bounds in the multidimensional Szemerédi theorem can be proven, and to prove such bounds in the finite field model setting. We will study the configuration

which, when plotted on a two-dimensional integer grid, takes the shape of the capital letter ‘L’. Because of this, we refer to (4) as an L-shaped configuration, and an L-shaped configuration with $z\neq 0$ as a nontrivial L-shaped configuration.
Theorem 1.2 There exists a natural number $m$ and a constant
$C>0$ such that the following holds. Fix a prime
$p\geq 11$, and set
$N:=p^n$. If
$n\geq C$, then all
$S\subset \mathbb {F}_p^n\times \mathbb {F}_p^n$ containing no nontrivial L-shaped configurations satisfy

The $m$ obtained in the theorem is huge, so we do not attempt to compute it. The bulk of the size of
$m$ comes from our use of a recent quantitative inverse theorem for the
$U^{10}$-norm on
$\mathbb {F}_p^n$ due to Gowers and Milićević, who in [Reference Gowers and MilićevićGM20] give a rough upper bound for the number of iterated exponentials appearing in their result. Based on this,
$m$ is likely at least 24 trillion. The use of this inverse theorem is necessary in our proof, and no amount of care to argue efficiently in the rest of the argument can reduce
$m$ by much. Thus, we have not tried to optimize the proof of Theorem 1.2, choosing instead to present the simplest argument that gives a reasonable upper bound.
It is likely that the proof of Theorem 1.2 can be adapted to the integer setting to prove a reasonable bound for subsets of $[N]\times [N]$ lacking L-shaped configurations, with the bound obtained being far more reasonable than the bound in Theorem 1.2. This is because the quantitative aspects of Manners's [Reference MannersMan18] inverse theorem for the
$U^{s}$-norm on cyclic groups are better than those of Gowers and Milićević's inverse theorem when
$s>4$. It is also likely that Theorem 1.2 can be extended to more general L-shaped configurations with a longer vertical ‘leg’,

in both the finite field model and integer settings. We expect, however, that understanding L-shaped configurations with two longer ‘legs’,

is significantly more difficult, for some of the same reasons that proving reasonable bounds for sets lacking three-dimensional corners or axis-aligned squares seems out of reach.
While progress in proving a quantitative version of the multidimensional Szemerédi theorem has so far been extremely limited, there has been a bit more success in proving reasonable bounds for sets lacking multidimensional configurations with more degrees of freedom than those in Theorem 1.1. Prendiville [Reference PrendivillePre15] has proven reasonable bounds for subsets of $[N]^d$ lacking any sufficiently nondegenerate three- or four-term matrix progression, and one consequence of his work is that any subset of
$[N]\times [N]$ containing no four vertices of any square (not necessarily axis-aligned) has size at most
$\ll N^2/(\log \log {N})^{c'}$ for some absolute constant
$c'>0$.
The remainder of this paper is organized as follows. In § 2, we give a detailed outline of our proof of Theorem 1.2, including statements of the three main components of the density-increment argument: control of the count of L-shaped configurations by directional uniformity norms, obtaining a density increment on a structured set, and pseudorandomizing the structured set obtained previously. After introducing additional technical preliminaries in §§ 3 and 4, we prove these three main components in §§ 5, 6, and 7, respectively. We then carry out the density increment argument in § 8, proving Theorem 1.2.
2. Outline of the proof of Theorem 1.2
We begin this section by introducing the minimum amount of notation and preliminaries needed to understand our proof outline. We will use the standard asymptotic notation $O,\Omega,$ and
$o$, along with Vinogradov's notation
$\ll,\gg,$ and
$\asymp$. For any two quantities
$A$ and
$B$, the relations
$A=O(B)$,
$B=\Omega (A)$,
$A\ll B$, and
$B\gg A$ all mean that
$|A|\leq C|B|$ for some absolute constant
$C>0$. We will write
$O(B)$ to represent a quantity that is
$\ll B$ and
$\Omega (A)$ to represent a quantity that is
$\gg A$. When any of these asymptotic symbols appears with a subscript, the implied constant is allowed to depend on the parameters in the subscript. Since we fix a prime
$p$ in Theorem 1.2, the implied constants appearing throughout the paper will sometimes depend on
$p$ even though we will not alert the reader to this with a subscript. We will use
$\log _m$ to denote the
$m$-fold iterated logarithm, so that
$\log _1:=\log$ and
$\log _{i}:=\log \circ \log _{i-1}$ for all
$i>1$, as well as
$\exp ^m$ to denote the
$m$-fold iterated exponential, so that
$\exp ^1=\exp$ and
$\exp ^{i}:=\exp \circ \exp ^{i-1}$ for all
$i>1$.
We will frequently denote the indicator function of a set $A$ by the letter
$A$ itself, so that

For any pair of finite sets $X\subset Y$ with
$Y\neq \emptyset$, we denote the density of
$X$ in
$Y$ by

For any function $f:X\to \mathbb {C}$, we denote the average of
$f$ over
$X$ by

When $X=\mathbb {F}_p^n$, we will usually drop ‘
$\in X$’ and just write
$\mathbb {E}_x$ for
$\mathbb {E}_{x\in \mathbb {F}_p^n}$. Whenever
$f$ satisfies
$|f(x)|\leq 1$ for all
$x$ in its domain, we say that it is
$1$-bounded. Note that the indicator function of any set is
$1$-bounded.
For any $f:\mathbb {F}_p^n\to \mathbb {C}$ and
$\xi \in \mathbb {F}_p^n$, we define the Fourier coefficient of
$f$ at
$\xi$ using the normalization

where $e_p(z):= e^{2\pi iz/p}$ and
$\cdot$ denotes the usual dot product in
$\mathbb {F}_p^n$. With this choice of normalization, the Fourier inversion formula and Parseval's identity read

and

respectively.
Let $H$ be any abelian group and
$g:H\to \mathbb {C}$. For any
$h\in H$, we define the function
$\Delta _hg:H\to \mathbb {C}$ by

and, for any $h_1,\ldots,h_s\in H$, define the
$s$-fold iterated differencing operator
$\Delta _{h_1,\ldots,h_s}$ by

Note that $\Delta _{h_1,\ldots,h_s}g=\Delta _{h_{\sigma (1)},\ldots,h_{\sigma (s)}}g$ for any permutation
$\sigma$ of
$\{1,\ldots,s\}$.
Now we can recall the definition of the Gowers uniformity norms.
Definition 2.1 Let $s\in \mathbb {N}$,
$H$ be an abelian group, and
$f:H\to \mathbb {C}$. The
$U^s$-norm of
$f$ is defined by

The basic properties of these norms can be found in [Reference TaoTao12]. One such fact needed in the upcoming outline is the inverse theorem for the $U^2$-norm, which is a simple consequence of Fourier inversion and Parseval's identity.
Lemma 2.2 Let $H$ be an abelian group and
$f:H\to \mathbb {C}$ be
$1$-bounded. If
$\|f\|_{U^2(H)}\geq \delta$, then there exists a
$\psi \in \widehat {H}$ such that

We will also sometimes need the notion of the $U^2$-norm on an affine subspace
$w+V$ of
$\mathbb {F}_p^n$, which is defined by
$\|f\|_{U^2(w+V)}:=\|f(\cdot -w)\|_{U^2(V)}$. The corresponding inverse theorem for these norms follows from Lemma 2.2.
2.1 A review of Shkredov's argument in the finite field model setting
Before we outline the proof of Theorem 1.2, it will be instructive to review Shkredov's argument for corners (1) in the finite field model setting. A detailed account of the argument can be found in the expositions of Green [Reference GreenGre05a, Reference GreenGre05b].
Shkredov's proof proceeds via a density-increment argument. As in all analytic approaches to Szemerédi's theorem and its generalizations, we begin by defining a multilinear average over the configuration of interest. For $g_0,g_1,g_2:\mathbb {F}_p^n\times \mathbb {F}_p^n\to \mathbb {C}$, set

Then, for any $S\subset \mathbb {F}_p^n\times \mathbb {F}_p^n$, the quantity
$\Lambda _{\llcorner }(S,S,S)$ equals the normalized count,

of the number of corners in $S$. Setting
$N:=p^n=|\mathbb {F}_p^n|$, we let
$\sigma :=|S|/N^2$ denote the density of
$S$ in
$\mathbb {F}_p^n\times \mathbb {F}_p^n$ and
$g_S:=S-\sigma$ denote the balanced function of
$S$. It follows from the trilinearity of
$\Lambda _{\llcorner }$ that

Since $\Lambda _{\llcorner }(1,S,S)\geq \sigma ^2$ by the Cauchy–Schwarz inequality, if the normalized count of corners in
$S$ is far below the
$\sim \sigma ^3$ expected for a random set of density
$\sigma$, which is the case when
$S$ has no nontrivial corners and
$N$ is sufficiently large in terms of
$\sigma$, then
$|\Lambda _{\llcorner }(g_S,S,S)|$ must be large.
It can then be shown, by an appropriate sequence of applications of the Cauchy–Schwarz inequality, that $g_S$ must have large box norm

If $\|g_S\|_{\square }$ is large, it follows by an averaging argument that
$S$ has density at least
$\sigma +\Omega (\sigma ^{O(1)})$ on a product set
$A\times B$ for some large
$A,B\subset \mathbb {F}_p^n$.
One may then hope to continue the density-increment argument by proving the following generalization of the result just sketched: if $S$ is a subset of density
$\sigma$ of a product set
$T=A\times B$ and contains no nontrivial corners, then
$S$ has density at least
$\sigma +\Omega (\sigma ^{O(1)})$ on a product set
$T'$ contained in
$T$.
It turns out, however, that the Cauchy–Schwarz argument mentioned previously yields a lower bound on the box norm of large enough size only when $A$ and
$B$ are sufficiently Fourier pseudorandom, meaning that their balanced functions
$A-|A|/N$ and
$B-|B|/N$ both have small
$U^2$-norm. The components of the product set just obtained are essentially arbitrary aside from being large. They are, in particular, not guaranteed to be Fourier pseudorandom.
To overcome this difficulty, Shkredov introduced a pseudorandomizing step into his proof. He used an energy increment argument incorporating the $U^2$-inverse theorem to partition
$\mathbb {F}_p^n\times \mathbb {F}_p^n$ into products of large affine subspaces of the form

for most of which the sets $(A-u)\cap V$ and
$(B-w)\cap V$ are Fourier pseudorandom in
$V$. By an averaging argument, there must exist such a product of affine subspaces (5) on which the restrictions of
$A$ and
$B$ are both sufficiently dense and Fourier pseudorandom, and such that
$S$ still has increased density
$\sigma +\Omega (\sigma ^{O(1)})$ on the intersection of
$T$ with
$(u+V)\times (w+V)$.
By passing to this product of cosets and using that corners are preserved by translation and invertible linear transformations of the form $(x,y)\mapsto (Ex,Ey)$, one can then continue the density-increment argument with
$\mathbb {F}_p^n\times \mathbb {F}_p^n$ replaced by
$\mathbb {F}_p^{n'}\times \mathbb {F}_p^{n'}$, where
$n'=\dim {V}$. If
$S\subset T$ contains no nontrivial corners and
$A$ and
$B$ are sufficiently Fourier pseudorandom, then
$g_S$ must have large box norm localized to
$T$. One must then prove that
$S$ has a further density increment on a product set contained in
$T$, which is, fortunately, of exactly the same difficulty whether
$T=\mathbb {F}_p^n\times \mathbb {F}_p^n$ or some other large product set. By applying the pseudorandomizing procedure to the factors of the product set just produced, one can then deduce that if
$S$ is a subset of density
$\sigma$ of a product set
$T=A\times B$, where
$A$ and
$B$ are large and sufficiently Fourier pseudorandom, and
$S$ contains no nontrivial corners, then
$S$ has density at least
$\sigma +\Omega (\sigma ^{O(1)})$ on a product set
$T'=A'\times B'$ contained in
$T$, where
$A'$ and
$B'$ are also large and sufficiently Fourier pseudorandom. The density increment iteration can be carried out repeatedly to produce a good bound for subsets of
$\mathbb {F}_p^n\times \mathbb {F}_p^n$ lacking corners.
2.2 An outline of our argument
The obstructions to uniformity for L-shaped configurations are not just (skew) product sets, as was the case for corners, but also very general sets of the form

where each $u_x+V_x$ is an affine subspace of
$\mathbb {F}_p^n$. For example, assume that
$n\geq 3$, and consider the set

This set has density

in $\mathbb {F}_p^n\times \mathbb {F}_p^n$, but

L-shaped configurations, in contrast to the $\sim N^3/p^4$ expected in a random subset of
$\mathbb {F}_p^n\times \mathbb {F}_p^n$ of density
$1/p$. Similarly, the number of L-shaped configurations in the sets

and

where $\phi (x)\in \mathbb {F}_p^n$ and
$u(x)\in \mathbb {F}_p$ are now chosen uniformly at random, is also
${\sim }N^3/p^3$ with high probability, while the sets have density
${\sim }1/p$ with high probability. These new sorts of obstructions to uniformity are the main reason why the study of L-shaped configurations is significantly more difficult than that of corners, and must be taken into account to prove Theorem 1.2.
For any functions $g_0,g_1,g_2,g_3:\mathbb {F}_p^n\times \mathbb {F}_p^n\to \mathbb {C}$, we define

so that $\Lambda (S,S,S,S)$ equals the normalized count of L-shaped configurations in any subset
$S$ of
$\mathbb {F}_p^n\times \mathbb {F}_p^n$. The multilinearity of
$\Lambda$ implies that

where, as before, $g_S=S-\sigma$ is the balanced function of
$S$. Thus, if the normalized count of L-shaped configurations in
$S$ is far from the random normalized count
$\sigma ^4$, one of
$|\Lambda (1,1,g_S,S)|$,
$|\Lambda (1,g_S,S,S)|$, or
$|\Lambda (g_S,S,S,S)|$ must be large. In particular, when
$S$ contains no nontrivial L-shaped configurations and
$N$ is sufficiently large in terms of
$\sigma$, one of these quantities will be larger than
$\sigma ^4/2$. It then follows from several applications of the Cauchy–Schwarz inequality that one of the following directional uniformity norms of
$g_S$ must be larger than
$\sigma ^4/2$:


or

Here $\|\cdot \|_{\star _3}$ is only a semi-norm, while
$\|\cdot \|_{\star _1}$ and
$\|\cdot \|_{\star _2}$ are genuine norms. Since these are all Gowers box norms, one can find a proof that they are (semi-)norms in Appendix B of [Reference Green and TaoGT10]. The norm
$\|\cdot \|_{\star _1}$ had been studied previously, in the setting of cyclic groups, in work of Shkredov [Reference ShkredovShk09].
Directional uniformity norms with two differencing parameters,

for fixed nonzero $v_1,v_2\in H\times H$, are well-understood. Either
$v_1$ and
$v_2$ are scalar multiples of each other, in which case the norm is just the
$U^2$-norm on
$\langle v_1\rangle$ averaged over cosets of
$\langle v_1\rangle$, or they are linearly independent, as in the definition of
$\|\cdot \|_{\star _2}$, in which case the norm is, after a change of variables, equivalent to the two-dimensional box norm. Directional uniformity norms with three differencing parameters,

for fixed nonzero $v_1,v_2,v_3\in H\times H$ analogously fall into one of three cases: either
$v_1,v_2,$ and
$v_3$ are collinear, lie on exactly two lines, or are in general position. In the first case, the norm is just the
$U^3$-norm on
$\langle v_1\rangle$ averaged over cosets of
$\langle v_1\rangle$. In the third case, the norm is linearly equivalent to the intractable norm that arises in the study of three-dimensional corners and axis-aligned squares. The norm
$\|\cdot \|_{\star _1}$ we encounter falls into the second case, and the study and fruitful use of this norm turns out to be possible (though still complicated) due to its structure as a ‘
$U^1\times U^2$-norm’.
The upshot is that if $S$ contains no nontrivial L-shaped configurations, then it must have density at least
$\sigma +\Omega (\sigma ^{O(1)})$ on a set of the form

where $\Phi \subset A\times \mathbb {F}_p^n$ is of the form

for some element $u\in \mathbb {F}_p^n$ and collection of subspaces
$\{V_x:x\in A\}$ of
$\mathbb {F}_p^n$, where
$A,B,C,D\subset \mathbb {F}_p^n$ are large and
$\operatorname {codim} V_x$ is small for each
$x\in A$. Note that this set
$\Phi$ is not quite as general as the one appearing at the very beginning of this subsection, as the element
$u$ of
$\mathbb {F}_p^n$ does not vary with
$x$. It takes some extra work to show that we can guarantee
$\Phi$ to be of this special form, which turns out to be necessary for our density-increment iteration. We say more about this point in § 6.
We would like to continue the density-increment iteration and show that $S':= T\cap S$, which also lacks L-shaped configurations, has a further density increment of at least the same size as the first on a subset
$T'$ of
$T$ of the same general form (11). Analogously to Shkredov's argument for corners, we can only hope to do this if
$A,B,C,D,$ and
$\Phi$ are sufficiently pseudorandom, for some appropriate notions of pseudorandomness. We will need to control the count of L-shaped configurations by the norms
$\|\cdot \|_{\star _1}$,
$\|\cdot \|_{\star _2}$, and
$\|\cdot \|_{\star _3}$ defined in (8), (9), and (10) with no loss of density factors, i.e. show that


and

and also obtain a density increment with no loss of density factors when some localized norm $\|\cdot \|_{\star _i}$ of the balanced function of a set is large, i.e. show that if

where now $g_S:=S-\sigma T$, then there exists a subset
$T'\subset T$ of the same general form,

as $T$ on which
$S$ has a density increment

depending only on $\delta$. Such results are needed so that the density increment obtained at each step of the iteration is independent of the step. If one is not sufficiently careful, it is easy to end up with a density increment that gets smaller as the subset
$T$ of
$\mathbb {F}_p^n\times \mathbb {F}_p^n$ gets sparser, which is not enough to close the density increment iteration.
To carry out these arguments, we will need $A,B,C,$ and
$D$ to be pseudorandom with respect to the
$U^{10}(\mathbb {F}_p^n)$-norm. The situation for
$\Phi$ is more complicated, and deciding on a good measure of pseudorandomness for
$\Phi$ that is amenable to a Shkredov-like pseudorandomization procedure and can also be used to analyze the various averages appearing throughout our argument is one of the challenges of the proof of Theorem 1.2. A suitable condition on
$\Phi$ turns out to be that it is pseudorandom with respect to the
$U^{8}(\mathbb {F}_p^n\times \mathbb {F}_p^n)$-norm. This condition is not, on its own, immediately useful in the arguments of §§ 5 and 6, since the various averages that appear are not controllable by the
$U^{8}(\mathbb {F}_p^n\times \mathbb {F}_p^n)$-norm of
$\Phi$. It takes a bit of work to show that it implies a roughly equivalent statement about the typical codimensions of certain affine subspaces obtained from
$\Phi$. We prove this in § 4, deriving some new results on the combinatorics of approximate polynomials along the way.
The proof of the implications (13), (14), and (15) when $A,B,C,D,$ and
$\Phi$ are sufficiently pseudorandom consists of many careful applications of the Cauchy–Schwarz inequality, along with appeals to standard facts about the number of linear configurations of bounded Cauchy–Schwarz complexity in products of pseudorandom sets intersected with subspaces of bounded codimension. We carry out this argument in § 5.
Obtaining a large enough density increment when $\|g_S\|_{\star _1}$ is large for
$S\subset T$ requires some new ideas and a significant amount of extra work beyond the proof of the non-localized case, in contrast to the situation for the box norm localized to product sets, where the argument is the same as the non-localized case. In order to get such a density increment that only depends on
$\delta$ and not on the densities of
$A,B,C,D,$ or
$\Phi$, one of the key ingredients is a density-preserving inverse theorem for the
$U^2(\Phi (x,\cdot ))$-norms on pseudorandom sets derived from
$A,B,C,$ and
$D$, which we prove using a version of the transference principle. We carry out this argument in § 6.
As was the case for corners, the sets $A',B',C',D',$ and
$\Phi '$ obtained in the previous paragraph are not guaranteed to be pseudorandom. We must also carry out a pseudorandomizing procedure to locate a product of large affine subspaces of the form
$(u+V)\times (w+V)$ on which
$A',B',C',D',$ and
$\Phi '$ are sufficiently pseudorandom and
$S$ still has a large density increment on
$T'\cap [(u+V)\times (w+V)]$. Our pseudorandomization procedure is similar to Shkredov's, but with some new complications coming from our desire for
$A',B',C',$ and
$D'$ and
$\Phi '$ to be pseudorandom with respect to the
$U^{10}(\mathbb {F}_p^n)$- and
$U^{8}(\mathbb {F}_p^n\times \mathbb {F}_p^n)$-norms, respectively, and from
$\Phi '$'s particular structure as a union of affine subspaces in the second factor of
$\mathbb {F}_p^n\times \mathbb {F}_p^n$. To handle the first complication, we use a recent quantitative inverse theorem of Gowers and Milićević [Reference Gowers and MilićevićGM20] for the
$U^{s}$-norms on vector spaces over finite fields, combined with a result of Cohen and Tal [Reference Cohen and TalCT15] that allows us to partition
$\mathbb {F}_p^n$ into large affine subspaces on which any finite collection of bounded degree polynomials are all constant. The structure of
$\Phi '$ has the potential to cause issues in a Shkredov-like pseudorandomization argument, since the intersection of
$\Phi '$ with a cell may no longer be the union of affine subspaces all having the same dimension. We will explain how this complication is dealt with in § 7, since it requires a bit of set up.
2.3 Key intermediate results
We finish this section by stating the key intermediate results needed to prove Theorem 1.2 that we just described in the outline. Recall that $g_S=S-\sigma$ denotes the balanced function of
$S$.
Lemma 2.3 (Estimation of
$\Lambda (T,T,T,S)$)
There exist absolute constants $0< c_1<1< c_2$ such that the following holds. Let
$d$ be a nonnegative integer, and set
$\rho :=p^{-d}$. Suppose that
$A,B,C,D\subset \mathbb {F}_p^n$ have densities
$\alpha,\beta,\gamma,\delta$, respectively, and that
$\Phi \subset \mathbb {F}_p^n\times \mathbb {F}_p^n$ takes the form

where each $V_x$ is a subspace of
$\mathbb {F}_p^n$ of codimension
$d$. Define
$T\subset \mathbb {F}_p^n\times \mathbb {F}_p^n$ by (11) and suppose that
$S\subset T$ has density
$\sigma$ in
$T$. Let
$\varepsilon \leq c_1(\sigma \alpha \beta \gamma \delta \rho )^{c_2}$ and assume that

Then

As a consequence, we get that if $\varepsilon$ is small enough,
$n$ is large enough, and
$S\subset T$ has no nontrivial L-shaped configurations, then

Lemma 2.4 (Control by
$\|\cdot \|_{\star _i}$ norms)
Let $d$ be a nonnegative integer, and set
$\rho :=p^{-d}$. Suppose that
$A,B,C,D\subset \mathbb {F}_p^n$ have densities
$\alpha,\beta,\gamma,\delta$, respectively, and that
$\Phi \subset \mathbb {F}_p^n\times \mathbb {F}_p^n$ takes the form

where each $V_x$ is a subspace of
$\mathbb {F}_p^n$ of codimension
$d$. Assume that

Define $T\subset \mathbb {F}_p^n\times \mathbb {F}_p^n$ by (11) and suppose that
$f_0,f_1,f_2,f_3:\mathbb {F}_p^n\times \mathbb {F}_p^n\to \mathbb {C}$ are
$1$-bounded functions supported on
$T$. Then


and

Thus, if $\varepsilon$ is small enough,
$n$ is large enough, and
$S\subset T$ has no nontrivial L-shaped configurations, then one of

is $\gg \sigma ^{O(1)}$.
Theorem 2.5 (
$\|g_S\|_{\star _1}$,
$\|g_S\|_{\star _2}$, or
$\|g_S\|_{\star _3}$ large implies a density increment)
There exist absolute constants $0< c_1<1< c_2,c_3$ such that the following holds. Let
$d$ be a nonnegative integer, and set
$\rho :=p^{-d}$. Suppose that
$A,B,C,D\subset \mathbb {F}_p^n$ have densities
$\alpha,\beta,\gamma,\delta$, respectively, and that
$\Phi \subset \mathbb {F}_p^n\times \mathbb {F}_p^n$ takes the form

where each $V_x$ is a subspace of
$\mathbb {F}_p^n$ of codimension
$d$. Let
$\sigma,\tau >0$ and

and assume that

Define $T\subset \mathbb {F}_p^n\times \mathbb {F}_p^n$ by (11) and assume that
$S\subset T$ has density
$\sigma$ in
$T$. Suppose that

or

Then, $S$ has density at least
$\sigma +\Omega (\tau ^{O(1)})$ on a subset
$T'\subset T$ of the form

where the densities of $A',B',C',D'\subset \mathbb {F}_p^n$ are all
$\gg (\sigma \tau \alpha \beta \gamma \delta \rho )^{O(1)}$, and the set
$\Phi '\subset \mathbb {F}_p^n\times \mathbb {F}_p^n$ takes the form

where each $V_x'$ is a subspace of
$\mathbb {F}_p^n$ of codimension
$d+1$.
The first three lemmas combined tell us that if $S$ has density
$\sigma$ and contains no nontrivial L-shaped configurations, then one can find a subset
$T$ of
$\mathbb {F}_p^n\times \mathbb {F}_p^n$ of the form (11) on which
$S$ has density at least
$\sigma +\Omega (\sigma ^{O(1)})$. The next lemma tells us that, after restricting to a product of large affine subspaces, we can get this same conclusion with
$A,B,C,D,$ and
$\Phi$ as pseudorandom as we need, which will allow us to continue the density-increment iteration.
Lemma 2.6 There exist absolute constants $0< c_1<1< c_2,c,c'$ such that the following holds. Let
$d$ be a nonnegative integer, and set
$\rho :=p^{-d}$. Let
$\varepsilon '>0$, and suppose that
$A,B,C,D\subset \mathbb {F}_p^n$ have densities
$\alpha,\beta,\gamma,\delta$, respectively, and that
$\Phi \subset \mathbb {F}_p^n\times \mathbb {F}_p^n$ and takes the form

where each $V_x$ is a subspace of
$\mathbb {F}_p^n$ of codimension
$d$. Define
$T\subset \mathbb {F}_p^n\times \mathbb {F}_p^n$ by (11), and assume that
$S\subset T$ has density
$\sigma +\tau$ in
$T$, as well as that

Then there exists a subspace $V\leq \mathbb {F}_p^n$ of dimension

$u,w\in \mathbb {F}_p^n$, and
$0\leq i\leq d$ such that, on setting
$\mathcal {C}=(u+V)\times (w+V)$:
–
$B':= B\cap (w+V)$;
–
$C':=C\cap (u+w+V)$;
–
$D':= D\cap (2u+w+V)$;
–
$\Psi ':=\Phi \cap \mathcal {C}$ and
$\Phi ':=\{(x,y)\in \Psi ':\mathbb {E}_{z\in w+V}\Psi '(x,z)=p^{-i}\}$;
–
$A':=\{x\in u+V:\mathbb {E}_{z\in w+V}\Phi '(x,z)\neq 0\}$;
–
$\alpha '=\mu _{u+V}(A')$;
–
$\beta ':=\mu _{w+V}(B')$;
–
$\gamma ':=\mu _{u+w+V}(C')$;
–
$\delta ':=\mu _{2u+w+V}(D')$;
–
$\rho ':=p^{-i}$; and
–
$T':=\{(x,y)\in \mathcal {C}:B'(y)C'(x+y)D'(2x+y)\Phi '(x,y)=1\}$;
we have

$\alpha ',\beta ',\gamma ',\delta '\gg \tau \mu _{\mathbb {F}_p^n\times \mathbb {F}_p^n}(T)/4$, and

By combining the previous four lemmas and using that L-shaped configurations are preserved by translation and invertible linear transformations of the form $(x,y)\mapsto (Ex,Ey)$, we thus deduce the following density-increment lemma, which we will iterate in § 8 to prove Theorem 1.2.
Lemma 2.7 There exist absolute constants $0< c_1<1< c_2,c_3,c_4,c,c'$ such that the following holds. Let
$d$ be a nonnegative integer, and set
$\rho :=p^{-d}$. Suppose that
$A,B,C,D\subset \mathbb {F}_p^n$ have densities
$\alpha,\beta,\gamma,\delta$, respectively, and that
$\Phi \subset \mathbb {F}_p^n\times \mathbb {F}_p^n$ takes the form

where each $V_x$ is a subspace of
$\mathbb {F}_p^n$ of codimension
$d$. Define
$T$ by (11) and let
$S\subset T$ have density
$\sigma$ in
$T$. Let
$\varepsilon \leq (\sigma \alpha \beta \gamma \delta \rho )^{c_2}\exp (-(64/\sigma )^{c_3})$, and assume that

Let $\varepsilon '>0$, and suppose that
$S$ has no nontrivial L-shaped configurations. Then either:
(i)
$n<\exp ^2( {c_4\exp ^c(c'/\varepsilon ')}/{(\sigma \alpha \beta \gamma \delta \rho )^{c_2}})$; or
(ii) there exists natural numbers
$n'$ and
$d'$ satisfying
\[ n'\gg n^{c_1^{\exp^c(c'/\varepsilon')/(\sigma\alpha\beta\gamma\delta\rho)^{c_2}}} \]
$0\leq d'\leq d+1$, subsets
$A',B',C',D'\subset \mathbb {F}_p^{n'}$ of densities
$\alpha ',\beta ',\gamma ',\delta '$, respectively, a subset
$\Phi '\subset \mathbb {F}_p^{n'}\times \mathbb {F}_p^{n'}$ of the form
\[ \Phi'=\{(x,y)\in A'\times\mathbb{F}_p^{n'}:y\in u'+V_x'\}, \]
$V_x'$ is a subspace of
$\mathbb {F}_p^{n'}$ of codimension
$d'$ (so that
$\Phi '$ has density
$\alpha '\rho '$, where
$\rho ':=p^{-d'}$), and a subset
$S'\subset T'$, where
\[ T':=\{(x,y)\in\mathbb{F}_p^{n'}\times\mathbb{F}_p^{n'}:B'(y)C'(x+y)D'(2x+y)\Phi'(x,y)=1\}, \]
$\sigma +\Omega (\sigma ^{O(1)})$ in
$T'$, such that
\begin{gather*} \|A'-\alpha'\|_{U^{10}(\mathbb{F}_p^{n'})},\quad \|B'-\beta'\|_{U^{10}(\mathbb{F}_p^{n'})},\quad \|C'-\gamma'\|_{U^{10}(\mathbb{F}_p^{n'})},\\ \|D'-\delta'\|_{U^{10}(\mathbb{F}_p^{n'})},\quad \|\Phi'-\alpha'\rho'\|_{U^8(\mathbb{F}_p^{n'}\times\mathbb{F}_p^{n'})}<\varepsilon', \end{gather*}
$\alpha ',\beta ',\gamma ',\delta '\geq (\sigma \alpha \beta \gamma \delta \rho )^{c_2}$, and
$S'$ contains no nontrivial L-shaped configurations.
3. Additional preliminaries
In this section, we present some more preliminaries that were not needed for the outline of the proof of Theorem 1.2, but will be convenient to have for the proof itself. We begin with the notion of Cauchy–Schwarz complexity, first defined by Green and Tao in [Reference Green and TaoGT10].
Definition 3.1 (Cauchy–Schwarz complexity)
Let $\psi _1,\ldots,\psi _d:(\mathbb {F}_p^n)^r\to \mathbb {F}_p^n$ be a collection of linear forms in
$r$ variables. We say that
$\psi _1,\ldots,\psi _d$ has Cauchy–Schwarz complexity at most
$s$ if, for every
$j\in [d]$, there exists a partition of
$\{\psi _1,\ldots,\psi _d\}\setminus \{\psi _j\}$ into at most
$s+1$ subsets such that
$\psi _j$ is not contained in the linear span of any of the subsets.
The smallest $s$ such that
$\{\psi _1,\ldots,\psi _d\}$ has Cauchy–Schwarz complexity at most
$s$ is called the Cauchy–Schwarz complexity of
$\{\psi _1,\ldots,\psi _d\}$.
For example, four term arithmetic progressions,

have Cauchy–Schwarz complexity $2$.
Any system of linear forms of complexity at most $s$ can be shown to be controlled by the
$U^{s+1}$-norm using repeated applications of the Cauchy–Schwarz inequality. In particular, carrying out the proof of the generalized von Neumann theorem of Green and Tao in [Reference Green and TaoGT10] in the finite field model setting (where the technical details are much simpler) produces the following useful result.
Theorem 3.2 Let $\psi _1,\ldots,\psi _d:(\mathbb {F}_p^n)^r\to \mathbb {F}_p^n$ be a collection of linear forms in
$r$ variables with Cauchy–Schwarz complexity at most
$s$. For any
$1$-bounded functions
$f_1,\ldots,f_d:\mathbb {F}_p^n\to \mathbb {C}$, we have

Further, if all of $f_1,\ldots,f_d$ are supported on a set
$A\subset \mathbb {F}_p^n$ of density
$\alpha$ that satisfies
$\|A-\alpha \|_{U^{s+1}(\mathbb {F}_p^n)}<\alpha \varepsilon$, then

We will use the following immediate corollary of Theorem 3.2 numerous times throughout the proof of Theorem 1.2.
Corollary 3.3 Let $\psi _1,\ldots,\psi _d:(\mathbb {F}_p^n)^r\to \mathbb {F}_p^n$ be a collection of linear forms in
$r$ variables with Cauchy–Schwarz complexity at most
$s$. Suppose that
$f_1,\ldots,f_d:\mathbb {F}_p^n\to \mathbb {C}$ are
$1$-bounded functions having average values
$\alpha _1,\ldots,\alpha _d$, respectively, and that

for all $1\leq j\leq d$. Then

We also need the Gowers–Cauchy–Schwarz inequality.
Lemma 3.4 (Gowers–Cauchy–Schwarz inequality)
Let $s$ be a natural number,
$H$ be an abelian group, and
$f_\omega :H\to \mathbb {C}$ for every
$\omega \in \{0,1\}^s$. We have

Next, we record the basic fact that a function on $\mathbb {F}_p^n$ with small
$U^2$-norm has small average on affine subspaces of small codimension.
Lemma 3.5 Let $f:\mathbb {F}_p^n\to \mathbb {C}$ be a
$1$-bounded function satisfying
$\|f\|_{U^2(\mathbb {F}_p^n)}<\varepsilon$ and
$w+V\subset \mathbb {F}_p^n$ be an affine subspace of codimension
$d$. Then

Proof. The indicator function of $V$ can be written as

so we have that

by the Gowers–Cauchy–Schwarz inequality. Since $|V|=|\mathbb {F}_p^n|/p^d$, the desired result follows.
The last lemma of this section will be used to analyze the various averages appearing in the proofs of Lemmas 2.3, 2.4, and 2.5.
Lemma 3.6 Let $\psi _1,\ldots,\psi _d\in \mathbb {F}_p[x_1,\ldots,x_t,y]$ be a collection of linear forms such that the coefficient of
$y$ in each of
$\psi _1,\ldots,\psi _d$ is nonzero, and
$F:(\mathbb {F}_p^n)^t\times \mathbb {F}_p^n\to [0,1]$ be a function of the form

for some $1$-bounded functions
$f_j:\mathbb {F}_p^n\to \mathbb {C}$ with average value
$\beta _j$, each satisfying
$\|f_j-\beta _j\|_{U^s(\mathbb {F}_p^n)}<\varepsilon$. If the Cauchy–Schwarz complexity of the set of linear forms

in the variables $x_1,\ldots,x_t,y,h,k$, is at most
$s-1$, then

Proof. Set $\beta :=\prod _{j=1}^d\beta _j$. Note that
$\mathbb {E}_{\mathbf {x}\in (\mathbb {F}_p^n)^t}\|F(\mathbf {x},\cdot )-\beta \|_{U^2(\mathbb {F}_p^n)}^4$ equals

where

for each $\omega \in \{0,1\}^4$ and
$1\leq i\leq 4$. Since (19) has Cauchy–Schwarz complexity at most
$s-1$ by hypothesis, Corollary 3.3 implies that

for every $\omega \in \{0,1\}^4$. Thus,

Markov's inequality then gives

for every $r>0$. Taking
$r=\varepsilon ^{1/8}$ gives the conclusion of the lemma.
4. Pseudorandomness of
$\Phi$
The main purpose of this section is to show that if $\|\Phi -\alpha \rho \|_{U^{2s+2}(\mathbb {F}_p^n\times \mathbb {F}_p^n)}$ is small and
$\Phi$ is a subset of the form

where each $V_x\leq \mathbf {F}_p^n$ is a subspace of density
$\rho$ in
$\mathbf {F}_p^n$ and
$A\subset \mathbf {F}_p^n$ has density
$\alpha$ in
$\mathbf {F}_p^n$, then, whenever
$\psi _1,\ldots,\psi _r\in \mathbb {F}_p[x_1,\ldots,x_m]$ is a collection of linear forms of Cauchy–Schwarz complexity at most
$s$ and
$w_1,\ldots,w_r\in \mathbb {F}_p^n$, the affine subspaces

typically have maximum possible codimension. This allows us to transform the condition that $\Phi$ is
$U^{8}(\mathbb {F}_p^n\times \mathbb {F}_p^n)$-pseudorandom into a more useful property for evaluating the various averages that arise in the proof of Theorem 1.2.
Lemma 4.1 For each nonnegative integer $s$ and positive integer
$r$, there exist constants
$C_{s,r},c_{s,r}>0$ such that the following holds. Let
$d$ be a nonnegative integer, and set
$\rho :=p^{-d}$. Let
$\delta >0$,
$A\subset \mathbb {F}_p^n$ have density
$\alpha$, and
$\Phi \subset \mathbb {F}_p^n\times \mathbb {F}_p^n$ be a set of the form

where each $V_x\leq \mathbb {F}_p^n$ is a subspace of codimension
$d$. Assume that

Let $\psi _1,\ldots,\psi _r\in \mathbb {F}_p[x_1,\ldots,x_m]$ be a collection of linear forms of Cauchy–Schwarz complexity at most
$s$. Then, for all but at most a
$\delta$-proportion of
$m$-tuples
$\mathbf {x}\in (\mathbb {F}_p^{n})^m$ for which
$\psi _1(\mathbf {x}),\ldots,\psi _r(\mathbf {x})\in A$, we must have that

for all $w_1,\ldots,w_r\in \mathbb {F}_p^n$.
We begin by showing that if $\Phi$ is pseudorandom with respect to the
$U^{s}(\mathbb {F}_p^n\times \mathbb {F}_p^n)$-norm, then
$A$ is pseudorandom with respect to the
$U^{s}(\mathbb {F}_p^n)$-norm. This result will also be useful at a few other points in the proof of Theorem 1.2.
Lemma 4.2 Let $d$ be a nonnegative integer, and set
$\rho :=p^{-d}$. Let
$A\subset \mathbb {F}_p^n$ have density
$\alpha$ and
$\Phi \subset \mathbb {F}_p^n\times \mathbb {F}_p^n$ be of the form

where each $V_x$ is a subspace of
$\mathbb {F}_p^n$ of codimension
$d$. Then, for every natural number
$s$, we have

Proof. We write

and then, for each $\omega \in \{0,1\}^s$, insert the identity

to get that $\|A-\alpha \|_{U^s(\mathbb {F}_p^n)}^{2^s}$ equals

We can average the above quantity over $y,\ell _1,\ldots,\ell _s$ and make the change of variables
$k_\omega \mapsto k_\omega +y+\omega \cdot (\ell _1,\ldots,\ell _s)$ to get that it equals

Applying the Gowers–Cauchy–Schwarz inequality bounds (20) above by

which gives us the conclusion of the lemma.
To prove Lemma 4.1, we will need the notion of an approximate polynomial of bounded degree, which we define using the additive discrete difference operator $\partial$. For
$\phi :H\to H$ and
$h\in H$, define
$\partial _h\phi :H\to H$ by

and, for $h_1,\ldots,h_s\in H$, the
$s$-fold additive difference operator
$\partial _{h_1,\ldots,h_s}$ by

Definition 4.3 Let $H$ be an abelian group,
$A\subset H$, and
$\phi :A\to H$. We say that
$\phi$ is an
$\varepsilon$-approximate polynomial of degree at most
$s-1$ on
$A$ if

for at least an $\varepsilon$-proportion of
$(s+1)$-tuples
$(x,h_1,\ldots,h_s)\in H^{s+1}$ for which
$x+\omega \cdot (h_1,\ldots,h_s)\in A$ for all
$\omega \in \{0,1\}^s$.
We will also need the following result, which is the key combinatorial input into the proof of Lemma 4.1.
Lemma 4.4 For each nonnegative integer $s$, there exist constants
$C_s,c_s>0$ such that the following holds. Let
$A\subset \mathbb {F}_p^n$ have density
$\alpha$, with

and $\phi : A\to \mathbb {F}_p^n$ be a
$\delta$-approximate polynomial of degree at most
$s$ on
$A$. Then, for at least a
$\Omega _s(\delta ^{O_s(1)})$-proportion of
$(2s+2)$-dimensional parallelopipeds

in $A^{2^{2s+2}}$, the derivative of
$\phi$ on the
$(2s+1)$-dimensional face
$(x+\omega \cdot (h_1,\ldots,h_{2s+2}))_{\substack {\omega \in \{0,1\}^{2s+2} \\ \omega _i=\epsilon }}$,

vanishes for all $1\leq i\leq 2s+2$ and
$\epsilon =0,1$.
Proof. We proceed by induction on $s$, beginning with the case
$s=0$. If
$\phi$ is a
$\delta$-approximate polynomial of degree at most
$0$ on
$A$, then
$\mathbb {E}_{x,y\in A}1_{\phi (x)=\phi (y)}\geq \delta$, so that, by the pigeonhole principle, there exists some
$z\in \mathbb {F}_p^n$ for which
$\mu _{A}(\{x\in A:\phi (x)=z\})\geq \delta$. Set
$X:=\{x\in A:\phi (x)=z\}$, and consider the set of quadruples

Note that if $(x,x+h,x+k,x+h+k)\in X'$, then

so certainly the derivatives

of $\phi$ on each of the
$1$-dimensional faces of the parallelopiped
$(x,x+h,x+k,x+h+k)$ vanish. Since

we must have $|X'|\geq (\alpha \delta )^4p^{3n}$. Taking
$c_0=8$, the total number of quadruples
$(x,x+h,x+k,x+h+k)$ in
$A^4$ is
$(\alpha ^4+O(C_0\alpha ^8))p^{3n}$ by Corollary 3.3, which means that
$X'$ consists of at least a
$\delta ^4/2$-proportion of parallelograms
$(x,x+h,x+k,x+h+k)$ in
$A^4$ if
$C_0$ is chosen small enough. Thus, for at least a
$\delta ^4/2$-proportion of parallelograms
$(x,x+h,x+k,x+h+k)$ in
$A^4$, the derivative of
$\phi$ on each
$1$-dimensional face vanishes, as desired.
Now suppose that the result holds for a general degree $s-1\geq 0$, and let
$\phi$ be a
$\delta$-approximate polynomial of degree at most
$s$ on
$A$. By Corollary 3.3,

so that, as long as $C_s$ is sufficiently small and
$c_s$ is sufficiently large, it follows from Markov's inequality that

and, thus,

as well, for all but a $O(\delta ^2)$-proportion of
$h\in \mathbb {F}_p^n$. Thus, for at least a
$\Omega (\delta )$-proportion of
$h_{s+1}\in \mathbb {F}_p^n$, we have
$\|\Delta _{h_{s+1}}A-\alpha ^2\|_{U^{2s}(\mathbb {F}_p^n)}< C_{s-1}(\alpha \delta /2)^{2c_{s-1}}/2$,
$|\mu _{\mathbb {F}_p^n}(A\cap (A-h_{s+1}))-\alpha ^2|< C_{s-1}(\alpha \delta /2)^{2c_{s-1}}/2$, and that the function
$\partial _{h_{s+1}}\phi$ is a
$O(\delta )$-approximate polynomial of degree at most
$s-1$ on
$A\cap (A-h_{s+1})$. Denoting the set of such
$h_{s+1}$ by
$H$, so that
$\mu _{\mathbb {F}_p^n}(H)\gg \delta$, the induction hypothesis then says that, for each
$h\in H$, there are at least a
$\Omega _{s}(\delta ^{O_{s}(1)})$-proportion of
$2s$-dimensional parallelopipeds
$(x+\omega \cdot (h_1,\ldots,h_{2s}))_{\omega \in \{0,1\}^{2s}}$ in
$(A\cap (A-h))^{2^{2s}}$ for which the derivative of
$\partial _h\phi$ on each
$(2s-1)$-dimensional face vanishes.
Summing over all $h\in H$, it follows that, for at least a
$\Omega _{s}(\delta ^{O_s(1)})$-proportion of
$(2s+2)$-tuples
$(x,y,h_1,\ldots,h_{2s})$ for which
$(x+\omega \cdot (h_1,\ldots,h_{2s}))_{\omega \in \{0,1\}^{2s}}$ and
$(y+\omega \cdot (h_1,\ldots,h_{2s}))_{\omega \in \{0,1\}^{2s}}$ are both in
$A^{2^{2s}}$, one has

for all $i=1,\ldots,2s$ and
$\epsilon =0,1$. By Corollary 3.3,

and so, by Markov's inequality,

for all but a $O([C_s(\alpha \delta )^{c_{s}}]^{\Omega (1)})$-proportion of
$(h_1,\ldots,h_{2s})$ in
$(\mathbb {F}_p^n)^{2s}$. By taking
$C_s$ small enough and
$c_s$ large enough, there are therefore at least a
$\Omega _{s}(\delta ^{O_s(1)})$-proportion of
$2s$-tuples
$(h_1,\ldots,h_{2s})$ in
$(\mathbb {F}_p^n)^{2s}$ for which (21) holds and, for at least a
$\Omega _{s}(\delta ^{O_s(1)})$-proportion of pairs
$(x,y)\in A^2$ such that
$(x+\omega \cdot (h_1,\ldots,h_{2s}))_{\omega \in \{0,1\}^{2s}}$ and
$(y+\omega \cdot (h_1,\ldots,h_{2s}))_{\omega \in \{0,1\}^{2s}}$ are both in
$A^{2^{2s}}$, one also has

for all $i=1,\ldots,2s$ and
$\epsilon =0,1$. For each such
$2s$-tuple
$\mathbf {h}$, it follows from the pigeonhole principle that there exists a
$y_{\mathbf {h}}$ in the set

such that, for at least a $\Omega _{s}(\delta ^{O_s(1)})$-proportion of
$x\in A_{\mathbf {h}}$, one has

for all $i=1,\ldots,2s$ and
$\epsilon =0,1$.
Now set $v_{i,\epsilon,\mathbf {h}}:=\partial _{h_1,\ldots,\widehat {h_i},\ldots,h_{2s}}\phi (y_{\mathbf {h}}+\epsilon h_i)$,

so that $\mu _{A_{\mathbf {h}}}(X_{\mathbf {h}})\gg _s\delta ^{O_s(1)}$, and

Note that $(x,x+k,x+k',x+k+k')\in X'_{\mathbf {h}}$ if and only if

for all $i=1,\ldots,2s$,
$\epsilon =0,1$, and
$\omega '\in \{0,1\}^2$. Thus, whenever
$(x,x+k,x+k',x+k+k')\in X'_{\mathbf {h}}$, we have

and

for all $i=1,\ldots,2s$ and
$\epsilon =0,1$. That is, the derivative of
$\phi$ vanishes on all
$(2s+1)$-dimensional faces of the
$(2s+2)$-dimensional parallelopiped
$(x+\omega \cdot (h_1,\ldots,h_{2s},k,k'))_{\omega \in \{0,1\}^{2s+2}}$.
As in the $s=0$ case,

so that

for at least a $\Omega _s(\delta ^{O_s(1)})$-proportion of
$2s$-tuples
$\mathbf {h}$. Each ordered quadruple
$(x,\mathbf {h},k,k')$ for which
$(x,x+k,x+k',x+k+k')\in X_{\mathbf {h}}'$ corresponds to a unique
$(2s+2)$-dimensional parallelopiped

in $A^{2s+2}$. Thus,

In comparison, the number of $(2s+2)$-dimensional parallelopipeds in
$A$ is

by Corollary 3.3. The conclusion of the lemma now follows as long as $C_s$ is sufficiently small and
$c_s$ is sufficiently large.
With a bit more work, it is possible to prove a version of Lemma 4.4 with $(2s+2)$-dimensional parallelopipeds replaced by
$(s+2)$-dimensional parallelopipeds (which is optimal), and thus a version of Lemma 4.1 with the
$U^{2s+2}$-norm replaced by the
$U^{s+2}$-norm, but this would make a negligible difference in Theorem 1.2.
Now we can prove Lemma 4.1.
Proof of Lemma 4.1 We proceed by induction on $r$ and
$s$, beginning with the
$r=1$,
$s=0$ caseFootnote 1. Since
$\operatorname {codim}\{y\in \mathbb {F}_p^n:\Phi (x,y)=1\}=d$ for all
$x\in A$, certainly

for all $\mathbf {x}\in (\mathbb {F}_p^n)^m$ for which
$\psi _1(\mathbf {x})\in A$, and this case follows trivially without even needing the assumption that
$\|\Phi -\alpha \rho \|_{U^2(\mathbb {F}_p^n\times \mathbb {F}_p^n)}$ is small.
Now let $r\geq 2$ or
$s\geq 1$, and assume that the result holds for all pairs of integers
$(r',s')$ satisfying:
(i)
$0\leq r'< r$ and
$1\leq s'\leq s$; or
(ii)
$1\leq s'< s$;
and let $C'$ be at most the minimum of
$C_{r',s'}$ and
$c'$ be at least the maximum of
$c_{r',s'}$ over all such pairs with
$r'<\max (r,2^{s+1})$. As long as
$\|\Phi -\alpha \rho \|_{U^{2s+2}(\mathbb {F}_p^n\times \mathbb {F}_p^n)}< C'(\alpha \delta \rho ^{2r}/2)^{c'2^r}$, it follows from the induction hypothesis that for all but a
$O(\delta )$-proportion of
$\mathbf {x}\in (\mathbb {F}_p^n)^m$ for which
$\psi _1(\mathbf {x}),\ldots,\psi _r(\mathbf {x})\in A$, we must have

for all $w_1,\ldots,w_r\in \mathbb {F}_p^n$. If the codimension of some
$\{y\in \mathbb {F}_p^n:\prod _{i=1}^{r}\Phi (\psi _i(\mathbf {x}),y+w_i)=1\}$ is not
$rd$ for one of these typical
$\mathbf {x}$, then it is either
$n$ or at most
$rd-1$, which means that

in either case, since $p\geq 2$. Squaring both sides of (22), multiplying by
$\prod _{i=1}^rA(\psi _i(\mathbf {x}))$, averaging over all
$\mathbf {x}\in (\mathbb {F}_p^n)^m$, swapping the order of summation, and applying Lemma 4.2 (to deduce the uniformity of
$A$) and Theorem 3.2 yields

By Hölder's inequality, we then have

It follows from this, the induction hypothesis, our assumption that $\|\Phi -\alpha \rho \|_{U^{2s+2}(\mathbb {F}_p^n\times \mathbb {F}_p^n)}< C'(\alpha \delta \rho ^{2r}/2)^{c'2^r}$, and Lemma 4.2 that

since the Cauchy–Schwarz complexity of any proper subset of

is at most $s-1$.
Thus, by taking $C'$ sufficiently small and
$c'$ sufficiently large, we get that

for at least an $\Omega _s((\delta \rho )^{O_s(1)})$-proportion of
$(s+2)$-tuples
$(x,h_1,\ldots,h_{s+1})\in (\mathbb {F}_p^n)^{s+2}$ for which
$(x+\omega \cdot (h_1,\ldots,h_{s+1}))_{\omega \in \{0,1\}^{s+1}}\in A^{2^{s+1}}$. The condition (23) implies that, for each such
$(x,h_1,\ldots,h_{s+1})$, there exist vectors
$v_{x,\mathbf {h},\omega }\in V^{\perp }_{x+\omega \cdot \mathbf {h}}$,
$\omega \in \{0,1\}^{s+1}$, not all of which are zero, such that

Fix a basis $\{\gamma _{z,1},\ldots,\gamma _{z,d}\}$ of
$V_z^\perp$ for each
$z\in \mathbb {F}_p^n$. We can write every vector
$v_{x,\mathbf {h},\omega }$ in terms of this basis, giving us that

for some $2^{s+1}d$-tuple of constants
$(b_{x,\mathbf {h},\omega,j})_{\omega \in \{0,1\}^{s+1},\ 1\leq j\leq d}$, not all of which are zero.
We apply the pigeonhole principle to deduce that there is some $2^{s+1}d$-tuple of constants
$(b_{\omega,j})_{\omega \in \{0,1\}^{s+1},\ 1\leq j\leq d}$, not all of which are zero, such that

for at least a $\Omega _s((\delta \rho )^{O_s(1)})$-proportion of
$(s+2)$-tuples
$(x,h_1,\ldots,h_{s+1})\in (\mathbb {F}_p^n)^{s+2}$ for which
$(x+\omega \cdot (h_1,\ldots,h_{s+1}))_{\omega \in \{0,1\}^{s+1}}\in A^{2^{s+1}}$. Defining

for each $\omega \in \{0,1\}^{s+1}$, the above says that, for at least a
$\Omega _s((\delta \rho )^{O_s(1)})$-proportion of
$(s+2)$-tuples
$(x,h_1,\ldots,h_{s+1})\in (\mathbb {F}_p^n)^{s+2}$ for which
$(x+\omega \cdot (h_1,\ldots,h_{s+1}))_{\omega \in \{0,1\}^{s+1}}\in A^{2^{s+1}}$, we must have

where at least one of the functions $\phi _{\omega }:\mathbb {F}_p^n\to \mathbb {F}_p^n$ does not have the zero vector in its image (because
$\phi _\omega (z)$ is always a nontrivial linear combination of linearly independent vectors). Let
$\phi$ be any such
$\phi _\omega$. It then follows by applying Corollary 3.3 to the inside average of

that $\partial _{h_1,\ldots,h_{s+1}}\phi (x)=0$ for at least a
$\Omega _s((\delta \rho )^{O_s(1)})$-proportion of
$(s+2)$-tuples
$(x,h_1,\ldots,h_{s+1})\in (\mathbb {F}_p^n)^{s+2}$ for which
$(x+\omega \cdot (h_1,\ldots,h_{s+1}))_{\omega \in \{0,1\}^{s+1}}\in A^{2^{s+1}}$, again provided that
$C'$ is sufficiently small and
$c'$ is sufficiently large. That is,
$\phi$ is a
$\Omega _s((\delta \rho )^{O_s(1)})$-approximate polynomial of degree at most
$s$ on
$A$.
If $C'$ is small enough and
$c'$ is large enough, Lemmas 4.2 and 4.4, then imply that for at least a
$\Omega _{s}((\alpha \delta \rho )^{O_s(1)})$-proportion of
$(2s+2)$-dimensional parallelopipeds
$(x+\omega \cdot (h_1,\ldots,h_{2s+2}))_{\omega \in \{0,1\}^{2s+2}}$ in
$A^{2^{2s+2}}$, the derivative of
$\phi$ on each
$(2s+1)$-dimensional face vanishes, i.e.

for all $i=1,\ldots,2s+2$ and
$\epsilon =0,1$. Call the set of
$(2s+3)$-tuples
$(x,h_1,\ldots,h_{2s+2})$ corresponding to such
$(2s+2)$-dimensional parallelopipeds
$X$. Consider
$\|\Phi -\rho A\|_{U^{2s+2}(\mathbb {F}_p^n\times \mathbb {F}_p^n)}^{2^{2s+2}}$, which, plugging in the expression

for $\Phi -\rho A$, equals

The above has size $\gg _s(p-1)(\alpha \rho )^{2^{2s+2}}(\alpha \delta \rho )^{O_s(1)}\gg _s(\alpha \delta \rho )^{O_s(1)}$, coming from the contribution of
$v_\omega =\lambda \phi (x+\omega \cdot (h_1,\ldots,h_{2s+2}))$ for each
$\lambda \in \mathbb {F}_p^\times$ and
$(x,h_1,\ldots,h_{2s+2})\in X$. On the other hand, we have

so that

Taking $C_{s,r}$ sufficiently small and
$c_{s,r}$ sufficiently large will thus yield a contradiction if

for some $w_1,\ldots,w_r\in \mathbb {F}_p^n$ for a
$\delta$-proportion of
$\mathbf {x}\in (\mathbb {F}_p^n)^m$ for which
$\psi _1(\mathbf {x}),\ldots,\psi _r(\mathbf {x})\in A$.
5. Control by directional uniformity norms
This section is devoted to proving Lemmas 2.3 and 2.4. Our arguments mostly consist of careful, repeated applications of the Cauchy–Schwarz inequality to ensure that there is no loss of density factors and using the results of §§ 3 and 4 to analyze the resulting averages. As a simple warm-up, we begin by showing that if $T\subset \mathbb {F}_p^n\times \mathbb {F}_p^n$ has the form (11) and
$B,C,$ and
$D$ are sufficiently pseudorandom, then
$T$ has density close to the product density
$\alpha \beta \gamma \delta \rho$.
Lemma 5.1 Let $d$ be a nonnegative integer, and set
$\rho :=p^{-d}$. Let
$\varepsilon >0$ and assume that
$A,B,C,D\subset \mathbb {F}_p^n$ have densities
$\alpha,\beta,\gamma,$ and
$\delta$, respectively, and satisfy

and that $\Phi \subset \mathbb {F}_p^n\times \mathbb {F}_p^n$ takes the form

where each $V_x$ is a subspace of
$\mathbb {F}_p^n$ of codimension
$d$. Then the set
$T\subset \mathbb {F}_p^n\times \mathbb {F}_p^n$ defined by (11) has density

Proof. The density of $T$ in
$\mathbb {F}_p^n\times \mathbb {F}_p^n$ can be written as

where $F(x,y):=B(y)C(x+y)D(2x+y)$. Set
$L(x,y):=\{y,x+y,2x+y\}$. Lemma 3.6 says that

since

has Cauchy–Schwarz complexity at most $3$. Thus, Lemma 3.5 yields

for all but a $O(\sqrt {\varepsilon })$-proportion of
$x\in \mathbb {F}_p^n$, so that

Now we can prove Lemma 2.3.
Proof of Lemma 2.3 The quantity of interest $\Lambda (T,T,T,S)$ is

which, after a change of variables, can be written as

where $\mu (x,y)$ equals

We will show that $\mu (x,y)$ is very close to the constant value
$\alpha \beta ^2\gamma ^2\delta ^2\rho ^2$ for almost every pair
$(x,y)\in \mathbb {F}_p^n\times \mathbb {F}_p^n$, from which it will then follow that
$\Lambda (T,T,T,S)$ is close to
$\sigma \alpha ^2\beta ^3\gamma ^3\delta ^3\rho ^3$.
The first moment $\mathbb {E}_{x,y}\mu (x,y)$ equals

Applying Lemma 3.6 yields

where

It therefore follows from Lemma 3.5 that

for all but a $O(\sqrt {\varepsilon })$-proportion of
$(x,y)\in \mathbb {F}_p^n\times \mathbb {F}_p^n$. Thus,

By arguing as in the proof of Lemma 5.1, we have

and, thus, conclude that

The second moment $\mathbb {E}_{x,y}\mu (x,y)^2$ equals

Applying Lemma 3.6 again yields

where

and applying Lemma 4.1 yields

for all $y\in \mathbb {F}_p^n$. Thus, by Lemma 3.5,

for all but a $O(\varepsilon ^{\Omega (1)}/\rho ^{O(1)})$-proportion of
$(x,x+h,y)\in A\times A\times \mathbb {F}_p^n$, so that

By Lemmas 3.6 and 4.1 again, we have

and

where

so that

by Lemma 3.5. Using Corollary 3.3 to estimate $\mathbb {E}_{x,h}\Delta _{-h}A(x)$, we thus conclude that

Our estimates for the first and second moments of $\mu$ imply that
$\mu$ has variance
$\mathbb {E}_{x,y}|\mu (x,y)-\alpha \beta ^2\gamma ^2\delta ^2\rho |^2\ll \varepsilon ^{\Omega (1)}/\rho ^{O(1)}$. It follows that

When $c_1$ is sufficiently small and
$c_2$ is sufficiently large, this gives the desired lower bound for
$\Lambda (T,T,T,S)$.
To finish this section, we prove Lemma 2.4.
Proof of Lemma 2.4 We prove (16), (17), and then (18), proceeding in decreasing order of the number of applications of Cauchy–Schwarz required. For (16), we make the change of variables $z\mapsto z-x-y$ to write
$\Lambda (f_0,f_1,f_2,f_3)$ as

which, by applying the Cauchy–Schwarz inequality in the $x$ and
$z$ variables, has modulus squared bounded by

The first factor equals $\alpha \beta \gamma \delta \rho +O(\varepsilon ^{1/8})$ by Lemma 5.1. Expanding the square and making a change of variables, the second factor equals

By another application of the Cauchy–Schwarz inequality in the $y,z,$ and
$h_1$ variables, the modulus squared of this is at most

times

The first factor (24) can be estimated in the same manner as the averages appearing in the proof of Lemma 2.3, and equals $\alpha ^2\beta ^2\gamma \delta ^2\rho ^2+O(\varepsilon ^{\Omega (1)}/\rho ^{O(1)})$. Expanding the square and making a change of variables, we get that the second factor equals

A final application of the Cauchy–Schwarz inequality in the $x,y,h_1,$ and
$h_2$ variables bounds the modulus squared of this by

times

By Lemmas 3.6 and 4.1, we have

for all $y\in \mathbb {F}_p^n$, and

where

and

It then follows from Lemma 3.5 that (25) equals

which equals

It remains to relate

to $\|f_0\|_{\star _1}$. Expanding the square and making a change of variables yields

which can be written as

where

We will show that, for almost every $5$-tuple
$(x,y,h_1,h_2,h_3)\in (\mathbb {F}_p^n)^5$ for which
$x,x+h_2\in A$,
$\mu (x,y,h_1,h_2,h_3)$ is very close to the constant value
$\alpha ^4\beta ^8\gamma ^6\delta ^6\rho ^{6}$. Indeed, the first moment
$\mathbb {E}_{\substack {x,y,h_1,h_2,h_3\\ x,x+h_2\in A}}\mu (x,y,h_1,h_2,h_3)$ is

Lemmas 3.6 and 4.1 tell us that

and

where

so it follows from Lemma 3.5 that the first moment equals

which equals

by Corollary 3.3. The second moment $\mathbb {E}_{\substack {x,y,h_1,h_2,h_3 \\ x,x+h_2\in A}}\mu (x,y,h_1,h_2,h_3)^2$ is

which, noting that

we can write as

Lemmas 3.6 and 4.1, analogously to the case of the first moment, tell us that

and

where

so it follows from Lemma 3.5 that the second moment equals $(\alpha ^2+O(\varepsilon ))^{-1}$ times

To estimate the main term of (26), we note that

and apply Lemmas 3.6 and 4.1 again to get that

and

where

Thus, by Lemma 3.5, we have that

equals

which equals

by Corollary 3.3. It therefore follows that the second moment is

Thus,

and we conclude that

Putting everything together gives

as desired.
For (17), we make a change of variables and apply the Cauchy–Schwarz inequality to bound $|\Lambda (T,f_1,f_2,f_3)|^2$ by

Expanding the square in the second quantity and making a change of variables yields

By applying the Cauchy–Schwarz inequality again in the $x,y,$ and
$h_1$ variables, the modulus squared of this is bounded above by

times

Expanding the square and making a change of variables, the second factor equals

which we can write as

where

The quantity (27) and the weight $\mu '$ can be analyzed in the same manner as the corresponding quantity and weight in the proof of (16), so that

Finally, for (18), we make a change of variables and apply the Cauchy–Schwarz inequality to bound $|\Lambda (T,T,f_2,f_3)|^2$ by
$\mu _{\mathbb {F}_p^n\times \mathbb {F}_p^n}(T)$ times

Expanding the square in the second quantity and making a change of variables yields

which can be written as

where

This weight can again be analyzed in the same manner as the weights appearing in the proof of (16), giving

and completing the proof of the lemma.
6. Obtaining a density increment
As was mentioned in § 2, one ingredient in the proof of Theorem 2.5 is an inverse theorem for the $U^2(\Phi (x,\cdot ))$-norm localized to pseudorandom sets. Applying the standard, nonlocalized inverse theorem for the
$U^2(\Phi (x,\cdot ))$-norm would yield a density increment that gets weaker as
$T$ becomes sparser, which is inadequate to close the density-increment iteration. To get a strong enough localized version of this inverse theorem, we will need to use the transference principle. The particular instance of it required is an immediate consequence of the dense model lemma from [Reference ZhaoZha14], which appears as Lemma 3.3 in that paper.
Lemma 6.1 (Dense model lemma for the
$U^s$-norm on subspaces)
For every natural number $s$, there exists a constant
$c_s>0$ such that the following holds. Let
$\varepsilon >0$,
$V\leq \mathbb {F}_p^n$ be a subspace, and
$f,\nu :V\to [0,\infty )$ be functions satisfying:
(i)
$0\leq f\leq \nu$;
(ii)
$\mathbb {E}_{x\in V}f(x)\leq 1$; and
(iii)
$\|\nu -1\|_{U^s(V)}\leq \exp (-\varepsilon ^{-c_s})$.
Then there exists a $\tilde {f}:V\to [0,1]$ such that
$\mathbb {E}_{x\in V}f(x)=\mathbb {E}_{x\in V}\tilde {f}(x)$ and
$\|f-\tilde {f}\|_{U^s(V)}\leq \varepsilon$.
One can see that this lemma is a consequence of Zhao's lemma by using the Gowers–Cauchy–Schwarz inequality to translate between his $(s,\varepsilon )$-discrepancy pair condition and our
$U^s$-uniformity condition.
In the course of the proof of Theorem 2.5, we will encounter various averages of linear forms that turn out to be controlled by certain degree $1$ and
$2$ directional uniformity norms. Because of this, we will also need to obtain a density increment when these norms of the balanced function
$g_S=S-\sigma$ are large. The first two subsections of this section are devoted to proving that this is possible.
6.1 Results on degree
$1$ norms
We first show that the relevant fibers of any set of the form (11) typically have close to their average density, provided that $A,B,C,D,$ and
$\Phi$ are sufficiently pseudorandom.
Lemma 6.2 Let $d$ be a nonnegative integer, and set
$\rho :=p^{-d}$. Suppose that
$A,B,C,D\subset \mathbb {F}_p^n$ have densities
$\alpha,\beta,\gamma,\delta$, respectively, and that
$\Phi \subset \mathbb {F}_p^n\times \mathbb {F}_p^n$ has density
$\alpha \rho$ in
$\mathbb {F}_p^n\times \mathbb {F}_p^n$ and takes the form

where each $V_x$ is a subspace of
$\mathbb {F}_p^n$ of codimension
$d$. Let
$\varepsilon >0$ and assume that

Define $T$ by (11). Then

and

for any $\varepsilon '>0$.
Proof. From Lemma 5.1, we have $\mathbb {E}_{x\in A}\mu _{\mathbb {F}_p^n}(T(x,\cdot ))=\beta \gamma \delta \rho +O(\varepsilon ^{1/8}/\alpha )$. For the second moment, we argue as in § 5 to estimate

using Lemmas 3.5 and 3.6, giving $\mathbb {E}_{x\in A}\mu _{\mathbb {F}_p^n}(T(x,\cdot ))^2=\beta ^2\gamma ^2\delta ^2\rho ^2+O(\varepsilon ^{1/8}/\alpha )$. It now follows from Markov's inequality that

for all $\varepsilon '>0$. The three other estimates are proved analogously.
Now we can obtain a density increment when the degree $1$ uniformity norms controlled by
$\|\cdot \|_{\star _1}$ and
$\|\cdot \|_{\star _2}$ are large. The proof is essentially an averaging argument, like the proof of the analogous Lemma 3.1 in [Reference GreenGre05a]. The most substantial new feature, which will arise many times in this section, is that we must now verify that the set on which we claim to have found a density increment actually has close to the correct density in
$\mathbb {F}_p^n\times \mathbb {F}_p^n$. In the case of corners, any product set
$A\times B$ trivially has density equal to the product of the densities of
$A$ and
$B$. This is not, in general, the case for sets of the form (11) unless further assumptions are made about
$A,B,C,D,$ and
$\Phi$.
Lemma 6.3 There exist absolute constants $0< c_1<1< c_2$ such that the following holds. Let
$d$ be a nonnegative integer, and set
$\rho :=p^{-d}$. Suppose that
$A,B,C,D\subset \mathbb {F}_p^n$ have densities
$\alpha,\beta,\gamma,\delta$, respectively, and that
$\Phi \subset \mathbb {F}_p^n\times \mathbb {F}_p^n$ takes the form

where each $V_x$ is a subspace of
$\mathbb {F}_p^n$ of codimension
$d$. Let
$\tau >0$ and
$\varepsilon \leq c_1(\tau \alpha \beta \gamma \delta \rho )^{c_2}$, and assume that

Define $T$ by (11), and let
$S\subset T$ have density
$\sigma$ in
$T$. Suppose that


or

Then $S$ has density at least
$\sigma +\Omega (\tau ^{O(1)})$ on some subset
$T'$ of
$T$ of the form

where the densities of $A',B',C'\subset \mathbb {F}_p^n$ are
$\Omega (\tau ^{O(1)}\alpha )$,
$\Omega (\tau ^{O(1)}\beta )$, and
$\Omega (\tau ^{O(1)}\gamma )$, respectively.
Proof. The assumption (28) can be written as

Since $\mu _{\mathbb {F}_p^n}(T(x,\cdot ))>2\beta \gamma \delta \rho$ for at most a
$O(\varepsilon ^{1/8}/\alpha ^3\beta ^2\gamma ^2\delta ^2\rho ^2)$-proportion of
$x\in A$ by Lemma 6.2, it follows that, as long as
$c_2$ is sufficiently large, there exists a subset
$A_0\subset A$ of relative density at least
$\tau /4$ for which

for all $x\in A_0$, provided that
$c_1$ is small enough and
$c_2$ is large enough. Note that
$\mathbb {E}_{y}g_S(x,y)$ is a real number, and thus is either positive or negative. There must therefore exist a subset
$A_1\subset A_0$ of density at least
$1/2$ in
$A_0$ such that either

for every $x\in A_1$ or

for every $x\in A_1$.
In the first case, setting $A'=A_1$ and
$\alpha '=\mu _{\mathbb {F}_p^n}(A')$, we have

by Lemma 6.2 whenever $c_1$ is small enough and
$c_2$ is large enough, so that

Since $\mathbb {E}_{x,y}A'(x)B(y)C(x+y)D(2x+y)\Phi (x,y)=\alpha '\beta \gamma \delta \rho +O(\varepsilon ^{1/16}/\alpha )$ by Lemma 6.2, the conclusion of the lemma follows when
$c_1$ is small enough and
$c_2$ is large enough by taking
$B'=B$.
In the second case, setting $A'=A\setminus A_1$ and
$\alpha '=\mu _{\mathbb {F}_p^n}(A')$, we use the fact that
$\mathbb {E}_{x,y}g_S(x,y)=0$ to deduce that

whenever $c_1$ is small enough and
$c_2$ is large enough. Note that
$\mathbb {E}_{x,y}A'(x)B(y)C(x+y)D(2x+y)\Phi (x,y)=\alpha '\beta \gamma \delta \rho +O(\varepsilon ^{1/16}/\alpha )$ in this case as well by Lemma 6.2, so the conclusion of the lemma will follow as long as
$\alpha '=\Omega (\tau ^{O(1)}\alpha )$. But this also follows from Lemma 6.2, for we have

The proof of the lemma starting from the assumptions (29) and (30) is essentially identical, but using the second and third probability estimates from Lemma 6.2, respectively.
6.2 Results on degree
$2$ norms
To obtain a density increment when the relevant degree $2$ directional uniformity norms of
$g_S$ are large, we first need to show that certain degree
$2$ ‘inner products’ are controlled by the degree
$1$ directional uniformity norms studied in the previous subsection.
Lemma 6.4 Let $d$ be a nonnegative integer, and set
$\rho :=p^{-d}$. Suppose that
$A,B,C,D\subset \mathbb {F}_p^n$ have densities
$\alpha,\beta,\gamma,\delta$, respectively, and that
$\Phi \subset \mathbb {F}_p^n\times \mathbb {F}_p^n$ takes the form

where each $V_x$ is a subspace of
$\mathbb {F}_p^n$ of codimension
$d$. Let
$\varepsilon >0$ and assume that

Define $T$ by (11), and let
$S\subset T$ and
$\tau >0$.
If

where $g_\omega$ equals
$T$ or
$g_S$ for all
$\omega \in \{0,1\}^2$, at least one
$g_\omega$ equals
$g_S$, and at least one
$g_\omega$ equals
$T$, then

If

where $g_\omega$ equals
$T$ or
$g_S$ for all
$\omega \in \{0,1\}^2$, at least one
$g_\omega$ equals
$g_S$, and at least one
$g_\omega$ equals
$T$, then

or

If

where $g_\omega$ equals
$T$ or
$g_S$ for all
$\omega \in \{0,1\}^2$, at least one
$g_\omega$ equals
$g_S$, and at least one
$g_\omega$ equals
$T$, then

or

Proof. We rewrite the various assumptions that (31), (32), and (33) hold when at least two of the $g_\omega$ equal
$T$ as

and

where

and

Using the definition (11) of $T$ and arguing as in § 5 using Lemmas 3.5, 3.6, and 4.1 gives that each of
$\mu _1,\ldots,\mu _8$ is typically very close to its average value on its support. Precisely, we have the estimates

and

which imply that

and

Since $|\mathbb {E}_{x,y}g_S(x,y)|^2\leq \alpha \mathbb {E}_{x,y,h}g_S(x,y)g_S(x,y+h)$ by an application of the Cauchy–Schwarz inequality, the conclusion of the lemma easily follows starting from any one of the assumptions (31), (32), or (33) when at least two of the
$g_\omega$ equal
$T$.
To prove the lemma when only one $g_\omega$ in (31), (32), or (33) equals
$T$, we will apply the Cauchy–Schwarz inequality once, and then argue analogously. By making a change of variables, we may start from the assumption that

or

We apply the Cauchy–Schwarz inequality in each of these three cases to get that

times $\mathbb {E}_{x,y,h}\Delta _{(0,h)}T(x,y)$ is at least
$\tau ^2\alpha ^2\beta ^8\gamma ^8\delta ^8\rho ^6$,

times $\mathbb {E}_{x,y,h}\Delta _{(h,0)}T(x,y)$ is at least
$\tau ^2\alpha ^4\beta ^4\gamma ^8\delta ^8\rho ^8$, or

times $\mathbb {E}_{x,y,h}\Delta _{(0,h)}T(x,y)$ is at least
$\tau ^2\alpha ^4\beta ^8\gamma ^4\delta ^8\rho ^8$.
We have

and

by Lemmas 3.5, 3.6, and 4.1. Expanding the square and making a change of variables, (34) equals

where

(35) equals

where

and (36) equals

where

Analogously to the weights $\mu _1,\ldots,\mu _8$, Lemmas 3.5, 3.6, and 4.1 give the estimates

and

from which it follows that

and

This completes the proof of the lemma.
Now we are almost ready to prove our desired density-increment result for the localized degree $2$ directional uniformity norms controlled by
$\|\cdot \|_{\star _1}$.
Lemma 6.5 There exist absolute constants $0< c_1<1< c_2,c_3$ such that the following holds. Let
$d$ be a nonnegative integer, and set
$\rho :=p^{-d}$. Suppose that
$A,B,C,D\subset \mathbb {F}_p^n$ have densities
$\alpha,\beta,\gamma,\delta$, respectively, and that
$\Phi \subset \mathbb {F}_p^n\times \mathbb {F}_p^n$ takes the form

where each $V_x$ is a subspace of
$\mathbb {F}_p^n$ of codimension
$d$. Let
$\tau >0$ and

and assume that

Define $T$ by (11), and let
$S\subset T$ have density
$\sigma$ in
$T$. Suppose that

or

Then $S$ has density at least
$\sigma +\Omega (\tau ^{O(1)})$ on some subset
$T'$ of
$T$ of the form

where the densities of $A',B'\subset \mathbb {F}_p^n$ are both
$\Omega ((\sigma \tau \alpha \beta \gamma \delta \rho )^{O(1)})$, and
$\Phi '$ is of the form

where each $V_x'$ is a subspace of
$\mathbb {F}_p^n$ of codimension
$d+1$.
To prove this lemma starting from the assumption (37), we will apply a localized $U^2(\Phi (x,\cdot ))$-norm inverse theorem for many fixed
$x$, which will produce a density increment on a set of the form

where

This is not yet what the conclusion of the lemma promises, since $u'_x$ varies with
$x$. To show that we can select a fixed
$u'$ (at the cost of shrinking the size of
$A'$ a bit) we will need Lemma 6.6.
The reader may wonder why we cannot just run the density-increment argument on sets of the form (39) and skip having to prove Lemma 6.6. The issue with this hypothetical proof is that $U^s(\mathbb {F}_p^n\times \mathbb {F}_p^n)$-uniformity of a set of the form
$\Psi$ is not strong enough to guarantee that the analogue of Lemma 4.1 is true, regardless of how large
$s$ is taken to be. Thus, such sets are not as amenable to a Shkredov-like pseudorandomization procedure.
Lemma 6.6 There exist absolute constants $0< c_1<1< c_2$ such that the following holds. Let
$d$ be a nonnegative integer, and set
$\rho :=p^{-d}$. Let
$A\subset \mathbb {F}_p^n$ have density
$\alpha$,
$\Psi \subset \mathbb {F}_p^n\times \mathbb {F}_p^n$ take the form

where $u_{x}\in \mathbb {F}_p^n$ and each
$V_{x}$ is a subspace of
$\mathbb {F}_p^n$ of codimension
$d$, and
$K\subset A\times \mathbb {F}_p^n$ satisfy

for every $x\in A$ and every affine subspace
$H$ of
$\mathbb {F}_p^n$ of codimension
$d$. Let
$\tau >0$, assume that
$\varepsilon \leq c_1(\tau \alpha \beta \gamma \delta \rho )^{c_2}$, and suppose that
$S\subset \mathbb {F}_p^n\times \mathbb {F}_p^n$ has density at least
$\sigma +\tau$ in
$K\cap \Psi$, where
$\sigma >0$. Then there exists a
$u\in \mathbb {F}_p^n$ and a subset
$A'\subset A$ such that
$S\cap T'$ has density at least
$\sigma +\tau /4$ in
$T'$, where

and $\mu _{\mathbb {F}_p^n\times \mathbb {F}_p^n}(A')\geq \alpha \rho \tau /2$.
Proof. Define

and

for every $u\in \mathbb {F}_p^n$, and set
$\alpha _{u}:=\mu _{\mathbb {F}_p^n}(A_{u})$, so that
$\mathbb {E}_u\alpha _u=\alpha \rho$. By our assumption on
$K$, we have

and

for all $u\in \mathbb {F}_p^n$.
Note that

and set

for each $u\in \mathbb {F}_p^n$. Then we have

where

so that

when $c_1$ is small enough and
$c_2$ is large enough. The contribution to
$\eta$ coming from
$u$ for which
$\alpha _u<\tau \alpha \rho /2$ is obviously at most
$\tau \alpha \rho /2$, which implies that

which is clearly positive. We thus conclude that there must exist a $u\in \mathbb {F}_p^n$ for which
$\alpha _u=\mu _{\mathbb {F}_p^n}(A_{u})\geq \tau \alpha \rho /2$ and

The conclusion of the lemma now follows by taking $A'=A_{u}$ and
$\Phi '=\Psi _{u}\cap (A'\times \mathbb {F}_p^n)$.
Now we can prove Lemma 6.5.
Proof of Lemma 6.5 First assume that (37) holds. By writing $S=g_S+\sigma T$, we see that
$\mathbb {E}_{x,y,h,k}\Delta _{(0,h),(0,k)}S(x,y)$ equals

plus $14$ terms of the form

where at least one $g_\omega$ equals
$g_S$ and at least one other equals
$\sigma T$, and

If one of the terms (40) has absolute value larger than $\tau \alpha \beta ^4\gamma ^4\delta ^4\rho ^3/32$, then combining Lemmas 6.4 and 6.3 produces the desired density increment. Thus, we may proceed under the assumption that all have size at most
$\tau \alpha \beta ^4\gamma ^4\delta ^4\rho ^3/32$, so that

For each $x\in \mathbb {F}_p^n$, set
$\Phi _x(y):=\Phi (x,y)$,
$T_x(y):=T(x,y)$, and
$S_x(y)=S(x,y)$. Lemmas 3.5, 3.6, and 4.1 tell us that

and that

or else Lemma 6.3 will again give the desired density increment. It follows that there exists a subset $A_0\subset A$ of density
$\gg \tau$ in
$A$ such that

and

for every $x\in A_0$. Setting

for all $x\in A_0$ we then have
$\|f_x\|^4_{U^2(\Phi _x)}\geq \sigma ^4+\tau /4$,
$0\leq f_x\leq \nu _x$,
$\mathbb {E}_{y}f_x(y)\leq 1$, and

provided that $c_1$ is small enough and
$c_2$ is large enough. Thus, as long as
$c_3$ is sufficiently large, Lemma 6.1 tells us that there exists a function
$\tilde {f}_x:\Phi _x\to [0,1]$ such that
$\mathbb {E}_{y\in \Phi _x}f_x(y)=\mathbb {E}_{y\in \Phi _x}\tilde {f}_x(y)$ and
$\|f_x-\tilde {f}_x\|_{U^2(\Phi _x)}^4\leq \tau /32$. As a consequence, since
$\|f_x\|^4_{U^2(\Phi _x)}\geq \sigma ^4+ {\tau }/{4}$, we must have

as well. Set $\tilde {\sigma }_x:=\mathbb {E}_{y}f_x(y)=\mathbb {E}_{y}\tilde {f}_x(y)$ and let
$v_x\in \Phi _x$, so that

Since $|\tilde {\sigma }_x-\sigma |<\tau /64$, it follows that there exists a nonzero
$\xi _x\in \widehat {\Phi _x-v_x}$ such that

where we have crucially used that $\tilde {f}_x$ is
$1$-bounded. As
$\|f_x-\tilde {f}_x\|_{U^2(\Phi _x)}^4\leq \tau /32$, it therefore follows that

for every $x\in A_0$.
Extend $x\mapsto \xi _x$ from
$A_0$ to
$A$ by picking a nonzero
$\xi _x\in \widehat {\Phi _x-v_x}$ arbitrarily for all
$x\in A\setminus A_0$. We now split the average over
$y\in \Phi _x$ above into an average of averages over cosets of
$\langle \xi _x\rangle ^\perp$ in
$\Phi _x$ and average over all of
$A$ to get that

and use the fact that

to deduce that

By applying the pigeonhole principle in the $x$ and
$t$ variables, it follows that there exists a subset
$A_1\subset A$ of density
$\gg \tau$ in
$A$ and, for each
$x\in A_1$, an element
$t_x\in \mathbb {F}_p$ for which

Thus, recalling the definition of $f_x$, we have

Define $\phi :\mathbb {F}_p^n\to \mathbb {F}_p^n$ by taking
$\phi (x)=\xi _x$ for all
$x\in A$ and
$\phi (x)$ to be an arbitrary element of
$(\Phi _x-v_x)^\perp \setminus \{0\}$ for all
$x\in \mathbb {F}_p^n\setminus A$, and similarly extend
$x\mapsto t_x$ from
$A_1$ to
$A$ by taking
$t_x$ to be an arbitrary element of
$\mathbb {F}_p$ for which

for all $x\in A\setminus A_1$. Such an element must exist by the pigeonhole principle. Set

so that $\operatorname {codim}\{y\in \mathbb {F}_p^n:\Psi (x,y)=1\}=d+1$ for every
$x\in A$, and
$\alpha _1:=|A_1|/p^n$. Then (41) can be rewritten as

It remains to check that the density $\mathbb {E}_{x,y}A_1(x)B(y)C(x+y)D(2x+y)\Psi (x,y)$ is close to
$\alpha _1\beta \gamma \delta \rho /p$, so that we indeed have the desired density increment. But by Lemmas 3.5 and 3.6, we have

for all but a $O(\sqrt {\varepsilon })$-proportion of
$x\in A$, from which it follows that

Thus, $S$ has density at least
$\sigma +\Omega (\tau )$ on

provided that $c_2$ is large enough. The conclusion of the lemma now follows from Lemma 6.6
Now suppose that (38) holds. By writing $S=g_S+\sigma T$ and arguing as in the first case, we may proceed under the assumption that

We will first show that either

for almost every pair $(x',y')\in S$, or else we can deduce the desired density increment using Lemma 6.3.
Consider the average

Using that $S=g_S+\sigma T$, the above can be written as

plus seven other terms of the form

where $g_0,g_1,$ and
$g_2$ all equal
$g_S$ or
$\sigma T$ and at least one
$g_i$ equals
$g_S$. By Lemmas 3.5, 3.6, and 4.1, the quantity (42) equals
$\sigma ^3\alpha ^2\beta ^2\gamma ^4\delta ^4\rho ^4+O(\varepsilon ^{\Omega (1)}/\rho ^{O(1)})$. Suppose that
$k$ of the functions
$g_0,g_1,$ and
$g_2$ in (43) equal
$\sigma T$. By a similar argument to those used to prove Lemma 6.4, if any term of the form (43) has size at least
$\tau ^4\sigma ^k\alpha ^2\beta ^2\gamma ^4\delta ^4\rho ^4$, then

or

so that the desired density increment follows from Lemma 6.3. Thus, we may proceed under the assumption that

Now consider the average

Using that $S=g_S+\sigma T$, the above can be written as

plus 31 other terms of the form

where $g_0,g_1,g_2,g_3,$ and
$g_4$ all equal
$g_S$ or
$\sigma T$ and at least one
$g_i$ equals
$g_S$.
By Lemmas 3.5, 3.6, and 4.1, the quantity (44) equals $\sigma ^5\alpha ^3\beta ^3\gamma ^7\delta ^7\rho ^7+O(\varepsilon ^{\Omega (1)}/\rho ^{O(1)})$. Suppose that
$k$ of the functions
$g_0,\ldots,g_4$ in (45) equal
$\sigma T$. Analogously to the situation for the first moment, if any of the terms of the form (45) has size at least
$\tau ^8\sigma ^k\alpha ^3\beta ^3\gamma ^7\delta ^7\rho ^7$, then we will be able to deduce the desired density increment. The most involved case is when
$g_0=\cdots =g_4=g_S$. All other cases can be handled using a simpler version of the argument we are about to carry out.
Thus, consider this most involved case, i.e. that

has size at least $\tau ^8\alpha ^3\beta ^3\gamma ^7\delta ^7\rho ^7$. Applying the Cauchy–Schwarz inequality in the variables
$x',y',y,$ and
$w$ gives that

times

is at least $\tau ^8\alpha ^{6}\beta ^{6}\gamma ^{14}\delta ^{14}\rho ^{14}$. By Lemmas 3.5, 3.6, and 4.1,
$\mathbb {E}_{x',y',y,w}T(x',y')T(x',y)T(x',w)=\alpha \beta ^3\gamma ^3\delta ^3\rho ^3+O(\varepsilon ^{\Omega (1)}/\rho ^{O(1)})$. Expanding the square in the average above, this means that

is $\gg \tau ^8\alpha ^5\beta ^3\gamma ^{11}\delta ^{11}\rho ^{11}$, provided that
$c_1$ is small enough and
$c_2$ is large enough. We can write the above as

where

Yet more applications of Lemmas 3.5, 3.6, and 4.1 give the estimate

so that

It follows from one more application of the Cauchy–Schwarz inequality in the variables $y',z,u,$ and
$v$ and a similar analysis to above that

which, combined with Lemma 6.3, gives the desired density increment.
Thus, we may also proceed under the assumption that

By Markov's inequality, we therefore have

for all but a $O(\tau ^2)$-proportion of
$(x',y')\in S$. As a consequence, there exists a pair
$(x',y')\in S$ for which both (46) holds and

which together imply that

when we take $A'(x)=S(x,y')$,
$B'(y)=S(x',y)$,
$C'=C$,
$D'=D$, and
$\Phi '=\Phi$ in the definition of
$T'$.
6.3 More preliminaries for
$\|\cdot \|_{\star _1}$
We will similarly need that certain $\|\cdot \|_{\star _1}$-inner products are controlled by the degree
$1$ and
$2$ directional uniformity norms appearing in the previous subsections. The proof of the lemma below is similar to the proof of Lemma 6.4, but with an extra application of the Cauchy–Schwarz inequality in some cases.
Lemma 6.7 Let $d$ be a nonnegative integer, and set
$\rho :=p^{-d}$. Suppose that
$A,B,C,D\subset \mathbb {F}_p^n$ have densities
$\alpha,\beta,\gamma,\delta$, respectively, and that
$\Phi \subset \mathbb {F}_p^n\times \mathbb {F}_p^n$ takes the form

where each $V_x$ is a subspace of
$\mathbb {F}_p^n$ of codimension
$d$. Let
$\varepsilon >0$, and assume that

Define $T$ by (11), and let
$S\subset T$ and
$\tau >0$. If

where at least one $g_\omega$ equals
$T$ and another equals
$g_S$, then

or

Our final preliminary lemma says that, for almost every $(x,x+h_3)\in A^2$, the function
$\Delta _{(h_3,0)}S(x,\cdot )$ is supported on a Fourier uniform subset of the affine subspace
$\{y\in \mathbb {F}_p^n:\Delta _{(h_3,0)}\Phi (x,y)=1\}$.
Lemma 6.8 Let $d$ be a nonnegative integer, and set
$\rho :=p^{-d}$. Suppose that
$A,B,C,D\subset \mathbb {F}_p^n$ have densities
$\alpha,\beta,\gamma,\delta$, respectively, and that
$\Phi \subset \mathbb {F}_p^n\times \mathbb {F}_p^n$ takes the form

where each $V_x$ is a subspace of
$\mathbb {F}_p^n$ of codimension
$d$. Let
$\varepsilon >0$, and assume that

Setting

and

then the probability

is $\ll \varepsilon ^{\Omega (1)}/\rho ^{O(1)}$.
Proof. By Lemmas 3.6 and 4.1, we have that $\|R_{x,y}-\beta \gamma ^2\delta ^2\|_{U^2(\mathbb {F}_p^n)}\geq \varepsilon ^{1/8}$ or
$\operatorname {codim}\{y\in \mathbb {F}_p^n:\Phi _{x,h}(y)=1\}\neq 2d$ for at most a
$O(\varepsilon ^{\Omega (1)}/\rho ^{O(1)})$-proportion of pairs
$(x,x+h)\in A^2$. For all of these typical pairs
$(x,h)$, we have

which is at most

by inserting the identity

for every $z\in \mathbb {F}_p^n$. For each fixed triple
$(\xi,\eta,\nu )$, the interior average above is bounded by
$\|R_{x,h}-\beta \gamma ^2\delta ^2\|_{U^2(\mathbb {F}_p^n)}$ by the Gowers–Cauchy–Schwarz inequality. Since there are
$\rho ^{-6}$ possible triples to sum over, we must have

from which the conclusion of the lemma follows.
6.4 Proof of Theorem 2.5
Now we can finally finish the proof of Theorem 2.5.
Proof of Theorem 2.5 First assume that

Analogously to the proof of Lemma 6.5, by writing $S=g_S+\sigma T$, we see that
$\|S\|_{\star _1}^8$ equals

plus 62 terms of the form

where at least one $g_\omega$ equals
$\sigma T$ and at least one other equals
$g_S$. Note that

by Lemmas 3.5, 3.6, and 4.1. If any of the terms (47) have absolute value at least $\frac {1}{128}\tau ^8\alpha ^2\beta ^4\gamma ^8\delta ^8\rho ^6$, then combining Lemma 6.7 with Lemma 6.3 or Lemma 6.5 produces the desired density increment. We may thus proceed under the assumption that these terms are all small, so that

For each pair $(x,x+h)\in A^2$, let
$R_{x,h}$ and
$\Phi _{x,h}$ be as in Lemma 6.8, and set
$S_{x,h}(y):=\Delta _{(h,0)}S(x,y)$. By Markov's inequality, either

or else $\mathbb {E}_{x,x+h\in A}|\mathbb {E}_{y}S_{x,h}(y)-\sigma ^2\beta \gamma ^2\delta ^2\rho ^2|^2$ is
$\gg \tau ^{16}\beta ^2\gamma ^4\delta ^4\rho ^4$, in which case a combination of Lemmas 6.3, 6.4, and 6.5 will produce the desired density increment. We may thus proceed under the assumption that (48) holds.
By (48) and Lemma 6.8, there exists a subset $A_0\subset \{(x,h)\in \mathbb {F}_p^n\times \mathbb {F}_p^n:x,x+h\in A\}$ of density
$\gg \tau ^{O(1)}\alpha ^2$ in
$\mathbb {F}_p^n\times \mathbb {F}_p^n$ such that

and $\operatorname {codim}\{y\in \mathbb {F}_p^n:\Phi _{x,h}(y)=1\}=2d$ for all
$(x,h)\in A_0$. Note that
$S_{x,h}$ is supported on
$R_{x,h}\Phi _{x,h}$, and, setting

we have $\|f_{x,h}\|^4_{U^2(\Phi _{x,h})}\geq \sigma ^8+\tau ^8/4$,
$0\leq f_{x,h}\leq \nu _{x,h}$,
$\mathbb {E} f_{x,h}\leq 1$,
$|\mathbb {E} f_{x,h}-\sigma ^2|<\tau ^8/64$, and
$\|\nu _{x,h}-1\|_{U^2(\Phi _{x,h})}<\varepsilon ^{1/32}/\beta \gamma ^2\delta ^2\rho ^{3/2}<\exp (-(64/\tau )^{c_3})$, provided that
$c_1$ is small enough and
$c_2$ is large enough. Applying Lemma 6.1 on the affine subspace
$\Phi _{x,h}$ yields a function
$\tilde {f}_{x,h}:\Phi _{x,h}\to [0,1]$ such that

provided that $c_3$ is large enough. We must then also have

Arguing as in the proof of the first part of Lemma 6.5, it follows that, for every pair $(x,h)\in A_0$, there exists a nonzero
$\xi _{x,h}\in \widehat {\Phi _{x,h}-u}$ such that

so that

as well.
Extend $(x,h)\mapsto \xi _{x,h}$ from
$A_0$ to the set
$\{(x,h)\in \mathbb {F}_p^n\times \mathbb {F}_p^n:x,x+h\in A\}$ by picking a nonzero
$\xi _{x,h}\in \widehat {\Phi _{x,h}-u}$ arbitrarily for all pairs outside of
$A_0$. We split the average over
$y\in \Phi _{x,h}$ up into an average of averages over cosets of
$\langle \xi _{x,h}\rangle ^\perp$ and average over all pairs
$(x,h)$ such that
$x,x+h\in A$ to get that

for some absolute constant $c>0$. Note that

so that, if it were the case that

then we would be able to deduce the desired density increment from Lemma 6.3. Thus, we may proceed under the assumption

which we can combine with (49), a change of variables, and an application of the pigeonhole principle in the $t$ variable to deduce that

for some fixed $t\in \mathbb {F}_p$. Set

To deduce the desired density increment by applying the pigeonhole principle to (50), we will have to show that almost every set of the form

has close to the ‘correct’ density. Most of the remainder of our argument for $\|\cdot \|_{\star _1}$ is devoted to this task.
Set $Q_{x,h}(y):=S(h,y)C(x+y)D(2x+y)$. We start by showing that either
$\|Q_{x,h}-\sigma \beta \gamma ^2\delta ^2\|_{U^2(\Phi _{x,h-x})}$ is small for almost every pair
$(x,h)\in A\times A$, or else we can deduce a density increment from Lemmas 6.3 and 6.5. Note first that

where $\mu (y):=\mathbb {E}_{x}A(x)C(x+y)D(2x+y)\Phi (x,y)$. By Lemmas 3.5, 3.6, and 4.1, we have

so that $|\mathbb {E}_{x,h,y}A(x)A(h)\Phi _{x,h-x}(y)Q_{x,h}(y)-\sigma \alpha ^2\beta \gamma ^2\delta ^2\rho ^2|<\varepsilon ^{\Omega (1)}/\rho ^{O(1)}$. Similarly, the average
$\mathbb {E}_{x,h}A(x)A(h)|\mathbb {E}_yQ_{x,h}(y)\Phi _{x,h-x}(y)-\sigma \beta \gamma ^2\delta ^2\rho ^2|^2$ equals

and $\mathbb {E}_{x,y,h,k}A(x)A(h)\Phi _{x,h-x}(y)\Phi _{x,h-x}(y+k)Q_{x,h}(y)Q_{x,h}(y+k)$ equals

where

By Lemmas 3.5, 3.6, and 4.1, we have

so that

equals

Thus, if

for some $c'>0$ (to be chosen later), and
$c_1$ is small enough and
$c_2$ is large enough, then

in which case the desired density increment follows from Lemma 6.3. We may thus proceed under the assumption that

so that, by Markov's inequality, we have

It follows that $\mathbb {E}_{x,h}A(x)A(h)\|Q_{x,h}-\sigma \beta \gamma ^2\delta ^2\|_{U^2(\Phi _{x,h-x})}^4$ equals


If

then we could deduce the desired density increment from Lemma 6.5. Thus, we may proceed under the assumption that this inequality does not hold, which implies that $\mathbb {E}_{x,h}A(x)A(h)\|Q_{x,h}-\sigma \beta \gamma ^2\delta ^2\|_{U^2(\Phi _{x,h-x})}^4 \ll \tau ^{4c'}\alpha ^2\beta ^4\gamma ^8\delta ^8$. It then follows from Markov's inequality that

Next, we will show that, for typical $(x,h)\in A\times A$, the average size of
$S(x,y)S(h,y)\Psi _{h}(x,y)$ is not very large. Certainly,

for every $(x,h)\in A\times A$. Setting

Lemma 3.6 says that

Thus, by Lemma 3.5, as long as $c_2$ is large enough, we have

say.
Now, setting $c'=8c$, the contribution to the left-hand side of (50) coming from pairs
$(x,h)\in A\times A$ for which
$\|Q_{x,h}-\sigma \beta \gamma ^2\delta ^2\|_{U^2(\Phi _{x,h-x})}\gg \tau ^{16c}\beta \gamma ^2\delta ^2$ or
$|\mathbb {E}_y S(x,y)S(h,y)\Psi _h(x,y)|>100\beta \gamma ^2\delta ^2\rho ^2/p$ is
$\ll \tau ^{16c}$. Thus, by the pigeonhole principle, there exists an
$h\in A$ and a subset
$A'\subset A$ of density
$\alpha '\gg \tau ^{O(1)}\alpha$ in
$\mathbb {F}_p^n$ such that

and $\|Q_{x,h}-\sigma \beta \gamma ^2\delta ^2\|_{U^2(\Phi _{x,h-x})}\ll \tau ^{16c}\beta \gamma ^2\delta ^2$ for every
$x\in A'$. Recalling the definition of
$f_{x,h}$, the above displayed equation says that

We now use Lemma 6.6 to find a subset $A''\subset A'$ of density
$\alpha ''\gg \alpha '\rho \tau ^{O(1)}$ and a
$u'\in \mathbb {F}_p^n$ such that

where

This gives the conclusion of the lemma.
The proofs of the two remaining cases are very similar to those appearing in earlier subsections, so we will be more brief in our arguments. Next, assume that

As in the second part of the proof of Lemma 6.5, we may proceed under the assumption that

and use it to show that either

for almost every pair $(x',y')$ for which
$(-x',x'+y')\in S$, or else the desired density increment follows from Lemma 6.3. By Lemmas 3.5, 3.6, and 4.1 and the Cauchy–Schwarz inequality, either

and

or else we have the desired density increment from Lemma 6.3. It then follows from Markov's inequality that

for all but a $\tau /4$-proportion of pairs
$(x',y')$ for which
$(-x',x'+y')\in S$, which means that we can find such a pair for which we also have

so that we get the desired density increment by taking $A'=S(x,y'-x)$,
$B'=B$,
$C'=S(-x',x+y+x')$,
$D'=D$, and
$\Phi '=\Phi$ in the definition of
$T'$.
Now assume that

which means

By Lemma 6.2 and the pigeonhole principle, there exists a subset $D_0\subset D$ of density at least
$\tau ^2/2$ for which

whenever $z\in D_0$. As in the proof of Lemma 6.3, there is a subset
$D_1\subset D_0$ of density at least
$1/2$ in
$D_0$ such that either
$\mathbb {E}_{2x+y=z}g_S(x,y)\geq \tau \alpha \beta \gamma \rho /4$ or
$\mathbb {E}_{2x+y=z}g_S(x,y)\leq -\tau \alpha \beta \gamma \rho /4$ for all
$z\in D_1$. As in the proof of Lemma 6.3, we can simply take
$D'=D_1$ and
$D'=D\setminus D_1$ in these cases, respectively, since, by Lemma 6.2, the fibers
$\{(x,y)\in T:2x+y=z\}$ typically have very close to their average size.
7. Pseudorandomization
This section begins with yet more preliminaries. To prove Lemma 2.6, we will need a result of Cohen and Tal, which says that, for any finite set of polynomials in $\mathbb {F}_p[x_1,\ldots,x_m]$, one can find a partition of
$\mathbb {F}_p^m$ into affine subspaces of relatively large dimension on which all of the polynomials are constant.
Theorem 7.1 (Cohen and Tal [Theorem 3.6, Reference Cohen and TalCT15] specialized to prime fields)
Let $d,m,$ and
$t$ be natural numbers. There exists a positive integer
$m'$ satisfying

such that, for any polynomials $P_1,\ldots,P_t\in \mathbb {F}_p[x_1,\ldots,x_m]$ of degree at most
$d$, there is a partition of
$\mathbb {F}_p^m$ into affine subspaces of dimension
$m'$ such that
$P_1,\ldots,P_t$ are constant on each affine subspace.
To see that this formulation of Cohen and Tal's theorem is equivalent to Theorem 3.6 of [Reference Cohen and TalCT15], note that one can make all of the affine subspaces in the partitions produced by their theorem have the same dimension by simply partitioning each subspace not of the minimum possible dimension into more subspaces.
We will also need a ‘bilinear’ version of Cohen and Tal's result, which we will use to find partitions of $\mathbb {F}_p^n\times \mathbb {F}_p^n$ into product spaces of the form
$(u+V)\times (w+V)$ on which polynomials in two sets of variables
$x_1,\ldots,x_m,y_1,\ldots,y_m$ are constant.
Corollary 7.2 Let $d$ and
$m$ be natural numbers. There exists a positive integer
$m'$ satisfying

such that, for any polynomial $R\in \mathbb {F}_p[x_1,\ldots,x_m,y_1,\ldots,y_m]$ of degree at most
$d$, there is a partition of
$\mathbb {F}_p^m\times \mathbb {F}_p^m$ into products of affine subspaces of the form

with each $\dim V= m'$, such that
$R$ is constant on each product of affine subspaces.
Proof. Write

where $P_0,\ldots,P_d,Q_0,\ldots,Q_d\in \mathbb {F}_p[z_1,\ldots,z_m]$ satisfy
$\deg {P_i}\leq i$ and
$\deg {Q_i}\leq d-i$ for all
$0\leq i \leq d$. Applying Theorem 7.1 to
$P_0,\ldots,P_d$ gives us a positive integer
$m'_0\gg {m^{1/(d-1)!}}/{(d+1)^e}$ and a partition

of $\mathbb {F}_p^m$, with
$\dim {V_j}=m'_0$ for each
$j\in J$, such that
$P_0,\ldots,P_d$ are all constant on each affine subspace
$u_j+V_j$.
Now write

Since restricting a polynomial to a subspace cannot increase its degree, for each $j\in J$, we can apply Theorem 7.1 on each affine subspace
$w+V_j$ to the polynomial

to get that there exists a positive integer $m'$ satisfying

and a partition

with each subspace $V_k'\leq V_j$ having
$\dim V_k'=m'$, such that
$Q_{j,w}$ is constant on each
$w_k+V_k'$.
Thus, we can write

where $R$ is constant on each product
$(u_j+V_j)\times (w_k+V_k')$. Since
$V_k'\leq V_j$ we can further refine this partition to one of the desired form

on each part of which $R$ is still constant.
Finally, we will recall the recent quantitative inverse theorem of Gowers and Milićević for the $U^s$-norms on vector spaces over finite fields.
Theorem 7.3 (Gowers and Milićević [Theorem 7, Reference Gowers and MilićevićGM20])
Let $s$ be a natural number and assume that
$p\geq s$. There exist constants
$c_s,c_{s,p}'>0$ so that, if
$f:\mathbb {F}_p^m\to \mathbb {C}$ is a
$1$-bounded function satisfying

then there exists a polynomial $P\in \mathbb {F}_p[x_1,\ldots,x_m]$ of degree
$\deg {P}\leq s-1$ such that

7.1 Proof of Lemma 2.6
Our proof of Lemma 2.6 is modeled after the corresponding pseudorandomization arguments in [Reference ShkredovShk06a, Reference ShkredovShk06b] and [Reference GreenGre05a]. As was said in the outline, some new features arise from our desire for $B',C',D',$ and
$\Phi '$ to be uniform with respect to
$U^s$-norms of degree greater than
$2$ and from
$\Phi$'s particular structure as a union of affine subspaces of the second factor of
$\mathbb {F}_p^n\times \mathbb {F}_p^n$. The first of these two complications can be overcome by using Theorem 7.3 and then Theorem 7.1 or Corollary 7.2. We will discuss how
$\Phi$'s structure influences our proof shortly.
The proof of Lemma 2.6 proceeds via an energy-increment argument. Each step of the energy-increment iteration will produce a partition $\mathscr {C}_{j}=(\mathcal {C}_{i,j})_{i\in I_j}$ of
$\mathbb {F}_p^n\times \mathbb {F}_p^n$ into cells
$\mathcal {C}_{i,j}$ of the form

for some subspace $V_{i,j}\leq \mathbb {F}_p^n$. For each cell
$\mathcal {C}=(u+V)\times (w+V)$ in a partition
$\mathscr {C}$, we set
–
$B_{\mathcal {C}}:= B\cap (w+V)$,
–
$C_{\mathcal {C}}:= C\cap (u+w+V)$,
–
$D_{\mathcal {C}}:= D\cap (2u+w+V)$, and
–
$\Phi _{\mathcal {C}}:=\Phi \cap \mathcal {C}$,
and, correspondingly,
–
$\beta (\mathcal {C}):=\mu _{w+V}(B_{\mathcal {C}})$,
–
$\gamma (\mathcal {C}):=\mu _{u+w+V}(C_{\mathcal {C}})$,
–
$\delta (\mathcal {C}):=\mu _{2u+w+V}(D_{\mathcal {C}})$, and
–
$\phi (\mathcal {C}):=\mu _{\mathcal {C}}(\Phi _{\mathcal {C}})$,
so that

Analogously to the pseudorandomization procedure for corners, we will show that if $\|B_{\mathcal {C}}-\beta (\mathcal {C})\|_{U^8(\mathbb {F}_p^n)},\|C_{\mathcal {C}}-\gamma (\mathcal {C})\|_{U^8(\mathbb {F}_p^n)},$ or
$\|D_{\mathcal {C}}-\delta (\mathcal {C})\|_{U^8(\mathbb {F}_p^n)}$ is large for a substantial portion of cells
$\mathcal {C}$ in a partition
$\mathscr {C}$, then there exists a refinement
$\mathscr {C}'$ of
$\mathscr {C}$ which has substantially larger energy. But even if
$\Phi _{\mathcal {C}}$ is a union of affine subspaces of the same codimension
$d$ for most cells in the partition, this may not be the case for the
$\Phi _{\mathcal {C}'}$ corresponding to the cells
$\mathcal {C}'$ in the refinement
$\mathscr {C}'$. The codimensions of the affine subspaces
$\{y\in w'+V':\Phi _{\mathcal {C}'}(x,y)=1\}$ can range from
$0$ to
$d$, so before even considering how to obtain a pseudorandom
$\Phi '$, we have already found an obstacle to even getting a set of the same general form as
$\Phi '$.
To get around this issue, we will pseudorandomize each of the sets

for $0\leq i\leq d$, instead of just
$\Phi _{\mathcal {C}}=\Phi ^{\leq d}_{\mathcal {C}}$ itself. The definition (54) of
$\Phi ^{\leq i}_{\mathcal {C}}$ selects all of the subspaces of the second factor of
$\mathcal {C}$ comprising
$\Phi _{\mathcal {C}}$ that have codimension at most
$i$. This will pseudorandomize each of

as well. At the end of the proof of Lemma 2.6, we will use an averaging argument to choose $\Phi '$ to be some suitable
$\Phi _{\mathcal {C}}^i$.
For each $0\leq i\leq d$, set
$\phi ^{\leq i}(\mathcal {C}):=\mu _{\mathcal {C}}(\Phi _{\mathcal {C}}^{\leq i})$. We define the energy
$E(\mathscr {C})$ of a partition
$\mathscr {C}$ of
$\mathbb {F}_p^n\times \mathbb {F}_p^n$ by

Note that the energy of any partition is bounded above by $1$. We will now prove a couple of lemmas concerning this energy.
Lemma 7.4 Let $m'>m$ and
$d$ be nonnegative integers,
$A,B,C,D\subset \mathbb {F}_p^n$,
$\Phi \subset \mathbb {F}_p^n\times \mathbb {F}_p^n$ be of the form

where each $V_x$ is a subspace of
$\mathbb {F}_p^n$ of codimension between
$0$ and
$d$,
$\mathscr {C}$ be a partition of
$\mathbb {F}_p^n\times \mathbb {F}_p^n$ with each cell
$\mathcal {C}$ taking the form

for some subspace $V_{\mathcal {C}}\leq \mathbb {F}_p^n$ of codimension
$m$, and suppose that
$\mathscr {C}'$ is a refinement of
$\mathscr {C}$ with each cell
$\mathcal {C}'$ taking the form

for some subspace $V'_{\mathcal {C}'}$ of codimension
$m'$. Then

and

for every $0\leq i\leq d$.
Proof. Note that it suffices to prove the result with the sum over $\mathcal {C}\in \mathscr {C}$ restricted to a single cell
$\mathcal {C}_0$ and the sum over
$\mathcal {C}'\in \mathscr {C}'$ restricted to all cells contained in
$\mathcal {C}_0$, since one can then just sum over
$\mathcal {C}_0$ in
$\mathscr {C}$ to get the desired result. Thus, we may assume without loss of generality that
$\mathscr {C}$ is the trivial partition
$\{\mathbb {F}_p^n\times \mathbb {F}_p^n\}$.
Let $\beta,\gamma,$ and
$\delta$ denote the densities of
$B,C,$ and
$D$, respectively, in
$\mathbb {F}_p^n$. Since
$\mathbb {F}_p^n\times B$ has density
$\beta$ in
$\mathbb {F}_p^n\times \mathbb {F}_p^n$, we have

so that, by the Cauchy–Schwarz inequality,

as desired, since

for all cells $\mathcal {C}'$ of
$\mathscr {C}'$. Similarly, since
$\{(x,y)\in \mathbb {F}_p^n\times \mathbb {F}_p^n:x+y\in C\}$ has density
$\gamma$ in
$\mathbb {F}_p^n\times \mathbb {F}_p^n$ and
$\{(x,y)\in \mathbb {F}_p^n\times \mathbb {F}_p^n:2x+y\in D\}$ has density
$\delta$ in
$\mathbb {F}_p^n\times \mathbb {F}_p^n$, we have

and

it follows again from the Cauchy–Schwarz inequality that

and

The argument for the $\phi ^{\leq i}$ is also essentially identical, but with one small difference. For ease of notation, set
$\Phi ^{\leq i}:=\Phi ^{\leq i}_{\mathbb {F}_p^n\times \mathbb {F}_p^n}$ and
$\phi ^{\leq i}:=\phi ^{\leq i}(\mathbb {F}_p^n\times \mathbb {F}_p^n)$ for all
$0\leq i\leq d$. Then we actually have

instead of equality, since $\Phi ^{\leq i}\cap \mathcal {C}'\subset \Phi ^{\leq i}_{\mathcal {C}'}$ (which is why we run the energy-increment argument with the
$\Phi ^{\leq i}$, instead of the
$\Phi ^i$). It therefore follows yet again from the Cauchy–Schwarz inequality that

for all $0\leq i\leq d$.
Lemma 7.5 Let $m$ and
$d$ be nonnegative integers,
$A,B,C,D\subset \mathbb {F}_p^n$,
$\Phi \subset \mathbb {F}_p^n\times \mathbb {F}_p^n$ be of the form

where each $V_x$ is a subspace of
$\mathbb {F}_p^n$ of codimension between
$0$ and
$d$, and
$\mathscr {C}$ be a partition of
$\mathbb {F}_p^n\times \mathbb {F}_p^n$ with each cell
$\mathcal {C}$ taking the form

for some subspace $V_{\mathcal {C}}\leq \mathbb {F}_p^n$ of dimension
$m$. There exists a positive integer

and positive integers $c,c'_p>0$, such that the following holds.
Let $\mathcal {C}\in \mathscr {C}$.
(i) If
$\|B_{\mathcal {C}}-\beta (\mathcal {C})\|_{U^{10}(w_\mathcal {C}+V_{\mathcal {C}})}\geq \varepsilon$, then there exists a partition
$\mathscr {C}'_{\mathcal {C}}$ of
$\mathcal {C}$ with each cell
$\mathcal {C}'$ taking the form
\[ \mathcal{C}'=(u_{\mathcal{C}'}'+V_{\mathcal{C}'}')\times(w_{\mathcal{C}'}'+V_{\mathcal{C}'}'), \]
$\dim V_{C'}'=m'$, such that
\[ \sum_{\mathcal{C}'\in\mathscr{C}'_{\mathcal{C}}}\beta(\mathcal{C}')^2 \mu_{\mathcal{C}}(\mathcal{C}')\geq\beta(\mathcal{C})^2+\Omega\bigg(\frac{1}{\exp^c(c'_p/\varepsilon)^2}\bigg). \]
(ii) If
$\|C_{\mathcal {C}}-\gamma (\mathcal {C})\|_{U^{10}(u_{\mathcal {C}}+w_\mathcal {C}+V_{\mathcal {C}})}\geq \varepsilon$, then there exists a partition
$\mathscr {C}'_{\mathcal {C}}$ of
$\mathcal {C}$ with each cell
$\mathcal {C}'$ taking the form
\[ \mathcal{C}'=(u_{\mathcal{C}'}'+V_{\mathcal{C}'}')\times(w_{\mathcal{C}'}'+V_{\mathcal{C}'}'), \]
$\dim V_{C'}'=m'$, such that
\[ \sum_{\mathcal{C}'\in\mathscr{C}'_{\mathcal{C}}}\gamma(\mathcal{C}')^2 \mu_{\mathcal{C}}(\mathcal{C}')\geq\gamma(\mathcal{C})^2+\Omega\bigg(\frac{1}{\exp^c(c'_p/\varepsilon)^2}\bigg). \]
(iii) If
$\|D_{\mathcal {C}}-\delta (\mathcal {C})\|_{U^{10}(2u_{\mathcal {C}}+w_\mathcal {C}+V_{\mathcal {C}})}\geq \varepsilon$, then there exists a partition
$\mathscr {C}'_{\mathcal {C}}$ of
$\mathcal {C}$ with each cell
$\mathcal {C}'$ taking the form
\[ \mathcal{C}'=(u_{\mathcal{C}'}'+V_{\mathcal{C}'}')\times(w_{\mathcal{C}'}'+V_{\mathcal{C}'}'), \]
$\dim V_{C'}'=m'$, such that
\[ \sum_{\mathcal{C}'\in\mathscr{C}'_{\mathcal{C}}}\delta(\mathcal{C}')^2 \mu_{\mathcal{C}}(\mathcal{C}')\geq\delta(\mathcal{C})^2+\Omega\bigg(\frac{1}{\exp^c(c'_p/\varepsilon)^2}\bigg). \]
(iv) Let
$0\leq i\leq d$. If
$\|\Phi ^{\leq i}_{\mathcal {C}}-\phi ^{\leq i} (\mathcal {C})\|_{U^{8}(\mathcal {C})}\geq \varepsilon$, then there exists a partition
$\mathscr {C}'_{\mathcal {C}}$ of
$\mathcal {C}$ with each cell
$\mathcal {C}'$ taking the form
\[ \mathcal{C}'=(u_{\mathcal{C}'}'+V_{\mathcal{C}'}')\times(w_{\mathcal{C}'}'+V_{\mathcal{C}'}'), \]
$\dim V_{C'}'=m'$, such that
\[ \sum_{\mathcal{C}'\in\mathscr{C}'_{\mathcal{C}}}\phi^{\leq i}(\mathcal{C}')^2 \mu_{\mathcal{C}}(\mathcal{C}')\geq\phi^{\leq i}(\mathcal{C})^2+\Omega\bigg(\frac{1}{\exp^c(c'_p/\varepsilon)^2}\bigg). \]
Proof. Let $c$ and
$c'_p$ denote the constants
$c_{10}$ and
$c_{10,p}'$, respectively, from Theorem 7.3, and
$m'$ denote the smaller of the two minimum values of
$m'$ appearing in Theorem 7.1 when we take
$m$ as in this lemma,
$d=9$, and
$t=1$ and
$m'$ appearing in Corollary 7.2 when we take
$m$ as in this lemma and
$d=7$.
First assume that $\|B_{\mathcal {C}}-\beta (\mathcal {C})\|_{U^{10}(w_{\mathcal {C}}+V_{\mathcal {C}})}\geq \varepsilon$. Then applying Theorem 7.3 with
$s=10$ yields a polynomial
$P\in \mathbb {F}_p[x_1,\ldots,x_m]$ of degree at most
$9$ such that

By Theorem 7.1, there exists a partition $(w_i+V_i)_{i\in I}$ of
$w_{\mathcal {C}}+V_{\mathcal {C}}$ into affine subspaces of
$w_{\mathcal {C}}+V_{\mathcal {C}}$ of dimension
$m'$, on each of which
$P$ is constant. Thus, by the triangle inequality,

so that, by the Cauchy–Schwarz inequality,

Expanding the square, this means that

Now we partition the whole cell of interest $\mathcal {C}$ by writing

and, for each $i\in I$, use that
$V_i\leq V_{\mathcal {C}}$ to partition
$u_{\mathcal {C}}+V_{\mathcal {C}}$ into cosets of
$V_i$ to get

Since $\mu _{\mathcal {C}}(\mathcal {C}')=|\mathscr {C}'|^{-1}$ for each
$\mathcal {C}'\in \mathscr {C}_{\mathcal {C}}'$, (55) reads

The arguments for $C_{\mathcal {C}}$ and
$D_{\mathcal {C}}$ are again analogous, but we include them for the sake of completeness. Next, assume that
$\|C_{\mathcal {C}}-\gamma (\mathcal {C})\|_{U^{10}(u_{\mathcal {C}}+w_{\mathcal {C}}+V_{\mathcal {C}})}\geq \varepsilon$. Applying Theorem 7.3 yields a polynomial
$P\in \mathbb {F}_p[x_1,\ldots,x_m]$ of degree at most
$9$ such that

By Theorem 7.1, there exists a partition $(v_i+V_i)_{i\in I}$ of
$u_{\mathcal {C}}+w_{\mathcal {C}}+V_{\mathcal {C}}$ into affine subspaces of
$u_{\mathcal {C}}+w_{\mathcal {C}}+V_{\mathcal {C}}$ of dimension
$m'$ on each of which
$P$ is constant. Thus, by the Cauchy–Schwarz inequality again,

Now we partition the whole cell $\mathcal {C}$ by writing

Since $(v_i-w_{\mathcal {C}}+v'+V_i)+(w_{\mathcal {C}}-v'+V_i)=v_i+V_i$, we conclude that

Now assume that $\|D_{\mathcal {C}}-\delta (\mathcal {C})\|_{U^{10}(2u_{\mathcal {C}}+w_{\mathcal {C}}+V_{\mathcal {C}})}\geq \varepsilon$. Applying Theorem 7.3 yields a polynomial
$P\in \mathbb {F}_p[x_1,\ldots,x_m]$ of degree at most
$9$ such that

By Theorem 7.1, there exists a partition $(v_i+V_i)_{i\in I}$ of
$2u_{\mathcal {C}}+w_{\mathcal {C}}+V_{\mathcal {C}}$ into affine subspaces of
$2u_{\mathcal {C}}+w_{\mathcal {C}}+V_{\mathcal {C}}$ of dimension
$m'$ on each of which
$P$ is constant. Thus, by the Cauchy–Schwarz inequality yet again,

Now we partition the whole cell $\mathcal {C}$ by writing

Since $(2u_{\mathcal {C}}+2v'+V_i)+(v_i-2u_{\mathcal {C}}-2v'+V_i)=v_i+V_i$, we conclude that

Finally, suppose that $\|\Phi ^{\leq i}_{\mathcal {C}}-\phi ^{\leq i}(\mathcal {C})\|_{U^{8}(\mathcal {C})}\geq \varepsilon$ for some
$0\leq i\leq d$. Theorem 7.3 then says that there exists a polynomial
$R\in \mathbb {F}_p[x_1,\ldots,x_m,y_1,\ldots,y_m]$ of degree at most
$7$ such that

By Corollary 7.2, there exists a partition $\mathscr {C}'_{\mathcal {C}}$ of
$\mathcal {C}$ into affine subspaces of the form
$(u+V)\times (w+V)$ with
$\dim V=m'$, on each of which
$R$ is constant. Thus, by the Cauchy–Schwarz inequality, we have

Since

the conclusion

now follows.
Now we can prove Lemma 2.6.
Proof of Lemma 2.6 We proceed via an energy-increment argument, as described at the beginning of the subsection. A cell $\mathcal {C}=(u+V)\times (w+V)$ in a partition
$\mathscr {C}_j$ is said to be expired if
$\beta (\mathcal {C}),\gamma (\mathcal {C}),\delta (\mathcal {C}),$ or
$\phi ^{\leq d}(\mathcal {C})$ is less than
$\tau \mu _{\mathbb {F}_p^n\times \mathbb {F}_p^n}(T)/4$, and a nonexpired cell
$\mathcal {C}$ is said to be uniform if

and

for all $0\leq i\leq d$. We will denote the subset of expired cells of
$\mathscr {C}_j$ by
$\mathscr {E}_j$, the subset of uniform cells by
$\mathscr {U}_j$, and the subset of nonexpired, nonuniform cells by
$\mathscr {N}_j$, so that
$\mathscr {E}_j,\mathscr {U}_j$, and
$\mathscr {N}_j$ partition
$\mathscr {C}_j$. For any subset
$K\subset I_j$, we define
$\eta (K)$ to be the measure of all cells indexed by
$K$:

Finally, we define a sequence of integers $(m_j)_{j=0}^{\infty }$ by setting
$m_0=n$ and, for every
$j>0$,
$m_{j}$ to be the minimum of the value of
$m'$ appearing in Theorem 7.1 when we take
$m=m_{j-1}$,
$d=9$, and
$t=1$ and of
$m'$ appearing in Corollary 7.2 when we take
$m=m_{j-1}$ and
$d=7$, so that

for some absolute constants $0< c_1,c_2<1$.
Set $\mathscr {C}_0$ to be the trivial partition
$\{\mathbb {F}_p^n\times \mathbb {F}_p^n\}$ of
$\mathbb {F}_p^n\times \mathbb {F}_p^n$. Letting
$c=c_{10}$ and
$c'=c'_{10,p}$ be as in Lemma 7.5, then, as long as
$\eta (\mathscr {N}_j)\geq \tau \mu _{\mathbb {F}_p^n\times \mathbb {F}_p^n}(T)/2$ and
$m_{j+1}\geq 1$, there exists a refinement
$\mathscr {C}_{j+1}$ of
$\mathscr {C}_j$ such that:
(i)
$\dim {V_{i,j+1}}\geq m_{j+1}$ for every
$i\in I_{j+1}$ and
$\dim {V_{i,j+1}}=m_{j+1}$ whenever
$\mathcal {C}_{i,j+1}\in \mathscr {N}_{j+1}$; and
(ii)
$E(\mathscr {C}_{j+1})\geq E(\mathscr {C}_j)+\Omega ( {\tau \mu _{\mathbb {F}_p^n\times \mathbb {F}_p^n}(T)}/{d\exp ^c(c'_p/\varepsilon )^2})$.
Indeed, suppose that $\eta (\mathscr {N}_j)\geq \tau \mu _{\mathbb {F}_p^n\times \mathbb {F}_p^n}(T)/2$. Each cell
$\mathcal {C}$ in
$\mathscr {N}_j$ must be of dimension
$m_j\times m_j$. By Lemmas 7.4 and 7.5, there exists a partition
$\mathscr {C}_{k,j+1}=(\mathcal {C}_{k,j+1})_{k\in K_{\mathcal {C}}}$ of each
$\mathcal {C}\in \mathscr {N}_j$ such that

is at least

and each $\mathcal {C}_{k,j+1}$ is of the form
$(u'+V')\times (w'+V')$ with
$\dim {V'}=m_{j+1}\geq 1$. Taking

we see that multiplying both sides of the above by $\mu _{\mathbb {F}_p^n\times \mathbb {F}_p^n}(\mathcal {C})$ and summing over
$\mathcal {C}\in \mathscr {N}_j$ yields

Since $E(\mathscr {C})\leq 1$ for all partitions
$\mathscr {C}$, this iteration must terminate for some

at which point either $\eta (\mathscr {N}_{j_0})<\tau \mu _{\mathbb {F}_p^n\times \mathbb {F}_p^n}(T)/2$ or
$m_{j_0+1}<1$. Assuming that
$n\geq c_1^{-c_2^{-(j_0+1)}}$ ensures that the latter case cannot occur.
Since $\eta (\mathscr {E}_j)<\tau \mu _{\mathbb {F}_p^n\times \mathbb {F}_p^n}(T)/4$, we have

This certainly implies that

so that, by the pigeonhole principle, there exists a cell $\mathcal {C}_0$ in
$\mathscr {U}_{j_0}$ for which

Since $\Phi _{\mathcal {C}_0}=\coprod _{i=0}^d\Phi ^{i}_{\mathcal {C}_0}$, another application of the pigeonhole principle tells us that there exists a
$0\leq i\leq d$ for which we also have the density increment

As noted at the beginning of this subsection,

so that the conclusion of the lemma now follows.
8. The density-increment argument
Now we can finally prove Theorem 1.2 by iterating Lemma 2.7.
Proof of Theorem 1.2 Suppose that $S\subset \mathbb {F}_p^n\times \mathbb {F}_p^n$ has density
$\sigma$ and contains no nontrivial L-shaped configurations. Set
$S_0:=S$,
$n_0:=n$,
$d_0=0$,
$\varepsilon _0:=1$,
$A_0=B_0=C_0=D_0=\mathbb {F}_p^n$, and
$\Phi _0:=\mathbb {F}_p^n\times \mathbb {F}_p^n$. Applying Lemma 2.7 repeatedly produces sequences of
$S_i$,
$n_i$,
$d_i$,
$\varepsilon _i$,
$A_i$,
$B_i$,
$C_i$,
$D_i$, and
$\Phi _i$, with
$A_i,B_i,C_i,D_i\subset \mathbb {F}_p^{n_i}$ and
$\Phi _i\subset A_i\times \mathbb {F}_p^{n_i}$ of the form

where each $V_x\leq \mathbb {F}_p^{n_i}$ is a subspace of codimension
$d_i$, such that, on setting

$\alpha _i:=\mu _{\mathbb {F}_p^{n_i}}(A_i)$,
$\beta _i:=\mu _{\mathbb {F}_p^{n_i}}(B_i)$,
$\gamma _i:=\mu _{\mathbb {F}_p^{n_i}}(C_i)$,
$\delta _i:=\mu _{\mathbb {F}_p^{n_i}}(D_i)$, and
$\rho _i:=\mu _{\mathbb {F}_p^{n_i}\times \mathbb {F}_p^{n_i}}(\Phi _i)/\alpha _i=p^{-d_i}$, we have, for each
$i\geq 1$, that:
(i)
$S_i\subset T_i$ has density
$\sigma _i$ in
$T_i$, where
$\sigma _i\geq \sigma _{i-1}+\Omega (\sigma ^{O(1)})$;
(ii)
$n_i\gg n_{i-1}^{c_1^{O(\exp ^c(c'/\varepsilon _i)/(\sigma \alpha _{i-1} \beta _{i-1}\gamma _{i-1}\delta _{i-1}\rho _{i-1})^{O(1)})}}$;
(iii)
$\varepsilon _i\leq (\sigma \alpha _{i}\beta _{i}\gamma _{i}\delta _{i}\rho _{i})^{O(1)}\exp (-(64/\sigma )^{O(1)})$;
(iv)
$\alpha _i,\beta _i,\gamma _i,\delta _i\gg (\sigma \alpha _{i-1}\beta _{i-1} \gamma _{i-1}\delta _{i-1}\rho _{i-1})^{O(1)}$;
(v)
$d_i\leq d_{i-1}+1$;
(vi)
$\|A_i-\alpha _i\|_{U^{10}(\mathbb {F}_p^{n_i})}$,
$\|B_i-\beta _i\|_{U^{10}(\mathbb {F}_p^{n_i})}$,
$\|C_i-\gamma _i\|_{U^{10}(\mathbb {F}_p^{n_i})}$,
$\|D_i-\delta _i\|_{U^{10}(\mathbb {F}_p^{n_i})}$,
$\|\Phi _i- \alpha _i\rho _i\|_{U^8(\mathbb {F}_p^{n_i}\times \mathbb {F}_p^{n_i})}<\varepsilon _i$; and
(vii)
$S_i$ contains no nontrivial L-shaped configurations;
provided that

Since no set can have density larger than $1$, the lower bound (56) must fail for some
$i=i_0+1\ll \sigma ^{-O(1)}$. Thus, there exists an absolute constant
$c''>1$ such that

while, on the other hand,

Comparing the upper and lower bounds for $n_{i_0}$ and taking the
$c''$-fold iterated logarithm of both sides yields the bound in Theorem 1.2.
Acknowledgements
The author thanks Ben Green and Freddie Manners for helpful discussions, and Ben Green, Noah Kravitz, Terry Tao, and the anonymous referees for useful comments on earlier drafts. The author was supported by the NSF Mathematical Sciences Postdoctoral Research Fellowship Program under Grant No. DMS-1903038 and the Oswald Veblen fund, and also gratefully acknowledges the support and hospitality of the Hausdorff Institute for Mathematics, where the bulk of this paper was written.
Conflicts of Interest
None.