1. Introduction
The Longest Induced Path problem concerns finding a largest subset of vertices $S$ in a graph $G$ such that the induced subgraph $G[S]$ is a path. The decision problem of determining whether there exists an induced path of a specified length is NP-complete. It is difficult and interesting even on very specific graphs $G$ . For example, the Snake-in-the-box Problem [Reference Abbott and Katchalski1, Reference Suen29, Reference Łuczak and Palka32], motivated by the theory of error-correcting codes [Reference Kautz24], asks for the longest induced path in the hypercube graph. It also features in the study of Detour Distance [Reference Chartrand, Johns and Tian6] and the spread of information in social networks [Reference Gavril20]. To illustrate the latter connection, we imagine that some person in a social network shares some information with their connections (e.g. by posting it on their profile), who then might share it with their connections, resulting to an information cascade, in which the longest induced path represents the longest existing route for information transmission.
The classical literature on Random Graphs considers many problems on finding induced subgraphs, starting from several independent papers [Reference Bollobás and Erdős4, Reference Grimmett and McDiarmid21, Reference Nešetřil and De Mendez28] in the 1970’s calculating the asymptotic independence number of $G(n,p)$ for fixed $p$ and large $n$ . This was later extended by Frieze [Reference Frieze17] to $p=c/n$ for large constant $c$ (as noted in [Reference Cooley, Draganić, Kang and Sudakov8], this extends to all $p\geq c/n$ ). Erdős and Palka [Reference Erdős and Palka16] initiated the study of induced trees in $G(n, p)$ , which developed a large literature [Reference de la Vega9, Reference Frieze and Jackson18, Reference Kučera and Rödl25, Reference Mütze27] before its final resolution by de la Vega [Reference de la Vega10], showing that the size of the largest induced tree matches the asymptotics found earlier for the independence number: it is $\sim 2 \log _q(np)$ for all $p\geq c/n$ for large constant $c$ . The Longest Induced Path problem in $G(n,p)$ is also classical [Reference Matula26, Reference Łuczak31], and was recently resolved asymptotically by Draganić, Glock, and Krivelevich [Reference Draganić, Glock and Krivelevich12].
We extend this line of research to spectral expanders in the following result. To set up notation, let $G$ be a $d$ -regular graph on $n$ vertices whose adjacency matrix has eigenvalues ${\lambda }_1 \ge \dots \ge{\lambda }_n$ . Following Alon, we call $G$ an $(n,d,{\lambda })$ -graph with ${\lambda }=\max \{{\lambda }_2,-{\lambda }_n\}$ .
Theorem 1.1. Let $G$ be an $(n,d,\lambda )$ -graph with $\lambda \lt d^{3/4}/100$ and $d\lt n/10$ . Then $G$ contains an induced path of length $\frac{n}{64d}$ .
One may compare Theorem 1.1 with an analogous result in the random setting due to Draganić, Glock, and Krivelevich [Reference Draganić, Glock and Krivelevich13]. Their result is stated in a pseudorandom setting but essentially concerns random graphs, due to its strong assumptions on edge distribution (sublinear sets have average degree $\lt 3$ ).
One may also interpret Theorem 1.1 within the theory of Graph Sparsity (see [Reference Nešetřil and De Mendez28]), where one general problem considers some graph class $\mathcal{G}$ and asks for the optimal function $f_{\mathcal{G}}:\mathbb{N}\to \mathbb{N}$ such that any graph in $\mathcal{G}$ with a path of length $k$ must contain an induced path of length $f_{\mathcal{G}}(k)$ . For example, if $\mathcal{G}$ is the class of $C$ -degenerate graphs then Nešetřil and Ossona de Mendez [Reference Nešetřil and De Mendez28] showed that $f_{\mathcal{G}}(k) = \Omega (\log \log k)$ , whereas Defrain and Raymond [Reference Defrain and Raymond11] showed that $f_{\mathcal{G}}(k) = O(\log \log k)^2$ even for $C=2$ . Letting $\mathcal{G}$ be the class of spectral expanders as in Theorem 1.1, also assuming constant average degree, it is well-known that any such $G$ contains a path of length $n-o(n)$ , so Theorem 1.1 shows that $f_{\mathcal{G}}(k)=\Theta (k)$ .
Our next result is in the same spirit as Theorem 1.1, replacing spectral expansion with an edge-distribution condition, which is a mild upper-uniformity condition. Here and throughout the paper we write
Theorem 1.2. Let $G$ be a graph on $n$ vertices with minimum degree $d \ge 2^8$ . Suppose for some $C\gt 1$ that for all $X,Y \subseteq V(G)$ of sizes $|X|=\frac{n}{2^4 Cd}$ and $|Y|= \frac{n}{2^8 C}$ we have $e(X,Y)\lt 2^5 C|X||Y|\frac{d}{n}$ and $e(\Gamma(X),Y) \lt C|X||Y|\frac{d^2}{n}$ . Then $G$ contains an induced path of length $\frac{n}{2^5 Cd}$ .
The numerical constants in Theorem1.2 are quite flexible; we have selected a convenient choice that is whp satisfied in a subgraph $G'\subseteq G(n,d/n)$ of minimum degree $\delta (G')\geq 0.9d$ and size $N=|V(G')|\geq 0.9n$ , for $d \ge 2^{20}$ and $C=100$ , say. Superimposing $2^{20} N/d$ vertex-disjoint $2^{-20} d$ -cliques has very little effect on the upper-uniformity condition, and clearly ensures that there is no induced path of length $\gt 2^{20} N/d$ , so Theorem1.2 is tight up to the constant factor.
Next we will state our main result, which will easily imply the two results stated above. For a third application in Ramsey Theory (a modest improvement on the multicolour induced-size-Ramsey numbers of paths), we consider a more general problem where we are given graphs $G' \subseteq G$ and look for a long path in $G'$ that is induced in $G$ .
Theorem 1.3. Suppose $G$ and $G'$ are graphs on the same vertex set $V$ with $G' \subseteq G$ . Let $d$ be the minimum degree of $G'$ and $s_1,s_2,\ell$ be positive integers with $s_1+s_2+\ell \lt |V|$ . Suppose that
-
(1) $e_{G'}(X,Y)\lt \frac{d}{4}s_1$ for all $X,Y\subseteq V$ with $|X| = s_1$ and $|Y| = \ell +s_1+s_2$ .
-
(2) $e_{G'}(\Gamma _G(X),Y)\lt \frac{d}{4}s_2$ for all disjoint $X,Y\subseteq V$ with $|X| = \ell +s_1$ and $|Y| = s_2$ .
Then $G'$ contains a path of length $\ell$ which is induced in $G$ , and can be found in time $O(e(G))$ .
Our results are based upon a new version of the DFS graph search algorithm, tailored for applications on graphs with a weak local sparsity condition. The algorithm runs in time $O(e(G))$ , which is clearly best possible. We describe this algorithm and prove our main theorem in Section 2; the applications will be deduced in Section 3.
Notation. Besides the notation ${\Gamma }(X)$ and $e(X,Y)$ mentioned above, we also write $N(X) ={\Gamma }(X) \setminus X$ for the external neighbourhood of $X$ in $G$ . We systematically avoid rounding notation for clarity of presentation whenever it does not impact the argument.
2. The algorithm and its analysis
In this section we describe and analyse our algorithm (see Figure1), thus proving our main result, Theorem1.3. As in the statement, we consider graphs $G,G'$ on $V$ with $G' \subseteq G$ . Let $\sigma$ be an arbitrary ordering of $V$ . Throughout the algorithm we maintain a partition of $V$ into four sets $T$ , $P$ , $S_1$ and $S_2$ , where $T$ will be the set of vertices which have not yet been considered, $P$ will be the set of vertices in the current path, and $S_1$ and $S_2$ will be two different sets of discarded vertices. The vertices of $P$ are kept in a stack so that following the order in the stack gives a path in $G'$ which is induced in $G$ .
In the proof of Theorem1.3, we will rely on the following observations. We omit their proofs, as parts 1 to 4 are self-evident, and the complexity analysis for 6 is similar in spirit to that of the standard DFS algorithm. Only the key property in 5 requires a little thought. The key point is that whenever $v$ is added to $S_2$ the path $P-v$ is the same as when $v$ was added to $P$ , so if it satisfies the condition for joining $S_2$ at any point in the algorithm then it does so immediately when added to $P$ .
Observation 2.1. The following hold during the execution of the algorithm.
-
(A) The vertices in $P$ form a path in $G'$ which is induced in $G$ .
-
(B) If a vertex $v$ lands in $P\cup S_1\cup S_2$ , then it stays in this set until the algorithm terminates.
-
(C) When the algorithm terminates, we have $S_1\cup S_2=V$ .
-
(D) In each round, at most one vertex is added to $S_1\cup S_2$ .
-
(E) (key property) If a vertex is added to $S_2$ , it is added there immediately in the round after the round in which it appears in $P$ .
-
(F) The algorithm can be implemented in time $O(e(G))$ .
Now we analyse the algorithm, thus proving our main result.
Proof of Theorem 1.3. We consider the algorithm above applied to $G$ and $G'$ . Suppose for a contradiction that it does not find the required path. By Item 1 at each step of the algorithm we have $|P|\lt \ell$ . Consider the round after which $|S_1|=s_1$ or $|S_2|=s_2$ for the first time; this has to occur by Items 3 and 4. We distinguish two cases, according to whether $|S_1|=s_1$ or $|S_2|=s_2$ .
The first case is that $|S_1|=s_1$ , and so $|S_2|\lt s_2$ . Each vertex added to $S_1$ has at least $\frac{d}{2}$ of its $G'$ -neighbours in $P\cup S_1\cup S_2$ when it is added, and these neighbours remain in $P\cup S_1\cup S_2$ by Item 2. Thus at the round under consideration, we have $|P \cup S_1 \cup S_2| \le \ell +s_1+s_2$ and $e_{G'}(S_1,S_1\cup S_2\cup P)\geq s_1\frac{d}{4}$ , a contradiction with the first condition of the theorem.
The second case is that $|S_2|=s_2$ , and so $|S_1|\lt s_1$ . Each vertex added to $S_2$ had at least $\frac{d}{2}$ of its $G'$ -neighbours in ${\Gamma }(P)$ at the time when it was added, and the path vertices at that time are either still on the path or were added to $S_1$ , not $S_2$ , by the key property Item 5. Thus each vertex in $S_2$ has at least $\frac{d}{2}$ of its $G'$ -neighbours in ${\Gamma }(S_1 \cup P)$ , so $e_{G'}(S_2,{\Gamma }(S_1 \cup P)) \ge s_2 \frac{d}{4}$ . However, $|P\cup S_1|\leq \ell +s_1$ , so this contradicts the second condition of the theorem.
3. Applications
Now we will apply our main result, Theorem1.3, to prove the applications mentioned above, i.e. Theorems1.1 and 1.3, and an induced Ramsey result to be discussed below in Section 3.1.
First we prove our result on long induced paths in expanders. We use the following two well-known properties of $(n,d,{\lambda })$ -graphs that can be found e.g. in the survey [Reference Hoory, Linial and Wigderson23].
-
(E1) Expander Mixing Lemma [Reference Hoory, Linial and Wigderson23, Lemma 2.4]: $\Big |e(A,B)-|A||B|\frac{d}{n}\Big |\leq \lambda \sqrt{|A||B|}$ for any $A,B \subseteq V(G)$ .
-
(E2) Simplified Alon-Boppana Bound [Reference Hoory, Linial and Wigderson23, Claim 2.8]: $\lambda ^2 \ge d \cdot \frac{n-d}{n-1}$ .
Proof of Theorem 1.1. Suppose $G$ is a $(n,d,\lambda )$ -graph with $\lambda \lt d^{3/4}/100$ and $d\lt n/10$ . We will apply Theorem1.3 with $G'=G$ , $\ell =s_1=\frac{n}{64d}$ and $s_2=\frac{\lambda ^2}{d^2}n$ . We note that $s_1+s_2+\ell \lt n$ . By Property (E2) we have $s_2 \ge \frac{n}{d} \cdot \frac{n-d}{n-1} \gt s_1$ .
For condition (1), consider any $X,Y \subseteq V(G)$ with $|X| = s_1$ and $|Y| = \ell +s_1+s_2 \lt 3s_2$ . By Property (E1) we have
For condition (2), consider any $X,Y \subseteq V(G)$ with $|X| = 2s_1$ and $|Y| = s_2$ . As $G$ is $d$ -regular we have $|\Gamma (X)|\leq d|X|$ , so by Property (E1) we have
Since all conditions of Theorem1.3 are satisfied, there is an induced path of length $\ell =\frac{n}{64d}$ .
Now we prove our result on long induced paths in graphs satisfying an upper-uniformity edge-distribution condition.
Proof of Theorem 1.2. Let $G$ be a graph on $n$ vertices with minimum degree $d \ge 2^8$ . Suppose for some $C\gt 1$ that for all $X,Y\subseteq V(G)$ of sizes $|X|=\frac{n}{2^4 Cd}$ and $|Y|= \frac{n}{2^8 C}$ we have $e(X,Y)\lt 2^5 C|X||Y|\frac{d}{n}$ and $e(\Gamma (X),Y) \lt C|X||Y|\frac{d^2}{n}$ . To prove the theorem, it suffices to show that the conditions of Theorem1.3 are satisfied with $\ell =s_1=\frac{n}{2^5 Cd}$ and $s_2=\frac{n}{2^9 C}$ . First note that $s_1+s_2+\ell \lt n$ .
For condition (1), consider any $X,Y\subseteq V(G)$ with $|X| = s_1$ and $|Y| = \ell +s_1+s_2$ . Enlarging $X$ and $Y$ to sizes $2s_1 = \frac{n}{2^4 Cd}$ and $2s_2 = \frac{n}{2^8 C}$ , we have
as required. For condition (2), consider any $X,Y\subseteq V(G)$ with $|X| = \ell +s_1$ and $|Y| =s_2$ . Enlarging $Y$ to size $2s_2 = \frac{n}{2^8 C}$ , we have
as required. This completes the proof.
3.1 Induced-size-Ramsey number of paths
Here we consider a problem on induced-size-Ramsey numbers, combining two well-studied extensions of the classical graph Ramsey problem, in which one colours some ‘host’ graph $G$ and seeks a monochromatic copy of some ‘target’ graph $H$ . In the induced Ramsey problem, one seeks monochromatic induced copies of $H$ and in the size-Ramsey problem one aims to minimise the size $e(G)$ of $G$ . Induced-size-Ramsey problems combine both of these features: the $k$ -colour induced-size-Ramsey number $\hat{r}^k_{ind}(H)$ is the smallest integer $m$ such that there exists a graph $G$ on $m$ edges such that every $k$ -colouring of the edges of $G$ contains a monochromatic copy of $H$ that is an induced subgraph of $G$ . The main open problem in this direction is an old conjecture of Erdős that for any graph $H$ on $n$ vertices one has $\hat{r}^2_{ind}(H) \le 2^{O(n)}$ . For more background we refer the reader to the survey [Reference Conlon, Fox and Sudakov7] and recent papers [Reference Bradač, Draganić and Sudakov5, Reference Draganić and Petrova14].
The case when $H=P_n$ is a path has a large literature in its own right. Haxell, Kohayakawa, and Łuczak [Reference Haxell, Kohayakawa and Łuczak22] showed that $\hat{r}^k_{ind}(P_n)=O_k(n)$ , strengthening the analogous result of Beck [Reference Beck3] for the (not necessarily induced) size-Ramsey number, which in itself was a $ $ $ 100 problem of Erdős. While these results establish the order of magnitude as $O_k(n)$ , they do not say much about the implicit constant, particularly in [Reference Haxell, Kohayakawa and Łuczak22] due to the use of the regularity lemma. Even for the two-colour size-Ramsey number of $P_n$ , there is a substantial constant factor gap between the best known lower bound [Reference Bal and DeBiasio2] and upper bound [Reference Dudek and Prałat15]. For the multicolour induced-size-Ramsey number, a significant recent improvement on [Reference Haxell, Kohayakawa and Łuczak22] by Draganić, Glock, and Krivelevich [Reference Draganić, Glock and Krivelevich13] gives $\hat{r}^k_{ind}(P_n)=O(k^3\log ^4 k)n$ . We will improve this to $\hat{r}^k_{ind}(P_n)=O(k^3\log ^2k)n$ .
Theorem 3.1. For $c=10^{5}$ and for all large enough $k\in \mathbb N$ the following holds. Let $G\sim G(nk,\frac{c\log k}{n})$ . Then with high probability, for every $k$ -colouring of the edges of $G$ , there exists a monochromatic path of length $\frac{n}{c^3k\log k}$ which is induced in $G$ .
Proof. Let $p=\frac{c\log k}{n}$ and let $G\sim G(nk,p)$ . We first show that for $\ell =s_1=s_2=\frac{n}{c^3k\log k}$ the following hold with high probability
-
(a) $e_G(X,Y)\lt s_1 \frac{c\log k}{16}=\frac{n}{16kc^2}$ for all $X\subseteq V(G)$ with $|X| = s_1$ and $|Y| = \ell +s_1+s_2$ .
-
(b) $e_G(\Gamma _G(X),Y)\leq s_2 \frac{c\log k}{16}=\frac{n}{16kc^2}$ for all disjoint $X,Y\subseteq V(G)$ with $|X| = |Y| = 2\ell$ .
-
(c) $e_G(X)\leq \frac{n}{4c^2k}$ for $|X|=s_1+s_2+\ell$ .
For 1, for any such $X,Y$ we have $\mathbb{E} e(X,Y) = p|X||Y| = \frac{3n}{c^5 k^2 \log k}$ , so by Chernoff the bound fails with probability at most $e^{-\frac{n}{kc^2}}$ , say, and we can take a union bound over $\binom{nk}{\ell }\binom{nk}{3\ell }\leq e^{\frac{10n}{kc^3}}$ choices for $X,Y$ . Part 3 is similar. For 2, we first note that similarly to 1 we can assume for any such $X,Y$ that $e(X,Y) \lt \frac{n}{32kc^2}$ , so it suffices to show $e(N(X),Y ) \lt \frac{n}{32kc^2}$ , where $N(X)={\Gamma }(X) \setminus X$ . Also, by Chernoff we have $\mathbb{P}( e(X,V(G)) \gt 2n/c^2 ) \lt e^{-\frac{n}{10c^2}}$ , so by a union bound we can assume $|N(X)| \le e(X,V(G)) \le 2n/c^2$ . We note that $e(N(X),Y)$ is independent of the edges incident to $X$ , so conditional on $|N(X)| \le e(X,V(G)) \le 2n/c^2$ , Chernoff gives $\mathbb{P}( e(N(X),Y ) \gt \frac{n}{32kc^2} ) \lt e^{-\frac{n}{800kc^2}}$ , so we can take a union bound.
Let $G_0$ be the subgraph of $G$ consisting of the edges of the densest colour class. As whp $G$ has $(0.5-o(1))nck^2\log k$ edges, $G_0$ has at least $(0.5-o(1))nck\log k$ edges, so has average degree at least $c\log k/2$ . Now, let $G'$ be a subgraph of $G_0$ obtained by arbitrarily removing vertices with degree less than $c\log k/4$ one by one until this is no longer possible. Note that removing vertices in this way can not decrease the average degree. This process must stop with more than $s_1+s_2+\ell$ vertices remaining, otherwise we reach a set of $s_1+s_2+\ell$ vertices spanning at least $(s_1+s_2+\ell )c\log k/4\geq \frac{n}{2c^2k}$ edges, but this contradicts 3. To conclude, we apply Theorem1.3 to the graphs $G[V(G')]$ and $G'$ with parameters $s_1=s_2=\ell$ and $d= c\log k/4$ , noting that 1 and 2 give the required conditions of the theorem.
Since $G(nk,\frac{c\log k}{n})$ whp has $\Theta (nk^2\log k)$ edges, Theorem3.1 immediately gives the following.
Corollary 3.2. Multicolour induced-size-Ramsey numbers of paths satisfy $\hat{r}_{ind}^k(P_n)= O(nk^3\log ^2 k)$ .
Acknowledgement
We thank an anonymous referee for reading the paper carefully.