
Colorful path detection in vertex-colored temporal

Published online by Cambridge University Press:  18 August 2023

Riccardo Dondi*
Affiliation:
University of Bergamo, Bergamo, Italy
Mohammad Mehdi Hosseinzadeh*
Affiliation:
University of Bergamo, Bergamo, Italy
*Corresponding authors: Riccardo Dondi, Mohammad Mehdi Hosseinzadeh

Abstract

Finding paths is a fundamental problem in graph theory and algorithm design due to its many applications. Recently, this problem has been considered on temporal graphs, where edges may change over a discrete time domain. The analysis of graphs has also taken into account the relevance of vertex properties, modeled by assigning labels or colors to vertices. In this work, we deal with a problem that, given a static or temporal graph whose vertices are colored, looks for a path such that (1) the vertices of the path have distinct colors and (2) the path includes the maximum number of colors. We analyze the approximation complexity of the problem on static and temporal graphs, and we prove an inapproximability bound. Then, we consider the problem on temporal graphs and design a heuristic for it. We present an experimental evaluation of our heuristic on both synthetic and real-world graphs. The experimental results show that, for many instances of the problem, our method returns near-optimal solutions.

Type
Research Article
Copyright
© The Author(s), 2023. Published by Cambridge University Press

1. Introduction

Finding paths in a graph is a basic problem in computer science (Diestel, 2012). Several variants of the problem have been studied and have interesting practical applications. For example, computing a shortest path between two vertices is used to find the distance between two points on a map and in IP routing, while computing a longest path has been applied in scheduling and graph drawing.

Path problems have recently been studied for real-world data that need a richer description than the one provided by the classic graph model. These new models may include vertex properties (represented with colors) and a description of the dynamics of the interactions. Vertex properties have been studied in several contributions. In biological networks, properties represent reaction types, functionalities, or gene properties (Betzler et al., 2011; Dondi et al., 2011; Lacroix et al., 2006; Zheng et al., 2011). In document classification, colors represent document categories (Bruckner et al., 2013; Cohen et al., 2021). The evolution of interactions between elements has been represented with dynamic or temporal graphs (Holme, 2015). In this paper, we consider temporal graphs, where edges are associated with timestamps (over a discrete time domain) that represent when an interaction occurred. The analysis of temporal networks provides valuable insights into the properties of complex systems (Castelli et al., 2020; Costa et al., 2015; Dondi & Hosseinzadeh, 2021; Holme, 2015; Kempe et al., 2002; Kovanen et al., 2011; Rozenshtein et al., 2019; Sanli & Lambiotte, 2015; Thejaswi et al., 2020).

In this paper, we consider temporal vertex-colored graphs, where vertices are associated with colors encoding their properties. Given a set of colors, we study a problem that looks for a temporal path whose vertices are associated with distinct colors and that includes the maximum number of colors. The timestamps of consecutive edges in a temporal path must be strictly increasing, so that the path respects the time constraint specified by the timestamps of its edges. The problem we investigate in this paper is a variant of the one introduced in Thejaswi et al. (2020) that, given a vertex-colored temporal graph, asks for a temporal path whose vertex colors match a given multiset [called motif in Thejaswi et al. (2020)]. As observed in Thejaswi et al. (2020), one possible application of the problem is tour recommendation (Choudhury et al., 2010; Gionis et al., 2014): vertices correspond to interesting locations, colors represent activities available at those locations (museum, sport, and so on), edges represent transportation links between locations, and timestamps represent departure times and travel durations from one location to another. A set (or a multiset) of colors represents the activities a tourist may be interested in, and a temporal path whose vertices have distinct colors is then a suggestion of distinct activities that can be accomplished while respecting the time constraints. Another application, discussed in Thejaswi et al. (2020), is the analysis of financial transactions. Each vertex represents a financial entity, and colors represent properties of the entities. Two entities are connected by a temporal edge when a financial transaction between them occurs. A financial analyst may be interested in chains of transactions involving entities with distinct properties; for example, money laundering activities may involve public figures, companies with certain types of contracts, and banks in offshore locations.

However, a temporal graph, due to its topology or to the time constraint, may not contain a temporal path that includes all the colors of a given set. Thus, we consider a natural way to tackle the problem, namely looking for a temporal path that includes the maximum number of colors.

1.1. Related works

Finding a path in a vertex-colored (static) graph whose vertices have distinct colors and that includes the maximum number of colors is a problem known as Max S-CPTG (Cohen et al., 2021). Max S-CPTG has been proved to be not approximable within a constant factor, unless P = NP (Cohen et al., 2021), similarly to the problem of finding a longest path in a graph. Furthermore, the complexity of Max S-CPTG has been studied for several graph classes (trees, bipartite chain graphs, threshold graphs, block graphs, and proper interval graphs).

A related problem considered in the literature (Alon et al., 1995; Kowalik & Lauri, 2016) is, given a static graph, deciding whether there exists a path whose vertices are all colored distinctly such that each color in a given set belongs to some vertex of the path.

Another related problem that has been extensively studied in the literature is that of finding paths in a graph with edges labeled by symbols or strings. A path in the graph is associated with the string obtained by concatenating the strings of its edges. In this context, a regular expression denotes the set of paths in the graph associated with the strings represented by the regular expression. This approach has been considered, for example, in Abiteboul & Vianu (1997) to deal with semi-structured data as found on the Web, and to analyze social media (Wadhwa et al., 2019). In these cases, finding paths constrained by a regular expression allows, for example, the identification of related strings. Notice that finding all pairs of vertices in a graph that are connected by a simple path satisfying a given regular expression is an NP-complete problem (Mendelzon & Wood, 1995).

Several variants of the problem of finding a temporal path in a vertex-colored temporal graph that matches a given set or multiset of colors (called motif) have been introduced in Thejaswi et al. (2020). Two of these variants are Path Motif and Colorful Path. Path Motif, given a vertex-colored temporal graph, asks whether there exists a temporal path that matches a given motif. Colorful Path, given a vertex-colored temporal graph, asks whether there exists a temporal path of a given length whose vertices are associated with distinct colors. The variants introduced in Thejaswi et al. (2020) are all decision problems: they ask whether there exists a temporal path with some properties (on the length of the path and on the colors associated with its vertices). It is shown in Thejaswi et al. (2020) that these variants are NP-complete and admit fixed-parameter tractable algorithms, where the parameter is the size of the motif.

Several problems related to finding paths in a temporal graph have been studied in the literature (Wu et al., 2014, 2016). A notable example is checking whether a temporal graph contains a temporal path between two vertices under a waiting-time constraint, a problem that has been shown to be NP-complete (Casteigts et al., 2020). A similar problem is temporal graph exploration (Erlebach et al., 2021), which asks for a temporal walk that starts at a given vertex and visits all vertices of a graph with the smallest arrival time. Other related problems ask for vertex deletions so that temporal paths connecting given pairs of vertices are removed (Zschoche et al., 2020). Recent contributions have investigated the computational complexity of exploring a temporal graph whose underlying (static) graph is a star, and of finding an Eulerian walk in a temporal graph (Akrida et al., 2021; Bumpus & Meeks, 2021; Marino & Silva, 2021).

1.2. Our contribution

Given a temporal vertex-colored graph, we consider the Max CPTG problem, which looks for a temporal path whose vertices have distinct colors and that includes the maximum number of colors (equivalently, that has maximum length). We analyze the approximation complexity of Max CPTG, also when the input graph is static (a variant called Max S-CPTG). In Section 3, we prove that Max S-CPTG and Max CPTG are not approximable within a factor $O(|V|^{\frac{1}{2}- \varepsilon })$, unless $\text{P} = \text{NP}$ ($V$ is the vertex set of the input graph). Since Max S-CPTG was previously known only to be not approximable within a constant factor, unless $\text{P}=\text{NP}$ (Cohen et al., 2021), our result strengthens the inapproximability of Max S-CPTG.

In Section 4, we present a heuristic for Max CPTG. Our aim is to design a method that is applicable even for a large number of colors. Notice that the methods proposed in Thejaswi et al. (2020) address different problems, which ask whether there exists a temporal path that matches a motif or whose vertices are associated with distinct colors, while Max CPTG is an optimization problem. Moreover, the methods proposed in Thejaswi et al. (2020) are fixed-parameter algorithms, where the parameter is the size of the motif; therefore, their running time is exponential in the size of the motif. Hence, these methods can analyze motifs of moderate size (in Thejaswi et al. (2020), the motifs considered have size up to $18$), while our approach can process larger sets of colors. In Section 5, we present our experimental work on synthetic and real-world graphs, for sets containing up to $50$ colors. However, it should be pointed out that the methods proposed in Thejaswi et al. (2020) return exact solutions, while our method is a heuristic and, for some instances, does not return optimal solutions for Max CPTG.

The paper is organized as follows. In Section 2, we introduce some definitions and formally define the Max S-CPTG and Max CPTG problems. In Section 3, we present bounds on the approximability of both problems. In Section 4, we present our heuristic, while in Section 5 we present an experimental evaluation on synthetic and real-world graphs. Finally, in Section 6 (Conclusion), we point out some open problems for Max CPTG.

2. Preliminaries

Given a graph $G=(V,E)$, $V$ is a set of vertices and $E$ is a set of edges. We now define the discrete time domain over which a temporal graph is defined.

Definition 1. A time domain $\mathcal{T}$ is a sequence of integers $t_i$ , $1 \leq i \leq t_{\max}$ , called timestamps, such that $t_i \lt t_{i+1}$ , $1 \leq i \lt t_{\max}$ . An interval $T=[t_i,t_j]$ over $\mathcal{T}$ , with $t_i,t_j \in \mathcal{T}$ and $t_i \leq t_j$ , is the sequence of timestamps $t$ such that $t_i \leq t \leq t_j$ .

We now give the definition of temporal graph. Notice that the vertex set does not change over the time domain.

Definition 2. A temporal graph $G = (V,E,{\mathcal{T}})$ consists of

  1. A set $V$ of vertices

  2. A time domain $\mathcal{T}$

  3. A set $E \subseteq V \times V \times{\mathcal{T}}$ of temporal edges, where a temporal edge of $G$ is a triple $\{u,v,t\}$, with $u,v \in V$ and $t \in{\mathcal{T}}$.

Given an interval $I$, the temporal edges whose timestamp belongs to $I$ are called the active edges in $I$.
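As a minimal illustration of Definition 2 and of active edges, the sketch below stores a temporal graph as a list of $(u, v, t)$ triples and filters the edges active in an interval $[t_i, t_j]$. The tuple representation and the name `active_edges` are illustrative assumptions, not part of the original formulation.

```python
from typing import Any, List, Tuple

# A temporal edge {u, v, t} is stored as the tuple (u, v, t).
TemporalEdge = Tuple[Any, Any, int]


def active_edges(edges: List[TemporalEdge], t_i: int, t_j: int) -> List[TemporalEdge]:
    """Return the temporal edges whose timestamp lies in the interval [t_i, t_j]."""
    if t_i > t_j:
        raise ValueError("an interval [t_i, t_j] requires t_i <= t_j")
    return [(u, v, t) for (u, v, t) in edges if t_i <= t <= t_j]


# Usage: a small temporal graph over the time domain 1, 2, 3, 4
E = [("a", "b", 1), ("b", "c", 2), ("a", "c", 3), ("c", "d", 4)]
print(active_edges(E, 2, 3))  # [('b', 'c', 2), ('a', 'c', 3)]
```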

We now present the definitions of path and temporal path.

Definition 3. Given a (static) graph $G=(V,E)$ , a path in $G$ is an alternating sequence of vertices and edges $v_{p,1}\ e_{p,1}\ v_{p,2}\ e_{p,2} \dots \ e_{p,q-1}\ v_{p,q}$ such that:

  1. The vertices $v_{p,1}$, $v_{p,2}$, $\dots$, $v_{p,q}$ are distinct

  2. For each $i$, with $1 \leq i \leq q-1$, $e_{p,i} = \{v_{p,i}, v_{p,i+1} \} \in E$

Definition 4. Given a temporal graph $G=(V,E,{\mathcal{T}})$ , a temporal path in $G$ is an alternating sequence of vertices and temporal edges $v_{p,1}\ e_{p,1}\ v_{p,2}\ e_{p,2} \dots \ e_{p,q-1}\ v_{p,q}$ such that:

  1. The vertices $v_{p,1}$, $v_{p,2}$, $\dots$, $v_{p,q}$ are distinct

  2. For each $i$, with $1 \leq i \leq q-1$, $e_{p,i} = \{v_{p,i}, v_{p,i+1},t_i \} \in E$, with $t_i \in{\mathcal{T}}$

  3. For each $i$, with $1 \leq i \leq q-2$, it holds that $t_i \lt t_{i+1}$.

Vertices $v_{p,1}$ and $v_{p,q}$ of a path or temporal path $p$ are the start and end vertices of $p$, respectively. The length of $p$, denoted by $|p|$, is the number of vertices in $p$. In the remainder of the paper, we refer to Point 3 of Definition 4 as the time constraint of a temporal path.
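The three conditions of Definition 4 can be checked directly. The following Python sketch assumes that a candidate temporal path is given as its vertex sequence $v_{p,1}, \dots, v_{p,q}$ together with the timestamps $t_1, \dots, t_{q-1}$ of its edges, and that temporal edges are stored as undirected $(u, v, t)$ triples; the function name `is_temporal_path` is hypothetical.

```python
from typing import Any, Sequence, Set, Tuple


def is_temporal_path(vertices: Sequence[Any], times: Sequence[int],
                     edges: Set[Tuple[Any, Any, int]]) -> bool:
    """Check Definition 4 for vertices v_1..v_q and edge timestamps t_1..t_{q-1}."""
    if len(vertices) == 0 or len(times) != len(vertices) - 1:
        return False
    # Point 1: the vertices are distinct
    if len(set(vertices)) != len(vertices):
        return False
    # Point 2: each {v_i, v_{i+1}, t_i} is a temporal edge (edges are undirected)
    for i, t in enumerate(times):
        u, v = vertices[i], vertices[i + 1]
        if (u, v, t) not in edges and (v, u, t) not in edges:
            return False
    # Point 3 (time constraint): t_i < t_{i+1} for 1 <= i <= q-2
    return all(times[i] < times[i + 1] for i in range(len(times) - 1))


E = {("a", "b", 1), ("b", "c", 3), ("c", "d", 2)}
print(is_temporal_path(["a", "b", "c"], [1, 3], E))          # True
print(is_temporal_path(["a", "b", "c", "d"], [1, 3, 2], E))  # False: 3 > 2
```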

Next, we introduce the definitions of vertex-colored graphs and vertex-colored temporal graphs.

Definition 5. $G_c = (V,E,c)$ is a vertex-colored graph, where $G = (V,E)$ is a graph and $c: V \rightarrow C$ is a function that associates with each vertex in $V$ a color from set $C$ .

$G_c = (V,E,{\mathcal{T}},c)$ is a vertex-colored temporal graph, where $G = (V,E,{\mathcal{T}})$ is a temporal graph and $c: V \rightarrow C$ is a function that assigns a color from set $C$ of colors to each vertex in $V$ .

We can now define the concept of colorful set of vertices.

Definition 6. Consider a set $V' \subseteq V$ of vertices of a vertex-colored graph or of a vertex-colored temporal graph. $V'$ is said to be colorful if all the vertices in $V'$ have distinct colors.

A path or a temporal path is colorful if it consists of vertices that have distinct colors.
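Colorfulness (Definition 6) depends only on the color function $c$. A minimal sketch, assuming $c$ is given as a Python dictionary from vertices to colors (a hypothetical representation), is shown below. Since a colorful path with $q$ vertices includes exactly $q$ colors, maximizing the number of colors and maximizing the length coincide in the problems defined next.

```python
from typing import Any, Dict, Iterable


def is_colorful(vertices: Iterable[Any], color: Dict[Any, Any]) -> bool:
    """A set of vertices is colorful if no two of its vertices share a color."""
    colors = [color[v] for v in vertices]
    return len(set(colors)) == len(colors)


c = {"a": "museum", "b": "sport", "c": "museum", "d": "food"}
print(is_colorful(["a", "b", "d"], c))  # True: three vertices, three colors
print(is_colorful(["a", "b", "c"], c))  # False: "a" and "c" share "museum"
```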

We now present the formal definitions of the problems we consider in this paper.

Problem 1. Maximum Colorful Path in a Temporal Graph (Max CPTG)

Input: A vertex-colored temporal graph $G = (V,E,{\mathcal{T}},c)$ .

Output: A colorful temporal path in $G$ that includes the maximum number of colors (that is, it has maximum length).

The Max S-CPTG problem is the variant of Max CPTG on static graphs.

Problem 2. Maximum Colorful Path in a Graph (Max S-CPTG)

Input: A vertex-colored graph $G = (V,E,c)$ .

Output: A colorful path in $G$ that includes the maximum number of colors (that is, it has maximum length).
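To make Problem 1 concrete, the sketch below solves Max CPTG exactly by exhaustive depth-first search. Starting from each vertex, it extends the current path at its end vertex with a temporal edge whose timestamp is strictly larger than the previous one and whose other endpoint has an unused color. This brute-force baseline is an illustrative assumption, not a method from this paper: its running time is exponential, so it is only suitable for very small instances (for example, to check heuristic solutions). Removing the timestamp test would give an analogous exhaustive search for Max S-CPTG on a static graph.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List, Tuple


def max_colorful_temporal_path(edges: Iterable[Tuple[Any, Any, int]],
                               color: Dict[Any, Any]) -> List[Any]:
    """Exhaustive search for Max CPTG; exponential time, tiny instances only."""
    adj = defaultdict(list)  # vertex -> list of (neighbour, timestamp)
    for u, v, t in edges:
        adj[u].append((v, t))
        adj[v].append((u, t))
    best: List[Any] = []

    def extend(path: List[Any], used: set, last_t: float) -> None:
        nonlocal best
        if len(path) > len(best):
            best = list(path)
        for w, t in adj[path[-1]]:
            # time constraint: strictly later edge; colorful: unused color
            # (an unused color also guarantees that w is not already in the path)
            if t > last_t and color[w] not in used:
                path.append(w)
                used.add(color[w])
                extend(path, used, t)
                path.pop()
                used.remove(color[w])

    for v in color:
        extend([v], {color[v]}, float("-inf"))
    return best


E = [("a", "b", 2), ("b", "c", 1), ("c", "d", 3)]
col = {"a": 1, "b": 2, "c": 3, "d": 4}
print(max_colorful_temporal_path(E, col))  # ['b', 'c', 'd']
```

In this example, the path a, b, c is not temporal, because the edge $\{b,c\}$ (timestamp 1) precedes the edge $\{a,b\}$ (timestamp 2).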

3. Inapproximability of Max S-CPTG and Max CPTG

In this section, we prove that Max S-CPTG and Max CPTG cannot be approximated within a factor $O(|V|^{\frac{1}{2} - \varepsilon })$, unless $\text{P}= \text{NP}$. This result is proven by designing an approximation-preserving reduction from the Maximum Independent Set problem (denoted by Max IS). For details on approximation-preserving reductions, we refer the reader to Williamson & Shmoys (2011). We start by recalling the definition of the Max IS problem.

Problem 3. Max IS

Input: A graph $G_I = (V_I,E_I)$ , where $|V_I| = n$ and $|E_I| = m$ .

Output: An independent set $I \subseteq V_I$ (i.e. if $u,v \in I$ , it holds that $\{u,v\} \notin E_I$ ) of maximum size.

We start by describing our approximation preserving reduction from Max IS to Max S-CPTG, and then, we discuss how the result can be extended to Max CPTG.

Given an instance $G_I=(V_I, E_I)$ of Max IS, we define a corresponding vertex-colored graph $G=(V,E,c)$ , which is an instance of Max S-CPTG (an overview of $G=(V,E,c)$ is given in Figure 1).

Figure 1. An overview of the graph $G=(V, E,c)$ associated with $G_I$. Each box contains the set $V_i$ of vertices, with $0 \leq i \leq n+1$, and the path $p(V_i)$. For each vertex, its name is the left label and its color is the right label. We assume that $\{ v_1, v_2 \} \in E_I$, hence $c(v_{1,2})=c(v_{2,1})= c_{1,2}$; $\{ v_1, v_n \} \notin E_I$, hence $c(v_{1,n}) = a_1^n$ and $c(v_{n,1}) = a_n^1$; and $\{ v_2, v_n \} \in E_I$, hence $c(v_{2,n})=c(v_{n,2})= c_{2,n}$.

For each $v_i \in V_I$ , $1 \leq i \leq n$ , $V$ contains a set $V_i$ of $2n+1$ vertices:

\begin{equation*} V_i = \left\{ v_{i,x}\,:\, 0 \leq x \leq 2n \right\}. \end{equation*}

Furthermore, $V$ contains two additional sets of vertices:

\begin{equation*} V_0 = \left\{ v_{0,x}\,:\, 0 \leq x \leq 2n \right\}, \end{equation*}
\begin{equation*} V_{n+1} = \left\{ v_{n+1,x}\,:\, 0 \leq x \leq 2n \right\}. \end{equation*}

The vertex set $V$ of $G$ is defined as the union of the subsets $V_i$ , with $0 \leq i \leq n+1$ :

\begin{equation*} V = \bigcup _{i=0}^{n+1} V_i. \end{equation*}

Now, we define the color function $c : V \rightarrow C$ , where $C$ is equal to:

\begin{equation*} C = \left\{ c_{i,j}: \{ v_i, v_j \} \in E_I \wedge i \lt j\right\} \cup \left\{ a_i^q\,:\, 0 \leq i,q \leq 2n \right\} . \end{equation*}

Informally, each color $c_{i,j}$ represents an edge $\{v_i,v_j\} \in E_I$, with $1 \leq i \lt j \leq n$, while each color $a_i^q$, $1 \leq q \leq n$, represents the fact that $v_i$ is not adjacent to vertex $v_q$. Notice that $a_i^q \neq a_q^i$ for $i \neq q$.

Next, we define the function $c$ that assigns colors in $C$ to vertices in $V$ . For the vertices in $V_i$ , with $1 \leq i \leq n$ , $c$ is defined as follows:

  • $c(v_{i,0}) = a_i^0$

  • $c(v_{i,x}) = a_i^x$ , $n+1 \leq x \leq 2n$

  • $c(v_{i,x}) = c_{i,x}$, if $\{v_i,v_x\} \in E_I$ and $1 \leq i \lt x \leq n$

  • $c(v_{i,x}) = c_{x,i}$, if $\{v_i,v_x\} \in E_I$ and $1 \leq x \lt i \leq n$

  • $c(v_{i,x}) = a_i^x$, if $\{v_i,v_x\} \notin E_I$ and $1 \leq i,x \leq n$

Notice that $c(v_{i,i}) = a_i^i$ , for each $i$ with $0 \leq i \leq n$ , as we assume that $G_I$ does not contain self-loops.

For the vertices of $V_0 \cup V_{n+1}$ , the function $c$ is defined as follows:

  • $c(v_{0,x}) = a_0^x$ , $0 \leq x \leq 2n$

  • $c(v_{n+1,x}) = a_{n+1}^x$ , $0 \leq x \leq 2n$ .

Next, we define the set of edges of $G$ . For each vertex set $V_i$ , $0 \leq i \leq n+1$ , the graph $G$ contains a colorful path $p(V_i)$ induced by the vertices $v_{i,x}$ with $0 \leq x \leq 2n$ and formally defined as follows:

\begin{equation*} p(V_i) = v_{i,0}\ \left\{ v_{i,0}, v_{i,1} \right\}\ v_{i,1} \ \{ v_{i,1}, v_{i,2}\} \ \dots \{ v_{i,2n-1}, v_{i,2n} \} \ v_{i,2n} \end{equation*}

The set $E$ also contains the following edges, which connect some of the paths $p(V_i)$, $0 \leq i \leq n+1$:

  • $\{ v_{0,2n}, v_{i, 0} \}$, for each $i$ with $1 \leq i \leq n$

  • $\{ v_{i,2n}, v_{z, 0} \}$, for each pair $i, z$ with $1 \leq i \lt z \leq n$ and $\{ v_i, v_z \} \notin E_I$

  • $\{ v_{i,2n}, v_{n+1, 0} \}$, for each $i$ with $1 \leq i \leq n$

This completes the definition of the vertex-colored graph $G=(V,E, c)$ . We prove now a property of $G$ .

Lemma 1. Let $G_I=(V_I, E_I)$ be an instance of Max IS and let $G=(V, E, c)$ be the corresponding instance of Max S-CPTG. Then:

  1. Each path $p(V_i)$, with $0 \leq i \leq n+1$, is colorful

  2. The vertices of two paths $p(V_i)$, $p(V_j)$, with $1 \leq i \lt j \leq n$ and $\{ v_i,v_j\} \notin E_I$, have different colors.

Proof. 1. The property follows from the fact that each vertex of $V_i$ , $0 \leq i \leq n+1$ , is associated with a color different from the other vertices of $V_i$ .

2. By definition of the coloring $c$, since $\{ v_i,v_j\} \notin E_I$, it follows that $c(v_{i,j}) = a_i^j$ and $c(v_{j,i}) = a_j^i$ and, by construction, $a_i^j \neq a_j^i$. Since, by construction, all the other vertices of $p(V_i)$ and $p(V_j)$ are also associated with different colors, the lemma holds.
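The construction of $G=(V,E,c)$ from $G_I$ is mechanical, so it can be sketched in a few lines and used to check Lemma 1 on small inputs. In the following Python sketch (an illustrative assumption, not code from the paper), a vertex $v_{i,x}$ is the pair `(i, x)`, the edges of $G_I$ are 1-indexed frozensets `{i, j}`, the color $c_{i,j}$ with $i \lt j$ is the tuple `("c", i, j)`, and the color $a_i^x$ is the tuple `("a", i, x)`. The helper names are hypothetical.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

Vertex = Tuple[int, int]  # (i, x) encodes v_{i,x}


def reduction_color(i: int, x: int, n: int, ei: Set[FrozenSet[int]]) -> Tuple:
    """c(v_{i,x}): c_{i,x} (or c_{x,i}) if {v_i, v_x} is an edge of G_I, else a_i^x."""
    if 1 <= i <= n and 1 <= x <= n and frozenset((i, x)) in ei:
        return ("c", min(i, x), max(i, x))  # shared by v_{i,x} and v_{x,i}
    return ("a", i, x)  # covers v_{i,0}, x > n, non-edges, and V_0, V_{n+1}


def build_reduction(n: int, ei: Set[FrozenSet[int]]):
    """Return (V, E, c) of the Max S-CPTG instance associated with G_I."""
    V: List[Vertex] = [(i, x) for i in range(n + 2) for x in range(2 * n + 1)]
    c: Dict[Vertex, Tuple] = {(i, x): reduction_color(i, x, n, ei) for (i, x) in V}
    E: Set[FrozenSet[Vertex]] = set()
    for i in range(n + 2):  # the paths p(V_i), 0 <= i <= n+1
        for x in range(2 * n):
            E.add(frozenset({(i, x), (i, x + 1)}))
    for i in range(1, n + 1):  # edges connecting the paths
        E.add(frozenset({(0, 2 * n), (i, 0)}))
        E.add(frozenset({(i, 2 * n), (n + 1, 0)}))
        for z in range(i + 1, n + 1):
            if frozenset((i, z)) not in ei:  # only for non-adjacent v_i, v_z
                E.add(frozenset({(i, 2 * n), (z, 0)}))
    return V, E, c


# Usage: G_I is the path v1 - v2 - v3 (n = 3), so {v1, v3} is not an edge
V, E, c = build_reduction(3, {frozenset((1, 2)), frozenset((2, 3))})
print(len(V), len(E))          # 35 37
print(c[(1, 2)] == c[(2, 1)])  # True: both vertices get c_{1,2}
```

Here $|V| = (2n+1)(n+2) = 35$, and $E$ contains $(n+2)\cdot 2n = 30$ path edges plus $7$ connecting edges (only one of them, $\{v_{1,6}, v_{3,0}\}$, comes from a non-edge of $G_I$).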

Given a solution of Max IS, we show next how to construct in polynomial time a solution of Max S-CPTG.

Lemma 2. Consider an instance $G_I=(V_I, E_I)$ of Max IS and let $G=(V, E, c)$ be the corresponding instance of Max S-CPTG. Given a solution $I \subseteq V_I$ of Max IS, we can construct in polynomial time a solution of Max S-CPTG of length at least $(|I|+2)(2n+1)$ .

Proof. Consider an independent set $I = \{ v_{i_1}, v_{i_2}, \dots, v_{i_b}\}$ of $G_I$, where $i_1 \lt i_2\lt \dots \lt i_b$. Define a solution $p$ of Max S-CPTG as follows. The path $p$ includes the path $p(V_0)$, the paths $p(V_{i_x})$, $1 \leq x \leq b$, and the path $p(V_{n+1})$. Each of these paths is colorful by Lemma 1. $p$ is obtained by connecting these paths with the following edges:

  • $\{ v_{0,2n}, v_{i_1,0}\}$ connects $p(V_0)$ and $p(V_{i_{1}})$ ;

  • $\{ v_{i_x,2n}, v_{i_{x+1},0}\}$, which exists by construction since $i_x \lt i_{x+1}$ and $\{ v_{i_x}, v_{i_{x+1}} \} \notin E_I$, connects $p(V_{i_x})$ and $p(V_{i_{x+1}})$, $1 \leq x \leq b-1$;

  • $\{ v_{i_b,2n}, v_{n+1,0}\}$ connects $p(V_{i_b})$ and $p(V_{n+1})$ .

Notice that $v_{i_x}, v_{i_y} \in I$, with $1 \leq x \lt y \leq b$, are not adjacent in $G_I$; thus, by Lemma 1, the vertices in $p(V_{i_x})$ and $p(V_{i_y})$ are associated with different colors. By construction, each vertex in $p(V_0)$ and $p(V_{n+1})$ has a color distinct from those of the other vertices in $V$; hence, $p$ is colorful. Finally, notice that $p$ is obtained by connecting $|I|+2$ paths $p(V_i)$, each of length $2n+1$, thus concluding the proof.
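The path used in the proof of Lemma 2 can be written down explicitly. The sketch below uses an illustrative encoding that is our assumption: vertices are pairs `(i, x)` and colors are tuples, with `reduction_color` as a hypothetical helper. It concatenates the vertex sequences of $p(V_0)$, $p(V_{i_1}), \dots, p(V_{i_b})$, $p(V_{n+1})$ and checks only length and colorfulness, not adjacency in $G$. For an independent set, the sequence has length $(|I|+2)(2n+1)$ and is colorful, while choosing two adjacent vertices of $G_I$ produces a repeated color.

```python
from typing import FrozenSet, List, Set, Tuple


def reduction_color(i: int, x: int, n: int, ei: Set[FrozenSet[int]]) -> Tuple:
    # Coloring of the reduction: c_{min(i,x),max(i,x)} for edges of G_I, else a_i^x
    if 1 <= i <= n and 1 <= x <= n and frozenset((i, x)) in ei:
        return ("c", min(i, x), max(i, x))
    return ("a", i, x)


def path_from_independent_set(n: int, chosen: Set[int]) -> List[Tuple[int, int]]:
    """Vertices of p(V_0), p(V_{i_1}), ..., p(V_{i_b}), p(V_{n+1}), with i_1 < ... < i_b."""
    blocks = [0] + sorted(chosen) + [n + 1]
    return [(i, x) for i in blocks for x in range(2 * n + 1)]


n, ei = 3, {frozenset((1, 2)), frozenset((2, 3))}  # G_I: v1 - v2 - v3
p = path_from_independent_set(n, {1, 3})
colors = [reduction_color(i, x, n, ei) for (i, x) in p]
print(len(p), len(set(colors)) == len(p))  # 28 True

# {v1, v2} is an edge of G_I (the sequence is not even a path of G here)
bad = path_from_independent_set(n, {1, 2})
bad_colors = [reduction_color(i, x, n, ei) for (i, x) in bad]
print(len(set(bad_colors)) == len(bad))  # False: v_{1,2}, v_{2,1} share c_{1,2}
```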

In the next lemma, we show that, given a solution of Max S-CPTG, we can construct in polynomial time a solution of Max IS.

Lemma 3. Consider an instance $G_I=(V_I, E_I)$ of Max IS and let $G=(V, E, c)$ be the corresponding instance of Max S-CPTG. Then, starting from a solution of Max S-CPTG of length $(q+2)(2n+1)$ , we can construct in polynomial time an independent set of $G_I$ of size at least $q$ .

Proof. Let $p$ be a colorful path in $G$ of length $(q+2) (2n+1)$. Notice that $p$ contains vertices that belong to at least $q$ paths $p(V_i)$, $1 \leq i \leq n$. If this is not the case, then by construction the colorful path $p$ contains only vertices of fewer than $q$ paths $p(V_i)$, with $1 \leq i \leq n$, and of the paths $p(V_0)$, $p(V_{n+1})$. Since by construction $|p(V_j)| = 2n+1$, with $0 \leq j \leq n+1$, it follows that $|p| \lt (q +2)(2n+1)$, a contradiction.

Each path $p(V_i)$ , $1 \leq i \leq n$ , with some vertices in $p$ , belongs to one of the following sets:

  • The set $S_1$ of paths entirely included in $p$ (we assume that there are $q'$ of them, with $0 \leq q' \leq n$);

  • The set $S_2$ of paths not included in $p$ that have more than two vertices in $p$ ; by construction, these paths must include an endpoint of $p$ , thus this set contains at most two paths;

  • The set $S_3$ of paths not included in $p$ that have one or two vertices in $p$ (this set includes at most $n-q'$ paths).

Consider now $p(V_0)$ and $p(V_{n+1})$. If one of $p(V_0)$ and $p(V_{n+1})$ is not included in $p$, then either (1) it contains an endpoint of $p$, or (2) it does not contain an endpoint of $p$ and, by construction, the only vertex of $p(V_0)$ or of $p(V_{n+1})$ included in $p$ can be $v_{0,2n}$ or $v_{n+1,0}$, respectively. We can conclude that the number of vertices of $p$ that belong to paths in $S_2$, to $p(V_0)$, or to $p(V_{n+1})$ is at most $2(2n+1)+2$. The overall number of vertices of $p$ (they belong to paths in $S_1 \cup S_2 \cup S_3$ and to paths $p(V_0)$, $p(V_{n+1})$) is at most:

\begin{equation*} q' (2n+1) + 2(n-q') + 2(2n+1) +2 = (q'+2) (2n+1) + 2(n-q'+1) \end{equation*}

We claim that the paths in $S_1$ are at least $q$ . Assume by contradiction that $q' \lt q$ . Then,

\begin{equation*} (q+2)(2n+1) - (q'+2) (2n+1) \geq 2n+1 \end{equation*}

Since $|p| = (q+2)(2n+1)$ , it follows that $2(n-q'+1) \geq 2n+1$ , which implies $-2q'+2 \geq 1$ . Thus, $2q' \leq 1$ , that is $q' \leq 0$ , since $q'$ is an integer. Now if $q'=0$ , then $q \geq 1$ , and the number of vertices in $p$ is strictly smaller than

\begin{equation*} 3(2n+1) \leq (q+2)(2n+1), \end{equation*}

since it is bounded by $2(2n+1) + 2n$ (if both $p(V_0)$ and $p(V_{n+1})$ are included in $p$ ), by $2(2n+1) + 2(n-1)+1$ (if exactly one of $p(V_0)$ and $p(V_{n+1})$ is included in $p$ ), by $2(2n+1) + 2(n-2)+2$ (if none of $p(V_0)$ and $p(V_{n+1})$ is included in $p$ ).

Hence, $S_1$ must include at least $q$ paths. Now, consider two paths $p(V_i)$, $p(V_j)$ in $S_1$, with $1 \leq i \lt j \leq n$. Notice that, since all the vertices of $p(V_i)$ and $p(V_j)$ belong to $p$ and $p$ is colorful, the vertices of $p(V_i)$ and $p(V_j)$ are associated with different colors. Then, $\{ v_i, v_j\} \notin E_I$; otherwise, the two vertices $v_{i,j}$ and $v_{j,i}$ in $p(V_i)$ and $p(V_j)$, respectively, would both be assigned the same color $c_{i,j}$.

It follows that we can define an independent set $I$ of $G_I$ as follows:

\begin{equation*} I = \{ v_i: p(V_i) \text { is a path in } S_1 \} . \end{equation*}

Notice that $|I| \geq q$, since we have proved that $S_1$ contains at least $q$ paths, thus concluding the proof.
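The extraction step of Lemma 3 keeps exactly the indices $i$, $1 \leq i \leq n$, such that $p(V_i)$ lies entirely in $p$ (the set $S_1$). A minimal sketch follows, assuming the colorful path is given as a list of pairs `(i, x)` encoding the vertices $v_{i,x}$. The encoding and the function name are hypothetical. The two example paths are valid paths of the reduction graph for $G_I = v_1 v_2 v_3$.

```python
from typing import Dict, Iterable, Set, Tuple


def independent_set_from_path(path: Iterable[Tuple[int, int]], n: int) -> Set[int]:
    """Indices i in 1..n such that p(V_i) lies entirely in the path (the set S_1)."""
    seen: Dict[int, Set[int]] = {}
    for i, x in path:
        seen.setdefault(i, set()).add(x)
    return {i for i in range(1, n + 1) if len(seen.get(i, set())) == 2 * n + 1}


n = 3
# p(V_0), p(V_1), p(V_3), p(V_4) in full (the path of Lemma 2 for I = {v1, v3})
full_path = [(i, x) for i in (0, 1, 3, 4) for x in range(2 * n + 1)]
# p(V_0), p(V_1), then only the first two vertices of p(V_3)
partial_path = [(i, x) for i in (0, 1) for x in range(2 * n + 1)] + [(3, 0), (3, 1)]
print(independent_set_from_path(full_path, n))     # {1, 3}
print(independent_set_from_path(partial_path, n))  # {1}
```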

The inapproximability of Max S-CPTG follows from Lemma 2, Lemma 3, and the inapproximability of Max IS (Zuckerman, 2006).

Theorem 1. Max S-CPTG is not approximable within a factor $O(|V|^{1/2- \varepsilon })$ unless P=NP.

Proof. We show that the reduction from Max IS to Max S-CPTG described above is approximation preserving. Denote the value of an optimal solution of Max S-CPTG (Max IS, respectively) by $\text{OPT(S-CPTG)}$ ($\text{OPT(IS)}$, respectively); denote the value of an approximate solution of Max S-CPTG (Max IS, respectively) by $\text{APX(S-CPTG)}$ ($\text{APX(IS)}$, respectively). Next, consider the approximation factor of Max S-CPTG, that is

\begin{equation*} \frac {\text{OPT(S-CPTG)}}{\text{APX(S-CPTG)}}. \end{equation*}

By Lemma 2, it follows that $\text{OPT(S-CPTG)} \geq (2n+1)\cdot \text{(OPT(IS) + 2)}$ . Thus,

\begin{equation*} \frac {\text{OPT(S-CPTG)}}{\text{APX(S-CPTG)}} \geq \frac {(2n+1) \cdot \text{(OPT(IS)+2)}}{\text{APX(S-CPTG)}}. \end{equation*}

Consider an approximate solution of Max S-CPTG of length $(2n+1)\cdot (q+2)$; by Lemma 3, we can compute in polynomial time a solution of Max IS of size at least $q$. It follows that $\text{APX(S-CPTG)} \leq (2n+1)\cdot \text{(APX(IS) + 2)}$. Thus,

\begin{equation*} \frac {\text{OPT(S-CPTG)}}{\text{APX(S-CPTG)}} \geq \frac {(2n+1) \cdot \text{(OPT(IS)+2)}}{\text{APX(S-CPTG)}} \geq \frac {(2n+1) \cdot \text{(OPT(IS)+2)}}{(2n+1) \cdot \text{( APX(IS)+2)}}. \end{equation*}

Since we can assume that $\text{APX(IS)} \geq 1$ (any single vertex is an independent set), it follows that

\begin{equation*} \frac {\text{OPT(S-CPTG)}}{\text{APX(S-CPTG)}} \geq \frac {\text{OPT(IS)+2}}{\text{APX(IS)+2}} \geq \frac {\text{OPT(IS)}}{\text{APX(IS)+2}} \geq \frac {\text{OPT(IS)}}{3 \text{APX(IS)}}. \end{equation*}

Since Max IS is not approximable within a factor $O(n^{1-\varepsilon })$, for any $\varepsilon \gt 0$, unless P = NP (Zuckerman, 2006), it follows that

\begin{equation*} \frac {\text{OPT(S-CPTG)}}{\text{APX(S-CPTG)}} \geq \frac {\text{OPT(IS)}}{3 \text{APX(IS)}} \geq O\left(n^{1 -\varepsilon }\right). \end{equation*}

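By construction, $|V| = (2n+1)(n+2)$; hence, $n = \Theta(|V|^{\frac{1}{2}})$, and we have that

\begin{equation*} \frac {\text{OPT(S-CPTG)}}{\text{APX(S-CPTG)}} \geq O\left(n^{1 -\varepsilon }\right) = O\left(|V|^{\frac {1}{2} -\frac {\varepsilon }{2}}\right). \end{equation*}

Since $\varepsilon \gt 0$ is arbitrary, replacing $\varepsilon/2$ by $\varepsilon$ gives the factor $O(|V|^{\frac{1}{2}-\varepsilon})$,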

thus concluding the proof.

Now, we show how to extend the inapproximability result to Max CPTG.

Corollary 1. Max CPTG is not approximable within a factor $O(|V|^{1/2- \varepsilon })$ unless P=NP.

Proof. We construct an approximation-preserving reduction from Max IS to Max CPTG by modifying the previous reduction (from Max IS to Max S-CPTG). We define a time domain $\mathcal{T}$ that consists of the concatenation of $n+2$ disjoint time intervals $T(V_0), T(V_1), \dots, T(V_n), T(V_{n+1})$, where each $T(V_i)$, $0 \leq i \leq n+1$, is associated with vertex set $V_i$, and $T(V_i)$ precedes $T(V_{i+1})$ for each $i$ with $0 \leq i \leq n$. In each $T(V_i)$, $0 \leq i \leq n+1$, apart from its last timestamp when $i \leq n$, only edges connecting vertices of $V_i$ are active, so that $p(V_i)$ is a temporal path from $v_{i,0}$ to $v_{i,2n}$. Moreover, in the last timestamp $t$ of $T(V_i)$, $0 \leq i \leq n$, the only active edges are those connecting $v_{i,2n}$ with vertices $v_{j,0}$, as defined in the previous reduction. It follows that: (1) as for Lemma 2, given an independent set of size $q$, we can compute in polynomial time a colorful temporal path of length at least $(q+2)(2n+1)$ in $G$; (2) as for Lemma 3, given a colorful temporal path of length $(q+2)(2n+1)$ in $G$, we can compute in polynomial time an independent set of size at least $q$. Thus, the same argument as in Theorem 1 applies, and we can conclude that Max CPTG is not approximable within a factor $O(|V|^{1/2- \varepsilon })$ unless P = NP.
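The proof above does not fix the length of the intervals $T(V_i)$. Purely as an illustrative assumption, the sketch below lets each $T(V_i)$ consist of the $2n+1$ consecutive timestamps $i(2n+1), \dots, i(2n+1)+2n$. It places the edge $\{v_{i,x}, v_{i,x+1}\}$ at timestamp $i(2n+1)+x$ and every connecting edge leaving $v_{i,2n}$ at the last timestamp of $T(V_i)$. It then checks that the path built from an independent set satisfies the time constraint. The function names and the encoding `(i, x)` of $v_{i,x}$ are hypothetical.

```python
from typing import FrozenSet, List, Set, Tuple

Vertex = Tuple[int, int]  # (i, x) encodes v_{i,x}


def temporal_reduction_edges(n: int, ei: Set[FrozenSet[int]]) -> List[Tuple[Vertex, Vertex, int]]:
    """Temporal edges of the Corollary 1 construction under an assumed timestamp layout."""
    size = 2 * n + 1  # assumed number of timestamps in each interval T(V_i)
    edges = []
    for i in range(n + 2):
        for x in range(2 * n):
            # edges of p(V_i) get increasing timestamps inside T(V_i)
            edges.append(((i, x), (i, x + 1), i * size + x))
    for i in range(1, n + 1):
        # connecting edges are active only in the last timestamp of T(V_0) or T(V_i)
        edges.append(((0, 2 * n), (i, 0), 2 * n))
        edges.append(((i, 2 * n), (n + 1, 0), i * size + 2 * n))
        for z in range(i + 1, n + 1):
            if frozenset((i, z)) not in ei:
                edges.append(((i, 2 * n), (z, 0), i * size + 2 * n))
    return edges


def path_timestamps(path: List[Vertex], edges) -> List[int]:
    """Timestamps of consecutive path edges (each vertex pair has one edge here)."""
    ts = {frozenset((u, v)): t for u, v, t in edges}
    return [ts[frozenset((path[k], path[k + 1]))] for k in range(len(path) - 1)]


# Usage: G_I is the path v1 - v2 - v3, and I = {v1, v3} is independent
n, ei = 3, {frozenset((1, 2)), frozenset((2, 3))}
path = [(i, x) for i in [0, 1, 3, n + 1] for x in range(2 * n + 1)]
times = path_timestamps(path, temporal_reduction_edges(n, ei))
print(all(a < b for a, b in zip(times, times[1:])))  # True: the time constraint holds
```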

4. A heuristic for Max CPTG

We now present an efficient heuristic, called Colorful Temporal Path Local Search (CTPLS), for the Max CPTG problem. CTPLS consists of two phases:

  1. CTPLS computes a solution (a colorful temporal path) via a greedy preliminary step.

  2. CTPLS applies a local search strategy that looks for possible improvements of the solution computed so far.

First, we describe the preliminary greedy step. Consider a vertex-colored temporal graph $G_c = (V,E,{\mathcal{T}},c)$. The preliminary step computes a segmentation of the time domain $\mathcal{T}$, dividing $\mathcal{T}$ into $|C|$ disjoint intervals of equal length. Then, in each interval, it greedily looks for a temporal edge to be added to the path $p$ computed so far. The path $p$ is initialized as two vertices with distinct colors connected by a temporal edge in the first interval (or in the first interval that contains such an edge, if the first interval contains none). In each subsequent interval, the greedy step looks for a vertex $v$ that satisfies the following properties:

  1. $v$ is connected with a temporal edge to an endpoint of $p$.

  2. The color of $v$ is distinct from the colors of the vertices already in $p$.
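The preliminary step can be sketched in Python, the language CTPLS is implemented in. This is a minimal sketch under our own assumptions, not the authors' code: temporal edges are hypothetical `(u, v, t)` triples with integer timestamps in `range(horizon)`, a temporal path must use strictly increasing timestamps, at most one edge is added per interval, and `segment`, `try_extend` and `greedy_colorful_path` are illustrative names.

```python
from collections import defaultdict


def segment(edges, num_intervals, horizon):
    """Split timestamps 0..horizon-1 into num_intervals equal-length buckets."""
    buckets = defaultdict(list)
    for u, v, t in edges:
        buckets[t * num_intervals // horizon].append((u, v, t))
    return [sorted(buckets[k], key=lambda e: e[2]) for k in range(num_intervals)]


def try_extend(path, times, edge, color):
    """Add edge's far endpoint to p if p stays colorful and time-respecting."""
    u, v, t = edge
    used = {color[x] for x in path}
    # A single-edge path can still be read in either direction.
    ends = [path[-1], path[0]] if len(path) == 2 else [path[-1]]
    for end in ends:
        for a, b in ((u, v), (v, u)):
            # A new color also guarantees that b is not already in p.
            if a == end and color[b] not in used and t > times[-1]:
                if end != path[-1]:
                    path.reverse()  # only happens when p has a single edge
                path.append(b)
                times.append(t)
                return True
    return False


def greedy_colorful_path(edges, color, num_colors, horizon):
    """Greedy preliminary step: returns (vertices of p, timestamps of its edges)."""
    path, times = [], []
    for bucket in segment(edges, num_colors, horizon):
        for u, v, t in bucket:
            if not path:
                if color[u] != color[v]:  # initialise p with one colorful edge
                    path, times = [u, v], [t]
                    break
            elif try_extend(path, times, (u, v, t), color):
                break  # assumption: at most one new edge per interval
    return path, times
```

For example, with three colors and `horizon=6` the intervals are {0, 1}, {2, 3} and {4, 5}; on the edges `a-b` (time 0), `b-c` (time 2) and `c-d` (time 4), where `d` repeats the color of `a`, the sketch returns `(['a', 'b', 'c'], [0, 2])`.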

The second step of CTPLS is based on a local search strategy that applies the following possible modifications to a colorful temporal path $p$ (that does not include all the colors):

  1. LS1 (Edge replacement): each temporal edge $\{w,v,t\}$ of $p$ is selected and possibly replaced with two temporal edges $\{w,x,t_1\}$, $\{x,v,t_2\}$, where $t_1 \lt t_2$; vertex $x$ must not be in $p$ and must have a color distinct from those of the vertices already in $p$; moreover, the new path must satisfy the time constraint.

  2. LS2 (Vertex replacement): a vertex $u$ of $p$ is selected, together with the temporal edges of $p$ incident to $u$, and it is possibly replaced with two vertices $v$ and $w$ (not in $p$) and three temporal edges, so that the new path satisfies the time constraint. Notice that $v$ and $w$ must have colors different from each other and from those of the vertices of $p$ (except for the replaced vertex $u$).

CTPLS applies LS1 as long as it improves the solution; then, on the resulting solution, it applies LS2 as long as it improves the solution.
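Both local search moves, and the order in which CTPLS applies them, can be sketched on a path stored as a list of vertices plus the list of its edge timestamps. This is a hypothetical first-improvement implementation, not the authors' code: timestamps are assumed strictly increasing along the path, LS2 is only tried on internal vertices, each move uses the earliest feasible timestamps, and all function names are illustrative.

```python
from bisect import bisect_right
from collections import defaultdict

INF = float("inf")


def build_index(edges):
    """Adjacency sets and sorted timestamps for each unordered vertex pair."""
    adj, stamps = defaultdict(set), defaultdict(list)
    for u, v, t in edges:
        adj[u].add(v)
        adj[v].add(u)
        stamps[frozenset((u, v))].append(t)
    for ts in stamps.values():
        ts.sort()
    return adj, stamps


def first_after(stamps, u, v, lo):
    """Earliest timestamp of edge {u, v} strictly greater than lo, or None."""
    ts = stamps.get(frozenset((u, v)), [])
    k = bisect_right(ts, lo)
    return ts[k] if k < len(ts) else None


def ls1_step(path, times, color, adj, stamps):
    """LS1: {w,v,t} -> {w,x,t1}, {x,v,t2}. Mutates p and returns True on success."""
    used = {color[x] for x in path}
    for i in range(len(times)):
        w, v = path[i], path[i + 1]
        lo = times[i - 1] if i > 0 else -INF  # previous edge of p
        hi = times[i + 1] if i + 1 < len(times) else INF  # next edge of p
        for x in sorted(adj[w] & adj[v], key=repr):
            if color[x] in used:  # also rules out vertices already in p
                continue
            t1 = first_after(stamps, w, x, lo)
            t2 = None if t1 is None else first_after(stamps, x, v, t1)
            if t2 is not None and t2 < hi:
                path.insert(i + 1, x)
                times[i:i + 1] = [t1, t2]
                return True
    return False


def ls2_step(path, times, color, adj, stamps):
    """LS2: replace an internal vertex u = p[i] by v, w and three new edges."""
    for i in range(1, len(path) - 1):
        a, u, b = path[i - 1], path[i], path[i + 1]
        used = {color[x] for x in path if x != u}  # u's color becomes free
        lo = times[i - 2] if i >= 2 else -INF
        hi = times[i + 1] if i + 1 < len(times) else INF
        for v in sorted(adj[a], key=repr):
            if v in path or color[v] in used:
                continue
            t1 = first_after(stamps, a, v, lo)
            if t1 is None:
                continue
            for w in sorted(adj[v] & adj[b], key=repr):
                if w in path or color[w] in used or color[w] == color[v]:
                    continue
                t2 = first_after(stamps, v, w, t1)
                t3 = None if t2 is None else first_after(stamps, w, b, t2)
                if t3 is not None and t3 < hi:
                    path[i:i + 1] = [v, w]
                    times[i - 1:i + 1] = [t1, t2, t3]
                    return True
    return False


def local_search(path, times, edges, color):
    """Apply LS1 while it succeeds, then LS2 while it succeeds."""
    adj, stamps = build_index(edges)
    path, times = list(path), list(times)
    while ls1_step(path, times, color, adj, stamps):
        pass
    while ls2_step(path, times, color, adj, stamps):
        pass
    return path, times
```

Choosing the earliest feasible timestamp for each new edge is a simple greedy choice that leaves as much room as possible for the following edge. Each successful move adds one vertex while keeping the path colorful, so both loops stop after at most $|C|$ iterations.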

5. Experimental results

We now present an experimental evaluation of CTPLS on synthetic and real networks. CTPLS is implemented in Python 3.7, using the NetworkX package for network analysis (Hagberg et al., 2008). We performed the experiments on a MacBook Pro (OS version 11.4) with a 2.9 GHz Intel Core i5 processor and 8 GB of 2133 MHz LPDDR3 RAM.

5.1. Synthetic networks

The first part of our experimental evaluation considers the performance of CTPLS on synthetic datasets. First, we describe how the synthetic datasets are generated and then we discuss the results of CTPLS.

Datasets. Each synthetic graph is built as follows. We start by generating a temporal graph consisting of $500$ vertices over $90$ timestamps, where the graph topology is based on one of the following models: Erdős-Rényi (ER) with parameter $pr=0.1$, Erdős-Rényi with parameter $pr=0.4$, and Barabási-Albert (BA) with parameter equal to $10$. Then, $|C|$ vertices of the graph are selected at random, and each of them is assigned a distinct color of $C$; temporal edges are added so that the resulting graph has a temporal path that connects these vertices. This ensures that each synthetic graph contains a colorful temporal path that includes all the colors in $C$, which is an optimal solution of Max CPTG. Since such a path exists, we can compare the solutions computed by CTPLS with the optimal ones. The remaining vertices are assigned colors from $C$ at random (with uniform probability). We consider four different cardinalities for the set $C$: 10, 20, 30, and 50 colors. We generated 1000 synthetic graphs for each graph model and each cardinality of $C$.
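The planting step can be sketched as follows. How the ER or BA temporal topology is generated and which timestamps the planted edges receive are not specified above, so this sketch takes an arbitrary list of temporal edges as input and assumes that the planted path edges get strictly increasing timestamps drawn at random from `range(horizon)`; `plant_colorful_path` is a hypothetical name, and colors are encoded as the integers `0..num_colors-1`.

```python
import random


def plant_colorful_path(vertices, edges, num_colors, horizon, seed=None):
    """Hypothetical sketch of the coloring/planting step for one synthetic graph.

    Returns (edges plus planted path edges, color map, planted vertices in path order).
    """
    rng = random.Random(seed)
    if horizon < num_colors - 1:
        raise ValueError("not enough timestamps for a temporal path of num_colors vertices")
    # |C| random vertices, each with a distinct color.
    planted = rng.sample(list(vertices), num_colors)
    color = {v: i for i, v in enumerate(planted)}
    # Remaining vertices: colors drawn uniformly from C.
    for v in vertices:
        if v not in color:
            color[v] = rng.randrange(num_colors)
    # Strictly increasing timestamps make the planted path time-respecting.
    stamps = sorted(rng.sample(range(horizon), num_colors - 1))
    path_edges = [(planted[i], planted[i + 1], stamps[i]) for i in range(num_colors - 1)]
    return list(edges) + path_edges, color, planted
```

With the parameters above, for instance `plant_colorful_path(range(500), base_edges, 50, 90)` would plant a 50-vertex colorful temporal path using 49 of the 90 timestamps, where `base_edges` stands for the (unspecified) ER or BA temporal edges.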

Outcome. In Table 1, we present the results of our experimental evaluation on the 12,000 instances of the synthetic datasets. We report the minimum, maximum, average and standard deviation of the size (number of colors) of the solutions returned by CTPLS over the $1000$ instances for each color set and each graph model. We also report the average running time (in seconds).

Table 1. Performance of CTPLS on synthetic datasets, where the number of colors ranges from 10 to 50. Minimum, maximum, average, and standard deviation over 1000 independent synthetic networks for each color set are reported. The average running time is in seconds

As shown in Table 1, the performance of CTPLS degrades as the number of colors increases. For the BA-based graphs with color sets of size $10$, the computed solutions contain on average about $89\%$ of the colors in $C$, while for color sets of size $50$ the average number of colors contained in the computed solutions is $16.17$ out of $50$. Similar performance is observed for ER-based graphs with $pr=0.1$, where for $50$ colors the returned solutions contain on average $13.63$ colors out of $50$. For ER-based graphs with $pr=0.4$, the degradation is moderate, and even for $50$ colors the average number of colors contained in the solutions computed by CTPLS is high ($42.43$ out of $50$).

The experimental results show that the performance of CTPLS depends on the specific graph model. For the ER model with $pr=0.4$, the returned solutions are always close to the optimum: $10$ out of $10$ for $10$ colors, $99.45\%$ of the optimum for $20$ colors, $95.2\%$ for $30$ colors and $84.86\%$ for $50$ colors. The performance is significantly worse on ER with $pr=0.1$, with $59.7\%$ of the optimum for $30$ colors and $27.3\%$ for $50$ colors. For the graphs based on this model, the standard deviation is high, in particular for $30$ colors ($5.23$) and for $50$ colors ($7.20$). In these cases, the minimum length of a returned solution is considerably small ($4$ and $2$ for $30$ and $50$ colors, respectively). For the BA model, the solutions returned by CTPLS are close to the optimum only for the case of $10$ colors ($89.1\%$ of the optimum on average), and are on average $63.1\%$, $48.4\%$ and $32.3\%$ of the optimum for $20$, $30$ and $50$ colors, respectively.
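The percentages above are the average solution size divided by $|C|$, which is the optimum of each synthetic instance (for example, $42.43/50 = 84.86\%$, $13.63/50 \approx 27.3\%$ and $16.17/50 \approx 32.3\%$). A small helper that computes the statistics of Table 1 from per-instance solution sizes could look like this; `summarize` is a hypothetical name, and the use of the population standard deviation is an assumption, since the estimator is not stated.

```python
from statistics import mean, pstdev


def summarize(sizes, num_colors):
    """Table-1-style statistics for the solution sizes of one (model, |C|) group.

    Assumes the optimum of every instance equals num_colors (true for the
    planted synthetic instances) and uses the population standard deviation.
    """
    avg = mean(sizes)
    return {
        "min": min(sizes),
        "max": max(sizes),
        "avg": avg,
        "std": pstdev(sizes),
        "pct_of_opt": 100 * avg / num_colors,
    }
```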

It is not surprising that, for some datasets, the lengths of the solutions returned by CTPLS are far from the optimum, since Max CPTG is hard to approximate (see Section 3).

As for the running time, the method is always fast on synthetic datasets, with an average running time of at most $0.64$ seconds (ER model with $pr=0.4$ and $50$ colors).

5.2. Real networks

We analyze the performance of CTPLS also on four real-world datasets.

Datasets. We consider four real-world temporal graphs taken from SNAP (Leskovec & Krevl, 2014) for testing CTPLS: College messages (CollegeMsg), Email EU core (email-Eu-core-temporal), Bitcoin alpha (soc-sign-bitcoinalpha) and Bitcoin otc (soc-sign-bitcoinotc). These graphs are temporal but not colored, hence we have to define a coloring of their vertices. Following the approach of Thejaswi et al. (2020), we define a coloring by assigning to each vertex a color drawn uniformly at random from a set $C$ of colors; we consider two cases, $|C| = 30$ and $|C|=50$. Moreover, for each vertex-colored graph obtained, the value of an optimal solution is unknown, and applying an exact exponential algorithm is not practical on these large datasets. Thus, in order to evaluate the results of CTPLS, we consider two variants of each of these networks: the original temporal graph, denoted by NO-OP, and a modified temporal graph, called YES-OP, obtained by adding a colorful temporal path that contains each color in $C$. By construction, the YES-OP temporal graph contains an optimal solution of length $|C|$.
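The coloring and the YES-OP construction can be sketched as follows. The text does not say which vertices the added path uses or which timestamps its edges receive; this sketch assumes (hypothetically) that the path reuses one existing vertex per color, visited in random color order, with strictly increasing timestamps drawn from the original time range. `color_uniformly` and `make_yes_op` are illustrative names.

```python
import random


def color_uniformly(vertices, num_colors, rng):
    """NO-OP coloring: each vertex gets a color drawn uniformly from 0..num_colors-1."""
    return {v: rng.randrange(num_colors) for v in vertices}


def make_yes_op(edges, color, num_colors, rng):
    """Hypothetical YES-OP variant: add a temporal path through one vertex per color.

    Returns (original edges plus added path edges, path vertices in order).
    """
    by_color = {}
    for v in sorted(color, key=repr):
        by_color.setdefault(color[v], []).append(v)
    missing = set(range(num_colors)) - set(by_color)
    if missing:
        raise ValueError(f"colors never assigned: {sorted(missing)}")
    order = rng.sample(range(num_colors), num_colors)  # random color order
    reps = [rng.choice(by_color[c]) for c in order]     # one vertex per color
    t_min = min(t for _, _, t in edges)
    t_max = max(t for _, _, t in edges)
    # Strictly increasing timestamps inside the original time range.
    stamps = sorted(rng.sample(range(t_min, t_max + 1), num_colors - 1))
    added = [(reps[i], reps[i + 1], stamps[i]) for i in range(num_colors - 1)]
    return list(edges) + added, reps
```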

The first dataset, CollegeMsg, is based on private messages sent on an online social network by users from the University of California, Irvine. Vertices represent social network users, and temporal edges represent private messages exchanged between users. The dataset consists of 59,835 temporal interactions, 1899 vertices and a time domain of length 58,911. The email-Eu-core-temporal dataset is based on e-mails between members of a large European research institution, where vertices represent institution members and edges represent e-mails exchanged between them. The dataset contains 332,334 temporal interactions, 986 vertices and a time domain $\mathcal{T}$ of length 207,880. soc-sign-bitcoinalpha and soc-sign-bitcoinotc are networks of users trading Bitcoin on two platforms (Bitcoin Alpha and Bitcoin OTC), where members rate each other in order to avoid transactions with risky users. Vertices represent platform members, and edges represent ratings given by users. soc-sign-bitcoinalpha contains 24,186 temporal interactions, 3783 vertices and a time domain $\mathcal{T}$ of length 1647, while soc-sign-bitcoinotc contains 35,592 temporal interactions, 5881 vertices and a time domain $\mathcal{T}$ of length 35,445.

Outcome. In Table 2, we report, for the two groups of real datasets we considered (NO-OP and YES-OP), the number of colors included in the solutions returned by CTPLS and the running time (in minutes) of CTPLS. For the NO-OP datasets with $30$ colors, CTPLS returned in the worst case a path containing $20$ out of $30$ colors (soc-sign-bitcoinalpha) and in the best case an optimal solution (email-Eu-core-temporal). For the other two networks, CollegeMsg and soc-sign-bitcoinotc, our heuristic computed near-optimal solutions, with $27$ and $25$ colors out of $30$, respectively.

For the YES-OP networks with $30$ colors, the results are not significantly different from those on the corresponding NO-OP datasets. For CollegeMsg, CTPLS computed a path with the same number of colors as for the corresponding NO-OP network. For soc-sign-bitcoinotc, CTPLS computed a path with more colors ($27$ instead of $25$ out of $30$), while for soc-sign-bitcoinalpha it computed a path with slightly fewer colors ($19$ instead of $20$ out of $30$). This decrease is due to the fact that CTPLS selected a temporal edge that belongs to the YES-OP instance but not to the NO-OP instance, and this prevented CTPLS from including all the vertices of the solution computed on the NO-OP instance. We do not report the result for email-Eu-core-temporal, since CTPLS already computed an optimal solution for the NO-OP version of this dataset.

Table 2. Performance of CTPLS on real datasets. The running time (in minutes) and the size of the returned solution (path) for the Max CPTG problem are reported for two different color sets (30 and 50)

In the NO-OP datasets with $50$ colors, CTPLS computed in the worst case a path containing $36$ colors (soc-sign-bitcoinalpha) and in the best case (email-Eu-core-temporal) a path containing $49$ out of $50$ colors. For the other two networks, CollegeMsg and soc-sign-bitcoinotc, CTPLS computed paths with $38$ and $40$ colors out of $50$, respectively. For networks with $50$ colors, CTPLS found the same number of colors in both the YES-OP and the NO-OP networks.

The experiments on real-world datasets show that our heuristic is able to compute, for many instances, solutions containing a significant number of the colors in $C$, even when $|C| = 50$. For networks with $30$ colors, CTPLS returned solutions containing at least $63\%$ of the colors in $C$ (soc-sign-bitcoinalpha) and, in one case, an optimal solution. For the networks with $50$ colors, CTPLS found solutions containing between $72\%$ and $98\%$ of the colors in $C$. Except for soc-sign-bitcoinalpha, the quality of the solutions returned by CTPLS degrades slightly when going from $30$ to $50$ colors. However, this deterioration is less evident than on the synthetic datasets.
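For reference, these extreme percentages are the ratios between the number of colors in the returned path and $|C|$ (which, for the YES-OP networks, is the optimum), using the values reported above for soc-sign-bitcoinalpha (YES-OP, $30$ colors), soc-sign-bitcoinalpha ($50$ colors) and email-Eu-core-temporal ($50$ colors):

\begin{equation*} \frac{19}{30} \approx 63.3\%, \qquad \frac{36}{50} = 72\%, \qquad \frac{49}{50} = 98\%. \end{equation*}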

As for the running time, CTPLS is able to return a solution of Max CPTG in reasonable time, even for a set of $50$ colors (a value larger than those considered in Thejaswi et al., 2020). The running time of CTPLS varies significantly depending on the size of the temporal network and, in particular, on the length of the time domain. The highest running times of CTPLS are indeed observed on CollegeMsg and email-Eu-core-temporal, two temporal networks with time domains of 58,911 and 207,880 timestamps, respectively. For soc-sign-bitcoinalpha (NO-OP, $50$ colors), the temporal network with the smallest time domain, the running time of CTPLS is at most 0.51 minutes.

5.3. Local search improvements

In this subsection, we further analyze the performance of CTPLS and we consider the results returned by the different phases of the heuristic. We report in Table 3 the performances of the different phases of CTPLS on the synthetic graphs we have considered in our experimental work.

As can be seen from Table 3, LS1 and LS2 significantly improve the solution returned by the first phase of CTPLS. This improvement is particularly relevant for synthetic graphs based on the BA model. The average improvement of LS1 with respect to the solution returned by the preliminary step ranges from $1.4\%$ to $26.2\%$; LS2 performs better on these synthetic graphs, with an average improvement with respect to the solution returned by LS1 that ranges from $9.3\%$ to $44.3\%$. This behavior of LS2 with respect to LS1 is observed in all cases except for ER with $pr=0.4$ and $10$ colors, where the improvement of local search is very limited. For the synthetic graphs based on ER models, in particular with $pr=0.4$, the improvement of the local search phases is lower than for BA graph models. A possible explanation is that the number of colors in the solutions returned by the first phase is generally larger for these graphs than for BA-based graphs; thus, LS1 and LS2 have less room for improvement on graphs based on ER models.
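The improvements in Table 3 are relative gains in solution size. Assuming they are computed per instance as $100 \cdot (\text{after} - \text{before}) / \text{before}$ and then averaged over the instances of a group (the exact formula is not stated, so this is an assumption), the computation is as follows; `avg_relative_improvement` is a hypothetical name.

```python
def avg_relative_improvement(before, after):
    """Mean per-instance improvement (%) of `after` sizes over `before` sizes.

    before[k] and after[k] are the solution sizes of instance k before and after
    a local search phase (e.g. preliminary step vs LS1, or LS1 vs LS2).
    """
    if len(before) != len(after) or not before:
        raise ValueError("need two non-empty lists of equal length")
    return sum(100 * (a - b) / b for b, a in zip(before, after)) / len(before)
```

For instance, the soc-sign-bitcoinalpha example of Table 4, where local search raises the size of the preliminary solution from 5 to 20 colors (30 colors), corresponds to an improvement of 300% under this definition.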

Table 3. Performance of the local search phases (LS1 and LS2) of CTPLS on synthetic datasets. The values in the table are the average improvements of LS1 with respect to the solutions returned by preliminary step and the average improvements of LS2 with respect to the solutions computed by LS1

The results on real graphs reported in Table 4 confirm that local search is able to effectively improve the solutions returned by the first phase of CTPLS. Indeed, the local search phase improves the solution for all the real graphs we considered. For example, for soc-sign-bitcoinalpha the improvement is substantial, as the preliminary step returns a solution of size $5$ for $30$ colors (of size $8$ for $50$ colors), while the local search phases lead to solutions of size 19–20 for $30$ colors (of size $36$ for $50$ colors).

Table 4. Performance of CTPLS on real datasets. The running time (in minutes) and the size of the solution returned by the preliminary step (first solution), by local search edge replacement (LS1), and by local search vertex replacement (LS2) are reported for two different color sets (30 and 50)

In Table 4, we also analyze the running time of CTPLS and the impact of the different phases, in particular of LS1 and LS2. First, the preliminary step is always faster than the overall local search. Only for the email-Eu-core-temporal networks does the running time of the preliminary step have a large impact on the overall running time, but these are also the two cases where the solutions returned by the preliminary step contain a large number of colors ($29$ and $40$). In all the cases we have considered, the overall running time depends mainly on the running time and the number of iterations of the local search phase, in particular of LS2. This holds for all the datasets except for email-Eu-core-temporal with $30$ colors, since in this case LS2 quickly computed an optimal solution starting from a solution with $29$ colors (out of $30$).

Finally, the running time of CTPLS does not seem to depend significantly on the number of colors in $C$. For one graph (CollegeMsg), the running time of CTPLS on the instance with $30$ colors is higher than on the instance with $50$ colors, while for email-Eu-core-temporal the running time on the instance with $50$ colors is higher than on the instance with $30$ colors. In the other cases, the running times are comparable. This behavior makes CTPLS a promising method even for larger values of $|C|$.

6. Conclusion

In this work, we have defined a problem called Max CPTG that looks for a colorful temporal path of maximum length in a vertex-colored temporal graph. We have analyzed the approximation complexity of Max CPTG and of its variant on static graphs (Max S-CPTG). We have proved that both problems are not approximable within factor $O(|V|^{1/2 - \varepsilon })$ , unless P = NP. Then, we have designed CTPLS, a heuristic for Maximum Colorful Path in a Temporal Graph. We have presented an experimental evaluation of CTPLS on synthetic and real-world graphs. For the synthetic datasets, CTPLS returns near-optimal solutions for a set of 10 colors and its performance degrades as the number of colors increases. On the synthetic datasets, the heuristic is always fast. On the real-world datasets, CTPLS in many cases is able to compute solutions that are not far from the optimum in reasonable time, even for networks with $50$ colors.

Future work includes an extension of the experimental results, with the application of CTPLS to real case studies in tour recommendation and financial transaction analysis. It would also be interesting to consider whether the algebraic approach proposed in Thejaswi et al. (2020) can be applied to the Max CPTG problem and to compare its performance with that of CTPLS.

Competing interests.

None.

A preliminary version of this paper appeared as Dondi, R., Hosseinzadeh, M.M. (2022). Finding Colorful Paths in Temporal Graphs. In International Conference Complex Networks & Their Applications (pp. 553–565). Cham: Springer. https://doi.org/10.1007/978-3-030-93409-5_46

References

Abiteboul, S., & Vianu, V. (1997). Regular path queries with constraints. In Proceedings of the Sixteenth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (pp. 122–133).
Akrida, E. C., Mertzios, G. B., Spirakis, P. G., & Raptopoulos, C. L. (2021). The temporal explorer who returns to the base. Journal of Computer and System Sciences, 120, 179–193. doi: 10.1016/j.jcss.2021.04.001.
Alon, N., Yuster, R., & Zwick, U. (1995). Color-coding. Journal of the ACM, 42(4), 844–856. doi: 10.1145/210332.210337.
Betzler, N., van Bevern, R., Fellows, M. R., Komusiewicz, C., & Niedermeier, R. (2011). Parameterized algorithmics for finding connected motifs in biological networks. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 8(5), 1296–1308. doi: 10.1109/TCBB.2011.19.
Bruckner, S., Hüffner, F., Komusiewicz, C., & Niedermeier, R. (2013). Evaluation of ILP-based approaches for partitioning into colorful components. In: Bonifaci, V., Demetrescu, C., & Marchetti-Spaccamela, A. (Eds.), Experimental Algorithms, 12th International Symposium, SEA 2013, Rome, Italy, June 5-7, 2013. Proceedings, Lecture Notes in Computer Science (Vol. 7933, pp. 176–187). Cham: Springer. doi: 10.1007/978-3-642-38527-8_17.
Bumpus, B. M., & Meeks, K. (2021). Edge exploration of temporal graphs. In Flocchini, P., & Moura, L. (Eds.), Proceedings of the 32nd International Workshop on Combinatorial Algorithms, IWOCA 2021 (pp. 107–121). Cham: Springer. doi: 10.1007/978-3-030-79987-8_8.
Casteigts, A., Himmel, A., Molter, H., & Zschoche, P. (2020). Finding temporal paths under waiting time constraints. In: Cao, Y., Cheng, S., & Li, M. (Eds.), Proceedings of the 31st International Symposium on Algorithms and Computation, ISAAC 2020 (pp. 30:1–30:18). doi: 10.4230/LIPIcs.ISAAC.2020.30.
Castelli, M., Dondi, R., & Hosseinzadeh, M. M. (2020). Genetic algorithms for finding episodes in temporal networks. Procedia Computer Science, 176, 215–224.
Choudhury, M. D., Feldman, M., Amer-Yahia, S., Golbandi, N., Lempel, R., & Yu, C. (2010). Automatic construction of travel itineraries using social breadcrumbs. In Chignell, M. H., & Toms, E. G. (Eds.), HT'10, Proceedings of the 21st ACM Conference on Hypertext and Hypermedia, June 13-16, 2010 (pp. 35–44). doi: 10.1145/1810617.1810626.
Cohen, J., Italiano, G. F., Manoussakis, Y., Thang, N. K., & Pham, H. P. (2021). Tropical paths in vertex-colored graphs. Journal of Combinatorial Optimization, 42(3), 476–498. doi: 10.1007/s10878-019-00416-y.
Costa, A. F., Yamaguchi, Y., Traina, A. J. M., Traina, C. Jr, & Faloutsos, C. (2015). Rsc: Mining and modeling temporal activity in social media. In Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM (pp. 269–278).
Diestel, R. (2012). Graph Theory, Graduate Texts in Mathematics (Vol. 173, 4th ed.). Berlin/Heidelberg, Springer.
Dondi, R., Fertin, G., & Vialette, S. (2011). Finding approximate and constrained motifs in graphs. In: Giancarlo, R., & Manzini, G. (Eds.), Combinatorial Pattern Matching - 22nd Annual Symposium, CPM 2011, Palermo, Italy, June 27-29, 2011. Proceedings, Lecture Notes in Computer Science (Vol. 6661, pp. 388–401). Cham: Springer. doi: 10.1007/978-3-642-21458-5_33.
Dondi, R., & Hosseinzadeh, M. M. (2021). Dense sub-networks discovery in temporal networks. SN Computer Science, 2(3), 111.
Erlebach, T., Hoffmann, M., & Kammer, F. (2021). On temporal graph exploration. Journal of Computer and System Sciences, 119, 1–18. doi: 10.1016/j.jcss.2021.01.005.
Gionis, A., Lappas, T., Pelechrinis, K., & Terzi, E. (2014). Customized tour recommendations in urban areas. In: Carterette, B., Diaz, F., Castillo, C., & Metzler, D. (Eds.), Seventh ACM International Conference on Web Search and Data Mining, WSDM.
Hagberg, A., Swart, P., & Chult, D. S. (2008). Exploring network structure, dynamics, and function using networkx, Tech. rep. Los Alamos, NM: Los Alamos National Lab. (LANL).
Holme, P. (2015). Modern temporal network theory: a colloquium. The European Physical Journal B, 88(9), 234.
Kempe, D., Kleinberg, J., & Kumar, A. (2002). Connectivity and inference problems for temporal networks. Journal of Computer and System Sciences, 64(4), 820–842.
Kovanen, L., Karsai, M., Kaski, K., Kertész, J., & Saramäki, J. (2011). Temporal motifs in time-dependent networks. Journal of Statistical Mechanics: Theory and Experiment, 2011(11), P11005.
Kowalik, L., & Lauri, J. (2016). On finding rainbow and colorful paths. Theoretical Computer Science, 628, 110–114. doi: 10.1016/j.tcs.2016.03.017.
Lacroix, V., Fernandes, C. G., & Sagot, M. (2006). Motif search in graphs: Application to metabolic networks. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 3(4), 360–368. doi: 10.1109/TCBB.2006.55.
Leskovec, J., & Krevl, A. (2014). SNAP Datasets: Stanford large network dataset collection. http://snap.stanford.edu/data
Marino, A., & Silva, A. (2021). Königsberg sightseeing: Eulerian walks in temporal graphs. In Flocchini, P., & Moura, L. (Eds.), Proceedings of the 32nd International Workshop on Combinatorial Algorithms, IWOCA 2021 (pp. 485–500). Cham: Springer. doi: 10.1007/978-3-030-79987-8_34.
Mendelzon, A. O., & Wood, P. T. (1995). Finding regular simple paths in graph databases. SIAM Journal on Computing, 24(6), 1235–1258. doi: 10.1137/S009753979122370X.
Rozenshtein, P., Bonchi, F., Gionis, A., Sozio, M., & Tatti, N. (2019). Finding events in temporal networks: Segmentation meets densest subgraph discovery. Knowledge and Information Systems, 62, 1611–1639.
Sanli, C., & Lambiotte, R. (2015). Temporal pattern of online communication spike trains in spreading a scientific rumor: how often, who interacts with whom? Frontiers in Physics, 3, 79.
Thejaswi, S., Gionis, A., & Lauri, J. (2020). Finding path motifs in large temporal graphs using algebraic fingerprints. Big Data, 8(5), 335–362. doi: 10.1089/big.2020.0078.
Wadhwa, S., Prasad, A., Ranu, S., Bagchi, A., & Bedathur, S. (2019). Efficiently answering regular simple path queries on large labeled networks. In Proceedings of the 2019 International Conference on Management of Data (pp. 1463–1480).
Williamson, D. P., & Shmoys, D. B. (2011). The Design of Approximation Algorithms. Cambridge: Cambridge University Press.
Wu, H., Cheng, J., Huang, S., Ke, Y., Lu, Y., & Xu, Y. (2014). Path problems in temporal graphs. Proceedings of the VLDB Endowment, 7(9), 721–732. doi: 10.14778/2732939.2732945.
Wu, H., Cheng, J., Ke, Y., Huang, S., Huang, Y., & Wu, H. (2016). Efficient algorithms for temporal path computation. IEEE Transactions on Knowledge and Data Engineering, 28(11), 2927–2942. doi: 10.1109/TKDE.2016.2594065.
Zheng, C., Swenson, K. M., Lyons, E., & Sankoff, D. (2011). Omg! orthologs in multiple genomes - competing graph-theoretical formulations. In Przytycka, T. M., & Sagot, M. (Eds.), Algorithms in Bioinformatics - 11th International Workshop, WABI 2011, Saarbrücken, Germany, September 5-7, 2011. Proceedings, Lecture Notes in Computer Science (Vol. 6833, pp. 364–375). Cham: Springer. doi: 10.1007/978-3-642-23038-7_30.
Zschoche, P., Fluschnik, T., Molter, H., & Niedermeier, R. (2020). The complexity of finding small separators in temporal graphs. Journal of Computer and System Sciences, 107, 72–92. doi: 10.1016/j.jcss.2019.07.006.
Zuckerman, D. (2006). Linear degree extractors and the inapproximability of max clique and chromatic number. In Proceedings of the 38th Annual ACM Symposium on Theory of Computing, May 21-23, 2006 (pp. 21–23). doi: 10.1145/1132516.1132612.

Figure 1. An overview of the graph $G=(V, E,c)$ associated with $G_I$. Each box contains the set $V_i$ of vertices, with $0 \leq i \leq n+1$, and the path $p(V_i)$. For each vertex, its name is the left label and its color is the right label. We assume that $\{ v_1, v_2 \} \in E_I$; hence, $c(v_{1,2})=c(v_{2,1})= c_{1,2}$, $\{ v_1, v_n \} \notin E_I$; hence, $c(v_{1,n}) = a_1^n$, $c(v_{n,1}) = a_n^1$, and $\{ v_2, v_n \} \in E_I$; hence, $c(v_{2,n})=c(v_{n,2})= c_{2,n}$.
