
Data sharing and collaborations with Telco data during the COVID-19 pandemic: A Vodafone case study

Published online by Cambridge University Press:  22 October 2021

Pedro Rente Lourenco*
Affiliation:
Big Data and Artificial Intelligence, Vodafone Group, London, United Kingdom
Gurjeet Kaur
Affiliation:
Big Data and Artificial Intelligence, Vodafone Group, London, United Kingdom
Matthew Allison
Affiliation:
Big Data and Artificial Intelligence, Vodafone Group, London, United Kingdom
Terry Evetts
Affiliation:
Big Data and Artificial Intelligence, Vodafone Group, London, United Kingdom
*Corresponding author. E-mail: [email protected]

Abstract

With the outbreak of COVID-19 across Europe, anonymized telecommunications data provides a key insight into population level mobility and assessing the impact and effectiveness of containment measures. Vodafone’s response across its global footprint was fast and delivered key new metrics for the pandemic that have proven to be useful for a number of external entities. Cooperation with national governments and supra-national entities to help fight the COVID-19 pandemic was a key part of Vodafone’s response, and in this article the different methodologies developed are analyzed, as well as the key collaborations established in this context. In this article we also analyze the regulatory challenges found, and how these can pose a risk of the full benefits of these insights not being harnessed, despite clear and efficient Privacy and Ethics assessments to ensure individual safety and data privacy.

Type
Translational Article
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2021. Published by Cambridge University Press

Policy Significance Statement

Vodafone encourages sharing and reuse of data through voluntary, market-driven mechanisms. Data sharing should only take place where it is legally compliant, ethical and socially acceptable, in line with the principles of trustworthiness and privacy-by-design. To be sustainable, sharing and reuse of data should be subject to fair remuneration that recognizes the significant investment required to collect, curate and maintain data, as well as produce meaningful and accurate insights. It is of particular importance that policy fosters innovation through data, and this can only exist in a sustainable market for data analytics and insights, that levels the playing field across the world and between the different players, and that at the same time extracts real value for society.

1. Introduction

Anonymized and aggregated telecommunications data of cell tower locations (“Telco Data”) offers a wealth of possibilities for innovation and impact in society, particularly when combined with other data sources and shared with entities whose purpose is to have meaningful positive impact in people’s lives. There are many examples of the value to be extracted from these datasets, as they offer a clear proxy of human activity that can be used to map poverty, design containment and mitigation strategies for epidemics or advise on urban mobility (Steenbruggen et al., 2013) and human behavior (Douglass et al., 2015), among others.

In the process of using Telco data it is imperative to ensure privacy standards are met, and that proper ethical assessments are conducted so that a trusting relationship can be established between the individuals and the company that holds their data. By implementing privacy controls such as anonymization and aggregation, the benefits of analyzing these datasets can be unleashed while preserving the user’s privacy. Furthermore, the risk of unintended consequences for individuals and society, for example insights that expose the movements of minority populations, must be assessed and mitigated; this is discussed in Sections 2.3 and 2.4.

Telecommunication networks have different architectures and the backend systems collect different types of data, depending on the operating company. This means that an additional challenge is posed when the intention is to make cross-country and cross-operator analyses.

The COVID-19 pandemic hit Europe in March 2020, with quick spread from Italy to the whole of Europe and with devastating consequences for society. Being a human-mediated transmission type of disease, human behavior was an obvious factor to take into account in exit strategies for the pandemic and containment strategies for emergency situations across the world. Telco data, even at its most basic level, offers the possibility of extracting value to analyze population-level behaviors and patterns of mobility. As telecommunication networks depend on physical “cell sites”—places where the radio transmitters are placed to offer network coverage, and the connections to these have to be monitored by the network—for quality purposes, billing and operations—an approximate view of a cell phone’s location can be inferred. While cell based location today is still less accurate than other methods (such as GPS), we recognize it may reveal sensitive information about an individual and we had to define a more elaborate privacy by design approach; this entailed only collecting location relevant information for analysis, replacing all user IDs with randomly generated IDs (pseudonymisation), as well as aggregating to the level of 50 users or more and overlaying information on large geographical areas.

2. Methodology

With approximate location data and relatively high sampling frequencies on customer bases of several million cell phones, the representativeness of the data for mobility insights is very high, particularly in countries where cell phones are widespread and the usage is very high. The ubiquitous nature of these devices makes them uniquely positioned to offer important insights into human mobility and, consequently, into disease transmission patterns.

In order to have results comparable across countries and due to the differences in raw data capture methodologies, techniques were developed to allow a common framework across the Vodafone footprint, using different kinds of data. This study will focus on two types of analysis, and three types of datasets.

2.1. Raw datasets

Across the Vodafone footprint three different types of datasets were analyzed: Call Detail Records (CDRs)/x-Data Records (XDRs), Probe data, and App-based data.

CDRs and XDRs are datasets that capture only active events in the network. These are records of when a customer sends or receives a voice call, text message, multimedia message, or data packet. This means that the sampling frequency depends heavily on the usage of the cell phone, likely to be higher in societies where the usage of baseline services depends on data (such as web-based messaging or social media running in the background). XDRs tend to have higher sampling frequencies as they are exchanging data packets through the network, whereas CDRs typically correspond to voice and SMS communications. From these datasets, we used only the cell locations and timestamps.

Probe data is captured passively, often for traffic management and network operations. Each cell phone exchanges information with the network beyond active events and these signals represent an approximate location—just like with CDRs. These are higher frequency signals but they are not available at the same sampling rate across all networks, and not all operators capture these signals for analysis beyond network operations.

App-based data (Wang et al., 2019) refers to data captured from a specific app, in this case the “myVodafone” app, in which customers consent to the capture of multiple signals from their phones, including network speeds to monitor network quality. Some of these signals come with a cell tower position as well, which makes them somewhat equivalent to the other two datasets. The cohort of customers in this case is more limited, as only app users who have consented to the use of their data provide information.

Data was captured from the start of February 2020 in order to capture pre-pandemic behaviors. All of the data is automatically generated by the networks and in this study we are reusing these datasets to provide anonymous analytics which are compatible with the original purpose of the data collection.

2.2. Types of analysis

In this work, two types of insights were extracted: origin-destination (OD) matrices and behavioral KPIs.

2.2.1. Home locations

Approximate home locations (or home “cell” locations) were calculated for each day by looking at night-time events (Colak et al., 2015). Given the nature of the measures implemented by governments, it was decided not to estimate home locations over a longer period for each individual, because of changes in location pre and post lockdown, with a large exodus from urban centers into second homes and family homes in more rural areas. The top 3 night cells each cell phone connects to were extracted, and “night” was defined as the period between 22:00 and 5:00, local time.
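
The home cell selection described above can be illustrated with a minimal sketch. It assumes events arrive as (local timestamp, cell ID) pairs for one pseudonymous customer and one night, and that cells are ranked by event count; the article does not state the ranking criterion, so that choice and the function names are hypothetical.

```python
from collections import Counter
from datetime import datetime

NIGHT_START_HOUR = 22  # 22:00 local time, as defined in the text
NIGHT_END_HOUR = 5     # 05:00 local time, as defined in the text


def is_night(ts: datetime) -> bool:
    """True when a local timestamp falls in the 22:00-05:00 window."""
    return ts.hour >= NIGHT_START_HOUR or ts.hour < NIGHT_END_HOUR


def top_night_cells(events, k=3):
    """Return up to k cell IDs ranked by number of night-time events.

    events: iterable of (local_timestamp, cell_id) for one customer.
    Ranking by event count is an assumption; dwell time would be an
    equally plausible criterion.
    """
    counts = Counter(cell for ts, cell in events if is_night(ts))
    return [cell for cell, _ in counts.most_common(k)]
```

Recomputing this list every day, rather than once per customer, is what lets the approach follow people who relocated when lockdowns began.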

2.2.2. OD matrices

OD matrices allow an analysis of mobility across a country. By splitting a country into smaller regions (NUTS3 in Europe, local units in other countries) (Eurostat (European Commission), 2020) and counting the numbers of movements between the regions in each day, matrices were extracted that represent the estimated number of people moving from one region to another on each day. Boundary effects were negligible as the in-country geographical splits were large enough to ignore the small amounts of cell towers at the borders of each region that could overlap with other cells and cause odd “back and forth” behaviors.
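
As a rough illustration of the counting step, the sketch below builds a daily OD matrix from per-customer sequences of region codes. It assumes each cell has already been mapped to a region and that a movement is any change of region between consecutive observations; the article does not give the exact movement definition, and the function name and region codes are illustrative only.

```python
from collections import Counter


def od_matrix(daily_region_sequences):
    """Count region-to-region movements for one day.

    daily_region_sequences: dict mapping a pseudonymous customer ID to the
    time-ordered list of region codes observed for that customer.
    Returns a Counter keyed by (origin, destination).
    """
    counts = Counter()
    for regions in daily_region_sequences.values():
        # Compare each observation with the next one in time order.
        for origin, destination in zip(regions, regions[1:]):
            if origin != destination:
                counts[(origin, destination)] += 1
    return counts
```

Storing only the non-zero (origin, destination) pairs keeps the matrix sparse, which matters when a country has hundreds of regions.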

These OD matrices were also scaled to population levels, based on home locations from the previous night and Eurostat population estimates for NUTS3 regions. Dividing a region’s population estimate by the number of customers whose home location falls in that region gives a multiplier that extrapolates numbers of customers moving to numbers of people moving.

Population estimates from Eurostat are updated as new information is collected for each country, and for the purposes of this study the most recent updates were used for each region. Population under 18 years of age is not counted in this dataset and is assumed to follow a similar behavior to the rest of the population. In order to take into account nuances in the market share in every region, the population-level scaling was done per NUTS 3 region. We assumed the demographic profile of Vodafone’s customer base to not be significantly different to that of the wider population.
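
The per-region scaling described in this section can be sketched as follows. The example assumes the multiplier of the origin region is applied to each flow, which the article does not specify; function names and figures are hypothetical.

```python
def scaling_factors(population, home_customers):
    """Per-region multiplier: population estimate / customers homed there.

    population: dict region -> population estimate (e.g. from Eurostat).
    home_customers: dict region -> customers whose home cell is in the region.
    Regions with no homed customers are skipped to avoid division by zero.
    """
    return {
        region: population[region] / n
        for region, n in home_customers.items()
        if n > 0 and region in population
    }


def scale_od(od_counts, factors):
    """Scale customer flows to estimated people flows.

    Applying the origin region's multiplier is an assumption made for
    this sketch.
    """
    return {
        (origin, dest): n * factors[origin]
        for (origin, dest), n in od_counts.items()
        if origin in factors
    }
```

For instance, a region with 1,000 residents and 250 homed customers gets a multiplier of 4, so 60 customers leaving it would be reported as an estimated 240 people.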

2.2.3. Behavioral key performance indicator (KPI): Percentage of time at home cell

Using the approximate home cell calculations and considering the three potential home cells for each customer, the study processed the time in these home cells and the time out of these home cells to extract this KPI. The equation below reflects its calculation:

$$ {\tau}_{home}=\frac{t_{in}}{t_{total}} \tag{1} $$

where $ {t}_{in} $ is the time the customer is seen at their home cells and $ {t}_{total} $ is the total time the customer is seen in the network.
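
A minimal sketch of this ratio, assuming dwell time per cell has already been estimated for one customer and one day (how dwell time is derived from raw events is not described in the article, so that input is hypothetical):

```python
def home_time_fraction(dwell_seconds_by_cell, home_cells):
    """Fraction of observed time spent in any of the customer's home cells.

    dwell_seconds_by_cell: dict cell_id -> seconds observed in that cell.
    home_cells: set of up to three home cell IDs for the same day.
    Returns None when the customer was not observed at all.
    """
    t_total = sum(dwell_seconds_by_cell.values())
    if t_total == 0:
        return None
    t_in = sum(
        seconds
        for cell, seconds in dwell_seconds_by_cell.items()
        if cell in home_cells
    )
    return t_in / t_total
```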

2.2.4. Behavioral KPI: Radius of Gyration

González et al. (2008) defined the Radius of Gyration for human mobility in terms of its center of mass. In this study we use an adapted measure that takes the most common home cell as the reference (i.e., the cell to which a customer is connected the most during the night hours). Using the latitude and longitude values of the cell towers, as per the equations below, we calculate the Haversine distance (Winarno et al., 2017) between the coordinates of each cell tower visited by the customer (independently of how many times it was visited) and the top home cell. From these distances we calculate the Radius of Gyration (“ROG”).

$$ {\displaystyle \begin{array}{c}a={\sin}^2\left(\frac{\Delta lat}{2}\right)+\cos \left({lat}_1\right)\cdot \cos \left({lat}_2\right)\cdot {\sin}^2\left(\frac{\Delta lon}{2}\right)\\ {}c=2\cdot \operatorname{atan2}\left(\sqrt{a},\sqrt{1-a}\right)\\ {} distance=R\times c\end{array}} \tag{2} $$

where R is the approximate radius of Earth (6,371 km). We then calculate ROG with

$$ ROG=\sqrt{\frac{1}{n_{cells}}\sum \limits_{i=1}^{n_{cells}}{\left({c}_i-{c}_{home}\right)}^2} \tag{3} $$

where $ \left({c}_i-{c}_{home}\right) $ is given by the Haversine distance calculated above and $ {n}_{cells} $ is the number of distinct cells visited by the customer.

The Radius of Gyration was calculated daily to allow a more granular insight into the behavioral changes through time.
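
The two steps of this KPI can be sketched together. The example uses the standard Haversine formula with latitudes in the cosine terms, assumes Earth's mean radius of about 6,371 km, and represents each cell tower as a (latitude, longitude) pair in degrees; function names are illustrative.

```python
import math

EARTH_RADIUS_KM = 6371.0  # approximate mean radius of Earth


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    a = min(1.0, a)  # guard against rounding slightly above 1
    c = 2 * math.atan2(math.sqrt(a), math.sqrt(1 - a))
    return EARTH_RADIUS_KM * c


def radius_of_gyration_km(visited_cells, home_cell):
    """Root-mean-square distance of distinct visited cells from the home cell.

    visited_cells: iterable of (lat, lon) tuples; repeats are ignored,
    matching the text's "independently of how many times" wording.
    home_cell: (lat, lon) of the most common night cell.
    """
    distinct = set(visited_cells)
    if not distinct:
        return 0.0
    squared = sum(haversine_km(*cell, *home_cell) ** 2 for cell in distinct)
    return math.sqrt(squared / len(distinct))
```

Because repeated visits are collapsed, this adapted ROG reflects how far a customer ranges from home rather than how long they stay away.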

2.2.5. Behavioral KPI: Collocation

As COVID-19 transmission is highly influenced by the number of people gathering in the same space at the same time, a co-location KPI was used to assess the number of people gathered in a place (other than their home location) at the same time.

For this, events were grouped into 60-minute buckets throughout the day, and the number of customers seen in a specific cell within the same bucket was counted, as long as this was not their home cell (where it was assumed they would be in their houses). This assumption is quite robust in densely populated areas but can have some shortfalls in areas where cell towers are more sparsely distributed. We address these limitations in a dedicated section further on.
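
A minimal sketch of the bucketing logic, assuming events are (customer ID, local timestamp, cell ID) tuples and that buckets are aligned to the clock hour; the alignment and the function name are assumptions, not details given in the article.

```python
from collections import defaultdict
from datetime import datetime


def collocation_counts(events, home_cells):
    """Count distinct non-home customers per cell and 60-minute bucket.

    events: iterable of (customer_id, local_timestamp, cell_id).
    home_cells: dict customer_id -> set of that customer's home cell IDs.
    Returns dict (cell_id, bucket_start) -> number of distinct customers.
    """
    seen = defaultdict(set)
    for customer, ts, cell in events:
        if cell in home_cells.get(customer, set()):
            continue  # customers in their own home cell are excluded
        bucket = ts.replace(minute=0, second=0, microsecond=0)
        seen[(cell, bucket)].add(customer)
    return {key: len(customers) for key, customers in seen.items()}
```

Counting distinct customers per bucket, rather than events, avoids inflating the KPI for heavy data users who generate many records in the same hour.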

2.2.6. Behavioral KPI: Number of cells visited

Another metric of a customer’s mobility in the network is the number of distinct cell towers “visited” per day. This was calculated to assess the level of mobility restriction before and after lockdown measures were imposed.

2.3. Safeguarding individual privacy

In the process of acquiring mobility insights it is crucial that privacy and ethics are addressed from the design phase, an approach also known as “Privacy by Design.”

A Privacy Impact Assessment was conducted to determine the requirements needed to mitigate the privacy risks. This Assessment was conducted using a privacy assessment tool containing 72 questions around the types of data to be used, the purposes, the location of processing and the privacy measures applied to the processing. Separately, a security assessment was conducted to ensure that data was processed in a secure encrypted manner.

The privacy requirements mandated that all insights be aggregated at a level of 50 or more individuals; this means that during aggregation only movements of 50 or more individuals are represented, and groups below this threshold are discarded and represented as missing data. Furthermore, only location network data was used, which is processed for lawful purposes within agreed retention periods. This data was pseudonymized with unique random user IDs, a country-by-country compliance analysis was performed, and the analysis was not carried out in countries where the regulatory body did not agree to it.
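
The two controls named in this paragraph, random pseudonymous IDs and a minimum group size of 50, can be sketched as below. This is an illustration under stated assumptions rather than the production pipeline: the token length, the use of `None` to mark missing data, and the function names are all hypothetical.

```python
import secrets

MIN_GROUP_SIZE = 50  # aggregation threshold stated in the text


def pseudonymize(customer_ids):
    """Map each real ID to a random token that cannot be derived from it.

    The mapping must be kept secret (or discarded) so that tokens cannot
    be linked back to customers.
    """
    mapping = {}
    for customer_id in customer_ids:
        if customer_id not in mapping:
            mapping[customer_id] = secrets.token_hex(16)
    return mapping


def suppress_small_groups(counts, threshold=MIN_GROUP_SIZE):
    """Replace any aggregate below the threshold with None (missing data)."""
    return {
        key: (n if n >= threshold else None)
        for key, n in counts.items()
    }
```

Random tokens are used instead of hashes of the original ID because a hash of a known identifier space, such as phone numbers, can be reversed by enumeration.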

In order to get to the anonymized and aggregated tables seen in the figures, journeys need to be mapped. As an additional protection, the internal analysis is done on pseudonymized data sets to reduce the risk of analysts being able to directly identify individual journeys while still allowing the journey data to be analyzed. Once the data had been collated, the anonymized and aggregated outputs were shared with external parties; no personal data or unique identifiers were shared outside of the organization. Each external data sharing activity undergoes its own assessment. This assessment analyzes whether the external party is “legitimate” in its use of the data and confirms that the data is not being used for negative purposes. External parties are also asked to sign up to contractual obligations prohibiting them from misusing the data or attempting to mix the data with other data with an aim to identify individuals. The risk of such an attempt being successful was low, but protections were sought nevertheless.

2.4. Ethical assessments

Not only did we require controls to ensure individual privacy, we were also aware of the risks the insights might pose to minority group movements. For example, if a particular minority group resided in one area, would the insights show the movements of that minority group? Those insights could then be used by certain parties who might target those minority groups.

In order to mitigate these risks, we completed a “Group Human Impact Assessment”, which went beyond the individual right to privacy to consider the privacy of groups or demographics as a whole. A country-by-country assessment was completed for the markets where we generated the insights. We used external human rights benchmarking and European Court of Human Rights jurisprudence, and reviewed each country’s surveillance regime, to determine which minority groups might be at risk. Once these were identified, we tested the data to see whether movements in minority group areas could be gleaned. If so, the aggregation level was increased to remove that risk. If the risk was still present, or if there was a plausible risk of data being misused, the data was not shared.

3. Results

3.1. Behavioral KPIs

In Figures 5 and 6, the different behavioral KPIs can be seen, plotted as the percentage difference from a baseline set as the average KPI value for the first week of February. This was considered as a baseline mobility level as the pandemic had not hit these countries at that point, and it allows us to have a comparative analysis of the evolution of mobility and its reduction across different countries.
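
The baseline normalization described above, together with the 7-day moving average mentioned in the captions of Figures 5 and 6, can be sketched as follows. The example assumes a daily series whose first seven values cover the first week of February, and a trailing window for the moving average; the article does not say whether the window is trailing or centered.

```python
def pct_change_from_baseline(values, baseline_days=7):
    """Percentage difference of each daily value from the baseline mean.

    values: daily KPI values, starting on the first baseline day.
    """
    baseline = sum(values[:baseline_days]) / baseline_days
    if baseline == 0:
        raise ValueError("baseline mean is zero; percentage change undefined")
    return [100.0 * (v - baseline) / baseline for v in values]


def moving_average(values, window=7):
    """Trailing moving average; returns len(values) - window + 1 points."""
    return [
        sum(values[i - window + 1:i + 1]) / window
        for i in range(window - 1, len(values))
    ]
```

Expressing every country relative to its own pre-pandemic week is what makes KPIs comparable across networks with different raw sampling rates.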

Figure 1. Example for Spain on the evolution of time spent at home at the beginning of the epidemic. We can clearly see an increase in the time spent at home throughout the country after lockdown measures were implemented (on March 14, 2020).

Figure 2. Example for Spain on the evolution of the Radius of Gyration at the beginning of the epidemic. We can clearly see a decrease in the ROG throughout the country after lockdown measures were implemented (on March 14, 2020). We can also notice some large outliers are present in the data as the average is much higher than the median and P25–75 range.

Figure 7 shows the evolution of reported COVID-19 cases (right axis) and the same for mobility metrics for some example countries.

Figure 3. Example for Spain on the evolution of the collocation KPI at the beginning of the epidemic. We can clearly see a decrease in the collocation KPI throughout the country after lockdown measures were implemented (on March 14, 2020). The dip in this KPI on March 8, 2020 corresponds to a set of missing data for that period due to a technical error.

Figure 4. Example for Spain on the evolution of the number of distinct cells visited at the beginning of the epidemic. We can clearly see a decrease in the number of distinct cells visited throughout the country after lockdown measures were implemented (on March 14, 2020).

Figure 5. Average Radius of Gyration for several European countries. Plots show a 7-day moving average with the respective lockdown date represented by a dotted vertical line. We see the fast drop once lockdown measures are implemented and a slow recovery of mobility as the pandemic evolves.

Figure 6. Average time outside home for several European countries. Plots show a 7-day moving average with the respective lockdown date represented by a dotted vertical line. We see the fast drop once lockdown measures are implemented and a slow recovery of mobility as the pandemic evolves.

Figure 7. Evolution of Average time outside home (left axis) and daily reported COVID-19 cases. Left plot refers to Portugal and the right plot to Italy. We can see that although there is a clear inverse correlation between the number of daily cases and mobility, other underlying factors contribute to the spread of the epidemic, as the number of daily cases in Portugal goes up as mobility increases and is then lowered, due to other measures being taken in the country. The same does not seem to happen for Italy.

4. Collaboration and Data Sharing

4.1. Principles for data sharing

No personal data from mobility insights was ever shared. Data is processed internally with pseudonymized customer identifiers and within secure servers and cloud environments throughout the Vodafone systems and footprint. Several security assessments were undertaken to ensure privacy and security concerns were addressed.

Furthermore, strict Terms and Conditions were set for the usage of Vodafone platforms that allow access to data, and this access was only granted upon a clear Privacy Impact Assessment and agreement of third parties to all of Vodafone’s terms. The mobility insights shared were on a strict case by case basis and only for the purposes agreed, with accesses revoked as soon as these expired or were no longer relevant.

4.2. Regulatory challenges and frameworks

The sensitivity of mobile location data is very important for Vodafone, and the need to ensure full anonymization of this data is very well understood. In all data sharing events, Vodafone only ever provides fully anonymous, aggregated insights, in order to prevent reidentification of customers. Across Vodafone’s footprint, a number of different regulations apply, and although there have been initiatives to converge these into common ground in some geographical areas (e.g., GDPR or the ePrivacy directive), this is a somewhat localized effort, and it has suffered from fragmentation in the application of such directives across countries. Some data protection regulators (Wood, 2020) shared their support for these types of analytics where others did not. This in turn causes uncertainty and slows down the usage of valuable insights like the ones presented in this paper. This is even clearer across other geographical areas, where the diversity of regulations, or the lack thereof, makes it very difficult for data collaboratives to exist and thrive (Wiewórowski, 2020).

Vodafone encourages data sharing initiatives through voluntary, market-driven mechanisms, in situations where it is legally compliant, ethical and socially acceptable, in line with the principles of trustworthiness and privacy-by-design. In order to maintain sustainability, these should be subject to fair remuneration that recognizes the significant investment required to collect, curate and maintain data, as well as produce meaningful and accurate insights. It is of particular importance that policy fosters innovation through data, and this can only exist in a sustainable market for data, that levels the playing field across the world and between the different players, and that at the same time extracts real value for society.

4.3. Data usage and usefulness

Within the context of this COVID-19 analytics project, Vodafone has shared anonymized and aggregated insights with the European Commission’s Joint Research Centre (JRC), the International Monetary Fund (IMF), the World Bank (WB) and the United Nations Children’s Fund (UNICEF). These entities have been working in close collaboration with Vodafone to extract value from these datasets for different purposes, which we state below.

The JRC is working across Europe with multiple Telco Operators, in order to have a global view of mobility across the continent and how this has been impacted by the pandemic, as well as how mobility is impacting the spread of COVID-19, by extracting relevant network metrics from OD matrices and tracking them across time. The initial reports can be found online (Iacus et al., 2020a, 2020b; Santamaria et al., 2020).

With the IMF, Vodafone has been working closely to understand disparities of the impact of lockdowns in different age groups and genders. By disaggregating the self-isolation KPIs developed into different age groups and genders, we uncovered different levels of impact for different population segments, which allows for better policy decisions to fight inequality. This research has been conducted across European countries where Vodafone has a large presence (namely Portugal, Italy, and Spain) and is published in the World Economic Outlook of 2020; this work is also ongoing in South Africa, where income gaps are a source of inequality.

The purpose of the cooperation with the World Bank is so this entity can use Vodafone’s insights in their policy advice and economic modeling for governments around the world.

UNICEF has expressed interest in evaluating regional differences in these KPIs, as these can uncover social inequalities that put certain population segments at risk. This is particularly important to evaluate income inequalities and has been a target of UNICEF’s research since the start of the pandemic (Garcia-Herranz et al., 2020). More particularly, UNICEF’s Country Offices have been using mobility insights to assess the risk and impact of schools reopening, mitigate the impact of social distancing measures, inform epidemiological modeling of the disease, and account for socio-economic and urban/rural differences in mobility reduction to better tailor mitigation policies.

Through a collaboration with the WorldPop project at the University of Southampton, Vodafone has used these anonymized and aggregated insights to inform epidemiological models and insights (Floyd et al., 2020). The geographical spread of disease is a key factor to take into account in policy matters, and the study performed across Europe revealed that there is value in coordinated containment measures across countries to slow down the spread of COVID-19 and the rates of infection (Ruktanonchai et al., 2020).

5. Discussion

5.1. Representativeness and limitations

Telco datasets offer a highly representative overview of population behaviors, due to their large scope, having lower rates of fake accounts and high frequency of data. Furthermore, they overcome the “smartphone-only” limitations of app-based location tracking services, which makes them more inclusive and less prone to biases like income gaps and generational bias.

However, these datasets are limited by the density of the Telco networks, which depends on population density and the company’s own strategy and technologies. Geographical locations with higher population densities, such as large metropoles, will offer higher sensitivity to movement, as mobile phones will connect to many different Base Transceiver Stations (BTS) throughout the day, even with short ranged movements, in order to offer the best service in a busy area of the network. On the other hand, geographical locations with lower population densities, such as rural areas, will see less traffic in the network and will not need as many BTS units, which makes the coverage of each unit much wider and the inference of locations less precise. This affects many of the calculations; for example, the home cell determination uses a specific number of cells to be considered during the night (in this case 3), and it is much easier for a mobile phone to connect to more than 3 different cells during night hours in a city than it is in a rural area. Our aggregation rules also mean that less populated areas would not be represented.

Despite these limitations, anonymized and aggregated Telco data still offers an extremely valuable set of insights on general population mobility and can be a key element in social good research, as has been thoroughly shown during the COVID-19 pandemic. There is potential for near real-time insight which can be extremely useful for government bodies and other stakeholders, and despite the large amount of compute effort and the need for close monitoring of the data and insights, this is exactly what was produced by Vodafone in this study. By providing a platform with near real-time anonymized insights we aimed at making the most useful information available to the right people at the right time, in contrast to the more usual historical analysis approach seen in this kind of study.

Acknowledgments

We are grateful for the technical assistance of A. Marin, D. Patané, M. Dziduch, M. Pirvulescu, and S. Foster from the Vodafone Group Big Data and Artificial Intelligence team, as well as D. Gonzalez, D. Fierro, G. Lastra, and L. Rodriguez Solis from Vodafone Business in the development of the KPIs. We would also like to thank the Big Data and Artificial Intelligence teams in all of the participating Vodafone local markets.

Supplementary Materials

To view supplementary material for this article, please visit http://dx.doi.org/10.1017/dap.2021.26.

Funding Statement

This research was supported by Vodafone Group Plc.

Competing Interests

The authors declare no competing interests exist.

Author Contributions

Conceptualization: P.R.L.; Methodology: P.R.L.; Data curation: P.R.L.; Data visualization: P.R.L., G.K.; Writing original draft: P.R.L. All authors approved the final submitted draft.

Data Availability Statement

Data and code are only shared on a strictly case by case basis as per Vodafone’s Privacy and Security standards.

Ethical Standards

The research meets all ethical guidelines, including adherence to the legal requirements of the study country.

References

Colak, S, Alexander, LP, Alvim, BG, Mehndiratta, SR and Gonzalez, MC (2015) Analyzing cell phone location data for urban travel: current methods, limitations, and opportunities. Transportation Research Record 2526(1), 126–135. https://doi.org/10.3141/2526-14
Douglass, RW, Ram, M, Rideout, D and Song, D (2015) High resolution population estimates from telecommunications data. EPJ Data Science 4. https://doi.org/10.1140/epjds/s13688-015-0040-6
Eurostat (European Commission) (2020) Statistical regions in the European Union and partner countries NUTS and statistical regions 2021 2020 edition. Publications Office. https://data.europa.eu/doi/10.2785/72829
Floyd, JR, Lourenço, PR, Tatem, AJ, Candrinho, B and Oliver, N (2020) Malaria parasite mobility in Mozambique estimated using mobile phone records. Spain: ECAI 2020. Available at https://ecai2020.eu/papers/1320_paper.pdf
Garcia-Herranz, M, Sekara, V and Kim, D (2020) Socioeconomic and rural-urban differences on the effects of physical distancing measures. MagicBox Report. Available at https://www.unicef.org/innovation/media/13811/file
González, M, Hidalgo, C and Barabási, A (2008) Understanding individual human mobility patterns. Nature. https://doi.org/10.1038/nature06958
Iacus, S, Santamaria, C, Sermi, F, Spyratos, S, Tarchi, D and Vespe, M (2020a) How human mobility explains the initial spread of COVID-19: A European regional analysis. European Commission, & Joint Research Centre. Available at https://op.europa.eu/publication/manifestation_identifier/PUB_KJNA30292ENN
Iacus, S, Santamaria, C, Sermi, F, Spyratos, S, Tarchi, D and Vespe, M (2020b) Mapping Mobility Functional Areas (MFA) using mobile positioning data to inform COVID-19 policies: A European regional analysis. European Commission, & Joint Research Centre. Available at https://op.europa.eu/publication/manifestation_identifier/PUB_KJNA30291ENN
Ruktanonchai, NW, Floyd, JR, Lai, S, Ruktanonchai, CW, Sadilek, A, Rente-Lourenco, P, Ben, X, Carioli, A, Gwinn, J, Steele, JE, Prosper, O, Schneider, A, Oplinger, A, Eastham, P and Tatem, AJ (2020) Assessing the impact of coordinated COVID-19 exit strategies across Europe. Science 369(6510), 1465–1470. https://doi.org/10.1126/science.abc5096
Santamaria, C, Sermi, F, Spyratos, S, Iacus, SM, Annunziato, A, Tarchi, D and Vespe, M (2020) Measuring the impact of COVID-19 confinement measures on human mobility using mobile positioning data: A European regional analysis. European Commission, & Joint Research Centre. Available at https://op.europa.eu/publication/manifestation_identifier/PUB_KJNA30290ENN
Steenbruggen, J, Borzacchiello, MT, Nijkamp, P and Scholten, H (2013) Data from telecommunication networks for incident management: An exploratory review on transport safety and security. Transport Policy 28, 86–102. https://doi.org/10.1016/j.tranpol.2012.08.006
Wang, F, Wang, J, Cao, J, Chen, C and Ban, X (Jeff) (2019) Extracting trips from multi-sourced data for mobility pattern analysis: an app-based data example. Transportation Research Part C: Emerging Technologies 105, 183–202. https://doi.org/10.1016/j.trc.2019.05.028
Wiewórowski, R (2020) EDPS comments concerning COVID-19 monitoring of spread. Available at https://edps.europa.eu/sites/edp/files/publication/20-03-25_edps_comments_concerning_covid-19_monitoring_of_spread_en.pdf
Winarno, E, Hadikurniawati, W and Rosso, RN (2017) Location based service for presence system using haversine method. 2017 International Conference on Innovative and Creative Information Technology (ICITech), IEEE. https://doi.org/10.1109/INNOCIT.2017.8319153
Wood, S (2020) Statement in response to the use of mobile phone tracking data to help during the coronavirus crisis. Available at https://ico.org.uk/about-the-ico/news-and-events/news-and-blogs/2020/03/statement-in-response-to-the-use-of-mobile-phone-tracking-data-to-help-during-the-coronavirus-crisis/

Figure 1. Example for Spain on the evolution of time spent at home at the beginning of the epidemic. We can clearly see an increase in the time spent at home throughout the country after lockdown measures were implemented (on March 14, 2020).

Figure 2. Example for Spain on the evolution of the Radius of Gyration at the beginning of the epidemic. We can clearly see a decrease in the ROG throughout the country after lockdown measures were implemented (on March 14, 2020). We can also notice some large outliers are present in the data as the average is much higher than the median and P25–75 range.

Figure 3. Example for Spain on the evolution of the collocation KPI at the beginning of the epidemic. We can clearly see a decrease in the collocation KPI throughout the country after lockdown measures were implemented (on March 14, 2020). The dip in this KPI on March 8, 2020 corresponds to a set of missing data for that period due to a technical error.

Figure 4. Example for Spain on the evolution of the number of distinct cells visited at the beginning of the epidemic. We can clearly see a decrease in the number of distinct cells visited throughout the country after lockdown measures were implemented (on March 14, 2020).

Figure 5. Average Radius of Gyration for several European countries. Plots show a 7-day moving average with the respective lockdown date represented by a dotted vertical line. We see the fast drop once lockdown measures are implemented and a slow recovery of mobility as the pandemic evolves.

Figure 6. Average time outside home for several European countries. Plots show a 7-day moving average with the respective lockdown date represented by a dotted vertical line. We see the fast drop once lockdown measures are implemented and a slow recovery of mobility as the pandemic evolves.

Figure 7. Evolution of average time outside home (left axis) and daily reported COVID-19 cases. The left plot refers to Portugal and the right plot to Italy. Although there is a clear inverse correlation between the number of daily cases and mobility, other underlying factors contribute to the spread of the epidemic: in Portugal the number of daily cases goes up as mobility increases and is then lowered by other measures taken in the country. The same does not seem to happen for Italy.

Supplementary material

Lourenco et al. supplementary material 1 (File, 12.4 KB)
Lourenco et al. supplementary material 2 (File, 43.1 KB)

Author comment: Data sharing and collaborations with Telco data during the COVID-19 pandemic: A Vodafone case study — R0/PR1

Comments

This work summarises a global effort within Vodafone Group to tackle COVID-19 and provide relevant insights to institutions and policymakers. At a time of great changes in population behaviour, the insights provided by Vodafone have proven to be extremely relevant for many organisations, and the impact that insights of this type can have on policy making is great. At the same time, ethical and privacy questions were kept at the forefront of all of Vodafone's activities to ensure safety and trust from the public. We hope this paper sheds light on how important these activities are and elicits more collaboration across industry and public organisations to address societal problems.

Review: Data sharing and collaborations with Telco data during the COVID-19 pandemic: A Vodafone case study — R0/PR2

Conflict of interest statement

Tuulia Karjalainen is employed at Telia Company as a senior legal counsel. She is currently on study leave from Telia, conducting doctoral research at University of Helsinki.

Comments

Comments to Author: Thank you for the opportunity to review this interesting research article. In general, the article provides practical and detailed insights into deriving movement patterns from telecommunications data, and into the use of this data in the fight against Covid-19. This article adds value to the current discussion on mobile data analytics through a solid and comprehensive case study. The authors explain how mobility patterns are derived from telecommunications data, what kind of analyses Vodafone was able to make from the data during the pandemic, and describe the challenges and considerations related to telco data analytics.

However, there are some opportunities for revision that would make the article even more valuable and I would like to offer my suggestions on how to improve it.

While the article makes a good case study, it could have a broader perspective. The article sets Vodafone’s analytics into the context of Covid-19 and explains the benefits of telco data in assessing the effectiveness of restriction measures in several countries. In addition, the authors touch on many limitations of telco data throughout the article. However, the article could address these limitations more clearly, while providing more background to the choices made during the project.

Structurally, the article would benefit from the introduction of a section dealing with the general factors and limitations affecting telco data analytics. Throughout the article, the authors present important considerations about, among others, legal restrictions for telco data analytics in the EU, extrapolating operator data to the population level, and the effects of population density to the available data. I would suggest that a section be added to the beginning of the article, explaining all factors that are essential to understanding the benefits and limitations of the data and that help justify some of the choices made in terms of data accuracy and availability.

Furthermore, the article excellently presents the valuable information and unique insights telco data can bring into policymaking through accurate and up-to-date movement analysis. However, I believe that the article could contribute more to both scientific and political discussion by also addressing the loss of accuracy resulting from the aggregation and anonymization of data. I fully agree with the authors that these measures are indispensable to preserve the privacy of mobile customers and also the ethicality of this kind of analysis. At the same time, the balance between accuracy and full anonymization is one of the most difficult questions in telco data analytics, and the article has great potential to contribute more to this discussion. The decisions made in this case to limit group sizes to a minimum of 50 and geographical areas to NUTS3 seem reasonable but it would be interesting to hear about the rationale behind these choices, and the arguments considered in determining these limits.
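
To make the group-size limit discussed above concrete, here is a minimal sketch of small-group suppression. Only the threshold of 50 comes from the text; the function name, the region codes, and the counts are illustrative assumptions and do not describe Vodafone's actual pipeline.

```python
from typing import Dict

MIN_GROUP_SIZE = 50  # threshold quoted in the review; everything else is illustrative


def suppress_small_groups(counts: Dict[str, int], k: int = MIN_GROUP_SIZE) -> Dict[str, int]:
    """Return only the aggregates that are backed by at least k individuals."""
    return {region: n for region, n in counts.items() if n >= k}


# Hypothetical NUTS3-style region codes with made-up subscriber counts.
example = {"ES300": 1200, "ES111": 49, "ES112": 50}
published = suppress_small_groups(example)
print(published)  # {'ES300': 1200, 'ES112': 50}
```

Raising `k` or coarsening the regions removes more rows from the published output, which is the accuracy cost the comment above asks the authors to discuss.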

In addition to my general remarks above, I hope that the following suggestions, albeit less fundamental and more detailed by nature, will also prove helpful:

2.1 Raw Datasets

- The definitions of different types of telco data (CDRs, probe, and app-based data) are well written and clear.

- The significance and representativeness of app-based data from the MyVodafone app compared to the two other types of data could be elaborated. It is noted that the customer cohort is more limited in app-based data than the other data types, and that the customers using the app need to consent to the use of data. These two conditions (use of app and consent) presumably limit the number of customers whose app-based data is available for analysis.

2.2 Types of analysis

- 2.2.1 The use of NUTS3 regions as standard units for geographical location: The units can be relatively large in many countries, affecting the usability of data. In other words, telco data analysis on NUTS3 levels only reveals movement between regions, in many cases failing to identify movement patterns in smaller areas. However, at the same time the spatial inaccuracy improves privacy. Related to my general comment above, it would be interesting to elaborate on the balance between these two concerns.

- 2.2.1 Scaling of OD matrices to population levels: I appreciate the clear description of extrapolating the data to the whole population. One factor reducing the representativeness of telco data is that each operator is usually able to provide data only from their own customers, meaning that the data represents the operator’s customer base instead of the whole population. Does Vodafone’s customer base differ from the general population in terms of demography and was this acknowledged when extrapolating the data?

- 2.2.1 The rationale behind using the top three nighttime cells is not clear to me. Is this related to better accuracy of location (triangulation) or just a technical fact due to devices connecting to varying cells?

2.3 Privacy and Ethics

- The paper only refers to legal constraints in using telco data in the EU. While I understand that this is not a legal study, I think the ePrivacy Directive poses such fundamental restrictions to telecommunication data analytics in EU member states that it should be more clearly acknowledged also in this paper. For example, the anonymization and aggregation requirements are entirely based on law and significantly affect the accuracy of data.

- I appreciate the clear description of the anonymization efforts taken by Vodafone through aggregation of data. However, please see my general comment above about balancing anonymity and the usefulness of the data.

- Vulnerable demographic groups: The fact that even aggregated and anonymized telco data analysis can result in unintended consequences to minorities and other vulnerable groups is an interesting and important finding. On what criteria was the assessment of these vulnerable groups based? Is it possible to give examples of the kind of data that was subject to additional restrictions?

- In the second paragraph of this section you mention a Privacy Impact Assessment and in the next paragraph, a Group Privacy Impact Assessment, of which the latter is described as a novelty in responding to ethical concerns. I was confused as to whether these are two separate impact assessments.

4.2 Regulatory challenges and frameworks

- This section states that “the particular challenges related to full anonymization of this data are very well understood.” While the article touches on many of these challenges, I would take the opportunity to spell these challenges out in this section. Related to my general comment about the limitations of telco data, the easy identifiability of location data, the balance between accuracy and anonymity, and the legal and technical challenges of the anonymization process may presumably explain in part many of the choices Vodafone has made in determining group sizes and geographical areas, and the limitations to the availability of data. I believe addressing these choices more clearly would provide important background for the readers.

4.3 Data usage and usefulness

- This section mentions disaggregating self-isolation KPIs into gender and age groups. This sounds like an important factor for political decision-making that clearly improves the accuracy and usefulness of data. It would, however, be interesting to hear about the effect these additional attributes have on the aggregation of data, and the combination of the demographic information with telco data. Is the age/gender information derived from statistics about geographical areas? How were possible differences between Vodafone’s customer base and the general population considered?

- The list of countries in which the analysis was conducted (Italy, Spain, Portugal) could be mentioned earlier in the article for clarity.

5.1 Representativeness and limitations

- Population density and its effect on telco data analysis is explained well in this section. However, the topic feels somewhat out of place in the concluding section. Please see my above suggestion about a new, separate section on the limitations of telco data.

Review: Data sharing and collaborations with Telco data during the COVID-19 pandemic: A Vodafone case study — R0/PR3

Conflict of interest statement

No Conflicts of Interest.

Comments

Comments to Author: The paper describes Vodafone's efforts to provide mobile CDR and XDR based insights to several major stakeholders via data collaboratives during the management of the COVID crisis. The paper describes the datasets used for this purpose, gives examples of the indicators that are provided, briefly mentions privacy and ethics concerns, and provides a summary of existing initiatives.

While the main take-home messages of the paper are not novel (e.g. that telco data can be very useful to provide insights to policy makers during the pandemic), the paper provides some insights into the existing data collaboratives with Vodafone. Examples of visualizations from the provided indicators are quite simple and, by themselves, just illustrate the potential of the data. What would be useful to provide, in addition to these examples, is a guideline for initiating new data collaboratives with Vodafone, or even a starting point for this, which is largely missing from the paper. The ethical and privacy issues are not discussed in detail, and the discussion raises more questions than it answers. This part of the paper can benefit from a more in-depth, detailed, and practical discussion.

Further comments, in order of appearance in the paper:

In Section 2.1, the definition of CDR includes data packets. A more typical definition includes voice and SMS in CDR, and data packets in XDR. Data packet exchange has a much higher frequency than voice/SMS, and it may be a good idea to make the reader aware of this. Subsequently, CDR is less "threatening" to privacy than XDR, because of its reduced frequency. If the first dataset includes data packets of applications running in the background, it will be better to call it XDR, instead of CDR.

At the beginning of Section 2.1, it will be useful to state the data collection period and, in particular, its relation to Covid measures. As the authors also note, pre-Covid and post-Covid mobility are different and, depending on stringency measures and lockdowns, show great variation across countries and time periods. Some variables, such as work locations of phone line owners, could be determined using pre-Covid data and used during Covid, but it is more difficult to determine actual work location during lockdowns.

In Section 2.2.2, it would be good to describe potential sources of bias in the estimates. Does the ratio of customers to population estimate also correct for market share of the particular mobile operator, as well as for children, who do not legally own phone lines? How recent are the population estimates from Eurostat, and do they include large movements of people over short periods, such as refugees?
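
The bias question above is easier to see with a small example. The sketch below assumes the simplest possible scheme, in which each observed origin-destination count is multiplied by the ratio of resident population to customers homed in the origin region. The function name, region labels, and numbers are hypothetical, and the paper's actual scaling may differ.

```python
from typing import Dict, Tuple


def scale_od_matrix(
    observed: Dict[Tuple[str, str], float],
    customers_by_region: Dict[str, int],
    population_by_region: Dict[str, int],
) -> Dict[Tuple[str, str], float]:
    """Scale observed origin-destination counts by population / customers of the origin."""
    scaled = {}
    for (origin, destination), trips in observed.items():
        customers = customers_by_region.get(origin, 0)
        if customers == 0:
            continue  # no customers homed in the origin: no basis for extrapolation
        factor = population_by_region[origin] / customers
        scaled[(origin, destination)] = trips * factor
    return scaled


# Made-up figures: region A has 20% penetration, region B has 10%.
observed = {("A", "B"): 100.0, ("B", "A"): 30.0}
customers = {"A": 2_000, "B": 500}
population = {"A": 10_000, "B": 5_000}
print(scale_od_matrix(observed, customers, population))
# {('A', 'B'): 500.0, ('B', 'A'): 300.0}
```

A single ratio like this folds market share and people without a phone line into one factor and treats customers as demographically representative of the region, which is exactly the assumption being questioned.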

Earlier CDR based collaboratives (e.g. Data for Development - Blondel, V. D., Esch, M., Chan, C., Clérot, F., Deville, P., Huens, E.,... & Ziemlicki, C. (2012). Data for development: the d4d challenge on mobile phone data. arXiv preprint arXiv:1210.0137.) have blurred base station locations, as these were considered to be sensitive information. Was there a similar processing in the calculations, and if yes, what is the impact of such processing in the results?

Section 2.2.3, please use the open form of KPI in its first occurrence.

Are the three potential locations pooled to describe the "home" location? Is this a good assumption? Were these three cells often bordering cells? Why not take only the top location rather than pooling the top three? What is the distribution of presence over the top, top two and top three locations? If only the top location is used, how does Figure 1 change?
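
The distribution asked about above could be reported with something like the following sketch. It assumes a hypothetical list of night-time cell identifiers for one pseudonymised device and ranks cells by how often they were observed; the function name and data are illustrative only.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def top_night_cells(night_cell_ids: Iterable[str], n: int = 3) -> List[Tuple[str, float]]:
    """Return the n most frequent night-time cells with their share of observations."""
    counts = Counter(night_cell_ids)
    total = sum(counts.values())
    if total == 0:
        return []
    return [(cell, count / total) for cell, count in counts.most_common(n)]


# Made-up night-time observations for one pseudonymised device.
observations = ["c1"] * 6 + ["c2"] * 3 + ["c3"] * 1
ranked = top_night_cells(observations)

cumulative = 0.0
for rank, (cell, share) in enumerate(ranked, start=1):
    cumulative += share
    print(f"top {rank}: {cell} share={share:.2f} cumulative={cumulative:.2f}")
```

Aggregating the cumulative shares over all devices would show how much presence the second and third cells add beyond the top cell alone.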

In Section 2.2.4, the top cell is used instead of the top three. This makes sense, but the use of three cells in the previous subsection is not well-motivated. Are there social arguments to provide more insight there? Do many people have multiple homes?

In Section 2.2.4, is the radius of the Earth taken as 6371 km? This small detail may help the reader. For the number of cells, do you use the list of unique cells, or does each visit count (possibly with home or other cell visits in between) as a different entry? This has an important implication. Suppose I visit a cell at a short distance 100 times, and a cell at a large distance once. If the list of unique cells is used, the single visit to the distant cell will have a large impact on the radius of gyration calculation. The choice made is not clear from the text.
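
The difference between the two counting conventions can be illustrated numerically. The sketch below assumes a home-anchored radius of gyration computed from haversine distances with the commonly used mean Earth radius of 6371 km. The coordinates, function names, and the `unique_cells` switch are illustrative assumptions, not the authors' implementation.

```python
import math
from typing import List, Tuple

EARTH_RADIUS_KM = 6371.0  # commonly used mean Earth radius

Coordinate = Tuple[float, float]  # (latitude, longitude) in degrees


def haversine_km(a: Coordinate, b: Coordinate) -> float:
    """Great-circle distance between two points, in kilometres."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (
        math.sin((lat2 - lat1) / 2) ** 2
        + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def radius_of_gyration_km(visits: List[Coordinate], home: Coordinate, unique_cells: bool) -> float:
    """Root-mean-square distance from home over visits (or over unique cells only)."""
    points = list(dict.fromkeys(visits)) if unique_cells else visits
    if not points:
        return 0.0
    mean_sq = sum(haversine_km(p, home) ** 2 for p in points) / len(points)
    return math.sqrt(mean_sq)


# Hypothetical device: 100 visits to a nearby cell, one visit to a distant cell.
home = (40.4168, -3.7038)
near_cell = (40.4200, -3.7000)
far_cell = (41.3874, 2.1686)
visits = [near_cell] * 100 + [far_cell]

per_visit = radius_of_gyration_km(visits, home, unique_cells=False)
unique = radius_of_gyration_km(visits, home, unique_cells=True)
print(f"per-visit: {per_visit:.1f} km, unique cells: {unique:.1f} km")
```

With these made-up points the unique-cell value is several times larger than the per-visit value, because the single distant visit carries half the weight instead of roughly one percent.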

Figure 3 contains a large dip in collocation on 8-9 March. What is the reason for this dip? A similar but smaller dip is there in figure 4, on 8 March (and not 9th). Why?

In Section 2.3, the risks and risk mitigation steps are vaguely specified. "Therefore a country by country assessment was conducted": Who conducted this assessment? Were stakeholders from groups at risk involved? How were all the groups at risk identified?

"...where the risk was intolerable, the data analytics either was not conducted or additional aggregation controls were put in place to remove those potential insights.": Which locations had "intolerable risks" and why? Which aggregation controls were put in place for these specific cases?

This section is also not clear about whether the data (in its anonymized and aggregated form) left the company or not. It appears from "...all data is pseudonymised when processing the data into the anonymised reports in order to mitigate any privacy risks." that only the reports are shared, and not the data themselves. However, a few sentences later we have " Each data sharing activity - external to the company", which suggests that there are data sharing activities external to the company. It is not clear from the text what the shared data are, and with whom they are shared. An explicit list of shared items will be helpful to assess the privacy implications.

In Figure 5, it may be a good idea to superpose some stringency index or lockdown indicator markers on the data. Is the RoG increasing despite lockdown measures being in place? Or did the countries gradually relax the measures? The interpretation is quite different in both cases.

The quality of Figure 7 can be improved. The text is too small, the lines are too thin. A few simple guidelines can improve your visualizations very quickly. Check Edward Tufte's books on visualization for some useful principles of visualization. High quality plots can also be created easily with MATLAB, R, or similar software. The following link has some useful suggestions for instance: https://nl.mathworks.com/matlabcentral/fileexchange/35246-matlab-plot-gallery-publication-quality-graphics.

Section 4.1 is not sufficiently clear about whether such data collaboratives can be established at the moment with Vodafone, and what to do to initiate them. They are judged on a case-by-case basis, but are there guidelines for applying for a collaborative? Providing some contact points and guidelines will be useful.

Section 4.3 is a very useful summary on the existing data collaboratives. Which segmentation factors were used for these initiatives? Age and gender based disaggregation are mentioned for the study with IMF, but is this all? Were there no other factors involved in this and the other studies?

There exist other data collaboratives where different factors were involved (e.g. the Data for Refugees study involved a "refugee" tag in segmenting the CDR, and other indicators are computed internally by telecom operators. Salah, A. A., Pentland, A., Lepri, B., Letouzé, E., Vinck, P., de Montjoye, Y. A.,... & Dagdelen, O. (2018). Data for refugees: the D4R challenge on mobility of Syrian refugees in Turkey. arXiv preprint arXiv:1807.00523.). Each segmentation brings different concerns for the privacy and ethics review, but also offers additional insights. The Magicbox report cited in this section, for example, mentions disaggregating behavior according to poverty, where each (home) area is tagged with an indicator about average income.

The paper does not discuss some of the main concerns of mobile data sharing initiatives in times of humanitarian crises, such as pandemics. The short-term benefits are obvious, but what are the longer-term risks? Oliver et al. (2020) expresses some of these concerns:

"A key concern is that the pandemic is used to create and legitimize surveillance tools used by government and technology companies that are likely to persist beyond the emergency. Such tools and enhanced access to data may be used for purposes such as law enforcement by the government or hypertargeting by the private sector. Such an increase in government and industry power and the absence of checks and balance is harmful in any democratic state. The consequences may be even more devastating in less democratic states that routinely target and oppress minorities, vulnerable groups, and other populations of concern." (Oliver, N., Lepri, B., Sterly, H., Lambiotte, R., Deletaille, S., De Nadai, M.,... & Colizza, V. (2020). Mobile phone data for informing public health actions across the COVID-19 pandemic life cycle.)

An important item missing from the discussion (and stressed in the Oliver 2020 paper) is the potential of mobile CDR/XDR to provide real-time insights to the stakeholders. Data collection, aggregation, anonymization, and ethics clearances usually take a lot of time and effort, as the authors also stress. Consequently, these analyses are typically conducted by looking at historical data. However, what is much more useful for policy makers is the real-time potential of such data. Mobile CDR can provide a snapshot of what is happening now in a country. Did any of the mentioned initiatives include such near-real-time insights? Is there an existing pipeline (including infrastructure, code, agreements, review and approval schemes, etc.) that can be rapidly activated to provide such timely insights?

Review: Data sharing and collaborations with Telco data during the COVID-19 pandemic: A Vodafone case study — R0/PR4

Conflict of interest statement

No Conflicts of Interest.

Comments

Comments to Author: This paper describes the experience of Telco company Vodafone in using mobile phone data to develop metrics to monitor mobility changes during the COVID-19 pandemic. Overall, the paper is interesting. However, there are several aspects that must be clarified and improved. I describe them in the following:

1) When describing the cell sites and the geographic information they convey, the authors say "[...] while still maintaining individual privacy as these locations are approximate and offer less resolution than GPS level datasets". While it is true that mobile phone data provide less precise geographic information than GPS datasets, individuals can be re-identified rather easily even in a mobile phone dataset. For example, de Montjoye et al. 2013 (https://www.nature.com/articles/srep01376) show that just four spatio-temporal points are enough to uniquely identify most individuals in a mobile phone dataset, and more types of attacks are suggested and implemented in Pellungrini et al. 2017 (https://dl.acm.org/doi/10.1145/3106774). In general, the authors may refer to de Montjoye et al. 2018 (https://www.nature.com/articles/sdata2018286) for a summary of the privacy problems in mobile phone data and how to deal with them.

2) The authors use the term Call Detail Records (CDRs) to indicate records corresponding to "voice call, text message, multimedia message or data packet". In many papers, records that are created from data connection are usually referred to as eXtended Detail Records (XDRs). It would be useful to mention this ambiguity in the nomenclature in the paper (maybe the authors could say that they combine an individual’s CDRs and XDRs).

3) Although OD matrices are mentioned in Section 2.2, it is not clear how they are used to monitor mobility changes. It would be beneficial for the reader to briefly describe in which way OD matrices have been used by JRC. Do they just visualize their structure or extract relevant network metrics from them to track their evolution in time?

4) To be precise, the Radius Of Gyration (ROG) of an individual, as initially introduced by Gonzalez et al. 2008 (https://www.nature.com/articles/nature06958) for human mobility, is defined with respect to their centre of mass and not with respect to the individual’s home location. The authors should hence clarify that they use a variant of the original ROG. Moreover, as I understand from Figure 2, the authors computed an individual's ROG day by day, i.e., considering the phone records of that individual for that day only. In general, the ROG is used to describe individual mobility ranges for a wider temporal period (e.g., two weeks, see Gonzalez et al. 2008). Computing the ROG of an individual/day using records of the past week(s) could have provided a more reliable indication of the individual's mobility range. Maybe the authors could comment on that in the paper. Also, note that in the ROG equation variable n is not defined.

5) "For this 60 min buckets were processed throughout the day and looked into the number of customers seen in a specific cell in the network at the same time, as long as this was not their home cell (where it was assumed they would be in their houses)". The assumption that people are in their house when in the home cell is reasonable in densely populated areas (in which many towers are installed), but much less so in sparsely populated areas. Analogously, co-location in sparsely populated areas (where the coverage of a cell is large) is less reliable. These limitations related to the geographic distribution of towers should be mentioned here (and not only in Section 5) to highlight the limitations of the proposed metrics.

6) "In this case all the insights produced are aggregated at a level of 50 or more individuals in order to preserve individual privacy". I am not sure I understand what the authors do here. What does it mean that the data are aggregated at the level of 50 individuals? And why precisely 50?

7) "Whilst individual privacy could be preserved through anonymisation". As raised in point 1), anonymisation (if intended as the process of just removing the individual’s identities) is not enough to guarantee privacy preservation.

8) In the Data Availability Statement, the authors say that "Data and code are only shared on a strictly case by case basis as per Vodafone's Privacy and Security standards". Could you clarify what these standards are? For example, is it possible to access the list of ROGs of the individuals for a given day?
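
For reference on point 4 above, the two variants of the Radius of Gyration can be written side by side. The first is the centre-of-mass form introduced by González et al. (2008); the second is a home-anchored variant of the kind the point attributes to the authors. The notation here ($n$ recorded positions $\vec r_i$, centre of mass $\vec r_{\mathrm{cm}}$, home position $\vec r_{\mathrm{home}}$) is assumed for illustration and is not copied from the paper's equation.

```latex
r_g^{\mathrm{cm}} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} \lVert \vec r_i - \vec r_{\mathrm{cm}} \rVert^{2}},
\qquad
\vec r_{\mathrm{cm}} = \frac{1}{n}\sum_{i=1}^{n} \vec r_i,
\qquad
r_g^{\mathrm{home}} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} \lVert \vec r_i - \vec r_{\mathrm{home}} \rVert^{2}}
```

Under a flat (Euclidean) approximation the two are related by $\left(r_g^{\mathrm{home}}\right)^2 = \left(r_g^{\mathrm{cm}}\right)^2 + \lVert \vec r_{\mathrm{cm}} - \vec r_{\mathrm{home}} \rVert^2$, so the home-anchored value is never smaller than the centre-of-mass value.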

Minor points:

9) In general, the figures are hard to read and many of them are never mentioned in the main text. First, I would make all numbers and labels bigger; second, it would be useful to add country-specific information about salient events in the management of the pandemic (e.g., the start of lockdown periods, shown as vertical lines).

10) Many papers were published in 2020 on using mobile phone data to monitor mobility changes due to non-pharmaceutical interventions to counter the COVID-19 pandemic. It would be useful to mention the most relevant ones, at least for the countries mentioned in the paper (e.g., Italy and Spain).

Recommendation: Data sharing and collaborations with Telco data during the COVID-19 pandemic: A Vodafone case study — R0/PR5

Comments

No accompanying comment.

Decision: Data sharing and collaborations with Telco data during the COVID-19 pandemic: A Vodafone case study — R0/PR6

Comments

No accompanying comment.

Author comment: Data sharing and collaborations with Telco data during the COVID-19 pandemic: A Vodafone case study — R1/PR7

Comments

This work summarises a global effort within Vodafone Group to tackle COVID-19 and provide relevant insights to institutions and policymakers. At a time of great changes in population behaviour, the insights provided by Vodafone have proven to be extremely relevant for many organisations, and the impact that insights of this type can have on policy making is great. At the same time, ethical and privacy questions were kept at the forefront of all of Vodafone's activities to ensure safety and trust from the public. We hope this paper sheds light on how important these activities are and elicits more collaboration across industry and public organisations to address societal problems.

Recommendation: Data sharing and collaborations with Telco data during the COVID-19 pandemic: A Vodafone case study — R1/PR8

Comments

No accompanying comment.

Decision: Data sharing and collaborations with Telco data during the COVID-19 pandemic: A Vodafone case study — R1/PR9

Comments

No accompanying comment.