
Using Vodafone mobile phone network data to provide insights into citizens mobility in Italy during the Coronavirus outbreak

Published online by Cambridge University Press:  15 September 2021

Francesco Calabrese
Affiliation:
Big Data and AI, Vodafone, 20147 Milan, Italy
Enrico Cobelli
Affiliation:
Big Data and AI, Vodafone, 20147 Milan, Italy
Vincenzo Ferraiuolo
Affiliation:
Big Data and AI, Vodafone, 20147 Milan, Italy
Giovanni Misseri
Affiliation:
Big Data and AI, Vodafone, 20147 Milan, Italy
Fabio Pinelli*
Affiliation:
Big Data and AI, Vodafone, 20147 Milan, Italy
Daniel Rodriguez
Affiliation:
Big Data and AI, Vodafone, 20147 Milan, Italy
*Corresponding author. E-mail: [email protected]

Abstract

In this paper, we present the work conducted by Vodafone to enrich the understanding of people movement in Italy during the outbreak of the Coronavirus in 2020, and the tool developed to support the decisions taken by the authorities during that period. We have developed a solution to anonymously monitor the daily movements of Vodafone SIMs in Italy, at aggregate level, at different spatial and temporal granularity, to provide insights into the movements of Italians.

Type
Translational Article
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© Vodafone Italia S.p.A., 2021. Published by Cambridge University Press

Policy Significance Statement

The COVID-19 pandemic has affected most of the world since the end of 2019, driven by human mobility and interactions. With different levels of impact, governments have applied restriction measures on the population. The ability to measure real-life, aggregated, and anonymized population movement patterns makes it possible to gauge the effectiveness of those measures and to inform decisions on possible corrections. Vodafone supported local authorities in Italy by providing a tool for monitoring such patterns at municipality level on a daily basis, making use of state-of-the-art machine learning techniques applied to its mobile network data.

1. Introduction

The COVID-19 pandemic started to spread in Italy in February 2020, with the first cases in northern Italy, in the regions of Lombardy and Veneto. Eleven municipalities in northern Italy were identified as the centers of the two main Italian clusters and placed under quarantine. Driven by people’s movements, within a few days the virus had spread all over Italy. To reduce this spread, the government initially expanded the quarantine to the whole of Lombardy and 14 other northern provinces, and on the following day to all of Italy, placing more than 60 million people in lockdown. The authorities took different actions in that period, for example, closing schools or businesses, with the aim of influencing mobility and thus the opportunities for the virus to spread. The need therefore became clear to understand how mobility between municipalities and regions evolved, and how mobility within each municipality changed. Different datasets have been used to monitor such movements in different ways and at different granularity levels. Initial work was conducted using data from Cuebiq (2020). Google later also released province-level data (Google, 2020). Such data have limitations related to the samples of users on which the statistics are based. In this paper, we describe the work conducted using mobile phone network data from Vodafone. Vodafone has a market share of over 20%, which provides a very high penetration of the population that is quite homogeneous across genders and age groups. This makes such data a promising candidate for deriving generalized statistics on citizens’ mobility. The Big Data and AI team at Vodafone developed an analytical solution to extract several mobility insights from the analysis of Vodafone Subscriber Identity Module (SIM) network data, and made these insights accessible to the Italian authorities as part of a philanthropic collaboration.
Continuous iterations with the authorities also allowed us to evolve the solution to best suit their needs and provide actionable insights. Vodafone is also participating in the Big Data and AI taskforce established by the Italian Ministry for Technological Innovation and Digitization for the COVID-19 emergency. Finally, Vodafone is sharing the mobility insights with the European Commission (2020) and the International Monetary Fund (2020).

2. Scenario

The telecommunications industry has been contributing to social good initiatives through the use of location analytics across different countries for a number of years (GSMA, 2020).

At the onset of the epidemic in Italy, toward the end of February 2020, we saw the value location-based insights could provide in understanding, modeling, and predicting contagion. We set out to build a first mapping of known outbreaks in the Italian territory, as well as the traces of connections between them and international travel events, in an attempt to map the first contagion paths. The initial intention was to help predict which areas had the highest chances of becoming high-risk contagion (red) zones, so that action could be taken to prevent the spread to the whole country. The authorities were made aware of these capabilities through the corresponding institutional channels. Because there was still no general knowledge about such techniques in many public institutions and organizations, it was not until a few weeks later that the authorities were ready to be informed and to discuss the use of the derived insights. Because by that time the virus had spread across much of northern Italy, the attention of the authorities shifted from understanding and predicting to monitoring the social distancing and lockdown measures that were being enforced as a means to stop the spread. The ability to measure real-life, aggregated, and anonymized population movement patterns would make it possible to gauge the effectiveness of those measures and to inform decisions on possible corrections. From the beginning, it was clear that this was a pro bono contribution and one of the pillars of Vodafone’s five-pillar plan to help counter the impact of COVID-19 (Vodafone, 2020b).

There was interest and personal engagement at the top levels of the organization as well as across functional areas. At both local and group level, parameters were defined to enable information transfer. Citizen privacy, anonymity, and data security were the first concerns addressed when starting the collaboration, and teams worked to ensure that standards were met before any information transfer began. It was the task of our external affairs office, at both local and group level, to manage stakeholder relationships. The first usable map with the required Key Performance Indicators (KPIs) was made available for the Lombardia region, and soon after the dashboard was made available to other regions. By that time, a self-service tool with user identification and adequate security standards was required. Both requirements were met using our existing Vodafone Analytics platform (Vodafone, 2020a), which until then had served to provide location-based services to businesses and had already solved and standardized some of the compliance requirements. Based on feedback and requests from the authorities, subsequent versions of the dashboard were improved in terms of KPIs and usability, and a download capability was added for those authorities who wished to use the mapped KPIs as input for deeper and cross-dimensional analysis.

3. Approach

In this section, we describe the analytical solution that we developed. Mobile phone network data from network probes were already used at an aggregated and anonymized level to extract insights into people’s mobility, both for internal uses and for third parties as part of the Vodafone Analytics solution (Vodafone, 2020a).

This solution is based on the use of network probe data, which allows collecting information on which cellphone tower a particular SIM is connected to at a specific point in time (Calabrese et al., 2014). The spatial resolution is therefore at the cellphone tower level, which usually covers an area with a radius from a few hundred meters (in urban areas) to a few kilometers (in rural areas). The sampling over time depends on how often the mobile device connects to the network infrastructure, and on the network technology used. For instance, for 4G connections we can identify events a few hundred times during a day (Pinelli et al., 2015).

The process to extract mobility insights from the data is as follows:

  • extraction of dwells and trips from the raw location data;

  • creation of mobility insights (e.g., home cell);

  • aggregation of mobility insights data at different granularity levels (e.g., municipality) and representation into a dashboard.

3.1. Extraction of dwells and trips

The first step of the analytical pipeline is the calculation of stay locations by aggregating network data at the network tower level. A dwelling time algorithm was applied to the probe data to estimate stops in a specific area. The algorithm takes a series of parameters as input, one of which is the minimum time to be spent in a certain location. We used a minimum threshold of 30 min, which corresponds to the minimum stop duration. This filters out noise from the data and captures only significant events. We executed the dwelling time algorithm at two spatial granularities to capture different phenomena: (a) at the cell tower level, that is, how long a user stays in the area covered by a single cell tower, to get insights into how citizens were respecting the lockdown restrictions; and (b) at the municipality level, that is, how long a user stays in the area obtained by the union of the areas covered by all the cell towers located in the same municipality, to generate daily trips and build Origin–Destination (OD) matrices used to measure mobility patterns across provinces and regions. The mobility estimates have been regularly validated using pedestrian counts and tickets sold at different venues and events. The second step is to enrich the stay locations: we enriched each stop with the gender and age information of the SIM owner, so that at an aggregated level it is possible to drill down into the analysis and answer the authorities’ questions, for example, whether young people were not respecting the restrictions.
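
As an illustration of the dwell extraction step, the following minimal Python sketch groups consecutive events of a SIM observed in the same area and keeps only the stays lasting at least 30 min. The event format, the names Event, Dwell, and extract_dwells, and the rule that a stay lasts from the first to the last consecutive event in the same area are assumptions made for this example; the text does not specify how the production algorithm handles these details.

```python
from datetime import datetime, timedelta
from typing import Iterable, List, NamedTuple


class Event(NamedTuple):
    """One network event: a SIM seen in an area (cell tower or municipality)."""
    sim_id: str
    area_id: str
    timestamp: datetime


class Dwell(NamedTuple):
    """A stay of one SIM in one area."""
    sim_id: str
    area_id: str
    start: datetime
    end: datetime


def extract_dwells(events: Iterable[Event],
                   min_duration: timedelta = timedelta(minutes=30)) -> List[Dwell]:
    """Return the stays of at least `min_duration` (hypothetical helper).

    Consecutive events of a SIM in the same area form a candidate stay,
    lasting from its first to its last event; shorter stays are dropped.
    """
    dwells: List[Dwell] = []
    current = None  # candidate stay as [sim_id, area_id, start, end]
    for ev in sorted(events, key=lambda e: (e.sim_id, e.timestamp)):
        if current and current[0] == ev.sim_id and current[1] == ev.area_id:
            current[3] = ev.timestamp  # extend the candidate stay
            continue
        # The SIM or the area changed: close the previous candidate stay.
        if current and current[3] - current[2] >= min_duration:
            dwells.append(Dwell(*current))
        current = [ev.sim_id, ev.area_id, ev.timestamp, ev.timestamp]
    if current and current[3] - current[2] >= min_duration:
        dwells.append(Dwell(*current))
    return dwells
```

Under these assumptions, the same function covers both granularities: passing cell tower identifiers as area_id yields tower-level stays, while mapping each tower to its municipality beforehand yields municipality-level stays.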

Finally, all data privacy measures were put in place: first, all events generated by SIMs without geolocalization approval/permission were removed; second, k-anonymity was applied, that is, movements with fewer than 15 trips between two towns were removed from the OD matrices. This part of the mobility analytical asset answers the first question: how mobility between municipalities and regions evolved over time.
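
The suppression of small OD counts can be sketched in a few lines of Python. The trip format and the name build_od_matrix are hypothetical; only the threshold of 15 trips comes from the text.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

Trip = Tuple[str, str]  # (origin municipality, destination municipality)


def build_od_matrix(trips: Iterable[Trip], min_trips: int = 15) -> Dict[Trip, int]:
    """Count trips per origin-destination pair and drop pairs below `min_trips`."""
    counts = Counter(trips)
    # Pairs with fewer than `min_trips` trips are removed for privacy reasons.
    return {od: n for od, n in counts.items() if n >= min_trips}
```

For example, a pair observed 15 times in a day is kept, whereas a pair observed 14 times does not appear in the resulting matrix.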

3.2. Creation of mobility insights

The output of the pipeline described above, namely the stop locations of each user, allows the creation of a series of mobility indexes that can be leveraged to understand what could be done to slow down the spread of the virus.

3.2.1. Time away from home

This is a SIM-based attribute: it is calculated for each SIM and aggregated later in the delivery process. It is defined as the total time a SIM spends away from its home location. To calculate this index, we estimate the home location using one of the classic approaches in the literature: Calabrese et al. (2011) defined home as “the location where is registered the most activity during the evening and early morning hours.” Clearly, given the size of the area covered by a cell tower, this is a lower bound on the actual time spent outside home.
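
A minimal sketch of the two calculations follows, assuming that activity is measured as the number of network events per cell and that the evening and early morning hours run from 20:00 to 07:00. Both choices, as well as the function names and data formats, are assumptions made for illustration, since the text does not state the exact window or activity measure.

```python
from collections import Counter
from datetime import datetime, timedelta
from typing import Iterable, Optional, Tuple

# (cell_id, start, end) of one stay of a single SIM; an assumed format.
Stay = Tuple[str, datetime, datetime]


def estimate_home_cell(event_cells: Iterable[Tuple[str, datetime]],
                       evening_from: int = 20,
                       morning_until: int = 7) -> Optional[str]:
    """Cell with the most events in the evening and early morning hours.

    The 20:00-07:00 window is an illustrative assumption.
    """
    counts = Counter(cell for cell, ts in event_cells
                     if ts.hour >= evening_from or ts.hour < morning_until)
    return counts.most_common(1)[0][0] if counts else None


def time_away_from_home(stays: Iterable[Stay], home_cell: str) -> timedelta:
    """Total duration of the stays detected outside the home cell."""
    return sum((end - start for cell, start, end in stays if cell != home_cell),
               timedelta())
```

Because movements inside the area covered by the home cell are not counted, the value returned by time_away_from_home is a lower bound, consistent with the remark above.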

3.2.2. Percentage of people away from home

This is an aggregated index. It measures the proportion of people having at least one stay location different from their home location, over the total estimated population at the chosen aggregation level. As above, this represents a lower bound on the actual mobility. We define an area of interest over which we calculate the measure, and we aggregate it at different levels: municipality, province, region, and state.
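
The aggregation can be sketched as follows. The SimDay record and the function name are hypothetical, and the number of observed SIMs per area is used here as a stand-in for the total estimated population, which the text does not describe how to compute.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple, Set


class SimDay(NamedTuple):
    """Daily summary of one SIM (assumed format)."""
    area_id: str          # aggregation area of the home location
    home_cell: str        # estimated home cell
    stay_cells: Set[str]  # cells where a stay was detected that day


def pct_away_from_home(sim_days: Iterable[SimDay]) -> Dict[str, float]:
    """Percentage of SIMs per area with at least one stay outside the home cell."""
    totals: Dict[str, int] = defaultdict(int)
    away: Dict[str, int] = defaultdict(int)
    for s in sim_days:
        totals[s.area_id] += 1
        if s.stay_cells - {s.home_cell}:  # any stay cell other than home
            away[s.area_id] += 1
    return {area: 100.0 * away[area] / n for area, n in totals.items()}
```

Using a municipality, province, or region identifier as area_id gives the index at the corresponding aggregation level.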

3.2.3. Radius of gyration

This is another SIM-based attribute. It is a classic measure in the location analytics literature (Gonzalez et al., 2008). It is defined as the radius of the sphere, centered on a person’s home location, such that the greatest part of that person’s movements lie inside the sphere.
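
The text does not give a formula for this index. One common way to compute a home-centered radius is the root-mean-square distance of the visited locations from home, sketched below. The formula, the haversine distance, and the function names are assumptions made for illustration and may differ from the calculation actually used.

```python
import math
from typing import Iterable, Tuple

LatLon = Tuple[float, float]  # (latitude, longitude) in degrees
EARTH_RADIUS_KM = 6371.0


def haversine_km(a: LatLon, b: LatLon) -> float:
    """Great-circle distance in kilometers between two points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def radius_of_gyration_km(home: LatLon, stays: Iterable[LatLon]) -> float:
    """Root-mean-square distance of the stay locations from home (assumed formula)."""
    squared = [haversine_km(home, p) ** 2 for p in stays]
    return math.sqrt(sum(squared) / len(squared)) if squared else 0.0
```

In practice, each stay location would be represented by the coordinates of the corresponding cell tower, so the result inherits the spatial resolution of the network.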

With the first index we try to understand how much time people spend away from home; with the second, how many people move from their home location; and with the third, how far they usually go from their home. All the described indexes are enriched with gender and age information. This set of indexes, aggregated for all SIMs of a specific municipality, answers the second question: how mobility within a municipality changed over time.

3.3. Aggregation and dashboard

Once mobility insights were created, they were made available to external parties through a web application that allows easy access to the data at the different granularity levels, as well as comparison across different days. The dashboard was updated daily, with a latency of around half a day. The dashboard is developed in Tableau and embedded in a web application. This allowed us to provide access to the data and insights in an easy-to-use way. The web application handles login, permissions, and provisioning of the system. The Tableau part, instead, allows the end user to navigate and explore the provided insights. In the rest of this section, we provide more details on the Tableau part of the dashboard, showing how we made the data accessible to the public authorities.

The dashboard has two views: the first contains the OD matrices and the second contains the mobility KPIs.

3.3.1. OD matrix

The OD matrix dashboard is represented in Figure 1. The page is divided into four columns, from the left to the right:

Figure 1. Maps representing the flows with origin in Milan toward the rest of Italy. The time series of the dashboard make evident the effects of the lockdown restrictions on human mobility: there is a quick decrease of the IN and OUT flows at the beginning of March 2020.

Time series

Time series of the ODs, from the top to the bottom:

  • Internal regional flows including trips where start and end are from the same city.

  • Internal regional flows excluding trips where start and end are from the same city.

  • Incoming regional flows, all the trips having the end in a province/municipality of the region under analysis.

  • External regional flows, all the trips starting from a province/municipality of the region under analysis.
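
The four series listed above can be derived from the rows of an OD matrix, as in the following sketch. The row format, the region_of lookup, and the function name are hypothetical. Counting as incoming or external only the trips that cross the regional border is also an assumption, since the descriptions above could be read as including internal trips as well.

```python
from typing import Dict, Iterable, Mapping, Tuple

# (origin municipality, destination municipality, number of trips)
ODRow = Tuple[str, str, int]


def regional_flows(rows: Iterable[ODRow], region_of: Mapping[str, str],
                   region: str) -> Dict[str, int]:
    """Split OD counts into the four flow series for one region."""
    flows = {"internal_incl_same_city": 0, "internal_excl_same_city": 0,
             "incoming": 0, "external": 0}
    for origin, dest, trips in rows:
        starts_in = region_of.get(origin) == region
        ends_in = region_of.get(dest) == region
        if starts_in and ends_in:
            flows["internal_incl_same_city"] += trips
            if origin != dest:
                flows["internal_excl_same_city"] += trips
        elif ends_in:
            flows["incoming"] += trips  # starts elsewhere, ends in the region
        elif starts_in:
            flows["external"] += trips  # starts in the region, ends elsewhere
    return flows
```

Computing these totals for each day gives one point per series in the time series column.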

Maps

A heat map of Italy shows all the flows with origin in a province/municipality of the selected region. The darker the color, the higher the flow.

Rank

A ranked list mirrors the content of the adjacent map. The provinces/municipalities receiving the highest number of trips originating in a province/municipality of the selected region appear at the top of the list.

Filters

The last column contains a series of filters that can be applied on the graphs and maps.

  • Selection of the origin province.

  • Filter on the initial date to begin the visualization of the data.

From the dashboard it is possible to observe the effects of the lockdown on human mobility. From the end of February, mainly in Lombardia, people reduced their movements due to the initial local lockdown. This reduction in flows is more evident at the beginning of March, and the time series highlight it clearly. This visualization allows the volume of movements between municipalities and regions to be estimated.

3.3.2. Mobility KPIs

The second dashboard (see Figure 2) shows the aggregated KPIs describing mobility insights. The mobility KPIs reported are: (a) the percentage of people out from home; (b) the average time spent out from home; and (c) the average radius of gyration. The KPIs are visualized at province or municipality level. The view is divided into four columns. The first column from the left reports a ranking of provinces/municipalities by the selected KPI. The second column maps the selected KPI geographically at province/municipality level. The third column shows three time series, one for each of the above-mentioned KPIs. The last column on the right, instead, provides the end user with:

  • A control to select a different province or municipality and visualize the KPIs for that specific geographical area.

  • A radio button to select whether the KPIs should include the whole population or only the people detected out from home.

  • A series of controls related to the map: date sliding, spatial granularity radio button, and another radio button to select the KPI to be visualized on the map.

Figure 2. Visualization of the percentage of people out from home. In this case too, it is possible to see the reduction of human mobility, both spatially (radius of gyration) and temporally (time spent out from home).

This dashboard makes it possible to estimate the effects of the lockdown on human mobility, and to identify which municipalities or provinces are better respecting the mobility-related virus containment measures. For instance, from the map the end user can identify the municipalities with the highest time spent out from home (in red). This makes the dashboard directly actionable by the authorities, who can direct checks to specific, well-identified areas. The insights provided by this dashboard answer the second question: how mobility within a municipality changed.

4. Outcomes and Impacts

The definitive use case was that of providing the authorities with population movement patterns and KPIs at different levels of geographical aggregation. Almost every administrative region of Italy had its own intraregion dashboard as well as a version of the interregion map and data, which were updated daily. As stated earlier, the insights gained were used to monitor the effectiveness of geographic restraint recommendations and measures (e.g., closure of schools or specific types of commercial activities) and to correct deviations. While at a global level Vodafone was vocal about its engagements with the authorities in different countries and its collaborations with pan-European and international institutions like the EU, at a local level the communication was more reactive. Some of the authors engaged with a number of official institutions, which started carrying out research into epidemiological topics in partnership with local researchers and other Communication Service Providers (CSPs). In general terms, the local media took these initiatives very positively, as shown by numerous press releases and broadcasts documenting and explaining the benefits of location insights provided by telecommunications companies for controlling the COVID-19 epidemic (Reuters, 2020). The mobility insights generated by this work have been used to estimate the social and economic impact of mobility restrictions. For instance, the IMF performed a study to highlight the differentiated impact of lockdown measures on different age groups and genders (International Monetary Fund, 2020).

5. Lessons Learned

Getting a private enterprise and a local government to share data is not a simple task. A case might be made for whether and when to treat business data as open data. The public’s main concern throughout has been privacy, so the organization had to go out of its way to explain how all processes were privacy compliant and how European and Vodafone Group standards were met. Our experience shows that developments made to improve the company’s internal business, if properly industrialized, may be of great value to social applications and that, at least in the telecommunications world, there is a clear sense of responsibility matched by a high degree of development of location algorithms using network data. It is also true that these organizations are more ready now than before to make their contributions to society, as blockers are cleared and the potential of data is collectively realized. At the same time, we have experienced different levels of readiness among local authorities in leveraging such data-driven assets. A higher degree of research and integration across official organizations may certainly help in that respect.

Data Availability Statement

Unfortunately, the key resources (i.e., the granular metadata produced by Vodafone network) necessary to replicate the findings are not publicly accessible for legal reasons (i.e., both GDPR and Italian Privacy Law prohibit sharing of such data).

Author Contributions

Conceptualization, F.C., E.C., V.F., G.M., F.P., D.R.; Methodology, F.C., E.C., V.F., G.M., F.P., D.R.; Formal analysis, F.C., E.C., V.F., G.M., F.P., D.R.; Data curation, F.C., E.C., V.F., G.M., F.P., D.R.; Writing—original draft, F.C., E.C., V.F., G.M., F.P., D.R.; Writing—review and editing, F.C., E.C., V.F., G.M., F.P., D.R.; Supervision, F.C., E.C., V.F., G.M., F.P., D.R.

Competing Interests

All the authors are Vodafone Italy employees.

Funding Statement

This work received no specific grant from any funding agency, commercial, or not-for-profit sectors.

References

Calabrese, F, Ferrari, L and Blondel, VD (2014) Urban sensing using mobile phone network data: A survey of research. ACM Computing Surveys 47(2), 1–20.
Calabrese, F, Lorenzo, GD, Liu, L and Ratti, C (2011) Estimating origin-destination flows using mobile phone location data. IEEE Pervasive Computing 10(4), 36–44.
Cuebiq (2020) Mobility Insights. Available at https://www.cuebiq.com/visitation-insights-covid19/ (accessed 13 October 2020).
European Commission (2020) Mapping Mobility Functional Areas (MFA) Using Mobile Positioning Data to Inform COVID-19 Policies. Available at https://ec.europa.eu/jrc/en/publication/mapping-mobility-functional-areas-mfa-using-mobile-positioning-data-inform-covid-19-policies (accessed 13 October 2020).
Gonzalez, MC, Hidalgo, C and Barabasi, A-L (2008) Understanding individual human mobility patterns. Nature 453, 779–782.
Google (2020) Community Mobility Reports. Available at https://www.google.com/covid19/mobility/ (accessed 13 October 2020).
GSMA (2020) The State of Mobile Data for Social Good Report. Available at https://www.gsma.com/mobilefordevelopment/resources/mobile-data-for-social-good/ (accessed 13 October 2020).
International Monetary Fund (2020) COVID’s Impact in Real Time: Finding Balance Amid the Crisis. Available at https://blogs.imf.org/2020/10/08/covids-impact-in-real-time-finding-balance-amid-the-crisis/ (accessed 13 October 2020).
Pinelli, F, Lorenzo, GD and Calabrese, F (2015) Comparing urban sensing applications using event and network-driven mobile phone location data. 2015 16th IEEE International Conference on Mobile Data Management 1, 219–226.
Reuters (2020) European Mobile Operators Share Data for Coronavirus Fight. Available at https://www.reuters.com/article/us-health-coronavirus-europe-telecoms-idUSKBN2152C2 (accessed 13 October 2020).
Vodafone (2020a) Vodafone Analytics. Available at https://www.vodafone.it/portal/Aziende/Grandi-Aziende/Soluzioni/Soluzioni/analytics (accessed 13 October 2020).
Vodafone (2020b) Vodafone Launches Five-point Plan to Help Counter the Impacts of the COVID-19 Outbreak. Available at https://www.vodafone.com/news-and-media/vodafone-group-releases/news/vodafone-launches-five-point-plan-to-help-counter-the-impacts-of-the-covid-19-outbreak (accessed 13 October 2020).

Author comment: Using Vodafone mobile phone network data to provide insights into citizens mobility in Italy during the Coronavirus outbreak — R0/PR1

Comments

To the Editor-in-Chief of Data & Policy Journal

Dear Prof. Stefaan Verhulst,

This letter is to inform you of the submission of the commentary manuscript entitled “Using Vodafone mobile phone network data to provide insights into citizens mobility in Italy during the Coronavirus outbrake” to Data & Policy Journal for the Special Collection on Telco Big Data Analytics for COVID-19. The authors of this manuscript are: Francesco Calabrese, Enrico Cobelli, Vincenzo Ferraiuolo, Giovanni Misseri, Fabio Pinelli (corresponding author), and Daniel Rodriguez.

In the submitted paper we present the work conducted by Vodafone to enrich the understanding of people movement in Italy during the outbreak of the Coronavirus in 2020, and the tool developed to support the decisions taken by the authorities during that period.

• we introduce the scenario in which the analytical asset has been developed and the interactions with the public authorities to give them full access to our dashboard

• we provide details regarding the technical aspects of the released KPIs together with a description of the web application accessed by public authorities

• we detail the outcomes and impacts of the solution at Italian and European level

• we discuss how the developments made to improve the business, if properly industrialised, may be of great value to social applications.

We believe that this paper shows a significant example of what Telco operators can do to help society deal with an unprecedented period.

Sincerely,

Francesco Calabrese, Vodafone Spa, Italy, [email protected]

Enrico Cobelli, Vodafone Spa, Italy, [email protected]

Vincenzo Ferraiuolo, Vodafone Spa, Italy, Vincenzo [email protected]

Giovanni Misseri, Vodafone Spa, Italy, Giovanni [email protected]

Fabio Pinelli, Vodafone Spa, Italy, [email protected]

Daniel Rodriguez, Vodafone Spa, Italy, [email protected]

Review: Using Vodafone mobile phone network data to provide insights into citizens mobility in Italy during the Coronavirus outbreak — R0/PR2

Conflict of interest statement

No Conflicts of Interest.

Comments

Comments to Author: General remarks

This article provides little more than a brief introduction into the Vodafone Analytics tool. It neither sheds light on the fault lines between business and local authorities in a manner that could provide actual guidance/lessons learnt for similar endeavours elsewhere, nor does it showcase best/worst practices that have relevance beyond this specific case. Therefore, the paper is considered not suitable for publication as a commentary in Data and Policy. First of all, the paper reads as a blog post of Vodafone’s Analytics team, not as a scientific commentary that provides context or framing to current discussions. Second of all, the paper neither presents a reflected discussion of the intricate interplay of data and policy or specific examples thereof, nor does it provide enough technical context to enable informed judgement about the quality of the insights provided. Furthermore, neutral language is not maintained throughout the paper. Besides a few spelling mistakes and some ambiguous wordings, the paper is written clearly and its length is appropriate.

Specific remarks

1) Title: outbrake -> outbreak

2) p.2 l.4-5: The authors state that google and cuebiq data “has limitations related to the samples of users” and mobile phone network data from Vodafone with a 20% market share in Italy makes it a “very good candidate to derive generalized statistics”. This juxtaposition could be judged misleading. Obviously, all these sources require some kind of weighting or calibration scheme to obtain generalized population statistics. In addition, the authors state on p.3 in the last paragraph of 3.1. that “SIMs without geolocalization approval/permission have been removed, and secondly, movements with less than 15 trips between two towns have been removed”. At this point, it would be desirable to see some very high-level summary statistics on the composition of the user demographics before and after filtering vis-à-vis official population demographics, e.g. for Lombardia.

3) Introduction, last two sentences: Please provide additional context here, otherwise it carries a marketing connotation.

4) Scenario, second sentence: “our recent more sophisticated algorithms”. More sophisticated than what?

5) Scenario, second paragraph: “anonymised population movement patterns”. Here, it would be good to give some details which re-identification scenarios the team has considered for drafting their anonymization strategy and to provide an illustrative example in which a SIM/user could be uniquely re-identified.

6) Scenario, last sentence: Carries a marketing connotation, please adjust to follow a more neutral language.

7) Scenario, third paragraph: “scrupulously” carries a marketing connotation and is too prosaic, please adjust to follow a more neutral language.

8) Scenario, third paragraph: “authorities who wished to use the mapped KPIs as input for deeper and cross-dimensional analysis”. How did that translate into policy actions specifically? Which level in the administration used the insights/platform? Technical staff to draft the briefings and decision memos or decision makers via personal relationships? How did administrations communicate their information needs? Did the team have to guess them, did the administrations state them ad-hoc or was there some kind of formal process established?

9) Approach: Is there any estimate how much of the actual mobility the analysis has missed due to the granularity / sampling of the data? For example, how much “away-from-home” mobility occurred within a cell and therefore remained hidden? How many trips occurred between two location logs? It would be good either to give insights on the discussions the authors had on that or at least to clearly mention that the mobility insights provided rather represent some form of a lower bound of the actual mobility.

10) Extraction of dwells and trips: an user -> a user

11) Extraction of dwells and trips: add ‘at’: “[...] sex across and age provinces information and regions of the SIM owner, so that [at] an aggregated level [...]”

12) Creation of individual mobility insights: remove the ‘s’ in ‘sets’: “This set of indexes provides the answers [...]”

13) Aggregation and dashboard: add the ‘s’ to ‘insights’: “Once mobility insights were created [...]”

14) Aggregation and dashboard, first paragraph: “[...] insights in a faster way”. Faster than what?

15) OD matrix: add ‘the’: representation of [the] adjacent map

16) Aggregation and dashboard: How did the dashboard/system account for the “self-filtering” service in terms of privacy and how did these privacy limitations interact with the needs requests? Please specify.

17) Individual movements: How was the reliability of the mobility estimates measured and communicated? Please specify.

18) Individual movements: “This could make the dashboard actionable directly by the authorities [...]”. The ‘could’ is ambiguous here, please rephrase.

19) Outcomes and impacts: “The definitive use case was that of [...].” Use case of what for whom? Please specify.

20) Outcomes and impacts: “[...] were used to monitor effectiveness [...]” Please provide references or additional specifics to pin down the scope of policy actions it guided.

21) Outcomes and impacts: “Some of the authors engaged in a number of official institutions [...]”. Replace ‘in’ with ‘with’?

22) Outcomes and impacts: “Our experience shows that developments made to improve the business [...]” Which business? Please specify.

Review: Using Vodafone mobile phone network data to provide insights into citizens mobility in Italy during the Coronavirus outbreak — R0/PR3

Conflict of interest statement

Vodafone is a competitor of Orange (my company).

Comments

Comments to Author: A clear and easy-to-follow text describing the operator’s actions to contribute to the effort to fight the epidemic in Italy, the first European country confronted with COVID-19. The paper shows how a telecommunication operator, with a data analysis competence centre, can react quickly to a crisis situation by providing decision-makers with the behavioural indicators (in this case mobility) needed to assess and monitor the situation. I understand that in this same Data & Policy issue, other similar experiences will also be reported.

Data & Policy is dedicated to the impact of Data Science on policy and governance, so my reading has been guided by the need to examine the value of mobile phone data to inform decision-making. Here the question of what is feasible and allowed by regulation is central and the opportunity to compare different national actions will therefore be valuable. Indeed, both the national interpretation of privacy regulations (in the case of the EU) and the social acceptability of the processing of personal data appear to be varied. An example of this is the contrasting response in different countries to COVID-19 contact tracing applications.

On the other hand, the quality of the information that Data Science can provide depends very much on the source data that it can (or has the right to) work on. This quality will determine its usefulness to the authorities or experts in the field. In the case of this paper, for example, the length of the time series used can strongly influence the robustness of the calculated indicators: if the individual geolocalised trace over a week/month can be worked on, the quality of the population mixing indicator or home assignment will be much higher than when the individual trace can only be kept for one day.

It is obvious that in unprecedented situations, even poor quality data are better than nothing. However, a reflection on the future must try to identify the conditions for the production of the most useful information, taking into account both the technical feasibility and the legal admissibility, since we are talking about personal data whose use is strictly regulated by the law.

Sorry for this long introduction, but I would like the authors to better understand my remarks, which mainly concern the description of the data mobilised in their work. Indeed, I think that they have not sufficiently emphasized the work needed to produce actionable indicators.

The dashboard layout and illustrations are rich and easy to understand. However, I have a few remarks on the data used since this part is not quite explicit.

The authors describe the process of extracting mobility information from raw data by showing that they reconstruct trips from cell location data, determine the cell of residence and calculate different indicators per user for aggregation at different levels of geographical and temporal granularity.

Later, in the description of the construction of the O-D matrices, the method of extracting trips and the definition of stops by the 30-minute threshold is explained. Would it be possible to have also information on the duration of the individual trace? This will allow us to better appreciate how the place of residence is defined in order to be able to understand the accuracy of the "time spent away from home" indicator, but also other individual indicators calculated and O-D matrices provided.

A second question concerns the necessary data sources: the authors specify that the signalling data are coupled with information on the declared age and gender of the clients. It is therefore a client base that was used to feed the analyses. Did I understand correctly? How was this done? I imagine that the signalling data is pseudonymised before being used, how do you link the information from the customer base to the SIM card observed in the traffic?

Finally some minor remarks.

I don’t understand what this statement means: “all events generated by SIMs without geolocalisation approval/permission have been removed” (p. 3). Does this mean that Vodafone has an opt-out system where customers can choose to accept or decline the use of their data? If so, what percentage of those who agree? This would make it possible to assess the social acceptability in Italy of the use of mobile phone data for the common good.

This sentence is also a bit enigmatic to me: “It is nevertheless also true that these organisations are more ready now than before to make their contributions to society, as blockers are cleared and the potential of data is collectively realised” (p. 6). Has the crisis situation opened the doors within the organisation, or have external constraints (legal, acceptability...) been relaxed? This seems to me quite important as the feedback for this issue of Data & Policy.

In summary, the paper highlights a large-scale public-private collaboration on decision support in the crisis situation led by a team of data scientists with extensive experience in the analysis of mobile phone data. It provides an excellent example of the opportunities that digital data can offer to decision making at different administrative levels, both in terms of the information provided and the rapid refresh rate of the indicators. In addition, it shows the effort to translate the results produced to decision-makers via a clear and easy-to-use dashboard.

The text will therefore undoubtedly make a valuable contribution to Data & Policy after some minor points have been clarified.

Recommendation: Using Vodafone mobile phone network data to provide insights into citizens mobility in Italy during the Coronavirus outbreak — R0/PR4

Comments

Comments to Author: In my view, this is a good paper and needs to be included. It is true that most of the individual comments made by the reviewers make sense and should be taken into account, where possible.

The main comment for "reject" is that it reads like a blog post. We need to find the right balance between developing a valuable collection of papers that others can learn from for future pandemics and scientific quality (which is farther away from real applications), and I believe this paper is within the margins of the balance.

It is good that the paper makes a statement about market share, gender, age. I also like the statement that governments lost precious weeks due to lack of knowledge, which is an important message for the future. Also interesting to know that it was well received by press, and that local authorities were more reluctant than international authorities. Maybe those learnings can be repeated and elaborated in the lessons learned section.

I think the paper makes some “dangerous” statements related to privacy in the way it is formulated. They speak about “individual mobility insights” while they are aggregated. I strongly advise to reformulate and not mention anything about “individual”, but reformulate in terms of “aggregate”.

I suggest therefore that you carefully look at the reviewers’ and the above suggestions and try to incorporate them as much as possible. Where not possible, please justify why you haven’t included it.

Decision: Using Vodafone mobile phone network data to provide insights into citizens mobility in Italy during the Coronavirus outbreak — R0/PR5

Comments

No accompanying comment.

Author comment: Using Vodafone mobile phone network data to provide insights into citizens mobility in Italy during the Coronavirus outbreak — R1/PR6

Comments

Dear Editor,

We thank the reviewers for the useful comments provided during the review process that definitely improved the quality of our manuscript.

First of all, we would like to thank the Associate Editor and the Reviewers for their very insightful comments.

In the following sections we report the detailed answers to the reviewers.

* Associate Editor

[+][Q1] It is good that the paper makes a statement about market share, gender, age. I also like the statement that governments lost precious weeks due to lack of knowledge, which is an important message for the future. Also interesting to know that it was well received by press, and that local authorities were more reluctant than international authorities. Maybe those learnings can be repeated and elaborated in the lessons learned section.

[+][A1] We have included this in the lessons learned section

[+][Q2] I think the paper makes some “dangerous” statements related to privacy in the way it is formulated. They speak about “individual mobility insights” while they are aggregated. I strongly advise to reformulate and not mention anything about “individual”, but reformulate in terms of “aggregate”.

[+][A2] We have reformulated the terms used to make sure it is clear that the insights shared with the authorities are based on aggregated and anonymized data.

* Reviewer 1

[+][Q1] The authors describe the process of extracting mobility information from raw data by showing that they reconstruct trips from cell location data, determine the cell of residence and calculate different indicators per user for aggregation at different levels of geographical and temporal granularity. Later, in the description of the construction of the O-D matrices, the method of extracting trips and the definition of stops by the 30-minute threshold is explained. Would it be possible to have also information on the duration of the individual trace?

[+][A1] The traces span the entire day, with more samples during the daytime and fewer at night.
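
For readers unfamiliar with the 30-minute stop threshold mentioned in the question, the following is a minimal sketch of how dwells and trips could be derived from a single day's trace. It is an illustration under stated assumptions, not the authors' implementation: the `(timestamp, cell_id)` event format, the function names `extract_dwells` and `extract_trips`, and the rule that a dwell is a run of consecutive events in the same cell spanning at least 30 minutes are all hypothetical.

```python
from datetime import datetime, timedelta
from itertools import groupby

# Threshold quoted in the review; how exactly it is applied is an assumption.
MIN_DWELL = timedelta(minutes=30)


def extract_dwells(events, min_dwell=MIN_DWELL):
    """Return (cell_id, start, end) for each run of consecutive events
    in the same cell that spans at least `min_dwell`.

    `events` is a list of (timestamp, cell_id) pairs for one SIM and one day
    (hypothetical input format).
    """
    ordered = sorted(events, key=lambda e: e[0])  # order events by time
    dwells = []
    for cell, run in groupby(ordered, key=lambda e: e[1]):
        run = list(run)
        start, end = run[0][0], run[-1][0]
        if end - start >= min_dwell:
            dwells.append((cell, start, end))
    return dwells


def extract_trips(dwells):
    """Pair consecutive dwells in different cells into (origin, destination)."""
    return [(a[0], b[0]) for a, b in zip(dwells, dwells[1:]) if a[0] != b[0]]
```

Because a daily trace has fewer samples at night, a rule of this kind can only detect dwells that are bracketed by at least two events in the same cell, which is one reason the length and sampling density of the trace matter for the quality of the resulting indicators.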

[+][Q2] the authors specify that the signalling data are coupled with information on the declared age and gender of the clients. It is therefore a client base that was used to feed the analyses. Did I understand correctly? How was this done? I imagine that the signalling data is pseudonymised before being used, how do you link the information from the customer base to the SIM card observed in the traffic?

[+][A2] The signaling data was pseudonymised together with the gender and age data, so they could be reconciled to create aggregated insights.

[+][Q3] I don’t understand what this statement means: “all events generated by SIMs without geolocalisation approval/permission have been removed” (p. 3). Does this mean that Vodafone has an opt-out system where customers can choose to accept or decline the use of their data? If so, what percentage of those who agree? This would make it possible to assess the social acceptability in Italy of the use of mobile phone data for the common good.

[+][A3]

- Customers can opt out of being included in the aggregated and anonymized analysis via the following website: www.vodafone.it/portal/Privati/Area-Privacy/La-nostra-informativa

- The actual number cannot be disclosed due to confidentiality issues.

[+][Q4] This sentence is also a bit enigmatic to me: “It is nevertheless also true that these organisations are more ready now than before to make their contributions to society, as blockers are cleared and the potential of data is collectively realised” (p. 6). Has the crisis situation opened the doors within the organisation, or have external constraints (legal, acceptability...) been relaxed? This seems to me quite important as the feedback for this issue of Data & Policy.

[+][A4] The internal organization (Vodafone Group) has defined a process to support this type of request more easily.

* Reviewer 2

[+][Q1] Title: outbrake -> outbreak

[+][A1] We have fixed the typos

[+][Q2] The authors state that google and cuebiq data “has limitations related to the samples of users” and mobile phone network data from Vodafone with a 20% market share in Italy makes it a “very good candidate to derive generalized statistics”. This juxtaposition could be judged misleading. Obviously, all these sources require some kind of weighting or calibration scheme to obtain generalized population statistics. In addition, the authors state on p.3 in the last paragraph of 3.1. that “SIMs without geolocalization approval/permission have been removed, and secondly, movements with less than 15 trips between two towns have been removed”. At this point, it would be desirable to see some very high-level summary statistics on the composition of the user demographics before and after filtering vis-à-vis official population demographics, e.g. for Lombardia.

[+][A2] See answer [A3] to Reviewer 1.

[+][Q3] Introduction, last two sentences: Please provide additional context here, otherwise it carries a marketing connotation.

[+][A3] We have edited the sentence

[+][Q4] Scenario, second sentence: “our recent more sophisticated algorithms”. More sophisticated than what?

[+][A4] We have edited the sentence

[+][Q5] Scenario, second paragraph: “anonymised population movement patterns”. Here, it would be good to give some details which re-identification scenarios the team has considered for drafting their anonymization strategy and to provide an illustrative example in which a SIM/user could be uniquely re-identified.

[+][A5] We use the well-known k-anonymity technique. We have specified it in the text

[+][Q6] Scenario, last sentence: Carries a marketing connotation, please adjust to follow a more neutral language.

[+][A6] We have edited the sentence

[+][Q7] Scenario, third paragraph: “scrupulously” carries a marketing connotation and is too prosaic, please adjust to follow a more neutral language.

[+][A7] We have edited the sentence

[+][Q8] Scenario, third paragraph: “authorities who wished to use the mapped KPIs as input for deeper and cross-dimensional analysis”. How did that translate into policy actions specifically? Which level in the administration used the insights/platform? Technical staff to draft the briefings and decision memos or decision makers via personal relationships? How did administrations communicate their information needs? Did the team have to guess them, did the administrations state them ad-hoc or was there some kind of formal process established?

[+][A8] We have better specified the process

[+][Q9] Approach: Is there any estimate how much of the actual mobility the analysis has missed due to the granularity / sampling of the data? For example, how much “away-from-home” mobility occurred within a cell and therefore remained hidden? How many trips occurred between two location logs? It would be good either to give insights on the discussions the authors had on that or at least to clearly mention that the mobility insights provided rather represent some form of a lower bound of the actual mobility.

[+][A9] We have better specified the meaning of the mobility insights

[+][Q10] Extraction of dwells and trips: an user -> a user

[+][Q11] Extraction of dwells and trips: add ‘at’: “[...] sex across and age provinces information and regions of the SIM owner, so that [at] an aggregated level [...]”

[+][Q12] Creation of individual mobility insights: remove the ‘s’ in ‘sets’: “This set of indexes provides the answers [...]”

[+][Q13] Aggregation and dashboard: add the ‘s’ to ‘insights’: “Once mobility insights were created [...]”

[+][Q14] Aggregation and dashboard, first paragraph: “[...] insights in a faster way”. Faster than what?

[+][Q15] OD matrix: add ‘the’: representation of [the] adjacent map

[+][A10-A15] We have fixed the typos

[+][Q16] Aggregation and dashboard: How did the dashboard/system account for the “self-filtering” service in terms of privacy and how did these privacy limitations interact with the needs requests? Please specify.

[+][A16] Each output visualized in the dashboard was first k-anonymized.
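
As a minimal sketch of what threshold-based suppression of aggregated outputs can look like, consider the rule quoted earlier in this response, under which movements with fewer than 15 trips between two towns are removed from the OD matrices. The function names, the `(origin, destination)` trip format, and the reuse of 15 as the default `k` for every output are assumptions made for illustration; the sketch also counts trips, following the quoted sentence, whereas k-anonymity in the strict sense would count distinct SIMs.

```python
from collections import Counter


def od_counts(trips):
    """Count trips per (origin, destination) pair.

    `trips` is an iterable of (origin, destination) tuples (hypothetical format).
    """
    return Counter(trips)


def suppress_small_cells(counts, k=15):
    """Keep only OD cells with at least `k` trips; smaller cells are dropped
    so that they never reach the dashboard."""
    return {od: n for od, n in counts.items() if n >= k}
```

With this rule, an OD pair observed 20 times is published, while a pair observed 3 times is withheld entirely rather than shown with a small count.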

[+][Q17] Individual movements: How was the reliability of the mobility estimates measured and communicated? Please specify.

[+][A17] The mobility estimates have been regularly validated against pedestrian counts and ticket sales at different venues and events.

[+][Q18] Individual movements: “This could make the dashboard actionable directly by the authorities [...]”. The ‘could’ is ambiguous here, please rephrase.

[+][A18] We have edited the sentence

[+][Q19] Outcomes and impacts: “The definitive use case was that of [...].” Use case of what for whom? Please specify.

[+][A19] We have edited the sentence

[+][Q20] Outcomes and impacts: “[...] were used to monitor effectiveness [...]” Please provide references or additional specifics to pin down the scope of policy actions it guided.

[+][A20] We have edited the sentence to specify the actions

[+][Q21] Outcomes and impacts: “Some of the authors engaged in a number of official institutions [...]”. Replace ‘in’ with ‘with’?

[+][A21] We have edited the sentence

[+][Q22] Outcomes and impacts: “Our experience shows that developments made to improve the business [...]” Which business? Please specify.

[+][A22] We have edited the sentence

Review: Using Vodafone mobile phone network data to provide insights into citizens mobility in Italy during the Coronavirus outbreak — R1/PR7

Conflict of interest statement

No Conflicts of Interest.

Comments

Comments to Author: General remarks: The authors have addressed most of the issues flagged in the first review in a satisfactory manner. Further, the reclassification of the submission as a ‘translational article’ has been a good decision. Overall, the submission gives a good overview of the work Vodafone Italia has been doing in response to the start of the COVID-19 pandemic in (Northern) Italy.

Specific remarks:

- Introduction. “[...] very good candidate to derive generalized statistics [...]”. Again, this claim could immensely profit from some high-level statistics, e.g. comparing the distribution of age and gender for Lombardia of the sample used in the analysis vis-à-vis official statistics. From my perspective, I do not see a valid reason why confidentiality concerns should not allow for that kind of comparison, especially since the opt-out option obfuscates the true subscriber composition. Otherwise, please adjust the wording to “[...] promising candidate to derive generalized statistics [...]”.

- Please double-check the spelling

-- Introduction. “Vodafone also is participating to the Big Data and AI taskforce[...].” -> “Vodafone also is participating in the Big Data and AI taskforce[...].”

-- Scenario, end of second paragraph. “Vodafone t’s” -> “Vodafone’s”

-- 3.1, last paragraph. “towns have been removed from OD matrices”. Duplicate?

-- 3.2, radius of gyration. “someoneâĂŹs” -> “someone’s”

-- 3.3.1, Rank. “with origin one of the province/” -> “with origin in one of the province/”

-- 3.3.1, filters. “series of filters that is possible to apply” -> “series of filters that can be applied”

-- 3.3.2, first paragraph. “The view is divide in 4 columns.” -> “The view is divided into 4 columns.”

-- 3.3.2, last paragraph. “It is possible to measure the effects of the lockdown through this dashboard, and it gives also the opportunity to identify which municipalities or provinces are better respecting the virus containment measures.” -> “It is possible to estimate the effects on human mobility of the lockdown through this dashboard, and it also gives the opportunity to identify which municipalities or provinces are better respecting the mobility-related virus containment measures.”

-- 4, Outcomes. “commericla” -> “commercial”

-- 4, Outcomes. “economical” -> “economic”

-- Author contributions. “WritingâĂŤOriginal” -> “Writing & Original”

-- Author contributions. “WritingâĂŤReview” -> “Writing & Review”

Recommendation: Using Vodafone mobile phone network data to provide insights into citizens mobility in Italy during the Coronavirus outbreak — R1/PR8

Comments

No accompanying comment.

Decision: Using Vodafone mobile phone network data to provide insights into citizens mobility in Italy during the Coronavirus outbreak — R1/PR9

Comments

No accompanying comment.

Author comment: Using Vodafone mobile phone network data to provide insights into citizens mobility in Italy during the Coronavirus outbreak — R2/PR10

Comments

Dear Editor,

We thank again the reviewers for the useful comments provided during the review process that definitely improved the quality of our manuscript.

First of all, we would like to thank the Associate Editor and the Reviewers for their very insightful comments.

In the following sections we report the detailed answers to the reviewer.

Reviewer 1

Q1 “[...] very good candidate to derive generalized statistics [...]”. Again, this claim could immensely profit from some high-level statistics, e.g. comparing the distribution of age and gender for Lombardia of the sample used in the analysis vis-à-vis official statistics. From my perspective, I do not see a valid reason why confidentiality concerns should not allow for that kind of comparison, especially since the opt-out option obfuscates the true subscriber composition. Otherwise, please adjust the wording to “[...] promising candidate to derive generalized statistics [...]”.

A1 The actual number cannot be disclosed due to confidentiality issues; therefore we changed the sentence as suggested by the reviewer.

Q2 Please double-check the spelling

A2 All the sentences and typos have been fixed as suggested.

Recommendation: Using Vodafone mobile phone network data to provide insights into citizens mobility in Italy during the Coronavirus outbreak — R2/PR11

Comments

No accompanying comment.

Decision: Using Vodafone mobile phone network data to provide insights into citizens mobility in Italy during the Coronavirus outbreak — R2/PR12

Comments

No accompanying comment.

Author comment: Using Vodafone mobile phone network data to provide insights into citizens mobility in Italy during the Coronavirus outbreak — R3/PR13

Comments

Dear Editor,

We thank again the reviewers for the useful comments provided during the review process that definitely improved the quality of our manuscript.

First of all, we would like to thank the Associate Editor and the Reviewers for their very insightful comments.

In the following sections we report the detailed answers to the reviewer.

Reviewer 1

Q1 “[...] very good candidate to derive generalized statistics [...]”. Again, this claim could immensely profit from some high-level statistics, e.g. comparing the distribution of age and gender for Lombardia of the sample used in the analysis vis-à-vis official statistics. From my perspective, I do not see a valid reason why confidentiality concerns should not allow for that kind of comparison, especially since the opt-out option obfuscates the true subscriber composition. Otherwise, please adjust the wording to “[...] promising candidate to derive generalized statistics [...]”.

A1 The actual number cannot be disclosed due to confidentiality issues. As the reviewer suggested, we changed the sentence from “very good candidate to derive generalized statistics” to “promising candidate to derive generalized statistics”.

Q2 Please double-check the spelling

A2 All the sentences and typos have been fixed as suggested:

1 Introduction. “Vodafone also is participating to the Big Data and AI task force.” → “Vodafone also is participating in the Big Data and AI task force[...].”

2 Scenario, end of second paragraph. “Vodafone t’s” → “Vodafone’s”

3 3.1, last paragraph. “towns have been removed from OD matrices”. Duplicate? → we removed the duplicated sentence

4 3.2, radius of gyration. “someoneˆaA” → “someone’s”

5 3.3.1, Rank. “with origin one of the province/” → “with origin in one of the province/”

6 3.3.1, filters. “series of filters that is possible to apply” → “series of filters that can be applied”

7 3.3.2, first paragraph. “The view is divide in 4 columns.” → “The view is divided into 4 columns.”

8 3.3.2, last paragraph. “It is possible to measure the effects of the lock-down through this dashboard, and it gives also the opportunity to identify which municipalities or provinces are better respecting the virus containment measures.” → “It is possible to estimate the effects on human mobility of the lock-down through this dashboard, and it also gives the opportunity to identify which municipalities or provinces are better respecting the mobility-related virus containment measures.”

9 4, Outcomes. “commericla” → “commercial”

10 4, Outcomes. “economical” → “economic”

11 Author contributions. “WritingˆaATOriginal” → “Writing & Original”

12 Author contributions. “WritingˆaATReview” → “Writing & Review”

Recommendation: Using Vodafone mobile phone network data to provide insights into citizens mobility in Italy during the Coronavirus outbreak — R3/PR14

Comments

No accompanying comment.

Decision: Using Vodafone mobile phone network data to provide insights into citizens mobility in Italy during the Coronavirus outbreak — R3/PR15

Comments

No accompanying comment.