Orange data made available for the D4D Senegal Challenge
The data has been subject to enhanced anonymisation by the Orange Group and Sonatel.
We are making the following data sets available by June 2014, to offer numerous possibilities for analysis:
- communications between antenna towers
- a sample of movement routes: location by mast
- a sample of movement routes: location by administrative unit
- synthetic data set
- weather data
All information related to the positions of users during communications (calls and SMS) has been recorded. We supply either the position of the mast that routes the communication (there are approximately 2,500 masts, also called towers) or the administrative unit in which that mast is located. User identifiers are randomly allocated in each data set.
communications between antenna towers
We supply the number of calls and the total duration of communications made between antenna towers, totalled per hour. We also state which mast initiated the call. Calls beginning within an hourly period are associated with this period, irrespective of when the call ends.
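As an illustration of the hourly totals described above, here is a minimal sketch. The record layout (start time, originating tower, destination tower, duration in seconds), the function name and the sample values are assumptions made for the example, not the format of the published files.

```python
from collections import defaultdict
from datetime import datetime


def aggregate_hourly(calls):
    """Total calls per (hour, originating tower, destination tower).

    `calls` is an iterable of (start_time, origin, destination, duration_s)
    tuples. This layout is hypothetical and chosen only for the sketch.
    """
    totals = defaultdict(lambda: [0, 0])
    for start, origin, destination, duration in calls:
        # A call belongs to the hour in which it begins,
        # irrespective of when it ends.
        hour = start.replace(minute=0, second=0, microsecond=0)
        bucket = totals[(hour, origin, destination)]
        bucket[0] += 1          # number of calls
        bucket[1] += duration   # total duration in seconds
    return {key: tuple(value) for key, value in totals.items()}


# Made-up sample records.
calls = [
    (datetime(2013, 1, 7, 9, 58), 12, 45, 300),  # starts 09:58, ends after 10:00
    (datetime(2013, 1, 7, 9, 5), 12, 45, 60),
    (datetime(2013, 1, 7, 10, 1), 45, 12, 30),
]
hourly = aggregate_hourly(calls)
```

The first call runs past 10:00 but is still counted in the 09:00 period, because only its start time determines the period. The key keeps the direction of the call (originating tower first), reflecting that the data states which mast initiated the call.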
movement routes: high resolution
For each fortnight, we randomly select a sample of active users. We supply the date and time and the positions of the masts for calls and SMS exchanged during this two-week period. A new sample is drawn for each subsequent period.
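The per-period sampling can be sketched as follows. This is only an illustration under stated assumptions: the function name, the user labels, the sample size and the way fictitious identifiers are numbered are all hypothetical, and the actual procedure used by Orange and Sonatel is not described here.

```python
import random


def sample_period(active_users, sample_size, rng):
    """Draw a random sample of users and give each a fresh fictitious id.

    Returns a mapping {real_user: fictitious_id}. Ids are reshuffled on
    every call, so the same user cannot be followed from one period to
    the next through the identifier.
    """
    sampled = rng.sample(sorted(active_users), sample_size)
    new_ids = list(range(1, sample_size + 1))
    rng.shuffle(new_ids)
    return dict(zip(sampled, new_ids))


rng = random.Random(0)                       # fixed seed for reproducibility
active = {"u%03d" % i for i in range(100)}   # made-up active users
period_1 = sample_period(active, 10, rng)    # first fortnight
period_2 = sample_period(active, 10, rng)    # next fortnight, new sample
```

A user who happens to be drawn in both fortnights generally receives a different fictitious identifier in each, which is what prevents trajectories from being linked across periods.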
movement routes: low resolution
We randomly select a sample of active users. We supply the date and time and the location of the calls and SMS exchanged by these users over a longer period of several months. Here, the location is not given by the position of the relay mast but by that of an administrative unit. The country is divided up into units and we supply a table with their geographical location.
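To make the difference with the high-resolution set concrete, the sketch below replaces the mast in each record by its administrative unit. The lookup table, the record layout and all values are invented for the example; the real table of units is the one supplied with the data.

```python
# Hypothetical lookup table: mast identifier -> administrative unit.
mast_to_unit = {101: "unit_A", 102: "unit_A", 203: "unit_B"}


def coarsen(records, mast_to_unit):
    """Replace the mast in each (user, timestamp, mast) record by its unit."""
    return [(user, ts, mast_to_unit[mast]) for user, ts, mast in records]


# Made-up records: (fictitious user id, timestamp, mast id).
records = [
    (1, "2013-03-01 08:15", 101),
    (1, "2013-03-01 18:40", 203),
    (2, "2013-03-02 12:00", 102),
]
low_resolution = coarsen(records, mast_to_unit)
```

Masts 101 and 102 fall in the same unit, so the low-resolution records no longer distinguish between them.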
The data was collected over a 12-month period. It contains several billion records of SMS and calls exchanged by millions of users. The customer identifier is anonymised by Sonatel before the data is forwarded to Orange Labs in Paris. In each random sample a fictitious identifier is allocated to the user.
Here are more details on the 3 sets of anonymised data provided by Sonatel under the D4D Challenge Senegal:
synthetic data set
We expect, during September, to provide a synthetic data set compatible with one of the original D4D data sets concerning mobility.
You will then be able to re-run your test campaigns, compare the results obtained and enrich your studies.
Synthetic Call Data Records (CDR) are produced by a technique that starts from real anonymised CDR from a mobile network and derives a range of statistical distributions describing call patterns, spatial or temporal presence, or mobility. These statistics are then used "backwards" to generate a large set of artificial CDR for a large imaginary user base that matches the same statistics. Synthetic CDR are therefore a very granular model of the mobile network, going down to individual network activities. Their benefits are that the level of information released in the open can be adjusted, and that privacy constraints are respected, since the data is only a model and contains no real user, while offering greater flexibility for analysis and for reusing or developing algorithms, visualisation tools, etc.
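The two stages described above (derive statistics from real CDR, then generate artificial CDR from those statistics) can be sketched in a few lines. This is a deliberately simplified illustration, not the method used to build the D4D synthetic data set: it assumes records of the form (user, tower, hour) and models only the joint frequency of tower and hour, and all names and values are hypothetical.

```python
import random
from collections import Counter


def fit_distribution(records):
    """Stage 1: empirical distribution of (tower, hour) in the real records."""
    counts = Counter((tower, hour) for _user, tower, hour in records)
    outcomes = list(counts)
    weights = [counts[outcome] for outcome in outcomes]
    return outcomes, weights


def generate_synthetic(outcomes, weights, n_users, events_per_user, rng):
    """Stage 2: draw artificial records for imaginary users from the statistics."""
    synthetic = []
    for user in range(n_users):
        draws = rng.choices(outcomes, weights=weights, k=events_per_user)
        for tower, hour in draws:
            synthetic.append(("synthetic_%d" % user, tower, hour))
    return synthetic


# Made-up "real" records: (user, tower, hour of day).
real = [("a", 1, 9), ("a", 2, 18), ("b", 1, 9), ("c", 3, 12)]
outcomes, weights = fit_distribution(real)
synthetic = generate_synthetic(
    outcomes, weights, n_users=5, events_per_user=4, rng=random.Random(42)
)
```

No real user identifier is carried over into the output: only the aggregated frequencies are. A realistic generator would need much richer statistics, for example transitions between towers or call durations, to reproduce mobility patterns.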
weather data sets
For more information, please refer to the file BDDExternes_D4D Senegal.xlsx (last updated 31 July 2014).