This is an educational project completed for a university Statistics course. The project is written in Spanish, and the programming language is R. This repository contains the R Markdown source file (.rmd), which generates the .html file with the output.
The main objective of the project was to produce a descriptive statistical report. The source file contains mobile communications data from Milan for one day in 2013 (4 842 624 interactions, including incoming and outgoing SMSs, incoming and outgoing calls, and web traffic). The analysis includes these steps:
- Build a new data sheet by importing the raw traffic data per cell. (The map of Milan, like that of any other region, is divided into cells to register traffic.) The data sheet must include:
  - The total traffic for every cell
  - The average traffic per registered interaction
- Add frequencies for the variable Country code for incoming and outgoing SMSs, incoming and outgoing calls, and web traffic. Interpret the results by finding the countries that generate the maximum and minimum traffic for every type of interaction.
- Add frequencies for the variable Square id to visualise the dynamics of how interactions are generated across Milan.
- Classify the data as SMS, call or Internet. Add frequencies for every type of interaction.
- In the same data sheet, carry out a descriptive analysis of the frequencies of the following variables:
  - Total traffic of incoming SMSs and its average per interaction.
  - Total traffic of outgoing SMSs and its average per interaction.
  - Total traffic of incoming calls and its average per interaction.
  - Total traffic of outgoing calls and its average per interaction.
  - Total Internet traffic and its average per interaction.
- For all the variables from the previous step, add a descriptive analysis that includes measures of position, dispersion and shape, and identifies atypical cells (outliers).
- Apply a log transformation to the same variables and repeat the analysis on the transformed ones. Compare the results on the original and the log scale.
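The steps above can be sketched in base R roughly as follows. This is a minimal illustration, not the code from the .rmd file: the toy data frame, the column names (`square_id`, `country_code`, `sms_in`), the helper `describe()`, and the 1.5 × IQR rule for atypical cells are all assumptions made for the example, and only one traffic variable is shown.

```r
# Toy stand-in for the raw data. Column names and values are hypothetical.
raw <- data.frame(
  square_id    = c(1, 1, 2, 2, 3, 3, 4, 5, 6, 6),
  country_code = c(39, 39, 39, 33, 39, 49, 39, 39, 41, 41),
  sms_in       = c(0.5, 1.5, 1, 2, 1, 3, 5, 6, 40, 60)
)

# Total traffic per cell and average traffic per registered interaction.
totals <- tapply(raw$sms_in, raw$square_id, sum)
means  <- tapply(raw$sms_in, raw$square_id, mean)
per_cell <- data.frame(
  square_id    = as.integer(names(totals)),
  sms_in_total = as.vector(totals),
  sms_in_mean  = as.vector(means)
)

# Frequencies by country code: number of interactions and summed traffic.
n_by_country       <- table(raw$country_code)
traffic_by_country <- tapply(raw$sms_in, raw$country_code, sum)
max_country <- names(which.max(traffic_by_country))
min_country <- names(which.min(traffic_by_country))

# Hypothetical helper: position, dispersion, shape, and atypical values.
# Atypical values are flagged with the 1.5 * IQR rule (an assumption here).
describe <- function(x) {
  q   <- quantile(x, c(0.25, 0.5, 0.75), names = FALSE)
  iqr <- q[3] - q[1]
  m   <- mean(x)
  list(
    mean     = m,
    median   = q[2],
    sd       = sd(x),
    iqr      = iqr,
    # Moment-based skewness coefficient.
    skewness = mean((x - m)^3) / mean((x - m)^2)^1.5,
    outliers = which(x < q[1] - 1.5 * iqr | x > q[3] + 1.5 * iqr)
  )
}

desc_raw <- describe(per_cell$sms_in_total)

# Log scale. log() assumes strictly positive totals; cells with zero
# traffic would need separate handling (for example log1p()).
desc_log <- describe(log(per_cell$sms_in_total))
```

Comparing `desc_raw$skewness` with `desc_log$skewness` gives a quick view of how much the log scale reduces the asymmetry of the traffic distribution, which is the kind of comparison the last step asks for.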
The second part of the project focuses on the graphical representation of the data. It involves downloading a map of Europe (Eurostat) and the traffic grid of Milan, and assigning country codes to countries.
- Associate each country code with its country (additional data).
- Visualise the variables from the main objectives on the map of Europe. For every variable, add two maps: absolute frequency, and frequency relative to the population of each country (additional data, source: "World Bank Group" for 2013). Select the countries with the maximum and minimum traffic.
- Visualise the raw data on the grid map of Milan. Select the areas of the city with the most interactions.
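The country-level step (code lookup, then absolute versus per-population traffic) might look like the sketch below. Everything in it is illustrative: the data frames, the column names, and the traffic totals are hypothetical, the codes are assumed to be telephone calling codes, and the populations are round placeholder numbers rather than the World Bank figures used in the project. The map drawing itself is left out.

```r
# Hypothetical traffic totals per country code (not project results).
by_code <- data.frame(
  country_code = c(39, 33, 41),
  total        = c(500, 2, 100)
)

# Hypothetical lookup table: code, country name, placeholder population.
lookup <- data.frame(
  country_code = c(33, 39, 41, 49),
  country      = c("France", "Italy", "Switzerland", "Germany"),
  population   = c(66e6, 60e6, 8e6, 80e6),
  stringsAsFactors = FALSE
)

# Inner join: keeps only the codes that actually appear in the traffic data.
joined <- merge(by_code, lookup, by = "country_code")

# Traffic relative to population, expressed per million inhabitants.
joined$per_million <- joined$total / joined$population * 1e6

# Countries with the maximum and minimum traffic on each scale.
top_abs    <- joined$country[which.max(joined$total)]
top_rel    <- joined$country[which.max(joined$per_million)]
bottom_rel <- joined$country[which.min(joined$per_million)]
```

With these placeholder numbers the country with the most absolute traffic differs from the one with the most traffic per inhabitant, which is why the two maps per variable can tell different stories.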