# city-pro

The pipeline is split into a first C++ part, responsible for computing the quantities of interest about trajectories and the road network, and a second Python part, responsible for plotting and analysing the computed quantities.

## Input
REQUIRED:

- **1C** `config.json`
- **1I** `cartography.pnt`, `cartography.pro`
- **1R** `DatiTelecom.csv` (or equivalent)
- **1D** `DatiTelecomPreProcessed.csv` or `DatiTelecomToPreprocess.gzip`


## Description Input
#### 1C
##### Configuration file

- `file_pro`: `/path/to/cartography.pro`
- `file_pnt`: `/path/to/cartography.pnt`
- `file_data`: [`/path/to/DatiTelecomPreProcessed.csv`] NOTE: It is a list
- `cartout_basename`: `/path/to/save/output/dir`
- `start_time`: `YY-MM-DD h:m:s`
- `end_time`: `YY-MM-DD h:m:s`
- `bin_time`: `15.0`
- `lat_min,lat_max,lon_min,lon_max`: bounding box vertices
- `map_resolution`: `60`
- `grid_resolution`: `150` (m), grid size used by the search algorithms for points, polys, arcs, etc.
- `l_gauss`: `10`
- `min_data_distance`: `50` (m), threshold distance between a `record_base` and a `cluster_base.centroid` to create another `cluster_base` object when filtering trajectories.
- `max_inst_speed`: `50` (m/s), maximum instantaneous speed above which a `record` is considered an error and discarded.
- `min_node_distance`: `10` (m), distance below which two nodes are considered the same. (Not used here, but in other parts of the code base: cartography-data, miniedit)
- `min_poly_distance`: `50` (m), distance below which two polys are considered the same. (Not used here, but in other parts of the code base: cartography-data, miniedit)
- `enable_threshold`: `true`
- `threshold_v` : `50.0`
- `threshold_t` : `86400`
- `threshold_n` : `3`
- `enable_multimodality`: `true` Enables the fuzzy algorithm for the classification of homogeneous trajectories.
- `enable_slow_classification`: `true` Used to further split the slowest category, which usually does not separate walkers from bikers.
- `num_tm`: `3` Number of classes you want to distinguish.
- `threshold_p`: `0.5` Threshold on the probability of a trajectory belonging to a cluster. If the probability is below the threshold, the trajectory is assigned to class `10` (unclassified).
- `dump_dt`: `60`
- `enable_fluxes_print`: `true` Enable output: **{basename}**`.fluxes`
- `enable_subnet`: `true` Enable output: **{basename}**`.fluxes.sub`
- `show_subnet`: `true`
- `file_subnet`: `/path/to/subnet/{basename}.fluxes.sub`
- `multimodality_subnet`: `true`
- `num_tm_subnet`: `3`
- `enable_print`: `true` For `_stats.csv` Deprecated
- `enable_geojson`: `false` Uses geojson
- `enable_gui`: `true` Activate gui
- `jump2subnet_analysis`: `false` Does not recalculate the subclasses but reads them for the construction of the subnetworks.

An illustrative `config.json` sketch is given below.
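Below is an illustrative sketch of a complete `config.json`, written as a small Python script in the spirit of `SetRightDirectoriesConfiguration.py`. Every path, date, and bounding-box value is a placeholder to adapt to your machine, not a project default.

```python
# write_config_sketch.py - illustrative only: keys mirror the list above,
# all paths, dates, and numeric values are placeholders, not project defaults.
import json

config = {
    "file_pro": "/path/to/cartography.pro",
    "file_pnt": "/path/to/cartography.pnt",
    "file_data": ["/path/to/DatiTelecomPreProcessed.csv"],  # note: a list
    "cartout_basename": "/path/to/save/output/dir",
    "start_time": "2023-03-01 00:00:00",     # placeholder date
    "end_time": "2023-03-01 23:59:59",       # placeholder date
    "bin_time": 15.0,
    "lat_min": 44.45, "lat_max": 44.55,      # placeholder bounding box
    "lon_min": 11.25, "lon_max": 11.45,
    "map_resolution": 60,
    "grid_resolution": 150,
    "l_gauss": 10,
    "min_data_distance": 50,
    "max_inst_speed": 50,
    "min_node_distance": 10,
    "min_poly_distance": 50,
    "enable_threshold": True,
    "threshold_v": 50.0,
    "threshold_t": 86400,
    "threshold_n": 3,
    "enable_multimodality": True,
    "enable_slow_classification": True,
    "num_tm": 3,
    "threshold_p": 0.5,
    "dump_dt": 60,
    "enable_fluxes_print": True,
    "enable_subnet": True,
    "show_subnet": True,
    "file_subnet": "/path/to/subnet/basename.fluxes.sub",
    "multimodality_subnet": True,
    "num_tm_subnet": 3,
    "enable_print": True,
    "enable_geojson": False,
    "enable_gui": True,
    "jump2subnet_analysis": False,
}

with open("config_bologna.json", "w") as f:
    json.dump(config, f, indent=2)
```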

#### 1I
##### Cartography in physycom format.
- (`cartography.pnt`, `cartography.pro`):
Contain all the information needed to build the road network, in a format the program can read.
- `cartography.pnt`:
Contains information about where the points of the road network are.
- `cartography.pro`:
Contains information about the links.



#### 1D
##### Data
`DatiTelecomToPreprocess.gzip` contains the columns `[iD,lat,lon,time]`; so does `DatiTelecomAlreadyPreprocessed.csv`.
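A quick way to check that an input file has the expected schema; the paths are placeholders and it assumes both files are plain CSV, the second one gzip-compressed:

```python
# Quick schema check of the preprocessed input; file paths are placeholders.
import pandas as pd

df = pd.read_csv("/path/to/DatiTelecomPreProcessed.csv")
assert {"iD", "lat", "lon", "time"}.issubset(df.columns), df.columns
print(df.head())

# The gzipped variant can be inspected the same way, assuming it is a compressed CSV.
raw = pd.read_csv("/path/to/DatiTelecomToPreprocess.gzip", compression="gzip")
print(raw.columns)
```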


# USAGE
I1:
Run the preliminary preprocessing of the data to build the format required for the city-pro analysis.

1. Produce `.pnt` and `.pro` files (follow the instructions in `$WORKSPACE/cartography-data`).
2. `cd $WORKSPACE/city-pro`
3. If `DatiTelecomPreProcessed.csv` already exists, do nothing; otherwise run
   `python3 ./python/mdt_converter.py`
   NOTE: insert the dates manually in `LIST_START` and `LIST_END` depending on the dates you have, and ensure that the file paths match the directory structure on your machine.

#### DESCRIPTION I/O mdt_converter.py
Input:
`/path/to/gzip/files` = [`../dir1`, ..., `../dirn`] (for those who have access, they are located in `/nas/homes/albertoamaduzzi/dati_mdt_bologna/`)
Output:
`/path/to/raw/files` = [`/home/aamad/codice/city-pro_old/work_geo/bologna_mdt_detailed/data`], files `file_raw_data1, ..., file_raw_datan`
Columns:
[`id_user,timestamp,lat,lon`]
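The real converter is `./python/mdt_converter.py`; the following is only a minimal sketch of the transformation it performs, assuming the raw dumps are gzipped CSVs. The raw column names (`user_id`, `ts`, `latitude`, `longitude`) are assumptions to be replaced with the actual Telecom field names.

```python
# mdt_converter sketch - illustrative only, not the project script.
from pathlib import Path
import pandas as pd

GZIP_DIR = Path("/path/to/gzip/files")   # placeholder
OUT_DIR = Path("/path/to/raw/files")     # placeholder
OUT_DIR.mkdir(parents=True, exist_ok=True)

for gz in sorted(GZIP_DIR.glob("*.gzip")):
    df = pd.read_csv(gz, compression="gzip")
    # Keep only the columns city-pro needs, renamed to the expected schema.
    df = df.rename(columns={"user_id": "id_user", "ts": "timestamp",
                            "latitude": "lat", "longitude": "lon"})
    df[["id_user", "timestamp", "lat", "lon"]].to_csv(
        OUT_DIR / f"{gz.stem}_raw.csv", index=False)
```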

## CPP
Launch everything together:
```
cd $WORKSPACE/city-pro
python3 ./python/SetRightDirectoriesConfiguration.py
./config/RunRimini.sh
```
Or build and run a single day:
```
./ccm/build.ps1 -UseVCPKG -DisableInteractive -DoNotUpdateTOOL -DoNotDeleteBuildFolder
./bin/city-pro ./work_geo/bologna_mdt_detailed/date/config_bologna.json
```
# Output:


## Network

1. **{basename}**_class_`i`_velocity_subnet.csv:
Description:
Contains information about the `velocity` and the travel time (`time_percorrence`) in the time intervals `[start_bin,end_bin]` for poly `poly_id` of the subnetwork of fcm class `i` (see the loading sketch after this list).
Columns:
`start_bin;end_bin;poly_id;number_people_poly;total_number_people;av_speed;time_percorrence`

2. **{basename}**...class_`i`.txt
Description:
Space-separated poly ids of the subnet of class `i`, e.g. `1 2 10 12 16 ...`

3. **{basename}**`i`class_subnet.txt
Description:
Space-separated poly ids of the subnet of class `i`, with the polys already belonging to faster subnets removed.
In this way we obtain a "hierarchy" of subnetworks: if a poly is contained in multiple subnetworks, it is assigned to the quickest one. This should help us detect traffic via the fundamental diagram.
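A minimal loading sketch for these semicolon-separated network outputs, assuming pandas; `basename` is a placeholder for the configured `cartout_basename`, and the exact file-name patterns should be checked against the list above.

```python
import pandas as pd

basename = "/path/to/output/bologna_mdt"   # placeholder for cartout_basename
i = 0                                      # fcm class index

# Per-poly, per-time-bin statistics of the class-i subnetwork.
subnet = pd.read_csv(f"{basename}_class_{i}_velocity_subnet.csv", sep=";")
print(subnet[["start_bin", "end_bin", "poly_id", "av_speed", "time_percorrence"]].head())

# Space-separated poly ids of the class-i subnetwork (file-name pattern assumed).
with open(f"{basename}_class_{i}.txt") as f:
    poly_ids = [int(p) for p in f.read().split()]
print(len(poly_ids), "polys in subnet", i)
```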

## Trajectories
1. **{basename}**_presence.csv
Description:
Contains information about all trajectories `id_act` that have just one `stop_point` for the time window `[timestart,timeend]` at `(lat,lon)`.
Columns:
`id_act;timestart;timeend;lat;lon`

2. **{basename}**_fcm_centers.csv
Description:
Contains the cluster centers in the feature space produced by the fuzzy clustering algorithm applied to the trajectories.
*NO COLUMN NAMES*:
`class`;`av_speed`;`vmin`;`vmax`;`sinuosity`
Rows are ordered by class from slowest (top) to quickest (bottom).

3. **{basename}**_fcm.csv
Description:
Contains, for each trajectory `id_act` active in the time window `[start_time,end_time]`: the length of the trajectory (`lenght`), its duration (`time`), average speed (`av_speed`), minimum and maximum recorded speed (`v_min`, `v_max`), number of points (`cnt`), the `class` assigned by the fuzzy clustering algorithm, and the probability `p` of belonging to that class. A loading sketch is shown after this list.
Columns:
`id_act;lenght;time;av_speed;v_max;v_min;cnt;av_accel;a_max;class;p;start_time;end_time`

4. **{basename}**fcm_new.csv
Description:
Contains the trajectory id `id_act` and the class it is reassigned to (`class`), following the principle that the class whose subnet contains the most points of the trajectory wins. So, if a person moves slowly but only on the quick subnet, they are reassigned to the quickest class. The columns `0,1,...` correspond to the hierarchical subnets.
Columns:
`id_act;class;0;1;2;3;4`

5. **{basename}**_out_features.csv
Description:
For each trajectory, contains the features used for the class assignment.
Columns:
`id_act;average_speed;v_max;v_min;sinuosity`
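A minimal sketch of how the trajectory outputs can be inspected on the Python side, again with placeholder paths and assuming pandas:

```python
import pandas as pd

basename = "/path/to/output/bologna_mdt"   # placeholder for cartout_basename

fcm = pd.read_csv(f"{basename}_fcm.csv", sep=";")

# Trajectories per class (class 10 = unclassified, membership below threshold_p).
print(fcm["class"].value_counts().sort_index())

# Mean speed per class, expected to increase from the slowest to the quickest class.
print(fcm.groupby("class")["av_speed"].mean().sort_values())
```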
# Structure Program
1. READING:
1a. Trajectory information:
Reads `.csv` files containing mobile-phone data with the following columns: `[iD,lat,lon,time]`.
NOTE: usually we need to create this dataset. TIM provides the data in a different format that we need to preprocess to obtain these columns. Further information about the required preprocessing is needed...
1b. Cartography information:
Creates a graph of the city from `cartography.pnt` and `cartography.pro`. Essentially these are used to create the objects declared in `carto.h`, whose attributes and functions are initialized in `carto.cpp`.
2. EXTRACT TRAJECTORIES:
### SUMMARY
This script, in particular, is able to:
1. generate trajectories from single records, discarding GPS errors by thresholding on the maximum velocity (a minimal sketch of this filter follows below);
2. associate the roads they pass by;
3. cluster them according to a fuzzy k-means;
4.
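A minimal sketch of the GPS-error filter mentioned in point 1, not the actual C++ implementation: it assumes the preprocessed CSV columns `[iD,lat,lon,time]` with `time` as a Unix timestamp in seconds, and reuses the `max_inst_speed` idea from the configuration.

```python
import numpy as np
import pandas as pd

MAX_INST_SPEED = 50.0  # m/s, mirrors the max_inst_speed config key

def haversine_m(lat1, lon1, lat2, lon2):
    # Great-circle distance in metres between points given in degrees.
    r = 6371000.0
    p1, p2 = np.radians(lat1), np.radians(lat2)
    dp, dl = p2 - p1, np.radians(lon2 - lon1)
    a = np.sin(dp / 2) ** 2 + np.cos(p1) * np.cos(p2) * np.sin(dl / 2) ** 2
    return 2 * r * np.arcsin(np.sqrt(a))

records = pd.read_csv("/path/to/DatiTelecomPreProcessed.csv")  # placeholder path
records = records.sort_values(["iD", "time"])

# Instantaneous speed with respect to the previous record of the same user.
prev = records.groupby("iD")[["lat", "lon", "time"]].shift()
dist = haversine_m(prev["lat"], prev["lon"], records["lat"], records["lon"])
speed = dist / (records["time"] - prev["time"]).clip(lower=1)

# Keep the first record of each user and every record below the speed threshold.
clean = records[prev["time"].isna() | (speed <= MAX_INST_SPEED)]
```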


Experiment source: I give it the name `source` so that I do not have to modify the makefile every time.
NOTE:
The scripts are designed to analyse one day at a time, and the folder structure reflects this: for every analysed day there is a folder in `work_geo/bologna_mdt_detailed` and one in `output/bologna_mdt_detailed`.

Description:
Telecom provides the initial dataset day by day, zipped and with many fields. I created `mdt_converter.py`, which essentially takes that dataset, extracts `[iD,lat,lon,time]` and saves it to a csv that is then fed to `analysis.cpp`.

## RUNNING SCRIPTS
I1
To produce `.pnt` and `.pro`:

```cd $WORKSPACE/cartography-data```

BOLOGNA MDT DATA ANALYSIS:
`cd $WORKSPACE/city-pro`

1. PREPROCESSING, PYTHON
Input:
`/path/to/gzip/files` = ([`G:/bologna_mdt/dir1`,...,`G:/bologna_mdt/dirn`]) or (`/nas/homes/albertoamaduzzi/dati_mdt_bologna/`)
Launch:
`python3 ./python/mdt_converter.py`
NOTE: insert the dates manually in `LIST_START`, `LIST_END` depending on the dates you have.
Output:
`/path/to/raw/files` = [`/home/aamad/codice/city-pro_old/work_geo/bologna_mdt_detailed/data`] [`file_raw_data1,...,file_raw_datan`] -> STRUCTURE: [`id_user,timestamp,lat,lon`]

2. PROCESSING, CPP (FOR EACH SINGLE DAY)
Input: `raw_file` in `raw_files`
Launch:
`./bin/city-pro ./work_geo/bologna_mdt_detailed/date/config_bologna.json` (see `./launch_all_analysis.sh`)
Output:
subnet file: `.fluxes.sub`
`(class_i)_complete_complement.csv`
`presence.csv`
`timed_fluxes.csv`
`stats.csv`
`fcm.csv`

3. POSTPROCESSING, PYTHON (FOR EACH SINGLE DAY)
##### COMMENT ON FUZZY ALGORITHM
It needs to be tuned: try different values of `num_tm` (3 or 4 for Bologna, depending on the day). Increasing the number of classes does not uncover slow mobility (walkers, bikers); instead it finds subgroups within the higher-velocity group.
This bias is probably due to the sensitivity of the algorithm to speed, which gives more weight to the separation of classes with higher velocity.
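For intuition, here is a minimal sketch of the standard fuzzy c-means membership formula and of how the `threshold_p` cut-off produces class `10`; the centers, features and fuzzifier `m` below are made-up placeholders, not the values used by the C++ code.

```python
import numpy as np

def fcm_membership(x, centers, m=2.0):
    # Membership of feature vector x to each center (values in [0,1], summing to 1),
    # using the standard FCM formula u_k = d_k^(-2/(m-1)) / sum_j d_j^(-2/(m-1)).
    d = np.linalg.norm(centers - x, axis=1)
    d = np.maximum(d, 1e-12)             # avoid division by zero on exact matches
    inv = d ** (-2.0 / (m - 1.0))
    return inv / inv.sum()

# Hypothetical centers in the feature space (av_speed, vmin, vmax, sinuosity),
# ordered from slowest to quickest as in {basename}_fcm_centers.csv.
centers = np.array([
    [1.5, 0.1, 3.0, 1.4],
    [5.0, 0.5, 12.0, 1.2],
    [13.0, 2.0, 30.0, 1.1],
])

traj_features = np.array([4.2, 0.3, 10.0, 1.25])
u = fcm_membership(traj_features, centers)

threshold_p = 0.5
cls = int(np.argmax(u)) if u.max() >= threshold_p else 10   # 10 = unclassified
print(u, cls)
```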
# DEPRECATED
3a. `./python/config_subnet_create.py` (README in the file)
Output:
`all_subnets.sh`
`work_geo/bologna_mdt_detailed/date/plot_subnet`

3b. `analysis.ipynb` (far from ideal)
The paths where the data are saved must be inserted manually in the first cell. After that, the remaining cells can be run one at a time.
## custom_style.mplstyle
```
# custom_style.mplstyle
figure.figsize: 10, 6
axes.titlesize: 14
axes.labelsize: 12
lines.linewidth: 2
lines.markersize: 6
font.size: 12
legend.fontsize: 10
grid.color: gray
grid.linestyle: --
grid.linewidth: 0.5
axes.grid: True
axes.grid.axis: both
```
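A minimal sketch of how the style sheet can be applied from a plotting script; the relative path is an assumption:

```python
import matplotlib.pyplot as plt

plt.style.use("./custom_style.mplstyle")   # path relative to the repo root (assumption)

fig, ax = plt.subplots()
ax.plot([0, 1, 2, 3], [0, 1, 4, 9], marker="o", label="example")
ax.set_title("Style check")
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.legend()
plt.show()
```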