diff --git a/README.md b/README.md
index e1ac01de..c32dc6fe 100644
--- a/README.md
+++ b/README.md
@@ -3,101 +3,175 @@
 It is split into a first C++ part, responsible for computing the quantities of interest about trajectories and the road network, and a second Python part, responsible for plotting and analyzing the computed quantities.
-## Input:
-REQUIRED:
-
+# Input:
+REQUIRED:
+
+ 1C config.json
 1I cartography.pnt, cartography.pro
- 1R DatiTelecom.csv (or equivalent)
+ 1D DatiTelecomPreProcessed.csv or DatiTelecomToPreprocess.gzip
 ## Description Input:
- 1I
-- (cartography.pnt,cartography.pro):
+#### 1C
+##### Configuration file
+
+- `file_pro`: `/path/to/cartography.pro`
+- `file_pnt`: `/path/to/cartography.pnt`
+- `file_data`: [`/path/to/DatiTelecomPreProcessed.csv`] NOTE: it is a list.
+- `cartout_basename`: `/path/to/save/output/dir`
+- `start_time`: `YY-MM-DD h:m:s`
+- `end_time`: `YY-MM-DD h:m:s`
+- `bin_time`: `15.0`
+- `lat_min,lat_max,lon_min,lon_max`: bounding-box vertices
+- `map_resolution`: `60`
+- `grid_resolution`: `150` (m), used by the search algorithms for points, polys, arcs, etc.
+- `l_gauss`: `10`
+- `min_data_distance`: `50` (m), threshold distance between a `record_base` and a `cluster_base.centroid` above which a new `cluster_base` object is created when filtering trajectories.
+- `max_inst_speed`: `50` (m/s), maximum instantaneous speed above which a `record` is considered an error and discarded.
+- `min_node_distance`: `10` (m), threshold below which two nodes are considered the same. (Not used here, but in other parts of the code base: cartography-data, miniedit)
+- `min_poly_distance`: `50` (m), threshold below which two polys are considered the same.
 (Not used here, but in other parts of the code base: cartography-data, miniedit)
+- `enable_threshold`: `true`
+- `threshold_v`: `50.0`
+- `threshold_t`: `86400`
+- `threshold_n`: `3`
+- `enable_multimodality`: `true` Enables the Fuzzy algorithm for the classification of homogeneous trajectories.
+- `enable_slow_classification`: `true` Used to split the slowest category, which usually does not separate walkers from bikers.
+- `num_tm`: `3` Number of classes you want to distinguish.
+- `threshold_p`: `0.5` Threshold on the probability of a trajectory belonging to a cluster. If it is less than 0.5, the trajectory is assigned to class `10` (unclassified).
+- `dump_dt`: `60`
+- `enable_fluxes_print`: `true` Enables output: **{basename}**`.fluxes`
+- `enable_subnet`: `true` Enables output: **{basename}**`.fluxes.sub`
+- `show_subnet`: `true`
+- `file_subnet`: `/path/to/subnet/{basename}.fluxes.sub`
+- `multimodality_subnet`: `true`
+- `num_tm_subnet`: `3`
+- `enable_print`: `true` For `_stats.csv` (deprecated)
+- `enable_geojson`: `false` Uses geojson
+- `enable_gui`: `true` Activates the GUI
+- `jump2subnet_analysis`: `false` Does not recalculate the subclasses but reads them for the construction of the subnetworks.
+
+#### 1I
+##### Cartography in physycom format
+- (`cartography.pnt`,`cartography.pro`):
 Contain all the information needed to build the road network, in a format the program can read directly.
-- cartography.pnt:
+- `cartography.pnt`:
 Contains information about where the points of the road are.
-- cartography.pro:
+- `cartography.pro`:
 Contains information about the links.
-
-    1. READING:
-    1a. Trajectory information:
-    Reads .csv files of mobile-phone data with the following columns:
-    [iD,lat,lon,time]
-    NOTE: Usually we need to create this dataset. TIM provides the data in another format that we have to preprocess to obtain these columns.
-    Further information about the preprocessing is needed...
-    1b.
-    Cartography information:
-    It creates a graph of the city from cartography.pnt and .pro. These are used to create the objects declared
-    in carto.h, whose attributes and functions are implemented in carto.cpp.
-    2. EXTRACT TRAJECTORIES:
-
-## Output:
-Trajectories Info:
+#### 1D
+##### Data
+`DatiTelecomToPreprocess.gzip` contains `[iD,lat,lon,time]`, and so does `DatiTelecomAlreadyPreprocessed.csv`.
-Road Infos:
-Disaggregated by fcm.
-- **{basename}**_class_i_velocity_subnet.csv
+# USAGE:
+ I1:
+ Run the preliminary preprocessing of the data to construct the format required by the city-pro analysis.
+# REQUIRED INPUT
+1. Produce .pnt .pro (follow the instructions in `$WORKSPACE/cartography-data`)
+2. `cd $WORKSPACE/city-pro`
+3. If `DatiTelecomPreProcessed.csv` exists:
+`Do nothing`
+else:
+`python3 ./python/mdt_converter.py`
+NOTE: insert the dates manually in LIST_START, LIST_END depending on the dates you have, and make sure the file directories match the structure on your machine.
-## Description Output:
------------------------------------------------------
-NOTE: There is a part of the output about the road network that contains
-1.
-..._class_i_velocity_subnet.csv (as many files as there are classes, usually 4-5)
-    start_bin;end_bin;poly_id;number_people_poly;total_number_people;av_speed;time_percorrence
-Description:
-    Information about the network, via poly_id, in time for subnet i.
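To make the column layout concrete, here is a minimal sketch of loading such a per-class subnet file with pandas. The inline sample rows are invented for illustration; only the semicolon-separated header follows the description above.

```python
import io
import pandas as pd

# Sketch: parse a *_class_i_velocity_subnet.csv-style file (semicolon-separated,
# columns as documented above) and compute the mean speed per poly.
sample = io.StringIO(
    "start_bin;end_bin;poly_id;number_people_poly;total_number_people;av_speed;time_percorrence\n"
    "0;900;12;3;40;8.5;110.0\n"
    "900;1800;12;5;42;7.9;118.0\n"
)
subnet = pd.read_csv(sample, sep=";")
mean_speed = subnet.groupby("poly_id")["av_speed"].mean()
print(round(mean_speed.loc[12], 2))  # → 8.2
```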
-    i belongs to {0,..,num_tm} and increases with the speed of traversal.
-2)...class_i.txt (as many files as there are classes, usually 4-5)
-Description:
-    "Space separated" poly ids of the subnet of class i
-3)
-...iclass_subnet.txt (as many files as there are classes, usually 4-5)
-Description:
+#### DESCRIPTION I/O mdt_converter.py
+Input:
+ `/path/to/gzip/files` = [`../dir1`,...,`../dirn`]; for those who have access, they are in (`/nas/homes/albertoamaduzzi/dati_mdt_bologna/`)
+Output:
+ `/path/to/raw/files` = [`/home/aamad/codice/city-pro_old/work_geo/bologna_mdt_detailed/data`] [`file_raw_data1,...,file_raw_datan`]
+ Columns:
+ [`id_user,timestamp,lat,lon`]
+
+## CPP
+Launch everything together:
+```
+cd $WORKSPACE/city-pro
+python3 ./python/SetRightDirectoriesConfiguration.py
+./config/RunRimini.sh
+```
+Or
+Input:
+```
+./ccm/build.ps1 -UseVCPKG -DisableInteractive -DoNotUpdateTOOL -DoNotDeleteBuildFolder
+
+./bin/city-pro ./work_geo/bologna_mdt_detailed/date/config_bologna.json
+```
+# Output:
+
+## Network
+
+1. **{basename}**_class_`i`_velocity_subnet.csv:
+Description:
+Contains information about the `velocity` and `time percorrence` in the time intervals `[start_bin,end_bin]` of poly `poly_id` of the subnetwork of fcm index `i`.
+Columns:
+ `start_bin;end_bin;poly_id;number_people_poly;total_number_people;av_speed;time_percorrence`
+
+2. **{basename}**...class_`i`.txt
+Description:
+ "Space separated" poly ids of the subnet of class `i`.
+ i.e. 1 2 10 12 16 ...
+
+3. **{basename}**`i`class_subnet.txt
+Description:
+ "Space separated" poly ids of the subnet of class i, freed from the polys of the higher-velocity networks. In this way we get a "hierarchy" of subnetworks: a poly contained in multiple subnetworks is assigned to the quickest one.
+ This should help us detect traffic via the fundamental diagram.
+## Trajectories
+1.
+**{basename}**_presence.csv
+ Description:
+ Contains information about all trajectories `id_act` that have just one `stop_point` in the time window `[timestart,timeend]`, located at `(lat,lon)`.
+ Columns:
+ `id_act;timestart;timeend;lat;lon`
-TRAJECTORY INFORMATIONS:
-
-1)
-..._presence.csv
-    id_act;timestart;timeend;lat;lon
+2. **{basename}**_fcm_centers.csv
 Description:
-    Self-explanatory
-2)
-..._fcm_centers.csv
-Description:
-    Contains information about the centers in the feature space.
-    They are ordered from slowest to quickest.
-3)
-..._fcm.csv
-    id_act;lenght;time;av_speed;v_max;v_min;cnt;av_accel;a_max;class;p
-Description:
-    identification, length of the trajectory, its duration, average speed, v_min, v_max, number of points in the trajectory, class, and probability of being in that class.
+ Contains information about the cluster centers in the feature space produced by the Fuzzy algorithm for the clustering of the trajectories.
+ *NO COLUMN NAMES*:
+ `class`;`av_speed`;`vmin`;`vmax`;`sinuosity`
+ Rows are ordered by class from slowest (top) to quickest (bottom).
-4)
-...fcm_new.csv
-    id_act;class;0;1;2;3;4
+3. **{basename}**_fcm.csv
 Description:
-    The id of the trajectory and the class it is reassigned to, following the principle that the class whose subnet contains most of the trajectory's points gives the class.
-    So, if a person is moving slowly on the quick subnet only, it is reassigned to the quickest class.
-    The columns 0,... are associated with the hierarchical subnets
-5)
+ Contains, for each trajectory `id_act` active in the time window `[start_time,end_time]`: its length `lenght`, duration `time`, average speed `av_speed`, minimum recorded speed `v_min`, maximum recorded speed `v_max`, number of points `cnt`, `class` (output of the Fuzzy clustering algorithm), and the probability `p` of belonging to that class.
+ Columns:
+ `id_act;lenght;time;av_speed;v_max;v_min;cnt;av_accel;a_max;class;p;start_time;end_time`
+
-..._out_features.csv
-    id_act;average_speed;v_max;v_min;sinuosity
+4. **{basename}**fcm_new.csv:
 Description:
+ Contains, for each trajectory `id_act`, the class it is reassigned to (`class`), following the principle that the class whose subnet contains the most points of the trajectory wins.
+ So, if a person is moving slowly on the quickest subnet only, the trajectory is reassigned to the quickest class.
+ The columns `0,1,...` are associated with the hierarchical subnets.
+Columns:
+`id_act;class;0;1;2;3;4`
+5. **{basename}**_out_features.csv
+Description:
 For each trajectory, the information about the features of the classes.
+Columns:
+ `id_act;average_speed;v_max;v_min;sinuosity`
+# Structure Program:
+    1. READING:
+    1a.
+    Trajectory information:
+    Reads .csv files of mobile-phone data with the following columns:
+    [iD,lat,lon,time]
+    NOTE: Usually we need to create this dataset. TIM provides the data in another format that we have to preprocess to obtain these columns.
+    Further information about the preprocessing is needed...
+    1b.
+    Cartography information:
+    It creates a graph of the city from cartography.pnt and .pro. These are used to create the objects declared
+    in carto.h, whose attributes and functions are implemented in carto.cpp.
+    2. EXTRACT TRAJECTORIES:
+### SUMMARY
 This script in particular is able to:
 1. generate trajectories from single records, discarding GPS errors by thresholding on the maximum velocity.
 2. associate to each trajectory the roads it passes through
@@ -105,49 +179,21 @@ Description:
 4.
-Experiment source; I give it the name "source" so that I do not have to
-modify the make every time.
 NOTE: The scripts are designed to analyze one day at a time.
 The folder structure reflects this.
 For each analyzed day there is a folder in work_geo/bologna_mdt_detailed and in output/bologna_mdt_detailed.
 Description:
 Telecom provides the initial dataset day by day, zipped and with many fields. I created mdt_converter.py, which essentially takes the dataset, extracts [iD,lat,lon,time] and saves it to a csv that is then fed to analysis.cpp.
-## RUNNING SCRIPTS:
-    I1
-To produce .pnt .pro:
-
-```cd $WORKSPACE/cartography-data```
-
-BOLOGNA MDT DATA ANALYSIS:
-    cd $WORKSPACE/city-pro
-1----------------------------------------------- PREPROCESSING PYTHON
-Input:
-    /path/to/gzip/files = ([G:/bologna_mdt/dir1,...,G:/bologna_mdt/dirn]) or (/nas/homes/albertoamaduzzi/dati_mdt_bologna/)
-Launch:
-    python3 ./python/mdt_converter.py
-NOTE: insert the dates manually in LIST_START, LIST_END depending on the dates you have.
-Output:
-    /path/to/raw/files = [/home/aamad/codice/city-pro_old/work_geo/bologna_mdt_detailed/data] [file_raw_data1,...,file_raw_datan]
-> STRUCTURE: [id_user,timestamp,lat,lon]
-
-2----------------------------------------------- PROCESSING CPP (FOR EACH) SINGLE DAY
-Input: raw_file in raw_files
-    ./bin/city-pro ./work_geo/bologna_mdt_detailed/date/config_bologna.json
----- ./launch_all_analysis.sh
-Output:
-    subnet file: .fluxes.sub
-    (class_i)_complete_complement.csv
-    presence.csv
-    timed_fluxes.csv
-    stats.csv
-    fcm.csv
-
-3------------------------------------------------ POSTPROCESSING PYTHON (FOR EACH) SINGLE DAY
+##### COMMENT ON THE FUZZY ALGORITHM:
+Needs tuning: try different values of `num_tm` (3 or 4 for Bologna, depending on the day). Increasing the number does not uncover the slow mobility (walkers, bikers); it rather finds subgroups within the higher-velocity group.
+This bias is probably due to the algorithm's sensitivity to speed, which gives more weight to the separation of the higher-velocity classes.
+# DEPRECATED
 3a.
 ./python/config_subnet_create.py (README in the file)
 Output:
 all_subnets.sh
 work_geo/bologna_mdt_detailed/date/plot_subnet
---------------------------------------------------
 3b.
 analysis.ipynb (far from ideal)
 The paths where the data are saved must be entered manually in the first cell.
 Once that is done, the other cells can be run. Then launch them cell by cell:
diff --git a/custom_style.mplstyle b/custom_style.mplstyle
new file mode 100644
index 00000000..9d33031f
--- /dev/null
+++ b/custom_style.mplstyle
@@ -0,0 +1,13 @@
+# custom_style.mplstyle
+figure.figsize: 10, 6
+axes.titlesize: 14
+axes.labelsize: 12
+lines.linewidth: 2
+lines.markersize: 6
+font.size: 12
+legend.fontsize: 10
+grid.color: gray
+grid.linestyle: --
+grid.linewidth: 0.5
+axes.grid: True
+axes.grid.axis: both
diff --git a/python/work_mdt/script/AnalysisMdt/AnalysisNetwork1Day.py b/python/work_mdt/script/AnalysisMdt/AnalysisNetwork1Day.py
index dc54f81b..bed96553 100644
--- a/python/work_mdt/script/AnalysisMdt/AnalysisNetwork1Day.py
+++ b/python/work_mdt/script/AnalysisMdt/AnalysisNetwork1Day.py
@@ -1,5 +1,8 @@
 '''
-    NOTE: Stats is Useless
+    NOTE: stats.csv is useless, but I keep it for reference.
+    NOTE: The organization of the script revolves around DailyNetworkStats.
+    This class contains all the information about trajectories and the network in one day.
+    The motivation is to simplify the analysis for multiple days.
 '''
 from collections import defaultdict
 import geopandas as gpd
@@ -9,6 +12,14 @@ from shapely.geometry import box
 import folium
 import datetime
+import matplotlib.pyplot as plt
+if os.path.isfile(os.path.join(os.environ["WORKSPACE"],"city-pro","custom_style.mplstyle")):
+    plt.style.use(os.path.join(os.environ["WORKSPACE"],"city-pro","custom_style.mplstyle"))
+else:
+    try:
+        import PlotSettings
+    except ImportError:
+        print("No plot settings file found")

 def NormalizeWidthForPlot(arr,min_width = 1, max_width = 10):
     '''
@@ -38,6 +49,16 @@ def km2m(x):

 def StrDate2DateFormatLocalProject(StrDate):
     return StrDate.split("_")[0],StrDate.split("_")[1],StrDate.split("_")[2]
+
+def Timestamp2Datetime(timestamp):
+    return datetime.datetime.fromtimestamp(timestamp)
+
+def Timestamp2Date(timestamp):
+    return datetime.datetime.fromtimestamp(timestamp).date()
+
+def Datetime2Timestamp(date_time):
+    # NOTE: argument renamed from `datetime` to avoid shadowing the module.
+    return date_time.timestamp()
+
 class DailyNetworkStats:
     '''
     This class contains the daily information about the network.
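The body of `NormalizeWidthForPlot` is elided in this hunk; a plausible re-implementation (an assumption, not the project's actual code) would linearly rescale an array of flux values into a line-width range for the map plots:

```python
import numpy as np

def normalize_width_for_plot(arr, min_width=1, max_width=10):
    # Hypothetical sketch: linearly rescale `arr` into [min_width, max_width]
    # so that fluxes can be drawn as proportional line widths.
    arr = np.asarray(arr, dtype=float)
    span = arr.max() - arr.min()
    if span == 0:
        # All values equal: fall back to the minimum width everywhere.
        return np.full_like(arr, min_width)
    return min_width + (arr - arr.min()) * (max_width - min_width) / span
```

For example, `normalize_width_for_plot([0, 5, 10])` maps the endpoints to 1 and 10 and the midpoint to 5.5.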
@@ -65,6 +86,8 @@ def __init__(self,config,StrDate): # FILES self.DictDirInput = {"fcm": os.path.join(self.InputBaseDir,self.BaseFileName + '_' + self.StrDate + '_' + self.StrDate + '_fcm.csv'), "fcm_centers": os.path.join(self.InputBaseDir,self.BaseFileName + '_' + self.StrDate + '_' + self.StrDate + '_fcm_centers.csv'), + "fcm_new":os.path.join(self.InputBaseDir,self.BaseFileName + '_' + self.StrDate + '_' + self.StrDate + '_fcm_new.csv'), + "stats":os.path.join(self.InputBaseDir,self.BaseFileName + '_' + self.StrDate + '_' + self.StrDate + '_stats.csv'), "timed_fluxes": os.path.join(self.InputBaseDir,self.BaseFileName+'_'+ self.StrDate+'_'+ self.StrDate + '_timed_fluxes.csv'), "fluxes": os.path.join(self.InputBaseDir,"weights",self.BaseFileName+'_'+ self.StrDate+'_'+ self.StrDate + '.fluxes'), "fluxes_sub": os.path.join(self.InputBaseDir,"weights",self.BaseFileName+'_'+ self.StrDate+'_'+ self.StrDate + '.fluxes.sub')} @@ -101,10 +124,12 @@ def __init__(self,config,StrDate): self.ReadFluxes = False self.ReadFluxesSub = False self.ReadFcm = False + self.ReadFcmNew = False self.ReadFcmCenters = False self.ReadGeojson = False self.ReadVelocitySubnet = False self.BoolStrClass2IntClass = False + self.ComputedMFD = False # SETTINGS INFO self.colors = ['red','blue','green','orange','purple','yellow','cyan','magenta','lime','pink','teal','lavender','brown','beige','maroon','mint','coral','navy','olive','grey'] self.Name = BaseName @@ -133,9 +158,8 @@ def __init__(self,config,StrDate): # FUNDAMENTAL DIAGRAM self.MFD = pd.DataFrame({"time":[],"population":[],"speed":[]}) self.Class2MFD = {class_:pd.DataFrame({"time":[],"population":[],"speed":[]}) for class_ in self.IntClass2StrClass.keys()} - - - + # MINIMUM VALUES FOR (velocity,population,length,time) for trajectories of the day + self.MinMaxPlot = defaultdict() # --------------- Read Files ---------------- # def ReadTimedFluxes(self): self.TimedFluxes = pd.read_csv(self.InputBaseDir["timed_fluxes"],delimiter = ';') @@ 
-143,12 +167,20 @@ def ReadTimedFluxes(self):

     def ReadFluxes(self):
         self.Fluxes = pd.read_csv(self.InputBaseDir["fluxes"],delimiter = ';')
-        self.ReadFluxes = True
-
+        self.ReadFluxes = True
     def ReadFcm(self):
         self.Fcm = pd.read_csv(self.InputBaseDir["fcm"],delimiter = ';')
         self.ReadFcm = True
+
+    def ReadStats(self):
+        # NOTE: the path comes from DictDirInput, where the "stats" key is defined.
+        self.Stats = pd.read_csv(self.DictDirInput["stats"],delimiter = ';')
+        self.ReadStats = True
+
+    def ReadFcmNew(self):
+        self.FcmNew = pd.read_csv(self.DictDirInput["fcm_new"],delimiter = ';')
+        self.ReadFcmNew = True
+
     def ReadFcmCenters(self):
         Features = {"class":[],"av_speed":[],"v_min":[],"v_max":[],"sinuosity":[]}
         FcmCenters = pd.read_csv(self.InputBaseDir["fcm_centers"],delimiter = ';')
@@ -161,6 +193,7 @@
             keyidx += 1
         self.FcmCenters = pd.DataFrame(Features)
         self.ReadFcmCenters = True
+
     def ReadFluxesSub(self,verbose = False):
         '''
         Input:
@@ -246,9 +279,15 @@
             self.VelTimePercorrenceClass[Class] = pd.read_csv(self.RoadInClass2VelocityDir[Class],delimiter = ';')
         self.ReadVelocitySubnet = True
+
+    def AddFcmNew2Fcm(self):
+        if self.ReadFcm and self.ReadFcmNew:
+            self.Fcm["class_new"] = self.FcmNew["class"]
+        if self.ReadStats and self.ReadFcmNew:
+            self.Stats["class_new"] = self.FcmNew["class"]
     ##--------------- Plot Network --------------##
     def PlotIncrementSubnetHTML(self):
         """
+        NOTE: Information about the subnet is taken from the subnet files.
         Description:
             Plots the subnetwork.
             Considers the case of intersections
         """
@@ -411,32 +450,33 @@ def PlotTimePercorrenceHTML(self):

 ## ------- FUNDAMENTAL DIAGRAM ------ ##
-def FilterStatsByClass(fcm,i,idx,stats):
-    fcm_idx = fcm[i].groupby('class').get_group(idx)['id_act'].to_numpy()
-    mask_idx = [True if x in fcm_idx else False for x in stats[i]['id_act'].to_numpy()]
-    f_idx = stats[i].loc[mask_idx]
-    f_idx = f_idx.sort_values(by = 'start_time')
-    return f_idx
-
-    def PlotMFD(self):
+    def ComputeMFDVariables(self):
+        '''
+        Description:
+            Computes the MFD variables (t,population,speed) -> and the hysteresis diagram:
+            1) Aggregated data for the day
+            2) Conditional to class
+            Saves them in two dictionaries:
+            1) self.MFD = {time:[],population:[],speed:[]}
+            2) self.Class2MFD = {Class:{"time":[],"population":[],"speed":[]}}
+        '''
+        if self.ReadFcm:
+            if "start_time" in self.Fcm.columns:
+                # ALL TOGETHER MFD
+                for t in range(int(self.iterations)):
+                    mask_idx = [True if (int(x['start_time']) > int(self.TimeStampDate)+t*self.dt and int(x['start_time']) < int(self.TimeStampDate)+(t+1)*self.dt)
+                                     or (int(x['end_time']) > int(self.TimeStampDate)+t*self.dt and int(x['end_time']) < int(self.TimeStampDate)+(t+1)*self.dt)
+                                else False for _,x in self.Fcm.iterrows()]
+                    self.MFD = pd.concat([self.MFD,
+                                          pd.DataFrame({"time":[int(self.TimeStampDate)+t*self.dt],
+                                                        "population":[sum(mask_idx)],
+                                                        "speed":[self.Fcm.loc[mask_idx,'av_speed'].mean()]})])
+                # CONDITIONAL TO CLASS
+                for Class in self.Class2MFD.keys():
+                    FcmClass = self.Fcm.groupby('class').get_group(Class)
+                    for t in range(int(self.iterations)):
+                        mask_idx = [True if (int(x['start_time']) > int(self.TimeStampDate)+t*self.dt and int(x['start_time']) < int(self.TimeStampDate)+(t+1)*self.dt)
+                                         or (int(x['end_time']) > int(self.TimeStampDate)+t*self.dt and int(x['end_time']) < int(self.TimeStampDate)+(t+1)*self.dt)
+                                    else False for _,x in FcmClass.iterrows()]
+                        self.Class2MFD[Class] = pd.concat([self.Class2MFD[Class],
+                                                           pd.DataFrame({"time":[int(self.TimeStampDate)+t*self.dt],
+                                                                         "population":[sum(mask_idx)],
+                                                                         "speed":[FcmClass.loc[mask_idx,'av_speed'].mean()]})])
+                self.ComputedMFD = True
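Stripped of the class bookkeeping, the binning in `ComputeMFDVariables` reduces to the following sketch: for each time bin, count the trajectories whose start or end falls inside the bin and average their speed. Column names follow the `_fcm.csv` description; the function and its signature are illustrative, not the project's API.

```python
import pandas as pd

def compute_mfd(fcm: pd.DataFrame, t_start: int, dt: int, iterations: int) -> pd.DataFrame:
    # One MFD point per time bin: population = trajectories active in the bin
    # (start or end inside it), speed = their mean av_speed.
    rows = []
    for t in range(iterations):
        lo, hi = t_start + t * dt, t_start + (t + 1) * dt
        in_bin = ((fcm["start_time"] > lo) & (fcm["start_time"] < hi)) | (
            (fcm["end_time"] > lo) & (fcm["end_time"] < hi)
        )
        rows.append({"time": lo,
                     "population": int(in_bin.sum()),
                     "speed": fcm.loc[in_bin, "av_speed"].mean()})
    return pd.DataFrame(rows)

# Tiny invented example: three trajectories, three 10-second bins.
fcm = pd.DataFrame({"start_time": [5, 15, 25],
                    "end_time": [12, 22, 28],
                    "av_speed": [10.0, 20.0, 30.0]})
mfd = compute_mfd(fcm, t_start=0, dt=10, iterations=3)
```

The per-class MFDs (`self.Class2MFD`) follow by first grouping `fcm` on `class` and applying the same binning to each group.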