Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Have a separate function for the feature extraction #1

Open
relleums opened this issue Mar 8, 2018 · 2 comments
Open

Have a separate function for the feature extraction #1

relleums opened this issue Mar 8, 2018 · 2 comments

Comments

@relleums
Copy link
Member

relleums commented Mar 8, 2018

I propose to use a separate function to extract features from the photon_stream.
At the moment you have if else blocks for different treatment of simulation and observation events inside the feature extraction. This way, one might end up to have multiple of such if else blocks all over the code which is error prone.
I propose to make a separate function which takes only the photon_stream e.g. in point-cloud representation, and then returns a dictionary of the extracted features. The extraction of feature must be independent of the event being a simulation or observation, so I would show and emphasize this in the code.
This function is then called on both simulations and observations.
I would restrict my self to have only one of such if else blocks in the very beginning.

Also when the feature extraction is a function of its own, unit-tests are easier to implement (loosely coupled).

@KevSed
Copy link
Collaborator

KevSed commented Apr 3, 2018

So you propose to have one function to generate features from both, simulation and data? Currently there is a difference between these two when calculating features. There are for example simulation_truth that are saved from the MC (e.g. for training) but, of course, not available in data and observation_info that are saved from the observed data files. So using one function requires if else blocks anyway, no? Except there were two separate functions, which then would make it messy for the "shared" features to make changes...
I definitely see the danger of using too many if else blocks, though.

@relleums
Copy link
Member Author

relleums commented Apr 4, 2018

On the one hand side there is the representation of events (both simulations and observations) in the photon-stream and on the other side there is our intent to perform a certain task on certain parts of the data within these events.
The air-shower-features are independent of the simulation-truth. To express this, I would not pass on the event-class into the function which extracts the air-shower-features, but only the part of the event which is needed to extract the air-shower-features. This might be the raw-photon-stream, but not the full event. The feature-extraction does not need to access the ID of the event or its simulation-truth.
Imagine you want to set-up a unit-test. For a unit-test it is easier to create a fake raw-photon-stream than to also fake the rest of the event. This is about having loose couplings between the functions.
I think, I just recommend to split your processing into small functions which indicate their purpose already from their input and output arguments.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants