Repository for deep convolutional neural networks (CNNs) that separate the cosmological signal from strong foreground contamination in 21-centimeter large-scale structure observations in the radio spectrum.
Read the full publication here: https://arxiv.org/abs/2010.15843
A browser-based tutorial is available via this Google Colab notebook.
Contents:
- `pca_processing`: HEALPix simulation data processing from `.fits` to `.npy` voxel format.
  - Cosmological and foreground simulations generated using the CRIME package
  - Principal Component Analysis Python script `pca_format.py` according to Alonso et al. (2014)
    - Ideally, `pca_script.py` should be run in parallel (each single-sky simulation takes about 3 minutes to process on a standard CPU node)
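The core idea of PCA foreground cleaning can be sketched as follows. This is a generic illustration of frequency-space PCA in the spirit of Alonso et al. (2014), not the repository's `pca_script.py`; the function name `pca_clean`, the `(n_freq, n_pix)` array layout and the number of removed components `n_fg` are assumptions for illustration. Foregrounds are spectrally smooth and highly correlated across frequency, so they dominate the leading eigenvectors of the frequency-frequency covariance; projecting those modes out leaves a residual dominated by the cosmological signal.

```python
import numpy as np


def pca_clean(sky, n_fg):
    """Remove the n_fg dominant frequency-space principal components.

    Hypothetical helper, not the repository's implementation.
    sky: array of shape (n_freq, n_pix), one row per frequency channel.
    Returns (cleaned, fg_estimate), both of shape (n_freq, n_pix);
    `cleaned` is the mean-subtracted residual after mode removal.
    """
    # Centre each frequency channel so the covariance captures fluctuations
    centred = sky - sky.mean(axis=1, keepdims=True)
    # Frequency-frequency covariance matrix, shape (n_freq, n_freq)
    cov = centred @ centred.T / centred.shape[1]
    # eigh returns eigenvalues in ascending order for symmetric matrices
    _, eigvecs = np.linalg.eigh(cov)
    # Keep the n_fg eigenvectors with the largest eigenvalues (foreground modes)
    modes = eigvecs[:, ::-1][:, :n_fg]
    # Project the centred data onto the foreground modes and subtract
    fg_estimate = modes @ (modes.T @ centred)
    cleaned = centred - fg_estimate
    return cleaned, fg_estimate
```

Removing more components suppresses foregrounds more aggressively, but it also removes any cosmological signal that lies along those spectral directions.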
- `UNet`: CNNs implemented in Keras:
  - `configs`: `.json` parent configuration file with the cleaning method and analysis parameters, to be edited for the user's directory
  - `data_utils`:
    - `dataloaders.py`: loads data and generates noisy simulations in batch-sized chunks for the network to train on
    - `my_callbacks.py`: varies the learning rate and computes custom metrics during training
  - `sim_info`:
    - frequency (`nuTable.txt`) and HEALPix window (`rearr_nsideN.npy`) indices for CRIME simulations
  - `train.py`: script for training the UNet model. Modify the Python dictionary input to set the appropriate number of training epochs
  - `run.sh`: sample Slurm-based shell script for training an ensemble of models in parallel
  - `hyperopt`: folder for hyperparameter tuning on a given dataset
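A generator that yields noisy mini-batches, in the spirit of what `data_utils/dataloaders.py` is described as doing, might look like the sketch below. The function `noisy_batches`, the additive Gaussian noise model and the `(inputs, targets)` array layout are hypothetical and are not taken from the repository.

```python
import numpy as np


def noisy_batches(inputs, targets, batch_size, noise_std, seed=None):
    """Yield (noisy_inputs, targets) mini-batches for one pass over the data.

    Hypothetical sketch. inputs and targets are arrays with the sample
    index on axis 0. Gaussian noise with standard deviation noise_std is
    drawn on the fly, so every pass sees a fresh noise realisation.
    """
    if len(inputs) != len(targets):
        raise ValueError("inputs and targets must have the same length")
    rng = np.random.default_rng(seed)
    # Shuffle the sample order once per pass
    order = rng.permutation(len(inputs))
    for start in range(0, len(order), batch_size):
        idx = order[start:start + batch_size]
        x = inputs[idx]
        # Add independent noise to this batch only; the stored data stay clean
        noisy = x + rng.normal(0.0, noise_std, size=x.shape)
        yield noisy, targets[idx]
```

Drawing the noise inside the generator, rather than storing noisy copies, lets each epoch see a different noise realisation without any extra disk space.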
Training Data Availability:
All 100 full-sky simulations used for this analysis are now publicly available on Globus under the folder `ska2`. Polarised foregrounds and another set of data are available under `ska_polarized` and `ska_sims`, respectively.
The training data used in the published UNet are located under the folder `ska`. Each of the 100 independently seeded simulations is located in a numbered folder. For instance, the data for simulation 42 are structured as:
- `sim_42`
  - `cosmo`
    - `cosmo_i.fits`
  - `fg`
    - `fg_i.fits`
Here, `i` indexes frequencies from 350 to 691 MHz. To feed the data into `pca_script.py`, modify the `configs/config.json` file to point to `ska2`.
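To locate the files for one simulation, the directory layout above can be walked with a small helper. The functions `sim_paths` and `available_channels` are hypothetical, and the sketch treats `i` as an opaque integer label because the exact numbering convention of the files (starting value, zero padding) is not stated here.

```python
from pathlib import Path


def sim_paths(root, sim_id, i):
    """Return (cosmo_path, fg_path) for frequency index i of one simulation.

    Hypothetical helper mirroring the layout
    sim_<id>/cosmo/cosmo_<i>.fits and sim_<id>/fg/fg_<i>.fits,
    where root is a local copy of the downloaded folder (e.g. ska2).
    """
    sim_dir = Path(root) / f"sim_{sim_id}"
    return (sim_dir / "cosmo" / f"cosmo_{i}.fits",
            sim_dir / "fg" / f"fg_{i}.fits")


def available_channels(root, sim_id):
    """Sorted frequency labels i that have both a cosmo and an fg file."""
    sim_dir = Path(root) / f"sim_{sim_id}"

    def labels(sub):
        # "cosmo_12.fits" has stem "cosmo_12"; keep the part after the prefix
        return {p.stem.split("_", 1)[1] for p in (sim_dir / sub).glob(f"{sub}_*.fits")}

    return sorted(labels("cosmo") & labels("fg"), key=int)
```

Each map can then be read with a HEALPix reader such as `healpy.read_map`; summing the `cosmo` and `fg` maps for the same channel gives the combined signal-plus-foreground sky.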