Fix the catalogue #57

Open
MatthewJA opened this issue May 24, 2016 · 16 comments
Comments

@MatthewJA
Collaborator Author

Well... I think I found the leak.

DEBUG:root:3059 hosts with no associated SWIRE object.

@MatthewJA
Collaborator Author

MatthewJA commented May 24, 2016

Flipping the coordinates along the y-axis seems to have fixed the issue. I'm not sure why this convention is different to the radio convention - if it isn't different, then the radio matching must be working by pure coincidence, which seems unlikely as the radio sky is not as dense as the IR sky.
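As a rough illustration of the flip described above (not the actual catalogue code), mirroring pixel coordinates about the horizontal midline could look like the following sketch. The function name `flip_y` and the `image_height` parameter are hypothetical, and it assumes continuous pixel coordinates with the origin at an image corner, so that y maps to `image_height - y`; with integer pixel indices the mapping would be `image_height - 1 - y` instead.

```python
def flip_y(points, image_height):
    """Mirror (x, y) pixel coordinates about the horizontal midline.

    Assumption: continuous coordinates with the origin at an image corner,
    so the flipped position is image_height - y. Names are illustrative only.
    """
    return [(x, image_height - y) for x, y in points]


# Hypothetical click positions in a 100-pixel-high image.
clicks = [(10.0, 20.0), (55.5, 99.0)]
flipped = flip_y(clicks, image_height=100.0)
```

Applying the flip twice returns the original coordinates, which is a cheap sanity check when it is unclear which convention a given file uses.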

I throw out ~2200 hosts, but (at least) hundreds are just duplicates (compared with ~60 duplicates before). The debug output went out of buffer, so I'll have to re-run to get better numbers. I've sent this to Julie, so we'll see if these are reasonable numbers soon.

I am now re-running the catalogue with PG-means.

@MatthewJA
Collaborator Author

MatthewJA commented May 24, 2016

Some numbers: PG-means finds 4433 consensuses, compared to KDE's 4400. This means PG-means throws out fewer classifications; I'm not sure what the implications are.

A note mainly for myself: I changed coordinate systems for PG-means consensus generation, so that may have an effect on how much I throw out.

@MatthewJA
Collaborator Author

PG-means throws out 1632 consensuses for not having an associated SWIRE component, bringing the total number of results down to ~1500 which seems too small. This is really strange! I'll stick with KDE for now but I think I could fix PG-means to work properly. It's entirely possible that GMM is a terrible idea for this data, so I might try k-means instead. PG-means is a wrapper, so it's trivial to change.

@jbanfield
Collaborator

Trying to merge the RGZ catalogue you sent with SWIRE.

Look at CI0074C1 and CI0074C2 in the rgz_radio_component file. There is only CI0074C1 in the 11JAN2014 ATLAS catalogue. What is CI0074C2? Plus it is not in the rgz_host file. Not sure how everything goes together.

@MatthewJA
Collaborator Author

I don't have the 11JAN2014 ATLAS catalogue; I have the 23JULY2015 ATLAS catalogue. This is the one on GitLab.

@MatthewJA
Collaborator Author

Just to make sure we're on the same page, I'm using the CDFS images from 11JAN2014, also on GitLab.

The sources/Zooniverse IDs in the rgz_host file only refer to an arbitrary subject that contains that line's RGZ/SWIRE object within a 1' radius. This is purely for reference, so these aren't relevant (and are actually ID_RGZ, not ID, in the CSV you sent me).
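For reference, a "within 1' radius" test on sky coordinates can be sketched with the haversine formula. This is only an illustration under assumptions, not the code used to build the catalogue: the function names are hypothetical, and the inputs are assumed to be right ascension and declination in decimal degrees.

```python
import math


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees between two (RA, Dec) positions.

    Inputs are assumed to be in decimal degrees. Uses the haversine formula.
    """
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    sin_ddec = math.sin((dec2 - dec1) / 2.0)
    sin_dra = math.sin((ra2 - ra1) / 2.0)
    a = sin_ddec ** 2 + math.cos(dec1) * math.cos(dec2) * sin_dra ** 2
    # Clamp to 1.0 to guard against rounding just above the asin domain.
    return math.degrees(2.0 * math.asin(min(1.0, math.sqrt(a))))


def within_radius(ra1, dec1, ra2, dec2, radius_arcmin=1.0):
    """True if the two positions lie within radius_arcmin of each other."""
    return angular_separation_deg(ra1, dec1, ra2, dec2) * 60.0 <= radius_arcmin
```

For example, two positions offset by half an arcminute in declination pass the test, while an offset of two arcminutes does not.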

@jbanfield
Collaborator

Yes, you should have the 23JULY2015 ATLAS catalogue and the 11JAN2014 images. The IDs differ between the two. The 23JULY2015 catalogue has one entry for every Gaussian fit to the radio image at a signal-to-noise ratio >= 5. If a Gaussian is labelled C1, C2, C3, etc., then more than one Gaussian was required to fit the radio structure at that location in the image. In such cases, the RGZ image was centred only on C1 to reduce the number of duplicates.
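To make the C1/C2/C3 labelling concrete, here is a small sketch of how component IDs could be grouped by the structure they belong to. It assumes, based only on the CI0074C1/CI0074C2 example in this thread, that a multi-component ID is a base name followed by `C` and a number; the helper names are hypothetical and the real catalogue may contain IDs that do not follow this pattern.

```python
import re
from collections import defaultdict

# Assumed pattern: <base>C<number>, e.g. "CI0074C2" -> ("CI0074", 2).
_COMPONENT_RE = re.compile(r"(.+)C(\d+)")


def split_component_id(component_id):
    """Split an ID into (base, component number), or (id, None) if unsuffixed."""
    match = _COMPONENT_RE.fullmatch(component_id)
    if match is None:
        return component_id, None
    return match.group(1), int(match.group(2))


def group_components(component_ids):
    """Group component IDs that share the same base name."""
    groups = defaultdict(list)
    for component_id in component_ids:
        base, _ = split_component_id(component_id)
        groups[base].append(component_id)
    return dict(groups)
```

With this grouping, `group_components(["CI0074C1", "CI0074C2"])` puts both components under the base `CI0074`, which is the structure an RGZ image centred on C1 would cover.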

I was using the 11JAN2014 catalogue to make the bookkeeping file.

@jbanfield
Collaborator

My question is: how do you record the match in the catalogue if more than one radio component makes up the consensus? For example, the tutorial image, where the consensus is only the double radio source in the centre of the image.

@MatthewJA
Collaborator Author

> I was using the 11JAN2014 catalogue to make the bookkeeping file.

That might explain why some of the radio components I had expected were missing.

> My question is: how do you record the match in the catalogue if more than one radio component makes up the consensus? For example, the tutorial image, where the consensus is only the double radio source in the centre of the image.

Two rows with the same RGZ name but different component names. The advantage of this is that it means all rows are the same length.

RGZ names should be unique in rgz_hosts.csv and component names should be unique in rgz_radio_components.csv.
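A quick way to check those two uniqueness rules is sketched below. The column names `rgz_name` and `radio_component` and the example values are placeholders chosen for illustration (the real CSV headers may differ); the rows are plain dictionaries, as produced by `csv.DictReader`.

```python
from collections import Counter


def duplicated_values(rows, column):
    """Return the sorted values of `column` that occur in more than one row."""
    counts = Counter(row[column] for row in rows)
    return sorted(value for value, n in counts.items() if n > 1)


# Placeholder rows: one RGZ object made up of two radio components.
component_rows = [
    {"rgz_name": "RGZ_EXAMPLE_A", "radio_component": "CI0074C1"},
    {"rgz_name": "RGZ_EXAMPLE_A", "radio_component": "CI0074C2"},
]
host_rows = [{"rgz_name": "RGZ_EXAMPLE_A"}]

# Component names must not repeat in the components file...
assert duplicated_values(component_rows, "radio_component") == []
# ...and RGZ names must not repeat in the hosts file.
assert duplicated_values(host_rows, "rgz_name") == []
```

Note that the RGZ name is expected to repeat in the components file, once per component, so only the component column is checked there.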

@jbanfield
Collaborator

What happens to the radio subjects with no infrared host?

@MatthewJA
Collaborator Author

They're skipped. If a radio subject doesn't appear in the catalogue, then either there was no nearby SWIRE object or "No IR Source" was selected as the majority by volunteers.

@jbanfield
Collaborator

I'm working on creating the RGZ CDFS catalogue. 2 questions:

Where do the radio_component values from rgz_component_kde come from? - I'm thinking from MongoDB

Where do the source values come from for the rgz_host_kde? - I'm thinking from the 23JULY2015 catalogue

Is this correct?

@MatthewJA
Collaborator Author

Other way around — the component IDs are from the 23JULY2015 catalogue. The zooniverse_id and source columns in the hosts file should be ignored.

@jbanfield
Collaborator

Thanks!

Do you have a list of the RGZ subjects with no consensus or no SWIRE id?

@MatthewJA
Collaborator Author

Hmm, I don't. I could probably generate one but I'll have to regenerate the catalogue (which shouldn't be different).
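Short of regenerating the catalogue, one possible approximation is a set difference between the IDs that went in and the IDs that came out. The sketch below assumes both sides can be expressed as comparable ID strings; the function name and the example IDs are hypothetical. It cannot tell apart the two reasons a subject is skipped (no nearby SWIRE object versus a "No IR Source" majority).

```python
def missing_ids(all_ids, catalogued_ids):
    """Return the sorted IDs present in all_ids but absent from catalogued_ids."""
    return sorted(set(all_ids) - set(catalogued_ids))


# Placeholder IDs for illustration only.
input_ids = ["SUBJ_A", "SUBJ_B", "SUBJ_C"]
output_ids = ["SUBJ_A", "SUBJ_C"]
skipped = missing_ids(input_ids, output_ids)
```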


2 participants