Open Data Access #1

Open
jenna-tomkinson opened this issue Sep 15, 2022 · 5 comments
Labels
notes Information regarding processes for project

Comments

@jenna-tomkinson
Member

@gwaybio For the CFReT data, do we know whether anything binds the data and prevents it from being made public?

If not, there are two options for where to put this data. Currently, before any removal of images, it is a little over 2GB with 990 images. We can:

  1. Add it to DVC:
  • I do not see this as viable, since the documentation on their website does not list Dropbox support (see https://dvc.org/doc/command-reference/remote)
  • There is a GitHub issue requesting Dropbox support. It is marked as resolved, but I am not confident that it was fully addressed, since the issue itself states that there isn't much demand for it.
  2. Add to Figshare:
  • This seems like the better option but I am not sure about the process so I have a few questions:
    1. Is there a Way Lab figshare login or do I create my own?
    2. Is it like GitHub where there is a Way Science organization and then the datasets within it?

Based on the answers to my questions, we can then proceed in one direction.

jenna-tomkinson added the "notes" label (Information regarding processes for project) on Sep 15, 2022
@gwaybio
Member

gwaybio commented Sep 15, 2022

Thanks for these notes @jenna-tomkinson !

  1. Sounds good. Let's not use DVC for this then. We should keep in mind that our pipelines would love to have DVC support for either Dropbox or OneDrive.
  2. Let's go with the Figshare option.

> Is there a Way Lab figshare login or do I create my own?

Yes, please create your own account. Make sure to connect your ORCID.

> Is it like GitHub where there is a Way Science organization and then the datasets within it?

No, not to my knowledge. It would be good for you to double-check me, however, in case things have changed since I last looked. (It's possible there is a feature called "collections"; I am not sure.)

The procedure is to create a personal account, create a new dataset, fill in the metadata information (feel free to open a new issue to document these items), and then publish (mint a DOI).

Part of the metadata procedure will be to name authors of the dataset. We should discuss authors prior to publishing the dataset.
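As a rough illustration of the pre-publication steps described above (fill in the metadata, agree on authors, then publish), here is a minimal Python sketch of a checklist. Everything in it is an assumption made for illustration: the `DatasetMetadata` class, its field names, and the `blocking_items` helper are hypothetical and do not reflect Figshare's actual metadata schema or API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DatasetMetadata:
    """Hypothetical pre-publication record (field names are illustrative only)."""
    title: str = ""
    description: str = ""
    authors: List[str] = field(default_factory=list)
    license: str = ""
    authors_agreed: bool = False  # flip to True once the author list has been discussed


def blocking_items(meta: DatasetMetadata) -> List[str]:
    """Return the items that still need attention before publishing."""
    problems = []
    # Required free-text fields must be non-empty.
    for name in ("title", "description", "license"):
        if not getattr(meta, name).strip():
            problems.append(f"missing {name}")
    # Authors must be listed and the list must have been agreed on.
    if not meta.authors:
        problems.append("no authors listed")
    elif not meta.authors_agreed:
        problems.append("author list not yet agreed")
    return problems


draft = DatasetMetadata(
    title="CFReT images",  # placeholder title
    authors=["First Author", "Second Author"],  # placeholder names
)
print(blocking_items(draft))
```

An empty list from `blocking_items` would mean that nothing in this hypothetical checklist blocks publishing; the actual required fields should be taken from the Figshare submission form.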

> do we know if there is anything binding the data to prevent it from being public?

I believe that we are all set to publish the data publicly. However, I will double-check with Tim.

@arka2696

Hello @gwaybio and @jenna-tomkinson, I wanted to inquire if the CFReT data has been made publicly available yet, and if so, could you please guide me on how to access it? Thank you.

@gwaybio
Member

gwaybio commented Oct 28, 2024

Hi @arka2696 - thanks for reaching out! Our collaborators have elected to keep these data private until publication - we'll definitely be updating this repository once the data are available.

@arka2696

Hi @gwaybio – thank you for getting back to me! I understand that the data is private for now, and I completely respect that. I’m currently exploring different data analysis pipelines for image-based profiling across various use cases. While there are several papers using pycytominer, many of them involve large datasets or lack detailed explanations of the pipeline steps.

I was particularly impressed with the NF1 project on your GitHub; the systematic, step-by-step organization of the pipelines is incredibly helpful. Could you recommend any additional resources that provide clear insights into pycytominer and other downstream pipelines, especially for different types of image-based profiling experiments?

@gwaybio
Member

gwaybio commented Oct 28, 2024

Thanks @arka2696 ! I recommend that you take a look at our recent preprint: https://arxiv.org/pdf/2311.13417

The Code/data availability and tutorials section gives some pointers. We also try to show explicit examples in the official documentation: https://pycytominer.readthedocs.io/en/stable/ - feel free to let us know if you have any issues (and also feel free to file a pull request to improve the work!)
