Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Update ingestion config validation for ctf and alignment entities #262

Merged
merged 57 commits into from
Sep 27, 2024

Conversation

Bento007
Copy link
Contributor

@Bento007 Bento007 commented Sep 10, 2024

related to chanzuckerberg/cryoet-data-portal#997

Description

update existing ingest configs

  • added a migrate script to migrate existing ingest_configs to the new schema. This cause some reformatting of spaces and newlines of unrelated fields.
  • removed fields that were empty lists or empty mappings except for annotations

collection_metadata

  • move *.mdocs from rawtilts[].sources to collection_metadata[].sources[].source_multi_glob.list_globs

alignments

  • copytomogram[0].metadata.affine_transformation_matrix to alignment[].metadata.affine_transformation_matrix if it is not an identity matrix, and if alignment sources existed. affine_transformation_matrix was copied into each alignment[].source
  • add alignment[].metadata.format and set to appropriate value based on source formats.
    • move *.tlt, *.xf, *.com, and *.aln from rawtilts[].sources to collection_metadata[].sources[].source_multi_glob.list_globs

tomograms

  • remove tomogram[].metadata.affine_transformation_matrix if it is an identity matrix.
  • add tomogram[].metadata.dates from deposition[0].metadata.dates if it existed, else the epoch date was used for all dates. Here is a list of the configs a default date was used
  • 10011_draft.yaml
  • 10431.yaml
  • 10001.yaml
  • 10002.yaml
  • 10302.yaml
  • 10427.yaml
  • 10428.yam
  • 10429.yaml
  • 10430.yaml
  • 10432.yaml
  • 10434.yaml
  • 10435.yaml
  • 10437.yaml
  • 10438.yaml
  • add tomogram[].metadata.is_visualization_default with default set to true

Template.yaml

  • add alignments
    • alignment_type
    • offset
    • x_roation_offset
    • tilt_offset
    • affine_transformation_matrix
    • is_canonical
    • format
    • volume_dimesion
  • add annotations[].sources.*.is_portal_standard
  • add frames[].metadata
    • dose
    • defocus
    • astigmatism
    • astigmatic_angle
  • add tomograms[].metadata
    • is_visualization_default
    • is_portal_standard
    • cross_references
    • dates

Ingestion Config Schema

  • define alignment metadata and sources and add it to ingest_config_models.yaml
  • define frame metadata and add it to the ingest_config_models.yaml
  • add the alignment Container for linkML
  • add extended validation for alignment sources.
  • add extended validation for affine_transformation_matrix in alignment and remove it for tomograms

Testing

  • Generated the schema using make build-ingestion-config
  • verified the schemas using make validate-configs and make validate-configs-with-network

Note

If you have better ideas for the descriptions in the schema I'd be happy to use them.

Bento007 and others added 30 commits September 4, 2024 17:59
# Conflicts:
#	ingestion_tools/dataset_configs/template.yaml
#	schema/ingestion_config/v1.0.0/codegen/ingestion_config_models.py
- add enums for alignment types and format
…smith/997-ctf-alignment

# Conflicts:
#	ingestion_tools/dataset_configs/template.yaml
#	schema/core/v1.1.0/metadata.yaml
@@ -35,6 +36,65 @@ def has_no_sources(data: list[dict[str, Any]] | dict[str, Any]) -> bool:
return isinstance(data, dict) or not any(row.get("sources") for row in data)


def rawtilts_to_alignments(data: dict) -> None:
Copy link
Contributor

@jgadling jgadling Sep 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit for this function in general -- in Python, I think the return early pattern is best practice. There are several points in this function where we do a check for a value and then nest more logic inside that only gets executed in that case, and yet there's no else condition - we could have just returned from this function (or added a continue/break to our loop) if the condition wasn't met. The code here would be shorter and less deeply nested (thus: easier to read) if it were refactored in this way

Copy link
Contributor

@manasaV3 manasaV3 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚀

@@ -1,3 +1,3 @@
linkml
linkml==1.8.2 # upgrade blocked by https://github.com/chanzuckerberg/cryoet-data-portal-backend/issues/274
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for capturing this issue. :)

- dataset
- deposition
- run
- tomogram
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As tomogram cannot be the parent of alignment, this should be updated?

config['collection_metadata'] = [{"sources": [{"source_multi_glob": {"list_globs": []}}]}]
config["collection_metadata"][0]["sources"][0]["source_multi_glob"]["list_globs"].extend(list_globs)

def rawtilts_to_alignments(data: dict) -> None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this has been moved to ingestion_tools/scripts/transform_ingestion_configs.py. Do we still want this here?

import yaml


def rawtilts_to_collection_metadata(config: dict) -> None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this is the only method that hasn't been ported over to the script.

Copy link
Contributor

@jgadling jgadling left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved once these things are completed:

  • Add code from ingest_997.py to transform_ingestion_configs.py
  • Make sure tomograms can't be a parent of alignments in the ingestion configs

Bento007 added a commit that referenced this pull request Sep 26, 2024
- Add a docstring and description for the upgrade command
- Update transform_ingestion_configs to be extended to support upgrading an ingest config from version 0.0.0 and beyond.
- move schema migrations code to ingest_tools/schema_migration
- update gjensen_config.py to run with reorganization.
- related to #262
# Conflicts:
#	ingestion_tools/dataset_configs/10002.yaml
#	ingestion_tools/dataset_configs/10004.yaml
#	ingestion_tools/dataset_configs/10008.yaml
#	ingestion_tools/dataset_configs/10009.yaml
#	ingestion_tools/dataset_configs/10426.yaml
#	ingestion_tools/dataset_configs/10436.yaml
#	ingestion_tools/dataset_configs/10437.yaml
#	ingestion_tools/dataset_configs/10438.yaml
#	ingestion_tools/dataset_configs/10439.yaml
#	ingestion_tools/dataset_configs/deposition_10301.yaml
#	ingestion_tools/dataset_configs/deposition_10303.yaml
#	ingestion_tools/dataset_configs/deposition_10304.yaml
#	ingestion_tools/dataset_configs/deposition_10308.yaml
#	ingestion_tools/scripts/schema_migration/transform_ingestion_configs.py
@Bento007
Copy link
Contributor Author

follow up PR #292

@Bento007 Bento007 merged commit 6c52df8 into main Sep 27, 2024
6 of 7 checks passed
@Bento007 Bento007 deleted the tsmith/997-ctf-alignment branch September 27, 2024 16:28
@Bento007 Bento007 restored the tsmith/997-ctf-alignment branch September 27, 2024 17:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants