feat: Update ingestion config validation for ctf and alignment entities #262
# Conflicts: # ingestion_tools/dataset_configs/template.yaml # schema/ingestion_config/v1.0.0/codegen/ingestion_config_models.py
- add enums for alignment types and format
…smith/997-ctf-alignment # Conflicts: # ingestion_tools/dataset_configs/template.yaml # schema/core/v1.1.0/metadata.yaml
# Conflicts: # ingestion_tools/dataset_configs/10003.yaml # ingestion_tools/dataset_configs/10006.yaml # ingestion_tools/dataset_configs/10436.yaml # ingestion_tools/dataset_configs/10439.yaml
@@ -35,6 +36,65 @@ def has_no_sources(data: list[dict[str, Any]] | dict[str, Any]) -> bool:
    return isinstance(data, dict) or not any(row.get("sources") for row in data)


def rawtilts_to_alignments(data: dict) -> None:
Nit for this function in general: in Python, I think the return-early pattern is best practice. There are several points in this function where we check for a value and then nest more logic inside that only gets executed in that case, and yet there's no else condition. We could have just returned from the function (or added a continue/break to our loop) if the condition wasn't met. The code here would be shorter and less deeply nested (and thus easier to read) if it were refactored this way.
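To make the suggestion concrete, here is a minimal sketch of the same logic written both ways. The function names, the `rawtilts`/`sources`/`glob` keys, and the suffix list are hypothetical stand-ins for illustration, not the actual body of rawtilts_to_alignments:

```python
from typing import Any

# Assumed for illustration only; not taken from the real function.
ALIGNMENT_SUFFIXES = (".xf", ".tlt", ".com", ".aln")


def collect_nested(data: dict[str, Any]) -> list[str]:
    """Nested style: every check wraps the next one."""
    found: list[str] = []
    if "rawtilts" in data:
        for entry in data["rawtilts"]:
            if entry.get("sources"):
                for source in entry["sources"]:
                    glob = source.get("glob")
                    if glob:
                        if glob.endswith(ALIGNMENT_SUFFIXES):
                            found.append(glob)
    return found


def collect_early_return(data: dict[str, Any]) -> list[str]:
    """Same behavior, using guard clauses (return/continue) instead of nesting."""
    found: list[str] = []
    if "rawtilts" not in data:
        return found
    for entry in data["rawtilts"]:
        for source in entry.get("sources") or []:
            glob = source.get("glob")
            if not glob or not glob.endswith(ALIGNMENT_SUFFIXES):
                continue
            found.append(glob)
    return found
```

Both versions return the same list; the second simply exits or skips as soon as a condition fails, so the main path stays at a shallow indentation level.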
🚀
@@ -1,3 +1,3 @@
linkml
linkml==1.8.2 # upgrade blocked by https://github.com/chanzuckerberg/cryoet-data-portal-backend/issues/274
Thanks for capturing this issue. :)
- dataset
- deposition
- run
- tomogram
Since a tomogram cannot be the parent of an alignment, should this be updated?
config['collection_metadata'] = [{"sources": [{"source_multi_glob": {"list_globs": []}}]}]
config["collection_metadata"][0]["sources"][0]["source_multi_glob"]["list_globs"].extend(list_globs)


def rawtilts_to_alignments(data: dict) -> None:
I believe this has been moved to ingestion_tools/scripts/transform_ingestion_configs.py. Do we still want this here?
import yaml


def rawtilts_to_collection_metadata(config: dict) -> None:
I believe this is the only method that hasn't been ported over to the script.
Approved once these things are completed:
- Add code from ingest_997.py to transform_ingestion_configs.py
- Make sure tomograms can't be a parent of alignments in the ingestion configs

- Add a docstring and description for the upgrade command
- Update transform_ingestion_configs to be extended to support upgrading an ingest config from version 0.0.0 and beyond.
- Move schema migrations code to ingest_tools/schema_migration
- Update gjensen_config.py to run with reorganization.
- Related to #262
# Conflicts: # ingestion_tools/dataset_configs/10002.yaml # ingestion_tools/dataset_configs/10004.yaml # ingestion_tools/dataset_configs/10008.yaml # ingestion_tools/dataset_configs/10009.yaml # ingestion_tools/dataset_configs/10426.yaml # ingestion_tools/dataset_configs/10436.yaml # ingestion_tools/dataset_configs/10437.yaml # ingestion_tools/dataset_configs/10438.yaml # ingestion_tools/dataset_configs/10439.yaml # ingestion_tools/dataset_configs/deposition_10301.yaml # ingestion_tools/dataset_configs/deposition_10303.yaml # ingestion_tools/dataset_configs/deposition_10304.yaml # ingestion_tools/dataset_configs/deposition_10308.yaml # ingestion_tools/scripts/schema_migration/transform_ingestion_configs.py
Follow-up PR: #292
related to chanzuckerberg/cryoet-data-portal#997
Description
Update existing ingest configs
- annotations
- collection_metadata
  - Moved *.mdocs from rawtilts[].sources to collection_metadata[].sources[].source_multi_glob.list_globs
- alignments
  - Moved tomogram[0].metadata.affine_transformation_matrix to alignment[].metadata.affine_transformation_matrix if it is not an identity matrix and if alignment sources existed. affine_transformation_matrix was copied into each alignment[].source
  - Added alignment[].metadata.format and set it to the appropriate value based on source formats.
  - Moved *.tlt, *.xf, *.com, and *.aln from rawtilts[].sources to collection_metadata[].sources[].source_multi_glob.list_globs
- tomograms
  - Removed tomogram[].metadata.affine_transformation_matrix if it is an identity matrix.
  - Added tomogram[].metadata.dates from deposition[0].metadata.dates if it existed; otherwise the epoch date was used for all dates. Configs for which a default date was used: 10431.yaml, 10001.yaml, 10002.yaml, 10302.yaml, 10427.yaml, 10428.yaml, 10429.yaml, 10430.yaml, 10432.yaml, 10434.yaml, 10435.yaml, 10437.yaml, 10438.yaml
  - Added tomogram[].metadata.is_visualization_default with default set to true
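The alignment and tomogram changes above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in transform_ingestion_configs.py: the function name, the top-level keys (`tomograms`, `alignments`, `depositions`), the date field names, and the decision to leave a non-identity matrix on the tomogram when no alignments exist are all hypothetical.

```python
from copy import deepcopy
from typing import Any

MATRIX_KEY = "affine_transformation_matrix"
# Assumed date field names; the epoch date is the fallback described above.
EPOCH_DATES = {
    "deposition_date": "1970-01-01",
    "release_date": "1970-01-01",
    "last_modified_date": "1970-01-01",
}


def is_identity(matrix: list[list[float]]) -> bool:
    """True if the matrix is square with ones on the diagonal and zeros elsewhere."""
    size = len(matrix)
    return all(
        len(row) == size
        and all(float(v) == (1.0 if i == j else 0.0) for j, v in enumerate(row))
        for i, row in enumerate(matrix)
    )


def upgrade_tomograms_and_alignments(config: dict[str, Any]) -> dict[str, Any]:
    """Return an upgraded copy of a (hypothetically shaped) ingestion config."""
    config = deepcopy(config)
    tomograms = config.get("tomograms") or []
    alignments = config.get("alignments") or []
    depositions = config.get("depositions") or []

    dep_dates = None
    if depositions:
        dep_dates = (depositions[0].get("metadata") or {}).get("dates")

    # Move a non-identity matrix from the first tomogram into every alignment.
    if tomograms and alignments:
        first_meta = tomograms[0].setdefault("metadata", {})
        matrix = first_meta.get(MATRIX_KEY)
        if matrix is not None and not is_identity(matrix):
            for alignment in alignments:
                alignment.setdefault("metadata", {})[MATRIX_KEY] = deepcopy(matrix)
            del first_meta[MATRIX_KEY]

    for tomogram in tomograms:
        metadata = tomogram.setdefault("metadata", {})
        # Drop identity matrices from tomograms.
        matrix = metadata.get(MATRIX_KEY)
        if matrix is not None and is_identity(matrix):
            del metadata[MATRIX_KEY]
        # Copy deposition dates if present, else fall back to the epoch date.
        metadata["dates"] = deepcopy(dep_dates) if dep_dates else dict(EPOCH_DATES)
        # Only set the flag when the config does not already define it.
        metadata.setdefault("is_visualization_default", True)
    return config
```

The function works on a deep copy, so the input config is left unchanged and the result can be compared against the original before anything is written back.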
Template.yaml
- alignments
- annotations[].sources.*.is_portal_standard
- frames[].metadata
- tomograms[].metadata
- is_visualization_default
- is_portal_standard
- cross_references
- dates
Ingestion Config Schema
- Added affine_transformation_matrix in alignment and removed it for tomograms
Testing
- make build-ingestion-config
- make validate-configs and make validate-configs-with-network
Note
If you have better ideas for the descriptions in the schema I'd be happy to use them.