-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Let file sources choose a path for uploaded files #19154
base: dev
Are you sure you want to change the base?
Conversation
4d4b9a6
to
bc3b95a
Compare
The method `write_from()` of `SingleFileSource` and `BaseFilesSource` reads a local file from `native_path` and saves it to `target_path` on a file source. This commit allows the service backing the file source to choose which will be the path of the saved file, meaning that `target_path` and the actual path where the file can be recovered later do not have to match. The latter is the return value of `write_from()`. Therefore, all usages of `write_from()` have also been refactored to consider the paths chosen by the file source's backing service. In addition, when exporting a history, the URI that the service backing the file source assigns to it will be saved to the history export result metadata object.
bc3b95a
to
53012bf
Compare
I've read through the attached issue and I am really nervous about this approach. It seems like there is a mismatch because the abstractions are pretty different. It seems like a augmenting the model import/export process would make more sense than this lower level abstraction. Do have the details of the plugin worked through that prompted this change? David's comment on that issue seems to vaguely point in this direction as well. |
Sorry for not answering earlier, I was out of office for most of the week. Indeed there is a very large mismatch between the abstractions from Galaxy and eLabFTW. Galaxy works with a unique URI that is fixed before uploading to the file source, whose basename is chosen by the user; eLabFTW works with a pair of autoincrementing identifiers that are fixed after creating the experiment or resource or uploading the file and a string enum with two choices (i.e. About what @davelopez commented on the issue
You can create entities via eLabFTW, and you can have as many files attached to them as you wish, therefore you do not necessarily need to create an entity to export a file, in fact most of the time you will be interested in exporting files to an existing entity, because they have a lot of metadata that should be filled through the eLabFTW interface.
I am not sure if you are referring to the design of the plugin or the code of the plugin. The latter is almost ready here, although I do not think it makes sense to have a look yet (let's tackle problems step by step, there is another large problem/mismatch to discuss later). What prompts the change when working the plugin through is the moment one has to choose a valid mapping between eLabFTW entity type, entity id and upload id and Galaxy URI that respects their properties:
In other words, the simple part of the problem is "Construct an injective function from the set of nonnegative integer 2-tuples to the set of strings complying with the definition of a path according to RFC 3986". The impossible part of the problem is that Galaxy needs to apply the inverse function without having anything to apply it to! Anyway, @davelopez wants to have closer look at this together with me to see if the change from this PR can be avoided. If a better solution can be found then fine, otherwise augmenting the model import/export process would make sense, and I do not mind putting the effort. Let's first have this closer look and then we may continue the discussion. |
Change the implementation so that the definitions of `_write_from` in classes that inherit from `BaseFilesSource` do not need to change.
Define a new class `FileSourceModelExportStore` that abstracts the common details of `BcoModelExportStore`, `ROCrateArchiveModelExportStore`, `TarModelExportStore` and `BagArchiveModelExportStore`. This new class manages exports to file sources, from where data can be retrieved later on via a URI. It takes the responsibility of creating a temporary directory to set up the file to export and uploading it to the file source.
lib/galaxy/managers/model_stores.py
Outdated
export_metadata = self.set_history_export_request_metadata(request) | ||
|
||
exception_exporting_history: Optional[Exception] = None | ||
try: | ||
with model.store.get_export_store_factory( | ||
export_store = model.store.get_export_store_factory( | ||
self._app, model_store_format, export_files=export_files, user_context=user_context | ||
)(target_uri) as export_store: | ||
)(target_uri) | ||
with export_store: | ||
history = self._history_manager.by_id(request.history_id) | ||
export_store.export_history( | ||
history, include_hidden=request.include_hidden, include_deleted=request.include_deleted | ||
) | ||
self.set_history_export_result_metadata(request.export_association_id, export_metadata, success=True) | ||
request.target_uri = str(export_store.file_source_uri) or request.target_uri | ||
except Exception as e: | ||
exception_exporting_history = e | ||
raise | ||
finally: | ||
export_metadata = self.set_history_export_request_metadata(request) | ||
self.set_history_export_result_metadata( | ||
request.export_association_id, export_metadata, success=False, error=str(e) | ||
request.export_association_id, | ||
export_metadata, | ||
success=not bool(exception_exporting_history), | ||
error=str(exception_exporting_history) if exception_exporting_history else None, | ||
) | ||
raise |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Before applying the set of changes from 53012bf, the method ModelStoreManager.set_history_export_request_metadata()
instantiates a ExportObjectMetadata
Pydantic model and dumps it to the database in the form of JSON as the field StoreExportAssociation.export_metadata
. After the export is complete, the method set_history_export_result_metadata()
takes the same instance of ExportObjectMetadata
, instantiates a ExportObjectResultMetadata
Pydantic model, sets it as the result_data
of the ExportObjectMetadata
instance, and then saves the ExportObjectMetadata
Pydantic model in the form of JSON to the database again.
After applying the set of changes, the call to ModelStoreManager.set_history_export_request_metadata()
is delayed until the file has already been saved to the file source, as the actual URI that the file source assigns to the file is otherwise unknown.
The URI assigned by the file source overwrites the original target URI in the request. This involves a slight deviation from the previous behavior: if for example, power gets cut at the right time, StoreExportAssociation.export_metadata
may not exist despite the history having been already saved to the file source, because database writes happen within the finally:
block.
Moreover, overwriting the original target URI from the request is formally wrong, because the actual URI assigned by the file source should be part of the export result metadata, as it becomes known when the export completes. However, that implies modifying the other parts of the codebase that reference the URI from the request.
Despite the slight deviation in behavior and the formal incorrectness, rather than jumping straight into fixing these issues, I think it makes sense to leave the chance for discussion open, as doing things this way may still be an interesting tradeoff. Let me know what you think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking...
After applying the set of changes, the call to ModelStoreManager.set_history_export_request_metadata() is delayed until the file has already been saved to the file source, as the actual URI that the file source assigns to the file is otherwise unknown.
If we merge the PR as it is, then we'd never see an export that is in progress in the list from the UI. If the file is large, that alone would justify attempting to fix
overwriting the original target URI from the request is formally wrong, because the actual URI assigned by the file source should be part of the export result metadata, as it becomes known when the export completes
right? I guess it makes sense to make an attempt.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
7a9a52a scratches this itch
@jmchilton After discussing with @davelopez, it seems that it is hard to avoid that |
I appreciate you taking the time to at least simplify the implementation classes with that change being applied in the base class. I'm still very uneasy but I will live - especially if @davelopez is okay with this approach. |
Thanks for the trust @jmchilton 😄 🙏 I know it is somewhat of an API change and we are streching a bit the original implementation, but I think it is safe to assume the normal case would be returning the same |
27e9a14
to
5e5cd68
Compare
Write the actual URIs where files are saved to by file sources to a new `uri` field of the `ExportObjectResultMetadata` model. Restore behavior before commit 53012bf, namely setting the history export request metadata before attempting the export. --- Before applying the set of changes from 53012bf, the method `ModelStoreManager.set_history_export_request_metadata()` instantiates a `ExportObjectMetadata` Pydantic model and dumps it to the database in the form of JSON as the field `StoreExportAssociation.export_metadata`. After the export is complete, the method `set_history_export_result_metadata()` takes the same instance of `ExportObjectMetadata`, instantiates a `ExportObjectResultMetadata` Pydantic model, sets it as the `result_data` of the `ExportObjectMetadata` instance, and then saves the `ExportObjectMetadata` Pydantic model in the form of JSON to the database again. After applying the set of changes, the call to `ModelStoreManager.set_history_export_request_metadata()` is delayed until the file has already been saved to the file source, as the actual URI that the file source assigns to the file is otherwise unknown. The URI assigned by the file source overwrites the original target URI in the request. This involves a slight deviation from the previous behavior: if for example, power gets cut at the right time, `StoreExportAssociation.export_metadata` may not exist despite the history having been already saved to the file source, because database writes happen within the `finally:` block. Moreover, overwriting the original target URI from the request is formally wrong, because the actual URI assigned by the file source should be part of the export result metadata, as it becomes known when the export completes. This commit addresses these issues by adding a new field `uri` to the `ExportObjectResultMetadata` model, where the actual URI the file has been saved to by the file source is written. Backwards compatibility is ensured via `HistoriesService._get_export_record_data()`, which retrieves the new field when available and otherwise uses `ExportObjectRequestMetadata.payload.target_uri` (given that `ExportObjectRequestMetadata.payload` is of type `WriteStoreToPayload`).
5e5cd68
to
7a9a52a
Compare
The method
write_from()
ofSingleFileSource
andBaseFilesSource
reads a local file fromnative_path
and saves it totarget_path
on a file source.This PR enables the service backing the file source to choose which will be the path of the saved file, meaning that
target_path
and the actual path where the file can be recovered later do not have to match. The latter is the return value ofwrite_from()
.Therefore, all usages of
write_from()
have also been refactored to consider the paths chosen by the file source's backing service. In addition, when exporting a history, the URI that the service backing the file source assigns to it will be saved to the history export result metadata object (StoreExportAssociation.export_metadata
).The changes from this PR are meant to address a fundamental design mismatch between Galaxy and eLabFTW that impedes the smooth integration of the latter. Galaxy file sources are designed to save files to a
target_uri
which is set before the file is saved, partly by the user, who chooses the basename in the URI. On the other hand, files can be retrieved from eLabFTW via anentity_type
,entity_id
andupload_id
, and the latter two are set by eLabFTW. There is no good mapping betweentarget_uri
andentity_type
,entity_id
andupload_id
that can work without breaking the assumption that "saving a file on pathx
guarantees that it can be retrieved later usingx
", which is precisely what this PR does. The issue #18665 includes a more thorough explanation of this design mismatch.This is the first PR of a series of PRs that integrate eLabFTW with Galaxy via a file source (together they address issue #18665):
How to test the changes?
(Select all options that apply)
License