Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] refactor access to dataset through events #1918

Draft
wants to merge 2 commits into
base: dev
Choose a base branch
from

Conversation

superryeti
Copy link
Contributor

Related to #1894 , #1907

Context PR:
Since we created Events app, We now want to provide temporary access to the participants to the datasets that have been linked to event. We havenot implemented this access yet, because we wanted to first figure out the right way to do it.
In this PR, i started refactoring the has_access method from PublishedProjectManager and added a similar method has_access on the EventDataset.

Here is the explanation of the the current methods/functions

  1. PublishedProjects
    on physionet, access to PublishedProjects is determined by has_access method on the PublishedProjects model. this has_access method is used by views on project app which is used check if the current user should have access to this project.
    on healthdatanexus, we use PublishedProjects.objects.accessible_by to load projects, that the current user should have access to.

  2. Events
    On physionet, we haven't implemented logic that lets participants of event access a dataset temporarily.
    On healthdatanexus, the same PublishedProjects.objects.accessible_by is responsible to provide temporary access to the participants of events

On Event app, we also have EventDataset.has_access method which is identical to the PublishedProjects.has_access method in the sense that this method can be used to determine if the current user should have access to this particular event dataset. This method is used in template when a user visits an Event page to determine if they should have access to the Datasets linked to Event.( this only does checks in context of event)

This is my thoughts

  1. We could probably let EventDataset.has_access be as it is as if we were to remove this, we would still need some kind of similar logic to use on the Event detail page to determine if this particular event dataset should be accessible by the current user.
    This methods does checks such as if event is active, event dataset is active, if the current user is a participant or a host, and if the current user has signed the Event Agreement.

For controlling temporary access through event and Normal access
2. The EventDataset model has two access mechanism a) Google BigQuery(GBQ) b)Research Environment(RE) for physionet and healthdatanexus respectively. Based on this when a user has access to the project/dataset from Event, they should not be able to access the project files/content like usual.
For example in case of physionet, they should not see other access mechanism like wget, be able to see files and navigate folders, etc.

Which means we need to be able to differentiate if the user's access is temporary through Event or its a regular access. So we cant simply edit the PublishedProject.has_access method and add bunch of logic for event.

To my point/suggestions, maybe we could have refactor the access in the following way
a. PublishedProject.has_access_basic - we add a new method that checks for common stuff like if download is paused/depreciated, or on embargo
b. PublishedProject.has_access - uses the PublishedProject.has_access_basic and does the regular checks like it does now,
c. PublishedProject.has_access_temporary - uses the PublishedProject.has_access_basic and then check if the access is temporary(currently through Event)

on project view, we use both has_access and has_access_temporary and use these to determine what the users sees or how they access data on project template.
For example: If the user visits the project though a link like this http://localhost:8000/content/demowave/1.0.0, the current logic(from the published_project.html) executes and normal access check is done.

and if the user visits the project like http://localhost:8000/content/demowave/1.0.0?temporary_access=event-slug
then the has_access_temporary is used in the template to check if the user has access to the project and display/give them access to the dataset accordingly.

Now for healthdatanexus, we use the PublishedProjects.objects.accessible_by to check the access, so it already should be good. for HDN, we dont allow file download or direct data access, so its a bit simpler(although in future, we still need to be able to automatically close the research environment once the event is over, which i think we could do by adding logic on the hdn-research-environment app.

[WIP]
Finally the last thing remaining is to implement/enforce the access type GBQ and RE.(for example for physionet, if the dataset was linked to event as RE, the physionet participants should not be able to access the project/dataset)
I dont think it makes sense to hardcode the logic as in future physionet/ or other org might want to implement the Research Environment. Here maybe we can add a .env variable RESEARCH_ENVIRONMENT_ENABLED and use that.

So it would be like
by default RESEARCH_ENVIRONMENT_ENABLED=false and this will be used with the PublishedProject.has_access_temporary to implement the event related access.(which solves the problem for non RE access)

For RE access, maybe we can have a additional parameter on PublishedProjects.objects.accessible_by like filter_research_environment and use that to get datasets linked to event which have access_type=Research Environment

I apologize in advance if the above doesn't sound like a good idea or even worse(doesnot make sense at all :D)

Added an eventdatasetmanager just like publishedproject manager.
This is supposed to be used to get the list of dataset accessible to a user.
updated the logic in `PublishedProjectManager.accessible_by` to use the query filter from EventDatasetManager
@superryeti superryeti marked this pull request as draft March 7, 2023 01:05
@superryeti
Copy link
Contributor Author

Quick notes from @kshalot (sorry if i missed anything)

  1. Reason for creating manager for event dataset
    -- the current method(before the manager) uses EventDataset.has_access which kind of causes a lot of duplicate query to check access(like checking if event is active, if the current user is participant)

  2. Instead of DatasetManager, maybe we could refactor EventDataset.objects.accessible_by(user, project=None) to take a project argument. So that this can be used to see if the current user has access to this project/dataset

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant