Searches for uid on results of other searches fails when not looking for latest #746

Open
whs92 opened this issue Nov 17, 2022 · 0 comments

whs92 commented Nov 17, 2022

Expected Behavior

I would like to be able to run further searches and lookups on the results of a search against a Tiled-backed catalog. For example:

from tiled.client import from_profile
from databroker.queries import TimeRange

# Note: this profile connects directly to MongoDB, not through the Tiled server.
c = from_profile("xyz")
results = c.search(TimeRange(since="2022-04-10", until="2022-04-12"))

run = results[5]

Current Behavior

Currently, if there are multiple runs in the catalog c with the scan_id 5, then calling:

run = results[5]

will first search the whole catalog c for all entries with scan_id 5, then take the latest one (which might not lie in the time range we want), and only then check whether it lies in the time range specified in the original TimeRange search.

This is not the expected order of operations, and it leads to the error "KeyError: 'No match for scan_id=5'".
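To make the difference between the two orders concrete, here is a minimal sketch using a plain list of dicts as a stand-in for the catalog. The names `RUNS`, `latest_then_filter`, and `filter_then_latest` are hypothetical and only model the behaviour described above; this is not how Tiled or databroker implement the lookup.

```python
from datetime import datetime

# Toy stand-in for a catalog (hypothetical data, not databroker's data model).
RUNS = [
    {"uid": "a", "scan_id": 5, "time": datetime(2022, 4, 11)},
    {"uid": "b", "scan_id": 5, "time": datetime(2022, 6, 1)},  # later run reusing scan_id 5
]


def in_range(run, since, until):
    """Return True if the run started within [since, until)."""
    return since <= run["time"] < until


def latest_then_filter(runs, scan_id, since, until):
    """Order described in this issue: pick the latest match, then apply the time range."""
    matches = [r for r in runs if r["scan_id"] == scan_id]
    if not matches:
        raise KeyError(f"No match for scan_id={scan_id}")
    latest = max(matches, key=lambda r: r["time"])
    if not in_range(latest, since, until):
        raise KeyError(f"No match for scan_id={scan_id}")
    return latest


def filter_then_latest(runs, scan_id, since, until):
    """Expected order: apply the time range first, then pick the latest match."""
    matches = [
        r for r in runs
        if r["scan_id"] == scan_id and in_range(r, since, until)
    ]
    if not matches:
        raise KeyError(f"No match for scan_id={scan_id}")
    return max(matches, key=lambda r: r["time"])


since, until = datetime(2022, 4, 10), datetime(2022, 4, 12)
run = filter_then_latest(RUNS, 5, since, until)  # finds the April run
```

With the same inputs, `latest_then_filter` picks the June run first and then raises `KeyError`, which mirrors the failure reported here.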

Possible Solution

Naively, I can see that it's possible to achieve what I want by making the ScanID function return all of the entries that match the requested scan_ids (duplicates="all" in the wrapper below). That subset of the original catalog is then narrowed by any subsequent searches.

def ScanID(*scan_ids, duplicates="all"):
    # Wrap _ScanID to provide a nice usage for *one or more scan_ids*:
    # >>> ScanID(5)
    # >>> ScanID(5, 6, 7)
    # Placing a varargs parameter (*scan_ids) in the dataclass constructor
    # would cause trouble on the server side and generally feels "wrong"
    # so we have this wrapper function instead.
    return _ScanID(scan_ids=scan_ids, duplicates=duplicates)

I still don't understand why the original catalog, and not the results catalog, is searched, though, and this solution probably breaks other things.
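For illustration, the behaviour I would expect from a results catalog can be modelled with a toy class whose `search` returns a new catalog that remembers every filter applied so far, so that a later scan_id lookup only sees runs passing all of them. `ToyCatalog` and its methods are hypothetical names for this sketch, and a lambda stands in for a `TimeRange` query; none of this reflects the real Tiled or databroker internals.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass(frozen=True)
class ToyCatalog:
    """Hypothetical catalog that carries its accumulated search filters."""

    runs: Tuple[dict, ...]
    filters: Tuple[Callable[[dict], bool], ...] = ()

    def search(self, predicate):
        # Return a new catalog; earlier filters stay attached to the result.
        return ToyCatalog(self.runs, self.filters + (predicate,))

    def _visible(self):
        # Only runs that pass every accumulated filter are visible.
        return [r for r in self.runs if all(f(r) for f in self.filters)]

    def __getitem__(self, scan_id):
        # Look up by scan_id among visible runs only, then take the latest.
        matches = [r for r in self._visible() if r["scan_id"] == scan_id]
        if not matches:
            raise KeyError(f"No match for scan_id={scan_id}")
        return max(matches, key=lambda r: r["time"])


catalog = ToyCatalog((
    {"uid": "a", "scan_id": 5, "time": "2022-04-11"},
    {"uid": "b", "scan_id": 5, "time": "2022-06-01"},
))

# ISO date strings compare correctly as plain strings.
results = catalog.search(lambda r: "2022-04-10" <= r["time"] < "2022-04-12")
run = results[5]  # resolved within the time range, not the whole catalog
```

In this model the unfiltered `catalog[5]` still returns the latest run overall, while `results[5]` returns the latest run inside the time range.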

It is of course possible to avoid these problems by using uid rather than scan_id.

Context

My users and I rely heavily on aggregated searches to find subsets of our database. This is useful for giving users access to only part of the database, either because that is when they ran their investigation, or because I only want them to see the entries they added under their own user (the username is added to the metadata). It's quite common for my users to first search the database using a TimeRange covering their beamtime, and then look up scan_ids within that time range.

We are currently using databroker backed by intake, and I am looking at using Tiled, which is why I ran some existing scripts against it and found the problem described above.

Your Environment

python 3.8
tiled==0.1.0a79
databroker==2.0.0b12
