Searches for uid on results of other searches fails when not looking for latest #746

whs92 · 2022-11-17T10:45:04Z

Expected Behavior

I would like to be able to perform searches on the results of a search against a Tiled-backed catalog. For example:

from tiled.client import from_profile
from databroker.queries import TimeRange

c = from_profile("xyz") # note I am connecting directly to the mongodb here and not using the tiled server
results = c.search(TimeRange(since="2022-04-10", until="2022-04-12"))

run = results[5]

Current Behavior

Currently, if there are multiple runs in the catalog c with the scan_id 5, then calling:

run = results[5]

first searches the original catalog c for all entries with scan_id 5, takes the latest one (which might not lie in the time range we want), and only then checks whether it lies in the time range specified in the original TimeRange search.

This is not the expected order of operations, and it leads to the error "KeyError: 'No match for scan_id=5'".
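To make the difference concrete, here is a toy model of the two orders of operations. It uses plain dicts rather than Tiled or databroker objects, and every name in it (RUNS, latest_then_filter, filter_then_latest) is hypothetical. It is only a sketch of the behavior as I understand it, not the actual implementation.

```python
from datetime import date

# Hypothetical stand-in for a catalog: two runs share scan_id 5,
# but only the older one lies inside the time range of interest.
RUNS = [
    {"uid": "a", "scan_id": 5, "time": date(2022, 4, 11)},
    {"uid": "b", "scan_id": 5, "time": date(2022, 6, 1)},
]


def in_range(run, since, until):
    """Return True if the run's time lies within [since, until]."""
    return since <= run["time"] <= until


def latest_then_filter(runs, scan_id, since, until):
    """Order observed in this report: take the latest match from the
    whole catalog first, then check it against the time range."""
    matches = [r for r in runs if r["scan_id"] == scan_id]
    if not matches:
        raise KeyError(f"No match for scan_id={scan_id}")
    latest = max(matches, key=lambda r: r["time"])
    if not in_range(latest, since, until):
        raise KeyError(f"No match for scan_id={scan_id}")
    return latest


def filter_then_latest(runs, scan_id, since, until):
    """Order I expected: restrict to the time range first, then take
    the latest match within that subset."""
    matches = [
        r for r in runs
        if r["scan_id"] == scan_id and in_range(r, since, until)
    ]
    if not matches:
        raise KeyError(f"No match for scan_id={scan_id}")
    return max(matches, key=lambda r: r["time"])
```

With the time range 2022-04-10 to 2022-04-12, filter_then_latest finds run "a", while latest_then_filter picks run "b" first, finds it outside the range, and raises the KeyError.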

Possible Solution

Naively, I can achieve what I want by making the ScanID query function default to returning every entry that matches the requested scan_ids (duplicates="all") rather than only the latest one. Subsequent searches then operate on that subset of the original catalog.

def ScanID(*scan_ids, duplicates="all"):
    # Wrap _ScanID to provide a nice usage for *one or more scan_ids*:
    # >>> ScanID(5)
    # >>> ScanID(5, 6, 7)
    # Placing a varargs parameter (*scan_ids) in the dataclass constructor
    # would cause trouble on the server side and generally feels "wrong"
    # so we have this wrapper function instead.
    return _ScanID(scan_ids=scan_ids, duplicates=duplicates)

I still don't understand why the original catalog, and not the results catalog, is searched, though, and this solution probably breaks other things.

It is of course possible to avoid these problems by using uid rather than scan_id.

Context

My users and I make heavy use of aggregated searches to find subsets of our database. This is useful when giving users access to only part of the database, perhaps because that is when they ran their investigation, or because I only want them to see the entries they added under their own user (the username is added to the metadata). It's quite common for my users to first search the database using a TimeRange covering their beamtime, and then look for scan_ids within that time range.

We are currently using databroker backed by intake, and I am looking at using Tiled, which is why I ran some existing scripts against it and found the problem described above.

Your Environment

python 3.8
tiled==0.1.0a79
databroker==2.0.0b12
