net, sta, chan, and loc and Database schemas #479
pavlis started this conversation in Design & Development
While testing the new version of Database and read/write_distributed_data, I came across an anomaly that is the latest in a string of problems created by the ubiquitous miniSEED net:sta:chan:loc attributes. I found it while running the new Database code on the "getting_started" notebook in our tutorials repository. The problem showed up in this code box of the notebook, which is what prompted me to write this page:
A separate issue is that we need to change how the notebook pulls data with get_waveforms, because the download can take hours. What matters here, however, is the line that calls save_data in the loop over strm. In my new implementation it does a nasty thing: it strips the net, sta, chan, and loc attributes from the documents it saves in the wf_TimeSeries collection. After some debugging I now know why. The following standardized code for converting Metadata to a python dict causes this problem:

The problem is that the keys net, sta, chan, and loc are not marked as changed in this context. As a result, they are all erased.
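The failure mode can be illustrated with a small toy model. It assumes a schema that flags net, sta, chan, and loc as readonly. A readonly key that is marked changed is renamed to READONLYERROR_<key> and logged, which is what older versions of the tutorial produced. A readonly key that is not marked changed is silently dropped, which is the current behavior. This is not the MsPASS conversion code, and the names ToyMetadata, READONLY_KEYS, and to_save_dict are hypothetical.

```python
# Hypothetical toy model of the save-time filtering described above.
# ToyMetadata, READONLY_KEYS, and to_save_dict are illustrative names only;
# this is NOT the MsPASS Metadata class or the actual Database code.
from dataclasses import dataclass, field

# Assumption: keys the active schema flags as readonly.
READONLY_KEYS = {"net", "sta", "chan", "loc"}


@dataclass
class ToyMetadata:
    """Dict-like container that records which keys were set via put()."""
    data: dict = field(default_factory=dict)
    changed: set = field(default_factory=set)

    def put(self, key, value):
        self.data[key] = value
        self.changed.add(key)


def to_save_dict(md, elog):
    """Convert md to a plain dict following the readonly logic in the post."""
    doc = {}
    for key, value in md.data.items():
        if key in READONLY_KEYS:
            if key in md.changed:
                # Marked changed: saved under an error-flagged name and logged
                # (the behavior older versions of the tutorial produced).
                doc["READONLYERROR_" + key] = value
                elog.append(f"readonly attribute {key} was modified")
            # Not marked changed: silently dropped (the current behavior).
            continue
        doc[key] = value
    return doc
```

For example, a document built from `{"net": "IU", "sta": "ANMO", "chan": "BHZ", "loc": "00", "npts": 100}` with nothing marked changed is saved as `{"npts": 100}`. Nothing is left that could link it to channel metadata.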
This section of code has been trouble since the early days of MsPASS. Earlier versions of this same tutorial hit the opposite outcome: net, sta, chan, and loc were somehow marked as "changed" earlier in the algorithm. The current code base uses exactly the same algorithm, but something else has changed so that net, sta, chan, and loc are no longer marked as changed. When they were marked changed, we ended up with obnoxious READONLYERROR_net, READONLYERROR_sta, etc. attributes and an elog entry for every call to save in this tutorial section. The workaround we used back then was to change the schema to "mspass_lite". That would also solve this particular problem, but I think adopting that workaround for this fix alone is NOT a good idea. Instead, I would like to suggest a couple of bigger changes that will make saving data more robust. This anomaly is NOT acceptable: the example runs for hours and produces a useless database in which the waveform documents cannot be linked to channel metadata by any means. It would be way, way too easy for an innocent user to make that mistake.
Items I recommend for discussion are:
- … channel_net. That kind of attribute should ALWAYS be stripped from the document before it is saved. I recommend we add a kwarg normalizing_collections=["channel", "site", "source"] and erase any key implied by those names (e.g. "site_lat", "channel_vang", or "source_lat"). I emphasize that keys like "channel_id" must ALWAYS be treated specially as standard normalization keys.
- … channel_sta, channel_net, etc. we should treat as readonly. net, sta, chan, and loc are so pervasive in seismology today that we should not create problems when a workflow needs to carry them around.
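The two proposals can be sketched as follows, with some assumptions. Keys prefixed with a normalizing collection name (e.g. "channel_vang", "site_lat") are dropped before saving, because they can be reloaded from that collection. The standard id keys (e.g. "channel_id") are always kept as links. The prefixed copies are the ones treated as readonly, while the native net, sta, chan, and loc pass through untouched. The function names clean_for_save and is_readonly are hypothetical. Only the normalizing_collections kwarg comes from the proposal itself.

```python
# Hypothetical sketch of the proposal; clean_for_save and is_readonly are
# illustrative names, not existing MsPASS functions.
DEFAULT_COLLECTIONS = ("channel", "site", "source")


def is_readonly(key, normalizing_collections=DEFAULT_COLLECTIONS):
    """Readonly means a normalization-loaded copy such as channel_sta.

    Standard id keys (channel_id, site_id, source_id) and native
    attributes like net, sta, chan, loc are not readonly.
    """
    if key in {c + "_id" for c in normalizing_collections}:
        return False
    return key.startswith(tuple(c + "_" for c in normalizing_collections))


def clean_for_save(doc, normalizing_collections=DEFAULT_COLLECTIONS):
    """Return a copy of doc without attributes loaded by normalization."""
    out = {}
    for key, value in doc.items():
        if is_readonly(key, normalizing_collections):
            # e.g. channel_vang, site_lat: recoverable from the
            # normalizing collection, so they are not saved.
            continue
        # Keeps the *_id links plus everything else, including net/sta/chan/loc.
        out[key] = value
    return out
```

With this design, a wf_TimeSeries document always retains both channel_id and the net:sta:chan:loc codes. It can therefore be linked to channel metadata even if one of the two is missing. A list such as `["channel", "site"]` also works as the kwarg value.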