Duplicate datasets for multiple products on the NCI #166
There are quite a few products with large numbers of duplicate datasets (2018 only):
We've also made some progress in finding the cause. Some of the AWS Lambda functions used to submit jobs were timing out, which appears to have caused the same ingest jobs to be submitted more than once. As a temporary measure we've upped the timeout to 5 minutes, and are looking into more rigorous methods to prevent this happening again.
The SQL used to count duplicates for a single product:
SELECT
    COUNT(*) FILTER (WHERE row_number = 1) AS should_exist,
    COUNT(*) FILTER (WHERE row_number > 1) AS num_dupes,
    COUNT(*) AS total
FROM (SELECT row_number() OVER (PARTITION BY lat, lon, time
                                ORDER BY metadata_doc ->> 'creation_dt') AS row_number,
             lat,
             lon,
             time,
             metadata_doc ->> 'creation_dt' AS creation_dt,
             id
      FROM dv_ls8_pq_albers_dataset
      WHERE tstzrange('2018-01-01', '2018-12-31') && time) t;
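For reference, a cleanup could reuse the same window function, keeping the earliest dataset per (lat, lon, time) and removing the rest. This is only a hedged sketch: it assumes duplicates may simply be hard-deleted (in practice datacube normally archives datasets rather than deleting them), and the agdc.dataset table name is an assumption about the index schema.

-- Hedged cleanup sketch (assumptions: hard deletion is acceptable, and the
-- index stores datasets in agdc.dataset). Keeps the earliest dataset per tile,
-- as ranked by creation_dt, and removes the later duplicates.
DELETE FROM agdc.dataset
WHERE id IN (
    SELECT id
    FROM (SELECT id,
                 row_number() OVER (PARTITION BY lat, lon, time
                                    ORDER BY metadata_doc ->> 'creation_dt') AS row_number
          FROM dv_ls8_pq_albers_dataset
          WHERE tstzrange('2018-01-01', '2018-12-31') && time) t
    WHERE row_number > 1
);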
Ingest is not re-entrant. Running ingest a second time on the same product while the first run is still in progress will generate duplicate datasets that differ only by UUID (computed via a non-deterministic method) and creation time.
There are no locks of any kind, and the UUID is generated at random. Making the UUID computation deterministic would prevent duplicates in the index, but would not prevent the wasted compute. Out-of-band measures are needed to ensure that ingest is not called concurrently, e.g. a lock along the lines of the sketch below.
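One possible out-of-band guard, sketched under the assumption that the index lives in Postgres: a session-level advisory lock keyed on the product name. The key string 'ingest:ls8_nbar_albers' is an arbitrary illustration, not an existing convention.

-- Sketch: take a session-level advisory lock before starting ingest.
-- hashtext() maps the illustrative key 'ingest:ls8_nbar_albers' to an integer.
SELECT pg_try_advisory_lock(hashtext('ingest:ls8_nbar_albers')) AS acquired;
-- If acquired is false, another ingest run already holds the lock;
-- abort rather than start a concurrent run. On completion:
SELECT pg_advisory_unlock(hashtext('ingest:ls8_nbar_albers'));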
During recent ingest runs (Scenes -> Albers tiles), more than 50,000 duplicate tiles have been created for the ls8_nbar_albers product in 2018. It's possible that duplicate data will be returned to anyone using this product!
I'm currently investigating: