elasticsearch date reader slicer can't slice data with gaps in the time field #166

Open
ciorg opened this issue Oct 17, 2019 · 0 comments
Labels
bug Something isn't working

Comments

@ciorg
Member

ciorg commented Oct 17, 2019

This could be related to issue #11.

The issue seems to appear when a long period with no data is followed by a period that has data again.

details:

  • job:
    • assets: elasticsearch: 1.6.1
    • 10 workers
    • 1 slicer
    • operations: elasticsearch_reader -> noop

When reading data from an index, 2 slices failed with the error: Elasticsearch Error: [query_phase_execution_exception] Result window is too large,

The data itself has second resolution with milliseconds in the timestamp, but the milliseconds are always 000. The slicer runs at millisecond resolution.

2 slices failed during the initial job:

  • slice1: {"start":"2019-05-22T00:00:34+00:00","end":"2019-05-27T00:03:11+00:00","count":2293241}
  • slice2: {"start":"2019-05-14T00:02:50+00:00","end":"2019-05-17T00:05:39+00:00","count":3873168}
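A rough way to see why these two slices trip the error is to compare their counts with Elasticsearch's `index.max_result_window` setting, which defaults to 10,000. The sketch below parses the two slice records exactly as reported above; it assumes the index uses the default window, which this report does not confirm, and the helper name `describe_slice` is only illustrative.

```python
import json
from datetime import datetime

# Default value of Elasticsearch's index.max_result_window setting.
# Assumption: the index in this report was not configured with a different value.
DEFAULT_MAX_RESULT_WINDOW = 10_000

# The two failed slices, copied from the report.
FAILED_SLICES = [
    '{"start":"2019-05-22T00:00:34+00:00","end":"2019-05-27T00:03:11+00:00","count":2293241}',
    '{"start":"2019-05-14T00:02:50+00:00","end":"2019-05-17T00:05:39+00:00","count":3873168}',
]


def describe_slice(raw, window=DEFAULT_MAX_RESULT_WINDOW):
    """Summarize a slice record: its time span and whether its count exceeds the window."""
    record = json.loads(raw)
    start = datetime.fromisoformat(record["start"])
    end = datetime.fromisoformat(record["end"])
    return {
        "span_seconds": int((end - start).total_seconds()),
        "count": record["count"],
        "exceeds_window": record["count"] > window,
    }


if __name__ == "__main__":
    for raw in FAILED_SLICES:
        print(describe_slice(raw))
```

Both slices cover multiple days and hold millions of records, so a single request sized to the whole slice would be far beyond the default window.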

When I broke up the slices into smaller groups, the jobs ran without issues.

Starting from each failed slice's start, I moved the start date in the job until the job no longer errored out.

Slice 1 succeeded with start: 2019-05-22T00:00:34, end: 2019-05-26T22:34:15, then start: 2019-05-26T22:34:15, end: 2019-05-27T00:03:11

Slice 2 succeeded with start: 2019-05-14T00:02:50, end: 2019-05-16T23:12:16, then start: 2019-05-16T23:12:16, end: 2019-05-17T00:05:39

Index searches for slice 1 showed that a search between 2019-05-22T00:00:34 and 2019-05-26T22:34:15 returns 0 results.
A search between 2019-05-26T22:34:15 and <2019-05-27T00:03:11.000 returns 2293241 docs, the same record count as slice 1.

Index searches for slice 2 showed that searching date_field:>2019-05-14T00:02:50.000+AND+date_field:<2019-05-16T23:12:16.000 returns a count of 0 docs.

Searching date_field:>2019-05-16T23:12:16+AND+date_field:<2019-05-17T00:05:39.000 returns 3873168 docs, the same record count as slice 2.
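For anyone reproducing these searches, the query-string checks can also be written as a range query body for a count request. The sketch below builds that body and works out how long each sub-range is and how dense the populated part is, using only the timestamps and counts reported above. `date_field` is the placeholder name from the queries in this issue, and the helper names are illustrative.

```python
from datetime import datetime


def range_count_body(field, after, before):
    """Request body for counting docs with `after` < field < `before` (both exclusive)."""
    return {"query": {"range": {field: {"gt": after, "lt": before}}}}


# (label, start, end, count) as reported in this issue.
SUB_RANGES = [
    ("slice 1, empty", "2019-05-22T00:00:34", "2019-05-26T22:34:15", 0),
    ("slice 1, populated", "2019-05-26T22:34:15", "2019-05-27T00:03:11", 2293241),
    ("slice 2, empty", "2019-05-14T00:02:50", "2019-05-16T23:12:16", 0),
    ("slice 2, populated", "2019-05-16T23:12:16", "2019-05-17T00:05:39", 3873168),
]


def span_seconds(start, end):
    """Length of the sub-range in whole seconds."""
    delta = datetime.fromisoformat(end) - datetime.fromisoformat(start)
    return int(delta.total_seconds())


def docs_per_second(start, end, count):
    """Average number of docs per second inside the sub-range."""
    return count / span_seconds(start, end)


if __name__ == "__main__":
    for label, start, end, count in SUB_RANGES:
        body = range_count_body("date_field", start, end)
        rate = round(docs_per_second(start, end, count), 1)
        print(label, span_seconds(start, end), rate, body)
```

The populated parts are short (5336 and 3203 seconds) and average roughly 430 and 1209 docs per second, so on average a one-second window would stay well under 10,000 docs. The data alone does not look too dense to slice; the long empty stretch in front of it is what differs.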

After removing the time periods with 0 results, the test jobs finished with no issues.
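To make the expected behavior concrete, here is a minimal sketch of a count-based bisecting slicer that skips empty ranges instead of carrying them into a slice. This is not the elasticsearch asset's actual slicer; it is a toy model over an in-memory sorted list of integer timestamps, and the names `count_between` and `slice_range` are hypothetical.

```python
from bisect import bisect_left


def count_between(sorted_ts, start, end):
    """Count timestamps t with start <= t < end in a sorted list."""
    return bisect_left(sorted_ts, end) - bisect_left(sorted_ts, start)


def slice_range(sorted_ts, start, end, size, resolution=1):
    """Yield (start, end, count) slices, in time order.

    A range is halved until its count fits in `size` or it is no wider than
    `resolution`. Ranges with no records are dropped, so a long gap in front
    of dense data never ends up inside an oversized slice.
    """
    stack = [(start, end)]
    while stack:
        lo, hi = stack.pop()
        n = count_between(sorted_ts, lo, hi)
        if n == 0:
            continue  # empty gap: nothing to read
        if n <= size or hi - lo <= resolution:
            yield (lo, hi, n)
            continue
        mid = lo + (hi - lo) // 2
        # Push the right half first so the left half is processed first.
        stack.append((mid, hi))
        stack.append((lo, mid))


if __name__ == "__main__":
    # Toy data: nothing for 1000 seconds, then 5 records per second for 100 seconds.
    data = [t for t in range(1000, 1100) for _ in range(5)]
    for piece in slice_range(data, 0, 1100, size=50):
        print(piece)
```

In this toy case the empty first 1000 seconds produce no slices at all, and every emitted slice stays within the requested size.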

@ciorg ciorg added the bug Something isn't working label Oct 17, 2019
@ciorg ciorg changed the title from "elasticsearch date reader slicer can't slice data" to "elasticsearch date reader slicer can't slice data with gaps in the time field" Oct 17, 2019