Add configurable buffering #5

Open
zerebubuth opened this issue Oct 12, 2017 · 3 comments

Comments

@zerebubuth
Member

At the moment, RedshiftExporter uploads all data to Redshift as soon as it is collected (although I think it would work with any PostgreSQL-compatible database). However, Redshift prefers data to be uploaded in large chunks, and the current Scoville upload size is clearly much smaller than the smallest Redshift allocation chunk, leading to "table bloat" of approximately 100x.

This bloat can be fixed by compacting the table (by selecting all the data into a new table and swapping the two), but that requires a regular maintenance job to be run, which complicates the system.
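For reference, the select-into-a-new-table-and-swap approach could look roughly like the sketch below. It only builds the SQL statements and does not run them. The function name, the `_compacted` and `_old` suffixes, and the example table name are hypothetical, and this is not the maintenance script mentioned later in this thread.

```python
import re
from typing import List


def compaction_statements(table: str) -> List[str]:
    """Build SQL that copies `table` into a fresh table and swaps the two.

    Hypothetical helper: the temporary table names are assumptions.
    """
    # Only allow plain identifiers, since the name is interpolated into SQL.
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", table):
        raise ValueError("unexpected table name: %r" % (table,))
    new = table + "_compacted"
    old = table + "_old"
    return [
        # New, empty table with the same column definitions.
        "CREATE TABLE %s (LIKE %s);" % (new, table),
        # Bulk copy, so the rows are written in large chunks.
        "INSERT INTO %s (SELECT * FROM %s);" % (new, table),
        # Swap the two tables, then drop the bloated original.
        "ALTER TABLE %s RENAME TO %s;" % (table, old),
        "ALTER TABLE %s RENAME TO %s;" % (new, table),
        "DROP TABLE %s;" % (old,),
    ]


if __name__ == "__main__":
    # "scoville_readings" is a made-up table name for illustration.
    for statement in compaction_statements("scoville_readings"):
        print(statement)
```

Any rows written to the original table between the copy and the rename would be lost, so a real job would presumably need to pause uploads or run the statements in a transaction.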

Another approach would be for Scoville itself to buffer around 100 readings in memory (120 would be two hours' worth at the default 1-minute setting) and upload them in larger chunks. This would make Scoville results less timely, but since Scoville data is used entirely for batch reporting, this isn't very important.
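A minimal sketch of what that buffering might look like, assuming readings are plain dicts and that the real upload can be wrapped in a single callable. `BufferedExporter`, `upload`, and `batch_size` are hypothetical names, not part of Scoville's existing API.

```python
from typing import Callable, List


class BufferedExporter:
    """Holds readings in memory and passes them to `upload` in batches.

    Hypothetical wrapper: `upload` stands in for whatever performs the
    actual bulk insert into Redshift.
    """

    def __init__(self, upload: Callable[[List[dict]], None],
                 batch_size: int = 120) -> None:
        if batch_size < 1:
            raise ValueError("batch_size must be at least 1")
        self._upload = upload
        self._batch_size = batch_size
        self._buffer: List[dict] = []

    def add(self, reading: dict) -> None:
        """Queue one reading, flushing once the buffer is full."""
        self._buffer.append(reading)
        if len(self._buffer) >= self._batch_size:
            self.flush()

    def flush(self) -> None:
        """Upload everything buffered so far as a single batch."""
        if not self._buffer:
            return
        # Clear only after a successful upload, so that a failed upload
        # leaves the readings in the buffer for the next attempt.
        self._upload(list(self._buffer))
        self._buffer.clear()


if __name__ == "__main__":
    batches: List[List[dict]] = []
    exporter = BufferedExporter(batches.append, batch_size=3)
    for i in range(7):
        exporter.add({"reading": i})
    exporter.flush()  # push the final partial batch, e.g. at shutdown
    print([len(b) for b in batches])
```

Calling `flush()` at shutdown matters here: otherwise up to `batch_size - 1` readings would be dropped when the process exits.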

@nvkelso
Member

nvkelso commented Oct 17, 2017

(In the meantime we've run a maintenance job in Redshift to compact the existing table, and that can be run again.)

@nvkelso
Member

nvkelso commented Oct 17, 2017

@rmarianski can you add details on how to run that Redshift script? (With the sensitive username, password, etc. removed.)

@rmarianski
Member

Evan saved the script in the analytics api repo.
