Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Work out how to automate and formalise custom DB changes in medic-analytics #8

Open
SCdF opened this issue Jul 22, 2016 · 14 comments
Open

Comments

@SCdF
Copy link
Contributor

SCdF commented Jul 22, 2016

The medic-analytics code alters the DB schema to create various tables and views. It also refreshes all existing materialised views in the database at the end of each run. It makes schema alterations by running migration scripts on startup. These scripts get baked into each release.

Tech leads need to be able to alter the DB schema per project, to create their own views (materialised or not). They may also need to add their own custom indexes to existing tables.

We need to think up some mechanism for them to do that in a way that is more formal than "here is the DB credentials go nuts". Though I'm not against it as a plan z.

This problem is somewhat like app settings. But, you can't take back a migration. This means you cannot "wholesale" replace migrations like you can with app settings.

Here are some off-the-cuff options:

  • Alter medic-analytics to support a configurable third-party migration directory on the server. Tech leads would upload their scripts to this directory. The code would either listen for changes and execute them, or it would occur on startup. Tech leads would self-manage their scripts in git.
  • Create an app for each client to contain migrations. This would be a fork of a skeleton project, that boots, check for migrations and runs them, then closes. There would be tests to make sure that these migrations sit correctly on top of medic-analtyics. It would run on medic-os like medic-analytics does, with the same credentials. This would force you to version and check in your schema changes. You would need to store them in medic-projects
@alxndrsn
Copy link
Contributor

you can't take back a migration

Depends how they're written, surely?

This would force you to version and check in your schema changes. You would need to store them in medic-projects

👍 to this

@SCdF
Copy link
Contributor Author

SCdF commented Jul 22, 2016

you can't take back a migration

Depends how they're written, surely?

Yeah that's true. I guess I was thinking about how you could potentially mess up / confuse the migration tool in the first "dropbox" idea, or a variation where you upload a zip via a UI or something, if you forgot to provide the current migrations / did them in the wrong order / whatever. For the second idea it's not relevant.

@garethbowen
Copy link
Contributor

Tech leads need to be able to alter the DB schema per project

This makes me really concerned. It'd be great if we could have a clear line between stuff developers manage and stuff tech leads manage so we don't conflict. For example, can we put project-specific views in a separate db/schema (or whatever the pg way to do that is)?

This would leave tech leads able to run their own migrations in their own way without having to be managed by this codebase.

They may also need to add their own custom indexes to existing tables.

Most of those indexes will probably be of benefit to other projects as well and should be added to the base set. The exception would be if there's a very expensive one for a specific use case.

@SCdF
Copy link
Contributor Author

SCdF commented Jul 25, 2016

It'd be great if we could have a clear line between stuff developers manage and stuff tech leads manage so we don't conflict

Does having all tech-lead stuff in an app that has tests that make sure there are no errors when applying their migrations on top of the default migrations? I suppose even then that won't simulate an ongoing system where these migrations could be interleaved and all sorts of oddness.

For example, can we put project-specific views in a separate db/schema (or whatever the pg way to do that is)?

There is a plugin called dblink that allows that, or there is us using schemas correctly (in pg, db -> schema -> table, though schema is "default" if you don't say anything).

Most of those indexes will probably be of benefit to other projects as well and should be added to the base set.

@michaelkohn what are your thoughts on the above quote?

@michaelkohn
Copy link

I would like to think that there will be a fair amount of re-usability on indexes, though my understanding is that it also depends on what is happening up-stream. We only really have 3 or 4 index-able base objects anyway (the couchdb table which has just one field of the raw json, and a few meta-level matviews).

We can index JSON in the couchdb table, but even for the same use case, my understanding is that the contents and structure of the JSON is dependent on how the TL created the form.... so even if we add an index to some field in the JSON in the couchdb table, if the field name is slightly different or doesn't even exist, I assume the creation of that index would fail (or otherwise be useless).

For the matviews, there are only a few fields in them anyway and they are mostly fully indexed already.

@SCdF
Copy link
Contributor Author

SCdF commented Jul 25, 2016

@michaelkohn is there ever stuff like sorting or filtering in clipfolio? That would require special indexes?

@michaelkohn
Copy link

Not in addition to what we've already built into our queries. We build the queries that KlipFolio executes and any additional filtering or sorting is done client side / from the presentation layer.

@SCdF
Copy link
Contributor Author

SCdF commented Jul 26, 2016

@michaelkohn right so to be clear, are you saying klipfolio works by running your view, downloading all data over to its servers, and then any kind of filtering or sorting it deals with?

@michaelkohn
Copy link

Correct.

KlipFolio executes the query we provide it, downloads the results of that query and then has the ability to sort/filter whatever data was returned by the original query.

So if the query we configure in KlipFolio is this
-> SELECT * FROM contactview_chp WHERE NAME LIKE 'MICHAEL%'

KlipFolio will receive the results of that query and do whatever addition filtering or ordering on that dataset client side, it will not hit the database again.

For most of the datasources, it doesn't actually execute the query interactively (ie when someone clicks a button). It executes the query on a schedule (most are set to refresh every 30 minutes), stores the data and uses that dataset for every dashboard view until the next time it refreshes.

It is possible to run parameterized / interactive queries, but we still configure the WHERE clause for the query. An example is for date ranges... there are some dashboards where users can enter From/To dates. In KF, we would configure the WHERE clause to look something like this...

WHERE meta.reported > '{FromDate}'

Before running the query, KF just replaces that token with the user provided date so PostgreSQL would just see it as

WHERE meta.reported > '2016-05-01' (for example)

@SCdF
Copy link
Contributor Author

SCdF commented Jul 26, 2016

OK, so it sounds like we can look into multiple schemas here, something like:

  • db.couch2pg schema which is only writable by couch2pg, and contains the couchdb table
  • db.xmlforms schema which contains helper views for xmlforms support, again only writable by couch2pg
  • db.hlv schema (name is terrible, think of a better one) which is the only place tech leads / humans can write to, which is where high level views go.

Would need to look into how schemas work in pg. Also, this still means we'd want some kind of code that helps tech leads organise their changes, i.e. one of the options in the ticket description.

@nice-snek
Copy link

hi frand @garethbowen, please triage before the end of this sprint. :parrot_conga::parrot_conga::parrot_conga:

@garethbowen
Copy link
Contributor

This is a really good idea and has lots of good information so I'm going to schedule it but not prioritise it.

@garethbowen garethbowen removed their assignment Apr 18, 2017
@medic-bot medic-bot assigned ghost Jan 31, 2018
@medic-bot
Copy link

Hi @browndav,

This ticket has not been touched in 90 days. Is it still relevant?

(See triaging old issues for more detail)

@ghost ghost removed their assignment Feb 22, 2018
@ghost
Copy link

ghost commented Feb 22, 2018

What Gareth said. Relevant; keeping this open.

@garethbowen garethbowen transferred this issue from medic/cht-core Nov 20, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants