Including example queries in the contract #85
-
Also curious about this! I have a similar use case where I want to ensure data quality not on a single data model but on the interdependencies between several of them. @simonharrer @jochenchrist, could you provide more details? Thank you!
-
We are currently discussing adding data quality checks as SQL statements at the model and field level. Current development branch: https://github.com/datacontract/datacontract-specification/blob/quality/README.md#sql The idea is to have something like this:
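One way to picture a model-level SQL check is a query paired with the value its result must equal. The sketch below is a hypothetical Python illustration using an in-memory SQLite database. The `check` dictionary, its key names, the `orders` table and the `run_check` helper are all assumptions made for this example and do not reflect the syntax on the quality branch.

```python
import sqlite3

# Hypothetical model-level check: a SQL query plus the value its result must equal.
# The key names are illustrative and are not taken from the specification.
check = {
    "description": "orders must not contain duplicate order_id values",
    "query": "SELECT COUNT(*) - COUNT(DISTINCT order_id) FROM orders",
    "must_be": 0,
}


def run_check(conn, check):
    """Run the check's query and compare its single scalar result to the expected value."""
    (value,) = conn.execute(check["query"]).fetchone()
    return value == check["must_be"]


# Stand-in data so the check has something to run against.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id TEXT, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [("a1", 10.0), ("a2", 5.5)])

passed = run_check(conn, check)
```

A field-level check could presumably follow the same pattern, with the query scoped to a single column.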
-
What do you think about this proposal?
-
Hi @jochenchrist, I see how this […] I was thinking more of pieces of code that are guaranteed to run in the right environment, possibly a pre-configured notebook, and that should be read as "here is our recommended way to use the data". Your feature can certainly be used that way in SQL. What would it look like for Kafka topics (one of my main use cases for data contracts)? Another question I am asking myself: is […]
-
Hi again, Given that the proposed solution is a bit far from what I was personally looking for, I am attempting a counter-proposal. My idea would be to expand the specification to include a section named […]. In a nutshell, it would look like this:
I want to stress that the intent is not to provide a data quality check, but rather to show consumers the intended ways to use the data. Quality checks are very valuable, but they do not serve users well when it comes to teaching them how to write appropriate queries. What do you think of such a proposal?
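To make the distinction concrete, here is a minimal Python sketch of what a runnable example query could look like when it is treated as documentation that must also execute. Everything in it is hypothetical: the `example` dictionary and its key names, the `customers` and `orders` tables, and the `run_example` helper are assumptions for illustration, not part of the specification or of the proposal's actual syntax.

```python
import sqlite3

# Hypothetical "example query" entry; all key names are illustrative only.
example = {
    "description": "Total order amount per customer",
    "models": ["customers", "orders"],  # a list, so a join can name several models
    "query": """
        SELECT c.name, SUM(o.amount) AS total
        FROM customers c
        JOIN orders o ON o.customer_id = c.id
        GROUP BY c.name
        ORDER BY c.name
    """,
}


def run_example(conn, example):
    """Execute the example query and return its rows; an empty result counts as a failure."""
    rows = conn.execute(example["query"]).fetchall()
    if not rows:
        raise AssertionError(f"example returned no rows: {example['description']}")
    return rows


# In-memory stand-in for the environment in which the examples are guaranteed to run.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO orders VALUES (1, 1, 10.0), (2, 1, 5.0), (3, 2, 7.5);
""")

rows = run_example(conn, example)
```

Unlike a quality check, the only thing verified here is that the query runs and returns something. Its main value is that consumers can copy it as a starting point.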
-
Hi,
I am looking for a way to help consumers by offering example SQL queries to run on the data. Such queries would be seen as "part of the contract", meaning the producer guarantees that each query returns results and that it represents a correct, intended use of the data.
At first, I was looking at the top-level `examples` section:
However, it feels like an abuse of that section: what if I want to write a query that joins two tables? Should I specify a comma-separated list of tables as the `model`?
I would generally be interested in creating data contracts that include runnable examples, serving both as tests and as specification. Ideally, this would not be limited to SQL statements but could also include code samples.
Is this somehow included in the specification (in which case, I missed it and I would like to know how to achieve it)? Or is it not an intended use of data contracts? Or is it something that can be added?
Cheers,
Christophe-Marie