Thanks for opening this issue @dubravcik! I do like the idea of colocating the docs for a model along with its definition, but I have reservations about how well this scheme will work in practice. What if you wanted to add column-level tests in here too? Would you want to encode all of that information in the model config? And what if there are dozens and dozens of columns? I think that cramming all of that information into a single
Theoretically, I'd like to move the YAML inside the model, but as far as I know that is not possible. We would have to write it as a Jinja expression with Python syntax, i.e. a dictionary. Or is there another option? If it has to be a dictionary, can we have multiple config expressions? Then it is just a matter of preference between Python and YAML syntax. If we put the docs and tests close to the output, I think it is not so noisy:

```sql
{{
    config(
        materialized = "table"
    )
}}

WITH cte AS (
    ....
)
SELECT a.[SK_Account_Master]
    ,[Account Name] = [NAME_AccountName]
    ,[Account Level] = [NAME_AccountLevel]
    ,[Account Type] = [NAME_AccountType]
    ,[Is Testing Account] = [FLAG_IsTestingAccount]
    ,[Partner Revenue] = [AMT_PartnerRevenue_cur]
    ,[Partner Revenue LFY] = [AMT_PartnerRevenue_LFY_cur]
    ,[Partner Revenue TFY] = [AMT_PartnerRevenue_TFY_cur]
    ,[Partner Revenue Y] = [AMT_PartnerRevenue_12M_cur]
    ,a.[SK_User_Master_Inserted]
    ,a.[SK_User_Master_Modified]
    ,a.[SK_Account_Master_Parent]
    ,a.[SK_Account_Master_SuperParent]
    ,a.[SK_Account_Master_Partner]
    ,a.[SK_Partners_Partner_Master]
    ,a.[SK_Geography_Master]
    ,a.[SK_Currency_Master]
    ,a.[SK_User_Master_TSM]
    ,a.[SK_User_Master_AccountManager]
    ,a.[SK_User_Master_AccountExecutive]
    ,a.[SK_User_Master_CSM]
    ,a.[SK_Campaign_Master_SourceCampaign]
    ,[DTIME Inserted] = a.[DTIME_Inserted]
    ,[DATE Inserted] = a.[SK_Date_Inserted]
    ,[DTIME Modified] = a.[DTIME_Modified]
    ,[DATE Modified] = a.[SK_Date_Modified]
FROM {{ source('account') }} a
LEFT JOIN cte ...

{{
    config(
        description = "account is company",
        columns = [
            {
                "name": "SK_Account_Master",
                "description": "primary key of Account",
                "tests": ["unique", "not_null"]
            },
            {
                "name": "Account Name",
                "description": "business name"
            },
            {
                "name": "Account Level",
                "description": "basic advanced pro"
            },
            {
                "name": "Account Type",
                "description": "partner or client"
            },
            {
                "name": "SK_User_Master_Inserted",
                "description": "User who created the Account",
                "tests": [{ "relationships": { "to": "ref('user')", "field": "id" } }]
            }
        ],
        tests = [
            { "accepted_values": { "column_name": "[account type]", "values": ['partner', 'client'] } },
            { "unique": { "column_name": "concat([account name], [account type])" } }
        ]
    )
}}
```

I'm not saying this is exactly how I want it, or that I would use it straight away; it's just a suggestion for discussion :)
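To illustrate the point that the choice is mostly Python versus YAML syntax, here is a minimal Python sketch that converts a dictionary in the shape proposed above into the equivalent schema.yml properties structure. The helper `to_schema_yml` and the model name `account` are hypothetical; the sketch assumes the usual dbt properties layout (`version: 2`, `models`, `columns`, `tests`) and only builds the data structure, without hooking into dbt itself.

```python
import json


def to_schema_yml(model_name, description, columns, tests=None):
    """Hypothetical helper: build a dbt-style properties structure for one model.

    `columns` is a list of dicts with "name" and optional "description"/"tests",
    mirroring the dictionary proposed in the config() call above.
    """
    model = {"name": model_name, "description": description, "columns": []}
    for col in columns:
        entry = {"name": col["name"]}
        if "description" in col:
            entry["description"] = col["description"]
        if "tests" in col:
            entry["tests"] = list(col["tests"])
        model["columns"].append(entry)
    if tests:
        # Model-level tests, e.g. generic tests given an explicit column_name.
        model["tests"] = list(tests)
    return {"version": 2, "models": [model]}


schema = to_schema_yml(
    "account",  # hypothetical model name
    "account is company",
    columns=[
        {"name": "SK_Account_Master", "description": "primary key of Account",
         "tests": ["unique", "not_null"]},
        {"name": "Account Type", "description": "partner or client"},
    ],
    tests=[{"accepted_values": {"column_name": "[account type]",
                                "values": ["partner", "client"]}}],
)

# JSON is valid YAML, so this text could be saved as a schema .yml file.
print(json.dumps(schema, indent=2))
```

Both forms carry the same information; the open question in this thread is only where it lives and which syntax is easier to maintain.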
Honestly, I like having it separate. Git-based version control systems are friendlier to multiple smaller files than to fewer larger ones. More, smaller files make it easier to see at a glance where changes have taken place, and also make merge conflicts less likely.
I have another idea. I don't know whether it is a problem for others, but for me the problem is that the documentation
The idea is to have the documentation inside the model, using comments. Other programming languages use comments for documentation, so it could work here as well. For example, a comment `--docs <column> <description>` could be parsed and used for documentation. The position of such a comment would not matter, since it holds the column name, so it would also work in long models with many CTEs. It could be another option for documenting a model and could be overridden by schema.yml:

```sql
SELECT
    --docs full_name full name of customer
    full_name
    --docs country country where customer is located based on IP location
    ,country
    --docs browser_language language set in customer browser
    ,browser_language
    --docs date_created_at date customer created
    -- this is an unrelated comment, not part of the docs
    ,date_created_at
    --docs date_last_login date customer logged in
    ,date_last_login
FROM analytics.customer
```

I haven't thought about column-level tests yet.
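A minimal Python sketch of how such `--docs` comments might be extracted, assuming the exact `--docs <column> <description>` line format proposed above. The function `extract_docs` and the sample `schema_yml_docs` values are hypothetical, not part of dbt. Because each comment names its column, only the comment text matters, not where it sits in the query.

```python
import re

# One "--docs <column> <description>" comment per line; the column name is the
# first token after --docs and the rest of the line is the description.
DOCS_COMMENT = re.compile(r"^[ \t]*--docs[ \t]+(\S+)[ \t]+(.+?)[ \t]*$", re.MULTILINE)


def extract_docs(sql):
    """Hypothetical helper: map column name -> description from --docs comments."""
    return {column: description for column, description in DOCS_COMMENT.findall(sql)}


model_sql = """
SELECT
    --docs full_name full name of customer
    full_name
    --docs country country where customer is located based on IP location
    ,country
    -- this is an unrelated comment, not part of the docs
    ,date_created_at
FROM analytics.customer
"""

docs = extract_docs(model_sql)

# As suggested above, descriptions from schema.yml (if any) override inline ones.
schema_yml_docs = {"country": "country according to schema.yml"}  # hypothetical
merged = {**docs, **schema_yml_docs}
print(merged)
```

Ordinary comments such as `-- this is an unrelated comment` are ignored because the pattern requires the literal `--docs` prefix.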
I'm thinking about this in the same way I'm thinking about #2401, which is on our 1.0 to-do list and could be essentially summarized as: "Reconcile node configs and resource properties, where possible and where it makes sense." Today, it's not totally clear what can be defined in one vs. the other, or both (…). Personally, I find myself agreeing with @jrandrews's point above. I think that, by and large, we'll want to maintain the functional distinction that exists today, recast as a convention:
We had a similar problem and wanted to open an issue. The worst part for us was that we often forgot to edit the schema file when we added or removed columns. But I think having a description in
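One way to catch the drift described above is to compare the columns a model actually produces with the columns documented for it. This is a hypothetical Python sketch: it assumes you already have both lists of names (one option might be the `columns` entries in dbt's `catalog.json` and `manifest.json` artifacts), and it compares names case-insensitively, which is also an assumption.

```python
def column_drift(model_columns, documented_columns):
    """Hypothetical check: compare produced columns with documented columns.

    Returns (undocumented, stale): columns the model produces that have no docs,
    and documented columns the model no longer produces. Names are compared
    case-insensitively (an assumption; adjust for your warehouse).
    """
    produced = {name.lower() for name in model_columns}
    documented = {name.lower() for name in documented_columns}
    return sorted(produced - documented), sorted(documented - produced)


# Example: a column was added to the model and another removed,
# but the schema file was not updated.
undocumented, stale = column_drift(
    ["full_name", "country", "browser_language"],
    ["full_name", "Country", "date_created_at"],
)
print("undocumented:", undocumented)
print("stale:", stale)
```

Run in CI, a non-empty result from either list would flag a schema file that has fallen out of step with its model.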
I've built out some scripts to do this in a project I maintain. I write specially formatted, Javadoc-inspired SQL comments immediately after the declaration of each column I want to include in the docs, and then I use a jq script to scan the compiled dbt SQL output with a regex that looks for comments in that format. I've been wanting to share this for a while, but figured I should start small by just describing the approach before putting in the work to extricate the script from the rest of my codebase. 🙂 The SQL comments look like this:

```sql
-- file: mymodel.sql
/** @modeldoc
This is my super cool model! It does cool things!
**/
select
    ... as my_cool_column /** @coldoc
    This is my description of the cool column. This whole paragraph gets extracted into
    a yaml file as the description for the column.

    You can even define tests too, by including well-formed dbt test JSON after one or
    more @-test annotation(s) at the end of the comment!
    @test "not_null"
    @test {"accepted_values": {"values": ["Cool", "Cooler", "Coolest"]}}
    **/
```
... and they end up in the "schema.yml" file (written as JSON for ease of automation), looking like this:

```yaml
# file: models_generated.yml
{
  ...
  "models": [
    ...
    {
      "name": "mymodel",
      "description": "This is my super cool model! It does cool things!",
      "columns": [
        {
          "name": "my_cool_column",
          "description": "This is my description of the cool column. This whole paragraph gets extracted into a yaml file as the description for the column.\n\nYou can even define tests too, by including well-formed dbt test JSON after one or more @-test annotation(s) at the end of the comment!",
          "tests": [
            "not_null",
            {
              "accepted_values": {
                "values": [
                  "Cool",
                  "Cooler",
                  "Coolest"
                ]
              }
            }
          ]
        },
        ...
```

It's all just regex extraction, but I've found it to be pretty flexible and easy to use. I've used it for ~6 months now to build and maintain an analytics codebase of ~20 dbt models with ~200 columns in total, all documented and fully tested this way.
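The comment author's extraction ran on jq; the following is a hypothetical Python re-creation of the same idea, not their script. It assumes the comment format shown above: a `/** @modeldoc ... **/` block for the model, a `/** @coldoc ... **/` block right after each `as <column>` alias, and one `@test <json>` line per test at the end of the comment. Blank lines separate paragraphs, and wrapped lines inside a paragraph are joined with spaces, which matches the `\n\n` in the generated description. The helper names and the `status`/`some_table` identifiers are illustrative only.

```python
import json
import re

# Model description: /** @modeldoc ... **/
MODEL_DOC = re.compile(r"/\*\*\s*@modeldoc\s*(.*?)\*\*/", re.DOTALL)
# Column description: "as <alias>" immediately followed by /** @coldoc ... **/
COL_DOC = re.compile(r"\bas\s+(\w+)\s*/\*\*\s*@coldoc\s*(.*?)\*\*/",
                     re.DOTALL | re.IGNORECASE)
# One "@test <json>" annotation per line inside a @coldoc comment.
TEST_LINE = re.compile(r"^[ \t]*@test[ \t]+(.+?)[ \t]*$", re.MULTILINE)


def paragraphs(text):
    """Join wrapped lines within each paragraph; keep blank-line paragraph breaks."""
    parts = [" ".join(p.split()) for p in re.split(r"\n\s*\n", text.strip())]
    return "\n\n".join(p for p in parts if p)


def extract_model_docs(model_name, sql):
    """Hypothetical helper: build one schema entry from annotated model SQL."""
    found = MODEL_DOC.search(sql)
    model = {
        "name": model_name,
        "description": paragraphs(found.group(1)) if found else "",
        "columns": [],
    }
    for column_name, body in COL_DOC.findall(sql):
        tests = [json.loads(raw) for raw in TEST_LINE.findall(body)]
        column = {"name": column_name,
                  "description": paragraphs(TEST_LINE.sub("", body))}
        if tests:
            column["tests"] = tests
        model["columns"].append(column)
    return model


sql = """
/** @modeldoc
This is my super cool model! It does cool things!
**/
select
    status as my_cool_column /** @coldoc
    This is my description of the cool column.
    It continues on a second line.

    A second paragraph.
    @test "not_null"
    @test {"accepted_values": {"values": ["Cool", "Cooler", "Coolest"]}}
    **/
from some_table
"""

model = extract_model_docs("mymodel", sql)
print(json.dumps({"version": 2, "models": [model]}, indent=2))
```

Since JSON is valid YAML, the printed result could be written straight to a `.yml` file, as described above. Plain regexes are brittle (for example, text like `as foo /** @coldoc` inside a string literal would also be picked up), so treat this as a sketch rather than a SQL parser.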
I opened the related #6853 ... seems like there's not much traction on this topic, but it would be sooo nice to have :| |
Describe the feature
When I started learning dbt, I assumed that data documentation was part of a model, which is true once it is published on the docs server. But in the source code, the documentation is separated into `schema.yml`, which is not intuitive for me. I would expect to have an option to write the docs close to the code. I can imagine a part of the config for this:
Who will this benefit?
When someone is making changes in a model, they can see the docs and edit them immediately, and they can also see the names of the columns produced by the model.
Thanks :)