-
Notifications
You must be signed in to change notification settings - Fork 51
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add BigQuery Loader V2 #877
base: main
Are you sure you want to change the base?
Conversation
✅ Deploy Preview for snowplow-docs ready!
To edit notification comments on pull requests, go to your Netlify site configuration. |
1a692b2
to
43ead88
Compare
docs/pipeline-components-and-applications/loaders-storage-targets/bigquery-loader/index.md
Outdated
Show resolved
Hide resolved
...tions/loaders-storage-targets/bigquery-loader/previous-versions/bigquery-loader-1.x/index.md
Outdated
Show resolved
Hide resolved
...ications/loaders-storage-targets/bigquery-loader/configuration-reference/_bigquery_config.md
Show resolved
Hide resolved
...ications/loaders-storage-targets/bigquery-loader/upgrade-guides/2-0-0-upgrade-guide/index.md
Outdated
Show resolved
Hide resolved
...ications/loaders-storage-targets/bigquery-loader/upgrade-guides/2-0-0-upgrade-guide/index.md
Outdated
Show resolved
Hide resolved
...ications/loaders-storage-targets/bigquery-loader/upgrade-guides/2-0-0-upgrade-guide/index.md
Outdated
Show resolved
Hide resolved
...-components-and-applications/loaders-storage-targets/bigquery-loader/_cross-cloud-diagram.md
Outdated
Show resolved
Hide resolved
...-components-and-applications/loaders-storage-targets/bigquery-loader/_cross-cloud-diagram.md
Show resolved
Hide resolved
docs/pipeline-components-and-applications/loaders-storage-targets/bigquery-loader/index.md
Show resolved
Hide resolved
...ns/loaders-storage-targets/bigquery-loader/previous-versions/bigquery-loader-1.x/_diagram.md
Outdated
Show resolved
Hide resolved
...tions/loaders-storage-targets/bigquery-loader/previous-versions/bigquery-loader-1.x/index.md
Outdated
Show resolved
Hide resolved
...ications/loaders-storage-targets/bigquery-loader/upgrade-guides/2-0-0-upgrade-guide/index.md
Outdated
Show resolved
Hide resolved
8f27c18
to
71dc92d
Compare
docs/pipeline-components-and-applications/loaders-storage-targets/bigquery-loader/_diagram.md
Outdated
Show resolved
Hide resolved
...tions/loaders-storage-targets/bigquery-loader/previous-versions/bigquery-loader-1.x/index.md
Outdated
Show resolved
Hide resolved
...tions/loaders-storage-targets/bigquery-loader/previous-versions/bigquery-loader-1.x/index.md
Outdated
Show resolved
Hide resolved
...ications/loaders-storage-targets/bigquery-loader/upgrade-guides/2-0-0-upgrade-guide/index.md
Outdated
Show resolved
Hide resolved
...ications/loaders-storage-targets/bigquery-loader/upgrade-guides/2-0-0-upgrade-guide/index.md
Outdated
Show resolved
Hide resolved
...ications/loaders-storage-targets/bigquery-loader/upgrade-guides/2-0-0-upgrade-guide/index.md
Outdated
Show resolved
Hide resolved
...ications/loaders-storage-targets/bigquery-loader/upgrade-guides/2-0-0-upgrade-guide/index.md
Outdated
Show resolved
Hide resolved
...ications/loaders-storage-targets/bigquery-loader/upgrade-guides/2-0-0-upgrade-guide/index.md
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you also update this page? https://docs.snowplow.io/docs/storing-querying/storage-options/ BigQuery should be now an option for the 3 clouds
...ications/loaders-storage-targets/bigquery-loader/upgrade-guides/2-0-0-upgrade-guide/index.md
Outdated
Show resolved
Hide resolved
...ications/loaders-storage-targets/bigquery-loader/upgrade-guides/2-0-0-upgrade-guide/index.md
Outdated
Show resolved
Hide resolved
...ications/loaders-storage-targets/bigquery-loader/upgrade-guides/2-0-0-upgrade-guide/index.md
Show resolved
Hide resolved
...ications/loaders-storage-targets/bigquery-loader/upgrade-guides/2-0-0-upgrade-guide/index.md
Outdated
Show resolved
Hide resolved
- `9999999` is a hash code unique to the schema (i.e. it will change if the schema is overwritten with a different one). | ||
|
||
If you create a new schema `1-0-2` that reverts the offending changes and is again compatible with `1-0-0`, the data for events with that schema will be written to the original column as expected. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
:::tip | |
You might find that some of your schemas were evolved incorrectly in the past, which results in the creation of these “recovery” columns after the upgrade. To address this for a given schema, create a new _minor_ schema version that reverts the breaking changes introduced in previous versions. (Or, if you want to keep the breaking change, create a new _major_ schema version.) You can set it to [supersede](/docs/understanding-tracking-design/versioning-your-data-structures/amending/index.md#marking-the-schema-as-superseded) the previous version(s), so that events are automatically validated against the new schema. | |
::: |
Do we also need instructions similar to these? https://docs.snowplow.io/docs/pipeline-components-and-applications/loaders-storage-targets/snowplow-rdb-loader/upgrade-guides/6-0-0-upgrade-guide/#identifying-schemas-that-need-patching @istreeter
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That section "Identifying schemas that need patching" was written long ago before we had the supercededBy feature. I would think now we have supercededBy, we shouldn't recommend to anyone to patch schemas.
Instead, the user can publish the fixed schema as a new major version, and configure the old schemas to automatically upgrade to the new major version. This is consistent with our belief that schemas are immutable. And it means we don't need to write a new igluctl feature.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My doubt was not specifically about patching but whether running igluctl would be useful to identify issues with schemas... Because I think it says what the incompatibilities are? With just a recovery column, how will the users know what to fix?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
igluctl has an existing sub-command verify-parquet
igluctl static verify-parquet <schema_dir>
It prints out warnings like
'com.test/test/jsonschema/1-0-1'. Changes: Incompatible type change String to Double at /id"
Unfortunately, it only works if the schemas are written into files on the local file system. It does not connect to remote Iglus.
We've not really planned what happens when existing users migrate to the new loader. And that is semi-deliberate: it is a lot of work to gather all the requirements and plan for all the things that could go wrong. So I've been delaying it until there is time to work on it properly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@stanch @istreeter as BQ Loader v2 relies on the same schema compatibility model/rules as parquet/Databricks/lake loader, we could use verify-parquet
igluctl command and recommend it to users to detect incompatibilities. But this parquet
in the name is confusing! Should we also do some renaming in igluctl then to make it more generic? And then also adjust docs
This will now need a rebase due to #915 |
No description provided.