POC for multi-output #767

Draft: wants to merge 20 commits into master

audunska (Contributor) commented Jul 12, 2023

This is a proof of concept for how we might support multiple outputs from a single transformation.

The idea is that you write SQL like

select
struct(fields for asset) as a,
struct(fields for event) as e
from...

and then, in the relation's type option, you specify the type of each of those names, like a:asset,e:event. This would be passed to the jetfire-backend as structured JSON, but in Spark we have to flatten it to a string.
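To make the flattened format concrete, here is a minimal sketch of how a string like a:asset,e:event could be split into name/type pairs. The object and method names are hypothetical and chosen for illustration only; the actual parsing in this PR may look different.

```scala
// Hypothetical helper, not taken from the PR: parses a flattened
// multi-output type string such as "a:asset,e:event".
object OutputTypes {
  // Returns (name, type) pairs in the order they were written.
  def parse(spec: String): Seq[(String, String)] =
    spec.split(",").toSeq.map(_.trim).filter(_.nonEmpty).map { entry =>
      // Split on the first colon only, so the type part is kept whole.
      entry.split(":", 2) match {
        case Array(name, tpe) if name.trim.nonEmpty && tpe.trim.nonEmpty =>
          (name.trim, tpe.trim)
        case _ =>
          throw new IllegalArgumentException(s"Expected name:type, got '$entry'")
      }
    }
}
```

With this sketch, `OutputTypes.parse("a:asset,e:event")` yields the pairs `("a", "asset")` and `("e", "event")`, and an entry without a colon is rejected early rather than silently ignored.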

Internally, we parse out the names and types, and then create a relation for each of them. Options specific to one relation can be passed by prefixing the option name with ${name}:. For example, an option named a:assetSubTreeIds would be passed to the asset relation.
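The prefix routing could be sketched as below. This is an illustration under two assumptions that the description does not confirm: that options without a known prefix are shared by every relation, and that the prefix is stripped before the option reaches its relation. The names `PrefixedOptions` and `forRelation` are hypothetical.

```scala
// Hypothetical helper, not taken from the PR: selects the options that
// apply to one named output relation.
object PrefixedOptions {
  def forRelation(
      name: String,
      names: Set[String],
      options: Map[String, String]): Map[String, String] = {
    val prefixes = names.map(_ + ":")
    // Assumption: options without any known "<name>:" prefix are shared.
    val shared = options.filter { case (key, _) =>
      !prefixes.exists(p => key.startsWith(p))
    }
    // Assumption: the "<name>:" prefix is stripped for the target relation.
    val own = options.collect {
      case (key, value) if key.startsWith(name + ":") =>
        key.stripPrefix(name + ":") -> value
    }
    shared ++ own
  }
}
```

For the names a and e, an option map containing project, a:assetSubTreeIds and e:foo would give the a relation project and assetSubTreeIds, while e:foo would only be visible to the e relation.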

See the new test in AssetTests for proof that this works in a simple case. Unfortunately, we have to give all fields in the correct order and explicitly cast all null fields for this to typecheck in Spark. Hopefully this can be alleviated in some way. EDIT: This now just works!

This is just a POC. Ideas / criticisms are welcome!

codecov bot commented Aug 10, 2023

Codecov Report

Merging #767 (fb052b8) into master (0719df1) will decrease coverage by 0.04%.
Report is 1 commits behind head on master.
The diff coverage is 76.59%.

@@            Coverage Diff             @@
##           master     #767      +/-   ##
==========================================
- Coverage   77.07%   77.04%   -0.04%     
==========================================
  Files          49       50       +1     
  Lines        3407     3441      +34     
  Branches      152      154       +2     
==========================================
+ Hits         2626     2651      +25     
- Misses        781      790       +9     
| Files Changed | Coverage Δ |
|---|---|
| ...rc/main/scala/cognite/spark/v1/MultiRelation.scala | 57.14% <57.14%> (ø) |
| ...rc/main/scala/cognite/spark/v1/DefaultSource.scala | 91.35% <84.84%> (-0.58%) ⬇️ |
