HBASE-28814 Add OpenLineage reporting support for Spark connector #135

Open
wants to merge 1 commit into
base: master

ddebowczyk92

This PR introduces OpenLineage support to the Spark HBase connector. The following changes and enhancements have been made:

  • Integration with OpenLineage: Implemented the LineageRelationProvider and LineageRelation interfaces in the DefaultSource and HBaseRelation classes, respectively, to provide input and output dataset identifiers.

  • Metadata Enrichment: Enhanced the connector to publish detailed lineage information, including datasets and operation facets.

  • Compatibility: Ensured compatibility with existing Spark jobs using the connector, allowing seamless lineage tracking without requiring significant modifications.
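
For readers unfamiliar with the pattern, the relation-level part of the integration described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code in this PR: `DatasetId`, `LineageRelationSketch`, and `HBaseRelationSketch` are hypothetical stand-ins for the OpenLineage and connector types, and the `hbase://<zookeeper quorum>` namespace scheme is an assumption made only for the example.

```java
import java.util.Objects;

public class HBaseLineageSketch {

    /** Hypothetical stand-in for an OpenLineage dataset identifier: a namespace plus a name. */
    static final class DatasetId {
        final String namespace;
        final String name;

        DatasetId(String namespace, String name) {
            this.namespace = Objects.requireNonNull(namespace, "namespace");
            this.name = Objects.requireNonNull(name, "name");
        }

        @Override
        public String toString() {
            return namespace + "/" + name;
        }
    }

    /** Hypothetical stand-in for the relation-level lineage interface named in the PR. */
    interface LineageRelationSketch {
        DatasetId lineageDatasetIdentifier();
    }

    /** Simplified relation that keeps only the fields needed to identify a table. */
    static final class HBaseRelationSketch implements LineageRelationSketch {
        private final String zookeeperQuorum;
        private final String tableName;

        HBaseRelationSketch(String zookeeperQuorum, String tableName) {
            this.zookeeperQuorum = Objects.requireNonNull(zookeeperQuorum, "zookeeperQuorum");
            this.tableName = Objects.requireNonNull(tableName, "tableName");
        }

        @Override
        public DatasetId lineageDatasetIdentifier() {
            // Assumed naming scheme: the cluster goes in the namespace, the table is the name.
            return new DatasetId("hbase://" + zookeeperQuorum, tableName);
        }
    }

    public static void main(String[] args) {
        // Example values are made up for illustration.
        LineageRelationSketch relation =
            new HBaseRelationSketch("zk1.example.com:2181", "default:orders");
        System.out.println(relation.lineageDatasetIdentifier());
    }
}
```

The point of the design is that the relation already knows which cluster and table it reads or writes, so it can report a stable identifier itself and the lineage listener does not have to infer it from the query plan.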

Key Benefits:

  • Improved Visibility: Provides enhanced visibility into data flow and transformations within Spark jobs using the HBase connector.
  • Facilitates Auditing and Compliance: Helps in tracking data lineage, aiding in auditing and compliance efforts.
  • Eases Debugging and Monitoring: Simplifies debugging and monitoring of data pipelines by providing detailed lineage information.

Please review the changes and provide feedback. Your input is valuable in ensuring the robustness and utility of this integration.

@Apache-HBase

💔 -1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 1m 15s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 1 new or modified test files.
_ master Compile Tests _
+0 🆗 mvndep 0m 28s Maven dependency ordering for branch
+1 💚 mvninstall 1m 45s master passed
+1 💚 compile 1m 5s master passed
+1 💚 spotless 0m 18s branch has no errors when running spotless:check.
+1 💚 javadoc 1m 4s master passed
+1 💚 scaladoc 0m 0s master passed
_ Patch Compile Tests _
+0 🆗 mvndep 0m 8s Maven dependency ordering for patch
-1 ❌ mvninstall 0m 16s root in the patch failed.
-1 ❌ compile 0m 15s root in the patch failed.
-1 ❌ javac 0m 15s root in the patch failed.
-1 ❌ scalac 0m 15s root in the patch failed.
+1 💚 whitespace 0m 0s The patch has no whitespace issues.
+1 💚 xml 0m 2s The patch has no ill-formed XML file.
-1 ❌ spotless 0m 6s patch has 25 errors when running spotless:check, run spotless:apply to fix.
-1 ❌ javadoc 0m 14s root in the patch failed.
+1 💚 scaladoc 0m 0s the patch passed
_ Other Tests _
-1 ❌ unit 0m 19s root in the patch failed.
Total runtime: 8m 21s
Subsystem Report/Notes
Docker ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hbase.apache.org/job/HBase-Connectors-PreCommit/job/PR-135/1/artifact/yetus-precommit-check/output/Dockerfile
GITHUB PR #135
Optional Tests dupname javac javadoc unit spotless xml compile scalac scaladoc
uname Linux 9de6320aa829 5.4.0-1103-aws #111~18.04.1-Ubuntu SMP Tue May 23 20:04:10 UTC 2023 x86_64 GNU/Linux
Build tool hb_maven
Personality dev-support/jenkins/hbase-personality.sh
git revision master / 166b764
Default Java Oracle Corporation-1.8.0_282-b08
mvninstall https://ci-hbase.apache.org/job/HBase-Connectors-PreCommit/job/PR-135/1/artifact/yetus-precommit-check/output/patch-mvninstall-root.txt
compile https://ci-hbase.apache.org/job/HBase-Connectors-PreCommit/job/PR-135/1/artifact/yetus-precommit-check/output/patch-compile-root.txt
javac https://ci-hbase.apache.org/job/HBase-Connectors-PreCommit/job/PR-135/1/artifact/yetus-precommit-check/output/patch-compile-root.txt
scalac https://ci-hbase.apache.org/job/HBase-Connectors-PreCommit/job/PR-135/1/artifact/yetus-precommit-check/output/patch-compile-root.txt
spotless https://ci-hbase.apache.org/job/HBase-Connectors-PreCommit/job/PR-135/1/artifact/yetus-precommit-check/output/patch-spotless.txt
javadoc https://ci-hbase.apache.org/job/HBase-Connectors-PreCommit/job/PR-135/1/artifact/yetus-precommit-check/output/patch-javadoc-root.txt
unit https://ci-hbase.apache.org/job/HBase-Connectors-PreCommit/job/PR-135/1/artifact/yetus-precommit-check/output/patch-unit-root.txt
Test Results https://ci-hbase.apache.org/job/HBase-Connectors-PreCommit/job/PR-135/1/testReport/
Max. process+thread count 45 (vs. ulimit of 12500)
modules C: . spark/hbase-spark U: .
Console output https://ci-hbase.apache.org/job/HBase-Connectors-PreCommit/job/PR-135/1/console
versions git=2.20.1
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

ddebowczyk92 force-pushed the add-openlineage-support branch from 164f9ac to ba0f9b5 on August 26, 2024 14:00
@Apache-HBase

💔 -1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 1m 12s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 1 new or modified test files.
_ master Compile Tests _
+0 🆗 mvndep 0m 26s Maven dependency ordering for branch
-1 ❌ mvninstall 0m 43s root in master failed.
+1 💚 compile 1m 0s master passed
+1 💚 spotless 0m 14s branch has no errors when running spotless:check.
+1 💚 javadoc 0m 49s master passed
+1 💚 scaladoc 0m 0s master passed
_ Patch Compile Tests _
+0 🆗 mvndep 0m 8s Maven dependency ordering for patch
+1 💚 mvninstall 1m 23s the patch passed
+1 💚 compile 1m 29s the patch passed
+1 💚 javac 1m 29s the patch passed
+1 💚 scalac 1m 29s root generated 0 new + 0 unchanged - 1 fixed = 0 total (was 1)
+1 💚 whitespace 0m 0s The patch has no whitespace issues.
+1 💚 xml 0m 2s The patch has no ill-formed XML file.
+1 💚 spotless 0m 11s patch has no errors when running spotless:check.
+1 💚 javadoc 1m 28s the patch passed
+1 💚 scaladoc 0m 0s the patch passed
_ Other Tests _
+1 💚 unit 9m 23s root in the patch passed.
Total runtime: 19m 22s
Subsystem Report/Notes
Docker ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hbase.apache.org/job/HBase-Connectors-PreCommit/job/PR-135/2/artifact/yetus-precommit-check/output/Dockerfile
GITHUB PR #135
Optional Tests dupname javac javadoc unit spotless xml compile scalac scaladoc
uname Linux 6cdae7eb7ebc 5.4.0-1103-aws #111~18.04.1-Ubuntu SMP Tue May 23 20:04:10 UTC 2023 x86_64 GNU/Linux
Build tool hb_maven
Personality dev-support/jenkins/hbase-personality.sh
git revision master / 166b764
Default Java Oracle Corporation-1.8.0_282-b08
mvninstall https://ci-hbase.apache.org/job/HBase-Connectors-PreCommit/job/PR-135/2/artifact/yetus-precommit-check/output/branch-mvninstall-root.txt
Test Results https://ci-hbase.apache.org/job/HBase-Connectors-PreCommit/job/PR-135/2/testReport/
Max. process+thread count 965 (vs. ulimit of 12500)
modules C: spark/hbase-spark . U: .
Console output https://ci-hbase.apache.org/job/HBase-Connectors-PreCommit/job/PR-135/2/console
versions git=2.20.1
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

ddebowczyk92 force-pushed the add-openlineage-support branch from ba0f9b5 to 1c8b91a on September 2, 2024 09:59
@Apache-HBase

🎊 +1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 1m 12s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 1 new or modified test files.
_ master Compile Tests _
+0 🆗 mvndep 0m 26s Maven dependency ordering for branch
+1 💚 mvninstall 1m 28s master passed
+1 💚 compile 0m 52s master passed
+1 💚 spotless 0m 15s branch has no errors when running spotless:check.
+1 💚 javadoc 0m 53s master passed
+1 💚 scaladoc 0m 0s master passed
_ Patch Compile Tests _
+0 🆗 mvndep 0m 8s Maven dependency ordering for patch
+1 💚 mvninstall 1m 26s the patch passed
+1 💚 compile 1m 33s the patch passed
+1 💚 javac 1m 33s the patch passed
+1 💚 scalac 1m 33s the patch passed
+1 💚 whitespace 0m 0s The patch has no whitespace issues.
+1 💚 xml 0m 1s The patch has no ill-formed XML file.
+1 💚 spotless 0m 12s patch has no errors when running spotless:check.
+1 💚 javadoc 1m 31s the patch passed
+1 💚 scaladoc 0m 0s the patch passed
_ Other Tests _
+1 💚 unit 9m 37s root in the patch passed.
Total runtime: 20m 34s
Subsystem Report/Notes
Docker ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hbase.apache.org/job/HBase-Connectors-PreCommit/job/PR-135/3/artifact/yetus-precommit-check/output/Dockerfile
GITHUB PR #135
Optional Tests dupname javac javadoc unit spotless xml compile scalac scaladoc
uname Linux 881a22460691 5.4.0-1103-aws #111~18.04.1-Ubuntu SMP Tue May 23 20:04:10 UTC 2023 x86_64 GNU/Linux
Build tool hb_maven
Personality dev-support/jenkins/hbase-personality.sh
git revision master / 166b764
Default Java Oracle Corporation-1.8.0_282-b08
Test Results https://ci-hbase.apache.org/job/HBase-Connectors-PreCommit/job/PR-135/3/testReport/
Max. process+thread count 966 (vs. ulimit of 12500)
modules C: spark/hbase-spark . U: .
Console output https://ci-hbase.apache.org/job/HBase-Connectors-PreCommit/job/PR-135/3/console
versions git=2.20.1
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

ddebowczyk92 changed the title from "Add OpenLineage reporting support for Spark connector" to "HBASE-28814 Add OpenLineage reporting support for Spark connector" on Sep 4, 2024
ddebowczyk92 force-pushed the add-openlineage-support branch from 1c8b91a to 2d3690c on September 4, 2024 15:29
@Apache-HBase

🎊 +1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 1m 8s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 1 new or modified test files.
_ master Compile Tests _
+0 🆗 mvndep 0m 24s Maven dependency ordering for branch
+1 💚 mvninstall 1m 24s master passed
+1 💚 compile 0m 53s master passed
+1 💚 spotless 0m 15s branch has no errors when running spotless:check.
+1 💚 javadoc 0m 51s master passed
+1 💚 scaladoc 0m 0s master passed
_ Patch Compile Tests _
+0 🆗 mvndep 0m 8s Maven dependency ordering for patch
+1 💚 mvninstall 1m 23s the patch passed
+1 💚 compile 1m 32s the patch passed
+1 💚 javac 1m 32s the patch passed
+1 💚 scalac 1m 32s the patch passed
+1 💚 whitespace 0m 0s The patch has no whitespace issues.
+1 💚 xml 0m 1s The patch has no ill-formed XML file.
+1 💚 spotless 0m 11s patch has no errors when running spotless:check.
+1 💚 javadoc 1m 30s the patch passed
+1 💚 scaladoc 0m 0s the patch passed
_ Other Tests _
+1 💚 unit 9m 35s root in the patch passed.
Total runtime: 20m 12s
Subsystem Report/Notes
Docker ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hbase.apache.org/job/HBase-Connectors-PreCommit/job/PR-135/4/artifact/yetus-precommit-check/output/Dockerfile
GITHUB PR #135
Optional Tests dupname javac javadoc unit spotless xml compile scalac scaladoc
uname Linux fe300999d764 5.4.0-1103-aws #111~18.04.1-Ubuntu SMP Tue May 23 20:04:10 UTC 2023 x86_64 GNU/Linux
Build tool hb_maven
Personality dev-support/jenkins/hbase-personality.sh
git revision master / 166b764
Default Java Oracle Corporation-1.8.0_282-b08
Test Results https://ci-hbase.apache.org/job/HBase-Connectors-PreCommit/job/PR-135/4/testReport/
Max. process+thread count 968 (vs. ulimit of 12500)
modules C: spark/hbase-spark . U: .
Console output https://ci-hbase.apache.org/job/HBase-Connectors-PreCommit/job/PR-135/4/console
versions git=2.20.1
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

mobuchowski commented Oct 25, 2024

@petersomogyi @ndimiduk @NihalJain hey, as HBase Committers active in this repository, could you find the time to take a look at this PR and provide any feedback?

Thanks from another Apache committer 🙂
