Splunk-Databricks integration pattern and quick setup

This demo is a collaborative work with [email protected].

This is an automated Terraform template that deploys a Databricks workspace and a VM hosting Splunk (running the Docker image https://hub.docker.com/r/splunk/splunk/), and integrates Splunk with Databricks.

Overall Architecture

[Figure: overall architecture diagram]

Context

Please read the source repo: https://github.com/databrickslabs/splunk-integration

Quote: The Databricks add-on for Splunk, an app, that allows Splunk Enterprise and Splunk Cloud users to run queries and execute actions, such as running notebooks and jobs, in Databricks.

What can you do with this integration app? (Quoted from the source repo)

  1. Run Databricks SQL queries right from the Splunk search bar and see the results in Splunk UI.
  2. Execute actions in Databricks, such as notebook runs and jobs, from Splunk.
  3. Use Splunk SQL database extension to integrate Databricks information with Splunk queries and reports.
  4. Push events, summary, alerts to Splunk from Databricks.
  5. Pull events, alerts data from Splunk into Databricks.

Getting started

Step 1:

Clone this repo to your local machine and make sure Terraform is installed. See https://learn.hashicorp.com/tutorials/terraform/install-cli for installation instructions.
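
A quick pre-flight check can catch a missing install before the later steps fail. This is a minimal sketch that assumes a POSIX-style shell; the helper name `check_terraform` is hypothetical and not part of this repo.

```shell
# Hypothetical helper (not part of this repo): confirm the Terraform CLI
# is on PATH before running any of the later steps.
check_terraform() {
  if command -v terraform >/dev/null 2>&1; then
    # Print the installed version so it can be compared with the
    # provider requirements listed further down.
    terraform version
  else
    echo "terraform not found; install it first" >&2
    return 1
  fi
}
```

Run `check_terraform` once after cloning; a non-zero exit status means the CLI still needs to be installed.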

Step 2:

Navigate to the /adb-splunk folder, run terraform init and then terraform apply, and type yes when prompted. This deploys the infrastructure to your Azure subscription: a resource group, a VNet with 3 subnets, a Databricks workspace, a VM, and a storage account.
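
Step 2 can be written as commands roughly as follows. This sketch assumes you start from the repo root and that the folder is named adb-splunk as stated above. The wrapper name `deploy_adb_splunk` is hypothetical; its body runs in a subshell so your current directory is left unchanged.

```shell
# Hypothetical wrapper (not part of this repo) around the two Terraform
# commands from Step 2. The parentheses make the body run in a subshell.
deploy_adb_splunk() (
  cd adb-splunk || exit 1
  # Download the providers and initialise the working directory,
  # then show the plan and wait for "yes" before creating resources.
  terraform init && terraform apply
)
```

Calling `deploy_adb_splunk` is equivalent to typing the two commands by hand inside the folder; `terraform apply` still prompts for confirmation.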

Step 3:

Terraform prints the VM's public IP address as an output (splunk_public_ip). Use it to replace the IP in http://20.212.33.56:8000, then log in with the default username and password (admin and password). This brings you to the Splunk landing page.
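
Rather than copying the address by hand, the URL can be assembled from the Terraform output. This is a sketch that assumes it is run from the folder where `terraform apply` was executed and that your Terraform version supports `terraform output -raw`; the helper name `splunk_url` is hypothetical.

```shell
# Hypothetical helper (not part of this repo): build the Splunk UI URL
# from the splunk_public_ip output. Port 8000 is taken from Step 3.
splunk_url() {
  local ip
  # -raw prints the bare string value without quotes.
  ip=$(terraform output -raw splunk_public_ip) || return 1
  echo "http://${ip}:8000"
}
```

Running `splunk_url` prints the address to open in a browser.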

Step 4:

Once logged in to the Splunk UI, go to the Databricks connector and follow the instructions to interact with Databricks clusters from within Splunk.

Step 5:

(Clean up resources) To remove all resources, run terraform destroy.

Requirements

| Name | Version |
|------|---------|
| azurerm | >=2.83.0 |
| databricks | >=0.5.1 |
| tls | >=3.1 |

Providers

| Name | Version |
|------|---------|
| azurerm | 3.11.0 |
| external | 2.2.2 |
| local | 2.2.3 |
| random | 3.3.2 |
| tls | 3.4.0 |

Modules

| Name | Source | Version |
|------|--------|---------|
| adls_content | ./modules/adls_content | n/a |

Resources

| Name | Type |
|------|------|
| azurerm_databricks_workspace.this | resource |
| azurerm_linux_virtual_machine.example | resource |
| azurerm_network_interface.splunk-nic | resource |
| azurerm_network_security_group.this | resource |
| azurerm_public_ip.splunk-nic-pubip | resource |
| azurerm_resource_group.this | resource |
| azurerm_storage_blob.splunk_databricks_app_file | resource |
| azurerm_storage_blob.splunk_setup_file | resource |
| azurerm_subnet.private | resource |
| azurerm_subnet.public | resource |
| azurerm_subnet.splunksubnet | resource |
| azurerm_subnet_network_security_group_association.private | resource |
| azurerm_subnet_network_security_group_association.public | resource |
| azurerm_virtual_machine_extension.splunksetupagent | resource |
| azurerm_virtual_network.this | resource |
| local_file.private_key | resource |
| local_file.setupscript | resource |
| random_string.naming | resource |
| tls_private_key.splunk_ssh | resource |
| azurerm_client_config.current | data source |
| external_external.me | data source |

Inputs

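| Name | Description | Type | Default | Required |
|------|-------------|------|---------|----------|
| dbfs_prefix | n/a | string | "dbfs" | no |
| no_public_ip | n/a | bool | true | no |
| private_subnet_endpoints | n/a | list | [] | no |
| rglocation | n/a | string | "southeastasia" | no |
| spokecidr | n/a | string | "10.179.0.0/20" | no |
| workspace_prefix | n/a | string | "adb" | no |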
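
All inputs have defaults, so none need to be set. To override some of them, one option is Terraform's `-var` flag, sketched below. The values (`westeurope`, `demo`, `10.180.0.0/20`) are illustrative assumptions, not recommendations from this repo, and the wrapper name `deploy_custom` is hypothetical.

```shell
# Hypothetical wrapper (not part of this repo): apply with a few inputs
# overridden on the command line. All values here are examples only.
deploy_custom() {
  terraform apply \
    -var 'rglocation=westeurope' \
    -var 'workspace_prefix=demo' \
    -var 'spokecidr=10.180.0.0/20'
}
```

Inputs left off the command line keep the defaults from the table above.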

Outputs

| Name | Description |
|------|-------------|
| databricks_azure_workspace_resource_id | n/a |
| splunk_public_ip | n/a |
| workspace_url | n/a |