sajohnstone/azure-databricks

Terraform Azure Databricks Environment

This project provides Terraform configurations to deploy an example Databricks environment to Azure. Tooling, including Terraform itself, runs in Docker containers, and a Makefile streamlines common operations. Note: since creating this repo I have found https://github.com/databricks/terraform-databricks-sra, which is an excellent resource.

Project Structure

  • main.tf: Terraform configuration for deploying Databricks resources on Azure.
  • variables.tf: Definitions for Terraform input variables.
  • Dockerfile: Docker configuration for setting up the environment with necessary tools.
  • Makefile: A file containing commands to manage the Terraform environment and tooling.
  • README.md: This file.
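
As an illustration of what variables.tf might contain, here is a minimal sketch of two input definitions. The variable names, descriptions, and the "australiaeast" default come from the Inputs section of this README; the validation rule is an assumption added for illustration and may not exist in the actual file.

```hcl
# Sketch only: names and descriptions follow the Inputs section of this README.
variable "environment" {
  type        = string
  description = "(Required) Three character environment name"

  # Assumption: the repository may not enforce the length this way.
  validation {
    condition     = length(var.environment) == 3
    error_message = "The environment name must be exactly three characters."
  }
}

variable "location" {
  type        = string
  description = "(Optional) The location for resource deployment"
  default     = "australiaeast"
}
```

A validation block like this makes Terraform reject a bad value at plan time instead of partway through a deployment.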

Getting Started

Prerequisites

  • Docker installed on your machine.
  • Make installed on your machine.

SAT Tool

To use the SAT tool, the service principal 'SP for Security Analysis Tool', which this configuration creates, must be given Account Admin rights. This step cannot be automated, so the process is:

  1. Make the SP for Security Analysis Tool an Account Admin
  2. Run the 'SAT Initializer Notebook (one-time)' job

Docker Setup

The Docker container is configured to run Terraform and the other tooling required for this project. It maps a volume to bring in your Azure and Databricks credentials, so you will need to either configure these credentials before running the container or update the configuration to use the appropriate secrets.

Repo Setup

  • core: Deploy this first. It creates a VNet and Bastion that can be used to deploy Databricks.
  • databricks: If you enable full PrivateLink, you must deploy from within your Azure VNet or the deployment will fail.

References

  • Databricks SAT tool: https://github.com/databricks-industry-solutions/security-analysis-tool
  • Databricks Dashboards: https://github.com/databricks/tmm/tree/main/System-Tables-Demo

Commands

The project includes a Makefile with several commands to help manage your Terraform configurations. Here’s a brief overview of each command:

  • make apply: Deploys the resources defined in your Terraform configuration to Azure.

    make apply
  • make check-security: Performs static analysis on your Terraform templates to identify potential security issues.

    make check-security
  • make destroy: Destroys all the resources created by the Terraform configuration.

    make destroy
  • make documentation: Generates the README.md file for your project.

    make documentation
  • make format: Rewrites all Terraform configuration files to a canonical format.

    make format
  • make lint: Checks for possible errors and best practices in your Terraform configuration.

    make lint
  • make plan: Shows the deployment plan for your Terraform configuration, outlining what changes will be made.

    make plan

Terraform Documentation

Requirements

| Name | Version |
|------|---------|
| terraform | >= 1.0 |
| azurerm | ~> 3.1 |
| databricks | ~> 1.4 |
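
These constraints would typically be declared in a terraform block. The following is a minimal sketch, assuming the standard registry sources hashicorp/azurerm and databricks/databricks; the file in which the repository declares them is not shown in this README.

```hcl
terraform {
  required_version = ">= 1.0"

  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm" # assumed registry source
      version = "~> 3.1"            # allows any 3.x release from 3.1 onwards
    }
    databricks = {
      source  = "databricks/databricks" # assumed registry source
      version = "~> 1.4"                # allows any 1.x release from 1.4 onwards
    }
  }
}
```

The pessimistic operator (~>) pins the major version only, which is why the resolved provider versions can be much newer than the minimums listed here.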

Providers

| Name | Version |
|------|---------|
| azurerm | 3.113.0 |
| databricks.workspace | 1.49.0 |
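
The name databricks.workspace indicates that the Databricks provider is configured under the alias workspace. A sketch of how such a configuration might look is below. Wiring the host to the workspace resource and passing the subscription ID from the input variable are assumptions, not confirmed by this README.

```hcl
provider "azurerm" {
  features {}

  # Assumption: the subscription comes from the azure_subscription_id input.
  subscription_id = var.azure_subscription_id
}

provider "databricks" {
  alias = "workspace"

  # Assumption: the provider targets the workspace created by this configuration.
  host = azurerm_databricks_workspace.this.workspace_url
}
```

With an aliased provider, each Databricks resource and data source has to select it explicitly with provider = databricks.workspace.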

Modules

No modules.

Resources

| Name | Type |
|------|------|
| azurerm_databricks_access_connector.external | resource |
| azurerm_databricks_workspace.this | resource |
| azurerm_key_vault.this | resource |
| azurerm_network_security_group.private | resource |
| azurerm_network_security_group.public | resource |
| azurerm_resource_group.this | resource |
| azurerm_role_assignment.external | resource |
| azurerm_storage_account.this | resource |
| azurerm_storage_container.this | resource |
| azurerm_subnet.private | resource |
| azurerm_subnet.public | resource |
| azurerm_subnet_network_security_group_association.private | resource |
| azurerm_subnet_network_security_group_association.public | resource |
| azurerm_virtual_network.this | resource |
| databricks_catalog.sandbox | resource |
| databricks_catalog.sandbox_new | resource |
| databricks_cluster.this | resource |
| databricks_external_location.external | resource |
| databricks_job.catalog_migration | resource |
| databricks_metastore_assignment.this | resource |
| databricks_notebook.create_sample_data | resource |
| databricks_notebook.migrate_data | resource |
| databricks_notebook.run_tests | resource |
| databricks_storage_credential.external | resource |
| azurerm_client_config.current | data source |
| databricks_node_type.smallest | data source |
| databricks_spark_version.latest_lts | data source |
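
To show how the two Databricks data sources above usually feed the cluster resource, here is a minimal sketch following the common pattern from the Databricks provider documentation. The cluster name, worker count, and auto-termination value are hypothetical, and the repository's actual arguments may differ.

```hcl
# Pick the smallest node type that has a local disk.
data "databricks_node_type" "smallest" {
  provider   = databricks.workspace
  local_disk = true
}

# Pick the latest long-term-support Databricks runtime.
data "databricks_spark_version" "latest_lts" {
  provider          = databricks.workspace
  long_term_support = true
}

resource "databricks_cluster" "this" {
  provider                = databricks.workspace
  cluster_name            = "example-cluster" # hypothetical name
  spark_version           = data.databricks_spark_version.latest_lts.id
  node_type_id            = data.databricks_node_type.smallest.id
  autotermination_minutes = 20 # hypothetical value
  num_workers             = 1  # hypothetical value
}
```

Resolving the node type and runtime through data sources avoids hard-coding identifiers that differ between regions and change over time.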

Inputs

| Name | Description | Type | Default | Required |
|------|-------------|------|---------|----------|
| azure_subscription_id | The ID of the Azure subscription | string | n/a | yes |
| databricks_account_id | (Required) The ID of the Databricks account | string | n/a | yes |
| databricks_sku | (Optional) The SKU to use for the Databricks instance | string | n/a | yes |
| environment | (Required) Three character environment name | string | n/a | yes |
| location | (Optional) The location for resource deployment | string | "australiaeast" | no |
| metastore_id | (Required) The ID of the Metastore | string | n/a | yes |
| project | (Required) The project name | string | n/a | yes |
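
One way to supply these inputs is a terraform.tfvars file. The sketch below uses placeholder values throughout: the IDs, SKU, environment, and project name are hypothetical and need to be replaced with your own.

```hcl
# terraform.tfvars (all values are placeholders)
azure_subscription_id = "00000000-0000-0000-0000-000000000000"
databricks_account_id = "00000000-0000-0000-0000-000000000000"
databricks_sku        = "premium"
environment           = "dev"
location              = "australiaeast"
metastore_id          = "00000000-0000-0000-0000-000000000000"
project               = "demo"
```

Because location has a default, that line can be omitted if "australiaeast" suits your deployment.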

Outputs

No outputs.
