CI/CD Reference

This comprehensive guide covers enterprise CI/CD patterns for deploying Lakehouse Plumber pipelines with Databricks Asset Bundles across development, testing, and production environments. It includes modern DataOps workflows and practical examples for GitHub Actions, Azure DevOps, and Bitbucket.

Prerequisites

Enterprise deployment of LHP requires:

  • Source control (Git)

  • CI/CD platform (GitHub Actions, Azure DevOps, Bitbucket Pipelines)

  • Databricks Asset Bundles (DABs)

CI/CD Overview

Lakehouse Plumber supports enterprise-grade CI/CD workflows that follow DataOps best practices for data pipeline deployment. The framework enables multiple deployment strategies while maintaining version consistency, audit trails, and robust state management.

Core CI/CD Principles:

  • Single Source of Truth: YAML configurations are the authoritative source; Python files are ephemeral build artifacts

  • Version Consistency: Same commit SHA deployed across all environments ensures identical business logic

  • Environment Isolation: Different substitution files (dev.yaml, test.yaml, prod.yaml) provide environment-specific configurations

  • Approval Gates: Automated dev/test deployment with manual production approval requirements

  • Rollback Capability: Complete rollback to any previous version

Important

Generated Python files should never be committed to source control. They are treated as build artifacts and regenerated deterministically from YAML configurations.

This is to:
  • Prevent manual changes to Python files

  • Ensure that the Python files are always in sync with the YAML configurations

Repository Structure

Organize your repository structure to support clean CI/CD workflows and team collaboration.

Recommended repository structure
lakehouse-project/
├── .github/workflows/           # CI/CD pipeline definitions
│   ├── ci-validation.yml        # PR validation workflow
│   ├── dev-deployment.yml       # Automatic dev deployment
│   ├── test-promotion.yml       # Test environment promotion
│   ├── prod-deployment.yml      # Production deployment
│   └── monitoring.yml           # Health and deployment monitoring
├── .gitignore                   # Exclude generated files and state
├── databricks.yml               # Databricks Asset Bundle configuration
├── lhp.yaml                     # LHP project configuration (with version pinning)
├── pipelines/                   # Source pipeline definitions
│   ├── 01_raw_ingestion/
│   ├── 02_bronze/
│   ├── 03_silver/
│   └── 04_gold/
├── substitutions/               # Environment-specific configurations
│   ├── dev.yaml
│   ├── test.yaml
│   └── prod.yaml
├── presets/                     # Reusable configuration patterns
├── templates/                   # Reusable action patterns
├── expectations/                # Data quality definitions
├── schemas/                     # Schema definitions
├── generated/                   # Generated Python code (gitignored)
│   ├── dev/                     # Development environment code
│   ├── test/                    # Test environment code
│   └── prod/                    # Production environment code
├── resources/                   # Generated resource YAMLs (gitignored)
│   └── lhp/
│       ├── dev/                 # Development environment resources
│       ├── test/                # Test environment resources
│       └── prod/                # Production environment resources
├── scripts/                     # Deployment and monitoring scripts
│   ├── integration-tests.sh
│   ├── health-check.py
│   └── deployment-notify.sh
└── docs/                        # Project documentation
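
Because generated/ and resources/lhp/ are build artifacts, exclude them from source control. A minimal .gitignore sketch (entries assume the layout above; adjust to your project):

Example .gitignore entries
# Generated build artifacts (recreated by lhp generate)
generated/
resources/lhp/

# Local LHP state file (CI always regenerates from a clean environment)
.lhp_state.json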

See also

For more information on the repository structure see Concepts & Architecture.

Version Management

Lakehouse Plumber supports semantic version (semver) pinning in lhp.yaml for reproducible builds across environments.

Version Pinning in lhp.yaml:

lhp.yaml with version pinning
name: acme_edw
version: "1.0"
description: "acme Delta Lakehouse Project - TPC-H"
author: "Joe Bloggs"
created_date: "2025-07-11"
required_lhp_version: ">=0.5.0,<0.6.0"

include:

Benefits of Version Pinning:

  • Reproducible Builds: Same LHP version across all environments

  • Controlled Upgrades: Test new versions in dev before production

  • Dependency Management: Lock to compatible versions with your pipelines

  • CI/CD Stability: Prevent unexpected changes from automatic updates
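
In CI, install the pinned version explicitly so every runner resolves the same release. A minimal sketch (the exact pin is illustrative; it must satisfy required_lhp_version in lhp.yaml):

Installing the pinned LHP version in CI
# Pin the exact release that satisfies required_lhp_version (">=0.5.0,<0.6.0" above)
pip install "lakehouse-plumber==0.5.0"

# Print the resolved version so the build log records what generated the code
lhp --version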

Environment-Specific Generation:

As of LHP 0.5.0, generated code and resource YAMLs are organized by environment:

generated/
├── dev/
│   └── pipeline_code.py
├── test/
│   └── pipeline_code.py
└── prod/
    └── pipeline_code.py

resources/
└── lhp/
    ├── dev/
    │   └── pipeline.yml
    ├── test/
    │   └── pipeline.yml
    └── prod/
        └── pipeline.yml

This structure provides:

  • Clear Separation: No accidental cross-environment deployments

  • Environment-Specific Configuration: for example, different cluster configurations in the DABs pipeline.yml for each environment

Deployment Strategies

Lakehouse Plumber supports multiple CI/CD deployment strategies to fit different organizational needs and maturity levels.

Branch-Based Promotion

Branch-based promotion uses separate branches for environment targeting.

When Branch-Based Promotion Might Be Appropriate:

  • Large, Distributed Data Teams: Organizations with multiple data engineering teams working on independent data domains might benefit from GitFlow approaches. Each team can maintain their own feature branches while coordinating releases through structured merge processes.

  • Regulated Industries: Financial services, healthcare, or other highly regulated industries may require the formal approval processes and audit trails that branch-based promotion provides. The structured release workflow can satisfy compliance requirements.

  • Complex Release Coordination: Organizations deploying large data platform updates quarterly or annually might prefer the predictable release cycles that GitFlow supports. This allows coordinating multiple team contributions into scheduled releases.

Branch Strategy:

Branch-based deployment triggers
on:
  push:
    branches:
      - main          # Triggers dev deployment
      - release/test  # Triggers test deployment
      - release/prod  # Triggers prod deployment

Promotion Process:

Branch promotion workflow
# Develop on feature branches
git checkout -b feature/customer-pipeline
git commit -m "Add customer segmentation pipeline"
git push origin feature/customer-pipeline

# Merge to main triggers dev deployment
git checkout main
git merge feature/customer-pipeline
git push origin main  # → Dev deployment

# Promote to test environment
git checkout release/test
git merge main
git push origin release/test  # → Test deployment

# Promote to production (with approval)
git checkout release/prod
git merge release/test
git push origin release/prod  # → Prod deployment (after approval)

Anatomy of branch-based promotion:

  • Branch Protection: Each environment branch has protection rules and required reviewers

  • Linear Progression: Changes flow through main → release/test → release/prod

  • Approval Gates: Production branch requires pull request approval before merge

  • Environment Isolation: Each branch represents a deployment environment

  • Rollback Strategy: Revert commits on environment branches for rollbacks (see the sketch below)
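
For example, rolling back production by reverting on the environment branch (a sketch; <bad_merge_sha> is the merge that introduced the issue):

Rollback via revert on an environment branch
git checkout release/prod
git revert -m 1 <bad_merge_sha>   # -m 1 keeps the mainline parent of the merge
git push origin release/prod      # re-triggers prod deployment with the previous logic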

Deployment Strategy summary

Regardless of which CI/CD strategy you choose, the key is that YAML configurations are the single source of truth for data pipelines, and generated Python code is treated as a build artifact.

Environment Management

For environment management, Lakehouse Plumber uses substitution files and Databricks Asset Bundle targets to maintain consistent pipeline logic while adapting to environment-specific configurations.

Environment Architecture:

graph TB
    subgraph "Source Control"
        A[YAML Pipelines<br/>Single Source of Truth]
        B[substitutions/dev.yaml]
        C[substitutions/test.yaml]
        D[substitutions/prod.yaml]
    end

    subgraph "Generation Process"
        E[lhp generate -e dev]
        F[lhp generate -e test]
        G[lhp generate -e prod]
    end

    subgraph "Environments"
        H[DEV Environment<br/>dev_catalog.bronze<br/>Fast iteration]
        I[TEST Environment<br/>test_catalog.bronze<br/>Quality validation]
        J[PROD Environment<br/>prod_catalog.bronze<br/>Business operations]
    end

    A --> E
    A --> F
    A --> G
    B --> E
    C --> F
    D --> G

    E --> H
    F --> I
    G --> J

    style A fill:#e1f5fe
    style H fill:#e8f5e8
    style I fill:#fff3e0
    style J fill:#ffebee
    

See also

For more information on substitution files see Substitutions & Secrets.

See also

For more information on Databricks Asset Bundles see Databricks Asset Bundles Integration.
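
As a rough illustration, a dev substitution file might map environment-specific tokens such as the catalog name (the keys below are hypothetical; see Substitutions & Secrets for the actual format):

Illustrative substitutions/dev.yaml
# Hypothetical keys, shown only to illustrate per-environment values
catalog: dev_catalog
bronze_schema: bronze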

Environment-Specific Configuration Files

In addition to substitution files, LHP supports environment-specific pipeline and job configuration files for fine-grained control over compute resources, notifications, and scheduling per environment.

Recommended file structure:

config/
├── pipeline_config-dev.yaml    # Dev: smaller clusters, no notifications
├── pipeline_config-prod.yaml   # Prod: larger clusters, full alerting
├── job_config-dev.yaml         # Dev: relaxed timeouts
└── job_config-prod.yaml        # Prod: strict SLAs, schedules

Common environment-specific differences:

Setting              Development                      Production
-------------------  -------------------------------  ---------------------------
Cluster size         Smaller nodes (cost efficiency)  Larger nodes (performance)
Concurrency          1-2 concurrent runs              3+ concurrent runs
Notifications        Minimal or none                  Full alerting to ops teams
Timeouts             Relaxed (for debugging)          Strict (SLA enforcement)
Performance target   STANDARD                         PERFORMANCE_OPTIMIZED
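
As a rough illustration of how these differences might be expressed (the field names below are hypothetical; see the Configuration Management reference below for the real schema):

Illustrative config/pipeline_config-dev.yaml
# Hypothetical field names, shown only to illustrate dev/prod differences
cluster_workers: 1        # smaller dev cluster
max_concurrent_runs: 1    # limited concurrency in dev
notifications: []         # no alerting in dev
performance_target: STANDARD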

Usage in CI/CD:

# Development deployment
lhp generate -e dev -pc config/pipeline_config-dev.yaml
lhp deps -jc config/job_config-dev.yaml --bundle-output

# Production deployment
lhp generate -e prod -pc config/pipeline_config-prod.yaml
lhp deps -jc config/job_config-prod.yaml --bundle-output

See also

For complete configuration options and examples, see the Configuration Management section in Databricks Asset Bundles Integration.

Deployment overview using Databricks Asset Bundles

The following CI/CD workflow ensures consistency without storing generated artifacts in source control.
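
In shell terms, each CI run reduces to a clean install, a full regeneration, and a bundle deployment; a minimal sketch (the version pin is illustrative, and bundle validate is an optional sanity check):

Clean-environment generate-and-deploy sequence
# Fresh runner: no .lhp_state.json exists, so generation is a complete, deterministic rebuild
pip install "lakehouse-plumber==0.5.0"

lhp generate --env prod                   # writes generated/prod/ and resources/lhp/prod/
databricks bundle validate --target prod  # sanity-check bundle configuration
databricks bundle deploy --target prod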

State Management Flow:

flowchart TB
    subgraph "Local Development"
        A[YAML Changes] --> B[lhp generate --env dev]
        B --> C[.lhp_state.json<br/>Updated]
    end

    subgraph "CI/CD Pipeline"
        D[Clean Environment<br/>No state file] --> E[lhp generate --env prod]
        E --> F[Complete Regeneration<br/>Deterministic]
        F --> G[databricks bundle deploy --target prod]
        G --> H[Record Deployment<br/>Success/Failure]
    end


    C -.-> D

    style C fill:#e8f5e8
    style F fill:#fff3e0
    style G fill:#e1f5fe
    

CI/CD Deployment Workflows

Deployment workflows orchestrate the complete process from source changes to production deployment with appropriate validation and approval gates.

Complete Deployment Pipeline:

flowchart TB
    subgraph "Pull Request Validation"
        A[PR Created] --> B[YAML Lint Check]
        B --> C[LHP Validate]
        C --> D[Security Scan]
        D --> E[Dry-run Generation]
        E --> F[Schema Validation]
        F --> G{All Checks Pass?}
        G -->|No| H[❌ Block Merge]
        G -->|Yes| I[✅ Allow Merge]
    end

    subgraph "Deployment Pipeline"
        I --> J[Merge to Main]
        J --> K[🚀 Deploy DEV]
        K --> L[Integration Tests]
        L --> M{Dev Tests Pass?}
        M -->|No| N[🔄 Rollback DEV]
        M -->|Yes| O[📊 Record Success]
        O --> P[Developer Creates<br/>v1.2.3-test Tag]
        P --> Q[🔄 Deploy TEST]
        Q --> R[Comprehensive Tests]
        R --> S{Test Validation?}
        S -->|No| T[🔄 Rollback TEST]
        S -->|Yes| U[Developer Creates<br/>v1.2.3-prod Tag]
        U --> V[⚠️ Approval Gate]
        V --> W[🚀 Deploy PROD]
        W --> X[Health Checks]
        X --> Y[📊 Success Metrics]
    end

    style K fill:#e8f5e8
    style Q fill:#fff3e0
    style W fill:#ffebee
    style V fill:#f3e5f5
    

Pull Request Validation

Comprehensive validation ensures code quality before changes reach deployment pipelines.

Example PR validation workflow with security hardening
name: PR Validation

on:
  pull_request:
    branches: [main]

concurrency:
  group: pr-${{ github.event.pull_request.number }}
  cancel-in-progress: true

permissions:
  contents: read
  id-token: write
  pull-requests: write  # For PR comments

jobs:
  validate:
    runs-on: ubuntu-latest
    timeout-minutes: 15

    steps:
      - uses: actions/checkout@b4ffde65f46336ab88eb53be808477a3936bae11  # v4.1.1
        with:
          fetch-depth: 0

      - name: Setup Python
        uses: actions/setup-python@0a5c61591373683505ea898e09a3ea4f39ef2b9c  # v5.0.0
        with:
          python-version: '3.10'
          cache: 'pip'

      - name: Install Dependencies
        run: |
          pip install --upgrade pip
          pip install lakehouse-plumber==0.3.8

      - name: LHP Configuration Validation
        run: |
          lhp validate --env dev --verbose
          lhp validate --env test --verbose
          lhp validate --env prod --verbose

      - name: Dry-Run Generation Test
        run: |
          lhp generate --env dev --dry-run --verbose

      - name: Security Scan
        uses: gitleaks/gitleaks-action@cb7149a9b57195b609c63e8518d2c6056677d2d0  # v2.3.3
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}  # Required by gitleaks-action

      - name: Comment PR Status
        if: always()
        uses: actions/github-script@60a0d83039c74a4aee543508d2ffcb1c3799cdea  # v7.0.1
        with:
          script: |
            const status = '${{ job.status }}' === 'success' ? '✅' : '❌';
            github.rest.issues.createComment({
              issue_number: context.issue.number,
              owner: context.repo.owner,
              repo: context.repo.repo,
              body: `${status} Validation ${{ job.status }}`
            })

Development Environment Deployment:

As indicated in the flowchart above, the development environment deployment is triggered by a push to the main branch.

Example automatic dev deployment workflow
dev-deployment:
  runs-on: ubuntu-latest
  if: github.ref == 'refs/heads/main' && github.event_name == 'push'

  steps:
    - uses: actions/checkout@v4

    - name: Generate Pipeline Code
      run: |
        lhp generate --env dev
        # Output: generated/dev/ and resources/lhp/dev/

    - name: Deploy to Databricks
      run: databricks bundle deploy --target dev
      env:
        DATABRICKS_TOKEN: ${{ secrets.DATABRICKS_DEV_TOKEN }}
        DATABRICKS_HOST: ${{ secrets.DATABRICKS_DEV_HOST }}

    - name: Run Integration Tests
      run: ./scripts/integration-tests.sh dev

    - name: Record Deployment
      run: |
        echo '{
          "timestamp": "'$(date -u +%Y-%m-%dT%H:%M:%SZ)'",
          "commit_hash": "'$GITHUB_SHA'",
          "environment": "dev",
          "lhp_version": "'$(lhp --version)'",
          "pipeline_files": '$(find generated/dev/ -name "*.py" | jq -R . | jq -s .)',
          "resource_files": '$(find resources/lhp/dev/ -name "*.yml" | jq -R . | jq -s .)'
        }' > deployment-manifest-dev.json

Anatomy of deployment workflows:

  • Validation Gates: Multiple validation steps before any deployment

  • Environment Isolation: Separate credentials and configurations per environment

  • Test Integration: Automated testing after deployment (an illustrative script follows this list)

  • Audit Logging: Complete record of deployment activities

  • Failure Handling: Clear error messages and rollback procedures
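
The integration test script itself is project-specific; one minimal sketch runs a bundle-defined pipeline end to end and fails the CI job on error (<pipeline_name> is a placeholder for a resource in your bundle):

Illustrative scripts/integration-tests.sh
#!/usr/bin/env bash
set -euo pipefail
ENV="${1:?usage: integration-tests.sh <env>}"

# Run a pipeline defined in the bundle; a non-zero exit fails the CI job
databricks bundle run <pipeline_name> -t "$ENV"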

Important

The above example code is not complete and is only for demonstration purposes.

Warning

Databricks recommends using OAuth for authentication to Databricks rather than secrets or tokens.

Test Environment Promotion

Test environment promotion is triggered by developer-created tags and includes comprehensive testing.

Example test environment promotion workflow
test-promotion:
  runs-on: ubuntu-latest
  if: startsWith(github.ref, 'refs/tags/v') && endsWith(github.ref, '-test')

  steps:
    - uses: actions/checkout@v4
      with:
        ref: ${{ github.ref }}  # Checkout the tagged commit

    - name: Validate Tag Format
      run: |
        if [[ ! "${{ github.ref_name }}" =~ ^v[0-9]+\.[0-9]+\.[0-9]+-test$ ]]; then
          echo "❌ Invalid tag format. Use: v1.2.3-test"
          exit 1
        fi

    - name: Generate for Test Environment
      run: |
        lhp generate --env test
        # Output: generated/test/ and resources/lhp/test/

    - name: Deploy to Test Environment
      run: databricks bundle deploy --target test
      env:
        DATABRICKS_TOKEN: ${{ secrets.DATABRICKS_TEST_TOKEN }}
        DATABRICKS_HOST: ${{ secrets.DATABRICKS_TEST_HOST }}

    - name: Run Comprehensive Tests
      run: |
        ./scripts/smoke-tests.sh test
        ./scripts/data-quality-tests.sh test
        ./scripts/performance-tests.sh test

Selective Test Execution (Changed Pipelines Only) - COMING SOON

Note

This feature is coming soon and will integrate with LHP “Test” actions.

Note

Future roadmap: an lhp impacted-pipelines command will accept changed paths or refs and output impacted pipeline names (and bundle resource names) for use with databricks bundle run <pipeline_name> -t <env>.

Production Deployment with Approval

Production deployment requires explicit approval and includes comprehensive validation and monitoring setup.

Example production deployment with approval workflow
prod-deployment:
  runs-on: ubuntu-latest
  if: startsWith(github.ref, 'refs/tags/v') && endsWith(github.ref, '-prod')
  environment:
    name: production
    url: https://prod-workspace.databricks.com

  steps:
    - uses: actions/checkout@v4
      with:
        ref: ${{ github.ref }}

    # Pre-deployment Validation handled by tag-based promotion and required approvals

    - name: Generate Production Configuration
      run: |
        lhp generate --env prod
        # Output: generated/prod/ and resources/lhp/prod/

    - name: Production Deployment (manual approval gate)
      run: databricks bundle deploy --target prod --mode production
      env:
        DATABRICKS_TOKEN: ${{ secrets.DATABRICKS_PROD_TOKEN }}
        DATABRICKS_HOST: ${{ secrets.DATABRICKS_PROD_HOST }}

    - name: Post-deployment Verification
      run: |
        ./scripts/production-health-check.sh
        ./scripts/validate-deployment.sh prod

    - name: Setup Monitoring
      run: ./scripts/setup-production-monitoring.sh

    - name: Notify Stakeholders
      run: |
        ./scripts/notify-deployment-success.sh prod ${{ github.ref_name }}

Important

The above example code is not complete and is only for demonstration purposes.

Warning

Databricks recommends using OAuth for authentication to Databricks rather than secrets or tokens.

Anatomy of production deployment:

  • Environment Protection: GitHub environment with required reviewers

  • Pre-deployment Validation: Ensures proper progression from test environment

  • Production Mode: Databricks bundle deployed with production-level validation

  • Health Checks: Comprehensive post-deployment verification (an illustrative script follows this list)

  • Monitoring Setup: Automated monitoring and alerting configuration

  • Stakeholder Communication: Automated notifications to relevant teams
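
Health-check scripts are also project-specific; one minimal sketch polls pipeline states through the Databricks CLI (assumes jq on the runner and that the CLI's JSON output is a flat array of pipeline states, which may vary by CLI version):

Illustrative scripts/production-health-check.sh
#!/usr/bin/env bash
set -euo pipefail

# Flag any pipeline whose current state is FAILED
failed=$(databricks pipelines list-pipelines -o json | jq -r '.[] | select(.state == "FAILED") | .name')
if [[ -n "$failed" ]]; then
  echo "Unhealthy pipelines:"
  echo "$failed"
  exit 1
fi
echo "All pipelines healthy"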

Rollback Procedures

Rollback procedures provide rapid recovery from deployment issues while maintaining data consistency and audit trails.

Emergency Rollback Flow:

flowchart TD
    A[🚨 Production Issue Detected] --> B{Issue Severity?}
    B -->|Critical| C[Emergency Rollback<br/>Sub-10 minutes]
    B -->|Minor| D[Planned Rollback<br/>Scheduled maintenance]

    C --> E[Identify Last Good Commit]
    E --> F[Create Rollback Tag<br/>v1.2.1-prod-rollback]
    F --> G[Auto-trigger Rollback Pipeline]
    G --> H[Deploy Previous Version<br/>Same commit SHA]
    H --> I[Critical Path Tests]
    I --> J{Tests Pass?}
    J -->|Yes| K[✅ Rollback Complete<br/>Issue Resolved]
    J -->|No| L[🆘 Escalate to Team<br/>Manual Intervention]

    D --> M[Schedule Maintenance Window]
    M --> N[Create Maintenance Tag<br/>v1.2.1-prod-maintenance]
    N --> O[Controlled Rollback]
    O --> P[Full Validation Suite]
    P --> Q[📊 Success Metrics]

    K --> R[📝 Incident Report<br/>Auto-generated]
    Q --> R
    L --> S[🚨 Page On-call Engineer]

    style C fill:#ffebee
    style H fill:#fff3e0
    style K fill:#e8f5e8
    style L fill:#ff5722
    

Immediate Rollback

Fast rollback for critical production issues using previous deployment artifacts.

Emergency rollback workflow
emergency-rollback:
  runs-on: ubuntu-latest
  if: startsWith(github.ref, 'refs/tags/v') && endsWith(github.ref, '-rollback')
  environment:
    name: production-emergency

  steps:
    - uses: actions/checkout@v4
      with:
        fetch-depth: 0  # Full history and tags, so the rollback target can be resolved

    - name: Parse Rollback Target
      id: rollback-target
      run: |
        # Extract target version from tag (e.g., v1.2.1-rollback)
        ROLLBACK_VERSION=$(echo "${{ github.ref_name }}" | sed 's/-rollback$//')
        echo "rollback_version=$ROLLBACK_VERSION" >> $GITHUB_OUTPUT

        # Find the commit SHA for the target version
        ROLLBACK_COMMIT=$(git rev-list -n 1 ${ROLLBACK_VERSION}-prod)
        echo "rollback_commit=$ROLLBACK_COMMIT" >> $GITHUB_OUTPUT

    - uses: actions/checkout@v4
      with:
        ref: ${{ steps.rollback-target.outputs.rollback_commit }}

    - name: Generate Rollback Configuration
      run: |
        lhp generate --env prod
        # Regenerates from the rollback commit's YAML configurations

    - name: Deploy Rollback
      run: databricks bundle deploy --target prod --mode production
      env:
        DATABRICKS_TOKEN: ${{ secrets.DATABRICKS_PROD_TOKEN }}
        DATABRICKS_HOST: ${{ secrets.DATABRICKS_PROD_HOST }}

    - name: Verify Rollback Success
      run: |
        ./scripts/critical-path-tests.sh prod
        ./scripts/verify-rollback-success.sh

    - name: Create Incident Report
      run: |
        ./scripts/create-incident-report.sh \
          --rollback-from "$GITHUB_SHA" \
          --rollback-to "${{ steps.rollback-target.outputs.rollback_commit }}" \
          --environment "prod"
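
The workflow above is triggered by pushing a rollback tag that names the version to restore; for example:

Triggering an emergency rollback
# Restore v1.2.1 (the workflow resolves the commit behind the v1.2.1-prod tag)
git tag v1.2.1-rollback
git push origin v1.2.1-rollback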

Anatomy of rollback procedures:

  • Fast Response: Sub-10-minute rollback capability for critical issues

  • Automated Discovery: Automatic identification of rollback targets

  • Data Consistency: Streaming checkpoints prevent data loss during rollback

  • Verification: Automated testing to confirm rollback success

  • Incident Tracking: Automatic creation of incident reports and documentation

Security and Compliance

Security and compliance considerations for CI/CD workflows ensure data protection, access control, and regulatory compliance throughout the deployment pipeline.

Best Practices

Proven best practices for implementing robust CI/CD pipelines with Lakehouse Plumber.

Workflow Security Hardening

Apply these security measures to all CI/CD workflows:

Concurrency Control:

Prevent overlapping workflow runs
concurrency:
  group: ${{ github.workflow }}-${{ github.ref_type }}-${{ github.ref_name }}
  cancel-in-progress: true

Least Privilege Permissions:

Minimal required permissions
permissions:
  contents: read
  id-token: write  # Only if using OIDC

Pin Action Versions:

Use commit SHAs instead of tags
steps:
  - uses: actions/checkout@b4ffde65f46336ab88eb53be808477a3936bae11  # v4.1.1
  - uses: actions/setup-python@0a5c61591373683505ea898e09a3ea4f39ef2b9c  # v5.0.0
  - uses: databricks/setup-cli@6071bbc2e5a862e896c755360cbc7a6a970c4e37  # v0.212.2

Version Pinning:

Pin Python and LHP versions
- uses: actions/setup-python@<sha>
  with:
    python-version: '3.10'
    cache: 'pip'

- run: |
    pip install --upgrade pip
    pip install lakehouse-plumber==0.3.8  # Pin to project version

Platform-Specific Implementations

While the concepts above apply to all CI/CD platforms, this section provides specific implementation details for different platforms.

GitHub Actions Implementation

GitHub Actions is covered extensively in the examples above. Key features:

  • OIDC Auth Type: github-oidc

  • Environment Protection: Native GitHub environments

  • Secret Management: GitHub Secrets and Variables

  • Workflow Syntax: YAML with on:, jobs:, steps:

Azure DevOps Implementation

Azure DevOps Pipelines support OIDC authentication and provide enterprise features for Lakehouse Plumber deployments.

OIDC Federation Policy for Azure DevOps:

Create federation policy for Azure DevOps
databricks account service-principal-federation-policy create <SP_ID> --json '{
  "oidc_policy": {
    "issuer": "https://vstoken.dev.azure.com/<org_guid>",
    "audiences": ["api://AzureADTokenExchange"],
    "subject": "sc://<org>/<project>/<service_connection_name>"
  }
}'

Azure DevOps Pipeline Example:

azure-pipelines.yml
trigger:
  branches:
    include:
      - main
  tags:
    include:
      - v*-test
      - v*-prod

pool:
  vmImage: ubuntu-latest

variables:
  DATABRICKS_HOST: $(DATABRICKS_HOST)
  DATABRICKS_AUTH_TYPE: azure-service-principal

stages:
- stage: Validate
  condition: eq(variables['Build.Reason'], 'PullRequest')
  jobs:
  - job: ValidatePR
    steps:
    - task: UsePythonVersion@0
      inputs:
        versionSpec: '3.10'

    - script: |
        pip install --upgrade pip
        pip install lakehouse-plumber==0.3.8
      displayName: Install Dependencies

    - script: |
        lhp validate --env dev --verbose
        lhp generate --env dev --dry-run
      displayName: Validate Configuration

- stage: DeployDev
  condition: and(succeeded(), eq(variables['Build.SourceBranch'], 'refs/heads/main'))
  jobs:
  - deployment: DeployToDev
    environment: development
    strategy:
      runOnce:
        deploy:
          steps:
          - checkout: self

          - task: UsePythonVersion@0
            inputs:
              versionSpec: '3.10'

          - task: AzureCLI@2
            inputs:
              azureSubscription: 'databricks-service-connection'
              scriptType: 'bash'
              scriptLocation: 'inlineScript'
              inlineScript: |
                # Get OIDC token
                export DATABRICKS_AZURE_CLIENT_ID=$(servicePrincipalId)
                export DATABRICKS_AZURE_TENANT_ID=$(tenantId)
                export DATABRICKS_AZURE_CLIENT_SECRET=$(servicePrincipalKey)

                pip install lakehouse-plumber==0.3.8
                pip install databricks-cli

                lhp generate --env dev
                databricks bundle deploy --target dev

- stage: DeployTest
  condition: and(succeeded(), startsWith(variables['Build.SourceBranch'], 'refs/tags/v'), endsWith(variables['Build.SourceBranch'], '-test'))
  jobs:
  - deployment: DeployToTest
    environment: test
    strategy:
      runOnce:
        deploy:
          steps:
          - checkout: self

          - task: UsePythonVersion@0
            inputs:
              versionSpec: '3.10'

          - task: AzureCLI@2
            inputs:
              azureSubscription: 'databricks-service-connection'
              scriptType: 'bash'
              scriptLocation: 'inlineScript'
              inlineScript: |
                export DATABRICKS_AZURE_CLIENT_ID=$(servicePrincipalId)
                export DATABRICKS_AZURE_TENANT_ID=$(tenantId)
                export DATABRICKS_AZURE_CLIENT_SECRET=$(servicePrincipalKey)

                pip install lakehouse-plumber==0.3.8
                pip install databricks-cli

                lhp generate --env test
                databricks bundle deploy --target test

- stage: DeployProd
  condition: and(succeeded(), startsWith(variables['Build.SourceBranch'], 'refs/tags/v'), endsWith(variables['Build.SourceBranch'], '-prod'))
  jobs:
  - deployment: DeployToProd
    environment: production
    strategy:
      runOnce:
        deploy:
          steps:
          - checkout: self

          - task: UsePythonVersion@0
            inputs:
              versionSpec: '3.10'

          - task: AzureCLI@2
            inputs:
              azureSubscription: 'databricks-service-connection-prod'
              scriptType: 'bash'
              scriptLocation: 'inlineScript'
              inlineScript: |
                export DATABRICKS_AZURE_CLIENT_ID=$(servicePrincipalId)
                export DATABRICKS_AZURE_TENANT_ID=$(tenantId)
                export DATABRICKS_AZURE_CLIENT_SECRET=$(servicePrincipalKey)

                pip install lakehouse-plumber==0.3.8
                pip install databricks-cli

                lhp generate --env prod
                databricks bundle deploy --target prod --mode production

Bitbucket Pipelines Implementation

Bitbucket Pipelines support OIDC authentication and provide cloud-native CI/CD for Databricks.

OIDC Federation Policy for Bitbucket:

Create federation policy for Bitbucket
databricks account service-principal-federation-policy create <SP_ID> --json '{
  "oidc_policy": {
    "issuer": "https://api.bitbucket.org/2.0/workspaces/<workspace>/pipelines-config/identity/oidc",
    "audiences": ["ari:cloud:bitbucket::workspace/<workspace_uuid>"],
    "subject": "{<workspace_uuid>}/{<repo_uuid>}:{<environment>}:<branch_or_tag>"
  }
}'

Bitbucket Pipeline Example:

bitbucket-pipelines.yml
image: python:3.10

definitions:
  steps:
    - step: &validate
        name: Validate Configuration
        script:
          - pip install --upgrade pip
          - pip install lakehouse-plumber==0.3.8
          - lhp validate --env dev --verbose
          - lhp generate --env dev --dry-run

    - step: &deploy-dev
        name: Deploy to Development
        deployment: development
        oidc: true
        script:
          - export DATABRICKS_CLIENT_ID=$DATABRICKS_CLIENT_ID
          - export DATABRICKS_HOST=$DATABRICKS_DEV_HOST
          - export DATABRICKS_AUTH_TYPE=bitbucket-oidc
          - export DATABRICKS_OIDC_TOKEN=$BITBUCKET_STEP_OIDC_TOKEN

          - pip install --upgrade pip
          - pip install lakehouse-plumber==0.3.8
          - curl -fsSL https://raw.githubusercontent.com/databricks/setup-cli/main/install.sh | sh

          - lhp generate --env dev
          - databricks bundle deploy --target dev

    - step: &deploy-test
        name: Deploy to Test
        deployment: test
        oidc: true
        script:
          - export DATABRICKS_CLIENT_ID=$DATABRICKS_CLIENT_ID
          - export DATABRICKS_HOST=$DATABRICKS_TEST_HOST
          - export DATABRICKS_AUTH_TYPE=bitbucket-oidc
          - export DATABRICKS_OIDC_TOKEN=$BITBUCKET_STEP_OIDC_TOKEN

          - pip install --upgrade pip
          - pip install lakehouse-plumber==0.3.8
          - curl -fsSL https://raw.githubusercontent.com/databricks/setup-cli/main/install.sh | sh

          - lhp generate --env test
          - databricks bundle deploy --target test

    - step: &deploy-prod
        name: Deploy to Production
        deployment: production
        oidc: true
        script:
          - export DATABRICKS_CLIENT_ID=$DATABRICKS_CLIENT_ID
          - export DATABRICKS_HOST=$DATABRICKS_PROD_HOST
          - export DATABRICKS_AUTH_TYPE=bitbucket-oidc
          - export DATABRICKS_OIDC_TOKEN=$BITBUCKET_STEP_OIDC_TOKEN

          - pip install --upgrade pip
          - pip install lakehouse-plumber==0.3.8
          - curl -fsSL https://raw.githubusercontent.com/databricks/setup-cli/main/install.sh | sh

          - lhp generate --env prod
          - databricks bundle deploy --target prod --mode production

pipelines:
  pull-requests:
    '**':
      - step: *validate

  branches:
    main:
      - step: *deploy-dev

  tags:
    'v*-test':
      - step: *deploy-test

    'v*-prod':
      - step: *deploy-prod

  custom:
    rollback-prod:
      - variables:
          - name: ROLLBACK_VERSION
      - step:
          name: Rollback Production
          deployment: production
          oidc: true
          script:
            - export DATABRICKS_CLIENT_ID=$DATABRICKS_CLIENT_ID
            - export DATABRICKS_HOST=$DATABRICKS_PROD_HOST
            - export DATABRICKS_AUTH_TYPE=bitbucket-oidc
            - export DATABRICKS_OIDC_TOKEN=$BITBUCKET_STEP_OIDC_TOKEN

            - git checkout tags/${ROLLBACK_VERSION}-prod
            - pip install lakehouse-plumber==0.3.8
            - curl -fsSL https://raw.githubusercontent.com/databricks/setup-cli/main/install.sh | sh

            - lhp generate --env prod
            - databricks bundle deploy --target prod --mode production
99            - databricks bundle deploy --target prod --mode production