Lakehouse Plumber¶
Managing dozens of Lakeflow/DLT pipelines means thousands of lines of repetitive Python — inconsistent patterns, boilerplate sprawl, and painful maintenance across environments.
Lakehouse Plumber turns concise YAML actions into fully-featured Databricks Lakeflow Declarative Pipelines (formerly Delta Live Tables) — without hiding the Databricks platform you already know and love.
How LHP Solves It¶
Eliminates boilerplate — a template + 5-line config replaces 86 lines of Python per table.
Zero runtime overhead — pure code generation, not a runtime framework.
Transparent output — readable Python files, version-controlled and debuggable in the Databricks IDE.
Fits DataOps workflows — CI/CD, automated testing, multi-environment substitutions.
No lock-in — the output is plain Python & SQL you own and control.
Data democratization — power users can author pipeline artifacts while staying within platform standards.
Real-World Example
Instead of repeating 86 lines of Python per table, write a 5-line configuration:
pipeline: raw_ingestions
flowgroup: customer_ingestion
use_template: csv_ingestion_template
template_parameters:
  table_name: customer
  landing_folder: customer
Result: 4,300 lines of repetitive Python → 250 lines total (1 template + 50 simple configs). See Getting Started for the full template and generated output.
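For orientation, the template referenced above is itself YAML, parameterized with Jinja2 placeholders such as {{ table_name }}. Below is a minimal sketch only: the parameter and action fields shown are illustrative assumptions, not the exact LHP template schema (the Templates Reference has the real syntax).
name: csv_ingestion_template
# Illustrative sketch only: parameter and action fields are assumed,
# not the exact LHP template schema.
parameters:
  - table_name
  - landing_folder
actions:
  - name: "load_{{ table_name }}_raw"
    type: load
    source: cloudfiles
    path: "/landing/{{ landing_folder }}/*.csv"
  - name: "write_{{ table_name }}"
    type: write
    target: streaming_table
    table: "{{ table_name }}"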
Quick Start¶
Get started in minutes:
pip install lakehouse-plumber
lhp init my_project --bundle
cd my_project
# Edit your YAML flowgroups (IntelliSense auto-configured)
lhp validate --env dev
lhp generate --env dev
# Inspect the generated/ directory — readable Python ready for Databricks
Note
New to LHP? Follow the Getting Started tutorial to build your first pipeline in 10 minutes.
Core Workflow¶
The execution model is deliberately simple:
graph LR
A[Load] --> B{0..N Transform}
B --> C[Write]
Load — Ingest raw data from CloudFiles, Delta, JDBC, SQL, or custom Python.
Transform — Apply zero or many transforms (SQL, Python, schema, data-quality, temp tables…).
Write — Persist results as Streaming Tables, Materialized Views, or Snapshots.
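Concretely, a flowgroup wires these stages together as a list of actions. The sketch below is indicative only: the pipeline and flowgroup keys match the real example above, but the action fields are assumptions; the Actions Reference documents the exact schema.
pipeline: raw_ingestions
flowgroup: orders_bronze
actions:
  # Load: ingest raw files with CloudFiles (Auto Loader)
  - name: load_orders
    type: load
    source: cloudfiles
    path: "/landing/orders/"
  # Transform: zero or many steps; a SQL clean-up here
  - name: clean_orders
    type: transform
    sql: "SELECT * FROM load_orders WHERE order_id IS NOT NULL"
  # Write: persist as a Streaming Table
  - name: write_orders
    type: write
    target: streaming_table
    table: orders_bronze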
Features at a Glance¶
Pipeline Definition
Actions — Load | Transform | Write with many sub-types (see Actions Reference).
Sinks — Stream to external destinations: Delta tables, Kafka, Event Hubs, custom APIs.
CDC & SCD — change data capture with SCD Type 1 and Type 2, plus snapshot ingestion.
Append Flows — multi-source writes to a single streaming table.
Data-Quality — declarative expectations integrated into transforms, with optional quarantine mode for DLQ recycling (sketched after this group).
Seeding — seed data from existing tables using Lakeflow native features.
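A taste of the Data-Quality item above: an expectation with quarantine might be declared roughly like this. Field names here are illustrative assumptions, not LHP's exact schema; see Quarantine (Dead Letter Queue) for the real syntax.
# Illustrative only: expectation fields below are assumptions.
- name: validate_orders
  type: transform
  expectations:
    - name: valid_order_id
      expression: "order_id IS NOT NULL"
      on_violation: quarantine   # failed rows land in a DLQ table for recycling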
Reusability
Presets & Templates — reuse patterns without copy-paste.
Local Variables — flowgroup-scoped variables (%{var}) reduce repetition.
Substitutions — environment-aware tokens & secret references (sketched after this group).
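To illustrate the two items above: the %{var} syntax for local variables comes from this page, but the substitution-file layout and token values below are assumptions, so check the Substitutions & Secrets guide for the real format.
# In a flowgroup: a local variable reused across actions (illustrative)
variables:
  folder: customer
# ...actions can then reference it, e.g. path: "/landing/%{folder}/"

# In substitutions/dev.yaml: environment-aware tokens (file layout assumed)
catalog: dev_catalog
landing_root: "abfss://landing@devstorage.dfs.core.windows.net"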
Operations
Operational Metadata — custom audit columns and event log configuration.
Pipeline Monitoring — centralized event log aggregation and analysis (see Pipeline Monitoring).
Test Result Reporting — publish DQ expectation results to Azure DevOps, Delta tables, or custom systems (see Test Result Reporting (Publishing)).
Dependency Analysis — automatic dependency detection and orchestration job generation (see Dependency Analysis & Job Generation).
Smart State Management — regenerate only what changed; clean up orphaned code.
Developer Experience
IntelliSense — VS Code schema hints & YAML completion (automatically configured).
Next Steps¶
Getting Started
Getting Started – a hands-on walk-through using the ACMI demo project.
Examples – real-world examples and sink configurations.
Configuration Guides
Concepts & Architecture – deep-dive into FlowGroups, Actions, presets, templates and more.
Substitutions & Secrets – environment tokens, local variables, and secret management.
Operational Metadata – audit columns, version requirements, and event log configuration.
Multi-Flowgroup YAML Files – reduce file proliferation with multiple flowgroups per YAML file.
Actions Reference – complete reference for all action types and sub-types.
Templates Reference – comprehensive guide to creating and using templates.
Dynamic Templates Guide – conditionals, loops, and advanced Jinja2 features.
Presets Reference – reusable default configurations.
Enterprise Best Practices – enterprise patterns for naming, structure, presets, and production readiness.
Pipeline Patterns – practical patterns for multi-source ingestion, path filtering, and fan-in architectures.
Quarantine (Dead Letter Queue) – quarantine mode with DLQ recycling for data quality transforms.
Deployment & Operations
Databricks Asset Bundles Integration – integrate with Databricks Asset Bundles for production deployments.
Pipeline Monitoring – centralized event log monitoring and analysis across all pipelines.
Test Result Reporting (Publishing) – publish test results to external systems.
Dependency Analysis & Job Generation – pipeline dependency analysis and orchestration job generation.
CI/CD Reference – CI/CD patterns, deployment strategies, and DataOps best practices.
Reference
CLI Reference – command-line reference.
Error Reference – error codes, causes, and resolution steps.
API Reference – REST API reference.