Substitutions & Secrets¶
Summary¶
LakehousePlumber uses multiple substitution syntaxes, each resolved at a different stage of the generation pipeline. The table below shows all forms, their scope, and the processing order.
| Syntax | Name | Scope | Defined In |
|---|---|---|---|
| `%{var}` | Local variable | Flowgroup | `variables:` block of the flowgroup |
| `{{ param }}` | Template parameter | Template | Template file, via `template_parameters` |
| `${token}` | Environment token | Global / per-environment | `substitutions/<env>.yaml` |
| `${secret:scope/key}` | Secret reference | Global / per-environment | Pipeline YAML; scopes mapped in `substitutions/<env>.yaml` |
Processing order:
1. `%{var}`: Local variables are resolved first, within the flowgroup
2. `{{ param }}`: Template parameters are expanded via Jinja2
3. `${token}`: Environment tokens are substituted from the env file
4. `${secret:scope/key}`: Secret references are converted to secure `dbutils.secrets.get()` calls
Each phase only processes its own syntax and passes all other forms through untouched,
so tokens from later phases can safely appear in earlier contexts (e.g., ${catalog}
inside a %{var} value).
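For example, a minimal sketch (hypothetical names) of an environment token riding through the local-variable phase:

```yaml
# Hypothetical flowgroup fragment. Phase 1 expands %{target_db} to
# "${catalog}.bronze"; the ${catalog} token survives that phase untouched
# and is resolved later, during environment substitution.
variables:
  target_db: "${catalog}.bronze"

actions:
  - name: write_customers
    type: write
    source: v_customers
    write_target:
      type: streaming_table
      database: "%{target_db}"
      table: customers
```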
Environment Configuration¶
Tokens wrapped in ${token} are replaced at generation time using files under
substitutions/<env>.yaml. This enables environment-specific configurations
while keeping pipeline definitions portable.
Example substitution file:
```yaml
# Environment-specific tokens
dev:
  catalog: dev_catalog
  bronze_schema: bronze
  silver_schema: silver
  landing_path: /mnt/dev/landing
  checkpoint_path: /mnt/dev/checkpoints

# Secret configuration
secrets:
  default_scope: dev_secrets
  scopes:
    database_secrets: dev_db_secrets
    storage_secrets: dev_azure_secrets
    api_secrets: dev_external_apis
```
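Because pipeline YAML references only the token names, switching environments means supplying another file. A hypothetical `substitutions/prod.yaml` counterpart:

```yaml
# Hypothetical prod counterpart: same token names, production values,
# so the pipeline definitions themselves stay unchanged.
prod:
  catalog: prod_catalog
  bronze_schema: bronze
  silver_schema: silver
  landing_path: /mnt/prod/landing
  checkpoint_path: /mnt/prod/checkpoints

secrets:
  default_scope: prod_secrets
  scopes:
    database_secrets: prod_db_secrets
    storage_secrets: prod_azure_secrets
    api_secrets: prod_external_apis
```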
Local Variables¶
Local variables allow you to define reusable values within a single flowgroup, reducing repetition and improving maintainability. They are resolved before templates and environment substitutions.
Syntax: `%{variable_name}`

Key Features:

- Flowgroup-scoped: Variables are only accessible within the flowgroup where they're defined
- Inline substitution: Supports patterns like `prefix_%{var}_suffix`
- Strict validation: Undefined variables cause immediate errors with clear messages
- Processed first: Resolved before templates, presets, and environment substitutions
Example:
```yaml
pipeline: acmi_edw_bronze
flowgroup: customer_pipeline

variables:
  entity: customer
  source_table: customer_raw
  target_table: customer

actions:
  - name: "load_%{entity}_raw"
    type: load
    source:
      type: delta
      database: "${catalog}.${raw_schema}"
      table: "%{source_table}"
    target: "v_%{entity}_raw"

  - name: "write_%{entity}_bronze"
    type: write
    source: "v_%{entity}_cleaned"
    write_target:
      type: streaming_table
      database: "${catalog}.${bronze_schema}"
      table: "%{target_table}"
```
See also
For complete details on local variables, see Templates Reference.
Secret Management¶
Secret references use the ${secret:scope/key} syntax and are converted to
secure dbutils.secrets.get() calls in generated Python code. LHP validates
scope aliases and collects every secret used by the pipeline, making security
reviews and approvals easier.
Secret reference formats:
- `${secret:scope_alias/key}`: Uses a specific scope alias (resolved to the actual Databricks scope)
- `${secret:key}`: Uses `default_scope` if configured

A resolution sketch follows the note below.
Note
Scope aliases (like database_secrets) are mapped to actual Databricks secret scope
names (like dev_db_secrets) in the substitution file. This provides flexibility
to use different scope names across environments while keeping pipeline definitions portable.
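Putting the two formats together with the dev substitution file above, a hedged sketch (hypothetical keys) of how references resolve:

```yaml
# Alias form: database_secrets -> dev_db_secrets
password: "${secret:database_secrets/db_password}"
# -> dbutils.secrets.get(scope='dev_db_secrets', key='db_password')

# Default-scope form: falls back to default_scope (dev_secrets)
api_token: "${secret:api_token}"
# -> dbutils.secrets.get(scope='dev_secrets', key='api_token')
```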
File Substitution Support¶
Added in version Latest.
LakehousePlumber now supports substitutions in external files, providing the same environment-specific flexibility for Python functions and SQL files that you have in YAML configurations.
Supported File Types:
| File Type | Where Used |
|---|---|
| Python Files | Python transforms, batch handlers, custom datasources, snapshot CDC functions, custom sinks |
| SQL Files | SQL-based load sources and transforms |
Example Python Function with Substitutions:
```python
from typing import Optional, Tuple
from pyspark.sql import DataFrame

catalog = "${catalog}"
schema = "${bronze_schema}"

def next_customer_snapshot(latest_version: Optional[int]) -> Optional[Tuple[DataFrame, int]]:
    if latest_version is None:
        df = spark.sql(f"""
            SELECT * FROM {catalog}.{schema}.customers
            WHERE snapshot_id = 1
        """)
        return (df, 1)
    return None
```
Example SQL File with Substitutions:
```sql
SELECT
    customer_id,
    customer_name,
    '${environment}' as source_env
FROM ${catalog}.${bronze_schema}.customers
WHERE created_date >= '${cutoff_date}'
```
Secret Support in Files:
Both Python and SQL files support secret substitutions with the same syntax as YAML:
```python
# Environment token
api_endpoint = "${api_base_url}"

# Secret reference
api_key = "${secret:api_keys/service_key}"
db_password = "${secret:database/password}"
```
Processing Behavior:
Tokens and secrets are processed before the file content is used
Python files have substitutions applied before import management
SQL files have substitutions applied before query execution
Backward compatible - files without substitution variables work unchanged
Same syntax as YAML substitutions for consistency
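As a concrete illustration, a hedged before/after sketch of a Python file passing through generation, with token values taken from the dev substitution file above:

```python
# As written in the source file:
#
#   catalog = "${catalog}"
#   api_key = "${secret:api_secrets/service_key}"
#
# What the generated file might contain after substitution, given
# catalog: dev_catalog and the api_secrets alias -> dev_external_apis:
catalog = "dev_catalog"
api_key = dbutils.secrets.get(scope="dev_external_apis", key="service_key")
```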
Example pipeline with secrets:
```yaml
pipeline: customer_ingestion
flowgroup: external_load

actions:
  - name: load_from_postgres
    type: load
    source:
      type: jdbc
      url: "jdbc:postgresql://${secret:database_secrets/host}:5432/customers"
      user: "${secret:database_secrets/username}"
      password: "${secret:database_secrets/password}"
      driver: "org.postgresql.Driver"
      table: "customers"
    target: v_customers_raw
```
Generated Python code:
```python
@dp.temporary_view()
def v_customers_raw():
    """Load from external database"""
    df = spark.read \
        .format("jdbc") \
        .option("url", f"jdbc:postgresql://{dbutils.secrets.get(scope='dev_db_secrets', key='host')}:5432/customers") \
        .option("user", f"{dbutils.secrets.get(scope='dev_db_secrets', key='username')}") \
        .option("password", f"{dbutils.secrets.get(scope='dev_db_secrets', key='password')}") \
        .option("driver", "org.postgresql.Driver") \
        .option("dbtable", "customers") \
        .load()

    return df
```
Substitution Syntax¶
LakehousePlumber supports multiple substitution syntaxes for different purposes:
Local Variables (Flowgroup-scoped): `%{variable}`

```yaml
variables:
  entity: customer

actions:
  - name: "load_%{entity}_raw"
    target: "v_%{entity}_raw"
```

Environment Substitution: `${token}`

```yaml
catalog: ${my_catalog}
table: ${catalog}.${schema}.customers
```

Secret References: `${secret:scope/key}`

```yaml
password: ${secret:database/db_password}
```

Template Parameters: `{{ parameter }}`

```yaml
use_template: my_template
template_parameters:
  table_name: customer
# In the template: table: "{{ table_name }}"
```
Note
Syntax Distinction:
- `%{var}` = Local variable (flowgroup-scoped)
- `${token}` = Environment substitution
- `${secret:scope/key}` = Secret reference
- `{{ parameter }}` = Template parameter (Jinja2)
Warning
Legacy syntax: The bare {token} form (without $) is still supported for
backward compatibility but is deprecated. In external Python files (transforms,
batch handlers, custom datasources, snapshot CDC functions, custom sinks), the
{token} pattern directly collides with Python f-string syntax — if a Python
runtime variable like {catalog} in f"SELECT * FROM {catalog}.{schema}.table"
matches a substitution token name, it will be silently replaced at generation time,
breaking your code. The ${token} syntax avoids this entirely because ${} is
not valid Python f-string syntax. Use ${token} in all new configurations.
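A minimal sketch of the failure mode, assuming `catalog` is defined both as a Python runtime variable and as a substitution token (hypothetically mapped to `dev_catalog`):

```python
schema = "sales"
catalog = "main"  # Python runtime variable, same name as the token

# As written with the deprecated bare-{token} form:
query_written = f"SELECT * FROM {catalog}.{schema}.orders"

# What the generated file contains: {catalog} was textually replaced
# at generation time, so the runtime variable is no longer consulted.
query_generated = f"SELECT * FROM dev_catalog.{schema}.orders"

# Both lines are valid Python, so nothing fails at generation time;
# the bug only surfaces when the wrong catalog is queried at run time.
assert query_written != query_generated
```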
Note
Processing Order:
1. Local variables (`%{var}`) are resolved first, within the flowgroup
2. Template parameters (`{{ }}`) are resolved when templates are applied
3. Environment substitutions (`${ }`) are resolved at generation time
4. Secret references (`${secret:}`) are converted to `dbutils.secrets.get()` calls
Warning
Python Code Context: When using LHP substitution tokens inside external Python
files (batch handlers, Python transforms, custom datasources, snapshot CDC functions,
custom sinks), you must use ${} syntax. LHP applies substitution to these
files at generation time, and the legacy {token} pattern matches Python f-string
variables.
```python
# ${catalog} is replaced by LHP at generation time
default_catalog = "${catalog}"

# {default_catalog} and {table} are ordinary f-string placeholders; LHP
# leaves them alone as long as they don't match a substitution token name
spark.sql(f"SELECT * FROM {default_catalog}.{table}")
```
By contrast, a transform that reuses a token name as a runtime variable breaks:

```python
def my_transform(df, spark, parameters):
    catalog = parameters.get("catalog", "main")
    schema = parameters.get("schema", "bronze")
    # If 'catalog' is also a substitution token, LHP replaces {catalog}
    # at generation time, breaking this f-string!
    return spark.sql(f"SELECT * FROM {catalog}.{schema}.lookup")
```
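A hedged rewrite of that transform under the same assumptions: default to the generation-time values via `${}`, and rename the runtime variables so their f-string placeholders cannot match a token name:

```python
def my_transform(df, spark, parameters):
    # ${catalog}/${bronze_schema} are filled in by LHP at generation time;
    # target_catalog/target_schema are not substitution token names, so
    # their f-string placeholders pass through generation untouched.
    target_catalog = parameters.get("catalog", "${catalog}")
    target_schema = parameters.get("schema", "${bronze_schema}")
    return spark.sql(f"SELECT * FROM {target_catalog}.{target_schema}.lookup")
```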