Substitutions & Secrets
=======================

.. meta::
   :description: Environment substitutions, local variables, secret management, and file substitution support in Lakehouse Plumber.

Summary
-------

LakehousePlumber uses several substitution syntaxes, and each one is resolved at a different stage of the generation pipeline. The table below lists each form and its scope. The processing order follows the table.

.. list-table:: Substitution Types
   :header-rows: 1
   :widths: 20 25 25 30

   * - Syntax
     - Name
     - Scope
     - Defined In
   * - ``%{var}``
     - Local variable
     - Flowgroup
     - ``variables:`` section in flowgroup YAML
   * - ``{{ param }}``
     - Template parameter
     - Template
     - ``template_parameters:`` in flowgroup; consumed by Jinja2 template
   * - ``${token}``
     - Environment token
     - Global / per-environment
     - ``substitutions/.yaml``
   * - ``${secret:scope/key}``
     - Secret reference
     - Global / per-environment
     - ``substitutions/.yaml`` (scope aliases); resolved to ``dbutils.secrets.get()``

**Processing order:**

1. ``%{var}``: Local variables are resolved first, within the flowgroup.
2. ``{{ param }}``: Template parameters are expanded via Jinja2.
3. ``${token}``: Environment tokens are substituted from the environment file.
4. ``${secret:scope/key}``: Secret references are converted to secure ``dbutils.secrets.get()`` calls.

Each phase processes only its own syntax and passes every other form through untouched. Tokens that belong to a later phase can therefore appear safely in an earlier context (for example, ``${catalog}`` inside a ``%{var}`` value).

Environment Configuration
-------------------------

Tokens written as ``${token}`` are replaced at generation time with values from files under ``substitutions/.yaml``. This allows environment-specific configuration while keeping pipeline definitions portable.

**Example substitution file:**

.. code-block:: yaml
   :caption: substitutions/dev.yaml
   :linenos:
   :emphasize-lines: 10-15

   # Environment-specific tokens
   dev:
     catalog: dev_catalog
     bronze_schema: bronze
     silver_schema: silver
     landing_path: /mnt/dev/landing
     checkpoint_path: /mnt/dev/checkpoints

   # Secret configuration
   secrets:
     default_scope: dev_secrets
     scopes:
       database_secrets: dev_db_secrets
       storage_secrets: dev_azure_secrets
       api_secrets: dev_external_apis

Local Variables
---------------

**Local variables** define reusable values within a single flowgroup, which reduces repetition and improves maintainability. They are resolved **before** templates and environment substitutions.

**Syntax:** ``%{variable_name}``

**Key Features:**

- **Flowgroup-scoped**: Variables are accessible only within the flowgroup that defines them
- **Inline substitution**: Supports patterns like ``prefix_%{var}_suffix``
- **Strict validation**: An undefined variable causes an immediate error with a clear message
- **Processed first**: Resolved before templates, presets, and environment substitutions

**Example:**

.. code-block:: yaml
   :caption: pipelines/customer_bronze.yaml
   :linenos:
   :emphasize-lines: 4-7,10,15-16,18,20,24

   pipeline: acmi_edw_bronze
   flowgroup: customer_pipeline

   variables:
     entity: customer
     source_table: customer_raw
     target_table: customer

   actions:
     - name: "load_%{entity}_raw"
       type: load
       source:
         type: delta
         database: "${catalog}.${raw_schema}"
         table: "%{source_table}"
       target: "v_%{entity}_raw"

     - name: "write_%{entity}_bronze"
       type: write
       source: "v_%{entity}_cleaned"
       write_target:
         type: streaming_table
         database: "${catalog}.${bronze_schema}"
         table: "%{target_table}"

.. seealso::
   For complete details on local variables, see :doc:`templates_reference`.

Secret Management
-----------------

**Secret references** use the ``${secret:scope/key}`` syntax. They are converted to secure ``dbutils.secrets.get()`` calls in the generated Python code.
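The conversion just described can be sketched in Python. This is a hypothetical illustration and not LHP's implementation. The ``SECRET_REF`` regular expression, the ``SECRETS_CONFIG`` dictionary (a hand-copied mirror of the ``secrets:`` block in ``substitutions/dev.yaml``), and the ``resolve_scope`` and ``resolve_secret_refs`` helpers are all assumptions. The sketch maps a scope alias, or ``default_scope`` when no alias is given, to the real Databricks scope. It then emits a ``dbutils.secrets.get()`` expression in braces so the expression can sit inside a generated f-string. It also records every (scope, key) pair it resolves.

.. code-block:: python

   import re

   # Hypothetical sketch, not LHP's implementation.
   # Hand-copied mirror of the ``secrets:`` block in substitutions/dev.yaml.
   SECRETS_CONFIG = {
       "default_scope": "dev_secrets",
       "scopes": {
           "database_secrets": "dev_db_secrets",
           "storage_secrets": "dev_azure_secrets",
           "api_secrets": "dev_external_apis",
       },
   }

   # ${secret:alias/key} or ${secret:key}; group 1 (the alias) is optional.
   SECRET_REF = re.compile(r"\$\{secret:(?:([A-Za-z0-9_-]+)/)?([A-Za-z0-9_.-]+)\}")


   def resolve_scope(alias, config):
       """Map a scope alias (None means default_scope) to a real scope name."""
       if alias is None:
           scope = config.get("default_scope")
           if scope is None:
               raise ValueError("${secret:key} used but no default_scope is configured")
           return scope
       scopes = config.get("scopes", {})
       if alias not in scopes:
           raise ValueError(f"Unknown secret scope alias: {alias!r}")
       return scopes[alias]


   def resolve_secret_refs(value, config):
       """Rewrite secret references; return (new_value, {(scope, key), ...})."""
       used = set()

       def replace(match):
           scope = resolve_scope(match.group(1), config)
           key = match.group(2)
           used.add((scope, key))
           # Braces let the call sit inside a generated f-string.
           return "{" + f"dbutils.secrets.get(scope='{scope}', key='{key}')" + "}"

       return SECRET_REF.sub(replace, value), used


   url, used = resolve_secret_refs(
       "jdbc:postgresql://${secret:database_secrets/host}:5432/customers",
       SECRETS_CONFIG,
   )
   print(url)
   # jdbc:postgresql://{dbutils.secrets.get(scope='dev_db_secrets', key='host')}:5432/customers
   print(used)
   # {('dev_db_secrets', 'host')}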
LHP validates scope aliases and collects every secret the pipeline uses, which makes security reviews and approvals easier.

**Secret reference formats:**

- ``${secret:scope_alias/key}``: Uses the named scope alias, which is resolved to the actual Databricks secret scope
- ``${secret:key}``: Uses ``default_scope``, if one is configured

.. note::
   Scope aliases (such as ``database_secrets``) are mapped to actual Databricks secret scope names (such as ``dev_db_secrets``) in the substitution file. Each environment can therefore use different scope names while pipeline definitions stay portable.

File Substitution Support
-------------------------

.. versionadded:: Latest

LakehousePlumber also applies substitutions to external files. Python functions and SQL files get the same environment-specific flexibility as YAML configurations.

**Supported File Types:**

================== ==================================================
File Type          Where Used
================== ==================================================
**Python Files**   • Snapshot CDC ``source_function`` files
                   • Python transform ``module_path`` files
                   • Custom datasource ``module_path`` files
**SQL Files**      • SQL load actions with ``sql_path``
                   • SQL transform actions with ``sql_path``
================== ==================================================

**Example Python Function with Substitutions:**

.. code-block:: python
   :caption: py_functions/customer_snapshot.py
   :linenos:
   :emphasize-lines: 4-5,10

   from typing import Optional, Tuple
   from pyspark.sql import DataFrame

   source_catalog = "${catalog}"
   source_schema = "${bronze_schema}"

   def next_customer_snapshot(latest_version: Optional[int]) -> Optional[Tuple[DataFrame, int]]:
       if latest_version is None:
           df = spark.sql(f"""
               SELECT * FROM {source_catalog}.{source_schema}.customers
               WHERE snapshot_id = 1
           """)
           return (df, 1)
       return None

**Example SQL File with Substitutions:**

.. code-block:: text
   :caption: sql/customer_metrics.sql
   :linenos:
   :emphasize-lines: 4-6

   SELECT
     customer_id,
     customer_name,
     '${environment}' AS source_env
   FROM ${catalog}.${bronze_schema}.customers
   WHERE created_date >= '${cutoff_date}'

**Secret Support in Files:**

Python and SQL files both support secret references, using the same syntax as YAML:

.. code-block:: python
   :caption: Example with secrets

   # Environment token
   api_endpoint = "${api_base_url}"

   # Secret references (aliases as defined in substitutions/dev.yaml)
   api_key = "${secret:api_secrets/service_key}"
   db_password = "${secret:database_secrets/password}"

**Processing Behavior:**

- **Tokens and secrets** are processed before the file content is used
- **Python files** have substitutions applied before import management
- **SQL files** have substitutions applied before query execution
- **Backward compatible**: Files without substitution tokens work unchanged
- **Same syntax** as YAML substitutions, for consistency

**Example pipeline with secrets:**

.. code-block:: yaml
   :caption: pipelines/customer_ingestion/external_load.yaml
   :linenos:
   :emphasize-lines: 9-11

   pipeline: customer_ingestion
   flowgroup: external_load

   actions:
     - name: load_from_postgres
       type: load
       source:
         type: jdbc
         url: "jdbc:postgresql://${secret:database_secrets/host}:5432/customers"
         user: "${secret:database_secrets/username}"
         password: "${secret:database_secrets/password}"
         driver: "org.postgresql.Driver"
         table: "customers"
       target: v_customers_raw

**Generated Python code:**

.. code-block:: python
   :caption: Generated DLT code with secure secret handling
   :linenos:
   :emphasize-lines: 6-8

   @dp.temporary_view()
   def v_customers_raw():
       """Load from external database"""
       df = spark.read \
           .format("jdbc") \
           .option("url", f"jdbc:postgresql://{dbutils.secrets.get(scope='dev_db_secrets', key='host')}:5432/customers") \
           .option("user", f"{dbutils.secrets.get(scope='dev_db_secrets', key='username')}") \
           .option("password", f"{dbutils.secrets.get(scope='dev_db_secrets', key='password')}") \
           .option("driver", "org.postgresql.Driver") \
           .option("dbtable", "customers") \
           .load()
       return df

Substitution Syntax
-------------------

LakehousePlumber provides a separate substitution syntax for each purpose:

**Local Variables (Flowgroup-scoped):** ``%{variable}``

.. code-block:: yaml

   variables:
     entity: customer

   actions:
     - name: "load_%{entity}_raw"
       target: "v_%{entity}_raw"

**Environment Substitution:** ``${token}``

.. code-block:: yaml

   catalog: ${my_catalog}
   table: ${catalog}.${schema}.customers

**Secret References:** ``${secret:scope/key}``

.. code-block:: yaml

   password: ${secret:database_secrets/db_password}

**Template Parameters:** ``{{ parameter }}``

.. code-block:: yaml

   use_template: my_template
   template_parameters:
     table_name: customer

   # In template:
   table: "{{ table_name }}"

.. note::
   **Syntax Distinction:**

   - ``%{var}`` = Local variable (flowgroup-scoped)
   - ``${token}`` = Environment substitution
   - ``${secret:scope/key}`` = Secret reference
   - ``{{ parameter }}`` = Template parameter (Jinja2)

.. warning::
   **Legacy syntax:** The bare ``{token}`` form (without ``$``) is still supported for backward compatibility, but it is deprecated. In external Python files (transforms, batch handlers, custom datasources, snapshot CDC functions, custom sinks), this pattern collides directly with Python f-string syntax. Suppose a runtime variable such as ``{catalog}`` in ``f"SELECT * FROM {catalog}.{schema}.table"`` has the same name as a substitution token. LHP then silently replaces it at generation time, which breaks your code. The ``${token}`` syntax avoids this collision because ordinary f-string placeholders have no ``$`` prefix. Use ``${token}`` in all new configurations.

.. note::
   **Processing Order:**

   1. **Local variables** (``%{var}``) are resolved first, within the flowgroup
   2. **Template parameters** (``{{ }}``) are resolved when templates are applied
   3. **Environment substitutions** (``${ }``) are resolved at generation time
   4. **Secret references** (``${secret:}``) are converted to ``dbutils.secrets.get()`` calls

.. warning::
   **Python Code Context:** When you use LHP substitution tokens inside external Python files (batch handlers, Python transforms, custom datasources, snapshot CDC functions, custom sinks), you **must** use the ``${}`` syntax. LHP applies substitution to these files at generation time, and the legacy ``{token}`` pattern also matches Python f-string placeholders.

.. code-block:: python
   :caption: Correct: LHP tokens use ${}, Python variables use {}

   # ${catalog} is replaced by LHP at generation time
   default_catalog = "${catalog}"

   # {table} is a Python runtime variable; it has no $ prefix, so the
   # ${token} pattern leaves it alone
   table = "customers"
   spark.sql(f"SELECT * FROM {default_catalog}.{table}")

.. code-block:: python
   :caption: Dangerous: {catalog} collides with LHP substitution

   def my_transform(df, spark, parameters):
       catalog = parameters.get("catalog", "main")
       schema = parameters.get("schema", "default")
       # If 'catalog' is also a substitution token, LHP replaces {catalog}
       # at generation time, breaking this f-string!
       return spark.sql(f"SELECT * FROM {catalog}.{schema}.lookup")
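The following minimal, hypothetical Python sketch shows why the ``$`` prefix matters. It runs two substitution passes over a Python source string: one for ``${token}``, and one that imitates the legacy bare ``{token}`` form. The regular expressions, the ``TOKENS`` dictionary, and the ``substitute`` helper are assumptions for illustration. Running the legacy pass after the ``${token}`` pass is also an assumption, and LHP's actual matching rules may differ.

.. code-block:: python

   import re

   # Hypothetical sketch: these patterns only approximate LHP's behavior.
   TOKENS = {"catalog": "dev_catalog", "bronze_schema": "bronze"}

   # ${token}: requires the "$" prefix, which f-string placeholders do not have.
   DOLLAR_TOKEN = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")
   # Legacy {token}: looks exactly like an f-string placeholder.
   LEGACY_TOKEN = re.compile(r"\{([A-Za-z_][A-Za-z0-9_]*)\}")


   def substitute(text, pattern, tokens):
       """Replace names found in ``tokens``; leave unknown names untouched."""
       return pattern.sub(lambda m: tokens.get(m.group(1), m.group(0)), text)


   source = (
       'default_catalog = "${catalog}"\n'
       'query = f"SELECT * FROM {catalog}.{schema}.lookup"\n'
   )

   # Pass 1: ${catalog} becomes dev_catalog; the f-string placeholders survive.
   after_dollar = substitute(source, DOLLAR_TOKEN, TOKENS)
   print(after_dollar)

   # Pass 2 (legacy): "catalog" is also a token name, so the runtime
   # placeholder {catalog} is overwritten; {schema} is not a token and survives.
   after_legacy = substitute(after_dollar, LEGACY_TOKEN, TOKENS)
   print(after_legacy)

In the second output, the f-string no longer reads the runtime ``catalog`` variable. The hard-coded ``dev_catalog`` has replaced it, which is the silent breakage the warnings above describe.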