Presets Reference¶
Overview¶
Presets provide reusable configuration defaults that are automatically merged with explicit configurations in your FlowGroups and Templates. They enable consistent settings across your data platform without repetition.
Key Benefits:
Enforce organizational standards (error handling, retention policies)
Reduce configuration duplication
Simplify updates to common settings
Support configuration inheritance and precedence
How Presets Work:
Presets use implicit type matching to apply defaults. When a preset defines
load_actions.cloudfiles, those defaults automatically apply to any action
where source.type == "cloudfiles". No conditional logic or explicit matching
is required.
Preset Structure¶
Basic Structure¶
name: my_preset
version: "1.0"
description: "Preset description"
defaults:
  load_actions:
    cloudfiles:
      options:
        key: value
  write_actions:
    streaming_table:
      table_properties:
        key: value
Required Fields:
name: Unique identifier for the preset
defaults: Configuration defaults organized by action type
Optional Fields:
version: Version tracking for change management
description: Documentation about preset purpose
extends: Parent preset name for inheritance
Configuration Defaults¶
Load Actions¶
CloudFiles Defaults¶
name: cloudfiles_defaults
version: "1.0"
defaults:
  load_actions:
    cloudfiles:
      options:
        cloudFiles.rescuedDataColumn: "_rescued_data"
        ignoreCorruptFiles: "true"
        ignoreMissingFiles: "true"
        cloudFiles.useStrictGlobber: "false"
        cloudFiles.maxFilesPerTrigger: 200
        cloudFiles.schemaEvolutionMode: "addNewColumns"
How it works:
Options are deep-merged into source.options of CloudFiles actions
Explicit options in flowgroup/template override preset defaults on conflicts
Non-conflicting options from both sources are preserved
Generated Code Example:
When a template defines cloudFiles.format: csv and the preset defines the above,
the generated code contains ALL options:
df = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "csv")  # From template
    .option("cloudFiles.rescuedDataColumn", "_rescued_data")  # From preset
    .option("ignoreCorruptFiles", "true")  # From preset
    .option("ignoreMissingFiles", "true")  # From preset
    .option("cloudFiles.useStrictGlobber", "false")  # From preset
    .option("cloudFiles.maxFilesPerTrigger", 200)  # From preset
    .option("cloudFiles.schemaEvolutionMode", "addNewColumns")  # From preset
    .load(path)
)
Write Actions¶
Streaming Table Defaults¶
name: bronze_layer
version: "1.0"
defaults:
  write_actions:
    streaming_table:
      table_properties:
        delta.enableRowTracking: "true"
        delta.autoOptimize.optimizeWrite: "true"
        delta.enableChangeDataFeed: "true"
        quality: "bronze"
How it works:
Properties are deep-merged into write_target.table_properties
Preset properties + explicit properties are combined
Explicit properties override preset values on conflicts
Generated Code Example:
dp.create_streaming_table(
    name="catalog.schema.table",
    table_properties={
        "PII": "true",  # From explicit config
        "delta.enableRowTracking": "true",  # From preset
        "delta.autoOptimize.optimizeWrite": "true",  # From preset
        "delta.enableChangeDataFeed": "true",  # From preset
        "quality": "bronze",  # From preset
    }
)
Preset Application¶
How Presets Match Actions¶
Presets use implicit type-based matching:
Load Actions:
load_actions.{source_type} matches actions where source.type == {source_type}
load_actions.cloudfiles → applies to CloudFiles load actions
load_actions.delta → applies to Delta load actions
load_actions.jdbc → applies to JDBC load actions
Write Actions:
write_actions.{target_type} matches actions where write_target.type == {target_type}
write_actions.streaming_table → applies to streaming table writes
write_actions.materialized_view → applies to materialized view writes
No when conditions or explicit selectors are needed. The system automatically
applies the appropriate defaults based on action types.
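The matching rule described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the tool's implementation: the function name defaults_for_action and the in-memory dict layout are assumptions chosen to mirror the YAML shown in this reference.

```python
def defaults_for_action(preset: dict, action: dict) -> dict:
    """Return the preset defaults that apply to one action, selected by type."""
    defaults = preset.get("defaults", {})
    if action.get("type") == "load":
        source = action.get("source")
        # Only dict-shaped sources carry a type; a plain view name has none.
        source_type = source.get("type") if isinstance(source, dict) else None
        return defaults.get("load_actions", {}).get(source_type, {})
    if action.get("type") == "write":
        target_type = action.get("write_target", {}).get("type")
        return defaults.get("write_actions", {}).get(target_type, {})
    return {}


# Hypothetical preset and actions, shaped like the YAML in this reference.
preset = {
    "name": "demo",
    "defaults": {
        "load_actions": {"cloudfiles": {"options": {"ignoreCorruptFiles": "true"}}},
        "write_actions": {"streaming_table": {"table_properties": {"quality": "bronze"}}},
    },
}
load_action = {"name": "load_customers", "type": "load", "source": {"type": "cloudfiles"}}
write_action = {
    "name": "write_customers",
    "type": "write",
    "source": "v_customers_raw",
    "write_target": {"type": "streaming_table"},
}
jdbc_action = {"name": "load_orders", "type": "load", "source": {"type": "jdbc"}}

print(defaults_for_action(preset, load_action))
print(defaults_for_action(preset, write_action))
print(defaults_for_action(preset, jdbc_action))  # no load_actions.jdbc key, so empty
```

The lookup is a plain dictionary access on the type name, which is why no when conditions or selectors are involved: an action whose type has no matching key simply receives no defaults.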
Precedence Rules¶
When the same configuration is defined at multiple levels:
Flowgroup explicit config (highest precedence)
Flowgroup preset
Template explicit config
Template preset (lowest precedence)
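For a flat set of options, the four precedence levels above behave like a chain of lookups in which the first level that defines a key wins. The sketch below illustrates that ordering with Python's collections.ChainMap; the option values are made up for the example, and the real merge is deep rather than flat.

```python
from collections import ChainMap

# Hypothetical options defined at each level, lowest precedence first.
template_preset = {"cloudFiles.maxFilesPerTrigger": 200, "ignoreCorruptFiles": "true"}
template_explicit = {"cloudFiles.format": "parquet"}
flowgroup_preset = {"cloudFiles.maxFilesPerTrigger": 1000}
flowgroup_explicit = {"cloudFiles.format": "csv"}

# ChainMap searches its mappings left to right, so the first mapping listed
# has the highest precedence.
resolved = dict(
    ChainMap(flowgroup_explicit, flowgroup_preset, template_explicit, template_preset)
)

for key in sorted(resolved):
    print(key, "=", resolved[key])
```

Here cloudFiles.format resolves to csv (flowgroup explicit beats template explicit), cloudFiles.maxFilesPerTrigger resolves to 1000 (flowgroup preset beats template preset), and ignoreCorruptFiles survives from the template preset because nothing above it sets that key.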
Merge Behavior¶
Deep Merge for Nested Objects:
Options and properties are deep-merged, not replaced:
# Preset defaults
defaults:
  load_actions:
    cloudfiles:
      options:
        cloudFiles.rescuedDataColumn: "_rescued_data"
        ignoreCorruptFiles: "true"

# Explicit action configuration
source:
  type: cloudfiles
  options:
    cloudFiles.format: csv
    cloudFiles.inferColumnTypes: "true"
Result: ALL options present in generated code:
.option("cloudFiles.format", "csv")
.option("cloudFiles.inferColumnTypes", "true")
.option("cloudFiles.rescuedDataColumn", "_rescued_data")
.option("ignoreCorruptFiles", "true")
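A deep merge of this kind can be sketched as a short recursive function. The helper name deep_merge is an assumption for illustration, and the sketch only demonstrates the behavior described in this section; it is not taken from the tool's source.

```python
from copy import deepcopy


def deep_merge(base: dict, override: dict) -> dict:
    """Merge override into a copy of base; nested dicts are merged key by key."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            # Scalars and lists are not merged: the override value wins.
            merged[key] = deepcopy(value)
    return merged


# The preset defaults and explicit source from the example above.
preset_defaults = {
    "options": {
        "cloudFiles.rescuedDataColumn": "_rescued_data",
        "ignoreCorruptFiles": "true",
    }
}
explicit_source = {
    "type": "cloudfiles",
    "options": {
        "cloudFiles.format": "csv",
        "cloudFiles.inferColumnTypes": "true",
    },
}

merged_source = deep_merge(preset_defaults, explicit_source)
for key, value in sorted(merged_source["options"].items()):
    print(f'.option("{key}", "{value}")')
```

Passing the preset defaults as base and the explicit configuration as override gives the explicit side the final word on conflicts while keeping every non-conflicting key from both.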
Usage Examples¶
Template with Preset¶
Templates can include presets that apply to all generated actions:
name: ingestion_template
version: "1.0"
presets:
  - cloudfiles_defaults  # Applied to all template actions
parameters:
  - name: table_name
    type: string
    required: true
actions:
  - name: "load_{{ table_name }}"
    type: load
    source:
      type: cloudfiles
      options:
        cloudFiles.format: parquet  # Merges with preset options
    target: "v_{{ table_name }}_raw"
Flowgroup with Multiple Presets¶
FlowGroups can apply multiple presets in order:
pipeline: data_pipeline
flowgroup: customer_ingestion
presets:
  - cloudfiles_defaults
  - bronze_layer
actions:
  - name: load_customers
    type: load
    source:
      type: cloudfiles
      path: "/data/customers/*.csv"
      options:
        cloudFiles.format: csv
    target: v_customers_raw
  - name: write_customers
    type: write
    source: v_customers_raw
    write_target:
      type: streaming_table
      database: "${catalog}.${schema}"
      table: "customers"
Preset Inheritance¶
Presets can extend other presets:
name: base_config
version: "1.0"
defaults:
  load_actions:
    cloudfiles:
      options:
        cloudFiles.rescuedDataColumn: "_rescued_data"

name: bronze_cloudfiles
version: "1.0"
extends: base_config  # Inherits from base_config
defaults:
  load_actions:
    cloudfiles:
      options:
        cloudFiles.maxFilesPerTrigger: 200
        ignoreCorruptFiles: "true"
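One way to picture inheritance is as a walk up the extends chain, merging each parent's defaults underneath the child's. The sketch below assumes that child values override parent values on conflict, which this reference does not state explicitly; the function names and the cycle check are likewise illustrative assumptions.

```python
def merge(base: dict, override: dict) -> dict:
    """Recursively merge two dicts; override wins on conflicting keys."""
    out = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(out.get(key), dict):
            out[key] = merge(out[key], value)
        else:
            out[key] = value
    return out


def resolve_defaults(name: str, presets: dict, _seen: tuple = ()) -> dict:
    """Return the defaults of a preset with all inherited defaults applied."""
    if name in _seen:
        chain = " -> ".join(_seen + (name,))
        raise ValueError(f"Circular preset inheritance: {chain}")
    if name not in presets:
        raise ValueError(f"Preset '{name}' not found")
    preset = presets[name]
    own = preset.get("defaults", {})
    parent = preset.get("extends")
    if parent is None:
        return own
    # Assumption: the child's defaults are layered on top of the parent's.
    return merge(resolve_defaults(parent, presets, _seen + (name,)), own)


# The two presets from the example above, keyed by name.
presets = {
    "base_config": {
        "defaults": {
            "load_actions": {
                "cloudfiles": {"options": {"cloudFiles.rescuedDataColumn": "_rescued_data"}}
            }
        }
    },
    "bronze_cloudfiles": {
        "extends": "base_config",
        "defaults": {
            "load_actions": {
                "cloudfiles": {
                    "options": {
                        "cloudFiles.maxFilesPerTrigger": 200,
                        "ignoreCorruptFiles": "true",
                    }
                }
            }
        },
    },
}

resolved = resolve_defaults("bronze_cloudfiles", presets)
print(resolved["load_actions"]["cloudfiles"]["options"])
```

Under these assumptions, bronze_cloudfiles ends up with all three CloudFiles options: the rescued-data column from base_config plus its own two settings.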
Best Practices¶
Structure Correctly
Always nest CloudFiles options under the options: key:
# ✅ CORRECT
load_actions:
  cloudfiles:
    options:
      cloudFiles.rescuedDataColumn: "_rescued_data"

# ❌ WRONG - Missing 'options' nesting
load_actions:
  cloudfiles:
    cloudFiles.rescuedDataColumn: "_rescued_data"
Use Descriptive Names
Good: cloudfiles_error_handling, bronze_layer_defaults
Bad: preset1, my_preset
Version Presets
Track breaking changes with version numbers
Document Clearly
Explain what each preset configures and why
Test Merging
Verify preset + explicit configs merge correctly by inspecting generated code
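The nesting mistake described under "Structure Correctly" is easy to check for mechanically. The following is a hypothetical lint sketch, not a feature of the tool: it works on a preset already loaded into a dict and uses the heuristic that any key starting with cloudFiles. placed directly under cloudfiles (rather than under options) is probably misplaced. Options without that prefix, such as ignoreCorruptFiles, would not be caught.

```python
def misplaced_cloudfiles_keys(preset: dict) -> list:
    """List cloudFiles.* keys that sit directly under load_actions.cloudfiles."""
    cloudfiles = (
        preset.get("defaults", {}).get("load_actions", {}).get("cloudfiles", {})
    )
    return sorted(
        key
        for key in cloudfiles
        if key != "options" and key.startswith("cloudFiles.")
    )


# Hypothetical presets mirroring the CORRECT and WRONG layouts above.
good_preset = {
    "defaults": {
        "load_actions": {
            "cloudfiles": {"options": {"cloudFiles.rescuedDataColumn": "_rescued_data"}}
        }
    }
}
bad_preset = {
    "defaults": {
        "load_actions": {
            "cloudfiles": {"cloudFiles.rescuedDataColumn": "_rescued_data"}
        }
    }
}

print(misplaced_cloudfiles_keys(good_preset))
print(misplaced_cloudfiles_keys(bad_preset))
```

A check like this could run alongside the "Test Merging" practice, flagging structural mistakes before generated code is inspected.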
Common Patterns¶
Error Handling Preset¶
name: error_handling
version: "1.0"
description: "Standard error handling for all data sources"
defaults:
  load_actions:
    cloudfiles:
      options:
        ignoreCorruptFiles: "true"
        ignoreMissingFiles: "true"
        cloudFiles.rescuedDataColumn: "_rescued_data"
Bronze Layer Preset¶
name: bronze_layer
version: "1.0"
description: "Standard configuration for bronze layer tables"
defaults:
  write_actions:
    streaming_table:
      table_properties:
        delta.enableRowTracking: "true"
        delta.autoOptimize.optimizeWrite: "true"
        delta.enableChangeDataFeed: "true"
        quality: "bronze"
Performance Tuning Preset¶
name: performance_tuning
version: "1.0"
description: "Optimized settings for large-scale ingestion"
defaults:
  load_actions:
    cloudfiles:
      options:
        cloudFiles.maxFilesPerTrigger: 1000
        cloudFiles.useStrictGlobber: "false"
Troubleshooting¶
Preset Options Not Appearing¶
Problem: Preset options don’t appear in generated code
Solutions:
Verify correct nesting structure:
Check that CloudFiles options are under the options: key:
defaults:
  load_actions:
    cloudfiles:
      options:  # ← This level is CRITICAL
        cloudFiles.rescuedDataColumn: "_rescued_data"
Check source type matches:
The source type in your action must match the preset key:
Action has source.type: cloudfiles
Preset must have load_actions.cloudfiles
Inspect generated code:
Look at the .option() calls in generated Python files to confirm the merge
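Inspecting the generated code can be partly automated. The sketch below pulls the .option() calls out of a generated Python snippet with a regular expression and reports which expected preset options are missing. The function name, the regex, and the sample snippet are assumptions for illustration; the regex only handles simple calls with a double-quoted key, as in the examples in this reference.

```python
import re

# Matches .option("key", "value") and .option("key", 200) style calls.
OPTION_CALL = re.compile(r'\.option\(\s*"([^"]+)"\s*,\s*("[^"]*"|[^)\s]+)\s*\)')


def extract_options(generated_code: str) -> dict:
    """Map each option key found in the code to its value as a string."""
    return {
        key: value.strip('"') for key, value in OPTION_CALL.findall(generated_code)
    }


# Hypothetical generated code in which one preset option did not come through.
generated = '''
df = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "csv")
    .option("cloudFiles.rescuedDataColumn", "_rescued_data")
    .option("cloudFiles.maxFilesPerTrigger", 200)
    .load(path)
)
'''

found = extract_options(generated)
expected = {"cloudFiles.rescuedDataColumn", "ignoreCorruptFiles"}
missing = expected - set(found)

print("found:", sorted(found))
print("missing:", sorted(missing))
```

In this sample, ignoreCorruptFiles is reported as missing, which would point back to the nesting and type-matching checks above.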
Property Conflicts¶
Problem: A preset value is not applied because an explicit config sets the same key
Expected Behavior: Explicit configurations override preset values (by design)
Solution: This is correct behavior. If you want the preset value to win, remove the explicit configuration.
Preset Not Found Error¶
Problem: ValueError: Preset 'my_preset' not found
Solutions:
Verify the preset file exists in the presets/ directory
Check that the preset filename matches the name field in the YAML
Ensure the preset file has a .yaml extension
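The filename checks above can be scripted. The sketch below is a hypothetical diagnostic helper, not part of the tool: it assumes a preset named my_preset is expected at presets/my_preset.yaml relative to the project root, and it inspects filenames only, without parsing the name field inside the YAML.

```python
import tempfile
from pathlib import Path


def diagnose_preset(project_root: Path, preset_name: str) -> list:
    """Return a list of likely reasons a preset cannot be found (empty if fine)."""
    presets_dir = project_root / "presets"
    if not presets_dir.is_dir():
        return [f"Missing directory: {presets_dir}"]
    expected = presets_dir / f"{preset_name}.yaml"
    if expected.is_file():
        return []
    problems = [f"No file named {expected.name} in {presets_dir}"]
    # Look for near misses: the right name with a different extension.
    for candidate in sorted(presets_dir.glob(f"{preset_name}.*")):
        problems.append(f"Found {candidate.name}; rename it to use the .yaml extension")
    return problems


# Demonstration in a throwaway directory with a wrongly named preset file.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "presets").mkdir()
    (root / "presets" / "my_preset.yml").write_text("name: my_preset\n")
    report = diagnose_preset(root, "my_preset")

for line in report:
    print(line)
```

In the demonstration the file exists as my_preset.yml, so the report contains two lines: one noting that my_preset.yaml is absent and one suggesting the rename.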
Limitations¶
No Conditional Logic:
Presets do NOT support:
when conditions or conditional application
Dynamic value selection based on action properties
Runtime evaluation of expressions
Presets use simple type-based matching only. For conditional behavior, use separate presets for different scenarios.
No Field-Level Merging for Non-Dict Values:
Preset merge is deep for nested dictionaries, but not for:
Lists (explicit list replaces preset list entirely)
Scalar values (explicit value wins on conflict)
Summary¶
Key Takeaways:
Presets provide defaults that merge with explicit configs
Use correct structure with options nesting for CloudFiles
Presets match actions by type (implicit matching)
Explicit configs override preset defaults
Non-conflicting values from both sources are preserved
No conditional logic - use separate presets for different cases
Related Documentation:
Concepts & Architecture - Presets overview and basic examples
Templates Reference - Template documentation
Actions Reference - Action configuration reference