Snakemake Workflow#
PyPSA-GB uses Snakemake to orchestrate the model building and solving process. This page explains the workflow structure.
Workflow Overview#
The main workflow is defined in Snakefile and consists of modular rules in rules/*.smk:
flowchart TB
subgraph Data["Data Processing"]
FES["Process FES Data"]
DUKES["Process DUKES"]
REPD["Process REPD"]
end
subgraph Network["Network Assembly"]
BASE["Build Base Network"]
DEMAND["Add Demand"]
FLEX["Add Demand Flexibility"]
RENEW["Add Renewables"]
THERMAL["Add Thermal Generators"]
STORAGE["Add Storage"]
HYDRO["Add Hydrogen System"]
INTER["Add Interconnectors"]
end
subgraph Cluster["Optional Transforms"]
CLUST["Cluster Network"]
UPGRADES["Apply ETYS Upgrades"]
end
subgraph Solve["Optimization"]
FINALIZE["Finalize Network"]
SOLVE["Solve Network"]
ANALYZE["Analyze Results"]
end
subgraph Market["Optional Market Dispatch"]
WHOLESALE["Solve Wholesale Market"]
BM["Solve Balancing Mechanism"]
MARKET_ANALYZE["Analyze Market Results"]
end
FES --> RENEW
FES --> THERMAL
FES --> STORAGE
FES --> HYDRO
DUKES --> THERMAL
REPD --> RENEW
BASE --> DEMAND
DEMAND --> FLEX
FLEX --> RENEW
RENEW --> THERMAL
THERMAL --> STORAGE
STORAGE --> HYDRO
HYDRO --> INTER
INTER --> CLUST
INTER --> UPGRADES
CLUST --> FINALIZE
UPGRADES --> FINALIZE
INTER --> FINALIZE
FINALIZE --> SOLVE
SOLVE --> ANALYZE
FINALIZE --> WHOLESALE
WHOLESALE --> BM
BM --> MARKET_ANALYZE
Two Workflows#
PyPSA-GB has two separate Snakemake workflows:
Main Workflow (Snakefile)#
Runs the core model - assumes weather cutouts exist.
snakemake resources/network/HT35_solved.nc -j 4
Cutouts Workflow (Snakefile_cutouts)#
Acquires weather cutouts using a tiered strategy to minimize wait times:
Check
data/atlite/cutouts/for cached filesDownload from Zenodo (~5-10 minutes for years 2010-2024)
Generate from ERA5 via atlite (~2-4 hours per year)
# Configure years in config/cutouts_config.yaml, then run:
snakemake -s Snakefile_cutouts --cores 1
Key Features:
Automatic Zenodo download for years 2010-2024 (no CDS credentials needed)
MD5 checksum verification
Falls back to ERA5 API for other years or if Zenodo fails
See Installation for detailed setup instructions.
Rule Files#
Rules are organized by function in rules/:
File |
Purpose |
|---|---|
|
Base network construction (ETYS/Reduced/Zonal) |
|
Demand extraction, profiles, disaggregation, and demand-side flexibility integration |
|
Renewable generation profiles (wind/solar profiles from ERA5) |
|
Thermal generator integration (DUKES historical + FES future) |
|
Storage unit integration (battery, pumped hydro, LAES) |
|
Hydrogen system (electrolysis, H2 storage, H2 turbines) |
|
Cross-border connections (GB-Europe, GB-Ireland, etc.) |
|
Network reduction/clustering (spatial/k-means presets) |
|
FES data downloading and processing |
|
Network finalization and optimization solving |
|
Optional wholesale market, balancing mechanism, validation, and market notebooks |
|
Results analysis, spatial plots, dashboards, notebooks |
Key Rules#
build_network#
Creates the base PyPSA network from ETYS/Reduced/Zonal topology.
For the ETYS network model, this is a two-stage pipeline:
process_ETYS_data— Parses the raw ETYS Appendix B Excel file into standardized intermediate CSVs:resources/network/ETYS/ETYS_{year}_components.csv(circuits, transformers, HVDC)resources/network/ETYS/ETYS_{year}_buses.csv(bus names, coordinates, voltage levels)Uses
GB_network.xlsxfor offshore wind farm edges and demand node mappingsUses
substation_coordinates.csvfor bus coordinate lookupsScript:
scripts/network_build/process_ETYS_data.py
build_ETYS_base_network— Assembles the intermediate CSVs into a PyPSA network:Resolves missing bus coordinates using a multi-tier strategy (GSP mapping → substation lookup → prefix fallback → distance-weighted guessing)
Validates bus locations against GB land boundaries (GSP region GeoJSON)
Identifies offshore buses (OFTO wind farm connection nodes)
Applies ETYS network upgrades if
etys_upgrades.enabled: trueScript:
scripts/network_build/ETYS_network.py
For Reduced and Zonal networks, a single rule reads the CSV files directly.
Input: Network topology files in data/network/
Output: resources/network/{model}_base_network.nc (then copied to {scenario}_network.nc)
integrate_renewable_generators#
Adds renewable generators with capacity and profiles.
Input: REPD sites (historical) or FES projections (future), demand network
Output: resources/network/{scenario}_network_demand_renewables.pkl
solve_network#
Runs the linear optimal power flow optimization.
Input: Complete unsolved network
Output: resources/network/{scenario}_solved.nc
solve_wholesale_market#
Runs the optional Stage 1 market solve when market.enabled: true. It relaxes
line and transformer capacities to create a copperplate wholesale schedule and
uniform demand-bus wholesale price.
Input: Finalized network (resources/network/{scenario}.nc)
Output: resources/market/{scenario}_wholesale.nc and wholesale CSVs
solve_balancing_mechanism#
Runs the optional Stage 2 market solve unless market.wholesale_only: true. It
restores network constraints and redispatches from the wholesale position using
configured bid and offer prices.
Input: Finalized network and wholesale outputs
Output: resources/market/{scenario}_balancing.nc, redispatch, congestion, and price comparison CSVs
See Market Dispatch for configuration and output details.
Running the Workflow#
Basic Commands#
# Run to a specific target
snakemake resources/network/HT35_solved.nc -j 4
# Dry run (show what would execute)
snakemake resources/network/HT35_solved.nc -n -p
# Force re-run of specific rule and downstream
snakemake -R solve_network -j 4
# Run all active scenarios
snakemake -j 4
Parallel Execution#
The -j flag controls parallelism:
# Single-threaded (sequential)
snakemake target.nc -j 1
# Use 4 cores
snakemake target.nc -j 4
# Use all available cores
snakemake target.nc -j
Viewing the DAG#
Generate a visualization of the workflow:
snakemake --dag resources/network/HT35_solved.nc | dot -Tpng > dag.png
Intermediate Files#
The workflow generates intermediate files that can be inspected. Files use .pkl format for fast I/O during development, switching to .nc at final assembly:
resources/network/
├── ETYS/
│ ├── ETYS_2024_components.csv # Stage 1: parsed circuits/transformers/HVDC
│ └── ETYS_2024_buses.csv # Stage 1: bus names and coordinates
├── ETYS_2024_base_network.nc # Stage 2: assembled PyPSA network
├── HT35_network.nc # Base network (scenario copy)
├── HT35_network_demand.pkl # + demand
├── HT35_network_demand_renewables.pkl # + renewables
├── HT35_network_demand_renewables_thermal_generators.pkl # + thermal
├── HT35_network_demand_renewables_thermal_generators_storage.pkl # + storage
├── HT35_network_demand_renewables_thermal_generators_storage_hydrogen.pkl # + H2
├── HT35_network_demand_renewables_thermal_generators_storage_hydrogen_interconnectors.nc # final
├── HT35_network_clustered_demand_renewables_thermal_generators_storage_hydrogen_interconnectors.nc # (optional)
└── HT35_solved.nc # Solved (final results)
Market-enabled scenarios also add outputs under resources/market/, including
the wholesale network, wholesale price CSV, balancing network, redispatch
summary, congestion diagnostics, and price comparison files. Market dashboards
and notebooks are written under resources/analysis/.
Logs#
Each rule writes logs to logs/:
logs/
├── network_build/HT35.log
├── solve/HT35.log
├── plotting/HT35_spatial.log
└── ...
Market rules write logs under logs/market/, for example
logs/market/solve_wholesale_{scenario}.log and
logs/market/solve_balancing_{scenario}.log.
Configuration Flow#
flowchart LR
CONFIG["config/config.yaml"] --> SCENARIOS["config/scenarios.yaml"]
DEFAULTS["config/defaults.yaml"] --> SCENARIOS
SCENARIOS --> SNAKEMAKE["Snakemake"]
SNAKEMAKE --> RULES["rules/*.smk"]
RULES --> SCRIPTS["scripts/*.py"]
Troubleshooting#
Rule Failed#
Check the log file:
cat logs/solve/HT35.log
Re-run Failed Rule#
Force re-execution:
snakemake -R failed_rule_name -j 4
Missing Input#
Snakemake will tell you which input is missing. Generate it first:
snakemake missing_file.nc -j 4
Clean and Restart#
Remove outputs for a scenario:
rm -rf resources/network/HT35*.nc
snakemake resources/network/HT35_solved.nc -j 4
Advanced Usage#
Running Specific Rules#
# Only run data processing (no solving)
snakemake resources/network/HT35_finalized.nc -j 4
# Only run analysis on existing solved network
snakemake resources/analysis/HT35_spatial.html -j 1
Profile Performance#
snakemake --profile slurm # For cluster execution
Using Conda Environments#
snakemake -j 4 --use-conda # Use rule-specific envs