Network Clustering#
Network clustering reduces the ETYS network to fewer buses while preserving key characteristics.
Why Cluster?#
The full ETYS network (~2000 buses) provides maximum detail but:
Takes 30-60 minutes to solve per week
Requires significant memory
Slows sensitivity analysis
Clustering to 50-200 buses can:
Reduce solve time by roughly 5-15x (see the trade-off table under Accuracy Considerations)
Maintain most accuracy for aggregate results
Enable faster iteration
Clustering Methods (configurable)#
These map to presets in config/clustering.yaml (resolved by clustering: <preset> in scenarios.yaml), or can be inlined with clustering: { method: <...> }.
K-Means (method: kmeans)#
HT35_kmeans:
  network_model: "ETYS"
  clustering:
    method: "kmeans"
    n_clusters: 100
Pros: Simple, fast, good spatial distribution
Cons: May split electrically-connected areas
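To illustrate why k-means gives a good spatial spread but can split electrically connected areas, here is a minimal sketch of plain k-means on bus coordinates. It is not the workflow's implementation: the function name kmeans_busmap, its parameters, and the example coordinates are hypothetical, and the real method may weight buses or use a library routine. The point is that only (x, y) positions are considered, never the line topology.

```python
import math
import random

def kmeans_busmap(coords, n_clusters, n_iter=50, seed=0):
    """Assign each bus to a cluster index using plain k-means on (x, y)."""
    rng = random.Random(seed)
    buses = sorted(coords)
    # Initialise centroids from randomly chosen bus locations
    centroids = [coords[b] for b in rng.sample(buses, n_clusters)]
    busmap = {}
    for _ in range(n_iter):
        # Assignment step: nearest centroid by Euclidean distance
        new_map = {
            b: min(range(n_clusters),
                   key=lambda k: math.dist(coords[b], centroids[k]))
            for b in buses
        }
        if new_map == busmap:
            break  # assignments stopped changing
        busmap = new_map
        # Update step: move each centroid to the mean of its members
        for k in range(n_clusters):
            members = [coords[b] for b in buses if busmap[b] == k]
            if members:
                centroids[k] = (
                    sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members),
                )
    return busmap

# Hypothetical coordinates: two buses in the south-west, two in the north-east
coords = {"B1": (0.0, 0.0), "B2": (0.1, 0.0), "B3": (5.0, 5.0), "B4": (5.1, 5.0)}
busmap = kmeans_busmap(coords, n_clusters=2)
print(busmap)
```

Because distance is the only criterion, two buses that sit close together but belong to different parts of the grid can end up in the same cluster, and vice versa.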
GSP Spatial (preset: gsp_spatial, method: spatial)#
HT35_gsp:
  network_model: "ETYS"
  clustering: gsp_spatial  # uses method: spatial + GSP boundaries
Pros: Aligns with FES GSP granularity
Cons: Fixed cluster count (~300)
Busmap / Regional (method: busmap)#
HT35_regional:
  network_model: "ETYS"
  clustering:
    method: "busmap"
    busmap_source: "data/zone/zonal_bus_mapping.csv"
Pros: Meaningful regional analysis, explicit control
Cons: Fixed mapping; you must maintain the CSV
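Since the mapping CSV has to be maintained by hand, a quick consistency check before running can help. The sketch below assumes a two-column layout with headers bus and zone; those column names, the example rows, and both helper functions are hypothetical, so adjust them to whatever the real file contains.

```python
import csv
import io

# Hypothetical file contents; the real columns in
# data/zone/zonal_bus_mapping.csv may be named differently.
EXAMPLE_CSV = """bus,zone
BUS_A1,North
BUS_A2,North
BUS_B1,South
"""

def load_busmap(handle, bus_col="bus", zone_col="zone"):
    """Read a bus -> zone mapping and reject duplicate bus entries."""
    busmap = {}
    for row in csv.DictReader(handle):
        bus = row[bus_col]
        if bus in busmap:
            raise ValueError(f"bus {bus!r} is mapped more than once")
        busmap[bus] = row[zone_col]
    return busmap

def unmapped_buses(network_buses, busmap):
    """Return network buses that the mapping does not cover."""
    return sorted(set(network_buses) - set(busmap))

busmap = load_busmap(io.StringIO(EXAMPLE_CSV))
missing = unmapped_buses(["BUS_A1", "BUS_A2", "BUS_B1", "BUS_C1"], busmap)
print(missing)
```

A non-empty result means the CSV is out of date relative to the network and needs a new row for each listed bus.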
Configuration#
Basic Clustering#
MyScenario:
  clustering:
    enabled: true
    n_clusters: 100
Advanced Options#
MyScenario:
  clustering:
    method: "kmeans"
    n_clusters: 100

# Component aggregation: top-level config, runs for ALL scenarios at finalization
component_aggregation:
  enabled: true
  include_loads: true          # merge loads per bus
  include_storage_units: true  # merge identical storage units
  include_stores: false        # merge Store components
Running Clustered Scenarios#
# Build and solve clustered network
snakemake resources/network/HT35_clustered_100_solved.nc -j 4
The clustering happens automatically in the workflow.
How Clustering Works#
1. Bus Aggregation#
Buses are grouped into clusters based on the algorithm:
flowchart LR
    subgraph Original["Original (2000 buses)"]
        B1[Bus 1]
        B2[Bus 2]
        B3[Bus 3]
        B4[Bus 4]
    end
    subgraph Clustered["Clustered (100 buses)"]
        C1[Cluster 1]
        C2[Cluster 2]
    end
    B1 --> C1
    B2 --> C1
    B3 --> C2
    B4 --> C2
2. Generator & Storage Aggregation (optional)#
If component_aggregation.enabled: true (a top-level config, not nested under clustering), identical generators and/or storage are merged (capacities summed) when they share the same attributes and time series. Loads can also be merged per bus. This runs in finalize_network.py for all scenarios — independent of whether clustering is enabled.
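The merging rule can be sketched as follows. This is an illustration only, not the code in finalize_network.py: the function merge_identical and the choice of bus, carrier, and marginal_cost as the matching attributes are assumptions, and the comparison of time series mentioned above is left out for brevity.

```python
from collections import defaultdict

def merge_identical(generators, key_fields=("bus", "carrier", "marginal_cost")):
    """Sum p_nom over generators whose key_fields all match.

    key_fields is an assumed set of attributes; the real workflow may
    compare more fields and also requires matching time series.
    """
    merged = defaultdict(float)
    for gen in generators:
        key = tuple(gen[f] for f in key_fields)
        merged[key] += gen["p_nom"]
    return [
        {**dict(zip(key_fields, key)), "p_nom": p_nom}
        for key, p_nom in merged.items()
    ]

# Hypothetical generators on one bus
gens = [
    {"bus": "C1", "carrier": "wind", "marginal_cost": 0.0, "p_nom": 100.0},
    {"bus": "C1", "carrier": "wind", "marginal_cost": 0.0, "p_nom": 50.0},
    {"bus": "C1", "carrier": "gas", "marginal_cost": 60.0, "p_nom": 400.0},
]
merged = merge_identical(gens)
print(merged)
```

The two wind units collapse into one 150 MW unit while the gas unit is left alone, so total capacity per carrier is unchanged and only the component count drops.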
3. Line Aggregation#
Lines between clusters are combined:
# Original: 3 lines between areas A and B
# Line 1: 1000 MW
# Line 2: 1000 MW
# Line 3: 500 MW
# After clustering:
# Equivalent line: 2500 MW (parallel)
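The same arithmetic can be written as a short sketch. The function aggregate_lines and the tuple layout (bus0, bus1, capacity in MW) are hypothetical and only capacities are handled here; how the workflow treats impedances is not covered. Lines whose two ends fall in the same cluster are dropped, which is why intra-cluster congestion is no longer visible after clustering.

```python
from collections import defaultdict

def aggregate_lines(lines, busmap):
    """Sum capacities of lines that connect the same pair of clusters."""
    corridors = defaultdict(float)
    for bus0, bus1, s_nom in lines:
        c0, c1 = busmap[bus0], busmap[bus1]
        if c0 == c1:
            continue  # both ends in one cluster: the line disappears
        # Sort the pair so A-B and B-A count as the same corridor
        corridors[tuple(sorted((c0, c1)))] += s_nom
    return dict(corridors)

busmap = {"a1": "A", "a2": "A", "b1": "B", "b2": "B"}
lines = [
    ("a1", "b1", 1000.0),  # Line 1
    ("a2", "b1", 1000.0),  # Line 2
    ("b2", "a2", 500.0),   # Line 3, listed in the opposite direction
    ("a1", "a2", 800.0),   # internal to area A, dropped
]
equivalent = aggregate_lines(lines, busmap)
print(equivalent)
```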
4. Demand Aggregation#
Loads are summed at cluster buses.
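As a minimal sketch of that summation, assuming each bus carries a load time series of equal length in MW (the function aggregate_demand and the example values are hypothetical):

```python
def aggregate_demand(load_by_bus, busmap):
    """Sum per-bus load time series into one series per cluster."""
    clustered = {}
    for bus, series in load_by_bus.items():
        cluster = busmap[bus]
        if cluster not in clustered:
            clustered[cluster] = list(series)
        else:
            # Element-wise sum keeps the time resolution unchanged
            clustered[cluster] = [
                a + b for a, b in zip(clustered[cluster], series)
            ]
    return clustered

busmap = {"B1": "C1", "B2": "C1", "B3": "C2"}
load_by_bus = {"B1": [10.0, 12.0], "B2": [5.0, 6.0], "B3": [20.0, 18.0]}
clustered_load = aggregate_demand(load_by_bus, busmap)
print(clustered_load)

# Total demand per time step is the same before and after
total_before = [sum(step) for step in zip(*load_by_bus.values())]
total_after = [sum(step) for step in zip(*clustered_load.values())]
```

Comparing total_before with total_after is a simple way to confirm that total demand is preserved.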
Accuracy Considerations#
What’s Preserved#
Total generation capacity by technology
Total demand
Major transmission constraints
Regional balance
What’s Lost#
Intra-cluster congestion
Precise locational prices
Some line flow patterns
Local voltage issues
Accuracy vs Speed Trade-off#
| Clusters | Accuracy | Solve Speed |
|---|---|---|
| 500 | ~98% | 3x faster |
| 200 | ~95% | 5x faster |
| 100 | ~90% | 10x faster |
| 50 | ~80% | 15x faster |
Accuracy measured as correlation with full ETYS results for system cost.
Validation#
Compare clustered results to full ETYS:
import pypsa
# Full ETYS solve
n_full = pypsa.Network("resources/network/HT35_solved.nc")
# Clustered solve
n_clust = pypsa.Network("resources/network/HT35_clustered_100_solved.nc")
# Compare key metrics
print(f"Full ETYS cost: £{n_full.objective/1e9:.2f}B")
print(f"Clustered cost: £{n_clust.objective/1e9:.2f}B")
print(f"Difference: {(n_clust.objective - n_full.objective)/n_full.objective*100:.1f}%")
# Generation mix comparison
full_gen = n_full.generators.groupby('carrier').p_nom.sum()
clust_gen = n_clust.generators.groupby('carrier').p_nom.sum()
print("\nCapacity preserved:", (clust_gen / full_gen).mean())
Use Cases#
Sensitivity Analysis#
Run many scenarios quickly:
# Test 10 different configurations
scenarios:
  - HT35_clustered_100_base
  - HT35_clustered_100_high_wind
  - HT35_clustered_100_low_nuclear
  # ...
Screening Studies#
Identify interesting cases to run with full detail:
# Quick clustered runs
snakemake resources/network/HT35_clustered_100_solved.nc -j 4
snakemake resources/network/HT50_clustered_100_solved.nc -j 4
# Then full detail for most interesting
snakemake resources/network/HT35_solved.nc -j 4
Educational#
Faster iteration for learning:
Tutorial_scenario:
  network_model: "ETYS"
  clustering:
    n_clusters: 50
  solve_period:
    start: "2035-01-01"
    end: "2035-01-02"  # Single day
Troubleshooting#
Infeasible After Clustering#
If the clustered network won’t solve:
Check that major lines aren’t eliminated
Increase n_clusters to preserve more detail
Use preserve_buses for critical nodes
Results Don’t Match Full ETYS#
Expected differences are 5-15%. If larger:
Check cluster count (try higher)
Verify generators mapped correctly
Check line capacity aggregation
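One way to verify the generator mapping is to compare total capacity per carrier and list every carrier that differs, including carriers missing from one side. The helper below is a hypothetical sketch with made-up capacities; in practice the two dictionaries could come from generators.groupby('carrier').p_nom.sum().to_dict() on each network.

```python
def capacity_mismatches(full_cap, clustered_cap, rel_tol=1e-6):
    """Return {carrier: clustered - full} for carriers whose totals differ."""
    mismatches = {}
    for carrier in sorted(set(full_cap) | set(clustered_cap)):
        full = full_cap.get(carrier, 0.0)    # 0.0 if the carrier is absent
        clust = clustered_cap.get(carrier, 0.0)
        if abs(full - clust) > rel_tol * max(abs(full), 1.0):
            mismatches[carrier] = clust - full
    return mismatches

# Illustrative capacities in MW, not real results
full_cap = {"wind": 30000.0, "gas": 25000.0, "nuclear": 5000.0}
clustered_cap = {"wind": 30000.0, "gas": 24000.0}
diff = capacity_mismatches(full_cap, clustered_cap)
print(diff)
```

An empty result means capacity by technology was preserved; any entry points to generators that were dropped or attached to the wrong bus during clustering.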
Memory Issues#
If clustering or solving uses too much memory:
# Reduce parallel jobs
snakemake resources/network/HT35_clustered.nc -j 1
Enable component_aggregation to reduce asset count after clustering.
Shorten solve_period or increase timestep_minutes to reduce time steps during solve.