Network Clustering#

Network clustering reduces the ETYS network to fewer buses while preserving its key characteristics: total capacity and demand, major transmission constraints, and regional balance.

Why Cluster?#

The full ETYS network (~2000 buses) provides maximum detail but:

Takes 30-60 minutes to solve per week
Requires significant memory
Slows sensitivity analysis

Clustering to 50-200 buses can:

Reduce solve time by roughly 5-15x (see Accuracy vs Speed Trade-off below)
Retain most of the accuracy for aggregate results such as total system cost
Enable faster iteration

Clustering Methods (configurable)#

Each method can be selected through a named preset in config/clustering.yaml (referenced as clustering: <preset> in scenarios.yaml) or configured inline with clustering: { method: <...> }.
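The preset-or-inline rule can be sketched as a small resolver. This is a minimal illustration, assuming presets have already been parsed from config/clustering.yaml into a dict keyed by preset name; the function name resolve_clustering, the treatment of a missing key as "no clustering", and the preset contents are hypothetical, not the workflow's actual code.

```python
from collections.abc import Mapping
from typing import Any, Optional


def resolve_clustering(entry: Any, presets: Mapping) -> Optional[dict]:
    """Return the effective clustering settings for one scenario (hypothetical helper).

    entry is the scenario's `clustering:` value: a preset name (str), an inline
    mapping such as {"method": "kmeans", "n_clusters": 100}, or None if absent.
    """
    if entry is None:
        return None  # assumption: an absent clustering key means the full network is used
    if isinstance(entry, str):
        if entry not in presets:
            raise KeyError(f"Unknown clustering preset: {entry!r}")
        return dict(presets[entry])  # copy so callers cannot mutate the preset table
    if isinstance(entry, Mapping):
        return dict(entry)  # inline settings are used as written
    raise TypeError(f"clustering must be a preset name or a mapping, not {type(entry).__name__}")


# Hypothetical preset table standing in for the parsed config/clustering.yaml
presets = {"gsp_spatial": {"method": "spatial"}}

print(resolve_clustering("gsp_spatial", presets))
print(resolve_clustering({"method": "kmeans", "n_clusters": 100}, presets))
```

Returning a copy keeps one scenario's overrides from leaking into the shared preset table.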

K-Means (`method: kmeans`)#

HT35_kmeans:
  network_model: "ETYS"
  clustering:
    method: "kmeans"
    n_clusters: 100

Pros: Simple, fast, good spatial distribution
Cons: May split electrically connected areas, since clusters are formed from bus locations rather than network topology
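The idea behind method: kmeans can be sketched with Lloyd's algorithm on bus coordinates. This plain-Python version assumes buses are clustered on unweighted (x, y) positions with a simple deterministic initialisation; the workflow's implementation (bus weighting, centroid seeding, library used) may differ. Note that the algorithm never looks at lines, which is why electrically connected buses can end up in different clusters.

```python
import math


def kmeans_busmap(coords, n_clusters, n_iter=100):
    """Assign each bus to one of n_clusters groups by Lloyd's k-means on (x, y).

    coords: dict mapping bus name -> (x, y). Returns dict bus -> cluster index.
    Initial centroids are the first n_clusters buses in sorted order (a simple,
    deterministic choice for illustration; real tools use smarter seeding).
    """
    buses = sorted(coords)
    if not 0 < n_clusters <= len(buses):
        raise ValueError("n_clusters must be between 1 and the number of buses")
    centroids = [coords[b] for b in buses[:n_clusters]]
    assignment = {}
    for _ in range(n_iter):
        # Assignment step: each bus joins its nearest centroid
        new_assignment = {
            b: min(range(n_clusters), key=lambda k: math.dist(coords[b], centroids[k]))
            for b in buses
        }
        if new_assignment == assignment:
            break  # converged: no bus changed cluster
        assignment = new_assignment
        # Update step: move each centroid to the mean of its member buses
        for k in range(n_clusters):
            members = [coords[b] for b in buses if assignment[b] == k]
            if members:  # keep the old centroid if a cluster empties out
                centroids[k] = (
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                )
    return assignment


coords = {"B1": (0.0, 0.0), "B2": (1.0, 0.0), "B3": (10.0, 0.0), "B4": (11.0, 0.0)}
print(kmeans_busmap(coords, n_clusters=2))  # → {'B1': 0, 'B2': 0, 'B3': 1, 'B4': 1}
```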

GSP Spatial (`preset: gsp_spatial`, `method: spatial`)#

HT35_gsp:
  network_model: "ETYS"
  clustering: gsp_spatial    # uses method: spatial + GSP boundaries

Pros: Aligns with FES GSP granularity
Cons: Fixed cluster count (~300)
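One way to picture method: spatial is to assign each bus to the GSP region whose boundary polygon contains it. The sketch below uses a ray-casting point-in-polygon test; the region names and shapes are hypothetical, and matching buses by coordinates alone is an assumption about how the preset works. With real GSP boundary files, a GIS library such as shapely or geopandas would normally do this step.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside polygon (list of (x, y) vertices)."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does a horizontal ray from (x, y) towards +x cross the edge (x1, y1)-(x2, y2)?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def spatial_busmap(coords, regions):
    """Map each bus to the first region whose polygon contains it (None if outside all)."""
    return {
        bus: next(
            (name for name, poly in regions.items() if point_in_polygon(x, y, poly)),
            None,
        )
        for bus, (x, y) in coords.items()
    }


# Hypothetical GSP regions: two adjacent unit squares
regions = {
    "GSP_A": [(0, 0), (1, 0), (1, 1), (0, 1)],
    "GSP_B": [(1, 0), (2, 0), (2, 1), (1, 1)],
}
coords = {"B1": (0.2, 0.5), "B2": (0.8, 0.3), "B3": (1.5, 0.5), "B4": (3.0, 0.5)}
print(spatial_busmap(coords, regions))
```

Buses outside every polygon (B4 here, for example an offshore node) come back as None and would need a fallback rule, such as the nearest region.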

Busmap / Regional (`method: busmap`)#

HT35_regional:
  network_model: "ETYS"
  clustering:
    method: "busmap"
    busmap_source: "data/zone/zonal_bus_mapping.csv"

Pros: Meaningful regional analysis, explicit control
Cons: Fixed mapping; you must maintain the CSV
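Because the busmap CSV is maintained by hand, it is worth checking it against the network's bus list before a run. This sketch assumes a two-column CSV with headers bus and cluster; the real column names in data/zone/zonal_bus_mapping.csv may differ, and the sample rows are hypothetical.

```python
import csv
import io


def load_busmap(csv_file):
    """Read a bus -> cluster mapping from a CSV with (assumed) columns 'bus' and 'cluster'."""
    busmap = {}
    for row in csv.DictReader(csv_file):
        bus, cluster = row["bus"].strip(), row["cluster"].strip()
        if bus in busmap and busmap[bus] != cluster:
            raise ValueError(f"Bus {bus!r} mapped to both {busmap[bus]!r} and {cluster!r}")
        busmap[bus] = cluster
    return busmap


def check_busmap(busmap, network_buses):
    """Return (unmapped buses, unknown buses) so gaps in the CSV are caught early."""
    network_buses = set(network_buses)
    mapped = set(busmap)
    unmapped = sorted(network_buses - mapped)  # in the network but missing from the CSV
    unknown = sorted(mapped - network_buses)   # in the CSV but not in the network
    return unmapped, unknown


# Inline CSV standing in for the real file (hypothetical contents)
sample = io.StringIO("bus,cluster\nB1,Scotland\nB2,Scotland\nB3,North\nB9,South\n")
busmap = load_busmap(sample)
unmapped, unknown = check_busmap(busmap, ["B1", "B2", "B3", "B4"])
print("Unmapped:", unmapped)
print("Unknown:", unknown)
```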

Configuration#

Basic Clustering#

MyScenario:
  clustering:
    enabled: true
    n_clusters: 100

Advanced Options#

MyScenario:
  clustering:
    method: "kmeans"
    n_clusters: 100
  # Component aggregation — top-level config, runs for ALL scenarios at finalization
  component_aggregation:
    enabled: true
    include_loads: true           # merge loads per bus
    include_storage_units: true   # merge identical storage units
    include_stores: false         # merge Store components

Running Clustered Scenarios#

# Build and solve clustered network
snakemake --cores 4 --config scenario=HT35_clustered

Clustering runs automatically as part of the workflow; no separate step is needed.

How Clustering Works#

1. Bus Aggregation#

Buses are grouped into clusters by the selected method:

flowchart LR
    subgraph Original["Original (2000 buses)"]
        B1[Bus 1]
        B2[Bus 2]
        B3[Bus 3]
        B4[Bus 4]
    end

    subgraph Clustered["Clustered (100 buses)"]
        C1[Cluster 1]
        C2[Cluster 2]
    end

    B1 --> C1
    B2 --> C1
    B3 --> C2
    B4 --> C2
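Once a busmap exists, aggregation does two things: each cluster gets a representative bus, and every component that referenced an original bus is re-pointed to its cluster. A minimal sketch, assuming the cluster bus sits at the unweighted mean position of its members (the workflow may place it differently); the component names are hypothetical.

```python
from collections import defaultdict


def cluster_bus_positions(coords, busmap):
    """Place each cluster bus at the mean (x, y) of its member buses."""
    members = defaultdict(list)
    for bus, cluster in busmap.items():
        members[cluster].append(coords[bus])
    return {
        cluster: (
            sum(x for x, _ in pts) / len(pts),
            sum(y for _, y in pts) / len(pts),
        )
        for cluster, pts in members.items()
    }


def remap_components(component_buses, busmap):
    """Point each component (generator, load, ...) at the cluster of its original bus."""
    return {name: busmap[bus] for name, bus in component_buses.items()}


coords = {"B1": (0.0, 0.0), "B2": (2.0, 0.0), "B3": (10.0, 4.0), "B4": (12.0, 4.0)}
busmap = {"B1": "C1", "B2": "C1", "B3": "C2", "B4": "C2"}  # as in the diagram above
print(cluster_bus_positions(coords, busmap))  # → {'C1': (1.0, 0.0), 'C2': (11.0, 4.0)}
print(remap_components({"wind_1": "B2", "ccgt_1": "B4"}, busmap))  # → {'wind_1': 'C1', 'ccgt_1': 'C2'}
```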

2. Generator & Storage Aggregation (optional)#

If component_aggregation.enabled: true (a top-level setting, not nested under clustering), generators and/or storage units that share the same attributes and time series are merged into a single unit with their capacities summed. Loads can also be merged per bus. This step runs in finalize_network.py for all scenarios, whether or not clustering is enabled.
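The merge rule can be sketched as a group-by: units at the same bus with identical attributes and identical time series collapse into one unit whose capacity is the sum. Which attributes form the key here (bus, carrier, marginal_cost and a profile tuple) is an assumption for illustration; finalize_network.py may compare a different set.

```python
def aggregate_identical(units, key_fields=("bus", "carrier", "marginal_cost", "profile")):
    """Merge units whose key_fields all match, summing their p_nom (MW).

    units: list of dicts. 'profile' stands in for the unit's time series; storing it
    as a tuple makes identical series compare (and hash) equal.
    """
    merged = {}  # dicts keep insertion order, so output follows input order
    for unit in units:
        key = tuple(unit[f] for f in key_fields)
        if key not in merged:
            merged[key] = {**{f: unit[f] for f in key_fields}, "p_nom": 0.0, "members": []}
        merged[key]["p_nom"] += unit["p_nom"]
        merged[key]["members"].append(unit["name"])
    return list(merged.values())


wind_profile = (0.4, 0.6, 0.5)  # hypothetical per-unit availability over 3 snapshots
units = [
    {"name": name, "bus": bus, "carrier": "wind", "marginal_cost": 0.0,
     "profile": wind_profile, "p_nom": p_nom}
    for name, bus, p_nom in [("wf_a", "C1", 300.0), ("wf_b", "C1", 200.0), ("wf_c", "C2", 150.0)]
]
for u in aggregate_identical(units):
    print(u["bus"], u["carrier"], u["p_nom"], u["members"])
```

The two C1 wind farms merge into one 500 MW unit; the C2 farm stays separate because it sits at a different bus.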

3. Line Aggregation#

Parallel lines connecting the same pair of clusters are combined into one equivalent line, with their capacities summed:

# Original: 3 lines between areas A and B
# Line 1: 1000 MW
# Line 2: 1000 MW
# Line 3: 500 MW

# After clustering:
# Equivalent line: 2500 MW (parallel)
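The same arithmetic in code: lines are grouped by the unordered pair of clusters they connect, thermal ratings are summed, and lines whose two ends fall in the same cluster are dropped, which is why intra-cluster congestion is lost. The reactance rule shown (parallel combination, 1/x_eq = Σ 1/x) is the standard formula for parallel branches; whether the workflow uses exactly this rule is an assumption, and the x values are illustrative.

```python
from collections import defaultdict


def aggregate_lines(lines, busmap):
    """Collapse lines onto cluster pairs.

    lines: list of dicts with bus0, bus1, s_nom (MW) and x (ohm).
    Returns {(cluster_a, cluster_b): {"s_nom": ..., "x": ..., "n_lines": ...}}.
    """
    groups = defaultdict(list)
    for ln in lines:
        c0, c1 = busmap[ln["bus0"]], busmap[ln["bus1"]]
        if c0 == c1:
            continue  # both ends in one cluster: the line vanishes from the model
        groups[tuple(sorted((c0, c1)))].append(ln)
    return {
        pair: {
            "s_nom": sum(ln["s_nom"] for ln in group),      # parallel capacities add
            "x": 1.0 / sum(1.0 / ln["x"] for ln in group),  # parallel reactances combine
            "n_lines": len(group),
        }
        for pair, group in groups.items()
    }


busmap = {"A1": "A", "A2": "A", "B1": "B", "B2": "B"}
lines = [
    {"bus0": "A1", "bus1": "B1", "s_nom": 1000.0, "x": 10.0},
    {"bus0": "A2", "bus1": "B2", "s_nom": 1000.0, "x": 10.0},
    {"bus0": "B1", "bus1": "A2", "s_nom": 500.0, "x": 20.0},
    {"bus0": "A1", "bus1": "A2", "s_nom": 800.0, "x": 5.0},  # internal to cluster A
]
print(aggregate_lines(lines, busmap))
```

Summing ratings assumes flow can be shared across the parallel lines up to their combined limit; if one line saturates first in the full network, the equivalent line overstates transfer capacity.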

4. Demand Aggregation#

Loads are summed at cluster buses.
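Summing loads per cluster is a per-snapshot sum over the member buses, so total demand is unchanged. A sketch with plain lists standing in for load time series; the load names and values are hypothetical.

```python
def aggregate_loads(load_series, load_bus, busmap):
    """Sum load time series (MW per snapshot) into one series per cluster bus."""
    clustered = {}
    for name, series in load_series.items():
        cluster = busmap[load_bus[name]]
        acc = clustered.setdefault(cluster, [0.0] * len(series))
        if len(acc) != len(series):
            raise ValueError(f"Load {name!r} has a different number of snapshots")
        for t, value in enumerate(series):
            acc[t] += value
    return clustered


load_series = {"L1": [100.0, 120.0], "L2": [50.0, 60.0], "L3": [80.0, 90.0]}
load_bus = {"L1": "B1", "L2": "B2", "L3": "B3"}
busmap = {"B1": "C1", "B2": "C1", "B3": "C2"}
print(aggregate_loads(load_series, load_bus, busmap))  # → {'C1': [150.0, 180.0], 'C2': [80.0, 90.0]}
```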

Accuracy Considerations#

What’s Preserved#

Total generation capacity by technology
Total demand
Major transmission constraints
Regional balance

What’s Lost#

Intra-cluster congestion
Precise locational prices
Some line flow patterns
Local voltage issues

Accuracy vs Speed Trade-off#

Clusters	Accuracy	Solve Speed
500	~98%	3x faster
200	~95%	5x faster
100	~90%	10x faster
50	~80%	15x faster

Accuracy is measured as the correlation with full-ETYS results for system cost.
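Combining the table with the full-network solve time quoted under Why Cluster? (30-60 minutes per week) gives a rough per-week estimate. The speed-up factors are the indicative figures from the table, not guarantees; actual gains depend on solver, time resolution and hardware.

```python
# Indicative speed-ups from the table above (clusters -> times faster than full ETYS)
SPEEDUP = {500: 3, 200: 5, 100: 10, 50: 15}
FULL_ETYS_MINUTES_PER_WEEK = (30, 60)  # range quoted for the full ~2000-bus network


def estimated_minutes_per_week(n_clusters):
    """Rough (low, high) solve time per simulated week for a tabulated cluster count."""
    factor = SPEEDUP[n_clusters]
    low, high = FULL_ETYS_MINUTES_PER_WEEK
    return low / factor, high / factor


for n in sorted(SPEEDUP, reverse=True):
    low, high = estimated_minutes_per_week(n)
    print(f"{n:>3} clusters: ~{low:.0f}-{high:.0f} min per week")
```

For 100 clusters this works out to roughly 3-6 minutes per week instead of 30-60.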

Validation#

Compare clustered results to full ETYS:

import pypsa

# Full ETYS solve
n_full = pypsa.Network("resources/network/HT35_solved.nc")

# Clustered solve  
n_clust = pypsa.Network("resources/network/HT35_clustered_100_solved.nc")

# Compare key metrics
print(f"Full ETYS cost: £{n_full.objective/1e9:.2f}B")
print(f"Clustered cost: £{n_clust.objective/1e9:.2f}B")
print(f"Difference: {(n_clust.objective - n_full.objective)/n_full.objective*100:.1f}%")

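# Generation capacity by carrier: ratios near 1.0 mean capacity was preserved
# (per-carrier ratios avoid a single mean hiding one mismatched technology)
full_gen = n_full.generators.groupby('carrier').p_nom.sum()
clust_gen = n_clust.generators.groupby('carrier').p_nom.sum()
print("\nCapacity ratio (clustered / full) by carrier:")
print((clust_gen / full_gen).round(3))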

Use Cases#

Sensitivity Analysis#

Run many scenarios quickly:

# Test 10 different configurations
run_scenarios:
  - HT35_clustered_100_base
  - HT35_clustered_100_high_wind
  - HT35_clustered_100_low_nuclear
  # ...

Screening Studies#

Identify interesting cases to run with full detail:

# In config/config.yaml
run_scenarios:
  - HT35_clustered
  - EE50_clustered

# Quick clustered runs
snakemake --cores 4

# Then full detail for the most interesting case
snakemake --cores 4 --config scenario=HT35

Educational#

Faster iteration for learning:

Tutorial_scenario:
  network_model: "ETYS"
  clustering:
    n_clusters: 50
  solve_period:
    start: "2035-01-01"
    end: "2035-01-02"  # Single day

Troubleshooting#

Infeasible After Clustering#

If the clustered network won’t solve:

Check that major lines aren’t eliminated
Increase n_clusters to preserve more detail
Use preserve_buses for critical nodes
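A quick check for the first item: every pair of clusters joined by an original line should still be joined in the clustered network, otherwise a missing corridor or an island can make the problem infeasible. The sketch works on plain (bus0, bus1) pairs plus a busmap; in PyPSA these pairs can be read from the bus0 and bus1 columns of n.lines. If interconnectors are modelled as links rather than lines, they would need to be included in both lists too. The bus and cluster names are hypothetical.

```python
def missing_corridors(original_lines, clustered_lines, busmap):
    """Return cluster pairs linked in the original network but not in the clustered one."""
    expected = set()
    for bus0, bus1 in original_lines:
        c0, c1 = busmap[bus0], busmap[bus1]
        if c0 != c1:  # intra-cluster lines are expected to disappear
            expected.add(frozenset((c0, c1)))
    present = {frozenset(pair) for pair in clustered_lines}
    return sorted(tuple(sorted(pair)) for pair in expected - present)


busmap = {"B1": "C1", "B2": "C1", "B3": "C2", "B4": "C3"}
original = [("B1", "B2"), ("B2", "B3"), ("B3", "B4")]
clustered = [("C2", "C1")]  # the C2-C3 corridor has been lost
print(missing_corridors(original, clustered, busmap))  # → [('C2', 'C3')]
```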

Results Don’t Match Full ETYS#

Differences of 5-15% from full ETYS results are expected. If the gap is larger:

Check cluster count (try higher)
Verify generators mapped correctly
Check line capacity aggregation
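To decide whether a gap falls outside the expected range, compare the relative cost difference with the 15% upper bound. A small hypothetical helper that could sit alongside the validation script, fed with n_full.objective and n_clust.objective; the cost figures below are illustrative only.

```python
def cost_gap_check(full_cost, clustered_cost, tolerance=0.15):
    """Return (relative difference, within_tolerance) for two objective values."""
    if full_cost == 0:
        raise ValueError("full_cost must be non-zero")
    rel = (clustered_cost - full_cost) / abs(full_cost)
    return rel, abs(rel) <= tolerance


rel, ok = cost_gap_check(full_cost=4.0e9, clustered_cost=3.7e9)
print(f"{rel:+.1%} -> {'within' if ok else 'outside'} the expected range")
```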

Memory Issues#

If clustering or solving uses too much memory:

# Reduce parallel jobs
snakemake --cores 1 --config scenario=HT35_clustered

Enable component_aggregation to reduce the number of assets after clustering.
Shorten solve_period or increase timestep_minutes to reduce the number of time steps in the solve.
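The number of time steps is the solve period divided by the timestep, so halving the period or doubling timestep_minutes roughly halves the snapshot count. A sketch of the arithmetic, assuming solve_period.end is exclusive; check how your configuration actually interprets it.

```python
from datetime import datetime


def n_snapshots(start, end, timestep_minutes):
    """Number of time steps between start (inclusive) and end (assumed exclusive)."""
    span = datetime.fromisoformat(end) - datetime.fromisoformat(start)
    minutes = span.total_seconds() / 60
    if minutes <= 0 or minutes % timestep_minutes:
        raise ValueError("period must be positive and a whole number of timesteps")
    return int(minutes // timestep_minutes)


print(n_snapshots("2035-01-01", "2035-01-08", 30))  # one week at 30 min: 336
print(n_snapshots("2035-01-01", "2035-01-08", 60))  # one week at 60 min: 168
print(n_snapshots("2035-01-01", "2035-01-02", 30))  # one day at 30 min: 48
```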