# Data Maintenance
Guide for updating data files when new versions are released.
## Update Schedule
| Data Source | Frequency | Typical Release |
|---|---|---|
| FES | Annual | July |
| DUKES | Annual | March (provisional), July (final) |
| REPD | Quarterly | Jan, Apr, Jul, Oct |
| TEC Register | Monthly | 1st of month |
| ETYS | Annual | November |
| ERA5 Cutouts | As needed | N/A |
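The schedule can also be stored as data, so a maintenance script can list which sources may have a new release in a given month. The sketch below makes several assumptions. `RELEASE_MONTHS` and `sources_due` are hypothetical names. The months come from the table above. The monthly TEC Register counts as due every month. ERA5 cutouts are left out because they are regenerated on demand.

```python
# Hypothetical encoding of the update schedule: source -> months (1-12)
# in which a new release typically appears. ERA5 cutouts are omitted
# because they are generated as needed rather than on a schedule.
RELEASE_MONTHS: dict[str, set[int]] = {
    "FES": {7},
    "DUKES": {3, 7},  # provisional (March) and final (July)
    "REPD": {1, 4, 7, 10},
    "TEC Register": set(range(1, 13)),  # published monthly
    "ETYS": {11},
}


def sources_due(month: int) -> list[str]:
    """Return the data sources that typically publish in `month` (1-12)."""
    if not 1 <= month <= 12:
        raise ValueError(f"month must be in 1..12, got {month}")
    return sorted(name for name, months in RELEASE_MONTHS.items() if month in months)


if __name__ == "__main__":
    from datetime import date

    today = date.today()
    print(f"Sources to check in {today:%B %Y}:", ", ".join(sources_due(today.month)))
```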
## Updating FES Data
### When New FES is Released
1. Download the new data:
   - Visit the NESO Data Portal.
   - Download the Data Workbook (Excel).

2. Update the API configuration:

   ```yaml
   # data/FES/FES_api_urls.yaml
   FES_2025:
     base_url: "https://api.neso.energy/..."
     capacity_endpoint: "/fes/capacity"
     demand_endpoint: "/fes/demand"
   ```

3. Process the new data:

   ```bash
   snakemake resources/FES/FES_2025_data.csv -j 1 -R process_fes_data
   ```

4. Update the scenario defaults:

   ```yaml
   # config/defaults.yaml
   FES_year: 2025
   ```
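The API configuration and the scenario default are edited separately. That makes it easy to bump `FES_year` without adding the matching `FES_<year>` entry. Below is a minimal consistency check. It assumes the two YAML files have the shapes shown above: top-level `FES_<year>` keys carrying `base_url`, `capacity_endpoint` and `demand_endpoint`, plus a top-level `FES_year`. `check_fes_config` is a hypothetical helper. PyYAML is only needed for the file-loading part.

```python
REQUIRED_KEYS = ("base_url", "capacity_endpoint", "demand_endpoint")


def check_fes_config(api_urls: dict, defaults: dict) -> list[str]:
    """Return a list of problems; an empty list means the configs agree.

    Hypothetical helper: assumes `api_urls` mirrors data/FES/FES_api_urls.yaml
    and `defaults` mirrors config/defaults.yaml.
    """
    year = defaults.get("FES_year")
    if year is None:
        return ["config/defaults.yaml has no FES_year"]
    entry = api_urls.get(f"FES_{year}")
    if entry is None:
        return [f"FES_api_urls.yaml has no FES_{year} entry"]
    # Report every required key that is absent or empty
    return [f"FES_{year} is missing '{key}'" for key in REQUIRED_KEYS if not entry.get(key)]


if __name__ == "__main__":
    import yaml  # PyYAML

    with open("data/FES/FES_api_urls.yaml") as f:
        api_urls = yaml.safe_load(f)
    with open("config/defaults.yaml") as f:
        defaults = yaml.safe_load(f)
    for problem in check_fes_config(api_urls, defaults):
        print("WARNING:", problem)
```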
### Validating FES Update
```python
import pandas as pd

# Compare old and new
old = pd.read_csv("resources/FES/FES_2024_data.csv")
new = pd.read_csv("resources/FES/FES_2025_data.csv")

# Check capacity totals
print("2024 FES - 2035 Wind:", old[(old.year==2035) & (old.technology=='Wind')].capacity_mw.sum())
print("2025 FES - 2035 Wind:", new[(new.year==2035) & (new.technology=='Wind')].capacity_mw.sum())
```
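The spot check above can be extended to every technology at once. This sketch assumes the processed CSVs keep the `year`, `technology` and `capacity_mw` columns used above. `compare_fes` is a hypothetical helper and not part of the pipeline.

```python
import pandas as pd


def capacity_by_technology(df: pd.DataFrame, year: int) -> pd.Series:
    """Total capacity (MW) per technology for one scenario year."""
    return df.loc[df["year"] == year].groupby("technology")["capacity_mw"].sum()


def compare_fes(old: pd.DataFrame, new: pd.DataFrame, year: int) -> pd.DataFrame:
    """Side-by-side capacities for `year`, sorted by absolute change."""
    out = pd.DataFrame(
        {"old_mw": capacity_by_technology(old, year), "new_mw": capacity_by_technology(new, year)}
    ).fillna(0.0)  # technologies present in only one edition count as 0 MW
    out["change_mw"] = out["new_mw"] - out["old_mw"]
    # Percentage change is undefined (NaN) for technologies new in this edition
    out["change_pct"] = out["change_mw"] / out["old_mw"].where(out["old_mw"] != 0) * 100
    return out.sort_values("change_mw", key=abs, ascending=False)


if __name__ == "__main__":
    old = pd.read_csv("resources/FES/FES_2024_data.csv")
    new = pd.read_csv("resources/FES/FES_2025_data.csv")
    print(compare_fes(old, new, 2035).round(1))
```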
## Updating DUKES Data
### When New DUKES is Released
1. Download from GOV.UK:
   - Table 5.11 (Major power producers)

2. Place the file in the data directory:

   ```bash
   mv DUKES_5.11_2026.xlsx data/generators/
   ```

3. Update the configuration:

   ```yaml
   # config/defaults.yaml
   dukes_file: "DUKES_5.11_2026.xlsx"
   ```

4. Re-process the generators:

   ```bash
   snakemake resources/generators/DUKES/DUKES_2026_generators.csv -j 1 -R process_dukes
   ```
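The Snakemake target in the last step contains the same edition year as `dukes_file`. This small sketch derives the target from the filename so the two cannot drift apart. It assumes every DUKES workbook follows the `DUKES_5.11_<year>.xlsx` pattern shown above. `dukes_target` is a hypothetical helper.

```python
import re
from pathlib import PurePosixPath

# Assumed naming pattern for the DUKES workbook
DUKES_PATTERN = re.compile(r"^DUKES_5\.11_(\d{4})\.xlsx$")


def dukes_target(dukes_file: str) -> str:
    """Map a DUKES workbook filename (or path) to its processed-generators target."""
    match = DUKES_PATTERN.match(PurePosixPath(dukes_file).name)
    if match is None:
        raise ValueError(f"unexpected DUKES filename: {dukes_file!r}")
    return f"resources/generators/DUKES/DUKES_{match.group(1)}_generators.csv"


if __name__ == "__main__":
    print("snakemake", dukes_target("DUKES_5.11_2026.xlsx"), "-j 1 -R process_dukes")
```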
### DUKES Data Mapping
When new power stations appear:

- Check coordinates: verify the grid reference is correct.
- Map fuel type: add it to `data/generators/fuel_mapping.yaml` if needed.
- Verify capacity: cross-reference with the TEC register.
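The fuel-type check can be automated by listing the raw fuel labels in the new DUKES table that have no mapping entry. This sketch assumes `fuel_mapping.yaml` is a flat mapping from raw DUKES fuel label to model carrier; its real structure may differ. It also compares labels ignoring case and whitespace, which is an assumption rather than the pipeline's rule. `unmapped_fuels` is hypothetical.

```python
from collections.abc import Iterable, Mapping


def _normalise(label: str) -> str:
    # Collapse internal whitespace and ignore case, so "wind  (Offshore)"
    # and "Wind (offshore)" are treated as the same label
    return " ".join(label.split()).casefold()


def unmapped_fuels(fuels: Iterable[str], mapping: Mapping[str, str]) -> list[str]:
    """Return raw fuel labels (first spelling seen) that have no mapping entry."""
    known = {_normalise(key) for key in mapping}
    missing: dict[str, str] = {}
    for fuel in fuels:
        key = _normalise(fuel)
        if key not in known:
            missing.setdefault(key, fuel)
    return sorted(missing.values())
```

For example, pass the fuel column of the new DUKES table and the mapping loaded with `yaml.safe_load`. The column name depends on the workbook.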
## Updating REPD Data
### Quarterly Update Process
1. Download the latest extract from GOV.UK.

2. Replace the existing file:

   ```bash
   mv repd-q3-oct-2025.csv data/renewables/
   ```

3. Update the reference:

   ```yaml
   # config/defaults.yaml
   repd_file: "repd-q3-oct-2025.csv"
   ```

4. Re-process renewables:

   ```bash
   snakemake -R prepare_renewable_sites -j 1
   ```
### Handling REPD Changes
The REPD format occasionally changes. After each download, check that:

- column names haven’t changed
- technology categories are consistent
- the coordinate format is the same
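The column-name check is easy to script by comparing only the header rows of the previous and new extracts. In this sketch, `compare_columns` and `read_header` are hypothetical helpers. The `utf-8-sig` default reads UTF-8 files with or without a byte-order mark. Pass a different `encoding` if an extract uses another one.

```python
import csv


def read_header(path: str, encoding: str = "utf-8-sig") -> list[str]:
    """Return the column names from the first row of a CSV file."""
    with open(path, newline="", encoding=encoding) as f:
        return [name.strip() for name in next(csv.reader(f))]


def compare_columns(old_path: str, new_path: str, encoding: str = "utf-8-sig") -> dict[str, list[str]]:
    """Report columns removed from, or added to, a new REPD extract."""
    old_cols = set(read_header(old_path, encoding))
    new_cols = set(read_header(new_path, encoding))
    return {
        "removed": sorted(old_cols - new_cols),
        "added": sorted(new_cols - old_cols),
    }
```

A non-empty `removed` list is the main warning sign, because the processing script will fail or silently drop data if it still expects those columns.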
## Updating TEC Register
### Monthly Update
1. Download the latest register from NESO.

2. Replace the file:

   ```bash
   mv tec-register-august-2025.csv data/generators/
   ```

3. Update the file reference in the config.
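TEC files are named by month (e.g. `tec-register-august-2025.csv`). The newest one in `data/generators/` can therefore be found programmatically before updating the config reference. This sketch assumes every file follows the `tec-register-<month>-<year>.csv` pattern with English month names. `latest_tec_file` is hypothetical.

```python
import re
from pathlib import Path

# English month names -> month number (hard-coded to avoid locale dependence)
MONTHS = {
    name: number
    for number, name in enumerate(
        ["january", "february", "march", "april", "may", "june", "july",
         "august", "september", "october", "november", "december"],
        start=1,
    )
}
TEC_PATTERN = re.compile(r"^tec-register-([a-z]+)-(\d{4})\.csv$")


def latest_tec_file(filenames):
    """Return the newest TEC register filename, or None if none match."""
    dated = []
    for name in filenames:
        match = TEC_PATTERN.match(name.lower())
        if match and match.group(1) in MONTHS:
            year, month = int(match.group(2)), MONTHS[match.group(1)]
            dated.append(((year, month), name))
    return max(dated)[1] if dated else None


if __name__ == "__main__":
    names = [p.name for p in Path("data/generators").glob("tec-register-*.csv")]
    print(latest_tec_file(names))
```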
## Updating ETYS Network
### Annual Update (Major)
1. Download the Appendices:
   - Get the Appendix B Excel file (contains circuits, transformers, HVDC, and upgrade data).

2. Add the new Excel file to `data/network/ETYS/`:
   - Follow NESO’s naming convention (e.g., `ETYS 2024 Appendix-B V1.xlsx`).

3. Register the new file in `scripts/network_build/etys_file_registry.py`:
   - Add an entry to the `ETYS_FILES` dictionary mapping the new year to its filename.
   - Verify that the sheet names match the `ETYS_BASE_SHEETS` and `ETYS_UPGRADE_SHEETS` mappings.

4. Update the default ETYS year in `config/defaults.yaml`:

   ```yaml
   etys:
     year: 2025  # Update to new publication year
   ```

5. Process and validate via Snakemake:

   ```bash
   # Process raw Excel into intermediate CSVs (stage 1)
   snakemake resources/network/ETYS/ETYS_2025_components.csv -j 1
   # Build and validate the network (stage 2, includes topology validation)
   snakemake resources/network/ETYS_2025_base_network.nc -j 1
   ```

   Topology validation (connectivity, parameter ranges, coordinate checks) runs automatically during the `build_ETYS_base_network` rule.

6. Update substation coordinates (if needed):
   - New buses may need coordinates in `data/network/ETYS/substation_coordinates.csv`.
   - Check the build logs for warnings about missing or guessed coordinates.
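Before running the build, it can help to confirm that a new Appendix B workbook still contains every sheet the registry expects. This sketch assumes `ETYS_BASE_SHEETS` and `ETYS_UPGRADE_SHEETS` map a component name to a sheet name; their real shape in `etys_file_registry.py` may differ. The import path in the usage block is also an assumption. `missing_sheets` is hypothetical. Reading sheet names with `pd.ExcelFile` requires `openpyxl`.

```python
from collections.abc import Iterable, Mapping


def missing_sheets(available: Iterable[str], *expected: Mapping[str, str]) -> dict[str, str]:
    """Return {component: sheet} for every expected sheet absent from the workbook."""
    present = set(available)
    missing = {}
    for mapping in expected:
        for component, sheet in mapping.items():
            if sheet not in present:
                missing[component] = sheet
    return missing


if __name__ == "__main__":
    import pandas as pd

    # Assumed import path; adjust to how the registry module is actually imported
    from scripts.network_build.etys_file_registry import ETYS_BASE_SHEETS, ETYS_UPGRADE_SHEETS

    workbook = pd.ExcelFile("data/network/ETYS/ETYS 2024 Appendix-B V1.xlsx")
    print(missing_sheets(workbook.sheet_names, ETYS_BASE_SHEETS, ETYS_UPGRADE_SHEETS))
```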
## Generating New Weather Cutouts
### When to Generate
- Modeling a new historical year
- Updated ERA5 data available
- Different geographic scope needed
### Process
1. Set up the CDS API by creating `~/.cdsapirc`. The key is the personal access token shown in your CDS profile:

   ```
   url: https://cds.climate.copernicus.eu/api
   key: YOUR-PERSONAL-ACCESS-TOKEN
   ```

2. Generate the cutout:

   ```bash
   snakemake -s Snakefile_cutouts resources/atlite/GB_2023.nc -j 2
   ```

3. Validate:

   ```python
   import atlite

   cutout = atlite.Cutout("resources/atlite/GB_2023.nc")
   print(cutout)
   print(f"Time range: {cutout.data.time.min().values} to {cutout.data.time.max().values}")
   ```
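Beyond checking the printed time range, a single-year cutout at ERA5's hourly resolution should contain 8,760 timestamps, or 8,784 in a leap year. This sketch assumes the cutout spans exactly one full calendar year, which may not hold for every configuration. `expected_hours` and `cutout_hours_ok` are hypothetical helpers.

```python
import calendar


def expected_hours(year: int) -> int:
    """Hourly timestamps in one calendar year: 8760, or 8784 in a leap year."""
    return (366 if calendar.isleap(year) else 365) * 24


def cutout_hours_ok(n_timestamps: int, year: int) -> bool:
    """True if a single-year hourly cutout has exactly one timestamp per hour."""
    return n_timestamps == expected_hours(year)


if __name__ == "__main__":
    import atlite

    cutout = atlite.Cutout("resources/atlite/GB_2023.nc")
    n = cutout.data.sizes["time"]
    print(f"{n} timestamps (expected {expected_hours(2023)}):", cutout_hours_ok(n, 2023))
```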
## Validation After Updates
### Quick Validation
```bash
# Validate all scenarios
python scripts/validate_scenarios.py
# Check specific data
python scripts/validate_data_integrity.py
```
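The two scripts can be chained so the run stops at the first failure, which is convenient in CI. This sketch assumes each script signals failure with a non-zero exit code; that behaviour is not confirmed here. `run_checks` is hypothetical.

```python
import subprocess
import sys

CHECKS = [
    [sys.executable, "scripts/validate_scenarios.py"],
    [sys.executable, "scripts/validate_data_integrity.py"],
]


def run_checks(commands) -> int:
    """Run each command in order; return the first non-zero exit code, else 0."""
    for cmd in commands:
        print("Running:", " ".join(cmd))
        result = subprocess.run(cmd)
        if result.returncode != 0:
            print(f"FAILED ({result.returncode}):", " ".join(cmd))
            return result.returncode
    return 0


if __name__ == "__main__":
    sys.exit(run_checks(CHECKS))
```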
### Full Regression Test
After major updates, re-run a known scenario and compare the result with a reference solution saved before the update:

```python
import pypsa

# Reference (before update)
n_ref = pypsa.Network("resources/network/HT35_solved_reference.nc")
# New (after update)
n_new = pypsa.Network("resources/network/HT35_solved.nc")

# Compare key metrics
print(f"System cost change: {(n_new.objective - n_ref.objective) / n_ref.objective * 100:.2f}%")

# Capacity (MW) by carrier; use p_nom_opt instead to compare optimised capacities
gen_ref = n_ref.generators.groupby("carrier").p_nom.sum()
gen_new = n_new.generators.groupby("carrier").p_nom.sum()
print("\nCapacity changes:")
# fill_value=0 keeps carriers that exist in only one of the two networks
print(gen_new.sub(gen_ref, fill_value=0).sort_values())
```
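To get a pass/fail answer rather than a printout, the capacity comparison can flag carriers whose change exceeds a tolerance. This sketch operates on the `gen_ref` and `gen_new` series computed above. The 5% default tolerance and the `flag_changes` name are arbitrary choices for illustration, not project conventions.

```python
import pandas as pd


def flag_changes(ref: pd.Series, new: pd.Series, rel_tol: float = 0.05) -> pd.DataFrame:
    """Return carriers whose capacity changed by more than `rel_tol` (a fraction).

    Carriers missing from one network are treated as 0 MW there, so carriers
    that appear or disappear are always flagged.
    """
    df = pd.DataFrame({"ref": ref, "new": new}).fillna(0.0)
    df["change"] = df["new"] - df["ref"]
    # Relative change is undefined (NaN) where the reference capacity is zero
    df["rel_change"] = df["change"] / df["ref"].where(df["ref"] != 0)
    changed = df["change"].abs() > rel_tol * df["ref"].abs()
    return df[changed].sort_values("change")
```

Usage: `print(flag_changes(gen_ref, gen_new))`. An empty result means every carrier stayed within the tolerance.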
## Version Control
### Best Practices
1. Commit data updates separately:

   ```bash
   git add data/generators/DUKES_5.11_2026.xlsx
   git commit -m "data: Update DUKES to 2026 edition"
   ```

2. Tag releases:

   ```bash
   git tag -a "data-2026Q1" -m "Data update Q1 2026"
   ```

3. Document changes: update `docs/readthedocs/source/development/release_notes.md`.
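The tag example above follows a `data-<year>Q<quarter>` pattern. If that convention is used consistently, the tag name can be derived from the update date. Here is a small sketch; `data_tag` is a hypothetical helper.

```python
from datetime import date


def data_tag(day: date) -> str:
    """Return a data-release tag such as 'data-2026Q1' for the given date."""
    quarter = (day.month - 1) // 3 + 1
    return f"data-{day.year}Q{quarter}"


if __name__ == "__main__":
    print(data_tag(date.today()))
```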
### Large Files
For large data files (>100 MB), consider:

- Git LFS for versioning
- External hosting with download scripts
- Documentation of manual download steps
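For externally hosted files, a download script should at least verify what it fetched. Below is a minimal sketch using only the standard library. The URL, destination and expected SHA-256 digest would come from wherever the project records them; they are hypothetical here. `fetch_verified` is not an existing project function.

```python
import hashlib
import urllib.request
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-256 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fetch_verified(url: str, dest: Path, expected_sha256: str) -> Path:
    """Download `url` to `dest` unless an identical file is already present."""
    dest = Path(dest)
    if dest.exists() and sha256_of(dest) == expected_sha256:
        return dest  # already downloaded and intact
    dest.parent.mkdir(parents=True, exist_ok=True)
    tmp = dest.with_name(dest.name + ".part")  # never leave a half-written dest
    urllib.request.urlretrieve(url, tmp)
    actual = sha256_of(tmp)
    if actual != expected_sha256:
        tmp.unlink()
        raise ValueError(f"checksum mismatch for {url}: got {actual}")
    tmp.replace(dest)
    return dest
```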
## Troubleshooting
### Data Format Changed
If a data source changes format:

1. Check the processing script for hardcoded column names.
2. Update the column mappings in `config/` if applicable.
3. Run Snakemake with `--dry-run` (`-n`) first to confirm which rules will re-run.
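Keeping column mappings in one place turns a format change into a small, local fix. This sketch applies such a mapping with pandas and fails loudly when an expected source column is missing. The mapping contents and `apply_column_mapping` are hypothetical, not the project's actual config.

```python
import pandas as pd


def apply_column_mapping(df: pd.DataFrame, mapping: dict[str, str]) -> pd.DataFrame:
    """Rename source columns to internal names, keeping only mapped columns.

    Raises KeyError listing every expected source column that is absent,
    which usually means the upstream format has changed.
    """
    missing = [col for col in mapping if col not in df.columns]
    if missing:
        raise KeyError(f"source columns not found (format change?): {missing}")
    return df.rename(columns=mapping)[list(mapping.values())]
```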
### Missing Data Points
If data is missing for specific years:

- Check whether the year is covered by the source.
- Consider interpolation for intermediate years.
- Document any assumptions made.
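Where a source publishes only milestone years, linear interpolation is one simple option. This pandas sketch assumes values are indexed by integer year. It fills only the gaps between the first and last published years and does not extrapolate beyond them. Whether linear interpolation suits a given series is itself a modelling assumption worth recording. `fill_intermediate_years` is hypothetical.

```python
import pandas as pd


def fill_intermediate_years(values: pd.Series) -> pd.Series:
    """Linearly interpolate missing years between the first and last given year."""
    values = values.sort_index()
    years = range(int(values.index.min()), int(values.index.max()) + 1)
    # method="index" weights the interpolation by the year values themselves
    return values.reindex(years).interpolate(method="index")
```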
### Coordinate Issues
If generators map to the wrong locations:

- Verify the coordinate system (OSGB36 vs WGS84).
- Check for data entry errors in the source.
- Use `scripts/validate_coordinates.py` to identify issues.
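A quick way to spot OSGB36/WGS84 mix-ups is to check which system's plausible range a coordinate pair falls into. Great Britain spans roughly 0-700 km easting and 0-1,300 km northing on the British National Grid. In WGS84 it spans roughly longitude -9 to 2 and latitude 49 to 61. This heuristic sketch uses those approximate bounds. `guess_crs` is hypothetical and does not replace `scripts/validate_coordinates.py`.

```python
def guess_crs(x: float, y: float) -> str:
    """Classify a coordinate pair as 'WGS84', 'OSGB36', or 'unknown'.

    x is longitude or easting (m); y is latitude or northing (m). The bounds
    are approximate extents of Great Britain in each system, so a swapped
    (lat, lon) pair or an out-of-area point comes back as 'unknown'.
    """
    if -9.0 <= x <= 2.0 and 49.0 <= y <= 61.0:
        return "WGS84"
    if 0 <= x <= 700_000 and 0 <= y <= 1_300_000:
        return "OSGB36"
    return "unknown"
```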