CMIP Model Inputs¶
This page focuses on CMIP6/CMIP7 model data: what you need, how it is
interpreted, and how to prepare and validate it before and during the
i7aof pipeline. It complements (but does not repeat) the full 8‑step
workflow described in the End‑to‑End Workflows page.
Scope & Role¶
CMIP ocean monthly fields (thetao, so, optionally zos) provide the time‑varying
forcing foundation. They are transformed to TEOS‑10 conservative temperature
(ct) and absolute salinity (sa), remapped, extrapolated, bias‑corrected
against an observational climatology, then used to derive thermal forcing (TF)
and annual products. All CMIP steps (except the bias correction invocation) run
per scenario (historical, plus one future scenario like ssp585).
Required Inputs & Minimum Coverage¶
You should supply:
Monthly thetao and so (Omon) for the full intended historical climatology window (e.g. 1995–2024, as set by [biascorr] climatology_start_year/end_year) and the chosen future scenario span.
Consistent horizontal grid and vertical coordinate (depth/lev); the model's native vertical coordinate must increase monotonically in depth or pressure.
Optional zos (sea surface height) if downstream workflows or validation need it.
Complete months (no gaps, no duplicates). Leap‑year handling should match CF conventions.
Minimum viable dataset: a continuous monthly time series covering the configured climatology window plus at least the first decade of the future scenario for testing.
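The "complete months" requirement can be checked before any processing. The sketch below is a minimal, standard-library illustration and not part of i7aof: the function name check_monthly_coverage is hypothetical, and it assumes you have already reduced the time axis of your files to a list of (year, month) tuples.

```python
from collections import Counter


def check_monthly_coverage(months, start_year, end_year):
    """Return (missing, duplicated) months for the inclusive year range.

    `months` is an iterable of (year, month) tuples, one per time step.
    """
    counts = Counter(months)
    expected = [(y, m) for y in range(start_year, end_year + 1)
                for m in range(1, 13)]
    # Counter returns 0 for keys it has never seen
    missing = [ym for ym in expected if counts[ym] == 0]
    duplicated = sorted(ym for ym, n in counts.items() if n > 1)
    return missing, duplicated


# Toy series: 1995 with March missing and July repeated
stamps = [(1995, m) for m in range(1, 13) if m != 3] + [(1995, 7)]
missing, duplicated = check_monthly_coverage(stamps, 1995, 1995)
print(missing)      # [(1995, 3)]
print(duplicated)   # [(1995, 7)]
```

Running the same check over the configured climatology window also catches an incomplete final year.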
Scenario Handling & Bias Correction Interaction¶
Steps 1–4 (split, convert, remap, extrapolate) are executed independently for
historical and the future scenario. Step 5 (bias correction) is invoked once
with the future scenario argument; internally it reads extrapolated CT/SA from
both scenarios and writes bias‑corrected monthly outputs for each. Subsequent CMIP steps (TF,
annual averages, back‑conversion) run per scenario using those corrected fields.
Directory & Naming Conventions (CMIP portion)¶
The pipeline builds a predictable hierarchy under <workdir> with separate
intermediate and final trees:
intermediate/01_split/<model>/<scenario>/Omon/{thetao,so}/...
intermediate/02_cmip_to_ct_sa/<model>/<scenario>/Omon/ct_sa/*_{ct,sa}_native.nc
intermediate/03_remap/<model>/<scenario>/Omon/ct_sa/*_{ct,sa}_remap.nc
intermediate/04_extrap/<model>/<scenario>/Omon/ct_sa/*_{ct,sa}_extrap_*.nc
intermediate/05_biascorr/<model>/<scenario>/<clim>/Omon/ct_sa/*_{ct,sa}_biascorr_*.nc
intermediate/06_ct_sa_to_tf/<model>/<scenario>/<clim>/Omon/ct_sa_tf0/*_{ct,sa,tf}_*.nc
intermediate/07_annual/<model>/<scenario>/<clim>/Oyr/ct_sa_tf/*_ann.nc
intermediate/08_ct_sa_to_thetao_so/<model>/<scenario>/<clim>/Oyr/thetao_so_tf/*_{thetao,so,tf}_ann.nc
final/AIS/<model>/<scenario>/ocean/<variable>/<version>/<variable>_AIS_<model>_<scenario>_ocean_<version>_<YYYY-YYYY>.nc
final/AIS/<model>/<scenario>/ocean/extras/{climatology,bias}/<variable>/<version>/<variable>_AIS_<model>_<scenario>_ocean_extras_{climatology,bias}_<version>_<YYYY-YYYY>.nc
Key points:
Scenario directory names must exactly match the names you pass (historical, ssp585, etc.).
Filenames carry variable tags (_ct_native, _sa_remap, _ct_biascorr) for clarity and automated discovery.
Annual products use Oyr while monthly products use Omon.
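To make the final-product naming pattern concrete, here is a small sketch that assembles a path from its components. The helper final_product_path is hypothetical (it is not an i7aof function), and the version string "v1" and the year range are placeholder values chosen only for illustration.

```python
from pathlib import PurePosixPath


def final_product_path(workdir, model, scenario, variable, version,
                       start_year, end_year):
    """Build the path of a final product following the layout listed above."""
    years = f"{start_year:04d}-{end_year:04d}"
    filename = (f"{variable}_AIS_{model}_{scenario}_ocean_"
                f"{version}_{years}.nc")
    return (PurePosixPath(workdir) / "final" / "AIS" / model / scenario
            / "ocean" / variable / version / filename)


# "v1" and 2015-2100 are illustrative placeholders, not pipeline defaults
path = final_product_path("/scratch/work_i7aof", "CESM2-WACCM", "ssp585",
                          "tf", "v1", 2015, 2100)
print(path)
```

Because the scenario string is used verbatim in both the directory and the filename, a mismatch in case produces a different path.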
Configuration Blocks & Tuning¶
Relevant sections in your config (*.cfg):
[inputdir]
base_dir = /path/to/raw_cmip_and_clim
[workdir]
base_dir = /scratch/work_i7aof
[split_cmip]
months_per_file = 120 # 10-year blocks (adjust for I/O patterns)
[convert_cmip]
time_chunk = 12 # TEOS-10 compute chunk length (months)
[remap_cmip]
vert_time_chunk = 1 # Vertical interpolation chunk
horiz_time_chunk = 120 # Horizontal remap chunk
[extrap_cmip]
time_chunk = 12 # Extrapolation chunk for Fortran steps
time_chunk_resample = 12 # Post-extrap vertical resample chunk
[biascorr]
climatology_start_year = 1995
climatology_end_year = 2024
time_chunk = 12 # Bias application chunk
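If you want to inspect these options outside the pipeline, Python's standard configparser can read the same INI-style syntax. This is only a sketch for inspection: the pipeline may load its configuration differently, and the config text is embedded as a string here so the example is self-contained.

```python
import configparser

# Excerpt of the options shown above, embedded as a string for the example
CONFIG_TEXT = """
[split_cmip]
months_per_file = 120  # 10-year blocks

[biascorr]
climatology_start_year = 1995
climatology_end_year = 2024
time_chunk = 12  # Bias application chunk
"""

# inline_comment_prefixes lets the parser drop trailing "# ..." comments
parser = configparser.ConfigParser(inline_comment_prefixes=("#",))
parser.read_string(CONFIG_TEXT)

months_per_file = parser.getint("split_cmip", "months_per_file")
start = parser.getint("biascorr", "climatology_start_year")
end = parser.getint("biascorr", "climatology_end_year")
window_years = end - start + 1  # inclusive window length

print(months_per_file, window_years)  # 120 30
```

To read a real file instead of a string, replace read_string(...) with parser.read("my.cfg").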
For each CMIP scenario you want to process, you will also define a
[<scenario>_files] section (for example, [historical_files] or
[ssp585_files]). Within each of these sections you provide one or more
expressions for thetao and so input files, typically glob patterns
relative to [inputdir] base_dir.
Optionally, you can restrict the split to a subset of years using
integer start_year and/or end_year options in the same
[<scenario>_files] section. When either or both are provided,
split_cmip (and the ismip7-antarctic-split-cmip CLI) will first
subset each input dataset to the overlapping year range and will skip
files that do not overlap at all. This is the recommended way to work
with a limited time span of the CMIP input data without modifying the
original files.
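The subsetting rule described above (keep the overlapping years, skip files with no overlap) can be sketched as follows. This illustrates the described behaviour only and is not the actual split_cmip implementation; the function name overlapping_years is hypothetical.

```python
def overlapping_years(file_start, file_end, start_year=None, end_year=None):
    """Return the inclusive (first, last) years to keep, or None to skip."""
    first = file_start if start_year is None else max(file_start, start_year)
    last = file_end if end_year is None else min(file_end, end_year)
    return (first, last) if first <= last else None


# A file spanning 1990-1999 is trimmed to the requested window
print(overlapping_years(1990, 1999, start_year=1995, end_year=2024))  # (1995, 1999)
# A file entirely before the window is skipped
print(overlapping_years(1980, 1989, start_year=1995, end_year=2024))  # None
# Only end_year given: the start of the file is kept
print(overlapping_years(2010, 2014, end_year=2012))                   # (2010, 2012)
```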
Tuning guidance:
Increase months_per_file for fewer open/close cycles if filesystem latency is high; keep it manageable for restarts.
Match time_chunk to available memory; larger chunks reduce Python overhead but raise peak memory.
Set [remap_cmip] vert_time_chunk = 1 unless vertical interpolation becomes a bottleneck.
Adjust time_chunk_resample if resampling memory or speed issues arise.
Keep bias correction time_chunk aligned with extrapolated chunking to minimize rechunk cost.
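A rough size estimate helps when matching time_chunk to available memory. The sketch below computes the size of one chunk of a single 4-D variable; the grid dimensions (60 levels, 761 x 761 points) and float32 storage are assumptions for illustration, so substitute the values of your own model and target grid. Actual peak memory is typically several times larger because of intermediate arrays.

```python
def chunk_mib(time_chunk, n_levels, ny, nx, bytes_per_value=4):
    """Approximate size in MiB of one chunk of a single 4-D variable."""
    return time_chunk * n_levels * ny * nx * bytes_per_value / 2**20


# Hypothetical grid: 60 levels on a 761 x 761 horizontal grid, float32
for time_chunk in (1, 12, 120):
    print(time_chunk, round(chunk_mib(time_chunk, 60, 761, 761)), "MiB")
```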
Enable TEOS‑10 debug/profiling with:
export I7AOF_DEBUG_TEOS10=1
Performance Considerations¶
Prefer contiguous storage layouts (e.g., reorganize native NetCDFs so time is the slowest varying dimension).
Use a fast parallel filesystem for <workdir> (scratch or burst buffer) and keep <inputdir> on reliable long-term storage.
Avoid very small chunk sizes (< 3 months), where Python/Xarray overhead dominates.
Monitor I/O wait with tools like iostat or an HPC profiler to guide chunk adjustments.
Validation Checklist¶
Before running the pipeline (or after Step 2):
Temporal coverage: all months present; no duplicate timestamps.
Units: check whether thetao is in degC or K (it must match the expected TEOS‑10 conversion path); so should be dimensionless Practical Salinity (PSS‑78). Convert if necessary prior to use.
Missing data: check the proportion of NaNs within cavity regions; large gaps may produce extensive extrapolation regions.
Vertical coordinate monotonic and positive down (or pressure increasing). If not, preprocess.
Global attributes: record source institution and experiment ID for provenance in downstream NetCDF outputs.
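The vertical-coordinate item in the checklist above lends itself to an automated check. The following is a minimal sketch with hypothetical helper names, assuming the depth axis has been read into a plain sequence of values in metres.

```python
def is_strictly_increasing(values):
    """True if every value is larger than the one before it."""
    values = list(values)
    return all(a < b for a, b in zip(values, values[1:]))


def check_depth_axis(depths):
    """Return a list of problems found in a depth coordinate (metres)."""
    depths = list(depths)
    problems = []
    if not is_strictly_increasing(depths):
        problems.append("depth is not strictly increasing")
    if any(d < 0 for d in depths):
        problems.append("negative depths: axis may be positive up")
    return problems


print(check_depth_axis([5.0, 15.0, 25.0, 40.0]))  # []
print(check_depth_axis([-5.0, -15.0, -25.0]))     # both problems reported
```

An empty list means the axis passed both checks; anything else suggests preprocessing is needed before Step 1.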
After bias correction (Step 5):
Mean difference (model minus reference) over the climatology window should be close to zero for ct and sa at most depths.
Spot-check a few profiles for unrealistic gradients introduced by extrapolation.
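The mean-difference check can be sketched on plain lists as below. The function name and the toy values are made up for illustration; in practice you would compute the same quantity from the bias-corrected and reference fields with your array library of choice.

```python
from statistics import fmean


def mean_bias_by_level(model, reference):
    """Mean of (model - reference) at each depth level.

    Both inputs are lists of profiles: one list of per-level values per
    time step, with identical shapes.
    """
    n_levels = len(model[0])
    return [fmean(m[k] - r[k] for m, r in zip(model, reference))
            for k in range(n_levels)]


# Toy data: two time steps, three levels; values are made up
corrected = [[1.0, 0.5, 0.2], [1.2, 0.5, 0.0]]
reference = [[1.1, 0.5, 0.1], [1.1, 0.5, 0.1]]
bias = mean_bias_by_level(corrected, reference)

# After a successful correction each level should be near zero
print(all(abs(b) < 1e-9 for b in bias))
```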
Common Pitfalls¶
Mixing scenario names (SSP585 vs ssp585), leading to separate directories.
Supplying an incomplete final year (e.g., 2024 missing later months), causing a biased climatology.
Extremely fine time chunks (1 month) causing poor throughput.
Depth coordinate mislabeled (e.g., using lev that is actually a layer number without physical meaning).
Not cleaning temporary partial outputs after an interrupted run; reruns may skip steps with incomplete data.
Minimal Programmatic Snippet (CMIP portion only)¶
from i7aof.convert.split import split_cmip
from i7aof.convert.cmip_to_ct_sa import convert_cmip_to_ct_sa
from i7aof.remap.cmip import remap_cmip
from i7aof.extrap.cmip import extrap_cmip
from i7aof.biascorr.classic import biascorr_cmip
model = 'CESM2-WACCM'
future = 'ssp585'
clim = 'zhou_annual_06_nov'
cfg = 'my.cfg'
for scenario in ['historical', future]:
    split_cmip(model, scenario, user_config_filename=cfg)
    convert_cmip_to_ct_sa(model, scenario, user_config_filename=cfg)
    remap_cmip(model, scenario, user_config_filename=cfg)
    extrap_cmip(model, scenario, user_config_filename=cfg)
# Single bias correction invocation writes both scenarios
biascorr_cmip(model, future, clim_name=clim, user_config_filename=cfg)
For TF, annual averages, and back‑conversion see End‑to‑End Workflows.
References¶
TEOS‑10 Manual: https://teos-10.org
CF Conventions: https://cfconventions.org
ISMIP Documentation (grid definitions): see Remapping
Next Steps¶
Once inputs validate, proceed to the full workflow or integrate with a climatology (see Climatology Workflows). Consider adding automated QA checks using the validation checklist above before large batch processing on HPC.