i7aof.io¶
Purpose: Shared I/O helpers for NetCDF writing and robust file downloads.
Note
This page documents both modules: i7aof.io
and i7aof.download
.
Public Python API (by module)¶
Import paths and brief descriptions by module:
Module:
i7aof.io
write_netcdf()
: Write an xarray.Dataset to NetCDF with appropriate fill values and optional progress bar; supports a conversion path for 64-bit NetCDF3.
Note
Why use i7aof.io.write_netcdf()
instead of xarray.Dataset.to_netcdf
?
Typed fill values, not NaNs: xarray’s default can leave NaNs for missing values; many tools (e.g., NCO) expect a typed
_FillValue
. This wrapper maps NumPy dtypes to NetCDF fill values (vianetCDF4.default_fillvals
), skips string fills, and only writes_FillValue
when a variable actually contains NaNs.Unlimited time by default: automatically marks
time
as an unlimited dimension when present (and clears unlimited dims when not), improving compatibility with tools that append or concatenate over time.Reliable CDF5 output:
format='NETCDF3_64BIT_DATA'
is inconsistently supported by backends. Here, data are written as NETCDF4 and converted to CDF5 using NCO (ncks -5
), then the temp file is removed. Ifengine='scipy'
is requested for this path, it is coerced tonetcdf4
to avoid failures.Better user experience for large writes: optional Dask progress bar without changing user code.
Note within a note: CDF5 conversion uses ncks
(NCO). In the conda-forge
environment defined by dev-spec.txt
, NCO is included and available on PATH.
Module:
i7aof.download
download_file()
: Download a single file to a directory or explicit path, with progress bar.download_files()
: Download many files from a base URL to a directory, preserving subpaths.
Required config options¶
None. Functions accept explicit arguments and use no global config.
Outputs¶
NetCDF files written via
i7aof.io.write_netcdf()
to the provided filename (may use a temporary file during format conversion).Downloaded files saved to requested paths; parent directories are created.
Data model¶
Encodes
_FillValue
per variable based on NumPy dtype → NetCDF mapping, skipping string types; time is set as an unlimited dimension when present.For
format='NETCDF3_64BIT_DATA'
, the dataset is written as NETCDF4 then converted usingncks -5
, and the temporary file is removed.
Runtime and external requirements¶
Core:
xarray
,netCDF4
,numpy
,dask
(for the optional progress bar).Tools:
ncks
from NCO if converting toNETCDF3_64BIT_DATA
.Downloads:
requests
,tqdm
for progress.For the authoritative conda-forge environment, see
dev-spec.txt
(notepyproject.toml
lists a PyPI-only subset).
Usage¶
import xarray as xr
from i7aof.io import write_netcdf
ds = xr.Dataset({'a': ('x', [1, 2, 3])})
write_netcdf(ds, 'out.nc')
from i7aof.download import download_file
download_file('https://example.com/file.txt', 'data/', quiet=True)
Internals (for maintainers)¶
write_netcdf
builds an encoding dict over all data variables and coords, sets per-dtype_FillValue
, and manages unlimited dims.When converting to
NETCDF3_64BIT_DATA
, a temporary NETCDF4 file is created and NCO is invoked withncks -O -5
to produce the final output.Downloads stream content in 1KB chunks and update a
tqdm
progress bar.
Edge cases / validations¶
Fill values are only applied when variables contain NaNs; string dtypes intentionally have no
_FillValue
.If
engine='scipy'
is requested for a conversion path, it is forced tonetcdf4
to avoid incompatibilities.Download functions do nothing if destination files exist unless
overwrite=True
is passed.
Extension points¶
Add retry/backoff to downloads; allow custom chunk sizes.