Processing: ERA5 Regional Mean Profiles for libRadtran#

This notebook prepares seasonal climatological temperature and humidity profiles for two regions — tropics (20°S–20°N) and Arctic (70°N–85°N) — from the ARCO ERA5 reanalysis archive and saves them in the format expected by pyRadtran.

The output (data/era5_region_profiles.nc) can be loaded directly in lapse_rate_feedback.ipynb to replace the single ARCO snapshot.

The workflow mirrors sea_ice_era5.ipynb: download a regional subset from Google Cloud Storage, group by season, compute means, rename to pyRadtran conventions, save.

Warning

This notebook downloads ERA5 data from Google Cloud Storage. The subset used here (2000–2020, two lat/lon boxes, 6-hourly, 64×128 grid) is manageable on a laptop with ~8 GB RAM, but will take several minutes to load. Use the Dask cluster cell below if you want faster parallel loading.

Setup#

import numpy as np
import pandas as pd
import xarray as xr
import gcsfs
import matplotlib.pyplot as plt
from pathlib import Path

OUTPUT_PATH = Path("data/era5_region_profiles.nc")
OUTPUT_PATH.parent.mkdir(parents=True, exist_ok=True)

Load ARCO ERA5 (6-hourly, coarse resolution)#

We use the 6-hourly 64×128 equiangular ARCO store — the same one used in sea_ice_era5.ipynb. It is much smaller than the high-resolution 1-hourly store and sufficient for computing seasonal mean profiles.

gcs  = gcsfs.GCSFileSystem(token="anon")
path = "gs://gcp-public-data-arco-era5/ar/1959-2022-6h-128x64_equiangular_with_poles_conservative.zarr/"

full_era5 = xr.open_zarr(gcs.get_mapper(path), chunks="auto")
print(list(full_era5.data_vars))
full_era5
['10m_u_component_of_wind', '10m_v_component_of_wind', '10m_wind_speed', '2m_temperature', 'angle_of_sub_gridscale_orography', 'anisotropy_of_sub_gridscale_orography', 'geopotential', 'geopotential_at_surface', 'high_vegetation_cover', 'lake_cover', 'lake_depth', 'land_sea_mask', 'low_vegetation_cover', 'mean_sea_level_pressure', 'sea_ice_cover', 'sea_surface_temperature', 'slope_of_sub_gridscale_orography', 'soil_type', 'specific_humidity', 'standard_deviation_of_filtered_subgrid_orography', 'standard_deviation_of_orography', 'surface_pressure', 'temperature', 'toa_incident_solar_radiation', 'toa_incident_solar_radiation_12hr', 'toa_incident_solar_radiation_24hr', 'toa_incident_solar_radiation_6hr', 'total_cloud_cover', 'total_column_water_vapour', 'total_precipitation_12hr', 'total_precipitation_24hr', 'total_precipitation_6hr', 'type_of_high_vegetation', 'type_of_low_vegetation', 'u_component_of_wind', 'v_component_of_wind', 'vertical_velocity', 'wind_speed']
<xarray.Dataset> Size: 326GB
Dimensions:                                           (time: 92040,
                                                       longitude: 128,
                                                       latitude: 64, level: 13)
Coordinates:
  * latitude                                          (latitude) float64 512B ...
  * level                                             (level) int64 104B 50 ....
  * longitude                                         (longitude) float64 1kB ...
  * time                                              (time) datetime64[ns] 736kB ...
Data variables: (12/38)
    10m_u_component_of_wind                           (time, longitude, latitude) float32 3GB dask.array<chunksize=(40, 128, 64), meta=np.ndarray>
    10m_v_component_of_wind                           (time, longitude, latitude) float32 3GB dask.array<chunksize=(40, 128, 64), meta=np.ndarray>
    10m_wind_speed                                    (time, longitude, latitude) float32 3GB dask.array<chunksize=(40, 128, 64), meta=np.ndarray>
    2m_temperature                                    (time, longitude, latitude) float32 3GB dask.array<chunksize=(40, 128, 64), meta=np.ndarray>
    angle_of_sub_gridscale_orography                  (longitude, latitude) float32 33kB dask.array<chunksize=(128, 64), meta=np.ndarray>
    anisotropy_of_sub_gridscale_orography             (longitude, latitude) float32 33kB dask.array<chunksize=(128, 64), meta=np.ndarray>
    ...                                                ...
    type_of_high_vegetation                           (longitude, latitude) float32 33kB dask.array<chunksize=(128, 64), meta=np.ndarray>
    type_of_low_vegetation                            (longitude, latitude) float32 33kB dask.array<chunksize=(128, 64), meta=np.ndarray>
    u_component_of_wind                               (time, level, longitude, latitude) float32 39GB dask.array<chunksize=(40, 13, 128, 64), meta=np.ndarray>
    v_component_of_wind                               (time, level, longitude, latitude) float32 39GB dask.array<chunksize=(40, 13, 128, 64), meta=np.ndarray>
    vertical_velocity                                 (time, level, longitude, latitude) float32 39GB dask.array<chunksize=(40, 13, 128, 64), meta=np.ndarray>
    wind_speed                                        (time, level, longitude, latitude) float32 39GB dask.array<chunksize=(40, 13, 128, 64), meta=np.ndarray>

Optional: Dask cluster for faster loading#

Uncomment the cell below if you want to parallelise the data download.

# from dask.distributed import Client, LocalCluster
#
# cluster = LocalCluster(
#     n_workers=8,
#     threads_per_worker=1,
#     memory_limit="auto",
#     local_directory="/tmp/dask",
#     dashboard_address=":8787",
# )
# client = Client(cluster)
# print(client)

Select regional subsets (2000–2020)#

We restrict to 2000–2020 — long enough for a stable climatology, short enough to stay manageable. Pressure-level variables (temperature, specific_humidity, geopotential) plus the surface variable sea_surface_temperature are kept.

VARS = ["temperature", "specific_humidity"]
if "geopotential" in full_era5.data_vars:
    VARS.append("geopotential")
if "sea_surface_temperature" in full_era5.data_vars:
    VARS.append("sea_surface_temperature")

print("Using variables:", VARS)

ds_time = full_era5[VARS].sel(time=slice("2000-01-01", "2020-12-31"))
Using variables: ['temperature', 'specific_humidity', 'geopotential', 'sea_surface_temperature']
# Latitude orientation: check whether it runs 90→-90 or -90→90
lats = full_era5.latitude.values
print(f"Latitude range: {lats.min():.1f} to {lats.max():.1f}")
ascending = lats[0] < lats[-1]

# Tropics: 20°S to 20°N
ds_tropics = ds_time.sel(latitude=slice(-20.0, 20.0) if ascending else slice(20.0, -20.0))

# Arctic: 70°N to 85°N
ds_arctic = ds_time.sel(latitude=slice(70.0, 85.0) if ascending else slice(85.0, 70.0))

print(f"Tropics lat: {ds_tropics.latitude.values[[0,-1]]}")
print(f"Arctic  lat: {ds_arctic.latitude.values[[0,-1]]}")
Latitude range: -90.0 to 90.0
Tropics lat: [-18.57142857  18.57142857]
Arctic  lat: [70.         84.28571429]

Load into memory#

We take the spatial mean over each region before loading so the in-memory footprint is small: time × level only.

print("Computing tropical spatial mean...")
ds_tropics_mean = ds_tropics.mean(dim=["latitude", "longitude"]).load()
print("Computing Arctic spatial mean...")
ds_arctic_mean  = ds_arctic.mean(dim=["latitude", "longitude"]).load()
print("Done.")
Computing tropical spatial mean...
Computing Arctic spatial mean...
Done.
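Note that the plain .mean() above gives every latitude row equal weight; over a wide band, a cos(latitude) area weighting via xarray's .weighted() is slightly more accurate because grid cells shrink toward the pole. A minimal sketch on a toy field (the data here are illustrative, not the ERA5 subset):

```python
import numpy as np
import xarray as xr

rng = np.random.default_rng(0)
# Toy Arctic-band field: 2 time steps, 4 latitudes, 3 longitudes
ds = xr.Dataset(
    {"temperature": (("time", "latitude", "longitude"), rng.normal(260.0, 5.0, (2, 4, 3)))},
    coords={"latitude": [70.0, 75.0, 80.0, 85.0], "longitude": [0.0, 120.0, 240.0]},
)

# cos(latitude) weights compensate for grid cells shrinking toward the pole
weights = np.cos(np.deg2rad(ds.latitude))
weighted = ds.weighted(weights).mean(dim=["latitude", "longitude"])
plain = ds.mean(dim=["latitude", "longitude"])
print(float(weighted.temperature.isel(time=0)), float(plain.temperature.isel(time=0)))
```

Over the narrow tropical and Arctic bands used here the difference is small, which is why the unweighted mean is acceptable.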

Compute seasonal means#

Group by calendar season (DJF, MAM, JJA, SON) and compute the mean profile.

SEASONS = ["DJF", "MAM", "JJA", "SON"]

def seasonal_mean(ds):
    """Return a Dataset with a new 'season' dimension (4 × level)."""
    parts = [ds.sel(time=(ds.time.dt.season == s)).mean("time") for s in SEASONS]
    return xr.concat(parts, dim=pd.Index(SEASONS, name="season"))


trp_seasonal = seasonal_mean(ds_tropics_mean)
arc_seasonal = seasonal_mean(ds_arctic_mean)

trp_seasonal
<xarray.Dataset> Size: 776B
Dimensions:                  (season: 4, level: 13)
Coordinates:
  * level                    (level) int64 104B 50 100 150 200 ... 850 925 1000
  * season                   (season) object 32B 'DJF' 'MAM' 'JJA' 'SON'
Data variables:
    temperature              (season, level) float32 208B 204.3 192.5 ... 298.7
    specific_humidity        (season, level) float32 208B 2.671e-06 ... 0.0149
    geopotential             (season, level) float32 208B 2.014e+05 ... 1.014...
    sea_surface_temperature  (season) float32 16B 300.4 301.0 300.1 300.3
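The explicit loop in seasonal_mean keeps the seasons in calendar order; xarray's built-in groupby gives the same means but sorts the seasons alphabetically (DJF, JJA, MAM, SON). A sketch of the equivalence on a toy time series (illustrative data, not the ERA5 subset):

```python
import numpy as np
import pandas as pd
import xarray as xr

SEASONS = ["DJF", "MAM", "JJA", "SON"]
times = pd.date_range("2000-01-01", "2000-12-31", freq="D")
ds = xr.Dataset({"t": ("time", np.linspace(250.0, 300.0, times.size))}, coords={"time": times})

# Built-in groupby: seasons come out sorted alphabetically
by_group = ds.groupby("time.season").mean("time")

# Explicit loop, as in seasonal_mean above: seasons in calendar order
by_loop = xr.concat(
    [ds.sel(time=ds.time.dt.season == s).mean("time") for s in SEASONS],
    dim=pd.Index(SEASONS, name="season"),
)

for s in SEASONS:
    assert np.isclose(float(by_group["t"].sel(season=s)), float(by_loop["t"].sel(season=s)))
```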

Assemble output dataset#

We rename variables and coordinates to match the pyRadtran ERA5 interface:

  • temperature → t (units: K)

  • specific_humidity → q (units: kg/kg)

  • geopotential → z (units: m²/s²)

  • level → pressure_level (units: hPa)

A region dimension is added for convenient downstream selection.

def rename_to_pyradtran(ds):
    rename_map = {
        "temperature":        "t",
        "specific_humidity":  "q",
        "level":              "pressure_level",
    }
    if "geopotential" in ds:
        rename_map["geopotential"] = "z"
    ds = ds.rename({k: v for k, v in rename_map.items() if k in ds or k in ds.coords})
    ds["t"].attrs["units"]              = "K"
    ds["q"].attrs["units"]              = "kg/kg"
    ds["pressure_level"].attrs["units"] = "hPa"
    if "z" in ds:
        ds["z"].attrs["units"]          = "m2/s2"
    return ds


trp_out = rename_to_pyradtran(trp_seasonal)
arc_out = rename_to_pyradtran(arc_seasonal)

# If no geopotential in the coarse store, add a dummy
if "z" not in trp_out:
    trp_out["z"] = xr.zeros_like(trp_out["t"]).assign_attrs(units="m2/s2")
    arc_out["z"] = xr.zeros_like(arc_out["t"]).assign_attrs(units="m2/s2")

# If no SST, use the near-surface temperature as proxy
if "sea_surface_temperature" not in trp_out:
    trp_out["sea_surface_temperature"] = trp_out["t"].isel(pressure_level=-1)
    arc_out["sea_surface_temperature"] = arc_out["t"].isel(pressure_level=-1)

ds_out = xr.concat(
    [trp_out, arc_out],
    dim=pd.Index(["tropics", "arctic"], name="region"),
)

ds_out
<xarray.Dataset> Size: 1kB
Dimensions:                  (region: 2, season: 4, pressure_level: 13)
Coordinates:
  * pressure_level           (pressure_level) int64 104B 50 100 150 ... 925 1000
  * season                   (season) object 32B 'DJF' 'MAM' 'JJA' 'SON'
  * region                   (region) object 16B 'tropics' 'arctic'
Data variables:
    t                        (region, season, pressure_level) float32 416B 20...
    q                        (region, season, pressure_level) float32 416B 2....
    z                        (region, season, pressure_level) float32 416B 2....
    sea_surface_temperature  (region, season) float32 32B 300.4 301.0 ... 272.9

Save to netCDF#

ds_out.to_netcdf(OUTPUT_PATH)
print(f"Saved: {OUTPUT_PATH}")
Saved: data/era5_region_profiles.nc
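Downstream, the file is opened with xr.open_dataset and indexed by region and season. A minimal sketch of the selection pattern, run here on an in-memory stand-in with the same layout (the numbers are placeholders):

```python
import numpy as np
import pandas as pd
import xarray as xr

# Stand-in with the same dimensions as era5_region_profiles.nc
ds = xr.Dataset(
    {"t": (("region", "season", "pressure_level"), np.full((2, 4, 6), 250.0))},
    coords={
        "region": pd.Index(["tropics", "arctic"], name="region"),
        "season": pd.Index(["DJF", "MAM", "JJA", "SON"], name="season"),
        "pressure_level": [50, 100, 200, 500, 850, 1000],
    },
)

# In the real notebook: ds = xr.open_dataset("data/era5_region_profiles.nc")
profile = ds.sel(region="arctic", season="DJF")
print(profile["t"].dims)  # ('pressure_level',)
```

Selecting a single region and season collapses those dimensions, leaving a 1-D profile over pressure_level, which is the shape pyRadtran expects.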

Quick verification#

Check that the profiles look physically reasonable.

ds_check = xr.open_dataset(OUTPUT_PATH)

fig, axes = plt.subplots(1, 2, figsize=(11, 5), sharey=True)

for region, color in [("tropics", "#d73027"), ("arctic", "#4575b4")]:
    for season, ls in zip(SEASONS, ["-", "--", "-.", ":"]):
        ds_check["t"].sel(region=region, season=season).plot(
            y="pressure_level",
            yincrease=False,
            ax=axes[0],
            color=color,
            ls=ls,
            label=f"{region} {season}",
        )
        ds_check["q"].sel(region=region, season=season).plot(
            y="pressure_level",
            yincrease=False,
            ax=axes[1],
            color=color,
            ls=ls,
            label=f"{region} {season}",
        )

for ax in axes:
    ax.set_yscale("log")
    ax.set_ylim(1000, 100)

axes[0].set_title("Temperature (K)")
axes[1].set_title("Specific humidity (kg/kg)")
axes[0].legend(fontsize=7, ncol=2)
plt.tight_layout()
[Figure: seasonal-mean temperature (K) and specific humidity (kg/kg) profiles against pressure level, tropics vs. Arctic, all four seasons.]
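Beyond the visual check, a few numeric assertions catch gross errors. A sketch with rough physical bounds (the thresholds are my own, not from the notebook); in the notebook itself, run the assertions on ds_check directly:

```python
import numpy as np
import xarray as xr

# Minimal stand-in with the output layout (placeholder values)
levels = [50, 100, 200, 500, 850, 1000]
ds_check = xr.Dataset(
    {
        "t": (("season", "pressure_level"),
              np.array([[205, 210, 220, 255, 280, 295]] * 4, dtype=float)),
        "q": (("season", "pressure_level"),
              np.array([[3e-6, 3e-6, 2e-5, 1e-3, 8e-3, 1.5e-2]] * 4)),
    },
    coords={"season": ["DJF", "MAM", "JJA", "SON"], "pressure_level": levels},
)

# Rough physical bounds: temperatures within 150–330 K, humidity non-negative,
# near-surface air warmer than the 100 hPa level
assert float(ds_check["t"].min()) > 150 and float(ds_check["t"].max()) < 330
assert float(ds_check["q"].min()) >= 0
assert bool((ds_check["t"].sel(pressure_level=1000) > ds_check["t"].sel(pressure_level=100)).all())
```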