AQUAVIEW Case Study

Predicting Coastal Hypoxia Across 12 Stations and 3 Data Systems

What does it actually look like to build a multi-source ocean analysis from scratch, using a unified data catalog?

Hypoxia, dissolved oxygen dropping below 2 mg/L, kills coastal ecosystems fast. Fish flee, shellfish die, and the damage can cascade for months. Monitoring networks catch it after it starts. The question we wanted to answer: can we predict when a new hypoxia event will begin, days before it happens, using data from a network of stations, satellite products, and offshore profiles?

This post walks through building that system for the northern Gulf of Mexico. Along the way, we used AQUAVIEW as the data discovery and access layer, and we'll be honest about what that did and didn't change.

The Hard Problem: Onset, Not State

The Gulf of Mexico hypoxia prediction field has moved fast. Rajasekaran et al. (2025) benchmark four deep learning architectures on daily hypoxia classification, with their Spatio-Temporal Transformer reaching AUC-ROC above 0.98. Xue et al. (2025) blend ROMS hindcast output with AI for 72-hour forecasts. XGBoost and Random Forest models show up in study after study, from Gulf-wide spatial mapping to lagoon-scale predictions in the Mediterranean.

Nearly all of this work predicts hypoxia state: is dissolved oxygen below the threshold right now, or will it be tomorrow? Most models also train on output from coupled hydrodynamic-biogeochemical models rather than raw observations. Both choices make sense for their purposes, but they leave a gap.

We targeted something different: onset prediction. Not "will DO be low?" but "when will a new hypoxic event begin?" Predicting the transition from normal oxygen to hypoxic conditions, days in advance, from observational data alone. That's the question coastal managers actually need answered, and it's harder than state classification because DO autocorrelation can't carry you.
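To make the distinction concrete, here's a minimal sketch of how an onset label differs from a state label, assuming a daily-mean DO series and the post's 2 mg/L threshold (the series values are invented for illustration):

```python
import pandas as pd

# Toy daily-mean DO series (values invented for illustration)
do = pd.Series([4.1, 3.5, 1.8, 1.6, 3.0, 1.9],
               index=pd.date_range("2024-07-01", periods=6, freq="D"))

state = do < 2.0                                   # state label: hypoxic today?
onset = state & ~state.shift(1, fill_value=False)  # onset label: transition into hypoxia

print(int(state.sum()), int(onset.sum()))  # prints 3 2
```

State classification gets partial credit from autocorrelation (day two of an ongoing event is easy to call); onset prediction only scores on the transitions, which is why it's the harder target.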

Finding the Data

The first step in any ocean analysis is figuring out what data exists, where it lives, and how to get it. This is usually the most tedious part. You're browsing NOAA pages, clicking through THREDDS directories, guessing at URL patterns, and hoping variable names are consistent across stations.

We used AQUAVIEW's STAC API to shortcut this. Four API calls against three collections did the core discovery work:

GET /search?collections=NDBC&bbox=-91,28,-85,31               → 72 NDBC stations
GET /search?collections=WOD&q=Oxygen&bbox=-91,28,-85,31       → 1,770 WOD cruise datasets
GET /search?collections=COASTWATCH&q=sea surface temperature  → 25 satellite SST products
GET /search?collections=COASTWATCH&q=chlor                    → 8 chlorophyll products

Three different data systems. Three different hosting infrastructures. Same API.

From the NDBC results, we filtered for stations where aquaview:variables includes dissolved_oxygen, which narrowed 72 stations down to 14 with DO sensors listed. Then for each of those 14, a GET /collections/NDBC/items/{id} call returned the full item metadata including OPeNDAP asset URLs we'd use to download data. No URL construction, no guessing. The download links came straight from the catalog.
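A sketch of that filter step in Python. The item shapes follow standard STAC GeoJSON, and the `aquaview:variables` property name comes from the catalog; the base URL in the commented live call is a placeholder, not AQUAVIEW's real host.

```python
def do_station_assets(items):
    """Keep STAC items whose metadata lists a DO sensor; map id -> OPeNDAP href."""
    kept = [it for it in items
            if "dissolved_oxygen" in it["properties"].get("aquaview:variables", [])]
    return {it["id"]: it["assets"]["dods"]["href"] for it in kept}

# Live usage would look roughly like (BASE is hypothetical):
#   items = requests.get(f"{BASE}/search",
#                        params={"collections": "NDBC",
#                                "bbox": "-91,28,-85,31"}).json()["features"]
#   urls = do_station_assets(items)
```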

Two of those 14 turned out to have garbage data: every dissolved oxygen reading was the fill value 99.0. Quality filtering dropped them, leaving 12 stations with valid dissolved oxygen records.
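The quality filter itself is one line; a hedged sketch, assuming DO arrives as a pandas Series and 99.0 is the sentinel value:

```python
import pandas as pd

FILL = 99.0  # NDBC fill value observed in these files

def usable(do: pd.Series) -> bool:
    """A station is usable only if at least one DO reading is not the fill value."""
    return bool((do != FILL).any())
```

Applied per station, this is the check that dropped 2 of the 14 candidates.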

The CoastWatch discovery was the most interesting addition. AQUAVIEW returned satellite product items with ERDDAP endpoints in their assets field. One product, the Multiparameter Eddy Significance Index, bundles daily satellite SST, sea surface salinity, chlorophyll-a, sea surface height, and eddy kinetic energy into a single gridded dataset at 0.25° resolution. Because ERDDAP supports server-side spatial subsetting, we could request just the grid cells covering our study area. That's 1,438 days of daily satellite coverage from a dataset we wouldn't have found through NDBC alone.

That's three data access patterns: NDBC stations via OPeNDAP (THREDDS), WOD profiles via S3, and CoastWatch satellite grids via ERDDAP. Each has its own URL structure, its own subsetting conventions, and its own metadata schema. AQUAVIEW abstracts all of that behind assets.csv.href and assets.dods.href.

The Station Network

AQUAVIEW discovered 12 usable NDBC stations with dissolved oxygen sensors across the northern Gulf.

[Figure: interactive map of the 12-station network showing the primary station (GBHM6), network stations, cross-station correlation lines, and satellite coverage. Summary stats: 30,500 station-days, 237 hypoxia onsets, 1,438 satellite days, 319 features.]

That's roughly 30,500 station-days of observations, 237 hypoxia onset events, and 835 total hypoxic days, concentrated in just 9 of the 12 stations. GBHM6 alone accounts for more than half of all hypoxic days. Hypoxia is extremely localized even within this relatively small network.

What the Data Shows

The station correlations tell an interesting story. Once fill values were properly filtered, most stations showed strong monthly DO correlations with DPHA1: Cedar Point (r=0.901), Bon Secour (r=0.895), Perdido Pass (r=0.883). The Grand Bay stations correlate well too (GDQM6: r=0.793, GBHM6: r=0.786).

This matters for modeling because it means the cross-station features carry real signal. A drop in DO at Cedar Point today likely reflects conditions that will show up at Dauphin Island soon. The network isn't just more data; it's spatial context that a single-station model can't capture.
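A sketch of that correlation check, assuming `daily` is a DataFrame of daily-mean DO with one column per station id (the function name is ours; DPHA1 is the Dauphin Island reference station from the numbers above):

```python
import pandas as pd

def monthly_do_corr(daily: pd.DataFrame, ref: str = "DPHA1") -> pd.Series:
    """Pearson correlation of monthly-mean DO between each station and a reference."""
    monthly = daily.resample("MS").mean()      # calendar-month means
    return monthly.corrwith(monthly[ref]).drop(ref)
```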

Station Data

[Table: per-station summary — Station, Location, Hypoxic Days, Onsets, Mean DO.]

Model Results

At one-day lead, local station data alone (AUC-ROC 0.919) performs essentially the same as the full multi-source model (0.917). When you're predicting 24 hours out, the target station's own dissolved oxygen trend contains most of the signal.

The picture changes at five and seven-day lead times. At five days, the satellite-enriched model (0.866 AUC-ROC, 0.610 AUC-PR) outperforms NDBC-only (0.847, 0.555). At seven days, the gap persists: 0.853/0.642 vs 0.842/0.590. The satellite features, SST anomalies and eddy kinetic energy from CoastWatch, provide basin-scale context that arrives at the coast days later. That's 36 additional engineered features from 1,438 days of satellite coverage, discovered through AQUAVIEW's CoastWatch collection.

The no-DO set is revealing. At one day, removing dissolved oxygen drops AUC-ROC from 0.921 to 0.807. At seven days, no-DO (0.850) nearly matches the full model (0.853). Environmental context alone carries almost all the predictive signal at longer horizons.

For context: these are onset predictions from raw observational data, a harder task than the state classification most published models target. Rajasekaran et al. report AUC-ROC of 0.98+ for daily state classification using deep learning on ROMS hindcast data. Xue et al. report accuracy of 0.85 and F1 of 0.72 for their blended mechanistic-AI model. Our numbers are lower, but they're answering a different question: not "is DO low right now?" but "will a new event start in 5-7 days?" with no physics model in the loop. The apples-to-apples comparison doesn't quite exist yet in the literature.
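For readers who want to reproduce the metric setup, here is a minimal sketch of a chronological train/test split reporting both metrics. The post doesn't name its model class, so `GradientBoostingClassifier` stands in for the gradient-boosted learners it discusses, and the 80/20 split fraction is an assumption.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score, average_precision_score

def evaluate(X, y, train_frac=0.8):
    """Chronological split (no shuffling) so the test period is strictly later."""
    split = int(len(y) * train_frac)
    model = GradientBoostingClassifier().fit(X[:split], y[:split])
    p = model.predict_proba(X[split:])[:, 1]
    return roc_auc_score(y[split:], p), average_precision_score(y[split:], p)
```

With rare events like onsets, AUC-PR is the more demanding number, which is why both are reported above.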

Model Performance Across Lead Times

AUC-ROC / AUC-PR for four feature sets. Satellite features provide the largest advantage at 5-7 day leads.

Feature Engineering

Cross-Source (319)
  • NDBC: 18 lags of DO, salinity, temperature
  • Satellite: 36 CoastWatch features
  • Network: Cross-station DO correlations
  • Temporal: Month, hour-of-day indicators
NDBC-Only (283)
  • DO lags (1-7 days)
  • Salinity lags
  • Temperature lags
  • Temporal indicators
Local-Only (173)
  • Same-station DO lags
  • Temperature lags
  • Salinity lags
  • Temporal indicators
No-DO (182)
  • All satellite features
  • Cross-station temp/salinity
  • Network correlations
  • Temporal context only
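The lag features that dominate these sets can be built mechanically. A sketch, assuming a per-station daily DataFrame with `do`, `salinity`, and `temperature` columns (the column names and helper are ours; the 1-7 day lag depth is from the lists above):

```python
import pandas as pd

def add_lags(df: pd.DataFrame, cols=("do", "salinity", "temperature"),
             lags=range(1, 8)) -> pd.DataFrame:
    """Append 1-7 day lag columns plus a month indicator; drop warm-up rows."""
    out = df.copy()
    for col in cols:
        for k in lags:
            out[f"{col}_lag{k}"] = df[col].shift(k)  # value k days earlier
    out["month"] = out.index.month                   # coarse seasonal indicator
    return out.dropna()
```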

What AQUAVIEW Did and Didn't Do

AQUAVIEW helped with discovery across three different data systems. Each item's assets field contains the actual download URLs. The script goes from catalog query to data download without URL reverse-engineering.

We found stations we didn't know about. The original plan called for 9 stations. AQUAVIEW returned 14 DO-equipped stations, 5 more than expected. KATA1 alone contributed 110 hypoxic days and 43 onset events.

A practical limitation: WOD annual files are multi-gigabyte global datasets. AQUAVIEW correctly provided their S3 URLs, but extracting a regional subset requires HTTP byte-range requests rather than whole-file downloads. The CoastWatch ERDDAP datasets required iteration too. AQUAVIEW provided the base URLs and variable metadata, but ERDDAP's griddap subsetting syntax is finicky: each variable needs its own dimension constraints, some datasets have hidden altitude dimensions, and coordinate values must align with the actual grid. Getting the queries right took several rounds. The catalog handled discovery; the subsetting details were ours to figure out.
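For concreteness, here is roughly what assembling one of those griddap queries looks like. Everything specific is a placeholder: the host, dataset id, variable names, and date range are invented, and a real dataset may carry an extra altitude dimension that needs its own constraint.

```python
# Each variable must repeat the full dimension constraint -- the part that
# makes the syntax finicky. Brackets may need percent-encoding in practice.
base = "https://coastwatch.example.noaa.gov/erddap/griddap"
dataset = "eddy_index_placeholder"
dims = "[(2020-01-01):(2023-12-31)][(28):(31)][(-91):(-85)]"  # time, lat, lon
variables = ["sst", "chlor_a", "eke"]
url = f"{base}/{dataset}.csv?" + ",".join(v + dims for v in variables)
print(url)
```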

What AQUAVIEW didn't do: the science. Knowing that satellite SST anomalies serve as hypoxia precursors, designing lag features, filtering fill values — that's domain knowledge. A catalog can point you to the data. It can't tell you what to do with it.

Where This Is Heading

This analysis was built with an AI agent (Claude) handling the data discovery and pipeline construction. In that world, a structured STAC API with consistent metadata and asset URLs is what makes agent-driven data access reliable rather than merely possible.

AQUAVIEW currently spans 24 collections. As that number grows, no agent can realistically track every system's access patterns on its own. The catalog becomes the only practical entry point.

Key Takeaways

  1. Onset prediction from observational data is feasible at multi-day lead times, with AUC-ROC above 0.85 at seven days out, without requiring physics-model hindcasts.
  2. Satellite features matter most when lead time matters most. At 5-7 days, satellite-derived SST, chlorophyll, and eddy energy improve AUC-PR by 5+ points over station-only models. That's consistent with basin-scale signals propagating to the coast on multi-day timescales.
  3. Hypoxia is hyperlocal. GBHM6 accounts for 56% of all hypoxic days across 12 stations. Multi-station discovery matters because you can't assume in advance which stations carry the signal.
  4. As agents handle more of the workflow, unified catalogs become infrastructure. This entire analysis was built in a single session by an AI agent using AQUAVIEW as the entry point. Try it yourself, or see the live dashboard running these predictions on current NDBC data.