Output Formats¶
AQUAVIEW tools accept an output_format parameter with three values: csv, toon, and json. This page shows what each looks like and when to pick which.
TL;DR¶
| Tool | Default | Use csv for |
Use toon for |
Use json for |
|---|---|---|---|---|
list_collections |
csv |
LLM consumption | rare | structured pipelines |
search_datasets |
csv |
LLM consumption (most cases) | single-record dumps | Python / TS code |
aggregate |
csv |
LLM consumption | rare | dashboards / charts |
get_item |
toon |
rare | LLM consumption (default) | code |
The defaults are tuned: tabular tools default to csv because it costs ~20–50% fewer tokens than toon for the same content; get_item defaults to toon because there's only one record and tabular framing wastes tokens.
csv — sectioned CSV (LLM-friendly)¶
AQUAVIEW emits a sectioned CSV — multiple ## name blocks each followed by a header row and data rows. The agent doesn't have to call a separate "list assets" tool because the assets are either summarized inline (asset_keys hint column) or appended as their own section.
## items
# rows: 3
id,collection,datetime,geometry,asset_keys
ndbc_41001_2024-09-15,NDBC,2024-09-15T00:00:00Z,"POINT(-72.6 34.7)",realtime;hist;column_stats
ndbc_41002_2024-09-15,NDBC,2024-09-15T00:00:00Z,"POINT(-74.8 32.3)",realtime;hist;column_stats
...
## next_token
abc123def456...
When you set include_assets=true, a separate ## assets section is appended:
## assets
item_id,key,href,type,title
ndbc_41001_2024-09-15,realtime,https://www.ndbc.noaa.gov/.../41001.txt,text/plain,Realtime data
ndbc_41001_2024-09-15,hist,https://www.ndbc.noaa.gov/.../41001h2024.txt,text/plain,Historical data
...
This is the densest format for any LLM that has to reason over many results.
toon — token-optimized object notation¶
YAML-like indentation, optimized for visual scanning of single records.
id: wod_xbt_ZZ144579
collection: WOD
datetime: 1998-04-12T14:33:00Z
geometry: { type: Point, coordinates: [-65.1, 38.7] }
properties:
aquaview:institution: NOAA/NCEI
aquaview:instrument_type: XBT
aquaview:column_stats_summary:
variables:
Temperature: { min: 4.2, max: 24.6, mean: 12.8, count: 412 }
Salinity: { min: 34.1, max: 36.9, mean: 35.5, count: 410 }
assets:
netcdf:
href: https://www.ncei.noaa.gov/data/oceans/woa/.../ZZ144579.nc
type: application/netcdf
Default for get_item. Use it when reading a single record matters more than processing many.
json — STAC API native¶
The native STAC API shape — strict, predictable, machine-readable. Use this when you're consuming AQUAVIEW from Python, TypeScript, or any non-LLM data pipeline.
{
"type": "FeatureCollection",
"features": [
{
"type": "Feature",
"id": "wod_xbt_ZZ144579",
"collection": "WOD",
"geometry": {"type": "Point", "coordinates": [-65.1, 38.7]},
"bbox": [-65.1, 38.7, -65.1, 38.7],
"properties": {
"datetime": "1998-04-12T14:33:00Z",
"aquaview:institution": "NOAA/NCEI",
"aquaview:instrument_type": "XBT",
"aquaview:column_stats_summary": {
"variables": {
"Temperature": {"min": 4.2, "max": 24.6, "mean": 12.8, "count": 412}
}
}
},
"assets": {
"netcdf": {
"href": "https://www.ncei.noaa.gov/data/oceans/woa/.../ZZ144579.nc",
"type": "application/netcdf"
}
}
}
],
"links": [],
"next_token": "abc123..."
}
Token cost comparison¶
A search_datasets call returning 25 items costs approximately:
| Format | Tokens | Notes |
|---|---|---|
csv (no assets) |
~1.0× (baseline) | with asset_keys hint column |
csv (with assets) |
~1.6× | adds ## assets section |
toon |
~1.2× | per-row object framing |
json |
~1.5× | structural braces, quoted keys |
Numbers vary with payload, but the ordering is stable: csv < toon < json for tabular results.
Field projection¶
Whatever format you pick, you can shrink the payload further by passing fields= on search_datasets:
In csv mode this drops the canonical columns and emits only your projected ones (no asset_keys hint, no ## assets section). In json mode the response contains only the requested keys.
Choosing the right format in code¶
# LLM agent — let the default ride
result = await session.call_tool("search_datasets", {"q": "argo"})
# Python data pipeline — get JSON, parse to Pydantic
result = await session.call_tool("search_datasets", {
"q": "argo",
"output_format": "json"
})
data = json.loads(result.content[0].text)
features = data["features"]