Datasets Catalog


Dateline: 2026-03-07 03:05 UTC

Compact reference list. Each item is 1–2 sentences: what it is and why it matters.

Catalog metadata: 73 datasets • 11 domains • structured for retrieval by update cadence

Conflict, unrest, and information control
Humanitarian and hazard context
Energy, trade, and maritime
Aviation and mobility
Economy, governance, and structural risk
Ownership, sanctions, and procurement
AI capability, risk, and labor
Cyber vulnerability and exploitation risk
Domestic public safety
Telegram/public-channel analytics
Space weather and disruption context

Retrieval lenses (for fast story triage)

Use this compact map before scanning full entries.

Fast operational corroboration (minutes to hourly): OONI, RIPE Atlas, CAIDA IODA, USGS Earthquake Feeds, NOAA SWPC JSON feeds, CISA KEV Catalog, FIRST EPSS.
Event-tracking + anomaly detection (hourly to daily): ACLED, GDELT, ReliefWeb API, NASA FIRMS, OpenSky Network, ADS-B Exchange, IMF PortWatch.
Structural baselines (monthly to annual): UCDP GED, EM-DAT, UN Comtrade, World Bank Indicators API, IMF Data API, FAOSTAT, SIPRI Milex.
Entity/ownership resolution: OpenSanctions, OpenCorporates API, Open Ownership Register, UK Companies House PSC, GLEIF LEI Golden Copy.
Revision-sensitive series (confidence-capped until confirmed): Eurostat annual demographic indicators with break/provisional flags, IMF/World Bank indicators near recent release boundaries, and any feed with explicit estimated/provisional markers.
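
The lens map above can be held as a small lookup table for scripted triage. The following is a minimal sketch, not part of the catalog itself: the lens keys and the helper names `lenses_for` and `sources_for` are hypothetical, and only the dataset names are taken from the lens lines above. The revision-sensitive lens is left out because it names series classes rather than fixed catalog entries.

```python
# Illustrative lookup table. Lens keys and helper names are assumptions;
# dataset names are copied from the lens lines in this catalog.
LENSES = {
    "fast_corroboration": [
        "OONI", "RIPE Atlas", "CAIDA IODA", "USGS Earthquake Feeds",
        "NOAA SWPC JSON feeds", "CISA KEV Catalog", "FIRST EPSS",
    ],
    "event_tracking": [
        "ACLED", "GDELT", "ReliefWeb API", "NASA FIRMS",
        "OpenSky Network", "ADS-B Exchange", "IMF PortWatch",
    ],
    "structural_baseline": [
        "UCDP GED", "EM-DAT", "UN Comtrade", "World Bank Indicators API",
        "IMF Data API", "FAOSTAT", "SIPRI Milex",
    ],
    "entity_resolution": [
        "OpenSanctions", "OpenCorporates API", "Open Ownership Register",
        "UK Companies House PSC", "GLEIF LEI Golden Copy",
    ],
}


def lenses_for(dataset: str) -> list[str]:
    """Return every lens key whose source list names the dataset (case-insensitive)."""
    wanted = dataset.casefold()
    return [
        lens
        for lens, sources in LENSES.items()
        if any(source.casefold() == wanted for source in sources)
    ]


def sources_for(lens: str) -> list[str]:
    """Return a copy of the source list for a lens, or an empty list if unknown."""
    return list(LENSES.get(lens, []))
```

For example, `lenses_for("GDELT")` returns the event-tracking key, and an unlisted name returns an empty list, which is a cue to scan the full entries instead.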

Catalog maintenance rules (DATASETS_OPTIMIZE)

Preserve section-level taxonomy unless a split/merge clearly improves retrieval speed.
Prefer editing descriptors over moving entries across sections.
Keep each entry to one sentence of scope + one sentence of caveat/value.
If adding aliases in future, keep one canonical entry and mention aliases in-text.
Re-run duplicate-domain and section-balance checks before publish.
For entries used in current-cycle analysis, surface revision/provisional flags in the story method/limitations when the source exposes them.
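
The pre-publish checks named in these rules could be approximated as follows. This is a rough sketch under stated assumptions: the catalog does not define how the duplicate or section-balance checks are computed, so the functions `duplicate_entries` and `unbalanced_sections`, the case-insensitive name comparison, and the `max_share` threshold are all hypothetical choices made for illustration.

```python
from collections import Counter


def duplicate_entries(catalog: dict[str, list[str]]) -> list[str]:
    """Entry names (case-folded) that occur more than once across all sections."""
    counts = Counter(
        name.casefold() for names in catalog.values() for name in names
    )
    return sorted(name for name, n in counts.items() if n > 1)


def unbalanced_sections(
    catalog: dict[str, list[str]], max_share: float = 0.25
) -> list[str]:
    """Sections holding more than max_share of all entries.

    The 0.25 default is an assumed threshold, not a catalog rule.
    """
    total = sum(len(names) for names in catalog.values())
    if total == 0:
        return []
    return sorted(
        section
        for section, names in catalog.items()
        if len(names) / total > max_share
    )
```

Both functions take a mapping of section name to entry names, so they can run against whatever parsed form of the catalog is available before publishing.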

Conflict, unrest, and information control

ACLED — Near-real-time conflict and protest event data. Strong for intensity/trajectory checks, but regional reporting lag and source bias require triangulation.
UCDP GED — Curated historical conflict events. Best for baseline and long-window comparison, not immediate same-day claims.
ICEWS — Machine-coded actor/event dataset useful for interaction trend shifts. Treat as signal-heavy and media-dependent, not ground truth.
GDELT — Global event/media coding for fast anomaly detection and narrative-shift tracking. Excellent early warning, weaker as final evidentiary anchor.
OONI — Network interference and censorship measurements. Useful for shutdown/censorship corroboration, with coverage uneven by probe geography.
RIPE Atlas — Distributed active internet measurements for reachability/latency checks. Good for disruption diagnostics, limited by probe placement.
CAIDA IODA — Outage detection via BGP/darknet/active signals. Strong for macro outage events, less sensitive to app/platform-level blocking.

Humanitarian and hazard context

ReliefWeb API — Structured humanitarian situation reports and updates. Useful chronology layer; quality follows upstream submitters.
Humanitarian Data Exchange (HDX) — OCHA-managed open humanitarian datasets and APIs for crisis indicators, displacement, and response operations. High-value cross-country evidence layer with dataset-specific quality/coverage variance.
EM-DAT — Disaster impact database for severity and cross-event context. Reliable for structured comparisons but estimates revise over time.
UNHCR Refugee Data — Displacement statistics for migration and conflict follow-through analysis. Be mindful of registration lag.
USGS Earthquake Feeds — Fast seismic event context and magnitude tracking. Great for timing/frequency checks; not causal attribution.
NASA FIRMS — Satellite fire hotspot detections for wildfire and conflict-adjacent fire patterns. Hotspots are signal, not cause.
NOAA IBTrACS — Tropical cyclone tracks for hazard overlays and route-risk context. Cross-era comparability is imperfect.
NOAA CO-OPS Data Retrieval API — Official U.S. coastal observations/predictions (water level, tides, currents, meteorology) with high-frequency station queries. Strong disruption-context source with station-coverage and interval-limit constraints.
Copernicus CDS (ERA5) — Reanalysis weather/climate fields for baseline comparisons. Excellent contextual control layer with spatial-resolution limits.

Energy, trade, and maritime

UN Comtrade — Official trade statistics for rerouting and sanctions-evasion pattern checks. Powerful but lagged and occasionally revised.
UNCTADstat — Trade and shipping indicators for structural maritime baselines. Strong macro context; indicator release cadence varies.
World Bank Pink Sheet — Commodity benchmark prices for shock context. Not a retail/local price proxy.
IMF PortWatch — AIS-derived chokepoint transit estimates for Suez/Panama-type stress checks. Sensitive to window definitions and AIS coverage.
U.S. EIA Open Data — Official US energy series for oil/gas/power claims. Strong grounding source; unit/metadata discipline required.
AGSI+ — European gas storage dynamics. Useful for storage-stress monitoring, but storage level alone does not imply outages.
ENTSO-E Transparency — Power flows/generation/outage indicators across Europe. High value for grid shock stories; country/product completeness varies.
Global Fishing Watch — Vessel activity telemetry for disputed waters and maritime behavior changes. AIS gaps/spoofing are recurring limitations.
IMO GISIS — Maritime safety/security registry modules. Good official context source with module-dependent completeness.
Vortexa Freight Tracker — Commercial tanker flow analytics for rerouting pressure. Strong directional signal, but the methodology is vendor-defined.
MarineCadastre AccessAIS — US-focused AIS archive access. Useful for US waters analyses, not global coverage.
AISHub API — Community AIS data feeds for vessel tracking redundancy. Coverage quality depends on contributor network.

Aviation and mobility

OpenSky Network — Open flight surveillance data for route anomaly checks. Coverage/rate limits vary by access tier.
ADS-B Exchange — Broad ADS-B feeds for flight path reconstruction and discontinuities. Commercial endpoint/term changes should be monitored.
ADSB.lol Open Data — Community-backed ADS-B API and archives. Useful as independent aviation corroboration with uneven geography.

Economy, governance, and structural risk

World Bank Indicators API — Cross-country macro/social baselines. Good for context framing, weak for fast-cycle stories.
IMF Data API — Sovereign and macro indicators for stress-consistency checks. Definitions/coverage differ across IMF datasets.
Eurostat APIs (Statistics + Catalogue) — Official EU statistical APIs with machine-readable JSON-stat access, dataset discovery (TOC/DCAT/RSS), and structured metadata for reproducible regional baselines. High analytical value, but indicator publication cadence and definitional changes must be tracked per dataset.
U.S. Census Bureau APIs — Official U.S. demographic, housing, and economic datasets (ACS, Decennial, and more) with granular geographic cuts. High utility for domestic baseline/anomaly work, with endpoint/version heterogeneity to manage.
FRED API (St. Louis Fed) — Large macro/financial time-series API with release, category, and observation endpoints for reproducible economic context checks. Excellent baseline layer, but mixed source provenance across series requires metadata discipline.
FAOSTAT — Food/agriculture structural data for medium/long-horizon analysis. Not suitable for immediate operational claims.
ITU ICT Indicators — Digital infrastructure and usage baselines. Useful control variable for shutdown/censorship narratives.
Worldwide Governance Indicators — Institutional context and fragility proxies. Annual composites are poor short-term attribution tools.
OECD.AI — Cross-country AI ecosystem/policy indicators. Best for comparative policy context, not daily movement.
SIPRI Milex — Defense spending series for security posture trend framing. Cross-country accounting differences matter.

Ownership, sanctions, and procurement

Sanctions provenance rule (quality guardrail): Use originating-authority lists (e.g., OFAC, Global Affairs Canada, EU official files) as final evidentiary anchors; use aggregators (e.g., OpenSanctions) for discovery, cross-linking, and rapid triage.

OpenSanctions — Aggregated designation/entity datasets for sanctions-wave and network monitoring. Confirm high-stakes claims at originating authority.
UN Security Council Consolidated List — Official UN consolidated sanctions list in XML/HTML/PDF with committee-linked identifiers. High-value multilateral baseline for cross-jurisdiction sanctions timing and scope checks.
OpenCorporates API — Company registry federation for entity resolution and corporate linkage checks. Jurisdiction depth varies.
Open Ownership Register — Beneficial ownership datasets for ownership-chain reconstruction. Coverage quality is country-dependent.
UK Companies House PSC — UK person-with-significant-control snapshots. Powerful for UK ownership-change analysis with filing-lag caveats.
USAspending API — US federal spending/procurement patterns and vendor concentration. Interpret obligation vs outlay carefully.
TED Open Data — EU procurement notices and awards for cross-country trend analysis. Requires normalization across taxonomy changes.
UK Contracts Finder API — UK procurement opportunities/awards stream. Field completeness differs across authorities.
GLEIF LEI Golden Copy — Daily global legal-entity identifiers and reference data. High-value backbone for cross-border entity deduplication.
OFAC Sanctions List Service (SLS) — Official U.S. Treasury sanctions list distribution (SDN + consolidated non-SDN datasets) with machine-readable download pathways. Critical primary-source anchor for sanctions designation timing and entity-screening verification.
Consolidated Canadian Autonomous Sanctions List — Global Affairs Canada list of individuals/entities sanctioned under SEMA and JVCFOA, published in HTML/PDF/XML. Valuable jurisdictional complement for cross-country sanctions verification and entity-resolution workflows.
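
The sanctions provenance rule at the top of this section can be expressed as a simple evidence check. The sketch below is illustrative only: the function name `evidence_status` and the status labels are hypothetical, and grouping the UN Security Council list with the originating authorities is an assumption based on its description as an official list.

```python
# Source names match entries in this section. The grouping and the status
# labels are assumptions made for illustration.
ORIGINATING_AUTHORITIES = {
    "OFAC Sanctions List Service (SLS)",
    "Consolidated Canadian Autonomous Sanctions List",
    "UN Security Council Consolidated List",
}
AGGREGATORS = {"OpenSanctions"}


def evidence_status(sources: list[str]) -> str:
    """Classify a sanctions claim by the sources cited for it.

    "anchored"       at least one originating-authority list is cited
    "discovery_only" only aggregators are cited; confirm before publishing
    "unsupported"    no recognized sanctions source is cited
    """
    cited = set(sources)
    if cited & ORIGINATING_AUTHORITIES:
        return "anchored"
    if cited & AGGREGATORS:
        return "discovery_only"
    return "unsupported"
```

A claim that returns `discovery_only` is a lead for triage, and it would need a matching record from an originating authority before it is treated as a final evidentiary anchor.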

AI capability, risk, and labor

Stanford HELM — Standardized model benchmark tracking for capability/safety comparisons. Protocol drift can mimic model jumps.
Epoch AI GPU Clusters — Frontier compute concentration and buildout signal. Public disclosure bias means incomplete coverage.
AI Incident Database — Curated AI incident records for qualitative risk trend monitoring. Public-report dependence limits completeness.
LMArena — Human-preference leaderboard signal for model shifts and previews. Preference-based Elo is useful but sampling-sensitive.
Artificial Analysis — Multi-benchmark model comparisons for cross-checking leaderboard narratives. Vendor/test selection can shape rankings.
Indeed Hiring Lab — Labor-market signal on skills demand and hiring shifts. Platform composition effects should be considered.
LinkedIn Economic Graph — Workforce trend analytics for occupational and skills transitions. Platform sample bias applies.

Cyber vulnerability and exploitation risk

CISA KEV Catalog — Authoritative list of vulnerabilities known to be exploited in the wild, maintained for federal remediation prioritization. Crucial corroboration source for exploitation claims.
FIRST EPSS — Daily exploit-likelihood scoring for CVE triage prioritization. Probability signal, not exploitation confirmation.
NIST NVD CVE API — CVE metadata/severity/references and modification tracking. Enrichment lag/revisions are common.
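
The difference between the KEV and EPSS entries above (confirmation versus probability) can be kept explicit in triage code. This is a minimal sketch, assuming the KEV identifiers and EPSS scores have already been loaded into memory; the function name `triage`, the label strings, and the 0.5 threshold are hypothetical, and the CVE identifiers used in any example are placeholders.

```python
def triage(
    cve_id: str,
    kev_ids: set[str],
    epss_scores: dict[str, float],
    epss_threshold: float = 0.5,  # assumed cut-off, tune per workflow
) -> str:
    """Label a CVE without conflating KEV confirmation and EPSS likelihood."""
    # KEV membership is the only path to a "confirmed" label.
    if cve_id in kev_ids:
        return "confirmed_exploited"
    score = epss_scores.get(cve_id)
    if score is None:
        return "no_signal"
    # EPSS is a probability estimate, so the labels stay hedged.
    if score >= epss_threshold:
        return "elevated_likelihood"
    return "low_likelihood"
```

Checking KEV first means a low EPSS score never downgrades a vulnerability that is already listed as exploited, and a high EPSS score alone never produces a confirmed label.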

Domestic public safety

FBI CDE/UCR — US crime baseline comparisons across jurisdictions. Participation and categorization shifts can affect comparability.
Statistics Canada — Canadian official statistical series for national and provincial context. Release lag and revisions apply.
Chicago Crimes (2001–present) — High-frequency city-level incident records for local anomaly scans. Backfills/reclassifications occur.
Toronto MCI — Neighbourhood-level major-crime indicators for city trend checks. Definitions differ from US systems.
Edmonton EPS occurrences — Monthly neighbourhood crime indicators for local outlier detection. Not directly comparable across municipalities.

Telegram/public-channel analytics

TGStat API — Public Telegram channel growth/citation signal. Third-party methodology should be treated as a caveated proxy.
Telemetr API — Channel benchmarking and trend data for comparative network scans. Coverage and terms may change.
TGDataset — Research snapshot corpus for historical Telegram network structure. Best for baseline context, not live operational monitoring.

Space weather and disruption context

NOAA SWPC JSON feeds — Operational space-weather observations/forecasts (e.g., flare/Kp products). Useful for timing and severity context in comms/power/GNSS stories, with forecast/observed distinction required.