Datasets Catalog
Dateline: 2026-03-07 03:05 UTC
Compact reference list. Each item is 1–2 sentences: what it is and why it matters.
Catalog metadata: 73 datasets • 11 domains • structured for retrieval by update cadence
Quick navigation
- Conflict, unrest, and information control
- Humanitarian and hazard context
- Energy, trade, and maritime
- Aviation and mobility
- Economy, governance, and structural risk
- Ownership, sanctions, and procurement
- AI capability, risk, and labor
- Cyber vulnerability and exploitation risk
- Domestic public safety
- Telegram/public-channel analytics
- Space weather and disruption context
Retrieval lenses (for fast story triage)
Use this compact map before scanning full entries.
- Fast operational corroboration (minutes to hourly): OONI, RIPE Atlas, CAIDA IODA, USGS Earthquake Feeds, NOAA SWPC JSON feeds, CISA KEV Catalog, FIRST EPSS.
- Event-tracking + anomaly detection (hourly to daily): ACLED, GDELT, ReliefWeb API, NASA FIRMS, OpenSky Network, ADS-B Exchange, IMF PortWatch.
- Structural baselines (monthly to annual): UCDP GED, EM-DAT, UN Comtrade, World Bank Indicators API, IMF Data API, FAOSTAT, SIPRI Milex.
- Entity/ownership resolution: OpenSanctions, OpenCorporates API, Open Ownership Register, UK Companies House PSC, GLEIF LEI Golden Copy.
- Revision-sensitive series (confidence-capped until confirmed): Eurostat annual demographic indicators with break/provisional flags, IMF/World Bank indicators near recent release boundaries, and any feed with explicit estimated/provisional markers.
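The lens map above can be held as a small lookup table so triage code can find which lens a source belongs to. This is a minimal sketch: the names `LENSES`, `lenses_for`, and `cap_confidence` are hypothetical, and the cap of 0.6 is an illustrative assumption, not a value the catalog specifies.

```python
# Hypothetical lookup table mirroring the retrieval lenses listed above.
LENSES = {
    "fast_corroboration": {
        "cadence": "minutes to hourly",
        "sources": ["OONI", "RIPE Atlas", "CAIDA IODA", "USGS Earthquake Feeds",
                    "NOAA SWPC JSON feeds", "CISA KEV Catalog", "FIRST EPSS"],
    },
    "event_tracking": {
        "cadence": "hourly to daily",
        "sources": ["ACLED", "GDELT", "ReliefWeb API", "NASA FIRMS",
                    "OpenSky Network", "ADS-B Exchange", "IMF PortWatch"],
    },
    "structural_baselines": {
        "cadence": "monthly to annual",
        "sources": ["UCDP GED", "EM-DAT", "UN Comtrade", "World Bank Indicators API",
                    "IMF Data API", "FAOSTAT", "SIPRI Milex"],
    },
    "entity_resolution": {
        "cadence": "not stated",
        "sources": ["OpenSanctions", "OpenCorporates API", "Open Ownership Register",
                    "UK Companies House PSC", "GLEIF LEI Golden Copy"],
    },
}


def lenses_for(source):
    """Return the names of every lens that lists the given source."""
    return [name for name, lens in LENSES.items() if source in lens["sources"]]


def cap_confidence(confidence, provisional, cap=0.6):
    """Cap confidence for provisional or estimated values (assumed cap: 0.6)."""
    return min(confidence, cap) if provisional else confidence
```

For example, `lenses_for("GDELT")` returns the event-tracking lens, and `cap_confidence(0.9, provisional=True)` holds a provisional figure at the cap until it is confirmed.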
Catalog maintenance rules (DATASETS_OPTIMIZE)
- Preserve section-level taxonomy unless a split/merge clearly improves retrieval speed.
- Prefer editing descriptors over moving entries across sections.
- Keep each entry to one sentence of scope + one sentence of caveat/value.
- If adding aliases in future, keep one canonical entry and mention aliases in-text.
- Re-run duplicate-domain and section-balance checks before publish.
- For entries used in current-cycle analysis, surface revision/provisional flags in the story method/limitations when the source exposes them.
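The pre-publish checks named above could be sketched as follows. The catalog does not define them, so this assumes "duplicate-domain" means two entries whose URLs share a hostname and "section balance" means flagging sections that are very small or hold a large share of all entries. The entry fields, sample entries, and thresholds are all hypothetical.

```python
from collections import Counter, defaultdict
from urllib.parse import urlparse

# Hypothetical entries; real catalog entries and URLs are not reproduced here.
SAMPLE_ENTRIES = [
    {"name": "Feed A", "section": "Hazards", "url": "https://data.example.org/a"},
    {"name": "Feed B", "section": "Hazards", "url": "https://data.example.org/b"},
    {"name": "Feed C", "section": "Cyber", "url": "https://cyber.example.net/c"},
]


def duplicate_domains(entries):
    """Return hostnames shared by more than one entry, with the entry names."""
    by_host = defaultdict(list)
    for entry in entries:
        host = urlparse(entry["url"]).hostname
        by_host[host].append(entry["name"])
    return {host: names for host, names in by_host.items() if len(names) > 1}


def section_balance(entries, min_entries=2, max_share=0.25):
    """Flag sections as 'thin' or 'heavy' under the assumed thresholds."""
    counts = Counter(entry["section"] for entry in entries)
    total = sum(counts.values())
    flagged = {}
    for section, n in counts.items():
        if n < min_entries:
            flagged[section] = "thin"
        elif n / total > max_share:
            flagged[section] = "heavy"
    return flagged
```

A shared hostname is only a prompt for review: one publisher can legitimately host several distinct datasets.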
Conflict, unrest, and information control
- ACLED — Near-real-time conflict and protest event data. Strong for intensity/trajectory checks, but regional reporting lag and source bias require triangulation.
- UCDP GED — Curated historical conflict events. Best for baseline and long-window comparison, not immediate same-day claims.
- ICEWS — Machine-coded actor/event dataset useful for interaction trend shifts. Treat as signal-heavy and media-dependent, not ground truth.
- GDELT — Global event/media coding for fast anomaly detection and narrative-shift tracking. Excellent early warning, weaker as final evidentiary anchor.
- OONI — Network interference and censorship measurements. Useful for shutdown/censorship corroboration, with coverage uneven by probe geography.
- RIPE Atlas — Distributed active internet measurements for reachability/latency checks. Good for disruption diagnostics, limited by probe placement.
- CAIDA IODA — Outage detection via BGP/darknet/active signals. Strong for macro outage events, less sensitive to app/platform-level blocking.
Humanitarian and hazard context
- ReliefWeb API — Structured humanitarian situation reports and updates. Useful chronology layer; quality follows upstream submitters.
- Humanitarian Data Exchange (HDX) — OCHA-managed open humanitarian datasets and APIs for crisis indicators, displacement, and response operations. High-value cross-country evidence layer with dataset-specific quality/coverage variance.
- EM-DAT — Disaster impact database for severity and cross-event context. Reliable for structured comparisons but estimates revise over time.
- UNHCR Refugee Data — Displacement statistics for migration and conflict follow-through analysis. Be mindful of registration lag.
- USGS Earthquake Feeds — Fast seismic event context and magnitude tracking. Great for timing/frequency checks; not causal attribution.
- NASA FIRMS — Satellite fire hotspot detections for wildfire and conflict-adjacent fire patterns. Hotspots are signal, not cause.
- NOAA IBTrACS — Tropical cyclone tracks for hazard overlays and route-risk context. Cross-era comparability is imperfect.
- NOAA CO-OPS Data Retrieval API — Official U.S. coastal observations/predictions (water level, tides, currents, meteorology) with high-frequency station queries. Strong disruption-context source with station-coverage and interval-limit constraints.
- Copernicus CDS (ERA5) — Reanalysis weather/climate fields for baseline comparisons. Excellent contextual control layer with spatial-resolution limits.
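As an illustration of the timing/frequency checks mentioned for the USGS Earthquake Feeds, the sketch below filters a GeoJSON feature collection by magnitude and time. It assumes the USGS GeoJSON layout, where each feature has `properties.mag`, `properties.place`, and `properties.time` in milliseconds since the Unix epoch. The sample records and the function name `events_since` are hypothetical, and no network call is made.

```python
from datetime import datetime, timezone

# Hypothetical sample shaped like a USGS GeoJSON feed (values are invented).
SAMPLE_FEED = {
    "type": "FeatureCollection",
    "features": [
        {"properties": {"mag": 4.6, "place": "sample region A", "time": 1700000000000}},
        {"properties": {"mag": 2.1, "place": "sample region B", "time": 1700001800000}},
        {"properties": {"mag": 5.3, "place": "sample region C", "time": 1700003600000}},
        {"properties": {"mag": None, "place": "sample region D", "time": 1700005400000}},
    ],
}


def events_since(feed, min_magnitude, since):
    """Return (time, magnitude, place) tuples at or above a magnitude, on or after `since`."""
    hits = []
    for feature in feed["features"]:
        props = feature["properties"]
        mag = props.get("mag")
        if mag is None or mag < min_magnitude:
            continue  # skip events with a missing or too-small magnitude
        # Event time is given in milliseconds since the Unix epoch.
        when = datetime.fromtimestamp(props["time"] / 1000, tz=timezone.utc)
        if when >= since:
            hits.append((when, mag, props.get("place")))
    return sorted(hits, key=lambda hit: hit[0])
```

The result supports statements about when and how often events occurred; consistent with the caveat above, it says nothing about cause.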
Energy, trade, and maritime
- UN Comtrade — Official trade statistics for rerouting and sanctions-evasion pattern checks. Powerful but lagged and occasionally revised.
- UNCTADstat — Trade and shipping indicators for structural maritime baselines. Strong macro context; indicator release cadence varies.
- World Bank Pink Sheet — Commodity benchmark prices for shock context. Not a retail/local price proxy.
- IMF PortWatch — AIS-derived chokepoint transit estimates for Suez/Panama-type stress checks. Sensitive to window definitions and AIS coverage.
- U.S. EIA Open Data — Official US energy series for oil/gas/power claims. Strong grounding source; unit/metadata discipline required.
- AGSI+ — European gas storage dynamics. Useful for storage-stress monitoring, but storage level alone does not imply outages.
- ENTSO-E Transparency — Power flows/generation/outage indicators across Europe. High value for grid shock stories; country/product completeness varies.
- Global Fishing Watch — Vessel activity telemetry for disputed waters and maritime behavior changes. AIS gaps/spoofing are recurring limitations.
- IMO GISIS — Maritime safety/security registry modules. Good official context source with module-dependent completeness.
- Vortexa Freight Tracker — Commercial tanker flow analytics for rerouting pressure. Strong directional signal, but the methodology is vendor-defined.
- MarineCadastre AccessAIS — US-focused AIS archive access. Useful for US waters analyses, not global coverage.
- AISHub API — Community AIS data feeds for vessel tracking redundancy. Coverage quality depends on contributor network.
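Several entries above note AIS gaps as a recurring limitation. A minimal sketch of flagging reporting gaps in one vessel's position timestamps follows. The function name `find_gaps` and the six-hour threshold are assumptions for illustration, not values from any of the listed providers.

```python
from datetime import datetime, timedelta


def find_gaps(timestamps, max_gap=timedelta(hours=6)):
    """Return (start, end) pairs where consecutive pings are more than max_gap apart."""
    ordered = sorted(timestamps)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a > max_gap]


# Hypothetical ping times for a single vessel.
t0 = datetime(2026, 1, 1, 0, 0)
pings = [t0, t0 + timedelta(hours=1), t0 + timedelta(hours=9), t0 + timedelta(hours=10)]
gaps = find_gaps(pings)  # one gap: from t0 + 1h to t0 + 9h
```

A flagged gap may reflect receiver coverage as much as vessel behavior, so it is best compared against a second AIS source before it is interpreted.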
Aviation and mobility
- OpenSky Network — Open flight surveillance data for route anomaly checks. Coverage/rate limits vary by access tier.
- ADS-B Exchange — Broad ADS-B feeds for flight path reconstruction and discontinuities. Monitor changes to commercial endpoints and terms of use.
- ADSB.lol Open Data — Community-backed ADS-B API and archives. Useful as independent aviation corroboration with uneven geography.
Economy, governance, and structural risk
- World Bank Indicators API — Cross-country macro/social baselines. Good for context framing, weak for fast-cycle stories.
- IMF Data API — Sovereign and macro indicators for stress-consistency checks. Definitions/coverage differ across IMF datasets.
- Eurostat APIs (Statistics + Catalogue) — Official EU statistical APIs with machine-readable JSON-stat access, dataset discovery (TOC/DCAT/RSS), and structured metadata for reproducible regional baselines. High analytical value, but indicator publication cadence and definitional changes must be tracked per dataset.
- U.S. Census Bureau APIs — Official U.S. demographic, housing, and economic datasets (ACS, Decennial, and more) with granular geographic cuts. High utility for domestic baseline/anomaly work, with endpoint/version heterogeneity to manage.
- FRED API (St. Louis Fed) — Large macro/financial time-series API with release, category, and observation endpoints for reproducible economic context checks. Excellent baseline layer, but mixed source provenance across series requires metadata discipline.
- FAOSTAT — Food/agriculture structural data for medium/long-horizon analysis. Not suitable for immediate operational claims.
- ITU ICT Indicators — Digital infrastructure and usage baselines. Useful control variable for shutdown/censorship narratives.
- Worldwide Governance Indicators — Institutional context and fragility proxies. Annual composites are poor short-term attribution tools.
- OECD.AI — Cross-country AI ecosystem/policy indicators. Best for comparative policy context, not daily movement.
- SIPRI Milex — Defense spending series for security posture trend framing. Cross-country accounting differences matter.
Ownership, sanctions, and procurement
Sanctions provenance rule (quality guardrail): Use originating-authority lists (e.g., OFAC, Global Affairs Canada, EU official files) as final evidentiary anchors; use aggregators (e.g., OpenSanctions) for discovery, cross-linking, and rapid triage.
- OpenSanctions — Aggregated designation/entity datasets for sanctions-wave and network monitoring. Confirm high-stakes claims at originating authority.
- UN Security Council Consolidated List — Official UN consolidated sanctions list in XML/HTML/PDF with committee-linked identifiers. High-value multilateral baseline for cross-jurisdiction sanctions timing and scope checks.
- OpenCorporates API — Company registry federation for entity resolution and corporate linkage checks. Jurisdiction depth varies.
- Open Ownership Register — Beneficial ownership datasets for ownership-chain reconstruction. Coverage quality is country-dependent.
- UK Companies House PSC — UK person-with-significant-control snapshots. Powerful for UK ownership-change analysis with filing-lag caveats.
- USAspending API — US federal spending/procurement patterns and vendor concentration. Interpret obligation vs outlay carefully.
- TED Open Data — EU procurement notices and awards for cross-country trend analysis. Requires normalization across taxonomy changes.
- UK Contracts Finder API — UK procurement opportunities/awards stream. Field completeness differs across authorities.
- GLEIF LEI Golden Copy — Daily global legal-entity identifiers and reference data. High-value backbone for cross-border entity deduplication.
- OFAC Sanctions List Service (SLS) — Official U.S. Treasury sanctions list distribution (SDN + consolidated non-SDN datasets) with machine-readable download pathways. Critical primary-source anchor for sanctions designation timing and entity-screening verification.
- Consolidated Canadian Autonomous Sanctions List — Global Affairs Canada list of individuals/entities sanctioned under SEMA and JVCFOA, published in HTML/PDF/XML. Valuable jurisdictional complement for cross-country sanctions verification and entity-resolution workflows.
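The sanctions provenance rule at the top of this section can be expressed as a simple classification step. This is a sketch only: the set contents follow the examples named in the rule plus the UN list described above, and the names `ORIGINATING_AUTHORITIES`, `AGGREGATORS`, and `evidentiary_status` are hypothetical.

```python
# Source labels taken from the provenance rule and the entries above.
ORIGINATING_AUTHORITIES = {
    "OFAC",
    "Global Affairs Canada",
    "EU official files",
    "UN Security Council",
}
AGGREGATORS = {"OpenSanctions"}


def evidentiary_status(sources):
    """Classify a sanctions claim by the strongest kind of source supporting it."""
    sources = set(sources)
    if sources & ORIGINATING_AUTHORITIES:
        return "anchored"  # confirmed at an originating authority
    if sources & AGGREGATORS:
        return "discovery-only"  # a lead that still needs primary confirmation
    return "unsupported"
```

A claim found only through an aggregator stays "discovery-only" until the matching originating-authority list has been checked.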
AI capability, risk, and labor
- Stanford HELM — Standardized model benchmark tracking for capability/safety comparisons. Protocol drift can mimic model jumps.
- Epoch AI GPU Clusters — Frontier compute concentration and buildout signal. Public disclosure bias means incomplete coverage.
- AI Incident Database — Curated AI incident records for qualitative risk trend monitoring. Public-report dependence limits completeness.
- LMArena — Human-preference leaderboard signal for model shifts and previews. Preference-based Elo is useful but sampling-sensitive.
- Artificial Analysis — Multi-benchmark model comparisons for cross-checking leaderboard narratives. Vendor/test selection can shape rankings.
- Indeed Hiring Lab — Labor-market signal on skills demand and hiring shifts. Platform composition effects should be considered.
- LinkedIn Economic Graph — Workforce trend analytics for occupational and skills transitions. Platform sample bias applies.
Cyber vulnerability and exploitation risk
- CISA KEV Catalog — Authoritative list of vulnerabilities known to be exploited in the wild, maintained for federal remediation prioritization. Crucial corroboration source for exploitation claims.
- FIRST EPSS — Daily exploit-likelihood scoring for CVE triage prioritization. Probability signal, not exploitation confirmation.
- NIST NVD CVE API — CVE metadata/severity/references and modification tracking. Enrichment lag/revisions are common.
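The distinction above between KEV (confirmed exploitation) and EPSS (a probability signal) can be kept explicit in triage code. The sketch below works on in-memory data only. The function name `triage`, the labels, the 0.5 threshold, and the placeholder CVE identifiers are all assumptions for illustration.

```python
def triage(cve_id, kev_ids, epss_scores, epss_threshold=0.5):
    """Label a CVE, keeping KEV confirmation separate from EPSS likelihood."""
    if cve_id in kev_ids:
        return "confirmed-exploited"  # listed in KEV: exploitation observed
    score = epss_scores.get(cve_id)
    if score is None:
        return "no-signal"  # no EPSS score available for this CVE
    # EPSS is a probability estimate, so a high score is never a confirmation.
    return "likely-target" if score >= epss_threshold else "low-likelihood"


# Placeholder identifiers and scores, not real CVE records.
KEV_IDS = {"CVE-0000-0001"}
EPSS_SCORES = {"CVE-0000-0001": 0.97, "CVE-0000-0002": 0.81, "CVE-0000-0003": 0.02}
```

Under this scheme, a story can say a vulnerability is exploited only for the "confirmed-exploited" label; the other labels support prioritization, not exploitation claims.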
Domestic public safety
- FBI CDE/UCR — US crime baseline comparisons across jurisdictions. Participation and categorization shifts can affect comparability.
- Statistics Canada — Canadian official statistical series for national and provincial context. Release lag and revisions apply.
- Chicago Crimes (2001–present) — High-frequency city-level incident records for local anomaly scans. Backfills/reclassifications occur.
- Toronto MCI — Neighbourhood-level major-crime indicators for city trend checks. Definitions differ from US systems.
- Edmonton EPS occurrences — Monthly neighbourhood crime indicators for local outlier detection. Not directly comparable across municipalities.
Telegram/public-channel analytics
- TGStat API — Public Telegram channel growth/citation signal. Third-party methodology should be treated as a caveated proxy.
- Telemetr API — Channel benchmarking and trend data for comparative network scans. Coverage and terms may change.
- TGDataset — Research snapshot corpus for historical Telegram network structure. Best for baseline context, not live operational monitoring.
Space weather and disruption context
- NOAA SWPC JSON feeds — Operational space-weather observations/forecasts (e.g., flare/Kp products). Useful for timing and severity context in comms/power/GNSS stories, with forecast/observed distinction required.
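The forecast/observed distinction noted above can be enforced before any severity figure is quoted. The sketch assumes each record has already been normalized into a dictionary with `time_tag`, `kp`, and `status` keys. That shape is hypothetical and is not the SWPC feed schema, which should be checked against the live product before parsing.

```python
def split_by_status(records):
    """Separate observed records from everything else (estimated or predicted)."""
    observed = [r for r in records if r["status"] == "observed"]
    not_observed = [r for r in records if r["status"] != "observed"]
    return observed, not_observed


def max_observed_kp(records):
    """Return the highest Kp among observed records, or None if there are none."""
    observed, _ = split_by_status(records)
    return max((r["kp"] for r in observed), default=None)


# Hypothetical normalized records (values are invented).
SAMPLE_KP = [
    {"time_tag": "2026-03-06 18:00", "kp": 4.0, "status": "observed"},
    {"time_tag": "2026-03-06 21:00", "kp": 5.33, "status": "observed"},
    {"time_tag": "2026-03-07 00:00", "kp": 6.0, "status": "predicted"},
]
```

Here the predicted value of 6.0 is excluded, so only the observed peak would be reported as having occurred.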