How FDA Data MCP Works

FDA data is powerful but fragmented. Facility registrations live in one system, recalls in another, and product approvals in yet another. FDA Data MCP unifies these sources into a single, normalized interface designed for AI agents and regulated workflows.

The goal: make one tool call, such as fda_lookup_company, and get facilities, enforcement actions, and aliases across the ingested FDA datasets, without stitching the data together yourself.

1) Data sources and coverage

We pull from five public FDA sources plus dashboard APIs. Each dataset has its own schema, naming conventions, and update cadence. The first step is to normalize them into a shared shape.

Source	Coverage	Update	Key fields
DECRS	Drug establishment registrations	Daily	FEI, firm name, operations, address
OpenFDA Devices	Device registrations	Monthly	FEI, facility name, owner/operator
OpenFDA Enforcement	Recall actions	Weekly	Recall number, firm, classification
510(k) Clearances	Device clearances (premarket notifications)	Weekly	Applicant, product code, decision date
Drugs@FDA	Drug applications	Monthly	Sponsor name, application number
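
As a rough illustration of normalizing into a shared shape, the sketch below maps two differently shaped source rows onto one record type. The interface, field names, and raw row shapes are assumptions made for this example; the actual schema is not shown in this article.

```typescript
// Hypothetical shared shape; names are illustrative, not the project's schema.
interface NormalizedFacility {
  source: "decrs" | "openfda_devices";
  fei: string | null; // null when the source row has no usable FEI
  name: string;
}

// Assumed raw row shapes, loosely modeled on the "Key fields" column.
interface DecrsRow { fei?: string; firm_name: string }
interface DeviceRow { fei_number?: string; facility_name: string }

// Keep digits only so "1000-123" and " 1000123 " compare equal.
function cleanFei(raw: string | undefined): string | null {
  const digits = (raw ?? "").replace(/\D/g, "");
  return digits.length > 0 ? digits : null;
}

function fromDecrs(row: DecrsRow): NormalizedFacility {
  return { source: "decrs", fei: cleanFei(row.fei), name: row.firm_name.trim() };
}

function fromDevice(row: DeviceRow): NormalizedFacility {
  return { source: "openfda_devices", fei: cleanFei(row.fei_number), name: row.facility_name.trim() };
}
```

Once every source has a mapper like this, downstream code only needs to understand one record type, whichever dataset a row came from.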

2) Ingestion pipeline

Each source is ingested via a dedicated TypeScript pipeline. We stream large JSON files to avoid memory spikes, normalize records, and store provenance for every row. Each run is tracked by a run_id so you can audit the source and timing of every record.

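import { Readable } from "node:stream";

// Stream JSON in 64 KB chunks so downstream parsing never handles a
// 500MB+ payload in one piece. The offset lives outside read() so each call
// emits a single chunk and backpressure is respected.
function chunkedReadable(buffer: Buffer, chunkSize = 64 * 1024): Readable {
  let offset = 0;
  return new Readable({
    read() {
      if (offset >= buffer.length) {
        this.push(null); // signal end of stream
        return;
      }
      const end = Math.min(offset + chunkSize, buffer.length);
      this.push(buffer.subarray(offset, end));
      offset = end;
    },
  });
}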
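
The per-run provenance tracking described in this section could be sketched as follows. The column names source, ingested_at, and run_id come from this article; the helper name startRun, the run ID format, and the sample row are hypothetical.

```typescript
interface Provenance {
  source: string;
  ingested_at: string; // ISO 8601 timestamp
  run_id: string;
}

// Returns a function that stamps every row of one ingestion run with the same
// provenance. How run IDs are generated is left to the caller (assumption).
function startRun(source: string, runId: string, startedAt: Date) {
  const provenance: Provenance = {
    source,
    ingested_at: startedAt.toISOString(),
    run_id: runId,
  };
  return function stamp<T extends object>(row: T): T & Provenance {
    return { ...row, ...provenance };
  };
}

// Usage: all rows from this run share one run_id and timestamp.
const stampRow = startRun(
  "openfda_enforcement",
  "run-example-001",
  new Date("2024-06-10T00:00:00Z"),
);
const stampedRow = stampRow({ recall_number: "example-recall-id" });
```

Because every row from a run carries the same run_id, auditing a record comes down to looking up a single run.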

3) Normalization and alias resolution

Company names are normalized using a consistent ruleset: punctuation stripped, suffixes removed, and whitespace collapsed. The normalized form powers matching, alias expansion, and cross-references.

Primary normalized names are stored alongside raw fields.
Aliases from manual packs, EDGAR exhibits, and FEI cross-references map subsidiaries.
Every alias carries a confidence score and source for auditing.
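
A minimal sketch of the ruleset above, assuming a small illustrative suffix list, case-insensitive matching, and ASCII-only punctuation handling; none of these details are confirmed by the article. The Alias shape and the 0.8 confidence threshold are likewise hypothetical.

```typescript
// Illustrative suffix list; the article does not publish the real ruleset.
const LEGAL_SUFFIXES = new Set([
  "inc", "incorporated", "corp", "corporation", "co", "company",
  "llc", "ltd", "limited", "plc", "gmbh",
]);

function normalizeCompanyName(raw: string): string {
  const tokens = raw
    .toLowerCase()               // assumption: matching is case-insensitive
    .replace(/[^a-z0-9\s]/g, "") // strip punctuation (ASCII-only sketch)
    .split(/\s+/)                // collapse runs of whitespace
    .filter((t) => t.length > 0);
  // Drop trailing legal suffixes, but always keep at least one token.
  while (tokens.length > 1 && LEGAL_SUFFIXES.has(tokens[tokens.length - 1] ?? "")) {
    tokens.pop();
  }
  return tokens.join(" ");
}

interface Alias {
  alias: string;      // normalized alias, e.g. a subsidiary name
  canonical: string;  // normalized parent name
  confidence: number; // 0..1
  source: string;     // e.g. "manual_pack"
}

// Resolve a raw name to its canonical form via the best alias above a threshold.
function resolveCanonical(raw: string, aliases: Alias[], minConfidence = 0.8): string {
  const normalized = normalizeCompanyName(raw);
  const best = aliases
    .filter((a) => a.alias === normalized && a.confidence >= minConfidence)
    .sort((a, b) => b.confidence - a.confidence)[0];
  return best ? best.canonical : normalized;
}
```

With these rules, "Pfizer, Inc." normalizes to "pfizer". Low-confidence aliases are ignored and not merged silently, which keeps questionable subsidiary links out of the results.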

4) Cross-referencing facilities and enforcement

The core capability is linking facilities to enforcement actions, approvals, and inspections. We key on FEI numbers when available and fall back to normalized company names when FEI is missing.

SELECT facility.fei_number, facility.firm_name, recall.recall_number
FROM facilities facility
JOIN enforcement recall
  ON facility.fei_number = recall.fei_number
WHERE facility.firm_name_normalized = $1;
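
The query above covers the FEI-keyed path. The fallback to normalized names might be sketched in application code as below. The record shapes are hypothetical, and one rule here is an assumption of this sketch: when both sides carry FEIs that disagree, it does not fall back to a name match.

```typescript
interface Facility { fei: string | null; nameNormalized: string }
interface Recall { recallNumber: string; fei: string | null; firmNormalized: string }

interface RecallLink {
  recall: Recall;
  matchedBy: "fei" | "name"; // records which key produced the link
}

function linkRecalls(facility: Facility, recalls: Recall[]): RecallLink[] {
  const links: RecallLink[] = [];
  for (const recall of recalls) {
    if (facility.fei !== null && recall.fei !== null) {
      // Both FEIs are known: trust them and never override with a name match.
      if (facility.fei === recall.fei) {
        links.push({ recall, matchedBy: "fei" });
      }
      continue;
    }
    // At least one FEI is missing: fall back to the normalized company name.
    if (recall.firmNormalized === facility.nameNormalized) {
      links.push({ recall, matchedBy: "name" });
    }
  }
  return links;
}
```

Recording matchedBy on each link lets callers treat name-based matches as weaker evidence than FEI matches.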

5) MCP tools and API behavior

MCP tools are designed for AI workflows: predictable schemas, explicit error codes, and deterministic ordering. Most tools accept a canonical company_id plus optional filters; fda_lookup_company takes a company name, as in the request below.

{
  "tool": "fda_lookup_company",
  "args": {
    "company": "Pfizer",
    "include_enforcement": true,
    "include_facilities": true
  }
}
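
To make "explicit error codes" and "deterministic ordering" concrete, here is a sketch of what a tool handler's result might look like. The result type, the error code names, and the substring matching are all assumptions for illustration and not the documented response format.

```typescript
type ErrorCode = "INVALID_ARGUMENT" | "NOT_FOUND"; // hypothetical codes

type ToolResult<T> =
  | { status: "ok"; data: T[] }
  | { status: "error"; code: ErrorCode; message: string };

interface FacilityRow { fei: string; firm_name: string }

// Plain code-unit comparison: unlike localeCompare, independent of host locale.
function compareStrings(a: string, b: string): number {
  return a < b ? -1 : a > b ? 1 : 0;
}

function lookupFacilities(company: string, rows: FacilityRow[]): ToolResult<FacilityRow> {
  const wanted = company.trim().toLowerCase();
  if (wanted === "") {
    return { status: "error", code: "INVALID_ARGUMENT", message: "company must be non-empty" };
  }
  const matches = rows.filter((r) => r.firm_name.toLowerCase().includes(wanted));
  if (matches.length === 0) {
    return { status: "error", code: "NOT_FOUND", message: `no facilities match "${company}"` };
  }
  // Sort by FEI, then name, so repeated calls return rows in the same order.
  const data = [...matches].sort(
    (a, b) => compareStrings(a.fei, b.fei) || compareStrings(a.firm_name, b.firm_name),
  );
  return { status: "ok", data };
}
```

A tagged result like this lets an agent branch on status and code without parsing free-text error messages, and stable ordering makes repeated calls comparable.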

6) Provenance and data freshness

Every row includes source, ingested_at, and run_id columns. The public /health/data endpoint summarizes dataset freshness so you can validate how recently each source was updated.
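
A client could use such a freshness summary to flag stale sources. The sketch below assumes a per-dataset status record with a cadence and a last-ingested timestamp. The response shape of /health/data is not documented here, so these field names and the age thresholds are illustrative only.

```typescript
type Cadence = "daily" | "weekly" | "monthly";

// Illustrative tolerances: one update interval plus some slack.
const MAX_AGE_DAYS: Record<Cadence, number> = { daily: 2, weekly: 9, monthly: 35 };

// Hypothetical per-dataset entry in a freshness summary.
interface DatasetStatus {
  source: string;
  cadence: Cadence;
  last_ingested_at: string; // ISO 8601 timestamp
}

// Returns the names of sources whose last ingestion is older than allowed.
function staleSources(statuses: DatasetStatus[], now: Date): string[] {
  const dayMs = 24 * 60 * 60 * 1000;
  return statuses
    .filter((s) => {
      const ageDays = (now.getTime() - Date.parse(s.last_ingested_at)) / dayMs;
      return ageDays > MAX_AGE_DAYS[s.cadence];
    })
    .map((s) => s.source);
}
```

Tying the tolerance to each source's update cadence avoids flagging a monthly dataset as stale just because it is older than a daily one.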

7) What this enables

With normalized data, clean identifiers, and cross-references in place, FDA Data MCP makes it possible to answer questions like:

Which facilities are tied to a parent company and its subsidiaries?
Which recalls are associated with a company’s manufacturing footprint?
What approvals or clearances are linked to a specific organization?

If you want to go deeper, the full tool reference is available in the documentation, and you can request access on the signup page.