FDA data is powerful but fragmented. Facility registrations live in one system, recalls in another, and product approvals in yet another. FDA Data MCP unifies these sources into a single, normalized interface designed for AI agents and regulated workflows.
The goal: call one tool, such as `fda_lookup_company`, and get facilities, enforcement actions, and aliases across every FDA dataset without stitching the data together yourself.
1) Data sources and coverage
We pull from five public FDA sources plus dashboard APIs. Each dataset has its own schema, naming conventions, and update cadence. The first step is to normalize them into a shared shape.
| Source | Coverage | Update cadence | Key fields |
|---|---|---|---|
| DECRS (Drug Establishments Current Registration Site) | Drug establishment registrations | Daily | FEI, firm name, operations, address |
| openFDA Devices | Device registrations | Monthly | FEI, facility name, owner/operator |
| openFDA Enforcement | Recall actions | Weekly | Recall number, firm, classification |
| 510(k) Clearances | Device premarket clearances | Weekly | Applicant, product code, decision date |
| Drugs@FDA | Drug applications | Monthly | Sponsor name, application number |
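To make the "shared shape" concrete, here is a minimal sketch of a normalized record and two per-source mappers. The `NormalizedRecord` type, the `SourceName` keys, and the raw input field names (`FEI_NUMBER`, `FIRM_NAME`, `recall_number`, `recalling_firm`, `fei_number`) are hypothetical stand-ins chosen for illustration, not the project's actual schema.

```typescript
// Hypothetical shared record shape; field names are illustrative only.
type SourceName =
  | "decrs"
  | "openfda_devices"
  | "openfda_enforcement"
  | "510k"
  | "drugsfda";

interface NormalizedRecord {
  source: SourceName;
  sourceRecordId: string; // e.g. FEI, recall number, or application number
  feiNumber: string | null; // null when the source row carries no FEI
  firmNameRaw: string; // raw name kept alongside the normalized form
  kind: "registration" | "recall" | "clearance" | "application";
}

// Hypothetical raw row shapes for two of the sources.
interface RawDecrsRow {
  FEI_NUMBER: string;
  FIRM_NAME: string;
}
interface RawRecallRow {
  recall_number: string;
  recalling_firm: string;
  fei_number?: string;
}

function fromDecrs(row: RawDecrsRow): NormalizedRecord {
  const fei = row.FEI_NUMBER.trim();
  return {
    source: "decrs",
    sourceRecordId: fei,
    feiNumber: fei || null,
    firmNameRaw: row.FIRM_NAME,
    kind: "registration",
  };
}

function fromRecall(row: RawRecallRow): NormalizedRecord {
  return {
    source: "openfda_enforcement",
    sourceRecordId: row.recall_number,
    // A missing or blank FEI becomes null so later joins can fall back to names.
    feiNumber: row.fei_number?.trim() || null,
    firmNameRaw: row.recalling_firm,
    kind: "recall",
  };
}
```

Keeping `firmNameRaw` next to a nullable `feiNumber` fits the cross-referencing strategy in section 4: join on FEI when present, otherwise on the normalized name.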
2) Ingestion pipeline
Each source is ingested by a dedicated TypeScript pipeline. We stream large JSON files to avoid memory spikes, normalize records, and store provenance for every row. Each run is tracked by a `run_id`, so you can audit the source and timing of every record.
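```typescript
import { Readable } from "node:stream";

// Feed the raw JSON file to the downstream parser in 64 KiB chunks instead of
// handing it a single 500MB+ string. `buffer` holds the downloaded file.
const CHUNK_SIZE = 64 * 1024;
let offset = 0; // must persist across read() calls

const readable = new Readable({
  // Push one chunk per read() call so the stream honors backpressure.
  read() {
    if (offset >= buffer.length) {
      this.push(null); // end of data
      return;
    }
    const end = Math.min(offset + CHUNK_SIZE, buffer.length);
    this.push(buffer.subarray(offset, end));
    offset = end;
  },
});
```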
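Provenance can be attached at write time by stamping every record from a run with the same identifiers. This sketch reuses the `source`, `ingested_at`, and `run_id` column names from section 6; the `withProvenance` helper and the use of a random UUID as the `run_id` are assumptions for illustration.

```typescript
import { randomUUID } from "node:crypto";

// Provenance columns stamped onto every stored row.
interface Provenance {
  source: string;
  ingested_at: string; // ISO-8601 timestamp
  run_id: string;
}

// Hypothetical helper: every record from one ingestion run shares the same
// run_id and ingested_at, so a row can be traced back to the run that wrote it.
function withProvenance<T extends object>(
  records: Iterable<T>,
  source: string,
  runId: string = randomUUID(),
  now: Date = new Date(),
): Array<T & Provenance> {
  const ingested_at = now.toISOString();
  const out: Array<T & Provenance> = [];
  for (const record of records) {
    out.push({ ...record, source, ingested_at, run_id: runId });
  }
  return out;
}
```

Computing `ingested_at` once per call, rather than per record, keeps every row of a run on an identical timestamp, which makes "which rows came from this run" a simple equality filter.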
3) Normalization and alias resolution
Company names are normalized using a consistent ruleset: punctuation is stripped, legal-entity suffixes (such as Inc. or LLC) are removed, and whitespace is collapsed. The normalized form powers matching, alias expansion, and cross-references.
- Primary normalized names are stored alongside raw fields.
- Aliases from manually curated packs, SEC EDGAR exhibits, and FEI cross-references map subsidiaries to their parent companies.
- Every alias carries a confidence score and source for auditing.
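The ruleset and alias lookup above can be sketched as follows. The suffix list, the upper-casing step, and the ASCII-only punctuation filter are assumptions, as are the `Alias` shape and the 0.8 confidence threshold in the hypothetical `resolveCompany` helper.

```typescript
// Assumed suffix list; the production ruleset may differ.
const LEGAL_SUFFIXES = new Set([
  "INC", "INCORPORATED", "LLC", "LTD", "CORP", "CORPORATION", "CO", "PLC", "GMBH",
]);

function normalizeCompanyName(raw: string): string {
  const tokens = raw
    .toUpperCase() // case folding (assumed)
    .replace(/[^A-Z0-9\s]/g, "") // strip punctuation (ASCII-only here)
    .split(/\s+/) // collapse whitespace
    .filter((t) => t.length > 0);
  // Remove trailing legal suffixes, but never the whole name.
  while (tokens.length > 1 && LEGAL_SUFFIXES.has(tokens[tokens.length - 1])) {
    tokens.pop();
  }
  return tokens.join(" ");
}

interface Alias {
  alias: string; // normalized alias, e.g. a subsidiary name
  companyId: string; // canonical parent company
  confidence: number; // 0..1
  source: "manual" | "edgar" | "fei";
}

// Hypothetical resolver: exact match on the normalized key, filtered by a
// confidence threshold, best candidates first with a stable tie-break.
function resolveCompany(raw: string, aliases: Alias[], minConfidence = 0.8): Alias[] {
  const key = normalizeCompanyName(raw);
  return aliases
    .filter((a) => a.alias === key && a.confidence >= minConfidence)
    .sort(
      (a, b) =>
        b.confidence - a.confidence ||
        (a.companyId < b.companyId ? -1 : a.companyId > b.companyId ? 1 : 0),
    );
}
```

For example, `normalizeCompanyName("Pfizer, Inc.")` returns `"PFIZER"`. Deleting punctuation outright, rather than replacing it with a space, keeps "L.L.C." recognizable as the suffix "LLC", at the cost of merging hyphenated names such as "Acme-Pharma" into a single token.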
4) Cross-referencing facilities and enforcement
The core capability is linking facilities to enforcement actions, approvals, and inspections. We key on FEI numbers when available and fall back to normalized company names when FEI is missing.
```sql
-- FEI-keyed path: $1 is the caller's company name after normalization
SELECT facility.fei_number, facility.firm_name, recall.recall_number
FROM facilities AS facility
JOIN enforcement AS recall
  ON facility.fei_number = recall.fei_number
WHERE facility.firm_name_normalized = $1;
```
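The query above covers only the FEI path. A minimal in-memory sketch of the full rule, FEI first with a normalized-name fallback, might look like this; the `Facility` and `Recall` shapes and the `linkRecalls` name are hypothetical.

```typescript
// Hypothetical minimal row shapes.
interface Facility {
  feiNumber: string | null;
  firmNameNormalized: string;
}
interface Recall {
  recallNumber: string;
  feiNumber: string | null;
  firmNameNormalized: string;
}
type Link = { recall: Recall; matchedOn: "fei" | "name" };

// FEI first; normalized firm name only when either side lacks an FEI.
function linkRecalls(facility: Facility, recalls: Recall[]): Link[] {
  const links: Link[] = [];
  for (const recall of recalls) {
    if (facility.feiNumber !== null && recall.feiNumber !== null) {
      // Both sides have an FEI: it decides the match on its own.
      if (facility.feiNumber === recall.feiNumber) {
        links.push({ recall, matchedOn: "fei" });
      }
    } else if (facility.firmNameNormalized === recall.firmNameNormalized) {
      links.push({ recall, matchedOn: "name" });
    }
  }
  return links;
}
```

When both records carry an FEI and the numbers differ, the sketch deliberately skips the name fallback: one firm name can span many establishments, while an FEI identifies a single one, so a name match there would attach the recall to the wrong facility. Recording `matchedOn` also lets callers treat name-based links as lower confidence.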
5) MCP tools and API behavior
MCP tools are designed for AI workflows: predictable schemas, explicit error codes, and deterministic ordering. Most tools accept a canonical `company_id` plus optional filters; `fda_lookup_company` also accepts a company name, as in this request:
```json
{
  "tool": "fda_lookup_company",
  "args": {
    "company": "Pfizer",
    "include_enforcement": true,
    "include_facilities": true
  }
}
```
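To illustrate what explicit error codes and deterministic ordering can mean in practice, here is a sketch of a tool handler with a typed result envelope. The error codes, the `ToolResult` shape, and the upper-cased lookup key (a stand-in for the alias resolution in section 3) are assumptions, not the server's documented contract.

```typescript
// Hypothetical error codes and result envelope.
type ErrorCode = "INVALID_ARGS" | "COMPANY_NOT_FOUND";

type ToolResult<T> =
  | { ok: true; data: T }
  | { ok: false; error: { code: ErrorCode; message: string } };

interface LookupArgs {
  company: string;
  include_enforcement?: boolean; // accepted but not handled in this sketch
  include_facilities?: boolean;
}

interface FacilityRow {
  fei_number: string;
  firm_name: string;
}

// Locale-independent comparison, so ordering never depends on the host.
const cmp = (a: string, b: string): number => (a < b ? -1 : a > b ? 1 : 0);

function lookupCompany(
  args: LookupArgs,
  index: Map<string, FacilityRow[]>,
): ToolResult<{ facilities: FacilityRow[] }> {
  if (typeof args.company !== "string" || args.company.trim() === "") {
    return { ok: false, error: { code: "INVALID_ARGS", message: "company is required" } };
  }
  // Stand-in for alias resolution: trimmed, upper-cased name as the key.
  const rows = index.get(args.company.trim().toUpperCase());
  if (rows === undefined) {
    return {
      ok: false,
      error: { code: "COMPANY_NOT_FOUND", message: `no company matches "${args.company}"` },
    };
  }
  // Copy before sorting so the index is never mutated; sort by FEI, then name.
  const facilities =
    args.include_facilities === false
      ? []
      : [...rows].sort((a, b) => cmp(a.fei_number, b.fei_number) || cmp(a.firm_name, b.firm_name));
  return { ok: true, data: { facilities } };
}
```

Returning errors as data rather than throwing lets an agent branch on `error.code`, and the explicit comparator keeps the result order identical across runs and hosts.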
6) Provenance and data freshness
Every row includes `source`, `ingested_at`, and `run_id` columns. The public `/health/data` endpoint summarizes dataset freshness so you can check how recently each source was updated.
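A client could turn the freshness summary into an alert by comparing each source's latest `ingested_at` with the update cadence from the coverage table. The `/health/data` response format isn't specified here, so this sketch assumes it has already been reduced to one `lastIngestedAt` timestamp per source; the source keys, the 31-day approximation of "Monthly", and the one-day grace period are also assumptions.

```typescript
// Expected cadence in days per source, from the coverage table.
// Source keys are hypothetical; "Monthly" is approximated as 31 days.
const CADENCE_DAYS = new Map<string, number>([
  ["decrs", 1],
  ["openfda_devices", 31],
  ["openfda_enforcement", 7],
  ["510k", 7],
  ["drugsfda", 31],
]);

interface SourceFreshness {
  source: string;
  lastIngestedAt: string; // ISO-8601, the latest ingested_at for the source
}

const DAY_MS = 24 * 60 * 60 * 1000;

// Return sources overdue by more than their cadence plus a grace period,
// sorted for stable output. Unknown sources and unparseable timestamps are
// flagged too.
function staleSources(rows: SourceFreshness[], now: Date, graceDays = 1): string[] {
  return rows
    .filter((row) => {
      const cadence = CADENCE_DAYS.get(row.source);
      if (cadence === undefined) return true;
      const ageDays = (now.getTime() - Date.parse(row.lastIngestedAt)) / DAY_MS;
      return Number.isNaN(ageDays) || ageDays > cadence + graceDays;
    })
    .map((row) => row.source)
    .sort();
}
```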
7) What this enables
With normalized data, clean identifiers, and cross-references in place, FDA Data MCP makes it possible to answer questions like:
- Which facilities are tied to a parent company and its subsidiaries?
- Which recalls are associated with a company’s manufacturing footprint?
- What approvals or clearances are linked to a specific organization?
If you want to go deeper, the full tool reference is available in the documentation, and you can request access on the signup page.