Ground truth is verified, authoritative data derived from primary sources rather than estimates, models, or aggregated signals. In business verification, ground truth comes from official registries, observed transactions, and validated operating data—not inferred or modeled attributes.
Ground Truth vs. Estimates
Much business data is estimated or modeled:
Revenue
- Estimate: Modeled from employee count and industry
- Ground Truth: Actual transaction data
Employee count
- Estimate: Inferred from office size
- Ground Truth: Payroll records
Operating status
- Estimate: Assumed from last filing date
- Ground Truth: Observed recent transactions
Location
- Estimate: Assumed from registered address
- Ground Truth: Verified operating site
Estimates have their place, but high-stakes decisions require ground truth.
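One way to keep the distinction above visible in a data model is to store provenance next to every value. The following is a minimal sketch, not a prescribed schema: the names `Provenance`, `Attribute`, and `usable_for_high_stakes` are hypothetical, and the revenue figures are made up for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Provenance(Enum):
    GROUND_TRUTH = "ground_truth"  # verified against a primary source
    ESTIMATE = "estimate"          # modeled or inferred


@dataclass(frozen=True)
class Attribute:
    name: str
    value: object
    provenance: Provenance
    source: str  # free-text note on where the value came from


def usable_for_high_stakes(attr: Attribute) -> bool:
    """Only ground-truth values qualify for high-stakes decisions."""
    return attr.provenance is Provenance.GROUND_TRUTH


# Illustrative values only; neither figure comes from real data.
modeled_revenue = Attribute(
    "revenue", 1_200_000, Provenance.ESTIMATE,
    "modeled from employee count and industry",
)
observed_revenue = Attribute(
    "revenue", 840_000, Provenance.GROUND_TRUTH,
    "actual transaction data",
)
```

Because the provenance travels with the value, a downstream consumer can refuse an estimate where ground truth is required instead of silently treating both as equivalent.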
Why Ground Truth Matters
Verification Accuracy
Estimates can be wildly wrong:
- A company might be incorporated in Delaware but have no physical presence there
- Revenue models assume industry averages; actual businesses vary enormously
- A business might be registered but never have actually operated
Ground truth tells you what's real.
Risk Assessment
Risk models built on estimates inherit their errors:
- Overestimated revenue → underestimated risk
- Assumed active status → missed business closures
- Modeled employee count → wrong industry classification
Ground truth enables accurate risk scoring.
Regulatory Compliance
Regulators expect verified information:
- KYB (Know Your Business) requires confirming business legitimacy
- CDD (Customer Due Diligence) requires understanding the customer
- EDD (Enhanced Due Diligence) requires verifying the source of funds
"We estimated they were legitimate" doesn't satisfy examiners.
Sources of Ground Truth
Official Registries
- Secretary of State filings (entity existence, officers, registered agent)
- IRS records (EIN, tax status)
- State licensing databases (professional licenses, permits)
- Court records (liens, judgments, bankruptcies)
Transaction Data
- Card transaction records (actual revenue, operating status)
- Banking data (account activity, cash flow)
- Payment processor records (processing volume)
Direct Verification
- Site visits (physical presence)
- Utility records (operational indicators)
- Business correspondence (verified contact)
Third-Party Validation
- Credit bureau business records
- Industry-specific databases
- Verified review platforms
The Ground Truth Hierarchy
Not all sources are equal:
Tier 1
- Source Type: Government records
- Example: Secretary of State, IRS
Tier 2
- Source Type: Financial transactions
- Example: Card spend, bank records
Tier 3
- Source Type: Licensed third parties
- Example: Credit bureaus, D&B
Tier 4
- Source Type: Self-reported, verified
- Example: Applications with document upload
Tier 5
- Source Type: Self-reported, unverified
- Example: Form submissions
Tier 6
- Source Type: Modeled/estimated
- Example: Revenue models, inferred data
Lower-numbered tiers provide stronger ground truth: Tier 1 is the most authoritative and Tier 6 the least.
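The hierarchy can be encoded as an ordered enum so that conflicting values are resolved in favor of the strongest source. This is a sketch under assumptions: `SourceTier`, `Observation`, and `most_authoritative` are hypothetical names, and the addresses are invented examples.

```python
from enum import IntEnum
from typing import NamedTuple


class SourceTier(IntEnum):
    """Tier 1 is the most authoritative source, Tier 6 the least."""
    GOVERNMENT_RECORD = 1
    FINANCIAL_TRANSACTION = 2
    LICENSED_THIRD_PARTY = 3
    SELF_REPORTED_VERIFIED = 4
    SELF_REPORTED_UNVERIFIED = 5
    MODELED = 6


class Observation(NamedTuple):
    value: str
    tier: SourceTier


def most_authoritative(observations: list[Observation]) -> Observation:
    """Return the observation from the lowest-numbered (strongest) tier."""
    if not observations:
        raise ValueError("no observations to choose from")
    return min(observations, key=lambda obs: obs.tier)


# Three conflicting addresses for the same business (invented data).
addresses = [
    Observation("12 Main St (application form)", SourceTier.SELF_REPORTED_UNVERIFIED),
    Observation("12 Main Street, Suite 4 (state registry)", SourceTier.GOVERNMENT_RECORD),
    Observation("14 Main St (credit bureau)", SourceTier.LICENSED_THIRD_PARTY),
]
best = most_authoritative(addresses)
```

Picking the single strongest source is the simplest policy. A real pipeline might also record the disagreement between sources, since a mismatch is itself a signal worth reviewing.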
Ground Truth in Practice
KYB Verification
Ground truth approach:
- Match application to Secretary of State record (Tier 1)
- Verify operating status via transaction data (Tier 2)
- Confirm ownership through registry (Tier 1)
- Validate address through multiple sources (Tiers 1-3)
Estimate approach:
- Accept stated name and address
- Model revenue from industry
- Assume active if recently filed
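The ground truth approach above can be sketched as a set of independent checks. Everything here is an assumption made for illustration: the record shapes, the `kyb_checks` function, the 90-day activity window, and the rule that two matching sources confirm an address. Matching officers against the stated owner is also a simplification of ownership verification.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class Application:
    legal_name: str
    owner: str
    address: str


@dataclass
class RegistryRecord:
    """Hypothetical shape of a Secretary of State record (Tier 1)."""
    legal_name: str
    officers: list[str]
    address: str


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial differences do not block a match."""
    return " ".join(text.lower().split())


def kyb_checks(
    app: Application,
    registry: RegistryRecord,
    last_transaction: Optional[date],  # most recent observed transaction (Tier 2)
    other_addresses: list[str],        # addresses from other sources (Tiers 2-3)
    today: date,
    max_idle_days: int = 90,           # assumed cutoff for "recent" activity
) -> dict[str, bool]:
    recent = (
        last_transaction is not None
        and today - last_transaction <= timedelta(days=max_idle_days)
    )
    target = normalize(app.address)
    matches = sum(
        normalize(addr) == target for addr in [registry.address, *other_addresses]
    )
    return {
        "name_matches_registry": normalize(app.legal_name) == normalize(registry.legal_name),
        "operating": recent,
        "owner_in_registry": normalize(app.owner) in {normalize(o) for o in registry.officers},
        "address_confirmed": matches >= 2,  # assumed: two agreeing sources
    }


app = Application("Acme Widgets LLC", "Jane Doe", "12 Main St")
registry = RegistryRecord("ACME WIDGETS LLC", ["Jane Doe", "John Roe"], "12 Main St")
result = kyb_checks(
    app,
    registry,
    last_transaction=date(2024, 5, 20),
    other_addresses=["12 main st", "99 Elm Ave"],
    today=date(2024, 6, 1),
)
```

Returning one boolean per check, rather than a single pass/fail, keeps it clear which facts were confirmed against a source and which were not.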
When Estimates Are Acceptable
Ground truth isn't always available or necessary:
- Low-risk decisions may tolerate estimates
- Some attributes (future growth) can only be projected
- Cost/benefit may favor estimates for certain use cases
The key is knowing when you have ground truth and when you don't.
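One way to make that judgment explicit is a policy table that states the weakest source tier each kind of decision will accept. The cutoffs below are hypothetical and chosen only to illustrate the idea; the article does not prescribe them.

```python
# Hypothetical policy: the weakest (highest-numbered) source tier that each
# decision risk level will accept. Tier 1 = government records, Tier 6 = modeled.
MAX_ACCEPTABLE_TIER = {"low": 6, "medium": 4, "high": 3}


def is_acceptable(source_tier: int, decision_risk: str) -> bool:
    """Return True if data from `source_tier` is strong enough for the decision."""
    if not 1 <= source_tier <= 6:
        raise ValueError(f"unknown tier: {source_tier}")
    return source_tier <= MAX_ACCEPTABLE_TIER[decision_risk]
```

For example, `is_acceptable(6, "low")` is true while `is_acceptable(6, "high")` is false: a modeled value can support a low-risk decision but not a high-stakes one.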
Key Takeaways
- Ground truth is verified data from primary, authoritative sources
- Estimates and models are not ground truth—they're approximations
- High-stakes decisions (verification, compliance, risk) require ground truth
- Sources have different authority levels: government records outrank models
- Know what you have—distinguish ground truth from estimates in your data