From a CSV to the most complete product profile on the internet
7 phases from raw CSV to verified intelligence
Phase 1: Import
Enters
A CSV with titles, SKUs, prices — maybe 5-10 fields
Exits
Structured product records with auto-detected categories. Your data is scored at 1.0 confidence.
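A minimal sketch of what this phase could look like, assuming a CSV with a `title` column. The keyword map and the `detect_category` and `import_csv` helpers are hypothetical illustrations; the actual category detection is not described here.

```python
import csv
import io

# Hypothetical keyword map; the real category detection is not described in the text.
CATEGORY_KEYWORDS = {"helmet": "Motorcycle Helmets", "drill": "Power Tools"}


def detect_category(title):
    """Return the first category whose keyword appears in the title, else None."""
    lowered = title.lower()
    for keyword, category in CATEGORY_KEYWORDS.items():
        if keyword in lowered:
            return category
    return None


def import_csv(text):
    """Turn CSV text into product records; every imported value gets confidence 1.0."""
    records = []
    for row in csv.DictReader(io.StringIO(text)):
        fields = {
            name: {"value": value, "source": "import", "confidence": 1.0}
            for name, value in row.items()
            if value  # skip empty or missing cells
        }
        title = row.get("title") or ""
        records.append({"category": detect_category(title), "fields": fields})
    return records


SAMPLE = "title,sku,price\nTouring Helmet X,H-100,249.00\n"
products = import_csv(SAMPLE)
```

Each imported value carries its own source and confidence from the start, so later phases can compare it against web observations on equal terms.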
Phase 2: Field Suggestion
Enters
Category-assigned products with minimal fields
Exits
A schema of 50-129 category-specific fields that should exist — weight, noise level, certifications, materials, dimensions. The system knows what's missing.
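One way to picture "knowing what's missing" is a comparison between a category schema and the fields a product already has. The sketch below assumes a hand-written schema of five fields; `CATEGORY_SCHEMA` and `missing_fields` are illustrative names, and a real schema would hold 50-129 fields.

```python
# Hypothetical category schema, shortened to five fields for illustration.
CATEGORY_SCHEMA = {
    "Motorcycle Helmets": [
        "weight",
        "noise_level",
        "certifications",
        "materials",
        "dimensions",
    ],
}


def missing_fields(category, present_fields):
    """List schema fields the product does not have yet, in schema order."""
    expected = CATEGORY_SCHEMA.get(category, [])
    present = set(present_fields)
    return [name for name in expected if name not in present]


gaps = missing_fields("Motorcycle Helmets", ["title", "sku", "price", "weight"])
```

Here `gaps` lists the four schema fields the imported product still lacks, which is the kind of list the later phases would try to fill.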
Phase 3: Web Scraping
Enters
Product identifiers (name, EAN, brand)
Exits
10-20 web sources scraped per product — manufacturer sites, retailers, review sites, spec databases. Raw HTML stored for extraction.
Phase 4: Field Discovery
Enters
Scraped web pages with unstructured content
Exits
Additional fields discovered from real-world sources that weren't in the original schema. The web reveals what matters.
Phase 5: Extraction
Enters
Raw web pages + comprehensive field schema
Exits
Structured field values extracted from every source. Each value tagged with its source URL and extraction confidence.
Phase 6: Consolidation (Truth Engine)
Enters
Multiple values per field from multiple sources — often disagreeing
Exits
One canonical value per field, confidence-scored. Multi-source consensus. Disagreements resolved by evidence weight. The Truth Engine.
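As a rough sketch of multi-source consensus, the example below groups observations by normalized value and picks the one backed by the most distinct sources. The definition of evidence weight used here (source count first, summed extraction confidence as tie-breaker) is an assumption for illustration, not the Truth Engine's actual rule.

```python
from collections import defaultdict


def consolidate(observations):
    """Pick one canonical value from (value, source_url, confidence) tuples.

    Assumed evidence weight: number of distinct sources first,
    summed extraction confidence as tie-breaker.
    """
    sources = defaultdict(set)
    weight = defaultdict(float)
    display = {}
    for value, source_url, confidence in observations:
        key = value.strip().lower()  # naive normalization, for illustration only
        display.setdefault(key, value.strip())  # keep the first spelling seen
        if source_url not in sources[key]:
            sources[key].add(source_url)
            weight[key] += confidence
    if not sources:
        return None
    best = max(sources, key=lambda k: (len(sources[k]), weight[k]))
    return {"value": display[best], "source_count": len(sources[best])}


result = consolidate([
    ("ECE 22.06", "https://example.com/a", 0.90),
    ("ece 22.06", "https://example.com/b", 0.80),
    ("ECE 22.05", "https://example.com/c", 0.95),
])
```

The single high-confidence outlier loses to two agreeing sources, which reflects the idea that agreement across independent sources counts for more than one confident extraction.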
Phase 7: Optimization
Enters
Complete, validated product intelligence profiles
Exits
Channel-ready content: Google Shopping titles, Amazon keywords, meta descriptions, Schema.org markup, Smart Negatives, Living FAQ, contextual specs. Anti-hallucination checked.
A credit score for every fact
- Brand-owned data: Your own import data. Always trusted. The gold standard.
- 5+ independent sources agree: Near-certainty. Multiple independent sources confirming the same value.
- 3-4 sources agree: High confidence. Strong consensus across multiple web sources.
- 2 sources agree (display threshold): The minimum for display. Below this, the system stays silent.
- Single source only: Stored but never shown. Silence is better than fiction.
Confidence Hierarchy
Below threshold — stored but not displayed
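The hierarchy above can be sketched as a simple lookup from the number of agreeing sources to a tier. The tier labels and function names are illustrative assumptions; only the thresholds (5+, 3-4, 2, 1) and the display cutoff at two sources come from the list.

```python
def confidence_tier(source_count, brand_owned=False):
    """Map agreeing-source count to a tier label (labels are illustrative)."""
    if brand_owned:
        return "brand-owned"
    if source_count >= 5:
        return "near-certainty"
    if source_count >= 3:
        return "high"
    if source_count == 2:
        return "display-threshold"
    return "hidden"


def is_displayed(source_count, brand_owned=False):
    """A value is shown only if it is brand-owned or at least 2 sources agree."""
    return brand_owned or source_count >= 2


tiers = [confidence_tier(n) for n in (1, 2, 3, 5)]
```

A single-source value still gets a tier, so it can be stored, but `is_displayed` keeps it off the page.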
Every claim checked against 3 source layers
Layer 1: Import Data
Your original data — always scored 1.0. The foundation of truth.
Layer 2: Scraped Data
10-20 web sources per product. Raw, independent observations from across the internet.
Layer 3: Enriched Data
Consolidated, confidence-scored intelligence. Multi-source validated values.
6 violation types detected and blocked
Fabricated Specifications
AI invents a spec that exists in no source. Blocked.
"SNELL certified" — not found in any of 14 sources.
Inflated Measurements
AI exaggerates a numeric value beyond any source. Blocked.
"Battery lasts 72 hours" — best source says 48 hours.
False Certifications
AI claims a certification the product doesn't have. Blocked.
"IP68 waterproof" — product is IP54 rated.
Invented Comparisons
AI makes competitive claims without data basis. Blocked.
"Best in class" — no comparative data exists.
Hallucinated Features
AI adds features that don't exist on the product. Blocked.
"Bluetooth 5.3" — product has no Bluetooth.
Misleading Context
AI provides technically true but misleading framing. Blocked.
"Lightweight at 2.1kg" — heaviest in its category.
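Two of these checks are easy to sketch: a numeric claim compared against the values found in sources, and a certification claim compared against the sourced certifications. The function names and return labels are hypothetical, and the numeric check assumes a metric where higher is the flattering direction, as with battery life.

```python
def check_numeric_claim(claimed, source_values):
    """Classify a numeric claim against sourced values (higher assumed flattering)."""
    if not source_values:
        return "fabricated"  # no source mentions this value at all
    if claimed > max(source_values):
        return "inflated"  # claim exceeds the best sourced value
    return "ok"


def check_certification(claimed, sourced_certifications):
    """Block a certification claim that no source supports."""
    known = {cert.strip().upper() for cert in sourced_certifications}
    return "ok" if claimed.strip().upper() in known else "false_certification"


battery = check_numeric_claim(72, [48, 40])   # sources top out at 48 hours
ingress = check_certification("IP68", ["IP54"])
```

Both examples mirror the cases above: the 72-hour claim is flagged because the best source says 48 hours, and the IP68 claim is flagged because the only sourced rating is IP54.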
What comes out the other side
What goes in
5 fields · 0 validated · No competitive context
What comes out
1,640g — lighter than 72%
84 dB(A) — quieter than 68%
ECE 22.06
Not for track racing — no SNELL/FIM
87 product-specific Q&As
87 fields · 67.6% multi-source validated · Channel-perfect
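Competitive context such as "lighter than 72%" could be computed as the share of category peers a product beats on a given field. The sketch below is one possible definition, with ties not counted as beaten; the peer weights are made-up illustration data, not real category figures.

```python
def percent_better(value, peers, lower_is_better=True):
    """Share of peers (in percent) that this value beats; ties do not count."""
    if not peers:
        return None
    if lower_is_better:
        beaten = sum(1 for p in peers if p > value)
    else:
        beaten = sum(1 for p in peers if p < value)
    return round(100 * beaten / len(peers), 1)


# Made-up peer weights in grams, for illustration only.
peer_weights_g = [1450, 1500, 1700, 1750, 1800]
share = percent_better(1640, peer_weights_g)
```

With these five peers, a 1,640 g product is lighter than three of them, so `share` is 60.0. Weight and noise level use `lower_is_better=True`; a field like battery life would pass `False`.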
Go deeper
Enrichment Engine
The full 7-phase pipeline deep dive. Follow a real product through every phase.
Channel Router
Same gold, different shapes. How each channel gets exactly what it needs.
Product Widget
Install a salesperson on every product page. One script tag.
AIO & LLM Layer
Make your products visible to AI shopping agents.