A Data Lake for E-commerce Product Intelligence

Production-grade product datasets delivered as partitioned Apache Parquet in a cloud-native lake architecture. Built for analytics teams, consultants, and builders who need repeatable, query-ready data.

🎯 Competitive analysis guide + updates

Get our competitive analysis guide and product updates (separate from the 10K Parquet sample).

No spam. Unsubscribe anytime. 2,847 downloads this month.

Data lake architecture

Partitioned Parquet in R2, served by an API layer

Your deliverable is not “a CSV download.” It’s a lake-style layout you can ingest into DuckDB, Polars, Spark, BigQuery, Snowflake, or any Parquet-native workflow.

Partitioning

data/year=YYYY/quarter=Q#/month=M/category=Category_Slug/
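As a rough illustration of the layout above, the sketch below builds the prefix for one partition. It assumes calendar quarters (Q1 = January to March) and an unpadded month number, since the pattern shows `month=M`; `quarter_of` and `partition_prefix` are hypothetical helper names, not part of any published client.

```python
def quarter_of(month: int) -> int:
    """Return the calendar quarter (1-4) that contains the given month (1-12)."""
    if not 1 <= month <= 12:
        raise ValueError(f"month must be in 1..12, got {month}")
    return (month - 1) // 3 + 1


def partition_prefix(year: int, month: int, category_slug: str) -> str:
    """Build the lake prefix for one year/quarter/month/category partition.

    Assumptions: calendar quarters and an unpadded month number.
    """
    return (
        f"data/year={year}/quarter=Q{quarter_of(month)}"
        f"/month={month}/category={category_slug}/"
    )


print(partition_prefix(2026, 4, "Electronics"))
# data/year=2026/quarter=Q2/month=4/category=Electronics/
```

Because the keys follow the `name=value` convention, Parquet-native engines can usually expose year, quarter, month, and category as columns without opening the files.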

Manifests + catalog

Each dataset includes metadata (row counts, schema version, checksums) for repeatable ingestion.
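One way to use that metadata is to check every downloaded file against its checksum before loading it. The sketch below is a minimal example under assumptions: the manifest shape (`files`, `name`, `sha256`, `row_count`, `schema_version`) and the SHA-256 algorithm are hypothetical, so adapt the field names to the manifest you actually receive.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large Parquet objects never sit fully in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_against_manifest(data_dir: Path, manifest: dict) -> list[str]:
    """Return the names of files that are missing or whose checksum differs."""
    failures = []
    for entry in manifest["files"]:
        target = data_dir / entry["name"]
        if not target.is_file() or sha256_of(target) != entry["sha256"]:
            failures.append(entry["name"])
    return failures


# Demo with a stand-in file; the bytes are placeholders, not real Parquet.
with tempfile.TemporaryDirectory() as tmp:
    data_dir = Path(tmp)
    payload = b"stand-in bytes"
    (data_dir / "part-0.parquet").write_bytes(payload)
    manifest = {
        "schema_version": "1",  # hypothetical field
        "files": [
            {
                "name": "part-0.parquet",
                "row_count": 0,  # hypothetical field, unused in this check
                "sha256": hashlib.sha256(payload).hexdigest(),
            }
        ],
    }
    print(verify_against_manifest(data_dir, manifest))  # []
```

An empty list means every listed file is present and intact, which makes a re-run of the same ingestion safe to skip or repeat.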

Secure access

Customers authenticate, entitlement is checked, then Parquet is streamed from R2.
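The order of those checks can be sketched as a small decision function. This is an illustration only, not the Worker's actual code: `AccessStore`, the token and dataset identifiers, and the choice of 401/403 status codes are all assumptions.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class AccessStore:
    """In-memory stand-in for session and entitlement lookups (hypothetical)."""
    sessions: dict[str, str] = field(default_factory=dict)  # token -> customer id
    entitlements: dict[str, set[str]] = field(default_factory=dict)  # customer id -> dataset ids


def authorize_download(store: AccessStore, token: Optional[str], dataset_id: str) -> int:
    """Return the HTTP status a download request would receive."""
    customer = store.sessions.get(token) if token else None
    if customer is None:
        return 401  # step 1 failed: not authenticated
    if dataset_id not in store.entitlements.get(customer, set()):
        return 403  # step 2 failed: authenticated but not entitled
    return 200  # step 3: the Parquet object can be streamed


store = AccessStore(
    sessions={"tok-123": "cust-1"},
    entitlements={"cust-1": {"electronics-2026-04"}},
)
print(authorize_download(store, None, "electronics-2026-04"))       # 401
print(authorize_download(store, "tok-123", "fashion-2026-04"))      # 403
print(authorize_download(store, "tok-123", "electronics-2026-04"))  # 200
```

Checking identity before entitlement keeps the two failure modes distinct, so a client can tell "log in again" apart from "this category is not in your package".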

What you get

  • R2 Data Lake: partitioned Parquet objects (data/year=2026/…/category=Electronics/*.parquet)
  • API Layer (Worker): catalog + auth + downloads (/api/catalog, /api/me, /api/download)
  • Your Analytics: DuckDB / Polars / Spark (read_parquet("*.parquet"))

Start with the free Parquet sample, then purchase entitlements for full category drops.

Designed for repeatability: stable schemas, predictable keys, and an API-backed catalog.
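Once files are downloaded, a partition-aware query might look like the sketch below. It only builds a DuckDB SQL string, so it runs without DuckDB installed; it assumes the files sit in a local folder (`lake_root`, hypothetical) that mirrors the lake layout. `hive_partitioning` is an option of DuckDB's `read_parquet`, and `category_query` is an illustrative helper name.

```python
import re

# Category slugs in the layout look like Bags_Travel, so restrict to that shape.
SLUG = re.compile(r"^[A-Za-z0-9_]+$")


def category_query(lake_root: str, year: int, category_slug: str) -> str:
    """Build a DuckDB SQL string that counts products per site for one category and year."""
    if not SLUG.match(category_slug):
        raise ValueError(f"unexpected category slug: {category_slug!r}")
    glob = (
        f"{lake_root.rstrip('/')}/data/year={int(year)}"
        f"/quarter=*/month=*/category={category_slug}/*.parquet"
    ).replace("'", "''")  # escape single quotes for the SQL string literal
    return (
        "SELECT site_domain, count(*) AS products "
        f"FROM read_parquet('{glob}', hive_partitioning = true) "
        "GROUP BY site_domain ORDER BY products DESC"
    )


query = category_query("lake", 2026, "Electronics")
print(query)

# With DuckDB installed, the string can be executed like this:
#   import duckdb
#   duckdb.sql(query).show()
```

Narrowing the glob to one year and category means the engine only lists and opens the objects it needs, which is the main practical benefit of the partitioned layout.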

Production format

10,000-row Apache Parquet sample

The same column layout and file format as commercial drops: partitioned product-link datasets delivered as Parquet for analytics pipelines (Spark, DuckDB, Polars, BigQuery load, and more).

  • Exactly 10,000 rows in one .parquet file
  • Schema aligned with monthly year=/quarter=/month=/category= releases
  • Marketing use only — see license note on download

Typical columns

Exact fields vary slightly by retailer feed; exports include a manifest.

  • product_url
  • site_domain
  • title
  • brand
  • price
  • currency
  • availability
  • sku
  • breadcrumbs
  • primary_image
  • images
  • description
  • extraction_timestamp
  • confidence
  • http_status
  • quality_score
  • kde_sector
  • kde_subcategory
  • kde_score
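Since exact fields vary by feed, it can help to compare each file's columns against the list above before wiring it into a pipeline. The sketch below is a minimal check under assumptions: `missing_columns` is a hypothetical helper, and the column names passed in would come from whatever Parquet reader you use.

```python
# Column names copied from the "Typical columns" list above.
TYPICAL_COLUMNS = (
    "product_url", "site_domain", "title", "brand", "price", "currency",
    "availability", "sku", "breadcrumbs", "primary_image", "images",
    "description", "extraction_timestamp", "confidence", "http_status",
    "quality_score", "kde_sector", "kde_subcategory", "kde_score",
)


def missing_columns(actual_columns) -> list[str]:
    """Return typical columns absent from a file, in the documented order."""
    present = set(actual_columns)
    return [name for name in TYPICAL_COLUMNS if name not in present]


# Example: a feed that lacks the three kde_* columns.
feed_columns = [name for name in TYPICAL_COLUMNS if not name.startswith("kde_")]
print(missing_columns(feed_columns))
# ['kde_sector', 'kde_subcategory', 'kde_score']
```

Treating a missing column as a warning rather than a hard failure fits the note that fields differ slightly between retailer feeds.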


Trusted by researchers and businesses

  • 500K+ Products Tracked
  • 50+ Major Retailers
  • 98.7% Data Accuracy

See the Data Quality

Below is a tiny HTML preview (five rows). Your real deliverable is Parquet at scale—start with the 10,000-row sample file. All data is collected from publicly visible product pages.

Representative rows (5 of many millions in production)

Format sold: Parquet
Free structured sample: 10K rows

product_url | site_domain | title | brand | price | currency | availability | category | extraction_timestamp
…/guess-laptop-bag/ | 99percents.com | Guess Laptop Bag | | 50000.00 | NGN | out_of_stock | Bags_Travel | 2026-04-09T…Z
…/once-upon-a-wrinkle | 100percentpure.com | Once Upon a Wrinkle | | 34.00 | | unknown | Footwear | 2026-04-09T…Z
…/turntable-cartridges… | 2001audiovideo.com | Turntable Cartridges and Stylus | | 249.00 | | unknown | Gaming_Peripherals | 2026-04-07T…Z

Download 10K Parquet sample

The Parquet sample is the best representation of production files.

What's In Our Product Datasets

📱

Product Information

Names, descriptions, SKUs, categories, specifications, and detailed product attributes

💰

Pricing Data

Current prices, sale prices, discounts, price history, and competitive pricing intelligence

Reviews & Ratings

Star ratings, review counts, review sentiment, and customer feedback analysis

📦

Availability Status

Stock levels, availability indicators, shipping information, and inventory tracking

🏪

Retailer Information

Store names, URLs, seller details, and marketplace identifiers for complete attribution

📊

Market Analytics

Trend data, category insights, seasonal patterns, and competitive positioning metrics

How Businesses Use Our Product Data

🔍

Competitive Price Analysis

Track competitor pricing strategies, identify pricing gaps, and optimize your own pricing for maximum profitability.

Saves $10,000+ in consultant fees

📈

Market Research

Analyze product trends, seasonal patterns, and consumer preferences to inform product development and marketing strategies.

Replaces $25,000+ research reports

🎯

Product Sourcing

Discover new products, identify popular items, and find profitable niches in various retail categories.

Identifies 6-figure opportunities

📊

Brand Monitoring

Track how your brand and products are positioned across different retailers and marketplaces.

Prevents costly positioning mistakes

Professional-Grade Product Intelligence

Comprehensive product data that would cost $50,000+ to collect manually. Choose the scale that fits your research needs.

Starter Package

$299
Up to 25,000 products of your choice
  • Perfect for focused competitive analysis
  • Apache Parquet delivery
  • Detailed documentation & schema
  • 6 months of quarterly updates
  • Email support
Ideal for small businesses & researchers

Professional Package

$899
Up to 100,000 products of your choice
  • Comprehensive category intelligence
  • Mix and match across all datasets
  • Parquet + documentation
  • 12 months of quarterly updates
  • Priority email support
  • Custom data requests available
Perfect for consultants & market research

Enterprise Package

$2,999
Up to 2,000,000 products of your choice
  • Industry-dominating intelligence
  • Massive cross-platform datasets
  • Parquet + custom export jobs
  • 12 months of quarterly updates
  • Priority support & consultation
  • Unlimited commercial usage
For enterprise-scale market intelligence

Mega Enterprise

Custom Quote
2,000,000+ products
  • Datasets with tens of millions of records
  • Complete platform intelligence
  • Custom collection & formats
  • Flexible update schedules
  • White-label options available
  • Dedicated account management
Ultimate competitive intelligence

  • Flexible Selection: Choose products from any of our datasets to reach your package limit. Mix electronics, fashion, home goods, and more.
  • Updates: All packages include regular quarterly data refreshes to keep your intelligence current.
  • Custom Datasets: Need specific retailers or massive datasets? Mega Enterprise includes custom collection starting at $8,000.
  • Value: Manual collection of this data would cost $50,000 to $500,000+ depending on scale. Get professional-grade intelligence at a fraction of the cost.

How We Collect Product Data

01

Ethical Web Scraping

We only collect publicly visible product information using respectful scraping practices. All robots.txt files are honored and rate limits are strictly followed.
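For readers curious what honoring robots.txt looks like in practice, here is a minimal sketch using Python's standard `urllib.robotparser`. It is an illustration rather than our crawler: the user agent name, the `shop.example` URLs, and the sample rules are all made up.

```python
from urllib.robotparser import RobotFileParser

USER_AGENT = "example-product-bot"  # hypothetical crawler name


def build_rules(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been fetched."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


# Sample rules for a made-up shop.
ROBOTS = """\
User-agent: *
Disallow: /checkout/
Crawl-delay: 5
"""

rules = build_rules(ROBOTS)
print(rules.can_fetch(USER_AGENT, "https://shop.example/products/widget"))  # True
print(rules.can_fetch(USER_AGENT, "https://shop.example/checkout/cart"))    # False
print(rules.crawl_delay(USER_AGENT))                                        # 5
```

A crawler would skip any URL where `can_fetch` returns False and wait at least the crawl delay between requests to the same site.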

02

Data Validation & Cleaning

Every product record goes through automated validation: price format checking, duplicate removal, and data completeness scoring to ensure high quality.
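The three checks named above can be sketched in a few lines. This is a simplified illustration, not the production pipeline: the `REQUIRED` field subset, the use of `product_url` as the duplicate key, and the equal-weight score are assumptions, and the demo rows are invented.

```python
from decimal import Decimal, InvalidOperation

# Hypothetical subset of fields used for the completeness score.
REQUIRED = ("product_url", "title", "price", "currency", "availability")


def parse_price(raw):
    """Return the price as a non-negative Decimal, or None if it is not a valid number."""
    try:
        value = Decimal(str(raw).strip())
    except InvalidOperation:
        return None
    return value if value.is_finite() and value >= 0 else None


def dedupe(records):
    """Keep the first record seen for each product_url."""
    seen, unique = set(), []
    for record in records:
        key = record.get("product_url")
        if key in seen:
            continue
        seen.add(key)
        unique.append(record)
    return unique


def completeness(record) -> float:
    """Fraction of REQUIRED fields that are filled; price must also parse."""
    filled = 0
    for name in REQUIRED:
        value = record.get(name)
        if name == "price":
            filled += parse_price(value) is not None
        else:
            filled += value not in (None, "")
    return filled / len(REQUIRED)


rows = [
    {"product_url": "https://shop.example/a", "title": "A", "price": "34.00",
     "currency": "USD", "availability": "in_stock"},
    {"product_url": "https://shop.example/a", "title": "A", "price": "34.00",
     "currency": "USD", "availability": "in_stock"},
    {"product_url": "https://shop.example/b", "title": "B", "price": "n/a",
     "currency": "", "availability": "unknown"},
]
clean = dedupe(rows)
print(len(clean))                        # 2
print([completeness(r) for r in clean])  # [1.0, 0.6]
```

Using `Decimal` instead of `float` avoids rounding surprises when prices are later compared or summed.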

03

Privacy Compliant

We only collect product information visible to any website visitor. No personal data, no customer information, no backend systems: only public product catalogs.

04

Regular Updates

Product data is refreshed daily to capture price changes, new products, and availability updates. You always get the most current market information.

Ready to Get Started?

Questions about our product data? Need a custom dataset for your industry? Let's talk.

Email hello@kodira.dev

Typical response time: Under 2 hours during business hours