How the data works

No black boxes. This page explains what we parse, how we structure it, what we infer, and what we don't. If you're evaluating the data for enterprise use, start here.

From menu text to structured intelligence

Every data point traces back to a real venue menu. Here's the pipeline.

1. Source

Venue websites, PDF menus, and Google Maps photos. Publicly available data only.

2. Navigate

AI browser finds menu pages across hundreds of site structures: Shopify, WordPress, custom sites, and PDF links.

3. Extract

Vision API reads every item: brand, category, price, size, ingredients. HTML and images both supported.

4. Normalize

Brand names unified. Spelling variants resolved. Cocktail ingredients mapped to structured records.

5. Validate

Quality controller filters noise, confirms categories, rejects garbage items. Every record timestamped.
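The Validate step can be pictured as a set of reject-don't-repair rules. The Python sketch below is illustrative only: the `MenuItem` type, the category whitelist, and the specific rules are assumptions, not the production quality controller. Items that fail any rule are dropped, and every surviving item receives the same UTC parse timestamp for the run.

```python
from __future__ import annotations

from dataclasses import dataclass, replace
from datetime import datetime, timezone
from typing import Optional

# Hypothetical category whitelist; the real taxonomy is not published here.
ALLOWED_CATEGORIES = {"beer", "wine", "spirits", "cocktail", "soft drink"}


@dataclass(frozen=True)
class MenuItem:
    name: str
    category: str
    price: Optional[float] = None
    parsed_at: Optional[datetime] = None


def validate(item: MenuItem, now: datetime) -> Optional[MenuItem]:
    """Return a cleaned, timestamped copy of the item, or None if it fails a rule."""
    name = item.name.strip()
    if len(name) < 2:                                    # empty or single-character noise
        return None
    if item.category.lower() not in ALLOWED_CATEGORIES:  # category not confirmed
        return None
    if item.price is not None and item.price <= 0:       # nonsensical price
        return None
    return replace(item, name=name, parsed_at=now)


def validate_all(items: list[MenuItem]) -> list[MenuItem]:
    """Validate a batch; all survivors share one UTC timestamp for this parse run."""
    now = datetime.now(timezone.utc)
    checked = (validate(item, now) for item in items)
    return [item for item in checked if item is not None]
```

Rejecting rather than repairing keeps questionable items out of the dataset instead of letting the validator guess at what was meant.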

What goes in. What comes out

We parse

  • Venue website menus (HTML, rendered JavaScript)
  • PDF menus (uploaded or linked)
  • Google Maps venue photos (menu boards, chalkboards)
  • Cocktail lists and ingredient descriptions
  • Explicit brand mentions
  • Pricing information and serving sizes

We extract

  • Brand name (normalized to canonical entity)
  • Category and subcategory
  • Cocktail ingredient list (structured per slot)
  • Base spirit type and brand
  • Price and currency
  • Parse timestamp and source URL
  • Venue metadata: location, rating, venue type
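As a rough illustration, one extracted record could be modeled as a flat, immutable Python dataclass. The type and field names below are hypothetical; they mirror the list above but are not a published schema.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class Venue:
    name: str
    address: str
    city: str
    venue_type: str                 # e.g. "cocktail bar", "restaurant"
    rating: Optional[float] = None  # Google rating, if available


@dataclass(frozen=True)
class ExtractedItem:
    venue: Venue
    brand: Optional[str]            # canonical entity; None when no brand is named
    category: str
    subcategory: Optional[str]
    price: Optional[float]
    currency: str                   # ISO 4217 code, e.g. "EUR"
    source_url: str                 # where the menu was read
    parsed_at: datetime             # parse timestamp
    base_spirit: Optional[str] = None
    base_spirit_brand: Optional[str] = None
    # Cocktail ingredients keyed by slot, e.g. {"Mixer": "Grapefruit soda"}
    ingredients: dict[str, str] = field(default_factory=dict)
```

Making `brand` optional reflects the rule that an unnamed brand stays empty rather than being guessed.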

Not just what's listed. What it implies

Most menus list cocktail names without specifying ingredient brands. "Espresso Martini" appears on 422 NL menus, but fewer than 5% name the coffee liqueur. For each cocktail on a menu, we:

  • Map cocktail names to required ingredient sets (base spirit, liqueur, mixer, garnish)
  • Cross-reference ingredient brands mentioned elsewhere on the same menu
  • Flag explicit brand mentions vs unspecified ingredient slots
  • Mark unspecified slots as open territory: the competitive opportunity

If a brand is not explicitly named on the menu, it is not counted as confirmed presence. This protects analytical integrity.

Paloma, decoded
As seen on venue menu: "Paloma - Cazadores Blanco, lime, grapefruit, salt"

| Slot | Value | Status |
|---|---|---|
| Base | Cazadores Blanco | Confirmed |
| Spirit type | Tequila | Inferred |
| Citrus | Lime juice | On menu |
| Mixer | Grapefruit soda | Brand open |
| Garnish | Salt rim | On menu |
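The Paloma breakdown follows a small set of rules, sketched below in Python. This is a simplified illustration, not the production logic: `KNOWN_BRANDS`, the `PALOMA` recipe template, and the `decode` function are hypothetical stand-ins. A base brand is confirmed only if it is named verbatim; the spirit type comes from the recipe and is therefore inferred; a generic mixer such as grapefruit leaves the brand slot open.

```python
from __future__ import annotations

# Hypothetical set of canonical brands known to the system.
KNOWN_BRANDS = {"Cazadores Blanco"}

# Hypothetical recipe template: spirit type plus the non-base slots.
# Each slot maps to (keyword expected on the menu, resolved ingredient, brandable?).
PALOMA = {
    "spirit_type": "Tequila",
    "slots": {
        "Citrus": ("lime", "Lime juice", False),
        "Mixer": ("grapefruit", "Grapefruit soda", True),
        "Garnish": ("salt", "Salt rim", False),
    },
}


def decode(menu_line: str, recipe: dict) -> dict[str, tuple[str, str]]:
    """Classify each slot as Confirmed, Inferred, On menu, or Brand open."""
    _, _, ingredient_text = menu_line.partition(" - ")
    tokens = [t.strip() for t in ingredient_text.split(",") if t.strip()]
    lowered = {t.lower() for t in tokens}

    result: dict[str, tuple[str, str]] = {}
    # Base: confirmed only when a known brand is named verbatim on the menu.
    brand = next((t for t in tokens if t in KNOWN_BRANDS), None)
    result["Base"] = (brand, "Confirmed") if brand else ("", "Brand open")
    # Spirit type comes from the recipe, so it is always inferred.
    result["Spirit type"] = (recipe["spirit_type"], "Inferred")

    for slot, (keyword, ingredient, brandable) in recipe["slots"].items():
        if brandable:
            # A generic ingredient ("grapefruit") names no brand: the slot stays open.
            result[slot] = (ingredient, "Brand open")
        elif keyword in lowered:
            result[slot] = (ingredient, "On menu")
        else:
            result[slot] = (ingredient, "Inferred")  # implied by recipe, not listed
    return result
```

Brand matching here is exact, so in practice the ingredient tokens would need brand normalization before this step.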

Data without cleaning is noise

Menus spell brands differently. Abbreviate. Use legacy names. Mix languages. We normalize everything to canonical entities so "Hendricks" in one venue matches "Hendrick's Gin" in another.

Variants cluster under standardized brand entities, enabling accurate comparison across thousands of venues.

Normalization examples

Spelling variants
  • "Hendricks" → Hendrick's Gin
  • "Hendrick's" → Hendrick's Gin
  • "Hendrick's Gin" → Hendrick's Gin

Category resolution
  • "Patron Silver Tequila" → Patron Silver
  • "Ketel One Botanical Peach" → Ketel One Botanical

Noise removal
  • "Heineken 0.3L" → Heineken
  • "Coca-Cola 33cl Bottle" → Coca-Cola
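A minimal normalization sketch, assuming a hand-maintained alias table keyed on a cleaned, lowercase string. `CANONICAL`, the size and packaging patterns (including the Dutch words `fles` and `blik`), and `normalize` are hypothetical; a real system would need a far larger table and probably some fuzzy matching, which is not shown.

```python
from __future__ import annotations

import re
from typing import Optional

# Hypothetical alias table: cleaned, lowercase key -> canonical brand entity.
CANONICAL = {
    "hendricks": "Hendrick's Gin",
    "hendricks gin": "Hendrick's Gin",
    "patron silver": "Patron Silver",
    "patron silver tequila": "Patron Silver",
    "ketel one botanical peach": "Ketel One Botanical",
    "heineken": "Heineken",
    "coca-cola": "Coca-Cola",
}

# Serving sizes such as "0.3L", "33cl" or "50 ml".
SIZE = re.compile(r"\b\d+(?:[.,]\d+)?\s*(?:ml|cl|l)\b", re.IGNORECASE)
# Packaging words in English and Dutch ("fles" = bottle, "blik" = can).
PACKAGING = re.compile(r"\b(?:bottle|can|fles|blik)\b", re.IGNORECASE)


def clean_key(raw: str) -> str:
    """Strip sizes, packaging words and apostrophes; lowercase; collapse spaces."""
    text = SIZE.sub(" ", raw)
    text = PACKAGING.sub(" ", text)
    text = text.replace("'", "").replace("\u2019", "")
    return " ".join(text.lower().split())


def normalize(raw: str) -> Optional[str]:
    """Return the canonical entity, or None if the variant is unknown."""
    return CANONICAL.get(clean_key(raw))
```

Unknown strings return `None` rather than a best guess, which matches the rule that unmatched data stays empty.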

Structured, timestamped, continuously expanding

Every record carries a parse timestamp. Coverage grows continuously as we add venues and re-parse existing ones.

We prioritize:

  • Highly rated venues (Google rating of 4.0 or higher)
  • Major city centers (Amsterdam, Rotterdam, The Hague, Utrecht)
  • Venues with high review volume (established, active locations)
  • Venues with online menus (website, PDF, or photo-based)

Coverage depth varies by geography. The Netherlands is our deepest market; coverage in other countries is expanding.

  • 7,200+ venues parsed in the Netherlands
  • 8,000+ brands tracked across all categories
  • 5x larger than the leading panel provider's NL sample (~1,000 outlets)
  • Under 8 weeks: rolling refresh cycle, with every record timestamped to its parse date
Market context

Nobody in Europe does this

Several tools cover parts of on-trade intelligence. None parse actual venue menus at scale in Europe.

| Tool | What it does | Known clients | What it does NOT do |
|---|---|---|---|
| CGA by NIQ | Panel-based on-trade trends (~1,000 NL outlets surveyed) | Diageo, AB InBev, De Kuyper | Venue-specific brand presence. No menu parsing. |
| IWSR | Market sizing, volume forecasts | Enterprise spirits companies | No operational venue targeting. No menu data. |
| SharpGrid | Outlet census (which venues exist) | Heineken, Coca-Cola, Asahi | What's actually on the menu. No brand-level data. |
| GroundSignal | US on-premise social + menu intelligence | US brands | Zero European coverage. |
| Overproof | US menu intelligence (10M+ menus) | US brands | Zero European coverage. |
| TAPP.cafe | NL POS data for horeca | Formerly Heineken, Coca-Cola | Went bankrupt in October 2024. Budget unallocated. |
| Keggly Insights | NL venue-level menu parsing (7,200+ venues) | Building client base | The only venue-level menu intelligence in Europe. |

This is a greenfield market. We are not displacing existing tools. We are filling a gap that nobody else covers.

Not a replacement. A missing layer

CGA, IWSR, and Nielsen answer macro questions. We answer operational ones. Both are needed.

What your existing tools tell you
~ Quarterly surveys across ~1,000 NL outlets
~ Market-level aggregates and trend lines
~ "Cocktails grew 4pp in NL on-trade"
~ Category-level volume and share estimates
~ 34 countries covered
What Keggly adds
+ Continuous parsing across 7,200+ NL venues
+ Venue-level specifics: name, address, menu, prices
+ "These 50 bars added cocktails but don't carry your brand"
+ Cocktail ingredient slots: which brands fill which slots
+ NL deep, expanding to Benelux + DACH
CGA tells you the weather. We give you the terrain map.

Others have validated this model

Menu intelligence is proven in the US. Europe is the gap.

Concept validation

A distillery family built this for the US

An 11th-generation spirits family founded a menu intelligence platform covering 10M+ US menus and 1M+ US venues. Zero European coverage. One of their EU brand managers confirmed they're "looking for a tool" for European markets.

Market timing

The only NL horeca data provider went bankrupt

TAPP.cafe, which provided POS data for major NL beverage companies, went bankrupt in October 2024. Those budgets are now unallocated. The gap is open and growing.

Technology enabler

Visual AI made this possible in 2024

Menu parsing at scale was not practical before GPT-4V and Claude Vision. We read HTML, PDFs, photos, and chalkboard menus with accuracy that was impossible two years ago. This is a new capability, not a better version of an old one.

What we do not claim

Transparency about boundaries is part of the methodology.

×

No back-of-house inventory

We track what's on the menu, not what's in the storage room. Menu presence and pour volume are different things.

×

No pour volume claims

We do not estimate how much of a product is sold. Menu intelligence is about presence and positioning, not volume.

×

No inference without evidence

If a brand name does not appear in menu text, it is not counted. We do not guess what might be poured.

×

Public menu data only

Coverage is limited to publicly accessible menus: websites, PDFs, and venue photos. Venues without online menus are not included.

This makes menu intelligence a missing execution layer, not a competing panel product.

Data integrity principles

These apply to every report and dataset we deliver.

No speculative inference

Extraction follows defined rules. If the rule doesn't apply, the field stays empty. No hidden models filling gaps.

Confirmed vs inferred vs open

Every data point is classified. Brand explicitly named = confirmed. Spirit type from cocktail recipe = inferred. Ingredient slot without brand = open.

No hidden estimation

Venue counts are exact parsed counts, not projections. Penetration rates are calculated against the parsed universe, not against total-market estimates.
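Computing a penetration rate against the parsed universe is a simple ratio: venues with a confirmed mention divided by venues parsed. The Python sketch below assumes a hypothetical `BrandMention` record with a `status` field; the same data also supports a gap query of the "these bars added cocktails but don't carry your brand" kind. A gap here means the brand is absent from the menu, not necessarily from the bar.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class BrandMention:
    venue_id: str
    brand: str
    status: str  # hypothetical labels: "confirmed" or "inferred"


def penetration(parsed_venues: set[str], mentions: list[BrandMention], brand: str) -> float:
    """Share of parsed venues whose menu explicitly names the brand."""
    if not parsed_venues:
        return 0.0
    confirmed = {
        m.venue_id
        for m in mentions
        if m.brand == brand and m.status == "confirmed" and m.venue_id in parsed_venues
    }
    return len(confirmed) / len(parsed_venues)


def gap_venues(candidates: set[str], mentions: list[BrandMention], brand: str) -> set[str]:
    """Candidate venues (e.g. those listing cocktails) with no confirmed mention of the brand."""
    carrying = {m.venue_id for m in mentions if m.brand == brand and m.status == "confirmed"}
    return candidates - carrying
```

Only confirmed mentions count, and the denominator is the set of parsed venues, so the figure never extrapolates to venues that were not read.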

Methodology available on request

Parsing logic, extraction prompts, validation rules, and coverage methodology are documented and available to enterprise clients.

Three layers of market intelligence

Shipment data shows movement. Panel data shows trends. Menu intelligence shows execution. Each layer answers a different question.

Shipment data

What moved through the warehouse. Source: ERP, distributor reports.

Panel data

Category trends and consumer behavior. Source: CGA, Nielsen, IWSR.

Menu intelligence

What's actually on the menu, venue by venue. Brands, prices, ingredients, competitive context. Source: Keggly Insights.

Methodology questions

How do you handle venues without online menus?

We don't include them. If a venue has no website, no PDF menu, and no readable menu photos on Google Maps, it is excluded from our dataset. We do not estimate or guess. Coverage is limited to venues with publicly accessible menu data.

How accurate is the brand extraction?

When a brand name appears explicitly on a menu, extraction accuracy is above 95%. Spelling variants are normalized to canonical entities. If a brand is not explicitly named, it is not counted; we do not infer presence from context alone.

How often is data refreshed?

Netherlands venues are re-parsed on a rolling basis. Most data is less than 8 weeks old. Every record carries a parse timestamp so you can see exactly when each venue was last checked. Priority venues (high-traffic, premium) are refreshed more frequently.
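A rolling refresh can be driven by the parse timestamp alone. In this hypothetical Python sketch, each venue has a tier with its own re-parse window; the 8-week standard window follows the figure stated above, while the 4-week priority window and the `VenueState` type are assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical re-parse windows per tier; the 4-week priority value is an assumption.
REFRESH_WINDOW = {
    "priority": timedelta(weeks=4),
    "standard": timedelta(weeks=8),
}


@dataclass(frozen=True)
class VenueState:
    venue_id: str
    tier: str            # "priority" or "standard"
    parsed_at: datetime  # timestamp of the venue's latest parse


def due_for_reparse(venues: list[VenueState], now: datetime) -> list[str]:
    """IDs of venues whose latest parse is older than their tier's window, oldest first."""
    due = [v for v in venues if now - v.parsed_at >= REFRESH_WINDOW[v.tier]]
    due.sort(key=lambda v: v.parsed_at)
    return [v.venue_id for v in due]
```

Sorting oldest first means the stalest records are re-parsed before anything else when capacity is limited.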

How does this compare to CGA or Nielsen?

CGA tracks market trends via survey panels (~1,000 NL outlets). We parse actual menus across 7,200+ venues. CGA answers "how is the category trending?" We answer "which specific venues carry your brand and which don't?" Both are useful; they answer different questions.

Can you prove the source of a data point?

Yes. Every record links to the source URL and parse timestamp. For enterprise clients, we provide full methodology documentation including extraction prompts, validation rules, and coverage criteria.

What about menus in Dutch or other languages?

Our extraction pipeline handles Dutch, English, French, German, and mixed-language menus. Brand names are language-independent. Category and product descriptions are normalized regardless of source language.

See the data for yourself

Name your brand and market. We'll show you exactly what we find, with sources.