Report · UGC-132 · D1

Brand-Mention Detection & Paid-Partnership Signals

Detecting brand mentions across structured tags, hashtags, and free text — and weighing them against the platform’s disclosure flag — over 99,920 Instagram posts.

DATASET D1 / full · ROWS 99,920 · DB ugc@:5433 · GENERATED 2026-06-23
Reading note. paid_partnership is the platform's raw disclosure flag, not ground truth — a false value is known to hide undisclosed ads. This report measures detected brand mentions and compares them against that flag; it never treats the flag as the answer.

§01 · Findings & issues — example browser

Recognizable posts behind every category: the four partnership types and the Confirmed tier from the taxonomy doc, plus three things we discovered. Click an account to open it; the → @author shows who Instagram says posted. Marked tabs are problems we surfaced, not clean findings.

OUR FINDINGS FROM DOC

Platform paid_partnership flag = true. The only certain signal: 466 posts. Recognizable disclosed ads:

Post → author | Brand / cue | Caption
reel/DV8uClCDCxf → @goodmorningamerica 4.3M | @burgerking | “Our sponsor @burgerking asked for feedback — and you delivered…”
reel/DW6ruPbAP7Q → @instylemagazine 4.4M | @macys | “#ad Soft, romantic layers for spring, styled head to toe from @macys 💐”
reel/DLXolpeAe6W → @rappingchef 2.4M | @cawalnuts | “California walnuts 😂 @cawalnuts #CAWalnutsPartner #Sponsored”
reel/DMFTpwXqC4x → @brettlee_58 2.1M | Aussie Avocados | “Fuel up with Aussie avocados 🥑 #AustralianAvocados”

Type: Sponsored. Platform flag, or #ad/#sponsored/“sponsored by”. 700 posts.

Post → author | Brand | Caption
reel/DWwIt4MEUyG → @markwahlberg 30M | @hallowapp | “HAPPY EASTER🙏 @hallowapp 🙏 #ad”
reel/DRR0Ca6DHCy → @ladbible 15.8M | (none) | “So pet cloning is actually a thing now? 😳🐾 #ad”
reel/DQZVGnaifU6 → @todayshow 5.5M | Winter Olympics | “The 2026 Winter Olympics are 100 days away… #ad”
reel/DSI1eeeklH3 → @rocky_barnes 3.3M | (none) | “All dressed up for the holidays ✨ 🎁 #ad”

Type: Affiliate. Promo/discount code or affiliate-platform link. 656 posts (after removing the noisy % off pattern; see §09).

Post → author | Code / cue | Caption
reel/DUWAS7SjpFl → @tweets 4.2M | SUPERBOWL10 | “Click the link in bio to claim $10 with code SUPERBOWL10…”
reel/DWoysvSDnV2 → @annabel.lucinda 3.4M | @gymshark | “GLUTE DAY… wearing the new @gymshark ‘interval’ collection”
p/DYkIPPXkZ0B → @kellylmatthews 1.7M | KELLY | “EUPHORIA … **Code KELLY** #activewear”
reel/DQmNKDGgGmP → @styled.by.josepha 878K | JOSEPHA35 · @vici | “🔥35% off with my code: JOSEPHA35 🔗Comment VICI…”

Type: Collaboration. A brand co-author on the post (appears on both grids). 1,921 posts. The co-author handle is shown.

Post → author | Brand co-author | Caption
p/DMDSVYfI3Y9 → @victoriabeckham 32.9M | @victoriabeckhambeauty | “Late check-out. Instant obsession…”
reel/DXVCvtLjMXJ → @premierleague 79.5M | @astonvilla | “Unai Emery celebrations never disappoint.”
reel/DNVhbzIg8-B → @ralphlauren 16.7M | @poloralphlauren | “The Polo Bear Chronicles: Operation Black Tie…”

Type: Gifted. Gifting language or #gifted, the weakest/noisiest type. 300 posts.

Post → author | Brand / cue | Caption
reel/DMjXnEOvKuN → @bybinalshah 479K | NuFACE | “gifted by NuFACE ✨ 3 hours of sleep, but make it look like 8…”
reel/DQIJHV7Efst → @aarondinin 809K | (box of product) | “What would you have done in a failure challenge featuring a giant box…”
reel/DXzemYAOZjO → @bandana.eats1 417K | Red Lobster | “RED LOBSTER ENDLESS SHRIMP!!! #foodie #redlobster”

Discovered: undisclosed organic mentions. Creator @-mentions a hiker-verified brand, no paid flag, no brand co-author (so genuinely the creator’s own post, distinct from the author-inversion cases). The triage pool for hidden ads.

Post → author | Brand | Caption
reel/DS2_kLdkrG5 → @renatarrii 1.4M | @iamgia | “training in @iamgia 🥊”
reel/DYsDhjSAoSp → @malloryervin 1.1M | @ultabeauty | “Testing my more is more mentality 🤪✨ I am a makeup lover…”
reel/DRzrlOXjA9I → @phenixsoul 782K | @yslbeauty | “60 second red carpet glam. Products used (in order)…”
reel/DUGcZTFEkAx → @heatherdubrow 1.8M | @popupbagels | “Bagels are always better in NY … @popupbagels”

Discovered: hashtag matching ≠ mention. A #brand hashtag conflates “sponsored by” with “about this brand.” These matched our dictionary by a hashtag but are not brand mentions: they are news topic-tags or brands tagging themselves. This is why the raw “hidden-in-hashtag” count (5,406) collapses to ~420 once cleaned.

Post → author (account type) | Hashtag | Why it’s noise
p/DYuDMtUiEll → @dainikbhaskar_ (news) | #Amazon | Hindi news story on tech layoffs: topic tag, no Amazon relationship.
p/DPboq_pk5xI → @9gag (media) | #lego | Meme account; #lego is the subject, not a sponsor.
p/DN22AxOYqHC → @bmw (brand) | #bmw | BMW tagging itself: first-party, not an influencer ad.
reel/DLSWrb3Iljl → @tommyhilfiger (brand) | #TommyHilfiger | Brand’s own post.

Discovered: primary-author inversion on collab posts. For brand×creator collaborations, the dataset’s influencer_name can record a co-author creator as the author when Instagram’s real owner is the brand. Verified against hiker get_v2_media_info_by_code (true owner in media.user). These look like “undisclosed creator mentions” but are brand-published collab ads.

Post | DB says author | Hiker: true author
reel/DWEmrEPhWdQ | @atsukocomedy (creator) | @ugg (brand); atsukocomedy = co-author
reel/DWeNYlyjJNg | @zeyatilgan (creator) | @lcwaikiki (brand); caption has “işbirliği” = collab

Impact: 217 of the 2,327 “verified-brand undisclosed” posts are actually brand-coauthor collabs. Recoverable per-post via hiker; logged as a known data issue, not yet corrected in bulk.
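
The per-post recovery described above could be sketched as follows. This is an illustration, not the project's code: the source only states that the true owner sits in media.user of the hiker response, so the "username" key, the dict shape, and the helper name is_author_inverted are all assumptions.

```python
def is_author_inverted(db_author: str, hiker_media_info: dict) -> bool:
    """Return True when the dataset's author differs from Instagram's owner."""
    # Assumed response shape: {"media": {"user": {"username": ...}}}
    owner = hiker_media_info.get("media", {}).get("user", {}).get("username")
    if not owner:
        return False  # no owner in the response: cannot judge, so do not flag

    def normalize(handle: str) -> str:
        return handle.strip().lstrip("@").lower()

    return normalize(owner) != normalize(db_author)


# Hand-built stand-in for a hiker response, mirroring the @ugg example above.
sample = {"media": {"user": {"username": "ugg"}}}
print(is_author_inverted("@atsukocomedy", sample))  # True
print(is_author_inverted("@ugg", sample))           # False
```

Returning False when the owner is missing keeps the check conservative: a post is only flagged when the API positively names a different owner.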

Snippets truncated/whitespace-collapsed; full captions in posts.caption. Links built from post_links view (verified: reel/DX6wxRhRv_3 = @mattestlea).

§02 · Headline numbers

12,891
posts mention a known brand (12.9%)
how: count of posts in mv_brand_mentions where has_brand_mention = true (any dictionary handle found in mention_tags, hashtags, or caption text). 12,891 / 99,920 = 12.9%.
5,406 → 420
brand only via hashtag (raw → cleaned)
how: posts where brand_via_hashtag AND NOT brand_via_mention_tag AND NOT brand_via_caption_text. Caveat: raw 5,406 conflates ads with topic tags (e.g. a news post tagging #Amazon) and first-party brand posts. Restricting to creator authors and dropping generic/place tokens (instagram, miami, nfl, disney…) leaves 420 plausible. See §05.
1,141
mention a sponsor brand but not disclosed paid
how: posts mentioning a handle from sponsor_handles (brands proven to pay here), paid_partnership=false, minus 12 generic platform handles (instagram, nfl…). See §07.
587
brands confirmed by hikerapi
how: handles in handle_verdict with verdict = 'brand' (hikerapi returned a commercial account category). The defensible brand set. (Was “18,485 dictionary handles” — that was a noisy raw lookup list, mostly unused; see methodology.)

A brand mention is detected in 12.9% of posts. A notable channel: brands referenced only through a hashtag (e.g. #nike), invisible to the structured mention_tags field, though raw hashtag matching is noisy (see §05: #Amazon on a news story is a topic tag, not an ad). And among posts naming a brand that provably runs paid partnerships in this data, undisclosed mentions outnumber disclosed ones roughly 7.6× after noise filtering (1,141 vs. 151; see §07). This is the candidate pool this detection project exists to surface.

§03 · How detection works

Three mention locations

  1. mention_tags — structured @-handles Instagram resolved for the post.
  2. hashtag_tags — hashtags; a brand can hide as #brand.
  3. free text: @handles in the caption / transcript that never reached the structured arrays (regex-extracted).

Transcripts yielded zero @-handles — speech-to-text doesn't emit handles — so text detection is caption-driven.
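
The three locations can be sketched as a single per-post check. This is a minimal sketch under stated assumptions: posts are plain dicts with mention_tags, hashtag_tags and caption keys, handles are stored without a leading "@", and the function name detect_brand_locations plus the hashtag_only key are hypothetical. The other flag names mirror the columns this report cites.

```python
import re

# '@' followed by the characters allowed in a handle.
HANDLE_RE = re.compile(r"@([A-Za-z0-9._]+)")


def detect_brand_locations(post: dict, brand_handles: set) -> dict:
    """Flag where a dictionary handle appears in one post."""
    mention_tags = {h.lstrip("@").lower() for h in post.get("mention_tags", [])}
    hashtags = {h.lstrip("#").lower() for h in post.get("hashtag_tags", [])}
    caption = post.get("caption") or ""
    caption_handles = {h.rstrip(".").lower() for h in HANDLE_RE.findall(caption)}
    # Only caption handles that never reached the structured field count here.
    extra_caption = caption_handles - mention_tags

    flags = {
        "brand_via_mention_tag": bool(mention_tags & brand_handles),
        "brand_via_hashtag": bool(hashtags & brand_handles),
        "brand_via_caption_text": bool(extra_caption & brand_handles),
    }
    flags["has_brand_mention"] = any(flags.values())
    flags["hashtag_only"] = (
        flags["brand_via_hashtag"]
        and not flags["brand_via_mention_tag"]
        and not flags["brand_via_caption_text"]
    )
    return flags


post = {"mention_tags": [], "hashtag_tags": ["nike", "running"],
        "caption": "Morning miles #nike #running"}
result = detect_brand_locations(post, {"nike", "gymshark"})
print(result["has_brand_mention"], result["hashtag_only"])  # True True
```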

What counts as a "brand"

Step 1 is a dictionary mined from the dataset itself — handles harvested from three fields. This is a broad, noisy candidate list (18,485 total); hikerapi verification (below) is what turns it into the trustworthy 587-brand set.

Source | handles | confidence
sponsor handles (paid) | 130 | high
brand-typed co-authors | 3,438 | medium
brand-typed accounts | 14,917 | broad
how each: sponsor = distinct lowercased handles from posts.sponsor_handles where sponsor_present. coauthor_brand = handles from coauthor_handles where coauthor_is_brand. brand_account = influencer_name of authors whose influencer_account_type_v2='brand'. De-duplicated (keep strongest source) → 18,485. Only 6,221 are ever actually mentioned in a post.
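
The "keep strongest source" de-duplication might look like the sketch below. The source labels (sponsor, coauthor_brand, brand_account) come from the note above; the ranking table, the function name build_dictionary and the input shape are assumptions made for illustration.

```python
# Lower rank = stronger source (sponsor > co-author > brand-typed account).
SOURCE_RANK = {"sponsor": 0, "coauthor_brand": 1, "brand_account": 2}


def build_dictionary(candidates):
    """Map each lowercased handle to the strongest source that produced it.

    candidates: iterable of (handle, source) pairs harvested from the dataset.
    """
    best = {}
    for handle, source in candidates:
        key = handle.strip().lstrip("@").lower()
        if key not in best or SOURCE_RANK[source] < SOURCE_RANK[best[key]]:
            best[key] = source
    return best


candidates = [
    ("Macys", "brand_account"),
    ("macys", "sponsor"),          # same handle, stronger source wins
    ("gymshark", "coauthor_brand"),
]
dictionary = build_dictionary(candidates)
print(dictionary)  # {'macys': 'sponsor', 'gymshark': 'coauthor_brand'}
```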

§04 · Hiker-verified brands (live API enrichment)

The data-derived dictionary is broad and noisy. To ground it in reality, the top 2,000 most-mentioned handles were enriched live via the hikerapi Instagram API (get_v2_user_by_username), storing the complete raw profile per handle. The brand-discriminating signal is the account category, not is_business, which is true for creators too (a creator like maddiedoodle__ is a "business" account categorised "Entrepreneur").

587
handles hiker confirms are brands
how: of the top-2,000 most-mentioned handles enriched via hikerapi, those whose account category matched a commercial pattern (Brand, Clothing, Retail, Health/beauty…) via classify_brand().
2,403
posts with a hiker-confirmed brand
how: posts in mv_brand_verified where has_verified_brand = true (mention a handle with verdict 'brand').
303
posts whose “brand” is really a person (noise removed)
how: posts where has_verified_person = true — a handle the raw dictionary called a brand but hikerapi categorised as a creator/person (e.g. carlifestyle = "Digital creator").
2,327
confirmed-brand mentions not disclosed paid
how: has_verified_brand = true AND paid_partnership = false. Disclosed counterpart = 76 (paid_partnership = true).

Enrichment outcome (2,000 handles)

Hiker verdict | handles | meaning
brand | 587 | commercial category → confirmed
person | 223 | creator/individual → false positive
org_other | 196 | team / community / label / media
uncertain | 949 | empty category → kept at dict confidence
not_found | 45 | handle no longer resolves

~49% of accounts that still resolve (949 of 1,955) return an empty category (incl. ZARA, Louis Vuitton, BMW and Kim Kardashian). Category can't classify those, so they fall back to the dictionary.
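
A verdict function in the spirit of the table above could be sketched like this. It is not the report's classify_brand(): the pattern lists are guesses built only from the category examples quoted in this section, the "category" profile key is an assumption about the hiker response, and the catch-all org_other branch is a simplification.

```python
# Assumed pattern lists, taken from the category examples quoted in this report.
COMMERCIAL_PATTERNS = ("brand", "clothing", "retail", "beauty")
PERSON_PATTERNS = ("digital creator", "entrepreneur", "fitness trainer")


def classify_verdict(profile):
    """Map one enriched profile (or None when lookup failed) to a verdict."""
    if profile is None:
        return "not_found"  # handle no longer resolves
    category = (profile.get("category") or "").strip().lower()
    if not category:
        return "uncertain"  # empty category: fall back to dictionary confidence
    if any(pattern in category for pattern in COMMERCIAL_PATTERNS):
        return "brand"
    if any(pattern in category for pattern in PERSON_PATTERNS):
        return "person"
    return "org_other"


print(classify_verdict({"category": "Retail company"}))                         # brand
print(classify_verdict({"category": "Entrepreneur", "is_business": True}))      # person
print(classify_verdict({"category": ""}))                                       # uncertain
print(classify_verdict(None))                                                   # not_found
```

Note that is_business is deliberately ignored: the second example is a business account, yet the category still marks it as a person.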

Top hiker-confirmed brands

Handle | category | posts
amazon | Retail company | 300
sheinofficial | Brand | 209
tommyhilfiger | Brand | 99
yslbeauty | Health/beauty | 93
ultabeauty | Beauty/cosmetic | 89
ugg | Clothing (Brand) | 64
sezane | Clothing (Brand) | 61
larocheposay | Health/beauty | 45
Hiker is sharper, not perfect. Category corrects clear noise (carlifestyle → "Digital creator", removed) — but some genuine brands self-select a creator category and get dropped: minecraft ("Video Game"), cerakote ("Entrepreneur"), gozwift/Zwift ("Fitness Trainer"). So treat "person" as high-precision noise removal, not a complete brand blocklist. The full raw hiker profile is stored per handle (brand_verified.raw) for richer signals later.

§05 · Where brands are mentioned

Location | posts w/ brand | share of detections (of 12,891)
Structured mention_tags | 7,477 | 58.0%
Hashtags #brand | 6,738 | 52.3%
Caption free text (extra, not in tags) | 11 | 0.1%
Transcript free text | 0 | 0.0%

Locations overlap (a post can mention a brand in several). 5,406 posts are caught only by hashtag — but see the caveat below. Caption free-text adds almost nothing (11) — and that is correct, not a bug: the JSONL is already structured, so Instagram has resolved real caption @-mentions into mention_tags. Of 23,848 posts with an @ in the caption, 23,099 are already in mention_tags; only 749 are “extra,” and those are mostly not real mentions — email addresses (name@gmail.com) and truncated URLs (@amazon.com) caught by the @ regex. Just 11 coincidentally matched a brand handle. The structured field has effectively pre-absorbed this channel.
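
The email and URL noise described above is easy to reproduce, and a stricter extractor removes most of it. The sketch below is an assumption about how such a filter could be written, not the regex this project used; the domain-suffix list is illustrative and would wrongly drop a real handle that happens to end in one of those suffixes.

```python
import re

# Naive pattern: any '@' followed by handle characters.
NAIVE = re.compile(r"@([A-Za-z0-9._]+)")
# Stricter: the '@' must not follow a handle character (rules out name@gmail.com).
STRICT = re.compile(r"(?<![A-Za-z0-9._])@([A-Za-z0-9._]+)")
# Illustrative suffix list for URL fragments such as @amazon.com.
DOMAIN_SUFFIX = re.compile(r"\.(com|net|org|io|co)$", re.IGNORECASE)


def caption_handles(caption: str) -> list:
    """Extract plausible @-handles, skipping emails and URL fragments."""
    handles = []
    for raw in STRICT.findall(caption):
        handle = raw.rstrip(".")  # drop sentence-ending periods
        if handle and not DOMAIN_SUFFIX.search(handle):
            handles.append(handle.lower())
    return handles


caption = "Email name@gmail.com or shop @amazon.com. Wearing @gymshark."
print(NAIVE.findall(caption))    # ['gmail.com', 'amazon.com.', 'gymshark.']
print(caption_handles(caption))  # ['gymshark']
```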

how: per-location counts are count(*) FILTER (WHERE brand_via_mention_tag / _hashtag / _caption_text / _transcript_text) over mv_brand_mentions. A handle counts here if it's in the 18,485-dictionary; transcript = 0 because speech-to-text emits no @-handles.
Hashtag matching is a string match, not an intent judgment. A #brand hashtag conflates “sponsored by this brand” with “this post is about this brand.” Validated example: p/DYuDMtUiEll (@dainikbhaskar_, a news outlet) carries #Amazon in a story about tech layoffs — there is no Amazon mention or relationship in the post; it’s a topic tag. The raw 5,406 includes such topic tags, first-party brand posts (2,371 authored by brand accounts), and generic tokens (#instagram 1,416, #miami, #nfl, #disney, #starwars). Restricting to creator authors and removing generic/place/platform tokens leaves 420 plausible cases — and even some of those (#imax, #hornets=insects) are topical. Treat hashtag-only as a weak, high-recall/low-precision signal, not a confirmed mention.
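
The cleaning step (creator authors only, generic tokens dropped) might be sketched as below. The token set lists only the examples named in this report, and the function name is hypothetical. The account-type field name follows the influencer_account_type_v2 column cited elsewhere in the report. The sketch assumes its input is already limited to hashtag-only posts.

```python
# Generic / place / platform tokens named in this report (list is not exhaustive).
GENERIC_TOKENS = {"instagram", "miami", "nfl", "disney", "starwars"}


def plausible_hashtag_only(post: dict, brand_handles: set) -> bool:
    """Keep a hashtag-only post if a creator wrote it and the tag is not generic."""
    if post.get("influencer_account_type_v2") != "creator":
        return False  # first-party brand posts, media outlets, etc.
    tags = {t.lstrip("#").lower() for t in post.get("hashtag_tags", [])}
    return bool((tags & brand_handles) - GENERIC_TOKENS)


posts = [
    {"influencer_account_type_v2": "brand", "hashtag_tags": ["bmw"]},
    {"influencer_account_type_v2": "creator", "hashtag_tags": ["instagram"]},
    {"influencer_account_type_v2": "creator", "hashtag_tags": ["nike"]},
]
brands = {"bmw", "instagram", "nike"}
kept = [plausible_hashtag_only(p, brands) for p in posts]
print(kept)  # [False, False, True]
```

A filter like this cannot catch topical tags from creators (the #imax and #hornets cases), which is why the surviving posts are still only "plausible".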

§06 · Brand mention vs. disclosure flag

 | brand detected | not detected
paid = true | 218 | 248
paid = false | 12,673 | 86,781

Cross-tab of all 99,920 posts.

how: GROUP BY paid_partnership, has_brand_mention over mv_brand_mentions. “detected” = dictionary brand mention (18,485 set); rows: 218 / 248 / 12,673 / 86,781 sum to 99,920.

Two readings

Recall on disclosed ads = 46.8%. We detect a brand handle in only 218 of 466 disclosed-paid posts — many paid posts name the sponsor in prose or rely on the platform partnership banner rather than an @-handle. A handle/hashtag detector alone under-covers.

12,673 brand mentions are undisclosed. Most are organic (a creator tagging a brand they like), but this is precisely where hidden ads live — the pool to triage.
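
Both readings follow directly from the four cross-tab cells. The short sketch below recomputes them from the numbers printed in this section; the variable names are illustrative only.

```python
# Cross-tab cells keyed by (paid_partnership, brand_detected).
crosstab = {
    (True, True): 218,
    (True, False): 248,
    (False, True): 12_673,
    (False, False): 86_781,
}

total = sum(crosstab.values())
disclosed = crosstab[(True, True)] + crosstab[(True, False)]
detected = crosstab[(True, True)] + crosstab[(False, True)]

# Reading 1: share of disclosed-paid posts where a brand handle was detected.
recall_on_disclosed = crosstab[(True, True)] / disclosed
# Reading 2: share of detected brand mentions that carry no paid flag.
undisclosed_share = crosstab[(False, True)] / detected

print(total, disclosed, detected)       # 99920 466 12891
print(f"{recall_on_disclosed:.1%}")     # 46.8%
print(f"{undisclosed_share:.1%}")       # 98.3%
```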

§07 · High-confidence undisclosed candidates

Restricting to posts that mention a known sponsor brand (brands proven to pay creators in this dataset) and excluding 12 generic platform/place handles (instagram, nfl, amazon, disney, …) that polluted the sponsor list:

1,141
sponsor-brand mention, not disclosed
how: distinct posts mentioning a sponsor-tier handle, paid_partnership=false, excluding 12 generic handles (instagram/nfl/amazon/disney/youtube/tiktok/spotify/google/facebook/threads/whatsapp/miami).
151
sponsor-brand mention, disclosed paid
how: same sponsor-handle set, paid_partnership=true.
~7.6×
undisclosed : disclosed ratio
how: 1,141 ÷ 151 = 7.56. Read: for brands proven to pay here, undisclosed mentions outnumber disclosed ~7.6×.
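
The split behind these three numbers can be sketched as a filter plus a count. The generic-handle set is the list of 12 quoted above. The function name and the mentioned_handles field are hypothetical, and the toy posts are made up purely to exercise the logic.

```python
# The 12 generic platform/place handles excluded from the sponsor tier.
GENERIC_HANDLES = {
    "instagram", "nfl", "amazon", "disney", "youtube", "tiktok",
    "spotify", "google", "facebook", "threads", "whatsapp", "miami",
}


def sponsor_split(posts, sponsor_handles):
    """Count (undisclosed, disclosed) posts that mention a usable sponsor handle."""
    usable = {h.lower() for h in sponsor_handles} - GENERIC_HANDLES
    undisclosed = disclosed = 0
    for post in posts:
        mentioned = {h.lower() for h in post["mentioned_handles"]}
        if not (mentioned & usable):
            continue
        if post["paid_partnership"]:
            disclosed += 1
        else:
            undisclosed += 1
    return undisclosed, disclosed


toy_posts = [
    {"mentioned_handles": ["lego"], "paid_partnership": False},
    {"mentioned_handles": ["macys"], "paid_partnership": True},
    {"mentioned_handles": ["instagram"], "paid_partnership": False},  # generic, ignored
]
print(sponsor_split(toy_posts, {"lego", "macys", "instagram"}))  # (1, 1)

# The ratio from the counts reported in this section.
print(round(1141 / 151, 2))  # 7.56
```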

Top sponsor brands among undisclosed posts

Brand handle | undisclosed posts
lego | 311
sheinofficial | 182
yslbeauty | 92
etsy | 76
adidasoriginals | 70
ebay | 52
peppermayo | 39
garageclothing | 33
newlook | 31
macys | 28
googlepixel | 25
bravotv | 24

§08 · Breakdowns

By post type

Type | posts | branded | %
post | 47,745 | 7,059 | 14.78
reel | 41,773 | 5,205 | 12.46
story | 10,402 | 627 | 6.03
how: GROUP BY type over mv_brand_mentions; branded = has_brand_mention. % = branded ÷ posts.

By author account type

Account type | posts | branded | %
brand | 15,655 | 4,356 | 27.82
creator | 33,967 | 4,444 | 13.08
unknown | 2,651 | 348 | 13.13
media | 6,925 | 785 | 11.34
pro_service | 13,637 | 1,429 | 10.48
venue | 2,318 | 138 | 5.95
(empty) | 23,849 | 1,344 | 5.64
how: GROUP BY influencer_account_type_v2 over mv_brand_mentions. Caveat: brand & pro_service authors are first-party accounts — high brand-mention % is expected (brands mention brands) and is not an ad signal. Ad detection should focus on creator (0.98% paid-flag rate, 3–7× the others).

§09 · Partnership taxonomy & doc validation (audited the spec)

We tagged every post with a two-filter taxonomy — a confidence tier (one per post) and multi-label partnership types — then audited it against the Competitor Insights — Paid Partnership Taxonomy Notion doc (built on this same D1 dataset). The doc is a hypothesis we double-checked, not a spec we followed.

Filter 1 — confidence tier (one per post)

Tier | Our trigger | posts | doc
Confirmed | platform flag | 466 | 466 ✓
Likely | #ad/#sponsored, “sponsored by/paid partnership”, brand co-author | 2,095 | 2,151
Possible | promo-code / affiliate-host / gifting (soft, low-conf) | 836 | 668
None | no commercial signal | 96,523 | 96,635

Tiers are leak-free: Confirmed = exactly the 466 flagged; Likely/Possible/None contain 0 flagged posts.

how: pp_tier column in mv_partnership (CASE over signal booleans, highest tier wins). “doc” = numbers published in the Notion taxonomy doc, shown for comparison — we reproduce the hard signals exactly but our soft-signal tiers differ (see audit note below).
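
The "highest tier wins" rule can be pictured as an ordered chain of checks, which is the Python analogue of a SQL CASE expression. Only paid_partnership is a name from the dataset; every other signal key below is a hypothetical stand-in for the booleans the view uses.

```python
def pp_tier(signals: dict) -> str:
    """Assign one confidence tier per post; the first matching branch wins."""
    if signals.get("paid_partnership"):
        return "Confirmed"
    # Hard text/structure signals (hypothetical key names).
    if (signals.get("has_ad_hashtag")
            or signals.get("has_sponsored_phrase")
            or signals.get("has_brand_coauthor")):
        return "Likely"
    # Soft text signals never rise above "Possible".
    if (signals.get("has_promo_code")
            or signals.get("has_affiliate_host")
            or signals.get("has_gifting")):
        return "Possible"
    return "None"


print(pp_tier({"paid_partnership": True, "has_promo_code": True}))  # Confirmed
print(pp_tier({"has_brand_coauthor": True, "has_gifting": True}))   # Likely
print(pp_tier({"has_promo_code": True}))                            # Possible
print(pp_tier({}))                                                  # None
```

Because the flag is checked first, a flagged post can never land in a lower tier, which is what makes the tiers leak-free.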

Filter 2 — partnership type (multi-label)

Type | Our trigger | posts | doc
collaboration | brand co-author | 1,921 | 1,921 ✓
affiliate | promo code / affiliate host | 656 | 487
sponsored | flag / #ad / #sponsored / “sponsored by” | 700 | 766
gifted | gifting language / #gifted | 300 | 291

Types are multi-label and only apply to commercial posts, so the counts don’t sum to the number of commercial posts.

how: unnest(partnership_types) from mv_partnership, grouped. A post can carry several types. collaboration (=brand co-author) and the hard sponsored signals reproduce the doc; affiliate/gifted are soft-text regex and diverge.
Audit verdict — the doc is right where it’s structured, soft where it’s textual. Every hard signal reproduced exactly: #ad=338, #sponsored=32, #ambassador=8, brand co-author=1,921, any @mention=43,416, Confirmed=466. The earlier gap on Possible / Affiliate traced to a single over-broad sub-pattern: a bare [0-9]% off regex that fired on any retail markdown — spa pricing, product listings, first-party sales — not creator affiliate offers (472 such posts). Removing it brings Affiliate 1,130 → 656 and Possible 1,263 → 836, both now near the doc’s 487 / 668. The residual gap is the opposite error in the doc: it missed real bare-code XXXX affiliate posts (“use code jess15”, “code ali20 to save”). Net lesson: soft-text volumes swing on regex wording — keep them at the lowest tier only (we never let them reach “Likely”).
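
The difference between the over-broad markdown pattern and a bare-code pattern can be shown on a few captions. The first regex is the sub-pattern named above. The second, along with the digit-or-uppercase heuristic, is an assumed illustration of a bare-code matcher, not the project's actual regex.

```python
import re

# The over-broad sub-pattern from the audit: fires on any retail markdown.
PERCENT_OFF = re.compile(r"[0-9]% off", re.IGNORECASE)
# Assumed bare-code matcher: the word "code", optional colon, then a token.
CODE_RE = re.compile(r"\bcode:?\s+([A-Za-z0-9]{3,20})\b", re.IGNORECASE)


def find_promo_codes(caption: str) -> list:
    """Return tokens after 'code' that look like promo codes."""
    codes = []
    for token in CODE_RE.findall(caption):
        # Heuristic: a real code has a digit (jess15) or is all caps (KELLY).
        if any(ch.isdigit() for ch in token) or token.isupper():
            codes.append(token)
    return codes


spa = "Spa day: 20% off all treatments this week"
print(bool(PERCENT_OFF.search(spa)), find_promo_codes(spa))  # True []
print(find_promo_codes("use code jess15"))                   # ['jess15']
print(find_promo_codes("35% off with my code: JOSEPHA35"))   # ['JOSEPHA35']
print(find_promo_codes("promo code for you"))                # []
```

The spa caption trips the percent-off pattern but yields no code, which is the class of false positive the audit removed.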

Cross-check: tiers × hiker-verified brand mentions

Tier | posts | w/ verified brand
Confirmed | 466 | 76
Likely | 2,095 | 267
Possible | 836 | 70
None | 96,523 | 1,990

Brand mentions concentrate in commercial tiers — but 1,990 “None” posts still mention a hiker-verified brand, reinforcing that brand-mention detection surfaces commercial content the disclosure-driven tiers miss.

how: mv_partnership JOIN mv_brand_verified on post id, GROUP BY pp_tier, counting has_verified_brand.

§10 · Limitations & next steps