

@iamarsenibragimov
Created April 7, 2026 19:36
CrustData: Missing profiles investigation — null employer fields, DB Search gaps, RT Search 400s

CrustData: Missing Profiles Investigation

Date of search run: 2026-04-07
Search: VP Engineering / CTO / Head of Engineering, Germany region, current-or-past employee at 71 European tech companies
API version used: 1.6.27

Files in this Gist

| File | Description |
|---|---|
| 01_production_request.json | Exact request body sent to /screener/persondb/search |
| 02_response_and_gap.md | Response stats + comparison with LinkedIn results |
| 03_enrich_evidence.md | Stored enrich response showing null employer fields (v1.6.27) |
| 04_rt_search_400s.md | RT Search returning HTTP 400 on all pages for the same companies |

Summary

Our production DB Search returned 471 candidates from 71 companies. LinkedIn scraping of the same companies returned ~15% more candidates. For Delivery Hero specifically:

  • LinkedIn found: 56 candidates
  • CrustData DB Search found: 46 candidates
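  • Missing: 14 LinkedIn profiles that DB Search did not return (listed in 02_response_and_gap.md); the net difference is only 10 because DB Search also returned some profiles that LinkedIn did not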

Root cause hypothesis: Profiles with null values in employer_name / employee_title fields are:

  1. Not matchable by current_employers.title [.] "..." filters → invisible to DB Search
  2. Rejected by strict_title_and_company_match in RT Search → returns HTTP 400

The specific enrich call (request_id 473bda6e, v1.6.27, 18,502 bytes) stored employer objects with null name/title. Your re-pull (request_id cef6c08b, v1.6.29, 18,650 bytes) reportedly returned the "full profile". The 148-byte difference is consistent with a few fields changing from null to non-null strings, and would explain why the profiles appear complete after v1.6.29.

Question for CrustData: Can you confirm that current_employers[0].employer_name and current_employers[0].employee_title are non-null in your v1.6.29 response for request cef6c08b? If so, that would indicate a parser regression in v1.6.27 that silently corrupted structured data for an unknown number of profiles.

{
"endpoint": "POST https://api.crustdata.com/screener/persondb/search",
"timestamp": "2026-04-07T12:29:25Z",
"request": {
"filters": {
"op": "and",
"conditions": [
{
"op": "or",
"conditions": [
{
"column": "all_employers.company_linkedin_profile_url",
"type": "in",
"value": [
"https://www.linkedin.com/company/spotify",
"https://www.linkedin.com/company/delivery-hero-se",
"https://www.linkedin.com/company/glovo-app",
"https://www.linkedin.com/company/wolt-oy",
"https://www.linkedin.com/company/getyourguide-ag",
"https://www.linkedin.com/company/blablacar",
"https://www.linkedin.com/company/flixbus",
"https://www.linkedin.com/company/booking.com",
"https://www.linkedin.com/company/trivagonv",
"https://www.linkedin.com/company/skyscanner",
"https://www.linkedin.com/company/wiseaccount",
"https://www.linkedin.com/company/monzo-bank",
"https://www.linkedin.com/company/trade-republic",
"https://www.linkedin.com/company/picnictechnologies",
"https://www.linkedin.com/company/vinted",
"https://www.linkedin.com/company/depop",
"https://www.linkedin.com/company/mytheresa-com",
"https://www.linkedin.com/company/boozt-fashion",
"https://www.linkedin.com/company/asos-com",
"https://www.linkedin.com/company/allegro-pl",
"https://www.linkedin.com/company/farfetch.com",
"https://www.linkedin.com/company/h&m",
"https://www.linkedin.com/company/gymshark",
"https://www.linkedin.com/company/klarna",
"https://www.linkedin.com/company/otb-",
"https://www.linkedin.com/company/revolut",
"https://www.linkedin.com/company/n26",
"https://www.linkedin.com/company/sumup",
"https://www.linkedin.com/company/bolt-eu",
"https://www.linkedin.com/company/about-you",
"https://www.linkedin.com/company/adyen",
"https://www.linkedin.com/company/deezer",
"https://www.linkedin.com/company/just-eat-takeaway-com",
"https://www.linkedin.com/company/trustpilot",
"https://www.linkedin.com/company/deliveroo",
"https://www.linkedin.com/company/hellofresh",
"https://www.linkedin.com/company/ocadogroup",
"https://www.linkedin.com/company/gousto",
"https://www.linkedin.com/company/goflink",
"https://www.linkedin.com/company/careersoda",
"https://www.linkedin.com/company/rohlikgroup",
"https://www.linkedin.com/company/vestiaireco",
"https://www.linkedin.com/company/wallapop",
"https://www.linkedin.com/company/back-market",
"https://www.linkedin.com/company/rebuy-recommerce-gmbh",
"https://www.linkedin.com/company/sellpy",
"https://www.linkedin.com/company/rebelle",
"https://www.linkedin.com/company/swappie",
"https://www.linkedin.com/company/zalando"
]
},
{
"column": "all_employers.company_website_domain",
"type": "in",
"value": [
"spotify.com", "deliveryhero.com", "glovoapp.com", "wolt.com",
"getyourguide.com", "blablacar.com", "flixbus.com", "booking.com",
"trivago.com", "skyscanner.com", "wise.com", "monzo.com",
"traderepublic.com", "picnic.app", "vinted.com", "depop.com",
"mytheresa.com", "boozt.com", "asos.com", "allegro.eu",
"farfetch.com", "hm.com", "gymshark.com", "klarna.com",
"otb.net", "revolut.com", "n26.com", "sumup.com",
"bolt.eu", "aboutyou.de", "adyen.com", "deezer.com",
"justeat.com", "trustpilot.com", "takeaway.com",
"deliveroo.co.uk", "hellofresh.com", "ocadogroup.com",
"gousto.co.uk", "goflink.com", "oda.com", "rohlik.cz",
"vestiairecollective.com", "wallapop.com", "backmarket.com",
"rebuy.de", "sellpy.se", "rebelle.com", "swappie.com", "zalando.de"
]
}
]
},
{
"op": "or",
"conditions": [
{"column": "current_employers.title", "type": "[.]", "value": "Chief Technology Officer"},
{"column": "current_employers.title", "type": "[.]", "value": "CTO"},
{"column": "current_employers.title", "type": "[.]", "value": "Chief Technical Officer"},
{"column": "current_employers.title", "type": "[.]", "value": "VP Engineering"},
{"column": "current_employers.title", "type": "[.]", "value": "Vice President of Engineering"},
{"column": "current_employers.title", "type": "[.]", "value": "VP of Engineering"},
{"column": "current_employers.title", "type": "[.]", "value": "Head of Engineering"},
{"column": "current_employers.title", "type": "[.]", "value": "SVP Engineering"},
{"column": "current_employers.title", "type": "[.]", "value": "Senior Vice President of Engineering"},
{"column": "current_employers.title", "type": "[.]", "value": "Chief Product and Technology Officer"},
{"column": "current_employers.title", "type": "[.]", "value": "CPTO"},
{"column": "current_employers.title", "type": "[.]", "value": "VP Eng"},
{"column": "current_employers.title", "type": "[.]", "value": "SVP Eng"},
{"column": "current_employers.title", "type": "[.]", "value": "Head of Eng"}
]
},
{
"column": "region",
"type": "(.)",
"value": "Germany"
}
]
},
"limit": 1000
},
"notes": [
"Batch 1 of 2: 50 companies, including Delivery Hero (deliveryhero.com + delivery-hero-se LinkedIn ID)",
"This is a CURRENT_OR_PAST company scope search (all_employers includes both current and past positions)",
"Title filter uses CURRENT employers only (current_employers.title)",
"api_version in response header: 1.6.27"
]
}
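The filter tree above is regular enough to regenerate in code, which may help when reproducing the run on your side. This is a minimal Python sketch, assuming only the column names and operators that appear in this request; `build_filters` and `TITLES` are our own hypothetical names, not part of any CrustData SDK, and the sketch only builds the body without sending it.

```python
import json

# The 14 title strings from the title OR-group in 01_production_request.json.
TITLES = [
    "Chief Technology Officer", "CTO", "Chief Technical Officer",
    "VP Engineering", "Vice President of Engineering", "VP of Engineering",
    "Head of Engineering", "SVP Engineering",
    "Senior Vice President of Engineering",
    "Chief Product and Technology Officer", "CPTO",
    "VP Eng", "SVP Eng", "Head of Eng",
]


def build_filters(linkedin_urls, domains, titles=TITLES, region="Germany"):
    """Rebuild the AND(company OR-group, title OR-group, region) filter tree.

    Hypothetical helper: column names and operators are copied from the
    production request; this is not a CrustData client library.
    """
    company_group = {
        "op": "or",
        "conditions": [
            {"column": "all_employers.company_linkedin_profile_url",
             "type": "in", "value": list(linkedin_urls)},
            {"column": "all_employers.company_website_domain",
             "type": "in", "value": list(domains)},
        ],
    }
    title_group = {
        "op": "or",
        "conditions": [
            {"column": "current_employers.title", "type": "[.]", "value": t}
            for t in titles
        ],
    }
    region_condition = {"column": "region", "type": "(.)", "value": region}
    return {"op": "and",
            "conditions": [company_group, title_group, region_condition]}


# Example: a single-company request body for Delivery Hero.
body = {
    "filters": build_filters(
        ["https://www.linkedin.com/company/delivery-hero-se"],
        ["deliveryhero.com"],
    ),
    "limit": 1000,
}
serialized = json.dumps(body, indent=2)  # JSON text ready to POST
```

Passing the full batch 1 URL and domain lists should reproduce the production filter tree; the title and region conditions stay the same for every batch.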

DB Search Response + LinkedIn Comparison

DB Search Response Summary (api version 1.6.27)

  • Batch 1 (50 companies incl. Delivery Hero): 389 profiles returned in a single page
  • Batch 2 (21 companies): 136 profiles returned in a single page
  • Total: 525 profiles → 471 unique candidates created (54 duplicates skipped)
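The merge step (525 profiles in, 471 unique out) can be sketched as follows. This is a minimal Python version, assuming duplicates are identified by LinkedIn profile slug and that each profile carries a `linkedin_profile_url` field; the field name and the case-insensitive slug comparison are assumptions, not confirmed details of the response schema or of our pipeline.

```python
from urllib.parse import urlparse


def profile_slug(url):
    """Reduce a LinkedIn profile URL to its lowercase /in/<slug> segment.

    Query strings and trailing slashes are dropped; lowercasing assumes
    slugs compare case-insensitively.
    """
    path = urlparse(url.strip()).path.rstrip("/")
    return path.rsplit("/", 1)[-1].lower()


def merge_batches(batches, url_field="linkedin_profile_url"):
    """Merge result batches, keeping the first profile seen for each slug.

    `url_field` is an assumed field name; adjust it to the real response.
    Returns (unique_profiles, duplicates_skipped).
    """
    seen = set()
    unique = []
    skipped = 0
    for batch in batches:
        for profile in batch:
            slug = profile_slug(profile[url_field])
            if slug in seen:
                skipped += 1
                continue
            seen.add(slug)
            unique.append(profile)
    return unique, skipped


# Toy data: "bob" appears in both batches with different URL spellings.
batch_1 = [{"linkedin_profile_url": "https://www.linkedin.com/in/alice"},
           {"linkedin_profile_url": "https://www.linkedin.com/in/bob/"}]
batch_2 = [{"linkedin_profile_url": "https://www.linkedin.com/in/Bob"},
           {"linkedin_profile_url": "https://www.linkedin.com/in/carol"}]
unique, skipped = merge_batches([batch_1, batch_2])
```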

The Gap: Delivery Hero (deliveryhero.com)

We ran the exact same search on two projects:

  • Project A — LinkedIn direct scraping (March 27, 2026): 56 Delivery Hero candidates found
  • Project B — CrustData DB Search (April 7, 2026): 46 Delivery Hero candidates found

Both projects have identical search criteria (CTO / VP Engineering / Head of Engineering, Germany region, current or past company scope).

14 profiles LinkedIn found that CrustData's DB Search did not return:

| LinkedIn URL | Name | Last known Delivery Hero role |
|---|---|---|
| https://www.linkedin.com/in/abodeif | Ahmed Abodeif | Engineering Manager |
| https://www.linkedin.com/in/ali-ramezan1 | Ali Ramezan | (to investigate) |
| https://www.linkedin.com/in/bastianbuch | Bastian Buch | CPTO |
| https://www.linkedin.com/in/chris-kamarakis | Chris Kamarakis | (to investigate) |
| https://www.linkedin.com/in/felixhaberland | Felix Haberland | Chief Product Officer |
| https://www.linkedin.com/in/jankapusta | Ján Kapusta | Head of Development |
| https://www.linkedin.com/in/jose-martin-bejarano | José Martín-Bejarano | Director of Engineering |
| https://www.linkedin.com/in/kai-kopperberg-5948519b | Kai Kopperberg | (to investigate) |
| https://www.linkedin.com/in/krgolik | Konstantin G. | (to investigate) |
| https://www.linkedin.com/in/kristaps-zeibarts | Kristaps Zeibarts | (to investigate) |
| https://www.linkedin.com/in/patrick-kauder-21b9506b | Patrick Kauder | (to investigate) |
| https://www.linkedin.com/in/tagir-a | Tagir A. | (to investigate) |
| https://www.linkedin.com/in/tiagobutzke | Tiago Butzke | (to investigate) |
| https://www.linkedin.com/in/vetrenkomaxim | Maxim Vetrenko | (to investigate) |

The company filter explicitly included deliveryhero.com (domain) and delivery-hero-se (LinkedIn company URL) in the all_employers match, so, provided they also met the title and region conditions, these profiles should have been returned if their CrustData records had correct employer data.
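The LinkedIn-versus-DB comparison is a plain set difference on profile slugs. A minimal Python sketch, assuming both result sets are available as lists of LinkedIn profile URLs; `coverage_gap` and `profile_slug` are hypothetical helper names.

```python
from urllib.parse import urlparse


def profile_slug(url):
    """Lowercase /in/<slug> segment of a LinkedIn profile URL."""
    return urlparse(url.strip()).path.rstrip("/").rsplit("/", 1)[-1].lower()


def coverage_gap(linkedin_urls, db_urls):
    """Compare two result sets by slug: overlap size and one-sided lists."""
    linkedin = {profile_slug(u) for u in linkedin_urls}
    db = {profile_slug(u) for u in db_urls}
    return {
        "overlap": len(linkedin & db),
        "linkedin_only": sorted(linkedin - db),  # found by LinkedIn only
        "db_only": sorted(db - linkedin),        # found by DB Search only
    }


# Toy data; "x" is in both sets despite the trailing slash.
gap = coverage_gap(
    ["https://www.linkedin.com/in/abodeif", "https://www.linkedin.com/in/x"],
    ["https://www.linkedin.com/in/x/", "https://www.linkedin.com/in/y"],
)
```

On the Delivery Hero lists, `linkedin_only` should hold the 14 slugs in the table above; with 56 and 46 unique profiles on each side, that leaves an overlap of 42 and 4 entries in `db_only`.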

Enrich Evidence: Profile With Empty Employer Fields

Background

When we enriched jose-martin-bejarano via RT Enrich (enrich_realtime=true) during our investigation, our system stored the raw response in the database. The response had:

  • Size: 18,502 bytes
  • API version: 1.6.27 (from response header)
  • Request ID: 473bda6e (from your logs, Mohit)

The response body was not truncated; we stored the complete JSON. The problem was in the content: the current_employers and past_employers arrays existed and contained objects, but critical string fields inside those objects were null.

What We Received (api version 1.6.27)

The stored response for current_employers looked like this:

"current_employers": [
  {
    "employer_name": null,
    "employer_linkedin_id": "2393200",
    "employer_company_website_domain": ["deliveryhero.com"],
    "employer_company_id": [5723],
    "employee_title": null,
    "employee_description": null,
    "employee_location": null,
    "start_date": "2021-01-01T00:00:00",
    "end_date": null
  }
]

The employer objects exist. The company ID and LinkedIn ID are populated. But employer_name and employee_title are null.

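If the DB Search index holds this same record, it would explain why the profile was invisible to DB Search:

  • DB Search title filter: every current_employers.title [.] "..." condition → no match, since a null title cannot contain any search string
  • DB Search company filter: all_employers.company_website_domain in ["deliveryhero.com"] → may still match, because employer_company_website_domain is populated in this record (assuming the index reads the same field); but the top-level filter is an AND, so the failed title condition alone excludes the profile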
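To make the two bullets concrete, the request's company and title conditions can be evaluated locally against the stored record. This is a minimal Python sketch, assuming "[.]" behaves as a case-insensitive substring match and that the index sees the same fields as the enrich response; neither assumption is confirmed, and `contains` and `evaluate` are hypothetical names.

```python
def contains(value, needle):
    """Assumed semantics of the "[.]" operator: case-insensitive substring.

    A null (None) value can never contain anything.
    """
    return value is not None and needle.lower() in value.lower()


def evaluate(record, titles, domains):
    """Evaluate the company and title conditions against one record.

    Returns (company_ok, title_ok); the production filter ANDs them, so
    both must be True for the profile to be returned.
    """
    employers = (record.get("current_employers") or []) + (
        record.get("past_employers") or [])
    # Company condition: any current or past employer domain in the list.
    company_ok = any(
        domain in domains
        for e in employers
        for domain in (e.get("employer_company_website_domain") or [])
    )
    # Title condition: current employers only, as in the request.
    title_ok = any(
        contains(e.get("employee_title"), t)
        for e in record.get("current_employers") or []
        for t in titles
    )
    return company_ok, title_ok


# Shape of the stored v1.6.27 record shown above (other fields omitted).
stored = {
    "current_employers": [{
        "employer_name": None,
        "employer_company_website_domain": ["deliveryhero.com"],
        "employee_title": None,
    }],
    "past_employers": [],
}
company_ok, title_ok = evaluate(stored, ["Head of Engineering"],
                                {"deliveryhero.com"})
# company_ok is True, title_ok is False, so the AND excludes the profile.
```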

What You Received (api version 1.6.29, request cef6c08b)

You told us you "retrieved full profile", which we take to mean that those same fields are populated in your version.

The 148-Byte Difference

You called it "trivial". We believe it is not: it is the difference between null values and actual employer name/title strings. A field like "employee_title": "Director Of Engineering" is about 43 bytes, and replacing null with that string adds about 21 bytes, so roughly seven substitutions of that size account for the entire 148-byte gap.
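The byte arithmetic can be checked directly. A minimal Python sketch that measures how many bytes a null → string substitution adds to serialized JSON; it uses Python's default `json.dumps` separators, and CrustData's serializer may format (or escape non-ASCII characters) differently, so treat the numbers as approximate.

```python
import json


def null_to_string_delta(field, value):
    """Bytes added when `field` changes from null to `value` in JSON output.

    Uses json.dumps defaults (", " and ": " separators, ASCII escaping).
    """
    before = json.dumps({field: None}).encode("utf-8")
    after = json.dumps({field: value}).encode("utf-8")
    return len(after) - len(before)


delta = null_to_string_delta("employee_title", "Director Of Engineering")
fields_needed = 148 / delta  # substitutions of this size that fit in 148 bytes
```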

Reproduction Steps

  1. Enrich https://www.linkedin.com/in/jose-martin-bejarano with enrich_realtime=true
  2. Check current_employers[0].employer_name and current_employers[0].employee_title in the stored v1.6.27 response in your logs (request_id: 473bda6e)
  3. Compare with v1.6.29 response (request_id: cef6c08b)

If those fields differ between versions → v1.6.27 had a parser bug that v1.6.29 fixed. That would explain why 14+ profiles were invisible to our DB Search at the time of our run.
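Steps 2 and 3 can be automated. A minimal Python sketch, assuming both stored responses are available as JSON files with the top-level current_employers / past_employers arrays shown above; `null_fixes`, `load`, and the file names in the usage line are placeholders of ours.

```python
import json

CHECKED_FIELDS = ("employer_name", "employee_title",
                  "employee_description", "employee_location")


def null_fixes(old, new, fields=CHECKED_FIELDS):
    """Return (section, index, field) triples null in `old` but set in `new`.

    Employers are paired by array position, which assumes both versions
    list them in the same order.
    """
    fixes = []
    for section in ("current_employers", "past_employers"):
        pairs = zip(old.get(section) or [], new.get(section) or [])
        for index, (before, after) in enumerate(pairs):
            for field in fields:
                if before.get(field) is None and after.get(field) is not None:
                    fixes.append((section, index, field))
    return fixes


def load(path):
    """Read one stored enrich response from disk."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

Usage, with placeholder file names: `null_fixes(load("473bda6e.json"), load("cef6c08b.json"))`. A non-empty result that includes `("current_employers", 0, "employee_title")` would confirm the version difference described above.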

RT Search: 10 Pages of 400 Errors for Delivery Hero

After the DB phase completed (389 profiles), we ran RT Search for each company individually to catch profiles that DB missed. For Delivery Hero specifically:

RT Search Request (per-company)

{
  "filters": [
    {
      "filter_type": "current_company_linkedin_url",
      "type": "EQUALS",
      "value": "https://www.linkedin.com/company/delivery-hero-se"
    },
    {
      "filter_type": "current_job_title",
      "type": "CONTAINS",
      "value": "CTO"
    }
  ],
  "page": 1,
  "post_processing": {
    "strict_title_and_company_match": true,
    "exclude_profiles": ["... 471 already-found slugs ..."]
  }
}

RT Search Response

Every single page (1–10) returned HTTP 400:

[BatchedPipeline] RT current page 1: 400 (strict filtered all) {project: 1427, empty_streak: 1}
[BatchedPipeline] RT current page 2: 400 (strict filtered all) {project: 1427, empty_streak: 2}
...
[BatchedPipeline] RT current page 10: 400 (strict filtered all) {project: 1427, empty_streak: 10}
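For context on these log lines, the per-company paging loop behaves roughly like the sketch below. This is a simplified, hypothetical Python reconstruction inferred from the logged fields (page number and empty_streak), not the actual BatchedPipeline code; `paginate` and `fetch_page` are illustrative names.

```python
def paginate(fetch_page, max_pages=10, max_empty_streak=10):
    """Hypothetical reconstruction of the per-company RT paging loop.

    `fetch_page(page)` returns (status_code, profiles). A 400 or an empty
    page increments the empty streak; a page with profiles resets it.
    Returns (collected_profiles, final_empty_streak).
    """
    collected = []
    empty_streak = 0
    for page in range(1, max_pages + 1):
        status, profiles = fetch_page(page)
        if status == 400 or not profiles:
            empty_streak += 1
            print(f"RT current page {page}: {status} {{empty_streak: {empty_streak}}}")
            if empty_streak >= max_empty_streak:
                break
            continue
        empty_streak = 0
        collected.extend(profiles)
    return collected, empty_streak


# Stub reproducing the Delivery Hero run: every page is rejected.
profiles, streak = paginate(lambda page: (400, []))
```

Because a 400 is counted as an empty page rather than a hard failure, a run like the one above ends with zero profiles and no error surfaced beyond the log.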

With strict_title_and_company_match: true, profiles are filtered server-side. As we understand it, a 400 in this mode (logged by our pipeline as "strict filtered all") means ALL profiles on that page failed the strict match; i.e., CrustData found profiles matching the query, but their employer/title fields did not strictly match what was searched.

This is consistent with the hypothesis: CrustData's index finds profiles by company_id/domain, but the structured employer data (name, title strings) is null in some records, causing strict match to fail for every result.
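One way to test this from the client side is to repeat a Delivery Hero query with strict_title_and_company_match set to false and count null employer fields in what comes back. A minimal Python sketch of the counting step; it assumes RT Search results use the same current_employers shape as the enrich response, which we have not verified, and `null_field_counts` is a hypothetical name.

```python
from collections import Counter


def null_field_counts(profiles, fields=("employer_name", "employee_title")):
    """Count null employer fields across each profile's current_employers.

    Assumes the enrich-style schema; profiles without current_employers
    contribute nothing.
    """
    counts = Counter()
    for profile in profiles:
        for employer in profile.get("current_employers") or []:
            for field in fields:
                if employer.get(field) is None:
                    counts[field] += 1
    return counts
```

If the hypothesis holds, we would expect non-zero counts for a noticeable share of the returned Delivery Hero profiles.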

Scale of the Problem

We ran DB Search on 71 companies total:

  • 471 unique candidates created
  • RT phase: most companies returned 400 on every page
  • Net RT contribution: 4 additional candidates (from a different batch)

The RT failure rate is abnormally high. Normally we expect RT to supplement DB with 10-20% additional candidates. Here it contributed less than 1%.
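Side by side, the actual and expected RT contribution look like this. A small Python calculation using only the figures above; the 10–20% range is our usual expectation, not a CrustData figure.

```python
db_candidates = 471
rt_added = 4

rt_share = rt_added / db_candidates         # about 0.0085, i.e. 0.85%
expected_low = db_candidates * 10 // 100    # 10% of 471, rounded down: 47
expected_high = db_candidates * 20 // 100   # 20% of 471, rounded down: 94
```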
