Waterfall enrich HubSpot contacts across Apollo, Clearbit, and PDL using code

High complexity | Cost: $0 | Recommended

Prerequisites
  • Node.js 18+ or Python 3.9+
  • HubSpot private app token with crm.objects.contacts.read and crm.objects.contacts.write scopes
  • Apollo API key (Settings → Integrations → API)
  • Clearbit API key (API → API Keys in dashboard)
  • People Data Labs API key (from PDL dashboard)
  • A scheduling environment: cron or GitHub Actions

Why code?

The waterfall pattern maps perfectly to a loop over a providers array. Adding, removing, or reordering providers is one line. The merge logic is a simple conditional check per field. No platform credits, no visual clutter from 10+ nodes, and full control over error handling for each provider's unique failure modes (Clearbit 404s, PDL 402s, Apollo rate limits).

The trade-off is maintenance. You own retry logic, rate limiting, and monitoring for three different APIs. There's no visual execution history showing which provider was called for each contact. If non-technical team members need to modify the provider order, use n8n instead.

Step 1: Set up the project

# Test each API key
curl -X POST "https://api.apollo.io/api/v1/people/match" \
  -H "x-api-key: $APOLLO_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"email": "test@example.com"}'
 
curl "https://person.clearbit.com/v2/people/find?email=test@example.com" \
  -H "Authorization: Bearer $CLEARBIT_API_KEY"
 
curl -X POST "https://api.peopledatalabs.com/v5/person/enrich" \
  -H "x-api-key: $PDL_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"email": "test@example.com"}'

Step 2: Build the enrichment providers

Define each provider as a function that accepts an email and returns a normalized result. This makes it easy to reorder or swap providers:

import requests
import os
import time
 
APOLLO_API_KEY = os.environ["APOLLO_API_KEY"]
CLEARBIT_API_KEY = os.environ["CLEARBIT_API_KEY"]
PDL_API_KEY = os.environ["PDL_API_KEY"]
 
def enrich_apollo(email):
    """Apollo People Match: returns normalized fields."""
    resp = requests.post(
        "https://api.apollo.io/api/v1/people/match",
        headers={"x-api-key": APOLLO_API_KEY, "Content-Type": "application/json"},
        json={"email": email},
        timeout=30,  # avoid hanging the whole batch on one slow call
    )
    resp.raise_for_status()
    person = resp.json().get("person")
    if not person:
        return {}
    # "organization" can be present but null, so fall back to an empty dict
    org = person.get("organization") or {}
    return {
        "jobtitle": person.get("title"),
        "company": org.get("name"),
        "phone": (person.get("phone_numbers") or [{}])[0].get("sanitized_number"),
        "linkedin_url": person.get("linkedin_url"),
        "industry": org.get("industry"),
        "seniority": person.get("seniority"),
    }
 
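def enrich_clearbit(email):
    """Clearbit Person Enrichment: returns normalized fields."""
    resp = requests.get(
        "https://person.clearbit.com/v2/people/find",
        params={"email": email},  # lets requests URL-encode characters such as "+"
        headers={"Authorization": f"Bearer {CLEARBIT_API_KEY}"},
        timeout=30,
    )
    if resp.status_code in (202, 404):
        return {}  # 404: not found; 202: lookup still pending
    resp.raise_for_status()
    data = resp.json()
    # Nested objects can be null, so fall back to empty dicts
    employment = data.get("employment") or {}
    handle = ((data.get("linkedin") or {}).get("handle") or "").strip("/")
    # Handles may already carry a path prefix such as "in/" or "pub/"
    if handle and "/" not in handle:
        handle = f"in/{handle}"
    return {
        "jobtitle": employment.get("title"),
        "company": employment.get("name"),
        "seniority": employment.get("seniority"),
        "linkedin_url": f"https://linkedin.com/{handle}" if handle else None,
    }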
 
def enrich_pdl(email):
    """People Data Labs Enrichment: returns normalized fields."""
    resp = requests.post(
        "https://api.peopledatalabs.com/v5/person/enrich",
        headers={"x-api-key": PDL_API_KEY, "Content-Type": "application/json"},
        json={"email": email},
        timeout=30,
    )
    if resp.status_code == 404:
        return {}  # no matching person
    resp.raise_for_status()
    body = resp.json()  # parse once, then unwrap the "data" envelope if present
    data = body.get("data", body)
    phones = data.get("phone_numbers") or []
    return {
        "jobtitle": data.get("job_title"),
        "company": data.get("job_company_name"),
        "phone": phones[0] if phones else None,
        "linkedin_url": data.get("linkedin_url"),
        "industry": data.get("industry"),
    }

Step 3: Implement the waterfall logic

The core pattern: call the first provider, check for missing fields, and call the next provider only if something is still missing. Each subsequent provider's data only fills gaps and never overwrites a value that an earlier provider supplied.

REQUIRED_FIELDS = ["jobtitle", "company", "phone", "linkedin_url", "industry"]
PROVIDERS = [
    ("apollo", enrich_apollo),
    ("clearbit", enrich_clearbit),
    ("pdl", enrich_pdl),
]
 
def waterfall_enrich(email):
    """Call providers in sequence, stopping when all fields are filled."""
    merged = {}
    sources = []
 
    for name, enrich_fn in PROVIDERS:
        missing = [f for f in REQUIRED_FIELDS if not merged.get(f)]
        if not missing:
            break
 
        try:
            result = enrich_fn(email)
            filled_something = False
            for field, value in result.items():
                if value and not merged.get(field):
                    merged[field] = value
                    filled_something = True
            if filled_something:
                sources.append(name)
        except Exception as e:
            print(f"  {name} failed for {email}: {e}")
 
        time.sleep(0.5)  # rate limit buffer between providers
 
    merged["enrichment_source"] = "+".join(sources) if sources else "none"
    return merged

Reorder with one line

The waterfall order is defined by the PROVIDERS array. To try Clearbit first, just move it to index 0. No other code changes needed.
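To see the effect of reordering without spending credits, here is a minimal sketch that runs the same fill-gaps-only loop against stub providers. The stub functions and their return values are made up for illustration, and run_waterfall is a trimmed version of the loop in this step (no sleeps, no error handling).

```python
REQUIRED_FIELDS = ["jobtitle", "company", "phone"]

# Stub providers standing in for the real API calls (hypothetical data).
def stub_apollo(email):
    return {"jobtitle": "VP Sales", "company": None, "phone": None}

def stub_clearbit(email):
    return {"jobtitle": "Head of Sales", "company": "Acme", "phone": None}

def stub_pdl(email):
    return {"jobtitle": "Sales", "company": "Acme Inc", "phone": "+15550100"}

def run_waterfall(email, providers):
    """Fill each field from the first provider that returns a value for it."""
    merged, sources = {}, []
    for name, fn in providers:
        if all(merged.get(f) for f in REQUIRED_FIELDS):
            break  # nothing left to fill, skip the remaining providers
        filled = False
        for field, value in fn(email).items():
            if value and not merged.get(field):
                merged[field] = value
                filled = True
        if filled:
            sources.append(name)
    return merged, sources

apollo_first = [("apollo", stub_apollo), ("clearbit", stub_clearbit), ("pdl", stub_pdl)]
# Reordering is a change to the list only; the loop stays the same.
clearbit_first = [apollo_first[1], apollo_first[0], apollo_first[2]]

result_a, sources_a = run_waterfall("jane@example.com", apollo_first)
result_b, sources_b = run_waterfall("jane@example.com", clearbit_first)
print(result_a["jobtitle"], sources_a)
print(result_b["jobtitle"], sources_b)
```

With Clearbit first, its job title wins and the Apollo stub contributes nothing, so it does not appear in the source list.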

Step 4: Fetch contacts and write results to HubSpot

HUBSPOT_ACCESS_TOKEN = os.environ["HUBSPOT_ACCESS_TOKEN"]
HS_HEADERS = {"Authorization": f"Bearer {HUBSPOT_ACCESS_TOKEN}", "Content-Type": "application/json"}
 
def get_unenriched_contacts(limit=50):
    """Find contacts missing job title (first page of results only)."""
    resp = requests.post(
        "https://api.hubapi.com/crm/v3/objects/contacts/search",
        headers=HS_HEADERS,
        json={
            "filterGroups": [{"filters": [{
                "propertyName": "jobtitle",
                "operator": "NOT_HAS_PROPERTY"
            }]}],
            "properties": ["email", "jobtitle", "company", "phone"],
            "limit": limit
        },
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["results"]
 
def update_contact(contact_id, fields):
    """Write non-null fields to HubSpot."""
    # Custom properties such as enrichment_source must already exist in
    # HubSpot, otherwise the PATCH is rejected.
    properties = {k: v for k, v in fields.items() if v}
    if not properties:
        return
    resp = requests.patch(
        f"https://api.hubapi.com/crm/v3/objects/contacts/{contact_id}",
        headers=HS_HEADERS,
        json={"properties": properties},
        timeout=30,
    )
    resp.raise_for_status()
 
def main():
    contacts = get_unenriched_contacts()
    print(f"Found {len(contacts)} contacts to enrich")
 
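    for contact in contacts:
        email = contact["properties"].get("email")
        if not email:
            continue
 
        print(f"Enriching {email}...")
        fields = waterfall_enrich(email)
        try:
            update_contact(contact["id"], fields)
        except requests.RequestException as e:
            # Log and move on so one failed write does not abort the batch
            print(f"  HubSpot update failed for {email}: {e}")
            continue
        print(f"  Source: {fields.get('enrichment_source')} | Fields: {[k for k,v in fields.items() if v and k != 'enrichment_source']}")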
 
    print("Done.")
 
if __name__ == "__main__":
    main()
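
The search call above returns a single page. If a batch should cover more contacts than one page holds, one option is to follow the `after` cursor that the HubSpot search response returns under `paging.next.after`. The sketch below assumes that response shape; `post_search`, `page_size`, `max_contacts`, and `delay` are hypothetical names introduced here, with `post_search` standing in for the HTTP call so the paging logic can be tried without a token.

```python
import time

def search_all_contacts(post_search, page_size=100, max_contacts=500, delay=0.2):
    """Collect search results page by page using the `after` cursor.

    post_search is a hypothetical callable that takes the request body
    and returns the parsed JSON response.
    """
    contacts, after = [], None
    while len(contacts) < max_contacts:
        body = {
            "filterGroups": [{"filters": [{
                "propertyName": "jobtitle",
                "operator": "NOT_HAS_PROPERTY",
            }]}],
            "properties": ["email", "jobtitle", "company", "phone"],
            "limit": page_size,
        }
        if after is not None:
            body["after"] = after
        page = post_search(body)
        contacts.extend(page.get("results", []))
        after = page.get("paging", {}).get("next", {}).get("after")
        if after is None:
            break  # no further pages
        time.sleep(delay)  # pause between pages to stay under the search limit
    return contacts[:max_contacts]
```

In the script, `post_search` could be a small wrapper such as `lambda body: requests.post(SEARCH_URL, headers=HS_HEADERS, json=body, timeout=30).json()`, where `SEARCH_URL` is the contacts search endpoint used above.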

Step 5: Schedule the script

# .github/workflows/waterfall-enrich.yml
name: Waterfall Enrichment
on:
  schedule:
    - cron: '0 */4 * * *'  # Every 4 hours
  workflow_dispatch: {}
jobs:
  enrich:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.12'
      - run: pip install requests
      - run: python enrich.py
        env:
          HUBSPOT_ACCESS_TOKEN: ${{ secrets.HUBSPOT_ACCESS_TOKEN }}
          APOLLO_API_KEY: ${{ secrets.APOLLO_API_KEY }}
          CLEARBIT_API_KEY: ${{ secrets.CLEARBIT_API_KEY }}
          PDL_API_KEY: ${{ secrets.PDL_API_KEY }}

Rate limits

| API | Limit | Delay |
|---|---|---|
| Apollo People Match | 5 req/sec (Basic) | 500ms between calls |
| Clearbit Person | 600 req/min | Unlikely to hit |
| People Data Labs | 10 req/sec | 500ms between calls |
| HubSpot Search | 5 req/sec | 200ms between pages |
| HubSpot PATCH | 150 req/10 sec | No delay needed |
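
Fixed delays keep a small batch under these limits, but a larger run can still hit a 429. A minimal sketch of retrying with exponential backoff follows. `RateLimited` and `call_with_backoff` are hypothetical helpers, not part of any provider SDK, and whether a given API sends a Retry-After header is an assumption to check against its docs.

```python
import time

class RateLimited(Exception):
    """Raised by a provider call when the API answers HTTP 429."""
    def __init__(self, retry_after=None):
        super().__init__("rate limited")
        self.retry_after = retry_after  # seconds, if the server suggested a wait

def call_with_backoff(fn, *args, max_attempts=4, base_delay=0.5, sleep=time.sleep):
    """Call fn(*args), retrying with exponential backoff on RateLimited."""
    for attempt in range(max_attempts):
        try:
            return fn(*args)
        except RateLimited as exc:
            if attempt == max_attempts - 1:
                raise  # out of attempts, let the caller handle it
            # Prefer the server's hint when one was provided.
            if exc.retry_after is not None:
                delay = exc.retry_after
            else:
                delay = base_delay * (2 ** attempt)  # 0.5s, 1s, 2s, ...
            sleep(delay)
```

To use it, a provider function would check `resp.status_code == 429` and raise `RateLimited` before calling `raise_for_status()`, and the waterfall loop would call `call_with_backoff(enrich_fn, email)` instead of `enrich_fn(email)`.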

Cost

  • Apollo: 1 credit/enrichment. Called for every contact.
  • Clearbit: Volume-based, starting at $99/mo. Called only when Apollo has gaps (~30% of contacts).
  • People Data Labs: $0.03-0.10/enrichment. Called only when both Apollo and Clearbit have gaps (~10-15%).
  • Per 100 contacts (typical): 100 Apollo credits + ~30 Clearbit credits + ~10-15 PDL credits. Roughly $15-25 total at standard pricing.
  • HubSpot: Free within rate limits.
  • GitHub Actions: Free tier (2,000 min/month).
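
To size a batch before running it, the fall-through rates above can be turned into expected call counts. This is a rough sketch: the 30% and 10-15% rates are the estimates from the list, the PDL price range is the one quoted there, and `expected_calls` is a hypothetical helper. Apollo and Clearbit dollar costs are left out because they depend on your plan.

```python
def expected_calls(contacts, clearbit_rate=0.30, pdl_rate=(0.10, 0.15),
                   pdl_price=(0.03, 0.10)):
    """Estimate provider calls for a batch from the fall-through rates."""
    pdl_low = round(contacts * pdl_rate[0])
    pdl_high = round(contacts * pdl_rate[1])
    return {
        "apollo": contacts,  # first in the waterfall, called for every contact
        "clearbit": round(contacts * clearbit_rate),
        "pdl": (pdl_low, pdl_high),
        # Cheapest case: fewest calls at the lowest price, and vice versa.
        "pdl_cost_usd": (round(pdl_low * pdl_price[0], 2),
                         round(pdl_high * pdl_price[1], 2)),
    }

estimate = expected_calls(100)
print(estimate)
```

Replace the default rates with your own fill rates once you have logged a few runs.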

Clearbit 404s cost credits

Clearbit charges for 404 (not found) responses on some plans. Check your Clearbit plan terms — if 404s cost credits, add a domain pre-check or only call Clearbit for well-known company domains.
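
One possible form of the domain pre-check is to skip Clearbit for personal mailbox domains, where a company match is unlikely. That interpretation, the domain list, and the `should_call_clearbit` helper are all assumptions for illustration; adjust the list to your own data.

```python
# Hypothetical skip list; extend it with the free-mail domains you see most.
FREE_MAIL_DOMAINS = {"gmail.com", "yahoo.com", "outlook.com", "hotmail.com", "icloud.com"}

def should_call_clearbit(email, skip_domains=FREE_MAIL_DOMAINS):
    """Return False for malformed addresses and for domains on the skip list."""
    if "@" not in email:
        return False
    domain = email.rsplit("@", 1)[1].strip().lower()
    return bool(domain) and domain not in skip_domains

print(should_call_clearbit("jane@acme.io"))
print(should_call_clearbit("bob@gmail.com"))
```

A check like this would sit at the top of enrich_clearbit and return an empty dict when it fails, so the waterfall moves on to the next provider.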

Common questions

How much does the full waterfall cost per 100 contacts?

Best case (Apollo fills everything): 100 Apollo credits (~$5.44 on Basic). Typical case: 100 Apollo + ~30 Clearbit + ~10-15 PDL calls, which comes to anywhere from roughly $8 to $25 depending on your plans. GitHub Actions and HubSpot are free within their limits.

How do I add a fourth provider?

Write a new enrich_* function that accepts an email and returns a normalized dict with the same field names. Add it to the PROVIDERS array at the position you want. The waterfall loop handles the rest.
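
As a sketch of what that looks like, here is a fourth provider built around a made-up service called "acme". The payload shape, the field names on the provider side, and the `fetch` parameter are hypothetical; only the normalized keys on the left match the rest of this guide.

```python
def normalize_acme(raw):
    """Map a hypothetical provider payload onto the shared field names."""
    if not raw:
        return {}
    job = raw.get("job") or {}
    return {
        "jobtitle": job.get("title"),
        "company": job.get("employer"),
        "phone": raw.get("mobile"),
        "linkedin_url": raw.get("linkedin"),
        "industry": job.get("sector"),
    }

def enrich_acme(email, fetch=lambda email: {}):
    """fetch is a placeholder for the provider's real API call."""
    return normalize_acme(fetch(email))

# Position in the list decides when the provider is tried.
PROVIDERS = [
    # ("apollo", enrich_apollo), ("clearbit", enrich_clearbit), ("pdl", enrich_pdl),
    ("acme", enrich_acme),
]

sample = {"job": {"title": "CTO", "employer": "Acme", "sector": "Software"},
          "mobile": "+15550100"}
print(normalize_acme(sample))
```

Keeping the HTTP call separate from the field mapping makes the mapping easy to test with a saved sample response.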

Should I run the waterfall on every new contact or in batches?

For real-time enrichment on contact creation, use n8n or Make with a trigger. For batch processing, the code approach with a cron schedule is more cost-effective. Most teams run daily batches for new contacts and weekly batches for re-enrichment of contacts where the first pass missed fields.

Next steps

  • Add provider ROI tracking — log fill rates per provider to a CSV and review monthly to see if all three are worth paying for
  • Weight by ICP — if your ICP is enterprise, try Clearbit first (stronger enterprise coverage). If SMB, try Apollo first.
  • Add caching — store enrichment results in a local SQLite database to avoid re-enriching the same email across runs
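
For the caching idea, a minimal sketch using Python's built-in sqlite3 module might look like the following. The table layout, the 30-day expiry, and the `open_cache` and `cached_enrich` helpers are assumptions, not part of the script above.

```python
import json
import sqlite3
import time

def open_cache(path="enrichment_cache.db"):
    """Open (or create) the cache database and its single table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS enrichment ("
        "email TEXT PRIMARY KEY, fields TEXT NOT NULL, fetched_at REAL NOT NULL)"
    )
    return conn

def cached_enrich(conn, email, enrich_fn, max_age_days=30):
    """Return a cached result if it is fresh enough, otherwise call enrich_fn."""
    key = email.strip().lower()
    row = conn.execute(
        "SELECT fields, fetched_at FROM enrichment WHERE email = ?", (key,)
    ).fetchone()
    if row and time.time() - row[1] < max_age_days * 86400:
        return json.loads(row[0])  # cache hit, no provider calls
    fields = enrich_fn(email)
    conn.execute(
        "INSERT OR REPLACE INTO enrichment (email, fields, fetched_at) VALUES (?, ?, ?)",
        (key, json.dumps(fields), time.time()),
    )
    conn.commit()
    return fields
```

In main(), `cached_enrich(conn, email, waterfall_enrich)` would replace the direct call. On GitHub Actions the runner's disk is discarded after each job, so the database file would need to be persisted between runs for the cache to help.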
