Batch enrich HubSpot contacts missing job title or company size using an agent skill
Low complexity
Cost: Usage-based
Prerequisites
- Claude Code, Cursor, or another AI coding agent that supports skills
- HubSpot private app token stored as HUBSPOT_TOKEN (scopes: crm.objects.contacts.read, crm.objects.contacts.write)
- Apollo API key stored as APOLLO_API_KEY
Overview
Create an agent skill that finds contacts in HubSpot with incomplete data, batch-enriches them via Apollo, and fills in the gaps — without overwriting any existing data. Run /batch-enrich on demand or schedule it weekly.
Step 1: Create the skill directory
mkdir -p .claude/skills/batch-enrich/scripts
Step 2: Write the SKILL.md file
Create .claude/skills/batch-enrich/SKILL.md:
---
name: batch-enrich
description: Finds HubSpot contacts missing a job title and batch-enriches them via Apollo, filling job title, company, phone, LinkedIn URL, and industry. Only fills empty fields and never overwrites existing data. Uses Apollo's bulk endpoint for efficiency.
disable-model-invocation: true
allowed-tools: Bash(python *)
---
Batch enrich HubSpot contacts with missing fields:
1. Run: `python $SKILL_DIR/scripts/batch_enrich.py`
2. Review the enrichment summary
3. Confirm the number of contacts updated and fields filled
Step 3: Write the batch enrichment script
Create .claude/skills/batch-enrich/scripts/batch_enrich.py:
#!/usr/bin/env python3
"""
Batch Enrichment: HubSpot (search missing) → Apollo (bulk enrich) → HubSpot (update)
Finds contacts missing job title, enriches via Apollo bulk endpoint,
writes only empty fields back to HubSpot.
"""
import os
import sys
import time
from datetime import datetime

try:
    import requests
except ImportError:
    os.system("pip install requests -q")
    import requests

HUBSPOT_TOKEN = os.environ.get("HUBSPOT_TOKEN")
APOLLO_API_KEY = os.environ.get("APOLLO_API_KEY")
if not all([HUBSPOT_TOKEN, APOLLO_API_KEY]):
    print("ERROR: Set HUBSPOT_TOKEN and APOLLO_API_KEY environment variables")
    sys.exit(1)

HS_HEADERS = {"Authorization": f"Bearer {HUBSPOT_TOKEN}", "Content-Type": "application/json"}
APOLLO_HEADERS = {"x-api-key": APOLLO_API_KEY, "Content-Type": "application/json"}
PERSONAL_DOMAINS = {"gmail.com", "yahoo.com", "hotmail.com", "outlook.com", "aol.com"}
# --- Search for contacts missing job title ---
print(f"[{datetime.now().isoformat()}] Searching for contacts missing job title...")
contacts = []
after = 0
while True:
    resp = requests.post(
        "https://api.hubapi.com/crm/v3/objects/contacts/search",
        headers=HS_HEADERS,
        json={
            "filterGroups": [{"filters": [{
                "propertyName": "jobtitle",
                "operator": "NOT_HAS_PROPERTY"
            }]}],
            "properties": ["email", "firstname", "lastname", "jobtitle", "company", "phone", "linkedin_url", "industry"],
            "limit": 100,
            "after": after,
        }
    )
    resp.raise_for_status()
    data = resp.json()
    contacts.extend(data["results"])
    # Follow the paging cursor until HubSpot stops returning one
    if data.get("paging", {}).get("next"):
        after = data["paging"]["next"]["after"]
        time.sleep(0.2)
    else:
        break

# Filter to business emails only
business_contacts = [
    c for c in contacts
    if c["properties"].get("email")
    and c["properties"]["email"].split("@")[-1].lower() not in PERSONAL_DOMAINS
]
print(f"Found {len(contacts)} total, {len(business_contacts)} with business emails\n")

# --- Batch enrich via Apollo bulk endpoint ---
enriched = 0
fields_filled = 0
for i in range(0, len(business_contacts), 10):
    batch = business_contacts[i:i+10]
    details = [{"email": c["properties"]["email"]} for c in batch]
    apollo_resp = requests.post(
        "https://api.apollo.io/api/v1/people/bulk_match",
        headers=APOLLO_HEADERS,
        json={"details": details}
    )
    apollo_resp.raise_for_status()
    matches = apollo_resp.json().get("matches", [])
    # Assumes matches come back in the same order as the submitted details
    for contact, match in zip(batch, matches):
        if not match:
            continue
        props = contact["properties"]
        # organization may be missing or null, so fall back to an empty dict
        org = match.get("organization") or {}
        updates = {}
        # Only fill empty fields
        if not props.get("jobtitle") and match.get("title"):
            updates["jobtitle"] = match["title"]
        if not props.get("company") and org.get("name"):
            updates["company"] = org["name"]
        if not props.get("phone") and match.get("phone_numbers"):
            phone = match["phone_numbers"][0].get("sanitized_number")
            if phone:
                updates["phone"] = phone
        if not props.get("linkedin_url") and match.get("linkedin_url"):
            updates["linkedin_url"] = match["linkedin_url"]
        if not props.get("industry") and org.get("industry"):
            updates["industry"] = org["industry"]
        if updates:
            filled = list(updates)  # data fields only, before adding audit fields
            # linkedin_url, enrichment_date and enrichment_source are assumed to
            # exist as contact properties in your HubSpot portal
            updates["enrichment_date"] = datetime.now().strftime("%Y-%m-%d")
            updates["enrichment_source"] = "apollo-batch"
            requests.patch(
                f"https://api.hubapi.com/crm/v3/objects/contacts/{contact['id']}",
                headers=HS_HEADERS,
                json={"properties": updates}
            ).raise_for_status()
            enriched += 1
            fields_filled += len(filled)
            email = contact["properties"]["email"]
            print(f"  {email} -> {filled}")
    print(f"  Batch {i//10 + 1}: processed {min(i+10, len(business_contacts))}/{len(business_contacts)}")
    time.sleep(1)

print(f"\nDone. Enriched {enriched}/{len(business_contacts)} contacts, filled {fields_filled} fields.")
Step 4: Run the skill
# Via Claude Code
/batch-enrich
# Or directly
python .claude/skills/batch-enrich/scripts/batch_enrich.py
Step 5: Schedule it weekly (optional)
Option A: Cron
# crontab -e — run every Sunday at 10 PM
0 22 * * 0 cd /path/to/project && python .claude/skills/batch-enrich/scripts/batch_enrich.py >> /var/log/batch-enrich.log 2>&1
Option B: GitHub Actions
name: Weekly Batch Enrichment
on:
  schedule:
    - cron: '0 3 * * 0' # Sunday 3 AM UTC
  workflow_dispatch: {}
jobs:
  enrich:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.12'
      - run: pip install requests
      - run: python .claude/skills/batch-enrich/scripts/batch_enrich.py
        env:
          HUBSPOT_TOKEN: ${{ secrets.HUBSPOT_TOKEN }}
          APOLLO_API_KEY: ${{ secrets.APOLLO_API_KEY }}
Cost
- Apollo: 1 credit per person (bulk_match has the same per-person cost as individual calls). Basic plan ($49/mo) = 900 credits.
- HubSpot: Free within API rate limits.
- Compute: Free on GitHub Actions for public repositories; private repositories draw on the plan's included minutes.
- Weekly batch of 100 contacts: 100 Apollo credits + 10 bulk API calls. Monthly: ~400 credits for weekly runs.
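The arithmetic behind these figures can be sketched as a small helper. This is only an estimate under the assumptions stated in the list above: one credit per submitted person, batches of 10, and roughly four runs per month. The function name estimate_monthly_usage is hypothetical and not part of the script.

```python
import math


def estimate_monthly_usage(contacts_per_run, runs_per_month=4, batch_size=10):
    """Return (credits, bulk_calls) per month.

    Assumes 1 Apollo credit per submitted person and `batch_size`
    people per bulk_match call, as described in the cost list.
    """
    credits = contacts_per_run * runs_per_month
    bulk_calls = math.ceil(contacts_per_run / batch_size) * runs_per_month
    return credits, bulk_calls


credits, bulk_calls = estimate_monthly_usage(100)
print(f"{credits} credits, {bulk_calls} bulk calls per month")
# prints "400 credits, 40 bulk calls per month"
```

Treat the result as an upper bound for planning: it counts every contact sent to Apollo, whether or not a match comes back.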
The script never overwrites existing data
Every field is guarded by a check of the form if not props.get("jobtitle"), so a value is written only when the HubSpot field is empty. If a sales rep manually entered a company name or phone number, the script will not overwrite it, even if Apollo has a different value. This preserves manually curated data.
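The same fill-only-empty rule can be expressed as one small function, which makes it easy to test in isolation. This is a sketch, not the script's actual structure: build_updates is a hypothetical helper, and the sample values are made up for illustration.

```python
def build_updates(existing, candidate):
    """Return only the candidate values whose field is empty in `existing`.

    `existing` holds the current HubSpot properties, `candidate` the values
    suggested by the enrichment source. Empty means missing, None, or "".
    """
    return {
        field: value
        for field, value in candidate.items()
        if value and not existing.get(field)
    }


# Illustrative data: the rep already entered a job title by hand
existing = {"jobtitle": "VP Sales", "company": None, "phone": ""}
candidate = {"jobtitle": "Head of Sales", "company": "Acme", "phone": None}
print(build_updates(existing, candidate))  # prints {'company': 'Acme'}
```

The manually entered job title survives, the empty company field is filled, and the phone field is skipped because the enrichment source had nothing to offer.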
When to use this approach
- You want to clean up data gaps right now without building automation infrastructure
- You're onboarding a new data vendor and want to test fill rates before committing to a platform
- You want to run enrichment on a specific segment — modify the search filter for a custom list
- You prefer seeing enrichment results in real-time as the script runs
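To target a specific segment, one option is to add filters next to the missing-job-title filter in the search body. The sketch below assumes a hypothetical helper, build_search_body, and uses lifecyclestage equal to "lead" purely as an example segment; swap in whichever property and value define your list.

```python
DEFAULT_PROPERTIES = ["email", "firstname", "lastname", "jobtitle",
                      "company", "phone", "linkedin_url", "industry"]


def build_search_body(extra_filters=None, after=0, limit=100):
    """Build a contacts search body: missing job title AND any extra filters."""
    filters = [{"propertyName": "jobtitle", "operator": "NOT_HAS_PROPERTY"}]
    filters.extend(extra_filters or [])
    return {
        # Filters inside a single filter group are combined with AND
        "filterGroups": [{"filters": filters}],
        "properties": DEFAULT_PROPERTIES,
        "limit": limit,
        "after": after,
    }


# Example segment (an assumption for illustration): lifecycle stage is "lead"
segment = [{"propertyName": "lifecyclestage", "operator": "EQ", "value": "lead"}]
body = build_search_body(segment)
print(body["filterGroups"])
```

The returned dictionary can be passed as the json argument of the search request in place of the inline body, leaving the paging loop unchanged.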
When to move to a dedicated tool
- You want enrichment to run reliably every week without human intervention
- You need visual monitoring showing fill rates, credit usage, and error rates
- Multiple team members need to manage the enrichment configuration
- You want to chain enrichment with other workflows (lead scoring, routing) in one platform