Batch enrich HubSpot contacts missing job title or company size using code
Prerequisites
- Node.js 18+ or Python 3.9+
- HubSpot private app token with crm.objects.contacts.read and crm.objects.contacts.write scopes
- Apollo API key with enrichment credits
- A scheduling environment: cron or GitHub Actions
Why code?
A script is the most cost-effective way to batch enrich contacts. Zero platform fees, full access to Apollo's bulk_match endpoint (10 contacts per request instead of individual calls), and complete control over the field-mapping logic. GitHub Actions provides free weekly scheduling.
The trade-off is maintenance. You own the error handling, rate limiting, retry logic, and monitoring. There's no visual execution history or drag-and-drop configuration. If non-technical team members need to modify which fields get enriched, use n8n or Make instead.
Step 1: Set up the project
    # Verify HubSpot search with NOT_HAS_PROPERTY
    curl -s -X POST "https://api.hubapi.com/crm/v3/objects/contacts/search" \
      -H "Authorization: Bearer $HUBSPOT_ACCESS_TOKEN" \
      -H "Content-Type: application/json" \
      -d '{
        "filterGroups": [{"filters": [{"propertyName": "jobtitle", "operator": "NOT_HAS_PROPERTY"}]}],
        "properties": ["email", "jobtitle"],
        "limit": 5
      }' | python3 -m json.tool

Step 2: Search HubSpot for contacts with missing fields
    import os
    import time
    from datetime import datetime

    import requests

    HUBSPOT_ACCESS_TOKEN = os.environ["HUBSPOT_ACCESS_TOKEN"]
    APOLLO_API_KEY = os.environ["APOLLO_API_KEY"]
    HS_HEADERS = {"Authorization": f"Bearer {HUBSPOT_ACCESS_TOKEN}", "Content-Type": "application/json"}
    CRITICAL_FIELDS = ["jobtitle", "company", "phone", "industry"]

    def get_contacts_missing_fields(field="jobtitle", limit=200):
        """Find contacts missing a specific field using NOT_HAS_PROPERTY."""
        contacts = []
        after = 0  # paging cursor; 0 starts from the first page
        while len(contacts) < limit:
            resp = requests.post(
                "https://api.hubapi.com/crm/v3/objects/contacts/search",
                headers=HS_HEADERS,
                json={
                    "filterGroups": [{"filters": [{
                        "propertyName": field,
                        "operator": "NOT_HAS_PROPERTY"
                    }]}],
                    "properties": ["email", "firstname", "lastname"] + CRITICAL_FIELDS,
                    "limit": min(100, limit - len(contacts)),  # at most 100 per page
                    "after": after
                }
            )
            resp.raise_for_status()
            data = resp.json()
            contacts.extend(data["results"])
            if data.get("paging", {}).get("next"):
                after = data["paging"]["next"]["after"]
                time.sleep(0.2)  # HubSpot Search rate limit
            else:
                break  # no further pages
        return contacts

The NOT_HAS_PROPERTY operator finds contacts where the property has never been set or was explicitly cleared. It doesn't match empty strings, only truly null values. If someone set a field to an empty string, that contact won't appear in the results.
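Because the search only catches null values, a local check on the returned properties can also flag blank or whitespace-only strings. The following is a minimal sketch, assuming the contact property shape returned by the search above; `missing_fields` is a hypothetical helper and not part of the original script.

```python
CRITICAL_FIELDS = ["jobtitle", "company", "phone", "industry"]


def missing_fields(properties, fields=CRITICAL_FIELDS):
    """Return the fields whose value is None, empty, or whitespace-only."""
    missing = []
    for field in fields:
        value = properties.get(field)
        if value is None or not str(value).strip():
            missing.append(field)
    return missing


# Example properties as the Search API might return them (illustrative data)
props = {"email": "ana@example.com", "jobtitle": None, "company": "  ", "phone": "555-0100"}
print(missing_fields(props))
```

Running this kind of check on each fetched contact shows which fields still need enrichment, including any that hold only whitespace.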
Step 3: Batch enrich via Apollo's bulk endpoint
Use Apollo's bulk_match endpoint to enrich 10 contacts per request, reducing API calls:
    def bulk_enrich_apollo(contacts_batch):
        """Enrich up to 10 contacts in a single Apollo API call.

        Every contact in the batch must have an email so that the matches
        array lines up index-for-index with the batch.
        """
        details = [{"email": c["properties"]["email"]} for c in contacts_batch]
        if not details:
            return []
        resp = requests.post(
            "https://api.apollo.io/api/v1/people/bulk_match",
            headers={"x-api-key": APOLLO_API_KEY, "Content-Type": "application/json"},
            json={"details": details}
        )
        resp.raise_for_status()
        return resp.json().get("matches", [])

    def enrich_batch(contacts):
        """Process contacts in batches of 10 using Apollo bulk endpoint."""
        # Drop contacts without an email up front; skipping them inside a
        # batch would shift the indexes and pair matches with the wrong contact.
        contacts = [c for c in contacts if c["properties"].get("email")]
        results = []
        for i in range(0, len(contacts), 10):
            batch = contacts[i:i+10]
            matches = bulk_enrich_apollo(batch)
            for contact, match in zip(batch, matches):
                if match:  # None means Apollo found no person
                    results.append({"contact": contact, "match": match})
            time.sleep(1)  # rate limit between bulk calls
            print(f"  Processed {min(i+10, len(contacts))}/{len(contacts)}")
        return results

The matches array is returned in the same order as the details input array. If a person isn't found, the corresponding index contains null. Always pair results by index, not by email.
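The index-pairing rule can be isolated and tested without calling Apollo. The sketch below assumes the response shape described above (a list with `None` for misses); `pair_by_index` is a hypothetical helper name, and the length check is an added safeguard rather than documented Apollo behavior.

```python
def pair_by_index(batch, matches):
    """Pair each contact with the match at the same index, skipping misses.

    Raises ValueError if the lengths differ, since index pairing would then
    attach Apollo data to the wrong contact.
    """
    if len(batch) != len(matches):
        raise ValueError(f"expected {len(batch)} matches, got {len(matches)}")
    return [
        {"contact": contact, "match": match}
        for contact, match in zip(batch, matches)
        if match  # None means Apollo found no person for this input
    ]


# Illustrative data: the second contact has no match
batch = [{"id": "1"}, {"id": "2"}, {"id": "3"}]
matches = [{"title": "CTO"}, None, {"title": "VP Sales"}]
paired = pair_by_index(batch, matches)
print([p["contact"]["id"] for p in paired])
```

Failing loudly on a length mismatch is a design choice: a silent `zip` would truncate to the shorter list and could write one person's title onto another contact.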
Step 4: Update HubSpot (only empty fields)
The key rule: never overwrite existing data. Only fill fields that are currently null:
    def update_contact_fields(contact_id, existing_props, apollo_match):
        """Write only fields that are currently empty."""
        properties = {}
        # "or {}" guards against Apollo returning "organization": null
        field_map = {
            "jobtitle": lambda m: m.get("title"),
            "company": lambda m: (m.get("organization") or {}).get("name"),
            "phone": lambda m: (m.get("phone_numbers") or [{}])[0].get("sanitized_number"),
            # linkedin_url is not in the Step 2 property list, so it always
            # looks empty here; request it in the search to preserve existing values.
            "linkedin_url": lambda m: m.get("linkedin_url"),
            "industry": lambda m: (m.get("organization") or {}).get("industry"),
        }
        for hs_field, extractor in field_map.items():
            existing_value = existing_props.get(hs_field)
            if not existing_value:  # only fill if empty
                new_value = extractor(apollo_match)
                if new_value:
                    properties[hs_field] = new_value
        if not properties:
            return 0
        # Audit fields; these must already exist as contact properties in HubSpot
        properties["enrichment_date"] = datetime.now().strftime("%Y-%m-%d")
        properties["enrichment_source"] = "apollo-batch"
        resp = requests.patch(
            f"https://api.hubapi.com/crm/v3/objects/contacts/{contact_id}",
            headers=HS_HEADERS,
            json={"properties": properties}
        )
        resp.raise_for_status()
        return len(properties) - 2  # subtract enrichment_date and enrichment_source

Step 5: Tie it together
    def main():
        print(f"[{datetime.now().isoformat()}] Starting batch enrichment...")
        contacts = get_contacts_missing_fields(field="jobtitle", limit=200)
        print(f"Found {len(contacts)} contacts missing job title")

        # Filter out personal emails
        business_contacts = [
            c for c in contacts
            if c["properties"].get("email") and
            c["properties"]["email"].split("@")[-1].lower() not in
            ("gmail.com", "yahoo.com", "hotmail.com", "outlook.com", "aol.com")
        ]
        print(f"After filtering personal emails: {len(business_contacts)} contacts")

        enriched_results = enrich_batch(business_contacts)
        print(f"Apollo matched {len(enriched_results)} contacts")

        fields_filled = 0
        contacts_updated = 0
        for item in enriched_results:
            count = update_contact_fields(
                item["contact"]["id"],
                item["contact"]["properties"],
                item["match"]
            )
            if count > 0:
                contacts_updated += 1
                fields_filled += count
        print(f"\nDone. Updated {contacts_updated} contacts, filled {fields_filled} fields.")

    if __name__ == "__main__":
        main()

Step 6: Schedule the script
    # .github/workflows/batch-enrich.yml
    name: Weekly Batch Enrichment
    on:
      schedule:
        - cron: '0 3 * * 0'  # Sunday at 3 AM UTC
      workflow_dispatch: {}
    jobs:
      enrich:
        runs-on: ubuntu-latest
        steps:
          - uses: actions/checkout@v4
          - uses: actions/setup-python@v5
            with:
              python-version: '3.12'
          - run: pip install requests
          - run: python batch_enrich.py
            env:
              HUBSPOT_ACCESS_TOKEN: ${{ secrets.HUBSPOT_ACCESS_TOKEN }}
              APOLLO_API_KEY: ${{ secrets.APOLLO_API_KEY }}

Rate limits
| API | Limit | Strategy |
|---|---|---|
| HubSpot Search | 5 req/sec | 200ms between paginated calls |
| HubSpot PATCH | 150 req/10 sec | No delay needed for batch sizes under 150 |
| Apollo bulk_match | 5 req/sec, 10 records/request | 1 second between bulk calls |
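Fixed delays cover the normal case, but a script that owns its own retry logic also needs a fallback for when an API still answers with HTTP 429. The sketch below is one possible approach and not part of the original script: `call_with_backoff` is a hypothetical helper, and honoring a `Retry-After` header is an assumption that only applies if the server sends one.

```python
import time


def call_with_backoff(send, max_retries=4, base_delay=1.0, sleep=time.sleep):
    """Call send() and retry on HTTP 429 with exponential backoff.

    send is any zero-argument callable returning an object with
    .status_code and .headers (for example a requests.Response).
    """
    attempt = 0
    while True:
        resp = send()
        # Return on success, on any non-429 status, or when retries run out
        if resp.status_code != 429 or attempt >= max_retries:
            return resp
        # Prefer the server's Retry-After hint when it is a whole number of seconds
        retry_after = resp.headers.get("Retry-After", "")
        if retry_after.isdigit():
            delay = float(retry_after)
        else:
            delay = base_delay * (2 ** attempt)  # 1s, 2s, 4s, ...
        sleep(delay)
        attempt += 1
```

In the script, a call such as the PATCH in Step 4 could be wrapped as `call_with_backoff(lambda: requests.patch(url, headers=HS_HEADERS, json=body))`, followed by the usual `raise_for_status()` on the returned response.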
Cost
- Apollo: 1 credit per person in the bulk request (same as individual calls). The bulk endpoint saves time, not credits. Basic plan ($49/mo) = 900 credits. 200 contacts/week = 800 credits/month.
- HubSpot: Free within rate limits.
- GitHub Actions: Free tier (2,000 min/month). Each batch run takes 2-5 minutes.
- Per 200 contacts: 200 Apollo credits + ~20 bulk API calls + ~200 HubSpot PATCH calls. Total cost: ~$11 at Basic plan pricing.
Apollo's bulk_match endpoint doesn't offer a discount: it costs 1 credit per person in the request, the same as individual calls. The benefit is fewer HTTP requests (1 instead of 10) and faster processing. Use it for efficiency, not cost savings.
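The figures above can be reproduced with a few lines of arithmetic. This sketch uses only the plan price and credit numbers quoted in this section; the four-runs-per-month value is an assumption for a weekly schedule.

```python
PLAN_PRICE_USD = 49.0    # Basic plan price per month, as quoted above
PLAN_CREDITS = 900       # credits included per month
CONTACTS_PER_RUN = 200   # one weekly run, 1 credit per contact
RUNS_PER_MONTH = 4       # assumption: four weekly runs in a month

cost_per_credit = PLAN_PRICE_USD / PLAN_CREDITS
credits_per_month = CONTACTS_PER_RUN * RUNS_PER_MONTH
cost_per_run = CONTACTS_PER_RUN * cost_per_credit

print(f"Cost per credit: ${cost_per_credit:.3f}")
print(f"Credits per month: {credits_per_month} of {PLAN_CREDITS}")
print(f"Effective cost per 200-contact run: ${cost_per_run:.2f}")
```

Note that a month with five Sundays would need 1,000 credits at this volume, which is more than the 900 included, so the batch size may need trimming in those months.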
Common questions
How much does it cost to enrich 200 contacts per week?
200 Apollo credits/week = 800/month. On the Basic plan ($49/mo, 900 credits), that's $0.054 per contact. GitHub Actions and HubSpot API calls are free. Total monthly cost: $49 for Apollo.
Should I use Python or Node.js?
Both work equally well. Python is slightly more concise for the data manipulation (zip, list comprehensions). Node.js is better if your team already has a JavaScript codebase. The API calls and logic are identical.
How do I prevent re-enriching contacts Apollo already couldn't match?
Set an enrichment_date property on every processed contact, even when Apollo returns no data. Add NOT_HAS_PROPERTY on enrichment_date to your search filter. This excludes previously processed contacts from future runs, saving Apollo credits on repeat misses.
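As a rough sketch of that approach, the search body gains a second filter and unmatched contacts get a stamp of their own. Both function names and the `apollo-batch-no-match` label are hypothetical, and `enrichment_date` is assumed to exist as a contact property in HubSpot.

```python
from datetime import date


def build_search_body(field="jobtitle", limit=100):
    """Search body: `field` is missing AND the contact was never processed.

    Filters inside a single filterGroup are combined with AND.
    """
    return {
        "filterGroups": [{"filters": [
            {"propertyName": field, "operator": "NOT_HAS_PROPERTY"},
            {"propertyName": "enrichment_date", "operator": "NOT_HAS_PROPERTY"},
        ]}],
        "properties": ["email", field],
        "limit": limit,
    }


def stamp_for_miss(today=None):
    """Properties to write when Apollo returned no match for a contact."""
    today = today or date.today()
    return {
        "enrichment_date": today.strftime("%Y-%m-%d"),
        "enrichment_source": "apollo-batch-no-match",  # hypothetical label
    }
```

Writing `stamp_for_miss()` to each unmatched contact costs one extra PATCH per miss, but it keeps that contact out of the next run's search results.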
What's the fill rate for Apollo's bulk_match endpoint?
Typical B2B contact lists see 60-75% match rates for job title and 70-85% for company name. Personal email domains (gmail, yahoo) have near-zero match rates, so the script filters these out to avoid wasting credits.
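The personal-domain filter can be pulled into a small reusable function. This is a sketch using the same five domains as the script; `is_business_email` is a hypothetical name, and the list is a starting point rather than a complete catalog of consumer email providers.

```python
PERSONAL_DOMAINS = frozenset(
    {"gmail.com", "yahoo.com", "hotmail.com", "outlook.com", "aol.com"}
)


def is_business_email(email):
    """Return True if the address is on a domain outside PERSONAL_DOMAINS."""
    if not email or "@" not in email:
        return False  # missing or malformed address
    domain = email.rsplit("@", 1)[-1].strip().lower()
    return bool(domain) and domain not in PERSONAL_DOMAINS


# Illustrative addresses
for address in ["jo@acme.io", "Jo@Gmail.com", "no-at-sign"]:
    print(address, is_business_email(address))
```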
Next steps
- Expand field checks: run the search for multiple missing fields (jobtitle, company, phone, industry). Use separate search queries per field or combine them with filterGroups.
- Add deduplication: track enriched contacts by ID to avoid re-processing on overlapping runs.
- Add a Slack summary: post a weekly summary to a Slack channel with enrichment metrics.
- Monitor fill rates: log what percentage of contacts Apollo successfully enriches to evaluate ROI.
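For the first item, one way to combine fields in a single query is to give each field its own filterGroup, since HubSpot combines groups with OR. The sketch below assumes that behavior; `build_any_missing_search` is a hypothetical helper, and HubSpot caps the number of filterGroups per request, so check the current limit before adding many fields.

```python
def build_any_missing_search(fields, limit=100):
    """Search body matching contacts missing at least one of `fields`.

    Each field gets its own filterGroup; groups are combined with OR.
    """
    if not fields:
        raise ValueError("at least one field is required")
    return {
        "filterGroups": [
            {"filters": [{"propertyName": f, "operator": "NOT_HAS_PROPERTY"}]}
            for f in fields
        ],
        "properties": ["email", "firstname", "lastname", *fields],
        "limit": limit,
    }


body = build_any_missing_search(["jobtitle", "company", "phone", "industry"])
print(len(body["filterGroups"]))
```

A contact missing several fields still appears only once in the results, which avoids the deduplication work that separate per-field queries would need.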