Identify website visitors with Clearbit Reveal and create HubSpot companies using code
Prerequisites
- Python 3.9+ or Node.js 18+
- Clearbit Reveal API key (legacy) or HubSpot account with Breeze Intelligence add-on
- HubSpot private app token with
crm.objects.companies.readandcrm.objects.companies.writescopes - Access to server logs or an analytics pipeline that captures visitor IP addresses
Clearbit was acquired by HubSpot and rebranded as Breeze Intelligence. The standalone Reveal API is being sunset. This guide covers both the legacy API approach (for existing Clearbit customers) and the HubSpot-native Breeze approach. New users should start with Breeze.
Step 1: Set up the project
# Test your Clearbit API key
curl -s "https://reveal.clearbit.com/v1/companies/find?ip=203.0.113.42" \
-H "Authorization: Bearer $CLEARBIT_API_KEY" | head -c 300
# Test your HubSpot token
curl -s "https://api.hubapi.com/crm/v3/objects/companies?limit=1" \
-H "Authorization: Bearer $HUBSPOT_TOKEN" | head -c 200Step 2: Extract unique IPs from your logs
Before calling Clearbit, extract and deduplicate visitor IPs. This example reads from a common log format, but adapt it to your analytics pipeline.
import re
from collections import Counter
def extract_ips_from_log(log_path, min_visits=2):
"""Extract IPs that visited key pages multiple times (shows intent)."""
ip_pages = {}
target_pages = ["/pricing", "/demo", "/contact", "/enterprise"]
with open(log_path) as f:
for line in f:
match = re.match(r'^(\d+\.\d+\.\d+\.\d+).*"GET (\S+)', line)
if not match:
continue
ip, page = match.groups()
if any(page.startswith(p) for p in target_pages):
ip_pages.setdefault(ip, []).append(page)
# Only return IPs with multiple visits to high-intent pages
return {ip: pages for ip, pages in ip_pages.items() if len(pages) >= min_visits}Step 3: Resolve IPs to companies via Clearbit Reveal
import requests
import os
import time
CLEARBIT_API_KEY = os.environ["CLEARBIT_API_KEY"]
HUBSPOT_TOKEN = os.environ["HUBSPOT_TOKEN"]
HS_HEADERS = {"Authorization": f"Bearer {HUBSPOT_TOKEN}", "Content-Type": "application/json"}
def reveal_company(ip):
"""Resolve an IP to a company via Clearbit Reveal."""
resp = requests.get(
"https://reveal.clearbit.com/v1/companies/find",
params={"ip": ip},
headers={"Authorization": f"Bearer {CLEARBIT_API_KEY}"},
)
if resp.status_code == 404:
return None
resp.raise_for_status()
data = resp.json()
company = data.get("company")
if not company or company.get("type") != "company":
return None
return {
"domain": company.get("domain"),
"name": company.get("name"),
"industry": company.get("category", {}).get("industry"),
"employees": company.get("metrics", {}).get("employees"),
"city": company.get("geo", {}).get("city"),
"state": company.get("geo", {}).get("state"),
"country": company.get("geo", {}).get("country"),
"description": company.get("description"),
}Only 20-30% of B2B visitor IPs resolve to a company. Consumer ISPs (Comcast, AT&T), VPNs, and mobile carriers always return null. Filter your IP list to corporate-looking traffic before calling the API to save credits.
Step 4: Filter for ICP and deduplicate against HubSpot
def matches_icp(company, min_employees=50):
"""Check if a resolved company matches your ICP criteria."""
if not company.get("domain"):
return False
employees = company.get("employees") or 0
return employees >= min_employees
def company_exists_in_hubspot(domain):
"""Check if a company with this domain already exists in HubSpot."""
resp = requests.post(
"https://api.hubapi.com/crm/v3/objects/companies/search",
headers=HS_HEADERS,
json={
"filterGroups": [{"filters": [{
"propertyName": "domain",
"operator": "EQ",
"value": domain,
}]}],
},
)
resp.raise_for_status()
results = resp.json().get("results", [])
return results[0]["id"] if results else NoneStep 5: Create companies in HubSpot
def create_hubspot_company(company, pages_visited):
"""Create a new company in HubSpot with visitor metadata."""
resp = requests.post(
"https://api.hubapi.com/crm/v3/objects/companies",
headers=HS_HEADERS,
json={
"properties": {
"domain": company["domain"],
"name": company["name"],
"industry": company.get("industry", ""),
"numberofemployees": str(company.get("employees", "")),
"city": company.get("city", ""),
"state": company.get("state", ""),
"country": company.get("country", ""),
"description": company.get("description", ""),
}
},
)
resp.raise_for_status()
return resp.json()["id"]
# --- Main execution ---
ip_pages = extract_ips_from_log("/var/log/nginx/access.log")
print(f"Found {len(ip_pages)} IPs with high-intent visits")
created = 0
skipped = 0
unresolved = 0
for ip, pages in ip_pages.items():
company = reveal_company(ip)
if not company:
unresolved += 1
continue
if not matches_icp(company):
skipped += 1
continue
existing = company_exists_in_hubspot(company["domain"])
if existing:
print(f" EXISTS: {company['name']} ({company['domain']})")
skipped += 1
continue
company_id = create_hubspot_company(company, pages)
print(f" CREATED: {company['name']} — {company.get('employees', '?')} employees — visited {', '.join(pages)}")
created += 1
time.sleep(0.2)
print(f"\nDone. Created: {created}, Skipped: {skipped}, Unresolved: {unresolved}")Step 6: Schedule with cron or GitHub Actions
# .github/workflows/identify-visitors.yml
name: Identify Website Visitors
on:
schedule:
- cron: '0 8 * * *' # Daily at 8 AM UTC
workflow_dispatch: {}
jobs:
run:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: actions/setup-python@v5
with:
python-version: '3.12'
- run: pip install requests
- run: python identify_visitors.py
env:
CLEARBIT_API_KEY: ${{ secrets.CLEARBIT_API_KEY }}
HUBSPOT_TOKEN: ${{ secrets.HUBSPOT_TOKEN }}If your logs aren't accessible from GitHub Actions, pipe IPs to a file (S3, GCS) during the day and download it in the workflow. Or use a webhook-based approach where your server pushes IPs to an API endpoint in real time.
Breeze Intelligence alternative
If you're using HubSpot Breeze Intelligence, the IP-to-company resolution happens automatically within HubSpot — no code needed for that step.
What code adds on top of Breeze:
- Custom ICP filtering — Breeze identifies all visitors, but you may want stricter filters
- Custom properties — Enrich the auto-created records with data from your logs (pages visited, visit count, referrer)
- Routing logic — Assign companies to sales reps based on territory, industry, or company size
# Example: Enrich Breeze-created companies with visit metadata
# Poll for recently created companies and update them
resp = requests.post(
"https://api.hubapi.com/crm/v3/objects/companies/search",
headers=HS_HEADERS,
json={
"filterGroups": [{"filters": [{
"propertyName": "createdate",
"operator": "GTE",
"value": str(twenty_four_hours_ago_ms),
}]}],
"properties": ["domain", "name"],
"limit": 100,
},
)Cost
- Hosting: Free on GitHub Actions or ~$5/mo on Railway
- Clearbit Reveal (legacy): Volume-based pricing, typically starting ~$99/mo for 2,500 lookups
- Breeze Intelligence: Included with HubSpot Professional+, priced per credit
- HubSpot API: Free with any plan that supports private apps
Need help implementing this?
We build and optimize automation systems for mid-market businesses. Let's discuss the right approach for your team.