Identify website visitors with Clearbit Reveal and create HubSpot companies using code

High complexity | Cost: $0 | Recommended

Prerequisites
  • Python 3.9+ or Node.js 18+
  • Clearbit Reveal API key (legacy) or HubSpot account with Breeze Intelligence add-on
  • HubSpot private app token with crm.objects.companies.read and crm.objects.companies.write scopes
  • Access to server logs or an analytics pipeline that captures visitor IP addresses
Clearbit is now Breeze Intelligence

Clearbit was acquired by HubSpot and rebranded as Breeze Intelligence. The standalone Reveal API is being sunset. This guide covers both the legacy API approach (for existing Clearbit customers) and the HubSpot-native Breeze approach. New users should start with Breeze.

Step 1: Set up the project

# Test your Clearbit API key
curl -s "https://reveal.clearbit.com/v1/companies/find?ip=203.0.113.42" \
  -H "Authorization: Bearer $CLEARBIT_API_KEY" | head -c 300
 
# Test your HubSpot token
curl -s "https://api.hubapi.com/crm/v3/objects/companies?limit=1" \
  -H "Authorization: Bearer $HUBSPOT_TOKEN" | head -c 200
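
Before running the Python steps, it can help to fail fast when a credential is missing. The following is a minimal sketch that assumes the two environment variable names used in the curl commands above; `missing_env` is a hypothetical helper, not part of either API.

```python
import os

REQUIRED_VARS = ("CLEARBIT_API_KEY", "HUBSPOT_TOKEN")


def missing_env(required=REQUIRED_VARS, env=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


# Example with an explicit mapping instead of the real environment
print(missing_env(env={"CLEARBIT_API_KEY": "sk_test"}))  # ['HUBSPOT_TOKEN']
```

Called with no arguments, the helper checks the real process environment, so a script could stop early when the returned list is not empty.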

Step 2: Extract unique IPs from your logs

Before calling Clearbit, extract and deduplicate visitor IPs. This example reads from a common log format, but adapt it to your analytics pipeline.

import re
 
def extract_ips_from_log(log_path, min_visits=2):
    """Extract IPs that visited key pages multiple times (shows intent)."""
    ip_pages = {}
    target_pages = ["/pricing", "/demo", "/contact", "/enterprise"]
 
    with open(log_path) as f:
        for line in f:
            # Matches IPv4 clients only; assumes the IP is the first field on the line
            match = re.match(r'^(\d+\.\d+\.\d+\.\d+).*"GET (\S+)', line)
            if not match:
                continue
            ip, page = match.groups()
            if any(page.startswith(p) for p in target_pages):
                ip_pages.setdefault(ip, []).append(page)
 
    # Only return IPs with multiple visits to high-intent pages
    return {ip: pages for ip, pages in ip_pages.items() if len(pages) >= min_visits}
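
The regular expression in the function above matches IPv4 addresses only. If your logs also contain IPv6 clients, one option is to validate the first field with the standard `ipaddress` module instead. This is a sketch under two assumptions: the client address is the first whitespace-separated field, and the request appears as a quoted `"GET /path ..."` string. `parse_log_line` is a hypothetical name.

```python
import ipaddress
import re

REQUEST_RE = re.compile(r'"GET (\S+)')


def parse_log_line(line):
    """Return (ip, path) for a GET request line, or None if it does not parse."""
    parts = line.split(maxsplit=1)
    if len(parts) < 2:
        return None
    try:
        # Accepts both IPv4 and IPv6; raises ValueError for anything else
        ip = ipaddress.ip_address(parts[0])
    except ValueError:
        return None
    match = REQUEST_RE.search(parts[1])
    if not match:
        return None
    return str(ip), match.group(1)


sample = '2001:db8::1 - - [10/Oct/2024:13:55:36 +0000] "GET /pricing HTTP/1.1" 200 512'
print(parse_log_line(sample))  # ('2001:db8::1', '/pricing')
```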

Step 3: Resolve IPs to companies via Clearbit Reveal

import requests
import os
import time
 
CLEARBIT_API_KEY = os.environ["CLEARBIT_API_KEY"]
HUBSPOT_TOKEN = os.environ["HUBSPOT_TOKEN"]
HS_HEADERS = {"Authorization": f"Bearer {HUBSPOT_TOKEN}", "Content-Type": "application/json"}
 
def reveal_company(ip):
    """Resolve an IP to a company via Clearbit Reveal. Returns None if there is no match."""
    resp = requests.get(
        "https://reveal.clearbit.com/v1/companies/find",
        params={"ip": ip},
        headers={"Authorization": f"Bearer {CLEARBIT_API_KEY}"},
        timeout=10,
    )
    if resp.status_code == 404:
        return None
    resp.raise_for_status()
    data = resp.json()

    # The result type ("company", "education", "government", "isp") is a
    # top-level field; the nested company object's own "type" means something else.
    company = data.get("company")
    if not company or data.get("type") != "company":
        return None

    # Nested objects can be null in the response, so fall back to empty dicts
    category = company.get("category") or {}
    metrics = company.get("metrics") or {}
    geo = company.get("geo") or {}

    return {
        "domain": company.get("domain"),
        "name": company.get("name"),
        "industry": category.get("industry"),
        "employees": metrics.get("employees"),
        "city": geo.get("city"),
        "state": geo.get("state"),
        "country": geo.get("country"),
        "description": company.get("description"),
    }
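
The lookup function above makes a single attempt and raises on any error other than 404. When resolving many IPs in a row, rate limiting is a likely failure mode. The sketch below retries with exponential backoff. It assumes the API signals rate limiting with HTTP 429 and may send a `Retry-After` header in seconds; check your plan's documentation for the actual limits. `get_with_backoff` is a hypothetical helper that takes the request as a callable, so it is not tied to one endpoint.

```python
import time


def get_with_backoff(fetch, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call fetch() until it returns a non-429 response or attempts run out.

    fetch is any zero-argument callable returning an object with
    .status_code and .headers (for example a requests.Response).
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        resp = fetch()
        # Return on success, on other errors, or when out of attempts
        if resp.status_code != 429 or attempt == max_attempts - 1:
            return resp
        # Prefer the server's hint when it is a plain number of seconds
        retry_after = resp.headers.get("Retry-After", "")
        if retry_after.isdigit():
            delay = float(retry_after)
        else:
            delay = base_delay * 2 ** attempt
        sleep(delay)
```

With `requests`, the call could be wrapped as `get_with_backoff(lambda: requests.get(url, params={"ip": ip}, headers=headers, timeout=10))`, where `url` and `headers` are the values already used in the lookup function.
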
Expect a low match rate

Expect only about 20-30% of B2B visitor IPs to resolve to a company. Traffic from consumer ISPs (Comcast, AT&T), VPNs, and mobile carriers generally does not resolve to a company at all. Filter your IP list down to corporate-looking traffic before calling the API to save credits.
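
One cheap pre-filter is to drop addresses that can never resolve to a company: private, loopback, link-local, and other non-routable ranges. The sketch below uses only the standard `ipaddress` module. It does not detect consumer ISPs or VPNs, which would need an ASN or ISP dataset that is outside the scope of this guide. `is_lookup_candidate` is a hypothetical name.

```python
import ipaddress


def is_lookup_candidate(ip_text):
    """Return True for syntactically valid, globally routable addresses."""
    try:
        ip = ipaddress.ip_address(ip_text)
    except ValueError:
        return False
    # is_global is False for private, loopback, link-local and reserved ranges
    return ip.is_global


raw_ips = ["8.8.8.8", "10.0.0.5", "127.0.0.1", "bogus"]
candidates = [ip for ip in raw_ips if is_lookup_candidate(ip)]
print(candidates)  # ['8.8.8.8']
```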

Step 4: Filter for ICP and deduplicate against HubSpot

def matches_icp(company, min_employees=50):
    """Check if a resolved company matches your ICP criteria."""
    if not company.get("domain"):
        return False
    employees = company.get("employees") or 0
    return employees >= min_employees
 
 
def company_exists_in_hubspot(domain):
    """Return the HubSpot ID of the company with this domain, or None if there is none."""
    resp = requests.post(
        "https://api.hubapi.com/crm/v3/objects/companies/search",
        headers=HS_HEADERS,
        json={
            "filterGroups": [{"filters": [{
                "propertyName": "domain",
                "operator": "EQ",
                "value": domain,
            }]}],
        },
    )
    resp.raise_for_status()
    results = resp.json().get("results", [])
    return results[0]["id"] if results else None
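
The ICP check above only looks at employee count. If your ICP also depends on geography or industry, the same idea extends to a small criteria object. This is a sketch that assumes the dictionary shape returned by `reveal_company` (`domain`, `employees`, `country`, `industry`). `IcpCriteria` and `matches_criteria` are hypothetical names, and the country and industry strings are illustrative values, not a list of what the API returns.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IcpCriteria:
    min_employees: int = 50
    countries: frozenset = frozenset()            # empty means any country
    excluded_industries: frozenset = frozenset()  # empty means none excluded


def matches_criteria(company, criteria):
    """Check a resolved company dict against the given ICP criteria."""
    if not company.get("domain"):
        return False
    if (company.get("employees") or 0) < criteria.min_employees:
        return False
    if criteria.countries and company.get("country") not in criteria.countries:
        return False
    return company.get("industry") not in criteria.excluded_industries


criteria = IcpCriteria(
    min_employees=100,
    countries=frozenset({"United States"}),
    excluded_industries=frozenset({"Education Services"}),
)
example = {"domain": "example.com", "employees": 250,
           "country": "United States", "industry": "Internet Software & Services"}
print(matches_criteria(example, criteria))  # True
```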

Step 5: Create companies in HubSpot

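def create_hubspot_company(company, pages_visited):
    """Create a new company in HubSpot from a resolved Clearbit record.

    pages_visited is not written to HubSpot here; storing it would
    require a custom company property in your portal.
    """
    fields = {
        "domain": company["domain"],
        "name": company["name"],
        # HubSpot's built-in industry property is an enumeration, so a
        # free-text Clearbit value may be rejected; map it first if needed.
        "industry": company.get("industry"),
        "numberofemployees": company.get("employees"),
        "city": company.get("city"),
        "state": company.get("state"),
        "country": company.get("country"),
        "description": company.get("description"),
    }
    # Drop missing values so None is never sent as the string "None"
    properties = {k: str(v) for k, v in fields.items() if v not in (None, "")}
    resp = requests.post(
        "https://api.hubapi.com/crm/v3/objects/companies",
        headers=HS_HEADERS,
        json={"properties": properties},
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()["id"]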
 
 
# --- Main execution ---
ip_pages = extract_ips_from_log("/var/log/nginx/access.log")
print(f"Found {len(ip_pages)} IPs with high-intent visits")
 
created = 0
skipped = 0
unresolved = 0
 
for ip, pages in ip_pages.items():
    company = reveal_company(ip)
    if not company:
        unresolved += 1
        continue
 
    if not matches_icp(company):
        skipped += 1
        continue
 
    existing = company_exists_in_hubspot(company["domain"])
    if existing:
        print(f"  EXISTS: {company['name']} ({company['domain']})")
        skipped += 1
        continue
 
    company_id = create_hubspot_company(company, pages)
    print(f"  CREATED: {company['name']} ({company.get('employees', '?')} employees), visited {', '.join(pages)}")
    created += 1
    time.sleep(0.2)
 
print(f"\nDone. Created: {created}, Skipped: {skipped}, Unresolved: {unresolved}")
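
Several IPs often belong to the same company, and a record created a moment ago may not show up in HubSpot search results right away, so the loop above can create the same company twice in one run. One way to reduce that risk is to resolve all IPs first and merge them by domain before touching HubSpot. The sketch below assumes a `resolve` callable with the same contract as `reveal_company` (a dict with a `domain` key, or `None`). `group_by_domain` is a hypothetical helper.

```python
def group_by_domain(ip_pages, resolve):
    """Merge per-IP page visits into one entry per resolved company domain."""
    companies = {}
    unresolved = []
    for ip, pages in ip_pages.items():
        company = resolve(ip)
        if not company or not company.get("domain"):
            unresolved.append(ip)
            continue
        # Normalize case so Example.com and example.com share one entry
        domain = company["domain"].lower()
        entry = companies.setdefault(
            domain, {"company": company, "ips": [], "pages": []}
        )
        entry["ips"].append(ip)
        entry["pages"].extend(pages)
    return companies, unresolved


# Stand-in resolver for illustration; in the script this would be reveal_company
fake_results = {
    "198.51.100.7": {"domain": "Example.com", "name": "Example"},
    "198.51.100.8": {"domain": "example.com", "name": "Example"},
}
grouped, missed = group_by_domain(
    {"198.51.100.7": ["/pricing"], "198.51.100.8": ["/demo"], "192.0.2.1": ["/contact"]},
    fake_results.get,
)
print(list(grouped), missed)  # ['example.com'] ['192.0.2.1']
```

The main loop would then iterate over `grouped.values()` and run the ICP and HubSpot checks once per domain.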

Step 6: Schedule with cron or GitHub Actions

# .github/workflows/identify-visitors.yml
name: Identify Website Visitors
on:
  schedule:
    - cron: '0 8 * * *'  # Daily at 8 AM UTC
  workflow_dispatch: {}
jobs:
  run:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.12'
      - run: pip install requests
      - run: python identify_visitors.py
        env:
          CLEARBIT_API_KEY: ${{ secrets.CLEARBIT_API_KEY }}
          HUBSPOT_TOKEN: ${{ secrets.HUBSPOT_TOKEN }}
Log access in CI

If your logs aren't accessible from GitHub Actions, pipe IPs to a file (S3, GCS) during the day and download it in the workflow. Or use a webhook-based approach where your server pushes IPs to an API endpoint in real time.
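
For the file-based option, the workflow needs to turn the downloaded export back into the `{ip: [pages]}` shape used earlier. The sketch below assumes a hypothetical tab-separated file with one `ip<TAB>path` pair per line. The download step (S3, GCS) is omitted, and `load_ip_pages` is an illustrative name.

```python
import csv
import io


def load_ip_pages(lines, min_visits=2):
    """Build {ip: [pages]} from tab-separated 'ip<TAB>path' lines."""
    ip_pages = {}
    for row in csv.reader(lines, delimiter="\t"):
        # Skip blank or malformed rows rather than failing the whole run
        if len(row) < 2 or not row[0].strip():
            continue
        ip_pages.setdefault(row[0].strip(), []).append(row[1].strip())
    return {ip: pages for ip, pages in ip_pages.items() if len(pages) >= min_visits}


sample = io.StringIO(
    "198.51.100.7\t/pricing\n"
    "198.51.100.7\t/demo\n"
    "192.0.2.9\t/contact\n"
)
print(load_ip_pages(sample))  # {'198.51.100.7': ['/pricing', '/demo']}
```

With a real file, pass an open file object instead of the `StringIO`, opened with `newline=""` as the `csv` module recommends.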

Breeze Intelligence alternative

If you're using HubSpot Breeze Intelligence, the IP-to-company resolution happens automatically within HubSpot — no code needed for that step.

What code adds on top of Breeze:

  1. Custom ICP filtering — Breeze identifies all visitors, but you may want stricter filters
  2. Custom properties — Enrich the auto-created records with data from your logs (pages visited, visit count, referrer)
  3. Routing logic — Assign companies to sales reps based on territory, industry, or company size
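
# Example: Enrich Breeze-created companies with visit metadata
# Poll for recently created companies and update them
import time

# HubSpot date filters take Unix timestamps in milliseconds
twenty_four_hours_ago_ms = int((time.time() - 24 * 60 * 60) * 1000)

resp = requests.post(
    "https://api.hubapi.com/crm/v3/objects/companies/search",
    headers=HS_HEADERS,
    json={
        "filterGroups": [{"filters": [{
            "propertyName": "createdate",
            "operator": "GTE",
            "value": str(twenty_four_hours_ago_ms),
        }]}],
        "properties": ["domain", "name"],
        "limit": 100,
    },
    timeout=10,
)
resp.raise_for_status()
recent_companies = resp.json().get("results", [])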
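
The routing logic mentioned in the list above can start as a plain function that maps a company to an owner. This is a sketch only: the owner IDs, the territory table, and the 1,000-employee threshold are all hypothetical, and `pick_owner` is an illustrative name. The result would typically be written to the company's `hubspot_owner_id` property in a separate update call.

```python
# Hypothetical owner IDs; real values come from your HubSpot account
ENTERPRISE_OWNER = "1001"
TERRITORY_OWNERS = {"United States": "2001", "United Kingdom": "2002"}
DEFAULT_OWNER = "9000"


def pick_owner(company, enterprise_threshold=1000):
    """Choose an owner ID from company size first, then territory."""
    employees = company.get("employees") or 0
    if employees >= enterprise_threshold:
        return ENTERPRISE_OWNER
    # Fall back to a shared queue when the country has no assigned rep
    return TERRITORY_OWNERS.get(company.get("country"), DEFAULT_OWNER)


print(pick_owner({"employees": 5000, "country": "Germany"}))       # 1001
print(pick_owner({"employees": 120, "country": "United States"}))  # 2001
print(pick_owner({"employees": 120, "country": "Germany"}))        # 9000
```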

Cost

  • Hosting: Free on GitHub Actions or ~$5/mo on Railway
  • Clearbit Reveal (legacy): Volume-based pricing, typically starting ~$99/mo for 2,500 lookups
  • Breeze Intelligence: Included with HubSpot Professional+, priced per credit
  • HubSpot API: Free with any plan that supports private apps
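
Because most lookups do not resolve, the effective cost per identified company is higher than the per-lookup price. The sketch below works that out from the figures quoted in this guide (about $99 for 2,500 lookups, a 20-30% match rate); your actual plan and match rate will differ. `cost_per_match` is a hypothetical helper.

```python
def cost_per_match(monthly_price, lookups, match_rate):
    """Effective cost of one resolved company, given a plan price and match rate."""
    if lookups <= 0 or not 0 < match_rate <= 1:
        raise ValueError("lookups must be positive and match_rate in (0, 1]")
    return monthly_price / (lookups * match_rate)


for rate in (0.2, 0.3):
    cost = cost_per_match(99, 2500, rate)
    print(f"{rate:.0%} match rate: ${cost:.2f} per resolved company")
# 20% match rate: $0.20 per resolved company
# 30% match rate: $0.13 per resolved company
```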
