Identify website visitors with Clearbit Reveal and create HubSpot companies using an agent skill

medium complexityCost: Usage-based

Prerequisites

Prerequisites
  • Claude Code or another agent that supports the Agent Skills standard
  • Clearbit Reveal API key stored as CLEARBIT_API_KEY environment variable (legacy) or HubSpot Breeze Intelligence enabled
  • HubSpot private app token stored as HUBSPOT_TOKEN environment variable
  • A file or directory containing visitor IP addresses (server logs, CSV export, etc.)
Clearbit is now Breeze Intelligence

Clearbit was acquired by HubSpot and rebranded as Breeze Intelligence. The standalone API is being sunset. This skill works with the legacy Clearbit Reveal API. If you're using Breeze, visitor identification happens natively in HubSpot — use this skill for batch processing of historical log data or as a supplement.

Overview

This agent skill takes a log file or CSV of visitor IP addresses, resolves them to companies via Clearbit Reveal, filters for ICP matches, and creates new company records in HubSpot. Ideal for batch processing a day's worth of visitor data on demand.

Step 1: Create the skill directory

mkdir -p .claude/skills/identify-visitors/scripts

Step 2: Write the SKILL.md file

Create .claude/skills/identify-visitors/SKILL.md:

---
name: identify-visitors
description: Resolve website visitor IPs to companies via Clearbit Reveal and create matching companies in HubSpot.
disable-model-invocation: true
allowed-tools: Bash(python *)
---
 
Identify companies from website visitor IPs and add them to HubSpot.
 
Usage: /identify-visitors <path_to_log_or_csv>
 
1. Run: `python $SKILL_DIR/scripts/reveal_visitors.py --input "$1"`
2. Review the output showing resolved companies and ICP matches
3. Confirm new companies were created in HubSpot

Step 3: Write the script

Create .claude/skills/identify-visitors/scripts/reveal_visitors.py:

#!/usr/bin/env python3
"""
Visitor Identification: Clearbit Reveal → HubSpot
Resolves visitor IPs to companies, filters by ICP, creates HubSpot records.
"""
import argparse
import csv
import os
import re
import sys
import time
 
try:
    import requests
except ImportError:
    os.system("pip install requests -q")
    import requests
 
# --- Config ---
CLEARBIT_API_KEY = os.environ.get("CLEARBIT_API_KEY")
HUBSPOT_TOKEN = os.environ.get("HUBSPOT_TOKEN")
 
if not all([CLEARBIT_API_KEY, HUBSPOT_TOKEN]):
    print("ERROR: Set CLEARBIT_API_KEY and HUBSPOT_TOKEN environment variables")
    sys.exit(1)
 
HS_HEADERS = {"Authorization": f"Bearer {HUBSPOT_TOKEN}", "Content-Type": "application/json"}
CB_HEADERS = {"Authorization": f"Bearer {CLEARBIT_API_KEY}"}
 
MIN_EMPLOYEES = 50  # Adjust to your ICP
 
# --- Functions ---
def extract_ips(input_path):
    """Extract unique IPs from a log file or CSV."""
    ips = set()
    with open(input_path) as f:
        for line in f:
            # Match IPv4 addresses
            found = re.findall(r'\b(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\b', line)
            ips.update(found)
 
    # Filter out private/reserved IPs
    public_ips = set()
    for ip in ips:
        octets = [int(o) for o in ip.split(".")]
        if octets[0] == 10:
            continue
        if octets[0] == 172 and 16 <= octets[1] <= 31:
            continue
        if octets[0] == 192 and octets[1] == 168:
            continue
        if octets[0] == 127:
            continue
        public_ips.add(ip)
 
    return public_ips
 
 
def reveal_company(ip):
    """Resolve IP to company via Clearbit Reveal."""
    resp = requests.get(
        "https://reveal.clearbit.com/v1/companies/find",
        params={"ip": ip},
        headers=CB_HEADERS,
    )
    if resp.status_code == 404:
        return None
    resp.raise_for_status()
 
    company = resp.json().get("company")
    if not company or company.get("type") != "company":
        return None
 
    return {
        "domain": company.get("domain"),
        "name": company.get("name"),
        "industry": company.get("category", {}).get("industry"),
        "employees": company.get("metrics", {}).get("employees"),
        "city": company.get("geo", {}).get("city"),
        "state": company.get("geo", {}).get("state"),
        "country": company.get("geo", {}).get("country"),
    }
 
 
def company_exists(domain):
    """Check if company exists in HubSpot by domain."""
    resp = requests.post(
        "https://api.hubapi.com/crm/v3/objects/companies/search",
        headers=HS_HEADERS,
        json={"filterGroups": [{"filters": [{
            "propertyName": "domain", "operator": "EQ", "value": domain,
        }]}]},
    )
    resp.raise_for_status()
    return len(resp.json().get("results", [])) > 0
 
 
def create_company(company):
    """Create company in HubSpot."""
    resp = requests.post(
        "https://api.hubapi.com/crm/v3/objects/companies",
        headers=HS_HEADERS,
        json={"properties": {
            "domain": company["domain"],
            "name": company["name"],
            "industry": company.get("industry", ""),
            "numberofemployees": str(company.get("employees", "")),
            "city": company.get("city", ""),
            "state": company.get("state", ""),
            "country": company.get("country", ""),
        }},
    )
    resp.raise_for_status()
    return resp.json()["id"]
 
 
# --- Main ---
parser = argparse.ArgumentParser()
parser.add_argument("--input", required=True, help="Path to log file or CSV with visitor IPs")
parser.add_argument("--min-employees", type=int, default=50, help="Minimum employee count for ICP")
args = parser.parse_args()
 
MIN_EMPLOYEES = args.min_employees
 
ips = extract_ips(args.input)
print(f"Found {len(ips)} unique public IPs\n")
 
created = 0
already_exists = 0
no_match = 0
unresolved = 0
seen_domains = set()
 
for ip in sorted(ips):
    company = reveal_company(ip)
    time.sleep(0.1)  # Respect rate limits
 
    if not company:
        unresolved += 1
        continue
 
    domain = company["domain"]
    if domain in seen_domains:
        continue
    seen_domains.add(domain)
 
    employees = company.get("employees") or 0
    if employees < MIN_EMPLOYEES:
        no_match += 1
        continue
 
    if company_exists(domain):
        print(f"  EXISTS: {company['name']} ({domain})")
        already_exists += 1
        continue
 
    company_id = create_company(company)
    print(f"  CREATED: {company['name']}{employees} employees — ID: {company_id}")
    created += 1
 
print(f"\n--- Summary ---")
print(f"IPs processed: {len(ips)}")
print(f"Companies created: {created}")
print(f"Already in HubSpot: {already_exists}")
print(f"Below ICP threshold: {no_match}")
print(f"Unresolved IPs: {unresolved}")

Step 4: Run the skill

# Process yesterday's access log
/identify-visitors /var/log/nginx/access.log
 
# Process a CSV export from your analytics tool
/identify-visitors ~/downloads/visitor-ips.csv
 
# With a custom employee threshold
python .claude/skills/identify-visitors/scripts/reveal_visitors.py \
  --input /var/log/nginx/access.log --min-employees 100

Step 5: Schedule (optional)

For daily processing, schedule via cron:

# crontab — run daily at 7 AM, process previous day's log
0 7 * * * cd /path/to/project && python .claude/skills/identify-visitors/scripts/reveal_visitors.py \
  --input /var/log/nginx/access.log

When to use this approach

  • You want to batch-process a log file or CSV of IPs on demand
  • You're evaluating visitor identification before committing to Breeze Intelligence pricing
  • You need custom ICP filtering beyond what Breeze offers natively
  • You want full control over which companies get created and with what data

When to graduate to a dedicated tool

  • You need real-time identification (as visitors land, not in daily batches)
  • You want native HubSpot integration without managing API keys
  • Breeze Intelligence pricing works for your volume
Clearbit credit usage

Each IP lookup costs 1 Clearbit Reveal credit, regardless of whether it resolves to a company. A log file with 1,000 unique IPs = 1,000 credits used, even though only 200-300 will resolve. Pre-filter your IPs to exclude known consumer ISP ranges to save credits.

Need help implementing this?

We build and optimize automation systems for mid-market businesses. Let's discuss the right approach for your team.