HubSpot Contact Deduplication: How the Match Logic Works

HubSpot's native deduplication tool compares email address first, phone number second, and name plus company name third. When two contacts match on any of these signals above a confidence threshold, they surface as a suggested merge. The tool does not auto-merge; a human reviews and confirms each pair.

Published 3 July 2026 | 50+ projects delivered | Surry Hills, Sydney

What HubSpot actually checks when it looks for duplicates

Email address is the primary signal. HubSpot treats email as a unique identifier for contacts. Two contacts with the same email address will always be flagged. This is an exact match after lowercasing, so `John@Company.com` and `john@company.com` are the same contact; `john.smith@company.com` and `jsmith@company.com` are not.

At the point of capture, this exact-match logic also prevents duplicates from being created in the first place. When someone fills out a HubSpot form, the system checks whether a contact with that email already exists. If one does, HubSpot updates that existing record instead of creating a new one. This is why form submissions do not systematically generate duplicates the way manual imports do.
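
To make the update-instead-of-create behaviour concrete, here is a minimal sketch of an email-keyed upsert. It is an illustration of the logic described above, not HubSpot's implementation: the `upsert_contact` function and the in-memory `contacts` dictionary are hypothetical, and trimming whitespace is our assumption (the only normalisation described above is lowercasing).

```python
def normalise_email(email: str) -> str:
    """Lowercase the address; trimming whitespace is an extra assumption."""
    return email.strip().lower()


def upsert_contact(contacts: dict[str, dict], email: str, properties: dict) -> str:
    """Update the contact with this email if one exists, otherwise create it.

    Returns "updated" or "created" so the caller can see which path ran.
    """
    key = normalise_email(email)
    if key in contacts:
        contacts[key].update(properties)
        return "updated"
    contacts[key] = {"email": key, **properties}
    return "created"


# Hypothetical in-memory stand-in for the contact database.
contacts: dict[str, dict] = {}
first = upsert_contact(contacts, "John@Company.com", {"firstname": "John"})
second = upsert_contact(contacts, "john@company.com", {"phone": "02 9000 0000"})
third = upsert_contact(contacts, "jsmith@company.com", {"firstname": "John"})
```

The first two calls land on the same record because the addresses differ only by case. The third creates a second contact: nothing in an exact match knows that `jsmith@` and `john@` are the same person.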

Phone number is the secondary signal. HubSpot normalises phone numbers before comparing them: it strips formatting characters (spaces, dashes, parentheses), removes country code prefixes, and then checks for an exact match on the resulting numeric string. A contact with `+61 2 9000 0000` and one with `02 9000 0000` can match, but only if the normalisation resolves both to the same digits, which means treating the `+61` country code and the leading `0` as equivalent. This behaviour varies slightly depending on the property in use and whether the portal is set to a specific region.
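
A rough sketch of what that normalisation has to do for the two numbers to match. This is not HubSpot's code: the `normalise_phone` function, its default country code of `61`, and the trunk prefix of `0` are assumptions chosen to fit the Australian example above.

```python
import re


def normalise_phone(raw: str, country_code: str = "61", trunk_prefix: str = "0") -> str:
    """Reduce a phone number to its national digits for exact comparison.

    Assumes a single known country code and trunk prefix (Australian
    defaults here); a real portal would need region-aware rules.
    """
    digits = re.sub(r"\D", "", raw)  # drop spaces, dashes, parentheses, "+"
    if raw.strip().startswith("+") and digits.startswith(country_code):
        digits = digits[len(country_code):]
    if digits.startswith(trunk_prefix):
        digits = digits[len(trunk_prefix):]
    return digits


international = normalise_phone("+61 2 9000 0000")
national = normalise_phone("(02) 9000-0000")
other = normalise_phone("02 9000 0001")
```

Both `international` and `national` reduce to the same digit string, so an exact comparison pairs them; `other` differs by one digit and does not match.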

Name plus company is the tertiary signal. For contacts who have no email address, or as a supplementary signal where email is present but the system wants a second opinion, HubSpot applies a similarity comparison on first name, last name, and company name. The exact algorithm is not published by HubSpot, but in practice it behaves like a fuzzy match with a high threshold. Two contacts named "John Smith" at "Acme Corp" will likely surface as a duplicate pair even if their phone numbers differ.
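
Since HubSpot does not publish the algorithm, the following is only a sketch of what "a fuzzy match with a high threshold" can look like. It uses the standard library's `difflib.SequenceMatcher` as a stand-in similarity measure; the `looks_like_duplicate` function, the property names, and the `0.9` threshold are all our assumptions.

```python
from difflib import SequenceMatcher


def _clean(value: str) -> str:
    """Lowercase and collapse whitespace so formatting does not affect the score."""
    return " ".join(value.lower().split())


def similarity(a: str, b: str) -> float:
    """Return a 0.0 to 1.0 similarity ratio between two strings."""
    return SequenceMatcher(None, _clean(a), _clean(b)).ratio()


def looks_like_duplicate(c1: dict, c2: dict, threshold: float = 0.9) -> bool:
    """Flag a pair when both full name and company name clear the threshold."""
    name1 = f"{c1['firstname']} {c1['lastname']}"
    name2 = f"{c2['firstname']} {c2['lastname']}"
    return (
        similarity(name1, name2) >= threshold
        and similarity(c1["company"], c2["company"]) >= threshold
    )


a = {"firstname": "John", "lastname": "Smith", "company": "Acme Corp"}
b = {"firstname": "john", "lastname": "smith", "company": "ACME  Corp"}
c = {"firstname": "Jane", "lastname": "Doe", "company": "Acme Corp"}
same_person = looks_like_duplicate(a, b)
different_person = looks_like_duplicate(a, c)
```

Phone numbers play no part in this check, which is why a name-plus-company pair can surface even when the numbers differ.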

These three signals combine to generate a confidence score. The Manage Duplicates tool surfaces pairs above a minimum threshold, ranked by confidence. You can find the tool under Contacts, then Actions, then Manage Duplicates.

Where duplicates come from in the first place

The most common sources:

Imports without a deduplication pass. When you import a CSV, HubSpot checks email addresses in the import file against existing contacts and updates the existing record when it finds a match. If the import file itself contains two rows with the same email, only one record survives. But if the file contains two rows for the same person with no email, or with different emails that belong to one person (a work alias and a personal address, say), both get created.

Integrations that do not use email as the join key. Some CRM integrations, event platforms, and marketing tools pass contacts without email addresses, or with email addresses that are slightly different from what already exists in HubSpot. Each API call creates a new contact unless your integration is explicitly looking up and matching first.

Manual data entry. People create contacts by hand. They do not check for an existing record first, or they use a different name format, or they enter a mobile number instead of an office number. The portal accumulates variations.

Salesforce sync. If you run HubSpot alongside Salesforce, the sync relationship maps on email by default, but edge cases exist. Leads and Contacts in Salesforce that share an email but have different record types, or that were created before the sync was active, can arrive in HubSpot as separate records.

Unsubscribed or bounced contacts re-entering via a new form. A contact who previously unsubscribed, or whose email bounced, may re-engage via a different email address. HubSpot will not match these, because the email is different. You end up with two valid contacts representing the same person at different points in time.
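
For the import case in particular, a pre-import audit of the file catches the two patterns described above before they reach the portal. The sketch below is one way to do it with the standard library; the `audit_import_rows` function and the `Email` column name are assumptions about your file, not a HubSpot feature.

```python
import csv
import io
from collections import defaultdict


def audit_import_rows(csv_text: str, email_column: str = "Email") -> dict:
    """Report rows that share an email and rows that have no email at all.

    Row numbers are spreadsheet-style: the header is row 1.
    """
    rows_by_email = defaultdict(list)
    missing_email_rows = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for row_number, row in enumerate(reader, start=2):
        email = (row.get(email_column) or "").strip().lower()
        if email:
            rows_by_email[email].append(row_number)
        else:
            missing_email_rows.append(row_number)
    repeated = {e: rows for e, rows in rows_by_email.items() if len(rows) > 1}
    return {"repeated_emails": repeated, "missing_email_rows": missing_email_rows}


sample = """Email,First Name,Last Name
john@company.com,John,Smith
John@Company.com,J.,Smith
,Jane,Doe
"""
report = audit_import_rows(sample)
```

Rows listed under `repeated_emails` will collapse into one record on import, so decide which row's values should win before uploading. Rows under `missing_email_rows` are the ones that will create new contacts with no email to match on.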

How the merge works: winner, loser, and what actually happens to data

When you confirm a merge in HubSpot, one contact becomes the primary record (the winner) and one is absorbed (the loser). The loser record is deleted; its data is folded into the winner.

Property values: By default, HubSpot keeps the winner's property values. Where the winner has no value for a property and the loser does, the loser's value populates that field. You can override this during merge review by selecting which value to keep for each property, though the review screen shows only a subset of fields.

Timeline activity: Activity from both records merges into a single timeline on the winner. Emails, calls, notes, and form submissions from the loser are all retained. This is the main reason merging is preferable to deleting the duplicate: you keep the full history.

Associations: Deals, companies, tickets, and conversations associated with the loser are re-associated to the winner. If the same deal was associated with both (which should not happen but occasionally does), it appears once on the merged record.

List memberships: Active list memberships transfer. Static list memberships from the loser are typically not transferred; the contact is removed from those lists. This matters if you are using static lists for segmentation or suppression.

HubSpot Score: The merged contact's score is recalculated based on the current criteria, not inherited from either record.
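
The property rule is the easiest of these to reason about in code. The sketch below models only that rule (winner's values kept, loser fills the blanks, reviewer overrides applied last); timeline, associations, list memberships, and scoring are out of scope. The `merge_properties` function is hypothetical and is not how HubSpot exposes merging.

```python
from typing import Optional


def merge_properties(winner: dict, loser: dict, overrides: Optional[dict] = None) -> dict:
    """Resolve property values for a merged contact.

    1. Start from the winner's values.
    2. Fill any blank or missing winner property from the loser.
    3. Apply explicit choices made during merge review.
    """
    merged = dict(winner)
    for name, value in loser.items():
        if merged.get(name) in (None, ""):
            merged[name] = value
    merged.update(overrides or {})
    return merged


winner = {"email": "john@company.com", "phone": "", "jobtitle": "Director"}
loser = {"email": "jsmith@company.com", "phone": "02 9000 0000", "jobtitle": "Manager", "city": "Sydney"}
default_merge = merge_properties(winner, loser)
reviewed_merge = merge_properties(winner, loser, overrides={"jobtitle": "Manager"})
```

In `default_merge` the winner keeps its email and job title, while the empty phone and the missing city come from the loser. `reviewed_merge` shows a reviewer choosing the loser's job title instead.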

What changes by tier

Free and Starter accounts have access to the Manage Duplicates tool for manual review. The tool surfaces up to 500 suggested pairs at a time, ranked by confidence. You confirm or dismiss each pair. There is no bulk merge and no automation.

Professional and Enterprise accounts can build workflows that automate parts of the deduplication process. Custom code actions, an Operations Hub feature at these tiers, let you write logic that identifies and merges records programmatically, based on criteria you define. This is how teams handle high-volume imports or integrations that regularly introduce duplicates.

Enterprise accounts also have the option of working with HubSpot's professional services team on bulk data operations, though this is a paid engagement.

If you are on Starter and running into the 500-pair limit regularly, that is a signal that the upstream data problem needs fixing, not just the deduplication queue.

Where the native tool fails

Company deduplication is separate. The Manage Duplicates tool has tabs for Contacts and for Companies. The company matching logic uses company name and domain. If two company records exist with different domains, they will not be surfaced as duplicates even if they represent the same organisation. You need to find and merge these manually, or use an integration.

No cross-object deduplication. The tool compares records within a single object type, so a Contact and a Lead that represent the same person are never surfaced as a pair. In a portal that has run alongside Salesforce, you may also find contacts created from form fills and contacts synced from Salesforce that represent the same person. The deduplication tool will not catch these automatically if the email addresses differ slightly.

The 500-pair limit is a rolling queue, not a backlog. Once you dismiss a suggested pair, it does not resurface unless the matching signals change. If you dismiss a pair incorrectly, recovering that suggestion requires manual investigation.

Name-only duplicates with no email are difficult. A contact with no email who was created twice with slightly different name spellings may or may not surface depending on whether the fuzzy match threshold is met. In practice, name-only contacts with low data completeness generate fewer suggestions and get missed.

Common failures we see

Clients who run imports without cleaning the source data first. They bring in a file with no email addresses, or with email addresses from a previous system that have since changed. The deduplication tool catches what it can; the rest sit in the portal undetected.

Clients who run form captures and Salesforce sync in parallel without checking the field mapping. A form captures a mobile number; Salesforce syncs an office number. Two contacts are created for the same person because the email addresses differ by a single character, and the mismatched phone numbers give the tool no second signal to pair them on.

Clients who merge records without checking static list membership first. They merge a contact into a primary record, then find a suppression list is missing that person because the loser's list memberships did not transfer.
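
The last failure is cheap to guard against with a pre-merge check. The sketch below assumes you have exported static list memberships into a dictionary of list name to contact IDs; that data shape and the `lists_at_risk` function are hypothetical.

```python
def lists_at_risk(static_lists: dict[str, set[str]], winner_id: str, loser_id: str) -> list[str]:
    """Return static lists the loser is on but the winner is not.

    These are the memberships that would be lost if the loser's static
    list memberships do not transfer on merge.
    """
    return sorted(
        name
        for name, members in static_lists.items()
        if loser_id in members and winner_id not in members
    )


# Hypothetical export: list name -> IDs of contacts on that list.
static_lists = {
    "Suppression: do not email": {"201", "305"},
    "Event attendees 2025": {"101", "201"},
    "Newsletter seed list": {"101"},
}
at_risk = lists_at_risk(static_lists, winner_id="101", loser_id="201")
```

Anything returned here is a list to re-add the winner to after the merge. In this sample only the suppression list is at risk, which is exactly the case that causes trouble.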

What to ship your team

If your portal has a backlog of duplicates, start with the Manage Duplicates tool and clear the high-confidence suggestions first. Set a regular cadence: once a month is enough for most portals.

For portals running integrations that regularly introduce contacts, add a deduplication check to the intake process. This is a workflow that runs when a contact is created via API, checks for matching email or phone in existing records, and routes a task to a team member to review before the new record sits in the database for six months unnoticed.
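
The matching step of that intake check can be sketched as a pure function, which keeps it easy to test outside HubSpot. Everything named here is an assumption for illustration: the `find_possible_duplicates` function, the `id`, `email`, and `phone` keys, and the digits-only phone comparison (a real check would want region-aware normalisation).

```python
import re


def digits_only(phone) -> str:
    """Strip everything except digits; treats None as an empty string."""
    return re.sub(r"\D", "", phone or "")


def find_possible_duplicates(new_contact: dict, existing: list[dict]) -> list[dict]:
    """Return existing contacts that share an email or phone with the new one."""
    email = (new_contact.get("email") or "").strip().lower()
    phone = digits_only(new_contact.get("phone"))
    matches = []
    for contact in existing:
        if contact.get("id") == new_contact.get("id"):
            continue  # never match a record against itself
        reasons = []
        if email and email == (contact.get("email") or "").strip().lower():
            reasons.append("email")
        if phone and phone == digits_only(contact.get("phone")):
            reasons.append("phone")
        if reasons:
            matches.append({"existing_id": contact["id"], "matched_on": reasons})
    return matches


existing = [
    {"id": "101", "email": "john@company.com", "phone": "02 9000 0000"},
    {"id": "102", "email": "jane@company.com", "phone": None},
]
incoming = {"id": "555", "email": None, "phone": "(02) 9000-0000"}
review_queue = find_possible_duplicates(incoming, existing)
```

A non-empty result is the trigger for the review task. Returning the matched signal alongside the ID gives the reviewer the reason for the flag, not just the pair.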

If you are on Professional or Enterprise, consider a custom code action that runs a merge on creation for contacts that match an existing record's email. This keeps the queue manageable and removes the manual step for clear-cut matches.
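
A hedged sketch of the core of such an action follows. It builds the merge request but does not send it: the `send` callable is injected, so nothing here depends on a particular HTTP client. The endpoint path and the `primaryObjectId` / `objectIdToMerge` field names reflect our understanding of HubSpot's CRM v3 contacts merge API and should be verified against the current API reference before use. The function names and the contact dictionaries are hypothetical.

```python
import json
from typing import Callable, Optional

# Assumption: verify this path and the body field names against HubSpot's API reference.
MERGE_PATH = "/crm/v3/objects/contacts/merge"


def build_merge_request(winner_id: str, loser_id: str) -> dict:
    """Describe the merge call; the existing record is kept as the winner."""
    if winner_id == loser_id:
        raise ValueError("cannot merge a contact into itself")
    body = {"primaryObjectId": winner_id, "objectIdToMerge": loser_id}
    return {"method": "POST", "path": MERGE_PATH, "body": json.dumps(body)}


def merge_on_email_match(
    new_contact: dict, existing: list[dict], send: Callable[[dict], None]
) -> Optional[str]:
    """Merge the new contact into the first existing record with the same email.

    Returns the winner's ID, or None when there is no clear-cut match.
    """
    email = (new_contact.get("email") or "").strip().lower()
    if not email:
        return None
    for contact in existing:
        same_email = (contact.get("email") or "").strip().lower() == email
        if same_email and contact["id"] != new_contact["id"]:
            send(build_merge_request(contact["id"], new_contact["id"]))
            return contact["id"]
    return None


sent: list[dict] = []  # stand-in for a real HTTP call
existing = [{"id": "101", "email": "john@company.com"}]
winner = merge_on_email_match({"id": "555", "email": "John@Company.com"}, existing, sent.append)
```

Keeping the older record as the winner is a design choice, not a rule: it tends to hold the longer history. Merges cannot be undone, so restrict automation like this to exact email matches and leave phone and name matches to human review.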

For company deduplication, run a separate audit. Export your company records, identify duplicates by domain, and merge them before they accumulate deal and contact associations that make merging complicated.
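
The "identify duplicates by domain" step can be sketched as a grouping pass over the export. The `id` and `domain` keys and both function names are assumptions about how you load the exported rows; the domain clean-up rules are deliberately simple.

```python
from collections import defaultdict


def normalise_domain(domain: str) -> str:
    """Lowercase and strip scheme, leading "www." and any path."""
    cleaned = domain.strip().lower()
    for prefix in ("https://", "http://"):
        if cleaned.startswith(prefix):
            cleaned = cleaned[len(prefix):]
    if cleaned.startswith("www."):
        cleaned = cleaned[len("www."):]
    return cleaned.split("/")[0]


def duplicate_companies_by_domain(companies: list[dict]) -> dict[str, list[str]]:
    """Group company IDs by normalised domain and keep groups larger than one."""
    groups = defaultdict(list)
    for company in companies:
        domain = normalise_domain(company.get("domain") or "")
        if domain:  # companies with no domain cannot be grouped this way
            groups[domain].append(company["id"])
    return {domain: ids for domain, ids in groups.items() if len(ids) > 1}


companies = [
    {"id": "9001", "name": "Acme Corp", "domain": "acme.com"},
    {"id": "9002", "name": "ACME Corporation", "domain": "https://www.Acme.com/"},
    {"id": "9003", "name": "Globex", "domain": "globex.com"},
    {"id": "9004", "name": "Initech", "domain": ""},
]
duplicates = duplicate_companies_by_domain(companies)
```

This only finds records that share a domain once the formatting noise is removed. Companies that genuinely use different domains still need a manual pass.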

How this fits with the rest of your HubSpot data model

Contact deduplication is one part of a broader data quality practice. The other parts are property cleanup (removing unused and redundant fields), association mapping (making sure contacts link to the right companies, deals, and tickets), and import governance (setting standards for what data must be present before a contact enters HubSpot).

Duplicate contacts are usually a symptom of an intake problem. The deduplication tool resolves the symptom; fixing the intake prevents recurrence.

If you are managing a HubSpot portal at scale and running into recurring duplicate problems, the source is almost always one of three things: an integration that does not match on email, an import process that skips the clean-up step, or a Salesforce sync that was configured before the email field was standardised. Fix those, and the deduplication queue stays clear.

Related reading:

HubSpot calculation properties: how they work and when to use them

HubSpot company hierarchies: parent, child, and subsidiary records

HubSpot workflow re-evaluation: how the 30-minute cadence affects your automations
