feat: implement identity stitching for session linking (#3820)

Adds automatic identity stitching (session linking), which links anonymous
browsing sessions to the same visitor's authenticated sessions.

## Changes

### Database Schema
- Add `identity_link` table (PostgreSQL + ClickHouse) to store mappings
  between visitor IDs and authenticated user IDs
- Add `visitor_id` field to `Session` model
- Add `visitor_id` column to ClickHouse `website_event` table

### Client Tracker
- Generate and persist `visitor_id` in localStorage
- Include `vid` in all tracking payloads
- Support opting out via the `data-identity-stitching="false"` attribute
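The tracker behavior above can be sketched in TypeScript. This is a minimal sketch, not the actual tracker code: the storage key `umami.visitor_id`, the helpers `isStitchingEnabled`, `getVisitorId`, and `withVisitorId`, and the injected ID generator are all hypothetical. `StorageLike` is the subset of the Web Storage API that `window.localStorage` provides, so a `Map`-backed stand-in can replace it outside a browser.

```typescript
// Subset of the Web Storage API; window.localStorage satisfies it.
interface StorageLike {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

type Payload = Record<string, unknown>;

// Hypothetical key: the commit does not name the localStorage key it uses.
const VISITOR_KEY = "umami.visitor_id";

// Stitching is on by default; only data-identity-stitching="false" turns it off.
function isStitchingEnabled(attr: string | null): boolean {
  return attr !== "false";
}

// Returns the persisted visitor ID, creating and storing one on first use.
// If storage access throws (e.g. storage is blocked), returns a fresh unpersisted ID.
function getVisitorId(storage: StorageLike, generateId: () => string): string {
  try {
    const existing = storage.getItem(VISITOR_KEY);
    if (existing) return existing;
    const id = generateId();
    storage.setItem(VISITOR_KEY, id);
    return id;
  } catch {
    return generateId();
  }
}

// Copies the payload and attaches `vid` only when a visitor ID is available.
function withVisitorId(payload: Payload, vid: string | undefined): Payload {
  return vid === undefined ? { ...payload } : { ...payload, vid };
}

// In-memory stand-in for localStorage, for use outside a browser.
class MemoryStorage implements StorageLike {
  private data = new Map<string, string>();
  getItem(key: string): string | null {
    return this.data.get(key) ?? null;
  }
  setItem(key: string, value: string): void {
    this.data.set(key, value);
  }
}

// Usage: read the attribute once, then tag every outgoing payload.
const demoStorage = new MemoryStorage();
let demoCounter = 0;
const demoVid = isStitchingEnabled(null)
  ? getVisitorId(demoStorage, () => `visitor-${++demoCounter}`)
  : undefined;
console.log(withVisitorId({ url: "/pricing" }, demoVid)); // { url: '/pricing', vid: 'visitor-1' }
```

In a browser the generator could be `() => crypto.randomUUID()`. Falling back to an unpersisted ID when storage throws is a choice made for this sketch; the commit does not say how the tracker handles unavailable storage.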

### API
- Accept `vid` parameter in `/api/send` endpoint
- Auto-create identity links when `identify()` is called with both
  `visitor_id` and `distinct_id`
- Store `visitor_id` in sessions and events
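The API rules above can be sketched as a pure function over one request. This is a minimal sketch under a hypothetical request shape: only the `vid` parameter is named in this commit, while `type`, `websiteId`, `distinctId`, `SendRequest`, and `handleSend` are illustrative names, not the actual `/api/send` handler. Mapping a missing `vid` to an empty string is an assumption chosen to match the non-nullable `visitor_id String` column in the ClickHouse diff.

```typescript
// Hypothetical request shape: only `vid` is named in the commit.
interface SendRequest {
  type: "event" | "identify";
  websiteId: string;
  vid?: string;
  distinctId?: string;
}

// Row shape mirroring the identity_link columns added in this commit.
interface IdentityLinkRow {
  website_id: string;
  visitor_id: string;
  distinct_id: string;
  linked_at: Date;
}

interface SendResult {
  // Value stored in the visitor_id column of the session/event.
  // The sketch maps "absent" to "" because the ClickHouse column is a plain String.
  visitorId: string;
  // Present only when this request should create an identity link.
  link?: IdentityLinkRow;
}

function handleSend(req: SendRequest, now: Date): SendResult {
  const visitorId = req.vid ?? "";
  const distinctId = req.distinctId ?? "";
  // A link is created only for identify calls that carry both identifiers.
  if (req.type === "identify" && visitorId !== "" && distinctId !== "") {
    return {
      visitorId,
      link: {
        website_id: req.websiteId,
        visitor_id: visitorId,
        distinct_id: distinctId,
        linked_at: now,
      },
    };
  }
  return { visitorId };
}
```

Keeping the decision in a side-effect-free function makes the "both IDs present" rule easy to test separately from the database writes.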

### Query Updates
- Update `getWebsiteStats` to deduplicate visitors by resolved identity
- Visitors who browse anonymously and then log in are now counted as one user
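The deduplication rule can be illustrated with a small in-memory model. This is not the `getWebsiteStats` query, which the diff excerpt does not show. The resolution order (explicit `distinct_id`, then a linked `distinct_id`, then the raw `visitor_id`, then the session) and the tie-break of using the most recent link when one `visitor_id` is linked to several users are assumptions of this sketch, as are all names in it.

```typescript
// Hypothetical, simplified event and link rows for a single website.
interface EventRow {
  sessionId: string;
  visitorId: string; // "" when stitching is off
  distinctId: string; // "" until identify() is called
}

interface LinkRow {
  visitorId: string;
  distinctId: string;
  linkedAt: number;
}

// Maps each visitor_id to the distinct_id of its most recent link (assumed tie-break).
function buildResolver(links: LinkRow[]): Map<string, string> {
  const latest = new Map<string, LinkRow>();
  for (const link of links) {
    const prev = latest.get(link.visitorId);
    if (prev === undefined || link.linkedAt > prev.linkedAt) latest.set(link.visitorId, link);
  }
  const resolver = new Map<string, string>();
  latest.forEach((link, visitorId) => resolver.set(visitorId, link.distinctId));
  return resolver;
}

// Prefixes keep a visitor_id from colliding with an identical distinct_id string.
function resolveIdentity(e: EventRow, resolver: Map<string, string>): string {
  if (e.distinctId) return `user:${e.distinctId}`;
  const linked = e.visitorId ? resolver.get(e.visitorId) : undefined;
  if (linked) return `user:${linked}`;
  if (e.visitorId) return `visitor:${e.visitorId}`;
  return `session:${e.sessionId}`;
}

function countUniqueVisitors(events: EventRow[], links: LinkRow[]): number {
  const resolver = buildResolver(links);
  return new Set(events.map((e) => resolveIdentity(e, resolver))).size;
}
```

For one browser with an anonymous pageview and a logged-in pageview (visitor `v1`, later linked to user `u1`) plus a second anonymous browser `v2`, the count is 3 without the link and 2 with it, because both `v1` pageviews resolve to `user:u1`.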

## Usage

When a user logs in, call `umami.identify(userId)`. If identity stitching
is enabled (the default), the tracker sends the anonymous `visitor_id`
with the identify call and the API links it to the authenticated
`userId`. Stats queries then resolve linked identities to count unique
visitors accurately.

Resolves #3820
Arthur Sepiol 2025-12-03 16:06:54 +03:00
parent 9a269ab811
commit a902a87c08
11 changed files with 245 additions and 33 deletions


```diff
@@ -39,6 +39,7 @@ CREATE TABLE umami.website_event
 event_name String,
 tag String,
 distinct_id String,
+visitor_id String,
 created_at DateTime('UTC'),
 job_id Nullable(UUID)
 )
@@ -123,6 +124,7 @@ CREATE TABLE umami.website_event_stats_hourly
 max_time SimpleAggregateFunction(max, DateTime('UTC')),
 tag SimpleAggregateFunction(groupArrayArray, Array(String)),
 distinct_id String,
+visitor_id String,
 created_at Datetime('UTC')
 )
 ENGINE = AggregatingMergeTree
@@ -176,6 +178,7 @@ SELECT
 max_time,
 tag,
 distinct_id,
+visitor_id,
 timestamp as created_at
 FROM (SELECT
 website_id,
@@ -214,6 +217,7 @@ FROM (SELECT
 max(created_at) max_time,
 arrayFilter(x -> x != '', groupArray(tag)) tag,
 distinct_id,
+visitor_id,
 toStartOfHour(created_at) timestamp
 FROM umami.website_event
 GROUP BY website_id,
@@ -230,6 +234,7 @@ GROUP BY website_id,
 city,
 event_type,
 distinct_id,
+visitor_id,
 timestamp);
 -- projections
@@ -281,3 +286,15 @@ JOIN (SELECT event_id, string_value as currency
 WHERE positionCaseInsensitive(data_key, 'currency') > 0) c
 ON c.event_id = ed.event_id
 WHERE positionCaseInsensitive(data_key, 'revenue') > 0;
+-- identity linking
+CREATE TABLE umami.identity_link
+(
+website_id UUID,
+visitor_id String,
+distinct_id String,
+linked_at DateTime('UTC')
+)
+ENGINE = ReplacingMergeTree(linked_at)
+ORDER BY (website_id, visitor_id, distinct_id)
+SETTINGS index_granularity = 8192;
```
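In ClickHouse, `ReplacingMergeTree(linked_at)` removes rows that share the sorting key `(website_id, visitor_id, distinct_id)` during background merges, keeping the row with the greatest `linked_at` (and, on a tie, the most recently inserted one). Merges run at unspecified times, so deduplication is eventual. Because `distinct_id` is part of the key, a visitor linked to two accounts keeps two rows: the table stores pairs, not a one-to-one mapping. The TypeScript below is only an in-memory model of the fully merged result, with hypothetical names; it is not ClickHouse code.

```typescript
interface IdentityLinkRow {
  websiteId: string;
  visitorId: string;
  distinctId: string;
  linkedAt: number; // epoch seconds, standing in for DateTime('UTC')
}

// Models identity_link after all merges under
// ENGINE = ReplacingMergeTree(linked_at) ORDER BY (website_id, visitor_id, distinct_id):
// one row per sorting key, the one with the greatest linkedAt;
// on a tie, the row inserted later (later in the array) wins.
function mergeIdentityLinks(rowsInInsertOrder: IdentityLinkRow[]): IdentityLinkRow[] {
  const byKey = new Map<string, IdentityLinkRow>();
  for (const row of rowsInInsertOrder) {
    // JSON encoding keeps the three key parts unambiguous.
    const key = JSON.stringify([row.websiteId, row.visitorId, row.distinctId]);
    const kept = byKey.get(key);
    if (kept === undefined || row.linkedAt >= kept.linkedAt) byKey.set(key, row);
  }
  return Array.from(byKey.values());
}
```

Until a merge has run, a plain `SELECT` on `umami.identity_link` can still return both copies of a repeated link; adding `FINAL` to the query, or aggregating per key (for example with `argMax`), gives the deduplicated view.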