Commit graph

55 commits

Author SHA1 Message Date
Arthur Sepiol
34db34759f feat: implement automatic session linking and identity stitching (#3820)
Links anonymous browser sessions to authenticated user identities, enabling unified
user journey tracking across login boundaries. This solves the "logged-out anonymous
session → logged-in session" tracking gap, providing complete funnel visibility and
accurate visitor deduplication.

## Changes

- Client-side: Persistent visitor ID in localStorage (data-identity-stitching attribute)
- Server-side: identity_link table linking visitors to distinct IDs (authenticated users)
- Query updates: getWebsiteStats now deduplicates by resolved identity
- Graceful degradation: tracking continues without stitching in Safari private browsing and whenever localStorage is unavailable

## Implementation Details

Uses a hybrid approach that combines client-side persistence with server-side linking:
- Visitor ID is generated once per browser and persists across sessions
- When a user logs in, identify() creates an identity link
- Stats queries join through identity_link to deduplicate cross-device sessions
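
The deduplication step can be illustrated in memory. This is only a sketch: the actual change performs the resolution in SQL, and the `SessionRow` and `IdentityLink` shapes and the `countUniqueVisitors` helper are hypothetical names chosen for illustration. It also assumes at most one link per visitor, which is a simplification.

```typescript
// Hypothetical in-memory model; the real implementation runs as SQL joins.
interface SessionRow {
  sessionId: string;
  visitorId?: string; // undefined when stitching is off or storage is unavailable
}

interface IdentityLink {
  visitorId: string;
  distinctId: string;
}

// Resolve each session to one identity key: the linked distinct ID if the
// visitor has a link, otherwise the visitor ID, otherwise the session itself.
function countUniqueVisitors(sessions: SessionRow[], links: IdentityLink[]): number {
  const linkByVisitor = new Map<string, string>();
  for (const link of links) {
    // Keep the first link seen per visitor (simplifying assumption).
    if (!linkByVisitor.has(link.visitorId)) {
      linkByVisitor.set(link.visitorId, link.distinctId);
    }
  }

  const identities = new Set<string>();
  for (const session of sessions) {
    if (session.visitorId === undefined) {
      identities.add(`session:${session.sessionId}`);
      continue;
    }
    const distinctId = linkByVisitor.get(session.visitorId);
    identities.add(
      distinctId !== undefined ? `user:${distinctId}` : `visitor:${session.visitorId}`,
    );
  }
  return identities.size;
}
```

Two browsers linked to the same distinct ID collapse into one visitor, while unlinked visitors and sessions without a visitor ID are still counted individually.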

Both PostgreSQL and ClickHouse supported with appropriate query patterns:
- PostgreSQL: normalized schema, joins through session table
- ClickHouse: denormalized with ReplacingMergeTree for deduplication

## Edge Cases Handled

- Safari private browsing: localStorage throws, visitorId undefined, no link created
- localStorage cleared: new visitorId generated, creates new link
- Multiple tabs: same visitorId shared via localStorage
- Multiple devices: one distinct_id can link to multiple visitors (one visitor ID per browser)
- Multiple accounts: one visitor can link to multiple distinct_ids
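
The storage-related cases above can be sketched as follows. This is an illustration under assumptions, not the tracker's code: the `KeyValueStorage` interface, the `VISITOR_KEY` name, and the `getVisitorId` helper are hypothetical, and storage is passed in as a parameter so the sketch also runs outside a browser.

```typescript
// Minimal subset of the Web Storage API, so the sketch runs outside a browser.
interface KeyValueStorage {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

const VISITOR_KEY = "umami.visitor"; // hypothetical key name

// Returns the persisted visitor ID, creating one on first use.
// Returns undefined when storage throws, so no identity link is attempted.
function getVisitorId(
  storage: KeyValueStorage,
  generate: () => string,
): string | undefined {
  try {
    const existing = storage.getItem(VISITOR_KEY);
    if (existing) {
      return existing; // same ID is shared by every tab using this storage
    }
    const fresh = generate(); // storage was empty or cleared: start a new identity
    storage.setItem(VISITOR_KEY, fresh);
    return fresh;
  } catch {
    // Storage blocked or throwing: skip stitching rather than fail tracking.
    return undefined;
  }
}
```

In a browser this might be called as `getVisitorId(window.localStorage, () => crypto.randomUUID())`; whether the tracker generates IDs that way is not stated in the commit.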

## Test Plan

- [ ] Enable feature on test website (default enabled)
- [ ] Anonymous pageview - confirm visitor_id in events table
- [ ] Call umami.identify('user1') - confirm identity_link created
- [ ] Stats show 1 visitor (deduplicated)
- [ ] Log out, browse anonymously, stats still show 1 visitor
- [ ] Test with data-identity-stitching="false" - no visitor_id collected
- [ ] Test in Safari private browsing - no errors, gracefully skips
- [ ] Test ClickHouse: verify identity_link table populated and FINAL keyword works
- [ ] Verify retroactive: historical anonymous session attributed correctly
2025-12-03 16:54:56 +03:00
Arthur Sepiol
a902a87c08 feat: implement identity stitching for session linking (#3820)
Adds automatic session linking/identity stitching to link anonymous
browsing sessions with authenticated user sessions.

## Changes

### Database Schema
- Add `identity_link` table (PostgreSQL + ClickHouse) to store mappings
  between visitor IDs and authenticated user IDs
- Add `visitor_id` field to `Session` model
- Add `visitor_id` column to ClickHouse `website_event` table

### Client Tracker
- Generate and persist `visitor_id` in localStorage
- Include `vid` in all tracking payloads
- Support opt-out via `data-identity-stitching="false"` attribute
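
A rough sketch of how the opt-out could gate the `vid` field. Only `vid` and the `data-identity-stitching` attribute come from this commit; the `TrackPayload` shape and the `stitchingEnabled` and `buildPayload` helpers are hypothetical.

```typescript
// Hypothetical payload shape; only `vid` is named in the commit message.
interface TrackPayload {
  website: string;
  url: string;
  vid?: string;
}

// Stitching is on by default and off only when the attribute is exactly "false".
function stitchingEnabled(attributeValue: string | null): boolean {
  return attributeValue !== "false";
}

// Adds `vid` only when stitching is enabled and a visitor ID is available.
function buildPayload(
  base: { website: string; url: string },
  attributeValue: string | null,
  visitorId: string | undefined,
): TrackPayload {
  const payload: TrackPayload = { ...base };
  if (stitchingEnabled(attributeValue) && visitorId !== undefined) {
    payload.vid = visitorId;
  }
  return payload;
}
```

In a browser, `attributeValue` would typically come from `getAttribute("data-identity-stitching")` on the tracker's script element, which returns `null` when the attribute is absent.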

### API
- Accept `vid` parameter in `/api/send` endpoint
- Auto-create identity links when `identify()` is called with both
  visitor_id and distinct_id
- Store visitor_id in sessions and events
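
The link-creation rule above can be sketched with an in-memory store. The `IdentityLinkStore` class, its `link` method, and the per-website key are hypothetical, and treating repeated links as no-ops is an assumption; the real change writes to the `identity_link` table.

```typescript
// Hypothetical record shape standing in for a row of the identity_link table.
interface IdentityLinkRecord {
  websiteId: string;
  visitorId: string;
  distinctId: string;
}

class IdentityLinkStore {
  private links = new Map<string, IdentityLinkRecord>();

  // Creates a link only when both IDs are present.
  // Returns true if a new link was stored, false if skipped or already known.
  link(websiteId: string, visitorId?: string, distinctId?: string): boolean {
    if (!visitorId || !distinctId) {
      return false; // anonymous event or stitching disabled: nothing to link
    }
    const key = JSON.stringify([websiteId, visitorId, distinctId]);
    if (this.links.has(key)) {
      return false; // assumed idempotent: repeated identify() calls add nothing
    }
    this.links.set(key, { websiteId, visitorId, distinctId });
    return true;
  }

  get size(): number {
    return this.links.size;
  }
}
```

Keying on the full triple allows the many-to-many cases: several visitors per distinct ID and several distinct IDs per visitor.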

### Query Updates
- Update `getWebsiteStats` to deduplicate visitors by resolved identity
- Visitors who browse anonymously then log in are now counted as one user

## Usage

When a user logs in, call `umami.identify(userId)`. If identity stitching
is enabled (default), the tracker automatically links the anonymous
visitor_id to the authenticated userId. Stats queries then resolve
linked identities to accurately count unique visitors.

Resolves #3820
2025-12-03 16:06:54 +03:00
Francis Cao
16451dd5cd update CH view to account for new event types
2025-10-02 10:18:13 -07:00
Francis Cao
822ddee9ae update ch schema for custom data numbers 2025-08-12 09:15:42 -07:00
Francis Cao
38f251ead5 finish expanded queries and ui. 2025-08-07 09:47:18 -07:00
Francis Cao
2dcb9e21bd change pagestable to visitors and update clickhouse hostname column to array 2025-07-13 22:44:09 -07:00
Mike Cao
b2a6e3f842
Merge pull request #3505 from eoussama/master
Added optional website ID for creation
2025-07-07 22:58:16 -07:00
Matt Harrington
19ccfa0745 fixing the clickhouse schema file 2025-06-13 12:17:18 -07:00
Francis Cao
9a437dcfa2 convert attribution report 2025-06-07 07:43:36 -07:00
Francis Cao
a16846f4ce add website_revenue table and view. update revenue report to use view 2025-06-06 08:47:52 -07:00
Francis Cao
c5efc27c07 distinct_id schema changes and search on sessions page 2025-04-29 08:57:58 -07:00
Francis Cao
12b8ac4272 app and db schema - region rename, hostname move 2025-04-24 22:42:33 -07:00
Francis Cao
b9a2145766 ch attribution report, schema changes, and migration 2025-04-13 18:12:03 -07:00
Francis Cao
203e782530 Create attribution report template and parameters 2025-03-18 10:00:23 -07:00
Francis Cao
a708e6c350 add tags to ch schema file 2024-10-17 09:53:07 -07:00
Francis Cao
c79720ae1d update session data schema 2024-08-15 09:28:39 -07:00
Francis Cao
3207b0ce06 revert AggregatingMergeTree order by 2024-08-01 16:40:48 -07:00
Francis Cao
57a23bab2d fix hourly order by 2024-08-01 16:16:18 -07:00
Francis Cao
61dfa1391e add projection code 2024-08-01 15:34:35 -07:00
Francis Cao
cb4368e12c template multiple queries for filtering 2024-07-31 09:35:29 -07:00
Francis Cao
161da582ba reorder CH stats index 2024-07-24 16:57:23 -07:00
Francis Cao
174b9e4376 only use hourly table, remove daily table logic, fix updatechart undefined 2024-07-23 22:35:11 -07:00
Francis Cao
038ecdb592 fix pkey for stats tables 2024-07-23 15:34:25 -07:00
Francis Cao
5299e9f579 resolve entry / exit queries 2024-07-22 21:30:06 -07:00
Francis Cao
7381254cc2 add relational migrations. update event_key references to data_key 2024-04-08 20:24:15 -07:00
Francis Cao
cc834083d9 update CH schema/migration to include session_data 2024-04-08 16:36:31 -07:00
Francis Cao
cbeefe733f add psql migration 2024-03-21 09:30:42 -07:00
Francis Cao
1cf5bd488c remove kafka engine tables 2023-12-27 09:22:32 -08:00
Brian Cao
7b97209d56 Fix CH script. 2023-09-13 13:27:07 -07:00
Mike Cao
e4bd314bd6 Updates to insights, event data, telemetry. 2023-07-23 13:18:01 -07:00
Francis Cao
74bd4d5366 add missing semi-colons 2023-07-13 16:30:10 -07:00
Brian Cao
0a3ee2277a Fix numeric to number. 2023-07-06 21:02:56 -07:00
Francis Cao
c901358222 rename to job_id 2023-06-26 10:32:23 -07:00
Francis Cao
a5582416b3 update CH schema with upload_id 2023-06-26 08:19:52 -07:00
Brian Cao
b484286523
Feat/um 305 unique session ch (#2065)
* Add session_data / session redis to CH.

* Add mysql migration.
2023-05-31 21:46:49 -07:00
Francis Cao
d827bf1417 update CH kafka engine tables for error handling 2023-05-07 23:13:40 -07:00
Francis Cao
95ed8a09aa update CH event to website_event 2023-03-29 11:06:12 -07:00
Francis Cao
077fad20ea update skip_broken_messages 2023-03-27 12:44:59 -07:00
Francis Cao
14e4a090bb update schema and queries to implement reset_at 2023-03-27 11:25:16 -07:00
Brian Cao
be2fc0de8d Fix ch schema. 2023-03-23 14:17:32 -07:00
Brian Cao
9979672de5
Feat/um 202 event data new (#1841)
* Add event_data base.

* Add url_path.

* Add eventData back.

* Finish event_data relational.

* resolve comments.
2023-03-23 14:01:15 -07:00
Francis Cao
ea39f5b431 add new event data schema 2023-03-22 23:02:37 -07:00
Francis Cao
b0c5899569 update prisma / ch filters logic 2023-03-20 11:26:45 -07:00
Francis Cao
9321401297 schema changes to CH, Postgres, MySQL 2023-03-14 17:27:17 -07:00
Brian Cao
82f0bc3d2b
remove event_data. (#1804) 2023-03-01 16:42:47 -08:00
Francis Cao
55a586fe27 add subdivision1/2, cities to query logic 2023-02-20 09:04:20 -08:00
Francis Cao
074fa2c5fc add subdivision2 to schema 2023-02-16 09:52:07 -08:00
Francis Cao
b6cc6cb655 update CH, postgres, MySQL schemas 2023-02-15 09:40:49 -08:00
Brian Cao
8732d056dd
Dev (#1702)
* Initial Typescript models.

* Re-add realtime data

* get distinct sessions for session metrics

* Add queries for new schema.

* Fix Typo.

* Add some api/team endpoints.

* Fix destructure error.

* Fix getWebsites call.

* Ignore typescript build errors.

* Fix enum issue.

* add clickhouse route to deleteWebsite

* Fix Website auth.

* Updated lint-staged config.

* Add permission checks.

* Add user role api.

* Fix error when updating website.

* Fix isAdmin check.  Fix Schema.

* Initial conversion to react-basics.

* Remove user/team transfer from website update.

* delete website in relational query

* Fix login secure token creation.

* Add event type to event.

* Allow user to be added to team with role.

* Updated login form.

* Add Role to TeamUser.

* Add database migration.

* Refactored permissions check. Updated redis lib.

* Feat/um 114 roles and permissions (#1683)

* Auth checkpoint.

* Merge branch 'dev' into feat/um-114-roles-and-permissions

* Add 02 migration.

* Added lib/types.

* Updated schema.

* Updated roles and permissions logic.

* Implement react-basics styles. Fix queries.

* Update website details layout.

* Add 01 migration.

* Fix admin create.

* Update react-basics.

Co-authored-by: Francis Cao <franciscao@gmail.com>
Co-authored-by: Mike Cao <mike@mikecao.com>
Co-authored-by: Mike Cao <moocao@gmail.com>
2022-12-12 19:45:38 -08:00
Francis Cao
106dd25594 clickhouse: validate string being sent into event_data 2022-11-16 11:42:02 -08:00