feat: integrate First8 Marketing hyper-personalization system

Enhances Umami Analytics with the First8 Marketing integration to
power a hyper-personalized recommendation engine.

Database Enhancements:
- PostgreSQL 17 with Apache AGE 1.6.0 (graph database)
- TimescaleDB 2.23.0 (time-series optimization)
- Extended schema for WooCommerce event tracking
- Custom tables for recommendation engine integration

Features Added:
- Real-time ETL pipeline to recommendation engine
- Extended event tracking (WordPress + WooCommerce)
- Graph database for relationship mapping
- Time-series optimization for analytics queries
- Custom migrations for hyper-personalization

Documentation:
- Updated README with integration details
- Added system architecture documentation
- Documented data flow and components
- Preserved original Umami Software credits

Integration Components:
- First8 Marketing Track plugin (event tracking)
- Recommendation Engine (ML backend)
- First8 Marketing Recommendation Engine plugin (presentation)

Status: Production-ready
Version: Based on Umami latest + First8 Marketing enhancements
iskandarsulaili 2025-11-05 19:17:57 +08:00
parent a6d4519a98
commit 5f496fdb79
16 changed files with 8856 additions and 9790 deletions

.gitignore (vendored): 37 lines changed

@@ -42,3 +42,40 @@ yarn-error.log*
*.dev.yml
# database files
*.db
*.sqlite
*.sqlite3
*.db-journal
# prisma generated
prisma/generated/
# cache
.cache/
.eslintcache
.stylelintcache
# temporary files
tmp/
temp/
*.tmp
*.bak
# OS files
Thumbs.db
Desktop.ini
*~
# secrets
*.key
*.pem
secrets/
# logs
logs/
*.log
# custom
public/uploads/
public/cache/

README.md: 113 lines changed

@@ -113,8 +113,108 @@ docker compose up --force-recreate -d
---
## 🎯 First8 Marketing Integration
This is a customized version of Umami Analytics integrated into the **First8 Marketing Hyper-Personalized System**. This implementation extends the standard Umami installation with:
### Enhanced Features
- **PostgreSQL 17 with Apache AGE** - Graph database capabilities for advanced relationship tracking
- **TimescaleDB Integration** - Time-series optimization for analytics data
- **Extended Event Tracking** - Comprehensive WordPress and WooCommerce event capture
- **Real-time Data Pipeline** - ETL integration with the recommendation engine
- **Multi-dimensional Analytics** - Contextual, behavioral, temporal, and journey tracking
### System Architecture
This Umami instance serves as the **data collection layer** for the First8 Marketing hyper-personalization system:
```
WordPress Site → Umami Analytics → Recommendation Engine → Personalized Content
```
**Data Flow:**
1. **Collection**: Umami captures all user interactions, page views, and WooCommerce events
2. **Storage**: Events stored in PostgreSQL with TimescaleDB for time-series optimization
3. **Graph Analysis**: Apache AGE enables relationship mapping between users, products, and behaviors
4. **ETL Pipeline**: Real-time synchronization with the recommendation engine
5. **Personalization**: ML models use analytics data to generate hyper-personalized recommendations
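For step 4, a minimal sketch of the kind of query the ETL pipeline could poll against the extended `website_event` table (the website ID and polling window are illustrative, not part of this repository):
```sql
-- Pull recent events, including the WooCommerce fields added by the custom migrations
SELECT event_name, wc_product_id, wc_cart_value, wc_revenue, created_at
FROM website_event
WHERE website_id = '<your-website-uuid>'
  AND created_at > now() - INTERVAL '5 minutes'
ORDER BY created_at;
```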
### Integration Components
This Umami installation works in conjunction with:
- **First8 Marketing Track Plugin** - WordPress connector for seamless event tracking
- **Recommendation Engine** - Proprietary ML-powered personalization backend
- **First8 Marketing Recommendation Engine Plugin** - WordPress connector for displaying personalized content
### Database Enhancements
**PostgreSQL Extensions & Tooling:**
- **Apache AGE 1.6.0** - Graph database for relationship mapping
- **TimescaleDB 2.23.0** - Time-series optimization for analytics queries
- **Prisma 6.18.0** - ORM for database management
**Custom Schema Extensions:**
- User journey tracking tables
- Product interaction graphs
- Session behavior analysis
- Purchase pattern storage
### Configuration for First8 Marketing
**Environment Variables:**
```bash
DATABASE_URL=postgresql://username:password@localhost:5432/umami
NODE_ENV=production
PORT=3000
```
**Required PostgreSQL Version:** 17.x (for Apache AGE compatibility)
### Usage in First8 Marketing System
**Event Tracking:**
- All WordPress core events (page views, clicks, form submissions)
- WooCommerce events (product views, add to cart, purchases, checkout steps)
- Custom events via First8 Marketing Track plugin
- User journey and session tracking
**Data Access:**
- Real-time analytics dashboard via Umami UI
- ETL pipeline for recommendation engine
- Graph queries via Apache AGE for relationship analysis
- Time-series queries via TimescaleDB for trend analysis
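For example, once the migrations have run, relationships and trends can be queried directly; the queries below are illustrative and assume journey data has already been loaded into the graph:
```sql
-- Graph query via Apache AGE: products users have purchased
SELECT * FROM execute_cypher('user_journey',
    'MATCH (u:User)-[:PURCHASED]->(p:Product) RETURN p LIMIT 10');

-- Time-series query via TimescaleDB: hourly pageviews over the last 7 days
SELECT time_bucket('1 hour', time) AS hour, COUNT(*) AS pageviews
FROM time_series_events
WHERE event_type = 'pageview'
  AND time > now() - INTERVAL '7 days'
GROUP BY hour
ORDER BY hour;
```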
### Deployment Notes
This instance is configured for standalone deployment with:
- PostgreSQL 17 database server
- Apache AGE graph extension
- TimescaleDB time-series extension
- Node.js 18.18+ runtime
- Reverse proxy (Nginx/Apache) for production
### Credits
**Original Software:**
- **Umami Analytics** - Created by [Umami Software](https://umami.is)
- Licensed under MIT License
- Original repository: [github.com/umami-software/umami](https://github.com/umami-software/umami)
**First8 Marketing Customization:**
- **Integration & Enhancement** - First8 Marketing
- PostgreSQL 17 + Apache AGE + TimescaleDB integration
- Extended event tracking for WordPress/WooCommerce
- ETL pipeline for recommendation engine
- Custom schema extensions for hyper-personalization
---
## 🛟 Support
**Original Umami Support:**
<p align="center">
<a href="https://github.com/umami-software/umami">
<img src="https://img.shields.io/badge/GitHub--blue?style=social&logo=github" alt="GitHub" />
@@ -130,6 +230,19 @@ docker compose up --force-recreate -d
</a>
</p>
**First8 Marketing Integration Support:**
- For integration-specific issues, contact First8 Marketing
- For core Umami issues, use the official Umami support channels above
---
## 📄 License
This project maintains the original MIT License from Umami Software.
**Original Authors:** Umami Software
**Integration & Customization:** First8 Marketing
[release-shield]: https://img.shields.io/github/release/umami-software/umami.svg
[releases-url]: https://github.com/umami-software/umami/releases
[license-shield]: https://img.shields.io/github/license/umami-software/umami.svg

@@ -0,0 +1,34 @@
-- Rollback Migration: Remove WooCommerce and Enhanced Tracking Fields
-- Created: 2025-01-15
-- Description: Removes WooCommerce e-commerce tracking fields and enhanced engagement metrics from website_event table
-- WARNING: This will permanently delete all WooCommerce tracking data!
-- Drop indexes first (must be done before dropping columns)
DROP INDEX IF EXISTS idx_website_event_wc_product;
DROP INDEX IF EXISTS idx_website_event_wc_category;
DROP INDEX IF EXISTS idx_website_event_wc_order;
DROP INDEX IF EXISTS idx_website_event_wc_revenue;
DROP INDEX IF EXISTS idx_website_event_engagement;
-- Remove WooCommerce e-commerce tracking fields
ALTER TABLE website_event
DROP COLUMN IF EXISTS wc_product_id,
DROP COLUMN IF EXISTS wc_category_id,
DROP COLUMN IF EXISTS wc_cart_value,
DROP COLUMN IF EXISTS wc_checkout_step,
DROP COLUMN IF EXISTS wc_order_id,
DROP COLUMN IF EXISTS wc_revenue;
-- Remove enhanced engagement tracking fields
ALTER TABLE website_event
DROP COLUMN IF EXISTS scroll_depth,
DROP COLUMN IF EXISTS time_on_page,
DROP COLUMN IF EXISTS click_count,
DROP COLUMN IF EXISTS form_interactions;
-- Log rollback completion
DO $$
BEGIN
RAISE NOTICE 'Rollback complete: WooCommerce and enhanced tracking fields removed from website_event table';
END $$;

@@ -0,0 +1,22 @@
-- Rollback Migration: Remove Recommendation Engine Tables
-- Created: 2025-01-15
-- Description: Drops all recommendation engine tables and their dependencies
-- WARNING: This will permanently delete all recommendation data, user profiles, and ML model registry!
-- Drop tables in reverse order of dependencies
-- Drop recommendations table first (has foreign key to website)
DROP TABLE IF EXISTS recommendations CASCADE;
-- Drop user_profiles table (has foreign key to website)
DROP TABLE IF EXISTS user_profiles CASCADE;
-- Drop ml_models table (no dependencies)
DROP TABLE IF EXISTS ml_models CASCADE;
-- Log rollback completion
DO $$
BEGIN
RAISE NOTICE 'Rollback complete: All recommendation engine tables removed';
RAISE NOTICE 'Dropped tables: recommendations, user_profiles, ml_models';
END $$;

@@ -0,0 +1,38 @@
-- Rollback Migration: Remove Apache AGE Graph Database
-- Created: 2025-01-15
-- Description: Drops Apache AGE graph and extension
-- WARNING: This will permanently delete all graph data!
-- Set search path to include ag_catalog
SET search_path = ag_catalog, "$user", public;
-- ============================================================================
-- Step 1: Drop Helper Functions
-- ============================================================================
DROP FUNCTION IF EXISTS execute_cypher(text, text) CASCADE;
-- ============================================================================
-- Step 2: Drop Graph (this will cascade to all vertices and edges)
-- ============================================================================
SELECT ag_catalog.drop_graph('user_journey', true);
-- ============================================================================
-- Step 3: Drop Apache AGE Extension
-- ============================================================================
-- Note: Only drop extension if no other graphs exist
-- Uncomment the following line if you want to completely remove Apache AGE
-- DROP EXTENSION IF EXISTS age CASCADE;
-- ============================================================================
-- Rollback Complete
-- ============================================================================
DO $$
BEGIN
RAISE NOTICE '=================================================================';
RAISE NOTICE 'Apache AGE Rollback Complete';
RAISE NOTICE 'Dropped graph: user_journey';
RAISE NOTICE 'Dropped helper functions: execute_cypher()';
RAISE NOTICE 'Note: Apache AGE extension was NOT dropped (may be used by other graphs)';
RAISE NOTICE '=================================================================';
END $$;

@@ -0,0 +1,52 @@
-- Rollback Migration: Remove TimescaleDB Time-Series Tables
-- Created: 2025-01-15
-- Description: Drops all TimescaleDB hypertables, continuous aggregates, and policies
-- WARNING: This will permanently delete all time-series analytics data!
-- ============================================================================
-- Step 1: Remove Continuous Aggregate Policies
-- ============================================================================
SELECT remove_continuous_aggregate_policy('website_metrics_hourly_agg', if_exists => TRUE);
SELECT remove_continuous_aggregate_policy('product_metrics_daily_agg', if_exists => TRUE);
-- ============================================================================
-- Step 2: Drop Continuous Aggregates (Materialized Views)
-- ============================================================================
DROP MATERIALIZED VIEW IF EXISTS website_metrics_hourly_agg CASCADE;
DROP MATERIALIZED VIEW IF EXISTS product_metrics_daily_agg CASCADE;
-- ============================================================================
-- Step 3: Remove Retention Policies
-- ============================================================================
SELECT remove_retention_policy('time_series_events', if_exists => TRUE);
SELECT remove_retention_policy('website_metrics_hourly', if_exists => TRUE);
SELECT remove_retention_policy('product_metrics_daily', if_exists => TRUE);
-- ============================================================================
-- Step 4: Drop Hypertables (this will drop the tables and all chunks)
-- ============================================================================
DROP TABLE IF EXISTS time_series_events CASCADE;
DROP TABLE IF EXISTS website_metrics_hourly CASCADE;
DROP TABLE IF EXISTS product_metrics_daily CASCADE;
-- ============================================================================
-- Step 5: Drop TimescaleDB Extension (Optional)
-- ============================================================================
-- Note: Only drop extension if no other hypertables exist
-- Uncomment the following line if you want to completely remove TimescaleDB
-- DROP EXTENSION IF EXISTS timescaledb CASCADE;
-- ============================================================================
-- Rollback Complete
-- ============================================================================
DO $$
BEGIN
RAISE NOTICE '=================================================================';
RAISE NOTICE 'TimescaleDB Rollback Complete';
RAISE NOTICE 'Dropped hypertables: time_series_events, website_metrics_hourly, product_metrics_daily';
RAISE NOTICE 'Dropped continuous aggregates: website_metrics_hourly_agg, product_metrics_daily_agg';
RAISE NOTICE 'Removed all retention policies';
RAISE NOTICE 'Note: TimescaleDB extension was NOT dropped (may be used by other tables)';
RAISE NOTICE '=================================================================';
END $$;

docker-compose.upgraded.yml (new file): 114 lines

@@ -0,0 +1,114 @@
---
# Docker Compose for Umami with PostgreSQL 17 + Apache AGE + TimescaleDB
# This is the upgraded configuration for the hyper-personalized marketing system
services:
umami:
image: ghcr.io/umami-software/umami:postgresql-latest
ports:
- "3000:3000"
environment:
DATABASE_URL: postgresql://umami:umami@db:5432/umami
DATABASE_TYPE: postgresql
APP_SECRET: ${APP_SECRET:-replace-me-with-a-random-string}
# Optional: Enable debug logging
# LOG_QUERY: 1
depends_on:
db:
condition: service_healthy
init: true
restart: always
healthcheck:
test: ["CMD-SHELL", "curl http://localhost:3000/api/heartbeat"]
interval: 5s
timeout: 5s
retries: 5
networks:
- umami-network
db:
# Custom PostgreSQL 17 image with Apache AGE and TimescaleDB
build:
context: ./docker/postgres
dockerfile: Dockerfile
image: postgres:17-age-timescaledb
environment:
POSTGRES_DB: umami
POSTGRES_USER: umami
POSTGRES_PASSWORD: ${POSTGRES_PASSWORD:-umami}
# TimescaleDB configuration
TIMESCALEDB_TELEMETRY: 'off'
volumes:
- umami-db-data:/var/lib/postgresql/data
# Mount initialization scripts
- ./docker/postgres/init-scripts:/docker-entrypoint-initdb.d
ports:
- "5432:5432"
restart: always
healthcheck:
test: ["CMD-SHELL", "pg_isready -U $${POSTGRES_USER} -d $${POSTGRES_DB}"]
interval: 5s
timeout: 5s
retries: 5
networks:
- umami-network
# Increase shared memory for better performance
shm_size: 256mb
# PostgreSQL configuration for performance
command:
- "postgres"
- "-c"
- "shared_preload_libraries=timescaledb,age"
- "-c"
- "max_connections=200"
- "-c"
- "shared_buffers=256MB"
- "-c"
- "effective_cache_size=1GB"
- "-c"
- "maintenance_work_mem=128MB"
- "-c"
- "checkpoint_completion_target=0.9"
- "-c"
- "wal_buffers=16MB"
- "-c"
- "default_statistics_target=100"
- "-c"
- "random_page_cost=1.1"
- "-c"
- "effective_io_concurrency=200"
- "-c"
- "work_mem=4MB"
- "-c"
- "min_wal_size=1GB"
- "-c"
- "max_wal_size=4GB"
# Optional: pgAdmin for database management
pgadmin:
image: dpage/pgadmin4:latest
environment:
PGADMIN_DEFAULT_EMAIL: ${PGADMIN_EMAIL:-admin@umami.local}
PGADMIN_DEFAULT_PASSWORD: ${PGADMIN_PASSWORD:-admin}
PGADMIN_CONFIG_SERVER_MODE: 'False'
ports:
- "5050:80"
volumes:
- pgadmin-data:/var/lib/pgadmin
depends_on:
- db
restart: always
networks:
- umami-network
profiles:
- tools
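# Note: pgAdmin only starts when the "tools" profile is enabled, e.g.
#   docker compose -f docker-compose.upgraded.yml --profile tools up -d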
volumes:
umami-db-data:
driver: local
pgadmin-data:
driver: local
networks:
umami-network:
driver: bridge

@@ -0,0 +1,75 @@
# PostgreSQL 17 with Apache AGE 1.6.0 and TimescaleDB 2.23.0
# For Umami Analytics Upgrade - Hyper-Personalized Marketing System
FROM postgres:17-alpine
# Install build dependencies
RUN apk add --no-cache \
build-base \
clang \
llvm \
git \
cmake \
bison \
flex \
readline-dev \
zlib-dev \
curl \
ca-certificates
# Set PostgreSQL version for compatibility
ENV PG_MAJOR=17
ENV PG_VERSION=17
# Install TimescaleDB 2.23.0
ENV TIMESCALEDB_VERSION=2.23.0
RUN set -ex \
&& apk add --no-cache --virtual .fetch-deps \
ca-certificates \
openssl \
tar \
&& mkdir -p /tmp/timescaledb \
&& cd /tmp/timescaledb \
&& wget -O timescaledb.tar.gz "https://github.com/timescale/timescaledb/archive/${TIMESCALEDB_VERSION}.tar.gz" \
&& tar -xzf timescaledb.tar.gz -C /tmp/timescaledb --strip-components=1 \
&& cd /tmp/timescaledb \
&& ./bootstrap -DREGRESS_CHECKS=OFF -DPROJECT_INSTALL_METHOD="docker" \
&& cd build && make install \
&& cd / \
&& rm -rf /tmp/timescaledb \
&& apk del .fetch-deps
# Install Apache AGE 1.6.0
ENV AGE_VERSION=1.6.0
RUN set -ex \
&& mkdir -p /tmp/age \
&& cd /tmp/age \
&& wget -O age.tar.gz "https://github.com/apache/age/archive/refs/tags/v${AGE_VERSION}.tar.gz" \
&& tar -xzf age.tar.gz -C /tmp/age --strip-components=1 \
&& cd /tmp/age \
&& make PG_CONFIG=/usr/local/bin/pg_config install \
&& cd / \
&& rm -rf /tmp/age
# Clean up build dependencies
RUN apk del build-base clang llvm git cmake bison flex
# Configure PostgreSQL to load extensions
RUN echo "shared_preload_libraries = 'timescaledb,age'" >> /usr/local/share/postgresql/postgresql.conf.sample
# Add initialization script
COPY init-scripts/* /docker-entrypoint-initdb.d/
# Set proper permissions
RUN chmod +x /docker-entrypoint-initdb.d/*.sh || true
# Expose PostgreSQL port
EXPOSE 5432
# Health check
HEALTHCHECK --interval=30s --timeout=5s --start-period=30s --retries=3 \
CMD pg_isready -U postgres || exit 1
# Use the default PostgreSQL entrypoint
CMD ["postgres"]

docker/postgres/README.md (new file): 140 lines

@@ -0,0 +1,140 @@
# PostgreSQL 17 + Apache AGE + TimescaleDB Docker Image
This directory contains the Dockerfile and initialization scripts for building a custom PostgreSQL 17 image with Apache AGE 1.6.0 and TimescaleDB 2.23.0 extensions.
## What's Included
- **PostgreSQL 17** - Latest PostgreSQL version
- **Apache AGE 1.6.0** - Graph database extension for user journey tracking
- **TimescaleDB 2.23.0** - Time-series database extension for analytics
## Building the Image
```bash
# From the umami directory
docker build -t postgres:17-age-timescaledb -f docker/postgres/Dockerfile docker/postgres
```
## Using with Docker Compose
The image is automatically built when using `docker-compose.upgraded.yml`:
```bash
# Start the upgraded stack
docker-compose -f docker-compose.upgraded.yml up -d
# View logs
docker-compose -f docker-compose.upgraded.yml logs -f
# Stop the stack
docker-compose -f docker-compose.upgraded.yml down
```
## Running Migrations
After the database is up, run the Prisma migrations:
```bash
# Generate Prisma client
pnpm prisma generate
# Run migrations
pnpm prisma migrate deploy
```
## Verifying Extensions
Connect to the database and verify extensions are installed:
```bash
# Connect to PostgreSQL
docker-compose -f docker-compose.upgraded.yml exec db psql -U umami -d umami
# Check installed extensions
SELECT extname, extversion FROM pg_extension WHERE extname IN ('timescaledb', 'age');
# Check Apache AGE graph
SELECT * FROM ag_catalog.ag_graph;
# Check TimescaleDB hypertables
SELECT * FROM timescaledb_information.hypertables;
```
## Configuration
The PostgreSQL instance is configured with optimized settings for performance:
- `shared_buffers = 256MB`
- `effective_cache_size = 1GB`
- `maintenance_work_mem = 128MB`
- `max_connections = 200`
Adjust these in `docker-compose.upgraded.yml` based on your server resources.
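To confirm a running instance picked up these settings, you can check from `psql` (illustrative):
```sql
SELECT name, setting, unit
FROM pg_settings
WHERE name IN ('shared_preload_libraries', 'shared_buffers',
               'effective_cache_size', 'maintenance_work_mem', 'max_connections');
```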
## Initialization Scripts
Scripts in `init-scripts/` run automatically when the container is first created:
- `01-init-extensions.sh` - Installs TimescaleDB and Apache AGE extensions
## Troubleshooting
### Extensions not loading
If extensions fail to load, check the logs:
```bash
docker-compose -f docker-compose.upgraded.yml logs db
```
### Build failures
If the build fails, ensure you have enough disk space and memory:
```bash
# Check Docker resources
docker system df
# Clean up if needed
docker system prune -a
```
### Connection issues
Verify the database is healthy:
```bash
docker-compose -f docker-compose.upgraded.yml ps
docker-compose -f docker-compose.upgraded.yml exec db pg_isready -U umami
```
## Production Deployment
For production, use a managed PostgreSQL service or dedicated server instead of Docker. See the main [DEPLOYMENT.md](../../recommendation-engine/docs/DEPLOYMENT.md) for details.
## Data Persistence
Database data is stored in the `umami-db-data` Docker volume. To backup:
```bash
# Backup
docker-compose -f docker-compose.upgraded.yml exec db pg_dump -U umami umami > backup.sql
# Restore
docker-compose -f docker-compose.upgraded.yml exec -T db psql -U umami umami < backup.sql
```
## Security Notes
- Change default passwords in production
- Use environment variables for sensitive data
- Enable SSL/TLS for database connections
- Restrict network access to the database port
## Support
For issues or questions, refer to:
- [PostgreSQL Documentation](https://www.postgresql.org/docs/17/)
- [Apache AGE Documentation](https://age.apache.org/docs/)
- [TimescaleDB Documentation](https://docs.timescale.com/)

@@ -0,0 +1,41 @@
#!/bin/bash
# Initialize PostgreSQL extensions for Umami
# This script runs automatically when the container is first created
set -e
echo "=================================================="
echo "Initializing PostgreSQL 17 with Extensions"
echo "=================================================="
# Wait for PostgreSQL to be ready
until pg_isready -U "$POSTGRES_USER" -d "$POSTGRES_DB"; do
echo "Waiting for PostgreSQL to be ready..."
sleep 2
done
echo "PostgreSQL is ready. Installing extensions..."
# Connect to the database and install extensions
psql -v ON_ERROR_STOP=1 --username "$POSTGRES_USER" --dbname "$POSTGRES_DB" <<-EOSQL
-- Install TimescaleDB extension
CREATE EXTENSION IF NOT EXISTS timescaledb CASCADE;
-- Install Apache AGE extension
CREATE EXTENSION IF NOT EXISTS age CASCADE;
-- Load AGE into search path
SET search_path = ag_catalog, "\$user", public;
-- Verify installations
SELECT extname, extversion FROM pg_extension WHERE extname IN ('timescaledb', 'age');
EOSQL
echo "=================================================="
echo "Extensions installed successfully!"
echo "- TimescaleDB: Installed"
echo "- Apache AGE: Installed"
echo "=================================================="
echo "Database is ready for Umami migrations."

pnpm-lock.yaml (generated): 17389 lines changed; diff suppressed because it is too large

@@ -0,0 +1,58 @@
-- Migration: Add WooCommerce and Enhanced Tracking Fields
-- Created: 2025-01-15
-- Description: Adds WooCommerce e-commerce tracking fields and enhanced engagement metrics to website_event table
-- Add enhanced engagement tracking fields
ALTER TABLE website_event
ADD COLUMN IF NOT EXISTS scroll_depth INTEGER,
ADD COLUMN IF NOT EXISTS time_on_page INTEGER,
ADD COLUMN IF NOT EXISTS click_count INTEGER,
ADD COLUMN IF NOT EXISTS form_interactions JSONB;
-- Add WooCommerce e-commerce tracking fields
ALTER TABLE website_event
ADD COLUMN IF NOT EXISTS wc_product_id VARCHAR(50),
ADD COLUMN IF NOT EXISTS wc_category_id VARCHAR(50),
ADD COLUMN IF NOT EXISTS wc_cart_value DECIMAL(19, 4),
ADD COLUMN IF NOT EXISTS wc_checkout_step INTEGER,
ADD COLUMN IF NOT EXISTS wc_order_id VARCHAR(50),
ADD COLUMN IF NOT EXISTS wc_revenue DECIMAL(19, 4);
-- Create indexes for WooCommerce queries (performance optimization)
-- Index for product-based queries
CREATE INDEX IF NOT EXISTS idx_website_event_wc_product
ON website_event(website_id, wc_product_id, created_at)
WHERE wc_product_id IS NOT NULL;
-- Index for category-based queries
CREATE INDEX IF NOT EXISTS idx_website_event_wc_category
ON website_event(website_id, wc_category_id, created_at)
WHERE wc_category_id IS NOT NULL;
-- Index for order-based queries (partial index for sparse data)
CREATE INDEX IF NOT EXISTS idx_website_event_wc_order
ON website_event(wc_order_id)
WHERE wc_order_id IS NOT NULL;
-- Index for revenue analysis
CREATE INDEX IF NOT EXISTS idx_website_event_wc_revenue
ON website_event(website_id, created_at, wc_revenue)
WHERE wc_revenue IS NOT NULL;
-- Index for engagement metrics
CREATE INDEX IF NOT EXISTS idx_website_event_engagement
ON website_event(website_id, created_at, scroll_depth, time_on_page)
WHERE scroll_depth IS NOT NULL OR time_on_page IS NOT NULL;
-- Add comments for documentation
COMMENT ON COLUMN website_event.scroll_depth IS 'Percentage of page scrolled (0-100)';
COMMENT ON COLUMN website_event.time_on_page IS 'Time spent on page in seconds';
COMMENT ON COLUMN website_event.click_count IS 'Number of clicks on the page';
COMMENT ON COLUMN website_event.form_interactions IS 'JSONB array of form interaction events';
COMMENT ON COLUMN website_event.wc_product_id IS 'WooCommerce product ID';
COMMENT ON COLUMN website_event.wc_category_id IS 'WooCommerce category ID';
COMMENT ON COLUMN website_event.wc_cart_value IS 'Cart value at time of event';
COMMENT ON COLUMN website_event.wc_checkout_step IS 'Checkout step number (1-N)';
COMMENT ON COLUMN website_event.wc_order_id IS 'WooCommerce order ID for purchase events';
COMMENT ON COLUMN website_event.wc_revenue IS 'Revenue amount for purchase events';
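-- Illustrative query (not part of this migration): daily revenue per product,
-- served by idx_website_event_wc_revenue; the website UUID is a placeholder.
-- SELECT wc_product_id, date_trunc('day', created_at) AS day, SUM(wc_revenue) AS revenue
-- FROM website_event
-- WHERE website_id = '<website-uuid>' AND wc_revenue IS NOT NULL
-- GROUP BY wc_product_id, day
-- ORDER BY day, revenue DESC;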

@@ -0,0 +1,154 @@
-- Migration: Create Recommendation Engine Tables
-- Created: 2025-01-15
-- Description: Creates tables for user profiles, recommendations tracking, and ML model registry
-- ============================================================================
-- Table: user_profiles
-- Purpose: Aggregated user behavior and preferences for personalization
-- ============================================================================
CREATE TABLE IF NOT EXISTS user_profiles (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
user_id VARCHAR(255) UNIQUE NOT NULL, -- Can be session_id or logged-in user_id
website_id UUID NOT NULL,
-- Lifecycle
lifecycle_stage VARCHAR(50), -- 'new', 'active', 'at_risk', 'churned'
funnel_position VARCHAR(50), -- 'awareness', 'consideration', 'decision', 'retention'
-- Engagement metrics
session_count INTEGER DEFAULT 0,
total_pageviews INTEGER DEFAULT 0,
total_events INTEGER DEFAULT 0,
total_purchases INTEGER DEFAULT 0,
total_revenue DECIMAL(19, 4) DEFAULT 0,
-- Behavior
avg_session_duration INTEGER, -- seconds
avg_time_on_page INTEGER, -- seconds
avg_scroll_depth INTEGER, -- percentage
bounce_rate DECIMAL(5, 4),
-- Preferences (JSONB for flexibility)
favorite_categories JSONB, -- ['electronics', 'books']
favorite_products JSONB, -- ['product_id_1', 'product_id_2']
price_sensitivity VARCHAR(20), -- 'low', 'medium', 'high'
preferred_brands JSONB,
device_preference VARCHAR(20), -- 'mobile', 'tablet', 'desktop'
-- Timestamps
first_visit TIMESTAMPTZ,
last_visit TIMESTAMPTZ,
created_at TIMESTAMPTZ DEFAULT NOW(),
updated_at TIMESTAMPTZ DEFAULT NOW(),
CONSTRAINT fk_user_profiles_website FOREIGN KEY (website_id) REFERENCES website(website_id) ON DELETE CASCADE
);
-- Indexes for user_profiles
CREATE INDEX idx_user_profiles_user_id ON user_profiles(user_id);
CREATE INDEX idx_user_profiles_website_id ON user_profiles(website_id);
CREATE INDEX idx_user_profiles_lifecycle ON user_profiles(lifecycle_stage);
CREATE INDEX idx_user_profiles_last_visit ON user_profiles(last_visit);
-- Comments for user_profiles
COMMENT ON TABLE user_profiles IS 'Aggregated user behavior and preferences for personalization';
COMMENT ON COLUMN user_profiles.lifecycle_stage IS 'User lifecycle stage: new, active, at_risk, churned';
COMMENT ON COLUMN user_profiles.funnel_position IS 'User position in marketing funnel';
COMMENT ON COLUMN user_profiles.favorite_categories IS 'JSONB array of favorite product categories';
COMMENT ON COLUMN user_profiles.favorite_products IS 'JSONB array of favorite product IDs';
-- ============================================================================
-- Table: recommendations
-- Purpose: Historical recommendations for analysis and learning
-- ============================================================================
CREATE TABLE IF NOT EXISTS recommendations (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
session_id UUID NOT NULL,
user_id VARCHAR(255),
website_id UUID NOT NULL,
-- Recommendation details
recommendation_type VARCHAR(50), -- 'product', 'content', 'offer'
item_id VARCHAR(255) NOT NULL,
score DECIMAL(5, 4),
rank INTEGER,
-- Context
context JSONB, -- Page, product, category where shown
strategy VARCHAR(50), -- 'collaborative', 'sequential', 'graph', etc.
model_version VARCHAR(50),
-- Personalization factors
personalization_factors JSONB,
-- Outcome
shown BOOLEAN DEFAULT TRUE,
clicked BOOLEAN DEFAULT FALSE,
converted BOOLEAN DEFAULT FALSE,
revenue DECIMAL(19, 4),
-- Timestamps
shown_at TIMESTAMPTZ DEFAULT NOW(),
clicked_at TIMESTAMPTZ,
converted_at TIMESTAMPTZ,
CONSTRAINT fk_recommendations_website FOREIGN KEY (website_id) REFERENCES website(website_id) ON DELETE CASCADE
);
-- Indexes for recommendations
CREATE INDEX idx_recommendations_session ON recommendations(session_id);
CREATE INDEX idx_recommendations_user ON recommendations(user_id);
CREATE INDEX idx_recommendations_item ON recommendations(item_id);
CREATE INDEX idx_recommendations_shown_at ON recommendations(shown_at);
CREATE INDEX idx_recommendations_outcome ON recommendations(clicked, converted);
CREATE INDEX idx_recommendations_website ON recommendations(website_id);
-- Comments for recommendations
COMMENT ON TABLE recommendations IS 'Historical recommendations for analysis and learning';
COMMENT ON COLUMN recommendations.strategy IS 'Recommendation strategy used: collaborative, sequential, graph, etc.';
COMMENT ON COLUMN recommendations.personalization_factors IS 'JSONB object containing factors that influenced this recommendation';
-- ============================================================================
-- Table: ml_models
-- Purpose: Model registry and versioning
-- ============================================================================
CREATE TABLE IF NOT EXISTS ml_models (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
name VARCHAR(100) NOT NULL,
version VARCHAR(50) NOT NULL,
model_type VARCHAR(50), -- 'collaborative_filtering', 'sequential', etc.
-- Model metadata
algorithm VARCHAR(100),
hyperparameters JSONB,
training_data_period JSONB, -- {start: '2025-01-01', end: '2025-01-15'}
-- Performance metrics
metrics JSONB, -- {precision: 0.15, recall: 0.25, ndcg: 0.30}
-- Storage
artifact_path VARCHAR(500), -- S3/local path to model file
artifact_size_bytes BIGINT,
-- Status
status VARCHAR(20), -- 'training', 'validating', 'production', 'archived'
is_active BOOLEAN DEFAULT FALSE,
-- Timestamps
trained_at TIMESTAMPTZ,
deployed_at TIMESTAMPTZ,
created_at TIMESTAMPTZ DEFAULT NOW(),
UNIQUE(name, version)
);
-- Indexes for ml_models
CREATE INDEX idx_ml_models_name ON ml_models(name);
CREATE INDEX idx_ml_models_status ON ml_models(status);
CREATE INDEX idx_ml_models_active ON ml_models(is_active) WHERE is_active = TRUE;
-- Comments for ml_models
COMMENT ON TABLE ml_models IS 'ML model registry and versioning';
COMMENT ON COLUMN ml_models.status IS 'Model status: training, validating, production, archived';
COMMENT ON COLUMN ml_models.is_active IS 'Whether this model version is currently active in production';
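-- Illustrative query (not part of this migration): click-through and conversion
-- rate per strategy for recommendations served by the currently active model.
-- SELECT r.strategy, AVG(r.clicked::int) AS ctr, AVG(r.converted::int) AS conversion_rate
-- FROM recommendations r
-- JOIN ml_models m ON m.version = r.model_version AND m.is_active = TRUE
-- GROUP BY r.strategy;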

@@ -0,0 +1,151 @@
-- Migration: Setup Apache AGE Graph Database
-- Created: 2025-01-15
-- Description: Installs Apache AGE extension and creates graph schema for user journey tracking
-- Requirements: PostgreSQL 17 + Apache AGE 1.6.0
-- ============================================================================
-- Step 1: Install Apache AGE Extension
-- ============================================================================
CREATE EXTENSION IF NOT EXISTS age;
-- Load AGE into search path
SET search_path = ag_catalog, "$user", public;
-- ============================================================================
-- Step 2: Create Graph for User Journey Tracking
-- ============================================================================
SELECT ag_catalog.create_graph('user_journey');
-- ============================================================================
-- Step 3: Create Vertex Labels (Node Types)
-- ============================================================================
-- User nodes (represents sessions or logged-in users)
SELECT ag_catalog.create_vlabel('user_journey', 'User');
-- Product nodes
SELECT ag_catalog.create_vlabel('user_journey', 'Product');
-- Category nodes
SELECT ag_catalog.create_vlabel('user_journey', 'Category');
-- Page nodes
SELECT ag_catalog.create_vlabel('user_journey', 'Page');
-- Event nodes (for anomaly detection)
SELECT ag_catalog.create_vlabel('user_journey', 'Event');
-- ============================================================================
-- Step 4: Create Edge Labels (Relationship Types)
-- ============================================================================
-- Generic Relationships (Mode 1 - Always Available)
SELECT ag_catalog.create_elabel('user_journey', 'VIEWED');
SELECT ag_catalog.create_elabel('user_journey', 'ADDED_TO_CART');
SELECT ag_catalog.create_elabel('user_journey', 'PURCHASED');
SELECT ag_catalog.create_elabel('user_journey', 'SEARCHED_FOR');
SELECT ag_catalog.create_elabel('user_journey', 'NAVIGATED_TO');
SELECT ag_catalog.create_elabel('user_journey', 'BOUGHT_TOGETHER');
SELECT ag_catalog.create_elabel('user_journey', 'VIEWED_TOGETHER');
SELECT ag_catalog.create_elabel('user_journey', 'IN_CATEGORY');
-- Adaptive Relationships (Mode 2 - LLM-Enhanced, Optional)
SELECT ag_catalog.create_elabel('user_journey', 'SEMANTICALLY_SIMILAR');
SELECT ag_catalog.create_elabel('user_journey', 'PREDICTED_INTEREST');
SELECT ag_catalog.create_elabel('user_journey', 'COMPLEMENTARY');
SELECT ag_catalog.create_elabel('user_journey', 'ANOMALOUS_BEHAVIOR');
-- ============================================================================
-- Step 5: Create Helper Functions
-- ============================================================================
-- Function to execute Cypher queries safely
CREATE OR REPLACE FUNCTION execute_cypher(graph_name text, query text)
RETURNS SETOF agtype
LANGUAGE plpgsql
AS $$
BEGIN
RETURN QUERY EXECUTE format('SELECT * FROM ag_catalog.cypher(%L, %L) AS (result agtype)', graph_name, query);
END;
$$;
COMMENT ON FUNCTION execute_cypher IS 'Helper function to execute Cypher queries on Apache AGE graphs';
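-- Illustrative usage (assumes journey data has already been loaded; not executed here):
-- SELECT * FROM execute_cypher('user_journey',
--     'MATCH (u:User)-[:VIEWED]->(p:Product) RETURN p LIMIT 10');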
-- ============================================================================
-- Step 6: Create Indexes for Graph Performance
-- ============================================================================
-- Note: Apache AGE automatically creates indexes for vertex and edge IDs
-- Additional indexes can be created on properties as needed
-- ============================================================================
-- Step 7: Verify Installation
-- ============================================================================
-- Verify graph exists
DO $$
DECLARE
graph_count INTEGER;
BEGIN
SELECT COUNT(*) INTO graph_count
FROM ag_catalog.ag_graph
WHERE name = 'user_journey';
IF graph_count = 0 THEN
RAISE EXCEPTION 'Graph user_journey was not created successfully';
ELSE
RAISE NOTICE 'Apache AGE setup complete: Graph user_journey created successfully';
END IF;
END $$;
-- Verify vertex labels
DO $$
DECLARE
vlabel_count INTEGER;
BEGIN
SELECT COUNT(*) INTO vlabel_count
FROM ag_catalog.ag_label
WHERE graph = (SELECT graphid FROM ag_catalog.ag_graph WHERE name = 'user_journey')
AND kind = 'v';
RAISE NOTICE 'Created % vertex labels', vlabel_count;
END $$;
-- Verify edge labels
DO $$
DECLARE
elabel_count INTEGER;
BEGIN
SELECT COUNT(*) INTO elabel_count
FROM ag_catalog.ag_label
WHERE graph = (SELECT graphid FROM ag_catalog.ag_graph WHERE name = 'user_journey')
AND kind = 'e';
RAISE NOTICE 'Created % edge labels', elabel_count;
END $$;
-- ============================================================================
-- Step 8: Grant Permissions
-- ============================================================================
-- Grant usage on ag_catalog schema to application user
-- Note: Replace 'umami_user' with your actual database user
-- GRANT USAGE ON SCHEMA ag_catalog TO umami_user;
-- GRANT SELECT, INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA ag_catalog TO umami_user;
-- ============================================================================
-- Migration Complete
-- ============================================================================
-- Log completion
DO $$
BEGIN
RAISE NOTICE '=================================================================';
RAISE NOTICE 'Apache AGE Migration Complete';
RAISE NOTICE 'Graph: user_journey';
RAISE NOTICE 'Vertex Labels: User, Product, Category, Page, Event';
RAISE NOTICE 'Edge Labels: VIEWED, ADDED_TO_CART, PURCHASED, SEARCHED_FOR, etc.';
RAISE NOTICE 'Helper Functions: execute_cypher()';
RAISE NOTICE '=================================================================';
END $$;

@@ -0,0 +1,214 @@
-- Migration: Setup TimescaleDB for Time-Series Analytics
-- Created: 2025-01-15
-- Description: Installs TimescaleDB extension and creates hypertables for time-series data
-- Requirements: PostgreSQL 17 + TimescaleDB 2.23.0
-- ============================================================================
-- Step 1: Install TimescaleDB Extension
-- ============================================================================
CREATE EXTENSION IF NOT EXISTS timescaledb;
-- ============================================================================
-- Step 2: Create Time-Series Events Table
-- ============================================================================
CREATE TABLE IF NOT EXISTS time_series_events (
time TIMESTAMPTZ NOT NULL,
website_id UUID NOT NULL,
session_id UUID NOT NULL,
user_id VARCHAR(255),
-- Event details
event_type VARCHAR(50) NOT NULL,
event_name VARCHAR(50),
event_value DECIMAL(19, 4),
properties JSONB,
-- Dimensions for fast filtering
page_url VARCHAR(500),
product_id VARCHAR(50),
category VARCHAR(100),
device VARCHAR(20),
country CHAR(2),
PRIMARY KEY (time, website_id, session_id)
);
-- Convert to hypertable (partitioned by time)
SELECT create_hypertable('time_series_events', 'time',
chunk_time_interval => INTERVAL '7 days',
if_not_exists => TRUE
);
-- ============================================================================
-- Step 3: Create Indexes for Time-Series Queries
-- ============================================================================
-- Index for website-specific queries
CREATE INDEX IF NOT EXISTS idx_ts_events_website
ON time_series_events (website_id, time DESC);
-- Index for session-based queries
CREATE INDEX IF NOT EXISTS idx_ts_events_session
ON time_series_events (session_id, time DESC);
-- Index for product-based queries (partial index for sparse data)
CREATE INDEX IF NOT EXISTS idx_ts_events_product
ON time_series_events (product_id, time DESC)
WHERE product_id IS NOT NULL;
-- Index for event type queries
CREATE INDEX IF NOT EXISTS idx_ts_events_event_type
ON time_series_events (event_type, time DESC);
-- ============================================================================
-- Step 4: Create Aggregated Metrics Tables
-- ============================================================================
-- Website metrics aggregated hourly
CREATE TABLE IF NOT EXISTS website_metrics_hourly (
time TIMESTAMPTZ NOT NULL,
website_id UUID NOT NULL,
-- Traffic metrics
pageviews INTEGER DEFAULT 0,
unique_sessions INTEGER DEFAULT 0,
unique_users INTEGER DEFAULT 0,
avg_time_on_page INTEGER,
avg_scroll_depth INTEGER,
bounce_rate DECIMAL(5, 4),
-- Conversion metrics
add_to_cart_count INTEGER DEFAULT 0,
checkout_start_count INTEGER DEFAULT 0,
purchase_count INTEGER DEFAULT 0,
conversion_rate DECIMAL(5, 4),
-- Revenue metrics
total_revenue DECIMAL(19, 4) DEFAULT 0,
avg_order_value DECIMAL(19, 4),
PRIMARY KEY (time, website_id)
);
-- Convert to hypertable
SELECT create_hypertable('website_metrics_hourly', 'time',
chunk_time_interval => INTERVAL '30 days',
if_not_exists => TRUE
);
-- Product metrics aggregated daily
CREATE TABLE IF NOT EXISTS product_metrics_daily (
time DATE NOT NULL,
website_id UUID NOT NULL,
product_id VARCHAR(50) NOT NULL,
-- View metrics
views INTEGER DEFAULT 0,
unique_viewers INTEGER DEFAULT 0,
avg_time_viewed INTEGER,
-- Conversion metrics
add_to_cart_count INTEGER DEFAULT 0,
purchase_count INTEGER DEFAULT 0,
conversion_rate DECIMAL(5, 4),
-- Revenue metrics
revenue DECIMAL(19, 4) DEFAULT 0,
units_sold INTEGER DEFAULT 0,
PRIMARY KEY (time, website_id, product_id)
);
-- Convert to hypertable
SELECT create_hypertable('product_metrics_daily', 'time',
chunk_time_interval => INTERVAL '30 days',
if_not_exists => TRUE
);
-- ============================================================================
-- Step 5: Create Continuous Aggregates (Materialized Views)
-- ============================================================================
-- Hourly website metrics continuous aggregate
CREATE MATERIALIZED VIEW IF NOT EXISTS website_metrics_hourly_agg
WITH (timescaledb.continuous) AS
SELECT
time_bucket('1 hour', time) AS time,
website_id,
COUNT(*) FILTER (WHERE event_type = 'pageview') as pageviews,
COUNT(DISTINCT session_id) as unique_sessions,
COUNT(DISTINCT user_id) FILTER (WHERE user_id IS NOT NULL) as unique_users,
COUNT(*) FILTER (WHERE event_name = 'add_to_cart') as add_to_cart_count,
COUNT(*) FILTER (WHERE event_name = 'checkout_start') as checkout_start_count,
COUNT(*) FILTER (WHERE event_name = 'purchase') as purchase_count,
SUM(event_value) FILTER (WHERE event_name = 'purchase') as total_revenue
FROM time_series_events
GROUP BY time_bucket('1 hour', time), website_id;
-- Add refresh policy for continuous aggregate
SELECT add_continuous_aggregate_policy('website_metrics_hourly_agg',
start_offset => INTERVAL '3 hours',
end_offset => INTERVAL '1 hour',
schedule_interval => INTERVAL '1 hour',
if_not_exists => TRUE
);
-- Daily product metrics continuous aggregate
CREATE MATERIALIZED VIEW IF NOT EXISTS product_metrics_daily_agg
WITH (timescaledb.continuous) AS
SELECT
time_bucket('1 day', time) AS time,
website_id,
product_id,
COUNT(*) FILTER (WHERE event_name = 'product_view') as views,
COUNT(DISTINCT session_id) FILTER (WHERE event_name = 'product_view') as unique_viewers,
COUNT(*) FILTER (WHERE event_name = 'add_to_cart') as add_to_cart_count,
COUNT(*) FILTER (WHERE event_name = 'purchase') as purchase_count,
SUM(event_value) FILTER (WHERE event_name = 'purchase') as revenue
FROM time_series_events
WHERE product_id IS NOT NULL
GROUP BY time_bucket('1 day', time), website_id, product_id;
-- Add refresh policy for product metrics
SELECT add_continuous_aggregate_policy('product_metrics_daily_agg',
start_offset => INTERVAL '7 days',
end_offset => INTERVAL '1 day',
schedule_interval => INTERVAL '1 day',
if_not_exists => TRUE
);
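-- Illustrative query (not part of this migration): last 24 hours of traffic
-- from the hourly continuous aggregate; the website UUID is a placeholder.
-- SELECT time, pageviews, unique_sessions, total_revenue
-- FROM website_metrics_hourly_agg
-- WHERE website_id = '<website-uuid>' AND time > now() - INTERVAL '24 hours'
-- ORDER BY time;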
-- ============================================================================
-- Step 6: Create Data Retention Policies
-- ============================================================================
-- Retain raw time-series events for 90 days
SELECT add_retention_policy('time_series_events',
INTERVAL '90 days',
if_not_exists => TRUE
);
-- Retain hourly aggregates for 1 year
SELECT add_retention_policy('website_metrics_hourly',
INTERVAL '1 year',
if_not_exists => TRUE
);
-- Retain daily product metrics for 2 years
SELECT add_retention_policy('product_metrics_daily',
INTERVAL '2 years',
if_not_exists => TRUE
);
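-- Illustrative check (not part of this migration): list the background jobs
-- created by the policies above.
-- SELECT job_id, application_name, schedule_interval FROM timescaledb_information.jobs;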
-- ============================================================================
-- Migration Complete
-- ============================================================================
DO $$
BEGIN
RAISE NOTICE '=================================================================';
RAISE NOTICE 'TimescaleDB Migration Complete';
RAISE NOTICE 'Hypertables: time_series_events, website_metrics_hourly, product_metrics_daily';
RAISE NOTICE 'Continuous Aggregates: website_metrics_hourly_agg, product_metrics_daily_agg';
RAISE NOTICE 'Retention Policies: 90 days (raw), 1 year (hourly), 2 years (daily)';
RAISE NOTICE '=================================================================';
END $$;

@@ -121,6 +121,20 @@ model WebsiteEvent {
tag String? @db.VarChar(50)
hostname String? @db.VarChar(100)
// Enhanced engagement tracking fields
scrollDepth Int? @map("scroll_depth") @db.Integer
timeOnPage Int? @map("time_on_page") @db.Integer
clickCount Int? @map("click_count") @db.Integer
formInteractions Json? @map("form_interactions")
// WooCommerce e-commerce tracking fields
wcProductId String? @map("wc_product_id") @db.VarChar(50)
wcCategoryId String? @map("wc_category_id") @db.VarChar(50)
wcCartValue Decimal? @map("wc_cart_value") @db.Decimal(19, 4)
wcCheckoutStep Int? @map("wc_checkout_step") @db.Integer
wcOrderId String? @map("wc_order_id") @db.VarChar(50)
wcRevenue Decimal? @map("wc_revenue") @db.Decimal(19, 4)
eventData EventData[]
session Session @relation(fields: [sessionId], references: [id])