🗄️ Wix Careers - Data Architecture

Complete visual documentation of current and proposed data structures

📊 System Overview

Current Architecture

The Wix Careers site uses a hybrid data architecture with multiple data sources:

  • SmartRecruiters (SR): Source of job postings
  • Oracle External Database: Source of organizational structure (teams, locations)
  • Wix CMS Collections: Synced copies, manual content, and virtual mappings

High-Level Data Flow

graph TB
    subgraph External["External Systems"]
        SR[SmartRecruiters API<br/>Job Postings]
        Oracle[Oracle Database<br/>Teams & Locations]
    end
    subgraph Sync["Automated Sync Jobs"]
        SRSync[syncPostings<br/>Every 10 min]
        OracleSync[syncCentralDB<br/>Daily 11:59 PM]
    end
    subgraph CMS["Wix CMS Collections"]
        direction TB
        Postings[(Postings)]
        ClonedOrg[(ClonedCentralDBOrganization)]
        ClonedLoc[(ClonedCentralDBLocations)]
        Mapping[(Mapping)]
        Virtual[(virtual-structure)]
        Locations[(Locations)]
    end
    subgraph Manual["Manual Content"]
        EditPeople[(editPeople-v1)]
        EditTeamLinks[(EditTeamLinks-v1)]
        EditTeamValues[(EditTeamValues-v1)]
        EditLocation[(EditLocation)]
    end
    subgraph Website["Careers Website"]
        JobListings[Job Listings Page]
        TeamPages[Team Pages]
        LocationPages[Location Pages]
    end

    SR -->|Fetch Postings| SRSync
    Oracle -->|Fetch Org Data| OracleSync
    SRSync -->|Upsert| Postings
    OracleSync -->|Clone| ClonedOrg
    OracleSync -->|Clone| ClonedLoc
    ClonedOrg -.->|Referenced by| Mapping
    Mapping -->|Maps to| Virtual
    EditPeople -->|Updates| Virtual
    EditTeamLinks -->|Updates| Virtual
    EditTeamValues -->|Updates| Virtual
    EditLocation -->|Updates| Locations
    Postings -->|References| Virtual
    Postings -->|References| Locations
    Postings -->|Data| JobListings
    Virtual -->|Data| TeamPages
    Locations -->|Data| LocationPages

    style SR fill:#e3f2fd
    style Oracle fill:#e3f2fd
    style Postings fill:#c8e6c9
    style Virtual fill:#fff9c4
    style ClonedOrg fill:#ffccbc
    style ClonedLoc fill:#ffccbc
    style Mapping fill:#f8bbd0

🗂️ Current Entity Relationship Diagram (ERD)

Core Collections (Synced from External)
Virtual/Mapping Collections
Cloned Oracle Data
Manual Content Collections
Other CMS Content

Full ERD - All Collections and Relationships

erDiagram
    %% Core Synced Collections
    Postings ||--o| Locations : "location (ref)"
    Postings ||--o| virtual-structure : "virtualTeam (ref)"

    %% Oracle Cloned Collections
    ClonedCentralDBOrganization ||--o{ Mapping : "real[] (multi-ref)"
    Mapping ||--|| virtual-structure : "virtual (ref)"

    %% Virtual Structure Hierarchy
    virtual-structure ||--o{ virtual-structure : "parent (ref)"

    %% Manual Edit Collections
    editPeople-v1 }o--|| virtual-structure : "team (ref)"
    EditTeamLinks-v1 }o--|| virtual-structure : "team (ref)"
    EditTeamValues-v1 }o--|| virtual-structure : "team (ref)"
    EditLocation }o--|| Locations : "updates via hooks"
    EditLocalBenefits }o--|| Locations : "location (ref)"

    %% Content Collections
    Blog-Posts ||--o| Blog-Categories : "category (ref)"

    Postings {
        string _id PK
        string title
        string location FK
        string virtualTeam FK
        string team "Oracle ID"
        string guild "Oracle ID"
        string department "Oracle ID"
        string city
        string country
        string jobDescription
        string qualifications
        string additionalInformation
        string applyLink
        boolean isStudent
        boolean isTemp
        boolean isIntern
        boolean isProgram
        boolean entryLevel
        boolean isRemote
        boolean isFulltime
        boolean isManagerial
        string seniorityLevel
        boolean promoted
        string[] types
        string reference
        string postingNumber
    }
    Locations {
        string _id PK "Oracle Location ID"
        string title "Display name"
        string googleName "SR matching name"
        string slug
        number order
        string city
        string country
        string about
        string coverImage
        string[] benefits
    }
    virtual-structure {
        string _id PK
        string title
        string slug
        string parent FK "Self-reference"
        number order
        string about
        string coverImage
        object[] people "Embedded from editPeople"
        object[] teamsValues "Embedded from EditTeamValues"
        object[] teamsLinks "Embedded from EditTeamLinks"
        number postingsCount "Calculated"
    }
    ClonedCentralDBOrganization {
        string _id PK "Oracle Org ID"
        string name
        string organizationType
        string parentOrganizationId
        date effectiveStartDate
        date effectiveEndDate
    }
    ClonedCentralDBLocations {
        string _id PK "Oracle Location ID"
        string locationName
        string addressLine1
        string city
        string country
        date effectiveStartDate
        date effectiveEndDate
    }
    Mapping {
        string _id PK
        string[] real FK "Multi-ref to ClonedCentralDBOrganization"
        string virtual FK "Ref to virtual-structure"
    }
    editPeople-v1 {
        string _id PK
        string team FK
        string name
        string email
        string role
        string image
        number order
    }
    EditTeamLinks-v1 {
        string _id PK
        string team FK
        string title
        string link
        string icon
    }
    EditTeamValues-v1 {
        string _id PK
        string team FK
        string title
        string description
    }
    EditLocation {
        string _id PK
        string location FK
        string about
        string coverImage
        string[] benefits
    }
    EditLocalBenefits {
        string _id PK
        string location FK
        string title
        string description
        string icon
    }
    Blog-Posts {
        string _id PK
        string title
        string slug
        string content
        string excerpt
        string coverImage
        date publishedDate
        string category FK
    }
    Blog-Categories {
        string _id PK
        string name
        string slug
    }
    equalReports {
        string _id PK
        string title
        string year
        string pdfLink
    }
    esgReports {
        string _id PK
        string title
        string year
        string pdfLink
    }
    EarlyTalentPrograms {
        string _id PK
        string title
        string description
        string link
        string coverImage
    }
    SrSources {
        string _id PK
        string sourceName
        string sourceId
    }
    SearchKeywords {
        string _id PK
        string keyword
        string[] relatedPostings
    }

🔄 Data Flow Diagrams

Flow 1: SmartRecruiters Posting Sync

Trigger: Scheduled job every 10 minutes
Purpose: Keep job postings up-to-date with SmartRecruiters
sequenceDiagram
    participant Cron as Cron Job
    participant Sync as syncPostings()
    participant SR as SmartRecruiters API
    participant LocMap as getLocationMap()
    participant TeamMap as getAllTeamsMaps()
    participant CMS as Wix CMS

    Cron->>Sync: Every 10 minutes
    par Parallel Fetches
        Sync->>CMS: Get current postings
        Sync->>SR: Fetch SR postings
        Sync->>LocMap: Build location map
        Sync->>TeamMap: Get team mappings
        Sync->>CMS: Get promoted postings
    end
    loop For each SR posting
        Sync->>SR: getJSON(item.ref) - detailed data
        Sync->>Sync: Extract custom fields (guild, subGuild, dept)
        Sync->>LocMap: Map city → Oracle location ID
        Sync->>TeamMap: Map Oracle team → virtual team
        Sync->>Sync: Build posting object
    end
    Sync->>Sync: getByOperation (diff current vs new)
    alt Has deletions
        Sync->>CMS: bulkRemove deleted postings
    end
    alt Has updates/inserts
        Sync->>CMS: bulkSave postings
    end
    Sync-->>Cron: Complete

Flow 2: Oracle Organization Sync

Trigger: Scheduled job daily at 11:59 PM
Purpose: Clone Oracle data to queryable CMS collections
sequenceDiagram
    participant Cron as Cron Job
    participant Sync as syncCentralDB()
    participant Oracle as Oracle External DB
    participant CMS as Wix CMS

    Cron->>Sync: Daily at 11:59 PM
    par Parallel Fetches
        Sync->>Oracle: Query vw_organization
        Sync->>Oracle: Query vw_location
        Sync->>CMS: Get ClonedCentralDBOrganization
        Sync->>CMS: Get ClonedCentralDBLocations
    end
    rect rgb(255, 240, 240)
        Note over Sync: BUG: getByOperation takes OLD data<br/>from CMS instead of FRESH from Oracle<br/>for existing items (updates not synced!)
    end
    Sync->>Sync: getByOperation (diff current vs oracle)
    Sync->>Sync: Separate locations & organizations
    alt Has deletions (organizations)
        Sync->>CMS: bulkDelete organizations
    end
    alt Has updates/inserts (organizations)
        Sync->>CMS: bulkSave organizations
    end
    alt Has deletions (locations)
        Sync->>CMS: bulkDelete locations
    end
    alt Has updates/inserts (locations)
        Sync->>CMS: bulkSave locations
    end
    Sync-->>Cron: Complete

Flow 3: Manual Location Content Update

Trigger: HR edits EditLocation collection in CMS
Purpose: Update public-facing location content
sequenceDiagram
    participant HR as HR/Content Editor
    participant Edit as EditLocation Collection
    participant Hook as Data Hook
    participant Loc as Locations Collection

    HR->>Edit: Insert/Update/Remove content
    alt afterInsert
        Edit->>Hook: afterInsert trigger
        Hook->>Hook: Prepare location data
        Hook->>Loc: Insert new location item
    end
    alt afterUpdate
        Edit->>Hook: afterUpdate trigger
        Hook->>Hook: Prepare updated data
        Hook->>Loc: Update location item
    end
    alt afterRemove
        Edit->>Hook: afterRemove trigger
        Hook->>Loc: Remove location item
    end
    Hook-->>HR: Success/Failure
    Loc-->>Website: Updated location displays

Flow 4: Manual Team Content Update

Trigger: HR edits editPeople/EditTeamLinks/EditTeamValues collections
Purpose: Update team pages with people, links, and values
sequenceDiagram
    participant HR as HR/Content Editor
    participant Edit as Edit Collection<br/>(People/Links/Values)
    participant Hook as Data Hook
    participant Virtual as virtual-structure

    HR->>Edit: Insert/Update/Remove content
    Edit->>Hook: Data hook triggers
    Hook->>Virtual: Get team by ID
    alt Add Operation
        Hook->>Hook: Check if already exists
        Hook->>Virtual: Add to team's array
        Hook->>Virtual: updateVirtualTeam()
    end
    alt Remove Operation
        Hook->>Hook: Filter out removed item
        Hook->>Virtual: Update team's array
        Hook->>Virtual: updateVirtualTeam()
    end
    Virtual-->>Website: Team page updated
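
The add and remove branches of this flow can be sketched as two pure array helpers. This is a minimal illustration, not the site's actual hook code: the helper names `addEmbedded` and `removeEmbedded` are hypothetical, and matching items by `_id` for the "already exists" check is an assumption. Persisting the result (the `updateVirtualTeam()` step) is left out.

```javascript
// Hypothetical helpers; the real data hooks are not shown in this document.

// Add an item to one of a team's embedded arrays (people, teamsLinks, teamsValues)
// unless an item with the same _id is already present (assumed duplicate check).
function addEmbedded(team, field, item) {
  const current = team[field] ?? [];
  if (current.some((existing) => existing._id === item._id)) {
    return team; // already embedded, nothing to do
  }
  return { ...team, [field]: [...current, item] };
}

// Remove an item from an embedded array by filtering it out.
function removeEmbedded(team, field, itemId) {
  const current = team[field] ?? [];
  return { ...team, [field]: current.filter((existing) => existing._id !== itemId) };
}
```

Both helpers return a new team object instead of mutating the input, so a hook could compare before and after and skip the write when nothing changed.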

🔀 Team Mapping Architecture

⚠️ Complex Many-to-One Mapping

The team mapping system uses THREE separate layers: two independent hierarchies and a Mapping collection that connects them:

  1. Oracle Hierarchy: Guild → SubGuild (internal, real org structure)
  2. Virtual Structure Hierarchy: Parent → Child (public-facing, simplified)
  3. Mapping Collection: Bridges Oracle teams to virtual teams (many-to-one)

Three-Layer Mapping System

graph TB
    subgraph Oracle["Oracle Organization (Real)"]
        direction TB
        OG1[Guild: Backend Engineering<br/>ID: 300000000012345]
        OS1[SubGuild: Core Platform<br/>ID: 300000000012346]
        OS2[SubGuild: API Gateway<br/>ID: 300000000012347]
        OS3[SubGuild: Infrastructure<br/>ID: 300000000012348]
        OG2[Guild: DevOps Department<br/>ID: 300000000012349]
        OS4[SubGuild: SRE Team<br/>ID: 300000000012350]
        OG1 --> OS1
        OG1 --> OS2
        OG1 --> OS3
        OG2 --> OS4
    end
    subgraph Cloned["Cloned in CMS"]
        direction TB
        CO1[ClonedOrg: 300000000012345]
        CO2[ClonedOrg: 300000000012346]
        CO3[ClonedOrg: 300000000012347]
        CO4[ClonedOrg: 300000000012348]
        CO5[ClonedOrg: 300000000012349]
        CO6[ClonedOrg: 300000000012350]
    end
    subgraph Mapping["Mapping Collection (Bridge)"]
        direction TB
        M1["real: [300000000012346, 300000000012347]<br/>virtual: virtual-backend"]
        M2["real: [300000000012348, 300000000012350]<br/>virtual: virtual-infra"]
        M3["real: [300000000012345]<br/>virtual: virtual-engineering"]
    end
    subgraph Virtual["Virtual Structure (Public)"]
        direction TB
        VP1[Engineering<br/>parent: null]
        VC1[Backend Team<br/>parent: Engineering]
        VC2[Infrastructure & DevOps<br/>parent: Engineering]
        VP1 --> VC1
        VP1 --> VC2
    end

    OG1 -.->|Synced Daily| CO1
    OS1 -.->|Synced Daily| CO2
    OS2 -.->|Synced Daily| CO3
    OS3 -.->|Synced Daily| CO4
    OG2 -.->|Synced Daily| CO5
    OS4 -.->|Synced Daily| CO6
    CO2 -->|Referenced| M1
    CO3 -->|Referenced| M1
    CO4 -->|Referenced| M2
    CO6 -->|Referenced| M2
    CO1 -->|Referenced| M3
    M1 -->|Maps to| VC1
    M2 -->|Maps to| VC2
    M3 -->|Maps to| VP1

    style Oracle fill:#ffe6e6
    style Cloned fill:#ffccbc
    style Mapping fill:#f8bbd0
    style Virtual fill:#fff9c4

Key Insights

  • Complete Independence: Virtual hierarchy is NOT derived from Oracle hierarchy
  • Many-to-One Mapping: Multiple Oracle teams (guilds AND subguilds) can map to ONE virtual team
  • Manual Curation: HR manually decides which Oracle teams map to which virtual teams
  • Flexibility: Can mix Oracle guilds and subguilds in any virtual parent/child structure
  • No Automatic Rules: Virtual team assignment is purely based on Mapping collection

Posting Assignment Logic

flowchart TD
    Start[SR Posting Received] --> Extract[Extract Oracle team from SR<br/>team = subGuild OR guild]
    Extract --> Lookup[Look up in Mapping Collection<br/>find mapping where team is in real array]
    Lookup --> Found{Mapping<br/>Found?}
    Found -->|Yes| Assign[Assign virtualTeam from mapping]
    Found -->|No| NoAssign[virtualTeam = null]
    Assign --> Save[Save to Postings collection]
    NoAssign --> Save
    Save --> Display[Display on Website]
    Display --> ShowTeam{Has<br/>virtualTeam?}
    ShowTeam -->|Yes| ShowParentOrChild[Display whatever virtual team<br/>mapping says - parent OR child]
    ShowTeam -->|No| Error[Error: No team mapping]

    style Found fill:#fff9c4
    style ShowTeam fill:#fff9c4
    style Error fill:#ffebee
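
The lookup in this flowchart amounts to inverting the Mapping collection into an Oracle-ID-to-virtual-team table and querying it once per posting. The sketch below assumes Mapping rows shaped as `{ real: [...], virtual }` (per the collection reference) and a posting object that exposes `subGuild` and `guild`; the function names `buildTeamMap` and `resolveVirtualTeam` are hypothetical and are not the site's `getAllTeamsMaps()` implementation.

```javascript
// Hypothetical helper: invert Mapping rows into Oracle ID -> virtual team ID.
function buildTeamMap(mappings) {
  const teamMap = new Map();
  for (const { real = [], virtual } of mappings) {
    for (const oracleId of real) {
      teamMap.set(oracleId, virtual);
    }
  }
  return teamMap;
}

// team = subGuild OR guild; virtualTeam is null when no mapping row contains the team.
function resolveVirtualTeam(posting, teamMap) {
  const team = posting.subGuild || posting.guild || null;
  const virtualTeam = (team && teamMap.get(team)) ?? null;
  return { team, virtualTeam };
}
```

With the example IDs from the three-layer diagram, a posting whose subGuild is `300000000012346` resolves to `virtual-backend`, and a posting whose team appears in no `real` array resolves to `null`.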

⏱️ Sync Processes & Schedules

| Sync Job | Frequency | Source | Target | Purpose | Status |
|---|---|---|---|---|---|
| syncPostings | Every 10 minutes | SmartRecruiters API | Postings collection | Keep job listings up-to-date | ✅ Active |
| syncCentralDB | Daily at 11:59 PM | Oracle External DB | ClonedCentralDBOrganization, ClonedCentralDBLocations | Clone Oracle data for reference fields | ⚠️ Active (has bug) |

🐛 Known Bug in syncCentralDB

Location: src/backend/data-sync/centralDBSync.js:48

Issue: For existing items that need updates, the function takes OLD data from CMS instead of FRESH data from Oracle

Current Code:

...a.filter(item => idsInB.has(item._id))

Should Be:

...b.filter(item => idsInA.has(item._id))

Impact: Updates from Oracle are not synced for existing items, only new inserts and deletions work correctly
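
To make the one-line fix concrete, here is a minimal sketch of a diff function with the corrected filter. The real `getByOperation` body is not reproduced in this document, so the overall shape, the meaning of the parameters (`a` as the items currently in the CMS, `b` as the fresh items from Oracle), and the returned `{ toSave, toRemove }` object are assumptions for illustration.

```javascript
// Hypothetical reconstruction. a = items currently in the CMS, b = fresh items from the source.
function getByOperation(a, b) {
  const idsInA = new Set(a.map((item) => item._id));
  const idsInB = new Set(b.map((item) => item._id));

  return {
    // In the CMS but no longer in the source: remove.
    toRemove: a.filter((item) => !idsInB.has(item._id)),
    toSave: [
      // New in the source: insert.
      ...b.filter((item) => !idsInA.has(item._id)),
      // Present on both sides: take the FRESH copy from b (the fix).
      ...b.filter((item) => idsInA.has(item._id)),
    ],
  };
}
```

With the buggy filter, the second spread would re-save the stale CMS copy from `a`, which is why renamed or re-dated Oracle organizations never change in the clone.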

Sync Performance Considerations

Current Approach: Full Upsert

Both sync jobs re-save every item that still exists in the source (a full upsert keyed on _id) and remove only the items that have disappeared, rather than comparing field values:

  • Pros: Simple logic, always consistent, handles schema changes
  • Cons: More compute, more writes, potential for race conditions

Reasoning: With 10-minute (postings) and daily (Oracle) frequencies and relatively small datasets, the simplicity outweighs the optimization benefits. Modern serverless compute handles this efficiently.
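
For comparison, the field-level alternative would look roughly like the sketch below. It is not part of the current sync code; `hasChanged` and `selectChanged` are hypothetical names, and comparing values through `JSON.stringify` is a simplification that treats objects with different key order as changed.

```javascript
// Hypothetical sketch of field-level comparison; not used by the current sync jobs.

// True when any of the listed fields differs between the stored and the fresh item.
function hasChanged(current, fresh, fields) {
  return fields.some(
    (field) => JSON.stringify(current[field]) !== JSON.stringify(fresh[field])
  );
}

// Keep only fresh items that are new or differ from the stored copy.
function selectChanged(currentItems, freshItems, fields) {
  const currentById = new Map(currentItems.map((item) => [item._id, item]));
  return freshItems.filter((fresh) => {
    const current = currentById.get(fresh._id);
    return !current || hasChanged(current, fresh, fields);
  });
}
```

The extra bookkeeping (a maintained field list per collection) is the cost that the full upsert avoids.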

🚀 Proposed V2 Architecture

⚠️ Decision Point: Do We Need MySQL?

Before proposing a MySQL architecture, let's evaluate if it's necessary:

Current Pain Points Analysis

| Issue | Severity | Wix CMS Solution | Requires MySQL? |
|---|---|---|---|
| Unused collections (ClonedCentralDBLocations) | 🟡 Low | Simply stop syncing it | ❌ No |
| Redundant data in Postings (city, country) | 🟡 Low | Keep for filtering performance | ❌ No |
| Complex team mapping | 🟢 Feature | Working as designed | ❌ No |
| Sync bug in centralDBSync | 🔴 High | Fix one line of code | ❌ No |
| Multiple Edit collections for one entity | 🟡 Medium | Consolidate with better UI | ❌ No |
| Oracle ID storage rationale unclear | 🟢 Documentation | Better documentation | ❌ No |

✅ Recommendation: Optimize Current Architecture

Verdict: MySQL is NOT necessary. The current architecture is sound but needs cleanup and optimization.

Reasons to keep Wix CMS:

  • Built-in UI for content management
  • Data hooks for business logic
  • Reference fields work well
  • Performance is adequate with caching
  • Integration with Wix ecosystem
  • External database support already available (Oracle)

Proposed V2: Optimized Wix CMS Architecture

graph TB
    subgraph External["External Systems"]
        SR[SmartRecruiters API]
        Oracle[Oracle Database]
    end
    subgraph Sync["Optimized Sync Jobs"]
        SRSync[syncPostings<br/>Every 10 min<br/>✅ Fixed]
        OracleSync[syncCentralDB<br/>Daily<br/>✅ Bug Fixed]
    end
    subgraph Core["Core Collections (Streamlined)"]
        Postings[(Postings<br/>✅ Keep denormalized fields)]
        ClonedOrg[(ClonedCentralDBOrganization<br/>✅ Keep for Mapping)]
        Mapping[(Mapping<br/>✅ Keep as-is)]
        Virtual[(virtual-structure<br/>✅ Consolidate edit logic)]
        Locations[(Locations<br/>✅ Keep as-is)]
    end
    subgraph Removed["Removed/Unused"]
        ClonedLoc[(ClonedCentralDBLocations<br/>❌ Remove - Unused)]
    end
    subgraph Consolidated["Consolidated Edit UI"]
        TeamEditor[Unified Team Editor<br/>People + Links + Values]
        LocationEditor[Unified Location Editor<br/>Content + Benefits]
    end
    subgraph Content["Other Content (Keep)"]
        Blog[(Blog/Posts + Categories)]
        Reports[(equalReports + esgReports)]
        Programs[(EarlyTalentPrograms)]
        Keywords[(SearchKeywords)]
    end

    SR -->|Fetch| SRSync
    Oracle -->|Fetch| OracleSync
    SRSync -->|Upsert| Postings
    OracleSync -->|Clone| ClonedOrg
    OracleSync -.->|Stop syncing| ClonedLoc
    ClonedOrg -->|Referenced| Mapping
    Mapping -->|Maps to| Virtual
    TeamEditor -->|Updates| Virtual
    LocationEditor -->|Updates| Locations
    Postings -->|References| Virtual
    Postings -->|References| Locations

    style SR fill:#e3f2fd
    style Oracle fill:#e3f2fd
    style Postings fill:#c8e6c9
    style Virtual fill:#fff9c4
    style ClonedOrg fill:#ffccbc
    style ClonedLoc fill:#ffebee
    style TeamEditor fill:#e1bee7
    style LocationEditor fill:#e1bee7

V2 Optimization Checklist

| Task | Priority | Effort | Impact | Action |
|---|---|---|---|---|
| Fix syncCentralDB bug | 🔴 Critical | 1 line | High | Change line 48 in centralDBSync.js |
| Remove ClonedCentralDBLocations sync | 🟡 Medium | Low | Low | Stop syncing, keep collection for now |
| Consolidate Edit Collections UI | 🟡 Medium | High | Medium | Build unified editor for team content |
| Add better error handling | 🟡 Medium | Medium | Medium | Log unmapped postings, alert on failures |
| Document Oracle ID rationale | 🟢 Low | Low | Low | Add inline comments and docs |
| Audit unused fields | 🟢 Low | Medium | Low | Remove companyDescription if confirmed unused |
| Add field-level comparison in sync | 🟢 Low | High | Low | Only if performance becomes an issue |
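
As one possible shape for the "log unmapped postings" task, the sketch below groups postings that ended up without a `virtualTeam` by their Oracle team ID, so HR can see which IDs still need a Mapping row. The function name `summarizeUnmapped` and the `"(no Oracle team)"` bucket label are assumptions for illustration; how the summary is logged or alerted on is left open.

```javascript
// Hypothetical reporting helper for the "Add better error handling" task.
// Returns an object: Oracle team ID -> array of posting IDs lacking a virtual team.
function summarizeUnmapped(postings) {
  const byTeam = new Map();
  for (const posting of postings) {
    if (posting.virtualTeam != null) continue; // mapped, skip
    const key = posting.team ?? "(no Oracle team)";
    byTeam.set(key, [...(byTeam.get(key) ?? []), posting._id]);
  }
  return Object.fromEntries(byTeam);
}
```

Running this at the end of `syncPostings` and logging a non-empty result would surface mapping gaps without changing what gets saved.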

📋 Complete Collections Reference

Core Synced Collections

Postings

| Field | Type | Source | Purpose | Used In UI? |
|---|---|---|---|---|
| _id | String (PK) | SR refNumber + id | Unique identifier | ✅ Routing |
| title | String | SR item.name | Job title | ✅ Display |
| location | Reference | Mapped via locationMap | Wix Location item | ✅ Display |
| city | String | SR item.location.city | Denormalized for filtering | ✅ Filters |
| country | String | SR item.location.country (via isoToCountry) | Denormalized for filtering | ✅ Filters |
| virtualTeam | Reference | Mapped via teamsMaps | Public-facing team | ✅ Display |
| team | String | SR subGuild OR guild (Oracle ID) | Oracle team ID for analytics | ❌ Backend only |
| guild | String | SR custom field (Oracle ID) | Oracle guild ID for analytics | ❌ Backend only |
| department | String | SR custom field (Oracle ID) | Oracle dept ID for analytics | ❌ Backend only |
| jobDescription | RichText | SR sections.jobAd.text | Job description | ✅ Display |
| qualifications | RichText | SR sections.requirements.text | Requirements | ✅ Display |
| additionalInformation | RichText | SR sections.companyDescription.text | About the team | ✅ Display |
| companyDescription | RichText | SR sections.companyDescription.text | Duplicate? | ❌ Unused |
| applyLink | String | SR shareableLink | Apply URL | ✅ Apply button |
| isStudent | Boolean | SR custom field | Student position flag | ✅ Badge/Filter |
| isTemp | Boolean | SR custom field | Temp position flag | ✅ Badge/Filter |
| isIntern | Boolean | SR custom field | Intern position flag | ✅ Badge/Filter |
| isProgram | Boolean | SR custom field | Program position flag | ✅ Badge/Filter |
| entryLevel | Boolean | SR custom field | Entry level flag | ✅ Badge/Filter |
| isRemote | Boolean | SR custom field | Remote work flag | ✅ Badge/Filter |
| isFulltime | Boolean | SR custom field | Full-time flag | ✅ Badge/Filter |
| isManagerial | Boolean | SR custom field | Managerial position flag | ✅ Badge/Filter |
| seniorityLevel | String | SR custom field | Seniority level | ❌ Not displayed |
| types | String[] | Calculated from boolean flags | Array of job types | ✅ Filters |
| promoted | Boolean | promoted-postings collection | Promoted job flag | ✅ Sorting |
| reference | String | SR refNumber | SR reference number | ❌ Backend only |
| postingNumber | String | SR id | SR posting ID | ❌ Backend only |
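
The `types` field is described as calculated from the boolean flags. A minimal sketch of such a derivation follows; which flags actually feed `types`, and what strings are stored (here simply the flag names), are assumptions, since the document does not specify them.

```javascript
// Assumed list of flags that contribute to `types`; the real list may differ.
const TYPE_FLAGS = [
  "isStudent", "isTemp", "isIntern", "isProgram",
  "entryLevel", "isRemote", "isFulltime", "isManagerial",
];

// Hypothetical derivation: one entry per flag that is strictly true.
function deriveTypes(posting) {
  return TYPE_FLAGS.filter((flag) => posting[flag] === true);
}
```

Storing the result as an array lets the job-listing filters use a single "has some" query instead of one condition per boolean field.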

Locations

Source: Manually maintained (can be updated via EditLocation hooks)
Purpose: Public-facing location data with Oracle IDs as primary keys
Sync: Not automatically synced from Oracle
| Field | Type | Purpose |
|---|---|---|
| _id | String (PK) | Oracle Location ID (e.g., "200000000000214") |
| title | String | Display name (shown in CMS UI for references) |
| googleName | String | City name for SR matching (e.g., "Be'er Sheva") |
| slug | String | URL-friendly identifier |
| order | Number | Display order |
| city | String | City name |
| country | String | Country name |
| about | RichText | Location description |
| coverImage | Image | Location image |
| benefits | String[] | Local benefits list |
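
Because `googleName` exists for matching SmartRecruiters city names, the city-to-location lookup can be sketched as a map keyed on a normalized `googleName`. This is an illustration only: `buildLocationMap`, `resolveLocation`, and `normalizeCity` are hypothetical names (the site's own `getLocationMap()` is not shown here), and trimming plus lower-casing is an assumed normalization.

```javascript
// Assumed normalization: trim whitespace and ignore case.
function normalizeCity(name) {
  return String(name ?? "").trim().toLowerCase();
}

// Hypothetical helper: normalized googleName -> Locations _id (Oracle Location ID).
function buildLocationMap(locations) {
  const locationMap = new Map();
  for (const location of locations) {
    const key = normalizeCity(location.googleName);
    if (key) locationMap.set(key, location._id);
  }
  return locationMap;
}

// Returns the matching location ID, or null when the SR city is unknown.
function resolveLocation(city, locationMap) {
  return locationMap.get(normalizeCity(city)) ?? null;
}
```

A `null` result leaves the posting's `location` reference empty, while `city` and `country` are still available for filtering.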

virtual-structure

Source: Manually maintained, updated via Edit collection hooks
Purpose: Public-facing team hierarchy
Sync: Not synced from Oracle (independent structure)
| Field | Type | Purpose |
|---|---|---|
| _id | String (PK) | Virtual team ID |
| title | String | Team display name |
| slug | String | URL-friendly identifier |
| parent | Reference (Self) | Parent team (null for top-level) |
| order | Number | Display order |
| about | RichText | Team description |
| coverImage | Image | Team cover image |
| people | Object[] | Team members (embedded from editPeople-v1) |
| teamsValues | Object[] | Team values (embedded from EditTeamValues-v1) |
| teamsLinks | Object[] | Team links (embedded from EditTeamLinks-v1) |
| postingsCount | Number | Calculated job count (runtime) |

Mapping & Oracle Collections

Mapping

Source: Manually maintained
Purpose: Bridge between Oracle teams and virtual teams
Critical: This is how postings get their virtual team assignment
| Field | Type | Purpose |
|---|---|---|
| _id | String (PK) | Mapping ID |
| real | Multi-Reference | Array of ClonedCentralDBOrganization IDs (Oracle team IDs) |
| virtual | Reference | Single virtual-structure ID |
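
Since the mapping is many-to-one, each Oracle team ID is expected to resolve to a single virtual team. Because the collection is maintained by hand, a small consistency check can flag IDs that appear under more than one virtual team. The sketch below is a suggestion, not existing code; `findConflictingOracleIds` is a hypothetical name and rows are assumed to have the `{ real, virtual }` shape listed above.

```javascript
// Hypothetical audit helper: list Oracle IDs mapped to more than one virtual team.
function findConflictingOracleIds(mappings) {
  const virtualsByOracleId = new Map(); // Oracle ID -> Set of virtual team IDs
  for (const { real = [], virtual } of mappings) {
    for (const oracleId of real) {
      if (!virtualsByOracleId.has(oracleId)) {
        virtualsByOracleId.set(oracleId, new Set());
      }
      virtualsByOracleId.get(oracleId).add(virtual);
    }
  }
  return [...virtualsByOracleId]
    .filter(([, virtuals]) => virtuals.size > 1)
    .map(([oracleId, virtuals]) => ({ oracleId, virtuals: [...virtuals] }));
}
```

An empty result means every Oracle ID has one unambiguous target, so the order in which mapping rows are read cannot change a posting's team.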

ClonedCentralDBOrganization

Source: Synced daily from Oracle
Purpose: Writable copy of Oracle teams for reference fields
Used By: Mapping collection
| Field | Type | Source |
|---|---|---|
| _id | String (PK) | Oracle organization ID |
| name | String | Oracle team name |
| organizationType | String | Type (Department, Team, etc.) |
| parentOrganizationId | String | Parent org ID in Oracle |
| effectiveStartDate | Date | Oracle effective start date |
| effectiveEndDate | Date | Oracle effective end date |

ClonedCentralDBLocations

Status: ❌ UNUSED
Source: Synced daily from Oracle
Problem: Synced but never referenced or used
Recommendation: Stop syncing to save resources

Edit Collections (Manual Content Entry)

editPeople-v1

HR adds/edits team members. Data hooks automatically update virtual-structure.people

EditTeamLinks-v1

HR adds/edits team links. Data hooks automatically update virtual-structure.teamsLinks

EditTeamValues-v1

HR adds/edits team values. Data hooks automatically update virtual-structure.teamsValues

EditLocation

HR adds/edits location content. Data hooks automatically update Locations

EditLocalBenefits

HR adds/edits location-specific benefits. Data hooks automatically update Locations.benefits

Other Content Collections

| Collection | Purpose | Used In |
|---|---|---|
| Blog/Posts | Blog articles | Life at Wix page, Blog page |
| Blog/Categories | Blog categories | Blog filtering |
| equalReports | Equal opportunity reports | The Wix Way page |
| esgReports | ESG reports | The Wix Way page |
| EarlyTalentPrograms | Early talent programs | Early Talent page |
| SrSources | SR source tracking | Apply link tracking |
| SearchKeywords | Search keywords | Search functionality |
| promoted-postings | Promoted job references | Job listing sorting |

📚 Additional Resources