Case Study
Enterprise Data Management Platform
Designed and built a multi-tenant enterprise data management platform that started in a single product domain and expanded to serve multiple products across the organization. Event-driven, extensible, and cloud-native.
Transformation Journey
Greenfield Unknowns
- No platform or DB decision
- ETL and scale undefined
- No existing precedent
Blank slate with risk
Dual Database
- MongoDB for app-layer code
- Azure SQL for data layer
- Early architecture exploration
Two databases, one platform
Production Pivot
- MongoDB to SQL Server migration
- DAG batch processing designed
- Debouncing at scale added
Right foundation found
Event-Driven Build
- Workflow validation + orchestration
- API and SFTP delivery
- Extensible event registrations
Data pipeline automated
Multi-Tenant Platform
- Org-wide extensibility
- Event-driven delivery at scale
- Consuming service automation
Enterprise data platform live
Challenge
This was a brand-new product space for both the organization and its developers. Few decisions had been made up front: database technology, ETL definitions, and scale requirements were all unknown. The only certainty was that the platform would reuse the existing Azure cloud service architecture.
- Brand-new product space for the organization and its developers
- Database technology not yet decided at the outset
- ETL definitions and data pipeline shapes undefined
- Scale of the platform entirely unknown
- Only known constraint was reusing existing Azure cloud service architecture
- Multi-tenant isolation and tenant onboarding complexity
- Enabling external clients to automate workflows through public APIs
Strategy
Azure SQL Server anchored the data layer as a well-understood foundation. MongoDB was adopted for application-layer code to maintain cohesion with the parent product, even though the team had no experience with it, and was then migrated to SQL Server in production once its cost to team velocity became clear. A DAG-based batch processing system with debouncing, workflow validation, and orchestration layers was architected to guarantee execution at scale. An event-based system was layered on top to register events and deliver data to internal business products via API, direct-consumable integrations, and SFTP.
- Azure SQL Server as the data layer foundation for atomic handling and well-known tooling
- MongoDB initially adopted for application-layer code to maintain parent product cohesion
- Production migration from MongoDB to Azure SQL Server to reduce internal team friction and improve velocity
- Customizable batch-processing system with debouncing to handle scalability challenges
- DAG chosen for guaranteed execution and optimal processing of batched operations
- Workflow-validation layer, with an orchestration layer that builds on validation to run automated batch processing efficiently
- Event-driven architecture delivering data to consuming services via API and direct-consumable integrations
- Platform events registered and delivered via SFTP to consuming internal business products
- Event registrations enabling consuming services to automate their own pipeline workflows around the needs of their internal business products
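To make the DAG and workflow-validation ideas above concrete, here is a minimal sketch of how a validated workflow could be turned into ordered batches. It is an illustration under stated assumptions, not the platform's actual implementation: the step names and the `validate_workflow` and `plan_batches` helpers are hypothetical, and Python's standard-library `graphlib` stands in for whatever scheduler the platform used.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical workflow: each step maps to the set of steps it depends on.
WORKFLOW = {
    "extract": set(),
    "normalize": {"extract"},
    "validate": {"extract"},
    "export": {"normalize", "validate"},
}


def validate_workflow(steps: dict[str, set[str]]) -> list[str]:
    """Return a list of problems; an empty list means the workflow is valid."""
    problems = []
    for step, deps in steps.items():
        for dep in sorted(deps):
            if dep not in steps:
                problems.append(f"{step} depends on unknown step {dep}")
    if not problems:
        try:
            # prepare() raises CycleError if the graph is not a DAG.
            TopologicalSorter(steps).prepare()
        except CycleError as exc:
            problems.append(f"cycle detected: {exc.args[1]}")
    return problems


def plan_batches(steps: dict[str, set[str]]) -> list[list[str]]:
    """Group steps into batches whose dependencies all sit in earlier batches."""
    problems = validate_workflow(steps)
    if problems:
        raise ValueError("; ".join(problems))
    sorter = TopologicalSorter(steps)
    sorter.prepare()
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # steps that can run together
        batches.append(ready)
        sorter.done(*ready)
    return batches


BATCHES = plan_batches(WORKFLOW)
```

In this sketch, validation runs before any planning, so an orchestrator never schedules a workflow with a missing dependency or a cycle. Steps inside one batch have no dependencies on each other and could be processed together.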
Tradeoffs Considered
- MongoDB vs SQL Server: initially chose MongoDB for application-layer code to maintain parent product cohesion, then executed a production migration to SQL Server once the impedance mismatch and team friction costs became clear
- DAG-based orchestration vs sequential processing: adopted DAG for guaranteed execution and optimal batching of mixed workloads, accepting increased orchestration complexity in exchange for deterministic processing guarantees
- Event-driven delivery vs direct integration: chose event-based architecture with registrations to decouple producers from consumers, accepting eventual consistency across the data pipeline in exchange for scalability
- Build vs buy for data pipeline: built custom batch processing with debouncing rather than adopting an off-the-shelf ETL tool, trading implementation effort for precise control over processing behavior and the ability to independently optimize throughput, latency, and resource allocation per tenant workload
- Multi-tenant shared platform vs per-tenant instances: designed shared infrastructure with isolated tenant boundaries, optimizing operational efficiency at the cost of increased multi-tenancy complexity
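The debouncing tradeoff above can be sketched as follows. This is a simplified, in-memory illustration and not the platform's code: the `TenantDebouncer` class, its `quiet_seconds` parameter, and the tenant and record identifiers are all hypothetical. The clock is passed in explicitly so the behavior is easy to follow.

```python
from dataclasses import dataclass, field


@dataclass
class TenantDebouncer:
    """Collapse repeated updates to the same record into one pending change.

    A change is released only after it has been quiet for `quiet_seconds`,
    and released changes are grouped per tenant so tenants stay isolated.
    """

    quiet_seconds: float
    # (tenant_id, key) -> (time of last update, latest payload)
    _pending: dict = field(default_factory=dict, init=False)

    def submit(self, tenant_id: str, key: str, payload: dict, now: float) -> None:
        # A newer update replaces the older one and restarts the quiet period.
        self._pending[(tenant_id, key)] = (now, payload)

    def flush_due(self, now: float) -> dict[str, dict[str, dict]]:
        """Return {tenant_id: {key: payload}} for changes that are due."""
        due = {
            ident: entry
            for ident, entry in self._pending.items()
            if now - entry[0] >= self.quiet_seconds
        }
        batches: dict[str, dict[str, dict]] = {}
        for (tenant_id, key), (_, payload) in due.items():
            batches.setdefault(tenant_id, {})[key] = payload
            del self._pending[(tenant_id, key)]
        return batches


debouncer = TenantDebouncer(quiet_seconds=5.0)
debouncer.submit("tenant-a", "record-1", {"v": 1}, now=0.0)
debouncer.submit("tenant-a", "record-1", {"v": 2}, now=3.0)  # replaces v=1
debouncer.submit("tenant-b", "record-9", {"v": 7}, now=1.0)

FIRST = debouncer.flush_due(now=6.0)   # only tenant-b has been quiet long enough
SECOND = debouncer.flush_due(now=8.0)  # tenant-a is released with its latest payload
```

Two rapid updates to the same record produce a single unit of downstream work, which is the property that lets a batch system absorb bursts. A real deployment would likely need durable storage for pending changes; that is outside this sketch.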
Results
Established a unified enterprise data platform that enabled multiple business products to operate from a common foundation, reducing duplication and improving consistency across the organization.
- Improved engineering effectiveness and cross-team collaboration by modernizing data management capabilities and reducing operational friction
- Strengthened data integrity and reliability by enabling consistent transactional processing across critical business workflows
- Delivered a scalable orchestration model capable of reliably processing large and unpredictable workloads while maintaining operational stability
- Increased confidence in data operations through workflow validation and coordinated execution controls
- Enabled flexible integration patterns that allowed business capabilities to be consumed through APIs, direct integrations, and automated workflows
- Streamlined data delivery processes and improved interoperability between business systems and downstream consumers
- Enabled greater automation across the enterprise by providing event-driven integration points that reduced manual coordination and accelerated workflow execution
Before vs After
Greenfield with Unknowns
- No existing platform or architectural precedent
- Database technology undecided
- ETL definitions and pipeline shapes undefined
- Scale requirements unknown
- MongoDB for application-layer code with no team expertise
- No atomic transaction support across operations
- No batch processing or debouncing mechanisms
- No standardized file delivery or API pipeline automation
Multi-Tenant Event-Driven Platform
- Azure SQL Server providing atomic handling and familiar tooling
- DAG-based batch processing with debouncing at scale
- Workflow-validation and orchestration layers guaranteeing execution
- Event-driven architecture feeding consuming services in real time
- SFTP delivery of output files to enterprise consuming products
- API-level automation for paying clients to orchestrate their own pipelines
- Production-proven migration path from MongoDB to SQL Server
- Extensible multi-tenant platform serving multiple products organization-wide
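The event registrations referred to in this case study could be sketched roughly as below. This is a hypothetical, in-process illustration: the `EventRegistry` class, the `batch.completed` event name, and the two handlers are assumptions made for the example, and the lists stand in for real API and SFTP delivery.

```python
from collections import defaultdict
from typing import Callable

Handler = Callable[[dict], None]


class EventRegistry:
    """Tenant-scoped registry that decouples event producers from consumers."""

    def __init__(self) -> None:
        # (tenant_id, event_type) -> handlers registered by consuming services
        self._handlers: dict[tuple[str, str], list[Handler]] = defaultdict(list)

    def register(self, tenant_id: str, event_type: str, handler: Handler) -> None:
        self._handlers[(tenant_id, event_type)].append(handler)

    def publish(self, tenant_id: str, event_type: str, payload: dict) -> int:
        """Deliver the payload to matching handlers; return how many ran."""
        handlers = self._handlers.get((tenant_id, event_type), [])
        for handler in handlers:
            handler(payload)
        return len(handlers)


# Stand-ins for delivery channels; a real system would call an API or write to SFTP.
api_deliveries: list[dict] = []
sftp_deliveries: list[dict] = []

registry = EventRegistry()
registry.register("tenant-a", "batch.completed", api_deliveries.append)
registry.register("tenant-a", "batch.completed", sftp_deliveries.append)

DELIVERED = registry.publish("tenant-a", "batch.completed", {"batch_id": 42})
OTHER_TENANT = registry.publish("tenant-b", "batch.completed", {"batch_id": 43})
```

The producer only calls `publish` and never needs to know which consumers exist. Keying registrations by tenant means an event for one tenant is never delivered to another tenant's handlers.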