By this point, the platform was doing what it was designed to do. The multi-tenant database had replaced a dozen separate client environments. The data access layer and reporting engine were generating several million dollars a year in revenue. The automation tooling had gotten the data team back on schedule. We’d grown from about a dozen clients to a number that would eventually reach 75.
And then our storage array ran out of space.
This is Part 3 of a three-part series about the architecture decisions behind a healthcare data SaaS platform, from proof of concept through acquisition. Part 1 covers the migration from a dozen duplicated client databases to a single multi-tenant system. Part 2 covers the product architecture and automation that made the platform scalable. Part 3 covers the cloud migration, system reliability, and what survived through acquisition.
The wall we hit
Our database, reporting, and application servers were co-located in a data center in South Florida, which is hurricane country. System reliability was already a concern because of periodic power outages, and the servers themselves were struggling to keep up with load during peak usage, especially after data updates and during clients' open-enrollment planning season.
The real forcing function was physical storage. Our data was expanding so fast that the storage array was completely full, with no room to expand. We were facing a significant capital outlay for new servers and bigger storage arrays, and even if we made that investment, we’d be solving a problem that was just going to come back again as the client base kept growing.
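The runway math here is simple but worth making explicit, because it is what killed the "just buy a bigger array" option. A minimal sketch, using hypothetical capacity and growth numbers (none of these figures come from the actual system):

```python
# Illustrative capacity-runway calculation. All numbers are assumptions
# chosen to show the shape of the problem, not the real array's specs.
ARRAY_TB = 20.0             # total capacity of a hypothetical replacement array
USED_TB = 12.0              # assumed current data footprint
GROWTH_TB_PER_MONTH = 0.5   # assumed growth rate, which rises with client count

def months_until_full(capacity_tb: float, used_tb: float, growth_tb_per_month: float) -> float:
    """Months of runway left at a constant growth rate."""
    return (capacity_tb - used_tb) / growth_tb_per_month

print(months_until_full(ARRAY_TB, USED_TB, GROWTH_TB_PER_MONTH))  # 16.0
```

With a fixed array, the answer is always a finite number of months, and a growing client base shrinks it further. Expandable cloud storage changes the question from "when do we buy the next array" to "what does the next terabyte cost."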
We scrambled for workable options, and it became pretty clear that expandable cloud storage was the only long-term answer. Once we accepted that the data was moving to the cloud, the rest of the picture came into focus: the reporting server and application server needed to follow, because splitting them across on-premises and cloud infrastructure would create its own set of problems. As the hybrid phase of the migration later proved, that's exactly what happened.
The phased migration
We needed to migrate in phases. We had live clients depending on the system, and a cutover of the entire infrastructure carried too much risk.
First, we expanded our production storage into Azure while keeping the internal data processing storage local. That solved the immediate crisis and bought us breathing room. Next, we moved the reporting server to the cloud. Finally, the application server followed.
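One way to picture the phased approach is as a routing layer that decides, per storage role, whether traffic goes to the on-prem array or to Azure, so each phase flips one role instead of cutting everything over at once. This is a hypothetical sketch of that idea; none of the role names, endpoints, or structures come from the actual platform:

```python
# Hypothetical phased-cutover routing. Each migration phase flips one
# role's target from the on-prem array to Azure; the rest stay put.
from dataclasses import dataclass

@dataclass
class StorageTarget:
    name: str       # e.g. "onprem-array" or "azure-blob" (illustrative)
    endpoint: str   # placeholder connection string or mount point

# Phase 1 state: production storage has moved; processing and reporting
# are still local. Later phases update this mapping one role at a time.
PHASES = {
    "production_storage": StorageTarget("azure-blob", "https://example.blob.core.windows.net"),
    "processing_storage": StorageTarget("onprem-array", "/mnt/array01"),
    "reporting": StorageTarget("onprem-array", "/mnt/array01"),
}

def target_for(role: str) -> StorageTarget:
    """Resolve where data for a given role lives in the current phase."""
    return PHASES[role]
```

The design tradeoff is visible in the mapping itself: any phase where the values are split between environments means cross-environment data transfer on every request that touches both, which is exactly where the hybrid-phase latency came from.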
The intermediate state actually made performance worse. With some systems in the cloud and others still on-premises, data transmission latency created performance issues that clients noticed. We went from a system that was unreliable because of power outages and load problems to a system that was reliable but slower. For a stretch of time, the migration felt like it was going sideways.
Once everything was co-located in Azure, however, the picture changed completely. The latency issues disappeared because the systems were no longer passing data between two different environments. Client performance complaints basically ceased. Our availability went from below 94% to 99.99%, and we celebrated a successful migration.
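Those availability numbers are worth translating into downtime, because the percentages undersell the difference. A quick back-of-the-envelope calculation:

```python
# Convert availability percentages into expected downtime per year.
HOURS_PER_YEAR = 365 * 24  # 8760

def downtime_hours_per_year(availability: float) -> float:
    """Expected annual downtime for a given availability fraction."""
    return (1 - availability) * HOURS_PER_YEAR

print(downtime_hours_per_year(0.94))    # 525.6 hours, roughly 22 days a year
print(downtime_hours_per_year(0.9999))  # 0.876 hours, under an hour a year
```

Going from below 94% to 99.99% is not a few points of improvement; it is the difference between about three weeks of cumulative downtime per year and under an hour.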
What the migration taught me
The performance dip during the hybrid phase wasn’t a surprise in hindsight, but I could have planned for it more explicitly. Phased migrations are the responsible approach when you have live clients. You can’t cut over everything at once and hope for the best. But the tradeoff is that the intermediate state can genuinely be worse than what you started with, and you need to be ready to communicate that to stakeholders and move through the transition quickly.
I've seen this pattern play out in other projects where I wasn't the lead, and at other companies. So I'm comfortable saying that when you build a new system to replace a legacy one, it's natural to be optimistic. "We built a new system to resolve our major pain points. This is going to make our lives so much better!" And if you've done your work well, it will, but not necessarily right away or without customers (internal and external) feeling some pain.
If I could go back, the bigger change I’d make is to the database performance strategy. We were reactive, dealing with scaling pain as it showed up rather than getting ahead of it. I would have used the cloud migration as an opportunity to evaluate the broader data architecture. Could a different database product have served us better at scale? Would a warehousing strategy have eased the load on the production database? Those are questions I was starting to explore just before the acquisition. Starting that evaluation earlier, ideally as part of the decision to move to Azure, would have given us more options.
Through the acquisition
The Ignition Group was acquired by Zelis, and I served as the technical program owner during the transition. My job was to ensure continuity of systems, data flows, and client delivery while the two companies figured out how to integrate.
Post-acquisition, a meaningful portion of what we’d built carried forward. The data processing improvements, import transformations, QA systems, and subscription features were all adopted into the Zelis platform. The patterns and capabilities we’d designed at a small company proved valuable enough to absorb into a much larger enterprise system.
Our multi-tenant architecture and subscription system sparked what I'd describe as a spirited discussion. Zelis leadership acknowledged that our architecture had significant advantages, and there was genuine consideration of adopting our approach more broadly. In the end, though, leadership decided to retire our system and fold the client-facing features into their existing platform. It was simply more practical to migrate our clients and features into what they already had, and major product plans were already on the development teams' roadmap.
That outcome was hard to accept at the time. I’d spent years building a system I believed in, and watching it get retired in favor of what had been a competing platform was painful. But I understood the decision. Good architecture doesn’t guarantee survival through acquisition. Business pragmatism wins, and honestly, it should. The acquiring company has to make choices based on what’s best for the combined organization.
The consolation, and it’s a real one, is that the work survived even when the system didn’t. The data processing patterns, the subscription logic, the QA frameworks, the integration approaches we’d developed all influenced what came after. Building something well means that even if the specific system gets retired, the ideas and capabilities it embodied tend to persist in whatever replaces it.
Looking back across the whole journey
This series started with a dozen duplicated databases and a founder who knew his system needed to change. Over the course of about six years, we built a multi-tenant platform, a decoupled product architecture, and an automation layer that freed up the data team, then migrated the whole thing into the cloud. The platform grew from 12 clients to 75, generated millions in annual revenue, and was ultimately part of a successful acquisition.
Every major architecture decision in that story was driven by a specific business constraint. The multi-tenant migration happened because the data team was drowning in manual updates. The data access layer happened because we needed the UI and database to evolve independently. The automation platform happened because the data team couldn’t keep up with client growth. The cloud migration happened because our storage array was physically full. Every one of these decisions started with a real problem that was limiting the company’s ability to grow.
The principle connecting everything is straightforward: at a startup, the best architecture decisions are the ones that solve the problem in front of you while leaving room to solve the next one when it arrives. We didn’t predict that we’d need to migrate to the cloud, but the decoupled architecture made it possible when the time came. We didn’t know the company would be acquired, but the clean data patterns and subscription logic we’d built were portable enough to survive in a new environment.
Everything I've built since then starts from the same place: the constraint in front of me today, designed so that tomorrow's problem is solvable when it shows up. You are not going to have the time and resources to build the theoretically perfect architecture. And I keep investing in the work that nobody sees, because that's usually the work that determines whether everything else holds together.
This concludes the three-part “Building a SaaS Platform from Zero” series. Part 1 covers the multi-tenant migration. Part 2 covers the product architecture and internal tooling. Thanks for reading.