Lessons From The Trainwreck That Is Healthcare.gov

If you’ve ever worked on a large-scale, relatively high-profile Website project, you can’t help but be drawn to the ongoing analysis of the problems that have plagued healthcare.gov since its October 1 launch. It’s a trainwreck in the truest sense of the word: “a chaotic or disastrous situation that holds a peculiar fascination for observers.”

All politics aside, I can’t look away. I’m intrigued in part because this is a unique opportunity for developers and others to Monday-morning-quarterback issues that have never before commanded a mass audience. Finally, people care about technology infrastructure, content management systems, agile development, testing, managing the number of throats to choke, etc.!

For the record, there’s nothing to enjoy about this flawed launch. And for those of us who work on Websites, it’s helpful to remember “there but for the grace of God…” What’s fascinating is watching technical specialists try to educate the media and the rest of us on how Web projects should be managed. Who knows how many people are bookmarking these explanations for future reference?

Project Management

"It's very hard to estimate how long a project will take. That means it's hard to know how many programmers to devote to the task, how much it will cost and so on. Measuring progress isn't easy, either, which means it's hard to tell how far along you are. 

"The worst problem is probably that requirements change while the software is being developed. This may mean that you have to redo work you've already done, but the effects can be more far-reaching. It's like building a house: If the owners suddenly decide they want a big floor-to-ceiling picture window on the second floor, it may require rerouting water pipes. That may require moving the ground-floor bathroom, which in turn could affect the kitchen layout, because the bathtub and the kitchen sink share drain pipes. Part of project management's job is to say 'no' to many change requests, but that's not always possible."

Excerpted from Why healthcare.gov has so many problems, October 14, 2013

Pacing

"The fatal mistake they made is bringing up everything at once," making debugging the sprawling system a challenge, said David Starr, a former chief information officer for 3Com Corp., ITT Corp. and other large companies. But, "it's almost always better to postpone things than bring them up broken," said Mr. Starr, who is not involved in the exchange development.

The problem might have been largely averted if consumers had been able to browse plans without going through the complex registration process first, experts say.

"If you put a critical pathway at the front of the experience, you're going to logjam everything," said Patrick Byrne, chief executive of Overstock.com, an e-commerce company. "If you have a bunch of Web services talking to each other, you want to put the transaction at the end," he said.

Excerpted from Healthcare.Gov's Flaws Found, Fixes Eyed, October 10, 2013
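To put rough numbers on Byrne's point, the Python sketch below counts how often a slow, failure-prone step runs under each ordering. Everything in it is an assumption for illustration. It uses 1,000 hypothetical visitors, one in ten of whom goes on to enroll, and a single identity check stands in for the "critical pathway." It is not a model of healthcare.gov's actual traffic.

```python
from dataclasses import dataclass


@dataclass
class Visitor:
    visitor_id: int
    wants_to_enroll: bool


def make_visitors(total: int, enroll_every: int) -> list[Visitor]:
    # Hypothetical traffic: every `enroll_every`-th visitor enrolls,
    # everyone else only browses plans.
    return [Visitor(i, i % enroll_every == 0) for i in range(1, total + 1)]


def checks_when_gate_is_first(visitors: list[Visitor]) -> int:
    # Registration/identity check is required before anyone may browse.
    checks = 0
    for _ in visitors:
        checks += 1  # every visitor hits the slow step up front
    return checks


def checks_when_transaction_is_last(visitors: list[Visitor]) -> int:
    # Anyone may browse anonymously; the check runs only at enrollment.
    checks = 0
    for visitor in visitors:
        if visitor.wants_to_enroll:
            checks += 1  # only committed buyers reach the slow step
    return checks


visitors = make_visitors(total=1000, enroll_every=10)
print(checks_when_gate_is_first(visitors))        # 1000
print(checks_when_transaction_is_last(visitors))  # 100
```

With the transaction at the end, only committed buyers reach the expensive step. In this toy case the downstream service sees a tenth of the load. Real ratios would depend on how many visitors merely shop.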

Asynchronous vs. Synchronous

"An alternative to an online lookup of personal data or account creation would be to store the request for later processing. This is commonly referred to as queuing. It turns an online process into an offline one: the system goes from being synchronous—waiting for a response from another system after making a request to it—to asynchronous—not waiting for the response and arranging to check the result somehow later…

"It is now a widely established pattern for system development. For example, when you buy a ticket from an airline reservation site, and wait for your credit card to be processed and the whole transaction to complete, that is an example of a synchronous, or online, system… When you place an order with Amazon, on the other hand, you receive a response almost immediately (“thank you for your order!”). If there is a problem with your order—your card is expired, or was declined—you later receive a notification, usually an email, asking you to update your payment info. That is an example of an asynchronous system.

"Why does this matter? Asynchronous, distributed systems have components that are de-coupled—if one fails, it doesn’t necessarily bring the rest down with that… This introduces operational complexity: you must have a functioning queue system, you must have programs that process the queue, they need to be monitored and errors have to be handled appropriately (since there is no online user that can respond to them), and notification systems like email that are out-of-band of the website may need to be employed (in case you need to ask the user to come back and provide more information)."

Excerpted from Healthcare.gov and ACA marketplace sites from the perspective of a software engineer, October 4, 2013
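The contrast can be sketched with Python's standard `queue` and `threading` modules. This is a minimal illustration under stated assumptions, not how the marketplace was built. The names `verify_identity`, `submit_sync`, `submit_async` and `worker` are hypothetical. The `sleep` stands in for a slow call to another system, and the `follow_ups` list stands in for the email notifications the excerpt describes.

```python
import queue
import threading
import time


def verify_identity(applicant: str) -> bool:
    # Hypothetical stand-in for a slow call to another system.
    time.sleep(0.1)
    return not applicant.startswith("bad")


def submit_sync(applicant: str) -> str:
    # Synchronous (online): the user waits until the other system answers.
    return "verified" if verify_identity(applicant) else "rejected"


pending: queue.Queue = queue.Queue()  # requests stored for later processing
follow_ups: list[str] = []  # simulated out-of-band notifications (e.g. email)


def submit_async(applicant: str) -> str:
    # Asynchronous (offline): enqueue the request and acknowledge at once.
    pending.put(applicant)
    return "received"


def worker() -> None:
    # Processes queued requests; no user is waiting, so it handles errors itself.
    while True:
        applicant = pending.get()
        if applicant is None:  # sentinel value: stop the worker
            pending.task_done()
            break
        try:
            if not verify_identity(applicant):
                follow_ups.append(f"ask {applicant} to update their information")
        except Exception as exc:  # keep the worker alive if a call blows up
            follow_ups.append(f"retry {applicant} later: {exc}")
        finally:
            pending.task_done()


thread = threading.Thread(target=worker, daemon=True)
thread.start()

print(submit_async("alice"))    # received
print(submit_async("bad-bob"))  # received
pending.put(None)               # ask the worker to stop after the backlog
thread.join()
print(follow_ups)               # ['ask bad-bob to update their information']
print(submit_sync("carol"))     # verified, but only after waiting on the call
```

The asynchronous path shifts the operational burden the excerpt lists onto the worker. It has to stay alive and handle every error itself. Because no user is waiting on the page, it must also hand problems to a separate notification channel.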

Responsive Design

"Jekyll, for those who are unfamiliar with web-development trends, is a way for developers to build a static website from dynamic components. Instead of running a traditional website with a relational database and server-side code, using Jekyll enables programmers to create content like they create code. The end result of this approach is a site that loads faster for users, a crucial performance issue, particularly on mobile devices.

"Instead of [running] farms of application servers to handle massive load, you're basically slimming down to two," said [Bryan Sivak, chief technology officer of the United States Department of Health and Human Services]. "You're just using HTML5, CSS, and JavaScript, all being done in responsive design. The way it's being built matters. You could in theory do the same with application servers and a CMS, but it would be much more complex. What we're doing here is giving anyone with basic skills [the ability] to [make] basic changes on the fly. You don't need expensive consultants."

Excerpted from Healthcare.gov: Code Developed by the People and for the People, Released Back to the People, an optimistic piece published June 28, 2013
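Jekyll itself is a Ruby tool that renders Markdown and Liquid templates. The Python sketch below is not Jekyll. It is only a minimal illustration of the same build-time idea, under assumed inputs. Pages are rendered once from a shared layout and some data, then written out as plain HTML files that any static web server could return with no database or per-request code. The `PLANS` data and the `build` function are hypothetical.

```python
import html
import tempfile
from pathlib import Path
from string import Template

# Hypothetical content; a Jekyll site would keep this in Markdown/YAML files.
PLANS = [
    {"slug": "bronze", "name": "Bronze Plan", "summary": "Placeholder plan description."},
    {"slug": "gold", "name": "Gold Plan", "summary": "Placeholder plan description."},
]

# One layout shared by every page; the viewport tag is the usual hook for
# responsive CSS on mobile devices.
LAYOUT = Template(
    "<!doctype html>\n"
    '<html><head><meta name="viewport" content="width=device-width, initial-scale=1">\n'
    "<title>$title</title></head>\n"
    "<body><h1>$title</h1><p>$summary</p></body></html>\n"
)


def build(output_dir: Path) -> list[Path]:
    # Render every page once, at build time; serving them needs no database
    # or server-side code, just a web server that returns files.
    output_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for plan in PLANS:
        page = LAYOUT.substitute(
            title=html.escape(plan["name"]),
            summary=html.escape(plan["summary"]),
        )
        path = output_dir / f"{plan['slug']}.html"
        path.write_text(page, encoding="utf-8")
        written.append(path)
    return written


# "_site" mirrors Jekyll's default output directory name.
with tempfile.TemporaryDirectory() as tmp:
    for path in build(Path(tmp) / "_site"):
        print(path.name)  # bronze.html, then gold.html
```

Because the output is just files, handling more readers is largely a matter of serving them from more web servers or a CDN. That is roughly the trade-off Sivak describes. Anything that truly needs per-user data, such as enrollment, still requires a dynamic backend.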

Agile Development

"Like anything that involves human beings, shipping code can devolve into squabbling, missed deadlines, and flawed releases. The programming community’s key realization is that the solution to these problems is to create more transparency, not less: code reviews, tons of “unit tests” to automatically find flaws, scheduled stand-up meetings, and the constant pushing of new code into the open, where it’s used by real people." 

Excerpted from The Obamacare Website Didn't Have to Fail. How to Do Better Next Time, October 16, 2013
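As a small illustration of the "unit tests" the excerpt mentions, here is a sketch using Python's built-in `unittest` module. The function under test, `is_valid_zip`, is hypothetical and not taken from the healthcare.gov code. It accepts a five-digit US ZIP code or the ZIP+4 form, and each test pins down one expected behavior so that a regression fails automatically.

```python
import re
import unittest

# Hypothetical rule: five ASCII digits, optionally "-" plus four more digits.
_ZIP_PATTERN = re.compile(r"[0-9]{5}(?:-[0-9]{4})?")


def is_valid_zip(value: str) -> bool:
    # Surrounding whitespace is ignored; everything else must match exactly.
    return _ZIP_PATTERN.fullmatch(value.strip()) is not None


class IsValidZipTest(unittest.TestCase):
    def test_accepts_five_digits(self):
        self.assertTrue(is_valid_zip("20001"))

    def test_accepts_zip_plus_four(self):
        self.assertTrue(is_valid_zip("20001-1234"))

    def test_ignores_surrounding_whitespace(self):
        self.assertTrue(is_valid_zip(" 20001 "))

    def test_rejects_malformed_input(self):
        for bad in ["", "2000", "200011", "20001-12", "abcde"]:
            with self.subTest(value=bad):
                self.assertFalse(is_valid_zip(bad))


if __name__ == "__main__":
    # exit=False lets the script continue after the test run finishes.
    unittest.main(exit=False)
```

Tests like these are cheap to run on every change, which is what makes the constant pushing of new code described above less risky.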