On Tuesday, February 28th, Amazon Web Services (AWS) experienced a significant outage in its US East region, located in Northern Virginia. MobileBytes, like many other companies, felt the effects of the AWS Simple Storage Service (S3) outage. How did we perform during and after the outage? Read on.
One of the primary reasons businesses are moving critical computing tasks to the cloud rather than hosting them locally is the safety net provided by hosting products like Google Cloud Platform, Microsoft Azure, and Amazon Web Services (AWS). These products are created and maintained by some of the brightest and most talented scientists and engineers in the world, a level of talent that most smaller software companies or restaurant groups could not afford to hire.
For smaller companies, cloud services offer leverage. In our case, MobileBytes engineers write our own code, but we rely on many cloud services offered by AWS to reduce the cost and workload associated with distributing our POS, online services, and mobile software. AWS offers outstanding uptime as well as the opportunity to use the failsafe capabilities built into its services. For example, our primary data service, RDS, is replicated across servers located in different availability zones. RDS also allows us to use a read replica database as an automatic failover in case our primary database fails, and to scale our database both vertically and horizontally based on need. Without a cloud provider like AWS, the cost of implementing this kind of redundancy would be far higher. Likewise, we rely on the power and affordability of Amazon’s S3 to store, cost-effectively and efficiently, the millions of data files we receive from our thousands of mobile devices.
During the S3 outage, our POS systems continued to operate without interruption for mission-critical tasks. Tickets could be created and modified, kitchen slips and receipts continued to print, cash drawers were operational, and credit card transactions were still processed. Our POS system does use S3 to synchronize data between iPads and our cloud services. With S3 down, each iPad automatically stored all sync data locally until service was restored, at which point the data was automatically posted to S3 and queued for processing.
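For readers curious how this kind of store-and-forward behavior works in general, here is a minimal Python sketch. It is not our production code: the class names, the `upload` callable, and the in-memory `FakeRemoteStore` are all hypothetical stand-ins, and a real device would keep its queue on disk rather than in memory so that pending data survives a restart.

```python
from collections import deque
from typing import Callable, Deque, Dict, Tuple


class FakeRemoteStore:
    """Hypothetical stand-in for a remote object store such as S3."""

    def __init__(self) -> None:
        self.available = True
        self.objects: Dict[str, bytes] = {}

    def put(self, key: str, payload: bytes) -> None:
        # Simulate an outage by refusing writes while unavailable.
        if not self.available:
            raise ConnectionError("remote store unavailable")
        self.objects[key] = payload


class StoreAndForwardQueue:
    """Hold sync payloads locally and upload them in order when possible."""

    def __init__(self, upload: Callable[[str, bytes], None]) -> None:
        # Assumption: `upload` raises ConnectionError when the store is down.
        self._upload = upload
        self._pending: Deque[Tuple[str, bytes]] = deque()

    @property
    def pending_count(self) -> int:
        return len(self._pending)

    def submit(self, key: str, payload: bytes) -> int:
        """Queue a payload, then try to drain the queue. Returns items sent."""
        self._pending.append((key, payload))
        return self.flush()

    def flush(self) -> int:
        """Upload queued items oldest first, stopping at the first failure."""
        sent = 0
        while self._pending:
            key, payload = self._pending[0]
            try:
                self._upload(key, payload)
            except ConnectionError:
                break  # Store is still down; keep the item for a later retry.
            self._pending.popleft()  # Remove only after a successful upload.
            sent += 1
        return sent


if __name__ == "__main__":
    store = FakeRemoteStore()
    queue = StoreAndForwardQueue(store.put)

    store.available = False            # outage begins
    queue.submit("ticket-1", b"...")   # held locally
    queue.submit("ticket-2", b"...")   # held locally
    store.available = True             # service restored
    queue.flush()                      # both items uploaded in order
```

The key design choice in the sketch is that an item leaves the local queue only after its upload succeeds, so an outage delays synchronization rather than losing data.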
Our merchant POS customers did lose access to some services that depend on data stored in S3. Examples include:
- Current day reporting
- Online ordering
- Mobile loyalty
- Time punch editing
While we regret any downtime, we are both pleased and disappointed with our performance in this case. We are pleased that our app was designed to remain operational during a total internet failure and was therefore able to minimize the effect of Tuesday’s partial outage. We are also pleased that once S3 service was restored, every iPad running MB POS automatically recovered from the outage. In hindsight, we wish we had foreseen this type of outage and taken extra steps to bolster our redundancy against an S3 failure. We did gain valuable knowledge from this experience and are already working on ways to better protect our uptime in case of a similar event in the future.
Running our business with a leveraged cloud infrastructure requires us to continuously educate ourselves on better ways to utilize new and existing services in order to provide a great product at a reasonable price to our resellers and merchants. To borrow a quote from a favorite Broadway play, “We all have a little more homework to do” (13 The Musical).