Add a try/catch clause around any network or file operation.
Yesterday, a network connection to Shared Writable Storage experienced a minor hiccup. Services recovered automatically in about 20 seconds, and the hiccup affected less than 3% of all hosted applications, but we received several tickets from users whose applications were offline for hours. The following are a few important concepts and options for developers.
Pagoda Box works infrastructure magic on a daily basis, but a few cloud realities can be handled only by developers inside their applications. If you’re not familiar with them, take a second to review the CAP Theorem and “The Eight Fallacies of Distributed Computing.”
Here’s the quick list for your convenience:
CAP Theorem: it is impossible for a distributed computer system to simultaneously provide all three of the following guarantees:
- Consistency (all nodes see the same data at the same time)
- Availability (a guarantee that every request receives a response about whether it was successful or failed)
- Partition tolerance (the system continues to operate despite arbitrary message loss or failure of part of the system)
The Eight Fallacies of Distributed Computing
1. The network is reliable.
2. Latency is zero.
3. Bandwidth is infinite.
4. The network is secure.
5. Topology doesn’t change.
6. There is one administrator.
7. Transport cost is zero.
8. The network is homogeneous.
Where Partition Tolerance and Fallacy No. 1 Intersect
While Pagoda Box does a great job of minimizing or eliminating most of these issues, Partition Tolerance and Fallacy No. 1 are biggies, especially for PHP. Specifically, many PHP frameworks and applications were created on the assumption that everything resides on a single server, not in a distributed environment. Assumptions that are true in a single-server environment quickly fly out the window once the application is distributed.
Example: Local Sessions
Consider the practice of storing sessions locally. Applications on Pagoda Box scale horizontally across multiple instances. Obviously in a distributed environment, each instance writing its own local sessions would result in garbled session data as visitors navigate to several different instances. If this happens in ecommerce, you’ve got a lawsuit.
Pagoda Box allows developers to centralize this dependency with Shared Writable Storage (but we officially recommend sessions cached in Redis). Problem solved, right? Well, mostly. Unfortunately, applications still have to account for Partition Tolerance and Fallacy No. 1 because all networks are, in fact, unreliable when compared to hosting on a single server.
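To make the local-session problem concrete, here is a toy simulation, sketched in Python for brevity. Everything in it is hypothetical: the `Instance` class, the `simulate` helper, and the round-robin routing are illustrative assumptions, not a description of how Pagoda Box routes requests or stores sessions.

```python
from itertools import cycle


class Instance:
    """A hypothetical app instance that keeps shopping carts in a session store."""

    def __init__(self, store):
        self.store = store  # dict mapping session id -> list of cart items

    def add_to_cart(self, session_id, item):
        self.store.setdefault(session_id, []).append(item)
        return list(self.store[session_id])


def simulate(instances, items, session_id="visitor-1"):
    """Send each request to the next instance in turn (assumed round-robin)."""
    cart = []
    for instance, item in zip(cycle(instances), items):
        cart = instance.add_to_cart(session_id, item)
    return cart  # the cart as seen by the instance that served the last request


items = ["book", "lamp", "mug"]

# Case 1: each instance writes to its own local store.
local_cart = simulate([Instance({}), Instance({})], items)

# Case 2: both instances write to one shared store.
shared_store = {}
shared_cart = simulate([Instance(shared_store), Instance(shared_store)], items)

print(local_cart)   # ['book', 'mug']
print(shared_cart)  # ['book', 'lamp', 'mug']
```

With local stores, the lamp lands on the second instance and never shows up in the cart the visitor sees at checkout. With a shared store, every instance sees the same cart, which is the dependency that centralized storage removes.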
That’s not to say distributed networks are always dropping offline. It simply suggests that due to fleeting congestion at a switch or when cycling physical hardware for security, network communications occasionally ‘fluctuate’. Frameworks or applications must anticipate these hiccups, and account for latency or packet loss gracefully. What if the app doesn’t retrieve session data instantaneously? Does the entire application seize up, or can it wait, return an error, and resume normal activity when the network connection is reestablished?
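One way to "wait, return an error, and resume" is a bounded retry with a short, growing delay. The following is a minimal sketch in Python, assuming the session lookup is wrapped in a callable; `fetch_with_retry`, its parameters, and the simulated `flaky_fetch` are hypothetical names invented for this illustration. The same shape applies to a try/catch loop in PHP.

```python
import time


def fetch_with_retry(fetch, attempts=3, base_delay=0.1, sleep=time.sleep):
    """Call fetch(); on a network-style error, wait and retry, then give up cleanly."""
    last_error = None
    for attempt in range(attempts):
        try:
            return fetch()
        except (ConnectionError, TimeoutError) as error:
            last_error = error
            if attempt < attempts - 1:
                # Double the wait after each failure: 0.1s, 0.2s, 0.4s, ...
                sleep(base_delay * (2 ** attempt))
    # Out of attempts: surface one clear error instead of hanging the application.
    raise RuntimeError(
        f"session store unavailable after {attempts} attempts"
    ) from last_error


# Simulated session lookup (an assumption for the demo): fails twice, then succeeds.
calls = {"count": 0}


def flaky_fetch():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("simulated network hiccup")
    return {"user_id": 42}


# The sleep is stubbed out so the demo runs instantly.
session = fetch_with_retry(flaky_fetch, sleep=lambda seconds: None)
print(session)  # {'user_id': 42}
```

Capping the number of attempts matters as much as retrying: a short hiccup is absorbed, while a longer outage turns into an error the application can report instead of a request that never returns.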
On the client side of the curtain, application or framework developers need to architect with cloud distribution in mind. When using Pagoda Box, if your application mysteriously shuts down while all the underlying resources remain available, check how heavily you’re using Shared Writable Storage. It may be a sign that your application doesn’t gracefully provide Partition Tolerance. Note that the Pagoda Box infrastructure will gracefully retain or reestablish connections. Adding a try/catch clause around any network or file operation is a simple way to accommodate these hiccups. Also know that “Deploy Latest” will reset applications to their working state.
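As a rough illustration of wrapping a file operation, the sketch below reads a JSON file and falls back to a default when the read fails. It is written in Python, where the equivalent of try/catch is try/except; `load_settings`, the file path, and the fallback value are hypothetical and stand in for whatever your application reads from shared storage.

```python
import json
import logging


def load_settings(path, default=None):
    """Read a JSON file, returning a fallback if storage is unreachable or unreadable."""
    fallback = {} if default is None else default
    try:
        with open(path, "r", encoding="utf-8") as handle:
            return json.load(handle)
    except OSError as error:
        # Covers missing files, permission errors, and I/O failures on a network mount.
        logging.warning("storage read failed, using fallback: %s", error)
        return fallback
    except json.JSONDecodeError as error:
        # A partially written file can surface as malformed JSON.
        logging.warning("storage returned unreadable data, using fallback: %s", error)
        return fallback


# Hypothetical path that does not exist, to show the fallback taking over.
settings = load_settings("/no/such/dir/settings.json", default={"theme": "plain"})
print(settings)  # {'theme': 'plain'}
```

The point is that a failed read becomes a logged warning and a sensible default rather than an unhandled error that takes the whole request down with it.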
Pagoda Box’s Role
Behind the curtain, we work a bunch of voodoo around infrastructure. Pagoda Box monitors and maintains all services 24/7, and we’re working to increase the number and quality of both. We do not detect when applications are live or dead. However, in the coming weeks, we’re introducing Ping Monitoring to the dashboard to help increase app uptime visibility for administrators and developers.