Redefining Stability
Published on 2016-08-31
Defining stability is a hard problem with many contributing factors. I propose a
set of guidelines and modifiers for describing various "stable" systems so we
can better capture their true state.
Common Points
- The software runs within its defined memory/CPU parameters (a minimal check is sketched after this list)
- The software does not crash under "normal" load
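To make the first point concrete, here is a minimal sketch of what "runs within its defined memory/CPU parameters" could look like in practice. It assumes Python with the psutil library, and the MAX_RSS_BYTES / MAX_CPU_PERCENT thresholds are placeholder values, not part of the definition above.

```python
import sys

import psutil  # third-party: pip install psutil

# Hypothetical limits -- in a real service these would come from the
# service's own capacity plan or deployment manifest.
MAX_RSS_BYTES = 512 * 1024 * 1024   # 512 MiB memory budget
MAX_CPU_PERCENT = 80.0              # CPU ceiling, relative to one core


def within_parameters(pid: int) -> bool:
    """Return True if the process stays inside its defined memory/CPU parameters."""
    proc = psutil.Process(pid)
    rss = proc.memory_info().rss           # resident set size, in bytes
    cpu = proc.cpu_percent(interval=1.0)   # CPU usage sampled over one second
    return rss <= MAX_RSS_BYTES and cpu <= MAX_CPU_PERCENT


if __name__ == "__main__":
    target = int(sys.argv[1]) if len(sys.argv) > 1 else psutil.Process().pid
    print("within parameters" if within_parameters(target) else "outside parameters")
```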
Understable
- There is some risk in using the service: not all edge cases in the API have
been tested or bugfixed
- Deployments need monitoring by "that one person" who understands the service
deeply
- Subject to large or frequent code/API updates
- Uses "new" or untested libraries that are themselves updated frequently
(forcing frequent deployments)
Stable
- Most edge cases in the API have been fixed
- Engineering teams maintain and update the code regularly (as needed or
monthly)
- Relatively few updates to the business-logic code; the API is versioned and/or
tested for backwards compatibility (a sketch of such a test follows this list)
- Libraries are updated and the app is refactored to keep pace with security
updates and Engineering progress outside the app (e.g. database version
changes)
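As a concrete illustration of the backwards-compatibility point above, here is a minimal sketch of a contract test for a versioned API. The endpoint URL, field names, and use of the requests library are assumptions for illustration, not details from any particular service.

```python
import requests  # third-party: pip install requests

# Hypothetical v1 contract: fields that clients built against /v1 rely on.
V1_REQUIRED_FIELDS = {"id", "name", "created_at"}


def test_v1_contract_unchanged():
    """A deployment must not remove or rename fields that v1 clients depend on."""
    resp = requests.get("https://api.example.com/v1/widgets/123", timeout=5)
    assert resp.status_code == 200
    body = resp.json()
    # New fields may be added freely, but the original v1 fields must remain.
    assert V1_REQUIRED_FIELDS.issubset(body)
```

Run under pytest as part of the deployment pipeline, a test like this turns "backwards compatible" from an assumption into something checked on every release.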
Overstable
- So "rock solid" that no one wants to touch it -- it has "hairs" on it in the
form of bugfixes and all edge-cases have been handled in code.
- Code is reviewed or updated only when absolutely necessary to prevent
catastrophic failures; library and code updates are otherwise avoided.
- Would be considered "abandonware" in other contexts
- (Like Understable systems) requires "that one person" who understands the code
to maintain it
Conclusions and Reasoning
Both under- and over-stable systems are in a "bad" state and should be avoided.
Having truly Stable code means regularly reviewing or refactoring it to account
for new Engineering requirements or practices, fixing or adding pieces as
needed, and keeping the team's understanding of the codebase fresh.