DISCLAIMER: This applies to hotfixes ONLY. It is not a commentary on anyone's process, nor does it speak to processes around development and testing. Its only purpose is to help in understanding the impact of, and thus prioritizing, the production issues that typically initiate requests for a hotfix. There is rarely a formalized process for deciding whether to release a hotfix, and this hopefully provides context and some repeatable steps/metrics to apply.
What IS a hotfix?
A hotfix is a software update designed to fix a bug or security hole in a program. Unlike typical version updates, hotfixes are developed urgently and released as soon as possible to limit the effects of the issue. They are often released between incremental version updates. "Hotfix" is sometimes used interchangeably with "patch"; depending upon your organization, that may be the case, or the two may be radically different. Regardless of your terminology, this post applies to any unexpected malfunction in a piece of software that has somehow made it into the production environment.
Policy
We evaluate every hotfix candidate using the following factors/questions:
Severity. How bad is the problem? Does it involve a security exposure? Is there data loss? Is there a loss, even temporarily, of our users' available funds or their ability to access those funds? Is functionality blocked? If so, how important is that functionality?
Scope. How many people are or will be affected by this issue? To what extent will it impair their work? Are all clients affected? If not, which ones, and how many users are associated with them? Is an affected client's relationship with the company "solid" or "shaky"? Is the issue isolated to a single product offering or corridor?
Workarounds. Are there reasonable steps that avoid the problem? Can those affected be shown these steps easily? What are our options in conveying said workaround? Does this overburden a different department?
Regression status. Is the bug:
A new problem with previously working functionality, i.e. recently introduced?
A problem with new functionality?
An old problem that has been in the product for one or more releases?
Cost of fixing. How long will it take to implement and test a fix?
Risk of fixing. How invasive are the changes? What is the likelihood that those changes will produce unintended consequences?
Time. How long has the release been available? How long before a new release is scheduled?
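The factors above can be sketched as a rough scoring rubric. This is a minimal illustration only: the field names, weights, and the 90-day recency window are assumptions I've chosen for the example, not policy, and any real rubric should be calibrated against your own release data.

```python
from dataclasses import dataclass

@dataclass
class HotfixCandidate:
    severity: int         # 1 (cosmetic) .. 5 (security exposure, data/funds loss)
    scope: int            # 1 (one user) .. 5 (all clients)
    has_workaround: bool  # a reasonable, easily conveyed workaround exists
    is_regression: bool   # previously working functionality recently broke
    fix_cost_days: float  # estimated time to implement and test the fix
    fix_risk: int         # 1 (isolated change) .. 5 (invasive, cross-cutting)
    days_since_release: int

def hotfix_score(c: HotfixCandidate) -> float:
    """Higher score = stronger hotfix candidate. Weights are illustrative."""
    reward = c.severity * 2 + c.scope
    if c.is_regression:
        reward += 2           # regressions in working functionality weigh heavier
    if c.has_workaround:
        reward -= 3           # a workaround buys time until the next release
    risk = c.fix_risk + c.fix_cost_days
    # Factor #7 (Time): issues reported long after a release are less urgent.
    recency = max(0.5, 1.0 - c.days_since_release / 90)
    return (reward - risk) * recency

# Example: a severe regression found three days after release scores high.
candidate = HotfixCandidate(severity=5, scope=4, has_workaround=False,
                            is_regression=True, fix_cost_days=1.0,
                            fix_risk=2, days_since_release=3)
print(round(hotfix_score(candidate), 2))  # → 12.57
```

Even a crude rubric like this makes the trade-off discussion repeatable: two people scoring the same incident should land in the same neighborhood, and disagreements surface as arguments about specific weights rather than gut feel.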
Evaluating a hotfix candidate is a subjective risk vs. reward trade-off. In many cases, you will probably find that the reward is simply not worth the risk. But, as factor #7 (Time) suggests, the length of time since the last release does and should affect the evaluation.
Something critical discovered shortly after a release should be evaluated more seriously than an issue that isn't reported until three months after a release. (This is also easier when releases are fully versioned, allowing an issue to be traced down to the individual release that introduced it.)
From that, the following general guidelines can be used to assess candidates relatively quickly.
Obviously, none of this is a hard and fast rule. The risks or costs of a fix may preclude an otherwise worthy hotfix, or you may hotfix an issue that doesn't meet the criteria. The end goal is to solicit more testing and to version releases in such a way that it isn't an all-or-nothing situation.
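One way to make such guidelines quick to apply is to map a numeric assessment onto a small set of recommended actions. The thresholds and action labels below are purely illustrative assumptions for the sketch; the input could be any score your evaluation rubric produces.

```python
def triage(score: float) -> str:
    """Map a hotfix-evaluation score to a recommended action.

    Thresholds are illustrative assumptions, not policy; none of
    them are hard and fast rules, per the caveat above.
    """
    if score >= 10:
        return "hotfix now"
    if score >= 5:
        return "hotfix in the next scheduled deploy window"
    if score >= 0:
        return "fix in the next release"
    return "backlog"

print(triage(12.5))  # → hotfix now
print(triage(2.0))   # → fix in the next release
```

A mapping like this is mainly a communication tool: it gives stakeholders a shared vocabulary for the outcome of the evaluation, while still leaving room to override it when the risk or cost of the fix says otherwise.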
To borrow from an older document on this topic, for anyone wondering why most engineering teams would not want to hotfix everything, there are these reasons:
Risk. Hotfixes largely bypass the standard testing that takes place during the development cycle. These fixes are often deployed to production servers shortly after being committed, with limited opportunity to verify the fix. The main problem is the risk of “unintended consequences.” Like all other code changes, a hotfix can potentially cause unexpected issues in other parts of the system. A hotfix provides no opportunity to detect the follow-on issues before the production deployment.
Focus. At the point when a potential hotfix candidate is identified, developers are deeply engaged in implementing features for the next release. Asking several developers to stop feature work and focus instead on a hotfix often prevents them from finishing one or more existing backlog items on the sprint board.
Cost. Producing a hotfix is typically 3-5 times as costly as fixing the exact same issue during the development cycle. To mitigate those risks, you must be extremely conservative. That means starting with the evaluation process and involving senior management and the client, when necessary. You should design, discuss, implement and test several potential solutions, to find the fix that best addresses the issue while minimizing impact on other functionality. All hotfixes should be assessed for risk by senior management and should never bypass the code review process. Testers should be available to verify the change immediately. Oftentimes, the hotfix is isolated and not an appropriate long-term solution. In those cases, the hotfix changes have to be rolled back and replaced with a more comprehensive fix in the next release. All of this adds to the overhead, making a “simple” hotfix time-consuming and expensive overall.
Ideally, you will be able to ground all of these evaluations in real metrics. That information would be gathered from all departments and compared against the metrics/data points collected for each release.