If you follow this blog, you know that I've been scratching my head to find relevant metrics to assess the quality of our various products under development. I'm involved in a very large, highly visible program. These metrics are not only important to our development organization but also to the executives who are investing in the program. I wanted to come up with a simple model that is relevant for all important characteristics of a software product and requires low effort to compute. Having a model that follows an international standard was of course a plus.
The current model my team and I came up with is largely inspired by the international standard ISO-9126 and has been adapted for the needs of our program. ISO-9126 represents the latest (and ongoing) research into characterizing software for the purposes of software quality control, software quality assurance and software process improvement.
The model identifies 6 main quality characteristics, namely: Functionality, Reliability, Usability, Efficiency, Maintainability and Portability. I really like the model as it considers the complete scope of the software. The most difficult part is of course to find the right metrics to measure each characteristic and its associated subcharacteristics.
Below are the definitions for each characteristic and subcharacteristic, along with our current thinking about relevant metrics for each. I've taken the liberty of copying the definitions from this website.
Functionality is the essential purpose of any product or service. For certain items this is relatively easy to define; for example, a ship's anchor has the function of holding a ship at a given location. For software, a list of functions can be specified, e.g. a sales order processing system should be able to record customer information so that it can be used to reference a sales order. Functionality is expressed as the totality of essential functions that the software product provides. It is also important to note that the presence or absence of these functions in a software product can be verified as either existing or not, in that it is Boolean (either a yes or no answer). The other software characteristics listed (e.g. usability) are only present to some degree, i.e. not a simple on or off.
Definition: Appropriateness (to specification) of the functions of the software.
Metric: This is a fairly easy one. We've picked the number of completed requirements / the total number of requirements. By completed we mean implemented and fully tested. As we're using Scrum as our development methodology, we were tempted to use user stories, but we stuck with business requirements. We might revisit that one.
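As a minimal sketch, this metric is just a ratio; the requirement counts below are invented for illustration:

```python
def suitability(completed_requirements: int, total_requirements: int) -> float:
    """Share of business requirements that are implemented and fully tested."""
    return completed_requirements / total_requirements

# Hypothetical numbers: 120 of 150 business requirements completed.
print(suitability(120, 150))  # 0.8
```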
Definition: Correctness of the functions.
Metric: There are so many possibilities for that one that we've been struggling with it for quite some time. We've settled on: weight of our overall defect backlog / total number of requirements. We've decided on a weight rather than a raw defect count because we wanted to take the severity of the defects into consideration. Basically S1=5, S2=4, S3=3, S4=2, S5=1.
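A sketch of that weighted backlog calculation, using the severity weights from the post; the sample defect counts are made up:

```python
# Severity weights as described above: S1 (most severe) counts 5x, S5 counts 1x.
SEVERITY_WEIGHTS = {"S1": 5, "S2": 4, "S3": 3, "S4": 2, "S5": 1}

def weighted_defect_density(defects_by_severity: dict, total_requirements: int) -> float:
    """Weight of the defect backlog divided by the total number of requirements."""
    backlog_weight = sum(SEVERITY_WEIGHTS[sev] * count
                         for sev, count in defects_by_severity.items())
    return backlog_weight / total_requirements

# Hypothetical backlog: 2 S1s, 5 S2s, 12 S3s, 8 S4s, 3 S5s against 150 requirements.
backlog = {"S1": 2, "S2": 5, "S3": 12, "S4": 8, "S5": 3}
print(weighted_defect_density(backlog, 150))  # weight 85 / 150 requirements
```

Lower is better; the trend over time matters more than the absolute value.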
Definition: Ability of a software component to interact with other components or systems.
Metric: Some products in the program need to interact with each other. Some other products need to interact with third parties. We'll come up with a relevant checklist for each and assess the interoperability against it. Pretty much a yes or a no.
Definition: Where appropriate certain industry (or government) laws and guidelines need to be complied with, i.e. SOX. This subcharacteristic addresses the compliant capability of software.
Metric: We have a number of governmental and industry standards we need to be compliant with (typical for financial software), e.g. FIPS-141, PCI-DSS, DDA, etc. A product is either compliant or it is not. Again, an easy one to calculate.
Definition: This subcharacteristic relates to unauthorized access to the software functions.
Metric: Due to the specificity of our software, we are required to be certified by Korelogic, Veracode and Fortify. These tools and processes are part of our development process. As ratings are given by these tools, we use them as our metric.
Once a software system is functioning as specified and is delivered, the reliability characteristic defines the capability of the system to maintain its service provision under defined conditions for defined periods of time. One aspect of this characteristic is fault tolerance, that is, the ability of a system to withstand component failure. For example, if the network goes down for 20 seconds and then comes back, the system should be able to recover and continue functioning.
Definition: Frequency of failure of the software.
Metric: By failure, we mean the software is down, which pretty much matches the definition of our Severity 1 defects. So we look at our total number of S1 defects/KLOC. What is important here is to find a reference point for a mature product. We'll use one of our legacy products to base our reference point on.
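A sketch of that normalization against a legacy baseline; all defect counts and code sizes here are invented:

```python
def s1_per_kloc(s1_count: int, lines_of_code: int) -> float:
    """Severity-1 (system down) defects per thousand lines of code."""
    return s1_count / (lines_of_code / 1000)

# Hypothetical figures: the new product vs. a mature legacy product
# that serves as the reference point.
current = s1_per_kloc(4, 80_000)      # 0.05 S1 defects per KLOC
reference = s1_per_kloc(10, 400_000)  # 0.025 S1 defects per KLOC
print(current / reference)            # 2.0 -> twice the legacy baseline
```

A ratio above 1.0 means the new product is failing more often per KLOC than the mature reference, which is expected early in development and should trend down.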
Definition: Ability to bring back a failed system to full operation, including data and network connections.
Metric: As part of our QA process we have a phase for recoverability testing and we keep track of all associated defects. So we’ll look at our total number of recoverability defects/KLOC.
Usability only exists with regard to functionality and refers to the ease of use for a given function. For example, a function of an ATM is to dispense cash as requested. Placing common amounts on the screen for selection, e.g. $20.00, $40.00, $100.00, etc., does not impact the function of the ATM but addresses the usability of the function. The ability to learn how to use a system (learnability) is also a major subcharacteristic of usability.
Definition: Determines the ease with which the system's functions can be understood; relates to user mental models in Human Computer Interaction methods.
Metric: This is one of the toughest metrics, as it is very subjective and can require quite a bit of effort to come up with something meaningful. As part of the program, we're running external usability sessions with all kinds of users. Some are very familiar with our existing products, some are newbies. At the end of the session, participants fill out a survey. An overall score is calculated for each user and is used as our understandability metric.
Definition: Learning effort for different users, i.e. novice, expert, casual etc.
Metric: Again, another tough one. Very difficult to assess during the course of development; much easier when the product is in production, with actual users using it and reporting requests for information or user errors. As we're writing user guides and online help for our products, I'm tempted to use their completeness as the metric for learnability for now.
The ability to identify and fix a fault within a software component is what the maintainability characteristic addresses. In other software quality models this characteristic is referenced as supportability. Maintainability is impacted by code readability or complexity as well as by modularization. Anything that helps with identifying the cause of a fault and then fixing the fault is the concern of maintainability. Also, the ability to verify (or test) a system, i.e. testability, is one of the subcharacteristics of maintainability.
Definition: Characterizes the ability to identify the root cause of a failure within the software.
Metric: Still scratching my head on that one. It involves the usefulness of your logs, traces and error messages. It can involve code complexity (I'm assuming the simpler the code, the easier it is to find the root cause of a failure. Fair assumption?). So I still don't have a very good answer for that one.
Definition: Characterizes the amount of effort needed for code modification or fault removal.
Metric: Again, I'm going to assume that simple code makes it easier to change and to identify problems. We're using Sonar (Hudson plugin), which gives us the cyclomatic complexity of the code. We'll use it as our metric for changeability.
Definition: Characterizes the sensitivity to change of a given system that is the negative impact that may be caused by system changes.
Metric: With this metric, we're basically trying to understand whether our code is prone to regressions. As we're running automated regression tests on all builds, we'll track the percentage of passed automated test cases to measure the stability of our code.
Definition: Characterizes the effort needed to verify (test) a system change.
Metric: I'm a bit bothered by that one, as it only takes into consideration the effort, not the actual relevance of the tests. If we're only talking about effort, we've picked the total number of automated tests versus the total number of automatable tests. But I think we'll revisit that one.
This characteristic refers to how well the software can adapt to changes in its environment or in its requirements. The subcharacteristics of this characteristic include adaptability. Object-oriented design and implementation practices can contribute to the extent to which this characteristic is present in a given system.
Definition: Characterizes the ability of the system to change to new specifications or operating environments.
Metric: We have a list of operating systems we need to support (Windows, AIX, Solaris, Linux). A product either runs on them or it does not. Easy to calculate.
Definition: Characterizes the effort required to install the software.
Metric: A bit subjective, as it also depends on the user installing the software. We've changed the definition a bit: we check whether an installer is available on each platform.
Definition: Similar to compliance for functionality, but this characteristic relates to portability. One example would be Open SQL conformance which relates to portability of database used.
Metric: We have a number of third-party products we need to be compatible with, e.g. RDBMSs, application servers and others. Again, a product is either compatible or it is not. Easy.
Definition: Characterizes the plug and play aspect of software components, that is how easy is it to exchange a given software component within a specified environment.
Metric: This one is a bit difficult to calculate while developing brand-new products. It's easier when developing a new version of existing software, as you can test your upgrade path. But in this case, we have brand-new products. So what do we measure? How easy it is to install a new build? Tough one.
To calculate our overall quality index, we consider all subcharacteristics equally important and give them the same weight. We're then able to come up with a very simple radar chart which can be communicated at any level. We keep a trend of our overall index, which obviously needs to improve over the course of the release.
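A sketch of that equal-weight roll-up, assuming each subcharacteristic has already been normalized to a score between 0 and 1 (the names and numbers below are invented; each key would become one axis of the radar chart):

```python
# Hypothetical normalized scores (0 = worst, 1 = best) per subcharacteristic.
scores = {
    "Suitability": 0.80, "Accuracy": 0.72, "Interoperability": 1.00,
    "Compliance": 1.00, "Security": 0.90,
    "Maturity": 0.65, "Recoverability": 0.70,
    "Understandability": 0.60, "Learnability": 0.50,
    "Analysability": 0.40, "Changeability": 0.75,
    "Stability": 0.85, "Testability": 0.55,
    "Adaptability": 1.00, "Installability": 1.00,
    "Conformance": 1.00, "Replaceability": 0.30,
}

# Equal weights: the overall index is a plain average of the scores.
quality_index = sum(scores.values()) / len(scores)
print(round(quality_index, 2))  # 0.75
```

Tracking this single number per build gives the trend line; the per-subcharacteristic scores feed the radar chart.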
In summary, we've made some good progress, but there are still some grey areas, especially around learnability, analysability, testability and replaceability.
Hopefully I can tap into the online software testing collective and get some opinions on these, or on any others you might have advice on. I will exceptionally open up this post to comments to start the discussion. You might have noticed that I don't allow comments on my blog. I've had bad experiences with spam in the past and I really don't have time to deal with it right now. I might revisit the problem in the future to see if new options are available to get rid of ALL spam.
Thanks in advance for your help!