Introduction
Every day, thousands of project managers and developers around the world grapple with puzzles far more difficult than Sudoku. They are called by clients eager to get an economic assessment for the development of a new application or for its evolution. Time is generally tight, and in the end the magic number is taken as "the truth" and passed up the chain of approvals (the boss, the boss of the boss, etc.) as if it were a definitive figure.
The reality behind these numbers is unfortunately full of uncertainty, and they often end up very far from what actually happens. It is not unusual for a project manager to discover, long after the estimate, that the new application requires far more than was first predicted.
Over time, computer science has developed a series of methodologies for estimating projects, which I will briefly present in this post; a great deal of effort has gone into giving this art a structure. Personally, of all the methods I have used I prefer Use Case Points (UCP), to which I made a change that makes it less arbitrary and lets me estimate not only the man-days but also the cost of the project.
Estimation Models
It is better to emphasize one thing above all: an estimate (or software quotation) is the most likely evaluation of the effort needed to develop a particular software system (let us leave costs aside for now). An estimate is a model, and therefore a representation of reality that will never be reality itself. My personal conclusion is to adopt the estimation model that comes closest to the reality I have to manage, adapting it according to the feedback received at the end of each project. The purpose of this post is not to give a lesson or the formula for the perfect estimate, because you have to find that within your own organization, but to provide an overview of the most popular methods, with a bibliography, so that you can choose for yourself.
Lines of Code (LOC)
It was the first attempt at a scientific approach to measuring software and is based on counting the number of lines of code that make up the system. Clearly it is a measure that can only be collected on the finished product, or used before development but after sizing the system with some other technique (e.g. Function Points) combined with productivity indexes.
What should be included in a LOC count? The Software Engineering Institute (SEI) has published a checklist indicating which parts should be counted and which excluded [1]. The count produces DLOC (Delivered LOC), the lines of code that are useful for sizing the software and then estimating the effort. Instead of DLOC it is possible to compute SLOC (Source LOC), which counts all lines of code (including, for example, statements and comments).
It is possible to convert LOC into effort by applying productivity indexes (which are tied to the organization carrying out the projects). For example, based on its prior experience, an organization may have a productivity index (PI) of 160 DLOC per man-day and then calculate the effort as: Effort (in man-days) = DLOC / PI.
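As a purely illustrative example, a module estimated at 8,000 DLOC with a PI of 160 DLOC per man-day would correspond to 8,000 / 160 = 50 man-days of effort.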
If you do not have productivity indexes you can also use estimation parameters derived from previous projects of other organizations, introduced by Boehm in 1981 [2]. This is Basic COCOMO, and it is based on the type of project:
- Organic: projects with small teams, known and stable requirements and technologies, low time pressure.
- Semi-detached: projects with medium-sized teams, changing requirements, and time pressure that is nevertheless not a hard constraint.
- Embedded: large teams, unknown or highly variable requirements, strong time pressure with the release date as a constraint, complex environments with complex hardware and unknown technologies.
The Basic COCOMO effort estimation is as follows:
Applied Effort (E) = a * (KLOC)^b [man-months]
Development time (D) = c * E^d [months]
People required (P) = E / D [people]
where KLOC is the size in thousands of source lines of code, c = 2.5, and a, b and d depend on the project type:
| Project type | a | b | d |
| Organic | 2.4 | 1.05 | 0.38 |
| Semi-detached | 3.0 | 1.12 | 0.35 |
| Embedded | 3.6 | 1.20 | 0.32 |
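To make the formulas concrete, here is a minimal sketch in Python of the Basic COCOMO computation (the function name and the example figures are mine, assuming the coefficients in the table above and the size expressed in KLOC):

```python
# Basic COCOMO coefficients from the table above: project type -> (a, b, d)
COEFFICIENTS = {
    "organic":       (2.4, 1.05, 0.38),
    "semi-detached": (3.0, 1.12, 0.35),
    "embedded":      (3.6, 1.20, 0.32),
}

def basic_cocomo(kloc: float, project_type: str = "organic", c: float = 2.5):
    """Return (effort in man-months, duration in months, average staffing)."""
    a, b, d = COEFFICIENTS[project_type]
    effort = a * kloc ** b       # E = a * (KLOC)^b
    duration = c * effort ** d   # D = c * E^d
    people = effort / duration   # P = E / D
    return effort, duration, people

# Example: a 32 KLOC organic project
print(basic_cocomo(32))  # roughly (91.3 man-months, 13.9 months, 6.6 people)
```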
After Basic COCOMO, Boehm introduced the Intermediate COCOMO model, which adds 15 cost drivers, and then COCOMO II, a complete model that no longer uses LOC but design elements to evaluate complexity and thus effort.
COCOMO II
The Constructive Cost Model II is an incremental estimation model that can be used at different stages of the software life cycle, each time providing more accurate estimates. The phases in which it is possible to estimate are:
- Applications Composition. The estimate made during the prototyping phase, and therefore upstream of the project.
- Early Design. The estimate made during the start-up phase of the project; it takes into account complexity factors (or cost drivers) such as the language to be used, the skills of the personnel involved, the complexity of the product, etc.
- Post-architecture. The estimate made during the project's development or even during system maintenance.
The last two estimates use the concept of Function Points to estimate the effort. For a detailed description of the COCOMO II model see [3].
Function Points (FP)
An alternative to LOC for measuring software size is surely Function Points (FP), which have a large following in big institutions. There is an organization, IFPUG (International Function Point Users Group), which officially regulates activities related to FP and maintains the Function Point Counting Practices Manual, the official reference for Function Point counting [4].
The basic concept of Function Point Analysis (FPA) is to decompose the system to be developed into smaller components that can easily be counted and classified into different types. Counting all these building blocks and classifying them determines the complexity of each component, and then of the system as a whole. FPA can be used early in the project for an overall estimate, but it is more effective when based on the requirements and the related documentation.
Let's see which components have to be identified in the system. First, you must determine the boundary of the system to be analyzed: you have to establish which functions belong to the system and which are instead provided by external systems. Once you have determined the boundary, you can identify the following components:
- Data Functions
- ILF. Internal Logical Files: logical groupings of data maintained within the application boundary, designed to store information in support of elementary processes (e.g. customer, user, invoice).
- EIF. External Interface Files: logical structures of data referenced from external systems.
- Transactional Functions
- EI. External Inputs. Elementary transactions needed to maintain the logical data structures of the ILFs.
- EO. External Outputs. Elementary transactions that send information from inside the system to the outside.
- EQ. External Inquiries. Elementary transactions needed to select data and display it to the end user. The data can be aggregated and come from multiple ILFs or EIFs.
Some other characteristics are also needed to perform the Function Point count for each of the previous components:
- RET. Record Element Type: a subgroup of data within an ILF or EIF. For example, a Supplier ILF may have RETs such as address, phone, etc.
- DET. Data Element Type: an elementary data field used within an ILF or EIF (or referenced by a transaction).
- FTR. File Type Referenced: the number of ILFs or EIFs referenced by a particular transaction.
Now let's see how to compute the value of our application in FP. This value is called Unadjusted Function Points (UFP) since it does not take into account the general characteristics of the application. These characteristics, called GSC (General System Characteristics), will be introduced later.
Let's come to the estimation of the UFP: each of the components listed above must be rated as Low, Avg or High complexity based on the number of RETs, DETs and FTRs counted for it:
for ILF and EIF:
| RETs \ DETs | 1-19 | 20-50 | >50 |
| 1 | Low | Low | Avg |
| 2-5 | Low | Avg | High |
| >5 | Avg | High | High |
for EI:
| FTRs \ DETs | 1-4 | 5-15 | >15 |
| <2 | Low | Low | Avg |
| 2 | Low | Avg | High |
| >2 | Avg | High | High |
for EO and EQ:
| FTRs \ DETs | 1-5 | 6-19 | >19 |
| <2 | Low | Low | Avg |
| 2-3 | Low | Avg | High |
| >3 | Avg | High | High |
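As an illustration only, the three lookup tables can be encoded as a small helper of my own (it is not part of the IFPUG manual); the band boundaries simply mirror the ranges above:

```python
# Shared complexity matrix: rows are RET/FTR bands, columns are DET bands.
MATRIX = [["Low", "Low", "Avg"],
          ["Low", "Avg", "High"],
          ["Avg", "High", "High"]]

# For each component type: (upper bounds of the first two RET/FTR bands,
#                           upper bounds of the first two DET bands)
BANDS = {
    "ILF/EIF": ((1, 5), (19, 50)),  # RETs: 1 | 2-5 | >5 ; DETs: 1-19 | 20-50 | >50
    "EI":      ((1, 2), (4, 15)),   # FTRs: <2 | 2 | >2  ; DETs: 1-4 | 5-15 | >15
    "EO/EQ":   ((1, 3), (5, 19)),   # FTRs: <2 | 2-3 | >3; DETs: 1-5 | 6-19 | >19
}

def band(value: int, bounds: tuple) -> int:
    """Return 0, 1 or 2 depending on which band the value falls into."""
    return 0 if value <= bounds[0] else 1 if value <= bounds[1] else 2

def complexity(kind: str, rets_or_ftrs: int, dets: int) -> str:
    row_bounds, col_bounds = BANDS[kind]
    return MATRIX[band(rets_or_ftrs, row_bounds)][band(dets, col_bounds)]

print(complexity("ILF/EIF", rets_or_ftrs=3, dets=25))  # Avg
print(complexity("EI", rets_or_ftrs=2, dets=16))       # High
```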
Then we count how many components fall into the Low, Avg and High classes, and for each one the function points are computed according to the following table:
| Source | Low | Avg | High | Total |
| ILF - Internal Logical Files | _ * 7 = _ | _ * 10 = _ | _ * 15 = _ | _ |
| EIF - External Interface Files | _ * 5 = _ | _ * 7 = _ | _ * 10 = _ | _ |
| EI - External Inputs | _ * 3 = _ | _ * 4 = _ | _ * 6 = _ | _ |
| EO - External Outputs | _ * 4 = _ | _ * 5 = _ | _ * 7 = _ | _ |
| EQ - External Inquiries | _ * 3 = _ | _ * 4 = _ | _ * 6 = _ | _ |
Thus each ILF rated Low is worth 7 function points, each ILF rated Avg 10 function points, and each ILF rated High 15 function points. The sum of all the function points computed for the ILFs, EIFs, EIs, EOs and EQs is, as mentioned above, the Unadjusted Function Points (UFP).
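A minimal sketch, assuming the weights in the table above, of how the Low/Avg/High counts are turned into UFP (the component counts in the example are invented):

```python
# Function point weights from the table above: component -> (Low, Avg, High)
WEIGHTS = {
    "ILF": (7, 10, 15),
    "EIF": (5, 7, 10),
    "EI":  (3, 4, 6),
    "EO":  (4, 5, 7),
    "EQ":  (3, 4, 6),
}

def unadjusted_fp(counts: dict) -> int:
    """counts maps a component type to (n_low, n_avg, n_high), e.g. "ILF": (4, 0, 0)."""
    return sum(n * w
               for kind, ns in counts.items()
               for n, w in zip(ns, WEIGHTS[kind]))

# Example: 4 simple ILFs, 2 average EIs, 1 complex EO, 3 simple EQs
print(unadjusted_fp({"ILF": (4, 0, 0), "EI": (0, 2, 0),
                     "EO": (0, 0, 1), "EQ": (3, 0, 0)}))  # 28 + 8 + 7 + 9 = 52
```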
In addition to the previous count, Function Point Analysis considers a number of general complexity factors of the system to be developed. There are 14 of them, namely:
- Data communications. Overall complexity, rated from 0 to 5, of the transfer of information: 0 pure batch processing, 5 application with a front end and several different transfer protocols.
- Distributed data processing. Complexity of the distribution of data processing: 0 no distribution, 5 data collected and processed by different modules and/or external systems.
- Performance. Performance and responsiveness levels required of the system: 0 no performance requirements, 5 critical response times that require performance analysis and the use of monitoring tools.
- Heavily used configuration. Degree of dependence on the type of hardware used: 0 no specificity, 5 modules allocated to specific pieces of hardware.
- Transaction rate. Frequency of execution of transactions: 0 low, 5 very high.
- On-line data entry. Percentage of transactions that require interaction with the end user: 0 everything in batch mode, 5 more than 30% of transactions are interactive.
- End-user efficiency. Degree of interactivity of the system. The technique provides a table of 16 user-interface characteristics, and you assess which and how many of them are required by the final system. The value ranges from 0 (none) to 5 (six or more of the characteristics). The sixteen characteristics include, for example: menus, scrolling, online help, pop-ups, multilingual support, etc.
- On-line update. How many ILFs are managed and maintained through online transactions: 0 none, 5 the majority of the ILFs, with automatic data recovery policies also required.
- Complex processing. Degree of complexity of the mathematical computations required by the application. Here too the technique provides a table with five kinds of complexity, from which you obtain a value from 0 (no complexity) to 5 (high complexity).
- Reusability. Degree of code reusability: 0 no reusability, 5 the application is specifically developed to be reused in other contexts.
- Installation ease. Degree of complexity of the installation: 0 no special requests, 5 an installation procedure is required and there are special requirements to be checked and tested.
- Operational ease. Degree of complexity of the operational activities required by the application (back-up, start-up and restart, recovery procedures, etc.): 0 no specific activity, 5 fully automated.
- Multiple sites. Degree of deployment: 0 stand-alone, 5 distributed application used by multiple users and multiple organizations, with multiple environments and hardware installations required.
- Facilitate change. Must the application be developed to facilitate change? A table of five characteristics related to change management is used: 0 indicates none of the characteristics, 5 all of them.
At the end of the previous assessment, the Value Adjustment Factor (VAF) required by the application can be calculated as follows:
VAF = 0.65 + sum (characteristics) / 100
Finally, the value of our system in function points is calculated as:
FP = UFP * VAF
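For example, if the ratings of the 14 characteristics sum to 30, then VAF = 0.65 + 30/100 = 0.95, and an application counted at 200 UFP would be worth 200 * 0.95 = 190 FP.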
Let us now review the steps to estimate FP:
1. Identify all the ILFs, EIFs, EIs, EOs and EQs
2. Determine all the RETs, DETs and FTRs
3. Associate the RETs, DETs and FTRs with the components identified in step 1
4. Determine, based on the previous tables, the complexity level of each ILF, EIF, EI, EO and EQ in terms of Low, Avg, High
5. Compute the UFP by adding up all the values from the complexity/function point conversion table
6. Calculate the VAF
7. Estimate function points FP = UFP * VAF
Once you have the FP value, you proceed with the effort estimation by applying a productivity index. To date there are a number of tables, computed from historical data, that give the man-hours needed to develop one Function Point. A comprehensive database of projects and effort is maintained by ISBSG (www.isbsg.org), but in general for Java projects we can consider a productivity index ranging from 0.9 FP per man-day for small/medium projects (up to 350 FP) down to 0.5 FP per man-day for large projects (> 2000 FP).
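For example, a 300 FP Java project at 0.9 FP per man-day would correspond to roughly 300 / 0.9 ≈ 333 man-days of effort.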
Use Case Points
An alternative method to Function Point Analysis is based on the use cases of the system, which are mainly used by the RUP (Rational Unified Process) methodology. The method derives from a 1993 work by Gustav Karner [5] and essentially consists of counting and classifying the actors and the use cases of the system, weighted by a number of factors that represent the "environmental" elements influencing the development project.
The first step is to estimate the Unadjusted Use Case Points (UUCP): you start by identifying the actors and the use cases of the system and assessing their complexity according to the following tables:
| Complexity | Definition | Weight |
| SIMPLE | An actor is SIMPLE if it represents an external system and communication takes place through libraries or APIs (Application Programming Interfaces). | 1 |
| AVERAGE | An actor is AVERAGE if it interacts through a communication protocol or a line terminal. | 2 |
| COMPLEX | An actor is COMPLEX if it interacts with the system through a graphical user interface. | 3 |
Actors
| Complexity | Definition | Weight |
| SIMPLE | A use case is SIMPLE if it involves at most 3 transactions and can be realized with fewer than 5 analysis objects. | 5 |
| AVERAGE | A use case is AVERAGE if it involves 4 to 7 transactions and can be realized with 5 to 10 analysis objects. | 10 |
| COMPLEX | A use case is COMPLEX if it involves more than 7 transactions and requires more than 10 analysis objects. | 15 |
Use Cases
The UUCP is obtained by summing, over all actors and use cases, the number of items of each kind multiplied by its weight:
UUCP = sum (Ni * Wi)
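A minimal sketch (my own, using the weights from the two tables above) of the UUCP computation; the actor and use case counts in the example are invented:

```python
# Weights from the actor and use case tables above
ACTOR_WEIGHTS    = {"simple": 1, "average": 2, "complex": 3}
USE_CASE_WEIGHTS = {"simple": 5, "average": 10, "complex": 15}

def uucp(actors: dict, use_cases: dict) -> int:
    """actors and use_cases map a complexity class to a count, e.g. {"simple": 2}."""
    return (sum(n * ACTOR_WEIGHTS[c] for c, n in actors.items()) +
            sum(n * USE_CASE_WEIGHTS[c] for c, n in use_cases.items()))

# Example: 2 simple actors and 1 complex actor, 3 average and 1 complex use case
print(uucp({"simple": 2, "complex": 1}, {"average": 3, "complex": 1}))  # 5 + 45 = 50
```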
After estimating the UUCP, the model assesses the factors that influence the development project, namely the TCF (Technical Complexity Factor) and the EF (Environmental Factor). The TCF covers the technical factors that may influence the development of the system; there are 13 of them, each with a defined weight:
| Fi | Technical Factor | Wi |
| F1 | Distributed system | 2 |
| F2 | Application performance objectives, in either response time or throughput | 1 |
| F3 | End-user efficiency (on-line) | 1 |
| F4 | Complex internal processing | 1 |
| F5 | Reusability: the code must be reusable in other applications | 1 |
| F6 | Installation ease | 0.5 |
| F7 | Operational ease, usability | 0.5 |
| F8 | Portability | 2 |
| F9 | Changeability | 1 |
| F10 | Concurrency | 1 |
| F11 | Special security features | 1 |
| F12 | Provides direct access for third parties | 1 |
| F13 | Special user training facilities | 1 |
Each factor is assigned a value from 0 to 5: 0 means no influence or importance, 5 maximum influence (or importance) on the project.
The TCF is calculated as follows:
TCF = 0.6 + sum (Wi * Fi) / 100
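For example, if the weighted ratings of the 13 technical factors sum to 30, then TCF = 0.6 + 30/100 = 0.9.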
The Environmental Factors (EF) help estimate the efficiency of the project team and consider the following characteristics:
| Fj | Environmental Factor | Weight (Wj) |
| F1 | Familiarity with Objectory (*) | 1.5 |
| F2 | Part-time workers | -1 |
| F3 | Analyst capability | 0.5 |
| F4 | Application experience | 0.5 |
| F5 | Object-oriented experience | 1 |
| F6 | Motivation | 1 |
| F7 | Difficult programming language | -1 |
| F8 | Stable requirements | 2 |
(*) Since the contexts in which use cases can be applied are broader than the Objectory methodology alone, you can replace factor F1 with "Familiarity with the methodology adopted".
EF = 1.4 - 0.03 * sum(Wj * Fj)
where each Fj is assigned a value from 0 to 5: 0 irrelevant, 5 essential.
Finally we can estimate the UCP as follows:
UCP = UUCP * TCF * EF.
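Putting the last formulas together, here is a minimal sketch of the final UCP computation; the weights and ratings passed in the example are invented and only serve to show the mechanics:

```python
def use_case_points(uucp: float, technical: list, environmental: list) -> float:
    """technical and environmental are lists of (weight, rating 0..5) pairs."""
    tcf = 0.6 + sum(w * f for w, f in technical) / 100       # TCF = 0.6 + sum(Wi * Fi) / 100
    ef = 1.4 - 0.03 * sum(w * f for w, f in environmental)   # EF = 1.4 - 0.03 * sum(Wj * Fj)
    return uucp * tcf * ef

# Example with UUCP = 50, weighted technical factors summing to 30 and
# weighted environmental factors summing to 10:
print(use_case_points(50, [(2, 5), (1, 5), (1, 5), (2, 5)],
                      [(1.5, 4), (1, 4)]))  # TCF = 0.9, EF = 1.1 -> about 49.5
```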
In his original paper Karner estimated the effort needed to develop one UCP by analyzing three projects, obtaining a range from 20 to 28 man-hours per UCP. In a later work Ribu [6] showed that the effort can vary from 15 to 30 hours per UCP, while Schneider and Winters propose a parametric approach based on how many environmental factors exceed (or fall short of) a rating of 3, applying either 20 or 28 hours per UCP accordingly (see [7]). As with Function Points, the best technique for estimating effort is to collect historical data from your own organization in some kind of project repository, kept up to date through post-mortem analysis at the end of each project. If you do not yet have any statistics, you can get started by applying the Schneider and Winters rule: use 20 hours per UCP for simple projects and 28 for very complex ones.
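For example, a system counted at 50 UCP, using 20 hours per UCP, would correspond to about 1,000 man-hours, i.e. roughly 125 man-days at 8 hours per day.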
Update: you can now download a tool I created to help with software estimation.
REFERENCES
Much of the information in this post comes from Shivprasad Koirala's good book:
How to Prepare Quotation for Software, Draft 1.0.
For COCOMO models see:
[1]. http://greenbay.usc.edu/csci577/fall2005/projects/team23/LCA/Team23_COCOMOII-SourceLinesOfCodeDefinitionsV1.0.pdf
[2]. Barry Boehm. Software Engineering Economics. Englewood Cliffs, NJ: Prentice-Hall, 1981.
[3]. Jongmoon Baik. COCOMO II Model Definition Manual. http://www.dicyt.gub.uy/pdt/files/6.2.1_-_cocomo_model.pdf
Function Points
[4]. Function Point Counting Practices Manual. The International Function Point Users Group.
Use Case Points
[5]. Gustav Karner. Resource Estimation for Objectory Projects. 1993.
[6]. Kirsten Ribu. Estimating Object-Oriented Software Projects with Use Cases. Master of Science Thesis, University of Oslo, Department of Informatics, 2001.
[7]. Geri Schneider and Jason P. Winters. Applying Use Cases: A Practical Guide. Addison Wesley, 1998.
[8]. An excellent summary of UCP can be found in Mike Cohn, "Estimating With Use Case Points".
[9]. An article and a discussion on the productivity of UCP can be found in Gianfranco Lanza, "Function Point: How to Transform Them into Effort? This Is the Problem!".