
Appendix 2
United States Agency for International Development
Performance Monitoring and Evaluation TIPS
NUMBER 1
2011 Printing
PERFORMANCE MONITORING & EVALUATION
TIPS
CONDUCTING A PARTICIPATORY EVALUATION
ABOUT TIPS
These TIPS provide practical advice and suggestions to USAID managers on issues related to
performance monitoring and evaluation. This publication is a supplemental reference to the
Automated Directives System (ADS) Chapter 203.
WHAT IS PARTICIPATORY
EVALUATION?
USAID is promoting
participation in all aspects of its development
work.
Participatory evaluation provides for active involvement in the evaluation process of those
with a stake in the program: providers, partners, customers (beneficiaries), and any other
interested parties. Participation typically takes
place throughout all phases of the evaluation:
planning and design; gathering and analyzing the
data; identifying the evaluation findings, conclusions, and recommendations; disseminating results; and preparing an action plan to improve
program performance.
This TIPS outlines how
to conduct a participatory
evaluation.
CHARACTERISTICS OF
PARTICIPATORY
EVALUATION
Participatory evaluations typically share several characteristics that set them apart from traditional evaluation approaches. These include:
Participant focus and ownership. Participatory evaluations are primarily oriented to the information needs of program stakeholders rather than of the donor agency. The donor agency simply helps the participants conduct their own evaluations, thus building their ownership and commitment to the results and facilitating their follow-up action.
Scope of participation. The range of participants included and the roles they play may vary. For example, some evaluations may target only program providers or beneficiaries, while others may include the full array of stakeholders.
Participant negotiations. Participating groups meet to communicate and negotiate to reach a consensus on evaluation findings, solve problems, and make plans to improve performance.
Diversity of views. Views of all participants are sought and recognized. More powerful stakeholders allow participation of the less powerful.
Learning process. The process is a learning experience for participants. Emphasis is on identifying lessons learned that will help participants improve program implementation, as well as on assessing whether targets were achieved.
Flexible design. While some preliminary planning for the evaluation may be necessary, design issues are decided (as much as possible) in the participatory process. Generally, evaluation questions and data collection and analysis methods are determined by the participants, not by outside evaluators.
Empirical orientation. Good participatory evaluations are based on empirical data. Typically, rapid appraisal techniques are used to determine what happened and why.
Use of facilitators. Participants actually conduct the evaluation, not outside evaluators as is traditional. However, one or more outside experts usually serve as facilitator; that is, they provide supporting roles as mentor, trainer, group processor, negotiator, and/or methodologist.
WHY CONDUCT A
PARTICIPATORY
EVALUATION?
Experience has shown that participatory evaluations improve program performance. Listening to and learning from program beneficiaries, field staff, and other stakeholders who know why a program is or is not working is critical to making improvements. Also, the more these insiders are involved in identifying evaluation questions and in gathering and analyzing data, the more likely they are to use the information to improve performance. Participatory evaluation empowers program providers and beneficiaries to act on the knowledge gained.
Advantages to participatory evaluations are that they:
• Examine relevant issues by involving key players in evaluation design
• Promote participants’ learning about the program and its performance and enhance their understanding of other stakeholders’ points of view
• Improve participants’ evaluation skills
• Mobilize stakeholders, enhance teamwork, and build shared commitment to act on evaluation recommendations
• Increase likelihood that evaluation information will be used to improve performance
But there may be disadvantages. For example, participatory evaluations may
• Be viewed as less objective because program staff, customers, and other stakeholders with possible vested interests participate
• Be less useful in addressing highly technical aspects
• Require considerable time and resources to identify and involve a wide array of stakeholders
• Take participating staff away from ongoing activities
• Be dominated and misused by some stakeholders to further their own interests
STEPS IN CONDUCTING A
PARTICIPATORY
EVALUATION
Step 1: Decide if a participatory evaluation approach is appropriate. Participatory evaluations are especially useful when there are questions about implementation difficulties or program effects on beneficiaries, or when information is wanted on stakeholders’ knowledge of program goals or their views of progress. Traditional evaluation approaches may be more suitable when there is a need for independent outside judgment, when specialized information is needed that only technical experts can provide, when key stakeholders don’t have time to participate, or when such serious lack of agreement exists among stakeholders that a collaborative approach is likely to fail.
Step 2: Decide on the degree of participation. What groups will participate and what roles will they play? Participation may be broad, with a wide array of program staff, beneficiaries, partners, and others. It may, alternatively, target one or two of these groups. For example, if the aim is to uncover what hinders program implementation, field staff may need to be involved. If the issue is a program’s effect on local communities, beneficiaries may be the most appropriate participants. If the aim is to know if all stakeholders understand a program’s goals and view progress similarly, broad participation may be best. Roles may range from serving as a resource or informant to participating fully in some or all phases of the evaluation.
Step 3: Prepare the evaluation scope of work. Consider the evaluation approach: the basic methods, schedule, logistics, and funding. Special attention should go to defining roles of the outside facilitator and participating stakeholders. As much as possible, decisions such as the evaluation questions to be addressed and the development of data collection instruments and analysis plans should be left to the participatory process rather than be predetermined in the scope of work.
Step 4: Conduct the team planning meeting. Typically, the participatory evaluation process begins with a workshop of the facilitator and participants. The purpose is to build consensus on the aim of the evaluation; refine the scope of work and clarify roles and responsibilities of the participants and facilitator; review the schedule, logistical arrangements, and agenda; and train participants in basic data collection and analysis. Assisted by the facilitator, participants identify the evaluation questions they want answered. The approach taken to identify questions may be open ended or may stipulate broad areas of inquiry. Participants then select appropriate methods and develop data-gathering instruments and analysis plans needed to answer the questions.
Step 5: Conduct the evaluation. Participatory evaluations seek to maximize stakeholders’ involvement in conducting the evaluation in order to promote learning. Participants define the questions, consider the data collection skills, methods, and commitment of time and labor required. Participatory evaluations usually use rapid appraisal techniques, which are simpler, quicker, and less costly than conventional sample surveys. They include methods such as those in the box below. Typically, facilitators are skilled in these methods, and they help train and guide other participants in their use.
Step 6: Analyze the data and build consensus on results. Once the data are gathered, participatory approaches to analyzing and interpreting them help participants build a common body of knowledge. Once the analysis is complete, facilitators work with participants to reach consensus on findings, conclusions, and recommendations. Facilitators may need to negotiate among stakeholder groups if disagreements emerge. Developing a common understanding of the results, on the basis of empirical evidence, becomes the cornerstone for group commitment to a plan of action.
Step 7: Prepare an action plan. Facilitators work with participants to prepare an action plan to improve program performance. The knowledge shared by participants about a program’s strengths and weaknesses is turned into action. Empowered by knowledge, participants become agents of change and apply the lessons they have learned to improve performance.
WHAT’S DIFFERENT ABOUT PARTICIPATORY
EVALUATIONS?
Participatory Evaluation
• participant focus and ownership of evaluation
• broad range of stakeholders participate
• focus is on learning
• flexible design
• rapid appraisal methods
• outsiders are facilitators
Traditional Evaluation
• donor focus and ownership of evaluation
• stakeholders often don’t participate
• focus is on accountability
• predetermined design
• formal methods
• outsiders are evaluators
Rapid Appraisal Methods
Key informant interviews. This involves interviewing 15 to 35 individuals selected for their knowledge and experience in a topic of interest. Interviews are qualitative, in-depth, and semistructured. They rely on interview guides that list topics or open-ended questions. The interviewer subtly probes the informant to elicit information, opinions, and experiences.
Focus group interviews. In these, 8 to 12 carefully selected participants freely discuss issues, ideas, and experiences among themselves. A moderator introduces the subject, keeps the discussion going, and tries to prevent domination of the discussion by a few participants. Focus groups should be homogeneous, with participants of similar backgrounds as much as possible.
Community group interviews. These take place at public meetings open to all community members. The primary interaction is between the participants and the interviewer, who presides over the meeting and asks questions, following a carefully prepared questionnaire.
Direct observation. Using a detailed observation form, observers record what they see and hear at a program site. The information may be about physical surroundings or about ongoing activities, processes, or discussions.
Minisurveys. These are usually based on a structured questionnaire with a limited number of mostly close-ended questions. They are usually administered to 25 to 50 people. Respondents may be selected through probability or nonprobability sampling techniques, or through “convenience” sampling (interviewing stakeholders at locations where they’re likely to be, such as a clinic for a survey on health care programs). The major advantage of minisurveys is that the data can be collected and analyzed within a few days. It is the only rapid appraisal method that generates quantitative data.
Case studies. Case studies record anecdotes that illustrate a program’s shortcomings or accomplishments. They tell about incidents or concrete events, often from one person’s experience.
Village imaging. This involves groups of villagers drawing maps or diagrams to identify and visualize problems and solutions.
Selected Further Reading
Aaker, Jerry and Jennifer Shumaker. 1994. Looking Back and Looking Forward: A Participatory Approach to Evaluation. Heifer Project International. P.O. Box 808, Little Rock, AR 72203.
Aubel, Judi. 1994. Participatory Program Evaluation: A Manual for Involving Program Stakeholders in the Evaluation Process. Catholic Relief Services. USCC, 1011 First Avenue, New York, NY 10022.
Freeman, Jim. 1994. Participatory Evaluations: Making Projects Work. Dialogue on Development Technical Paper No. TP94/2. International Centre, The University of Calgary.
Feurstein, Marie-Therese. 1991. Partners in Evaluation: Evaluating Development and Community Programmes with Participants. TALC, Box 49, St. Albans, Herts AL1 4AX, United Kingdom.
Guba, Egon and Yvonna Lincoln. 1989. Fourth Generation Evaluation. Sage Publications.
Pfohl, Jake. 1986. Participatory Evaluation: A User’s Guide. PACT Publications. 777 United Nations Plaza, New York, NY 10017.
Rugh, Jim. 1986. Self-Evaluation: Ideas for Participatory Evaluation of Rural Community Development Projects. World Neighbors Publication.
1996, Number 2
Performance Monitoring and Evaluation
TIPS
USAID Center for Development Information and Evaluation
CONDUCTING KEY INFORMANT INTERVIEWS
USAID reengineering emphasizes listening to and consulting with customers, partners and other stakeholders as we undertake development activities. Rapid appraisal techniques offer systematic ways of getting such information quickly and at low cost. This Tips advises how to conduct one such method: key informant interviews.
What Are Key Informant Interviews?
They are qualitative, in-depth interviews of 15 to 35 people selected for their first-hand knowledge about a topic of interest. The interviews are loosely structured, relying on a list of issues to be discussed. Key informant interviews resemble a conversation among acquaintances, allowing a free flow of ideas and information. Interviewers frame questions spontaneously, probe for information, and take notes, which are elaborated on later.
When Are Key Informant Interviews Appropriate?
This method is useful in all phases of development activities—
identification, planning, implementation, and evaluation. For example, it can provide information on the setting for a planned activity that might influence project design. Or, it could reveal why
intended beneficiaries aren’t using services offered by a project.
Specifically, it is useful in the following situations:
1. When qualitative, descriptive information is sufficient for decision-making.
2. When there is a need to understand motivation, behavior, and
perspectives of our customers and partners. In-depth interviews
of program planners and managers, service providers, host
government officials, and beneficiaries concerning their attitudes
and behaviors about a USAID activity can help explain its
successes and shortcomings.
3. When a main purpose is to generate recommendations. Key
informants can help formulate recommendations that can improve a program’s performance.
4. When quantitative data collected through other methods need to
be interpreted. Key informant interviews can provide the how
and why of what happened. If, for example, a sample survey
showed farmers were failing to make loan repayments, key
informant interviews could uncover the reasons.
PN-ABS-541
5. When preliminary information is needed to
design a comprehensive quantitative study.
Key informant interviews can help frame the
issues before the survey is undertaken.
Advantages and Limitations
Advantages of key informant interviews include:
• they provide information directly from knowledgeable people
• they provide flexibility to explore new ideas and issues not anticipated during planning
• they are inexpensive and simple to conduct
Some disadvantages:
• they are not appropriate if quantitative data are needed
• they may be biased if informants are not carefully selected
• they are susceptible to interviewer biases
• it may be difficult to prove validity of findings
Once the decision has been made to conduct key informant interviews, following the step-by-step advice outlined below will help ensure high-quality information.
Steps in Conducting the Interviews
Step 1. Formulate study questions.
These relate to specific concerns of the study. Study questions generally should be limited to five or fewer.
Step 2. Prepare a short interview guide.
Key informant interviews do not use rigid questionnaires, which inhibit free discussion. However, interviewers must have an idea of what questions to ask. The guide should list major topics and issues to be covered under each study question. Because the purpose is to explore a few issues in depth, guides are usually limited to 12 items. Different guides may be necessary for interviewing different groups of informants.
Step 3. Select key informants.
The number should not normally exceed 35. It is preferable to start with fewer (say, 25), since often more people end up being interviewed than is initially planned.
Key informants should be selected for their specialized knowledge and unique perspectives on a topic. Planners should take care to select informants with various points of view.
Selection consists of two tasks: First, identify the groups and organizations from which key informants should be drawn (for example, host government agencies, project implementing agencies, contractors, beneficiaries). It is best to include all major stakeholders so that divergent interests and perceptions can be captured.
Second, select a few people from each category after consulting with people familiar with the groups under consideration. In addition, each informant may be asked to suggest other people who may be interviewed.
Step 4. Conduct interviews.
Establish rapport. Begin with an explanation of the purpose of the interview, the intended uses of the information and assurances of confidentiality. Often informants will want assurances that the interview has been approved by relevant officials. Except when interviewing technical experts, questioners should avoid jargon.
Sequence questions. Start with factual questions.
Questions requiring opinions and judgments
should follow. In general, begin with the present
and move to questions about the past or future.
Phrase questions carefully to elicit detailed information. Avoid questions that can be answered by a
simple yes or no. For example, prompts such as
“Please tell me about the vaccination campaign”
are better than questions such as “Do you know about the vaccination campaign?”
Use probing techniques. Encourage informants to
detail the basis for their conclusions and recommendations. For example, an informant’s comment, such as “The water program has really
changed things around here,” can be probed for
more details, such as “What changes have you
noticed?” “Who seems to have benefitted most?”
“Can you give me some specific examples?”
Maintain a neutral attitude. Interviewers should be
sympathetic listeners and avoid giving the impression of having strong views on the subject under
discussion. Neutrality is essential because some
informants, trying to be polite, will say what they
think the interviewer wants to hear.
Minimize translation difficulties. Sometimes it is
necessary to use a translator, which can change the
dynamics and add difficulties. For example,
differences in status between the translator and
informant may inhibit the conversation. Often
information is lost during translation. Difficulties
can be minimized by using translators who are not
known to the informants, briefing translators on
the purposes of the study to reduce misunderstandings, and having translators repeat the informant’s
comments verbatim.
Step 5. Take adequate notes.
Interviewers should take notes and develop them
in detail immediately after each interview to
ensure accuracy. Use a set of common subheadings
for interview texts, selected with an eye to the
major issues being explored. Common subheadings ease data analysis.
Step 6. Analyze interview data.
Interview summary sheets. At the end of each
interview, prepare a 1-2 page interview summary
sheet reducing information into manageable
themes, issues, and recommendations. Each
summary should provide information about the
key informant’s position, reason for inclusion in
the list of informants, main points made, implications of these observations, and any insights or
ideas the interviewer had during the interview.
Descriptive codes. Coding involves a systematic
recording of data. While numeric codes are not
appropriate, descriptive codes can help organize
responses. These codes may cover key themes,
concepts, questions, or ideas, such as
sustainability, impact on income, and participation
of women. A usual practice is to note the codes or
categories on the left-hand margins of the interview text. Then a summary lists the page numbers
where each item (code) appears. For example,
women’s participation might be given the code
“wom–par,” and the summary sheet might indicate
it is discussed on pages 7, 13, 21, 46, and 67 of the
interview text.
Categories and subcategories for coding (based on
key study questions, hypotheses, or conceptual
frameworks) can be developed before interviews
begin, or after the interviews are completed.
Precoding saves time, but the categories may not
be appropriate. Postcoding helps ensure empirically relevant categories, but is time consuming. A
compromise is to begin developing coding categories after 8 to 10 interviews, as it becomes apparent which categories are relevant.
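
As a minimal sketch of the code summary described above, the Python snippet below builds an index from each descriptive code to the interview-text pages where it appears. The annotation list, the function name `build_code_index`, and the codes other than "wom-par" are illustrative assumptions, not part of the original guidance.

```python
from collections import defaultdict

# Hypothetical margin annotations: (page number, descriptive code).
# Only the "wom-par" pages follow the example in the text; the rest are invented.
annotations = [
    (7, "wom-par"),
    (9, "income"),
    (13, "wom-par"),
    (21, "wom-par"),
    (21, "sustain"),
    (46, "wom-par"),
    (67, "wom-par"),
]


def build_code_index(annotations):
    """Map each descriptive code to the sorted list of pages where it appears."""
    pages_by_code = defaultdict(set)
    for page, code in annotations:
        pages_by_code[code].add(page)
    return {code: sorted(pages) for code, pages in pages_by_code.items()}


index = build_code_index(annotations)

# Print the summary sheet, one code per line.
for code in sorted(index):
    print(code, index[code])
```

Using a set per code means a code noted twice on the same page is listed once, which matches how a hand-written summary sheet would be kept.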
Storage and retrieval. The next step is to develop a
simple storage and retrieval system. Access to a
computer program that sorts text is very helpful.
Relevant parts of interview text can then be organized according to the codes. The same effect can
be accomplished without computers by preparing
folders for each category, cutting relevant comments from the interview and pasting them onto
index cards according to the coding scheme, then
filing them in the appropriate folder. Each index
card should have an identification mark so the
comment can be attributed to its source.
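
The folder-and-index-card system above can be mirrored in a few lines of Python. This is only a sketch under stated assumptions: the `Excerpt` record, the `file_by_code` function, the source identifiers such as "KI-03", and the sample comments are all hypothetical and stand in for real interview text.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Excerpt:
    source_id: str  # identification mark so the comment can be traced to its source
    code: str       # descriptive code assigned to the comment
    text: str       # the comment cut from the interview text


def file_by_code(excerpts):
    """Group excerpts into one 'folder' (list) per descriptive code."""
    folders = defaultdict(list)
    for excerpt in excerpts:
        folders[excerpt.code].append(excerpt)
    return dict(folders)


# Hypothetical sample data for illustration only.
excerpts = [
    Excerpt("KI-03", "wom-par", "It is more difficult for us to get loans."),
    Excerpt("KI-01", "income", "I have doubled my crop and profits this year."),
    Excerpt("KI-07", "wom-par", "Land is registered under the husband's name."),
]

folders = file_by_code(excerpts)

# Retrieve every comment filed under one code, with its source mark.
for excerpt in folders["wom-par"]:
    print(f"[{excerpt.source_id}] {excerpt.text}")
```

Keeping the source identifier on every record plays the same role as the identification mark on each index card.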
Presentation of data. Visual displays such as tables, boxes, and figures can condense information, present it in a clear format, and highlight underlying relationships and trends. This helps communicate findings to decision-makers more clearly, quickly, and easily. Three examples below illustrate how data from key informant interviews might be displayed.
Table 1. Problems Encountered in Obtaining Credit
Male Farmers
1. Collateral requirements
2. Burdensome paperwork
3. Long delays in getting loans
Female Farmers
1. Collateral requirements
2. Burdensome paperwork
3. Long delays in getting loans
4. Land registered under male's name
5. Difficulty getting to bank location
Table 2. Impacts on Income of a Microenterprise Activity
“In a survey I did of the participants last year, I found that a majority felt their living conditions have improved.”
- university professor
“I have doubled my crop and profits this year as a result of the loan I got.”
- participant
“I believe that women have not benefitted as much as men because it is more difficult for us to get loans.”
- female participant
Table 3. Recommendations for Improving Training
Recommendation                                  Number of Informants
Develop need-based training courses             39
Develop more objective selection procedures     20
Plan job placement after training               11
Step 7. Check for reliability and validity.
Key informant interviews are susceptible to error, bias, and misinterpretation, which can lead to flawed findings and recommendations.
Check representativeness of key informants. Take a second look at the key informant list to ensure no significant groups were overlooked.
Assess reliability of key informants. Assess informants’ knowledgeability, credibility, impartiality, willingness to respond, and presence of outsiders who may have inhibited their responses. Greater weight can be given to information provided by more reliable informants.
Check interviewer or investigator bias. One’s own biases as an investigator should be examined, including tendencies to concentrate on information that confirms preconceived notions and hypotheses, seek consistency too early and overlook evidence inconsistent with earlier findings, and be partial to the opinions of elite key informants.
Check for negative evidence. Make a conscious effort to look for evidence that questions preliminary findings. This brings out issues that may have been overlooked.
Get feedback from informants. Ask the key informants for feedback on major findings. A summary report of the findings might be shared with them, along with a request for written comments. Often a more practical approach is to invite them to a meeting where key findings are presented and ask for their feedback.
Selected Further Reading
These tips are drawn from Conducting Key Informant Interviews in Developing Countries, by Krishna Kumar (AID Program Design and Evaluation Methodology Report No. 13. December 1986. PN-AAX-226).
U.S. Agency for International Development
Washington, D.C. 20523
For further information on this topic, contact Annette Binnendijk, CDIE Senior Evaluation Advisor, via phone (703) 875-4235, fax (703) 875-4866, or e-mail. Copies of TIPS can be ordered from the Development Information Services Clearinghouse by calling (703) 351-4006 or by faxing (703) 351-4039. Please refer to the PN number. To order via the Internet, address a request to docorder@disc.mhs.compuserve.com
NUMBER 3
2ND EDITION, 2010
PERFORMANCE MONITORING & EVALUATION
TIPS
PREPARING AN EVALUATION STATEMENT OF WORK
ABOUT TIPS
These TIPS provide practical advice and suggestions to USAID managers on issues related to
performance management and evaluation. This publication is a supplemental reference to the
Automated Directives System (ADS) Chapter 203.
PARTICIPATION IS KEY
Use a participatory process to ensure resulting information will be relevant and useful. Include a range of staff and partners that have an interest in the evaluation to:
• Participate in planning meetings and review the SOW;
• Elicit input on potential evaluation questions; and
• Prioritize and narrow the list of questions as a group.
WHAT IS AN
EVALUATION
STATEMENT OF
WORK (SOW)?
The statement of work (SOW) is
viewed as the single most critical
document in the development of
a good evaluation. The SOW
states (1) the purpose of an
evaluation, (2) the questions that
must be answered, (3) the
expected quality of the evaluation
results, (4) the expertise needed
to do the job and (5) the time
frame and budget available to
support the task.
WHY IS THE SOW IMPORTANT?
The SOW is important because it is a basic road map of all the elements of a well-crafted evaluation. It is the substance of a contract with external evaluators, as well as the framework for guiding an internal evaluation team. It contains the information that anyone who implements the evaluation needs to know about the purpose of the evaluation, the background and history of the program being evaluated, and the issues/questions that must be addressed. Writing a SOW is about managing the first phase of the evaluation process. Ideally, the writer of the SOW will also exercise management oversight of the evaluation process.
PREPARATION – KEY
ISSUES
BALANCING FOUR
DIMENSIONS
A well-drafted SOW is a critical first step in ensuring the credibility and utility of the final evaluation report. Four key dimensions of the SOW are interrelated and should be balanced against one another (see Figure 1):
• The number and complexity of the evaluation questions that need to be addressed;
• Adequacy of the time allotted to obtain the answers;
• Availability of funding (budget) to support the level of evaluation design and rigor required; and
• Availability of the expertise needed to complete the job.
The development of the SOW is an iterative process in which the writer has to revisit, and sometimes adjust, each of these dimensions. Finding the appropriate balance is the main challenge faced in developing any SOW.
ADVANCE PLANNING
It is a truism that good planning is a necessary, but not the only, condition for success in any enterprise. The SOW preparation process is itself an exercise in careful and thorough planning. The writer must consider several principles when beginning the process.
• The writer of the SOW is, in essence, the architect of the evaluation. It is important to commit adequate time and energy to the task.
• Adequate time is required to gather information and to build productive relationships with stakeholders (such as program sponsors, participants, or partners) as well as the evaluation team, once selected.
• The sooner that information can be made available to the evaluation team, the more efficient they can be in providing credible answers to the important questions outlined in the SOW.
• The quality of the evaluation is dependent on providing quality guidance in the SOW.
• As USAID and other donors place more emphasis on rigorous impact evaluation, it is essential that evaluation planning form an integral part of the initial program or project design. This includes factoring in baseline data collection, possible comparison or ‘control’ site selection, and the preliminary design of data collection protocols and instruments. Decisions about evaluation design must be reflected in implementation planning and in the budget.
• There will always be unanticipated problems and opportunities that emerge during an evaluation. It is helpful to build in ways to accommodate necessary changes.
WHO SHOULD BE INVOLVED?
Participation in all or some part of the evaluation is an important decision for the development of the SOW. USAID and evaluation experts strongly recommend that evaluations maximize stakeholder participation, especially in the initial planning process. Stakeholders may encompass a wide array of persons and institutions, including policy makers, program managers, implementing partners, host country organizations, and beneficiaries. In some cases, stakeholders may also be involved throughout the evaluation and with the dissemination of results. The benefits of stakeholder participation include the following:
• Learning across a broader group of decision-makers, thus increasing the likelihood that the evaluation findings will be used to improve development effectiveness;
• Acceptance of the purpose and process of evaluation by those concerned;
• A more inclusive and better focused list of questions to be answered;
• Increased acceptance and ownership of the process, findings and conclusions; and
• Increased possibility that the evaluation will be used by decision makers and other stakeholders.
USAID operates in an increasingly complex implementation world with many players, including other USG agencies such as the Departments of State, Defense, Justice and others. If the activity engages other players, it is important to include them in the process.
FIGURE 2. ELEMENTS OF A
GOOD EVALUATION SOW
1. Describe the activity, program, or process to be evaluated
2. Provide a brief background on the development hypothesis and its implementation
3. State the purpose and use of the evaluation
4. Clarify the evaluation questions
5. Identify the evaluation method(s)
6. Identify existing performance information sources, with special attention to monitoring data
7. Specify the deliverable(s) and the timeline
8. Identify the composition of the evaluation team (one team member should be an evaluation specialist) and participation of customers and partners
9. Address schedule and logistics
10. Clarify requirements for reporting and dissemination
11. Include a budget
Within USAID, there are useful synergies that can emerge when the SOW development process is inclusive. For example, a SOW that focuses on civil society advocacy might benefit from input by those who are experts in rule of law.
Participation by host government and local organizational leaders and beneficiaries is less common among USAID supported evaluations. It requires sensitivity and careful management; however, the benefits to development practitioners can be substantial.
Participation of USAID managers in evaluations is an increasingly common practice and produces many benefits. To ensure against bias or conflict of interest, the USAID manager’s role can be limited to participating in the fact finding phase and contributing to the analysis. However, the final responsibility for analysis, conclusions and recommendations will rest with the independent members and team leader.
THE ELEMENTS OF A
GOOD EVALUATION
SOW
1. DESCRIBE THE ACTIVITY,
PROGRAM, OR PROCESS TO BE
EVALUATED
Be as specific and complete as
possible in describing what is to
be evaluated. The more
information provided at the
outset, the more time the
evaluation team will have to
develop the data needed to
answer the SOW questions.
If the USAID manager does not
have the time and resources to
bring together all the relevant
information needed to inform the
evaluation in advance, the SOW
might require the evaluation
team to submit a document
review as a first deliverable. This
will, of course, add to the amount
of time and budget needed in the
evaluation contract.
2. PROVIDE A BRIEF
BACKGROUND
Give a brief description of the
context, history and current status
of the activities or programs,
names of implementing agencies
and organizations involved, and
other information to help the
evaluation team understand
background and context. In
addition, this section should state
the development hypothesis(es)
and clearly describe the program
(or project) theory that underlies
the program's design. USAID
activities, programs and
strategies, as well as most
policies, are based on a set of
“if-then” propositions that predict
how a set of interventions will
produce intended results. A
development hypothesis is
generally represented in a results
framework (or sometimes a
logical framework at the project
level) and identifies the causal
relationships among various
objectives sought by the program
(see TIPS 13: Building a Results
Framework). That is, if one or
more objectives are achieved,
then the next higher order
objective will be achieved.
Whether the development
hypothesis is the correct one, or
whether it remains valid at the
time of the evaluation, is an
important question for most
evaluation SOWs to consider.
3. STATE THE PURPOSE AND
USE OF THE EVALUATION
Why is an evaluation needed?
The clearer the purpose, the more
likely it is that the evaluation will
produce credible and useful
findings, conclusions and
recommendations. In defining
the purpose, several questions
should be considered.
• Who wants the information?
Will higher level decision
makers be part of the intended
audience?
• What do they want to know?
• For what purpose will the
information be used?
• When will it be needed?
• How accurate must it be?
ADS 203.3.6.1 identifies a number
of triggers that may inform the
purpose and use of an evaluation,
as follows:
• A key management decision is
required for which there is
inadequate information;
• Performance information
indicates an unexpected result
(positive or negative) that
should be explained (such as
gender differential results);
• Customer, partner, or other
informed feedback suggests
that there are implementation
problems, unmet needs, or
unintended consequences or
impacts;
• Issues of impact, sustainability,
cost-effectiveness, or relevance
arise;
• The validity of the development
hypotheses or critical
assumptions is questioned, for
example, due to unanticipated
changes in the host country
environment; and
• Periodic portfolio reviews have
identified key questions that
need to be answered or require
consensus.
4. CLARIFY THE EVALUATION
QUESTIONS
The core element of an
evaluation SOW is the list of
questions posed for the
evaluation. One of the most
common problems with
evaluation SOWs is that they
contain a long list of poorly
defined or “difficult to answer”
questions given the time, budget
and resources provided. While a
participatory process ensures
wide ranging input into the initial
list of questions, it is equally
important to reduce this list to a
manageable number of key
questions. Keeping in mind the
relationship between budget,
time, and expertise needed, every
potential question should be
thoughtfully examined by asking
a number of questions.
• Is this question of essential
importance to the purpose and
the users of the evaluation?
• Is this question clear, precise
and 'researchable'?
• What level of reliability and
validity is expected in answering
the question?
• Does determining an answer to
the question require a certain
kind of experience and
expertise?
• Are we prepared to provide the
management commitment,
time and budget to secure a
credible answer to this
question?
If these questions can be
answered yes, then the team
probably has a good list of
questions that will inform the
evaluation team and drive the
evaluation process to a successful
result.
5. IDENTIFY EVALUATION
METHODS
The SOW manager has to decide
whether the evaluation design
and methodology should be
specified in the SOW.¹ This
depends on whether the writer
has expertise, or has internal
access to evaluation research
knowledge and experience. If so,
and the writer is confident of the
'on the ground' conditions that
will allow for different evaluation
designs, then it is appropriate to
include specific requirements in
the SOW.
If the USAID SOW manager does
not have the kind of evaluation
experience needed, especially for
more formal and rigorous
evaluations, it is good practice to:
1) require that the team (or
bidders, if it is contracted out)
include a description of (or
approach for developing) the
proposed research design and
methodology, or 2) require a
detailed design and evaluation
plan to be submitted as a first
deliverable. In this way, the SOW
manager benefits from external
evaluation expertise. In either
case, the design and
methodology should not be
finalized until the team has an
opportunity to gather detailed
information and discuss final
issues with USAID.
¹ See USAID ADS 203.3.6.4 on
Evaluation Methodologies.
The selection of the design and
data collection methods must be
a function of the type of
evaluation and the level of
statistical and quantitative data
confidence needed. If the project
is selected for a rigorous impact
evaluation, then the design and
methods used will be more
sophisticated and technically
complex. If external assistance is
necessary, the evaluation SOW
will be issued as part of the initial
RFP/RFA (Request for Proposal or
Request for Application)
solicitation process. All methods
and evaluation designs should be
as rigorous as reasonably
possible. In some cases, a rapid
appraisal is sufficient and
appropriate (see TIPS 5: Using
Rapid Appraisal Methods). At the
other extreme, planning for a
sophisticated and complex
evaluation process requires
greater up-front investment in
baselines, outcome monitoring
processes, and carefully
constructed experimental or
quasi-experimental designs.
6. IDENTIFY EXISTING
PERFORMANCE INFORMATION
Identify the existence and
availability of relevant
performance information sources,
such as performance monitoring
systems and/or previous
evaluation reports. Including a
summary of the types of data
available, the timeframe, and an
indication of their quality and
reliability will help the evaluation
team to build on what is already
available.
7. SPECIFY DELIVERABLES
AND TIMELINE
The SOW must specify the
products, the time frame, and the
content of each deliverable that is
required to complete the
evaluation contract. Some SOWs
simply require delivery of a draft
evaluation report by a certain
date. In other cases, a contract
may require several deliverables,
such as a detailed evaluation
design, a work plan, a document
review, and the evaluation report.
The most important deliverable is
the final evaluation report. TIPS
17: Constructing an Evaluation
Report provides a suggested
outline of an evaluation report
that may be adapted and
incorporated directly into this
section.
The evaluation report should
differentiate between findings,
conclusions, and
recommendations, as outlined in
Figure 3. As evaluators move
beyond the facts, greater
interpretation is required. By
ensuring that the final report is
organized in this manner,
decision makers can clearly
understand the facts on which the
evaluation is based. In addition,
it facilitates greater
understanding of where there
might be disagreements
concerning the interpretation of
those facts. While individuals
may disagree on
recommendations, they should
not disagree on the basic facts.
Another consideration is whether
a section on “lessons learned”
should be included in the final
report. A good evaluation will
produce knowledge about best
practices, point out what works,
what does not, and contribute to
the more general fund of tested
experience on which other
program designers and
implementers can draw.
Because unforeseen obstacles
may emerge, it is helpful to be as
realistic as possible about what
can be accomplished within a
given time frame. Also, include
some wording that allows USAID
and the evaluation team to adjust
schedules in consultation with the
USAID manager should this be
necessary.
8. DISCUSS THE COMPOSITION
OF THE EVALUATION TEAM
USAID evaluation guidance for
team selection strongly
recommends that at least one
team member have credentials
and experience in evaluation
design and methods. The team
leader must have strong team
management skills, and sufficient
experience with evaluation
standards and practices to ensure
a credible product. The
appropriate team leader is a
person with whom the SOW
manager can develop a working
partnership as the team moves
through the evaluation research
design and planning process.
He/she must also be a person
who can deal effectively with
senior U.S. and host country
officials and other leaders.
Experience with USAID is often an
important factor, particularly for
management focused
evaluations, and in formative
evaluations designed to establish
the basis for a future USAID
program or the redesign of an
existing program. If the
evaluation entails a high level of
complexity, survey research and
other sophisticated methods, it
may be useful to add a data
collection and analysis expert to
the team.
Generally, evaluation skills will be
supplemented with additional
subject matter experts. As the
level of research competence
increases in many countries
where USAID has programs, it
makes good sense to include
local collaborators, whether
survey research firms or
independents, as full members
of the evaluation team.
9. ADDRESS SCHEDULING,
LOGISTICS AND OTHER
SUPPORT
Good scheduling and effective
local support contribute greatly
to the efficiency of the evaluation
team. This section defines the
time frame and the support
structure needed to answer the
evaluation questions at the
required level of validity. For
evaluations involving complex
designs and sophisticated survey
research data collection methods,
the schedule must allow enough
time, for example, to develop
sample frames, prepare and
pretest survey instruments,
train interviewers, and analyze
data. New data collection and
analysis technologies can
accelerate this process, but need
to be provided for in the budget.
In some cases, an advance trip to
the field by the team leader
and/or methodology expert may
be justified where extensive
pretesting and revision of
instruments is required or when
preparing for an evaluation in
difficult or complex operational
environments.
Adequate logistical and
administrative support is also
essential. USAID often works in
countries with poor infrastructure,
frequently in conflict/post-conflict
environments where security is an
issue. If the SOW requires the
team to make site visits to distant
or difficult locations, such
planning must be incorporated
into the SOW.
Particularly overseas, teams often
rely on local sources for
administrative support, including
scheduling of appointments,
finding translators and
interpreters, and arranging
transportation. In many countries
where foreign assistance experts
have been active, local consulting
firms have developed this kind of
expertise. Good interpreters are
in high demand, and are essential
to any evaluation team's success,
especially when using qualitative
data collection methods.
10. CLARIFY REQUIREMENTS
FOR REPORTING AND
DISSEMINATION
Most evaluations involve several
phases of work, especially for
more complex designs. The
SOW can set up the relationship
between the evaluation team, the
USAID manager and other
stakeholders. If a working group
was established to help define
the SOW questions, continue to
use the group as a forum for
interim reports and briefings
provided by the evaluation team.
The SOW should specify the
timing and details for each
briefing session. Examples of
what might be specified include:
• Due dates for draft and final
reports;
• Dates for oral briefings (such as
a mid-term and final briefing);
• Number of copies needed;
• Language requirements, where
applicable;
• Formats and page limits;
• Requirements for datasets, if
primary data has been
collected;
• A requirement to submit all
evaluations to the Development
Experience Clearinghouse for
archiving (this is the
responsibility of the evaluation
contractor); and
• Other needs for
communicating, marketing and
disseminating results that are
the responsibility of the
evaluation team.
The SOW should specify when
working drafts are to be
submitted for review, the time
frame allowed for USAID review
and comment, and the time
frame to revise and submit the
final report.
11. INCLUDE A BUDGET
With the budget section, the
SOW comes full circle. As stated,
budget considerations have to be
part of the decision making
process from the beginning.
The budget is a product of the
questions asked, human
resources needed, logistical and
administrative support required,
and the time needed to produce
a high quality, rigorous and
useful evaluation report in the
most efficient and timely manner.
It is essential for contractors to
understand the quality, validity
and rigor required so they can
develop a responsive budget that
will meet the standards set forth
in the SOW.
For more information:
TIPS publications are available online at [insert website].
Acknowledgements:
Our thanks to those whose experience and insights helped shape this publication including USAID's
Office of Management Policy, Budget and Performance (MPBP). This publication was written by Richard
Blue, Ph.D. of Management Systems International.
Comments regarding this publication can be directed to:
Gerald Britan, Ph.D.
Tel: (202) 712-1158
gbritan@usaid.gov
Contracted under RAN-M-00-04-00049-A-FY0S-84
Integrated Managing for Results II
1996, Number 4
Performance Monitoring and Evaluation
TIPS
USAID Center for Development Information and Evaluation
USING DIRECT OBSERVATION TECHNIQUES
What is Direct Observation?
Most evaluation teams conduct some fieldwork, observing what's actually going on at
assistance activity sites. Often, this is done informally, without much thought to the
quality of data collection. Direct observation techniques allow for a more systematic,
structured process, using well-designed observation record forms.
USAID's reengineering guidance encourages the use of rapid, low
cost methods for collecting information on the performance of our
development activities. Direct observation, the subject of this
Tips, is one such method.
Advantages and Limitations
The main advantage of direct observation is that an event, institution, facility, or
process can be studied in its natural setting, thereby providing a richer understanding
of the subject.
For example, an evaluation team that visits microenterprises is likely to better
understand their nature, problems, and successes after directly observing their
products, technologies, employees, and processes, than by relying solely on
documents or key informant interviews. Another advantage is that it may reveal
conditions, problems, or patterns many informants may be unaware of or unable to
describe adequately.
On the negative side, direct observation is susceptible to observer bias. The very act
of observation also can affect the behavior being studied.
When Is Direct Observation Useful?
Direct observation may be useful:
When performance monitoring data indicate results are not being
accomplished as planned, and when implementation problems are suspected,
but not understood. Direct observation can help identify whether the process
is poorly implemented or required inputs are absent.
When details of an activity's process need to be assessed, such as whether
tasks are being implemented according to standards required for
effectiveness.
When an inventory of physical facilities and inputs is needed and not
available from existing sources.
PN-ABY-208
When interview methods are unlikely to elicit
needed information accurately or reliably, either
because the respondents don't know or may be
reluctant to say.
Steps in Using Direct Observation
The quality of direct observation can be improved by
following these steps.
Step 1. Determine the focus
Because of typical time and resource constraints, direct
observation has to be selective, looking at a few activities,
events, or phenomena that are central to the evaluation
questions.
For example, suppose an evaluation team intends to study a
few health clinics providing immunization services for
children. Obviously, the team can assess a variety of
areas—physical facilities and surroundings, immunization
activities of health workers, recordkeeping and managerial
services, and community interactions. The team should
narrow its focus to one or two areas likely to generate the
most useful information and insights.
Next, break down each activity, event, or phenomenon into
subcomponents. For example, if the team decides to look at
immunization activities of health workers, prepare a list of
the tasks to observe, such as preparation of vaccine,
consultation with mothers, and vaccine administration.
Each task may be further divided into subtasks; for
example, administering vaccine likely includes preparing
the recommended doses, using the correct administration
technique, using sterile syringes, and protecting vaccine
from heat and light during use.
If the team also wants to assess physical facilities and
surroundings, it will prepare an inventory of items to be
observed.
Step 2. Develop direct observation forms
The observation record form should list the items to be
observed and provide spaces to record observations. These
forms are similar to survey questionnaires, but
investigators record their own observations, not
respondents' answers.
Observation record forms help standardize the observation
process and ensure that all important items are covered.
They also facilitate better aggregation of data gathered
from various sites or by various investigators. An excerpt
from a direct observation form used in a study of primary
health care in the Philippines provides an illustration below.
OBSERVATION OF GROWTH
MONITORING SESSION
Name of the Observer
Date
Time
Place
Was the scale set to 0 at the beginning of the growth
session?
Yes______ No ______
How was age determined?
By asking______
From growth chart_______
Other_______
When the child was weighed, was it stripped to
practical limit?
Yes______ No______
Was the weight read correctly?
Yes______No______
Process by which weight and age transferred to record
Health Worker wrote it_____
Someone else wrote it______
Other______
Did Health Worker interpret results for the mother?
Yes_______No_______
When preparing direct observation forms, consider the
following:
1. Identify in advance the possible response categories for
each item, so that the observer can answer with a simple
yes or no, or by checking the appropriate answer. Closed
response categories help minimize observer variation, and
therefore improve the quality of data.
2. Limit the number of items in a form. Forms should
normally not exceed 40–50 items. If necessary, it is better to
use two or more smaller forms than a single large one that
runs several pages.
3. Provide adequate space to record additional observations
for which response categories were not determined.
4. Use of computer software designed to create forms can
be very helpful. It facilitates a neat, unconfusing form that
can be easily completed.
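The form guidelines above can be sketched as a small data structure. This is only an illustration, not part of the original study: the class names, the `MAX_ITEMS` limit (taken from the upper end of the 40–50 item guideline), and the sample note are assumptions made for the example. It shows closed response categories fixed in advance, a cap on form length, and a free-text space for additional observations.

```python
from dataclasses import dataclass, field

MAX_ITEMS = 50  # assumed cap, from the upper end of the 40-50 item guideline


@dataclass
class Item:
    """One observation item with closed response categories."""
    prompt: str
    categories: tuple  # allowed answers, identified in advance


@dataclass
class ObservationForm:
    title: str
    items: list = field(default_factory=list)

    def add(self, prompt, categories=("Yes", "No")):
        # Keep forms short; a long form should be split into smaller ones.
        if len(self.items) >= MAX_ITEMS:
            raise ValueError("form too long; split it into smaller forms")
        self.items.append(Item(prompt, tuple(categories)))

    def record(self, answers, notes=""):
        """Validate one completed form; answers are given in item order."""
        if len(answers) != len(self.items):
            raise ValueError("one answer is required per item")
        for item, answer in zip(self.items, answers):
            if answer not in item.categories:
                raise ValueError(f"{answer!r} is not a category for {item.prompt!r}")
        # 'notes' is the space for observations without a preset category.
        return {"answers": list(answers), "notes": notes}


form = ObservationForm("Observation of growth monitoring session")
form.add("Was the scale set to 0 at the beginning of the growth session?")
form.add("How was age determined?", ("By asking", "From growth chart", "Other"))
form.add("Was the weight read correctly?")

# Hypothetical completed form, for illustration only.
visit = form.record(["Yes", "From growth chart", "No"], notes="Scale on uneven floor.")
```

Rejecting answers outside the preset categories is one way to keep records from different observers comparable, which is the point of closed response categories.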
Step 3. Select the sites
Once the forms are ready, the next step is to decide where
the observations will be carried out and whether they will be
based on one or more sites.
A single site observation may be justified if a site can be
treated as a typical case or if it is unique. Consider a
situation in which all five agricultural extension centers
established by an assistance activity have not been
performing well. Here, observation at a single site may be
justified as a typical case. A single site observation may
also be justified when the case is unique; for example, if
only one of five centers had been having major problems,
and the purpose of the evaluation is to discover why.
However, single site observations should be avoided
generally, because cases the team assumes to be typical or
unique may not be. As a rule, several sites are necessary to
obtain a reasonable understanding of a situation.
In most cases, teams select sites based on experts' advice.
The investigator develops criteria for selecting sites, then
relies on the judgment of knowledgeable people. For
example, if a team evaluating a family planning project
decides to observe three clinics—one highly successful,
one moderately successful, and one struggling clinic—it
may request USAID staff, local experts, or other
informants to suggest a few clinics for each category. The
team will then choose three after examining their
recommendations. Using more than one expert reduces
individual bias in selection.
Alternatively, sites can be selected based on data from
performance monitoring. For example, activity sites
(clinics, schools, credit institutions) can be ranked from
best to worst based on performance measures, and then a
sample drawn from them.
Step 4. Decide on the best timing
Timing is critical in direct observation, especially when
events are to be observed as they occur. Wrong timing can
distort findings. For example, rural credit
organizations receive most loan applications during the
planting season, when farmers wish to purchase
agricultural inputs. If credit institutions are observed during
the nonplanting season, an inaccurate picture of loan
processing may result.
People and organizations follow daily routines associated
with set times. For example, credit institutions may accept
loan applications in the morning; farmers in tropical
climates may go to their fields early in the morning and
return home by noon. Observation periods should reflect
work rhythms.
Step 5. Conduct the field observation
Establish rapport. Before embarking on direct observation,
a certain level of rapport should be established with the
people, community, or organization to be studied. The
presence of outside observers, especially if officials or
experts, may generate some anxiety among those being
observed. Often informal, friendly conversations can
reduce anxiety levels.
Also, let them know the purpose of the observation is not to
report on individuals' performance, but to find out what
kind of problems in general are being encountered.
Allow sufficient time for direct observation. Brief visits can
be deceptive partly because people tend to behave
differently in the presence of observers. It is not
uncommon, for example, for health workers to become
more caring or for extension workers to be more
persuasive when being watched. However, if observers
stay for relatively longer periods, people become less
self-conscious and gradually start behaving naturally. It is
essential to stay at least two or three days on a site to
gather valid, reliable data.
Use a team approach. If possible, two observers should
observe together. A team can develop more
comprehensive, higher quality data, and avoid individual
bias.
Train observers. If many sites are to be observed,
nonexperts can be trained as observers, especially if
observation forms are clear, straightforward, and mostly
closed-ended.
Step 6. Complete forms
Take notes as inconspicuously as possible. The best time
for recording is during observation. However, this is not
always feasible because it may make some people
self-conscious or disturb the situation. In these cases, recording
should take place as soon as possible after observation.
Step 7. Analyze the data
Data from closed-ended questions on the observation
form can be analyzed using basic procedures such as
frequency counts and cross-tabulations. Statistical software
packages such as SAS or SPSS facilitate such statistical
analysis and data display.
Analysis of any open-ended interview questions can also
provide extra richness of understanding and insights. Here,
use of database management software with text storage
capabilities, such as dBase, can be useful.
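As a rough illustration of the frequency counts and cross-tabulations mentioned above, the sketch below uses only the Python standard library rather than the packages named in the text. The records, site labels, and item names are hypothetical and are not data from any study.

```python
from collections import Counter

# Hypothetical records: one dict per observed session (not study data).
records = [
    {"site": "A", "weight_read_correctly": "Yes", "results_interpreted": "No"},
    {"site": "A", "weight_read_correctly": "Yes", "results_interpreted": "Yes"},
    {"site": "B", "weight_read_correctly": "No", "results_interpreted": "No"},
    {"site": "B", "weight_read_correctly": "Yes", "results_interpreted": "No"},
]


def frequency(records, item):
    """Count how often each response category was checked for one item."""
    return Counter(r[item] for r in records)


def crosstab(records, row_item, col_item):
    """Count each (row category, column category) pair."""
    return Counter((r[row_item], r[col_item]) for r in records)


freq = frequency(records, "results_interpreted")
table = crosstab(records, "site", "results_interpreted")

# Print the cross-tabulation one cell per line.
for (site, answer), n in sorted(table.items()):
    print(f"site {site}: results interpreted = {answer}: {n}")
```

Because every item has closed response categories, each form reduces to a fixed set of labels, which is what makes counts like these comparable across sites and observers.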
Step 8. Check for reliability and validity
Direct observation techniques are susceptible to error and
bias that can affect reliability and validity. These can be
minimized by following some of the procedures suggested,
such as checking the representativeness of the sample of
sites selected, using closed-ended, unambiguous response
categories on the observation forms, recording observations
promptly, and using teams of observers at each site.
Direct Observation of Primary
Health Care Services in the Philippines
An example of structured direct observation was an
effort to identify deficiencies in the primary health
care system in the Philippines. It was part of a
larger, multicountry research project, the Primary
Health Care Operations Research Project (PRICOR).
The evaluators prepared direct observation forms
covering the activities, tasks, and subtasks health
workers must carry out in health clinics to
accomplish clinical objectives. These forms were
closed-ended and in most cases observations could
simply be checked to save time. The team looked at
18 health units from a "typical" province, including
samples of units that were high, medium and low
performers in terms of key child survival outcome
indicators.
The evaluation team identified and quantified many
problems that required immediate government
attention. For example, in 40 percent of the cases
where followup treatment was required at home,
health workers failed to tell mothers the timing and
amount of medication required. In 90 percent of
cases, health workers failed to explain to mothers the
results of child weighing and growth plotting, thus
missing the opportunity to involve mothers in the
nutritional care of their child. Moreover, numerous
errors were made in weighing and plotting.
This case illustrates that use of closed-ended
observation instruments promotes the reliability and
consistency of data. The findings are thus more
credible and likely to influence program managers to
make needed improvements.
Selected Further Reading
Information in this Tips is based on "Rapid Data Collection
Methods for Field Assessments" by Krishna Kumar, in
Team Planning Notebook for Field-Based Program
Assessments (USAID PPC/CDIE, 1991).
For more on direct observation techniques applied to the
Philippines health care system, see Stewart N. Blumenfeld,
Manuel Roxas, and Maricor de los Santos, "Systematic
Observation in the Analysis of Primary Health Care
Services," in Rapid Appraisal Methods, edited by Krishna
Kumar (The World Bank: 1993).
CDIE's Tips series provide advice and suggestions to
USAID managers on how to plan and conduct
performance monitoring and evaluation activities.
They are supplemental references to the reengineering
automated directives system (ADS), chapter 203. For
further information, contact Annette Binnendijk, CDIE
Senior Evaluation Advisor, phone (703) 875–4235, fax
(703) 875–4866, or e-mail. Tips can be ordered from
the Development Information Services Clearinghouse
by calling (703) 351-4006 or by faxing (703) 351–4039.
Please refer to the PN number. To order via Internet,
address requests to
docorder@disc.mhs.compuserve.com
NUMBER 5
2ND EDITION, 2010
PERFORMANCE MONITORING & EVALUATION
TIPS
USING RAPID APPRAISAL METHODS
ABOUT TIPS
These TIPS provide practical advice and suggestions to USAID managers on issues related to performance
monitoring and evaluation. This publication is a supplemental reference to the Automated Directive
System (ADS) Chapter 203.
WHAT IS RAPID
APPRAISAL?
Rapid Appraisal (RA) is an approach
that draws on multiple evaluation
methods and techniques to quickly,
yet systematically, collect data when
time in the field is limited. RA
practices are also useful when there
are budget constraints or limited
availability of reliable secondary
data. For example, time and budget
limitations may preclude the option
of using representative sample
surveys.
BENEFITS – WHEN TO USE
RAPID APPRAISAL
METHODS
Rapid appraisals are quick and can
be done at relatively low cost.
Rapid appraisal methods can help
gather, analyze, and report relevant
information for decision-makers
within days or weeks. This is not
possible with sample surveys. RAs
can be used in the following cases:
• for formative evaluations, to make
mid-course corrections in project
design or implementation when
customer or partner feedback
indicates a problem (See ADS
203.3.6.1);
• when a key management decision
is required and there is inadequate
information;
• for performance monitoring, when
data are collected and the
techniques are repeated over time
for measurement purposes;
• to better understand the issues
behind performance monitoring
data; and
• for project pre-design assessment.
LIMITATIONS – WHEN
RAPID APPRAISALS ARE
NOT APPROPRIATE
Findings from rapid appraisals may
have limited reliability and validity,
and cannot be generalized to the
larger population. Accordingly,
rapid appraisal should not be the
sole basis for summative or impact
evaluations. Data can be biased and
inaccurate unless multiple methods
are used to strengthen the validity
of findings and careful preparation is
undertaken prior to beginning field
work.
WHEN ARE RAPID
APPRAISAL
METHODS
APPROPRIATE?
Choosing between rapid appraisal
methods for an assessment or more
time-consuming methods, such as
sample surveys, should depend on
balancing several factors, listed
below.
• Purpose of the study. The
importance and nature of the
decision depending on it.
• Confidence in results. The
accuracy, reliability, and validity of
findings needed for management
decisions.
• Time frame. When a decision
must be made.
• Resource constraints (budget).
• Evaluation questions to be
answered (see TIPS 3: Preparing
an Evaluation Statement of Work).
USE IN TYPES OF
EVALUATION
Rapid appraisal methods are often
used in formative evaluations.
Findings are strengthened when
evaluators use triangulation
(employing more than one data
collection method) as a check on
the validity of findings from any one
method.
Rapid appraisal methods are also
used in the context of summative
evaluations. The data from rapid
appraisal methods and techniques
complement the use of quantitative
methods such as surveys based on
representative sampling. For
example, a randomized survey of
small holder farmers may tell you
that farmers have a difficult time
selling their goods at market, but
may not provide you with the
details of why this is occurring. A
researcher could then use
interviews with farmers to
determine the details necessary to
construct a more complete theory
of why it is difficult for small holder
farmers to sell their goods.
KEY PRINCIPLES
FOR ENSURING
USEFUL RAPID
APPRAISAL DATA
COLLECTION
No set of rules dictates which
methods and techniques should be
used in a given field situation;
however, a number of key principles
can be followed to ensure the
collection of useful data in a rapid
appraisal.
• Preparation is key. As in any
evaluation, the evaluation design
and selection of methods must
begin with a thorough
understanding of the evaluation
questions and the client’s needs
for evaluative information. The
client’s intended uses of data must
guide the evaluation design and
the types of methods that are
used.
• Triangulation increases the validity
of findings. To lessen bias and
strengthen the validity of findings
from rapid appraisal methods and
techniques, it is imperative to use
multiple methods. In this way,
data collected using one method
can be compared to that collected
using other methods, thus giving a
researcher the ability to generate
valid and reliable findings. If, for
example, data collected using Key
Informant Interviews reveal the
same findings as data collected
from Direct Observation and
Focus Group Interviews, there is
less chance that the findings from
the first method were due to
researcher bias or due to the
findings being outliers. Table 1
summarizes common rapid
appraisal methods and suggests
how findings from any one
method can be strengthened by
the use of other methods.
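The triangulation principle above can be sketched in a few lines of Python. This is only an illustration, not part of the TIPS guidance: the function names, the method labels, and the coded findings are all hypothetical, and the sketch assumes that findings from each method have already been coded into short, comparable statements.

```python
from collections import defaultdict


def corroboration(findings_by_method):
    """Map each coded finding to the sorted list of methods that reported it."""
    support = defaultdict(set)
    for method, findings in findings_by_method.items():
        for finding in findings:
            support[finding].add(method)
    return {finding: sorted(methods) for finding, methods in support.items()}


def triangulated(findings_by_method, min_methods=2):
    """Return findings reported by at least `min_methods` distinct methods."""
    support = corroboration(findings_by_method)
    return sorted(f for f, methods in support.items() if len(methods) >= min_methods)


# Hypothetical coded findings from three rapid appraisal methods.
example = {
    "key_informant_interviews": {"poor road access", "low market prices"},
    "direct_observation": {"poor road access"},
    "focus_groups": {"poor road access", "low market prices", "limited credit"},
}

if __name__ == "__main__":
    for finding in triangulated(example):
        print(finding, "<-", corroboration(example)[finding])
```

A finding reported by only one method (here, "limited credit") is not necessarily wrong; it is simply a candidate for follow-up with another method before it is relied on.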
COMMON RAPID
APPRAISAL
METHODS
INTERVIEWS
This method involves one-on-one
interviews with individuals or key
informants selected for their
knowledge or diverse views.
Interviews are qualitative, in-depth
and semi-structured.
Interview guides are usually used and
questions may be further framed
during the interview, using subtle
probing techniques. Individual
interviews may be used to gain
information on a general topic but
cannot provide the in-depth inside
knowledge on evaluation topics that
key informants may provide.
EVALUATION METHODS
COMMONLY USED IN RAPID
APPRAISAL
• Interviews
• Community Discussions
• Exit Polling
• Transect Walks (see p. 3)
• Focus Groups
• Minisurveys
• Community Mapping
• Secondary Data Collection
• Group Discussions
• Customer Service Surveys
• Direct Observation
MINISURVEYS
A minisurvey consists of interviews
with between five and fifty individuals,
usually selected using
nonprobability sampling (sampling in
which respondents are chosen based
on their understanding of issues
related to a purpose or specific
questions, usually used when sample
sizes are small and time or access to
areas is limited). Structured
questionnaires are used with a
limited number of close-ended
questions. Minisurveys generate
quantitative data that can often be
collected and analyzed quickly.
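As an illustration of how quickly close-ended minisurvey responses can be tabulated, the following Python sketch counts the answers to a single question and reports each as a share of respondents. The function name and the sample answers are hypothetical, and the sketch assumes one recorded answer per respondent.

```python
from collections import Counter


def tabulate(responses):
    """Return {answer: (count, percent of respondents)} for one close-ended question."""
    counts = Counter(responses)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {answer: (n, round(100 * n / total, 1)) for answer, n in counts.items()}


# Hypothetical answers from eight respondents to one yes/no/unsure question.
answers = ["yes", "no", "yes", "yes", "unsure", "no", "yes", "yes"]

if __name__ == "__main__":
    for answer, (count, pct) in tabulate(answers).items():
        print(f"{answer}: {count} ({pct}%)")
```

Because respondents are usually chosen through nonprobability sampling, these percentages describe the people interviewed and should not be read as estimates for the wider population.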
FOCUS GROUPS
The focus group is a gathering of a
homogeneous body of five to twelve
participants to discuss issues and
experiences among themselves.
These are used to test an idea or to
get a reaction on specific topics. A
moderator introduces the topic,
stimulates and focuses the
discussion, and prevents domination
of discussion by a few, while another
evaluator documents the
conversation.
THE ROLE OF TECHNOLOGY
IN RAPID APPRAISAL
Certain equipment and technologies
can aid the rapid collection of data
and help to decrease the incidence of
errors. These include, for example,
handheld computers or personal
digital assistants (PDAs) for data
input, cellular phones, digital
recording devices for interviews,
videotaping and photography, and the
use of geographic information systems
(GIS) data and aerial photographs.
GROUP DISCUSSIONS
This method involves the selection
of approximately five participants
who are knowledgeable about a
given topic and are comfortable
enough with one another to freely
discuss the issue as a group. The
moderator introduces the topic and
keeps the discussion going while
another evaluator records the
discussion. Participants talk with
one another rather than respond
directly to the moderator.
COMMUNITY DISCUSSIONS
This method takes place at a public
meeting that is open to all
community members; it can be
successfully moderated with as
many as 100 or more people. The
primary interaction is between the
participants while the moderator
leads the discussion and asks
questions following a carefully
prepared interview guide.
DIRECT OBSERVATION
Teams of observers record what
they hear and see at a program site
using a detailed observation form.
Observation may be of the physical
surrounding or of ongoing activities,
processes, or interactions.
COLLECTING SECONDARY
DATA
This method involves the on-site
collection of existing secondary
data, such as export sales, loan
information, health service statistics,
etc. These data are an important
augmentation to information
collected using qualitative methods
such as interviews, focus groups, and
community discussions. The
evaluator must be able to quickly
determine the validity and reliability
of the data (see TIPS 12: Indicator
and Data Quality).
TRANSECT WALKS
The transect walk is a participatory
approach in which the evaluator
asks a selected community member
to walk with him or her, for
example, through the center of
town, from one end of a village to
the other, or through a market.
The evaluator asks the individual,
usually a key informant, to point out
and discuss important sites,
neighborhoods, businesses, etc., and
to discuss related issues.
COMMUNITY MAPPING
Community mapping is a technique
that requires the participation of
residents on a program site. It can
be used to help locate natural
resources, routes, service delivery
points, regional markets, trouble
spots, etc., on a map of the area, or
to use residents’ feedback to drive
the development of a map that
includes such information.
Table 1. COMMON RAPID APPRAISAL METHODS
(For each method: Useful for Providing; Example; Advantages; Limitations; Further References)

INDIVIDUAL INTERVIEWS

Method: Interviews
Useful for providing:
− A general overview of the topic from someone who has a broad knowledge and in-depth experience and understanding (key informant) or in-depth information on a very specific topic or subtopic (individual)
− Suggestions and recommendations to improve key aspects of a program
Example:
− Key informant: Interview with program implementation director; interview with director of a regional trade association
− Individual: Interview with an activity manager within an overall development program; interview with a local entrepreneur trying to enter export trade
Advantages:
− Provides in-depth, inside information on specific issues from the individual’s perspective and experience
− Flexibility permits exploring unanticipated topics
− Easy to administer
− Low cost
Limitations:
− Susceptible to interviewer and selection biases
− Individual interviews lack the broader understanding and insight that a key informant can provide
Further references:
− TIPS No. 2, Conducting Key Informant Interviews
− K. Kumar, Conducting Key Informant Surveys in Developing Countries, 1986
− Bamberger, Rugh, and Mabry, Real World Evaluation, 2006
− UNICEF Website: M&E Training Modules: Overview of RAP Techniques

Method: Minisurveys
Useful for providing:
− Quantitative data on narrowly focused questions, for a relatively homogeneous population, when representative sampling is not possible or required
− Quick data on attitudes, beliefs, behaviors of beneficiaries or partners
Example:
− A customer service assessment
− Rapid exit interviews after voting
Advantages:
− Quantitative data from multiple respondents
− Low cost
Limitations:
− Findings are less generalizable than those from sample surveys unless the universe of the population is surveyed
Further references:
− TIPS No. 9, Conducting Customer Service Assessments
− K. Kumar, Conducting Mini Surveys in Developing Countries, 1990
− Bamberger, Rugh, and Mabry, Real World Evaluation, 2006 on purposeful sampling

GROUP INTERVIEWS

Method: Focus Groups
Useful for providing:
− Customer views on services, products, benefits
− Information on implementation problems
− Suggestions and recommendations for improving specific activities
Example:
− Discussion on experience related to a specific program intervention
− Effects of a new business regulation or proposed price changes
Advantages:
− Group discussion may reduce inhibitions, allowing free exchange of ideas
− Low cost
Limitations:
− Discussion may be dominated by a few individuals unless the process is facilitated/managed well
Further references:
− TIPS No. 10, Conducting Focus Group Interviews
− K. Kumar, Conducting Group Interviews in Developing Countries, 1987
− T. Greenbaum, Moderating Focus Groups: A Practical Guide for Group Facilitation, 2000

Method: Group Discussions
Useful for providing:
− Understanding of issues from different perspectives and experiences of participants from a specific subpopulation
Example:
− Discussion with young women on access to prenatal and infant care
− Discussion with entrepreneurs about export regulations
Advantages:
− Small group size allows full participation
− Allows good understanding of specific topics
− Low cost
Limitations:
− Findings cannot be generalized to a larger population
Further references:
− Bamberger, Rugh, and Mabry, Real World Evaluation, 2006
− UNICEF Website: M&E Training Modules: Community Meetings

Method: Community Discussions
Useful for providing:
− Understanding of an issue or topic from a wide range of participants from key evaluation sites within a village, town, city, or city neighborhood
Example:
− A Town Hall meeting
Advantages:
− Yields a wide range of opinions on issues important to participants
− A great deal of information can be obtained at one point of time
Limitations:
− Findings cannot be generalized to larger population or to subpopulations of concern
− Larger groups difficult to moderate
Further references:
− Bamberger, Rugh, and Mabry, Real World Evaluation, 2006
− UNICEF Website: M&E Training Modules: Community Meetings

ADDITIONAL COMMONLY USED TECHNIQUES

Method: Direct Observation
Useful for providing:
− Visual data on physical infrastructure, supplies, conditions
− Information about an agency’s or business’s delivery systems, services
− Insights into behaviors or events
Example:
− Market place to observe goods being bought and sold, who is involved, sales interactions
Advantages:
− Confirms data from interviews
− Low cost
Limitations:
− Observer bias unless two to three evaluators observe same place or activity
Further references:
− TIPS No. 4, Using Direct Observation Techniques
− WFP Website: Monitoring & Evaluation Guidelines: What Is Direct Observation and When Should It Be Used?

Method: Collecting Secondary Data
Useful for providing:
− Validity to findings gathered from interviews and group discussions
Example:
− Microenterprise bank loan info.
− Value and volume of exports
− Number of people served by a health clinic, social service provider
Advantages:
− Quick, low cost way of obtaining important quantitative data
Limitations:
− Must be able to determine reliability and validity of data
Further references:
− TIPS No. 12, Guidelines for Indicator and Data Quality

PARTICIPATORY TECHNIQUES

Method: Transect Walks
Useful for providing:
− Important visual and locational information and a deeper understanding of situations and issues
Example:
− Walk with key informant from one end of a village or urban neighborhood to another, through a market place, etc.
Advantages:
− Insider’s viewpoint
− Quick way to find out location of places of interest to the evaluator
− Low cost
Limitations:
− Susceptible to interviewer and selection biases
Further references:
− Bamberger, Rugh, and Mabry, Real World Evaluation, 2006
− UNICEF Website: M&E Training Modules: Overview of RAP Techniques

Method: Community Mapping
Useful for providing:
− Info. on locations important for data collection that could be difficult to find
− Quick comprehension on spatial location of services/resources in a region which can give insight to access issues
Example:
− Map of village and surrounding area with locations of markets, water and fuel sources, conflict areas, etc.
Advantages:
− Important locational data when there are no detailed maps of the program site
Limitations:
− Rough locational information
Further references:
− Bamberger, Rugh, and Mabry, Real World Evaluation, 2006
− UNICEF Website: M&E Training Modules: Overview of RAP Techniques
References Cited
M. Bamberger, J. Rugh, and L. Mabry, Real World Evaluation. Working Under Budget, Time, Data, and Political
Constraints. Sage Publications, Thousand Oaks, CA, 2006.
T. Greenbaum, Moderating Focus Groups: A Practical Guide for Group Facilitation. Sage Publications, Thousand Oaks,
CA, 2000.
K. Kumar, “Conducting Mini Surveys in Developing Countries,” USAID Program Design and Evaluation Methodology
Report No. 15, 1990 (revised 2006).
K. Kumar, “Conducting Group Interviews in Developing Countries,” USAID Program Design and Evaluation
Methodology Report No. 8, 1987.
K. Kumar, “Conducting Key Informant Interviews in Developing Countries,” USAID Program Design and Evaluation
Methodology Report No. 13, 1989.
For more information:
TIPS publications are available online at [insert website].
Acknowledgements:
Our thanks to those whose experience and insights helped shape this publication including USAID’s Office of
Management Policy, Budget and Performance (MPBP). This publication was authored by Patricia Vondal, PhD., of
Management Systems International.
Comments regarding this publication can be directed to:
Gerald Britan, Ph.D.
Tel: (202) 712-1158
gbritan@usaid.gov
Contracted under RAN-M-00-04-00049-A-FY0S-84
Integrated Managing for Results II
NUMBER 6
2ND EDITION, 2010
PERFORMANCE MONITORING & EVALUATION
TIPS
SELECTING PERFORMANCE INDICATORS
ABOUT TIPS
These TIPS provide practical advice and suggestions to USAID managers on issues related to
performance monitoring and evaluation. This publication is a supplemental reference to the
Automated Directive System (ADS) Chapter 203.
WHAT ARE
PERFORMANCE
INDICATORS?
Performance indicators define a
measure of change for the
results identified in a Results
Framework (RF). When well-chosen,
they convey whether
key objectives are achieved in a
meaningful way for
performance management.
While a result (such as an
Assistance Objective or an
Intermediate Result) identifies
what we hope to accomplish,
indicators tell us by what
standard that result will be
measured. Targets define
whether there will be an
expected increase or decrease,
and by what magnitude [1].
Indicators may be quantitative
or qualitative in nature.
Quantitative indicators are
numerical: an example is a
person’s height or weight. On
the other hand, qualitative
indicators require subjective
evaluation. Qualitative data are
sometimes reported in
numerical form, but those
numbers do not have arithmetic
meaning on their own. Some
examples are a score on an
institutional capacity index or
progress along a milestone
scale. When developing
quantitative or qualitative
indicators, the important point
is that the indicator be
constructed in a way that
permits consistent
measurement over time.
[1] For further information, see TIPS 13:
Building a Results Framework and TIPS
8: Baselines and Targets.
Selecting an optimal set of indicators
to track progress against key results
lies at the heart of an effective
performance management system.
This TIPS provides guidance on how to
select effective performance
indicators.
USAID has developed many
performance indicators over the
years. Some examples include
the dollar value of nontraditional
exports, private
investment as a percentage of
gross domestic product,
contraceptive prevalence rates,
child mortality rates, and
progress on a legislative reform
index.
WHY ARE
PERFORMANCE
INDICATORS
IMPORTANT?
Performance indicators provide
objective evidence that an
intended change is occurring.
Performance indicators lie at
the heart of developing an
effective performance
management system: they
define the data to be collected
and enable actual results
achieved to be compared with
planned results over time.
Hence, they are an
indispensable management tool
for making evidence-based
decisions about program
strategies and activities.
Performance indicators can also
be used:
• To assist managers in focusing on the
achievement of development results.
• To provide objective evidence that results are
being achieved.
• To orient and motivate staff and partners toward
achieving results.
• To communicate USAID achievements to host
country counterparts, other partners, and customers.
• To more effectively report results achieved to USAID's
stakeholders, including the U.S. Congress, Office of
Management and Budget, and citizens.
FOR WHAT RESULTS
ARE PERFORMANCE
INDICATORS
REQUIRED?
THE PROGRAM LEVEL
USAID’s ADS requires that at
least one indicator be chosen
for each result in the Results
Framework in order to measure
progress (see ADS 203.3.3.1) [2].
This includes the Assistance
Objective (the highest-level
objective in the Results
Framework) as well as
supporting Intermediate Results
(IRs) [3]. These indicators should
be included in the Mission or
Office Performance
Management Plan (PMP) (see
TIPS 8: Preparing a PMP).
PROJECT LEVEL
AO teams are required to
collect data regularly for
projects and activities, including
inputs, outputs, and processes,
to ensure they are progressing
as expected and are
contributing to relevant IRs and
AOs. These indicators should
be included in a project-level
monitoring and evaluation
(M&E) plan. The M&E plan
should be integrated in project
management and reporting
systems (e.g., quarterly, semiannual,
or annual reports).
[2] For further discussion of AOs and IRs
(which are also termed impact and
outcomes respectively in other
systems) refer to TIPS 13: Building a
Results Framework.
[3] Note that some results frameworks
incorporate IRs from other partners if
those results are important for USAID
to achieve the AO. This is discussed in
further detail in TIPS 13: Building a
Results Framework. If these IRs are
included, then it is recommended that
they be monitored, although less
rigorous standards apply.
TYPES OF
INDICATORS IN
USAID SYSTEMS
Several different types of
indicators are used in USAID
systems. It is important to
understand the different roles
and functions of these
indicators so that managers can
construct a performance
management system that
effectively meets internal
management and Agency
reporting needs.
CUSTOM INDICATORS
Custom indicators are
performance indicators that
reflect progress within each
unique country or program
context. While they are useful
for managers on the ground,
they often cannot be
aggregated across a number of
programs like standard
indicators.
Example: Progress on a
milestone scale reflecting
legal reform and
implementation to ensure
credible elections, as follows:
• Draft law is developed in consultation with
non-governmental organizations (NGOs) and
political parties.
• Public input is elicited.
• Draft law is modified based on feedback.
• The secretariat presents the draft to the Assembly.
• The law is passed by the Assembly.
• The appropriate government body completes
internal policies or regulations to implement the law.
The example above would differ
for each country depending on
its unique process for legal
reform.
PARTICIPATION IS ESSENTIAL
Experience suggests that
participatory approaches are an
essential aspect of developing and
maintaining effective performance
management systems. Collaboration
with development partners
(including host country institutions,
civil society organizations (CSOs),
and implementing partners) as well
as customers has important benefits.
It allows you to draw on the
experience of others, obtains buy-in
to achieving results and meeting
targets, and provides an opportunity
to ensure that systems are as
streamlined and practical as possible.
STANDARD INDICATORS
Standard indicators are used
primarily for Agency reporting
purposes. Standard indicators
produce data that can be
aggregated across many
programs. Optimally, standard
indicators meet both Agency
reporting and on-the-ground
management needs. However,
in many cases, standard
indicators do not substitute for
performance (or custom)
indicators because they are
designed to meet different
needs. There is often a tension
between measuring a standard
across many programs and
selecting indicators that best
reflect true program results and
that can be used for internal
management purposes.
Example: Number of Laws or
Amendments to Ensure
Credible Elections Adopted
with USG Technical
Assistance.
In comparing the standard
indicator above with the
previous example of a custom
indicator, it becomes clear that
the custom indicator is more
likely to be useful as a
management tool, because it
provides greater specificity and
is more sensitive to change.
Standard indicators also tend to
measure change at the output
level, because they are precisely
the types of measures that are,
at face value, more easily
aggregated across many
programs, as the following
example demonstrates.
Example: The number of
people trained in policy and
regulatory practices.
CONTEXTUAL INDICATORS
Contextual indicators are used
to understand the broader
environment in which a
program operates, to track
assumptions, or to examine
externalities that may affect
success, failure, or progress.
They do not represent program
performance, because the
indicator measures very
high-level change.
Example: Score on the
Freedom House Index or
Gross Domestic Product
(GDP).
This sort of indicator may be
important to track to
understand the context for
USAID programming (e.g., a
severe drop in GDP is likely to
affect economic growth
programming), but represents a
level of change that is outside
the manageable interest of
program managers. In most
cases, it would be difficult to
say that USAID programming
has affected the overall level of
freedom within a country or
GDP (given the size of most
USAID programs in comparison
to the host country economy,
for example).
INDICATORS AND DATA: SO
WHAT’S THE DIFFERENCE?
Indicators define the particular
characteristic or dimension that will
be used to measure change. Height
is an example of an indicator.
The data are the actual
measurements or factual information
that result from the indicator. Five
feet seven inches is an example of
data.
WHAT ARE USAID’S
CRITERIA FOR
SELECTING
INDICATORS?
USAID policies (ADS 203.3.4.2)
identify seven key criteria to
guide the selection of
performance indicators:
• Direct
• Objective
• Useful for Management
• Attributable
• Practical
• Adequate
• Disaggregated, as necessary
These criteria are designed to
assist managers in selecting
optimal indicators. The extent
to which performance
indicators meet each of the
criteria must be consistent with
the requirements of good
management. As managers
consider these criteria, they
should use a healthy measure
of common sense and
reasonableness. While we
always want the “best”
indicators, there are inevitably
trade-offs among various
criteria. For example, data for
the most direct or objective
indicators of a given result
might be very expensive to
collect or might be available
too infrequently. Table 1
includes a summary checklist
that can be used during the
selection process to assess
these trade-offs.
Two overarching factors
determine the extent to which
performance indicators function
as useful tools for managers
and decision-makers:
• The degree to which performance indicators
accurately reflect the process or phenomenon
they are being used to measure.
• The level of comparability of performance
indicators over time: that is, can we measure
results in a consistent and comparable
manner over time?
1. DIRECT
An indicator is direct to the
extent that it clearly measures
the intended result. This
criterion is, in many ways, the
most important. While this may
appear to be a simple concept,
a lack of directness is one of the
more common problems with
indicators. Indicators should either be
widely accepted for use by
specialists in a subject area,
exhibit readily understandable
face validity (i.e., be intuitively
understandable), or be
supported by research.
Managers should place greater
confidence in indicators that are
direct. Consider the following
example:
Result: Increased
Transparency of Key Public
Sector Institutions
Indirect Indicator: Passage
of the Freedom of
Information Act (FOIA)
Direct Indicator: Progress
on a milestone scale
demonstrating enactment
and enforcement of policies
that require open hearings
The passage of FOIA, while an
important step, does not
actually measure whether a
target institution is more
transparent. The direct indicator
outlined above is a better
measure of the result.
Level
Another dimension of whether
an indicator is direct relates to
whether it measures the right
level of the objective. A
common problem is that there
is often a mismatch between
the stated result and the
indicator. The indicator should
not measure a higher or lower
level than the result.
For example, if a program
measures improved
management practices through
the real value of agricultural
production, the indicator is
measuring a higher-level effect
than is stated (see Figure 1).
Understanding levels is rooted
in understanding the
development hypothesis
inherent in the Results
Framework (see TIPS 13:
Building a Results Framework).
Tracking indicators at each level
facilitates better understanding
and analysis of whether the
development hypothesis is
working. For example, if
farmers are aware of how to
implement a new technology,
but the number or percent that
actually use the technology is
not increasing, there may be
other issues that need to be
addressed. Perhaps the
technology is not readily
available in the community, or
there is not enough access to
credit. This flags the issue for
managers and provides an
opportunity to make
programmatic adjustments.
Figure 1. Levels (highest level first)
RESULT: Increased Production
INDICATOR: Real value of agricultural production.
RESULT: Improved Management Practices
INDICATOR: Number and percent of farmers using a new technology.
RESULT: Improved Knowledge and Awareness
INDICATOR: Number and percent of farmers who can identify five out
of eight steps for implementing a new technology.
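The farmer example can be expressed as a small check that compares indicator values at each level across two reporting periods and lists the levels that did not improve. This Python sketch is illustrative only: the level names, counts, and function names are hypothetical, and it assumes each indicator is reported as a percent of the same group of farmers in both periods.

```python
def percent(count, total):
    """Share of `total` represented by `count`, as a percentage."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * count / total


def stalled_levels(baseline, followup):
    """Return the result levels whose indicator did not increase between periods."""
    return [level for level in baseline if followup[level] <= baseline[level]]


# Hypothetical data for 200 farmers: awareness rises, adoption does not.
baseline = {"knowledge": percent(60, 200), "practices": percent(24, 200)}
followup = {"knowledge": percent(130, 200), "practices": percent(24, 200)}

if __name__ == "__main__":
    print("Levels needing attention:", stalled_levels(baseline, followup))
```

A stalled level does not explain itself; it only tells managers where to look, for example at the availability of the technology or access to credit.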
Proxy Indicators
Proxy indicators are linked to
the result by one or more
assumptions. They are often
used when the most direct
indicator is not practical (e.g.,
data collection is too costly or
the program is being
implemented in a conflict zone).
When proxies are used, the
relationship between the
indicator and the result should
be well-understood and clearly
articulated. The more
assumptions the indicator is
based upon, the weaker the
indicator. Consider the
following examples:
Result: Increased Household
Income
Proxy Indicator: Dollar
value of household
expenditures
The proxy indicator above
makes the assumption that an
increase in income will result in
increased household
expenditures; this assumption is
well-grounded in research.
Result: Increased Access to
Justice
Proxy Indicator: Number of
new courts opened
The indicator above is based on
the assumption that physical
access to new courts is the
fundamental development
problem, as opposed to
corruption, the costs associated
with using the court system, or
lack of knowledge of how to
obtain legal assistance and/or
use court systems. Proxies can
be used when assumptions are
clear and when there is research
to support that assumption.
2. OBJECTIVE
An indicator is objective if it is
unambiguous about 1) what is
being measured and 2) what
data are being collected. In
other words, two people should
be able to collect performance
information for the same
indicator and come to the same
conclusion. Objectivity is
critical to collecting comparable
data over time, yet it is one of
the most common problems
noted in audits. As a result,
pay particular attention to the
definition of the indicator to
ensure that each term is clearly
defined, as the following
examples demonstrate:
Poor Indicator: Number of
successful firms
Objective Indicator:
Number of firms with an
annual increase in revenues
of at least 5%
The better example outlines the
exact criteria for how
“successful” is defined and
ensures that changes in the
data are not attributable to
differences in what is being
counted.
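One way to see why the objective indicator is unambiguous is that it can be computed by a fixed rule. The Python sketch below counts firms whose revenue grew by at least 5% over the previous year. The function names, the firm labels, and the revenue figures are hypothetical, and details such as currency and inflation adjustment are assumptions that a real indicator definition would need to state.

```python
def annual_growth(previous, current):
    """Year-over-year change in revenue as a fraction of the previous year."""
    return (current - previous) / previous


def count_growing_firms(revenues, threshold=0.05):
    """Count firms whose annual revenue increase is at least `threshold`.

    `revenues` maps a firm name to (previous_year_revenue, current_year_revenue).
    Firms with no positive prior-year revenue are skipped, since a growth
    rate cannot be computed for them.
    """
    return sum(
        1
        for previous, current in revenues.values()
        if previous > 0 and annual_growth(previous, current) >= threshold
    )


# Hypothetical revenue data, in the same currency for both years.
firms = {
    "Firm A": (100_000, 106_000),  # +6%
    "Firm B": (200_000, 204_000),  # +2%
    "Firm C": (50_000, 60_000),    # +20%
    "Firm D": (0, 10_000),         # no prior-year revenue, skipped
}

if __name__ == "__main__":
    print("Firms with at least 5% revenue growth:", count_growing_firms(firms))
```

Two people applying this rule to the same records would arrive at the same count, which is the test of objectivity described above.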
Objectivity can be particularly
challenging when constructing
qualitative indicators. Good
qualitative indicators permit
regular, systematic judgment
about progress and reduce
subjectivity (to the extent
possible). This means that
there must be clear criteria or
protocols for data collection.
3. USEFUL FOR
MANAGEMENT
An indicator is useful to the
extent that it provides a
meaningful measure of change
over time for management
decision-making. One aspect of
usefulness is to ensure that the
indicator is measuring the “right
change” in order to achieve
development results. For
example, the number of
meetings between Civil Society
Organizations (CSOs) and
government is something that
can be counted but does not
necessarily reflect meaningful
change. By selecting indicators,
managers are defining program
success in concrete ways.
Managers will focus on
achieving targets for those
indicators, so it is important to
consider the intended and
unintended incentives that
performance indicators create.
As a result, the system may
need to be fine-tuned to ensure
that incentives are focused on
achieving true results.
A second dimension is whether
the indicator measures a rate of
change that is useful for
management purposes. This
means that the indicator is
constructed so that change can
be monitored at a rate that
facilitates management actions
(such as corrections and
improvements). Consider the
following examples:
Result: Targeted legal
reform to promote
investment
Less Useful for
Management: Number of
laws passed to promote
direct investment.
More Useful for
Management: Progress
toward targeted legal reform
based on the following
stages:
Stage 1. Interested groups
propose that legislation is
needed on issue.
Stage 2. Issue is introduced
in the relevant legislative
committee/executive
ministry.
Stage 3. Legislation is
drafted by relevant
committee or executive
ministry.
Stage 4. Legislation is
debated by the legislature.
Stage 5. Legislation is
passed by full approval
process needed in legislature.
Stage 6. Legislation is
approved by the executive
branch (where necessary).
Stage 7. Implementing
actions are taken.
Stage 8. No immediate need
identified for amendments to
the law.
The less useful example may be
useful for reporting; however, it
is so general that it does not
provide a good way to track
progress for performance
management. The process of
passing or implementing laws is
a long-term one, so that over
the course of a year or two the
AO team may only be able to
report that one or two such
laws have passed when, in
reality, a high degree of effort is
invested in the process. In this
case, the more useful example
better articulates the important
steps that must occur for a law
to be passed and implemented
and facilitates management
decision-making. If there is a
problem in meeting interim
milestones, then corrections
can be made along the way.
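A milestone scale of this kind can be tracked with very little tooling. The Python sketch below records the eight stages listed above and reports the stage reached against an interim target. The function name, the returned fields, and the idea of a numeric target stage are illustrative assumptions, not part of the USAID guidance.

```python
# The eight stages of the legal reform milestone scale described above.
LEGAL_REFORM_STAGES = [
    "Interested groups propose that legislation is needed on issue.",
    "Issue is introduced in the relevant legislative committee/executive ministry.",
    "Legislation is drafted by relevant committee or executive ministry.",
    "Legislation is debated by the legislature.",
    "Legislation is passed by full approval process needed in legislature.",
    "Legislation is approved by the executive branch (where necessary).",
    "Implementing actions are taken.",
    "No immediate need identified for amendments to the law.",
]


def milestone_status(stage_reached, target_stage, stages=LEGAL_REFORM_STAGES):
    """Summarize progress on a milestone scale against an interim target.

    `stage_reached` and `target_stage` are stage numbers (1-based);
    0 means that no stage has been reached yet.
    """
    if not 0 <= stage_reached <= len(stages):
        raise ValueError("stage_reached is outside the scale")
    description = stages[stage_reached - 1] if stage_reached else "No stage reached yet."
    return {
        "stage_reached": stage_reached,
        "of": len(stages),
        "description": description,
        "on_track": stage_reached >= target_stage,
    }


if __name__ == "__main__":
    print(milestone_status(stage_reached=3, target_stage=4))
```

Reporting the stage reached each period shows movement long before a simple count of laws passed would change, which is what makes the scale more useful for management.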
4. ATTRIBUTABLE
An indicator is attributable if it
can be plausibly associated with
USAID interventions. The
concept of “plausible
association” has been used in
USAID for some time. It does
not mean that X input equals Y
output. Rather, it is based on
the idea that a case can be
made to other development
practitioners that the program
has materially affected the
identified change. It is
important to consider the logic
behind what is proposed to
ensure attribution. If a Mission
is piloting a project in three
schools, but claims national
level impact in school
completion, this would not pass
the common sense test.
Consider the following
examples:
Result: Improved Budgeting
Capacity
Less Attributable: Budget
allocation for the Ministry of
Justice (MOJ)
More Attributable: The
extent to which the budget
produced by the MOJ meets
established criteria for good
budgeting
If the program works with the
Ministry of Justice to improve
budgeting capacity (by
providing technical assistance
on budget analysis), the quality
of the budget submitted by the
MOJ may improve. However, it
is often difficult to attribute
changes in the overall budget
allocation to USAID
interventions, because there are
a number of externalities that
affect a country’s final budget,
much like in the U.S. For
example, in tough economic
times, the budget for all
government institutions may
decrease. A crisis may emerge
that requires the host country
to reallocate resources. The
more attributable example above
is more directly linked
to USAID’s intervention.
5. PRACTICAL
A practical indicator is one for
which data can be collected on a
timely basis and at a reasonable
cost. There are two dimensions
that determine whether an
indicator is practical. The first is
time and the second is cost.
Time
First, consider whether resulting data
are available with enough
frequency for management
purposes (i.e., timely enough to
correspond to USAID
performance management and
reporting purposes). Second,
examine whether data are
current when available. If
reliable data are available each
year, but the data are a year
old, then this may be problematic.
Cost
Performance indicators should provide data to managers at a cost that is reasonable and
appropriate as compared with the management utility of the data. As a very general rule
of thumb, it is suggested that between 5% and 10% of program or project resources be
allocated for monitoring and evaluation (M&E) purposes. However, it is also important
to consider priorities and program context. A program would likely be willing to invest
more resources in measuring changes that are central to decision-making and fewer
resources in measuring more tangential results. A more mature program may have to
invest more in demonstrating higher-level changes or impacts as compared to a new
program.
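The rule of thumb above is simple arithmetic, and a short sketch can make the implied range concrete. The function name and the budget figure below are hypothetical and chosen only for illustration; the 5% and 10% bounds come from the text.

```python
def me_budget_range(total_budget, low_pct=5, high_pct=10):
    """Return the (low, high) M&E allocation implied by the rule of thumb.

    Percentages are passed as whole numbers (5 means 5%).
    """
    if total_budget < 0:
        raise ValueError("total_budget must be non-negative")
    return (total_budget * low_pct / 100, total_budget * high_pct / 100)


# Hypothetical program budget, for illustration only.
low, high = me_budget_range(2_000_000)
print(f"Suggested M&E allocation: ${low:,.0f} to ${high:,.0f}")
```

The range is a starting point for discussion only; as the text notes, priorities and program context may justify spending more or less.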
6. ADEQUATE
Taken as a group, the indicator (or set of indicators) should be sufficient to measure
the stated result. In other words, they should be the minimum number necessary and
cost-effective for performance management. The number of indicators required to
adequately measure a result depends on 1) the complexity of the result being measured,
2) the amount of information needed to make reasonably confident decisions, and 3) the
level of resources available. Too many indicators create information overload and
become overly burdensome to maintain. Too few indicators are also problematic, because
the data may only provide a partial or misleading picture of performance. The following
demonstrates how one indicator can be adequate to measure the stated objective:
Result: Increased Traditional Exports in Targeted Sectors
Adequate Indicator: Value of traditional exports in targeted sectors
In contrast, an objective focusing on improved maternal health may require two or three
indicators to be adequate. A general rule of thumb is to select between two and three
performance indicators per result. If many more indicators are needed to adequately
cover the result, then it may signify that the objective is not properly focused.
7. DISAGGREGATED, AS
NECESSARY
The disaggregation of data by gender, age, location, or some other dimension is often
important from both a management and reporting point of view. Development programs
often affect population cohorts or institutions in different ways. For example, it
might be important to know to what extent youth (up to age 25) or adults (25 and older)
are participating in vocational training, or in which districts schools have improved.
Disaggregated data help track whether or not specific groups participate in and benefit
from activities intended to include them.
In particular, USAID policies (ADS 203.3.4.3) require that performance management
systems and evaluations at the AO and project or activity levels include
gender-sensitive indicators and sex-disaggregated data if the activities or their
anticipated results involve or affect women and men differently. If so, this difference
would be an important factor in managing for sustainable program impact. Consider the
following example:
Result: Increased Access to Credit
Gender-Sensitive Indicator: Value of loans disbursed, disaggregated by male/female.
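As a rough illustration of the gender-sensitive indicator above, the sketch below totals loan values by borrower sex. The loan records, the function name, and the record layout are all hypothetical; in practice the data would come from the source named in the PMP.

```python
from collections import defaultdict

# Hypothetical loan records, for illustration only: (borrower sex, loan value).
loans = [
    ("female", 1200.0),
    ("male", 800.0),
    ("female", 500.0),
    ("male", 1500.0),
]


def disaggregate(records):
    """Sum loan values by group and return {group: total}."""
    totals = defaultdict(float)
    for group, value in records:
        totals[group] += value
    return dict(totals)


totals = disaggregate(loans)
overall = sum(totals.values())
for group, total in sorted(totals.items()):
    print(f"{group}: {total:,.0f} ({total / overall:.0%} of value disbursed)")
```

Reporting each group's share alongside its total makes it easier to see whether women and men are benefiting differently, which is the point of disaggregating.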
WHAT IS THE
PROCESS FOR
SELECTING
PERFORMANCE
INDICATORS?
Selecting appropriate and useful performance indicators requires careful thought,
iterative refining, collaboration, and consensus-building. The following describes a
series of steps to select optimal performance indicators.[4] Although presented as
discrete steps, in practice some of these can be effectively undertaken simultaneously
or in a more iterative manner. These steps may be applied as a part of a larger process
to develop a new PMP, or in part, when teams have to modify individual indicators.

STEP 1. DEVELOP A
PARTICIPATORY PROCESS
FOR IDENTIFYING
PERFORMANCE INDICATORS
The most effective way to identify indicators is to set up a process that elicits the
participation and feedback of a number of partners and stakeholders. This allows
managers to:
• Draw on different areas of expertise.
• Ensure that indicators measure the right changes and represent part of a larger
  approach to achieve development impact.
• Build commitment and understanding of the linkage between indicators and results.
  This will increase the utility of the performance management system among key
  stakeholders.
• Build capacity for performance management among partners, such as NGOs and partner
  country institutions.
• Ensure that systems are as practical and streamlined as possible. Often development
  partners can provide excellent insight on the practical issues associated with
  indicators and data collection.
A common way to begin the process is to hold working sessions. Start by reviewing the
Results Framework. Next, identify indicators for the Assistance Objective, then move
down to the Intermediate Results. In some cases, the AO team establishes the first
round of indicators and then provides them to other partners for input. In other cases,
key partners may be included in the working sessions.
It is important to task the group with identifying the minimal set of indicators
necessary and sufficient to manage the program effectively. That is, the group must go
through a process of prioritization in order to narrow down the list. While
participatory processes may take more time at the front end, they almost always result
in a more coherent and effective system.
[4] This process focuses on presenting greater detail related specifically to indicator
selection. Refer to TIPS 7: Preparing a PMP for a broader set of steps on how to
develop a full PMP.
STEP 2. CLARIFY THE RESULT
Carefully define the result desired. Good performance indicators are based on clearly
articulated and focused objectives. Review the precise wording and intention of the
objective. Determine what exactly is meant by the result. For example, if the result is
"improved business environment," what does that mean? What specific aspects of the
business environment will be improved? Optimally, the result should be stated with as
much specificity as possible. If the result is broad (and the team doesn’t have the
latitude to change the objective), then the team might further define its meaning.
Example: One AO team further defined their IR, "Improved Business Environment," as
follows:
• Making it easier to do business in terms of resolving disputes, obtaining licenses
  from the government, and promoting investment.
• An identified set of key policies are in place to support investment. Key policies
  include laws, regulations, and policies related to the simplification of investment
  procedures, bankruptcy, and starting a business.

As the team gains greater clarity and consensus on what results are sought, ideas for
potential indicators begin to emerge.
Be clear about what type of change is implied. What is expected to change: a situation,
a condition, the level of knowledge, an attitude, or a behavior? For example, changing
a country's voting law(s) is very different from changing citizens' awareness of their
right to vote (which is different from voting). Each type of change is measured by
different types of performance indicators.
Identify more precisely the specific targets for change. Who or what are the specific
targets for the change? For example, if individuals, which individuals? For an economic
growth program designed to increase exports, does the program target all exporters or
only exporters of non-traditional agricultural products? This is known as identifying
the "unit of analysis" for the performance indicator.
STEP 3: IDENTIFY POSSIBLE
INDICATORS
Usually there are many possible indicators for a particular result, but some are more
appropriate and useful than others. In selecting indicators, don’t settle too quickly
on the first ideas that come most conveniently or obviously to mind. Create an initial
list of possible indicators, using the following approaches:
• Conduct a brainstorming session with colleagues to draw upon the expertise of the
  full Assistance Objective Team. Ask, "how will we know if the result is achieved?"
• Consider other resources. Many organizations have databases or indicator lists for
  various sectors available on the internet.
• Consult with technical experts.
• Review the PMPs and indicators of previous programs or similar programs in other
  Missions.
STEP 4. ASSESS THE BEST
CANDIDATE INDICATORS,
USING THE INDICATOR
CRITERIA
Next, from the initial list, select the best candidates as indicators. The seven basic
criteria described in the previous section, which can be used to judge an indicator’s
appropriateness and utility, are summarized in Table 1. When assessing and comparing
possible indicators, it is helpful to use this type of checklist to guide the
assessment process. Remember that there will be trade-offs between the criteria. For
example, the optimal indicator may not be the most cost-effective to select.
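One way a team might record a checklist assessment is sketched below. This is an illustrative aid under stated assumptions, not a USAID tool: the function name, the data layout, and the judgments recorded for each candidate are hypothetical. The two candidate indicators are the MOJ budgeting examples discussed earlier under attribution, and "adequate" is in practice a judgment about the set of indicators as a group.

```python
# The seven criteria summarized in Table 1.
CRITERIA = [
    "direct", "objective", "useful for management", "attributable",
    "practical", "adequate", "disaggregated as necessary",
]


def assess(candidates):
    """Return (name, criteria_met, missing) rows, most criteria met first.

    `candidates` maps an indicator name to the set of criteria the team
    judged it to meet. The count is only a discussion aid; it does not
    weigh the trade-offs between criteria that the team must still consider.
    """
    rows = []
    for name, met in candidates.items():
        unknown = set(met) - set(CRITERIA)
        if unknown:
            raise ValueError(f"unknown criteria: {sorted(unknown)}")
        missing = [c for c in CRITERIA if c not in met]
        rows.append((name, len(CRITERIA) - len(missing), missing))
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Hypothetical team judgments, for illustration only.
candidates = {
    "Budget allocation for the MOJ": set(CRITERIA) - {"attributable"},
    "Extent to which the MOJ budget meets established criteria": set(CRITERIA),
}
ranked = assess(candidates)
for name, met, missing in ranked:
    print(f"{name}: {met}/{len(CRITERIA)} criteria met; missing: {missing or 'none'}")
```

Listing the criteria a candidate fails, rather than only a score, keeps the trade-offs visible when the team makes its final selection.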
STEP 5. SELECT THE “BEST”
PERFORMANCE INDICATORS
Select the best indicators to incorporate in the performance management system. They
should be the optimum set of measures that are useful to management and can be obtained
at reasonable cost.
Be Strategic and Streamline Where Possible. In recent years, there has been a
substantial increase in the number of indicators used to monitor and track programs. It
is important to remember that there are costs, in terms of time and money, to collect
data for each indicator. AO teams should:
• Select indicators based on strategic thinking about what must truly be achieved for
  program success.
• Review indicators to determine whether any final narrowing can be done. Are some
  indicators not useful? If so, discard them.
• Use participatory approaches in order to discuss and establish priorities that help
  managers focus on key indicators that are necessary and sufficient.
Ensure that the rationale for indicator selection is recorded in the PMP. There are
rarely perfect indicators in the development environment; it is more often a case of
weighing different criteria and making the optimal choices for a particular program. It
is important to ensure that the rationale behind these choices is recorded in the PMP
so that new staff, implementers, or auditors understand why each indicator was
selected.
STEP 6. FINE TUNE WHEN
NECESSARY
Indicators are part of a larger system that is ultimately designed to assist managers
in achieving development impact. On the one hand, indicators must remain comparable
over time but, on the other hand, some refinements will invariably be needed to ensure
the system is as effective as possible. (Of course, there is no value in continuing to
collect bad data, for example.) As a result, these two issues need to be balanced.
Remember that indicator issues are often flags for other underlying problems. If a
large number of indicators are frequently changed, this may signify a problem with
program management or focus. At the other end of the continuum, if no indicators were
to change over a long period of time, it is possible that a program is not adapting and
evolving as necessary. In our experience, some refinements are inevitable as data are
collected and lessons learned. After some rounds of data collection are completed, it
is often useful to discuss indicator issues and refinements among AO team members
and/or with partners and implementers. In particular, the period following portfolio
reviews is a good time to refine PMPs if necessary.
TABLE 1. INDICATOR SELECTION CRITERIA CHECKLIST
Columns: Criteria | Definition | Checklist | Comments
1. Direct
   Direct. The indicator clearly represents the intended result. An outsider or an
   expert in the field would agree that the indicator is a logical measure for the
   stated result.
   • Level. The indicator reflects the right level; that is, it does not measure a
     higher or lower level than the stated result.
   • Proxies. The indicator is a proxy measure. If the indicator is a proxy, note what
     assumptions the proxy is based upon.
2. Objective
   The indicator is clear and unambiguous about what is being measured.
3. Useful for Management
   The indicator is useful for management decision-making.
4. Attributable
   The indicator can be plausibly associated with USAID interventions.
5. Practical
   Time. Data are produced with enough frequency for management purposes (i.e., timely
   enough to correspond to USAID performance management and reporting purposes). Data
   are current when available.
   Cost. Data are worth the cost to USAID managers.
6. Adequate
   The indicators, taken as a group, are sufficient to measure the stated result. All
   major aspects of the result are measured.
7. Disaggregated, as necessary
   The indicators are appropriately disaggregated by gender, age, location, or some
   other dimension that is important for programming. In particular, gender
   disaggregation has been considered as required (see ADS 203.3.4.3).
For more information:
TIPS publications are available online at [insert website].
Acknowledgements:
Our thanks to those whose experience and insights helped shape this publication, including Gerry Britan
and Subhi Mehdi of USAID’s Office of Management Policy, Budget and Performance (MPBP). This
publication was updated by Michelle Adams-Matson of Management Systems International.
Comments can be directed to:
Gerald Britan, Ph.D.
Tel: (202) 712-1158
gbritan@usaid.gov
Contracted under RAN-M-00-04-00049-A-FY0S-84
Integrated Managing for Results II
1996, Number 7
Performance Monitoring and Evaluation
TIPS
USAID Center for Development Information and Evaluation
PREPARING A PERFORMANCE MONITORING PLAN
What Is a Performance Monitoring Plan?
A performance monitoring plan (PMP) is a tool USAID operating units use to
plan and manage the collection of performance data. Sometimes the plan also
includes plans for data analysis, reporting, and use.
USAID's reengineering guidance requires operating units to prepare a Performance
Monitoring Plan for the systematic and timely collection of performance data. This Tips
offers advice for preparing such a plan.
Reengineering guidance requires operating units to prepare PMPs once their
strategic plans are approved. At a minimum, PMPs should include:
• a detailed definition of each performance indicator
• the source, method, frequency and schedule of data collection, and
• the office, team, or individual responsible for ensuring data are available on
  schedule
As part of the PMP process, it is also advisable (but not mandated) for
operating units to plan for:
• how the performance data will be analyzed, and
• how it will be reported, reviewed, and used to inform decisions
While PMPs are required, they are for the operating unit's own use. Review by
central or regional bureaus is not mandated, although some bureaus encourage
sharing PMPs. PMPs should be updated as needed to ensure plans, schedules,
and assignments remain current.
Why Are PMPs Important?
A performance monitoring plan is a critical tool for planning, managing, and
documenting data collection. It contributes to the effectiveness of the
performance monitoring system by assuring that comparable data will be
collected on a regular and timely basis. These are essential to the operation of a
credible and useful performance-based management approach.
PMPs promote the collection of comparable data by sufficiently documenting
indicator definitions, sources, and methods of data collection. This enables
operating units to collect comparable data over time even when key personnel
change.
PMPs support timely collection of data by documenting the frequency and
schedule of data collection as well as by assigning responsibilities. Operating
units should also consider developing plans for data analysis, reporting, and
review efforts as part of the PMP process. It makes sense to think through data
collection, analysis, reporting, and review as an integrated process. This will help
keep the performance monitoring system on track and ensure performance data inform
decision-making. While there are strong arguments for including such integrated plans
in the PMP document, this is not mandated in the reengineering guidance. Some operating
units may wish to prepare these plans separately.
PN-ABY-215
Use a Participatory Approach
The Agency's reengineering directives require that operating units involve USAID's partners, customers, and
stakeholders in planning approaches to monitoring performance. Experience indicates the value of collaborating
with relevant host government officials, implementing agency staff, contractors and grantees, other donors, and
customer groups, when preparing PMPs. They typically have the most familiarity with the quality, availability,
Elements of a PMP
The following elements should be considered for
inclusion in a performance monitoring plan. Elements
1-5 are required in the reengineering guidance, whereas
6-9 are suggested as useful practices.
I. Plans for Data Collection (Required)
In its strategic plan, an operating unit will have identified
a few preliminary performance indicators for each of its
strategic objectives, strategic support objectives, and
special objectives (referred to below simply as SOs), and
USAID-supported intermediate results (IRs). In most
cases, preliminary baselines and targets will also have
been provided in the strategic plan. The PMP builds on
this initial information, verifying or modifying the
performance indicators, baselines and targets, and
documenting decisions.
PMPs are required to include information outlined below
(elements 1-5) on each performance indicator that has
been identified in the Strategic Plan for SOs and IRs.
Plans should also address how critical assumptions and
results supported by partners (such as the host
government, other donors, NGOs) will be monitored,
although the same standards and requirements for
developing indicators and collecting data do not apply.
Furthermore, it is useful to include in the PMP lower-level indicators of inputs,
outputs, and processes at the activity level, and how they will be monitored and
linked to IRs and SOs.
1. Performance Indicators and Their Definitions
Each performance indicator needs a detailed definition.
Be precise about all technical elements of the indicator
statement. As an illustration, consider the indicator,
number of small enterprises receiving loans from the
private banking system. How are small enterprises
defined -- all enterprises with 20 or fewer employees, or
50 or 100? What types of institutions are considered part
of the private banking sector -- credit unions,
government-private sector joint-venture financial
institutions?
Include in the definition the unit of measurement. For
example, an indicator on the value of exports might be
otherwise well defined, but it is also important to know
whether the value will be measured in current or constant
terms and in U.S. dollars or local currency.
The definition should be detailed enough to ensure that
different people at different times, given the task of
collecting data for a given indicator, would collect
identical types of data.
2. Data Source
Identify the data source for each performance indicator.
The source is the entity from which the data are obtained,
usually the organization that conducts the data collection
effort. Data sources may include government
departments, international organizations, other donors,
NGOs, private firms, USAID offices, contractors, or
activity implementing agencies.
Be as specific about the source as possible, so the same
source can be used routinely. Switching data sources for
the same indicator over time can lead to inconsistencies
and misinterpretations and should be avoided. For
example, switching from estimates of infant mortality
rates based on national sample surveys to estimates based
on hospital registration statistics can lead to false
impressions of change.
Plans may refer to needs and means for strengthening the
capacity of a particular data source to collect needed data
on a regular basis, or for building special data collection
efforts into USAID activities.
3. Method of Data Collection
Specify the method or approach to data collection for
each indicator. Note whether it is primary data collection
or is based on existing secondary data.
For primary data collection, consider:
• the unit of analysis (individuals, families, communities, clinics, wells)
• data disaggregation needs (by gender, age, ethnic groups, location)
• sampling techniques for selecting cases (random sampling, purposive sampling); and
• techniques or instruments for acquiring data on these selected cases (structured
  questionnaires, direct observation forms, scales to weigh infants)
For indicators based on secondary data, give the method
of calculating the specific indicator data point and the
sources of data.
Note issues of data quality and reliability. For example,
using secondary data from existing sources cuts costs and
effort, but the data may be less reliable.
Provide sufficient detail on the data collection or
calculation method to enable it to be replicated.
4. Frequency and Schedule of Data Collection
Performance monitoring systems must gather
comparable data periodically to measure progress. But
depending on the performance indicator, it may make
sense to collect data on a quarterly, annual, or less
frequent basis. For example, because of the expense and
because changes are slow, fertility rate data from sample
surveys may only be collected every few years whereas
data on contraceptive distributions and sales from clinics'
record systems may be gathered quarterly. PMPs can
also usefully provide the schedules (dates) for data
collection efforts.
When planning the frequency and scheduling of data
collection, an important factor to consider is
management's needs for timely information for decision-making.
5. Responsibilities for Acquiring Data
For each performance indicator, responsibility within the
operating unit for the timely acquisition of data from
its source should be clearly assigned to a particular
office, team, or individual.
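The five required elements can be pictured as one record per indicator. The sketch below is a hypothetical way to hold such a record and flag elements that are still blank; the class name, field names, and example values are assumptions made for illustration and are not prescribed by the reengineering guidance. The indicator itself is the small-enterprise example from element 1.

```python
from dataclasses import dataclass, fields


@dataclass
class IndicatorPlan:
    """One PMP entry covering required elements 1-5 (field names are illustrative)."""
    indicator: str
    definition: str    # element 1: precise definition of technical terms
    unit: str          # element 1: unit of measurement
    data_source: str   # element 2
    method: str        # element 3
    frequency: str     # element 4
    responsible: str   # element 5: office, team, or individual


def missing_elements(plan):
    """Return the names of elements that are still blank."""
    return [f.name for f in fields(plan) if not getattr(plan, f.name).strip()]


# Hypothetical entry, for illustration only.
plan = IndicatorPlan(
    indicator="Number of small enterprises receiving loans from the private banking system",
    definition="Small enterprise: 20 or fewer employees (assumed threshold)",
    unit="Number of enterprises",
    data_source="Private bank lending records (assumed source)",
    method="Secondary data; annual count of borrowers meeting the definition",
    frequency="Annual",
    responsible="",
)
print("Still to be assigned:", missing_elements(plan))
```

A check like this is only a completeness aid; whether each definition is precise enough for different people to collect identical data still needs human review.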
II. Plans for Data Analysis, Reporting,
Review, and Use
An effective performance monitoring system needs to
plan not only for the collection of data, but also for data
analysis, reporting, review, and use. It may not be
possible to include everything in one document at one
time, but units should take the time early on for careful
planning of all these aspects in an integrated fashion.
6. Data Analysis Plans
To the extent possible, plan in advance how performance
data for individual indicators or groups of related
indicators will be analyzed. Identify data analysis
techniques and data presentation formats to be used.
Consider if and how the following aspects of data
analysis will be undertaken:
Comparing disaggregated data. For indicators with
disaggregated data, plan how it will be compared,
displayed, and analyzed.
Comparing current performance against multiple
criteria. For each indicator, plan how actual performance
data will be compared with a) past performance, b)
planned or targeted performance or
c) other relevant benchmarks.
Analyzing relationships among performance indicators.
Plan how internal analyses of the performance data will
examine interrelationships. For example:
• How will a set of indicators (if there are more than one) for a particular SO or IR
  be analyzed to reveal progress? What if only some of the indicators reveal progress?
• How will cause-effect relationships among SOs and IRs within a results framework be
  analyzed?
• How will USAID activities be linked to achieving IRs and SOs?
Analyzing cost-effectiveness. When practical and
feasible, plan for using performance data to compare
systematically alternative program approaches in terms
of costs as well as results. The Government Performance
and Results Act (GPRA) encourages this.
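For the comparison of actual performance against past and targeted performance, one simple calculation is the share of the planned change achieved so far. The sketch below assumes a single indicator with a baseline, a target, and a latest actual value; the function name and the figures are hypothetical.

```python
def progress_toward_target(baseline, target, actual):
    """Return the share of the planned change (target minus baseline) achieved.

    Works whether the target is above or below the baseline.
    """
    planned_change = target - baseline
    if planned_change == 0:
        raise ValueError("target equals baseline; no planned change to measure")
    return (actual - baseline) / planned_change


# Hypothetical values, for illustration only.
share = progress_toward_target(baseline=40.0, target=60.0, actual=55.0)
print(f"{share:.0%} of the planned change achieved")
```

Because the result is expressed relative to the planned change, the same calculation applies to indicators that are meant to fall, such as a drop-out rate.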
7. Plans for Complementary Evaluations
Reengineering stresses that evaluations should be
conducted only if there is a clear management need. It
may not always be possible or desirable to predict years
in advance when or why they will be needed.
Nevertheless, operating units may find it useful to plan
on a regular basis what evaluation efforts are needed to
complement information from the performance
monitoring system. The operating unit's internal
performance reviews, to be held periodically during the
year, may be a good time for such evaluation planning.
For example, if the reviews reveal that certain
performance targets are not being met, and if the reasons
why are unclear, then planning evaluations to investigate
why would be in order.
8. Plans for Communicating and Using Performance
Information
Planning how performance information will be reported,
reviewed, and used is critical for effective managing for
results. For example, plan, schedule, and assign
responsibilities for internal and external reviews,
briefings, and reports. Clarify what, how and when
management decisions will consider performance
information. Specifically, plan for the following:
Operating unit performance reviews. Reengineering
guidance requires operating units to conduct internal
reviews of performance information at regular intervals
during the year to assess progress toward achieving SOs
and IRs. In addition, activity-level reviews should be
planned regularly by SO teams to assess if activities'
inputs, outputs, and processes are supporting
achievement of IRs and SOs.
USAID/Washington reviews and the R4 Report.
Reengineering requires operating units to prepare and
submit to USAID/Washington an annual Results Review
and Resource Request (R4) report, which is the basis for
a joint review with USAID/W of performance and
resource requirements. Help plan R4 preparation by
scheduling tasks and making assignments.
External reviews, reports, and briefings. Plan for
reporting and disseminating performance information to
key external audiences, such as host government
counterparts, collaborating NGOs, other partners, donors,
customer groups, and stakeholders. Communication
techniques may include reports, oral briefings,
videotapes, memos, newspaper articles.
Influencing management decisions. The ultimate aim of
performance monitoring systems is to promote
performance-based decision-making. To the extent
possible, plan in advance what management decision-making
processes should be influenced by performance
information. For example, budget discussions,
programming decisions, evaluation designs/scopes of
work, office retreats, management contracts, and
personnel appraisals often benefit from the consideration
of performance information.
9. Budget
Estimate roughly the costs to the operating unit of
collecting, analyzing, and reporting performance data for
a specific indicator (or set of related indicators). Identify
the source of funds.
If adequate data are already available from secondary
sources, costs may be minimal. If primary data must be
collected at the operating unit's expense, costs can vary
depending on scope, method, and frequency of data
collection. Sample surveys may cost more than
$100,000, whereas rapid appraisal methods can be
conducted for much less. However, often these low-cost
methods do not provide quantitative data that are
sufficiently reliable or representative.
Reengineering guidance gives a range of 3 to 10 percent
of the total budget for an SO as a reasonable level to
spend on performance monitoring and evaluation.
CDIE's Tips series provides advice and
suggestions to USAID managers on how to
plan and conduct performance monitoring
and evaluation activities effectively. They
are supplemental references to the
reengineering automated directives system
(ADS), chapter 203. For further information, contact Annette Binnendijk, CDIE
Senior Evaluation Advisor, via phone
(703) 875-4235, fax (703) 875-4866, or email. Copies of TIPS can be ordered from
the Development Information Services
Clearinghouse by calling (703) 351-4006 or
by faxing (703) 351-4039. Please refer to
the PN number. To order via Internet,
address requests to
docorder@disc.mhs.compuserve.com
NUMBER 8
2ND EDITION, 2010
PERFORMANCE MONITORING & EVALUATION
TIPS
BASELINES AND TARGETS
ABOUT TIPS
These TIPS provide practical advice and suggestions to USAID managers on issues related to performance
monitoring and evaluation. This publication is a supplemental reference to the Automated Directive
System (ADS) Chapter 203.
INTRODUCTION
The achievement of planned results
is at the heart of USAID’s
performance management system. In
order to understand where we, as
project managers, are going, we
need to understand where we have
been. Establishing quality baselines
and setting ambitious, yet achievable,
targets are essential for the
successful management of foreign
assistance programs.
WHAT ARE
BASELINES AND
TARGETS?
A baseline is the value of a performance indicator before the implementation of
projects or activities, while a target is the specific, planned level of result to be
achieved within an explicit timeframe (see ADS 203.3.4.5). Targets are set for
indicators at the Assistance Objective (AO), Intermediate Result (IR), and output
levels.
WHY ARE
BASELINES
IMPORTANT?
Baselines help managers determine
progress in achieving outputs and
outcomes. They also help identify
the extent to which change has
happened at each level of result.
USAID ADS 203.3.3 requires a PMP
for each AO. Program managers
should provide baseline and target
values for every indicator in the
PMP.
Lack of baseline data not only presents challenges for management decision-making
purposes, but also hinders evaluation efforts. For example, it is generally not
possible to conduct a rigorous impact evaluation without solid baseline data (see TIPS
19: Rigorous Impact Evaluation).
ESTABLISHING THE
BASELINE
Four common scenarios provide the
context for establishing baseline
data:
1. BASELINE IS
ESTABLISHED
If baseline data exist prior to the start of a project or activity, additional data
collected over the life of the project must be collected in a consistent manner in
order to facilitate comparisons. For example, consider the drop-out rate for girls 16
and under. If baseline data are obtained from the Ministry of Education, the project
should continue to collect these data from this same source, ensuring that the data
collection methodology remains the same.
Data may also be obtained from a prior implementing partner’s project, provided that
the data collection protocols, instruments, and scoring procedures can be replicated.
For example, a policy index might be used to measure progress of legislation (see TIPS
14: Monitoring the Policy Reform Process). If these activities become a part of a new
project, program managers should consider the benefit of using the same instrument.
In cases where baseline data exist from primary or secondary sources, it is important
that the data meet USAID’s data quality standards for validity, reliability, precision,
integrity, and timeliness (see TIPS 12: Data Quality Standards).
2. BASELINES MUST BE
COLLECTED
In cases where there are no existing data with which to establish a baseline, and the
required data are not already being collected by, for example, a host-country
government, an international organization, or another donor, USAID and/or its
implementing partners will have to collect them. Primary data collection can be
expensive, particularly if data are collected through a formal survey or a new index.
Program managers should consider this cost and incorporate it into program or project
planning.
Participation of key stakeholders in setting targets helps establish a common
understanding about what the project will accomplish and when. USAID staff,
implementing partners, host country governments, other donors, and civil society
partners, among others, should attend working sessions at the outset of program
implementation to review baseline data and other information to set interim and final
targets.
Ideally, data should be collected
prior to the initiation of the
program. If this is not feasible,
baselines should be collected as
soon as possible. For example, an
implementing partner may collect
perception data on the level of
corruption in targeted municipalities
for USAID’s PMP sixty days after
approval of a project’s work plan; in
another case, a score on an
advocacy capacity index may not be
collected until Community Service
Organizations (CSOs) are awarded
grants. If baseline data cannot be
collected until later in the course of
implementing an activity, the AO
Team should document when and
how the baseline data will be
collected (ADS 203.3.4.5).
3. BASELINES ARE
ESTABLISHED ON A
ROLLING BASIS
In some cases, it is possible to
collect baseline data on a rolling
basis as implementation proceeds.
For example, imagine that a health
project is being rolled out
sequentially across three provinces
over a three-year period. Data
collected in the first province will
serve as baseline for Year One; data
collected in the second province will
serve as baseline for the second
province in Year Two; and data
collected in the third province will
serve as baseline for that province in
Year Three.
4. BASELINE IS ZERO
For some indicators, baselines will
be zero. For example, if a new
program focuses on building the
teaching skills of teachers, the
baseline for the indicator “the
number of teachers trained” is zero.
Similarly, if an output of a new
program is the number of grants
awarded, the baseline is zero.
WHY ARE TARGETS
IMPORTANT?
Beyond meeting USAID
requirements, performance targets
are important for several reasons.
They help justify a program by
describing in concrete terms what
USAID’s investment will produce.
Targets orient stakeholders to the
tasks to be accomplished and
motivate individuals involved in a
program to do their best to ensure
the targets are met. Targets also
help to establish clear expectations
for USAID staff, implementing
partners, and key stakeholders.
Once a program is underway, they
serve as the guideposts for
monitoring whether progress is
being made on schedule and at the
levels originally envisioned. Lastly,
targets promote transparency and
accountability by making available
information on whether results have
been achieved or not over time.
The achievement of results requires
the joint action of many
stakeholders. Manageable interest
means we, as program managers,
have sufficient reason to believe that
the achievement of our planned
results can be significantly influenced
by interventions of USAID’s
program and staff resources. When
setting targets, take into account
how other actors will affect
outcomes and what it means for
USAID to achieve success.
A natural tension exists between the
need to set realistic targets and the
value,
from
a
motivational
perspective, of setting targets
ambitious enough to ensure that
staff and stakeholders will stretch to
meet them; when motivated, people
can often achieve more than they
imagine. Targets that are easily
achievable are not useful for
management and reporting purposes
since they are, in essence, pro forma.
AO Teams should plan ahead for
the analysis and interpretation of
actual
data
against
their
performance
targets
(ADS
203.3.4.5).
USING TARGETS
FOR
PERFORMANCE
MANAGEMENT IN A
LEARNING
ORGANIZATION
Targets can be important tools for
effective program management.
However, the extent to which
targets are or are not met should
not be the only criterion for judging
the success or failure of a program.
Targets are essentially flags for
managers; if actual results wildly
exceed the targets or fall well
below them, the program manager
should ask, “Why?”
FIGURE 1. PORTFOLIO
REVIEWS AND
PERFORMANCE TARGETS
To prepare for Portfolio Reviews,
AO Teams should conduct analysis of
program data, including achievement
of planned targets. ADS 203.3.7.2
provides illustrative questions for
these reviews:
• Are the desired results being
achieved?
• Are the results within USAID’s
manageable interest?
• Will planned targets be met?
• Is the performance management
system currently in place adequate
to capture data on the achievement
of results?
Consider an economic growth
project. If a country experiences an
unanticipated downturn in its
economy, the underlying
assumptions upon which that
project was designed may be
affected. If the project does not
meet targets, then it is important for
managers to focus on understanding
1) why targets were not met, and 2)
whether the project can be adjusted
to allow for an effective response to
changed circumstances.
In this
scenario, program managers may
need to reexamine the focus or
priorities of the project and make
related adjustments in indicators
and/or targets.
Senior
managers,
staff,
and
implementing
partners
should
review performance information and
targets as part of on-going project
management responsibilities and in
Portfolio Reviews (see Figure 1).
TYPES OF TARGETS
FINAL AND INTERIM
TARGETS
A final target is the planned value of
a performance indicator at the end
of the AO or project. For AOs, the
final targets are often set three to
five years away, while for IRs they
are often set one to three years
away. Interim targets should be set
for the key points of time in
between the baseline and final target
in cases where change is expected
and data can be collected.
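As a minimal sketch of how interim targets might be laid out between a baseline and a final target, the Python below spreads the planned change across reporting periods. The function name, the indicator values, and the cumulative shares are all hypothetical assumptions for illustration; they are not taken from USAID guidance.

```python
def interim_targets(baseline, final, cumulative_shares):
    """Return one interim target per period.

    cumulative_shares[i] is the assumed share (0.0 to 1.0) of the total
    planned change that should be achieved by the end of period i.
    """
    pairs = zip(cumulative_shares, cumulative_shares[1:])
    if any(later < earlier for earlier, later in pairs):
        raise ValueError("cumulative shares must not decrease")
    change = final - baseline
    return [round(baseline + share * change, 1) for share in cumulative_shares]


# Hypothetical indicator: baseline of 40, final target of 80 after five years.
even_path = interim_targets(40, 80, [0.2, 0.4, 0.6, 0.8, 1.0])
slow_start = interim_targets(40, 80, [0.0, 0.1, 0.3, 0.8, 1.0])
print(even_path)
print(slow_start)
```

The second call illustrates one possible choice: little change is expected early in implementation, so the early interim targets stay close to the baseline.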
QUANTITATIVE AND
QUALITATIVE TARGETS
Targets may be either quantitative
or qualitative, depending on the
nature of the associated indicator.
Targets for quantitative indicators are
numerical, whereas targets for
qualitative indicators are descriptive.
To
facilitate
comparison
of
baselines, targets, and performance
data for descriptive data, and to
maintain
data
quality,
some
indicators convert qualitative data
into a quantitative measure (see
Figure 2).
FIGURE 2. TARGET
SETTING FOR
QUANTITATIVE AND
QUALITATIVE INDICATORS
- WHAT’S THE
DIFFERENCE?
Quantitative indicators and targets
are numerical. Examples include
the dropout rate, the value of
revenues, or number of children
vaccinated.
Qualitative indicators and targets
are descriptive. However,
descriptions must be based on a set
of pre-determined criteria. It is
much easier to establish baselines
and set targets when qualitative
data are converted into a
quantitative measure. For example,
the Advocacy Index is used to
measure the capacity of a target
organization, based on agreed-upon
standards that are rated and
scored. Other examples include
scales, indexes, and scorecards (see
Figure 3).
Nonetheless, baseline
and target data for quantitative and
qualitative indicators must be
collected using the same instrument
so that change can be captured and
progress towards results measured
accurately (see TIPS 6: Selecting
Performance Indicators).
EXPRESSING
TARGETS
As with performance indicators,
targets can be expressed differently.
There are several possible ways to
structure
targets
to
answer
questions about the quantity of
expected change:
• Absolute level of achievement –
e.g., 75% of all trainees obtained
jobs by the end of the program or
7,000 people were employed by
the end of the program.
• Change in level of achievement –
e.g., math test scores for students
in grade nine increased by 10% in
Year One, or math test scores for
students in grade nine increased
by three points in Year One.
Yields per hectare under
improved management practices
increased by 25% or yields per
hectare increased by 100 bushels
from 2010 to 2013.
• Change in relation to the scale of
the problem – e.g., 35% of total
births in target area attended by
skilled health personnel by the end
of year two, or the proportion of
households with access to reliable
potable water increased by 50% by
2013.
• Creation or provision of
something new – e.g., 4,000 doses
of tetanus vaccine distributed in
Year One, or a law permitting
non-government organizations to
generate income is passed by
2012.
FIGURE 3. SETTING TARGETS FOR QUALITATIVE MEASURES
For the IR Improvements in the Quality of Maternal and Child Health Services, a service delivery scale was used as the
indicator to measure progress. The scale, as shown below, transforms qualitative information about services into a rating
system against which targets can be set:
0 points = Service not offered
1 point = Offers routine antenatal care
1 point = Offers recognition and appropriate management of high risk pregnancies
1 point = Offers routine deliveries
1 point = Offers appropriate management of complicated deliveries
1 point = Offers post-partum care
1 point = Offers neonatal care
Score = Total number of service delivery points
Illustrative Target: Increase average score to 5 by the end of year.
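The service delivery scale in Figure 3 lends itself to a short worked example. The Python sketch below awards one point per service offered and averages the scores across facilities. The facility names and the services they offer are invented for illustration, and the short service labels are paraphrases of the Figure 3 criteria rather than official wording.

```python
# One point per service offered, following the Figure 3 scale.
SERVICES = (
    "routine antenatal care",
    "management of high-risk pregnancies",
    "routine deliveries",
    "management of complicated deliveries",
    "post-partum care",
    "neonatal care",
)


def facility_score(offered):
    """Count how many of the six scale services a facility offers."""
    return sum(1 for service in SERVICES if service in offered)


def average_score(facilities):
    """Average the scale score across a dict of facility -> services offered."""
    scores = [facility_score(offered) for offered in facilities.values()]
    return sum(scores) / len(scores)


# Hypothetical facilities (assumed data, not from the source).
facilities = {
    "Clinic A": {"routine antenatal care", "routine deliveries"},
    "Clinic B": set(SERVICES),
    "Clinic C": {"routine antenatal care", "routine deliveries",
                 "post-partum care", "neonatal care"},
}

TARGET = 5  # illustrative target from Figure 3
current = average_score(facilities)
print(f"average score {current:.1f}; target met: {current >= TARGET}")
```

Scoring each facility against the same fixed list of criteria is what allows the baseline and later measurements to be compared directly.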
Other targets may be concerned
with the quality of expected results.
Such targets can relate to indicators
measuring customer satisfaction,
public opinion, responsiveness rates,
enrollment rates, complaints, or
failure rates.
For example, the
average customer satisfaction score
for registration of a business license
(based on a seven-point scale)
increases to six by the end of the
program, or the percentage of
mothers who return six months
after delivery for postnatal care
increases to 20% by 2011.
Targets relating to cost efficiency or
producing outcomes at the least
expense are typically measured in
terms of unit costs. Examples of
such targets might include: cost of
providing a couple-year-of-protection
is reduced to $10 by
1999 or per-student costs of a
training program are reduced by
20% between 2010 and 2013.
DISAGGREGATING
TARGETS
When a program’s progress is
measured in terms of its effects on
different segments of the population,
disaggregated targets can provide
USAID with nuanced information
that may not be obvious in the
aggregate. For example, a program
may seek to increase the number of
micro-enterprise loans received by
businesses in select rural provinces.
By disaggregating targets, program
inputs can be directed to reach a
particular target group.
Targets can be disaggregated along a
number of dimensions including
gender, location, income level,
occupation, administration level
(e.g., national vs. local), and social
groups.
For USAID programs, performance
management systems must include
gender-sensitive indicators and
sex-disaggregated data when the
technical analyses supporting the
AO or project to be undertaken
demonstrate that:
• The different roles and status of
women and men affect the
activities differently; and
• The anticipated results of the
work would affect women and
men differently.
A gender-sensitive indicator can be
defined as an indicator that captures
gender-related changes in society
over time. For example, a program
may focus on increasing enrollment
of children in secondary education.
Program managers may not only
want to look at increasing
enrollment rates, but also at the gap
between girls and boys. One way to
measure performance would be to
disaggregate the total number of
girls and boys attending school at
the beginning and at the end of the
school year (see Figure 4). Another
indicator might look at the quality of
the participation levels of girls vs.
boys with a target of increasing the
amount of time girls engage in
classroom discussions by two hours
per week.
FIGURE 4. AN EXAMPLE OF
DISAGGREGATED TARGETS
FOR GENDER SENSITIVE
INDICATORS
Indicator: Number of children
graduating from secondary school;
percent gap between boys and girls.
B=boys; G=girls
Year              Planned                   Actual
2010 (baseline)                             145 (115B; 30G) 58.6%
2011              175 (120B; 55G) 50.0%     160 (120B; 40G) 56.3%
2012              200 (120B; 80G) 25.0%     200 (130B; 70G) 30.0%
2013              200 (115B; 92G)           205 (110B; 95G)
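A disaggregated indicator such as the one in Figure 4 can be computed from the separate counts for boys and girls. The sketch below is illustrative only: the figure does not state how its percent gap is calculated, so the formula used here, (boys - girls) / total, is an assumption. It reproduces the 2010 baseline and 2012 actual values printed in the figure.

```python
def percent_gap(boys, girls):
    """Assumed gap formula: difference between boys and girls as a
    percentage of all graduates, rounded to one decimal place."""
    total = boys + girls
    if total == 0:
        raise ValueError("no graduates recorded")
    return round(100 * (boys - girls) / total, 1)


# Actual counts as printed in Figure 4.
baseline_2010 = percent_gap(boys=115, girls=30)
actual_2012 = percent_gap(boys=130, girls=70)
print(baseline_2010, actual_2012)  # 58.6 30.0
```

Whatever formula a program adopts, it should be written down and applied the same way to the baseline, the targets, and the actual data.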
Gender-sensitive indicators can use
qualitative
or
quantitative
methodologies to assess impact
directly on beneficiaries. They can
also be used to assess the
differential impacts of policies,
programs, or practices supported by
USAID on women and men (ADS
201.3.4.3).
Program managers should think
carefully about disaggregates prior
to collecting baseline data and
setting targets.
Expanding the
number of disaggregates can
increase the time and costs
associated with data collection and
analysis.
SETTING TARGETS
Targets
should
be
realistic,
evidence-based,
and
ambitious.
Setting meaningful targets provides
staff, implementing partners, and
stakeholders with benchmarks to
document
progress
toward
achieving results. Targets need to
take
into
account
program
resources,
the
implementation
period, and the development
hypothesis implicit in the results
framework.
PROGRAM RESOURCES
The level of funding, human
resources, material goods, and
institutional capacity contribute to
determining project outputs and
effecting change at different levels of
results and the AO. Increases or
decreases in planned program
resources should be considered
when setting targets.
ASSISTANCE OBJECTIVES
AND RESULTS
FRAMEWORKS
Performance
targets
represent
commitments that USAID AO
Teams make about the level and
timing of results to be achieved by a
program. Determining targets is
easier
when
objectives
and
indicators are within USAID’s
manageable interest.
Where a
result sits in the causal chain, critical
assumptions, and other contributors
to achievement of the AO will affect
targets.
Other key considerations include:
1. Historical Trends: Perhaps
even more important than
examining a single baseline value is
understanding the underlying
historical trend in the indicator
value over time. What pattern of
change has been evident in the past
five to ten years on the performance
indicator? Is there a trend, upward
or downward, that can be drawn
from existing reports, records, or
statistics? Trends are not always a
straight line; there may be a period
during which a program plateaus
before improvements are seen (see
Figure 5).
FIGURE 5. PROGRESS IS NOT ALWAYS A STRAIGHT LINE
While it is easy to establish annual targets by picking an acceptable final
performance level and dividing expected progress evenly in the years between,
such straight-line thinking about progress is often inconsistent with the way
development programs really work. More often than not, no real progress – in
terms of measurable impacts or results – is evident during the start-up period.
Then, in the first stage of implementation, which may take the form of a pilot
test, some but not much progress is made, while the program team adjusts its
approaches. During the final two or three years of the program, all of this early
work comes to fruition. Progress leaps upward, and then rides a steady path at
the end of the program period. If plotted on a graph, it would look like “stair
steps,” not a straight line.
2. Expert Judgments: Another
option is to solicit expert opinions
as to what is possible or feasible
with respect to a particular indicator
and country setting. Experts should
be knowledgeable about the
program area as well as local
conditions. Experts will be familiar
with what is and what is not possible
from a technical and practical
standpoint – an important input for
any target-setting exercise.
3. Research Findings: Similarly,
reviewing development literature,
especially research and evaluation
findings, may help in choosing
realistic targets. In some program
areas, such as population and health,
extensive research findings on
development trends are already
widely available and what is possible
to achieve may be well-known. In
other areas, such as democracy,
research on performance indicators
and trends may be scarce.
4. Stakeholder Expectations:
While targets should be defined on
the basis of an objective assessment
of what can be accomplished given
certain conditions and resources, it
is useful to get input from
stakeholders regarding what they
want, need, and expect from USAID
activities.
What
are
the
expectations of progress? Soliciting
expectations may involve formal
interviews, rapid appraisals, or
informal conversations. Not only
end users should be surveyed;
intermediate
actors
(e.g.,
implementing agency staff) can be
especially useful in developing
realistic targets.
5. Achievement of Similar
Programs: Benchmarking is the
process of comparing or checking
the progress of other similar
programs. It may be useful to
analyze progress of other USAID
Missions or offices, or other
development agencies and partners,
to understand the rate of change
that can be expected in similar
circumstances.
FIGURE 6. BENCHMARKING
One increasingly popular way of
setting targets and comparing
performance is to look at the
achievement of another program or
process by one or a collection of
high-performing organizations.
USAID is contributing to the
development of benchmarks for
programs such as water governance
(http://www.rewab.net), financial
management (www.fdirisk.com) and
health care systems
(www.healthsystems2020.org). Targets
may be set to reflect this “best in the
business” experience, provided of
course that consideration is given to
the comparability of country
conditions, resource availability, and
other factors likely to influence the
performance levels that can be
achieved.
APPROACHES FOR
TARGET SETTING
There is no single best approach to
use when setting targets; the
process is an art and a science.
Although much depends on available
information, the experience and
knowledge of AO Team members
will add to the thinking behind
performance targets.
Alternative
approaches include the following:
1. Projecting a future trend, then
adding the “value added” by USAID
activities.
Probably the most
rigorous and credible approach, this
involves estimating the future trend
without USAID’s program, and then
adding whatever gains can be
expected as a result of USAID’s
efforts. This is no simple task, as
projecting the future can be very
tricky. The task is made somewhat
easier if historical data are available
and can be used to establish a trend
line.
2. Establishing a final performance
target for the end of the planning
period, and then planning the
progress from the baseline level.
This approach involves deciding on
the program’s performance target
for the final year, and then defining a
path of progress for the years in
between.
Final targets may be
judged on benchmarking techniques
or on judgments of experts,
program staff, customers, or
partners about the expectations of
what can be reasonably achieved
within the planning period. When
setting interim targets, remember
that progress is not always a straight
line. All targets, both final and
interim, should be based on a careful
analysis of what is realistic to
achieve, given the stage of program
implementation,
resource
availability,
country
conditions,
technical constraints, etc.
3. Setting annual performance
targets. Similar to the previous
approach, judgments are made
about what can be achieved each
year, instead of starting with a final
performance level and working
backwards. In both cases, consider
variations in performance, e.g.,
seasons and timing of activities and
expected results.
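The first approach above (projecting a future trend, then adding the value added by USAID activities) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a prescribed method: the historical values, the assumed value added per year, and the choice of an ordinary least-squares straight line for the trend are all hypothetical.

```python
def linear_trend(years, values):
    """Fit a straight trend line by ordinary least squares.

    Returns (slope, intercept) so that value is approximately
    intercept + slope * year.
    """
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def targets_with_value_added(years, values, value_added):
    """Project the trend without the program, then add the assumed
    program contribution for each future year."""
    slope, intercept = linear_trend(years, values)
    return {
        year: round(intercept + slope * year + added, 1)
        for year, added in value_added.items()
    }


# Hypothetical history of an indicator (for example, an enrollment rate in %).
history_years = [2006, 2007, 2008, 2009, 2010]
history_values = [50, 52, 54, 56, 58]

# Assumed value added by the program in each target year (judgment call).
assumed_gain = {2011: 0, 2012: 2, 2013: 5}

print(targets_with_value_added(history_years, history_values, assumed_gain))
# {2011: 60.0, 2012: 64.0, 2013: 69.0}
```

A straight line is only one possible trend model. The assumed gains are where expert judgment, research findings, and benchmarking would enter, and they deserve documentation in the PMP.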
DOCUMENT AND
FILE
Typically, USAID project baselines,
targets, and actual data are kept in a
data table for analysis either in the
PMP, as a separate document, or
electronically.
Furthermore, it is important to
document in the PMP how targets
were selected and why target values
were chosen.
Documentation
serves as a future reference for:
• Explaining a target-setting
methodology.
• Analyzing actual performance data.
• Setting targets in later years.
• Responding to inquiries or audits.
For more information:
TIPS publications are available online at [insert website].
Acknowledgements:
Our thanks to those whose experience and insights helped shape this publication, including Gerry Britan and
Subhi Mehdi of USAID’s Office of Management Policy, Budget and Performance (MPBP). This publication was
updated by Jill Tirnauer of Management Systems International.
Comments can be directed to:
Gerald Britan, Ph.D.
Tel: (202) 712-1158
gbritan@usaid.gov
Contracted under RAN-M-00-04-00049-A-FY0S-84
Integrated Managing for Results II
NUMBER 9
2011 Printing
PERFORMANCE MONITORING & EVALUATION
TIPS
CONDUCTING CUSTOMER SERVICE ASSESSMENTS
ABOUT TIPS
These TIPS provide practical advice and suggestions to USAID managers on issues related to
performance monitoring and evaluation. This publication is a supplemental reference to the
Automated Directive Service (ADS) Chapter 203.
WHAT IS A CUSTOMER
SERVICE ASSESSMENT?
Under USAID’s new operations system, Agency operating units are required to
routinely and systematically assess customer needs for, perceptions of, and reactions
to USAID programs. This TIPS gives practical advice about customer service
assessments: for example, when they should be conducted, what methods may
be used, and what information can be usefully included.
A customer service assessment is a management tool for understanding USAID’s programs
from the customer’s perspective. Most often these assessments seek feedback from
customers about a program’s service delivery performance. The Agency seeks views from both
ultimate customers (the end-users, or beneficiaries, of USAID activities, usually disadvantaged
groups) and intermediate customers (persons or organizations using USAID resources,
services, or products to serve the needs of the ultimate customers).
Customer service assessments may also be used to elicit opinions from customers or potential
customers about USAID’s strategic plans, development objectives, or other planning issues.
For example, the operating unit may seek their views on development needs and priorities to
help identify new, relevant activities.
WHY CONDUCT CUSTOMER SERVICE ASSESSMENTS?
USAID’s reengineered operating system calls for regularly conducting customer service
assessments for all program activities. Experience indicates that effective customer feedback
on service delivery improves performance, achieves better results, and creates a more
participatory working environment for programs, and thus increases sustainability.
These assessments provide USAID staff with the information they need for making
constructive changes in the design and execution of development programs. This information
may also be shared with partners and customers as an element in a collaborative, ongoing
relationship. In addition, customer service assessments provide input for reporting on results,
allocating resources, and presenting the operating unit’s development programs to external
audiences.
Customer service assessments are relevant not only to program-funded activities directed to
customers external to USAID. They can also be very useful in assessing services provided to
internal USAID customers.
Moreover, customer service assessments are federally mandated. The Government
Performance and Results Act of 1993 and Executive Order 12862 of 1993 direct federal
agencies to reorient their programs toward achievement of measurable results that reflect
customers’ needs and to systematically assess those needs. Agencies must report annually to
the Administration on customer service performance.
WHO DOES CUSTOMER SERVICE ASSESSMENTS?
USAID guidance specifies that all operating units should develop a customer service plan.
The plan should include information about customers’ needs, preferences, and reactions as an
element in a unit’s planning, achieving, performance monitoring and evaluation functions (see
Box 1). Depending on the scope of its program operations, an operating unit may find it needs
to plan several customer service assessments. The various assessments might be tailored to
different strategic objectives, program activities and services, or customer groups
(differentiated, for example, by gender, ethnicity, or income). Responsibility for designing and
managing these assessments typically is assigned to the relevant development objective.
Box 1. The Customer Service Plan
The customer service plan presents the operating unit’s vision for including customers and
partners to achieve its objectives. It explains how customer feedback will be incorporated to
determine customer needs and perceptions of services provided, and how this feedback will be
regularly incorporated into the unit’s operations. The customer service plan is a management
tool for the operating unit and does not require USAID/W approval. Specifically, the plan
• Identifies the ultimate and intermediate customers for service delivery and segments
customer groups for different programs, products, or services
• Describes and regularly schedules appropriate means for assessing service delivery,
performance, and customer satisfaction
• Establishes service principles and specifies measurable service performance standards
• Indicates staff responsibilities for managing customer service activities, including
assessments
• Specifies the resources required for customer service activities and assessments.
HOW DO CUSTOMER SERVICE ASSESSMENTS COMPLEMENT PERFORMANCE MONITORING
AND EVALUATION?
Performance monitoring and evaluation broadly addresses the results or outcomes of a
program. These results reflect objectives chosen by the operating unit (in consultation with
partners and customer representatives) and may encompass several types of results.
Often they are medium- to longer-term developmental changes or impacts. Examples:
reductions in fertility rates, increases in income, improvements in agricultural yields,
reductions in forest land destroyed.
Another type of result often included in performance monitoring and evaluation involves
customer perceptions and responses to goods or services delivered by a program: for example,
the percentage of women satisfied with the maternity care they receive, or the proportion of
farmers who have tried a new seed variety and intend to use it again. Customer service
assessments look at this type of result: customer satisfaction, perceptions, preferences, and
related opinions about the operating unit’s performance in delivering the program’s products
and services.
Unless the service or product delivery is satisfactory (i.e., timely, relevant, accessible, good
quality) from the perspective of the customers, it is unlikely that the program will achieve its
substantive development results, which, after all, ultimately depend on customers’
participation and use of the service or product. For example, a family planning program is
unlikely to achieve reduced fertility rates unless customers are satisfied with the contraceptive
products it offers and the delivery mechanism it uses to provide them. If not sufficiently
satisfied, customers will simply not use them.
Customer service assessments thus complement broader performance monitoring and
evaluation systems by monitoring a specific type of result: service delivery performance
from the customer’s perspective. By providing managers with information on whether
customers are satisfied with and using a program’s products and services, these assessments
are especially useful for giving early indications of whether longer term substantive
development results are likely to be met.
Both customer service assessments and performance monitoring and evaluation use the same
array of standard social science investigation techniques (surveys, rapid and participatory
appraisal, document reviews, and the like). In some cases, the same survey or rapid appraisal
may even be used to gather both types of information. For example, a survey of customers of
an irrigation program might ask questions about service delivery aspects (e.g., access,
timeliness, quality, use of irrigation water) and questions concerning longer term development
results (e.g., yields, income).
planning the assessment should 1) identify the
purpose and intended uses of the information,
2) clarify the program products or services being assessed, 3) identify the customer groups
involved, and 4) define the issues the study will
address. Moreover, the scope of work typically discusses data collection methods, analysis
techniques, reporting and dissemination plans,
and a budget and time schedule.
STEPS IN CONDUCTING A
CUSTOMER SERVICE
ASSESSMENT
Step 1. Decide when the assessment
should be done.
Customer service assessments should be conducted whenever the operating unit requires
customer information for its management purposes. The general timing and frequency of customer service assessments is typically outlined
in the unit’s customer service plan.
Specific issues to be assessed will vary with the
development objective, program activities under way, socioeconomic conditions, and other
factors. However, customer service assessments generally aim at understanding
• Customer views regarding the importance
Customer service assessments are likely to
of various USAID-provided services (e.g.,
be most effective if they are planned to coortraining, information, commodities, technidinate with critical points in cycles associated
cal assistance) to their own needs and priwith the program being assessed (crop cycles,
orities
local school year cycles, host country fiscal year
cycles, etc.) as well as with the Agency’s own • Customer judgments, based on measurable
annual reporting and funding cycles.
service standards, on how well USAID is
performing service delivery
Customer service assessments will be most
valuable as management and reporting tools if • Customer comparisons of USAID service
they are carried out some months in advance of
delivery with that of other providers.
the operating unit’s annual planning and reporting process. For example, if a unit’s results re- Open-ended inquiry is especially well suited for
view and resources request (R4) report is to be addressing the first issue.The other two may be
completed by February, the customer service measured and analyzed quantitatively or qualiassessment might be conducted in November. tatively by consulting with ultimate or intermediate customers with respect to a number of
However, the precise scheduling and execution service delivery attributes or criteria important
of assessments is a task appropriate for those
responsible for results in a program sector—
members of the strategic objective or results
Box 2.
package team.
Illustrative Criteria For Assessing
Service Delivery
Convenience. Ease of working with the operating unit, simple processes, minimal
red tape, easy physical access to contacts
Responsiveness. Follow up promptly, meet changing needs, solve problems, answer
questions, return calls
Reliability. On-time delivery that is thorough, accurate, complete
Quality of products and services. Perform as intended; flexible in meeting local
needs; professionally qualified personnel
Breadth of choice. Sufficient choices to meet customer needs and preferences
Contact personnel. Professional, knowledgeable, understand local culture, language
skills
Step 2. Design the assessment.
Depending on the scale of the effort, an operating unit may wish to develop a scope of work for
a customer service assessment. At a minimum,
to customer satisfaction (see Box 2).
In more formal surveys, for example, customers may be asked to rate services and products on,
say, a 1-to-5 scale indicating their level of satisfaction with specific service characteristics or
attributes they consider important (e.g., quality, reliability, responsiveness). In addition to rating
the actual services, customers may be asked what they would consider “excellent” service,
referring to the same service attributes and using the same 5-point scale. Analysis of the gap
between what customers expect as an ideal standard and what they perceive they actually
receive indicates the areas of service delivery needing improvement.
In more qualitative approaches, such as focus groups, customers discuss these issues among
themselves while researchers listen carefully to their perspectives. Operating units and teams
should design their customer assessments to collect customer feedback on service delivery
issues and attributes they believe are most important to achieving sustainable results toward
a clearly defined strategic objective. These issues will vary with the nature of the objective
and program activity.
With its objective clearly in mind, and the information to be collected carefully specified, the
operating unit may decide to use in-house resources, external assistance consultants, or a
combination of the two, to conduct the assessment.
Step 3. Conduct the assessment.
Select from a broad range of methods. A customer service assessment is not just a survey. It may
use a broad repertory of inquiry tools designed to elicit information about the needs,
preferences, or reactions of customers regarding a USAID activity, product or service. Methods
may include the following:
• Formal customer surveys
• Rapid appraisal methods (e.g., focus groups, town meetings, interviews with key informants)
• Participatory appraisal techniques, in which customers plan, analyze, self-monitor, evaluate
or set priorities for activities
• Document reviews, including systematic use of social science research conducted by
others.
Use systematic research methods. A hastily prepared and executed effort does not provide
quality customer service assessment information. Sound social science methods are essential.
Practice triangulation. To the extent resources and time permit, it is preferable to gather
information from several sources and methods, rather than relying on just one. Such
triangulation will build confidence in findings and provide adequate depth of information for
good decision-making and program management. In
particular, quantitative surveys and qualitative
studies often complement each other. Whereas
a quantitative survey can produce statistical
measurements of customer satisfaction (e.g.,
with quality, timeliness, or other aspects of a
program operation) that can be generalized
to a whole population, qualitative studies can
provide an in-depth understanding and insight
into customer perceptions and expectations on
these issues.
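The gap analysis described for formal surveys in this step can be illustrated with a short calculation. The following is only a minimal sketch: the attribute names follow the examples in the text, but the ratings, the function name `service_gaps`, and the choice of a simple mean are assumptions made for illustration, not a prescribed USAID method.

```python
from statistics import mean


def service_gaps(expected, perceived):
    """Return {attribute: mean expected rating - mean perceived rating}, largest gap first."""
    gaps = {}
    for attribute in expected:
        gaps[attribute] = mean(expected[attribute]) - mean(perceived[attribute])
    # Sort so the attributes most in need of improvement come first.
    return dict(sorted(gaps.items(), key=lambda item: item[1], reverse=True))


# Hypothetical 1-to-5 ratings from four customers (illustrative data only).
expected = {
    "quality": [5, 5, 4, 5],
    "reliability": [5, 4, 5, 5],
    "responsiveness": [4, 5, 4, 4],
}
perceived = {
    "quality": [4, 4, 4, 5],
    "reliability": [2, 3, 3, 2],
    "responsiveness": [4, 4, 3, 4],
}

gaps = service_gaps(expected, perceived)
for attribute, gap in gaps.items():
    print(f"{attribute}: gap of {gap:.2f} points")
```

In this made-up data set, reliability shows the widest gap between the "excellent" standard and the service customers say they receive, so it would be the first candidate for improvement.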
Conduct assessments routinely. Customer service assessments are designed to be consciously
iterative. In other words, they are undertaken periodically to enable the operating unit to
build a foundation of findings over time to inform management of changing customer needs
and perceptions. Maintaining an outreach orientation will help the program adapt to changing
circumstances as reflected in customer views.
Step 4. Broadly disseminate and use
assessment findings to improve performance.
Customer service assessments gain value when broadly disseminated within the operating unit,
to other operating units active in similar program sectors, to partners, and more widely
within USAID. Sharing this information is also important to maintaining open, transparent
relations with customers themselves.
Assessment findings provide operating unit managers with insight on what is important to
customers and how well the unit is delivering its programs. They also can help identify
operations that need quality improvement, provide early detection of problems, and direct
attention to areas where remedial action may be taken to improve delivery of services.
Customer assessments form the basis for review of and recommitment to service principles.
They enable measurement of service delivery performance against service standards
and encourage closer rapport with customers and partners. Moreover, they encourage a
more collaborative, participatory, and effective approach to achievement of objectives.
Selected Further Reading
H. S. Plunkett and Elizabeth Baltimore, Customer Focus Cookbook, USAID/M/ROR, August 1996.
Resource Manual for Customer Surveys. Statistical Policy Office, Office of Management and
Budget. October 1993.
Zeithaml, Valarie A.; A. Parasuraman; and Leonard L. Berry. Delivering Quality Service. New York:
Free Press
NUMBER 10
2011 Printing
PERFORMANCE MONITORING & EVALUATION
TIPS
CONDUCTING FOCUS GROUP INTERVIEWS
ABOUT TIPS
These TIPS provide practical advice and suggestions to USAID managers on issues related to
performance monitoring and evaluation. This publication is a supplemental reference to the
Automated Directive Service (ADS) Chapter 203.
WHAT IS A FOCUS GROUP
INTERVIEW?
USAID’s guidelines encourage use of rapid, low-cost methods to collect information on the
performance of development assistance activities. Focus group interviews, the subject of
this TIPS, are one such method.
A focus group interview is an inexpensive, rapid appraisal technique that can provide
managers with a wealth of qualitative information on performance of development activities,
services, and products, or other issues. A facilitator guides 7 to 11 people in a discussion
of their experiences, feelings, and preferences about a topic. The facilitator raises issues
identified in a discussion guide and uses probing techniques to solicit views, ideas, and
other information. Sessions typically last one to two hours.
ADVANTAGES AND
LIMITATIONS
This technique has several advantages. It is low cost and provides speedy results. Its flexible
format allows the facilitator to explore unanticipated issues and encourages interaction among
participants. In a group setting participants provide checks and balances, thus minimizing false
or extreme views.
Focus groups have some limitations, however. The flexible format makes it susceptible to
facilitator bias, which can undermine the validity and reliability of findings. Discussions can be
sidetracked or dominated by a few vocal individuals. Focus group interviews generate relevant
qualitative information, but no quantitative data from which generalizations can be made for a
whole population. Moreover, the information can be difficult to analyze; comments should be
interpreted in the context of the group setting.
WHEN ARE FOCUS GROUP
INTERVIEWS USEFUL?
Focus group interviews can be useful in all phases of development activities: planning,
implementation, monitoring, and evaluation. They can be used to solicit views, insights, and
recommendations of program staff, customers, stakeholders, technical experts, or other groups.
They are especially appropriate when:
• program activities are being planned and it is important for managers to understand
customers’ and other stakeholders’ attitudes, preferences or needs
• specific services or outreach approaches have to take into account customers’ preferences
• major program implementation problems cannot be explained
• recommendations and suggestions are needed from customers, partners, experts, or other
stakeholders
For example, focus groups were used to uncover problems in a Nepal family planning program
where facilities were underutilized, and to obtain suggestions for improvements from
customers. The focus groups revealed that rural women considered family planning important.
However, they did not use the clinics because of caste system barriers and the demeaning
manner of clinic staff. Focus group participants suggested appointing staff of the same social
status to ensure that rural women were treated with respect. They also suggested that rural
women disseminate information to their neighbors about the health clinic.
Before deciding whether to use focus group interviews as a source of information, the study
purpose needs to be clarified. This requires identifying who will use the information,
determining what information is needed, and understanding why the information is needed. Once
this is done, an appropriate methodology can be selected. (See Tips 5 Using Rapid Appraisal
Methods for additional information on selecting appraisal techniques.)
STEPS IN CONDUCTING
FOCUS GROUP
INTERVIEWS
Follow this step-by-step advice to help ensure high-quality results.
Step 1. Select the team
Conducting a focus group interview requires a small team, with at least a facilitator to guide
the discussion and a rapporteur to record it. The facilitator should be a native speaker who
can put people at ease. The team should have substantive knowledge of the topic under
discussion.
Skills and experience in conducting focus groups are also important. If the interviews are to
be conducted by members of a broader evaluation team without previous experience in focus
group techniques, training is suggested. This training can take the form of role playing,
formalized instruction on topic sequencing and probing for generating and managing group
discussions, as well as pre-testing discussion guides in pilot groups.
Step 2. Select the participants
First, identify the types of groups and institutions that should be represented (such as
program managers, customers, partners, technical experts, government officials) in the focus
groups. This will be determined by the information needs of the study. Often separate focus
groups are held for each type of group. Second, identify the most suitable people in each group.
One of the best approaches is to consult key informants who know about local conditions.
It is prudent to consult several informants to minimize the biases of individual preferences.
Each focus group should be 7 to 11 people to allow the smooth flow of conversation.
Participants should be homogenous, from similar socioeconomic and cultural backgrounds.
They should share common traits related to the discussion topic. For example, in a discussion
on contraceptive use, older and younger women should participate in separate focus groups.
Younger women may be reluctant to discuss sexual behavior among their elders, especially if
it deviates from tradition. Ideally, people should not know each other. Anonymity lowers
inhibition and prevents formation of cliques.
Step 3. Decide on timing and location
Discussions last one to two hours and should be conducted in a convenient location with
some degree of privacy. Focus groups in a small village arouse curiosity and can result in
uninvited participants. Open places are not good spots for discussions.
Step 4. Prepare the discussion guide
The discussion guide is an outline, prepared in advance, that covers the topics and issues to be
discussed. It should contain few items, allowing some time and flexibility to pursue
unanticipated but relevant issues.
The guide provides the framework for the facilitator to explore, probe, and ask questions.
Initiating each topic with a carefully crafted question will help keep the discussion focused.
Using a guide also increases the comprehensiveness of the data and makes data collection
more efficient. Its flexibility, however, can mean that different focus groups are asked different
questions, reducing the credibility of the findings. An excerpt from a discussion guide used
in Bolivia to assess child survival services provides an illustration. (See the box below.)
Excerpt from a Discussion
Guide on Curative
Health Services
(20-30 minutes)
Q. Who treats/cures your children
when they get sick? Why?
Note: Look for opinions about
• outcomes and results
• provider-user relations
• costs (consultations, transportation, medicine)
• waiting time
• physical aspects (privacy, cleanliness)
• availability of drugs, lab services
• access (distance, availability of transportation)
• follow-up at home
Step 5. Conduct the interview
Establish rapport. Often participants do not know what to expect from focus group
discussions. It is helpful for the facilitator to outline the purpose and format of the
discussion at the beginning of the session, and set the group at ease. Participants should be
told that the discussion is informal, everyone is expected to participate, and divergent views
are welcome.
Phrase questions carefully. Certain types of questions impede group discussions. For example,
yes-or-no questions are one dimensional and do not stimulate discussion. “Why” questions
put people on the defensive and cause them to take “politically correct” sides on controversial
issues.
Open-ended questions are more useful because they allow participants to tell their story
in their own words and add details that can result in unanticipated findings. For example:
• What do you think about the criminal justice system?
• How do you feel about the upcoming national elections?
If the discussion is too broad the facilitator can narrow responses by asking such questions as:
• What do you think about corruption in the criminal justice system?
• How do you feel about the three parties running in upcoming national elections?
Use probing techniques. When participants give incomplete or irrelevant answers, the facilitator
can probe for fuller, clearer responses. A few suggested techniques:
Repeat the question: repetition gives more time to think
Adopt a “sophisticated naivete” posture: convey limited understanding of the issue and ask for
specific details
Pause for the answer: a thoughtful nod or expectant look can convey that you want a fuller
answer
Repeat the reply: hearing it again sometimes stimulates conversation
Ask when, what, where, which, and how questions: they provoke more detailed information
Use neutral comments: “Anything else?” “Why do you feel this way?”
Control the discussion. In most groups a few individuals dominate the discussion. To balance out
participation:
• Address questions to individuals who are reluctant to talk
• Give nonverbal cues (look in another direction or stop taking notes when an individual
talks for an extended period)
• Intervene, politely summarize the point, then refocus the discussion
• Take advantage of a pause and say, “Thank you for that interesting idea, perhaps we can
discuss it in a separate session. Meanwhile with your consent, I would like to move on
to another item.”
Minimize group pressure. When an idea is being adopted without any general discussion or
disagreement, more than likely group pressure is occurring. To minimize group pressure the
facilitator can probe for alternate views. For example, the facilitator can raise another issue, or
say, “We had an interesting discussion but let’s explore other alternatives.”
Step 6. Record the discussion
A rapporteur should perform this function. Tape recordings in conjunction with written
notes are useful. Notes should be extensive and reflect the content of the discussion as well
as nonverbal behavior (facial expressions, hand movements).
Shortly after each group interview, the team should summarize the information, the team’s
impressions, and implications of the information for the study.
Discussion should be reported in participants’ language, retaining their phrases and
grammatical use. Summarizing or paraphrasing responses can be misleading. For instance, a
verbatim reply “Yes, indeed! I am positive,” loses its intensity when recorded as “Yes.”
Step 7. Analyze results
After each session, the team should assemble the interview notes (transcripts of each focus
group interview), the summaries, and any other relevant data to analyze trends and patterns.
The following method can be used.
Read summaries all at one time. Note potential trends and patterns, strongly held or frequently
aired opinions.
Read each transcript. Highlight sections that correspond to the discussion guide questions and
mark comments that could be used in the final report.
Analyze each question separately. After reviewing all the responses to a question or topic, write a
summary statement that describes the discussion. In analyzing the results, the team should
consider:
• Words. Weigh the meaning of words participants used. Can a variety of words and
phrases categorize similar responses?
• Framework. Consider the circumstances in which a comment was made (context of
previous discussions, tone and intensity of the comment).
• Internal agreement. Figure out whether shifts in opinions during the discussion were
caused by group pressure.
• Precision of responses. Decide which responses were based on personal experience and
give them greater weight than those based on vague impersonal impressions.
• The big picture. Pinpoint major ideas. Allocate time to step back and reflect on major
findings.
• Purpose of the report. Consider the objectives of the study and the information
needed for decision-making. The type and scope of reporting will guide the analytical
process. For example, focus group reports typically are: (1) brief oral reports that
highlight key findings; (2) descriptive reports that summarize the discussion; and
(3) analytical reports that provide trends, patterns, or findings and include selected
comments.
Focus Group Interviews of Navrongo Community
Health and Family Planning Project in Ghana
The Ghanaian Ministry of Health launched a small pilot project in three villages
in 1994 to assess community reaction to family planning and elicit community
advice on program design and management. A new model of service delivery
was introduced: community health nurses were retrained as community health
officers living in the communities and providing village-based clinical services.
Focus group discussions were used to identify constraints to introducing family
planning services and clarify ways to design operations that villagers value.
Discussions revealed that many women want more control over their ability
to reproduce, but believe their preferences are irrelevant to decisions made
in the male-dominated lineage system. This indicated that outreach programs
aimed primarily at women are insufficient. Social groups must be included to
legitimize and support individuals’ family-planning decisions. Focus group
discussions also revealed women’s concerns about the confidentiality of
information and services. These findings preclude development of a conventional
community-based distribution program, since villagers clearly prefer outside service
delivery workers to those who are community members.
Selected Further Reading
Krishna Kumar, Conducting Group Interviews in
Developing Countries, A.I.D. Program Design and
Evaluation Methodology Report No. 8, 1987
(PN-AAL-088)
Richard A. Krueger, Focus Groups: A Practical
Guide for Applied Research, Sage Publications,
1988
2009, NUMBER 12
2ND EDITION
PERFORMANCE MONITORING & EVALUATION
TIPS
DATA QUALITY STANDARDS
ABOUT TIPS
These TIPS provide practical advice and suggestions to USAID managers on issues related to
performance monitoring and evaluation. This publication is a supplemental reference to the
Automated Directive System (ADS) Chapter 203.
WHY IS DATA QUALITY IMPORTANT?
Results-focused development programming requires managers to design and implement
programs based on evidence. Since data play a central role in establishing effective
performance management systems, it is essential to ensure good data quality (see Figure 1).
Without this, decision makers do not know whether to have confidence in the data, or
worse, could make decisions based on misleading data. Attention to data quality assists in:
• Ensuring that limited development resources are used as effectively as possible
• Ensuring that Agency program and budget decisions in Washington and the field are as
well informed as practically possible
• Meeting the requirements of the Government Performance and Results Act (GPRA)
• Reporting the impact of USAID programs to external stakeholders, including senior
management, OMB, the Congress, and the public with confidence
Figure 1. Data Quality Plays a Central Role in Developing Effective Performance
Management Systems
Figure labels: Cycle; Plan: Identify or Refine Key Program Objectives; Design: Develop or
Refine the Performance Management Plan; Analyze Data; Use Data: Use Findings from Data
Analysis to Improve Program Effectiveness; Data Quality.
DATA QUALITY STANDARDS
The Five Data Quality Standards
1. Validity
2. Reliability
3. Precision
4. Integrity
5. Timeliness
Data quality is one element of a larger interrelated performance management system. Data
quality flows from a well designed and logical strategic plan where Assistance Objectives
(AOs) and Intermediate Results (IRs) are clearly identified. If a result is poorly defined, it is
difficult to identify quality indicators, and further, without quality indicators, the resulting
data will often have data quality problems.
One key challenge is to determine what level of data quality is acceptable (or “good
enough”) for management purposes. It is important to understand that we rarely require
the same degree of rigor as needed in research or for laboratory experiments. Standards for
data quality must be keyed to our intended use of the data. That is, the level of accuracy,
currency, precision, and reliability of performance information should be consistent with
the requirements of good management. Determining appropriate or adequate thresholds of
indicator and data quality is not an exact science. This task is made even more difficult by
the complicated and often data-poor development settings in which USAID operates.
As with performance indicators, we sometimes have to consider trade-offs, or make
informed judgments, when applying the standards for data quality. This is especially true if,
as is often the case, USAID relies on others to provide data for indicators. For example, if
our only existing source of data for a critical economic growth indicator is the Ministry of
Finance, and we know that the Ministry’s data collection methods are less than perfect, we
may have to weigh the alternatives of relying on less-than-ideal data, having no data at all,
or conducting a potentially costly USAID-funded primary data collection effort. In this case,
a decision must be made as to whether the Ministry’s data would allow the Assistance
Objective team to have confidence when assessing program performance or whether they
are so flawed as to be useless, or perhaps misleading, in reporting and managing for results.
The main point is that managers should not let the ideal drive out the good.
1. VALIDITY
Validity refers to the extent to which a measure actually represents what we intend to
measure.1
Though simple in principle, validity can be difficult to assess in practice, particularly when
measuring social phenomena. For example, how can we measure political power or
sustainability? Is the poverty gap a good measure of the extent of a country’s poverty?
However, even valid indicators have little value, if the data collected do not correctly
measure the variable or characteristic encompassed by the indicator. It is quite possible, in
other words, to identify valid indicators but to then collect inaccurate, unrepresentative, or
incomplete data. In such cases, the quality of the indicator is moot. It would be equally
undesirable to collect good data for an invalid indicator.
1
This criterion is closely related to “directness” criteria for indicators.
There are a number of ways to organize or present concepts related to data validity. In the
USAID context, we focus on three key dimensions of validity that are most often relevant to
development programming, including: face validity, attribution, and measurement error.
FACE VALIDITY
Face validity means that an outsider or an expert in the field would agree that the data are a
true measure of the result. For data to have high face validity, the data must be true
representations of the indicator, and the indicator must be a valid measure of the result. For
example:
Result: Increased household income in a target district
Indicator: Value of median household income in the target district
In this case, the indicator has a high degree of face validity when compared to the result.
That is, an external observer is likely to agree that the data measure the intended objective.
On the other hand, consider the following example:
Result: Increased household income in a target district
Indicator: Number of houses in the target community with tin roofs
This example does not appear to have a high degree of face validity as a measure of
increased income, because it is not immediately clear how tin roofs are related to increased
income. The indicator above is a proxy indicator for increased income. Proxy indicators
measure results indirectly, and their validity hinges on the assumptions made to relate the
indicator to the result. If we assume that 1) household income data are too costly to obtain
and 2) research shows that when the poor have increased income, they are likely to spend it
on tin roofs, then this indicator could be an appropriate proxy for increased income.
ATTRIBUTION
Attribution focuses on the extent to which a change in the data is related to USAID
interventions. The concept of attribution is discussed in detail as a criterion for indicator
selection, but reemerges when assessing validity. Attribution means that changes in the data
can be plausibly associated with USAID interventions. For example, an indicator that
measures changes at the national level is not usually appropriate for a program targeting a
few areas or a particular segment of the population. Consider the following:
Result: Increased revenues in targeted municipalities.
Indicator: Number of municipalities where tax revenues have increased by 5%.
In this case, assume that increased revenues are measured among all municipalities
nationwide, while the program only focuses on a targeted group of municipalities. This
means that the data would not be a valid measure of performance because the overall result
is not reasonably attributable to program activities.
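The difference between a nationwide count and a count limited to targeted municipalities can be shown with a small calculation. This is a minimal sketch under stated assumptions: the municipality names, revenue figures, and the function `count_with_increase` are hypothetical, and "increased by 5%" is read here as "increased by at least 5%".

```python
def count_with_increase(revenues, municipalities, threshold=0.05):
    """Count municipalities whose revenue grew by at least `threshold` (a fraction)."""
    count = 0
    for name in municipalities:
        baseline, current = revenues[name]
        if baseline > 0 and (current - baseline) / baseline >= threshold:
            count += 1
    return count


# Hypothetical (baseline, current) tax revenues; names and figures are illustrative.
revenues = {
    "A": (100.0, 110.0),
    "B": (200.0, 204.0),
    "C": (150.0, 165.0),
    "D": (80.0, 90.0),
    "E": (120.0, 121.0),
}
targeted = ["A", "B"]  # municipalities the program actually works in (assumed)

nationwide_count = count_with_increase(revenues, revenues.keys())
targeted_count = count_with_increase(revenues, targeted)

print(f"Nationwide: {nationwide_count} municipalities met the threshold")
print(f"Targeted only: {targeted_count} municipalities met the threshold")
```

With these made-up figures the nationwide count is three but only one targeted municipality met the threshold, so reporting the nationwide figure would overstate what can plausibly be associated with the program.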
MEASUREMENT ERROR
Measurement error results primarily from the poor design or management of data
collection processes. Examples include leading questions, unrepresentative sampling, or
inadequate training of data collectors. Even if data have high face validity, they still might be
an inaccurate measure of our result due to bias or error in the measurement process.
Judgments about acceptable measurement error should reflect technical assessments about
what level of reductions in measurement error are possible and practical. This can be
assessed on the basis of cost as well as management judgments about what level of
accuracy is needed for decisions.
Some degree of measurement error is inevitable, particularly when dealing with social and
economic changes, but the level of measurement error associated with all performance data
collected or used by operating units should not be so large as to 1) call into question either
the direction or degree of change reflected by the data or 2) overwhelm the amount of
anticipated change in an indicator (making it impossible for managers to determine whether
progress reflected in the data is a result of actual change or of measurement error). The two
main sources of measurement error are sampling and non-sampling error.
Sampling Error (or representativeness)
Data are said to be representative if they accurately reflect the population they are intended
to describe. The representativeness of data is a function of the process used to select a
sample of the population from which data will be collected.
It is often not possible, or even desirable, to collect data from every individual, household,
or community involved in a program due to resource or practical constraints. In these cases,
data are collected from a sample to infer the status of the population as a whole. If we are
interested in describing the characteristics of a country’s primary schools, for example, we
would not need to examine every school in the country. Depending on our focus, a sample
of a hundred schools might be enough. However, when the sample used to collect data is
not representative of the population as a whole, significant bias can be introduced into the
data. For example, if we only use data from 100 schools in the capital area of the country,
our data will not likely be representative of all primary schools in the country.
Drawing a sample that will allow managers to confidently generalize data/findings to the
population requires that two basic criteria are met: 1) that all units of a population (e.g.,
households, schools, enterprises) have an equal chance of being selected for the sample and
2) that the sample is of adequate size. The sample size necessary to ensure that resulting
data are representative to any specified degree can vary substantially, depending on the unit
of analysis, the size of the population, the variance of the characteristics being tracked, and
the number of characteristics that we need to analyze. Moreover, during data collection it is
rarely possible to obtain data for every member of an initially chosen sample. Rather, there
are established techniques for determining acceptable levels of non-response or for
substituting new respondents.
If a sample is necessary, it is important for managers to consider the sample size and
method relative to the data needs. While data validity should always be a concern, there
may be situations where accuracy is a particular priority. In these cases, it may be useful to
consult a sampling expert to ensure the data are representative.
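The first criterion above, that every unit has an equal chance of being selected, is what a simple random sample provides. The sketch below is only an illustration: the list of 2,000 school identifiers, the sample size of 100, and the function name `simple_random_sample` are assumptions, and the code does not address the second criterion (whether the sample size is adequate), which is a separate judgment.

```python
import random


def simple_random_sample(units, sample_size, seed=None):
    """Draw `sample_size` units without replacement, each with an equal chance of selection."""
    if sample_size > len(units):
        raise ValueError("sample_size cannot exceed the number of units")
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible for audit
    return rng.sample(units, sample_size)


# Hypothetical sampling frame: every primary school in the country, not just the capital.
schools = [f"school-{i:04d}" for i in range(1, 2001)]

sample = simple_random_sample(schools, 100, seed=1)
print(f"Selected {len(sample)} of {len(schools)} schools")
```

Drawing from a frame that lists all schools, rather than only those in the capital area, is what guards against the bias described in the text; recording the seed lets another analyst reproduce the same draw.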
Non-Sampling Error
Non-sampling error includes poor design of the data collection instrument, poorly trained
or partisan enumerators, or the use of questions (often related to sensitive subjects) that
elicit incomplete or untruthful answers from respondents. Consider the earlier example:
Result: Increased household income in a target district
Indicator: Value of median household income in the target district
While these data appear to have high face validity, there is the potential for significant
measurement error through reporting bias. If households are asked about their income,
they might be tempted to under-report income to demonstrate the need for additional
assistance (or over-report to demonstrate success). A similar type of reporting bias may
occur when data are collected in groups or with observers, as respondents may modify their
responses to match group or observer norms. This can be a particular source of bias when
collecting data on vulnerable groups. Likewise, survey or interview questions and
sequencing should be developed in a way that minimizes the potential for the leading of
respondents to predetermined responses. In order to minimize non-sampling measurement
error, managers should carefully plan and vet the data collection process with a careful eye
towards potential sources of bias.
Minimizing Measurement Error
Keep in mind that USAID is primarily concerned with learning, with reasonable confidence,
that anticipated improvements have occurred, not with reducing error below some arbitrary
level.2 Since it is impossible to completely eliminate measurement error, and reducing error
tends to become increasingly expensive or difficult, it is important to consider what an
acceptable level of error would be. Unfortunately, there is no simple standard that can be
applied across all of the data collected for USAID’s varied programs and results. As
performance management plans (PMPs) are developed, teams should:
• Identify the existing or potential sources of error for each indicator and document this in
the PMP.
• Assess how this error compares with the magnitude of expected change. If the anticipated
change is less than the measurement error, then the data are not valid.
• Decide whether alternative data sources (or indicators) need to be explored as better
alternatives or to complement the data to improve data validity.
2
For additional information, refer to Common Problems/Issues with Using Secondary Data in
the CDIE Resource Book on Strategic Planning and Performance Monitoring, April 1997.
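The second check in the list above, comparing estimated error with the magnitude of expected change, can be sketched as a simple screening step. The indicator names, figures, and the function `change_is_detectable` are hypothetical; treating a change exactly equal to the error as not detectable is also an assumption made here, since the text only addresses the case where the change is smaller.

```python
def change_is_detectable(expected_change, measurement_error):
    """True when the anticipated change is larger than the estimated measurement error."""
    return abs(expected_change) > abs(measurement_error)


# Hypothetical PMP entries: (indicator, expected change, estimated measurement error),
# with change and error expressed in the same units for each indicator.
indicators = [
    ("median household income (% change)", 10.0, 3.0),
    ("school enrollment rate (percentage points)", 2.0, 4.0),
]

flagged = [
    name
    for name, change, error in indicators
    if not change_is_detectable(change, error)
]

for name in flagged:
    print(f"Review data source or indicator: {name}")
```

An indicator that is flagged this way is a candidate for the third step in the list: looking for an alternative or complementary data source.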
2. RELIABILITY
Data should reflect stable and
consistent
data
collection
processes
and
analysis
methods over time.
Reliability is important so that
changes in data can be
recognized as true changes
rather than reflections of poor
or changed data collection
methods. For example, if we
use
a
thermometer
to
measure a child’s temperature
repeatedly and the results
vary from 95 to 105 degrees,
even though we know the
child’s temperature hasn’t
changed, the thermometer is
5
not a reliable instrument for
measuring fever.
In other
words, if a data collection
process is unreliable due to
changes in the data collection
instrument,
different
implementation across data
collectors, or poor question
choice, it will be difficult for
managers to determine if
changes in data over the life
of the project reflect true
changes or random error in
the data collection process.
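The thermometer example can be sketched in a few lines of Python. This is a minimal illustration only: the function names, the sample readings, and the one-degree tolerance are assumptions, not part of USAID guidance.

```python
def reading_spread(readings):
    """Return the gap between the highest and lowest repeated reading."""
    return max(readings) - min(readings)


def is_reliable(readings, tolerance):
    """True when repeated readings of an unchanged quantity stay within tolerance."""
    return reading_spread(readings) <= tolerance


# Hypothetical repeated readings (degrees) of a child whose temperature has not changed.
thermometer = [95, 101, 105, 98]

print(reading_spread(thermometer))
print(is_reliable(thermometer, tolerance=1))
```

The same check applies to any instrument or survey question: repeat the measurement when no real change is expected and see how far the results drift.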
Consider
the
following
examples:
Indicator: Percent
increase in income
among target
beneficiaries.
The first year, the project
reports
increased
total
income, including income as a
result of off-farm resources.
The second year a new
manager is responsible for
data collection, and only farm
based income is reported.
The third year, questions arise
as to how “farm based
income” is defined. In this
case, the reliability of the data
comes into question because
managers are not sure
whether changes in the data
are due to real change or
changes in definitions. The
following is another example:
Indicator: Increased
volume of agricultural
commodities sold by
farmers.
A scale is used to measure
volume of agricultural
commodities sold in the
market. The scale is jostled
around in the back of the
truck. As a result, it is no
longer properly calibrated at
each stop. Because of this,
the scale yields unreliable
data, and it is difficult for
managers to determine
whether changes in the data
truly reflect changes in
volume sold.
What’s the Difference
Between Validity and
Reliability?
Validity refers to the
extent to which a
measure actually
represents what we
intend to measure.
Reliability refers to the
stability of the
measurement process.
That is, assuming there is
no real change in the
variable being measured,
would the same
measurement process
provide the same result if
the process were
repeated over and over?
3. PRECISION
Precise data have a sufficient
level of detail to present a fair
picture of performance and
enable management decision-making.
The level of precision or detail
reflected in the data should be
smaller (or finer) than the
margin of error, or the tool of
measurement is considered
too imprecise. For some
indicators, for which the
magnitude
of
expected
change is large, even relatively
large measurement errors may
be perfectly tolerable; for
other
indicators,
small
amounts of change will be
important and even moderate
levels of measurement error
will be unacceptable.
Example: The number of
politically
active
nongovernmental organizations
(NGOs) is 900. Preliminary
data show that after a few
years this had grown to
30,000 NGOs. In this case, a
10 percent measurement error
(+/- 3,000 NGOs) would be
essentially
irrelevant.
Similarly, it is not important to
know precisely whether there
are 29,999 or 30,001 NGOs. A
less precise level of detail is
still sufficient to be confident
in the magnitude of change.
Consider
an
alternative
scenario. If the second data
point is 1,000, a 10 percent
measurement error (+/- 100)
would
be
completely
unacceptable
because
it
would represent all of the
apparent change in the data.
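The arithmetic behind the two NGO scenarios can be checked with a short Python sketch. The function name and the rule of comparing the apparent change against the error band are illustrative assumptions; the error band is taken as a percentage of the latest value, which matches the +/- figures quoted in the example.

```python
def assess_precision(baseline, latest, error_percent):
    """Compare the apparent change with the measurement error band.

    Assumption: the error band is error_percent of the latest value,
    as in the +/- 3,000 and +/- 100 figures in the text.
    """
    change = abs(latest - baseline)
    margin = latest * error_percent / 100
    return {"change": change, "margin": margin, "tolerable": change > margin}


large = assess_precision(900, 30_000, 10)  # first scenario
small = assess_precision(900, 1_000, 10)   # alternative scenario

print(large)
print(small)
```

In the first scenario the change dwarfs the error band, so coarse data are adequate. In the second, the error band is as large as the change itself, so the data cannot support a conclusion.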
4. INTEGRITY
Integrity focuses on whether
there is improper manipulation
of data.
Data that are collected,
analyzed and reported should
have established mechanisms
in
place
to
reduce
manipulation.
There are
generally two types of issues
that affect data integrity. The
first is transcription error. The
second, and somewhat more
complex issue, is whether
there is any incentive on the
part of the data source to
manipulate the data for
political or personal reasons.
Transcription Error
Transcription error refers to
simple data entry errors made
when transcribing data from
one document (electronic or
paper) or database to another.
Transcription
error
is
avoidable,
and
Missions
should seek to eliminate any
such error when producing
internal or external reports
and other documents. When
the data presented in a
document produced by an
operating unit are different
from the data (for the same
indicator and time frame)
presented in the original
source simply because of data
entry or copying mistakes, a
transcription
error
has
occurred.
Such differences
(unless due to rounding) can
be easily avoided by careful
cross-checking of data against
the original source. Rounding
may result in a slight
difference from the source
data but may be readily
justified when the underlying
data do not support such
specificity, or when the use of
the data does not benefit
materially from the originally
reported level of detail. (For
example, when making cost or
budget
projections,
we
typically
round
numbers.
When we make payments to
vendors, we do not round the
amount
paid
in
the
accounting ledger. Different
purposes can accept different
levels of specificity.)
Technology can help to
reduce transcription error.
Systems can be designed so
that the data source can enter
data directly into a database—
reducing the need to send in a
paper report that is then
entered into the system.
However, this requires access
to computers and reliable
internet services. Additionally,
databases can be developed
with internal consistency or
range checks to minimize
transcription errors.
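A range check and an internal consistency check of the kind mentioned above might look like the following Python sketch. The field names (`enrolled`, `dropped_out`), the limits, and the function name are hypothetical and chosen only for illustration.

```python
# Hypothetical plausible limits for each field in a school data-entry form.
RANGES = {"enrolled": (0, 5000), "dropped_out": (0, 5000)}


def validate_record(record, ranges=RANGES):
    """Return a list of problems found in one data-entry record."""
    problems = []
    # Range check: each value must be present and within plausible limits.
    for field, (low, high) in ranges.items():
        value = record.get(field)
        if value is None:
            problems.append(f"{field}: missing")
        elif not (low <= value <= high):
            problems.append(f"{field}: {value} is outside {low}-{high}")
    # Internal consistency check: dropouts cannot exceed enrollment.
    enrolled = record.get("enrolled")
    dropped = record.get("dropped_out")
    if enrolled is not None and dropped is not None and dropped > enrolled:
        problems.append("dropped_out exceeds enrolled")
    return problems


print(validate_record({"enrolled": 120, "dropped_out": 150}))
```

Checks like these catch many keying mistakes at the point of entry, but they do not replace cross-checking reported figures against the original source.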
The use of preliminary or
partial data should not be
confused with transcription
error. There are times when
it makes sense to use partial
data (clearly identified as
preliminary or partial) to
inform management decisions
or to report on performance
because these are the best
data currently available. When
preliminary or partial data are
updated by the original
source, USAID should quickly
follow suit, and note that it
has done so. Any discrepancy
between preliminary data
included in a dated USAID
document and data that were
subsequently updated in an
original source does not
constitute transcription error.
Manipulation
A somewhat more complex
issue is whether data are
manipulated.
Manipulation
should be considered 1) if
there may be incentive on the
part of those that report data
to skew the data to benefit
the project or program and
managers suspect that this
may be a problem, 2) if
managers
believe
that
numbers
appear
to
be
unusually favorable, or 3) if
the data are of high value and
managers want to ensure the
integrity of the data.
There are a number of ways in
which managers can address
manipulation.
First, simply
understand the data collection
process. A well organized and
structured process is less likely
to be subject to manipulation
because each step in the
process is clearly documented
and handled in a standard
way. Second, be aware of
potential issues. If managers
have reason to believe that
data are manipulated, then
they should further explore
the issues. Managers can do
this by periodically spot
checking or verifying the data.
This establishes a principle
that the quality of the data is
important and helps to
determine
whether
manipulation is indeed a
problem.
If there is
substantial concern about this
issue,
managers
might
conduct a Data Quality
Assessment (DQA) for the AO,
IR, or specific data in question.
Example: A project assists
the Ministry of Water to
reduce
water
loss
for
agricultural use. The Ministry
reports key statistics on water
loss to the project. These
statistics are critical for the
Ministry, the project and
USAID to understand program
performance. Because of the
importance of the data, a
study is commissioned to
examine data quality and
more specifically whether
there is any tendency for the
data to be inflated. The study
finds that there is a very slight
tendency to inflate the data,
but it is within an acceptable
range.
5. TIMELINESS
Data should be available and
up to date enough to meet
management needs.
There are two key aspects of
timeliness. First, data must be
available frequently enough
to influence management
decision
making.
For
performance indicators for
which annual data collection is
not practical, operating units
will collect data regularly, but
at longer time intervals.
Second, data should be
current or, in other words,
sufficiently up to date to be
useful in decision-making. As
a general guideline, data
should lag no more than three
years.
Certainly, decision-making should be informed
by the most current data that
are
practically
available.
Frequently,
though,
data
obtained from a secondary
source, and at times even
USAID-funded primary data
collection,
will
reflect
substantial time lags between
initial data collection and final
analysis and publication. Many
of these time lags are
unavoidable,
even
if
considerable
additional
resources
were
to
be
expended.
Sometimes
preliminary estimates may be
obtainable, but they should be
clearly flagged as such and
replaced as soon as possible
as the final data become
available from the source.
The
following
example
demonstrates issues related to
timeliness:
Result: Primary school
attrition in a targeted
region reduced.
Indicator: Rate of
student attrition at
targeted schools.
In August 2009, the Ministry
of Education published full
enrollment analysis for the
2007 school year.
In this case, currency is a
problem because there is a 2
year time lag for these data.
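The lag in this example can be computed with a small Python sketch. The function names and the adjustable threshold are assumptions for illustration. The three-year default reflects the general guideline stated earlier, and a team may set a tighter limit where, as in this attrition example, a shorter lag already limits the usefulness of the data.

```python
def lag_in_years(data_year, publication_year):
    """Years between the period the data describe and their publication."""
    return publication_year - data_year


def within_guideline(data_year, publication_year, max_lag_years=3):
    """True when the lag does not exceed the chosen limit (default: three years)."""
    return lag_in_years(data_year, publication_year) <= max_lag_years


# Enrollment analysis for the 2007 school year, published in 2009.
print(lag_in_years(2007, 2009))
print(within_guideline(2007, 2009))
print(within_guideline(2007, 2009, max_lag_years=1))
```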
While it is optimal to collect
and report data based on the
U.S. Government fiscal year,
there are often a number of
practical challenges in doing
so. We recognize that data
may come from preceding
calendar or fiscal years.
Moreover, data often measure
results for the specific point in
time that the data were
collected, not from September
to September, or December to
December.
Often the realities of the
recipient country context will
dictate the appropriate timing
of the data collection effort,
rather than the U.S. fiscal year.
For example, if agricultural
yields are at their peak in July,
then data collection efforts to
measure yields should be
conducted in July of each
year. Moreover, to the extent
that
USAID
relies
on
secondary data sources and
partners for data collection,
we may not be able to dictate
exact timing.
ASSESSING DATA
QUALITY
Approaches and steps for how
to assess data quality are
discussed in more detail in
TIPS 18: Conducting Data
Quality Assessments. USAID
policy requires managers to
understand the strengths and
weaknesses of the data they
use on an on-going basis. In
addition, a Data Quality
Assessment (DQA) must be
conducted at least once every
3 years for those data
reported to Washington (ADS
203.3.5.2).
For more information:
TIPS publications are available online at [insert website]
Acknowledgements:
Our thanks to those whose experience and insights helped shape this publication including Gerry Britan
and Subhi Mehdi of USAID’s Office of Management Policy, Budget and Performance (MPBP). This
publication was updated by Michelle Adams-Matson of Management Systems International (MSI).
Comments regarding this publication can be directed to:
Gerald Britan, Ph.D.
Tel: (202) 712-1158
gbritan@usaid.gov
Contracted under RAN-M-00-04-00049-A-FY0S-84
Integrated Managing for Results II
NUMBER 13
2ND EDITION, 2010 DRAFT
PERFORMANCE MONITORING & EVALUATION
TIPS
BUILDING A RESULTS FRAMEWORK
ABOUT TIPS
These TIPS provide practical advice and suggestions to USAID managers on issues related to
performance monitoring and evaluation. This publication is a supplemental reference to the
Automated Directive System (ADS) Chapter 203.
WHAT IS A RESULTS
FRAMEWORK?
The Results Framework (RF) is a
graphic representation of a
strategy to achieve a specific
objective that is grounded in
cause-and-effect logic.
The RF
includes the Assistance Objective
(AO) and Intermediate Results
(IRs), whether funded by USAID
or partners, necessary to achieve
the objective (see Figure 1 for an
example). The RF also includes
the critical assumptions that must
hold true for the strategy to
remain valid.
The Results Framework represents
a development hypothesis or a
theory about how intended
change will occur. The RF shows
how the achievement of lower
level objectives (IRs) leads to the
achievement of the next higher
order of objectives, ultimately
resulting in the AO.
In short, a person looking at a
Results Framework should be
able to understand the basic
theory for how key program
objectives will be achieved. The
Results
Framework
is
an
important tool because it helps
managers identify and focus on
key objectives within a complex
development environment.
A RESULTS FRAMEWORK
INCLUDES:
• An Assistance Objective (AO)
• Intermediate Results (IRs)
• Hypothesized cause-and-effect linkages
• Critical Assumptions
What’s the Difference
Between a Results Framework
and the Foreign Assistance
Framework (FAF)?
In one word, accountability. The
results framework identifies an
objective that a Mission or Office
will be held accountable for
achieving in a specific country or
program environment. The
Foreign Assistance Framework
outlines broad goals and
objectives (e.g. Peace and
Security) or, in other words,
programming categories.
Achievement of Mission or
Office AOs should contribute to
those broader FAF objectives.
WHY IS THE RESULTS
FRAMEWORK
IMPORTANT?
The development of a Results
Framework represents an
important first step in forming
the actual strategy. It facilitates
analytic thinking and helps
program managers gain clarity
around key objectives.
Ultimately, it sets the foundation
not only for the strategy, but also
for numerous other management
and
planning
functions
downstream, including project
design, monitoring, evaluation,
and program management. To
summarize,
the
Results
Framework:
• Provides an opportunity to
build consensus and ownership
around shared objectives not
only among AO team members
but also, more broadly, with
host-country representatives,
partners, and stakeholders.
• Facilitates agreement with
other actors (such as
USAID/Washington, other USG
entities, the host country, and
other donors) on the expected
results and resources necessary
to achieve those results. The
AO is the focal point of the
agreement between
USAID/Washington and the
Mission. It is also the basis for
Assistance Agreements
(formerly called Strategic
Objective Assistance
Agreements).
• Functions as an effective
communication tool because it
succinctly captures the key
elements of a program’s intent
and content.
• Establishes the foundation to
design monitoring and
evaluation systems.
Information from performance
monitoring and evaluation
systems should also inform the
development of new RFs.
• Identifies the objectives that
drive project design.
In order to be an effective tool, a
Results Framework should be
current. RFs should be revised
when 1) results are not achieved
or are completed sooner than
expected, 2) critical assumptions
are no longer valid, 3) the
underlying development theory
must be modified, or 4) critical
problems with policy, operations,
or resources were not adequately
recognized.
KEY CONCEPTS
THE RESULTS FRAMEWORK
IS PART OF A BROADER
STRATEGY
While the Results Framework is
one of the core elements of a
strategy, it alone does not
constitute a complete strategy.
Typically it is complemented by
narrative that further describes
the thinking behind the RF, the
relationships
between
the
objectives, and the identification
of synergies. As a team develops
the RF, broader strategic issues
should be considered, including
the following:
• What has led the team to
propose the Results
Framework?
• What is strategic about what is
being proposed (that is, does it
reflect a comparative
advantage or a specific niche)?
• What are the main strategic
issues?
• What is different in the new
strategy when compared to the
old?
• What synergies emerge? How
are cross-cutting issues
addressed? How can these
issues be tackled in project
level planning and
implementation?
THE UNDERPINNING OF THE
RESULTS FRAMEWORK
A good Results Framework is not
only based on logic. It draws on
analysis, standard theories in a
technical
sector,
and
the
expertise
of
on-the-ground
managers.
Supporting Analysis
Before developing a Results
Framework, the team should
determine what analysis exists
and what analysis must yet be
completed
to
construct
a
development hypothesis with a
reasonable level of confidence.
Evaluations
constitute
an
important source of analysis,
identify important lessons from
past programs, and may explore
the validity of causal linkages that
can be used to influence future
programming.
Analysis of past
performance monitoring data is
also an important source of
information.
[Figure 2. Setting the Context for
Participation. Labeled elements:
External Forces (Host Country
Strategy), USAID Mission/Vision,
Internal Capacity, and the “fit”
where they meet.]
Sectors, particularly those that
USAID has worked in for some
time, often identify a set of
common elements that constitute
theories for how to accomplish
certain
objectives.
These
common elements form a basic
“template” of sorts to consider in
developing an RF. For example,
democracy
and
governance
experts often refer to addressing
supply and demand.
Supply
represents
the
ability
of
government to play its role
effectively or provide effective
services. Demand represents the
ability of civil society to demand
or
advocate
for
change.
Education
generally
requires
improved quality in teaching and
curriculum,
community
engagement,
and
adequate
facilities. Health often requires
improved quality of services, as
well as access to, and greater
awareness of, those services.
An understanding of these
common strategic elements is
useful because they lay out a
standard set of components that
a team must consider in
developing a good RF. Although
not all of these elements will
apply to all countries in the same
way, they form a starting point to
inform the team’s thinking. As
the team makes decisions about
what (or what not) to address,
this becomes a part of the logic
that is presented in the narrative.
Technical experts can assist teams
in understanding standard sector
theories. In addition, a number
of USAID publications outline
broader sector strategies or
provide guidance on how to
develop strategies in particular
technical areas1.
On-the-Ground Knowledge
and Experience
Program managers are an
important source of knowledge
on the unique program or
in-country factors that should be
considered in the development of
the Results Framework. They are
best able to examine different
types of information, including
analyses and standard sector
theories, and tailor a strategy for
a specific country or program
environment.
1 Examples include: Hansen,
Gary. 1996. Constituencies for
Reform: Strategic Approaches for
Donor-Supported Civic Advocacy
Groups; and USAID. 2008. Securing
the Future: A Strategy for
Economic Growth.
PARTICIPATION AND
OWNERSHIP
Development
of
a
Results
Framework presents an important
opportunity for USAID to engage
its own teams, the host country,
civil society, other donors, and
other
partners
in
defining
program objectives. Experience
has shown that a Results
Framework built out of a
participatory process results in a
more effective strategy.
Recent donor commitments to
the Paris Declaration and the
Accra Agenda for Action reinforce
these points. USAID has agreed
to increase ownership, align
systems
with
country-led
strategies, use partner systems,
harmonize aid efforts, manage for
development
results,
and
establish mutual accountability.
Common questions include,
“how do we manage
participation?” or “how do we
avoid raising expectations that
we cannot meet?”
One
approach for setting the context
for effective participation is to
simply set expectations with
participants before engaging in
strategic discussions. In essence,
USAID is looking for the
“strategic fit” (see Figure 2). That
is, USAID seeks the intersection
between what the host country
wants, what USAID is capable of
delivering, and the vision for the
program.
WHOLE-OF-GOVERNMENT
APPROACHES
Efforts are underway to institute
planning processes that take into
account the U.S. Government’s
overall approach in a particular
country.
A whole-of-government
approach may
identify larger goals or objectives
to which many USG entities
contribute.
Essentially, those
objectives would be at a higher
level or above the level of
accountability of any one USG
agency alone. USAID Assistance
Objectives
should
clearly
contribute to those larger goals,
but also reflect what the USAID
Mission can be held accountable
for within a specified timeframe
and within budget parameters.
The
whole-of-government
approach may be reflected at a
lower level in the Results
Framework as well.
The RF
provides flexibility to include the
objectives of other
actors (whether other USG
entities, donors, the host country,
or other partners) where the
achievement of those objectives
is essential for USAID to achieve
its AO. For example, if a
program achieves a specific
objective that contributes to
USAID’s AO, it should be
reflected as an IR. This can
facilitate greater coordination of
efforts.
GUIDELINES FOR CONSTRUCTING AOs AND IRs
AOs and IRs should be:
• Results Statements. AOs and IRs should express an outcome, in other words,
the results of actions, not the actions or processes themselves. For example,
the statement “increased economic growth in target sectors” is a result, while
the statement “increased promotion of market-oriented policies” is more
process oriented.
• Clear and Measurable. AOs and IRs should be stated clearly and precisely, and
in a way that can be objectively measured. For example, the statement
“increased ability of entrepreneurs to respond to an improved policy, legal,
and regulatory environment” is both ambiguous and subjective. How one
defines or measures “ability to respond” to a changing policy environment is
unclear and open to different interpretations. A more precise and measurable
results statement in this case is “increased level of investment.” It is true that
USAID often seeks results that are not easily quantified. In these cases, it is
critical to define what exactly is meant by key terms. For example, what is
meant by “improved business environment”? As this is discussed, appropriate
measures begin to emerge.
• Unidimensional. AOs or IRs ideally consist of one clear overarching objective.
The Results Framework is intended to represent a discrete hypothesis with
cause-and-effect linkages. When too many dimensions are included, that
function is lost because lower level results do not really “add up” to higher
level results. Unidimensional objectives permit a more straightforward
assessment of performance. For example, the statement “healthier, better
educated, higher-income families” is an unacceptable multidimensional result
because it includes diverse components that may not be well-defined and
may be difficult to manage and measure. There are limited exceptions. It may
be appropriate for a result to contain more than one dimension when the
result is 1) achievable by a common set of mutually-reinforcing Intermediate
Results or 2) implemented in an integrated manner (ADS 201.3.8).
THE LINKAGE TO PROJECTS
The RF should form the
foundation for project planning.
Project teams may continue to
flesh out the Results Framework
in further detail or may use the
Logical Framework2. Either way,
all projects and activities should
be designed to accomplish the
AO and some combination of one
or more IRs.
2
The Logical Framework (or
logframe for short) is a project
design tool that complements the
Results Framework. It is also
based on cause-and-effect
linkages. For further information,
see ADS 201.3.11.8.
THE PROCESS FOR
DEVELOPING A
RESULTS
FRAMEWORK
SETTING UP THE PROCESS
Missions may use a variety of
approaches to develop their
respective results frameworks. In
setting up the process, consider
the following three questions.
When
should
the
results
frameworks be developed? It is
often helpful to think about a
point in time at which the team
will have enough analysis and
information
to
confidently
construct a results framework.
Who is going to participate
(and at what points in the
process)?
It is important to
develop a schedule and plan out
the process for engaging partners
and stakeholders. There are a
number of options (or a
combination) that might be
considered:
• Invite key partners or
stakeholders to results
framework development
sessions. If this is done, it may
be useful to incorporate some
training on the results
framework methodology in
advance. Figure 3 outlines the
basic building blocks and
defines terms used in strategic
planning across different
organizations.
• The AO team may develop a
preliminary results framework
and hold sessions with key
counterparts to present the
draft strategy and obtain
feedback.
• Conduct a strategy workshop
for AO teams to present out
RFs and discuss strategic issues.
Although these options require
some time and effort, the results
framework will be more complete
and representative.
What process and approach
will be used to develop the
results frameworks?
We
strongly recommend that the AO
team hold group sessions to
construct the results framework.
It is often helpful to have one
person
(preferably
with
experience in strategic planning
and facilitation) to lead these
sessions.
This person should
focus on drawing out the ideas of
the group and translating them
into the results framework.
STEP 1. IDENTIFY THE
ASSISTANCE OBJECTIVE
The Assistance Objective (AO) is
the center point for any results
framework and is defined as:
The most ambitious result
(intended measurable change)
that a USAID Mission/Office,
along with its partners, can
materially affect, and for which
it is willing to be held
accountable (ADS 201.3.8).
Defining an AO at an appropriate
level of impact is one of the most
critical and difficult tasks a team
faces.
The AO forms the
standard by which the Mission or
Office is willing to be judged in
terms of its performance. The
concept of “managing for results”
(a USAID value also reflected in
the Paris Declaration) is premised
on this idea.
“It is critical to stress the importance
of not rushing to finalize a results
framework. It is necessary to take
time for the process to mature and to
be truly participative.”
- USAID staff member in Africa
The task can be challenging,
because an AO should reflect a
balance of two conflicting
considerations: ambition and
accountability. On the one hand,
every team wants to deliver
significant impact for a given
investment. On the other hand,
there are a number of factors
outside the control of the team.
In fact, as one moves up the
Results Framework toward the
AO, USAID is more dependent on
other development partners to
achieve the result.
Identifying an appropriate level
of ambition for an AO depends
on a number of factors and will
be different for each country
context. For example, in one
country it may be appropriate for
the AO to be “increased use of
family planning methods” while
in another, “decreased total
fertility‖ (a higher level objective)
would be more suitable. Where
to set the objective is influenced
by the following factors:
• Programming history.
There are different
expectations for more
mature programs, where
higher level impacts and
greater sustainability are
expected.
• The magnitude of the
development problem.
• The timeframe for the
strategy.
• The range of resources
available or expected.
[Figure 3. Results Framework Logic.
Labels: “So What?”, “How?”, and
“Necessary and Sufficient.”]
The AO should represent the
team’s best assessment of what
can realistically be achieved. In
other words, the AO team should
be able to make a plausible case
that the appropriate analysis has
been done and the likelihood of
success is great enough to
warrant investing resources in the
AO.
STEP 2. IDENTIFY
INTERMEDIATE RESULTS
After agreeing on the AO, the
team must identify the set of
“lower level” Intermediate Results
necessary to achieve the AO. An
Intermediate Result is defined as:
An important result that is
seen as an essential step to
achieving a final result or
outcome.
IRs are
measurable results that may
capture a number of
discrete and more specific
results (ADS 201.3.8.4).
As the team moves down from
the AO to IRs, it is useful to ask
“how” can the AO be achieved?
By answering this question, the
team begins to formulate the IRs
(see Figure 3). The team should
assess relevant country and
sector conditions and draw on
development experience in other
countries to better understand
the changes that must occur if
the AO is to be attained.
The
Results
Framework
methodology
is
sufficiently
flexible to allow the AO team to
include Intermediate Results that
are supported by other actors
when they are relevant and
critical to achieving the AO. For
example, if another donor is
building schools that are
essential for USAID to
accomplish an education AO
(e.g.
increased
primary
school completion), then
that should be reflected as
an IR because it is a
necessary ingredient for
success.
Initially, the AO team might
identify a large number of
possible results relevant to
the AO.
However, it is
important to eventually settle on
the critical set of Intermediate
Results. There is no set number
for how many IRs (or levels of IRs)
are appropriate. The number of
Intermediate Results will vary
with the scope and complexity of
the AO. Eventually, the team
should arrive at a final set of IRs
that
members
believe
are
reasonable. It is customary for
USAID Missions to submit a
Results Framework with one or
two
levels
of
IRs
to
USAID/Washington for review.
The key point is that there should
be enough information to
adequately
convey
the
development hypothesis.
So What is Causal Logic Anyway?
Causal logic is based on the concept of cause-and-effect. That is, the accomplishment of lower-level
objectives “causes” the next higher-level objective (or the effect) to occur. In the following example, the
hypothesis is that if IRs 1, 2, and 3 occur, they will lead to the AO.
AO: Increased Completion of Primary School
IR 1: Improved Quality of Teaching
IR 2: Improved Curriculum
IR 3: Increased Parental Commitment to Education
STEP 3. CLARIFY THE
RESULTS FRAMEWORK
LOGIC
Through the process of
identifying Intermediate Results,
the team begins to construct the
cause-and-effect logic that is
central to the Results Framework.
Once the team has identified the
Intermediate Results that support
an objective, it must review and
confirm this logic.
The accomplishment of lower
level results, taken as a group,
should result in the achievement
of the next higher objective. As
the team moves up the Results
Framework, they should ask, “so
what?” If we accomplish these
lower level objectives, is
something of significance
achieved at the next higher level?
The higher-order result
establishes the “lens” through
which lower-level results are
viewed. For example, if one IR is
“Increased Opportunities for
Out-of-School Youth to Acquire Life
Skills,” then, by definition, all
lower level IRs would focus on
the target population established
(out-of-school youth).
As the team looks across the
Results Framework, it should ask
whether the Intermediate Results
are necessary and sufficient to
achieve the AO.
Results Framework logic is not
always linear.
There may be
relationships across results or
even with other AOs. This can
sometimes be demonstrated on
the graphic (e.g., through the use
of arrows or dotted boxes with
some explanation) or simply in
the narrative. In some cases,
teams find a number of causal
connections in an RF. However,
teams have to find a balance
between two extremes: on
the one hand, where logic is too
simple and linear and, on the
other, a situation where all
objectives are related to all
others.
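The cause-and-effect hierarchy described in Steps 1 through 3 can be sketched as a simple tree in Python. This is an illustrative data structure only, not a USAID tool: the class and function names are assumptions, and the AO and IRs are taken from the primary school example in this TIPS.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Result:
    """One box in a Results Framework: an AO or an IR."""
    label: str
    statement: str
    lower_level: List["Result"] = field(default_factory=list)


def outline(result, depth=0):
    """Walk down the framework ("how?") and return an indented outline."""
    lines = ["  " * depth + f"{result.label}: {result.statement}"]
    for child in result.lower_level:
        lines.extend(outline(child, depth + 1))
    return lines


ao = Result("AO", "Increased Completion of Primary School", [
    Result("IR 1", "Improved Quality of Teaching"),
    Result("IR 2", "Improved Curriculum"),
    Result("IR 3", "Increased Parental Commitment to Education"),
])

for line in outline(ao):
    print(line)
```

Reading the outline from top to bottom answers "how?"; reading it from the bottom up is the point at which the team asks "so what?". Links across results, which the text notes are not always linear, would need an extra field and are left out of this sketch.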
STEP 4. IDENTIFY CRITICAL
ASSUMPTIONS
The next step is to identify the set
of critical assumptions that are
relevant to the achievement of
the AO. A critical assumption is
defined as:
“...a general condition under
which the development
hypothesis will hold true.
Critical assumptions are
outside the control or
influence of USAID and its
partners (in other words, they
are not results), but they
reflect conditions that are
likely to affect the achievement
of results in the Results
Framework. Critical
assumptions may also be
expressed as risks or
vulnerabilities...” (ADS
201.3.8.3)
Identifying critical assumptions,
assessing associated risks, and
determining how they should be
addressed is a part of the
strategic
planning
process.
Assessing risk is a matter of balancing the likelihood that the critical assumption will hold true with the ability of the team to address the issue. For example, consider the critical assumption “adequate rainfall.” If this assumption has held true for the target region only two of the past six years, the risk associated with this assumption is so great that it poses a risk to the strategy.
In cases like this, the AO team should attempt to identify ways to actively address the problem. For example, the team might include efforts to improve water storage or irrigation methods, or increase use of drought-resistant seeds or farming techniques. This would then become an IR (a specific objective to be accomplished by the program) rather than a critical assumption. Another option for the team is to develop contingency plans for the years when a drought may occur.
What is NOT Causal Logic?
Categorical Logic. Lower level results are simply sub-categories rather than cause and effect, as demonstrated in the example below.
AO: Increased Completion of Primary School
IR 1: Improved Pre-Primary School
IR 2: Improved Primary Education
IR 3: Improved Secondary Education
Definitional Logic. Lower-level results are a restatement (or further definition) of a higher-level objective. The use of definitional logic results in a problem later when identifying performance indicators because it is difficult to differentiate indicators at each level.
IR: Strengthened Institution
IR: Institutional Capacity to Deliver Goods & Services
STEP 5. COMPLETE THE RESULTS FRAMEWORK
As a final step, the AO team should step back from the Results Framework and review it as a whole. The RF should be straightforward and understandable. Check that the results contained in the RF are measurable and feasible with anticipated USAID and partner resource levels. This is also a good point at which to identify synergies between objectives and across AOs.
STEP 6. IDENTIFY
PRELIMINARY
PERFORMANCE MEASURES
Agency policies (ADS 201.3.8.6)
require that the AO team present
proposed indicators for the AO
with baseline data and targets.
The AO, along with indicators and
targets, represents the specific
results that will be achieved vis-à-vis the investment. To the extent
possible, indicators for IRs with
baseline and targets should be
included as well.
Figure 1. Illustrative Results Framework
AO: Increased Production by Farmers in the Upper River Zone
IR: Farmers’ Access to Commercial Capital Increased
IR: Farmers’ Capacity to Develop Bank Loan Applications Increased (4 years)
IR: Banks’ Loan Policies Become More Favorable for the Rural Sector (3 years)
IR: Farmers’ Transport Costs Decreased
IR: Additional Local Wholesale Market Facilities Constructed (with the World Bank)
IR: Farmers’ Knowledge About Effective Production Methods Increased
IR: Village Associations Capacity to Negotiate Contracts Increased (4 years)
IR: New Technologies Available (World Bank)
IR: Farmers’ Exposure to On-Farm Experiences of Peers Increased
Critical Assumptions
1. Market prices for farmers’ products remain stable or increase.
2. Prices of agricultural inputs remain stable or decrease.
3. Roads needed to get produce to market are maintained.
4. Rainfall and other critical weather conditions remain stable.
Key: USAID Responsible; Partner(s) Responsible; USAID + Partner(s) Responsible
Figure 3. The Fundamental Building Blocks for Planning
ASSISTANCE OBJECTIVE (AO)
The highest level objective for which USAID is willing to be held accountable. AOs may also be referred to as outcomes, impacts, or results.
Example: Increased Primary School Completion
INTERMEDIATE RESULTS (IRs)
Interim events, occurrences, or conditions that are essential for achieving the AO. IRs may also be referred to as outcomes or results.
Example: Teaching Skills Improved
OUTPUT
Products or services produced as a result of internal activity.
Example: Number of teachers trained
INPUT
Resources used to produce an output.
Example: Funding or person days of training
Figure 4. Sample Results Framework and Crosswalk of FAF Program Hierarchy and a Results Framework
F Program Hierarchy for Budgeting and Reporting
The Illustrative Results Framework links to the FAF Program Hierarchy as follows:
• Objective 4 Economic Growth
• Program Areas 4.6 (Private Sector Competitiveness) and 4.7 (Economic Opportunity)
• Program Elements 4.6.1, 4.6.2, 4.7
• Sub-Elements 4.6.12 and 4.7.2.1
• Sub-Element 4.6.1.3
• Sub-Element 4.7.2.2
• Sub-Element 4.6.2.1
• Sub-Element 4.7.3
• Sub-Element 4.6.2.4
Illustrative Results Framework for Program Planning
Assistance Objective: Economic Competitiveness of Private Enterprises Improved
IR 1: Enabling Environment for Enterprises Improved
IR 1.1 Licensing and registration requirements for enterprises streamlined
IR 1.2 Commercial laws that support market-oriented transactions promoted
IR 1.3 Regulatory environment for micro and small enterprises improved
IR 2: Private Sector Capacity Strengthened
IR 2.1 Competitiveness of targeted enterprises improved
IR 2.2 Productivity of microenterprises in targeted geographic regions increased
IR 2.3 Information Exchange Improved
Critical Assumptions:
• Key political leaders, including the President and the Minister of Trade and Labor, will continue to support policy reforms that advance private enterprise-led growth.
• Government will sign the Libonia Free Trade Agreement, which will open up opportunities for enterprises targeted under IR 2.1.
Note: The arrows demonstrate the linkage of AO1, IR 1, and IR 1.1 to the FAF. As an example, IR1 links to the program element 4.6.1 “Business Enabling Environment”. IR 1.1 links to 4.7.2.1 “Reduce Barriers to Registering Micro and Small Business”.
For more information:
TIPS publications are available online at [insert website].
Acknowledgements:
Our thanks to those whose experience and insights helped shape this publication including Gerry Britan
and Subhi Mehdi of USAID’s Office of Management Policy, Budget and Performance (MPBP). This
publication was updated by Michelle Adams-Matson, of Management Systems International.
Comments can be directed to:
Gerald Britan, Ph.D.
Tel: (202) 712-1158
gbritan@usaid.gov
Contracted under RAN-M-00-04-00049-A-FY0S-84
Integrated Managing for Results II
NUMBER 15
2011 Printing
PERFORMANCE MONITORING & EVALUATION
TIPS
MEASURING INSTITUTIONAL CAPACITY
ABOUT TIPS
These TIPS provide practical advice and suggestions to USAID managers on issues related to
performance monitoring and evaluation. This publication is a supplemental reference to the
Automated Directive Service (ADS) Chapter 203.
INTRODUCTION
This PME Tips gives USAID managers information on measuring institutional capacity,* including some tools that measure the capacity of an entire organization as well as others that look at individual components or functions of an organization. The discussion concentrates on the internal capacities of individual organizations, rather than on the entire institutional context in which organizations function. This Tips is not about how to actually strengthen an institution, nor is it about how to assess the eventual impact of an organization’s work. Rather, it is limited to a specific topic: how to measure an institution’s capacities.
It addresses the following questions:
Which measurement approaches are most useful for particular types of capacity building?
What are the strengths and limitations of each approach with regard to internal bias, quantification, or comparability over time or across organizations?
How will the data be collected and how participatory can and should the measurement process be?
Measuring institutional capacity might be one important aspect of a broader program in institutional strengthening; it may help managers make strategic, operational, or funding decisions; or it may help explain institutional strengthening activities and related performance.
Whatever the reason for assessing institutional capacity, this Tips presents managers with several tools for identifying institutional strengths and weaknesses.
The paper will define and discuss capacity assessment in general and present several approaches for measuring institutional capacity. We assess the measurement features of each approach to help USAID managers select the tool that best fits their diverse management and reporting needs. The paper is organized as follows:
1. Background: Institutional Capacity Building and USAID
2. How to Measure Institutional Capacity
3. Measurement Issues
4. Institutional Assessment Tools
5. Measuring Individual Organizational Components
6. Developing Indicators
7. Practical Tips for a Busy USAID Manager
BACKGROUND: INSTITUTIONAL CAPACITY BUILDING AND USAID
USAID operating units must work closely with partner and customer organizations to meet program objectives across all Agency goal areas, among them Peace and Security, Governing Justly and Democratically, Economic Growth, Investing in People, and Humanitarian Assistance. In the course of planning, implementing, and measuring their programs, USAID managers often find that a partner or customer organization’s lack of capacity stands in the way of achieving results. Increasing the capacity of partner and customer organizations helps them carry out their mandate effectively and function more efficiently. Strong organizations are more able to accomplish their mission and provide for their own needs in the long run.
USAID operating units build capacity with a broad spectrum of partner and customer organizations. These include but are not limited to:
• American private voluntary organizations (PVOs)
• Local and international nongovernmental organizations (NGOs) and other civil society organizations (CSOs)
• Community-based membership cooperatives, such as a water users group
• Networks and associations of organizations
• Political parties
• Government entities (ministries, departments, agencies, subunits, policy analysis units, health clinics, schools)
• Private sector organizations (financial institutions, companies, small businesses and other for-profit organizations)
• Regional institutions
The Agency uses a variety of techniques to
build organizational capacity. The most common involve providing technical assistance, advisory services, and long-term consultants to
organizations, to help them build the skills and
experience necessary to contribute successfully to sustainable development. Other techniques include providing direct inputs, such as
financial, human, and technological resources.
Finally, USAID helps establish mentoring relationships; provides opportunities for formal study in-country, in the United States, or in third countries; and sets up internships or apprenticeships with other organizations. The
goal of strengthening an institution is usually to
improve the organization’s overall performance
and viability by improving administrative and
management functions, increasing the effectiveness of service provision, enhancing the organization’s structure and culture, and furthering its sustainability. Institutional strengthening
programs may address one or more of these
components.
In most cases, USAID managers are concerned with institutional strengthening because they are interested in the eventual program-level results (and the sustainability of these results) that these stronger organizations can help achieve. While recognizing the need to address eventual results, this Tips looks primarily at ways to measure institutional capacity. Understanding and measuring institutional capacity are critical and often more complex than measuring the services and products an organization delivers.
Measuring organizational capacity is important because it both guides USAID interventions and allows managers to demonstrate and report on progress. The data that emerge from measuring institutional capacity are commonly used in a number of valuable ways. These data establish baselines and provide the basis for setting targets for improvements. They help explain where or why something is going wrong; they identify changes to specific program interventions and activities that address areas of poor performance; they inform managers of the impact of an intervention or the effectiveness of an intervention strategy; and they identify lessons learned. They are also useful for reporting to Washington and to partners.
It is important to note the difference between assessing capacity for contracting and grant-making decisions versus for a “capacity building” relationship with partner/customer organizations. A USAID manager may want to assess the capacity of an organization to help make decisions about awarding grants or holding grantees accountable for results. In this case, the assessment is more of an external oversight/audit of an organization hired to carry out Agency programs. Or, the manager may have a programmatic commitment to strengthen the abilities of customer and partner organizations. Different tools and methods are available for both situations. This paper deals primarily with programs that fit the latter description.
Within USAID, the former Office of Private and Voluntary Cooperation (PVC) took the lead on building the capacity of nongovernmental organization (NGO) and private voluntary organization (PVO) partners. PVC has defined development objectives and intermediate results aimed specifically at improving the internal capacity of U.S. PVOs. PVC has studied different approaches to institutional capacity building and has begun to develop a comprehensive capacity assessment tool called discussion-oriented organizational self-assessment, described in example 1 in this paper. In addition to DOSA, PVC has developed several indicators for measuring institutional capacity development.
PVC specifically targets NGOs and PVOs and is particularly concerned with enhancing partnerships. USAID missions, by contrast, work with a broader range of organizations on activities aimed at increasing institutional capacity. Such programs usually view institutional capacity as a means to achieve higher level program results, rather than as an end in itself.
HOW TO MEASURE INSTITUTIONAL CAPACITY
An organization can be thought of as a system of related components that work together to achieve an agreed-upon mission. The following list of organizational components is not all-inclusive, nor does it apply universally to all organizations. Rather, the components are representative of most organizations involved in development work and will vary according to the type of organization and the context in which it functions.
Administrative and Support Functions
• Administrative procedures and management systems
• Financial management (budgeting, accounting, fundraising, sustainability)
• Human resource management (staff recruitment, placement, support)
• Management of other resources (information, equipment, infrastructure)
Technical/Program Functions
• Service delivery system
• Program planning
• Program monitoring and evaluation
• Use and management of technical knowledge and skills
Structure and Culture
• Organizational identity and culture
• Vision and purpose
• Leadership capacity and style
• Organizational values
• Governance approach
• External relations
Resources
• Human
• Financial
• Other
MEASUREMENT ISSUES
This TIPS presents capacity-assessment tools and other measurement approaches that, while similar in some ways, vary in both their emphasis and their method for evaluating an organization’s capacity. Some use scoring systems and others don’t; some use questionnaires while others employ focus groups; some use external evaluators, and others use self-assessments; some emphasize problem solving, while others concentrate on appreciating organizational strengths. Some tools can be used to measure the same standard across many organizations, while others are organization specific. Many of the tools are designed so that the measurement process is just as important as, if not more important than, the resulting information. They may involve group discussions, workshops, or exercises, and may explicitly attempt to be participatory. Such tools try to create a learning opportunity for the organization’s members, so that the assessment itself becomes an integral part of the capacity-building effort.
Because of each user’s different needs, it would be difficult to use this TIPS as a screen to predetermine the best capacity-assessment tool for each situation. Rather, managers are encouraged to adopt the approaches most appropriate to their program and to adapt the tools best suited for local needs. To assist managers in identifying the most useful tools and approaches, we consider the following issues for each of the tools presented:
• Type of organization measured. Many of the instruments developed to measure institutional capacity are designed specifically for measuring NGOs and PVOs. Most of these can be adapted easily for use with other types of organizations, including government entities.
• Comparability across organizations. To measure multiple organizations, to compare them with each other, or to aggregate the results of activities aimed at strengthening more than one organization, the tool used should measure the same capacity areas for all the organizations and use the same scoring criteria and measurement processes. Note, however, that a standard tool, applied to diverse organizations, is less able to respond to specific organizational or environmental circumstances. This is less of a problem if a group of organizations, using the same standard tool, has designed its diagnostic instrument together (see the following discussion of PROSE).
• Comparability over time. In many cases, the value of measuring institutional capacity lies in the ability to track changes in one organization over time. That requires consistency in method and approach. A measurement instrument, once selected and adapted to the needs of a particular organization, must be applied the same way each time it is used. Otherwise, any shifts that are noted may reflect a change in the measurement technique rather than an actual change in the organization.
• Data collection. Data can be collected in a variety of ways: questionnaires, focus groups, interviews, document searches, and observation, to name only some. Some methods are hands-on and highly participatory, involving a wide range of customers, partners, and stakeholders, while others are more exclusive, relying on the opinion of one or two specialists. In most cases, it is best to use more than one data collection method.
• Objectivity. By their nature, measures of institutional capacity are subjective. They rely heavily on individual perception, judgment, and interpretation. Some tools are better than others at limiting this subjectivity. For instance, they balance perceptions with more empirical observations, or they clearly define the capacity area being measured and the criteria against which it is being judged. Nevertheless, users of these tools should be aware of the limitations to the findings.
• Quantification. Using numbers to represent capacity can be helpful when they are recognized as relative and not absolute measures. Many tools for measuring institutional capacity rely on ordinal scales. Ordinal scales are scales in which values can be ranked from high to low or more to less in relation to each other. They are useful in ordering by rank along a continuum, but they can also be misleading. Despite the use of scoring criteria and guidelines, one person’s “3” may be someone else’s “4.” In addition, ordinal scales do not indicate how far apart one score is from another. (For example, is the distance between “agree” and “strongly agree” the same as the distance between “disagree” and “strongly disagree”?) Qualitative descriptions of an organization’s capacity level are a good complement to ordinal scales.
• Internal versus external assessments. Some tools require the use of external facilitators or assessors; others offer a process that the organization itself can follow. Both methods can produce useful data, and neither is automatically better than the other. Internal assessments can facilitate increased management use and better understanding of an assessment’s findings, since the members of the organization themselves are carrying out the assessment. By contrast, the risk of bias and subjectivity is higher in internal assessments. External assessments may be more objective. They are less likely to introduce internal bias and can make use of external expertise. The downside is that external assessors may be less likely to uncover what is really going on inside an organization.
• Practicality. The best measurement systems are designed to be as simple as possible: not too time consuming, not unreasonably costly, yet able to provide managers with good information often enough to meet their management needs. Managers should take practicality into account when selecting a measurement tool. They should consider the level of effort and resources required to develop the instrument and collect and analyze the data, and think about how often and at what point during the management cycle the data will be available to managers.
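The caution in the Quantification item above, that ordinal scores rank responses without fixing the distance between them, can be illustrated with a short sketch. The function name, the label mapping, and the response data are hypothetical and chosen only for illustration. The point is that a (low) median and a frequency count rely only on the ordering of the scale, whereas an average would also assume equal spacing between scale points.

```python
from collections import Counter
from statistics import median_low

# Hypothetical five-point agreement scale (illustrative labels only).
SCALE = {
    1: "strongly disagree",
    2: "disagree",
    3: "neutral",
    4: "agree",
    5: "strongly agree",
}


def summarize_ordinal(responses):
    """Summarize ordinal ratings using only their rank order."""
    invalid = [r for r in responses if r not in SCALE]
    if invalid:
        raise ValueError(f"ratings outside the scale: {invalid}")
    counts = Counter(responses)
    return {
        # median_low always returns an actual scale point, so it never
        # averages two ratings and never assumes equal spacing.
        "median": median_low(responses),
        # Frequency of each response, reported by label rather than number.
        "distribution": {SCALE[v]: counts[v] for v in sorted(SCALE)},
    }


# Made-up responses from seven staff members to a single questionnaire item.
summary = summarize_ordinal([2, 4, 4, 5, 3, 4, 1])
print(summary["median"])
print(summary["distribution"])
```

Reporting the distribution alongside a qualitative description keeps the numbers in their role as relative, not absolute, measures.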
INSTITUTIONAL ASSESSMENT TOOLS
This section describes capacity measurement tools that USAID and other development organizations use. You can find complete references and Web sites in the resources section at the end of the paper. For each tool, we follow the same format:
• Background of the methodology/tool
• Process (how the methodology/tool is used in the field)
• Product (the types of outputs expected)
• Assessment (a discussion of the uses and relative strengths of each methodology/tool)
• An example of what the methodology/tool looks like
PARTICIPATORY, RESULTS-ORIENTED SELF-EVALUATION
Background
The participatory, results-oriented self-evaluation (PROSE) method was developed by Evan Bloom of Pact and Beryl Levinger of the Education Development Center. It has the dual purpose of both assessing and enhancing organizational capacities. The PROSE method produces an assessment tool customized to the organizations being measured. It is designed to compare capacities across a set of peer organizations, called a cohort group, which allows for benchmarking and networking among the organizations. PROSE tools measure and profile organizational capacities and assess, over time, how strengthening activities affect organizational capacity. In addition, through a facilitated workshop, PROSE tools are designed to allow organizations to build staff capacity; create consensus around future organizational capacity-building activities; and select, implement, and track organizational change and development strategies.
One example of an instrument developed using the PROSE method is the discussion-oriented organizational self-assessment. DOSA was developed in 1997 for the Office of Private and Voluntary Cooperation and was designed specifically for a cohort of USAID PVO grantees.
Participatory, Results-Oriented Self-Evaluation
Type of Organization Measured
NGOs/PVOs; adaptable to other types of organizations
Features
• Cross-organizational comparisons can be made
• Measures change in one organization or a cohort of organizations over time
• Measures well-defined capacity areas against well-defined criteria
• Assessment based primarily upon perceived capacities
• Produces numeric score on capacity areas
• Assessment should be done with the help of an outside facilitator or trained insider
• Data collected through group discussion and individual questionnaires given to a cross-section of the organization’s staff
Process
Developers of the PROSE method recommend that organizations participate in DOSA or develop a customized DOSA-like tool to better fit their organization’s specific circumstances. The general PROSE process for developing such a tool is as follows: After a cohort group of organizations is defined, the organizations meet in a workshop setting to design the assessment tool. With the help of a facilitator, they begin by pointing to the critical organizational capacities they want to measure and enhance. The cohort group then develops two sets of questions: discussion questions and individual questionnaire items. The discussion questions are designed to get the group thinking about key issues. Further, these structured discussion questions minimize bias by pointing assessment team members toward a common set of events, policies, or conditions. The questionnaire items then capture group members’ assessments of those issues on an ordinal scale. During the workshop, both sets of questions are revised until the cohort group is satisfied. Near the end of the process, tools or standards from similar organizations can be introduced to check the cohort group’s work against an external example. If the tool is expected to compare several organizations within the same cohort group, the tool must be implemented by facilitators trained to administer it effectively and consistently across the organizations.
Once the instrument is designed, it is applied to each of the organizations in the cohort. In the case of DOSA, the facilitator leads a team of the organization’s members through a series of group discussions interspersed with individual responses to 100 questionnaire items. The team meets for four to six hours and should represent a cross-functional, cross-hierarchical sample from the organization. Participants respond anonymously to a questionnaire, selecting the best response to statements about the organization’s practices (1=strongly disagree, 2=disagree, 3=neutral, 4=agree, 5=strongly agree) in six capacity areas:
• External Relations
(constituency development, fund-raising and communications)
Example 1. Excerpt From DOSA, a PROSE Tool
The DOSA questionnaire can be found in annex 1a.
The following is a brief example drawn from the Human Resource Management section of the DOSA questionnaire:
Discussion Questions
a. When was our most recent staff training?
b. How often over the last 12 months have we held staff training events?
Questionnaire items for individual response
(1 = Strongly Disagree, 2 = Disagree, 3 = Neutral, 4 = Agree, 5 = Strongly Agree)
1. We routinely offer staff training.   1   2   3   4   5
Discussion Questions
a. What are three primary, ongoing functions (e.g., monitoring and evaluation, proposal writing, resource mobilization) that we carry out to achieve our mission?
b. To what extent does staff, as a group, have the requisite skills to carry out these functions?
c. To what extent is the number of employees carrying out these functions commensurate with work demands?
Questionnaire items for individual response
(1 = Strongly Disagree, 2 = Disagree, 3 = Neutral, 4 = Agree, 5 = Strongly Agree)
2. We have the appropriate staff skills to achieve our mission.   1   2   3   4   5
3. We have the appropriate staff numbers to achieve our mission.   1   2   3   4   5
*The annexes for this paper are available separately and can be obtained through the USAID
Development Experience Clearinghouse at http://dec.usaid.gov/index.cfm
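Responses to questionnaire items like those in example 1 are the raw material for the scores a PROSE instrument reports: a capacity score and a consensus score. The sketch below is illustrative only. This paper describes the DOSA analysis as statistically complex and does not give its formulas, so both calculations are assumptions rather than the DOSA method. Here the capacity score is taken to be the mean rating, and consensus is expressed as one minus the ratio of the observed spread to the largest spread possible on a 1 to 5 scale. All names and ratings are hypothetical.

```python
from statistics import mean, pstdev

SCALE_MIN, SCALE_MAX = 1, 5  # 1 = strongly disagree ... 5 = strongly agree


def capacity_score(ratings):
    """Average rating for one item or capacity area (assumed definition)."""
    return mean(ratings)


def consensus_score(ratings):
    """Agreement among respondents, from 0.0 (evenly split) to 1.0 (unanimous).

    Assumed definition: 1 minus the population standard deviation divided
    by the largest standard deviation possible on the scale, which occurs
    when half the respondents choose each end of the scale.
    """
    max_spread = (SCALE_MAX - SCALE_MIN) / 2
    return 1 - pstdev(ratings) / max_spread


# Made-up ratings from six team members for "We routinely offer staff training."
ratings = [4, 4, 5, 3, 4, 4]
print(capacity_score(ratings))
print(round(consensus_score(ratings), 2))
```

A high capacity score paired with a low consensus score would suggest that the team disagrees about the rating, which is the kind of check on perceived capacities that a consensus score is meant to provide.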
• Financial Resource Management
(budgeting, forecasting, and cash management)
• Human Resource Management
(staff training, supervision, and personnel
practices)
• Organizational Learning
(teamwork and information sharing)
• Strategic Management
(planning, governance, mission, and partnering)
• Service Delivery
(field-based program practices and sustainability issues)
Although the analysis is statistically complex, questionnaires can be scored and graphics produced using instructions provided on the DOSA Web site. In the case of DOSA, the DOSA team in Washington processes the results and posts them on the Internet. The assessment tool can be readministered annually to monitor organizational changes.
Product
PROSE instruments produce two types of scores and accompanying graphics. The first is a capacity score, which indicates how an organization perceives its strengths and weaknesses in each of the capacity and subcapacity areas. The second is a consensus score, which shows the degree to which the assessment team members agree on their evaluation of the organization’s capacity.
Assessment
Unless the existing DOSA questions are used, developing a PROSE instrument from scratch can be time consuming and generally requires facilitators to guide the process of developing and using the instrument. PROSE, like most other such instruments, is based on perceived capacities and does not currently include a method for measuring externally observable performance in various capacity areas (although this is under consideration). It is unique among the instruments in this paper in its use of a consensus score. The consensus score acts as a check on the perceived capacities reported by individual organizational members. It also helps identify capacity areas that all members agree need immediate attention.
Because the cohort organizations develop the specifics of the instrument together and share a common understanding and application of the approach, PROSE is relatively good at comparing organizations with each other or rolling up results to report on a group of organizations together. However, the discussions could influence the scoring if facilitators are not consistent in their administration of the tool.
INSTITUTIONAL DEVELOPMENT FRAMEWORK
Background
The institutional development framework (IDF) is a tool kit developed by Mark Renzi of Management Systems International. It has been used in USAID/Namibia’s Living in a Finite Environment project as well as several other USAID programs. Designed specifically to help nonprofit organizations improve efficiency and become more effective, the IDF is best suited for the assessment of a single organization, rather than a cohort group (as opposed to PROSE). The kit contains three tools (Institutional Development Framework, Institutional Development Profile, and Institutional Development Calculation Sheet), which help an organization determine where it stands on a variety of organizational components, identify priority areas of improvement, set targets, and measure progress over time. While it can be adapted for any organization, the IDF was originally formulated for environmental NGOs.
Institutional Development Framework
Type of Organization Measured
NGOs/PVOs; adaptable to other types of organizations
Features
• Can be used, with limitations, to compare across organizations
• Measures change in the same organization over time
• Measures well-defined capacity areas against well-defined criteria
• Assessment based primarily upon perceived capacities
• Produces numeric score on capacity areas
• Produces qualitative description of an organization’s capacity in terms of developmental stages
• Assessment can be done internally or with help of an outside facilitator
• Data collected through group discussion with as many staff as feasible
Process
An organization can use the IDF tools either with or without the help of a facilitator. The IDF identifies five organizational capacity areas, called resource characteristics. Each capacity area is further broken down into key components, including:
• Oversight/Vision
(board, mission, autonomy)
• Management Resources
(leadership style, participatory management, management systems, planning, community participation, monitoring, evaluation)
• Human Resources
(staff skills, staff development, organizational diversity)
• Financial Resources
(financial management, financial vulnerability, financial solvency)
• External Resources
(public relations, ability to work with local communities, ability to work with government bodies, ability to work with other NGOs)
Each key component within a capacity area is rated at one of four stages along an organizational development continuum (1 = start up, 2 = development, 3 = expansion/consolidation, and 4 = sustainability). IDF offers criteria describing each stage of development for each of the key components (see example 2 below).
Different processes can be used depending on the organization’s size and the desired outcome. Small organizations usually involve as many staff as possible; larger organizations may work in small groups or use a few key informants. Members of the organization can modify the Institutional Development Framework to fit their organization. Nonapplicable areas can be ignored and new areas can be added, although the creator of the tool warns against completely rewriting the criteria. Through discussion, the
participating members then use the criteria to
determine where along the development continuum their organization is situated for each
component. The resulting graphic, the Institutional Development Profile (IDP), uses bars or
“x”s to show where the organization ranks on
each key component. Through a facilitated meeting or group discussion, organization members
then determine which areas of organizational
capacity are most important to the organization
and which need priority attention for improvement. Using the IDP, they can visually mark their
targets for the future.
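As a rough illustration of the profile described above, the sketch below prints one row of "x" marks per key component and marks a future target with "T". The function name, the component names, and the text layout are assumptions made for this illustration; they are not part of the IDF itself.

```python
# A minimal sketch of an Institutional Development Profile rendered as text.
# Component names and the "T" target marker are hypothetical illustrations.

def render_profile(ratings, targets=None):
    """Return one text row per key component.

    ratings: {component name: stage 1-4 on the development continuum}
    targets: optional {component name: target stage 1-4}
    """
    targets = targets or {}
    width = max(len(name) for name in ratings)
    lines = []
    for name, stage in ratings.items():
        if stage not in (1, 2, 3, 4):
            raise ValueError(f"stage for {name!r} must be 1, 2, 3, or 4")
        # Fill every stage up to and including the current one with "x".
        cells = ["x" if s <= stage else "." for s in (1, 2, 3, 4)]
        target = targets.get(name)
        if target is not None and target > stage:
            cells[target - 1] = "T"  # mark the stage the group is aiming for
        lines.append(f"{name.ljust(width)}  {' '.join(cells)}")
    return lines


if __name__ == "__main__":
    for row in render_profile({"Board": 2, "Planning": 3}, {"Board": 4}):
        print(row)
```

Each row reads left to right from start up (1) to sustainability (4), so the group can see at a glance which components lag and where targets have been set.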
The IDF also provides numeric ratings. Each key
component can be rated on a scale of 1 to 4,
and all components can be averaged together
to provide a summary score for each capacity area. This allows numeric targets to be set
and monitored. The Institutional Development
Calculation Sheet is a simple table that permits
the organization to track progress over time by
recording the score of each component along
the development continuum.
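The averaging and tracking described above can be sketched in a few lines. This is an illustration under stated assumptions, not the Institutional Development Calculation Sheet itself: the function names and the sample ratings are hypothetical, and only the 1 to 4 scale and the idea of averaging components within a capacity area come from the text.

```python
# A minimal sketch: average key-component ratings (1-4) into a summary score
# per capacity area, then compare two measurement periods.
from statistics import mean


def capacity_area_scores(ratings):
    """ratings: {capacity area: {key component: stage 1-4}} -> {area: average}."""
    return {
        area: round(mean(components.values()), 2)
        for area, components in ratings.items()
    }


def change_over_time(baseline, followup):
    """Return the change in each capacity-area score between two periods."""
    before = capacity_area_scores(baseline)
    after = capacity_area_scores(followup)
    return {
        area: round(after[area] - before[area], 2)
        for area in before
        if area in after
    }


if __name__ == "__main__":
    # Hypothetical ratings for one organization at two points in time.
    year_1 = {"Financial Resources": {"Budgets": 2, "Cash controls": 1, "Financial security": 2}}
    year_2 = {"Financial Resources": {"Budgets": 3, "Cash controls": 2, "Financial security": 2}}
    print(capacity_area_scores(year_1))
    print(change_over_time(year_1, year_2))
```

Because the criteria for each stage stay fixed, a rise in the average can be read as movement along the development continuum rather than as a change in the yardstick.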
Example 2. Excerpt From the IDF Tool
The following is an excerpt from the Financial Management section of the Institutional Development Framework. The entire framework appears in annex 2.
Resource Characteristic: Financial Management
Criteria for Each Progressive Stage (the Development Continuum)
Key Component: Budget as Management Tools
1 Start Up: Budgets are not used as management tools.
2 Development: Budgets are developed for project activities, but are often over- or underspent by more than 20%.
3 Expansion and Consolidation: Total expenditure is usually within 20% of budget, but actual activity often diverge from budget predictions.
4 Sustainability: Budgets are integral part of project management and are adjusted as project implementation warrants.
Key Component: Cash Controls
1 Start Up: No clear procedures exist for handling payables and receivables.
2 Development: Financial controls exist but lack a systematic office procedure.
3 Expansion and Consolidation: Improved financial control systems exist.
4 Sustainability: Excellent cash controls for payables and receivables and established budget procedures.
Key Component: Financial Security
1 Start Up: Financing comes from only one source.
2 Development: Financing comes from multiple sources, but 90% or more from one source.
3 Expansion and Consolidation: No single source of funding provides more than 60% of funding.
4 Sustainability: No single source provides more than 40% of funding.
Product
The IDF produces a graphic that shows the component parts of an organization and the organization’s ratings for each component at different points in time. It also provides a numeric score/rating of capacity in each key component and capacity area.
Assessment
The IDF is an example of a tool that not only helps assess and measure an organization’s capacity but also sets priorities for future change and improvements. Compared with some of the other tools, IDF is relatively good at tracking one organization’s change over time because of the consistent criteria used for each progressive stage of development. It is probably not as well suited for making cross-organizational comparisons, because it allows for adjustment to fit the needs of each individual organization.
ORGANIZATIONAL CAPACITY ASSESSMENT TOOL
Background
Pact developed the organizational capacity assessment tool (OCAT) in response to a need to examine the impact of NGO capacity-building activities. Like the Institutional Development Framework, OCAT is better suited for measuring one organization over time. The OCAT differs substantially from the IDF in its data collection technique. It is designed to identify an organization’s relative strengths and weaknesses and provides the baseline information needed to develop strengthening interventions. It can also be used to monitor progress. The OCAT is well known; other development organizations have widely adapted it. Designed to be modified for each measurement situation, the OCAT can also be standardized and used across organizations.
Process
The OCAT is intended to be a participatory self-assessment but may be modified to be an external evaluation. An assessment team, composed of organizational members (representing different functions of the organization) plus some external helpers, modifies the OCAT assessment sheet to meet its needs (annex 3). The assessment sheet consists of a series of statements under seven capacity areas (with sub-elements). The assessment team then identifies sources of information, assigns tasks, and uses a variety of techniques (individual interviews, focus groups, among others) to collect the information they will later record on the assessment sheet. The assessment team assigns a score to each capacity area statement (1=needs urgent attention and improvement; 2=needs attention; 3=needs improvement; 4=needs improvement in limited aspects, but not major or urgent; 5=room for some improvement; 6=no need for immediate improvement). The assessment team would have to develop precise criteria for what rates as a “1” or a “2,” etc.
The capacity areas and sub-elements are:
• Governance
(board, mission/goal, constituency, leadership,
legal status)
• Management Practices
(organizational structure, information
management, administration procedures,
personnel, planning, program development,
program reporting)
• Human Resources
(human resources development, staff roles,
work organization, diversity issues, supervisory
practices, salary and benefits)
• Financial Resources
(accounting, budgeting, financial/inventory controls, financial reporting)
• Service Delivery
(sectoral expertise, constituency, impact assessment)
• External Relations
(constituency relations, inter-NGO collaboration, public relations, local resources, media)
• Sustainability
(program/benefit sustainability, organizational sustainability, financial sustainability, resource base sustainability)
After gathering data, the assessment team meets to reach a consensus on the rating of each element. With the help of an OCAT rating sheet, averages can be calculated for each capacity area. These numeric scores indicate the relative need for improvement in each area. They also correspond to a more qualitative description of the organization’s developmental stage. Each capacity area can be characterized as nascent, emerging, expanding, or mature. OCAT provides a table (similar to the IDF), “NGO Organizational Development—Stages and Characteristics” that describes organizational capacities at each stage of development.
Product
The OCAT provides numeric ratings for each capacity area. In addition, it gives organizations a description of their capacity areas in terms of progressive stages of organizational development. This information can be presented graphically as well as in narrative form.
Assessment
The OCAT identifies areas of organizational strength and weakness and tracks related changes from one measurement period to the next.
The IDF and the OCAT are similar in several ways, but the processes differ. The OCAT uses an assessment team that conducts research before completing the assessment sheet. For the IDF, organization members meet and fill out the sheet (determine their capacities) without the intermediate data collection step (the OCAT, by design, relies on evidence to supplement perceptions when conducting an assessment, and the IDF does not). The OCAT’s data-gathering step allows for systematic cross-checking of perceived capacities with actual or observable “facts.” It is more inductive, building up to the capacity description, while the IDF attempts to characterize the organization along the development continuum from the beginning. The OCAT categorizes an organization’s capacity areas into one of four developmental stages. Unlike the IDF, which uses the stages as the criteria by which members rate their organization, the OCAT uses them as descriptors once the rating has been done.
Example 3. Excerpt From an Adaptation of the OCAT
USAID/Madagascar developed a capacity assessment tool based on the OCAT, but tailored it to its own need to measure 21 partner institutions implementing reproductive health programs, including the Ministry of Health. The mission tried to measure different types of organizations and compare them by creating a standardized instrument to use with all the organizations. Combining the OCAT results with additional information from facilitated discussions, the mission was able to summarize how different types of organizations perceived different aspects of their capacity and recommend future strengthening programs.
Some of the difficulties that USAID/Madagascar encountered when using the tool included having to translate questions from French to Malagasy, possibly losing some of their meaning; finding that some respondents were unable to answer some questions because they had no experience with the part of the organization to which the questions referred; discovering that some respondents had difficulty separating the subject area of the questionnaire (family planning) from their work in other health areas; and having difficulty scheduling meetings because of the organizations’ heavy workload. Moreover, the mission noted that the instrument is based on perceptions and is self-scored, with the resulting potential for bias.
Below is an excerpt from the “communications/extension to customers” component of the OCAT used by USAID/Madagascar. The entire questionnaire is in annex 4.
Classification Scale
0 Nonexistent or out of order
1 Requires urgent attention and upgrading
2 Requires overall attention and upgrading
3 Requires upgrading in certain areas, but neither major nor urgent
4 Operating, but could benefit from certain improvements
5 Operating well in all regards
Communications/Extension to Customers
a. The institution has in each clinic a staff trained and competent in counseling all customers.
1 2 3 4 5
b. The institution is able to identify and develop key messages for extension among potential customers, and it can produce or obtain materials for communicating such messages.
1 2 3 4 5
c. A well-organized community extension is practiced by the clinic’s staff or other workers affiliated with the institution, whether they are salaried or volunteers. A system exists for supervising extension workers and monitoring their effectiveness.
1 2 3 4 5
DYNAMIC PARTICIPATORY INSTITUTIONAL DIAGNOSIS
Background
The dynamic participatory institutional diagnosis (DPID) was developed by the Senegal PVO/NGO support project in conjunction with the New TransCentury Foundation and Yirawah International. It is a rapid and intensive facilitated assessment of the overall strengths and weaknesses of an organization. This methodology explores member perceptions of an organization and the organization’s relationship with its environment. DPID is highly participatory; an organization assesses itself in the absence of external benchmarks or objectives to take full advantage of its specific context, such as culture and attitudes.
Process
Example 4. An Application of DPID
Since the DPID is such an individualized and flexible tool, every application will be different.
The DPID does not lend itself easily to an example as do the other tools in this Tips. Below
is an anecdote about one West African organization’s use of the DPID as reported by the
Senegal PVO/NGO support project.
A Federation of Farmers’ Cooperatives with about 15,000 members in the Sahel was looking for a unique and efficient approach to redress some of the organization’s problems. The
federation suffered from internal strife and a tarnished reputation, impeding its ability to raise
funds. Through DPID, the federation conducted a critical in-depth analysis of its operational
and management systems, resulting in the adoption of “10 emergency measures” addressing
leadership weaknesses, management systems, and operational procedures. Subsequently, the
organization underwent internal restructuring, including an overhaul of financial and administrative systems. One specific result of the DPID analysis was that federation members gained
more influence over the operations of the federation.
An outside facilitator conducts the DPID over 5 to 10 days. It takes place during a series of working sessions in which the facilitator leads an organization’s members through several stages: discussion of the services, operations, and results of the organization; exploration of the issues affecting the organization; and summarization of the “state of the organization.” During the discussions, members analyze the following features of the organization:
• Identity
• Mission
• Means and Resources
• Environment
• Management
• Internal Operations
• Service Provided and Results
They examine each element with reference to institutional behavior, human behavior, management, administration, know-how, philosophy and values, and sensitive points.
Product
A written description of the state of the organization can result from the working sessions. The analysis is qualitative without numeric scoring.
Assessment
Unlike the previously described tools, the DPID does not use ranking, scoring, or questionnaires, nor does it assess the organization along a continuum of developmental stages. Assessment is based purely on group reflection. The DPID requires a facilitator experienced in leading a group through this type of analysis.
The DPID is open ended but somewhat systematic in covering a predefined set of organizational functions. Because of its flexibility, the DPID is organization specific and should not
be used to compare organizations. Nor is it a rigorous means of monitoring an organization’s change over time. Since the DPID does not use external standards to assess institutional capacities, it should not be used to track accountability. Collecting information from the DPID, as well as using it, should offer organizations a process to assess their needs, improve communications, and solve problems around a range of organizational issues at a given moment.
Dynamic Participatory Institutional Diagnosis
Type of Organization Measured
NGOs/PVOs; adaptable to other types of organizations
Features
• Difficult to compare across organizations
• Difficult to compare the same organization over time
• Capacity areas and criteria for measurement are loosely defined
• Assessment based primarily upon perceived capacities
• Produces qualitative description of an organization’s capacity
• Assessment done with the help of an outside facilitator
• Data collected through group discussion with the organization’s staff
ORGANIZATIONAL CAPACITY INDICATOR
Background
From 1994 through 1997, the Christian Reformed World Relief Committee (CRWRC) conducted research on organizational capacity building with the Weatherhead School of Management at Case Western Reserve University and more than 100 local NGOs around the world. The results of this research led them to replace their earlier system, the Skill Rating System, with an approach to capacity building and assessment based on “appreciative inquiry.” Appreciative inquiry is a methodology that emphasizes an organization’s strengths and potential more than its problems. It highlights those qualities that give life to an organization and sustain its ongoing capacity. Rather than providing a standardized tool, the organizational capacity indicator assumes that capacity monitoring is unique to each organization and in the organization’s own self-interest. The organizational capacity indicator (OCI) builds ownership because each organization creates its own capacity assessment tool. Capacity areas are self-defined and vary from organization to organization.
Process
Although organizations create their own tool under the OCI, they all follow a similar process in doing so. As they involve all partners and stakeholders as much as possible, the participants “appreciate” the organization’s history and culture. Together they explore peak experiences, best practices, and future hopes for the organization. Next, the participants identify the forces and factors that have made the organization’s positive experiences possible. These become the capacity areas that the organization tries to monitor and improve.
Next, the participants develop a list of “provocative propositions” for each capacity area. These propositions, visions of what each capacity area should ideally look like in the future, contribute to the overall objective: that each organization will be able to measure itself against its own vision for the future, not some external standard. Each capacity area is defined by the most ambitious vision of what the organization can become in that area. Specific indicators or behaviors are then identified to show the capacity area in practice. Next, the organization designs a process for assessing itself and sharing experiences related to each capacity component. The organization should monitor itself by this process twice a year. The results of the assessment should be used to encourage future development, plans, and aspirations.
Product
Each time a different organization uses the methodology, a different product specific to that organization is developed. Thus, each tool will contain a unique set of capacity areas, an evaluation process, and scoring methods. In general, the product comprises a written description of where the organization wants to be in each capacity area, a list of indicators that can be used to track progress toward the targeted level in a capacity area, and a scoring system.
Example 5. Excerpt From an OCI Tool
The following is an excerpt of one section from the capacity assessment tool developed by CRWRC’s partners in Asia, using the OCI method. (The entire tool can be found in annex 5.) It offers a menu of capacity areas and indicators from which an organization can choose and then modify for its own use. It identifies nine capacity areas, and under each area is a “provocative proposition” or vision of where the organization wants to be in that area. It provides an extensive list of indicators for each capacity area, and it describes the process for developing and using the tool. Staff and partners meet regularly to determine their capacity on the chosen indicators. Capacity level can be indicated pictorially, for example by the stages of growth of a tree or degrees of happy faces.
Capacity Area
A clear vision, mission, strategy, and set of shared values
Proposition
Our vision expresses our purpose for existing: our dreams, aspirations, and concerns for the poor. Our mission expresses how we reach our vision. Our strategy expresses the approach we use to accomplish our goals. The shared values that we hold create a common understanding and inspire us to work together to achieve our goal.
Selected Indicators
• Every person can state the mission and vision in his or her own words
• There is a yearly or a six-month plan, checked monthly
• Operations/activities are within the vision, mission, and goal of the organization
• Staff know why they do what they’re doing
• Every staff member has a clear workplan for meeting the strategy
• Regular meetings review and affirm the strategy
Assessment
Like the DPID, the OCI is highly participatory and values internal standards and perceptions. Both tools explicitly reject the use of external standards. However, the OCI does not designate organization capacity areas like the DPID does. The OCI is the only tool presented in this paper in which the capacity areas are entirely self defined. It is also unique in its emphasis on the positive, rather than on problems. Further, the OCI is more rigorous than the DPID, in that it asks each organization to set goals and develop indicators as part of the assessment process. It also calls for a scoring system to be developed, like the more formal tools (PROSE, IDF, OCAT). Because indicators and targets are developed for each capacity area, the tool allows for relatively consistent measurement over time. OCI is not designed to compare organizations with each other or to aggregate the capacity measures of a number of organizations; however, it has proven useful in allowing organizations to learn from each other and in helping outsiders assess and understand partner organizations.
Organizational Capacity Indicator
Type of Organization Measured
NGOs/PVOs; adaptable to other types of organizations
Features
• Difficult to comparably measure across organizations
• Measures change in the same organization over time
• Possible to measure well-defined capacity areas across well-defined criteria
• Assessment based primarily upon perceived capacities
• Produces numeric or pictorial score on capacity areas
• Assessment done internally
• Data collected through group discussion with organization’s staff
THE YES/NO CHECKLIST OR “SCORECARD”
Background
A scorecard/checklist is a list of characteristics or events against which a yes/no score is assigned. These individual scores are aggregated and presented as an index. Checklists can effectively track processes, outputs, or more general characteristics of an organization. In addition, they may be used to measure processes or outputs of an organization correlated to specific areas of capacity development.
Scorecards/checklists can be used either to measure a single capacity component of an organization or several rolled together. Scorecards/checklists are designed to produce a quantitative score that can be used by itself or as a target (though a scorecard/checklist without an aggregate score is also helpful).
The Yes/No Checklist “Scorecard”
Type of Organization Measured
All types of organizations
Features
• Cross-organizational comparisons can be made
• Measures change in the same organization over time
• Measures well-defined capacity areas against well-defined criteria
• Possible to balance perceptions with empirical observations
• Produces numeric score on capacity areas
• Assessment can be done by an external evaluator or internally
• Data collected through interviews, observation, documents, involving a limited number of staff
Process
To construct a scorecard, follow these general steps: First, clarify what the overall phenomena to be measured are and identify the components that, when combined, cover the phenomenon fairly well. Next, develop a set of characteristics or indicators that together capture the relevant phenomena. If desired, and if evidence and analysis show that certain characteristics are truly more influential in achieving the overall result being addressed, define a weight to be assigned to each characteristic/indicator. Then rate the organization(s) on each characteristic using a well-defined data collection approach. The approach could range from interviewing organization members to reviewing organization documents, or it could consist of a combination of methods. Finally, if desired and appropriate, sum the score for the organization(s).
Product
A scorecard/checklist results in a scored listing of important characteristics of an organization and can also be aggregated to get a summary score.
Assessment
A scorecard/checklist should be used when the characteristics to be scored are unambiguous. There is no room for “somewhat” or “yes, but . . .” with the scorecard technique. The wording of each characteristic should be clear and terms should be well defined. Because scorecards/checklists are usually based on observable facts, processes, and documents, they are more objective than most of the tools outlined in this Tips. This, in turn, makes them particularly useful for cross-organizational comparisons, or tracking organizations over time; that is, they achieve better measurement consistency and comparability. Yet concentrating on observable facts can be limiting, if such facts are not complemented with descriptive and perception-based information. Though a person outside the organization frequently completes the scorecard/checklist, self-assessment is also possible. Unlike other tools that require facilitators to conduct or interpret them, individuals who are not highly trained can also use scorecards. Further, since scorecards are usually tightly defined and specific, they are often a cheaper measurement tool.
Example 6. A Scorecard
USAID/Mozambique developed the following scorecard to measure various aspects of institutional capacity in partner civil society organizations. The following example measures democratic governance.
Increased Democratic Governance Within Civil Society Organizations
Characteristics | Score | Multiplied By | Weight | Weighted Score
1. Leaders (board member or equivalent) of the CSO elected by secret ballot. No=0 pts. Yes=1 pt. | ___ | X | 3 | ___
2. General assembly meetings are adequately announced at least two weeks in advance to all members (1 pt.) and held at least twice a year (1 pt.). Otherwise=0 pt. | ___ | X | 2 | ___
3. Annual budget presented for member approval. No=0 pts. Yes=1 pt. | ___ | X | 2 | ___
4. Elected leaders separate from paid employees. No=0 pts. Yes=1 pt. | ___ | X | 2 | ___
5. Board meetings open to ordinary members (nonboard members). No=0 pts. Yes=1 pt. | ___ | X | 1 | ___
Total | | | | ___
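The arithmetic behind a weighted scorecard such as Example 6 can be sketched as follows. The weights and maximum points are read from the example (item 2 can earn up to two points); the shortened labels, the function name, and the sample ratings are assumptions made for illustration, not part of the USAID/Mozambique instrument.

```python
# A minimal sketch of weighted scorecard totals.
# Each entry: (short label, maximum points, weight), following Example 6.
SCORECARD = [
    ("Leaders elected by secret ballot", 1, 3),
    ("General assembly announced in advance and held twice a year", 2, 2),
    ("Annual budget presented for member approval", 1, 2),
    ("Elected leaders separate from paid employees", 1, 2),
    ("Board meetings open to ordinary members", 1, 1),
]

# Highest possible weighted total: 1*3 + 2*2 + 1*2 + 1*2 + 1*1 = 12.
MAX_TOTAL = sum(max_points * weight for _, max_points, weight in SCORECARD)


def weighted_total(points):
    """points: points earned on each characteristic, in scorecard order."""
    if len(points) != len(SCORECARD):
        raise ValueError("one score is needed for each characteristic")
    total = 0
    for (label, max_points, weight), earned in zip(SCORECARD, points):
        if not 0 <= earned <= max_points:
            raise ValueError(f"score for {label!r} must be between 0 and {max_points}")
        total += earned * weight  # score multiplied by weight
    return total


if __name__ == "__main__":
    # Hypothetical organization: secret ballot, one assembly point,
    # no budget approval, separate leaders, closed board meetings.
    print(weighted_total([1, 1, 0, 1, 0]), "of", MAX_TOTAL)
```

Reporting the total alongside the maximum keeps scores comparable if the list of characteristics or the weights are revised later.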
MEASURING INDIVIDUAL ORGANIZATIONAL COMPONENTS
In some cases, USAID is not trying to strengthen the whole organization, but rather specific parts of it that need special intervention. In many cases, the best way of measuring more specific organizational changes is to use portions of the instruments described. For instance, the IDF has a comparatively well-developed section on management resources (leadership style, participatory management, planning, monitoring and evaluation, and management systems). Similarly, the OCAT has some good sections on external relations and internal governance.
Organizational development professionals also use other tools to measure specific capacity areas. Some drawbacks of these tools are that they require specialized technical expertise and they can be costly to use on a regular basis. Other tools may require some initial training but can be much more easily institutionalized. Below we have identified some tools for measuring selected organizational components. (You will find complete reference information for these tools in the resources section of this Tips.)
STRUCTURE AND CULTURE
The Preferred Organizational Structure instrument is designed to assess many aspects of organizational structure, such as formality of rules, communication lines, and decision-making. This tool requires organizational development skills, both to conduct the assessment and to interpret the results.
HUMAN RESOURCES AND THEIR MANAGEMENT
Many personnel assessments exist, including the Job Description Index and the Job Diagnostic Survey, both of which measure different aspects of job satisfaction, skills, and task significance. However, skilled human resource practitioners must administer them. Other assessments, such as the Alexander Team Effectiveness Critique, have been used to examine the state and functioning of work teams and can easily be applied in the field.
SERVICE DELIVERY
Often, a customer survey is one of the best ways to measure the efficiency and effectiveness of a service delivery system. A specific customer survey would need to be designed relative to each situation. Example 7 shows a sample customer service assessment.
Example 7. A Customer Service Assessment
1. In the past 12 months, have you ever contacted a municipal office to complain about something such as poor city services or a rude city official, or any other reason?
________No ________Yes
If YES:
1a. How many different problems or complaints did you contact the municipality about in the last 12 months?
________One ________Two ________Three to five ________More than five
1b. Please describe briefly the nature of the complaint starting with the one you feel was most important.
1._______________________________________________
2._______________________________________________
3._______________________________________________
2. Which department or officials did you contact initially regarding these complaints?
____Mayor’s office
____Council member
____Police
____Sanitation
____Public works
____Roads
____Housing
____Health
____Other________________________________________
2a. Were you generally satisfied with the city’s response? (IF DISSATISFIED, ASK: What were the major reasons for your dissatisfaction?)
_____Response not yet completed
_____Satisfied
_____Dissatisfied, never responded or corrected condition
_____Dissatisfied, poor quality or incorrect response was provided
_____Dissatisfied, took too long to complete response, had to keep pressuring for results, red tape, etc.
_____Dissatisfied, personnel were discourteous, negative, etc.
_____Dissatisfied, other_____________________________
3. Overall, are you satisfied with the usefulness, courtesy and effectiveness of the municipal department or official that you contacted?
_____Definitely yes
_____Generally yes
_____Generally no (explain)__________________________
_____Definitely no (explain)__________________________
Survey adapted from Hatry, Blair, and others, 1992.
DEVELOPING INDICATORS
Indicators permit managers to track and understand activity/program performance at both the operational (inputs, outputs, processes) and strategic (development objectives and intermediate results) levels. To managers familiar with the development and use of indicators, it may seem straightforward to derive indicators from the instruments presented in the preceding pages. However, several critical points will ensure that the indicators developed within the context of these instruments are useful to managers.
First, the development of indicators should be driven by the informational needs of managers, from both USAID and the given relevant organizations, to inform strategic and operational decisions and to assist in reporting and communicating to partners and other stakeholders. At times, there is a tendency to identify or design a data collection instrument without giving too much thought to exactly what information will be needed for management and reporting. In these situations, indicators tend to be developed on the basis of the data that have been collected, rather than on what managers need. More to the point, the development of indicators should follow a thorough assessment of informational needs and precede the identification of a data collection instrument. Managers should first determine their informational needs; from these needs, they should articulate and define indicators; and only then, with this information in hand, they would identify or develop an instrument to collect the required data. This means that, in most cases, indicators should not be derived, post facto, from a data collection tool. Rather, the data collection tool should be designed with the given indicators in mind.
Second, indicators should be developed for management decisions at all levels (input indicators, output indicators, process indicators, and outcome/impact indicators). With USAID’s increased emphasis on results, managers sometimes may concentrate primarily on strategic indicators (for development objectives and intermediate results). While an emphasis on results is appropriate, particularly for USAID managers, tracking operational-level information for the organizations supported through a given Agency program is critical if managers are to understand if, to what degree, and how the organizations are increasing their capacities. The instruments outlined in this paper can provide data for indicators defined at various management levels.
Finally, indicators should meet the criteria outlined in USAID’s Automated Directives System and related pieces of Agency guidance such as CDIE’s Performance Monitoring and Evaluation Tips #6, “Selecting Performance Indicators,” and Tips #12, “Guidelines for Indicator and Data Quality.” That is, indicators should be direct, objective, practical, and adequate. Once an indicator has been decided upon, it is important to document the relevant technical details: a precise definition of the indicator; a detailed description of the data source; and a thorough explanation of the data collection method. (Refer to Tips #7, “Preparing a Performance Monitoring Plan.”)
RESULTS-LEVEL INDICATORS
USAID managers spend substantial time and energy developing indicators for development
objectives and intermediate results related to institutional capacity. The range of the Agency’s
institutional strengthening programs is broad, as is the range of the indicators that track the
programs’ results. Some results reflect multiple organizations and others relate to a single
organization. Additionally, of those results that relate to multiple organizations, some may refer
to organizations from only one sector while others may capture organizations from a number of
sectors. Results related to institutional strengthening also vary relative to the level of change
they indicate (such as an increase in institutional capacity versus the eventual impact generated
by such an increase) and with regard to whether they reflect strengthening of the whole
organization(s) or just one or several elements. It is relatively easy to develop indicators for all
types of results and to use the instruments outlined in this Tips to collect the necessary data.
For example, when a result refers to strengthening a single organization, across all elements, an
aggregate index or “score” of institutional strength may be an appropriate indicator (an
instrument based on the IDF or the scorecard model might be used to collect such data). If a
result refers to multiple organizations, it might be useful to frame an indicator in terms of the
number or percent of the organizations that meet or exceed a given threshold score or
development stage, on the basis of an aggregate index or the score of a single element for each
organization. The key is to ensure that the indicator reflects the result and to then identify the
most appropriate and useful measurement instrument.
Example 8 includes real indicators used by USAID missions in 1998 to report on strategic
objectives and intermediate results in institutional capacity strengthening.
PRACTICAL TIPS FOR A
BUSY USAID MANAGER
This TIPS introduces critical issues related to
measuring institutional capacity. It presents a
number of approaches that managers of development programs and activities currently use
in the field. In this section we summarize the
preceding discussion by offering several quick
tips that USAID managers should find useful as
they design, modify, and implement their own
approaches for measuring institutional capacity.
1. Carefully review the informational needs of
the relevant managers and the characteristics of the organization to be measured to
facilitate development of indicators. Identify
your information needs and develop indicators
before you choose an instrument.
2. To assist you in selecting an appropriate measurement tool, ask yourself the following
questions as they pertain to your institutional capacity measurement situation. Equipped
with the answers to these questions, you can scan the “features list” that describes
every tool in this paper to identify which measurement approaches to explore further.
• Is the objective to measure the entire organization? Or is it to measure
specific elements of the organization? If the latter, what are the specific
capacity areas or functions to be measured?
• How will the information be used? To measure change in an organization over
time? To compare organizations with each other? To inform procurement decisions?
To hold an organization accountable for achieving results or implementing reforms?
• What type of organizations are you measuring? Are there any particular
measurement issues pertaining to this type of organization that must be considered?
• How participatory do you want the measurement process to be?
• What is the purpose of the intervention? To strengthen an organization?
• Will organization members themselves or outsiders conduct the assessment?
• What product do you want the measurement tool to generate?
• Do you want the measurement process to be an institution-strengthening exercise in itself?
• Do you need an instrument that measures one organization? Several organizations
against individual criteria? Or several organizations against standard criteria?
3. If you are concerned about data reliability, apply measurement instruments consistently over
time and across organizations. You can adapt and adjust tools as needed, but once you
develop the instrument, use it consistently.
4. When interpreting and drawing conclusions from collected data, remember the limits of the
relevant measurement tool. Most methods for measuring institutional capacity are subjective,
as they are based on the perceptions of those participating in the assessment, and involve
some form of ordinal scaling/scoring. When reviewing data, managers should therefore zero in
on the direction and general degree of change. Do not be overly concerned about small
changes; avoid false precision.
5. Cost matters, and so does the frequency and timing of data collection. Data need to
be available frequently enough, and at the right point in the program cycle, to inform
operational and strategic management decisions. Additionally, the management benefits
of data should exceed the costs associated with their collection.
6. The process of measuring institutional capacity can contribute substantially to increasing an
organization’s strength. A number of measurement approaches are explicitly designed as
learning opportunities for organizations; that is, to identify problems and suggest related
solutions, to improve communication, or to facilitate a consensus around future priorities.
Example 8.
Selected Institutional Capacity Indicators From USAID Missions
To Measure: Institutions strengthened (entire organization)
Indicator:
• Number of institutions meeting at least 80% of their targeted improvements
To Measure: Institutions more financially sustainable
Indicators:
• Amount of funds raised from non-USAID sources
• Number of organizations where USAID contribution is less than 25% of revenues
• Number of organizations where at least five funding sources contribute at least 10% each
To Measure: Organization’s service delivery systems strengthened
Indicator:
• Percent of suspected polio cases investigated within 48 hours
To Measure: Local government management capacities improved
Indicator:
• Number of governmental units displaying improved practices, such as open and
transparent financial systems, set organizational procedures, accountability, participatory
decision-making, by-laws and elections
This TIPS was prepared for CDIE by Alan Lessik and
Victoria Michener of Management Systems International.
Bibliography
RESOURCES
Booth, W.; and R. Morin. 1996. Assessing Organizational Capacity Through Participatory
Monitoring and Evaluation Handbook. Prepared for the Pact Ethiopian NGO Sector
Enhancement Initiative. Washington: USAID.
Center for Democracy and Governance. 1998. Handbook of Democracy and Governance
Program Indicators. Washington: U.S. Agency for International Development.
Christian Reformed World Relief Committee. 1997. Partnering to Build and Measure
Organizational Capacity. Grand Rapids, Mich.
Cooper, S.; and R. O’Connor. 1993. “Standards for Organizational Consultation: Assessment and Evaluation Instruments.” Journal of Counseling and Development 71: 651-9.
Counterpart International. N.d. “CAP Monitoring and Evaluation Questionnaire.”
—N.d. “Manual for the Workshop on Development of a Training and Technical Assistance Plan (TTAP).”
—N.d. “Institutional Assessment Indicators.”
Drucker, P.; and C. Roseum. 1993. How to Assess Your Nonprofit Organization with Peter Drucker’s Five
Important Questions: User Guide for Boards, Staff, Volunteers and Facilitators. Jossey-Bass.
Eade, D. 1997. Capacity-Building: An Approach to People-Centred Development. Oxford: Oxfam.
Fowler, A.; L. Goold; and R. James. 1995. Participatory Self Assessment of NGO Capacity. INTRAC
Occasional Papers Series No. 10. Oxford.
Hatry, H.; L. Blair; D. Fisk; J. Grenier; J. Hall; and P. Schaenman. 1992. How Effective Are Your Community
Services? Procedures for Measuring Their Quality. Washington: The Urban
Institute.
International Working Group on Capacity Building of Southern NGOs. 1998. “Southern NGO Capacity Building: Issues and Priorities.” New Delhi: Society for Participatory Research in Asia.
International Working Group on Capacity Building for NGOs. 1998. “Strengthening Southern NGOs: The Donor Perspective.” Washington: USAID and The World Bank.
Kelleher, D. and K. McLaren with R. Bisson. 1996. “Grabbing the Tiger by the Tail: NGOs
Learning for Organizational Change.” Canadian Council for International Cooperation.
Lent, D. October 1996. “What is Institutional Capacity?” On Track: The Reengineering Digest. 2 (7): 3. Washington: U.S. Agency for International Development.
Levinger, B. and E. Bloom. 1997. Introduction to DOSA: An Outline Presentation.
http://www.edc.org/int/capdev/dosafile/dosintr.htm.
Lusthaus, C., G. Anderson, and E. Murphy. 1995. “Institutional Assessment: A Framework for
Strengthening Organizational Capacity for IDRC’s Research Partners.” IDRC.
Mentz, J.C.N. 1997. “Personal and Institutional Factors in Capacity Building and Institutional
Development.” European Centre for Development Policy Management Working Paper
No. 14.
Morgan, P.; and A. Qualman. 1996. “Institutional and Capacity Development, Results-Based
Management and Organisational Performance.” Canadian International Development
Agency.
New TransCentury Foundation. 1996. Practical Approaches to PVO/NGO Capacity Building:
Lessons from the Field (five monographs). Washington: U.S. Agency for International
Development.
Pact. N.d. “What is Prose?”
—1998. “Pact Organizational Capacity Assessment Training of Trainers.” 7-8 January.
Renzi, M. 1996. “An Integrated Tool Kit for Institutional Development.” Public Administration
and Development 16: 469-83.
—N.d. “The Institutional Framework: Frequently Asked Questions.” Unpublished paper.
Management Systems International.
Sahley, C. 1995. “Strengthening the Capacity of NGOs: Cases of Small Enterprise Development
Agencies in Africa.” INTRAC NGO Management and Policy Series. Oxford.
Save the Children. N.d. Institutional Strengthening Indicators: Self Assessment for NGOs.
UNDP. 1997. Capacity Assessment and Development. Technical Advisory Paper No. 3, Management
Development and Governance Division. New York.
Bureau for Policy and Program Coordination. 1995. USAID-U.S. PVO Partnership. Policy
Guidance. Washington: U.S. Agency for International Development.
Office of Private and Voluntary Cooperation. 1998. USAID Support for NGO Capacity-Building:
Approaches, Examples, Mechanisms. Washington: U.S. Agency for International
Development.
—1998. Results Review Fiscal Year 1997. Washington: U.S. Agency for International
Development.
NPI Learning Team. 1997. New Partnerships Initiative: A Strategic Approach to Development
Partnering. Washington: U.S. Agency for International Development.
USAID/Brazil. 1998. Fiscal Year 2000 Results Review and Resource Request.
USAID/Guatemala. 1998. Fiscal Year 2000 Results Review and Resource Request.
USAID/Indonesia. 1998. Fiscal Year 2000 Results Review and Resource Request.
USAID/Madagascar. 1998. Fiscal Year 2000 Results Review and Resource Request.
—1997. Institutional Capacity Needs Assessment.
USAID/Mexico. 1998. The FY 1999-FY 2003 Country Strategy for USAID in Mexico.
USAID/Mozambique. 1998. Fiscal Year 2000 Results Review and Resource Request.
USAID/West Bank-Gaza. 1998. Fiscal Year 2000 Results Review and Resource Request.
Whorton, J.; and D. Morgan. 1975. Measuring Community Performance: A Handbook of
Indicators, University of Oklahoma.
World Bank. 1996. Partnership for Capacity Building in Africa: Strategy and Program of Action.
Washington.
World Learning. 1998. Institutional Analysis Instrument: An NGO Development Tool.
Sources of Information on Institutional Capacity Measurement Tools
Discussion-Oriented Organizational Self-Assessment:
http://www.edc.org/int/capdev/dosafile/dosintr.htm.
Institutional Development Framework: Management Systems International. Washington.
Organizational Capacity Assessment Tool: http://www.pactworld.org/ocat.html Pact. Washington.
Dynamic Participatory Institutional Diagnostic: New TransCentury Foundation. Arlington, Va.
Organizational Capacity Indicator: Christian Reformed World Relief Committee. Grand
Rapids, Mich.
Smith, P.; L. Kendall; and C. Hulin. 1969. The Measurement of Satisfaction in Work and
Retirement. Rand McNally.
Hackman, J.R.; and G.R. Oldham. 1975. “Job Diagnostic Survey: Development of the Job
Diagnostic Survey.” Journal of Applied Psychology 60: 159-70.
Goodstein, L.D.; and J.W. Pfeiffer, eds. 1985. Alexander Team Effectiveness Critique: The 1995
Annual: Developing Human Resources. Pfeiffer & Co.
Bourgeois, L.J.; D.W. McAllister; and T.R. Mitchell. 1978. “Preferred Organizational Structure:
The Effects of Different Organizational Environments Upon Decisions About
Organizational Structure.” Academy of Management Journal 21: 508-14.
Kraut, A. 1996. Customer and Employee Surveys: Organizational Surveys: Tools for Assessment and
Change. Jossey-Bass Publishers.
NUMBER 16
1ST EDITION 2010
PERFORMANCE MONITORING & EVALUATION
TIPS
CONDUCTING MIXED-METHOD EVALUATIONS
INTRODUCTION
This TIPS provides guidance on using a mixed-methods approach for evaluation research.
Frequently, evaluation statements of work specify that a mix of methods be used to answer
evaluation questions. This TIPS includes the rationale for using a mixed-method evaluation
design, guidance for selecting among methods (with an example from an evaluation of a
training program) and examples of techniques for analyzing data collected with several
different methods (including “parallel analysis”).
MIXED-METHOD
EVALUATIONS
DEFINED
A mixed-method evaluation is one that uses two or more techniques or methods to collect
the data needed to answer one or more evaluation questions. Some of the different data
collection methods that might be combined in an evaluation include structured observations,
key informant interviews, pre- and post-test surveys, and reviews of government statistics.
This could involve the collection and use of both quantitative and qualitative data to analyze
and identify findings and to develop conclusions in response to the evaluation questions.
RATIONALE FOR
USING A MIXED-METHOD
EVALUATION DESIGN
There are several possible cases in which it would be highly beneficial to employ
mixed-methods in an evaluation design:
• When a mix of different methods is used to collect data from different sources to
provide independent estimates of key indicators, and those estimates complement one
another, it increases the validity of conclusions related to an evaluation question. This
is referred to as triangulation. (See TIPS 5: Rapid Appraisal, and Bamberger, Rugh and
Mabry [2006] for further explanation and descriptions of triangulation strategies used in
evaluations.)
• When reliance on one method alone may not be sufficient to answer all aspects of each
evaluation question.
• When the data collected from one method can help interpret findings from the analysis of
data collected from another method. For example, qualitative data from in-depth
interviews or focus groups can help interpret statistical patterns from quantitative data
collected through a random-sample survey. This yields a richer analysis and can also
provide a better understanding of the context in which a program operates.
There are a number of additional benefits derived from using a mix of methods in any given
evaluation.
• Using mixed-methods can more readily yield examples of unanticipated changes or
responses.
• Mixed-method evaluations have the potential of surfacing other key issues and providing
a deeper understanding of program context that should be considered when analyzing
data and developing findings and conclusions.
• Mixed-method evaluations often yield a wider range of points of view that might
otherwise be missed.
Key Steps in Developing a Mixed-Method Evaluation Design and Analysis
Strategy
1. In order to determine the methods that will be employed, carefully review the purpose of the evaluation and the
primary evaluation questions. Then select the methods that will be the most useful and cost-effective to answer
each question in the time period allotted for the evaluation. Sometimes it is apparent that there is one method
that can be used to answer most, but not all, aspects of the evaluation question.
2. Select complementary methods to cover different aspects of the evaluation question (for example, the how and
why issues) that the first method selected cannot alone answer, and/or to enrich and strengthen data analysis
and interpretation of findings.
3. In situations when the strength of findings and conclusions for a key question is absolutely essential, employ a
triangulation strategy. What additional data sources and methods can be used to obtain information to answer
the same question in order to increase the validity of findings from the first method selected?
4. Re-examine the purpose of the evaluation and the methods initially selected to ensure that all aspects of the
primary evaluation questions are covered thoroughly. This is the basis of the evaluation design. Develop data
collection instruments accordingly.
5. Design a data analysis strategy to analyze the data that will be generated from the selection of methods chosen
for the evaluation.
6. Ensure that the evaluation team composition includes members that are well-versed and experienced in applying
each type of data collection method and subsequent analysis.
7. Ensure that there is sufficient time in the evaluation statement of work for evaluators to fully analyze data
generated from each method employed and to realize the benefits of conducting a mixed method evaluation.
DETERMINING
WHICH METHODS TO
USE
In a mixed-method evaluation,
the evaluator may use a
combination of methods, such as
a survey using comparison
groups in a quasi-experimental or
experimental design, a review of
key documents, a reanalysis of
government statistics, in-depth
interviews with key informants,
focus groups, and structured
observations. The selection of
methods, or mix, depends on the
nature of the evaluation purpose
and the key questions to be
addressed.
SELECTION OF DATA
COLLECTION
METHODS – AN
EXAMPLE
The selection of which methods to use in an evaluation is driven by the key evaluation
questions to be addressed. Frequently, one primary evaluation method is apparent. For
example, suppose an organization wants to know about the effectiveness of a pilot training
program conducted for 100 individuals to set up their own small businesses after the
completion of the training.
The evaluator should ask what
methods are most useful and
cost-effective to assess the
question of the effectiveness of
that training program within the
given time frame allotted for the
evaluation. The answer to this
question must be based on the
stated outcome expected from
the training program. In this
example, let us say that the
organization’s expectations were
that, within one year, 70 percent
of the 100 individuals that were
trained will have used their new
skills and knowledge to start a
small business.
What is the best method to determine whether this outcome has been achieved? The most
cost-effective means of answering this question is to survey 100 percent of the individuals
who graduated from the training program using a closed-ended questionnaire. It follows that
a survey instrument should be designed to determine if these individuals have actually
succeeded in starting up a new business.
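The outcome check described above can be sketched in a few lines of Python. This is only an illustration: the field name started_business, the function names, and the survey results (62 of 100 graduates) are hypothetical and do not come from any actual evaluation.

```python
def startup_rate(responses):
    """Return the share of survey respondents who report having started a business."""
    if not responses:
        raise ValueError("no survey responses")
    started = sum(1 for r in responses if r["started_business"])
    return started / len(responses)


def target_met(responses, target=0.70):
    """Compare the observed start-up rate with the program's stated target."""
    return startup_rate(responses) >= target


# Hypothetical closed-ended survey results for the 100 graduates:
# the first 62 respondents answered "yes", the remaining 38 answered "no".
responses = [{"started_business": i < 62} for i in range(100)]

rate = startup_rate(responses)
print(f"{rate:.0%} started a business; target met: {target_met(responses)}")
# prints: 62% started a business; target met: False
```

A result below the target, as in this made-up case, is exactly the situation in which the organization would want the additional methods discussed next.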
While this sounds relatively straightforward, organizations are often interested in related
issues. If fewer than 70 percent of the individuals started a new business one year after
completion of the training, the organization generally wants to know why some graduates
from the program were successful while others were not. Did the training these individuals
received actually help them start up a small business? Were there topics that should have
been covered to more thoroughly prepare them for the realities of setting up a business?
Were there other topics that should have been addressed? In summary, this organization
wants to learn not only whether at least 70 percent of the individuals trained have started
up a business, but also how effectively the training equipped them to do so. It also wants to
know both the strengths and the shortcomings of the training so that it can improve future
training programs.
The organization may also want
to know if there were factors
outside the actual intervention
that had a bearing on the
training’s success or failure. For
example, did some individuals
find employment instead? Was
access to finance a problem? Did
they conduct an adequate market
analysis? Did some individuals
start with prior business skills?
Are there factors in the local
economy, such as local business
regulations, that either promote
or discourage small business
start-ups? There are numerous factors that could have
influenced this outcome.
The selection of additional
methods to be employed is,
again, based on the nature of
each aspect of the issue or set
of related questions that the
organization wants to probe.
To continue with this example, the evaluator might expand the number of survey questions
to address issues related to the effectiveness of the training and external factors such as
access to finance. These additional questions can be designed to yield additional
quantitative data and to probe for information such as the level of satisfaction with the
training program, the usefulness of the training program in establishing a business, whether
the training graduate received a small business start-up loan, if the size of the loan the
graduate received was sufficient, and whether graduates are still in the process of starting
up their businesses or instead have found employment. Intake data from the training
program on characteristics of each trainee can also be examined to see if there are any
particular characteristics, such as sex or ethnic background, that can be correlated with the
survey findings.
It is important to draw on additional methods to help explain the statistical findings from
the survey, probe the strengths and shortcomings of the training program, further
understand issues related to access to finance, and identify external factors affecting
success in starting a business. In this case, the evaluation design could focus on a sub-set
of the 100 individuals to obtain additional qualitative information. A selected group of 25
people could be asked to answer an additional series of open-ended questions during the
same interview session, expanding it from 30 minutes to 60 minutes. Asking all 100 people
open-ended questions would be better than asking just 25, but costs prohibit interviewing
the entire group.
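The cost trade-off in this design can be made concrete with simple arithmetic on interview time. The sketch below uses only the figures given in the example (100 graduates, 25 extended interviews, 30 and 60 minutes); the function name is hypothetical, and interview minutes are used as a rough stand-in for cost, which is an assumption.

```python
def total_interview_minutes(n_all, base_minutes, n_extended, extended_minutes):
    """Total interview time when only a subset receives the longer session."""
    if n_extended > n_all:
        raise ValueError("the extended subset cannot exceed the full group")
    standard = (n_all - n_extended) * base_minutes
    extended = n_extended * extended_minutes
    return standard + extended


# Design in the example: 75 standard interviews plus 25 extended interviews.
subset_design = total_interview_minutes(100, 30, 25, 60)

# Alternative: all 100 graduates receive the 60-minute interview.
full_design = total_interview_minutes(100, 30, 100, 60)

print(subset_design, full_design, full_design - subset_design)
# prints: 3750 6000 2250
```

Under these assumptions the subset design needs 3,750 interview minutes instead of 6,000, a difference of 2,250 minutes (37.5 hours) of interviewer time.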
Using the same example, suppose the organization has learned through informal feedback
that access to finance is likely a key factor in determining success in business start-up in
addition to the training program itself. Depending on the evaluation findings, the
organization may want to design a finance program that increases access to loans for small
business start-ups. To determine the validity of this assumption, the evaluation design relies
on a triangulation approach to assess whether and how access to finance for business
start-ups provides further explanations regarding success or failure outcomes. The design
includes a plan to collect data from two other sources using a separate data collection
method for each source. The first data source includes the quantitative data from the survey
of the 100 training graduates. The evaluation designers determine that the second data
source will be the managers of local banks and credit unions that survey respondents
reported having approached for start-up loans. In-depth interviews will be conducted to
record and understand policies for lending to entrepreneurs trying to establish small
businesses, the application of those policies, and other business practices with respect to
prospective clients. The third data source consists of bank loan statistics for entrepreneurs
who have applied to start up small businesses. Now there are three independent data
sources using different data collection methods to assess whether access to finance is an
additional key factor in determining small business start-up success.
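One simple way to record the outcome of such a triangulation is to note, for each independent source, whether its analysis supports the assumption, and then check whether the sources agree. The Python sketch below is a hypothetical bookkeeping aid, not a method prescribed by this TIPS; the source names, the three labels, and the findings themselves are invented for illustration.

```python
from collections import Counter

VALID_LABELS = {"supports", "contradicts", "inconclusive"}


def triangulate(findings):
    """Classify independent findings on one question as convergent or divergent.

    findings maps a data source name to "supports", "contradicts", or "inconclusive".
    """
    if len(findings) < 2:
        raise ValueError("triangulation needs at least two independent sources")
    unknown = set(findings.values()) - VALID_LABELS
    if unknown:
        raise ValueError(f"unrecognized finding labels: {sorted(unknown)}")
    counts = Counter(findings.values())
    for direction in ("supports", "contradicts"):
        if counts[direction] == len(findings):
            return f"convergent ({direction})"
    # Any mix of labels, including inconclusive ones, calls for further analysis.
    return "divergent"


# Hypothetical findings on whether access to finance is a key factor.
findings = {
    "graduate_survey": "supports",
    "lender_interviews": "supports",
    "bank_loan_statistics": "contradicts",
}
print(triangulate(findings))
# prints: divergent
```

The judgment of whether each source "supports" the assumption still rests with the evaluator; the code only makes the comparison across sources explicit.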
In this example, the total mix of
methods the evaluator would use
includes the following: the survey
of all 100 training graduates, data
from open-ended questions from
a subset of graduates selected for
longer interviews, analysis of training intake data on trainee
characteristics, in-depth interviews with managers of
lending institutions, and an examination of loan data. The
use of mixed-methods was
necessary because the client
organization in this case not only
wanted to know how effective the
pilot training course was based
on its own measure of program
success, but also whether access
to finance contributed to either
success or failure in starting up a
new business. The analysis of the
data will be used to strengthen
the training design and content
employed in the pilot training
course, and as previously stated,
perhaps to design a microfinance
program.
The last step in the process of designing a mixed-method evaluation is to determine how
the data derived from using mixed-methods will be analyzed to produce findings and to
determine the key conclusions.
ANALYZING DATA
FROM A MIXED-METHOD
EVALUATION –
DESIGNING A DATA
ANALYSIS STRATEGY
It is important to design the data analysis strategy before the actual data collection begins.
Having done so, the evaluator can begin thinking about trends in findings from different
sets of data to see if findings converge or diverge. Analyzing data collected from a mixture
of methods is admittedly more complicated than analyzing the data derived from one
method. This entails a process in which quantitative and qualitative data analysis strategies
are eventually connected to determine and understand key findings. Several different
techniques can be used to analyze data from mixed-methods approaches, including parallel
analysis, conversion analysis, sequential analysis, multilevel analysis, and data synthesis.
The choice of analytical techniques should be matched with the purpose of the evaluation
using mixed-methods. Table 1 briefly describes the different analysis techniques and the
situations in which each method is best applied.
In complex evaluations with multiple issues to address, skilled evaluators may use more
than one of these techniques to analyze the data.
EXAMPLE OF
APPLICATION
Here we present an example of parallel mixed-data analysis, because it is the most widely
used analytical technique in mixed-method evaluations. This is followed by examples of how
to resolve situations where divergent findings arise from the analysis of data collected
through a triangulation process.
PARALLEL
MIXED-DATA
ANALYSIS
Parallel mixed-data analysis consists of two major steps:
Step 1: This involves two or more analytical processes. The data collected from each method
employed must be analyzed separately. For example, a statistical analysis of quantitative
data derived from a survey, a set of height/weight measures, or a set of government
statistics is conducted. Then, a separate and independent analysis is conducted of
qualitative data derived from, for example, in-depth interviews, case studies, focus groups,
or structured observations to determine emergent themes, broad patterns, and contextual
factors. The main point is that the analysis of data collected from each method must be
conducted independently.
Step 2: Once the analysis of the data generated by each data collection method is
completed, the evaluator focuses on how the analysis and findings from each data set can
inform, explain, and/or strengthen findings from the other data set. There are two possible
primary analytical methods for doing this, and sometimes both methods are used in the
same evaluation. Again, the method used depends on the purpose of the evaluation.
• In cases where more than one method is used specifically to strengthen and validate
findings for the same question through a triangulation design, the evaluator compares the
findings from the independent analysis on each data set to determine if there is a
convergence of findings. This method is used when it is critical to produce defensible
conclusions that can be used to inform major program decisions (e.g., end or extend a
program).
• To interpret or explain findings from quantitative analysis, evaluators use findings from
the analysis of qualitative data. This method can provide a richer analysis and set of
explanations affecting program outcomes that enhance the utility of the evaluation for
program managers. Conversely, patterns and associations arising from the analysis of
quantitative data can inform additional patterns to look for in analyzing qualitative data.
The analysis of qualitative data can also enhance the understanding of important program
context data. This method is often used when program managers want to know not only
whether or not a program is achieving its intended results, but also why or why not.
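The two steps of parallel mixed-data analysis can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the survey fields (got_loan, started_business), the coded interview themes, the function names, and all data are hypothetical, and real qualitative coding is far richer than a count of theme labels.

```python
from collections import Counter


def analyze_survey(rows):
    """Step 1a: quantitative analysis, start-up rate by loan status."""
    totals, starts = Counter(), Counter()
    for row in rows:
        totals[row["got_loan"]] += 1
        starts[row["got_loan"]] += int(row["started_business"])
    return {group: starts[group] / totals[group] for group in totals}


def analyze_interviews(coded_notes):
    """Step 1b: separate qualitative analysis, frequency of coded themes."""
    return Counter(theme for note in coded_notes for theme in note)


def integrate(rates, themes, theme="access_to_finance"):
    """Step 2: set the two independent sets of findings side by side."""
    return {
        "rate_gap": rates[True] - rates[False],
        "theme_mentions": themes[theme],
    }


# Hypothetical survey rows: 3 of 4 loan recipients started a business,
# compared with 1 of 4 graduates who did not receive a loan.
survey = (
    [{"got_loan": True, "started_business": s} for s in (True, True, True, False)]
    + [{"got_loan": False, "started_business": s} for s in (True, False, False, False)]
)
# Hypothetical themes coded from three in-depth interviews.
notes = [["access_to_finance", "training_gaps"], ["access_to_finance"], ["market_analysis"]]

rates = analyze_survey(survey)
themes = analyze_interviews(notes)
print(integrate(rates, themes))
# prints: {'rate_gap': 0.5, 'theme_mentions': 2}
```

The two analysis functions never read each other's output, which mirrors the requirement that each data set be analyzed independently before the findings are brought together.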
WHEN FINDINGS DO
NOT CONVERGE
In cases where mixed-method evaluations employ triangulation, it is not unusual that
findings from the separate analysis of each data set do not automatically converge. If this
occurs, the evaluator must try to resolve the conflict among divergent findings. This is not a
disaster. Often this kind of situation can present an opportunity to generate more nuanced
explanations and important additional findings that are of great value.
One method evaluators use when findings from different methods diverge is to carefully
re-examine the raw qualitative data through a second and more in-depth content analysis.
This is done to determine if there were any factors or issues that were missed when these
data were first being organized for analysis. The results of this third layer of analysis can
produce a deeper understanding of the data, and can then be used to generate new
interpretations. In some cases, other factors external to the program might be discovered
through contextual analysis of economic, social or political conditions or an analysis of
operations and interventions across program sites.
Another approach is to reanalyze
all the disaggregated data in
each data set separately, by
characteristics of the respondents
as appropriate to the study, such
as age, gender, educational
background, economic strata,
etc., and/or by geography/locale
of respondents.
The results of this analysis may
yield other information that can
help to resolve the divergence of
findings. In this case, the
evaluator should attempt to rank
order these factors in terms of
frequency of occurrence. This
further analysis will provide
additional explanations for the
variances in findings. While most
professionals build this type of
disaggregation into the analysis
of the data during the design
phase of the evaluation, it is
worth reexamining patterns from
disaggregated data.
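The disaggregation and rank-ordering described above might look like the following Python sketch. The records, field names, and reported factors are invented for illustration, and the two helper functions are hypothetical.

```python
from collections import Counter, defaultdict


def disaggregate(records, outcome, by):
    """Return the rate of a yes/no outcome within each group of a characteristic."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[by]].append(rec[outcome])
    return {group: sum(values) / len(values) for group, values in groups.items()}


def rank_factors(reported_factors):
    """Rank-order reported factors by frequency of occurrence, most frequent first."""
    return Counter(reported_factors).most_common()


# Hypothetical respondent records from the graduate survey.
records = [
    {"sex": "F", "locale": "urban", "started": True},
    {"sex": "F", "locale": "rural", "started": False},
    {"sex": "M", "locale": "urban", "started": True},
    {"sex": "M", "locale": "rural", "started": False},
    {"sex": "F", "locale": "urban", "started": True},
    {"sex": "M", "locale": "rural", "started": True},
]

by_locale = disaggregate(records, "started", "locale")
by_sex = disaggregate(records, "started", "sex")
print(by_locale["urban"], round(by_locale["rural"], 2))  # prints: 1.0 0.33
print(by_sex["F"] == by_sex["M"])                        # prints: True

# Hypothetical reasons respondents gave for not starting a business.
factors = ["no loan", "no loan", "found job", "no loan", "found job", "weak market"]
print(rank_factors(factors))
# prints: [('no loan', 3), ('found job', 2), ('weak market', 1)]
```

In this made-up data set, start-up rates differ sharply by locale but not by sex, the kind of pattern that can help explain why two data sets appeared to diverge.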
Evaluators should also check for
data quality issues, such as the
validity of secondary data sources
or possible errors in survey data
from incomplete recording or
incorrect coding of responses.
(See TIPS 12: Data Quality
Standards.) If the evaluators are
still at the program site, it is
possible to resolve data quality
issues with limited follow-up data
collection by, for example,
conducting in-depth interviews
with key informants (if time and
budget permit).
In cases where an overall
summative program conclusion is
required, another analytical tool
that is used to resolve divergent
findings is the data synthesis
method.
(See Table 2.) This
method rates the strength of
findings generated from the
analysis of each data set based
on the intensity of the impact
(e.g., on a scale from very high
positive to very high negative)
and the quality and validity of the
data. An overall rating is assigned
for each data set, but different
weights can then be assigned to
different data sets if the evaluator
knows that certain data sources
or methods for collecting data
are stronger than others.
Ultimately, an index is created
based on the average of those
ratings to synthesize an overall
program effect on the outcome.
See McConney, Rudd and Ayres
(2002) to learn more about this
method.
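The arithmetic behind such an index can be sketched as follows. This is only an illustration under stated assumptions, not the procedure published by McConney, Rudd and Ayres: the numeric scales, the choice to discount each effect rating by its quality rating, and all names (DataSetRating, overall_rating, synthesis_index) are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DataSetRating:
    name: str
    effect: float        # assumed scale: -2 (very high negative) to +2 (very high positive)
    quality: float       # assumed scale: 0 (not credible) to 1 (fully credible)
    weight: float = 1.0  # larger for sources the evaluator judges stronger


def overall_rating(rating):
    """Assumed rule: discount the effect rating by the quality of the evidence."""
    return rating.effect * rating.quality


def synthesis_index(ratings):
    """Weighted average of the overall ratings across all data sets."""
    total_weight = sum(r.weight for r in ratings)
    if total_weight <= 0:
        raise ValueError("at least one data set needs a positive weight")
    return sum(overall_rating(r) * r.weight for r in ratings) / total_weight


# Illustrative values only.
ratings = [
    DataSetRating("household survey", effect=2.0, quality=0.5, weight=2.0),
    DataSetRating("key informant interviews", effect=-1.0, quality=1.0, weight=1.0),
]
index = synthesis_index(ratings)
print(round(index, 2))
```

Keeping the per-data-set ratings visible alongside the index lets a reader see how far the sources diverged before they were averaged.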
REPORTING ON
MIXED-METHOD
EVALUATIONS
Mixed-method evaluations
generate a great deal of data,
and, to profit from the use of
those methods, evaluators must
use and analyze all of the data
sets. Through the use of mixed-method
evaluations, findings and
conclusions can be enriched and
strengthened. Yet there is a
tendency to underuse, or even
not to use, all the data collected
for the evaluation. Evaluators can
rely too heavily on one particular
data source if it generates easily
digestible and understandable
information for a program
manager. For example, in many
cases data generated from
qualitative methods are
insufficiently analyzed. In some
cases only findings from one
source are reported.
One way to prevent
underutilization of findings is to
write a statement of work that
provides the evaluator sufficient
time to analyze the data sets
from each method employed,
and hence to develop valid
findings, explanations, and strong
conclusions that a program
manager can use with
confidence. Additionally,
statements of work for evaluation
should require evidence of, and
reporting on, the analysis of data
sets from each method that was
used to collect data, or
methodological justification for
having discarded any data sets.
REFERENCES
Bamberger, Michael, Jim Rugh and Linda Mabry. Real World Evaluation: Working Under Budget,
Time, Data and Political Constraints, Chapter 13, “Mixed-Method Evaluation,” pp. 303-322, Sage
Publications Inc., Thousand Oaks, CA, 2006.
Greene, Jennifer C. and Valerie J. Caracelli. “Defining and Describing the Paradigm Issue in
Mixed-methods Evaluation,” in Advances in Mixed-Method Evaluation: The Challenges and Benefits of
Integrating Diverse Paradigms, Greene and Caracelli eds. New Directions for Evaluation. Jossey-Bass
Publishers, No. 74, Summer 1997, pp. 5-17.
Mark, Melvin M., Irwin Feller and Scott B. Button. “Integrating Qualitative Methods in a
Predominantly Quantitative Evaluation: A Case Study and Some Reflections,” in Advances in
Mixed-Method Evaluation: The Challenges and Benefits of Integrating Diverse Paradigms, Greene
and Caracelli eds. New Directions for Evaluation. Jossey-Bass Publishers, No. 74, Summer 1997,
pp. 47-59.
McConney, Andrew, Andy Rudd, and Robert Ayres. “Getting to the Bottom Line: A Method for
Synthesizing Findings Within Mixed-method Program Evaluations,” in American Journal of
Evaluation, Vol. 23, No. 2, 2002, pp. 121-140.
Teddlie, Charles and Abbas Tashakkori. Foundations of Mixed-methods Research: Integrating
Quantitative and Qualitative Approaches in the Social and Behavioral Sciences, Sage Publications,
Inc., Los Angeles, 2009.
TABLE 1 – METHODS FOR ANALYZING MIXED-METHODS DATA [1]

Analytical Method: Parallel
Brief Description: Two or more data sets collected using a mix of methods (quantitative and qualitative) are analyzed independently. The findings are then combined or integrated.
Best for: Triangulation designs to look for convergence of findings when the strength of the findings and conclusions is critical, or to use analysis of qualitative data to yield deeper explanations of findings from quantitative data analysis.

Analytical Method: Conversion
Brief Description: Two types of data are generated from one data source beginning with the form (quantitative or qualitative) of the original data source that was collected. Then the data are converted into either numerical or narrative data. A common example is the transformation of qualitative narrative data into numerical data for statistical analysis (e.g., on the simplest level, frequency counts of certain responses).
Best for: Extending the findings of one data set, say, quantitative, to generate additional findings and/or to compare and potentially strengthen the findings generated from a complementary set of, say, qualitative data.

Analytical Method: Sequential
Brief Description: A chronological analysis of two or more data sets (quantitative and qualitative) where the results of the analysis from the first data set are used to inform the analysis of the second data set. The type of analysis conducted on the second data set is dependent on the outcome of the first data set.
Best for: Testing hypotheses generated from the analysis of the first data set.

Analytical Method: Multilevel
Brief Description: Qualitative and quantitative techniques are used at different levels of aggregation within a study from at least two data sources to answer interrelated evaluation questions. One type of analysis (qualitative) is used at one level (e.g., patient) and another type of analysis (quantitative) is used in at least one other level (e.g., nurse).
Best for: Evaluations where organizational units for study are nested (e.g., patient, nurse, doctor, hospital, hospital administrator in an evaluation to understand the quality of patient treatment).

Analytical Method: Data Synthesis
Brief Description: A multi-step analytical process in which: 1) a rating of program effectiveness using the analysis of each data set is conducted (e.g., large positive effect, small positive effect, no discernable effect, small negative effect, large negative effect); 2) quality of evidence assessments are conducted for each data set using “criteria of worth” to rate the quality and validity of each data set gathered; 3) using the ratings collected under the first two steps, develop an aggregated equation for each outcome under consideration to assess the overall strength and validity of each finding; and 4) average outcome-wise effectiveness estimates to produce one overall program-wise effectiveness index.
Best for: Providing a bottom-line measure in cases where the evaluation purpose is to provide a summative program-wise conclusion when findings from mixed-method evaluations using a triangulation strategy do not converge and appear to be irresolvable, yet a defensible conclusion is needed to make a firm program decision. Note: there may still be some divergence in the evaluation findings from mixed data sets that the evaluator can still attempt to resolve and/or explore to further enrich the analysis and findings.
[1] See Teddlie and Tashakkori (2009) and Mark, Feller and Button (1997) for examples and further explanations of parallel data analysis.
See Teddlie and Tashakkori (2009) on conversion, sequential, multilevel, and fully integrated mixed-methods data analysis; and
McConney, Rudd, and Ayres (2002) for a further explanation of data synthesis analysis.
For more information:
TIPS publications are available online at [insert website].
Acknowledgements:
Our thanks to those whose experience and insights helped shape this publication including USAID’s
Office of Management Policy, Budget and Performance (MPBP). This publication was written by Dr.
Patricia Vondal of Management Systems International.
Comments regarding this publication can be directed to:
Gerald Britan, Ph.D.
Tel: (202) 712-1158
gbritan@usaid.gov
Contracted under RAN-M-00-04-00049-A-FY0S-84
Integrated Managing for Results II
NUMBER 17
1ST EDITION, 2010
PERFORMANCE MONITORING & EVALUATION
TIPS
CONSTRUCTING AN EVALUATION
REPORT
ABOUT TIPS
These TIPS provide practical advice and suggestions to USAID managers on issues related to performance
monitoring and evaluation. This publication is a supplemental reference to the Automated Directive
System (ADS) Chapter 203.
INTRODUCTION
This TIPS has three purposes. First,
it provides guidance for evaluators
on the structure, content, and style
of evaluation reports. Second, it
offers USAID officials, who
commission evaluations, ideas on
how to define the main deliverable.
Third, it provides USAID officials
with guidance on reviewing and
approving evaluation reports.
The main theme is a simple one: how
to make an evaluation report useful to
its readers. Readers typically include
a variety of development
stakeholders and professionals; yet,
the most important are the
policymakers and managers who
need credible information for
program or project decision-making.
Part of the primary purpose of an
evaluation usually entails informing
this audience.
To be useful, an evaluation report
should address the evaluation
questions and issues with accurate
and data-driven findings, justifiable
conclusions, and practical
recommendations. It should reflect
the use of sound evaluation
methodology and data collection,
and report the limitations of each.
Finally, an evaluation report should be
written with a structure and style
that promote learning and action.
Five common problems emerge in
relation to evaluation reports.
These problems are as follows:
• An unclear description of the
program strategy and the specific
results it is designed to achieve.
• Inadequate description of the
evaluation’s purpose, intended
uses, and the specific evaluation
questions to be addressed.
• Imprecise analysis and reporting
of quantitative and qualitative data
collected during the evaluation.
• A lack of clear distinctions
between findings and conclusions.
• Conclusions that are not
grounded in the facts and
recommendations that do not
flow logically from conclusions.
This guidance offers tips that apply
to an evaluation report for any type
of evaluation — be it formative,
summative (or impact), a rapid
appraisal evaluation, or one using
more rigorous methods.
Evaluation reports should be readily
understood and should identify key
points clearly, distinctly, and
succinctly. (ADS 203.3.6.6)
A PROPOSED
REPORT OUTLINE
Table 1 presents a suggested outline
and approximate page lengths for a
typical evaluation report.
The
evaluation team can, of course,
modify this outline as needed. As
indicated in the table, however,
some elements are essential parts of
any report.
This outline can also help USAID
managers define the key deliverable
in an Evaluation Statement of Work
(SOW) (see TIPS 3: Preparing an
Evaluation SOW).
We will focus particular attention
on the section of the report that
covers findings, conclusions, and
recommendations. This section
represents the core element of the
evaluation report.
BEFORE THE
WRITING BEGINS
Before the report writing begins, the
evaluation team must complete two
critical tasks: 1) establish clear and
defensible findings, conclusions, and
recommendations that clearly
address the evaluation questions;
and 2) decide how to organize the
report in a way that conveys these
elements most effectively.
FINDINGS,
CONCLUSIONS, AND
RECOMMENDATIONS
One of the most important tasks in
constructing an evaluation report is
to organize the report into three
main elements: findings, conclusions,
and recommendations (see Figure
1). This structure brings rigor to
the evaluation and ensures that each
element can ultimately be traced
back to the basic facts. It is this
structure that sets evaluation apart
from other types of analysis.
Once the research stage of an
evaluation is complete, the team has
typically collected a great deal of
data in order to answer the
evaluation questions. Depending on
the methods used, these data can
include observations, responses to
survey questions, opinions and facts
from key informants, secondary data
from a ministry, and so on. The
team’s first task is to turn these raw
data into findings.
Suppose, for example, that USAID
has charged an evaluation team with
answering the following evaluation
question (among others):
“How adequate are the prenatal
services provided by the Ministry of
Health’s rural clinics in
Northeastern District?”
To answer this question, the team’s
research in the district included site
visits to a random sample of rural
clinics, discussions with
knowledgeable health professionals,
and a survey of women who have
used clinic prenatal services during
the past year. The team analyzed
the raw, qualitative data and
identified the following findings:
• Of the 20 randomly-sampled rural
clinics visited, four clinics met all
six established standards of care,
while the other 16 (80 percent)
failed to meet at least two
standards. The most commonly
unmet standard (13 clinics) was
“maintenance of minimum
staff-patient ratios.”
• In 14 of the 16 clinics failing to
meet two or more standards, not
one of the directors was able to
state the minimum staff-patient
ratios for nurse practitioners,
nurses, and prenatal educators.
FIGURE 1.
ORGANIZING KEY ELEMENTS
OF THE EVALUATION
REPORT
Recommendations: Proposed actions for management
Conclusions: Interpretations and judgments
based on the findings
Findings: Empirical facts collected during the
evaluation
TYPICAL PROBLEMS WITH FINDINGS
Findings that:
1. Are not organized to address the evaluation questions, so the reader must
figure out where they fit.
2. Lack precision and/or context, so the reader cannot interpret their relative
strength.
Incorrect: “Some respondents said ‘x,’ a few said ‘y,’ and others said ‘z.’”
Correct: “Twelve of the 20 respondents (60 percent) said ‘x,’ five (25
percent) said ‘y,’ and three (15 percent) said ‘z.’”
3. Mix findings and conclusions.
Incorrect: “The fact that 82 percent of the target group was aware of the
media campaign indicates its effectiveness.”
Correct: Finding: “Eighty-two percent of the target group was aware of the
media campaign.” Conclusion: “The media campaign was effective.”
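The precision called for in item 2 is easy to produce mechanically. A minimal sketch, assuming the responses have already been coded into categories (the function name and the sample data are hypothetical):

```python
from collections import Counter


def summarize_responses(responses):
    """Report each coded response with its count, the total, and a rounded percentage."""
    total = len(responses)
    if total == 0:
        return []
    counts = Counter(responses)
    return [
        f"{count} of {total} respondents ({round(100 * count / total)} percent) said '{answer}'"
        for answer, count in counts.most_common()
    ]


# Hypothetical coded answers matching the example in item 2.
coded = ["x"] * 12 + ["y"] * 5 + ["z"] * 3
lines = summarize_responses(coded)
for line in lines:
    print(line)
```

Reporting the count and the base together with the percentage lets the reader judge the relative strength of each response.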
TYPICAL PROBLEMS WITH
CONCLUSIONS
Conclusions that:
1. Restate findings.
Incorrect: “The project met its
performance targets with respect
to outputs and results.”
Correct: “The project’s strategy
was successful.”
2. Are vaguely stated.
Incorrect: “The project could
have been more responsive to its
target group.”
Correct: “The project failed to
address the different needs of
targeted women and men.”
3. Are based on only one of several
findings and data sources.
4. Include respondents’ conclusions,
which are really findings.
Incorrect: “All four focus groups
of project beneficiaries judged the
project to be effective.”
Correct: “Based on our focus
group data and quantifiable data on
key results indicators, we conclude
that the project was effective.”
• Of 36 women who had used their
rural clinics’ prenatal services
during the past year, 27 (75
percent) stated that they were
“very dissatisfied” or
“dissatisfied,” on a scale of 1-5
from “very dissatisfied” to “very
satisfied.” The most frequently
cited reason for dissatisfaction
was “long waits for service” (cited
by 64 percent of the 27
dissatisfied women).
• Six of the seven key informants
who offered an opinion on the
adequacy of prenatal services for
the rural poor in the district
noted that an insufficient number
of prenatal care staff was a “major
problem” in rural clinics.
These findings are the empirical facts
collected by the evaluation team.
Evaluation findings are analogous to
the evidence presented in a court of
law or a patient’s symptoms
identified during a visit to the
doctor. Once the evaluation team
has correctly laid out all the findings
against each evaluation question,
only then should conclusions be
drawn for each question. This is
where many teams tend to confuse
findings and conclusions both in
their analysis and in the final report.
Conclusions represent the team’s
judgments based on the findings.
These are analogous to a court
jury’s decision to acquit or convict
based on the evidence presented or
a doctor’s diagnosis based on the
symptoms. The team must keep
findings and conclusions distinctly
separate from each other.
However, there must also be a clear
and logical relationship between
findings and conclusions.
In our example of the prenatal
services evaluation, examples of
reasonable conclusions might be as
follows:
• In general, the levels of prenatal
care staff in Northeastern
District’s rural clinics are
insufficient.
• The Ministry of Health’s periodic
informational bulletins to clinic
directors regarding the standards
of prenatal care are not sufficient
to ensure that standards are
understood and implemented.
However, sometimes the team’s
findings from different data sources
do not point as clearly in one
direction as they do in this example.
In those cases, the team
must weigh the relative credibility of
the data sources and the quality of
the data, and make a judgment call.
The team might state that a
definitive conclusion cannot be
made, or it might draw a more
guarded conclusion such as the
following:
“The preponderance of the
evidence suggests that prenatal
care is weak.”
TYPICAL PROBLEMS WITH
RECOMMENDATIONS
Recommendations that:
1. Are unclear about the action to be
taken.
Incorrect: “Something needs to be
done to improve extension
services.”
Correct: “To improve extension
services, the Ministry of Agriculture
should implement a comprehensive
introductory training program for all
new extension workers and annual
refresher training programs for all
extension workers.”
2. Fail to specify who should take
action.
Incorrect: “Sidewalk ramps for the
disabled should be installed.”
Correct: “Through matching grant
funds from the Ministry of Social
Affairs, municipal governments
should install sidewalk ramps for the
disabled.”
3. Are not supported by any findings
and conclusions.
4. Are not realistic with respect to
time and/or costs.
Incorrect: “The Ministry of Social
Affairs should ensure that all
municipal sidewalks have ramps for
the disabled within two years.”
Correct: “The Ministry of Social
Affairs should implement a gradually
expanding program to ensure that all
municipal sidewalks have ramps for
the disabled within 15 years.”
The team should never omit
contradictory findings from its
analysis and report in order to have
more definitive conclusions.
Remember, conclusions are
interpretations and judgments made
on the basis of the findings.
Sometimes we see reports that
include conclusions derived from
preconceived notions or opinions
developed through experience
gained outside the evaluation,
especially by members of the team
who have substantive expertise on a
particular topic. We do not
recommend this, because it can
distort the evaluation. That is, the
role of the evaluator is to present
the findings, conclusions, and
recommendations in a logical order.
Opinions outside this framework
are then, by definition, not
substantiated by the facts at hand. If
any of these opinions are directly
relevant to the evaluation questions
and come from conclusions drawn
from prior research or secondary
sources, then the data upon which
they are based should be presented
among the evaluation’s findings.
FIGURE 2
Tracking the linkages is one way to help ensure a credible report, with
information that will be useful.
[Diagram for Evaluation Question #1: three columns labeled FINDINGS,
CONCLUSIONS, and RECOMMENDATIONS, with placeholder entries linked across
the columns.]
FIGURE 3
OPTIONS FOR REPORTING
FINDINGS, CONCLUSIONS,
AND RECOMMENDATIONS
OPTION 1
EVALUATION QUESTION 1
Findings
Conclusions
Recommendations
EVALUATION QUESTION 2
Findings
Conclusions
Recommendations
OPTION 2
FINDINGS
Evaluation Question 1
Evaluation Question 2
CONCLUSIONS
Evaluation Question 1
Evaluation Question 2
RECOMMENDATIONS
Evaluation Question 1
Evaluation Question 2
OPTION 3
Mix the two approaches. Identify which
evaluation questions are distinct and which
are interrelated. For distinct questions, use
option 1 and for the latter, use option 2.
Once conclusions are complete, the
team is ready to make its
recommendations. Too often
recommendations do not flow from
the team’s conclusions or, worse,
they are not related to the original
evaluation purpose and evaluation
questions. They may be good ideas,
but they do not belong in this
section of the report. As an
alternative, they could be included in
an annex with a note that they are
derived from coincidental
observations made by the team or
from team members’ experiences
elsewhere.
Using our example related to rural
health clinics, a few possible
recommendations could emerge as
follows:
• The Ministry of Health’s
Northeastern District office
should develop and implement an
annual prenatal standards-of-care
training program for all its rural
clinic directors. The program
would cover….
• The Northeastern District office
should conduct a formal
assessment of prenatal care
staffing levels in all its rural clinics.
• Based on the assessment, the
Northeastern District office
should establish and implement a
five-year plan for hiring and
placing needed prenatal care staff
in its rural clinics on a
most-needy-first basis.
Although the basic
recommendations should be derived
from conclusions and findings, this is
where the team can include ideas
and options for implementing
recommendations that may be based
on their substantive expertise and
best practices drawn from
experience outside the evaluation
itself. Usefulness is paramount.
When developing recommendations,
consider practicality. Circumstances
or resources may limit the extent to
which a recommendation can be
implemented. If practicality is an
issue — as is often the case — the
evaluation team may need to ramp
down recommendations, present
them in terms of incremental steps,
or suggest other options. In order
to be useful, it is essential that
recommendations be actionable or,
in other words, feasible in light of
the human, technical, and financial
resources available.
Weak connections between findings,
conclusions, and recommendations
can undermine the user’s confidence
in evaluation results. As a result, we
encourage teams—or, better yet, a
colleague who has not been
involved—to review the logic before
beginning to write the report. For
each evaluation question, present all
the findings, conclusions, and
recommendations in a format similar
to the one outlined in Figure 2.
Starting with the conclusions in the
center, track each one back to the
findings that support it, and decide
whether the findings truly warrant
the conclusion being made. If not,
revise the conclusion as needed.
Then track each recommendation to
the conclusion(s) from which it
flows, and revise if necessary.
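The review described above can be supported by a simple linkage table. The sketch below is illustrative only: the identifiers (F1, C1, R1 and so on), the field names, and the specific links between items are assumptions made for the example, loosely based on the prenatal services case, and R4 is an invented item included to show what an untraceable recommendation looks like.

```python
# Hypothetical linkage table for one evaluation question.
findings = {
    "F1": "16 of 20 clinics failed to meet at least two standards of care",
    "F2": "directors in 14 of those 16 clinics could not state staffing ratios",
    "F3": "27 of 36 surveyed women were dissatisfied or very dissatisfied",
    "F4": "six of seven key informants called understaffing a major problem",
}
conclusions = {
    "C1": {"text": "prenatal care staffing is insufficient",
           "supported_by": ["F1", "F3", "F4"]},
    "C2": {"text": "informational bulletins do not ensure standards are understood",
           "supported_by": ["F2"]},
}
recommendations = {
    "R1": {"text": "annual standards-of-care training", "flows_from": ["C2"]},
    "R2": {"text": "formal staffing assessment", "flows_from": ["C1"]},
    "R3": {"text": "five-year hiring plan", "flows_from": ["C1"]},
    "R4": {"text": "invented item with no supporting conclusion", "flows_from": []},
}


def untraceable(items, link_field, valid_ids):
    """Return ids of items that have no links, or links to ids that do not exist."""
    return sorted(
        item_id
        for item_id, item in items.items()
        if not item[link_field]
        or any(link not in valid_ids for link in item[link_field])
    )


print("Conclusions to revisit:", untraceable(conclusions, "supported_by", findings))
print("Recommendations to revisit:", untraceable(recommendations, "flows_from", conclusions))
```

A check like this only confirms that a link has been recorded. Whether the findings truly warrant the conclusion remains a judgment for the team or the outside reviewer.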
CHOOSE THE BEST
APPROACH FOR
STRUCTURING THE
REPORT
Depending on the nature of the
evaluation questions and the
findings, conclusions, and
recommendations, the team has a
few options for structuring this part
of the report (see Figure 3). The
objective is to present the report in
a way that makes it as easy as
possible for the reader to digest all
of the information. Options are
discussed below.
Option 1- Distinct Questions
If all the evaluation questions are
distinct from one another and the
relevant findings, conclusions, and
recommendations do not cut across
questions, then one option is to
organize the report around each
evaluation question. That is, each
question gets its own section
containing its relevant findings,
conclusions, and recommendations.
Option 2- Interrelated
Questions
If, however, the questions are
closely interrelated and there are
findings, conclusions, and/or
recommendations that apply to
more than one question, then it may
be preferable to put all the findings
for all the evaluation questions in
one section, all the conclusions in
another, and all the
recommendations in a third.
Option 3- Mixed
If the situation is mixed (a few but
not all of the questions are
closely interrelated), then use a
mixed approach. Group the
interrelated questions and their
findings, conclusions, and
recommendations into one
subsection, and treat the stand-alone
questions and their respective
findings, conclusions, and
recommendations in separate
subsections.
The important point is that the team
should be sure to keep findings,
conclusions, and recommendations
separate and distinctly labeled as such.
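The choice among the three options can be expressed as a small decision rule. This is a sketch under the assumption that the team has already noted, for each evaluation question, which other questions share findings, conclusions, or recommendations with it. The function name and the input format are hypothetical.

```python
def choose_structure(shared):
    """Pick a report structure.

    `shared` maps each evaluation question to the set of other questions
    with which it shares findings, conclusions, or recommendations.
    """
    interrelated = {question for question, others in shared.items() if others}
    if not interrelated:
        return "Option 1: organize by evaluation question"
    if len(interrelated) == len(shared):
        return "Option 2: group all findings, then conclusions, then recommendations"
    return "Option 3: mixed approach"


# Hypothetical case: questions 1 and 2 overlap, question 3 stands alone.
example = {"Q1": {"Q2"}, "Q2": {"Q1"}, "Q3": set()}
print(choose_structure(example))
```

Whichever option results, findings, conclusions, and recommendations still need to be labeled separately within each section.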
Finally, some evaluators think it
more useful to present the
conclusions first, and then follow
with the findings supporting them.
This helps the reader see the
“bottom line” first and then make a
judgment as to whether the
conclusions are warranted by the
findings.
OTHER KEY
SECTIONS OF THE
REPORT
THE EXECUTIVE
SUMMARY
The Executive Summary should
stand alone as an abbreviated
version of the entire report. Often
it is the only thing that busy
managers read. The Executive
Summary should be a “mirror
image” of the full report—it should
contain no new information that is
not in the main report.
This
principle also applies to making the
Executive Summary and the full
report equivalent with respect to
presenting positive and negative
evaluation results.
Although all sections of the full
report are summarized in the
Executive Summary, less emphasis is
given to an overview of the project
and the description of the evaluation
purpose and methodology than is
given to the findings, conclusions,
and recommendations. Decision-makers
are generally more
interested in the latter.
The Executive Summary should be
written after the main report has
been drafted. Many people believe
that a good Executive Summary
should not exceed two pages, but
there is no formal rule in USAID on
this. Finally, an Executive Summary
should be written in a way that will
entice interested stakeholders to go
on to read the full report.
FIGURE 4. SUMMARY OF EVALUATION DESIGN AND METHODS (an illustration)
Evaluation Question 1: How adequate are the prenatal services provided by the
Ministry of Health’s (MOH) rural clinics in Northeastern District?

Type of Analysis Conducted: Comparison of rural clinics’ prenatal service delivery to national standards
Data Sources and Methods Used: MOH manual of rural clinic standards of care; structured observations and staff interviews at rural clinics
Type and Size of Sample: Twenty clinics, randomly sampled from 68 total in Northeastern District
Limitations: Three of the originally sampled clinics were closed when the team visited. To replace each, the team visited the closest open clinic. As a result, the sample was not totally random.

Type of Analysis Conducted: Description, based on a content analysis of expert opinions
Data Sources and Methods Used: Key informant interviews with health care experts in the district and the MOH
Type and Size of Sample: Ten experts identified by project & MOH staff
Limitations: Only seven of the 10 experts had an opinion about prenatal care in the district.

Type of Analysis Conducted: Description and comparison of ratings among women in the district and two other similar rural districts
Data Sources and Methods Used: In-person survey of recipients of prenatal services at clinics in the district and two other districts
Type and Size of Sample: Random samples of 40 women listed in clinic records as having received prenatal services during the past year from each of the three districts’ clinics
Limitations: Of the total 120 women sampled, the team was able to conduct interviews with only 36 in the district, and 24 and 28 in the other two districts. The levels of confidence for generalizing to the populations of service recipients were __, __, and __, respectively.

DESCRIPTION OF THE
PROJECT
Many evaluation reports give only
cursory attention to the
development problem (or
opportunity) that motivated the
project in the first place, or to the
“theory of change” that underpins
USAID’s intervention. The “theory
of change” includes what the project
intends to do and the results which
the activities are intended to
produce. TIPS 13: Building a Results
Framework is a particularly useful
reference and provides additional
detail on logic models.
If the team cannot find a description
of these hypotheses or any model of
the project’s cause-and-effect logic
such as a Results Framework or a
Logical Framework, this should be
noted. The evaluation team will
then have to summarize the project
strategy in terms of the “if-then”
propositions that show how the
project designers envisioned the
interventions as leading to desired
results.
In describing the project, the
evaluation team should be clear
about what USAID tried to improve,
eliminate, or otherwise change for
the better. What was the “gap”
between conditions at the start of
the project and the more desirable
conditions that USAID wanted to
establish with the project? The team
should indicate whether the project
design documents and/or the recall
of interviewed project designers
offered a clear picture
of the specific economic and social
factors that contributed to the
problem — with baseline data, if
available. Sometimes photographs
and maps of before-project
conditions, such as the physical
characteristics and locations of rural
prenatal clinics in our example, can
be used to illustrate the main
problem(s).
It is equally important to include
basic information about when the
project was undertaken, its cost, its
intended beneficiaries, and where it
was implemented (e.g., country-wide
or only in specific districts). It can
be particularly useful to include a
map that shows the project’s target
areas.
A good description also identifies
the organizations that implement the
project, the kind of mechanism used
(e.g., contract, grant, or cooperative
agreement), and whether and how
the project has been modified during
implementation. Finally, the
description should include
information about context, such as
conflict or drought, and other
government or donor activities
focused on achieving the same or
parallel results.
THE EVALUATION
PURPOSE AND
METHODOLOGY
The credibility of an evaluation
team’s findings, conclusions, and
recommendations rests heavily on
the quality of the research design, as
well as on data collection methods
and analysis used. The reader needs
to understand what the team did
and why in order to make informed
judgments about credibility.
Presentation of the evaluation design
and methods is often best done
through a short summary in the text
of the report and a more detailed
methods annex that includes the
evaluation instruments. Figure 4 provides a
sample summary of the design and
methodology that can be included in
the body of the evaluation report.
From a broad point of view, what
research design did the team use to
answer each evaluation question?
Did the team use description (e.g.,
to document what happened),
comparisons (e.g., of baseline data
or targets to actual data, of actual
practice to standards, among target
sub-populations or locations), or
cause-effect research (e.g., to
determine whether the project
made a difference)? To do
cause-effect analysis, for example, did the
team use one or more
quasi-experimental approaches, such as
time-series analysis or use of
non-project comparison groups (see
TIPS 11: The Role of Evaluation)?
More specifically, what data collection
methods did the team use to get the
evidence needed for each evaluation
question? Did the team use key
informant interviews, focus groups,
surveys, on-site observation
methods, analyses of secondary data,
or other methods? How many
people did they interview or survey,
how many sites did they visit, and
how did they select their samples?
Most evaluations suffer from one or
more constraints that affect the
comprehensiveness and validity of
findings and conclusions. These may
include overall limitations on time
and resources, unanticipated
problems in reaching all the key
informants and survey respondents,
unexpected problems with the
quality of secondary data from the
host-country government, and the
like. In the methodology section, the
team should address these
limitations and their implications for
answering the evaluation questions
and developing the findings and
conclusions that follow in the
report. The reader needs to know
these limitations in order to make
informed judgments about the
evaluation’s credibility and
usefulness.
READER-FRIENDLY STYLE
When writing its report, the evaluation team must always remember the
composition of its audience. The team is writing for policymakers,
managers, and stakeholders, not for fellow social science researchers
or for publication in a professional journal. To that end, the style of
writing should make it as easy as possible for the intended audience to
understand and digest what the team is presenting. For further
suggestions on writing an evaluation in reader-friendly style, see
Table 2.
TABLE 1. SUGGESTED OUTLINE FOR AN EVALUATION REPORT [1]
Columns: Element / Approximate Number of Pages / Description and Tips for the Evaluation Team

Title Page
1 (but no page number)
Essential. Should include the words “U.S. Agency for International
Development” with the acronym “USAID,” the USAID logo, and the
project/contract number under which the evaluation was conducted. See
USAID Branding and Marking Guidelines (http://www.usaid.gov/branding/)
for logo and other specifics. Give the title of the evaluation; the name of
the USAID office receiving the evaluation; the name(s), title(s), and
organizational affiliation(s) of the author(s); and the date of the report.

Contents
As needed, and start with Roman numeral ii.
Essential. Should list all the sections that follow, including Annexes. For
multi-page chapters, include chapter headings and first- and second-level
headings. List (with page numbers) all figures, tables, boxes, and other
titled graphics.

Foreword
1
Optional. An introductory note written by someone other than the
author(s), if needed. For example, it might mention that this evaluation is
one in a series of evaluations or special studies being sponsored by USAID.

Acknowledgements
1
Optional. The authors thank the various people who provided support
during the evaluation.

Preface
1
Optional. Introductory or incidental notes by the authors, but not material
essential to understanding the text. Acknowledgements could be included
here if desired.

Executive Summary
2-3; 5 at most
Essential, unless the report is so brief that a summary is not needed. (See
discussion on p. 5)

Glossary
1
Optional. Is useful if the report uses technical or project-specific
terminology that would be unfamiliar to some readers.

Acronyms and Abbreviations
1
Essential, if they are used in the report. Include only those acronyms that
are actually used. See Table 3 for more advice on using acronyms.

I. Introduction
5-10 pages, starting with Arabic numeral 1.
Optional. The two sections listed under Introduction here could be
separate, stand-alone chapters. If so, a separate Introduction may not be
needed.

Description of the Project
Essential. Describe the context in which the USAID project took place
(e.g., relevant history, demography, political situation, etc.). Describe the
specific development problem that prompted USAID to implement the
project, the theory underlying the project, and details of project
implementation to date. (See more tips on p. 6.)

The Evaluation Purpose and Methodology
Essential. Describe who commissioned the evaluation, why they
commissioned it, what information they want, and how they intend to use
the information (and refer to the Annex that includes the Statement of
Work). Provide the specific evaluation questions, and briefly describe the
evaluation design and the analytical and data collection methods used to
answer them. Describe the evaluation team (i.e., names, qualifications, and
roles), what the team did (e.g., reviewed relevant documents, analyzed
secondary data, interviewed key informants, conducted a survey, conducted
site visits), and when and where they did it. Describe the major limitations
encountered in data collection and analysis that have implications for
reviewing the results of the evaluation. Finally, refer to the Annex that
provides a fuller description of all of the above, including a list of
documents/data sets reviewed, a list of individuals interviewed, copies of
the data collection instruments used, and descriptions of sampling
procedures (if any) and data analysis procedures. (See more tips on p. 6.)

II. Findings, Conclusions, and Recommendations
20-30 pages
Essential. However, in some cases, the evaluation user does not want
recommendations, only findings and conclusions. This material may be
organized in different ways and divided into several chapters. (A detailed
discussion of developing defensible findings, conclusions, and
recommendations and structural options for reporting them is on p. 2 and
p. 5)

III. Summary of Recommendations
1-2 pages
Essential or optional, depending on how findings, conclusions and
recommendations are presented in the section above. (See a discussion of
options on p. 4.) If all the recommendations related to all the evaluation
questions are grouped in one section of the report, this summary is not
needed. However, if findings, conclusions, and recommendations are
reported together in separate sections for each evaluation question, then a
summary of all recommendations, organized under each of the evaluation
questions, is essential.

IV. Lessons Learned
As needed
Required if the SOW calls for it; otherwise optional. Lessons learned
and/or best practices gleaned from the evaluation provide other users, both
within USAID and outside, with ideas for the design and implementation of
related or similar projects in the future.

Annexes
Some are essential and some are optional as noted.

Statement of Work
Essential. Lets the reader see exactly what USAID initially expected in the
evaluation.

Evaluation Design and Methodology
Essential. Provides a more complete description of the evaluation
questions, design, and methods used. Also includes copies of data
collection instruments (e.g., interview guides, survey instruments, etc.) and
describes the sampling and analysis procedures that were used.

List of Persons Interviewed
Essential. However, specific names of individuals might be withheld in order
to protect their safety.

List of Documents Reviewed
Essential. Includes written and electronic documents reviewed, background
literature, secondary data sources, citations of websites consulted.

Dissenting Views
If needed. Include if a team member or a major stakeholder does not agree
with one or more findings, conclusions, or recommendations.

Recommendation Action Checklist
Optional. As a service to the user organization, this chart can help with
follow-up to the evaluation. It includes a list of all recommendations
organized by evaluation question, a column for decisions to accept or reject
each recommendation, a column for the decision maker’s initials, a column
for the reason a recommendation is being rejected, and, for each accepted
recommendation, columns for the actions to be taken, by when, and by
whom.
[1] The guidance and suggestions in this table were drawn from the writers’ experience and from the “CDIE Publications
Style Guide: Guidelines for Project Managers, Authors, & Editors,” compiled by Brian Furness and John Engels, December
2001. The guide, which includes many tips on writing style, editing, referencing citations, and using Word and Excel, is
available online at http://kambing.ui.ac.id/bebas/v01/DEC-USAID/Other/publications-style-guide.pdf. Other useful
guidance: ADS 320 (http://www.usaid.gov/policy/ads/300/320.pdf); http://www.usaid.gov/branding; and
http://www.usaid.gov/branding/Graphic Standards Manual.pdf.
TABLE 2. THE QUICK REFERENCE GUIDE FOR A READER-FRIENDLY TECHNICAL STYLE

Writing Style: Keep It Simple and Correct!
• Avoid meaningless precision. Decide how much precision is really necessary. Instead of “62.45
percent,” might “62.5 percent” or “62 percent” be sufficient? The same goes for averages and other
calculations.
• Use technical terms and jargon only when necessary. Make sure to define them for the unfamiliar
readers.
• Don’t overuse footnotes. Use them only to provide additional information which, if included in the
text, would be distracting and cause a loss of the train of thought.

Use Tables, Charts and Other Graphics to Enhance Understanding
• Avoid long, “data-dump” paragraphs filled with numbers and percentages. Use tables, line graphs, bar
charts, pie charts, and other visual displays of data, and summarize the main points in the text. In
addition to increasing understanding, these displays provide visual relief from long narrative tracts.
• Be creative, but not too creative. Choose and design tables and charts carefully with the reader in
mind.
• Make every visual display of data a self-contained item. It should have a meaningful title and headings
for every column; a graph should have labels on each axis; a pie or bar chart should have labels for
every element.
• Choose shades and colors carefully. Expect that consumers will reproduce the report in black and
white and make copies of copies. Make sure that the reader can distinguish clearly among colors or
shades among multiple bars and pie-chart segments. Consider using textured fillings (such as hatch
marks or dots) rather than colors or shades.
• Provide “n’s” in all displays which involve data drawn from samples or populations. For example,
the total number of cases or survey respondents should be under the title of a table (n = 100). If a
table column includes types of responses from some, but not all, survey respondents to a specific
question, say, 92 respondents, the column head should include the total number who responded to
the question (n = 92).
• Refer to every visual display of data in the text. Present it after mentioning it in the text and as soon
after as practical, without interrupting paragraphs.
• Number tables and figures separately, and number each consecutively in the body of the report.
• Consult the CDIE style guide for more detailed recommendations on tables and graphics.

Punctuate the Text with Other Interesting Features
• Put representative quotations gleaned during data collection in text boxes. Maintain balance
between negative and positive comments to reflect the content of the report. Identify the sources
of all quotes. If confidentiality must be maintained, identify sources in general terms, such as “a clinic
care giver” or “a key informant.”
• Provide little “stories” or cases that illustrate findings. For example, a brief anecdotal story in a text
box about how a woman used a clinic’s services to ensure a healthy pregnancy can enliven, and
humanize, the quantitative findings.
• Use photos and maps where appropriate. For example, a map of a district with all the rural clinics
providing prenatal care and the concentrations of rural residents can effectively demonstrate
adequate or inadequate access to care.
• Don’t overdo it. Strike a reader-friendly balance between the main content and illustrative material.
In using illustrative material, select content that supports main points, not distracts from them.

Finally…
• Remember that the reader’s need to understand, not the writer’s need to impress, is paramount.
• Be consistent with the chosen format and style throughout the report.

Sources: “CDIE Publications Style Guide: Guidelines for Project Managers, Authors, & Editors,” compiled by Brian
Furness and John Engels, December 2001 (http://kambing.ui.ac.id/bebas/v01/DEC-USAID/Other/publications-style-guide.pdf);
USAID’s Graphics Standards Manual (http://www.usaid.gov/branding/USAID_Graphic_Standards_Manual.pdf);
and the authors’ extensive experience with good and difficult-to-read evaluation reports.
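Two of the tips in Table 2, avoiding meaningless precision and providing “n’s,” can be illustrated with a short worked sketch. The Python below is only an illustration: the function name `format_share` and the survey counts are hypothetical and do not come from any USAID tool or data set.

```python
def format_share(count: int, n: int, decimals: int = 0) -> str:
    """Return a percentage rounded to `decimals` places, followed by its n."""
    if n <= 0:
        raise ValueError("n must be a positive number of respondents")
    pct = 100 * count / n
    return f"{pct:.{decimals}f} percent (n = {n})"


# Hypothetical survey result: 281 of 450 respondents answered "yes".
print(format_share(281, 450))      # whole-number precision, usually enough
print(format_share(281, 450, 1))   # one decimal place, only if truly needed
```

Keeping the number of respondents next to every percentage lets the reader judge how much weight a figure can bear, and choosing the number of decimals deliberately avoids implying more precision than the sample supports.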
For more information:
TIPS publications are available online at [insert website].
Acknowledgements:
Our thanks to those whose experience and insights helped shape this publication including Gerry Britan and
Subhi Mehdi of USAID’s Office of Management Policy, Budget and Performance (MPBP). This publication was
written by Larry Beyna of Management Systems International (MSI).
Comments regarding this publication can be directed to:
Gerald Britan, Ph.D.
Tel: (202) 712-1158
gbritan@usaid.gov
Contracted under RAN-M-00-04-00049-A-FY0S-84
Integrated Managing for Results II
NUMBER 18
1ST EDITION, 2010
PERFORMANCE MONITORING & EVALUATION
TIPS
CONDUCTING DATA QUALITY ASSESSMENTS
ABOUT TIPS
These TIPS provide practical advice and suggestions to USAID managers on issues related to
performance monitoring and evaluation. This publication is a supplemental reference to the
Automated Directive System (ADS) Chapter 203.
THE PURPOSE OF THE DATA QUALITY ASSESSMENT
Data quality assessments (DQAs)
help managers to understand how
confident they should be in the
data used to manage a program and
report on its success. USAID’s
ADS notes that the purpose of the
Data Quality Assessment is to:
“…ensure that the USAID Mission/Office and Assistance Objective (AO)
Team are aware of the strengths and weaknesses of the data, as
determined by applying the five data quality standards …and are aware
of the extent to which the data integrity can be trusted to influence
management decisions.” (ADS 203.3.5.2)
This purpose is important to keep
in mind when considering how to
do a data quality assessment. A
data quality assessment is of little
use unless front line managers
comprehend key data quality issues
and are able to improve the
performance management system.
THE DATA QUALITY STANDARDS
Five key data quality standards are
used to assess quality. These are:
• Validity
• Reliability
• Precision
• Integrity
• Timeliness
A more detailed discussion of each
standard is included in TIPS 12:
Data Quality Standards.
WHAT IS REQUIRED?
USAID POLICY
While managers are required to
understand data quality on an
ongoing basis, a data quality
assessment must also be conducted
at least once every three years for
those data reported to
Washington. As a matter of good
management, program managers
may decide to conduct DQAs
more frequently or for a broader
range of data where potential
issues emerge.
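The three-year requirement above can be tracked with simple date arithmetic. The following is a minimal sketch, assuming the Mission records the date of the last DQA for each indicator; the function name `dqa_due` and the `max_years` parameter are hypothetical names chosen for illustration.

```python
from datetime import date


def dqa_due(last_dqa: date, today: date, max_years: int = 3) -> bool:
    """True when `max_years` or more have passed since the last DQA."""
    try:
        deadline = last_dqa.replace(year=last_dqa.year + max_years)
    except ValueError:
        # last_dqa fell on 29 February and the deadline year is not a leap year
        deadline = last_dqa.replace(year=last_dqa.year + max_years, day=28)
    return today >= deadline


# Hypothetical dates, invented for this sketch.
print(dqa_due(date(2008, 6, 1), date(2010, 12, 1)))
print(dqa_due(date(2008, 6, 1), date(2011, 6, 1)))
```

A manager who prefers to schedule the next assessment well before the deadline could pass a smaller `max_years` value, in line with the suggestion that DQAs may be conducted more frequently.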
The ADS does not prescribe a
specific way to conduct a DQA. A
variety of approaches can be used.
Documentation may be as simple as a memo to the files, or it could
take the form of a formal report. The most appropriate approach will
reflect a number of considerations, such as management need, the type
of data collected, the data source, the importance of the data, or
suspected data quality issues. The key is to document the findings,
whether formal or informal.
A DQA focuses on applying the data quality standards and examining the
systems and approaches for collecting data to determine whether they
are likely to produce high quality data over time. In other words, if
the data quality standards are met and the data collection methodology
is well designed, then it is likely that good quality data will result.
This “systematic approach” is valuable because it assesses a broader
set of issues that are likely to ensure data quality over time (as
opposed to whether one specific number is accurate or not). For
example, it is possible to report a number correctly, but that number
may not be valid [1], as the following example demonstrates.
Example: A program works across a range of municipalities (both urban
and rural). It is reported that local governments have increased
revenues by 5%. These data may be correct. However, if only major
urban areas have been included, these data are not valid. That is,
they do not measure the intended result.
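The distinction in the municipal revenue example, a number that is computed correctly but is not valid, can be sketched in a few lines. The Python below is illustrative only: the class, the function names, and all figures are hypothetical and were invented to mirror the example.

```python
from dataclasses import dataclass


@dataclass
class Municipality:
    name: str
    urban: bool
    revenue_before: float
    revenue_after: float


def revenue_growth_pct(rows):
    """Percentage change in total revenue across the given municipalities."""
    before = sum(m.revenue_before for m in rows)
    after = sum(m.revenue_after for m in rows)
    return 100 * (after - before) / before


def covers_all(reported, target):
    """True only if every targeted municipality is in the reported data."""
    return {m.name for m in target} <= {m.name for m in reported}


# Hypothetical figures, invented for this sketch.
target = [
    Municipality("City A", True, 1000.0, 1050.0),
    Municipality("City B", True, 2000.0, 2100.0),
    Municipality("Village C", False, 500.0, 500.0),
    Municipality("Village D", False, 300.0, 295.0),
]
reported = [m for m in target if m.urban]  # only urban areas were included

print(f"Reported growth: {revenue_growth_pct(reported):.1f}%")
print(f"Covers all target municipalities: {covers_all(reported, target)}")
print(f"Growth across all municipalities: {revenue_growth_pct(target):.1f}%")
```

The arithmetic for the reported figure is right, but the coverage check fails, which is the kind of issue a DQA is meant to surface and a verification of the single number would miss.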
VERIFICATION OF DATA
Verification of data means that the
reviewer follows a specific datum
to its source, confirming that it has
supporting documentation and is
accurate—as is often done in
audits.
The DQA may not
necessarily verify that all individual
numbers reported are accurate.
The ADS notes that when assessing
data from partners, the DQA
should focus on “the apparent
accuracy and consistency of the
data.” As an example, Missions
often report data on the number of
individuals trained. Rather than
verifying each number reported,
the DQA might examine each
project’s system for collecting and
maintaining those data. If there is a
good system in place, we know
that it is highly likely that the data
produced will be of high quality.
Having said this, it is certainly
advisable to periodically verify
actual data as part of the larger
performance management system.
Project managers may:
• Choose a few indicators to verify periodically throughout the course
of the year.
• Occasionally spot check data (for example, when visiting the field).
HOW GOOD DO DATA HAVE TO BE?
In development, there are rarely perfect data. Moreover, data used for
management purposes have different standards than data used for
research. There is often a direct trade-off between cost and quality.
Each manager is responsible for ensuring the highest quality data
possible given the resources and the management context. In some
cases, simpler, lower-cost approaches may be most appropriate. In
other cases, where indicators measure progress in major areas of
investment, higher data quality is expected.
[1] Refer to TIPS 12: Data Quality Standards for a full discussion of all
the data quality standards.
OPTIONS AND APPROACHES FOR CONDUCTING DQAS
A data quality assessment is both a process for reviewing data to
understand strengths and weaknesses and the documentation of that
review. A DQA can be done in a variety of ways ranging from the more
informal to the formal (see Figure 1). In our experience, a
combination of informal, on-going and systematic assessments works
best, in most cases, to ensure good data quality.
FIGURE 1. OPTIONS FOR CONDUCTING DATA QUALITY ASSESSMENTS: THE CONTINUUM
Informal Options
• Conducted internally by the AO team
• Ongoing (driven by emerging and specific issues)
• More dependent on the AO team and individual manager’s expertise &
knowledge of the program
• Conducted by the program manager
• Product: Documented in memos, notes in the PMP
Semi-Formal Partnership
• Draws on both management expertise and M&E expertise
• Periodic & systematic
• Facilitated and coordinated by the M&E expert, but AO team members
are active participants
• Product: Data Quality Assessment Report
Formal Options
• Driven by broader programmatic needs, as warranted
• More dependent on external technical expertise and/or specific types
of data expertise
• Product: Either a Data Quality Assessment report or addressed as a
part of another report

INFORMAL OPTIONS
Informal approaches can be ongoing or driven by specific issues as
they emerge. These approaches depend more on the front line manager’s
in-depth knowledge of the program. Findings are documented by the
manager in memos or notes in the Performance Management Plan (PMP).
Example: An implementer reports that civil society organizations
(CSOs) have initiated 50 advocacy campaigns. This number seems
unusually high. The project manager calls the implementer to
understand why the number is so high in comparison to previously
reported numbers and explores whether a consistent methodology for
collecting the data has been used (i.e., whether the standard of
reliability has been met). The project manager documents his or her
findings in a memo and maintains that information in the files.
Informal approaches should be incorporated into Mission systems as a
normal part of performance management. The advantages and
disadvantages of this approach are as follows:
Advantages
• Managers incorporate data quality as a part of on-going work
processes.
• Issues can be addressed and corrected quickly.
• Managers establish a principle that data quality is important.
Disadvantages
• It is not systematic and may not be complete. That is, because
informal assessments are normally driven by more immediate management
concerns, the manager may miss larger issues that are not readily
apparent (for example, whether the data are attributable to USAID
programs).
• There is no comprehensive document that addresses the DQA
requirement.
• Managers may not have enough expertise to identify more complicated
data quality issues and audit vulnerabilities, and to formulate
solutions.
SEMI-FORMAL / PARTNERSHIP OPTIONS
Semi-formal or partnership options are characterized by a more
periodic and systematic review of data quality. These DQAs should
ideally be led and conducted by USAID staff. One approach is to
partner a monitoring and evaluation (M&E) expert with the Mission’s AO
team to conduct the assessment jointly. The M&E expert can organize
the process, develop standard approaches, facilitate sessions, assist
in identifying potential data quality issues and solutions, and may
document the outcomes of the assessment. This option draws on the
experience of AO team members as well as the broader knowledge and
skills of the M&E expert. Engaging front line managers in the DQA
process has the additional advantage of making them more aware of the
strengths and weaknesses of the data, which is one of the stated
purposes of the DQA.
The advantages and disadvantages of this approach are summarized
below:
Advantages
• Produces a systematic and comprehensive report with specific
recommendations for improvement.
• Engages AO team members in the data quality assessment.
• Draws on the complementary skills of front line managers and M&E
experts.
• Assessing data quality is a matter of understanding trade-offs and
context in terms of deciding what data is “good enough” for a program.
An M&E expert can be useful in guiding AO team members through this
process in order to ensure that audit vulnerabilities are adequately
addressed.
• Does not require a large external team.
Disadvantages
• The Mission may use an internal M&E expert or hire someone from the
outside. However, hiring an outside expert will require additional
resources, and external contracting requires some time.
• Because of the additional time and planning required, this approach
is less useful for addressing immediate problems.
FORMAL OPTIONS
At the other end of the continuum, there may be a few select
situations where Missions need a more rigorous and formal data quality
assessment.
Example: A Mission invests substantial funding into a high-profile
program that is designed to increase the efficiency of water use.
Critical performance data come from the Ministry of Water and are used
both for performance management and reporting to key stakeholders,
including the Congress. The Mission is unsure as to the quality of
those data. Given the high-level interest and level of resources
invested in the program, a data quality assessment is conducted by a
team including technical experts to review data and identify specific
recommendations for improvement. Recommendations will be incorporated
into the technical assistance provided to the Ministry to improve
their own capacity to track these data over time.
These types of data quality assessments require a high degree of rigor
and specific, in-depth technical expertise. Advantages and
disadvantages are as follows:
Advantages
• Produces a systematic and comprehensive assessment, with specific
recommendations.
• Examines data quality issues with rigor and based on specific,
in-depth technical expertise.
• Fulfills two important purposes, in that it can be designed to
improve data collection systems both within USAID and for the
beneficiary.
Disadvantages
• Often conducted by an external team of experts, entailing more time
and cost than other options.
• Generally less direct involvement by front line managers.
• Often examines data through a very technical lens. It is important
to ensure that broader management issues are adequately addressed.
THE PROCESS
For purposes of this TIPS, we will outline a set of illustrative steps
for the middle (or semi-formal/partnership) option. In reality, these
steps are often iterative.
STEP 1. IDENTIFY THE DQA TEAM
Identify one person to lead the DQA process for the Mission. This
person is often the Program Officer or an M&E expert. The leader is
responsible for setting up the overall process and coordinating with
the AO teams.
The Mission will also have to determine whether outside assistance is
required. Some Missions have internal M&E staff with the appropriate
skills to facilitate this process. Other Missions may wish to hire an
outside M&E expert(s) with experience in conducting DQAs. AO team
members should also be part of the team.
DATA SOURCES
Primary Data: Collected directly by USAID.
Secondary Data: Collected from other sources, such as implementing
partners, host country governments, other donors, etc.
STEP 2. DEVELOP AN OVERALL APPROACH AND SCHEDULE
The team leader must convey the objectives, process, and schedule for
conducting the DQA to team members. This option is premised on the
idea that the M&E expert(s) work closely in partnership with AO team
members and implementing partners to jointly assess data quality. This
requires active participation and encourages managers to fully explore
and understand the strengths and weaknesses of the data.
STEP 3. IDENTIFY THE INDICATORS TO BE INCLUDED IN THE REVIEW
It is helpful to compile a list of all indicators that will be
included in the DQA. This normally includes:
• All indicators reported to USAID/Washington (required).
• Any indicators with suspected data quality issues.
• Indicators for program areas that are of high importance.
This list can also function as a central guide as to how each
indicator is assessed and to summarize where follow-on action is
needed.
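The three inclusion criteria for the indicator list can be sketched as a simple filter. This is a minimal illustration, not a prescribed tool: the `Indicator` class, its field names, and the sample indicators are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Indicator:
    name: str
    reported_to_washington: bool = False
    suspected_quality_issue: bool = False
    high_importance: bool = False


def select_for_dqa(indicators):
    """Return (indicator, reasons) pairs for indicators meeting any criterion."""
    selected = []
    for ind in indicators:
        reasons = []
        if ind.reported_to_washington:
            reasons.append("reported to USAID/Washington (required)")
        if ind.suspected_quality_issue:
            reasons.append("suspected data quality issue")
        if ind.high_importance:
            reasons.append("high-importance program area")
        if reasons:
            selected.append((ind, reasons))
    return selected


# Hypothetical indicators, invented for this sketch.
indicators = [
    Indicator("Number of individuals trained", reported_to_washington=True),
    Indicator("Local government revenue growth", suspected_quality_issue=True,
              high_importance=True),
    Indicator("Number of newsletters printed"),
]

for ind, reasons in select_for_dqa(indicators):
    print(f"{ind.name}: {'; '.join(reasons)}")
```

Recording the reason each indicator was included keeps the list usable as the central guide described above, since the same record can later hold how the indicator was assessed and what follow-on action is needed.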
STEP 4. CATEGORIZE INDICATORS
With the introduction of standard indicators, the number of indicators
that Missions report to USAID/Washington has increased substantially.
This means that it is important to develop practical and streamlined
approaches for conducting DQAs. One way to do this is to separate
indicators into two categories, as follows:
Outcome Level Indicators
Outcome level indicators measure AOs or Intermediate Results (IRs).
Figure 2 provides examples of indicators at each level. The standards
for good data quality are applied to results level data in order to
assess data quality. The data quality assessment worksheet (see
Table 1) has been developed as a tool to assess each indicator against
each of these standards.
Output Indicators
Many of the data quality standards are not applicable to output
indicators in the same way as outcome level indicators. For example,
the number of individuals trained by a project is an output indicator.
Whether data are valid, timely, or precise is almost never an issue
for this type of an indicator. However, it is important to ensure that
there are good data collection and data maintenance systems in place.
Hence, a simpler and more streamlined approach can be used to focus on
the most relevant issues. Table 2 outlines a sample matrix for
assessing output indicators. This matrix:
• Identifies the indicator.
• Clearly outlines the data collection method.
• Identifies key data quality issues.
• Notes whether further action is necessary.
• Provides specific information on who was consulted and when.
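The five elements of the output indicator matrix map naturally onto a small record type. The sketch below is an assumption about how such a matrix might be kept electronically; the class name, field names, and sample rows are hypothetical and are not taken from Table 2.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class OutputIndicatorRow:
    indicator: str               # identifies the indicator
    collection_method: str       # outlines the data collection method
    key_issues: str              # key data quality issues found
    further_action_needed: bool  # whether further action is necessary
    consulted: str               # who was consulted
    consulted_on: date           # when they were consulted


def needing_follow_up(rows):
    """Return the names of indicators flagged for further action."""
    return [r.indicator for r in rows if r.further_action_needed]


# Hypothetical rows, invented for this sketch.
rows = [
    OutputIndicatorRow(
        indicator="Number of individuals trained",
        collection_method="Sign-in sheets compiled quarterly by the implementer",
        key_issues="Possible double counting across sessions",
        further_action_needed=True,
        consulted="Implementer M&E officer",
        consulted_on=date(2010, 3, 15),
    ),
    OutputIndicatorRow(
        indicator="Number of workshops held",
        collection_method="Activity reports filed after each workshop",
        key_issues="None identified",
        further_action_needed=False,
        consulted="Project manager",
        consulted_on=date(2010, 3, 16),
    ),
]

print(needing_follow_up(rows))
```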
STEP 5. HOLD WORKING SESSIONS TO REVIEW INDICATORS
Hold working sessions with AO team members. Implementing partners may
be included at this point as well. In order to use time efficiently,
the team may decide to focus these sessions on results-level
indicators. These working sessions can be used to:
• Explain the purpose and process
for conducting the DQA.
• Review data quality standards for
each results-level indicator,
including the data collection
systems and processes.
• Identify issues or concerns that
require further review.
STEP 6. HOLD SESSIONS WITH IMPLEMENTING PARTNERS TO REVIEW INDICATORS
If the implementing partner was included in the previous working
session, results-level indicators will already have been discussed.
This session may then focus on reviewing the remaining output-level
indicators with implementers who often maintain the systems to collect
the data for these types of indicators. Focus on reviewing the systems
and processes to collect and maintain data. This session provides a
good opportunity to identify solutions or recommendations for
improvement.
STEP 7. PREPARE THE DQA DOCUMENT
As information is gathered, the team should record findings on the
worksheets provided. It is particularly important to include
recommendations for action at the conclusion of each worksheet. Once
this is completed, it is often useful to include an introduction to:
• Outline the overall approach and methodology used.
• Highlight key data quality issues that are important for senior
management.
• Summarize recommendations for improving performance management
systems.
AO team members and participating implementers should have an
opportunity to review the first draft. Any comments or issues can then
be incorporated and the DQA finalized.
STEP 8. FOLLOW UP ON ACTIONS
Finally, it is important to ensure that there is a process to follow
up on recommendations. Some recommendations may be addressed
internally by the team handling management needs or audit
vulnerabilities. For example, the AO team may need to work with a
Ministry to ensure that data can be disaggregated in a way that
correlates precisely to the target group. Other issues may need to be
addressed during the Mission’s portfolio reviews.
CONSIDER THE SOURCE – PRIMARY VS. SECONDARY DATA
PRIMARY DATA
USAID is able to exercise a higher
degree of control over primary
data that it collects itself than over
secondary data collected by others.
As a result, specific standards
should be incorporated into the
data collection process. Primary
data collection requires that:
• Written procedures are in place for data collection.
• Data are collected from year to year using a consistent collection
process.
• Data are collected using methods to address and minimize sampling
and non-sampling errors.
• Data are collected by qualified personnel and these personnel are
properly supervised.
• Duplicate data are detected.
• Safeguards are in place to prevent unauthorized changes to the data.
• Source documents are maintained and readily available.
• If the data collection process is contracted out, these requirements
should be incorporated directly into the statement of work.
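One of the requirements above, detecting duplicate data, can be checked with a few lines of code when records are kept electronically. The following is a minimal sketch, assuming each record is a dictionary and that a combination of fields identifies a unique entry; the function name, field names, and sample training records are hypothetical.

```python
from collections import Counter


def find_duplicates(records, key_fields):
    """Return each key that occurs more than once, mapped to its count."""
    counts = Counter(tuple(rec[f] for f in key_fields) for rec in records)
    return {key: n for key, n in counts.items() if n > 1}


# Hypothetical training records, invented for this sketch.
records = [
    {"participant_id": "P001", "course": "Budgeting", "date": "2010-02-01"},
    {"participant_id": "P002", "course": "Budgeting", "date": "2010-02-01"},
    {"participant_id": "P001", "course": "Budgeting", "date": "2010-02-01"},
]

print(find_duplicates(records, ["participant_id", "course", "date"]))
```

Which fields make up the key is a judgment call for the team: a participant who attends two different courses is not a duplicate, while the same participant entered twice for the same course and date usually is.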
SECONDARY DATA
Secondary data are collected from other sources, such as host country
governments, implementing partners, or from other organizations. The
range of control that USAID has over secondary data varies. For
example, if USAID uses data from a survey commissioned by another
donor, then there is little control over the data collection
methodology. On the other hand, USAID does have more influence over
data derived from implementing partners. In some cases, specific data
quality requirements may be included in the contract. In addition,
project performance management plans (PMPs) are often reviewed or
approved by USAID. Some ways in which to address data quality are
summarized below.
Data from Implementing
Partners
• Spot check data.
• Incorporate specific data quality
requirements as part of the
SOW, RFP, or RFA.
• Review data quality collection
and maintenance procedures.
Data from Other Secondary
Sources
Data from other secondary sources include data from host country
governments and other donors.
• Understand the methodology. Documentation often includes a
description of the methodology used to collect data. It is important
to understand this section so that limitations (and what the data can
and cannot say) are clearly understood by decision makers.
• Request a briefing on the methodology, including data collection and
analysis procedures, potential limitations of the data, and plans for
improvement (if possible).
• If data are derived from host
country organizations, then it
may be appropriate to discuss
how assistance can be provided
to strengthen the quality of the
data. For example, projects may
include technical assistance to
improve management and/or
M&E systems.
TABLE 1. THE DQA WORKSHEET FOR OUTCOME LEVEL INDICATORS
Directions: Use the following worksheet to complete an assessment of data for outcome level indicators against the
five data quality standards outlined in the ADS. A comprehensive discussion of each criterion is included in TIPS 12
Data Quality Standards.
Data Quality Assessment Worksheet
Assistance Objective (AO) or Intermediate Result (IR):
Indicator:
Reviewer(s):
Date Reviewed:
Data Source:
Is the Indicator Reported to USAID/W?
Columns: Criterion | Definition | Yes or No | Explanation
1. Validity
Definition: Do the data clearly and adequately
represent the intended result? Some issues
to consider are:
Face Validity. Would an outsider or an
expert in the field agree that the indicator is
a valid and logical measure for the stated
result?
Attribution. Does the indicator measure
the contribution of the project?
Measurement Error. Are there any
measurement errors that could affect the
data? Both sampling and non-sampling error
should be reviewed.
Yes or No:
Explanation:
2. Integrity
Definition: Do the data collected, analyzed and
reported have established mechanisms in
place to reduce manipulation or simple
errors in transcription?
Yes or No:
Explanation: Note: This criterion requires the reviewer to
understand what mechanisms are in place to
reduce the possibility of manipulation or
transcription error.
3. Precision
Definition: Are data sufficiently precise to present a fair
picture of performance and enable
management decision-making at the
appropriate levels?
Yes or No:
Explanation:
4. Reliability
Definition: Do data reflect stable and consistent data
collection processes and analysis methods
over time?
Yes or No:
Explanation: Note: This criterion requires the reviewer to ensure
that the indicator definition is operationally precise
(i.e., it clearly defines the exact data to be collected)
and to verify that the data are, in fact, collected
according to that standard definition consistently
over time.
5. Timeliness
Definition: Are data timely enough to influence
management decision-making (i.e., in terms
of frequency and currency)?
Yes or No:
Explanation:
A Summary of Key Issues and Recommendations:
Table 2. SAMPLE DQA FOR OUTPUT INDICATORS: THE MATRIX APPROACH
Columns: AO or IR Indicators | Document Source | Data Source | Data Collection Method/Key Data Quality Issue | Further Action | Additional Comments/Notes
1.
Number of investment
measures made consistent
with international investment
agreements as a result of
USG assistance
Quarterly
Report
Project
A
A consultant works directly with the
committee in charge of simplifying procedures
and updates the number of measures
regularly on the website
(www.mdspdres.com). The implementer has
stated that data submitted includes
projections for the upcoming fiscal year
rather than actual results.
Yes. Ensure that
only actual results
within specified
timeframes are
used for
reporting.
Meeting with COTR 6/20/10
and 7/6/10.
2.
Number of public and private
sector standards-setting
bodies that have adopted
internationally accepted
guidelines for standards
setting as a result of USG
assistance
Semi-Annual
Report
Project
A
No issues. Project works only with one body
(the Industrial Standards-Setting Service) and
maintains supporting documentation.
No.
Meeting with COTR and COP
on 6/20/10.
3.
Number of legal, regulatory,
or institutional actions taken
to improve implementation
or compliance with
international trade and
investment agreements due
to support from USG-assisted organizations
Quarterly
Report
Project
A
Project has reported “number of Regional
Investment Centers”. This is not the same as
counting “actions”, so this must be corrected.
Yes. Ensure that
the correct
definition is
applied.
Meeting with COTR, COP,
and Finance Manager and M&E
specialist on 6/20/10. The
indicator was clarified and the
data collection process will be
adjusted accordingly.
4.
Number of Trade and
Investment Environment
diagnostics conducted
Quarterly
Report
Projects
A and B
No issues. A study on the investment
promotion policy was carried out by the
project. When the report is presented and
validated the project considers it
“conducted”.
No.
Meeting with CTO and COPs
on 6/25/10.
For more information:
TIPS publications are available online at [insert website].
Acknowledgements:
Our thanks to those whose experience and insights helped shape this publication including Gerry Britan and
Subhi Mehdi of USAID’s Office of Management Policy, Budget and Performance (MPBP). This publication was
written by Michelle Adams-Matson, of Management Systems International.
Comments can be directed to:
Gerald Britan, Ph.D.
Tel: (202) 712-1158
gbritan@usaid.gov
Contracted under RAN-M-00-04-00049-A-FY0S-84
Integrated Managing for Results II
NUMBER 19
1ST EDITION, 2010 DRAFT
PERFORMANCE MONITORING & EVALUATION
TIPS
RIGOROUS IMPACT EVALUATION
ABOUT TIPS
These TIPS provide practical advice and suggestions to USAID managers on issues related to
performance monitoring and evaluation. This publication is a supplemental reference to the
Automated Directive System (ADS) Chapter 203.
WHAT IS RIGOROUS
IMPACT
EVALUATION?
Rigorous impact evaluations are
useful for determining the effects
of
USAID
programs
on
outcomes.
This
type
of
evaluation allows managers to
test development hypotheses by
comparing changes in one or
more specific outcomes to
changes that occur in the
absence
of
the
program.
Evaluators
term
this
the
counterfactual. Rigorous impact
evaluations
typically
use
comparison groups, composed of
individuals or communities that
do not participate in the
program. The comparison group
is examined in relation to the
treatment group to determine
the effects of the USAID program
or project.
FIGURE 1. DEFINITIONS OF IMPACT EVALUATION
• An evaluation that looks at the impact of an intervention on final welfare
outcomes, rather than only at project outputs, or a process evaluation which
focuses on implementation.
• An evaluation carried out some time (five to ten years) after the
intervention has been completed, to allow time for impact to appear.
• An evaluation considering all interventions within a given sector or
geographical area.
• An evaluation concerned with establishing the counterfactual, i.e., the
difference the project made (how indicators behaved with the project
compared to how they would have been without it).
Impact evaluations may be
defined in a number of ways (see
Figure 1). For purposes of this
TIPS, rigorous impact evaluation
is defined by the evaluation
design (quasi-experimental and
experimental) rather than the
topic being evaluated. These
methods can be used to attribute
change at any program or project
outcome level, including
Intermediate Results (IR), sub-IRs,
and Assistance Objectives (AO).
Decisions about whether a
rigorous impact evaluation would
be appropriate and what type of
rigorous impact evaluation to
conduct are best made during
the program or project design
phase, since many types of
rigorous impact
evaluation can only be utilized if
comparison groups are
established and baseline data are
collected before a program or
project intervention begins.
WHY ARE RIGOROUS
IMPACT
EVALUATIONS
IMPORTANT?
A rigorous impact evaluation
enables managers to determine
the extent to which a USAID
program or project actually
caused observed changes.
A Performance Management Plan
(PMP) should contain all of the
tools necessary to track key
objectives (see also TIPS 7
Preparing
a
Performance
Management Plan).
However,
comparing
data
from
performance indicators against
baseline values demonstrates
only
whether
change
has
occurred,
with
very
little
information about what actually
caused the observed change.
USAID program managers can
only say that the program is
correlated with changes in
outcome, but cannot confidently
attribute that change to the
program.
FIGURE 2. A WORD ABOUT WORDS
Many of the terms used in rigorous evaluations hint at the origin of these
methods: medical and laboratory experimental research. The activities of a
program or project are often called the intervention or the independent
variable, and the outcome variables of interest are known as dependent
variables. The target population is the group of all individuals (if the unit of
analysis or unit is the individual) who share certain characteristics sought by
the program, whether or not those individuals actually participate in the
program. Those from the target population who actually participate are
known as the treatment group, and the group used to measure what would
have happened to the treatment group had they not participated in the
program (the counterfactual) is known as a control group if they are selected
randomly, as in an experimental evaluation, or, more generally, as a
comparison group if they are selected by other means, as in a quasi-experimental evaluation.
There are normally a number of
factors, outside of the program,
that might influence an outcome.
These are called confounding
factors. Examples of confounding
factors include programs run by
other donors, natural events (e.g.,
rainfall, drought, earthquake,
etc.), government policy changes,
or even maturation (the natural
changes that happen in an
individual or community over
time). Because of the potential
contribution
of
these
confounding factors, the program
manager cannot claim with full
certainty that the program
caused the observed changes or
results.
In some cases, the intervention
causes all observed change. That
is, the group receiving USAID
assistance will have improved
significantly while a similar, non-participating group will have
stayed roughly the same.
In
other situations, the target group
may have already been improving
and the program helped to
accelerate that positive change.
Rigorous
evaluations
are
designed to identify the effects of
the program of interest even in
these cases, where the target
group and non-participating
groups may both have changed,
only at different
rates. By identifying the effects
caused by a program, rigorous
evaluations
help
USAID,
implementing partners and key
stakeholders
learn
which
programs or approaches are most
effective, which is critical for
effective
development
programming.
WHEN SHOULD
THESE METHODS BE
USED?
Rigorous impact evaluations can
yield very strong evidence of
program effects. Nevertheless,
this method is not appropriate
for all situations.
Rigorous
impact evaluations often involve
extra costs for data collection and
always require careful planning
during program implementation.
To determine whether a rigorous
impact evaluation is appropriate,
potential cost should be weighed
against the need for and
usefulness of the information.
Rigorous
impact
evaluations
answer
evaluation
questions
concerning the causal effects of a
program.
However, other
evaluation designs may be more
appropriate for answering other
types of evaluation questions.
For example, the analysis of ‘why’
and ‘how’ observed changes,
particularly unintended changes,
were produced may be more
effectively answered using other
evaluation methods, including
participatory evaluations or rapid
appraisals. Similarly, there are
situations
when
rigorous
evaluations, which often use
comparison groups, will not be
advisable, or even possible. For
example, assistance focusing on
political parties can be difficult to
evaluate using rigorous methods,
as this type of assistance is
typically offered to all parties,
making the identification of a
comparison group difficult or
impossible. Other methods may
be more appropriate and yield
conclusions
with
sufficient
credibility
for
programmatic
decision-making.
While rigorous impact
evaluations are sometimes used
to examine the effects of only
one
program
or
project
approach,
rigorous
impact
evaluations are also extremely
useful for answering questions
about the effectiveness of
alternative
approaches
for
achieving a given result, e.g.,
which of several approaches for
improving farm productivity, or
for delivering legal services, are
most effective.
Missions should consider using
rigorous evaluations strategically
to answer specific questions
about the effectiveness of key
approaches.
When multiple
rigorous evaluations are carried
out across Missions on a similar
topic or approach, the results can
be used to identify approaches
that can be generalized to other
settings, leading to significant
advances
in
programmatic
knowledge. Rigorous methods
are often useful when:
• Multiple approaches to
achieving desired results have
been suggested, and it is
unclear which approach is the
most effective or efficient;
• An approach is likely to be
replicated if successful, and
clear evidence of program
effects is desired before
scaling up;
• A program uses a large amount
of resources or affects a large
number of people; and
• In general, little is known about
the effects of an important
program or approach, as is
often the case with new or
innovative approaches.
PLANNING
Rigorous methods require strong
performance
management
systems to be built around a
clear, logical results framework
(see TIPS 13 Building a Results
Framework). The development
hypothesis should clearly define
the logic of the program, with
particular emphasis on the
intervention
(independent
variable) and the principal
anticipated results (dependent
variables), and provide the basis
for the questions that will be
addressed by the rigorous
evaluation.
Rigorous evaluation builds upon
the indicators defined for each
level of result, from inputs to
outcomes, and requires high data
quality.
Because
quasi-experimental and experimental
designs typically answer very
specific evaluation questions and
are generally analyzed using
quantitative methods, they can
be paired with other evaluation
tools and methods to provide
context, triangulate evaluation
conclusions, and examine how
and why effects were produced
(or not) by a program. This is
termed mixed method evaluation
(see TIPS 16, Mixed Method
Evaluations).
Unlike
most
evaluations
conducted by USAID, rigorous
impact evaluations are usually
only possible, and are always
most effective, when planned
before project implementation
begins.
Evaluators need time
prior to implementation to
identify appropriate indicators,
identify a comparison group, and
set baseline values. If rigorous
evaluations are not planned prior
to implementation, the number
of potential evaluation design
options is reduced, often leaving
alternatives that are either more
complicated or less rigorous. As
a result, Missions should consider
the feasibility of and need for a
rigorous evaluation prior to and
during project design.
FIGURE 3. CONFOUNDING EFFECTS
[Figure: outcome of interest at baseline and follow-up for the target group and the comparison group. Labels: Observed Change, Program Effect, Confounding Effect.]
DESIGN
Although
there
are
many
variations, rigorous evaluations
are divided into two categories:
quasi-experimental
and
experimental. Both categories of
rigorous evaluations rely on the
same basic concept - using the
counterfactual to estimate the
changes caused by the program.
The counterfactual answers the
question, “What would have
happened to program participants
if they had not participated in the
program?” The comparison of
the
counterfactual
to
the
observed change in the group
receiving USAID assistance is the
true measurement of a program’s
effects.
While
before
and
after
measurements of a single group
using a baseline allow the
measurement of a single group
both with and without program
participation, this design does
not control for all the other
confounding factors that might
influence the participating group
during program implementation.
Well-constructed comparison
groups provide a clear picture of
the effects of program or project
interventions on the target group
by
differentiating
program/project effects from the
effects of multiple other factors in
the environment that affect both
the target and comparison
groups. This means that in
situations where economic or
other factors affecting
both
groups make everyone better
off, it will still be possible to see
the additional or incremental
improvement caused by the
program or project, as Figure 3
illustrates.
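The arithmetic behind Figure 3 can be sketched in a few lines. This is only an illustration: the function name and the yield figures are hypothetical, and the sketch assumes that the change in the comparison group is a fair estimate of the confounding effect.

```python
def program_effect(target_baseline, target_followup,
                   comparison_baseline, comparison_followup):
    """Estimate the program effect as the observed change in the target
    group minus the change in the comparison group (confounding effect)."""
    observed_change = target_followup - target_baseline
    confounding_effect = comparison_followup - comparison_baseline
    return observed_change - confounding_effect


# Hypothetical average crop yields (tons per hectare), not real data.
effect = program_effect(target_baseline=2.0, target_followup=3.5,
                        comparison_baseline=2.0, comparison_followup=2.5)
print(effect)  # 1.0
```

In this made-up case both groups improved, but only 1.0 of the 1.5 observed change in the target group would be attributed to the program.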
QUASI-EXPERIMENTAL
EVALUATIONS
To estimate program effects,
quasi-experimental designs rely
on measurements of a non-randomly selected comparison
group. The most common means
for selecting a comparison group
is
matching,
wherein
the
evaluator ‘hand-picks’ a group of
similar units based on observable
characteristics that are thought to
influence the outcome.
For
example, the evaluation of
an
agriculture program aimed at
increasing crop yield might seek
to
compare
participating
communities
against
other
communities with similar weather
patterns,
soil
types,
and
traditional crops, as communities
sharing
these
critical
characteristics would be most
likely to behave similarly to the
treatment group in the absence
of the program.
However, program participants
are often selected based on
certain characteristics, whether it
is level of need, motivation,
location, social or political factors,
or some other factor.
While
evaluators can often identify and
match many of these variables, it
is impossible to match all factors
that might create differences
between the treatment and
comparison groups, particularly
characteristics that are more
difficult to measure or are
unobservable, such as motivation
or social cohesion. For example,
if a program is targeted at
communities that are likely to
succeed, then the target group
might be expected to improve
relative to a comparison group
that was not chosen based on the
same factors. Failing to account
for this in the selection of the
comparison group would lead to
a biased estimate of program
impact. Selection bias is the
difference between the
comparison group and the
treatment group caused by the
inability to completely match on
all characteristics, and the
uncertainty or error this
generates in the measurement of
program effects.
WHAT IS EXPERIMENTAL AND
QUASI-EXPERIMENTAL
EVALUATION?
Experimental design is based on
the selection of the comparison and
treatment group through random
sampling.
Quasi-experimental design is
based on a comparison group that
is chosen by the evaluator (that is,
not based on random sampling).
FIGURE 4.
QUASI-EXPERIMENTAL EVALUATION OF THE KENYA NATIONAL CIVIC EDUCATION PROGRAM
PHASE II (NCEP II)
NCEP II, funded by USAID in collaboration with other donors, reached an estimated 10 million individuals through
workshops, drama events, cultural gatherings and mass media campaigns aimed at changing individuals’ awareness,
competence and engagement in issues related to democracy, human rights, governance, constitutionalism, and
nation-building. To determine the program’s impacts on these outcomes of interest, NCEP II was evaluated using a
quasi-experimental design with a matched comparison group.
Evaluators matched participants to a comparison group of non-participating individuals who shared geographic and
demographic characteristics (such as age, gender, education, and involvement with CSOs). This comparison group
was compared to the treatment group along the outcomes of interest to identify program effects. The evaluators
found that the program had significant long term effects, particularly on ‘civic competence and involvement’ and
‘identity and ethnic group relations’, but had only negligible impact on ‘Democratic Values, Rights, and
Responsibilities’. The design also allowed the evaluators to assess the conditions under which the program was
most successful. They found confirmation of prior assertions of the critical role in creating lasting impact of multiple
exposures to civic education programs through multiple participatory methods.
- ‘The Impact of the Second National Kenya Civic Education Programme (NCEP II-URAIA) on Democratic Attitudes,
Values, and Behavior’, Steven E. Finkel and Jeremy Horowitz, MSI
Other
common
quasi-experimental designs, in addition
to matching, are described below.
Non-Equivalent Group Design.
This is the most common quasi-experimental design, in which a
comparison group is hand-picked
to match the treatment group as
closely as possible. Since hand-picking the comparison group
cannot completely match all
characteristics with the treatment
group, the groups are considered
to be ‘non-equivalent’.
Regression Discontinuity.
Programs often have eligibility
criteria based on a cut-off score
or value of a targeting variable.
Examples include programs
accepting only households with
income below 2,000 USD,
organizations registered for at
least two years, or applicants
scoring above a 65 on a pre-test.
In each of these cases, it is likely
that individuals or organizations
just above and just below the
cut-off value would demonstrate
only marginal or incremental
differences in the absence of
USAID assistance, as families
earning 2,001 USD compared to
1,999 USD are unlikely to be
significantly different except in
terms of eligibility for the
program. Because of this, the
group just above the cut-off
serves as a comparison group for
those just below (or vice versa) in
a regression discontinuity design.
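The regression discontinuity idea of comparing units just above and just below a cut-off can be illustrated with a simplified sketch. The assumptions are loud ones: the function name, the bandwidth, and the household figures are hypothetical, units below the cut-off are treated as the eligible group, and a plain difference in local means stands in for the regression models that a full analysis would normally fit on each side of the cut-off.

```python
from statistics import mean


def local_difference(units, cutoff, bandwidth):
    """Compare mean outcomes of units just below and just above a cut-off.

    `units` is a list of (score, outcome) pairs. In this example, units
    scoring below the cut-off are assumed to be eligible for the program.
    """
    below = [y for s, y in units if cutoff - bandwidth <= s < cutoff]
    above = [y for s, y in units if cutoff <= s <= cutoff + bandwidth]
    if not below or not above:
        raise ValueError("no units within the bandwidth on one side")
    return mean(below) - mean(above)


# Hypothetical households: (annual income in USD, outcome score).
households = [(1900, 62), (1950, 64), (1990, 63),
              (2010, 55), (2050, 56), (2100, 54),
              (3000, 70)]  # far from the cut-off, so it is ignored
estimate = local_difference(households, cutoff=2000, bandwidth=100)
print(estimate)
```

A narrower bandwidth keeps the two groups more alike but leaves fewer units to compare, which is the main design trade-off in this approach.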
Propensity Score Matching. This
method is based on the same
rationale as regular matching: a
comparison group is selected
based on shared observable
characteristics with the treatment
group.
However, rather than
‘hand-picking’ matches based on
a small number of variables,
propensity score matching uses a
statistical process to combine
information
from
all
data
collected
on
the
target
population to create the most
accurate matches possible based
on observable characteristics.
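The matching step can be sketched as follows. This is a minimal illustration rather than the procedure used in any particular evaluation: it assumes the propensity scores have already been estimated by a statistical model (not shown), and the function name, the caliper value, and the unit IDs are all hypothetical.

```python
def match_on_propensity(treated, candidates, caliper=0.05):
    """Greedy one-to-one nearest-neighbor matching without replacement.

    `treated` and `candidates` map unit IDs to propensity scores that are
    assumed to have been estimated beforehand. A treated unit stays
    unmatched if no remaining candidate lies within `caliper` of its score.
    Returns a dict {treated_id: comparison_id}.
    """
    available = dict(candidates)
    matches = {}
    for t_id, t_score in sorted(treated.items(), key=lambda kv: kv[1]):
        if not available:
            break
        c_id = min(available, key=lambda c: abs(available[c] - t_score))
        if abs(available[c_id] - t_score) <= caliper:
            matches[t_id] = c_id
            del available[c_id]  # each comparison unit is used only once
    return matches


# Hypothetical scores, not real data.
treated = {"T1": 0.62, "T2": 0.35, "T3": 0.90}
candidates = {"C1": 0.60, "C2": 0.33, "C3": 0.10, "C4": 0.70}
pairs = match_on_propensity(treated, candidates)
print(pairs)
```

Here T3 is left unmatched because no candidate has a similar score, which shows why matching can shrink the usable treatment group.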
Interrupted Time Series.1 Some
programs will encounter
situations where a comparison
group is not possible, often
because the intervention affects
everyone at once, as is typically
the case with policy change. In
these cases, data on the outcome
of interest are recorded at
numerous intervals before and
after the program or activity takes
place. The data form a time series or trend, which the
evaluator analyzes for significant
changes around the time of the
intervention. Large spikes or
drops immediately after the
intervention signal changes
caused by the program. This
method is slightly different from
the other rigorous methods as it
does not use a comparison group
to rule out potentially
confounding factors, leading to
increased uncertainty in
evaluation conclusions.
Interrupted time series are most
effective when data are collected
regularly both before and after
the intervention, leading to a
long time series, and alternative
causes are monitored.
1 Interrupted time series is normally
viewed as a type of impact evaluation.
It is typically considered quasi-experimental although it does not use a
comparison group.
EXPERIMENTAL EVALUATION
In an experimental evaluation, the
treatment and comparison
groups are selected from the
target population by a random
process. For example, from a
target population of 50
communities that meet the
eligibility (or targeting) criteria of
a program, the evaluator uses a
coin flip, lottery, computer
program, or some other random
process to determine the 25
communities that will participate
in the program (treatment group)
and the 25 communities that will
not (control group, as the
comparison group is called when
it is selected randomly). Because
they use random selection
processes,
experimental
evaluations are often called
randomized
evaluations
or
randomized
controlled
trials
(RCTs).
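The 50-community example can be sketched with a computer-based random draw. The function name, the community labels, and the seed are hypothetical; fixing a seed is simply one way to make the draw reproducible and auditable.

```python
import random


def assign_treatment(units, n_treatment, seed=None):
    """Randomly select `n_treatment` units for the treatment group;
    all remaining units form the control group."""
    rng = random.Random(seed)
    chosen = set(rng.sample(units, n_treatment))
    treatment = [u for u in units if u in chosen]
    control = [u for u in units if u not in chosen]
    return treatment, control


# Hypothetical list of the 50 eligible communities.
communities = [f"Community {i}" for i in range(1, 51)]
treatment, control = assign_treatment(communities, 25, seed=2010)
print(len(treatment), len(control))  # 25 25
```

Recording the seed alongside the list of eligible units lets anyone re-run the draw and confirm that no unit was hand-picked.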
Random selection from a target
population into treatment and
control groups is the most
effective tool for eliminating
selection bias because it removes
the possibility of any individual
characteristic
influencing
selection. Because units are not
assigned to treatment or control
groups
based
on
specific
characteristics, but rather are
divided
randomly,
all
characteristics that might lead to
selection bias, such as motivation,
poverty level, or proximity, will be
roughly equally divided between
the treatment and control
groups.
If an evaluator uses
random assignment to determine
treatment and control groups,
she might, by chance, get two or
three
very
motivated
communities in a row assigned to
the treatment group, but if the
program is working in more than
a handful of communities, the
number
of
motivated
communities will likely balance
out between treatment and
control in the end.
Because
random
selection
completely eliminates selection
bias, experimental evaluations are
often easier to analyze and
provide more credible evidence
than quasi-experimental designs.
Random assignment can be done
with any type of unit, whether the
unit is the individual, groups of
individuals (e.g., communities or
districts),
organizations,
or
facilities (e.g., health center or
school) and usually follows one of
the designs discussed below.
Simple Random Assignment.
When the number of program
participants has been decided
and additional eligible individuals
are identified, simple random
assignment through a coin flip or
lottery can be used to select the
treatment and control
groups.
Programs
often
encounter
‘excess
demand’
naturally (for example in training
programs, participation in study
tours, or where resources limit
the
number
of
partner
organizations),
and
simple
random assignment can be an
easy and fair way to determine
participation while maximizing
the
potential
for
credible
evaluation conclusions.
FIGURE 5.
EXPERIMENTAL EVALUATION OF THE IMPACTS OF EXPANDING CREDIT ACCESS IN
SOUTH AFRICA
While commercial loans are a central component of most microfinance strategies, there is much less consensus on
whether consumer loans are also useful for economic development. Microfinance in the form of loans for household
consumption or investment has been criticized as unproductive, usurious, and a contributor to debt cycles or traps.
In an evaluation partially funded by USAID, researchers used an experimental evaluation designed to test the impacts
of access to consumer loans on household consumption, investment, education, health, wealth, and well-being.
From a group of 787 applicants who were just below the credit score needed for loan acceptance, the researchers
randomly selected 325 (treatment group) that would be approved for a loan. The treatment group was surveyed,
along with the remaining 462 who were randomly denied (control group), eight months after their loan application to
estimate the effects of receiving access to consumer credit. The evaluators found that the treatment group was more
likely to retain wage employment, less likely to experience severe hunger in their households, and less likely to be
impoverished than the control group, providing strong evidence of the benefits of expanding access to consumer
loans.
-‘Expanding Credit Access: Estimating the Impacts’, Dean Karlan and Jonathan Zinman,
http://www.povertyactionlab.org/projects/print.php?pid=62
Phased-In Selection. In some
programs, the delivery of the
intervention does not begin
everywhere at the same time. For
capacity or logistical reasons,
some units receive the program
intervention earlier than others.
This type of schedule creates a
natural opportunity for using an
experimental design. Consider a
project where the delivery of a
radio-based
civic
education
program was scheduled to
operate in 100 communities
during year one, another 100
during year two, and a final 100
during year three. The year of
participation can be randomly
assigned. Communities selected
to participate in year one would
be designated as the first
treatment group (T1). For that
year, all the other communities
that would participate in Years
Two and Three form the initial
control group. In the second
year, the next 100 communities
would become the second
treatment group (T2), while the
final 100 communities would
continue to serve as the control
group. Random assignment to
the year of participation ensures
that
all
communities
will
participate in the program but
also maximizes evaluation rigor
by reducing selection bias, which
could be significant if only the
most motivated communities
participate in Year One.
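The phased-in example of 300 communities entering over three years can be sketched as below. The function names and the seed are hypothetical, and the sketch assumes the units divide evenly across phases.

```python
import random


def assign_phases(units, n_phases, seed=None):
    """Randomly order the units and split them into equal-sized phases.
    Returns a dict {phase_number: [units]} with phases numbered from 1."""
    if len(units) % n_phases != 0:
        raise ValueError("units must divide evenly across phases")
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)
    size = len(shuffled) // n_phases
    return {p + 1: shuffled[p * size:(p + 1) * size] for p in range(n_phases)}


def groups_in_year(phases, year):
    """Units whose phase has started by `year` are treated; units that
    have not yet started serve as the control group for that year."""
    treated = [u for p, members in phases.items() if p <= year for u in members]
    control = [u for p, members in phases.items() if p > year for u in members]
    return treated, control


communities = list(range(1, 301))  # hypothetical community IDs
phases = assign_phases(communities, 3, seed=19)
t1, c1 = groups_in_year(phases, 1)
print(len(t1), len(c1))  # 100 200
```

By year three every community is participating, so no control group remains; comparisons are only possible in the earlier years.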
Blocked
(or
Stratified)
Assignment. When it is known in
advance that the units to which a
program intervention could be
delivered differ in one or more
ways that might influence the
program outcome, (e.g., age, size
of the community in which they
are located, ethnicity, etc.),
evaluators may wish to take extra
steps to ensure that such
conditions are evenly distributed
between
an
evaluation’s
treatment and control groups. In
a simple block (stratified) design,
an evaluation might separate
men and women, and then use
randomized assignment within
each block to construct the
evaluation’s
treatment
and
control groups, thus ensuring a
specified number or percentage
of men and women in each
group.
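Blocked assignment can be sketched by randomizing separately within each block. The function name, the participant records, and the even split within blocks are assumptions made for illustration; an evaluation could just as well fix a different number or percentage per block.

```python
import random
from collections import defaultdict


def blocked_assignment(units, block_of, seed=None):
    """Randomly split each block in half between treatment and control.

    `block_of` is a function returning the block label of a unit. If a
    block has an odd number of units, the extra unit goes to control.
    """
    rng = random.Random(seed)
    blocks = defaultdict(list)
    for unit in units:
        blocks[block_of(unit)].append(unit)
    treatment, control = [], []
    for label in sorted(blocks):
        members = blocks[label]
        rng.shuffle(members)
        half = len(members) // 2
        treatment.extend(members[:half])
        control.extend(members[half:])
    return treatment, control


# Hypothetical participants: 20 women and 20 men.
people = [("P%02d" % i, "woman" if i % 2 else "man") for i in range(1, 41)]
treatment, control = blocked_assignment(people, block_of=lambda p: p[1], seed=7)
print(len(treatment), len(control))  # 20 20
```

Each group ends up with exactly ten women and ten men, something simple random assignment would achieve only on average.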
Multiple Treatments. It is
possible that multiple approaches
will be proposed or implemented
for the achievement of a given
result. If a program is interested
in
testing
the
relative
effectiveness of three different
strategies or approaches, eligible
units can be randomly divided
into three groups. Each group
participates in one approach, and
the results can be compared to
determine which approach is
most effective. Variations on this
design can include additional
groups to test combined or
holistic approaches and a control
group to test the overall
effectiveness of each approach.
COMMON
QUESTIONS AND
CHALLENGES
While
rigorous
evaluations
require significant attention to
detail in advance, they need not
be impossibly complex. Many of
the most common questions and
challenges can be anticipated and
minimized.
COST
Rigorous evaluations will almost
always cost more than standard
evaluations that do not require
comparison groups.
However,
the
additional
cost
can
sometimes
be
quite
low
depending on the type and
availability of data to be
collected.
Moreover, findings
from rigorous evaluations may
lead to future cost-savings,
through improved programming
and more efficient use of
resources over the longer term.
Nevertheless, program managers
must anticipate these additional
costs, including the additional
planning requirements, in terms
of staffing and budget needs.
ETHICS
The use of comparison groups is
sometimes criticized for denying
treatment
to
potential
beneficiaries.
However, every
program has finite resources and
must select a limited number of
program participants. Random
selection of program participants
is often viewed, even by those
beneficiaries
who
are
not
selected, as being the fairest and
most transparent method for
determining participation.
A second, more powerful, ethical
question emerges when a
program
seeks
to
target
participants that are thought to
be most in need of the program.
In
some
cases,
rigorous
evaluations require a relaxing of
targeting
requirements
(as
discussed in Figure 6) in order to
identify enough similar units to
constitute a comparison group,
meaning that perhaps some of
those identified as the ‘neediest’
might be assigned to the
comparison group. However, it is
often the case that the criteria
used to target groups do not
provide the degree of precision
required to confidently rank-order
potential participants.
Moreover, rigorous evaluations
can help identify which groups
benefit most, thereby improving
targeting for future programs.
SPILLOVER
Programs are often designed to
incorporate ‘multiplier effects’
whereby program effects in one
community naturally spread to
others nearby.
While these
effects help to broaden the
impact of a program, they can
result in bias in conclusions when
the effects on the treatment
group spill over to the comparison
group. When comparison groups
also benefit from a program, then
they no longer measure only the
confounding effects, but also a
portion of the program effect.
This leads to underestimation of
program impact since they
8
FIGURE 6. TARGETING IN
RIGOROUS EVALUATIONS
Programs often have specific
eligibility requirements without
which a potential participant could
not feasibly participate.
Other
programs target certain groups
because of perceived need or
likelihood of success. Targeting is
still
possible
with
rigorous
evaluations, whether experimental
or quasi-experimental, but must be
approached in a slightly different
manner. If a program intends to
work in 25 communities, rather than
defining
one
group
of
25
communities that meet the criteria
and participate in the program, it
might be necessary to identify a
group of 50 communities that meet
the eligibility or targeting criteria
and will be split into the treatment
and comparison group.
This
reduces the potential for selection
bias while still permitting the
program to target certain groups.
In situations where no additional
communities meet the eligibility
criteria and the criteria cannot be
relaxed, phase-in or multiple
treatment approaches, as discussed
below, might be appropriate.
appear better off than they would
have been in the absence of the
program.
In some cases, spillovers can be
mapped and measured but, most
often, they must be controlled in
advance by selecting treatment
and control groups or units that
are unlikely to significantly
interact with one another. A
special case of spillover occurs in
substitution bias wherein
governments or other donors
target only the comparison group
to fill in gaps of service. This is
best avoided by ensuring
coordination between the
program and other development
actors.
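The way spillover biases an impact estimate downward can be shown with
simple arithmetic. The sketch below assumes an additive model in which
each group's outcome is a baseline plus confounding effects plus
whatever program effect reaches it; the function name and all numbers
are hypothetical and chosen only for illustration.

```python
def estimated_impact(baseline, confounding, true_effect, spillover):
    """Difference in outcomes between treatment and comparison groups.

    Assumes a simple additive model (an illustrative assumption):
    the comparison group receives `spillover` units of program effect.
    """
    treatment_outcome = baseline + confounding + true_effect
    comparison_outcome = baseline + confounding + spillover
    return treatment_outcome - comparison_outcome


# No spillover: the comparison group captures only confounding effects,
# so the estimate equals the true effect.
print(estimated_impact(baseline=50, confounding=5, true_effect=10, spillover=0))  # 10

# Spillover of 3 points: the comparison group also benefits,
# so the estimate understates the true effect of 10.
print(estimated_impact(baseline=50, confounding=5, true_effect=10, spillover=3))  # 7
```

In this example the confounding effects cancel out in both cases, but
any benefit that reaches the comparison group is subtracted from the
estimated impact.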
SAMPLE SIZE
During the analysis phase,
rigorous evaluations typically use
statistical tests to determine
whether any observed differences
between treatment and
comparison groups represent
actual differences (that would
then, in a well-designed
evaluation, be attributed to the
program) or whether the
difference could have occurred
due to chance alone. The ability
to make this distinction depends
principally on the size of the
change and the total number of
units in the treatment and
comparison groups, or sample
size. The more units, or the larger
the sample size, the easier it is to
attribute change to the program
rather than to random variations.
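One way to ask whether an observed difference could have occurred by
chance alone is a permutation test, sketched below. This TIPS does not
prescribe a particular statistical test, so the choice of test, the
function name permutation_p_value, and the outcome scores are all
assumptions made for illustration.

```python
import random
from statistics import mean


def permutation_p_value(treatment, comparison, n_permutations=5000, seed=0):
    """Approximate two-sided p-value for a difference in group means.

    Group labels are reshuffled many times; the p-value is the share of
    reshuffles whose difference is at least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = abs(mean(treatment) - mean(comparison))
    pooled = list(treatment) + list(comparison)
    n_treatment = len(treatment)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_treatment]) - mean(pooled[n_treatment:]))
        if diff >= observed:
            extreme += 1
    # Add-one correction so the estimate is never exactly zero.
    return (extreme + 1) / (n_permutations + 1)


# Hypothetical outcome scores for eight units in each group.
treatment_scores = [12, 15, 14, 16, 13, 17, 15, 14]
comparison_scores = [10, 11, 12, 9, 13, 10, 11, 12]
p_value = permutation_p_value(treatment_scores, comparison_scores)
print(f"p-value: {p_value:.4f}")
```

A small p-value suggests the difference is unlikely to be due to random
variation alone. With fewer units, or a smaller change, the same test
would have more difficulty separating a real difference from chance.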
During the design phase,
rigorous impact evaluations
typically calculate the number of
units (or sample size) required to
confidently identify changes of
the size anticipated by the
program. An adequate sample
size helps prevent declaring a
successful project ineffectual
(false negative) or declaring an
ineffectual project successful
(false positive). Although sample
size calculations should be done
before each program, as a rule of
thumb, rigorous impact
evaluations are rarely undertaken
with fewer than 50 units of analysis.
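A design-phase sample size calculation of the kind described above can
be sketched with a standard normal-approximation formula for comparing
two group means. The sketch assumes units are randomized individually
(no clustering), equal group sizes, a two-sided test, and an effect
expressed in standard deviation units; the function name
units_per_group and the default values of alpha and power are
conventional choices assumed here, not figures taken from this TIPS.

```python
import math
from statistics import NormalDist


def units_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate units needed per group to detect a standardized effect.

    Uses n = 2 * (z_(1 - alpha/2) + z_power)^2 / effect_size^2,
    a normal approximation for a two-sided comparison of two means.
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # guards against false positives
    z_power = NormalDist().inv_cdf(power)          # guards against false negatives
    n = 2 * (z_alpha + z_power) ** 2 / effect_size ** 2
    return math.ceil(n)


# Smaller anticipated changes require many more units.
for d in (0.8, 0.5, 0.2):
    print(d, units_per_group(d))
```

Under these assumptions, an anticipated change of half a standard
deviation calls for 63 units per group. When whole communities are
assigned and individuals within them are measured, the required
numbers are typically larger, so a calculation tailored to the actual
design should be used.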
RESOURCES
This TIPS is intended to provide
an introduction to rigorous
impact evaluations. Additional
resources are provided below
for further reference.
Further Reference
Initiatives and Case Studies:
- Office of Management and Budget (OMB):
  o http://www.whitehouse.gov/OMB/part/2004_program_eval.pdf
  o http://www.whitehouse.gov/omb/assets/memoranda_2010/m10-01.pdf
- U.S. Government Accountability Office (GAO):
  o http://www.gao.gov/new.items/d1030.pdf
- USAID:
  o Evaluating Democracy and Governance Effectiveness (EDGE):
    http://www.usaid.gov/our_work/democracy_and_governance/technical_areas/dg_office/evaluation.html
  o Measure Evaluation:
    http://www.cpc.unc.edu/measure/approaches/evaluation/evaluation.html
  o The Private Sector Development (PSD) Impact Evaluation Initiative:
    www.microlinks.org/psdimpact
- Millennium Challenge Corporation (MCC) Impact Evaluations:
  http://www.mcc.gov/mcc/panda/activities/impactevaluation/index.shtml
- World Bank:
  o The Spanish Trust Fund for Impact Evaluation:
    http://web.worldbank.org/WBSITE/EXTERNAL/EXTABOUTUS/ORGANIZATION/EXTHDNETWORK/EXTHDOFFICE/0,,contentMDK:22383030~menuPK:6508083~pagePK:64168445~piPK:64168309~theSitePK:5485727,00.html
  o The Network of Networks on Impact Evaluation: http://www.worldbank.org/ieg/nonie/
  o The Development Impact Evaluation Initiative:
    http://web.worldbank.org/WBSITE/EXTERNAL/EXTDEC/EXTDEVIMPEVAINI/0,,menuPK:3998281~pagePK:64168427~piPK:64168435~theSitePK:3998212,00.html
- Others:
  o Center for Global Development’s ‘Evaluation Gap Working Group’:
    http://www.cgdev.org/section/initiatives/_active/evalgap
  o International Initiative for Impact Evaluation: http://www.3ieimpact.org/
Additional Information:
- Sample Size and Power Calculations:
  o http://www.statsoft.com/textbook/stpowan.html
  o http://www.mdrc.org/publications/437/full.pdf
- World Bank: ‘Evaluating the Impact of Development Projects on Poverty: A Handbook for
  Practitioners’:
  o http://web.worldbank.org/WBSITE/EXTERNAL/TOPICS/EXTPOVERTY/EXTISPMA/0,,contentMDK:20194198~pagePK:148956~piPK:216618~theSitePK:384329,00.html
- Poverty Action Lab’s ‘Evaluating Social Programs’ Course: http://www.povertyactionlab.org/course/
For more information:
TIPS publications are available online at [insert website]
Acknowledgements:
Our thanks to those whose experience and insights helped shape this publication including USAID’s
Office of Management Policy, Budget and Performance (MPBP). This publication was written by Michael
Duthie of Management Systems International.
Comments regarding this publication can be directed to:
Gerald Britan, Ph.D.
Tel: (202) 712-1158
gbritan@usaid.gov
Contracted under RAN-M-00-04-00049-A-FY0S-84
Integrated Managing for Results II