This report outlines an approach and some specific methods for evaluating a particular, and often neglected, aspect of nonprofit leadership training and development programs. While the primary (or direct) impacts of these programs on the participating leaders are often evaluated and understood, it’s less clear how leadership programs ought to think about and evaluate their secondary impact on the communities in which their program alumni work. This report offers a conceptual framework and tools for staff of leadership training and development programs to use in evaluating that secondary impact.
It is impossible to effectively measure or learn about secondary impacts until you have first articulated as clearly as possible what the desired outcomes are and how you think they will come about.
Evaluation experts have developed a wide range of creative methods and tools for this step, well beyond the surveys, interviews, case studies, and focus groups that we were most familiar with. And even for some of these tried-and-true methods, experts propose refinements that make them more effective in the context of complex systems.
The learning generated through evaluation processes should be fed back into the program design and process to ensure a continuous cycle of program improvement.
There’s an apocryphal story in which a Harvard Business School student visits the course materials office at HBS and is disgruntled by the lackadaisical service.
Fed up with waiting, he finally raps his knuckles on the counter and barks, “Hey! Can you hurry it up? I’m the customer!” In response, the clerk casually lowers the crossword puzzle in which she’s been engrossed and remarks, “You’re not the customer. You’re the product.” This joke encapsulates a perception that is often held among those who manage programs focused on leadership training and development: that the participants are the “product” of the learning process.
Many nonprofits run programs that invest in developing skilled future leaders and professionals. While we are deeply committed to the intrinsic value these programs offer to those who participate in them, we should be equally committed to understanding the impact that the leaders in whom we’ve invested have on their communities and networks. This report calls these effects “secondary impacts.”
In fact, the real measure of our success is not the leader’s experience alone, but also the change they catalyze in their communities, their networks, and the larger world. In this sense, our participants are also a means of pursuing our larger vision for change. Our implicit theory of change is that investing in leaders produces better-equipped leaders who can more effectively pursue the change we all need to see in the world. Our success, then, is measured by how much more effectively they realize that change.
This report was born out of a shared need of its authors, staff leaders at two different nonprofits: American Jewish World Service and Auburn Seminary. Each of us was overseeing a major program that aims to strengthen a cohort of leaders. For both of us, the investment in leaders was clearly a means to an end: the real desired outcome was an impact on society, and the leaders we were training were the vehicles for achieving it. We were operating under a hypothesis about social change that presumes great leaders make great things happen. Investing in people to help them become great leaders (inspiring, empowering, and training them to catalyze communal transformation) would therefore lead to great change.
Our bare-bones theory of change looked something like this:
While we had many tools for evaluating our program’s impact on the leaders themselves (A->B or “primary impact”), when it came to understanding or measuring how each leader’s experience in our program contributed to an impact on the communities they led (A->C or “secondary impact”), we were at a loss.
Prompted by this frustration, we convened a two-day seminar for nonprofit staff to study approaches to evaluating the secondary impacts of leadership programs. The seminar was held in April 2015 at the New York offices of American Jewish World Service. We are grateful to the Wexner Foundation for underwriting its costs through an alumni collaboration grant.
Twenty-two colleagues who lead similar leadership programs (mostly in the Jewish nonprofit sector) participated in the seminar. We hired two experts in evaluation to lead us in our learning and reflection: Dr. Jewlya Lynn from the Spark Policy Institute in Denver and Tanya Beer from the Center for Evaluation Innovation in Washington, D.C.
This report shares the tools and insights we gained through the seminar. We hope it helps other nonprofit staff and funders who design and support leadership efforts to better assess the impact their leaders have on the communities they serve, and to use what they learn to improve those programs and effect still more positive change in the world.
The leadership programs at issue address what are often called complex problems or systems.
As opposed to simple problems (baking a cake) and complicated problems (sending a rocket into space), complex problems or systems—such as raising a child, changing immigration policy, or solving climate change—are characterized by a multiplicity of interconnected factors, some of which are unknown or invisible; an inability to predict consequences; and unstable contexts.
When evaluating impact in the context of a complex system, experts encourage evaluators to build a plausible case for how a program contributed to the outcomes rather than overstating its role. The Spark Policy Institute encourages evaluators to:
Perhaps the most important approach to evaluating complex systems is to focus more on “learning and adaptation” than on traditional “accountability for impact.”
Evaluations that focus on learning and adaptation encourage developmental learning and risk taking. The complex problems our programs are addressing are never entirely solved; what worked last time may not work in the future; and what works in one context may not work in another. Developmental evaluation methods track effects as they unfold, not expecting methods or even intended outcomes to remain stable. In this context, evaluation encourages staff to adapt to what is learned, acknowledge what isn’t working, and use the new knowledge to inform future decisions. In their book Getting to Maybe: How the World is Changed, Westley, Zimmerman, and Patton call this a “safe-fail” approach, as opposed to a “fail-safe” approach.2 The point is that in the context of complex systems, spending lots of time and resources on measuring the original, predicted impact may not be an effective way to get to the ultimate outcome.
Moving from the idea of more effectively evaluating leadership programs in a complex system to actually doing it can be daunting. We have found in our own work that it takes significant time and energy to get institutional buy-in and involve colleagues in new behaviors. Whether you try a DIY (“do it yourself”) approach or involve a consultant, these three steps form the core elements of an evaluation and learning cycle.
Most programs treat evaluation as an exercise in counting heads (e.g., How many people applied to, participated in, or graduated from my program?) and/or measuring customer satisfaction (e.g., How would you rate your experience in my program on a scale of 1-5?). But we design leadership programs so that leaders can effect meaningful change in the world. The first step in evaluating these programs is to articulate the change we seek to produce and the way we think it will materialize: What do we want to see happen in the communities our leaders serve? Will more people join those communities? Will community members engage in more (or better) political activism? Will they engage more productively in civil discourse around contentious issues? Will they manage their institutions more sustainably?
One way to begin articulating the outcomes and change pathways of a leadership program is to develop a persona: an archetypal story of a program participant who exemplifies what the program is trying to accomplish. Personas can help ground program design in the observed interests, experiences, and perspectives of real people. Developing a persona consists of the following key steps:
Shlomo Goltz has written a helpful guide for developing robust program personas. Below, you’ll also find a case study from one of the program directors who participated in our evaluation seminar. While this is a description of an actual program participant and not a synthesized persona, it conveys the value of having an individual story to capture the kind of change your program is designed to produce.
Another valuable tool for articulating outcomes and change pathways is the logic model. (See the W.K. Kellogg Foundation’s Logic Model Development Guide.) A logic model is a flow-chart diagram depicting the causal relationships among the various elements of program design. While there are many logic-model formats, most include and describe relationships among the following elements:
Some logic models also include articulations of the problem the program is designed to solve, underlying assumptions that inform the logic model design, and critical factors in the external operating environment.
Logic models can be used for many purposes in both program design and evaluation, but their core utility is in making explicit and transparent the causal relationships among the various elements of a program.
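Because a logic model is essentially a set of assumed cause-and-effect links, it can help to treat it as a directed graph. The Python sketch below is a hypothetical illustration, not the Kellogg guide's format: the element names and links are invented, and `build_model` and `causal_paths` are illustrative helpers. Listing every path from resources to the intended community change makes each assumed link visible, and each link is a place where evaluation can probe.

```python
from collections import defaultdict

# Hypothetical logic model for a leadership program. Each pair is one assumed
# causal link ("if this happens, we expect that to follow").
LINKS = [
    ("funding and staff", "cohort seminars"),
    ("funding and staff", "one-on-one mentoring"),
    ("cohort seminars", "fellows gain facilitation skills"),
    ("one-on-one mentoring", "fellows gain confidence"),
    ("fellows gain facilitation skills", "fellows convene diverse groups"),
    ("fellows gain confidence", "fellows convene diverse groups"),
    ("fellows convene diverse groups", "community engages across differences"),
]


def build_model(links):
    """Map each element to the elements it is assumed to cause."""
    graph = defaultdict(list)
    for cause, effect in links:
        graph[cause].append(effect)
    return graph


def causal_paths(graph, start, goal, path=()):
    """Return every chain of assumed links leading from start to goal."""
    path = path + (start,)
    if start == goal:
        return [list(path)]
    paths = []
    for nxt in graph.get(start, []):
        if nxt not in path:  # skip accidental cycles
            paths.extend(causal_paths(graph, nxt, goal, path))
    return paths


model = build_model(LINKS)
for chain in causal_paths(model, "funding and staff", "community engages across differences"):
    print(" -> ".join(chain))
```

If a desired outcome has no path back to any program activity, that gap is worth discussing before choosing evaluation tools.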
Julie (not her real name) grew up in a part of the U.S. with a small Jewish community. She belonged to a Reform synagogue, attended Hebrew School, and participated in Reform Jewish summer programs. Her limited exposure to Jewish diversity meant she had very few friendships with Jews outside the Reform movement.
Julie traveled to Israel with BYFI the summer before her senior year of high school and continued with BYFI seminars during her senior year. The program exposed her to Jewish pluralism, text study, and Israel in a multifaceted way. She built close relationships with Jews different from her (e.g., she emerged from the program best friends with an observant Orthodox young woman).
Julie went on to attend an Ivy League school. Because of her experience in BYFI, she decided to get involved in Jewish life on campus and joined a prayer community. But she came to see that her campus Hillel, which was dominated by Orthodox students, seemed to alienate Jews from more secular backgrounds and those with left-leaning perspectives on Israel.
As a junior, Julie decided to run for Hillel president against a politically conservative, religiously Orthodox fellow student. She was the underdog, but she spent a lot of time reaching out to students on the fringes and those in the center. Her BYFI experience gave her valuable tools for speaking with Jews from diverse backgrounds and helped her honor each person’s point of view as equally valid. She gained people’s trust and won the election.
A month before the election, she was feeling disheartened about polarized attitudes toward Israel on campus. She attended a BYFI seminar for college alumni designed to create a safe space to explore campus discourse about Israel, provide discussion tools, and build community among students who care about Israel across different political perspectives. She broke down in tears at the seminar, but several mentoring conversations with staff and peers bolstered her. Afterward, she wrote to staff saying the retreat had rejuvenated her and given her the confidence and strength to keep her campaign going. The sense of being part of a community that believed in her, and the reminder that there are Jewish spaces where heartfelt yet civil discussion about Israel is possible, gave her hope and energy.
As Hillel president, she focused on making those on the margins feel welcome. She hosted the first-ever pre-Passover bagel brunch, which drew over 250 students (most of whom were not “the usual suspects”). She also convinced the Hillel professional staff to develop programs that bring people together for Israel conversations in small-group settings.
One last point: Julie went to college intending to major in political science. Because of her exposure to a range of Jewish texts and ideas through BYFI, she decided to try Yiddish and is now a Yiddish major.
~Becky Voorwinde, Executive Director of The Bronfman Youth Fellowships
If the first step is articulating meaningful outcomes and the means by which we intend to achieve them, the next step is to open the evaluation toolbox to find methods and tools that capture and analyze data relevant to those outcomes.
The central task of this step is to match one or more evaluation tools to your desired outcome(s). To do that, you need to explore the evaluation toolbox, and then reflect on how you would use the data that a specific tool would generate. So first we’ll introduce you to the toolbox, and then we’ll offer a set of questions to reflect on as you consider potential evaluation methods.
One of the most exciting moments of the seminar was when our two evaluation experts walked us through a set of 15 evaluation tools they and other experts have developed for program providers. While it is difficult to recreate that experience in a written report, we believe that sampling the variety of available evaluation methods is an important part of improving how we evaluate and assess our programs.
Below are brief descriptions of a handful of these evaluation tools. At the end of this report, you’ll find a chart summarizing all 15 approaches, which you can use to identify others you may wish to explore. Depending on your staff’s evaluation experience, you may decide to hire an evaluation consultant to identify, and support you in using, the most relevant tools for collecting data related to your desired outcomes.
Use the Identity Leadership Inventory to measure a leader’s ability to mobilize and direct followers’ energies.
The Identity Leadership Inventory (ILI) is a simple, validated3 survey instrument designed to measure a leader’s influence on her or his community vis-a-vis four interconnected relationships:
By deploying the ILI in a leader’s community before, during, and after a leadership-training intervention, you can infer something about the intervention’s value in strengthening the leader’s influence in her or his community. You can find the ILI survey instrument and guidance for administering it in Niklas K. Steffens et al.’s “Identity Leadership Inventory (ILI) Instrument and Scoring Guide (ILI Version 1.0).”
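The before/during/after comparison comes down to simple arithmetic on survey responses. This Python example is a sketch under stated assumptions: the responses are made up, items are assumed to be grouped by the ILI's four dimensions (labeled here prototypicality, advancement, entrepreneurship, and impresarioship), ratings are assumed to use a 1-7 scale, and scores are plain averages. The official scoring guide should govern any real analysis. A rise in scores suggests, rather than proves, that the program contributed.

```python
from statistics import mean

DIMENSIONS = ("prototypicality", "advancement", "entrepreneurship", "impresarioship")

# Made-up responses: each community member's item ratings per ILI dimension,
# assumed here to use a 1-7 agreement scale. Add a "during" wave the same way.
WAVES = {
    "before": [
        {"prototypicality": [4, 5], "advancement": [3, 4], "entrepreneurship": [3, 3], "impresarioship": [2, 4]},
        {"prototypicality": [5, 5], "advancement": [4, 4], "entrepreneurship": [4, 3], "impresarioship": [3, 3]},
    ],
    "after": [
        {"prototypicality": [5, 6], "advancement": [4, 5], "entrepreneurship": [5, 4], "impresarioship": [4, 4]},
        {"prototypicality": [5, 6], "advancement": [5, 5], "entrepreneurship": [5, 5], "impresarioship": [4, 5]},
    ],
}


def wave_means(respondents):
    """Average each respondent's items per dimension, then average across respondents."""
    return {dim: mean(mean(person[dim]) for person in respondents) for dim in DIMENSIONS}


def change_from_baseline(waves, baseline="before"):
    """Difference between each later wave's dimension means and the baseline means."""
    base = wave_means(waves[baseline])
    changes = {}
    for wave, respondents in waves.items():
        if wave != baseline:
            current = wave_means(respondents)
            changes[wave] = {dim: current[dim] - base[dim] for dim in DIMENSIONS}
    return changes


for wave, diffs in change_from_baseline(WAVES).items():
    print(wave, {dim: round(d, 2) for dim, d in diffs.items()})
```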
Use Outcome Harvesting to identify changes in a complex system and work backwards to determine whether and how an intervention contributed to those changes.
The outcome harvesting methodology is unusual in that it does not seek to measure progress toward a predetermined set of outcomes. Instead, it starts by identifying changes (e.g., shifts in communal behavior, adoption of new regulations/legislation, emergence of new memes/messages, etc.) within a system affected by an intervention (e.g., a community whose leader has experienced a leadership-training program) and then works backward to determine whether there’s credible evidence that the intervention contributed to the changes. It seeks to answer the following questions:
The method is particularly useful in contexts where the connections between cause and effect are ambiguous. You can find detailed guidance on outcome harvesting in Ricardo Wilson-Grau and Heather Britt’s “Outcome Harvesting.”
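Because outcome harvesting starts from observed changes and only then asks about contribution, a harvest can be kept as a simple list of records. The Python sketch below uses a hypothetical record loosely modeled on that logic: what changed, who changed, why it matters, how the program plausibly contributed, and who can confirm the account. The field names, the example outcomes, and the `is_credible` rule are illustrative assumptions, not part of Wilson-Grau and Britt's guidance.

```python
from dataclasses import dataclass, field


@dataclass
class HarvestedOutcome:
    """One observed change, recorded before asking whether the program helped cause it."""
    change: str               # what changed, and roughly when
    who_changed: str          # the person, group, or institution whose behavior changed
    significance: str         # why the change matters
    contribution: str = ""    # how, if at all, the program plausibly contributed
    sources: list[str] = field(default_factory=list)  # people who can confirm the account

    def is_credible(self, min_sources: int = 1) -> bool:
        """Hypothetical rule: a stated contribution plus enough confirming sources."""
        return bool(self.contribution) and len(self.sources) >= min_sources


# Invented examples for illustration only.
harvest = [
    HarvestedOutcome(
        change="A congregation launched a monthly interfaith dialogue series",
        who_changed="The congregation's adult-education committee",
        significance="First regular forum for members to meet neighbors of other faiths",
        contribution="An alumna proposed the series using a facilitation model from the program",
        sources=["committee chair"],
    ),
    HarvestedOutcome(
        change="A local school revised its volunteer policy",
        who_changed="School administration",
        significance="More parents can now take part",
    ),
]

for outcome in harvest:
    status = "credible contribution" if outcome.is_credible() else "needs follow-up"
    print(f"{outcome.change}: {status}")
```

Outcomes without a credible contribution claim are not discarded; they simply mark where further inquiry is needed.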
Use Most Significant Change to assess not only what outcomes a program has achieved, but also how various stakeholders value those outcomes in different ways.
The Most Significant Change (MSC) technique is a highly participatory methodology designed to engage a broad cross-section of stakeholders in evaluating a given intervention. In the context of a leadership-training program, members of the leader’s community would be invited to share stories of what they consider to be significant changes resulting from the leader’s participation. This process of sharing not only provides useful evidence of the intervention’s impacts, but also surfaces valuable insights into what community members consider “significant.” Once these stories have been collected, a group of selected stakeholders (staff, advisors, etc.) discusses them, determines which they consider “most significant,” and explains the rationale for those choices. These analyses are then fed back to the community to focus its attention on impact, shared with leaders, and used to refine the training program. You can find detailed background information and implementation guidance in Rick Davies and Jess Dart’s The ‘Most Significant Change’ (MSC) Technique: A Guide to Its Use.
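In MSC the panel's selection usually emerges from discussion, but the choice and its rationale still need to be recorded. The Python sketch below assumes, hypothetically, that each panelist names one story and gives a reason, and it simply tallies those choices. The stories, panelists, and `select_most_significant` helper are invented for illustration. Keeping every rationale, including those for stories that were not chosen, preserves the insight into what different stakeholders consider significant.

```python
from collections import Counter

# Hypothetical stories gathered from a leader's community.
STORIES = {
    "s1": "Youth group attendance doubled after the leader reorganized it",
    "s2": "Two estranged families began volunteering together",
    "s3": "The board adopted a written conflict-resolution process",
}

# Hypothetical panel record: (panelist, chosen story id, rationale).
PANEL = [
    ("staff member A", "s2", "Repairs relationships the community had given up on"),
    ("staff member B", "s2", "Shows trust being rebuilt across a long-standing divide"),
    ("advisor C", "s3", "Likely to outlast the current leader"),
]


def select_most_significant(panel):
    """Return the most-chosen story, its rationales, and the full tally.

    Ties go to the story chosen first; a real panel would talk them through.
    """
    votes = Counter(story for _, story, _ in panel)
    winner, _ = votes.most_common(1)[0]
    reasons = [why for _, story, why in panel if story == winner]
    return winner, reasons, votes


winner, reasons, votes = select_most_significant(PANEL)
print(f"Most significant: {STORIES[winner]}")
for why in reasons:
    print(f"  because: {why}")
print(f"Other stories discussed: {[s for s in votes if s != winner]}")
```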
Use a Collective Efficacy Scale to assess a group’s shared belief in its joint capability to organize and execute a course of action in pursuit of its desired results.
The Collective Efficacy Scale emerged from criminal-justice research into how communities’ shared values and willingness to cooperate could exert influence on community members to reduce violence and crime. It has been adapted for use in schools and many other contexts in which community members’ sense of themselves as a mutually supportive community capable of collective action is important.
By deploying the Collective Efficacy Scale in a leader’s community before, during, and after a leadership-training program, you can infer something about the program’s value in building the community’s sense of efficacy. You can find more information on the Collective Efficacy Scale in John M. Carroll, Mary Beth Rosson, and Jingying Zhou’s “Collective Efficacy as a Measure of Community.”
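When the same community members can be surveyed before and after a program, their scores can be compared pair by pair. The Python sketch below uses made-up paired averages on the Collective Efficacy Scale and reports the mean change, the number of members whose scores rose, and a standardized paired effect size (the mean difference divided by the standard deviation of the differences, often written d_z). The data and the `paired_change` helper are hypothetical; with small samples like this, the numbers are indicative at best.

```python
from statistics import mean, stdev

# Made-up paired data: each member's average Collective Efficacy Scale rating
# (before, after) the leader's program. Real item wording and scoring come
# from the scale's published sources.
PAIRED = {
    "member 1": (3.0, 3.5),
    "member 2": (2.5, 3.5),
    "member 3": (3.5, 3.5),
    "member 4": (2.0, 3.0),
}


def paired_change(pairs):
    """Summarize within-person change from before to after (needs 2+ pairs)."""
    diffs = [after - before for before, after in pairs.values()]
    spread = stdev(diffs)
    return {
        "n": len(diffs),
        "mean_change": mean(diffs),
        "improved": sum(d > 0 for d in diffs),
        # Cohen's d_z: mean difference / SD of differences (undefined if SD is 0)
        "d_z": mean(diffs) / spread if spread else float("nan"),
    }


print(paired_change(PAIRED))
```

If the same people cannot be tracked across waves, comparing group averages is the fallback, at the cost of not knowing whether change reflects the same members or a shifting respondent pool.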
Use Ripple Effects Mapping to reflect upon and visually map the intended and unintended changes produced by a complex program or collaboration.
Ripple Effects Mapping (REM) is a participatory qualitative evaluation method in which stakeholders collectively and visually map the interconnected outcomes resulting from a program or intervention. In an REM process, participants come together to reflect on the ways in which a program has affected their lives and the lives of others. Individual impacts are captured using mind-mapping software or on a large white board, and the group works together to identify and map connections among these impacts to generate a collective representation of the ripple effects of the program or intervention. For more information and a practical guide for using REM, see Washington State University-Extension’s “Ripple Effects Mapping for Evaluation.”
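The map produced in an REM session is essentially a network of impacts, so it can be stored and summarized as a graph. This Python sketch assumes a hypothetical map in which each impact points to the further impacts participants said it led to, and it groups impacts into "rings" by how many steps they sit from the program. The impacts, the set of unintended changes, and the `ripple_rings` helper are illustrative only.

```python
from collections import deque

# Hypothetical ripple map from an REM session: each impact points to the
# further impacts participants said it led to.
RIPPLES = {
    "leadership program": ["leader starts a mentoring circle", "leader reorganizes volunteer board"],
    "leader starts a mentoring circle": ["two mentees launch a food drive"],
    "two mentees launch a food drive": ["city council co-sponsors the drive"],
    "leader reorganizes volunteer board": ["board recruits younger members"],
}

# Impacts participants flagged as unintended (also hypothetical).
UNINTENDED = {"city council co-sponsors the drive"}


def ripple_rings(graph, root):
    """Group impacts by how many steps they sit from the program (1 = direct)."""
    rings, seen = {}, {root}
    queue = deque([(root, 0)])
    while queue:  # breadth-first, so each impact gets its shortest distance
        node, depth = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                rings.setdefault(depth + 1, []).append(nxt)
                queue.append((nxt, depth + 1))
    return rings


for ring, impacts in sorted(ripple_rings(RIPPLES, "leadership program").items()):
    for impact in impacts:
        tag = " (unintended)" if impact in UNINTENDED else ""
        print(f"ring {ring}: {impact}{tag}")
```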
More details and more tools are included in the section titled “15 Evaluation Tools to Consider” near the end of the report.
Once you have identified one or two evaluation tools of interest, you can test them out on your outcomes. Do this by reflecting on the following questions:
Note that each of these methods also introduces a frame you can apply to your evaluation even if you aren’t using the full method. For example, you can learn about the types of questions you might ask to capture emergent outcomes by becoming familiar with outcome harvesting, even if you don’t plan to do the full verification process that is part of that technique.
Evaluation experts offer various guidelines for selecting appropriate evaluation tools. No tool is perfect, comprehensive, or appropriate for all contexts, so deciding upfront what is most important about the tool can help. Some of the key criteria to consider in choosing evaluation tools include:
Note that these criteria must be balanced against one another. Your tools need to meet all of them to some degree, but you will also have to decide how far to push each one. For example, the most timely data collection may not be the most accurate or represent the most perspectives, but you don’t want to push so far toward accuracy and representativeness that the data doesn’t arrive until it’s too late to use.
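One way to make this balancing explicit is a weighted score with a minimum threshold on every criterion. The Python sketch below uses three of the criteria discussed above (timeliness, accuracy, and representativeness). The candidate tools, ratings, weights, and floor are hypothetical judgment calls a planning team would set for itself, and `rank_tools` is an illustrative helper.

```python
# Hypothetical 1-5 ratings for three candidate tools.
CANDIDATES = {
    "pulse survey": {"timeliness": 5, "accuracy": 2, "representativeness": 3},
    "community story circle": {"timeliness": 3, "accuracy": 4, "representativeness": 4},
    "annual external study": {"timeliness": 1, "accuracy": 5, "representativeness": 4},
}
WEIGHTS = {"timeliness": 0.4, "accuracy": 0.35, "representativeness": 0.25}  # team's priorities
FLOOR = 2  # every criterion must be met at least "to a certain degree"


def rank_tools(candidates, weights, floor):
    """Drop tools below the floor on any criterion, then rank the rest by weighted score."""
    scores = {
        name: sum(weights[criterion] * rating for criterion, rating in ratings.items())
        for name, ratings in candidates.items()
        if min(ratings.values()) >= floor
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


for name, score in rank_tools(CANDIDATES, WEIGHTS, FLOOR):
    print(f"{name}: {score:.2f}")
```

Here the highly accurate study is excluded because its data would arrive too late, which mirrors the trade-off described above. Adjusting the weights or the floor and rerunning the ranking is one way to surface disagreements about priorities within a planning team.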
This is perhaps the simplest and yet most overlooked step in the program evaluation process. The learning generated through evaluation processes should be fed back into the program design and process to ensure a continuous cycle of program improvement. There’s no “magic bullet” methodology for this; it’s mostly a matter of setting aside time for a planning team to reflect on the evaluation and identify ways to refine the program and improve its effectiveness. This should be an ongoing cycle in which the program design (outcomes and change pathways) and the learning and evaluation methods are regularly interrogated and refined so that the overall intervention keeps improving.
We hope this report has stimulated your thinking, and your practices, around evaluating your own leadership program.
As your evaluation practices evolve, you may wish to deepen your learning about evaluation methods. In the addenda you’ll find a short list of resources to consider. You may also wish to partner with other organizations and pool resources or explore joint evaluation efforts. Could you design or use an instrument together? Work with a consultant together? Share data you have gathered and the learning it has offered?
Ultimately, evaluation is a core part of learning and improvement. Good evaluation gives us data to reflect on, helps us imagine better ways to make an impact, and guides our ongoing efforts at improvement.
Donors and funders have a significant influence on the evaluation practices of nonprofits.
In most cases, this influence is very positive. The questions embedded in a grant application and the inquiries of a program officer about evaluation practices force program designers to reflect on the outcomes they hope to achieve and articulate how they hope to measure progress toward those outcomes. But sometimes a funder’s requests for evaluation lead to a culture in which program directors feel pressure to show how a predetermined target was achieved. When a program is situated in a complex system, such a linear approach to evaluation can shut down important learning and reflection that could lead to breakthroughs later on. We encourage funders and nonprofit staff to engage in open conversations about approaches to evaluation that invite curiosity and excitement rather than compliance-driven fear. By committing to outcomes, independent of the success of any particular programmatic approach, and by encouraging honesty, collaboration, and innovation, funders can shape a larger field of practice. Some funders we work with take a learning-focused approach and use language like: “We care more about making an impact than the particular success or failure of your program. We want to learn what you are learning and find out how to make a difference. If this program doesn’t work well, we know we will all learn a lot from the experience, and then we can try something else together.”
This report was produced with the support of a Wexner Graduate Fellowship Alumni Collaboration Grant.
The authors express gratitude to the Wexner Foundation for their investment in leadership development and for the financial support that made this initiative possible.
Authors of this report:
We thank the almost two dozen staff members who oversee leadership development programs and who attended the April 2015 seminar that led to this report:
The American Evaluation Association’s mission is to improve evaluation practices and methods, increase evaluation use, promote evaluation as a profession, and support the contribution of evaluation to the generation of theory and knowledge about effective human action.
Our aim is to push philanthropic and nonprofit evaluation practice in new directions and into new arenas. We specialize in areas that are challenging to assess, such as advocacy and systems change.
This page lists free resources for Program Evaluation and Social Research Methods. The focus is on “how-to” do program evaluation and social research: surveys, focus groups, sampling, interviews, and other methods. Most of these links are to resources that can be read over the web.
Spark Policy Institute partners with stakeholders throughout the country to develop innovative, research-based solutions to complex societal problems. Spark combines community and stakeholder-driven research with practical, hands-on experience and best practices, allowing for solutions that bridge sectors, issues, beliefs, and values. By integrating diverse policy systems, the team at Spark identifies and develops the best solutions for all stakeholders.
Auburn Seminary’s research team also consults with institutions on evaluation projects. Contact Auburn’s VP Research, Christian Scharen, to learn more.