Positive Changes. Volume 2, Issue 2 (2022)
Evaluating Impact with All Rigor Possible. Applicability of Mathematical Methods in Measuring Social Impact (Exemplified by the Health Insurance Subsidy Program)
Elena Avramenko
DOI 10.55140/2782-5817-2022-2-2-68-81
Increasing incomes, improving the education system, and reducing morbidity are important areas of impact investment. Whether these changes will actually be achieved is the key question for an investor deciding which social technology or project to invest in. However, the leaders of social projects and programs often focus on measuring immediate outputs rather than on assessing whether the projects and programs have had the expected impact. In this article, we highlight the experience of evaluating the impact of the Health Insurance Subsidy Program (HISP) and describe approaches that can be used to address this and similar problems.
Elena Avramenko
Expert on the project "Development of a Social and Economic Impact Assessment Model for NGOs" of the GLADWAY Foundation; Lean Six Sigma Green Belt
HEALTH INSURANCE FOR THE POOR
The Health Insurance Subsidy Program (HISP) is a program implemented in Kenya that finances the purchase of health insurance for low-income households in rural areas. The insurance covers the costs associated with medical care and medications. The objective of HISP is to reduce out-of-pocket health expenditures of poor families and ultimately to improve health outcomes.
HISP was originally launched in pilot mode. The plans for gradual expansion of the program depended on the results of the pilot stage. As part of the pilot run, the plan was to reduce the average yearly per-capita health expenditures of poor rural households by at least USD 10 below what they would have spent in the absence of the program, and this target was to be reached within two years.
During the initial pilot phase, HISP was introduced in 100 rural villages. Of the 4,959 households in the baseline sample, a total of 2,907 were enrolled in HISP, and the program operated successfully through its pilot stage over the next two years. All health clinics and pharmacies serving the 100 villages accepted patients under the insurance program, and surveys showed that most enrolled households were happy with the program. Data was collected before the start of the pilot run and at the end of the two-year period, using the same sample of 4,959 households.
PROOF OF IMPACT
Has HISP affected out-of-pocket health expenditures of poor rural households? Yes, it has, and the effect has been demonstrated statistically. The approach taken to the HISP impact evaluation was to select the most rigorous method that the specifics of the project allowed.
The HISP case study offers the following "menu" of impact evaluation methods:
• randomized assignment;
• instrumental variables;
• regression discontinuity design;
• difference-in-differences;
• matching.
All of these approaches aim at identifying valid comparison groups so that the true impact of the program on out-of-pocket health care expenditures of poor households can be evaluated.
We start from the point at which the evaluation indicators have been selected and elaborated in detail, the data collection plan is ready, and the data has been collected properly.
We will review the evaluation methodology selected for this case by introducing the concept of the counterfactual (that is, what the outcome for participants would have been in the absence of the program). Then, within the framework of this article, we will give an overview of the most rigorous evaluation method considered for HISP and tested on this program.
Two concepts are integral to making accurate and reliable impact evaluations: causation and the counterfactual.
First of all, questions of social impact are questions of causation. For example:
Does teacher training improve student test scores? Do additional funding programs for health facilities result in better health outcomes for children? Do vocational training programs increase a trainee's income?
Finding answers to these questions can be difficult. For example, in the context of a vocational training program, simply observing how a trainee’s income increases after completing such a program is not sufficient to establish a causal relationship. A trainee’s income could have increased even if he or she had not been trained – all through the trainee’s own efforts, due to changing conditions in the labor market, or due to many other factors that could affect income.
The challenge is to find a method that will allow us to establish a causal relationship. We must empirically determine to what extent a particular program – and that program alone – contributed to the change in outcome. The methodology must exclude all external factors.
The basic question of impact evaluation, namely what is the impact or causal effect of the program (P) on the outcome of interest (Y), is answered by the basic impact evaluation formula:
Δ = (Y | P = 1) − (Y | P = 0).
This formula states that the causal effect (Δ) of the program (P) on the outcome (Y) is the difference between the outcome (Y) with the program (in other words, when P = 1) and the same outcome (Y) without the program (i.e. when P = 0).
For instance, if P denotes a training program and Y denotes income, then the causal effect of the training program (Δ) is the difference between a trainee's income (Y) after participating in the training program (when P = 1) and the income of the same person at the same point in time had he or she not participated in the program (when P = 0).
If this were possible, we would observe how much income the same person would have at the same point in time both with and without the program, so that the program would be the only possible explanation for any difference in that person’s income. By comparing the same person with himself or herself at the same time, we would be able to exclude any external factors that could also explain the difference in outcomes.
But unfortunately, measuring two versions of the same unit at the same time is impossible: at a particular point in time, the person either participated or did not participate in the program.
This is called the counterfactual problem: how can we measure what would have happened if other circumstances had prevailed?
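The counterfactual problem can be made concrete with a small sketch. Everything in it is illustrative: the `Person` class, its field names and the income figures are hypothetical and are not data from any program. In a simulation both potential outcomes can be written down, so Δ can be computed directly; an evaluator, however, only ever sees one of the two.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Person:
    """One unit with both potential outcomes (hypothetical figures)."""
    y_with: float       # income if the person participates (P = 1)
    y_without: float    # income if the person does not participate (P = 0)
    participates: bool  # what actually happened


def true_effect(p: Person) -> float:
    # Delta = (Y | P = 1) - (Y | P = 0); only computable in a simulation
    return p.y_with - p.y_without


def observed(p: Person) -> Tuple[Optional[float], Optional[float]]:
    # What an evaluator can actually record: one outcome is always missing
    if p.participates:
        return (p.y_with, None)
    return (None, p.y_without)


trainee = Person(y_with=30_000.0, y_without=25_000.0, participates=True)
print(true_effect(trainee))  # 5000.0
print(observed(trainee))     # (30000.0, None)
```

The `None` in the observed pair is the missing counterfactual that every evaluation method discussed here tries to estimate.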
COMPARISON AND TREATMENT GROUPS
In practice, the task of impact evaluation is to identify a comparison group and a treatment group that are similar in their parameters, but one of them participates in the program and the other does not. That way any difference in results must be due to the program.
The treatment and comparison groups should be the same in at least three respects:
1. The baseline properties of the groups should be identical. For example, the mean age of the treatment group should be the same as that of the comparison group.
2. The program factor should not affect the comparison group directly or indirectly.
3. The results in the comparison group should change in the same way as the results in the treatment group if both groups were (or were not) enrolled in the program. That is, groups should respond to the program in the same way. For example, if the income in the treatment group increased by RUB 5,000 due to the training program, then the income in the comparison group would also increase by RUB 5,000 if they received the training.
When the above three conditions are met, then only the existence of the program under study will account for any differences in the outcome (Y) between the two groups.
Instead of considering the impact solely for one person, it is more realistic to consider the average impact for a group of people (Figure 1).
It is important to consider what happens if we decide to proceed with the evaluation without finding a valid comparison group. We run the risk of making inaccurate judgments about program outcomes, because the counterfactual itself is estimated incorrectly.
Such a risk exists when using the following approaches:
• Before-and-after comparisons (also known as reflexive comparisons): comparing the outcomes of the same group prior to and subsequent to the introduction of a program.
• With-and-without comparisons: comparing the outcomes in the group that chose to enroll with the results of the group that chose not to enroll.
A before-and-after comparison attempts to establish the impact of the program by tracking changes in outcomes for program participants over time. In essence, this comparison assumes that if the program had never existed, the outcome (Y) for program participants would have been exactly the same as their situation before the program. Unfortunately, in the vast majority of cases that assumption simply does not hold.
Consider, for example, the evaluation of a microfinance program for rural farmers. The program provides farmers with microloans to help them buy fertilizer to increase rice production. You observe that in the year before the start of the program, farmers harvested an average of 1,000 kilograms (kg) of rice per hectare (point B in Figure 2).
The microfinance scheme is launched, and a year later rice yields have increased to 1,100 kg per hectare (point A in Figure 2). If you try to evaluate impact using a before-and-after comparison, you have to use the pre-intervention outcome as the counterfactual. Applying the basic impact evaluation formula, you would conclude that the scheme had increased rice yields by 100 kg per hectare (A − B).
However, imagine that rainfall was normal during the year before the scheme was launched, but a drought occurred in the year the program started. Because of the drought, the average yield without the microloan scheme would have been lower than B: say, at level D. In that case, contrary to what the before-and-after comparison assumes, the true impact of the program would have been A − D, which is larger than 100 kg.
Rainfall is only one of many external factors that could have influenced the outcome of interest (rice yield) over time. Similarly, many of the outcomes that development programs aim to improve, such as income, productivity, health or education, are affected by multiple factors over time. For this reason, the pre-intervention outcome is almost never a good estimate of the counterfactual.
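The arithmetic of the rice example can be spelled out in a few lines. The values of A and B come from the example above; the value of D is not given in the text, so the figure used here is a purely hypothetical drought-year yield chosen for illustration.

```python
# A and B are taken from the rice example; D is a hypothetical value
# standing in for the yield farmers would have had without the program
# in a drought year.
A = 1100  # kg per hectare observed after the program (point A)
B = 1000  # kg per hectare observed before the program (point B)
D = 900   # kg per hectare without the program, drought year (assumed)

before_after_estimate = A - B  # treats B as if it were the counterfactual
true_impact = A - D            # uses the actual counterfactual D
bias = before_after_estimate - true_impact

print(before_after_estimate, true_impact, bias)  # 100 200 -100
```

With this assumed D, the before-and-after comparison understates the impact by 100 kg per hectare. Had the external factor been favorable rather than adverse, the same comparison would have overstated it.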
Comparing those who chose to enroll with those who chose not to enroll ("with-and-without") is another risky approach to impact evaluation. A comparison group made up of people who themselves decided not to join the program provides another "counterfeit" estimate of the counterfactual. Selection occurs when participation in the program depends on the preferences or decisions of each participant. Those preferences are a separate factor that may itself influence the outcome, so under such conditions those who enrolled cannot be treated as comparable to those who did not.
The consultants evaluating the HISP pilot made both of these mistakes in their first attempts to estimate the counterfactual. The program organizers, realizing the risk of bias, decided to look for methods that would give a more accurate evaluation.
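A small simulation shows how self-selection distorts a with-and-without comparison. It is a sketch under stated assumptions, not a model of any real program: the unobserved trait `motivation`, the outcome equation and the true effect of 10 units are all invented for illustration.

```python
import random
from statistics import mean

TRUE_EFFECT = 10.0  # assumed program effect, identical for everyone


def simulate_self_selection(n: int = 10_000, seed: int = 0) -> float:
    """Return the naive with-and-without estimate under self-selection."""
    rng = random.Random(seed)
    enrolled, not_enrolled = [], []
    for _ in range(n):
        motivation = rng.random()  # unobserved trait, uniform on [0, 1)
        # Outcome without the program depends on the unobserved trait
        y_without = 100.0 + 50.0 * motivation + rng.gauss(0.0, 5.0)
        if motivation > 0.5:
            # More motivated people choose to enroll (self-selection)
            enrolled.append(y_without + TRUE_EFFECT)
        else:
            not_enrolled.append(y_without)
    return mean(enrolled) - mean(not_enrolled)


naive_estimate = simulate_self_selection()
print(round(naive_estimate, 1), "vs true effect", TRUE_EFFECT)
```

In this setup the naive estimate comes out far above the true effect, because the enrolled group would have had better outcomes even without the program.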
RANDOMIZED ASSIGNMENT METHOD
This method is similar to running a lottery that decides who is enrolled in the program at a given time and who is not. It is also known as a randomized controlled trial (RCT). Not only does it give the project team fair and transparent rules for assigning limited resources among equally eligible units, but it also provides a reliable method for evaluating program impact.
Randomized assignment works best when applied to a large population of eligible units. By leaving to chance the decision of who gets access to the program and who does not, we also obtain the basis for a reliable estimate of the counterfactual.
In a randomized assignment, each eligible unit (e.g., individual, household, business, school, hospital, or community) has the same probability of being selected for the program. When there is excess demand for the program, randomized assignment is considered transparent and fair for all participants in the process.
Insert 1 provides examples of the use of randomized assignment in practice.
Insert 1: RANDOMIZED CONTROLLED TRIALS AS A VALUABLE OPERATIONAL TOOL
Randomized assignment can be a useful rule for assigning program benefits, even outside the context of an impact evaluation. The following two cases from Africa illustrate how.
In Côte d'Ivoire, following a period of crisis, the government introduced a temporary employment program that was initially targeted at former combatants and later expanded to youth more generally. The program provided youth with short-term employment opportunities, mostly to clean or rehabilitate roads through the National Roads Agency. Young people in participating municipalities were invited to register. Given the attractiveness of the benefits, many more candidates applied than there were places available. In order to come up with a transparent and fair way of allocating the benefits among applicants, program implementers put in place a public lottery process. Once registration had closed and the number of applicants (say, N) in a location was known, a public lottery was organized. All candidates were invited to a public location, and small pieces of paper with numbers from 1 to N were put in a box. Applicants were then called one by one to draw a number from the box in front of all other candidates. Once the number was drawn, it was read aloud. After all applicants had been called, someone would check the remaining numbers in the box one by one to ensure that they matched the applicants who had not turned up for the draw. If fewer spots than applicants were available (say, M), the M applicants who had drawn the lowest numbers were selected for the program. The draw was organized separately for men and women. The public lottery process was well accepted by participants, and helped provide an image of fairness and transparency to the program in a post-conflict environment marked by social tensions. After several years of operations, researchers used this allocation rule, already integrated in the program, to conduct an impact evaluation.
In Niger, the government started to roll out a national safety net project in 2011 with support from the World Bank. Niger is one of the poorest countries in the world, and the population of poor households eligible for the program greatly exceeded the available benefits during the first years of operation. Program implementers relied on geographical targeting to identify the departments and communes where the cash transfer program would be implemented first. This was feasible, as data was available to identify the relative poverty or vulnerability status of the various departments and communes. Within communes, however, there was little basis for choosing among villages using objective criteria. For the first phase of the project, program implementers decided to use public lotteries to select beneficiary villages within targeted communes. This decision was made in part because the available data to prioritize villages objectively was limited, and in part because an impact evaluation was being embedded in the project. For the public lotteries, all the village chiefs were invited to the municipal center, and the names of their villages were written on pieces of paper and put in a box. A child would then randomly draw beneficiary villages from the box until the quotas were filled. The procedure was undertaken separately for sedentary and nomadic villages to ensure representation of each group. After villages were selected, a separate household-level targeting mechanism was implemented to identify the poorest households, which were later enrolled as beneficiaries. The transparency and fairness of the public lottery was greatly appreciated by the village authorities, as well as by program implementers, so much so that the public lottery process continued to be used in the second and third cycles of the project to select over 1,000 villages throughout the country.
Even though public lottery was not necessary for an impact evaluation at that point, its value as a transparent, fair, and widely accepted operational tool to allocate benefits among equally deserving populations justified its continued use in the eyes of program implementers and local authorities.
Sources: Bertrand, Marianne, Bruno Crépon, Alicia Marguerie, and Patrick Premand. Impacts à Court et Moyen Terme sur les Jeunes des Travaux à Haute Intensité de Main d’oeuvre (THIMO): Résultats de l’évaluation d’impact de la composante THIMO du Projet Emploi Jeunes et Développement des compétence (PEJEDEC) en Côte d’Ivoire. Washington, DC: Banque Mondiale et Abidjan, BCP-Emploi. 2016.
Premand, Patrick, Oumar Barry, and Marc Smitz. "Transferts monétaires, valeur ajoutée de mesures d’accompagnement comportemental, et développement de la petite enfance au Niger. Rapport descriptif de l’évaluation d’impact à court terme du Projet Filets Sociaux." Washington, DC: Banque Mondiale. 2016.
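The public lottery described in Insert 1 for Côte d'Ivoire can be mimicked in a few lines. This is a minimal sketch, not the procedure's official implementation: the function name `public_lottery`, the `spots` parameter and the applicant labels are hypothetical, and a seeded pseudo-random shuffle stands in for drawing paper slips from a box.

```python
import random


def public_lottery(applicants, spots, seed=None):
    """Each applicant draws a distinct number from 1..N; lowest numbers win.

    Assumes applicant identifiers are unique.
    """
    rng = random.Random(seed)
    numbers = list(range(1, len(applicants) + 1))
    rng.shuffle(numbers)                     # the box of numbered slips
    draws = dict(zip(applicants, numbers))   # applicant -> number drawn
    selected = sorted(applicants, key=draws.get)[:spots]
    return draws, selected


applicants = [f"applicant_{i}" for i in range(1, 11)]  # hypothetical names
draws, selected = public_lottery(applicants, spots=3, seed=1)
print(selected)
```

Because every applicant is equally likely to draw any number, each one has the same chance of ending up among the selected, which is what makes the rule usable later for an impact evaluation.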
WHY DOES THE RANDOMIZED ASSIGNMENT METHOD WORK WELL?
As discussed, the ideal comparison group should be as similar as possible to the treatment group in all respects, except with respect to its participation in the program that is being evaluated. When we randomly assign units to treatment and comparison groups, that randomized assignment process in itself will produce two groups with a high probability of being statistically identical – as long as the number of potential units to which we apply the randomized assignment process is sufficiently large.
Figure 3 illustrates why randomized assignment produces a comparison group that is statistically equivalent to the treatment group.
To estimate the impact of a program using randomized assignment, we simply take the difference between the outcome under treatment (the mean outcome of the randomly assigned treatment group) and our estimate of the counterfactual (the mean outcome of the randomly assigned comparison group). We can be confident that our estimated impact constitutes the true impact of the program, since we have eliminated all observed and unobserved factors that might otherwise plausibly explain the difference in outcomes.
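The following simulation illustrates the claim on invented data. It is a sketch under stated assumptions: the unobserved trait `motivation`, the outcome equation and the true effect of 10 units are hypothetical. Units are assigned by a coin flip that is unrelated to their characteristics, and the code then checks two things: that the unobserved trait is balanced across the groups, and that the difference in mean outcomes is close to the true effect.

```python
import random
from statistics import mean


def simulate_randomized(n=10_000, true_effect=10.0, seed=0):
    """Return (balance gap in the unobserved trait, estimated impact)."""
    rng = random.Random(seed)
    treat_y, comp_y, treat_m, comp_m = [], [], [], []
    for _ in range(n):
        motivation = rng.random()  # unobserved trait, uniform on [0, 1)
        y_without = 100.0 + 50.0 * motivation + rng.gauss(0.0, 5.0)
        if rng.random() < 0.5:     # coin flip, unrelated to motivation
            treat_y.append(y_without + true_effect)
            treat_m.append(motivation)
        else:
            comp_y.append(y_without)
            comp_m.append(motivation)
    balance_gap = mean(treat_m) - mean(comp_m)
    estimate = mean(treat_y) - mean(comp_y)
    return balance_gap, estimate


balance_gap, estimate = simulate_randomized()
print(round(balance_gap, 3), round(estimate, 2))
```

With 10,000 units the gap in the unobserved trait is close to zero and the estimate is close to 10. With only a handful of units neither would be guaranteed, which is why the number of units being randomized matters.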
Inserts 2 and 3 give real-world applications of randomized assignment to evaluate the impact of a number of different interventions around the world.
Insert 2: RANDOMIZED ASSIGNMENT AS A PROGRAM ALLOCATION RULE: CONDITIONAL CASH TRANSFERS AND EDUCATION IN MEXICO
The Progresa program, now called "Prospera," provides cash transfers to poor mothers in rural Mexico conditional on their children's enrollment in school and regular health checkups. The cash transfers, for children in grades 3 through 9, amount to about 50 percent to 75 percent of the private cost of schooling and are guaranteed for three years. The communities and households eligible for the program were determined based on a poverty index created from census data and baseline data collection. Because of a need to phase in the large-scale social program, about two-thirds of the localities (314 out of 495) were randomly selected to receive the program in the first two years, and the remaining 181 served as a comparison group before entering the program in the third year. Based on the randomized assignment, Schultz (2004) found an average increase in enrollment of 3.4 percent for all students in grades 1–8, with the largest increase among girls who had completed grade 6, at 14.8 percent. The likely reason is that girls tend to drop out of school at greater rates as they get older, so they were given a slightly larger transfer to stay in school past the primary grade levels. These short-term impacts were then extrapolated to predict the longer-term impact of the Progresa program on schooling lifetime and earnings.
Source: Schultz, Paul. "School Subsidies for the Poor: Evaluating the Mexican Progresa Poverty Program." Journal of Development Economics 74 (1): 199–250. 2004
Insert 3: RANDOMIZED ASSIGNMENT OF SPRING WATER PROTECTION TO IMPROVE HEALTH IN KENYA
The link between water quality and health impacts in developing countries has been well documented. However, the health value of improving infrastructure around water sources is less evident.
Kremer et al. (2011) measured the effects of a program providing spring protection technology to improve water quality in Kenya, with random assignment of springs to receive the treatment.
Approximately 43 percent of households in rural Western Kenya obtain drinking water from natural springs. Spring protection technology seals off the source of a water spring to reduce contamination.
Starting in 2005, the NGO International Child Support (ICS) implemented a spring protection program in two districts in western Kenya. Because of financial and administrative constraints, ICS decided to phase in the program over four years. This allowed evaluators to use springs that had not received the treatment yet as the comparison group.
From the 200 eligible springs, 100 were randomly selected to receive the treatment in the first two years. The study found that spring protection reduced fecal water contamination by 66 percent and child diarrhea among users of the springs by 25 percent.
Source: Kremer, Michael, Jessica Leino, Edward Miguel, and Alix Peterson Zwane. "Spring Cleaning: Rural Water Impacts, Valuation, and Property Rights Institutions." Quarterly Journal of Economics 126: 145–205. 2011
WHEN CAN RANDOMIZED ASSIGNMENT BE USED?
Randomized assignment can be used in either of the following two scenarios:
1. When the eligible population is greater than the number of program spaces available. When the demand for a program exceeds the supply, a lottery can be used to select the treatment group within the eligible population. The group that wins the "lottery" is the treatment group, and the rest of the population that is not offered the program becomes the comparison group. As long as a constraint exists that prevents scaling the program up to the entire population, the comparison groups can be maintained to measure the short-term, intermediate, and long-term impacts of the program.
2. When a program needs to be gradually phased in until it covers the entire eligible population. When a program is phased in, randomization of the order in which participants receive the program gives each eligible unit the same chance of receiving treatment in the first phase or in a later phase of the program. Until the last group joins the program, it serves as a valid comparison group from which the counterfactual for the groups that have already been phased in can be estimated. This setup also allows evaluating the effects of differential exposure to treatment: that is, the effect of receiving the program for a longer or shorter time.
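The second scenario, a randomized phase-in, can be sketched as follows. The function names `phase_in_schedule` and `groups_at`, the village labels and the choice of three phases are hypothetical. The idea is only that the order of entry is decided by chance, and that units still waiting at a given phase serve as the comparison group for those already covered.

```python
import random


def phase_in_schedule(units, n_phases, seed=None):
    """Randomly order the units and deal them into phases 1..n_phases."""
    rng = random.Random(seed)
    order = list(units)
    rng.shuffle(order)
    # Round-robin dealing keeps phase sizes within one unit of each other
    return {unit: 1 + i % n_phases for i, unit in enumerate(order)}


def groups_at(schedule, current_phase):
    """Split units into those already treated and those still waiting."""
    treated = [u for u, phase in schedule.items() if phase <= current_phase]
    waiting = [u for u, phase in schedule.items() if phase > current_phase]
    return treated, waiting


villages = [f"village_{i}" for i in range(12)]  # hypothetical units
schedule = phase_in_schedule(villages, n_phases=3, seed=7)
treated, waiting = groups_at(schedule, current_phase=1)
print(len(treated), len(waiting))  # 4 8
```

Once the last phase has entered the program, the waiting list is empty and no comparison group remains, which is why the evaluation has to be completed before full coverage.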
STEPS IN RANDOMIZED ASSIGNMENT
Step 1 – Define the units eligible for the program. Remember that depending on the particular program, a unit could be a person, a health center, a school, a business, or even an entire village or municipality.
Step 2 – Select a sample of units from the population to be included in the evaluation sample.
This second step is done mainly to limit data collection costs. If it is found that data from existing monitoring systems can be used for the evaluation, and that those systems cover the full population of eligible units, then a separate evaluation sample may not be needed.
Step 3 – Form the treatment and comparison groups from the units in the evaluation sample through randomized assignment.
Figure 4 shows the main steps of successfully implementing the randomized assignment method.
Once the above steps are completed, what remains is relatively simple. Once the program has run for some time, outcomes for both the treatment and comparison units will need to be measured. The impact of the program is simply the difference between the average outcome (Y) for the treatment group and the average outcome (Y) for the comparison group.
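The three steps and the final comparison can be sketched end to end. This is an illustration under stated assumptions rather than a prescribed procedure: the function names, the population of 1,000 units, the sample of 200 and the equal split into two groups are all hypothetical choices, and the outcomes are invented so that the expected answer is known in advance.

```python
import random
from statistics import mean


def randomized_assignment(eligible_units, sample_size, seed=None):
    """Steps 2 and 3: draw an evaluation sample, then split it by chance."""
    rng = random.Random(seed)
    # random.sample returns the units in random selection order, so any
    # slice of the result is itself a valid random sample
    evaluation_sample = rng.sample(list(eligible_units), sample_size)
    half = sample_size // 2
    return evaluation_sample[:half], evaluation_sample[half:]


def estimate_impact(outcomes, treatment, comparison):
    """Difference between the mean outcome (Y) of the two groups."""
    return mean(outcomes[u] for u in treatment) - mean(
        outcomes[u] for u in comparison
    )


eligible = range(1000)  # step 1: hypothetical eligible units
treatment, comparison = randomized_assignment(eligible, 200, seed=42)

# Invented outcomes: everyone spends 50, treated units spend 10 less
outcomes = {u: 50.0 for u in eligible}
for u in treatment:
    outcomes[u] -= 10.0

print(estimate_impact(outcomes, treatment, comparison))  # -10.0
```

In real data the outcomes vary from unit to unit, so the difference in means is an estimate with sampling error rather than an exact recovery of the effect.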
Randomized assignment is the most reliable method of estimating the counterfactual and is widely regarded as the gold standard in the field of impact evaluation.
ESTIMATING THE IMPACT OF HISP: RANDOMIZED ASSIGNMENT
Let us now turn back to the evaluation of the HISP pilot, which involved 100 treatment villages.
Having conducted two impact evaluations using potentially biased counterfactuals (as mentioned above), the project team decided to obtain a more precise estimate using randomized assignment. It was determined that building a valid estimate of the counterfactual would require identifying a group of villages identical to the 100 treatment villages in all respects. Since the 100 treatment villages had been selected for HISP randomly from among all of the rural villages in the country, they had the same characteristics as the general population of rural villages. All that remained was to collect data on a comparison group and evaluate the difference between the two groups. Data was therefore collected on another 100 villages that had been left out of the program.
Table 2 shows the average health expenditures of households in the comparison and treatment groups, measured in the same way. The pre-intervention average health expenditures of households in the two groups do not differ statistically, which is what is expected with randomized assignment. The analysis showed that the intervention reduced household health expenditures by USD 10.14 over two years.
Note: Significance level: ** = 1 percent.
Source: Paul J. Gertler, Sebastian Martinez, Patrick Premand, Laura B. Rawlings, and Christel M. J. Vermeersch. Impact Evaluation in Practice Second Edition. – International Bank for Reconstruction and Development / The World Bank, 2016
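A comparison of this kind boils down to a difference in group means together with a check that the difference is unlikely to be due to chance. The article does not show how the significance level was computed, so the sketch below rests on an assumption: it uses Welch's two-sample t statistic, one common choice for comparing two group means. The numbers are toy values, not HISP data.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(treatment, comparison):
    """Return (difference in means, Welch t statistic) for two samples."""
    diff = mean(treatment) - mean(comparison)
    # Standard error built from the two sample variances
    se = sqrt(
        variance(treatment) / len(treatment)
        + variance(comparison) / len(comparison)
    )
    return diff, diff / se


# Toy expenditure figures, invented purely to exercise the function
toy_treatment = [1.0, 2.0, 3.0]
toy_comparison = [4.0, 5.0, 6.0]

diff, t_stat = welch_t(toy_treatment, toy_comparison)
print(diff, round(t_stat, 3))  # -3.0 -3.674
```

In large samples, an absolute t value above roughly 2.58 corresponds to significance at the 1 percent level. With samples as small as the toy ones here, the threshold is higher and depends on the degrees of freedom.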
In summary, rigorous evaluation methods, combined with the regular collection and monitoring of data about a project or program, are the main tools that the parties involved can use to verify and improve the effectiveness and efficiency of social projects and programs at various stages of implementation.
REFERENCES:
1. Paul J. Gertler, Sebastian Martinez, Patrick Premand, Laura B. Rawlings, and Christel M. J. Vermeersch. Impact Evaluation in Practice Second Edition. – International Bank for Reconstruction and Development / The World Bank, 2016
2. Imbens, Guido W., and Donald B. Rubin. Rubin Causal Model. In the New Palgrave Dictionary of Economics, second edition, edited by Steven N. Durlauf and Lawrence E. Blume. Palgrave, 2008.
3. Rubin, Donald B. Estimating Causal Effects of Treatments in Randomized and Nonrandomized Studies. Journal of Educational Psychology 66 (5): 688–701, 1974.
4. Bertrand, Marianne, Bruno Crépon, Alicia Marguerie, and Patrick Premand. Impacts à Court et Moyen Terme sur les Jeunes des Travaux à Haute Intensité de Main d’oeuvre (THIMO): Résultats de l’évaluation d’impact de la composante THIMO du Projet Emploi Jeunes et Développement des compétence (PEJEDEC) en Côte d’Ivoire. Washington, DC: Banque Mondiale et Abidjan, BCP-Emploi. 2016.
5. Blattman, Christopher, Nathan Fiala, and Sebastian Martinez. "Generating Skilled Self-Employment in Developing Countries: Experimental Evidence from Uganda." Quarterly Journal of Economics 129 (2): 697–752. 2014. doi: 10.1093/qje/qjt057.
6. Bruhn, Miriam, and David McKenzie. "In Pursuit of Balance: Randomization in Practice in Development Field Experiments." American Economic Journal: Applied Economics 1 (4): 200–232. 2009.
7. Dupas, Pascaline. "Do Teenagers Respond to HIV Risk Information? Evidence from a Field Experiment in Kenya." American Economic Journal: Applied Economics 3 (1): 1–34. 2011.
8. Glennerster, Rachel, and Kudzai Takavarasha. Running Randomized Evaluations: A Practical Guide. Princeton, NJ: Princeton University Press. 2013.
9. Kremer, Michael, Jessica Leino, Edward Miguel, and Alix Peterson Zwane. "Spring Cleaning: Rural Water Impacts, Valuation, and Property Rights Institutions." Quarterly Journal of Economics 126: 145–205. 2011.
10. Kremer, Michael, and Edward Miguel. "Worms: Identifying Impacts on Education and Health in the Presence of Treatment Externalities." Econometrica 72 (1): 159–217. 2004.
11. Premand, Patrick, Oumar Barry, and Marc Smitz. "Transferts monétaires, valeur ajoutée de mesures d’accompagnement comportemental, et développement de la petite enfance au Niger. Rapport descriptif de l’évaluation d’impact à court terme du Projet Filets Sociaux." Washington, DC: Banque Mondiale. 2016.
12. Schultz, Paul. "School Subsidies for the Poor: Evaluating the Mexican Progresa Poverty Program." Journal of Development Economics 74 (1): 199–250. 2004.