Chapters 5-9

Chapters 5 - 9

CHAPTER 5
USING THE SAR AND THE SSS

What Are the SAR and the SSS? ¹

The Subgrant Award Report (SAR) is the form you complete when subgrant awards are made, to inform VAWGO how states are allocating their funds and what the subgrant projects plan to accomplish. An SAR is completed for each new or continuation subgrant awarded with each year's funds. Reports on subgrants funded with FY 1995 funds were done in the fall of 1996, and reports on FY 1996 subgrants are due in fall 1997. Beginning with reports for FY 1996 funds, the SAR can be completed and submitted to VAWGO on the computer.

The Subgrant Statistical Summary (SSS) is the follow-up to the SAR, and provides VAWGO with information on any changes in each subgrant's time period, funding, goals, or activities. It also provides feedback on what the project has accomplished during the reporting period. The SSS is a very useful tool for performance monitoring of immediate results. However, it does not give any information about the ultimate impact of a project. More and different data would be needed to determine how a specific training activity changed officers' attitudes, knowledge, and practices; whether a particular policy has been put into practice and with what effects; how a victim service program has helped women improve their lives; and so on.

The SSS is completed once a year for subgrants which were in operation during the previous calendar year. The first round of SSS reports is due in fall 1997, and will provide information on all subgrants active at any time during calendar year 1996. These might be subgrants funded with FY 1995 funds, FY 1996 funds, or both. They may have started before 1996, they may end during 1996, or they may continue into 1997. You will report on all activities from the project's start date through the end of 1996 (or the project's end date, whichever comes first). The reports will be completed on paper this year but will be available electronically beginning next year.

Getting Information for the SAR and the SSS

Your first question might be, How do I get the information I need to complete these forms? Both forms ask for fairly fundamental information on the wide range of subgrant projects which might be funded, and many project directors should already use record keeping systems which capture this information. However, in some cases you may need to collect additional information. This section describes the basic types of information needed for these forms and where you are likely to find them.

The SAR may be completed by either the state administrator or the subgrantee. It asks for information which was very likely provided in the subgrant application materials. This includes fundamental descriptive information such as the amount and type of STOP funding and match; project start and end dates; geographical target area; type of violence against women to be addressed; type of subgrantee agency; and subgrant purpose areas. It also asks for information on other important concerns expressed in the STOP legislation, including efforts to address full faith and credit and underserved populations. If any information requested in the SAR was not provided in the application materials or other existing sources, state administrators and subgrantees should take the opportunity to discuss how the subgrantee will provide the information, and also what information should be included in future subgrant applications.

The new electronic version provided for the FY 1996 subgrants asks for much of the same information as the earlier paper version used for the FY 1995 subgrants, so state administrators are likely to have already established procedures for getting this information. The major difference between the forms over these two years is that where the 1995 form asked for narrative write-in answers, the 1996 form provides answers to be checked off, based on the most common types of answers received in the earlier year. This should make it easier and quicker for you to complete the new SAR.

The SSS will most likely be completed by the subgrantee and forwarded for review by the state administrator, unless the state agency has established procedures for subgrantees to report this type of information to the state, and for the state agency to maintain the data. The SSS is divided into nine parts (I through IX). Which parts and how many parts each subgrantee will complete depends on the project's purpose areas.

Part I of the SSS must be completed for all subgrants. It provides updates on some of the fundamental descriptive information which may have changed over the course of the project. Each of the next seven parts (Part II through Part VIII) relates to one of the seven VAWA purpose areas and asks for very basic information for each purpose area the subgrant addressed.

The number of parts you complete depends on how many purpose areas your subgrant addressed. Many subgrantees will only complete one part, as their subgrant addresses only one purpose area. So, if your subgrant was focused entirely on delivering direct victim services, you would complete only Part VI; if it focused exclusively on stalking, you would complete only Part VII.

However, some of you may have to complete more than one part if your subgrant covered two or more purpose areas (e.g., if your subgrant includes training, policy development, and data system development, you would complete Parts II, IV, and V).

Much of the information you will need to complete the SSS should be available from records routinely kept by projects, such as sign-in sheets for training sessions or logs of calls to a hotline. Other information may be obtained from knowledgeable project staff, such as what other training-related activities were done, how many people staff the special unit, what topics are addressed in the policies you developed, the type of database you have created, and so on.

Part IX is very important. Everyone who serves victims directly should complete Part IX. Part IX asks you to describe the demographic and background characteristics of the victims you served, and is information that VAWGO is required to collect by the VAWA. Subgrants that focus on or have components that focus on special units, victim services, data systems, stalking, or serving Indian populations will all need to complete Part IX if they provided any services directly to victims.

Using the SAR and SSS Information

Information from these sources is invaluable to VAWGO and the Urban Institute as they monitor, evaluate, and report to Congress on how the STOP program is working. But what can it do for you? If you are a state administrator, it can help you keep track of how funds are being distributed across the state, what types of projects are working particularly well, and useful directions for future funding. If you are a subgrantee, this information can help you monitor the implementation and accomplishments of your project, make improvements as needed, and document achievements to justify future funding.

State-Level Monitoring and Planning

SAR and SSS information can be used to identify statewide patterns in STOP funding; what needs are being addressed; and what needs are remaining unmet. For example, you may be interested in knowing how many subgrants or how much funding is devoted to sexual assault versus domestic violence versus stalking. You can aggregate the information from these forms to get total spending for each type of crime for a single year, or identify spending trends over time as the STOP program continues. You may also want to do the same analysis of which underserved populations are being targeted for services, or which areas of the state have not received funding. Knowing how the funding has been allocated according to whatever factors are most important in your state can be very useful in deciding future funding priorities and strategies for soliciting and receiving the proposals you are most interested in.

You can also gain very valuable information by comparing SARs and SSSs on a subgrant-by-subgrant basis. This comparison will let you answer questions about how subgrant goals or activities change over time, and to what extent the projects are achieving their objectives. Suppose, for example, you compare the SARs and the SSSs and you find that the prosecution training projects are having more difficulty staying on time or reaching potential trainees than the law enforcement training projects. This may lead you to look for differences between law enforcement and prosecution in their training projects (e.g., what type of agency receives the funds, how the training is structured) or the general training environment (e.g., requirements, coordinating agencies, available curricula). You may then be able to identify what it is about the law enforcement training projects or environment that is more conducive to project success, and how these features can be borrowed or adapted to benefit the prosecution training projects. This may help you select future prosecution training proposals with a higher likelihood of success, and may indicate special projects needed (e.g., you may wish to fund a project to establish a coordinating agency for prosecution training).

Monitoring and Modifying the Subgrant

The information needed for the SAR and SSS can also be very useful to administrators of subgrants. The SAR provides a record of the activities and goals that you and the state STOP administrator agreed your project would address when the award was first made. The SSS you complete every year (for multiyear projects) provides a record of progress and milestones you have achieved. They can be used to improve how your program is implemented or renegotiate activities and goals if any have proven beyond the scope of the project. Of course, you need not assess progress only once a year when the SSS is due; you can monitor the project more often by getting quarterly or even monthly reports from the record keeping system you have established to provide the information requested in the SSS.

The SSS also documents what your project has accomplished—how many victims you have served, how many staff you have trained, or what policies you have developed. This information can be very useful in helping you assess what still needs to be done and how best to do it based on your experience working in the area. Being able to show that you have done successful work in the past and can identify and address unmet needs is very impressive to potential funders.

Providing Feedback on the Forms

Another very important type of information that you can provide is your opinion on the SAR and SSS forms themselves. Let your VAWGO Program Manager know what parts were difficult to use or interpret; what parts were not useful to you or did not really capture what your project is about; and what additional information you would like to provide to give a better understanding of what your project is about, or that would be more helpful in meeting your goals as you monitor, improve, and seek additional support for your statewide strategy or individual project. These forms can be modified from year to year, and your experience with them would provide very valuable insights about how they can be improved to meet everyone's needs for information.

How to Handle Special Circumstances

In some cases it might not be that easy to determine program outputs—how many personnel were trained, victims served, and so on—under STOP funds. This might be because the project's support came from more than one source, because the project is giving more or different services to the same victims it would have served without STOP funds, or because the project enhances service quality but does not increase the numbers of victims served or services offered. How do you report project activities and accomplishments on the SSS then?

Projects with Several Funding Sources

Suppose, for example, STOP funds supported a project which established a sexual assault special unit in a law enforcement agency and developed a policy on investigation and charging. This project is being supported with equal amounts of STOP, VOCA, and Byrne funds. The unit has five full-time staff who investigated 300 cases of sexual assault through 1996. The question is whether (1) to report on the SSS that the STOP funds were used to develop one-third of a policy, employ one and two-thirds staff, and investigate 100 cases; (2) to report some other allocation (perhaps the STOP funds were in fact used exclusively for policy development); or (3) to report the total outputs of the project without allocating them among funding sources.

You will provide the clearest and most comprehensive picture of what the project has accomplished by reporting in Parts III, IV, and IX the total outputs of the project (e.g., 5 staff, 1 policy, and 300 cases), even though it was supported by three sources of funding. To reflect the funding situation, make sure that your answer to Question 4 in Part I shows ALL the non-STOP funding sources being used to support this activity, including the dollar values assigned for in-kind support. Analyses of projects with multiple sources of funding done by state coordinators or national evaluators will then combine your answers to Parts III, IV, and IX with Part I, Question 4 to assess cost per unit of service, efficiency, or productivity. That is, they will compare the total inputs (STOP, other federal, state, local, and private funds and in-kind cash equivalents) to the total outputs (300 cases, 1 policy, and 5 staff). In some cases this may tell us how much was spent per staff, or per case, or per training session, or per policy, and so on. We may or may not be able to allocate costs to specific funding sources, but since you don't need to know where the funds came from to know whether a service is efficient or productive, this is a useful approach and provides the most comprehensive information on the projects STOP funds are supporting.

Projects that Enhance Services to the Same Victims

When STOP funds support a project that lets you serve victims you would not otherwise have reached, it is clear that those are the victims you should report on in Part IX. But suppose you were running a counseling service for sexual assault and domestic violence victims, and (Case A.1) you are using your new STOP funding to add court accompaniment for about 20 percent of your current clients, or (Case A.2) you are now able to offer victims up to 10 free counseling sessions whereas before you had to limit them to 5 free sessions. How do you characterize your services, and what victims do you include in Part IX?

CASE A.1: In Part VI, indicate on Questions 28 and 29 that your STOP subgrant gave new types of service to the same victims you would have served even without the STOP funds. Then, in Part IX, provide the characteristics of ONLY the victims who received court accompaniment (in addition to the counseling services you were already offering before STOP funding).

CASE A.2: In Part VI, indicate on Question 28 and 29 that your STOP subgrant gave more of the same types of service to the same victims you would have served even without the STOP funds. Then, in Part IX, provide the characteristics of the victims who received the increased number (6 or more) of free sessions.

Projects that Enhance Service Quality but NOT Numbers

Questions might also arise when STOP funds are supporting goals and activities which are not easily translated into numbers of services or numbers of victims. Suppose (Case B.1) you are using your STOP money to add a nighttime staff person to your shelter to make it more secure and make women feel safer, but you will not be serving any more women, unless (Case B.2) more women are now willing to stay in the shelter because they know it is safer. How do you report on your project and the victims it served?

CASE B.1: In Part VI, indicate on Question 28 and 29 that your STOP subgrant gave enhanced or improved services to the same victims you would have served even without the STOP funds. Then, in Part IX, provide the characteristics of ALL the victims who stayed in your shelter once the nighttime staff person was on board.

CASE B.2: In Part VI, indicate on Question 28 and 29 that your STOP subgrant gave enhanced or improved services to the same victims that you would have served even without STOP funding, AND ALSO to different victims than you would have served without STOP. Then, in Part IX, provide the characteristics of ALL the victims who stayed in your shelter once the nighttime staff person was on board.

However, this leaves you dissatisfied because what you really want to show is the impact of your project—that women feel safer when they stay in your shelter. You want to know whether women feel safer under the new arrangements because if they don't, you want to do something different. This is because you want them to feel safer—you are not doing this just to fill out forms for VAWGO. To answer this question you will need to go beyond the performance monitoring data you provided in the SSS and gather some impact data. You can ask around informally and stop there. Or, you can use one of the outcome measures described in Chapter 7 to assess the level of safety your clients feel and whether it has changed from before to after the new security measures were put in place. In fact, as you will usually have some time between notification of award and actually getting the money, you can use this time to collect data from the women who stay in your shelter without the enhanced nighttime security to make your "before" assessment. Then use the same measures to ask the clients who stay there with the enhanced security how they feel, and you will have your "after" assessment. Compare the two to see whether you have achieved your goal.

Other Ways to Get Useful Information

The SSS is not the only way you can find out what is going on across your state or how your project is coming along. Your clients and colleagues have many insights and suggestions they would no doubt be delighted to share if you asked them.

State administrators can turn to the statewide planning group or advisory board for advice and assistance. Members who represent law enforcement, prosecution, and victim services agencies and associations should have a good idea how projects are working in their areas and what additional work is still needed. State administrators can also consider convening regional or statewide subgrantee conferences to get their feedback directly and to promote cross-fertilization of ideas from one community to another. A model emphasizing multiagency community teams, similar to the model used in the VAWGO conference in July 1995, might be a useful approach. State administrators may also wish to borrow another page from the federal approach and support a state-level technical assistance project. Not only would this enhance project activities, but monitoring the types and sources of questions and issues may indicate areas for further work.

Subgrantees may also gain valuable insights by consulting other members of their community. You may find that your project is having unintended side effects, positive or negative, on other agencies, and can then work with those personnel to identify appropriate responses to spillover effects. You may also find that other agencies could contribute to or benefit from expansion of your project to include them; either way your project accomplishes more. And don't overlook the people who participate in your project—you can gain many valuable insights that would otherwise have never occurred to you by directly asking prosecutors how the special unit is coming along, asking court personnel what makes your database easy or difficult to use, asking law enforcement officers what they thought of the training you provided, or asking victims whether the project's services were helpful and what else they needed. STOP project staff may be able to hold a small number of informal interviews, but larger or more structured focus groups or surveys may be best undertaken by local or state evaluators involved with the project.

CHAPTER 6
CHOOSING AN EVALUATION DESIGN ¹

This chapter describes the major evaluation choices available to you and discusses the factors you need to consider in picking an approach for your evaluation. It covers the three types of evaluation described in Chapter 1—impact evaluation, process evaluation, and. performance monitoring. Following an overview of evaluation designs, the chapter briefly describes a variety of data sources and data collection strategies, including quantitative and qualitative ones. You may use one or more types of evaluation, and one or more data collection strategies, in evaluating your program. Once you have an overview of options in evaluation design, the chapter helps you figure out which level of evaluation might be right for your program, what evaluation activities to select, and how to proceed when your project is already operating and now you want to introduce an evaluation. The chapter also includes a discussion of how to safeguard victim rights and well-being when you are thinking of collecting data from the individuals you serve. The chapter ends with a list of additional reading about evaluation design and basic research methods.

Parts of this chapter are important for everyone to read. These include the section on choosing the right level of evaluation for your program (among impact and process evaluations, and performance monitoring), and the section on informed consent and data security. Other sections of this chapter may seem quite technical to some readers in their discussion of evaluation design choices and the data collection methods that might go along with them. Therefore, read those sections if you intend to get seriously involved in working with an evaluator to shape the design of an evaluation, or if you intend to design your own evaluation (in the latter case, you probably will also want to consult some of the books listed in the Addendum at the end of this chapter).

Readers who are not going to be deeply involved in evaluation design can still interact effectively with evaluators without committing the more technical parts of this chapter to memory. The most important thing for these readers is to become familiar with the general evaluation options and choices (impact or process evaluation, performance monitoring, and what is meant by a comparison group and what makes a good one) by skimming the more technical sections to get a general idea of design options. Exhibit 6.1, which will be found at the end of the material on evaluation design options, gives you a quick graphic overview and summary of the technical material described in the impact evaluation design portions of this chapter. The exhibit is a "decision tree" showing the design options for impact evaluations available to you under different conditions.

Impact Evaluation Designs

Our description of impact evaluations begins with the least demanding design and moves to more elaborate designs. The following sections present the key elements of each design and variations you can consider. The strengths and limitations of each are summarized as are the general requirements of each in terms of resources such as budget and staff. As you move through these choices, the budget increases as does the extent to which you produce scientifically convincing results. However, as noted below, the best choice is often driven by a consideration of the audience for your results—who wants to know, when do they need to know, what issues do they care about, and what types of information will convince them?

We have tried to avoid unnecessary jargon in describing evaluation methods, but we do use the traditional evaluation terms to describe the people from whom you will be collecting data. Project participants are called "the treatment group" and the services they receive are called "the treatment." Those who do not receive services are called "the control group" (if people are randomly assigned to treatment and control groups) or "the comparison group" (if some method other than random assignment is used to select this group).

Non-Experimental Impact Evaluations

Key Elements. Non-experimental impact evaluations examine changes in levels of risk or outcomes for project participants, or groups that may include project participants (e.g., all women in a particular neighborhood). Non-experimental designs do not compare the outcomes for participants to individuals or groups who do not get services.

Design Variations. You can choose from four primary types of non-experimental design: (1) comparisons of groups before and after treatment; (2) time series designs; (3) panel studies; and (4) cross-sectional comparisons after a treatment has been delivered.

The first two designs are based on analysis of aggregate data—that is, data for groups, not for individuals. In a before and after comparison, outcomes for groups of participants that enter the project at a specific time and progress through it over the same time frame are measured before and after an intervention. Your assessment of program impact is inferred from the differences in the average score for the group before and after the services. This simple design is often used to assess whether knowledge, attitudes, or behavior of the group changed after exposure to an intervention. For example, a project focused on training might ask whether the average score on knowledge about domestic violence policies increased for your group of participating police, prosecutors, or others after the training compared to the baseline score measured at the start of training. Similarly, you could measure public attitudes or beliefs before and after a public safety campaign.

A time series design is an extension of the before and after design that takes measures of the outcome variables several times before an intervention begins (e.g., once a month for the six months before an intervention starts) and continues to take measures several times after the intervention is in place (e.g., once a month for six months after the intervention). The evaluation tests whether a statistically significant change in direction or level of the outcome occurs at or shortly after the time of the intervention. For example, a project trying to increase community collaboration could begin collecting information on the number of cross-agency referrals and other collaborative actions every month for the six months before intensive collaboration development efforts begin, and for every month of the two years following the initiation of collaborative work. You could then trace the development of collaborative activity and tie it to events in the community (including the timing of stepped-up efforts to promote collaboration).

Time series measures may be collected directly from project participants. However, people also use a time series design based on information from larger groups or units that include but are not restricted to project participants. For example, rates of reported violent offenses against women for neighborhoods in which special police patrols are introduced might be used to assess reductions in violence. A time series design using publicly available data (such as the rate of violent offenses just suggested) should be considered when it is difficult to identify who receives project services, or when the evaluation budget does not support collection of detailed data from project participants. Although new statistical techniques have strengthened the statistical power of these designs, it is still difficult to rule out the potential impact of non-project events using this approach.

The next two designs examine data at the individual level (that is, data come from individuals, not just from groups). Cross-sectional comparisons are based on surveys of project participants that you conduct after the project is completed. You can use the data collected with this design can be used to estimate correlations between the outcomes experienced by individuals and differences in the duration, type, and intensity of services they received. This will let you draw some conclusions about plausible links between outcomes and services within your treatment group. However, you can not draw definitive conclusions about what caused what, because you do not have any type of comparison group that would let you say "it happened for those who got services, but not for those who did not get services." Panel designs use repeated measures of the outcome variables for individual participants in a treatment. In this design, outcomes are measured for the same group of project participants, often starting at the time they enter the project and continuing at intervals over time. The design is similar to the "time series" design described earlier, but the data come from individuals, not from groups, and data collection rarely starts before the individuals enter the program or receive the intervention.

Considerations/Limitations. Correctly measuring the services received by project participants is critical in non-experimental evaluations. Because the inferences about project impact are based on response to services, differences in the type and amount of service received are critical. The key variations in services need to be spelled out carefully in developing your logic model. Several limitations to non-experimental designs should be noted:

First, the cross-sectional and panel designs provide only a segment of the "dose-response curve." That is, they only give estimates of the differences in impact related to differences in the services received. These designs cannot estimate the full impact of the project compared to no service at all, unless estimates can be based on other information on the risks of the target population.
Second, the designs that track participants over time (before and after, panel, and time series) cannot control for the effects of changes that would have occurred anyway without services, or for the effects of other events outside the project's influence.
Third, the extent to which the results can be assumed to apply to other groups or other settings is limited, because this design provides no information for assessing the extent to which outcomes for those who participated differ from those who might be eligible for services but chose not to participate. For example, if those who came to your training program were those interested in learning more about domestic violence, they might show greater attitude changes after training than a group of trainees required to come by their supervisor. Alternatively, if the project provides intensive counseling only for sexual assault victims who are still showing severe psychological symptoms one or more years after the assault, their gains in mental health might be slower than those of victims whose symptoms (e.g., of fear, anxiety, or depression) abated more quickly.

Practical Issues/Data Collection. Non-experimental designs have several practical advantages. They are relatively easy and inexpensive to conduct. Data from individuals for cross-sectional or panel analyses are often collected routinely by the project at the end (and sometimes beginning) of project participation. When relying on project records, the evaluator needs to review the available data against the logic model to be sure that adequate information on key variables is already included. If some key data are missing, the evaluator needs to set up procedures for collecting additional data items.

When individual project records are not available, aggregate statistics may be obtained from the project or from other community agencies that have information on the outcomes you care about. The primary problem encountered in using such statistics for assessing impacts is that they may not be available for the specific population or geographic area targeted by the project. Often these routinely collected statistics are based on the general population or geographic areas served by the agency (e.g., the police precinct or the clinic catchment area). The rates of negative outcomes for the entire set of cases included may well be lower than rates for your target group, if you are trying to serve those with the most severe cases or history of violence. The larger the population or geographical area covered by the statistics, the greater the risk that any effects on program participants will be swamped by the vastly larger number of nonparticipants included in the statistics.

A more expensive form of data collection for non-experimental evaluations is a survey of participants some time after the end of the project. These surveys can provide much needed information on longer term outcomes such as rates of employment or earnings for battered women after leaving the battering situation, or psychological health for sexual assault victims one or more years after the assault. As in any survey research, the quality of the results is determined by response rate rather than by overall sample size, and by careful attention to the validity and reliability of the questionnaire items.

There are a variety of data collection strategies for use in non-experimental and other types of evaluation designs. We describe a number of them later in this chapter, after the review of evaluation designs is completed.

Quasi-Experimental Designs

Key Elements. Quasi-experimental evaluations compare outcomes from project participants to outcomes for comparison groups that do not receive project services. The critical difference between quasi-experimental and experimental designs is that the decision on who participates in the program is not random. Comparison groups are made up of individuals as similar as possible to project participants on factors that could affect the selected outcomes you want to measure. Statistical techniques are then used to control for remaining differences between the groups.

Usually, evaluators use existing groups for comparison—victims (or police officers) in the same or similar neighborhoods of the city who did not receive services (or training), or those who have similar cases in other neighborhoods. In some situations, selected staff (or precincts or court dockets) try a new "treatment" (approach to services) while others do not. When selecting a comparison group, you need to be sure that the comparison group is indeed similar to the treatment group on critical factors. If victims are to be served or officers are to be trained, those receiving new services should be similar to those who get the existing services.

Design Variations. As just described, the main way to define a comparison group is to find an existing group as similar as possible to the treatment group. The most common variation to the "whole group" approach is called "matching." In matching, the researcher constructs a comparison "group" by matching individuals who do not receive treatment to individuals in the treatment group on a selected set of characteristics. This process for constructing a comparison group runs two relatively serious threats to validity. The first is that the groups, while similar at the time of selection, may change over time due to pre-existing characteristics. As a result, changes over time may reflect factors other than the "treatment." The second is that the researcher may have failed to use key variables influencing outcomes the matching process. These variables, which differed between the two groups at the outset, may still cause matched groups to differ on outcomes for reasons other than the treatment. To do the best possible job on selecting critical variables for matching, you should refer to the background factors which your logic model identifies as likely to influence outcomes. These factors should be used in the match.

Quasi-experimental designs vary in the frequency and timing of collecting data on outcome measures. One makes decisions about the frequency and timing of measurements after assessing the potential threats posed by competing hypotheses that cannot be ruled out by the comparison methodology. In many situations, the strongest designs are those that collect pre-project measures of outcomes and risk factors and use these in the analysis to focus on within-individual changes that occur during the project period. These variables are also used to identify groups of participants who benefit most from the services. One design variation involves additional measurement points (in addition to simple before and after) to measure trends more precisely. Another variation is useful when pre-project data collection (such as administering a test on knowledge or attitudes) might "teach" a sample member about the questions to be asked after the project to measure change, and thus distort the measurement of project impact. This variation involves limiting data collection to the end of the project period for some groups, allowing their post-project answers to be compared with the post-project answers of those who also participated in the pre-project testing.

Considerations/Limitations. Use of non-equivalent control group designs requires careful attention to procedures that rule out competing hypotheses regarding what caused any observed differences on the outcomes of interest.

A major threat in STOP evaluations may be that known as "history" —the risk that unrelated events may affect outcomes. The rapid change in laws, services, and public awareness of violence against women may affect the policies and services available to treatment and comparison groups alike. Changes may occur suddenly in large or small geographic areas, jurisdictions, or service catchment areas. For example, if one court begins using a victim advocate successfully, other nearby courts may adopt the practice or even undertake a more comprehensive project with similar goals. The same is true of prosecution strategies or law enforcement approaches. If your comparison group came from the courts or offices that leapt on the bandwagon shortly after you drew your sample, your "comparison group" has just become a treatment group.

A second threat to validity is the process of "selection" —the factors that determine who is eligible for, or who chooses to use, services. Some of these factors are readily identified and could be used in selecting the comparison sample, or could be included in the statistical models estimating project impact. For example, if victims who do not speak English are excluded from services either formally or informally, the comparison of outcomes needs to consider English language proficiency as a control variable. Such differences may not be easy to measure during the evaluation.

Practical Issues/Data Collection. It is a challenge to build defenses or "controls" for threats to validity into evaluation designs through the selection of comparison groups and the timing of outcome observations. Even when the comparison group is carefully selected, the researcher cannot be sure that all relevant group differences have been identified and measured accurately. Statistical methods can adjust for such problems and increase the precision with which project effects can be estimated, but they do not fully compensate for the non-random design. Findings need to be interpreted extremely cautiously, and untested alternative hypotheses need to be considered carefully.

Plans for quasi-experimental evaluations need to pay close attention to the problem of collecting comparable information on control group members and developing procedures for tracking them. You may be able to collect data and provide contact information for treatment group members relatively easily because the program and cooperating agencies have continuing contacts with clients, other agencies, and the community, and have a stake in the outcome of your evaluation. Collecting comparable data and contact information on comparison groups can be difficult. If you collect more complete information for your treatment group than for your comparison group or lose track altogether of more comparison than treatment group members, not only will the evaluation data be incomplete, it will be biased—that is, it will provide distorted and therefore misleading information on project impact. The best way to avoid bias from this problem is to plan tracking procedures and data collection at the start of the evaluation, gathering information from the comparison group members on how they can be located, and developing agreements with other community agencies, preferably in writing, for assistance in data collection and sample member tracking. These agreements are helpful in maintaining continuing contact with your sample in the face of staff turnover at the agencies involved.

Quasi-experimental designs may employ a variety of quantitative and qualitative approaches to gather the data needed to draw conclusions about a project and its impact. Data collection strategies are described below, once we have reviewed all of the options for evaluation design.

Experimental Designs

Key Elements. Experimental designs are considered the "gold standard" in impact evaluation. Experiments require that individuals or groups (e.g., trainees, police precincts, courtrooms, or victims) be assigned at random (by the flip of a coin or equivalent randomizing procedure) to one or more groups prior to the start of project activities. A "treatment" group receives particular services designed to achieve clearly specified outcomes. If several new services are introduced, the experiment can compare multiple treatment groups. A "control" group continues to receive the services in existence prior to the introduction of the new project (either no services or already existing services). The treatment group outcomes are compared to outcomes for alternative treatment groups and/or to a control group to estimate impact. Because chance alone determines who receives the project services, the groups can be assumed to be similar on all characteristics that might affect the outcome measures. Any differences between treatment and control groups, therefore, can be attributed with confidence to the effects of the project.

Design Variations. One design variation is based on a random selection of time periods during which services are provided. For example, new services may be offered on randomly chosen weeks or days. A version of this approach is to use "week on/week off" assignment procedures. Although not truly random, this approach closely approximates random assignment if client characteristics do not vary systematically from week to week. It has the major advantage that project staff often find it easier to implement than making decisions on project entry by the flip of a coin on a case-by-case basis. A second design variation is a staggered start approach in which some members of the target group are randomly selected to receive services with the understanding that the remainder will receive services at a later time (in the case of a school or classroom, the next month, semester, or year). One disadvantage of the staggered start design is that the observations of outcomes are limited to the period between the time the first group completes the project and the second group begins. As a result, it is generally restricted to assessing gains made during participation in relatively short-term projects.

Limitations/Considerations. Although experiments are the preferred design for an impact evaluation on scientific grounds, random assignment evaluations are not always the ideal choice in real life settings. Some interventions are inherently impossible to study through randomized experiments for legal, ethical, or practical reasons. Laws cannot be enforced selectively against a randomly selected subset of offenders or areas in a community. Access to legal protections cannot be curtailed. For example, protection orders cannot be issued to victims only during selected weeks. Essential services should not be withheld. However, it may be possible to randomly assign alternative services or responses if the relative merits of the alternatives are unknown.

You need to ask yourself whether the results that are likely to be obtained justify the investment. Experiments typically require high levels of resources—money, time, expertise, and support from project staff, government agencies, funders, and the community. Could the answers to evaluation questions—and subsequent decisions on project continuation, expansion, or modification—be based on less costly, less definitive, but still acceptable evaluation strategies? The answer is often "yes."

Practical Issues/Data Collection. Experimental designs run the most risk of being contaminated because of deliberate or accidental mistakes made in the field. To minimize this danger, there must be close collaboration between the evaluation team and the project staff in identifying objectives, setting schedules, dividing responsibilities for record-keeping and data collection, making decisions regarding client contact, and sharing information on progress and problems. Active support of the key project administrators, ongoing staff training, and communication via meetings, conference calls, or e-mail are essential.

Failure to adhere to the plan for random assignment is a common problem. Staff are often intensely committed to their clients and will want to base project entry decisions on their perceptions of who needs or will benefit most from the project—although these judgments may not be supported by later research. Thus it is important that the evaluator, not project staff, remain in charge of the allocation to treatment or control group.

As in quasi-experimental evaluations, lack of comparable information for treatment and control group members can be a problem. Experiments generally use both agency records and data collected from individuals through questionnaires and surveys. To assure access to these individuals, quasi-experimental evaluations need to plan for data collection and tracking of sample members at the start of the project and get agreements with agencies and consent procedures with individuals in place early in the process.

Along with all other types of impact evaluation, quasi-experimental designs often combine quantitative data with qualitative information gathered through process evaluation in order to understand more about the program when interpreting impacts on participants. Another issue is documenting what parts of the program each participant received. If the project services and content change over time, it may be difficult to determine what level or type of services produced the outcomes. The best strategy is to identify key changes in the project and the timing of changes as part of a process evaluation and use this information to define "types of project" variations in the project experience of different participants for the impact analysis.

The Impact Evaluation Design "Decision Tree"

Exhibit 6.1 is a "decision tree" taken from Harrell (1996), organized around a set of questions to which the program wanting to conduct an impact evaluation answers "yes" or "no." With each answer the program advances closer to a decision about the type of impact evaluation most appropriate for its circumstances and resources. This decision tree is a quick graphic way of summarizing the foregoing discussion about alternative impact evaluation designs and their requirements. If your program is ready for impact evaluation, the "decision tree" may help you to think about the type of evaluation that would best suit your program.

Process Analysis

Key Elements

Process evaluations rarely vary in basic design. Most involve a thorough documentation and analysis of activities of the program. A good process analysis design is guided by a set of core questions: Is the project model is being implemented as specified and, if not, how do operations differ from those initially planned? Does the program have unintended consequences and unanticipated outcomes and, if so, what are they and who is affected? What is the view of the project from the perspectives of staff, participants, and the community? The answers to these questions are useful in providing guidance to policy makers and project planners interested in identifying key project elements and in generating hypotheses about project impact that can be tested in impact analyses.

Design Variations

Process evaluations vary in the number of projects or sites included. Most process evaluations focus on a single project or site. However, some undertake comparative process analysis. Comparative process analysis requires that observations, interviews, and other data collection strategies be structured in advance around a set of questions or hypotheses about elements of implementation believed to be critical to project success. Comparative process analysis allows the evaluation to make assessments about alternative strategies and is useful in generalizing the findings to other settings or jurisdictions. This strategy is used to assess which approach is most successful in attaining goals shared by all when competing models have emerged in different locations. It requires purposely selecting sites to represent variations in elements or types of projects, careful analysis of potential causal models, and the collection of qualitative data to elaborate the causal links at each site.

Most design uncertainties in process evaluation involve deciding what information will be collected, from whom and how. Process evaluation can be based solely on qualitative data. However, qualitative data are usually combined with quantitative data on services produced, resources used, and outcomes achieved. Qualitative data collection strategies used in process evaluation include semi-structured interviews with those involved in project planning and operations; focus groups with project planners, staff, or participants; and researcher observations of project activities. Data collection strategies for use with all types of evaluation are described below, following the presentation of performance monitoring.

Practical Issues

In a process evaluation, it is often difficult to decide on what information is truly key to describing program operations and what information is simply extraneous detail. In selecting relevant data and posing questions about program operations, the evaluator needs to refer carefully to the logic model prepared at the start of the project, although it is permissible and important in process evaluation to revise the original logic model in light of findings during the evaluation.

Analysis of qualitative data requires considerable substantive knowledge on the part of the evaluator. The evaluator needs to be familiar with similar projects, respondents, and responses, and the context in which the project is operating. Your evaluator will need to be able to understand the project's historical and political context as well as the organizational setting and culture in which services are delivered. At the same time, the evaluator needs to maintain some objectivity and separation from project management in order to be able to make an unbiased assessment of whether responses support or refute hypotheses about the way the project works and the effects it has.

Collecting qualitative data also requires skilled researchers who are experienced in interviewing and observing. Data must be carefully recorded or taped. Notes on contextual factors and interim hypotheses need to be recorded as soon as possible after data collection. When using interview guides or semi-structured interview protocols, interviewers must be trained to understand the intent of each question, the possible variety of answers that respondents might give, and ways to probe to ensure that full information about the issues under investigation is obtained.

Performance Monitoring

Key Elements

Performance monitoring is used to provide information on (1) key aspects of how a system or project is operating; (2) whether, and to what extent, pre-specified project objectives are being attained (e.g., numbers of women served by a shelter, increases in cases prosecuted, improved evidence collection); and (3) identification of failures to produce project outputs (this kind of data can be used in managing or redesigning project operations). Performance indicators can also be developed to (4) monitor service quality by collecting data on the satisfaction of those served; and (5) report on project efficiency, effectiveness, and productivity by assessing the relationship between the resources used (project costs and other inputs) and the output and outcome indicators.

If conducted frequently enough and in a timely way, performance monitoring can provide managers with regular feedback that will allow them to identify problems, take timely action, and subsequently assess whether their actions have led to the improvements sought. Performance measures can also stimulate communication about project goals, progress, obstacles, and results among project staff and managers, the public, and other stakeholders. They focus attention on the specific outcomes desired and better ways to achieve them, and can promote credibility by highlighting the accomplishments and value of the project.

Performance monitoring involves identification and collection of specific data on project outputs, outcomes, and accomplishments. Although they may measure subjective factors such as client satisfaction, the data are numeric, consisting of frequency counts, statistical averages, ratios, or percentages. Output measures reflect internal activities: the amount of work done within the project or organization. Outcome measures (immediate and longer term) reflect progress towards project goals. Often the same measurements (e.g., number/percent of women who filed for a protection order) may be used for both performance monitoring and impact evaluation. However, unlike impact evaluation, performance monitoring does not make any rigorous effort to determine whether these outcomes were caused by project efforts or by other external events.

Design Variations

When projects operate in a number of communities, the sites are likely to vary in mission, structure, the nature and extent of project implementation, primary clients/targets, and timeliness. They may offer somewhat different sets of services, or have identified somewhat different goals. In such situations, it is advisable to construct a "core" set of performance measures to be used by all, and to supplement these with "local" performance indicators that reflect differences. For example, some victim service projects will collect detailed data on the needs of women or the history of domestic violence, while others will simply have data on the number provided with specific services. Performance indicators need to be constructed so that results can be compared across projects in multi-site projects.

Considerations/Limitations

Indicators of outcomes should be clearly differentiated from elaborate descriptions of the population served. For example, there is a tendency of funders to ask for nitty-gritty details about program clients, when they should be asking what the program has done for these women. Take the case of victim services programs under VAWA. The governing legislation specifies only a few victim characteristics as the information that must be reported. This is quite different from the more important information of what the programs did for victims (services provided), and whether the victims benefited from the services. We probably need only basic information about victims, and might do better to concentrate our evaluation effort on understanding the short- and long-term outcomes that the program has helped them achieve. Chapter 7 lays out what these might be, from a sense of being heard and understood, to living in safety and peace of mind.

In selecting performance indicators, evaluators and service providers need to consider:

The relevance of potential measures to the mission/objective of the project. Do process indicators reflect project strategies/activities identified in mission statements or written agreements between agencies? Do outcome indicators cover the full range of identified objectives? Do indicators capture the priorities of project planners and staff?
The comprehensiveness of the set of measures. Does the set of performance measures cover inputs, outputs, and service quality as well as outcomes? Does it include relevant items of customer feedback?
The project's control over the factor being measured. Does the project have influence/control over the outputs or outcomes measured by the indicator? If the project has limited or no influence over the outputs or outcomes being measured, the indicator may not fairly reflect project performance.
The validity of the measures. Do the proposed indicators reflect the range of outcomes the project hopes to affect? Are the data free from obvious reporting bias?
The reliability and accuracy of the measures. Can indicators be operationally defined in a straightforward manner so that supporting data can be collected consistently over time, across data gatherers, and across communities? Do existing data sources meet these criteria?
The feasibility of collecting the data. How much effort and money is required to generate each measure? Should a particularly costly measure be retained because it is perceived as critically important?

Practical Issues

The set of performance indicators should be simple, limited to a few key indicators of priority outcomes. Too many indicators burden the data collection and analysis and make it less likely that managers will understand and use reported information. At the same time, the set of indicators should be constructed to reflect the informational needs of stakeholders at all levels—community members, agency directors, and national funders. Most importantly, the performance indicators should reflect key activities defined as central to the project in the logic model.

Regular measurement, at least quarterly, is important so that the system provides the information in time to make shifts in project operations and to capture changes over time. However, pressures for timely reporting should not be allowed to sacrifice data quality. For performance monitoring to take place in a reliable and timely way, the evaluation should include adequate support and plans for training and technical assistance for data collection. Routine quality control procedures should be established to check on data entry accuracy and missing information. At the point of analysis, procedures for verifying trends should be in place, particularly if the results are unexpected.

The costs of performance monitoring are modest relative to impact evaluations, but still vary widely depending on the data used. Most performance indicator data come from records maintained by service providers. The added expense involves regularly collecting and analyzing these records, as well as preparing and disseminating reports to those concerned. This is typically a part-time work assignment for a supervisor within the agency. The expense will be greater if client satisfaction surveys are used to measure outcomes. An outside survey organization may be required for a large-scale survey of past clients; alternatively, a self-administered exit questionnaire can be given to clients at the end of services. In either case, the assistance of professional researchers is needed in preparing data sets, analyses, and reports.

Data Collection Strategies

Quantitative Strategies

There are many types of quantitative data collection strategies and sources. The interested reader can pursue more details through the references provided at the end of this chapter. Here we present only the briefest descriptions of the most common types of data:

Data from and records of public agencies, including the program being evaluated (on services received, cases filed, case outcomes, recidivism, injury, etc.). These have the advantage of already existing, so no new data collection is needed, and often some parts of the data are published in public documents. However, they have several disadvantages for which your design might want to try to compensate (1) they only report on people who contacted public agencies; (2) they rarely contain all of the information you want; (3) they usually cover a geographical area that contains many more people than you will be serving, thereby making it very hard to see an effect; (4) if you want to use actual records rather than published reports, they are not always easy to obtain from the agency (both in terms of getting permission and in terms of extracting information from files). Published data will never be at the individual level, but can still be useful in characterizing the experiences of districts, neighborhoods, or cities. Data from files can be accessed at the individual level, but it takes a lot of work. Further, when relying on public records you need to consider whether the treatment group is more closely monitored than the comparison group—which could lead to an error in interpretation. For example, an intensive supervision probation program involves very careful monitoring of offenders. This careful monitoring may mean that probation officers know more about what offenders are doing than regular probation officers do, and may result in higher rates of probation violation because of that knowledge, even if the two groups of offenders actually commit violations at the same rate.
Systematic data collection directly from individuals—through questionnaires, exit surveys, telephone or mail surveys, or in-person interviews. Any type of evaluation can use this data source, and many do so. Going directly to the people served (and to comparison group members) is often required and useful for a number of reasons (1) you can get the data you want, rather than the data that an agency has collected for other purposes; (2) you can sometimes get a more honest and complete reading on people's experiences than they would tell an official agency; (3) you will not be burdening program staff with the extra work of data collection; (4) if you have a comparison or a control group, you have no other source of information about them other than to go to them directly (that is, they are not participants in the program, so the program cannot tell you anything about them). Further, collecting information in different ways from the treatment and the comparison group can introduce considerable measurement error.
Surveys need to have careful sampling plans that describe who is included in the study and how eligibility is defined. You need to know who is represented by the sample so it will be clear to what groups the findings can be generalized. Other issues in survey design may require considerable expertise. You need questions that are well constructed, easily understood by the respondent, and unambiguously interpreted. They need to be phrased in neutral ways to avoid a response bias. You need to consider who is administering the survey and in what setting. The willingness of respondents to give complete and accurate answers may depend on their confidence in the interviewer and the protections of privacy built into the survey procedures.

Qualitative Strategies

Qualitative data collection strategies are extremely useful. They can stand by themselves, as they do in certain types of process evaluation or case studies. Or, they can be used in combination with quantitative methods as part of virtually any of the designs described in this chapter. As with the quantitative strategies just described, the interested reader can pursue more details through the references provided at the end of this chapter. Qualitative strategies include:

Semi-structured interviews, which contain specific questions about particular issues or project practices. The "semi" aspect of these discussion guides refers to the fact that a respondent may give as long, detailed, and complex a response as he or she desires to the question—whatever conveys the full reality of the project's experience with the issue at hand. If some issues have typical categories associated with them, the protocols will usually contain probes to make sure the researcher learns about each category of interest.
Focus groups, which seek to understand attitudes through a series of group discussions guided by one researcher acting as a facilitator, with another researcher present to take detailed notes. Five or six general questions are selected to guide open-ended discussions lasting about an hour and a half. The goals of the discussions may vary from achieving group consensus to emphasizing points of divergence among participants. Discussions are tape-recorded, but the primary record is the detailed notes taken by the researcher who acts as recorder. Less detailed notes may also be taken publicly, on a flip-chart for all to see, to try to achieve consensus or give group members the chance to add anything they think is important. Soon after a particular focus group, the recorder and facilitator summarize in writing the main points that emerged in response to each of the general questions. When all focus groups are completed, the researchers develop a combined summary, noting group differences and suggesting hypotheses about those differences.
Observations of project activities, which may be guided by structured or semi-structured protocols designed to ensure that key items reported in interviews are verified and that consistent procedures for rating project performance are used across time and across sites. In addition, process evaluation includes notes maintained by researchers on meetings and activities they see while collecting data on the project.

The Planner's Questions

1. Who is the audience for the evaluation? Who wants to know, what do they want to know, when do they need the information, and what types of data will they believe?

2. What kinds of evaluation should be included? Impact evaluation, process evaluation, performance monitoring, or all three?

3. What does the logic model indicate about the key questions to be asked?

4. What kinds of data can be collected, from whom, by whom, and when?

5. What levels of resources—budget, time, staff expertise—are required? What are available?

Additional Considerations in Planning your Evaluation

What Level of Evaluation to Use

Every project can do performance monitoring, regardless of whether you can find a good comparison group or whether you undertake a full-fledged process or impact evaluation. Collecting data to describe clients served can show you changes over time, and whether you are meeting certain goals, such as increasing the proportion of your clients who come from underserved populations or reaching clients earlier in their process of deciding to leave a batterer. Routinely collecting data on which services you have given people and who gets them allows you to track whether everyone who needs certain services gets them, which are your most and least frequently used services, what types of services are least likely to be available, and so on. For police and prosecution agencies, such tracking can also document where people get "stuck" in the system, and perhaps help you unplug important bottlenecks.

In addition to performance monitoring, most projects can benefit from some level of process evaluation, in which you compare your processes to your logic model and see where the problems lie. A good process evaluation can help you improve your program, and can also get you to the point where conducting an impact evaluation will be worth the investment.

Designs for Projects Already in Progress

Many projects cannot begin evaluating "at the beginning," because they are already operating at full strength when the evaluation begins. You need not let this stop you. You can still construct meaningful comparison groups in a number of ways, and you can certainly begin collecting data on your own clients as soon as you know the evaluation is going to proceed.

For comparison groups, you can use participants in other programs that do not have the type of intervention you are doing (i.e., collect data from participants in a program similar to yours but across town, or in the next county, which does not have the upgraded services your STOP grant provides), or you can use participants in your own program who predate the enhanced services (i.e., collect follow-up data on women who went through your program before your STOP grant started). If necessary when doing this, you can also collect information on participant characteristics that your old intake forms did not include.

Even without a comparison group, you can do performance monitoring for clients beginning as soon as (or even before) you get your STOP money. As described above, you can learn a lot from performance monitoring, and it can be of great help by providing feedback to help you improve your program. In addition, you can institute "exit interviews" or other exit data collection, through which you can get important feedback about client perceptions of and satisfaction with services. Chapter 7 offers for some ideas about what to measure at these interviews, and how to do it.

Informed Consent, Follow-Up Arrangements, and Confidentiality/Data Security

Ethical considerations dictate a careful review of the risks and benefits of any evaluation design. The risks to victims, project staff, and offenders need to be enumerated and strategies to minimize them should be developed. Studies of violence against women need to be particularly sensitive to avoiding "secondary victimization" through data collection procedures that could cause psychological or emotional trauma, place the victim (particularly in family violence cases) at risk from the offender, or reveal private information including the woman's status as a victim.

A review of whether the evaluation procedures meet acceptable standards for the protection of the individuals and agencies being studied should be conducted before work begins. Many funders require a formal review of the research design by a panel trained in guidelines developed to protect research participants. Even when such review is not required, explicit consideration of this issue is essential. Two considerations should be part of this review—informed consent, and confidentiality/data security.

Informed consent refers to what you tell people about what you want from them, the risks to them of participating in the research/evaluation, the benefits that might accrue to them from participating, and what you intend to do to protect them from the risks. With respect to women victims of violence from whom you wish to collect data on impacts at some later time, it also involves establishing permission for follow-up and procedures for recontact that will safeguard the woman. You owe it to your evaluation participants to think these matters through and write out a clear and complete statement of risks and protections. Then, before you gather any information from women, share this disclosure with them and get their consent to continue. Some funders will require that you get this consent in writing. Informed consent is relevant not only with evaluation participants who have been victims, but is also necessary with anyone you gather information from, including agency employees, volunteers, and members of the general public.

If you want to collect follow-up information on impacts, you will need permission to recontact, and will also need to set up safe procedures for doing so. Even if you have no immediate plans to conduct follow-up data collection, if you are beginning to think about doing and evaluation and realize that you might need to recontact women in the future, consider setting up a permission procedure now. Set up a form for this, that includes the text of the appeal you will make (see below), plus spaces to note agreement or refusal and, if agreement, the woman's name and contact information. With every woman who receives help from your agency, say something like:

"We are very interested in improving our services to help women more. To do this, it would be very helpful if we could contact you at some future time to learn about what has happened to you, whether you think our efforts helped, and what more we could have done to assist you. Would you be willing to have us contact you again, if we can work out a safe way to do so? [if yes...] What would be the safest way for us to contact you in the future?

Work out acceptable arrangements with the woman and write down the particulars on the form. If possible, it would also be best to have her sign the form to indicate affirmative consent to follow-up.

How you will assure the confidentiality and data security of the information they give you is the final thing you need to tell people as part of informed consent. You need to think through risks to victims of telling their stories, and risks to agency employees of answering questions about how things "really" work. Then you need to develop procedures to guard their data so the risks do not materialize (that is, you need to ensure that they do not suffer repercussions should they be identified as a source of information that reflects negatively on an agency). Above all, you should tell research participants up front how you intend to handle the data they give you—will you cite them by name, will you disguise the source of your information, or will you report only grouped data that does not identify individuals. They can then decide for themselves how much they want to reveal. Whatever you tell them, that's what you must do, or you will have violated the understanding under which they were willing to share their perceptions, opinions, and information with you. If you promised not to cite them in any way that would make them identifiable, then don't break your promise. If you want to be able to cite them, then tell them so up front. If you have data with people's names attached to it and you have promised to keep their information confidential, then you will have to develop security procedures to maintain that confidentiality (e.g., keeping it in locked file cabinets, putting only an ID number on the data, and keeping the key that links ID number and name in a separate, locked drawer, limiting access to the data to those people who have committed themselves to respect the conditions of confidentiality you have promised).

Addendum: Evaluation and Basic Research Methods

General Issues

A Practical Guide to Conducting Empirical Research

Nonsexist Research Methods

Handbook of Practical Program Evaluation.

Evaluation Strategies for Human Service Programs: A Guide for Policymakers and Providers.

Evaluation Basics: a Practitioner's Manual

Human Services Monograph Series,

Handbook of Practical Evaluation

Communicating for Cultural Competence.

Defining Your Research Issue

Strategies of Social Research: The Methodological Imagination

New Approaches to Evaluating Community Initiatives: Concepts, Methods, and Contexts,

A Review of Research Designs

Methods of Social Research

Research Design Explained

Measuring Impact: a Guide to Program Evaluation for Prosecutors

An Introduction to Survey Research, Polling, and Data Analysis

Case Study, Implementation Assessment, and Other Qualitative Methods

How to Assess Program Implementation

Focus Groups: A Practical Guide for Applied Research

Qualitative Data Analysis: An Expanded Sourcebook

How to Use Qualitative Methods in Evaluation

Utilization-Focused Evaluation

Case Study Research: Design and Methods

Surveys and Questionnaires

Survey Research Methods

Handbook of Survey Research

Mail and Telephone Surveys: The Total Design Method.

Experimental and Quasi-Experimental Design

New Directions for Program Evaluation

Quasi-Experimentation

Causal Modeling—Regression as a General System, for (Almost) Any Type of Data

Applied Statistics: Analysis of Variance and Regression

Primer of Applied Regression and Analysis of Variance

Causal Models in Panel and Experimental Design.

Foundations of Behavioral Research

Cost Analyses

Cost-Effectiveness: A Primer.

Performance Monitoring

Guide to Program Outcome Measurement for the U.S. Department of Education.

INTRODUCTION TO THE RESOURCE CHAPTERS

The remaining chapters in this Guidebook provide resources to help you measure and evaluate your program(s). The first six chapters (Chapters 7 through 12) focus on the types of outcomes you may need to measure. As explained below, any given program may need to draw on resources from several chapters to get a complete picture of what the program has accomplished. The next two chapters (Chapter 13 on training and Chapter 14 on data system development) describe evaluation issues and measurement approaches to two fairly complex activities that can be funded with STOP grants. The last chapter offers some background and critical contextual information about conducting evaluations of programs on Indian tribal lands, as these pose some unique challenges of both program development and evaluation.

Most STOP-funded projects will need to draw on at least one of these resource chapters; many projects will need to incorporate the suggestions of several chapters into their evaluation design. The following brief chapter descriptions repeat some information from the Preface and Chapter 1, but augment it by indicating the types of projects that would benefit from reading each chapter:

Chapter 7 focuses on immediate and long-term outcomes for women victims of violence. You would want to look at the measures in this chapter for any project that ultimately wants to make a difference for women's short- or long-term outcomes. That probably includes most projects funded under STOP.
Chapter 8 focuses on indicators of the scope, variety, and organization of victim services, from within a given agency up to a network that encompasses the whole community. Projects devoted to victim services would want to read this chapter, as would projects trying to achieve changes within the justice systems and/or increased community collaboration.
Chapter 9 focuses on changes that might occur within specific civil and criminal justice agencies as they work to improve their handling of cases involving violence against women. Any project for which these are immediate or ultimate outcomes (e.g., training, special units, policies and procedures, data systems) would want to look at the measures in this chapter.
Chapter 10 focuses on changes that might occur within the service network of an entire community (including victims services, law enforcement, prosecution, the judiciary, and other elements of the network) as it attempts to develop more cooperative and collaborative arrangements for handling situations involving violence against women. Projects of any type that have a collaborative or system-building aspect would want to look at this chapter.
Chapter 11 focuses on community-wide issues such as attitudes toward violence against women, levels of violence against women, and signs that businesses and other community elements with low historical involvement in the issue are beginning to get involved. Projects trying to bring more elements of the community into partnership in reducing violence against women, and any project that has the reduction of violence against women as its ultimate goal, might want to look at this chapter.
Chapter 12 focuses on women's perceptions that the civil and criminal justice systems do or do not treat them fairly. Since we hope that system changes (as indicated by measures from Chapters 9 and 10) improve the fairness of the system, the measures suggested in this chapter might be used to get a reading on whether women perceive that the fairness of their treatment has increased.
Chapter 13 describes the issues to address when conducting evaluations of training projects, including four levels of evaluation: reaction, learning, behavior change, and impact on violence against women. The chapter contains its own logic model for a training project. It also includes some concrete measures of immediate outcomes, especially attitude change on the part of trainees.
Chapter 14 describes the issues to address when conducting evaluations of projects that are developing data systems for justice agencies (mostly), although the same issues pertain to data systems for victim services. It also contains its own logic model. In addition to projects that directly address the data systems purpose area, any project that is developing or using a data system to track project activities should take a look at this chapter. This might include a victim services project that uses a client tracking system, and special unit project that has a case tracking data system, and so on.
Chapter 15 lays out factors that are important to understand when conducting evaluations of violence against women projects on Indian tribal lands. Many of the factors discussed in this chapter could be considered background or antecedent variables in a logic model framework—things that need to be understood and documented to provide the context for the project's eventual success or failure. Also included in the chapter is some advice for working with people who are involved with these projects, so that issues of cultural competence do not cloud the progess of the evaluation.

The remainder of this introduction presents several logic models, starting with simpler ones and progressing to the more complex. For each element in these logic models, we refer to one or more of the chapters in this resource section, to give you an idea how you might use the material in these chapters to construct a full evaluation design. For more examples of complex program models, you might also want to look at the logic models for training and for data system development included in Chapters 13 and 14, respectively.

Example 1: Counseling Services

Exhibit LM.1 shows the logic underlying an evaluation of a relatively simple counseling program. The basic service, counseling, is shown in Column B, which also indicates that client case records are expected to be the source of data to document the types and amounts of counseling provided to clients. A variety of possible immediate and longer-term outcomes for clients (effects of counseling) appear in Column D; the primary data source would be client interviews, constructed to include some of the measures found in Chapter 7. The simplest evaluation of this program would involve collecting data relevant only to Columns B and D, on services received and victim outcomes. Client interviews would be required to obtain these outcome data.

You can complicate the simple evaluation considerably, which will mean more work but probably also more knowledge about what really makes a difference if you do. You can measure background factors—in this case, pertinent characteristics of the women coming for counseling—which appear in Column A. You would use information about client characteristics to help you understand what types of women are helped most by which types of counseling. Another complication is external factors that might increase or decrease the likelihood that counseling will produce the desired outcomes. These are shown in Column C, and include the availability of other supportive services from the same agency or from other agencies in the community (see Chapters 8, 9, and 10); other stressors in each woman's life (see Chapter 7); and justice system actions in each woman's case. The expected sources for both background and external factors in this model include intake forms, assessment forms, and/or research interviews with clients.

Remember that whether or not you measure them, background and external factors are always present. You may leave them out of your data collection plans, but you can't leave them out of your thinking. They should be represented in your logic model so you will be sure to consider (1) what you will miss if you leave them out, and (2) the limits on your ability to interpret findings because you have limited the scope of the variables available for analysis.

Example 2: Special Prosecution Unit

Exhibit LM.2 shows the logic underlying an evaluation of a special prosecution unit. The activities of the unit itself are shown in Column B, while the immediate and longer-term outcomes are shown in Column D. Outcomes include some that apply to women victims of violence (see Chapter 7) and some that apply to criminal justice system changes (see Chapter 9). As with Example 1, the simplest evaluation one could do on a special prosecution unit would examine outcomes (Column D) in relation to inputs (Column B). Greater complexity could (should) be introduced by including background factors (Column A, in this case characteristics of the sexual assault and/or domestic violence cases handled by the unit, and their similarity to or difference from the totality of cases handled before the unit began operations, such as whether the unit gets only the tough cases). One might also include external factors in the analysis; Column C suggests a number of external factors pertinent to a woman's decision to pursue a case (e.g., her level of danger, the quality of her support system, or other stresses in her life). Measures for these factors can be found in Chapter 7. Other external factors may also be relevant to particular evaluation situations. Exhibit LM.2 indicates that quite a variety of data sources may be necessary for a full treatment of this logic model, and that one might want to look at Chapters 7, 9, 13, and 14 in the process of planning the data collection.

Example 3: Court Advocacy Program

Exhibit LM.3 describes a logic model for a court advocacy program to help women coming to civil court for a protection/restraining/stay away order. Column B shows the direct activities of the program, which one would document through case records and process analysis, including observations. Column D shows a variety of outcomes, including some that are personal to the woman and some that would also be considered system outcomes (following through to a completed permanent order). Column A suggests some background factors of the women seeking orders that might affect both their use of the advocacy services and their own ultimate outcomes. Column C indicates some external realities of court accommodation to the program (or failure to do so) that might make a difference for service efficacy. Data sources would be case records, possible data system data, process analysis, and client interviews.

These brief examples as guides to the resource chapters and their connection to logic models will hopefully provide you with a practical basis for jumping into the resource chapters themselves. They do not answer all questions—for example, they do not give you variables to use in describing clients (although you could use the SSS Part IX data fields as a start for this, and add much more that your own program wants to know). Nor do the chapters give you a system for describing and counting services, as this is much too variable across programs. However, the resource chapters do offer a rich array of ideas and measures covering topics and issues that most programs will need to include in an evaluation. Their specifics should be helpful to you. In addition, just reading the chapters may give you some practice in the way that an evaluator might think about measuring things, and this practice may help you develop the details of your own evaluation.

IMTPexLM_1.jpg (136510 bytes)

IMTPexLM_2.jpg (146188 bytes)

IMTPexLM_3.jpg (98304 bytes)

CHAPTER 7
VICTIM SAFETY AND WELL-BEING:
MEASURES OF SHORT-TERM AND LONG-TERM CHANGE

By Cris M. Sullivan ¹

Many STOP projects have goals that involve making changes in victims' lives. This chapter offers suggestions about ways to document such changes. It discusses the need to identify and measure changes that occur in the short run, and also changes that are more likely to take a significant period of time to develop. Once this distinction between short- and long-term impacts has been described, the chapter offers specific instruments (scales, questions, formats) that measure both short- and long-term changes.

What is Short-Term and What Is Long-Term Change?

Short-term changes are those more immediate and/or incremental outcomes one would expect to see quickly, and that will eventually lead to desired long-term changes. Optimal long-term changes might include (1) freedom from violence, (2) decreased trauma symptoms, and/or (3) increased physical, psychological, economic, and/or spiritual well-being. However, we would not expect these outcomes to occur quickly as the direct or sole result of a new or improved community-based program. Rather, effective programs would ideally result in some degree of measurable, immediate, positive change in women's lives, with this change ultimately contributing to long-term safety and well-being. For example, a hospital-based medical advocacy project for battered women might be expected to result in more women being correctly identified by the hospital, more women receiving support and information about their options, and increased sensitivity being displayed by hospital personnel in contact with abused women. Or, a SANE (sexual assault nurse examiner) program for treating sexual assault victims might be expected to produce many of these same outcomes, as well as better evidence collection. These short-term changes might then be expected to result in more women accessing whatever community resources they might need to maximize their safety (e.g., shelter, personal protection order) and/or help them cope with emotional issues (e.g., counseling, hotlines), which ultimately would be expected to lead to reduced violence and/or increased well-being (long-term outcomes). However, it would be unrealistic to expect to see a change in the level of violence in women's lives or their full psychological healing immediately or even shortly after receipt of medical advocacy offered to battered women or women who have been sexually assaulted. Rather, programs should measure the short-term changes they expect to impact. In these examples, that might include (1) the number of women correctly identified in the hospital as survivors of domestic abuse or sexual assault; (2) the number of women with full, complete, and secure evidence collected; (3) women's satisfaction with information and support received from the program; (4) women's increased knowledge of available resources post-intervention; (5) victims' perceptions of the effectiveness of the intervention in meeting their needs; and (6) hospital personnel's attitudes toward victims of domestic violence and sexual assault.

There are two critical points to make here:

Most programs using STOP grants should focus on the measurement of short-term, not long-term, change.
Direct service delivery programs that are "victim"-focused should not be expected to produce decreased violence in women's lives. Victim-based direct service programs can provide support, information, assistance, immediate safety for women and/or psychological counseling, but they are not designed to decrease the perpetrators' abuse or end the risk of rape. A coordinated community response that holds perpetrators accountable for their behaviors is necessary to decrease the risk of continued abuse and sexual assault.

Once you have decided which program outcomes you want to measure, you will need to choose or develop measuring instruments that are sensitive enough to detect whether desired changes have occurred. It is preferable to use a well-established instrument whenever possible to maximize the odds of detecting change and to increase confidence in your findings. We include in this chapter a number of instruments measuring victim safety and well-being over time, all of which are currently being used in research related to violence against women. You should not try to develop a new instrument specifically for your project unless you cannot find any existing instruments in the literature.

Measures of Short-Term Change

In order to measure short-term change, answers must be provided to questions such as:

What specifically did survivors receive from this program/service/intervention?
How much did survivors receive from this program/service/intervention (i.e., how much time, how many units of service)?
How effective did survivors feel this intervention was in meeting their needs?
How satisfied were survivors with the various components of this intervention?
If this program/service/intervention was designed to result in any tangible, measurable change in survivors' lives (e.g., change of residence, new financial resources), did this change occur?

Unlike constructs such as depression or social support, which can be measured by standardized instruments, short-term changes are generally assessed using questions you create yourself, such as:

How effective was having a court advocate in helping you understand the court process?

1 = very ineffective
2 = somewhat ineffective
3 = somewhat effective
4 = very effective

How effective was the (protection/restraining/stay away order) clinic in helping you obtain a protection/restraining/stay away order?

1 = very ineffective
2 = somewhat ineffective
3 = somewhat effective
4 = very effective

How effective was the SANE nurse in helping you understand and feel comfortable with the evidence collection process?

1 = very ineffective
2 = somewhat ineffective
3 = somewhat effective
4 = very effective

How supported did you feel by the staff of this program?

1 = very unsupported
2 = somewhat unsupported
3 = somewhat supported
4 = very supported

Did the hotline counselor help you understand the feelings you were having after you were raped?

1 = did not help at all
2 = helped a little
3 = helped somewhat
4 = helped a lot

Did shelter staff help you devise a safety plan while you were here?

1 = yes
2 = no
0 = I did not request or want a safety plan

How satisfied are you with the services you received from this program?

1 = very dissatisfied
2 = somewhat dissatisfied
3 = somewhat satisfied
4 = very satisfied

While it is often important to ask open-ended questions, such as "What did you like?" and "What can we improve?" these types of questions should be asked in addition to, not instead of, more quantitative questions (closed-ended questions with forced options, such as those just presented). It is much easier to describe effects and to detect change with closed-ended questions, which assign a number to each answer and force the respondent to select one and only one answer. Thus, if you want to know what clients liked, ask specific questions and use specific answer categories. For example, you could ask:

How much did you like (or "how satisfied were you with," or "how helpful was"):

...the promptness of our response,
...the specific information we provided about resources,
...the way people in our program interacted with you,
...the advice we gave you,
...the ways we helped you think about your situation.

For these questions, you could use answer categories such as: 1=did not like at all, 2=liked somewhat, 3=liked a lot. Or the categories could be 1=not at all satisfied/helpful, 2=somewhat satisfied/helpful, 3=very satisfied/helpful.

It is important to remember that the wording of questions influences the responses received. If, for example, a legal advocacy project is designed to help survivors make the best informed legal decisions they can for themselves, based on complete and accurate information, it would be inappropriate to ask women whether they did or did not participate in pressing charges against their assailants to determine program outcome, as pressing charges may not be the best option for all women. Rather, questions might be used such as the following:

What decision did you make regarding participating in pressing charges against (perpetrator's name)?

1 = yes, will participate in pressing charges
2 = undecided
3 = no, will not participate in pressing charges

What factors influenced your decision?	YES	NO
prior experience with the criminal justice system	1	2
information received from legal advocate	1	2
information received from prosecutor	1	2
fear of perpetrator	1	2
(etc., etc.)
other (specify:_________________________)	1	2

Do you think this decision will result in (perpetrator's name) being:

1 = less violent against you in the future
2 = no change in violence
3 = more violent against you in the future
7 = don't know

Did you find our legal advocacy program to be:

1 = not at all helpful in making this decision
2 = somewhat helpful in making this decision
3 = very helpful in making this decision

Again, short-term change is generally measured by examining what the consumer received, how much she received, how effective she found the service, how satisfied she was with the service, and whether short-term, incremental change occurred.

Measures of Long-Term Change: Victim Safety and Well-Being

Although the majority of programs receiving STOP grant funding will not be in a position to evaluate longer-term outcomes, it is sometimes feasible and appropriate to measure whether victims' level of safety and/or quality of life improves over time as a result of an intervention. This level of evaluation generally requires additional time and financial resources, but when such an effort is warranted, there are a number of standardized instruments available from which to choose. The remainder of this chapter presents brief critiques of various measures previously used in research pertaining to violence against women. The first section describes eight instruments developed to measure physical, psychological, and/or sexual abuse. These measures can be used to examine whether such violence increases, decreases, or remains the same over time. The second section pertains to measures of well-being, including depression, post-traumatic stress, overall quality of life, and self-esteem. The last section describes instruments that measure correlates of well-being—access to community resources, stressful life events, and level of social support.

Criteria Used to Select Instruments

Numerous instruments have been developed that measure some aspect of safety and/or well-being, and not all could be included in this Guidebook. The instruments described in this chapter were chosen because they met the following criteria:

(1) easily administered by laypersons;
(2) relatively few items (less than 30 wherever possible);
(3) history of use in research on violence against women;
(4) acceptable reliability and validity; and
(5) easily interpreted by laypersons.

Each scale presented has demonstrated at least some degree of adequate reliability and validity as a field instrument; therefore we do not detail the properties of each instrument unless they are noted as a concern. ² Sample items follow each scale description, as do references to articles where you can find detailed information about how each instrument behaves. Complete instruments can be ordered from the STOP TA Project (800 256-6883 or 202 265-0967 in the Washington, D.C. area) unless they are under copyright. Copyrighted instruments that must be obtained by the publisher or author are noted. ³

A Note of Caution

The majority of the instruments that follow were developed for research purposes. Their strength lies in their ability to characterize groups of people, and they should not be used by laypersons as individualized assessment instruments or as diagnostic tools for individual clients. For example, although the Sexual Experiences Survey classifies respondents into one of four categories (nonvictimized, sexually coerced, sexually abused, or sexually assaulted), such classifications are made based on large aggregated data sets. It would be problematic and unethical to inform a particular woman of her "classification," based on her responses to this survey. A woman who did not self-identify in the same manner as her classification on this measure could lose faith in the intervention program designed to assist her and/or could feel misunderstood or even revictimized. The following instruments should be used for descriptive purposes of the sample as a whole and to examine group, not individual, differences.

Measures of Victim Safety

The following measures of physical, psychological, and/or sexual violence are presented alphabetically in this section.

Measures of Physical Abuse by Intimates

Abusive Behavior Inventory [also measures psychological abuse] (Shepard & Campbell, 1992)
Conflict Tactics Scales (Revised) (Straus et al., 1996)
Danger Assessment (Campbell, 1986)
Index of Spouse Abuse [also measures psychological abuse] (Hudson & McIntosh, 1981)
Severity of Violence Against Women Scales (Marshall, 1992)

Measures of Psychological Abuse

Index of Psychological Abuse (Sullivan, Parisian, & Davidson, 1991)
Psychological Maltreatment of Women Inventory (Tolman, 1989)

Measure of Sexual Violence

Sexual Experiences Survey (Koss & Oros, 1982)

Intimate Physical Abuse

ABUSIVE BEHAVIOR INVENTORY - PARTNER FORM

Citation	Shepard, M.F., & Campbell, J.A. (1992). The Abusive Behavior Inventory: A measure of psychological and physical abuse. Journal of Interpersonal Violence, 7(3), 291-305. Copyright © 1992 by Sage Publications, Inc. Reprinted by Permission of Sage Publications, Inc.
Description	Drawing from both feminist theory and educational curriculum with batterers, the authors designed this 29-item measure of psychological (18 items) and physical (11 items) types of abuse. A self-report measure, it is simple to administer and would generally take no more than 5 minutes to complete. The scale's strengths are that it was found to successfully differentiate between abusers and non-abusers, and it was designed to tap power and control issues within the relationship (for example: "checked up on you," and "stopped you or tried to stop you from going to work or school"). Its weaknesses, conceded by the authors, are that (1) there is no attention to injuries or to medical attention needed, which could approximate severity of the violence; and (2) the reliability and validity of the measure were based on a sample of inpatient, chemically-dependent men and women.
Sample items:	Copyright restrictions prohibit electronic distribution of scale contents.

Reference

Petrik, N.D. (1994). The reduction of male abusiveness as a result of treatment: Reality or myth? Journal of Family Violence, 9(4), 307-316.

Intimate Physical Abuse

THE REVISED CONFLICT TACTICS SCALES (CTS2)

Citation	Straus, M.A., Hamby, S.L., Boney-McCoy, S., & Sugarman, D.B. (1996). The Revised Conflict Tactics Scales (CTS2): Development and preliminary psychometric data. Journal of Family Issues, 17, 283-316. Copyright © 1996 by Sage Publications, Inc. Reprinted by Permission of Sage Publications, Inc.
Description	The first instrument designed to measure conflict and violence between intimate partners was the original version of this scale (CTS1). The CTS1 has been both widely used and widely criticized. It was an 18-item scale of relationship conflict tactics, with the latter 10 items measuring violent strategies. Its strengths were that it was used successfully in many settings and with many populations, and that it was short and easy to administer. Its weaknesses included (1) measuring partner violence only within the context of conflict, while domestic violence is about power and control, (2) ignoring many common types of woman-battering, including symbolic gestures of violence as well as tactics of power, control, and intimidation, (3) rating some acts of violence as more severe than others outside of the context of the event (i.e., slapping is rated as "mild," although a hard slap can cause severe injury), (4) ignoring whether an act was committed in self-defense, and (5) exclufing injuries sustained or medical attention needed to approximate severity of the violence. The revised CTS2 has added more items to include some additional types of conflict tactics, has added a section on injuries, and now has 39 items. The other weaknesses remain even with the CTS2. Both the CTS and CTS2 are still very good instruments to use because a great deal is known about their strengths and weaknesses. However, you may want to compensate for remaining weaknesses with additional questions. There is no standardized instrument in general use to help you in making these compensations, so this is one place where you may have to make up some questions of your own to cover the issues that the CTS2 omits.
Sample items:	Copyright restrictions prohibit electronic distribution of scale contents.

Reference

Straus., M.A. (1979). Measuring intrafamily conflict and violence: The Conflict Tactics (CT) Scales. Journal of Marriage and the Family, 75-88.

Straus, M.A., & Gelles, R.J. (1986). Societal change and change in family violence: Violence from 1975 to 1985 as revealed by two national surveys. Journal of Marriage and the Family, 48, 465-479.

Intimate Physical Abuse

THE DANGER ASSESSMENT

Citation	Campbell, J.C. (1986). Nursing assessment for risk of homicide with battered women. Advances in Nursing Science, 8, 36-51. Copyright © 1981 by the National Council on Family Relations, 3989 Central Ave., NE, Suite 550, Minneapolis, MN55421. Used by permission. The instrument is available from Jacquelyn Campbell on request.
Description	This 11-item assessment tool was created to assist women with abusive partners in assessing their danger of homicide. The author recommends this tool be used as part of a nursing assessment of domestic violence, and that nurses and patients complete the tool together. This instrument was created with the input of battered women, shelter workers, law enforcement officials, and other experts on battering. Due to the singularities of each woman's situation, however, the author stresses that no actual prediction of lethality be made based upon a woman's score. The score (summed affirmative responses) should be shared with the woman who has completed the Danger Assessment so she can determine her own risk.
Sample items:	Response categories: 0=no 1=yes 1. Has the physical violence increased in frequency over the past year? 4. Is there a gun in the house? 7. Does he threaten to kill you and/or do you believe he is capable of killing you?
Reference	McFarlane, J., & Parker, B. (1994). Preventing abuse during pregnancy: An assessment and intervention protocol. American Journal of Maternal/Child Nursing, 19, 321-324. McFarlane, J., Parker, B., & Soeken, J. (1995). Abuse during pregnancy: Frequency, severity, perpetrator and risk factors of homicide. Public Health Nursing, 12(5), 284-289.

Intimate Physical Abuse

INDEX OF SPOUSE ABUSE (ISA)

Citation

Hudson, W.W., & McIntosh, S.R. (1981). The assessment of spouse abuse: Two quantifiable dimensions. Journal of Marriage and the Family, 43, 873-888.

Description

This 30-item self-report instrument takes about 5 minutes to administer, and measures both physical (15 items) and non-physical (15 items) types of intimate abuse. The primary drawback of this scale is related to one of its strengths. Because the authors note that some types of violence are more severe than other types, they have assigned weights to the items, which results in slightly more complexity in computing scale scores for respondents. However, Hudson & McIntosh (1981) describe, in simple terms, the computation required to obtain the two scores.

The 15 items measuring non-physical abuse are quite inclusive of numerous psychologically and emotionally abusive behaviors. The "physical abuse" items, however, do not include such behaviors as kicking, restraining, burning, choking, pushing, or shoving. Women who have experienced these types of abuse without experiencing punches would receive artificially minimized scores on this measure. Further, 7 of the 15 "physical abuse" items only imply physical abuse. An example of such an item is: 'My partner becomes abusive when he drinks.' An additional drawback is that two items refer to abuse occurring when the perpetrator has been drinking, which could also artificially minimize the scores of women whose abusers do not drink. To compensate for aspects of domestic violence that this scale omits, you might want to make up some questions of your own.

Sample items:

Response categories: 1=never 2=rarely 3=occasionally 4=frequently 5=very frequently		[physical abuse] 7. My partner punches me with his fists. [non-physical abuse] 1. My partner belittles me.

Reference

Campbell, D.W., Campbell, J., King, C., Parker, B., & Ryan, J. (1994). The reliability and factor structure of the Index of Spouse Abuse with African-American women. Violence and Victims, 9(3), 259-274.

Intimate Physical Abuse

SEVERITY OF VIOLENCE AGAINST WOMEN SCALES

Citation

Description

This 46-item instrument was specifically created to measure "threatened, attempted, and completed behaviors likely to cause injury or pain" (Marshall, 1992: 105). The nine dimensions of violence measured by this scale are: symbolic violence; threats of mild, moderate, and serious violence; acts of mild, minor, moderate, and severe violence; and sexual violence. A strength of this scale is that it captures symbolic violence—behaviors often used by perpetrators to frighten and intimidate women. Weaknesses of this instrument are its weighting system, and therefore the categories of threats and violence that it produces. Items were weighted for severity based on ratings provided by samples of women who were not necessarily abused themselves. For example, the item "held her down, pinning her in place" was rated as "mild," and the item "bit her" was rated as "minor." This illustrates the difficulty of rating behaviors out of context, as these acts can of course also be very serious. However, the 46 empirically derived items are excellent examples of intimate male violence against women and can be used without the weighting system discussed in Marshall (1992).

Sample items:

Response categories (referring to the prior 12 months):

1=never
2=once
3=a few times
4=many times

[symbolic violence]
1. Hit or kicked a wall, door or furniture.

[threats]
7. Shook a fist at you.

[physical violence]
35. Choked you.

[sexual violence]
41. Demanded sex whether you wanted to or not.

Reference

Vitanza, S., Vogel, L.C.M., & Marshall, L.L. (1995). Distress and symptoms of posttraumatic stress disorder in abused women. Violence and Victims, 10(1), 23-34.

Psychological Abuse

INDEX OF PSYCHOLOGICAL ABUSE

Citation

Sullivan, C.M., Parisian, J.A., & Davidson, W.S. (1991). Index of Psychological Abuse: Development of a measure. Presented at the 99th annual convention of the American Psychological Association, San Francisco, CA.

Description

This 33-item instrument was designed to measure the common types of psychological abuse reported by battered women: criticism, ridicule, isolation, withdrawal, and control. It is easy to administer and generally takes less than 5 minutes to complete. This measure was originally developed on two samples: women who were exiting a domestic violence shelter, and dating college students. It has since been validated with a Korean sample of abused and non-abused women (Morash et al., in preparation).

Sample items:

Response categories: 1=never 2=rarely 3=sometimes 4=often 8=not applicable (i.e., no children, no pets)		2. Accused you of having or wanting other sexual relationship(s). 7. Tried to control your activites. 20. Criticized your intelligence.

Reference

Morash, M., Hoffman, V., Lee, Y.H., & Shim, Y.H. (in preparation). Wife abuse in South Korea.

Sullivan, C.M., Tan, C., Basta, J., Rumptz, M., & Davidson W.S. (1992). An advocacy intervention program for women with abusive partners: Initial evaluation. American Journal of Community Psychology, 20(3), 309-332.

Tan, C., Basta, J., Sullivan, C., & Davidson, W.S. (1995). The role of social support in the lives of women exiting domestic violence shelters. Journal of Interpersonal Violence, 10(4), 437-451.

Psychological Abuse

THE PSYCHOLOGICAL MALTREATMENT OF WOMEN INVENTORY

Citation

Tolman, R.M. (1989). The development of a measure of psychological maltreatment of women by their male partners. Violence and Victims, 4(3), 159-178. Copyright © 1994 by Springer Publishing Company, Inc. Used by permission.

Description

This 58-item scale was derived from clinical observations, the clinical literature, and nonphysical abuse items from the Conflict Tactics Scales and the Index of Spouse Abuse (described under "Measures of Physical Abuse" in this chapter). This self-report measure, which was piloted on batterers as well as battered women, takes 10-15 minutes to administer. The author notes, however, that batterers underestimate their own abusiveness and cautions against using this scale as a pre-intervention evaluation tool for batterer intervention programs.

Sample items:

Response categories: 1=never 2=rarely 3=sometimes 4=frequently 8=very frequently		2. My partner insulted me or shamed me in front of others. 10. My partner called me names. 42. My partner restricted my use of the phone.

Reference

Dutton, D.G. (1995). A scale for measuring propensity for abusiveness. Journal of Family Violence, 10(2), 203-221.

Kasian, M. (1992). Frequency and severity of psychological abuse in a dating population. Journal of Interpersonal Violence, 7(3), 350-364.

Sexual Abuse

THE SEXUAL EXPERIENCES SURVEY

Citation	Koss, M.P., & Oros, C.J. (1982). The Sexual Experiences Survey: A research instrument investigating sexual aggression and victimization. Journal of Consulting and Clinical Psychology, 50(3), 455-457. Copyright © 1982 by the American Psychological Association. Reprinted with permission.
Description	This 13-item instrument, which can be asked of victims or slightly reworded to be administered to perpetrators, is the only measure currently used to detect victims of rape as well as unidentified offenders in the general population. Women can be classified into one of the following four categories based on their responses to this survey: nonvictimized, sexually coerced, sexually abused, and sexually assaulted. "Sexually assaulted" includes having experienced those acts that would legally be classified as criminal sexual assault, including gross sexual imposition and attempted rape. The authors note that rape survivors responded consistently to this instrument whether it was administered privately or by an interviewer. This is important, given the reticence many women have in admitting to having been sexually victimized.
Sample items:	Response categories: 1=yes 2=no 4. Have you ever had sexual intercourse with a man even though you didn't really want to because he threatened to end your relationship otherwise? 8. Have you ever been in a situation where a man tried to get sexual intercourse with you when you didn't want to by threatening to use physical force (twisting your arm, holding you down, etc.) if you didn't cooperate, but for various reasons sexual intercourse did not occur? 12. Have you ever been in a situation where a man obtained sexual acts with you such as anal or oral intercourse when you didn't want to by using threats or physical force (twisting your arm, holding you down, etc.)?
Reference	Koss, M.P., & Gidycz, C.A. (1985). Sexual Experiences Survey: Reliability and validity. Journal of Consulting and Clinical Psychology, 53(3), 422-423. Koss, M.P. (1985). The hidden rape victim: Personality, attitudinal, and situational characteristics. Psychology of Women Quarterly, 9(2), 193-212.

Measures of Psychological Well-Being

In addition to examining level of violence as a long-term outcome, it is sometimes useful to examine the effects of a program on a woman's overall well-being. Prior research in the area of violence against women has often focused on the following constructs when examining well-being: depression, post-traumatic stress symptoms, perceived quality of life, and self-esteem. It is important to remember, when measuring the level of distress in women's lives, that normative distress does not equal psychological dysfunction. It is typically "normal" and adaptive to be distressed after a physical or sexual assault. Therefore, the following instruments should not be used by laypersons for diagnostic purposes, and high scores on distress scales should be cautiously interpreted.

Depression

Beck Depression Inventory (Beck, Rush, Shaw, & Emery, 1979)
CES-D (Radloff, 1977)

Post-Traumatic Stress Disorder

Posttraumatic Stress Scale for Family Violence (Saunders, 1994)
Trauma Symptom Checklist (Briere & Runtz, 1989)

Quality of Life

Quality of Life Scale (Sullivan, Tan, Basta, Rumptz, & Davidson, 1992)
Satisfaction with Life Scale (Pavot & Diener, 1993)

Self-Esteem

Coopersmith Self-Esteem Inventory (Coopersmith, 1967)
Rosenberg Self-Esteem Inventory (Rosenberg, 1965)

Growth Outcomes and Coping Strategies "How I See Myself Now" (Burt & Katz, 1987)
"Changes that Have Come from Your Efforts to Recover" (Burt & Katz, 1987)
"How I Deal With Things" (Burt & Katz, 1987, 1988)

Depression

BECK DEPRESSION INVENTORY

Citation	The Beck Depression Inventory. Copyright © 1978 by Aaron T. beck. Reproduced by permission of the publisher. The Psychological Corporation. All rights reserved. "Beck Depression Inventory" and "BDI" are registered trademarks of The Psychological Corporation.
Description	This 21-item inventory is a clinically derived measure originally designed for use with psychiatric populations. The measure can be administered verbally or in writing, and takes approximately 10-15 minutes to complete. Individuals endorse one of four choices per item (on a 0-3 scale), to receive a final summed score between 0 and 63. A score of 0-9 indicates no depression, while a score of 10-15 indicates mild depression and a score of 16-23 suggests moderate depression. Scores above 23 indicate severe clinical depression. This is a popular inventory that has been used across numerous clinical and non-clinical samples, including college samples, women with abusive partners, and survivors of rape. It is relatively simple to administer and is easily interpretable. However, it requires more deliberation and attention to detail from study participants than does the CES-D (described on next page). Within each item, the respondent must decide which of four statements most accurately describes her or his feelings, a task requiring more concentration and reflection than may be necessary, depending on the study.
Sample items:	1. [ ] I do not feel sad. [ ] I feel sad. [ ] I am sad all the time and I can't snap out of it. [ ] I am so sad or unhappy that I can't stand it. 2. [ ] I am not particularly discouraged about the future. [ ] I feel discouraged about the future. [ ] I feel I have nothing to look forward to. [ ] I feel that the future is hopeless and that things cannot improve.
Reference	Beck, A.T., Rush, A.J., Shaw, B.F., & Emery, G. (1979). Cognitive Therapy for Depression. New York: Guilford. Robinson, B.E. (1996). Concurrent validity of the Beck Depression Inventory as a measure of depression. Psychological Reports, 79(3), 929-930.

Depression

CENTER FOR EPIDEMIOLOGICAL STUDIES-DEPRESSION (CES-D) SCALE

Citation	Radloff, L.S. (1977). The CES-D Scale: A self report depression scale for research in the general population. Applied Psychological Measurement, 1(3), 385-401. Copyright © 1977 by West Publishing Company/Applied Psychological Measurement, Inc. Used with permission.
Description	This is a 20-item self-report checklist of depressive symptoms found within the general population. It can be administered verbally or in written format, and takes 5-10 minutes to complete. Individuals rate the degree to which they have been bothered by each symptom, within the prior week, on a 0 (rarely or never) to 3 (most or all the time) scale. For determining level of depression, scores are summed (four are reverse coded). A score of 0 - 15.5 indicates no depression, a score of 16-20.5 indicates mild depression, and scores 31 and higher indicate severe depression. This checklist was originally validated on a nonclinical, general population sample. It has since been administered to numerous study samples, including women with abusive partners and incarcerated women. It is simple to administer and easily understood by study participants. Having been validated for research in the general population, it is appropriate to use for community samples of women who are survivors of violence.
Sample items:	Response categories (referring to the past week): 0=rarely or never 1=some or a little 2=occasionally 3=most or all the time 1. I was bothered by things that usually don't bother me. 2. I did not feel like eating; my appetite was poor.
Reference	Lin, N., Dean, A., & Ensel, W. (1986). Social Support, Life Events, and Depression. FL: Academic Press. Martin, S.L., Cotten, N.U., Browne, D.C., Kurz, B., & Robertson, E. (1995). Family violence and depressive symptomatology among incarcerated women. Journal of Family Violence, 10(4), 399-412.

Post-Traumatic Stress

POSTTRAUMATIC STRESS SCALE FOR FAMILY VIOLENCE

Citation	Saunders, D.G. (1994). Posttraumatic stress symptom profiles of battered women: A comparison of survivors in two settings. Violence and Victims, 9(1), 31-44. Copyright © 1994 by Springer Publishing Company, Inc. Used by permission.
Description	This 17-item scale measures level of posttraumatic stress symptoms reported by women no longer experiencing intimate partner violence. It would be important to include a number of additional items if this scale were used in a research study: length and severity of abuse, length of time between cessation of abuse and presence of symptomatology, and length of time PTSD symptoms were/have been experienced. Also, although the author preferred to use the number of times a problem was experienced instead of the more subjective responses of "never," "rarely," "sometimes," and "often," there are a number of concerns with this strategy. First, it can be difficult if not impossible to remember an exact number of times one experiences a troubling feeling or thought. Second, the response ranges are not equivalent (i.e., 1-2 times vs 51-100 times), and the author created scale scores by summing the midpoints between each response category. This atypical scaling strategy was not well justified. However, the scale does seem to capture symptoms of posttraumatic stress in women who have experienced intimate abuse.
Sample items:	As a result of any of your partner's verbal or physical abuse of you, please circle how many times you had each of the following problems: never 1-2 3-11 12-24 25-36 37-50 51-100 over 100 times 1. Unpleasant memories of the abuse you can't keep out of your mind. 5. Trying to avoid thoughts or feelings associated with the abuse. 14. Difficulty concentrating.
Reference	Woods, S.J., & Campbell, J.C. (1993). Posttraumatic stress in battered women: Does the diagnosis fit? Issues in Mental Health Nursing, 14, 173-186.

Post-Traumatic Stress

THE TRAUMA SYMPTOM CHECKLIST

Citation	Briere, J., & Runtz, M. (1989). The Trauma Symptom Checklist (TSC-33): Early data on a new scale. Journal of Interpersonal Violence, 4(2), 151-163. Copyright © 1989 by Sage Publications, Inc. Reprinted by Permission of Sage Publications, Inc.
Description	This 33-item scale was designed to examine the impact of childhood abuse on later adult functioning. It contains five subscales: dissociation [6 items], anxiety [9 items], depression [9 items], hypothesized post-sexual-abuse-trauma [6 items], and sleep disturbances [4 items] (1 item appears in two different subscales). The instrument can be administered verbally or in written form, and it has been found to discriminate between women who were and were not sexually abused as children. However, the authors caution against using this measure as a "litmus test" for the presence of childhood sexual victimization, as symptomatology varies across individuals.
Sample items:	Copyright restrictions prohibit electronic distribution of scale contents.
Reference	Briere, J., & Runtz, M. (1987). Post-sexual abuse trauma: Data and implications for clinical practice. Journal of Interpersonal Violence, 2, 367-379. Briere, J., & Runtz, M. (1988). Symptomatology associated with childhood sexual victimization in a non-clinical adult sample. Child Abuse and Neglect, 12, 51-59.

Quality of Life

QUALITY OF LIFE SCALE

Citation	Sullivan, C.M., Tan, C., Basta, J., Rumptz, M., & Davidson W.S. (1992). An advocacy intervention program for women with abusive partners: Initial evaluation. American Journal of Community Psychology, 20(3), 309-332. Copyright © 1992 by Plenum Publishing Corp. Used with permission.
Description	This scale first contained 25 items, modified from Quality of Life scales developed by Andrews & Withey (1976), designed to measure how satisfied respondents were with their overall quality of life. This instrument was pilot tested on women exiting a domestic violence shelter program, and was re-administered every six months over a two year period. Items were so highly intercorrelated that the top 9 items were then chosen for a shorter, final scale. The measure is easy to administer and sensitive to detecting change over time.
Sample items:	Response categories: 1=extremely pleased 2=pleased 3=mostly satisfied 4=mixed (equally satisfied and dissatisfied) 5=mostly dissatisfied 6=unhappy 7=terrible 1. First, a very general question. How do you feel about your life in general? 4. How do you feel about the amount of fun and enjoyment you have? 9. How do you feel about the way you spend your spare time?
Reference	Andrews, F., & Withey, S. (1976). Social Indicators of Well-being: American's Perceptions of Life Quality. New York: Plenum Press. Atkinson, T. (1982). The stability and validity of quality of life measures. Social Indicators Research, 10, 113-132. Sullivan, C.M., Campbell, R., Angelique, H., Eby, K.K., & Davidson, W.S. (1994). An advocacy intervention program for women with abusive partners: Six month followup. American Journal of Community Psychology, 22(1), 101-122.

Quality of Life

SATISFACTION WITH LIFE SCALE

Citation	Pavot, W., & Diener, E. (1993). Review of the Satisfaction With Life Scale. Psychological Assessment, 5(2), 164-172. Copyright © 1993 by the American Psychological Association. Reprinted with permission.
Description	The 5-item Satisfaction With Life Scale assesses the positive aspects of people's lives. It can detect change over time, and has been translated into French, Russian, Korean, Hebrew, Mandarin Chinese, and Dutch. Because it is short, the entire scale is reproduced below. No permission is needed to use this instrument.
Items:	Response categories: 1=strongly disagree 2=disagree 3=slightly disagree 4=neither agree nor disagree 5=slightly agree 6=agree 7=strongly agree 1. In most ways my life is close to my ideal. 2. The conditions of my life are excellent. 3. I am satisfied with my life. 4. So far I have gotten the important things I want in life. 5. If I could live my life over, I would change almost nothing.
Reference	Diener, E., Emmons, R.A., Larsen, R.J., & Griffin, S. (1985). The Satisfaction With Life Scale. Journal of Personality Assessment, 49, 71-75. Diener, E., Sandvik, E., Pavot, W., & Gallagher, D. (1991). Response artifacts in the measurement of well-being. Social Indicators Research, 24, 35-56. Larsen, R.J., Diener, E., & Emmons, R.A. (1985). An evaluation of subjective well-being measures. Social Indicators Research, 17, 1-18. Pavot, W., Diener, E., Colvin, R., & Sandvik, E. (1991). Further validation of the Satisfaction With Life Scale: Evidence for the cross-method convergence of well-being measures. Journal of Personality Assessment, 57(1), 149-161.

Self-Esteem

COOPERSMITH SELF-ESTEEM INVENTORY

Citation	Coopersmith, S. (1967). The Antecedents of Self-esteem. San Francisco: W.H. Freeman & Co. From: THE ANTECEDENTS OF SELF-ESTEEM by Coopersmith © by W.H. Freeman and Company. Used with permission.
Description	This 25-item self-administered form takes approximately 10 minutes to complete. It was originally tested on school children but has since been administered to adults across numerous settings as well. Participants indicate whether each statement is "like me" or "unlike me." Raw scores, which range from 0 to 25, are multiplied by 4, for a final score ranging from 0 to 100. Higher scores indicate higher self-esteem.
Sample items:	Items are answered either "like me" or "unlike me." The high-esteem response is indicated in parentheses after each item. 1. I often wish I were someone else. (unlike me) 5. I'm a lot of fun to be with. (like me)
Reference	Getzinger, S., Kunce, J., Miller, D., & Weinberg, S. (1972). Self-esteem measures and cultural disadvantagement. Journal of Consulting and Clinical Psychology, 38, 149. Silvern, L., Karyl, J., Waelde, L., Hodges, W.F., Starek, J., Heidt, E., & Min, K. (1995). Retrospective reports of parental partner abuse: Relationships to depression, trauma symptoms and self-esteem among college students. Journal of Family Violence, 10(2), 177-202. Ziller, R., Hagey, J., Smith, M.D., & Long, B. (1969). Self-esteem: A self-social construct. Journal of Consulting and Clinical Psychology, 33, 84-95.

Self-Esteem

ROSENBERG SELF-ESTEEM INVENTORY

Citation	Rosenberg, (1965). Self-Esteem Scale. Rights reserved by Princeton University Press. Used with permission.
Description	This 10-item scale is easy to administer verbally or in written form. It has been used extensively in the past with varied populations, including women with abusive partners and rape survivors.
Sample items:	Response categories: 1=Strongly agree 2=Agree 3=Disagree 4=Strongly disagree 1. I feel that I am a person of worth, at least on an equal basis with others. 2. I feel that I have a number of good qualities.
Reference	Fleming, J.E., & Courtney, B.E. (1984). The dimensionality of self-esteem: II. Hierarchical facet model for revised measurement scales. Journal of Personality and Social Psychology, 46, 404-421. Myers, M.B., Templer, D.I., & Brown, R. (1984). Coping ability of women who become victims of rape. Journal of Consulting and Clinical Psychology, 52, 73-78. Saunders, D.G. (1994). Posttraumatic stress symptom profiles of battered women: A comparison of survivors in two settings. Violence and Victims, 9(1), 31-44.

Growth Outcomes and Coping Strategies

"HOW I SEE MYSELF NOW"

Citation

Burt, M.R., & Katz, B.L. (1987). Dimensions of recovery from rape: Focus on growth outcomes. Journal of Interpersonal Violence, 2 (1), 57-81. Copyright © 1987 by Sage Publications, Inc. Reprinted by Permission of Sage Publications, Inc.

Description

This 43-item self-administered instrument measures various components of self-concept. It focuses on dimensions of a woman's life that are challenged or "shaken up" by a rape. It has six subscales: angry/needy/lonely; independence/ competence; mental health; trust; help; and guilt/blame. A respondent rates the degree to which she feels that each adjective describes her. The scale also deals with the respondent's satisfaction with her status on each dimension. This measure was one of three growth outcome instruments created by Burt and Katz and used in their 1987 study as part of a questionnaire given to adult rape victims 1 to 14 years after their rape.

Sample items:

Response categories (referring to the present time)

1=almost never 2=rarely 3=sometimes 4=half the time 5=often 6=usually 7=almost always		u=unsatisfied m=mixed feelings s=satisfied

Sample items:

guilty willing to ask for help independent doing well self-respecting lonely competent deserving blame

Reference

Katz, B.L. & Burt, M.R. (1988). Self-Blame in recovery from rape: Help or hindrance? Chapter 10 in Ann W. Burgess, (ed.), Rape and Sexual Assault, Vol. II. New York: Garland Press.

Janoff-Bulman, R., & Frieze, I.H. (1983). A theoretical perspective for understanding reactions to victimization. Journal of Social Issues, 39 (2), 195-221.

Veronen, L., & Kilpatrick, D. (1983). Rape: A precursor to change. In E. Callahan & K. McCluskey (Eds.), Life-span Development Psychology: Nonnormative Life Events. New York: Academic Press.

Growth Outcomes and Coping Strategies

"CHANGES THAT HAVE COME FROM YOUR EFFORTS TO RECOVER"

Citation	Burt, M.R., & Katz, B.L. (1987). Dimensions of recovery from rape: Focus on growth outcomes. Journal of Interpersonal Violence, 2 (1), 57-81. Copyright © 1987 by Sage Publications, Inc. Reprinted by Permission of Sage Publications, Inc.
Description	This 28-item measure assesses respondents' perceptions of how they are now compared to how they were before being raped. It draws its items from anecdotal accounts of ways that women say they have become stronger as a result of coping with and trying to recover from a rape. The resulting scale yields three subscales: self-value, positive actions, and interpersonal skills. Each has acceptable reliability and important elements of construct validity. Most women interviewed 2 to 14 years postrape could report several ways in which they liked themselves better or saw themselves as having grown stronger during the process of recovery.
Sample items:	Response categories (referring to the present time): 1=much less than before the rape 2=somewhat less 3=a little less 4=the same as before the rape 5=a little more 6=somewhat more 7=much more than before the rape 2. I am able to talk with my family about all types of important things. 8. I'm confident in my judgments of people. 16. I value myself. 18. I can handle people who try to boss or control me. 23. I feel like I'm in control of my life.
Reference	Veronen, L., & Kilpatrick, D. (1983). Rape: A precursor to change. In E. Callahan & K. McCluskey (Eds.), Life-Span Development Psychology: Nonnormative Life Events. New York: Academic Press. Davis, R.C., Kennedy, J. & Taylor, B.G. (1997). Positive changes following sexual assault. New York: Victim Services Agency.

Growth Outcomes and Coping Strategies

"HOW I DEAL WITH THINGS"

Citation	Burt, M.R., & Katz, B.L. (1987). Dimensions of recovery from rape: Focus on growth outcomes. Journal of Interpersonal Violence, 2 (1), 57-81. Copyright © 1987 by Sage Publications, Inc. Reprinted by Permission of Sage Publications, Inc.
Description	This self-report instrument contains 33 items describing different behaviors often used when coping with a rape (or with other traumatic events). It yields five subscales measuring different coping tactics: avoidance, expressive behaviors, nervous/anxious behaviors, cognitive approaches, and self-destructive behaviors. It may be used to refer to the present time only, or, as in its original use, respondents can be asked to give two sets of answers—one for how they dealt with the rape in the weeks and months immediately after it occurred, and a second set for how they deal with the rape now. As with the other growth outcome scales, this measure was validated on adult rape victims and is still used on the same population.
Sample items:	Response categories (referring to both "now" and "in the first several months after the rape occurred"): 1=never 2=rarely 3=sometimes 4=half the time 5=often 6=usually 7=always 1. Trying to rethink the situation and see it from a different perspective. 10. Directly showing your feelings when you are with others--actually crying, screaming, expressing confusion, etc. 20. Drinking a lot of alcohol or taking other drugs more than usual.
Reference	Burt, M.R. & Katz, B.L. Coping strategies and recovery from rape. (1988). Annals of the New York Academy of Sciences, 528, 345-358. Burgess, A.W., & Holmstrom, L.L. (1978). Recovery from rape and prior life stress. Research in Health and Nursing, 1(4), 165-174. Horowitz, M. (1976). Stress Response Syndrome. New York: Jason Aronson, Inc.

Other Correlates of Well-Being

When examining change over time, it is important to consider other factors that might be influencing survivors' safety and well-being. Especially when evaluating a program that has no control (or even comparison) group, caution must be taken when interpreting findings. For example, a domestic violence program that implemented a shelter-based counseling project might be interested in seeing if the counseling decreased women's depression. Administering a depression measure to a woman entering the shelter and again when the woman exited the shelter might seem logical, and in fact would likely show the desired effects (decreased depression). However, women's depression would be expected to decrease regardless of a specific component of the shelter program. Rather, such a decrease would be due to what is called a time effect. In this example, the passage of time between the crisis that precipitated entrance into the shelter (the first interview) and leaving the shelter program (second interview ) could itself account for the decrease in depression. Similarly, one might choose to interview women six months after they have been sexually assaulted in order to evaluate their level of fear, anxiety, stress, depression, and/or quality of life. How a woman is feeling at any point in time, however, will be influenced by a variety of factors. Is she unemployed, or was she just promoted at her job? Has she had a recent death in the family, or did she just have a baby? Is she dealing with custody issues? What is her level of social support from family and friends? Although a woman's experience of violence would be expected to contribute to her overall feelings of well-being, it is only one of many factors simultaneously influencing her adjustment and overall life satisfaction.

Factors that have been found to correlate with survivors' well-being include (1) their level of social support, (2) their level of daily stress and hassles, and (3) their access to community resources. Instruments designed to measure the level and intensity of these variables are described in the next section:

Access to Community Resources
Effectiveness in Obtaining Resources (Sullivan, Tan, Basta, Rumptz, & Davidson, 1992)
Facility Availability, Usage, and Quality (Coulton, Korbin, & Su, 1996)

Social Support
Multidimensional Scale of Perceived Social Support (Zimet, Dahlem, Zimet, & Farley, 1988)
Social Support Scale (Sullivan, Tan, Basta, Rumptz, & Davidson, 1992)

Stresses and Hassles
Perceived Stress Scale (Cohen, Kamarck, & Mermelstein, 1983)
Survey of Recent Life Experiences (Kohn & Macdonald, 1992)

Access to Community Resources

EFFECTIVENESS IN OBTAINING RESOURCES

Citation

Description

This 11-item scale was created to measure battered women's effectiveness in obtaining desired resources from their communities. For each area in which a woman reported needing help or assistance (legal assistance, housing, employment, education, child care, resources for her children, health care, material goods, finances, social support, "other"), she was asked how effective she was in obtaining that resource. Scale scores were created by taking the mean score across all valid areas. Therefore, a woman who worked on ten different issues did not receive an artificially inflated score when compared to a woman who only needed to work on two issues. This instrument was created specifically to evaluate a post-shelter advocacy intervention program and may need to be modified to effectively evaluate other interventions.

Sample items:

[Following an affirmative response to whether she needed help or assistance in an area]:

How effective have your efforts been in accomplishing your goals in this area? Would you say:

1=very ineffective
2=somewhat ineffective
3=somewhat effective
4=very effective

Access to Community Resources

FACILITY AVAILABILITY, USAGE, AND QUALITY

Citation	Coulton, C.J., Korbin, J.E., & Su, M. (1996). Measuring neighborhood context for young children in an urban area. American Journal of Community Psychology, 24(1), 5-32. Copyright © 1996 by Plenum Publishing Corp. Used with permission.
Description	This 13-item scale was created to differentiate between high-risk and low-risk neighborhoods for children with regard to crime, safety and child abuse. It can, however, be modified to ask about the availability of community resources more applicable to women with abusive partners, such as hospitals and clinics, legal aid offices, city hall, community college, etc.
Sample items:	3a. Is there a recreation center in your neighborhood? 1=yes 2=no b. How would you rate its quality? very bad 1 2 3 4 5 6 7 8 9 10 excellent c. Have you used this in the past 2 months? 1=yes 2=no
Reference	Coulton, C.J., Korbin, J.E., Su, M., & Chow, J. (1995). Community level factors and child maltreatment rates. Child Development, 66, 1262-1276. Korbin, J., & Coulton, C. (1994). Neighborhood impact on child abuse and neglect. Final report of Grant No. 90-CA1494. Washington, DC: National Center on Child Abuse and Neglect, Department of Health and Human Services.

Social Support

MULTIDIMENSIONAL SCALE OF PERCEIVED SOCIAL SUPPORT

Citation	Zimet, G.D., Dahlem, N.W., Zimet, S.G., & Farley, G.K. (1988). The Multidimensional Scale of Perceived Social Support. Journal of Personality Assessment, 52(1), 30-41. Copyright © 1993 by Laurence Earlbaum Associates. Reprinted with permission.
Description	This 12-item scale measures perceived social support in the area of (a) friends, (b) family, and (c) significant other. It was validated on college students but has also been used by Barnett et al. (1996) with a sample of women with abusive partners. This instrument is simple to use and interpret, and can be administered verbally or in written format.
Sample items:	Response categories: 1=strongly disagree 2=disagree 3=neither disagree or agree 4=agree 5=strongly agree [significant other] 1. There is a special person who is around when I am in need. [family] 4. I get the emotional help and support I need from my family. [friends] 12. I can talk about my problems with my friends.
Reference	Barnett, O.W., Martinez, T.F., & Keyson, M. (1996). The relationship between violence, social support, and self-blame in battered women. Journal of Interpersonal Violence, 11(2), 221-233. Dahlem, N.W., Zimet, G.D., & Walker, R.R. (1991). The Multidimensional Scale of Perceived Social Support: A confirmation study. Journal of Clinical Psychology, 47(6), 756-761.

Social Support

SOCIAL SUPPORT SCALE

Citation	Sullivan, C.M., Tan, C., Basta, J., Rumptz, M., & Davidson W.S. (1992). An advocacy intervention program for women with abusive partners: Initial evaluation. American Journal of Community Psychology, 20(3), 309-332. Copyright © 1992 by Plenum Publishing Corp. Used with permission.
Description	This 9-item scale was modified from Bogat et al.'s (1983) Social Support Questionnaire, and measures the amount and quality of respondents' social support. This scale has been used extensively with women who are or who have been involved with abusive partners. The instrument has been found to be sensitive to change over time (Tan et al., 1995), and is easy to administer and interpret. In creating a scale score, items should be reverse-scored and summed, so that a higher score indicates higher social support.
Sample items:	Response categories: 1=extremely pleased 2=pleased 3=mostly satisfied 4=mixed (equally satisfied and dissatisfied) 5=mostly dissatisfied 6=unhappy 7=terrible Thinking about the people in your life, family and friends: 3. In general, how do you feel about the amount of advice and information that you receive? 8. In general, how do you feel about the quality of emotional support that you receive?
Reference	Bogat, G.A., Chin, R., Sabbath, W., & Schwartz, C. (1983). The Adult's Social Support Questionnaire. Technical report. East Lansing, MI: Michigan State University. Tan, C., Basta, J., Sullivan, C.M., & Davidson, W.S. (1995). The role of social support in the lives of women exiting domestic violence shelters: An experimental study. Journal of Interpersonal Violence, 10(4), 437-451.

Stresses and Hassles

PERCEIVED STRESS SCALE

Citation	Cohen, S., Kamarck, T., & Mermelstein, R. (1983). A global measure of perceived stress. Journal of Health and Social Behavior, 24, 385-396. Copyright © 1983 by the American Sociological Association. Reprinted with permission.
Description	This 14-item instrument measures the degree to which individuals find their lives to be stressful. While some scales measure objective stressors, such as whether someone has recently lost his or her job, experienced the death of a loved one, etc., this scale measures the respondents' perceived stress. The items were designed to assess the degree to which people find their lives to be unpredictable, uncontrollable, and overloading. The scale was validated on college students and members of a smoking-cessation program, but has also been used by Tutty et al. (1993) with a sample of women who have abusive partners.
Sample items:	Response categories: 0=never 1=almost never 2=sometimes 3=fairly often 4=very often 1. In the last month, how often have you been upset because of something that happened unexpectedly? 4. In the last month, how often have you dealt successfully with irritating life hassles? 14. In the last month, how often have you felt difficulties were piling up so high that you could not overcome them?
Reference	Hewitt, P.L. (1992). The Perceived Stress Scale: Factor structure and relation to depression symptoms in a psychiatric sample. Journal of Psychopathology and Behavioral Assessment, 14(3), 247-257. Tutty, L.M., Bidgood, B.A., & Rothery, M.A. (1993). Support groups for battered women: Research on their efficacy. Journal of Family Violence, 8(4), 325-344.

Stresses and Hassles

SURVEY OF RECENT LIFE EXPERIENCES (SRLE)

Citation	Kohn, P.M., & Macdonald, J.E. (1992). The Survey of Recent Life Experiences: A decontaminated hassles scale for adults. Journal of Behavioral Medicine, 15(2), 221-236. Copyright © 1992 by Plenum Publishing Corp. Used with permission.
Description	This 51-item instrument was designed to measure individuals' exposure to hassles over the prior month. It contains six subscales: social and cultural difficulties [11 items], work [7 items], time pressure [8 items], finances [6 items], social acceptability [5 items], and social victimization [4 items]. This instrument was initially validated with a sample of visitors to the Ontario Science Centre, who were primarily young and highly educated. It has since been validated with a Dutch population (de Jong et al., 1996), but its generalizability to lower-income individuals and/or women who have experienced violence has yet to be determined.
Sample items:	Response categories (referring to "how much a part of your life" each experience has been in the prior month): 1=not at all 2=only slightly 3=distinctly 4=very much 1. Disliking your daily activities. 9. Too many things to do at once. 27. Financial burdens.
Reference	de Jong, G.M., Timmerman, I.G.H., & Emmelkamp, P.M.G. (1996). The Survey of Recent Life Experiences: A psychometric evaluation. Journal of Behavioral Medicine, 19(6), 529-542. Kohn, P.M., Hay, D.H., & Legere, J.J. (1994). Hassles, coping styles, and negative well-being. Personality and Individual Differences, 17(2), 169-179.

CHAPTER 8
DESCRIBING VICTIM SERVICES AND SUPPORT SYSTEMS

More than a third of STOP funds are being used to support a wide variety of victim services and other activities to help women victims of violence. The minimum goal of these projects is simply to make more victim services available, on the assumption that such services promote better victim outcomes (as detailed in Chapter 7). An additional goal of many projects is to change something about the ways that victim services are delivered. Some projects focus on changing the numbers or types of women they serve, which may also entail changing some aspects of service delivery. Some projects are trying to develop better integrated services within a full-service advocacy context. Other projects help victims during their involvement with a criminal justice system agency (e.g., Victim Witness Assistance Programs), and may have goals pertaining to system as well as victim outcomes (e.g., faster case processing, as well as better victim understanding of what is happening). Finally, some projects use advocacy to influence the ways that legal and other systems treat women victims of violence. Advocacy may occur at the case-specific level (i.e., helping an individual woman negotiate the system) or at the system level (i.e., trying to improve treatment of women in the system as a whole).

To know what has changed, one needs to know what has been. With respect to victim services and systems, this means we need to measure agencies using set of dimensions that encompasses their most important aspects—the dimensions across which programs serving and supporting victims may differ the most. This chapter offers a set of dimensions and some suggestions for how to assess them. The set is not exhaustive, but it does include quite a number of critical program and system characteristics. The examples it provides should give you some ideas about how to develop additional measures for yourself.

Institutional Affiliation, Organizational Structure, and Linkages

Activities supporting women victims of violence occur in many venues and are organized in many ways. As with changes to the civil and criminal justice systems described in Chapter 9, there are not many "tried and true" ways to measure the structure and activities of the programs and systems designed to help women. Table 8.1 offers a few of the most important dimensions, plus suggestions for possible measures. We cover venue, the structure of activities within an agency, and linkages to other agencies.

Venue

"Venue" refers to both the type of agency that houses a project and to its location. The type of agency offering the activity or service may be a justice system agency, another governmental agency, or a nongovernmental agency. Common agency types for STOP grants are law enforcement, prosecution, judicial, and nonprofit, nongovernmental victim service agencies, but others are also possible. You might want to start with the list in the SAR, items 17A and 17B (see Chapter 5), and expand it as necessary. Among the things you will want to examine are the following:

Whether the activities within a particular type of agency expand,
Whether some victim service agencies that existed prior to STOP begin receiving a larger (or smaller) proportion of the total funds available to support victim service activities,
Whether new types of agencies come into the picture (or try to) that have not previously been involved in victim services, and
Whether agencies change the mix of services they provide.

The location of STOP projects is also of great interest, given the focus of the VAWA on reaching underserved populations. Critical dimensions of location include:

Whether the program site is inner city, other urban, suburban, or rural,
Whether the program site is in or near major concentrations of women who would be considered underserved (e.g., women from urban racial, ethnic, cultural, or language minority communities), and
Whether the program is located in an agency with a history of serving women from designated underserved populations.

Structure of Activities

"Structure of activities" refers to several things, including vertical integration, duration of the problem integration, horizontal integration, and non-case-based advocacy. Vertical integration, in its ideal form, means that any woman coming for services is aided by the same staff person throughout the course of her case/situation/time with the agency. Thus one prosecutor does everything with a woman involved in prosecution, from the initial interview to the case preparation to the court appearance to the follow-up.

Duration of the problem integration means, again ideally, that the same agency, usually a victim support agency, offers a woman support from her first call through the entire duration of the violent conduct affecting her, the risk to her safety, and her psychological recovery. In many instances, this support extends well beyond the period during which a criminal case is in progress. This is especially important to note with respect to sexual assault, as most rape victims using hotlines, counseling, and other supportive services are never involved in a criminal case. It is also relevant to many battered women who come for counseling but are never involved in either a criminal or a civil case.

Horizontal integration, in its ideal form, means that an agency offers advocacy and assistance in all areas of a woman's life that the woman asks for or is interested in receiving. Such assistance may range from shelter and food to employment, housing, income support, education, child care and parenting, health screens, help with substance abuse problems, legal advocacy, psychological adjustment and recovery, counseling for the woman and sometimes other family and friends, safety planning, and so on. Help may be available through the program itself, or by linkages with other agencies. Linkages are especially likely for the more conventional benefits such as housing, income support, and child care.

Non-case-based advocacy is activity focused on system reform to enhance the safety and autonomy of women, prevent their further victimization by systems that are supposed to help them, see that offenders are held accountable, and increase the resources for helping women victims of violence. It also includes general advocacy to keep the spotlight focused on ending violence against women. Sometimes advocacy can also be case-based, as when an agency helps a particular woman deal with whatever part of the system she must negotiate.

Linkages

"Linkages" refers to the nature and variety of agencies, services, and activities with which your program has connections. "Connections," as used here, does not simply mean that you give out their phone number and tell a client to call. Rather, it means that you have some type of active arrangement with another program or agency to send your clients over, to help your clients get services, and so on. It is different from the situation in which your own agency staff goes with a client to help her negotiate the nitty-gritty of another agency such as the welfare office, the court, etc.—this situation falls under the "advocacy" category.

Services, Activities, and Clients

A major goal of programs operating under the victim services purpose area is to increase the variety and volume of services available to women. Another important goal of many STOP projects (not just those designated as "victim services") is to reach more women from the populations designated by the VAWA as underserved, including women from racial, ethnic, cultural, language, and sexual orientation minority communities, and rural women (who may also share other aspects of "underservedness"). Table 8.2 shows several objectives that are part of reaching these goals, including more and better services and advocacy; more women using services, especially those from underserved populations; and increased cultural competence of agency staff.

More and Better Services and Advocacy

You will want to document the type of services that have been developed or expanded, the level of service use, the quality of services, and their appropriateness to women's needs. Type of service to be expected will vary by type of agency, so there is no uniform list that will work for all STOP projects. Here are some examples of service types that might be relevant to different types of agencies:

For a victim-witness assistance program (VWAP), services might/should include interviewing a woman who has a case pending with the agency to gather relevant data, informing her of what to expect from the agency, developing a safety plan with the woman, keeping her informed about the progress of her case and the location/situation of the offender (if he is in custody), preparing her for testimony/ court/other events where she must be present, offering information and linkages to sexual assault/domestic violence services in the community.
For a sexual assault nurse examiner (SANE) program, services might/should include providing a private space for the interview and examination, timely and courteous treatment, information and counseling, explaining the steps in the exam, performing the evidentiary exam, providing medical treatment as necessary, providing support for any accompanying family or friend, offering follow-up contact and/or information and linkages to sexual assault services in the community, secure transfer of evidence, court testimony.
A hotline program might be expected to provide information and referral, telephone counseling, offers of in-person counseling or other services, callback or follow-up if the caller wants it or accepts it when it is offered. The content of telephone counseling might be further differentiated into concrete information, help thinking about safety, help anticipating the future, help dealing with feelings, help dealing with reactions of others, and so on.
A domestic violence shelter might be expected to provide food and lodging; safety planning; assessment and case management; links to public programs such as income maintenance, housing, or child care; help with parenting; court advice and advocacy for both civil and criminal cases; and so on.

The variety and volume of services or activities are relatively easy to document or measure. Documentation of variety will come from published agency documents, and can also be verified through interviews with program staff and clients. Service volume or use levels have a clear meaning and lend themselves to counting (e.g., number of hotline calls last month, number of nights of shelter provided). However, it can be a laborious task to establish the data system that will let you record service use levels, and an even more strenuous exercise to get program staff to enter the necessary data into the system. See Chapter 14 on developing data systems for thoughts on these issues.

The quality and appropriateness of services are considerably more difficult to conceptualize and measure. Quality can be measured in any number of ways, including timeliness, supportiveness of interactions between program and client, completeness in relation to scope of the client's issues and problems, and appropriateness. Appropriateness can be defined for this purpose as "Did she get what she needed, and did she get it when she needed it?" We suggest several sources of information on quality and appropriateness, each of which has its advantages and disadvantages. These sources are client feedback, expert ratings, and process feedback from key informants in the community who interact with the project and its clients.

Client feedback means asking clients whether they feel the services they received were what they needed, when they needed it. They can be asked whether they were satisfied with the services in general, and with various aspects of particular services. They can be asked whether there was anything they needed that they did not get, or whether they were treated in any ways that put them off. Nothing can replace this type of direct feedback from clients, and most programs find it invaluable in defining what they do well and what they could do better.

However, clients are generally inclined to be satisfied with services, especially when the questions asked are very broad. In addition, they usually do not know the full scope of what they might have gotten in an ideal full-service one-stop shopping agency that puts client needs uppermost. So it helps to get a reading on service quality from other sources in addition to clients. Therefore we include expert ratings and key informant feedback, as described below.

Expert ratings are one other source for evaluating service quality. Experts could be given a sample (random, of course) of case records containing client need assessments and documentation of agency contacts and service use. For each case they are asked to rate the thoroughness of the need assessment, the appropriateness of the help given to the needs revealed by the assessment, and areas where help might have been important but where it was not given. If they are familiar with the community, they might also be asked to identify opportunities for assistance that exist in the community but were missed in the case at hand.

An alternative approach to expert ratings can be used for the many agencies that do not keep written case records. In this alternative approach a meeting is set up with agency staff and the outside experts. The experts bring sample cases, usually starting with very simple ones and proceeding to ones of greater complexity. Staff describe how they typically handle each case, what resources they would call upon, how long they would follow the case, what they would look for as complicating issues, and so on. The experts can then reach a judgment about the general comprehensiveness and appropriateness of staff efforts with respect to a variety of cases.

Key informant feedback is a process analysis technique used to obtain perceptions of the program and its services from knowledgeable people in the community. These qualitative data can give you an idea of whether other agencies perceive the referrals they receive from the project being evaluated as appropriate, whether adequate documentation comes with the referrals, whether project staff are easy to work with on individual cases, and so on. Even if an adversarial relationship sometimes exists between the key informant's agency and the project being evaluated (due to advocacy work or other past interactions), you still can often get valuable information about the respect with which the project is perceived and the quality of the supports it offers to women. In fact, it can be extremely useful to know that the police department, prosecutor's office, or judge perceives the victim support program to be an effective advocate for its clients and sees it as having stimulated some changes in the system.

More Clients/Users, More Underserved Clients/Users

Your intake forms and other case records can give you information about important client characteristics. You should include all the characteristics requested by the Subgrant Statistical Summary (see Chapter 5), especially those related to characteristics that identify underserved populations. You will probably want to add more characteristics to your own forms, depending on the information you have found to be essential to helping women victims of violence.

VAWGO identifies women from racial, ethnic, cultural, language, and sexual minority groups, and rural women, as "underserved." Projects engaged in special outreach efforts to make services available to women from these groups in culturally competent ways should count their activities, and the women they reach, as relevant to the "underserved" criterion. However, if your agency routinely serves, or deals with, women from these groups just because that has been the nature of your business, and nothing has changed about the way you do it, then you do not meet the "underserved" criterion in VAWGO's terms. That is, if you are a VWAP in a police department, and serve mostly African-Americans because your jurisdiction has mostly African-American residents and that's who the department sees, this is your "traditional" population. You serve them all the time; they are not "underserved" by your department. If you make a special effort to reach members of another population, or if you do something substantially different so as to become more culturally competent and acceptable to the African-American community, then you can count the new and/or better-served women as "underserved" in VAWGO's terms.

Increased Cultural Competence

A very large agency might think about assessing cultural competence by comparing the demographic makeup of its staff to that of the community it serves. However, in a small agency, the percentage figures this would produce will not have much meaning. It may be more important to ask whether staff have had adequate training on cultural competence issues; to assess staff attitudes toward women from underserved populations, both in general and with regard to violence against women; and to identify the presence or absence of language competencies. In addition, it is critical to ask clients about their experiences with staff in relation to cultural competence. You could also ask the community at large about their perceptions of the agency as being hospitable to community members. You can do this through focus groups, interviews with key informants from the target underserved communities, or community surveys of the general public.

Table 8.1 Institutional Affiliation, Organizational Structure, and Linkages
Objectives	Specific Measures	Data Collection Procedures	Caveats
Expanded venues—more agencies, in more places, offer new or expanded victim services. Type of agency Location of agency VAW focus	Categorical data, specifying agency type and location. Use categories from SAR, or develop expanded categories. VAW focus = agency whose primary mission is offering service to women victims of violence. May be further differentiated into DV, SA, or both. If the agency's primary mission is not VAW, this variable would be scored zero.	Review of agency documents, grant application. Observation.	When identifying nonprofit, nongovernmental victim service agencies, be sure to use separate categories for generic victim services (e.g., VOCA, crisis hotline) and agencies that have a specific focus on violence against women.
Structure of services and activities increase in vertical, duration of problem, and horizontal integration,, and in non-case-based advocacy.	Duration of contact = average amount of time your agency is in contact with a client. Amount of vertical integration = number of people at your agency who deal with the average client during that client's contact with your agency. Amount of duration-of-problem integration = proportion of time-from-beginning-to-full-resolution that your agency offers support to clients. Amount of horizontal integration = proportion of life issues relevant to clients that your agency routinely provides. Amount of advocacy = percentage of staff time, or of agency budget, devoted to non-case-based system advocacy.	You will need a "measuring stick" of what "full service" means for your clientsCwhat is the complete package of assessment, services, etc. that would meet their needs. Then you have to examine your agency's activities to see what proportion of that need you address (horizontality). For duration-of-problem, the measuring stick should relate to time-from-beginning-to-resolution (as defined by the woman), and how much of that time your agency is available to the client. Also case records, agency budgets and staffing patterns, process analysis of agency activities, time and staff allocations.	There is no standard "measuring stick." You may have to do quite a few complete client needs assessments to get the whole picture, even if your agency has no capacity to meet many of these needs.
Agencies establish more linkages with important supports for women throughout community.	Number and type of agencies/services/activities with whom your agency commonly shares clients. Nature of linkagesCformal agreements between agencies; informal arrangements between line workers, co-location of staff or joint action in response to VAW incidents, degree of shared understanding, cross-training, etc.	Case records of your and other agencies. Client interview data reporting services received from other agencies and help received from your agency in accessing those services. See also Chapter 10 for measures of collaboration.

Table 8.2 Characteristics of Services, Activities, and Clients
Objectives	Specific Measures	Data Collection Procedures	Caveats
More and better services, activities, and advocacy will be available to women. Types of services/advocacy Amounts/Use levels Appropriateness and Quality	Number of different types of services, activities, advocacy; number given by program, number given by referral. Number of women using each type of service, help. Number of units given, for each type of service or help. Proportion of clients who receive some advocacy assistance in addition to direct services. Proportion of clients who got what they needed, when they needed it. Proportion of clients satisfied with help received.	Types of services/activities documented by content analysis of agency documents. Numbers of services, use levels, and source of service documented from agency records, including case records/client tracking system. Client ratings of own service, from interviews; expert ratings of sampled cases, using victim service advocates from outside the agency being evaluated; process feedback from key informants.	Expected/desirable types of services/activities will vary by agency. You will need to develop a "standard" for your type of agency and then assess your own activities against it.
More women, and more women from previously underserved populations, will use services/advocacy.	Number of women using each type of service or help, categorized by "underserved" characteristics. Number of women with other characteristics of importance to your project.	Client/case records and interviews. Client databases.	Be sure to define "underserved clients" as women from a group that has not usually received services and whether your project has a specific focus on reaching this population.
Cultural competence of staff will increase.	Percent of staff trained in cultural competence. Variety of resource materials available in relevant languages. Staff demographic profile compared to community demographic profile. Staff attitudes are supportive, accepting, positive. Client ratings of staff cultural sensitivity.	Examine agency training records. Examine resource materials. Compare agency personnel records with census data on community population. Use personnel records to assess staff language competencies. Use standard attitude measures. Client interviews.

CHAPTER 9
EVALUATING CRIMINAL AND CIVIL JUSTICE AGENCY CHANGES

One of the most important long-range goals of the STOP program is to change permanently the way justice system agencies do business. Change might take many forms, from law enforcement officers making more arrests, to prosecutors pursuing cases more vigorously, to courts imposing more victim-oriented conditions and sentences, to probation and parole officers supervising offenders more closely. Whatever type of change your STOP-funded project aims to produce, it is important to document the project's effects on how agencies work so you will know what you have accomplished and what is left to be done. This chapter provides some guidance in how you can go about anticipating and measuring changes in agency functioning.

Many different types of STOP projects can potentially impact how justice system agencies do their jobs. Projects falling under the training, special units, and policy development purpose areas clearly intend to improve the performance of justice agency personnel by providing them with more or better knowledge, resources, or work guidelines. Data system development projects may also be designed to help justice personnel do better work by providing them with improved information for making arrests, prosecuting cases, and sentencing and supervising offenders. Projects in the victim service area may include a collaboration component to help justice agencies serve victims better or develop improved training or policies. Projects on stalking or for Indian tribes might include any of a wide variety of activities, many of which may help justice agencies work better.

Even projects that are not explicitly intended to change how an agency works might bring about some changes along the way. For example, a private nonprofit victim service agency may be funded to provide in-court support to victims. Having these staff available may free up the victim/witness staff in the prosecutor's office to put more time into preparing victim impact statements, which may help judges make more informed sentencing decisions that provide better protection and justice to victims. If you predict any changes in how an agency works or is structured when you specify your project's logic model (see Chapter 2), whether these changes are the end goal, an intermediary step, or simply a by-product, you should consider how to measure these changes so you will have a thorough understanding of what your project has accomplished and how it made things happen.

You should also think about documenting or measuring agency changes that may happen completely independently of your project. These may affect your project and help explain its impact. But don't overdo it. Justice agencies can be laden with long-standing traditions and bureaucracies that resist change. When you consider what changes might come about and how you would measure them, be realistic about what you can reasonably expect given your STOP project's activities, timeline, and the general flexibility or inflexibility of the system you're working in. Also bear in mind the time frame of your evaluation and the resources you'll have available for evaluation activities when you decide what are the most worthwhile evaluation issues to pursue.

The rest of this chapter presents some ideas about what sorts of changes might occur in justice agencies and how to measure them, whether they are the goals of STOP projects or not. Tables are provided for each of the four major types of justice agencies—law enforcement (Table 9.1), prosecution (Table 9.2), judges and courts (Table 9.3), and corrections (Table 9.4). Each table provides some suggestions for how the agency might change, how you could measure each type of change, and where to get that information. These are only suggestions for the most general or obvious types of changes that might occur. You should consider what changes your project is likely to produce when you select from among these or devise your own.

This chapter is concerned with changes in tangible behaviors and agency resources rather than changes in individuals' attitudes, knowledge, or beliefs. Since behavioral change is not the sort of thing that lends itself to standardized measures, it's a bit more complicated than putting together a questionnaire and handing it out. In some cases the information you need may be available in an agency's case files or, better yet, in its automated records. In other cases you may wish to interview concerned parties to get richer, more in-depth information on how and why things have (or haven't) changed. Some creativity may be required in figuring out the least intrusive and most efficient way to get the information you need, and how to get the agency concerned to be willing to make any necessary changes in the way records are kept. Thinking through the issues involved in maintaining confidentiality will stand you in good stead here. When people feel they can trust you to keep their names separate from any of the things they tell you, they will tell you a lot more. If, on the other hand, they feel they cannot trust you and that information they give you may come back to haunt them, they will be a good deal more close-mouthed and you will not learn what you need to know.

These variables and measures can be used to make a number of different types of comparisons, depending on your research design. If you are using a pre/post design, for example, you can compare interview and evidence collection techniques before and after the officers in the law enforcement agency were trained or the new policy was implemented. Or, as another example, you can compare prosecutors' charging practices across jurisdictions if you are using a non-equivalent comparison group design. The critical point is to select the appropriate variable reflecting agency change, decide how best to measure it, and then measure it as consistently as possible across the time periods or sites your research design includes.

Also please note: there is no need to treat the suggestions in the following tables as gospel. They are merely suggestions of objectives and some measures that might be relevant to them. Certainly you should use these suggestions to stimulate your own thinking about some measures that you feel would offer convincing evidence that you have met your goals. Modify our suggestions to fit your circumstances, and add your own.

Don't Forget Background and External Factors

Tables 9.1 through 9.4 identify outcomes of interest and their potential measures within justice system agencies. Collecting data on these outcomes lies at the heart of any evaluation design. They are not, however, the only data you need to collect. Be sure to construct a logic model of your project, in which you note not only the services you offer and their outcomes, but also the background and external factors that could make your job harder or intervene to nullify the impact of the services you offer. You will need to include plans for collecting data on these factors as part of your evaluation design. Data on background factors (characteristics of clients, of staff, etc.) may be available from intake interviews, case records, personnel records, or research interviews. Data on external factors are most likely to come from process analysis, using any of the methods described in Chapter 6.

Table 9.1: Impacts on Law Enforcement Agencies
Objectives	Specific Measures	Data Collection Procedures	Caveats
Improve dispatchers' identification of calls for service as VAW.	Percent of correct classifications = number of calls which both dispatchers and responding officers classified as either VAW or not VAW, divided by the total number of VAW calls to which officers were dispatched.	Dispatchers must record, for every call, classification made (VAW vs. non-VAW) and whether officers were dispatched. Officers must record, for every call they're dispatched to, classification made (VAW vs. non-VAW).	This assumes that the classification of responding officers is the correct one, and judges dispatchers' classifications against them. Multiple calls on the same incident should be counted as a single event.
Improve dispatchers' response to calls for service by prioritizing VAW calls and always dispatching officers as soon as possible; accessing and providing officers with information on the history of the perpetrator; and providing the victim or other caller with crisis intervention and/or safety planning.	Priority rating for VAW calls = average priority assigned (sum of priority assigned to each VAW call divided by the total number of VAW calls). Dispatching rate = number of VAW calls to which officers were dispatched to respond divided by the total number of VAW calls. Timeliness of dispatching = number of minutes between time an officer becomes available to respond (or time of call if officers are available immediately) and the time he or she is dispatched. Provision of information to officers = number of calls in which the dispatcher provides any information on the perpetrator's criminal history, Brady indicators, protection order history, firearms possession or use history, or history of injuring victims, divided by the total number of calls for which such information is available. Services to victims/callers = proportion of calls in which dispatchers attempt to help the victim or caller meet immediate safety needs while waiting for the officers to arrive.	Dispatchers must record priority rating assigned to every VAW call, whether officers were dispatched to respond, time of call, time an officer became available, time officer was dispatched, provision of perpetrator history information to officers, and response to victim/caller.	Multiple calls on a single incident should be counted as one event.
Improve officers' response to dispatchers' instructions by increasing their response rates and timeliness.	Response rate = number of VAW calls to which officers responded on the scene divided by the number of VAW calls to which they were dispatched. Timeliness of response = number of minutes between the time an officer is dispatched and the time he or she arrives at the scene.	In addition to dispatchers' records, above, officers must record whether they responded to the scene and, if yes, their time of arrival.	Any delays in response that look unusually long should be assessed by examining the officers' location when dispatched and how much distance and traffic they had to get through to reach the scene of the call.
Improve interviewing, evidence collection, and report writing by responding officers and detectives to whom cases are referred for follow-up work. Incident reports and other relevant documents should be available to victims, prosecutors, and victim services programs more quickly.	Quality of on-scene and follow-up interviewing practices with victims, perpetrators, and witnesses = average ratings of interviewing practices. Good practices include on-scene separation of parties before interviewing (if DV), conducting interviews in a private setting, showing respect and courtesy, asking appropriate questions, and making thorough notes (including "excited utterances" that may be admissible in court without the victim's corroboration). Quality of documentation of physical evidence = average ratings of documentation practices. Good practices include collection of clothes or objects damaged in the assault, use of technology (e.g., photographing injuries or crime scene, DNA testing, referral for SANE or other rape kit examination, etc.), follow-up evidence collection (e.g., returning to photograph injuries several days after the assault when bruises have appeared), securing the chain of custody of evidence, and other means to improve casework. Percent of calls for which written report completed even if no arrest is made. Average number of days between incident and availability of report to victims, prosecutors, and victim service programs.	Quality of interviews can be assessed with surveys of victims, perpetrators, and witnesses. Criteria can include whether interviews were conducted in a private and secure context; interviewees' ratings of officer demeanor; and content of questions. Case files can also be examined to assess interview setting; content of questions; and whether notes were logical and thorough. Quality of evidence collection can be measured by reviewing case files and comparing them against standards of practice and resources available in the jurisdiction. For example, if victim reports injuries or damage and cameras are available to officers, there should be photographic evidence. Compare dispatch records to arrests to written reports, from agency records. Compare date of incident with date report becomes available.	Criteria for good investigation practices should be informed by local policies, regulations, and resources. Analyses should examine variations as well as average ratings to identify model practices and areas for improvement.
Improve assistance to victims by officers, detectives, advocates, counselors, and other law enforcement personnel.	Percent of victims who are offered assistance such as notification of rights, service referrals, and assistance with immediate needs (such as transportation for emergency medical treatment or shelter, or referral for hotline crisis counseling). For victims offered each type of assistance, percent who received or did not receive it, and reasons why not (e.g., victim refused, assistance not immediately available, victim not eligible for assistance, etc.). Percent of victims who are offered follow-up assistance such as case updates, information on the legal system, and service referrals. For victims offered each type of assistance, percent who received or did not receive it, and reasons why not. Can also measure the number of offers over time and the length of time over which the victim receives offers of assistance. See also Chapter 7, proximate measures, for victim feedback on agency performance; Chapter 8 for characteristics of victim services.	Victim assistance can be assessed through victim surveys and reviews of law enforcement case files documenting assistance offered and whether victims accepted or declined the assistance. Victim surveys should also identify types of help that victims needed but were not offered.
Officers and detectives should enforce laws against VAW more vigorously. More cases should be substantiated, more warrants issued, and more arrests made.	Percent of substantiated cases = number of cases of DV or SA found to have probable cause divided by the total number of cases investigated. Percent of cases pursued = number of substantiated cases in which search warrants, arrest warrants, or other types of follow-up options were both needed and obtained, divided by the number of substantiated cases where follow-up actions were needed (whether they were accomplished or not). Percent of cases with arrests = number of substantiated cases with an arrest divided by the total number of substantiated cases. Percent of cases in which victim's account of the incident (or 911 description) corresponds with the information that police included in their report, and with the resulting charge, if any.	Automated agency database should contain, for each case, whether the case was substantiated and what follow-up actions were taken, including warrants, arrests, and other relevant legal options. If there is no adequate automated database, this information should be available in case files. 911 records, or independent interviews with victims, will be needed in addition to information from above sources.	There should be a trend away from victim-initiated actions and toward officer-initiated actions. If case files are reviewed, look for signatures indicating victims took the initiative to request the action (such as obtaining an arrest warrant or making an arrest).
Agencies should have more and better resources to respond to cases of VAW.	Percent of agencies which have thorough, progressive written policies on VAW cases. Percent of agency personnel who receive training in a timely fashion when new policies are implemented. Percent of agencies that have adequate equipment and supplies to handle all VAW cases using up-to-date technology. Percent of agencies that regularly monitor dispatchers', officers', and other relevant staff's responses to calls and sanction non-compliance with agency policies.	Assess thoroughness and progressiveness of policies by content analysis. Policies should address a variety of important current topics. Assess training by review of training records for all agency personnel. Assess adequacy of equipment and supplies by making an inventory and interviewing managers and officers for their perceptions of whether they are sufficient given the number of cases. Interview managers, officers, dispatchers, and other staff to identify supervision practices and areas for improvement.	Important policy topic areas may be determined in part by current issues in the state or community. Some topic areas might include warrantless arrest, officer involvement in DV, providing services to victims, coordinating with other agencies, etc.
Agency staff should be culturally competent and have culturally appropriate resources available to them.	Percent of staff trained in cultural competence. Number and types of cultural competence resource materials available (e.g., information on cultural beliefs or norms about women and violence, for populations found in the jurisdiction). Staff demographic profile compared to community demographic profile. Percent of victims with language barriers for whom an appropriate interpreter is provided. Number and types of forms and brochures available in languages spoken by the population of the jurisdiction.	Examine agency training records and resource materials. Compare agency personnel records with census data on community population. Examine case records, and also records of use of interpreter service, if available. Use personnel records to assess staff language competencies. Examine forms and informational materials available to the public.
Special units or other ways of giving special attention to DV and SA cases should be more widespread, should handle many of the cases they were designed for, should be staffed with qualified personnel who should have reasonable caseloads, and should make valuable contributions to the agency's functioning.	Percent of agencies with special units or other ways to give special attention to DV, SA, or both. Percent of the total DV, SA, or DV and SA cases which are referred to the special unit (depending on what type or types of cases the unit is designed to handle). Number (in FTEs) and types of staff assigned to the unit (may include officers, detectives, counselors, or other personnel). Staff's average level of training and experience in DV, SA, or both. Number of cases per staff member or team (depending on unit's structure) as measure of caseload. Average rated value of the unit's contributions to the agency. Reduced number of interviews with victim, people involved with victim, duplicate information-gathering by prosecutors, allowing smoother transition to prosecution.	Assess special units or staff designated to give special attention; the number, types, and qualifications of personnel; and the improvements in case processing through interviews with management and with victims, and through reviews of agency policies, organizational charts, annual reports, personnel records, etc. Measure case referrals and caseloads through agency database (or case file reviews). Measure unit's contributions through interviews with managers and staff both inside and outside the unit, and with victims.
Staff should be well-trained in VAW cases.	Percent of staff who meet minimum requirements for VAW training, as specified by state POST or other training authority, or by state or local law, policy, regulation, etc. Percent of specialized staff who have advanced training in DV, SA, or both, depending on type of unit. Percent of staff following correct procedures learned in training (e.g., training to identify the primary aggressor measured by reductions in number of dual-arrest cases).	Measure by review of personnel records or records of the training authority.
Agency's caseload should increase due to higher reporting rates and greater case activity (e.g., more investigation, more arrests, etc). Increased caseload will lead to the need for additional staff, overtime, reorganization, or redeployment of staff.	Reporting = number of VAW reports divided by the total number of crime reports, including 911 calls and walk-in reports at police stations and other places officers are stationed (such as hospital emergency rooms). Caseload = the average number of cases per staff and the average amount of time put into each case. Increased demand arises from increases in staff caseload and time spent per case, over time.	Dispatchers and others receiving reports must record classification of report (VAW vs. non-VAW). Caseload should be available in the agency's automated or manual management information system database. The agency's response to increased demand can be measured through management information on hiring, use of overtime, staffing deployment changes, and reorganization.	If improved classification procedures are implemented, the evaluator must be very careful that differences in reporting rates are truly due to changes in public reporting rather than changes in dispatcher classification procedures.

Table 9.2: Impacts on Prosecution Offices
Objectives	Specific Measures	Data Collection Procedures	Precautions
Tougher case filing practices should be followed. This includes filing charges in more cases referred from law enforcement, and filing more charges, filing charges at higher severity levels, and filing charges which clearly indicate a crime of DV or SA (laws permitting). Prosecutors should seek out information on the perpetrator's history and use it in charging decisions.	Percent of cases referred from law enforcement in which charges are filed vs. dropped vs. other disposition. For cases with charges filed, average number of charges and severity of the charges (i.e., felony vs. misdemeanor, class A vs. B, or other indicators of charge severity). Percent of cases referred from law enforcement as VAW which are charged with offenses not recognizable as DV or SA (e.g., disturbing the peace or drunk and disorderly rather than DV assault, assault or indecent liberties rather than sexual assault). Percent of cases in which prosecutors obtained information on the perpetrator's history of protection orders, arrests, court cases, probation/parole status, firearms violations, and so on, and whether more cases with serious histories were charged more severely (compared with cases with comparable current allegations but less serious histories).	Automated office database should contain, for each case, whether and what charges were filed. If there is no automated database, this information should be available in case files. Whatever the source, charges should be specified as VAW-related or not VAW-related at each stage of the process, and this classification must be retained at each stage so changes in status can be noted. Prosecutors could be interviewed to determine their use of perpetrator's history information in charging decisions.	Filing practices are more likely to be improved where law enforcement agencies have strong evidence collection and case preparation policies and practices. In assessing the severity of the charges filed, it is important to understand local conditions and policies since filing higher-level charges may not always be the best strategy for obtaining a conviction or a harsher penalty. For example, in some jurisdictions stronger penalties may be more likely by filing misdemeanor than felony charges, due to local court practices or other factors. For states which do not have special codes for DV or SA crimes, charge severity must be used as the measure of "charging down" at the filing stage.
More indictments should be sought and obtained, on more and higher charges. When guilty pleas are negotiated, less "down-charging" should occur. Requirements to attend treatment should still be accompanied by sentencing (charges should not be dropped).	Percent of cases brought forward for indictment vs. dropped vs. guilty plea reached pre-indictment. For cases brought to the grand jury, percent on which an indictment is obtained. Differences in the number, severity, and type (VAW vs. non-VAW) of charges brought for indictment vs. charges on which an indictment is handed down. Differences in the number, severity, and type of charges to which a guilty plea is entered vs. charges filed, when a guilty plea is negotiated.	Automated agency database should contain, for each case, whether an indictment was sought, a guilty plea entered, or the charges were dropped. With an indictment or plea, the database should include charges in the indictment or to which the defendant pled guilty. When an indictment was sought, information should also be included on the grand jury's decision. If there is no adequate automated database, this information should be available in case files.
Practices for setting the amount and conditions of bail should involve a systematic risk assessment, including risk of both elopement and further violence, and should provide more protections for victims.	Percent of cases in which a systematic risk assessment is conducted, and the nature of that assessment. For high risk cases, the number of special conditions imposed to protect victims, such as stay-away orders, surveillance conditions, participation in treatment for violence and substance abuse if appropriate, and the like.	Interviews with prosecutors and pretrial services staff can identify the type and use of risk assessment procedures. The automated agency database (or hard copy case files) can be reviewed to determine whether cases identified as high risk are more likely to receive special conditions (compared with lower-risk cases and/or high-risk cases from another jurisdiction or before risk assessment procedures were instituted).
Indicted cases are prosecuted more vigorously. Prosecutors make more use of alternative evidence presentation strategies (besides victim testimony); charges are not charged down as much in the plea negotiation process; and fewer cases are dropped.	Percent of prosecutors willing and able to pursue cases with evidence other than victim testimony when the victim is reluctant (such as admissible hearsay evidence, forensic evidence, witness testimony, expert testimony, etc.). For cases in which a guilty plea is reached through negotiation, differences in the number and severity of charges to which the defendant pleads guilty vs. indicted charges. Percent of indicted cases in which the charges are subsequently dropped.	Measure prosecutors' willingness and ability to use alternative evidence presentation strategies through interviews with prosecutors to determine when they use such strategies and what the barriers to using these strategies are. Measure "down-charging" and percent of indicted cases which are dropped through automated agency database or, if none available, reviews of case files.
Stronger coordination mechanisms are used. Prosecutors use practices that ensure integration of cases across civil and criminal courts, and across misdemeanor and felony offenses. State and local prosecutors coordinate with U.S. Attorneys and federal agencies to ensure prosecution under the most appropriate jurisdiction. Mechanisms should also be used to facilitate cross-jurisdictional enforcement of protection orders.	Percent of cases with involvement in more than one court in which a single prosecutor handles all cases or prosecutors closely coordinate their work. Percent of cases in which prosecutor obtains and makes case decisions based on information about all relevant cases regardless of which court they appear in. Provisions of working agreements between prosecutorial offices and other relevant federal agencies (such as the Bureau of Alcohol, Tobacco, and Firearms, and the Federal Bureau of Investigation), and how well these links function. Usefulness of policies, legislation, databases, and other resources to enhance enforcement of civil or criminal protection orders from other states, counties, cities, territories, or tribal areas.	Searches of case documentation, either electronic or manual. Interviews with state, local, and federal prosecutors, and investigators in federal agencies, to assess working relationships and areas for improvements. Interviews with law enforcement, prosecutors, and court administrators and support staff to assess the infrastructure supporting cross-jurisdiction protection order enforcement, and areas for improvement.
Better case outcomes: more convictions are obtained, on more and higher charges. Method of case disposition may or may not change. Case processing should become more efficient, with expedited docketing, fewer continuances, and timely victim notification of all phases of the case.	Percent of indicted cases with each type of disposition: conviction by guilty plea, conviction by trial, acquittal, diversion or suspension, or charges dropped. For cases with guilty plea or verdict, number and severity of charges of conviction. Time from case filing to disposition, number of continuances, and number and timing of victim notification contacts.	Measure case outcomes through an automated agency database which should contain, for each case, date and type of disposition for each charge (also need date of initial case filing to compute case processing time). If there is no adequate automated database, the disposition information should be available in case files.	Local practices and conditions should be very carefully examined to determine what are desirable disposition types and favorable outcomes. For example, plea bargaining can be a very useful tool if cases are not bargained down to minor offenses with light penalties that do not protect victims. Faster case processing time may be desirable from the system's perspective but must not be too fast to deliver justice and protection to victims.
Improve assistance to victims by prosecutors, investigators, paralegals, victim/witness staff, and other personnel.	Percent of victims who are offered assistance such as notifications of rights, service referrals, case updates, information on how the court system works, and support for participation in the process. For victims offered each type of assistance, percent who received or did not receive it, and reasons why not (e.g., victim refused, assistance not immediately available, victim not eligible for assistance, etc.). Can also measure the number of assistance offers over time, the length of time over which the victim receives offers, and offers to victims who are not able to testify. See also Chapter 7, proximate measures, for victim feedback on agency performance; Chapter 8 for measures of victim services.	Victim assistance can be assessed through victim surveys and reviews of case files documenting assistance offered and whether victims accepted or declined. Victim surveys should also identify types of help that victims needed but were not offered. In addition, process analysis should use semi-structured interviews with advocates and other knowledgeable people to identify problems, barriers, and gaps in assistance as well as improvements and successes in serving victims.
Prosecutors' offices should have more and better resources to respond to cases of VAW.	Percent of offices with thorough, progressive written policies on VAW cases. Percent of office personnel who receive training in a timely fashion when new policies are implemented. Percent of offices with special units on DV, SA, or both. These units may consist of prosecutors, paralegals, victim/witness counselors, investigators, or other personnel. Special units' caseloads and the average rated value of the unit's contributions to the agency. Number of prosecution staff with whom victim must interact (fewer is better). Consistent responses and information about case from all prosecution staff.	Assess thoroughness and progressiveness of policy by content analysis. Policies should address a variety of important current topics. Assess training by review of training records for all office personnel. Assess special units and the number and types of personnel through interviews with managers, reviews of office policies, organizational charts, annual reports, personnel files, etc. Assess special unit's caseload through agency database (or case file review). Measure the unit's contributions to the agency by interviews with staff inside and outside the unit. Assess by reviews of case records, and by interviews with victims.	Important policy topic areas may be determined in part by current issues in the state or community, and may include aggressive prosecution practices, caseload management, special court structures, providing services to victims, coordinating with other agencies, etc.
Prosecutors and other agency staff should be culturally competent and have culturally appropriate resources available to them.	Percent of staff trained in cultural competence. Number and types of cultural competence resource materials available (e.g., information on cultural beliefs or norms about women and violence, for populations found in the jurisdiction). Staff demographic profile compared to community demographic profile. Percent of victims with language barriers for whom an appropriate interpreter is provided. Number and types of forms and brochures available in languages spoken by the population of the jurisdiction.	Examine agency training records and resource materials. Compare agency personnel records with census data on community population. Examine case records, and also records of use of interpreter service, if available. Use personnel records to assess staff language competencies. Examine forms and informational materials available to the public.
Staff should be well-trained in VAW cases.	Percent of staff who meet minimum requirements for VAW training, as specified by the state prosecutors' association, or by state or local law, policy, regulation, etc. Percent of staff in special units who have advanced training in DV, SA, or both, depending on type of unit.	Measure by review of personnel records or records of the training authority.
Prosecution office's caseload should increase because of these improvements and/or improvements in law enforcement case preparation and referrals. Having more cases and prosecuting them more vigorously will lead to the need for additional staff, staff overtime, or redeployment of staff.	Caseload = the average number of cases per staff and the average amount of time put into each case. Increased demand arises from increases in staff caseload and time spent per case, over time.	Caseload should be available in the office's automated or manual management information system. The office's response to increased demand can be measured through management information on hiring, use of overtime, staffing deployment changes, and reorganization.

Table 9.3: Impact on Judges and Courts
Objectives	Specific Measures	Data Collection Procedures	Precautions
Court policies should make protections such as civil orders more available by streamlining and simplifying forms and procedures, providing more and better assistance, and removing barriers such as fees.	Number and types of changes in forms and procedures courts have made, and average ratings of usefulness of changes. Percent of victims who receive and court personnel who offer assistance in filling out court forms. Types of help given and average ratings of helpfulness, and types of help still needed. Number and types of barriers to court protection which have been removed. Percent of protection orders issued for which fees were waived for issuing the order; for serving the order. Existence of a protection/restraining/stay away order registry, and percent of all orders, including emergency orders, that become entries on the registry.	Changes in court forms and procedures, and removal of barriers, can be assessed by interviewing court personnel and reviewing forms and policies for changes. Interviews with staff who work in the court setting (whether court employees or private victim service agency staff) and surveys of victims can provide information on how helpful changes in forms and procedures have been, how many victims receive assistance when needed, who offers assistance, what type of assistance is available, how helpful the assistance is, and what needs are left unaddressed.
Judges and other court personnel should have access to, and should use, information from other court cases involving the same victim and/or perpetrator in making sentencing decisions.	Number of cases where victim and/or perpetrator are involved in related cases in other courts for which judge has access to information about the other cases; number of cases in which the judge uses that information to make sentencing decisions.	Review of case records; assessment of electronic or other mechanisms for acquiring relevant information from other cases; interviews with judges and staff conducting pre-sentence investigations.
Case activity should increase as a result of greater case activity by law enforcement, prosecution, or corrections, and/or from new court policies or procedures making it easier to obtain court protection. Courts should issue more warrants (e.g., arrest warrants, search warrants), subpoenas, and orders (e.g., protection orders) to facilitate case investigation and prosecution, and to protect victims. Courts should hold more hearings and impose more penalties in response to reported violations of civil and criminal court orders and conditions of diversion, bail, probation, or parole.	Measures may include the number of such court actions, the percent of cases with such actions, or the percent of requests which are affirmed, as appropriate. Measures may include the number of hearings, the percent of cases with hearings, the percent of hearings which result in a finding of a violation, and the severity of penalties imposed on violations.	Automated court database should contain, for each case, information on warrants, subpoenas, and orders requested from the court, and result of the request (issued or denied). Violation reports, hearings, findings, and actions should also be available. If no adequate automated system exists, this information should be available in case files.
Judges should make court orders more responsive to victims' situations and needs, and provide better protection and justice to victims. For example, there should be fewer mutual protection, restraining, or no-contact orders, and protection and other orders (such as separation, divorce, or custody decrees) should contain more provisions needed by victims. Adequate procedures for serving orders and documenting proof of service should be followed.	Percent of cases without mutual orders, the typical provisions of orders, service and documentation procedures, and the percent of victims who are satisfied with the provisions of their orders.	Surveys of victims and/or advocates can assess what orders are issued, what the provisions of the orders are, whether these provisions address the victims' circumstances and need for safety and justice, and how the service process worked. Reviews of automated or manual court files can also assess what orders are written and their provisions and service procedures.
Judges and other court personnel, including those doing pre-trial and pre-sentence investigations, should give victims' needs and input more consideration. Better protection should be provided for victims and witnesses in the courthouse. More assistance should be offered to facilitate victims' involvement with the court, such as court schools, transportation and food allotments, and child care.	Percent of judges, magistrates, and other decision-makers who give victims' input more consideration when deciding whether to issue warrants, impose conditions or sentences, and so on. Percent of courts that have metal detectors and guards at the doors, escort services, and private secure waiting rooms for victims and witnesses. Percent of court proceedings in DV or SA cases held in courtrooms with safeguards. Percent of courts that offer such services, and percent of victims who avail themselves of the services. Victims' assessments of the value of this assistance and areas for improvement.	Interviews with judges and other decision-makers can be done to assess how they consider the victims' input in decision-making, and what is needed to enhance the victim's role. Courthouse security can be assessed through inspection of court facilities; interviews with security personnel to assess safeguards in use and still needed; interviews with court personnel, victims, and witnesses to measure perceived safety; and review of reports of violent incidents maintained by court security personnel. Court database or case file reviews should indicate which proceedings were held in secure settings. Victim assistance activities can be assessed by reviews of court records on assistance activities, and interviews with victims to get their feedback.
Judges should allow and support increased court use of appropriate evidence and testimony.	Percent of cases in which evidence is presented from DNA testing, polygraph tests for sex offenders, and/or use of expert witness testimony about rape trauma syndrome, psychological reactions to violence, recovery patterns, and belief structures (myths and attitudes). Also, the percent of cases in which rape shield laws are applied appropriately.	Automated court database or court files should document the evidence presented in proceedings.
Judges' use of sentencing practices and conditions of diversion, suspension, or other alternative methods of case processing should reflect greater efforts to hold offenders accountable and provide protection to victims. Diversion or suspension should be used only in ways that serve victims' interests (i.e., not for the convenience of the justice system when that conflicts with victim's interests).	For convictions, assess percent of cases with fine, jail time, probation time, community service requirements, and other conditions such as restitution or no-contact orders. Can also compute average fine and restitution amount and average probation, jail, or community service time. Assess percent of cases in which the victim is given the opportunity to adduce testimony on the future risk posed by the defendant, in addition to the impact of past behavior, and the use of this information in sentencing decisions. For diverted or suspended cases, assess conditions of diversion/suspension and how they meet victims' needs for justice and safety. Percent of offenders who are ordered to batterer intervention, substance abuse treatment, sex offender treatment, and other appropriate services. Percent of offenders ordered to specialized monitoring, such as electronic surveillance, intensive supervised probation, special victim safety check and/or notification requirements, and polygraph tests for sex offenders under supervision. Percent of cases for which offender has post-sentencing interaction with a judge, for monitoring. Frequency of such interactions.	Information on sentencing, victims' testimony, and conditions of probation, parole, and diversion/suspension (including orders for treatment and specialized monitoring) should be available through prosecutor's or court's automated database. Judges and victims can be interviewed for their perceptions of the impact of the victims' testimony on sentencing decisions. Measure whether sentencing and diversion/suspension practices respond to victims' needs through interviews with prosecutors and victims and/or victim advocates.	Exactly what "victim's needs" means might be different for each case, but in general court-ordered conditions should try to provide for victim safety from further assault or abuse and financial well-being (e.g., orders for restitution or separate maintenance) in so far as possible. Rather than using a set list of court-imposed conditions, it might be most useful to conduct more open-ended interviews and ask victims/advocates what they wanted, what they got, how it worked out, and what they still needed but didn't get.
Courts offices should have more and better resources to respond to cases of VAW.	Percent of courts which have thorough, progressive written policies and/or bench guides on VAW cases. Percent of court personnel (judges, magistrates, clerks, bailiffs, and others) who receive training on new policies, and how soon after policy implementation the training begins. Percent of courts with an adequate number of research and writing staff to prepare legal memoranda on substantive, evidentiary, and procedural issues so that judges are fully apprised of legal precedent and public policy developments. Percent of jurisdictions with unified court systems or other ways of facilitating coordination of the different courts which may be involved in VAW cases (e.g., family, civil, and criminal). Percent of jurisdictions with specialized courts, specialized calendars, or specialized judicial assignments. Specialized court's caseload and the average rated value of the court's contributions to the jurisdiction, according to interviews with personnel on the specialized court and others in the jurisdiction. See also Chapter 7, proximate measures, for victim feedback on agency performance.	Assess thoroughness and progressiveness of policy by content analysis. Policies should address a variety of important current topics. Assess training by review of training records for all court personnel. Interview legal staff and judges to assess their the sufficiency of time available for qualified staff to prepare legal briefings, and the usefulness of their products. Assess cross-court coordination mechanisms through interviews with court personnel and review of policies, memoranda, shared information systems, and other documentation. Assess specialized courts and the number and types of personnel through interviews with managers and reviews of office policies, organizational charts, annual reports, personnel files, etc. Assess special unit's caseload through agency database (or case file review).	Important policy topic areas may be determined in part by current issues in the state or community, and may include such topics as cross-court coordination, working with other agencies to supervise compliance with court orders, etc.
Judges and court staff should be culturally competent and have culturally appropriate resources available to them.	Percent of staff trained in cultural competence. Number and types of cultural competence resource materials available (e.g., information on cultural beliefs or norms about women and violence, for populations found in the jurisdiction). Staff demographic profile compared to community demographic profile. Percent of victims with language barriers for whom an appropriate interpreter is provided. Number and types of forms and brochures available in languages spoken by the population of the jurisdiction.	Examine agency training records and resource materials. Compare agency personnel records with census data on community population. Examine case records, and also records of use of interpreter service, if available. Use personnel records to assess staff language competencies. Examine forms and informational materials available to the public.
Court personnel should be well-trained in VAW cases.	Percent of staff who meet minimum requirements for VAW training, as specified by the state judges' association, or by state or local law, policy, regulation, etc. Percent of staff in specialized courts who have advanced training in DV, SA, or both, depending on type of court.	Measure by review of personnel records or records of the training authority.
Court caseloads should increase because of court improvements and/or improvements in law enforcement and prosecution practices. Heavier caseloads will lead to the need for additional staff, staff overtime, court restructuring, or redeployment of staff.	Caseload can be measured by the number of cases, number of court appearances or actions per case, and case processing time. Increased demand arises from increases in number of cases and time spent per case, over time.	Caseload should be available in the court's automated or manual management information system database. The court's response to increased demand can be measured through management information on hiring, use of overtime, staffing deployment changes, and reorganization.

Table 9.4: Impact on Corrections (Probation and Parole)
Objectives	Specific Measures	Data Collection Procedures	Precautions
More proactive assessment of supervision needs and efforts to ensure appropriate supervision is ordered.	Number of agencies that systematically assess supervision needs, and nature of the assessment procedures. When level or type of court-ordered supervision appears inappropriate, percent of cases in which the corrections agency either modifies conditions accordingly, or asks the court to modify its supervision orders (in jurisdictions where this is required).	Interview correctional offices to assess the nature and use of assessment procedures. An automated agency database or case file records should document modifications indicated by assessment results, and whether the supervising agency modified supervision accordingly or requested the court to do so.
Offender monitoring should increase. Probation and parole officers should directly monitor offenders more closely and have more contact with other community agencies involved in the case to monitor offender compliance and victim safety. For example, it may be necessary to check with law enforcement to verify that a sex offender has registered as required and to be notified of rearrests; to communicate with batterer treatment providers to monitor compliance with court orders to treatment; and to communicate with victims and/or advocacy/service providers to detect repeat abuse and notify victims of offender status changes (e.g., release, furlough, escape, the opportunity to participate in reviews or hearings, and so on).	Direct offender monitoring = average number of contacts with offender per case, average length of time over which contacts occur in a case, and types of contacts most frequently made (e.g., visits to offender's home or workplace, visits by offender to officer's office, etc.). Compliance monitoring = average number of contacts with victims and other agencies involved in the case.	Interview probation and parole officers and staff of other relevant agencies, and review automated database of probation or parole agency, if available, or case files to document contacts with offenders and other agencies. Interview offenders to assess their perceptions of the monitoring process and what additional steps could be taken to fill gaps.
Closer communication with the courts on offender compliance or violations. More violation reports to the court, as a result of enhanced monitoring and/or stronger policies on reporting violations.	Communication with the court = average number of reports to the court on compliance or non-compliance with conditions of probation or parole. These may involve periodic court reviews or special reports issued in response to violations. Violation reports = average number of violation reports to the court.	Probation, parole, or court automated database if available; probation or parole officers' case files; and/or court case files.
Supervision of offenders should improve: more referrals to treatment programs and other services needed or ordered by the court, more assistance with obtaining services (e.g., assistance with fees, transportation or other accessibility issues, waiting list priority, etc.), and greater service availability in prisons and jails.	Offender service referrals = average number of service referrals, types of referrals most frequently made (to batterer treatment programs, substance abuse treatment, etc.). Referral assistance = average number of assistance efforts given, types of assistance most frequently given. Service availability = number and types of service programs available in prisons and jails; Accessibility = number of inmates per program opening.	Interview probation and parole officers and offenders to assess factors which facilitate and hinder offenders from obtaining services, how the officers helped, and other interventions needed to make services more accessible. Review officers' records or agency database for information on referrals made and assistance provided. Interview prison and jail managers, counselors and other service program staff, and inmates to identify service programs and accessibility issues. Review management records on program activities and attendance.
Probation and parole should have more and better resources to respond to cases of VAW.	Percent of agencies with thorough, progressive written policies on VAW cases. Percent of agency personnel who receive training in a timely fashion when new policies are implemented. Percent of agencies with specialized staff (such as victim liaisons) or special units on DV, SA, or both. Special staff or unit caseloads and the average rated value of their contributions to the agency. Percent of cases for which corrections officers track and monitor offenders after probation and parole ends.	Assess thoroughness and progressiveness of policy by content analysis. Policies should address a variety of important current topics. Assess training by review of training records for all agency personnel. Assess special staff or units and the number and types of personnel through interviews with managers, reviews of office policies, organizational charts, annual reports, personnel files, etc. Assess special staff or unit's caseload through agency database (or case file review). Measure the staff or unit's contributions to the agency by interviews with staff inside and outside the unit. Assess by review of case records.	Important policy topic areas may be determined in part by current issues in the state or community, and may include models of intensive supervision, monitoring victim safety, etc.
Staff should be well-trained in VAW cases.	Percent of staff who meet minimum requirements for VAW training, as specified by the state correctional officers' association, or by state or local law, policy, regulation, etc. Percent of staff in special units who have advanced training in DV, SA, or both, depending on type of unit. Percent of staff with explicit understandings gained in training (e.g., what constitutes a violation of a PO). Percent of offenders "violated" by probation or parole officer for different offenses, before and after training.	Measure by review of personnel records or records of the training authority.
Correctional agencies' caseload should increase because of these improvements and/or improvements in prosecution and the courts. Having more cases and monitoring them more closely will lead to the need for additional staff, staff overtime, or redeployment of staff.	Caseload = the average number of cases per staff and the average amount of time put into each case. Increased demand arises from increases in staff caseload and time spent per case over time.	Caseload should be available in the agency's automated or manual management information system. The agency's response to increased demand can be measured through management information on hiring, use of overtime, staffing deployment changes, and reorganization.
Corrections staff should be culturally competent and have culturally appropriate resources available to them.	Percent of staff trained in cultural competence. Number and types of cultural competence resource materials available (e.g., information on cultural beliefs or norms about women and violence, for populations found in the jurisdiction). Staff demographic profile compared to community demographic profile. Percent of offenders with language barriers for whom an appropriate interpreter is provided. Number and types of forms and brochures available in languages spoken by the population of the jurisdiction.	Examine agency training records and resource materials. Compare agency personnel records with census data on community population. Examine case records, and also records of use of interpreter service, if available. Use personnel records to assess staff language competencies. Examine forms and informational materials available to the public.

Notes for Chapters 5-9

Chapter 5
1. The final SAR and the provisional SSS forms can be found at the end of this chapter. Instructions for both forms, plus the electronic version of the SAR, are available from VAWGO (if you are a state coordinator) or from your state coordinator (if you are a subgrantee).

Chapter 6
1. This chapter presents and overview of evaluation options and issues, but cannot take the place of the many books and texts that exist on evaluation design, research methods, and data collection strategies. Readers who desire these more detailed treatments may want to look at one or more of the books listed in the Addendum to the Chapter.

Chapter 7
1. Cris M. Sullivan is Associate Professor of Community Psychology at Michigan State University, where she has conducted numerous studies on the course of women's recovery from domestic violence and the effects of services on women's outcomes. She serves on the Advisory Group for the National Evaluation of the STOP Formula Grants under the Violence Against Women Act.

2.Reliability refers to certain characteristics of a scale related to sameness: that all the items correlate highly with each other; and that if the scale is used at two different points in time and nothing has changed between those times, the scale will produce approximately the same score both times. Validityrefers to a number of characteristics of a scale related to correctness: that all the items on the scale seem on their face to be measuring the concept of interest; that each item on Scale A "likes" the other items on Scale A better than it "likes" the items that comprise Scale B, C, etc.; and that the scale scores are strongly associated with concepts you would expect them to be associated with, and not associated with things you would not expect them to be related to.

3. Unquestionably, additional scales measuring safety and well-being and meeting the above criteria either have been overlooked by the author or were excluded due to space limitations. The inclusion or exclusion of an instrument in this chapter does not necessarily reflect an endorsement or lack thereof on the part of the author or the Urban Institute.

Chapters 5 - 9

CHAPTER 5 USING THE SAR AND THE SSS

Getting Information for the SAR and the SSS

Using the SAR and SSS Information

How to Handle Special Circumstances

Other Ways to Get Useful Information

Impact Evaluation Designs

The Impact Evaluation Design "Decision Tree"

Process Analysis

Performance Monitoring

Data Collection Strategies

The Planner's Questions

Additional Considerations in Planning your Evaluation

Addendum: Evaluation and Basic Research Methods

INTRODUCTION TO THE RESOURCE CHAPTERS

Example 1: Counseling Services

Example 2: Special Prosecution Unit

Example 3: Court Advocacy Program

CHAPTER 7 VICTIM SAFETY AND WELL-BEING: MEASURES OF SHORT-TERM AND LONG-TERM CHANGE

What is Short-Term and What Is Long-Term Change?

Measures of Short-Term Change

Measures of Long-Term Change: Victim Safety and Well-Being

Intimate Physical Abuse

ABUSIVE BEHAVIOR INVENTORY - PARTNER FORM

Intimate Physical Abuse

THE REVISED CONFLICT TACTICS SCALES (CTS2)

Intimate Physical Abuse

THE DANGER ASSESSMENT

Intimate Physical Abuse

INDEX OF SPOUSE ABUSE (ISA)

Intimate Physical Abuse

SEVERITY OF VIOLENCE AGAINST WOMEN SCALES

Psychological Abuse

INDEX OF PSYCHOLOGICAL ABUSE

Psychological Abuse

THE PSYCHOLOGICAL MALTREATMENT OF WOMEN INVENTORY

Sexual Abuse

THE SEXUAL EXPERIENCES SURVEY

Depression

BECK DEPRESSION INVENTORY

Depression

CENTER FOR EPIDEMIOLOGICAL STUDIES-DEPRESSION (CES-D) SCALE

Post-Traumatic Stress

POSTTRAUMATIC STRESS SCALE FOR FAMILY VIOLENCE

Post-Traumatic Stress

THE TRAUMA SYMPTOM CHECKLIST

Quality of Life

QUALITY OF LIFE SCALE

Quality of Life

SATISFACTION WITH LIFE SCALE

Self-Esteem

Self-Esteem

ROSENBERG SELF-ESTEEM INVENTORY

Growth Outcomes and Coping Strategies

"HOW I SEE MYSELF NOW"

Growth Outcomes and Coping Strategies

"CHANGES THAT HAVE COME FROM YOUR EFFORTS TO RECOVER"

Growth Outcomes and Coping Strategies

"HOW I DEAL WITH THINGS"

Access to Community Resources

EFFECTIVENESS IN OBTAINING RESOURCES

Access to Community Resources

FACILITY AVAILABILITY, USAGE, AND QUALITY

Social Support

MULTIDIMENSIONAL SCALE OF PERCEIVED SOCIAL SUPPORT

Social Support

SOCIAL SUPPORT SCALE

Stresses and Hassles

PERCEIVED STRESS SCALE

Stresses and Hassles

SURVEY OF RECENT LIFE EXPERIENCES (SRLE)

CHAPTER 8 DESCRIBING VICTIM SERVICES AND SUPPORT SYSTEMS

Institutional Affiliation, Organizational Structure, and Linkages

Services, Activities, and Clients

CHAPTER 9 EVALUATING CRIMINAL AND CIVIL JUSTICE AGENCY CHANGES

Don't Forget Background and External Factors

Notes for Chapters 5-9

Continue to next section (Chapters 10-15)

CHAPTER 5
USING THE SAR AND THE SSS

CHAPTER 7
VICTIM SAFETY AND WELL-BEING:
MEASURES OF SHORT-TERM AND LONG-TERM CHANGE

CHAPTER 8
DESCRIBING VICTIM SERVICES AND SUPPORT SYSTEMS

CHAPTER 9
EVALUATING CRIMINAL AND CIVIL JUSTICE AGENCY CHANGES

Continue to next section
(Chapters 10-15)