This is the accessible text file for GAO report number GAO-04-9 entitled 'Small Business Administration: Model for 7(a) Program Subsidy Had Reasonable Equations, but Inadequate Documentation Hampered External Reviews' which was released on March 31, 2004. This is a work of the U.S. government and is not subject to copyright protection in the United States. It may be reproduced and distributed in its entirety without further permission from GAO. Because this work may contain copyrighted images or other material, permission from the copyright holder may be necessary if you wish to reproduce this material separately. 
Report to Congressional Committees: March 2004: SMALL BUSINESS ADMINISTRATION: Model for 7(a) Program Subsidy Had Reasonable Equations, but Inadequate Documentation Hampered External Reviews: [Hyperlink, http://www.gao.gov/cgi-bin/getrpt?GAO-04-9]: GAO Highlights: Highlights of GAO-04-9, a report to Chairman and Ranking Minority Member, House Committee on Small Business; Ranking Minority Member, Senate Committee on Small Business and Entrepreneurship Why GAO Did This Study: The Small Business Administration (SBA) approved about $8.6 billion in loan guarantees through its 7(a) loan program in fiscal year 2003. SBA must estimate the subsidy cost of this program. Since fiscal year 2003, SBA has been using econometric modeling to estimate the subsidy. This report reviews SBA’s estimation methodology and equations, assesses the default and recovery rates the model produced, identifies ways to enhance the estimates’ reliability, describes the process for developing the model, and analyzes SBA’s data. What GAO Found: From an economics perspective, SBA’s econometric equations were reasonable, and its model produced estimated default and recovery rates that were in line with historical experience. However, from an audit perspective, SBA’s lack of documentation of the model development process precluded GAO, and others, from independently evaluating the model’s development and determining if SBA used a sound and consistently applied method to select and reject model variables. Taking into account economic reasoning and research, SBA’s econometric equations for estimating defaults, prepayments, and recoveries were reasonable. SBA’s equations used a limited set of variables; equations using other variables could also be reasonable but would produce different estimates. Since an estimate is an approximation, no one estimate can be considered accurate, and reasonable estimates can fall within a range of values. 
The model's estimated default and recovery rates were in line with recent historical experience. SBA could improve its estimation methodology by periodically checking for and correcting errors and should consider adding more borrower information, such as credit scores. Some errors in the model resulted in understating the estimated program costs. SBA used the expertise of other agencies and a contractor to develop its model and worked closely with the Office of Management and Budget (OMB), which must approve the methodology agencies use to estimate subsidies. OMB officially approved the model in the fall of 2002. SBA did not adequately document its model development process, including alternative variables considered and rejected, to enable external reviewers to assess the process that was used. Further, GAO and two other independent reviewers could not determine whether a bias existed in the model by systematically excluding variables to influence the subsidy rate in a particular direction. Adequate documentation, a key internal control, would enable SBA and other agencies to demonstrate the rationale and basis for key aspects of the model that provide important cost information for budgets, financial statements, and congressional decision makers and facilitate SBA’s annual financial statement audit. Current OMB and other guidance is either silent or unclear about the level of documentation necessary for credit subsidy model development. SBA had a process to help ensure data integrity and data consistency in the equations with the loan-level data in its databases. Although errors existed in SBA’s data systems, the magnitude and nature of these errors were not likely to significantly affect the subsidy rate. What GAO Recommends: SBA should (1) determine whether to include in the model other information from its new loan monitoring system, (2) periodically evaluate and update the model, and (3) document the model development process. 
OMB should require agencies to document the basis and process for developing their credit subsidy models. SBA agreed with recommendations to improve the final model but SBA and OMB disagreed that the model development was inadequately documented and disagreed with our recommendations to improve such documentation and guidance. However, given the difficulty experienced by reviewers due to inadequate documentation, we continue to recommend that SBA document the basis and process for developing its model and that OMB require this documentation. www.gao.gov/cgi-bin/getrpt?GAO-04-9. To view the full product, including the scope and methodology, click on the link above. For more information, contact Davi D'Agostino at (202) 512-8678 or dagostinod@gao.gov. [End of section] Contents: Letter: Results in Brief: Background: SBA's Equations Were Reasonable and Estimated Default, and Recovery Rates Were in Line with Historical Experience: SBA's Model Could Be Enhanced by Adding Information on Borrowers, Correcting Errors, and Updating Some Data: SBA Collaborated with OFHEO and OMB to Develop the Model: Lack of Adequate Model Documentation Hampered Independent Reviews of SBA's Model: SBA Had a Process to Help Ensure Data Quality and the Data Used in the Model and SBA's Loan Level Databases Were Consistent: Conclusions: Recommendations for Executive Action: Agency Comments and Our Evaluation: Appendixes: Appendix I: Objectives, Scope, and Methodology: Assessing the Reasonableness of the Model's Econometric Equations and Evaluating the Model's Estimated Default, Prepayment, and Recovery Rates: Identifying Additional Steps SBA Could take to Further Enhance the Reliability of the Model: Reviewing SBA's Process of Developing the Subsidy Model: Evaluating the Model's Supporting Documentation, Including Its Discussion of What Variables Were Tested and Rejected: Determining What Steps SBA Took to Ensure the Integrity of the Data Used in the Model and Whether These Data Were Consistent 
with Information in Its Databases: Appendix II: Analysis of Default, Prepayment, and Recoveries Econometric Equations: SBA's Default and Prepayment Equations: Effects of Including Additional Variables: SBA's Recovery Equation: Appendix III: Comments from the Small Business Administration: GAO Comments: Appendix IV: Comments from the Office of Management and Budget: GAO Comments: Appendix V: GAO Contacts and Staff Acknowledgments: GAO Contacts: Staff Acknowledgments: Tables: Table 1: Variable Names and Descriptions: Table 2: Multinomial Logistic Regression Coefficient Estimates: Table 3: Names and Descriptions of Additional Variables: Table 4: Multinomial Logistic Regression Coefficient Estimates: Table 5: Distribution of SIC Industry Codes in SBA's Loan Database: Table 6: Variable Names and Descriptions: Table 7: Recovery Model: Figures: Figure 1: Major Segments of the Model to Estimate 7(a) Subsidy Rate: Figure 2: Estimated Default Rates Compared with Average Default Experience from 1992 through 2001: Figure 3: Estimated Default Rates Compared with Fiscal Year 2001 Actual Default Experience: Abbreviations: CFO: Chief Financial Officer: FCRA: Federal Credit Reform Act of 1990: GDP: gross domestic product: NAIC: North American Industrial Classification: OFHEO: Office of Federal Housing Enterprise Oversight: OMB: Office of Management and Budget: SBA: Small Business Administration: SIC: Standard Industrial Classification: Letter March 31, 2004: The Honorable Donald A. Manzullo: Chairman, Committee on Small Business: House of Representatives: The Honorable Nydia M. Velazquez: Ranking Minority Member: Committee on Small Business: House of Representatives: The Honorable John F. Kerry: Ranking Minority Member: Committee on Small Business and Entrepreneurship: United States Senate: The 7(a) program is the Small Business Administration's (SBA) largest lending program for small businesses. 
SBA reported that it approved about $8.6 billion in loan guarantees in fiscal year 2003. The program provides loan guarantees of up to 85 percent for loans made to small businesses that are unable to obtain financing on reasonable terms in the private credit markets. Like most federal loan or loan guarantee programs, SBA's 7(a) program is subject to the Federal Credit Reform Act of 1990 (FCRA). FCRA requires most agencies with government lending programs to estimate annually the cost to the federal government of extending or guaranteeing credit over the life of the loans (the subsidy cost). Since an estimate is an approximation, no one estimate can be considered accurate with certainty, and reasonable estimates can fall within a range of values. Changes in the estimation methodologies, variables, or data used to calculate an estimate are likely to result in differences in the estimate. In fiscal year 2003, SBA implemented a new methodology to estimate the subsidy cost of the 7(a) program that is based on econometric modeling.[Footnote 1] SBA officials told us that the new 7(a) model was the first step in a long-term effort to develop and implement new econometric models for their credit programs. Although econometric modeling allowed SBA to build a model that is more sensitive to a wider variety of factors than a model based on historical averages, SBA believes that this approach may not be appropriate for all its credit programs. To calculate the subsidy cost of their programs, agencies must estimate the present value of future cash flows over the life of the program, which for the 7(a) program are principally affected by defaulted loans, prepayments of outstanding loans, recoveries on defaulted loans, and fees. 
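As a rough illustration of the present-value calculation described above, the following Python sketch discounts hypothetical yearly cash flows for a single cohort and expresses the net cost as a share of the loan amount. It is not SBA's model or the OMB Credit Subsidy Calculator. The function names, the single flat discount rate, the choice of denominator, and all dollar figures are assumptions made only for illustration.

```python
def present_value(cash_flows, discount_rate):
    """Discount yearly cash flows (paid at the end of years 1, 2, ...) to today."""
    return sum(
        amount / (1.0 + discount_rate) ** year
        for year, amount in enumerate(cash_flows, start=1)
    )


def subsidy_rate(claim_payments, recoveries, fees, discount_rate, loan_amount):
    """Net present value of outflows minus inflows, as a fraction of loan_amount.

    Assumption: the rate is expressed relative to the cohort's loan amount,
    and one flat discount rate applies to every year.
    """
    outflows = present_value(claim_payments, discount_rate)
    inflows = present_value(recoveries, discount_rate) + present_value(fees, discount_rate)
    return (outflows - inflows) / loan_amount


# Hypothetical three-year cohort, in millions of dollars (illustrative only).
claims = [2.0, 3.0, 1.5]       # claim payments on defaulted loans (outflows)
recovered = [0.0, 0.5, 1.0]    # recoveries on defaulted loans (inflows)
fee_income = [1.0, 0.8, 0.6]   # fees collected (inflows)

rate = subsidy_rate(claims, recovered, fee_income, discount_rate=0.05, loan_amount=100.0)
print(f"Illustrative subsidy rate: {rate:.2%}")
```

In this sketch, higher forecasted defaults raise the claim payments and therefore the rate, while higher recoveries or fees lower it, which is why the default, prepayment, and recovery estimates drive the result.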
The revised method SBA adopted for the subsidy calculation has four segments: (1) the econometric equations that are used to estimate the likelihood of defaults and prepayments, (2) the equations used to estimate the extent of recoveries, (3) the cash flow module, and (4) the Office of Management and Budget (OMB) Credit Subsidy Calculator, as shown in figure 1. The results of the first and second segments--the econometric equations--are a key input into the third. The third segment--the cash flow module--uses these results, along with OMB forecasts of interest rates, unemployment rates, and gross domestic product growth rates to estimate cash inflows from fees and recoveries on defaulted loans and outflows from claim payments on defaulted loans. The resulting cash flows are entered into the fourth segment, the OMB Credit Subsidy Calculator, which calculates the (1) present values of the cash flows and (2) the subsidy rate. Figure 1: Major Segments of the Model to Estimate 7(a) Subsidy Rate: [See PDF for image] [End of figure] This report responds to your November 26, 2002, and December 11, 2002, requests that we review the methodology that SBA developed to estimate the subsidy costs of its 7(a) loan program for the fiscal year 2004 budget. As agreed with your staff, we (1) assessed the reasonableness of the model's econometric equations and evaluated the model's estimated default and recovery rates based on the 7(a) program's recent historical loan experience; (2) identified any additional steps SBA could take to further enhance the reliability of its subsidy estimate produced by the model; (3) described SBA's process for developing the subsidy model; (4) evaluated the model's supporting documentation including its discussion of what variables were tested and rejected; and (5) determined what steps SBA takes to ensure the integrity of the data used in the model and determined whether these data are consistent with information in SBA's databases. 
We did not, however, validate SBA's model. First, to analyze the model, we obtained from SBA copies of the model as approved by OMB in 2002, along with the loan-level data that were used to develop the subsidy estimates. We analyzed the econometric equations to determine whether they were reasonable based on the variables they included, the statistical techniques used, and the results obtained. For example, we determined whether the econometric equations included appropriate variables and whether the variables used in the equations were statistically significant. To evaluate the model's estimated default and recovery rates, we compared these rates with recent historical loan experience of the 7(a) program provided by SBA. Using SBA's data, we also calculated what SBA would have estimated for default and recovery rates based on the estimation methodology it used prior to its fiscal year 2003 budget submission. Second, to identify any additional steps SBA could take to enhance the reliability of its model, we considered additional types of data that SBA might collect and consider including in its econometric equations. As part of this analysis, we reviewed the academic literature on default modeling and interviewed officials with several banks engaged in similar efforts. Third, to describe SBA's process for developing the model we met with SBA and OMB officials. Fourth, to evaluate the model's supporting documentation, including its discussion of what variables were tested and rejected, we obtained and analyzed available relevant documents and met with SBA officials and their contractor who developed the model. We compared the information presented in SBA's model documentation with existing credit subsidy guidance. Finally, to determine what steps SBA took to ensure the integrity of the data used by the model and to determine whether these data were consistent with information in its databases, we assessed SBA's processes for ensuring data reliability. 
We examined the type and level of errors and evaluated the likelihood that they would significantly affect the credit subsidy estimates. We also compared the loan-level data used in the model with the data contained in SBA's databases. Appendix I discusses the details of our methodology. We conducted our work in Washington, D.C. from December 2002 to March 2004 in accordance with generally accepted government auditing standards. Results in Brief: Overall, we found that from an economics perspective, SBA's econometric equations were reasonable, and the SBA model produced estimated default and recovery rates that were in line with historical experience. However, from an audit perspective, SBA's lack of adequate documentation of the model development process precluded us from (1) independently evaluating the model's development; (2) determining whether SBA used a sound and consistently applied method to select and reject variables to be included in the model; and (3) determining whether a bias from selecting variables existed in the model. We found that SBA's econometric equations for estimating defaults, prepayments, and recoveries were reasonable. SBA's equations used a limited set of variables; equations using other variables could also be reasonable but would produce different estimates. We also found that the model's estimated default and recovery rates were in line with recent historical experience. SBA's econometric equations related the likelihood of defaults and/or prepayments to several variables that economic reasoning and prior research suggested were appropriate to this type of model, and, at the time of our review, SBA used appropriate statistical techniques to identify the nature of these relationships. In addition, SBA's equations produced estimated relationships for defaults and prepayments that were consistent with expectations based on economic reasoning. For example, the likelihood of default was estimated to be higher when unemployment was higher. 
SBA's equations used a limited set of variables, and we found that equations using additional variables available to SBA that it did not include, such as measures of interest rates and the businesses' industry type, would also be reasonable. If SBA had used these alternative equations, it might have estimated a higher or lower subsidy rate. SBA did not include any economic variables in its equation for estimating recoveries, so that forecasted recovery amounts were not dependent on expected economic conditions. According to documentation provided by SBA of the work done to develop this equation, adding economic variables would not have increased the precision of the recovery rate estimates. SBA could enhance the model and the reliability of the subsidy estimate produced by the model by including additional information that SBA expects to have in the future and by correcting errors. SBA intends to collect new business and business-owner information to determine how it affects loan performance and such information may suggest variables that can be useful in the model. SBA's econometric equations used variables from its current databases and economic indicators, such as gross domestic product (GDP) growth rates and unemployment rates, to forecast future defaults and prepayments. However, at the time of our review, SBA's current database did not include other information on businesses or business owners, such as information on borrowers' credit that is often used by private sector lenders to determine potential defaults and losses. Academic literature on default models suggests that such information is predictive of defaults. SBA has recently contracted to develop a loan monitoring system that is intended to track this information and allow the agency to determine how it affects loan performance. 
During our review of the model, we identified some errors that resulted in underestimates of the program costs of around $6.5 million or about 6.8 percent of the estimated cost of the program for fiscal year 2004. To develop its subsidy model, SBA drew on the expertise of other government agencies and consulted with OMB officials. In February 2002, SBA entered into an arrangement with the Office of Federal Housing Enterprise Oversight (OFHEO), which has staff with expertise in econometric modeling, to assist in the development of the 7(a) subsidy model.[Footnote 2] OMB also played a key role in the development of the model because FCRA requires OMB to approve the methodology that each federal agency uses to estimate the subsidy costs of its loan programs. Thus, SBA consulted with OMB officials during the model's development, and OMB officially approved the model in the fall of 2002. OMB officials said that their role in reviewing the model was primarily to provide oversight and ensure compliance with the law. Because at the time of our review, SBA routinely had its cash flow models reviewed by an independent third party, it hired an outside consultant to conduct limited reviews of the econometric equations and cash flow segment. The consultant identified some errors that SBA corrected. SBA did not prepare adequate supporting documentation to enable us and other independent reviewers to understand and evaluate the process that SBA used to develop the model. While SBA provided some general documentation of its model development process, the documentation lacked adequate discussion of alternative variables or combinations of variables that SBA considered, which variables were rejected for which reasons, and specific examples based on results of earlier regressions. As a result, we were unable to determine whether a bias in selecting variables existed in the model. 
SBA officials told us that they did not prepare this type of documentation because they believed that there was no specific requirement to do so. Current guidance is either silent or unclear about supporting documentation needed to explain the development of econometric models used to generate credit subsidy estimates for the budget and financial statements. However, maintaining adequate documentation on how such models were developed is a sound internal control practice that would provide SBA and other agencies the opportunity to more fully demonstrate and explain the rationale and basis for key aspects of their models that provide important cost information for budgets, financial statements, and congressional decision makers. This documentation would also help facilitate SBA's annual financial statement audit. SBA hired a private contractor to reconcile the information submitted to it by 7(a) program lenders with the data stored in SBA's loan-level databases on a monthly basis and, at the time of our review, had an ongoing process to correct any errors that were found. Although errors existed in SBA's data systems at the time of our review, we determined that the magnitude and nature of these errors were not likely to significantly affect the subsidy rate. In addition, SBA officials told us that they performed various ad hoc reviews of the information in SBA's loan-level databases to assess its accuracy and were currently assessing various alternatives to further enhance its data integrity. On the basis of our analysis of a statistical sample of defaulted, prepaid, and active loans, as well as recoveries from defaulted loans, we found that the data SBA used to calculate the subsidy costs were consistent with the loan level data contained in SBA's actual databases at the time of our review. This report contains three recommendations to SBA and one recommendation to OMB. 
We recommend that SBA (1) determine how best to include in the model borrower-specific information that it intends to collect in its new loan monitoring system; (2) establish a process for periodically revising the model to correct errors and to reflect any changes in the 7(a) program or other factors that could affect the subsidy estimate; and (3) prepare adequate documentation of the model development process including a detailed discussion of alternative variables or combinations of variables that were considered, tested, and rejected and criteria for doing so. We also recommend that OMB require that agencies document the basis for credit subsidy estimates and reestimates, including the process followed for selecting model methodologies over alternatives and variables tested and rejected with the basis for excluding them. We received comments on a preliminary draft of this report from SBA and OMB. SBA agreed with the findings and the first two recommendations related to the final model. OMB had no comments. While a draft of this report was at the agencies for comment, we continued to pursue additional documentation that SBA had that might further explain its 7(a) model development process, including what variables were selected and rejected and why. This final report discusses the lack of adequate documentation and recommends improvements in SBA's documentation of the development process for its credit subsidy models and in OMB's Circular A-11 guidance. SBA generally disagreed with our findings and recommendations related to the lack of adequate documentation supporting the model's development process. OMB disagreed with our recommendation that it revise Circular A-11. 
However, in light of the consistent difficulty experienced by three independent reviewers of SBA's 7(a) credit subsidy model, including SBA's financial statement auditors, we continue to recommend that SBA enhance its credit subsidy model documentation and that OMB require agencies to document the basis and process used to develop credit subsidy models, including understanding the model's basis and the variables that were selected and rejected. Background: FCRA was enacted, among other reasons, to provide more accurate measures of the costs of federal loan programs and to more accurately compare costs among credit programs and between credit and noncredit programs. FCRA requires agencies with loan guarantee programs to estimate the subsidy cost, or the cost to the government, of their loan guarantees over the life of the loan. To calculate the subsidy costs, agencies must calculate, on a cohort[Footnote 3] basis, the net present value of the forecasted cash flows for the program, which for SBA included estimated defaults, recoveries, and fees related to the 7(a) program. In addition, as part of this process, SBA must determine the effects of loan prepayments on the cash flows. Under FCRA, SBA provides information that generates a single subsidy rate and does not provide information about any uncertainty in its estimate of the rate or other factors affecting the rate, such as prepayments or defaults. Prior to its 2003 budget submission, SBA's methodology for estimating the subsidy on its 7(a) loans used historical averages for defaults and recoveries based on loan data going back to 1986 as the basis for estimates of future defaults and recoveries. This approach resulted in fairly stable subsidy estimates on a yearly basis as it included a sufficient volume of historical information that smoothed out fluctuations in economic conditions from year to year. However, this approach resulted in SBA consistently overestimating defaults and recoveries. 
In previous work, we found that SBA overestimated defaults by about $2 billion from fiscal years 1992 to 2000.[Footnote 4] In an effort to improve the accuracy of its subsidy estimate, SBA implemented a new methodology based on econometric modeling to estimate the subsidy cost for the fiscal year 2003 and 2004 budget submissions. Econometric modeling has advantages over historical averaging. For example, to the extent that data are available, it can take into account the effects of changes in such factors as economic conditions, program rules, and loan types on defaults and prepayments. All forecasts are uncertain, and this uncertainty has multiple causes. When relationships among economic variables are estimated, uncertainty may arise from the choice of variables used in the model, from the degree of precision with which the strength of the relationships is estimated, and from uncertainty about the future values of the independent variables used in the forecasting equation. Excluding a variable that should be in a forecasting model can reduce the quality of the model. For example, if some industries have high default rates, then excluding industry variables will tend to underestimate default costs in years when many loans go to high risk industries and overstate default costs in years when many loans go to low risk industries. The choice of variables to be used in a model results from a process of professional judgment and balancing the risks of including too many or too few variables. Economic theory and statistical tests play an important role in these decisions. The remaining sources of uncertainty, the precision of the estimated relationships and uncertainty about future values of independent variables, are often beyond the control of those building the model. 
The precision of the effects of the independent variables is determined largely by the amount of data available to the analyst, and uncertainty about future values of independent variables is inherent in any forecast. Internal control is a major part of managing an organization and this includes controls over data gathering and processing, such as SBA's data on 7(a) loans. As mandated by the Federal Managers' Financial Integrity Act of 1982, the Comptroller General issues standards for internal control in the federal government.[Footnote 5] These standards provide the overall framework for establishing and maintaining internal control and for identifying and addressing major performance and management challenges and areas at greatest risk of fraud, waste, abuse, and mismanagement. According to these standards, internal control comprises the plans, methods, and procedures used to meet missions, goals, and objectives. Control activities are the policies, procedures, techniques, and mechanisms that enforce management's directives and help ensure that actions are taken to address risks. Control activities are an integral part of an entity's planning, implementing, reviewing, and accounting for government resources and achieving effective results. They include a wide range of diverse activities including controls over information processing. These controls are established to ensure that all data inputs are received, are valid, and outputs are correct. Agency management should design and implement internal control based on the related costs and benefits. No matter how well designed and operated, internal control cannot provide absolute assurance that all agency objectives will be met and, thus, once in place, internal control provides reasonable, not absolute, assurance of meeting an agency's objectives. 
SBA's Equations Were Reasonable and Estimated Default, and Recovery Rates Were in Line with Historical Experience: We found that the econometric equations that SBA used to estimate defaults, prepayments, and recoveries were reasonable, although other equations could also be reasonable. At the time of our review, SBA used an appropriate statistical technique to identify how defaults and prepayments related to the variables in its equations. In addition, SBA's equations produced estimated relationships for defaults and prepayments that were consistent with expectations based on economic reasoning. We found that additional variables available to SBA that it did not include in its equations, such as measures of interest rates and the borrower's industry type, would also be reasonable to include and would produce different subsidy rates. In addition, SBA did not include any economic variables in its equation for estimating recoveries. According to documentation SBA provided on its work to develop the recovery equation, adding economic variables would not have increased the precision of the recovery rate estimates. Finally, we found that the new model's estimated default and recovery rates were in line with recent historical experience. Variables in SBA's Default and Prepayment Equations Were Appropriate: The econometric equations that SBA used at the time of our review related the likelihood that a borrower would either default on or prepay a loan to several variables that economic reasoning and prior research suggested were appropriate to include in these types of equations. These variables included: (1) characteristics of the borrower's business, such as whether it was a sole proprietorship, partnership, or corporation; (2) characteristics of the loan, such as the amount borrowed; and (3) two measures of economic conditions, the unemployment rate in the state where the loan was made and the GDP growth rate. 
Economic reasoning and prior research suggested that differences in borrower and loan characteristics and economic conditions were likely to influence defaults and prepayments. For example, prior research suggested that new businesses were less likely to survive than were established businesses and thus were more likely to default.[Footnote 6] Prior research also suggested that the likelihood of default on loans made to partnerships or corporations should be less than it was for loans made to sole proprietors, while the likelihood of prepayment should be greater. Details about SBA's econometric equations are found in appendix II. SBA's Statistical Technique and Estimated Relationships for Prepayments and Defaults Were Appropriate: At the time of our review, SBA used an appropriate technique known as multinomial logistic regression[Footnote 7] to identify whether the variables included in its model were important influences on the likelihood that a borrower would either default on or prepay a loan and to estimate the magnitude of these relationships. This technique, which has been used in other models of this type, was appropriate because it corresponded to the decision-making process that borrowers faced when deciding whether to default on the loan, prepay the loan, or keep it active. Using this technique, SBA produced estimates of both the probability of default and the probability of prepayment.[Footnote 8] The relationships that SBA's equations estimated between different variables and the likelihood of defaults and prepayments were consistent with economic reasoning. For example, SBA's default equation suggested that defaults were more likely when unemployment was higher and the rate of increase in gross domestic product was lower. 
Both of these estimated relationships were consistent with economic reasoning because it was less likely that borrowers would continue paying their debts when more people were out of work and the economy was growing less rapidly or in decline. SBA's prepayment equation also suggested that prepayments were more likely when loans were made under the SBA Express Program, for which SBA guaranteed a smaller percentage of the loan amount than it did under the regular 7(a) business loan program. This result was consistent with our expectations because the smaller guarantee was likely to make lenders more cautious in making lending decisions, such that firms borrowing through this program may have been more creditworthy than firms borrowing through the regular program. In turn, the businesses' enhanced creditworthiness may have led to more prepayments because these businesses may have been relatively more financially stable and may have been more likely to pay off their loans early. The details of SBA's default and prepayment equations, which show these relationships, are in appendix II. Other Default and Prepayment Equations Would Also Be Reasonable and Lead to Different Subsidy Rate Estimates: We identified additional variables available to SBA, but not included in the model, that also influenced the likelihood of defaults and prepayments. The choice of variables included in a model reflects the modelers' professional judgment, and different equations using different sets of variables can all be considered reasonable. To analyze the effect of adding variables, we tested SBA's model to estimate the 2003 subsidy cost using additional variables that (1) measured the current interest rate on 1-year U.S. Treasury bills and (2) considered the industry in which the borrowing firm operates. The interest rate could be important as either another measure of general business conditions or as a specific measure of the cost of capital. 
The industry in which the borrowing firm operates could be important if default and/or prepayment rates vary among industries, and the distribution of loans among industries varies over time. In addition, banks have traditionally recognized that the financial performance of a borrower depends on the nature of the business supporting the loan, the structure of the loan, and the financial condition of the firm. At the time of our review, SBA's econometric equations contained information on the loan and the firm but did not include information on the firm's business. The estimates produced by our testing suggest that these variables also influenced the likelihood of defaults and prepayments occurring and, therefore, that equations using these variables could also be reasonable.[Footnote 9] However, there are additional considerations that could be important in deciding whether to include a measure of interest rates in the default and prepayment equations. Specifically, including an interest rate variable would mean that forecasted interest rates would be used with the results of the econometric equations (and forecast values of other economic variables) to forecast future defaults and prepayments. The fact that forecasting interest rates is difficult may be a reason for not including an interest rate variable, even if the variable appears to be significantly related to the historical likelihood of default or prepayment. Furthermore, at present, forecasted interest rates are low relative to the interest rates that prevailed over most of the period from which the data were drawn to develop SBA's equations, potentially limiting the usefulness of including an interest rate variable. We found that including either the interest rate on 1-year Treasury bills or the industry in which the borrowing firm operates as a variable in the default and prepayment equations changed the estimated cost of the program. (See app. II.) 
According to SBA's model, the estimated subsidy rate for loans disbursed in 2003 was 1.04 percent. This estimate increased to 1.13 percent with the industry identifiers included and decreased to 0.76 percent with the inclusion of the interest rate on 1-year Treasury bills. In addition, when we included both the interest rate variable and the industry identifiers, we estimated a subsidy rate of 0.83 percent. Because interest rates are difficult to predict and have recently been quite low, we conducted tests to determine how sensitive the estimate was to small changes in forecasted interest rates. We found that it was not very sensitive to such changes. For example, when we increased the forecasted values above those included in the official OMB forecast by 10 percent, we estimated a subsidy rate of 0.80 percent, while when we decreased the forecasted values by 10 percent, we estimated a subsidy rate of 0.73 percent. The range of estimated subsidy rates that resulted from including additional variables was roughly comparable to the range that resulted from using different economic assumptions. We tested the sensitivity of SBA's estimated subsidy rate to small changes in the forecast values of the GDP growth rate and the unemployment rate by reestimating the subsidy rate with SBA's model using both more optimistic and more pessimistic assumptions about future economic conditions.[Footnote 10] With the more optimistic assumptions, we estimated that the subsidy rate decreased to 0.81 percent, while with the more pessimistic assumptions, we estimated that it increased to 1.28 percent. Estimates of Recoveries Depended Only on Age of Loan, Not Economic Conditions: SBA's model also included a separate econometric equation for estimating recoveries, which are the amounts of defaulted loans that were eventually recouped by collection efforts, such as the liquidation of assets. 
In this equation, the cumulative net recovery rate[Footnote 11] for a cohort of loans was estimated as a function only of the age of the loans in that cohort. In particular, this equation did not include any economic variables, so forecasted recovery rates were estimated to resemble historical recovery rates even though economic conditions in the future might be quite different from the past. According to documentation provided by SBA of the work done to develop this equation, adding economic variables would not have increased the precision of the recovery rate estimates.[Footnote 12] The Model's Estimated Default and Recovery Rates Were in Line with Historical Experience: Our evaluation of the model's estimated default and recovery rates found that these rates were in line with historical experience of the 7(a) program. There are some limitations to evaluating expected future loan performance compared with historical data because over time the economy changes and underwriting criteria and other factors that affect loan performance may also change. Therefore, one would not expect the estimated loan performance to exactly mirror historical experience. However, these types of comparisons are useful to evaluate the model's estimated default and recovery cash flows. Because recently issued loans do not have significant experience and historical data can be summarized in several ways, we evaluated the new model's estimated default and recovery rates compared with historical data in two ways to determine whether the estimates were in line with historical experience. In August 2001, we reported that from fiscal year 1992 through fiscal year 2000, SBA overestimated the cost of the 7(a) program by about $1 billion, primarily because it overestimated defaults by approximately $2 billion. Over this same period, SBA's estimated recoveries closely matched actual loan performance. SBA's prior method to estimate costs was based on averages of historical loan performance. 
As previously discussed, SBA's current model estimated defaults significantly differently than the prior method in that it considered economic variables and loan specific information. Meanwhile, at the time of our review, the model continued to estimate recoveries based on historical patterns. While it was currently not possible to determine the accuracy of the model's estimated default rate, as shown in the following two figures, the rate appeared to more closely match recent historical experience than SBA's previous method. Figure 2 shows how the model's estimated default rate compared with the estimated default rates calculated with SBA's previous method and with the average default experience of loans issued between 1992 and 2001.[Footnote 13] We could have included more or fewer years of loans in our analysis, but we believe data since 1992 are sufficient to evaluate the model's estimated default rate compared with historical experience because it included several years of loans that have been through their peak default period, which for 7(a) loans is generally between years 2 and 5. 
Figure 2: Estimated Default Rates Compared with Average Default Experience from 1992 through 2001: [See PDF for image] [End of figure] As previously mentioned, since historical data may be summarized differently, figure 3 shows how the new model's estimated default rate compared with the estimated default rate calculated with SBA's previous method and to actual default experience during fiscal year 2001 for the loans issued since 1986.[Footnote 14] This comparison allowed us to evaluate the estimated default rate over a longer period of time since data from older loans that have been outstanding for a longer period of time was included.[Footnote 15] Figure 3: Estimated Default Rates Compared with Fiscal Year 2001 Actual Default Experience: [See PDF for image] [End of figure] SBA's Model Could Be Enhanced by Adding Information on Borrowers, Correcting Errors, and Updating Some Data: SBA could enhance the reliability of its model's estimates by adding information on both the businesses and the owners to the econometric equations and reestimating the equations and by correcting errors in the model. The econometric equations SBA used at the time of our review to predict default and prepayments included some variables describing the businesses and loans and two economic indicators, GDP and unemployment rates. But they did not include some variables other analysts and financial institutions often use that are associated with businesses and business owners, such as credit scores. In addition, during our review, we found some errors that resulted in underestimating the cost of the 7(a) program that was included in the fiscal year 2004 President's Budget. Correcting these errors would have increased the estimated cost of the program by about $6.5 million. 
Including Additional Information on Businesses and Business Owners Could Enhance the Model's Reliability: The quantitative relationships between the default and prepayment rates and the current independent variables would probably change if new information were included. In our review of the literature and discussions with large banks, additional information was mentioned as having an influence on defaults and prepayments. The information cited was more detail on the loans, the business, and on business owners, including credit scores.[Footnote 16] Our review of the academic literature and discussions with some commercial lenders indicated that private lenders often include variables SBA did not consider in forecasting the financial performance of small businesses.[Footnote 17] At the time of our review, the current SBA model included loan variables (age and term) and some business variables (new business indicators, form of ownership, and loan amount, among others) but was missing detailed information on businesses that can help predict financial viability. These variables include earnings, capital, payment records, and available collateral, all of which have been shown to affect creditworthiness and likelihood of default. Profit levels, for example, help predict a business's ability to generate cash internally to cover loan payments. Records of debt payments help determine whether a business can cover its obligations, while available collateral tells a lender whether a business has the resources to cover outstanding debts during a financial crisis. Adding and periodically updating this information could enhance the predictive ability of SBA's econometric model by providing more accurate estimates of potential defaults and prepayments. In addition, analysts and banks have found that variables describing business owners can aid in evaluating credit risk, and many large banks have started to underwrite and monitor small businesses using credit scores. 
Information from business owners' credit records, such as income, personal debt, employment tenure, homeownership status, and previous personal defaults or delinquencies, can help predict delinquencies and defaults in the businesses themselves. Although at the time of our review SBA's current model did not include variables that measure these characteristics, the agency was developing a new loan monitoring system that SBA officials told us was intended to track this type of information. This is an important issue since, if banks use credit scores and the SBA does not, the SBA may be left with riskier loans. SBA could then determine whether such variables also reflect risks in SBA loans and could be used to help evaluate the costs of SBA loan guarantees. SBA's 2004 Subsidy Rate Estimate Included Errors: During our review of the model used to generate the cost estimate of the 7(a) subsidy that was included in the fiscal year 2004 budget, we found errors that resulted in underestimates of program costs of about $6.5 million. Based on the estimated subsidy rate and the projected loan volume included in the fiscal year 2004 President's Budget, the estimated cost of the program was about $94.9 million. If the errors we found had been detected and corrected by SBA before the budget was submitted, the estimated cost of the program with the same projected loan volume would have increased to about $101.4 million. These errors related to SBA's method of estimating recoveries, annual guarantee fee cash flows, and projections of borrower interest rates. First, the recovery estimates were based on the assumption that loans would be issued during fiscal year 2003 instead of during fiscal year 2004, although default and prepayment estimates were based on the later year. As a result, the model estimated that recovery cash flows would occur 1 year early, affecting the net present value[Footnote 18] of the cash flows and the subsidy rate. 
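The effect of timing on present value can be illustrated with a short sketch. The recovery amounts, the years in which they occur, and the 5 percent discount rate below are hypothetical and are not drawn from SBA's model; the point is only that moving the same recoveries 1 year earlier raises their present value.

```python
def present_value(cash_flows_by_year, discount_rate):
    """Discount a {year: amount} mapping back to year 0."""
    return sum(
        amount / (1.0 + discount_rate) ** year
        for year, amount in cash_flows_by_year.items()
    )


def shift_years(cash_flows_by_year, years):
    """Return the same cash flows moved later (positive) or earlier (negative)."""
    return {year + years: amount for year, amount in cash_flows_by_year.items()}


if __name__ == "__main__":
    # Hypothetical recoveries (in millions) by year after loan origination.
    recoveries_on_schedule = {3: 10.0, 4: 20.0, 5: 15.0}
    # The same recoveries assumed to arrive 1 year early.
    recoveries_one_year_early = shift_years(recoveries_on_schedule, -1)

    rate = 0.05  # hypothetical discount rate
    pv_on_schedule = present_value(recoveries_on_schedule, rate)
    pv_early = present_value(recoveries_one_year_early, rate)
    print(f"on schedule: {pv_on_schedule:.2f}  one year early: {pv_early:.2f}")
```

With a single discount rate, the early stream is worth exactly (1 + rate) times the on-schedule stream. Because recoveries offset default costs, overstating their present value would, other things equal, understate the net cost of the guarantees.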
Second, formulas SBA used to summarize the output of the cash flow segment of the model indicated that the same annual guarantee fees collected during the first quarter of fiscal year 2004 would be collected from about years 5-27, even though the fees would decline as loan balances were paid off. SBA officials indicated that these two errors would be corrected before the submission of the 2005 budget. Third, in estimating the cost of loans issued in the future, SBA assumed the loans would have characteristics similar to those of loans issued during fiscal year 2001. However, SBA did not adjust the borrower interest rates to levels that would be more appropriate for loans to be issued during fiscal year 2004. SBA officials indicated that this adjustment was not necessary because it would not significantly affect the cost of the program. However, SBA had made this adjustment when it calculated the subsidy cost for loans to be issued during fiscal year 2003. When we corrected the previously described errors, the estimated cost of the program for fiscal year 2004 increased by $6.5 million. We also found an error related to estimating prepayment penalties. SBA officials stated that they were aware of this error but believed that fixing it would be complicated and that these cash flows would be immaterial to the cost of the program. In the officials' view, fixing the error would not be cost beneficial. Cohort Data Could Be Updated: The model could be further enhanced if SBA were to update it to include new information as it becomes available. For example, SBA used the 2001 cohort of loans to generate estimates of the 2003 and 2004 subsidy. SBA officials were not sure whether they would use the 2002 cohort of loans for the 2005 estimate because, they said, updating the cohort is complicated by changes in program policies or in the composition of the 7(a) loan portfolio. 
However, the model would likely produce more reliable estimates if the most recent loan data were being used to generate the forecast rather than continuing to use an older cohort of loans. SBA Collaborated with OFHEO and OMB to Develop the Model: SBA contracted with OFHEO economists, with expertise in econometric modeling of mortgage defaults and prepayments, to develop its subsidy model, which included determining the variables to be included in the econometric equations. SBA consulted with OMB officials, who are required by FCRA to approve agency subsidy estimates. SBA also hired a private consulting firm to conduct a limited review of the model as part of its ongoing review process to minimize errors in estimating the subsidy. SBA Entered into an Agreement with OFHEO to Develop the Subsidy Model: In February 2002, SBA entered into an agreement with OFHEO to assist in developing the subsidy model. According to SBA staff, they selected OFHEO because it had staff with expertise and experience in econometric modeling and was less expensive than a private contractor.[Footnote 19] According to SBA staff, the OFHEO economists followed a four-step process to develop the model. The first step was refining and building the data set that would be used to generate the estimates. The data set OFHEO used was constructed from the SBA databases that were used to track loan payment history and personal financial information on borrowers. The second step was the design and estimation of the default, prepayment, and recovery equations, including the selection of variables for these equations. The third step of the process was the construction of the cash flow module, and, the fourth step was the construction and testing of the model that OFHEO would deliver for use by SBA. 
OMB Officials Approved SBA's Model: OMB officials also played a key role in the development of the model because, under FCRA, OMB has final responsibility for approving estimation methodologies and determining subsidy estimates. SBA officials said they consulted with OMB during the model's development until OMB approved it in the fall of 2002. OMB officials told us that they considered the model to be an improvement over the previous method that SBA used to calculate the program subsidy rate because it used better data and the econometric equations allowed for more accurate estimates of future cash flows. In addition, SBA could now use the model to consider both programmatic and economic variables in estimating the subsidy rate. For example, they said SBA could model how such variables as lender type affected the subsidy rate.[Footnote 20] In reviewing the model, OMB officials told us that they focused on the methodology of the model, the cash flow projections, appropriate use of variables in the econometric equations, and the validity of the data used to make the calculations. They approved the model in November 2002. SBA Hired a Private Consulting Firm to Review the Model: SBA hired a private consulting firm to conduct an independent limited review of the model from September 2002 to October 2002, as part of its ongoing process to identify errors before OMB approved the model. The consulting firm assessed the model conceptually and evaluated its underlying computer programming--specifically, the key data inputs that were the primary source of the model's cash flows and the model's programming specifications (to ensure they were correctly coded and that the code functioned properly). The firm also assessed the model's compliance with the relevant statutes and regulations and conducted scenario testing to evaluate how it performed under different economic assumptions. 
The consulting firm concluded that although the model performed reasonably well in estimating the subsidy cost, SBA had made errors in estimating loan guaranty and servicing fees, the calculation of recoveries, and prepayment penalties. SBA made changes to the model to address the identified discrepancies for fees and recoveries, the net effect of which was to increase the subsidy rate estimate by about 36 percentage points. The consulting firm also determined that the model lacked adequate documentation and that it was, therefore, unable to review the econometric component of the model. However, OFHEO subsequently provided SBA with a report documenting the model's development to a limited extent. Lack of Adequate Model Documentation Hampered Independent Reviews of SBA's Model: In developing its new econometric model, SBA did not prepare adequate supporting documentation to enable independent reviewers to understand and evaluate the process that was used. For example, the independent contractor SBA hired to review the 7(a) credit subsidy model was hampered by the lack of adequate documentation and, as a result, this team's review of the model's theoretical basis and its working features was severely limited. While SBA later developed some general documentation of its model development process, this documentation did not contain, among other things, an adequate discussion of alternative variables, or combinations of variables, that it considered, tested, and rejected, and the reasons for rejecting them. SBA officials told us that they did not prepare this type of documentation because they believed that there was no specific requirement to do so. Current guidance is either silent or unclear about supporting documentation needed to explain the development of econometric models used to generate credit subsidy estimates for the budget and financial statements. 
Nevertheless, we believe that maintaining adequate documentation on how such models were developed is a sound internal control practice that would provide SBA and other agencies the opportunity to demonstrate and explain the rationale and basis for key aspects of their models that provide important cost information for budgets, financial statements, and congressional decision makers. Moreover, as a practical matter, this documentation would help facilitate SBA's and other agencies' annual financial statement audits. SBA's 7(a) Credit Subsidy Model Documentation Was Inadequate for Outside Reviewers: BearingPoint, the independent contractor hired to perform an initial review of the SBA 7(a) credit subsidy model prior to its finalization, was hampered by the lack of adequate documentation. In response to our inquiry, the contractor stated that the team did not validate the model which, from an audit perspective, would have encompassed a more robust effort. In its final report to SBA, the contractor reported that SBA lacked sufficient supporting documentation for a "thorough review of its [the model's] theoretical basis (including alternative modeling methodologies explored), its working features, or the update and maintenance procedures necessary to use the model on an ongoing basis. This lack of adequate documentation severely limited our ability to assess certain critical parts of the model in detail, including its econometric components." 
Further, the contractor recommended that "SBA develop a robust set of documentation to support this model" including "the modeling methodology, alternate methodologies considered, data inputs and outputs, and model maintenance and update requirements.": In its January 30, 2004, audit report, Cotton and Company, the independent public accounting firm, identified in its internal control report 9 specific deficiencies in the model's documentation.[Footnote 21] These deficiencies included, for example, a lack of technical references for the statistical method used for the performance of the model, the absence of mathematical specifications, the fact that important variables were not clearly identified, and that units of measure for key variables were not specified. In addition, the audit report stated that the documentation that was provided was "self-contradictory" about the quality of the default and prepayment model and lacked a discussion of the assumptions and limitations of SBA's modeling approach. In responding to the independent public accountant's internal control report, SBA's Chief Financial Officer generally agreed with the report's findings, including the deficiencies in SBA's model documentation, and stated that the internal control report presented "fundamentals of good financial management and SBA is committed to accomplishing as many of these items as possible in the coming year.": In response to BearingPoint's recommendation, SBA's OFHEO contractor prepared some documentation for the model, but this documentation was not sufficient to allow us and SBA's financial statement auditor to gain an adequate understanding of certain key parts of the model development process. 
For example, the documentation that SBA provided included a broad overview of how the model works, a list of the variables that the final econometric equations included, the estimated coefficients of the equations, and figures showing how well the equations fit the data during the historical period. For some variables, SBA's documentation indicated how the variables were expected to influence default or prepayment probabilities, but did not provide any reasons, conceptual justification, or supporting empirical analysis. Some of these statements seemed intuitive, such as when the output of the economy increases, as measured by the percent change in real GDP, it is expected that default rates will drop. However, other statements were not intuitive. For example, SBA's documentation indicated that larger loans were expected to default at elevated levels and did not include any support for this assertion. Additionally, the model documentation did not explain in sufficient detail why SBA excluded some variables. Rather, the model documentation included a table of 29 variables that were tested and rejected and stated that the information presented was "a list of most variables tested." The documentation also provided a general overview about why these 29 variables were excluded. SBA's documentation stated that "variables were removed for a variety of reasons. Some of the reasons include--insignificant, highly correlated with other variables, low economic importance (significant but impact on probabilities was negligible), inconsistent results (variable was not robust to different specifications), and incoherent results (results could not be reconciled with any economic logic)." While the documentation that SBA provided to us contained acceptable reasons that economists could cite in rejecting variables, the documentation's lack of specificity did not allow us to determine which variables were rejected for which reasons. 
Further, we were unable to determine whether these were the only criteria or whether they were consistently applied throughout the model development process. SBA and the OFHEO contractor told us that, during the model development process, approximately 800 pages of raw testing information were generated and retained in an electronic file. They further stated that these 800 pages were not organized in any fashion and that there was no summary document or road map with greater detail than the model documentation provided us that would describe the variable-testing process or the results of that process in an understandable fashion. In addition, SBA and the contractor told us that the variables reflected in the 800 pages were not recorded in English words, but rather in mnemonics, and that there was no crosswalk or key still in existence to decode the mnemonics. Based on these representations by SBA and its contractor, we initially concluded that this information would be of questionable or no usefulness in assessing SBA's development of the assumptions and selection of variables used in the modeling process. SBA eventually provided us access to the 800 pages of material that contained some information on variables that were considered and rejected. This document was a partial compilation of analyses conducted during the model development process with no explanation or discussion of what was learned from each analysis conducted. Thus, on its own, this document provided little additional information regarding the process that SBA's contractor followed in developing the econometric equations used in the subsidy model. Further, the document was written in mnemonics and was not organized in any logical manner. In addition, SBA officials could not identify any specific parts of this documentation that related to alternative variables that were considered and rejected during the model development process. 
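One way such variable-testing documentation could be organized is sketched below: a structured record for each variable tested, with the reason for any rejection drawn from the categories quoted from SBA's documentation. The record layout, field names, and sample mnemonics are hypothetical; this is not a description of any system SBA or its contractor used.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class RejectionReason(Enum):
    """Categories quoted in SBA's documentation for removing variables."""
    INSIGNIFICANT = "insignificant"
    HIGHLY_CORRELATED = "highly correlated with other variables"
    LOW_ECONOMIC_IMPORTANCE = "low economic importance"
    INCONSISTENT_RESULTS = "inconsistent results"
    INCOHERENT_RESULTS = "incoherent results"


@dataclass(frozen=True)
class VariableTest:
    mnemonic: str          # short code used in the raw output
    description: str       # plain-English meaning (the missing "crosswalk")
    equation: str          # e.g., "default" or "prepayment"
    included: bool         # True if kept in the final equation
    reason: Optional[RejectionReason] = None
    evidence: str = ""     # pointer to the supporting regression output


def rejected_by_reason(log):
    """Group the mnemonics of rejected variables by documented reason."""
    grouped = defaultdict(list)
    for entry in log:
        if not entry.included and entry.reason is not None:
            grouped[entry.reason].append(entry.mnemonic)
    return dict(grouped)


def undocumented_rejections(log):
    """List rejected variables that have no recorded reason."""
    return [e.mnemonic for e in log if not e.included and e.reason is None]
```

A log kept in this form would let a reviewer see which variables were rejected for which reasons and spot any rejection that lacks a recorded basis.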
Documenting the basis for selecting and rejecting variables from an econometric model used to develop credit subsidy estimates is an important internal control that would also help to provide financial statement auditors reasonable assurance that a bias was not introduced into the credit subsidy estimates by systematically excluding variables to influence the subsidy rate in a particular direction. Statement on Auditing Standards Number 57, Auditing Accounting Estimates (SAS No. 57), states that "even when management's estimation process involves competent personnel using relevant and reliable data, there is potential for bias in the subjective factors." When evaluating the reasonableness of an estimate, the auditor should concentrate on, among other things, "key factors and assumptions that are subjective and susceptible to misstatement and bias." Because of the nature of econometric models and the effect that variables used have on future loan default and prepayment projections, auditors need to understand both what was included and excluded from the model to assess the reasonableness of the credit subsidy estimate from a financial accounting perspective. As our work demonstrated, changing the variables that were included in the model changed the subsidy rate. Because of the lack of adequate documentation on SBA's 7(a) model development process, we were unable to determine whether a bias in selecting variables existed in the model. Further, SBA's lack of adequate documentation on the 7(a) model development process could have impeded our ability to reach a conclusion on SBA's loan accounts in connection with the audit of the consolidated financial statements of the federal government. Specific Guidance on Credit Subsidy Model Development Documentation Is Limited: Currently, there is limited specific guidance on the nature and extent of documentation that agencies must prepare related to the development of models to generate credit subsidy estimates. 
OMB Circular A-11, Preparation, Submission, and Execution of the Budget, provides guidance on how agencies should prepare credit subsidy estimates. Circular A-11 does not include any guidance to the agencies for documenting their model development process including selection and rejection of variables for use in the models that generate federal credit subsidy estimates. However, Federal Financial Accounting and Auditing Technical Release 6, Preparing Estimates for Direct Loan and Loan Guarantee Subsidies under the Federal Credit Reform Act Amendments to Technical Release 3: Preparing and Auditing Direct Loan and Loan Guarantee Subsidies under the Federal: Credit Reform Act,[Footnote 22] provides some implementation guidance about the nature and extent of documentation agencies should have for their models. Technical Release 6 states that agencies should document the cash flow model(s) used and the rationale for selecting the specific methodologies. Agencies should also document the sources of information, the logic flow, and the mechanics of the model(s) including the formulas and other mathematical functions. In addition, because the model is the basis for budget and financial statement credit subsidy estimates, this documentation also facilitates an OMB budget analyst's review, if the analyst is not involved in the development process, the external financial statement audit, and other independent reviews. Technical Release 6 also states that agency documentation for subsidy estimates and reestimates should be complete and stand on its own, enabling an independent person to perform the same steps and replicate the same results with little or no outside explanation or assistance. In addition, if the documentation were from a source that would normally be destroyed, then copies should be maintained in the file for the purposes of reconstructing the estimate. 
Technical Release 6 does not specifically address expected documentation of an agency's model development process, including a detailed discussion of alternative variables that are considered, the reasons for their rejection, and specific examples based on results of earlier regressions. Nevertheless, in our view, the documentation principles in this Technical Release represent sound internal control practice that could also be applied to an agency's development of a model used to generate budget and financial statement credit subsidy estimates. Such documentation would introduce transparency into an agency's budget process and enable agencies' models and the resulting estimates to withstand scrutiny and inquiry from independent reviewers. For example, such documentation would allow validation of an agency's model by independent reviewers, and provide reasonable assurance that the agency selected and rejected assumptions and variables for the model on a sound basis. Further, this documentation would help demonstrate to congressional stakeholders sound decision making and stewardship over millions of dollars in appropriated funds. SBA Had a Process to Help Ensure Data Quality and the Data Used in the Model and SBA's Loan Level Databases Were Consistent: Calculating a reliable credit subsidy estimate requires that the key cash flow data, such as defaults or recoveries and the timing of these events be reliable, or the credit subsidy estimate could be affected. Internal control standards call for agencies to have a process to help ensure the completeness, accuracy, and validity of all transactions processed. SBA's monthly reconciliation process, combined with lender incentives and loan sales, helped ensure the quality of the underlying data used in its credit subsidy estimation process. Although at the time of our review, some errors in its data existed in SBA's databases, the nature and magnitude of these errors was unlikely to significantly alter the subsidy rate. 
Further, we tested the data used by SBA's new econometric model and found them to be consistent with the data in SBA's loan systems at the time of our review. SBA Had a Process to Identify and Correct Data Errors: The primary method that SBA used to help ensure the integrity of its loan data is its Form 1502 reconciliation process. Reconciliations are an important internal control established to ensure that all data inputs are received and are valid and all outputs from a particular system are correct. This process, which has been in effect since October 1997, utilized an SBA contractor to conduct monthly matches of borrower data submitted by 7(a) program lenders on SBA's Form 1502 to the information in the agency's Portfolio Management Query Display System to help ensure the completeness and accuracy of the agency's data. The information on the Form 1502 included a wide variety of data for an individual loan, some of which was used in the credit subsidy estimation process, and included, among other things, loan identification number; loan status such as current, past due, or in liquidation; loan interest rate; the portion of the loan guaranteed by SBA; and the ending balance of the loan's guaranteed portion. Errors identified by this match were loaded each month into SBA's Portfolio Management Guaranty Information System, where district office staff accessed them to work with lenders to correct the erroneous data. Although we did not independently test the data match conducted by SBA's contractor or the field office staff's correction of identified errors, we reviewed summary reports of the errors in the Guaranty Loan Reporting System for each district office over a 4-month period during fiscal year 2003 and found that most of these reported errors were resolved during the month the errors were identified. 
During the months we reviewed, the percentage of errors resolved ranged from a low of about 65 percent to a high of nearly 89 percent.[Footnote 23] Although one month we reviewed had only a 65 percent resolution rate, leaving 4,860 errors uncorrected at the end of the month, as explained in the following paragraph, not all of these errors would affect the subsidy estimate and this number is relatively small compared to the large volume of loan transaction level data used in the credit subsidy estimation process. Our review of the underlying data used in the model showed that about 5.7 million data records were used to record the quarterly loan performance of 392,315 loans from 1988-2001. In order to assess whether the remaining errors in SBA's database would likely have a significant effect on the credit subsidy estimation process, we reviewed the 38 different error codes that are reported monthly by the Guaranty Loan Reporting System and found that less than half of these error codes were related to data used by the econometric model and, as a result, could have affected the credit subsidy estimate. For example, the Guaranty Loan Reporting System identified errors for lender contact name and phone number--data that were not used by the new econometric model and would not affect the subsidy estimate. Other error codes relating to the guaranteed portion principal balance or whether a loan was in liquidation status could affect the credit subsidy estimate if the number of errors and their dollar volume were significant. We reviewed a 6-month summary error report from the Guaranty Loan Reporting System for activity between February and July 2003 and found that, for those error codes that could affect the credit subsidy estimate, only two of these codes had error rates that exceeded 1 percent of the transactions. 
One of these codes indicated that the loan status was not correct because the loan was in liquidation and had an average error rate of about 1.4 percent for the 6-month period we reviewed. The other error code indicated that the bank did not report any information for a particular loan and had an average error rate of about 2.4 percent for the same time period. The remaining 11 error codes that could have affected the credit subsidy estimate had rates of less than 1 percent. We assessed the error rates on this report in aggregate to determine if these could affect the credit subsidy estimate and found that the average aggregate error rate was about 6.5 percent during this period. However, given that most of these errors were corrected in the month the error was identified, it was unlikely that the remaining uncorrected errors would affect the credit subsidy estimate at the time of our review. Lender Incentives and Loan Sales Help Ensure Data Integrity: In addition to the monthly loan data reconciliation process, lender incentives also helped ensure the integrity of the underlying data used in the credit subsidy estimates. In accordance with current SBA policy, the agency can reduce or completely deny a lender's claim payment if the defaulted loan data are not correct. According to SBA officials, this policy gives the 7(a) program lenders an incentive to correct data errors because it helps ensure they will be paid the full guarantee amount if the borrower subsequently defaults on the loan. SBA provided us with repair and denial data for fiscal years 1999 through the first three quarters of fiscal year 2003 showing that the agency exercised these options 2,177 times during this time, totaling at least $69.9 million.[Footnote 24] Further, an ancillary benefit of SBA's loan sales program was to help ensure data integrity. 
Prior to a sale, SBA district office staff, as well as contractors, reviewed loan files as part of the "due diligence" reviews to provide accurate information about the loans available for sale to potential investors so that they may make informed bids. SBA officials told us that prior to selling a loan, discrepancies between the lenders' data and SBA's data had to be resolved. Data Used by the Econometric Model Were Consistent with SBA Databases: In order to assess the consistency between the data used in SBA's econometric approach and the data in SBA's loan system, we selected and tested a stratified random sample of 400 items to test key data that could affect the credit subsidy estimate and found no errors.[Footnote 25] Specifically, we randomly selected 100 default and 100 recovery transactions and compared the amounts and transaction dates between the loan system data and loan-level data used for the credit subsidy estimate. In addition, we randomly selected 100 loans identified by the model to be prepaid and reviewed the loan histories in SBA's database and determined that all of these loans were paid off prior to their scheduled termination date. Further, we tested 100 additional loans and compared their status such as current, paid off, or default to ensure their status in the model was correct and found no errors. We also assessed the magnitude of 7(a) loans that were excluded from the model in order to determine whether excluding these potentially valid loans would likely affect the credit subsidy estimate. Our earlier work on SBA's previous 7(a) credit subsidy model that primarily used historical averages of defaults and recoveries found that excluding loans from certain years that had higher default rates would lower the overall average default rate. Excluding large numbers of loans from this model would likely have a similar effect on the estimated subsidy rate. 
To assess the magnitude of excluded loans, we reviewed the computer coding for the econometric model and found that SBA excluded loans when critical data for the model were missing such as the initial disbursement date, the loan amount, or demographic information on the borrowers. For most of the years between 1988 and 2001, the number of loans excluded because they lacked these essential data ranged from 1 percent to 2 percent and overall, we concluded that the degree of excluded loans was acceptable and would not significantly affect the credit subsidy estimation calculation, at the time of our review. Conclusions: Overall, we found that from an economics perspective, SBA's econometric equations for its 7(a) credit subsidy model were reasonable. However, from an audit perspective, SBA's lack of adequate documentation of the model development process precluded us from (1) independently evaluating the model's development; (2) determining whether SBA used a sound and consistently applied method to select and reject variables to be included in the model; and (3) determining whether a bias in selecting variables existed in the model. Based on our review, SBA's econometric equations for estimating defaults, prepayments, and recoveries, which were used to derive the estimate of its fiscal year 2004 subsidy costs, were reasonable. This model's methodology has the potential to produce more reliable estimates than the previous method of using historical averaging to project the estimated program cash flows because this model relies on economic reasoning in addition to historical program data. However, the precision of any econometric model is limited because any estimate produced by such a model should be considered one point in a range within which the actual subsidy cost will likely fall. 
Because the budget process requires agencies to select a specific estimate rather than project a range, there will likely be some variance between the forecasted and actual subsidy amounts. Using additional data that SBA anticipates gathering in its new loan monitoring system, such as borrower-specific data, could further enhance the reliability of SBA's estimates of the subsidy cost. Therefore, further enhancements could produce more reliable results. Although the errors we identified in the model did not materially affect the subsidy cost estimate, they did indicate that the process SBA used to validate the model could be improved. Therefore, it is important to invest the resources needed to periodically reevaluate the underlying assumptions of any model to ensure that they are correct and comprehensive, and that any errors or erroneous assumptions are corrected so that the model continues to yield reasonable results. While we found SBA's equations to be reasonable from an economics perspective, the lack of adequate documentation of the model's development process hampered three independent reviews of the 7(a) model. Notwithstanding the current lack of clear OMB Circular A-11 guidance, SBA could benefit from applying the documentation principles embodied in Technical Release 6 to the development of the 7(a) econometric model and other credit subsidy estimation models it has recently developed or is currently developing. Without adequate documentation, SBA will be unable to transparently demonstrate the rationale and basis for key aspects of models that provide important cost information for budgets, financial statements, and congressional decision makers. 
Although OMB provides guidance on how agencies should prepare credit subsidy estimates in Circular A-11, it does not include any guidance to the agencies for documenting their model development process including the selection and rejection of variables for use in the models that generate federal credit subsidy estimates. A lack of improved OMB guidance for model documentation will continue to hamper adequate external oversight and validation of models used to generate credit subsidy estimates. Recommendations for Executive Action: We are making three recommendations to SBA and one to OMB. To further enhance the reliability of SBA's subsidy estimates, we recommend that the SBA Administrator take the following two actions: * determine how best to include in future subsidy models borrower-specific information, such as credit scores and loan-to-value ratios, to be collected in the new loan monitoring system; and: * ensure that the model remains reasonable by establishing a process for periodically evaluating the model to correct any errors and revising it to reflect changes in the 7(a) business loan program or other factors that could affect the subsidy estimate. To demonstrate and explain the rationale and basis for the 7(a) econometric model and all other models developed, we recommend that the SBA Administrator take the following action: * prepare and retain adequate documentation of the model development process including a detailed discussion of the alternative variables or combinations of variables that were considered, tested, and rejected, as well as the reasons for rejecting them. 
To facilitate (1) validation of models used to generate credit subsidy estimates, (2) external oversight, and (3) financial statement audits, we recommend that the Director, OMB, take the following action: * revise OMB Circular A-11 to require that agencies document the development of their credit subsidy models, including the process followed for selecting modeling methodologies over alternatives, and variables tested and rejected, along with the basis for excluding them. Agency Comments and Our Evaluation: We provided an initial draft and a revised draft, based on our review of additional model documentation, to both SBA and OMB for review and comment. While our initial draft was at the agencies for comment, we continued to pursue additional documentation that SBA had to further explain its 7(a) model development process, including what variables were selected, rejected, and why. When we eventually obtained access to the 800 pages of SBA material, we determined that it was not organized and included no road map to describe the variable testing process or its results. We concluded that this information was of questionable or no usefulness to our assessment of SBA's modeling process. We addressed the weaknesses in SBA's documentation in the revised draft report and provided it to SBA and OMB for comment. In commenting on the initial draft, SBA's Chief Financial Officer (CFO) generally agreed with our findings and the first two recommendations related to actions to further enhance the reliability of the model's subsidy estimates. OMB did not provide any comments on the initial draft report. We received comments on the revised draft from SBA's CFO who generally disagreed with our findings and recommendations related to the lack of adequate documentation supporting the model's development process. 
We also received comments on the revised draft from the OMB Assistant Director for Budget and the Controller who disagreed with our recommendation that OMB revise Circular A-11. Their written comments are reprinted in appendixes III and IV, respectively, and are summarized below. Both agencies provided technical comments that we have incorporated into the report as appropriate. In commenting on our final draft report, SBA stated that it had provided us with extensive documentation, briefings, and explanations about how the model was developed. We met with SBA officials and their contractor who constructed the model and discussed their methodology, but we were unable to corroborate this information with the documentation they subsequently provided. SBA's comment letter stated that it provided us with 800 pages of material that contained some information on variables that were considered and rejected. During our subsequent review of this material, we found that this documentation was a partial compilation of analyses conducted during the model development process with no explanation or discussion of what was learned from each analysis conducted. After reviewing all of this documentation, as discussed in the report, we concluded that it provided little additional information to enable us to understand and corroborate the process and criteria that SBA used to select and reject variables for its 7(a) model. Our conclusions regarding the lack of adequate documentation for the model's development process were consistent with those of both the independent contractor SBA hired to review the model in 2002 prior to its implementation and the independent public accounting firm that audited SBA's fiscal year 2003 financial statements. As part of its January 30, 2004, audit report, the independent public accounting firm identified in its internal control report 9 specific deficiencies in the model's documentation. 
These deficiencies included, for example, a lack of technical references for the statistical method used for the performance of the model, the absence of mathematical specifications, that important variables were not clearly identified, and that units of measure for key variables were not specified. In addition, the audit report stated that the documentation that was provided was "self-contradictory" about the quality of the default and prepayment model and lacked a discussion of the assumptions and limitations of SBA's modeling approach. While SBA's CFO agreed with the independent accounting firm's findings regarding the lack of adequate documentation for the credit subsidy model, he disagreed with similar weaknesses identified in our report. SBA disagreed that its lack of adequate documentation on the 7(a) model development process could impede our ability to reach a conclusion about SBA's loan accounts in connection with the audit of the consolidated financial statements of the federal government. Instead, SBA believed mandating additional documentation would establish a new and unnecessary requirement. Our comment was in regard to our responsibility as the auditor of the consolidated financial statements of the federal government and does not establish a new or unnecessary requirement for SBA. For the consolidated financial statement audit, we evaluate the reasonableness of credit program estimates based on audit guidance in SAS No. 57.[Footnote 26] In auditing estimates, SAS No. 57 states that an auditor should consider, among other things, the process used by management to develop the estimate, including determining whether or not (1) relevant factors were used, (2) reasonable assumptions were developed, and (3) biases influenced the factors or assumptions. SBA's lack of adequate documentation of the 7(a) model development process impaired our ability to make such an assessment. 
OMB disagreed with the recommendation that Circular A-11 should be revised and believed that the report did not demonstrate that revisions were needed. OMB officials commented that they worked closely with SBA during the model development process and believed that the documentation SBA provided to OMB was adequate for them to determine that the subsidy estimates and reestimates were reasonable. OMB also did not concur with our statement that a lack of improved OMB guidance hampered adequate external oversight. Unlike OMB, in this case, we and other external reviewers did not have the opportunity to work with SBA during the model development process and, as a result, relied on oral explanations and documentation provided by SBA staff and its contractor who developed the model. Further, we attempted to corroborate SBA's statements with the documentation that SBA provided. However, as we reported, three independent external reviews of SBA's 7(a) model were hampered by a lack of adequate documentation of SBA's model development process. We reaffirm our conclusion that adequate documentation is needed for the SBA 7(a) model's development and that independent external review and oversight will continue to be hampered without a requirement to provide adequate documentation about how econometric models are developed. OMB stated that Ernst and Young was able to independently validate SBA's 7(a) model with the available documentation. According to OMB, this firm stated that the 7(a) model assumptions and methodology appeared to be reasonable and accurate. We obtained and reviewed the reports OMB cited and found that the firm was not hired to validate or review the same segments of the model that we reviewed. This series of reports was related to the cash flow module of the 7(a) model, as well as the model used to calculate reestimates, but did not review the econometric equations or the model's development process. 
In its report, the firm explicitly stated that it was not reviewing the same parts of the model that we reviewed. We confirmed this information in conversations with the accounting firm's engagement partner and concluded that this firm's work was not relevant to the findings and conclusions presented in our report. OMB also commented that SAS No. 57 states that internal controls over accounting estimates may or may not be documented. While SAS No. 57 does state that the process for preparing accounting estimates may not be documented, it also states that auditors should assess whether there are additional key factors or alternative assumptions that need to be included in the estimate and assess the factors that management used in developing the assumptions. Further, SAS No. 57 states that auditors should concentrate on key factors and assumptions that are subjective and susceptible to misstatement and bias. We believe this includes the selection and rejection of variables that can be included in the model. Without adequate documentation on the credit subsidy model development process, it is difficult for auditors to fulfill their responsibilities to assess these areas. OMB also commented that SBA fulfilled the management responsibilities described in SAS No. 57 regarding internal controls for accounting estimates. We disagree with this statement and point out that SAS No. 57 provides guidance for auditing accounting estimates as part of conducting financial statement audits rather than directing agency management's actions. 
Management's responsibility for internal controls is contained in our "Standards for Internal Control in the Federal Government," which states, among other things, that "internal control and all transactions and other significant events need to be clearly documented, and the documentation should be readily available for examination."[Footnote 27] Further, as previously stated, Cotton and Company also identified the lack of adequate model documentation as an internal control weakness. Moreover, SBA's CFO generally agreed with the independent public accountant's report's findings, including the deficiencies in SBA's model documentation, and stated that the internal control report presented "fundamentals of good financial management and SBA is committed to accomplishing as many of these items as possible in the coming year." OMB also stated that requiring agencies to prepare additional documentation of the variables tested and rejected would be unduly burdensome. We disagree with this statement and note that this documentation would only need to be prepared when a model is developed or when significant updates are implemented. Further, this requirement would be consistent with other segments of OMB Circular A-11 that require agencies to provide supporting documentation for their budget submissions. However, as we mentioned in the report, there is currently no explicit guidance for agencies to document the development of the models that are used to generate credit subsidy estimates. OMB also commented that we received sufficient information to test alternative variables to measure the reasonableness of the final SBA credit subsidy model. We note that our work demonstrated that using additional variables that were also reasonable changed the subsidy estimate. We believe that this work highlights the need for agencies to document their basis for rejecting variables or combinations of variables from their final credit subsidy models. 
By documenting this work, agencies will be able to demonstrate to independent reviewers that a bias from variable selection does not exist in the final model. Both agencies provided technical comments that we incorporated into the report as appropriate. The written comments of both agencies are reprinted in appendixes III and IV. We are sending copies of this report to the Chair of the Senate Committee on Small Business and Entrepreneurship, other appropriate congressional committees, the Administrator of the Small Business Administration, and the Director of the Office of Management and Budget. We also will make copies available to others upon request. In addition, the report will be available at no charge on the GAO Web site at [Hyperlink, http://www.gao.gov]. If you have any questions about this report, please contact me at (202) 512-8678 or [Hyperlink, dagostinod@gao.gov] or Katie Harris, Assistant Director, at (202) 512-8415 or [Hyperlink, harrism@gao.gov]. Key contributors to this report are listed in appendix V. Signed by: Davi M. D'Agostino Director, Financial Markets and Community Investment: [End of section] Appendixes: [End of section] Appendix I: Objectives, Scope, and Methodology: As agreed with your staff, we (1) assessed the reasonableness of the model's econometric equations and evaluated the model's estimated default, prepayment, and recovery rates based on the 7(a) program's recent historical loan experience; (2) identified additional steps the SBA could take to further enhance the reliability of its subsidy estimate produced by the model; (3) reviewed SBA's process for developing the subsidy model; (4) evaluated the model's supporting documentation, including its discussion of what variables were tested and rejected; and (5) determined what steps SBA has taken to ensure the integrity of the data used in the model and determined whether these data are consistent with information in its databases. We did not validate SBA's model. 
Assessing the Reasonableness of the Model's Econometric Equations and Evaluating the Model's Estimated Default, Prepayment, and Recovery Rates: To analyze the model, we obtained from SBA copies of the model as approved by the Office of Management and Budget (OMB), along with the loan-level data that were used to develop the subsidy estimates. We analyzed the econometric equations to determine whether they were reasonable based on the variables they included, the statistical techniques used, and the results obtained. For example, we determined whether the econometric equations included appropriate variables and whether the variables used in the equations were statistically significant. To evaluate the model's estimated default and recovery rates, we compared these rates with recent historical loan experience of the 7(a) program provided by SBA. Using SBA's data, we also calculated what SBA would have estimated for default and recovery rates based on the estimation methodology it used prior to its fiscal year 2003 budget submission. (See app. II for a detailed discussion of our analysis of the reasonableness of the model's econometric equations.) Identifying Additional Steps SBA Could Take to Further Enhance the Reliability of the Model: To identify additional steps SBA could take to enhance the reliability of its model, we considered additional types of data that SBA might collect and consider including in its econometric equations. As part of this analysis, we reviewed the academic literature on default modeling and interviewed officials with several banks engaged in similar efforts. Reviewing SBA's Process of Developing the Subsidy Model: To determine SBA's process for developing the model, we met with SBA officials in the Chief Financial Office who were responsible for estimating the 7(a) program subsidy costs. We also met with OMB officials who were responsible for approving the model. 
Finally, we reviewed available documentation on the model's development provided by SBA and the report by the private consultant who reviewed the model. Evaluating the Model's Supporting Documentation, Including Its Discussion of What Variables Were Tested and Rejected: To evaluate the model's supporting documentation, including its discussion of what variables were tested and rejected, we obtained and analyzed available relevant documents and met with SBA officials and their contractor who developed the model. We compared the information presented in SBA's model documentation with existing credit subsidy guidance including OMB Circular A-11 and Federal Financial Accounting and Auditing Technical Release 6: Preparing Estimates for Direct Loan and Loan Guarantee Subsidies under the Federal Credit Reform Act Amendments to Technical Release 3: Preparing and Auditing Direct Loan and Loan Guarantee Subsidies under the Federal Credit Reform Act. We also assessed the impact the lack of documentation would have on SBA's financial statement audit by comparing the documentation with Statement on Auditing Standards Number 57, Auditing Accounting Estimates. SBA and its contractor told us that the 800 pages of raw testing information contained in an electronic file were not organized in any fashion, and that there was no summary document or road map that had greater detail than the model documentation provided us that described the variable-testing process or the results of that process in an understandable fashion. In addition, SBA and the contractor told us that the variables reflected in the 800 pages were not recorded in English words, but rather in mnemonics, and that there was no crosswalk or key still in existence to decode the mnemonics. Thus, no documentation existed that would link the variable names used in the programming to a table of variable descriptions. 
We obtained and reviewed a copy of this documentation and confirmed the representations of SBA and its contractor. Determining What Steps SBA Took to Ensure the Integrity of the Data Used in the Model and Whether These Data Were Consistent with Information in Its Databases: To determine what steps SBA took to ensure the integrity of the data used by the model, we met with SBA officials to gain a general understanding of the agency's data integrity efforts. We also assessed the number of errors that were resolved by the district offices each month by analyzing 4 months of fiscal year 2003 field office activity from the Form 1502 Guaranty Loan Reporting System. We further assessed whether the remaining errors at the end of the month would likely affect the credit subsidy estimate by analyzing the types of errors tracked by the system and determining which errors affected data used by the new model. We also assessed the magnitude of these errors by analyzing 6 months of fiscal year 2003 activity in the Guaranty Loan Reporting System. To determine whether the data in the new model were consistent with data in SBA's loan-level databases, we selected and tested a stratified random sample of 400 key data elements that could affect the credit subsidy estimate.[Footnote 28] Specifically, we randomly selected (1) 100 default and 100 recovery transactions and compared the amounts and transaction dates between the loan system data and loan-level data used for the credit subsidy estimate; (2) 100 loans identified by the model to be prepaid and reviewed the loan histories in SBA's database to determine whether all of these loans were paid off prior to their scheduled termination date; and (3) 100 additional loans and compared their status such as current, paid off, or default to determine if their status in the model agreed with SBA's loan-level databases. 
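The sampling and comparison steps described above can be illustrated with a minimal sketch. It is not the procedure or tooling GAO actually used; the function names, the record layout (id, amount, date), and the synthetic data are all hypothetical, and the sketch assumes an equal draw of 100 items from each of four strata.

```python
import random


def draw_stratified_sample(records_by_stratum, per_stratum=100, seed=0):
    """Draw a simple random sample of fixed size from each stratum."""
    rng = random.Random(seed)
    sample = {}
    for stratum, records in records_by_stratum.items():
        if len(records) < per_stratum:
            raise ValueError(f"stratum {stratum!r} has too few records")
        sample[stratum] = rng.sample(records, per_stratum)
    return sample


def find_mismatches(sampled, system_of_record, fields):
    """Return ids of sampled records that are missing from, or differ
    from, the system of record on any of the given fields."""
    mismatches = []
    for rec in sampled:
        ref = system_of_record.get(rec["id"])
        if ref is None or any(rec[f] != ref[f] for f in fields):
            mismatches.append(rec["id"])
    return mismatches


# Synthetic stand-ins for the model's loan-level data (hypothetical layout).
strata = ["default", "recovery", "prepaid", "status"]
model_data = {
    s: [{"id": f"{s}-{i}", "amount": i * 10, "date": f"2001-Q{i % 4 + 1}"}
        for i in range(500)]
    for s in strata
}
# Stand-in for the loan system; here it agrees with the model data exactly.
loan_system = {rec["id"]: dict(rec)
               for recs in model_data.values() for rec in recs}

sample = draw_stratified_sample(model_data)
total_items = sum(len(items) for items in sample.values())
errors = {s: find_mismatches(sample[s], loan_system, ["amount", "date"])
          for s in strata}
```

In this sketch, total_items is 400 (four strata of 100 items each), and errors lists, by stratum, any sampled item whose amount or date disagrees with the stand-in loan system.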
[End of section] Appendix II: Analysis of Default, Prepayment, and Recoveries Econometric Equations: This appendix provides more detail on the three econometric equations that the Small Business Administration (SBA) used to estimate the subsidy rate for its 7(a) loan guarantee program and the expanded equations that we developed. These equations are used to forecast defaults, prepayments, and recoveries. The first section of this appendix describes the variables that SBA used in the default and prepayment equations and presents SBA's estimated coefficients. The second section explains how we created the variable that we used to represent the borrower's industry and presents the estimated coefficients from our expanded default and prepayment equations. The third section describes the equation that SBA used to forecast recoveries and presents the estimated coefficients from that equation. SBA's Default and Prepayment Equations: In its new model for estimating the subsidy rate for the 7(a) loan program, SBA uses multinomial logistic regression to estimate the likelihood of defaults and prepayments as functions of a variety of explanatory variables. Because multinomial regression is a simultaneous estimation process, the default and prepayment equations are identically specified (that is, the same explanatory variables are used in each equation). SBA conducts its analysis at the level of the individual loan, using loans that were disbursed from 1988 through 2001. For each loan, SBA's data set contains an observation for each quarter that the loan is active. For example, if a loan prepays at the end of the third year (counting the disbursement year as the first year), then it is active during 12 quarters and, therefore, there are 12 observations for that loan in the data set. For each observation, the dependent variable measures whether in that quarter the borrower defaults on the loan, prepays the loan, or keeps it active. 
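The loan-quarter layout described above can be sketched as follows. This is a minimal illustration, not SBA's code: the record type LoanQuarter, the function expand_loan, and the loan identifier are hypothetical names, and the sketch assumes that a terminal event (default or prepayment) is recorded on the loan's last active quarter while every earlier quarter is recorded as active.

```python
from dataclasses import dataclass
from typing import List

# The three values the dependent variable can take in any quarter.
OUTCOMES = {"active", "default", "prepay"}


@dataclass
class LoanQuarter:
    loan_id: str       # hypothetical identifier
    age_quarter: int   # 1 = first quarter of the loan's life
    outcome: str       # "active", "default", or "prepay"


def expand_loan(loan_id: str, quarters_active: int,
                final_outcome: str = "active") -> List[LoanQuarter]:
    """Return one observation for each quarter the loan is active."""
    if quarters_active < 1:
        raise ValueError("a loan must be active for at least one quarter")
    if final_outcome not in OUTCOMES:
        raise ValueError(f"unknown outcome: {final_outcome}")
    rows = []
    for quarter in range(1, quarters_active + 1):
        is_last = quarter == quarters_active
        # Assumption: the terminal event is attached to the last quarter only.
        rows.append(LoanQuarter(loan_id, quarter,
                                final_outcome if is_last else "active"))
    return rows


if __name__ == "__main__":
    # A loan that prepays at the end of its third year is active 12 quarters.
    rows = expand_loan("hypothetical-loan", 12, "prepay")
    print(len(rows), rows[-1].outcome)
```

Under this layout, every loan-quarter contributes one three-way outcome to the estimation.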
As a result, the coefficients in the default or prepayment equation are estimates of the association of each explanatory variable with the likelihood of the loan defaulting or prepaying in that quarter. There are several categories of explanatory variables included in the default and prepayment equations. The first group consists of a set of dummy variables that indicate the age of the loan. These variables reflect the fact that prepayment and default behavior changes as a loan seasons. Specifically, there is a dummy variable for each of the first ten quarters of the life of a loan. From the eleventh quarter to the thirty-fourth quarter, there is a dummy variable for each pair of consecutive quarters. Finally, if a loan remains active past an age of thirty-four quarters, there is one more dummy variable. The second set of explanatory variables concerns loan characteristics. A set of dummy variables indicates the contractual term of the loan at origination. The categories are less than 5 years, 5 years to less than 10 years, 10 years to less than 15 years, and 15 years or more. Less than 5 years serves as the omitted category in the regression. Loan amount is another characteristic and is measured in millions of dollars. SBA also includes a dummy variable that shows whether a loan was delivered through the SBA Express Program. Also known as Subprogram 1027, this program allows lenders to originate a loan using their own loan documents instead of SBA documents and processing, but the loan guarantee is only up to 50 percent. By comparison, the typical SBA guarantee is almost 80 percent. Finally, there is a set of dummy variables for type of lender: Regular, Preferred, and Certified. In the regression, the regular type serves as the omitted category. The next set of explanatory variables provides information on the borrower. A set of dummy variables identifies ownership structure. The categories are sole proprietorship, corporation, and partnership. 
Sole proprietorship is the omitted category in the regression. An additional dummy variable indicates whether the borrower is a new business. Finally, there is a set of dummy variables that indicate the U.S. Census Bureau region where the borrower is located. The final set of explanatory variables contains two measures of economic conditions. The first is the state unemployment rate where the borrower is based. The source for these data is the U.S. Bureau of Labor Statistics. The second is the quarterly percentage change in gross domestic product. SBA obtained these data from the U.S. Bureau of Economic Analysis.[Footnote 29] Table 1 summarizes the explanatory variables. Table 1: Variable Names and Descriptions: Variable name; Age dummy variables: i1; Variable description: 1 if loan is 1 quarter old, else 0. Variable name; Age dummy variables: i2; Variable description: 1 if loan is 2 quarters old, else 0. Variable name; Age dummy variables: i3; Variable description: 1 if loan is 3 quarters old, else 0. Variable name; Age dummy variables: i4; Variable description: 1 if loan is 4 quarters old, else 0. Variable name; Age dummy variables: i5; Variable description: 1 if loan is 5 quarters old, else 0. Variable name; Age dummy variables: i6; Variable description: 1 if loan is 6 quarters old, else 0. Variable name; Age dummy variables: i7; Variable description: 1 if loan is 7 quarters old, else 0. Variable name; Age dummy variables: i8; Variable description: 1 if loan is 8 quarters old, else 0. Variable name; Age dummy variables: i9; Variable description: 1 if loan is 9 quarters old, else 0. Variable name; Age dummy variables: i10; Variable description: 1 if loan is 10 quarters old, else 0. Variable name; Age dummy variables: i1112; Variable description: 1 if loan is 11 or 12 quarters old, else 0. Variable name; Age dummy variables: i1314; Variable description: 1 if loan is 13 or 14 quarters old, else 0. 
Variable name; Age dummy variables: i1516; Variable description: 1 if loan is 15 or 16 quarters old, else 0. Variable name; Age dummy variables: i1718; Variable description: 1 if loan is 17 or 18 quarters old, else 0. Variable name; Age dummy variables: i1920; Variable description: 1 if loan is 19 or 20 quarters old, else 0. Variable name; Age dummy variables: i2122; Variable description: 1 if loan is 21 or 22 quarters old, else 0. Variable name; Age dummy variables: i2324; Variable description: 1 if loan is 23 or 24 quarters old, else 0. Variable name; Age dummy variables: i2526; Variable description: 1 if loan is 25 or 26 quarters old, else 0. Variable name; Age dummy variables: i2728; Variable description: 1 if loan is 27 or 28 quarters old, else 0. Variable name; Age dummy variables: i2930; Variable description: 1 if loan is 29 or 30 quarters old, else 0. Variable name; Age dummy variables: i3132; Variable description: 1 if loan is 31 or 32 quarters old, else 0. Variable name; Age dummy variables: i3334; Variable description: 1 if loan is 33 or 34 quarters old, else 0. Variable name; Age dummy variables: i35p; Variable description: 1 if loan is older than 34 quarters, else 0. Variable name; Loan characteristics: t5_10; Variable description: 1 if term of loan is at least 5 years but less than 10, else 0. Variable name; Loan characteristics: t10_15; Variable description: 1 if term of loan is at least 10 years but less than 15, else 0. Variable name; Loan characteristics: t15p; Variable description: 1 if term of loan is 15 years or more, else 0. Variable name; Loan characteristics: sub1027; Variable description: 1 if loan delivered through SBA Express Program, else 0. Variable name; Loan characteristics: loan_amt; Variable description: Gross guaranteed disbursed amount in millions. Variable name; Loan characteristics: Lender_PLP; Variable description: 1 if lender is part of the Preferred Lender Program, else 0. 
Variable name; Loan characteristics: Lender_CLP; Variable description: 1 if lender is part of the Certified Lender Program, else 0. Variable name; Borrower characteristics: Corporation; Variable description: 1 if borrower is incorporated, else 0. Variable name; Borrower characteristics: Partnership; Variable description: 1 if borrower is a partnership, else 0. Variable name; Borrower characteristics: NewBusiness; Variable description: 1 if borrower is a new business, else 0. Variable name; Borrower characteristics: Northeast; Variable description: 1 if located in U.S. Census Bureau's Northeast Region, else 0. Variable name; Borrower characteristics: Midwest; Variable description: 1 if located in U.S. Census Bureau's Midwest Region, else 0. Variable name; Borrower characteristics: South; Variable description: 1 if located in U.S. Census Bureau's South Region, else 0. Variable name; Economic conditions: Urate; Variable description: Unemployment rate in the state where firm is located. Variable name; Economic conditions: pc_gdp96; Variable description: Quarterly percent change in constant dollar GDP. Source: GAO. [End of table] The coefficients in the SBA equations indicate that the probabilities of both default and prepayment generally increase and then decline as a loan seasons. Defaults peak during the eighth quarter, while prepayments peak around quarters 27 and 28. Longer-term loans are less likely to default or prepay. By comparison, larger loans are more likely to default or prepay. Good economic conditions, as reflected by the coefficients on unemployment and the percentage change in gross domestic product, reduce the chances of default and increase the likelihood of prepayment. The positive coefficients on the variable for new business indicate that such firms are more likely both to default and to prepay. Corporations and partnerships are less likely to default and more likely to prepay than sole proprietors. 
Finally, loans granted under Subprogram 1027 are less likely to default and more likely to prepay. Table 2 presents the coefficients in SBA's default and prepayment equations as well as some summary statistics. Table 2: Multinomial Logistic Regression Coefficient Estimates[A]: Variables: Constant; Predicting to defaults: Base model: -9.7650; Predicting to prepayments: Base model: -5.2762. Variables: i1; Predicting to defaults: Base model: 2.1151; Predicting to prepayments: Base model: 1.1203. Variables: i2; Predicting to defaults: Base model: 3.1174; Predicting to prepayments: Base model: 1.6016. Variables: i3; Predicting to defaults: Base model: 3.8158; Predicting to prepayments: Base model: 1.9374. Variables: i4; Predicting to defaults: Base model: 4.2247; Predicting to prepayments: Base model: 2.1063. Variables: i5; Predicting to defaults: Base model: 4.5187; Predicting to prepayments: Base model: 2.2865. Variables: i6; Predicting to defaults: Base model: 4.6659; Predicting to prepayments: Base model: 2.4113. Variables: i7; Predicting to defaults: Base model: 4.7487; Predicting to prepayments: Base model: 2.5805. Variables: i8; Predicting to defaults: Base model: 4.8211; Predicting to prepayments: Base model: 2.7080. Variables: i9; Predicting to defaults: Base model: 4.8068; Predicting to prepayments: Base model: 2.8163. Variables: i10; Predicting to defaults: Base model: 4.8121; Predicting to prepayments: Base model: 2.9133. Variables: i1112; Predicting to defaults: Base model: 4.8033; Predicting to prepayments: Base model: 3.0540. Variables: i1314; Predicting to defaults: Base model: 4.7772; Predicting to prepayments: Base model: 3.1439. Variables: i1516; Predicting to defaults: Base model: 4.7101; Predicting to prepayments: Base model: 3.3111. Variables: i1718; Predicting to defaults: Base model: 4.6214; Predicting to prepayments: Base model: 3.4554. Variables: i1920; Predicting to defaults: Base model: 4.6136; Predicting to prepayments: Base model: 3.6945. 
Variables: i2122; Predicting to defaults: Base model: 4.5156; Predicting to prepayments: Base model: 3.5201. Variables: i2324; Predicting to defaults: Base model: 4.4297; Predicting to prepayments: Base model: 3.6685. Variables: i2526; Predicting to defaults: Base model: 4.2945; Predicting to prepayments: Base model: 3.8222. Variables: i2728; Predicting to defaults: Base model: 4.3414; Predicting to prepayments: Base model: 4.0106. Variables: i2930; Predicting to defaults: Base model: 4.2515; Predicting to prepayments: Base model: 3.6142. Variables: i3132; Predicting to defaults: Base model: 4.2036; Predicting to prepayments: Base model: 3.7143. Variables: i3334; Predicting to defaults: Base model: 4.1378; Predicting to prepayments: Base model: 3.7914. Variables: i35p; Predicting to defaults: Base model: 4.1027; Predicting to prepayments: Base model: 3.9950. Variables: t5_10; Predicting to defaults: Base model: -0.0462[A]; Predicting to prepayments: Base model: -0.6568. Variables: t10_15; Predicting to defaults: Base model: -0.7596; Predicting to prepayments: Base model: -1.1013. Variables: t15p; Predicting to defaults: Base model: -0.7395; Predicting to prepayments: Base model: -1.1014. Variables: sub1027; Predicting to defaults: Base model: -0.5800; Predicting to prepayments: Base model: 0.0812. Variables: loan_amt; Predicting to defaults: Base model: 0.2578; Predicting to prepayments: Base model: 0.1189. Variables: corporation; Predicting to defaults: Base model: -0.0434; Predicting to prepayments: Base model: 0.0989. Variables: partnership; Predicting to defaults: Base model: -0.1982; Predicting to prepayments: Base model: 0.0211[A]. Variables: northeast; Predicting to defaults: Base model: 0.3612; Predicting to prepayments: Base model: -0.2054. Variables: midwest; Predicting to defaults: Base model: 0.2184; Predicting to prepayments: Base model: -0.1869. 
Variables: south; Predicting to defaults: Base model: 0.4142; Predicting to prepayments: Base model: -0.0928. Variables: Lender_PLP; Predicting to defaults: Base model: -0.1761; Predicting to prepayments: Base model: 0.0824. Variables: Lender_CLP; Predicting to defaults: Base model: -0.1688; Predicting to prepayments: Base model: -0.0014[B]. Variables: NewBusiness; Predicting to defaults: Base model: 0.2773; Predicting to prepayments: Base model: 0.0678. Variables: urate; Predicting to defaults: Base model: 0.1043; Predicting to prepayments: Base model: -0.0957. Variables: Pc_gdp96; Predicting to defaults: Base model: -0.1261; Predicting to prepayments: Base model: 0.0661. Summary statistics for multinomial logistic regression models: N of Observations; Predicting to prepayments: Base model: 5,736,628. Summary statistics for multinomial logistic regression models: Variables: Likelihood Ratio Chi Sq; Predicting to prepayments: Base model: 120,478. Summary statistics for multinomial logistic regression models: Variables: Degrees of Freedom; Predicting to prepayments: Base model: 76. Summary statistics for multinomial logistic regression models: Variables: Significance levels; Predicting to prepayments: Base model: <.0001. Source: GAO. [A] Except as noted, the significance level of each coefficient is less than or equal to .0001. Coefficients marked [A] in the body of the table have significance levels less than .05; those marked [B] have significance levels greater than .05. [End of table] Effects of Including Additional Variables: Although we found that SBA's default and prepayment equations are reasonable, we evaluated the impact of including additional variables in those equations and found that equations containing some additional variables are also reasonable. In particular, we found that when measures of interest rates and the industry of the borrower are included, these factors appear to be significantly related to the likelihood of defaults and prepayments. Table 3 presents the descriptions of the additional variables. 
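To illustrate how the base-model coefficients in table 2 translate into quarterly probabilities, the following sketch applies the standard multinomial logit formula to a hypothetical loan. It is an illustration under stated assumptions, not SBA's or GAO's implementation: the function names and the example loan are invented, only a subset of the table 2 coefficients is transcribed, remaining active is assumed to be the base outcome (linear index of zero), and the unemployment rate and GDP change are assumed to enter in percentage points.

```python
import math

# Subset of SBA base-model coefficients transcribed from table 2.
DEFAULT_COEF = {"Constant": -9.7650, "i1": 2.1151, "i8": 4.8211,
                "t5_10": -0.0462, "loan_amt": 0.2578, "corporation": -0.0434,
                "south": 0.4142, "urate": 0.1043, "Pc_gdp96": -0.1261}
PREPAY_COEF = {"Constant": -5.2762, "i1": 1.1203, "i8": 2.7080,
               "t5_10": -0.6568, "loan_amt": 0.1189, "corporation": 0.0989,
               "south": -0.0928, "urate": -0.0957, "Pc_gdp96": 0.0661}


def linear_index(coef, x):
    """Sum of coefficient times variable value over the variables in x."""
    return sum(coef[name] * value for name, value in x.items())


def quarterly_probabilities(x):
    """Multinomial logit probabilities for one loan-quarter."""
    exp_default = math.exp(linear_index(DEFAULT_COEF, x))
    exp_prepay = math.exp(linear_index(PREPAY_COEF, x))
    # Assumption: "active" is the base outcome, so its term is exp(0) = 1.
    denom = 1.0 + exp_default + exp_prepay
    return {"default": exp_default / denom,
            "prepay": exp_prepay / denom,
            "active": 1.0 / denom}


def hypothetical_loan(age_dummy):
    """Invented example: $0.25 million loan to a corporation in the South,
    5-to-10-year term, 5.0 percent state unemployment, 0.8 percent GDP growth.
    Variables not listed are dummies equal to 0."""
    return {"Constant": 1.0, age_dummy: 1.0, "t5_10": 1.0, "loan_amt": 0.25,
            "corporation": 1.0, "south": 1.0, "urate": 5.0, "Pc_gdp96": 0.8}


if __name__ == "__main__":
    for age in ("i1", "i8"):
        print(age, quarterly_probabilities(hypothetical_loan(age)))
```

Because the age dummies enter each index additively, moving the same loan from the first quarter (i1) to the eighth (i8) raises both the default and the prepayment probability, which matches the seasoning pattern described for the SBA equations.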
Table 3: Names and Descriptions of Additional Variables: Variable name: tbill; Variable description: Interest rate on 1-year U.S. Treasury bills. Variable name: Agri_etc; Variable description: 1 if firm is in agriculture, else 0. Variable name: Mine_Const; Variable description: 1 if firm is in mining or construction, else 0. Variable name: Manuf; Variable description: 1 if firm is in manufacturing, else 0. Variable name: Wholesale; Variable description: 1 if firm is in wholesale trade, else 0. Variable name: Trans_etc; Variable description: 1 if firm is in transportation, communication, or utilities, else 0. Variable name: Retail; Variable description: 1 if firm is in retail trade, else 0. Variable name: Finan_etc; Variable description: 1 if firm is in finance, insurance, or real estate, else 0. Source: GAO. [End of table] Table 4 presents the coefficients from three alternative specifications of the default and prepayment equations and, for comparison purposes, the coefficients from SBA's equations. The first pair of alternative equations includes an interest rate variable, the second pair includes a set of dummy variables that identify the borrower's industry, and the third pair includes both the interest rate variable and the industry-specific dummy variables. The interest rate variable that we use is the interest rate on 1-year Treasury bills. We selected that rate, in part, because of the availability of forecasted values for it that would be consistent with the forecasted values SBA uses for other economic indicators in forecasting future defaults and prepayments. To create the industry-specific dummy variables, we used data from SBA that identified the borrower's industry category, using either the Standard Industrial Classification (SIC) codes or the North American Industrial Classification (NAIC) codes. 
The NAIC is the Department of Commerce's current system for classifying businesses into industries and in 1997 the NAIC codes replaced the SIC codes that Commerce previously used. When possible, for loans that had NAIC codes, but not SIC codes, we converted the NAIC code into the corresponding SIC code. We aggregated the SIC codes into broader categories defined by the first digit of the code. To reduce the number of dummy variables, we aggregated some small categories. In particular, we aggregated mining and construction and combined the small number of firms classified in the public administration industry with firms in the service industry and used that category as the omitted category in our regressions. As a result, the coefficients on the industry-specific dummy variables should be interpreted as the difference in the likelihood of default and prepayment from the likelihood for the service category. Table 5 shows how loans in SBA's database are distributed among categories defined by single-digit SIC codes. Table 4: Multinomial Logistic Regression Coefficient Estimates[A]: Variables; Predicting to defaults: Constant; Base model: Predicting to defaults: -9.765; Base+ T-bill: Predicting to defaults: -9.903; Base + SIC codes: Predicting to defaults: -9.958; Base + SIC + T-bill: Predicting to defaults: -10.078. Variables; Predicting to defaults: i1; Base model: Predicting to defaults: 2.115; Base+ T-bill: Predicting to defaults: 2.116; Base + SIC codes: Predicting to defaults: 2.110; Base + SIC + T-bill: Predicting to defaults: 2.111. Variables; Predicting to defaults: i2; Base model: Predicting to defaults: 3.117; Base+ T-bill: Predicting to defaults: 3.119; Base + SIC codes: Predicting to defaults: 3.109; Base + SIC + T-bill: Predicting to defaults: 3.110. 
Variables; Predicting to defaults: i3; Base model: Predicting to defaults: 3.816; Base+ T-bill: Predicting to defaults: 3.819; Base + SIC codes: Predicting to defaults: 3.806; Base + SIC + T-bill: Predicting to defaults: 3.809. Variables; Predicting to defaults: i4; Base model: Predicting to defaults: 4.225; Base+ T-bill: Predicting to defaults: 4.228; Base + SIC codes: Predicting to defaults: 4.213; Base + SIC + T-bill: Predicting to defaults: 4.216. Variables; Predicting to defaults: i5; Base model: Predicting to defaults: 4.519; Base+ T-bill: Predicting to defaults: 4.523; Base + SIC codes: Predicting to defaults: 4.506; Base + SIC + T-bill: Predicting to defaults: 4.510. Variables; Predicting to defaults: i6; Base model: Predicting to defaults: 4.666; Base+ T-bill: Predicting to defaults: 4.671; Base + SIC codes: Predicting to defaults: 4.655; Base + SIC + T-bill: Predicting to defaults: 4.660. Variables; Predicting to defaults: i7; Base model: Predicting to defaults: 4.749; Base+ T-bill: Predicting to defaults: 4.755; Base + SIC codes: Predicting to defaults: 4.737; Base + SIC + T-bill: Predicting to defaults: 4.742. Variables; Predicting to defaults: i8; Base model: Predicting to defaults: 4.821; Base+ T-bill: Predicting to defaults: 4.828; Base + SIC codes: Predicting to defaults: 4.811; Base + SIC + T-bill: Predicting to defaults: 4.817. Variables; Predicting to defaults: i9; Base model: Predicting to defaults: 4.807; Base+ T-bill: Predicting to defaults: 4.815; Base + SIC codes: Predicting to defaults: 4.798; Base + SIC + T-bill: Predicting to defaults: 4.805. Variables; Predicting to defaults: i10; Base model: Predicting to defaults: 4.812; Base+ T-bill: Predicting to defaults: 4.821; Base + SIC codes: Predicting to defaults: 4.803; Base + SIC + T-bill: Predicting to defaults: 4.811. 
Variables; Predicting to defaults: i1112; Base model: Predicting to defaults: 4.803; Base+ T-bill: Predicting to defaults: 4.813; Base + SIC codes: Predicting to defaults: 4.795; Base + SIC + T-bill: Predicting to defaults: 4.804. Variables; Predicting to defaults: i1314; Base model: Predicting to defaults: 4.777; Base+ T-bill: Predicting to defaults: 4.789; Base + SIC codes: Predicting to defaults: 4.769; Base + SIC + T-bill: Predicting to defaults: 4.780. Variables; Predicting to defaults: i1516; Base model: Predicting to defaults: 4.710; Base+ T-bill: Predicting to defaults: 4.723; Base + SIC codes: Predicting to defaults: 4.703; Base + SIC + T-bill: Predicting to defaults: 4.715. Variables; Predicting to defaults: i1718; Base model: Predicting to defaults: 4.621; Base+ T-bill: Predicting to defaults: 4.634; Base + SIC codes: Predicting to defaults: 4.616; Base + SIC + T-bill: Predicting to defaults: 4.628. Variables; Predicting to defaults: i1920; Base model: Predicting to defaults: 4.614; Base+ T-bill: Predicting to defaults: 4.625; Base + SIC codes: Predicting to defaults: 4.611; Base + SIC + T-bill: Predicting to defaults: 4.620. Variables; Predicting to defaults: i2122; Base model: Predicting to defaults: 4.516; Base+ T-bill: Predicting to defaults: 4.526; Base + SIC codes: Predicting to defaults: 4.509; Base + SIC + T-bill: Predicting to defaults: 4.518. Variables; Predicting to defaults: i2324; Base model: Predicting to defaults: 4.430; Base+ T-bill: Predicting to defaults: 4.441; Base + SIC codes: Predicting to defaults: 4.421; Base + SIC + T-bill: Predicting to defaults: 4.431. Variables; Predicting to defaults: i2526; Base model: Predicting to defaults: 4.295; Base+ T-bill: Predicting to defaults: 4.308; Base + SIC codes: Predicting to defaults: 4.292; Base + SIC + T-bill: Predicting to defaults: 4.304. 
Variables; Predicting to defaults: i2728; Base model: Predicting to defaults: 4.341; Base+ T-bill: Predicting to defaults: 4.354; Base + SIC codes: Predicting to defaults: 4.332; Base + SIC + T-bill: Predicting to defaults: 4.343. Variables; Predicting to defaults: i2930; Base model: Predicting to defaults: 4.252; Base+ T-bill: Predicting to defaults: 4.263; Base + SIC codes: Predicting to defaults: 4.231; Base + SIC + T-bill: Predicting to defaults: 4.242. Variables; Predicting to defaults: i3132; Base model: Predicting to defaults: 4.204; Base+ T-bill: Predicting to defaults: 4.216; Base + SIC codes: Predicting to defaults: 4.198; Base + SIC + T-bill: Predicting to defaults: 4.209. Variables; Predicting to defaults: i3334; Base model: Predicting to defaults: 4.138; Base+ T-bill: Predicting to defaults: 4.151; Base + SIC codes: Predicting to defaults: 4.131; Base + SIC + T-bill: Predicting to defaults: 4.143. Variables; Predicting to defaults: i35p; Base model: Predicting to defaults: 4.103; Base+ T-bill: Predicting to defaults: 4.121; Base + SIC codes: Predicting to defaults: 4.084; Base + SIC + T-bill: Predicting to defaults: 4.100. Variables; Predicting to defaults: t5_10; Base model: Predicting to defaults: -0.046[A]; Base+ T-bill: Predicting to defaults: -0.046[A]; Base + SIC codes: Predicting to defaults: -0.064; Base + SIC + T-bill: Predicting to defaults: -0.063. Variables; Predicting to defaults: t10_15; Base model: Predicting to defaults: -0.760; Base+ T-bill: Predicting to defaults: -0.761; Base + SIC codes: Predicting to defaults: -0.738; Base + SIC + T-bill: Predicting to defaults: -0.739. Variables; Predicting to defaults: t15p; Base model: Predicting to defaults: -0.740; Base+ T-bill: Predicting to defaults: -0.739; Base + SIC codes: Predicting to defaults: -0.709; Base + SIC + T-bill: Predicting to defaults: -0.708. 
Variables; Predicting to defaults: sub1027; Base model: Predicting to defaults: -0.580; Base+ T-bill: Predicting to defaults: -0.565; Base + SIC codes: Predicting to defaults: -0.553; Base + SIC + T-bill: Predicting to defaults: -0.541. Variables; Predicting to defaults: Loan_amt; Base model: Predicting to defaults: 0.258; Base+ T-bill: Predicting to defaults: 0.259; Base + SIC codes: Predicting to defaults: 0.278; Base + SIC + T-bill: Predicting to defaults: 0.279. Variables; Predicting to defaults: Corporation; Base model: Predicting to defaults: -0.043; Base+ T-bill: Predicting to defaults: -0.043; Base + SIC codes: Predicting to defaults: -0.084; Base + SIC + T-bill: Predicting to defaults: -0.083. Variables; Predicting to defaults: Partnership; Base model: Predicting to defaults: -0.198; Base+ T-bill: Predicting to defaults: -0.199; Base + SIC codes: Predicting to defaults: -0.199; Base + SIC + T-bill: Predicting to defaults: -0.199. Variables; Predicting to defaults: Northeast; Base model: Predicting to defaults: 0.361; Base+ T-bill: Predicting to defaults: 0.365; Base + SIC codes: Predicting to defaults: 0.355; Base + SIC + T-bill: Predicting to defaults: 0.358. Variables; Predicting to defaults: Midwest; Base model: Predicting to defaults: 0.218; Base+ T-bill: Predicting to defaults: 0.224; Base + SIC codes: Predicting to defaults: 0.210; Base + SIC + T-bill: Predicting to defaults: 0.215. Variables; Predicting to defaults: South; Base model: Predicting to defaults: 0.414; Base+ T-bill: Predicting to defaults: 0.418; Base + SIC codes: Predicting to defaults: 0.433; Base + SIC + T-bill: Predicting to defaults: 0.436. Variables; Predicting to defaults: Lender_PLP; Base model: Predicting to defaults: -0.176; Base+ T-bill: Predicting to defaults: -0.171; Base + SIC codes: Predicting to defaults: -0.175; Base + SIC + T-bill: Predicting to defaults: -0.170. 
Variables; Predicting to defaults: Lender_CLP; Base model: Predicting to defaults: -0.169; Base+ T-bill: Predicting to defaults: -0.171; Base + SIC codes: Predicting to defaults: -0.176; Base + SIC + T-bill: Predicting to defaults: -0.177. Variables; Predicting to defaults: New business; Base model: Predicting to defaults: 0.277; Base+ T-bill: Predicting to defaults: 0.279; Base + SIC codes: Predicting to defaults: 0.278; Base + SIC + T-bill: Predicting to defaults: 0.279. Variables; Predicting to defaults: Urate; Base model: Predicting to defaults: 0.104; Base+ T-bill: Predicting to defaults: 0.107; Base + SIC codes: Predicting to defaults: 0.102; Base + SIC + T-bill: Predicting to defaults: 0.104. Variables; Predicting to defaults: pc_gdp96; Base model: Predicting to defaults: -0.126; Base+ T-bill: Predicting to defaults: -0.129; Base + SIC codes: Predicting to defaults: -0.124; Base + SIC + T-bill: Predicting to defaults: -0.126. Variables; Predicting to defaults: T-bill; Base model: Predicting to defaults: [Empty]; Base+ T-bill: Predicting to defaults: 0.022; Base + SIC codes: Predicting to defaults: [Empty]; Base + SIC + T-bill: Predicting to defaults: 0.020. Variables; Predicting to defaults: Agri_etc; Base model: Predicting to defaults: [Empty]; Base+ T-bill: Predicting to defaults: [Empty]; Base + SIC codes: Predicting to defaults: -0.537; Base + SIC + T-bill: Predicting to defaults: -0.537. Variables; Predicting to defaults: Mine_Const; Base model: Predicting to defaults: [Empty]; Base+ T-bill: Predicting to defaults: [Empty]; Base + SIC codes: Predicting to defaults: 0.306; Base + SIC + T-bill: Predicting to defaults: 0.306. Variables; Predicting to defaults: Manuf; Base model: Predicting to defaults: [Empty]; Base+ T-bill: Predicting to defaults: [Empty]; Base + SIC codes: Predicting to defaults: 0.319; Base + SIC + T-bill: Predicting to defaults: 0.318. 
Variables; Predicting to defaults: Wholesale; Base model: Predicting to defaults: [Empty]; Base+ T-bill: Predicting to defaults: [Empty]; Base + SIC codes: Predicting to defaults: 0.202; Base + SIC + T-bill: Predicting to defaults: 0.201. Variables; Predicting to defaults: Trans_etc; Base model: Predicting to defaults: [Empty]; Base+ T-bill: Predicting to defaults: [Empty]; Base + SIC codes: Predicting to defaults: 0.208; Base + SIC + T-bill: Predicting to defaults: 0.208. Variables; Predicting to defaults: Retail; Base model: Predicting to defaults: [Empty]; Base+ T-bill: Predicting to defaults: [Empty]; Base + SIC codes: Predicting to defaults: 0.443; Base + SIC + T-bill: Predicting to defaults: 0.443. Variables; Predicting to defaults: Finan_etc; Base model: Predicting to defaults: [Empty]; Base+ T-bill: Predicting to defaults: [Empty]; Base + SIC codes: Predicting to defaults: -0.146[A]; Base + SIC + T-bill: Predicting to defaults: -0.145[A]. Variables; Predicting to prepayments: Constant; Base model: Predicting to prepayments: -5.276; Base+ T-bill: Predicting to prepayments: -4.917; Base + SIC codes: Predicting to prepayments: -5.293; Base + SIC + T-bill: Predicting to prepayments: -4.932. Variables; Predicting to prepayments: i1; Base model: Predicting to prepayments: 1.120; Base+ T-bill: Predicting to prepayments: 1.119; Base + SIC codes: Predicting to prepayments: 1.121; Base + SIC + T-bill: Predicting to prepayments: 1.119. Variables; Predicting to prepayments: i2; Base model: Predicting to prepayments: 1.602; Base+ T-bill: Predicting to prepayments: 1.597; Base + SIC codes: Predicting to prepayments: 1.603; Base + SIC + T-bill: Predicting to prepayments: 1.598. Variables; Predicting to prepayments: i3; Base model: Predicting to prepayments: 1.937; Base+ T-bill: Predicting to prepayments: 1.931; Base + SIC codes: Predicting to prepayments: 1.937; Base + SIC + T-bill: Predicting to prepayments: 1.930. 
Variables; Predicting to prepayments: i4; Base model: Predicting to prepayments: 2.106; Base+ T-bill: Predicting to prepayments: 2.097; Base + SIC codes: Predicting to prepayments: 2.108; Base + SIC + T-bill: Predicting to prepayments: 2.098. Variables; Predicting to prepayments: i5; Base model: Predicting to prepayments: 2.287; Base+ T-bill: Predicting to prepayments: 2.275; Base + SIC codes: Predicting to prepayments: 2.288; Base + SIC + T-bill: Predicting to prepayments: 2.275. Variables; Predicting to prepayments: i6; Base model: Predicting to prepayments: 2.411; Base+ T-bill: Predicting to prepayments: 2.398; Base + SIC codes: Predicting to prepayments: 2.413; Base + SIC + T-bill: Predicting to prepayments: 2.398. Variables; Predicting to prepayments: i7; Base model: Predicting to prepayments: 2.581; Base+ T-bill: Predicting to prepayments: 2.565; Base + SIC codes: Predicting to prepayments: 2.580; Base + SIC + T-bill: Predicting to prepayments: 2.564. Variables; Predicting to prepayments: i8; Base model: Predicting to prepayments: 2.708; Base+ T-bill: Predicting to prepayments: 2.691; Base + SIC codes: Predicting to prepayments: 2.709; Base + SIC + T-bill: Predicting to prepayments: 2.691. Variables; Predicting to prepayments: i9; Base model: Predicting to prepayments: 2.816; Base+ T-bill: Predicting to prepayments: 2.797; Base + SIC codes: Predicting to prepayments: 2.817; Base + SIC + T-bill: Predicting to prepayments: 2.797. Variables; Predicting to prepayments: i10; Base model: Predicting to prepayments: 2.913; Base+ T-bill: Predicting to prepayments: 2.893; Base + SIC codes: Predicting to prepayments: 2.913; Base + SIC + T-bill: Predicting to prepayments: 2.892. Variables; Predicting to prepayments: i1112; Base model: Predicting to prepayments: 3.054; Base+ T-bill: Predicting to prepayments: 3.032; Base + SIC codes: Predicting to prepayments: 3.055; Base + SIC + T-bill: Predicting to prepayments: 3.032. 
Variables; Predicting to prepayments: i1314; Base model: Predicting to prepayments: 3.144; Base+ T-bill: Predicting to prepayments: 3.117; Base + SIC codes: Predicting to prepayments: 3.146; Base + SIC + T-bill: Predicting to prepayments: 3.118. Variables; Predicting to prepayments: i1516; Base model: Predicting to prepayments: 3.311; Base+ T-bill: Predicting to prepayments: 3.281; Base + SIC codes: Predicting to prepayments: 3.312; Base + SIC + T-bill: Predicting to prepayments: 3.282. Variables; Predicting to prepayments: i1718; Base model: Predicting to prepayments: 3.455; Base+ T-bill: Predicting to prepayments: 3.427; Base + SIC codes: Predicting to prepayments: 3.456; Base + SIC + T-bill: Predicting to prepayments: 3.427. Variables; Predicting to prepayments: i1920; Base model: Predicting to prepayments: 3.695; Base+ T-bill: Predicting to prepayments: 3.670; Base + SIC codes: Predicting to prepayments: 3.694; Base + SIC + T-bill: Predicting to prepayments: 3.669. Variables; Predicting to prepayments: i2122; Base model: Predicting to prepayments: 3.520; Base+ T-bill: Predicting to prepayments: 3.497; Base + SIC codes: Predicting to prepayments: 3.521; Base + SIC + T-bill: Predicting to prepayments: 3.497. Variables; Predicting to prepayments: i2324; Base model: Predicting to prepayments: 3.669; Base+ T-bill: Predicting to prepayments: 3.644; Base + SIC codes: Predicting to prepayments: 3.668; Base + SIC + T-bill: Predicting to prepayments: 3.642. Variables; Predicting to prepayments: i2526; Base model: Predicting to prepayments: 3.822; Base+ T-bill: Predicting to prepayments: 3.793; Base + SIC codes: Predicting to prepayments: 3.823; Base + SIC + T-bill: Predicting to prepayments: 3.792. Variables; Predicting to prepayments: i2728; Base model: Predicting to prepayments: 4.011; Base+ T-bill: Predicting to prepayments: 3.982; Base + SIC codes: Predicting to prepayments: 4.010; Base + SIC + T-bill: Predicting to prepayments: 3.980. 
Variables; Predicting to prepayments: i2930; Base model: Predicting to prepayments: 3.614; Base+ T-bill: Predicting to prepayments: 3.587; Base + SIC codes: Predicting to prepayments: 3.613; Base + SIC + T-bill: Predicting to prepayments: 3.584. Variables; Predicting to prepayments: i3132; Base model: Predicting to prepayments: 3.714; Base+ T-bill: Predicting to prepayments: 3.685; Base + SIC codes: Predicting to prepayments: 3.714; Base + SIC + T-bill: Predicting to prepayments: 3.684. Variables; Predicting to prepayments: i3334; Base model: Predicting to prepayments: 3.791; Base+ T-bill: Predicting to prepayments: 3.760; Base + SIC codes: Predicting to prepayments: 3.789; Base + SIC + T-bill: Predicting to prepayments: 3.756. Variables; Predicting to prepayments: i35p; Base model: Predicting to prepayments: 3.995; Base+ T-bill: Predicting to prepayments: 3.952; Base + SIC codes: Predicting to prepayments: 3.992; Base + SIC + T-bill: Predicting to prepayments: 3.948. Variables; Predicting to prepayments: t5_10; Base model: Predicting to prepayments: -0.657; Base+ T-bill: Predicting to prepayments: -0.659; Base + SIC codes: Predicting to prepayments: -0.652; Base + SIC + T-bill: Predicting to prepayments: -0.654. Variables; Predicting to prepayments: t10_15; Base model: Predicting to prepayments: -1.101; Base+ T-bill: Predicting to prepayments: -1.100; Base + SIC codes: Predicting to prepayments: -1.092; Base + SIC + T-bill: Predicting to prepayments: -1.091. Variables; Predicting to prepayments: t15p; Base model: Predicting to prepayments: -1.101; Base+ T-bill: Predicting to prepayments: -1.101; Base + SIC codes: Predicting to prepayments: -1.091; Base + SIC + T-bill: Predicting to prepayments: -1.091. Variables; Predicting to prepayments: Sub1027; Base model: Predicting to prepayments: 0.081; Base+ T-bill: Predicting to prepayments: 0.048[A]; Base + SIC codes: Predicting to prepayments: 0.078; Base + SIC + T-bill: Predicting to prepayments: 0.045[A]. 
Variables; Predicting to prepayments: Loan_amt; Base model: Predicting to defaults: 0.119; Base+ T-bill: Predicting to defaults: 0.119; Base + SIC codes: Predicting to defaults: 0.110; Base + SIC + T-bill: Predicting to defaults: 0.110. Variables; Predicting to prepayments: Corporation; Base model: Predicting to defaults: 0.099; Base+ T-bill: Predicting to defaults: 0.097; Base + SIC codes: Predicting to defaults: 0.091; Base + SIC + T-bill: Predicting to defaults: 0.089. Variables; Predicting to prepayments: Partnership; Base model: Predicting to defaults: 0.021[A]; Base+ T-bill: Predicting to defaults: 0.024[A]; Base + SIC codes: Predicting to defaults: 0.020; Base + SIC + T-bill: Predicting to defaults: 0.022. Variables; Predicting to prepayments: Northeast; Base model: Predicting to defaults: -0.205; Base+ T-bill: Predicting to defaults: -0.214; Base + SIC codes: Predicting to defaults: -0.206; Base + SIC + T-bill: Predicting to defaults: -0.215. Variables; Predicting to prepayments: Midwest; Base model: Predicting to defaults: -0.187; Base+ T-bill: Predicting to defaults: -0.200; Base + SIC codes: Predicting to defaults: -0.186; Base + SIC + T-bill: Predicting to defaults: -0.200. Variables; Predicting to prepayments: South; Base model: Predicting to defaults: -0.093; Base+ T-bill: Predicting to defaults: -0.101; Base + SIC codes: Predicting to defaults: -0.091; Base + SIC + T-bill: Predicting to defaults: -0.099. Variables; Predicting to prepayments: Lender_PLP; Base model: Predicting to defaults: 0.082; Base+ T-bill: Predicting to defaults: 0.072; Base + SIC codes: Predicting to defaults: 0.085; Base + SIC + T-bill: Predicting to defaults: 0.075. Variables; Predicting to prepayments: Lender_CLP; Base model: Predicting to defaults: -0.001[B]; Base+ T-bill: Predicting to defaults: 0.003[B]; Base + SIC codes: Predicting to defaults: -0.001[B]; Base + SIC + T-bill: Predicting to defaults: 0.003[B]. 
Variables; Predicting to prepayments: NewBusiness; Base model: Predicting to defaults: 0.068; Base+ T-bill: Predicting to defaults: 0.064; Base + SIC codes: Predicting to defaults: 0.077; Base + SIC + T-bill: Predicting to defaults: 0.073. Variables; Predicting to prepayments: Urate; Base model: Predicting to defaults: -0.096; Base+ T-bill: Predicting to defaults: -0.103; Base + SIC codes: Predicting to defaults: -0.096; Base + SIC + T-bill: Predicting to defaults: -0.104. Variables; Predicting to prepayments: Pc_gdp96; Base model: Predicting to defaults: 0.066; Base+ T-bill: Predicting to defaults: 0.076; Base + SIC codes: Predicting to defaults: 0.065; Base + SIC + T-bill: Predicting to defaults: 0.076. Variables; Predicting to prepayments: Tbill; Base model: Predicting to defaults: [Empty]; Base+ T-bill: Predicting to defaults: -0.059; Base + SIC codes: Predicting to defaults: [Empty]; Base + SIC + T-bill: Predicting to defaults: -0.059. Variables; Predicting to prepayments: Agri_etc; Base model: Predicting to defaults: [Empty]; Base+ T-bill: Predicting to defaults: [Empty]; Base + SIC codes: Predicting to defaults: 0.012[B]; Base + SIC + T-bill: Predicting to defaults: 0.010[B]. Variables; Predicting to prepayments: Mine_Const; Base model: Predicting to defaults: [Empty]; Base+ T-bill: Predicting to defaults: [Empty]; Base + SIC codes: Predicting to defaults: 0.058; Base + SIC + T-bill: Predicting to defaults: 0.059. Variables; Predicting to prepayments: Manuf; Base model: Predicting to defaults: [Empty]; Base+ T-bill: Predicting to defaults: [Empty]; Base + SIC codes: Predicting to defaults: 0.029[B]; Base + SIC + T-bill: Predicting to defaults: 0.032[A]. Variables; Predicting to prepayments: Wholesale; Base model: Predicting to defaults: [Empty]; Base+ T-bill: Predicting to defaults: [Empty]; Base + SIC codes: Predicting to defaults: 0.077; Base + SIC + T-bill: Predicting to defaults: 0.080. 
Variables; Predicting to prepayments: Trans_etc; Base model: Predicting to defaults: [Empty]; Base+ T-bill: Predicting to defaults: [Empty]; Base + SIC codes: Predicting to defaults: 0.138; Base + SIC + T-bill: Predicting to defaults: 0.139. Variables; Predicting to prepayments: Retail; Base model: Predicting to defaults: [Empty]; Base+ T-bill: Predicting to defaults: [Empty]; Base + SIC codes: Predicting to defaults: -0.004[B]; Base + SIC + T-bill: Predicting to defaults: -0.004[B]. Variables; Predicting to prepayments: Finan_etc; Base model: Predicting to defaults: [Empty]; Base+ T-bill: Predicting to defaults: [Empty]; Base + SIC codes: Predicting to defaults: 0.057[A]; Base + SIC + T-bill: Predicting to defaults: 0.056[A]. Summary statistics for multinomial logistic regression models: N of Observations; Base model: Predicting to defaults: 5,736,628; Base+ T-bill: Predicting to defaults: 5,736,628; Base + SIC codes: Predicting to defaults: 5,710,096; Base + SIC + T-bill: Predicting to defaults: 5,710,096. Summary statistics for multinomial logistic regression models: Likelihood Ratio Chi Sq; Base model: Predicting to defaults: 120,478; Base+ T-bill: Predicting to defaults: 121,081; Base + SIC codes: Predicting to defaults: 121,718; Base + SIC + T-bill: Predicting to defaults: 122,318. Summary statistics for multinomial logistic regression models: Degrees of Freedom; Base model: Predicting to defaults: 76; Base+ T-bill: Predicting to defaults: 78; Base + SIC codes: Predicting to defaults: 90; Base + SIC + T-bill: Predicting to defaults: 92. Summary statistics for multinomial logistic regression models: Significance levels; Base model: Predicting to defaults: <.0001; Base+ T-bill: Predicting to defaults: <.0001; Base + SIC codes: Predicting to defaults: <.0001; Base + SIC + T-bill: Predicting to defaults: <.0001. Source: GAO. Note: Models with SIC codes are based on a smaller number of cases due to missing SIC values. 
Note: Except as noted, the significance of coefficients is less than or equal to .0001. [A] Significance of the coefficient is less than .05. [B] Significance of the coefficient is greater than .05. [End of table] Table 5: Distribution of SIC Industry Codes in SBA's Loan Database: SIC industry codes: Agriculture, forestry, fishing; N of loans: 12,280; Percent of loans: 3.0. SIC industry codes: Mining and construction; N of loans: 22,349; Percent of loans: 5.5. SIC industry codes: Manufacturing; N of loans: 49,807; Percent of loans: 12.3. SIC industry codes: Wholesale trade; N of loans: 29,464; Percent of loans: 7.3. SIC industry codes: Transport, communication, utilities; N of loans: 13,276; Percent of loans: 3.3. SIC industry codes: Retail trade; N of loans: 130,278; Percent of loans: 32.3. SIC industry codes: Finance, insurance, real estate; N of loans: 6,054; Percent of loans: 1.5. SIC industry codes: Service industries; N of loans: 134,623; Percent of loans: 33.4. SIC industry codes: Public administration; N of loans: 160; Percent of loans: 0.0. SIC industry codes: Missing; N of loans: 5,252; Percent of loans: 1.3. Total; N of loans: 403,543; Percent of loans: 100.0. Source: GAO. [End of table] The coefficients for the interest rate on 1-year Treasury bills are positive and highly significant for the default equations, as expected, and negative and highly significant for the prepayment equation. Most of the coefficients for the industry-specific dummy variables are also statistically significant. As can be seen in table 4, the coefficients for most of the other variables in the equations are not much different in the alternative specifications from their values in SBA's equations. 
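The summary statistics in table 4 allow a rough check of whether the Treasury bill rate adds explanatory power. The sketch below is illustrative only and is not part of SBA's or GAO's analysis. It assumes that each reported likelihood ratio chi-square compares the fitted model with the same intercept-only model, so that the difference between two nested models estimated on the same observations is itself a likelihood ratio statistic. The names MODELS, chi2_sf_even_df, and lr_test are hypothetical.

```python
import math

# Summary statistics transcribed from table 4 (multinomial logit models).
MODELS = {
    "Base": {"n": 5_736_628, "lr_chi2": 120_478, "df": 76},
    "Base + T-bill": {"n": 5_736_628, "lr_chi2": 121_081, "df": 78},
    "Base + SIC": {"n": 5_710_096, "lr_chi2": 121_718, "df": 90},
    "Base + SIC + T-bill": {"n": 5_710_096, "lr_chi2": 122_318, "df": 92},
}


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; closed form valid for even df only."""
    if df <= 0 or df % 2:
        raise ValueError("closed form requires a positive, even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    # P(X > x) = exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


def lr_test(restricted, full):
    """Likelihood ratio test of the variables added to `restricted` to get `full`."""
    r, f = MODELS[restricted], MODELS[full]
    if r["n"] != f["n"]:
        # Assumption: equal N means the same estimation sample.
        raise ValueError("models were estimated on different samples")
    stat = f["lr_chi2"] - r["lr_chi2"]
    added_df = f["df"] - r["df"]
    return stat, added_df, chi2_sf_even_df(stat, added_df)


for restricted, full in [("Base", "Base + T-bill"),
                         ("Base + SIC", "Base + SIC + T-bill")]:
    stat, added_df, p = lr_test(restricted, full)
    print(f"{full} vs {restricted}: chi-square = {stat}, df = {added_df}, p = {p:.1e}")
```

Under those assumptions, adding the T-bill terms raises the chi-square by 603 (base model) and by 600 (model with SIC codes) for 2 added degrees of freedom, and both p-values are far below .0001, which is consistent with the highly significant T-bill coefficients. The function declines to compare models with and without SIC codes because they are based on different numbers of observations.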
SBA's Recovery Equation: SBA uses an ordinary least squares regression equation to estimate the relationship between the cumulative net recovery rate for a cohort of loans and the age of the loans in that cohort. This equation differs from the default and prepayment equations in that there are no economic or programmatic variables. As a result, forecasted recoveries on new loans will follow the historical pattern of recoveries on previously disbursed loans and will not depend on forecasted economic conditions. In addition, the unit of analysis is the cohort of loans rather than individual loans. The recovery equation uses ordinary least squares to regress the cumulative net recovery rate on a set of dummy variables for the age of the cohort. The cumulative net recovery rate is defined as cumulative net recoveries to date divided by cumulative defaults to date. Each dummy variable covers two quarters, ranging from quarters 1 and 2 to quarters 55 and 56. As expected for a cumulative dependent variable, the coefficients are generally increasing. In addition, except for the variable indicating the first two quarters, they are highly statistically significant. The adjusted R-squared is .9776, showing a good fit. Table 6 gives the names and descriptions for variables in the recovery equation, while table 7 shows the coefficients for that equation. Table 6: Variable Names and Descriptions: Variable name: Cohort age 1-2; Variable description: 1 if in quarters 1 or 2, else 0. Variable name: Cohort age 3-4; Variable description: 1 if in quarters 3 or 4, else 0. Variable name: Cohort age 5-6; Variable description: 1 if in quarters 5 or 6, else 0. Variable name: Cohort age 7-8; Variable description: 1 if in quarters 7 or 8, else 0. Variable name: Cohort age 9-10; Variable description: 1 if in quarters 9 or 10, else 0. Variable name: Cohort age 11-12; Variable description: 1 if in quarters 11 or 12, else 0. 
Variable name: Cohort age 13-14; Variable description: 1 if in quarters 13 or 14, else 0. Variable name: Cohort age 15-16; Variable description: 1 if in quarters 15 or 16, else 0. Variable name: Cohort age 17-18; Variable description: 1 if in quarters 17 or 18, else 0. Variable name: Cohort age 19-20; Variable description: 1 if in quarters 19 or 20, else 0. Variable name: Cohort age 21-22; Variable description: 1 if in quarters 21 or 22, else 0. Variable name: Cohort age 23-24; Variable description: 1 if in quarters 23 or 24, else 0. Variable name: Cohort age 25-26; Variable description: 1 if in quarters 25 or 26, else 0. Variable name: Cohort age 27-28; Variable description: 1 if in quarters 27 or 28, else 0. Variable name: Cohort age 29-30; Variable description: 1 if in quarters 29 or 30, else 0. Variable name: Cohort age 31-32; Variable description: 1 if in quarters 31 or 32, else 0. Variable name: Cohort age 33-34; Variable description: 1 if in quarters 33 or 34, else 0. Variable name: Cohort age 35-36; Variable description: 1 if in quarters 35 or 36, else 0. Variable name: Cohort age 37-38; Variable description: 1 if in quarters 37 or 38, else 0. Variable name: Cohort age 39-40; Variable description: 1 if in quarters 39 or 40, else 0. Variable name: Cohort age 41-42; Variable description: 1 if in quarters 41 or 42, else 0. Variable name: Cohort age 43-44; Variable description: 1 if in quarters 43 or 44, else 0. Variable name: Cohort age 45-46; Variable description: 1 if in quarters 45 or 46, else 0. Variable name: Cohort age 47-48; Variable description: 1 if in quarters 47 or 48, else 0. Variable name: Cohort age 49-50; Variable description: 1 if in quarters 49 or 50, else 0. Variable name: Cohort age 51-52; Variable description: 1 if in quarters 51 or 52, else 0. Variable name: Cohort age 53-54; Variable description: 1 if in quarters 53 or 54, else 0. Variable name: Cohort age 55-56; Variable description: 1 if in quarters 55 or 56, else 0. 
Sources: SBA and OFHEO. [End of table] Table 7: Recovery Model: Variable: Cohort age 1-2; Coefficient: .0134; Standard error: .0298; T - statistic: 0.45. Variable: Cohort age 3-4; Coefficient: .0495; Standard error: .0081; T - statistic: 6.11. Variable: Cohort age 5-6; Coefficient: .0478; Standard error: .0083; T - statistic: 5.79. Variable: Cohort age 7-8; Coefficient: .0656; Standard error: .0083; T - statistic: 7.95. Variable: Cohort age 9-10; Coefficient: .0821; Standard error: .0086; T - statistic: 9.56. Variable: Cohort age 11-12; Coefficient: .1096; Standard error: .0086; T - statistic: 12.76. Variable: Cohort age 13-14; Coefficient: .1356; Standard error: .0090; T - statistic: 15.12. Variable: Cohort age 15-16; Coefficient: .1706; Standard error: .0090; T - statistic: 19.01. Variable: Cohort age 17-18; Coefficient: .1994; Standard error: .0094; T - statistic: 21.20. Variable: Cohort age 19-20; Coefficient: .2263; Standard error: .0094; T - statistic: 24.06. Variable: Cohort age 21-22; Coefficient: .2535; Standard error: .0099; T - statistic: 25.56. Variable: Cohort age 23-24; Coefficient: .2806; Standard error: .0099; T - statistic: 28.30. Variable: Cohort age 25-26; Coefficient: .3077; Standard error: .0105; T - statistic: 29.26. Variable: Cohort age 27-28; Coefficient: .3359; Standard error: .0105; T - statistic: 31.94. Variable: Cohort age 29-30; Coefficient: .3661; Standard error: .0112; T - statistic: 32.56. Variable: Cohort age 31-32; Coefficient: .3897; Standard error: .0112; T - statistic: 34.66. Variable: Cohort age 33-34; Coefficient: .4066; Standard error: .0121; T - statistic: 33.48. Variable: Cohort age 35-36; Coefficient: .4271; Standard error: .0121; T - statistic: 35.17. Variable: Cohort age 37-38; Coefficient: .4327; Standard error: .0133; T - statistic: 32.53. Variable: Cohort age 39-40; Coefficient: .4499; Standard error: .0133; T - statistic: 33.82. 
Variable: Cohort age 41-42; Coefficient: .4480; Standard error: .0149; T - statistic: 30.12. Variable: Cohort age 43-44; Coefficient: .4622; Standard error: .0149; T - statistic: 31.07. Variable: Cohort age 45-46; Coefficient: .4624; Standard error: .0172; T - statistic: 26.92. Variable: Cohort age 47-48; Coefficient: .4746; Standard error: .0172; T - statistic: 27.63. Variable: Cohort age 49-50; Coefficient: .4860; Standard error: .0210; T - statistic: 23.11. Variable: Cohort age 51-52; Coefficient: .4982; Standard error: .0210; T - statistic: 23.68. Variable: Cohort age 53-54; Coefficient: .5099; Standard error: .0298; T - statistic: 17.14. Variable: Cohort age 55-56; Coefficient: .5192; Standard error: .0298; T - statistic: 17.45. Summary statistics: Adjusted R[2]; Coefficient: .9776. Summary statistics: Observations; Coefficient: 393. Source: GAO. [End of table] [End of section] Appendix III: Comments from the Small Business Administration: U.S. SMALL BUSINESS ADMINISTRATION: WASHINGTON, D.C. 20416 OCT 17 2003 Ms. Davi D'Agostino, Director: Financial Markets and Community Investments Division General Accounting Office: Washington, DC: Dear Ms. D'Agostino, Thank you for the opportunity to review and comment on GAO's report entitled "Model For 7(A) Program Is Reasonable But Could Be Enhanced." We feel the report presents a thorough description of the model and the work that GAO was asked to complete. As you know, SBA calculated significant re-estimates that resulted in large transfers of funds to the US Treasury from 1992-2001 for the 7(a) General Business Program. These re-estimates occurred primarily because SBA was using a historical average to model defaults. This method inherently reduces the ability to predict accurately in times where economic and program changes are taking place that do not repeat historical trends. 
As a result, SBA began this project with the mission of building a model that would result in more accurate estimates and lower levels of re-estimates in the future. SBA felt strongly that the model should include assumptions about the economy since these appeared to be a cause for overestimating defaults. SBA also wanted the model to be useful in making decisions on programmatic changes. In order to implement this mission, SBA management decided to hire contract support to provide additional expertise and ensure an objective process and result. SBA's choice of OFHEO allowed us access to experienced research economists with a past history in modeling loan programs at a very reasonable cost to the government. By building a model that produces reasonable results based on well documented economic theory, SBA feels that the mission has been accomplished, and the results of the improved model will continue to be evident over the coming years. However, SBA also recognizes that work in this area is an ongoing effort. Periodic review is a constant necessity in order to ensure the models reflect the 7(a) program, and incorporating new data can lead to better knowledge of borrower and lender behavior. GAO has made 2 recommendations and SBA agrees with them both. On the first recommendation, SBA intends to review any additional data that becomes available to assess if it would be useful in enhancing the accuracy of the model. Our Deputy CFO and subsidy team have been involved in reviewing the credit scoring product acquired by the Agency for loan monitoring. As the data becomes available, we will analyze it and assess its appropriateness for inclusion in the model. GAO has also recommended that SBA establish a process for revising the model to correct errors and reflect changes in the 7(a) program. In FY 2003, SBA established an annual schedule for updating the coefficients used in the formulae based on additional years of data. 
We will continue to have these updates validated by an independent validation party, as is part of our internal controls process. We also intend to improve the automation of these models which will serve to decrease the potential for human error. SBA and OMB agreed to correct the error identified in GAO's review in the 2004 Budget Request, and did so as of October 1, 2003. We appreciate the opportunity to comment on this report. Sincerely, Signed by: Tom Dumaresq: Chief Financial Officer: U.S. SMALL BUSINESS ADMINISTRATION WASHINGTON, D.C. 20416: FEB 19 2004: Davi M. D'Agostino Director: Financial Markets and Community Investment United States General Accounting Office Washington, DC: Dear Ms. D'Agostino, Thank you for this opportunity to comment on the draft GAO report on the Small Business Administration's (SBA) 7(a) program subsidy estimate ("Model for 7(a) Program Subsidy Estimate Had Reasonable Equations but Lack of Key Documentation Hampered Review"). SBA has the following comments. * Page 1: GAO states on the first page under "What GAO found," that SBA "did not adequately document its model development process". This statement should be qualified, as it is later in the document, by making it clear that there is no requirement for SBA to document the model development process as GAO recommends. * Page 1: "SBA officials told us that the new 7a model was the first step in a long term effort to develop and implement new econometric models for their credit programs." Comment: SBA developed the 7a model using econometrics for many reasons, including the fact that there was a strong feeling in SBA and outside that the economy affects loan performance. However, SBA also knew that other factors affected loan performance and needed to be identified along with the economic factors. 
As such, SBA would like the following sentence added: Although this allowed SBA to build a model that responds to the need for greater sensitivity to a wider variety of factors than a model based on historical averages, this approach may not be appropriate for all the credit programs. * Page 7: SBA disagrees with the new section about the lack of documentation. Please add: GAO was given 800 pages of data with information on variables that were considered and rejected. SBA supplemented this extensive document with many hours of briefings and explanations. * Page 7: "However, maintaining documentation on how such models were developed is a sound internal control practice that would provide SBA and other agencies the opportunity to demonstrate and explain the rationale and basis for key aspects..." Comment: SBA would like GAO to add the words "more fully" after "opportunity to demonstrate". SBA feels that it has demonstrated and explained the rationale adequately according to the current formal and informal guidance. SBA followed the guidance that is available and a full explanation of the variables that SBA chose is supported by the documentation. * Page 25: SBA briefed GAO thoroughly on the issue of variables considered and rejected. GAO was provided with a list of the variables, and the reasons they were rejected, as well as the 800 page testing results document that was reviewed as a part of the government-wide financial statement audit. * Page 25: Add: SBA hired an independent contractor in 2002 to review the model prior to its finalization as part of its review and validation process. This occurred prior to the completion of the documentation. "The independent contractor hired to perform an initial review of the SBA 7(a) credit subsidy model prior to its finalization was hampered by the lack of detailed model documentation. 
In response to our inquiry, the contractor stated that it did not validate the model which, from an audit perspective, would have encompassed a more robust effort. In its final report to SBA, the contractor reported that SBA lacked sufficient supporting documentation for a "thorough review of its [the model's] theoretical basis (including alternative modeling methodologies explored), its working features, or the update and maintenance procedures necessary to use the model on an ongoing basis. This lack of documentation severely limited our ability to assess certain critical parts of the model in detail, including its econometric components." Further, the contractor recommended that "SBA develop a robust set of documentation to support this model" including "the modeling methodology, alternate methodologies considered, data inputs and outputs, and model maintenance and update requirements." "Nevertheless, GAO believes that maintaining sufficient documentation on how such models were developed is a sound internal control practice that would provide SBA and other agencies the opportunity to demonstrate and explain the rationale and basis for key aspects of their models that provide important cost information for budgets, financial statements, and congressional decision makers. Moreover, as a practical matter, this documentation would help facilitate SBA's and other agencies' annual financial statement audits". Comment: Please add the words in bold. * Page 28: GAO makes a statement that "SBA's lack of documentation on the 7(a) model process could impede our ability to conclude on SBA's loan accounts in connection with audit of the consolidated financial statements of the federal government." Comment: SBA strongly disagrees with this particular comment because it establishes a new and unnecessary requirement. SBA has followed the available guidance, including Tech release 6 (formerly this was covered by Tech release 3) and SFFAS statements 2 and 18. 
GAO agrees that the models produce a reasonable result. GAO has not proven bias in either direction. SBA believes that GAO's definition of "independent person" needs to be clarified to include and "be an informed reader who is familiar with statistical and econometric analysis, and the tools used to perform this activity." Please add in the conclusion section: "SBA chose an independent party to ensure that bias from selection did not exist, and further that SBA's tests show that there is no identifiable bias in this model. It should be noted however that some degree of bias can exist in all forecasting models, including those using a simple historical average." Thank you again for this opportunity to comment on the draft report. Sincerely, Thomas A. Dumaresq, Chief Financial Officer: The following are GAO's comments on the Small Business Administration's letter dated February 19, 2004. GAO Comments: 1. Highlights page was adjusted to reflect SBA's position. 2. We adjusted the report text to recognize SBA's position. See page 1. 3. We adjusted the text to reflect that SBA subsequently provided us access to this documentation and provided a description of the documentation as well as an assessment of its usefulness in assessing the model development process. See page 26. 4. We adjusted the text of the report. See page 7. 5. We acknowledge that SBA briefed us on the variables that were selected and rejected and that we could not corroborate this with the supporting documentation that SBA provided. See the Agency Comments and Our Evaluation section of the report pages 35-36. 6. We do not concur with SBA that this change should be made to the report because it is redundant with the information provided on pages 24 and 25. 7. We adjusted the report text to clarify our position. 8. We do not concur with SBA. See the Agency Comments and Our Evaluation section of the report on page 36. 9. We concur with SBA's assertion that we have not proven that the model had a bias. 
Our report states that we were unable to determine whether such a bias existed because of SBA's insufficient documentation. 10. We concur with SBA's definition of an independent person in the context of this report and point out that our team, which reviewed the 7(a) model, met SBA's definition of an independent person. However, any revisions of the definition of an independent person would need to be made by the Federal Accounting Standards Advisory Board. 11. We do not concur with SBA's statement that an independent party ensured that the 7(a) model was free from bias from variable selection. As we discussed, neither Bearing Point nor Ernst and Young, both of which SBA asserted were independent reviewers who ensured the model was free of bias, assessed the variable selection process. Bearing Point reported that its review was severely limited by the lack of documentation and did not assess the econometric segment of the model. Ernst and Young reported that, at the request of SBA, it did not assess the econometric component of the model. Thus, neither of these firms could assess whether a bias existed from the variable selection process. We also do not concur with SBA's statement that its tests show that there was no identifiable bias in the model. While SBA may have tested its final model for bias, the agency has not provided us with any supporting documentation of these analyses. Further, testing the model would not identify this type of bias. Rather, an analysis of the variable selection process and whether it was consistently applied to all variables tested would more likely reveal whether such a bias existed in the final model. We also do not concur with SBA's suggested change to the conclusions of our report regarding whether a possible bias existed in the final model. The bias that is described in our report would result from variable selection or rejection. 
SBA discusses a statistical bias that suggests that over the historical period the chosen model systematically either underpredicts or overpredicts the likelihood of defaults or prepayments. To provide reasonable assurance that a bias was not introduced into the subsidy rate estimate through the choice of particular equations from among the set of reasonable equations, adequate documentation of the basis for selecting and rejecting variables is an important internal control. We were unable to determine whether this type of bias existed because of the lack of documentation on the model development process. [End of section] Appendix IV: Comments from the Office of Management and Budget: EXECUTIVE OFFICE OF THE PRESIDENT OFFICE OF MANAGEMENT AND BUDGET WASHINGTON, D.C. 20503: FEB 18 2004: Ms. Davi M. D'Agostino Director: Financial Markets and Community Investment: United States General Accounting Office Washington, DC 20548: Dear Ms. D'Agostino, Thank you for this opportunity to comment on the draft General Accounting Office (GAO) report on the Small Business Administration's (SBA) 7(a) program subsidy estimate ("Model for 7(a) Program Subsidy Estimate Had Reasonable Equations but Lack of Key Documentation Hampered Review"). The Office of Management and Budget (OMB) was pleased that GAO's draft report concluded that SBA's new "econometric equations were reasonable, and its model produced estimated default and recovery rates that were in line with historical experience."[NOTE 1] As we explained in our interviews with GAO during this review, OMB has worked closely and diligently with SBA staff on the development of the new model, which we believe is a vast improvement to the prior cost estimating techniques used by SBA. However, as we explain below, OMB disagrees with the recommendation that we revise OMB Circular A-11, as we believe that the draft report does not demonstrate that a revision is needed. 
The draft report recommends that OMB revise Circular A-11 to require agencies to document the development of their credit subsidy models, including "processes for selecting modeling methodologies over alternatives and variables tested and rejected along with the basis for excluding them." [NOTE 2] According to the draft report, this documentation would facilitate OMB and external party model review and financial statement audits. OMB agrees that model documentation is important for both budget and financial statement purposes. We do not believe, however, that the draft report findings demonstrate that more documentation is needed. Consistent with our practice for all Federal credit agencies and to fulfill the responsibilities of the OMB Director under the Federal Credit Reform Act of 1990, as amended, [NOTE 3] we worked closely with SBA during the model development process. Accordingly, we believe that the documentation SBA provided to OMB was adequate to determine that the subsidy estimates and reestimates published in recent President's Budgets are reasonable. In addition, as SBA explains in their letter on the draft report to you, Ernst & Young independently validated the model with available documentation from SBA. Ernst & Young found that "the 7(a)'s Model's assumptions and methodology appear to be reasonable and accurate" and that there was "reasonable accuracy of model calculations and output." [NOTE 4] GAO has confirmed the results of these reviews in its report, finding that the "econometric equations were reasonable." [NOTE 5] Consequently, we do not concur with the draft report's statement that "lack of improved OMB guidance for model documentation hampers adequate external oversight and validation of models used to generate credit subsidy estimates." [NOTE 6] Within the draft report, GAO also referenced the requirements in the Statement on Auditing Standards (SAS) No. 
57 as the basis for reporting that "SBA did not prepare adequate supporting documentation to enable independent reviewers to understand and evaluate the process that SBA used." The standard, however, explains that "[m]anagement is responsible for establishing a process for preparing accounting estimates. Although the process may not be documented or formally applied, it normally consists of ... [i]dentifying the relevant factors that may affect the accounting estimate... [d]eveloping assumptions that represent management's judgment of the most likely circumstances and events with respect to the relevant factors... [d]etermining the estimated amount based on the assumptions and other relevant factors." We believe that SBA has fulfilled the management responsibilities as stated in SAS No. 57, as validated by Ernst & Young, and as referenced within the GAO draft report, thus providing independent reviewers with adequate documentation to understand and evaluate its estimates. Further, to require agencies to prepare additional documentation of the "variables tested and rejected along with the basis for excluding them" [NOTE 7] would be unduly burdensome. Since the draft report indicates that your staff received and tested those final decisions, namely, the model itself and its estimated default and recovery rates and determined that they were reasonable, we do not think the draft report adequately makes the case that a change is needed. Finally, GAO received the underlying loan data used as a basis for model development. With these data, GAO has the information necessary to test alternative variables to measure the reasonableness of the model. OMB agrees that agencies should be encouraged to keep adequate documentation of complex statistical models to assist in their review and improvement over time. 
However, OMB recognizes that agencies should have discretion in the level of record keeping that is required, based on the importance of the models, ease of replication of results, and other factors. The draft report states that a revision of OMB's Circular A-11 would facilitate "external financial statement audit[s]." [NOTE 8] We believe that this rationale misconstrues the fundamental nature and purpose of the Circular. Circular A-11 provides guidance on issues affecting the Budget, including both formulation and execution of credit subsidy estimates. The Circular does not provide guidance on internal control as it relates to financial statement audits. Accordingly, even if we agreed with the draft report that additional guidance is needed, which we do not, we believe that Circular A-11 would not be the appropriate location to include instruction on matters affecting agency financial statement audits. Moreover, such instruction is already available in standards and guidance elsewhere. For example, the GAO report references the Federal Accounting Standards Advisory Board's ("FASAB") Technical Release 6, "Preparing Estimates for Direct Loan and Loan Guarantee Subsidies Under the Federal Credit Reform Act" ("TR6"). This guidance addresses internal control and the proper documentation an agency should maintain to support the assumptions used in subsidy calculations (paragraphs 20-22). TR6 was developed by an interagency task force, whose members included GAO, OMB, and Federal credit agencies, under the auspices of the Accounting and Auditing Policy Committee of FASAB. This guidance is authoritative, falling within Level C on the hierarchy of the Federal generally accepted accounting principles (footnote 21 on page 29 of GAO's draft report should be changed to reflect that TR6 is, in fact, authoritative). This guidance specifically outlines requirements to document "key assumptions" of models, that is, those which have the greatest effect on the subsidy estimate. 
We believe that the documentation provided by SBA, which includes documentation of the key assumptions underlying the 7(a) subsidy estimate, satisfies the TR6 documentation requirements. In closing, we want to reaffirm that OMB takes seriously its responsibilities in overseeing the Federal Credit Reform Act. However, the draft report does not demonstrate that a revision to the procedures set forth in OMB's Circular A-11, or TR6, is necessary. Thank you again for this opportunity to comment on the draft report. Sincerely, Signed by: Linda M. Springer: Controller: and Richard P. Emery: Assistant Director for Budget: NOTES: [1] GAO draft report, Highlights page and p. 5. [2] GAO draft report, p. 37. [3] 2 U.S.C. § 661. [4] Ernst & Young report "Task 6: Independent Review of SBA 7(a) Subsidy Model, Part 2;" November 18, 2003, p. 2. [5] GAO draft report, Highlights page and p. 5. [6] GAO draft report, p. 36. [7] GAO draft report, p. 37. [8] GAO draft report, p. 29. The following are GAO's comments on the Office of Management and Budget's letter dated February 18, 2004. GAO Comments: 1. We do not concur with OMB and believe that in light of the consistent difficulty experienced by three independent reviews of SBA's 7(a) model, our report makes a case for the need to enhance the guidance in Circular A-11 to require agencies to document the process they used to develop the model. See the Agency Comments and Our Evaluation section of the report pages 36-37. 2. We do not concur with OMB. See Agency Comments and Our Evaluation section of the report page 37. 3. We do not concur with OMB. See Agency Comments and Our Evaluation section of the report pages 37-38. 4. We do not concur with OMB. See Agency Comments and Our Evaluation section of the report page 38. 5. 
While we concur that agencies need to have discretion in the level of documentation that they maintain when dealing with inconsequential matters, we do not agree that such discretion should be allowed in clearly consequential activities such as the development of the 7(a) model. We reaffirm our position that OMB needs to enhance its guidance regarding the need for adequate documentation for the credit subsidy model development process. 6. We concur with OMB that the fundamental nature and purpose of Circular A-11 is not to provide guidance on internal controls as it relates to financial statement audits. However, the primary focus of this report is on credit subsidy estimates which are prepared in accordance with Circular A-11. Also, the financial statement audit is an important validation of the credit subsidy estimates included in the budget. We reaffirm our conclusion and recommendation that enhanced guidance on credit subsidy model development would facilitate external review, including those performed by OMB, of the credit subsidy estimate. Because of the relationship between the credit subsidy estimates prepared for the budget and those used in the financial statements, the enhanced guidance would benefit both the financial statement audit and budgetary review. 7. Report language was revised to address technical points about Technical Release 6. However, as we discussed, this guidance does not specifically require documentation of credit subsidy model development. [End of section] Appendix V: GAO Contacts and Staff Acknowledgments: GAO Contacts: Davi M. D'Agostino (202) 512-8678 M. Katie Harris (202) 512-8415: Staff Acknowledgments: In addition to those individuals named above, Jay Cherlow, Dan Blair, Edda Emmanueli-Perez, Mitch Rachlis, Marcia Carlsen, Beverly Ross, Susan Sawtelle, and Mark Stover made key contributions to this report. 
(250113): FOOTNOTES [1] Econometric modeling is a series of techniques used to quantify relationships among a group of variables and is often used to forecast the value of economic variables such as loan defaults. [2] OFHEO was established as an independent entity within the Department of Housing and Urban Development. OFHEO's primary mission is ensuring the capital adequacy and financial safety and soundness of two government-sponsored enterprises--the Federal National Mortgage Association and the Federal Home Loan Mortgage Corporation. [3] A cohort includes those direct loans or loan guarantees of a program for which a subsidy appropriation is provided in a given fiscal year even if the loans are not disbursed until subsequent years. [4] U.S. General Accounting Office, SBA's 7(a) Credit Subsidy Estimates, GAO-01-1095R (Washington, D.C.: Aug. 21, 2001). [5] U.S. General Accounting Office, Internal Control: Standards for Internal Control in the Federal Government, GAO/AIMD-00-21.3.1 (Washington, D.C.: November 1999). [6] See, for example, Brian Headd, "Business Success: Factors Leading to Surviving and Closing Successfully," Office of Advocacy, U.S. Small Business Administration. (This paper is part of a series of papers distributed by the U.S. Bureau of the Census and is based on research conducted by the author when he worked there, but does not represent the official views of either SBA or the Census.) [7] Multinomial logistic regression is a technique used to estimate the probability of an event occurring when the variable of interest, such as the status of a loan, is best presented in categories rather than as continuous numbers. In this case, the categories might be default, prepay, or still active. Economists generally prefer this method to simpler techniques that provide less realistic estimates. [8] SBA's model is based on quarters of the year, and the unit of analysis is the individual loan for as long as it remains active. 
So, if a loan was active for 16 quarters before being prepaid, there will be 16 observations as to whether a borrower had defaulted, prepaid, or made regular payments on the loan in that quarter. All these observations are used in estimating the likelihood of default and prepayment. [9] We also wanted to use a variable measuring firm size since larger firms may have more resources they can use to avoid default in the event of adverse business conditions. However, SBA's database did not include data on firm size. [10] The optimistic assumptions were that the GDP growth rate was 10 percent higher than the OMB forecast, and the unemployment rate was 10 percent lower. The pessimistic assumptions were a 10 percent lower GDP growth rate and a 10 percent higher unemployment rate. [11] The cumulative net recovery rate for a cohort of loans is defined as cumulative net recoveries to date divided by cumulative defaulted dollars to date. [12] Economic reasoning might suggest that recovery rates would be lower when economic conditions are unfavorable, but other attempts to incorporate economic variables into recovery rate equations have not been successful. [13] The average default experience was calculated based on the actual default experience for loans issued between 1992 and 2001 (all years referred to are fiscal years) depending on the year of the loan when the default occurred. For example, year 1 average defaults are based on the average of actual first-year defaults that occurred for loans issued between 1992 and 2001. Year 2 average defaults are based on the average of actual second-year defaults that occurred for loans issued between 1992 and 2000. Year 10 average defaults are based on the average of actual tenth-year defaults that occurred for loans issued in 1992. [14] SBA officials told us they did not have the resources available to provide these data through fiscal year 2002. 
[15] We calculated the actual default experience during fiscal year 2001 for the loans issued since 1986 based on the default experience of those loans during fiscal year 2001 and the age of those loans. For example, the default experience of loans issued in 1991, which were in their eleventh year during 2001, was compared with the estimated default rates projected for the eleventh year. [16] A credit score is a numerical measure of a borrower's creditworthiness based on a statistical analysis of past financial behavior and current financial obligations. [17] Berger, Allen N., Frame, W. Scott, Miller, Nathan H. "Credit Scoring and the Availability, Price, and Risk of Small Business Credit," Federal Reserve Board, Mimeographed, April 2002; Caouette, John B., Altman, Edward I., Narayanan, Paul. Managing Credit Risk: The Next Great Financial Challenge. New York: John Wiley & Sons Inc., 1998; and Frame, W. Scott, Padhi, Michael, Woosley, Lynn. The Effect of Credit Scoring on Small Business Lending in Low- and Moderate-Income Areas. Federal Reserve Bank, Atlanta, Working Paper 2001-6, Unpublished, April 2001. [18] Present value is the worth of the future stream of returns or costs in terms of money paid immediately. In calculating present value, prevailing interest rates provide the basis for converting future amounts into their "money now" equivalents. [19] The Economy Act, 31 U.S.C. 1535, permits federal agencies to enter into agreements with other federal agencies for goods or services if the agency contracting the service cannot obtain the goods or services as conveniently or economically by contracting with a private source. [20] The 7(a) program has three classifications of lenders: regular, certified, and preferred lenders. [21] U.S. Small Business Administration, Office of the Inspector General Auditing Division, Audit of SBA's FY 2003 Financial Statements, audit report 4-10, (Washington, D.C.: Jan. 30, 2004). 
[22] Technical Release 6 was issued by the Accounting and Auditing Policy Committee (AAPC), a permanent committee established by the Federal Accounting Standards Advisory Board whose mission is to promulgate accounting standards for federal government reporting entities. The AAPC's role is to assist the federal government in improving financial reporting by providing solutions to accounting and auditing related issues. Technical Release 6 provides implementation guidance for agencies to prepare and report credit subsidy estimates. [23] The number of errors does not equal the number of loans that had errors since a single loan can have multiple errors. [24] According to SBA officials, the actual repair and denial amounts are higher than this because many lenders release SBA of its guaranty obligations rather than having repairs or denials on their lender record. [25] Because we followed a probability procedure based on random selections, our sample is only one of a large number of samples that we might have drawn. Since each sample could have provided different estimates, we express our confidence in the precision of our particular sample's results as a 95 percent confidence interval. This is the interval that would contain the actual population value for 95 percent of the samples we could have drawn. As a result, we are 95 percent confident that the number of data errors in key aspects of the model, such as default and recovery dates and amounts and loan status, does not exceed 1 percentage point of the key data used in the model. [26] SAS No. 57 became effective for audits of financial statements for periods beginning on or after January 1, 1989. [27] U.S. General Accounting Office, Internal Control: Standards for Internal Control in the Federal Government, GAO/AIMD-00-21.3.1 (Washington, D.C.: November 1999). 
[28] We are 95 percent confident that the number of data errors in key aspects of the model, such as default and recovery dates and amounts and loan status, does not exceed 1 percent of the data population. [29] Using data obtained from SBA, we were able to successfully replicate their equations.
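The arithmetic described in footnotes 11, 13, and 18 can be illustrated with a minimal Python sketch. The function names, data layout, and figures below are hypothetical and are not drawn from SBA's model or data. The sketch also assumes a simple unweighted average across cohorts for footnote 13 and annual end-of-year discounting at a single rate for footnote 18; the report does not specify either convention.

```python
from collections import defaultdict


def cumulative_net_recovery_rate(net_recoveries, defaulted_dollars):
    """Footnote 11: cumulative net recoveries to date divided by
    cumulative defaulted dollars to date, for a single cohort."""
    total_defaulted = sum(defaulted_dollars)
    if total_defaulted == 0:
        raise ValueError("cohort has no defaulted dollars to date")
    return sum(net_recoveries) / total_defaulted


def average_default_by_loan_year(default_rates):
    """Footnote 13: average the actual default rate observed in each
    loan year across every cohort that has reached that loan year.

    default_rates maps a cohort's fiscal year to a list of annual
    default rates, where index 0 is the loan's first year.
    Assumption: each cohort is weighted equally.
    """
    by_loan_year = defaultdict(list)
    for rates in default_rates.values():
        for loan_year, rate in enumerate(rates, start=1):
            by_loan_year[loan_year].append(rate)
    return {
        loan_year: sum(rates) / len(rates)
        for loan_year, rates in sorted(by_loan_year.items())
    }


def present_value(cash_flows, annual_rate):
    """Footnote 18: convert future amounts into "money now" terms.
    Assumption: cash_flows[0] arrives one year from now, and one
    annual interest rate applies to every year."""
    return sum(
        amount / (1 + annual_rate) ** year
        for year, amount in enumerate(cash_flows, start=1)
    )


if __name__ == "__main__":
    # Hypothetical cohort: $250 recovered on $500 of defaulted dollars.
    print(cumulative_net_recovery_rate([100.0, 150.0], [400.0, 100.0]))

    # Hypothetical cohorts: the older one has two years of history,
    # the newer one only one, so year 2 averages over a single cohort.
    print(average_default_by_loan_year({1992: [0.02, 0.04], 1993: [0.04]}))

    # Hypothetical $110 received in one year, discounted at 10 percent.
    print(present_value([110.0], 0.10))
```

As in footnote 13, later loan years are averaged over fewer cohorts, because only the oldest cohorts have been outstanding long enough to supply an observation for those years.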