Building the CFPB’s Arbitration Archive: A Commentary on Design, Implementation, and Privacy
ABOUT THE AUTHORS
Dr. Xiaoling Ang, Edgeworth Economics
Dr. Xiaoling Ang is an expert in banking and financial institutions and the economics of consumer financial products, including mortgage, student loan, subprime lending, and deposit products. She joined Edgeworth Economics from the Consumer Financial Protection Bureau, where she was lead economist on several rulemakings, program evaluations, and a congressional report. Her research on financial services has been published in law and economic journals, presented at leading international research conferences, and cited in various media outlets. Dr. Ang holds an MA and PhD in Economics from Princeton University, and a BS, summa cum laude, and MS in Mathematics from Loyola University Chicago.
Thomas Kearney, Akerman LLP
Thomas Kearney is a Partner in Akerman’s Consumer Financial Services Practice Group. He joined the firm from the Consumer Financial Protection Bureau’s Office of Regulations, where he played key roles in developing and drafting multiple mortgage originations related rulemakings. He most recently led the team responsible for the Home Mortgage Disclosure Act proposed and final rules.
Disclaimer: The opinions expressed herein do not necessarily represent the views of Edgeworth Economics or any other Edgeworth consultant.
I. Introduction
An element of the Consumer Financial Protection Bureau’s (CFPB) potential rulemaking on arbitration agreements is a proposal to require the submission of arbitral claims and awards, which could be published online by the CFPB.1 The justifications proposed for this approach in the Outline of Proposals Under Consideration and Alternatives Considered center around transparency and related diagnostic analyses, including identifying administrator bias,2 ongoing monitoring of arbitrations to identify trends in subject matter and outcomes,3 transparency of awards and arbitration decisions,4 and confidence in the arbitral system through transparency.5 The CFPB’s arbitral archive data collection would apply to all individual and class consumer arbitrations6 related to products in the scope of the CFPB’s jurisdiction.7 The challenges of defining the parameters of the data collection flow not only from the breadth of the products and providers covered but also from the logistic and privacy considerations of creating a data collection and processing system that is systematic and durable to changes in participants, products, and analysis goals.
Assuming that a submission system for consumer financial arbitral claims and awards is put into place, this article considers the costs and benefits of implementing the proposal to require the submission of arbitral claims and awards for consumer financial products,8 and how these costs and benefits vary with the nature of implementation. The CFPB should take into consideration the nuances and myriad business practices of covered persons when crafting regulation that touches on all of their related arbitration agreements. Given the different roles of covered persons and types of consumer interactions with firms in the consumer financial marketplace, the complexity of the CFPB’s task is obvious.
The analysis that follows focuses on the implementation considerations of the collection, as well as its policy implications, in light of the potential coverage of the arbitral publication system.9 Section II considers potential applications of the data and its form, both with respect to potential data analyses and privacy considerations that should be taken into account. Section III turns attention to the practicalities of implementation, both in terms of maintaining and distributing data as well as who is responsible for privacy considerations. The approach to data collection, processing, and privacy should be driven by analysis and privacy goals, which should be developed and considered by the CFPB throughout its rulemaking process.
II. Potential Analyses and Privacy Considerations
A. Research and Analysis Applications
The CFPB’s materials for its October 7, 2015 Small Business Advisory Review Panel (SBREFA) for potential rulemaking on arbitration agreements include a proposal for mandatory submission of consumer financial service arbitration claims and awards that the CFPB could publish online. The materials point to several potential uses of the collected data, including “to monitor arbitrations on an ongoing basis and to identify trends in arbitration proceedings” as well as “assist the CFPB and the public in identifying potentially problematic business practices that harm consumers.”10 While claims and award documents undoubtedly contain valuable information that could be used to pursue these analyses, there is a lot of work to be done between the production of these documents and the systematic analysis of their content. The arbitral archive should be easy to mobilize for both legal and economic analysis and provide balance between submission burden and privacy considerations across consumer financial service providers, consumers, and arbitration administrators.
The intricacy and scope of this undertaking is not lost on users of other systems that aggregate documents and data across various systems, such as Public Access to Court Electronic Records (PACER),11 either directly or through third party search functions such as LexisNexis12 or PacerPro.13 The process of preparing data for analysis is referred to as “data building” and the approach to a data build depends on the techniques that will be used to analyze the data. Previous studies of arbitration provide valuable insight into how arbitral claim and award data might be used in both qualitative and quantitative applications, as is discussed in the next subsection (II.A.1).
1. Classification and Categorization
A basic necessity for any archive is a search function based on document content. Implementing a search function that is more sophisticated than a text match (or, realistically, a text match against OCR text from a PDF) requires classification of content. Without such classification, relevant documents may not be retrieved when multiple terms are used for the same concept or when an overarching category is not mentioned verbatim in the document. For example, an arbitral claim related to payday lending may never reference “alternative financial services,” “non-bank services,” or “small-dollar lending.” The complexity of defining categories in the CFPB’s potential arbitral archive is compounded by its coverage of a variety of products and services, including checking accounts, student loans, credit reporting, credit and debit cards, payday loans, medical debt, international money transfers, and consumer deposit accounts.14 In a particular market, this coverage may extend to multiple points that involve consumers in the value chain.
For example, in the credit card markets the CFPB’s jurisdiction extends to credit card advertising, the underwriting and card issuance process, servicing, any related credit reporting, and debt collection.15 Categorization of claims and awards would streamline searches for similar cases based on defendant, product market, and a range of other factors. This could make it more efficient to engage with precedent, as is common in labor arbitration.16
The benefit of well-defined categories is illustrated by the Financial Industry Regulatory Authority’s (FINRA) dispute resolution statistics,17 which include case filings by controversy type, security type, and open/close status. These categories make it possible for FINRA to produce consistent historic statistics to examine broad arbitration and mediation trends such as the number of cases filed in a category. FINRA data inherits this categorization from structure imposed by its Online Arbitration Claim Filing System through a series of drop-down boxes and radio buttons.18 Since consumer financial services arbitration is handled by multiple arbitration administrators,19 categorization will have to be harmonized across administrators or completed by a third party, likely the CFPB, after the documents are submitted. Given the challenge of collecting and harmonizing data from multiple administrators, it is unsurprising that most empirical academic studies of arbitration rely on awards from a single administrator (see section III.A).
2. Bias Analysis
The CFPB specifically cites the use of claims and awards data in diagnosing whether an arbitrator exhibits bias. It states: “The [CFPB] believes that there is a potential for consumer harm if arbitration agreements were to be administered by biased administrators (as was alleged in the case of the NAF) or individual arbitrations were otherwise conducted in an unfair manner.
Thus the [CFPB] is considering a limited intervention that would serve to deter the emergence of such unfair arbitrations and also to shed sunlight on any unfairness that might emerge, while at the same time would impose minimal regulatory burdens on current arbitration activity.”20 The important issue with this approach is that win-loss data bias measures are difficult to interpret, particularly once they are used as a diagnostic. This is illustrated by examples that take into account small sample issues, omitted variable bias, and potential incentive effects. The power21 of statistical analysis in diagnosing bias depends on the number of observations available for analysis—the larger the number of observations for any given analysis, the higher the power of the test.22 In the CFPB’s arbitration study dataset there are two affirmative Electronic Fund Transfer Act (EFTA) claims filed in 2010-2011 that were resolved by arbitrators, both of which resulted in affirmative consumer awards.23 Suppose that each arbitration was arbitrated by a different arbitrator, and that one arbitration resulted in a consumer award and the other arbitration resulted in no consumer award. While the records are 100% consumer awards versus 0% consumer awards, standard statistical tests do not reject equality of outcomes because we only observe one observation for each arbitrator.24 This may be particularly important when attention is restricted to particular types of arbitrations or arbitrations that occur within a particular interval. Even when more arbitrations are observed for each arbitrator for each type of arbitration, the data collected about each claim can affect the analysis. If arbitrators are not assigned randomly to cases, then information observed in the data may not fully capture the differences between arbitrations, even with more sophisticated techniques like regression analysis. 
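The two-arbitrator example above can be checked numerically. The following is a minimal sketch, assuming Fisher's exact test as the "standard statistical test" (the text does not name a specific test). The function name `fisher_exact_two_sided` and the 20-case comparison are hypothetical additions for illustration only, not figures from the CFPB's study.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]].

    Rows are arbitrators; columns are (consumer award, no consumer award).
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability that the first arbitrator has x awards,
        # holding the row and column totals fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    observed = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of every table that is no more likely than the
    # observed one (small tolerance guards against floating-point ties).
    return sum(prob(x) for x in range(lo, hi + 1)
               if prob(x) <= observed * (1 + 1e-9))


# One case per arbitrator: 100% consumer awards versus 0% consumer awards.
p_one_each = fisher_exact_two_sided(1, 0, 0, 1)

# Hypothetical contrast: the same 100% versus 0% split with 20 cases each.
p_twenty_each = fisher_exact_two_sided(20, 0, 0, 20)

print(p_one_each, p_twenty_each)
```

With a single case per arbitrator the p-value is 1.0, so equality of outcomes cannot be rejected despite the 100% versus 0% records. The same split over the hypothetical 20 cases per arbitrator would be rejected at any conventional significance level, which is the sense in which power grows with the number of observations.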
Arbitrators may be chosen to have “qualifications that match the needs of the case” and must be agreed on by both parties.25 Consider two arbitrators that have identical characteristics in the data but who arbitrate cases that differ in ways not captured in the data. Suppose Arbitrator A specializes in cases that should result in consumer awards 40% of the time based on merits and Arbitrator B specializes in cases that should result in consumer awards 60% of the time based on merits. This specialization may be based on a variety of factors, such as previous experience with the product or legal concepts related to the case. Then even though they are both neutral, due to arbitrator assignment, Arbitrator B looks like she is more favorable to consumers than Arbitrator A.
A diagnostic for neutrality may be benchmarked in multiple ways, including against an absolute proportion of consumer awards or versus other arbitrators. If arbitrators think that their selection depends on their appearance of neutrality and understand how neutrality is measured, then this information can potentially create adverse incentives even with detailed data collection. Arbitrator A in the example in the previous paragraph might take steps to manage her reputation for neutrality. She might try to get selected for more cases that should objectively result in consumer awards, despite being less familiar with the relevant subject matter. If the neutrality measure is relative to Arbitrator B, the two might be able to “trade” cases in order to converge on similar award records. Perhaps more seriously, Arbitrator A might decide to change the way that she makes award decisions in order to balance her record.
As Klement and Neeman point out, “[s]ince the only way an arbitrator can establish a reputation for being impartial is by avoiding a series of decisions that might seem biased against a specific group, she might want to make an incorrect decision when a correct decision may raise the suspicion that she is biased.”26
B. Privacy Considerations Specific to Consumer Financial Arbitral Claims and Awards
The CFPB’s SBREFA report states “before collecting or publishing any arbitral claims or awards, the [CFPB] would ensure that these activities comply with privacy considerations.”27 The CFPB did not, however, identify the privacy considerations at issue. In the U.S., privacy protections generally depend on context; letters from nursing home residents are generally protected,28 while letters from deployed soldiers have historically been censored.29 No law expressly articulates unique privacy considerations applicable to the collection and dissemination of arbitral claims and awards.30 The CFPB has discretion to articulate relevant privacy considerations, but those considerations will affect the costs and benefits of the arbitral publication system. For example, consumers may be particularly protective of information about their consumer financial services arbitration experiences because of stigma associated with bankruptcy31 or debt.32 It is therefore important to understand what privacy considerations the CFPB is considering.
Existing privacy regulations specify varying standards for protected private information.
Agencies commonly rely on the GAO’s definition of personally identifiable information (PII), which means any information about an individual, including “any other information that is linked or linkable to an individual, such as medical, educational, financial, and employment information.”33 These principles are familiar; the CFPB already considers them with respect to its own operations, such as the consumer complaint database and Freedom of Information Act requests. But these principles, which are intended to appropriately maximize government transparency, are necessarily broad. A privacy rule that follows the same approach would maximize redaction and limit the utility of an arbitral publication system.
The standards applicable to the financial services industry may provide better guidance. The Gramm-Leach-Bliley Act (GLB) is the first logical stop for guidance. Personally identifiable financial information is defined to include any information “about a consumer resulting from any transaction involving a financial product or service between a covered person and a consumer, as well as information obtained about a consumer in connection with providing a financial product or service to that consumer.”34 This definition is narrower than the GAO’s definition because it does not include the “linked or linkable” concept. Applying these principles would be less burdensome, but still protective enough to potentially hamper the CFPB’s transparency goals.
State law presents another set of privacy considerations. Several states have financial privacy protections similar to, and in some cases stronger than, those provided by GLB.35 Will the CFPB incorporate state law requirements into the privacy protection requirements? If the CFPB assumes responsibility for redacting private information, will it tailor its protections based on the residence of a particular consumer? The CFPB may also want to consider state requirements related to the confidentiality of the arbitration proceedings themselves. While several states limit the admissibility of arbitration information in legal proceedings, Missouri completely prohibits the disclosure of information related to an arbitration.36 Texas and Arkansas similarly prohibit arbitration-related disclosures, and also require an in-camera judicial proceeding to determine whether protected information can be disclosed in the event the confidentiality requirements conflict with other legal requirements.37 As in other contexts, the CFPB should consider whether to tailor arbitral disclosure requirements in light of the relevant goals and considerations applicable in different states.
In some cases, the CFPB should also consider privacy implications beyond those generally applicable to financial organizations. Special privacy considerations are imposed under the Health Insurance Portability and Accountability Act (HIPAA).38 Protected health information includes any information that relates to “the provision of health care to an individual” and “the past, present, or future payment for the provision of health care to an individual.”39 This type of information may be included in arbitral claims and awards. Medical point-of-sale lending, such as loans for orthodontia, cosmetic surgery, or LASIK, is an increasingly important part of the consumer financial services market.40 Sometimes medical financing is provided by a third party financial services company, but in many cases patient financing is directly provided by the medical provider itself. Similar issues exist in medical debt collection, which is another area of concern to the CFPB.41
Determining what information should be redacted will likely require a qualitative analysis of the information in the arbitral claim and award. Even if the names of consumers are removed, a determination must be made whether there is a reasonable basis to believe the information included in an arbitral claim or award can be used to identify the individual.
Practical healthcare privacy issues arise beyond the determination of what information should be redacted. U.S. Department of Health and Human Services (HHS) regulations impose security, notification, and privacy requirements on persons in possession of protected health information.42 Even if the CFPB is not subject to these rules, is it willing to provide a similar level of protection to consumers’ sensitive protected health information? Also, if the CFPB imposes a data collection requirement, presumably it will also examine for compliance with its requirements. What will happen if the CFPB discovers violations of HHS’ security, notification, or privacy regulations during an examination? Is the CFPB prepared to work with HHS or State Attorneys General to address these violations?43 The medical financing industry should also carefully consider the CFPB’s proposal, and be prepared to work with the CFPB on issues arising in this context.
Special privacy considerations also apply to students.
While most education financing is provided through private student lenders or the Department of Education, nonprofit public and private universities directly lend to students.44 The Family Educational Rights and Privacy Act45 (FERPA) imposes privacy requirements on these institutions. FERPA protects personally identifiable information, which is expansively defined and includes “information that, alone or in combination, is linked or linkable to a specific student that would allow a reasonable person in the school community, who does not have personal knowledge of the relevant circumstances, to identify the student with reasonable certainty.”46 If a dispute over a loan between a student and an educational institution leads to arbitration, information included in educational records may be included in arbitral claims and awards. Redacting some types of FERPA PII, like the student’s name or personal identifier, will be straightforward, but redacting linked or linkable information involves a qualitative analysis of the facts and circumstances. Articulating a privacy standard in this context is further complicated by the fact that students are often a particularly vulnerable segment of the population. Lawsuits between nonprofit educational institutions and indebted students are increasingly common, and arbitration may become the preferred forum over time.47 The prospect of imposing new compliance costs on these educational institutions, which may be passed on to indebted students, deserves serious consideration.48 These complexities illustrate the core problem— protecting privacy can require much effort. If the CFPB takes an under-inclusive approach and requires redaction of only a few items, costs on industry will be minimized and the data disclosed to the public will be maximized, but the risk of consumer harm will increase. 
If the CFPB takes an over-inclusive approach the risk of consumer harm will decrease, but industry costs will be higher and the published data will likely be less useful. Given these challenges, the CFPB should consider whether the proposed privacy considerations need to be defined based on type of institution involved, type of consumer affected, or nature of the dispute. No matter the approach proposed, the more clearly the CFPB articulates the privacy standard, the easier it will be for the public to evaluate the utility of the arbitral publication system.
III. Implementation Considerations
A. Ex-Post Hand Coding: The Simplest Approach?
Standardization and categorization are a fundamental step in any analysis. Hand coding after the data is collected is one approach, and the one used by the CFPB in its Arbitration Study.49 The CFPB’s SBREFA materials acknowledge that “[t]he CFPB believes that the Study provides the most comprehensive data on individual consumer financial arbitration frequency and outcomes to date.”50 CFPB staff should be lauded for this labor-intensive undertaking: they manually coded all non-class consumer awards from American Arbitration Association (AAA) case management records that were received from January 1, 2010 through February 2013. The data review procedures are documented in Appendix B of the report,51 and include codes constructed at the researchers’ discretion, such as who invoked arbitration as to prior litigation.52 This type of coding is not unheard of in coding qualitative data, and often requires that the coder be knowledgeable about the material being coded.
Academic studies of arbitration have taken a similar data building approach to the CFPB Arbitration Study. While the CFPB study involved hand-coding 1,241 consumer arbitrations related to checking accounts, credit cards, and payday loans,53 larger scale arbitration data building exercises have been undertaken to study both securities and labor arbitration. The CFPB draws an analogy to FINRA publication of awards and AAA publication of employment awards in its proposal54 and states that “this data would also be helpful to the CFPB, consumers, companies, and possibly to other regulatory entities and academics who study consumer finance.”55 While the FINRA and AAA awards mentioned above are published, awards have been hand-coded by researchers in order to perform quantitative analysis at the appropriate level of detail. In 2007 the Cornell Industrial and Labor Relations School purchased and hand-coded 3,200 arbitration awards issued by FINRA and its predecessors, which were used in multiple academic papers by various authors.56 Subsequent papers made use of another hand-coding of FINRA records from November 1992 through December 2016.57 Similar hand-coding projects were undertaken in research related to employment disputes in securities58 and labor and employment arbitration.59
Even when data is published as a spreadsheet by the arbitration administrator, as AAA did for its consumer arbitration statistics pursuant to “state statutes such as California Code of Civil Procedure §1281.96 and Maryland Commercial Law § 14-3901 to 3905”60, the coding may not reflect the information necessary for the CFPB’s purposes. For example, consumer financial services arbitration can be isolated, but the product type, such as brokerage product versus educational lending, is not coded. Getting this information for these records would require additional coding involving researcher discretion. The planning for and demands on a dataset built for a specific research project are less extensive than those for ongoing government data collection and dissemination.
Even a well-resourced retrospective hand-coding approach may not achieve the CFPB’s intended goals if implemented in lieu of a well-defined, forward-looking data reporting standard.
B. Costs and Benefits of Data Collection and Maintenance
The aggregation, standardization, and dissemination of public data is a major undertaking that involves both public and private sector participants. Beyond the participation of firms required to submit the data, there are also industries created around reformatting, retrieving, and validating data from government sources. Both government and private sector data disseminators must assure that their product is quality controlled and consistent over time. This includes thorough documentation, particularly of any efforts to harmonize the data across multiple sources. In the case of the CFPB’s proposal to collect arbitral claims and awards, this requires, at a minimum, combining data from AAA and JAMS.61
1. Potential Benefits
There are various successful examples of ongoing government data projects that build on the submission of administrative data: the Bureau of Labor Statistics’ Quarterly Census of Employment and Wages62 makes it possible to compute quarterly employment statistics at a local level based on information submitted by employers; the U.S. Energy Information Administration incorporates state agency data, third party sources, and Department of Interior data to project monthly crude oil production estimates63; and the Department of Education’s National Center for Educational Statistics has been fielding the National Postsecondary Student Aid Survey that combines administrative data on postsecondary transcripts, financial aid, and test scores in addition to a survey component.64 These are examples of successful government efforts that overcome coordination problems in assembling data from disparate sources.
These data collections and their attendant reports and publicly available data are a fount of information that are used in a variety of business, policy, and research applications. Producing high quality data for public consumption is a major undertaking: after the data collection is designed, dedicated staff and information technology resources are a critical to ensure that accuracy and privacy standards are met and reports are produced. To the extent that data is made publicly available, answering user questions about technical matters such as the definition of fields and any methods used to suppress confidential information, should also be provided. To the extent that the government data product may not be readily available in the format that the end user requires for analysis or that may be easily accessed, there are a series of third party intermediaries that provide additional search and processing for a fee. Title companies, which perform and guarantee searches of government records of real property ownership, make up one such industry. For data that can be searched on a document-bydocument basis, there is also a growing industry of firms that code the data so that it can be analyzed quantitatively. These include EDGAR Online,65 which converts publicly available SEC Electronic Data Gathering, Analysis, and Retrieval (EDGAR) filings to quantitative data, and FNC National Collateral Database,66 which collects and harmonizes information from real estate appraisal and local property assessments. The existence of these services indicate that there is market value for the intermediate processing of government data. The extent to which the CFPB processes and makes available its consumer arbitration data will largely determine who analyzes it and how. Building the CFPB’s Arbitration Archive: A Commentary on Design, Implementation, and Privacy | 9 C. 
Responsibility for Privacy Protection and Required Disclosure Format Identifying the privacy considerations is only the one of the first steps in analyzing the benefits and burdens of an arbitral publication system. Two more important privacy-related steps remain—the privacy considerations must be applied to the data and the data must be transmitted. Determining who is required to apply the privacy principles will affect both the compliance costs and the usefulness of the data. The CFPB should not only consider a covered person’s technological sophistication and privacy expertise, but also other considerations that may affect the data. Two financial institutions may both have a thorough understanding of privacy practices, but one may under-redact if it is more concerned about the CFPB’s scrutiny than potential harms to a consumer’s privacy, while another with aggressive privacy practices may over-redact. If arbitrators are required to redact, and arbitrators have a less sophisticated understanding of privacy considerations, then the work may take longer and the final data transmitted may be over- or underredacted. In either of these examples, the data submitted will be inconsistent, which will likely impair the usefulness of the data and potentially undermine the value of the arbitral publication system overall. Further, if financial institutions or arbitrators are required to apply the privacy principles, the CFPB must still develop processes to ensure that consumer privacy is protected in the event that a covered entity incorrectly submits data that should have been redacted. A privacy standard that avoids qualitative assessments, such as an enumerated list of proscribed identifiers, sidesteps these issues, but as discussed in section II.B above also increases the risk of consumer harm. 2. Potential Costs Government agencies have taken different approaches to fund data collection and dissemination. 
One common strategy is to include data collection and processing as part of the organizations budget. For example, the Department of Energy’s FY 2016 budget includes a $122 million line item for EIA funding.67 A second funding scheme is employed by the Federal Financial Institutions Examination Council which splits costs between its member organizations (Federal Reserve System, Federal Deposit Insurance Corporation, Office of the Comptroller of the Currency, the CFPB, Housing and Urban Development, and mortgage insurance companies) for the production and distribution of data and reports. The expenditures for these totaled $4.6 million in 2014 and $4.2 million in 2015.68 Finally, some government entities charge the user directly for use of the system. For example, PACER is available on a fee-for-service basis at a cost of $0.10 per page and $2.40 per audio file, with a $15 charge exemption per quarter for the indigent and certain pro-bono work.69 Data costs, of course, depend on what data is collected and how long it is retained. For example, the California Code of Civil Procedure §1281.96 requires that all information related to consumer arbitration commenced in or after Jan. 1, 2003 be retained for five years. The records to which retention rules apply can also play an important role in the analyses that can be performed: in a sample of 205 cases from the first half of 2013, 93.7% of FINRA customer complaints that were settled and involved the broker appearing before an arbitration panel requesting expungement were expunged.70 In the case of consumer arbitration, the CFPB should think carefully about how this relates to statutes of limitations and whether its collection should be retained for a longer period than record retention periods required of firms involved in the arbitration. 
These concerns could be avoided if the CFPB assumes responsibility for protecting privacy, but this raises another set of concerns. This approach would impose the lowest compliance costs on industry, but it might take longer for the CFPB to scrub records than it would for those with knowledge of the facts and circumstances of individual cases. This may frustrate the goal of empowering the public with an arbitral publication system. Delays may create an information asymmetry, where financial institutions know the current state of arbitration liability but consumers and their representatives are relying on stale data. Under this approach the CFPB might also obtain a substantial amount of consumer PII, which would increase the magnitude and complexity of the CFPB’s own internal privacy operations. Importantly, if the CFPB proposes to assume responsibility, the CFPB may also be motivated to articulate the proposed privacy standard less clearly. But if the proposed privacy standard is not sufficiently clear, consumer and industry groups may not be able to comment effectively on the value of the proposed arbitral publication system in the near term, and may not be able to judge the reliability of the system in the long term, as the privacy algorithm could evolve without further notice and comment.

Regardless of who is responsible for protecting privacy, the format of the data itself also plays a role in the burdens and benefits of the arbitral publication system. As discussed above, the more structured a dataset is, the more useful it is to data users. But the more sophisticated a data reporting structure is, the higher the implementation and ongoing costs tend to be. While financial institutions often have, or can acquire, data processing and transmission tools, these tools are not cheap.
Even though financial institutions can spread the costs among a large number of transactions, it is unlikely that an arbitral publication system with the complexity of the SEC’s EDGAR system would “impose minimal costs,” as the CFPB asserts in its SBREFA outline of proposals.71 On the other hand, if a less sophisticated, and less expensive, system is chosen, data quality and usability are more likely to be affected. For example, if arbitrators are required to transmit the data, they would likely have an easy time transmitting PDFs, but that format lacks structure and would significantly increase the time needed to process and disclose the data.

IV. Closing Thoughts

The CFPB’s potential proposed rulemaking on consumer financial arbitration agreements is poised to require collecting data on consumer financial arbitral claims and awards and possibly disseminating it to the public, potentially covering a wide range of business practices and contexts. This article considers how data collection goals and practical considerations might inform the design of the data collection. Privacy considerations, particularly for consumers, and the question of who is responsible for maintaining privacy standards are critical to the design of the data collection and dissemination strategy. They are particularly critical for products that may be affected by other privacy standards, such as HIPAA, because of their relationship to products in industries with higher privacy standards, such as healthcare. The analysis also considers the tradeoffs between more freeform data collection and more structured data collection that embeds categorization and classification into the data submission process.
Considering the extensive use of hand-coding in previous studies of arbitration and the large amount of skilled labor these tasks require, ex-ante standardization may be a more efficient way to follow trends and get a picture of the market. The extent to which data is processed, and the extent to which data processing is financed through the CFPB’s budget or through usage fees, are further fundamental design considerations. Regardless of the data collection and dissemination approach selected, implementing an ongoing data collection that is consistent, durable, and usable for research and analysis requires ongoing investment of time and resources.

NOTES

1 CFPB, SMALL BUSINESS ADVISORY REVIEW PANEL FOR POTENTIAL RULEMAKING ON ARBITRATION AGREEMENTS: OUTLINE OF PROPOSALS UNDER CONSIDERATION AND ALTERNATIVES CONSIDERED (Oct. 7, 2015), available at http://files.consumerfinance.gov/f/201510_cfpb_small-business-reviewpanel-packet-explaining-the-proposal-underconsideration.pdf.
2 Id. at 20.
3 Id. at 20.
4 Id. at 20.
5 Id. at 20.
6 Id. at 21.
7 Id. at 12 (references authority granted by the Dodd-Frank Act to “prohibit or impose conditions or limitations on the use of an agreement between a covered person and a consumer for a consumer financial product or service providing for arbitration of any future dispute between the parties.”).
8 Id. at 19.
9 From a practical standpoint, the contours of the CFPB’s arbitral publication system may change the size of the data collection, but are unlikely to affect its structure.
10 Supra note 1, at 20.
11 UNITED STATES COURTS, PUBLIC ACCESS TO COURT ELECTRONIC RECORDS, https://www.pacer.gov/ (last visited Apr. 20, 2016).
12 LEXISNEXIS®, http://lexis.com (last visited Apr. 20, 2016).
13 PacerPro, https://www.pacerpro.com/ (last visited Apr. 20, 2016).
14 Id. at 22-23 (lists products and providers potentially subject to the requirements).
15 See 12 U.S.C. § 5481(12) (listing enumerated consumer laws subject to the CFPB’s jurisdiction, including the Fair Credit Reporting Act, the Fair Debt Collection Practices Act, and the Truth in Lending Act).
16 W. Mark C. Weidemaier, Judging Lite: How Arbitrators Use and Create Precedent, 90 NORTH CAROLINA LAW REVIEW 1091, at 5 (2012).
17 FINRA, DISPUTE RESOLUTION STATISTICS, http://www.finra.org/arbitration-and-mediation/disputeresolution-statistics#arbitrationstats (last visited Apr. 20, 2016).
18 FINRA, DISPUTE RESOLUTION: ONLINE ARBITRATION CLAIM FILING SYSTEM USER GUIDE (Oct. 2013), https://www.finra.org/sites/default/files/ArbMed/p037193.pdf.
19 CFPB, ARBITRATION STUDY: REPORT TO CONGRESS, PURSUANT TO DODD-FRANK WALL STREET REFORM AND CONSUMER PROTECTION ACT §1028(A), at 35 (Mar. 2015), available at http://files.consumerfinance.gov/f/201503_cfpb_arbitrationstudy-report-to-congress-2015.pdf (lists the American Arbitration Association, JAMS, and the National Arbitration Forum as arbitrators of consumer financial disputes).
20 Supra note 1, at 19.
21 RICHARD LARSEN AND MORRIS MARX, AN INTRODUCTION TO MATHEMATICAL STATISTICS AND ITS APPLICATIONS 384 (3rd ed., 2001). (Statistical power is defined as the probability that we reject the null hypothesis of no statistical difference between groups when there actually is a difference between groups: “it represents the ability of the decision rule to ‘recognize’ (correctly) that H0 is false.”)
22 Id. at 387.
23 Supra note 19, at 49 (Section 5).
24 Supra note 21, at 506. (Applying the formula to test the equality of the proportion of successes for two Bernoulli trials, we compute a z-score of 1.41, which fails to reject, at a 5% level of significance, the hypothesis that the proportions of arbitrations that result in a consumer award are equal between the two arbitrators.)
25 AMERICAN ARBITRATION ASSOCIATION, ARBITRATION: ARBITRATION PROCESS: ARBITRATOR SELECTION, https://www.adr.org/aaa/faces/services/disputeresolutionservices/arbitration?_afrWindowId=19lgas6gl_50&_afrLoop=1339953530625928&_afrWindowMode=0&_adf.ctrl-state=19lgas6gl_53 (last visited Apr. 20, 2016).
26 Alon Klement & Zvika Neeman, Does Information about Arbitrators’ Win/Loss Ratios Improve Their Accuracy? 42 THE JOURNAL OF LEGAL STUDIES 369–397, at 373 (2013).
27 Supra note 1, at 20.
28 See, e.g., Ariz. Admin. Code, R9-10-711.
29 See, e.g., Trading with the Enemy Act of 1917, Pub. L. No. 65-91, § 3(d), 40 Stat. 411, 413 (1917).
30 Laws generally applicable to the collection and dissemination of data, such as the Privacy Act of 1974 and section 1022(c)(8) of the Dodd-Frank Act, articulate privacy considerations. We assume that the CFPB’s statement regarding compliance with privacy considerations refers to considerations unique to an arbitral publication system, beyond the general considerations applicable to government collection and dissemination of data.
31 Scott Fay, Erik Hurst & Michelle J. White, The Household Bankruptcy Decision, 92 THE AMERICAN ECONOMIC REVIEW 706–718 (2002).
32 John Gathergood, Debt and Depression: Causal Links and Social Norm Effects, 122 THE ECONOMIC JOURNAL 1094–1114 (2012).
33 U.S. Gov’t Accountability Off., GAO-08-536, Privacy: Alternatives Exist for Enhancing Protection of Personally Identifiable Information 1 (2008).
34 12 C.F.R. § 1216.3(q).
35 See, e.g., Conn. Gen. Stat. §§ 36a-41 through 36a-45; Cal. Fin. Code §§ 4050–4060.
36 Mo. Rev. Stat. § 435.014.
37 Tex. Civ. Prac. & Rem. Code Ann. § 154.073; Ark. Code Ann. § 16-7-206.
38 The Health Insurance Portability and Accountability Act of 1996, Pub. L. No. 104–191, 110 Stat. 1936 (1996).
39 45 C.F.R. § 160.103.
40 In 2014, GAO estimated that more than 4 million consumers use medical financing. U.S. Gov’t Accountability Off., GAO-14-570, Consumer Finance: Credit Cards Designed for Medical Services Not Covered by Insurance 1 (2014) (using term “credit cards” to “refer collectively to financial products, including revolving credit lines and installment loans, that are designed specifically to finance health care services not covered by health insurance”).
41 See Kenneth P. Brevoort & Michelle Kambara, Data Point: Medical Debt and Credit Scores (May 2014), available at http://files.consumerfinance.gov/f/201405_cfpb_report_data-point_medical-debtcredit-scores.pdf.
42 45 C.F.R. §§ 164.302–318 (security requirements); 164.400–414 (notification requirements); 164.500–534 (privacy requirements).
43 42 U.S.C. § 1320d-5 (providing HHS and state Attorneys General with enforcement authority).
44 See, e.g., LOYOLA MARYMOUNT UNIVERSITY, INSTITUTIONAL LOANS, http://financialaid.lmu.edu/prospective/faq/institutionalloans/ (last accessed Apr. 20, 2016) (example of a nonprofit private postsecondary institution that offers loans directly to students); UNIVERSITY OF CALIFORNIA, BERKELEY, INSTITUTIONAL LOAN PROGRAM, http://studentbilling.berkeley.edu/InstitutionalLoanProgram.htm (last accessed Apr. 20, 2016) (example of direct financing at a public post-secondary institution).
45 The Family Educational Rights and Privacy Act of 1974, Pub. L. No. 93-380, § 513, 88 Stat. 571 (1974).
46 34 C.F.R. § 99.3(f).
47 See, e.g., Bloomberg, “Yale Suing Former Students Shows Crisis in Loans to Poor,” available at http://www.bloomberg.com/news/articles/2013-02-05/yalesuing-former-students-shows-crisis-in-loans-to-poor.
48 Although the pass through of costs may not be prevalent in other markets, the potential inelasticity of demand for student debt may present a special case.
49 Supra note 19.
50 Supra note 1, at 19.
51 Supra note 19, at 136 (Appendix B).
52 Supra note 19, at 140 (Appendix B).
53 Supra note 19, at 136 (Appendix B).
54 Supra note 1, at 21.
55 Supra note 1, at 21.
56 J. Ryan Lamare & David B. Lipsky, Employment Arbitration in the Securities Industry: Lessons Drawn from Recent Empirical Research, 35 BERKELEY J. EMP. & LAB. L. 113, at 119 (2014); David B. Lipsky et al., The Arbitration of Employment Disputes in the Securities Industry: A Study of FINRA Awards, 1986-2008, 65 DISPUTE RESOLUTION JOURNAL 12 (2010).
57 Adam C. Pritchard, S. J. Choi & J. E. Fisch, The Influence of Arbitrator Background and Representation on Arbitration Outcomes, 9 VA. L. & BUS. REV., no. 1 (2014); Stephen J. Choi & Theodore Eisenberg, Punitive Damages in Securities Arbitration: An Empirical Study, CORNELL LEGAL STUDIES RESEARCH PAPER 09–01 (2009).
58 Seth E. Lipner, Expungement of Customer Complaint CRD Information Following Settlement of a FINRA Arbitration, XIX FORDHAM JOURNAL OF CORPORATE AND FINANCIAL LAW 57 (2013). (Lipner performed a text search for the term “expungement” in FINRA records for the first 6 months of 2013 to understand the effects of expungement of consumer complaints following the settlement of a FINRA arbitration claim.)
59 Alexandre Mas, Pay, Reference Points, and Police Performance, 121 THE QUARTERLY JOURNAL OF ECONOMICS 783–821 (2006); Orley Ashenfelter & Gordon B. Dahl, Bargaining and the Role of Expert Agents: An Empirical Study of Final-Offer Arbitration, 94 REVIEW OF ECONOMICS AND STATISTICS 116–132 (2010). (Hand-coded 1978-1996 New Jersey municipality and police bargaining unit arbitration.)
60 AMERICAN ARBITRATION ASSOCIATION, GOVERNMENT & CONSUMER: CONSUMER ARBITRATION STATISTICS, https://www.adr.org/aaa/faces/aoe/gc/consumer/consumerarbstat?_afrWindowId=14fxmwf2i0_201&_afrLoop=1027970136106473&_afrWindowMode=0&_adf.ctrlstate=14fxmwf2i0_204 (last visited Mar. 8, 2016); Alexander J. S. Colvin & Mark D. Gough, Individual Employment Rights Arbitration in the United States: Actors and Outcomes, 68 ILR REVIEW 1019–1042 (2015). (This paper makes use of the data collected in spreadsheet format by AAA.)
61 Supra note 19, at 35.
62 BUREAU OF LABOR STATISTICS, QUARTERLY CENSUS OF EMPLOYMENT AND WAGES, http://www.bls.gov/cew/home.htm (last visited Apr. 20, 2016).
63 U.S. ENERGY INFORMATION ADMINISTRATION, METHODOLOGY FOR MONTHLY CRUDE OIL ESTIMATES, http://www.eia.gov/petroleum/supply/monthly/pdf/crudemeth.pdf (last visited Apr. 20, 2016).
64 DEPARTMENT OF EDUCATION, NATIONAL POSTSECONDARY STUDENT AID STUDY - ABOUT NPSAS, http://nces.ed.gov/surveys/npsas/about.asp (last visited Apr. 20, 2016).
65 EDGAR ONLINE, DATA CONTENT SOLUTIONS, http://www.edgar-online.com/DataContentSolutions.aspx (last visited Apr. 20, 2016).
66 FNC, NATIONAL COLLATERAL DATABASE™, http://www.fncinc.com/Products/ncd.aspx?ref=63 (last visited Apr. 20, 2016).
67 U.S. ENERGY INFORMATION ADMINISTRATION, BUDGET AND PERFORMANCE, https://www.eia.gov/about/budget_performance.cfm (last accessed Apr. 20, 2016).
68 FFIEC, ANNUAL REPORT 2014, at 48 (2015), available at http://www.ffiec.gov/PDF/annrpt14.pdf.
69 PACER, ELECTRONIC PUBLIC ACCESS FEE SCHEDULE, at 2-3 (Dec. 2013), available at https://www.pacer.gov/documents/epa_feesched.pdf.
70 Supra note 58.
71 Supra note 1, at 20.

About Edgeworth

Edgeworth Economics provides quantitative and economic consulting in the course of litigation and business to its clients, which include world-class law firms, Fortune 500 companies, and government agencies.
Edgeworth experts apply their knowledge and experience, along with state-of-the-art computing infrastructure, to help clients efficiently manage complex issues including antitrust litigation, privacy & data security, transfer pricing, intellectual property, mergers and acquisitions, class actions, labor, and data & HR analytics. As a rapidly growing firm with a fresh approach, Edgeworth attracts leaders and teachers from across the industry including PhD economists, MBAs, statisticians, and programmers. Edgeworth has offices in Washington, DC, Pasadena, and San Francisco. Contact: Dr. Xiaoling Ang Principal Consultant Washington, DC firstname.lastname@example.org 202.580.7744 Copyright 2016 Edgeworth Economics, LLC All rights reserved.