What to Consider in Data Science for the Financial Services Industry

When working with data for marketing in financial services, analysts and marketers need to consider a number of implications. The financial industry faces heightened regulatory scrutiny, particularly since the financial crisis. This is especially true for marketing campaigns in which prospects are prescreened using credit bureau data in order to receive a firm offer of credit.

The most important laws that regulate marketing campaigns of financial institutions are the Equal Credit Opportunity Act (ECOA) and the Fair Housing Act (FHA).

The ECOA prohibits discrimination in any aspect of a credit transaction and applies to any extension of credit. It prohibits discrimination on the following bases:

  • Race or color
  • Religion
  • National origin
  • Sex
  • Marital status
  • Age (provided the applicant has the capacity to contract)
  • The applicant’s receipt of income derived from any public assistance program
  • The applicant’s exercise, in good faith, of any right under the Consumer Credit Protection Act

The existence of illegal disparate treatment against a protected group may be established by either overt evidence or comparative evidence. Financial institutions can easily avoid overt discrimination on these bases, since it’s easy to determine whether a proposed policy constitutes a violation. Consider a lender that offers a credit card with a limit of $10,000 for married couples and $5,000 for single applicants. This policy would obviously violate the ECOA’s prohibition on discrimination based on marital status.

However, a financial institution’s marketing or credit criteria can be found to be discriminatory even when the lender does not intend to discriminate. This typically happens when a seemingly neutral criterion is applied equally to all applicants or prospects but disproportionately and negatively affects applicants on a prohibited basis. Such a criterion can be found to be discriminatory if there is no legitimate, nondiscriminatory business need for the criterion that creates the disparate impact. This is known as disparate impact. In its overview of the Federal Fair Lending Regulations and Statutes, the Federal Reserve provides the following example:

“A lender's policy is not to extend loans for single family residences for less than $60,000. This policy has been in effect for ten years. This minimum loan amount policy is shown to disproportionately exclude potential minority applicants from consideration because of their income levels, or because of the value of the houses in the areas in which they live. The lender will be required to justify the ‘business necessity’ for the policy.”

The issue for financial institutions and their marketers, therefore, is that they could design a marketing or credit policy with no intention of violating the ECOA and still later be found in violation. This makes “business necessity” the key factor to consider when determining whether a policy that negatively affects a protected group violates the ECOA.

Consider an analyst developing a marketing campaign that leverages a statistical model. Assume that the model uses the presence and size of a mortgage as a sign of creditworthiness: larger mortgages lead to a better score, and the lack of a mortgage is a negative factor. This appears to be a reasonable approach. Applicants with a mortgage went through an application process for that mortgage and were approved. If they have a larger mortgage, they had to convince the lender that they could afford the resulting larger payments. Both factors make the applicant appear more creditworthy than a person without a mortgage. This clearly doesn’t constitute overt discrimination against a protected group. But could the model still be found in violation because it has a disparate impact on older applicants? You could argue that many older applicants lack a mortgage only because they have already paid it off in full. The model likely doesn’t make that distinction and could therefore lack business necessity in the eyes of an examiner.
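To see whether a criterion like this one falls disproportionately on a protected group, an analyst can compare how often each group is selected. The sketch below is a minimal illustration, not a regulatory test: it computes the share of prospects in each age band whose score clears the mailing cutoff and divides each share by the rate of a reference group. The field names, age bands, and counts are hypothetical, and the 80% threshold is an assumption borrowed from the “four-fifths” rule of thumb in U.S. employment-selection guidance; the ECOA itself does not prescribe a numeric cutoff. A low ratio does not by itself establish a violation; it flags a criterion whose business necessity should be documented.

```python
from collections import defaultdict


def selection_rates(records, group_key, selected_key):
    """Return the share of records in each group that the criterion selects."""
    totals = defaultdict(int)
    selected = defaultdict(int)
    for rec in records:
        group = rec[group_key]
        totals[group] += 1
        if rec[selected_key]:
            selected[group] += 1
    return {group: selected[group] / totals[group] for group in totals}


def impact_ratios(rates, reference_group):
    """Divide each group's selection rate by the reference group's rate."""
    reference_rate = rates[reference_group]
    return {group: rate / reference_rate for group, rate in rates.items()}


# Hypothetical campaign file: (age band, cleared the mailing cutoff?, number of prospects).
cells = [
    ("under_62", True, 60),
    ("under_62", False, 40),
    ("62_plus", True, 35),
    ("62_plus", False, 65),
]
prospects = [
    {"age_band": band, "mailed": mailed}
    for band, mailed, n in cells
    for _ in range(n)
]

rates = selection_rates(prospects, "age_band", "mailed")
ratios = impact_ratios(rates, reference_group="under_62")

# Assumed screening threshold (four-fifths rule of thumb), not an ECOA standard.
THRESHOLD = 0.8
flagged = [group for group, ratio in ratios.items() if ratio < THRESHOLD]
print(rates)    # {'under_62': 0.6, '62_plus': 0.35}
print(flagged)  # ['62_plus']
```

Here the older band is selected at roughly 58% of the younger band’s rate, so the mortgage variable would be a candidate for the kind of business-necessity review described next.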

It has hopefully also become clear that whether a criterion or a particular model factor constitutes a violation is not always obvious. What, then, are the implications for analysts and marketers working for or with financial institutions? How should they assess these compliance risks and protect themselves against them?

First, they should document the business necessity of each variable in their models or criteria. For statistical models, this should also include a statistical justification: evidence that the variable is a powerful predictor of the model’s target variable. For a response model, this could mean demonstrating that the variable was a statistically significant predictor of response in past campaigns. Such a justification is much easier to provide for a straightforward model than for an ensemble model, because the role each variable plays is easier to determine in a simple model. This complicates the use of more complex models, in which the role and significance of each variable cannot always be easily determined.
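For a response model, one simple form of statistical justification is a test of whether prospects who have a given attribute responded to past campaigns at a different rate than prospects who did not. The sketch below assumes a hypothetical historical file summarized as response counts for the two groups and applies a standard two-sided two-proportion z-test using only the Python standard library. It is a univariate check: in a multivariable model, the more relevant evidence is the variable’s contribution given the other variables, and statistical significance alone does not establish business necessity.

```python
import math


def two_proportion_z_test(hits_a, n_a, hits_b, n_b):
    """Two-sided z-test of whether the response rates of groups A and B differ."""
    p_a = hits_a / n_a
    p_b = hits_b / n_b
    # Pooled rate under the null hypothesis that both groups respond equally.
    # (Undefined if nobody, or everybody, responded in both groups.)
    pooled = (hits_a + hits_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / std_err
    # For a standard normal Z, P(|Z| >= |z|) equals erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical prior campaign: prospects with (A) and without (B) the attribute.
z, p_value = two_proportion_z_test(hits_a=420, n_a=20_000, hits_b=450, n_b=30_000)
print(f"z = {z:.2f}, p = {p_value:.2g}")
```

With these made-up counts (2.1% versus 1.5% response), the difference is highly significant; the same numbers, together with the test used, are the kind of evidence a compliance reviewer could be shown.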

Second, they should have the criteria and/or model independently reviewed by compliance or legal staff to minimize compliance risk. The reviewer should have the right to “veto” a factor in the model if they have serious concerns about its use.

In the case of the “presence and size of a mortgage” example above, it was indeed a compliance officer who raised these concerns during a model review. We determined that applicants with a paid-off mortgage were indeed more creditworthy than applicants who had never had a mortgage. We were also able to demonstrate that, through other factors in the model, the model scores accurately captured and reflected this. Had this not been the case, we would very likely have been asked to remove that variable from our model.
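The text does not describe exactly how the model’s handling of paid-off mortgages was demonstrated. One plausible approach, sketched here under stated assumptions, is to split a scored historical file by mortgage status and compare each segment’s observed default rate with the model’s average predicted probability of default. If the model reflects the distinction, the paid-off segment should receive lower predicted risk than the never-had-a-mortgage segment, and predictions should track observed rates within each segment. The segment labels, field names, counts, probabilities, and the 0.01 tolerance are all hypothetical.

```python
from collections import defaultdict
from statistics import mean


def segment_summary(rows, segment_key, outcome_key, score_key):
    """Per segment: size, observed bad rate, and mean predicted probability of default."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[segment_key]].append(row)
    return {
        segment: {
            "n": len(members),
            "observed_bad_rate": mean(1.0 if r[outcome_key] else 0.0 for r in members),
            "mean_predicted_bad": mean(r[score_key] for r in members),
        }
        for segment, members in groups.items()
    }


# Hypothetical scored file: (mortgage status, defaulted?, predicted P(default), count).
cells = [
    ("never_had", True, 0.075, 8), ("never_had", False, 0.075, 92),
    ("paid_off", True, 0.035, 3), ("paid_off", False, 0.035, 97),
    ("active", True, 0.042, 4), ("active", False, 0.042, 96),
]
rows = [
    {"mortgage_status": status, "bad": bad, "p_bad": p}
    for status, bad, p, n in cells
    for _ in range(n)
]

summary = segment_summary(rows, "mortgage_status", "bad", "p_bad")

# Does the model assign paid-off borrowers lower risk than applicants who never had a mortgage?
separates_paid_off = (
    summary["paid_off"]["mean_predicted_bad"] < summary["never_had"]["mean_predicted_bad"]
)

# Assumed calibration tolerance: flag segments whose prediction misses the observed rate by > 0.01.
TOLERANCE = 0.01
miscalibrated = [
    segment
    for segment, stats in summary.items()
    if abs(stats["observed_bad_rate"] - stats["mean_predicted_bad"]) > TOLERANCE
]
print(separates_paid_off, miscalibrated)
```

In a real review, the segments would come from bureau attributes that distinguish a closed, paid-in-full mortgage from no mortgage history at all, and the comparison would be run on a holdout sample rather than on the data used to fit the model.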

These examples hopefully underscore the added complexity faced by data scientists and modelers working in financial services. But even analysts working in other industries should consider the reputational risks posed by the models they develop. What if a major news outlet, such as the New York Times, were to publish a story about your model? Are you confident that you could defend your model and the variables in it in the court of public opinion? If your model contains factors you would rather not read about in the news, consider leaving them out.
