This blog explores the evolving landscape of AI fairness regulations in the insurance industry, highlighting the new requirements for fairness testing and the technical challenges that arise due to the lack of accessible demographic data. The article discusses the implications of US and EU regulations, the innovative techniques for demographic data imputation, and the broader impact on insurance underwriting and pricing. We examine how insurers can navigate these complex requirements to ensure AI models are both fair and effective.
By Andrew Marble, Philip Dawson and Stephanie Cairns
As policy makers and regulators introduce new obligations for companies’ use of AI, including requirements to conduct regular fairness testing of the data sets and AI models they develop and/or use, enterprise data science teams are confronted with a series of compliance options and challenging technical choices.
One of the core challenges to measuring bias in AI solutions is that system developers generally do not collect, or have access to, relevant demographic data about their clients, patients, employees or job candidates, due to privacy laws that restrict the collection and processing of personal information. Without such demographic data, it can be difficult for data science teams to evaluate the fairness of the AI models they develop across protected and intersectional groups (e.g. gender and/or race).
US Insurance Regulations
In the US, insurance regulators are considering requiring data science teams to adopt specific inferencing techniques that enable the estimation, or ‘imputation’, of missing demographic data (e.g. gender or race) so that fairness assessments can be performed. In Colorado, for instance, the state insurance regulator has published a draft regulation on “Quantitative Testing for Unfairly Discriminatory Outcomes for Algorithms and Predictive Models Used for Life Insurance Underwriting”1. The New York State Department of Financial Services has released a draft circular on “Use of Artificial Intelligence Systems and External Consumer Data and Information Sources in Insurance Underwriting and Pricing”2. At the core of these instruments are proposed measures to quantify differences in outcomes across demographic groups, in particular race and gender. Gender data is more commonly available (and more readily imputed), so we focus primarily on race in this discussion, though the same considerations apply to gender.
For race, the common requirement we see is for proposed insureds to be categorized according to labels roughly aligned with US Census race categories: White, Black, Asian, and Hispanic3. Impact and disparity analyses are then performed to determine whether there is a material difference in outcomes across these demographic groups.
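As a rough illustration of what such a disparity analysis can look like, the Python sketch below compares approval rates across imputed demographic groups and flags any group whose rate falls well below that of the most favored group. The data, column names, and the 80% impact-ratio threshold are all illustrative assumptions, not values prescribed by the Colorado or New York instruments.

```python
import pandas as pd

# Hypothetical underwriting outcomes annotated with an imputed race category.
data = pd.DataFrame({
    "race_imputed": ["White", "Black", "Asian", "Hispanic", "White", "Black",
                     "Hispanic", "Asian", "White", "Black"],
    "approved":     [1, 0, 1, 1, 1, 1, 0, 1, 0, 0],
})

# Approval rate per imputed demographic group.
rates = data.groupby("race_imputed")["approved"].mean()

# Compare each group against the most favored group (an adverse impact ratio).
reference_rate = rates.max()
impact_ratio = rates / reference_rate

report = pd.DataFrame({"approval_rate": rates, "impact_ratio": impact_ratio})
report["flagged"] = report["impact_ratio"] < 0.8  # illustrative threshold

print(report)
```

In practice the flagged groups would then be examined with more formal statistical tests and with attention to the uncertainty introduced by the imputation itself.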
Historically, race data has not been collected, and data sets are rarely annotated with this information. This makes testing during model development, as well as compliance auditing, challenging. It has been acknowledged that estimating (imputing) race with statistical models is a necessary approach to performing fairness testing. It should be emphasized that such imputation is always imperfect, as are the coarse and arguably artificial census race categories; the analysis should therefore be seen as a first-order check on demographic fairness rather than a scientific or deterministic calculation of absolute fairness.
Several common models are available that estimate race based on correlations with other personal information. The Colorado DOI specifies the “Bayesian Improved First Name Surname Geocoding” (BIFSG) model for race imputation4, which uses name and location to infer race via a statistical model. Alternative approaches include services such as NAMSOR5, which use proprietary models to infer race from name, as well as freely available models such as raceBERT6 (a variant of the popular BERT language model) and ethnicolr7. Name and address are clearly not causally related to race, so these models all rely on statistical correlation.
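At its core, BIFSG is an application of Bayes’ rule: the probability of each race category given a surname is updated using first-name and geography likelihoods, all derived from published census tables (open-source packages such as surgeo wrap these tables). The sketch below shows the calculation with made-up probability values, purely to illustrate the mechanics; it is not a production implementation.

```python
# Minimal illustration of the BIFSG calculation (Bayesian Improved First Name
# Surname Geocoding). All probability tables below are invented for the
# example; real implementations use US Census surname/first-name frequencies
# and block-group population shares.
RACES = ["White", "Black", "Asian", "Hispanic"]

# P(race | surname), from a hypothetical surname table.
p_race_given_surname = {"GARCIA": [0.05, 0.01, 0.01, 0.93]}

# P(first name | race) and P(geography | race), also hypothetical.
p_first_given_race = {"MARIA": [0.01, 0.01, 0.01, 0.05]}
p_geo_given_race = {"80203": [0.60, 0.10, 0.05, 0.25]}


def bifsg(surname: str, first_name: str, geo: str) -> dict:
    """Posterior P(race | surname, first name, geography) via Bayes' rule,
    assuming first name and geography are conditionally independent given race."""
    prior = p_race_given_surname[surname]
    likelihood_first = p_first_given_race[first_name]
    likelihood_geo = p_geo_given_race[geo]

    unnormalized = [p * f * g for p, f, g in zip(prior, likelihood_first, likelihood_geo)]
    total = sum(unnormalized)
    return {race: u / total for race, u in zip(RACES, unnormalized)}


print(bifsg("GARCIA", "MARIA", "80203"))
```

The output is a probability distribution over the four categories rather than a single label, which is why downstream fairness analyses often weight outcomes by these probabilities instead of assigning each record to one group.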
While some jurisdictions mandate specific imputation methods, the choice also depends on the available data and on privacy considerations. BIFSG requires address information, which is not always available. API-based imputation methods require transmitting names to a third party, which may not be permissible under privacy or data-handling requirements. In terms of performance, a study compared NAMSOR, ethnicolr, and BIFSG against self-identified race categories and concluded that all methods have significant errors, but that all are nonetheless able to confirm directional racial disparities8. This brings to mind the British statistician George Box’s observation that “all models are wrong, some are useful”.
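To make “directional agreement” concrete, the snippet below compares the approval-rate gap between two groups computed once with self-identified labels and once with imputed labels, then checks whether the two gaps point the same way. The records and column names are hypothetical and exist only to illustrate the idea.

```python
import pandas as pd

# Hypothetical records carrying both self-identified and imputed race labels.
df = pd.DataFrame({
    "race_self":    ["White", "Black", "White", "Black", "White", "Black"],
    "race_imputed": ["White", "Black", "White", "White", "White", "Black"],
    "approved":     [1, 0, 1, 1, 1, 0],
})

def gap(frame, label_col, group_a="White", group_b="Black"):
    """Approval-rate difference between two groups under a given labeling."""
    rates = frame.groupby(label_col)["approved"].mean()
    return rates[group_a] - rates[group_b]

gap_self = gap(df, "race_self")
gap_imputed = gap(df, "race_imputed")

# The magnitudes differ because imputation is noisy, but if both gaps share
# the same sign the imputed labels confirm the direction of the disparity.
print(f"gap (self-identified): {gap_self:+.2f}")
print(f"gap (imputed):         {gap_imputed:+.2f}")
print("directional agreement:", (gap_self > 0) == (gap_imputed > 0))
```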
Alternative Option under the EU AI Act
In Europe, the recently adopted AI Act contains a provision that allows companies to collect sensitive personal information, such as demographic data, if it is done for the purpose of mitigating bias9. In general, while using real demographic data may allow for greater accuracy in bias measurements, the collection and processing of such data introduces new risks for the enterprise related to privacy and security, as well as the costs of implementing safeguards and ensuring compliance with applicable data protection regulations. European lawmakers, likely aware of these risks, have imposed strict privacy and security requirements on data processors and have kept the scope of the EU AI Act’s exception narrow: it applies only to high-risk systems where bias detection and mitigation cannot be achieved using anonymous or synthetic data.
However, even where demographic data can be collected and used in a legal, safe manner, building and maintaining comprehensive, up-to-date datasets can be slow and technically arduous.
Until more data is collected or regulations change, race imputation will remain a necessity for fairness evaluations. A variety of methods are available, and regulators, businesses, and auditors will need to be flexible about which methods are applied, based on the data that is available and other constraints.
1 https://www.debevoise.com/insights/publications/2023/10/the-final-colorado-ai-insurance-regulations-whats
2 https://www.dfs.ny.gov/industry_guidance/circular_letters/cl2024_nn_proposed
3 Colorado uses the four categories listed; New York does not specify categories. Similar HR legislation, such as New York City Local Law 144, includes Native Hawaiian or Pacific Islander and Native American or Alaska Native categories: https://rules.cityofnewyork.us/wp-content/uploads/2023/04/DCWP-NOA-for-Use-of-Automated-Employment-Decisionmaking-Tools-2.pdf
4 https://www.rand.org/health-care/tools-methods/bisg.html
5 https://namsor.app/
6 https://arxiv.org/abs/2112.03807
7 https://ethnicolr.readthedocs.io/
8 https://dl.acm.org/doi/10.1145/3531146.3533140
9 Article 10(5), EU AI Act, https://artificialintelligenceact.eu; see also Marvin van Bekkum and Frederik Zuiderveen Borgesius, “The AI Act's debiasing exception to the GDPR”, IAPP, February 2024, https://iapp.org/news/a/the-ai-acts-debiasing-exception-to-the-gdpr/