September 15, 2022
AI Bias Bar Keeps Rising for HRTech
How Independent Assessments and Robustness Testing Can Help Developers Stay Ahead


The AI Revolution in HR

In an increasingly globalized labor market, sourcing and recruiting top talent is one of the most pressing and consistent challenges that businesses face.

 

For this reason, the HR sector has become one of the most rapid adopters of AI/ML technologies, which have brought new levels of efficiency to nearly all aspects of the traditional HR pipeline. According to recent industry analyses, there are currently over 250 different commercial, AI-based HR tools on the market. This market is only set to grow further, with industry surveys showing that 92% of organizations currently plan to increase their use of AI in at least one step of their HR operations.

New Risks and Regulation

However, the efficiency gains of automation come with important risks. Long-standing concerns about the potential for harmful biases, for example, mean that the use of AI in HR is already under significant levels of ethical and legal scrutiny. High-profile cases of algorithmic discrimination — such as Amazon’s biased resume-screening software — have heightened public awareness of these dangers.

 

Consequently, regulatory momentum is increasing as governments have begun drafting legislation to provide stricter rules for the use of algorithms or AI in an employment context. These emerging regulations will have a variety of direct and indirect effects on HR software developers.

 

In some cases, regulations will directly affect the software development process itself. For example, the US Algorithmic Accountability Act places new obligations on HR software developers to conduct impact assessments of their automated and AI tools. Europe's Artificial Intelligence Act includes sweeping new rules requiring that models used to support recruitment or promotion decisions be assessed for quality, accuracy, fairness, explainability, and robustness, and that their data and model specifications, as well as their testing and validation processes, be documented.

 

In other cases, new regulatory frameworks will place novel legal liabilities on end-users of HR software, rather than its developers. NYC's recent AEDT bill, for instance, requires any employers wishing to use HR software to conduct independent bias audits both prior to deployment and on a rolling annual basis. While such laws may not directly apply to HR software vendors, vendors will nevertheless feel their effects, as clients will not expect to shoulder the new liabilities alone.

How Can HR Tech Developers Stay Ahead?

The fact that most employers and recruiters don’t have the time or technical expertise to perform in-depth audits of the HR software they procure means that companies will increasingly look to their software vendors to provide assurances that their products meet new requirements. Looking forward, clients will be less and less likely to select HR software vendors who have not performed independent assessments or cannot attest to comprehensive internal bias testing.

 

Rather than viewing these new regulatory requirements as a burden, however, HR software developers can embrace these developments as opportunities to differentiate their offerings and secure a new competitive advantage in an increasingly crowded marketplace.

 

One practical option for increasing customer confidence in the near-term may be for developers to have their AI/ML models tested for bias by third-party assessors. Independent verification can help developers build trust and distinguish their product offering as “audit-ready”. In fact, some major HR software providers have already begun voluntarily subjecting their products to third-party assessments. As we’ve covered in a previous post, given the variety of bias assessment offerings we see emerging, companies will have to choose their assessment partners wisely.

  

To sustain client confidence in the long-term, developers will need to invest in more rigorous approaches to internal product testing. In practice, this will mean that developers will have to become more acutely aware of the many different ways in which biases can arise in AI/ML models. They will also have to adopt new tools that can provide deeper insight into how models may behave in complex, real-world scenarios, and offer recommendations for improving model fairness without sacrificing quality. 

Bias Testing Today — An Illustration

To illustrate how more robust testing can lead to fair and more accurate models, let’s consider one of the most common use-cases for AI/ML in HR: automated resume screening.

 

AI developers understand that including personally-identifying variables — such as job applicants’ names, sex, and even their alma maters — can introduce systematic biases into models that lead to disparate outcomes for different demographic groups. For this reason, it has become common practice to explicitly exclude these variables during model training. This effectively “blinds” the model to these features — which, in theory, should prevent it from making biased decisions based on these factors.
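In code, this "blinding" step is often nothing more than dropping the identifying columns before training. The sketch below is a minimal, hypothetical illustration (the column names are invented for this example):

```python
# A minimal sketch of the common "blinding" approach: personally-identifying
# fields are stripped from each candidate record before it reaches the model.
# All field names here are hypothetical.
PII_FIELDS = {"name", "sex", "alma_mater"}

def blind(record: dict) -> dict:
    """Return a copy of a candidate record with identifying fields removed."""
    return {k: v for k, v in record.items() if k not in PII_FIELDS}

candidate = {
    "name": "A. Example",
    "sex": "F",
    "alma_mater": "State University",
    "years_experience": 6,
    "skills_match": 0.82,
}
blinded = blind(candidate)
# Only the non-identifying features remain for the model to see.
```

As the next paragraphs explain, this simple step is necessary but far from sufficient.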

 

However, this common approach has two important drawbacks. First, it is not a surefire guarantee against discriminatory bias, as real-world variables are often inter-correlated in subtle and unpredictable ways. Variables that seem innocuous at first blush, such as extracurricular activities (e.g. lacrosse, baseball, sewing), can be indirect signals of exactly the identity-related factors we wouldn't want a model to consider when making decisions. Another example is a long gap in a candidate's experience, which can reflect parental leave and is thus potentially associated with a specific gender. Such features are known as 'proxy variables', and they are only one of many ways in which subtle biases can creep into a model.
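One simple way to screen for candidate proxy variables is to check how strongly each feature's values track a protected attribute across a sample. The sketch below uses a basic correlation measure on toy data; the feature names, threshold, and data are all hypothetical, and real-world proxy detection would use richer statistical tests:

```python
# Hypothetical sketch: flag features whose values correlate strongly with a
# protected attribute (here, gender), marking them as potential proxies.
import statistics

def correlation(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

def find_proxies(features: dict, protected: list, threshold: float = 0.7):
    """Return feature names whose |correlation| with the protected attribute
    exceeds the threshold."""
    return [name for name, vals in features.items()
            if abs(correlation(vals, protected)) > threshold]

# Toy binary data: 1 = group member / feature present, 0 = otherwise.
gender = [1, 1, 1, 0, 0, 0, 1, 0]
features = {
    "career_gap":   [1, 1, 0, 0, 0, 0, 1, 0],  # tracks gender closely
    "knows_python": [1, 0, 1, 0, 1, 0, 0, 1],  # roughly independent
}
proxies = find_proxies(features, gender)
```

On this toy data, `career_gap` is flagged while `knows_python` is not, mirroring the parental-leave example above.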

 

Second, if these variables are excluded from both the training and testing processes, then you could be building biased models while eliminating your ability to determine whether they are biased! The ideal solution, therefore, is not to simply "throw out" the variables, but to take a more meticulous approach to testing that can actually reveal the specific weaknesses and features of a model that could lead to biased outputs and, ultimately, future compliance issues.
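Concretely, this means retaining protected attributes in the test data even when they are excluded from training, so that outcomes can be compared across groups. The sketch below computes selection rates for two groups of hypothetical model decisions and applies the "four-fifths rule" heuristic familiar from US employment law; the decision data is invented for illustration:

```python
# Hedged sketch: compare selection rates across demographic groups in the
# test set, even though the group attribute was never shown to the model.
def selection_rate(decisions: list) -> float:
    """Fraction of candidates the model recommended advancing."""
    return sum(decisions) / len(decisions)

def disparate_impact_ratio(decisions_a: list, decisions_b: list) -> float:
    """Ratio of the lower group's selection rate to the higher group's."""
    ra, rb = selection_rate(decisions_a), selection_rate(decisions_b)
    return min(ra, rb) / max(ra, rb)

# Toy model outputs: 1 = advance the candidate, 0 = reject.
group_a = [1, 1, 1, 0, 1, 1, 0, 1]   # selection rate 0.75
group_b = [1, 0, 0, 1, 0, 0, 1, 0]   # selection rate 0.375
ratio = disparate_impact_ratio(group_a, group_b)
flagged = ratio < 0.8  # below four-fifths: potential adverse impact
```

Note that this comparison is only possible because the group labels were kept in the evaluation data.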

Towards 'Robust' Bias Testing of HR Software

To achieve these goals, developers will have to move beyond traditional model-testing approaches, such as aggregate accuracy or basic bias metrics, which do not afford fine-grained insight into how models actually produce decisions, and focus instead on the robustness of their AI systems. Some of the new techniques developers should consider will be covered in future blog posts. The following are just two examples of how robustness testing can be scaled to strengthen performance while ensuring model fairness endures over time.

  

Scenarios testing: Rather than focusing solely on “general” performance measures (such as overall accuracy), software designers should carefully consider the scenarios that their models will encounter during deployment, and identify test cases that reveal how the model performs under each context. For bias prevention, it’s especially important to consider unexpected scenarios and edge-cases where model performance may be particularly sensitive or critical. Adopting testing solutions that automate the discovery of such scenarios for use as test cases will help developers build HR tools that outperform competitors’ in terms of both performance and fairness.
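A minimal sketch of the idea, using an invented stand-in model and invented scenario names, is to report accuracy per deployment scenario rather than one aggregate number:

```python
# Sketch of scenario testing: evaluate the model separately on named
# deployment scenarios. The screening model, scenario names, and test
# cases below are all hypothetical.
def screen(candidate: dict) -> int:
    """Stand-in screening model: advance if the skills match is high enough."""
    return 1 if candidate["skills_match"] >= 0.5 else 0

# Each scenario maps to (input, expected decision) test cases.
scenarios = {
    "typical_applicant": [({"skills_match": 0.9}, 1), ({"skills_match": 0.2}, 0)],
    "career_returner":   [({"skills_match": 0.6}, 1), ({"skills_match": 0.55}, 1)],
    "sparse_resume":     [({"skills_match": 0.0}, 0), ({"skills_match": 0.4}, 1)],
}

def per_scenario_accuracy(model, scenarios: dict) -> dict:
    """Accuracy of the model on each scenario's test cases."""
    return {name: sum(model(x) == y for x, y in cases) / len(cases)
            for name, cases in scenarios.items()}

report = per_scenario_accuracy(screen, scenarios)
# A low score on one scenario (here "sparse_resume") flags a weakness
# that a single overall accuracy number would hide.
```

In practice, the value lies in automating the discovery of such scenarios rather than hand-writing them, as the paragraph above notes.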

 

Perturbation testing: Models deployed "in the wild" will often encounter data that is quite different from what they were originally trained on. Developers need to understand how a model will react when inputs are systematically perturbed or manipulated in different ways. Specific perturbations may also give insight into model bias. For example, sociocultural trends may cause a hobby or work experience that has historically been associated with a particular gender – itself an example of unconscious gender bias – to change over time. Alternatively, users could maliciously attempt to "game the system" by manipulating their input in subtle ways in order to produce a desired outcome. In both cases, such shifts could change how a given model impacts different subsets of end-users. Scaled perturbation testing during the development phase gives developers a better measure of how robust the model will be in the wild, where the data isn't as controlled.
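The core loop can be sketched in a few lines: swap a single, job-irrelevant token in an input and check whether the decision flips. The toy keyword model below is deliberately biased to make the effect visible; the template and hobby list are invented for this example:

```python
# Hedged sketch of perturbation testing: substitute one job-irrelevant token
# (here a hobby) and count decision flips. The model is a deliberately
# biased toy stand-in, not a real screening system.
def toy_screen(resume_text: str) -> int:
    """Toy model that (wrongly) rewards one hobby keyword."""
    score = 0.5 + (0.3 if "lacrosse" in resume_text else 0.0)
    return 1 if score >= 0.7 else 0

def perturbation_flips(model, template: str, slot: str, substitutes: list) -> int:
    """Count substitutions for `slot` that change the model's decision
    relative to the first substitute's baseline decision."""
    baseline = model(template.format(**{slot: substitutes[0]}))
    return sum(model(template.format(**{slot: s})) != baseline
               for s in substitutes[1:])

template = "5 years of sales experience. Hobbies: {hobby}."
flips = perturbation_flips(toy_screen, template, "hobby",
                           ["lacrosse", "sewing", "baseball"])
# Any flip indicates sensitivity to a job-irrelevant attribute.
```

Here both substitutions flip the decision, exposing the model's dependence on a single hobby keyword. Scaled up across many templates and substitution sets, this same loop becomes a systematic robustness test.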

Looking Ahead

Ultimately, in the face of evolving and increasingly demanding expectations from regulators and clients alike, HR software developers can look to third-party bias assessments and enhanced model testing to convert these pressures into new competitive advantages. Combining these approaches will help developers build trust with their clients in the short term, while ensuring they remain ready to satisfy emerging AI compliance requirements of their own.
