MIT study suggests banks’ hiring algorithms are getting it wrong

Sarah Butcher / August 20 2020

If you’re a student trying to get a job in an investment bank now, you will almost certainly come up against a hiring algorithm. Banks including JPMorgan and Goldman Sachs use HireVue, a digital interviewing system that uses an algorithm to identify the candidates who should go through to a second round.

In theory, algorithms are scrupulously fair, but as the recent exam results fiasco in the U.K. showed, they can be extremely biased if not constructed carefully.

A new study from academics at MIT Sloan and Columbia University says that while recruitment algorithms are typically less biased than actual human beings, they still tend to favor traditional applicants.

Most algorithms use a form of supervised learning based on a training dataset composed of high-quality applicants from the past, the academics say. Having learned what a high-quality applicant looks like, the algorithm assumes that these past examples extend to the future and selects new applicants who fit the model.
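In code, that approach is essentially a classifier trained on historical hires. Here is a minimal sketch, assuming invented features (GPA, internship count, a target-school flag) rather than anything the study or any bank actually uses:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Past applicants described by hypothetical features:
# (GPA, number of internships, attended a "target school").
X_past = np.array([[3.9, 2, 1],
                   [3.2, 0, 0],
                   [3.7, 1, 1],
                   [2.9, 1, 0]])
y_past = np.array([1, 0, 1, 0])  # 1 = judged a high-quality hire

model = LogisticRegression().fit(X_past, y_past)

# New applicants are ranked purely by resemblance to past successes,
# which is why this approach favors traditional profiles.
X_new = np.array([[3.8, 1, 1],
                  [3.5, 2, 0]])
print(model.predict_proba(X_new)[:, 1].round(2))
```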

This sounds a lot like how the HireVue algorithm works. Speaking to us last year, Nathan Mondragon, HireVue’s chief industrial and organizational psychologist, said HireVue “discovers the competencies, attributes and behaviors of the firm’s best employees who were hired for a similar role and builds the ideal profile for a given job description.” Each job description, and therefore each ideal candidate, is different: HireVue looks at over 15,000 traits expressed in digital interviews (e.g. choice of language, breadth of vocabulary, eye movements, speed of delivery, level of stress in the voice, ability to retain information) and matches existing top performers with potential new hires.

MIT’s paper doesn’t cite HireVue directly, but the academics suggest that recruiting with this kind of algorithm makes firms risk-averse: “Firms that rely on supervised learning will tend to select from groups with proven track records rather than taking risks on non-traditional applicants.”

As most banks seek to increase the proportion of minority candidates making it onto their training programs, this is problematic. 

Instead of relying on supervised learning, the academics suggest that hiring algorithms should be built to value exploration. They propose an alternative algorithm that selects candidates based on estimates of their potential, and that includes an “exploration bonus” based on the upper bound of the confidence interval around a candidate’s potential in the job.
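The exploration bonus is similar in spirit to the upper-confidence-bound (UCB) rules used in the bandit literature. As a minimal sketch, and not necessarily the paper’s exact formula, a candidate profile’s score could be its estimated success rate plus a bonus that shrinks as the firm gathers data on similar profiles:

```python
import math

def ucb_score(successes, trials, total_trials, c=1.0):
    """Estimated potential plus an exploration bonus that is large
    when a profile has been tried only a few times."""
    mean = successes / trials
    bonus = c * math.sqrt(math.log(total_trials) / trials)
    return mean + bonus

# A familiar profile (many past hires) vs. a rarely seen one with the
# same observed success rate: the rare profile scores higher.
print(round(ucb_score(60, 100, total_trials=120), 2))  # 0.82
print(round(ucb_score(12, 20, total_trials=120), 2))   # 1.09
```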

This new machine-learning algorithm (known as a “contextual bandit”) also allocates higher exploration bonuses to candidates designated as “rare,” and the rarity designations are made by the algorithm itself. “The algorithm can choose to assign higher exploration bonuses on the basis of race or gender, but it is not required to and could choose, instead, to focus on other variables such as education or work history,” say the academics.
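The paper leaves it to the algorithm to decide which attributes earn a rarity bonus. As a loose illustration only, and not the study’s actual specification, one could scale the exploration term by how infrequently a candidate’s profile appears in the applicant pool:

```python
from collections import Counter

# Hypothetical (education, major) profiles in one applicant pool.
pool = [("state-school", "cs"), ("ivy", "finance"), ("ivy", "finance"),
        ("ivy", "finance"), ("community-college", "math")]
counts = Counter(pool)

def rarity_bonus(profile, base=0.2):
    # Rarer profiles in the pool earn a larger exploration bonus.
    return base * (1 - counts[profile] / len(pool))

for profile in counts:
    print(profile, round(rarity_bonus(profile), 2))
```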

Once these rarer candidates have been selected, the academics incorporate the “realized hiring outcomes” into the training data, and the algorithm is updated for the next round of recruitment.
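A runnable toy of that feedback loop, with invented group labels and outcomes, might look like this: after each round, realized outcomes are appended to the history and per-group estimates are refit before the next round.

```python
from collections import defaultdict

history = [("traditional", 1), ("traditional", 0), ("traditional", 1)]

def refit(data):
    # Recompute each group's estimated hire-success rate from all history.
    totals = defaultdict(lambda: [0, 0])
    for group, outcome in data:
        totals[group][0] += outcome
        totals[group][1] += 1
    return {g: round(s / n, 2) for g, (s, n) in totals.items()}

rounds = [[("non-traditional", 1)],
          [("non-traditional", 1), ("traditional", 0)]]
for outcomes in rounds:
    history.extend(outcomes)   # fold realized outcomes back into training data
    print(refit(history))      # updated estimates drive the next round's picks
```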

In this way, the study found it was possible to dramatically improve outcomes for some minority groups. For example, the share of selected applicants who were Black or Hispanic rose from 10% under the supervised learning algorithm to 23% under the contextual bandit. However, the proportion of women selected fell from 50% to 39% because “men tend to be more heterogeneous on other dimensions—geography, education, and race, for instance—leading them to receive higher exploration bonuses on average.”

Ultimately, the study found that even when algorithms are blinded to demographic inputs, the contextual bandit approach allows companies to identify stronger candidates from traditionally under-represented groups than either human recruiters or traditional supervised learning algorithms do.

By comparison, they suggest that human recruiters can be inefficient and tend to select weak minority candidates in place of stronger ones, while traditional supervised learning algorithms typically select candidates for quality without taking the need for diversity into consideration.