Terence Tao Just Gave Us the Right Vocabulary for AI in Hiring

Terence Tao is one of the most decorated mathematicians alive. Fields Medal at 31. A career spent on the deepest problems in partial differential equations, combinatorics, harmonic analysis, and additive number theory. When he writes about a tool, the field pays attention.

Earlier this spring he co-authored a paper with longtime collaborator Tanya Klowden, the listed first author, titled Mathematical Methods and Human Thought in the Age of AI. Most of it is about how AI is changing mathematical proof. But buried in the middle is a framework borrowed from cybersecurity that finally gives us precise vocabulary for what has gone wrong with AI in hiring.

THE FRAMEWORK

Red Team, Blue Team

In security operations, the blue team builds and defends the system. The red team probes it: it looks for weaknesses, tests assumptions, and finds what breaks. Both are essential. Neither does the other's job.

Tao and Klowden apply the same lens to AI. The line worth printing out:

AI is relatively safe to utilize in a "red team" capacity of reviewing human-generated content for errors or suggested improvements; but with the stochastic unreliability and lack of groundedness of the current and near-term tools, it is unsafe to trust them in any "blue team" structural capacity that is beyond the ability of the "red team" [...] to verify.

Translated: AI is relatively safe when it checks human work. It is unsafe when it makes structural decisions that no one then verifies, or is able to verify.

That distinction does more work than two years of "human-in-the-loop" marketing copy. It tells you, mechanically, where AI belongs in a workflow and where it doesn't.
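To make "mechanically" concrete, here is a minimal sketch of how the distinction could be applied to individual workflow steps. Everything in it is an assumption made for illustration: the names `Actor`, `WorkflowStep`, and `placement`, the fields, and the label wording are hypothetical and do not come from the Klowden and Tao paper or from any product.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Actor(Enum):
    HUMAN = "human"
    AI = "ai"


@dataclass(frozen=True)
class WorkflowStep:
    name: str
    generated_by: Actor            # who produces the content or decision
    verified_by: Optional[Actor]   # who independently checks it, if anyone
    structural: bool               # does the output decide a candidate's outcome?


def placement(step: WorkflowStep) -> str:
    """Label a step using the red team / blue team reading (hypothetical labels)."""
    if step.generated_by is Actor.HUMAN and step.verified_by is Actor.AI:
        return "red team AI: reviewing human-generated content"
    if step.generated_by is Actor.AI and step.structural:
        if step.verified_by is Actor.HUMAN:
            return "blue team AI with a human check: only as safe as that check"
        return "blue team AI with no verification: unsafe"
    return "not covered by this sketch"


# Hypothetical steps, for illustration only.
steps = [
    WorkflowStep("auto-reject on resume score", Actor.AI, None, True),
    WorkflowStep("flag mismatches in a reference transcript", Actor.HUMAN, Actor.AI, False),
]
for step in steps:
    print(f"{step.name}: {placement(step)}")
```

The point of the sketch is that the question is about a step's position in the workflow (who generates, who verifies, and whether the output is structural), not about how capable the model is.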

THE INVERSION

The Hiring Market Got This Exactly Backwards

A 2024 ResumeBuilder survey of nearly 1,000 business leaders found that 71% of companies allow AI to reject candidates without full human oversight. That figure combines the 21% that let AI reject at every stage with the 50% that let it reject at initial screening. Only 29% maintain human oversight on every rejection decision.

That's a market full of blue team tools. The AI generates the structural decision. A human, if one is present at all, sees the output downstream and either ratifies it or doesn't even know it happened.

The pitch is always the same. Faster funnels. Higher throughput. Less recruiter time. On the surface it works, until you ask the question Tao would ask: who verified the rejection?

Nobody. The score is the output. There is no red team. The AI is making structural decisions about people's careers, and nothing is checking the work.

THE EVIDENCE

What Happens When the Red Team Is Missing

Two recent research findings, one from cognitive science and one from hiring specifically, show what failure looks like at the architecture level.

In January 2026, Wharton researchers Steven Shaw and Gideon Nave published research on cognitive surrender: the phenomenon where people stop evaluating AI outputs and simply accept them. Across three preregistered experiments with 1,372 participants and nearly 10,000 reasoning trials, participants who consulted AI showed dramatically degraded performance when the AI was wrong, and felt more confident in their wrong answers, not less. The human in the loop wasn't checking the AI. The human was deferring to it.

In November 2025, University of Washington researchers Kyra Wilson and Aylin Caliskan published a study on hiring decisions specifically. Five hundred and twenty-eight participants reviewed candidates for sixteen different jobs, with simulated LLM recommendations that varied in racial bias. In the severely biased condition, human reviewers followed the AI's biased picks approximately 90% of the time.

Tao's frame explains why both findings are unsurprising. When AI is the blue team and humans are downstream of the output, there is no red team. Cognitive surrender is the behavior. Ninety percent bias adoption is the outcome. The architecture invited both.

THE COURT RECORD

Architecture Determines Liability

The same architecture that produces those research findings is now producing settlements and class certifications.

In 2023, the EEOC settled its first AI hiring discrimination case, against iTutorGroup, for $365,000. The company's automated software had rejected female applicants aged 55 or older and male applicants aged 60 or older without human review. The structural decision was made by the AI. No red team checked it. The settlement followed.

In May 2025, a federal court in California granted preliminary certification of a nationwide ADEA collective action in Mobley v. Workday, allowing claims that Workday's AI screening tools systematically discriminated against applicants aged 40 or older. The collective covers individuals in that age group who applied for jobs through Workday's platform from September 24, 2020 onward. The architecture, again, was the issue: AI scoring outputs that drove rejections without independent verification.

These aren't isolated cases. They're predictable outcomes of blue-team-AI design.

Regulators are catching up to the same conclusion. Ontario's Working for Workers Four Act, whose AI disclosure provisions took effect January 1, 2026, requires employers with 25 or more employees to disclose AI use in job postings when AI is used to "screen, assess, or select" candidates. The province isn't banning AI. It's requiring transparency about where AI is making structural decisions, because those are the decisions most likely to fail in court.

THE INVERSION, APPLIED

What Red Team AI Looks Like in Screening

A red team approach inverts the design. The humans generate the content. The AI verifies it.

The candidate generates claims through their resume and pre-screen responses. The references generate accounts in their own words. The background check generates verified facts from official records. The job requirements anchor everything. All of that content is human-originated, or at least human-attestable.

The AI's job is to look across those sources and ask the questions a thorough reviewer would ask. Do the sources agree? Where do they diverge? Which claims are corroborated by three sources and which are supported by only one? Where are the gaps?

The output isn't a decision. It's evidence, attributed to its source, mapped to the requirement it relates to. A human reads the evidence and decides.
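A rough sketch of that cross-check, under stated assumptions, might look like the following. The `Evidence` record, the `cross_verify` function, the status labels, and the two-source threshold for "corroborated" are all hypothetical choices made for this illustration, not a description of any particular product. Note what the return value leaves out: there is no score, no ranking, and no recommendation.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Evidence:
    source: str        # e.g. "resume", "pre-screen", "reference", "background check"
    requirement: str   # the job requirement this evidence relates to
    supports: bool     # True if the source backs the claim, False if it contradicts it
    excerpt: str       # the source's own words, kept for the human reviewer


def cross_verify(requirements, evidence):
    """Group evidence by requirement and label how the sources agree."""
    by_requirement = defaultdict(list)
    for item in evidence:
        by_requirement[item.requirement].append(item)

    report = {}
    for requirement in requirements:
        items = by_requirement.get(requirement, [])
        supporting = {e.source for e in items if e.supports}
        contradicting = {e.source for e in items if not e.supports}
        if not items:
            status = "gap"                  # no source speaks to this requirement
        elif supporting and contradicting:
            status = "sources diverge"
        elif contradicting:
            status = "contradicted"
        elif len(supporting) >= 2:          # assumed threshold for corroboration
            status = "corroborated"
        else:
            status = "single source"
        # Every finding keeps its attributed evidence; nothing is collapsed to a number.
        report[requirement] = {"status": status, "evidence": items}
    return report


# Hypothetical data, for illustration only.
requirements = ["five years of Python", "team lead experience", "cloud certification"]
evidence = [
    Evidence("resume", "five years of Python", True, "Python developer since 2018"),
    Evidence("reference", "five years of Python", True, "She wrote most of our Python services"),
    Evidence("resume", "team lead experience", True, "Led a team of six"),
    Evidence("reference", "team lead experience", False, "She was a senior engineer, not the lead"),
]
for requirement, finding in cross_verify(requirements, evidence).items():
    print(requirement, "->", finding["status"])
```

The reviewer receives the status together with the attributed excerpts and makes the call. A divergence between a resume and a reference is surfaced as something to look into, not resolved by the tool.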

This is harder to pitch than "our AI finds the best candidate." It's also defensible in a procurement review, in an audit, in a wrongful-hiring claim, and in front of a regulator. Those are increasingly the rooms where hiring tools live or die.

THE SMELL TEST

What Tao Calls a "Bad Smell"

Tao makes a second point worth holding onto. Experienced mathematicians can read a proof and sense whether it smells right before they check it line by line. A bad proof has a "bad smell," even when the conclusion happens to be correct. The form of the argument can look correct while the substance is hollow.

Recruiters know this instinct. A skilled hiring manager can read a reference transcript and feel whether the conversation was real, whether the reference was hedging, whether the candidate's story holds together.

Survey-based reference tools strip that out. You get a five-point scale and a comment box. There is nothing to smell.

Voice conversations preserve it. Real conversations probe. They follow hesitation. They ask the question the candidate didn't quite answer. The transcript that comes out has texture, and a reviewer reading it can tell the difference between a real endorsement and a polite one. That is information no number can carry, and it's exactly the kind of human-generated content Tao's framework says AI should be verifying rather than replacing.

THE VIRVELL APPROACH

How We Think About This

We built Virvell as a red team tool, before we had the vocabulary for it.

Our AI conducts pre-screen interviews and reference check conversations, and coordinates background verification through Certn. It gathers what candidates say, what references report, and what records reveal. It then checks those sources against one another and against the job requirements, a five-source cross-verification that flags where claims are corroborated, where they're only partially supported, and where there are gaps.

Every finding is attributed to its source. Resume says one thing. Pre-screen says another. Reference confirms or contradicts. Background check verifies or doesn't. Each piece of evidence is tagged with where it came from and mapped to the requirement it relates to.

What the platform doesn't do is score candidates, rank applicants, or generate hiring recommendations. The reviewer sees evidence, not conclusions. The decision belongs to the human.

This isn't a limitation we accept. It's the architecture we chose. Tao's paper put a name to it.

THE BIGGER POINT

A Design Principle for AI in Consequential Decisions

What Klowden and Tao are really pointing at is bigger than hiring. They're describing a design principle for AI anywhere it touches consequential decisions.

Use AI to verify, not to decide. Use it to check work, not to do work. Use it where errors can be caught, not where they compound.

In hiring, that principle has a name. It's called keeping humans in the loop. The industry has said that for two years. Tao's frame tells us how to actually build it: put AI on the red team, keep humans on the blue team, and never let that get reversed.

The hiring tools that survive the next regulatory cycle, the next class action, the next bias audit, will be the ones that were already built that way.

Sources

Klowden, T. & Tao, T. (2026). Mathematical Methods and Human Thought in the Age of AI. arXiv:2603.26524
Shaw, S. & Nave, G. (2026). Thinking — Fast, Slow, and Artificial: How AI is Reshaping Human Reasoning and the Rise of Cognitive Surrender. SSRN 6097646
Wilson, K., Sim, M., Gueorguieva, R. & Caliskan, A. (2025). No Thoughts Just AI: Biased LLM Hiring Recommendations Alter Human Decision Making and Limit Human Autonomy. AIES 2025. DOI: 10.1609/aies.v8i3.36749
EEOC v. iTutorGroup, Inc., No. 1:22-cv-02565 (E.D.N.Y. 2023). EEOC press release
Mobley v. Workday, Inc., No. 3:23-cv-00770 (N.D. Cal., preliminary collective certification May 16, 2025)
ResumeBuilder.com (2024). 7 in 10 Companies Will Use AI in the Hiring Process in 2025.
Ontario Working for Workers Four Act, 2024 (Bill 149), AI disclosure provisions effective January 1, 2026.

About the Author

Julien Gagnier is the founder and CEO of Virvell, the conversational AI platform for talent teams. With 15+ years of HR leadership experience at Honda Canada, Later, Microart, and Unisys, Julien brings practitioner insight to the intersection of AI and hiring. He holds a CHRL designation and an MBA from Schulich School of Business.