In the AI Era, Where Is Human-ness Proven When AI Passes Interviews?

In January 2026, Tristan Hume, who leads the performance optimization team at Anthropic, published a somewhat bittersweet confession on the company's technical blog. The technical interview assignment he had spent years designing kept being defeated by Claude, the AI model his own company built.

The reason this confession lands particularly heavily is that the assignment was not a simple coding test.

It was a validated hiring tool that Anthropic had used over the past two years to select dozens of core performance engineers, and internally it had earned real trust: "if you do well on this assignment, you will definitely do well in actual work."

But...

That test could no longer tell applicants apart.

In the AI era, today's discriminating power becomes tomorrow's powerlessness.

In technical hiring, 'take-home test' assignments have long been considered the most rational method. Rather than asking candidates to improvise solutions at a whiteboard, a take-home lets the interviewer observe, in an environment close to reality, how a candidate thinks, where they get stuck, and what they choose to give up.

Anthropic introduced a unique assignment in 2024 for the same reason.

Applicants had to optimize given code in a virtual accelerator environment. Multi-core execution, SIMD, memory bottlenecks, instruction packing: these are the exact problems performance engineers wrestle with every day. More than 1,000 applicants have taken this test, and many of them still work at Anthropic as core staff.

However, as AI became more sophisticated, problems arose.

Claude 3.5 Sonnet: Began threatening the level of the average human applicant.

Claude Opus 4: Overwhelmed most human applicants within the time limit.

Claude Opus 4.5: Became indistinguishable from the top 5% of human applicants based on output alone.

Ultimately, a threshold was reached: within the two-hour time limit, it became impossible to tell whether an applicant's code reflected human reasoning or model computation.

At this point, you can picture the scene of an interviewer pausing while looking at the code.

 "This... did a human do this?" 

The moment this question arises, the evaluation tool has already outlived its purpose.

The lesson left by three failures is that now only 'unfamiliarity' can prove human-ness.

Tristan Hume did not discard the assignment.

Instead, over three comprehensive revisions, he tested what 'discriminating power' means in the AI era.
The conclusions he reached directly overturn conventional hiring wisdom.

① The more it resembles actual work, the more advantageous for AI.
The initial assignment was a 'good problem.' It was realistic, had depth, and resembled actual work. That was precisely the problem. Such problems already appear countless times in AI training data. Realism was no longer a standard for evaluating humans; it had become material for training AI.

② If you raise the difficulty, AI adapts even faster.
Adding bottlenecks, adding constraints, requiring deeper chains of reasoning: the result was the same. Claude grasped the problem structure much faster than humans and was first to reach optimization points that humans find difficult to reach. 'Harder problems' were not the solution.

③ Only 'intentional unfriendliness' reveals thinking ability.
The last choice looked almost like cheating.

  • Extremely limited instruction sets, no standard debugging tools, no friendly explanations.
  • An environment that doesn't resemble actual work, somewhat bizarre and uncomfortable. 

Here, something interesting happened. The AI faltered, and the human applicants each began forging their own path.

How they printed logs, how they tried to redefine the problem, how they judged when to build a small tool on the spot.

Only then did 'the human thought process' become visible.

Interviews must now look at 'adaptability to the unknown' rather than 'recreation of reality'

AI is now the strongest performer at 'everything related to knowledge the world already has.' Human strength, by contrast, lies in how one judges, gives up, and detours in situations never experienced before.

An excellent interview used to be judged by how precisely it recreated actual work. Now, problems already known from actual work are exactly the ones AI solves faster than any human.

This is precisely why Anthropic paradoxically chose "strange problems that don't resemble reality."

There is one more interesting point. When the time limit was removed entirely, the highest-level human engineers still marginally surpassed Claude.

Only when there is no time limit are humans still not completely defeated by AI. Short-term, output-centered evaluation methods had merely made humans look like inferior substitutes for AI.

Then, where does human-ness ultimately prove itself?

Not in speed or quantity of knowledge, and not in optimization within a given framework. It shows in the moment of confronting truly unknown territory: in the split second of choosing between the anxiety of not knowing what to do and the impulse to try anyway, human judgment is revealed. AI is fast and precise within a frame, but humans create the frames themselves. That is where the next question of evaluation lies: can you think outside the frame you have been given?