When I was at the Department of Justice working computer crime cases with the Federal Bureau of Investigation, we experimented—somewhat optimistically—with the idea that hackers could be “profiled” in the same way behavioral scientists profiled arsonists or serial offenders. The theory was appealing: Identify patterns of conduct, infer psychological traits, and narrow the suspect pool. The reality was far less satisfying. The resulting profile—overeducated, socially awkward, isolated—was less a forensic tool than a caricature. It was both wrong and operationally useless. You’ve seen this in movies and TV shows. Lone hacker in the basement eating Cheez-Its and wearing a dirty hoodie. I made up the Cheez-Its part, but if I were a hacker, that’s what I would be eating. Artistic license.
Later, in private practice, working with a former profiler from the Central Intelligence Agency, the approach became more granular and more productive. Instead of attempting to define “hackers” as a class, we focused on attribution in specific incidents. We asked narrower, evidence-based questions: What is the likely native language of the actor? What cultural idioms appear in the code comments or communications? What time zones are suggested by activity patterns? Does the cadence of attacks reflect a professionalized operation or an opportunistic individual? The goal was to learn about a specific threat actor—not so much WHO they are as WHAT they are.
The answers often emerged from subtle signals. In one reported case, a series of intrusions into The New York Times, tied to reporting on Chinese leadership (Wen Jiabao’s finances), appeared to track the rhythm of a government workday in East Asia—commencing in the morning, pausing for tea, breaking for lunch, and terminating promptly at the close of business. That pattern alone did not prove state sponsorship, but it contributed to a mosaic of attribution that was far more probative than any generalized “profile.” The data, and the data about the data, could be used as a fingerprint. Of sorts.
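To make the workday inference concrete, here is a minimal sketch in Python. The event hours are invented for illustration (not data from the reported case), and the "workday" window is an assumption; the idea is simply to score candidate UTC offsets by how many observed events fall inside an assumed nine-to-five local schedule.

```python
# Hypothetical UTC hours of observed intrusion activity (invented for
# illustration, not data from the reported case).
utc_hours = [1, 1, 2, 2, 3, 3, 4, 5, 6, 7, 8, 8, 9, 9, 9, 2, 3, 6, 7, 8]

WORKDAY = set(range(9, 18))  # assume a 09:00-17:59 local workday

def workday_fit(hours, offset):
    """Fraction of events whose local hour falls inside the assumed workday."""
    return sum(1 for h in hours if (h + offset) % 24 in WORKDAY) / len(hours)

# Score every whole-hour UTC offset and keep the best fit.
best = max(range(-12, 15), key=lambda off: workday_fit(utc_hours, off))
print(f"best-fit UTC offset: {best:+d}")
```

On this invented sample the best fit is UTC+8, consistent with a workday in East Asia. A real analysis would use weeks of timestamps, account for daylight saving and weekend gaps, and treat the result as one tile in the mosaic, not proof.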
Attribution has always been a central challenge of cybersecurity. Unlike traditional crimes, where physical presence, eyewitnesses, and jurisdiction provide anchors, cyber operations are designed to obfuscate origin. Attackers route through proxies, compromise intermediate systems, and deliberately plant false flags. Attribution is therefore not a single evidentiary event but a probabilistic exercise—an aggregation of technical indicators, behavioral signals, and contextual intelligence.
Artificial intelligence, however, is beginning to alter that equation.
A recent column by Megan McArdle in The Washington Post provides a striking demonstration of AI’s emerging capabilities in this domain. In controlled experiments, relatively accessible AI models were able to identify the author of anonymous text passages with remarkable accuracy—sometimes from as little as 124 words, provided a corpus of known writing samples exists.
This is not merely a parlor trick. It is stylometry at scale—what might be called “linguistic fingerprinting.” Every individual exhibits distinctive patterns in syntax, vocabulary, punctuation, and rhetorical structure. Historically, stylometric analysis required expert linguists and was limited in scope and difficult to adapt across languages. AI changes the calculus by enabling rapid, large-scale comparison across vast datasets.
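A crude version of that fingerprinting can be sketched in a few lines of Python, using character n-gram frequencies compared by cosine similarity. The writing samples below are invented, and production stylometric systems use far richer features, but the mechanic is the same: distinctive habits of spelling, punctuation, and phrasing leave measurable traces.

```python
import math
from collections import Counter

def char_ngrams(text, n=3):
    """Character n-gram frequency profile: a crude stylometric fingerprint."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def cosine(a, b):
    """Cosine similarity between two frequency profiles (0.0 to 1.0)."""
    dot = sum(a[k] * b[k] for k in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Invented samples: known text from two candidate authors, one unknown note.
known_a = "I reckon the server's down again -- same old story, innit."
known_b = "Per our telemetry, the endpoint exhibited anomalous outbound traffic."
unknown = "Server's down again, I reckon -- same old story."

sim_a = cosine(char_ngrams(unknown), char_ngrams(known_a))
sim_b = cosine(char_ngrams(unknown), char_ngrams(known_b))
print(f"similarity to A: {sim_a:.2f}, to B: {sim_b:.2f}")
```

The unknown note scores far closer to author A, whose idioms it shares. What AI adds is not the comparison itself but the ability to run it across millions of candidate texts at once.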
Applied to cybersecurity, the implications are substantial.
First, AI can assist in correlating disparate threat actor identities. If multiple online personas exhibit consistent linguistic signatures, an AI system can infer—probabilistically—that they are controlled by the same individual or group. This has immediate utility in fraud investigations, disinformation campaigns, and coordinated intrusion activity. Is it grass roots or astroturf? AI might be able to tell.
Second, AI can contribute to partial deanonymization. Even where a threat actor’s infrastructure is obfuscated, their communications—phishing emails, ransom notes, forum postings—may betray consistent stylistic markers. These markers can be mapped against known samples to narrow the field of suspects or at least to classify the actor within a defined cohort. At scale, the comparison set may extend to social media activity, forum postings, and other public writings.
Third, AI can enhance behavioral profiling. Beyond authorship, machine learning models can infer attributes such as education level, technical sophistication, and even cultural background from textual and coding artifacts. These inferences are inherently probabilistic and must be treated with caution, but they provide an additional layer of analytic context.
Fourth, AI can help distinguish between individual and organizational actors. The consistency—or inconsistency—of style across communications may indicate whether activity is centralized or distributed, scripted or improvisational, amateur or professional.
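That fourth point, that a single writer tends to be stylistically consistent while a committee is not, can be illustrated with a toy consistency score. The features (mean word length, vocabulary richness) and the message sets below are invented for illustration; real systems would use many more signals.

```python
import statistics

def style_features(msg):
    """Tiny, illustrative style vector: mean word length and type/token ratio."""
    words = msg.lower().split()
    mean_len = sum(len(w) for w in words) / len(words)
    ttr = len(set(words)) / len(words)  # vocabulary richness
    return (mean_len, ttr)

def consistency(messages):
    """Per-feature spread across messages; lower spread suggests one writer."""
    feats = [style_features(m) for m in messages]
    return [statistics.pstdev(dim) for dim in zip(*feats)]

# Invented ransom-note sets: one consistent voice versus several voices.
one_voice = [
    "pay up or the files stay locked, you have 48 hours",
    "pay up now, the clock is ticking, 24 hours left",
    "last warning, pay up or the files are gone",
]
many_voices = [
    "Kindly remit payment at your earliest convenience.",
    "yo send the coins lol",
    "Failure to comply will result in permanent data loss.",
]
print(consistency(one_voice), consistency(many_voices))
```

The single-voice set shows much less spread in word length than the mixed set, hinting at one writer rather than a distributed operation. As with every signal here, it is suggestive, not dispositive.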
Yet this is not a panacea.
The introduction of AI into attribution creates a recursive problem: Adversaries can and will use AI to evade attribution. If a threat actor generates communications through a language model, the resulting text may lack the consistent stylistic markers that enable identification. More sophisticated actors may deliberately vary prompts or use multiple models to introduce noise into the attribution process. We are, in effect, entering an era of AI analyzing AI-generated artifacts—a classic adversarial dynamic.
Moreover, attribution remains constrained by data availability. AI systems are only as effective as the corpus on which they are trained. Without sufficient known samples, even the most advanced models cannot reliably identify an author. This limitation is particularly acute for novel actors or highly compartmentalized operations.
A related problem: for attribution at scale to work, there must be a sufficiently large database of known writing samples to compare against—which in turn means someone must collect, retain, and analyze that information in the first place.
There are also profound legal and policy implications. If AI can “unmask” anonymous speakers, the consequences extend far beyond cybercrime. As McArdle notes, anonymity underpins not only malicious conduct but also journalism, whistleblowing, and political dissent. The same technology that identifies a hacker could expose a confidential source or a dissident under an authoritarian regime. The dual-use nature of attribution technology is unavoidable.
The trajectory is nonetheless clear. AI will not “solve” the hacker attribution problem in the sense of providing definitive, courtroom-ready identification in every case. But it will materially improve the fidelity, speed, and scale of attribution analysis. It transforms attribution from an artisanal exercise into a data-driven discipline.
In that respect, AI represents an evolution rather than a revolution. The fundamental principle remains unchanged: attribution is about assembling a mosaic of evidence. AI simply adds more tiles—and assembles them faster.
Promising, certainly. But not a substitute for judgment, corroboration, and skepticism.