In the Weights Measures Which Names Large Language Models Remember, Sparks Debate
A new site called In the Weights ranks how well major AI models recall people from their internal parameters, fueling discussion about memory, bias and digital legacy.
The launch and its purpose
In the Weights, a web tool created by Thomas Dimson and Joey Flynn, systematically tests how well contemporary large language models can identify individual names without using web search.
The site assigns each name a numeric “strength” score based on aggregated model outputs, and it has quickly drawn attention for turning model memory into a comparable metric.
How the site measures model recall
The service queries a set of models, including multiple GPT versions and members of the Claude, Gemini, Grok and Llama families, with prompts that ask each model to describe a named person and return a short list of candidate identifications.
Responses are clustered by similarity, then combined into a strength score intended to reflect how strongly a person is encoded in a model’s weights rather than surfaced via internet lookup.
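The site has not published its exact formula, so the following is only a minimal sketch of the cluster-then-score idea described above. It assumes plain string similarity from Python's standard library, a greedy grouping rule, and a score equal to the share of responses that land in the largest cluster. The function names, the 0.6 threshold and the scoring rule are all hypothetical choices made for illustration, not the site's method.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Rough textual similarity in [0, 1]; a stand-in for whatever the site uses."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def cluster_responses(responses: list[str], threshold: float = 0.6) -> list[list[str]]:
    """Greedy clustering: a response joins the first cluster whose seed it resembles."""
    clusters: list[list[str]] = []
    for text in responses:
        for cluster in clusters:
            # Compare against the cluster's first member (its seed).
            if similarity(text, cluster[0]) >= threshold:
                cluster.append(text)
                break
        else:
            # No existing cluster was close enough, so start a new one.
            clusters.append([text])
    return clusters


def strength_score(responses: list[str], threshold: float = 0.6) -> float:
    """Hypothetical score: fraction of responses in the largest cluster."""
    if not responses:
        return 0.0
    clusters = cluster_responses(responses, threshold)
    return max(len(c) for c in clusters) / len(responses)
```

Under these assumptions, a name that most models describe in the same way scores close to 1, while a name that draws scattered or contradictory descriptions scores lower. A production system would more plausibly compare meaning rather than raw characters, for example with text embeddings.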
Leaderboard results and notable placements
In the site’s rolling leaderboard, public figures and cultural icons sit near the top, with some surprising pairings and ties between entertainers and historical performers.
For example, an actor and a famed opera singer were shown in close competition for the highest strength values, demonstrating how different models prioritize different cultural touchstones.
Examples and surprises from early tests
A tech writer who checked his own listing found a comparatively high score that placed him well above the median of names tested.
At the same time, some journalists and well-known industry figures received even higher rankings, illustrating how unevenly memory is distributed across models.
Transparency on which model returned which answers
In the Weights displays the outputs for each queried model alongside the aggregated score, making it possible to see which model produced which description and which entries appear inconsistent.
The site also flags likely hallucinations — responses where a model invents unsupported facts or misattributes identifiers — to highlight variance in reliability across systems.
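How the site decides what counts as a likely hallucination is not spelled out. One simple approach, sketched here purely as an assumption, is to mark any model whose description disagrees with most of the others. The model names, the `flag_outliers` helper and the 0.6 threshold are hypothetical, and disagreement alone does not prove that an answer was invented.

```python
from difflib import SequenceMatcher


def flag_outliers(outputs: dict[str, str], threshold: float = 0.6) -> list[str]:
    """Return model names whose description disagrees with most other models."""
    flagged = []
    for name, text in outputs.items():
        others = [t for n, t in outputs.items() if n != name]
        if not others:
            continue
        # Count how many other models gave a textually similar description.
        agree = sum(
            SequenceMatcher(None, text.lower(), t.lower()).ratio() >= threshold
            for t in others
        )
        # Flag when fewer than half of the other models give a similar answer.
        if agree < len(others) / 2:
            flagged.append(name)
    return flagged
```

A check like this can only surface candidates for review. A lone model may be the only one that is right, so a flagged answer still needs to be verified against an outside source.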
Founders’ background and motivations
Dimson and Flynn built the project after leaving a major AI employer, drawing on experience designing model interfaces and a desire to explore how identity is represented in machine-learned systems.
They described the effort as partly playful and partly investigative: an experiment to see whose existence lives on in the neural parameters of powerful models as reliance on search declines.
Community reaction and critical voices
Reception has been intense on social platforms and in AI circles, with many users intrigued by the idea of quantifying “remembered” status inside models.
Some critics have dismissed the exercise as essentially asking multiple chatbots the same question, warning that scores may reflect exposure and dataset quirks more than any intrinsic cultural permanence.
Design choices and user experience
Beyond the scoring mechanics, In the Weights adopts a retro visual style and a leaderboard format that invites comparison and sharing.
That design choice has helped the site spread quickly, turning an experimental diagnostic into a public-facing game of sorts that people use to check their own digital footprint within model parameters.
Planned deeper analysis and limitations acknowledged
The founders say they will probe why different versions within the same model family produce divergent outputs and which models show biases toward particular demographics.
They also acknowledge limitations: a strength score captures only a model’s internal associations and not the fullness of a person’s public presence, and it can be shaped by training data composition and prompt phrasing.
Implications for reputation, bias and dataset transparency
By making model memory measurable, the site raises practical questions about whose names are preserved in AI systems and why, and how those preserved identities influence downstream applications.
Experts note that such tools can surface dataset blind spots and prompt discussions about representation, but they caution against reading scores as definitive proof of impact or value.
In the Weights has converted a technical property of machine learning into a public metric, prompting curiosity and skepticism in equal measure.
As the creators expand the project and researchers examine model differences more closely, the site may become a reference point for debates about AI memory, dataset bias and how digital legacies are encoded in the models shaping public knowledge.