Anthropic Project Deal Experiment Shows AI Agents Closing Real Marketplace Transactions
Anthropic's Project Deal pilot used AI agents to represent buyers and sellers in a controlled marketplace, completing 186 deals worth over $4,000 and revealing fairness concerns.
Anthropic said its internal pilot, dubbed Project Deal, placed AI agents on both sides of a classifieds-style internal marketplace where employees could buy and sell real items. The experiment, conducted with 69 volunteer staff each given a $100 gift-card budget, produced 186 completed transactions and tested whether autonomous models could negotiate and honor real-world exchanges. Company researchers described multiple marketplace designs and flagged disparities in outcomes tied to the sophistication of the AI agents representing participants.
Pilot design and participant pool
Anthropic ran Project Deal as a limited internal experiment with a self-selected group of 69 employees who received $100 in gift cards to spend in the marketplace. Participants listed goods and interacted through AI agents that acted as their proxies rather than negotiating directly. The company framed the exercise as exploratory, emphasizing that the pilot tested both technical feasibility and behavioral dynamics rather than aiming for a production system.
The volunteer pool and gift-card funding constrained the size and stakes of the exercise, but researchers treated completed transactions as real: when deals were struck they were honored and participants exchanged actual goods and money. That real-world follow-through distinguished the pilot from purely simulated negotiations.
Marketplace models and experimental variants
Anthropic reported operating four distinct marketplace setups to compare agent behaviors and market outcomes. One environment was designated “real,” where every participant was represented by the company’s most advanced model and deals were actually carried out. The other three marketplaces ran different models or rulesets for comparative study, allowing researchers to isolate how agent capability and instruction affected results.
The firm tested variables including agent decision rules, negotiation strategies, and whether human oversight was exercised during exchanges. Running multiple marketplaces in parallel gave Anthropic a comparative dataset to evaluate which model configurations produced the most favorable economic outcomes for represented users.
Transaction results and economic scale
Across the pilot, agents negotiated 186 deals that Anthropic said amounted to more than $4,000 in exchanged value. With each of the 69 participants given $100 to spend, the experiment produced a substantial number of transactions relative to the small, internally funded pool. Researchers said they were "struck by how well Project Deal worked," noting both the volume and the completion rate of negotiated agreements.
Beyond gross totals, researchers examined metrics such as sale likelihood, final negotiated prices, and time-to-agreement to gauge market efficiency. Those operational performance indicators informed conclusions about how agent sophistication influenced individual participant outcomes.
Agent capability and differential outcomes
Anthropic’s analysis found that users represented by more advanced models tended to receive objectively better outcomes in negotiations, a gap the company termed an “agent quality” issue. More capable agents secured more favorable prices or higher success rates, suggesting that model proficiency directly affected participants’ economic positions within the market. Crucially, the company noted that less-advantaged participants often did not perceive they had been disadvantaged.
The possibility that people on the losing end might not realize they were worse off raised ethical and fairness questions for automated representation. If AI intermediaries routinely produce unequal results, that asymmetry could amplify existing economic disparities when deployed at scale.
Instruction effects and behavioral surprises
Contrary to expectations, the initial instructions given to agents had little measurable effect in the pilot: varying agent directives did not significantly shift sale likelihood or negotiated prices, implying that emergent agent behavior and model capability may dominate scripted guidance. The finding suggests that tweaking prompts or initial instructions alone may not be sufficient to control market outcomes.
Researchers stressed that these behavioral results merit further study in larger, more diverse populations and with higher-stakes transactions. The limited scope of the pilot means the durability of these patterns under different economic conditions remains an open question.
Policy and industry implications
Project Deal’s results spotlight practical and regulatory issues companies will face as AI intermediaries are integrated into commercial platforms. The experiment demonstrates that autonomous agents can execute real trades, which raises consumer-protection, transparency, and competition concerns. Regulators and industry groups may need to consider standards for disclosure of agent representation, equitable agent access, and mechanisms to detect and mitigate agent-quality disparities.
Anthropic positioned the pilot as a learning exercise, but the findings underscore broader questions about automation in marketplaces. How platforms certify agent competence, how they notify counterparties, and how they remediate harms from unequal agent performance will be central as such systems move from research labs to public markets.
The pilot confirmed that AI agents are capable of completing real transactions under controlled conditions, but it also highlighted fairness and governance challenges that will require further research and policy attention before agent-based marketplaces scale.