The AI Ethics Brief #166: AI Systems in Human Spaces - Accountability and Unintended Consequences
From MCP vulnerabilities and Reddit impersonation to hiring audits, university deployments, and collapsing AI-first strategies, we track how automation interacts with environments built for people.
Welcome to The AI Ethics Brief, a bi-weekly publication by the Montreal AI Ethics Institute. We publish every other Tuesday at 10 AM ET. Follow MAIEI on Bluesky and LinkedIn.
📌 Editor’s Note
In this Edition (TL;DR)
We examine the emerging risks of AI agents, spotlighting Anthropic’s Model Context Protocol (MCP) and a GitHub exploit that revealed how agents can be manipulated to leak private data.
Builder.ai, once valued at over $1 billion, has collapsed. Alongside quieter updates from DeepSeek and strategy pivots at Klarna and Duolingo, we consider what happens when AI hype, labour misrepresentation, and automation-first models meet operational reality.
In education, OpenAI’s rollout across California State University places ChatGPT into the hands of 460,000 students. We weigh this move against new Apple research, which suggests large reasoning models break down under complexity.
In our AI Policy Corner series with the Governance and Responsible AI Lab (GRAIL) at Purdue University, we break down NYC’s Local Law 144, an early attempt to regulate AI in hiring. While well-intentioned, its narrow scope and limited enforceability have drawn criticism from auditors and researchers.
A University of Zurich study deployed AI to impersonate marginalized users on Reddit without consent, violating community guidelines and raising serious concerns about research ethics in public spaces.
Finally, we share a visual tribute to MAIEI’s late founder, Abhishek Gupta, highlighting how his legacy continues to shape conversations on AI ethics around the world.
🔎 One Question We’re Pondering:
What Risks Emerge as MCP Becomes the Default Interface for AI Agents?
As AI continues to evolve, the integration of AI agents—software entities that act autonomously on behalf of users—presents emerging risks that differ from those associated with more generalized forms of Agentic AI, which broadly refers to AI systems demonstrating goal-directed behaviours.
In Brief #161, we explored Anthropic's Model Context Protocol (MCP), which has swiftly become the dominant standard for AI agent interactions.
A recent blog post by Invariant Labs revealed a serious vulnerability with the official GitHub Model Context Protocol (MCP) server. The issue allowed an attacker to hijack an AI agent by posting a malicious GitHub Issue and extract data from private repositories. This type of exploit is part of what Invariant calls a Toxic Agent Flow, where an agent is manipulated into performing actions not intended by the user.
At the Montreal AI Ethics Institute, we define this emerging risk in straightforward terms. Unlike human operators, AI agents run continuously as autonomous software driven by code. Their uninterrupted operation increases the surface area for risk.
This comes at a critical time as more developers adopt MCP to power AI agents across tools and platforms. First introduced by Anthropic in late 2024 and gaining momentum after a 2025 workshop, MCP has become the dominant interface protocol for agent-driven systems. It enables software agents to communicate with integrated development environments (IDEs) such as Visual Studio Code, Replit, and JetBrains, where they can write code, run scripts, and interact with APIs on behalf of the user. However, the extent of these capabilities depends on the specific agent and IDE integration. Not every agent is available in every IDE, and the depth of automation varies.
As the agentic economy develops, emerging risk vectors are becoming evident. Attackers can poison prompts or tool descriptions to cause agents to leak credentials or execute unauthorized actions. Shared memory systems introduce the risk of context bleeding between agents. Malicious actors can also deploy spoofed MCP servers to intercept requests or route sensitive data to untrusted destinations. In the absence of standard auditing mechanisms, such actions may go undetected, leaving users and organizations with limited avenues for recourse.
A broader concern is how trust and responsibility are defined in a world increasingly mediated by AI agents. There are now hundreds of active MCP servers, but no formal mechanism exists for verifying their security or reliability. If an enterprise deploys an agent using MCP, who certifies the endpoint? When will the first sanctioned MCP server emerge, and who will define the criteria for compliance?
These questions have implications for how users interact with AI systems. In decentralized finance (DeFi), there is growing discussion around agents that hold blockchain wallets and execute transactions autonomously. These agents can rebalance portfolios, respond to market conditions, or interact with smart contracts without direct human involvement. This shift requires new design patterns for interfaces, permissioning, and contingency planning.
The impact on traditional finance may be more complex. As banks and financial institutions explore how agents might support internal operations, such as real-time risk assessments, automated reconciliation, or transaction processing, they will face new challenges related to auditability, liability, and regulatory compliance. What does it mean for an agent to initiate or authorize a financial transaction? What safeguards are in place if something goes wrong?
Understanding the difference between AI agents and Agentic AI is essential.
AI agents are narrowly scoped, modular systems built on top of large language models (LLMs) or large image models (LIMs) that automate specific tasks through external tool integration, structured prompts, and reasoning enhancements. These agents evolve from foundational generative models into more capable systems through extension layers such as API access and function calling.
Agentic AI, by contrast, constitutes a higher level of autonomy and complexity. These systems:
Orchestrate multiple specialized agents to divide and coordinate tasks
Maintain persistent memory for context retention across sessions
Decompose complex objectives into actionable subtasks
Operate in a goal-directed, self-coordinating manner
MCP acts as the infrastructure layer connecting these systems to real-world tools and workflows. It enables advanced functionality while also making it more difficult to monitor and regulate the line between human and machine agency.
As always-on agents take on greater responsibilities in software development, including critical systems such as finance, healthcare, and energy infrastructure, as well as broader digital infrastructure, the stakes are rising. New standards, governance frameworks, and institutional safeguards will be required to ensure that autonomy does not come at the cost of accountability.
Please share your thoughts with the MAIEI community:
❤️ From Canada to Italy: MAIEI’s Founder Remembered Across Communities
La diversità regionale e culturale è fondamentale per ogni discussione sull'etica dell'Al
Luca Baraldi and Laura Zambarda, via the symboolic.ai Instagram page, shared a thoughtful tribute to the late Abhishek Gupta, founder of the Montreal AI Ethics Institute. Originally published in Italian, the tribute is shared below in English to make it accessible to the wider community.
🚨 Here’s Our Take on What Happened Recently
Why Did DeepSeek’s Latest Update Go Largely Unnoticed?
What happened: At the end of May, Deepseek’s R1 model received an update to version DeepSeek-R1-0528, which included improved reasoning capabilities by using more compute resources to “think” for longer. This upgrade led to measurable performance improvements across various benchmarks. However, it did not generate the same level of attention as the original R1 release, which, as we covered in Brief #157, was significant enough to trigger a temporary market reset. By contrast, this May release barely made a ripple.
📌 MAIEI’s Take and Why It Matters:
The muted reception to DeepSeek’s latest update reflects how quickly the AI space moves on. Connor Wright, our Director of Partnerships, has written previously on the mechanics of AI hype, with DeepSeek’s earlier reception serving as a textbook case of FOMO-driven enthusiasm (fear of missing out).
As highlighted in Brief #157, early reporting around DeepSeek’s cost to train (~$5.6 million) was often misinterpreted. This figure referred to the final training run, not the total cost. Despite that, the model’s efficiency under constrained resources prompted a competitive response. OpenAI, Google, and Meta accelerated efforts to improve model efficiency, while the U.S. government responded by tightening export controls on advanced chips to China, leading to a reported $5.5 billion in additional costs for Nvidia.
The lack of reaction to the May update may be attributed to the fact that it was not a release of a new “R2” model. This follows a broader trend in the field, where updates to existing models (e.g., GPT-4.1-mini instead of GPT-5, Gemini 2.5 instead of Gemini 3) fail to drive the same level of public or market interest. It may also signal early symptoms of model collapse, where further model improvements stall due to the increased use of synthetic data and a shortage of high-quality, human-generated data.
This links to earlier accusations that DeepSeek may have used distillation, an AI technique where a smaller model is trained on the outputs of a larger model, such as Google’s Gemini. If true, this would raise further questions about the long-term viability of model performance gains that rely solely on efficiency rather than original, high-quality data. As the field continues to evolve, it becomes increasingly clear that while efficiency is important, robust model performance still depends heavily on the strength and integrity of the underlying data.
When AI-First Backfires: Klarna, Duolingo, and Builder.ai Relearn the Value of People
The AI-first workplace strategy, once hailed as the future, is beginning to show its limitations. Klarna and Duolingo, two high-profile companies that leaned heavily into automation, are now facing the operational and social consequences of prioritizing AI over people.
Two years ago, Klarna CEO Sebastian Siemiatkowski positioned the company as a “guinea pig” for OpenAI, freezing hiring while AI systems replaced hundreds of customer service agents. Today, Klarna is reversing course, announcing plans to reintroduce human staff to improve customer service quality.
This move comes alongside findings from a recent IBM survey, which show that only one in four AI projects delivers a return on investment. Even fewer, just 16 percent, are successfully scaled across organizations. Despite these figures, 64 percent of CEOs say they are continuing to invest in AI primarily to avoid being left behind.
Duolingo, by contrast, is continuing its shift to an AI-first approach, but the company is encountering public backlash. Viral posts on TikTok and other platforms have criticized the change as alienating and inauthentic. Duolingo has responded by clarifying that AI will enhance, not replace, its language experts. However, this message does not seem to be landing with many of its users.
📌 MAIEI’s Take and Why It Matters:
The AI-first approach is beginning to look less like a strategic leap forward and more like an overcorrection. Klarna and Duolingo offer two examples of what happens when companies treat automation as a replacement for human engagement, rather than a tool to support it.
This echoes earlier tensions highlighted in Shopify’s internal hiring memo, which we discussed in Brief #162. When employees are required to “prove a human is necessary,” the cost is not only operational but cultural.
Duolingo’s experience is particularly revealing. For a brand built on playfulness and social learning, fully automating the experience diminishes what many users value most: connection. After all, language is inherently social. Users intuitively sense that fully automating its instruction diminishes the experience. This type of dissonance is often overlooked by AI-first strategies.
Klarna’s decision to rehire human agents is more pragmatic. It reflects not only a focus on service quality but also a broader ethical consideration: automation may reduce costs, but those savings are often externalized, borne by customers who receive subpar experiences and workers who lose employment stability.
The collapse of Builder.ai, a once billion-dollar AI startup, further highlights the need to reassess how AI is positioned within companies. As reported by the Financial Times, Builder.ai filed for insolvency in May after internal investigations revealed that projected revenues were significantly overstated. Revenue estimates for 2024 were revised from $220 million to $55 million, and for 2023 from $180 million to $45 million. Investigators also raised concerns about questionable sales practices involving resellers and long-standing unpaid bills.
Despite raising over $450 million from investors, including Microsoft, SoftBank, and Qatar’s sovereign wealth fund, giving it a valuation of more than $1 billion, the company was ultimately left with just $5 million in cash. Additional reporting suggested that Builder.ai employed more than 700 workers in India, many of whom were building what was publicly marketed as AI-generated software.
Much like Klarna and Duolingo, Builder.ai’s trajectory highlights the consequences of treating AI not as a tool to augment human capacity but as the product itself. When hype runs ahead of actual capability, and when labour is misrepresented as automation, the gap between public expectation and operational reality becomes increasingly difficult to maintain.
These examples signal the need for a more measured approach to AI adoption. Integration should not be driven solely by competitive pressure. Organizations must ask: Are we empowering people with new tools, or simply replacing them? Are we improving outcomes, or removing the very elements that users and workers find meaningful?
The University of Zurich’s Non-Consensual AI Reddit Experiment
Researchers at the University of Zurich conducted an anonymous, non-consensual experiment on Reddit users over a four-month period, which was only recently made public. r/changemyview is a subreddit dedicated to challenging users’ perspectives on contentious topics. Redditors in this forum award “deltas” to peers who successfully change their view on an issue.
Seeking to evaluate the persuasiveness of AI-generated content, researchers at the University of Zurich employed large language models to generate over 1,500 contributions across 34 Reddit accounts and analyzed the number of deltas they received. The study violated multiple ethical standards: r/changemyview strictly prohibits undisclosed AI-generated content, no consent was obtained from users, and the LLMs impersonated many marginalized identities. The researchers now face significant backlash and potential legal action from Reddit and have stated they will not be publish the findings.
📌 MAIEI’s Take and Why It Matters:
The research found that their LLM-generated responses were more persuasive than human-written ones. However, the findings, rendered unusable by poor experimental design, are less important than the broader concerns this case raises. Most notably, it highlights the growing difficulty of distinguishing between human and AI interactions in online spaces.
Following the revelation, users on r/changemyview expressed concerns that many of the accounts they engage with may not be human. If researchers were able to post over 1,500 AI-generated comments undetected within just four months, it raises valid questions about how much of Reddit’s content may already be influenced by similar bots. This case offers a clear example of AI encroaching on spaces intended for authentic human dialogue.
The uncertainty about whether content is human- or AI-generated further complicates online experimentation. Existing research suggests that LLMs tend to prefer content generated by other LLMs. In a digital environment increasingly populated by AI, researchers cannot reliably assess the impact of AI-generated content on humans, potentially inflating its perceived persuasiveness. This also risks corrupting future model performance, as Reddit and similar platforms are frequently scraped for large language model (LLM) training and retrieval-augmented generation (RAG) data. If AI-generated material is continuously fed back into model training pipelines, it may degrade model quality through self-reinforcing loops.
Lastly, the LLM-generated content appropriated marginalized identities, including impersonating a Black man opposing the Black Lives Matter movement, LGBTQ+ individuals, and a male rape victim. This not only misled users who engaged in vulnerable discussions under the assumption they were speaking to real people, but also trivialized the lived experiences of the groups being impersonated. Legal scholar Chaz Arnett describes this as a “digital blackface,” stating in Science: “The very act of presuming that you could pick up and put on a fundamental identity belittles the lived experiences of those groups.”
Undisclosed generative AI already poses a threat to the integrity of human spaces. If left unchecked, it risks undermining human relationships, trust, and identity online.
Did we miss anything? Let us know in the comments below.
💭 Insights & Perspectives:
AI Policy Corner: New York City Local Law 144
This article is part of our AI Policy Corner series, a collaboration between the Montreal AI Ethics Institute (MAIEI) and the Governance and Responsible AI Lab (GRAIL) at Purdue University.
In this edition, we spotlight New York City’s Local Law 144 (LL144), enacted in July 2023, which aims to regulate the use of Automated Employment Decision Tools (AEDTs) in hiring and promotion. While it mandates bias audits and public disclosures, its scope is limited. The definition of AEDTs, which are tools that substantially assist or replace human decision-making, is open to interpretation and left to the discretion of the employer. As a result, many human-in-the-loop systems fall outside its coverage. The law also focuses only on race/ethnicity and sex, omitting protections for age, disability, and other categories.
Moreover, LL144 applies only to employers using these tools, not the vendors or developers of AEDTs, making it difficult for those using the tools to correct for potential biases without access to the underlying systems. Despite these concerns, LL144 remains a significant step towards fair artificial intelligence systems. It showcases how governance strategies, such as public disclosures and audits, can mitigate the risks of bias, discrimination, and civil rights violations. It further demonstrates the merits of transparent AI, as it allows AEDTs to be held accountable to the same standards as humans when it comes to employment decisions.
Still, as with many early regulatory efforts, implementation gaps and definitional ambiguity risk limiting its actual impact in the real world. The combination of these factors, according to various auditors of these AEDTs, makes LL144 well-intentioned but ineffective.
To dive deeper, read the full article here.
What if today's AI can't actually think?
Apple’s recently released research paper, The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity, offers a critical lens on the current capabilities of Large Reasoning Models (LRMs). While the paper reflects Apple’s internal research perspective and has not undergone peer review, it highlights a broader pattern in the field: reasoning models are often marketed as systems that can "think" through and are capable of navigating complex tasks, yet the study finds that these models face significant limitations when presented with even moderately difficult problems. Apple researchers tested models on controlled reasoning benchmarks and found that performance collapses once complexity crosses a certain threshold. Models also showed a tendency to reduce their reasoning effort as problems became more difficult, suggesting structural or training-related limitations.
This is particularly relevant in light of the recent New York Times article on OpenAI’s rollout across the California State University system, which will make ChatGPT available to more than 460,000 students across its 23 campuses to help prepare them for “California’s future A.I.-driven economy.” OpenAI is positioning its tools not only as supplemental aids but as foundational infrastructure for education, integrated directly into learning management systems and course delivery.
If reasoning models cannot reliably solve complex problems, there are serious questions about the role they should play in education. Critics have noted that these deployments risk overstating AI’s current capabilities while diverting attention from issues such as accuracy, bias, and learning integrity. As Apple’s findings suggest, current systems may appear capable on the surface but fail to reason in a consistent or meaningful way.
In educational settings, that disconnect has consequences. Students may unknowingly rely on tools that cannot deliver valid answers. Instructors may struggle to assess whether work is student-generated or model-assisted. Most importantly, the assumption that AI can enhance learning may be undermined if the technology fails to meet the level of reasoning required by the curriculum.
The gap between perception and actual capability continues to grow. It is in high-stakes contexts, such as education, that this gap matters most.
To dive deeper, read the Apple research paper here and the New York Times article here.
❤️ Support Our Work
Help us keep The AI Ethics Brief free and accessible for everyone by becoming a paid subscriber on Substack or making a donation at montrealethics.ai/donate. Your support sustains our mission of democratizing AI ethics literacy and honours Abhishek Gupta’s legacy.
For corporate partnerships or larger contributions, please contact us at support@montrealethics.ai
✅ Take Action:
Have an article, research paper, or news item we should feature? Leave us a comment below — we’d love to hear from you!
https://substack.com/@uncertaineric/note/c-126878332?r=3zosze