AI Ethics Brief #140: Limitations of RLHF, data annotation aspirations, better rewards in LLM training, PII leaks in ChatGPT, and more.
With the advent of ever-more powerful Generative AI systems, how much more difficult have the jobs of HR professionals become and what can they do to fight against this unwanted tide?
Welcome to another edition of the Montreal AI Ethics Institute’s weekly AI Ethics Brief that will help you keep up with the fast-changing world of AI Ethics! Every week, we summarize the best of AI Ethics research and reporting, along with some commentary. More about us at montrealethics.ai/about.
Support our work through Substack
💖 To keep our content free for everyone, we ask those who can to support us: become a paying subscriber for the price of a couple of ☕.
If you’d prefer to make a one-time donation, visit our donation page. We use this Wikipedia-style tipping model to support our mission of Democratizing AI Ethics Literacy and to ensure we can continue to serve our community.
This week’s overview:
🙋 Ask an AI Ethicist:
Which industries are well-poised to harness Generative AI but will face significant ethical concerns this year?
✍️ What we’re thinking:
Hallucinating and Moving Fast
🤔 One question we’re pondering:
With the advent of ever-more powerful Generative AI systems, how much more difficult have the jobs of HR professionals become and what can they do to fight against this unwanted tide?
🪛 AI Ethics Praxis: From Rhetoric to Reality
To what extent will RLHF be able to address some of the most pressing challenges in LLMs?
🔬 Research summaries:
Connecting the dots in trustworthy Artificial Intelligence: From AI principles, ethics, and key requirements to responsible AI systems and regulation.
Whose AI Dream? In search of the aspiration in data annotation.
Fine-Grained Human Feedback Gives Better Rewards for Language Model Training
📰 Article summaries:
Personal Information Exploit With OpenAI’s ChatGPT Model Raises Privacy Concerns - The New York Times
These six questions will dictate the future of generative AI | MIT Technology Review
We’re Not Living a “Predicted” Life: Student Perspectives on Wisconsin’s Dropout Algorithm – The Markup
📖 Living Dictionary:
What are frontier AI models?
🌐 From elsewhere on the web:
Top 25 Most Read Pieces on Tech Policy Press in 2023
💡 ICYMI
Promoting Bright Patterns
🚨 2024 is off to a start(!) - here’s our quick take on what happened last week.
In the past week, there have been several significant developments in the field of responsible AI:
1. Microsoft Research's Advancements in Transparency: Microsoft Research has been working on a framework called AHA! (Anticipating Harms of AI), which is a human-AI collaboration for systematic impact assessment. This framework enables people to make judgments about the impact of potential AI deployment. It uses a language model to generate vignettes, or fictional scenarios, that account for an ethical matrix of problematic AI behaviors or harms. The goal of these efforts is to advance transparency and responsible AI.
2. ChatGPT's Second Year: As the integration of AI systems deepens into society, businesses, and education, an amplified focus on ethical considerations becomes imperative. This includes addressing algorithmic biases, safeguarding privacy, ensuring security and copyright protection, as well as promoting transparency, fairness, and interpretability. Deploying mechanisms for responsible AI will be central to these efforts.
3. DoD's International Cooperation on Responsible AI: The U.S. Department of Defense (DoD) is working towards building international cooperation on responsible AI and autonomy. The DoD has been on a path to getting the responsible and ethical use of AI and autonomy right through the department’s guidance on autonomous weapons and through strategies like the 2023 Data, Analytics, and AI Adoption Strategy.
4. Workday's AI Trust Gap Survey: Workday conducted a global survey revealing a trust gap in the workplace when it comes to the responsible development and deployment of AI. The survey found a lack of organization-wide visibility around AI regulation and guidelines, with three in four employees saying their organization is not collaborating on AI regulation.
5. Open-Source AI Regulations: David Evan Harris, a senior research fellow at the International Computer Science Institute, proposed regulations for open-source AI. These include pausing all new releases of unsecured AI systems until developers have met certain requirements, establishing registration and licensing of all AI systems above a certain threshold, and creating liability for “reasonably foreseeable misuse” and negligence.
Did we miss anything?
🙋 Ask an AI Ethicist:
Every week, we’ll feature a question from the MAIEI community and share our thinking here. We invite you to ask yours, and we’ll answer it in the upcoming editions.
Here are the results from the previous edition for this segment:
A somewhat even split across all the options here, with a very interesting discussion taking place in the comments section of last week’s edition for those who want to grab a few additional resources and peek into the responses behind “Other.” Hallucinations top the chart because they have a direct impact on the utility one can derive from GenAI systems: they’re not completely reliable just yet, and integrating them into mission-critical contexts is rife with pitfalls.
Moving along to a question we received from a reader, K.R., on which industries are well-poised to harness Generative AI but will face significant ethical concerns this year (unless we can operationalize some of the tooling and legal mechanisms - see last week’s comments section).
The industries facing the greatest ethical barriers in the adoption of generative AI include healthcare, banking, retail, manufacturing, and human resources.
Healthcare: The biggest ethical challenge in healthcare is data privacy and security. Patient data is highly sensitive, and the misuse of this data can lead to serious consequences. A potential solution is to implement robust data governance practices, including anonymization of patient data and strict access controls, to ensure that data is used responsibly and securely.
Banking: In the banking industry, the primary ethical concern is bias and discrimination. AI systems used for credit scoring or loan approvals can inadvertently perpetuate societal biases, leading to unfair outcomes. A potential solution is to use fairness metrics and bias mitigation techniques during the model development and validation process to ensure that the AI system does not discriminate against certain groups (a small illustrative audit sketch follows below).
Retail: The retail industry faces the challenge of transparency and accountability. AI systems used for personalized marketing or pricing can lead to concerns about consumer manipulation and privacy. A potential solution is to provide clear explanations to consumers about how their data is being used and to give them control over their data.
Manufacturing: Worker displacement is a major ethical concern in the manufacturing industry. The adoption of generative AI can potentially lead to job losses. A potential solution is to provide training and support to help workers acquire new skills and transition to new roles in the AI-driven economy.
Human Resources (HR): In HR, the main ethical challenge is bias and discrimination. AI systems used for resume screening or performance evaluation can inadvertently perpetuate societal biases, leading to unfair hiring or promotion decisions. A potential solution is to strengthen human-in-the-loop interventions so that discrimination and unintended biases are caught before they creep into decisions.
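To make the bias-audit suggestion for banking and HR a little more concrete, here is a minimal sketch of a selection-rate audit (demographic parity / disparate impact). The data, group labels, and scoring are invented purely for illustration; a real audit would rely on established fairness toolkits, domain expertise, and applicable legal guidance.

```python
# A minimal, illustrative bias-audit sketch; the data and group labels are invented.
from collections import defaultdict

def selection_rates(decisions, groups):
    """Return the positive-decision rate per group.

    decisions: 0/1 model outcomes (e.g., 1 = loan approved or candidate shortlisted).
    groups: protected-attribute values aligned with decisions.
    """
    totals, positives = defaultdict(int), defaultdict(int)
    for d, g in zip(decisions, groups):
        totals[g] += 1
        positives[g] += d
    return {g: positives[g] / totals[g] for g in totals}

def disparate_impact_ratio(rates):
    """Ratio of the lowest to the highest selection rate (1.0 means parity)."""
    return min(rates.values()) / max(rates.values())

if __name__ == "__main__":
    # Hypothetical screening outcomes for two groups, A and B.
    decisions = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
    groups = ["A", "A", "A", "A", "A", "B", "B", "B", "B", "B"]
    rates = selection_rates(decisions, groups)
    print("Selection rates:", rates)                                  # {'A': 0.6, 'B': 0.4}
    print("Disparate impact ratio:", disparate_impact_ratio(rates))   # ~0.67
```

A ratio well below 1.0 would flag a disparity worth investigating before deployment, ideally alongside a human review of the affected cases.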
If you know of good solutions that can help address some of the challenges outlined per sector above, please let us know! Share your thoughts with the MAIEI community:
✍️ What we’re thinking:
WHAT’S HAPPENING: "Move fast and break things" is broken. But we've all said that many times before. Instead, I believe we need to adopt a "Move fast and fix things" approach. Given the rapid pace of innovation and its distribution across many diverse actors building new capabilities in the ecosystem, it is realistically infeasible to course-correct at the same pace. Because course correction is a much harder and slower-yielding activity, the magnitude of negative consequences ends up amplified.
FOG OF WAR: What we need to do instead is think ahead about how the landscape of problems and solutions will evolve. For example, when thinking about the problem of hallucinations in GenAI systems, it is unclear at the moment where and how they will manifest. This hinders the adoption of GenAI-powered systems by companies that seek to offer safe and reliable outcomes to their customers, e.g., in customer-service chatbots in financial services or other high-stakes scenarios.
To delve deeper, read the full article here.
🤔 One question we’re pondering:
When it comes to human resources, recruiters and hiring staff (including managers) are usually overwhelmed with the number of applications they receive, often resorting to techniques like keyword matching to quickly sort and filter candidates. With the advent of ever-more powerful Generative AI systems, how much more difficult has their job become and what can they do to fight against this unwanted tide?
We’d love to hear from you and share your thoughts with everyone in the next edition:
🪛 AI Ethics Praxis: From Rhetoric to Reality
Reinforcement Learning from Human Feedback (RLHF) dominated discussions in 2023 as a technique that helped LLMs deliver on their promise of utility while providing a certain degree of guardrails, alignment, and tuning without which problems of hallucination, bias, and other issues would be even more pronounced. Yet, it isn’t a panacea.
One of the main issues is the tendency of LLMs to generate unfaithful or fabricated content, commonly described as misinformation, hallucination, and inconsistency. This not only affects the trustworthiness of LLMs but also limits their applications in professional fields such as medicine and law. Another challenge is the susceptibility of RLHF-tuned models to produce harmful outputs when prompted accordingly. While RLHF models exhibit better judgment when it comes to harmful content, they are not entirely immune to such inducements.
RLHF also faces challenges in ensuring accurate inner alignment, i.e., that the trained model does not end up pursuing ulterior objectives. The design of RLHF pipelines must contend with fundamentally conflicting feedback and preferences, which can be a complex task. Moreover, RLHF has challenges involving the feedback itself, the reward model, and the policy. Some of these problems are tractable, but others are fundamental and substantial enough that overcoming them would require a method that is no longer a form of RLHF.
To address these issues, several approaches have been proposed. One is Safe Reinforcement Learning from Human Feedback (Safe RLHF), an algorithm for human value alignment that explicitly decouples human preferences regarding helpfulness and harmlessness. This decoupling avoids confusing the two objectives when they are in tension and allows for the training of safer and more helpful LLMs. Another avenue is to improve how RLHF itself is carried out: even some of the fundamental problems can be alleviated with better techniques and methods, for example by using RLHF to refine LLMs so that they generate human-like, relevant, and appropriate responses. Lastly, it's important to note that while RLHF has its limitations, it has also been successful in aligning LLMs with complex human values.
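As a rough, illustrative rendering of the decoupling idea, a Safe-RLHF-style setup trains a reward model for helpfulness and a separate cost model for harmfulness, then combines them through a constrained (Lagrangian-style) objective. The sketch below is a hypothetical simplification under those assumptions; the placeholder scoring functions and update rule are not the paper's implementation, which relies on trained preference models and policy-gradient optimization.

```python
# A simplified, hypothetical sketch of the decoupled reward/cost idea behind
# Safe-RLHF-style training. The scoring functions and update rule are toy
# placeholders for demonstration only.

def reward_model(prompt: str, response: str) -> float:
    """Placeholder helpfulness score (higher = judged more helpful)."""
    return float(len(response.split()))  # toy proxy

def cost_model(prompt: str, response: str) -> float:
    """Placeholder harmfulness score (higher = judged more harmful)."""
    return 1.0 if "unsafe" in response.lower() else 0.0  # toy proxy

def training_signal(prompt: str, response: str, lam: float, cost_budget: float = 0.0) -> float:
    """Lagrangian-style scalar: helpfulness minus a weighted penalty on harm above the budget."""
    return reward_model(prompt, response) - lam * (cost_model(prompt, response) - cost_budget)

def update_lambda(lam: float, avg_cost: float, cost_budget: float = 0.0, lr: float = 0.1) -> float:
    """Raise the penalty weight when average harm exceeds the budget; relax it otherwise."""
    return max(0.0, lam + lr * (avg_cost - cost_budget))

if __name__ == "__main__":
    lam = 1.0
    print("Training signal:", training_signal("How do I stay safe online?", "Use strong passwords.", lam))
    print("Updated lambda:", update_lambda(lam, avg_cost=0.2))
```

The design point worth noting is that the penalty weight adapts during training: if outputs exceed the harm budget on average, the weight grows and harmlessness dominates the training signal, rather than forcing a single preference label to capture both objectives at once.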
You can either click the “Leave a comment” button below or send us an email! We’ll feature the best response next week in this section.
🔬 Research summaries:
Connecting the dots in trustworthy Artificial Intelligence: From AI principles, ethics, and key requirements to responsible AI systems and regulation
This paper presents a holistic vision of trustworthy AI through the principles for its ethical use and development, a philosophical reflection on AI Ethics, an analysis of regulatory efforts around trustworthy AI, and an examination of the fundamental pillars and requirements for trustworthy AI. It concludes with a definition of responsible AI systems, the role of regulatory sandboxes, and a debate on the recent diverging views about AI’s future.
To delve deeper, read the full summary here.
Whose AI Dream? In search of the aspiration in data annotation.
This paper delves into the crucial role of annotators in developing AI systems, exploring their perspectives, aspirations, and ethical considerations surrounding their work. It offers valuable insights into the human element within AI and the impact annotators have on shaping the future of artificial intelligence.
To delve deeper, read the full summary here.
Fine-Grained Human Feedback Gives Better Rewards for Language Model Training
This paper explores a new framework called Fine-Grained RLHF that improves how LLMs are trained using human feedback. Instead of just asking people which LLM output they prefer overall, the researchers had annotators label specific parts of outputs by the type of error (e.g., sentence 2 is not truthful). Experiments show this more detailed “fine-grained” feedback allows the LLM to better learn what kinds of outputs people want.
To delve deeper, read the full summary here.
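To give a flavor of the "fine-grained" part, the sketch below shows one simplified way per-sentence error labels could be folded into a scalar training reward. The error categories, weights, and aggregation here are illustrative assumptions, not the paper's exact reward models, which are trained separately per error type and feedback density.

```python
# A simplified, hypothetical sketch of turning per-sentence error labels into a scalar
# training reward. The error categories and weights are invented for illustration.
from typing import Dict, List

# Assumed penalty weights per labelled error type.
ERROR_WEIGHTS: Dict[str, float] = {
    "untruthful": 1.0,
    "irrelevant": 0.5,
    "incomplete": 0.3,
}

def fine_grained_reward(sentence_labels: List[List[str]], base_reward: float = 1.0) -> float:
    """Combine per-sentence error labels into one scalar reward for the whole output.

    sentence_labels: for each sentence of the output, the error types annotators flagged.
    """
    penalty = sum(ERROR_WEIGHTS.get(err, 0.0)
                  for labels in sentence_labels
                  for err in labels)
    return base_reward - penalty

if __name__ == "__main__":
    # Example: a three-sentence output where sentence 2 is untruthful and sentence 3 is irrelevant.
    labels = [[], ["untruthful"], ["irrelevant"]]
    print(fine_grained_reward(labels))  # 1.0 - (1.0 + 0.5) = -0.5
```

Compared with a single overall preference label, this kind of localized signal tells the model which part of the output went wrong and how, which is the intuition behind the reported gains.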
📰 Article summaries:
Personal Information Exploit With OpenAI's ChatGPT Model Raises Privacy Concerns - The New York Times
What happened: Last month, an alarming email from Rui Zhu, a Ph.D. candidate at Indiana University Bloomington, revealed that he and his research team successfully extracted a list of business and personal email addresses for over 30 New York Times employees from GPT-3.5 Turbo. By bypassing the model’s privacy restrictions, they demonstrated the potential for generative AI tools like ChatGPT to divulge sensitive personal information with slight adjustments. The researchers utilized the API of ChatGPT's fine-tuning process to achieve this, exploiting a vulnerability that allowed them to override certain defenses present in the tool.
Why it matters: ChatGPT, when queried, generates responses based on extensive training data rather than performing web searches. The concern arises from the revelation that memories in large language models, like ChatGPT, can be jogged, enabling them to recall and disclose sensitive information. While the model is designed to forget irrelevant details during training, recent experiments show that memories can resurface, potentially exposing personal information. The incident with New York Times email addresses highlights the risk of AI tools inadvertently revealing private data, raising questions about the safety of fine-tuning processes and the extent of protection against accessing sensitive information.
Between the lines: The researchers, including Rui Zhu, accessed ChatGPT's API for fine-tuning, a process intended for users to provide additional knowledge in specific domains. However, they found that this method could override standard restrictions, allowing requests that are denied in the regular ChatGPT interface. The vulnerability exposes the challenge of ensuring the privacy of models like ChatGPT, as the company cannot guarantee what potentially sensitive information lies within the models' training data memory. The risk is further compounded by the ongoing learning nature of these models, and concerns are raised about the unknown contents of ChatGPT's training data.
These six questions will dictate the future of generative AI | MIT Technology Review
What happened: This article discusses the transformative impact of the internet, paralleling it with the potential consequences of generative AI, using ChatGPT as an example. It draws attention to the unintended downsides of the internet, such as cyberbullying and misinformation. It suggests that generative AI, now equipped with infrastructure from various tech giants, may face similar challenges as it becomes widely used. It also explores predictions related to biases in AI models, legal issues involving copyright infringement, and the potential impact on various professions, including concerns about job displacement.
Why it matters: The narrative underscores the inherent biases in AI models, originating from the real-world data they are trained on, posing challenges related to gender, racial discrimination, and other societal biases. It predicts that despite efforts to mitigate bias, it will persist in generative AI models. The discussion expands to legal concerns, highlighting class-action lawsuits against tech companies for alleged copyright infringement, and predicts the emergence of ethical data marketplaces. The article also explores the potential impact of AI on employment, especially for white-collar workers, cautioning against exaggerated fears of mass job losses but acknowledging evolving roles and skill requirements.
Between the lines: It is important to reflect on the ongoing hype surrounding AI and ChatGPT's role in shaping perceptions of AI capabilities. Despite the hype, the article questions the lack of a clear "killer app" for AI and suggests that user engagement may dwindle without one. Drawing parallels with the early days of the internet, it acknowledges the possibility of generative AI facing similar challenges and fading if a groundbreaking application does not emerge. The article also emphasizes that as AI becomes mainstream, societal concerns surrounding its impact will become more pervasive and demand careful consideration.
We're Not Living a "Predicted" Life: Student Perspectives on Wisconsin's Dropout Algorithm – The Markup
What happened: This article recounts the discovery of the Dropout Early Warning System (DEWS) algorithm used by Wisconsin to predict future dropouts, revealing its reliance on factors such as race. As Black students, the authors found it troubling that their race contributed to being labeled as "high risk" of dropping out, particularly when the algorithm proved inaccurate. It also highlights their involvement in the investigation and subsequent removal of DEWS data from school dashboards, prompting the state to consider changes to the algorithm.
Why it matters: The authors express their initial shock at learning about the algorithm's use of race to predict graduation likelihood, emphasizing the flaws in predicting failure instead of assisting students who genuinely need help. They stress the importance of addressing racism within school systems, advocating for support and resources for struggling students instead of stigmatizing them.
Between the lines: The reluctance of students to report incidents to administrators led to the creation of the Black Student Union (BSU) as a safe space for sharing experiences. The authors express concern that addressing and correcting students' behavior shouldn't be solely the responsibility of the BSU. The article underscores the need for discussions about racism and biases in algorithms, initiating conversations that led to the formation of a "restorative practice" group aiming to improve mental health policies and address various forms of discrimination on campus.
📖 From our Living Dictionary:
What are frontier AI models?
👇 Learn more about why it matters in AI Ethics via our Living Dictionary.
🌐 From elsewhere on the web:
Top 25 Most Read Pieces on Tech Policy Press in 2023
Our founder, Abhishek Gupta, contributed a piece titled “Beware the Emergence of Shadow AI” to Tech Policy Press that made it into their Top 25 most read pieces for 2023! You can take a look at the full list here.
To delve deeper, read the full article here.
💡 In case you missed it:
Promoting Bright Patterns
User experience designers face increasing scrutiny and criticism for creating harmful technologies, leading to pushback against unethical design practices. While clear-cut harmful practices such as dark patterns have received attention, trends towards automation, personalization, and recommendation present more ambiguous ethical challenges. To address potential harm in these “gray” instances, we propose the concept of “bright patterns” – persuasive design solutions that prioritize user goals and well-being over their desires and business objectives.
To delve deeper, read the full article here.
Take Action:
We’d love to hear from you, our readers, on what recent research papers caught your attention. We’re looking for ones that have been published in journals or as a part of conference proceedings.
I would add the pertinent question: AT WHAT COST can RLHF address some of the most pressing challenges of LLMs (considering the exploitation of ghost workers at minimal wages)?
And if non-transparent, gigantic datasets are one issue (e.g., as illustrated by ChatGPT), what value lies in creating and improving Little Language Models, possibly the better LLMs, built on highly contextual, curated, high-quality datasets?
Credit for this last point goes to Lelapa.AI and a conversation I had with Pelonomi Moiloa.