AI Ethics Brief #139: Measuring surprise, definition of GPAIs, getting started with external stakeholder engagement, and more.
How can we quantify the return on investment (ROI) or measure the success of integrating external stakeholder feedback into the AI development process?
Welcome to another edition of the Montreal AI Ethics Institute’s weekly AI Ethics Brief that will help you keep up with the fast-changing world of AI Ethics! Every week, we summarize the best of AI Ethics research and reporting, along with some commentary. More about us at montrealethics.ai/about.
Support our work through Substack
💖 To keep our content free for everyone, we ask those who can, to support us: become a paying subscriber for the price of a couple of ☕.
If you’d prefer to make a one-time donation, visit our donation page. We use this Wikipedia-style tipping model to support our mission of Democratizing AI Ethics Literacy and to ensure we can continue to serve our community.
This week’s overview:
🙋 Ask an AI Ethicist:
What should be the key areas of focus for Responsible AI in 2024 for practitioners who are getting started on their Responsible AI journeys?
✍️ What we’re thinking:
Poor facsimile: The problem in chatbot conversations with historical figures
🤔 One question we’re pondering:
How can we quantify the return on investment (ROI) or measure the success of integrating external stakeholder feedback into the AI development process?
🪛 AI Ethics Praxis: From Rhetoric to Reality
Getting started with external stakeholder engagement in operationalizing Responsible AI
🔬 Research summaries:
Clinical trial site matching with improved diversity using fair policy learning
Operationalizing the Definition of General Purpose AI Systems: Assessing Four Approaches
Measuring Surprise in the Wild
📰 Article summaries:
How do we know how smart AI systems are? | Science
AI scientists make ‘exciting’ discovery using chatbots to solve maths problems
At Meta, Millions of Underage Users Were an ‘Open Secret,’ States Say - The New York Times
📖 Living Dictionary:
What is the relevance of AI-generated text detection tools in AI ethics?
🌐 From elsewhere on the web:
Emerging AI Governance is an Opportunity for Business Leaders to Accelerate Innovation and Profitability
💡 ICYMI
Going public: the role of public participation approaches in commercial AI labs
🚨 Happy New Year - let’s make 2024 the year of Responsible AI!
Welcome back to the AI Ethics Brief - this is our first edition for 2024! Wishing you all the health and happiness in the world!
Our sincere hope is that this becomes the year we move from the regulations, principles, ideas, and research already in motion to experiments and implementations in practice within organizations building and deploying AI systems across traditional and novel industries.
If there are examples of folks who are doing it well (and those who aren’t!), please let us know as we’d like to chat with them. On to the rest of the newsletter!
🙋 Ask an AI Ethicist:
Every week, we’ll feature a question from the MAIEI community and share our thinking here. We invite you to ask yours, and we’ll answer it in the upcoming editions.
Here are the results from the previous edition for this segment:
It’s very interesting to see that a majority of respondents have seen higher-quality output from GenAI systems than from humans. This immediately raises the question of which humans the comparison is being made against and how subjective the evaluation of the outputs is (e.g., solutions to closed-form problems are more objective than judgments about the beauty of a piece of artwork).
Over the holiday break, one of our readers, R.S., reached out asking what the key areas of focus for Responsible AI in 2024 should be for practitioners who are getting started on their Responsible AI journeys.
A new year is always an opportunity to reset our frame of thinking and evaluate what is working well and what isn’t. We’ve spent some time analyzing the effectiveness of various approaches to Responsible AI, contextualizing them against the developments of 2023, and synthesizing them into the following points, each framed as (a) an interdisciplinary insight and (b) technical considerations for those just getting started with Responsible AI.
Enhanced Transparency and Explainability in AI Systems:
Interdisciplinary Insight: Drawing from cognitive science and human-computer interaction (HCI), the focus should be on developing AI systems whose decision-making processes are understandable not just to experts, but also to lay users. This involves leveraging techniques like interactive visualizations and narrative explanations that align with human cognitive processes.
Technical Considerations: Implementing methods like Layer-wise Relevance Propagation (LRP) or SHAP (SHapley Additive exPlanations) to provide insight into neural network decisions. Additionally, there's a growing need for AI systems to generate self-explanations in natural language, bridging the gap between complex AI algorithms and human understanding.
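For those just getting started, here is a minimal sketch of what applying SHAP can look like in practice. It assumes the open-source `shap` and `scikit-learn` packages; the model and dataset (scikit-learn’s built-in diabetes data) are purely illustrative, not tied to any particular AI system.

```python
# Minimal sketch: explaining a tabular model's predictions with SHAP.
# The dataset and model here are placeholders for illustration only.
import shap
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor

X, y = load_diabetes(return_X_y=True, as_frame=True)
model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)

# TreeExplainer computes Shapley value estimates efficiently for tree ensembles.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X.iloc[:100])

# Global view of which features drive the model's predictions.
shap.summary_plot(shap_values, X.iloc[:100])
```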
Robustness and Fairness in AI Models:
Interdisciplinary Insight: From the perspective of sociology and organizational behavior, ensuring AI fairness involves understanding and addressing biases not only in data but also in the societal structures that the data represents. This includes actively working to understand marginalized perspectives and incorporating diverse datasets.
Technical Considerations: Implementing advanced techniques for bias detection and mitigation, such as adversarial training, fairness-aware modeling, and continual learning to adapt to evolving social norms and biases. Furthermore, there's a need for comprehensive benchmarking that goes beyond traditional performance metrics to include fairness and bias assessments.
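As a small illustration of what benchmarking beyond traditional performance metrics can mean, the sketch below reports a simple group-fairness metric (demographic parity difference) alongside accuracy. The labels, predictions, and group column are hypothetical placeholders.

```python
# Sketch: reporting a group-fairness metric next to accuracy.
# y_true, y_pred, and `group` are hypothetical placeholder data.
import numpy as np
import pandas as pd

def demographic_parity_difference(y_pred, group):
    """Difference in positive-prediction rates between groups (0 = parity)."""
    rates = pd.Series(y_pred).groupby(pd.Series(group)).mean()
    return float(rates.max() - rates.min())

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=1000)
y_pred = rng.integers(0, 2, size=1000)
group = rng.choice(["A", "B"], size=1000)

accuracy = float(np.mean(y_true == y_pred))
dpd = demographic_parity_difference(y_pred, group)
print(f"accuracy={accuracy:.3f}  demographic_parity_difference={dpd:.3f}")
```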
Sustainable and Ethical AI Development:
Interdisciplinary Insight: From an environmental and design perspective, the focus should be on developing AI in a way that is sustainable, considering the environmental impact of training large models. This includes efficient model design and the use of green computing resources.
Technical Considerations: Emphasis on energy-efficient algorithms, model pruning, and quantization to reduce the computational load of AI models. Also, exploring new paradigms like federated learning that can minimize the environmental footprint of AI by reducing data transfer and centralized processing.
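As a concrete starting point for two of the techniques named above, here is a minimal PyTorch sketch of magnitude-based pruning followed by post-training dynamic quantization; the tiny model architecture is purely illustrative.

```python
# Sketch: magnitude pruning + dynamic quantization in PyTorch (illustrative model).
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 2))

# Prune 30% of the smallest-magnitude weights in each Linear layer.
for module in model.modules():
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.3)
        prune.remove(module, "weight")  # make the pruning permanent

# Post-training dynamic quantization: Linear weights stored and executed in int8.
quantized = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)
print(quantized)
```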
Given your response to the above poll, what are some resources that you’ve seen/would like to see to help address the issue that you voted for? Share your thoughts with the MAIEI community:
✍️ What we’re thinking:
Poor facsimile: The problem in chatbot conversations with historical figures
It is important to recognize that AI systems often provide a poor representation and imitation of a person’s true identity. By way of analogy, such an imitation is like a blurry JPEG image, lacking depth and accuracy. AI systems are also limited by the information that has been published and captured in their training datasets: the responses they provide can only be as accurate as the data they have been trained on. Extensive and detailed data is crucial for capturing the tone and authentic views of the person being represented.
To delve deeper, read the full article here.
🤔 One question we’re pondering:
Related to the thinking on external stakeholder engagement discussed in this week’s “AI Ethics Praxis” section, we’re working our way toward answering the following question: How can we quantify the return on investment (ROI) or measure the success of integrating external stakeholder feedback into the AI development process? While the moral argument is easy to make, without senior leadership support to secure the resources and authority to implement this within an organization’s practices, we will not get very far.
We’d love to hear from you and share your thoughts with everyone in the next edition:
🪛 AI Ethics Praxis: From Rhetoric to Reality
We often find ourselves advocating for the incorporation of feedback from external stakeholders. Yet, in our day-to-day work, as machine learning practitioners and Responsible AI advocates, we sometimes struggle to make that a reality. There are a few challenges such as IP protections, sourcing the right external stakeholders, providing sufficient background knowledge and context, and being able to integrate feedback at the right times in the AI lifecycle, amongst other areas of concern.
To get started, here are a few ideas we’ve found to be practical, drawn from our work with organizations around the world over the last year. Specifically, we’ve found the following segments to be a good starting point, upon which you can add more as the organization matures in its approach to external stakeholder engagement: (1) Identification and Engagement, (2) Feedback Integration, and (3) Continuous Monitoring and Iteration.
1. Identification and Engagement of External Stakeholders
Identification: Begin by identifying a diverse group of external stakeholders relevant to your AI system. These may include end-users, industry experts, community representatives, ethical and legal experts, and those potentially impacted by the AI system. Diversity in stakeholder representation ensures a wide range of perspectives, particularly from groups that might be disproportionately affected.
Engagement Strategies: Develop a structured approach for engaging these stakeholders. This could involve regular meetings, workshops, and surveys. Emphasize transparent communication about the AI system’s capabilities, limitations, and intended use cases. Encourage open dialogue where stakeholders feel comfortable expressing concerns and suggestions.
Documentation: Create a system for documenting stakeholder feedback. This might include quantitative data from surveys as well as qualitative insights from discussions and interviews. Ensure this documentation is accessible and organized for effective analysis.
2. Feedback Integration into the AI Lifecycle
Analysis of Feedback: Analyze the collected feedback to identify common themes, concerns, and suggestions. Prioritize feedback based on its potential impact on fairness, accountability, transparency, and ethical implications.
Development Adjustments: Implement changes in the AI development process based on this feedback. This might involve adjusting algorithms to address biases, enhancing transparency in AI decision-making processes, or modifying use cases to prevent unintended harm.
Feedback Loop: Establish a feedback loop where changes made are communicated back to stakeholders for further input. This iterative process ensures that the AI system continuously evolves to meet ethical standards and stakeholder expectations.
3. Continuous Monitoring and Iteration
Monitoring Mechanisms: Post-deployment, continuously monitor the AI system's performance and impact. Use both automated tools and human oversight to track how the system is being used and its societal impact.
Regular Check-ins with Stakeholders: Schedule regular meetings with stakeholders to discuss ongoing performance and any new concerns or suggestions. This ongoing dialogue is crucial for maintaining trust and ensuring the AI system remains aligned with Responsible AI principles.
Iterative Improvements: Use insights gained from monitoring and stakeholder feedback to make iterative improvements to the AI system. This includes not just technical adjustments but also changes in governance policies and operational practices.
You can either click the “Leave a comment” button below or send us an email! We’ll feature the best response next week in this section.
🔬 Research summaries:
Clinical trial site matching with improved diversity using fair policy learning
Effective patient enrollment for clinical trials in healthcare requires recruiting a cohort of patients who are both eligible for the trial and whose population characteristics reflect those of the overall population. This paper develops a machine learning algorithm to select a set of trial sites for patient recruitment that collectively meet the above criteria. The algorithm is trained to account for patient eligibility, trial site quality, and patient diversity using health insurance claims, past clinical trial performance data, and census data, and it identifies trial sites that maximize both patient enrollment and patient diversity.
To delve deeper, read the full summary here.
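As a heavily hedged toy illustration (not the paper’s actual method), the sketch below shows the kind of enrollment-versus-diversity trade-off such a site-selection policy has to navigate; all numbers and the greedy heuristic are made up for illustration.

```python
# Toy sketch (not the paper's method): greedy site selection trading off
# expected enrollment against cohort diversity. All numbers are fabricated.
import numpy as np

rng = np.random.default_rng(42)
n_sites, n_groups = 20, 4

expected_enrollment = rng.uniform(10, 100, size=n_sites)    # patients per site
site_mix = rng.dirichlet(np.ones(n_groups), size=n_sites)   # demographic mix per site
population_mix = np.array([0.4, 0.3, 0.2, 0.1])             # target population mix

def objective(selected, lam=50.0):
    """Total enrollment minus a penalty for deviating from the population mix."""
    enroll = expected_enrollment[selected]
    cohort_mix = (site_mix[selected] * enroll[:, None]).sum(0) / enroll.sum()
    return enroll.sum() - lam * np.abs(cohort_mix - population_mix).sum()

selected, remaining = [], list(range(n_sites))
for _ in range(5):  # pick 5 sites greedily
    best = max(remaining, key=lambda s: objective(selected + [s]))
    selected.append(best)
    remaining.remove(best)

print("selected sites:", selected, " objective:", round(objective(selected), 1))
```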
Operationalizing the Definition of General Purpose AI Systems: Assessing Four Approaches
Through its Artificial Intelligence (AI) Act, the European Union (EU) is seeking to regulate general-purpose AI systems (GPAIS). However, clear criteria to distinguish between fixed-purpose and general-purpose systems have yet to be formulated. This paper assesses different perspectives for determining which systems could be classified as GPAIS by examining four approaches: quantity, performance, adaptability, and emergence. Based on this work, we suggest that EU policymakers engage with these approaches as a starting point for determining the inclusion criteria for GPAIS.
To delve deeper, read the full summary here.
Measuring Surprise in the Wild
Surprise is a pervasive phenomenon that plays a key role across a wide range of human behavior, but the quantitative measurement of how and when we experience surprise has mostly remained limited to laboratory studies. In this paper, we demonstrate, for the first time, how computational models of surprise rooted in cognitive science and neuroscience combined with state-of-the-art machine-learned generative models can be used to detect surprising human behavior in complex, dynamic environments like road traffic.
To delve deeper, read the full summary here.
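As a hedged, self-contained illustration of the kind of quantity such models compute (not the paper’s implementation), the sketch below calculates Shannon surprisal and Bayesian surprise for a discrete predictive distribution over possible next events; the scenario and numbers are purely illustrative.

```python
# Toy sketch (not the paper's model): surprisal and Bayesian surprise
# for a discrete predictive distribution over possible next events.
import numpy as np

def surprisal(p_pred, observed):
    """Shannon surprisal: -log p(observed) under the predictive distribution."""
    return -np.log(p_pred[observed])

def bayesian_surprise(prior, posterior):
    """KL(posterior || prior): how much the observation shifted the observer's beliefs."""
    return float(np.sum(posterior * np.log(posterior / prior)))

# Predictive distribution over 3 possible behaviors of another road user,
# e.g. [keep lane, change lane, brake hard] -- purely illustrative numbers.
prior = np.array([0.90, 0.08, 0.02])
observed = 2                               # the unlikely event happens
posterior = np.array([0.20, 0.10, 0.70])   # beliefs after seeing it

print("surprisal:", round(surprisal(prior, observed), 2), "nats")
print("Bayesian surprise:", round(bayesian_surprise(prior, posterior), 2), "nats")
```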
📰 Article summaries:
How do we know how smart AI systems are? | Science
What happened: The text reflects on Marvin Minsky's 1967 prediction about achieving artificial intelligence (AI) comparable to human intelligence within a generation. Almost two generations later, some AI researchers, including Geoffrey Hinton and Yoshua Bengio, claim that superintelligent AI is closer than expected. The narrative discusses the challenges of assessing AI intelligence objectively, acknowledging the difficulty in determining if systems like GPT-4 truly exhibit human-level understanding.
Why it matters: The significance lies in assessing AI capabilities, especially with claims of nearing human intelligence. While GPT-4 performed well on standardized tests, concerns arise regarding data contamination, robustness, and flawed benchmarks. Data used for training might have contaminated test questions, and AI systems may not exhibit consistent understanding across similar prompts. Flawed benchmarks may lead to shortcut learning, raising questions about the actual intelligence these systems demonstrate. The cautionary note urges transparency in AI model training and the development of better experimental methods and benchmarks to assess AI capabilities accurately.
Between the lines: The text underscores the need for transparency in training these models, emphasizing the importance of open-source AI models. Collaborations between AI researchers and cognitive scientists are proposed to develop better experimental methods and benchmarks, drawing on insights from cognitive science to accurately assess intelligence, understanding, and cognitive capabilities in AI systems.
AI scientists make ‘exciting’ discovery using chatbots to solve maths problems
What happened: Researchers at Google DeepMind claim to have achieved the world’s first scientific discovery using a large language model (LLM), the kind of model that underpins chatbots like ChatGPT, indicating that these models can generate knowledge beyond human understanding. DeepMind’s “FunSearch” project utilized an LLM to tackle mathematical problems by generating computer programs. Surprisingly, the LLM produced solutions that surpassed existing human-generated knowledge, marking the first instance of a large language model making a genuine scientific breakthrough.
Why it matters: While chatbots like ChatGPT are popular, they typically repurpose existing information and are susceptible to confabulation, providing plausible but flawed answers. In contrast, FunSearch demonstrated the LLM's potential to contribute to scientific discovery by evolving computer programs to solve complex problems. This breakthrough challenges the traditional role of humans in algorithmic discovery, offering a transformative approach to computer science. The findings suggest that LLMs can assist in pushing the boundaries of algorithmic possibilities, potentially revolutionizing how computer programming evolves.
Between the lines: The immediate impact of this discovery extends to computer programmers, indicating a transformation in how algorithms are approached. Human-created specialized algorithms have dominated coding for decades, but FunSearch suggests a new era where LLMs play a crucial role in pushing algorithmic boundaries. Rather than replacing humans, LLMs like FunSearch offer assistance in algorithmic discovery, opening up possibilities for novel human-machine interactions in fields like mathematics. The generated programs not only solve specific problems but also provide insights that humans can interpret, generating ideas for solving a range of related problems in the future.
At Meta, Millions of Underage Users Were an ‘Open Secret,’ States Say - The New York Times
What happened: Meta is facing legal action from attorneys general in 33 states; the complaint reveals that Meta has received over 1.1 million reports of users under 13 on Instagram since early 2019. It alleges that Meta only disabled a fraction of these accounts while continuing to collect children’s personal information without parental consent, violating federal children’s privacy laws. The company’s knowledge of millions of underage Instagram users is described as a well-documented, analyzed, and protected “open secret” within Meta.
Why it matters: The privacy charges are part of a broader federal lawsuit filed by multiple states, accusing Meta of unfairly attracting young users to Instagram and Facebook, concealing internal studies on user harms, and seeking remedies to stop harmful features. The complaint highlights Meta's failure to prioritize effective age-checking systems, enabling users under 13 to lie about their age. Executives are accused of misleadingly claiming the effectiveness of age-checking processes despite the internal awareness of millions of underage users. The lawsuit could lead to significant civil penalties for Meta if the allegations are proven.
Between the lines: The complaint notes Meta’s knowledge of specific underage accounts through its reporting channels, and that the company automatically ignored certain reports of users under 13, allowing them to continue using accounts without user biographies or photos. The complaint cites instances where Meta discussed why the accounts of a 12-year-old were not deleted despite complaints from the child’s mother. Meta’s past privacy violations, including a $5 billion settlement in 2019, are mentioned, and it is suggested that pursuing children’s privacy violations may be more straightforward for the states than proving the encouragement of compulsive social media use among young people, a relatively new phenomenon.
📖 From our Living Dictionary:
What is the relevance of AI-generated text detection tools in AI ethics?
👇 Learn more about why it matters in AI Ethics via our Living Dictionary.
🌐 From elsewhere on the web:
Emerging AI Governance is an Opportunity for Business Leaders to Accelerate Innovation and Profitability
As AI capabilities rapidly advance, especially in generative AI, there is a growing need for systems of governance to ensure we develop AI responsibly in a way that is beneficial for society. Much of the current Responsible AI (RAI) discussion focuses on risk mitigation. Although important, this precautionary narrative overlooks the means through which regulation and governance can promote innovation.
If companies across industries take a proactive approach to corporate governance, we argue that this could boost innovation (similar to the whitepaper from the UK Government on a pro-innovation approach to AI regulation) and profitability for individual companies as well as for the entire industry that designs, develops, and deploys AI. This can be achieved through a variety of mechanisms we outline below, including increased quality of systems, project viability, a safety race to the top, usage feedback, and increased funding and signaling from governments.
Organizations that recognize this early can not only boost innovation and profitability sooner but also potentially benefit from a first-mover advantage.
To delve deeper, read the full article here.
💡 In case you missed it:
Going public: the role of public participation approaches in commercial AI labs
What’s the state of public participation in the AI industry? Our paper explores attitudes and approaches to public participation in commercial AI labs. While tech industry discourse frequently adopts the language of participation in calls to ‘democratize AI’ (and similar), this may not match the reality of practices in these companies.
To delve deeper, read the full article here.
Take Action:
We’d love to hear from you, our readers, on what recent research papers caught your attention. We’re looking for ones that have been published in journals or as a part of conference proceedings.
While this is not a full answer, the following paper addresses a close question:
https://arxiv.org/abs/2309.13057
“The Return on Investment in AI Ethics: A Holistic Framework,” published in the Hawaii International Conference on System Sciences (HICSS) 2024 Proceedings.
The most pressing issue to solve in 2024 is the AI alignment problem. https://www.lesswrong.com/tag/ai