We are often wrong, and so are algorithms, even when they are perfectly programmed and hit the goals we set for them. Our complex reality is full of unintended consequences: video recommendations surface more extremist and less scientific content, and "likes" keep us on certain pages longer than we actually want to stay...
Deep learning is a powerful tool, and with great power comes great responsibility. Before we jump into the detailed approaches and tools to make deep learning more ethical, it is important to make a disclaimer:
- The fact that deep learning may not be perfect, or may have unintended consequences even when it is, is no reason to stop using it or to blame the entire field for flawed uses.
- Both the developer and the owner of the development are responsible for ensuring that single-minded metrics are avoided and that the model setup considers the consequences of success and failure.
- We need regulation to ensure that the implementation of safety and ethical guardrails is not left up to big tech companies, but is instead the result of an open and rigorous debate involving experts from other fields, such as lawyers, economists, biologists...
Some examples illustrate issues in Deep Learning:
Bugs and Recourse
Software, with or without deep learning under the hood, has errors. Some errors cause a mere inconvenience (a cat pops up in an online footwear store), while others can cost lives (a wrong treatment assignment or an incorrect prediction of required medical coverage).
Feedback Loops
Applications that can influence the very KPI they are optimizing (watch time on YouTube, sales in e-commerce) can easily create a negative loop, promoting undesirable content (such as conspiracy-theory videos) or superfluous consumption (selling things by exploiting compulsive behaviours), outcomes that are far from the mission statements of those applications.
Bias
Our societal, ethnic, gender and other forms of discrimination are to some extent reflected in our data. Patterns such as biased criminal records or low numbers of women applying to tech jobs can be perpetuated if the data and the algorithms do not adjust for them.
The Means cannot be more important than the ends
Every deep learning project should start with a clear statement of why we are doing it and what the impacts of optimizing a certain metric will be. Brainstorming the known and potentially unknown consequences of success and failure is key to building programs that are resilient and serve the common good.
The world is full of examples of flaws in human and artificial systems triggered by narrow-minded metrics. The pursuit of GDP growth, for example, is exhausting our life-support systems and eroding social ties, fuelling individualism and inequality. The obsession with technology designed to keep us engaged as long as possible is making us addicted and keeping us online for unintended purposes, turning deep work and real social interaction into rare events. We should choose our metrics very carefully and have a very clear end in mind. Why are we doing what we are doing? Is this making our products or services better? Our consumers happier? Our world a more prosperous and resilient place? From the junior programmer to the CEO, the private sector has a responsibility.
I do not think every application deserves the same level of attention in terms of ethics, as the impacts diverge massively between a data-cleansing system for internal operations and one assessing medical treatments. We need to understand the amount of risk inherent in the decisions taken by the application.
In recent years I have listened to dozens of podcasts on our field and heard, intentionally or not, very naive or even escapist reasoning about the impact of these applications on society, for example:
- some applications being very vocal about their mission ("connecting people") and very quiet about their business model (selling people's data to serve customized ads)
- some applications assuming that you spend more time in the app because you freely choose to, rather than asking whether we are making people addicted to it
- ignoring that ads influence our purchase behaviour, leading us to buy things we do not actually need
- ignoring that increases in conversion can be achieved by targeting consumers who are more susceptible to compulsive buying or who have some sort of disability
These are very common issues in the most successful tech platforms as we speak. While we are all inspired by and benefit from YouTube, Facebook and Google, their business models, and most importantly the single-metric focus of some of their algorithms, are, to say the least, problematic. We should all contribute to ensuring that businesses thrive using deep learning with a positive net social outcome. Let's dive into the key deep learning application issues next:
Accountability and Quality
Deep learning practitioners normally live in a black-or-white environment. Either they are over-trusted, in the sense that any outcome from the model is believed to be perfect, unbiased and optimal, or, at the other end of the spectrum, their models are rejected because they are not 100% perfect. I am afraid that wise use of deep learning lies somewhere between those two extremes.
Data, like our experience of the world, is biased. On top of that, programs, even very well tested ones, can have gaps. Code and data are not perfect, and neither are our algorithms. Even if all of them were, our limited ability to integrate every relevant metric into a loss function and to account for all the complex consequences makes 100% success a chimera. This is no reason to despair.
Machine learning / deep learning applications should be used as long as:
- they provide a consistent improvement over the status quo
- they are audited and regularly tested and challenged
That means that even if our carbon footprint calculator or credit scoring model is not 100% accurate, we should still use it if it is more reliable than what we had before. The problem, most of the time, is that we DO NOT have a BASELINE.
A baseline is the current performance on the metrics we care about, especially those that can be compared with the new application's performance. If our current calculations are 65% correct and the new algorithm is 72% correct, we should use it. It is actually very likely that using the algorithm together with humans in the loop will quickly get you to something greater than 72%.
My main message here is: do not blindly trust these applications, but please base your concerns not on anecdotes but on rigorous baselines. It will expose your mistakes, but it will make your team better. Another note: even if you can review everything manually, leverage this technology to challenge you and increase the reliability of your data and decisions. Two pairs of eyes are better than one, and a human and a machine together are better than either on their own.
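As a minimal, hypothetical sketch of what this baseline check can look like (all numbers and predictions below are made up), the idea is simply to score the status quo and the candidate model on the same labelled sample, adopt the model only if it beats the baseline, and keep humans in the loop on the cases where the two disagree:

```python
import numpy as np

def accuracy(preds, truth):
    """Share of predictions that match the ground truth."""
    preds, truth = np.asarray(preds), np.asarray(truth)
    return (preds == truth).mean()

# Hypothetical labelled sample scored by today's process and by the candidate model.
truth            = np.array([1, 0, 1, 1, 0, 1, 0, 0, 1, 1])
status_quo_preds = np.array([1, 0, 0, 1, 0, 0, 0, 1, 1, 1])  # current rules / manual process
model_preds      = np.array([1, 0, 1, 1, 0, 1, 0, 1, 1, 1])  # candidate model

baseline  = accuracy(status_quo_preds, truth)
candidate = accuracy(model_preds, truth)
print(f"baseline: {baseline:.0%}, candidate: {candidate:.0%}")

# Adopt the model only if it beats the baseline on the metric we care about,
# and route the cases where model and status quo disagree to human review.
if candidate > baseline:
    disagreements = np.flatnonzero(model_preds != status_quo_preds)
    print("adopt the model; send these cases to human review:", disagreements)
```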
Feedback Loops
The more we can influence the metric we optimize (conversion, time spent online, sales...), the more likely we are to enter the bumpy world of feedback loops. Imagine the following scenario: you sell shoes, and someone powerful in the organisation pushes for a very specific type of shoe to be everywhere and produced massively. The product was not selling well, just as the demand forecasting team and the product experts expected, but after a lot of expensive discounting and marketing we end up selling more of this product than of others. The next season the algorithm notices that this type of product sold in large volumes, so it predicts high demand again, and so we produce a lot again...
We should never forget that for a product or service to be sold it first has to be visible, rightly priced and available. Forgetting that some products or services enjoy a more privileged presence than others will perpetuate bad decision making and reinforce poor decisions in the future.
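A toy simulation of that loop, with entirely made-up numbers: the forecast is trained on last season's sales, but those sales were partly manufactured by discounting and extra visibility, so the naive forecast keeps reproducing the initial push instead of the market's true demand.

```python
# Toy simulation of a demand-forecasting feedback loop (all numbers are made up).
true_demand = 100              # units the market actually wants each season
forecast = 100
executive_push = 150           # someone powerful forces extra production in season 1

for season in range(1, 6):
    production = executive_push if season == 1 else forecast
    # Assume heavy discounting and marketing clear whatever we over-produce,
    # so observed sales track production rather than true demand.
    observed_sales = production
    forecast = observed_sales  # naive model: next season's forecast = last season's sales
    print(f"season {season}: produced {production}, sold {observed_sales}, "
          f"true demand {true_demand}, next forecast {forecast}")

# The forecast never returns to true demand: the model keeps reinforcing
# a decision that was pushed into the system, not learned from the market.
```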
Bias
In this section we talk about bias as the historical, social, measurement, aggregation, evaluation and other imbalances that prevent our data from truly representing our reality. The following paper explains seven such sources of bias in detail and how to deal with them:
Historical Bias
Historical bias can simply mean that past years have little to do with current behaviour (critical for forecasting), or that the social misrepresentation of certain groups gets perpetuated (low female participation in tech jobs in the past affecting profile searches for future positions).
One thing we should be aware of is that some open or benchmark datasets do not represent countries in a properly distributed fashion: for example, ImageNet contains almost 60% of its data from Canada, the US and the UK, while these countries represent less than 10% of the world's population.
Computer vision is not the only affected field: language models are heavily biased toward historical patterns ("he is a doctor" and "she is a nurse" rather than the other way around).
Measurement Bias
If we measure the wrong thing, we will probably not get where we want. The best-known example is GDP, which was designed to manage military spending, not to measure welfare, but, given the lack of a clear substitute, is used over and over to declare policies successful whenever economies grow. There are many other examples of poorly defined KPIs, such as:
- engagement in social media
- conversion in e-commerce
- net sales in retail
- rankings in sports
If they are optimized in isolation, we will likely end up where we do not want to be: addicted users, huge return rates, wrong pricing signals, athletes engaging in fragmented competitions...
Before we start developing a program, we need to make sure our target is well measured, or at least that its limitations as a KPI are clearly stated and, ideally, that more than one metric is tracked to validate the program's success.
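As a minimal, hypothetical sketch of that idea (metric names, values and tolerances are invented for illustration): optimise one primary KPI, but evaluate every release against guardrail metrics and refuse to call it a success if any guardrail deteriorates beyond its tolerance.

```python
# Hypothetical example: a release counts as a "success" only if the primary KPI
# improves AND no guardrail metric deteriorates beyond its allowed tolerance.
baseline  = {"conversion": 0.031, "return_rate": 0.18, "avg_session_min": 12.0}
candidate = {"conversion": 0.036, "return_rate": 0.24, "avg_session_min": 19.0}

primary = "conversion"
# Guardrails: metric -> maximum tolerated relative increase (higher is worse here).
guardrails = {"return_rate": 0.05, "avg_session_min": 0.10}

def evaluate(baseline, candidate, primary, guardrails):
    improved = candidate[primary] > baseline[primary]
    breaches = [
        metric for metric, tolerance in guardrails.items()
        if (candidate[metric] - baseline[metric]) / baseline[metric] > tolerance
    ]
    return improved and not breaches, breaches

success, breaches = evaluate(baseline, candidate, primary, guardrails)
print("success:", success, "| guardrail breaches:", breaches)
# Conversion improved, but return_rate (+33%) and session length (+58%) blew past
# their tolerances, so this release should not be declared a success.
```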
Aggregation bias
The following plot shows how easy it is to mess up by aggregating data wrongly, lacking good domain knowledge, and letting poorly aggregated data speak for itself:
Do you think that exercise increases cholesterol or not? If you did not consider age groups, your dataset would look like the one on the right and you would infer a positive correlation between exercise and cholesterol. The reason is that, without adjusting for age group, you would not see the reality, which is that within a given age group those who exercise more have lower cholesterol levels. As much as I like learning from and being surprised by data, I suggest defining research-backed theories before we run around reporting strong correlations. The Book of Why by Judea Pearl is a gold mine for anyone who wants to do good data science.
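A minimal sketch of this aggregation effect (Simpson's paradox) with synthetic, made-up data: within each age group cholesterol falls as exercise rises, but older people both exercise more and have higher baseline cholesterol, so the pooled correlation flips to positive.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic, made-up data: three age groups; older groups exercise more
# and have a higher baseline cholesterol level.
ages       = [30, 50, 70]
base_chol  = [170, 200, 230]   # baseline cholesterol per age group
base_hours = [2, 4, 6]         # typical weekly exercise hours per age group

hours, chol, group = [], [], []
for age, b_chol, b_hours in zip(ages, base_chol, base_hours):
    h = b_hours + rng.normal(0, 1, 200)
    # Within each age group, more exercise => LOWER cholesterol.
    c = b_chol - 5 * (h - b_hours) + rng.normal(0, 5, 200)
    hours.append(h); chol.append(c); group.append(np.full(200, age))

hours, chol, group = map(np.concatenate, (hours, chol, group))

print("pooled correlation (exercise vs cholesterol):",
      round(np.corrcoef(hours, chol)[0, 1], 2))           # positive
for age in ages:
    mask = group == age
    print(f"within age {age}:",
          round(np.corrcoef(hours[mask], chol[mask])[0, 1], 2))  # negative
```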
All in all, the point here is that most data and KPIs are biased, and there is no shortcut for good thinking before we jump into modelling. Domain knowledge and robust theories are key for successful applications. I love the quote "there is nothing more practical than a good theory", and I could not agree more. Listen first, make hypotheses, gather data, model, and be ready for bias.
Addressing bias
There is no single bullet-proof way to address bias, but a good checklist should be part of any decision engine:
- The source data should contain clear documentation on how it has been collected
- Variables that encode ethnicity, gender or other social groupings should be avoided, and the algorithm should not be able to infer them from the remaining features (see the proxy check sketched after this list)
- There has to be proper metric setting and tracking of program behaviour to detect unintended feedback loops or biases. Testing should go beyond software engineering and cover business and social dimensions.
- Audit for bugs in code and data
- Audit the algorithm's methodology and validate it against the literature and domain knowledge, particularly when insights contradict years of research
- Ensure teams building such programs are as diverse as possible in backgrounds and social groups
- Use ethics tools to analyse the application's compliance: Ethical Toolkit - Markkula Center for Applied Ethics (scu.edu)
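On the second item in the list, a minimal, hypothetical sketch of a proxy check: try to predict the protected attribute from the remaining features, and if a simple model does much better than chance, those features act as proxies and removing the explicit column was not enough. The feature names and the scikit-learn setup here are illustrative assumptions, not a prescribed method.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(42)

# Hypothetical data: "neutral" features for 1,000 individuals, plus the protected
# attribute (e.g. gender) that was deliberately excluded from the model inputs.
n = 1000
postcode_income = rng.normal(0, 1, n)
browsing_hours  = rng.normal(0, 1, n)
protected = (postcode_income + rng.normal(0, 1, n) > 0).astype(int)  # correlated proxy

X = np.column_stack([postcode_income, browsing_hours])

# If this AUC is far above 0.5 (chance for a balanced attribute), the remaining
# features leak the protected attribute.
scores = cross_val_score(LogisticRegression(), X, protected, cv=5, scoring="roc_auc")
print("proxy-prediction AUC:", round(scores.mean(), 2))
```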
The role of regulators
One probably cannot have a law for every potential ethical breach, but regulation should set guardrails quickly enough to ensure every company works within the rules and ethical standards of the society in which it operates. This is particularly important when behaving less ethically can mean a competitive advantage (being less costly, or more profitable in general).
Penalties are key to creating a business case for ethical ML/AI. This one is a good example:
Kids and other vulnerable groups are hit by AI/ML-driven apps, and just as we put limits on TV exposure, the same needs to happen with such apps and digital channels.
Clean air and clean drinking water are public goods which are nearly impossible to protect through individual market decisions, but rather require coordinated regulatory action. Similarly, many of the harms resulting from unintended consequences of misuses of technology involve public goods, such as a polluted information environment or deteriorated ambient privacy. Too often privacy is framed as an individual right, yet there are societal impacts to widespread surveillance (which would still be the case even if it was possible for a few individuals to opt out).
Many of the issues we are seeing in tech are actually human rights issues, such as when a biased algorithm recommends that Black defendants have longer prison sentences, when particular job ads are only shown to young people, or when police use facial recognition to identify protesters. The appropriate venue to address human rights issues is typically through the law.
We need both regulatory and legal changes, and the ethical behavior of individuals. Individual behavior change can’t address misaligned profit incentives, externalities (where corporations reap large profits while offloading their costs and harms to the broader society), or systemic failures. However, the law will never cover all edge cases, and it is important that individual software developers and data scientists are equipped to make ethical decisions in practice.
Concluding remarks
Those of us who work on and read about the actual developments in deep learning are not concerned about Terminator-like AI or people falling in love with Alexa; instead, we are concerned about the amount of usage built on a poor understanding of the negative impacts of current deep learning.
Humans and data have biases, but this is no reason to stop working with humans and machines to achieve our goals. Good usage of deep learning requires:
- Understanding of how the data has been gathered
- Domain knowledge of the field of focus
- A sufficiently concrete but holistic set of metrics to define success
- Audit and checks on data, code and impacts of application (on social, environmental and business dimensions)
- Regulations that direct innovation toward the common good
- Diverse teams in backgrounds (studies, academia, industry...) and societal groups
- A scientific and moral mindset: be rigorous, and don't be evil
With all of the above, we are much more likely to use this powerful and versatile tool wisely; it can help make a better world, if we design and use it correctly.