I had two main thoughts about this: self-regulation by the data science profession, and data literacy.
The promise of big data and artificial intelligence is at an all-time high, but by no means at its peak. The availability of data to mine is growing exponentially. And yet the data science community is still relatively small (compared with, say, accountants or bankers) and focused on scientific techniques.
Data science is making immense changes to the way people live, changes that will affect generations to come.
Reading these articles made me wonder: are data scientists proactively managing the ethical ramifications of the data they create, the algorithms they build, and the decisions made on the basis of their work?
This is a pivotal time in the evolution of data science ethics.
Data scientists must establish strong ethical foundations in their profession, both to ensure data science is used to make the world a better place and to act before government over-regulates a profession that did not do its part voluntarily.
As I explained in a past blog post, even Facebook is recognising that it is not just a technology tool, but has a real impact on the world: https://15-6762.ca.uts.edu.au/according-to-mark-zuckerberg-facebook-is-not-a-media-company/
Is now a good time for the profession to become a self-regulating membership body?
Will auditors soon start to audit machine learning algorithms? (They should!)
I came across this code of conduct: http://www.datascienceassn.org/code-of-conduct.html
Data literacy is also an interesting counterpoint to all of this.
I don't think it will be long before the general populace revolts against organisations that are careless with their data, and against opaque algorithms that determine their fate in a way no one can explain. People don't have blind faith anymore.
The University of Washington is now offering a course, “Calling Bullshit”, to improve the quality of science: http://callingbullshit.org/syllabus.html
In the mid-nineties, I read Wild Swans, an autobiographical story about three generations of Chinese women (the last being the author, Jung Chang) spanning about 100 years. If you want the abridged version, you can read it on Wikipedia: https://en.wikipedia.org/wiki/Wild_Swans
After reading what they endured being on the losing side of a war, and then being under Communist rule, I’m certain those three daughters of China would warn us to guard our personal information closely, and watch how it’s being used against us. Random pieces of data given away here and there could become information weapons in the wrong hands, and not just for us but for our descendants.
This is just one of the many sources of a general feeling of foreboding that I have about my personal data.
The other forces that make me think a slow train wreck is coming:
- Ease of dissemination of “information” due to social media
- Growing ease of storage
- Inability to destroy your own data; it’s immutable
- Diminishing interpretability of results
Below are some notes from the articles:
Privacy, anonymity, transparency, trust and responsibility concern data collection, curation, analysis and use.
What is data ethics? http://rsta.royalsocietypublishing.org/content/374/2083/20160360
Floridi and Taddeo talk about three axes of data science ethics:
Data ethics concerns the generation, recording, curation, processing, dissemination, sharing and use of the data.
Data science ethics is what is done with the data, i.e. the ethics of the algorithms and the ethics of the practices.
Regarding the algorithms, auditing the outcomes against a gold standard is essential to ensure they are achieving sensible and ethical results.
Regarding the practices, creating a professional code of conduct to ensure ethical practices.
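The articles do not say how such an audit should be run, so the following is only a rough sketch of one possible shape for it. It assumes you already hold a trusted set of gold-standard labels and a group attribute for each record; the function names, the group attribute and the 0.1 tolerance are all hypothetical choices made for illustration.

```python
from collections import defaultdict


def audit_against_gold(predictions, gold_labels, groups):
    """Return the agreement rate with the gold standard, overall and per group."""
    if not (len(predictions) == len(gold_labels) == len(groups)):
        raise ValueError("all three inputs must have the same length")

    hits = defaultdict(int)    # predictions that match the gold label, per group
    totals = defaultdict(int)  # records seen, per group
    for predicted, expected, group in zip(predictions, gold_labels, groups):
        totals[group] += 1
        if predicted == expected:
            hits[group] += 1

    per_group = {group: hits[group] / totals[group] for group in totals}
    overall = sum(hits.values()) / len(predictions) if predictions else 0.0
    return overall, per_group


def flag_groups(overall, per_group, tolerance=0.1):
    """List groups whose agreement rate trails the overall rate by more than tolerance.

    The tolerance is an assumed threshold, not a recognised standard.
    """
    return sorted(g for g, rate in per_group.items() if overall - rate > tolerance)
```

A flagged group is only a prompt for a human to look closer. Agreement with a gold standard says nothing about whether the gold standard itself is fair.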
3 Key Ethics Principles for Big Data and Data Science
Jay Taylor
collect minimal data and aggregate it
identify and scrub sensitive data
have a crisis management plan in place in case your insight backfires
above all, teach ethics!
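The first two principles above lend themselves to a small illustration. This is a minimal sketch, not a method from the article: the sensitive field names, the email pattern and both helper functions are assumptions made up for the example, and a real list of sensitive fields would have to come from your own data inventory.

```python
import re
from collections import Counter

# Hypothetical field names, assumed for this example only.
SENSITIVE_FIELDS = {"name", "email", "date_of_birth"}
# A deliberately simple email pattern; it will not catch every address format.
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def scrub_record(record):
    """Drop sensitive fields and mask email addresses left in free-text values."""
    cleaned = {}
    for field, value in record.items():
        if field in SENSITIVE_FIELDS:
            continue  # never store what you do not need
        if isinstance(value, str):
            value = EMAIL_PATTERN.sub("[redacted]", value)
        cleaned[field] = value
    return cleaned


def aggregate_by(records, field):
    """Keep only counts per value of one field, not the individual rows."""
    return Counter(record[field] for record in records)


if __name__ == "__main__":
    raw = {"name": "Ann", "email": "ann@example.com",
           "postcode": "2000", "note": "contact ann@example.com"}
    print(scrub_record(raw))
```

Pattern-based scrubbing like this is a starting point rather than a guarantee of anonymity, since combinations of harmless-looking fields can still identify a person.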