Data Science Ethics: my initial thoughts

I had two main thoughts about this: self-regulation by the data science profession, and data literacy.

The promise of big data and artificial intelligence is at an all-time high, but by no means at its peak. The availability of data to mine is growing exponentially. And yet the data science community is still relatively small (compared with, say, accountants or bankers) and focused on scientific techniques.

Data science is making immense changes to the way people live, changes that will affect generations to come.

Reading these articles made me wonder: are data scientists proactively managing the ethical ramifications of the data they create, the algorithms they build, and the decisions made on the basis of their work?

This is a pivotal time in the evolution of data science ethics.

Data scientists must establish strong ethical foundations in their profession to ensure data science is used to make the world a better place. They need to do this voluntarily, before government steps in and over-regulates the profession.

As I explain in a past blog post, even Facebook is recognising that it is not just a technology tool, but has a real impact on the world.

Is now a good time for the profession to become a self-regulating membership body?

Will auditors soon start to audit machine learning algorithms? (They should!)

I came across this code of conduct

Data literacy is also an interesting counterpoint to all of this.

I don't think it will be long before the general populace revolts against organisations that are careless with their data, and against opaque algorithms determining their fate in a way no one can explain. People don't have blind faith anymore.

The University of Washington is now offering this course, “Calling Bullshit”, to improve the quality of science.

In the mid-nineties, I read Wild Swans, an autobiographical story about three generations of Chinese women (the last being the author, Jung Chang) spanning about 100 years. If you want the abridged version, you can read it here on Wikipedia.

After reading what they endured being on the losing side of a war, and then being under Communist rule, I’m certain those three daughters of China would warn us to guard our personal information closely, and watch how it’s being used against us. Random pieces of data given away here and there could become information weapons in the wrong hands, not just for us but for our descendants.

This is just one of the many sources of a general feeling of foreboding that I have about my personal data.

The other forces that make me think a slow train wreck is coming:

  • Ease of dissemination of “information” due to social media
  • Growing ease of storage
  • Inability to destroy your own data; it’s immutable
  • Diminishing interpretability of results

Below are some notes from the articles.

Privacy, anonymity, transparency, trust and responsibility concern data collection, curation, analysis and use.

What is data ethics?

Floridi and Taddeo talk about three axes of data science ethics

Data ethics concerns the generation, recording, curation, processing, dissemination, sharing and use of the data.

Data science ethics is what is done with the data, i.e. the ethics of the algorithms and the ethics of the practices.

Regarding the algorithms, auditing the outcomes against a gold standard is essential, to ensure they are achieving sensible and ethical results.
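To make the auditing idea a little more concrete, here is a minimal sketch in Python of what comparing an algorithm's outcomes against a gold standard could look like. It is my own illustration, not something taken from the articles: the function names (audit_against_gold, accuracy_gap), the idea of breaking results down by group, and the sample data are all assumptions.

```python
from collections import defaultdict


def audit_against_gold(predictions, gold, groups):
    """Compare predictions with gold-standard labels, per group.

    Hypothetical helper: returns a dict mapping each group to the
    fraction of its predictions that match the gold standard.
    """
    if not (len(predictions) == len(gold) == len(groups)):
        raise ValueError("predictions, gold and groups must have the same length")

    correct = defaultdict(int)
    total = defaultdict(int)
    for pred, truth, group in zip(predictions, gold, groups):
        total[group] += 1
        if pred == truth:
            correct[group] += 1
    return {group: correct[group] / total[group] for group in total}


def accuracy_gap(per_group_accuracy):
    """Difference between the best and worst served groups."""
    values = list(per_group_accuracy.values())
    return max(values) - min(values)


if __name__ == "__main__":
    # Made-up data, purely for illustration.
    predictions = [1, 0, 1, 1]
    gold = [1, 0, 0, 1]
    groups = ["a", "a", "b", "b"]

    per_group = audit_against_gold(predictions, gold, groups)
    print(per_group)
    print(accuracy_gap(per_group))
```

A large gap between groups would not prove the algorithm is unethical, but it is the kind of signal an auditor would want explained.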

Regarding the practices, creating a professional code of conduct helps to ensure they are ethical.

3 Key Ethics Principles for Big Data and Data Science

Jay Taylor

Collect minimal data, and aggregate it

Identify and scrub sensitive data

Have a crisis management plan in place in case your insight backfires

Above all, teach ethics!
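The first two principles above are the ones that translate most directly into code. Below is a rough Python sketch of what they might look like in practice. None of it comes from the article: the helper names (minimise, scrub_emails, aggregate_counts), the choice of email addresses as the sensitive data, and the simple regular expression are my own assumptions, and real scrubbing would need to cover far more than emails.

```python
import re
from collections import Counter

# Deliberately simple pattern for illustration; it will not catch every
# valid email address and is not a substitute for a proper PII scanner.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def minimise(record, allowed_fields):
    """Collect minimal: keep only the fields we actually need."""
    return {key: value for key, value in record.items() if key in allowed_fields}


def scrub_emails(text):
    """Scrub sensitive data: replace anything that looks like an email."""
    return EMAIL_RE.sub("[redacted]", text)


def aggregate_counts(records, field):
    """Aggregate: report counts per value instead of individual rows."""
    return dict(Counter(r[field] for r in records if field in r))


if __name__ == "__main__":
    # Made-up records, purely for illustration.
    records = [
        {"name": "A", "region": "north", "note": "mail a@example.com"},
        {"name": "B", "region": "north", "note": "no contact details"},
        {"name": "C", "region": "south", "note": "use c.d@example.org please"},
    ]
    slim = [minimise(r, {"region", "note"}) for r in records]
    for r in slim:
        r["note"] = scrub_emails(r["note"])
    print(slim)
    print(aggregate_counts(slim, "region"))
```

The order matters here: dropping fields first means there is less to scrub, and aggregating last means individual rows never need to leave the pipeline.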
