New Tech Data as Court Evidence?

Or why judges are as cautious as the Amish when it comes to admissible evidence

“Sharing a Ride” by Forsaken Fotos is licensed under CC BY 2.0

This is my reaction to material we discussed in my CMGT530 class at Annenberg: Social Dynamics of Communication Technology. The material was Czitrom (Czitrom, 1982) and the film Devil’s Playground and its Amish subjects (Walker, 2002).

The Amish people have a philosophy of Ordnung, under which they try to slow down or reject technology that may pollute their traditions (Amish America, 2019). Czitrom wrote of the telegraph’s impact on macro issues like corporate and government power (Czitrom, 1982). This made me think about today’s technology and how it was used in a murder case in California, described in the October 2019 issue of Wired magazine (Smiley, 2019). It raises the question of whether admitting data from modern devices as evidence puts the underlying tenet of “innocent until proven guilty” in criminal proceedings at risk.

In Wired’s October 2019 issue, I read about Tony Aiello, a frail 4’11” Californian in his 90s who died last month in jail awaiting trial (as updated in the online story) (Smiley, 2019). Accused of brutally murdering his stepdaughter Karen, he died before his guilt or innocence could be determined (Smiley, 2019). A neighbor’s doorbell camera placed Tony at the scene for a crucial 20-minute period during which Karen’s Fitbit registered her heart rate accelerating and then dropping to nothing at all. DNA and other evidence led to Tony being put in jail.

I have previously researched how wide DNA database searches and wide facial recognition database searches could lead to coincidental matches (a la the birthday paradox) and false positives, resulting in innocent people having to defend themselves in court and even serving prison time (Keys, 2017). However, this case was different, as Tony was a suspect very early on. Nevertheless, device data and expert testimony can be incomprehensible to jury members yet accepted without understanding, flaws and all, even when no motive has been established (Gibson, 2017).

With each new technology, it is really important to establish the characteristics of the devices and the quality of their data before admitting that data as evidence, if “innocent until proven guilty” and justice are to prevail in our courts in future.

Citations

Amish America. (2019). Do Amish use technology? Retrieved October 23, 2019, from Amishamerica.com website: http://amishamerica.com/do-amish-use-technology/

Czitrom, D. J. (1982). Media and the American Mind From Morse to McLuhan. Chapel Hill: University of North Carolina Press.

Gibson, A.J. 2017, On the face of it: CCTV images, recognition evidence and criminal prosecutions in New South Wales, PhD Thesis.

Keys, T. (2017). Image Processing in The Age of Surveillance [Personal Blog]. Retrieved October 23, 2019, from Tracy Keys website: https://tracykeys.net/2019/10/23/image-processing-in-the-age-of-surveillance/

Smiley, L. (2019, October). A Brutal Murder, a Wearable Witness, and an Unlikely Suspect. Wired, 2019(27.10). Retrieved from https://www.wired.com/story/telltale-heart-fitbit-murder/

Walker, L. (2002). Devil’s Playground—Full Movie | Snagfilms. Retrieved from https://www.youtube.com/watch?v=I0h4nRYZ8d0

Image Processing in The Age of Surveillance

TL;DR

This post examines how three forces shape the use of image recognition in surveillance, crime prevention and criminal prosecution: the explosion of individual images available online, the accelerating image processing capabilities of data science, and pressure on individual rights and freedoms. It covers the potential risks of relying on this kind of visual evidence, and offers recommendations to reduce those risks to society.

We are living in an “Age of Surveillance”

Surveillance is an age-old tool of crime prevention, and through the analysis of video and still
images, provides the basis for prosecution in some cases today for individual and national security
crimes.
Despite strong lobbying against it, general surveillance by government and corporations has seen an
unprecedented increase in recent years (New South Wales Law Reform Commission 2001). This
surveillance occurs at your workplace, on the street, in public venues, in supermarkets, at the airport,
but also through analysis of what you post publicly on the internet through social media.
The ability to conduct surveillance effectively is driven by three forces: the explosion in images
available in databases, the image processing capability of data science and the erosion of individual
rights.

Image Databases are growing exponentially

The number of databases with videos and images of people is growing exponentially, firstly due to
the increased use of CCTV for general surveillance.
CCTV has been around since the 1960s, but it has outgrown being closed circuit and on a television,
and is now any “monitoring system that uses video cameras .. aimed at preventing and detecting
crime through general (not targeted) surveillance.” (Gibson 2017). Governments at all levels use
CCTV to deter and detect crime, and it’s not just fixed cameras but also cameras attached to the
bodies of law enforcement agents.
Whilst surveillance is an unpleasant fact, many corporations and public-sector organisations gather
data on individuals for other purposes, such as marketing, customer service, problem solving, and
product development. Individuals often willingly consent to the collection of this data, in return for
their services. However, many individuals do not understand the terms and conditions they are
agreeing to when providing their consent (Sedenberg & Hoffmann 2016).
Indeed, as our lives are increasingly conducted online, and cloud computing makes storage cheaper,
and faster, our activities are tracked, recorded and stored by corporations and governments (Hern
2016; boyd & Crawford 2012; Sedenberg & Hoffmann 2016).
As a result of general surveillance and the voluntary provision of images and video over social media,
your image is now stored in databases online by governments and corporations.

Image Processing capability is growing rapidly also

The capability to analyse all these images has made great progress in recent years also, making it
possible for machines to process petabytes of surveillance images to identify individuals.
Over the last five years, using deep learning convolutional neural networks (ConvNets), image
processing capabilities have progressed from image classification tasks (Krizhevsky, Sutskever &
Hinton 2012) using large image databases like ImageNet, to human re-identification using Siamese
neural networks trained with a contrastive objective, which can accurately recognise faces they have
only seen once before, and in real time (Koch, Zemel & Salakhutdinov 2015; Varior, Haloi & Wang 2016).
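To make the Siamese idea concrete, here is a minimal sketch of a pairwise contrastive objective of the kind commonly used to train such networks. It assumes the two images have already been mapped to embedding vectors by the shared network. The function names, the margin value and the toy vectors are illustrative assumptions, not taken from the cited papers.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def contrastive_loss(emb_a, emb_b, same_person, margin=1.0):
    """Pairwise contrastive loss on two embeddings.

    Matching pairs are penalised by their squared distance.
    Non-matching pairs are penalised only if they sit closer than `margin`.
    """
    d = euclidean(emb_a, emb_b)
    if same_person:
        return d ** 2
    return max(0.0, margin - d) ** 2


# Toy embeddings (hypothetical values standing in for network outputs).
anchor = [0.0, 0.0]
near = [0.3, 0.4]  # about 0.5 away from the anchor
far = [3.0, 4.0]   # 5.0 away from the anchor

print(contrastive_loss(anchor, near, same_person=True))   # small penalty
print(contrastive_loss(anchor, far, same_person=False))   # no penalty
print(contrastive_loss(anchor, far, same_person=True))    # large penalty
```

Training pushes images of the same person together and images of different people at least a margin apart, which is why a face seen only once can later be matched by distance alone.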
The YOLO (You Only Look Once) object detection and classification network achieves fast
processing speeds in real time and competitive accuracy (Redmon et al. 2015).
Recurrent neural networks such as long short term memory networks have also proved able to
identify objects in video sequences and caption them (Lipton, Berkowitz & Elkan 2015), however this
is not in real time.
In 2014, Ian Goodfellow and colleagues introduced generative adversarial networks (GANs), where two
neural networks are trained simultaneously, one to generate artificial images, and the other to
discriminate between real images and generated ones (Goodfellow et al. 2014).
And in the last two years, both Google and Facebook Artificial Intelligence teams have independently
developed the ability to create images using ConvNets (Mordvintsev, Olah & Tyka 2015; Chintala
2015).
Lastly, the processing power available to data scientists is growing rapidly, through advancements in
graphic processing unit (GPU) speed and the availability of cloud computing, enabling analysis of
extremely large data sets without huge investment in compute power.
The speed of development is incredibly fast in this deep learning field, and it is very conceivable that
products will be developed in the next 10 years that could productionise and scale these automated
image recognition and generation capabilities for use by corporations, government and law
enforcement for use in surveillance for crime prevention, detection and prosecution.
The ready availability of image databases, and the advancements in data science image processing
capability, are not enough without the right of corporations and governments to use this data for
general (not targeted) surveillance. This third force is also increasingly becoming a reality in recent
years.

Erosion of Individual Rights

There are several ways our rights are being eroded.
Individual rights to privacy are being eroded voluntarily, as we give away licenses to our own images,
and involuntarily through legislation or court decisions enacting crime prevention and national
security measures.
More images of our daily life are captured through our phones and posted to social media.
Technically, you own these images and can control their usage (Wikipedia 2017; US Copyright Office
n.d.; Orlowski n.d.).
However, while you own the copyright of the images you have created, you have probably already
given Facebook and Amazon permission to profit from your image and images you own, through a
very wide-ranging license to store and use it (Facebook n.d.).
Private organisations use the data gathered on their users for research; however, these
organisations fall outside the ethics requirements that government imposes on education and health institutions
(Sedenberg & Hoffmann 2016). The profit motive of these companies could undermine privacy and
security of your data (Sedenberg & Hoffmann 2016).
On the personal data level, there are some serious attempts at protecting the rights of the
individual. The General Data Protection Regulation of the European Union, which comes into effect
in May 2018, covers all data captured from EU citizens. It codifies the “right to be forgotten”, and “the
right to an explanation” for the result of any algorithms (Goodman & Flaxman 2016). However,
these regulations do not seem to matter when it comes to national security.
Edward Snowden and Wikileaks revealed that organisations like Yahoo and Google have
been compelled in the United States courts and in Europe to hand over your data to government
bodies for national security surveillance (Wikipedia 2018). It is quite feasible that Apple, Facebook
and Amazon have the same obligations, and we just don’t know about it yet.
The use of video cameras for general surveillance erodes an individual’s right to privacy, which
although reduced in public, is still expected to some degree due to people’s perception of the “veil
of anonymity” (Gibson 2017). It also indirectly erodes freedom of speech, as people are unable to
express themselves without fear of reprisal (Gibson 2017).
People often say they have nothing to hide when it comes to fighting against general surveillance,
but this is predicated on society and government keeping the same values of today into the future.
Once something is recorded online, either in image or text, it is there forever and could be used
against you. This is something people from totalitarian regimes would be able to tell Westerners.
Having online databases of images and advanced processing power combined with the erosion of
individual right to privacy make the perfect conditions for an explosion in the use of image
processing in criminal prevention, detection and prosecution. The next section focuses on the
current and future use of image processing as a form of visual evidence in criminal prosecution.

Uses of Image Processing in Criminal Prosecution

Video and images are a form of visual evidence, whose purpose is to provide positive visual
identification evidence (i.e. it is the same person), circumstantial identification evidence (i.e. it is a
similar person) or recognition evidence (I know that it is the same person in the image) that supports
the case to prove that the accused is the offender (Gibson 2017).
Computer image processing provides visual evidence in a number of ways. Firstly, its sheer
processing power enables a very wide and deep search for this evidence within image databases or
millions of hours of video.
It also has useful capabilities in gathering video evidence. It can detect individuals across a range of
different surveillance cameras as the offender moves through the landscape. Algorithms can be used
to “sharpen” blurry images. YOLO image recognition can enable a person’s face to be found in a
huge database of images using neural network architecture.
Variable lighting, recording quality, movement of the camera, obstructions to line of sight, and other
factors make for many interpretations of an image (Henderson et al. 2015). For this reason, an
expert in “facial mapping” or “body mapping” usually examines the image and testifies in the court
room, where they can be cross examined (Gibson 2017). The expert may not positively identify the
defendant, so at other times, it is up to the juror to determine if the offender and the defendant are
the same.
In future, as the database of images grows and the capability to use computer vision processing
accelerates, I can imagine a huge facial image database similar to the DNA database collated in the
USA in states like California (LA Times 2012), where instead of DNA samples, CCTV video images
from a cold case will be matched to the database in order to track down a suspect.
However, unlike DNA, where few people have their DNA recorded in the database, we are moving
towards the entire population’s faces being recorded online somewhere, and most likely one day in
the hands of law enforcement.
What can we learn about the risks of the use of DNA forensic evidence and CCTV evidence to be sure
that visual evidence procured through image processing will not create false positives and injustice?

Limitations of Visual Evidence in Criminal Prosecution

We begin by understanding the limitations of visual evidence for the jurors who must evaluate it in
criminal trials.
Video is a constructed medium, which can be interpreted in more than one, and even opposing,
ways in the court room. After the lawyers for the 4 police officers accused of beating Rodney King
deconstructed the eye witness video, 3 of the 4 were acquitted, yet public outcry was so intense that
it led to the LA Riots (Gibson 2017).
Unlike witnesses, video and images cannot be cross-examined; however, they are absorbed more
readily by the jury than witnesses, who may be boring or too technical (Gibson 2017).
When evidence is presented by an expert, jurors can suffer from the “white coat effect”, which
prejudices the juror to weight the expert’s evidence more heavily (Gibson 2017).
Therefore, visual evidence is fraught with a lot of the issues that face forensic evidence more
broadly, including DNA evidence.
In the USA, since 1994 the FBI have been using the Combined DNA Index System (CODIS): a
computer program that enables the comparison of DNA profiles in databases at the local, state, and
national level (Morris 2010). Recently, CODIS has been used to search for suspects using DNA
matches on cold cases, and a growing proportion of criminal cases are relying on these cold DNA
database hits.
Worryingly, there have been many miscarriages of justice in which match statistics were
wildly wrong, yet heavily overweighted by the jury despite the accused having no means, motive or
opportunity (Murphy 2015).
We must explore the limitations of DNA evidence to understand what limitations there could be if
image searches were used like this in the future.
Like visual evidence, jurors must evaluate DNA evidence in criminal trials. DNA evidence is
accompanied by random match probability (RMP) statistics: the likelihood of finding a DNA match by
chance.
There are many differences between the databases in CODIS: the collection process, accuracy of
samples, the criteria for inclusion in the database and the statistical methods and programs used for
analysis (Morris 2010). These differences can have very different impacts on match statistics.
Research has shown that a juror’s interpretation of the likelihood of a coincidental match also
depends on how these statistics are presented (Morris 2010). The statistics are complicated, but
seemingly rare events can have surprisingly high likelihood if you present the probability of
someone, somewhere matching, rather than the odds of a certain person matching. For example,
the chance of any two people in a room having the same birth day and month is greater than 50% if
there are more than 22 people in the room. This is analogous to the database match probability. When
the Arizona DNA database was searched for intra-database record-to-record matches, the search found
multiple occurrences of the same DNA profile from different people.
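The birthday figure above can be checked with a few lines of arithmetic. The sketch below assumes 365 equally likely birthdays and independence between people; the function name is my own.

```python
def shared_birthday_probability(n_people, days=365):
    """Probability that at least two of n_people share a birthday.

    Assumes birthdays are independent and uniform over `days`.
    """
    p_all_distinct = 1.0
    for i in range(n_people):
        # The (i+1)-th person must avoid the i birthdays already taken.
        p_all_distinct *= (days - i) / days
    return 1.0 - p_all_distinct


for n in (10, 22, 23, 50):
    print(n, round(shared_birthday_probability(n), 3))
```

Under these assumptions, 23 is the first group size at which the probability passes 50%, even though the chance that one named person shares a birthday with another named person is only 1 in 365.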
The wider the search, the greater the likelihood of a coincidental match, and Type I errors (false
positives). Therefore, coincidental matches would be much more likely in a national or even global
database of faces. Databases such as CODIS also suffer from ascertainment bias, due to their nonrandom sampling.
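A rough sketch of why wider searches produce more coincidental matches: if each comparison against an unrelated record has a small chance of matching, the chance of at least one match grows quickly with database size. The random match probability of one in a million and the database sizes below are hypothetical, and the calculation assumes every comparison is independent, which real databases (with relatives and ascertainment bias) do not satisfy.

```python
def coincidental_match_probability(rmp, database_size):
    """Chance that at least one unrelated record matches by coincidence.

    Assumes each comparison is independent with match probability `rmp`.
    """
    return 1.0 - (1.0 - rmp) ** database_size


def expected_coincidental_matches(rmp, database_size):
    """Average number of coincidental matches in one full search."""
    return rmp * database_size


rmp = 1e-6  # hypothetical random match probability
for size in (10_000, 1_000_000, 100_000_000):
    print(
        size,
        round(coincidental_match_probability(rmp, size), 4),
        expected_coincidental_matches(rmp, size),
    )
```

With these made-up numbers, a search of ten thousand records will rarely throw up a coincidental match, while a search of a hundred million records is almost certain to.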
There are currently 4 different ways of presenting these match statistics (3 of them court approved),
with research finding widely different outcomes in terms of verdict (Morris 2010). Jurors fall prey to
the prosecutor’s fallacy, “drawing the inappropriate conclusion that a particular probability of chance
occurrence is the same as the likelihood that the person incriminated by the statistics is innocent of
the crime” (Morris 2010).
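The gap between the two probabilities in the prosecutor’s fallacy can be shown with Bayes’ rule. This is a simplified sketch with hypothetical numbers: it assumes the true source always matches, that the only other evidence is membership of a pool of ten million possible sources, and a random match probability of one in a million.

```python
def probability_source_given_match(rmp, prior_source):
    """Bayes' rule: P(person is the source | their profile matches).

    Assumes the true source matches with certainty and a non-source
    matches with probability `rmp`.
    """
    numerator = prior_source
    denominator = prior_source + (1.0 - prior_source) * rmp
    return numerator / denominator


rmp = 1e-6                 # hypothetical random match probability
prior = 1 / 10_000_000     # hypothetical: one of ten million possible sources

posterior = probability_source_given_match(rmp, prior)
print("Fallacious reading, P(innocent):", rmp)
print("Bayesian reading, P(not the source):", round(1 - posterior, 3))
```

On these assumptions the match alone leaves roughly a 90% chance that the matched person is not the source, which is nowhere near the one-in-a-million figure a juror might hear and take as the chance of innocence.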
How can data scientists prevent their image databases and research from being similarly
misunderstood and misrepresented?

Recommendations

The field of forensic evidence and especially DNA and visual evidence is evolving, and data scientists
must conduct themselves today in a way to prevent the pitfalls of injustice now and in the future.
Database standardisation is essential in terms of quality of images, compression and formats, plus
the data dictionary used.
Data scientists must ensure that their work is statistically sound and agree on a common methodology.
They must search for opposing evidence, to avoid the trap of confirmation bias. They must form a
close relationship with legal professionals to work in forensics.
Informed consent must be gained from users to use their images in this way. To protect their privacy
and justice, society must become more data literate as these issues are having a greater impact in
every part of our lives, even in criminal justice.

Bibliography

boyd, danah & Crawford, K. 2012, ‘Critical Questions for Big Data’, Information, Communication &
Society, vol. 15, no. 5, pp. 662–79.
Chintala, S. 2015, The Eyescream Project: NeuralNets dreaming natural images, viewed 14 January
2018, <http://soumith.ch/eyescream/>.
Facebook n.d., ‘Facebook Terms of service’, facebook.com, viewed 17 December 2017,
<https://www.facebook.com/legal/terms>.
Gibson, A.J. 2017, On the face of it: CCTV images, recognition evidence and criminal prosecutions in
New South Wales, PhD Thesis.
Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A. &
Bengio, Y. 2014, ‘Generative Adversarial Networks’, arXiv:1406.2661 [cs, stat], viewed 14
January 2018, <http://arxiv.org/abs/1406.2661>.
Goodman, B. & Flaxman, S. 2016, ‘European Union regulations on algorithmic decision-making and a
‘right to explanation’’, arXiv:1606.08813 [cs, stat], viewed 13 November 2017,
<http://arxiv.org/abs/1606.08813>.
Henderson, C., Blasi, S.G., Sobhani, F. & Izquierdo, E. 2015, ‘On the impurity of street-scene video
footage’, IET Conference Proceedings, The Institution of Engineering & Technology,
Stevenage, United Kingdom, viewed 21 January 2018,
<https://search.proquest.com/docview/1776480046/abstract/3C556FDE82424A67PQ/7>.
Hern, A. 2016, ‘Your battery status is being used to track you online’, The Guardian, 2 August, viewed
30 December 2017, <http://www.theguardian.com/technology/2016/aug/02/batterystatus-indicators-tracking-online>.
Koch, G., Zemel, R. & Salakhutdinov, R. 2015, ‘Siamese neural networks for one-shot image
recognition’, ICML Deep Learning Workshop.
Krizhevsky, A., Sutskever, I. & Hinton, G.E. 2012, ‘Imagenet classification with deep convolutional
neural networks’, Advances in neural information processing systems, pp. 1097–1105.
LA Times, T.E. 2012, ‘Playing fast and loose with DNA’, Los Angeles Times, 31 July, viewed 13 January
2018, <http://articles.latimes.com/2012/jul/31/opinion/la-ed-dna-database-california-20120731>.
Lipton, Z.C., Berkowitz, J. & Elkan, C. 2015, ‘A Critical Review of Recurrent Neural Networks for
Sequence Learning’, arXiv:1506.00019 [cs], viewed 5 November 2017,
<http://arxiv.org/abs/1506.00019>.
Mordvintsev, A., Olah, C. & Tyka, M. 2015, ‘Inceptionism: Going Deeper into Neural Networks’,
Research Blog, viewed 17 December 2017,
<https://research.googleblog.com/2015/06/inceptionism-going-deeper-into-neural.html>.
Morris, E.K. 2010, Statistical probabilities in a forensic context: How do jurors weigh the likelihood of
coincidence?, Ph.D., University of California, Irvine, California, United States, viewed 13
January 2018,
<https://search.proquest.com/docview/755686007/abstract/7A00420D28404DF2PQ/2>.
Murphy, E. 2015, Inside the cell: the dark side of forensic DNA, First., Nation Books, New York, NY,
USA.
New South Wales Law Reform Commission 2001, Surveillance: an interim report, New South Wales
Law Reform Commission, Sydney.
OfficerJoeK-9 n.d., ‘Joi’, Off-world: The Blade Runner Wiki, viewed 30 December 2017,
<http://bladerunner.wikia.com/wiki/Joi>.
Orlowski, A. n.d., ‘Cracking copyright law: How a simian selfie stunt could make a monkey out of
Wikipedia’, The Register.
Redmon, J., Divvala, S., Girshick, R. & Farhadi, A. 2015, ‘You Only Look Once: Unified, Real-Time
Object Detection’, arXiv:1506.02640 [cs], viewed 14 January 2018,
<http://arxiv.org/abs/1506.02640>.
Sedenberg, E. & Hoffmann, A.L. 2016, ‘Recovering the History of Informed Consent for Data Science
and Internet Industry Research Ethics’, arXiv:1609.03266 [cs], viewed 17 December 2017,
<http://arxiv.org/abs/1609.03266>.
US Copyright Office n.d., Compendium II of Copyright Office Practices, viewed 17 December 2017,
<http://www.copyrightcompendium.com/>.
Varior, R.R., Haloi, M. & Wang, G. 2016, ‘Gated Siamese Convolutional Neural Network Architecture
for Human Re-Identification’, arXiv:1607.08378 [cs], viewed 13 January 2018,
<http://arxiv.org/abs/1607.08378>.
Wikipedia 2018, ‘Edward Snowden’, Wikipedia, viewed 13 January 2018,
<https://en.wikipedia.org/w/index.php?title=Edward_Snowden&oldid=819863748>.
Wikipedia 2017, ‘Personality rights’, Wikipedia, viewed 30 December 2017,
<https://en.wikipedia.org/w/index.php?title=Personality_rights&oldid=814604845>.

Data and Innovation in Slow Fashion

How the Sustainable Fashion Industry can learn from Fast Fashion

The Fashion Industry: Fast and Slow

The fast fashion industry, which includes brands such as H&M, Zara, Forever21, Asos and TopShop, serves up new products at historically low prices and at an ever-growing pace: within weeks of New York Fashion Week or of being worn by the latest It girl (Kindred, 2015). Through data-driven marketing, rapid product development and agile supply chain management, annual product lines have increased tenfold, product life cycles have decreased from months to weeks and even days (Sull and Turconi, 2008), and customers can consume clothing in an on-demand, disposable manner (Pal 2016).

As it feeds insatiable consumer demand, fast fashion is considered by some to epitomise materialistic consumption (Kim et al 2013). The rapid growth comes at a social and environmental cost: unethical labour practices such as child labour and sweatshops with poor health and safety, excessive waste from discarded clothing, and production methods that pollute the land, water and air (Kim et al 2013).

This represents a new opportunity for the sustainable fashion industry, as anti-consumerism is turning some consumers away from fast fashion (Kim et al 2013). The industry, dubbed “slow fashion” akin to the slow food movement, flies in the face of the fast fashion trend, as it pursues the “triple bottom line” objectives of economic prosperity, social justice and environmental quality (Elkington 1994).

This paper will explore how slow fashion can follow the lessons of fast fashion to transform itself using data in order to achieve its sustainability objectives.

Key challenges of ‘Slow Fashion’ and how to address them through data

This paper will discuss how, in order to achieve their objectives, the Slow Fashion industry must overcome 3 key challenges: (1) increase customer lifetime value, (2) reduce waste, and (3) prove a sustainable supply chain.

1. Grow Customer base and increase Customer Lifetime Value

The business model of Slow Fashion is characterised by higher unit costs and lower sales volumes. Handmade, artisanal items take time and skill to create, so they cost more per item, but they tend to be more timeless and more durable, and therefore last longer (Clark, 2008).

As a result, Slow Fashion must find innovative ways to engage consumers over a long period of time to increase customer lifetime value (CLV), such as gathering and synthesizing data about what customers want (Zarley Watson and Yan 2013).

Customer Segmentation

Slow Fashion can mimic its nemesis the fast fashion industry and obtain consumer demand data throughout the sales cycle as shown in Figure 1 (Kindred 2015). This enables a better knowledge of customers and supports agile product development (Sull and Turconi, 2008).

Consumer demand data is obtained through surveys, quizzes, competitions, social media mining, A/B testing of campaigns, using cookie tracking, wish lists, browsing behaviour, shopping cart and purchasing history, and participation in loyalty programmes (Kindred 2015).

Once the data is gathered, unsupervised machine learning algorithms such as “K-Means Clustering” (Hartigan and Wong 1979) are used to form meaningful consumer segments that identify and target the styles and preferences of existing (and new) consumers. These segments, combined with identity management API services, are used to recognise and know customers across devices (mobile, desktop, in store), tailor marketing campaigns, help them discover new product ranges, and make recommendations to suit their style and stage of the customer life cycle (Kindred 2015). This in turn drives customer loyalty and hence CLV.
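As a minimal sketch of the segmentation step, the code below clusters customers on two made-up features. It uses the plain Lloyd iteration rather than the Hartigan and Wong variant cited above, starts from the first k points so the result is repeatable, and skips feature scaling, which a real segmentation would need so that spend does not dominate the distance. The features and values are hypothetical.

```python
def kmeans(points, k, iterations=20):
    """Lloyd's k-means on a list of equal-length numeric tuples.

    Centroids start at the first k points so the result is repeatable.
    """
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels, centroids


# Hypothetical customers: (orders per year, average spend per order).
customers = [(1, 220), (12, 35), (2, 200), (10, 40), (1, 250), (11, 30)]
labels, centroids = kmeans(customers, k=2)
print(labels)
print(centroids)
```

Here the two segments separate occasional buyers of expensive pieces from frequent buyers of cheaper ones, the kind of split a brand could use to tailor campaigns.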

Product Discovery and Recommendations

Recommendation algorithms are being built for product discovery, which, when accurate, encourage more frequent purchases from a customer, increasing their average revenue per user (ARPU) and average products per user (APPU). Recommendations can be suggested to consumers when online, via eDM, or returned as search results, and may utilise natural language processing and visual search (although these are both nascent technologies) (Kindred 2015).

By building learning mechanisms based on artificial intelligence (such as a feedback loop on click-through rates, among other indicators), the accuracy of recommendation algorithms can improve, in turn driving better customer retention and repeat visits (in store and online) in a cost-efficient way.
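One very simple form such a feedback loop could take is sketched below: each recommendation shown is logged as an impression, each click updates the product’s click-through rate, and the next ranking uses the updated rates. The class name, product names and smoothing prior are hypothetical; the prior simply stops a product with very few impressions from being ranked on noise.

```python
from collections import defaultdict


class ClickFeedbackRanker:
    """Ranks products by a smoothed click-through rate learned from feedback."""

    def __init__(self, prior_clicks=1.0, prior_impressions=20.0):
        # The prior acts like 20 imaginary impressions with 1 click.
        self.prior_clicks = prior_clicks
        self.prior_impressions = prior_impressions
        self.clicks = defaultdict(int)
        self.impressions = defaultdict(int)

    def record(self, product_id, clicked):
        """Log one shown recommendation and whether it was clicked."""
        self.impressions[product_id] += 1
        if clicked:
            self.clicks[product_id] += 1

    def score(self, product_id):
        """Smoothed click-through rate for a product."""
        return (self.clicks[product_id] + self.prior_clicks) / (
            self.impressions[product_id] + self.prior_impressions
        )

    def recommend(self, candidates, n=3):
        """Return the n candidates with the highest smoothed rate."""
        return sorted(candidates, key=self.score, reverse=True)[:n]


ranker = ClickFeedbackRanker()
for i in range(100):
    ranker.record("linen-shirt", clicked=(i % 10 == 0))  # 10 clicks in 100
    ranker.record("wool-coat", clicked=(i % 50 == 0))    # 2 clicks in 100

print(ranker.recommend(["wool-coat", "hemp-tote", "linen-shirt"]))
```

The unseen product lands between the proven performer and the proven underperformer, so new items still get a chance to be shown and gather their own feedback.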

Figure 1 Fashion Data Cycle (Kindred 2015)

However, categorising and classifying customer preferences and inventory into a metadata taxonomy and structure to enable natural language processing and visual search can be challenging. There is no universal taxonomy for fashion styles, colours and preferences, from a product or a customer point of view (Kindred 2015). For example, the slightest change in hue of colour or the length of an item can make all the difference in what is “on trend”. In addition, the visual images of collections and related metadata are intellectual property and brands are unwilling to release this information, which limits the data available for these services (Kindred 2015).

For an organisation to become more data intensive, a significant change in mind set and skill set is required in order to shift the company’s culture towards data-driven decisions. The data science skills required to achieve this customer segmentation are in short supply and might not be as accessible to small manufacturers, which Slow Fashion entrepreneurs usually are.

2. Use of data to reduce waste through optimisation

By delivering the right product, in the right quantity to the right location to the right customer, the fashion industry has an opportunity to reduce waste (Sull and Turconi 2008). This is even more crucial for the Slow Fashion industry, as the lead times are naturally longer due to sustainable production practices, the supply chain is less agile and conservation of natural resources is also an objective. Utilising data to improve sales forecasts and optimise systems can reduce waste for slow fashion brands.

Accurate Sales Forecasts

Sales forecasts must be accurate at an individual SKU level in order to avoid stock-outs and discounting; however, this is a challenge as demand is highly uncertain and seasonal (Guo et al, 2011).

Through the data gathered on consumer preferences throughout the product data cycle (Figure 1), and through tracking market signals, i.e. keyword mentions in media, influencers’ and brands’ online channels, and visual image search and recognition, Slow Fashion can build more accurate demand forecasts, as fast fashion players do. Statistical techniques and machine learning present an opportunity to take hundreds of signals in real time and translate them into a product forecast (Guo et al, 2011).
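As a deliberately small illustration of the statistical end of this spectrum, the sketch below forecasts next week’s sales for a single SKU with simple exponential smoothing. It uses only past sales, ignores seasonality and external signals, and is far simpler than the multi-signal approaches described above. The sales figures and smoothing weight are hypothetical.

```python
def exponential_smoothing_forecast(history, alpha=0.5):
    """One-step-ahead forecast via simple exponential smoothing.

    `alpha` near 1 reacts quickly to recent sales; near 0 it changes slowly.
    """
    if not history:
        raise ValueError("history must contain at least one observation")
    level = history[0]
    for observed in history[1:]:
        # Blend the newest observation with the running level.
        level = alpha * observed + (1 - alpha) * level
    return level


weekly_units = [20, 24, 22, 30, 28]  # hypothetical weekly sales for one SKU
print(exponential_smoothing_forecast(weekly_units))
```

A higher smoothing weight would follow a sudden spike more closely, which is exactly the judgement call between anomaly and emerging trend that the next paragraphs leave to people.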

However, analysing and making sense of this data cannot be performed by machines alone as fashion is characterised by subjectivity, extreme fluctuations in demand and contextual relevance (Kindred 2015). For example, the hands-on role of Zara’s store managers has been critical to the success of Zara’s agile supply chain (Sull and Turconi 2008).

Human understanding is needed to interpret the qualitative and quantitative data, in order to differentiate in real time, between an anomaly and an emerging trend, and adjust forecasts as necessary (Kindred 2015).

Optimisation of Systems

Optimising production processes and the distribution value chain is also crucial for ensuring efficient use of resources and reducing waste.

In recent years, the price of microprocessors and cloud storage has become so low that it is possible to connect almost all devices to the internet, for example by putting a microchip into each garment, as Under Armour has done: a Fitbit for clothes (Kindred 2015). Through this “Internet Of Things”, performance data for key elements of Slow Fashion production and the distribution network, online and offline, can be tracked in real time and stored in the cloud.

Data analytics and supervised machine learning algorithms can be used to analyse and visualise this data, in order to provide solutions that optimise processes and production to reduce waste and use resources sustainably, e.g. factoring in enough workers so that hours are reasonable, or allowing enough time for fields to recover between plantings (Guo et al 2011).

The challenge with obtaining this data is that the internet connectivity and power supply of local suppliers, who are often in developing nations, may not be reliable, and therefore there may be missing data. In addition, despite the reduction in tracking costs, rolling out a tracking system to a large number of small independent producers may not be feasible for a smaller-scale slow fashion brand.

3. Improve Supply Chain Sustainability through transparency and traceability

Slow Fashion brands want to prove to stakeholders that their supply chains are managed sustainably, and reporting can provide the transparency and traceability to demonstrate this (Morgan 2015).

By tracking raw materials from their source (using the Internet of Things as described above), reporting on key equity measures such as working hours and the production of, and payments to, artisanal suppliers, and presenting data visualisations on their websites, Slow Fashion brands can let consumers see the origin of their purchases and the impact their patronage is having on local communities over time (Carter and Rogers 2008).

In addition, in order to make the Slow Fashion supply chain as agile and responsive as possible, it can use supplier identity credentials, electronic data interchange and open book accounting to enable trust between suppliers, brands and consumers (Park et al 2013).

However, slow fashion brands are not generally of the scale to demand compliance from their suppliers, and gathering consent might be difficult. Suppliers may be primary producers without IT systems, so obtaining consistent, accurate and regular data could be a challenge.

Lastly, this kind of open data sharing could be a privacy issue for many small suppliers as it basically reveals their household income. In areas of civil unrest, data could be used for unintended purposes and compromise the safety of some suppliers.

Impact on Slow Fashion

This paper has shown how Slow Fashion participants can become more data driven to address the opportunities and challenges facing their industry.

They can use data mining to identify potential customers as consumers pursue fast fashion avoidance. Concurrently, they can use product and consumer data to know their customers better and, through algorithms and machine learning, match their product line and processes with consumer demand more accurately, increasing ARPU and APPU. These strategies reduce waste, grow revenue and improve the triple bottom line (Elkington 1994).

Furthermore, using data to report on the supply chain can also prove to stakeholders that a Slow Fashion brand is authentic and sharing value with its suppliers, and over time illustrate that it is delivering long term value to the communities that work with it.

This data intensity will require a significant mind shift amongst suppliers and brands in order to make data central to decision making, as well as making the supply chain mobile internet connected.

In this way, as Slow Fashion becomes more data intensive, it can innovate to achieve the triple bottom line benefits of economic prosperity, social justice and environmental quality.

Reference List

Carter, C.R., and Rogers, D.S. (2008) “A framework of sustainable supply chain management: moving toward new theory” International Journal of Physical Distribution & Logistics Management 38(5): 360-387

Clark, H., 2008. SLOW + FASHION – an oxymoron-or a promise for the future…?. Fashion Theory, 12(4): 427-446.

Guo, Z.X., Wong, W.K., Leung, S.Y.S. and Li, M. (2011) “Applications of artificial intelligence in the apparel industry: a review” Textile Research Journal 81(18): 1871-1892

Hartigan, J. A.; Wong, M. A. (1979). “Algorithm AS 136: A K-Means Clustering Algorithm”. Journal of the Royal Statistical Society, Series C. 28 (1): 100–108

Kindred, L. and Steele, J. (2015) “Fashioning Data: A 2015 Update” O’Reilly Media Inc, Sebastopol

Kim, H., Choo, H.J., and Yoon, N. (2013) “The motivational drivers of fast fashion avoidance” Journal of Fashion Marketing and Management 17(2): 243-260

Jung, S., and Jin, B. (2016) “Sustainable Development of Slow Fashion Businesses: Customer Value Approach” Sustainability 8(6): 540-556

Morgan, T. R. (2015) “Supply chain transparency: An overlooked critical element of supply chain management” The University of Alabama, Tuscaloosa

Pal, R. 2016. “Sustainable Value Generation Through Post-retail Initiatives: An Exploratory Study of Slow and Fast Fashion Businesses.” In Green Fashion, edited by S. S. Muthu and M. A. Gardetti

Park, A., Nayyar, G., and Low, P. (2013) “Supply Chain perspectives and issues: A literature review” World Trade Organisation and Fung Global Institute, Geneva

Sull, D., and Turconi, S. (2008) “Fast Fashion Lessons” Business Strategy Review 19(2): 4-11

Zarley Watson, M., and Yan, R. (2013) “An exploratory study of the decision processes of fast versus slow fashion consumers” Journal of Fashion Marketing and Management 17(2): 141-159

Eradicating Racial Differences in Prostate Cancer Outcomes

Dedicated in loving memory of Calvin Harris Snr

A report written as part of my Masters of Communication Data Science at University of Southern California in Fall 2018.

Abstract

Racial disparities in health care outcomes contribute to African American (AA) men living ten years less on average than white Americans (Rosenberg, Ranapurwala, Townes, & Bengtson, 2017). One of those disparities is in prostate cancer (PC), the second most deadly form of cancer in America, with a mortality rate for AA men double that of non-Hispanic whites (American Cancer Society, 2018). This literature review examines the research for possibilities to reduce this racial disparity to zero, by asking: what are the underlying factors that cause these outcomes for AA men? This question will be answered by considering the attitudes, beliefs and behaviors of both patients and health care providers, focusing on where there are racial differences.

Keywords:  Prostate Cancer, Racial Disparity, African American, Reasoned Action Approach

Eradicating Racial Differences in Prostate Cancer Outcomes
Literature Review

Racial disparities in health care outcomes contribute to African American (AA) men living ten years less on average than white Americans (Rosenberg et al., 2017). One of those disparities is in prostate cancer (PC), the second most deadly form of cancer in America, with a mortality rate for AA men double that of non-Hispanic whites (American Cancer Society, 2018). This literature review examines the research for possibilities to reduce this racial disparity to zero.

Prostate Cancer in America

In 2018, 29,000 American men are predicted to die due to PC, and 160,000 new cases will be diagnosed (American Cancer Society, 2018)1.

The longer a man lives, the higher the likelihood he will have PC, yet most men “die with prostate cancer, not die from it” (Ablin, 2014; Peehl, 1999).

This is because of the unique, dual nature of PC: one type is microscopic, almost latent and very slow growing, and the other is much more aggressive, metastatic and deadly (Ablin, 2014; Peehl, 1999; Schröder, Hugosson, Roobol, & et al, n.d.) 2. Therefore, despite PC being so fatal, the number of deaths is relatively low considering how many men will have it (Peehl, 1999).

Incidence of, and deaths from, PC skyrocketed in the nineties (National Cancer Institute, 2017). At this time, a test for the general male population was introduced: the prostate specific antigen (PSA) test. Its use quickly became controversial (Ablin, 2014).

It is not cancer-specific, and as there is a high incidence of pre-malignant microscopic lesions in most prostate glands, critics argue the test overdiagnoses the severity of the cancer, resulting in unnecessary biopsies and radical treatment, rather than watching and waiting to determine what kind of tumor it is 3  (Ablin, 2014; Andriole et al., 2009; Benoit & Naslund, 1995; Halpern et al., 2017; Lyons et al., 2017; Moyer, 2012; Peehl, 1999; Schröder et al., n.d.; Vollmer, 2012).

In fact, in 2012 the U.S Preventive Services Task Force (USPSTF) recommended against the use of PSA for general population screening, but rather recommended it for use in Active Surveillance to determine the rate of growth of the cancer (Andriole et al., 2009; Moyer, 2012).

The changing levels of use of the PSA test before and after the USPSTF recommendation have directly and significantly affected biopsy and radical prostatectomy volumes (Ablin, 2014; Halpern et al., 2017).

This conflict between health care practice and the advice of government bodies makes a challenging environment for the prevention and treatment of PC.

Prevalence of Prostate Cancer in African American men

Disturbingly, African Americans (AA) have for many years had the highest rates of PC-caused fatalities in the world (Blocker, Romocki, Thomas, Jones, & al, 2006; Levi, Kohler, Grimley, & Anderson-Lewis, 2007; Odedina, Scrivens, Emanuel, LaRose-Pierre, & al, 2004).

In 2017, prostate cancer incidence rates for African Americans (AA) were 1.5 times those of non-Hispanic white Americans (NHWs), and mortality rates were double those of NHWs (National Cancer Institute, 2017; Taksler, Cutler, Giovannucci, Smith, & Keating, 2013; Taksler, Keating, & Cutler, 2012).

The AA mortality rate has dropped by over 30% since 2007, and by considerably more since 1993, when AA were 2.5 times more likely to die from prostate cancer than NHWs. However, this is still a very poor outcome for a lot of Americans (National Cancer Institute, 2017; Taksler et al., 2012).

The direct drivers of this disparity are threefold: AA develop PC earlier in life, the cancer is at a later stage when diagnosed, and once diagnosed AA do not receive all the recommended treatments (American Cancer Society, 2018; Hawley & Morris, 2017; Levi et al., 2007; Morris, Rhoads, Stain, & Birkmeyer, 2010; National Cancer Institute, 2017).

This literature review asks: what are the underlying factors that cause these outcomes for AA men?  

From a biological point of view, there is no strong evidence to date that AA experience more aggressive tumor biology than NHWs (Jaratlerdsiri et al., 2018; Morris et al., 2010). However, African genes may be more susceptible to PC in general (Chornokur et al., 2012; Wang et al., 2017), and recent genome sequencing research has indicated the potential for a genetic difference resulting in worse health outcomes for those with African genes (Jaratlerdsiri et al., 2018).

Physically, the reduced ability to absorb vitamin D may be contributing to racial disparities. Vitamin D deficiency has been linked to prostate cancer, and AAs with higher melanin in their skin are slower to absorb Vitamin D than white people (Peehl, 1999; Taksler et al., 2012). Further research in the biology of PC in AA would be worthwhile.

Lower socioeconomic status (SES) is a factor in lower PC survival rates (Klein & von dem Knesebeck, 2015), and as a large proportion of AA are in lower SES groups than NHWs, they suffer PC disproportionately due to SES also  (Morris et al., 2010).

The rest of this literature review focuses on whether there are racial disparities in patient and practitioner behavior that may contribute to AA not being diagnosed early enough and not receiving all the recommended treatment (Morris et al., 2010).

Exploration of causal factors in racial disparity using Reasoned Action Approach

The reasoned-action approach can be used as a framework to predict a person’s behavior towards prevention, screening and treatment of PC (Ajzen, 1991; McEachan et al., 2016; Tippey, 2012).

“The reasoned-action approach states that attitudes towards the behavior, perceived norms, and perceived behavioral control determine people’s intentions, while people’s intentions predict their behaviors.” (Levi et al., 2007).

Patient Attitudes, Beliefs and Perceptions  

Patients’ behaviors regarding prevention, screening and treatment options have many influences. Some have been proven to contribute to racial disparities in PC outcomes, and others have not.

Participation in prevention and screening behavior

In terms of preventative health attitudes and behaviors, research has found that a diet high in red meat and fat increases the risk of prostate cancer, and conversely a diet high in vegetables (especially cruciferous vegetables) has been shown to reduce it (Blocker et al., 2006; Cohen, Kristal, & Stanford, 2000). The AA diet is generally worse on these measures than that of white men (Blocker et al., 2006). Attitudes underlying this difference could be a significant contributor to the racial disparity in mortality rate and merit further research.

AA have lower participation rates in PC screening than NHWs (Morris et al., 2010), which contributes to the higher mortality rate. There are several reasons for this.

Research has found that those with a family history have greater knowledge of the risk of PC, as representativeness and availability heuristics work towards weighting the risk appropriately (McDowell, Occhipinti, & Chambers, 2013). However, there is no evidence that this is a cause of racial disparity.

However, there is a body of research supporting significant negative associations to screening behavior in AA men, relating to feelings of embarrassment, decision regret for multiple types of treatment and threats to masculine sexual identity as a result of impotence and lethargy following treatment, but again it is not known if these contribute to the racial disparity (Allen, Kennedy, Wilson-Glover, & Gilligan, 2007; Collingwood et al., 2014; Hawley & Morris, 2017; Odedina et al., 2004).

Studies have shown that awareness or knowledge of screening was less of an indicator of participating in screening than being advised to do so by a doctor (Meissner, Potosky, & Convissor, 1992). Evidence supports that there is a racial discrepancy in having a regular doctor, and trust in the health care profession, due to a history and perceptions of racism, and also cognitive biases and difficulty in communication because so many of the medical profession are white and have different cultural sensitivities (Blocker et al., 2006; Hawley & Morris, 2017; Kahneman & Frederick, 2002; Morris et al., 2010; Odedina et al., 2004). 

Building up trust and regular contact with the medical profession is vital for AA to receive culturally and personally relevant advice, to encourage participation in screening despite the negative associations and attitudes towards prostate cancer (Grubbs et al., 2013; Hawley & Morris, 2017; Morris et al., 2010).  A program in Delaware brought the racial disparity in colorectal cancer down to zero over ten years, through building up trust by using local doctors and community leaders to promote screening behaviors (Grubbs et al., 2013).

Attitudes and preferences regarding treatment

Attitudes and preferences towards treatment options have been measured in studies in terms of expectations, decision conflict, satisfaction and regret, and mostly there were no racial disparities, except for one very important one (Collingwood et al., 2014; Lyons et al., 2017; Meissner et al., 1992; Potosky et al., 2001; Reamer, Yang, & Xu, 2016).

The main racial disparity lies in the lower proportion of AA men who participate in a shared decision-making process with their doctor, which in turn affects the metrics (Collingwood et al., 2014; Hawley & Morris, 2017; Morris et al., 2010).  

One study found that decision regret was greater in African Americans, for both radical surgery and non-treatment, and it was suggested that this could be due to the level of shared decision making with the health care provider to manage patient expectations (Collingwood et al., 2014).

Higher decision regret due to reduced quality of life from radical surgery can reinforce the community’s negative associations with prostate cancer, and influence the number of people participating in screening (Blocker et al., 2006; Hawley & Morris, 2017).

In addition, if the treatment is biased towards active treatment over active surveillance, these impacts can also be totally avoidable because the surgery may be unnecessary, and therefore these outcomes reinforce the feeling of mistrust (Ablin, 2014; Reamer et al., 2016; Xu et al., 2016).    

Studies have shown there does tend to be a bias towards active treatment over active surveillance, however no racial differences were found in the results (Reamer et al., 2016; Xu et al., 2016). Patients are fearful upon being diagnosed with PC, and feel that active surveillance is “doing nothing” (Reamer et al., 2016; Xu et al., 2016). Hence doctors play a vital role in ensuring patients control their fear and make a good decision for their treatment (Blocker et al., 2006; Reamer et al., 2016; Xu et al., 2016).

Lyons et al. also looked at preferences for active treatment (AT) versus active surveillance, and found that people with a close relationship with a trusted physician were able to overcome their preference for AT (Lyons et al., 2017). Again, no racial disparity was found, but this must be considered in the context of AA communities having less regular contact with a regular doctor (Grubbs et al., 2013; Hawley & Morris, 2017; Morris et al., 2010).

Health Care Providers’ Knowledge and Beliefs

The literature reveals three potential factors for unbalanced representation of AA in PC health care.

Researchers

Researchers may be employing heuristics that unintentionally create systematic bias that excludes AA in their research, or focus overly on them as controlling the outcome (Kahneman & Frederick, 2002).

For example, Vastola et al. argue that the criteria for participation in clinical trials are set at levels that exclude a disproportionate number of AA, due to differences in the average levels for these criteria between NHW and AA populations (Vastola et al., 2018).

Whilst there has not been a review of research disparities in PC, research conducted by Rosenberg et al found that homicide was the biggest contributor to mortality for AA and received significantly less research funding and effort than heart disease which was the greatest killer of white people (Rosenberg et al., 2017).

Therefore, researchers need to consider if their programs are unintentionally excluding African Americans.

Health Care Providers

Health care providers are essential to giving AA patients sound advice when choosing between active treatment and active surveillance, given the consequences for the patient’s quality of life (Ablin, 2014; Collingwood et al., 2014; Lyons et al., 2017). Patients are biased towards action due to the fear of being diagnosed with PC, and feel that active surveillance is doing nothing (Ablin, 2014; Collingwood et al., 2014; Lyons et al., 2017). It is up to the doctor to advise them that most PC is not aggressive and should be monitored in the first instance, because once they are referred to a urologist, the chance of them having surgery increases dramatically (Ablin, 2014; Collingwood et al., 2014; Lyons et al., 2017).

Administrators and Government

There is a very sound business case for government investment in free screening and treatment of PC for lower SES African Americans.

A ten-year trial in Delaware for colorectal cancer reduced the racial disparity in mortality to zero by providing free screening and treatment to low SES people, and it was much cheaper than funding surgery and medicines (Grubbs et al., 2013). This program was also culturally sensitive, utilizing local doctors and community leaders like pastors to promote screening (Grubbs et al., 2013).

Government and policy makers must consider if they are biased towards cures rather than prevention, or are allocating resources towards one community over another and contributing to the PC mortality rate disparity.

Further areas for research

Overall, it is difficult to grasp which factors are the more significant contributors to racial disparity in PC mortality from the research, because each study is on such a narrow topic.

Therefore, further research to measure the impact of each factor would be useful to be able to prioritize efforts to reduce the AA mortality rate.

An analysis of the research from this perspective, plus quantitative analysis to build a predictive model would be useful.

Also, researchers should try to cover the views of patients and practitioners in their studies, as that relationship is so important in the prevention of PC deaths.

Lastly, research into the reasoned action approach in relation to a PC preventative diet would also be fruitful.

References

Ablin, R. J. (2014). The great prostate hoax : how big medicine hijacked the PSA test and caused a public health disaster (First edition.). New York, NY: Palgrave Macmillan.

Ajzen, I. (1991). The theory of planned behavior. Organizational Behavior and Human Decision Processes, 50(2), 179–211. https://doi.org/10.1016/0749-5978(91)90020-T

Allen, J. D., Kennedy, M., Wilson-Glover, A., & Gilligan, T. D. (2007). African-American men’s perceptions about prostate cancer: Implications for designing educational interventions. Social Science & Medicine, 64(11), 2189–2200. https://doi.org/10.1016/j.socscimed.2007.01.007

American Cancer Society. (2018). Cancer Facts & Figures 2018. Retrieved September 10, 2018, from https://www.cancer.org/research/cancer-facts-statistics/all-cancer-facts-figures/cancer-facts-figures-2018.html

Andriole, G. L., Crawford, E. D., Grubb, R. L., Buys, S. S., Chia, D., Church, T. R., … PLCO Project Team. (2009). Mortality results from a randomized prostate-cancer screening trial. The New England Journal of Medicine, 360(13), 1310–1319. https://doi.org/10.1056/NEJMoa0810696

Benoit, R. M., & Naslund, M. J. (1995). Detection of latent prostate cancer from routine screening: comparison with breast cancer screening. Urology, 46(4), 533–536; discussion 536-7.

Blocker, D. E., Romocki, L. S., Thomas, K. B., Jones, B. L., et al. (2006). Knowledge, Beliefs and Barriers Associated with Prostate Cancer Prevention and Screening Behaviors among African-American Men. Journal of the National Medical Association; Washington, 98(8), 1286–1295.

Chornokur, G., Han, G., Tanner, R., Lin, H., Gwede, C., Kumar, N., … Phelan, C. (2012). Risk factors of prostate cancer in African American men. Cancer Research, 72(s8). https://doi.org/10.1158/1538-7445.AM2012-3592

Cohen, J. H., Kristal, A. R., & Stanford, J. L. (2000). Fruit and Vegetable Intakes and Prostate Cancer Risk. Journal of the National Cancer Institute, 92(1), 61–68. https://doi.org/10.1093/jnci/92.1.61

Collingwood, S. A., McBride, R. B., Leapman, M., Hobbs, A. R., Kwon, Y. S., Stensland, K. D., … Samadi, D. B. (2014). Decisional regret after robotic-assisted laparoscopic prostatectomy is higher in African American men. Urologic Oncology: Seminars and Original Investigations, 32(4), 419–425. https://doi.org/10.1016/j.urolonc.2013.10.011

Grubbs, S. S., Polite, B. N., Carney, J., Bowser, W., Rogers, J., Katurakes, N., … Paskett, E. D. (2013). Eliminating Racial Disparities in Colorectal Cancer in the Real World: It Took a Village. Journal of Clinical Oncology, 31(16), 1928–1930. https://doi.org/10.1200/JCO.2012.47.8412

Halpern, J. A., Shoag, J. E., Artis, A. S., Ballman, K. V., Sedrakyan, A., Hershman, D. L., … Hu, J. C. (2017). National Trends in Prostate Biopsy and Radical Prostatectomy Volumes Following the US Preventive Services Task Force Guidelines Against Prostate-Specific Antigen Screening. JAMA Surgery, 152(2), 192–198. https://doi.org/10.1001/jamasurg.2016.3987

Hawley, S. T., & Morris, A. M. (2017). Cultural Challenges to Engaging Patients in Shared Decision Making. Patient Education and Counseling, 100(1), 18–24. https://doi.org/10.1016/j.pec.2016.07.008

Jaratlerdsiri, W., Chan, E. K. F., Gong, T., Petersen, D. C., Kalsbeek, A. M. F., Venter, P. A., … Hayes, V. M. (2018). Whole Genome Sequencing Reveals Elevated Tumor Mutational Burden and Initiating Driver Mutations in African Men with Treatment-Naive, High-Risk Prostate Cancer. Cancer Research, canres.0254.2018. https://doi.org/10.1158/0008-5472.CAN-18-0254

Kahneman, D., & Frederick, S. (2002). Representativeness revisited: Attribute substitution in intuitive judgment. In Heuristics and biases:  The psychology of intuitive judgment. (pp. 49–81). New York,  NY,  US: Cambridge University Press. https://doi.org/10.1017/CBO9780511808098.004

Klein, J., & von dem Knesebeck, O. (2015). Socioeconomic inequalities in prostate cancer survival: A review of the evidence and explanatory factors. Social Science & Medicine, 142, 9–18. https://doi.org/10.1016/j.socscimed.2015.07.006

Levi, R., Kohler, C. L., Grimley, D. M., & Anderson-Lewis, C. (2007). The Theory of Reasoned Action and Intention to Seek Cancer Information. American Journal of Health Behavior; Star City, 31(2), 123–134.

Lyons, K. D., Li, H. H., Mader, E. M., Stewart, T. M., Morley, C. P., Formica, M. K., … Hegel, M. T. (2017). Cognitive and Affective Representations of Active Surveillance as a Treatment Option for Low-Risk Prostate Cancer. American Journal of Men’s Health, 11(1), 63–72. https://doi.org/10.1177/1557988316657041

McDowell, M. E., Occhipinti, S., & Chambers, S. K. (2013). The influence of family history on cognitive heuristics, risk perceptions, and prostate cancer screening behavior. Health Psychology, 32(11), 1158–1169. https://doi.org/10.1037/a0031622

McEachan, R., Taylor, N., Harrison, R., Lawton, R., Gardner, P., & Conner, M. (2016). Meta-Analysis of the Reasoned Action Approach (RAA) to Understanding Health Behaviors. Annals of Behavioral Medicine, 50(4), 592–612. https://doi.org/10.1007/s12160-016-9798-4

Meissner, H. I., Potosky, A. L., & Convissor, R. (1992). How Sources of Health Information Relate to Knowledge and Use of Cancer Screening Exams. Journal of Community Health; New York, N.Y., 17(3), 153–165.

Morris, A. M., Rhoads, K. F., Stain, S. C., & Birkmeyer, J. D. (2010). Understanding Racial Disparities in Cancer Treatment and Outcomes. Journal of the American College of Surgeons, 211(1), 105–113. https://doi.org/10.1016/j.jamcollsurg.2010.02.051

Moyer, V. A. (2012). Screening for Prostate Cancer: U.S. Preventive Services Task Force Recommendation Statement. Annals of Internal Medicine, 157(2), 120. https://doi.org/10.7326/0003-4819-157-2-201207170-00459

National Cancer Institute. (2017, April 14). SEER*Explorer: An interactive website for SEER cancer statistics. Retrieved September 28, 2018, from https://seer.cancer.gov/explorer/.

Odedina, F. T., Scrivens, J., Emanuel, A., LaRose-Pierre, M., et al. (2004). A Focus Group Study of Factors Influencing African-American Men’s Prostate Cancer Screening Behavior. Journal of the National Medical Association; Washington, 96(6), 780–788.

Peehl, D. M. (1999). Vitamin D and Prostate Cancer Risk. European Urology, 35(5–6), 392–394. https://doi.org/10.1159/000019914

Potosky, A. L., Knopf, K., Clegg, L. X., Albertsen, P. C., Stanford, J. L., Hamilton, A. S., … Hoffman, R. M. (2001). Quality-of-Life Outcomes After Primary Androgen Deprivation Therapy: Results From the Prostate Cancer Outcomes Study. Journal of Clinical Oncology, 19(17), 3750–3757. https://doi.org/10.1200/JCO.2001.19.17.3750

Reamer, E., Yang, F., & Xu, J. (2016). Abstract A48: Treatment decision making in a population-based sample of black and white men with localized prostate cancer. Cancer Epidemiology and Prevention Biomarkers, 25(3 Supplement), A48–A48. https://doi.org/10.1158/1538-7755.DISP15-A48

Rosenberg, M., Ranapurwala, S. I., Townes, A., & Bengtson, A. M. (2017). Do black lives matter in public health research and training? PLoS ONE, 12(10). https://doi.org/10.1371/journal.pone.0185957

Schröder, S., Hugosson, J., Roobol, M., et al. (n.d.). Screening and Prostate-Cancer Mortality in a Randomized European Study. The New England Journal of Medicine. Retrieved October 2, 2018, from https://doi.org/10.1056/NEJMoa0810084

Segal, R. J., Reid, R. D., Courneya, K. S., Malone, S. C., Parliament, M. B., Scott, C. G., … Wells, G. A. (2003). Resistance Exercise in Men Receiving Androgen Deprivation Therapy for Prostate Cancer. Journal of Clinical Oncology, 21(9), 1653–1659. https://doi.org/10.1200/JCO.2003.09.534

Taksler, G. B., Cutler, D. M., Giovannucci, E., Smith, M. R., & Keating, N. L. (2013). Ultraviolet index and racial differences in prostate cancer incidence and mortality. Cancer, 119(17), 3195–3203. https://doi.org/10.1002/cncr.28127

Taksler, G. B., Keating, N. L., & Cutler, D. M. (2012). Explaining racial differences in prostate cancer mortality. Cancer, 118(17), 4280–4289. https://doi.org/10.1002/cncr.27379

Tippey, A. R. (2012). Cortisol Response to Prostate Cancer Screening Information among African American Men (M.A.). East Carolina University, United States — North Carolina. Retrieved from http://search.proquest.com/docview/1069314501/abstract/6256BC0A6974244PQ/1

Vastola, M. E., Yang, D. D., Muralidhar, V., Mahal, B. A., Lathan, C. S., McGregor, B. A., & Nguyen, P. L. (2018). Laboratory Eligibility Criteria as Potential Barriers to Participation by Black Men in Prostate Cancer Clinical Trials. JAMA Oncology, 4(3), 413–414. https://doi.org/10.1001/jamaoncol.2017.4658

Vollmer, R. T. (2012). The Dynamics of Death in Prostate Cancer. American Journal of Clinical Pathology, 137(6), 957–962. https://doi.org/10.1309/AJCPJK9V9LUMUETV

Wang, Y., Freedman, J. A., Liu, H., Moorman, P. G., Hyslop, T., George, D. J., … Wei, Q. (2017). Associations between RNA splicing regulatory variants of stemness-related genes and racial disparities in susceptibility to prostate cancer: Stemness-related genes and racial disparities in prostate cancer. International Journal of Cancer, 141(4), 731–743. https://doi.org/10.1002/ijc.30787

Xu, J., Janisse, J., Ruterbusch, J. J., Ager, J., Liu, J., Holmes-Rovner, M., & Schwartz, K. L. (2016). Patients’ Survival Expectations With and Without Their Chosen Treatment for Prostate Cancer. The Annals of Family Medicine, 14(3), 208–214. https://doi.org/10.1370/afm.1926

Footnotes

1 For the record, lung cancer is the greatest killer for both men and women, with over 150,000 deaths estimated for 2018 (American Cancer Society, 2018).

2 “Independent, multiple foci of cancer are present in the majority of prostate specimens, and the incidence of premalignant lesions is even higher than that of cancer. Yet, despite the high incidence of microscopic cancer, only 8% of men in the US present with clinically significant disease during their lifetime. Furthermore, only 3% of men in the US die of prostate cancer. In no other human cancer is there such disparity between the high incidence of microscopic malignancy and the relatively low death rate. Thus, there are many windows of opportunity for control of prostate cancer.” (Peehl, 1999)

3 There are a number of different treatment options for PC: open retropubic radical prostatectomy, the newer robot assisted laparoscopic prostatectomy, external beam radiation, primary androgen deprivation therapy (to castration levels) and active monitoring/surveillance (Collingwood et al., 2014; Potosky et al., 2001; Segal et al., 2003).

Figure 1 Conceptual Model

Figure 1.  A conceptual model of mechanisms underlying disparities in cancer outcomes  (Morris et al., 2010)

As shown in Figure 1 above, cancer outcomes are influenced by effective cancer care, which in turn is driven by the patient’s utilization of health care, and quality of health care provided by the system and practitioners (Morris et al., 2010).  

Utilization of health care can be influenced by the patient’s socioeconomic status (SES), which affects their knowledge and ability to pay for care; geography, which affects their access to care; race, as physical differences can make a person more susceptible to certain cancers; and the person’s beliefs and preferences (Morris et al., 2010). There are also physical differences such as cancer stage, tumor biology, and comorbid diseases.

The quality of health care is influenced by the practitioners’ knowledge, beliefs and technical skills, and the resources of the health care system (Morris et al., 2010).

Tracy Keys’ Communication Data Science blog

Data Science as creative expression and exploration of society

You must do the thing you think you cannot do

— Eleanor Roosevelt.

Welcome! I am Tracy Keys, and you can find me on Instagram @benjibex.

This blog is all about my passion for media, entertainment, fashion, society and the environment and how data science can be used as a tool in communication, activism, politics and marketing. This site is a showcase for my creations as I develop from data lover to communication data science professional.

I’m just getting this new blog going, so right now it’s mostly the transfer of my academic papers and blogs into one place. Ultimately, I want to express myself with data science and explore society through this medium: data science is also an art and highly creative as well as being analytical. The work to date comes from my journey of exploration and learning, but bringing it to life with my tone of voice will no doubt be a lifelong addiction. Stay tuned for more blog entries. Subscribe below to get notified when I post new updates.

The true loss for a society of Pokie Players

UPDATE 26th of May:

I have recently discovered that turnover is defined as cash plus wagered wins. This means that if a person puts in $300 cash, wins and loses $2,700 over the course of the day in small increments, and then loses their $300 too, they have “wagered” $3,000 and “won” $2,700, so the expenditure is only $300, i.e. a 10% gross margin. So I can’t even begin to count how much has really been lost or wagered by problem gamblers! This makes true measurement and accurate metrics so much more important.
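To make the definition concrete, here is a minimal Python sketch of that example. The function name `session_summary` and its parameters are hypothetical, chosen only for illustration, and the sketch assumes the player ends the day with nothing left.

```python
def session_summary(cash_in, recycled_wins):
    """Return (turnover, expenditure, margin) for one gambling session.

    Assumes the player walks away with nothing, so every dollar of cash
    and every dollar of winnings is wagered again and eventually lost.
    """
    turnover = cash_in + recycled_wins    # total amount wagered
    winnings = recycled_wins              # amount "won" along the way
    expenditure = turnover - winnings     # net loss, i.e. operator gross profit
    margin = expenditure / turnover       # share of turnover kept by the operator
    return turnover, expenditure, margin


turnover, expenditure, margin = session_summary(cash_in=300, recycled_wins=2700)
print(f"turnover=${turnover}, expenditure=${expenditure}, margin={margin:.0%}")
```

With the numbers from the example, this reports a turnover of $3,000, an expenditure of $300 and a margin of 10%.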

You may have heard that NSW has the second highest number of gaming machines in the world (99k), behind only Nevada (181k). But unlike Nevada, whose gambling hub is Las Vegas, these poker machines are all in local pubs and clubs, being played by regular locals, not tourists. (The 200k gaming machines in Australia do not include those in casinos.) So what does this mean for local people and our communities in NSW and around Australia?

The Queensland Government has been conducting a national survey, “Australian Gambling Statistics”, for 33 years.

33 years ago, in 1990/91, Gaming Machine gross profit (known as expenditure in the survey) was $3.2bn in today’s dollars, or $233 lost on average per Australian adult. As shown in Figure 1, the biggest losers were NSW ($680.50) and the ACT ($649.10).

Figure 1 Real Gaming Machine Expenditure per capita 1990/1991

Gaming machines were first made legal in Australia in 1956, but these days Australians play electronic gaming machines (EGMs) with much faster spin cycles and multi-line play that can accept notes, without limit. This means that a person can now put as much as $1500 an hour through a gaming machine (Productivity Commission). Figure 2 compares the old mechanical “one-armed bandits” with the modern EGM.

Figure 2 An original Aristocrat gaming machine compared to a modern Electronic Gaming Machine

Now, in 2015/6, these product innovations, along with deregulation, have more than doubled the average Australian adult’s loss to $650 (Figure 3 shows this broken down by State) and quadrupled gross profit to $12bn.

Figure 3 2015/6 Gaming Machines real expenditure per capita

Gaming Machine Turnover is a massive $142bn up from $23bn in 1990, and 4% of adult Australians (600,000) play gaming machines more than once a week. An estimated 95,000 Australians (0.6% of the adult population)  are classified as Problem Gamblers, and it is estimated they are responsible for 40% of gaming machine turnover (Productivity Commission).

That average loss of $650, and even the NSW loss of $1,023 shown in Figure 3, obscures this imbalance.

While State legislation requires gaming machines to pay out a minimum of 85% of turnover as winnings (Productivity Commission), anyone who has ever gambled knows that these winnings are not evenly distributed. How can we get a better sense of how much people in our communities are losing?

Firstly, turnover is a more accurate measure of how much money people put through the EGMs. This has grown to 423% of its 1990/91 level, from $1,812 per adult to $7,670 in 2015/6.

But if 40% of turnover is contributed by 95k problem gamblers, how much is that per problem gambler?

Figure 4 Real Gaming Machine Turnover by Problem Gambler

That is $600k per problem gambler in 2015/6 (Figure 4), up from $98k in 1990/1. At today’s prices in Sydney, that could mean up to 95,000 families lost their homes to playing the pokies.
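The arithmetic behind the 2015/6 figure can be sketched in a few lines of Python, assuming the $142bn turnover, the 40% share and the 95,000 problem gamblers quoted above. The function name is hypothetical.

```python
def turnover_per_problem_gambler(total_turnover, problem_share, problem_gamblers):
    """Average turnover attributed to each problem gambler."""
    return total_turnover * problem_share / problem_gamblers


# Figures quoted in this post for 2015/6.
per_gambler = turnover_per_problem_gambler(
    total_turnover=142e9,     # $142bn gaming machine turnover
    problem_share=0.40,       # estimated share from problem gamblers
    problem_gamblers=95_000,  # estimated number of problem gamblers
)
print(f"${per_gambler:,.0f} per problem gambler")
```

The result comes out a little under $600k, which is the rounded figure used here.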

There isn’t much data available other than these averages, but even from simple maths you can see the figures are quite devastating.

My goal is to convince national government policy makers to change the way gaming machine losses are reported. Gaming machines account for every dollar that flows in and out of them, and these figures are reported to the State for tax collection purposes.

State Governments must show a frequency histogram of the amount of gains and losses from these machines so our community understands the true cost to individuals, and just how rare a win is. I’m sure it is even more than $600,000 for some people.

This way we will all learn the true cost of gaming machines to our society.

Context:

I needed to counterbalance my last post with this one, against gaming machines!

My goal is to convince my target audience, national government policy makers, to change the way gaming machine losses are reported on. Gaming machines account for every dollar that flows in and out of them. We should be able to show a frequency histogram of the amount of gains and losses from these machines so society understands the true cost to individuals, and just how rare a win is. The medium for this article is an online blog.

My data is from the Australian Gambling Statistics 1990–91 to 2015–16, 33rd edition, which is a survey conducted annually by the Queensland Government. The data comes in Excel format, ready to use. I augmented this with data on the number of gaming machines in each state, combined with the Australian Government Productivity Commission’s Inquiry into Gambling from 2010.

Some definitions:
Gaming machines: All jurisdictions, except Western Australia, have a state–wide gaming machine (poker machine) network operating in clubs and/or hotels. (WA only has machines in the Crown Casino, 1,750 of them). The data reported under this heading do not include gaming machine data from casinos. Gaming machines accurately record the amount of wagers played on the machines. So turnover is an actual figure for each jurisdiction. In most jurisdictions operators must return at least 85 per cent of wagers to players as winnings, either by cash or a mixture of cash and product.
Instant lottery: Commonly known as ‘scratchies’, where a player scratches a coating off the ticket to identify whether the ticket is a winner. Prizes in the instant lottery are paid on a set return to player and are based on the number of  tickets in a set, the cost to purchase the tickets, and a set percentage retained by  the operator for costs.
Expenditure (gross profit): These figures relate to the net amount lost or, in other words, the amount wagered less the amount won, by people who gamble.  Conversely, by definition, it is the gross profit (or gross winnings) due to the operators of each particular form of gambling.

Hollywood: a man’s world?

Context:

In this blog, I am imagining I am a guest lecturer at the University of Southern California’s Cinematic Arts School. I work at seejane.org, which is the public site of the Geena Davis Institute on Gender in the Media. My target audience is students enrolled in Film Studies. This blog shows the presentation I did (although it was to fellow students at UTS in Sydney).

My goal is to convince new film makers that diversity is where the money is. My data on Hollywood is from a survey conducted by the Annenberg School of Journalism for the Geena Davis Institute on Gender in the Media entitled “Gender Roles & Occupations: A Look at Character Attributes and Job-Related Aspirations in Film and Television”. This is a study of 129 family films, 275 prime time shows and 36 children’s shows from 2006-2011, and evaluates this media on the roles it portrays for males and females.

My data on the population’s roles for males and females is from the US Bureau of Labor Statistics, and I use this to compare Hollywood to reality.

My take on the research conducted by the Institute is that it has been quite timely: in the years since 2011, when it was conducted, Hollywood has started a massive movement and change. So my pitch is very positive, saying that the tide is turning by looking at real box office results from Box Office Mojo.

My pitch deck begins by outlining how, from 2006-2011, there is a clear gender imbalance towards males, and also a narrow view of what it means to be a male (slides 2-4). My visualisations highlight the key indicators of the lack of diversity in the cast (on a gender axis) and the types of professions the majority of men are portrayed in.

In slide 5, I posit the reason for this imbalance: the huge number of male directors dominating Hollywood. My visualisation highlights the 100% statistic and shows some familiar faces.

In slide 6 I show who is underrepresented, women and certain males, and my visualisation in slide 7 creates a metric to show the imbalance, i.e. the variance between the population proportion of roles and Hollywood’s, so under-representation is on the left and over-representation is on the right.
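A metric like this could be computed as a simple difference in percentage points. The Python sketch below is an assumption about the calculation, not necessarily the exact method behind slide 7, and the role names and shares are placeholder values rather than figures from the study or the Bureau of Labor Statistics.

```python
def representation_gap(hollywood_pct, population_pct):
    """Hollywood share minus population share for each role, in percentage points.

    Negative values mean the role is under-represented on screen,
    positive values mean it is over-represented.
    """
    return {role: hollywood_pct[role] - population_pct[role] for role in hollywood_pct}


# Placeholder shares (whole percentages), for illustration only.
hollywood_pct = {"Role A": 10, "Role B": 30}
population_pct = {"Role A": 25, "Role B": 20}

gaps = representation_gap(hollywood_pct, population_pct)

# Sort so the most under-represented role comes first (the left of the chart).
for role, gap in sorted(gaps.items(), key=lambda item: item[1]):
    print(f"{role}: {gap:+d} points")
```

Sorting by the gap puts under-represented roles first, which mirrors the left-to-right layout described for the slide.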

In slides 8 and 9 I give evidence that since 2011 this is actually starting to change, with some serious box office success for films with diverse casts.

Slide 10 is my key takeaway for students: magic happens through embracing diversity and uncertainty, rather than a staid old formulaic film.

The presentation:

Slide 1: Title Slide

Slide 2: Establishing the basis for the argument: Observation 1

Slide 3: Establishing the basis for the argument: Observation 2

Slide 4: Establishing the basis for the argument: Observation 3

Slide 5: Surmising the underlying reason

Slide 6: Expressing dissatisfaction with Hollywood, which leaves large groups underrepresented

Slide 7: Expressing the opportunity in the imbalance

Slide 8: Evidence of a shift in the balance

Slide 9: Final Argument

Slide 10: Conclusion and key takeaway

A News Agent and a Publican walk into a bar

Context:

My imaginary target audience is News Agent National Association members, and the medium is an online newsletter to members. The message is to lobby for gaming machine licenses, and the goal is a call to action to members to give their feedback on this proposal (via a survey). My role is the advocate for this idea. This idea is for NSW in Australia, where poker machines are legal in any pub, club or hotel.

Please note the ideas in this blog post are not endorsed by me; it’s a future thinking exercise.

The Proposal:

Australia’s 3,800 News Agencies have been suffering greatly in recent times, and their future continues to look dark.

As our members know too well, turnover is forecast to continue to decline 3% annually, driven largely by consumption of free digital media, but also by the decline in instant lottery sales, which represent one quarter of News Agencies’ $2bn turnover.

The other enormous problem that our members face is that news agencies’ exclusive rights to sell scratchies in Australia expired on 31 March 2018 (after being extended for 5 years), and supermarkets are desperate to move in on instant lottery sales.

News agencies still have not diversified and seem at a loss to resolve this problem.

For inspiration, News Agencies, the National Association and the Media companies who rely on our member network should look at Gaming Machines.

Gaming Machines have been the source of the decline in instant lottery turnover, as gamblers turn in droves to gaming machines in pubs and clubs since deregulation in the early nineties.

Gaming machines are now widely distributed (except in WA): they are present in 3,000 licensed pubs and clubs (75% of the total), and there are almost 200k machines in these venues, half of them in NSW (Figure 1).

Figure 1  The proliferation of Gaming Machines in Clubs and Pubs in 2015/2016

Subsequently, gaming machine turnover has increased more than sixfold, from $23bn in 1990/91 to $143bn in 2015/16 (Figure 2).

Figure 2 Explosion in turnover of Gaming Machines in Clubs and Pubs

For pubs across Australia, this has meant a reversal in fortunes for the publican. Each machine is estimated to make over $100k annually for the pub owner.

If News Agencies were given the license to have poker machines on premises, they could double State revenue, assuming another 200k machines could be installed across their base of 3,800 outlets. This would provide a future revenue stream that supports their business well beyond what instant lotteries and newspapers could do.
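As a rough check of what that assumption implies, the Python sketch below spreads the extra 200k machines across the 3,800 outlets. It assumes the $100k per machine estimated for pubs would also apply in news agencies, which is a simplification, and the function name is hypothetical.

```python
def proposal_check(extra_machines, outlets, revenue_per_machine):
    """Return (machines per outlet, total extra annual machine revenue)."""
    machines_per_outlet = extra_machines / outlets
    extra_revenue = extra_machines * revenue_per_machine
    return machines_per_outlet, extra_revenue


per_outlet, extra_revenue = proposal_check(
    extra_machines=200_000,       # machines assumed in the proposal
    outlets=3_800,                # news agencies in Australia
    revenue_per_machine=100_000,  # "over $100k annually", treated as a floor
)
print(f"{per_outlet:.0f} machines per outlet, ${extra_revenue / 1e9:.0f}bn a year")
```

On those assumptions, each outlet would need to host roughly 53 machines.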

Figure 3 Newsagency of the future

The National Association of News Agencies is polling our members to assess the level of support for this future direction of News Agencies. Once we have gathered the facts, we will lobby both media companies and Government as your representative body for the legislative change to make this happen.

Please complete the survey HERE to give your view on this vital issue by 15 June 2018.

My data is from the Australian Gambling Statistics 1990–91 to 2015–16, 33rd edition, which is a survey conducted annually by the Queensland Government. The data comes in Excel format, ready to use. I augmented this with data on the number of gaming machines in each state, combined with the Australian Government Productivity Commission’s Inquiry into Gambling from 2010.

Some definitions:
Gaming machines: All jurisdictions, except Western Australia, have a state–wide gaming machine (poker machine) network operating in clubs and/or hotels. (WA only has machines in the Crown Casino, 1,750 of them). The data reported under this heading do not include gaming machine data from casinos. Gaming machines accurately record the amount of wagers played on the machines. So turnover is an actual figure for each jurisdiction. In most jurisdictions operators must return at least 85 per cent of wagers to players as winnings, either by cash or a mixture of cash and product.
Instant lottery: Commonly known as ‘scratchies’, where a player scratches a coating off the ticket to identify whether the ticket is a winner. Prizes in the instant lottery are paid on a set return to player and are based on the number of  tickets in a set, the cost to purchase the tickets, and a set percentage retained by  the operator for costs.
Expenditure (gross profit): These figures relate to the net amount lost or, in other words, the amount wagered less the amount won, by people who gamble.  Conversely, by definition, it is the gross profit (or gross winnings) due to the operators of each particular form of gambling.

Using think cell for the corporate audience

Sometimes, you have an extremely corporate audience, all blue suits and ties.

This audience has seen it all before: every tool, every business idea and every design fad. They do not want razzle dazzle; they want accountability and reproducibility.

They want transparent and honest presentation of your research, assumptions, workings, plans and conclusions, to ensure stakeholders can critique and ultimately buy into your work.

For these types of audiences, I use think cell. It is the secret weapon behind the professional charts of a consultancy firm, and I am sharing it with you! It is an Excel and PowerPoint plug-in, and costs about $300 a year, although you can get a 28 day free trial when you sign up at https://server.think-cell.com/portal/en/trial.srf.

Think cell can do waterfall charts in a flash, gorgeous work breakdown structures using Gantt charts, and calculate and demonstrate compound annual growth rate (CAGR) in a few clicks.
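For anyone who has not met the growth figure before, the compound annual growth rate (CAGR) is the constant yearly rate that takes a starting value to an ending value. The Python sketch below shows the textbook formula only; it is not think cell’s own code, and the values are placeholders.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate between two values `years` apart."""
    return (end_value / start_value) ** (1 / years) - 1


# Placeholder values: 100 growing to 121 over 2 years is 10% a year.
print(f"{cagr(100, 121, 2):.1%}")
```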

For anyone who has tried to do these things in Excel, you are going to enjoy seeing how easy this is.

Waterfall Charts

Waterfall charts are often used to show contributions to movements in profit, revenue or expenditure from one period to the next https://en.wikipedia.org/wiki/Waterfall_chart.

The ABS National Expenditure data

For my data, I am using the Australian Bureau of Statistics National Accounts for 2016/2017 and 2015/2016 to illustrate movements in national expenditure.

http://www.abs.gov.au/AUSSTATS/abs@.nsf/DetailsPage/5206.0Dec%202017?OpenDocument#Time

I selected this data because it is publicly available and appropriate for use of a waterfall chart.

My first step is to ensure I understand how the columns work, and what is a subtotal.

There is a tiny bit of cleaning required.

It is common for ABS data to compare the current period to the same period last year, i.e. Dec 2017 to Dec 2016, to account for seasonal variations between quarters.

So I only included December quarters. I then worked out the movement between each December quarter in each category.
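The movement calculation can be sketched in Python as below. The category names and values are placeholders, not ABS figures, and the function name is hypothetical. The idea is that a waterfall chart needs a starting total, one movement per category, and an ending total that reconciles with them.

```python
def waterfall_steps(previous, current):
    """Return (start total, movement per category, end total)."""
    start = sum(previous.values())
    movements = {category: current[category] - previous[category] for category in previous}
    end = start + sum(movements.values())
    return start, movements, end


# Placeholder December-quarter values per expenditure category.
dec_2016 = {"Category A": 100, "Category B": 50, "Category C": 30}
dec_2017 = {"Category A": 110, "Category B": 45, "Category C": 33}

start, movements, end = waterfall_steps(dec_2016, dec_2017)
print(start, movements, end)

# The end column must reconcile with the later period's total.
assert end == sum(dec_2017.values())
```

Checking that the end total reconciles is a cheap way to catch a missed category or subtotal before charting.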

Think cell viz tool

Think cell is an Office plug-in and therefore has its own menu in Excel (and PowerPoint), as shown in Figure 3 below. I selected the waterfall option.

Think cell have a guide to create each of their charts, including waterfall charts: https://www.think-cell.com/en/support/manual/waterfall.shtml

Firstly, you need to lay the data out as shown in the guide, and select it with your mouse (Table 1). Note the empty row between header and data, which is required. Also note the “e” for the end column.

Table 1 Excel Data Table

Then you select the chart you want: waterfall (Figure 3).

Figure 3 Selecting the right chart

You then move into PowerPoint to paste the chart. This is a bug, as you used to be able to paste directly into Excel. Figure 4 shows how the data in Table 1 is charted, straight out of the box.

Figure 4 Waterfall Chart

The last step is customising it to make it easier to read for the audience.

Figure 5 shows the options when you right click, which are all there to easily add more or less detail and allow you to focus.

Figure 5 Right Click

Almost every aspect is configurable if you click on it. You just need to zoom in so you can differentiate the details.

Figure 6 shows about 5 minutes of finessing within PowerPoint.

Figure 6 The power of think cell

This is the kind of thing corporate executives love to see. The colours are consistent. Key variances are highlighted.

Except for that little yellow 18 which is an error in movement (see how accountable think cell is!).

I just love think cell.  Of my three blogs about data viz tools, this is the one tool I have used extensively before, but I just had to share it.

This tool is built on Microsoft Office and distills years of consulting experience into its left and right mouse buttons. The only downside: it doesn’t work with Google Docs and probably not Mac either.

Down the Conversion Funnel using rawgraphs.io

Feedback on my first Data Viz blog led me down the conversion funnel

My first blog and in-class presentation used Tableau to explore conversion rates

The feedback suggested funnel charts and Sankey diagrams, and a free app on the rawgraphs.io site.

Sankey diagrams, and the more recent alluvial charts, are attention-grabbing flow diagrams

These diagram types basically map the change between a number of histograms showing the same data, but split different ways, as shown by the example alluvial diagram created by Cory Brunson in Diagram 1.

Diagram  1 An Alluvial Diagram using the Titanic data set  http://corybrunson.github.io/ggalluvial/articles/ggalluvial.html

Data cleaning was an iterative and educational process

Once I began experimenting with the tool,  I questioned my decision to use Alluvial charts almost immediately!

The data I had was conversion data, but it was not at all in the right format: it was basically columns of data by day.

Thankfully the rawgraphs tool was fast, so I could play with the data.

The original data ended up like Table 2 below. It was completely aggregated without any of the daily detail.

Table 2 Final data format

For the 55,361 attempted logins, I ensured the categorical variables all added to this number by creating new variables.

The Source variable I defined as the original Mobile, Desktop and Tablet, but then added in Login, Redemption and Sign Up.

For the Destination variable I created the Login, Redemption, Sign up and Cancel Login, Cancel Redemption and Cancel Sign up values.

Lastly for the device variable, I had the original mobile, desktop and tablet and then created a Lost variable.
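The balancing step can be sketched in Python as follows. Only the 55,361 total comes from my data; the per-device counts are placeholder values for illustration, and the function name is hypothetical.

```python
TOTAL_LOGINS = 55_361  # attempted logins, from the data described above


def missing_volume(breakdown, total=TOTAL_LOGINS):
    """Return the count not covered by the categories in `breakdown`."""
    return total - sum(breakdown.values())


# Placeholder counts per device, for illustration only.
device = {"Mobile": 30_000, "Desktop": 20_000, "Tablet": 3_000}

# Add a balancing category so the variable sums to the total.
device["Lost"] = missing_volume(device)

print(device["Lost"], missing_volume(device))
```

After the balancing category is added, the remaining difference is zero, which is the condition each of the three variables had to meet.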

rawgraphs was super easy to use

rawgraphs was an excellent tool for quickly learning about the graphs and what data is required for them.

The website has a comprehensive library of guides.

I used “How to make an alluvial diagram” https://rawgraphs.io/learning/how-to-make-an-alluvial-diagram/

There are four steps:

  1. Load your data
  2. Choose the layout
  3. Map your dimensions
  4. Customize

It was iterative: I cycled through steps 1-4, reformatting the data based on what I learned, but luckily the site was very responsive.

Step 1 Load your data

I actually just pasted mine in, easy!

Figure 1 Step 1

Step 2 Choose a Chart

rawgraphs has 21 chart templates and the ability to create a custom chart.

Figure 2 Step 2

Step 3 Map your dimensions

As shown in Figure 3, the dimensions on the left are parsed into types (number and strings) to create your graph.

Figure 3 Step 3

Step 4 The final step, customise your Visualisation

In the final step, there are a limited number of customisation options, but not enough!

Figure 4 Step 4

I played around with the order of the Steps, in order to make the chart more meaningful, and ended up with Figure 5 below.

Figure 5 The Final Alluvial Chart

My findings

My alluvial chart explains how the traffic moves from mobile, desktop and tablet (bottom left column), into the different steps in the funnel (middle column). The right column shows what proportion of this traffic is lost.

Honestly, this does not make intuitive sense to me, so there must be more work to be done in order to have more columns and splits.

Essentially these types of diagrams best represent a snap shot of a segmented population.  Working through this type of chart (Sankey, Alluvial and Parallel coordinates) made me realise the shortcomings in the data for this purpose.

If each error landing page had been tagged, then I would have not required a LOST category. So this data was not ideal for use with this type of chart, but I learned a lot!

I guess these charts are useful for temporal comparisons, or when you have no idea which parts of the website get used most.

In conclusion, I think the rawgraphs.io site is easy to use, and great for learning about different charts, what they can and cannot do, and what data formats they need. But the Sankey-like charts did not work for the data I had, and I would need to do a lot more rework to get it making sense.

Read on if you want more information on Sankey and Alluvial diagrams

“Sankey Diagrams are attention grabbing flowcharts that help in quick visualisation of the distribution and losses of material and energy in a process. The width of the lines used in drawing the flowchart is proportional to the quantum of material or energy.”
(source: http://www.sankeydiagrams.com)

As material or volume flows from one step to the next,  all volume must be accounted for, including wastage, new inputs and growth via processing.
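That accounting rule can be checked before drawing anything. The Python sketch below is a minimal illustration, with placeholder node names and values and a hypothetical function name: for every node that both receives and sends volume, the inflow should equal the outflow.

```python
from collections import defaultdict


def unbalanced_nodes(flows):
    """Return {node: inflow - outflow} for intermediate nodes that do not balance.

    `flows` is a list of (source, target, value) tuples. Pure sources and
    pure sinks are skipped because they only have one side to compare.
    """
    inflow = defaultdict(int)
    outflow = defaultdict(int)
    for source, target, value in flows:
        outflow[source] += value
        inflow[target] += value
    return {
        node: inflow[node] - outflow[node]
        for node in inflow
        if node in outflow and inflow[node] != outflow[node]
    }


# Placeholder flows: 100 units enter the process, 80 leave as output, 20 as waste.
flows = [("Input", "Process", 100), ("Process", "Output", 80), ("Process", "Waste", 20)]

print(unbalanced_nodes(flows))      # every unit is accounted for
print(unbalanced_nodes(flows[:2]))  # dropping the waste flow leaves 20 units missing
```

A category like the one I had to create for lost traffic plays the same role as the waste flow here: it makes each step balance.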

“Alluvial diagrams are a type of flow diagram originally developed to represent changes in network structure over time. In allusion to both their visual appearance and their emphasis on flow, alluvial diagrams are named after alluvial fans that are naturally formed by the soil deposited from streaming water.” https://en.wikipedia.org/wiki/Alluvial_diagram

Cory Brunson also had some definitions:

  • “An axis is a dimension (variable) along which the data are vertically grouped at a fixed horizontal position. The diagram above uses three categorical axes: Class, Sex, and Age.
  • The groups at each axis are depicted as opaque blocks called strata. For example, the Class axis contains four strata: 1st, 2nd, 3rd, and Crew.
  • Horizontal (x-) splines called alluvia span the width of the diagram. In this diagram, each alluvium corresponds to a fixed value of each axis variable, indicated by its vertical position at the axis, as well as of the Survived variable, indicated by its fill color.
  • The segments of the alluvia between pairs of adjacent axes are flows.
  • The alluvia intersect the strata at lodes. The lodes are not visualized in the above diagram, but they can be inferred as filled rectangles extending the flows through the strata at each end of the diagram or connecting the flows on either side of the center stratum.”