Impact with data science flows from empowered individuals who can make wise, well-informed decisions and act on them. Does this sound like data science?

In a nutshell:

  • Data Science is about evidence-based decisions, action, and impact

  • The application and context of data is everything

  • … and this is inherently more focused on humans than computers

  • If you’re worried your job might get automated out of existence, prepare to think much bigger than any one individual’s data science activities and technology-centric thinking.

Data is and should be boring

Data is the most boring part of being a data scientist, which is as it should be. Data is inherently passive. Science is the important part! Data scientists are trying to learn something, to make better decisions, to help effect change.

To make data science valuable to some domain of human endeavor, you need many resources. Data is the material and software provides the tools. Quality data and software tooling are certainly critical, but they are just the beginning of a long funnel that converts the data into impact. Most of the resources are actually human resources, and this seems to be commonly under-appreciated. In loose order of development, this funnel might be:

  • acquire interested people
  • acquire data
  • gather or develop tooling to work with the data (e.g. ETL, analytics utilities, reproducible test environments, models)
  • organize efforts effectively through project management

The last one is the meaty part of the work to be done! It includes:

  • validation of data
  • activation of data
  • modeling the data
  • training contributors
  • coordination between subject matter experts, data scientists, politicians, journalists, social leaders, etc.
  • setting valuable (and SMART) goals
  • creating valuable outcomes through model application (e.g. generating and testing hypotheses or predictions)
  • documenting and communicating the activities and outcomes
  • empowering future contributions through good reproducibility practices and advocacy

This is in stark contrast to the individually-focused workflows for data science that are ubiquitous. (See slide 18 in the presentation embedded below.)

The DIKW pyramid and the impact of data science on business outcomes

I like to think about the role of good data science practices in the context of the DIKW pyramid: Data -> Information -> Knowledge -> Wisdom

To progress along this hierarchy, we ideally need activities in this order (roughly, and not 1-to-1):

  • Collection
  • Governance
  • Enablement
  • Activation

Ultimately, without activation there can be no impact.

Ideally, there is a trustworthy, reliable process associated with each activity, aimed at reducing the cost and risk associated with data science. Everything about moving down the hierarchy involves creating more trust, accountability, reliability, reproducibility, and resilience.

The deeper we go into this hierarchy, the more it exists in the minds of people and the less it is stored in computers. Beyond the thin slices captured by reports or essays, or the prototypical facsimiles encoded into predictive models, most of the knowledge and (especially) wisdom that can be derived from good data science exists almost entirely outside of the digital realm. Data is inherently passive as a historical record. The data layer doesn’t speak for itself. Any other qualities we associate with our data (“this data will change the way we think about X”) are actually being projected onto it, aspirationally. Realizing those qualities requires significant human activity to build the supporting layers (“knowledge infrastructure”, including systems of formal logic and subject matter expertise) that let us “mine” the data and derive insights that are useful to humans.

A checklist for enablement and activation

Feel free to use this checklist in your own work, and let me know how it works for you (and whether you have any comments or suggestions).

This is the first of several posts about “data enablement”.