
What Is Data Science? Definitions, Work Processes, and Roles – Reflections on a Decade of Data Science and Future Visualization Tools

This article reviews a decade of data‑science growth, defines data science as a multidisciplinary field, outlines its four high‑level and fourteen low‑level work processes, categorises nine distinct data‑science roles, and discusses how these insights should shape the next generation of data‑visualisation and analysis tools.

Architects Research Society

Jul 1, 2023

Data science has exploded over the past ten years, reshaping business and preparing the next generation of professionals, yet its rapid rise has left many ambiguities about how to extract actionable insights from massive datasets.

Motivated by personal career reflections and a desire to identify unmet needs for visual‑analysis tools, the author reviewed the research paper “Passing the Data Baton: A Retrospective Analysis of Data Science Work and Workers,” extracting key findings about what data science is, what data‑science work entails, and who the data‑science workers are.

The study also serves as a foundation for future research and tool development, aiming to address gaps that did not exist when the author first pursued advanced computer‑science research a decade ago.

What Is Data Science?

Data science means different things to different people. Some view it as the practical application of long‑standing statistical techniques, while others argue it also requires computational skills to scale those techniques to large datasets. A third perspective treats data science as a genuinely new discipline that combines statistics, computer science, and domain expertise to solve challenges unique to real‑world, large‑scale data.

“Data science is a multidisciplinary field that aims to learn new insights from real‑world data through the structured application of core statistical and computational techniques.”

This definition highlights the challenges data‑science practitioners face, especially when working with real rather than simulated data and when applying statistical and computational methods at scale.

What Is Data Science Work?

The authors distilled data‑science work into four high‑level phases (Preparation, Analysis, Deployment, and Communication) and fourteen lower‑level processes. In the accompanying figure, which is not reproduced here, the processes highlighted in red rely heavily on data visualisation, though visualisation is not limited to those steps.

By focusing on the specific analyses performed by data scientists (rather than all possible analyses), the framework narrows the research scope and aligns with industry standards such as the KDD (Knowledge Discovery in Databases) methodology, which has been extended over time.

Who Are Data Science Workers?

A meta‑analysis of twelve studies involving thousands of identified data scientists revealed nine distinct roles, spanning statistics, computer science, and domain expertise. These roles are fluid and overlapping, and they reflect growing specialization within data‑science teams, including emerging positions such as data engineers and ML/AI engineers.

The diversity of roles explains communication gaps between data scientists and those seeking help, and underscores the need for tools that cater to varied responsibilities.

How Will This Change the Way We Build Visualization and Data‑Analysis Tools?

Understanding the definition of data science and the detailed work‑role framework helps create evidence‑based visualisation standards. By classifying users into the nine identified roles, tool designers can better target the specific tasks each role performs, avoiding one‑size‑fits‑all solutions.

The research also reveals a critical shortfall in current tools: most focus on visualising machine‑learning models and neglect other essential phases such as data preparation, deployment, and communication. This gap increases the overhead of data‑science work and hampers the ability of data scientists—regardless of role—to influence organisational decisions.
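As a rough illustration of this gap, the sketch below tags each tool in an inventory with the high‑level phases it supports and reports the phases that nothing covers. The four phase names come from the framework. Everything else is an assumption made for illustration: the tool names, their coverage, the `uncovered_phases` helper, and the placement of model visualisation under the Analysis phase.

```python
from enum import Enum


class Phase(Enum):
    """The four high-level phases named in the framework."""
    PREPARATION = "Preparation"
    ANALYSIS = "Analysis"
    DEPLOYMENT = "Deployment"
    COMMUNICATION = "Communication"


def uncovered_phases(tools: dict[str, set[Phase]]) -> list[Phase]:
    """Return the phases that no tool in the inventory supports."""
    # Union of every tool's phase set; an empty inventory yields an empty set.
    covered = set().union(*tools.values())
    # Iterating over the Enum keeps the phases in their definition order.
    return [phase for phase in Phase if phase not in covered]


# Hypothetical inventory: tool names and coverage are invented for this sketch.
# Model visualisation is assumed here to belong to the Analysis phase.
example_tools = {
    "model_dashboard": {Phase.ANALYSIS},
    "feature_plotter": {Phase.ANALYSIS},
}

gaps = uncovered_phases(example_tools)
print([phase.value for phase in gaps])
```

With an inventory made up only of model‑visualisation tools, the check reports Preparation, Deployment, and Communication as unsupported, which mirrors the shortfall described above. A real audit would likely work at the level of the fourteen lower‑level processes rather than the four phases.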

Addressing these gaps offers an opportunity to develop next‑generation visualisation and analysis platforms that support the full spectrum of data‑science activities.
