Core Concepts and Relationships in Data Science: Big Data, Machine Learning, Data Mining, Deep Learning, and AI

This article examines six core data‑science concepts—Big Data, Machine Learning, Data Mining, Deep Learning, Artificial Intelligence, and Data Science itself—explaining their definitions, interrelationships, and how they fit together as pieces of a larger analytical puzzle.

Architects Research Society

Jul 10, 2020

Examining how several key concepts in data science relate to one another is a useful way to probe the field's difficulties. As we shall see, differing opinions on specific concepts are inevitable and need to be taken into account.

There is no shortage of articles on the web that compare and contrast data-science terminology. Many people have written pieces to convey their views, and the result is an overwhelming amount of opinion.

So, let me state plainly for those wondering if this is one of those posts: yes, it is.

Why another? Although many opinions partially define and compare these related terms, the fact is that most of the terminology is fluid and not universally agreed upon; exposing oneself to other viewpoints is one of the best ways to test and improve one’s own understanding.

Thus, even if you do not fully agree with my take on every one of these terms, there is still something to be gained by examining the core concepts of data science, or at least what I consider core, and by outlining how they relate to each other as parts of a larger puzzle.

As an example of differing opinions, Gregory Piatetsky‑Shapiro of KDnuggets compiled a Venn diagram that outlines the relationships among the same data‑science terms we will consider. Readers are encouraged to compare this diagram with Drew Conway’s well‑known data‑science Venn diagram, as well as with my own discussion and the revised relationship diagram near the bottom of the post.

We will now walk through the same six core concepts shown in the Venn diagram and look at how they combine to form the data-science puzzle, beginning with one of the most popular topics of the past decade.

Big Data

There are many articles defining big data, but in short, big data can be defined as datasets that are “beyond the capture, management, and processing capabilities of commonly used software tools.” The definition is deliberately vague yet accurate enough to capture its core characteristic.

To understand the remaining concepts, it helps to look at their search‑term popularity and N‑gram frequencies, which distinguish fact from hype. For the older concepts (1980‑2008) the N‑gram frequencies are shown above.

Recent Google Trends show two emerging terms, two terms with sustained growth, and one term gradually declining. Note that big data is not included in the quantitative analysis graphics.

Machine Learning

According to Tom Mitchell’s seminal work, machine learning “concerns the question of how to build computer programs that automatically improve with experience.” It is inherently interdisciplinary, drawing techniques from computer science, statistics, and artificial intelligence. The main artefacts of machine learning are algorithms that can automatically improve from experience and be applied across many domains.
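To make "improve with experience" concrete, here is a minimal sketch rather than anything taken from Mitchell's book. It assumes a toy one-dimensional task in which the hidden rule is "label 1 when x >= 8", and a hypothetical learner, `fit_threshold`, that places its cut-off halfway between the largest negative and the smallest positive example it has seen. The data and function names are invented for illustration.

```python
def fit_threshold(examples):
    """Learn a cut-off halfway between the largest negative and smallest positive example."""
    largest_negative = max(x for x, label in examples if label == 0)
    smallest_positive = min(x for x, label in examples if label == 1)
    return (largest_negative + smallest_positive) / 2


def accuracy(threshold, test_set):
    """Fraction of test points for which 'x > threshold' matches the true label."""
    correct = sum((x > threshold) == bool(label) for x, label in test_set)
    return correct / len(test_set)


# Assumed ground truth for this toy task: the label is 1 when x >= 8.
test_set = [(x, int(x >= 8)) for x in range(11)]

# "Experience": labelled examples that arrive over time (made up for this sketch).
experience = [(0, 0), (10, 1), (3, 0), (9, 1), (6, 0), (8, 1)]

scores = []
for n in (2, 4, 6):
    threshold = fit_threshold(experience[:n])
    scores.append(accuracy(threshold, test_set))
    print(f"{n} examples -> threshold {threshold}, accuracy {scores[-1]:.2f}")
```

The program itself never changes; only the amount of labelled data it has seen does, and its accuracy on the held-out points rises as a result. Real learners are far more elaborate, but the loop of fitting on experience and then predicting on unseen cases is the same.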

There is little doubt that machine learning is a core aspect of data science. If the goal of data science is to extract insight from data, machine learning is the engine that automates that process. While it shares much with classical statistics—both use samples to infer and generalize—machine learning rarely focuses on descriptive analysis and instead serves as an intermediate step for prediction.

Machine learning is often equated with pattern recognition; however, I tend to avoid the term because pattern recognition can imply a broader set of processes than machine learning actually encompasses.

Machine learning has a complex relationship with data mining.

Data Mining

Fayyad, Piatetsky‑Shapiro, and Smyth define data mining as “the application of specific algorithms to extract patterns from data.” This emphasizes the use of algorithms rather than the algorithms themselves. We can define the relationship between machine learning and data mining as follows: data mining is a process in which machine‑learning algorithms are used as tools to extract valuable patterns stored in a dataset.
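As a rough illustration of "applying a specific algorithm to extract patterns from data", the sketch below counts which pairs of items occur together in a set of shopping baskets. This is plain co-occurrence counting, a stripped-down relative of frequent-itemset mining, and not a method described by Fayyad and colleagues. The baskets, the `frequent_pairs` helper, and the `min_support` parameter are all assumptions made for the example.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(transactions, min_support):
    """Return item pairs that appear together in at least min_support transactions."""
    counts = Counter()
    for basket in transactions:
        # Sorting gives each pair one canonical order, so (a, b) and (b, a) are merged.
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}


# Made-up shopping baskets, purely for illustration.
transactions = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["milk", "eggs"],
    ["bread", "butter"],
    ["bread", "butter", "milk"],
]

patterns = frequent_pairs(transactions, min_support=3)
print(patterns)
```

The emphasis is on the process: the counting routine is only the tool, while choosing the data, setting the support threshold, and deciding which of the resulting patterns are valuable is the mining.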

Data mining, as a sister term to machine learning, is also crucial to data science. Before the term “data science” became popular, data mining enjoyed greater success as a Google search term. Over time, data mining has been split between machine learning and data science itself. If we view data mining as a process, it makes sense to consider data science as a superset that includes data mining.

Deep Learning

Deep learning is a relatively new term, although it existed before its recent surge in online searches. Because of its remarkable success across many fields, interest from both research and industry is booming. Deep learning is the process of applying deep neural-network techniques (networks with multiple hidden layers) to solve problems. Like data mining, deep learning is a process that employs a specific type of machine-learning algorithm.
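To show what "multiple hidden layers" means structurally, here is a small sketch of a forward pass through a network with two hidden layers. The weights are hand-picked and untrained, the layer sizes are arbitrary, and ReLU is assumed as the activation function; a real system would learn the weights from data and would normally rely on a numerical library rather than plain Python lists.

```python
def relu(values):
    """Rectified linear unit: negative values become zero."""
    return [max(0.0, v) for v in values]


def dense(inputs, weights, biases):
    """One fully connected layer: each output is a weighted sum of the inputs plus a bias."""
    return [sum(w * x for w, x in zip(row, inputs)) + b for row, b in zip(weights, biases)]


def forward(inputs, layers):
    """Pass inputs through every layer, applying ReLU after all but the last one."""
    activations = inputs
    for i, (weights, biases) in enumerate(layers):
        activations = dense(activations, weights, biases)
        if i < len(layers) - 1:
            activations = relu(activations)
    return activations


# Hypothetical, untrained weights: 2 inputs -> 3 hidden -> 2 hidden -> 1 output.
layers = [
    ([[1.0, -1.0], [0.5, 0.5], [-1.0, 1.0]], [0.0, 0.0, 0.0]),  # hidden layer 1
    ([[1.0, 1.0, 0.0], [0.0, 1.0, 1.0]], [0.0, -1.0]),          # hidden layer 2
    ([[1.0, -1.0]], [0.5]),                                      # output layer
]

output = forward([2.0, 1.0], layers)
print(output)
```

Each hidden layer transforms the previous layer's output before passing it on, and stacking several such transformations is what makes the network "deep".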

Several important points should be remembered about deep learning:

• It is not a universal “silver bullet” that solves every problem.

• It is not the legendary “master algorithm”; deep learning does not replace all other machine‑learning or data‑science techniques, at least not yet.

• Expectations should be tempered: although deep learning has made great strides in classification tasks (notably computer vision and NLP), as well as in reinforcement learning and other areas, it cannot yet handle extremely complex problems such as “solving world peace.”

• Deep learning and artificial intelligence are not synonyms.

• Deep learning can greatly assist data science as an auxiliary process and set of tools, making it a valuable complement to the field.

Artificial Intelligence

Most people find a precise definition of artificial intelligence elusive, even when presented with broad definitions. I am not an AI researcher, so my answer may differ from those in other fields. After years of philosophical reflection, I conclude that AI, at least as commonly imagined, does not truly exist as a concrete entity.

In my view, AI is a moving target, a shifting benchmark that never becomes fully attainable. Each time we claim an AI achievement, the achievement is re‑labelled as something else.

Where does AI fit into data science? I do not believe AI is a tangible object, yet many AI‑related areas—such as deep learning research—benefit data science. Computer vision, for example, clearly draws from AI concepts.

AI is probably the research-and-development effort with the deepest pockets, even though it has not yet produced a standalone product that transforms the industry. The relationship between AI and data science is mediated by many intermediate steps that AI has helped develop and refine.

Data Science

After discussing the related concepts and their positions within data science, defining data science itself proves to be the toughest challenge. Data science is a multidisciplinary field that includes machine learning and other analytical processes, statistics and related mathematics, and increasingly high‑performance scientific computing, all aimed at extracting insight from data and telling stories with those insights.

Data science employs a wide variety of tools from many related domains. Depending on who is speaking, the term is used either as a synonym for data mining or as a superset that contains data mining.

Data science produces many different outcomes, but they all share the common goal of insight. It may mean something entirely different to each practitioner, and we have not even covered data acquisition, cleaning, or preprocessing yet—let alone what “data” actually is or whether it must be “big.”

My view of the data‑science puzzle aligns well with the Venn diagram at the top of the article and with Drew Conway’s diagram, though I would argue that Conway’s graphic refers more to data scientists than to data science itself.

Of course, this is not a complete, static picture; opinions evolve. For example, I once read that data mining was a sub‑field of business intelligence, a view that no longer seems valid.

If you feel strongly about any of the points raised, feel free to comment; otherwise, I hope this article either introduces newcomers to the data‑science puzzle or encourages readers to reflect on their own mental model of it.
