KDnuggets 2016 Poll: Top Algorithms Used by Data Scientists – Usage Trends and Industry vs. Academia Analysis

The KDnuggets 2016 poll of 844 data scientists shows which algorithms are used most, how usage has shifted since 2011, and how it differs across employment sectors and regions. It also reports an industry‑versus‑academia affinity metric. Boosting, text mining, and visualization gained the most share, deep learning entered the poll at 19%, and association rules and uplift modeling declined.

Architects Research Society

The latest KDnuggets poll asked 844 data scientists which methods or algorithms they used in the past 12 months for real data‑science applications. The results show the top 10 algorithms by share of voters and a notable increase in the average number of algorithms used per respondent (8.1).

Fig. 1: Top 10 algorithms used by Data Scientists.

Compared with the 2011 poll, the core methods (Regression, Clustering, Decision Trees/Rules, Visualization) remain dominant, while the biggest relative gains are seen in Boosting (+40%), Text Mining (+30%), Visualization (+27%), Time‑series/Sequence analysis (+25%), Anomaly/Deviation detection (+19%), Ensemble methods (+19%), SVM (+18%), and Regression (+16%).

Boosting, up 40% to a 32.8% share in 2016

Text Mining, up 30% to 35.9%

Visualization, up 27% to 48.7%

Time series/Sequence analysis, up 25% to 37.0%

Anomaly/Deviation detection, up 19% to 19.5%

Ensemble methods, up 19% to 33.6%

SVM, up 18% to 33.6%

Regression, up 16% to 67.1%

Among the options added to the poll in 2016, the most popular are K‑nearest neighbors (46%), PCA (43%), Random Forests (38%), Optimization (24%), Neural networks – Deep Learning (19%), and Singular Value Decomposition (16%).

The largest declines since 2011 are:

Association rules, down 47% to 15.3%

Uplift modeling, down 36% to 3.1%

Factor Analysis, down 24% to 14.2%

Survival Analysis, down 15% to 7.9%
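The gains and declines above are relative changes in share, not percentage‑point differences. As a minimal sketch of that arithmetic, the snippet below backs out the 2011 share implied by each quoted 2016 share and relative change. The function name `implied_2011_share` is introduced here for illustration and is not part of the poll.

```python
def implied_2011_share(share_2016: float, relative_change: float) -> float:
    """Back out the 2011 share (in %) from the 2016 share and the relative change.

    relative_change is a fraction: 0.40 for "up 40%", -0.47 for "down 47%".
    """
    return share_2016 / (1.0 + relative_change)


# (2016 share in %, relative change) as quoted in the lists above.
quoted = {
    "Boosting": (32.8, 0.40),
    "Text Mining": (35.9, 0.30),
    "Visualization": (48.7, 0.27),
    "Association rules": (15.3, -0.47),
    "Uplift modeling": (3.1, -0.36),
}

for name, (share, change) in quoted.items():
    implied = implied_2011_share(share, change)
    print(f"{name}: 2016 = {share:.1f}%, implied 2011 = {implied:.1f}%")
```

For Boosting this gives roughly 23.4%, in line with the rounded 23% reported for 2011 in Table 3. Small mismatches elsewhere are expected, because the quoted changes are themselves rounded.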

Table 1: Algorithm usage by Employment Type

| Employment Type | % Voters | Avg Num Algorithms Used | % Used Supervised | % Used Unsupervised | % Used Meta | % Used Other Methods |
|---|---|---|---|---|---|---|
| Industry | 59% | 8.4 | 94% | 81% | 55% | 83% |
| Government/Non‑profit | 4.1% | 9.5 | 91% | 89% | 49% | 89% |
| Student | 16% | 8.1 | 94% | 76% | 47% | 77% |
| Academia | 12% | 7.2 | 95% | 81% | 44% | 77% |
| All | | 8.3 | 94% | 82% | 48% | 81% |

Almost everyone uses supervised learning algorithms. Government/non‑profit and industry data scientists use the most algorithms on average (9.5 and 8.4, against 8.1 for students and 7.2 for academics), and industry data scientists are the most likely to use meta‑algorithms (55%). Government/non‑profit respondents favor visualization, PCA, and time series. Academic researchers lean toward PCA and deep learning, and students do more text mining and deep learning.

Table 2: Top 10 Algorithms + Deep Learning usage by Employment Type

| Algorithm | Industry | Government/Non‑profit | Academia | Student | All |
|---|---|---|---|---|---|
| Regression | 71% | 63% | 51% | 64% | 67% |
| Clustering | 58% | 63% | 51% | 58% | 57% |
| Decision Trees/Rules | 59% | 63% | 38% | 57% | 55% |
| Visualization | 55% | 71% | 28% | 47% | 49% |
| K‑NN | 46% | 54% | 48% | 47% | 46% |
| PCA | 43% | 57% | 48% | 40% | 43% |
| Statistics | 47% | 49% | 37% | 36% | 43% |
| Random Forests | 40% | 40% | 29% | 36% | 38% |
| Time series | 42% | 54% | 26% | 24% | 37% |
| Text Mining | 36% | 40% | 33% | 38% | 36% |
| Deep Learning | 18% | 9% | 24% | 19% | 19% |

Algorithm bias for a specific employment type is computed as Bias(Alg, Type) = Usage(Alg, Type) / Usage(Alg, All) - 1. A value of 0 means the group uses the algorithm as often as respondents overall, and a positive value means it uses the algorithm more often. The bias plot (Fig. 2) shows that industry data scientists are more likely to use Regression, Visualization, Statistics, Random Forests, and Time Series, while academia leans toward PCA and Deep Learning.

Fig. 2: Algorithm usage bias by Employment.
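The bias formula can be applied directly to the shares in Table 2. The sketch below does this for four algorithms. The dictionary layout and the `bias` helper are illustrative choices rather than anything from the poll, and because the inputs are the rounded table percentages, the results only approximate what Fig. 2 would show.

```python
# Usage shares (%) copied from Table 2 for a few algorithms.
USAGE = {
    "Regression": {
        "Industry": 71, "Government/Non-profit": 63,
        "Academia": 51, "Student": 64, "All": 67,
    },
    "Visualization": {
        "Industry": 55, "Government/Non-profit": 71,
        "Academia": 28, "Student": 47, "All": 49,
    },
    "PCA": {
        "Industry": 43, "Government/Non-profit": 57,
        "Academia": 48, "Student": 40, "All": 43,
    },
    "Deep Learning": {
        "Industry": 18, "Government/Non-profit": 9,
        "Academia": 24, "Student": 19, "All": 19,
    },
}


def bias(algorithm: str, employment_type: str, usage=USAGE) -> float:
    """Bias(Alg, Type) = Usage(Alg, Type) / Usage(Alg, All) - 1."""
    row = usage[algorithm]
    return row[employment_type] / row["All"] - 1.0


for algorithm, row in USAGE.items():
    for employment_type in row:
        if employment_type == "All":
            continue  # the overall share is the baseline, not a group
        value = bias(algorithm, employment_type)
        print(f"{algorithm:14s} {employment_type:22s} {value:+.2f}")
```

With these inputs, Deep Learning has a positive bias for Academia (24% against 19% overall) and a strongly negative one for Government/Non‑profit (9% against 19%).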

Regional participation mirrors overall KDnuggets traffic: US/Canada (40%), Europe (32%), Asia (18%), Latin America (5%), Africa/Middle East (3.4%), Australia/NZ (2.2%).

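Affinity of an algorithm to Industry/Government versus Academia/Students is calculated as (N(Alg, Ind_Gov) / N(Alg, Aca_Stu)) / (N(Ind_Gov) / N(Aca_Stu)) - 1, where N(Alg, G) is the number of respondents in group G who used the algorithm and N(G) is the total number of respondents in group G. Values near 0 indicate equal use; positive values denote “industrial” algorithms, negative values denote “academic” ones.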
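A minimal sketch of the affinity calculation follows. The poll summary does not give the underlying respondent counts, so every number in the example calls is hypothetical and chosen only to show how the metric behaves. The function name and argument names are likewise assumptions made for this illustration.

```python
def affinity(n_alg_ind_gov: int, n_alg_aca_stu: int,
             n_ind_gov: int, n_aca_stu: int) -> float:
    """Industry affinity of one algorithm.

    (N(Alg, Ind_Gov) / N(Alg, Aca_Stu)) / (N(Ind_Gov) / N(Aca_Stu)) - 1
    """
    if n_alg_aca_stu <= 0 or n_ind_gov <= 0 or n_aca_stu <= 0:
        raise ValueError("denominator counts must be positive")
    user_ratio = n_alg_ind_gov / n_alg_aca_stu   # ratio among users of the algorithm
    group_ratio = n_ind_gov / n_aca_stu          # ratio among all respondents
    return user_ratio / group_ratio - 1.0


# Hypothetical group sizes: 500 industry/government, 250 academia/student.
N_IND_GOV, N_ACA_STU = 500, 250

# Hypothetical user counts for three made-up algorithms.
print(affinity(100, 50, N_IND_GOV, N_ACA_STU))  # used by 20% of each group
print(affinity(60, 10, N_IND_GOV, N_ACA_STU))   # 12% of industry vs 4% of academia
print(affinity(30, 25, N_IND_GOV, N_ACA_STU))   # 6% of industry vs 10% of academia
```

Dividing by the group ratio corrects for the two groups having different sizes, so an algorithm used by the same fraction of each group scores 0. The metric is bounded below by -1 (no industry/government users at all) but has no upper bound, which is why the most “industrial” scores can exceed 1.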

The most “industrial” algorithms are Uplift modeling (2.01), Anomaly Detection (1.61), Survival Analysis (1.39), Factor Analysis (0.83), Time series/Sequences (0.69), and Association Rules (0.5). Despite its high industrial affinity, uplift modeling is used by only 3.1% of respondents.

The most “academic” algorithms are Neural networks – regular (‑0.35), Naive Bayes (‑0.35), SVM (‑0.24), Deep Learning (‑0.19), and EM (‑0.17).

Fig. 3. KDnuggets Poll: Top Algorithms used by Data Scientists – Industry vs Academia

Table 3: KDnuggets 2016 Poll – Algorithms Used by Data Scientists (summary)

Type: S = supervised, U = unsupervised, M = meta, Z = other.

| Rank | Algorithm | Type | 2016 % used | 2011 % used | % Change | Industry Affinity |
|---|---|---|---|---|---|---|
| 1 | Regression | S | 67% | 58% | 16% | 0.21 |
| 2 | Clustering | U | 57% | 52% | 8.7% | 0.05 |
| 3 | Decision Trees/Rules | S | 55% | 60% | -7.3% | 0.21 |
| 4 | Visualization | Z | 49% | 38% | 27% | 0.44 |
| 5 | K‑nearest neighbors | S | 46% | | | 0.32 |
| 6 | PCA | U | 43% | | | 0.02 |
| 7 | Statistics | Z | 43% | 48% | -11.0% | 1.39 |
| 8 | Random Forests | S | 38% | | | 0.22 |
| 9 | Time series/Sequence analysis | Z | 37% | 30% | 25.0% | 0.69 |
| 10 | Text Mining | Z | 36% | 28% | 29.8% | 0.01 |
| 11 | Ensemble methods | M | 34% | 28% | 18.9% | -0.17 |
| 12 | SVM | S | 34% | 29% | 17.6% | -0.24 |
| 13 | Boosting | M | 33% | 23% | 40% | 0.24 |
| 14 | Neural networks – regular | S | 24% | 27% | -10.5% | -0.35 |
| 15 | Optimization | Z | 24% | | | 0.07 |
| 16 | Naive Bayes | S | 24% | 22% | 8.9% | -0.02 |
| 17 | Bagging | M | 22% | 20% | 8.8% | 0.02 |
| 18 | Anomaly/Deviation detection | Z | 20% | 16% | 19% | 1.61 |
| 19 | Neural networks – Deep Learning | S | 19% | | | -0.35 |
| 20 | Singular Value Decomposition | U | 16% | | | 0.29 |
| 21 | Association rules | Z | 15% | 29% | -47% | 0.50 |
| 22 | Graph / Link / Social Network Analysis | Z | 15% | 14% | 8.0% | -0.08 |
| 23 | Factor Analysis | U | 14% | 19% | -23.8% | 0.14 |
| 24 | Bayesian networks | S | 13% | | | -0.10 |
| 25 | Genetic algorithms | Z | 8.8% | 9.3% | -6.0% | 0.83 |
| 26 | Survival Analysis | Z | 7.9% | 9.3% | -14.9% | -0.15 |
| 27 | EM | U | 6.6% | | | -0.19 |
| 28 | Other methods | Z | 4.6% | | | -0.06 |
| 29 | Uplift modeling | S | 3.1% | 4.8% | -36.1% | 2.01 |

This poll provides a snapshot of data‑science practice in 2016: boosting, text mining, and visualization gained share since 2011, deep learning entered the list at 19%, and association‑rule mining fell by nearly half.
