KDnuggets 2016 Poll: Top Algorithms Used by Data Scientists – Usage Trends and Industry vs. Academia Analysis
The KDnuggets 2016 poll of 844 data scientists reveals the most popular algorithms, shifts since 2011, differences in usage across employment sectors, regional participation, and an industry‑academic affinity metric, highlighting a rise in boosting, text mining, visualization, and deep learning while noting declines in association rules and uplift modeling.
The latest KDnuggets poll asked 844 data scientists which methods or algorithms they used in the past 12 months for real data‑science applications. The results show the top 10 algorithms by share of voters and a notable increase in the average number of algorithms used per respondent (8.1).
Fig. 1: Top 10 algorithms used by Data Scientists.
Compared with the 2011 poll, the core methods (Regression, Clustering, Decision Trees/Rules, Visualization) remain dominant. The biggest relative gains in share since 2011 are:
Boosting, up 40% to 32.8% share in 2016
Text Mining, up 30% to 35.9%
Visualization, up 27% to 48.7%
Time series/Sequence analysis, up 25% to 37.0%
Anomaly/Deviation detection, up 19% to 19.5%
Ensemble methods, up 19% to 33.6%
SVM, up 18% to 33.6%
Regression, up 16% to 67.1%
Newly popular options in 2016 include K‑nearest neighbors (46%), PCA (43%), Random Forests (38%), Optimization (24%), Neural networks – Deep Learning (19%), and Singular Value Decomposition (16%).
The biggest relative declines since 2011 are:
Association rules, down 47% to 15.3%
Uplift modeling, down 36% to 3.1%
Factor Analysis, down 24% to 14.2%
Survival Analysis, down 15% to 7.9%
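The gains and declines above are relative changes in share. A minimal sketch of that arithmetic follows, assuming relative change means the 2016 share divided by the 2011 share, minus 1. The function name `relative_change` is illustrative, and the inputs are the rounded shares listed in Table 3, so the results only approximate the published figures, which appear to be based on unrounded shares.

```python
def relative_change(pct_2016: float, pct_2011: float) -> float:
    """Relative change in usage share between the 2011 and 2016 polls."""
    if pct_2011 <= 0:
        raise ValueError("2011 share must be positive")
    return pct_2016 / pct_2011 - 1.0


# Rounded shares (2016 %, 2011 %) as listed in Table 3.
shares = {
    "Regression": (67, 58),
    "Boosting": (33, 23),
    "Association rules": (15, 29),
}

for name, (now, before) in shares.items():
    # A positive value is a gain, a negative value is a decline.
    print(f"{name}: {relative_change(now, before):+.1%}")
```

Because the inputs are rounded to whole percentages, small differences from the changes quoted in the text are expected.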
Table 1: Algorithm usage by Employment Type
| Employment Type | % Voters | Avg Num Algorithms Used | % Used Supervised | % Used Unsupervised | % Used Meta | % Used Other Methods |
|---|---|---|---|---|---|---|
| Industry | 59% | 8.4 | 94% | 81% | 55% | 83% |
| Government/Non‑profit | 4.1% | 9.5 | 91% | 89% | 49% | 89% |
| Student | 16% | 8.1 | 94% | 76% | 47% | 77% |
| Academia | 12% | 7.2 | 95% | 81% | 44% | 77% |
| All | | 8.3 | 94% | 82% | 48% | 81% |
Almost everyone uses supervised learning algorithms. Government/non‑profit and industry data scientists used more algorithms on average (9.5 and 8.4) than students (8.1) or academic researchers (7.2), and industry data scientists were the most likely to use meta‑algorithms (55%). Government/non‑profit respondents favor visualization, PCA, and time series. Academic researchers lean toward PCA and deep learning, and students do more text mining and deep learning.
Table 2: Top 10 Algorithms + Deep Learning usage by Employment Type
| Algorithm | Industry | Government/Non‑profit | Academia | Student | All |
|---|---|---|---|---|---|
| Regression | 71% | 63% | 51% | 64% | 67% |
| Clustering | 58% | 63% | 51% | 58% | 57% |
| Decision Trees/Rules | 59% | 63% | 38% | 57% | 55% |
| Visualization | 55% | 71% | 28% | 47% | 49% |
| K‑NN | 46% | 54% | 48% | 47% | 46% |
| PCA | 43% | 57% | 48% | 40% | 43% |
| Statistics | 47% | 49% | 37% | 36% | 43% |
| Random Forests | 40% | 40% | 29% | 36% | 38% |
| Time series | 42% | 54% | 26% | 24% | 37% |
| Text Mining | 36% | 40% | 33% | 38% | 36% |
| Deep Learning | 18% | 9% | 24% | 19% | 19% |
Algorithm bias for a specific employment type is computed as Bias(Alg, Type) = Usage(Alg, Type) / Usage(Alg, All) - 1. A positive value means that group uses the algorithm more than respondents overall, and a negative value means it uses the algorithm less. The bias plot (Fig. 2) shows that industry data scientists are more likely to use Regression, Visualization, Statistics, Random Forests, and Time Series, while academia leans toward PCA and Deep Learning.
Fig. 2: Algorithm usage bias by Employment.
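The bias formula can be reproduced from the shares in Table 2. The sketch below is illustrative only: the `usage` dictionary and the `bias` function are names chosen for this example, and only three rows of Table 2 are copied in. Since Table 2 reports rounded percentages, the computed values are approximate.

```python
# Usage shares (%) copied from three rows of Table 2.
usage = {
    "Visualization": {"Industry": 55, "Government/Non-profit": 71,
                      "Academia": 28, "Student": 47, "All": 49},
    "PCA": {"Industry": 43, "Government/Non-profit": 57,
            "Academia": 48, "Student": 40, "All": 43},
    "Deep Learning": {"Industry": 18, "Government/Non-profit": 9,
                      "Academia": 24, "Student": 19, "All": 19},
}


def bias(alg: str, emp_type: str) -> float:
    """Bias(Alg, Type) = Usage(Alg, Type) / Usage(Alg, All) - 1."""
    row = usage[alg]
    return row[emp_type] / row["All"] - 1.0


for alg in usage:
    for emp_type in ("Industry", "Government/Non-profit", "Academia", "Student"):
        print(f"{alg:14s} {emp_type:22s} {bias(alg, emp_type):+.2f}")
```

With these inputs, Deep Learning comes out with a positive bias for Academia and a negative one for Government/Non‑profit, matching the direction described in the text.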
Regional participation mirrors overall KDnuggets traffic: US/Canada (40%), Europe (32%), Asia (18%), Latin America (5%), Africa/Middle East (3.4%), Australia/NZ (2.2%).
Affinity of an algorithm to Industry/Government versus Academia/Students is calculated as (N(Alg, Ind_Gov) / N(Alg, Aca_Stu)) / (N(Ind_Gov) / N(Aca_Stu)) - 1. Values near 0 indicate equal use; positive values denote “industrial” algorithms, negative values denote “academic” ones.
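As a worked illustration of the affinity formula, the sketch below uses made-up respondent counts, because the poll's raw counts per group are not given here. The function name `affinity`, its parameter names, and every number in the example are hypothetical.

```python
def affinity(n_alg_ind_gov: int, n_alg_aca_stu: int,
             n_ind_gov: int, n_aca_stu: int) -> float:
    """(N(Alg, Ind_Gov) / N(Alg, Aca_Stu)) / (N(Ind_Gov) / N(Aca_Stu)) - 1."""
    if min(n_alg_aca_stu, n_ind_gov, n_aca_stu) <= 0:
        raise ValueError("denominator counts must be positive")
    return (n_alg_ind_gov / n_alg_aca_stu) / (n_ind_gov / n_aca_stu) - 1.0


# Hypothetical group sizes, NOT taken from the poll:
# 500 Industry/Government respondents and 250 Academia/Student respondents.
N_IND_GOV, N_ACA_STU = 500, 250

# Used in the same proportion by both groups (20% each): affinity is 0.
print(affinity(100, 50, N_IND_GOV, N_ACA_STU))

# Used by 24% of Industry/Government but 8% of Academia/Students: "industrial".
print(affinity(120, 20, N_IND_GOV, N_ACA_STU))

# Used by 6% of Industry/Government but 12% of Academia/Students: "academic".
print(affinity(30, 30, N_IND_GOV, N_ACA_STU))
```

Dividing by the ratio of group sizes normalizes for the fact that one group has more respondents, so an algorithm used at the same rate in both groups scores 0 regardless of how large each group is.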
The most “industrial” algorithms are Uplift modeling (2.01), Anomaly Detection (1.61), Survival Analysis (1.39), Factor Analysis (0.83), Time series/Sequences (0.69), and Association Rules (0.5). Despite its high industrial affinity, uplift modeling is used by only 3.1% of respondents.
The most “academic” algorithms are Neural networks – regular (-0.35), Naive Bayes (-0.35), SVM (-0.24), Deep Learning (-0.19), and EM (-0.17).
Fig. 3: KDnuggets Poll: Top Algorithms used by Data Scientists – Industry vs Academia
Table 3: KDnuggets 2016 Poll – Algorithms Used by Data Scientists (summary)
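Type: S = supervised, U = unsupervised, M = meta, Z = other. Blank cells mark options that were new in 2016 and have no 2011 figure.

| N | Algorithm | Type | 2016 % used | 2011 % used | % Change | Industry Affinity |
|---|---|---|---|---|---|---|
| 1 | Regression | S | 67% | 58% | 16% | 0.21 |
| 2 | Clustering | U | 57% | 52% | 8.7% | 0.05 |
| 3 | Decision Trees/Rules | S | 55% | 60% | -7.3% | 0.21 |
| 4 | Visualization | Z | 49% | 38% | 27% | 0.44 |
| 5 | K‑nearest neighbors | S | 46% | | | 0.32 |
| 6 | PCA | U | 43% | | | 0.02 |
| 7 | Statistics | Z | 43% | 48% | -11.0% | 1.39 |
| 8 | Random Forests | S | 38% | | | 0.22 |
| 9 | Time series/Sequence analysis | Z | 37% | 30% | 25.0% | 0.69 |
| 10 | Text Mining | Z | 36% | 28% | 29.8% | 0.01 |
| 11 | Ensemble methods | M | 34% | 28% | 18.9% | -0.17 |
| 12 | SVM | S | 34% | 29% | 17.6% | -0.24 |
| 13 | Boosting | M | 33% | 23% | 40% | 0.24 |
| 14 | Neural networks – regular | S | 24% | 27% | -10.5% | -0.35 |
| 15 | Optimization | Z | 24% | | | 0.07 |
| 16 | Naive Bayes | S | 24% | 22% | 8.9% | -0.02 |
| 17 | Bagging | M | 22% | 20% | 8.8% | 0.02 |
| 18 | Anomaly/Deviation detection | Z | 20% | 16% | 19% | 1.61 |
| 19 | Neural networks – Deep Learning | S | 19% | | | -0.35 |
| 20 | Singular Value Decomposition | U | 16% | | | 0.29 |
| 21 | Association rules | Z | 15% | 29% | -47% | 0.50 |
| 22 | Graph / Link / Social Network Analysis | Z | 15% | 14% | 8.0% | -0.08 |
| 23 | Factor Analysis | U | 14% | 19% | -23.8% | 0.14 |
| 24 | Bayesian networks | S | 13% | | | -0.10 |
| 25 | Genetic algorithms | Z | 8.8% | 9.3% | -6.0% | 0.83 |
| 26 | Survival Analysis | Z | 7.9% | 9.3% | -14.9% | -0.15 |
| 27 | EM | U | 6.6% | | | -0.19 |
| 28 | Other methods | Z | 4.6% | | | -0.06 |
| 29 | Uplift modeling | S | 3.1% | 4.8% | -36.1% | 2.01 |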
The poll provides a snapshot of data‑science practice in 2016: boosting usage rose 40% relative to 2011, deep learning entered the list at 19%, and association rules fell by 47%.
Architects Research Society
