Template Notebook for Building Machine Learning Models with Scikit-learn
This notebook provides ready‑to‑use Python code templates for ten common machine‑learning techniques (linear regression, logistic regression, decision trees, Naïve Bayes, SVM, K‑Nearest Neighbors, K‑Means, Random Forest, PCA, and Gradient Boosting), showing how to import, train, evaluate, and predict with scikit‑learn.
Each template follows the same pattern: supply your data, adjust the parameters, train the model, and make predictions. Names such as train_dataset_predictor_variables are placeholders to replace with your own data.
1. Linear Regression
Import the linear_model module, create training and test subsets, instantiate a LinearRegression object, fit the model, evaluate its score, print coefficients and intercept, and make predictions.
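As a minimal runnable sketch of these steps, the example below swaps the placeholders for synthetic data drawn from y = 3x + 2 plus small noise (an assumption made purely for illustration) and scores the model on a held-out split rather than on the training data.

```python
import numpy as np
from sklearn import linear_model
from sklearn.model_selection import train_test_split

# Synthetic data (illustrative assumption): y = 3x + 2 plus small Gaussian noise
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))
y = 3 * X[:, 0] + 2 + rng.normal(0, 0.1, size=100)

# Hold out 25% of the rows for evaluation
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

linear = linear_model.LinearRegression()
linear.fit(x_train, y_train)

# R^2 on unseen data is a more honest check than the training score
print('Test R^2:', linear.score(x_test, y_test))
print('Coefficient:', linear.coef_)
print('Intercept:', linear.intercept_)

predicted_values = linear.predict(x_test)
```

The fitted coefficient and intercept should land close to the 3 and 2 used to generate the data, which is a quick sanity check that the pipeline is wired correctly.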
# Import modules
from sklearn import linear_model
# Create training and test subsets
x_train = train_dataset_predictor_variables
y_train = train_dataset_predicted_variable
x_test = test_dataset_predictor_variables
# Create linear regression object
linear = linear_model.LinearRegression()
# Train the model with training data and check the score
linear.fit(x_train, y_train)
linear.score(x_train, y_train)
# Collect coefficients
print('Coefficient: \n', linear.coef_)
print('Intercept: \n', linear.intercept_)
# Make predictions
predicted_values = linear.predict(x_test)
2. Logistic Regression
Replace LinearRegression with LogisticRegression, then fit, score, and predict similarly.
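A hedged sketch of the same pattern for classification, assuming two synthetic, well-separated clusters from make_blobs (the centers are arbitrary illustrative values). Besides hard labels from predict, LogisticRegression also exposes class probabilities through predict_proba.

```python
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Two synthetic classes (illustrative assumption: arbitrary, well-separated centers)
X, y = make_blobs(n_samples=200, centers=[[-2, -2], [2, 2]], cluster_std=1.0, random_state=0)
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

model = LogisticRegression()
model.fit(x_train, y_train)

# For classifiers, score() returns mean accuracy
print('Test accuracy:', model.score(x_test, y_test))

predicted_values = model.predict(x_test)       # hard class labels
predicted_probs = model.predict_proba(x_test)  # one probability column per class
```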
# Import modules
from sklearn.linear_model import LogisticRegression
# Create training and test subsets
x_train = train_dataset_predictor_variables
y_train = train_dataset_predicted_variable
x_test = test_dataset_predictor_variables
# Create logistic regression object
model = LogisticRegression()
# Train the model with training data and check the score
model.fit(x_train, y_train)
model.score(x_train, y_train)
# Collect coefficients
print('Coefficient: \n', model.coef_)
print('Intercept: \n', model.intercept_)
# Make predictions
predicted_values = model.predict(x_test)
3. Decision Tree
Switch to DecisionTreeRegressor or DecisionTreeClassifier, fit, score, and predict.
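To make the regressor/classifier choice concrete, here is a small sketch on the iris dataset bundled with scikit-learn, which has a categorical target and therefore calls for DecisionTreeClassifier. The max_depth=3 limit is an illustrative choice, not a setting from the original template.

```python
from sklearn import tree
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Iris has a categorical target, so the classifier variant applies
X, y = load_iris(return_X_y=True)
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# max_depth=3 is an illustrative limit to curb overfitting
model = tree.DecisionTreeClassifier(max_depth=3, random_state=0)
model.fit(x_train, y_train)

# Comparing both scores hints at how much the tree overfits
print('Train accuracy:', model.score(x_train, y_train))
print('Test accuracy:', model.score(x_test, y_test))

predicted_values = model.predict(x_test)
```

For a continuous target, tree.DecisionTreeRegressor takes the same fit and predict calls, and score then reports R^2 instead of accuracy.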
# Import modules
from sklearn import tree
# Create training and test subsets
x_train = train_dataset_predictor_variables
y_train = train_dataset_predicted_variable
x_test = test_dataset_predictor_variables
# Pick one of the two objects below; assigning both to `model` keeps only the last one
# Create Decision Tree Regressor object (continuous target)
# model = tree.DecisionTreeRegressor()
# Create Decision Tree Classifier object (categorical target)
model = tree.DecisionTreeClassifier()
# Train the model with training data and check the score
model.fit(x_train, y_train)
model.score(x_train, y_train)
# Make predictions
predicted_values = model.predict(x_test)
4. Naïve Bayes
Use GaussianNB for classification.
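The template that follows has no scoring step, so this sketch adds one. It assumes three synthetic Gaussian clusters (arbitrary centers chosen for illustration), a setting that suits GaussianNB's assumption of normally distributed features within each class.

```python
from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Three synthetic Gaussian clusters (illustrative assumption)
X, y = make_blobs(n_samples=300, centers=[[0, 0], [5, 5], [0, 5]], cluster_std=1.0, random_state=0)
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

model = GaussianNB()
model.fit(x_train, y_train)

print('Test accuracy:', model.score(x_test, y_test))

predicted_values = model.predict(x_test)
# Each row holds the posterior probability of every class and sums to 1
predicted_probs = model.predict_proba(x_test)
```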
# Import modules
from sklearn.naive_bayes import GaussianNB
# Create training and test subsets
x_train = train_dataset_predictor_variables
y_train = train_dataset_predicted_variable
x_test = test_dataset_predictor_variables
# Create GaussianNB object
model = GaussianNB()
# Train the model with training data
model.fit(x_train, y_train)
# Make predictions
predicted_values = model.predict(x_test)
5. Support Vector Machine
Instantiate an SVC (or SVR) object, fit, score, and predict.
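A minimal sketch of the classifier case, assuming the synthetic make_moons dataset (two interleaving half-circles that a straight line cannot separate) and SVC's RBF kernel, which is also its default. The C value shown is simply the library default written out.

```python
from sklearn import svm
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split

# Two interleaving half-circles (illustrative synthetic data)
X, y = make_moons(n_samples=200, noise=0.1, random_state=0)
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Note the capitalization: the class is svm.SVC, not svm.svc
model = svm.SVC(kernel='rbf', C=1.0)
model.fit(x_train, y_train)

print('Test accuracy:', model.score(x_test, y_test))
predicted_values = model.predict(x_test)
```

For a continuous target, svm.SVR accepts the same fit, score, and predict calls.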
# Import modules
from sklearn import svm
# Create training and test subsets
x_train = train_dataset_predictor_variables
y_train = train_dataset_predicted_variable
x_test = test_dataset_predictor_variables
# Create SVM Classifier object
model = svm.SVC()
# Train the model with training data and check the score
model.fit(x_train, y_train)
model.score(x_train, y_train)
# Make predictions
predicted_values = model.predict(x_test)
6. K‑Nearest Neighbors
Adjust the n_neighbors hyper‑parameter, fit, and predict.
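Since n_neighbors is the main knob here, one quick way to see its effect is to fit a model per candidate value and compare accuracy. The sketch assumes the bundled iris dataset and an arbitrary candidate list (1, 6, 15).

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Candidate values are arbitrary illustrative choices
scores = {}
for k in (1, 6, 15):
    model = KNeighborsClassifier(n_neighbors=k)
    model.fit(x_train, y_train)
    scores[k] = model.score(x_test, y_test)
    print(f'n_neighbors={k}: accuracy = {scores[k]:.3f}')

# `model` now holds the last fitted classifier (n_neighbors=15)
predicted_values = model.predict(x_test)
```

This comparison is only for building intuition; when actually choosing a value, a separate validation split or cross-validation avoids tuning against the test set.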
# Import modules
from sklearn.neighbors import KNeighborsClassifier
# Create training and test subsets
x_train = train_dataset_predictor_variables
y_train = train_dataset_predicted_variable
x_test = test_dataset_predictor_variables
# Create KNeighbors Classifier object
model = KNeighborsClassifier(n_neighbors = 6) # default value = 5
# Train the model with training data
model.fit(x_train, y_train)
# Make predictions
predicted_values = model.predict(x_test)
7. K‑Means Clustering
Define number of clusters, fit on training data, and predict cluster assignments.
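K-Means is unsupervised, so fit receives only the feature matrix. The following sketch assumes three synthetic clusters with arbitrary centers and sets n_init explicitly, which controls how many random initializations are tried.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Three synthetic clusters (illustrative assumption); the true labels are discarded
X, _ = make_blobs(n_samples=300, centers=[[0, 0], [6, 6], [0, 6]], cluster_std=0.8, random_state=0)
x_train, x_test = X[:240], X[240:]

# n_init=10 runs the algorithm from 10 initializations and keeps the best result
model = KMeans(n_clusters=3, n_init=10, random_state=0)
model.fit(x_train)  # unsupervised: no y_train is passed

print('Cluster centers:\n', model.cluster_centers_)
print('Inertia (within-cluster sum of squares):', model.inertia_)

# Assign unseen points to the nearest learned center
predicted_values = model.predict(x_test)
```

The cluster indices (0, 1, 2) are arbitrary identifiers, so they need not match the order in which the centers were generated.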
# Import modules
from sklearn.cluster import KMeans
# Create training and test subsets
x_train = train_dataset_predictor_variables
# K-Means is unsupervised, so no y_train is needed
x_test = test_dataset_predictor_variables
# Create KMeans object
model = KMeans(n_clusters = 3, random_state = 0)
# Train the model with training data
model.fit(x_train)
# Make predictions
predicted_values = model.predict(x_test)
8. Random Forest
Instantiate RandomForestClassifier, fit on training data, and predict.
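A short sketch on the bundled iris dataset (an illustrative choice). In addition to fitting and predicting, it prints feature_importances_, the impurity-based importance that a fitted forest reports for each input feature.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

data = load_iris()
x_train, x_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.25, random_state=0)

model = RandomForestClassifier(n_estimators=100, random_state=0)
# fit() takes the training features and the training labels
model.fit(x_train, y_train)

print('Test accuracy:', model.score(x_test, y_test))

# Impurity-based importances, one per feature, summing to 1
for name, importance in zip(data.feature_names, model.feature_importances_):
    print(f'{name}: {importance:.3f}')

predicted_values = model.predict(x_test)
```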
# Import modules
from sklearn.ensemble import RandomForestClassifier
# Create training and test subsets
x_train = train_dataset_predictor_variables
y_train = train_dataset_predicted_variable
x_test = test_dataset_predictor_variables
# Create Random Forest Classifier objects
model = RandomForestClassifier()
# Train the model with training data
model.fit(x_train, y_train)
# Make predictions
predicted_values = model.predict(x_test)
9. Dimensionality Reduction
Use PCA or FactorAnalysis to reduce feature space before training.
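The sketch below reduces the four iris features to two components (both the dataset and the component count are illustrative assumptions). The projection is learned on the training set and then reused unchanged on the test set.

```python
from sklearn import decomposition
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Keep 2 components out of the 4 original features (illustrative choice)
pca = decomposition.PCA(n_components=2)

# Learn the projection from the training set only...
reduced_train = pca.fit_transform(x_train)
# ...and reuse it on the test set, so no test information leaks into the fit
reduced_test = pca.transform(x_test)

print('Shapes:', reduced_train.shape, reduced_test.shape)
print('Explained variance ratio:', pca.explained_variance_ratio_)
```

The explained_variance_ratio_ attribute gives the share of variance each component retains, which helps when deciding how many components to keep.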
# Import modules
from sklearn import decomposition
# Create training and test subsets
x_train = train_dataset_predictor_variables
y_train = train_dataset_predicted_variable
x_test = test_dataset_predictor_variables
# Creating PCA decomposition object
pca = decomposition.PCA(n_components = k) # k is a placeholder: the number of components to keep
# Creating Factor analysis decomposition object
fa = decomposition.FactorAnalysis()
# Reduce the dimensionality of the training set using PCA
reduced_train = pca.fit_transform(x_train)
# Reduce the dimensionality of the test set using the same fitted PCA
reduced_test = pca.transform(x_test)
10. Gradient Boosting and AdaBoost
Instantiate GradientBoostingClassifier (or AdaBoost), fit, and predict.
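The template that follows covers only GradientBoostingClassifier, so this sketch fits it side by side with AdaBoostClassifier on the same data. The synthetic two-cluster dataset and the AdaBoost settings are illustrative assumptions; the gradient boosting parameters mirror the template's.

```python
from sklearn.datasets import make_blobs
from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Two synthetic classes (illustrative assumption)
X, y = make_blobs(n_samples=300, centers=[[-2, -2], [2, 2]], cluster_std=1.0, random_state=0)
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

models = {
    'GradientBoosting': GradientBoostingClassifier(
        n_estimators=100, learning_rate=1.0, max_depth=1, random_state=0),
    'AdaBoost': AdaBoostClassifier(n_estimators=100, random_state=0),
}

# Both estimators share the same fit / score / predict interface
test_scores = {}
for name, model in models.items():
    model.fit(x_train, y_train)
    test_scores[name] = model.score(x_test, y_test)
    print(f'{name}: test accuracy = {test_scores[name]:.3f}')
```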
# Import modules
from sklearn.ensemble import GradientBoostingClassifier
# Create training and test subsets
x_train = train_dataset_predictor_variables
y_train = train_dataset_predicted_variable
x_test = test_dataset_predictor_variables
# Creating Gradient Boosting Classifier object
model = GradientBoostingClassifier(n_estimators = 100, learning_rate = 1.0, max_depth = 1, random_state = 0)
# Training the model with training data
model.fit(x_train, y_train)
# Make predictions
predicted_values = model.predict(x_test)
The workflow for each algorithm involves defining a business problem, preprocessing data, training the model, tuning hyper‑parameters, validating results, and iterating until satisfactory accuracy is achieved.
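The tuning and validation steps of that workflow can be sketched with GridSearchCV, which cross-validates every parameter combination on the training data and refits the best one. The dataset, the choice of a decision tree, and the max_depth grid are all illustrative assumptions.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Candidate hyper-parameter values (illustrative assumption)
param_grid = {'max_depth': [1, 2, 3, 4, 5]}

# 5-fold cross-validation on the training set for each candidate
search = GridSearchCV(DecisionTreeClassifier(random_state=0), param_grid, cv=5)
search.fit(x_train, y_train)

print('Best parameters:', search.best_params_)
print('Mean cross-validated accuracy:', search.best_score_)

# The test set is touched only once, for the final check
print('Test accuracy:', search.score(x_test, y_test))
```

Keeping the test set out of the search means the final accuracy is measured on data that played no part in choosing the hyper-parameters.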
Python Programming Learning Circle
