Build a Multi‑Layer Perceptron with Keras: Step‑by‑Step Guide
This tutorial walks through using Keras to create, compile, train, and evaluate a multi‑layer perceptron for image classification on the Fashion MNIST dataset, covering data loading, model construction with the Sequential API, hyperparameter choices, and prediction on new samples.
Using Keras to Implement a Multi‑Layer Perceptron
Keras is a high‑level deep‑learning API that simplifies building, training, evaluating, and deploying neural networks. It was released as an open‑source project in March 2015 by François Chollet and supports multiple backends such as TensorFlow, Microsoft Cognitive Toolkit (CNTK), and Theano, making it a multibackend library.
Since late 2016, Keras can also run on Apache MXNet, Apple Core ML, JavaScript/TypeScript (in browsers), and PlaidML, which extends GPU support beyond Nvidia. TensorFlow now bundles its own implementation, tf.keras, which we use in this guide, though the code remains portable to other Keras backends with minimal changes.
Other popular deep‑learning frameworks include PyTorch, whose API is similar to Keras because both draw inspiration from scikit‑learn and Chainer. TensorFlow 2 adopts Keras as its official high‑level API, simplifying model development.
<code># import TensorFlow, tf.keras, and NumPy (used later for seeding and argmax)
import tensorflow as tf
from tensorflow import keras
import numpy as np
</code>Building an Image Classifier with the Sequential API
We start by loading the Fashion MNIST dataset, which contains 70,000 grayscale images of size 28×28 pixels across 10 fashion categories.
Loading the dataset with Keras
<code># load dataset
fashion_mnist = keras.datasets.fashion_mnist
(X_train_full, y_train_full), (X_test, y_test) = fashion_mnist.load_data()
</code>Each image is a 28×28 array of uint8 values (0‑255). We inspect the shape and datatype:
<code># training set shape
X_train_full.shape
</code>(60000, 28, 28)
<code># training set dtype
X_train_full.dtype
</code>dtype('uint8')
We split the original training set into a validation set (5,000 images) and scale pixel values to the 0‑1 range:
<code># create validation set
X_valid, X_train = X_train_full[:5000] / 255., X_train_full[5000:] / 255.
y_valid, y_train = y_train_full[:5000], y_train_full[5000:]
X_test = X_test / 255.
</code>The class names are:
<code>class_names = ["T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
"Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot"]
</code>Sample images from the dataset are shown below:
Creating the model with the Sequential API
<code>model = keras.models.Sequential()
model.add(keras.layers.Flatten(input_shape=[28, 28]))
model.add(keras.layers.Dense(300, activation="relu"))
model.add(keras.layers.Dense(100, activation="relu"))
model.add(keras.layers.Dense(10, activation="softmax"))
</code>Explanation of each layer:
The first line creates a simple sequential model, which stacks layers in order.
The Flatten layer reshapes each 28×28 input image into a 1‑D vector of 784 values; it performs no learning, only preprocessing.
The first Dense layer adds 300 neurons with ReLU activation.
The second Dense layer adds 100 neurons, also with ReLU.
The final Dense layer uses softmax activation and 10 neurons, one for each class.
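To make the computation concrete, the layer stack above can be sketched in plain NumPy. This is a hypothetical forward pass with randomly initialized weights standing in for trained parameters, not the Keras implementation itself:

```python
import numpy as np

rng = np.random.default_rng(42)

def relu(z):
    return np.maximum(0, z)

def softmax(z):
    # subtract the row max for numerical stability
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

# randomly initialized weights standing in for the trained parameters
W1, b1 = rng.normal(0, 0.05, (784, 300)), np.zeros(300)
W2, b2 = rng.normal(0, 0.05, (300, 100)), np.zeros(100)
W3, b3 = rng.normal(0, 0.05, (100, 10)), np.zeros(10)

def forward(images):                      # images: (batch, 28, 28)
    x = images.reshape(len(images), -1)   # Flatten -> (batch, 784)
    h1 = relu(x @ W1 + b1)                # Dense(300, relu)
    h2 = relu(h1 @ W2 + b2)               # Dense(100, relu)
    return softmax(h2 @ W3 + b3)          # Dense(10, softmax)

probs = forward(rng.random((2, 28, 28)))
print(probs.shape)        # (2, 10)
print(probs.sum(axis=1))  # each row sums to 1
```

Each row of the output is a probability distribution over the 10 classes, which is exactly what the softmax output layer produces in the Keras model.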
Alternatively, the model can be built by passing a list of layers:
<code># clear any previous models from memory and seed the RNGs for reproducibility
keras.backend.clear_session()
np.random.seed(42)
tf.random.set_seed(42)
model = keras.models.Sequential([
keras.layers.Flatten(input_shape=[28, 28]),
keras.layers.Dense(300, activation="relu"),
keras.layers.Dense(100, activation="relu"),
keras.layers.Dense(10, activation="softmax")
])
</code>Calling model.summary() displays each layer’s name, output shape, and parameter count. The first hidden layer alone has 784×300 + 300 = 235,500 trainable parameters.
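The per‑layer counts that summary() reports can be checked by hand; a quick sketch of the arithmetic (each Dense layer has one weight per input–output connection plus one bias per neuron):

```python
def dense_params(n_in, n_out):
    """Connection weights (n_in x n_out) plus one bias per output neuron."""
    return n_in * n_out + n_out

# (inputs, neurons) for each Dense layer in the model above
layers = [(28 * 28, 300), (300, 100), (100, 10)]
counts = [dense_params(i, o) for i, o in layers]
print(counts)       # [235500, 30100, 1010]
print(sum(counts))  # 266610 trainable parameters in total
```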
<code>model.summary()
</code>We can inspect individual layers and their weights:
<code># get the name of the first hidden layer
hidden1 = model.layers[1]
hidden1.name
# retrieve weights and biases
weights, biases = hidden1.get_weights()
weights.shape
biases.shape
</code>Compiling the model
<code>model.compile(
loss="sparse_categorical_crossentropy",
optimizer="sgd",
metrics=["accuracy"]
)
</code>We use sparse categorical cross‑entropy because the labels are integer class indices rather than one‑hot vectors. The optimizer is stochastic gradient descent (SGD); its learning rate can be adjusted via keras.optimizers.SGD(learning_rate=...) (the older lr argument is deprecated). Accuracy is tracked as an additional metric.
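To illustrate what this loss computes, here is a small NumPy sketch with made‑up probabilities (not Keras internals): sparse categorical cross‑entropy is the negative log of the probability the model assigned to the true class, averaged over the batch.

```python
import numpy as np

# hypothetical predicted class probabilities: batch of 3 samples, 4 classes
y_proba = np.array([[0.7, 0.1, 0.1, 0.1],
                    [0.2, 0.5, 0.2, 0.1],
                    [0.1, 0.1, 0.1, 0.7]])
y_true = np.array([0, 1, 3])  # integer labels, no one-hot encoding needed

# pick out the probability assigned to the true class of each sample
p_true = y_proba[np.arange(len(y_true)), y_true]  # [0.7, 0.5, 0.7]
loss = -np.log(p_true).mean()
print(round(loss, 4))  # 0.4688
```

The "sparse" variant takes integer labels directly; plain categorical cross‑entropy expects one‑hot targets instead.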
Training and evaluating the model
<code># train the model
history = model.fit(
X_train, y_train,
epochs=30,
validation_data=(X_valid, y_valid)
)
</code>During training, Keras reports the loss and accuracy on both the training and validation sets. In this example, validation accuracy reaches about 89 % after 30 epochs, close to the training accuracy, so the model is not over‑fitting much.
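The fit() call returns a History object whose history attribute maps each metric name to a per‑epoch list, which is handy for inspecting learning curves. A sketch of working with it, using made‑up numbers as stand‑ins for real training output:

```python
# hypothetical per-epoch metrics, shaped like history.history
history_dict = {
    "loss":         [0.72, 0.49, 0.44, 0.41],
    "accuracy":     [0.76, 0.83, 0.85, 0.86],
    "val_loss":     [0.52, 0.46, 0.43, 0.42],
    "val_accuracy": [0.82, 0.84, 0.85, 0.86],
}

# 1-based epoch with the best validation accuracy
best_epoch = max(range(len(history_dict["val_accuracy"])),
                 key=lambda i: history_dict["val_accuracy"][i]) + 1
print(best_epoch)                                    # 4
print(history_dict["val_accuracy"][best_epoch - 1])  # 0.86
```

The same dictionary can be fed to pandas or matplotlib to plot training and validation curves side by side.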
Making predictions with the trained model
<code># predict on new samples (using first three test images)
X_new = X_test[:3]
y_proba = model.predict(X_new)
y_proba.round(2)
</code>The output shows class‑probability vectors; the highest‑probability class for each sample can be obtained with np.argmax(y_proba, axis=1) (the older model.predict_classes(X_new) helper was removed in TensorFlow 2.6). The predictions for the three samples match the true labels.
<code>y_pred = np.argmax(y_proba, axis=1)
np.array(class_names)[y_pred]
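As a self‑contained illustration of this probabilities‑to‑labels step, with made‑up probabilities for two samples over three of the class names:

```python
import numpy as np

names = np.array(["T-shirt/top", "Trouser", "Pullover"])
proba = np.array([[0.05, 0.90, 0.05],   # most likely "Trouser"
                  [0.80, 0.15, 0.05]])  # most likely "T-shirt/top"

pred = np.argmax(proba, axis=1)  # index of the highest probability per row
print(names[pred].tolist())      # ['Trouser', 'T-shirt/top']
```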
</code>True labels:
<code>y_new = y_test[:3]
np.array(class_names)[y_new]
</code>Model Perspective
Insights, knowledge, and enjoyment from a mathematical modeling researcher and educator. Hosted by Haihua Wang, a modeling instructor and author of "Clever Use of Chat for Mathematical Modeling", "Modeling: The Mathematics of Thinking", "Mathematical Modeling Practice: A Hands‑On Guide to Competitions", and co‑author of "Mathematical Modeling: Teaching Design and Cases".