
Analyzing a Trained Neural Network: Visualizing Hidden Layers and Understanding Its Limitations

This article walks through an interactive exploration of a simple two‑hidden‑layer neural network, showing how real‑time visualizations reveal its learned representations and accuracy limits, and why a constrained training environment leads to over‑confident yet unintelligent predictions. It serves as a short interlude before introducing backpropagation.

Cognitive Technology Team

In the previous article we learned how to train a neural network using gradient descent to minimize the cost function; the next article will dive into computing the related gradients, but first we take a short interlude to see the overall results of the training process.

This lesson focuses on analyzing the trained network to understand exactly what it does and why; before reading on, you are encouraged to experiment yourself and discover the network's behavior.

To that end, we bring back the digit‑drawing demo from the first lesson, this time updating every neuron in real time so you can immediately see how the input pixels affect the hidden layers and, ultimately, the output.

Spend some time experimenting; after you gain a better intuition of the system’s actual operation we will discuss the underlying principles.

[Interactive demo: Analyze Network]

The network we built has two hidden layers with 16 neurons each, a design chosen mainly for aesthetic reasons; it performs reasonably well, correctly classifying about 96% of new images. If you look at some of the images it gets wrong, you may even feel inclined to cut it some slack.

By tweaking and optimizing the hidden‑layer structure you can raise accuracy to roughly 98%, which is already quite good, though modern networks with more sophisticated techniques can reach up to 99.75%.

Considering how difficult the original task is, it is remarkable that any network can perform so well on unseen images without being explicitly told what patterns to look for.

What are these layers doing?

When I originally designed the network I imagined the second layer would detect edge features and the third layer would capture local parts that compose digits. The question is whether the network actually learned those functions.

Recall from the first lesson that we could visualize the weights of the second‑layer neurons as the pixel patterns they focus on.

Now, when we look at the weight transformations from the first layer to the next, they do not capture the dispersed edge features we expected; instead they appear almost random, with only loose patterns that fall far short of the anticipated structure.
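You can inspect this yourself by reshaping a first‑hidden‑layer neuron's weight row into the image grid. The sketch below uses random stand‑in weights (the article's trained weights are not available here); with a real trained model you would substitute the learned matrix:

```python
import numpy as np

# Hypothetical weight matrix for the first hidden layer: 16 neurons,
# each with one weight per input pixel (28 x 28 = 784).
# Random values stand in for the trained weights.
rng = np.random.default_rng(0)
W1 = rng.normal(size=(16, 784))

# A single neuron's weight row, viewed as a 28 x 28 image: positive
# regions are pixels that excite the neuron, negative regions inhibit it.
neuron_image = W1[3].reshape(28, 28)
print(neuron_image.shape)  # -> (28, 28)
```

Plotting `neuron_image` as a heatmap for each of the 16 neurons reproduces the "almost random" patterns described above.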

In the 13,002‑dimensional space of weights and biases the network seems to have settled in a comfortable local minimum. This minimum classifies most images correctly but does not capture more universal patterns.
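The 13,002 figure follows directly from the 784‑16‑16‑10 architecture: one weight per connection between adjacent layers, plus one bias per non‑input neuron.

```python
# Parameter count for a fully connected 784-16-16-10 network.
layer_sizes = [784, 16, 16, 10]

# Weights: one per connection between each pair of adjacent layers.
weights = sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))  # 12544 + 256 + 160

# Biases: one per neuron in every layer after the input.
biases = sum(layer_sizes[1:])  # 16 + 16 + 10

print(weights + biases)  # -> 13002
```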

To truly understand this, look at what happens when you feed a random image into the network.

If the system were intelligent you would expect it to be uncertain about random noise—perhaps not activating any of the ten output neurons or activating them uniformly. Instead, it confidently produces absurd answers, as sure about a noisy image of a “5” as it is about a genuine handwritten “5”.

This shows that although the network can recognize digits, it has no understanding of how to generate them.
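You can see the same over‑confidence with a minimal forward pass. The sketch below uses random stand‑in parameters and a sigmoid activation (as in the earlier lessons); the point is that nothing in the architecture expresses "I don't know", so even pure noise is mapped to some definite output:

```python
import numpy as np

def sigmoid(z):
    z = np.clip(z, -30, 30)  # avoid overflow warnings for large |z|
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)

# Stand-in parameters for a 784-16-16-10 network (not trained values).
W = [rng.normal(size=(16, 784)), rng.normal(size=(16, 16)), rng.normal(size=(10, 16))]
b = [rng.normal(size=16), rng.normal(size=16), rng.normal(size=10)]

def forward(x):
    for Wi, bi in zip(W, b):
        x = sigmoid(Wi @ x + bi)
    return x

noise = rng.random(784)        # pure noise, no digit at all
out = forward(noise)
print(out.argmax(), out.max())  # the network still commits to one digit
```

There is no output that means "this is not a digit"; the argmax always names one of the ten classes.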

The phenomenon largely stems from the overly constrained training environment: from the network’s perspective the entire universe consists of clearly defined, stationary digits centered in a small grid, and the loss function never penalizes over‑confidence.

You might wonder why we introduced edge detection as a motivation if the network never truly learns it. It is not the ultimate goal; it is merely a starting point.

This is an old technique from the 1980s‑1990s. Before diving into more modern variants you need to understand this method, which can still solve interesting problems, even though the deeper you look, the less “intelligent” the hidden layers appear.

Returning to the drawing demo: clicking any neuron in the second layer reveals its weight grid, which detects strange spot‑like shapes rather than the edges we hoped for.

Even better, if you draw a digit while a neuron's weight grid is open, you can watch the relevant weights light up and the neuron's activation change in real time.

An interesting challenge is to draw a “3” and gradually morph it into an “8”. At some point the network’s prediction will change; observe which second‑layer neurons change and why they react when the shape transforms.

Spending time playing with the network will expose many edge cases: confusion, wrong answers, and behavior that does not match our expectations of intelligence. The technique is fascinating but far from perfect.

Also consider the limited training setup: every digit is a specific size and centered. If you present a digit that is too large, too small, or off‑center, the network will become confused.

Our current learning algorithm cannot transfer a pattern learned in one grid region to another, nor can it infer scaling; it does not exploit the adjacency of pixels at all.
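This lack of translation invariance is easy to demonstrate. In a fully connected layer, every pixel position gets its own independent weight, so the same pattern at a different location produces a completely different activation vector (again using random stand‑in weights for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
W1 = rng.normal(size=(16, 784))  # stand-in first-layer weights

# A toy "digit": a bright blob in the upper-left of a 28 x 28 grid.
img = np.zeros((28, 28))
img[4:10, 4:10] = 1.0

shifted = np.roll(img, shift=10, axis=1)  # the same blob, moved right

# Each neuron weighs each pixel position independently, so the shifted
# pattern excites an entirely different set of weights.
act_orig = W1 @ img.ravel()
act_shift = W1 @ shifted.ravel()
print(np.allclose(act_orig, act_shift))  # -> False: not shift-invariant
```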

If you think about how to modify the network architecture for more flexible learning—such as enabling patterns learned in one part of an image to naturally migrate to other parts—you will gain insight into more modern variants, especially convolutional neural networks.
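The key convolutional idea, sketched minimally below, is weight sharing: one small filter is slid across the whole image, so a pattern detected in one region is detected everywhere, and shifting the input simply shifts the response map (the loop‑based convolution here is for clarity, not efficiency):

```python
import numpy as np

def conv2d_valid(img, kernel):
    # Slide the same kernel over every position ("valid" padding):
    # the identical weights are reused everywhere in the image.
    h, w = kernel.shape
    out = np.zeros((img.shape[0] - h + 1, img.shape[1] - w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + h, j:j + w] * kernel)
    return out

# The same toy blob as before, plus a simple 3 x 3 averaging filter
# standing in for a learned feature detector.
img = np.zeros((28, 28))
img[4:10, 4:10] = 1.0
kernel = np.ones((3, 3)) / 9.0

resp = conv2d_valid(img, kernel)
resp_shifted = conv2d_valid(np.roll(img, 5, axis=1), kernel)

# The response map shifts along with the input: convolution is
# shift-equivariant, unlike the fully connected layers above.
print(np.allclose(np.roll(resp, 5, axis=1), resp_shifted))  # -> True
```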

But before that, it is time to study the main engine of neural‑network training: backpropagation.

Translated from: https://www.3blue1brown.com/lessons/neural-network-analysis

Tags: deep learning, neural networks, visualization, model analysis, backpropagation, hidden layers