Unlocking Tree‑Structured Data: A Deep Dive into Recursive Neural Networks and BPTS
Recursive Neural Networks (RNN) extend deep learning to tree and graph structures, using Back‑Propagation Through Structure (BPTS) for training; this article explains their theory, forward and backward computations, implementation details, code snippets, and applications in natural language and scene parsing, while noting practical challenges.
Recursive Neural Networks Overview
Recursive Neural Networks (RNN) process data with hierarchical structures such as trees or graphs, which cannot be handled by standard recurrent networks that assume linear sequences.
By recursively combining child node vectors into a parent vector, the network encodes the whole structure into a single semantic vector. Similar meanings are mapped to nearby points, enabling tasks such as sentiment analysis where the vector of a subtree represents its polarity.
Forward Computation
For a binary node with child vectors C1 and C2 (each of dimension d), the parent vector P is computed by a fully‑connected layer with shared weight matrix W ∈ℝ^{d×2d} and bias b ∈ℝ^{d}: P = tanh( W · [C1 ; C2] + b ) The same parameters W and b are reused at every node, so the recursion proceeds until the root vector is obtained.
Training with Back‑Propagation Through Structure (BPTS)
BPTS propagates the error δ from a parent node to its children. If δ_parent is the error at the parent, the error transferred to each child is: δ_child = (Wᵀ · δ_parent) ⊙ f'(input) where ⊙ denotes element‑wise multiplication and f' is the derivative of the activation function (tanh). In matrix form, the gradient of the loss with respect to W at a given node is: ∇_W = δ_parent · [C1 ; C2]ᵀ The total gradient for the shared weight matrix is the sum of gradients over all nodes in the tree. The bias gradient is the sum of δ_parent over all nodes.
Parameter updates follow standard gradient descent:
W ← W - η · ∇_W
b ← b - η · ∇_bImplementation Details
The implementation is organized around a RecursiveLayer class. A simple TreeNode structure stores the vector for each node and references to its children.
The constructor initializes shared parameters W and b (randomly or from a given initializer). The forward method recursively concatenates child vectors, applies the linear transformation and tanh, and stores the resulting parent vector. The final root vector is accessible as self.root.
Key helper functions: concatenate(child_nodes) stacks child vectors into a single 2d -dimensional vector. calc_delta(node, upstream_delta) computes the error signal for a node using the BPTS formula. calc_gradient(node) accumulates gradients for W and b by traversing the tree in post‑order.
Gradient checking compares analytical gradients with finite‑difference approximations to verify correctness.
Applications
In natural‑language processing, a recursive network can act as a parser that builds a syntax tree for a sentence. Each node’s vector encodes the meaning of the corresponding phrase, enabling downstream tasks such as sentiment analysis.
Parsing can be performed by adding a scoring function score = U · [child1 ; child2], where U is a learned matrix. A greedy algorithm repeatedly merges the adjacent pair with the highest score until a single root remains.
The scoring function is trained with a max‑margin objective: L = max(0, 1 - score_correct + score_incorrect) Similar architectures can be applied to visual scene parsing, where objects are combined hierarchically.
Summary
Recursive neural networks share the same supervised learning framework as fully‑connected, convolutional, and recurrent networks: they minimize a loss defined over input‑output pairs (X, Y). When labeled data are unavailable, alternative learning paradigms such as reinforcement learning may be considered.
Learning Task‑Dependent Distributed Representations by Back‑Propagation Through Structure
Parsing Natural Scenes and Natural Language with Recursive Neural Networks
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
dbaplus Community
Enterprise-level professional community for Database, BigData, and AIOps. Daily original articles, weekly online tech talks, monthly offline salons, and quarterly XCOPS&DAMS conferences—delivered by industry experts.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
