Artificial Intelligence 18 min read

Why Static Graphs Outperform Dynamic Graphs in AutoDiff: A Deep Dive

This article explains the fundamental differences between static and dynamic computation graphs, compares their memory and performance characteristics, shows how automatic differentiation works in each paradigm, and provides a step‑by‑step implementation of a toy static‑graph AutoDiff engine with Python code examples.

Baidu Geek Talk

Feb 5, 2024

Why Static Graphs Outperform Dynamic Graphs in AutoDiff: A Deep Dive

Static vs. Dynamic Graphs

Static graphs (e.g., TensorFlow, Paddle Fluid) use declarative programming: the computation graph is constructed once before any data is fed, and the same graph is executed repeatedly. Dynamic graphs (e.g., PyTorch, Paddle) use imperative programming: the graph is built on‑the‑fly during each forward pass. Dynamic graphs are easier to prototype but typically consume more GPU memory because the system cannot predict which intermediate tensors will become obsolete. They also require rebuilding the graph for every iteration, which reduces computational efficiency and makes deployment on embedded devices difficult; conversion to a static format such as ONNX or TensorRT is usually required.

Dynamic‑Graph AutoDiff

In a dynamic graph the forward and backward computations happen concurrently. During the forward pass a forward cache stores intermediate results, while a backward cache records gradients that can be computed immediately. When output.backward() is called, the runtime traverses the graph, stitches together the cached gradients, and produces the final gradients for each leaf node.

Static‑Graph AutoDiff

Static graphs construct both the forward sub‑graph and the backward sub‑graph during the build phase. Placeholders represent inputs that are fed at runtime, and any node (forward value or gradient) can be fetched via sess.run(). The following TensorFlow example builds a simple graph, computes the output, and obtains the gradients with respect to the inputs:

import tensorflow as tf
X1 = tf.placeholder(tf.float32, shape=(1,), name="X1")
X2 = tf.placeholder(tf.float32, shape=(1,), name="X2")
h1 = tf.multiply(X1, X2)
h2 = tf.add(h1, X1)
output = tf.div(h2, X2)
grad = tf.gradients(output, [X1, X2])
feed_dict = {"X1": 0.6, "X2": 0.2}
sess = tf.Session()
output_v = sess.run(output, feed_dict)
grad_v = sess.run(grad, feed_dict)

Because the static graph knows the entire topology beforehand, it can perform memory‑usage optimizations, node fusion, and parallel execution planning, resulting in higher computational efficiency and lower memory consumption. Debugging is harder because ordinary Python print statements cannot be inserted; TensorFlow provides tf.Print nodes or fetching intermediate tensors via the fetch_list argument.

Control Flow in Static Graphs

All possible branches must be declared before graph construction. Conditional logic therefore requires building sub‑graphs for every branch. The following C‑style pseudo‑code illustrates this requirement:

if (X > 2) {
    return X * X3;
} else {
    return X4 - X;
}

Toy Static‑Graph AutoDiff Engine

The minimal AutoDiff framework is open‑sourced at https://github.com/FesianXu/ToyAutoDiff. Its core consists of two classes:

Node : stores a list of input nodes, an optional constant attribute, and a name. It represents a vertex in the directed acyclic graph (DAG).

Op (abstract): defines compute for forward evaluation and gradient for building the backward sub‑graph.

Example implementation of Node:

class Node(object):
    def __init__(self):
        self.inputs = []
        self.op = None
        self.const_attr = None
        self.name = ""
    def __add__(self, other):
        if isinstance(other, Node):
            new_node = add_op(self, other)
        else:
            new_node = add_byconst_op(self, other)
        return new_node
    def __mul__(self, other):
        if isinstance(other, Node):
            new_node = mul_op(self, other)
        else:
            new_node = mul_byconst_op(self, other)
        return new_node
    __radd_ = __add_
    __rmul_ = __mul_
    def __str__(self):
        return self.name
    __repr_ = __str_

Abstract Op class:

class Op(object):
    def __call__(self):
        new_node = Node()
        new_node.op = self
        return new_node
    def compute(self, node, input_vals):
        raise NotImplementedError
    def gradient(self, node, output_grad):
        raise NotImplementedError

Concrete matrix‑multiplication operator:

class MatMulOp(Op):
    """Op to matrix multiply two nodes."""
    def __call__(self, node_A, node_B, trans_A=False, trans_B=False):
        new_node = Op.__call__(self)
        new_node.matmul_attr_trans_A = trans_A
        new_node.matmul_attr_trans_B = trans_B
        new_node.inputs = [node_A, node_B]
        new_node.name = "MatMul(%s,%s,%s,%s)" % (node_A.name, node_B.name, str(trans_A), str(trans_B))
        return new_node
    def compute(self, node, input_vals):
        assert len(input_vals) == 2
        assert isinstance(input_vals[0], np.ndarray) and isinstance(input_vals[1], np.ndarray)
        return np.dot(input_vals[0], input_vals[1])
    def gradient(self, node, output_grad):
        """If Y = A B, then dA = dY B^T, dB = A^T dY"""
        return [matmul_op(output_grad, transpose_op(node.inputs[1])),
                matmul_op(transpose_op(node.inputs[0]), output_grad)]

Gradient computation traverses the graph in reverse topological order, invoking each node’s gradient method and accumulating results:

def gradients(output_node, node_list):
    node_to_output_grads = {output_node: [oneslike_op(output_node)]}
    reverse_topo = reversed(find_topo_sort([output_node]))
    for idx, node in enumerate(reverse_topo):
        if idx == 0:
            grads = node.op.gradient(node, oneslike_op(output_node))
        else:
            grads = node.op.gradient(node, node_to_output_grads[node])
        if grads is None:
            continue
        for i, g in enumerate(grads):
            inp = node.inputs[i]
            if inp in node_to_output_grads:
                node_to_output_grads[inp] += g
            else:
                node_to_output_grads[inp] = g
    return [node_to_output_grads[n] for n in node_list]

def find_topo_sort(node_list):
    visited = set()
    order = []
    for n in node_list:
        topo_sort_dfs(n, visited, order)
    return order

def topo_sort_dfs(node, visited, order):
    if node in visited:
        return
    visited.add(node)
    for inp in node.inputs:
        topo_sort_dfs(inp, visited, order)
    order.append(node)

The Executor evaluates a list of target nodes by performing a topological traversal and calling each node’s op.compute method:

class Executor:
    """Executor computes values for a given subset of nodes in a computation graph."""
    def __init__(self, eval_node_list):
        self.eval_node_list = eval_node_list
    def run(self, feed_dict):
        node_to_val = dict(feed_dict)
        topo_order = find_topo_sort(self.eval_node_list)
        for node in topo_order:
            if node.inputs:
                input_vals = [node_to_val[inp] for inp in node.inputs]
                node_to_val[node] = node.op.compute(node=node, input_vals=input_vals)
        return [node_to_val[n] for n in self.eval_node_list]

With these components a functional static‑graph AutoDiff system is demonstrated. Possible future extensions include shape inference, layer abstractions, parameter initializers, optimizers, loss functions, and ultimately a toy TensorFlow‑like framework.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

Python Deep Learning TensorFlow PyTorch Dynamic Graph Static Graph AutoDiff

Written by

Baidu Geek Talk

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.