Analysis of Google Quickdraw CNN‑RNN Model for Sketch Recognition
The article dissects Google's Quickdraw sketch-recognition model: its 1-D convolutional front-end, Bi-LSTM encoder, and softmax classifier. It explains the TFRecord-based normalization and delta-computation steps, why pooling harms accuracy, and how the massive dataset can fuel diverse sequential-learning applications and product concepts.
Recently Google released the "Quickdraw" mini‑program, which quickly became a viral AI game. The service is powered by a neural‑network model trained on more than 50 million hand‑drawn sketches. This article examines the underlying model and its data‑processing pipeline.
The Quickdraw model is essentially a classification network. Input consists of stroke points (x, y) together with a start-of-stroke flag. The architecture stacks several 1-D convolutional layers, feeds the result into a Bi-LSTM layer, sums the LSTM outputs over time steps, and finally applies a softmax classifier. A diagram of the network is shown in the original article.
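To make the final stage concrete, here is a minimal numpy sketch of the "sum the outputs over time, then classify with softmax" step. All shapes, weights, and the number of classes below are illustrative stand-ins, not values from the real model:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative shapes only: 8 time steps of 48-dim Bi-LSTM outputs,
# classified into 5 sketch categories.
T, H, NUM_CLASSES = 8, 48, 5
lstm_outputs = rng.normal(size=(T, H))   # stand-in for Bi-LSTM outputs

# Sum the per-timestep outputs into one fixed-size vector, so drawings
# of different lengths all map to an (H,)-dim representation ...
summed = lstm_outputs.sum(axis=0)

# ... then a linear layer plus softmax gives class probabilities.
W = rng.normal(size=(H, NUM_CLASSES))
b = np.zeros(NUM_CLASSES)
logits = summed @ W + b
probs = np.exp(logits - logits.max())    # shift logits for numerical stability
probs /= probs.sum()

print(probs.shape, probs.sum())          # (5,) 1.0
```

Summing (rather than taking only the last hidden state) lets every point of the drawing contribute to the classification, which matters for sketches where the distinguishing strokes may come early.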
Data preprocessing is performed on the TFRecord files. The strokes are normalized and converted to deltas before being fed to the network. The preprocessing code is:
# 1. Size normalization.
lower = np.min(np_ink[:,0:2], axis=0)
upper = np.max(np_ink[:,0:2], axis=0)
scale = upper - lower
scale[scale == 0] = 1
np_ink[:,0:2] = (np_ink[:,0:2] - lower) / scale
# 2. Compute deltas.
np_ink[1:,0:2] -= np_ink[0:-1,0:2]
np_ink = np_ink[1:,:]

The reasons for these steps are two-fold: (1) much like batch normalization, size normalization moves the data into a range where gradients are larger; (2) the model cares about the trajectory of the strokes rather than their absolute size, so scaling removes irrelevant variance. The delta computation removes the influence of the starting coordinate, making drawings that begin at different canvas locations comparable.
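As a sanity check, the preprocessing above can be run end-to-end on a toy ink array. The coordinates below are made up for illustration; columns are x, y, and a stroke flag, matching the layout the code assumes:

```python
import numpy as np

# Toy "ink": four points of one stroke, columns (x, y, stroke_flag).
np_ink = np.array([
    [10.0, 20.0, 0.0],
    [30.0, 20.0, 0.0],
    [30.0, 60.0, 0.0],
    [10.0, 60.0, 1.0],
])

# 1. Size normalization: scale x and y into [0, 1] per drawing.
lower = np.min(np_ink[:, 0:2], axis=0)
upper = np.max(np_ink[:, 0:2], axis=0)
scale = upper - lower
scale[scale == 0] = 1            # guard against degenerate (flat) strokes
np_ink[:, 0:2] = (np_ink[:, 0:2] - lower) / scale

# 2. Compute deltas: each point becomes an offset from its predecessor,
#    which removes the absolute starting position; the first row is dropped.
np_ink[1:, 0:2] -= np_ink[0:-1, 0:2]
np_ink = np_ink[1:, :]

print(np_ink)
# [[ 1.  0.  0.]
#  [ 0.  1.  0.]
#  [-1.  0.  1.]]
```

Note that the stroke-flag column passes through both steps untouched; only the coordinate columns are normalized and differenced.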
Convolutional layer details: multiple 1-D conv layers are cascaded with a linear activation function; no pooling layers are used. Experiments showed that replacing the linear activation with ReLU drops accuracy to roughly 73%, and adding a pooling layer (size 4, stride 4) further reduces it to roughly 70%.
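To illustrate what "linear activation, no pooling" means here, the following is a numpy sketch of a single SAME-padded 1-D convolution over a stroke sequence. The kernel width and channel counts are illustrative, not the model's real hyperparameters:

```python
import numpy as np

def conv1d_same(x, kernel):
    """1-D convolution with SAME padding and linear (identity) activation.

    x: (T, C_in) sequence; kernel: (K, C_in, C_out); returns (T, C_out).
    """
    K, C_in, C_out = kernel.shape
    T = x.shape[0]
    pad = K // 2                            # SAME padding for odd K
    xp = np.pad(x, ((pad, pad), (0, 0)))
    out = np.zeros((T, C_out))
    for t in range(T):
        window = xp[t:t + K]                # (K, C_in) slice at step t
        out[t] = np.tensordot(window, kernel, axes=([0, 1], [0, 1]))
    return out                              # no nonlinearity, no pooling

rng = np.random.default_rng(1)
x = rng.normal(size=(16, 3))                # 16 points of (dx, dy, flag)
k = rng.normal(size=(5, 3, 48))             # illustrative width-5 kernel
y = conv1d_same(x, k)
print(y.shape)                              # (16, 48): sequence length preserved
```

Because there is no pooling, the output keeps one feature vector per input point, so the Bi-LSTM that follows still sees the full temporal resolution of the stroke sequence.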
The author argues that pooling is less useful for sketch data because the input is already a high‑level abstraction (stroke sequences) and the subsequent RNN captures global features.
Broader considerations: Quickdraw was launched as a web demo in 2016 and later revived as a mini-program, accumulating a large real-world dataset. The same sketch data can be leveraged for other sequential classification tasks such as anomaly detection, handwriting recognition, speech recognition, and text classification.
Product ideas inspired by the dataset include:
AutoDraw – automatically converting doodles into polished artwork (already released by Google).
Story generation – creating a four‑panel comic from a sketch and generating a narrative using NLG techniques.
Sketch scoring – automatically evaluating creativity, technical quality, and completeness of a drawing.
Beyond applications, sketch data offers a window into how humans abstract objects and the world. Learning from these simple drawings could improve higher‑level image‑recognition models or even enhance machine reasoning by mimicking human abstraction.
References:
https://tensorflow.juejin.im/tutorials/recurrent_quickdraw.html
https://github.com/tensorflow/models/blob/master/tutorials/rnn/quickdraw/
https://www.jiqizhixin.com/articles/2017-09-12-5
https://juejin.im/post/5b559b76e51d45616f4596dd
https://zhuanlan.zhihu.com/p/39059583
Tencent Cloud Developer
Official Tencent Cloud community account that brings together developers, shares practical tech insights, and fosters an influential tech exchange community.