How Smaz2 Compresses LoRa Messages on Tiny Devices
This article explains the motivation, dictionary design, bigram table, encoding rules, and real‑world compression results of the Smaz2 library, a space‑optimized C/Python compressor for short LoRa messages that runs in less than 2 KB of RAM on microcontrollers.
Motivation
LoRa networks have extremely limited bandwidth, often requiring several seconds to transmit a single message. When LoRa carries human‑readable text, a lightweight compression scheme can significantly improve channel utilization. The Smaz2 library targets very memory‑constrained devices such as the ESP32 running MicroPython, keeping total RAM usage under 2 KB.
Word Dictionary
The compressor uses a fixed dictionary of 256 common English words. Words shorter than four bytes are omitted because they are encoded more efficiently as bigrams. The complete word list follows:
"that", "this", "with", "from", "your", "have", "more", "will",
"home", "about", "page", "search", "free", "other", "information",
"time", "they", "site", "what", "which", "their", "news", "there",
"only", "when", "contact", "here", "business", "also", "help",
"view", "online", "first", "been", "would", "were", "services",
"some", "these", "click", "like", "service", "than", "find",
"price", "date", "back", "people", "list", "name", "just",
"over", "state", "year", "into", "email", "health", "world",
"next", "used", "work", "last", "most", "products", "music", "data",
"make", "them", "should", "product", "system", "post", "city",
"policy", "number", "such", "please", "available", "copyright",
"support", "message", "after", "best", "software", "then", "good",
"video", "well", "where", "info", "rights", "public", "books",
"high", "school", "through", "each", "links", "review", "years",
"order", "very", "privacy", "book", "items", "company", "read",
"group", "need", "many", "user", "said", "does", "under",
"general", "research", "university", "january", "mail", "full", "reviews",
"program", "life", "know", "games", "days", "management", "part",
"could", "great", "united", "hotel", "real", "item", "international",
"center", "ebay", "must", "store", "travel", "comments", "made",
"development", "report", "member", "details", "line", "terms",
"before", "hotels", "send", "right", "type", "because", "local",
"those", "using", "results", "office", "education", "national",
"design", "take", "posted", "internet", "address", "community",
"within", "states", "area", "want", "phone", "shipping", "reserved",
"subject", "between", "forum", "family", "long", "based", "code",
"show", "even", "black", "check", "special", "prices", "website",
"index", "being", "women", "much", "sign", "file", "link",
"open", "today", "technology", "south", "case", "project", "same",
"pages", "version", "section", "found", "sports", "house", "related",
"security", "both", "county", "american", "photo", "game", "members",
"power", "while", "care", "network", "down", "computer", "systems",
"three", "total", "place", "following", "download", "without",
"access", "think", "north", "resources", "current", "posts", "media",
"control", "water", "history", "pictures", "size", "personal",
"since", "including", "guide", "shop", "directory", "board",
"location", "change", "white", "text", "small", "rating", "rate",
"government"Bigram Table
If no matching word is found, the compressor falls back to a table of the 128 most frequent bigrams, which occupies 256 bytes. The table is a single 256-character string, shown here wrapped across several lines:
intherreheanonesorteattistenntartondalitseediseangoule
comeneriroderaioicliofasetvetasihamaecomceelllcaurla
chhidihofonsotacnarssoprrtsassusnoiltsemctgeloeebet
rnipeiepancpooldaadviunamutwimoshyoaiewowosfiepttmi
opiaweagsuiddoooirspplscaywaigeirylytuulivimabty
Encoding Rules
Byte values 128‑255 encode the ID of a bigram (0‑127).
Byte values 0 or 9‑127 represent themselves.
Byte value 6 is followed by a byte that gives the ID of a word to emit.
Byte value 7 works like 6 but also emits a trailing space after the word.
Byte value 8 works like 6 but emits a leading space before the word.
Byte values 1‑5 indicate that the next 1‑5 bytes are literal (verbatim) bytes.
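The rules above can be sketched as a small decoder. This is a minimal Python illustration, not the library's actual code. The function name `decompress` and the truncated eight‑word `WORDS` list are choices made for this example, and it assumes that bigram ID i sits at byte offset 2i in the table and that word IDs follow the order of the word list.

```python
# Bigram table as printed in the article: 128 entries, two bytes each.
BIGRAMS = (
    "intherreheanonesorteattistenntartondalitseediseangoule"
    "comeneriroderaioicliofasetvetasihamaecomceelllcaurla"
    "chhidihofonsotacnarssoprrtsassusnoiltsemctgeloeebet"
    "rnipeiepancpooldaadviunamutwimoshyoaiewowosfiepttmi"
    "opiaweagsuiddoooirspplscaywaigeirylytuulivimabty"
).encode("ascii")

# Assumption: only the first eight dictionary words, with IDs in list order.
WORDS = [b"that", b"this", b"with", b"from", b"your", b"have", b"more", b"will"]


def decompress(data):
    out = bytearray()
    i = 0
    while i < len(data):
        b = data[i]
        if b >= 128:                 # 128-255: bigram ID 0-127
            off = (b - 128) * 2      # assumed layout: ID i at offset 2i
            out += BIGRAMS[off:off + 2]
            i += 1
        elif b in (6, 7, 8):         # word escape, next byte is the word ID
            if b == 8:
                out += b" "          # 8: space before the word
            out += WORDS[data[i + 1]]
            if b == 7:
                out += b" "          # 7: space after the word
            i += 2
        elif 1 <= b <= 5:            # next b bytes are copied verbatim
            out += data[i + 1:i + 1 + b]
            i += 1 + b
        else:                        # 0 or 9-127: the byte is itself
            out.append(b)
            i += 1
    return bytes(out)
```

Under these assumptions, the two bytes 84 and 132 decode to "The": 84 is the literal "T" and 132 is bigram ID 4, "he".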
This scheme expands the output only when literal bytes are emitted, which happens rarely (for example, for special or Unicode characters). For typical natural‑language messages in plain Latin letters, the output is never larger than the input, and dictionary words and bigrams usually make it smaller.
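The cost of the verbatim escape can be made concrete with a little arithmetic. The sketch below assumes an encoder that packs literal bytes into runs as long as the 1‑5 rule allows; the function name `worst_case_size` is hypothetical, and the real library may group literal bytes differently.

```python
def worst_case_size(n):
    """Output size for n input bytes that must all be sent verbatim,
    assuming they are packed into runs of up to five bytes."""
    runs = (n + 4) // 5   # one length-prefix byte per run
    return n + runs


for n in (1, 2, 5, 6, 50):
    print(n, worst_case_size(n))
```

Under that assumption a two‑byte UTF‑8 character such as "è" costs three bytes, and the overhead for long runs of such bytes approaches 20%.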
Real‑World Compression Results
./smaz2 "The program is designed to work well with English text"
Compression ratio: 44.44%
./smaz2 "As long as the messages are latin letters natural language messages with common statistical properties, the program will only seldom use more space than needed"
Compression ratio: 54.72%
./smaz2 "Anche se in maniera meno efficiente, questo algoritmo di compressione è in grado di comprimere testi in altre lingue."
Compression ratio: 66.95%
The last input is Italian for "Although less efficiently, this compression algorithm is able to compress texts in other languages."
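To see how these figures can be read, the snippet below assumes the percentage is the compressed size divided by the original size. That is an interpretation, not something the tool's output states. It works backwards from the first example to the compressed size that interpretation implies.

```python
text = "The program is designed to work well with English text"
original_size = len(text.encode("utf-8"))

# Assumption: the reported 44.44% is compressed size / original size.
implied_size = round(original_size * 44.44 / 100)
ratio = 100 * implied_size / original_size

print(original_size, implied_size, "%.2f%%" % ratio)
```

With a 54‑byte input, the reported figure corresponds to a whole number of bytes (24), which is consistent with this reading. On that reading a lower percentage means better compression, which would fit the less efficient result on the Italian input.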
Implementation Details
The repository provides both a C implementation and a Python implementation. Both are heavily optimized for minimal RAM usage and small executable size rather than raw speed, because the typical use case involves occasional transmission of short messages. The algorithm scans the dictionary/table at each string position, which is computationally expensive but acceptable for the intended low‑throughput scenarios.
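The position‑by‑position scan described above might look roughly like the following sketch. It is an illustration under stated assumptions rather than the repository's code: the names `compress`, `find_word`, and `find_bigram` are hypothetical, `WORDS` holds only the first eight dictionary entries, the first matching word wins, and the order in which words, bigrams, and literals are tried is a guess.

```python
# Bigram table as printed in the article: 128 entries, two bytes each.
BIGRAMS = (
    "intherreheanonesorteattistenntartondalitseediseangoule"
    "comeneriroderaioicliofasetvetasihamaecomceelllcaurla"
    "chhidihofonsotacnarssoprrtsassusnoiltsemctgeloeebet"
    "rnipeiepancpooldaadviunamutwimoshyoaiewowosfiepttmi"
    "opiaweagsuiddoooirspplscaywaigeirylytuulivimabty"
).encode("ascii")

# Assumption: only the first eight dictionary words, with IDs in list order.
WORDS = [b"that", b"this", b"with", b"from", b"your", b"have", b"more", b"will"]


def find_word(data, pos):
    """Linear scan of the dictionary; returns (word_id, length) or None."""
    for word_id, word in enumerate(WORDS):
        if data.startswith(word, pos):
            return word_id, len(word)
    return None


def find_bigram(data, pos):
    """Linear scan of the bigram table; returns the bigram ID or -1."""
    pair = data[pos:pos + 2]
    if len(pair) == 2:
        for off in range(0, len(BIGRAMS), 2):
            if BIGRAMS[off:off + 2] == pair:
                return off // 2
    return -1


def compress(data):
    out = bytearray()
    i, n = 0, len(data)
    while i < n:
        # 1. Space followed by a dictionary word: escape byte 8.
        hit = find_word(data, i + 1) if data[i] == 32 else None
        if hit:
            out += bytes((8, hit[0]))
            i += 1 + hit[1]
            continue
        # 2. Dictionary word, with or without a trailing space: 7 or 6.
        hit = find_word(data, i)
        if hit:
            end = i + hit[1]
            if end < n and data[end] == 32:
                out += bytes((7, hit[0]))
                i = end + 1
            else:
                out += bytes((6, hit[0]))
                i = end
            continue
        # 3. Bigram: a single byte in the range 128-255.
        bid = find_bigram(data, i)
        if bid >= 0:
            out.append(128 + bid)
            i += 2
            continue
        # 4. Bytes 0 and 9-127 stand for themselves.
        b = data[i]
        if b == 0 or 9 <= b <= 127:
            out.append(b)
            i += 1
            continue
        # 5. Anything else goes into a verbatim run of up to five bytes.
        j = i
        while j < n and j - i < 5 and (1 <= data[j] <= 8 or data[j] >= 128):
            j += 1
        out.append(j - i)
        out += data[i:j]
        i = j
    return bytes(out)
```

Both lookups are plain linear scans, so no index or hash table has to be held in memory. That trades CPU time for RAM, in line with the priorities described above.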
