Compressing Unsupervised fastText Models 300× Smaller with Near‑Identical NLP Performance

This article shows how the compress‑fasttext Python library can shrink a 7 GB fastText word‑embedding model to about 21 MB—a 300‑fold reduction—while preserving almost the same accuracy on downstream NLP tasks, and explains the underlying compression techniques, usage examples, and evaluation results.


This article introduces fastText, a word‑embedding method released by Facebook, and explains why its original pretrained models are so large (the English model, for example, occupies 7 GB after extraction).

Using a Pre‑Compressed fastText Model

Install the compress-fasttext package (it mirrors Gensim's fastText API) and load a compressed model directly from the web:

import compress_fasttext
small_model = compress_fasttext.models.CompressedFastTextKeyedVectors.load('https://github.com/avidale/compress-fasttext/releases/download/v0.0.4/cc.en.300.compressed.bin')

The model behaves like a dictionary mapping words to 300‑dimensional numpy vectors:

print(small_model['hello'])
# [ 1.847366e-01  6.326839e-03  4.439018e-03 ... -2.884310e-02]

Similarity between words can be measured with cosine similarity:

def cosine_sim(x, y):
    return sum(x * y) / (sum(x**2) * sum(y**2)) ** 0.5
print(cosine_sim(small_model['cat'], small_model['cat']))   # 1.0
print(cosine_sim(small_model['cat'], small_model['dog']))   # 0.6768642734684225
print(cosine_sim(small_model['cat'], small_model['car']))   # 0.18485135055040858
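
Because the compressed model mirrors Gensim's KeyedVectors interface, its built-in similarity method should return the same numbers as the manual cosine_sim above; this short check relies on that API mirroring rather than on anything specific to compress-fasttext:

# Assuming the Gensim-style API is exposed by the compressed model,
# these calls should match the manual cosine_sim values above.
print(small_model.similarity('cat', 'dog'))
print(small_model.similarity('cat', 'car'))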

Using most_similar demonstrates that the compressed model still captures semantic relations (e.g., “Python” is close to “PHP”, “.NET”, and “Java”).

print(small_model.most_similar('Python'))
# [('PHP', 0.5253), ('.NET', 0.5027), ('Java', 0.4897), ...]

Integrating fastText Embeddings into a Classifier

FastText vectors can be fed to downstream models. The article provides a scikit‑learn pipeline that averages word vectors and trains a logistic regression classifier to distinguish edible from inedible items.

import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.linear_model import LogisticRegression
from sklearn.base import BaseEstimator, TransformerMixin

class FastTextTransformer(BaseEstimator, TransformerMixin):
    """Convert texts into their mean fastText vectors"""
    def __init__(self, model):
        self.model = model
    def fit(self, X, y=None):
        return self
    def transform(self, X):
        return np.stack([
            np.mean([self.model[w] for w in text.split()], 0)
            for text in X
        ])

classifier = make_pipeline(
    FastTextTransformer(model=small_model),
    LogisticRegression()
).fit(
    ['banana', 'soup', 'burger', 'car', 'tree', 'city'],
    [1, 1, 1, 0, 0, 0]
)
print(classifier.predict(['jet', 'cake']))
# array([0, 1])

Compressing Your Own fastText Model

To create a custom compressed model, install the library with optional dependencies:

pip install compress-fasttext[full]

Load the original model (either a Facebook‑trained model or a Gensim‑format model) and run three lines of code to prune it:

from gensim.models.fasttext import load_facebook_model
big_model = load_facebook_model('path-to-original-model').wv

import compress_fasttext
small_model = compress_fasttext.prune_ft_freq(big_model, pq=True)
small_model.save('path-to-new-model')
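
The saved file can then be reloaded the same way as the pre-compressed model shown at the start of the article; 'path-to-new-model' below is simply the placeholder path from the snippet above:

import compress_fasttext
# Reload the pruned model from disk, just like the pre-compressed model earlier.
small_model = compress_fasttext.models.CompressedFastTextKeyedVectors.load('path-to-new-model')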

Additional parameters let you control vocabulary size, n‑gram count, product‑quantization usage, and quantization dimensionality. Reducing vocabulary and dimensions lowers size but also slightly degrades accuracy.

small_model = compress_fasttext.prune_ft_freq(
    big_model,
    new_vocab_size=20000,   # number of words
    new_ngrams_size=100000, # number of character n‑grams
    pq=True,                # use product quantization
    qdim=100                # quantization dimensionality
)

How fastText Compression Works

fastText combines word vectors with character n‑gram vectors, averaging them. If a word is out‑of‑vocabulary, only its n‑grams contribute, enabling handling of misspellings and morphologically rich languages.

Pseudo‑code for embedding a word:

def embed(word, model):
    # A known word contributes its own vector; an OOV word starts from zero.
    if word in model.vocab:
        result = model.vectors_vocab[word]
        n = 1
    else:
        result = zeros()
        n = 0
    # Every character n-gram of the word adds its (hashed) vector.
    for ngram in get_ngrams(word, model.min_n, model.max_n):
        result += model.vectors_ngrams[hash(ngram)]
        n += 1
    return result / n
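
The get_ngrams helper is left abstract above. For reference, fastText-style character n-gram extraction wraps the word in '<' and '>' boundary markers and takes every substring of length min_n to max_n; the function below is a minimal sketch of that idea, not the library's own code:

def get_ngrams(word, min_n, max_n):
    """Sketch of fastText-style character n-gram extraction."""
    wrapped = '<' + word + '>'  # boundary markers, as in fastText
    ngrams = []
    for n in range(min_n, max_n + 1):
        for i in range(len(wrapped) - n + 1):
            ngrams.append(wrapped[i:i + n])
    return ngrams

print(get_ngrams('cat', 3, 4))
# ['<ca', 'cat', 'at>', '<cat', 'cat>']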

Compression reduces the two large matrices (vectors_vocab and vectors_ngrams) by:

Keeping only the most frequent words and n‑grams.

Storing them at lower precision (float16 instead of float32).

Applying product quantization, which splits each row into sub‑vectors and replaces each sub‑vector with the index of its nearest cluster centroid.

(Optionally) factorizing the matrix into two smaller matrices (not recommended due to accuracy loss).

The compress-fasttext library implements the first three methods, which together achieve the reported size reduction.
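
To make these steps concrete, the toy numpy sketch below shows what they do to a single embedding matrix. It only illustrates the idea and is not the compress-fasttext implementation: the function name and parameters are made up, and the random "centroids" stand in for a real k-means-based product quantizer.

import numpy as np

def compress_matrix_sketch(matrix, keep_rows, n_subvectors=10, n_centroids=256):
    """Toy illustration of pruning + float16 + product quantization."""
    # 1. Pruning: keep only the rows of the most frequent words / n-grams.
    pruned = matrix[keep_rows]
    # 2. Lower precision: float16 halves the storage of every value.
    pruned = pruned.astype(np.float16)
    # 3. Product quantization: split each row into sub-vectors and replace
    #    each sub-vector by the index of its nearest centroid
    #    (a real implementation fits the centroids with k-means).
    rows, dim = pruned.shape
    sub_dim = dim // n_subvectors
    codes = np.empty((rows, n_subvectors), dtype=np.uint8)
    codebooks = []
    for j in range(n_subvectors):
        block = pruned[:, j * sub_dim:(j + 1) * sub_dim].astype(np.float32)
        centroids = block[np.random.choice(rows, n_centroids, replace=False)]
        dists = ((block[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
        codes[:, j] = dists.argmin(axis=1)
        codebooks.append(centroids.astype(np.float16))
    return codes, codebooks

# Example: a fake 1000 x 300 matrix, keeping the 500 "most frequent" rows.
codes, codebooks = compress_matrix_sketch(
    np.random.rand(1000, 300).astype(np.float32), keep_rows=np.arange(500)
)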

Evaluation of the Compressed Model

SentEval, a benchmark suite covering 17 downstream tasks and 10 probing (diagnostic) tasks, is used to compare the original and compressed models. On average the small model scores 0.9579 of the full model's score while being roughly 300× smaller, confirming that compression retains most of the useful information.
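
As a point of reference, a comparison like this could be wired up with SentEval roughly as sketched below, where each sentence is embedded as the mean of its word vectors. The sketch assumes SentEval is installed with its task data downloaded; 'path-to-senteval-data' and the task list are placeholders, and small_model is the compressed model loaded earlier:

import numpy as np
import senteval  # https://github.com/facebookresearch/SentEval

def prepare(params, samples):
    return  # no vocabulary preparation needed for fastText

def batcher(params, batch):
    # Each sentence arrives as a list of tokens; embed it as the mean vector.
    embeddings = []
    for sent in batch:
        words = sent if sent else ['.']
        embeddings.append(np.mean([small_model[str(w)] for w in words], axis=0))
    return np.vstack(embeddings)

params = {'task_path': 'path-to-senteval-data', 'usepytorch': False, 'kfold': 5}
se = senteval.engine.SE(params, batcher, prepare)
results = se.eval(['MR', 'CR', 'SUBJ', 'STS16'])  # a few example tasks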

Conclusion

fastText provides fast, easy‑to‑maintain word embeddings, but its original models are too large for mobile or memory‑constrained environments. The methods in compress-fasttext shrink the model by hundreds of times with only a minor drop in downstream performance, making fastText practical for such scenarios.
