Transformer‑Based Denoising AutoEncoder (TSDAE) for Job Description Embeddings (Job2Vec)

This article explains how TSDAE, a transformer-based denoising auto-encoder, converts noisy job-description sentences into robust vector embeddings; it covers the training process, loss function, and dataset preparation, and shows how FAISS enables similarity search over the resulting Job2Vec representations.


TSDAE is a domain-adaptive pre-training method for sentence embeddings that outperforms other unsupervised objectives such as masked language modeling (MLM).

It trains by adding noise (e.g., deleting or swapping words) to input sentences, encoding the corrupted sentence into a fixed‑size vector, and decoding it back to the original text; the decoder only receives the encoder’s sentence representation, creating a bottleneck that forces meaningful embeddings.

TSDAE's S-BERT implementation corrupts the input text by deleting roughly 60% of its words. The encoder maps this noisy input to a fixed-size sentence embedding, and the decoder attempts to reconstruct the clean sentence. After training, only the encoder is used to generate embeddings.
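A simplified sketch of this deletion noise (the actual S-BERT implementation tokenizes with NLTK and differs in details; this version just drops words uniformly at random):

```python
import random

def delete_noise(text: str, del_ratio: float = 0.6) -> str:
    """Return a noisy copy of `text` with roughly `del_ratio` of its words deleted."""
    words = text.split()
    if len(words) <= 1:
        return text
    kept = [w for w in words if random.random() >= del_ratio]
    if not kept:  # never emit an empty sentence
        kept = [random.choice(words)]
    return " ".join(kept)

random.seed(42)
print(delete_noise("Strong experience with Python SQL and cloud data pipelines"))
```

Because each word survives independently with probability 1 - del_ratio, the corrupted sentence keeps about 40% of its tokens on average, which is the bottleneck the encoder must reconstruct from.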

The training loss, DenoisingAutoEncoderLoss, expects batches composed of pairs [noise_fn(sentence), sentence]. Batches are built with DenoisingAutoEncoderDataset, whose parameters are:

param sentences: a list of sentences to train on

param noise_fn: a function that takes a string and returns a noisy string (e.g., with deleted words)

After cleaning a job‑requirements dataset, the document length distribution is visualized, and the cleaned sentence list is fed directly into the model.

Training for five epochs produces embeddings for sample data, shown in the accompanying figures.

FAISS is then used to index these embeddings, enabling similarity search to retrieve matching job descriptions; the retrieval results appear promising.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Tags: Transformer, FAISS, NLP, Autoencoder, sentence embedding, job description, TSDAE
Written by Code DAO

We deliver AI algorithm tutorials and the latest news, curated by a team of researchers from Peking University, Shanghai Jiao Tong University, Central South University, and leading AI companies such as Huawei, Kuaishou, and SenseTime. Join us in the AI alchemy—making life better!
