
Automatic Ticket Classification Using SVM and word2vec at Qunar

At Qunar, the data center algorithm team developed an automatic ticket classification system that combines a Support Vector Machine (SVM) with word2vec embeddings to handle high‑dimensional, low‑sample text data, achieving 89% accuracy and 80% recall. This article outlines the full machine‑learning pipeline, from feature extraction to deployment.

Qunar Tech Salon

Wei Jinfeng, a data mining algorithm engineer at Qunar's Data Center, introduced a project aimed at automatically classifying customer service tickets (recorded phone calls) into categories such as cash‑back, bed‑type issues, location problems, and more.

The rapid growth of Qunar's business increased the volume of service calls, making manual ticket categorization labor‑intensive. The project applies machine‑learning methods to automate this process, providing immediate labor savings and building a foundation for related tasks such as automatic quality inspection and intelligent customer service.

The solution follows a typical machine‑learning workflow: model selection, feature extraction, feature selection, model implementation and optimization, evaluation, and deployment.

After evaluating several candidates, the team chose a Support Vector Machine (SVM) combined with word2vec embeddings. SVM was selected because the labeled data set is small, high‑dimensional, and not linearly separable, while word2vec provides semantic vector representations that help capture meaning in text.

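SVM model: By maximizing the margin between classes, an SVM improves classification accuracy and generalization. Kernel functions implicitly map data that is not linearly separable into a higher‑dimensional space where a separating hyperplane can be found, which widens the range of problems the model can handle.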
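The effect of the kernel can be seen on a toy problem. The following is a minimal sketch using scikit-learn, which is an assumption here (the talk does not name the library the team used). The data is a synthetic pair of concentric rings, not ticket data.

```python
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic two-class data: one ring inside another, so no straight line
# can separate the classes. This stands in for "not linearly separable".
X, y = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# A linear kernel searches for a separating hyperplane in the input space.
linear_acc = SVC(kernel="linear").fit(X_train, y_train).score(X_test, y_test)

# The RBF kernel implicitly works in a higher-dimensional space,
# where the two rings become separable.
rbf_acc = SVC(kernel="rbf", gamma="scale").fit(X_train, y_train).score(X_test, y_test)

print(f"linear kernel accuracy: {linear_acc:.2f}")
print(f"RBF kernel accuracy:    {rbf_acc:.2f}")
```

On this kind of data the RBF kernel should score clearly higher than the linear one, which is the behavior the paragraph above describes.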

word2vec model: This shallow neural‑network language model converts words into low‑dimensional dense vectors that retain semantic information. This overcomes two weaknesses of traditional one‑hot encodings: their sparsity and their inability to express similarity between words.
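The contrast with one-hot encoding can be illustrated without training a model. In the sketch below the dense vectors are hand-written, hypothetical values chosen for illustration, where a real word2vec model would learn them from a corpus. Averaging word vectors into one ticket vector is one common way to feed embeddings to an SVM. It is shown as an assumption, since the talk does not say how the two were combined.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

vocab = ["refund", "cashback", "bed"]

# One-hot encoding: one dimension per vocabulary word.
one_hot = {
    word: [1.0 if i == j else 0.0 for j in range(len(vocab))]
    for i, word in enumerate(vocab)
}

# Hypothetical dense vectors, hand-written for illustration only.
dense = {
    "refund": [0.9, 0.1],
    "cashback": [0.8, 0.2],
    "bed": [0.1, 0.9],
}

def average_vector(tokens, table):
    """Average the vectors of known tokens into one fixed-length vector."""
    rows = [table[t] for t in tokens if t in table]
    if not rows:
        return [0.0] * len(next(iter(table.values())))
    return [sum(column) / len(rows) for column in zip(*rows)]

# Distinct one-hot vectors are always orthogonal, so similarity is zero.
print(cosine(one_hot["refund"], one_hot["cashback"]))
# Dense vectors can place related words close together.
print(cosine(dense["refund"], dense["cashback"]))
print(cosine(dense["refund"], dense["bed"]))
# A ticket represented as the mean of its word vectors.
print(average_vector(["refund", "bed", "unknown_word"], dense))
```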

Feature extraction involved converting speech to text, tokenizing the text into fine‑grained words, and generating uni‑gram and bi‑gram candidates. Features were enhanced by manually adding keyword features and weighting them with TF‑IDF. Feature selection used part‑of‑speech filtering and the chi‑square test to reduce the dimensionality introduced by bi‑grams and combined features.
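The n-gram, TF-IDF, and chi-square steps can be sketched in plain Python. The tickets, tokens, and category names below are hypothetical English stand-ins for the tokenized transcripts, and the formulas are the textbook versions, not necessarily the exact variants the team used. Speech-to-text, part-of-speech filtering, and the manual keyword features are not modeled here.

```python
import math
from collections import Counter

def ngram_features(tokens):
    """Return uni-grams plus bi-grams (adjacent tokens joined by '_')."""
    bigrams = [f"{a}_{b}" for a, b in zip(tokens, tokens[1:])]
    return list(tokens) + bigrams

def tf_idf(docs):
    """Weight each feature by term frequency times log(N / document frequency)."""
    n_docs = len(docs)
    doc_freq = Counter(f for doc in docs for f in set(doc))
    weighted = []
    for doc in docs:
        counts = Counter(doc)
        weighted.append({
            f: (c / len(doc)) * math.log(n_docs / doc_freq[f])
            for f, c in counts.items()
        })
    return weighted

def chi_square(feature, label, docs, labels):
    """Chi-square statistic of the 2x2 table (feature present?) x (in class?)."""
    a = b = c = d = 0
    for doc, doc_label in zip(docs, labels):
        has_feature = feature in doc
        in_class = doc_label == label
        if has_feature and in_class:
            a += 1
        elif has_feature:
            b += 1
        elif in_class:
            c += 1
        else:
            d += 1
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0

# Hypothetical tokenized tickets and their categories.
tickets = [
    ["cashback", "not", "received"],
    ["where", "is", "my", "cashback"],
    ["wrong", "bed", "type"],
    ["bed", "type", "not", "matching"],
]
labels = ["cashback", "cashback", "bed_type", "bed_type"]

docs = [ngram_features(t) for t in tickets]
weights = tf_idf(docs)

# Rank every candidate feature by how strongly it is tied to one category,
# then keep only the top few to reduce dimensionality.
vocabulary = sorted({f for doc in docs for f in doc})
scores = {f: chi_square(f, "cashback", docs, labels) for f in vocabulary}
selected = sorted(vocabulary, key=lambda f: scores[f], reverse=True)[:5]
print(selected)
```

A feature such as "not", which appears equally often inside and outside the category, scores zero and is dropped, while "cashback" and the bi-gram "bed_type" score highest because they separate the two categories perfectly.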

During model tuning, the team performed hyper‑parameter optimization and data‑driven debugging. Separate models were trained and fine‑tuned for each ticket category to maximize per‑class performance.
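Training and tuning one model per category could look roughly like the sketch below. It assumes scikit-learn, a small hypothetical parameter grid, and synthetic numeric features in place of the real ticket vectors. None of these details come from the talk, which does not state the search strategy or parameter ranges.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

# Synthetic three-class data standing in for ticket feature vectors.
X, y = make_classification(
    n_samples=300, n_features=20, n_informative=5, n_classes=3,
    n_clusters_per_class=1, class_sep=2.0, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Hypothetical search space; real ranges would depend on the data.
param_grid = {"C": [0.1, 1, 10], "gamma": ["scale", 0.01]}

categories = np.unique(y_train)
models = {}
for category in categories:
    # One binary problem per category: "this category" versus "everything else".
    is_category = (y_train == category).astype(int)
    search = GridSearchCV(SVC(kernel="rbf"), param_grid, scoring="f1", cv=3)
    search.fit(X_train, is_category)
    models[int(category)] = search
    print(int(category), search.best_params_)

# Assign each ticket to the category whose model is most confident.
scores = np.column_stack(
    [models[int(c)].decision_function(X_test) for c in categories]
)
predicted = categories[scores.argmax(axis=1)]
overall_accuracy = float((predicted == y_test).mean())
print(f"overall accuracy: {overall_accuracy:.2f}")
```

Tuning each binary model separately lets every category settle on its own hyper-parameters, at the cost of one grid search per category.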

Evaluation of the first‑phase model showed an accuracy of 89% and a recall of 80%, meeting the initial requirements, with further optimization planned for the second phase.
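For reference, the two metrics can be computed as follows. The labels below are made up for illustration and are not the project's data. The talk does not say how the reported figures were aggregated across categories, so the sketch shows overall accuracy and per-category recall.

```python
def accuracy(y_true, y_pred):
    """Fraction of tickets whose predicted category matches the true one."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def recall(y_true, y_pred, label):
    """Fraction of tickets truly in `label` that the model also assigned to it."""
    predictions_for_label = [p for t, p in zip(y_true, y_pred) if t == label]
    if not predictions_for_label:
        return 0.0
    return sum(p == label for p in predictions_for_label) / len(predictions_for_label)

# Hypothetical true and predicted categories for ten tickets.
y_true = ["cashback"] * 3 + ["bed_type"] * 2 + ["location"] * 5
y_pred = (
    ["cashback", "cashback", "location"]
    + ["bed_type", "location"]
    + ["location"] * 5
)

print(f"accuracy: {accuracy(y_true, y_pred):.2f}")
for label in ["cashback", "bed_type", "location"]:
    print(f"recall[{label}]: {recall(y_true, y_pred, label):.2f}")
```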

The approach is applicable to many other scenarios, including ticket content mining, quality inspection, hotel review analysis, sentiment analysis, app feedback analysis, and hotel aggregation.

Written by

Qunar Tech Salon

Qunar Tech Salon is a learning and exchange platform for Qunar engineers and industry peers. We share cutting-edge technology trends and topics, providing a free platform for mid-to-senior technical professionals to exchange and learn.
