
Quora’s 2017 Machine‑Learning Roadmap: Use Cases, Models, and Platform

The article presents a comprehensive 2017 overview of how Quora applies machine learning across question understanding, answer ranking, personalization, content quality, and ad optimization, detailing the models, libraries, and platform infrastructure that power these systems.

High Availability Architecture

May 4, 2017


Author: Nikhil Dandekar, head of Quora’s Machine‑Learning team (translated by Tim).

In 2015 Quora’s VP of Engineering Xavier Amatriain wrote about the company’s early use of machine learning; this article expands that view to a full 2017 roadmap.

Machine‑Learning Use Cases

1. Discovering Information

Quora’s core Q&A flow starts with a user‑asked question. A suite of ML models performs question understanding, extracting features from the text and context to classify question quality, determine question type, and assign topic labels from a taxonomy of over one million possible topics.
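To make the topic‑labeling step concrete, here is a minimal sketch that assigns topics by bag‑of‑words cosine similarity. The three‑topic taxonomy, the keyword lists, and the function names are all hypothetical; the article does not say which model Quora uses for this, and a real taxonomy of this size would need a learned classifier rather than keyword matching.

```python
import math
import re
from collections import Counter

# Hypothetical mini-taxonomy: topic name -> a few descriptive keywords.
TOPIC_KEYWORDS = {
    "Machine Learning": "model training neural network learning data",
    "Cooking": "recipe cook bake oven ingredient",
    "Personal Finance": "money invest savings budget loan",
}

def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def label_topics(question, top_k=2):
    """Return up to top_k topics with non-zero similarity, best first."""
    q = Counter(tokenize(question))
    scored = [(cosine(q, Counter(tokenize(kw))), topic)
              for topic, kw in TOPIC_KEYWORDS.items()]
    scored = [(s, t) for s, t in scored if s > 0]
    scored.sort(reverse=True)
    return [t for _, t in scored[:top_k]]

labels = label_topics("How do I start training a neural network on my own data?")
```

Here `labels` contains only the "Machine Learning" topic, because the other two keyword lists share no tokens with the question.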

Features include asker identity, location, and other metadata. The system also powers two search experiences – a simple search bar and a deeper full‑text search – each using different ranking algorithms.

2. Obtaining Answers

The output of question‑understanding feeds the next stage: routing questions to experts. The “Request Answers” (formerly “Ask To Answer”) feature lets users solicit answers from domain experts, a problem described in the article “Ask To Answer as a Machine‑Learning Problem”.

Unanswered questions are also matched to experts via the personalized Feed, which ranks questions and answers using a rich set of user, question, and derived features.
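The ranking idea can be illustrated with a pointwise scorer: compute one score per candidate story from its features, then sort. This is only a sketch. The feature names and the hand‑set weights below are assumptions made for illustration, whereas a production system would learn the scoring function from interaction data.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    story_id: str
    topic_affinity: float   # viewer's affinity for the story's topic, 0..1
    author_affinity: float  # viewer's affinity for the author, 0..1
    quality: float          # content-quality estimate, 0..1

# Hypothetical hand-set weights; a real ranker would learn these.
WEIGHTS = {"topic_affinity": 0.5, "author_affinity": 0.2, "quality": 0.3}

def score(c):
    """Linear combination of the candidate's features."""
    return (WEIGHTS["topic_affinity"] * c.topic_affinity
            + WEIGHTS["author_affinity"] * c.author_affinity
            + WEIGHTS["quality"] * c.quality)

def rank_feed(candidates):
    """Order candidates from highest to lowest score."""
    return sorted(candidates, key=score, reverse=True)

feed = rank_feed([
    Candidate("q2", 0.2, 0.8, 0.9),
    Candidate("q3", 0.1, 0.1, 0.2),
    Candidate("q1", 0.9, 0.1, 0.6),
])
ranked_ids = [c.story_id for c in feed]
```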

3. Enhancing the Reading Experience

Feed ranking not only surfaces answerable questions but also highlights high‑quality answers. Answer ranking, email‑summary generation, and comment ranking all rely on advanced ML models that combine interaction signals, content quality, and user activity.

Related‑question recommendations and personalized topic/user suggestions further improve navigation, driven by a “user‑understanding” signal that captures likes/dislikes, expertise, and social graph information, including user‑topic and user‑user affinity scores.
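One simple way to think about a user‑topic affinity score is as a time‑decayed sum of weighted interactions. The sketch below assumes hypothetical action weights and a 30‑day half‑life; none of these values come from the article, and the actual signals Quora combines are not described.

```python
from collections import defaultdict

# Hypothetical weights: answering signals more interest than upvoting.
ACTION_WEIGHT = {"upvote": 1.0, "answer": 3.0, "downvote": -1.0}
HALF_LIFE_DAYS = 30.0  # assumed: an interaction counts half as much after 30 days

def topic_affinity(events, now_day):
    """events: iterable of (topic, action, day). Returns {topic: score}."""
    scores = defaultdict(float)
    for topic, action, day in events:
        age_days = now_day - day
        decay = 0.5 ** (age_days / HALF_LIFE_DAYS)
        scores[topic] += ACTION_WEIGHT[action] * decay
    return dict(scores)

events = [
    ("Python", "answer", 100),    # today: full weight 3.0
    ("Python", "upvote", 70),     # 30 days old: 1.0 * 0.5
    ("Cooking", "downvote", 100), # today: -1.0
]
affinity = topic_affinity(events, now_day=100)
```

The same pattern extends to user‑user affinity by keying the events on another user's id instead of a topic.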

4. Improving Content Quality

Quora maintains content standards with ML systems for duplicate‑question detection (with a public dataset and Kaggle competition), abusive‑content detection, and spam detection, among other quality‑preserving models.
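As a baseline illustration of duplicate‑question detection, the sketch below compares two questions by the Jaccard overlap of their non‑stopword tokens. The stopword list and the 0.5 threshold are arbitrary assumptions. This is a swapped‑in simple technique, not the model Quora describes, and it will miss paraphrases that share few words.

```python
import re

# Small, hypothetical stopword list for the example.
STOPWORDS = {"a", "an", "the", "is", "are", "what", "how", "do", "i", "to", "of", "in"}

def tokens(question):
    """Set of lowercase alphanumeric tokens with stopwords removed."""
    return {t for t in re.findall(r"[a-z0-9]+", question.lower())
            if t not in STOPWORDS}

def jaccard(q1, q2):
    """Size of the token intersection divided by the size of the union."""
    a, b = tokens(q1), tokens(q2)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def is_duplicate(q1, q2, threshold=0.5):
    return jaccard(q1, q2) >= threshold

same = is_duplicate("How do I learn Python quickly?",
                    "What is the way to learn Python quickly?")
different = is_duplicate("How do I learn Python quickly?",
                         "How do I bake sourdough bread?")
```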

5. Advertising Optimization

In 2016 Quora began commercializing its platform by displaying intent‑relevant ads on question pages. An ML‑driven click‑through‑rate (CTR) prediction model selects ads that are relevant to the user, and the team plans to expand its use of ML in advertising.
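CTR prediction is commonly framed as binary classification: given features of a user and an ad, estimate the probability of a click. The sketch below trains a tiny logistic regression with stochastic gradient descent and picks the ad with the highest predicted CTR. The two features, the training rows, and the ad names are invented for illustration; the article does not specify Quora's features or model.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_ctr(weights, bias, features):
    """Predicted click probability for one feature vector."""
    return sigmoid(bias + sum(w * x for w, x in zip(weights, features)))

def train(examples, epochs=200, lr=0.5):
    """Fit logistic regression with plain SGD on (features, clicked) pairs."""
    n = len(examples[0][0])
    weights, bias = [0.0] * n, 0.0
    for _ in range(epochs):
        for features, clicked in examples:
            error = predict_ctr(weights, bias, features) - clicked
            weights = [w - lr * error * x for w, x in zip(weights, features)]
            bias -= lr * error
    return weights, bias

# Hypothetical features: [ad topic matches question topic, user clicked ads before]
examples = [([1, 1], 1), ([1, 0], 1), ([0, 1], 0), ([0, 0], 0)]
weights, bias = train(examples)

candidate_ads = {"relevant_ad": [1, 0], "off_topic_ad": [0, 1]}
best_ad = max(candidate_ads,
              key=lambda ad: predict_ctr(weights, bias, candidate_ads[ad]))
```

In this toy data clicks depend only on topic match, so the model learns to prefer the topically relevant ad.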

Models and Libraries Used

Quora’s engineers employ a variety of models (in no particular order): logistic regression, elastic net, gradient‑boosted decision trees (GBDT), random forest, (deep) neural networks, LambdaMART, matrix factorization techniques (SVD, BPR, weighted ALS), vector‑space and other NLP methods, k‑means and other clustering algorithms, and miscellaneous others.

Supported libraries include TensorFlow, scikit‑learn, XGBoost, LightGBM, RankLib, NLTK, and Quora’s internal matrix‑factorization library QMF, among other internal tools.
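To illustrate the matrix‑factorization family mentioned above, here is a minimal sketch that learns user and item factor vectors by SGD on squared error with L2 regularization. It is a generic textbook formulation, not QMF's API or any of the specific variants listed (SVD, BPR, weighted ALS), and the ratings and hyperparameters are made up.

```python
import random

def predict(U, V, u, i):
    """Predicted rating: dot product of user and item factor vectors."""
    return sum(a * b for a, b in zip(U[u], V[i]))

def factorize(ratings, n_users, n_items, k=2, epochs=500, lr=0.05, reg=0.01, seed=0):
    """Learn factors U (users x k) and V (items x k) from (user, item, rating) triples."""
    rng = random.Random(seed)
    U = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_users)]
    V = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - predict(U, V, u, i)
            for f in range(k):
                uf, vf = U[u][f], V[i][f]
                U[u][f] += lr * (err * vf - reg * uf)
                V[i][f] += lr * (err * uf - reg * vf)
    return U, V

def rmse(ratings, U, V):
    """Root-mean-square error over the observed ratings."""
    se = sum((r - predict(U, V, u, i)) ** 2 for u, i, r in ratings)
    return (se / len(ratings)) ** 0.5

# Hypothetical observed (user, item, rating) triples.
ratings = [(0, 0, 5.0), (0, 1, 1.0), (1, 0, 4.0),
           (1, 2, 2.0), (2, 1, 5.0), (2, 2, 4.0)]
U0, V0 = factorize(ratings, 3, 3, epochs=0)  # untrained factors, for comparison
U, V = factorize(ratings, 3, 3)
```

Comparing `rmse(ratings, U, V)` with `rmse(ratings, U0, V0)` shows how much the fit improves over the random initialization. Unobserved cells such as `predict(U, V, 0, 2)` are the recommendations.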

Machine‑Learning Platform Team

Since 2015 Quora has built a dedicated ML platform team to provide offline model‑training pipelines and online model‑serving infrastructure, enabling rapid, standardized, and reusable development for other engineering teams.

The platform accelerates both batch and real‑time ML workloads, allowing the company to process larger data volumes daily.
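The split between offline training and online serving can be sketched as two pieces that share only a serialized model artifact. Everything here is an assumption made for illustration: the JSON artifact format, the per‑topic click‑rate "model", and the `OnlineScorer` class are hypothetical and say nothing about how Quora's platform is actually built.

```python
import json
from collections import defaultdict

def train_offline(log_rows):
    """Offline step: aggregate (topic, was_clicked) logs into a JSON artifact."""
    shown, clicked = defaultdict(int), defaultdict(int)
    for topic, was_clicked in log_rows:
        shown[topic] += 1
        clicked[topic] += int(was_clicked)
    model = {"version": 1,
             "rates": {t: clicked[t] / shown[t] for t in shown}}
    return json.dumps(model)

class OnlineScorer:
    """Online step: load the artifact once, then answer lookups quickly."""
    def __init__(self, artifact, default=0.0):
        self.model = json.loads(artifact)
        self.default = default

    def score(self, topic):
        # Fall back to a default for topics unseen during training.
        return self.model["rates"].get(topic, self.default)

artifact = train_offline([("python", True), ("python", False), ("cooking", False)])
scorer = OnlineScorer(artifact)
python_score = scorer.score("python")
```

Keeping the artifact as the only contract between the two sides is one way to let teams retrain models without touching serving code.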

The author promises follow‑up posts with more detail on these systems and on the future roadmap.

Quora is actively hiring for various ML roles; see the careers page for opportunities.
