Predicting Movie Box Office with Playwright Data Scraping and DeepSeek AI
This article demonstrates how to combine Playwright web‑scraping of multiple Chinese movie platforms with the DeepSeek AI model to automatically collect data and generate a scientific prediction of the box‑office revenue for the film "Ne Zha 2".
Ever wondered how to use AI to predict a movie's box office? In this tutorial we combine large‑scale data collection with the DeepSeek AI model to forecast the final revenue of the recent hit "Ne Zha 2".
What are we doing?
First, we use Playwright, a powerful browser automation tool, to scrape real-time data from platforms such as Douban, Taopiaopiao, Maoyan, Weibo, and Douyin. These platforms provide ratings, review counts, wish-to-see numbers, and social-media heat indices, all of which are useful indicators of box-office performance.
How to scrape the data?
Below is the code for extracting Douban data:
```javascript
import { chromium } from 'playwright';

// Scrape the Douban rating and vote count from the film's subject page.
async function scrapeDouban() {
  const browser = await chromium.launch({ headless: true });
  const page = await browser.newPage();
  try {
    await page.goto('https://movie.douban.com/subject/34780991/');
    // Locators are lazy; innerText() waits for the element to appear.
    const ratingLocator = page.locator('.rating_num');
    const votesLocator = page.locator('span[property="v:votes"]');
    const rating = await ratingLocator.innerText();
    const votes = await votesLocator.innerText();
    console.log(`Douban data - rating: ${rating}, votes: ${votes}`);
    return { rating, votes };
  } catch (error) {
    // Return null so the caller can detect a failed scrape.
    console.error('Error scraping Douban:', error);
    return null;
  } finally {
    // Always release the browser, even when scraping fails.
    await browser.close();
  }
}
```

Similar functions are provided for Taopiaopiao, Maoyan, Weibo, and Douyin, each extracting the relevant metrics (rating, wish-to-see count, view count, heat index, etc.).
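Note that innerText() returns strings, so values such as the rating and vote count arrive as text. If you want to compare or sort them before building the prompt, a small normalizer can help. The sketch below is illustrative only: parseMetric and UNIT_MULTIPLIERS are hypothetical names, not part of the original code, and it assumes that some platforms abbreviate large counts with the suffixes 万 or 亿.

```javascript
// Hypothetical helper, not from the original article.
// Multipliers for Chinese magnitude suffixes (万 = ten thousand, 亿 = hundred million).
const UNIT_MULTIPLIERS = { '万': 1e4, '亿': 1e8 };

// Converts scraped text such as "8.5", "123,456" or "12.3万" to a number.
// Returns null when no number can be recognised.
function parseMetric(text) {
  if (typeof text !== 'string') return null;
  // Drop thousands separators and whitespace before matching.
  const cleaned = text.replace(/[,\s]/g, '');
  const match = cleaned.match(/(\d+(?:\.\d+)?)([万亿])?/);
  if (!match) return null;
  const value = parseFloat(match[1]);
  const multiplier = match[2] ? UNIT_MULTIPLIERS[match[2]] : 1;
  // Round to two decimals to hide floating-point noise from the multiplication.
  return Math.round(value * multiplier * 100) / 100;
}
```

For example, parseMetric(votes) could be applied to the value returned by scrapeDouban() before it is stored, leaving the raw string untouched if the function returns null.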
Data collection is easy; analysis is the hard part
We run the five scrapers in parallel with Promise.all and check that every one of them returned data. The combined result is then passed to a predictBoxOffice function, which constructs a detailed prompt for the DeepSeek model.
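Because each scraper catches its own errors and resolves to null, Promise.all never rejects in this setup; a failed platform simply shows up as a null entry. The following sketch illustrates that behaviour with stub scrapers. fakeScraper and collectAll are hypothetical names introduced for illustration, and the stub values are made up rather than scraped.

```javascript
// Hypothetical stand-in for a real scraper: resolves to `result` after `delayMs`.
// Passing null as the result simulates a scraper that failed and returned null.
function fakeScraper(delayMs, result) {
  return () =>
    new Promise((resolve) => setTimeout(() => resolve(result), delayMs));
}

// Starts every scraper at once, waits for all of them, and separates
// the platforms that returned data from the ones that returned nothing.
async function collectAll(scrapers) {
  const names = Object.keys(scrapers);
  const results = await Promise.all(names.map((name) => scrapers[name]()));
  const data = {};
  const missing = [];
  names.forEach((name, i) => {
    if (results[i] == null) missing.push(name);
    else data[name] = results[i];
  });
  return { data, missing };
}
```

Reporting which platforms are missing, instead of a single generic error, makes it easier to see which selector or page broke when a run fails.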
```javascript
async function main() {
  // Run all five scrapers concurrently; each resolves to its data or to null.
  const [doubanData, taopiaopiaoData, maoyanData, weiboData, douyinData] = await Promise.all([
    scrapeDouban(),
    scrapeTaopiaopiao(),
    scrapeMaoyan(),
    scrapeWeibo(),
    scrapeDouyin()
  ]);
  // Stop if any platform failed, so the model never sees partial data.
  if (!doubanData || !taopiaopiaoData || !maoyanData || !weiboData || !douyinData) {
    console.error('Error: some data failed to scrape');
    return;
  }
  const combinedData = { douban: doubanData, taopiaopiao: taopiaopiaoData, maoyan: maoyanData, weibo: weiboData, douyin: douyinData };
  const predictedBoxOffice = await predictBoxOffice(combinedData);
  console.log(`Predicted box office for Ne Zha 2: ${predictedBoxOffice}`);
}

main();
```

The predictBoxOffice function uses the OpenAI-compatible DeepSeek API. It builds a prompt that lists all collected metrics, asks the model to consider holiday effects, cultural impact, social-media trends, historical box-office ceilings, and competing releases, and finally returns a predicted revenue range.
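To make the prompt-building step more concrete, here is a minimal sketch of how the collected metrics and the listed factors could be assembled into text. It is not the author's prompt, which is shortened in the original code; buildPrompt and FACTORS are hypothetical names, and the wording of the prompt is an assumption.

```javascript
// Factors the article says the model is asked to weigh.
const FACTORS = [
  'holiday effects',
  'cultural impact',
  'social-media trends',
  'historical box-office ceilings',
  'competing releases',
];

// Turns the combined scraper output into a plain-text prompt.
// Illustrative wording only; this is not the original prompt.
function buildPrompt(title, data) {
  // One line per platform, e.g. "- douban: rating=..., votes=...".
  const metricLines = Object.entries(data).map(([platform, metrics]) => {
    const pairs = Object.entries(metrics)
      .map(([key, value]) => `${key}=${value}`)
      .join(', ');
    return `- ${platform}: ${pairs}`;
  });
  return [
    `You are forecasting the final box office of "${title}".`,
    'Collected metrics:',
    ...metricLines,
    `Consider: ${FACTORS.join('; ')}.`,
    'Answer with a revenue range.',
  ].join('\n');
}
```

Generating the metric lines from the data object means a new platform can be added to the scrapers without editing the prompt template by hand.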
```javascript
import OpenAI from "openai";

// DeepSeek exposes an OpenAI-compatible endpoint, so the OpenAI SDK works
// once baseURL points at it. Replace the placeholder with your own key.
const openai = new OpenAI({ baseURL: 'https://api.deepseek.com', apiKey: '<DeepSeek API Key>' });

async function predictBoxOffice(data) {
  // Per-platform metrics that the full prompt lists for the model.
  const { douban, taopiaopiao, maoyan, weibo, douyin } = data;
  const prompt = `You are a senior movie‑box‑office forecasting expert...`; // shortened for brevity
  const completion = await openai.chat.completions.create({
    model: "deepseek-chat",
    messages: [{ role: 'system', content: prompt }, { role: 'user', content: 'Give the final box‑office prediction for Ne Zha 2 in billions of yuan.' }],
    // A low temperature keeps the answer relatively stable between runs.
    temperature: 0.3,
  });
  // The reply is free text; it is returned to the caller unparsed.
  return completion.choices[0].message.content;
}
```

Result
The AI model returned a prediction of 165 billion yuan for "Ne Zha 2". The exercise shows how automated data collection and large-language-model analysis can be combined to produce a data-informed box-office estimate rather than a simple guess.
Feel free to copy the code, run it locally, and experiment with your own movie predictions.
Rare Earth Juejin Tech Community
