
Step-by-Step Guide to Building a Spring Boot Backend and Douyin Hot Search Crawler

This tutorial walks through creating a Maven‑based Spring Boot backend project with multiple modules, configuring pom.xml files, application properties, and logging, then adds a scheduled Douyin hot‑search crawler using OkHttp, demonstrating full end‑to‑end setup for a web service.

Rare Earth Juejin Tech Community

In this tutorial the author explains how to set up a Spring Boot backend project using Maven, configure multiple modules, and integrate a scheduled Douyin hot‑search crawler.

First, a Maven project is created in IntelliJ IDEA, with a groupId, an artifactId, and modules such as summo-sbmy-dao, summo-sbmy-service, summo-sbmy-web, summo-sbmy-start, summo-sbmy-job and summo-sbmy-common. The pom.xml of each module is shown, including dependencies for Spring Boot, MyBatis-Plus, Redis, Lombok, Redisson, FastJSON, etc., and build plugins.

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <parent>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-parent</artifactId>
        <version>2.7.15</version>
        <relativePath/> <!-- lookup parent from repository -->
    </parent>
    <groupId>com.summo</groupId>
    <artifactId>summo-sbmy</artifactId>
    <packaging>pom</packaging>
    <version>1.0-SNAPSHOT</version>
    <modules>
        <module>summo-sbmy-dao</module>
        <module>summo-sbmy-service</module>
        <module>summo-sbmy-web</module>
        <module>summo-sbmy-start</module>
        <module>summo-sbmy-job</module>
        <module>summo-sbmy-common</module>
    </modules>
    ... (other pom.xml sections omitted for brevity) ...
</project>
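The child modules each declare this aggregator as their parent and add their own dependencies. The article omits the child poms; as an illustration only, a minimal summo-sbmy-start pom might look like this (the dependency list is an assumption, not taken from the article):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <parent>
        <groupId>com.summo</groupId>
        <artifactId>summo-sbmy</artifactId>
        <version>1.0-SNAPSHOT</version>
    </parent>
    <artifactId>summo-sbmy-start</artifactId>
    <dependencies>
        <!-- The start module typically aggregates the other modules -->
        <dependency>
            <groupId>com.summo</groupId>
            <artifactId>summo-sbmy-web</artifactId>
            <version>${project.version}</version>
        </dependency>
        <dependency>
            <groupId>com.summo</groupId>
            <artifactId>summo-sbmy-job</artifactId>
            <version>${project.version}</version>
        </dependency>
    </dependencies>
</project>
```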

The application.properties file configures the Spring application name, server port, Druid datasource, MyBatis-Plus, and Redis connection settings. Note that pool settings appear below for both Lettuce and Jedis; Spring Boot only applies the settings for the Redis client actually on the classpath (Lettuce by default).

# Application name
spring.application.name=summo-sbmy
# Server port
server.port=8080

# Druid datasource type
spring.datasource.type=com.alibaba.druid.pool.DruidDataSource
spring.datasource.url=jdbc:mysql://xxx:3306/summo-sbmy?allowPublicKeyRetrieval=true&characterEncoding=utf8&useSSL=false&serverTimezone=Asia/Shanghai&rewriteBatchedStatements=true&zeroDateTimeBehavior=convertToNull
spring.datasource.username=xxx
spring.datasource.password=xxx
spring.datasource.driver-class-name=com.mysql.cj.jdbc.Driver
spring.datasource.druid.initial-size=5
spring.datasource.druid.max-active=30
spring.datasource.druid.min-idle=5
spring.datasource.druid.max-wait=60000
spring.datasource.druid.time-between-eviction-runs-millis=60000
spring.datasource.druid.min-evictable-idle-time-millis=300000
spring.datasource.druid.validation-query=SELECT 1 FROM DUAL
spring.datasource.druid.test-while-idle=true
spring.datasource.druid.test-on-borrow=false
spring.datasource.druid.test-on-return=false
spring.datasource.druid.pool-prepared-statements=false
spring.datasource.druid.max-pool-prepared-statement-per-connection-size=0
spring.datasource.druid.filters=stat,wall
spring.datasource.druid.connection-properties=druid.stat.mergeSql=true;druid.stat.slowSqlMillis=500
spring.datasource.druid.use-global-data-source-stat=true
spring.datasource.druid.filter.wall.enabled=true
spring.datasource.druid.filter.wall.db-type=mysql
spring.datasource.druid.filter.stat.db-type=mysql
spring.datasource.druid.filter.stat.enabled=true

mybatis.configuration.auto-mapping-behavior=full
mybatis.configuration.map-underscore-to-camel-case=true
mybatis-plus.mapper-locations=classpath*:/mybatis/mapper/*.xml

spring.redis.database=0
spring.redis.timeout=1800000
spring.redis.host=127.0.0.1
spring.redis.port=6379
spring.redis.password=xxx
spring.redis.lettuce.pool.max-wait=-1
spring.redis.lettuce.pool.max-idle=5
spring.redis.lettuce.pool.min-idle=0
spring.redis.lettuce.pool.max-active=20
spring.redis.jedis.pool.min-idle=8
spring.redis.jedis.pool.max-idle=500
spring.redis.jedis.pool.max-active=2000
spring.redis.jedis.pool.max-wait=10000

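With map-underscore-to-camel-case enabled and mapper-locations pointing at classpath:/mybatis/mapper/*.xml, a DAO in summo-sbmy-dao reduces to an entity plus a BaseMapper interface. The class, field, and table names below are hypothetical illustrations, not taken from the article:

```java
package com.summo.sbmy.dao;

import com.baomidou.mybatisplus.annotation.TableName;
import com.baomidou.mybatisplus.core.mapper.BaseMapper;
import lombok.Data;

// Hypothetical entity: a column such as hot_search_title maps
// automatically to the camelCase field because
// map-underscore-to-camel-case=true.
@Data
@TableName("sbmy_hot_search")
public class SbmyHotSearchDO {
    private Long id;
    private String hotSearchTitle;
    private Long hotSearchHeat;
}

// CRUD methods (insert, selectById, selectList, ...) are inherited
// from BaseMapper; custom SQL would live in the mapper XML files.
interface SbmyHotSearchMapper extends BaseMapper<SbmyHotSearchDO> {
}
```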
The logback-spring.xml defines logging to console and rolling files for INFO, WARN and ERROR levels.

<configuration>
    <!-- Spring Boot's default logback settings -->
    <include resource="org/springframework/boot/logging/logback/defaults.xml"/>
    <!-- Defines the CONSOLE appender referenced by <root> below -->
    <include resource="org/springframework/boot/logging/logback/console-appender.xml"/>
    <property name="APP_NAME" value="summo-sbmy"/>
    <property name="LOG_PATH" value="${user.home}/logs/${APP_NAME}"/>
    <property name="LOG_FILE" value="${LOG_PATH}/application.log"/>
    <property name="WARN_LOG_FILE" value="${LOG_PATH}/warn.log"/>
    <property name="ERROR_LOG_FILE" value="${LOG_PATH}/error.log"/>
    <property name="FILE_LOG_PATTERN" value="%green(%d{yyyy-MM-dd HH:mm:ss.SSS}) [%blue(requestId: %X{requestId})] [%highlight(%thread)] ${PID:- } %logger{36} %-5level - %msg%n"/>
    <appender name="APPLICATION" class="ch.qos.logback.core.rolling.RollingFileAppender">
        <file>${LOG_FILE}</file>
        <encoder>
            <pattern>${FILE_LOG_PATTERN}</pattern>
            <charset>utf8</charset>
        </encoder>
        <rollingPolicy class="ch.qos.logback.core.rolling.SizeAndTimeBasedRollingPolicy">
            <fileNamePattern>${LOG_FILE}.%d{yyyy-MM-dd}.%i.log</fileNamePattern>
            <maxHistory>7</maxHistory>
            <maxFileSize>50MB</maxFileSize>
            <totalSizeCap>500MB</totalSizeCap>
        </rollingPolicy>
    </appender>
    ... (WARN and ERROR appenders omitted for brevity) ...
    <root level="INFO">
        <appender-ref ref="CONSOLE"/>
        <appender-ref ref="APPLICATION"/>
        <appender-ref ref="WARN"/>
        <appender-ref ref="ERROR"/>
    </root>
</configuration>
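The omitted WARN and ERROR appenders mirror the APPLICATION appender, each adding a LevelFilter so the file receives only its own level. A reconstructed sketch of the WARN appender (not shown in the article) would look like:

```xml
<appender name="WARN" class="ch.qos.logback.core.rolling.RollingFileAppender">
    <file>${WARN_LOG_FILE}</file>
    <!-- Accept WARN events only; everything else is denied -->
    <filter class="ch.qos.logback.classic.filter.LevelFilter">
        <level>WARN</level>
        <onMatch>ACCEPT</onMatch>
        <onMismatch>DENY</onMismatch>
    </filter>
    <encoder>
        <pattern>${FILE_LOG_PATTERN}</pattern>
        <charset>utf8</charset>
    </encoder>
    <rollingPolicy class="ch.qos.logback.core.rolling.SizeAndTimeBasedRollingPolicy">
        <fileNamePattern>${WARN_LOG_FILE}.%d{yyyy-MM-dd}.%i.log</fileNamePattern>
        <maxHistory>7</maxHistory>
        <maxFileSize>50MB</maxFileSize>
        <totalSizeCap>500MB</totalSizeCap>
    </rollingPolicy>
</appender>
```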

The main class Application.java starts the Spring Boot application.

package com.summo.sbmy;

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.scheduling.annotation.EnableScheduling;

/**
 * @author summo
 * @version Application.java, 1.0.0
 * @description Application entry point
 * @date 2024-08-09
 */
@SpringBootApplication(scanBasePackages = {"com.summo.sbmy"})
@EnableScheduling
public class Application {

    public static void main(String[] args) {
        SpringApplication.run(Application.class, args);
    }

}

To fetch Douyin hot‑search data, a scheduled job DouyinHotSearchJob.java is added under the summo-sbmy-job module. It uses OkHttp to call the public API https://www.iesdouyin.com/web/api/v2/hotsearch/billboard/word/ every hour and prints the JSON response.

package com.summo.sbmy.job.douyin;

import java.io.IOException;
import com.alibaba.fastjson.JSONObject;
import okhttp3.OkHttpClient;
import okhttp3.Request;
import okhttp3.Response;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;

/**
 * @author summo
 * @version DouyinHotSearchJob.java, 1.0.0
 * @description Douyin hot-search crawler job
 * @date 2024-08-09
 */
@Component
public class DouyinHotSearchJob {

    /**
     * Scheduled crawler entry point; runs once per hour.
     */
    @Scheduled(fixedRate = 1000 * 60 * 60)
    public void hotSearch() throws IOException {
        OkHttpClient client = new OkHttpClient.Builder()
            .build();
        Request request = new Request.Builder()
            .url("https://www.iesdouyin.com/web/api/v2/hotsearch/billboard/word/")
            .method("GET", null)
            .addHeader("accept", "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7")
            .addHeader("accept-language", "zh-CN,zh;q=0.9")
            .addHeader("cache-control", "no-cache")
            .addHeader("cookie", "ttwid=1%7CJ6ehEognyMAob_gD6oZwA40monN8E_sENr3IUZmuk7o%7C1712472728%7C44b0cd0003fb75861789d62e56f014eaea3d198898a0ae9a947bf61d95d8ac1a; __ac_signature=_02B4Z6wo00f01fFoqvgAAIDBFmj97SX8qiXxSK5AABr708; __ac_referer=https://pre-dc-console.alibaba-inc.com/; ttwid=1%7CX9ppA_NoTHJI9DG3JN7wNnZ662r-aJbZwCFPLLGK-og%7C1713836331%7Cdbc79a439d0ecc994f60043d66b4ad3ff81c3820f3ab83ef85d30875cc59a18b")
            .addHeader("pragma", "no-cache")
            .addHeader("priority", "u=0, i")
            .addHeader("sec-ch-ua", "\"Not/A)Brand\";v=\"8\", \"Chromium\";v=\"126\", \"Google Chrome\";v=\"126\"")
            .addHeader("sec-ch-ua-mobile", "?0")
            .addHeader("sec-ch-ua-platform", "\"macOS\"")
            .addHeader("sec-fetch-dest", "document")
            .addHeader("sec-fetch-mode", "navigate")
            .addHeader("sec-fetch-site", "none")
            .addHeader("sec-fetch-user", "?1")
            .addHeader("upgrade-insecure-requests", "1")
            .addHeader("user-agent", "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/126.0.0.0 Safari/537.36")
            .build();
        // Close the response when done; the body is already a JSON string,
        // so parse it rather than re-serializing it as a quoted string
        try (Response response = client.newCall(request).execute()) {
            String body = response.body().string();
            System.out.println(JSONObject.parseObject(body).toJSONString());
        }
    }

}
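To turn the printed response into usable data, the hot-search entries need to be extracted from the JSON body. Based on the public API's observed shape, the body contains a word_list array whose items carry word and hot_value fields; those names are assumptions to verify against a live response. In the project itself FastJSON (already a dependency) would do the parsing; the sketch below uses only the JDK's regex support so it runs standalone:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HotSearchParser {

    // Matches "word":"<value>" pairs in the raw JSON body; a real
    // implementation would use a JSON library such as FastJSON instead.
    private static final Pattern WORD = Pattern.compile("\"word\"\\s*:\\s*\"([^\"]+)\"");

    public static List<String> parseWords(String body) {
        List<String> words = new ArrayList<>();
        Matcher m = WORD.matcher(body);
        while (m.find()) {
            words.add(m.group(1));
        }
        return words;
    }

    public static void main(String[] args) {
        // Hypothetical sample mimicking the assumed response shape
        String sample = "{\"word_list\":[{\"word\":\"hot topic A\",\"hot_value\":100},"
            + "{\"word\":\"hot topic B\",\"hot_value\":90}]}";
        System.out.println(parseWords(sample)); // [hot topic A, hot topic B]
    }
}
```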

The article also shows how to obtain the cURL command from the browser, import it into Postman, and generate ready‑to‑use code snippets for Java, illustrating a practical workflow for API testing.
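Copying the request from the browser's network panel as cURL yields a command along these lines, which Postman can import directly and convert to OkHttp code (a trimmed sketch; a real copy includes the full header set shown in the Java code above):

```shell
# Trimmed sketch of a copied cURL command; real copies carry many
# more headers (cookies, sec-ch-ua, sec-fetch-*, etc.)
curl 'https://www.iesdouyin.com/web/api/v2/hotsearch/billboard/word/' \
  -H 'accept-language: zh-CN,zh;q=0.9' \
  -H 'user-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/126.0.0.0 Safari/537.36'
```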

Finally, the author reminds readers to carefully follow each configuration step because the project contains many files, and encourages experimenting with additional hot‑search crawlers for further learning.

Written by Rare Earth Juejin Tech Community (Juejin), a tech community that helps developers grow.