
Containerizing the Live Classroom Service: Architecture, Migration Process, and Lessons Learned

This article details the background, goals, architectural analysis, migration scope, step‑by‑step containerization process, code‑level challenges, and post‑migration results of moving a large‑scale live‑classroom platform from virtual machines to a Kubernetes‑based container environment, highlighting performance, reliability, and operational improvements.

Xueersi Online School Tech Team

Micro‑service architectures have spread rapidly in recent years, but deployments on traditional virtual machines are slow to scale and costly to manage. This hurts most for high‑traffic live‑classroom services, which must expand capacity quickly.

Containerization offers a lightweight alternative, reducing resource consumption and enabling faster scaling. The project aimed to migrate the entire live‑classroom platform to containers by the 2020 winter term, with a fallback to the VM environment if needed.

The existing system consists of an access layer (HTTP APIs), a service layer (micro‑services providing RPC), an infrastructure layer (Redis, MySQL, Zookeeper, Kafka, logging, publishing, gateway), and external dependencies (course, material, OA, user systems). The migration focused on three major areas: stateful services, persistent file storage, and auxiliary processes.

Key migration steps included:

Redesigning TW node service discovery by consolidating agents into a centralized cluster.

Replacing file‑based message‑queue fallback with Redis for reliability.

Adopting hostPath volumes for log persistence and adding random suffixes to log files to avoid overwrites.

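Using Kubernetes storage options according to need: emptyDir for scratch data that lives only as long as the pod, hostPath for files that must stay on the node, ConfigMap for configuration, and Secret for sensitive values.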

Updating service registration and discovery to use Zookeeper with a fallback to Kubernetes Services, illustrated by code changes.
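The fallback in the last step can be sketched as follows. This is a minimal illustration and not the team's actual code: the `Lister` type, the `Resolve` function, the service and namespace names, and the assumption that the cluster uses the default `cluster.local` DNS domain are all hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// Lister is a hypothetical stand-in for a Zookeeper-backed registry lookup.
type Lister func(service string) ([]string, error)

// Resolve prefers registry endpoints and falls back to the Service DNS name
// when the registry is unreachable or has no entries for the service.
func Resolve(list Lister, service, namespace string, port int) []string {
	if addrs, err := list(service); err == nil && len(addrs) > 0 {
		return addrs
	}
	// Assumption: a Service with the same name exists in the namespace and
	// the cluster uses the default "cluster.local" DNS domain.
	return []string{fmt.Sprintf("%s.%s.svc.cluster.local:%d", service, namespace, port)}
}

func main() {
	healthy := func(string) ([]string, error) { return []string{"10.0.0.5:9000"}, nil }
	broken := func(string) ([]string, error) { return nil, errors.New("zookeeper unavailable") }

	fmt.Println(Resolve(healthy, "classroom", "live", 9000))
	fmt.Println(Resolve(broken, "classroom", "live", 9000))
}
```

Keeping the lookup behind a small function type makes it easy to swap the registry for a fake in tests, and callers see the same return shape whichever path was taken.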

Code example for Zookeeper watch implementation:

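// WatchTree reports the children of directory each time they change.
// The returned channel is closed when stopCh fires or a watch cannot be set.
func (s *Zookeeper) WatchTree(directory string, stopCh <-chan struct{}) (<-chan []*store.KVPair, error) {
    // List once up front so an unreadable directory fails fast.
    entries, err := s.List(directory)
    if err != nil {
        return nil, err
    }
    watchCh := make(chan []*store.KVPair)
    go func() {
        defer close(watchCh)
        // Deliver the initial snapshot, but give up if the caller stops first.
        select {
        case watchCh <- entries:
        case <-stopCh:
            return
        }
        for {
            // Zookeeper watches fire only once, so set a fresh one on every pass.
            _, _, eventCh, err := s.client.ChildrenW(s.normalize(directory))
            if err != nil {
                return
            }
            select {
            case e := <-eventCh:
                if e.Type == zk.EventNodeChildrenChanged {
                    if kv, err := s.List(directory); err == nil {
                        // Do not block forever if nobody is reading any more.
                        select {
                        case watchCh <- kv:
                        case <-stopCh:
                            return
                        }
                    }
                }
            case <-stopCh:
                return
            }
        }
    }()
    return watchCh, nil
}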

Additional code for listing Zookeeper keys:

// List returns the key and value of every child under directory.
func (s *Zookeeper) List(directory string) ([]*store.KVPair, error) {
    keys, stat, err := s.client.Children(s.normalize(directory))
    if err != nil {
        if err == zk.ErrNoNode {
            return nil, store.ErrKeyNotFound
        }
        return nil, err
    }
    kv := []*store.KVPair{}
    for _, key := range keys {
        pair, err := s.Get(strings.TrimSuffix(directory, "/") + s.normalize(key))
        if err != nil {
            // The child was deleted between Children and Get; start over
            // with a fresh listing instead of returning a partial result.
            if err == store.ErrKeyNotFound {
                return s.List(directory)
            }
            return nil, err
        }
        // LastIndex is taken from the directory's stat, not the child's.
        kv = append(kv, &store.KVPair{Key: key, Value: []byte(pair.Value), LastIndex: uint64(stat.Version)})
    }
    return kv, nil
}
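WatchTree delivers full snapshots rather than individual changes, so a consumer that wants to know which endpoints appeared or disappeared has to compare consecutive snapshots itself. The sketch below shows one way to do that. The `diff` helper and the sample addresses are hypothetical, and it assumes keys are unique within a snapshot.

```go
package main

import (
	"fmt"
	"sort"
)

// diff compares two snapshots of endpoint keys and reports what changed.
// Assumption: each key appears at most once per snapshot.
func diff(prev, next []string) (added, removed []string) {
	seen := make(map[string]bool, len(prev))
	for _, k := range prev {
		seen[k] = true
	}
	for _, k := range next {
		if seen[k] {
			delete(seen, k) // present in both snapshots: unchanged
		} else {
			added = append(added, k)
		}
	}
	// Whatever is left in seen existed before but is gone now.
	for k := range seen {
		removed = append(removed, k)
	}
	sort.Strings(added)
	sort.Strings(removed)
	return added, removed
}

func main() {
	prev := []string{"10.0.0.1:8080", "10.0.0.2:8080"}
	next := []string{"10.0.0.2:8080", "10.0.0.3:8080"}
	added, removed := diff(prev, next)
	fmt.Println("added:", added, "removed:", removed)
}
```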

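Testing revealed that change events were lost when pods scaled rapidly. A Zookeeper watch fires only once, so any change that lands before the watch is set again goes unnoticed. The issue was resolved by resetting the watch after each event, as demonstrated in the updated design diagrams.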
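The effect of one-shot watches can be reproduced without a Zookeeper cluster. The sketch below uses a hypothetical in-memory `registry` whose `ChildrenW` method mimics the behavior of returning a snapshot together with a watch that fires once. It is an illustration of the re-arming idea under those assumptions, not the team's implementation: because every pass sets a new watch and reads the snapshot in the same call, the latest state is always delivered eventually, even when two changes arrive back to back.

```go
package main

import (
	"fmt"
	"sync"
)

// registry is a hypothetical in-memory stand-in for a Zookeeper directory.
type registry struct {
	mu       sync.Mutex
	children []string
	watchers []chan struct{}
}

// ChildrenW returns a snapshot plus a watch that fires once on the next change.
func (r *registry) ChildrenW() ([]string, <-chan struct{}) {
	r.mu.Lock()
	defer r.mu.Unlock()
	ch := make(chan struct{}, 1)
	r.watchers = append(r.watchers, ch)
	return append([]string(nil), r.children...), ch
}

// Set replaces the children and fires every pending watch exactly once.
func (r *registry) Set(children ...string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.children = append([]string(nil), children...)
	for _, ch := range r.watchers {
		ch <- struct{}{} // buffered, so this never blocks
	}
	r.watchers = nil
}

// watch re-arms the watch on every pass and forwards the matching snapshot.
func watch(r *registry, stop <-chan struct{}, out chan<- []string) {
	for {
		snapshot, fired := r.ChildrenW() // arm the watch and read together
		select {
		case out <- snapshot:
		case <-stop:
			return
		}
		select {
		case <-fired:
		case <-stop:
			return
		}
	}
}

func main() {
	r := &registry{}
	out := make(chan []string)
	stop := make(chan struct{})
	go watch(r, stop, out)

	fmt.Println(<-out) // initial, empty snapshot
	r.Set("pod-1")
	r.Set("pod-1", "pod-2") // second change arrives before anyone reads
	for snap := range out {
		fmt.Println(snap)
		if len(snap) == 2 {
			break
		}
	}
	close(stop)
}
```

Depending on timing the intermediate single-pod snapshot may or may not be printed, but the final two-pod state is never missed.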

The migration also covered asynchronous consumer services, scheduled tasks (replacing XXL‑Job with a custom Go‑based scheduler supporting second‑level granularity), and gray‑release strategies with one‑click rollback.
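The article does not show the custom scheduler, so the following is only a sketch of what second‑level granularity could look like in Go. The `Task` type, its fields, and the `Tick` function are hypothetical names, and the design (a one‑second ticker that runs whatever is due) is an assumption rather than the team's implementation.

```go
package main

import (
	"fmt"
	"time"
)

// Task is a hypothetical job definition; the interval is in whole seconds.
type Task struct {
	Name         string
	EverySeconds int64
	Run          func()
	nextRun      int64 // Unix seconds of the next scheduled run
}

// Tick runs every task whose next-run time has arrived and returns their names.
func Tick(tasks []*Task, now int64) []string {
	var ran []string
	for _, t := range tasks {
		if now < t.nextRun {
			continue // not due yet
		}
		t.Run()
		t.nextRun = now + t.EverySeconds
		ran = append(ran, t.Name)
	}
	return ran
}

func main() {
	count := 0
	tasks := []*Task{{Name: "heartbeat", EverySeconds: 1, Run: func() { count++ }}}

	// Wake up once per second and run whatever is due.
	ticker := time.NewTicker(time.Second)
	defer ticker.Stop()
	for i := 0; i < 2; i++ {
		now := <-ticker.C
		fmt.Println(now.Unix(), Tick(tasks, now.Unix()))
	}
	fmt.Println("runs:", count)
}
```

Passing the current time into `Tick` instead of reading the clock inside it keeps the scheduling decision testable without waiting in real time.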

After the migration, the live‑classroom platform ran entirely in the container environment and handled millions of concurrent students during the 2020 winter term. Planned next steps are further cloud integration, dynamic scaling, and service mesh adoption.

