Key Takeaways from DTCC2023: Vector Databases, Data Privacy, and Intelligent Ops

The 14th China Database Technology Conference (DTCC2023) showcased cutting‑edge advances in vector databases, data privacy, MySQL security, and AI‑driven intelligent operations, featuring insights from industry leaders at Huawei, Tencent, eBay, Bilibili and more.

ITPUB

Aug 18, 2023

Vector Database Session

In the era of large language models (LLMs) and AI‑generated content, vector databases store deep‑learning embeddings and enable semantic search over images, audio, video and text. Vearch (open‑sourced in September 2019) supports storage of billions of vectors and returns results within a few milliseconds. Its architecture separates the storage layer (a distributed object store) from the indexing layer (e.g., IVF, HNSW) and provides automatic sharding, replication and fault‑tolerant query routing. Major adopters such as Huawei, Oppo, Vivo and JD run Vearch in production.
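
To make the indexing layer more concrete, here is a minimal sketch of an inverted‑file (IVF) index in plain Python. It is not Vearch's implementation: the class name TinyIVF, the hand‑picked centroids and the nprobe parameter are illustrative assumptions, and a real index would train its centroids with k‑means and use optimized distance kernels.

```python
import heapq
import math


def l2(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class TinyIVF:
    """Toy inverted-file index: vectors are bucketed by their nearest centroid."""

    def __init__(self, centroids):
        # Assumption: centroids are supplied by hand; real systems learn them.
        self.centroids = list(centroids)
        self.lists = [[] for _ in self.centroids]

    def _nearest_centroids(self, vec, n):
        order = sorted(range(len(self.centroids)),
                       key=lambda i: l2(vec, self.centroids[i]))
        return order[:n]

    def add(self, vec_id, vec):
        bucket = self._nearest_centroids(vec, 1)[0]
        self.lists[bucket].append((vec_id, vec))

    def search(self, query, k=3, nprobe=1):
        # Scan only the nprobe closest buckets instead of the whole dataset.
        candidates = []
        for bucket in self._nearest_centroids(query, nprobe):
            for vec_id, vec in self.lists[bucket]:
                candidates.append((l2(query, vec), vec_id))
        return [vec_id for _, vec_id in heapq.nsmallest(k, candidates)]


index = TinyIVF(centroids=[(0.0, 0.0), (10.0, 10.0)])
index.add("a", (0.1, 0.2))
index.add("b", (0.9, 1.1))
index.add("c", (9.8, 10.1))
index.add("d", (10.5, 9.7))
print(index.search((0.0, 0.0), k=2, nprobe=1))  # prints ['a', 'b']
```

Raising nprobe scans more buckets, which trades latency for recall; that knob is the main reason IVF indexes need tuning under load.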

Huawei Cloud’s chief architect described a low‑cost, high‑performance, highly scalable and highly available vector database built on commodity servers. Key techniques include:

Horizontal sharding of vector partitions.

Replica groups for HA with automatic failover.

GPU‑accelerated distance calculations for millisecond‑level latency.

Hybrid storage (memory‑first, SSD‑backed) to balance cost and performance.
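
The first two techniques, horizontal sharding and replica groups with failover, can be sketched as a scatter‑gather query. This is an illustration only, not Huawei Cloud's design: the Replica class, the ReplicaDown exception and the in‑order failover policy are assumptions made for the example.

```python
import heapq


class ReplicaDown(Exception):
    """Raised by a replica that cannot serve the request."""


class Replica:
    def __init__(self, vectors, healthy=True):
        self.vectors = vectors  # {vec_id: vector}
        self.healthy = healthy

    def top_k(self, query, k):
        if not self.healthy:
            raise ReplicaDown()
        # Squared L2 distance is enough for ranking.
        scored = [(sum((q - x) ** 2 for q, x in zip(query, vec)), vec_id)
                  for vec_id, vec in self.vectors.items()]
        return heapq.nsmallest(k, scored)


def search_shard(replica_group, query, k):
    """Try each replica in order; fail over to the next one on error."""
    for replica in replica_group:
        try:
            return replica.top_k(query, k)
        except ReplicaDown:
            continue
    raise RuntimeError("no healthy replica in group")


def search_cluster(shards, query, k):
    """Scatter the query to every shard, then merge the partial top-k lists."""
    partials = [search_shard(group, query, k) for group in shards]
    merged = heapq.nsmallest(k, (hit for part in partials for hit in part))
    return [vec_id for _, vec_id in merged]


shard0 = {"a": (0.0, 0.0), "b": (5.0, 5.0)}
shard1 = {"c": (1.0, 1.0), "d": (9.0, 9.0)}
shards = [
    [Replica(shard0, healthy=False), Replica(shard0)],  # first replica is down
    [Replica(shard1)],
]
print(search_cluster(shards, (0.0, 0.0), k=2))  # prints ['a', 'c']
```

Each shard returns its own top k, so the merge step only has to compare k results per shard rather than every vector in the cluster.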

Tencent Cloud reviewed its seven to eight years of work on vector retrieval. While traditional use cases (recommendation, search, face recognition) are mature, the rapid adoption of LLM‑driven applications introduces new challenges: higher query volume, stricter latency budgets, and the need for dynamic indexing algorithms. Tencent’s recent innovations include adaptive IVF parameters, quantization‑aware training, and cost‑aware routing.
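
The talk summary does not say how Tencent adapts its IVF parameters, so the following is only a guess at the general idea: adjust the number of probed IVF lists (nprobe) so that observed latency stays inside a budget. The function name, the halving and doubling policy and the bounds are all hypothetical.

```python
def adjust_nprobe(nprobe, observed_ms, budget_ms, lo=1, hi=256):
    """Return the next nprobe given the last observed query latency.

    Hypothetical policy: halve nprobe when over budget, double it when
    there is plenty of headroom, otherwise leave it alone.
    """
    if observed_ms > budget_ms:
        return max(lo, nprobe // 2)
    if observed_ms < 0.5 * budget_ms:
        return min(hi, nprobe * 2)
    return nprobe


nprobe = 16
for latency_ms in (30.0, 12.0, 4.0):  # simulated latency samples
    nprobe = adjust_nprobe(nprobe, latency_ms, budget_ms=20.0)
print(nprobe)  # prints 16
```

A production controller would act on a percentile over a window rather than single samples, and would also track recall so that latency savings do not silently degrade result quality.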

A senior engineer from eBay shared practical experience with Milvus. The deployment steps covered:

Cluster provisioning (Kubernetes or Docker‑Compose) with dedicated GPU nodes.

Data ingestion pipelines that batch‑load embeddings and create IVF‑PQ indexes.

Performance tuning for high concurrency: increasing search.search_threads, raising cache.cache_capacity, and adjusting gpu.search_devices.

Monitoring latency and throughput with Prometheus‑exported metrics.
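
The batch‑loading step above can be sketched without tying it to a particular client version. The insert_batch callback below is a hypothetical hook that stands in for whatever insert call your Milvus client exposes; the batch size of 1000 is likewise an arbitrary assumption.

```python
from itertools import islice


def batched(rows, batch_size):
    """Yield lists of at most batch_size items from any iterable."""
    it = iter(rows)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def ingest(embeddings, insert_batch, batch_size=1000):
    """Send embeddings in batches; return how many rows were sent."""
    total = 0
    for batch in batched(embeddings, batch_size):
        insert_batch(batch)  # hypothetical hook: wrap the client's insert call here
        total += len(batch)
    return total


received = []
count = ingest(([float(i), float(i) + 0.5] for i in range(2500)),
               received.append, batch_size=1000)
print(count, [len(b) for b in received])  # prints 2500 [1000, 1000, 500]
```

Because the source is consumed lazily, the pipeline never holds more than one batch of embeddings in memory, which matters when loading millions of vectors before building an index.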

Future work discussed integrating Milvus with LLMs to build knowledge‑base retrieval systems, where vector similarity is combined with textual reranking.
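
A rough sketch of that two‑stage idea follows: fetch candidates by vector similarity, then rerank them with a text signal. Everything here is an assumption for illustration, including the keyword‑overlap scorer, the alpha weight and the brute‑force candidate stage that stands in for a real vector index.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def keyword_overlap(query, text):
    """Fraction of query words that also appear in the document text."""
    q = set(query.lower().split())
    t = set(text.lower().split())
    return len(q & t) / len(q) if q else 0.0


def retrieve_and_rerank(query_text, query_vec, docs, k=2, alpha=0.7):
    # Stage 1: candidate retrieval by vector similarity (brute force here).
    candidates = sorted(docs, key=lambda d: cosine(query_vec, d["vec"]),
                        reverse=True)[: k * 2]

    # Stage 2: rerank candidates with a blend of vector and text scores.
    def score(d):
        return (alpha * cosine(query_vec, d["vec"])
                + (1 - alpha) * keyword_overlap(query_text, d["text"]))

    return [d["id"] for d in sorted(candidates, key=score, reverse=True)[:k]]


docs = [
    {"id": "d1", "text": "reset mysql root password", "vec": (1.0, 0.0)},
    {"id": "d2", "text": "tune vector index recall", "vec": (0.9, 0.1)},
    {"id": "d3", "text": "rotate tls certificates", "vec": (0.0, 1.0)},
]
print(retrieve_and_rerank("vector index tuning", (1.0, 0.05), docs))  # prints ['d2', 'd1']
```

On vector similarity alone d1 would rank first; the text signal moves d2 ahead because it shares two of the three query words.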

Data Privacy and Security

Bilibili’s big‑data security team presented a one‑stop solution that enforces legality, safety and compliance for massive data assets. Core components include:

Transparent data encryption at rest (AES‑256) and in transit (TLS 1.3).

Fine‑grained access control using role‑based policies and attribute‑based encryption keys.

Automated compliance checks against Chinese data‑protection regulations.

Audit logging with tamper‑evident storage.
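
One common way to make an audit log tamper‑evident is a hash chain, where every entry commits to the one before it. The sketch below shows the idea with the standard library only; it is not Bilibili's implementation, and the entry layout and function names are assumptions.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash that precedes the first entry


def _digest(prev_hash, event):
    # Canonical JSON so the same event always hashes to the same value.
    payload = json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append_entry(log, event):
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append({"event": event, "prev": prev_hash,
                "hash": _digest(prev_hash, event)})


def verify(log):
    """Recompute the chain; any edited or reordered entry breaks it."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev"] != prev_hash:
            return False
        if entry["hash"] != _digest(prev_hash, entry["event"]):
            return False
        prev_hash = entry["hash"]
    return True


log = []
append_entry(log, {"user": "alice", "action": "SELECT", "table": "orders"})
append_entry(log, {"user": "bob", "action": "EXPORT", "table": "users"})
print(verify(log))  # prints True
log[0]["event"]["user"] = "mallory"  # simulate tampering
print(verify(log))  # prints False
```

The chain only proves integrity if the latest hash is anchored somewhere the log writer cannot modify, for example a separate write‑once store; otherwise an attacker could rewrite the whole chain.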

A chief engineer from Oracle's MySQL team outlined security best practices for MySQL deployments:

Enable require_secure_transport and configure server‑side TLS certificates.

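Encrypt data at rest with InnoDB tablespace encryption: in MySQL 8.0 that means a keyring plus default_table_encryption, innodb_redo_log_encrypt and innodb_undo_log_encrypt (innodb_encrypt_tables and innodb_encrypt_log are the MariaDB equivalents).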

Implement role‑based access control via CREATE ROLE and GRANT statements.

Deploy the MySQL Enterprise Audit plugin or the open‑source MariaDB Audit Plugin for detailed activity logs.

Enforce strong password policies and rotate credentials regularly.
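
Part of this checklist can be verified automatically. The sketch below compares a dictionary of server variables (for example, the rows returned by SHOW GLOBAL VARIABLES, fetched with whatever client you use) against expected values. The variable names are the MySQL 8.0 ones; the EXPECTED baseline and the function name are assumptions, and the check does not cover roles, auditing or password rotation.

```python
# Assumed baseline using MySQL 8.0 system variable names; adjust to your policy.
EXPECTED = {
    "require_secure_transport": "ON",
    "default_table_encryption": "ON",
    "innodb_redo_log_encrypt": "ON",
    "innodb_undo_log_encrypt": "ON",
}


def audit_variables(variables, expected=EXPECTED):
    """Return {name: (actual, wanted)} for every setting that deviates."""
    findings = {}
    for name, wanted in expected.items():
        actual = str(variables.get(name, "<unset>")).upper()
        if actual != wanted:
            findings[name] = (actual, wanted)
    return findings


# Sample input standing in for the result of SHOW GLOBAL VARIABLES.
server = {
    "require_secure_transport": "ON",
    "default_table_encryption": "OFF",
    "innodb_redo_log_encrypt": "on",
}
for name, (actual, wanted) in sorted(audit_variables(server).items()):
    print(f"{name}: is {actual}, expected {wanted}")
```

Running a check like this in CI or a scheduled job turns the checklist into a regression test, so a configuration change that disables TLS or encryption is caught quickly.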

Intelligent Operations

CTO Xu Ji described a digital‑modeling approach to database operations that captures operational knowledge as reusable assets. By combining knowledge graphs with large‑language‑model (LLM) assistants, routine tasks such as schema migration, performance diagnosis and incident triage can be partially automated.

DBbrain, presented by Tencent Cloud, provides intelligent operations for distributed databases. Its workflow includes:

Real‑time collection of node‑level metrics (CPU, I/O, network) via agents.

SQL‑level tracing that records execution plans, latency distribution and lock contention.

Automated anomaly detection that triggers alerts when metrics exceed dynamic thresholds.

Context‑aware optimization recommendations (e.g., index suggestions, query rewrite) linked to business KPIs.
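
The anomaly‑detection step can be illustrated with a simple dynamic threshold: flag a sample when it exceeds the recent mean by k standard deviations. This is a generic technique, not DBbrain's algorithm, and the window size, k and min_samples values are arbitrary assumptions.

```python
from collections import deque
from statistics import mean, pstdev


class DynamicThreshold:
    """Flags values above mean + k * stddev of a sliding window."""

    def __init__(self, window=30, k=3.0, min_samples=5):
        self.samples = deque(maxlen=window)
        self.k = k
        self.min_samples = min_samples

    def observe(self, value):
        """Return True if value is anomalous relative to the recent window."""
        is_anomaly = False
        if len(self.samples) >= self.min_samples:
            mu = mean(self.samples)
            sigma = max(pstdev(self.samples), 1e-9)  # avoid a zero-width band
            is_anomaly = value > mu + self.k * sigma
        if not is_anomaly:
            self.samples.append(value)  # keep outliers out of the baseline
        return is_anomaly


detector = DynamicThreshold(window=10, k=3.0)
cpu_percent = [41, 40, 42, 39, 41, 40, 95, 42]
print([v for v in cpu_percent if detector.observe(v)])  # prints [95]
```

Because the threshold follows the recent window, the same detector works for a node that idles at 10% CPU and one that normally runs at 70%, which a fixed alert level cannot do.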

Senior DBA Yang Yicong emphasized that intelligent operations combine four pillars: DBA expertise, algorithmic model design, robust data support, and automation pipelines. This synergy improves reliability, reduces MTTR and enables proactive capacity planning.
