Soul’s AIGC Practices and Explorations: From Industry Trends to Product Integration
This article reviews recent AIGC developments, outlines Soul’s multi‑year experiments with virtual humans, visual generation, large language models, audio synthesis, and product integration, and discusses the strategic balance between using external AI services and building proprietary capabilities.
AIGC has surged since late 2022, when OpenAI released ChatGPT; the same period saw a wave of large language models, multimodal capabilities, and rapid innovation in image and video generation, from DALL‑E 2 and Stable Diffusion to Midjourney and Sora.
Soul, a social app targeting Gen‑Z users, leverages this wave to build an AI‑native social network. The platform’s core functions—private chat, instant‑share square, video/voice matching, AI‑enhanced camera, and themed group squares—are designed around “soul” connections rather than superficial looks.
1. Virtual Humans
Soul built an on‑device rendering engine in 2020, released it in 2021, and later added full‑body avatars for multi‑user scenes. The engine now integrates AR‑generated textures, multimodal AI for text‑driven motion, and compatibility layers for Unity and Unreal, aiming for realistic virtual personas powered by large language and speech synthesis models.
2. Visual Generation
Soul began with real‑time face and gesture recognition, moved to GAN‑based image generation in 2021, and introduced AI drawing for in‑app events in December 2022. Since 2023 the focus has shifted to a proprietary model matrix offering diverse styles, video generation (including SVD‑based transitions), and letting users train custom models on their own uploaded content, while exploring technologies similar to Sora.
3. Dialogue & Large‑Model Development
Soul’s backend combines pre‑processing (face detection, segmentation), guided generation, fine‑tuning for style, and inference optimizations that reduce latency and cost. Products include the AI companion “AI Gou Dan”, multilingual dialogue, AI‑driven game modes (e.g., an AI‑hosted Werewolf game), and a virtual‑person chat system that generates images, text, and audio from user‑defined personas.
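The talk does not detail how these stages are wired together, but the pre‑processing → guided generation → optimized inference flow can be sketched as a staged pipeline. The stage names, `GenerationRequest` structure, and all logic below are hypothetical placeholders, not Soul's actual implementation:

```python
from dataclasses import dataclass, field


@dataclass
class GenerationRequest:
    """A single generation job flowing through the (hypothetical) pipeline."""
    image: str                      # stand-in for raw image data
    metadata: dict = field(default_factory=dict)


def preprocess(req: GenerationRequest) -> GenerationRequest:
    # Stage 1: face detection and segmentation would run here;
    # this sketch only records that the stage executed.
    req.metadata["faces_detected"] = True
    req.metadata["segmented"] = True
    return req


def guided_generate(req: GenerationRequest, style: str) -> GenerationRequest:
    # Stage 2: generation guided by the requested style (e.g., a
    # style-fine-tuned model would be selected here).
    req.metadata["style"] = style
    req.image = f"generated({req.image}, style={style})"
    return req


def postprocess(req: GenerationRequest) -> GenerationRequest:
    # Stage 3: inference-time optimizations (batching, caching,
    # distillation) would wrap this stage to cut latency and cost.
    req.metadata["optimized"] = True
    return req


def run_pipeline(image: str, style: str) -> GenerationRequest:
    req = GenerationRequest(image=image)
    for stage in (preprocess, lambda r: guided_generate(r, style), postprocess):
        req = stage(req)
    return req


result = run_pipeline("selfie.jpg", style="anime")
print(result.image)
```

Structuring the backend as composable stages like this is one common way to let each stage be optimized or swapped independently, which matches the article's emphasis on reducing latency and cost at the inference step.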
4. Audio & Music
The team developed single‑ and multi‑speaker TTS, voice cloning, and lip‑sync technologies, as well as AI‑generated background music, AI singers, and lyric‑to‑song pipelines. Ongoing work includes a large‑scale audio synthesis model to improve quality and scalability.
5. AIGC Technology Integrated with Products
Features such as AI‑assisted chat suggestions, dynamic content recognition (e.g., detecting user‑posted gifts or birthdays), AI‑generated avatars, virtual‑person role‑play, AI‑driven KTV, and large‑scale content generation for growth and activation are deployed to boost DAU, session length, and user engagement.
6. General vs. Self‑Developed AIGC
The discussion emphasizes evaluating external models (e.g., GPT‑4, Midjourney) against the need for proprietary solutions that provide unique value in social‑media contexts, leveraging internal assets, and building vertical‑specific barriers that generic models cannot easily overcome.
7. Q&A Highlights
Key questions covered the feasibility of truly human‑like digital avatars, the business impact of AI companions on metrics such as DAU and session time, and moderation of inappropriate or politically sensitive content; suggested mitigations combine robust data filtering, model‑level safeguards, and personalized persona recommendations.
Overall, Soul’s AIGC journey illustrates a comprehensive blend of research, product experimentation, and strategic decision‑making aimed at creating an engaging, AI‑native social experience for young users.
DataFunSummit
Official account of the DataFun community, dedicated to sharing big data and AI industry summit news and speaker talks, with regular downloadable resource packs.