
CodeFuse: Open‑Source Large Code Model with Multi‑Task Fine‑Tuning and 4‑Bit Quantization

Ant Group’s open‑source CodeFuse is a large code‑generation model built on multi‑task fine‑tuning and 4‑bit quantization. It scores 74.4% on the HumanEval benchmark, outperforming GPT‑4’s reported 67%, supports tasks from code synthesis to bug fixing, and can be deployed on a single high‑end GPU.

Ant R&D Efficiency

On September 8, 2023, at the 2023 Inclusion·Bund Summit forum “Cloud AI: Exploring Emerging Technologies and Development Models” in Shanghai, Ant Group open‑sourced its code‑generation large model, CodeFuse.

CodeFuse is a self‑developed code generation model that can automatically generate code, add comments, create test cases, and fix/optimize code based on developer prompts, aiming to improve R&D efficiency for both beginners and experienced developers.

The project’s goal is to redefine the next generation of AI‑assisted software development, providing a full‑lifecycle AI toolset and encouraging community collaboration to advance software engineering paradigms.

In recent HumanEval benchmark tests, CodeFuse achieved a pass@1 score of 74.4%, surpassing GPT‑4 (67%) and WizardCoder‑34B (73.2%) and placing it among the top open‑source code models.
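HumanEval results are conventionally reported as pass@k. For readers who want to reproduce such numbers, the standard unbiased pass@k estimator can be computed as below; this is a reference sketch, not code from the CodeFuse repository:

```python
import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: probability that at least one of k samples
    drawn (without replacement) from n generations, c of which are
    correct, passes the tests."""
    if n - c < k:
        return 1.0  # every size-k draw must contain a correct sample
    return 1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1))
```

With a single sample per problem (n = k = 1), pass@1 reduces to the fraction of problems solved, which is how a single score such as 74.4% is read.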

The open‑source release includes the codebase, dataset, and model weights, hosted on GitHub ( https://github.com/codefuse-ai ), HuggingFace ( https://huggingface.co/codefuse-ai ), and the domestic ModelScope community ( https://modelscope.cn/organization/codefuse-ai ).

CodeFuse’s architecture consists of three layers: a distributed training layer (supporting DeepSpeed and Ant’s ATorch), a middle layer with the Multi‑Task Fine‑Tuning (MFT) framework, and a top layer that integrates various open‑source models such as LLaMA, LLaMA‑2, StarCoder, Baichuan, Qwen, ChatGLM2, and GPT‑NeoX.

The MFT framework supports dozens of tasks (code generation, translation, test case generation, bug fixing, etc.) simultaneously, balances task convergence with a novel loss design, and works with both HuggingFace and ATorch training pipelines. It also supports efficient fine‑tuning via LoRA and QLoRA.
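The article does not publish the exact loss formula, but the core idea of balancing convergence across tasks can be sketched as per-task normalization: token losses are averaged within each task before averaging across tasks, so a task with many tokens cannot drown out a small one. The function name and shapes below are illustrative assumptions:

```python
import numpy as np

def task_balanced_loss(token_losses: np.ndarray, task_ids: np.ndarray) -> float:
    """Average token losses within each task, then average across tasks.
    A plain global mean would instead weight each task by its token count."""
    tasks = np.unique(task_ids)
    per_task = [token_losses[task_ids == t].mean() for t in tasks]
    return float(np.mean(per_task))
```

For example, with token losses [1, 1, 1, 3] split 3-to-1 between two tasks, the global mean is 1.5 but the task-balanced loss is 2.0, giving the smaller task equal pull on the gradient.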

Evaluation tables show that MFT‑fine‑tuned models consistently improve HumanEval pass@1 scores by roughly 10% over the original checkpoints, with notable gains on StarCoder and CodeLLaMA.

For deployment, the 34B CodeFuse‑CodeLLaMA model normally requires an A100 or four A10 GPUs in FP16/INT8. By applying 4‑bit (INT4) quantization with GPTQ or NVIDIA TensorRT‑LLM, the model size drops from 64.9 GB to 19 GB, enabling deployment on a single A10 or RTX 4090 (24 GB) with inference speed comparable to FP16 on an A100.
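The memory arithmetic behind those figures is simple: weight storage scales linearly with bit width. The helper below is a rough weight-only estimate that ignores quantization scales/zero-points and runtime activations, which is why the reported 4-bit footprint (19 GB) is somewhat larger than the raw weight size:

```python
def weight_size_gib(n_params: int, bits: int) -> float:
    """Approximate weight-only storage in GiB for a model with
    n_params parameters stored at the given bit width."""
    return n_params * bits / 8 / 2**30
```

For a 34B-parameter model this gives roughly 63 GiB at FP16 and roughly 16 GiB at INT4, a 4x reduction that brings the weights within the 24 GB budget of an A10 or RTX 4090.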

Quantized performance drops less than 1 % on HumanEval and about 1 % on Chinese NLP benchmarks (CMNLI/C‑Eval). Memory usage measurements confirm that 4‑bit models can handle 3K token sequences on an A10.
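Whether a 3K-token sequence fits alongside the quantized weights is mostly a question of KV-cache size, which can be estimated generically. The CodeLLaMA‑34B configuration values used in the usage note (48 layers, 8 grouped-query KV heads of dimension 128) are illustrative assumptions, not figures from the article:

```python
def kv_cache_gib(n_layers: int, n_kv_heads: int, head_dim: int,
                 seq_len: int, bytes_per_elem: int = 2) -> float:
    """KV-cache size per sequence: keys and values (factor 2) for every
    layer, KV head, head dimension, and token position."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem / 2**30
```

Under the assumed config, a 3,072-token context needs only about 0.56 GiB of FP16 KV cache per sequence, so on a 24 GB A10 the binding constraint is the quantized weights rather than the cache.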

The quantized model, along with the MFTCoder framework, is publicly available on HuggingFace, ModelScope, and Wisemodel.

For more information and to try the models, visit the links above or join the official Ant Group enterprise WeChat group.

Tags: code generation, AI, quantization, open-source, large language model, CodeFuse, multi-task fine-tuning
Written by

Ant R&D Efficiency

We are the Ant R&D Efficiency team, focused on rapid development, a strong developer experience, and practical technology.
