
Deploying Meta LLaMA 3 on JD Cloud: A Step‑by‑Step Tutorial

This article introduces Meta's newly released LLaMA 3 models, highlights their performance improvements, and provides a detailed, hands‑on guide for provisioning a JD Cloud GPU instance, installing the llama‑factory code, and running the model through a Jupyter‑based web demo.

JD Tech Talk

On April 19, Meta announced the open‑source LLaMA 3 family, offering an 8‑billion‑parameter and a 70‑billion‑parameter model with an 8K context window, claiming performance that rivals GPT‑4 and surpasses many proprietary models in benchmark tests.

The models use a standard decoder‑only Transformer architecture, were pre‑trained on more than 15 trillion tokens (roughly seven times the data used for Llama 2), adopt a tokenizer with a 128K‑token vocabulary, and employ Grouped‑Query Attention (GQA) to improve inference efficiency.

The open‑source community responded quickly, with over a thousand variants appearing on Hugging Face within five days.

The author attempted to access LLaMA 3 via Meta's online demo but faced network delays, then decided to run the model locally on JD Cloud due to the large model size (≈60 GB) and GPU requirements.

Step 1: Log into the JD Cloud AI console at https://gcs-console.jdcloud.com/instance/list.

Step 2: Create a GPU instance, selecting the “按配置” (pay‑by‑configuration) billing option; at an hourly price of ¥1.89, a ¥2 recharge covers roughly one hour of usage.
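The usable time per recharge can be sanity‑checked with a one‑line awk calculation; both figures (the ¥2 balance and the ¥1.89/hour rate) come from Step 2 above:

```shell
# Hours of GPU time per recharge at the pay-by-configuration rate.
# balance and rate are the numbers quoted in Step 2.
awk -v balance=2 -v rate=1.89 'BEGIN { printf "%.2f\n", balance / rate }'
# prints 1.06
```

So a ¥2 top‑up buys just over an hour, which is enough to walk through this tutorial once.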

Step 3: Wait for the instance status to become “Running”, then click the Jupyter button to open the AI development environment.

Step 4: In the Jupyter terminal, copy the llama‑factory code to the data directory:

cp -r /gcs-pub/llama-factory/ /data/

Step 5: Open llama-factory/src/web_demo.py in the file explorer and change server_port to 28888, then save.
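If you prefer to script this edit rather than use the file explorer, a sed substitution works. The sample line below is an assumption about how web_demo.py passes server_port to Gradio (the article does not show the file's contents); on the instance you would run the same sed with -i against /data/llama-factory/src/web_demo.py instead of piping a sample through it.

```shell
# Hypothetical equivalent of the manual Step 5 edit, demonstrated on a
# sample launch line rather than the real file.
sample='demo.launch(server_name="0.0.0.0", server_port=7860)'
echo "$sample" | sed 's/server_port=[0-9][0-9]*/server_port=28888/'
# prints demo.launch(server_name="0.0.0.0", server_port=28888)
```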

Step 6: In the terminal, run the following commands to set up the environment and launch the demo:

cd /data/llama-factory

conda create -n liandan python=3.10 -y

conda activate liandan

pip install -e ".[metrics]"

CUDA_VISIBLE_DEVICES=0 python src/web_demo.py --model_name_or_path /gcs-pub/Meta-Llama-3-8B-Instruct --template llama3
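Loading the 8B weights takes a little while, so it can help to know when the demo is actually serving before moving on to Step 7. The helper below is not from the original article; it simply polls the port until something answers HTTP. Port 28888 matches the value set in Step 5, and the 60‑try timeout is arbitrary.

```shell
# Poll until a local HTTP server answers on the given port.
# Usage: wait_for_port PORT [TRIES]; returns 0 on success, 1 on timeout.
wait_for_port() {
  port="$1"
  tries="${2:-60}"    # how many 1-second polls before giving up
  i=0
  while [ "$i" -lt "$tries" ]; do
    if curl -s -o /dev/null "http://127.0.0.1:${port}"; then
      return 0        # something is serving HTTP on the port
    fi
    i=$((i + 1))
    sleep 1
  done
  return 1            # timed out
}
```

For example, `wait_for_port 28888 && echo "web demo is up"` in a second terminal tells you when the Gradio interface is ready.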

Inference on the instance is fast: within a few minutes of launching, the model is loaded and responding to prompts.

Step 7: Return to the JD Cloud console, select the instance, choose “应用 → 自定义应用” (Applications → Custom Applications), and the LLaMA 3 demo becomes accessible.

The author concludes that the platform also supports no‑code text‑to‑image applications and expresses excitement to experiment further with LLaMA 3.

Tags: Deployment · Large Language Model · JD Cloud · AI Tutorial · GPU Instance · Llama 3
Written by JD Tech Talk

Official JD Tech public account delivering best practices and technology innovation.