Deploying Meta LLaMA 3 on JD Cloud: A Step‑by‑Step Tutorial
This article introduces Meta's newly released LLaMA 3 models, highlights their performance improvements, and provides a detailed, hands‑on guide for provisioning a JD Cloud GPU instance, installing the llama‑factory code, and running the model through a Jupyter‑based web demo.
On April 19, Meta announced the open‑source LLaMA 3 family: an 8‑billion‑parameter and a 70‑billion‑parameter model, each with an 8K context window, with benchmark results that Meta says rival leading proprietary models.
The models use a standard decoder‑only Transformer architecture, are pre‑trained on over 15 trillion tokens (roughly seven times LLaMA 2's corpus), use a tokenizer with a 128K‑token vocabulary, and employ Grouped‑Query Attention (GQA) to improve inference efficiency.
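The core idea of GQA is that several query heads share a single key/value head, shrinking the KV cache without giving up multi‑head queries. A minimal NumPy sketch of that sharing (head counts here are illustrative, not LLaMA 3's actual configuration):

```python
# Minimal sketch of Grouped-Query Attention (GQA) with NumPy.
# Head counts are illustrative; the point is n_q_heads > n_kv_heads.
import numpy as np

def gqa(q, k, v, n_q_heads, n_kv_heads):
    """q: (n_q_heads, seq, d); k, v: (n_kv_heads, seq, d)."""
    group = n_q_heads // n_kv_heads
    # Each KV head serves `group` query heads: repeat K/V along the head axis.
    k = np.repeat(k, group, axis=0)
    v = np.repeat(v, group, axis=0)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(q.shape[-1])
    # Numerically stable softmax over the key dimension.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

rng = np.random.default_rng(0)
n_q, n_kv, seq, d = 8, 2, 4, 16
out = gqa(rng.normal(size=(n_q, seq, d)),
          rng.normal(size=(n_kv, seq, d)),
          rng.normal(size=(n_kv, seq, d)), n_q, n_kv)
print(out.shape)  # (8, 4, 16)
```

With 8 query heads over 2 KV heads, the KV cache is a quarter of the full multi‑head size while the output shape matches ordinary attention.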
The open‑source community responded quickly, with over a thousand variants appearing on Hugging Face within five days.
The author attempted to access LLaMA 3 via Meta's online demo but faced network delays, then decided to run the model locally on JD Cloud due to the large model size (≈60 GB) and GPU requirements.
Step 1: Log into the JD Cloud AI console at https://gcs-console.jdcloud.com/instance/list .
Step 2: Create a GPU instance, selecting the “按配置” (pay‑by‑configuration) billing option; the hourly price is ¥1.89, so a ¥2 recharge covers roughly one hour of usage.
Step 3: Wait for the instance status to become “Running”, then click the Jupyter button to open the AI development environment.
Step 4: In the Jupyter terminal, copy the factory code to the data directory:
cp -r /gcs-pub/llama-factory/ /data/
Step 5: Open llama-factory/src/web_demo.py in the file explorer, change server_port to 28888, then save.
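If you prefer the terminal to the file explorer, a sed one‑liner can make the same edit; this assumes web_demo.py contains a literal `server_port=` keyword argument, so it is demonstrated here on a scratch file you can use to verify the pattern before touching the real path:

```shell
# Sketch: rewrite "server_port=<anything>" to 28888, shown on a scratch copy.
# For the real file, run the same sed against /data/llama-factory/src/web_demo.py
printf 'demo.launch(server_port=7860)\n' > /tmp/web_demo_snippet.py
sed -i 's/server_port=[0-9]*/server_port=28888/' /tmp/web_demo_snippet.py
cat /tmp/web_demo_snippet.py   # demo.launch(server_port=28888)
```

Checking the result with `grep -n server_port` on the real file is a quick way to confirm the change before launching.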
Step 6: In the terminal, run the following commands to set up the environment and launch the demo:
cd /data/llama-factory
conda create -n liandan python=3.10 -y
conda activate liandan
pip install -e .[metrics]
CUDA_VISIBLE_DEVICES=0 python src/web_demo.py --model_name_or_path /gcs-pub/Meta-Llama-3-8B-Instruct --template llama3
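Once the demo prints its local URL, you can confirm the port is actually serving before wiring it up in the console. A self‑contained sketch of the probe (demonstrated against a throwaway Python HTTP server on port 28889; for the real demo, point curl at http://127.0.0.1:28888 instead):

```shell
# Stand up a throwaway HTTP server, probe it, and report the HTTP status code.
# Substitute port 28888 to probe the actual web_demo.py instance.
python3 -m http.server 28889 >/dev/null 2>&1 &
SERVER_PID=$!
sleep 1
STATUS=$(curl -s -o /dev/null -w '%{http_code}' http://127.0.0.1:28889/)
echo "$STATUS"
kill $SERVER_PID
```

A 200 response means the port is reachable and the demo UI should load in the browser.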
Inference on the platform is fast: the demo was up and responding within a few minutes of launch.
Step 7: Return to the JD Cloud console, select the instance, choose “应用 → 自定义应用” (Applications → Custom Application), and the LLaMA 3 demo becomes accessible.
The author concludes that the platform also supports no‑code text‑to‑image applications and expresses excitement to experiment further with LLaMA 3.
JD Tech Talk
Official JD Tech public account delivering best practices and technology innovation.