Deploy Meta’s LLaMA 3 on JD Cloud: A Complete Step‑by‑Step Tutorial

Meta’s newly released LLaMA 3 models (8B and 70B) post strong benchmark results. This guide covers the community response, the technical specs, and a detailed JD Cloud workflow, from provisioning a GPU instance to running the model in a Jupyter environment.

JD Cloud Developers

May 8, 2024

On April 19, Meta announced its latest large language model, LLaMA 3, in 8-billion-parameter and 70-billion-parameter versions with an 8K context window. It is touted as the strongest open-source LLM to date and as rivaling GPT-4 on many benchmarks. Detailed evaluation results are available in the official report.

LLaMA 3 adopts a standard decoder-only Transformer architecture. Its performance gains are largely attributed to more and higher-quality data: about 15 trillion tokens of pre-training data (seven times that of LLaMA 2), with a significantly larger proportion of code to boost reasoning. The tokenizer’s vocabulary was expanded to 128K tokens (up from 32K), so the same text is encoded into fewer tokens, and both model sizes employ Grouped-Query Attention (GQA) for faster inference.
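To make the GQA point concrete, here is a back-of-the-envelope sketch of key/value cache size. The layer and head counts (32 layers, 32 query heads, 8 key/value heads, head dimension 128) are assumptions taken from the publicly released 8B model configuration, not from this article. The 2 bytes per value assumes 16-bit weights, and `kv_bytes_per_token` is a hypothetical helper name.

```shell
# Rough KV-cache size per token:
#   2 (keys and values) * layers * kv_heads * head_dim * bytes_per_value
# All numbers passed in below are assumptions (published 8B config, 16-bit values).
kv_bytes_per_token() {
  local layers="$1" kv_heads="$2" head_dim="$3" bytes="$4"
  echo $(( 2 * layers * kv_heads * head_dim * bytes ))
}

gqa=$(kv_bytes_per_token 32 8 128 2)    # GQA: 8 shared key/value heads
mha=$(kv_bytes_per_token 32 32 128 2)   # no GQA: one key/value head per query head

ctx=8192                                # the 8K context window
echo "GQA: $(( gqa / 1024 )) KiB per token, $(( gqa * ctx / 1024 / 1024 )) MiB at $ctx tokens"
echo "MHA: $(( mha / 1024 )) KiB per token, $(( mha * ctx / 1024 / 1024 )) MiB at $ctx tokens"
```

Under these assumptions the cache is 128 KiB per token with GQA versus 512 KiB without it, a fourfold reduction in the memory that has to be read for every generated token.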

The open‑source community reacted quickly; within five days, over a thousand variants appeared on Hugging Face and the number continues to grow.

Inspired by the AI wave, the author tried LLaMA 3 on the official demo site but ran into network delays. A local installation was ruled out by the model’s 60 GB size and the high cost of GPU resources.

Turning to JD Cloud, the author outlines a practical deployment process:

Step 1

Open the JD Cloud AI console: https://gcs-console.jdcloud.com/instance/list

Step 2

Create a GPU instance, selecting the “pay-by-configuration” billing mode (≈ ¥1.89 per hour). At that rate, a ¥2 top-up covers roughly one hour of compute (2 ÷ 1.89 ≈ 1.06 hours).

Step 3

Wait for the instance status to become “Running”, then launch Jupyter from the console.

Step 4

Open a terminal in Jupyter and execute:

cp -r /gcs-pub/llama-factory/ /data/
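Before editing anything, it can help to confirm that the copy landed where the next step expects it. A minimal sketch: the check looks for `src/web_demo.py` because Step 5 edits that file, and `check_copy` is a hypothetical helper name, not part of LLaMA-Factory or JD Cloud.

```shell
# check_copy DIR: succeed only if DIR contains src/web_demo.py.
check_copy() {
  local dir="$1"
  if [ -f "$dir/src/web_demo.py" ]; then
    echo "ok: $dir/src/web_demo.py found"
  else
    echo "missing: $dir/src/web_demo.py" >&2
    return 1
  fi
}

# On the instance, after the cp command:
#   check_copy /data/llama-factory
```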

Step 5

In the file explorer, open llama-factory/src/web_demo.py, change server_port to 28888, and save.
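The same edit can be scripted from the terminal. This is a hedged sketch: it assumes the file passes the port as a `server_port=<value>` keyword argument (the article does not show the file's contents), and `set_server_port` is a hypothetical helper name.

```shell
# set_server_port FILE PORT: rewrite the first `server_port=<value>` on each line of FILE.
# Assumes the port is passed as a keyword argument. The final grep prints the
# result and returns non-zero if the file contains no `server_port` at all.
set_server_port() {
  local file="$1" port="$2"
  # -i.bak keeps a backup of the original next to the edited file.
  sed -i.bak -E "s/server_port[[:space:]]*=[[:space:]]*[^,)]+/server_port=${port}/" "$file"
  grep -n "server_port" "$file"
}

# On the instance:
#   set_server_port /data/llama-factory/src/web_demo.py 28888
```

If the function prints nothing, the keyword is absent and the file needs a manual edit as described in this step.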

Step 6

Back in the terminal, run the following commands:

cd /data/llama-factory
conda create -n liandan python=3.10 -y
conda activate liandan
pip install -e ".[metrics]"
CUDA_VISIBLE_DEVICES=0 python src/web_demo.py --model_name_or_path /gcs-pub/Meta-Llama-3-8B-Instruct --template llama3

Step 7

After a few minutes, the model is up and accessible. Finally, in the JD Cloud console, select the instance, choose “Application → Custom Application”, and launch the LLaMA 3 demo.
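To tell when the demo is ready without watching the log, a second Jupyter terminal can poll the port set in Step 5. A minimal sketch, assuming `curl` is installed on the instance and that the demo answers plain HTTP on 127.0.0.1:28888; `wait_for_http` is a hypothetical helper name.

```shell
# wait_for_http URL [TRIES]: poll URL about once per second until it answers
# with a non-error HTTP status, or give up after TRIES attempts (default 60).
wait_for_http() {
  local url="$1" tries="${2:-60}" i=1
  while [ "$i" -le "$tries" ]; do
    if curl --silent --fail --max-time 2 --output /dev/null "$url"; then
      echo "up after $i attempt(s): $url"
      return 0
    fi
    sleep 1
    i=$(( i + 1 ))
  done
  echo "no answer from $url after $tries attempt(s)" >&2
  return 1
}

# On the instance, while the launch command from Step 6 is running:
#   wait_for_http http://127.0.0.1:28888/ 300
```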

The platform also promises no‑code text‑to‑image capabilities, which the author plans to explore next.
