
DeepKE-LLM: An Open‑Source Large Language Model Toolkit for Knowledge Extraction

DeepKE-LLM is an open-source, extensible knowledge-graph extraction framework that uses large language models for entity, relation, and attribute extraction. It supports multiple LLMs, ships installation scripts, offers several usage modes and fine-tuning pipelines, and integrates with the KnowLM project for advanced instruction-following.

DataFunTalk

DeepKE-LLM is an open‑source, extensible knowledge‑graph extraction tool that supports named‑entity recognition, relation extraction, and attribute extraction, and has been upgraded for the large‑model era to provide intelligent parsing via multiple large language models (LLMs).

The toolkit currently supports a variety of LLMs, including Llama‑series models (Alpaca, Linly, etc.), ChatGLM, and, through the self‑developed EasyInstruct module, OpenAI and Claude models. It also offers diverse prompt formats such as text instructions and code prompts, and features batch request optimization.
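As a rough illustration of the text-instruction prompt format (the template and field names below are assumptions for this sketch, not DeepKE-LLM's shipped templates), a relation-extraction prompt might be assembled like this:

```python
def build_ie_prompt(text: str, schema: list[str]) -> str:
    """Assemble a simple text-instruction prompt for relation extraction.

    `schema` lists the candidate relation types the model may choose from.
    Illustrative only; DeepKE-LLM defines its own prompt templates.
    """
    relations = ", ".join(schema)
    return (
        "Extract (head entity, relation, tail entity) triples from the text.\n"
        f"Candidate relations: {relations}\n"
        f"Text: {text}\n"
        "Triples:"
    )

prompt = build_ie_prompt(
    "Hangzhou is the capital of Zhejiang Province.",
    ["located in", "capital of"],
)
```

Code prompts follow the same idea but cast the schema and output as code (e.g., a class definition and constructor calls) instead of natural language.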

To get started, create a Python 3.9 environment and install dependencies:

conda create -n deepke-llm python=3.9
conda activate deepke-llm
cd example/llm
pip install -r requirements.txt

Two usage methods are provided. The first follows the KnowLM project workflow, running a script like:

python examples/generate_lora.py --load_8bit --base_model ./zhixi --lora_weights ./lora --run_ie_cases

The second follows the DeepKE‑LLM project instructions, for example:

CUDA_VISIBLE_DEVICES="0" python inference_llama.py \
    --base_model 'path_to_base_model' \
    --lora_weights 'path_to_lora_weights' \
    --input_file 'path_to_input' \
    --output_file 'path_to_output' \
    --load_8bit
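The inference script reads prompts from an input file and writes model outputs to an output file. Assuming a JSON-lines layout with `instruction` and `input` fields (a common instruction-tuning convention; the exact schema is defined by the DeepKE-LLM repository, so treat these field names and the `input.json` filename as assumptions), an input file could be prepared like this:

```python
import json

# Hypothetical records; field names follow the common instruction-tuning
# convention and should be checked against the DeepKE-LLM data format.
records = [
    {
        "instruction": "Extract all person entities from the input.",
        "input": "Alan Turing was born in London.",
    },
]

# "input.json" is an illustrative filename; pass its path as --input_file.
with open("input.json", "w", encoding="utf-8") as f:
    for rec in records:
        f.write(json.dumps(rec, ensure_ascii=False) + "\n")
```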

DeepKE‑LLM also supports fine‑tuning of LLMs such as Llama‑7B using LoRA. A typical fine‑tuning command is:

CUDA_VISIBLE_DEVICES="0" python finetune_llama.py \
    --base_model 'path_to_base_model' \
    --train_path 'path_to_train_data' \
    --output_dir 'path_to_output_model' \
    --batch_size 128 \
    --micro_train_batch_size 4 \
    --num_epochs 3 \
    --learning_rate 1e-4 \
    --cutoff_len 512 \
    --val_set_size 1000 \
    --lora_r 8 \
    --lora_alpha 16 \
    --lora_dropout 0.05 \
    --lora_target_modules '[q_proj,v_proj]' \
    --train_on_inputs \
    --group_by_length
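Two of these flags interact: with `--batch_size 128` and `--micro_train_batch_size 4`, the gap is covered by gradient accumulation, and with `--lora_r 8` on `q_proj`/`v_proj`, the trainable parameter count is tiny relative to the 7B base. A quick back-of-the-envelope check, using the standard Llama-7B dimensions (32 decoder layers, hidden size 4096):

```python
# Effective batch size is reached via gradient accumulation.
batch_size = 128          # --batch_size
micro_batch_size = 4      # --micro_train_batch_size
accumulation_steps = batch_size // micro_batch_size  # 32 micro-steps per update

# LoRA adds two low-rank matrices (A: d x r, B: r x d) per target module.
hidden = 4096   # Llama-7B hidden size
layers = 32     # Llama-7B decoder layers
r = 8           # --lora_r
targets = 2     # q_proj and v_proj

lora_params = layers * targets * (hidden * r + r * hidden)
# About 4.2M trainable parameters, a small fraction of the 7B base model.
```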

For GPT-style (OpenAI) or Anthropic (Claude) models, users can install the EasyInstruct package with pip install easyinstruct, configure API keys and datasets, and then run the provided run.py.
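One of the optimizations mentioned above is batching API requests rather than issuing them one at a time. A minimal sketch of the idea follows; the `call_api` stub and the chunk size are assumptions for illustration, not the EasyInstruct API:

```python
from typing import Callable, List

def batched_requests(
    prompts: List[str],
    call_api: Callable[[List[str]], List[str]],
    chunk_size: int = 20,
) -> List[str]:
    """Send prompts to an LLM API in fixed-size chunks.

    Chunking amortizes per-request overhead and simplifies rate-limit
    handling; a real client would also add retries and backoff.
    """
    results: List[str] = []
    for i in range(0, len(prompts), chunk_size):
        results.extend(call_api(prompts[i : i + chunk_size]))
    return results

# Stubbed client for illustration; a real call would hit the OpenAI
# or Anthropic API instead of echoing the prompt back.
echo = lambda batch: [f"response:{p}" for p in batch]
outputs = batched_requests([f"prompt {i}" for i in range(45)], echo)
```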

The underlying "ZhiXi" large model is built on LLaMA-13B and has been further pre-trained on a massive multilingual corpus (5.5M Chinese, 1.5M English, and 0.9M code samples) without expanding the vocabulary, followed by instruction-following fine-tuning on KG2Instruction data derived from Wikipedia, WikiData, and other academic extraction datasets.

In addition to knowledge‑extraction abilities, the model retains general instruction‑following skills such as translation, code generation, and reasoning, as documented in the KnowLM repository.

The article concludes with a brief outlook, inviting community feedback and outlining future directions like model customization, multi‑agent collaboration, and embodied interaction scenarios.

Acknowledgments list contributors from Zhejiang University and other collaborators.

