Is Prompt Chaining Always the Best Option?

This article discusses when Prompt Chaining (linking several small prompts together) works better than a single prompt, and when it does not. Prompt Chaining does not always produce better results, so test both approaches and choose the one that fits your goal.

Key takeaways

Prompt Chaining is not always better than a single prompt.
Compare the two approaches experimentally for each specific use case.
In some cases, Prompt Chaining can lose important contextual information.

Prompt Chaining Example (LangChain)

The article gives an example of building a prompt chain in LangChain:

Step 1: Clean the input text.

Step 2: Extract keywords.

Step 3: Post-process (normalize) and filter the JSON result.

Each of the first two steps is handled by its own prompt, and the steps are linked together into a "chain"; the third step is plain Python.

# pip install langchain langchain-core langchain-community langchain-ollama
from langchain_core.prompts import PromptTemplate
from langchain_ollama import OllamaLLM
import json, re

llm = OllamaLLM(model="gemma2:2b", temperature=0)

# Step 1. Text cleaning
clean_prompt = PromptTemplate.from_template("""
You are a preprocessor for drug keyword extraction.
Clean the input text by:
- Removing parentheses and explanations (e.g., (anti-inflammatory))
- Splitting joined words like analgesic·anti-inflammatory
- Removing suffixes such as plural markers, type, formulation, medication
- Normalizing plurals to singular

Input:
{text}

Cleaned:
""")
clean_chain = clean_prompt | llm

# Step 2. Keyword extraction
extract_prompt = PromptTemplate.from_template("""
You are a keyword extractor for drug efficacy and medication-related descriptions.
Extract the most relevant and searchable **keywords** from the input text.

Focus on:
- Treatment effects (analgesic, anti-inflammatory, antipyretic)
- Symptoms (muscle pain, skin itching)
- Patient groups (pregnant women, elderly)
- Product names (Tylenol)
- Manufacturer (Shinshin Pharmaceutical)
- Ingredients (Acetaminophen)
- Dosage forms (patch, tablet, ointment)
Return a JSON array only.

Input:
{text}
Output:
""")
extract_chain = extract_prompt | llm

# Step 3. Post-processing
def parse_json_array(output: str):
    try:
        return json.loads(output.strip())
    except Exception:
        # Fall back to the first [...] span; the brackets are escaped so they match literally.
        match = re.search(r"\[(.*?)\]", output, re.DOTALL)
        return json.loads("[" + match.group(1) + "]") if match else []

def normalize_keywords(keywords: list[str]):
    result = []
    for kw in keywords:
        # Strip a trailing suffix word together with its separator ("patch-type" -> "patch").
        kw = re.sub(r"[\s\-]*(plural|type|formulation|medication)$", "", kw.strip()).strip()
        if kw and kw not in result:
            result.append(kw)
    return result

def extract_drug_keywords(text: str):
    cleaned = clean_chain.invoke({"text": text})
    raw = extract_chain.invoke({"text": cleaned})
    keywords = normalize_keywords(parse_json_array(raw))
    return sorted(set([kw for kw in keywords if kw in text]))

# Test execution
if __name__ == "__main__":
    query = """A patch-type medication with analgesic and anti-inflammatory effects,
    intended for pregnant women and the elderly.
    The Tylenol product manufactured by Shinshin Pharmaceutical
    contains acetaminophen as its main ingredient.
    It is used for symptoms such as muscle pain, joint pain,
    lower back pain, and shoulder stiffness."""
    
    print("Chaining approach result:", extract_drug_keywords(query))
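
Small models often wrap the JSON array in extra prose, so it can help to check the parsing step on its own, without calling the model. The following is a minimal sketch under that assumption: the parser is a standalone variant of parse_json_array, and the sample outputs are made up for illustration rather than taken from gemma2:2b.

```python
import json
import re


def parse_json_array(output: str) -> list:
    """Parse a JSON array from model output, tolerating surrounding prose."""
    try:
        parsed = json.loads(output.strip())
        return parsed if isinstance(parsed, list) else []
    except json.JSONDecodeError:
        # Fall back to the first bracketed span; the brackets are escaped
        # so the regex matches a literal "[" and "]".
        match = re.search(r"\[.*?\]", output, re.DOTALL)
        if not match:
            return []
        try:
            return json.loads(match.group(0))
        except json.JSONDecodeError:
            return []


# Hypothetical model outputs: clean JSON, JSON wrapped in prose, and no JSON.
samples = [
    '["analgesic", "patch"]',
    'Here are the keywords:\n["analgesic", "patch"]\nHope this helps!',
    "I could not find any keywords.",
]
for sample in samples:
    print(parse_json_array(sample))
```

The non-greedy pattern stops at the first closing bracket, so this sketch does not handle nested arrays; for a flat keyword list that limitation should not matter.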

Single Prompt Example

In the single prompt approach, all of the logic is packed into one prompt:

A description of the goal
Extraction rules
The JSON output format
Explicit requirements

→ The model returns the keyword list directly.

from string import Template
import json, ollama, re

MODEL_NAME = "gemma2:2b"
prompt_template = """
You are a keyword extractor for drug efficacy and medication-related descriptions.

1. **Keyword Extraction Focus:**  
   My goal is to extract only the most relevant and searchable **keywords** from drug-related text.  
   These include:
   - Treatment effects (e.g., analgesic, anti-inflammatory, antipyretic)
   - Disease or symptom names (e.g., muscle pain, skin itching)
   - Intended patient groups (e.g., pregnant women, elderly, infants)
   - Product names (e.g., Tylenol)
   - Manufacturer names (e.g., Shinshin Pharmaceutical)
   - Active ingredients (e.g., acetaminophen, ibuprofen)
   - Dosage forms / drug formulations (e.g., ointment, tablet, patch)
   - Pharmacological classifications (e.g., antipyretic analgesics, antihistamines)

   These keywords will be used for drug search and classification, so completeness and precision are important.

   Note:  
   Some dosage forms may appear as common words, but in the context of drug descriptions, they must be treated as searchable keywords.  
   Please always extract the following as keywords when mentioned:

   - "Patch": A medicated plaster for muscle and joint pain.
   - "Ointment": Ointment or topical cream.
   - "Tablet": Tablet form medication.
   - "Injection": Injectable formulation.
   - "Capsule": Capsule-type formulation.
   - "Syrup": Liquid formulation often used for children.

   Similar terms that describe drug delivery methods (e.g., topical, oral, injection) are also valid dosage form keywords.

2. **Output Format:**  
   Return a JSON array containing only the extracted keywords, without any explanation or additional formatting:
   ["keyword1", "keyword2", "keyword3", ...]

3. **Rules & Context:**
   - Always include all mentioned patient groups, product names, manufacturer names, and active ingredients.
   - Extract only short and specific terms related to symptoms, effects, ingredients, dosage forms (drug formulations), or pharmacological classification.
   - Do not include dosage units (e.g., mg, %, ml), filler phrases, or general descriptions.
   - Avoid vague, overly broad, or repetitive terms.
   - Output must be deduplicated and clean.

4. **Short Terms:**  
   Normalize terms to short, domain-specific English equivalents where appropriate:
   - "fever reduction" → "antipyretic"  
   - "pain relief" → "analgesic"  
   - "muscle pain" → "muscle pain"  
   - "Tylenol" → "Tylenol"  
   - "Shinshin Pharm" → "Shinshin Pharmaceutical"

---

**Example Input:**
A patch-type medication with analgesic and anti-inflammatory effects,
intended for pregnant women and the elderly.
The Tylenol product manufactured by Shinshin Pharmaceutical
contains acetaminophen as its main ingredient.
It is used for symptoms such as muscle pain, joint pain,
lower back pain, and shoulder stiffness.

**Expected Output:**
["pregnant women", "elderly", "analgesic", "anti-inflammatory", "patch",
 "Shinshin Pharmaceutical", "Tylenol", "acetaminophen",
 "muscle pain", "joint pain", "lower back pain", "shoulder stiffness"]

Now process this:
$user_input
"""

def generate_llm_output(_user_input: str):
    prompt = Template(prompt_template).substitute(user_input=_user_input)
    response = ollama.generate(model=MODEL_NAME, prompt=prompt)
    return response["response"]

def parse_json_array(output: str):
    try:
        return json.loads(output.strip())
    except Exception:
        # Fall back to the first [...] span; the brackets are escaped so they match literally.
        match = re.search(r"\[(.*?)\]", output, re.DOTALL)
        return json.loads("[" + match.group(1) + "]") if match else []

def normalize_keywords(keywords: list[str]):
    result = []
    for kw in keywords:
        # Strip a trailing suffix word together with its separator ("patch-type" -> "patch").
        kw = re.sub(r"[\s\-]*(plural|type|formulation|medication)$", "", kw.strip()).strip()
        if kw and kw not in result:
            result.append(kw)
    return result

def extract_keywords(user_input: str):
    raw = generate_llm_output(user_input)
    keywords = normalize_keywords(parse_json_array(raw))
    return sorted(set([kw for kw in keywords if kw in user_input]))

if __name__ == "__main__":
    query = """A patch-type medication with analgesic and anti-inflammatory effects,
    intended for pregnant women and the elderly.
    The Tylenol product manufactured by Shinshin Pharmaceutical
    contains acetaminophen as its main ingredient.
    It is used for symptoms such as muscle pain, joint pain,
    lower back pain, and shoulder stiffness."""
    
    print("Single prompt result:", extract_keywords(query))
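
Both scripts keep only the keywords that appear verbatim in the input, using a case-sensitive `kw in text` check. Because the prompts show capitalized examples such as "Patch" and "Acetaminophen", a model may return those spellings, and the filter would then drop them. One possible refinement is a case-insensitive check. The sketch below is illustrative only; filter_grounded is a hypothetical helper that does not appear in the original code.

```python
def filter_grounded(keywords: list[str], source_text: str) -> list[str]:
    """Keep keywords that occur in source_text, ignoring case."""
    haystack = source_text.casefold()
    seen = set()
    kept = []
    for kw in keywords:
        key = kw.strip().casefold()
        # Keep each keyword once, in the spelling the model returned.
        if key and key in haystack and key not in seen:
            seen.add(key)
            kept.append(kw.strip())
    return sorted(kept)


text = "The Tylenol product contains acetaminophen and is sold as a patch."
model_keywords = ["Acetaminophen", "Patch", "Tylenol", "ibuprofen"]

print(sorted(kw for kw in model_keywords if kw in text))  # case-sensitive check
print(filter_grounded(model_keywords, text))              # case-insensitive check
```

With this made-up input, the case-sensitive check keeps only "Tylenol", while the case-insensitive one also keeps "Acetaminophen" and "Patch" and still rejects "ibuprofen", which is absent from the text.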

In short: Prompt Chaining does not guarantee better results across the board, especially when preserving the overall context matters or when the input data is complex.
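
To make the context-loss point concrete, here is a toy sketch in which plain Python functions stand in for the two LLM steps. It assumes the cleaning step follows its instruction to remove parenthetical explanations; clean_step, extract_step, and the vocabulary are hypothetical stand-ins, not real model behavior.

```python
import re


def clean_step(text: str) -> str:
    """Stand-in for the cleaning prompt: drop parenthetical explanations."""
    return re.sub(r"\s*\([^)]*\)", "", text)


def extract_step(text: str, vocabulary: list[str]) -> list[str]:
    """Stand-in for the extraction prompt: return vocabulary terms found in text."""
    lowered = text.lower()
    return [term for term in vocabulary if term in lowered]


vocabulary = ["analgesic", "anti-inflammatory", "patch", "muscle pain"]
source = "Patch for muscle pain (anti-inflammatory), with analgesic effect."

# Chained: the extractor only sees what the cleaning step passed on.
chained = extract_step(clean_step(source), vocabulary)
# Single pass: the extractor sees the full original text.
single = extract_step(source, vocabulary)

print("chained:", chained)
print("single: ", single)
print("lost in the chain:", sorted(set(single) - set(chained)))
```

Here the term "anti-inflammatory" exists only inside the parentheses, so once the first step removes it, no later step can recover it. A real model will behave less predictably, but the structural risk is the same: each step sees only what the previous one passed along.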

Conclusion

In many cases, Prompt Chaining helps break a problem into clear steps that are easy to control and whose outputs are easy to adjust.
But it is not always the optimal solution, especially when you need to process the entire context intact.

Try both approaches to see which one really fits your goal.
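
One simple way to run that comparison is to score each approach against a hand-written list of expected keywords. The sketch below assumes such a list exists. The score function is a hypothetical helper, and the two output lists are placeholders, not measured results; in practice they would come from extract_drug_keywords(query) and extract_keywords(query).

```python
def score(predicted: list[str], expected: list[str]) -> dict:
    """Return precision, recall, and F1 for a predicted keyword list."""
    pred = {kw.lower() for kw in predicted}
    gold = {kw.lower() for kw in expected}
    hits = len(pred & gold)
    precision = hits / len(pred) if pred else 0.0
    recall = hits / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Hand-written reference keywords for one input (an assumption of this sketch).
expected = ["analgesic", "anti-inflammatory", "patch", "Tylenol"]

# Placeholder outputs for illustration only, not real model results.
outputs = {
    "chaining": ["analgesic", "Tylenol"],
    "single prompt": ["analgesic", "anti-inflammatory", "patch", "Tylenol", "medication"],
}

for name, predicted in outputs.items():
    print(name, score(predicted, expected))
```

Running the same scoring over a handful of representative inputs, rather than a single query, gives a fairer picture of which approach suits a given task.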

Source: translated from ryukato.github.io