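I'm using LangChain and RAG with Llama to answer questions based on an FAQ document. The document is a PDF containing a numbered list of questions and answers. The code runs without errors, but it does not retrieve the right information from the document. Even when I ask exactly the same question that appears in the document, LangChain does not retrieve the correct question-answer pair; it pulls out other parts of the document that are similar but do not answer my question.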
Here is my code:
from langchain.document_loaders import PyPDFLoader
loader = PyPDFLoader("doc.pdf")
data = loader.load_and_split()
from langchain.text_splitter import RecursiveCharacterTextSplitter
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1500, chunk_overlap=100)
all_splits = text_splitter.split_documents(data)
# Embed and store
from langchain.embeddings import (
    GPT4AllEmbeddings,
    OllamaEmbeddings,  # We can also try Ollama embeddings
)
from langchain.vectorstores import Chroma
vectorstore = Chroma.from_documents(documents=all_splits, embedding=GPT4AllEmbeddings())
from langchain.prompts import PromptTemplate
template = "..."
QA_CHAIN_PROMPT = PromptTemplate.from_template(template)  # Run chain
# LLM (note: the code that creates `llm` is not included in this snippet)
from langchain.callbacks.manager import CallbackManager
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
import requests
import json
# QA chain
from langchain.chains import RetrievalQA
qa_chain = RetrievalQA.from_chain_type(
    llm,
    retriever=vectorstore.as_retriever(),
    return_source_documents=True,
    chain_type_kwargs={"prompt": QA_CHAIN_PROMPT},
)
question = "..."
print(qa_chain({"query": question}))
What am I doing wrong? Can I improve something or use a different technique? Why doesn't it retrieve the right question from the document?
I would appreciate any help.
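Since the document is a numbered list of questions and answers, one direction that might be worth testing is to split it into one chunk per numbered entry, instead of fixed 1500-character chunks that can cut an entry in half or merge several entries. The following is only a minimal sketch under stated assumptions: it assumes each entry starts on its own line with a number followed by "." or ")" and a space, and the names `split_faq`, `ENTRY_START`, and the sample text are hypothetical and not taken from the actual document.

```python
import re

# Assumption: an FAQ entry starts at the beginning of a line with
# digits followed by "." or ")" and then a space or tab, e.g. "12. How do I ...?"
ENTRY_START = re.compile(r"^[ \t]*\d+[.)][ \t]", re.MULTILINE)


def split_faq(text):
    """Return one stripped chunk per numbered entry found in `text`.

    Text before the first numbered entry is discarded. If no numbered
    entry is found, the whole text is returned as a single chunk.
    """
    starts = [match.start() for match in ENTRY_START.finditer(text)]
    if not starts:
        stripped = text.strip()
        return [stripped] if stripped else []
    # Each chunk runs from one entry start to the next (or to the end of the text).
    bounds = starts + [len(text)]
    return [text[a:b].strip() for a, b in zip(bounds, bounds[1:])]


# Hypothetical sample text, only for illustration.
sample = (
    "1. How do I reset my password?\n"
    "Use the reset link on the login page.\n"
    "2. How do I close my account?\n"
    "Contact support.\n"
)

chunks = split_faq(sample)
for chunk in chunks:
    print(repr(chunk))
```

With the real PDF, the page texts returned by `loader.load()` would first need to be joined into one string, and each resulting chunk could then be wrapped in a LangChain `Document` and passed to `Chroma.from_documents` in place of `all_splits`. A known limitation of this sketch is that an answer containing its own numbered lines would be split as well.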