Langchain and RAG does not properly retrieve the parts of the document I need

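I'm using Langchain and RAG with Llama to answer questions based on an FAQ document. The document is a PDF containing a numbered list of questions and answers. The code runs without errors, but it does not retrieve the right information from the document. Even when the exact same question appears in the document, Langchain fails to retrieve the matching question-and-answer pair and instead pulls out parts of the document that are similar but do not answer my question.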

Here is my code:

from langchain.document_loaders import PyPDFLoader

loader = PyPDFLoader("doc.pdf")
data = loader.load_and_split()

from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(chunk_size=1500, chunk_overlap=100)
all_splits = text_splitter.split_documents(data)

# Embed and store
from langchain.embeddings import (
    GPT4AllEmbeddings,
    OllamaEmbeddings,  # We can also try Ollama embeddings
)
from langchain.vectorstores import Chroma

vectorstore = Chroma.from_documents(documents=all_splits, embedding=GPT4AllEmbeddings())

from langchain.prompts import PromptTemplate

template = "..."

QA_CHAIN_PROMPT = PromptTemplate.from_template(template)

# LLM (the definition of `llm` used below is not shown)
from langchain.callbacks.manager import CallbackManager
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
import requests
import json

# QA chain
from langchain.chains import RetrievalQA

qa_chain = RetrievalQA.from_chain_type(
    llm,
    retriever=vectorstore.as_retriever(),
    return_source_documents=True,
    chain_type_kwargs={"prompt": QA_CHAIN_PROMPT}
)

question = "..."
print(qa_chain({"query": question}))

What am I doing wrong? Can I improve something or use another technique? Why doesn't it retrieve the right question from the document?

I'd appreciate any help.

Answers

Adding some code for your reference that uses a HuggingFace embedding model.

from langchain.document_loaders import TextLoader, DirectoryLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import Chroma
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.chains import RetrievalQA
from langchain.llms import OpenAI


loader = DirectoryLoader("/tmp/", glob="./*.pdf")    # copy any PDF to /tmp
documents = loader.load()

text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000,
                                               chunk_overlap=100)
texts = text_splitter.split_documents(documents)

embeddings = HuggingFaceEmbeddings(
    # model_name="sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2")
    model_name="bert-base-multilingual-cased")

persist_directory = "/tmp/chromadb"
vectordb = Chroma.from_documents(documents=texts, embedding=embeddings, persist_directory=persist_directory)
vectordb.persist()


qa_chain = RetrievalQA.from_chain_type(
        llm=OpenAI(temperature=0, model_name="text-davinci-003"),
        retriever=vectordb.as_retriever(), chain_type="stuff",
        # chain_type_kwargs=chain_type_kwargs,
        return_source_documents=True)

response = qa_chain("please summarize this book")
print(response)
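
The question describes the PDF as a numbered list of questions and answers. Fixed-size character windows can cut a question away from its answer or merge several entries into one chunk, which may be part of why the wrong passages come back. Below is a minimal sketch of splitting on the numbering instead, so that each chunk holds one entry. It assumes each entry starts on its own line with a number followed by "." or ")", which may not hold for every PDF extraction; `split_numbered_faq` and the sample text are hypothetical.

```python
import re


def split_numbered_faq(text):
    """Split FAQ text into one chunk per numbered entry ("1. ...", "2) ...")."""
    # Zero-width split right before any line that begins with digits
    # followed by "." or ")" and whitespace (requires Python 3.7+).
    pattern = r"^(?=[ \t]*\d+[.)]\s)"
    parts = re.split(pattern, text, flags=re.MULTILINE)
    # Drop empty pieces (e.g. the one before the first entry).
    return [part.strip() for part in parts if part.strip()]


# Made-up sample text standing in for the extracted PDF content.
sample = """1. How do I reset my password?
Use the "Forgot password" link on the login page.

2. How do I change my email?
Open Settings and edit your profile.
"""

chunks = split_numbered_faq(sample)
for chunk in chunks:
    print(repr(chunk))
```

The resulting strings could then be embedded one per entry, for example via `Chroma.from_texts`, in place of the output of `RecursiveCharacterTextSplitter`.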

If you want to use another open-source LLM instead of OpenAI:

from langchain.llms import HuggingFaceHub

llm = HuggingFaceHub(repo_id="google/flan-t5-base",
                     model_kwargs={"temperature": 0.6, "max_length": 500, "max_new_tokens": 200})

# Or another model:
llm = HuggingFaceHub(repo_id="EleutherAI/gpt-neo-2.7B")
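
Since the question says the same question text appears in the document, a plain string comparison can serve as a sanity check next to embedding search: if a lexical match finds the right entry while the retriever does not, the embedding model or the chunking is the likely culprit. The sketch below uses `difflib` from the standard library. The helper `best_lexical_match` and the sample entries are hypothetical, and it assumes one non-empty list of chunks whose first line is the question.

```python
from difflib import SequenceMatcher


def best_lexical_match(query, chunks):
    """Return (score, chunk) for the chunk whose first line best matches the query."""
    def score(chunk):
        lines = chunk.splitlines()
        first_line = lines[0] if lines else ""
        # ratio() is 1.0 for identical strings and approaches 0.0 for unrelated ones.
        return SequenceMatcher(None, query.lower(), first_line.lower()).ratio()

    return max(((score(c), c) for c in chunks), key=lambda pair: pair[0])


# Made-up entries standing in for the per-question chunks of the FAQ.
faq_chunks = [
    "1. How do I reset my password?\nUse the \"Forgot password\" link.",
    "2. How do I change my email?\nOpen Settings and edit your profile.",
]

match_score, match_chunk = best_lexical_match("How do I change my email?", faq_chunks)
print(round(match_score, 2))
print(match_chunk)
```

This is not a replacement for the vector store, only a quick way to see whether the expected entry is findable at all.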



