document-loader
https://python.langchain.com/v0.2/docs/integrations/document_loaders/
%pip install --user -Uq langchain langchain_community pypdf pdf2image docx2txt pdfminer
WebBaseLoader
https://python.langchain.com/v0.2/docs/integrations/document_loaders/web_base/
%pip install --user -Uq beautifulsoup4
from langchain_community.document_loaders import WebBaseLoader
loader = WebBaseLoader("https://www.thisisgame.com/webzine/news/nboard/4/?n=189952")
data = loader.load()
print(data[0].page_content)
loader = WebBaseLoader(["https://www.espn.com/", "https://google.com"])
docs = loader.load()
docs
Load multiple URLs concurrently
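The heading above is about fetching several pages in parallel instead of one at a time. A minimal sketch, assuming the langchain_community v0.2 behavior in which WebBaseLoader.aload() is a regular (non-awaited) call that scrapes all URLs concurrently and requests_per_second throttles the request rate. The helper name load_concurrently and its in_notebook flag are hypothetical, and nest_asyncio is an extra package assumed to be installed when running inside Jupyter.

```python
def load_concurrently(urls, requests_per_second=1, in_notebook=False):
    """Fetch several pages concurrently and return a list of Documents."""
    # Imported here so the helper can be defined before the packages are installed.
    from langchain_community.document_loaders import WebBaseLoader

    if in_notebook:
        # Jupyter already runs an event loop; nest_asyncio allows a nested one.
        import nest_asyncio

        nest_asyncio.apply()

    loader = WebBaseLoader(list(urls))
    # Throttle requests so the target servers are not flooded.
    loader.requests_per_second = requests_per_second
    return loader.aload()


# Example call (performs real HTTP requests):
# docs = load_concurrently(["https://www.espn.com/", "https://google.com"], in_notebook=True)
```

Newer releases may change or deprecate aload(), so check the API reference for the installed version before relying on this.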
XML parser
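WebBaseLoader parses pages with BeautifulSoup's HTML parser unless told otherwise. A minimal sketch of switching it to the XML parser, assuming the default_parser attribute documented for langchain_community v0.2 and that the lxml package is installed. The helper name load_xml_page and the URL in the example call are placeholders.

```python
def load_xml_page(url):
    """Load an XML document over HTTP using BeautifulSoup's XML parser."""
    # Imported here so the helper can be defined before the packages are installed.
    from langchain_community.document_loaders import WebBaseLoader

    loader = WebBaseLoader(url)
    # The default parser is "html.parser"; "xml" requires lxml.
    loader.default_parser = "xml"
    return loader.load()


# Example call (placeholder URL, performs a real HTTP request):
# docs = load_xml_page("https://example.com/data.xml")
```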
Sitemap loader
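SitemapLoader extends WebBaseLoader: it reads a sitemap and loads every page listed in it. A minimal sketch, assuming the web_path and filter_urls parameters of langchain_community's SitemapLoader and that lxml is installed. The helper name load_from_sitemap and the URLs in the example call are placeholders.

```python
def load_from_sitemap(sitemap_url, filter_urls=None):
    """Load every page listed in a sitemap, optionally restricted by filter_urls."""
    # Imported here so the helper can be defined before the packages are installed.
    from langchain_community.document_loaders.sitemap import SitemapLoader

    # filter_urls is a list of URL strings or regex patterns; None loads everything.
    loader = SitemapLoader(web_path=sitemap_url, filter_urls=filter_urls)
    return loader.load()


# Example call (placeholder URLs, performs real HTTP requests):
# docs = load_from_sitemap(
#     "https://example.com/sitemap.xml",
#     filter_urls=["https://example.com/blog/"],
# )
```

Passing filter_urls is worth the extra argument on large sites, since an unfiltered sitemap can trigger thousands of requests.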
Add custom scraping rules
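Custom scraping rules are supplied as a function that receives the parsed page and returns the text to keep. A minimal sketch, assuming SitemapLoader's parsing_function parameter, which is called with a BeautifulSoup object. The rule shown here (dropping nav and header elements) follows the pattern in the LangChain sitemap docs; the helper name load_sitemap_with_rules is hypothetical.

```python
def remove_nav_and_header_elements(content):
    """Return the page text after dropping <nav> and <header> blocks.

    `content` is the BeautifulSoup object that the loader passes in.
    """
    for tag_name in ("nav", "header"):
        for element in content.find_all(tag_name):
            element.decompose()  # remove the tag and everything inside it
    return str(content.get_text())


def load_sitemap_with_rules(sitemap_url):
    """Load a sitemap, applying the custom rule above to every page."""
    # Imported here so the helpers can be defined before the packages are installed.
    from langchain_community.document_loaders.sitemap import SitemapLoader

    loader = SitemapLoader(
        web_path=sitemap_url,
        parsing_function=remove_nav_and_header_elements,
    )
    return loader.load()
```

Because the rule is an ordinary function, it can be tried on a hand-built BeautifulSoup object before running a full crawl.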
PDF Document Loader
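The install line at the top already pulls in pypdf, so one way to load a PDF is PyPDFLoader. A minimal sketch, assuming langchain_community's PyPDFLoader, which returns one Document per page. The helper name load_pdf_pages and the file path in the example call are placeholders.

```python
def load_pdf_pages(pdf_path):
    """Load a PDF file and return one Document per page."""
    # Imported here so the helper can be defined before the packages are installed.
    from langchain_community.document_loaders import PyPDFLoader

    loader = PyPDFLoader(pdf_path)
    return loader.load()


# Example call (placeholder path):
# pages = load_pdf_pages("example_data/sample.pdf")
# print(pages[0].metadata)        # expected to include the source path and page number
# print(pages[0].page_content[:200])
```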
MS Word Document Loader
https://python.langchain.com/v0.2/docs/integrations/document_loaders/microsoft_word/
API Reference: Docx2txtLoader (https://api.python.langchain.com/en/latest/document_loaders/langchain_community.document_loaders.word_document.Docx2txtLoader.html)
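A minimal sketch of loading a Word file with the Docx2txtLoader referenced above, assuming the docx2txt package from the install line is present. The helper name load_docx and the file path in the example call are placeholders.

```python
def load_docx(docx_path):
    """Load a .docx file and return its text as a list of Documents."""
    # Imported here so the helper can be defined before the packages are installed.
    from langchain_community.document_loaders import Docx2txtLoader

    loader = Docx2txtLoader(docx_path)
    return loader.load()


# Example call (placeholder path):
# data = load_docx("example_data/fake.docx")
# print(data[0].page_content)
```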