Document loaders

https://python.langchain.com/v0.2/docs/integrations/document_loaders/


%pip install --user -Uq langchain langchain_community pypdf pdf2image docx2txt pdfminer.six

WebBaseLoader

https://python.langchain.com/v0.2/docs/integrations/document_loaders/web_base/

%pip install --user -Uq beautifulsoup4

from langchain_community.document_loaders import WebBaseLoader

# Load a single page; load() returns a list of Document objects.
loader = WebBaseLoader("https://www.thisisgame.com/webzine/news/nboard/4/?n=189952")
data = loader.load()
print(data[0].page_content)

# Pass a list of URLs to load several pages with one loader.
loader = WebBaseLoader(["https://www.espn.com/", "https://google.com"])
docs = loader.load()
docs

Load multiple URLs concurrently
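A minimal sketch of concurrent loading, assuming the behaviour described on the v0.2 WebBaseLoader page linked above: `aload()` scrapes the URLs concurrently and `requests_per_second` throttles the request rate. The helper names `make_throttled_loader` and `load_concurrently` and the limit of one request per second are assumptions for illustration, not LangChain APIs. Newer releases may change how `aload()` behaves.

```python
def make_throttled_loader(urls, requests_per_second=1):
    """Build a WebBaseLoader for several URLs with a polite rate limit.

    make_throttled_loader is a hypothetical helper name, not a LangChain API.
    """
    # Imported inside the function so the cell can be defined
    # before the package is installed.
    from langchain_community.document_loaders import WebBaseLoader

    loader = WebBaseLoader(list(urls))
    # Cap the request rate so the target servers are not flooded.
    loader.requests_per_second = requests_per_second
    return loader


def load_concurrently(urls, requests_per_second=1):
    """Fetch all URLs concurrently and return the resulting Documents."""
    loader = make_throttled_loader(urls, requests_per_second)
    # In LangChain v0.2, aload() is a blocking call that scrapes concurrently.
    return loader.aload()
```

In a notebook, `docs = load_concurrently(["https://www.espn.com/", "https://google.com"])` would fetch both pages. Jupyter already runs an event loop, so installing `nest_asyncio` and calling `nest_asyncio.apply()` first may be necessary.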

XML parser
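WebBaseLoader parses pages with BeautifulSoup's HTML parser by default. A hedged sketch of switching it to the XML parser follows; it assumes `lxml` is installed, and the helper name `make_xml_loader` and the placeholder URL in the usage note are illustrative only.

```python
def make_xml_loader(url):
    """Build a WebBaseLoader that parses the response as XML.

    make_xml_loader is a hypothetical helper name, not a LangChain API.
    """
    # Imported inside the function so the cell can be defined
    # before the package is installed.
    from langchain_community.document_loaders import WebBaseLoader

    loader = WebBaseLoader(url)
    # Replace the default "html.parser" with BeautifulSoup's "xml" parser,
    # which relies on the lxml package.
    loader.default_parser = "xml"
    return loader
```

For example, `make_xml_loader("https://example.com/data.xml").load()` would return the parsed XML document as a list of Documents (the URL is a placeholder).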

Sitemap loader
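SitemapLoader extends WebBaseLoader: it reads a sitemap and loads every page listed in it. The sketch below assumes `lxml` is installed; the helper name `make_sitemap_loader` and the example.com URLs in the usage note are placeholders, not values from the original notebook.

```python
def make_sitemap_loader(sitemap_url, url_patterns=None):
    """Build a SitemapLoader, optionally restricted to matching URLs.

    make_sitemap_loader is a hypothetical helper name, not a LangChain API.
    """
    # Imported inside the function so the cell can be defined
    # before the package is installed.
    from langchain_community.document_loaders import SitemapLoader

    # filter_urls keeps only sitemap entries that match the given patterns;
    # None loads every page in the sitemap.
    return SitemapLoader(web_path=sitemap_url, filter_urls=url_patterns)
```

Calling `make_sitemap_loader("https://example.com/sitemap.xml", ["https://example.com/docs/"]).load()` would scrape only the pages under `/docs/`. Large sitemaps can take a long time, so filtering is usually worth the extra argument.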

Now let's modify the text.

Add custom scraping rules
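Custom scraping rules can be supplied to SitemapLoader as a `parsing_function` that receives each page as a BeautifulSoup object and returns the text to keep. The sketch below strips navigation and header elements, a common choice; the helper name `make_clean_sitemap_loader` is an assumption for illustration.

```python
def remove_nav_and_header_elements(content):
    """Drop <nav> and <header> elements from a parsed page and return its text.

    `content` is expected to be a bs4.BeautifulSoup object.
    """
    nav_elements = content.find_all("nav")
    header_elements = content.find_all("header")
    for element in nav_elements + header_elements:
        # decompose() removes the element and its children from the tree.
        element.decompose()
    return str(content.get_text())


def make_clean_sitemap_loader(sitemap_url):
    """Build a SitemapLoader that applies the rule above to every page.

    make_clean_sitemap_loader is a hypothetical helper name.
    """
    # Imported inside the function so the cell can be defined
    # before the package is installed.
    from langchain_community.document_loaders import SitemapLoader

    return SitemapLoader(
        web_path=sitemap_url,
        parsing_function=remove_nav_and_header_elements,
    )
```

Any function with the same shape (BeautifulSoup in, string out) can replace `remove_nav_and_header_elements`, for instance one that keeps only the `<article>` element.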

PDF Document Loader
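A minimal sketch of loading a PDF with `PyPDFLoader`, which relies on the `pypdf` package installed at the top of this notebook. It returns one Document per page, with the zero-based page number stored under `metadata["page"]`. The helper names `load_pdf_pages` and `page_summary` are assumptions for illustration.

```python
def load_pdf_pages(pdf_path):
    """Load a PDF and return one Document per page.

    load_pdf_pages is a hypothetical helper name, not a LangChain API.
    """
    # Imported inside the function so the cell can be defined
    # before the package is installed.
    from langchain_community.document_loaders import PyPDFLoader

    loader = PyPDFLoader(pdf_path)
    return loader.load()


def page_summary(docs):
    """Return (page number, character count) for each loaded page."""
    return [(doc.metadata.get("page"), len(doc.page_content)) for doc in docs]
```

With a local file, `page_summary(load_pdf_pages("example.pdf"))` gives a quick check that every page produced text; `example.pdf` is a placeholder path.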

MS Word Document Loader

https://python.langchain.com/v0.2/docs/integrations/document_loaders/microsoft_word/

API Reference: Docx2txtLoader (https://api.python.langchain.com/en/latest/document_loaders/langchain_community.document_loaders.word_document.Docx2txtLoader.html)
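A minimal sketch of loading a Word file with `Docx2txtLoader`, which relies on the `docx2txt` package installed at the top of this notebook. The helper name `load_docx` and the file path in the usage note are assumptions for illustration.

```python
def load_docx(docx_path):
    """Load a .docx file and return its text as a list of Documents.

    load_docx is a hypothetical helper name, not a LangChain API.
    """
    # Imported inside the function so the cell can be defined
    # before the package is installed.
    from langchain_community.document_loaders import Docx2txtLoader

    loader = Docx2txtLoader(docx_path)
    return loader.load()
```

For example, `load_docx("example_data/sample.docx")[0].page_content` would hold the extracted text, with the file path recorded in `metadata["source"]`. The path is a placeholder.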
