document-loader
%pip install --user -Uq langchain langchain_community pypdf pdf2image docx2txt pdfminer

WebBaseLoader
%pip install --user -Uq beautifulsoup4
from langchain_community.document_loaders import WebBaseLoader
loader = WebBaseLoader("https://www.thisisgame.com/webzine/news/nboard/4/?n=189952")
data = loader.load()
print(data[0].page_content)

loader = WebBaseLoader(["https://www.espn.com/", "https://google.com"])
docs = loader.load()
docs

Load multiple URLs concurrently
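A minimal sketch of concurrent loading, assuming the `aload()` method and the `requests_per_second` attribute of `WebBaseLoader` in `langchain_community`. The helper name `load_concurrently` is hypothetical. Newer releases may mark `aload()` as deprecated, so check the API reference for the installed version. In a Jupyter notebook, applying `nest_asyncio` first may be necessary.

```python
def load_concurrently(urls, requests_per_second=1):
    """Fetch several pages concurrently and return the loaded Documents.

    Hypothetical helper for illustration; not part of LangChain.
    """
    # Imported inside the function so the helper can be defined
    # before langchain_community is installed.
    from langchain_community.document_loaders import WebBaseLoader

    loader = WebBaseLoader(urls)
    # Throttle outgoing requests so the target servers are not flooded.
    loader.requests_per_second = requests_per_second
    # Assumption: aload() is available in the installed version and
    # fetches all URLs concurrently.
    return loader.aload()


# Example call (needs network access):
# docs = load_concurrently(["https://www.espn.com/", "https://google.com"])
```

Raising `requests_per_second` speeds up scraping but increases the chance of being blocked by the server, so a low value is the safer default.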
XML parser
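A minimal sketch of loading an XML document, assuming the `default_parser` attribute of `WebBaseLoader`, which is passed to BeautifulSoup. The `"xml"` parser likely requires the `lxml` package, which the install cells above do not cover. The helper name `load_xml` and the example URL are hypothetical.

```python
def load_xml(url):
    """Load a URL with BeautifulSoup's XML parser instead of the HTML one.

    Hypothetical helper for illustration; not part of LangChain.
    """
    # Imported inside the function so the helper can be defined
    # before langchain_community is installed.
    from langchain_community.document_loaders import WebBaseLoader

    loader = WebBaseLoader(url)
    # Switch from the default "html.parser" to the XML parser
    # (assumption: lxml is installed).
    loader.default_parser = "xml"
    return loader.load()


# Example call (hypothetical URL, needs network access):
# docs = load_xml("https://example.com/data")
```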
Sitemap loader
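A minimal sketch of loading every page listed in a sitemap, assuming the `SitemapLoader` class in `langchain_community` with its `web_path` and `filter_urls` parameters. Entries in `filter_urls` are treated as patterns that a page URL must match to be loaded. The helper name `load_sitemap` and the example URLs are hypothetical, and the loader likely needs `lxml` to parse the sitemap.

```python
def load_sitemap(sitemap_url, filter_urls=None):
    """Load the pages referenced by a sitemap, optionally filtered by URL.

    Hypothetical helper for illustration; not part of LangChain.
    """
    # Imported inside the function so the helper can be defined
    # before langchain_community is installed.
    from langchain_community.document_loaders import SitemapLoader

    loader = SitemapLoader(web_path=sitemap_url, filter_urls=filter_urls)
    # Returns one Document per page listed in the sitemap.
    return loader.load()


# Example call (hypothetical URLs, needs network access):
# docs = load_sitemap(
#     "https://example.com/sitemap.xml",
#     filter_urls=["https://example.com/blog/.*"],
# )
```

Large sitemaps can trigger many requests, so filtering to the pages that are actually needed keeps the crawl short.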
Add custom scraping rules
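A minimal sketch of a custom scraping rule, assuming `SitemapLoader` accepts a `parsing_function` that receives a `BeautifulSoup` object and returns the text to keep. The rule below strips navigation and header elements before extracting text. The names `remove_nav_and_header_elements` and `load_sitemap_clean` are hypothetical, as is the example URL.

```python
def remove_nav_and_header_elements(content):
    """Drop <nav> and <header> elements and return the remaining page text.

    `content` is expected to be a bs4.BeautifulSoup object.
    """
    # find_all accepts a list of tag names and returns every match.
    for element in content.find_all(["nav", "header"]):
        # decompose() removes the element and its children from the tree.
        element.decompose()
    return str(content.get_text())


def load_sitemap_clean(sitemap_url):
    """Load a sitemap using the custom rule above (hypothetical helper)."""
    # Imported inside the function so the helper can be defined
    # before langchain_community is installed.
    from langchain_community.document_loaders import SitemapLoader

    loader = SitemapLoader(
        web_path=sitemap_url,
        parsing_function=remove_nav_and_header_elements,
    )
    return loader.load()


# Example call (hypothetical URL, needs network access):
# docs = load_sitemap_clean("https://example.com/sitemap.xml")
```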
PDF Document Loader
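A minimal sketch of loading a PDF, assuming the `PyPDFLoader` class in `langchain_community`, which relies on the `pypdf` package installed at the top of this page. It typically returns one Document per page, with the page number in the metadata. The helper name `load_pdf` and the file name `example.pdf` are hypothetical.

```python
def load_pdf(path):
    """Load a PDF file into a list of Documents.

    Hypothetical helper for illustration; not part of LangChain.
    """
    # Imported inside the function so the helper can be defined
    # before langchain_community and pypdf are installed.
    from langchain_community.document_loaders import PyPDFLoader

    loader = PyPDFLoader(path)
    return loader.load()


# Example call (hypothetical local file):
# pages = load_pdf("example.pdf")
# print(pages[0].metadata)
# print(pages[0].page_content[:200])
```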
MS Word Document Loader
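A minimal sketch of loading a Word file, assuming the `Docx2txtLoader` class in `langchain_community`, which relies on the `docx2txt` package installed at the top of this page. It typically returns the whole file as a single Document. The helper name `load_docx` and the file name `example.docx` are hypothetical.

```python
def load_docx(path):
    """Load a .docx file into a list of Documents.

    Hypothetical helper for illustration; not part of LangChain.
    """
    # Imported inside the function so the helper can be defined
    # before langchain_community and docx2txt are installed.
    from langchain_community.document_loaders import Docx2txtLoader

    loader = Docx2txtLoader(path)
    return loader.load()


# Example call (hypothetical local file):
# data = load_docx("example.docx")
# print(data[0].page_content[:200])
```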