langchain_community.document_loaders.parsers.pdf.PDFMinerParser
- class langchain_community.document_loaders.parsers.pdf.PDFMinerParser(extract_images: bool = False, *, concatenate_pages: bool = True)
Parse PDF using PDFMiner.
Initialize a parser based on PDFMiner.
- Parameters
extract_images (bool) – Whether to extract images from the PDF.
concatenate_pages (bool) – If True, concatenate all PDF pages into a single document. Otherwise, return one document per page.
Methods
__init__([extract_images, concatenate_pages]) – Initialize a parser based on PDFMiner.
lazy_parse(blob) – Lazily parse the blob.
parse(blob) – Eagerly parse the blob into a document or documents.
- __init__(extract_images: bool = False, *, concatenate_pages: bool = True)
Initialize a parser based on PDFMiner.
- Parameters
extract_images (bool) – Whether to extract images from the PDF.
concatenate_pages (bool) – If True, concatenate all PDF pages into a single document. Otherwise, return one document per page.