public class TextAbsorber extends Object
Represents a text absorber.
Performs text extraction and provides access to the result via getText().
The example demonstrates how to extract text from the first page of a PDF document.
    // open document
    Document doc = new Document(inFile);
    // create TextAbsorber object to extract text
    TextAbsorber absorber = new TextAbsorber();
    // accept the absorber for the first page
    doc.getPages().get(1).accept(absorber);
    // get the extracted text
    String extractedText = absorber.getText();
The TextAbsorber object is used to extract text from a PDF document or from a single page of the document.
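The examples in this reference call the absorber in two ways: `page.accept(absorber)` and `absorber.visit(page)`. This looks like the usual visitor (double dispatch) pairing. The following is a minimal sketch of that pattern using hypothetical stand-in classes (`SketchVisitor`, `SketchPage`, `SketchAbsorber`); they are not the Aspose.PDF types, and the choice to accumulate text across calls is an assumption of the sketch rather than something this page states.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for illustration only; NOT the Aspose.PDF classes.
interface SketchVisitor {
    void visit(SketchPage page);
}

final class SketchPage {
    private final String content;

    SketchPage(String content) {
        this.content = content;
    }

    String content() {
        return content;
    }

    // accept() simply hands this page to the visitor.
    void accept(SketchVisitor visitor) {
        visitor.visit(this);
    }
}

final class SketchAbsorber implements SketchVisitor {
    private final StringBuilder text = new StringBuilder();

    // Assumption of this sketch: text from successive pages is appended.
    @Override
    public void visit(SketchPage page) {
        text.append(page.content());
    }

    String getText() {
        return text.toString();
    }
}

final class VisitorSketch {
    // Style 1: each page accepts the absorber.
    static String viaAccept(List<SketchPage> pages) {
        SketchAbsorber absorber = new SketchAbsorber();
        for (SketchPage page : pages) {
            page.accept(absorber);
        }
        return absorber.getText();
    }

    // Style 2: the absorber visits each page directly.
    static String viaVisit(List<SketchPage> pages) {
        SketchAbsorber absorber = new SketchAbsorber();
        for (SketchPage page : pages) {
            absorber.visit(page);
        }
        return absorber.getText();
    }

    public static void main(String[] args) {
        List<SketchPage> pages = new ArrayList<>();
        pages.add(new SketchPage("Hello "));
        pages.add(new SketchPage("world"));
        System.out.println(viaAccept(pages));
        System.out.println(viaVisit(pages));
    }
}
```

In this sketch both styles produce the same text, because `accept` does nothing except call `visit` on the visitor it receives.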
Constructor | Description |
---|---|
TextAbsorber() | Initializes a new instance of the TextAbsorber. |
TextAbsorber(TextExtractionOptions extractionOptions) | Initializes a new instance of the TextAbsorber with extraction options. |
Modifier and Type | Method | Description |
---|---|---|
TextExtractionOptions | getExtractionOptions() | Gets the text extraction options. |
String | getText() | Gets the text that the TextAbsorber extracted from the PDF document or page. |
TextSearchOptions | getTextSearchOptions() | Gets the text search options. |
void | setExtractionOptions(TextExtractionOptions value) | Sets the text extraction options. |
void | setTextSearchOptions(TextSearchOptions value) | Sets the text search options. |
void | visit(IDocument pdf) | Extracts text from the specified document. |
void | visit(Page page) | Extracts text from the specified page. |
void | visit(XForm form) | Extracts text from the specified XForm. |
public TextAbsorber()
Initializes a new instance of the TextAbsorber. Performs text extraction and provides access to the result via getText().
The example demonstrates how to extract text from all pages of the PDF document.
    // open document
    Document doc = new Document(inFile);
    // create TextAbsorber object to extract text
    TextAbsorber absorber = new TextAbsorber();
    // accept the absorber for all pages of the document
    doc.getPages().accept(absorber);
    // get the extracted text
    String extractedText = absorber.getText();
public TextAbsorber(TextExtractionOptions extractionOptions)
Initializes a new instance of the TextAbsorber with extraction options. Performs text extraction and provides access to the result via getText().
The example demonstrates how to extract text from all pages of the PDF document.
    // open document
    Document doc = new Document(inFile);
    // create TextAbsorber object to extract text with the Pure formatting mode
    TextAbsorber absorber = new TextAbsorber(
        new TextExtractionOptions(TextExtractionOptions.TextFormattingMode.Pure));
    // accept the absorber for all pages of the document
    doc.getPages().accept(absorber);
    // get the extracted text
    String extractedText = absorber.getText();
public String getText()
Gets the text that the TextAbsorber extracted from the PDF document or page.
The example demonstrates how to extract text from all pages of the PDF document.
    // open document
    Document doc = new Document(inFile);
    // create TextAbsorber object to extract text
    TextAbsorber absorber = new TextAbsorber();
    // accept the absorber for all pages of the document
    doc.getPages().accept(absorber);
    // get the extracted text
    String extractedText = absorber.getText();
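Because getText() returns an ordinary String, the result can be post-processed with standard Java. The following is a minimal sketch, assuming the extracted text uses common line terminators; the class `ExtractedTextLines`, the helper `nonBlankLines`, and the sample string are hypothetical and stand in for real extraction output.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of Aspose.PDF.
final class ExtractedTextLines {
    // Splits the text on any line terminator (\R matches \r\n, \n, \r, ...),
    // trims each line, and drops lines that are empty after trimming.
    static List<String> nonBlankLines(String extractedText) {
        List<String> lines = new ArrayList<>();
        for (String line : extractedText.split("\\R")) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty()) {
                lines.add(trimmed);
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        // Stand-in for the value returned by absorber.getText().
        String sample = "Invoice 42\r\n\r\n  Total: 10.00  \nThank you";
        for (String line : nonBlankLines(sample)) {
            System.out.println(line);
        }
    }
}
```

How much whitespace and how many blank lines appear in real output may depend on the formatting mode chosen in the extraction options, so a cleanup step like this is a matter of taste rather than a requirement.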
public void visit(Page page)
Extracts text from the specified page.
The example demonstrates how to extract text from the first page of a PDF document.
    // open document
    Document doc = new Document(inFile);
    // create TextAbsorber object to extract text
    TextAbsorber absorber = new TextAbsorber();
    // visit the first page of the document
    absorber.visit(doc.getPages().get_Item(1));
    // get the extracted text
    String extractedText = absorber.getText();
public void visit(XForm form)
Extracts text from the specified XForm.
The example demonstrates how to extract text from an XForm named "Xform1" in the resources of the first page.
    // open document
    Document doc = new Document(inFile);
    // create TextAbsorber object to extract text
    TextAbsorber absorber = new TextAbsorber();
    // visit the XForm named "Xform1" on the first page
    absorber.visit(doc.getPages().get_Item(1).getResources().getForms().get_Item("Xform1"));
    // get the extracted text
    String extractedText = absorber.getText();
public void visit(IDocument pdf)
Extracts text from the specified document.
The example demonstrates how to extract text from a PDF document.
    // open document
    Document doc = new Document(inFile);
    // create TextAbsorber object to extract text
    TextAbsorber absorber = new TextAbsorber();
    // visit the whole document
    absorber.visit(doc);
    // get the extracted text
    String extractedText = absorber.getText();
public TextExtractionOptions getExtractionOptions()
Gets the text extraction options. The TextExtractionOptions define the text formatting mode used during extraction.
The default mode is TextExtractionOptions.TextFormattingMode.Pure.
The example demonstrates how to set the Pure text formatting mode and perform text extraction.
    // open document
    Document doc = new Document(inFile);
    // create TextAbsorber object to extract text
    TextAbsorber absorber = new TextAbsorber();
    // set pure text formatting mode
    absorber.setExtractionOptions(
        new TextExtractionOptions(TextExtractionOptions.TextFormattingMode.Pure));
    // accept the absorber for all pages of the document
    doc.getPages().accept(absorber);
    // get the extracted text
    String extractedText = absorber.getText();
public void setExtractionOptions(TextExtractionOptions value)
public TextSearchOptions getTextSearchOptions()
public void setTextSearchOptions(TextSearchOptions value)
Copyright © 2019 Aspose. All Rights Reserved.