public final class TextExtractor extends Object implements IPdfTypeExtractor
Modifier and Type | Field and Description |
---|---|
com.aspose.ms.System.Collections.Generic.Dictionary<Integer,com.aspose.pdf.groupprocessor.Page> | _numberedPages |
Constructor and Description |
---|
TextExtractor() Creates a TextExtractor instance. |
Modifier and Type | Method and Description |
---|---|
long | buildProperties(com.aspose.pdf.groupprocessor.ByteRange range, com.aspose.pdf.groupprocessor.PdfTreeNode parentNode) Builds a tree of nodes containing all PDF parameters and their values. |
long | buildProperties(com.aspose.pdf.groupprocessor.ByteRange range, com.aspose.pdf.groupprocessor.PdfTreeNode parentNode, boolean extractJustValue) Builds a tree of nodes containing all PDF parameters and their values. |
void | dispose() |
String[] | extractAllText() Extracts text from the document. |
String[] | extractAllTextInternal() |
String | extractPageText(int pageNumber) Extracts text from the specified page. |
int | getPageCount() Gets the number of pages in the document. |
String | getVersion() |
void | initialize(String pdfDocumentPath, int bufferSize, boolean allowAsyncInitialization) Initializes the TextExtractor instance. |
void | initializeAlternative(String pdfDocumentPath) Initializes the TextExtractor instance. |
boolean | isFastExtractionUsed() |
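The method summary suggests a lifecycle: initialize the instance, query it, then dispose of it. The following is a minimal sketch of that order under stated assumptions, not the Aspose API itself. `PdfTextSource` and `RecordingSource` are hypothetical names that mirror a subset of the documented signatures, so the example compiles without the Aspose library. The 16 MiB value for `bufferSize` (documented as a byte count) is arbitrary. This page does not say whether `dispose()` is required after a failed `initialize`, so the sketch disposes only after initialization succeeds.

```java
import java.util.ArrayList;
import java.util.List;

public class ExtractorLifecycle {

    // HYPOTHETICAL interface mirroring part of the documented TextExtractor API.
    interface PdfTextSource {
        void initialize(String pdfDocumentPath, int bufferSize, boolean allowAsyncInitialization);
        int getPageCount();
        void dispose();
    }

    // In-memory fake that records each call, so the order can be checked without a PDF.
    static final class RecordingSource implements PdfTextSource {
        final List<String> calls = new ArrayList<>();

        @Override
        public void initialize(String pdfDocumentPath, int bufferSize, boolean allowAsyncInitialization) {
            calls.add("initialize(" + pdfDocumentPath + ", " + bufferSize + ", " + allowAsyncInitialization + ")");
        }

        @Override
        public int getPageCount() {
            calls.add("getPageCount");
            return 3; // pretend the document has three pages
        }

        @Override
        public void dispose() {
            calls.add("dispose");
        }
    }

    // bufferSize is a byte count: 16 MiB = 16 * 1024 * 1024 = 16,777,216 bytes (illustrative value).
    static final int BUFFER_SIZE = 16 * 1024 * 1024;

    static int countPages(PdfTextSource source, String pdfDocumentPath) {
        source.initialize(pdfDocumentPath, BUFFER_SIZE, false); // synchronous initialization
        try {
            return source.getPageCount();
        } finally {
            // Runs after a successful initialize, even if getPageCount throws.
            source.dispose();
        }
    }

    public static void main(String[] args) {
        RecordingSource fake = new RecordingSource();
        System.out.println(countPages(fake, "sample.pdf")); // 3
        System.out.println(fake.calls);
    }
}
```

Passing `false` for `allowAsyncInitialization` keeps the sketch deterministic. Whether the real class needs extra synchronization when asynchronous initialization is enabled is not documented here.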
public final com.aspose.ms.System.Collections.Generic.Dictionary<Integer,com.aspose.pdf.groupprocessor.Page> _numberedPages
public void initialize(String pdfDocumentPath, int bufferSize, boolean allowAsyncInitialization)
Initializes the TextExtractor instance.
Parameters:
pdfDocumentPath - Path to a PDF document.
bufferSize - Maximum size of content, in bytes, that can be kept in memory.
allowAsyncInitialization - Allows asynchronous initialization of resources.
public void initializeAlternative(String pdfDocumentPath)
Initializes the TextExtractor instance.
Parameters:
pdfDocumentPath - Path to a PDF document.
public long buildProperties(com.aspose.pdf.groupprocessor.ByteRange range, com.aspose.pdf.groupprocessor.PdfTreeNode parentNode)
Builds a tree of nodes containing all PDF parameters and their values.
Parameters:
range - Byte range in which to parse parameters.
parentNode - Initial (root) node from which the tree is built.
public long buildProperties(com.aspose.pdf.groupprocessor.ByteRange range, com.aspose.pdf.groupprocessor.PdfTreeNode parentNode, boolean extractJustValue)
Builds a tree of nodes containing all PDF parameters and their values.
Parameters:
range - Byte range in which to parse parameters.
parentNode - Initial (root) node from which the tree is built.
extractJustValue - Used for recursive calls; indicates that the next recursive call should find the parameter's value rather than the parameter itself.
public String[] extractAllText()
Extracts text from the document.
Specified by:
extractAllText in interface IDocumentTextExtractor
extractAllText in interface IPdfTypeExtractor
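`extractAllText()` returns a `String[]`, but this page does not say how the text is divided among the array elements or whether an element can be `null`. The following is a minimal sketch that relies only on the documented return type. `AllTextSource` is a hypothetical single-method interface standing in for the real class, and both the `null` filtering and the caller-chosen separator are defensive assumptions.

```java
import java.util.Arrays;
import java.util.Objects;
import java.util.stream.Collectors;

public class AllTextJoin {

    // HYPOTHETICAL interface mirroring the documented extractAllText() signature.
    interface AllTextSource {
        String[] extractAllText();
    }

    // Joins the returned fragments with a caller-chosen separator,
    // treating a null array as empty and skipping null elements.
    static String joinAllText(AllTextSource source, String separator) {
        String[] parts = source.extractAllText();
        if (parts == null) {
            return "";
        }
        return Arrays.stream(parts)
                .filter(Objects::nonNull)
                .collect(Collectors.joining(separator));
    }

    public static void main(String[] args) {
        AllTextSource fake = () -> new String[] {"fragment one", null, "fragment two"};
        System.out.println(joinAllText(fake, "\n"));
    }
}
```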
public String[] extractAllTextInternal()
public String extractPageText(int pageNumber)
Extracts text from the specified page.
Specified by:
extractPageText in interface IDocumentPageTextExtractor
extractPageText in interface IPdfTypeExtractor
Parameters:
pageNumber - 1-based number of the page.
public int getPageCount()
Gets the number of pages in the document.
Specified by:
getPageCount in interface IDocumentPageTextExtractor
getPageCount in interface IPdfTypeExtractor
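Because `extractPageText` takes a 1-based page number, a loop over every page runs from 1 to `getPageCount()` inclusive. Starting at 0 or stopping at `count - 1` would be an off-by-one error. The following is a sketch under an explicit assumption: `PageTextSource` is a hypothetical interface mirroring the two documented per-page methods, and the in-memory fake maps page numbers onto a 0-based Java array so the example runs without the Aspose library.

```java
import java.util.ArrayList;
import java.util.List;

public class PageByPage {

    // HYPOTHETICAL interface mirroring the documented per-page methods.
    interface PageTextSource {
        int getPageCount();
        String extractPageText(int pageNumber); // pageNumber is 1-based
    }

    // Collects the text of every page, in page order.
    static List<String> extractPages(PageTextSource source) {
        int count = source.getPageCount();
        List<String> pages = new ArrayList<>();
        // Pages are numbered 1..count, so the upper bound is inclusive.
        for (int page = 1; page <= count; page++) {
            pages.add(source.extractPageText(page));
        }
        return pages;
    }

    public static void main(String[] args) {
        String[] fakePages = {"alpha", "beta", "gamma"};
        PageTextSource fake = new PageTextSource() {
            @Override
            public int getPageCount() {
                return fakePages.length;
            }

            @Override
            public String extractPageText(int pageNumber) {
                // Translate the 1-based page number to a 0-based array index.
                return fakePages[pageNumber - 1];
            }
        };
        System.out.println(extractPages(fake)); // [alpha, beta, gamma]
    }
}
```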
public void dispose()
Specified by:
dispose in interface com.aspose.ms.System.IDisposable
dispose in interface IPdfTypeExtractor
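`dispose()` is declared through `com.aspose.ms.System.IDisposable` and `IPdfTypeExtractor`. `java.lang.AutoCloseable` does not appear among the interfaces listed on this page, so a try-with-resources statement may not accept the extractor directly. One workaround wraps `dispose()` in an `AutoCloseable` whose `close()` declares no checked exception. The following sketch uses hypothetical names (`Disposable`, `DisposeGuard`, `guard`); if the real `IDisposable` already extends `AutoCloseable`, the adapter is unnecessary.

```java
public class DisposeGuardDemo {

    // HYPOTHETICAL interface mirroring the documented dispose() method.
    interface Disposable {
        void dispose();
    }

    // AutoCloseable whose close() declares no checked exception,
    // so try-with-resources needs no catch (Exception e) clause.
    interface DisposeGuard extends AutoCloseable {
        @Override
        void close();
    }

    // Adapts any dispose() method into a try-with-resources guard.
    static DisposeGuard guard(Disposable target) {
        return target::dispose;
    }

    public static void main(String[] args) {
        Disposable fake = () -> System.out.println("disposed");
        try (DisposeGuard g = guard(fake)) {
            System.out.println("working");
        }
        // Prints "working", then "disposed".
    }
}
```

With a real instance, the call would look like `try (DisposeGuard g = guard(extractor::dispose)) { ... }`. The method reference compiles because `Disposable` has a single abstract method, and `dispose()` runs even if the body throws.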
public String getVersion()
Specified by:
getVersion in interface IPdfTypeExtractor
public boolean isFastExtractionUsed()
Specified by:
isFastExtractionUsed in interface IPdfTypeExtractor
Copyright © 2016 Aspose. All Rights Reserved.