public final class TextFragmentAbsorber extends TextAbsorber
Represents an absorber object of text fragments. Performs text search and provides access to
search results via TextFragmentAbsorber.TextFragments collection.
The example demonstrates how to find text on the first page of a PDF document and replace the text and its font.
// Open document
Document doc = new Document("D:\\Tests\\input.pdf");
// Find font that will be used to change document text font
com.aspose.pdf.Font font = FontRepository.findFont("Arial");
// Create TextFragmentAbsorber object to find all "hello world" text occurrences
TextFragmentAbsorber absorber = new TextFragmentAbsorber("hello world");
// Accept the absorber for first page
doc.getPages().get(1).accept(absorber);
// Change text and font of the first text occurrence
absorber.getTextFragments().get_Item(1).setText ( "hi world");
absorber.getTextFragments().get_Item(1).getTextState().setFont ( font);
// Save document
doc.save("D:\\Tests\\output.pdf");
The TextFragmentAbsorber object is mainly used in text search scenarios. When the
search is completed, the occurrences are represented by TextFragment objects contained in the
TextFragmentAbsorber.TextFragments collection. The TextFragment object
provides access to the text of the search occurrence and its text properties, and allows editing the text and
changing the text state (font, font size, color, etc.).
| Constructor and Description |
|---|
| TextFragmentAbsorber(): Initializes a new instance of the TextFragmentAbsorber that performs search of all text segments of the document or page. |
| TextFragmentAbsorber(Pattern regex): Initializes a new instance of the TextFragmentAbsorber class for the specified regular expression (Pattern object). |
| TextFragmentAbsorber(Pattern[] patterns, TextSearchOptions textSearchOptions): Initializes a new instance of the TextFragmentAbsorber class for the specified array of regular expressions and text search options. |
| TextFragmentAbsorber(Pattern regex, TextEditOptions textEditOptions): Initializes a new instance of the TextFragmentAbsorber class for the specified regular expression and text edit options. |
| TextFragmentAbsorber(Pattern regex, TextSearchOptions textSearchOptions): Initializes a new instance of the TextFragmentAbsorber class for the specified regular expression and text search options. |
| TextFragmentAbsorber(String phrase): Initializes a new instance of the TextFragmentAbsorber class for the specified text phrase. |
| TextFragmentAbsorber(String phrase, TextEditOptions textEditOptions): Initializes a new instance of the TextFragmentAbsorber class for the specified text phrase and text edit options. |
| TextFragmentAbsorber(String phrase, TextSearchOptions textSearchOptions): Initializes a new instance of the TextFragmentAbsorber class for the specified text phrase and text search options. |
| TextFragmentAbsorber(String phrase, TextSearchOptions textSearchOptions, TextEditOptions textEditOptions): Initializes a new instance of the TextFragmentAbsorber class for the specified text phrase, text search options and text edit options. |
| TextFragmentAbsorber(TextEditOptions textEditOptions): Initializes a new instance of the TextFragmentAbsorber with text edit options, that performs search of all text segments of the document or page. |
| Modifier and Type | Method and Description |
|---|---|
| void | applyForAllFragments(float fontSize): Applies font size for all text fragments that were absorbed. |
| void | applyForAllFragments(Font font): Applies font for all text fragments that were absorbed. |
| void | applyForAllFragments(Font font, float fontSize): Applies font and size for all text fragments that were absorbed. |
| List<TextExtractionError> | getErrors(): List of TextExtractionError objects. |
| TextExtractionOptions | getExtractionOptions(): Gets text extraction options. |
| String | getPhrase(): Gets phrase that the TextFragmentAbsorber searches on the PDF document or page. |
| HashMap<Pattern,TextFragmentCollection> | getRegexResults(): Gets dictionary of search occurrences that are presented with Pattern objects as keys and TextFragmentCollection objects as values. |
| com.aspose.ms.System.Collections.Generic.Dictionary<com.aspose.ms.System.Text.RegularExpressions.Regex,TextFragmentCollection> | getRegexResultsInternal() |
| String | getText(): Gets extracted text that the TextAbsorber extracts on the PDF document or page. |
| TextEditOptions | getTextEditOptions(): Gets text edit options. |
| TextFragmentCollection | getTextFragments(): Gets collection of search occurrences that are presented with TextFragment objects. |
| TextReplaceOptions | getTextReplaceOptions(): Gets text replace options. |
| TextSearchOptions | getTextSearchOptions(): Gets search options. |
| boolean | hasErrors_Fragment(): Value indicates whether errors were found during text extraction. |
| void | removeAllText(Document document): Removes all text from the document. |
| void | removeAllText(Page page): Removes all text from the specified page. |
| void | removeAllText(Page page, Rectangle rect): Removes text inside the specified rectangle from the specified page. |
| void | reset(): Clears TextFragments collection of this TextFragmentAbsorber object. |
| void | setExtractionOptions(TextExtractionOptions value): Sets text extraction options. |
| void | setPhrase(String value): Sets phrase that the TextFragmentAbsorber searches on the PDF document or page. |
| void | setTextEditOptions(TextEditOptions value): Sets text edit options. |
| void | setTextFragments(TextFragmentCollection value): Sets collection of search occurrences that are presented with TextFragment objects. |
| void | setTextReplaceOptions(TextReplaceOptions value): Sets text replace options. |
| void | setTextSearchOptions(TextSearchOptions value): Sets search options. |
| void | visit(IDocument pdf): Performs search on the specified document. |
| void | visit(Page page): Performs search on the specified page. |
| void | visit(XForm xForm): Performs search on the specified form object. |
public TextFragmentAbsorber()
Initializes a new instance of the TextFragmentAbsorber that performs search of all
text segments of the document or page.
The example demonstrates how to find text on the first PDF document page and replace the text.
// Open document
Document doc = new Document("D:\\Tests\\input.pdf");
// Find font that will be used to change document text font
Font font = FontRepository.findFont("Arial");
// Create TextFragmentAbsorber object
TextFragmentAbsorber absorber = new TextFragmentAbsorber();
// Make the absorber search for all "hello world" text occurrences
absorber.setPhrase("hello world");
// Accept the absorber for first page
doc.getPages().get(1).accept(absorber);
// Change text of the first text occurrence
absorber.getTextFragments().get_Item(1).setText ( "hi world");
// Save document
doc.save("D:\\Tests\\output.pdf");
Performs text search and provides access to search results via
TextFragmentAbsorber.TextFragments collection.
public TextFragmentAbsorber(TextEditOptions textEditOptions)
Initializes a new instance of the TextFragmentAbsorber with text edit options, that
performs search of all text segments of the document or page.
The example demonstrates how to find all text fragments on the first PDF document page and replace font for them.
// Open document
Document doc = new Document("D:\\Tests\\input.pdf");
// Create TextFragmentAbsorber object
TextFragmentAbsorber absorber = new TextFragmentAbsorber(
new TextEditOptions(TextEditOptions.FontReplace.RemoveUnusedFonts));
// Accept the absorber for first page
doc.getPages().get(1).accept(absorber);
// Find Courier font
Font font = FontRepository.findFont("Courier");
// Set the font for all the text fragments
for (TextFragment textFragment : (Iterable<TextFragment>)absorber.getTextFragments())
{
textFragment.getTextState().setFont ( font);
}
// Save document
doc.save("D:\\Tests\\output.pdf");
textEditOptions - Text edit options (Allows to turn on some edit features).
Performs text search and provides access to search results via
TextFragmentAbsorber.TextFragments collection.
public TextFragmentAbsorber(String phrase)
Initializes a new instance of the TextFragmentAbsorber class for the specified text
phrase.
The example demonstrates how to find text on the first page of a PDF document and replace the text and its font.
// Open document
Document doc = new Document("D:\\Tests\\input.pdf");
// Find font that will be used to change document text font
com.aspose.pdf.Font font = FontRepository.findFont("Arial");
// Create TextFragmentAbsorber object to find all "hello world" text occurrences
TextFragmentAbsorber absorber = new TextFragmentAbsorber("hello world");
// Accept the absorber for first page
doc.getPages().get_Item(1).accept(absorber);
// Change text and font of the first text occurrence
absorber.getTextFragments().get_Item(1).setText ( "hi world");
absorber.getTextFragments().get_Item(1).getTextState().setFont ( font);
// Save document
doc.save("D:\\Tests\\output.pdf");
phrase - Phrase that the TextFragmentAbsorber searches
Performs text search of the specified phrase and provides access to search results via
TextFragmentAbsorber.TextFragments collection.
public TextFragmentAbsorber(Pattern regex)
Initializes a new instance of the TextFragmentAbsorber class for the specified regular expression (Pattern object).
// Open document
Document doc = new Document("input.pdf");
// Find font that will be used to change document text font
Font font = FontRepository.findFont("Arial");
// Create TextFragmentAbsorber object to find all matches of the input regular expression
TextFragmentAbsorber absorber = new TextFragmentAbsorber(Pattern.compile("h\\w*?o"));
// Accept the absorber for first page
doc.getPages().get_Item(1).accept(absorber);
// we should find "hello" word and replace it with "Hi"
absorber.getTextFragments().get_Item(1).setText("Hi");
// Save document
doc.save("output.pdf");
regex - Pattern object that the TextFragmentAbsorber searches for
Performs text search of the specified regular expression and provides access to search results via the TextFragmentAbsorber.TextFragments collection (getTextFragments()/setTextFragments(TextFragmentCollection)).
public TextFragmentAbsorber(String phrase, TextSearchOptions textSearchOptions)
Initializes a new instance of the TextFragmentAbsorber class for the specified text
phrase and text search options.
The example demonstrates how to find text with regular expression on the first PDF document page and replace
the text.
// Open document
Document doc = new Document("D:\\Tests\\input.pdf");
// Create TextFragmentAbsorber object that searches all words starting with 'h' and ending with 'o' using a regular expression
TextFragmentAbsorber absorber = new TextFragmentAbsorber("h\\w*?o", new TextSearchOptions(true));
// we should find "hello" word and replace it with "Hi"
doc.getPages().get_Item(1).accept(absorber);
absorber.getTextFragments().get_Item(1).setText ( "Hi");
// Save document
doc.save("D:\\Tests\\output.pdf");
phrase - Phrase that the TextFragmentAbsorber searches
textSearchOptions - Text search options (Allows to turn on some search features. For example, search with regular expression)
Performs text search of the specified phrase and provides access to search results via
TextFragmentAbsorber.TextFragments collection.
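To see what the pattern used in the example above matches, it can be tried on a plain string with the standard java.util.regex classes. The sketch below is an illustration only: it does not call Aspose.PDF, the class name RegexPhraseSketch, the method findAll and the sample string are assumptions, and the library's own matching may differ in detail from java.util.regex.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of Aspose.PDF.
final class RegexPhraseSketch {

    // Returns every substring of text that the regular expression matches.
    static List<String> findAll(String regex, String text) {
        List<String> matches = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(text);
        while (matcher.find()) {
            matches.add(matcher.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        // In Java source the backslash must be doubled: "h\\w*?o" is the regex h\w*?o.
        // The lazy quantifier *? stops at the first 'o' that follows the 'h'.
        System.out.println(findAll("h\\w*?o", "hello world, hero")); // prints [hello, hero]
    }
}
```

Writing the pattern with a single backslash ("h\w*?o") does not compile in Java, because \w is not a valid string escape.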
public TextFragmentAbsorber(Pattern regex, TextSearchOptions textSearchOptions)
Initializes a new instance of the TextFragmentAbsorber class for the specified regular expression and text search options.
// Open document
Document doc = new Document("input.pdf");
// Create TextFragmentAbsorber object that searches all words starting 'h' and ending 'o' using regular expression.
TextFragmentAbsorber absorber = new TextFragmentAbsorber(Pattern.compile("h\\w*?o"), new TextSearchOptions(true));
// we should find "hello" word and replace it with "Hi"
doc.getPages().get_Item(1).accept(absorber);
absorber.getTextFragments().get_Item(1).setText("Hi");
// Save document
doc.save("output.pdf");
regex - Pattern object that the TextFragmentAbsorber searches for
textSearchOptions - Text search options (Allows to turn on some search features.)
Performs text search of the specified regular expression and provides access to search results via the TextFragmentAbsorber.TextFragments collection (getTextFragments()/setTextFragments(TextFragmentCollection)).
public TextFragmentAbsorber(Pattern[] patterns, TextSearchOptions textSearchOptions)
Initializes a new instance of the TextFragmentAbsorber class for the specified array of regular expressions and text search options.
The results are read from TextFragmentAbsorber.RegexResults (getRegexResults()):
var results = absorber.getRegexResults();
patterns - Array of Pattern objects that the TextFragmentAbsorber searches for.
textSearchOptions - Text search options (Allows to turn on some search features.).
Performs text search of the specified array of regular expressions and provides access to search results via the TextFragmentAbsorber.RegexResults (getRegexResults()) dictionary.
public TextFragmentAbsorber(String phrase, TextSearchOptions textSearchOptions, TextEditOptions textEditOptions)
Initializes a new instance of the TextFragmentAbsorber class for the specified text
phrase, text search options and text edit options. The text edit options are not supported
yet.
The example demonstrates how to find text with regular expression on the first PDF document page and replace
the text.
// Open document
Document doc = new Document("D:\\Tests\\input.pdf");
// Create TextFragmentAbsorber object that searches all words starting with 'h' and ending with 'o' using a regular expression
TextFragmentAbsorber absorber = new TextFragmentAbsorber("h\\w*?o", new TextSearchOptions(true));
// we should find "hello" word and replace it with "Hi"
doc.getPages().get_Item(1).accept(absorber);
absorber.getTextFragments().get_Item(1).setText ( "Hi");
// Save document
doc.save("D:\\Tests\\output.pdf");
phrase - Phrase that the TextFragmentAbsorber searches
textSearchOptions - Text search options (Allows to turn on some search features. For example, search with regular expression)
textEditOptions - Text edit options (Allows to turn on some edit features. For example, define special behavior when requested symbol cannot be written with font). The parameter is not supported yet.
Performs text search of the specified phrase and provides access to search results via
TextFragmentAbsorber.TextFragments collection.
public TextFragmentAbsorber(Pattern regex, TextEditOptions textEditOptions)
Initializes a new instance of the TextFragmentAbsorber class for the specified regular expression and text edit options.
regex - Pattern object that the TextFragmentAbsorber searches for
textEditOptions - Text edit options (Allows to turn on some edit features).
Performs text search of the specified regular expression and provides access to search results via the TextFragmentAbsorber.TextFragments collection (getTextFragments()/setTextFragments(TextFragmentCollection)).
public TextFragmentAbsorber(String phrase, TextEditOptions textEditOptions)
Initializes a new instance of the TextFragmentAbsorber class for the specified text
phrase and text edit options.
phrase - Phrase that the TextFragmentAbsorber searches
textEditOptions - Text edit options (Allows to turn on some edit features).
Performs text search of the specified phrase and provides access to search results via
TextFragmentAbsorber.TextFragments collection.
public TextFragmentCollection getTextFragments()
Gets collection of search occurrences that are presented with TextFragment objects.
The example demonstrates how to find text on the first PDF document page and replace all search occurrences
with new text.
// Open document
Document doc = new Document("D:\\Tests\\input.pdf");
// Find font that will be used to change document text font
Font font = FontRepository.findFont("Arial");
// Create TextFragmentAbsorber object to find all "hello world" text occurrences
TextFragmentAbsorber absorber = new TextFragmentAbsorber("hello world");
// Accept the absorber for first page
doc.getPages().get(1).accept(absorber);
// Change text of all search occurrences
for (TextFragment textFragment : (Iterable<TextFragment>)absorber.getTextFragments())
{
textFragment.setText ( "hi world");
}
// Save document
doc.save("D:\\Tests\\output.pdf");
public void setTextFragments(TextFragmentCollection value)
Sets collection of search occurrences that are presented with TextFragment objects.
value - TextFragmentCollection object
The example demonstrates how to find text on the first PDF document page and replace all search
occurrences with new text.
// Open document
Document doc = new Document("D:\\Tests\\input.pdf");
// Find font that will be used to change document text font
Font font = FontRepository.findFont("Arial");
// Create TextFragmentAbsorber object to find all "hello world" text occurrences
TextFragmentAbsorber absorber = new TextFragmentAbsorber("hello world");
// Accept the absorber for first page
doc.getPages().get(1).accept(absorber);
// Change text of all search occurrences
for (TextFragment textFragment : (Iterable<TextFragment>)absorber.getTextFragments())
{
textFragment.setText ( "hi world");
}
// Save document
doc.save("D:\\Tests\\output.pdf");
public final HashMap<Pattern,TextFragmentCollection> getRegexResults()
Gets dictionary of search occurrences that are presented with Pattern objects as keys and TextFragmentCollection objects as values.
// Open document
Document doc = new Document("input.pdf");
// firstRegex and secondRegex are the regular expression strings to search for (their values are not given in this example)
Pattern[] regexes = new Pattern[]
{
Pattern.compile(firstRegex, Pattern.CASE_INSENSITIVE),
Pattern.compile(secondRegex, Pattern.CASE_INSENSITIVE),
};
// Create TextFragmentAbsorber object that searches for all matches of the given regular expressions.
TextFragmentAbsorber absorber = new TextFragmentAbsorber(regexes, new TextSearchOptions(true));
doc.getPages().get_Item(1).accept(absorber);
// Get results
HashMap<Pattern, TextFragmentCollection> results = absorber.getRegexResults();
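The shape of the returned map can be illustrated without a PDF. The sketch below is a stand-in that uses only the Java standard library: it groups plain string matches by the Pattern that produced them, mirroring the HashMap<Pattern,TextFragmentCollection> signature of getRegexResults(). The class name RegexResultsSketch, the method groupByPattern, the two patterns and the sample text are assumptions made for illustration, and List<String> stands in for TextFragmentCollection.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for illustration; it does not call Aspose.PDF.
final class RegexResultsSketch {

    // Collects, for each pattern, the list of substrings it matches in text.
    static Map<Pattern, List<String>> groupByPattern(Pattern[] patterns, String text) {
        Map<Pattern, List<String>> results = new LinkedHashMap<>();
        for (Pattern pattern : patterns) {
            List<String> matches = new ArrayList<>();
            Matcher matcher = pattern.matcher(text);
            while (matcher.find()) {
                matches.add(matcher.group());
            }
            results.put(pattern, matches);
        }
        return results;
    }

    public static void main(String[] args) {
        // Placeholder patterns chosen for this sketch only.
        Pattern[] patterns = {
            Pattern.compile("h\\w*?o", Pattern.CASE_INSENSITIVE),
            Pattern.compile("w\\w*?d", Pattern.CASE_INSENSITIVE)
        };
        Map<Pattern, List<String>> results = groupByPattern(patterns, "Hello World");
        for (Map.Entry<Pattern, List<String>> entry : results.entrySet()) {
            System.out.println(entry.getKey().pattern() + " -> " + entry.getValue());
        }
    }
}
```

java.util.regex.Pattern does not override equals or hashCode, so a map keyed by Pattern is looked up by object identity. In this sketch that means keeping a reference to the Pattern instances that were passed in; whether the same holds for the map returned by getRegexResults() is not stated in this reference.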
public final com.aspose.ms.System.Collections.Generic.Dictionary<com.aspose.ms.System.Text.RegularExpressions.Regex,TextFragmentCollection> getRegexResultsInternal()
public String getPhrase()
Gets phrase that the TextFragmentAbsorber searches on the PDF document or page.
The example demonstrates how to search for text several times and perform text replacements.
// Open document
Document doc = new Document("D:\\Tests\\input.pdf");
// Create TextFragmentAbsorber object to find all "hello" text occurrences
TextFragmentAbsorber absorber = new TextFragmentAbsorber("hello");
doc.getPages().get(1).accept(absorber);
absorber.getTextFragments().get_Item(1).setText ( "Hi");
// search another word and replace it
absorber.setPhrase ( "world");
doc.getPages().get(1).accept(absorber);
absorber.getTextFragments().get_Item(1).setText ( "John");
// Save document
doc.save("D:\\Tests\\output.pdf");
public void setPhrase(String value)
Sets phrase that the TextFragmentAbsorber searches on the PDF document or page.
value - String value
The example demonstrates how to search for text several times and perform text replacements.
// Open document
Document doc = new Document("D:\\Tests\\input.pdf");
// Create TextFragmentAbsorber object to find all "hello" text occurrences
TextFragmentAbsorber absorber = new TextFragmentAbsorber("hello");
doc.getPages().get(1).accept(absorber);
absorber.getTextFragments().get_Item(1).setText ( "Hi");
// search another word and replace it
absorber.setPhrase ( "world");
doc.getPages().get(1).accept(absorber);
absorber.getTextFragments().get_Item(1).setText ( "John");
// Save document
doc.save("D:\\Tests\\output.pdf");
public TextSearchOptions getTextSearchOptions()
Gets search options. The options enable search using regular expressions.
getTextSearchOptions in class TextAbsorber
The example demonstrates how to search for text using a regular expression.
// Open document
Document doc = new Document("D:\\Tests\\input.pdf");
// Create TextFragmentAbsorber object
TextFragmentAbsorber absorber = new TextFragmentAbsorber();
// Make the absorber search for all words starting with 'h' and ending with 'o' using a regular expression
absorber.setPhrase("h\\w*?o");
absorber.setTextSearchOptions ( new TextSearchOptions(true));
// we should find "hello" word and replace it with "Hi"
doc.getPages().get(1).accept(absorber);
absorber.getTextFragments().get_Item(1).setText ( "Hi");
// Save document
doc.save("D:\\Tests\\output.pdf");
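The effect of switching regular expression search on or off can be approximated on a plain string. The sketch below is only an analogy built on java.util.regex, under the assumption that a phrase is matched literally when regular expression search is not enabled; the class LiteralVsRegexSketch and its count method are hypothetical and do not call Aspose.PDF.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of Aspose.PDF.
final class LiteralVsRegexSketch {

    // Counts occurrences of phrase in text, treating phrase either as a
    // regular expression or as literal characters.
    static int count(String phrase, String text, boolean isRegex) {
        Pattern pattern = isRegex
                ? Pattern.compile(phrase)
                : Pattern.compile(Pattern.quote(phrase));
        Matcher matcher = pattern.matcher(text);
        int occurrences = 0;
        while (matcher.find()) {
            occurrences++;
        }
        return occurrences;
    }

    public static void main(String[] args) {
        String text = "hello world";
        // As literal text, the characters h\w*?o do not occur in "hello world".
        System.out.println(count("h\\w*?o", text, false)); // prints 0
        // As a regular expression, the same phrase matches "hello".
        System.out.println(count("h\\w*?o", text, true));  // prints 1
    }
}
```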
public void setTextSearchOptions(TextSearchOptions value)
Sets search options. The options enable search using regular expressions.
setTextSearchOptions in class TextAbsorber
value - TextSearchOptions object
The example demonstrates how to search for text using a regular expression.
// Open document
Document doc = new Document("D:\\Tests\\input.pdf");
// Create TextFragmentAbsorber object
TextFragmentAbsorber absorber = new TextFragmentAbsorber();
// Make the absorber search for all words starting with 'h' and ending with 'o' using a regular expression
absorber.setPhrase("h\\w*?o");
absorber.setTextSearchOptions ( new TextSearchOptions(true));
// we should find "hello" word and replace it with "Hi"
doc.getPages().get(1).accept(absorber);
absorber.getTextFragments().get_Item(1).setText ( "Hi");
// Save document
doc.save("D:\\Tests\\output.pdf");
public TextEditOptions getTextEditOptions()
Gets text edit options. The options define special behavior when a requested symbol cannot be written with the font.
public void setTextEditOptions(TextEditOptions value)
Sets text edit options. The options define special behavior when a requested symbol cannot be written with the font.
value - TextEditOptions object
public TextReplaceOptions getTextReplaceOptions()
Gets text replace options. The options define the behavior when fragment text is replaced with shorter or longer text.
public void setTextReplaceOptions(TextReplaceOptions value)
Sets text replace options. The options define the behavior when fragment text is replaced with shorter or longer text.
value - TextReplaceOptions value
public boolean hasErrors_Fragment()
Value indicates whether errors were found during text extraction. The search for errors is performed only if TextSearchOptions.LogTextExtractionErrors is set to true, and it may decrease performance.
public List<TextExtractionError> getErrors()
List of TextExtractionError objects. It contains information about errors that were found
during text extraction. The search for errors is performed only if
TextSearchOptions.LogTextExtractionErrors is set to true, and it may decrease performance.
getErrors in class TextAbsorber
public String getText()
Gets extracted text that the TextAbsorber extracts on the PDF document or page.
getText in class TextAbsorber
The example demonstrates how to extract text from all pages of the PDF document.
// open document
Document doc = new Document(inFile);
// create TextAbsorber object to extract text
TextAbsorber absorber = new TextAbsorber();
// accept the absorber for all document's pages
doc.getPages().accept(absorber);
// get the extracted text
String extractedText = absorber.getText();
public void visit(Page page)
Performs search on the specified page.
The example demonstrates how to find text on the first PDF document page and replace the text.
// Open document
Document doc = new Document("D:\\Tests\\input.pdf");
// Find font that will be used to change document text font
Font font = FontRepository.findFont("Arial");
// Create TextFragmentAbsorber object to find all "hello world" text occurrences
TextFragmentAbsorber absorber = new TextFragmentAbsorber("hello world");
// Accept the absorber for first page
absorber.visit(doc.getPages().get(1));
// Change text of all search occurrences
for (TextFragment textFragment : (Iterable<TextFragment>)absorber.getTextFragments())
{
textFragment.setText ( "hi world");
}
// Save document
doc.save("D:\\Tests\\output.pdf");
visit in class TextAbsorber
page - PDF document page object.
public void visit(IDocument pdf)
Performs search on the specified document.
The example demonstrates how to find text in a PDF document and replace the text of the first search occurrence.
// Open document
Document doc = new Document("D:\\Tests\\input.pdf");
// Find font that will be used to change document text font
Font font = FontRepository.findFont("Arial");
// Create TextFragmentAbsorber object to find all "hello world" text occurrences
TextFragmentAbsorber absorber = new TextFragmentAbsorber("hello world");
// Perform the search on the whole document
absorber.visit(doc);
// Change text of the first text occurrence
absorber.getTextFragments().get_Item(1).setText ( "hi world");
// Save document
doc.save("D:\\Tests\\output.pdf");
visit in class TextAbsorber
pdf - PDF document object.
public void applyForAllFragments(Font font)
Applies font for all text fragments that were absorbed. It works faster than looping through the fragments if all fragments on the page(s) were absorbed. Otherwise it works similarly to looping.
font - Font of the text.
public void applyForAllFragments(float fontSize)
Applies font size for all text fragments that were absorbed. It works faster than looping through the fragments if all fragments on the page(s) were absorbed. Otherwise it works similarly to looping.
fontSize - Font size of the text.
public void applyForAllFragments(Font font, float fontSize)
Applies font and size for all text fragments that were absorbed. It works faster than looping through the fragments if all fragments on the page(s) were absorbed. Otherwise it works similarly to looping.
font - Font of the text.
fontSize - Font size of the text.
public void reset()
Clears TextFragments collection of this TextFragmentAbsorber object.
public void removeAllText(Page page)
Removes all text from the specified page.
page - PDF document page object.
public final void removeAllText(Page page, Rectangle rect)
Removes text inside the specified rectangle from the specified page.
page - PDF document page object.
rect - Rectangle to remove text inside.
public void removeAllText(Document document)
Removes all text from the document.
document - PDF document object.
public void visit(XForm xForm)
Performs search on the specified form object.
visit in class TextAbsorber
xForm - Pdf form object.
public TextExtractionOptions getExtractionOptions()
Gets text extraction options.
getExtractionOptions in class TextAbsorber
public void setExtractionOptions(TextExtractionOptions value)
Sets text extraction options.
setExtractionOptions in class TextAbsorber
value - TextExtractionOptions object
Copyright © 2025 Aspose. All Rights Reserved.