public class ParagraphAbsorber extends Object
Represents an absorber object of page structure objects such as sections and paragraphs.
Searches for sections and paragraphs of text and provides access to the rectangles and polygons that describe them in text coordinate space.
It also searches for text segments and provides access to the search results through TextFragment
collections grouped by structure element.
    // Open document
    Document doc = new Document("input.pdf");
    // Create ParagraphAbsorber object
    ParagraphAbsorber absorber = new ParagraphAbsorber();
    // Accept the absorber for first page
    absorber.visit(doc.getPages().get_Item(1));
    // Get markup object of first page
    PageMarkup markup = absorber.getPageMarkups().get(0);
    // Loop through structure elements of the page text to find first text fragment of each paragraph
    for (MarkupSection section : markup.getSections()) {
        for (MarkupParagraph paragraph : section.getParagraphs()) {
            TextFragment fragment = paragraph.getFragments().get_Item(0);
            // Update text properties
            fragment.getTextState().setBackgroundColor(Color.getLightBlue());
        }
    }
    // Save document
    doc.save("output.pdf");
The ParagraphAbsorber.PageMarkups collection contains PageMarkup objects that represent the page structure as collections of MarkupSection and MarkupParagraph objects.
The TextFragment object provides access to the text of a search occurrence and its text properties, and allows editing the text and changing the text state (font, font size, color, etc.).

Constructor and Description |
---|
ParagraphAbsorber(): Initializes a new instance of ParagraphAbsorber that performs search for sections/paragraphs of the document or page. |
ParagraphAbsorber(int sectionsSearchDepth): Initializes a new instance of ParagraphAbsorber that performs search for sections/paragraphs of the document or page. |
ParagraphAbsorber(int sectionsSearchDepth, ParagraphAbsorberOptions paragraphAbsorberOptions): Initializes a new instance of ParagraphAbsorber that performs search for sections/paragraphs of the document or page with the specified parameters. |
ParagraphAbsorber(ParagraphAbsorberOptions paragraphAbsorberOptions): Initializes a new instance of ParagraphAbsorber that performs search for sections/paragraphs of the document or page with the specified parameters. |

Modifier and Type | Method and Description |
---|---|
List<PageMarkup> | getPageMarkups(): Gets the collection of PageMarkup objects that were absorbed. |
ParagraphAbsorberOptions | getParagraphAbsorberOptions(): Gets the ParagraphAbsorberOptions. |
int | getSectionsSearchDepth(): Gets the number of sequential searches performed to find finer structure elements. |
TextReplaceOptions | getTextReplaceOptions(): Gets the TextReplaceOptions. |
boolean | isMulticolumnParagraphsAllowed(): Gets a value that indicates whether the starting text lines of a section may be treated as a continuation of the last paragraph of the previous section. |
void | setMulticolumnParagraphsAllowed(boolean value): Sets a value that indicates whether the starting text lines of a section may be treated as a continuation of the last paragraph of the previous section. |
void | setParagraphAbsorberOptions(ParagraphAbsorberOptions value): Sets the ParagraphAbsorberOptions. |
void | setSectionsSearchDepth(int value): Sets the number of sequential searches performed to find finer structure elements. |
void | setTextReplaceOptions(TextReplaceOptions value): Sets the TextReplaceOptions. |
void | visit(Document doc): Performs the search for sections and paragraphs on the specified Document. |
void | visit(Page page): Performs the search on the specified Page. |

public ParagraphAbsorber()
Initializes a new instance of ParagraphAbsorber
that searches for sections and paragraphs in a document or page.
public ParagraphAbsorber(int sectionsSearchDepth)
Initializes a new instance of ParagraphAbsorber
that searches for sections and paragraphs in a document or page.
sectionsSearchDepth
- Number of sequential searches performed to find finer structure elements.
See the ParagraphAbsorber.SectionsSearchDepth
property for more detail on this parameter.
public ParagraphAbsorber(ParagraphAbsorberOptions paragraphAbsorberOptions)
Initializes a new instance of ParagraphAbsorber
that searches for sections and paragraphs in a document or page
with the specified parameters.
paragraphAbsorberOptions
- The ParagraphAbsorberOptions.
public ParagraphAbsorber(int sectionsSearchDepth, ParagraphAbsorberOptions paragraphAbsorberOptions)
Initializes a new instance of ParagraphAbsorber
that searches for sections and paragraphs in a document or page
with the specified parameters.
sectionsSearchDepth
- Number of sequential searches performed to find finer structure elements.
paragraphAbsorberOptions
- The ParagraphAbsorberOptions.
public List<PageMarkup> getPageMarkups()
Gets the collection of PageMarkup objects
that were absorbed.
public int getSectionsSearchDepth()
Gets the number of sequential searches performed to find finer structure elements. The default search depth is 3, which means three searches for horizontally divided sections (headers, paragraphs, etc.) and three searches for vertically divided ones (columns).
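The effect of the search depth can be pictured with a toy model. The sketch below is not Aspose code and does not claim to reproduce the library's algorithm, which this page does not describe. It swaps in a plain recursive XY-cut: each pass splits every section into horizontal bands at vertical gaps, then splits every band into columns at horizontal gaps. The class SearchDepthSketch, the methods cut, segment and samplePage, and the minGap threshold are all hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class SearchDepthSketch {

    // Toy model only: a text line box is {x0, y0, x1, y1}, with y growing down the page.

    /** Groups boxes along one axis, starting a new group at every gap wider than minGap. */
    static List<List<double[]>> cut(List<double[]> boxes, boolean alongY, double minGap) {
        int lo = alongY ? 1 : 0; // index of the box start on the chosen axis
        int hi = alongY ? 3 : 2; // index of the box end on the chosen axis
        List<double[]> sorted = new ArrayList<>(boxes);
        sorted.sort(Comparator.comparingDouble((double[] b) -> b[lo]));

        List<List<double[]>> groups = new ArrayList<>();
        List<double[]> current = new ArrayList<>();
        double reach = Double.NEGATIVE_INFINITY; // farthest end seen in the current group
        for (double[] b : sorted) {
            if (!current.isEmpty() && b[lo] - reach > minGap) {
                groups.add(current);
                current = new ArrayList<>();
            }
            current.add(b);
            reach = Math.max(reach, b[hi]);
        }
        if (!current.isEmpty()) {
            groups.add(current);
        }
        return groups;
    }

    /** Runs `depth` passes; each pass does one band search and one column search. */
    static List<List<double[]>> segment(List<double[]> boxes, int depth, double minGap) {
        List<List<double[]>> sections = new ArrayList<>();
        sections.add(boxes);
        for (int pass = 0; pass < depth; pass++) {
            List<List<double[]>> finer = new ArrayList<>();
            for (List<double[]> section : sections) {
                for (List<double[]> band : cut(section, true, minGap)) {
                    finer.addAll(cut(band, false, minGap));
                }
            }
            sections = finer;
        }
        return sections;
    }

    /** Hypothetical page: a full-width header above two columns of text lines. */
    static List<double[]> samplePage() {
        return List.of(
            new double[] {0, 0, 200, 10},    // header
            new double[] {0, 20, 90, 30},    // left column, first paragraph
            new double[] {0, 50, 90, 60},    // left column, second paragraph
            new double[] {110, 20, 200, 30}, // right column, three tight lines
            new double[] {110, 32, 200, 42},
            new double[] {110, 44, 200, 54});
    }

    public static void main(String[] args) {
        for (int depth = 0; depth <= 3; depth++) {
            int count = segment(samplePage(), depth, 5.0).size();
            System.out.println("depth " + depth + ": " + count + " sections");
        }
    }
}
```

In this model the gap between the two left-column paragraphs is hidden during the first pass, because the right column fills that vertical range. It only becomes visible on the second pass, once the columns have been separated, which is one way to see why several sequential searches can find finer structure than a single one.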
public void setSectionsSearchDepth(int value)
Sets the number of sequential searches performed to find finer structure elements. The default search depth is 3, which means three searches for horizontally divided sections (headers, paragraphs, etc.) and three searches for vertically divided ones (columns).
value
- int value
public final boolean isMulticolumnParagraphsAllowed()
Gets a value that indicates whether the starting text lines of a section may be treated as a continuation of the last paragraph of the previous section.
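As a rough illustration of what this flag changes, the sketch below models sections as lists of paragraph strings and joins a paragraph across a section boundary when the flag is on. It is a toy model, not Aspose code. The class MulticolumnSketch, its methods, and in particular the rule that a paragraph "looks unfinished" when it does not end with sentence punctuation are assumptions made for the example; the page does not say how the library decides that text continues.

```java
import java.util.ArrayList;
import java.util.List;

public class MulticolumnSketch {

    /** Assumed heuristic: a paragraph is unfinished if it does not end with . ! or ? */
    static boolean looksUnfinished(String paragraph) {
        String text = paragraph.trim();
        if (text.isEmpty()) {
            return false;
        }
        char last = text.charAt(text.length() - 1);
        return ".!?".indexOf(last) < 0;
    }

    /**
     * Flattens sections into one paragraph list. When allowed is true, the first
     * paragraph of a section is appended to the last paragraph collected so far
     * if that paragraph looks unfinished; otherwise it starts a new paragraph.
     */
    static List<String> paragraphs(List<List<String>> sections, boolean allowed) {
        List<String> result = new ArrayList<>();
        for (List<String> section : sections) {
            for (int i = 0; i < section.size(); i++) {
                int last = result.size() - 1;
                boolean continues = allowed && i == 0 && last >= 0
                        && looksUnfinished(result.get(last));
                if (continues) {
                    result.set(last, result.get(last) + " " + section.get(i));
                } else {
                    result.add(section.get(i));
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<List<String>> columns = List.of(
            List.of("Intro.", "The text flows into"),
            List.of("the next column.", "New paragraph."));
        System.out.println(paragraphs(columns, false));
        System.out.println(paragraphs(columns, true));
    }
}
```

With the flag off, the two columns in main yield four separate paragraphs; with it on, the unfinished paragraph at the bottom of the first column absorbs the opening text of the second.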
public final void setMulticolumnParagraphsAllowed(boolean value)
Sets a value that indicates whether the starting text lines of a section may be treated as a continuation of the last paragraph of the previous section.
value
- boolean value
public final ParagraphAbsorberOptions getParagraphAbsorberOptions()
Gets the ParagraphAbsorberOptions.
public final void setParagraphAbsorberOptions(ParagraphAbsorberOptions value)
Sets the ParagraphAbsorberOptions.
value
- ParagraphAbsorberOptions instance
public final TextReplaceOptions getTextReplaceOptions()
Gets the TextReplaceOptions.
public final void setTextReplaceOptions(TextReplaceOptions value)
Sets the TextReplaceOptions.
value
- TextReplaceOptions instance
public void visit(Document doc)
Performs the search for sections and paragraphs on the specified Document.
doc
- PDF document object.
public void visit(Page page)
Performs the search on the specified Page.
page
- PDF document page object.

Copyright © 2025 Aspose. All Rights Reserved.