Aspose.Words for Python via .NET 22.7 Release Notes

Major Features

There are 85 improvements and fixes in this regular monthly release. The most notable are:

  • Implemented the ability to convert PDF documents to fixed-page formats with high fidelity and performance.
  • Implemented support for WCAG 2.0 PDF.
  • Implemented our own glyph outline parsing for OpenType (CFF) fonts.
  • Introduced a new HTML import mode for block-level elements.
  • Added the ability to set shadow formatting for shape objects.

Full List of Issues Covering all Changes in this Release (Reported by .NET Users)

Key | Summary | Category
WORDSNET-13702 | Support parsing of glyph data for OpenType(CFF) | New Feature
WORDSNET-15752 | Support DATABASE field | New Feature
WORDSNET-19220 | Add feature to support WCAG 2.0 PDF | New Feature
WORDSNET-23295 | Add a flag to take EXIF orientation in account while inserting a JPEG image by LINQ Reporting Engine | New Feature
WORDSNET-23654 | Add a new mode for import HTML block-level elements during inserting HTML via DocumentBuilder.InsertHtml method | New Feature
WORDSNET-18125 | Make sure saving to tagged PDF follows Section 508 Guidelines | Enhancement
WORDSNET-6892 | TextBox is not preserved on HTML import | Bug
WORDSNET-14009 | Text Font and Gradient fill not saved in PDF output | Bug
WORDSNET-20981 | Word document converted to PDF results in different font for last page | Bug
WORDSNET-21368 | Unexpected Bold Formatting to custom style during Word to HTML to Word conversion | Bug
WORDSNET-22323 | DOCX to PDF conversion issue with formula/equation rendering | Bug
WORDSNET-22948 | Import of SVG image differs from what is in browser | Bug
WORDSNET-23313 | Invalidate document layout after calling Document.Compare with two PDF documents | Bug
WORDSNET-23544 | Document missing sections after saving | Bug
WORDSNET-23646 | Date X-Axis shows values with incorrect step | Bug
WORDSNET-23684 | Incorrect calculation of indents for border box around the formula | Bug
WORDSNET-23701 | Font size is not exported to HTML | Bug
WORDSNET-23706 | Numbering is broken after converting document to HTML | Bug
WORDSNET-23709 | Shape stroke is not rendered to JPEG | Bug
WORDSNET-23783 | Consider disabling support for external resources when loading EPUB documents | Bug
WORDSNET-23810 | Incorrect background image after Pdf2Word conversion | Bug
WORDSNET-23817 | Header height is changed that leads to layout issues | Bug
WORDSNET-23828 | Content is removed after saving the document | Bug
WORDSNET-23829 | DOCX to PDF: Characters rendered as boxes | Bug
WORDSNET-23841 | Text orientation is turned to vertical after converting to HTML | Bug
WORDSNET-23851 | Data label values are rendered improperly | Bug
WORDSNET-23855 | CryptographicException: The input data is not a complete block | Bug
WORDSNET-23865 | KeepSourceFormatting does not honor source document style | Bug
WORDSNET-23866 | Field updating hangs if document is optimized for Word2016 | Bug
WORDSNET-23867 | Wrong outlines are returned for the space character | Bug
WORDSNET-23869 | Incorrect font detection when rendering a formula | Bug
WORDSNET-23874 | Thickness of hairline is different when render with .NET and .NET Standard versions | Bug
WORDSNET-23875 | Header row is not repeated upon rendering for a floating table | Bug
WORDSNET-23878 | Text is wrapped improperly | Bug
WORDSNET-23886 | Style applied to text is changed after open/save DOCX document | Bug
WORDSNET-23888 | Aspose.Words hangs for a while upon loading MHTML file | Bug
WORDSNET-23889 | Wrong list numbering in SDT bound to custom XML part | Bug
WORDSNET-23890 | Evaluation watermark in ODT document overlaps content of the document | Bug
WORDSNET-23902 | Redundant space between letter is added upon rendering SVG image | Bug
WORDSNET-23913 | FileNotFoundException is thrown upon loading DOCX document | Bug
WORDSNET-23918 | ArgumentException because of duplicates in CustomDocumentProperties | Bug
WORDSNET-23919 | Aspose.Words hangs upon updating fields or layout | Bug
WORDSNET-23922 | Incorrect font detection for East Asian characters when rendering a formula | Bug
WORDSNET-23924 | InvalidCastException is thrown upon updating fields | Bug
WORDSNET-23925 | Word document not saving PNG | Bug
WORDSNET-23929 | Text is wrapped differently after rendering | Bug
WORDSNET-23936 | Reverse order of replies on the comment in the air | Bug
WORDSNET-23941 | ZlibException: Bad state (invalid distance code) | Bug
WORDSNET-23942 | Images are rendered in PDF as red cross | Bug
WORDSNET-23947 | System.OverflowException: Value was either too large or too small for an Int32 | Bug
WORDSNET-23948 | InvalidOperationException: MediaBox is null | Bug
WORDSNET-23950 | Reply naming differences within export to PDF | Bug
WORDSNET-23951 | Formatting issue on the latest Pdf2Word release | Bug
WORDSNET-23952 | Chart axis are not visible when render as SVG | Bug
WORDSNET-23954 | List labels in Swedish are rendered in English | Bug
WORDSNET-23955 | Spacing between numbers and Chinese hieroglyphs is too big in chart axis labels | Bug
WORDSNET-23958 | Exception when comparing documents | Bug
WORDSNET-23963 | List label is added to the paragraph on the next page when ExtractPages is used | Bug
WORDSNET-23965 | InvalidOperationException is thrown upon rendering document | Bug
WORDSNET-23974 | Style separator produces line break after rendering | Bug
WORDSNET-23976 | Korean text is not wrapped properly when WordWrap option is disabled | Bug
WORDSNET-23981 | DOCX to MD conversion exception | Bug
WORDSNET-24010 | ImportStyle() returns null for KeepDifferentStyles | Bug
WORDSNET-24034 | InvalidOperationException is thrown upon comparing document | Bug

Full List of Issues Covering all Changes in this Release (Reported by Java Users)

Key | Summary | Category
WORDSNET-21279 | Arabic text rendered LTR (garbled) when converting from document to PDF | Bug
WORDSNET-21764 | Math equations are blurred during exporting Word to HTML on Linux | Bug
WORDSNET-22648 | Incorrect Rendering of Math Equations in PDF | Bug
WORDSNET-22896 | Font Fallback does not work properly for text within SVG images | Bug
WORDSNET-23598 | Part of content is moved to previous page | Bug
WORDSNET-23599 | Whitespaces font is reset to Arial upon importing HTML | Bug
WORDSNET-23623 | API fails to load EML files as MHTML | Bug
WORDSNET-23781 | UpdatePageLayout hangs | Bug
WORDSNET-23862 | Chinese text in SVG is rendered as tofu when convert to PDF | Bug
WORDSNET-23877 | Provide API to remove the shape shadows | Bug
WORDSNET-23893 | InvalidOperationException is thrown upon executing mail merge | Bug
WORDSNET-23909 | Numbering is changed after inserting document | Bug
WORDSNET-23910 | Font is changed after inserting document when KeepDifferentStyles is used | Bug
WORDSNET-23927 | NullReferenceException is thrown upon rendering document | Bug
WORDSNET-23937 | Layout is different after DOCX to PDF conversion | Bug
WORDSNET-23938 | FileCorruptedException is thrown upon loading DOCX document | Bug
WORDSNET-23968 | Hanging during export to PDF | Bug
WORDSNET-23970 | Header and footer are lost after rendering | Bug
WORDSNET-23979 | Word to PDF - conversion issue with floating table header rows | Bug
WORDSNET-23980 | IF field with wildcard is updated improperly | Bug
WORDSNET-24007 | FileCorruptedException on loading RTF file | Bug

Public API and Backward Incompatible Changes

This section lists public API changes introduced in Aspose.Words 22.7. It covers not only new and obsoleted public methods, but also behind-the-scenes behavior changes in Aspose.Words that may affect existing code. Any change that modifies existing behavior and could be seen as a regression is especially important and is documented here.

Added a new mode for importing HTML block-level elements when inserting HTML via the DocumentBuilder.insert_html() method

Related issue: WORDSNET-23654

A new HTML insertion option was added to the HtmlInsertOptions enum.

class HtmlInsertOptions(IntEnum):
    ...
    
    # Preserve properties of block-level elements.
    #
    # By default, properties of parent blocks are merged and stored on their child elements (i.e. paragraphs or tables).
    # If this option is specified, properties of each block are stored separately in a special logical structure.
    # As a result, this option better preserves the individual borders and margins seen in the HTML document
    # and gives better conversion results. The downside is that the resulting document becomes harder to modify,
    # since borders and margins stored in the logical structure are not available for editing.
    #
    # Only margins and borders of 'body', 'div', and 'blockquote' HTML elements are preserved. Properties of each HTML
    # element are stored separately.
    #
    # If this option is specified, Aspose.Words mimics MS Word's behavior regarding import of block properties.
    PRESERVE_BLOCKS = 4

The new mode for importing HTML block-level elements via the DocumentBuilder.insert_html() method better preserves the borders and margins seen in the HTML document and gives better conversion results.

import aspose.words as aw

my_dir = "./"  # Output folder (placeholder); adjust to your environment.

html = """
<html>
    <div style='border:dotted'>
        <div style='border:solid'>
            <p>paragraph 1</p>
            <p>paragraph 2</p>
        </div>
    </div>
</html>
"""

# Set the new mode of import HTML block-level elements.
insert_options = aw.HtmlInsertOptions.PRESERVE_BLOCKS
builder = aw.DocumentBuilder()
builder.insert_html(html, insert_options)
builder.document.save(my_dir + "sample.docx")
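
To make the difference between the two import modes more concrete, here is a toy model written with the standard library only. It is not Aspose.Words code and does not reproduce the library's actual merge rules: the `BlockModel` class, the `border_of` helper, and the "innermost border wins" merge rule are all assumptions made for illustration. It only shows why a flat, per-paragraph representation loses nested block borders while a separate block structure keeps them.

```python
from html.parser import HTMLParser

# Block elements whose margins and borders are preserved, per the notes above.
BLOCK_TAGS = {"body", "div", "blockquote"}


def border_of(attrs):
    """Return the CSS 'border' value from a tag's style attribute, or None."""
    style = dict(attrs).get("style") or ""
    for declaration in style.split(";"):
        name, _, value = declaration.partition(":")
        if name.strip() == "border":
            return value.strip()
    return None


class BlockModel(HTMLParser):
    """Hypothetical helper: records which block borders enclose each <p>."""

    def __init__(self):
        super().__init__()
        self.stack = []       # borders of currently open block elements
        self.paragraphs = []  # per <p>: enclosing borders, outermost first

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self.stack.append(border_of(attrs))
        elif tag == "p":
            self.paragraphs.append([b for b in self.stack if b])

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS and self.stack:
            self.stack.pop()


html = """
<html>
    <div style='border:dotted'>
        <div style='border:solid'>
            <p>paragraph 1</p>
            <p>paragraph 2</p>
        </div>
    </div>
</html>
"""

model = BlockModel()
model.feed(html)

# "Preserved" view: every enclosing block keeps its own border.
preserved = model.paragraphs
# "Merged" view (assumed rule): one border per paragraph, innermost wins.
merged = [chain[-1] if chain else None for chain in preserved]

print("merged:   ", merged)
print("preserved:", preserved)
```

In this toy model the merged view keeps a single border per paragraph, so the outer dotted border is lost, while the preserved view keeps both nesting levels.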

Added new public property shadow_format

Related issue: WORDSNET-23877

A new public shadow_format property has been added to the ShapeBase class.

class ShapeBase:
    ...

    @property
    def shadow_format(self) -> aw.drawing.ShadowFormat:
        """Gets shadow formatting for the shape."""
        ...

With this property customers can set or get one of the preset shadow types.

class ShadowFormat:
    ...

    @property
    def type(self) -> aw.drawing.ShadowType:
        """Gets the specified ShadowType for ShadowFormat."""
        ...

    @type.setter
    def type(self, value: aw.drawing.ShadowType):
        """Sets the specified ShadowType for ShadowFormat."""
        ...

Users can also get information about a shadow’s visibility.

class ShadowFormat:
    ...

    @property
    def visible(self) -> bool:
        """Returns True if the formatting applied to this instance is visible.
        
        Unlike clear(), assigning False to visible does not clear the formatting,
        it only hides the shape effect."""
        ...

It is also possible to clear the ShadowFormat.

class ShadowFormat:
    ...

    def clear(self):
        """Clears shadow format."""
        ...

Use Case:

doc = aw.Document("DocumentWithShape.docx")
shape = doc.first_section.body.get_child(aw.NodeType.SHAPE, 0, True).as_shape()
# Checking whether the shadow effect is visible and whether the preset type is SHADOW2.
if shape.shadow_format.visible and shape.shadow_format.type == aw.drawing.ShadowType.SHADOW2:
    # Setting the preset shadow type to SHADOW7.
    shape.shadow_format.type = aw.drawing.ShadowType.SHADOW7
# Checking whether the shadow is customized, i.e. the preset type is SHADOW_MIXED.
if shape.shadow_format.type == aw.drawing.ShadowType.SHADOW_MIXED:
    # Clearing ShadowFormat.
    shape.shadow_format.clear()

ReportBuildOptions.RESPECT_JPEG_EXIF_ORIENTATION enum member

Related issue: WORDSNET-23295

The following member has been added to the ReportBuildOptions enum:

class ReportBuildOptions(IntEnum):
    ...

    # Specifies that the engine should use EXIF image orientation values to appropriately rotate inserted
    # JPEG images.
    RESPECT_JPEG_EXIF_ORIENTATION = 16

The option can be applied while building a report in the following way:

engine = aw.reporting.ReportingEngine()
engine.options |= aw.reporting.ReportBuildOptions.RESPECT_JPEG_EXIF_ORIENTATION
engine.build_report(...)
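
The `|=` in the snippet above adds one flag without disturbing options that are already set. The sketch below shows that bit arithmetic with a stand-in enum, so it runs without Aspose.Words installed. `ReportBuildOptionsSketch` is hypothetical, and the `REMOVE_EMPTY_PARAGRAPHS = 2` member is an assumption included only to have a second flag to combine; the value 16 is the one given in these notes.

```python
from enum import IntFlag


class ReportBuildOptionsSketch(IntFlag):
    """Stand-in for aw.reporting.ReportBuildOptions (illustration only)."""

    NONE = 0
    REMOVE_EMPTY_PARAGRAPHS = 2          # assumed member, not taken from these notes
    RESPECT_JPEG_EXIF_ORIENTATION = 16   # value taken from these notes


# Start with an option that was already enabled elsewhere.
options = ReportBuildOptionsSketch.REMOVE_EMPTY_PARAGRAPHS

# Add the new flag; previously set bits are kept.
options |= ReportBuildOptionsSketch.RESPECT_JPEG_EXIF_ORIENTATION

# Test individual flags with a bitwise AND.
exif_enabled = bool(options & ReportBuildOptionsSketch.RESPECT_JPEG_EXIF_ORIENTATION)
still_removes_empty = bool(options & ReportBuildOptionsSketch.REMOVE_EMPTY_PARAGRAPHS)

print(int(options), exif_enabled, still_removes_empty)
```

The stand-in uses `IntFlag` so that the combined value remains an enum instance. With a plain `IntEnum`, `|` yields an ordinary `int`, but the bit tests work the same way.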

Added new class for saving PDFs to other fixed formats

Related feature task: WORDSNET-23059

We’ve added a new way to work with PDF input files: they can now be converted to a fixed format without using the Aspose.Words layout model.

That is, the feature runs without the Document class and returns the result as a stream object.

Example:

pdf_renderer = aw.pdf2word.fixedformats.PdfFixedRenderer()
options = aw.pdf2word.fixedformats.PdfFixedOptions()
options.page_index = 0
options.page_count = 2
result_stream = pdf_renderer.save_pdf_as_html(pdf_stream, options)

Pros:

  • More accurate conversion (positions of text and other elements).
  • Better performance and memory usage (less logic to run, no need to build flow models, etc).

Cons:

  • The list of output formats is limited for now (PDF, HTML, XPS, JPEG, PNG, TIFF, BMP).
  • There is no way to edit the data during the conversion.
  • Only a few options are available, such as password, page range, and JPEG image quality.

Supported methods:

save_pdf_as_html(...)
save_pdf_as_xps(...)
save_pdf_as_images(...)
save_pdf_as_pdf(...)
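
As a rough sketch of how the output formats listed under Cons could map onto these four methods, the helper below picks a method name from a format name. The `method_name_for` function and both lookup tables are hypothetical. Routing JPEG, PNG, TIFF, and BMP to `save_pdf_as_images` is an inference from the option descriptions in these notes, not a documented rule.

```python
# Formats assumed to be produced by save_pdf_as_images (inferred, see lead-in).
IMAGE_FORMATS = {"jpeg", "png", "tiff", "bmp"}

# Non-image formats and the method names listed in these notes.
METHOD_BY_FORMAT = {
    "html": "save_pdf_as_html",
    "xps": "save_pdf_as_xps",
    "pdf": "save_pdf_as_pdf",
}


def method_name_for(output_format: str) -> str:
    """Return the renderer method name for a fixed output format name."""
    key = output_format.strip().lower()
    if key in IMAGE_FORMATS:
        return "save_pdf_as_images"
    try:
        return METHOD_BY_FORMAT[key]
    except KeyError:
        raise ValueError(f"unsupported fixed output format: {output_format!r}") from None


print(method_name_for("Jpeg"))
print(method_name_for("XPS"))
```

The returned name could then be looked up on a renderer instance with `getattr`. The exact arguments of each method are elided in these notes, so check them against the API reference before calling.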

Available options:

  • page_index and page_count can be used to select a subset of pages.
  • password - decrypts an encrypted PDF; the output is not encrypted.
  • jpeg_quality - can be set before calling save_pdf_as_images to control the output JPEG image quality.
  • image_format - specifies the output image format for save_pdf_as_images.

All options are optional and can be omitted in favor of default values.
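
To illustrate how page_index and page_count select a subset of pages, here is a small stand-alone sketch. It assumes page_index is zero-based (as the example with page_index = 0 suggests), that an omitted page_count means "through the last page", and that a range running past the end of the document is clamped. None of these details is confirmed by these notes, and `selected_page_indices` is a hypothetical helper, not part of the Aspose.Words API.

```python
from typing import List, Optional


def selected_page_indices(
    total_pages: int, page_index: int = 0, page_count: Optional[int] = None
) -> List[int]:
    """Return the zero-based page indices covered by a page range (assumed semantics)."""
    if page_index < 0:
        raise ValueError("page_index must be non-negative")
    if page_count is not None and page_count < 0:
        raise ValueError("page_count must be non-negative")
    if page_count is None:
        end = total_pages  # assumed default: all remaining pages
    else:
        end = min(total_pages, page_index + page_count)  # assumed clamping
    return list(range(page_index, end))


# Mirrors the example above: the first two pages of a five-page PDF.
print(selected_page_indices(5, page_index=0, page_count=2))
```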