Aspose.OCR for Python via .NET 25.1.0 - Release Notes
What was changed
Key | Summary | Category |
---|---|---|
OCRNET‑80 | Recognition results can now be saved in hOCR format. | New feature |
OCRNET‑80 | Optimized searchable PDFs to fully preserve the original image quality and maintain the file size. Note: This improvement applies only when both the source and target files are in PDF format. | Enhancement |
OCRNET‑80 | Removed deprecated APIs to improve code readability and performance. | Enhancement |
OCRNET‑80 | Changed the default language model to English (without diacritics) when no recognition language is explicitly specified. | Enhancement |
Public API changes and backwards compatibility
This section lists all public API changes introduced in Aspose.OCR for Python via .NET 25.1.0 that may affect the code of existing applications.
Added public APIs:
The following public APIs have been introduced in this release:
SaveFormat.HOCR
Instructs the Aspose.OCR library to save the recognition results in hOCR format, an open standard for representing formatted text obtained from OCR. The output includes the extracted text, style, layout, and other information.
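Because hOCR is HTML-based, a saved file can be post-processed with standard tools. The sketch below uses only the Python standard library to pull words and their bounding boxes out of an hOCR fragment. It assumes the file follows the common hOCR conventions (`ocrx_word` elements with a `bbox` property in the `title` attribute). The sample markup is hand-written for illustration and is not actual Aspose.OCR output. The names `HocrWordParser`, `parse_bbox`, and `extract_words` are hypothetical helpers, not part of the library.

```python
from html.parser import HTMLParser

# Hypothetical hOCR fragment, hand-written for illustration (not real library output).
SAMPLE_HOCR = """
<div class='ocr_page' title='bbox 0 0 200 100'>
  <span class='ocr_line' title='bbox 10 10 190 40'>
    <span class='ocrx_word' title='bbox 10 10 80 40; x_wconf 96'>Hello</span>
    <span class='ocrx_word' title='bbox 90 10 190 40; x_wconf 91'>world</span>
  </span>
</div>
"""


def parse_bbox(title):
    """Extract (x0, y0, x1, y1) from an hOCR title attribute, or None if absent."""
    for prop in title.split(";"):
        parts = prop.split()
        if parts and parts[0] == "bbox":
            return tuple(int(v) for v in parts[1:5])
    return None


class HocrWordParser(HTMLParser):
    """Collects (text, bbox) pairs for every element with class 'ocrx_word'."""

    def __init__(self):
        super().__init__()
        self.words = []
        self._in_word = False
        self._bbox = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "ocrx_word" in (attrs.get("class") or "").split():
            self._in_word = True
            self._bbox = parse_bbox(attrs.get("title") or "")
            self._text = []

    def handle_data(self, data):
        if self._in_word:
            self._text.append(data)

    def handle_endtag(self, tag):
        # Assumes word elements are spans without nested spans (true for the sample above).
        if self._in_word and tag == "span":
            self.words.append(("".join(self._text).strip(), self._bbox))
            self._in_word = False


def extract_words(hocr):
    parser = HocrWordParser()
    parser.feed(hocr)
    return parser.words
```

Calling `extract_words(SAMPLE_HOCR)` yields one `(text, bbox)` tuple per word, which is usually enough to overlay text on the source image or to rebuild reading order.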
Updated public APIs:
The following public APIs have been updated in this release:
recognize()
If the language setting is not specified, the OCR engine defaults to a lightweight Latin character model, which does not support diacritics.
If your project involves recognizing languages with diacritics, such as French, German, or Spanish, specify the language explicitly in the recognition settings or use the universal Language.EXT_LATIN model.
recognize_id_card()
If the language setting is not specified, the OCR engine defaults to a lightweight Latin character model, which does not support diacritics.
If your project involves recognizing languages with diacritics, such as French, German, or Spanish, specify the language explicitly in the recognition settings or use the universal Language.EXT_LATIN model.
recognize_passport()
If the language setting is not specified, the OCR engine defaults to a lightweight Latin character model, which does not support diacritics.
If your project involves recognizing languages with diacritics, such as French, German, or Spanish, specify the language explicitly in the recognition settings or use the universal Language.EXT_LATIN model.
recognize_car_plate()
If the language setting is not specified, the OCR engine defaults to a lightweight Latin character model, which does not support diacritics.
If your project involves recognizing languages with diacritics, such as French, German, or Spanish, specify the language explicitly in the recognition settings or use the universal Language.EXT_LATIN model.
recognize_invoice()
If the language setting is not specified, the OCR engine defaults to a lightweight Latin character model, which does not support diacritics.
If your project involves recognizing languages with diacritics, such as French, German, or Spanish, specify the language explicitly in the recognition settings or use the universal Language.EXT_LATIN model.
recognize_receipt()
If the language setting is not specified, the OCR engine defaults to a lightweight Latin character model, which does not support diacritics.
If your project involves recognizing languages with diacritics, such as French, German, or Spanish, specify the language explicitly in the recognition settings or use the universal Language.EXT_LATIN model.
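To decide whether the new default is sufficient for your documents, it can help to audit a sample of the text you expect to recognize. The following is a minimal sketch using only the Python standard library. `has_diacritics` is a hypothetical helper, not part of Aspose.OCR, and it only detects letters that Unicode decomposes into a base letter plus a combining mark. Letters such as ß or ø do not decompose this way and are not flagged.

```python
import unicodedata


def has_diacritics(text):
    """Return True if any character decomposes into a base letter plus a combining mark."""
    for ch in text:
        # NFD splits e.g. "é" into "e" followed by U+0301 (combining acute accent).
        decomposed = unicodedata.normalize("NFD", ch)
        if any(unicodedata.combining(c) for c in decomposed):
            return True
    return False
```

If the check returns True for representative samples, the default model is unlikely to be enough, and the language should be set explicitly as described above.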
Removed public APIs:
The following APIs deprecated during 2024 have been removed in this release:
recognize_street_photo()
Use the universal recognize() method, which gives you control over recognition settings, multiple languages, image regions, spellcheck, and other advanced features.
DetectAreasMode.NONE
Use DetectAreasMode.LEAN, which provides the same functionality.
DetectAreasMode.DOCUMENT
Use DetectAreasMode.MULTICOLUMN, which provides the same functionality.
DetectAreasMode.TEXT_IN_WILD
Use DetectAreasMode.UNIVERSAL, which detects all blocks of text in the image, including sparse and irregular text on street photos.
DetectAreasMode.COMBINE
Use DetectAreasMode.UNIVERSAL, which works best with sparse irregular text.
DetectAreasMode.PHOTO
Use DetectAreasMode.UNIVERSAL, which is optimal for all types of text except for multi-column layouts and tables.
Language.CZE
Use Language.CES to recognize texts in Czech.
Language.DUM
Use Language.NLD to recognize texts in Dutch.
Language.RUM
Use Language.RON to recognize texts in Romanian.
Language.SRP_HRV
Use Language.HBS to recognize texts in Serbo-Croatian (Latin alphabet).
Language.CHI
Use Language.CHINESE to recognize all Chinese languages, including mixed-language Chinese/English texts.
Language.NONE
Specify the recognition language directly. If the language is not specified, the OCR engine defaults to a lightweight Latin character model, which does not support diacritics.
SpellCheckLanguage.CZE
Use SpellCheckLanguage.CES to check the spelling in Czech texts.
SpellCheckLanguage.DUM
Use SpellCheckLanguage.NLD to check the spelling in Dutch texts.
SpellCheckLanguage.RUM
Use SpellCheckLanguage.RON to check the spelling in Romanian texts.
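When upgrading an existing code base, the removals above can be checked mechanically. The sketch below scans Python source text for the removed identifiers and reports the suggested replacement for each. The mapping is transcribed from the list above. `find_removed_apis` and the surrounding names are hypothetical helpers written for this illustration, not part of Aspose.OCR. It is a plain text search, so it can also match names inside comments or strings.

```python
import re

# Removed identifier -> replacement, transcribed from the list above.
REPLACEMENTS = {
    "DetectAreasMode.NONE": "DetectAreasMode.LEAN",
    "DetectAreasMode.DOCUMENT": "DetectAreasMode.MULTICOLUMN",
    "DetectAreasMode.TEXT_IN_WILD": "DetectAreasMode.UNIVERSAL",
    "DetectAreasMode.COMBINE": "DetectAreasMode.UNIVERSAL",
    "DetectAreasMode.PHOTO": "DetectAreasMode.UNIVERSAL",
    "Language.CZE": "Language.CES",
    "Language.DUM": "Language.NLD",
    "Language.RUM": "Language.RON",
    "Language.SRP_HRV": "Language.HBS",
    "Language.CHI": "Language.CHINESE",
    "SpellCheckLanguage.CZE": "SpellCheckLanguage.CES",
    "SpellCheckLanguage.DUM": "SpellCheckLanguage.NLD",
    "SpellCheckLanguage.RUM": "SpellCheckLanguage.RON",
}

# Removed without a drop-in replacement: these call sites need a manual rewrite.
MANUAL = ("recognize_street_photo", "Language.NONE")

# Longest names first; the lookbehind keeps "Language.CZE" from matching
# inside "SpellCheckLanguage.CZE", and \b keeps "Language.CHI" from
# matching inside "Language.CHINESE".
_names = sorted(list(REPLACEMENTS) + list(MANUAL), key=len, reverse=True)
_PATTERN = re.compile(r"(?<!\w)(?:" + "|".join(re.escape(n) for n in _names) + r")\b")


def find_removed_apis(source):
    """Return (line_number, removed_name, suggestion) for each removed API found."""
    findings = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        for match in _PATTERN.finditer(line):
            name = match.group(0)
            suggestion = REPLACEMENTS.get(name, "manual change required")
            findings.append((line_no, name, suggestion))
    return findings
```

Feeding each project file's contents to `find_removed_apis` gives a checklist of lines to update before moving to 25.1.0.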
Examples
The code samples below illustrate the changes introduced in this release:
Save recognition results to hOCR
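# Import the classes used below (assumes Aspose.OCR for Python via .NET is installed)
from aspose.ocr import AsposeOcr, OcrInput, InputType, SaveFormat

# Instantiate Aspose.OCR API
api = AsposeOcr()
# Add image to the recognition batch (named ocr_input to avoid shadowing the built-in input())
ocr_input = OcrInput(InputType.SINGLE_IMAGE)
ocr_input.add("source.png")
# Recognize the image
results = api.recognize(ocr_input)
# Save the first recognition result in hOCR format
results[0].save("result.hocr", SaveFormat.HOCR)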