Browse our Products
Latest Release
Deprecation warning
The release 24.3.0 updates the codes of some recognition languages to align with ISO 639-2 standard.
To make it easier to upgrade your code, we have kept all legacy values, but marked them as deprecated. All of your existing code will continue to work and you can even make minor updates to it, but be aware that all deprecated language codes are scheduled to be removed in release 25.1.0 (January 2025).
Time to deprecation: 6 months left.
What was changed
Key | Summary | Category |
---|---|---|
OCRPY‑71 | Added Arabic language recognition and recognition of texts in mixed Arabic/English. | New feature |
OCRPY‑71 | Added Persian (Farsi) language recognition and recognition of texts in mixed Persian/English. | New feature |
OCRPY‑71 | Added Urdu language recognition and recognition of texts in mixed Persian/English. | New feature |
OCRPY‑71 | Added Uyghur language recognition and recognition of texts in mixed Persian/English. | New feature |
OCRPY‑71 | Automatic detection of problematic areas of an image that can significantly impact the accuracy of OCR. | New feature |
OCRPY‑71 | Embedding of user-specified fonts in recognition results saved as PDFs. | New feature |
OCRPY‑71 | Improved saving of recognition results as searchable PDFs. | Enhancement |
Public API changes and backwards compatibility
This section lists all public API changes introduced in Aspose.OCR for Python via Java 24.7.0 that may affect the code of existing applications.
Added public APIs:
The following public APIs have been added to Aspose.OCR for Java 24.7.0 release:
detect_defects()
method
Automatically find potentially problematic areas of image and return the information on the type of defect and its coordinates.
DefectType
enumeration
Image defects that can be detected automatically:
Defect | Value | Description |
---|---|---|
Salt-and-pepper noise | SALT_PEPPER_NOISE | Appears as random white and black pixels scattered across the area. Often occurs in digital photographs. |
Low contrast between text and background | LOW_CONTRAST | Highlights and shadows typically appear on curved pages. |
Blur | BLUR | The entire image or some of its areas are out of focus. Important: This detection algorithm can only identify the entire image as blurry. Specific areas cannot be detected. |
Glare | GLARE | Highlight areas in an image caused by uneven lighting, such as spot lights or flash. |
All supported defects | ALL | All above-mentioned defects. |
DefectAreas
class
Image areas containing a certain type of defect.
Property | Type | Description |
---|---|---|
defectType | DefectType | Defect type (see DefectType enumeration above). |
rectangles | Rectangle[] | Image areas where the defect was found. |
DefectOutput
class
Image areas containing a certain type of defect.
Property | Type | Description |
---|---|---|
source | string | The full path to the file or URL, if any. Empty for streams, byte arrays, and Base64 encoded files. |
page | int | The page number for multi-page images and PDFs. |
defectAreas | DefectAreas[] | The array of image defects and areas where they were found (see DefectAreas class above). |
save_multipage_document_user_font()
method
Save recognition results into a PDF document with embedded TrueType (.TTF) or OpenType (.OTF) font.
Updated public APIs:
The following public APIs have been changed in Aspose.OCR for Python via Java 24.7.0 release:
Language
enumeration
Aspose.OCR for Python via Java can now recognize 4 new alphabets, including texts in mixed languages:
Value | Alphabet |
---|---|
Language.ARA | Arabic and English |
Language.PES | Persian (Farsi) and English |
Language.UIG | Uyghur and English |
Language.URD | Urdu and English |
The following public APIs have been introduced in this release:
Removed public APIs:
No changes
Examples
The code samples below illustrate the changes introduced in this release:
Recognize Arabic text
import aspose as ocr
api = ocr.AsposeOcr()
images = ocr.OcrInput(ocr.InputType.SINGLE_IMAGE)
images.add("source.png")
recognitionSettings = RecognitionSettings()
recognitionSettings.set_language(ocr.Language.ARA)
result = api.recognize(images, recognitionSettings)
print(result[0].recognition_text)
Embed custom font into saved PDF
import aspose as ocr
api = ocr.AsposeOcr()
images = ocr.OcrInput(ocr.InputType.PDF)
images.add("source.pdf")
result = api.recognize(images, recognitionSettings)
api.save_multipage_document_user_font("results.pdf", Format.PDF, result, "fonts/AdobeMingStd-Light.otf")