Abstract
This project focuses on optical character recognition (OCR), the process of classifying the character images in a document according to numerical or other features. OCR comprises three sub-processes: segmentation, feature extraction, and classification of the textual characters. The term “text capture” refers to the process of digitising previously analogue textual material. The converted material may then be put to many other uses, such as index text that helps locate specific files or pictures. With OCR, a digital picture of printed or handwritten text can be converted into a machine-readable digital text format. The digital picture is first broken down into its constituent parts and analysed to identify any regions of text, words, or character blocks. Next, the character blocks are disassembled into their constituent parts and compared to a character dictionary. To interpret the content of the Portable Document Format (PDF) files uploaded to the server, we use natural language processing (NLP) to identify recurrent terms, such as the names of people and things, across several pages. This provides a high-level overview of the material without requiring it to be read from start to finish. The extracted text can also be used to detect plagiarism and to translate the material into other languages. Python offers a suitable environment for tackling this problem, since it gives access to comprehensive libraries for performing OCR operations and is also widely used for data mining, algorithm design, and computational science. It helps us solve the problem more quickly and offers a straightforward solution.
doi: 10.17756/nwj.2023-s4-090
Citation: Mourya M, Manasa A, Sai Kumar B, Sunil D. 2023. Text Analysis and Summarization: Innovating Information Processing Through Advanced Technology. NanoWorld J 9(S4): S528-S532.
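As a minimal sketch of the recurrent-term step mentioned in the abstract, the Python snippet below counts terms that appear on more than one page. It assumes the page text has already been extracted by OCR, and it substitutes plain frequency counting for the NLP stage, which the abstract does not specify. The function names (`tokenize`, `recurrent_terms`), the stop-word list, and the sample pages are all hypothetical and chosen only for illustration.

```python
import re
from collections import Counter

# Illustrative stop-word list (an assumption; a real pipeline would use a fuller one).
STOP_WORDS = {
    "the", "a", "an", "and", "or", "of", "to", "in", "is", "are",
    "it", "for", "on", "with", "that", "this", "as", "by", "be", "into",
}


def tokenize(text):
    """Lower-case the text and return its alphabetic tokens, minus stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]


def recurrent_terms(pages, min_pages=2, top_n=5):
    """Return up to top_n (term, total_count) pairs for terms that occur
    on at least min_pages distinct pages, most frequent first."""
    total = Counter()      # occurrences of each term across all pages
    page_hits = Counter()  # number of distinct pages containing each term
    for page in pages:
        tokens = tokenize(page)
        total.update(tokens)
        page_hits.update(set(tokens))
    recurring = [(t, c) for t, c in total.items() if page_hits[t] >= min_pages]
    # Sort by descending count, then alphabetically so ties are deterministic.
    return sorted(recurring, key=lambda tc: (-tc[1], tc[0]))[:top_n]


# Hypothetical OCR output, one string per page.
pages = [
    "The scanner converts the invoice into text.",
    "Each invoice lists a customer and a total.",
    "The customer pays the invoice total.",
]
print(recurrent_terms(pages))
```

Requiring a term to occur on several pages, rather than merely often, filters out words that are frequent on a single page but say little about the document as a whole. A fuller implementation would likely replace the frequency count with part-of-speech tagging or named-entity recognition to isolate names and things.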