Optical chemical structure recognition
Optical chemical structure recognition (OCSR) is the translation of images that depict chemical structure information into machine-readable formats.[1] It addresses the challenge of converting chemical structures from graphical depictions into their corresponding chemical formulas. In scientific publications, documents, and textbooks, molecular structures are typically represented through images and annotated text. These structural formulas are drawn as chemical graphs, in which the vertices represent atoms and the edges represent the bonds between them.

Much of the chemical information in older publications, whether in images or in descriptive text, has never been digitised, so extracting it is a time-consuming, manual process. OCSR can also be applied to digital images of molecules available online and to scanned pages of chemical documents.[2]

The first OCSR systems were limited by the computational resources of the time and by the early state of computer vision and machine learning algorithms. They relied primarily on heuristic and rule-based approaches, supported by classical artificial intelligence (AI) and optical character recognition techniques. Advances in hardware, cloud computing, and deep neural networks have since transformed OCSR. Modern systems employ attention-based and context-aware image classification models, eliminating the need for separate pre-processing steps such as noise removal or image restoration.[3]
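The chemical-graph view described above, with atoms as vertices and bonds as edges, can be illustrated with a small data structure. The following is a minimal sketch, not the internal representation of any particular OCSR system: the names MoleculeGraph and TYPICAL_VALENCE are hypothetical, and the valence table is a simplifying assumption that covers only a few neutral elements. It shows how a recognised graph could be turned into a molecular formula by filling in the hydrogens that skeletal drawings usually leave implicit.

```python
from collections import Counter
from dataclasses import dataclass, field

# Assumption: typical valences of a few neutral elements, enough for a demo.
TYPICAL_VALENCE = {"C": 4, "N": 3, "O": 2, "H": 1}


@dataclass
class MoleculeGraph:
    """Hypothetical container: atoms are vertices, bonds are edges."""
    atoms: list[str] = field(default_factory=list)                    # element symbols
    bonds: list[tuple[int, int, int]] = field(default_factory=list)  # (i, j, bond order)

    def add_atom(self, element: str) -> int:
        """Add a vertex and return its index."""
        self.atoms.append(element)
        return len(self.atoms) - 1

    def add_bond(self, i: int, j: int, order: int = 1) -> None:
        """Add an edge between atoms i and j with the given bond order."""
        self.bonds.append((i, j, order))

    def implicit_hydrogens(self, index: int) -> int:
        """Hydrogens needed to fill the atom's typical valence."""
        used = sum(order for i, j, order in self.bonds if index in (i, j))
        return max(TYPICAL_VALENCE[self.atoms[index]] - used, 0)

    def formula(self) -> str:
        """Molecular formula in Hill order (C, then H, then alphabetical)."""
        counts = Counter(self.atoms)
        counts["H"] += sum(self.implicit_hydrogens(k) for k in range(len(self.atoms)))
        if "C" in counts:
            order = ["C", "H"] + sorted(e for e in counts if e not in ("C", "H"))
        else:
            order = sorted(counts)
        return "".join(
            f"{e}{counts[e] if counts[e] > 1 else ''}" for e in order if counts[e] > 0
        )


# Ethanol drawn as a skeleton: C-C-O, hydrogens left implicit.
ethanol = MoleculeGraph()
c1 = ethanol.add_atom("C")
c2 = ethanol.add_atom("C")
o = ethanol.add_atom("O")
ethanol.add_bond(c1, c2)
ethanol.add_bond(c2, o)
print(ethanol.formula())  # C2H6O
```

A real system would also need to handle charges, stereochemistry, and abbreviated groups, which this sketch ignores.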