Digitization and data conversion


Digitization is the process of converting raw data or printed books into digital formats such as XML, SGML, EPUB and other electronic formats. Data conversion is the conversion of one form of digital data into another, carried out mostly to meet application interoperability requirements or to take advantage of new features. Aster, a leader in digitization and data conversion, transforms various data formats into functional formats that are compatible with the web and other digital devices (such as EPUB, Mobi, Nook, etc.). We convert hard copies (books, journals, magazines or newspapers) into electronic formats such as PDF, Word, MS Excel, TIFF, JPEG, QuarkXPress, InDesign, FrameMaker, etc., and then into XML, SGML, HTML and other digital formats. The converted files are validated against the Document Type Definition (DTD) specified by the client. We have the expertise and capability to undertake large and complex data conversion projects involving multiple file formats.

Our services include:


Scanning:

Aster provides high-quality scanning services: we scan hard copies of books, journals or newspapers and convert them to PDF, JPEG or TIFF format.

Optical Character Recognition (OCR):

OCR converts the text in scanned PDF, JPEG, TIFF or Word files into editable electronic text. Aster has the technology and a professionally skilled team to convert scanned data into editable text that is error-free, seamlessly integrated and easily readable. This text is later used for XML conversion.

Defining Metadata:

As the volume of digital data grows, metadata is becoming an increasingly important feature of structured content, especially for finding a specific resource in a digital library. It is used mainly by search engines, and it makes documents in a database easier to find when they are looked up.

We capture the following in the body of the document:

  • Section levels
  • Paragraph
  • Emphasis – bold, italics and underline
  • Lists – numbered, unnumbered, bulleted and other list formats
  • Inline and display equations, captured using MathML
  • Tables, tagged according to the source alignment
  • Abbreviations and glossary
  • Cross-reference links for tables, figures, notes, references, etc.
  • Bibliographical references, with the following details:
      • Author
      • Year
      • Article / book title
      • Journal / book name
      • Journal supplement
      • Volume / edition number
      • Issue number
      • ISSN / ISBN (number)
      • Start & end page
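
As an illustration of how the bibliographic details listed above might be captured as structured XML, here is a minimal sketch in Python. The element names (reference, author, article-title and so on) and the sample values are hypothetical placeholders; in a real project the tags would follow the DTD specified by the client.

```python
import xml.etree.ElementTree as ET


def tag_reference(ref_id, fields):
    """Build a <reference> element with one child per captured field.

    The tag names are illustrative only, not taken from any specific DTD.
    """
    ref = ET.Element("reference", {"id": ref_id})
    for name, value in fields.items():
        if value:  # fields absent from the source are simply not tagged
            ET.SubElement(ref, name).text = str(value)
    return ref


# Placeholder values, not a real publication
fields = {
    "author": "A. Author",
    "year": "2001",
    "article-title": "An example article",
    "journal-title": "Example Journal",
    "volume": "12",
    "issue": "3",
    "issn": "",  # not present in this sample reference
    "first-page": "45",
    "last-page": "67",
}

ref = tag_reference("ref1", fields)
xml_text = ET.tostring(ref, encoding="unicode")
print(xml_text)
```

Because each detail sits in its own element, a cross-reference link can point at the id attribute, and a search tool can query individual fields such as the year or the author.
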
The images in the converted files are cross-checked to ensure that each image is placed in its exact location and that none is missing or interchanged.
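
Part of such a cross-check could be automated. The sketch below is one possible approach, not a description of Aster's actual tooling: it lists the image references found in a converted XML file in document order and compares them with the image files delivered. The graphic tag, the href attribute and the file names are assumptions made for the example.

```python
import xml.etree.ElementTree as ET


def check_images(xml_text, delivered_files, tag="graphic", attr="href"):
    """Compare image references in the XML with the delivered image files.

    Returns (referenced, missing, unused):
      referenced - image names in document order, useful for spotting
                   interchanged images against the source order
      missing    - referenced in the XML but not delivered
      unused     - delivered but never referenced
    """
    root = ET.fromstring(xml_text)
    referenced = [el.get(attr) for el in root.iter(tag) if el.get(attr)]
    delivered = set(delivered_files)
    missing = [name for name in referenced if name not in delivered]
    unused = sorted(delivered - set(referenced))
    return referenced, missing, unused


# Hypothetical converted file and delivery folder contents
sample = """<article>
  <sec><p>Text</p><graphic href="fig1.tif"/></sec>
  <sec><graphic href="fig2.tif"/></sec>
</article>"""

referenced, missing, unused = check_images(sample, ["fig1.tif", "fig3.tif"])
print("referenced:", referenced)
print("missing:", missing)
print("unused:", unused)
```

A check like this catches missing files and ordering problems; confirming that the right picture appears at the right place still needs a visual comparison with the source.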

Citation Tagging, Indexing, Referencing and Keyword Coding:

Citation tagging marks up cited sources and references so that they can be searched online. For instance, data can be searched by author’s name, title, or ISSN/ISBN number. We provide indexing services that ease document search: the index uses different search terms that act as a guide to the specific document. We also provide referencing and keyword coding services to enable better search results.
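
To show how tagged citation data and keywords can support this kind of search, here is a minimal sketch of an inverted index in Python. The record layout, the field names and the sample documents are hypothetical, and a production search system would be considerably more elaborate.

```python
from collections import defaultdict

# Fields assumed to be searchable; ISSN/ISBN values would be indexed the same way
SEARCH_FIELDS = ("author", "title", "issn", "isbn", "keywords")


def build_index(records):
    """Map each lower-cased search term to the ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, meta in records.items():
        for field in SEARCH_FIELDS:
            value = meta.get(field, "")
            values = value if isinstance(value, list) else [value]
            for text in values:
                for term in text.lower().split():
                    index[term].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents that match every term in the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Placeholder records, not real publications
records = {
    "doc1": {"author": "Jane Doe", "title": "Digital Libraries",
             "keywords": ["metadata", "xml"]},
    "doc2": {"author": "John Roe", "title": "Scanning Newspapers",
             "keywords": ["ocr", "xml"]},
}

index = build_index(records)
print(sorted(search(index, "xml")))
print(sorted(search(index, "doe digital")))
```

Each search term acts as a guide to the documents that contain it, so a query by author, title word or keyword narrows the result to the matching documents.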

Imaging Services:

As part of the post-scanning process, a scanned image is edited to produce better-quality output. Illustrations are usually captured at resolutions of 50 dpi, 300 dpi and 600 dpi. The imaging process includes:

  • Editing the image: eliminating moiré, ink bleed-through from text or images, watermarks, pen marks, and stains such as ink and rust
  • Creating and adding a background to a picture
  • Manipulating the raw picture, which involves masking, cloning and color correction
  • Creating shadows for image objects
  • Animating the manipulated picture with special coding