
The thought-out structure and logical connections in book content are visible only to human readers. Book search solutions currently available on the Web and in other digital environments do not exploit these implicit semantics, and as a result they fail to satisfy the requirements of all stakeholders, including readers, authors, publishers, and librarians. Book search typically relies on traditional Web Information Retrieval (IR) techniques of searching and ranking. These techniques, however, are designed for hyperlinked collections of rich text in the form of web pages. Books are inherently different from web pages, and traditional Web IR techniques do not account for their well-organized structure and logically connected content.

Books, being a valuable source of knowledge and learning, have always been searched for on the Web.
Information Extraction (IE) can be very tricky when applied to digitized books for extracting structure and layout information, including the table of contents (TOC). For this purpose, several IE methods have been devised. These include using book layout analysis to extract the TOC; using the Resurgence software to detect different parts of books by considering typographical positions and book content, instead of the TOC, to detect parts, chapters, sections, and pages; using rule-based methods to extract the TOC from books that have TOC pages, and SVM-based methods for books without TOC pages; and using layout analysis to identify the TOC and other functional regions, including chapters, paragraphs, and notes. Dejean and Meunier used four methods for extracting book structure: (i) detecting and parsing TOC pages; (ii) parsing index pages; (iii) using classical methods for TOC detection; and (iv) using trailing page whitespace methods.
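To make the rule-based idea concrete, the following is a minimal sketch of how TOC lines might be recognized on a page that has already been converted to plain text. It is not the method of any of the works mentioned above. The single rule it uses (a title, then dot leaders or a run of spaces, then an Arabic page number), the names `TocEntry`, `parse_toc_line`, `parse_toc_page`, and `looks_like_toc_page`, and the 0.5 threshold are all assumptions chosen for illustration.

```python
import re
from typing import List, NamedTuple, Optional


class TocEntry(NamedTuple):
    title: str
    page: int


# Assumed rule: a title, followed by dot leaders ("....") or at least two
# spaces, followed by an Arabic page number at the end of the line.
TOC_LINE = re.compile(
    r"^\s*(?P<title>.+?)\s*(?:\.{2,}|\s{2,})\s*(?P<page>\d+)\s*$"
)


def parse_toc_line(line: str) -> Optional[TocEntry]:
    """Return a TocEntry if the line matches the assumed rule, else None."""
    match = TOC_LINE.match(line)
    if match is None:
        return None
    return TocEntry(match.group("title").strip(), int(match.group("page")))


def parse_toc_page(text: str) -> List[TocEntry]:
    """Collect every line of a page that looks like a TOC entry."""
    entries = []
    for line in text.splitlines():
        entry = parse_toc_line(line)
        if entry is not None:
            entries.append(entry)
    return entries


def looks_like_toc_page(text: str, min_ratio: float = 0.5) -> bool:
    """Guess whether a page is a TOC page from the share of matching lines.

    The 0.5 default is an arbitrary illustrative threshold.
    """
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        return False
    hits = sum(1 for line in lines if parse_toc_line(line) is not None)
    return hits / len(lines) >= min_ratio
```

A rule this simple ignores Roman-numeral page numbers, multi-line titles, and OCR noise, which is part of why books without clean TOC pages are handled with learned classifiers or layout analysis instead.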

This research introduces traditional industrial engineering (IE) analytical techniques to the workflow optimization of e-book conversion. The workflow is analyzed using the 5W1H (why, who, what, where, when, how) methodology and optimized with the ECRSI (Eliminate, Combine, Rearrange, Simplify, and Increase) principles. To validate the optimization effect, the workflows before and after optimization are generated and implemented in the ExtendSim® simulation software. The simulation results show that, under similar circumstances, both the quantity and the quality of the products improve after optimization, which indicates that the optimization method is effective. Compared with most other methods used to optimize workflow, this method is simpler, more efficient, and more suitable for e-book format conversion.

