Saturday, June 23, 2012

Multi-Column Journal Article PDF to Ebook Converter

[Update: references in this post to my trade organization and the signal processing journal it publishes have been removed. Think of this project as more of a generic multi-column journal article PDF to Ebook converter, for the time being. Perhaps it will be something more specific than that again in the future.]

Guess what day it is? 

To celebrate, I've decided that today is the day I start a programming project that I've been thinking about on and off for the last two years.

This post presents the design spec for my Computer Vision Assisted Research Paper PDF to Ebook Converter. I only mention Computer Vision because I loved the subject when I studied it in University, and this project will hopefully keep me learning about the field. CVARPPEC... hmm... that's pretty catchy. Any suggestions for a project name would be appreciated, especially funny ones.

My inspiration is as follows: I like reading (certain) computer vision-related technical papers, the kind university researchers publish, but unfortunately these papers are published universally in PDF format, and my ebook reader doesn't handle PDFs very well. Especially ones with two columns of text and big fat margins on all sides, such as... almost every technical paper in the field of computer vision that I have ever seen.

There are some solutions available, of course, from around our bountiful internet. Calibre is a wonderful ebook app I use for converting, storing, and web-serving my ebook collection, and its file conversion powers never cease to amaze and please me. Unfortunately, even the fine folks behind that fine piece of software have faced reality... PDF is a horrible format to convert from.


From the Calibre manual:
What are the best source formats to convert? In order of decreasing preference: LIT, MOBI, AZW, EPUB, AZW3, FB2, HTML, PRC, RTF, PDB, TXT, PDF
In other news, here's a project design spec.
This is how everyone designs software: With colourful markers.
