Saturday, June 30, 2012

PET: PDF to Ebook Transmogrifier

Finding a good project name is hard, but I'm pretty confident I've done it.

I've got a project, and its name is the PDF Ebook Transmogrifier (P.E.T.).

My web browser offered to correct the spelling of Transmogrifier to "Transmogrified" or "Transmogrifier", which makes me smile.

P.E.T. will convert all those PDFs I want to read into ebooks, so I can read them on my Kindle. You can read them on whatever ebook reader you like, from a smartphone to a picture frame.

I haven't decided what to build P.E.T. out of yet, but for the prototype I'll stick with what I know: Bash, ImageMagick, Octave and Calibre. If those names aren't familiar to you, don't fret; just know that my first P.E.T. will be cobbled together from spaghetti and sparkles, like all good prototypes. Later I might rewrite it with ITK, because I think it's neat and I want to learn more about it before I try writing another medical image analyzer.

P.E.T. will accept as inputs PDFs with:

  • Multiple columns of text and pictures
  • Pictures as wide as a column, and as wide as multiple columns
  • Headers and footers and page numbers
  • Conversion Rules tailored to specific types of documents, and a can-do attitude to decide when to use them.
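Handling multiple columns is the heart of the problem, so here's one way it could work. This is not P.E.T.'s code (the tools aren't even chosen yet); it's a minimal sketch of a common computer vision trick, the vertical projection profile, assuming a page has already been rendered to an image and thresholded to 0/1 pixels (a step ImageMagick could plausibly handle in the prototype). Count the ink in every pixel column; a wide enough run of empty pixel columns is a gutter between text columns. The names `Page`, `page_from_strings`, `vertical_profile`, `find_text_columns` and the `min_gap` parameter are all made up for illustration.

```python
from typing import List, Tuple

# Hypothetical page representation: rows of pixels, 1 = ink, 0 = background.
Page = List[List[int]]

def page_from_strings(lines: List[str]) -> Page:
    """Build a tiny test page from ASCII art, where '#' marks an ink pixel."""
    return [[1 if c == "#" else 0 for c in line] for line in lines]

def vertical_profile(page: Page) -> List[int]:
    """Count the ink pixels in each pixel column (x position) of the page."""
    width = len(page[0])
    return [sum(row[x] for row in page) for x in range(width)]

def find_text_columns(page: Page, min_gap: int = 3) -> List[Tuple[int, int]]:
    """Return (start, end) x-ranges of text columns, with end exclusive.

    Blank runs narrower than min_gap (e.g. spaces between letters) stay part
    of the surrounding column; only runs of at least min_gap count as gutters.
    """
    profile = vertical_profile(page)
    columns: List[Tuple[int, int]] = []
    start = None    # x where the current text column began, if inside one
    blank_run = 0   # length of the current run of empty pixel columns
    for x, ink in enumerate(profile):
        if ink > 0:
            if start is None:
                start = x
            blank_run = 0
        elif start is not None:
            blank_run += 1
            if blank_run >= min_gap:
                # The column ends at the first blank x of this gutter.
                columns.append((start, x - blank_run + 1))
                start = None
                blank_run = 0
    if start is not None:
        # Close the last column, ignoring any short trailing blank run.
        columns.append((start, len(profile) - blank_run))
    return columns

page = page_from_strings([
    "##.##....###",
    "#...#....#.#",
])
print(find_text_columns(page, min_gap=3))  # [(0, 5), (9, 12)]
```

A picture that is as wide as multiple columns fills the gutter and hides it from a whole-page profile, so a real version would probably need to cut the page into horizontal bands first and run this on each band.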

If I'm lucky, maybe one day P.E.T. will go off to work with Calibre. That'd make me proud.

Saturday, June 23, 2012

Multi-Column Journal Article PDF to Ebook Converter

[Update: references in this post to my trade organization and the signal processing journal it publishes have been removed. Think of this project as more of a generic multi-column journal article PDF to Ebook converter, for the time being. Perhaps it will be something more specific than that again in the future.]

Guess what day it is? 

To celebrate, I've decided that today is the big day I'm going to start a programming project that I've been thinking about on and off for the past two years.

This post presents the design spec for my Computer Vision Assisted Research Paper PDF to Ebook Converter. I only mention Computer Vision because I loved the subject when I studied it at university, and this project will hopefully keep me learning about the field. CVARPPEC... hmm... that's pretty catchy. Any suggestions for a project name would be appreciated, especially funny ones.

My inspiration is as follows: I like reading (certain) computer vision-related technical papers, the kind university researchers publish, but unfortunately these papers are published almost universally in PDF format, and my ebook reader doesn't handle PDFs very well. Especially ones with two columns of text and big fat margins on all sides, such as... almost every technical paper in the field of computer vision that I have ever seen.
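Those big fat margins are the easiest part to fix. Here's a minimal sketch, not a committed design, assuming (hypothetically) that each page is already a 0/1 pixel grid where 1 is ink: find the bounding box of all the ink, then keep only that, plus an optional `padding`. The function names are made up for illustration; ImageMagick's `-trim` option does a similar job on real images.

```python
from typing import List, Optional, Tuple

# Hypothetical page representation: rows of pixels, 1 = ink, 0 = background.
Page = List[List[int]]

def ink_bounding_box(page: Page) -> Optional[Tuple[int, int, int, int]]:
    """Return (top, left, bottom, right) around all ink, bottom/right exclusive.

    Returns None for a completely blank page.
    """
    rows = [y for y, row in enumerate(page) if any(row)]
    if not rows:
        return None
    cols = [x for x in range(len(page[0])) if any(row[x] for row in page)]
    return rows[0], cols[0], rows[-1] + 1, cols[-1] + 1

def crop_margins(page: Page, padding: int = 0) -> Page:
    """Cut away blank margins, keeping `padding` pixels of white around the ink."""
    box = ink_bounding_box(page)
    if box is None:
        return page  # nothing to crop on a blank page
    top, left, bottom, right = box
    # Grow the box by the padding, clamped to the page edges.
    top, left = max(0, top - padding), max(0, left - padding)
    bottom = min(len(page), bottom + padding)
    right = min(len(page[0]), right + padding)
    return [row[left:right] for row in page[top:bottom]]

page = [
    [0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 0, 0, 0],
]
print(crop_margins(page))  # [[0, 1], [1, 1]]
```

Headers, footers and page numbers are ink too, so they would stretch this box back out; presumably they'd have to be detected and removed before cropping.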

There are some solutions available, of course, from around our bountiful internet. Calibre is a wonderful ebook app I use for converting, storing, and web-serving my ebook collection, and its file conversion powers never cease to amaze and please me. Unfortunately, even the fine folks behind that fine piece of software have faced reality... PDF is a horrible format to convert from.


From the Calibre manual:
What are the best source formats to convert? In order of decreasing preference: LIT, MOBI, AZW, EPUB, AZW3, FB2, HTML, PRC, RTF, PDB, TXT, PDF
In other news, here's a project design spec.
This is how everyone designs software: with colourful markers.