Independent Software Engineer
I worked as an independent software engineer for 4 months, from November 2012 to February 2013.
A client wanted to automate a costly manual process of extracting information from PDF documents and storing it in Excel spreadsheets. I built the frontend and backend of a data extraction pipeline in Clojure and PostgreSQL. A crawler retrieved PDF forms from government websites, an OCR library extracted the data from the PDFs, and the results were indexed in PostgreSQL. I designed a web interface to query the data set and build daily reports.
Another client wanted to automate the extraction of key information from product labels on boxes. I built a naive Bayes classifier in Clojure to recognize the labels from their textual content. The project included the classifier as well as a suite of tools to manage the training set, train the classifier, and evaluate it on new documents. I also made several improvements to the OCR pipeline of a mobile document scanning solution, including implementing the Stroke Width Transform algorithm and integrating the ZXing barcode detection library.