At Open Broadcast Systems, we have a lot of repetitive business processes that involve the same simple tasks every day or every week. We like to follow Kaizen, the Japanese continuous improvement process made famous by Toyota, where every employee from the factory floor worker to the CEO can suggest and enact changes to make their work more efficient. But in the era of automation, we want to do more: we want to improve processes by several orders of magnitude and then have them done automatically. This lets us run with a much lower administrative headcount than similar companies, and move more quickly.
One of the hardest processes to improve was our ordering process for Blackmagic cards. Owing to recent GBP currency fluctuations, our supply chain has had regular price changes. We calculate most of these prices automatically from USD or EUR and pull them into spreadsheets.
However, all Blackmagic equipment from the UK distributor is priced using a master PDF price list pictured below:
It is, of course, easy for humans to read this price list, but for a machine it is a confusing mix of pictures, descriptions, and a large number of device subcategories. Note how a heading may introduce several subcategories or just a single device type. None of this is easy for a computer to understand.
Initially we tried basic PDF extraction tools like pdfextract, but they struggled with the complex table structure. Then we found Tabula, software used by journalists to parse released documents.
It was able to understand the document structure very well as we can see:
From there it was some very simple Python scripting to extract the prices and load all this data into our supply chain spreadsheets. We can now run this as a batch job every day and have nicely updated prices.
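As a rough illustration of what such a script might look like, here is a minimal sketch that reads a table exported from Tabula as CSV and pulls out one price per product. It is not our production code. The column layout (product name in the first cell, price in the last), the rule that a row without a price is a category heading, and the names `parse_price` and `extract_prices` are all assumptions made for this example.

```python
import csv
import io
import re
from decimal import Decimal

# Assumed price format: optional pound sign, optional thousands separators,
# optional two-digit pence, e.g. "£1,234.00", "£95.00" or "145".
PRICE_RE = re.compile(r"^£?\s*(\d{1,3}(?:,\d{3})*(?:\.\d{2})?|\d+(?:\.\d{2})?)$")


def parse_price(cell):
    """Return the cell as a Decimal if it looks like a price, else None."""
    match = PRICE_RE.match(cell.strip())
    if match is None:
        return None
    return Decimal(match.group(1).replace(",", ""))


def extract_prices(csv_text):
    """Turn CSV text into a list of {category, product, price} dicts.

    Assumption: a row with no price in its last non-empty cell is a
    category heading that applies to the product rows following it.
    """
    rows = []
    category = None
    for record in csv.reader(io.StringIO(csv_text)):
        cells = [cell.strip() for cell in record if cell.strip()]
        if not cells:
            continue  # blank line between table sections
        price = parse_price(cells[-1])
        if price is not None and len(cells) >= 2:
            rows.append(
                {"category": category, "product": cells[0], "price": price}
            )
        elif price is None:
            category = cells[0]  # heading row: remember it for later rows
    return rows
```

The resulting list of dictionaries can then be written out with `csv.DictWriter` or pushed into a spreadsheet by whatever means the supply chain tooling already uses.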
We want to do this for as many business processes as possible, from the simple to the complicated. However, we lack a lot of things needed to really make the most of automation:
- Banking APIs, so that we can automate payment processes. There’s lots of innovation in the consumer space here, but in the business world we are still stuck with generating CSV files, with all the problems that entails, and then manually uploading them.
- Automated Language Processing in emails. The success of Google Inbox shows that it is possible to intelligently sort and suggest responses to emails. In our case, we get a lot of emails saying “Can I have a quote for X”. We’d like software that would understand this and create a quote (and send it to a human to confirm). This is obviously a nontrivial problem (perhaps IBM Watson could solve it).