A random forest image classifier in a day

Learn how we collected data and trained a random forest image classifier within a single day at HackZurich 2016.

One of my first projects with my newly acquired machine learning know-how took place at HackZurich 2016. We built a sign digitizer that turned handwritten signs into word-like masterpieces. The final result can be seen below. Text detection used a cloud API; sign recognition, however, used a custom model built on a random forest image classifier.

Demo of our HackZurich 2016 project

Data Collection

First, we needed a dataset of signs. A quick Google search showed that none existed back then, so we had to decide between skipping sign recognition and creating our own dataset. The team voted for the custom dataset, and we all spent the next 15 minutes drawing shapes on pieces of paper. We scanned our art pieces with a Samsung printer, which gave us roughly 10 pages like the one shown below.

Data Preparation

Then we used a simple connected-components algorithm from https://opencv.org/ to find the shapes, fit a bounding rectangle around each one, and crop it into its own JPG image. This worked quite well; no further tuning was needed. To group and label the shapes, we sorted all instances several times by percentage of black pixels and by compressed file size, which tends to place similar shapes next to each other. For each label, we moved the images into a newly created folder. After a final train/test split, the dataset was ready.

Hand-drawn signs for our image dataset
Our custom dataset we used to train the sign classification model

The Random Forest Image Classifier

To classify the images we used https://scikit-learn.org/, my favorite ML library at the time. The images were resized to about 16×16 pixels before we fed them into our random forest image classifier. No further feature extraction was performed. The simple shapes and well-aligned crops were enough to give results the team was happy with.
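A minimal sketch of this setup might look like the following. The helper names `to_features` and `train_sign_classifier`, the nearest-neighbour downsampling done in plain NumPy, and the hyperparameters (`n_estimators=100`, a fixed seed) are assumptions for illustration; we no longer have the exact settings from the hackathon. In practice a library resize such as `cv2.resize` would be the more common choice.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def to_features(image, size=16):
    """Nearest-neighbour resize to size x size, then flatten to values in [0, 1]."""
    rows = np.arange(size) * image.shape[0] // size
    cols = np.arange(size) * image.shape[1] // size
    small = image[np.ix_(rows, cols)]
    return small.astype(np.float32).ravel() / 255.0


def train_sign_classifier(images, labels, seed=0):
    """Fit a random forest directly on raw pixel values, with no other features."""
    X = np.stack([to_features(img) for img in images])
    model = RandomForestClassifier(n_estimators=100, random_state=seed)
    model.fit(X, labels)
    return model
```

A new crop is then classified with `model.predict([to_features(crop)])`. Feeding raw pixels to the forest only works here because the crops are tightly aligned and the shapes are simple; more varied images would likely need real feature extraction.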

Coupled with a Flask-based REST API, the masterpiece was complete: a sign classifier using a random forest, trained within a single day during HackZurich 2016.
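For illustration, a REST wrapper along these lines could look like the sketch below. The `/classify` route, the JSON payload shape (a `pixels` list of 256 values), and the `create_app` factory are hypothetical; the original API is not reproduced here.

```python
from flask import Flask, jsonify, request


def create_app(classify):
    """Build a Flask app around `classify`, a callable mapping 256 pixel values to a label."""
    app = Flask(__name__)

    @app.route("/classify", methods=["POST"])
    def classify_sign():
        payload = request.get_json(silent=True)
        pixels = payload.get("pixels") if isinstance(payload, dict) else None
        # Expect a flattened 16x16 image; reject anything else.
        if not isinstance(pixels, list) or len(pixels) != 256:
            return jsonify(error="expected 'pixels': a list of 256 values"), 400
        return jsonify(label=str(classify(pixels)))

    return app
```

With a fitted scikit-learn model at hand, the app could be created as `create_app(lambda pixels: model.predict([pixels])[0])` and started with `app.run()`.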
