All the signs in the sky tell us that Machine Learning and Artificial Intelligence will be as important a revolution as web browsers or smartphones were back in the day. Unfortunately, this technology has a relatively steep learning curve, which slows its adoption considerably. Very few companies can afford a dedicated data scientist on the payroll. Even those that can often find that the quality or the amount of their data becomes the problem: there just isn’t enough of it to train models well.
Fortunately, there are companies on the market that provide very well-trained models and easy-to-integrate Machine Learning APIs for everyday use cases. You can incorporate them into your project and take advantage of cutting-edge technology without making a leap of faith. In this blog post, I’d like to draw your attention to some of those use cases, which can be the low-hanging fruit you can pick right away.
When thinking about AI and machine learning, the problem of recognizing patterns in an image is usually the first that comes to mind. Facial recognition, pattern recognition, QR or barcode read-out, detecting inappropriate images before they get published, matching a face to a person, and many more use cases fall into this category. As with everything, when a problem is fairly common, there are many solutions out there to tackle it. Let’s first have a look at what device vendors offer us in their platform SDKs.
Apple’s CoreML and CreateML
It turns out both Apple and Google offer quite a set of APIs we can use to perform image recognition directly on a device. On iOS devices, we get the CoreML framework with several pre-trained models (for example, for face detection or optical character recognition), along with CreateML, a very easy-to-use tool for training your own models. One of the key changes in the third version of the framework is the ability to re-train your models on-device, so your apps can get smarter as people use them without breaching their privacy, because no data leaves their handset or tablet.
Of course, Google has a counterpart to CoreML called MLKit. Things get very interesting with Google’s offering because MLKit is very closely integrated with Firebase (Google’s Backend as a Service platform), so the framework is available for both iOS and Android. What’s more, it can work in two scenarios - on-device and powered by the cloud - both of which have their pros and cons. Integrating MLKit takes just a few lines of code, and it’ll let you tackle a number of image recognition problems, for instance: detecting faces, reading barcodes, detecting and tracking objects, recognizing and labeling objects in an image, and more. Keep in mind that processing on the device will be much faster than sending heavy imagery over to Google’s infrastructure, but you’ll be limited to lighter, less capable models. A good example may be OCR - on the device, you will only be able to detect alphanumeric characters.
Google’s Cloud Vision
If for whatever reason you don’t want to do the work on the client side, you’re not out of luck, as there are plenty of APIs you can use to move the heavy lifting to the cloud. One of my favorites is Google’s Cloud Vision. As with any cloud service, these APIs unfortunately don’t come free. Usually, though, you’ll get a few free credits to start and experiment with.
Cloud Vision will let you do as much as the on-device frameworks from the previous examples, and more. After all, you’ll be using Google’s state-of-the-art infrastructure and models trained extremely well on loads of data. One of the best examples of the quality of the results, which I often show people, is the case of The New York Times digitizing their entire photo library with Google Cloud. What’s really interesting in this example is how they took the result of the OCR process and piped it into another API for NLP (natural language processing) to understand what was written on the back of the photos. Similar APIs are offered by Amazon under their Rekognition service, Microsoft under Azure Cloud, and IBM on their Watson platform.
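To give a feel for how little is involved, here is a minimal sketch of the JSON body that Cloud Vision’s REST endpoint expects for a single image. The helper name `build_annotate_request` and the default of 10 results are my own assumptions for illustration. Authentication (an API key or OAuth token) and the actual HTTP call are left out on purpose.

```python
import base64

# Public REST endpoint for Cloud Vision image annotation.
VISION_ENDPOINT = "https://vision.googleapis.com/v1/images:annotate"


def build_annotate_request(image_bytes, feature_types, max_results=10):
    """Build the JSON body for a single-image images:annotate call.

    `feature_types` is a list of Cloud Vision feature names such as
    "TEXT_DETECTION", "LABEL_DETECTION" or "FACE_DETECTION".
    The helper itself is hypothetical; only the body layout follows the API.
    """
    return {
        "requests": [
            {
                # Image bytes are sent inline as base64-encoded text.
                "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
                "features": [
                    {"type": feature, "maxResults": max_results}
                    for feature in feature_types
                ],
            }
        ]
    }
```

You would POST the resulting dictionary as JSON to `VISION_ENDPOINT` with whichever HTTP client you prefer. Asking for several features in one request, for example text and labels together, saves a round trip per image.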
Natural Language Processing (NLP)
Another very common ML application is Natural Language Processing and everything related to pulling insights from unstructured blocks of text. Overall, it’s an extremely tough nut to crack because of the multitude of languages, dialects, wording styles, etc. Thankfully, there are APIs that can help you tackle this problem. The bad news is that you’ll be pretty much exclusively limited to the cloud - the models that address this are way too heavy to run on the device.
Looking at providers, again we have the usual suspects: Amazon Comprehend, Google Natural Language, and IBM Watson Natural Language Understanding. They will all give you very different results, so before you start integrating any of these services, make sure you thoroughly test each and every one of them. Ideally, you’ll want to structure your code so that it’s relatively easy to swap services underneath, because all of them change over time, and you may start with one but decide to switch to another later in the project lifecycle.
What kind of results can you expect after applying these models to your text? First of all, you’ll get a list of classified entities. You’ll also be provided with a detailed sentiment analysis of the entire text, a structural sentence breakdown, some high-level categories found in the text, and much more. Following up on the example from the previous section, once the folks from NYT had recognized the text on the back of the millions of photos they were digitizing, they fed these results into Google’s NLP service and were able to pick out important details of each photo, like where it was taken, when, what’s in it, and additional metadata from the back.
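One way to keep providers swappable is to hide them behind a small interface of your own. The sketch below is only an illustration: `TextInsights`, `TextAnalyzer`, `KeywordAnalyzer` and `summarize` are hypothetical names, and the stand-in analyzer does not call any real service. A real adapter for Comprehend, Google Natural Language or Watson would implement the same `analyze` method and map the vendor’s response into `TextInsights`.

```python
from dataclasses import dataclass
from typing import List, Protocol


@dataclass
class TextInsights:
    """Provider-neutral result type (hypothetical, for illustration)."""
    entities: List[str]
    sentiment: float  # assumed convention: -1.0 (negative) to 1.0 (positive)


class TextAnalyzer(Protocol):
    """The only interface the rest of the app is allowed to depend on."""

    def analyze(self, text: str) -> TextInsights: ...


class KeywordAnalyzer:
    """Stand-in provider for local testing; NOT a real NLP service.

    It naively treats every capitalized word as an entity.
    """

    def analyze(self, text: str) -> TextInsights:
        entities = [w.strip(".,") for w in text.split() if w[:1].isupper()]
        return TextInsights(entities=entities, sentiment=0.0)


def summarize(analyzer: TextAnalyzer, text: str) -> str:
    """App code talks to the interface, never to a vendor SDK directly."""
    insights = analyzer.analyze(text)
    return f"{len(insights.entities)} entities, sentiment {insights.sentiment:+.1f}"
```

With this shape, comparing vendors during evaluation means running the same `summarize` call against different adapters, and switching later touches one class instead of the whole codebase.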
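The OCR-then-NLP idea can be sketched as a tiny pipeline. Everything here is an assumption made for illustration: the function names, the entity dictionary shape (`name` and `type` keys) and the made-up sample text. The two `fake_*` functions stand in for real OCR and entity-extraction calls, so the example runs without any cloud account.

```python
from typing import Callable, Dict, List

# Signatures of the two steps; real implementations would wrap cloud APIs.
OcrFn = Callable[[bytes], str]
EntityFn = Callable[[str], List[Dict[str, str]]]


def describe_photo(back_of_photo: bytes, ocr: OcrFn,
                   extract_entities: EntityFn) -> Dict[str, List[str]]:
    """Run OCR on the image, then group the extracted entities by type."""
    text = ocr(back_of_photo)
    metadata: Dict[str, List[str]] = {}
    for entity in extract_entities(text):
        metadata.setdefault(entity["type"], []).append(entity["name"])
    return metadata


def fake_ocr(image: bytes) -> str:
    """Stand-in for an OCR call; returns made-up text."""
    return "Main Street parade, Springfield, 1950"


def fake_entities(text: str) -> List[Dict[str, str]]:
    """Stand-in for an entity-extraction call; returns made-up entities."""
    return [
        {"name": "Springfield", "type": "LOCATION"},
        {"name": "1950", "type": "DATE"},
        {"name": "parade", "type": "EVENT"},
    ]
```

Calling `describe_photo(image_bytes, fake_ocr, fake_entities)` returns a dictionary keyed by entity type. Passing the two steps in as functions keeps the pipeline independent of any one vendor.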
User Behavior Analysis
Drawing conclusions from user behavior in the app is a perfect case for applying heavy ML. So far, in this scenario we’re pretty much limited to only one vendor: Google Firebase. It’s well worth giving it a go, though.
By using several products under the Firebase umbrella (RemoteConfig, A/B Testing, Notifications, and Predictions), you can target users who are (according to Google’s models) likely to churn and shower them with promotions via push notifications to convince them to stick around. Or do just the opposite - you can segment out users who are likely to make a purchase.
Because the system learns as people use your digital product, you can create custom predictions based on your own events defined in Firebase Analytics and then pick out users who are likely to trigger them. Finally, thanks to Firebase’s integration with BigQuery, you can export both your Analytics events and your Predictions and crunch them directly in BigQuery for further analysis.
Best of all, Firebase Predictions are free of charge on all Firebase plans, so the barrier to entry is almost non-existent.
As you can see, there are plenty of platforms offering various APIs and models for a lot of different scenarios, and we’ve only covered a few relatively common cases. I highly encourage you to dip your toes into the services I mentioned. The time investment is minimal, and you can try out a number of APIs that can bring a lot of value to your product and your users. It’s also a great way to get started with machine learning and eventually level up to building your own models and APIs. From there onwards, the sky is the limit!
Cover Photo by Hunter Harritt