An Introduction to Machine Learning with Turi
Jul 7, 2020 • 6 Minute Read
Introduction
Machine learning and data science are quickly becoming essential practices in organizations that have raw data and want to derive insights from it.
With large volumes of data, efficiency becomes paramount: you need to test solutions and hypotheses as fast as possible and push the best viable solution to production. Turi was built with this in mind. The motivation behind Turi is to create powerful machine learning and data science tools that allow quick progression from ideas to production.
This guide assumes that you have at least intermediate-level knowledge of Python and some background in machine learning and data science.
Assume you are a machine learning developer at a young and budding startup that seeks to employ efficient data and ML models in the computer vision and retail space. The iOS ML team in the startup is currently considering a tool that can allow rapid prototyping and still deliver results efficiently. Your team decides to give Turi a try in both image classification and predictive recommendation.
Set up Turi
To set up and use Turi, you need either Turi Create (the open-source version) or graphlab (the academic version). Both are Python libraries for building high-performing, large-scale data and machine learning applications. To achieve this speed and efficiency, they are backed by C++. Both are supported on Windows, Mac OS, and Linux-based operating systems.
- Academic version - Targeted at anyone using Turi for academic purposes only. You will have to sign up for a free, one-year renewable subscription, after which you will receive a product key and download instructions. A guide to using graphlab can be found here.
- Open-source version - Developed by Apple and ideal for non-academic users.
To install the open-source version, run the command pip install -U turicreate in your terminal.
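After installing, it can be useful to confirm that the package is visible to the Python interpreter you plan to use. The following is a minimal sketch that relies only on the standard library; the helper name is_installed is hypothetical and is not part of Turi Create.

```python
import importlib.util


def is_installed(package_name: str) -> bool:
    """Return True if the current interpreter can locate the named top-level package."""
    return importlib.util.find_spec(package_name) is not None


if __name__ == "__main__":
    # "turicreate" is the import name used throughout this guide.
    status = "found" if is_installed("turicreate") else "not found"
    print(f"turicreate: {status}")
```

If the package is reported as not found, the usual cause is that pip installed it into a different environment than the one running the script.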
Image Classification with Turi
In this example, you will use the popular Kaggle flowers dataset. Download the dataset and copy it to your working directory. Import the Turi Create library and give it an alias for easy reference: import turicreate as tc. Since the flowers dataset is in the same directory, load the images using the load_images function.
data = tc.image_analysis.load_images('flowers', with_path=True)
Next, you will create the label column, which is the name of each subfolder.
import os
data['flower_name'] = data['path'].apply(lambda path: os.path.basename(os.path.dirname(path)))
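To see what the lambda above does to a single path, here is a small standalone sketch. The file paths are hypothetical and simply assume a dataset/label/file layout; the helper name label_from_path is also illustrative.

```python
import os


def label_from_path(path: str) -> str:
    """Return the name of the folder that directly contains the file."""
    return os.path.basename(os.path.dirname(path))


# Hypothetical paths, laid out as <dataset>/<label>/<file>.
example_paths = [
    "flowers/daisy/0001.jpg",
    "flowers/rose/0042.jpg",
    "flowers/tulip/img_7.jpg",
]
labels = [label_from_path(p) for p in example_paths]
print(labels)  # ['daisy', 'rose', 'tulip']
```

This only works when each image sits directly inside a folder named after its class, so it is worth checking the extracted dataset for extra nesting before relying on it.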
The loaded data is then saved as an SFrame.
data.save('flowers.sframe')
Think of an SFrame as a pandas-like dataframe that can hold far more data, since it is backed by disk space rather than limited to memory.
To perform image classification, you will need to load the data, train a classification model, and finally, save and export the model.
Load and Split Data for Classifier Training
The previously created flowers.sframe will be loaded into an SFrame object to allow manipulation and classification.
Load the data:
data = tc.SFrame("flowers.sframe")
Once loaded, split the data into training and testing sets in whichever ratio you see fit. In this example, a fraction of 0.8 is used, so roughly 80% of the rows go to the training set.
train_data, test_data = data.random_split(0.8)
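A random split of this kind assigns each row independently, so the resulting sizes are approximate rather than exact. The sketch below illustrates the idea in plain Python; it is not Turi Create's implementation, and the function and variable names are hypothetical. In Turi Create itself, random_split also accepts a seed argument if you need a reproducible split.

```python
import random


def random_split(rows, fraction, seed=None):
    """Send each row to the first list with probability `fraction`, else to the second."""
    rng = random.Random(seed)
    first, second = [], []
    for row in rows:
        (first if rng.random() < fraction else second).append(row)
    return first, second


rows = list(range(1000))  # stand-in for 1,000 images
train_rows, test_rows = random_split(rows, 0.8, seed=42)

# Sizes are close to 800 and 200, but usually not exactly those values.
print(len(train_rows), len(test_rows))
```

Fixing the seed makes the split repeatable, which helps when comparing models trained on different days.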
Classification and Predictions
With Turi Create, it's easy to create a model with just one line of code. Pass the training set and the label you wish to predict. In this case, the label is flower_name.
model = tc.image_classifier.create(train_data, target='flower_name')
To make predictions on the test set, call the predict method.
predictions = model.predict(test_data)
Model Evaluation, Saving, and Export
At this point, the model is complete. For quality purposes, it is a good idea to evaluate the model and examine the accuracy.
metrics = model.evaluate(test_data)
print(metrics['accuracy'])
If you wish to view all the other evaluation metrics, just run the line print(metrics).
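For reference, accuracy is simply the share of test examples whose predicted label matches the true label. The following minimal sketch computes it by hand on made-up labels; the function name and the sample data are illustrative only.

```python
def accuracy(predicted, actual):
    """Return the fraction of positions where the predicted label equals the actual label."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# Made-up labels: three of the four predictions are correct.
predicted = ["rose", "daisy", "tulip", "rose"]
actual = ["rose", "daisy", "rose", "rose"]
print(accuracy(predicted, actual))  # 0.75
```

Accuracy alone can be misleading when some flower classes have far more images than others, which is one reason to look at the other metrics as well.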
Save the model: model.save('flowers.model')
Export the model in Core ML format: model.export_coreml('mymodels/flowers_coreml.mlmodel')
With the .mlmodel file, you can now add image classification capability to your Apple app. This model can be used in an iPhone app that classifies flowers in real time or from pictures.
Movie Recommender System with Turi
For this example, download the MovieLens dataset and copy it into your working directory.
Load the data into an SFrame object and print it:
actions = tc.SFrame.read_csv('./dataset/ml-20m/ratings.csv')
print(actions)
Split the data into training and validation sets and create a recommendation model.
training_data, validation_data = tc.recommender.util.random_split_by_user(actions, 'userId', 'movieId')
model = tc.recommender.create(training_data, 'userId', 'movieId')
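Splitting by user differs from a plain row-wise split: part of a user's interactions is held out for validation while the rest stays in the training set, so the model can be evaluated on users it has already seen. The sketch below shows that idea in plain Python under simplifying assumptions (every user is split, and a fixed share of each user's items is held out). It is not the Turi Create implementation, and all names in it are hypothetical.

```python
import random
from collections import defaultdict


def split_by_user(interactions, test_proportion=0.2, seed=None):
    """Hold out a share of each user's (user_id, item_id) pairs for validation."""
    rng = random.Random(seed)

    # Group item ids by user so each user can be split separately.
    by_user = defaultdict(list)
    for user_id, item_id in interactions:
        by_user[user_id].append(item_id)

    train, validation = [], []
    for user_id, items in by_user.items():
        shuffled = items[:]
        rng.shuffle(shuffled)
        n_held_out = int(len(shuffled) * test_proportion)
        validation.extend((user_id, item) for item in shuffled[:n_held_out])
        train.extend((user_id, item) for item in shuffled[n_held_out:])
    return train, validation


# Two made-up users with five rated movies each.
interactions = [(user, movie) for user in ("u1", "u2") for movie in range(5)]
train_pairs, validation_pairs = split_by_user(interactions, 0.2, seed=0)
print(len(train_pairs), len(validation_pairs))  # 8 2
```

Because the held-out count is rounded down, users with very few interactions contribute nothing to validation and remain entirely in the training set.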
At this point, all that is required is a call to the recommend() method, which returns ranked item recommendations for users.
results = model.recommend()
print(results)
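To build intuition for what a recommender returns, here is a deliberately simple popularity baseline in plain Python: it ranks items by how many users interacted with them and skips items the target user has already seen. This is a stand-in technique for illustration, not the model that Turi Create selects, and the data and names are made up.

```python
from collections import Counter


def recommend_popular(interactions, user_id, k=3):
    """Return up to k of the most-interacted-with items that user_id has not seen."""
    seen = {item for user, item in interactions if user == user_id}
    counts = Counter(item for _, item in interactions)
    ranked = [item for item, _ in counts.most_common() if item not in seen]
    return ranked[:k]


# Made-up (user, movie) interactions.
interactions = [
    ("u1", "A"), ("u1", "B"),
    ("u2", "A"), ("u2", "C"),
    ("u3", "A"), ("u3", "C"), ("u3", "D"),
]
print(recommend_popular(interactions, "u1", k=2))  # ['C', 'D']
```

A baseline like this is a useful sanity check: a trained recommender that cannot beat raw popularity on the validation set probably needs more data or different settings.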
Conclusion
In the modern age of data, skills in machine learning and data science are not only vital but very marketable. These skills are much sought after for job roles such as machine learning engineers, data scientists, chief data/information officers, business intelligence developers, and data analysts, among others.
To build on the skills learned by developing machine learning tools with Turi, the next step is to learn how to deploy solutions in production, either on the cloud or on devices.