Computer Vision with Amazon Rekognition

By Douglas Starnes

Jul 21, 2020 • 7 Minute Read

Introduction

Is it possible for a computer to see? If a computer analyzes an image, can it interpret that image the way a human does? Computer vision attempts exactly that, and it can be done with remarkable accuracy. But it is still a difficult problem, and the required knowledge and resources are not within everyone's reach. As machine learning becomes more widely used, how do those without the knowledge and resources needed for computer vision keep up?

AWS Rekognition

Amazon Web Services offers a product called Rekognition (pronounced like "recognition"). Rekognition analyzes images to predict what objects they contain, detect any faces, transcribe text, and perform other tasks. But to use Rekognition you need to know little, if anything, about computer vision or machine learning! You simply point Rekognition to a stored image file, tell it what computer vision task you want, pay Amazon a little bit of money (there is a free allowance for 12 months), and get the results back. Sound simple? It is!

The recommended way to access Rekognition and other AWS products is through a client library. AWS provides client libraries for many languages. I'll be using Python and the boto3 package for this guide. Check the documentation for the other languages that are supported. I won't go into installing boto3 here, but it's not difficult.

Getting Started

Before using Rekognition, you need to have images stored in a location that is accessible by Rekognition, generally in the cloud. For this guide, I'll upload several images to S3.

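    import boto3

    s3 = boto3.resource('s3')

    # The bucket must already exist, and your AWS credentials must allow uploads to it.
    for image in ['faces.jpg', 'objects.jpg', 'text.jpg']:
        # Upload each local file, using the file name as the S3 key.
        s3.Bucket('ps-guide-rekognition').upload_file(image, image)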

The images are from Unsplash and can be found respectively at

https://source.unsplash.com/jH12C1yOvsY
https://source.unsplash.com/GkP0kt2mlXo
https://source.unsplash.com/FYlFYAqukyg

Detecting Objects

To use Rekognition, first create a client.

    rek = boto3.client('rekognition', region_name='us-east-1')

Notice the use of client instead of resource. Create a dict representing the location of the image in S3.

    image_dict = {'S3Object': {'Bucket': 'ps-guide-rekognition', 'Name': 'objects.jpg'}}
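Since the same dict shape is needed for every image in this guide, it can be wrapped in a small helper. This is only a sketch: `s3_image` is a hypothetical name introduced here for illustration, not part of boto3.

```python
def s3_image(bucket, name):
    """Build the Image argument Rekognition expects for an object stored in S3."""
    # s3_image is a hypothetical helper for this guide, not a boto3 function.
    return {'S3Object': {'Bucket': bucket, 'Name': name}}


image_dict = s3_image('ps-guide-rekognition', 'objects.jpg')
print(image_dict)
```

The returned dict can be passed as the Image keyword argument in the same way as the literal dict.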
    

Pass the dict to the detect_labels method as the Image keyword argument, along with MaxLabels, the maximum number of labels to return.

    labels = rek.detect_labels(Image=image_dict, MaxLabels=10)

The resulting dict has a Labels key with the detected labels. Each label has a Name and a Confidence.

    for label in labels['Labels']:
        print('{} - {}'.format(label['Name'], label['Confidence']))

And here are the results.

    Furniture - 99.87967681884766
    Table - 99.26332092285156
    Wood - 99.20465087890625
    Desk - 98.95134735107422
    Person - 98.12129974365234
    Flooring - 98.12129974365234
    Hardwood - 97.50928497314453
    Floor - 88.36457061767578
    Electronics - 85.74508666992188
    Interior Design - 83.3857192993164
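The Confidence values are percentages, so a natural next step is to keep only the labels you trust. Below is a minimal sketch that filters a response on the client side. The helper name `confident_labels` and the 90% threshold are assumptions made for illustration, and `sample` is an abbreviated stand-in for a real response, built from three of the values printed in this guide.

```python
def confident_labels(response, threshold=90.0):
    """Return (name, confidence) pairs at or above threshold, highest first."""
    pairs = [(label['Name'], label['Confidence']) for label in response['Labels']]
    kept = [pair for pair in pairs if pair[1] >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Abbreviated stand-in for a detect_labels response (not a live API call).
sample = {'Labels': [
    {'Name': 'Furniture', 'Confidence': 99.87967681884766},
    {'Name': 'Floor', 'Confidence': 88.36457061767578},
    {'Name': 'Desk', 'Confidence': 98.95134735107422},
]}

for name, confidence in confident_labels(sample):
    print('{} - {:.1f}%'.format(name, confidence))
```

detect_labels also accepts a MinConfidence keyword argument, which applies the same kind of cutoff on the service side before the response is returned.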
    

And here is the image:


The bounding boxes of the objects detected are also returned in the dict. And Rekognition can also detect objects in video, not just images.
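In the response, each label carries an Instances list, and each instance has a BoundingBox whose Left, Top, Width, and Height are ratios of the image dimensions rather than pixels. The sketch below converts them to pixel values. The helper name `pixel_boxes`, the 800x600 image size, and the numbers in `sample` are made up for illustration; they are not taken from the image in this guide.

```python
def pixel_boxes(response, image_width, image_height):
    """Yield (label, left, top, width, height) in pixels for each detected instance."""
    for label in response['Labels']:
        # Labels describing the scene as a whole have an empty Instances list.
        for instance in label.get('Instances', []):
            box = instance['BoundingBox']
            yield (
                label['Name'],
                round(box['Left'] * image_width),
                round(box['Top'] * image_height),
                round(box['Width'] * image_width),
                round(box['Height'] * image_height),
            )


# Hypothetical response fragment; the box values are invented for this example.
sample = {'Labels': [
    {'Name': 'Person', 'Confidence': 98.1,
     'Instances': [{'BoundingBox': {'Left': 0.25, 'Top': 0.5,
                                    'Width': 0.5, 'Height': 0.25},
                    'Confidence': 98.1}]},
    {'Name': 'Wood', 'Confidence': 99.2, 'Instances': []},
]}

for row in pixel_boxes(sample, image_width=800, image_height=600):
    print(row)
```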

Detecting Faces

To detect faces, call the detect_faces method and pass the image dict as the Image keyword argument, just as with detect_labels. The Attributes keyword argument is a list of the facial attributes to return, such as age range and gender. For this guide, I'll pass a single value, ALL, to get all of the attributes.

    image_dict = {'S3Object': {'Bucket': 'ps-guide-rekognition', 'Name': 'faces.jpg'}}

    faces = rek.detect_faces(Image=image_dict, Attributes=['ALL'])

This is the image:

The faces are stored in the FaceDetails key. There is only one face in this image. Each face is a dict whose keys are the attributes.

    face = faces['FaceDetails'][0]
    for key in face:
        print(key)
    

    BoundingBox
    AgeRange
    Smile
    Eyeglasses
    Sunglasses
    Gender
    Beard
    Mustache
    EyesOpen
    MouthOpen
    Emotions
    Landmarks
    Pose
    Quality
    Confidence
    

In the attributes, we can see that Rekognition predicted the AgeRange as 16-28 and the Gender as Female with almost 100% confidence, and that the subject's eyes were open but her mouth was not. It also predicted the subject's most likely emotion as CALM.

    face['AgeRange']

    {'Low': 16, 'High': 28}

    face['Gender']

    {'Value': 'Female', 'Confidence': 99.82478332519531}

    face['EyesOpen']

    {'Value': True, 'Confidence': 98.96699523925781}

    face['MouthOpen']

    {'Value': False, 'Confidence': 94.9305191040039}

    face['Emotions']

    {'Type': 'CALM'
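In the detect_faces response, Emotions is a list of dicts, each with a Type and a Confidence. One way to find the most likely emotion is to take the entry with the highest confidence rather than relying on list order. This is a sketch: `top_emotion` is a hypothetical helper, and the confidence values in `sample_face` are invented for illustration.

```python
def top_emotion(face):
    """Return (type, confidence) for the highest-confidence emotion, or None."""
    emotions = face.get('Emotions', [])
    if not emotions:
        return None
    best = max(emotions, key=lambda emotion: emotion['Confidence'])
    return best['Type'], best['Confidence']


# Hypothetical face detail; only the Emotions key is shown, with made-up values.
sample_face = {'Emotions': [
    {'Type': 'HAPPY', 'Confidence': 2.5},
    {'Type': 'CALM', 'Confidence': 95.0},
    {'Type': 'SAD', 'Confidence': 1.0},
]}

print(top_emotion(sample_face))
```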

As with object detection, video is also supported.

Text Detection

To detect text, call the detect_text method with just the Image keyword argument.

    image_dict = {'S3Object': {'Bucket': 'ps-guide-rekognition', 'Name': 'text.jpg'}}
    text = rek.detect_text(Image=image_dict)
    

Here is an image with some text:

The detected text is in the TextDetections key and each detection has a DetectedText key.

    for detection in text['TextDetections']:
        print(detection['DetectedText'])
    

    DANGER
    HARD HAT
    PROTECTION
    REQUIRED
    DANGER
    HARD
    HAT
    PROTECTION
    REQUIRED
    

It might look as if the text were detected twice. But notice that in the image the words "hard" and "hat" are on the same line. The first four detections are lines of text. The remaining detections are for words.
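Each detection also has a Type key, set to either LINE or WORD, which makes the line/word split explicit. A minimal sketch of separating the two follows. The helper name `split_detections` is hypothetical, and `sample` is a hand-built stand-in that mirrors part of the output in this guide rather than a live response.

```python
def split_detections(response):
    """Separate detected text into a list of lines and a list of words."""
    lines, words = [], []
    for detection in response['TextDetections']:
        if detection['Type'] == 'LINE':
            lines.append(detection['DetectedText'])
        elif detection['Type'] == 'WORD':
            words.append(detection['DetectedText'])
    return lines, words


# Hand-built stand-in for part of a detect_text response (not a live API call).
sample = {'TextDetections': [
    {'DetectedText': 'DANGER', 'Type': 'LINE'},
    {'DetectedText': 'HARD HAT', 'Type': 'LINE'},
    {'DetectedText': 'DANGER', 'Type': 'WORD'},
    {'DetectedText': 'HARD', 'Type': 'WORD'},
    {'DetectedText': 'HAT', 'Type': 'WORD'},
]}

lines, words = split_detections(sample)
print(lines)
print(words)
```

If you only care about whole lines, filtering on the LINE type avoids the apparent duplication.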

And as you might have guessed, video is also supported.

Conclusion

AWS Rekognition is a simple, quick, and cost-effective way to detect objects, faces, text, and more in both still images and videos. You don't need to know anything about computer vision or machine learning. All you need to know is how to use the API through a client library. This guide used Python, but client libraries are available for other popular languages. This frees you from devoting resources to reinventing the wheel that Rekognition has already built. Thanks for reading!


Douglas Starnes is a tech author, professional explainer and Microsoft Most Valuable Professional in developer technologies in Memphis, TN. He is published on Pluralsight, Real Python and SkillShare. Douglas is co-director of the Memphis Python User Group, Memphis .NET User Group, Memphis Xamarin User Group and Memphis Power Platform User Group. He is also on the organizing committees of Scenic City Summit in Chattanooga, and TDevConf, a virtual conference in the state of Tennessee. A frequent conference and user group speaker, Douglas has delivered more than 70 featured presentations and workshops at more than 35 events over the past 10 years. He holds a Bachelor of Music degree with an emphasis on Music Composition from the University of Memphis.
