
How to implement an AI‑native mobile app: Facial recognition

Learn how to build an AI-native mobile app with facial recognition using a pre-trained TensorFlow model, from model selection to optimization to integration.

Apr 11, 2025 • 10 Minute Read


Let's build an AI‑native mobile app that uses facial recognition to greet users by name like this: “Hello, {name}!”

Rather than training a new model from scratch, we leverage a pre‑trained model from TensorFlow Hub that is designed for image recognition. By using a model such as ResNet‑V2 50 (pre‑trained on ImageNet), we can extract feature embeddings from a face image and compare them with a stored database of known individuals.

The tutorial covers every step, from setting up the model pipeline to converting it for mobile deployment with TensorFlow Lite and integrating it into an Android application.

Getting started: AI-native mobile apps

Mobile devices are increasingly powerful, and modern smartphones can now run sophisticated AI models on‑device. This capability allows us to build intelligent applications that work in real time without relying on a remote server. In this project, our goal is to implement a facial recognition system that operates entirely on the device. When the app detects a face from the camera feed, it extracts a feature vector using a pre‑trained ResNet‑based model from TensorFlow Hub. Then it compares this vector against a pre‑computed database of embeddings for known individuals. If a match is found, the app displays “Hello, {name}!” on the screen.

By taking advantage of a model pre‑trained on a large dataset such as ImageNet, we can bypass the need to collect and label a massive facial dataset. Instead, we use the model’s learned representations for feature extraction and then perform a simple nearest‑neighbor search to identify the face.

In the sections that follow, we provide an overview of the architecture, describe how to prepare data and extract embeddings, walk through quantization and model conversion, and finally detail the Android integration process.

Overview of the architecture

Our solution is divided into two main parts:

  1. Model inference pipeline and optimization:

    • Pre‑trained model for feature extraction: We use a TensorFlow Hub module (for example, the ResNet‑V2 50 feature vector model) to extract a feature vector (embedding) from an input image.

    • Face matching: For each known user, we pre‑compute and store an embedding. At runtime, when a face is detected, we extract its embedding and compare it to these stored vectors using a similarity metric (like cosine similarity). A match is declared if the similarity exceeds a preset threshold.

    • Model optimization: To run efficiently on mobile hardware, we use TensorFlow Lite’s quantization toolkit to reduce the model size and speed up inference.

  2. Mobile application integration:

    • Camera input: The Android app uses the device’s camera (via CameraX or the legacy Camera API) to capture frames.

    • Pre‑processing: Captured images are cropped, resized, and normalized to match the input requirements of the pre‑trained model.

    • On‑device inference: The quantized TFLite model is loaded in the Android app to run inference on each frame.

    • UI update: When a face is recognized, the app displays “Hello, {name}!” on the screen.

This end‑to‑end architecture demonstrates the principles of an AI‑native mobile app, with AI seamlessly integrated into the app’s core functionality.

Leveraging a pre‑trained model from TensorFlow Hub

Choosing the right model

For our tutorial, we need a model that can provide robust image features without further training. TensorFlow Hub offers several candidates. A popular choice is the ResNet‑V2 50 feature vector module, which is pre‑trained on ImageNet. Although this model was not specifically designed for facial recognition, its learned representations can be repurposed to extract embeddings that differentiate between faces.

Alternatively, lighter models such as MobileNet‑V2 are available if you require a smaller footprint. In this example, we choose ResNet‑V2 50 due to its strong feature extraction capabilities. The process is as follows:

  • Load the model from TensorFlow Hub: Using TensorFlow’s Keras interface, we load the module as a KerasLayer.

  • Pre‑process the input: Resize the face image to the required input dimensions (usually 224×224) and normalize the pixel values.

  • Extract the feature vector: Run the image through the model to obtain a fixed‑length embedding.

Example snippet (Python) for loading the model:

import tensorflow as tf
import tensorflow_hub as hub

INPUT_SHAPE = (224, 224, 3)
module_url = "https://tfhub.dev/google/imagenet/resnet_v2_50/feature_vector/5"
feature_extractor = hub.KerasLayer(module_url, input_shape=INPUT_SHAPE, trainable=False)

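To cover the pre-processing and feature-extraction steps above, here is a minimal sketch. The helper names (preprocess_image, get_embedding) and the use of Pillow for image loading are our own illustrative choices, and we assume the feature extractor expects pixel values scaled to [0, 1]:

import numpy as np
from PIL import Image

def preprocess_image(image_path):
    # Load the image, resize to the model's expected input size, and scale pixels to [0, 1].
    img = Image.open(image_path).convert('RGB').resize((224, 224))
    img = np.asarray(img, dtype='float32') / 255.0
    return np.expand_dims(img, axis=0)  # Add a batch dimension: (1, 224, 224, 3)

def get_embedding(image_path):
    # Run the pre-processed image through the feature extractor to get a fixed-length embedding.
    batch = preprocess_image(image_path)
    embedding = feature_extractor(batch)
    return np.squeeze(embedding.numpy())  # e.g., a 2048-dimensional vector for ResNet-V2 50
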
Building a simple matching pipeline

Once the model is loaded, the next step is to generate and store embeddings for known individuals. For each user:

  1. Capture a reference image.

  2. Pre‑process the image (resize, normalize).

  3. Run the image through the pre‑trained model to extract its embedding.

  4. Save the embedding along with the user’s name (for example, in a JSON file or a database), as sketched below.
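
As a rough sketch of this enrollment flow, the snippet below builds the database of known embeddings and writes it to a JSON file. It reuses the hypothetical get_embedding helper from the earlier sketch; the image paths, file name, and dictionary layout are illustrative assumptions:

import json

# Reference images for each known user (paths are placeholders).
reference_images = {
    'alice': 'faces/alice.jpg',
    'bob': 'faces/bob.jpg',
}

# Steps 1-3: capture, pre-process, and embed each reference image.
known_embeddings = {name: get_embedding(path) for name, path in reference_images.items()}

# Step 4: persist the embeddings alongside the user names.
with open('known_embeddings.json', 'w') as f:
    json.dump({name: emb.tolist() for name, emb in known_embeddings.items()}, f)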

At runtime, when the app captures a new image:

  • Pre‑process the image.

  • Extract its embedding.

  • Compare this embedding with each stored embedding using a similarity measure (e.g., cosine similarity or Euclidean distance).

  • If a match is found (similarity score exceeds a threshold), retrieve the corresponding name.

Comparing embeddings:

      import numpy as np

def cosine_similarity(emb1, emb2):
    return np.dot(emb1, emb2) / (np.linalg.norm(emb1) * np.linalg.norm(emb2))

def recognize_face(input_embedding, known_embeddings, threshold=0.8):
    for name, embedding in known_embeddings.items():
        sim = cosine_similarity(input_embedding, embedding)
        if sim > threshold:
            return name
    return None
    

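Putting the pieces together, a minimal end-to-end check on the desktop side might look like the following. It assumes the hypothetical get_embedding helper and the known_embeddings.json file from the sketches above:

import json
import numpy as np

# Load the stored embeddings for known users.
with open('known_embeddings.json') as f:
    known_embeddings = {name: np.array(emb) for name, emb in json.load(f).items()}

# Embed a newly captured image and look for a match.
query_embedding = get_embedding('captured_frame.jpg')
name = recognize_face(query_embedding, known_embeddings, threshold=0.8)
print(f"Hello, {name}!" if name else "Face not recognized")
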
Model optimization: Quantization and conversion to TensorFlow Lite

To ensure that our model runs efficiently on mobile devices, we need to optimize it. TensorFlow Lite supports post‑training quantization, which converts the model from float32 to a lower‑precision format (e.g., int8). This not only reduces the model’s size but also speeds up inference.

Steps for quantization and conversion

  1. Load the pre‑trained model: Instead of training a new model, we use the pre‑trained model directly.

  2. Define a representative dataset: Although our model is pre‑trained, the converter requires sample input data for calibration. We can use a few pre‑processed images of faces for this purpose.

  3. Convert to TFLite: Use the TensorFlow Lite Converter to generate a quantized model file.

Quantization:

import numpy as np
import tensorflow as tf

# Convert the Keras model (which wraps the hub module) to TFLite
model = tf.keras.Sequential([feature_extractor])
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]

# Representative dataset used to calibrate quantization. Random data is used here for
# brevity; in practice, feed a few pre-processed face images instead.
def representative_data_gen():
    samples = np.random.rand(100, 224, 224, 3).astype('float32')
    for input_value in tf.data.Dataset.from_tensor_slices(samples).batch(1).take(100):
        yield [input_value]

converter.representative_dataset = representative_data_gen
tflite_model = converter.convert()

with open('face_recognition_model_quant.tflite', 'wb') as f:
    f.write(tflite_model)

The resulting face_recognition_model_quant.tflite is optimized for mobile and ready for integration into an Android app.
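
Before bundling the file into the app, it is worth sanity-checking the converted model with the TFLite interpreter in Python. The sketch below only verifies tensor shapes and that inference runs; the random input stands in for a real pre-processed face image:

import numpy as np
import tensorflow as tf

# Load the quantized model and allocate tensors.
interpreter = tf.lite.Interpreter(model_path='face_recognition_model_quant.tflite')
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()
print('Input shape:', input_details[0]['shape'])    # Expected: [1, 224, 224, 3]
print('Output shape:', output_details[0]['shape'])  # Embedding length, e.g. [1, 2048]

# Run one inference on dummy data to confirm the model executes.
dummy = np.random.rand(1, 224, 224, 3).astype(np.float32)
interpreter.set_tensor(input_details[0]['index'], dummy)
interpreter.invoke()
embedding = interpreter.get_tensor(output_details[0]['index'])
print('Embedding vector length:', embedding.shape[-1])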

Integrating the model into an Android app

Now that we have our quantized TFLite model, we integrate it into an Android application. The Android app will capture images from the camera, run inference using the TFLite interpreter, and display a greeting message if a known face is detected.

Setting up the Android project

  1. Create a new Project: Open Android Studio and start a new project using an empty activity. 
  2. Add Dependencies: In your app’s build.gradle file, add the TensorFlow Lite dependency:
      dependencies {
    implementation 'org.tensorflow:tensorflow-lite:2.10.0'
    implementation 'org.tensorflow:tensorflow-lite-task-vision:0.4.0'
    // Import the GPU delegate plugin Library for GPU inference
    implementation 'org.tensorflow:tensorflow-lite-gpu-delegate-plugin:0.4.0'
    implementation 'org.tensorflow:tensorflow-lite-gpu:2.9.0'

    // Add additional dependencies for camera access if necessary.
}
    

 

  3. Include the TFLite model: Place the face_recognition_model_quant.tflite file in the assets folder of your project.

Android application architecture

The Android app’s components include:

  • Camera Module: Use CameraX or the legacy Camera API to capture live images.

  • Pre‑processing Module: Convert the camera frame to a bitmap, resize to 224×224, and normalize it to match the model’s input format.

  • Inference Module: Load the TFLite model using TensorFlow Lite’s interpreter and run inference on the processed image.

  • Face Matching: Convert the output of the model (the feature vector) and compare it with stored embeddings of known faces to determine the identity.

  • UI Module: Update the UI to display “Hello, {name}!” when a match is found.

Key Android code components

Loading the TFLite Model: Below is an example of how to load the quantized model in your Android app:

import android.content.Context;
import android.content.res.AssetFileDescriptor;
import org.tensorflow.lite.Interpreter;
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

public class TFLiteModel {
    private final Interpreter interpreter;

    public TFLiteModel(Context context) throws IOException {
        MappedByteBuffer modelBuffer = loadModelFile(context, "face_recognition_model_quant.tflite");
        interpreter = new Interpreter(modelBuffer);
    }

    // Memory-map the model from the assets folder so it can be loaded without copying.
    private MappedByteBuffer loadModelFile(Context context, String modelFile) throws IOException {
        AssetFileDescriptor fileDescriptor = context.getAssets().openFd(modelFile);
        FileInputStream inputStream = new FileInputStream(fileDescriptor.getFileDescriptor());
        FileChannel fileChannel = inputStream.getChannel();
        long startOffset = fileDescriptor.getStartOffset();
        long declaredLength = fileDescriptor.getDeclaredLength();
        return fileChannel.map(FileChannel.MapMode.READ_ONLY, startOffset, declaredLength);
    }

    // inputData is expected to hold the pre-processed image in the model's input format
    // (for this float model, normalized float32 pixel values in a 1x224x224x3 layout).
    public float[] predict(byte[] inputData) {
        // The ResNet-V2 50 feature vector module outputs a 2048-dimensional embedding;
        // adjust this size if you use a different model.
        float[][] output = new float[1][2048];
        ByteBuffer inputBuffer = ByteBuffer.allocateDirect(inputData.length)
                .order(ByteOrder.nativeOrder());
        inputBuffer.put(inputData);
        inputBuffer.rewind();
        interpreter.run(inputBuffer, output);
        return output[0];
    }
}

Capturing Camera Frames and Running Inference: Use Android’s Camera API (or preferably CameraX) to capture images. Process each frame to obtain a bitmap, resize it to 224×224, and convert it to a byte array. Then pass the byte array to the TFLite model for inference:

// Inside your Camera callback method
public void onImageAvailable(ImageReader reader) {
    Image image = reader.acquireLatestImage();
    if (image != null) {
        Bitmap bitmap = convertImageToBitmap(image);
        Bitmap resizedBitmap = Bitmap.createScaledBitmap(bitmap, 224, 224, true);
        byte[] inputData = convertBitmapToByteArray(resizedBitmap);

        float[] embedding = tfliteModel.predict(inputData);
        String recognizedName = matchEmbeddingToName(embedding);

        if (recognizedName != null) {
            runOnUiThread(() -> greetingTextView.setText("Hello, " + recognizedName + "!"));
        }
        image.close();
    }
}

Functions like convertImageToBitmap, convertBitmapToByteArray, and matchEmbeddingToName should be implemented to handle image conversion and the matching logic (comparing the embedding against the stored database of known embeddings with cosine similarity, mirroring the earlier Python example).
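
As one possible sketch of the matching side, the class below loads the embeddings exported by the Python enrollment step from a bundled JSON asset and implements matchEmbeddingToName with cosine similarity. The FaceMatcher class name, the known_embeddings.json asset name, and the best-match-above-threshold behavior are our own illustrative choices:

import android.content.Context;
import org.json.JSONArray;
import org.json.JSONObject;
import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

public class FaceMatcher {
    private final Map<String, float[]> knownEmbeddings = new HashMap<>();
    private final float threshold;

    // Load the embeddings exported by the Python enrollment step from an assets JSON file.
    public FaceMatcher(Context context, String assetName, float threshold) throws Exception {
        this.threshold = threshold;
        InputStream is = context.getAssets().open(assetName);
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int n;
        while ((n = is.read(chunk)) != -1) {
            bos.write(chunk, 0, n);
        }
        is.close();
        JSONObject json = new JSONObject(bos.toString("UTF-8"));
        Iterator<String> names = json.keys();
        while (names.hasNext()) {
            String name = names.next();
            JSONArray arr = json.getJSONArray(name);
            float[] emb = new float[arr.length()];
            for (int i = 0; i < arr.length(); i++) {
                emb[i] = (float) arr.getDouble(i);
            }
            knownEmbeddings.put(name, emb);
        }
    }

    // Return the best-matching name, or null if no stored embedding clears the threshold.
    public String matchEmbeddingToName(float[] embedding) {
        String bestName = null;
        float bestSim = threshold;
        for (Map.Entry<String, float[]> entry : knownEmbeddings.entrySet()) {
            float sim = cosineSimilarity(embedding, entry.getValue());
            if (sim > bestSim) {
                bestSim = sim;
                bestName = entry.getKey();
            }
        }
        return bestName;
    }

    private static float cosineSimilarity(float[] a, float[] b) {
        float dot = 0f, normA = 0f, normB = 0f;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / ((float) Math.sqrt(normA) * (float) Math.sqrt(normB));
    }
}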

Testing and debugging

After integrating the model into the Android project, thorough testing is critical:

  • Real device testing: Emulators might not fully replicate camera and hardware performance. Deploy your app on an actual device to assess real‑time performance.

  • Inference speed: Check that the model runs fast enough to allow a smooth user experience. Monitor inference time and optimize if needed.

  • Face matching accuracy: Validate that the pre‑computed embeddings and runtime inference produce accurate matches. Adjust the similarity threshold as necessary (see the sketch after this list).

  • Resource management: Monitor memory usage and battery consumption since on‑device AI inference can be resource‑intensive.
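
One way to choose that threshold is to compute similarity scores on the desktop side for pairs of images of the same person and pairs of different people, then pick a cut-off that separates the two groups. The sketch below assumes the hypothetical get_embedding helper from earlier and uses placeholder image pairs:

import numpy as np

# Pairs of reference images: (image_a, image_b, same_person?) -- placeholders for your own data.
pairs = [
    ('faces/alice_1.jpg', 'faces/alice_2.jpg', True),
    ('faces/alice_1.jpg', 'faces/bob_1.jpg', False),
]

genuine, impostor = [], []
for path_a, path_b, same in pairs:
    sim = cosine_similarity(get_embedding(path_a), get_embedding(path_b))
    (genuine if same else impostor).append(sim)

# A simple starting point: a threshold halfway between the two score distributions.
threshold = (np.mean(genuine) + np.mean(impostor)) / 2
print('Suggested threshold:', threshold)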

Leverage Android Studio’s Logcat and profiling tools to track down issues and fine‑tune performance.

Conclusion

In this tutorial, we demonstrated how to build an AI‑native mobile app for facial recognition using a pre‑trained model from TensorFlow Hub. Instead of training a model from scratch, we leveraged the ResNet‑V2 50 feature vector module—pre‑trained on ImageNet—to extract embeddings from face images. These embeddings were compared against a stored database of known individuals, and when a match was found, the app displayed “Hello, {name}!”

The process involved the following key steps:

  • Loading a Pre‑Trained Model: We used TensorFlow Hub to load a high‑quality feature extractor, bypassing the need for extensive training.

  • Building a matching pipeline: By pre‑computing embeddings for known faces and comparing them in real time, we implemented a simple yet effective recognition system.

  • Optimizing for mobile: Quantization and conversion to TensorFlow Lite ensured that our model runs efficiently on mobile devices.

  • Android integration: The tutorial provided an end‑to‑end guide to integrating the TFLite model into an Android app, from capturing camera frames to updating the UI with personalized greetings.

This approach highlights the power of AI‑native mobile apps: intelligent, responsive, and privacy‑focused solutions that run entirely on the device. By leveraging pre‑trained models, developers can accelerate the development cycle and focus on delivering compelling user experiences.

As you experiment with these techniques, consider further enhancements such as improving the matching algorithm, incorporating real‑time face detection, or even integrating more sophisticated models for additional features. The world of on‑device AI is rapidly evolving, and your app can be at the forefront of this transformation.

Happy coding, and enjoy building the next generation of intelligent mobile applications!

Axel Sirota


Axel Sirota is a Microsoft Certified Trainer with a deep interest in deep learning and machine learning operations. He has a master's degree in Mathematics, and after researching probability, statistics, and machine learning optimization, he now works as an AI and cloud consultant as well as an author and instructor at Pluralsight, Develop Intelligence, and O'Reilly Media.
