TensorFlow Lite for Mobile — Deep Dive
Model Conversion Deep Dive
Handling Unsupported Operations
Not all TensorFlow operations have TF Lite equivalents. When conversion fails, you have three options:
import tensorflow as tf
converter = tf.lite.TFLiteConverter.from_saved_model("model_dir")
# Option 1: Use TF ops as fallback (increases binary size)
converter.target_spec.supported_ops = [
    tf.lite.OpsSet.TFLITE_BUILTINS,
    tf.lite.OpsSet.SELECT_TF_OPS,  # Fallback to full TF ops
]
# Option 2: Custom op implementation (most control)
# Requires C++ implementation registered with the interpreter
# Option 3: Replace unsupported ops in the model before conversion
SELECT_TF_OPS pulls in parts of the full TensorFlow runtime, increasing the app binary by 10-20 MB. Use it for prototyping, then replace with native TF Lite ops or custom ops for production.
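One way to keep the fallback out of a build until it is actually needed is to attempt a builtins-only conversion first and retry with the larger op set only when that fails. The sketch below assumes the converter raises an exception on an unsupported op. The names convert_with_fallback and make_converter are hypothetical, and the op sets are passed in as arguments so the helper itself does not import TensorFlow.

```python
def convert_with_fallback(make_converter, builtin_ops, fallback_ops):
    """Try a builtins-only conversion; retry with fallback ops if it fails.

    make_converter: zero-argument callable returning a fresh converter, e.g.
        lambda: tf.lite.TFLiteConverter.from_saved_model("model_dir")
    builtin_ops / fallback_ops: lists of op sets, e.g.
        [tf.lite.OpsSet.TFLITE_BUILTINS] and
        [tf.lite.OpsSet.TFLITE_BUILTINS, tf.lite.OpsSet.SELECT_TF_OPS]
    Returns (model_bytes, used_fallback).
    """
    converter = make_converter()
    converter.target_spec.supported_ops = builtin_ops
    try:
        return converter.convert(), False
    except Exception as first_error:  # the exact exception type varies by TF version
        print(f"Builtins-only conversion failed: {first_error}")

    # Second attempt on a fresh converter with the larger op set
    converter = make_converter()
    converter.target_spec.supported_ops = fallback_ops
    return converter.convert(), True
```

Logging the used_fallback flag in a build pipeline makes it visible when a model change silently starts pulling in the larger runtime.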
Metadata and Label Files
TF Lite models can embed metadata — labels, normalization parameters, input preprocessing specs:
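import flatbuffers
from tflite_support import metadata as _metadata
from tflite_support import metadata_schema_py_generated as _metadata_fb
# Create model metadata
model_meta = _metadata_fb.ModelMetadataT()
model_meta.name = "Image Classifier"
model_meta.description = "Classifies images into 1000 ImageNet categories"
model_meta.version = "1.0"
# Associate the label file with the output tensor
associated_file = _metadata_fb.AssociatedFileT()
associated_file.name = "labels.txt"
associated_file.type = _metadata_fb.AssociatedFileType.TENSOR_AXIS_LABELS
output_meta = _metadata_fb.TensorMetadataT()
output_meta.associatedFiles = [associated_file]
# Describe the subgraph (assumes one input and one output tensor)
subgraph = _metadata_fb.SubGraphMetadataT()
subgraph.inputTensorMetadata = [_metadata_fb.TensorMetadataT()]
subgraph.outputTensorMetadata = [output_meta]
model_meta.subgraphMetadata = [subgraph]
# Serialize the metadata into a flatbuffer
builder = flatbuffers.Builder(0)
builder.Finish(
    model_meta.Pack(builder),
    _metadata.MetadataPopulator.METADATA_FILE_IDENTIFIER,
)
metadata_buf = builder.Output()
# Write metadata and the label file into the .tflite file
populator = _metadata.MetadataPopulator.with_model_file("model.tflite")
populator.load_metadata_buffer(metadata_buf)
populator.load_associated_files(["labels.txt"])
populator.populate()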
Apps using TF Lite’s Task API can read this metadata to automatically handle preprocessing and postprocessing.
Full Integer Quantization Pipeline
Calibration with Representative Dataset
import numpy as np
def representative_dataset():
    """Yield 100-500 samples that represent the real-world input distribution."""
    # calibration_paths and load_and_preprocess_image are placeholders for your own data pipeline
    for i in range(200):
        # Use actual training/validation data, not random noise
        image = load_and_preprocess_image(calibration_paths[i])
        yield [image[np.newaxis, ...].astype(np.float32)]
converter = tf.lite.TFLiteConverter.from_saved_model("model_dir")
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.uint8
converter.inference_output_type = tf.uint8
int8_model = converter.convert()
Calibration quality matters. If your representative dataset does not cover the real input distribution, quantization ranges will be wrong, causing accuracy degradation. Use 100-500 diverse samples from your actual data.
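To see why an unrepresentative calibration set hurts, it helps to look at what calibration produces: a float range per tensor that is mapped onto 8-bit integers. The sketch below uses the standard affine scheme (q = round(x / scale) + zero_point, clamped to the integer range). The function names are hypothetical, and TF Lite's real calibrator is more elaborate than a plain min/max.

```python
def quant_params(observed_min, observed_max, qmin=0, qmax=255):
    """Derive scale and zero-point from a calibrated float range."""
    # The range must contain 0.0 so that zero is exactly representable
    lo, hi = min(observed_min, 0.0), max(observed_max, 0.0)
    scale = (hi - lo) / (qmax - qmin) or 1.0  # avoid a zero scale for an empty range
    zero_point = int(round(qmin - lo / scale))
    return scale, zero_point

def quantize(x, scale, zero_point, qmin=0, qmax=255):
    q = int(round(x / scale)) + zero_point
    return max(qmin, min(qmax, q))  # values outside the calibrated range saturate

def dequantize(q, scale, zero_point):
    return scale * (q - zero_point)

# Calibration only saw activations in [0, 1], but real inputs reach 4.0
scale, zp = quant_params(0.0, 1.0)
for x in (0.5, 1.0, 4.0):
    restored = dequantize(quantize(x, scale, zp), scale, zp)
    print(f"{x:.1f} -> {restored:.3f}")
```

The input 4.0 is clamped to the top of the calibrated range, so it comes back as roughly 1.0. That silent saturation is the accuracy loss a narrow calibration set causes.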
Per-Channel vs Per-Tensor Quantization
- Per-tensor: One scale/zero-point per entire tensor. Simpler, faster, but less accurate for layers with widely varying weight ranges.
- Per-channel: One scale/zero-point per output channel. Better accuracy for convolutional layers where different filters have different magnitudes.
In full integer mode, TF Lite quantizes convolution weights per channel by default and activations per tensor. Per-channel weights are a large part of why it stays close to float accuracy in most cases.
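The difference is easy to reproduce with a toy example. The sketch below quantizes two filters of very different magnitude with symmetric int8 scales (zero-point 0, range [-127, 127]), once with a single shared scale and once with one scale per filter. The weight values and function names are made up for illustration.

```python
def symmetric_scale(values, qmax=127):
    """Scale for symmetric int8 quantization of weights (zero-point = 0)."""
    return max(abs(v) for v in values) / qmax

def roundtrip_error(values, scale, qmax=127):
    """Mean absolute error after quantizing and dequantizing with one scale."""
    total = 0.0
    for v in values:
        q = max(-qmax, min(qmax, round(v / scale)))
        total += abs(v - q * scale)
    return total / len(values)

# Two output channels (filters) with very different weight magnitudes
channels = [
    [0.010, -0.020, 0.015, -0.005],   # small-magnitude filter
    [2.000, -1.500, 0.750, -2.500],   # large-magnitude filter
]

# Per-tensor: one scale shared by every weight
all_weights = [w for ch in channels for w in ch]
shared = symmetric_scale(all_weights)
per_tensor = [roundtrip_error(ch, shared) for ch in channels]

# Per-channel: each filter gets its own scale
per_channel = [roundtrip_error(ch, symmetric_scale(ch)) for ch in channels]

for i, (pt, pc) in enumerate(zip(per_tensor, per_channel)):
    print(f"channel {i}: per-tensor error {pt:.5f}, per-channel error {pc:.5f}")
```

With a shared scale, the large filter dictates the step size, so the small filter's weights collapse onto only a few integer levels. With its own scale, the small filter uses the full int8 range.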
Delegate Configuration
GPU Delegate (Android)
// Android (Java)
GpuDelegate.Options options = new GpuDelegate.Options();
options.setPrecisionLossAllowed(true); // Allow FP16 for speed
options.setInferencePreference(
    GpuDelegate.Options.INFERENCE_PREFERENCE_SUSTAINED_SPEED
);
GpuDelegate gpuDelegate = new GpuDelegate(options);
Interpreter.Options interpreterOptions = new Interpreter.Options();
interpreterOptions.addDelegate(gpuDelegate);
Interpreter interpreter = new Interpreter(modelFile, interpreterOptions);
Core ML Delegate (iOS)
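// iOS (Swift)
// CoreMLDelegate() returns nil on devices without a Neural Engine,
// and delegates are passed to the Interpreter initializer
var delegates: [Delegate] = []
if let coreMLDelegate = CoreMLDelegate() {
    delegates.append(coreMLDelegate)
}
let interpreter = try Interpreter(
    modelPath: modelPath,
    options: Interpreter.Options(),
    delegates: delegates
)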
Delegate Selection Strategy
| Device | Best Delegate | Fallback |
|---|---|---|
| Pixel 6+ | NNAPI (Google TPU) | GPU |
| Samsung Galaxy S21+ | NNAPI (Samsung NPU) | GPU |
| iPhone 12+ | Core ML (Neural Engine) | Metal GPU |
| Older Android | GPU Delegate | CPU (XNNPACK) |
| Raspberry Pi | CPU (XNNPACK) | — |
Always benchmark on target hardware. Some models run faster on CPU with XNNPACK than on GPU due to delegate overhead and memory transfer costs.
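The same "measure, then choose" idea can be sketched as a small harness that times each available backend and skips any that fail to initialize. This is an illustrative Python sketch, not part of TF Lite. The names benchmark and pick_fastest are hypothetical, and each backend is represented by a factory that returns a zero-argument inference callable.

```python
import time
import statistics

def benchmark(run_once, warmup_runs=3, num_runs=20):
    """Return the median latency in milliseconds of a zero-argument callable."""
    for _ in range(warmup_runs):      # warm-up runs are discarded
        run_once()
    samples = []
    for _ in range(num_runs):
        start = time.perf_counter()
        run_once()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)

def pick_fastest(candidates, **bench_kwargs):
    """Time every backend that can be created and return the fastest.

    candidates maps a backend name to a factory returning a zero-argument
    inference callable; a factory may raise if the backend is unavailable.
    """
    timings = {}
    for name, factory in candidates.items():
        try:
            timings[name] = benchmark(factory(), **bench_kwargs)
        except Exception as err:
            print(f"{name}: skipped ({err})")
    if not timings:
        raise RuntimeError("no backend could run the model")
    best = min(timings, key=timings.get)
    return best, timings
```

In practice each factory would build an interpreter with one delegate attached and return a closure that runs a single inference. Caching the winner per device model avoids repeating the measurement on every launch.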
Benchmarking on Device
TF Lite Benchmark Tool
# Android
adb push benchmark_model /data/local/tmp/
adb push model.tflite /data/local/tmp/
# Make the pushed binary executable before running it
adb shell chmod +x /data/local/tmp/benchmark_model
adb shell /data/local/tmp/benchmark_model \
  --graph=/data/local/tmp/model.tflite \
  --num_threads=4 \
  --warmup_runs=10 \
  --num_runs=100 \
  --use_gpu=true
# Output includes:
# - Inference (avg): 12.3ms
# - Inference (std): 1.2ms
# - Memory footprint: 45MB
Profiling Individual Operations
adb shell /data/local/tmp/benchmark_model \
--graph=/data/local/tmp/model.tflite \
--enable_op_profiling=true \
--num_runs=20
This shows time spent in each operation, helping identify bottlenecks:
============================== Summary by node type ==============================
[Node type] [count] [avg ms] [avg %] [cdf %]
CONV_2D 13 8.421 68.2% 68.2%
DEPTHWISE_CONV_2D 13 2.156 17.5% 85.7%
FULLY_CONNECTED 1 0.892 7.2% 92.9%
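When comparing several profiling runs, it can be handy to parse this summary rather than read it by eye. The sketch below assumes the five-column layout shown above (real output from the tool may contain additional columns), and parse_node_summary is a hypothetical helper name.

```python
def parse_node_summary(text):
    """Parse 'Summary by node type' rows into dictionaries, slowest first."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        # Data rows have five columns: type, count, avg ms, avg %, cdf %
        if len(parts) != 5 or not parts[1].isdigit():
            continue
        rows.append({
            "node_type": parts[0],
            "count": int(parts[1]),
            "avg_ms": float(parts[2]),
            "avg_pct": float(parts[3].rstrip("%")),
        })
    return sorted(rows, key=lambda r: r["avg_ms"], reverse=True)

summary = """\
[Node type] [count] [avg ms] [avg %] [cdf %]
CONV_2D 13 8.421 68.2% 68.2%
DEPTHWISE_CONV_2D 13 2.156 17.5% 85.7%
FULLY_CONNECTED 1 0.892 7.2% 92.9%
"""

for row in parse_node_summary(summary):
    print(f'{row["node_type"]:<20} {row["avg_ms"]:>7.3f} ms  {row["avg_pct"]:>5.1f}%')
```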
Android Integration Patterns
Using the Task API
The Task API provides high-level wrappers that handle preprocessing and postprocessing:
// Image classification with Task API
val options = ImageClassifier.ImageClassifierOptions.builder()
    .setBaseOptions(BaseOptions.builder().useGpu().build())
    .setMaxResults(5)
    .build()
val classifier = ImageClassifier.createFromFileAndOptions(
    context, "model.tflite", options
)
val image = TensorImage.fromBitmap(bitmap)
val results = classifier.classify(image)
results.forEach { classification ->
    classification.categories.forEach { category ->
        Log.d("ML", "${category.label}: ${category.score}")
    }
}
Camera Pipeline Integration
// CameraX + TF Lite for real-time inference
val imageAnalysis = ImageAnalysis.Builder()
    .setTargetResolution(Size(224, 224))
    .setBackpressureStrategy(ImageAnalysis.STRATEGY_KEEP_ONLY_LATEST)
    .build()
imageAnalysis.setAnalyzer(executor) { imageProxy ->
    try {
        val bitmap = imageProxy.toBitmap()
        val results = classifier.classify(TensorImage.fromBitmap(bitmap))
        runOnUiThread { updateUI(results) }
    } finally {
        // Always close the frame, or CameraX stops delivering new ones
        imageProxy.close()
    }
}
iOS Integration Patterns
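// Swift integration
class ImageClassifier {
    private var interpreter: Interpreter
    init(modelPath: String) throws {
        var options = Interpreter.Options()
        options.threadCount = 2
        // Delegates are passed to the initializer; CoreMLDelegate() is nil without a Neural Engine
        var delegates: [Delegate] = []
        if let delegate = CoreMLDelegate() {
            delegates.append(delegate)
        }
        interpreter = try Interpreter(modelPath: modelPath, options: options, delegates: delegates)
        try interpreter.allocateTensors()
    }
    func classify(image: UIImage) throws -> [Classification] {
        // pixelBuffer(width:height:), rgbData(isModelQuantized:), toArray(type:) and topK are app-defined helpers
        guard let pixelBuffer = image.pixelBuffer(width: 224, height: 224) else {
            throw ClassificationError.preprocessingFailed
        }
        let inputData = pixelBuffer.rgbData(isModelQuantized: true)
        try interpreter.copy(inputData, toInputAt: 0)
        try interpreter.invoke()
        let outputTensor = try interpreter.output(at: 0)
        // A quantized model returns UInt8 scores; dequantize them before ranking
        let probabilities: [Float]
        if let params = outputTensor.quantizationParameters, outputTensor.dataType == .uInt8 {
            probabilities = [UInt8](outputTensor.data).map { params.scale * Float(Int($0) - params.zeroPoint) }
        } else {
            probabilities = outputTensor.data.toArray(type: Float32.self)
        }
        return topK(probabilities, k: 5)
    }
}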
On-Device Personalization
TF Lite supports on-device training for personalization scenarios:
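# Export model with training signature
# (model, loss_fn, optimizer and the serve tf.function are assumed to be defined earlier)
@tf.function(input_signature=[
    tf.TensorSpec([None, 224, 224, 3], tf.float32),
    tf.TensorSpec([None], tf.int32),
])
def train(images, labels):
    with tf.GradientTape() as tape:
        predictions = model(images, training=True)
        loss = loss_fn(labels, predictions)
    gradients = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(gradients, model.trainable_variables))
    return {"loss": loss}
signatures = {
    "serving_default": serve.get_concrete_function(),
    "train": train.get_concrete_function(),
}
# Save both signatures, then convert from the SavedModel
# ("trainable_model_dir" is an example path)
tf.saved_model.save(model, "trainable_model_dir", signatures=signatures)
converter = tf.lite.TFLiteConverter.from_saved_model("trainable_model_dir")
# Training graphs need TF op fallback and resource variable support
converter.target_spec.supported_ops = [
    tf.lite.OpsSet.TFLITE_BUILTINS,
    tf.lite.OpsSet.SELECT_TF_OPS,
]
converter.experimental_enable_resource_variables = True
tflite_model = converter.convert()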
Use cases: keyboard prediction that learns your vocabulary, photo app that recognizes your pets, health app that adapts to your patterns — all without sending data to the cloud.
Model Size Optimization Checklist
- Start with a mobile-first architecture — MobileNetV3, EfficientNet-Lite, not ResNet-152
- Apply quantization-aware training if accuracy matters
- Use post-training dynamic range quantization for quick wins
- Strip unnecessary ops — remove training-only nodes before conversion
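- Do not compress the .tflite file inside the APK: add noCompress "tflite" to the Gradle aaptOptions block so the model can be memory-mapped. App store delivery already compresses the download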
- Benchmark on actual target devices — emulators are unreliable for performance
| Model | Float32 size | Int8 quantized size | Latency (Pixel 6) |
|---|---|---|---|
| MobileNetV3-Small | 6.2 MB | 1.6 MB | 1.8ms |
| EfficientNet-Lite0 | 18.4 MB | 4.7 MB | 3.2ms |
| MobileBERT | 100 MB | 25 MB | 45ms |
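The int8 sizes follow from storing each weight in 8 bits instead of 32, which gives a reduction of roughly 4x. A quick check of the ratios, using only the figures from the table above:

```python
# Sizes in MB, copied from the table above: (float32, int8)
sizes_mb = {
    "MobileNetV3-Small": (6.2, 1.6),
    "EfficientNet-Lite0": (18.4, 4.7),
    "MobileBERT": (100.0, 25.0),
}

# Ratio of float32 size to int8 size for each model
ratios = {name: fp32 / int8 for name, (fp32, int8) in sizes_mb.items()}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}x smaller")
```

Ratios slightly below 4 are expected, since scales, zero-points, and any tensors left in float add a small overhead.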
Production Deployment Patterns
Dynamic Model Updates
Download new models from your server without app store updates:
// Android - download and swap model
val modelFile = downloadModel("https://api.example.com/models/v3.tflite")
val newInterpreter = Interpreter(modelFile, options)
// Swap under the same lock that guards inference, then release the old interpreter
synchronized(lock) {
    val oldInterpreter = currentInterpreter
    currentInterpreter = newInterpreter
    oldInterpreter.close()
}
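A downloaded model is untrusted input until it has been validated. As a language-neutral sketch of that step, the Python below verifies a checksum and the TF Lite flatbuffer file identifier before atomically replacing the active model file. It assumes the server publishes a SHA-256 digest alongside the model, and install_model is a hypothetical helper name.

```python
import hashlib
import os

TFLITE_IDENTIFIER = b"TFL3"  # FlatBuffers file identifier stored at bytes 4..8

def install_model(downloaded_path, active_path, expected_sha256):
    """Validate a downloaded model, then atomically replace the active one."""
    with open(downloaded_path, "rb") as f:
        data = f.read()
    if hashlib.sha256(data).hexdigest() != expected_sha256:
        raise ValueError("checksum mismatch: download is corrupt or tampered with")
    if data[4:8] != TFLITE_IDENTIFIER:
        raise ValueError("file does not look like a TF Lite flatbuffer")
    # os.replace is atomic when both paths are on the same filesystem
    os.replace(downloaded_path, active_path)
    return active_path
```

If validation fails, the previous model file stays in place, so the app keeps running on the last known-good version.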
A/B Testing On Device
// hashCode() can be negative, so use floorMod to get a bucket in 0..99
val modelVersion = if (Math.floorMod(userId.hashCode(), 100) < 10) "v3_experimental" else "v3_stable"
val interpreter = Interpreter(getModel(modelVersion), options)
analytics.logModelVersion(modelVersion)
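Bucketing only works if the same user lands in the same bucket on every launch and on every platform. A minimal sketch of stable bucketing, using a cryptographic hash instead of a language's built-in string hash (Python's built-in hash, for instance, is salted per process), is shown below. The function names and the experiment label are hypothetical.

```python
import hashlib

def ab_bucket(user_id, experiment="model_v3", buckets=100):
    """Map a user ID to a stable bucket in [0, buckets)."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).digest()
    # The first 8 bytes form a non-negative integer, so the modulo is never negative
    return int.from_bytes(digest[:8], "big") % buckets

def model_version_for(user_id, experimental_percent=10):
    """Assign roughly experimental_percent of users to the experimental model."""
    if ab_bucket(user_id) < experimental_percent:
        return "v3_experimental"
    return "v3_stable"
```

Including the experiment name in the hashed string keeps assignments independent across experiments, so the same users are not always the ones in the experimental group.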
Error Handling and Fallbacks
try {
    // run() returns nothing; results are written into output
    interpreter.run(input, output)
} catch (e: Exception) {
    // Fall back to rule-based logic or cloud API
    analytics.logInferenceFailure(e)
    return fallbackClassification(input)
}
The one thing to remember: Successful TF Lite deployment requires choosing the right architecture (mobile-first), the right quantization (QAT for accuracy), the right delegate (benchmark on target hardware), and proper integration patterns — the conversion step is just the beginning.