
Inference

The process of feeding new data into a trained model to obtain predictions. Unlike training, inference does not update model parameters.

Inference is the process of passing unseen data through a trained neural network to obtain predictions such as class labels, bounding boxes, or segmentation masks. Unlike training, inference executes only the forward pass with frozen weights, making it computationally lighter per sample.
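The forward-pass-only nature of inference can be sketched with a toy network. This is a minimal, hypothetical example (illustrative weight values, plain Python): the model computes predictions but never touches a loss or a gradient.

```python
import math

# Frozen weights of a hypothetical 2-layer network (illustrative values).
# During inference these are read-only; no update ever occurs.
W1 = [[0.5, -0.2], [0.1, 0.4]]   # layer 1 weights (2x2)
W2 = [[0.3, -0.1], [-0.4, 0.6]]  # layer 2 weights (2x2)

def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def relu(v):
    return [max(0.0, x) for x in v]

def softmax(v):
    m = max(v)  # subtract max for numerical stability
    exps = [math.exp(x - m) for x in v]
    s = sum(exps)
    return [e / s for e in exps]

def infer(x):
    # Forward pass only: no loss computation, no backward pass,
    # no weight update -- this is what distinguishes inference.
    h = relu(matvec(W1, x))
    return softmax(matvec(W2, h))

probs = infer([1.0, 2.0])
label = probs.index(max(probs))  # predicted class index
```

In a real framework the same idea is expressed by switching the model to evaluation mode and disabling gradient tracking (e.g. `model.eval()` with `torch.no_grad()` in PyTorch), which skips the bookkeeping needed only for training.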

Inference performance is typically evaluated as a latency-accuracy trade-off. Real-time object detection allows a budget of roughly 33 ms per frame (30 FPS). For reference, YOLOv8 achieves about 1.5 ms per image on GPU, while MobileNetV3 runs in approximately 5 ms on CPU.
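Measuring whether a model fits the real-time budget comes down to timing the forward pass. Below is a hedged sketch using only the standard library; `dummy_inference` is a stand-in workload, not a real model, and the warm-up/averaging pattern is a common convention rather than a fixed rule.

```python
import time

def measure_latency(fn, warmup=10, runs=100):
    """Average wall-clock latency of one call to fn, in milliseconds."""
    for _ in range(warmup):        # warm-up runs stabilize caches etc.
        fn()
    start = time.perf_counter()
    for _ in range(runs):
        fn()
    elapsed = time.perf_counter() - start
    return elapsed / runs * 1000.0

# Hypothetical stand-in for a model's forward pass.
def dummy_inference():
    return sum(i * i for i in range(1000))

latency_ms = measure_latency(dummy_inference)
fps = 1000.0 / latency_ms
meets_realtime = latency_ms < 33.3  # 30 FPS budget from the text
```

Averaging over many runs after a warm-up phase matters because the first few calls often pay one-time costs (memory allocation, kernel compilation) that do not reflect steady-state latency.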

Browser-based inference via WebAssembly enables client-side image processing without any server round trip, which benefits both privacy and latency. Because inference, not training, accounts for the bulk of ongoing compute cost in cloud deployments, model optimization (quantization, pruning, compilation) is critical for production.
