TinyML on the Edge: Model Compression, On-Device Learning, and Energy–Latency Trade-Offs

Authors

    Arjun Patel, Department of Computer Engineering, Indian Institute of Technology Bombay, Mumbai, India.
    Fadi Al-Fayez *, Department of Mechatronics Engineering, German Jordanian University, Amman, Jordan. fadi.alfayez@gju.edu.jo

Keywords

Tiny Machine Learning (TinyML), model compression, on-device learning, edge AI, energy–latency optimization, neural architecture search, embedded intelligence

Abstract

This review synthesizes contemporary developments in Tiny Machine Learning (TinyML), with emphasis on model compression, on-device learning, and energy–latency trade-offs, to establish an integrated understanding of how intelligent inference and adaptation can be achieved on highly resource-constrained edge devices.

The study employed a qualitative systematic review design grounded in thematic analysis. Sixteen peer-reviewed articles published between 2019 and 2025 were selected from major scientific databases, including IEEE Xplore, the ACM Digital Library, ScienceDirect, and SpringerLink, based on relevance to TinyML, model compression, and edge inference optimization. Data collection was exclusively literature-based and followed theoretical saturation principles. All selected studies were imported into NVivo 14 for open, axial, and selective coding. Analysis proceeded by identifying recurring concepts and grouping them into higher-order themes through iterative interpretation; coding reliability was maintained through memo-keeping and cross-verification of emergent categories.

Four major thematic categories emerged: (1) model compression and optimization, encompassing pruning, quantization, distillation, and compiler-level acceleration; (2) on-device learning and adaptation, highlighting federated, meta-learning, and reinforcement learning techniques for autonomous edge model evolution; (3) energy–latency trade-off management, focusing on multi-objective optimization frameworks, hardware–software co-design, and low-power accelerators; and (4) application scenarios and benchmarking, demonstrating TinyML's adoption in vision, audio, biomedical, and industrial IoT contexts, supported by standardized metrics such as MLPerf Tiny.

Collectively, these findings confirm that sustainable edge intelligence requires unified co-optimization across algorithmic, hardware, and runtime dimensions. TinyML represents a convergence of embedded engineering and artificial intelligence in which compression, learning, and energy optimization interlock to enable autonomous, low-power, and responsive systems. Future research should advance adaptive, security-aware, and cross-domain frameworks to realize robust, scalable edge intelligence.
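Two of the compression techniques grouped under the first theme, magnitude-based pruning and post-training integer quantization, can be illustrated with a minimal sketch. The snippet below is an illustrative NumPy example, not an implementation from any of the reviewed studies; the weight values, 50% sparsity target, and symmetric per-tensor int8 scheme are assumptions chosen for clarity.

```python
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the smallest-magnitude weights until `sparsity` fraction is zero."""
    threshold = np.quantile(np.abs(weights), sparsity)
    return np.where(np.abs(weights) <= threshold, 0.0, weights)

def quantize_int8(weights: np.ndarray):
    """Symmetric per-tensor int8 quantization: w ~ scale * q, with q in [-127, 127]."""
    scale = float(np.max(np.abs(weights))) / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximate float tensor from the int8 representation."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 4)).astype(np.float32)

pruned = magnitude_prune(w, sparsity=0.5)   # half the weights become exact zeros
q, scale = quantize_int8(pruned)            # 8-bit storage: 4x smaller than float32
recovered = dequantize(q, scale)

print("sparsity:", float(np.mean(pruned == 0)))
print("max abs quantization error:", float(np.max(np.abs(pruned - recovered))))
```

Pruning creates zeros that sparse kernels or compressed storage formats can exploit, while int8 quantization cuts memory traffic fourfold and enables integer-only arithmetic on microcontrollers; the rounding error of symmetric quantization is bounded by half the scale, which is why both steps are usually followed by accuracy re-validation on the target task.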


References

Alizadeh, M., et al. (2021). Optimizing quantized kernel operations for embedded inference. IEEE Transactions on Embedded Systems.

Analyzing the Trade-offs Between Model Compression and Security in Edge AI. (2023). ResearchGate.

Banbury, C., et al. (2021). MLPerf Tiny Benchmark: Bridging machine learning and embedded systems. Proceedings of MLSys.

Banner, R., et al. (2019). Post-training quantization for neural networks. NeurIPS.

Cheng, Y., et al. (2018). Model compression and acceleration for deep neural networks: The principles, progress, and challenges. IEEE Signal Processing Magazine.

Finn, C., Abbeel, P., & Levine, S. (2017). Model-agnostic meta-learning for fast adaptation. ICML.

Han, S., Mao, H., & Dally, W. J. (2015). Deep compression: Compressing deep neural networks with pruning, trained quantization, and Huffman coding. ICLR.

Hinton, G., Vinyals, O., & Dean, J. (2015). Distilling the knowledge in a neural network. NeurIPS Workshops.

Horowitz, M. (2014). Energy constraints in computing. IEEE International Solid-State Circuits Conference.

Kang, Y., et al. (2017). Neurosurgeon: Collaborative intelligence between the cloud and mobile edge. ACM ASPLOS.

Khan, L., et al. (2021). TinyML for industrial IoT predictive maintenance. IEEE Internet of Things Journal.

Lane, N. D., et al. (2015). DeepX: Resource-efficient deep inference on mobile devices. MobiSys.

Li, T., Sahu, A., Zaheer, M., & Smith, V. (2020). Federated optimization in heterogeneous networks. Proceedings of MLSys.

Lin, T. Y., et al. (2017). Focal loss for dense object detection. ICCV.

McMahan, H. B., et al. (2017). Communication-efficient learning of deep networks from decentralized data. AISTATS.

Parisi, G. I., et al. (2019). Continual lifelong learning with neural networks: A review. Neural Networks.

Reddi, V. J., et al. (2020). Hardware accelerators for efficient on-device AI. IEEE Micro.

Shi, W., Cao, J., Zhang, Q., Li, Y., & Xu, L. (2016). Edge computing: Vision and challenges. IEEE Internet of Things Journal.

Sze, V., Chen, Y. H., Yang, T. J., & Emer, J. S. (2020). Efficient processing of deep neural networks: A tutorial and survey. Proceedings of the IEEE.

Tan, M., et al. (2019). EfficientNet: Rethinking model scaling for convolutional neural networks. ICML.

Teerapittayanon, S., McDanel, B., & Kung, H. T. (2016). BranchyNet: Fast inference via early exiting from deep neural networks. ICPR.

Warden, P. (2018). TinyML: Machine learning with TensorFlow Lite on microcontrollers. O’Reilly Media.

Xu, C., et al. (2020). On-device learning for edge AI: A survey. IEEE Access.

Xu, J., et al. (2021). Dynamic trade-off management in edge intelligence systems. IEEE Transactions on Neural Networks and Learning Systems.

Zhang, C., et al. (2017). Hello Edge: Keyword spotting on microcontrollers. arXiv:1711.07128.

Zhang, X., et al. (2022). Hardware–algorithm co-design for efficient TinyML. ACM Transactions on Embedded Computing Systems.

Zhu, X., et al. (2020). Pipeline parallelism for low-latency deep inference. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.


Published: 2025-01-01
Submitted: 2024-10-21
Revised: 2024-11-25
Accepted: 2024-12-09


How to Cite

Patel, A., & Al-Fayez, F. (2025). TinyML on the Edge: Model Compression, On-Device Learning, and Energy–Latency Trade-Offs. Multidisciplinary Engineering Science Open, 2, 1-12. https://jmesopen.com/index.php/jmesopen/article/view/12
