Porting Large Language Models to Mobile Devices for Question Answering

Publication: Article, Conference object, Preprint (01 Jan 2024)
Embargo end date: 01 Jan 2024
Publisher: arXiv
Journal: CoRR, volume abs/2404.15851
Funded by: EC | AI4Media

Authors: Fassold, Hannes

doi: 10.48550/arxiv.2404.15851, 10.5281/zenodo.11058352, 10.5281/zenodo.11058351

arXiv: 2404.15851

Abstract

Deploying Large Language Models (LLMs) on mobile devices makes all the capabilities of natural language processing available on the device. An important use case of LLMs is question answering, which can provide accurate and contextually relevant answers to a wide array of user queries. We describe how we ported state-of-the-art LLMs to mobile devices so that they run natively on the device. We employ llama.cpp, a flexible and self-contained C++ framework for LLM inference. We selected a 6-bit quantized version of the Orca-Mini-3B model, which has 3 billion parameters, and present the correct prompt format for this model. Experimental results show that LLM inference runs at interactive speed on a Galaxy S21 smartphone and that the model delivers high-quality answers to user questions on subjects such as politics, geography, and history.
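To make the role of quantization concrete, the following is a minimal back-of-the-envelope sketch of how much storage the model weights would need at 16 bits versus 6 bits per weight. It assumes exactly 3 billion parameters and exactly 6 bits per weight, as stated in the abstract; the function name `quantized_weight_bytes` is hypothetical and not part of llama.cpp. Real quantized model files also store per-block scaling data, and inference needs additional memory for the context cache and activations, so actual figures would likely be somewhat higher.

```python
def quantized_weight_bytes(num_params: int, bits_per_weight: float) -> float:
    """Approximate storage for the weights alone, in bytes.

    Hypothetical helper for illustration only: it ignores quantization
    metadata (per-block scales), the context cache, and activations.
    """
    if num_params < 0 or bits_per_weight <= 0:
        raise ValueError("num_params must be >= 0 and bits_per_weight > 0")
    return num_params * bits_per_weight / 8.0


# Assumption: the nominal figures from the abstract
# (3 billion parameters, 6-bit quantization).
NUM_PARAMS = 3_000_000_000

fp16_bytes = quantized_weight_bytes(NUM_PARAMS, 16)
q6_bytes = quantized_weight_bytes(NUM_PARAMS, 6)

print(f"16-bit weights: {fp16_bytes / 1e9:.2f} GB")
print(f" 6-bit weights: {q6_bytes / 1e9:.2f} GB")
print(f"size ratio:     {q6_bytes / fp16_bytes:.3f}")
```

Under these assumptions the 6-bit weights take roughly three eighths of the space of 16-bit weights, which is the kind of reduction that helps a 3-billion-parameter model fit into smartphone memory.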

Accepted for ASPAI 2024 Conference

Related Organizations

Joanneum Research
Austria

Keywords

FOS: Computer and information sciences, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition

Impact by BIP!

- selected citations: 0. These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- popularity: Average. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
- influence: Average. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- impulse: Average. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.

