Chat application for Android on Snapdragon® with Llama 3.2 3B using Genie SDK.
The app demonstrates how to use the Genie C++ APIs from the QAIRT SDK to run and accelerate LLMs using the Snapdragon® Neural Processing Unit (NPU).
The Genie SDK requires a newer meta-build to run LLMs on-device. Depending on which meta-build your phone vendor ships, this feature may or may not work.
We recommend using a device from QDC for the rest of this demo to run models on-device. Android devices on QDC have a newer meta-build and can run this demo on Android 14+.
We have verified the sample ChatApp on the following device:
Device name | OS | Build Version |
---|---|---|
Samsung Galaxy S24 Plus | One UI 6.1 (Android 14) | UP1A.231005.007.S926U1UEU4AXK4 |
If you have a device listed in the above table, you can update to the OS version listed above (or newer) to run the sample app locally.
If your device is not listed above, please try this app on your device and share your feedback as a comment on this issue.
We look forward to community contributions that try this app on different devices and keep this information up to date.
Demo video: chatapp-demo.mp4
- Snapdragon® 8 Gen 3 or Snapdragon® 8 Elite
- Access to an Android device on QDC (Qualcomm Device Cloud)
- Clone this repository with Git-LFS enabled.
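  For example, a minimal sketch (the repository URL is an assumption; use whichever remote you actually clone from):

  ```bash
  # Install the Git-LFS hooks, then clone the repo (URL assumed)
  git lfs install
  git clone https://github.com/quic/ai-hub-apps.git
  ```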
- Download Android Studio. Version 2023.1.1 or newer is required.
- Install AI Hub and AI Hub Models:

  ```bash
  pip install qai-hub
  pip install "qai-hub-models[llama-v3-2-3b-chat-quantized]"
  ```
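  If this is your first time using AI Hub on this machine, you will also need to configure your API token (the token value below is a placeholder for your own):

  ```bash
  # Configure the AI Hub client with the API token from your AI Hub account settings
  qai-hub configure --api_token <YOUR_API_TOKEN>
  ```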
- Download and extract a QAIRT SDK version compatible with the sample app:

  We recommend using the same QAIRT SDK version (called "QNN SDK" in older versions) as the one used by AI Hub for generating the QNN context binaries. You can find the AI Hub QAIRT version on the compile job page, as shown in the following screenshot:

  Using a different QAIRT version could result in runtime or load-time failures.

  Follow these steps to configure QAIRT SDKs for ChatApp:
  - Download and extract Qualcomm® AI Runtime SDK (see QNN SDK for older versions) for Linux.
  - If you are using macOS, we recommend using Docker to install `qpm-cli` to extract the `.qik` file.
  - If successful, you will see a message like:

    ```
    SUCCESS: Installed qualcomm_ai_engine_direct.Core at /opt/qcom/aistack/qairt/<version>
    ```
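  - As a quick sanity check (a sketch assuming the default install location shown in the message above), you can list the extracted SDK directory:

    ```bash
    # Verify the QAIRT SDK was extracted where expected (path assumed)
    ls /opt/qcom/aistack/qairt/
    ```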
We will use Llama 3.2 3B with a context length of 2048 as an example for this demo.
- Go to the ChatApp directory:

  ```bash
  cd <ai-hub-apps-repo-root>/apps/android/ChatApp/
  ```
- Export the Llama 3.2 3B model with context length 2048:

  - Read more about exporting LLMs via AI Hub here.
    - You will have to replace the model name from the above tutorial with `llama_v3_2_3b_chat_quantized` and reduce the context length for this demo.
  - Export the Llama 3.2 3B model with context length 2048:

    ```bash
    python -m qai_hub_models.models.llama_v3_2_3b_chat_quantized.export --context-length 2048 --device "Snapdragon 8 Elite QRD" --output-dir genie_bundle
    ```

  - Exporting Llama 3.2 models will take a while depending on your internet connectivity.
    - This takes around 1-2 hours with a good internet connection.
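  - Once the export completes, you can sanity-check the output bundle (a minimal sketch; exact contents depend on the export):

    ```bash
    # The exported QNN context binaries (*.bin) should appear here
    ls -lh genie_bundle/
    ```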
- Download and save `tokenizer.json` from the Hugging Face Llama 3.2 repository to `src/main/assets/models/llama3_2_3b/`.
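  One possible way to fetch the tokenizer is the Hugging Face CLI, sketched below. The `meta-llama/Llama-3.2-3B-Instruct` repo id is an assumption, the repo is gated, and any other way of obtaining `tokenizer.json` works just as well.

  ```bash
  # Log in with a Hugging Face token that has access to the gated Llama 3.2 repo
  huggingface-cli login
  # Download tokenizer.json directly into the app's asset folder (repo id assumed)
  huggingface-cli download meta-llama/Llama-3.2-3B-Instruct tokenizer.json \
      --local-dir src/main/assets/models/llama3_2_3b/
  ```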
- Copy the model binaries (`genie_bundle/*.bin`) produced by the export step to `src/main/assets/models/llama3_2_3b/`:

  ```bash
  cp genie_bundle/*.bin src/main/assets/models/llama3_2_3b/
  ```
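  If the assets directory does not exist yet, create it before copying (sketch):

  ```bash
  # Create the model asset folder expected by the app
  mkdir -p src/main/assets/models/llama3_2_3b/
  ```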
- Update `<ai-hub-apps-repo-root>/apps/android/ChatApp/build.gradle` with the path to the QNN SDK root directory. If you are on QNN version 2.28.2 and have extracted it to the default location on Linux, it may look like this:

  ```groovy
  def qnnSDKLocalPath="/opt/qcom/aistack/qairt/2.28.2.241116"
  ```
- Build the APK:

  - Open the PARENT folder (`android`) (NOT THIS FOLDER) in Android Studio
  - Run gradle sync
  - Build the `ChatApp` target
    - Click on `Build` -> `Build Bundle(s) / APK(s)` -> `Build APK(s)`
  - You can find the APK at the following path:

    ```
    <ai-hub-apps-repo-root>/apps/android/ChatApp/build/outputs/apk/{build_type}/
    # here {build_type} can be either `release` or `debug`
    ```
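  - Alternatively, you can build from the command line with the Gradle wrapper. This is only a sketch; the `:ChatApp` module path is an assumption about this project's Gradle setup, so adjust it to match your `settings.gradle`.

    ```bash
    # Run from the parent `android` folder; module path assumed
    ./gradlew :ChatApp:assembleDebug
    ```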
- Run on an Android device. We recommend using QDC to run this app; see the steps below.
Current limitations:

- This app does not work on consumer Android 14 devices. It might work if your device has a newer meta-build, as mentioned in the note about meta-builds above.
- This app has not yet been verified on the Android 15 beta.
Steps for running `ChatApp` on QDC:

- Copy the APK to the QDC device: you can upload the APK to a QDC device instance with one of the following methods:
  - Upload at the start of the instance. You can find this upload option when you create a QDC device instance.
  - Upload the APK for an existing session:
    - Open your QDC instance in the browser
    - Open the File browser view
    - Upload the ChatApp APK to the device
  - Upload using SSH tunneling:
    - You must upload your public SSH key when you create a QDC session to use this path.
    - Please check the QDC documentation for more information.

    ```bash
    adb -P <PORT> push <path to ChatApp.apk on host> /data/local/tmp/
    ```
- Open an ADB shell on the QDC device:
  - In the browser, open your QDC session
  - Open the ADB console
- Install ChatApp.apk on-device using the adb shell:

  ```bash
  # ASSUMPTION: you are already in adb shell after step 2
  pm install -t <path to ChatApp.apk on-device>
  # e.g. pm install -t /data/local/tmp/ChatApp-release.apk
  ```
- Use the browser UI instance to open and run ChatApp.
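  If you prefer launching from the ADB console instead of the browser UI, one possible sketch uses the `monkey` tool (the package name is a placeholder; you can look it up with `pm list packages` after installing):

  ```bash
  # Launch the installed app by its package name (run inside the adb shell; package name is a placeholder)
  monkey -p <ChatApp package name> 1
  ```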
This app is released under the BSD-3 License found at the root of this repository.
All models from AI Hub Models are released under separate license(s). Refer to the AI Hub Models repository for details on each model.
The QNN SDK dependency is also released under a separate license. Please refer to the LICENSE file downloaded with the SDK for details.