Development Manual
Create your own project and add the llama-java-core framework using Maven or Gradle.
For the latest version, check the GitHub Releases page or search the Maven repository.
Maven
<dependency>
    <groupId>chat.octet</groupId>
    <artifactId>llama-java-core</artifactId>
    <version>LAST_RELEASE_VERSION</version>
</dependency>
Gradle
implementation group: 'chat.octet', name: 'llama-java-core', version: 'LAST_RELEASE_VERSION'
Examples
- Chat Console Example
Here is a simple chat console example.
// NOTE: the import paths for the chat.octet classes are assumed from the
// package names used elsewhere on this page; verify them against the library source.
import chat.octet.model.Model;
import chat.octet.model.parameters.GenerateParameter;
import chat.octet.model.parameters.ModelParameter;
import org.apache.commons.lang3.StringUtils;

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

public class ConsoleExample {
    private static final String MODEL_PATH = "/octet-chat/models/llama2/ggml-model-7b-q6_k.gguf";

    public static void main(String[] args) {
        ModelParameter modelParams = ModelParameter.builder()
                .modelPath(MODEL_PATH)
                .threads(6)
                .contextSize(4096)
                .verbose(true)
                .build();

        try (BufferedReader bufferedReader = new BufferedReader(new InputStreamReader(System.in, StandardCharsets.UTF_8));
             Model model = new Model(modelParams)) {
            GenerateParameter generateParams = GenerateParameter.builder().build();
            String system = "Answer the questions.";

            while (true) {
                System.out.print("\nQuestion: ");
                String input = bufferedReader.readLine();
                if (StringUtils.trimToEmpty(input).equalsIgnoreCase("exit")) {
                    break;
                }
                // Stream the chat completion to stdout, then print inference metrics.
                model.chat(generateParams, system, input).output();
                System.out.print("\n");
                model.metrics();
            }
        } catch (Exception e) {
            System.err.println("Error: " + e);
            System.exit(1);
        }
    }
}
- Continuous Chat Example
// NOTE: the import paths for the chat.octet classes are assumed; verify them
// against the library source.
import chat.octet.model.Model;
import chat.octet.model.parameters.GenerateParameter;

public class ContinuousChatExample {
    private static final String MODEL_PATH = "/octet-chat/models/llama2/ggml-model-7b-q6_k.gguf";

    public static void main(String[] args) {
        String system = "You are a helpful assistant. ";
        String[] questions = new String[]{
                "List five emojis about food and explain their meanings",
                "Write a fun story based on the third emoji",
                "Continue this story and refine it",
                "Summarize a title for this story, extract five keywords, and the keywords should not exceed five words",
                "Mark the characters, time, and location of this story",
                "Great, translate this story into Chinese",
                "Who are you and why are you here?",
                "Summarize today's conversation"
        };

        GenerateParameter generateParams = GenerateParameter.builder()
                .verbosePrompt(true)
                .user("William")
                .build();

        try (Model model = new Model(MODEL_PATH)) {
            // The model keeps the conversation context between calls,
            // so each question continues the same chat session.
            for (String question : questions) {
                model.chat(generateParams, system, question).output();
                System.out.println("\n");
                model.metrics();
            }
        }
    }
}
Tip
More examples: chat.octet.examples.*
LogitsProcessor & StoppingCriteria
You can use LogitsProcessor and StoppingCriteria to apply custom control to the model inference process.
Note: if you need to perform matrix calculations in Java, use openblas.
- chat.octet.model.components.processor.LogitsProcessor
Customize a processor to adjust the probability distribution of tokens and control the generated output of model inference. Here is an example: NoBadWordsLogitsProcessor.java. A hedged sketch of a custom processor follows the snippet below.
LogitBias logitBias = new LogitBias();
// A value of "false" appears to disable a token, while a numeric string adjusts its logit.
logitBias.put(5546, "false");
logitBias.put(12113, "5.89");
LogitsProcessorList logitsProcessorList = new LogitsProcessorList()
        .add(new CustomBiasLogitsProcessor(logitBias, model.getVocabSize()));

GenerateParameter generateParams = GenerateParameter.builder()
        .logitsProcessorList(logitsProcessorList)
        .build();
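For illustration, a custom processor that bans specific token ids might look like the following minimal sketch. The interface method name and parameters shown here are assumptions, not the library's confirmed API; check the chat.octet.model.components.processor.LogitsProcessor source before using this.
// A minimal sketch of a logits processor that blocks specific tokens.
// NOTE: the method signature below is an assumption; verify it against the
// actual chat.octet.model.components.processor.LogitsProcessor interface.
public class BanTokensLogitsProcessor implements LogitsProcessor {
    private final Set<Integer> bannedTokenIds;

    public BanTokensLogitsProcessor(Set<Integer> bannedTokenIds) {
        this.bannedTokenIds = bannedTokenIds;
    }

    @Override
    public float[] processor(int[] inputTokenIds, float[] scores, Object... args) {
        // Setting a token's logit to negative infinity ensures it is never sampled.
        for (int tokenId : bannedTokenIds) {
            if (tokenId >= 0 && tokenId < scores.length) {
                scores[tokenId] = Float.NEGATIVE_INFINITY;
            }
        }
        return scores;
    }
}
Such a processor would then be registered through LogitsProcessorList and passed to GenerateParameter, as in the snippet above.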
- chat.octet.model.components.criteria.StoppingCriteria
Customize a criteria component to implement your own stop rules for model inference, for example limiting the maximum generation time. Here is an example: MaxTimeCriteria. A sketch of another custom criterion follows the snippet below.
long maxTime = TimeUnit.MINUTES.toMillis(Optional.ofNullable(params.getTimeout()).orElse(10L));
StoppingCriteriaList stopCriteriaList = new StoppingCriteriaList().add(new MaxTimeCriteria(maxTime));

GenerateParameter generateParams = GenerateParameter.builder()
        .stoppingCriteriaList(stopCriteriaList)
        .build();
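Similarly, a custom criterion that stops generation after a fixed number of tokens could be sketched as follows. Again, the method signature is an assumption; verify it against the chat.octet.model.components.criteria.StoppingCriteria interface.
// A minimal sketch of a stopping criterion that limits the number of tokens.
// NOTE: the method signature below is an assumption; verify it against the
// actual chat.octet.model.components.criteria.StoppingCriteria interface.
public class MaxTokensCriteria implements StoppingCriteria {
    private final int maxTokens;

    public MaxTokensCriteria(int maxTokens) {
        this.maxTokens = maxTokens;
    }

    @Override
    public boolean criteria(int[] inputTokenIds, float[] scores, Object... args) {
        // Returning true tells the generation loop to stop.
        return inputTokenIds.length >= maxTokens;
    }
}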
For the complete documentation, please refer to the Java docs.
Download the original model files
Search huggingface to obtain open-source models. Llama2 and GPT models are supported, as well as other open-source models such as Baichuan 7B and Qwen 7B.
Convert the model to GGUF format
# Download the llama.cpp project
git clone https://github.com/ggerganov/llama.cpp.git
# Install the Python dependencies
cd llama.cpp
pip3 install -r requirements.txt
# Convert the model to GGUF format
python3 convert.py SOURCE_MODEL_PATH --outfile OUTPUT_MODEL_PATH/model-f16.gguf
Model quantization
Use LlamaService.llamaModelQuantize to quantize the model; set ModelFileType.LLAMA_FTYPE_MOSTLY_Q8_0 to adjust the quantization precision.
public class QuantizeExample {
    public static void main(String[] args) {
        // Quantize an FP16 GGUF model to Q8_0 (the native llama.cpp call returns 0 on success).
        int status = LlamaService.llamaModelQuantize("OUTPUT_MODEL_PATH/model-f16.gguf",
                "OUTPUT_MODEL_PATH/model.gguf",
                ModelFileType.LLAMA_FTYPE_MOSTLY_Q8_0
        );
        System.out.println("Quantize status: " + status);
    }
}
Alternatively, use the quantize tool to quantize the model:
quantize OUTPUT_MODEL_PATH/model-f16.gguf OUTPUT_MODEL_PATH/model.gguf q8_0
Tip
Native libraries for all supported platforms are already bundled, so they can be used directly.
If you need to recompile them, use llama-java-core/build.sh
- On Linux
linux-x86-64/libllamajava.so
- On macOS
darwin-x86-64/default.metallib
darwin-x86-64/libllamajava.dylib
darwin-aarch64/default.metallib
darwin-aarch64/libllamajava.dylib
- On Windows
win32-x86-64/llamajava.dll
If you need to load an external library file, use the following JVM parameter:
-Doctet.llama.lib=<YOUR_LIB_PATH>
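For example, pointing the JVM at an externally compiled native library (the library path and jar name below are placeholders for your own build):
java -Doctet.llama.lib=/usr/local/lib/libllamajava.so -jar your-app.jar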