examples/server/README.md
### POST `/completion`: Given a `prompt`, it returns the predicted completion.

> [!IMPORTANT]
>
> This endpoint is **not** OAI-compatible
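
For example, a minimal request could look like this (a sketch, assuming the server is listening on the default `http://localhost:8080`):

```shell
curl --request POST \
    --url http://localhost:8080/completion \
    --header "Content-Type: application/json" \
    --data '{"prompt": "Building a website can be done in 10 simple steps:", "n_predict": 128}'
```
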
*Options:*
`prompt`: Provide the prompt for this completion as a string or as an array of strings or numbers representing tokens. Internally, if `cache_prompt` is `true`, the prompt is compared to the previous completion and only the "unseen" suffix is evaluated. A `BOS` token is inserted at the start, if all of the following conditions are true:

[...]

`timings_per_token`: Include prompt processing and text generation speed information in each response. Default: `false`
`post_sampling_probs`: Returns the probabilities of the top `n_probs` tokens after applying the sampling chain.
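
For example, a request combining these options might look like this (a sketch; assumes `n_probs` as referenced in this section and the default `http://localhost:8080` address):

```shell
# Request the top 3 candidate tokens per position; with post_sampling_probs,
# probabilities are reported after the sampling chain has been applied.
curl --request POST \
    --url http://localhost:8080/completion \
    --header "Content-Type: application/json" \
    --data '{"prompt": "Hello", "n_predict": 4, "n_probs": 3, "post_sampling_probs": true}'
```
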
**Response format**
- Note: In streaming mode (`stream`), only `content`, `tokens` and `stop` will be returned until end of completion. Responses are sent using the [Server-sent events](https://html.spec.whatwg.org/multipage/server-sent-events.html) standard. Note: the browser's `EventSource` interface cannot be used due to its lack of `POST` request support.
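
  Since the browser `EventSource` API cannot issue `POST` requests, the stream has to be read with a regular HTTP client instead; for example (a sketch, assuming the server runs on the default `http://localhost:8080`):

  ```shell
  # --no-buffer prints each SSE "data: {...}" event as soon as it arrives
  curl --no-buffer --request POST \
      --url http://localhost:8080/completion \
      --header "Content-Type: application/json" \
      --data '{"prompt": "Hello", "n_predict": 16, "stream": true}'
  ```
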
- `completion_probabilities`: An array of token probabilities for each completion. The array's length is `n_predict`. Each item in the array has a nested array `top_logprobs`, which contains at **most** `n_probs` elements:
  ```json
  {
    "content": "<the generated completion text>",
    "tokens": [ generated token ids if requested ],
    ...
    "probs": [
      {
        "id": <token id>,
        "logprob": float,
        "token": "<most likely token>",
        "bytes": [int, int, ...],
        "top_logprobs": [
          {
            "id": <token id>,
            "logprob": float,
            "token": "<token text>",
            "bytes": [int, int, ...]
          },
          {
            "id": <token id>,
            "logprob": float,
            "token": "<token text>",
            "bytes": [int, int, ...]
          },
          ...
        ]
      },
      {
        "id": <token id>,
        "logprob": float,
        "token": "<most likely token>",
        "bytes": [int, int, ...],
        "top_logprobs": [
          ...
        ]
      },
      ...
    ]
  }
  ```

  Please note that if `post_sampling_probs` is set to `true`:
  - `logprob` will be replaced with `prob`, with the value between 0.0 and 1.0
  - `top_logprobs` will be replaced with `top_probs`. Each element contains:
    - `id`: token ID
    - `token`: token represented as a string
    - `bytes`: token represented as raw bytes
    - `prob`: token probability, with the value between 0.0 and 1.0
  - The number of elements in `top_probs` may be less than `n_probs`
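
  For instance, the candidates recorded for the first generated token could be inspected by piping a non-streaming response through `jq` (a sketch; assumes `jq` is installed, the default `http://localhost:8080`, and the field names from the schema above):

  ```shell
  # Print the candidate list for the first generated token
  curl --silent --request POST \
      --url http://localhost:8080/completion \
      --header "Content-Type: application/json" \
      --data '{"prompt": "Hello", "n_predict": 4, "n_probs": 3}' \
    | jq '.probs[0].top_logprobs'
  ```
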
- `content`: Completion result as a string (excluding `stopping_word` if any). In case of streaming mode, will contain the next token as a string.
- `tokens`: Same as `content` but represented as raw token ids. Only populated if `"return_tokens": true` or `"stream": true` in the request.
- `stop`: Boolean for use with `stream` to check whether the generation has stopped (Note: This is not related to stopping words array `stop` from input options)
- `generation_settings`: The provided options above excluding `prompt` but including `n_ctx`, `model`. These options may differ from the original ones in some way (e.g. bad values filtered out, strings converted to tokens, etc.).
- `model`: The model alias (for model path, please use `/props` endpoint)
- `prompt`: The processed `prompt` (special tokens may be added)
- `stop_type`: Indicating whether the completion has stopped. Possible values are:
  - `none`: Generating (not stopped)
  - `eos`: Stopped because it encountered the EOS token