Feature/benchmark (#70)
* feat: Benchmark

* doc: benchmarking
Jesús Seijas authored Oct 2, 2018
1 parent 4f3be8b commit 9b7aca8
Showing 9 changed files with 8,728 additions and 1 deletion.
1 change: 1 addition & 0 deletions .npmignore
@@ -1,3 +1,4 @@
./screenshots
./model.nlp
./docs
./examples
1 change: 1 addition & 0 deletions README.md
@@ -32,6 +32,7 @@ NLP.js

- [Installation](#installation)
- [Example of use](#example-of-use)
- [Benchmarking](docs/benchmarking.md)
- [Language Support](docs/language-support.md)
- [Classification](docs/language-support.md#classification)
- [Sentiment Analysis](docs/language-support.md#sentiment-analysis)
35 changes: 35 additions & 0 deletions docs/benchmarking.md
@@ -0,0 +1,35 @@
# Benchmarking

## Introduction

This benchmark follows the instructions at https://github.com/Botfuel/benchmark-nlp-2018/blob/master/README.md

It uses three corpora, `Chatbot`, `Ask Ubuntu` and `Web Applications`, as described in the paper http://workshop.colips.org/wochat/@sigdial2017/documents/SIGDIAL22.pdf

The corpora are available as JSON files at https://github.com/sebischair/NLU-Evaluation-Corpora

| corpus | num of intents | train | test |
| ---------------- | -------------- | ----- | ---- |
| Chatbot | 2 | 100 | 106 |
| Ask Ubuntu | 5 | 53 | 109 |
| Web Applications | 8 | 30 | 59 |

For the `Ask Ubuntu` and `Web Applications` corpora, there is a specific `None` intent for sentences that should not be matched with any of the other intents.
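
As a rough illustration of how the counts in the table above could be derived, the sketch below summarises one corpus object. It assumes each corpus file holds a `sentences` array whose items carry an `intent` string and a boolean `training` flag; those field names are an assumption about the JSON layout, `summarizeCorpus` is a hypothetical helper, and the sample data is made up rather than taken from the real corpora.

```javascript
// Hypothetical helper: counts distinct intents and the train/test split.
// Assumed corpus shape: { sentences: [{ text, intent, training }] }.
function summarizeCorpus(corpus) {
  // Distinct intent labels, including `None` when the corpus has one.
  const intents = new Set(corpus.sentences.map(s => s.intent));
  // Sentences flagged as training data versus held out for testing.
  const train = corpus.sentences.filter(s => s.training);
  const test = corpus.sentences.filter(s => !s.training);
  return { intents: intents.size, train: train.length, test: test.length };
}

// Made-up sample, only to show the expected input shape.
const sample = {
  sentences: [
    { text: 'first training sentence', intent: 'IntentA', training: true },
    { text: 'unrelated training sentence', intent: 'None', training: true },
    { text: 'held-out sentence', intent: 'IntentA', training: false },
  ],
};

console.log(summarizeCorpus(sample));
```

With a real corpus file, the object passed in would be the parsed JSON, for example the result of `JSON.parse` on the file contents.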

The code used for the NLP.js benchmark can be found at [`/examples/nlu-benchmark`](https://github.com/axa-group/nlp.js/tree/master/examples/nlu-benchmark)

## Intent classification results

We compute the `f1` score for each corpus and the overall `f1`:

| Platform\Corpus | Chatbot | Ask Ubuntu | Web Applications | Overall |
| ---------------- | ------- | ---------- | ---------------- | ------- |
| Watson | 0.97 | 0.92 | 0.83 | 0.92 |
| Botfuel | 0.98 | 0.90 | 0.80 | 0.91 |
| Luis | 0.98 | 0.90 | 0.81 | 0.91 |
| NLP.js | 0.97 | 0.90 | 0.76 | 0.90 |
| Snips | 0.96 | 0.83 | 0.78 | 0.89 |
| Recast | 0.99 | 0.86 | 0.75 | 0.89 |
| RASA | 0.98 | 0.86 | 0.74 | 0.88 |
| API (DialogFlow) | 0.93 | 0.85 | 0.80 | 0.87 |
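
For readers unfamiliar with the metric, here is a minimal sketch of a per-intent `f1` computation from expected and predicted labels, using the standard definition `f1 = 2·TP / (2·TP + FP + FN)`. The helper names `tallyByIntent` and `f1Score` are hypothetical, the sample predictions are made up, and this page does not state how the benchmark aggregates per-intent scores into the per-corpus and overall figures, so no aggregation is shown.

```javascript
// Hypothetical helper: tallies true positives (tp), false positives (fp)
// and false negatives (fn) per intent from { expected, predicted } pairs.
function tallyByIntent(pairs) {
  const counts = new Map();
  const get = (intent) => {
    if (!counts.has(intent)) counts.set(intent, { tp: 0, fp: 0, fn: 0 });
    return counts.get(intent);
  };
  for (const { expected, predicted } of pairs) {
    if (expected === predicted) {
      get(expected).tp += 1;
    } else {
      get(expected).fn += 1; // the expected intent was missed
      get(predicted).fp += 1; // the predicted intent was wrongly assigned
    }
  }
  return counts;
}

// Standard F1: harmonic mean of precision and recall, written in count form.
function f1Score({ tp, fp, fn }) {
  const denominator = 2 * tp + fp + fn;
  return denominator === 0 ? 0 : (2 * tp) / denominator;
}

// Made-up predictions for two intents.
const pairs = [
  { expected: 'IntentA', predicted: 'IntentA' },
  { expected: 'IntentA', predicted: 'IntentA' },
  { expected: 'IntentA', predicted: 'IntentB' },
  { expected: 'IntentB', predicted: 'IntentB' },
];

for (const [intent, c] of tallyByIntent(pairs)) {
  console.log(intent, f1Score(c).toFixed(2));
}
```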

1 change: 1 addition & 0 deletions examples/console-bot/index.js
@@ -46,6 +46,7 @@ rl.on('line', async (line) => {
process.exit();
} else {
const result = await nlpManager.process(line);
console.log(result);
const answer = result.score > threshold && result.answer ? result.answer : 'Sorry, I don\'t understand';
let sentiment = '';
if (result.sentiment.score !== 0) {
2,676 changes: 2,676 additions & 0 deletions examples/nlu-benchmark/AskUbuntuCorpus.json

Large diffs are not rendered by default.
