Feature/benchmark (#70)

* feat: Benchmark
* doc: benchmarking
axa-group · Oct 2, 2018 · commit 9b7aca8 · 1 parent 4f3be8b

Showing 9 changed files with 8,728 additions and 1 deletion.
diff --git a/.npmignore b/.npmignore
@@ -1,3 +1,4 @@
 ./screenshots
 ./model.nlp
 ./docs
+./examples
diff --git a/README.md b/README.md
@@ -32,6 +32,7 @@ NLP.js
 
 - [Installation](#installation)
 - [Example of use](#example-of-use)
+- [Benchmarking](docs/benchmarking.md)
 - [Language Support](docs/language-support.md)
   - [Classification](docs/language-support.md#classification)
   - [Sentiment Analysis](docs/language-support.md#sentiment-analysis)

diff --git a/docs/benchmarking.md b/docs/benchmarking.md
@@ -0,0 +1,35 @@
+# Benchmarking
+
+## Introduction
+
+This benchmark follows the instructions at https://github.com/Botfuel/benchmark-nlp-2018/blob/master/README.md
+
+It uses three corpora, `Chatbot`, `Ask Ubuntu` and `Web Applications`, as described in the paper http://workshop.colips.org/wochat/@sigdial2017/documents/SIGDIAL22.pdf
+
+The corpora are available as JSON files at https://github.com/sebischair/NLU-Evaluation-Corpora
+
+| corpus           | num of intents | train | test |
+| ---------------- | -------------- | ----- | ---- |
+| Chatbot          | 2              | 100   | 106  |
+| Ask Ubuntu       | 5              | 53    | 109  |
+| Web Applications | 8              | 30    | 59   |
+
+The `Ask Ubuntu` and `Web Applications` corpora include a specific `None` intent for sentences that should not be matched with any of the other intents.
+
+The code used for the NLP.js benchmark can be found at [`/examples/nlu-benchmark`](https://github.com/axa-group/nlp.js/tree/master/examples/nlu-benchmark)
+
+## Intent classification results
+
+We compute the `f1` score for each corpus and the overall `f1`:
+
+| Platform\Corpus  | Chatbot | Ask Ubuntu | Web Applications | Overall |
+| ---------------- | ------- | ---------- | ---------------- | ------- |
+| Watson           | 0.97    | 0.92       | 0.83             | 0.92    |
+| Botfuel          | 0.98    | 0.90       | 0.80             | 0.91    |
+| Luis             | 0.98    | 0.90       | 0.81             | 0.91    |
+| NLP.js           | 0.97    | 0.90       | 0.76             | 0.90    |
+| Snips            | 0.96    | 0.83       | 0.78             | 0.89    |
+| Recast           | 0.99    | 0.86       | 0.75             | 0.89    |
+| RASA             | 0.98    | 0.86       | 0.74             | 0.88    |
+| API (DialogFlow) | 0.93    | 0.85       | 0.80             | 0.87    |
+
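
Note on the `f1` figures in `docs/benchmarking.md`: the sketch below shows one common way to turn expected and predicted intents into per-intent F1 scores and a micro-averaged overall score. It is only an illustration. The helper name `f1Scores` and its input shape are hypothetical, it is not the code under `/examples/nlu-benchmark`, and the benchmark itself may average differently or treat the `None` intent specially.

```javascript
// Hypothetical helper (not part of NLP.js): computes F1 per intent and a
// micro-averaged F1 from two parallel arrays of intent labels.
function f1Scores(expected, predicted) {
  const counts = {};
  // Lazily create the counters for a label the first time it is seen.
  const get = (label) => {
    if (!counts[label]) {
      counts[label] = { tp: 0, fp: 0, fn: 0 };
    }
    return counts[label];
  };

  expected.forEach((gold, i) => {
    const guess = predicted[i];
    if (gold === guess) {
      get(gold).tp += 1; // correct prediction
    } else {
      get(gold).fn += 1; // the expected intent was missed
      get(guess).fp += 1; // the predicted intent was wrongly assigned
    }
  });

  // F1 = 2 * tp / (2 * tp + fp + fn); defined as 0 when there are no true positives.
  const f1 = ({ tp, fp, fn }) => (tp === 0 ? 0 : (2 * tp) / (2 * tp + fp + fn));

  const perIntent = {};
  const total = { tp: 0, fp: 0, fn: 0 };
  Object.keys(counts).forEach((label) => {
    perIntent[label] = f1(counts[label]);
    total.tp += counts[label].tp;
    total.fp += counts[label].fp;
    total.fn += counts[label].fn;
  });

  // Micro-average: pool the counts of all intents before computing F1.
  return { perIntent, micro: f1(total) };
}

// Toy data with made-up labels, only to show the call shape.
const scores = f1Scores(['a', 'a', 'b', 'b'], ['a', 'b', 'b', 'b']);
console.log(scores.perIntent, scores.micro);
```

Pooling the counts (micro-averaging) weights every test sentence equally. Averaging the per-intent values instead (macro-averaging) would weight every intent equally, which matters for small corpora such as `Web Applications`.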
diff --git a/examples/console-bot/index.js b/examples/console-bot/index.js
@@ -46,6 +46,7 @@ rl.on('line', async (line) => {
     process.exit();
   } else {
     const result = await nlpManager.process(line);
+    console.log(result);
     const answer = result.score > threshold && result.answer ? result.answer : 'Sorry, I don\'t understand';
     let sentiment = '';
     if (result.sentiment.score !== 0) {

diff --git a/examples/nlu-benchmark/AskUbuntuCorpus.json b/examples/nlu-benchmark/AskUbuntuCorpus.json