-
Notifications
You must be signed in to change notification settings - Fork 510
New issue
Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? # to your account
Improve random state handling #801
Conversation
Squashed commits: [58d1612] Remove seed from config and put it in the "shared" for the ProcessingUnit
e29cc06
to
6ff0eea
Compare
Codecov Report
@@ Coverage Diff @@
## develop #801 +/- ##
===========================================
+ Coverage 88.42% 88.47% +0.04%
===========================================
Files 76 76
Lines 4571 4571
Branches 882 882
===========================================
+ Hits 4042 4044 +2
+ Misses 397 395 -2
Partials 132 132 |
f956a78
to
9e43dfd
Compare
while True: | ||
noise_length = int(random_state.normal(mean_length, std_length)) | ||
i += 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unused variable i
@@ -35,8 +35,9 @@ def test_should_get_slots(self): | |||
- make me [number_of_cups:snips/number](five) cups of tea | |||
- please I want [number_of_cups](two) cups of tea""") | |||
dataset = Dataset.from_yaml_files("en", [dataset_stream]).json | |||
config = CRFSlotFillerConfig(random_seed=42) | |||
config = CRFSlotFillerConfig() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The config is not needed anymore here.
@@ -101,10 +104,11 @@ def test_should_get_sub_builtin_slots(self): | |||
- find an activity from [start](6pm) to [end](8pm) | |||
- Book me a trip from [start](this friday) to [end](next tuesday)""") | |||
dataset = Dataset.from_yaml_files("en", [dataset_stream]).json | |||
config = CRFSlotFillerConfig(random_seed=42) | |||
config = CRFSlotFillerConfig() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment
@@ -65,9 +66,11 @@ def test_should_get_builtin_slots(self): | |||
- Can you tell me the weather [datetime] please ? | |||
- what is the weather forecast [datetime] in [location](paris)""") | |||
dataset = Dataset.from_yaml_files("en", [dataset_stream]).json | |||
config = CRFSlotFillerConfig(random_seed=42) | |||
config = CRFSlotFillerConfig() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The config is not needed anymore here.
@@ -356,9 +360,10 @@ def test_should_get_slots_after_deserialization(self): | |||
- i want [number_of_cups] cups of tea please | |||
- can you prepare [number_of_cups] cups of tea ?""") | |||
dataset = Dataset.from_yaml_files("en", [dataset_stream]).json | |||
config = CRFSlotFillerConfig(random_seed=42) | |||
config = CRFSlotFillerConfig() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment
classifier_config = LogRegIntentClassifierConfig(random_seed=42) | ||
slot_filler_config = CRFSlotFillerConfig(random_seed=42) | ||
classifier_config = LogRegIntentClassifierConfig() | ||
slot_filler_config = CRFSlotFillerConfig() | ||
parser_config = ProbabilisticIntentParserConfig( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment
classifier_config = LogRegIntentClassifierConfig(random_seed=42) | ||
slot_filler_config = CRFSlotFillerConfig(random_seed=42) | ||
classifier_config = LogRegIntentClassifierConfig() | ||
slot_filler_config = CRFSlotFillerConfig() | ||
parser_config = ProbabilisticIntentParserConfig( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment
@@ -162,9 +169,12 @@ def test_should_get_intents(self): | |||
utterances: | |||
- yili yulu yele""") | |||
dataset = Dataset.from_yaml_files("en", [dataset_stream]).json | |||
classifier_config = LogRegIntentClassifierConfig(random_seed=42) | |||
classifier_config = LogRegIntentClassifierConfig() | |||
parser_config = ProbabilisticIntentParserConfig(classifier_config) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment
random_seed=seed1), | ||
slot_filler_config=CRFSlotFillerConfig(random_seed=seed2) | ||
intent_classifier_config=LogRegIntentClassifierConfig(), | ||
slot_filler_config=CRFSlotFillerConfig() | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment
docs/source/tutorial.rst
Outdated
different outputs. | ||
|
||
If you want to run training in a reproducible way you can pass a random seed to | ||
your engine: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer using a more impersonal form in the documentation, but that's just a suggestion. That would be something like:
Reproducible training and testing can be achieved by passing a
**random seed** to the engine:
docs/source/tutorial.rst
Outdated
@@ -174,6 +174,26 @@ the dataset we generated earlier: | |||
|
|||
engine.fit(dataset) | |||
|
|||
Note that by default, the training of the engine is non-deterministic: if you | |||
train your NLU twice on the same data and test it on the same input, you'll get | |||
different outputs. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would be a bit more optimistic in the formulation:
Note that, by default, the training of the NLU engine is a non-deterministic process:
training and testing multiple times on the same data may produce different outputs.
ae0633e
to
45f4fd4
Compare
Description:
Currenlty
scikit-learn
bug the intent classification training was not deterministicDone:
sklearn==0.21
which contains a fix which makesSGDClassifier
training deterministicChecklist: