Do copy editing #36

Open: wants to merge 1 commit into base: master
12 changes: 6 additions & 6 deletions notebooks/xgboost-titanic.ipynb
@@ -105,7 +105,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
-"We do just minimal preprocessing: convert obviously contiuous *Age* and *Fare* variables to floats,\n",
+"We do just minimal preprocessing: convert obviously continuous *Age* and *Fare* variables to floats,\n",
"and *SibSp*, *Parch* to integers. Missing *Age* values are removed."
]
},
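Reviewer note: the preprocessing described in this cell can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the `preprocess` helper, the empty-string convention for missing values, and the sample rows are all assumptions made for the example. It also assumes that "removed" means dropping the *Age* key from a row rather than dropping the whole row.

```python
def preprocess(rows):
    """Convert Age/Fare to float and SibSp/Parch to int; drop missing Age keys.

    `rows` is a list of dicts with string values, as csv.DictReader would yield.
    """
    cleaned = []
    for row in rows:
        row = dict(row)  # copy so the caller's data is not mutated
        if row["Age"] == "":
            # Assumption: an empty string marks a missing value.
            del row["Age"]
        else:
            row["Age"] = float(row["Age"])
        row["Fare"] = float(row["Fare"])
        row["SibSp"] = int(row["SibSp"])
        row["Parch"] = int(row["Parch"])
        cleaned.append(row)
    return cleaned


# Made-up rows for illustration only; these are not taken from the dataset.
sample = [
    {"Age": "22", "Fare": "7.25", "SibSp": "1", "Parch": "0"},
    {"Age": "", "Fare": "8.05", "SibSp": "0", "Parch": "0"},
]
cleaned = preprocess(sample)
```

Dropping the key, instead of filling in a placeholder such as 0, keeps a missing *Age* distinguishable from a real value later in the pipeline.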
@@ -170,14 +170,14 @@
"cell_type": "markdown",
"metadata": {},
"source": [
-"There is one tricky bit about the code above: one may be templed to just pass ``dense=True`` to ``DictVectorizer``: after all, in this case the matrixes are small. But this is not a great solution, because we will loose the ability to distinguish features that are missing and features that have zero value.\n",
+"There is one tricky bit about the code above: one may be tempted to just pass ``dense=True`` to ``DictVectorizer``: after all, in this case the matrixes are small. But this is not a great solution, because we will lose the ability to distinguish features that are missing and features that have zero value.\n",
"\n",
"\n",
"## 3. Explaining weights\n",
"\n",
-"In order to calculate a prediction, XGBoost sums predictions of all its trees.\n",
+"To calculate a prediction, XGBoost sums predictions of all its trees.\n",
"The number of trees is controlled by ``n_estimators`` argument and is 100 by default.\n",
-"Each tree is not a great predictor on it's own, but by summing across all trees,\n",
+"Each tree is not a great predictor on its own, but by summing across all trees,\n",
"XGBoost is able to provide a robust estimate in many cases. Here is one of the trees:"
]
},
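Reviewer note: the "sums predictions of all its trees" idea in this cell can be illustrated with a toy sketch. Nothing here calls XGBoost itself. The two hand-written "trees", their leaf scores, the `margin` and `predict_proba` helpers, and the feature names are hypothetical. The sketch also assumes a binary logistic objective, where the summed score is passed through a sigmoid to obtain a probability.

```python
import math


def tree_a(x):
    # Hypothetical stump: splits on a categorical feature.
    return 0.4 if x["Sex"] == "female" else -0.3


def tree_b(x):
    # Hypothetical stump: splits on a numeric feature; a missing Age
    # is sent down the default (right-hand) branch.
    return -0.2 if x.get("Age", 0.0) > 50 else 0.1


TREES = [tree_a, tree_b]


def margin(x, trees, base_margin=0.0):
    """Sum the leaf scores of every tree for one example."""
    return base_margin + sum(tree(x) for tree in trees)


def predict_proba(x, trees, base_margin=0.0):
    """Map the summed score to a probability with the logistic function."""
    return 1.0 / (1.0 + math.exp(-margin(x, trees, base_margin)))


example = {"Sex": "female", "Age": 20.0}  # made-up passenger
probability = predict_proba(example, TREES)
```

Each stump alone is a weak predictor, but every additional tree nudges the summed score, which is why the contribution of each feature can be read off tree by tree.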
@@ -1151,8 +1151,8 @@
"source": [
"## 5. Adding text features\n",
"\n",
-"Right now we treat *Name* field as categorical, like other text features.\n",
-"But in this dataset each name is unique, so XGBoost does not use this feature at all, because it's\n",
+"Now we treat *Name* field as categorical, like other text features,\n",
+"but in this dataset, each name is unique, so XGBoost does not use this feature at all, because it's\n",
"such a poor discriminator: it's absent from the weights table in section 3.\n",
"\n",
"But *Name* still might contain some useful information. We don't want to guess how to best pre-process it\n",