Some fixes in LM part regarding ngram history length + MLE ngram #13

Open
uralik wants to merge 3 commits into base: master

Conversation

@uralik uralik commented Dec 30, 2017

Given the definition of an $n$-gram, the surrounding text is correct, but the formulas always use histories of length $n$, which is probably a typo; the history should have length $n-1$. I have also added a small explanation of why the relative-frequency $n$-gram estimator is optimal from the MLE perspective.
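
For context, the MLE argument referred to above amounts to the following (my own sketch of the reasoning, not the exact text added in the PR). Write $c(h, w)$ for the number of times a length-$(n-1)$ history $h$ is followed by word $w$ in the training corpus. The log-likelihood of the corpus under parameters $\theta_{w \mid h} = p(w \mid h)$ is
\begin{align*}
\log L(\theta) = \sum_{h} \sum_{w} c(h, w) \log \theta_{w \mid h},
\qquad \text{subject to} \quad \sum_{w} \theta_{w \mid h} = 1 \ \text{for every } h.
\end{align*}
Maximizing each history's term with a Lagrange multiplier for its constraint gives
\begin{align*}
\theta_{w \mid h} = \frac{c(h, w)}{\sum_{w'} c(h, w')} = \frac{c(h, w)}{c(h)},
\end{align*}
i.e.\ exactly the relative-frequency estimate.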

lecture_note.tex Outdated
@@ -3568,11 +3568,12 @@ \section{$n$-Gram Language Model}
conditional probability (Eq.~\eqref{eq:unidir_sentence}~(a)) is only conditioned
on the $n-1$ preceding symbols only, meaning
\begin{align*}
p(w_k | w_{<k}) \approx p(w_k | w_{k-n}, w_{k-n+1}, \ldots, w_{k-1}).
% p(w_k | w_{<k}) \approx p(w_k | w_{k-n}, w_{k-n+1}, \ldots, w_{k-1}). % history length should be n-1
Contributor

please remove this commented line

Author

done

lecture_note.tex Outdated
\end{align*}
This results in
\begin{align*}
p(S) \approx \prod_{t=1}^T p(w_t | w_{t-n}, \ldots, w_{t-1}).
p(S) \approx \prod_{t=1}^T p(w_t | w_{t-n+1}, \ldots, w_{t-1}). % history should have n-1 length
Contributor

same here

Author

done
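
As an aside for readers of this thread, the corrected product formula is easy to check in code. The following is a minimal illustrative sketch (mine, not code from the PR or the lecture note; all names are hypothetical): it estimates an $n$-gram model by relative frequency and scores a sentence by conditioning each word on exactly $n-1$ preceding symbols.

```python
# Minimal sketch (not code from the PR): MLE n-gram model where each word is
# conditioned on exactly n-1 preceding symbols, as in the corrected formulas.
from collections import Counter

def train(corpus, n):
    """Count n-grams and their length-(n-1) histories over tokenized sentences."""
    ngram_counts, history_counts = Counter(), Counter()
    for sent in corpus:
        tokens = ["<s>"] * (n - 1) + sent + ["</s>"]
        for i in range(n - 1, len(tokens)):
            history = tuple(tokens[i - n + 1:i])       # exactly n-1 preceding symbols
            ngram_counts[history + (tokens[i],)] += 1
            history_counts[history] += 1
    return ngram_counts, history_counts

def sentence_prob(sent, ngram_counts, history_counts, n):
    """p(S) ~= prod_t p(w_t | w_{t-n+1}, ..., w_{t-1}), estimated by relative frequency."""
    tokens = ["<s>"] * (n - 1) + sent + ["</s>"]
    p = 1.0
    for i in range(n - 1, len(tokens)):
        history = tuple(tokens[i - n + 1:i])
        if ngram_counts[history + (tokens[i],)] == 0:
            return 0.0                                  # the zero-count problem that smoothing addresses
        p *= ngram_counts[history + (tokens[i],)] / history_counts[history]
    return p

# Example: a bigram model (n=2) conditions each word on exactly one preceding symbol.
corpus = [["the", "cat", "sat"], ["the", "dog", "sat"]]
ng, hc = train(corpus, n=2)
print(sentence_prob(["the", "cat", "sat"], ng, hc, n=2))  # 0.5
```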

lecture_note.tex Outdated
\subsection{Smoothing and Back-Off}

{\em Note that I am missing many references this section, as I am writing this
on my travel. I will fill in missing references once I'm back from my
travel.}

The biggest issue of having an $n$-gram that never occurs in the training corpus
is that any sentence containing the $n$-gram will be given a zero probability
is that any sentence containing such $n$-gram will be given a zero probability
Contributor

such an $n$-gram

Author

done
