Some fixes in LM part regarding ngram history length + MLE ngram #13
base: master
Conversation
lecture_note.tex (Outdated)
@@ -3568,11 +3568,12 @@ \section{$n$-Gram Language Model}
conditional probability (Eq.~\eqref{eq:unidir_sentence}~(a)) is only conditioned
on the $n-1$ preceding symbols only, meaning
\begin{align*}
p(w_k | w_{<k}) \approx p(w_k | w_{k-n}, w_{k-n+1}, \ldots, w_{k-1}).
% p(w_k | w_{<k}) \approx p(w_k | w_{k-n}, w_{k-n+1}, \ldots, w_{k-1}). % history length should be n-1
please remove this commented line
done
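To make the corrected history length concrete, here is a minimal Python sketch (not part of the lecture note; the helper name `history` is purely illustrative): an $n$-gram model conditions each token on the $n-1$ tokens that precede it, exactly as the fixed formula states.

```python
# Minimal sketch: the history of token k in an n-gram model is the
# previous n-1 tokens (shorter near the start of the sentence).
def history(tokens, k, n):
    return tokens[max(0, k - (n - 1)):k]

tokens = ["<s>", "the", "cat", "sat", "</s>"]
print(history(tokens, 3, 3))  # trigram (n=3): ['the', 'cat'] -- history of length n-1 = 2
```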
lecture_note.tex (Outdated)
\end{align*}
This results in
\begin{align*}
p(S) \approx \prod_{t=1}^T p(w_t | w_{t-n}, \ldots, w_{t-1}).
p(S) \approx \prod_{t=1}^T p(w_t | w_{t-n+1}, \ldots, w_{t-1}). % history should have n-1 length
same here
done
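The same fix applies to the sentence-probability product. As a rough sketch (the `cond_prob` argument below is a placeholder, not anything defined in the note), $p(S)$ is simply the product of these $(n-1)$-history conditionals:

```python
# Sketch: p(S) as a product of n-gram conditionals, each conditioned on
# the previous n-1 tokens only (matching the corrected formula).
def sentence_prob(tokens, n, cond_prob):
    p = 1.0
    for k, w in enumerate(tokens):
        hist = tuple(tokens[max(0, k - (n - 1)):k])
        p *= cond_prob(w, hist)  # p(w_k | w_{k-n+1}, ..., w_{k-1})
    return p

# toy placeholder model: uniform probability 0.1 for every token
print(sentence_prob(["the", "cat", "sat"], n=3, cond_prob=lambda w, h: 0.1))  # ~0.001
```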
lecture_note.tex (Outdated)
\subsection{Smoothing and Back-Off}

{\em Note that I am missing many references this section, as I am writing this
on my travel. I will fill in missing references once I'm back from my
travel.}

The biggest issue of having an $n$-gram that never occurs in the training corpus
is that any sentence containing the $n$-gram will be given a zero probability
is that any sentence containing such $n$-gram will be given a zero probability
such an
done
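For context on the zero-probability problem this subsection addresses, here is a rough sketch of one standard remedy, add-one (Laplace) smoothing. The note itself covers smoothing and back-off in more detail, so this is only an illustration, and the toy counts below are made up.

```python
from collections import Counter

# Toy counts: the bigram ("the", "cat") was seen twice; ("the", "dog") never.
ngram_counts = Counter({("the", "cat"): 2})
hist_counts = Counter({("the",): 2})

def mle_prob(w, hist):
    # relative-frequency (MLE) estimate: zero for any unseen n-gram
    return ngram_counts[hist + (w,)] / hist_counts[hist] if hist_counts[hist] else 0.0

def addone_prob(w, hist, vocab_size):
    # add-one smoothing: every n-gram gets a pseudo-count of 1, so nothing is zero
    return (ngram_counts[hist + (w,)] + 1) / (hist_counts[hist] + vocab_size)

print(mle_prob("dog", ("the",)))                        # 0.0 -> would zero out p(S)
print(addone_prob("dog", ("the",), vocab_size=10_000))  # small but non-zero
```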
So, given the definition of an $n$-gram, the text is entirely correct, but the formulas always condition on histories of length $n$, which is probably a typo. I have also added a short explanation of why the relative-frequency $n$-gram estimator is optimal from the MLE perspective.
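For reference, the standard MLE argument that the description alludes to goes roughly as follows (a sketch of the usual derivation in the note's LaTeX style; the exact wording added in the PR may differ):

```latex
% Maximize the corpus log-likelihood over p(. | h) for a fixed history h,
% subject to the probabilities summing to one:
\begin{align*}
  \max_{p(\cdot \mid h)} \sum_{w} c(h, w) \log p(w \mid h)
  \quad \text{s.t.} \quad \sum_{w} p(w \mid h) = 1.
\end{align*}
% Setting the derivative of the Lagrangian to zero gives
% c(h, w) / p(w \mid h) = \lambda_h, i.e. p(w \mid h) \propto c(h, w);
% normalizing recovers the relative-frequency estimator:
\begin{align*}
  p_{\mathrm{MLE}}(w \mid h) = \frac{c(h, w)}{\sum_{w'} c(h, w')} = \frac{c(h, w)}{c(h)}.
\end{align*}
```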