N-gram Set by Prediction Filter
A prediction-filter approach was developed to work around limited memory. It retrieves an approximate n-gram set as an alternative to generating the full set. However, this approach is not comprehensive and should be replaced by a more thorough one.
I. Prediction Filter:
Use the frequency (NWC) of normalized n-gram terms as a filter when generating (n+1)-gram terms.
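The filtering idea can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the implementation used to build the set: normalization is reduced to lowercasing and whitespace collapsing, the frequency is a plain occurrence count standing in for NWC, and a candidate (n+1)-gram is kept only when both its leading and trailing n-grams meet a threshold. The names `normalize`, `count_ngrams`, `predict_next`, and `min_count` are hypothetical.

```python
from collections import Counter


def normalize(text):
    # Assumption: normalization is just lowercasing and collapsing whitespace.
    return " ".join(text.lower().split())


def count_ngrams(sentences, n):
    """Count every n-gram of normalized tokens across all sentences."""
    counts = Counter()
    for sentence in sentences:
        tokens = normalize(sentence).split()
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts


def predict_next(sentences, n_counts, n, min_count):
    """Count (n+1)-grams, skipping candidates that contain a rare n-gram.

    Assumption: a candidate passes only if its leading n-gram and its
    trailing n-gram both occur at least min_count times.
    """
    allowed = {term for term, count in n_counts.items() if count >= min_count}
    counts = Counter()
    for sentence in sentences:
        tokens = normalize(sentence).split()
        for i in range(len(tokens) - n):
            head = " ".join(tokens[i:i + n])
            tail = " ".join(tokens[i + 1:i + n + 1])
            if head in allowed and tail in allowed:
                counts[" ".join(tokens[i:i + n + 1])] += 1
    return counts


sentences = ["The cat sat", "the cat ran", "a dog sat"]
unigrams = count_ngrams(sentences, 1)
bigrams = predict_next(sentences, unigrams, n=1, min_count=2)
```

In this sketch only the n-grams that pass the threshold need to be held in memory while the (n+1)-grams are counted. Any (n+1)-gram containing a rare n-gram is never counted, which is why the resulting set is approximate.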
II. N-gram Set with Prediction Filter:
III. Example Walk-through (MEDLINE.2014):
| Preprocess | unigrams | bigrams | trigrams | fourgrams | fivegrams |
|---|---|---|---|---|---|
| N | n=1 | n=2 | n=3 | n=4 | n=5 |
| Step 1 PmidTiAbSentences{YY}n{DDDD}.txt | | | | | |
| Step 2 Gen uniGram, sorted | | | | | |
| Step 3 Norm (n-1)-gram | | | | | |
| Step 4 Gen (n-1)-gram for threshold on NWC | | | | | |
| Step 5 Gen n-Gram | | | | | |
| Step 6 Sort n-Gram | 33 min. | 45 min. | 40 min. | 33 min. | |
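
Steps 2 through 6 of the walk-through repeat for each n, and that loop can be sketched as follows. This is a minimal sketch under assumptions, not the scripts used for MEDLINE.2014: sentences are held in memory rather than read from the Step 1 files, normalization is plain lowercasing, the threshold is a raw count standing in for NWC, and the sort order (descending count, then alphabetical) is a guess. The name `build_ngram_sets` and its parameters are hypothetical.

```python
from collections import Counter


def sort_counts(counts):
    # Step 6 (assumed order): highest count first, ties broken alphabetically.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


def build_ngram_sets(sentences, max_n=5, min_count=2):
    """Return {n: sorted [(term, count), ...]} for n = 1..max_n."""
    # Assumption: lowercasing stands in for the normalization of Step 3.
    tokenized = [sentence.lower().split() for sentence in sentences]

    # Step 2: generate unigrams and sort them.
    counts = Counter(token for tokens in tokenized for token in tokens)
    result = {1: sort_counts(counts)}

    for n in range(2, max_n + 1):
        # Steps 3-4: keep the (n-1)-grams that meet the threshold.
        allowed = {term for term, count in counts.items() if count >= min_count}

        # Step 5: generate n-grams whose leading and trailing (n-1)-grams pass.
        counts = Counter()
        for tokens in tokenized:
            for i in range(len(tokens) - n + 1):
                gram = tokens[i:i + n]
                head = " ".join(gram[:-1])
                tail = " ".join(gram[1:])
                if head in allowed and tail in allowed:
                    counts[" ".join(gram)] += 1

        # Step 6: sort the n-grams.
        result[n] = sort_counts(counts)
    return result


ngram_sets = build_ngram_sets(
    ["the cat sat down", "the cat sat up", "the cat ran"],
    max_n=3,
    min_count=2,
)
```

Each pass depends only on the counts from the previous n, so earlier levels can be written out and discarded before the next level is generated.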