Spelling Variant Patterns - Test on Lexicon-LRSPL
Norm, MES, and ES are applied sequentially to retrieve as many spelling variant groups as possible. This model is tested on the Lexicon (inflVars.data) for recall, precision, F1, and accuracy. The results are as follows:
2015 (Used in AMIA paper submission)
| Step | Methods | Edit Distance | Sample No. | ret-rel | ret-irrel | notRet-rel | notRet-irrel | Precision | Recall | F1 | Accuracy |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Lexicon.2015 | N/A | 867,728 | 363,217 | 0 | 0 | 504,511 | 1.0000 | 1.0000 | 1.0000 | 1.0000 |
| 1 | Norm | N/A | 867,728 | 306,387 | 19,374 | 56,830 | 485,137 | 0.9405 | 0.8435 | 0.8894 | 0.9122 |
| 2 | MES | 2 | 867,728 | 355,423 | 173,647 | 7,794 | 330,864 | 0.6718 | 0.9785 | 0.7967 | 0.7909 |
| 3 | ES | 1 | 867,728 | 360,599 | 286,932 | 2,618 | 217,579 | 0.5569 | 0.9928 | 0.7135 | 0.6663 |
| 4 | MES | 3 | 867,728 | 360,956 | 301,097 | 2,261 | 203,414 | 0.5452 | 0.9938 | 0.7041 | 0.6504 |
| 5 | ES | 2 | 867,728 | 362,082 | 353,512 | 1,135 | 150,999 | 0.5060 | 0.9969 | 0.6713 | 0.5913 |
| 6 | MES | 4 | 867,728 | 362,159 | 356,156 | 1,058 | 148,355 | 0.5042 | 0.9971 | 0.6697 | 0.5883 |
Step 6 gives the final results used for the matcher. Its figures serve as an example for checking the calculations:
| Check Item | Check numbers |
|---|---|
| Total sample no | 867,728 = 362,159 + 356,156 + 1,058 + 148,355 |
| Precision | 0.5042 = 362,159 / (362,159 + 356,156) |
| Recall | 0.9971 = 362,159 / (362,159 + 1,058) |
| F1 | 0.6697 = (2 * 0.5042 * 0.9971) / (0.5042 + 0.9971) |
| Accuracy | 0.5883 = (362,159 + 148,355) / 867,728 |
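The calculation check above can be reproduced with a short script. This is a minimal sketch: the function name `retrieval_metrics` and its parameter names are illustrative and not part of the original tooling, while the four counts are the Step 6 values from the results table.

```python
def retrieval_metrics(ret_rel, ret_irrel, not_ret_rel, not_ret_irrel):
    """Compute precision, recall, F1, and accuracy from the four counts.

    ret_rel       : retrieved and relevant (true positives)
    ret_irrel     : retrieved but irrelevant (false positives)
    not_ret_rel   : relevant but not retrieved (false negatives)
    not_ret_irrel : irrelevant and not retrieved (true negatives)
    """
    total = ret_rel + ret_irrel + not_ret_rel + not_ret_irrel
    precision = ret_rel / (ret_rel + ret_irrel)
    recall = ret_rel / (ret_rel + not_ret_rel)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (ret_rel + not_ret_irrel) / total
    return {
        "total": total,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "accuracy": accuracy,
    }


# Step 6 counts taken from the results table.
step6 = retrieval_metrics(362_159, 356_156, 1_058, 148_355)
for name, value in step6.items():
    print(name, value if name == "total" else round(value, 4))
```

Rounded to four decimal places, the values agree with the Step 6 row. F1 is computed here from the unrounded precision and recall, which gives the same four-digit result as the rounded inputs used in the check table.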
| Step | Methods | Edit Distance | Sample No. | ret-rel | ret-irrel | notRet-rel | notRet-irrel | Precision | Recall | F1 | Accuracy |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Baseline (step-2 from above) | MES | 2 | 867,728 | 355,423 | 173,647 | 7,794 | 330,864 | 0.6718 | 0.9785 | 0.7967 | 0.7909 |
| 1 | Double Metaphone (10) | 2 | 867,728 | 356,375 | 178,698 | 6,842 | 325,813 | 0.6660 | 0.9812 | 0.7935 | 0.7862 |
| 2 | | 2 | 867,728 | 354,790 | 151,028 | 8,427 | 353,483 | 0.7014 | 0.9768 | 0.8165 | 0.8162 |
| 3 | | 2 | 867,728 | 352,911 | 115,531 | 10,306 | 388,980 | 0.7534 | 0.9716 | 0.8487 | 0.8550 |
| Enhanced SpVarNorm | |||||||||||
| Baseline (step-1 from above) | Norm | N/A | 867,728 | 306,387 | 19,374 | 56,830 | 485,137 | 0.9405 | 0.8435 | 0.8894 | 0.9122 |
| New Baseline | Norm | N/A | 867,728 | 304,831 | 3,973 | 58,386 | 500,538 | 0.9871 | 0.8393 | 0.9072 | 0.9281 |
| 4 | | 2 | 867,728 | 352,826 | 114,271 | 10,391 | 390,240 | 0.7554 | 0.9714 | 0.8499 | 0.8563 |
| 5 | | 2 | 867,728 | 352,675 | 105,623 | 10,542 | 398,888 | 0.7695 | 0.9710 | 0.8586 | 0.8661 |
| New GoldStandard - with inflectional Spelling Variants | |||||||||||
| 6.0 | Norm | 2 | 867,728 | 305,329 | 3,475 | 74,447 | 484,477 | 0.9887 | 0.8040 | 0.8868 | 0.9102 |
| 6.1?? | | 2 | 867,728 | 369,200 | 97,897 | 10,576 | 390,055 | 0.7904 | 0.9722 | 0.8719 | 0.8750 |
| 6.1 | | 1 | 867,728 | 369,049 | 89,249 | 10,727 | 398,703 | 0.8053 | 0.9718 | 0.8807 | 0.8845 |
| 6.2 | | 2 | 867,728 | 369,049 | 89,249 | 10,727 | 398,703 | 0.8053 | 0.9718 | 0.8807 | 0.8845 |
Tried:
| Example | Term | Metaphone 1 | Metaphone 2 | Notes |
|---|---|---|---|---|
| 1 | meagreness | MKRNS | MKRNS | |
| | meagerness | MJRNS | MKRNS | |
| 2 | abkhasian | ABKHXN | APKSN | |
| | abkhazian | ABKHSN | APKSN | |
| 3 | toxic edema | TKSSTM | TKSKTM | |
| | toxic oedema | TKSKTM | TKSKTM | |
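For the pairs in the table above, the character-level distance between the two spellings can be computed as follows. This is a minimal sketch that assumes the "Edit Distance" thresholds in the results tables refer to plain Levenshtein distance (insertions, deletions, and substitutions each cost 1); the source does not state which variant is used, and the function name `levenshtein` is illustrative.

```python
def levenshtein(a: str, b: str) -> int:
    """Plain Levenshtein distance using two rolling rows."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


# Term pairs taken from the example table.
pairs = [
    ("meagreness", "meagerness"),
    ("abkhasian", "abkhazian"),
    ("toxic edema", "toxic oedema"),
]
distances = {pair: levenshtein(*pair) for pair in pairs}
for (first, second), dist in distances.items():
    print(f"{first} / {second}: {dist}")
```

Under this assumption, the "re"/"er" swap in meagreness/meagerness counts as two edits, while the other two pairs differ by a single edit.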
| Example | Term | Metaphone 1 | Metaphone 2 | Caverphone 2.0 | Notes |
|---|---|---|---|---|---|
| 1 | zymographical | SMKRFKL | SMKRFKL | SMKRFKA111 | |
| | zymographically | SMKRFKL | SMKRFKL | SMKRFKLA11 | |
| 2 | absorption test | ABSRPXNTST | APSRPXNTST | APSPSNTST1 | |
| | absorption tests | ABSRPXNTST | APSRPXNTST | APSPSNTSTS | |
| 3 | bacterial culture media | BKTRLKLTRM | PKTRLKLTRM | PKTRKTRMTA | |
| | bacterial culture medium | BKTRLKLTRM | PKTRLKLTRM | PKTRKTRMTM | |
| Example | Term | Metaphone (10) | Metaphone (60) | Notes |
|---|---|---|---|---|
| 1 | 2-item patient health questionnaire | TMPTNTL0KS | ITMPTN0L0KSXNR | |
| | 2-item patient health questionnaires | TMPTNTL0KS | TMPTNTL0KSXNRS | |
| 2 | bacterial culture media | PKTRLKLTRM | PKTRLKLTRMT | |
| | bacterial culture medium | PKTRLKLTRM | PKTRLKLTRMTM | |
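The effect of the code length limit in the table above can be illustrated with a small sketch. It assumes that "Metaphone (10)" and "Metaphone (60)" denote the same phonetic code capped at 10 and 60 characters respectively, which the source does not state explicitly. The helper names `truncate_code` and `codes_collide` are hypothetical, and the codes are copied from the Metaphone (60) column rather than computed.

```python
def truncate_code(code: str, max_len: int) -> str:
    """Cap a phonetic code at max_len characters."""
    return code[:max_len]


def codes_collide(codes: dict, max_len: int) -> bool:
    """Return True if all terms share one code after truncation."""
    return len({truncate_code(code, max_len) for code in codes.values()}) == 1


# Metaphone (60) codes copied from the table, not computed here.
full_codes = {
    "bacterial culture media": "PKTRLKLTRMT",
    "bacterial culture medium": "PKTRLKLTRMTM",
}

for max_len in (10, 60):
    print(max_len, codes_collide(full_codes, max_len))
```

With a cap of 10 both terms reduce to the same key, so they are grouped together; with a cap of 60 the trailing characters are kept and the two codes differ.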
| Example | Singular | Plural | Notes |
|---|---|---|---|
| 1 | aan | aan's | |
| 2 | dcmp deaminase | dcmp deaminase's | |
| Example | Term | Metaphone (60) | Caverphone 2.0 | Greco-Latin | Notes |
|---|---|---|---|---|---|
| 1 | acroscleroses | AKRSKRSS | AKRSKLRSS1 | plural | |
| | acrosclerosis | AKRSKRSS | AKRSKLRSS1 | singular | |
| 2 | ammon's horn scleroses | AMNSRNSKRSS | AMNSNSKLRS | plural | |
| | ammon's horn sclerosis | AMNSRNSKRSS | AMNSNSKLRS | singular | |
| 3 | fimbria | FMPR | FMPRA11111 | singular | |
| | fimbriae | FMPR | FMPRA11111 | plural | |
| 4 | infraorbital foramen | ANFRRPTLFRMN | ANFRPTFRMN | singular | |
| | infraorbital foramina | ANFRRPTLFRMN | ANFRPTFRMN | plural | |
| Example | Term | Metaphone (60) | Caverphone 2.0 | Greco-Latin | Notes |
|---|---|---|---|---|---|
| 1 | zygomycetes | SKMSTS | SKMSTS1111 | ? | |
| | zygomycetous | SKMSTS | SKMSTS1111 | ? | |
| Example | Term | Metaphone (60) | Caverphone 2.0 | Greco-Latin | Notes |
|---|---|---|---|---|---|
| 1 | zygapophyseal joint | TBD | TBD | ? | |
| | zygapophysial joint | TBD | TBD | ? | |
| 2 | zuclomifene | TBD | TBD | ? | |
| | zuclomiphene | TBD | TBD | ? | |