TT Source Model - Training and Test Set of Antonym Collection
I. Introduction
Antonym pairs (aPairs) were collected from various sources on the internet to study the characteristics and patterns of antonyms. Some sources contain duplicated aPairs. The order within a pair does not matter: for example, [absence|presence] and [presence|absence] are treated as the same aPair and counted as 1 unique aPair. In addition, the antonyms in aPairs are lowercased and restricted to single words, so multiword aPairs, such as [already|not yet] or [none of|a lot of], are removed from the collection. The source websites, the number of unique aPairs, and the URLs for this training and test set are shown in Table 1.
| ID | Source | No. of unique aPairs |
|---|---|---|
| 1 | Sherwood School | 449 |
| 2 | Proof Reading Services | 418 |
| 3 | Enchanted Learning | 324 |
| 4 | 7ESL | 339 |
| 5 | English Grammar Here | 321 |
| 6 | Synonyms Antonyms | 301 |
| 7 | SLP Lesson Plans | 251 |
| 8 | ESL Forums | 198 |
| 9 | My English Tutors | 170 |
| 10 | Love To Know | 167 |
| 11 | Your Dictionary | 159 |
| 12 | Classic Thesaurus | 100 |
| 13 | Power Thesaurus | 100 |
| 14 | Smart Words | 9 |
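The cleanup rules described in the introduction (lowercasing, order-insensitive deduplication, and removal of multiword aPairs) can be sketched as follows. This is a minimal illustration, not the TtSet implementation: the class name `APairCleaner`, its method names, and the `a|b` key format are assumptions made for this example, and "multiword" is assumed to mean "contains whitespace".

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper for illustration; not part of the TtSet sources.
final class APairCleaner {

    // Builds an order-independent key "a|b" with both antonyms lowercased.
    // Returns null when either antonym is empty or multiword
    // (assumption: multiword means the term contains whitespace).
    static String canonicalKey(String first, String second) {
        String a = first.trim().toLowerCase(Locale.ROOT);
        String b = second.trim().toLowerCase(Locale.ROOT);
        if (a.isEmpty() || b.isEmpty()) {
            return null;
        }
        if (a.matches(".*\\s.*") || b.matches(".*\\s.*")) {
            return null;
        }
        // Sort the two members so [x|y] and [y|x] map to the same key.
        return a.compareTo(b) <= 0 ? a + "|" + b : b + "|" + a;
    }

    // Collects unique aPairs, keeping first-seen order.
    static Set<String> uniquePairs(List<String[]> rawPairs) {
        Set<String> unique = new LinkedHashSet<>();
        for (String[] pair : rawPairs) {
            String key = canonicalKey(pair[0], pair[1]);
            if (key != null) {
                unique.add(key);
            }
        }
        return unique;
    }

    public static void main(String[] args) {
        List<String[]> raw = Arrays.asList(
            new String[] {"absence", "presence"},
            new String[] {"Presence", "absence"},   // duplicate in reverse order
            new String[] {"already", "not yet"},    // multiword, removed
            new String[] {"none of", "a lot of"});  // multiword, removed
        System.out.println(uniquePairs(raw));       // prints [absence|presence]
    }
}
```

Sorting the two members before building the key is one simple way to make the comparison order-insensitive; the actual program may use a different representation.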
II. Design
A program is developed to:
Please see design documents for more details.
III. Implementation
The Java source code is located in the TtSet directory:
Algorithm:
The antonym source of each collected aPair is identified programmatically (AntObj.java) as follows:
The algorithm for identifying a SD (suffixD) aPair is described as follows:
The algorithm for identifying a PD (prefixD) aPair is described as follows:
Co-occurrence in a corpus. These are aPairs retrieved by co-occurrence patterns from a corpus. Our first attempt uses terms that co-occur in MEDLINE.
Semantic opposites in corpora. These are aPairs retrieved from a semantic network. If an aPair does not belong to any of the above sources, it is assigned SN (semantic network). Patterns are yet to be developed.
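The fallback rule above (an aPair that matches none of the other sources is assigned SN) can be illustrated with a small rule chain. This is only a sketch under stated assumptions: the class `SourceAssigner` and its methods are hypothetical and do not describe how AntObj.java is organized, trying the rules in registration order with the first match winning is an assumption, and the rule bodies in `main` are toy lookups that stand in for the real SD and PD algorithms, which are not reproduced here.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.BiPredicate;

// Hypothetical sketch for illustration; AntObj.java may work differently.
final class SourceAssigner {

    // Tag used when no registered rule matches, as described in the text.
    static final String FALLBACK = "SN";

    // Rules keyed by source tag, kept in registration order.
    private final Map<String, BiPredicate<String, String>> rules = new LinkedHashMap<>();

    SourceAssigner addRule(String tag, BiPredicate<String, String> rule) {
        rules.put(tag, rule);
        return this;
    }

    // Returns the tag of the first matching rule (assumed ordering),
    // or SN when none of the rules accepts the pair.
    String assign(String first, String second) {
        for (Map.Entry<String, BiPredicate<String, String>> entry : rules.entrySet()) {
            if (entry.getValue().test(first, second)) {
                return entry.getKey();
            }
        }
        return FALLBACK;
    }

    public static void main(String[] args) {
        // Toy stand-ins only: these are NOT the suffixD or prefixD algorithms.
        SourceAssigner assigner = new SourceAssigner()
            .addRule("SD", (a, b) -> false)
            .addRule("PD", (a, b) -> a.equals("happy") && b.equals("unhappy"));

        System.out.println(assigner.assign("happy", "unhappy")); // matched by the toy PD rule
        System.out.println(assigner.assign("hot", "cold"));      // no toy rule matches, so SN
    }
}
```

Keeping the rules in an ordered map makes it easy to plug in further sources later, for example a co-occurrence check, without touching the fallback logic.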