XML/HTML Handler: Correct XML/HTML Entity
- Description:
This class converts HTML/XML entities to their ASCII characters.
- Features:
Converts the following HTML/XML entities:
| In | Out |
|---|---|
| `&lt;` | `<` |
| `&gt;` | `>` |
| `&amp;` | `&` |
| `&quot;` | `"` |
|  |  |
- Examples:
| File Name | Input | Output |
|---|---|---|
| 10058.txt | `&amp;` | `&` |
| 10715.txt | `&quot;?` | `"?` |
| 12190.txt | `&quot; why` | `" why` |
- Implementation Logic:
- Store the conversions in a local HashMap, with the XML/HTML entity as the key and the converted ASCII character as the value.
- Iterate over all keys.
- If the input text contains a key, replace every occurrence with the corresponding ASCII character.
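
The steps above can be sketched roughly as follows. This is a minimal illustration, not the code in XmlHtmlHandler.java: the class name `EntityConverterSketch` and the method `convert` are hypothetical, and the map covers only the four entities that can be read from the feature table. One deliberate deviation from the description: the sketch uses a `LinkedHashMap` rather than a plain `HashMap` so that `&amp;` is always replaced last. With an unordered map, an input such as `&amp;lt;` could be decoded twice and end up as `<`.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of the lookup-and-replace logic; not the real XmlHtmlHandler.
final class EntityConverterSketch {

    // Key: XML/HTML entity, value: converted ASCII character.
    // LinkedHashMap preserves insertion order (assumption: "&amp;" should go last).
    private static final Map<String, String> ENTITIES = new LinkedHashMap<>();
    static {
        ENTITIES.put("&lt;", "<");
        ENTITIES.put("&gt;", ">");
        ENTITIES.put("&quot;", "\"");
        ENTITIES.put("&amp;", "&");
    }

    // Go through all keys; if the text contains a key, replace every occurrence.
    static String convert(String text) {
        String result = text;
        for (Map.Entry<String, String> entry : ENTITIES.entrySet()) {
            if (result.contains(entry.getKey())) {
                result = result.replace(entry.getKey(), entry.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(convert("&quot; why"));          // prints: " why
        System.out.println(convert("Q&amp;A &lt;tag&gt;")); // prints: Q&A <tag>
    }
}
```

The `contains` check mirrors the described logic; `String.replace` alone would give the same result, since it returns the text unchanged when the key is absent.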
- Notes:
- Baseline source code: PreProcXml.java
- Bug fixes:
- [& X] -> [&X]
- [&....I] -> [&...I]
- Action: Redesigned and implemented.
- Entities of the form [ddd;] are not converted to ASCII. This conversion might be needed if such entities appear in the input text.
- Source Code:
XmlHtmlHandler.java