Split Ligatures in Unicode
| Unicode | Mapped String | Char | Unicode Name |
|---|---|---|---|
| U+00C6 | AE | Æ | LATIN CAPITAL LETTER AE |
Please note:
As discussed in the Unicode Normalization section, Unicode Normalization Form KC (NFKC) can be used to split ligatures. NFKC decomposes a ligature into several Unicode characters. It is applied after the table mapping, to characters that the mapping table does not cover. Please note:
| Unicode | Mapped String | Char | Unicode Name |
|---|---|---|---|
| U+00B5 | µ | µ | MICRO SIGN |
| Unicode | Mapped ASCII | Char | Unicode Name |
|---|---|---|---|
| U+00BC | 1/4 | ¼ | VULGAR FRACTION ONE QUARTER |
| U+00BD | 1/2 | ½ | VULGAR FRACTION ONE HALF |
| U+00BE | 3/4 | ¾ | VULGAR FRACTION THREE QUARTERS |
| U+00C6 | AE | Æ | LATIN CAPITAL LETTER AE |
| U+00E6 | ae | æ | LATIN SMALL LETTER AE |
| U+0132 | IJ | IJ | LATIN CAPITAL LETTER IJ |
| U+0133 | ij | ij | LATIN SMALL LETTER IJ |
| U+0152 | OE | Œ | LATIN CAPITAL LETTER OE |
| U+0153 | oe | œ | LATIN SMALL LETTER OE |
| U+FB00 | ff | ff | LATIN SMALL LIGATURE FF |
| U+FB01 | fi | fi | LATIN SMALL LIGATURE FI |
| U+FB02 | fl | fl | LATIN SMALL LIGATURE FL |
| U+FB03 | ffi | ffi | LATIN SMALL LIGATURE FFI |
| U+FB04 | ffl | ffl | LATIN SMALL LIGATURE FFL |
| U+FB05 | st | ſt | LATIN SMALL LIGATURE LONG S T |
| U+FB06 | st | st | LATIN SMALL LIGATURE ST |
- `import com.ibm.icu.text.*;`
- If the character is in the ligature mapping table:
    - Perform the table mapping.
- Else:
    - `String normStr = Normalizer.normalize(inChar, Normalizer.NFKC);`
    - Set the split string to `normStr.trim()`.
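The steps above can be sketched in Java as follows. This is a minimal illustration, not the original implementation. To stay self-contained it uses the JDK's `java.text.Normalizer` in place of the ICU4J `Normalizer` named above. The class `LigatureSplitter`, its methods `splitChar` and `split`, and the partial `LIGATURE_MAP` are hypothetical names introduced for this example. Passing ASCII characters through unchanged is also an assumption of the sketch; it keeps `trim()` from deleting ordinary spaces.

```java
import java.text.Normalizer;
import java.util.HashMap;
import java.util.Map;

class LigatureSplitter {
    // Partial copy of the mapping table above (hypothetical helper for illustration).
    private static final Map<Integer, String> LIGATURE_MAP = new HashMap<>();
    static {
        LIGATURE_MAP.put(0x00BC, "1/4");
        LIGATURE_MAP.put(0x00BD, "1/2");
        LIGATURE_MAP.put(0x00BE, "3/4");
        LIGATURE_MAP.put(0x00C6, "AE");
        LIGATURE_MAP.put(0x00E6, "ae");
        LIGATURE_MAP.put(0x0152, "OE");
        LIGATURE_MAP.put(0x0153, "oe");
        LIGATURE_MAP.put(0xFB01, "fi");
    }

    // Split one code point: table mapping first, NFKC as the fallback.
    static String splitChar(int codePoint) {
        String mapped = LIGATURE_MAP.get(codePoint);
        if (mapped != null) {
            return mapped;
        }
        String inChar = new String(Character.toChars(codePoint));
        // JDK equivalent of the ICU4J call in the steps above (an assumption of this sketch).
        String normStr = Normalizer.normalize(inChar, Normalizer.Form.NFKC);
        return normStr.trim();
    }

    // Apply splitChar to every non-ASCII code point; ASCII is copied unchanged.
    static String split(String text) {
        StringBuilder out = new StringBuilder();
        text.codePoints().forEach(cp -> {
            if (cp < 0x80) {
                out.appendCodePoint(cp);
            } else {
                out.append(splitChar(cp));
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        // U+00C6 and U+FB01 come from the table; U+FB02 falls back to NFKC.
        System.out.println(split("\u00C6sop \uFB01nds \uFB02our")); // prints "AEsop finds flour"
    }
}
```

The table lookup comes first because NFKC alone does not give the ASCII results listed in the table. For example, NFKC leaves Æ (U+00C6) unchanged and turns ¼ (U+00BC) into a sequence containing FRACTION SLASH (U+2044) rather than the ASCII `/`.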