Commit cba88f0ab0f43696de3dfe6ec0e8744a2d367307
1 parent
a543e086
oups :)
git-svn-id: http://svn.net-core.org/repos/t-engine4@3320 51575b47-30f0-44d4-a5cc-537603b46e54
Showing 17 changed files with 3489 additions and 0 deletions
src/utf8proc/Changelog
0 → 100644
1 | +Changelog | |
2 | + | |
3 | +2006-06-02: | |
4 | +- initial release of version 0.1 | |
5 | + | |
6 | +2006-06-05: | |
7 | +- changed behaviour of PostgreSQL function to return NULL in case of | |
8 | + invalid input, rather than raising an exceptional condition | |
9 | +- improved efficiency of PostgreSQL function (no transformation to C string | |
10 | + is done) | |
11 | + | |
12 | +2006-06-20: | |
13 | +- added -fpic compiler flag in Makefile | |
14 | +- fixed bug in the C code for the ruby library (usage of non-existent | |
15 | + function) | |
16 | + | |
17 | +Release of version 0.2 | |
18 | + | |
19 | + | |
20 | +2006-07-18: | |
21 | +- changed normalization from NFC to NFKC for postgresql unifold function | |
22 | + | |
23 | +2006-08-04: | |
24 | +- added support to mark the beginning of a grapheme cluster with 0xFF | |
25 | + (option: CHARBOUND) | |
26 | +- added the ruby method String#chars, which is returning an array of UTF-8 | |
27 | + encoded grapheme clusters | |
28 | +- added NLF2LF transformation in postgresql unifold function | |
29 | +- added the DECOMPOSE option; if you use neither COMPOSE nor DECOMPOSE, no | |
30 | + normalization will be performed (different from previous versions) | |
31 | +- using integer constants rather than C-strings for character properties | |
32 | +- fixed (hopefully) a problem with the ruby library on Mac OS X, which | |
33 | + occurred when compiler optimization was switched on | |
34 | + | |
35 | +Release of version 0.3 | |
36 | + | |
37 | + | |
38 | +2006-09-17: | |
39 | +- added the LUMP option, which lumps certain characters together | |
40 | + (see lump.txt) (also used for the PostgreSQL "unifold" function) | |
41 | +- added the STRIPMARK option, which strips marking characters | |
42 | + (or marks of composed characters) | |
43 | +- deprecated ruby method String#char_ary in favour of String#utf8chars | |
44 | + | |
45 | +Release of version 1.0 | |
46 | + | |
47 | + | |
48 | +2006-09-20: | |
49 | +- included a gem file for the ruby version of the library | |
50 | + | |
51 | +Release of version 1.0.1 | |
52 | + | |
53 | + | |
54 | +2006-09-21: | |
55 | +- included a check in Integer#utf8, which raises an exception if the given | |
56 | + code-point is invalid because it is too high (this check was missing before) | |
57 | + | |
58 | +2006-12-26: | |
59 | +- added support for PostgreSQL version 8.2 | |
60 | + | |
61 | +Release of version 1.0.2 | |
62 | + | |
63 | + | |
64 | +2007-03-16: | |
65 | +- Fixed a bug in the ruby library, which caused an error, when splitting an | |
66 | + empty string at grapheme cluster boundaries (method String#utf8chars). | |
67 | + | |
68 | +Release of version 1.0.3 | |
69 | + | |
70 | + | |
71 | +2007-06-25: | |
72 | +- Added a new PostgreSQL function 'unistrip', which behaves like 'unifold', | |
73 | + but also removes all character marks (e.g. accents). | |
74 | + | |
75 | +2007-07-22: | |
76 | +- Changed license from BSD to MIT style. | |
77 | +- Added a new function 'utf8proc_codepoint_valid' to the C library. | |
78 | +- Changed compiler flags in Makefile from -g -O0 to -O2 | |
79 | +- The ruby script, which was used to build the utf8proc_data.c file, is now | |
80 | + included in the distribution. | |
81 | + | |
82 | +Release of version 1.1.1 | |
83 | + | |
84 | + | |
85 | +2007-07-25: | |
86 | +- Fixed a serious bug in the data file generator, which caused characters | |
87 | + being treated incorrectly, when stripping default ignorable characters or | |
88 | + calculating grapheme cluster boundaries. | |
89 | + | |
90 | +Release of version 1.1.2 | |
91 | + | |
92 | + | |
93 | +2008-10-04: | |
94 | +- Added a function utf8proc_version returning a string containing the version | |
95 | + number of the library. | |
96 | +- Included a target libutf8proc.dylib for MacOSX. | |
97 | + | |
98 | +2009-05-01: | |
99 | +- PostgreSQL 8.3 compatibility (use of SET_VARSIZE macro) | |
100 | + | |
101 | +Release of version 1.1.3 | |
102 | + | |
103 | + | |
104 | +2009-06-14: | |
105 | +- replaced C++ style comments for compatibility reasons | |
106 | +- added typecasts to suppress compiler warnings | |
107 | +- removed redundant source files for ruby-gemfile generation | |
108 | + | |
109 | +2009-08-19: | |
110 | +- Changed copyright notice for Public Software Group e. V. | |
111 | +- Minor changes in the README file | |
112 | +- Release of version 1.1.4 | |
113 | + | |
114 | +2009-08-20: | |
115 | +- Use RSTRING_PTR() and RSTRING_LEN() instead of RSTRING()->ptr and | |
116 | + RSTRING()->len for ruby1.9 compatibility (and #define them, if not | |
117 | + existent) | |
118 | + | |
119 | +2009-10-02: | |
120 | +- Patches for compatibility with Microsoft Visual Studio | |
121 | + | |
122 | +2009-10-08: | |
123 | +- Fixes to make utf8proc usable in C++ programs | |
124 | + | |
125 | +2009-10-16: | |
126 | +- Release of version 1.1.5 | |
127 | + | |
128 | +2009-10-08: | ... | ... |
src/utf8proc/LICENSE
0 → 100644
1 | + | |
2 | +Copyright (c) 2009 Public Software Group e. V., Berlin, Germany | |
3 | + | |
4 | +Permission is hereby granted, free of charge, to any person obtaining a | |
5 | +copy of this software and associated documentation files (the "Software"), | |
6 | +to deal in the Software without restriction, including without limitation | |
7 | +the rights to use, copy, modify, merge, publish, distribute, sublicense, | |
8 | +and/or sell copies of the Software, and to permit persons to whom the | |
9 | +Software is furnished to do so, subject to the following conditions: | |
10 | + | |
11 | +The above copyright notice and this permission notice shall be included in | |
12 | +all copies or substantial portions of the Software. | |
13 | + | |
14 | +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR | |
15 | +IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, | |
16 | +FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE | |
17 | +AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER | |
18 | +LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING | |
19 | +FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER | |
20 | +DEALINGS IN THE SOFTWARE. | |
21 | + | |
22 | + | |
23 | +This software distribution contains derived data from a modified version of | |
24 | +the Unicode data files. The following license applies to that data: | |
25 | + | |
26 | +COPYRIGHT AND PERMISSION NOTICE | |
27 | + | |
28 | +Copyright (c) 1991-2007 Unicode, Inc. All rights reserved. Distributed | |
29 | +under the Terms of Use in http://www.unicode.org/copyright.html. | |
30 | + | |
31 | +Permission is hereby granted, free of charge, to any person obtaining a | |
32 | +copy of the Unicode data files and any associated documentation (the "Data | |
33 | +Files") or Unicode software and any associated documentation (the | |
34 | +"Software") to deal in the Data Files or Software without restriction, | |
35 | +including without limitation the rights to use, copy, modify, merge, | |
36 | +publish, distribute, and/or sell copies of the Data Files or Software, and | |
37 | +to permit persons to whom the Data Files or Software are furnished to do | |
38 | +so, provided that (a) the above copyright notice(s) and this permission | |
39 | +notice appear with all copies of the Data Files or Software, (b) both the | |
40 | +above copyright notice(s) and this permission notice appear in associated | |
41 | +documentation, and (c) there is clear notice in each modified Data File or | |
42 | +in the Software as well as in the documentation associated with the Data | |
43 | +File(s) or Software that the data or software has been modified. | |
44 | + | |
45 | +THE DATA FILES AND SOFTWARE ARE PROVIDED "AS IS", WITHOUT WARRANTY OF ANY | |
46 | +KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF | |
47 | +MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT OF | |
48 | +THIRD PARTY RIGHTS. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR HOLDERS | |
49 | +INCLUDED IN THIS NOTICE BE LIABLE FOR ANY CLAIM, OR ANY SPECIAL INDIRECT OR | |
50 | +CONSEQUENTIAL DAMAGES, OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF | |
51 | +USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER | |
52 | +TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR | |
53 | +PERFORMANCE OF THE DATA FILES OR SOFTWARE. | |
54 | + | |
55 | +Except as contained in this notice, the name of a copyright holder shall | |
56 | +not be used in advertising or otherwise to promote the sale, use or other | |
57 | +dealings in these Data Files or Software without prior written | |
58 | +authorization of the copyright holder. | |
59 | + | |
60 | + | |
61 | +Unicode and the Unicode logo are trademarks of Unicode, Inc., and may be | |
62 | +registered in some jurisdictions. All other trademarks and registered | |
63 | +trademarks mentioned herein are the property of their respective owners. | |
64 | + | ... | ... |
src/utf8proc/Makefile
0 → 100644
1 | +# libutf8proc Makefile | |
2 | + | |
3 | + | |
4 | +# settings | |
5 | + | |
6 | +cflags = -O2 -std=c99 -pedantic -Wall -fpic $(CFLAGS) | |
7 | +cc = $(CC) $(cflags) | |
8 | + | |
9 | + | |
10 | +# meta targets | |
11 | + | |
12 | +c-library: libutf8proc.a libutf8proc.so | |
13 | + | |
14 | +ruby-library: ruby/utf8proc_native.so | |
15 | + | |
16 | +pgsql-library: pgsql/utf8proc_pgsql.so | |
17 | + | |
18 | +all: c-library ruby-library ruby-gem pgsql-library | |
19 | + | |
20 | +clean:: | |
21 | + rm -f utf8proc.o libutf8proc.a libutf8proc.so | |
22 | + cd ruby/ && test -e Makefile && (make clean && rm -f Makefile) || true | |
23 | + rm -Rf ruby/gem/lib ruby/gem/ext | |
24 | + rm -f ruby/gem/utf8proc-*.gem | |
25 | + cd pgsql/ && make clean | |
26 | + | |
27 | +# real targets | |
28 | + | |
29 | +utf8proc.o: utf8proc.h utf8proc.c utf8proc_data.c | |
30 | + $(cc) -c -o utf8proc.o utf8proc.c | |
31 | + | |
32 | +libutf8proc.a: utf8proc.o | |
33 | + rm -f libutf8proc.a | |
34 | + ar rs libutf8proc.a utf8proc.o | |
35 | + | |
36 | +libutf8proc.so: utf8proc.o | |
37 | + $(cc) -shared -o libutf8proc.so utf8proc.o | |
38 | + chmod a-x libutf8proc.so | |
39 | + | |
40 | +libutf8proc.dylib: utf8proc.o | |
41 | + $(cc) -dynamiclib -o $@ $^ -install_name $(libdir)/$@ | |
42 | + | |
43 | +ruby/Makefile: ruby/extconf.rb | |
44 | + cd ruby && ruby extconf.rb | |
45 | + | |
46 | +ruby/utf8proc_native.so: utf8proc.h utf8proc.c utf8proc_data.c \ | |
47 | + ruby/utf8proc_native.c ruby/Makefile | |
48 | + cd ruby && make | |
49 | + | |
50 | +ruby/gem/lib/utf8proc.rb: ruby/utf8proc.rb | |
51 | + test -e ruby/gem/lib || mkdir ruby/gem/lib | |
52 | + cp ruby/utf8proc.rb ruby/gem/lib/ | |
53 | + | |
54 | +ruby/gem/ext/extconf.rb: ruby/extconf.rb | |
55 | + test -e ruby/gem/ext || mkdir ruby/gem/ext | |
56 | + cp ruby/extconf.rb ruby/gem/ext/ | |
57 | + | |
58 | +ruby/gem/ext/utf8proc_native.c: utf8proc.h utf8proc_data.c utf8proc.c ruby/utf8proc_native.c | |
59 | + test -e ruby/gem/ext || mkdir ruby/gem/ext | |
60 | + cat utf8proc.h utf8proc_data.c utf8proc.c ruby/utf8proc_native.c | grep -v '#include "utf8proc.h"' | grep -v '#include "utf8proc_data.c"' | grep -v '#include "../utf8proc.c"' > ruby/gem/ext/utf8proc_native.c | |
61 | + | |
62 | +ruby-gem:: ruby/gem/lib/utf8proc.rb ruby/gem/ext/extconf.rb ruby/gem/ext/utf8proc_native.c | |
63 | + cd ruby/gem && gem build utf8proc.gemspec | |
64 | + | |
65 | +pgsql/utf8proc_pgsql.so: utf8proc.h utf8proc.c utf8proc_data.c \ | |
66 | + pgsql/utf8proc_pgsql.c | |
67 | + cd pgsql && make | |
68 | + | ... | ... |
src/utf8proc/README
0 → 100644
1 | + | |
2 | +Please read the LICENSE file, which ships with this software. | |
3 | + | |
4 | + | |
5 | +*** QUICK START *** | |
6 | + | |
7 | +For compilation of the C library call "make c-library", for compilation of | |
8 | +the ruby library call "make ruby-library" and for compilation of the | |
9 | +PostgreSQL extension call "make pgsql-library". | |
10 | + | |
11 | +For ruby you can also create a gem-file by calling "make ruby-gem". | |
12 | + | |
13 | +"make all" can be used to build everything, but both ruby and PostgreSQL | |
14 | +installations are required in this case. | |
15 | + | |
16 | + | |
17 | +*** GENERAL INFORMATION *** | |
18 | + | |
19 | +The C library is found in this directory after successful compilation and | |
20 | +is named "libutf8proc.a" and "libutf8proc.so". The ruby library consists of | |
21 | +the files "utf8proc.rb" and "utf8proc_native.so", which are found in the | |
22 | +subdirectory "ruby/". If you chose to create a gem-file it is placed in the | |
23 | +"ruby/gem" directory. The PostgreSQL extension is named "utf8proc_pgsql.so" | |
24 | +and resides in the "pgsql/" directory. | |
25 | + | |
26 | +Both the ruby library and the PostgreSQL extension are built as stand-alone | |
27 | +libraries and are therefore not dependent on the dynamic version of the | |
28 | +C library files, but this behaviour might change in future releases. | |
29 | + | |
30 | +The Unicode version being supported is 5.0.0. | |
31 | +Note: Version 4.1.0 of Unicode Standard Annex #29 was used, as | |
32 | + version 5.0.0 had not been available at the time of implementation. | |
33 | + | |
34 | +For Unicode normalizations, the following options have to be used: | |
35 | +Normalization Form C: STABLE, COMPOSE | |
36 | +Normalization Form D: STABLE, DECOMPOSE | |
37 | +Normalization Form KC: STABLE, COMPOSE, COMPAT | |
38 | +Normalization Form KD: STABLE, DECOMPOSE, COMPAT | |
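To make the four normalization forms listed above concrete, here is a minimal sketch. It does not call utf8proc; as a stand-in it uses Ruby's built-in String#unicode_normalize (available in Ruby 2.2 and later), and the sample strings are chosen purely for illustration.

```ruby
# Stand-in illustration: Ruby's built-in String#unicode_normalize,
# NOT the utf8proc options (STABLE, COMPOSE, DECOMPOSE, COMPAT) from the README.
composed   = "\u00E9"   # "é" as a single code point
decomposed = "e\u0301"  # "e" followed by COMBINING ACUTE ACCENT
ligature   = "\uFB01"   # LATIN SMALL LIGATURE FI

nfc  = decomposed.unicode_normalize(:nfc)   # canonical composition
nfd  = composed.unicode_normalize(:nfd)     # canonical decomposition
nfkc = ligature.unicode_normalize(:nfkc)    # compatibility mapping, then composition
nfkd = ligature.unicode_normalize(:nfkd)    # compatibility mapping, decomposed

puts nfc == composed      # canonical forms round-trip between NFC and NFD
puts nfd == decomposed
puts nfkc                 # the ligature is replaced by the two letters "fi"
puts nfkd
```

The canonical forms (C and D) leave the ligature untouched; only the compatibility forms (KC and KD) replace it, which is the difference the COMPAT option expresses.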
39 | + | |
40 | + | |
41 | +*** C LIBRARY *** | |
42 | + | |
43 | +The documentation for the C library is found in the utf8proc.h header file. | |
44 | +"utf8proc_map" is most likely the function you will be using for mapping UTF-8 | |
45 | +strings, unless you want to allocate memory yourself. | |
46 | + | |
47 | + | |
48 | +*** RUBY API *** | |
49 | + | |
50 | +The ruby library adds the methods "utf8map" and "utf8map!" to the String | |
51 | +class, and the method "utf8" to the Integer class. | |
52 | + | |
53 | +The String#utf8map method does the same as the "utf8proc_map" C function. | |
54 | +Options for the mapping procedure are passed as symbols, e.g.: | |
55 | +"Hello".utf8map(:casefold) => "hello" | |
56 | + | |
57 | +The descriptions of all options are found in the C header file | |
58 | +"utf8proc.h". Please note that the corresponding symbols in ruby are all | |
59 | +lowercase. | |
60 | + | |
61 | +String#utf8map! is the destructive variant, meaning that the string is | |
62 | +replaced by the result. | |
63 | + | |
64 | +There are shortcuts for the 4 normalization forms specified by Unicode: | |
65 | +String#utf8nfd, String#utf8nfd!, | |
66 | +String#utf8nfc, String#utf8nfc!, | |
67 | +String#utf8nfkd, String#utf8nfkd!, | |
68 | +String#utf8nfkc, String#utf8nfkc! | |
69 | + | |
70 | +The method Integer#utf8 returns a UTF-8 string containing the Unicode | |
71 | +character given by the code point. | |
72 | +0x000A.utf8 => "\n" | |
73 | +0x2028.utf8 => "\342\200\250" | |
74 | + | |
75 | + | |
76 | +*** POSTGRESQL API *** | |
77 | + | |
78 | +For PostgreSQL there are two SQL functions supplied named "unifold" and | |
79 | +"unistrip". These functions can be used to prepare index fields in | |
80 | +order to be folded in a way where string-comparisons make more sense, e.g. | |
81 | +where "bathtub" == "bath<soft hyphen>tub" | |
82 | +or "Hello World" == "hello world". | |
83 | + | |
84 | +CREATE TABLE people ( | |
85 | + id serial8 primary key, | |
86 | + name text, | |
87 | + CHECK (unifold(name) NOTNULL) | |
88 | +); | |
89 | +CREATE INDEX name_idx ON people (unifold(name)); | |
90 | +SELECT * FROM people WHERE unifold(name) = unifold('John Doe'); | |
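The kind of folding that makes "bathtub" equal "bath<soft hyphen>tub" can be approximated in a few lines. This is a rough sketch only: `rough_fold` is a hypothetical name, it uses Ruby built-ins (String#unicode_normalize, and String#downcase with :fold from Ruby 2.4 onward) rather than utf8proc, and it removes only the soft hyphen, whereas the real "unifold" function is described as doing more (for example NLF2LF and lumping).

```ruby
# Rough approximation of the folding idea, NOT the unifold implementation.
SOFT_HYPHEN = "\u00AD"

# Hypothetical helper: NFKC-normalize, drop soft hyphens, then case-fold.
def rough_fold(text)
  text.unicode_normalize(:nfkc).delete(SOFT_HYPHEN).downcase(:fold)
end

puts rough_fold("bath\u00ADtub") == rough_fold("bathtub")
puts rough_fold("Hello World") == rough_fold("hello world")
```

As in the SQL example, both sides of a comparison have to be folded with the same function for the match to work.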
91 | + | |
92 | +The function "unistrip" removes character marks like accents or diaeresis, | |
93 | +while "unifold" keeps them. | |
94 | + | |
95 | +NOTICE: The outputs of the function can change between releases, as | |
96 | + utf8proc does not follow a versioning stability policy. You have to | |
97 | + rebuild your database indices, if you upgrade to a newer version | |
98 | + of utf8proc. | |
99 | + | |
100 | + | |
101 | +*** TODO *** | |
102 | + | |
103 | +- detect stable code points and process segments independently in order to | |
104 | + save memory | |
105 | +- do a quick check before normalizing strings to optimize speed | |
106 | +- support stream processing | |
107 | + | |
108 | + | |
109 | +*** CONTACT *** | |
110 | + | |
111 | +If you find any bugs or experience difficulties in compiling this software, | |
112 | +please contact us: | |
113 | + | |
114 | +Project page: http://www.public-software-group.org/utf8proc | |
115 | + | |
116 | + | ... | ... |
src/utf8proc/data_generator.rb
0 → 100644
1 | +#!/usr/pkg/bin/ruby | |
2 | + | |
3 | +# This file was used to generate the 'utf8proc_data.c' file by parsing the | |
4 | +# Unicode data file 'UnicodeData.txt' of the Unicode Character Database. | |
5 | +# It is included for informational purposes only and not intended for | |
6 | +# production use. | |
7 | + | |
8 | + | |
9 | +# Copyright (c) 2009 Public Software Group e. V., Berlin, Germany | |
10 | +# | |
11 | +# Permission is hereby granted, free of charge, to any person obtaining a | |
12 | +# copy of this software and associated documentation files (the "Software"), | |
13 | +# to deal in the Software without restriction, including without limitation | |
14 | +# the rights to use, copy, modify, merge, publish, distribute, sublicense, | |
15 | +# and/or sell copies of the Software, and to permit persons to whom the | |
16 | +# Software is furnished to do so, subject to the following conditions: | |
17 | +# | |
18 | +# The above copyright notice and this permission notice shall be included in | |
19 | +# all copies or substantial portions of the Software. | |
20 | +# | |
21 | +# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR | |
22 | +# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, | |
23 | +# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE | |
24 | +# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER | |
25 | +# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING | |
26 | +# FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER | |
27 | +# DEALINGS IN THE SOFTWARE. | |
28 | + | |
29 | + | |
30 | +# This file contains derived data from a modified version of the | |
31 | +# Unicode data files. The following license applies to that data: | |
32 | +# | |
33 | +# COPYRIGHT AND PERMISSION NOTICE | |
34 | +# | |
35 | +# Copyright (c) 1991-2007 Unicode, Inc. All rights reserved. Distributed | |
36 | +# under the Terms of Use in http://www.unicode.org/copyright.html. | |
37 | +# | |
38 | +# Permission is hereby granted, free of charge, to any person obtaining a | |
39 | +# copy of the Unicode data files and any associated documentation (the "Data | |
40 | +# Files") or Unicode software and any associated documentation (the | |
41 | +# "Software") to deal in the Data Files or Software without restriction, | |
42 | +# including without limitation the rights to use, copy, modify, merge, | |
43 | +# publish, distribute, and/or sell copies of the Data Files or Software, and | |
44 | +# to permit persons to whom the Data Files or Software are furnished to do | |
45 | +# so, provided that (a) the above copyright notice(s) and this permission | |
46 | +# notice appear with all copies of the Data Files or Software, (b) both the | |
47 | +# above copyright notice(s) and this permission notice appear in associated | |
48 | +# documentation, and (c) there is clear notice in each modified Data File or | |
49 | +# in the Software as well as in the documentation associated with the Data | |
50 | +# File(s) or Software that the data or software has been modified. | |
51 | +# | |
52 | +# THE DATA FILES AND SOFTWARE ARE PROVIDED "AS IS", WITHOUT WARRANTY OF ANY | |
53 | +# KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF | |
54 | +# MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT OF | |
55 | +# THIRD PARTY RIGHTS. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR HOLDERS | |
56 | +# INCLUDED IN THIS NOTICE BE LIABLE FOR ANY CLAIM, OR ANY SPECIAL INDIRECT OR | |
57 | +# CONSEQUENTIAL DAMAGES, OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF | |
58 | +# USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER | |
59 | +# TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR | |
60 | +# PERFORMANCE OF THE DATA FILES OR SOFTWARE. | |
61 | +# | |
62 | +# Except as contained in this notice, the name of a copyright holder shall | |
63 | +# not be used in advertising or otherwise to promote the sale, use or other | |
64 | +# dealings in these Data Files or Software without prior written | |
65 | +# authorization of the copyright holder. | |
66 | + | |
67 | + | |
68 | + | |
69 | +$ignorable_list = <<END_OF_LIST | |
70 | +0000..0008 ; Default_Ignorable_Code_Point # Cc [9] <control-0000>..<control-0008> | |
71 | +000E..001F ; Default_Ignorable_Code_Point # Cc [18] <control-000E>..<control-001F> | |
72 | +007F..0084 ; Default_Ignorable_Code_Point # Cc [6] <control-007F>..<control-0084> | |
73 | +0086..009F ; Default_Ignorable_Code_Point # Cc [26] <control-0086>..<control-009F> | |
74 | +00AD ; Default_Ignorable_Code_Point # Cf SOFT HYPHEN | |
75 | +034F ; Default_Ignorable_Code_Point # Mn COMBINING GRAPHEME JOINER | |
76 | +0600..0603 ; Default_Ignorable_Code_Point # Cf [4] ARABIC NUMBER SIGN..ARABIC SIGN SAFHA | |
77 | +06DD ; Default_Ignorable_Code_Point # Cf ARABIC END OF AYAH | |
78 | +070F ; Default_Ignorable_Code_Point # Cf SYRIAC ABBREVIATION MARK | |
79 | +115F..1160 ; Default_Ignorable_Code_Point # Lo [2] HANGUL CHOSEONG FILLER..HANGUL JUNGSEONG FILLER | |
80 | +17B4..17B5 ; Default_Ignorable_Code_Point # Cf [2] KHMER VOWEL INHERENT AQ..KHMER VOWEL INHERENT AA | |
81 | +180B..180D ; Default_Ignorable_Code_Point # Mn [3] MONGOLIAN FREE VARIATION SELECTOR ONE..MONGOLIAN FREE VARIATION SELECTOR THREE | |
82 | +200B..200F ; Default_Ignorable_Code_Point # Cf [5] ZERO WIDTH SPACE..RIGHT-TO-LEFT MARK | |
83 | +202A..202E ; Default_Ignorable_Code_Point # Cf [5] LEFT-TO-RIGHT EMBEDDING..RIGHT-TO-LEFT OVERRIDE | |
84 | +2060..2063 ; Default_Ignorable_Code_Point # Cf [4] WORD JOINER..INVISIBLE SEPARATOR | |
85 | +2064..2069 ; Default_Ignorable_Code_Point # Cn [6] <reserved-2064>..<reserved-2069> | |
86 | +206A..206F ; Default_Ignorable_Code_Point # Cf [6] INHIBIT SYMMETRIC SWAPPING..NOMINAL DIGIT SHAPES | |
87 | +3164 ; Default_Ignorable_Code_Point # Lo HANGUL FILLER | |
88 | +D800..DFFF ; Default_Ignorable_Code_Point # Cs [2048] <surrogate-D800>..<surrogate-DFFF> | |
89 | +FE00..FE0F ; Default_Ignorable_Code_Point # Mn [16] VARIATION SELECTOR-1..VARIATION SELECTOR-16 | |
90 | +FEFF ; Default_Ignorable_Code_Point # Cf ZERO WIDTH NO-BREAK SPACE | |
91 | +FFA0 ; Default_Ignorable_Code_Point # Lo HALFWIDTH HANGUL FILLER | |
92 | +FFF0..FFF8 ; Default_Ignorable_Code_Point # Cn [9] <reserved-FFF0>..<reserved-FFF8> | |
93 | +1D173..1D17A ; Default_Ignorable_Code_Point # Cf [8] MUSICAL SYMBOL BEGIN BEAM..MUSICAL SYMBOL END PHRASE | |
94 | +E0001 ; Default_Ignorable_Code_Point # Cf LANGUAGE TAG | |
95 | +E0002..E001F ; Default_Ignorable_Code_Point # Cn [30] <reserved-E0002>..<reserved-E001F> | |
96 | +E0020..E007F ; Default_Ignorable_Code_Point # Cf [96] TAG SPACE..CANCEL TAG | |
97 | +E0080..E00FF ; Default_Ignorable_Code_Point # Cn [128] <reserved-E0080>..<reserved-E00FF> | |
98 | +E0100..E01EF ; Default_Ignorable_Code_Point # Mn [240] VARIATION SELECTOR-17..VARIATION SELECTOR-256 | |
99 | +E01F0..E0FFF ; Default_Ignorable_Code_Point # Cn [3600] <reserved-E01F0>..<reserved-E0FFF> | |
100 | +END_OF_LIST | |
101 | + | |
102 | +$ignorable = [] | |
103 | +$ignorable_list.each do |entry| | |
104 | + if entry =~ /^([0-9A-F]+)\.\.([0-9A-F]+)/ | |
105 | + $1.hex.upto($2.hex) { |e2| $ignorable << e2 } | |
106 | + elsif entry =~ /^[0-9A-F]+/ | |
107 | + $ignorable << $&.hex | |
108 | + end | |
109 | +end | |
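The loop above calls `each` directly on a String, which works on Ruby 1.8 only; String#each was removed in Ruby 1.9. A minimal sketch of the same range expansion for newer Ruby versions follows. The name `parse_code_points` and the three-line sample list are hypothetical and stand in for the full `$ignorable_list`.

```ruby
# Hypothetical sample in the same format as $ignorable_list.
sample_list = <<~END_OF_LIST
  00AD          ; Default_Ignorable_Code_Point # Cf       SOFT HYPHEN
  200B..200F    ; Default_Ignorable_Code_Point # Cf   [5] ZERO WIDTH SPACE..RIGHT-TO-LEFT MARK
  FEFF          ; Default_Ignorable_Code_Point # Cf       ZERO WIDTH NO-BREAK SPACE
END_OF_LIST

# Expand "XXXX" and "XXXX..YYYY" entries into a flat array of code points.
def parse_code_points(list)
  code_points = []
  list.each_line do |entry|
    if (range = entry.match(/^([0-9A-F]+)\.\.([0-9A-F]+)/))
      range[1].hex.upto(range[2].hex) { |cp| code_points << cp }
    elsif (single = entry.match(/^[0-9A-F]+/))
      code_points << single[0].hex
    end
  end
  code_points
end

ignorable = parse_code_points(sample_list)
puts ignorable.map { |cp| format("%04X", cp) }.join(" ")
```

The range pattern is tested first, as in the original loop, because the single-value pattern would also match the first half of a range entry.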
110 | + | |
111 | +$grapheme_extend_list = <<END_OF_LIST | |
112 | +0300..036F ; Grapheme_Extend # Mn [112] COMBINING GRAVE ACCENT..COMBINING LATIN SMALL LETTER X | |
113 | +0483..0486 ; Grapheme_Extend # Mn [4] COMBINING CYRILLIC TITLO..COMBINING CYRILLIC PSILI PNEUMATA | |
114 | +0488..0489 ; Grapheme_Extend # Me [2] COMBINING CYRILLIC HUNDRED THOUSANDS SIGN..COMBINING CYRILLIC MILLIONS SIGN | |
115 | +0591..05BD ; Grapheme_Extend # Mn [45] HEBREW ACCENT ETNAHTA..HEBREW POINT METEG | |
116 | +05BF ; Grapheme_Extend # Mn HEBREW POINT RAFE | |
117 | +05C1..05C2 ; Grapheme_Extend # Mn [2] HEBREW POINT SHIN DOT..HEBREW POINT SIN DOT | |
118 | +05C4..05C5 ; Grapheme_Extend # Mn [2] HEBREW MARK UPPER DOT..HEBREW MARK LOWER DOT | |
119 | +05C7 ; Grapheme_Extend # Mn HEBREW POINT QAMATS QATAN | |
120 | +0610..0615 ; Grapheme_Extend # Mn [6] ARABIC SIGN SALLALLAHOU ALAYHE WASSALLAM..ARABIC SMALL HIGH TAH | |
121 | +064B..065E ; Grapheme_Extend # Mn [20] ARABIC FATHATAN..ARABIC FATHA WITH TWO DOTS | |
122 | +0670 ; Grapheme_Extend # Mn ARABIC LETTER SUPERSCRIPT ALEF | |
123 | +06D6..06DC ; Grapheme_Extend # Mn [7] ARABIC SMALL HIGH LIGATURE SAD WITH LAM WITH ALEF MAKSURA..ARABIC SMALL HIGH SEEN | |
124 | +06DE ; Grapheme_Extend # Me ARABIC START OF RUB EL HIZB | |
125 | +06DF..06E4 ; Grapheme_Extend # Mn [6] ARABIC SMALL HIGH ROUNDED ZERO..ARABIC SMALL HIGH MADDA | |
126 | +06E7..06E8 ; Grapheme_Extend # Mn [2] ARABIC SMALL HIGH YEH..ARABIC SMALL HIGH NOON | |
127 | +06EA..06ED ; Grapheme_Extend # Mn [4] ARABIC EMPTY CENTRE LOW STOP..ARABIC SMALL LOW MEEM | |
128 | +0711 ; Grapheme_Extend # Mn SYRIAC LETTER SUPERSCRIPT ALAPH | |
129 | +0730..074A ; Grapheme_Extend # Mn [27] SYRIAC PTHAHA ABOVE..SYRIAC BARREKH | |
130 | +07A6..07B0 ; Grapheme_Extend # Mn [11] THAANA ABAFILI..THAANA SUKUN | |
131 | +07EB..07F3 ; Grapheme_Extend # Mn [9] NKO COMBINING SHORT HIGH TONE..NKO COMBINING DOUBLE DOT ABOVE | |
132 | +0901..0902 ; Grapheme_Extend # Mn [2] DEVANAGARI SIGN CANDRABINDU..DEVANAGARI SIGN ANUSVARA | |
133 | +093C ; Grapheme_Extend # Mn DEVANAGARI SIGN NUKTA | |
134 | +0941..0948 ; Grapheme_Extend # Mn [8] DEVANAGARI VOWEL SIGN U..DEVANAGARI VOWEL SIGN AI | |
135 | +094D ; Grapheme_Extend # Mn DEVANAGARI SIGN VIRAMA | |
136 | +0951..0954 ; Grapheme_Extend # Mn [4] DEVANAGARI STRESS SIGN UDATTA..DEVANAGARI ACUTE ACCENT | |
137 | +0962..0963 ; Grapheme_Extend # Mn [2] DEVANAGARI VOWEL SIGN VOCALIC L..DEVANAGARI VOWEL SIGN VOCALIC LL | |
138 | +0981 ; Grapheme_Extend # Mn BENGALI SIGN CANDRABINDU | |
139 | +09BC ; Grapheme_Extend # Mn BENGALI SIGN NUKTA | |
140 | +09BE ; Grapheme_Extend # Mc BENGALI VOWEL SIGN AA | |
141 | +09C1..09C4 ; Grapheme_Extend # Mn [4] BENGALI VOWEL SIGN U..BENGALI VOWEL SIGN VOCALIC RR | |
142 | +09CD ; Grapheme_Extend # Mn BENGALI SIGN VIRAMA | |
143 | +09D7 ; Grapheme_Extend # Mc BENGALI AU LENGTH MARK | |
144 | +09E2..09E3 ; Grapheme_Extend # Mn [2] BENGALI VOWEL SIGN VOCALIC L..BENGALI VOWEL SIGN VOCALIC LL | |
145 | +0A01..0A02 ; Grapheme_Extend # Mn [2] GURMUKHI SIGN ADAK BINDI..GURMUKHI SIGN BINDI | |
146 | +0A3C ; Grapheme_Extend # Mn GURMUKHI SIGN NUKTA | |
147 | +0A41..0A42 ; Grapheme_Extend # Mn [2] GURMUKHI VOWEL SIGN U..GURMUKHI VOWEL SIGN UU | |
148 | +0A47..0A48 ; Grapheme_Extend # Mn [2] GURMUKHI VOWEL SIGN EE..GURMUKHI VOWEL SIGN AI | |
149 | +0A4B..0A4D ; Grapheme_Extend # Mn [3] GURMUKHI VOWEL SIGN OO..GURMUKHI SIGN VIRAMA | |
150 | +0A70..0A71 ; Grapheme_Extend # Mn [2] GURMUKHI TIPPI..GURMUKHI ADDAK | |
151 | +0A81..0A82 ; Grapheme_Extend # Mn [2] GUJARATI SIGN CANDRABINDU..GUJARATI SIGN ANUSVARA | |
152 | +0ABC ; Grapheme_Extend # Mn GUJARATI SIGN NUKTA | |
153 | +0AC1..0AC5 ; Grapheme_Extend # Mn [5] GUJARATI VOWEL SIGN U..GUJARATI VOWEL SIGN CANDRA E | |
154 | +0AC7..0AC8 ; Grapheme_Extend # Mn [2] GUJARATI VOWEL SIGN E..GUJARATI VOWEL SIGN AI | |
155 | +0ACD ; Grapheme_Extend # Mn GUJARATI SIGN VIRAMA | |
156 | +0AE2..0AE3 ; Grapheme_Extend # Mn [2] GUJARATI VOWEL SIGN VOCALIC L..GUJARATI VOWEL SIGN VOCALIC LL | |
157 | +0B01 ; Grapheme_Extend # Mn ORIYA SIGN CANDRABINDU | |
158 | +0B3C ; Grapheme_Extend # Mn ORIYA SIGN NUKTA | |
159 | +0B3E ; Grapheme_Extend # Mc ORIYA VOWEL SIGN AA | |
160 | +0B3F ; Grapheme_Extend # Mn ORIYA VOWEL SIGN I | |
161 | +0B41..0B43 ; Grapheme_Extend # Mn [3] ORIYA VOWEL SIGN U..ORIYA VOWEL SIGN VOCALIC R | |
162 | +0B4D ; Grapheme_Extend # Mn ORIYA SIGN VIRAMA | |
163 | +0B56 ; Grapheme_Extend # Mn ORIYA AI LENGTH MARK | |
164 | +0B57 ; Grapheme_Extend # Mc ORIYA AU LENGTH MARK | |
165 | +0B82 ; Grapheme_Extend # Mn TAMIL SIGN ANUSVARA | |
166 | +0BBE ; Grapheme_Extend # Mc TAMIL VOWEL SIGN AA | |
167 | +0BC0 ; Grapheme_Extend # Mn TAMIL VOWEL SIGN II | |
168 | +0BCD ; Grapheme_Extend # Mn TAMIL SIGN VIRAMA | |
169 | +0BD7 ; Grapheme_Extend # Mc TAMIL AU LENGTH MARK | |
170 | +0C3E..0C40 ; Grapheme_Extend # Mn [3] TELUGU VOWEL SIGN AA..TELUGU VOWEL SIGN II | |
171 | +0C46..0C48 ; Grapheme_Extend # Mn [3] TELUGU VOWEL SIGN E..TELUGU VOWEL SIGN AI | |
172 | +0C4A..0C4D ; Grapheme_Extend # Mn [4] TELUGU VOWEL SIGN O..TELUGU SIGN VIRAMA | |
173 | +0C55..0C56 ; Grapheme_Extend # Mn [2] TELUGU LENGTH MARK..TELUGU AI LENGTH MARK | |
174 | +0CBC ; Grapheme_Extend # Mn KANNADA SIGN NUKTA | |
175 | +0CBF ; Grapheme_Extend # Mn KANNADA VOWEL SIGN I | |
176 | +0CC2 ; Grapheme_Extend # Mc KANNADA VOWEL SIGN UU | |
177 | +0CC6 ; Grapheme_Extend # Mn KANNADA VOWEL SIGN E | |
178 | +0CCC..0CCD ; Grapheme_Extend # Mn [2] KANNADA VOWEL SIGN AU..KANNADA SIGN VIRAMA | |
179 | +0CD5..0CD6 ; Grapheme_Extend # Mc [2] KANNADA LENGTH MARK..KANNADA AI LENGTH MARK | |
180 | +0CE2..0CE3 ; Grapheme_Extend # Mn [2] KANNADA VOWEL SIGN VOCALIC L..KANNADA VOWEL SIGN VOCALIC LL | |
181 | +0D3E ; Grapheme_Extend # Mc MALAYALAM VOWEL SIGN AA | |
182 | +0D41..0D43 ; Grapheme_Extend # Mn [3] MALAYALAM VOWEL SIGN U..MALAYALAM VOWEL SIGN VOCALIC R | |
183 | +0D4D ; Grapheme_Extend # Mn MALAYALAM SIGN VIRAMA | |
184 | +0D57 ; Grapheme_Extend # Mc MALAYALAM AU LENGTH MARK | |
185 | +0DCA ; Grapheme_Extend # Mn SINHALA SIGN AL-LAKUNA | |
186 | +0DCF ; Grapheme_Extend # Mc SINHALA VOWEL SIGN AELA-PILLA | |
187 | +0DD2..0DD4 ; Grapheme_Extend # Mn [3] SINHALA VOWEL SIGN KETTI IS-PILLA..SINHALA VOWEL SIGN KETTI PAA-PILLA | |
188 | +0DD6 ; Grapheme_Extend # Mn SINHALA VOWEL SIGN DIGA PAA-PILLA | |
189 | +0DDF ; Grapheme_Extend # Mc SINHALA VOWEL SIGN GAYANUKITTA | |
190 | +0E31 ; Grapheme_Extend # Mn THAI CHARACTER MAI HAN-AKAT | |
191 | +0E34..0E3A ; Grapheme_Extend # Mn [7] THAI CHARACTER SARA I..THAI CHARACTER PHINTHU | |
192 | +0E47..0E4E ; Grapheme_Extend # Mn [8] THAI CHARACTER MAITAIKHU..THAI CHARACTER YAMAKKAN | |
193 | +0EB1 ; Grapheme_Extend # Mn LAO VOWEL SIGN MAI KAN | |
194 | +0EB4..0EB9 ; Grapheme_Extend # Mn [6] LAO VOWEL SIGN I..LAO VOWEL SIGN UU | |
195 | +0EBB..0EBC ; Grapheme_Extend # Mn [2] LAO VOWEL SIGN MAI KON..LAO SEMIVOWEL SIGN LO | |
196 | +0EC8..0ECD ; Grapheme_Extend # Mn [6] LAO TONE MAI EK..LAO NIGGAHITA | |
197 | +0F18..0F19 ; Grapheme_Extend # Mn [2] TIBETAN ASTROLOGICAL SIGN -KHYUD PA..TIBETAN ASTROLOGICAL SIGN SDONG TSHUGS | |
198 | +0F35 ; Grapheme_Extend # Mn TIBETAN MARK NGAS BZUNG NYI ZLA | |
199 | +0F37 ; Grapheme_Extend # Mn TIBETAN MARK NGAS BZUNG SGOR RTAGS | |
200 | +0F39 ; Grapheme_Extend # Mn TIBETAN MARK TSA -PHRU | |
201 | +0F71..0F7E ; Grapheme_Extend # Mn [14] TIBETAN VOWEL SIGN AA..TIBETAN SIGN RJES SU NGA RO | |
202 | +0F80..0F84 ; Grapheme_Extend # Mn [5] TIBETAN VOWEL SIGN REVERSED I..TIBETAN MARK HALANTA | |
203 | +0F86..0F87 ; Grapheme_Extend # Mn [2] TIBETAN SIGN LCI RTAGS..TIBETAN SIGN YANG RTAGS | |
204 | +0F90..0F97 ; Grapheme_Extend # Mn [8] TIBETAN SUBJOINED LETTER KA..TIBETAN SUBJOINED LETTER JA | |
205 | +0F99..0FBC ; Grapheme_Extend # Mn [36] TIBETAN SUBJOINED LETTER NYA..TIBETAN SUBJOINED LETTER FIXED-FORM RA | |
206 | +0FC6 ; Grapheme_Extend # Mn TIBETAN SYMBOL PADMA GDAN | |
207 | +102D..1030 ; Grapheme_Extend # Mn [4] MYANMAR VOWEL SIGN I..MYANMAR VOWEL SIGN UU | |
208 | +1032 ; Grapheme_Extend # Mn MYANMAR VOWEL SIGN AI | |
209 | +1036..1037 ; Grapheme_Extend # Mn [2] MYANMAR SIGN ANUSVARA..MYANMAR SIGN DOT BELOW | |
210 | +1039 ; Grapheme_Extend # Mn MYANMAR SIGN VIRAMA | |
211 | +1058..1059 ; Grapheme_Extend # Mn [2] MYANMAR VOWEL SIGN VOCALIC L..MYANMAR VOWEL SIGN VOCALIC LL | |
212 | +135F ; Grapheme_Extend # Mn ETHIOPIC COMBINING GEMINATION MARK | |
213 | +1712..1714 ; Grapheme_Extend # Mn [3] TAGALOG VOWEL SIGN I..TAGALOG SIGN VIRAMA | |
214 | +1732..1734 ; Grapheme_Extend # Mn [3] HANUNOO VOWEL SIGN I..HANUNOO SIGN PAMUDPOD | |
215 | +1752..1753 ; Grapheme_Extend # Mn [2] BUHID VOWEL SIGN I..BUHID VOWEL SIGN U | |
216 | +1772..1773 ; Grapheme_Extend # Mn [2] TAGBANWA VOWEL SIGN I..TAGBANWA VOWEL SIGN U | |
217 | +17B7..17BD ; Grapheme_Extend # Mn [7] KHMER VOWEL SIGN I..KHMER VOWEL SIGN UA | |
218 | +17C6 ; Grapheme_Extend # Mn KHMER SIGN NIKAHIT | |
219 | +17C9..17D3 ; Grapheme_Extend # Mn [11] KHMER SIGN MUUSIKATOAN..KHMER SIGN BATHAMASAT | |
220 | +17DD ; Grapheme_Extend # Mn KHMER SIGN ATTHACAN | |
221 | +180B..180D ; Grapheme_Extend # Mn [3] MONGOLIAN FREE VARIATION SELECTOR ONE..MONGOLIAN FREE VARIATION SELECTOR THREE | |
222 | +18A9 ; Grapheme_Extend # Mn MONGOLIAN LETTER ALI GALI DAGALGA | |
223 | +1920..1922 ; Grapheme_Extend # Mn [3] LIMBU VOWEL SIGN A..LIMBU VOWEL SIGN U | |
224 | +1927..1928 ; Grapheme_Extend # Mn [2] LIMBU VOWEL SIGN E..LIMBU VOWEL SIGN O | |
225 | +1932 ; Grapheme_Extend # Mn LIMBU SMALL LETTER ANUSVARA | |
226 | +1939..193B ; Grapheme_Extend # Mn [3] LIMBU SIGN MUKPHRENG..LIMBU SIGN SA-I | |
227 | +1A17..1A18 ; Grapheme_Extend # Mn [2] BUGINESE VOWEL SIGN I..BUGINESE VOWEL SIGN U | |
228 | +1B00..1B03 ; Grapheme_Extend # Mn [4] BALINESE SIGN ULU RICEM..BALINESE SIGN SURANG | |
229 | +1B34 ; Grapheme_Extend # Mn BALINESE SIGN REREKAN | |
230 | +1B36..1B3A ; Grapheme_Extend # Mn [5] BALINESE VOWEL SIGN ULU..BALINESE VOWEL SIGN RA REPA | |
231 | +1B3C ; Grapheme_Extend # Mn BALINESE VOWEL SIGN LA LENGA | |
232 | +1B42 ; Grapheme_Extend # Mn BALINESE VOWEL SIGN PEPET | |
233 | +1B6B..1B73 ; Grapheme_Extend # Mn [9] BALINESE MUSICAL SYMBOL COMBINING TEGEH..BALINESE MUSICAL SYMBOL COMBINING GONG | |
234 | +1DC0..1DCA ; Grapheme_Extend # Mn [11] COMBINING DOTTED GRAVE ACCENT..COMBINING LATIN SMALL LETTER R BELOW | |
235 | +1DFE..1DFF ; Grapheme_Extend # Mn [2] COMBINING LEFT ARROWHEAD ABOVE..COMBINING RIGHT ARROWHEAD AND DOWN ARROWHEAD BELOW | |
236 | +200C..200D ; Grapheme_Extend # Cf [2] ZERO WIDTH NON-JOINER..ZERO WIDTH JOINER | |
237 | +20D0..20DC ; Grapheme_Extend # Mn [13] COMBINING LEFT HARPOON ABOVE..COMBINING FOUR DOTS ABOVE | |
238 | +20DD..20E0 ; Grapheme_Extend # Me [4] COMBINING ENCLOSING CIRCLE..COMBINING ENCLOSING CIRCLE BACKSLASH | |
239 | +20E1 ; Grapheme_Extend # Mn COMBINING LEFT RIGHT ARROW ABOVE | |
240 | +20E2..20E4 ; Grapheme_Extend # Me [3] COMBINING ENCLOSING SCREEN..COMBINING ENCLOSING UPWARD POINTING TRIANGLE | |
241 | +20E5..20EF ; Grapheme_Extend # Mn [11] COMBINING REVERSE SOLIDUS OVERLAY..COMBINING RIGHT ARROW BELOW | |
242 | +302A..302F ; Grapheme_Extend # Mn [6] IDEOGRAPHIC LEVEL TONE MARK..HANGUL DOUBLE DOT TONE MARK | |
243 | +3099..309A ; Grapheme_Extend # Mn [2] COMBINING KATAKANA-HIRAGANA VOICED SOUND MARK..COMBINING KATAKANA-HIRAGANA SEMI-VOICED SOUND MARK | |
244 | +A806 ; Grapheme_Extend # Mn SYLOTI NAGRI SIGN HASANTA | |
245 | +A80B ; Grapheme_Extend # Mn SYLOTI NAGRI SIGN ANUSVARA | |
246 | +A825..A826 ; Grapheme_Extend # Mn [2] SYLOTI NAGRI VOWEL SIGN U..SYLOTI NAGRI VOWEL SIGN E | |
247 | +FB1E ; Grapheme_Extend # Mn HEBREW POINT JUDEO-SPANISH VARIKA | |
248 | +FE00..FE0F ; Grapheme_Extend # Mn [16] VARIATION SELECTOR-1..VARIATION SELECTOR-16 | |
249 | +FE20..FE23 ; Grapheme_Extend # Mn [4] COMBINING LIGATURE LEFT HALF..COMBINING DOUBLE TILDE RIGHT HALF | |
250 | +10A01..10A03 ; Grapheme_Extend # Mn [3] KHAROSHTHI VOWEL SIGN I..KHAROSHTHI VOWEL SIGN VOCALIC R | |
251 | +10A05..10A06 ; Grapheme_Extend # Mn [2] KHAROSHTHI VOWEL SIGN E..KHAROSHTHI VOWEL SIGN O | |
252 | +10A0C..10A0F ; Grapheme_Extend # Mn [4] KHAROSHTHI VOWEL LENGTH MARK..KHAROSHTHI SIGN VISARGA | |
253 | +10A38..10A3A ; Grapheme_Extend # Mn [3] KHAROSHTHI SIGN BAR ABOVE..KHAROSHTHI SIGN DOT BELOW | |
254 | +10A3F ; Grapheme_Extend # Mn KHAROSHTHI VIRAMA | |
255 | +1D165 ; Grapheme_Extend # Mc MUSICAL SYMBOL COMBINING STEM | |
256 | +1D167..1D169 ; Grapheme_Extend # Mn [3] MUSICAL SYMBOL COMBINING TREMOLO-1..MUSICAL SYMBOL COMBINING TREMOLO-3 | |
257 | +1D16E..1D172 ; Grapheme_Extend # Mc [5] MUSICAL SYMBOL COMBINING FLAG-1..MUSICAL SYMBOL COMBINING FLAG-5 | |
258 | +1D17B..1D182 ; Grapheme_Extend # Mn [8] MUSICAL SYMBOL COMBINING ACCENT..MUSICAL SYMBOL COMBINING LOURE | |
259 | +1D185..1D18B ; Grapheme_Extend # Mn [7] MUSICAL SYMBOL COMBINING DOIT..MUSICAL SYMBOL COMBINING TRIPLE TONGUE | |
260 | +1D1AA..1D1AD ; Grapheme_Extend # Mn [4] MUSICAL SYMBOL COMBINING DOWN BOW..MUSICAL SYMBOL COMBINING SNAP PIZZICATO | |
261 | +1D242..1D244 ; Grapheme_Extend # Mn [3] COMBINING GREEK MUSICAL TRISEME..COMBINING GREEK MUSICAL PENTASEME | |
262 | +E0100..E01EF ; Grapheme_Extend # Mn [240] VARIATION SELECTOR-17..VARIATION SELECTOR-256 | |
263 | +END_OF_LIST | |
264 | + | |
265 | +$grapheme_extend = [] | |
266 | +$grapheme_extend_list.each_line do |entry| | |
267 | + if entry =~ /^([0-9A-F]+)\.\.([0-9A-F]+)/ | |
268 | + $1.hex.upto($2.hex) { |e2| $grapheme_extend << e2 } | |
269 | + elsif entry =~ /^[0-9A-F]+/ | |
270 | + $grapheme_extend << $&.hex | |
271 | + end | |
272 | +end | |
273 | + | |
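Review note: the loop above expands every `XXXX..YYYY` range into individual code points and appends single `XXXX` entries unchanged. Below is a minimal standalone sketch of that idea. It assumes a hypothetical three-line sample list and a hypothetical helper name, `expand_code_points`, neither of which is part of the commit. It iterates with `each_line`, which is available on Ruby 1.8 as well as on later versions.

```ruby
# Hypothetical sample in the same format as the Grapheme_Extend list.
sample_list = <<END_OF_LIST
0D3E          ; Grapheme_Extend # Mc       MALAYALAM VOWEL SIGN AA
0D41..0D43    ; Grapheme_Extend # Mn   [3] MALAYALAM VOWEL SIGN U..MALAYALAM VOWEL SIGN VOCALIC R
0D4D          ; Grapheme_Extend # Mn       MALAYALAM SIGN VIRAMA
END_OF_LIST

# Expand "XXXX..YYYY" ranges and single "XXXX" entries into a flat
# array of integer code points. (The helper name is illustrative only.)
def expand_code_points(list)
  code_points = []
  list.each_line do |entry|
    if entry =~ /^([0-9A-F]+)\.\.([0-9A-F]+)/
      # Range entry: append every code point from first to last, inclusive.
      $1.hex.upto($2.hex) { |cp| code_points << cp }
    elsif entry =~ /^[0-9A-F]+/
      # Single entry: append the one code point.
      code_points << $&.hex
    end
  end
  code_points
end

p expand_code_points(sample_list).map { |cp| format("%04X", cp) }
# prints ["0D3E", "0D41", "0D42", "0D43", "0D4D"]
```

The range pattern has to be tested first. Otherwise the single-entry pattern would match the start of a range line and silently drop the rest of the range.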
274 | +$exclusions = <<END_OF_LIST | |
275 | +0958 # DEVANAGARI LETTER QA | |
276 | +0959 # DEVANAGARI LETTER KHHA | |
277 | +095A # DEVANAGARI LETTER GHHA | |
278 | +095B # DEVANAGARI LETTER ZA | |
279 | +095C # DEVANAGARI LETTER DDDHA | |
280 | +095D # DEVANAGARI LETTER RHA | |
281 | +095E # DEVANAGARI LETTER FA | |
282 | +095F # DEVANAGARI LETTER YYA | |
283 | +09DC # BENGALI LETTER RRA | |
284 | +09DD # BENGALI LETTER RHA | |
285 | +09DF # BENGALI LETTER YYA | |
286 | +0A33 # GURMUKHI LETTER LLA | |
287 | +0A36 # GURMUKHI LETTER SHA | |
288 | +0A59 # GURMUKHI LETTER KHHA | |
289 | +0A5A # GURMUKHI LETTER GHHA | |
290 | +0A5B # GURMUKHI LETTER ZA | |
291 | +0A5E # GURMUKHI LETTER FA | |
292 | +0B5C # ORIYA LETTER RRA | |
293 | +0B5D # ORIYA LETTER RHA | |
294 | +0F43 # TIBETAN LETTER GHA | |
295 | +0F4D # TIBETAN LETTER DDHA | |
296 | +0F52 # TIBETAN LETTER DHA | |
297 | +0F57 # TIBETAN LETTER BHA | |
298 | +0F5C # TIBETAN LETTER DZHA | |
299 | +0F69 # TIBETAN LETTER KSSA | |
300 | +0F76 # TIBETAN VOWEL SIGN VOCALIC R | |
301 | +0F78 # TIBETAN VOWEL SIGN VOCALIC L | |
302 | +0F93 # TIBETAN SUBJOINED LETTER GHA | |
303 | +0F9D # TIBETAN SUBJOINED LETTER DDHA | |
304 | +0FA2 # TIBETAN SUBJOINED LETTER DHA | |
305 | +0FA7 # TIBETAN SUBJOINED LETTER BHA | |
306 | +0FAC # TIBETAN SUBJOINED LETTER DZHA | |
307 | +0FB9 # TIBETAN SUBJOINED LETTER KSSA | |
308 | +FB1D # HEBREW LETTER YOD WITH HIRIQ | |
309 | +FB1F # HEBREW LIGATURE YIDDISH YOD YOD PATAH | |
310 | +FB2A # HEBREW LETTER SHIN WITH SHIN DOT | |
311 | +FB2B # HEBREW LETTER SHIN WITH SIN DOT | |
312 | +FB2C # HEBREW LETTER SHIN WITH DAGESH AND SHIN DOT | |
313 | +FB2D # HEBREW LETTER SHIN WITH DAGESH AND SIN DOT | |
314 | +FB2E # HEBREW LETTER ALEF WITH PATAH | |
315 | +FB2F # HEBREW LETTER ALEF WITH QAMATS | |
316 | +FB30 # HEBREW LETTER ALEF WITH MAPIQ | |
317 | +FB31 # HEBREW LETTER BET WITH DAGESH | |
318 | +FB32 # HEBREW LETTER GIMEL WITH DAGESH | |
319 | +FB33 # HEBREW LETTER DALET WITH DAGESH | |
320 | +FB34 # HEBREW LETTER HE WITH MAPIQ | |
321 | +FB35 # HEBREW LETTER VAV WITH DAGESH | |
322 | +FB36 # HEBREW LETTER ZAYIN WITH DAGESH | |
323 | +FB38 # HEBREW LETTER TET WITH DAGESH | |
324 | +FB39 # HEBREW LETTER YOD WITH DAGESH | |
325 | +FB3A # HEBREW LETTER FINAL KAF WITH DAGESH | |
326 | +FB3B # HEBREW LETTER KAF WITH DAGESH | |
327 | +FB3C # HEBREW LETTER LAMED WITH DAGESH | |
328 | +FB3E # HEBREW LETTER MEM WITH DAGESH | |
329 | +FB40 # HEBREW LETTER NUN WITH DAGESH | |
330 | +FB41 # HEBREW LETTER SAMEKH WITH DAGESH | |
331 | +FB43 # HEBREW LETTER FINAL PE WITH DAGESH | |
332 | +FB44 # HEBREW LETTER PE WITH DAGESH | |
333 | +FB46 # HEBREW LETTER TSADI WITH DAGESH | |
334 | +FB47 # HEBREW LETTER QOF WITH DAGESH | |
335 | +FB48 # HEBREW LETTER RESH WITH DAGESH | |
336 | +FB49 # HEBREW LETTER SHIN WITH DAGESH | |
337 | +FB4A # HEBREW LETTER TAV WITH DAGESH | |
338 | +FB4B # HEBREW LETTER VAV WITH HOLAM | |
339 | +FB4C # HEBREW LETTER BET WITH RAFE | |
340 | +FB4D # HEBREW LETTER KAF WITH RAFE | |
341 | +FB4E # HEBREW LETTER PE WITH RAFE | |
342 | +END_OF_LIST | |
343 | +$exclusions = $exclusions.chomp.split("\n").collect { |e| e.hex } | |
344 | + | |
345 | +$excl_version = <<END_OF_LIST | |
346 | +2ADC # FORKING | |
347 | +1D15E # MUSICAL SYMBOL HALF NOTE | |
348 | +1D15F # MUSICAL SYMBOL QUARTER NOTE | |
349 | +1D160 # MUSICAL SYMBOL EIGHTH NOTE | |
350 | +1D161 # MUSICAL SYMBOL SIXTEENTH NOTE | |
351 | +1D162 # MUSICAL SYMBOL THIRTY-SECOND NOTE | |
352 | +1D163 # MUSICAL SYMBOL SIXTY-FOURTH NOTE | |
353 | +1D164 # MUSICAL SYMBOL ONE HUNDRED TWENTY-EIGHTH NOTE | |
354 | +1D1BB # MUSICAL SYMBOL MINIMA | |
355 | +1D1BC # MUSICAL SYMBOL MINIMA BLACK | |
356 | +1D1BD # MUSICAL SYMBOL SEMIMINIMA WHITE | |
357 | +1D1BE # MUSICAL SYMBOL SEMIMINIMA BLACK | |
358 | +1D1BF # MUSICAL SYMBOL FUSA WHITE | |
359 | +1D1C0 # MUSICAL SYMBOL FUSA BLACK | |
360 | +END_OF_LIST | |
361 | +$excl_version = $excl_version.chomp.split("\n").collect { |e| e.hex } | |
362 | + | |
363 | +$case_folding_string = <<END_OF_LIST | |
364 | +0041; C; 0061; # LATIN CAPITAL LETTER A | |
365 | +0042; C; 0062; # LATIN CAPITAL LETTER B | |
366 | +0043; C; 0063; # LATIN CAPITAL LETTER C | |
367 | +0044; C; 0064; # LATIN CAPITAL LETTER D | |
368 | +0045; C; 0065; # LATIN CAPITAL LETTER E | |
369 | +0046; C; 0066; # LATIN CAPITAL LETTER F | |
370 | +0047; C; 0067; # LATIN CAPITAL LETTER G | |
371 | +0048; C; 0068; # LATIN CAPITAL LETTER H | |
372 | +0049; C; 0069; # LATIN CAPITAL LETTER I | |
373 | +004A; C; 006A; # LATIN CAPITAL LETTER J | |
374 | +004B; C; 006B; # LATIN CAPITAL LETTER K | |
375 | +004C; C; 006C; # LATIN CAPITAL LETTER L | |
376 | +004D; C; 006D; # LATIN CAPITAL LETTER M | |
377 | +004E; C; 006E; # LATIN CAPITAL LETTER N | |
378 | +004F; C; 006F; # LATIN CAPITAL LETTER O | |
379 | +0050; C; 0070; # LATIN CAPITAL LETTER P | |
380 | +0051; C; 0071; # LATIN CAPITAL LETTER Q | |
381 | +0052; C; 0072; # LATIN CAPITAL LETTER R | |
382 | +0053; C; 0073; # LATIN CAPITAL LETTER S | |
383 | +0054; C; 0074; # LATIN CAPITAL LETTER T | |
384 | +0055; C; 0075; # LATIN CAPITAL LETTER U | |
385 | +0056; C; 0076; # LATIN CAPITAL LETTER V | |
386 | +0057; C; 0077; # LATIN CAPITAL LETTER W | |
387 | +0058; C; 0078; # LATIN CAPITAL LETTER X | |
388 | +0059; C; 0079; # LATIN CAPITAL LETTER Y | |
389 | +005A; C; 007A; # LATIN CAPITAL LETTER Z | |
390 | +00B5; C; 03BC; # MICRO SIGN | |
391 | +00C0; C; 00E0; # LATIN CAPITAL LETTER A WITH GRAVE | |
392 | +00C1; C; 00E1; # LATIN CAPITAL LETTER A WITH ACUTE | |
393 | +00C2; C; 00E2; # LATIN CAPITAL LETTER A WITH CIRCUMFLEX | |
394 | +00C3; C; 00E3; # LATIN CAPITAL LETTER A WITH TILDE | |
395 | +00C4; C; 00E4; # LATIN CAPITAL LETTER A WITH DIAERESIS | |
396 | +00C5; C; 00E5; # LATIN CAPITAL LETTER A WITH RING ABOVE | |
397 | +00C6; C; 00E6; # LATIN CAPITAL LETTER AE | |
398 | +00C7; C; 00E7; # LATIN CAPITAL LETTER C WITH CEDILLA | |
399 | +00C8; C; 00E8; # LATIN CAPITAL LETTER E WITH GRAVE | |
400 | +00C9; C; 00E9; # LATIN CAPITAL LETTER E WITH ACUTE | |
401 | +00CA; C; 00EA; # LATIN CAPITAL LETTER E WITH CIRCUMFLEX | |
402 | +00CB; C; 00EB; # LATIN CAPITAL LETTER E WITH DIAERESIS | |
403 | +00CC; C; 00EC; # LATIN CAPITAL LETTER I WITH GRAVE | |
404 | +00CD; C; 00ED; # LATIN CAPITAL LETTER I WITH ACUTE | |
405 | +00CE; C; 00EE; # LATIN CAPITAL LETTER I WITH CIRCUMFLEX | |
406 | +00CF; C; 00EF; # LATIN CAPITAL LETTER I WITH DIAERESIS | |
407 | +00D0; C; 00F0; # LATIN CAPITAL LETTER ETH | |
408 | +00D1; C; 00F1; # LATIN CAPITAL LETTER N WITH TILDE | |
409 | +00D2; C; 00F2; # LATIN CAPITAL LETTER O WITH GRAVE | |
410 | +00D3; C; 00F3; # LATIN CAPITAL LETTER O WITH ACUTE | |
411 | +00D4; C; 00F4; # LATIN CAPITAL LETTER O WITH CIRCUMFLEX | |
412 | +00D5; C; 00F5; # LATIN CAPITAL LETTER O WITH TILDE | |
413 | +00D6; C; 00F6; # LATIN CAPITAL LETTER O WITH DIAERESIS | |
414 | +00D8; C; 00F8; # LATIN CAPITAL LETTER O WITH STROKE | |
415 | +00D9; C; 00F9; # LATIN CAPITAL LETTER U WITH GRAVE | |
416 | +00DA; C; 00FA; # LATIN CAPITAL LETTER U WITH ACUTE | |
417 | +00DB; C; 00FB; # LATIN CAPITAL LETTER U WITH CIRCUMFLEX | |
418 | +00DC; C; 00FC; # LATIN CAPITAL LETTER U WITH DIAERESIS | |
419 | +00DD; C; 00FD; # LATIN CAPITAL LETTER Y WITH ACUTE | |
420 | +00DE; C; 00FE; # LATIN CAPITAL LETTER THORN | |
421 | +00DF; F; 0073 0073; # LATIN SMALL LETTER SHARP S | |
422 | +0100; C; 0101; # LATIN CAPITAL LETTER A WITH MACRON | |
423 | +0102; C; 0103; # LATIN CAPITAL LETTER A WITH BREVE | |
424 | +0104; C; 0105; # LATIN CAPITAL LETTER A WITH OGONEK | |
425 | +0106; C; 0107; # LATIN CAPITAL LETTER C WITH ACUTE | |
426 | +0108; C; 0109; # LATIN CAPITAL LETTER C WITH CIRCUMFLEX | |
427 | +010A; C; 010B; # LATIN CAPITAL LETTER C WITH DOT ABOVE | |
428 | +010C; C; 010D; # LATIN CAPITAL LETTER C WITH CARON | |
429 | +010E; C; 010F; # LATIN CAPITAL LETTER D WITH CARON | |
430 | +0110; C; 0111; # LATIN CAPITAL LETTER D WITH STROKE | |
431 | +0112; C; 0113; # LATIN CAPITAL LETTER E WITH MACRON | |
432 | +0114; C; 0115; # LATIN CAPITAL LETTER E WITH BREVE | |
433 | +0116; C; 0117; # LATIN CAPITAL LETTER E WITH DOT ABOVE | |
434 | +0118; C; 0119; # LATIN CAPITAL LETTER E WITH OGONEK | |
435 | +011A; C; 011B; # LATIN CAPITAL LETTER E WITH CARON | |
436 | +011C; C; 011D; # LATIN CAPITAL LETTER G WITH CIRCUMFLEX | |
437 | +011E; C; 011F; # LATIN CAPITAL LETTER G WITH BREVE | |
438 | +0120; C; 0121; # LATIN CAPITAL LETTER G WITH DOT ABOVE | |
439 | +0122; C; 0123; # LATIN CAPITAL LETTER G WITH CEDILLA | |
440 | +0124; C; 0125; # LATIN CAPITAL LETTER H WITH CIRCUMFLEX | |
441 | +0126; C; 0127; # LATIN CAPITAL LETTER H WITH STROKE | |
442 | +0128; C; 0129; # LATIN CAPITAL LETTER I WITH TILDE | |
443 | +012A; C; 012B; # LATIN CAPITAL LETTER I WITH MACRON | |
444 | +012C; C; 012D; # LATIN CAPITAL LETTER I WITH BREVE | |
445 | +012E; C; 012F; # LATIN CAPITAL LETTER I WITH OGONEK | |
446 | +0130; F; 0069 0307; # LATIN CAPITAL LETTER I WITH DOT ABOVE | |
447 | +0132; C; 0133; # LATIN CAPITAL LIGATURE IJ | |
448 | +0134; C; 0135; # LATIN CAPITAL LETTER J WITH CIRCUMFLEX | |
449 | +0136; C; 0137; # LATIN CAPITAL LETTER K WITH CEDILLA | |
450 | +0139; C; 013A; # LATIN CAPITAL LETTER L WITH ACUTE | |
451 | +013B; C; 013C; # LATIN CAPITAL LETTER L WITH CEDILLA | |
452 | +013D; C; 013E; # LATIN CAPITAL LETTER L WITH CARON | |
453 | +013F; C; 0140; # LATIN CAPITAL LETTER L WITH MIDDLE DOT | |
454 | +0141; C; 0142; # LATIN CAPITAL LETTER L WITH STROKE | |
455 | +0143; C; 0144; # LATIN CAPITAL LETTER N WITH ACUTE | |
456 | +0145; C; 0146; # LATIN CAPITAL LETTER N WITH CEDILLA | |
457 | +0147; C; 0148; # LATIN CAPITAL LETTER N WITH CARON | |
458 | +0149; F; 02BC 006E; # LATIN SMALL LETTER N PRECEDED BY APOSTROPHE | |
459 | +014A; C; 014B; # LATIN CAPITAL LETTER ENG | |
460 | +014C; C; 014D; # LATIN CAPITAL LETTER O WITH MACRON | |
461 | +014E; C; 014F; # LATIN CAPITAL LETTER O WITH BREVE | |
462 | +0150; C; 0151; # LATIN CAPITAL LETTER O WITH DOUBLE ACUTE | |
463 | +0152; C; 0153; # LATIN CAPITAL LIGATURE OE | |
464 | +0154; C; 0155; # LATIN CAPITAL LETTER R WITH ACUTE | |
465 | +0156; C; 0157; # LATIN CAPITAL LETTER R WITH CEDILLA | |
466 | +0158; C; 0159; # LATIN CAPITAL LETTER R WITH CARON | |
467 | +015A; C; 015B; # LATIN CAPITAL LETTER S WITH ACUTE | |
468 | +015C; C; 015D; # LATIN CAPITAL LETTER S WITH CIRCUMFLEX | |
469 | +015E; C; 015F; # LATIN CAPITAL LETTER S WITH CEDILLA | |
470 | +0160; C; 0161; # LATIN CAPITAL LETTER S WITH CARON | |
471 | +0162; C; 0163; # LATIN CAPITAL LETTER T WITH CEDILLA | |
472 | +0164; C; 0165; # LATIN CAPITAL LETTER T WITH CARON | |
473 | +0166; C; 0167; # LATIN CAPITAL LETTER T WITH STROKE | |
474 | +0168; C; 0169; # LATIN CAPITAL LETTER U WITH TILDE | |
475 | +016A; C; 016B; # LATIN CAPITAL LETTER U WITH MACRON | |
476 | +016C; C; 016D; # LATIN CAPITAL LETTER U WITH BREVE | |
477 | +016E; C; 016F; # LATIN CAPITAL LETTER U WITH RING ABOVE | |
478 | +0170; C; 0171; # LATIN CAPITAL LETTER U WITH DOUBLE ACUTE | |
479 | +0172; C; 0173; # LATIN CAPITAL LETTER U WITH OGONEK | |
480 | +0174; C; 0175; # LATIN CAPITAL LETTER W WITH CIRCUMFLEX | |
481 | +0176; C; 0177; # LATIN CAPITAL LETTER Y WITH CIRCUMFLEX | |
482 | +0178; C; 00FF; # LATIN CAPITAL LETTER Y WITH DIAERESIS | |
483 | +0179; C; 017A; # LATIN CAPITAL LETTER Z WITH ACUTE | |
484 | +017B; C; 017C; # LATIN CAPITAL LETTER Z WITH DOT ABOVE | |
485 | +017D; C; 017E; # LATIN CAPITAL LETTER Z WITH CARON | |
486 | +017F; C; 0073; # LATIN SMALL LETTER LONG S | |
487 | +0181; C; 0253; # LATIN CAPITAL LETTER B WITH HOOK | |
488 | +0182; C; 0183; # LATIN CAPITAL LETTER B WITH TOPBAR | |
489 | +0184; C; 0185; # LATIN CAPITAL LETTER TONE SIX | |
490 | +0186; C; 0254; # LATIN CAPITAL LETTER OPEN O | |
491 | +0187; C; 0188; # LATIN CAPITAL LETTER C WITH HOOK | |
492 | +0189; C; 0256; # LATIN CAPITAL LETTER AFRICAN D | |
493 | +018A; C; 0257; # LATIN CAPITAL LETTER D WITH HOOK | |
494 | +018B; C; 018C; # LATIN CAPITAL LETTER D WITH TOPBAR | |
495 | +018E; C; 01DD; # LATIN CAPITAL LETTER REVERSED E | |
496 | +018F; C; 0259; # LATIN CAPITAL LETTER SCHWA | |
497 | +0190; C; 025B; # LATIN CAPITAL LETTER OPEN E | |
498 | +0191; C; 0192; # LATIN CAPITAL LETTER F WITH HOOK | |
499 | +0193; C; 0260; # LATIN CAPITAL LETTER G WITH HOOK | |
500 | +0194; C; 0263; # LATIN CAPITAL LETTER GAMMA | |
501 | +0196; C; 0269; # LATIN CAPITAL LETTER IOTA | |
502 | +0197; C; 0268; # LATIN CAPITAL LETTER I WITH STROKE | |
503 | +0198; C; 0199; # LATIN CAPITAL LETTER K WITH HOOK | |
504 | +019C; C; 026F; # LATIN CAPITAL LETTER TURNED M | |
505 | +019D; C; 0272; # LATIN CAPITAL LETTER N WITH LEFT HOOK | |
506 | +019F; C; 0275; # LATIN CAPITAL LETTER O WITH MIDDLE TILDE | |
507 | +01A0; C; 01A1; # LATIN CAPITAL LETTER O WITH HORN | |
508 | +01A2; C; 01A3; # LATIN CAPITAL LETTER OI | |
509 | +01A4; C; 01A5; # LATIN CAPITAL LETTER P WITH HOOK | |
510 | +01A6; C; 0280; # LATIN LETTER YR | |
511 | +01A7; C; 01A8; # LATIN CAPITAL LETTER TONE TWO | |
512 | +01A9; C; 0283; # LATIN CAPITAL LETTER ESH | |
513 | +01AC; C; 01AD; # LATIN CAPITAL LETTER T WITH HOOK | |
514 | +01AE; C; 0288; # LATIN CAPITAL LETTER T WITH RETROFLEX HOOK | |
515 | +01AF; C; 01B0; # LATIN CAPITAL LETTER U WITH HORN | |
516 | +01B1; C; 028A; # LATIN CAPITAL LETTER UPSILON | |
517 | +01B2; C; 028B; # LATIN CAPITAL LETTER V WITH HOOK | |
518 | +01B3; C; 01B4; # LATIN CAPITAL LETTER Y WITH HOOK | |
519 | +01B5; C; 01B6; # LATIN CAPITAL LETTER Z WITH STROKE | |
520 | +01B7; C; 0292; # LATIN CAPITAL LETTER EZH | |
521 | +01B8; C; 01B9; # LATIN CAPITAL LETTER EZH REVERSED | |
522 | +01BC; C; 01BD; # LATIN CAPITAL LETTER TONE FIVE | |
523 | +01C4; C; 01C6; # LATIN CAPITAL LETTER DZ WITH CARON | |
524 | +01C5; C; 01C6; # LATIN CAPITAL LETTER D WITH SMALL LETTER Z WITH CARON | |
525 | +01C7; C; 01C9; # LATIN CAPITAL LETTER LJ | |
526 | +01C8; C; 01C9; # LATIN CAPITAL LETTER L WITH SMALL LETTER J | |
527 | +01CA; C; 01CC; # LATIN CAPITAL LETTER NJ | |
528 | +01CB; C; 01CC; # LATIN CAPITAL LETTER N WITH SMALL LETTER J | |
529 | +01CD; C; 01CE; # LATIN CAPITAL LETTER A WITH CARON | |
530 | +01CF; C; 01D0; # LATIN CAPITAL LETTER I WITH CARON | |
531 | +01D1; C; 01D2; # LATIN CAPITAL LETTER O WITH CARON | |
532 | +01D3; C; 01D4; # LATIN CAPITAL LETTER U WITH CARON | |
533 | +01D5; C; 01D6; # LATIN CAPITAL LETTER U WITH DIAERESIS AND MACRON | |
534 | +01D7; C; 01D8; # LATIN CAPITAL LETTER U WITH DIAERESIS AND ACUTE | |
535 | +01D9; C; 01DA; # LATIN CAPITAL LETTER U WITH DIAERESIS AND CARON | |
536 | +01DB; C; 01DC; # LATIN CAPITAL LETTER U WITH DIAERESIS AND GRAVE | |
537 | +01DE; C; 01DF; # LATIN CAPITAL LETTER A WITH DIAERESIS AND MACRON | |
538 | +01E0; C; 01E1; # LATIN CAPITAL LETTER A WITH DOT ABOVE AND MACRON | |
539 | +01E2; C; 01E3; # LATIN CAPITAL LETTER AE WITH MACRON | |
540 | +01E4; C; 01E5; # LATIN CAPITAL LETTER G WITH STROKE | |
541 | +01E6; C; 01E7; # LATIN CAPITAL LETTER G WITH CARON | |
542 | +01E8; C; 01E9; # LATIN CAPITAL LETTER K WITH CARON | |
543 | +01EA; C; 01EB; # LATIN CAPITAL LETTER O WITH OGONEK | |
544 | +01EC; C; 01ED; # LATIN CAPITAL LETTER O WITH OGONEK AND MACRON | |
545 | +01EE; C; 01EF; # LATIN CAPITAL LETTER EZH WITH CARON | |
546 | +01F0; F; 006A 030C; # LATIN SMALL LETTER J WITH CARON | |
547 | +01F1; C; 01F3; # LATIN CAPITAL LETTER DZ | |
548 | +01F2; C; 01F3; # LATIN CAPITAL LETTER D WITH SMALL LETTER Z | |
549 | +01F4; C; 01F5; # LATIN CAPITAL LETTER G WITH ACUTE | |
550 | +01F6; C; 0195; # LATIN CAPITAL LETTER HWAIR | |
551 | +01F7; C; 01BF; # LATIN CAPITAL LETTER WYNN | |
552 | +01F8; C; 01F9; # LATIN CAPITAL LETTER N WITH GRAVE | |
553 | +01FA; C; 01FB; # LATIN CAPITAL LETTER A WITH RING ABOVE AND ACUTE | |
554 | +01FC; C; 01FD; # LATIN CAPITAL LETTER AE WITH ACUTE | |
555 | +01FE; C; 01FF; # LATIN CAPITAL LETTER O WITH STROKE AND ACUTE | |
556 | +0200; C; 0201; # LATIN CAPITAL LETTER A WITH DOUBLE GRAVE | |
557 | +0202; C; 0203; # LATIN CAPITAL LETTER A WITH INVERTED BREVE | |
558 | +0204; C; 0205; # LATIN CAPITAL LETTER E WITH DOUBLE GRAVE | |
559 | +0206; C; 0207; # LATIN CAPITAL LETTER E WITH INVERTED BREVE | |
560 | +0208; C; 0209; # LATIN CAPITAL LETTER I WITH DOUBLE GRAVE | |
561 | +020A; C; 020B; # LATIN CAPITAL LETTER I WITH INVERTED BREVE | |
562 | +020C; C; 020D; # LATIN CAPITAL LETTER O WITH DOUBLE GRAVE | |
563 | +020E; C; 020F; # LATIN CAPITAL LETTER O WITH INVERTED BREVE | |
564 | +0210; C; 0211; # LATIN CAPITAL LETTER R WITH DOUBLE GRAVE | |
565 | +0212; C; 0213; # LATIN CAPITAL LETTER R WITH INVERTED BREVE | |
566 | +0214; C; 0215; # LATIN CAPITAL LETTER U WITH DOUBLE GRAVE | |
567 | +0216; C; 0217; # LATIN CAPITAL LETTER U WITH INVERTED BREVE | |
568 | +0218; C; 0219; # LATIN CAPITAL LETTER S WITH COMMA BELOW | |
569 | +021A; C; 021B; # LATIN CAPITAL LETTER T WITH COMMA BELOW | |
570 | +021C; C; 021D; # LATIN CAPITAL LETTER YOGH | |
571 | +021E; C; 021F; # LATIN CAPITAL LETTER H WITH CARON | |
572 | +0220; C; 019E; # LATIN CAPITAL LETTER N WITH LONG RIGHT LEG | |
573 | +0222; C; 0223; # LATIN CAPITAL LETTER OU | |
574 | +0224; C; 0225; # LATIN CAPITAL LETTER Z WITH HOOK | |
575 | +0226; C; 0227; # LATIN CAPITAL LETTER A WITH DOT ABOVE | |
576 | +0228; C; 0229; # LATIN CAPITAL LETTER E WITH CEDILLA | |
577 | +022A; C; 022B; # LATIN CAPITAL LETTER O WITH DIAERESIS AND MACRON | |
578 | +022C; C; 022D; # LATIN CAPITAL LETTER O WITH TILDE AND MACRON | |
579 | +022E; C; 022F; # LATIN CAPITAL LETTER O WITH DOT ABOVE | |
580 | +0230; C; 0231; # LATIN CAPITAL LETTER O WITH DOT ABOVE AND MACRON | |
581 | +0232; C; 0233; # LATIN CAPITAL LETTER Y WITH MACRON | |
582 | +023A; C; 2C65; # LATIN CAPITAL LETTER A WITH STROKE | |
583 | +023B; C; 023C; # LATIN CAPITAL LETTER C WITH STROKE | |
584 | +023D; C; 019A; # LATIN CAPITAL LETTER L WITH BAR | |
585 | +023E; C; 2C66; # LATIN CAPITAL LETTER T WITH DIAGONAL STROKE | |
586 | +0241; C; 0242; # LATIN CAPITAL LETTER GLOTTAL STOP | |
587 | +0243; C; 0180; # LATIN CAPITAL LETTER B WITH STROKE | |
588 | +0244; C; 0289; # LATIN CAPITAL LETTER U BAR | |
589 | +0245; C; 028C; # LATIN CAPITAL LETTER TURNED V | |
590 | +0246; C; 0247; # LATIN CAPITAL LETTER E WITH STROKE | |
591 | +0248; C; 0249; # LATIN CAPITAL LETTER J WITH STROKE | |
592 | +024A; C; 024B; # LATIN CAPITAL LETTER SMALL Q WITH HOOK TAIL | |
593 | +024C; C; 024D; # LATIN CAPITAL LETTER R WITH STROKE | |
594 | +024E; C; 024F; # LATIN CAPITAL LETTER Y WITH STROKE | |
595 | +0345; C; 03B9; # COMBINING GREEK YPOGEGRAMMENI | |
596 | +0386; C; 03AC; # GREEK CAPITAL LETTER ALPHA WITH TONOS | |
597 | +0388; C; 03AD; # GREEK CAPITAL LETTER EPSILON WITH TONOS | |
598 | +0389; C; 03AE; # GREEK CAPITAL LETTER ETA WITH TONOS | |
599 | +038A; C; 03AF; # GREEK CAPITAL LETTER IOTA WITH TONOS | |
600 | +038C; C; 03CC; # GREEK CAPITAL LETTER OMICRON WITH TONOS | |
601 | +038E; C; 03CD; # GREEK CAPITAL LETTER UPSILON WITH TONOS | |
602 | +038F; C; 03CE; # GREEK CAPITAL LETTER OMEGA WITH TONOS | |
603 | +0390; F; 03B9 0308 0301; # GREEK SMALL LETTER IOTA WITH DIALYTIKA AND TONOS | |
604 | +0391; C; 03B1; # GREEK CAPITAL LETTER ALPHA | |
605 | +0392; C; 03B2; # GREEK CAPITAL LETTER BETA | |
606 | +0393; C; 03B3; # GREEK CAPITAL LETTER GAMMA | |
607 | +0394; C; 03B4; # GREEK CAPITAL LETTER DELTA | |
608 | +0395; C; 03B5; # GREEK CAPITAL LETTER EPSILON | |
609 | +0396; C; 03B6; # GREEK CAPITAL LETTER ZETA | |
610 | +0397; C; 03B7; # GREEK CAPITAL LETTER ETA | |
611 | +0398; C; 03B8; # GREEK CAPITAL LETTER THETA | |
612 | +0399; C; 03B9; # GREEK CAPITAL LETTER IOTA | |
613 | +039A; C; 03BA; # GREEK CAPITAL LETTER KAPPA | |
614 | +039B; C; 03BB; # GREEK CAPITAL LETTER LAMDA | |
615 | +039C; C; 03BC; # GREEK CAPITAL LETTER MU | |
616 | +039D; C; 03BD; # GREEK CAPITAL LETTER NU | |
617 | +039E; C; 03BE; # GREEK CAPITAL LETTER XI | |
618 | +039F; C; 03BF; # GREEK CAPITAL LETTER OMICRON | |
619 | +03A0; C; 03C0; # GREEK CAPITAL LETTER PI | |
620 | +03A1; C; 03C1; # GREEK CAPITAL LETTER RHO | |
621 | +03A3; C; 03C3; # GREEK CAPITAL LETTER SIGMA | |
622 | +03A4; C; 03C4; # GREEK CAPITAL LETTER TAU | |
623 | +03A5; C; 03C5; # GREEK CAPITAL LETTER UPSILON | |
624 | +03A6; C; 03C6; # GREEK CAPITAL LETTER PHI | |
625 | +03A7; C; 03C7; # GREEK CAPITAL LETTER CHI | |
626 | +03A8; C; 03C8; # GREEK CAPITAL LETTER PSI | |
627 | +03A9; C; 03C9; # GREEK CAPITAL LETTER OMEGA | |
628 | +03AA; C; 03CA; # GREEK CAPITAL LETTER IOTA WITH DIALYTIKA | |
629 | +03AB; C; 03CB; # GREEK CAPITAL LETTER UPSILON WITH DIALYTIKA | |
630 | +03B0; F; 03C5 0308 0301; # GREEK SMALL LETTER UPSILON WITH DIALYTIKA AND TONOS | |
631 | +03C2; C; 03C3; # GREEK SMALL LETTER FINAL SIGMA | |
632 | +03D0; C; 03B2; # GREEK BETA SYMBOL | |
633 | +03D1; C; 03B8; # GREEK THETA SYMBOL | |
634 | +03D5; C; 03C6; # GREEK PHI SYMBOL | |
635 | +03D6; C; 03C0; # GREEK PI SYMBOL | |
636 | +03D8; C; 03D9; # GREEK LETTER ARCHAIC KOPPA | |
637 | +03DA; C; 03DB; # GREEK LETTER STIGMA | |
638 | +03DC; C; 03DD; # GREEK LETTER DIGAMMA | |
639 | +03DE; C; 03DF; # GREEK LETTER KOPPA | |
640 | +03E0; C; 03E1; # GREEK LETTER SAMPI | |
641 | +03E2; C; 03E3; # COPTIC CAPITAL LETTER SHEI | |
642 | +03E4; C; 03E5; # COPTIC CAPITAL LETTER FEI | |
643 | +03E6; C; 03E7; # COPTIC CAPITAL LETTER KHEI | |
644 | +03E8; C; 03E9; # COPTIC CAPITAL LETTER HORI | |
645 | +03EA; C; 03EB; # COPTIC CAPITAL LETTER GANGIA | |
646 | +03EC; C; 03ED; # COPTIC CAPITAL LETTER SHIMA | |
647 | +03EE; C; 03EF; # COPTIC CAPITAL LETTER DEI | |
648 | +03F0; C; 03BA; # GREEK KAPPA SYMBOL | |
649 | +03F1; C; 03C1; # GREEK RHO SYMBOL | |
650 | +03F4; C; 03B8; # GREEK CAPITAL THETA SYMBOL | |
651 | +03F5; C; 03B5; # GREEK LUNATE EPSILON SYMBOL | |
652 | +03F7; C; 03F8; # GREEK CAPITAL LETTER SHO | |
653 | +03F9; C; 03F2; # GREEK CAPITAL LUNATE SIGMA SYMBOL | |
654 | +03FA; C; 03FB; # GREEK CAPITAL LETTER SAN | |
655 | +03FD; C; 037B; # GREEK CAPITAL REVERSED LUNATE SIGMA SYMBOL | |
656 | +03FE; C; 037C; # GREEK CAPITAL DOTTED LUNATE SIGMA SYMBOL | |
657 | +03FF; C; 037D; # GREEK CAPITAL REVERSED DOTTED LUNATE SIGMA SYMBOL | |
658 | +0400; C; 0450; # CYRILLIC CAPITAL LETTER IE WITH GRAVE | |
659 | +0401; C; 0451; # CYRILLIC CAPITAL LETTER IO | |
660 | +0402; C; 0452; # CYRILLIC CAPITAL LETTER DJE | |
661 | +0403; C; 0453; # CYRILLIC CAPITAL LETTER GJE | |
662 | +0404; C; 0454; # CYRILLIC CAPITAL LETTER UKRAINIAN IE | |
663 | +0405; C; 0455; # CYRILLIC CAPITAL LETTER DZE | |
664 | +0406; C; 0456; # CYRILLIC CAPITAL LETTER BYELORUSSIAN-UKRAINIAN I | |
665 | +0407; C; 0457; # CYRILLIC CAPITAL LETTER YI | |
666 | +0408; C; 0458; # CYRILLIC CAPITAL LETTER JE | |
667 | +0409; C; 0459; # CYRILLIC CAPITAL LETTER LJE | |
668 | +040A; C; 045A; # CYRILLIC CAPITAL LETTER NJE | |
669 | +040B; C; 045B; # CYRILLIC CAPITAL LETTER TSHE | |
670 | +040C; C; 045C; # CYRILLIC CAPITAL LETTER KJE | |
671 | +040D; C; 045D; # CYRILLIC CAPITAL LETTER I WITH GRAVE | |
672 | +040E; C; 045E; # CYRILLIC CAPITAL LETTER SHORT U | |
673 | +040F; C; 045F; # CYRILLIC CAPITAL LETTER DZHE | |
674 | +0410; C; 0430; # CYRILLIC CAPITAL LETTER A | |
675 | +0411; C; 0431; # CYRILLIC CAPITAL LETTER BE | |
676 | +0412; C; 0432; # CYRILLIC CAPITAL LETTER VE | |
677 | +0413; C; 0433; # CYRILLIC CAPITAL LETTER GHE | |
678 | +0414; C; 0434; # CYRILLIC CAPITAL LETTER DE | |
679 | +0415; C; 0435; # CYRILLIC CAPITAL LETTER IE | |
680 | +0416; C; 0436; # CYRILLIC CAPITAL LETTER ZHE | |
681 | +0417; C; 0437; # CYRILLIC CAPITAL LETTER ZE | |
682 | +0418; C; 0438; # CYRILLIC CAPITAL LETTER I | |
683 | +0419; C; 0439; # CYRILLIC CAPITAL LETTER SHORT I | |
684 | +041A; C; 043A; # CYRILLIC CAPITAL LETTER KA | |
685 | +041B; C; 043B; # CYRILLIC CAPITAL LETTER EL | |
686 | +041C; C; 043C; # CYRILLIC CAPITAL LETTER EM | |
687 | +041D; C; 043D; # CYRILLIC CAPITAL LETTER EN | |
688 | +041E; C; 043E; # CYRILLIC CAPITAL LETTER O | |
689 | +041F; C; 043F; # CYRILLIC CAPITAL LETTER PE | |
690 | +0420; C; 0440; # CYRILLIC CAPITAL LETTER ER | |
691 | +0421; C; 0441; # CYRILLIC CAPITAL LETTER ES | |
692 | +0422; C; 0442; # CYRILLIC CAPITAL LETTER TE | |
693 | +0423; C; 0443; # CYRILLIC CAPITAL LETTER U | |
694 | +0424; C; 0444; # CYRILLIC CAPITAL LETTER EF | |
695 | +0425; C; 0445; # CYRILLIC CAPITAL LETTER HA | |
696 | +0426; C; 0446; # CYRILLIC CAPITAL LETTER TSE | |
697 | +0427; C; 0447; # CYRILLIC CAPITAL LETTER CHE | |
698 | +0428; C; 0448; # CYRILLIC CAPITAL LETTER SHA | |
699 | +0429; C; 0449; # CYRILLIC CAPITAL LETTER SHCHA | |
700 | +042A; C; 044A; # CYRILLIC CAPITAL LETTER HARD SIGN | |
701 | +042B; C; 044B; # CYRILLIC CAPITAL LETTER YERU | |
702 | +042C; C; 044C; # CYRILLIC CAPITAL LETTER SOFT SIGN | |
703 | +042D; C; 044D; # CYRILLIC CAPITAL LETTER E | |
704 | +042E; C; 044E; # CYRILLIC CAPITAL LETTER YU | |
705 | +042F; C; 044F; # CYRILLIC CAPITAL LETTER YA | |
706 | +0460; C; 0461; # CYRILLIC CAPITAL LETTER OMEGA | |
707 | +0462; C; 0463; # CYRILLIC CAPITAL LETTER YAT | |
708 | +0464; C; 0465; # CYRILLIC CAPITAL LETTER IOTIFIED E | |
709 | +0466; C; 0467; # CYRILLIC CAPITAL LETTER LITTLE YUS | |
710 | +0468; C; 0469; # CYRILLIC CAPITAL LETTER IOTIFIED LITTLE YUS | |
711 | +046A; C; 046B; # CYRILLIC CAPITAL LETTER BIG YUS | |
712 | +046C; C; 046D; # CYRILLIC CAPITAL LETTER IOTIFIED BIG YUS | |
713 | +046E; C; 046F; # CYRILLIC CAPITAL LETTER KSI | |
714 | +0470; C; 0471; # CYRILLIC CAPITAL LETTER PSI | |
715 | +0472; C; 0473; # CYRILLIC CAPITAL LETTER FITA | |
716 | +0474; C; 0475; # CYRILLIC CAPITAL LETTER IZHITSA | |
717 | +0476; C; 0477; # CYRILLIC CAPITAL LETTER IZHITSA WITH DOUBLE GRAVE ACCENT | |
718 | +0478; C; 0479; # CYRILLIC CAPITAL LETTER UK | |
719 | +047A; C; 047B; # CYRILLIC CAPITAL LETTER ROUND OMEGA | |
720 | +047C; C; 047D; # CYRILLIC CAPITAL LETTER OMEGA WITH TITLO | |
721 | +047E; C; 047F; # CYRILLIC CAPITAL LETTER OT | |
722 | +0480; C; 0481; # CYRILLIC CAPITAL LETTER KOPPA | |
723 | +048A; C; 048B; # CYRILLIC CAPITAL LETTER SHORT I WITH TAIL | |
724 | +048C; C; 048D; # CYRILLIC CAPITAL LETTER SEMISOFT SIGN | |
725 | +048E; C; 048F; # CYRILLIC CAPITAL LETTER ER WITH TICK | |
726 | +0490; C; 0491; # CYRILLIC CAPITAL LETTER GHE WITH UPTURN | |
727 | +0492; C; 0493; # CYRILLIC CAPITAL LETTER GHE WITH STROKE | |
728 | +0494; C; 0495; # CYRILLIC CAPITAL LETTER GHE WITH MIDDLE HOOK | |
729 | +0496; C; 0497; # CYRILLIC CAPITAL LETTER ZHE WITH DESCENDER | |
730 | +0498; C; 0499; # CYRILLIC CAPITAL LETTER ZE WITH DESCENDER | |
731 | +049A; C; 049B; # CYRILLIC CAPITAL LETTER KA WITH DESCENDER | |
732 | +049C; C; 049D; # CYRILLIC CAPITAL LETTER KA WITH VERTICAL STROKE | |
733 | +049E; C; 049F; # CYRILLIC CAPITAL LETTER KA WITH STROKE | |
734 | +04A0; C; 04A1; # CYRILLIC CAPITAL LETTER BASHKIR KA | |
735 | +04A2; C; 04A3; # CYRILLIC CAPITAL LETTER EN WITH DESCENDER | |
736 | +04A4; C; 04A5; # CYRILLIC CAPITAL LIGATURE EN GHE | |
737 | +04A6; C; 04A7; # CYRILLIC CAPITAL LETTER PE WITH MIDDLE HOOK | |
738 | +04A8; C; 04A9; # CYRILLIC CAPITAL LETTER ABKHASIAN HA | |
739 | +04AA; C; 04AB; # CYRILLIC CAPITAL LETTER ES WITH DESCENDER | |
740 | +04AC; C; 04AD; # CYRILLIC CAPITAL LETTER TE WITH DESCENDER | |
741 | +04AE; C; 04AF; # CYRILLIC CAPITAL LETTER STRAIGHT U | |
742 | +04B0; C; 04B1; # CYRILLIC CAPITAL LETTER STRAIGHT U WITH STROKE | |
743 | +04B2; C; 04B3; # CYRILLIC CAPITAL LETTER HA WITH DESCENDER | |
744 | +04B4; C; 04B5; # CYRILLIC CAPITAL LIGATURE TE TSE | |
745 | +04B6; C; 04B7; # CYRILLIC CAPITAL LETTER CHE WITH DESCENDER | |
746 | +04B8; C; 04B9; # CYRILLIC CAPITAL LETTER CHE WITH VERTICAL STROKE | |
747 | +04BA; C; 04BB; # CYRILLIC CAPITAL LETTER SHHA | |
748 | +04BC; C; 04BD; # CYRILLIC CAPITAL LETTER ABKHASIAN CHE | |
749 | +04BE; C; 04BF; # CYRILLIC CAPITAL LETTER ABKHASIAN CHE WITH DESCENDER | |
750 | +04C0; C; 04CF; # CYRILLIC LETTER PALOCHKA | |
751 | +04C1; C; 04C2; # CYRILLIC CAPITAL LETTER ZHE WITH BREVE | |
752 | +04C3; C; 04C4; # CYRILLIC CAPITAL LETTER KA WITH HOOK | |
753 | +04C5; C; 04C6; # CYRILLIC CAPITAL LETTER EL WITH TAIL | |
754 | +04C7; C; 04C8; # CYRILLIC CAPITAL LETTER EN WITH HOOK | |
755 | +04C9; C; 04CA; # CYRILLIC CAPITAL LETTER EN WITH TAIL | |
756 | +04CB; C; 04CC; # CYRILLIC CAPITAL LETTER KHAKASSIAN CHE | |
757 | +04CD; C; 04CE; # CYRILLIC CAPITAL LETTER EM WITH TAIL | |
758 | +04D0; C; 04D1; # CYRILLIC CAPITAL LETTER A WITH BREVE | |
759 | +04D2; C; 04D3; # CYRILLIC CAPITAL LETTER A WITH DIAERESIS | |
760 | +04D4; C; 04D5; # CYRILLIC CAPITAL LIGATURE A IE | |
761 | +04D6; C; 04D7; # CYRILLIC CAPITAL LETTER IE WITH BREVE | |
762 | +04D8; C; 04D9; # CYRILLIC CAPITAL LETTER SCHWA | |
763 | +04DA; C; 04DB; # CYRILLIC CAPITAL LETTER SCHWA WITH DIAERESIS | |
764 | +04DC; C; 04DD; # CYRILLIC CAPITAL LETTER ZHE WITH DIAERESIS | |
765 | +04DE; C; 04DF; # CYRILLIC CAPITAL LETTER ZE WITH DIAERESIS | |
766 | +04E0; C; 04E1; # CYRILLIC CAPITAL LETTER ABKHASIAN DZE | |
767 | +04E2; C; 04E3; # CYRILLIC CAPITAL LETTER I WITH MACRON | |
768 | +04E4; C; 04E5; # CYRILLIC CAPITAL LETTER I WITH DIAERESIS | |
769 | +04E6; C; 04E7; # CYRILLIC CAPITAL LETTER O WITH DIAERESIS | |
770 | +04E8; C; 04E9; # CYRILLIC CAPITAL LETTER BARRED O | |
771 | +04EA; C; 04EB; # CYRILLIC CAPITAL LETTER BARRED O WITH DIAERESIS | |
772 | +04EC; C; 04ED; # CYRILLIC CAPITAL LETTER E WITH DIAERESIS | |
773 | +04EE; C; 04EF; # CYRILLIC CAPITAL LETTER U WITH MACRON | |
774 | +04F0; C; 04F1; # CYRILLIC CAPITAL LETTER U WITH DIAERESIS | |
775 | +04F2; C; 04F3; # CYRILLIC CAPITAL LETTER U WITH DOUBLE ACUTE | |
776 | +04F4; C; 04F5; # CYRILLIC CAPITAL LETTER CHE WITH DIAERESIS | |
777 | +04F6; C; 04F7; # CYRILLIC CAPITAL LETTER GHE WITH DESCENDER | |
778 | +04F8; C; 04F9; # CYRILLIC CAPITAL LETTER YERU WITH DIAERESIS | |
779 | +04FA; C; 04FB; # CYRILLIC CAPITAL LETTER GHE WITH STROKE AND HOOK | |
780 | +04FC; C; 04FD; # CYRILLIC CAPITAL LETTER HA WITH HOOK | |
781 | +04FE; C; 04FF; # CYRILLIC CAPITAL LETTER HA WITH STROKE | |
782 | +0500; C; 0501; # CYRILLIC CAPITAL LETTER KOMI DE | |
783 | +0502; C; 0503; # CYRILLIC CAPITAL LETTER KOMI DJE | |
784 | +0504; C; 0505; # CYRILLIC CAPITAL LETTER KOMI ZJE | |
785 | +0506; C; 0507; # CYRILLIC CAPITAL LETTER KOMI DZJE | |
786 | +0508; C; 0509; # CYRILLIC CAPITAL LETTER KOMI LJE | |
787 | +050A; C; 050B; # CYRILLIC CAPITAL LETTER KOMI NJE | |
788 | +050C; C; 050D; # CYRILLIC CAPITAL LETTER KOMI SJE | |
789 | +050E; C; 050F; # CYRILLIC CAPITAL LETTER KOMI TJE | |
790 | +0510; C; 0511; # CYRILLIC CAPITAL LETTER REVERSED ZE | |
791 | +0512; C; 0513; # CYRILLIC CAPITAL LETTER EL WITH HOOK | |
792 | +0531; C; 0561; # ARMENIAN CAPITAL LETTER AYB | |
793 | +0532; C; 0562; # ARMENIAN CAPITAL LETTER BEN | |
794 | +0533; C; 0563; # ARMENIAN CAPITAL LETTER GIM | |
795 | +0534; C; 0564; # ARMENIAN CAPITAL LETTER DA | |
796 | +0535; C; 0565; # ARMENIAN CAPITAL LETTER ECH | |
797 | +0536; C; 0566; # ARMENIAN CAPITAL LETTER ZA | |
798 | +0537; C; 0567; # ARMENIAN CAPITAL LETTER EH | |
799 | +0538; C; 0568; # ARMENIAN CAPITAL LETTER ET | |
800 | +0539; C; 0569; # ARMENIAN CAPITAL LETTER TO | |
801 | +053A; C; 056A; # ARMENIAN CAPITAL LETTER ZHE | |
802 | +053B; C; 056B; # ARMENIAN CAPITAL LETTER INI | |
803 | +053C; C; 056C; # ARMENIAN CAPITAL LETTER LIWN | |
804 | +053D; C; 056D; # ARMENIAN CAPITAL LETTER XEH | |
805 | +053E; C; 056E; # ARMENIAN CAPITAL LETTER CA | |
806 | +053F; C; 056F; # ARMENIAN CAPITAL LETTER KEN | |
807 | +0540; C; 0570; # ARMENIAN CAPITAL LETTER HO | |
808 | +0541; C; 0571; # ARMENIAN CAPITAL LETTER JA | |
809 | +0542; C; 0572; # ARMENIAN CAPITAL LETTER GHAD | |
810 | +0543; C; 0573; # ARMENIAN CAPITAL LETTER CHEH | |
811 | +0544; C; 0574; # ARMENIAN CAPITAL LETTER MEN | |
812 | +0545; C; 0575; # ARMENIAN CAPITAL LETTER YI | |
813 | +0546; C; 0576; # ARMENIAN CAPITAL LETTER NOW | |
814 | +0547; C; 0577; # ARMENIAN CAPITAL LETTER SHA | |
815 | +0548; C; 0578; # ARMENIAN CAPITAL LETTER VO | |
816 | +0549; C; 0579; # ARMENIAN CAPITAL LETTER CHA | |
817 | +054A; C; 057A; # ARMENIAN CAPITAL LETTER PEH | |
818 | +054B; C; 057B; # ARMENIAN CAPITAL LETTER JHEH | |
819 | +054C; C; 057C; # ARMENIAN CAPITAL LETTER RA | |
820 | +054D; C; 057D; # ARMENIAN CAPITAL LETTER SEH | |
821 | +054E; C; 057E; # ARMENIAN CAPITAL LETTER VEW | |
822 | +054F; C; 057F; # ARMENIAN CAPITAL LETTER TIWN | |
823 | +0550; C; 0580; # ARMENIAN CAPITAL LETTER REH | |
824 | +0551; C; 0581; # ARMENIAN CAPITAL LETTER CO | |
825 | +0552; C; 0582; # ARMENIAN CAPITAL LETTER YIWN | |
826 | +0553; C; 0583; # ARMENIAN CAPITAL LETTER PIWR | |
827 | +0554; C; 0584; # ARMENIAN CAPITAL LETTER KEH | |
828 | +0555; C; 0585; # ARMENIAN CAPITAL LETTER OH | |
829 | +0556; C; 0586; # ARMENIAN CAPITAL LETTER FEH | |
830 | +0587; F; 0565 0582; # ARMENIAN SMALL LIGATURE ECH YIWN | |
831 | +10A0; C; 2D00; # GEORGIAN CAPITAL LETTER AN | |
832 | +10A1; C; 2D01; # GEORGIAN CAPITAL LETTER BAN | |
833 | +10A2; C; 2D02; # GEORGIAN CAPITAL LETTER GAN | |
834 | +10A3; C; 2D03; # GEORGIAN CAPITAL LETTER DON | |
835 | +10A4; C; 2D04; # GEORGIAN CAPITAL LETTER EN | |
836 | +10A5; C; 2D05; # GEORGIAN CAPITAL LETTER VIN | |
837 | +10A6; C; 2D06; # GEORGIAN CAPITAL LETTER ZEN | |
838 | +10A7; C; 2D07; # GEORGIAN CAPITAL LETTER TAN | |
839 | +10A8; C; 2D08; # GEORGIAN CAPITAL LETTER IN | |
840 | +10A9; C; 2D09; # GEORGIAN CAPITAL LETTER KAN | |
841 | +10AA; C; 2D0A; # GEORGIAN CAPITAL LETTER LAS | |
842 | +10AB; C; 2D0B; # GEORGIAN CAPITAL LETTER MAN | |
843 | +10AC; C; 2D0C; # GEORGIAN CAPITAL LETTER NAR | |
844 | +10AD; C; 2D0D; # GEORGIAN CAPITAL LETTER ON | |
845 | +10AE; C; 2D0E; # GEORGIAN CAPITAL LETTER PAR | |
846 | +10AF; C; 2D0F; # GEORGIAN CAPITAL LETTER ZHAR | |
847 | +10B0; C; 2D10; # GEORGIAN CAPITAL LETTER RAE | |
848 | +10B1; C; 2D11; # GEORGIAN CAPITAL LETTER SAN | |
849 | +10B2; C; 2D12; # GEORGIAN CAPITAL LETTER TAR | |
850 | +10B3; C; 2D13; # GEORGIAN CAPITAL LETTER UN | |
851 | +10B4; C; 2D14; # GEORGIAN CAPITAL LETTER PHAR | |
852 | +10B5; C; 2D15; # GEORGIAN CAPITAL LETTER KHAR | |
853 | +10B6; C; 2D16; # GEORGIAN CAPITAL LETTER GHAN | |
854 | +10B7; C; 2D17; # GEORGIAN CAPITAL LETTER QAR | |
855 | +10B8; C; 2D18; # GEORGIAN CAPITAL LETTER SHIN | |
856 | +10B9; C; 2D19; # GEORGIAN CAPITAL LETTER CHIN | |
857 | +10BA; C; 2D1A; # GEORGIAN CAPITAL LETTER CAN | |
858 | +10BB; C; 2D1B; # GEORGIAN CAPITAL LETTER JIL | |
859 | +10BC; C; 2D1C; # GEORGIAN CAPITAL LETTER CIL | |
860 | +10BD; C; 2D1D; # GEORGIAN CAPITAL LETTER CHAR | |
861 | +10BE; C; 2D1E; # GEORGIAN CAPITAL LETTER XAN | |
862 | +10BF; C; 2D1F; # GEORGIAN CAPITAL LETTER JHAN | |
863 | +10C0; C; 2D20; # GEORGIAN CAPITAL LETTER HAE | |
864 | +10C1; C; 2D21; # GEORGIAN CAPITAL LETTER HE | |
865 | +10C2; C; 2D22; # GEORGIAN CAPITAL LETTER HIE | |
866 | +10C3; C; 2D23; # GEORGIAN CAPITAL LETTER WE | |
867 | +10C4; C; 2D24; # GEORGIAN CAPITAL LETTER HAR | |
868 | +10C5; C; 2D25; # GEORGIAN CAPITAL LETTER HOE | |
869 | +1E00; C; 1E01; # LATIN CAPITAL LETTER A WITH RING BELOW | |
870 | +1E02; C; 1E03; # LATIN CAPITAL LETTER B WITH DOT ABOVE | |
871 | +1E04; C; 1E05; # LATIN CAPITAL LETTER B WITH DOT BELOW | |
872 | +1E06; C; 1E07; # LATIN CAPITAL LETTER B WITH LINE BELOW | |
873 | +1E08; C; 1E09; # LATIN CAPITAL LETTER C WITH CEDILLA AND ACUTE | |
874 | +1E0A; C; 1E0B; # LATIN CAPITAL LETTER D WITH DOT ABOVE | |
875 | +1E0C; C; 1E0D; # LATIN CAPITAL LETTER D WITH DOT BELOW | |
876 | +1E0E; C; 1E0F; # LATIN CAPITAL LETTER D WITH LINE BELOW | |
877 | +1E10; C; 1E11; # LATIN CAPITAL LETTER D WITH CEDILLA | |
878 | +1E12; C; 1E13; # LATIN CAPITAL LETTER D WITH CIRCUMFLEX BELOW | |
879 | +1E14; C; 1E15; # LATIN CAPITAL LETTER E WITH MACRON AND GRAVE | |
880 | +1E16; C; 1E17; # LATIN CAPITAL LETTER E WITH MACRON AND ACUTE | |
881 | +1E18; C; 1E19; # LATIN CAPITAL LETTER E WITH CIRCUMFLEX BELOW | |
882 | +1E1A; C; 1E1B; # LATIN CAPITAL LETTER E WITH TILDE BELOW | |
883 | +1E1C; C; 1E1D; # LATIN CAPITAL LETTER E WITH CEDILLA AND BREVE | |
884 | +1E1E; C; 1E1F; # LATIN CAPITAL LETTER F WITH DOT ABOVE | |
885 | +1E20; C; 1E21; # LATIN CAPITAL LETTER G WITH MACRON | |
886 | +1E22; C; 1E23; # LATIN CAPITAL LETTER H WITH DOT ABOVE | |
887 | +1E24; C; 1E25; # LATIN CAPITAL LETTER H WITH DOT BELOW | |
888 | +1E26; C; 1E27; # LATIN CAPITAL LETTER H WITH DIAERESIS | |
889 | +1E28; C; 1E29; # LATIN CAPITAL LETTER H WITH CEDILLA | |
890 | +1E2A; C; 1E2B; # LATIN CAPITAL LETTER H WITH BREVE BELOW | |
891 | +1E2C; C; 1E2D; # LATIN CAPITAL LETTER I WITH TILDE BELOW | |
892 | +1E2E; C; 1E2F; # LATIN CAPITAL LETTER I WITH DIAERESIS AND ACUTE | |
893 | +1E30; C; 1E31; # LATIN CAPITAL LETTER K WITH ACUTE | |
894 | +1E32; C; 1E33; # LATIN CAPITAL LETTER K WITH DOT BELOW | |
895 | +1E34; C; 1E35; # LATIN CAPITAL LETTER K WITH LINE BELOW | |
896 | +1E36; C; 1E37; # LATIN CAPITAL LETTER L WITH DOT BELOW | |
897 | +1E38; C; 1E39; # LATIN CAPITAL LETTER L WITH DOT BELOW AND MACRON | |
898 | +1E3A; C; 1E3B; # LATIN CAPITAL LETTER L WITH LINE BELOW | |
899 | +1E3C; C; 1E3D; # LATIN CAPITAL LETTER L WITH CIRCUMFLEX BELOW | |
900 | +1E3E; C; 1E3F; # LATIN CAPITAL LETTER M WITH ACUTE | |
901 | +1E40; C; 1E41; # LATIN CAPITAL LETTER M WITH DOT ABOVE | |
902 | +1E42; C; 1E43; # LATIN CAPITAL LETTER M WITH DOT BELOW | |
903 | +1E44; C; 1E45; # LATIN CAPITAL LETTER N WITH DOT ABOVE | |
904 | +1E46; C; 1E47; # LATIN CAPITAL LETTER N WITH DOT BELOW | |
905 | +1E48; C; 1E49; # LATIN CAPITAL LETTER N WITH LINE BELOW | |
906 | +1E4A; C; 1E4B; # LATIN CAPITAL LETTER N WITH CIRCUMFLEX BELOW | |
907 | +1E4C; C; 1E4D; # LATIN CAPITAL LETTER O WITH TILDE AND ACUTE | |
908 | +1E4E; C; 1E4F; # LATIN CAPITAL LETTER O WITH TILDE AND DIAERESIS | |
909 | +1E50; C; 1E51; # LATIN CAPITAL LETTER O WITH MACRON AND GRAVE | |
910 | +1E52; C; 1E53; # LATIN CAPITAL LETTER O WITH MACRON AND ACUTE | |
911 | +1E54; C; 1E55; # LATIN CAPITAL LETTER P WITH ACUTE | |
912 | +1E56; C; 1E57; # LATIN CAPITAL LETTER P WITH DOT ABOVE | |
913 | +1E58; C; 1E59; # LATIN CAPITAL LETTER R WITH DOT ABOVE | |
914 | +1E5A; C; 1E5B; # LATIN CAPITAL LETTER R WITH DOT BELOW | |
915 | +1E5C; C; 1E5D; # LATIN CAPITAL LETTER R WITH DOT BELOW AND MACRON | |
916 | +1E5E; C; 1E5F; # LATIN CAPITAL LETTER R WITH LINE BELOW | |
917 | +1E60; C; 1E61; # LATIN CAPITAL LETTER S WITH DOT ABOVE | |
918 | +1E62; C; 1E63; # LATIN CAPITAL LETTER S WITH DOT BELOW | |
919 | +1E64; C; 1E65; # LATIN CAPITAL LETTER S WITH ACUTE AND DOT ABOVE | |
920 | +1E66; C; 1E67; # LATIN CAPITAL LETTER S WITH CARON AND DOT ABOVE | |
921 | +1E68; C; 1E69; # LATIN CAPITAL LETTER S WITH DOT BELOW AND DOT ABOVE | |
922 | +1E6A; C; 1E6B; # LATIN CAPITAL LETTER T WITH DOT ABOVE | |
923 | +1E6C; C; 1E6D; # LATIN CAPITAL LETTER T WITH DOT BELOW | |
924 | +1E6E; C; 1E6F; # LATIN CAPITAL LETTER T WITH LINE BELOW | |
925 | +1E70; C; 1E71; # LATIN CAPITAL LETTER T WITH CIRCUMFLEX BELOW | |
926 | +1E72; C; 1E73; # LATIN CAPITAL LETTER U WITH DIAERESIS BELOW | |
927 | +1E74; C; 1E75; # LATIN CAPITAL LETTER U WITH TILDE BELOW | |
928 | +1E76; C; 1E77; # LATIN CAPITAL LETTER U WITH CIRCUMFLEX BELOW | |
929 | +1E78; C; 1E79; # LATIN CAPITAL LETTER U WITH TILDE AND ACUTE | |
930 | +1E7A; C; 1E7B; # LATIN CAPITAL LETTER U WITH MACRON AND DIAERESIS | |
931 | +1E7C; C; 1E7D; # LATIN CAPITAL LETTER V WITH TILDE | |
932 | +1E7E; C; 1E7F; # LATIN CAPITAL LETTER V WITH DOT BELOW | |
933 | +1E80; C; 1E81; # LATIN CAPITAL LETTER W WITH GRAVE | |
934 | +1E82; C; 1E83; # LATIN CAPITAL LETTER W WITH ACUTE | |
935 | +1E84; C; 1E85; # LATIN CAPITAL LETTER W WITH DIAERESIS | |
936 | +1E86; C; 1E87; # LATIN CAPITAL LETTER W WITH DOT ABOVE | |
937 | +1E88; C; 1E89; # LATIN CAPITAL LETTER W WITH DOT BELOW | |
938 | +1E8A; C; 1E8B; # LATIN CAPITAL LETTER X WITH DOT ABOVE | |
939 | +1E8C; C; 1E8D; # LATIN CAPITAL LETTER X WITH DIAERESIS | |
940 | +1E8E; C; 1E8F; # LATIN CAPITAL LETTER Y WITH DOT ABOVE | |
941 | +1E90; C; 1E91; # LATIN CAPITAL LETTER Z WITH CIRCUMFLEX | |
942 | +1E92; C; 1E93; # LATIN CAPITAL LETTER Z WITH DOT BELOW | |
943 | +1E94; C; 1E95; # LATIN CAPITAL LETTER Z WITH LINE BELOW | |
944 | +1E96; F; 0068 0331; # LATIN SMALL LETTER H WITH LINE BELOW | |
945 | +1E97; F; 0074 0308; # LATIN SMALL LETTER T WITH DIAERESIS | |
946 | +1E98; F; 0077 030A; # LATIN SMALL LETTER W WITH RING ABOVE | |
947 | +1E99; F; 0079 030A; # LATIN SMALL LETTER Y WITH RING ABOVE | |
948 | +1E9A; F; 0061 02BE; # LATIN SMALL LETTER A WITH RIGHT HALF RING | |
949 | +1E9B; C; 1E61; # LATIN SMALL LETTER LONG S WITH DOT ABOVE | |
950 | +1EA0; C; 1EA1; # LATIN CAPITAL LETTER A WITH DOT BELOW | |
951 | +1EA2; C; 1EA3; # LATIN CAPITAL LETTER A WITH HOOK ABOVE | |
952 | +1EA4; C; 1EA5; # LATIN CAPITAL LETTER A WITH CIRCUMFLEX AND ACUTE | |
953 | +1EA6; C; 1EA7; # LATIN CAPITAL LETTER A WITH CIRCUMFLEX AND GRAVE | |
954 | +1EA8; C; 1EA9; # LATIN CAPITAL LETTER A WITH CIRCUMFLEX AND HOOK ABOVE | |
955 | +1EAA; C; 1EAB; # LATIN CAPITAL LETTER A WITH CIRCUMFLEX AND TILDE | |
956 | +1EAC; C; 1EAD; # LATIN CAPITAL LETTER A WITH CIRCUMFLEX AND DOT BELOW | |
957 | +1EAE; C; 1EAF; # LATIN CAPITAL LETTER A WITH BREVE AND ACUTE | |
958 | +1EB0; C; 1EB1; # LATIN CAPITAL LETTER A WITH BREVE AND GRAVE | |
959 | +1EB2; C; 1EB3; # LATIN CAPITAL LETTER A WITH BREVE AND HOOK ABOVE | |
960 | +1EB4; C; 1EB5; # LATIN CAPITAL LETTER A WITH BREVE AND TILDE | |
961 | +1EB6; C; 1EB7; # LATIN CAPITAL LETTER A WITH BREVE AND DOT BELOW | |
962 | +1EB8; C; 1EB9; # LATIN CAPITAL LETTER E WITH DOT BELOW | |
963 | +1EBA; C; 1EBB; # LATIN CAPITAL LETTER E WITH HOOK ABOVE | |
964 | +1EBC; C; 1EBD; # LATIN CAPITAL LETTER E WITH TILDE | |
965 | +1EBE; C; 1EBF; # LATIN CAPITAL LETTER E WITH CIRCUMFLEX AND ACUTE | |
966 | +1EC0; C; 1EC1; # LATIN CAPITAL LETTER E WITH CIRCUMFLEX AND GRAVE | |
967 | +1EC2; C; 1EC3; # LATIN CAPITAL LETTER E WITH CIRCUMFLEX AND HOOK ABOVE | |
968 | +1EC4; C; 1EC5; # LATIN CAPITAL LETTER E WITH CIRCUMFLEX AND TILDE | |
969 | +1EC6; C; 1EC7; # LATIN CAPITAL LETTER E WITH CIRCUMFLEX AND DOT BELOW | |
970 | +1EC8; C; 1EC9; # LATIN CAPITAL LETTER I WITH HOOK ABOVE | |
971 | +1ECA; C; 1ECB; # LATIN CAPITAL LETTER I WITH DOT BELOW | |
972 | +1ECC; C; 1ECD; # LATIN CAPITAL LETTER O WITH DOT BELOW | |
973 | +1ECE; C; 1ECF; # LATIN CAPITAL LETTER O WITH HOOK ABOVE | |
974 | +1ED0; C; 1ED1; # LATIN CAPITAL LETTER O WITH CIRCUMFLEX AND ACUTE | |
975 | +1ED2; C; 1ED3; # LATIN CAPITAL LETTER O WITH CIRCUMFLEX AND GRAVE | |
976 | +1ED4; C; 1ED5; # LATIN CAPITAL LETTER O WITH CIRCUMFLEX AND HOOK ABOVE | |
977 | +1ED6; C; 1ED7; # LATIN CAPITAL LETTER O WITH CIRCUMFLEX AND TILDE | |
978 | +1ED8; C; 1ED9; # LATIN CAPITAL LETTER O WITH CIRCUMFLEX AND DOT BELOW | |
979 | +1EDA; C; 1EDB; # LATIN CAPITAL LETTER O WITH HORN AND ACUTE | |
980 | +1EDC; C; 1EDD; # LATIN CAPITAL LETTER O WITH HORN AND GRAVE | |
981 | +1EDE; C; 1EDF; # LATIN CAPITAL LETTER O WITH HORN AND HOOK ABOVE | |
982 | +1EE0; C; 1EE1; # LATIN CAPITAL LETTER O WITH HORN AND TILDE | |
983 | +1EE2; C; 1EE3; # LATIN CAPITAL LETTER O WITH HORN AND DOT BELOW | |
984 | +1EE4; C; 1EE5; # LATIN CAPITAL LETTER U WITH DOT BELOW | |
985 | +1EE6; C; 1EE7; # LATIN CAPITAL LETTER U WITH HOOK ABOVE | |
986 | +1EE8; C; 1EE9; # LATIN CAPITAL LETTER U WITH HORN AND ACUTE | |
987 | +1EEA; C; 1EEB; # LATIN CAPITAL LETTER U WITH HORN AND GRAVE | |
988 | +1EEC; C; 1EED; # LATIN CAPITAL LETTER U WITH HORN AND HOOK ABOVE | |
989 | +1EEE; C; 1EEF; # LATIN CAPITAL LETTER U WITH HORN AND TILDE | |
990 | +1EF0; C; 1EF1; # LATIN CAPITAL LETTER U WITH HORN AND DOT BELOW | |
991 | +1EF2; C; 1EF3; # LATIN CAPITAL LETTER Y WITH GRAVE | |
992 | +1EF4; C; 1EF5; # LATIN CAPITAL LETTER Y WITH DOT BELOW | |
993 | +1EF6; C; 1EF7; # LATIN CAPITAL LETTER Y WITH HOOK ABOVE | |
994 | +1EF8; C; 1EF9; # LATIN CAPITAL LETTER Y WITH TILDE | |
995 | +1F08; C; 1F00; # GREEK CAPITAL LETTER ALPHA WITH PSILI | |
996 | +1F09; C; 1F01; # GREEK CAPITAL LETTER ALPHA WITH DASIA | |
997 | +1F0A; C; 1F02; # GREEK CAPITAL LETTER ALPHA WITH PSILI AND VARIA | |
998 | +1F0B; C; 1F03; # GREEK CAPITAL LETTER ALPHA WITH DASIA AND VARIA | |
999 | +1F0C; C; 1F04; # GREEK CAPITAL LETTER ALPHA WITH PSILI AND OXIA | |
1000 | +1F0D; C; 1F05; # GREEK CAPITAL LETTER ALPHA WITH DASIA AND OXIA | |
1001 | +1F0E; C; 1F06; # GREEK CAPITAL LETTER ALPHA WITH PSILI AND PERISPOMENI | |
1002 | +1F0F; C; 1F07; # GREEK CAPITAL LETTER ALPHA WITH DASIA AND PERISPOMENI | |
1003 | +1F18; C; 1F10; # GREEK CAPITAL LETTER EPSILON WITH PSILI | |
1004 | +1F19; C; 1F11; # GREEK CAPITAL LETTER EPSILON WITH DASIA | |
1005 | +1F1A; C; 1F12; # GREEK CAPITAL LETTER EPSILON WITH PSILI AND VARIA | |
1006 | +1F1B; C; 1F13; # GREEK CAPITAL LETTER EPSILON WITH DASIA AND VARIA | |
1007 | +1F1C; C; 1F14; # GREEK CAPITAL LETTER EPSILON WITH PSILI AND OXIA | |
1008 | +1F1D; C; 1F15; # GREEK CAPITAL LETTER EPSILON WITH DASIA AND OXIA | |
1009 | +1F28; C; 1F20; # GREEK CAPITAL LETTER ETA WITH PSILI | |
1010 | +1F29; C; 1F21; # GREEK CAPITAL LETTER ETA WITH DASIA | |
1011 | +1F2A; C; 1F22; # GREEK CAPITAL LETTER ETA WITH PSILI AND VARIA | |
1012 | +1F2B; C; 1F23; # GREEK CAPITAL LETTER ETA WITH DASIA AND VARIA | |
1013 | +1F2C; C; 1F24; # GREEK CAPITAL LETTER ETA WITH PSILI AND OXIA | |
1014 | +1F2D; C; 1F25; # GREEK CAPITAL LETTER ETA WITH DASIA AND OXIA | |
1015 | +1F2E; C; 1F26; # GREEK CAPITAL LETTER ETA WITH PSILI AND PERISPOMENI | |
1016 | +1F2F; C; 1F27; # GREEK CAPITAL LETTER ETA WITH DASIA AND PERISPOMENI | |
1017 | +1F38; C; 1F30; # GREEK CAPITAL LETTER IOTA WITH PSILI | |
1018 | +1F39; C; 1F31; # GREEK CAPITAL LETTER IOTA WITH DASIA | |
1019 | +1F3A; C; 1F32; # GREEK CAPITAL LETTER IOTA WITH PSILI AND VARIA | |
1020 | +1F3B; C; 1F33; # GREEK CAPITAL LETTER IOTA WITH DASIA AND VARIA | |
1021 | +1F3C; C; 1F34; # GREEK CAPITAL LETTER IOTA WITH PSILI AND OXIA | |
1022 | +1F3D; C; 1F35; # GREEK CAPITAL LETTER IOTA WITH DASIA AND OXIA | |
1023 | +1F3E; C; 1F36; # GREEK CAPITAL LETTER IOTA WITH PSILI AND PERISPOMENI | |
1024 | +1F3F; C; 1F37; # GREEK CAPITAL LETTER IOTA WITH DASIA AND PERISPOMENI | |
1025 | +1F48; C; 1F40; # GREEK CAPITAL LETTER OMICRON WITH PSILI | |
1026 | +1F49; C; 1F41; # GREEK CAPITAL LETTER OMICRON WITH DASIA | |
1027 | +1F4A; C; 1F42; # GREEK CAPITAL LETTER OMICRON WITH PSILI AND VARIA | |
1028 | +1F4B; C; 1F43; # GREEK CAPITAL LETTER OMICRON WITH DASIA AND VARIA | |
1029 | +1F4C; C; 1F44; # GREEK CAPITAL LETTER OMICRON WITH PSILI AND OXIA | |
1030 | +1F4D; C; 1F45; # GREEK CAPITAL LETTER OMICRON WITH DASIA AND OXIA | |
1031 | +1F50; F; 03C5 0313; # GREEK SMALL LETTER UPSILON WITH PSILI | |
1032 | +1F52; F; 03C5 0313 0300; # GREEK SMALL LETTER UPSILON WITH PSILI AND VARIA | |
1033 | +1F54; F; 03C5 0313 0301; # GREEK SMALL LETTER UPSILON WITH PSILI AND OXIA | |
1034 | +1F56; F; 03C5 0313 0342; # GREEK SMALL LETTER UPSILON WITH PSILI AND PERISPOMENI | |
1035 | +1F59; C; 1F51; # GREEK CAPITAL LETTER UPSILON WITH DASIA | |
1036 | +1F5B; C; 1F53; # GREEK CAPITAL LETTER UPSILON WITH DASIA AND VARIA | |
1037 | +1F5D; C; 1F55; # GREEK CAPITAL LETTER UPSILON WITH DASIA AND OXIA | |
1038 | +1F5F; C; 1F57; # GREEK CAPITAL LETTER UPSILON WITH DASIA AND PERISPOMENI | |
1039 | +1F68; C; 1F60; # GREEK CAPITAL LETTER OMEGA WITH PSILI | |
1040 | +1F69; C; 1F61; # GREEK CAPITAL LETTER OMEGA WITH DASIA | |
1041 | +1F6A; C; 1F62; # GREEK CAPITAL LETTER OMEGA WITH PSILI AND VARIA | |
1042 | +1F6B; C; 1F63; # GREEK CAPITAL LETTER OMEGA WITH DASIA AND VARIA | |
1043 | +1F6C; C; 1F64; # GREEK CAPITAL LETTER OMEGA WITH PSILI AND OXIA | |
1044 | +1F6D; C; 1F65; # GREEK CAPITAL LETTER OMEGA WITH DASIA AND OXIA | |
1045 | +1F6E; C; 1F66; # GREEK CAPITAL LETTER OMEGA WITH PSILI AND PERISPOMENI | |
1046 | +1F6F; C; 1F67; # GREEK CAPITAL LETTER OMEGA WITH DASIA AND PERISPOMENI | |
1047 | +1F80; F; 1F00 03B9; # GREEK SMALL LETTER ALPHA WITH PSILI AND YPOGEGRAMMENI | |
1048 | +1F81; F; 1F01 03B9; # GREEK SMALL LETTER ALPHA WITH DASIA AND YPOGEGRAMMENI | |
1049 | +1F82; F; 1F02 03B9; # GREEK SMALL LETTER ALPHA WITH PSILI AND VARIA AND YPOGEGRAMMENI | |
1050 | +1F83; F; 1F03 03B9; # GREEK SMALL LETTER ALPHA WITH DASIA AND VARIA AND YPOGEGRAMMENI | |
1051 | +1F84; F; 1F04 03B9; # GREEK SMALL LETTER ALPHA WITH PSILI AND OXIA AND YPOGEGRAMMENI | |
1052 | +1F85; F; 1F05 03B9; # GREEK SMALL LETTER ALPHA WITH DASIA AND OXIA AND YPOGEGRAMMENI | |
1053 | +1F86; F; 1F06 03B9; # GREEK SMALL LETTER ALPHA WITH PSILI AND PERISPOMENI AND YPOGEGRAMMENI | |
1054 | +1F87; F; 1F07 03B9; # GREEK SMALL LETTER ALPHA WITH DASIA AND PERISPOMENI AND YPOGEGRAMMENI | |
1055 | +1F88; F; 1F00 03B9; # GREEK CAPITAL LETTER ALPHA WITH PSILI AND PROSGEGRAMMENI | |
1056 | +1F88; S; 1F80; # GREEK CAPITAL LETTER ALPHA WITH PSILI AND PROSGEGRAMMENI | |
1057 | +1F89; F; 1F01 03B9; # GREEK CAPITAL LETTER ALPHA WITH DASIA AND PROSGEGRAMMENI | |
1058 | +1F89; S; 1F81; # GREEK CAPITAL LETTER ALPHA WITH DASIA AND PROSGEGRAMMENI | |
1059 | +1F8A; F; 1F02 03B9; # GREEK CAPITAL LETTER ALPHA WITH PSILI AND VARIA AND PROSGEGRAMMENI | |
1060 | +1F8A; S; 1F82; # GREEK CAPITAL LETTER ALPHA WITH PSILI AND VARIA AND PROSGEGRAMMENI | |
1061 | +1F8B; F; 1F03 03B9; # GREEK CAPITAL LETTER ALPHA WITH DASIA AND VARIA AND PROSGEGRAMMENI | |
1062 | +1F8B; S; 1F83; # GREEK CAPITAL LETTER ALPHA WITH DASIA AND VARIA AND PROSGEGRAMMENI | |
1063 | +1F8C; F; 1F04 03B9; # GREEK CAPITAL LETTER ALPHA WITH PSILI AND OXIA AND PROSGEGRAMMENI | |
1064 | +1F8C; S; 1F84; # GREEK CAPITAL LETTER ALPHA WITH PSILI AND OXIA AND PROSGEGRAMMENI | |
1065 | +1F8D; F; 1F05 03B9; # GREEK CAPITAL LETTER ALPHA WITH DASIA AND OXIA AND PROSGEGRAMMENI | |
1066 | +1F8D; S; 1F85; # GREEK CAPITAL LETTER ALPHA WITH DASIA AND OXIA AND PROSGEGRAMMENI | |
1067 | +1F8E; F; 1F06 03B9; # GREEK CAPITAL LETTER ALPHA WITH PSILI AND PERISPOMENI AND PROSGEGRAMMENI | |
1068 | +1F8E; S; 1F86; # GREEK CAPITAL LETTER ALPHA WITH PSILI AND PERISPOMENI AND PROSGEGRAMMENI | |
1069 | +1F8F; F; 1F07 03B9; # GREEK CAPITAL LETTER ALPHA WITH DASIA AND PERISPOMENI AND PROSGEGRAMMENI | |
1070 | +1F8F; S; 1F87; # GREEK CAPITAL LETTER ALPHA WITH DASIA AND PERISPOMENI AND PROSGEGRAMMENI | |
1071 | +1F90; F; 1F20 03B9; # GREEK SMALL LETTER ETA WITH PSILI AND YPOGEGRAMMENI | |
1072 | +1F91; F; 1F21 03B9; # GREEK SMALL LETTER ETA WITH DASIA AND YPOGEGRAMMENI | |
1073 | +1F92; F; 1F22 03B9; # GREEK SMALL LETTER ETA WITH PSILI AND VARIA AND YPOGEGRAMMENI | |
1074 |