Commit cba88f0ab0f43696de3dfe6ec0e8744a2d367307

Authored by dg
1 parent a543e086

oups :)



git-svn-id: http://svn.net-core.org/repos/t-engine4@3320 51575b47-30f0-44d4-a5cc-537603b46e54
  1 +Changelog
  2 +
  3 +2006-06-02:
  4 +- initial release of version 0.1
  5 +
  6 +2006-06-05:
  7 +- changed behaviour of PostgreSQL function to return NULL in case of
  8 + invalid input, rather than raising an exceptional condition
  9 +- improved efficiency of PostgreSQL function (no transformation to C string
  10 + is done)
  11 +
  12 +2006-06-20:
  13 +- added -fpic compiler flag in Makefile
  14 +- fixed bug in the C code for the ruby library (usage of non-existent
  15 + function)
  16 +
  17 +Release of version 0.2
  18 +
  19 +
  20 +2006-07-18:
  21 +- changed normalization from NFC to NFKC for postgresql unifold function
  22 +
  23 +2006-08-04:
  24 +- added support to mark the beginning of a grapheme cluster with 0xFF
  25 + (option: CHARBOUND)
  26 +- added the ruby method String#chars, which returns an array of UTF-8
  27 + encoded grapheme clusters
  28 +- added NLF2LF transformation in postgresql unifold function
  29 +- added the DECOMPOSE option; if you use neither COMPOSE nor DECOMPOSE, no
  30 + normalization will be performed (different from previous versions)
  31 +- using integer constants rather than C-strings for character properties
  32 +- fixed (hopefully) a problem with the ruby library on Mac OS X, which
  33 + occurred when compiler optimization was switched on
  34 +
  35 +Release of version 0.3
  36 +
  37 +
  38 +2006-09-17:
  39 +- added the LUMP option, which lumps certain characters together
  40 + (see lump.txt) (also used for the PostgreSQL "unifold" function)
  41 +- added the STRIPMARK option, which strips marking characters
  42 + (or marks of composed characters)
  43 +- deprecated ruby method String#char_ary in favour of String#utf8chars
  44 +
  45 +Release of version 1.0
  46 +
  47 +
  48 +2006-09-20:
  49 +- included a gem file for the ruby version of the library
  50 +
  51 +Release of version 1.0.1
  52 +
  53 +
  54 +2006-09-21:
  55 +- included a check in Integer#utf8, which raises an exception if the given
  56 + code-point is invalid because it is too high (this check was previously missing)
  57 +
  58 +2006-12-26:
  59 +- added support for PostgreSQL version 8.2
  60 +
  61 +Release of version 1.0.2
  62 +
  63 +
  64 +2007-03-16:
  65 +- Fixed a bug in the ruby library which caused an error when splitting an
  66 + empty string at grapheme cluster boundaries (method String#utf8chars).
  67 +
  68 +Release of version 1.0.3
  69 +
  70 +
  71 +2007-06-25:
  72 +- Added a new PostgreSQL function 'unistrip', which behaves like 'unifold',
  73 + but also removes all character marks (e.g. accents).
  74 +
  75 +2007-07-22:
  76 +- Changed license from BSD to MIT style.
  77 +- Added a new function 'utf8proc_codepoint_valid' to the C library.
  78 +- Changed compiler flags in Makefile from -g -O0 to -O2
  79 +- The ruby script, which was used to build the utf8proc_data.c file, is now
  80 + included in the distribution.
  81 +
  82 +Release of version 1.1.1
  83 +
  84 +
  85 +2007-07-25:
  86 +- Fixed a serious bug in the data file generator, which caused characters
  87 + to be treated incorrectly when stripping default ignorable characters or
  88 + calculating grapheme cluster boundaries.
  89 +
  90 +Release of version 1.1.2
  91 +
  92 +
  93 +2008-10-04:
  94 +- Added a function utf8proc_version returning a string containing the version
  95 + number of the library.
  96 +- Included a target libutf8proc.dylib for MacOSX.
  97 +
  98 +2009-05-01:
  99 +- PostgreSQL 8.3 compatibility (use of SET_VARSIZE macro)
  100 +
  101 +Release of version 1.1.3
  102 +
  103 +
  104 +2009-06-14:
  105 +- replaced C++ style comments for compatibility reasons
  106 +- added typecasts to suppress compiler warnings
  107 +- removed redundant source files for ruby-gemfile generation
  108 +
  109 +2009-08-19:
  110 +- Changed copyright notice for Public Software Group e. V.
  111 +- Minor changes in the README file
  112 +- Release of version 1.1.4
  113 +
  114 +2009-08-20:
  115 +- Use RSTRING_PTR() and RSTRING_LEN() instead of RSTRING()->ptr and
  116 + RSTRING()->len for ruby1.9 compatibility (and #define them if they
  117 + do not exist)
  118 +
  119 +2009-10-02:
  120 +- Patches for compatibility with Microsoft Visual Studio
  121 +
  122 +2009-10-08:
  123 +- Fixes to make utf8proc usable in C++ programs
  124 +
  125 +2009-10-16:
  126 +- Release of version 1.1.5
  127 +
  128 +2009-10-08:
... ...
  1 +
  2 +Copyright (c) 2009 Public Software Group e. V., Berlin, Germany
  3 +
  4 +Permission is hereby granted, free of charge, to any person obtaining a
  5 +copy of this software and associated documentation files (the "Software"),
  6 +to deal in the Software without restriction, including without limitation
  7 +the rights to use, copy, modify, merge, publish, distribute, sublicense,
  8 +and/or sell copies of the Software, and to permit persons to whom the
  9 +Software is furnished to do so, subject to the following conditions:
  10 +
  11 +The above copyright notice and this permission notice shall be included in
  12 +all copies or substantial portions of the Software.
  13 +
  14 +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
  15 +IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
  16 +FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
  17 +AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
  18 +LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
  19 +FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
  20 +DEALINGS IN THE SOFTWARE.
  21 +
  22 +
  23 +This software distribution contains derived data from a modified version of
  24 +the Unicode data files. The following license applies to that data:
  25 +
  26 +COPYRIGHT AND PERMISSION NOTICE
  27 +
  28 +Copyright (c) 1991-2007 Unicode, Inc. All rights reserved. Distributed
  29 +under the Terms of Use in http://www.unicode.org/copyright.html.
  30 +
  31 +Permission is hereby granted, free of charge, to any person obtaining a
  32 +copy of the Unicode data files and any associated documentation (the "Data
  33 +Files") or Unicode software and any associated documentation (the
  34 +"Software") to deal in the Data Files or Software without restriction,
  35 +including without limitation the rights to use, copy, modify, merge,
  36 +publish, distribute, and/or sell copies of the Data Files or Software, and
  37 +to permit persons to whom the Data Files or Software are furnished to do
  38 +so, provided that (a) the above copyright notice(s) and this permission
  39 +notice appear with all copies of the Data Files or Software, (b) both the
  40 +above copyright notice(s) and this permission notice appear in associated
  41 +documentation, and (c) there is clear notice in each modified Data File or
  42 +in the Software as well as in the documentation associated with the Data
  43 +File(s) or Software that the data or software has been modified.
  44 +
  45 +THE DATA FILES AND SOFTWARE ARE PROVIDED "AS IS", WITHOUT WARRANTY OF ANY
  46 +KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
  47 +MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT OF
  48 +THIRD PARTY RIGHTS. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR HOLDERS
  49 +INCLUDED IN THIS NOTICE BE LIABLE FOR ANY CLAIM, OR ANY SPECIAL INDIRECT OR
  50 +CONSEQUENTIAL DAMAGES, OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF
  51 +USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER
  52 +TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR
  53 +PERFORMANCE OF THE DATA FILES OR SOFTWARE.
  54 +
  55 +Except as contained in this notice, the name of a copyright holder shall
  56 +not be used in advertising or otherwise to promote the sale, use or other
  57 +dealings in these Data Files or Software without prior written
  58 +authorization of the copyright holder.
  59 +
  60 +
  61 +Unicode and the Unicode logo are trademarks of Unicode, Inc., and may be
  62 +registered in some jurisdictions. All other trademarks and registered
  63 +trademarks mentioned herein are the property of their respective owners.
  64 +
... ...
  1 +# libutf8proc Makefile
  2 +
  3 +
  4 +# settings
  5 +
  6 +cflags = -O2 -std=c99 -pedantic -Wall -fpic $(CFLAGS)
  7 +cc = $(CC) $(cflags)
  8 +
  9 +
  10 +# meta targets
  11 +
  12 +c-library: libutf8proc.a libutf8proc.so
  13 +
  14 +ruby-library: ruby/utf8proc_native.so
  15 +
  16 +pgsql-library: pgsql/utf8proc_pgsql.so
  17 +
  18 +all: c-library ruby-library ruby-gem pgsql-library
  19 +
  20 +clean::
  21 + rm -f utf8proc.o libutf8proc.a libutf8proc.so
  22 + cd ruby/ && test -e Makefile && (make clean && rm -f Makefile) || true
  23 + rm -Rf ruby/gem/lib ruby/gem/ext
  24 + rm -f ruby/gem/utf8proc-*.gem
  25 + cd pgsql/ && make clean
  26 +
  27 +# real targets
  28 +
  29 +utf8proc.o: utf8proc.h utf8proc.c utf8proc_data.c
  30 + $(cc) -c -o utf8proc.o utf8proc.c
  31 +
  32 +libutf8proc.a: utf8proc.o
  33 + rm -f libutf8proc.a
  34 + ar rs libutf8proc.a utf8proc.o
  35 +
  36 +libutf8proc.so: utf8proc.o
  37 + $(cc) -shared -o libutf8proc.so utf8proc.o
  38 + chmod a-x libutf8proc.so
  39 +
  40 +libutf8proc.dylib: utf8proc.o
  41 + $(cc) -dynamiclib -o $@ $^ -install_name $(libdir)/$@
  42 +
  43 +ruby/Makefile: ruby/extconf.rb
  44 + cd ruby && ruby extconf.rb
  45 +
  46 +ruby/utf8proc_native.so: utf8proc.h utf8proc.c utf8proc_data.c \
  47 + ruby/utf8proc_native.c ruby/Makefile
  48 + cd ruby && make
  49 +
  50 +ruby/gem/lib/utf8proc.rb: ruby/utf8proc.rb
  51 + test -e ruby/gem/lib || mkdir ruby/gem/lib
  52 + cp ruby/utf8proc.rb ruby/gem/lib/
  53 +
  54 +ruby/gem/ext/extconf.rb: ruby/extconf.rb
  55 + test -e ruby/gem/ext || mkdir ruby/gem/ext
  56 + cp ruby/extconf.rb ruby/gem/ext/
  57 +
  58 +ruby/gem/ext/utf8proc_native.c: utf8proc.h utf8proc_data.c utf8proc.c ruby/utf8proc_native.c
  59 + test -e ruby/gem/ext || mkdir ruby/gem/ext
  60 + cat utf8proc.h utf8proc_data.c utf8proc.c ruby/utf8proc_native.c | grep -v '#include "utf8proc.h"' | grep -v '#include "utf8proc_data.c"' | grep -v '#include "../utf8proc.c"' > ruby/gem/ext/utf8proc_native.c
  61 +
  62 +ruby-gem:: ruby/gem/lib/utf8proc.rb ruby/gem/ext/extconf.rb ruby/gem/ext/utf8proc_native.c
  63 + cd ruby/gem && gem build utf8proc.gemspec
  64 +
  65 +pgsql/utf8proc_pgsql.so: utf8proc.h utf8proc.c utf8proc_data.c \
  66 + pgsql/utf8proc_pgsql.c
  67 + cd pgsql && make
  68 +
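The `ruby/gem/ext/utf8proc_native.c` rule above builds a single-file amalgamation: it concatenates the header, the data table, the library source and the Ruby binding, then drops the local `#include` lines that would otherwise pull the same code in a second time. A minimal Ruby sketch of that filtering step, assuming the sources have already been read into strings (the `amalgamate` helper and the sample sources are hypothetical, not part of utf8proc):

```ruby
# Hypothetical helper mirroring the Makefile's `cat ... | grep -v ...` step;
# it is not part of utf8proc's build.
LOCAL_INCLUDES = [
  '#include "utf8proc.h"',
  '#include "utf8proc_data.c"',
  '#include "../utf8proc.c"'
].freeze

# Concatenate the sources in order, then drop every line that contains one of
# the local #include directives (grep -v also matches substrings).
def amalgamate(sources)
  kept = sources.join.each_line.reject { |line| LOCAL_INCLUDES.any? { |inc| line.include?(inc) } }
  kept.join
end

header = "#include <stdint.h>\nint example(void);\n"
binding_src = "#include \"../utf8proc.c\"\n#include \"ruby.h\"\nvoid Init_example(void) {}\n"
puts amalgamate([header, binding_src])
```

Like `cat`, this joins the inputs byte for byte, so a source file lacking a trailing newline would run into the first line of the next one.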
... ...
  1 +
  2 +Please read the LICENSE file, which ships with this software.
  3 +
  4 +
  5 +*** QUICK START ***
  6 +
  7 +To compile the C library, call "make c-library"; to compile the ruby
  8 +library, call "make ruby-library"; and to compile the PostgreSQL extension,
  9 +call "make pgsql-library".
  10 +
  11 +For ruby you can also create a gem-file by calling "make ruby-gem".
  12 +
  13 +"make all" can be used to build everything, but both ruby and PostgreSQL
  14 +installations are required in this case.
  15 +
  16 +
  17 +*** GENERAL INFORMATION ***
  18 +
  19 +After successful compilation, the C library is found in this directory as
  20 +"libutf8proc.a" and "libutf8proc.so". The ruby library consists of
  21 +the files "utf8proc.rb" and "utf8proc_native.so", which are found in the
  22 +subdirectory "ruby/". If you choose to create a gem-file, it is placed in the
  23 +"ruby/gem" directory. The PostgreSQL extension is named "utf8proc_pgsql.so"
  24 +and resides in the "pgsql/" directory.
  25 +
  26 +Both the ruby library and the PostgreSQL extension are built as stand-alone
  27 +libraries and therefore do not depend on the dynamic version of the
  28 +C library, but this behaviour might change in future releases.
  29 +
  30 +The Unicode version being supported is 5.0.0.
  31 +Note: Version 4.1.0 of Unicode Standard Annex #29 was used, as
  32 + version 5.0.0 had not been available at the time of implementation.
  33 +
  34 +For Unicode normalizations, the following options have to be used:
  35 +Normalization Form C: STABLE, COMPOSE
  36 +Normalization Form D: STABLE, DECOMPOSE
  37 +Normalization Form KC: STABLE, COMPOSE, COMPAT
  38 +Normalization Form KD: STABLE, DECOMPOSE, COMPAT
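The four option combinations correspond to the standard Unicode normalization forms: COMPOSE versus DECOMPOSE selects the C or D form, and COMPAT adds the compatibility mappings of the K forms. As an illustration independent of utf8proc (its option flags are not used here), Ruby 2.2 and later ship `String#unicode_normalize`, which shows how the forms differ on a decomposed "é" followed by the "fi" ligature:

```ruby
# Uses Ruby's built-in String#unicode_normalize (Ruby >= 2.2), not utf8proc.
s = "e\u0301\uFB01"  # "e" + COMBINING ACUTE ACCENT, then LATIN SMALL LIGATURE FI

nfc  = s.unicode_normalize(:nfc)   # "\u00E9\uFB01"  (é composed, ligature kept)
nfd  = s.unicode_normalize(:nfd)   # "e\u0301\uFB01" (é decomposed, ligature kept)
nfkc = s.unicode_normalize(:nfkc)  # "\u00E9fi"      (é composed, ligature -> "fi")
nfkd = s.unicode_normalize(:nfkd)  # "e\u0301fi"     (é decomposed, ligature -> "fi")

# Print each form as a list of code points.
[nfc, nfd, nfkc, nfkd].each do |form|
  puts form.codepoints.map { |c| format("U+%04X", c) }.join(" ")
end
```

The ligature survives the C and D forms because it has only a compatibility decomposition, which is exactly what COMPAT enables.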
  39 +
  40 +
  41 +*** C LIBRARY ***
  42 +
  43 +The documentation for the C library is found in the utf8proc.h header file.
  44 +"utf8proc_map" is most likely the function you will be using for mapping
  45 +UTF-8 strings, unless you want to allocate memory yourself.
  46 +
  47 +
  48 +*** RUBY API ***
  49 +
  50 +The ruby library adds the methods "utf8map" and "utf8map!" to the String
  51 +class, and the method "utf8" to the Integer class.
  52 +
  53 +The String#utf8map method does the same as the "utf8proc_map" C function.
  54 +Options for the mapping procedure are passed as symbols, e.g.:
  55 +"Hello".utf8map(:casefold) => "hello"
  56 +
  57 +The descriptions of all options are found in the C header file
  58 +"utf8proc.h". Please note that the corresponding symbols in ruby are all
  59 +lowercase.
  60 +
  61 +String#utf8map! is the destructive variant: the receiver string itself
  62 +is replaced by the result.
  63 +
  64 +There are shortcuts for the 4 normalization forms specified by Unicode:
  65 +String#utf8nfd, String#utf8nfd!,
  66 +String#utf8nfc, String#utf8nfc!,
  67 +String#utf8nfkd, String#utf8nfkd!,
  68 +String#utf8nfkc, String#utf8nfkc!
  69 +
  70 +The method Integer#utf8 returns a UTF-8 string containing the Unicode
  71 +character with the given code point:
  72 +0x000A.utf8 => "\n"
  73 +0x2028.utf8 => "\342\200\250"
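For reference, the conversion Integer#utf8 performs follows the standard UTF-8 bit layout (RFC 3629). A minimal pure-Ruby sketch with the hypothetical name `encode_utf8`; it rejects code points above U+10FFFF, in the spirit of the validity check mentioned in the Changelog, and additionally rejects surrogates, which is this sketch's own choice. It returns a binary string so the bytes can be inspected directly:

```ruby
# Hypothetical pure-Ruby equivalent of Integer#utf8 (not the library's code).
# Encodes one code point into 1 to 4 bytes per the UTF-8 bit layout.
def encode_utf8(cp)
  if cp < 0 || cp > 0x10FFFF || (0xD800..0xDFFF).cover?(cp)
    raise ArgumentError, "invalid code point: #{cp}"
  end

  bytes =
    if cp < 0x80
      [cp]                                             # 0xxxxxxx
    elsif cp < 0x800
      [0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)]           # 110xxxxx 10xxxxxx
    elsif cp < 0x10000
      [0xE0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3F),
       0x80 | (cp & 0x3F)]                             # 1110xxxx + 2 continuation bytes
    else
      [0xF0 | (cp >> 18), 0x80 | ((cp >> 12) & 0x3F),
       0x80 | ((cp >> 6) & 0x3F), 0x80 | (cp & 0x3F)]  # 11110xxx + 3 continuation bytes
    end
  bytes.pack("C*")                                     # binary (ASCII-8BIT) string
end

p encode_utf8(0x000A)   # => "\n"
p encode_utf8(0x2028)   # => "\xE2\x80\xA8"  (octal \342\200\250)
```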
  74 +
  75 +
  76 +*** POSTGRESQL API ***
  77 +
  78 +For PostgreSQL, two SQL functions named "unifold" and "unistrip" are
  79 +supplied. They can be used to fold indexed fields so that string
  80 +comparisons make more sense, e.g. so that
  81 +"bathtub" == "bath<soft hyphen>tub"
  82 +or "Hello World" == "hello world".
  83 +
  84 +CREATE TABLE people (
  85 + id serial8 primary key,
  86 + name text,
  87 + CHECK (unifold(name) NOTNULL)
  88 +);
  89 +CREATE INDEX name_idx ON people (unifold(name));
  90 +SELECT * FROM people WHERE unifold(name) = unifold('John Doe');
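To make the folding idea concrete, here is a rough Ruby approximation of what such a fold does to the two examples above. It is a sketch under stated assumptions, not unifold's implementation: it uses Ruby's built-in `unicode_normalize(:nfkc)` and `downcase(:fold)` (Ruby 2.4 or later) plus a deliberately small, hypothetical `IGNORABLE` subset of the default-ignorable code points, and it omits steps such as LUMP and newline normalization, so its output will not match `unifold` byte for byte.

```ruby
# Rough approximation of a unifold-style fold; NOT utf8proc's implementation.
# IGNORABLE is a hypothetical, deliberately small subset of the
# Default_Ignorable_Code_Point list used by the data generator.
IGNORABLE = /[\u00AD\u034F\u200B-\u200F\u202A-\u202E\u2060-\u2063\uFEFF]/

def approx_fold(str)
  composed = str.unicode_normalize(:nfkc)  # compatibility composition (NFKC)
  visible = composed.gsub(IGNORABLE, "")   # drop invisible chars like SOFT HYPHEN
  visible.downcase(:fold)                  # Unicode case folding (Ruby >= 2.4)
end

p approx_fold("bath\u00ADtub") == approx_fold("bathtub")    # => true
p approx_fold("Hello World") == approx_fold("hello world")  # => true
```

Indexing the folded value, as the CREATE INDEX statement does, lets equality lookups on the folded form use the index.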
  91 +
  92 +The function "unistrip" removes character marks such as accents or
  93 +diaereses, while "unifold" keeps them.
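One common way to remove marks, shown here as a hedged Ruby sketch rather than unistrip's actual code, is to decompose the string canonically, delete the nonspacing combining marks (general category Mn), and recompose what remains. The `strip_marks` name is hypothetical, and unlike utf8proc's STRIPMARK option this sketch relies entirely on Ruby's built-in normalization and regexp property classes.

```ruby
# Hypothetical mark-stripping sketch using Ruby built-ins, not utf8proc.
def strip_marks(str)
  decomposed = str.unicode_normalize(:nfd)  # "ü" -> "u" + COMBINING DIAERESIS
  stripped = decomposed.gsub(/\p{Mn}/, "")  # drop nonspacing marks
  stripped.unicode_normalize(:nfc)          # recompose what remains
end

p strip_marks("Müller")   # => "Muller"
p strip_marks("café")     # => "cafe"
```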
  94 +
  95 +NOTICE: The outputs of these functions can change between releases, as
  96 + utf8proc does not follow a versioning stability policy. You have to
  97 + rebuild your database indices if you upgrade to a newer version
  98 + of utf8proc.
  99 +
  100 +
  101 +*** TODO ***
  102 +
  103 +- detect stable code points and process segments independently in order to
  104 + save memory
  105 +- do a quick check before normalizing strings to optimize speed
  106 +- support stream processing
  107 +
  108 +
  109 +*** CONTACT ***
  110 +
  111 +If you find any bugs or experience difficulties in compiling this software,
  112 +please contact us:
  113 +
  114 +Project page: http://www.public-software-group.org/utf8proc
  115 +
  116 +
... ...
  1 +#!/usr/pkg/bin/ruby
  2 +
  3 +# This file was used to generate the 'utf8proc_data.c' file by parsing the
  4 +# Unicode data file 'UnicodeData.txt' of the Unicode Character Database.
  5 +# It is included for informational purposes only and not intended for
  6 +# production use.
  7 +
  8 +
  9 +# Copyright (c) 2009 Public Software Group e. V., Berlin, Germany
  10 +#
  11 +# Permission is hereby granted, free of charge, to any person obtaining a
  12 +# copy of this software and associated documentation files (the "Software"),
  13 +# to deal in the Software without restriction, including without limitation
  14 +# the rights to use, copy, modify, merge, publish, distribute, sublicense,
  15 +# and/or sell copies of the Software, and to permit persons to whom the
  16 +# Software is furnished to do so, subject to the following conditions:
  17 +#
  18 +# The above copyright notice and this permission notice shall be included in
  19 +# all copies or substantial portions of the Software.
  20 +#
  21 +# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
  22 +# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
  23 +# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
  24 +# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
  25 +# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
  26 +# FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
  27 +# DEALINGS IN THE SOFTWARE.
  28 +
  29 +
  30 +# This file contains derived data from a modified version of the
  31 +# Unicode data files. The following license applies to that data:
  32 +#
  33 +# COPYRIGHT AND PERMISSION NOTICE
  34 +#
  35 +# Copyright (c) 1991-2007 Unicode, Inc. All rights reserved. Distributed
  36 +# under the Terms of Use in http://www.unicode.org/copyright.html.
  37 +#
  38 +# Permission is hereby granted, free of charge, to any person obtaining a
  39 +# copy of the Unicode data files and any associated documentation (the "Data
  40 +# Files") or Unicode software and any associated documentation (the
  41 +# "Software") to deal in the Data Files or Software without restriction,
  42 +# including without limitation the rights to use, copy, modify, merge,
  43 +# publish, distribute, and/or sell copies of the Data Files or Software, and
  44 +# to permit persons to whom the Data Files or Software are furnished to do
  45 +# so, provided that (a) the above copyright notice(s) and this permission
  46 +# notice appear with all copies of the Data Files or Software, (b) both the
  47 +# above copyright notice(s) and this permission notice appear in associated
  48 +# documentation, and (c) there is clear notice in each modified Data File or
  49 +# in the Software as well as in the documentation associated with the Data
  50 +# File(s) or Software that the data or software has been modified.
  51 +#
  52 +# THE DATA FILES AND SOFTWARE ARE PROVIDED "AS IS", WITHOUT WARRANTY OF ANY
  53 +# KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
  54 +# MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT OF
  55 +# THIRD PARTY RIGHTS. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR HOLDERS
  56 +# INCLUDED IN THIS NOTICE BE LIABLE FOR ANY CLAIM, OR ANY SPECIAL INDIRECT OR
  57 +# CONSEQUENTIAL DAMAGES, OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF
  58 +# USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER
  59 +# TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR
  60 +# PERFORMANCE OF THE DATA FILES OR SOFTWARE.
  61 +#
  62 +# Except as contained in this notice, the name of a copyright holder shall
  63 +# not be used in advertising or otherwise to promote the sale, use or other
  64 +# dealings in these Data Files or Software without prior written
  65 +# authorization of the copyright holder.
  66 +
  67 +
  68 +
  69 +$ignorable_list = <<END_OF_LIST
  70 +0000..0008 ; Default_Ignorable_Code_Point # Cc [9] <control-0000>..<control-0008>
  71 +000E..001F ; Default_Ignorable_Code_Point # Cc [18] <control-000E>..<control-001F>
  72 +007F..0084 ; Default_Ignorable_Code_Point # Cc [6] <control-007F>..<control-0084>
  73 +0086..009F ; Default_Ignorable_Code_Point # Cc [26] <control-0086>..<control-009F>
  74 +00AD ; Default_Ignorable_Code_Point # Cf SOFT HYPHEN
  75 +034F ; Default_Ignorable_Code_Point # Mn COMBINING GRAPHEME JOINER
  76 +0600..0603 ; Default_Ignorable_Code_Point # Cf [4] ARABIC NUMBER SIGN..ARABIC SIGN SAFHA
  77 +06DD ; Default_Ignorable_Code_Point # Cf ARABIC END OF AYAH
  78 +070F ; Default_Ignorable_Code_Point # Cf SYRIAC ABBREVIATION MARK
  79 +115F..1160 ; Default_Ignorable_Code_Point # Lo [2] HANGUL CHOSEONG FILLER..HANGUL JUNGSEONG FILLER
  80 +17B4..17B5 ; Default_Ignorable_Code_Point # Cf [2] KHMER VOWEL INHERENT AQ..KHMER VOWEL INHERENT AA
  81 +180B..180D ; Default_Ignorable_Code_Point # Mn [3] MONGOLIAN FREE VARIATION SELECTOR ONE..MONGOLIAN FREE VARIATION SELECTOR THREE
  82 +200B..200F ; Default_Ignorable_Code_Point # Cf [5] ZERO WIDTH SPACE..RIGHT-TO-LEFT MARK
  83 +202A..202E ; Default_Ignorable_Code_Point # Cf [5] LEFT-TO-RIGHT EMBEDDING..RIGHT-TO-LEFT OVERRIDE
  84 +2060..2063 ; Default_Ignorable_Code_Point # Cf [4] WORD JOINER..INVISIBLE SEPARATOR
  85 +2064..2069 ; Default_Ignorable_Code_Point # Cn [6] <reserved-2064>..<reserved-2069>
  86 +206A..206F ; Default_Ignorable_Code_Point # Cf [6] INHIBIT SYMMETRIC SWAPPING..NOMINAL DIGIT SHAPES
  87 +3164 ; Default_Ignorable_Code_Point # Lo HANGUL FILLER
  88 +D800..DFFF ; Default_Ignorable_Code_Point # Cs [2048] <surrogate-D800>..<surrogate-DFFF>
  89 +FE00..FE0F ; Default_Ignorable_Code_Point # Mn [16] VARIATION SELECTOR-1..VARIATION SELECTOR-16
  90 +FEFF ; Default_Ignorable_Code_Point # Cf ZERO WIDTH NO-BREAK SPACE
  91 +FFA0 ; Default_Ignorable_Code_Point # Lo HALFWIDTH HANGUL FILLER
  92 +FFF0..FFF8 ; Default_Ignorable_Code_Point # Cn [9] <reserved-FFF0>..<reserved-FFF8>
  93 +1D173..1D17A ; Default_Ignorable_Code_Point # Cf [8] MUSICAL SYMBOL BEGIN BEAM..MUSICAL SYMBOL END PHRASE
  94 +E0001 ; Default_Ignorable_Code_Point # Cf LANGUAGE TAG
  95 +E0002..E001F ; Default_Ignorable_Code_Point # Cn [30] <reserved-E0002>..<reserved-E001F>
  96 +E0020..E007F ; Default_Ignorable_Code_Point # Cf [96] TAG SPACE..CANCEL TAG
  97 +E0080..E00FF ; Default_Ignorable_Code_Point # Cn [128] <reserved-E0080>..<reserved-E00FF>
  98 +E0100..E01EF ; Default_Ignorable_Code_Point # Mn [240] VARIATION SELECTOR-17..VARIATION SELECTOR-256
  99 +E01F0..E0FFF ; Default_Ignorable_Code_Point # Cn [3600] <reserved-E01F0>..<reserved-E0FFF>
  100 +END_OF_LIST
  101 +
  102 +$ignorable = []
  103 +$ignorable_list.each_line do |entry|
  104 + if entry =~ /^([0-9A-F]+)\.\.([0-9A-F]+)/
  105 + $1.hex.upto($2.hex) { |e2| $ignorable << e2 }
  106 + elsif entry =~ /^[0-9A-F]+/
  107 + $ignorable << $&.hex
  108 + end
  109 +end
  110 +
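The loop above expands every range into individual code points, so a single entry such as `E01F0..E0FFF` adds 3600 array elements, and if membership is later tested with `Array#include?`, each lookup is a linear scan. An alternative, sketched here with hypothetical names and not taken from the generator, keeps the ranges intact, sorts them, and answers membership with `Array#bsearch` (Ruby 2.0 or later). It iterates with `each_line`, since `String#each` no longer exists in Ruby 1.9 and later.

```ruby
# Hypothetical range-based alternative to the flat $ignorable array.
LINE_RE = /^([0-9A-F]+)(?:\.\.([0-9A-F]+))?\s*;/

# Parse "XXXX ; Prop" or "XXXX..YYYY ; Prop" lines into Ranges sorted by start.
def parse_ranges(list)
  ranges = []
  list.each_line do |line|
    next unless (m = LINE_RE.match(line))
    lo = m[1].hex
    hi = m[2] ? m[2].hex : lo
    ranges << (lo..hi)
  end
  ranges.sort_by(&:first)
end

# The ranges are sorted and non-overlapping, so "range ends at or after cp" is
# monotonic and bsearch (find-minimum mode) returns the only candidate range.
def in_ranges?(ranges, cp)
  r = ranges.bsearch { |range| range.last >= cp }
  !r.nil? && r.first <= cp
end

sample = <<END_OF_LIST
00AD          ; Default_Ignorable_Code_Point # Cf       SOFT HYPHEN
200B..200F    ; Default_Ignorable_Code_Point # Cf   [5] ZERO WIDTH SPACE..RIGHT-TO-LEFT MARK
E01F0..E0FFF  ; Default_Ignorable_Code_Point # Cn [3600] <reserved-E01F0>..<reserved-E0FFF>
END_OF_LIST

ranges = parse_ranges(sample)
p in_ranges?(ranges, 0x00AD)   # => true
p in_ranges?(ranges, 0x0041)   # => false
```

For this sample, the trade-off is a slightly more involved lookup in exchange for storing 3 ranges instead of 3606 integers.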
  111 +$grapheme_extend_list = <<END_OF_LIST
  112 +0300..036F ; Grapheme_Extend # Mn [112] COMBINING GRAVE ACCENT..COMBINING LATIN SMALL LETTER X
  113 +0483..0486 ; Grapheme_Extend # Mn [4] COMBINING CYRILLIC TITLO..COMBINING CYRILLIC PSILI PNEUMATA
  114 +0488..0489 ; Grapheme_Extend # Me [2] COMBINING CYRILLIC HUNDRED THOUSANDS SIGN..COMBINING CYRILLIC MILLIONS SIGN
  115 +0591..05BD ; Grapheme_Extend # Mn [45] HEBREW ACCENT ETNAHTA..HEBREW POINT METEG
  116 +05BF ; Grapheme_Extend # Mn HEBREW POINT RAFE
  117 +05C1..05C2 ; Grapheme_Extend # Mn [2] HEBREW POINT SHIN DOT..HEBREW POINT SIN DOT
  118 +05C4..05C5 ; Grapheme_Extend # Mn [2] HEBREW MARK UPPER DOT..HEBREW MARK LOWER DOT
  119 +05C7 ; Grapheme_Extend # Mn HEBREW POINT QAMATS QATAN
  120 +0610..0615 ; Grapheme_Extend # Mn [6] ARABIC SIGN SALLALLAHOU ALAYHE WASSALLAM..ARABIC SMALL HIGH TAH
  121 +064B..065E ; Grapheme_Extend # Mn [20] ARABIC FATHATAN..ARABIC FATHA WITH TWO DOTS
  122 +0670 ; Grapheme_Extend # Mn ARABIC LETTER SUPERSCRIPT ALEF
  123 +06D6..06DC ; Grapheme_Extend # Mn [7] ARABIC SMALL HIGH LIGATURE SAD WITH LAM WITH ALEF MAKSURA..ARABIC SMALL HIGH SEEN
  124 +06DE ; Grapheme_Extend # Me ARABIC START OF RUB EL HIZB
  125 +06DF..06E4 ; Grapheme_Extend # Mn [6] ARABIC SMALL HIGH ROUNDED ZERO..ARABIC SMALL HIGH MADDA
  126 +06E7..06E8 ; Grapheme_Extend # Mn [2] ARABIC SMALL HIGH YEH..ARABIC SMALL HIGH NOON
  127 +06EA..06ED ; Grapheme_Extend # Mn [4] ARABIC EMPTY CENTRE LOW STOP..ARABIC SMALL LOW MEEM
  128 +0711 ; Grapheme_Extend # Mn SYRIAC LETTER SUPERSCRIPT ALAPH
  129 +0730..074A ; Grapheme_Extend # Mn [27] SYRIAC PTHAHA ABOVE..SYRIAC BARREKH
  130 +07A6..07B0 ; Grapheme_Extend # Mn [11] THAANA ABAFILI..THAANA SUKUN
  131 +07EB..07F3 ; Grapheme_Extend # Mn [9] NKO COMBINING SHORT HIGH TONE..NKO COMBINING DOUBLE DOT ABOVE
  132 +0901..0902 ; Grapheme_Extend # Mn [2] DEVANAGARI SIGN CANDRABINDU..DEVANAGARI SIGN ANUSVARA
  133 +093C ; Grapheme_Extend # Mn DEVANAGARI SIGN NUKTA
  134 +0941..0948 ; Grapheme_Extend # Mn [8] DEVANAGARI VOWEL SIGN U..DEVANAGARI VOWEL SIGN AI
  135 +094D ; Grapheme_Extend # Mn DEVANAGARI SIGN VIRAMA
  136 +0951..0954 ; Grapheme_Extend # Mn [4] DEVANAGARI STRESS SIGN UDATTA..DEVANAGARI ACUTE ACCENT
  137 +0962..0963 ; Grapheme_Extend # Mn [2] DEVANAGARI VOWEL SIGN VOCALIC L..DEVANAGARI VOWEL SIGN VOCALIC LL
  138 +0981 ; Grapheme_Extend # Mn BENGALI SIGN CANDRABINDU
  139 +09BC ; Grapheme_Extend # Mn BENGALI SIGN NUKTA
  140 +09BE ; Grapheme_Extend # Mc BENGALI VOWEL SIGN AA
  141 +09C1..09C4 ; Grapheme_Extend # Mn [4] BENGALI VOWEL SIGN U..BENGALI VOWEL SIGN VOCALIC RR
  142 +09CD ; Grapheme_Extend # Mn BENGALI SIGN VIRAMA
  143 +09D7 ; Grapheme_Extend # Mc BENGALI AU LENGTH MARK
  144 +09E2..09E3 ; Grapheme_Extend # Mn [2] BENGALI VOWEL SIGN VOCALIC L..BENGALI VOWEL SIGN VOCALIC LL
  145 +0A01..0A02 ; Grapheme_Extend # Mn [2] GURMUKHI SIGN ADAK BINDI..GURMUKHI SIGN BINDI
  146 +0A3C ; Grapheme_Extend # Mn GURMUKHI SIGN NUKTA
  147 +0A41..0A42 ; Grapheme_Extend # Mn [2] GURMUKHI VOWEL SIGN U..GURMUKHI VOWEL SIGN UU
  148 +0A47..0A48 ; Grapheme_Extend # Mn [2] GURMUKHI VOWEL SIGN EE..GURMUKHI VOWEL SIGN AI
  149 +0A4B..0A4D ; Grapheme_Extend # Mn [3] GURMUKHI VOWEL SIGN OO..GURMUKHI SIGN VIRAMA
  150 +0A70..0A71 ; Grapheme_Extend # Mn [2] GURMUKHI TIPPI..GURMUKHI ADDAK
  151 +0A81..0A82 ; Grapheme_Extend # Mn [2] GUJARATI SIGN CANDRABINDU..GUJARATI SIGN ANUSVARA
  152 +0ABC ; Grapheme_Extend # Mn GUJARATI SIGN NUKTA
  153 +0AC1..0AC5 ; Grapheme_Extend # Mn [5] GUJARATI VOWEL SIGN U..GUJARATI VOWEL SIGN CANDRA E
  154 +0AC7..0AC8 ; Grapheme_Extend # Mn [2] GUJARATI VOWEL SIGN E..GUJARATI VOWEL SIGN AI
  155 +0ACD ; Grapheme_Extend # Mn GUJARATI SIGN VIRAMA
  156 +0AE2..0AE3 ; Grapheme_Extend # Mn [2] GUJARATI VOWEL SIGN VOCALIC L..GUJARATI VOWEL SIGN VOCALIC LL
  157 +0B01 ; Grapheme_Extend # Mn ORIYA SIGN CANDRABINDU
  158 +0B3C ; Grapheme_Extend # Mn ORIYA SIGN NUKTA
  159 +0B3E ; Grapheme_Extend # Mc ORIYA VOWEL SIGN AA
  160 +0B3F ; Grapheme_Extend # Mn ORIYA VOWEL SIGN I
  161 +0B41..0B43 ; Grapheme_Extend # Mn [3] ORIYA VOWEL SIGN U..ORIYA VOWEL SIGN VOCALIC R
  162 +0B4D ; Grapheme_Extend # Mn ORIYA SIGN VIRAMA
  163 +0B56 ; Grapheme_Extend # Mn ORIYA AI LENGTH MARK
  164 +0B57 ; Grapheme_Extend # Mc ORIYA AU LENGTH MARK
  165 +0B82 ; Grapheme_Extend # Mn TAMIL SIGN ANUSVARA
  166 +0BBE ; Grapheme_Extend # Mc TAMIL VOWEL SIGN AA
  167 +0BC0 ; Grapheme_Extend # Mn TAMIL VOWEL SIGN II
  168 +0BCD ; Grapheme_Extend # Mn TAMIL SIGN VIRAMA
  169 +0BD7 ; Grapheme_Extend # Mc TAMIL AU LENGTH MARK
  170 +0C3E..0C40 ; Grapheme_Extend # Mn [3] TELUGU VOWEL SIGN AA..TELUGU VOWEL SIGN II
  171 +0C46..0C48 ; Grapheme_Extend # Mn [3] TELUGU VOWEL SIGN E..TELUGU VOWEL SIGN AI
  172 +0C4A..0C4D ; Grapheme_Extend # Mn [4] TELUGU VOWEL SIGN O..TELUGU SIGN VIRAMA
  173 +0C55..0C56 ; Grapheme_Extend # Mn [2] TELUGU LENGTH MARK..TELUGU AI LENGTH MARK
  174 +0CBC ; Grapheme_Extend # Mn KANNADA SIGN NUKTA
  175 +0CBF ; Grapheme_Extend # Mn KANNADA VOWEL SIGN I
  176 +0CC2 ; Grapheme_Extend # Mc KANNADA VOWEL SIGN UU
  177 +0CC6 ; Grapheme_Extend # Mn KANNADA VOWEL SIGN E
  178 +0CCC..0CCD ; Grapheme_Extend # Mn [2] KANNADA VOWEL SIGN AU..KANNADA SIGN VIRAMA
  179 +0CD5..0CD6 ; Grapheme_Extend # Mc [2] KANNADA LENGTH MARK..KANNADA AI LENGTH MARK
  180 +0CE2..0CE3 ; Grapheme_Extend # Mn [2] KANNADA VOWEL SIGN VOCALIC L..KANNADA VOWEL SIGN VOCALIC LL
  181 +0D3E ; Grapheme_Extend # Mc MALAYALAM VOWEL SIGN AA
  182 +0D41..0D43 ; Grapheme_Extend # Mn [3] MALAYALAM VOWEL SIGN U..MALAYALAM VOWEL SIGN VOCALIC R
  183 +0D4D ; Grapheme_Extend # Mn MALAYALAM SIGN VIRAMA
  184 +0D57 ; Grapheme_Extend # Mc MALAYALAM AU LENGTH MARK
  185 +0DCA ; Grapheme_Extend # Mn SINHALA SIGN AL-LAKUNA
  186 +0DCF ; Grapheme_Extend # Mc SINHALA VOWEL SIGN AELA-PILLA
  187 +0DD2..0DD4 ; Grapheme_Extend # Mn [3] SINHALA VOWEL SIGN KETTI IS-PILLA..SINHALA VOWEL SIGN KETTI PAA-PILLA
  188 +0DD6 ; Grapheme_Extend # Mn SINHALA VOWEL SIGN DIGA PAA-PILLA
  189 +0DDF ; Grapheme_Extend # Mc SINHALA VOWEL SIGN GAYANUKITTA
  190 +0E31 ; Grapheme_Extend # Mn THAI CHARACTER MAI HAN-AKAT
  191 +0E34..0E3A ; Grapheme_Extend # Mn [7] THAI CHARACTER SARA I..THAI CHARACTER PHINTHU
  192 +0E47..0E4E ; Grapheme_Extend # Mn [8] THAI CHARACTER MAITAIKHU..THAI CHARACTER YAMAKKAN
  193 +0EB1 ; Grapheme_Extend # Mn LAO VOWEL SIGN MAI KAN
  194 +0EB4..0EB9 ; Grapheme_Extend # Mn [6] LAO VOWEL SIGN I..LAO VOWEL SIGN UU
  195 +0EBB..0EBC ; Grapheme_Extend # Mn [2] LAO VOWEL SIGN MAI KON..LAO SEMIVOWEL SIGN LO
  196 +0EC8..0ECD ; Grapheme_Extend # Mn [6] LAO TONE MAI EK..LAO NIGGAHITA
  197 +0F18..0F19 ; Grapheme_Extend # Mn [2] TIBETAN ASTROLOGICAL SIGN -KHYUD PA..TIBETAN ASTROLOGICAL SIGN SDONG TSHUGS
  198 +0F35 ; Grapheme_Extend # Mn TIBETAN MARK NGAS BZUNG NYI ZLA
  199 +0F37 ; Grapheme_Extend # Mn TIBETAN MARK NGAS BZUNG SGOR RTAGS
  200 +0F39 ; Grapheme_Extend # Mn TIBETAN MARK TSA -PHRU
  201 +0F71..0F7E ; Grapheme_Extend # Mn [14] TIBETAN VOWEL SIGN AA..TIBETAN SIGN RJES SU NGA RO
  202 +0F80..0F84 ; Grapheme_Extend # Mn [5] TIBETAN VOWEL SIGN REVERSED I..TIBETAN MARK HALANTA
  203 +0F86..0F87 ; Grapheme_Extend # Mn [2] TIBETAN SIGN LCI RTAGS..TIBETAN SIGN YANG RTAGS
  204 +0F90..0F97 ; Grapheme_Extend # Mn [8] TIBETAN SUBJOINED LETTER KA..TIBETAN SUBJOINED LETTER JA
  205 +0F99..0FBC ; Grapheme_Extend # Mn [36] TIBETAN SUBJOINED LETTER NYA..TIBETAN SUBJOINED LETTER FIXED-FORM RA
  206 +0FC6 ; Grapheme_Extend # Mn TIBETAN SYMBOL PADMA GDAN
  207 +102D..1030 ; Grapheme_Extend # Mn [4] MYANMAR VOWEL SIGN I..MYANMAR VOWEL SIGN UU
  208 +1032 ; Grapheme_Extend # Mn MYANMAR VOWEL SIGN AI
  209 +1036..1037 ; Grapheme_Extend # Mn [2] MYANMAR SIGN ANUSVARA..MYANMAR SIGN DOT BELOW
  210 +1039 ; Grapheme_Extend # Mn MYANMAR SIGN VIRAMA
  211 +1058..1059 ; Grapheme_Extend # Mn [2] MYANMAR VOWEL SIGN VOCALIC L..MYANMAR VOWEL SIGN VOCALIC LL
  212 +135F ; Grapheme_Extend # Mn ETHIOPIC COMBINING GEMINATION MARK
  213 +1712..1714 ; Grapheme_Extend # Mn [3] TAGALOG VOWEL SIGN I..TAGALOG SIGN VIRAMA
  214 +1732..1734 ; Grapheme_Extend # Mn [3] HANUNOO VOWEL SIGN I..HANUNOO SIGN PAMUDPOD
  215 +1752..1753 ; Grapheme_Extend # Mn [2] BUHID VOWEL SIGN I..BUHID VOWEL SIGN U
  216 +1772..1773 ; Grapheme_Extend # Mn [2] TAGBANWA VOWEL SIGN I..TAGBANWA VOWEL SIGN U
  217 +17B7..17BD ; Grapheme_Extend # Mn [7] KHMER VOWEL SIGN I..KHMER VOWEL SIGN UA
  218 +17C6 ; Grapheme_Extend # Mn KHMER SIGN NIKAHIT
  219 +17C9..17D3 ; Grapheme_Extend # Mn [11] KHMER SIGN MUUSIKATOAN..KHMER SIGN BATHAMASAT
  220 +17DD ; Grapheme_Extend # Mn KHMER SIGN ATTHACAN
  221 +180B..180D ; Grapheme_Extend # Mn [3] MONGOLIAN FREE VARIATION SELECTOR ONE..MONGOLIAN FREE VARIATION SELECTOR THREE
  222 +18A9 ; Grapheme_Extend # Mn MONGOLIAN LETTER ALI GALI DAGALGA
  223 +1920..1922 ; Grapheme_Extend # Mn [3] LIMBU VOWEL SIGN A..LIMBU VOWEL SIGN U
  224 +1927..1928 ; Grapheme_Extend # Mn [2] LIMBU VOWEL SIGN E..LIMBU VOWEL SIGN O
  225 +1932 ; Grapheme_Extend # Mn LIMBU SMALL LETTER ANUSVARA
  226 +1939..193B ; Grapheme_Extend # Mn [3] LIMBU SIGN MUKPHRENG..LIMBU SIGN SA-I
  227 +1A17..1A18 ; Grapheme_Extend # Mn [2] BUGINESE VOWEL SIGN I..BUGINESE VOWEL SIGN U
  228 +1B00..1B03 ; Grapheme_Extend # Mn [4] BALINESE SIGN ULU RICEM..BALINESE SIGN SURANG
  229 +1B34 ; Grapheme_Extend # Mn BALINESE SIGN REREKAN
  230 +1B36..1B3A ; Grapheme_Extend # Mn [5] BALINESE VOWEL SIGN ULU..BALINESE VOWEL SIGN RA REPA
  231 +1B3C ; Grapheme_Extend # Mn BALINESE VOWEL SIGN LA LENGA
  232 +1B42 ; Grapheme_Extend # Mn BALINESE VOWEL SIGN PEPET
  233 +1B6B..1B73 ; Grapheme_Extend # Mn [9] BALINESE MUSICAL SYMBOL COMBINING TEGEH..BALINESE MUSICAL SYMBOL COMBINING GONG
  234 +1DC0..1DCA ; Grapheme_Extend # Mn [11] COMBINING DOTTED GRAVE ACCENT..COMBINING LATIN SMALL LETTER R BELOW
  235 +1DFE..1DFF ; Grapheme_Extend # Mn [2] COMBINING LEFT ARROWHEAD ABOVE..COMBINING RIGHT ARROWHEAD AND DOWN ARROWHEAD BELOW
  236 +200C..200D ; Grapheme_Extend # Cf [2] ZERO WIDTH NON-JOINER..ZERO WIDTH JOINER
  237 +20D0..20DC ; Grapheme_Extend # Mn [13] COMBINING LEFT HARPOON ABOVE..COMBINING FOUR DOTS ABOVE
  238 +20DD..20E0 ; Grapheme_Extend # Me [4] COMBINING ENCLOSING CIRCLE..COMBINING ENCLOSING CIRCLE BACKSLASH
  239 +20E1 ; Grapheme_Extend # Mn COMBINING LEFT RIGHT ARROW ABOVE
  240 +20E2..20E4 ; Grapheme_Extend # Me [3] COMBINING ENCLOSING SCREEN..COMBINING ENCLOSING UPWARD POINTING TRIANGLE
  241 +20E5..20EF ; Grapheme_Extend # Mn [11] COMBINING REVERSE SOLIDUS OVERLAY..COMBINING RIGHT ARROW BELOW
  242 +302A..302F ; Grapheme_Extend # Mn [6] IDEOGRAPHIC LEVEL TONE MARK..HANGUL DOUBLE DOT TONE MARK
  243 +3099..309A ; Grapheme_Extend # Mn [2] COMBINING KATAKANA-HIRAGANA VOICED SOUND MARK..COMBINING KATAKANA-HIRAGANA SEMI-VOICED SOUND MARK
  244 +A806 ; Grapheme_Extend # Mn SYLOTI NAGRI SIGN HASANTA
  245 +A80B ; Grapheme_Extend # Mn SYLOTI NAGRI SIGN ANUSVARA
  246 +A825..A826 ; Grapheme_Extend # Mn [2] SYLOTI NAGRI VOWEL SIGN U..SYLOTI NAGRI VOWEL SIGN E
  247 +FB1E ; Grapheme_Extend # Mn HEBREW POINT JUDEO-SPANISH VARIKA
  248 +FE00..FE0F ; Grapheme_Extend # Mn [16] VARIATION SELECTOR-1..VARIATION SELECTOR-16
  249 +FE20..FE23 ; Grapheme_Extend # Mn [4] COMBINING LIGATURE LEFT HALF..COMBINING DOUBLE TILDE RIGHT HALF
  250 +10A01..10A03 ; Grapheme_Extend # Mn [3] KHAROSHTHI VOWEL SIGN I..KHAROSHTHI VOWEL SIGN VOCALIC R
  251 +10A05..10A06 ; Grapheme_Extend # Mn [2] KHAROSHTHI VOWEL SIGN E..KHAROSHTHI VOWEL SIGN O
  252 +10A0C..10A0F ; Grapheme_Extend # Mn [4] KHAROSHTHI VOWEL LENGTH MARK..KHAROSHTHI SIGN VISARGA
  253 +10A38..10A3A ; Grapheme_Extend # Mn [3] KHAROSHTHI SIGN BAR ABOVE..KHAROSHTHI SIGN DOT BELOW
  254 +10A3F ; Grapheme_Extend # Mn KHAROSHTHI VIRAMA
  255 +1D165 ; Grapheme_Extend # Mc MUSICAL SYMBOL COMBINING STEM
  256 +1D167..1D169 ; Grapheme_Extend # Mn [3] MUSICAL SYMBOL COMBINING TREMOLO-1..MUSICAL SYMBOL COMBINING TREMOLO-3
  257 +1D16E..1D172 ; Grapheme_Extend # Mc [5] MUSICAL SYMBOL COMBINING FLAG-1..MUSICAL SYMBOL COMBINING FLAG-5
  258 +1D17B..1D182 ; Grapheme_Extend # Mn [8] MUSICAL SYMBOL COMBINING ACCENT..MUSICAL SYMBOL COMBINING LOURE
  259 +1D185..1D18B ; Grapheme_Extend # Mn [7] MUSICAL SYMBOL COMBINING DOIT..MUSICAL SYMBOL COMBINING TRIPLE TONGUE
  260 +1D1AA..1D1AD ; Grapheme_Extend # Mn [4] MUSICAL SYMBOL COMBINING DOWN BOW..MUSICAL SYMBOL COMBINING SNAP PIZZICATO
  261 +1D242..1D244 ; Grapheme_Extend # Mn [3] COMBINING GREEK MUSICAL TRISEME..COMBINING GREEK MUSICAL PENTASEME
  262 +E0100..E01EF ; Grapheme_Extend # Mn [240] VARIATION SELECTOR-17..VARIATION SELECTOR-256
  263 +END_OF_LIST
  264 +
  265 +$grapheme_extend = []
  266 +$grapheme_extend_list.each_line do |entry|
  267 +  if entry =~ /^([0-9A-F]+)\.\.([0-9A-F]+)/
  268 +    $1.hex.upto($2.hex) { |e2| $grapheme_extend << e2 }
  269 +  elsif entry =~ /^[0-9A-F]+/
  270 +    $grapheme_extend << $&.hex
  271 +  end
  272 +end
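The loop above flattens each `START..END` entry of the Grapheme_Extend list into individual code points and keeps single entries as they are. The sketch below shows the same idea as a standalone function, then converts the result to a `Set` for constant-time membership checks. The helper name `expand_property_ranges` is hypothetical and not part of the generator, and the three sample lines are copied from the list above.

```ruby
require 'set'

# Hypothetical helper mirroring the generator loop: expands every
# "XXXX..YYYY ; Property # ..." line into its individual code points
# (both endpoints inclusive) and keeps single "XXXX ; Property" entries.
def expand_property_ranges(text)
  code_points = []
  text.each_line do |line|
    if line =~ /^([0-9A-F]+)\.\.([0-9A-F]+)/
      code_points.concat(($1.hex..$2.hex).to_a)
    elsif line =~ /^([0-9A-F]+)/
      code_points << $1.hex
    end
  end
  code_points
end

sample = <<SAMPLE
1039 ; Grapheme_Extend # Mn MYANMAR SIGN VIRAMA
200C..200D ; Grapheme_Extend # Cf [2] ZERO WIDTH NON-JOINER..ZERO WIDTH JOINER
FE00..FE0F ; Grapheme_Extend # Mn [16] VARIATION SELECTOR-1..VARIATION SELECTOR-16
SAMPLE

extend_set = expand_property_ranges(sample).to_set
p extend_set.size              # => 19 (1 + 2 + 16)
p extend_set.include?(0x200D)  # => true  (ZERO WIDTH JOINER)
p extend_set.include?(0x0041)  # => false (LATIN CAPITAL LETTER A)
```

The generator itself keeps a plain `Array`; a `Set` only matters if the list is queried once per character at run time.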
  273 +
  274 +$exclusions = <<END_OF_LIST
  275 +0958 # DEVANAGARI LETTER QA
  276 +0959 # DEVANAGARI LETTER KHHA
  277 +095A # DEVANAGARI LETTER GHHA
  278 +095B # DEVANAGARI LETTER ZA
  279 +095C # DEVANAGARI LETTER DDDHA
  280 +095D # DEVANAGARI LETTER RHA
  281 +095E # DEVANAGARI LETTER FA
  282 +095F # DEVANAGARI LETTER YYA
  283 +09DC # BENGALI LETTER RRA
  284 +09DD # BENGALI LETTER RHA
  285 +09DF # BENGALI LETTER YYA
  286 +0A33 # GURMUKHI LETTER LLA
  287 +0A36 # GURMUKHI LETTER SHA
  288 +0A59 # GURMUKHI LETTER KHHA
  289 +0A5A # GURMUKHI LETTER GHHA
  290 +0A5B # GURMUKHI LETTER ZA
  291 +0A5E # GURMUKHI LETTER FA
  292 +0B5C # ORIYA LETTER RRA
  293 +0B5D # ORIYA LETTER RHA
  294 +0F43 # TIBETAN LETTER GHA
  295 +0F4D # TIBETAN LETTER DDHA
  296 +0F52 # TIBETAN LETTER DHA
  297 +0F57 # TIBETAN LETTER BHA
  298 +0F5C # TIBETAN LETTER DZHA
  299 +0F69 # TIBETAN LETTER KSSA
  300 +0F76 # TIBETAN VOWEL SIGN VOCALIC R
  301 +0F78 # TIBETAN VOWEL SIGN VOCALIC L
  302 +0F93 # TIBETAN SUBJOINED LETTER GHA
  303 +0F9D # TIBETAN SUBJOINED LETTER DDHA
  304 +0FA2 # TIBETAN SUBJOINED LETTER DHA
  305 +0FA7 # TIBETAN SUBJOINED LETTER BHA
  306 +0FAC # TIBETAN SUBJOINED LETTER DZHA
  307 +0FB9 # TIBETAN SUBJOINED LETTER KSSA
  308 +FB1D # HEBREW LETTER YOD WITH HIRIQ
  309 +FB1F # HEBREW LIGATURE YIDDISH YOD YOD PATAH
  310 +FB2A # HEBREW LETTER SHIN WITH SHIN DOT
  311 +FB2B # HEBREW LETTER SHIN WITH SIN DOT
  312 +FB2C # HEBREW LETTER SHIN WITH DAGESH AND SHIN DOT
  313 +FB2D # HEBREW LETTER SHIN WITH DAGESH AND SIN DOT
  314 +FB2E # HEBREW LETTER ALEF WITH PATAH
  315 +FB2F # HEBREW LETTER ALEF WITH QAMATS
  316 +FB30 # HEBREW LETTER ALEF WITH MAPIQ
  317 +FB31 # HEBREW LETTER BET WITH DAGESH
  318 +FB32 # HEBREW LETTER GIMEL WITH DAGESH
  319 +FB33 # HEBREW LETTER DALET WITH DAGESH
  320 +FB34 # HEBREW LETTER HE WITH MAPIQ
  321 +FB35 # HEBREW LETTER VAV WITH DAGESH
  322 +FB36 # HEBREW LETTER ZAYIN WITH DAGESH
  323 +FB38 # HEBREW LETTER TET WITH DAGESH
  324 +FB39 # HEBREW LETTER YOD WITH DAGESH
  325 +FB3A # HEBREW LETTER FINAL KAF WITH DAGESH
  326 +FB3B # HEBREW LETTER KAF WITH DAGESH
  327 +FB3C # HEBREW LETTER LAMED WITH DAGESH
  328 +FB3E # HEBREW LETTER MEM WITH DAGESH
  329 +FB40 # HEBREW LETTER NUN WITH DAGESH
  330 +FB41 # HEBREW LETTER SAMEKH WITH DAGESH
  331 +FB43 # HEBREW LETTER FINAL PE WITH DAGESH
  332 +FB44 # HEBREW LETTER PE WITH DAGESH
  333 +FB46 # HEBREW LETTER TSADI WITH DAGESH
  334 +FB47 # HEBREW LETTER QOF WITH DAGESH
  335 +FB48 # HEBREW LETTER RESH WITH DAGESH
  336 +FB49 # HEBREW LETTER SHIN WITH DAGESH
  337 +FB4A # HEBREW LETTER TAV WITH DAGESH
  338 +FB4B # HEBREW LETTER VAV WITH HOLAM
  339 +FB4C # HEBREW LETTER BET WITH RAFE
  340 +FB4D # HEBREW LETTER KAF WITH RAFE
  341 +FB4E # HEBREW LETTER PE WITH RAFE
  342 +END_OF_LIST
  343 +$exclusions = $exclusions.chomp.split("\n").collect { |e| e.hex }
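`String#hex` reads only the leading hexadecimal digits of a string and stops at the first other character, so the trailing `# NAME` comment on each exclusion line is silently ignored. A short check of that idiom on three lines copied from the list above; the local variable names are illustrative only.

```ruby
require 'set'

lines = <<LIST
0958 # DEVANAGARI LETTER QA
0B5C # ORIYA LETTER RRA
FB4E # HEBREW LETTER PE WITH RAFE
LIST

# Same idiom as the generator: drop the final newline, split into lines,
# and let String#hex parse the leading hex digits of each line.
exclusions = lines.chomp.split("\n").collect { |e| e.hex }

p exclusions.map { |cp| format("U+%04X", cp) }
# => ["U+0958", "U+0B5C", "U+FB4E"]

# A Set gives constant-time lookups when the list is consulted per character.
exclusion_set = exclusions.to_set
p exclusion_set.include?(0x0958)  # => true
p exclusion_set.include?(0x0915)  # => false (DEVANAGARI LETTER KA is not excluded)
```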
  344 +
  345 +$excl_version = <<END_OF_LIST
  346 +2ADC # FORKING
  347 +1D15E # MUSICAL SYMBOL HALF NOTE
  348 +1D15F # MUSICAL SYMBOL QUARTER NOTE
  349 +1D160 # MUSICAL SYMBOL EIGHTH NOTE
  350 +1D161 # MUSICAL SYMBOL SIXTEENTH NOTE
  351 +1D162 # MUSICAL SYMBOL THIRTY-SECOND NOTE
  352 +1D163 # MUSICAL SYMBOL SIXTY-FOURTH NOTE
  353 +1D164 # MUSICAL SYMBOL ONE HUNDRED TWENTY-EIGHTH NOTE
  354 +1D1BB # MUSICAL SYMBOL MINIMA
  355 +1D1BC # MUSICAL SYMBOL MINIMA BLACK
  356 +1D1BD # MUSICAL SYMBOL SEMIMINIMA WHITE
  357 +1D1BE # MUSICAL SYMBOL SEMIMINIMA BLACK
  358 +1D1BF # MUSICAL SYMBOL FUSA WHITE
  359 +1D1C0 # MUSICAL SYMBOL FUSA BLACK
  360 +END_OF_LIST
  361 +$excl_version = $excl_version.chomp.split("\n").collect { |e| e.hex }
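Both lists serve the same purpose during NFC composition: a character listed in either one still decomposes, but recomposition must never produce it. The sketch below inverts a small decomposition map while skipping excluded code points. The `decompositions` hash is a hypothetical stand-in for data the generator reads elsewhere; its two entries are the canonical decompositions of U+00E9 and U+0958. The full Unicode exclusion rule also covers singleton and non-starter decompositions, which this sketch ignores.

```ruby
# Hypothetical input: code point => canonical decomposition pair.
decompositions = {
  0x00E9 => [0x0065, 0x0301], # LATIN SMALL LETTER E WITH ACUTE
  0x0958 => [0x0915, 0x093C]  # DEVANAGARI LETTER QA (script-specific exclusion)
}
exclusions   = [0x0958] # small subset of $exclusions
excl_version = [0x2ADC] # small subset of $excl_version

excluded = exclusions + excl_version

# Invert the decomposition map, skipping every excluded code point so that
# composition never produces those characters.
compositions = {}
decompositions.each do |cp, pair|
  next if excluded.include?(cp)
  compositions[pair] = cp
end

p compositions[[0x0065, 0x0301]] # => 233 (U+00E9)
p compositions[[0x0915, 0x093C]] # => nil, so QA stays decomposed under NFC
```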
  362 +
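Each line of the case-folding list that follows uses the CaseFolding.txt layout `code; status; mapping; # name`. Status `C` (common) maps to a single code point, while `F` (full) may expand to several, as with U+00DF LATIN SMALL LETTER SHARP S folding to `0073 0073`. A minimal sketch of parsing such lines into a full case-folding table, keeping only statuses `C` and `F` and skipping any others (CaseFolding.txt also defines `S` and `T`). The helper name `parse_full_case_folding` is hypothetical, and the sample lines are copied from the list.

```ruby
# Hypothetical helper: parses "code; status; mapping; # name" lines into a
# hash of code point => array of folded code points (statuses C and F only).
def parse_full_case_folding(text)
  folding = {}
  text.each_line do |line|
    fields = line.split(";").map(&:strip)
    next if fields.size < 3
    code, status, mapping = fields
    next unless %w[C F].include?(status)
    folding[code.hex] = mapping.split(" ").map(&:hex)
  end
  folding
end

sample = <<LIST
0041; C; 0061; # LATIN CAPITAL LETTER A
00DF; F; 0073 0073; # LATIN SMALL LETTER SHARP S
0130; F; 0069 0307; # LATIN CAPITAL LETTER I WITH DOT ABOVE
LIST

folding = parse_full_case_folding(sample)
p folding[0x41]  # => [97]
p folding[0xDF]  # => [115, 115]
p folding[0x130] # => [105, 775]

# Code points without an entry fold to themselves.
p [0x41, 0xDF, 0x7A].flat_map { |cp| folding.fetch(cp, [cp]) }
# => [97, 115, 115, 122]
```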
  363 +$case_folding_string = <<END_OF_LIST
  364 +0041; C; 0061; # LATIN CAPITAL LETTER A
  365 +0042; C; 0062; # LATIN CAPITAL LETTER B
  366 +0043; C; 0063; # LATIN CAPITAL LETTER C
  367 +0044; C; 0064; # LATIN CAPITAL LETTER D
  368 +0045; C; 0065; # LATIN CAPITAL LETTER E
  369 +0046; C; 0066; # LATIN CAPITAL LETTER F
  370 +0047; C; 0067; # LATIN CAPITAL LETTER G
  371 +0048; C; 0068; # LATIN CAPITAL LETTER H
  372 +0049; C; 0069; # LATIN CAPITAL LETTER I
  373 +004A; C; 006A; # LATIN CAPITAL LETTER J
  374 +004B; C; 006B; # LATIN CAPITAL LETTER K
  375 +004C; C; 006C; # LATIN CAPITAL LETTER L
  376 +004D; C; 006D; # LATIN CAPITAL LETTER M
  377 +004E; C; 006E; # LATIN CAPITAL LETTER N
  378 +004F; C; 006F; # LATIN CAPITAL LETTER O
  379 +0050; C; 0070; # LATIN CAPITAL LETTER P
  380 +0051; C; 0071; # LATIN CAPITAL LETTER Q
  381 +0052; C; 0072; # LATIN CAPITAL LETTER R
  382 +0053; C; 0073; # LATIN CAPITAL LETTER S
  383 +0054; C; 0074; # LATIN CAPITAL LETTER T
  384 +0055; C; 0075; # LATIN CAPITAL LETTER U
  385 +0056; C; 0076; # LATIN CAPITAL LETTER V
  386 +0057; C; 0077; # LATIN CAPITAL LETTER W
  387 +0058; C; 0078; # LATIN CAPITAL LETTER X
  388 +0059; C; 0079; # LATIN CAPITAL LETTER Y
  389 +005A; C; 007A; # LATIN CAPITAL LETTER Z
  390 +00B5; C; 03BC; # MICRO SIGN
  391 +00C0; C; 00E0; # LATIN CAPITAL LETTER A WITH GRAVE
  392 +00C1; C; 00E1; # LATIN CAPITAL LETTER A WITH ACUTE
  393 +00C2; C; 00E2; # LATIN CAPITAL LETTER A WITH CIRCUMFLEX
  394 +00C3; C; 00E3; # LATIN CAPITAL LETTER A WITH TILDE
  395 +00C4; C; 00E4; # LATIN CAPITAL LETTER A WITH DIAERESIS
  396 +00C5; C; 00E5; # LATIN CAPITAL LETTER A WITH RING ABOVE
  397 +00C6; C; 00E6; # LATIN CAPITAL LETTER AE
  398 +00C7; C; 00E7; # LATIN CAPITAL LETTER C WITH CEDILLA
  399 +00C8; C; 00E8; # LATIN CAPITAL LETTER E WITH GRAVE
  400 +00C9; C; 00E9; # LATIN CAPITAL LETTER E WITH ACUTE
  401 +00CA; C; 00EA; # LATIN CAPITAL LETTER E WITH CIRCUMFLEX
  402 +00CB; C; 00EB; # LATIN CAPITAL LETTER E WITH DIAERESIS
  403 +00CC; C; 00EC; # LATIN CAPITAL LETTER I WITH GRAVE
  404 +00CD; C; 00ED; # LATIN CAPITAL LETTER I WITH ACUTE
  405 +00CE; C; 00EE; # LATIN CAPITAL LETTER I WITH CIRCUMFLEX
  406 +00CF; C; 00EF; # LATIN CAPITAL LETTER I WITH DIAERESIS
  407 +00D0; C; 00F0; # LATIN CAPITAL LETTER ETH
  408 +00D1; C; 00F1; # LATIN CAPITAL LETTER N WITH TILDE
  409 +00D2; C; 00F2; # LATIN CAPITAL LETTER O WITH GRAVE
  410 +00D3; C; 00F3; # LATIN CAPITAL LETTER O WITH ACUTE
  411 +00D4; C; 00F4; # LATIN CAPITAL LETTER O WITH CIRCUMFLEX
  412 +00D5; C; 00F5; # LATIN CAPITAL LETTER O WITH TILDE
  413 +00D6; C; 00F6; # LATIN CAPITAL LETTER O WITH DIAERESIS
  414 +00D8; C; 00F8; # LATIN CAPITAL LETTER O WITH STROKE
  415 +00D9; C; 00F9; # LATIN CAPITAL LETTER U WITH GRAVE
  416 +00DA; C; 00FA; # LATIN CAPITAL LETTER U WITH ACUTE
  417 +00DB; C; 00FB; # LATIN CAPITAL LETTER U WITH CIRCUMFLEX
  418 +00DC; C; 00FC; # LATIN CAPITAL LETTER U WITH DIAERESIS
  419 +00DD; C; 00FD; # LATIN CAPITAL LETTER Y WITH ACUTE
  420 +00DE; C; 00FE; # LATIN CAPITAL LETTER THORN
  421 +00DF; F; 0073 0073; # LATIN SMALL LETTER SHARP S
  422 +0100; C; 0101; # LATIN CAPITAL LETTER A WITH MACRON
  423 +0102; C; 0103; # LATIN CAPITAL LETTER A WITH BREVE
  424 +0104; C; 0105; # LATIN CAPITAL LETTER A WITH OGONEK
  425 +0106; C; 0107; # LATIN CAPITAL LETTER C WITH ACUTE
  426 +0108; C; 0109; # LATIN CAPITAL LETTER C WITH CIRCUMFLEX
  427 +010A; C; 010B; # LATIN CAPITAL LETTER C WITH DOT ABOVE
  428 +010C; C; 010D; # LATIN CAPITAL LETTER C WITH CARON
  429 +010E; C; 010F; # LATIN CAPITAL LETTER D WITH CARON
  430 +0110; C; 0111; # LATIN CAPITAL LETTER D WITH STROKE
  431 +0112; C; 0113; # LATIN CAPITAL LETTER E WITH MACRON
  432 +0114; C; 0115; # LATIN CAPITAL LETTER E WITH BREVE
  433 +0116; C; 0117; # LATIN CAPITAL LETTER E WITH DOT ABOVE
  434 +0118; C; 0119; # LATIN CAPITAL LETTER E WITH OGONEK
  435 +011A; C; 011B; # LATIN CAPITAL LETTER E WITH CARON
  436 +011C; C; 011D; # LATIN CAPITAL LETTER G WITH CIRCUMFLEX
  437 +011E; C; 011F; # LATIN CAPITAL LETTER G WITH BREVE
  438 +0120; C; 0121; # LATIN CAPITAL LETTER G WITH DOT ABOVE
  439 +0122; C; 0123; # LATIN CAPITAL LETTER G WITH CEDILLA
  440 +0124; C; 0125; # LATIN CAPITAL LETTER H WITH CIRCUMFLEX
  441 +0126; C; 0127; # LATIN CAPITAL LETTER H WITH STROKE
  442 +0128; C; 0129; # LATIN CAPITAL LETTER I WITH TILDE
  443 +012A; C; 012B; # LATIN CAPITAL LETTER I WITH MACRON
  444 +012C; C; 012D; # LATIN CAPITAL LETTER I WITH BREVE
  445 +012E; C; 012F; # LATIN CAPITAL LETTER I WITH OGONEK
  446 +0130; F; 0069 0307; # LATIN CAPITAL LETTER I WITH DOT ABOVE
  447 +0132; C; 0133; # LATIN CAPITAL LIGATURE IJ
  448 +0134; C; 0135; # LATIN CAPITAL LETTER J WITH CIRCUMFLEX
  449 +0136; C; 0137; # LATIN CAPITAL LETTER K WITH CEDILLA
  450 +0139; C; 013A; # LATIN CAPITAL LETTER L WITH ACUTE
  451 +013B; C; 013C; # LATIN CAPITAL LETTER L WITH CEDILLA
  452 +013D; C; 013E; # LATIN CAPITAL LETTER L WITH CARON
  453 +013F; C; 0140; # LATIN CAPITAL LETTER L WITH MIDDLE DOT
  454 +0141; C; 0142; # LATIN CAPITAL LETTER L WITH STROKE
  455 +0143; C; 0144; # LATIN CAPITAL LETTER N WITH ACUTE
  456 +0145; C; 0146; # LATIN CAPITAL LETTER N WITH CEDILLA
  457 +0147; C; 0148; # LATIN CAPITAL LETTER N WITH CARON
  458 +0149; F; 02BC 006E; # LATIN SMALL LETTER N PRECEDED BY APOSTROPHE
  459 +014A; C; 014B; # LATIN CAPITAL LETTER ENG
  460 +014C; C; 014D; # LATIN CAPITAL LETTER O WITH MACRON
  461 +014E; C; 014F; # LATIN CAPITAL LETTER O WITH BREVE
  462 +0150; C; 0151; # LATIN CAPITAL LETTER O WITH DOUBLE ACUTE
  463 +0152; C; 0153; # LATIN CAPITAL LIGATURE OE
  464 +0154; C; 0155; # LATIN CAPITAL LETTER R WITH ACUTE
  465 +0156; C; 0157; # LATIN CAPITAL LETTER R WITH CEDILLA
  466 +0158; C; 0159; # LATIN CAPITAL LETTER R WITH CARON
  467 +015A; C; 015B; # LATIN CAPITAL LETTER S WITH ACUTE
  468 +015C; C; 015D; # LATIN CAPITAL LETTER S WITH CIRCUMFLEX
  469 +015E; C; 015F; # LATIN CAPITAL LETTER S WITH CEDILLA
  470 +0160; C; 0161; # LATIN CAPITAL LETTER S WITH CARON
  471 +0162; C; 0163; # LATIN CAPITAL LETTER T WITH CEDILLA
  472 +0164; C; 0165; # LATIN CAPITAL LETTER T WITH CARON
  473 +0166; C; 0167; # LATIN CAPITAL LETTER T WITH STROKE
  474 +0168; C; 0169; # LATIN CAPITAL LETTER U WITH TILDE
  475 +016A; C; 016B; # LATIN CAPITAL LETTER U WITH MACRON
  476 +016C; C; 016D; # LATIN CAPITAL LETTER U WITH BREVE
  477 +016E; C; 016F; # LATIN CAPITAL LETTER U WITH RING ABOVE
  478 +0170; C; 0171; # LATIN CAPITAL LETTER U WITH DOUBLE ACUTE
  479 +0172; C; 0173; # LATIN CAPITAL LETTER U WITH OGONEK
  480 +0174; C; 0175; # LATIN CAPITAL LETTER W WITH CIRCUMFLEX
  481 +0176; C; 0177; # LATIN CAPITAL LETTER Y WITH CIRCUMFLEX
  482 +0178; C; 00FF; # LATIN CAPITAL LETTER Y WITH DIAERESIS
  483 +0179; C; 017A; # LATIN CAPITAL LETTER Z WITH ACUTE
  484 +017B; C; 017C; # LATIN CAPITAL LETTER Z WITH DOT ABOVE
  485 +017D; C; 017E; # LATIN CAPITAL LETTER Z WITH CARON
  486 +017F; C; 0073; # LATIN SMALL LETTER LONG S
  487 +0181; C; 0253; # LATIN CAPITAL LETTER B WITH HOOK
  488 +0182; C; 0183; # LATIN CAPITAL LETTER B WITH TOPBAR
  489 +0184; C; 0185; # LATIN CAPITAL LETTER TONE SIX
  490 +0186; C; 0254; # LATIN CAPITAL LETTER OPEN O
  491 +0187; C; 0188; # LATIN CAPITAL LETTER C WITH HOOK
  492 +0189; C; 0256; # LATIN CAPITAL LETTER AFRICAN D
  493 +018A; C; 0257; # LATIN CAPITAL LETTER D WITH HOOK
  494 +018B; C; 018C; # LATIN CAPITAL LETTER D WITH TOPBAR
  495 +018E; C; 01DD; # LATIN CAPITAL LETTER REVERSED E
  496 +018F; C; 0259; # LATIN CAPITAL LETTER SCHWA
  497 +0190; C; 025B; # LATIN CAPITAL LETTER OPEN E
  498 +0191; C; 0192; # LATIN CAPITAL LETTER F WITH HOOK
  499 +0193; C; 0260; # LATIN CAPITAL LETTER G WITH HOOK
  500 +0194; C; 0263; # LATIN CAPITAL LETTER GAMMA
  501 +0196; C; 0269; # LATIN CAPITAL LETTER IOTA
  502 +0197; C; 0268; # LATIN CAPITAL LETTER I WITH STROKE
  503 +0198; C; 0199; # LATIN CAPITAL LETTER K WITH HOOK
  504 +019C; C; 026F; # LATIN CAPITAL LETTER TURNED M
  505 +019D; C; 0272; # LATIN CAPITAL LETTER N WITH LEFT HOOK
  506 +019F; C; 0275; # LATIN CAPITAL LETTER O WITH MIDDLE TILDE
  507 +01A0; C; 01A1; # LATIN CAPITAL LETTER O WITH HORN
  508 +01A2; C; 01A3; # LATIN CAPITAL LETTER OI
  509 +01A4; C; 01A5; # LATIN CAPITAL LETTER P WITH HOOK
  510 +01A6; C; 0280; # LATIN LETTER YR
  511 +01A7; C; 01A8; # LATIN CAPITAL LETTER TONE TWO
  512 +01A9; C; 0283; # LATIN CAPITAL LETTER ESH
  513 +01AC; C; 01AD; # LATIN CAPITAL LETTER T WITH HOOK
  514 +01AE; C; 0288; # LATIN CAPITAL LETTER T WITH RETROFLEX HOOK
  515 +01AF; C; 01B0; # LATIN CAPITAL LETTER U WITH HORN
  516 +01B1; C; 028A; # LATIN CAPITAL LETTER UPSILON
  517 +01B2; C; 028B; # LATIN CAPITAL LETTER V WITH HOOK
  518 +01B3; C; 01B4; # LATIN CAPITAL LETTER Y WITH HOOK
  519 +01B5; C; 01B6; # LATIN CAPITAL LETTER Z WITH STROKE
  520 +01B7; C; 0292; # LATIN CAPITAL LETTER EZH
  521 +01B8; C; 01B9; # LATIN CAPITAL LETTER EZH REVERSED
  522 +01BC; C; 01BD; # LATIN CAPITAL LETTER TONE FIVE
  523 +01C4; C; 01C6; # LATIN CAPITAL LETTER DZ WITH CARON
  524 +01C5; C; 01C6; # LATIN CAPITAL LETTER D WITH SMALL LETTER Z WITH CARON
  525 +01C7; C; 01C9; # LATIN CAPITAL LETTER LJ
  526 +01C8; C; 01C9; # LATIN CAPITAL LETTER L WITH SMALL LETTER J
  527 +01CA; C; 01CC; # LATIN CAPITAL LETTER NJ
  528 +01CB; C; 01CC; # LATIN CAPITAL LETTER N WITH SMALL LETTER J
  529 +01CD; C; 01CE; # LATIN CAPITAL LETTER A WITH CARON
  530 +01CF; C; 01D0; # LATIN CAPITAL LETTER I WITH CARON
  531 +01D1; C; 01D2; # LATIN CAPITAL LETTER O WITH CARON
  532 +01D3; C; 01D4; # LATIN CAPITAL LETTER U WITH CARON
  533 +01D5; C; 01D6; # LATIN CAPITAL LETTER U WITH DIAERESIS AND MACRON
  534 +01D7; C; 01D8; # LATIN CAPITAL LETTER U WITH DIAERESIS AND ACUTE
  535 +01D9; C; 01DA; # LATIN CAPITAL LETTER U WITH DIAERESIS AND CARON
  536 +01DB; C; 01DC; # LATIN CAPITAL LETTER U WITH DIAERESIS AND GRAVE
  537 +01DE; C; 01DF; # LATIN CAPITAL LETTER A WITH DIAERESIS AND MACRON
  538 +01E0; C; 01E1; # LATIN CAPITAL LETTER A WITH DOT ABOVE AND MACRON
  539 +01E2; C; 01E3; # LATIN CAPITAL LETTER AE WITH MACRON
  540 +01E4; C; 01E5; # LATIN CAPITAL LETTER G WITH STROKE
  541 +01E6; C; 01E7; # LATIN CAPITAL LETTER G WITH CARON
  542 +01E8; C; 01E9; # LATIN CAPITAL LETTER K WITH CARON
  543 +01EA; C; 01EB; # LATIN CAPITAL LETTER O WITH OGONEK
  544 +01EC; C; 01ED; # LATIN CAPITAL LETTER O WITH OGONEK AND MACRON
  545 +01EE; C; 01EF; # LATIN CAPITAL LETTER EZH WITH CARON
  546 +01F0; F; 006A 030C; # LATIN SMALL LETTER J WITH CARON
  547 +01F1; C; 01F3; # LATIN CAPITAL LETTER DZ
  548 +01F2; C; 01F3; # LATIN CAPITAL LETTER D WITH SMALL LETTER Z
  549 +01F4; C; 01F5; # LATIN CAPITAL LETTER G WITH ACUTE
  550 +01F6; C; 0195; # LATIN CAPITAL LETTER HWAIR
  551 +01F7; C; 01BF; # LATIN CAPITAL LETTER WYNN
  552 +01F8; C; 01F9; # LATIN CAPITAL LETTER N WITH GRAVE
  553 +01FA; C; 01FB; # LATIN CAPITAL LETTER A WITH RING ABOVE AND ACUTE
  554 +01FC; C; 01FD; # LATIN CAPITAL LETTER AE WITH ACUTE
  555 +01FE; C; 01FF; # LATIN CAPITAL LETTER O WITH STROKE AND ACUTE
  556 +0200; C; 0201; # LATIN CAPITAL LETTER A WITH DOUBLE GRAVE
  557 +0202; C; 0203; # LATIN CAPITAL LETTER A WITH INVERTED BREVE
  558 +0204; C; 0205; # LATIN CAPITAL LETTER E WITH DOUBLE GRAVE
  559 +0206; C; 0207; # LATIN CAPITAL LETTER E WITH INVERTED BREVE
  560 +0208; C; 0209; # LATIN CAPITAL LETTER I WITH DOUBLE GRAVE
  561 +020A; C; 020B; # LATIN CAPITAL LETTER I WITH INVERTED BREVE
  562 +020C; C; 020D; # LATIN CAPITAL LETTER O WITH DOUBLE GRAVE
  563 +020E; C; 020F; # LATIN CAPITAL LETTER O WITH INVERTED BREVE
  564 +0210; C; 0211; # LATIN CAPITAL LETTER R WITH DOUBLE GRAVE
  565 +0212; C; 0213; # LATIN CAPITAL LETTER R WITH INVERTED BREVE
  566 +0214; C; 0215; # LATIN CAPITAL LETTER U WITH DOUBLE GRAVE
  567 +0216; C; 0217; # LATIN CAPITAL LETTER U WITH INVERTED BREVE
  568 +0218; C; 0219; # LATIN CAPITAL LETTER S WITH COMMA BELOW
  569 +021A; C; 021B; # LATIN CAPITAL LETTER T WITH COMMA BELOW
  570 +021C; C; 021D; # LATIN CAPITAL LETTER YOGH
  571 +021E; C; 021F; # LATIN CAPITAL LETTER H WITH CARON
  572 +0220; C; 019E; # LATIN CAPITAL LETTER N WITH LONG RIGHT LEG
  573 +0222; C; 0223; # LATIN CAPITAL LETTER OU
  574 +0224; C; 0225; # LATIN CAPITAL LETTER Z WITH HOOK
  575 +0226; C; 0227; # LATIN CAPITAL LETTER A WITH DOT ABOVE
  576 +0228; C; 0229; # LATIN CAPITAL LETTER E WITH CEDILLA
  577 +022A; C; 022B; # LATIN CAPITAL LETTER O WITH DIAERESIS AND MACRON
  578 +022C; C; 022D; # LATIN CAPITAL LETTER O WITH TILDE AND MACRON
  579 +022E; C; 022F; # LATIN CAPITAL LETTER O WITH DOT ABOVE
  580 +0230; C; 0231; # LATIN CAPITAL LETTER O WITH DOT ABOVE AND MACRON
  581 +0232; C; 0233; # LATIN CAPITAL LETTER Y WITH MACRON
  582 +023A; C; 2C65; # LATIN CAPITAL LETTER A WITH STROKE
  583 +023B; C; 023C; # LATIN CAPITAL LETTER C WITH STROKE
  584 +023D; C; 019A; # LATIN CAPITAL LETTER L WITH BAR
  585 +023E; C; 2C66; # LATIN CAPITAL LETTER T WITH DIAGONAL STROKE
  586 +0241; C; 0242; # LATIN CAPITAL LETTER GLOTTAL STOP
  587 +0243; C; 0180; # LATIN CAPITAL LETTER B WITH STROKE
  588 +0244; C; 0289; # LATIN CAPITAL LETTER U BAR
  589 +0245; C; 028C; # LATIN CAPITAL LETTER TURNED V
  590 +0246; C; 0247; # LATIN CAPITAL LETTER E WITH STROKE
  591 +0248; C; 0249; # LATIN CAPITAL LETTER J WITH STROKE
  592 +024A; C; 024B; # LATIN CAPITAL LETTER SMALL Q WITH HOOK TAIL
  593 +024C; C; 024D; # LATIN CAPITAL LETTER R WITH STROKE
  594 +024E; C; 024F; # LATIN CAPITAL LETTER Y WITH STROKE
  595 +0345; C; 03B9; # COMBINING GREEK YPOGEGRAMMENI
  596 +0386; C; 03AC; # GREEK CAPITAL LETTER ALPHA WITH TONOS
  597 +0388; C; 03AD; # GREEK CAPITAL LETTER EPSILON WITH TONOS
  598 +0389; C; 03AE; # GREEK CAPITAL LETTER ETA WITH TONOS
  599 +038A; C; 03AF; # GREEK CAPITAL LETTER IOTA WITH TONOS
  600 +038C; C; 03CC; # GREEK CAPITAL LETTER OMICRON WITH TONOS
  601 +038E; C; 03CD; # GREEK CAPITAL LETTER UPSILON WITH TONOS
  602 +038F; C; 03CE; # GREEK CAPITAL LETTER OMEGA WITH TONOS
  603 +0390; F; 03B9 0308 0301; # GREEK SMALL LETTER IOTA WITH DIALYTIKA AND TONOS
  604 +0391; C; 03B1; # GREEK CAPITAL LETTER ALPHA
  605 +0392; C; 03B2; # GREEK CAPITAL LETTER BETA
  606 +0393; C; 03B3; # GREEK CAPITAL LETTER GAMMA
  607 +0394; C; 03B4; # GREEK CAPITAL LETTER DELTA
  608 +0395; C; 03B5; # GREEK CAPITAL LETTER EPSILON
  609 +0396; C; 03B6; # GREEK CAPITAL LETTER ZETA
  610 +0397; C; 03B7; # GREEK CAPITAL LETTER ETA
  611 +0398; C; 03B8; # GREEK CAPITAL LETTER THETA
  612 +0399; C; 03B9; # GREEK CAPITAL LETTER IOTA
  613 +039A; C; 03BA; # GREEK CAPITAL LETTER KAPPA
  614 +039B; C; 03BB; # GREEK CAPITAL LETTER LAMDA
  615 +039C; C; 03BC; # GREEK CAPITAL LETTER MU
  616 +039D; C; 03BD; # GREEK CAPITAL LETTER NU
  617 +039E; C; 03BE; # GREEK CAPITAL LETTER XI
  618 +039F; C; 03BF; # GREEK CAPITAL LETTER OMICRON
  619 +03A0; C; 03C0; # GREEK CAPITAL LETTER PI
  620 +03A1; C; 03C1; # GREEK CAPITAL LETTER RHO
  621 +03A3; C; 03C3; # GREEK CAPITAL LETTER SIGMA
  622 +03A4; C; 03C4; # GREEK CAPITAL LETTER TAU
  623 +03A5; C; 03C5; # GREEK CAPITAL LETTER UPSILON
  624 +03A6; C; 03C6; # GREEK CAPITAL LETTER PHI
  625 +03A7; C; 03C7; # GREEK CAPITAL LETTER CHI
  626 +03A8; C; 03C8; # GREEK CAPITAL LETTER PSI
  627 +03A9; C; 03C9; # GREEK CAPITAL LETTER OMEGA
  628 +03AA; C; 03CA; # GREEK CAPITAL LETTER IOTA WITH DIALYTIKA
  629 +03AB; C; 03CB; # GREEK CAPITAL LETTER UPSILON WITH DIALYTIKA
  630 +03B0; F; 03C5 0308 0301; # GREEK SMALL LETTER UPSILON WITH DIALYTIKA AND TONOS
  631 +03C2; C; 03C3; # GREEK SMALL LETTER FINAL SIGMA
  632 +03D0; C; 03B2; # GREEK BETA SYMBOL
  633 +03D1; C; 03B8; # GREEK THETA SYMBOL
  634 +03D5; C; 03C6; # GREEK PHI SYMBOL
  635 +03D6; C; 03C0; # GREEK PI SYMBOL
  636 +03D8; C; 03D9; # GREEK LETTER ARCHAIC KOPPA
  637 +03DA; C; 03DB; # GREEK LETTER STIGMA
  638 +03DC; C; 03DD; # GREEK LETTER DIGAMMA
  639 +03DE; C; 03DF; # GREEK LETTER KOPPA
  640 +03E0; C; 03E1; # GREEK LETTER SAMPI
  641 +03E2; C; 03E3; # COPTIC CAPITAL LETTER SHEI
  642 +03E4; C; 03E5; # COPTIC CAPITAL LETTER FEI
  643 +03E6; C; 03E7; # COPTIC CAPITAL LETTER KHEI
  644 +03E8; C; 03E9; # COPTIC CAPITAL LETTER HORI
  645 +03EA; C; 03EB; # COPTIC CAPITAL LETTER GANGIA
  646 +03EC; C; 03ED; # COPTIC CAPITAL LETTER SHIMA
  647 +03EE; C; 03EF; # COPTIC CAPITAL LETTER DEI
  648 +03F0; C; 03BA; # GREEK KAPPA SYMBOL
  649 +03F1; C; 03C1; # GREEK RHO SYMBOL
  650 +03F4; C; 03B8; # GREEK CAPITAL THETA SYMBOL
  651 +03F5; C; 03B5; # GREEK LUNATE EPSILON SYMBOL
  652 +03F7; C; 03F8; # GREEK CAPITAL LETTER SHO
  653 +03F9; C; 03F2; # GREEK CAPITAL LUNATE SIGMA SYMBOL
  654 +03FA; C; 03FB; # GREEK CAPITAL LETTER SAN
  655 +03FD; C; 037B; # GREEK CAPITAL REVERSED LUNATE SIGMA SYMBOL
  656 +03FE; C; 037C; # GREEK CAPITAL DOTTED LUNATE SIGMA SYMBOL
  657 +03FF; C; 037D; # GREEK CAPITAL REVERSED DOTTED LUNATE SIGMA SYMBOL
  658 +0400; C; 0450; # CYRILLIC CAPITAL LETTER IE WITH GRAVE
  659 +0401; C; 0451; # CYRILLIC CAPITAL LETTER IO
  660 +0402; C; 0452; # CYRILLIC CAPITAL LETTER DJE
  661 +0403; C; 0453; # CYRILLIC CAPITAL LETTER GJE
  662 +0404; C; 0454; # CYRILLIC CAPITAL LETTER UKRAINIAN IE
  663 +0405; C; 0455; # CYRILLIC CAPITAL LETTER DZE
  664 +0406; C; 0456; # CYRILLIC CAPITAL LETTER BYELORUSSIAN-UKRAINIAN I
  665 +0407; C; 0457; # CYRILLIC CAPITAL LETTER YI
  666 +0408; C; 0458; # CYRILLIC CAPITAL LETTER JE
  667 +0409; C; 0459; # CYRILLIC CAPITAL LETTER LJE
  668 +040A; C; 045A; # CYRILLIC CAPITAL LETTER NJE
  669 +040B; C; 045B; # CYRILLIC CAPITAL LETTER TSHE
  670 +040C; C; 045C; # CYRILLIC CAPITAL LETTER KJE
  671 +040D; C; 045D; # CYRILLIC CAPITAL LETTER I WITH GRAVE
  672 +040E; C; 045E; # CYRILLIC CAPITAL LETTER SHORT U
  673 +040F; C; 045F; # CYRILLIC CAPITAL LETTER DZHE
  674 +0410; C; 0430; # CYRILLIC CAPITAL LETTER A
  675 +0411; C; 0431; # CYRILLIC CAPITAL LETTER BE
  676 +0412; C; 0432; # CYRILLIC CAPITAL LETTER VE
  677 +0413; C; 0433; # CYRILLIC CAPITAL LETTER GHE
  678 +0414; C; 0434; # CYRILLIC CAPITAL LETTER DE
  679 +0415; C; 0435; # CYRILLIC CAPITAL LETTER IE
  680 +0416; C; 0436; # CYRILLIC CAPITAL LETTER ZHE
  681 +0417; C; 0437; # CYRILLIC CAPITAL LETTER ZE
  682 +0418; C; 0438; # CYRILLIC CAPITAL LETTER I
  683 +0419; C; 0439; # CYRILLIC CAPITAL LETTER SHORT I
  684 +041A; C; 043A; # CYRILLIC CAPITAL LETTER KA
  685 +041B; C; 043B; # CYRILLIC CAPITAL LETTER EL
  686 +041C; C; 043C; # CYRILLIC CAPITAL LETTER EM
  687 +041D; C; 043D; # CYRILLIC CAPITAL LETTER EN
  688 +041E; C; 043E; # CYRILLIC CAPITAL LETTER O
  689 +041F; C; 043F; # CYRILLIC CAPITAL LETTER PE
  690 +0420; C; 0440; # CYRILLIC CAPITAL LETTER ER
  691 +0421; C; 0441; # CYRILLIC CAPITAL LETTER ES
  692 +0422; C; 0442; # CYRILLIC CAPITAL LETTER TE
  693 +0423; C; 0443; # CYRILLIC CAPITAL LETTER U
  694 +0424; C; 0444; # CYRILLIC CAPITAL LETTER EF
  695 +0425; C; 0445; # CYRILLIC CAPITAL LETTER HA
  696 +0426; C; 0446; # CYRILLIC CAPITAL LETTER TSE
  697 +0427; C; 0447; # CYRILLIC CAPITAL LETTER CHE
  698 +0428; C; 0448; # CYRILLIC CAPITAL LETTER SHA
  699 +0429; C; 0449; # CYRILLIC CAPITAL LETTER SHCHA
  700 +042A; C; 044A; # CYRILLIC CAPITAL LETTER HARD SIGN
  701 +042B; C; 044B; # CYRILLIC CAPITAL LETTER YERU
  702 +042C; C; 044C; # CYRILLIC CAPITAL LETTER SOFT SIGN
  703 +042D; C; 044D; # CYRILLIC CAPITAL LETTER E
  704 +042E; C; 044E; # CYRILLIC CAPITAL LETTER YU
  705 +042F; C; 044F; # CYRILLIC CAPITAL LETTER YA
  706 +0460; C; 0461; # CYRILLIC CAPITAL LETTER OMEGA
  707 +0462; C; 0463; # CYRILLIC CAPITAL LETTER YAT
  708 +0464; C; 0465; # CYRILLIC CAPITAL LETTER IOTIFIED E
  709 +0466; C; 0467; # CYRILLIC CAPITAL LETTER LITTLE YUS
  710 +0468; C; 0469; # CYRILLIC CAPITAL LETTER IOTIFIED LITTLE YUS
  711 +046A; C; 046B; # CYRILLIC CAPITAL LETTER BIG YUS
  712 +046C; C; 046D; # CYRILLIC CAPITAL LETTER IOTIFIED BIG YUS
  713 +046E; C; 046F; # CYRILLIC CAPITAL LETTER KSI
  714 +0470; C; 0471; # CYRILLIC CAPITAL LETTER PSI
  715 +0472; C; 0473; # CYRILLIC CAPITAL LETTER FITA
  716 +0474; C; 0475; # CYRILLIC CAPITAL LETTER IZHITSA
  717 +0476; C; 0477; # CYRILLIC CAPITAL LETTER IZHITSA WITH DOUBLE GRAVE ACCENT
  718 +0478; C; 0479; # CYRILLIC CAPITAL LETTER UK
  719 +047A; C; 047B; # CYRILLIC CAPITAL LETTER ROUND OMEGA
  720 +047C; C; 047D; # CYRILLIC CAPITAL LETTER OMEGA WITH TITLO
  721 +047E; C; 047F; # CYRILLIC CAPITAL LETTER OT
  722 +0480; C; 0481; # CYRILLIC CAPITAL LETTER KOPPA
  723 +048A; C; 048B; # CYRILLIC CAPITAL LETTER SHORT I WITH TAIL
  724 +048C; C; 048D; # CYRILLIC CAPITAL LETTER SEMISOFT SIGN
  725 +048E; C; 048F; # CYRILLIC CAPITAL LETTER ER WITH TICK
  726 +0490; C; 0491; # CYRILLIC CAPITAL LETTER GHE WITH UPTURN
  727 +0492; C; 0493; # CYRILLIC CAPITAL LETTER GHE WITH STROKE
  728 +0494; C; 0495; # CYRILLIC CAPITAL LETTER GHE WITH MIDDLE HOOK
  729 +0496; C; 0497; # CYRILLIC CAPITAL LETTER ZHE WITH DESCENDER
  730 +0498; C; 0499; # CYRILLIC CAPITAL LETTER ZE WITH DESCENDER
  731 +049A; C; 049B; # CYRILLIC CAPITAL LETTER KA WITH DESCENDER
  732 +049C; C; 049D; # CYRILLIC CAPITAL LETTER KA WITH VERTICAL STROKE
  733 +049E; C; 049F; # CYRILLIC CAPITAL LETTER KA WITH STROKE
  734 +04A0; C; 04A1; # CYRILLIC CAPITAL LETTER BASHKIR KA
  735 +04A2; C; 04A3; # CYRILLIC CAPITAL LETTER EN WITH DESCENDER
  736 +04A4; C; 04A5; # CYRILLIC CAPITAL LIGATURE EN GHE
  737 +04A6; C; 04A7; # CYRILLIC CAPITAL LETTER PE WITH MIDDLE HOOK
  738 +04A8; C; 04A9; # CYRILLIC CAPITAL LETTER ABKHASIAN HA
  739 +04AA; C; 04AB; # CYRILLIC CAPITAL LETTER ES WITH DESCENDER
  740 +04AC; C; 04AD; # CYRILLIC CAPITAL LETTER TE WITH DESCENDER
  741 +04AE; C; 04AF; # CYRILLIC CAPITAL LETTER STRAIGHT U
  742 +04B0; C; 04B1; # CYRILLIC CAPITAL LETTER STRAIGHT U WITH STROKE
  743 +04B2; C; 04B3; # CYRILLIC CAPITAL LETTER HA WITH DESCENDER
  744 +04B4; C; 04B5; # CYRILLIC CAPITAL LIGATURE TE TSE
  745 +04B6; C; 04B7; # CYRILLIC CAPITAL LETTER CHE WITH DESCENDER
  746 +04B8; C; 04B9; # CYRILLIC CAPITAL LETTER CHE WITH VERTICAL STROKE
  747 +04BA; C; 04BB; # CYRILLIC CAPITAL LETTER SHHA
  748 +04BC; C; 04BD; # CYRILLIC CAPITAL LETTER ABKHASIAN CHE
  749 +04BE; C; 04BF; # CYRILLIC CAPITAL LETTER ABKHASIAN CHE WITH DESCENDER
  750 +04C0; C; 04CF; # CYRILLIC LETTER PALOCHKA
  751 +04C1; C; 04C2; # CYRILLIC CAPITAL LETTER ZHE WITH BREVE
  752 +04C3; C; 04C4; # CYRILLIC CAPITAL LETTER KA WITH HOOK
  753 +04C5; C; 04C6; # CYRILLIC CAPITAL LETTER EL WITH TAIL
  754 +04C7; C; 04C8; # CYRILLIC CAPITAL LETTER EN WITH HOOK
  755 +04C9; C; 04CA; # CYRILLIC CAPITAL LETTER EN WITH TAIL
  756 +04CB; C; 04CC; # CYRILLIC CAPITAL LETTER KHAKASSIAN CHE
  757 +04CD; C; 04CE; # CYRILLIC CAPITAL LETTER EM WITH TAIL
  758 +04D0; C; 04D1; # CYRILLIC CAPITAL LETTER A WITH BREVE
  759 +04D2; C; 04D3; # CYRILLIC CAPITAL LETTER A WITH DIAERESIS
  760 +04D4; C; 04D5; # CYRILLIC CAPITAL LIGATURE A IE
  761 +04D6; C; 04D7; # CYRILLIC CAPITAL LETTER IE WITH BREVE
  762 +04D8; C; 04D9; # CYRILLIC CAPITAL LETTER SCHWA
  763 +04DA; C; 04DB; # CYRILLIC CAPITAL LETTER SCHWA WITH DIAERESIS
  764 +04DC; C; 04DD; # CYRILLIC CAPITAL LETTER ZHE WITH DIAERESIS
  765 +04DE; C; 04DF; # CYRILLIC CAPITAL LETTER ZE WITH DIAERESIS
  766 +04E0; C; 04E1; # CYRILLIC CAPITAL LETTER ABKHASIAN DZE
  767 +04E2; C; 04E3; # CYRILLIC CAPITAL LETTER I WITH MACRON
  768 +04E4; C; 04E5; # CYRILLIC CAPITAL LETTER I WITH DIAERESIS
  769 +04E6; C; 04E7; # CYRILLIC CAPITAL LETTER O WITH DIAERESIS
  770 +04E8; C; 04E9; # CYRILLIC CAPITAL LETTER BARRED O
  771 +04EA; C; 04EB; # CYRILLIC CAPITAL LETTER BARRED O WITH DIAERESIS
  772 +04EC; C; 04ED; # CYRILLIC CAPITAL LETTER E WITH DIAERESIS
  773 +04EE; C; 04EF; # CYRILLIC CAPITAL LETTER U WITH MACRON
  774 +04F0; C; 04F1; # CYRILLIC CAPITAL LETTER U WITH DIAERESIS
  775 +04F2; C; 04F3; # CYRILLIC CAPITAL LETTER U WITH DOUBLE ACUTE
  776 +04F4; C; 04F5; # CYRILLIC CAPITAL LETTER CHE WITH DIAERESIS
  777 +04F6; C; 04F7; # CYRILLIC CAPITAL LETTER GHE WITH DESCENDER
  778 +04F8; C; 04F9; # CYRILLIC CAPITAL LETTER YERU WITH DIAERESIS
  779 +04FA; C; 04FB; # CYRILLIC CAPITAL LETTER GHE WITH STROKE AND HOOK
  780 +04FC; C; 04FD; # CYRILLIC CAPITAL LETTER HA WITH HOOK
  781 +04FE; C; 04FF; # CYRILLIC CAPITAL LETTER HA WITH STROKE
  782 +0500; C; 0501; # CYRILLIC CAPITAL LETTER KOMI DE
  783 +0502; C; 0503; # CYRILLIC CAPITAL LETTER KOMI DJE
  784 +0504; C; 0505; # CYRILLIC CAPITAL LETTER KOMI ZJE
  785 +0506; C; 0507; # CYRILLIC CAPITAL LETTER KOMI DZJE
  786 +0508; C; 0509; # CYRILLIC CAPITAL LETTER KOMI LJE
  787 +050A; C; 050B; # CYRILLIC CAPITAL LETTER KOMI NJE
  788 +050C; C; 050D; # CYRILLIC CAPITAL LETTER KOMI SJE
  789 +050E; C; 050F; # CYRILLIC CAPITAL LETTER KOMI TJE
  790 +0510; C; 0511; # CYRILLIC CAPITAL LETTER REVERSED ZE
  791 +0512; C; 0513; # CYRILLIC CAPITAL LETTER EL WITH HOOK
  792 +0531; C; 0561; # ARMENIAN CAPITAL LETTER AYB
  793 +0532; C; 0562; # ARMENIAN CAPITAL LETTER BEN
  794 +0533; C; 0563; # ARMENIAN CAPITAL LETTER GIM
  795 +0534; C; 0564; # ARMENIAN CAPITAL LETTER DA
  796 +0535; C; 0565; # ARMENIAN CAPITAL LETTER ECH
  797 +0536; C; 0566; # ARMENIAN CAPITAL LETTER ZA
  798 +0537; C; 0567; # ARMENIAN CAPITAL LETTER EH
  799 +0538; C; 0568; # ARMENIAN CAPITAL LETTER ET
  800 +0539; C; 0569; # ARMENIAN CAPITAL LETTER TO
  801 +053A; C; 056A; # ARMENIAN CAPITAL LETTER ZHE
  802 +053B; C; 056B; # ARMENIAN CAPITAL LETTER INI
  803 +053C; C; 056C; # ARMENIAN CAPITAL LETTER LIWN
  804 +053D; C; 056D; # ARMENIAN CAPITAL LETTER XEH
  805 +053E; C; 056E; # ARMENIAN CAPITAL LETTER CA
  806 +053F; C; 056F; # ARMENIAN CAPITAL LETTER KEN
  807 +0540; C; 0570; # ARMENIAN CAPITAL LETTER HO
  808 +0541; C; 0571; # ARMENIAN CAPITAL LETTER JA
  809 +0542; C; 0572; # ARMENIAN CAPITAL LETTER GHAD
  810 +0543; C; 0573; # ARMENIAN CAPITAL LETTER CHEH
  811 +0544; C; 0574; # ARMENIAN CAPITAL LETTER MEN
  812 +0545; C; 0575; # ARMENIAN CAPITAL LETTER YI
  813 +0546; C; 0576; # ARMENIAN CAPITAL LETTER NOW
  814 +0547; C; 0577; # ARMENIAN CAPITAL LETTER SHA
  815 +0548; C; 0578; # ARMENIAN CAPITAL LETTER VO
  816 +0549; C; 0579; # ARMENIAN CAPITAL LETTER CHA
  817 +054A; C; 057A; # ARMENIAN CAPITAL LETTER PEH
  818 +054B; C; 057B; # ARMENIAN CAPITAL LETTER JHEH
  819 +054C; C; 057C; # ARMENIAN CAPITAL LETTER RA
  820 +054D; C; 057D; # ARMENIAN CAPITAL LETTER SEH
  821 +054E; C; 057E; # ARMENIAN CAPITAL LETTER VEW
  822 +054F; C; 057F; # ARMENIAN CAPITAL LETTER TIWN
  823 +0550; C; 0580; # ARMENIAN CAPITAL LETTER REH
  824 +0551; C; 0581; # ARMENIAN CAPITAL LETTER CO
  825 +0552; C; 0582; # ARMENIAN CAPITAL LETTER YIWN
  826 +0553; C; 0583; # ARMENIAN CAPITAL LETTER PIWR
  827 +0554; C; 0584; # ARMENIAN CAPITAL LETTER KEH
  828 +0555; C; 0585; # ARMENIAN CAPITAL LETTER OH
  829 +0556; C; 0586; # ARMENIAN CAPITAL LETTER FEH
  830 +0587; F; 0565 0582; # ARMENIAN SMALL LIGATURE ECH YIWN
  831 +10A0; C; 2D00; # GEORGIAN CAPITAL LETTER AN
  832 +10A1; C; 2D01; # GEORGIAN CAPITAL LETTER BAN
  833 +10A2; C; 2D02; # GEORGIAN CAPITAL LETTER GAN
  834 +10A3; C; 2D03; # GEORGIAN CAPITAL LETTER DON
  835 +10A4; C; 2D04; # GEORGIAN CAPITAL LETTER EN
  836 +10A5; C; 2D05; # GEORGIAN CAPITAL LETTER VIN
  837 +10A6; C; 2D06; # GEORGIAN CAPITAL LETTER ZEN
  838 +10A7; C; 2D07; # GEORGIAN CAPITAL LETTER TAN
  839 +10A8; C; 2D08; # GEORGIAN CAPITAL LETTER IN
  840 +10A9; C; 2D09; # GEORGIAN CAPITAL LETTER KAN
  841 +10AA; C; 2D0A; # GEORGIAN CAPITAL LETTER LAS
  842 +10AB; C; 2D0B; # GEORGIAN CAPITAL LETTER MAN
  843 +10AC; C; 2D0C; # GEORGIAN CAPITAL LETTER NAR
  844 +10AD; C; 2D0D; # GEORGIAN CAPITAL LETTER ON
  845 +10AE; C; 2D0E; # GEORGIAN CAPITAL LETTER PAR
  846 +10AF; C; 2D0F; # GEORGIAN CAPITAL LETTER ZHAR
  847 +10B0; C; 2D10; # GEORGIAN CAPITAL LETTER RAE
  848 +10B1; C; 2D11; # GEORGIAN CAPITAL LETTER SAN
  849 +10B2; C; 2D12; # GEORGIAN CAPITAL LETTER TAR
  850 +10B3; C; 2D13; # GEORGIAN CAPITAL LETTER UN
  851 +10B4; C; 2D14; # GEORGIAN CAPITAL LETTER PHAR
  852 +10B5; C; 2D15; # GEORGIAN CAPITAL LETTER KHAR
  853 +10B6; C; 2D16; # GEORGIAN CAPITAL LETTER GHAN
  854 +10B7; C; 2D17; # GEORGIAN CAPITAL LETTER QAR
  855 +10B8; C; 2D18; # GEORGIAN CAPITAL LETTER SHIN
  856 +10B9; C; 2D19; # GEORGIAN CAPITAL LETTER CHIN
  857 +10BA; C; 2D1A; # GEORGIAN CAPITAL LETTER CAN
  858 +10BB; C; 2D1B; # GEORGIAN CAPITAL LETTER JIL
  859 +10BC; C; 2D1C; # GEORGIAN CAPITAL LETTER CIL
  860 +10BD; C; 2D1D; # GEORGIAN CAPITAL LETTER CHAR
  861 +10BE; C; 2D1E; # GEORGIAN CAPITAL LETTER XAN
  862 +10BF; C; 2D1F; # GEORGIAN CAPITAL LETTER JHAN
  863 +10C0; C; 2D20; # GEORGIAN CAPITAL LETTER HAE
  864 +10C1; C; 2D21; # GEORGIAN CAPITAL LETTER HE
  865 +10C2; C; 2D22; # GEORGIAN CAPITAL LETTER HIE
  866 +10C3; C; 2D23; # GEORGIAN CAPITAL LETTER WE
  867 +10C4; C; 2D24; # GEORGIAN CAPITAL LETTER HAR
  868 +10C5; C; 2D25; # GEORGIAN CAPITAL LETTER HOE
  869 +1E00; C; 1E01; # LATIN CAPITAL LETTER A WITH RING BELOW
  870 +1E02; C; 1E03; # LATIN CAPITAL LETTER B WITH DOT ABOVE
  871 +1E04; C; 1E05; # LATIN CAPITAL LETTER B WITH DOT BELOW
  872 +1E06; C; 1E07; # LATIN CAPITAL LETTER B WITH LINE BELOW
  873 +1E08; C; 1E09; # LATIN CAPITAL LETTER C WITH CEDILLA AND ACUTE
  874 +1E0A; C; 1E0B; # LATIN CAPITAL LETTER D WITH DOT ABOVE
  875 +1E0C; C; 1E0D; # LATIN CAPITAL LETTER D WITH DOT BELOW
  876 +1E0E; C; 1E0F; # LATIN CAPITAL LETTER D WITH LINE BELOW
  877 +1E10; C; 1E11; # LATIN CAPITAL LETTER D WITH CEDILLA
  878 +1E12; C; 1E13; # LATIN CAPITAL LETTER D WITH CIRCUMFLEX BELOW
  879 +1E14; C; 1E15; # LATIN CAPITAL LETTER E WITH MACRON AND GRAVE
  880 +1E16; C; 1E17; # LATIN CAPITAL LETTER E WITH MACRON AND ACUTE
  881 +1E18; C; 1E19; # LATIN CAPITAL LETTER E WITH CIRCUMFLEX BELOW
  882 +1E1A; C; 1E1B; # LATIN CAPITAL LETTER E WITH TILDE BELOW
  883 +1E1C; C; 1E1D; # LATIN CAPITAL LETTER E WITH CEDILLA AND BREVE
  884 +1E1E; C; 1E1F; # LATIN CAPITAL LETTER F WITH DOT ABOVE
  885 +1E20; C; 1E21; # LATIN CAPITAL LETTER G WITH MACRON
  886 +1E22; C; 1E23; # LATIN CAPITAL LETTER H WITH DOT ABOVE
  887 +1E24; C; 1E25; # LATIN CAPITAL LETTER H WITH DOT BELOW
  888 +1E26; C; 1E27; # LATIN CAPITAL LETTER H WITH DIAERESIS
  889 +1E28; C; 1E29; # LATIN CAPITAL LETTER H WITH CEDILLA
  890 +1E2A; C; 1E2B; # LATIN CAPITAL LETTER H WITH BREVE BELOW
  891 +1E2C; C; 1E2D; # LATIN CAPITAL LETTER I WITH TILDE BELOW
  892 +1E2E; C; 1E2F; # LATIN CAPITAL LETTER I WITH DIAERESIS AND ACUTE
  893 +1E30; C; 1E31; # LATIN CAPITAL LETTER K WITH ACUTE
  894 +1E32; C; 1E33; # LATIN CAPITAL LETTER K WITH DOT BELOW
  895 +1E34; C; 1E35; # LATIN CAPITAL LETTER K WITH LINE BELOW
  896 +1E36; C; 1E37; # LATIN CAPITAL LETTER L WITH DOT BELOW
  897 +1E38; C; 1E39; # LATIN CAPITAL LETTER L WITH DOT BELOW AND MACRON
  898 +1E3A; C; 1E3B; # LATIN CAPITAL LETTER L WITH LINE BELOW
  899 +1E3C; C; 1E3D; # LATIN CAPITAL LETTER L WITH CIRCUMFLEX BELOW
  900 +1E3E; C; 1E3F; # LATIN CAPITAL LETTER M WITH ACUTE
  901 +1E40; C; 1E41; # LATIN CAPITAL LETTER M WITH DOT ABOVE
  902 +1E42; C; 1E43; # LATIN CAPITAL LETTER M WITH DOT BELOW
  903 +1E44; C; 1E45; # LATIN CAPITAL LETTER N WITH DOT ABOVE
  904 +1E46; C; 1E47; # LATIN CAPITAL LETTER N WITH DOT BELOW
  905 +1E48; C; 1E49; # LATIN CAPITAL LETTER N WITH LINE BELOW
  906 +1E4A; C; 1E4B; # LATIN CAPITAL LETTER N WITH CIRCUMFLEX BELOW
  907 +1E4C; C; 1E4D; # LATIN CAPITAL LETTER O WITH TILDE AND ACUTE
  908 +1E4E; C; 1E4F; # LATIN CAPITAL LETTER O WITH TILDE AND DIAERESIS
  909 +1E50; C; 1E51; # LATIN CAPITAL LETTER O WITH MACRON AND GRAVE
  910 +1E52; C; 1E53; # LATIN CAPITAL LETTER O WITH MACRON AND ACUTE
  911 +1E54; C; 1E55; # LATIN CAPITAL LETTER P WITH ACUTE
  912 +1E56; C; 1E57; # LATIN CAPITAL LETTER P WITH DOT ABOVE
  913 +1E58; C; 1E59; # LATIN CAPITAL LETTER R WITH DOT ABOVE
  914 +1E5A; C; 1E5B; # LATIN CAPITAL LETTER R WITH DOT BELOW
  915 +1E5C; C; 1E5D; # LATIN CAPITAL LETTER R WITH DOT BELOW AND MACRON
  916 +1E5E; C; 1E5F; # LATIN CAPITAL LETTER R WITH LINE BELOW
  917 +1E60; C; 1E61; # LATIN CAPITAL LETTER S WITH DOT ABOVE
  918 +1E62; C; 1E63; # LATIN CAPITAL LETTER S WITH DOT BELOW
  919 +1E64; C; 1E65; # LATIN CAPITAL LETTER S WITH ACUTE AND DOT ABOVE
  920 +1E66; C; 1E67; # LATIN CAPITAL LETTER S WITH CARON AND DOT ABOVE
  921 +1E68; C; 1E69; # LATIN CAPITAL LETTER S WITH DOT BELOW AND DOT ABOVE
  922 +1E6A; C; 1E6B; # LATIN CAPITAL LETTER T WITH DOT ABOVE
  923 +1E6C; C; 1E6D; # LATIN CAPITAL LETTER T WITH DOT BELOW
  924 +1E6E; C; 1E6F; # LATIN CAPITAL LETTER T WITH LINE BELOW
  925 +1E70; C; 1E71; # LATIN CAPITAL LETTER T WITH CIRCUMFLEX BELOW
  926 +1E72; C; 1E73; # LATIN CAPITAL LETTER U WITH DIAERESIS BELOW
  927 +1E74; C; 1E75; # LATIN CAPITAL LETTER U WITH TILDE BELOW
  928 +1E76; C; 1E77; # LATIN CAPITAL LETTER U WITH CIRCUMFLEX BELOW
  929 +1E78; C; 1E79; # LATIN CAPITAL LETTER U WITH TILDE AND ACUTE
  930 +1E7A; C; 1E7B; # LATIN CAPITAL LETTER U WITH MACRON AND DIAERESIS
  931 +1E7C; C; 1E7D; # LATIN CAPITAL LETTER V WITH TILDE
  932 +1E7E; C; 1E7F; # LATIN CAPITAL LETTER V WITH DOT BELOW
  933 +1E80; C; 1E81; # LATIN CAPITAL LETTER W WITH GRAVE
  934 +1E82; C; 1E83; # LATIN CAPITAL LETTER W WITH ACUTE
  935 +1E84; C; 1E85; # LATIN CAPITAL LETTER W WITH DIAERESIS
  936 +1E86; C; 1E87; # LATIN CAPITAL LETTER W WITH DOT ABOVE
  937 +1E88; C; 1E89; # LATIN CAPITAL LETTER W WITH DOT BELOW
  938 +1E8A; C; 1E8B; # LATIN CAPITAL LETTER X WITH DOT ABOVE
  939 +1E8C; C; 1E8D; # LATIN CAPITAL LETTER X WITH DIAERESIS
  940 +1E8E; C; 1E8F; # LATIN CAPITAL LETTER Y WITH DOT ABOVE
  941 +1E90; C; 1E91; # LATIN CAPITAL LETTER Z WITH CIRCUMFLEX
  942 +1E92; C; 1E93; # LATIN CAPITAL LETTER Z WITH DOT BELOW
  943 +1E94; C; 1E95; # LATIN CAPITAL LETTER Z WITH LINE BELOW
  944 +1E96; F; 0068 0331; # LATIN SMALL LETTER H WITH LINE BELOW
  945 +1E97; F; 0074 0308; # LATIN SMALL LETTER T WITH DIAERESIS
  946 +1E98; F; 0077 030A; # LATIN SMALL LETTER W WITH RING ABOVE
  947 +1E99; F; 0079 030A; # LATIN SMALL LETTER Y WITH RING ABOVE
  948 +1E9A; F; 0061 02BE; # LATIN SMALL LETTER A WITH RIGHT HALF RING
  949 +1E9B; C; 1E61; # LATIN SMALL LETTER LONG S WITH DOT ABOVE
  950 +1EA0; C; 1EA1; # LATIN CAPITAL LETTER A WITH DOT BELOW
  951 +1EA2; C; 1EA3; # LATIN CAPITAL LETTER A WITH HOOK ABOVE
  952 +1EA4; C; 1EA5; # LATIN CAPITAL LETTER A WITH CIRCUMFLEX AND ACUTE
  953 +1EA6; C; 1EA7; # LATIN CAPITAL LETTER A WITH CIRCUMFLEX AND GRAVE
  954 +1EA8; C; 1EA9; # LATIN CAPITAL LETTER A WITH CIRCUMFLEX AND HOOK ABOVE
  955 +1EAA; C; 1EAB; # LATIN CAPITAL LETTER A WITH CIRCUMFLEX AND TILDE
  956 +1EAC; C; 1EAD; # LATIN CAPITAL LETTER A WITH CIRCUMFLEX AND DOT BELOW
  957 +1EAE; C; 1EAF; # LATIN CAPITAL LETTER A WITH BREVE AND ACUTE
  958 +1EB0; C; 1EB1; # LATIN CAPITAL LETTER A WITH BREVE AND GRAVE
  959 +1EB2; C; 1EB3; # LATIN CAPITAL LETTER A WITH BREVE AND HOOK ABOVE
  960 +1EB4; C; 1EB5; # LATIN CAPITAL LETTER A WITH BREVE AND TILDE
  961 +1EB6; C; 1EB7; # LATIN CAPITAL LETTER A WITH BREVE AND DOT BELOW
  962 +1EB8; C; 1EB9; # LATIN CAPITAL LETTER E WITH DOT BELOW
  963 +1EBA; C; 1EBB; # LATIN CAPITAL LETTER E WITH HOOK ABOVE
  964 +1EBC; C; 1EBD; # LATIN CAPITAL LETTER E WITH TILDE
  965 +1EBE; C; 1EBF; # LATIN CAPITAL LETTER E WITH CIRCUMFLEX AND ACUTE
  966 +1EC0; C; 1EC1; # LATIN CAPITAL LETTER E WITH CIRCUMFLEX AND GRAVE
  967 +1EC2; C; 1EC3; # LATIN CAPITAL LETTER E WITH CIRCUMFLEX AND HOOK ABOVE
  968 +1EC4; C; 1EC5; # LATIN CAPITAL LETTER E WITH CIRCUMFLEX AND TILDE
  969 +1EC6; C; 1EC7; # LATIN CAPITAL LETTER E WITH CIRCUMFLEX AND DOT BELOW
  970 +1EC8; C; 1EC9; # LATIN CAPITAL LETTER I WITH HOOK ABOVE
  971 +1ECA; C; 1ECB; # LATIN CAPITAL LETTER I WITH DOT BELOW
  972 +1ECC; C; 1ECD; # LATIN CAPITAL LETTER O WITH DOT BELOW
  973 +1ECE; C; 1ECF; # LATIN CAPITAL LETTER O WITH HOOK ABOVE
  974 +1ED0; C; 1ED1; # LATIN CAPITAL LETTER O WITH CIRCUMFLEX AND ACUTE
  975 +1ED2; C; 1ED3; # LATIN CAPITAL LETTER O WITH CIRCUMFLEX AND GRAVE
  976 +1ED4; C; 1ED5; # LATIN CAPITAL LETTER O WITH CIRCUMFLEX AND HOOK ABOVE
  977 +1ED6; C; 1ED7; # LATIN CAPITAL LETTER O WITH CIRCUMFLEX AND TILDE
  978 +1ED8; C; 1ED9; # LATIN CAPITAL LETTER O WITH CIRCUMFLEX AND DOT BELOW
  979 +1EDA; C; 1EDB; # LATIN CAPITAL LETTER O WITH HORN AND ACUTE
  980 +1EDC; C; 1EDD; # LATIN CAPITAL LETTER O WITH HORN AND GRAVE
  981 +1EDE; C; 1EDF; # LATIN CAPITAL LETTER O WITH HORN AND HOOK ABOVE
  982 +1EE0; C; 1EE1; # LATIN CAPITAL LETTER O WITH HORN AND TILDE
  983 +1EE2; C; 1EE3; # LATIN CAPITAL LETTER O WITH HORN AND DOT BELOW
  984 +1EE4; C; 1EE5; # LATIN CAPITAL LETTER U WITH DOT BELOW
  985 +1EE6; C; 1EE7; # LATIN CAPITAL LETTER U WITH HOOK ABOVE
  986 +1EE8; C; 1EE9; # LATIN CAPITAL LETTER U WITH HORN AND ACUTE
  987 +1EEA; C; 1EEB; # LATIN CAPITAL LETTER U WITH HORN AND GRAVE
  988 +1EEC; C; 1EED; # LATIN CAPITAL LETTER U WITH HORN AND HOOK ABOVE
  989 +1EEE; C; 1EEF; # LATIN CAPITAL LETTER U WITH HORN AND TILDE
  990 +1EF0; C; 1EF1; # LATIN CAPITAL LETTER U WITH HORN AND DOT BELOW
  991 +1EF2; C; 1EF3; # LATIN CAPITAL LETTER Y WITH GRAVE
  992 +1EF4; C; 1EF5; # LATIN CAPITAL LETTER Y WITH DOT BELOW
  993 +1EF6; C; 1EF7; # LATIN CAPITAL LETTER Y WITH HOOK ABOVE
  994 +1EF8; C; 1EF9; # LATIN CAPITAL LETTER Y WITH TILDE
  995 +1F08; C; 1F00; # GREEK CAPITAL LETTER ALPHA WITH PSILI
  996 +1F09; C; 1F01; # GREEK CAPITAL LETTER ALPHA WITH DASIA
  997 +1F0A; C; 1F02; # GREEK CAPITAL LETTER ALPHA WITH PSILI AND VARIA
  998 +1F0B; C; 1F03; # GREEK CAPITAL LETTER ALPHA WITH DASIA AND VARIA
  999 +1F0C; C; 1F04; # GREEK CAPITAL LETTER ALPHA WITH PSILI AND OXIA
  1000 +1F0D; C; 1F05; # GREEK CAPITAL LETTER ALPHA WITH DASIA AND OXIA
  1001 +1F0E; C; 1F06; # GREEK CAPITAL LETTER ALPHA WITH PSILI AND PERISPOMENI
  1002 +1F0F; C; 1F07; # GREEK CAPITAL LETTER ALPHA WITH DASIA AND PERISPOMENI
  1003 +1F18; C; 1F10; # GREEK CAPITAL LETTER EPSILON WITH PSILI
  1004 +1F19; C; 1F11; # GREEK CAPITAL LETTER EPSILON WITH DASIA
  1005 +1F1A; C; 1F12; # GREEK CAPITAL LETTER EPSILON WITH PSILI AND VARIA
  1006 +1F1B; C; 1F13; # GREEK CAPITAL LETTER EPSILON WITH DASIA AND VARIA
  1007 +1F1C; C; 1F14; # GREEK CAPITAL LETTER EPSILON WITH PSILI AND OXIA
  1008 +1F1D; C; 1F15; # GREEK CAPITAL LETTER EPSILON WITH DASIA AND OXIA
  1009 +1F28; C; 1F20; # GREEK CAPITAL LETTER ETA WITH PSILI
  1010 +1F29; C; 1F21; # GREEK CAPITAL LETTER ETA WITH DASIA
  1011 +1F2A; C; 1F22; # GREEK CAPITAL LETTER ETA WITH PSILI AND VARIA
  1012 +1F2B; C; 1F23; # GREEK CAPITAL LETTER ETA WITH DASIA AND VARIA
  1013 +1F2C; C; 1F24; # GREEK CAPITAL LETTER ETA WITH PSILI AND OXIA
  1014 +1F2D; C; 1F25; # GREEK CAPITAL LETTER ETA WITH DASIA AND OXIA
  1015 +1F2E; C; 1F26; # GREEK CAPITAL LETTER ETA WITH PSILI AND PERISPOMENI
  1016 +1F2F; C; 1F27; # GREEK CAPITAL LETTER ETA WITH DASIA AND PERISPOMENI
  1017 +1F38; C; 1F30; # GREEK CAPITAL LETTER IOTA WITH PSILI
  1018 +1F39; C; 1F31; # GREEK CAPITAL LETTER IOTA WITH DASIA
  1019 +1F3A; C; 1F32; # GREEK CAPITAL LETTER IOTA WITH PSILI AND VARIA
  1020 +1F3B; C; 1F33; # GREEK CAPITAL LETTER IOTA WITH DASIA AND VARIA
  1021 +1F3C; C; 1F34; # GREEK CAPITAL LETTER IOTA WITH PSILI AND OXIA
  1022 +1F3D; C; 1F35; # GREEK CAPITAL LETTER IOTA WITH DASIA AND OXIA
  1023 +1F3E; C; 1F36; # GREEK CAPITAL LETTER IOTA WITH PSILI AND PERISPOMENI
  1024 +1F3F; C; 1F37; # GREEK CAPITAL LETTER IOTA WITH DASIA AND PERISPOMENI
  1025 +1F48; C; 1F40; # GREEK CAPITAL LETTER OMICRON WITH PSILI
  1026 +1F49; C; 1F41; # GREEK CAPITAL LETTER OMICRON WITH DASIA
  1027 +1F4A; C; 1F42; # GREEK CAPITAL LETTER OMICRON WITH PSILI AND VARIA
  1028 +1F4B; C; 1F43; # GREEK CAPITAL LETTER OMICRON WITH DASIA AND VARIA
  1029 +1F4C; C; 1F44; # GREEK CAPITAL LETTER OMICRON WITH PSILI AND OXIA
  1030 +1F4D; C; 1F45; # GREEK CAPITAL LETTER OMICRON WITH DASIA AND OXIA
  1031 +1F50; F; 03C5 0313; # GREEK SMALL LETTER UPSILON WITH PSILI
  1032 +1F52; F; 03C5 0313 0300; # GREEK SMALL LETTER UPSILON WITH PSILI AND VARIA
  1033 +1F54; F; 03C5 0313 0301; # GREEK SMALL LETTER UPSILON WITH PSILI AND OXIA
  1034 +1F56; F; 03C5 0313 0342; # GREEK SMALL LETTER UPSILON WITH PSILI AND PERISPOMENI
  1035 +1F59; C; 1F51; # GREEK CAPITAL LETTER UPSILON WITH DASIA
  1036 +1F5B; C; 1F53; # GREEK CAPITAL LETTER UPSILON WITH DASIA AND VARIA
  1037 +1F5D; C; 1F55; # GREEK CAPITAL LETTER UPSILON WITH DASIA AND OXIA
  1038 +1F5F; C; 1F57; # GREEK CAPITAL LETTER UPSILON WITH DASIA AND PERISPOMENI
  1039 +1F68; C; 1F60; # GREEK CAPITAL LETTER OMEGA WITH PSILI
  1040 +1F69; C; 1F61; # GREEK CAPITAL LETTER OMEGA WITH DASIA
  1041 +1F6A; C; 1F62; # GREEK CAPITAL LETTER OMEGA WITH PSILI AND VARIA
  1042 +1F6B; C; 1F63; # GREEK CAPITAL LETTER OMEGA WITH DASIA AND VARIA
  1043 +1F6C; C; 1F64; # GREEK CAPITAL LETTER OMEGA WITH PSILI AND OXIA
  1044 +1F6D; C; 1F65; # GREEK CAPITAL LETTER OMEGA WITH DASIA AND OXIA
  1045 +1F6E; C; 1F66; # GREEK CAPITAL LETTER OMEGA WITH PSILI AND PERISPOMENI
  1046 +1F6F; C; 1F67; # GREEK CAPITAL LETTER OMEGA WITH DASIA AND PERISPOMENI
  1047 +1F80; F; 1F00 03B9; # GREEK SMALL LETTER ALPHA WITH PSILI AND YPOGEGRAMMENI
  1048 +1F81; F; 1F01 03B9; # GREEK SMALL LETTER ALPHA WITH DASIA AND YPOGEGRAMMENI
  1049 +1F82; F; 1F02 03B9; # GREEK SMALL LETTER ALPHA WITH PSILI AND VARIA AND YPOGEGRAMMENI
  1050 +1F83; F; 1F03 03B9; # GREEK SMALL LETTER ALPHA WITH DASIA AND VARIA AND YPOGEGRAMMENI
  1051 +1F84; F; 1F04 03B9; # GREEK SMALL LETTER ALPHA WITH PSILI AND OXIA AND YPOGEGRAMMENI
  1052 +1F85; F; 1F05 03B9; # GREEK SMALL LETTER ALPHA WITH DASIA AND OXIA AND YPOGEGRAMMENI
  1053 +1F86; F; 1F06 03B9; # GREEK SMALL LETTER ALPHA WITH PSILI AND PERISPOMENI AND YPOGEGRAMMENI
  1054 +1F87; F; 1F07 03B9; # GREEK SMALL LETTER ALPHA WITH DASIA AND PERISPOMENI AND YPOGEGRAMMENI
  1055 +1F88; F; 1F00 03B9; # GREEK CAPITAL LETTER ALPHA WITH PSILI AND PROSGEGRAMMENI
  1056 +1F88; S; 1F80; # GREEK CAPITAL LETTER ALPHA WITH PSILI AND PROSGEGRAMMENI
  1057 +1F89; F; 1F01 03B9; # GREEK CAPITAL LETTER ALPHA WITH DASIA AND PROSGEGRAMMENI
  1058 +1F89; S; 1F81; # GREEK CAPITAL LETTER ALPHA WITH DASIA AND PROSGEGRAMMENI
  1059 +1F8A; F; 1F02 03B9; # GREEK CAPITAL LETTER ALPHA WITH PSILI AND VARIA AND PROSGEGRAMMENI
  1060 +1F8A; S; 1F82; # GREEK CAPITAL LETTER ALPHA WITH PSILI AND VARIA AND PROSGEGRAMMENI
  1061 +1F8B; F; 1F03 03B9; # GREEK CAPITAL LETTER ALPHA WITH DASIA AND VARIA AND PROSGEGRAMMENI
  1062 +1F8B; S; 1F83; # GREEK CAPITAL LETTER ALPHA WITH DASIA AND VARIA AND PROSGEGRAMMENI
  1063 +1F8C; F; 1F04 03B9; # GREEK CAPITAL LETTER ALPHA WITH PSILI AND OXIA AND PROSGEGRAMMENI
  1064 +1F8C; S; 1F84; # GREEK CAPITAL LETTER ALPHA WITH PSILI AND OXIA AND PROSGEGRAMMENI
  1065 +1F8D; F; 1F05 03B9; # GREEK CAPITAL LETTER ALPHA WITH DASIA AND OXIA AND PROSGEGRAMMENI
  1066 +1F8D; S; 1F85; # GREEK CAPITAL LETTER ALPHA WITH DASIA AND OXIA AND PROSGEGRAMMENI
  1067 +1F8E; F; 1F06 03B9; # GREEK CAPITAL LETTER ALPHA WITH PSILI AND PERISPOMENI AND PROSGEGRAMMENI
  1068 +1F8E; S; 1F86; # GREEK CAPITAL LETTER ALPHA WITH PSILI AND PERISPOMENI AND PROSGEGRAMMENI
  1069 +1F8F; F; 1F07 03B9; # GREEK CAPITAL LETTER ALPHA WITH DASIA AND PERISPOMENI AND PROSGEGRAMMENI
  1070 +1F8F; S; 1F87; # GREEK CAPITAL LETTER ALPHA WITH DASIA AND PERISPOMENI AND PROSGEGRAMMENI
  1071 +1F90; F; 1F20 03B9; # GREEK SMALL LETTER ETA WITH PSILI AND YPOGEGRAMMENI
  1072 +1F91; F; 1F21 03B9; # GREEK SMALL LETTER ETA WITH DASIA AND YPOGEGRAMMENI
  1073 +1F92; F; 1F22 03B9; # GREEK SMALL LETTER ETA WITH PSILI AND VARIA AND YPOGEGRAMMENI
  1074