RUnicode is coming along nicely.
I just implemented a few methods for String. One was String#blocks, which returns the name of the Unicode block each codepoint belongs to.
# "今日はトム君。Niall is a ☆.".blocks
# => ["CJK Unified Ideographs", "CJK Unified Ideographs", "Hiragana",
# "Katakana", "Katakana", "CJK Unified Ideographs",
# "CJK Symbols and Punctuation", "Basic Latin", "Basic Latin",
# "Basic Latin", "Basic Latin", "Basic Latin", "Basic Latin", "Basic Latin",
# "Basic Latin", "Basic Latin", "Basic Latin", "Basic Latin",
# "Miscellaneous Symbols", "Basic Latin"]
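For anyone wondering how a lookup like that might work, here is a minimal sketch rather than the actual RUnicode code. The `BLOCKS` table and the `blocks_of` helper are made-up names for illustration, and the table only holds the handful of block ranges needed for the example above; a real version would presumably load every range from the Unicode Blocks.txt file.

```ruby
# Hypothetical excerpt of Unicode block ranges (only the ones the example needs).
BLOCKS = {
  0x0000..0x007F => "Basic Latin",
  0x2600..0x26FF => "Miscellaneous Symbols",
  0x3000..0x303F => "CJK Symbols and Punctuation",
  0x3040..0x309F => "Hiragana",
  0x30A0..0x30FF => "Katakana",
  0x4E00..0x9FFF => "CJK Unified Ideographs"
}

# Returns one block name per character, in order.
# Characters outside the sample table are reported as "No_Block".
def blocks_of(string)
  string.each_char.map do |char|
    match = BLOCKS.find { |range, _name| range.cover?(char.ord) }
    match ? match.last : "No_Block"
  end
end

p blocks_of("今日はトム君。")
```

A linear scan over the ranges is fine for a toy table; with the full block list a binary search over sorted range starts would probably be the better choice.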
And then I decided Ruby needed a real String#upcase and String#downcase. The original String#upcase just transliterates ASCII.
The operation is locale insensitive: only characters “a” to “z” are affected.
My version performs simple uppercase mappings according to the data found in UnicodeData.txt, although it takes about a year to do it.
# "天空のエスカフローネ Tenkū no Esukafurōne, wörtlich".upcase
# => "天空のエスカフローネ TENKŪ NO ESUKAFURŌNE, WÖRTLICH"
String#downcase just calls String#upcase to do its dirty work. And String#upcase! and String#downcase! just use String#replace.
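Here is a rough sketch of the idea, not the real RUnicode implementation. It assumes the UnicodeData.txt layout (semicolon-separated fields, with the code point in field 0 and the simple uppercase mapping in field 12) and uses four sample lines in that format instead of the full file. The names `SAMPLE_UNICODE_DATA`, `build_upcase_table`, `unicode_upcase` and `unicode_upcase!` are made up for illustration.

```ruby
# A few sample lines in the UnicodeData.txt format (assumed layout:
# field 0 = code point, field 12 = simple uppercase mapping, both in hex).
SAMPLE_UNICODE_DATA = <<~DATA
  0061;LATIN SMALL LETTER A;Ll;0;L;;;;;N;;;0041;;0041
  00F6;LATIN SMALL LETTER O WITH DIAERESIS;Ll;0;L;006F 0308;;;;N;LATIN SMALL LETTER O DIAERESIS;;00D6;;00D6
  014D;LATIN SMALL LETTER O WITH MACRON;Ll;0;L;006F 0304;;;;N;LATIN SMALL LETTER O MACRON;;014C;;014C
  016B;LATIN SMALL LETTER U WITH MACRON;Ll;0;L;0075 0304;;;;N;LATIN SMALL LETTER U MACRON;;016A;;016A
DATA

# Builds a Hash of codepoint => uppercase codepoint, skipping entries
# that have no simple uppercase mapping.
def build_upcase_table(data)
  table = {}
  data.each_line do |line|
    fields = line.chomp.split(";", -1)
    upper = fields[12]
    next if upper.nil? || upper.empty?
    table[fields[0].hex] = upper.hex
  end
  table
end

# Built once, so each character costs a single Hash lookup afterwards.
UPCASE_TABLE = build_upcase_table(SAMPLE_UNICODE_DATA)

# Maps every codepoint through the table; unmapped codepoints pass through.
def unicode_upcase(string, table = UPCASE_TABLE)
  string.unpack("U*").map { |cp| table.fetch(cp, cp) }.pack("U*")
end

# In-place variant built on String#replace.
def unicode_upcase!(string, table = UPCASE_TABLE)
  string.replace(unicode_upcase(string, table))
end

puts unicode_upcase("天空 aöōū")
```

Parsing the data once and keeping it in a Hash is one plausible way to avoid re-reading the file on every call, though I can't say whether that is where the time actually goes.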
This Ruby Unicoding is rather fun. It’s a good language to work with. It’s just so hackable. Maybe the next step should be to work out how to get String#upcase running in a timeframe similar to the original String#upcase, and then I might like to make some Ruby extensions in C. Although I’m not fond of C, I do think it’s a good choice for low-level things, and this is indeed low-level stuff.
RUnicode started out as one method I needed for my KLookup final year project. It’s growing a little, but I’m keeping it in the KLookup source tree for now. You can check it out with the following command:
svn checkout svn://rubyforge.org/var/svn/klookup
There’s a cute little demo in demo/ which makes use of the String#tr method (which is also available in jcode, I discovered) to convert Arabic numerals into (Japanese-style) kanji numerals. There’s also a shell script that uses the Ruby script to print the date:
二千七年一月八日
It only goes up to (10**26)-1 at the moment. If you’re interested, (10**26)-1 (that’s a nine followed by 25 nines) looks like this:
九万九千九百九十九億九千九百九十九万九千九百九十九兆九万九千九百九十九億九千九百九十九万九千九百九十九
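For the curious, here is a small sketch of how String#tr can drive this kind of conversion. It is not the demo's actual code: `to_kanji` and `group_to_kanji` are hypothetical helpers, it assumes the conventional grouping by powers of 10,000 (万, 億, 兆), and it stops below 10**16, so its output for very large numbers will differ from the demo's.

```ruby
KANJI_DIGITS = "〇一二三四五六七八九"
SMALL_UNITS  = ["", "十", "百", "千"] # 10**0 .. 10**3
MYRIADS      = ["", "万", "億", "兆"] # 10**0, 10**4, 10**8, 10**12

# Converts an integer in 1...10_000 to kanji, e.g. 2007 => 二千七.
def group_to_kanji(n)
  parts = n.to_s.chars.reverse.each_with_index.map do |digit, place|
    next "" if digit == "0"
    # 一 is conventionally dropped before 十, 百 and 千.
    kanji = (digit == "1" && place > 0) ? "" : digit.tr("0-9", KANJI_DIGITS)
    kanji + SMALL_UNITS[place]
  end
  parts.reverse.join
end

# Converts a non-negative integer below 10**16 to kanji numerals.
def to_kanji(n)
  raise ArgumentError, "out of range" unless n.is_a?(Integer) && n >= 0 && n < 10**16
  return "〇" if n.zero?

  groups = []
  while n > 0
    groups << n % 10_000 # lowest four digits first
    n /= 10_000
  end
  groups.each_with_index.map { |group, i|
    group.zero? ? "" : group_to_kanji(group) + MYRIADS[i]
  }.reverse.join
end

puts "#{to_kanji(2007)}年#{to_kanji(1)}月#{to_kanji(8)}日"
```

The only job String#tr does here is the digit-for-digit swap; the place markers have to be added separately, which is why a bare tr call on "2007" would give 二〇〇七 rather than 二千七.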
Enough babbling, goodnight.☆