Archive for the ‘Kanji Lookup Tool’ Category

Project

Tuesday, May 22nd, 2007

With one and a half hours until my final year project viva (presentation/demonstration/Q&A - I hadn’t heard the word before this project), I’ve just created a new study tool (in about ten minutes).

It takes a file of characters as input and prints each character’s meanings. You ponder them, then (hopefully) write down the correct character. If you want to check your answer, you can uncomment the line that prints the character itself.

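require 'rubygems'
require 'klookup'

include KLookup::Lookup

list = []

# Collect every character in the file that KLookup knows as a kanji.
open('kyouiku3') { |f|
  f.read.each_char { |c|
    list << Kanji.new(c) if Kanji.exist?(c)
  }
}

list.each { |k|
  # Separator between entries.
  puts '-' * 20
  # Uncomment the next line to print the character itself.
#  print k
  # One meaning per line, each indented with a tab.
  puts "\t" + k.meaning.join("\n\t")
}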

You’ll need the KLookup library (gem install klookup). And some kanji: try Wikipedia’s 学年別漢字配当表 (kanji lists separated by school year).

I wrote the above before my viva. Now it’s after my viva.

I feel it went well, but my project tutor had to be very supportive: he showed the other lecturer a lot of things, such as the ChangeLog (1179 lines, wow) and auxiliary code like the Rakefile. The general impression I got was that I didn’t communicate well enough in the report, and I completely neglected to mention a lot of work I’d done. But it’s all over now. No more university in the immediate future.

KLookup release!

Monday, January 29th, 2007

This post I’m translating from English to Japanese.

このポストに英語から日本語に訳します。

I made a release of KLookup! The second release. Aptly named 0.2 (the first release was called 2006-11-26).

KLookupをリリースでしたよ!二番のリリースだから0.2と呼びます(一番は2006-11-26と呼びました)。

KLookup is my fabulous final year project. It’s coming along really well.

KLookupは俺のすばらしい最後の年のプロジェクトです。中々上達しています。

The easiest way to install it is (with RubyGems):

優しい方法は(RubyGemsで):

gem install klookup

Also, you can check it out via Subversion:

だって、Subversionでもチェックオウトできます。

This release (このリリース): svn co svn://rubyforge.org/var/svn/klookup/tags/REL_0.2 klookup-0.2

The trunk (幹?トランク?胴体?分かりません!): svn co svn://rubyforge.org/var/svn/klookup/trunk klookup

KLookup comes up as the only result for kanji on RubyForge, which is somewhat interesting due to a slight Japanese focus of Ruby in general. It’s quite possible that this sort of tool just wouldn’t be particularly useful to the average Japanese person who’s completed twelve years of education and maybe university.

kanjiを探せばKLookupは只一つの答え。それは面白いとおもいます、Rubyはちょっと日本に中心するから。もしかしてこんな道具は役に立ちませんかな、なみの十二年間の学校や大学校を受けた日本人に。

Ugh. I really need to make a blog CMS myself soon…one that is language-aware.

ああ、もう早いに言語を気がついてるブログCMSを作らなければ成りませんなぁ。

OpenBSD

Friday, December 8th, 2006

Well, due to the lack of locale support, I evidently can’t use OpenBSD as a desktop system. I have, however, been running OpenBSD as a server (SSH-able NAS, MPD-able music). It’s a joy to administer, and I’ve just worked out how to set up the default chrooted Apache to serve my klookup project. See 10.16 - Tell me about this chroot(2) Apache? The trick is copying the Ruby binaries and libraries into the chroot jail, along with the libraries they link against, which the ldd command will list for you.
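Here is a rough Ruby sketch of the ldd step. It assumes the ldd output names each shared library by an absolute path containing ".so"; the exact layout differs between systems, and OpenBSD's is not the same as the Linux-style sample here. The helper name chroot_copy_plan, the sample output and the /var/www jail path are all made up for illustration. It only works out what would need copying and where, it doesn't copy anything.

```ruby
# Hypothetical helper: given the text printed by `ldd <binary>` and the
# root of a chroot jail, list [source, destination] pairs for every
# shared library mentioned. Assumes libraries appear as absolute paths
# containing ".so"; a sketch, not a complete tool.
def chroot_copy_plan(ldd_output, jail)
  libs = ldd_output.scan(%r{/\S*\.so[.\d]*}).uniq
  libs.map { |lib| [lib, File.join(jail, lib)] }
end

# Made-up sample in the style of Linux ldd output.
SAMPLE_LDD = <<~OUT
  libruby.so.1.8 => /usr/lib/libruby.so.1.8 (0x00007f0000000000)
  libc.so.6 => /lib/libc.so.6 (0x00007f0000200000)
  /lib/ld-linux.so.2 (0x00007f0000400000)
OUT

chroot_copy_plan(SAMPLE_LDD, '/var/www').each do |src, dest|
  puts "#{src} -> #{dest}"
end
```

To actually copy, each pair could be handed to FileUtils.mkdir_p (for the destination directory) and FileUtils.cp from Ruby's standard fileutils library.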

I was implementing UTF-8 in the tree of klookup until half past midnight last night. It wasn’t even something necessary to the project, but it was fun. I could do with investigating alternatives to what I’m doing. Like writing some Ruby extensions in C, or using Unihan (which is significantly bigger than KANJIDIC, but doesn’t require testing each character’s codepoint to see if it’s kana).
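For the curious, the codepoint test for kana can be sketched like this. It is not klookup's actual code, just an illustration: the name kana? is made up, and it relies on String#ord, so it needs Ruby 1.9 or later. The ranges are the Unicode Hiragana (U+3040 to U+309F) and Katakana (U+30A0 to U+30FF) blocks, which sit next to each other.

```ruby
# encoding: utf-8

# Hiragana occupies U+3040..U+309F and katakana U+30A0..U+30FF,
# so a single range check covers both blocks.
KANA_RANGE = (0x3040..0x30FF)

# Hypothetical helper: true if the (single-character) string is kana.
def kana?(char)
  KANA_RANGE.cover?(char.ord)
end

"漢字かなカナ".each_char do |c|
  puts "#{c}: #{kana?(c) ? 'kana' : 'not kana'}"
end
```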

KLookup

Monday, November 27th, 2006

Well, I haven’t written much about this, but I’ve spent many hours on it.

My final year project is a multiradical kanji lookup tool, in Ruby.

Just yesterday, KLookup was approved for inclusion on RubyForge.

So now you can check out the source:

svn checkout svn://rubyforge.org/var/svn/klookup

If you don’t like or have Subversion, I have already made a release.

Supposedly it will run (the executables can be run when in the bin/ directory - note that gklookup does nothing) and install (beware of setup.rb’s weird behaviour). Its only dependency should be Ruby (cklookup requires readline, but falls back to gets). Also, klookup.cgi (drop the whole directory into a directory with ExecCGI set) is apparently confusing, so just bear with it for the moment.

Suggestions, fixes, and questions are very welcome. My email address is floating around somewhere on this site: it’s foobar__AT__holizz.com, where foobar is any string you like.

Text-wrangling

Wednesday, October 18th, 2006

Because I needed to get something working in Ruby (I’m prototyping), I’ve been testing the same thing in various languages.

My project (multiradical kanji lookup - it doesn’t have a name yet) will probably take its input directly from radkdict. Each line of that contains many kanji, so I decided to just split(//) on each line that I needed. ちょっと待て〜! (Wait a minute!) Just doing that gives me a list of the bytes in the file, not a list of the characters. In UTF-8 there’s no practical difference between bytes and characters as long as you stay within ASCII. Characters from the Han block generally take three bytes each, but rules of thumb are not the sort of thing you want to base parsing a text file on.

To cut a long story short, if you set $KCODE to ‘u’, splitting using RegExps will work (but indices still won’t).
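For comparison, here is a small sketch of the same byte/character distinction in Ruby 1.9 or later, where strings carry their own encoding and $KCODE is no longer needed. It assumes the source file and the string are UTF-8; the helper name text_summary is made up for illustration.

```ruby
# encoding: utf-8

# Hypothetical helper: report how a UTF-8 string looks when counted
# in bytes versus characters (Ruby 1.9+ behaviour).
def text_summary(text)
  {
    bytes:     text.bytesize,   # length of the UTF-8 encoding in bytes
    chars:     text.length,     # number of characters
    first_two: text[0, 2],      # indices count characters, not bytes
    pieces:    text.split(//)   # one element per character
  }
end

summary = text_summary("英語が話せますか")
puts "#{summary[:bytes]} bytes, #{summary[:chars]} characters"
puts summary[:first_two]
puts summary[:pieces].join(" ")
```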

So Ruby is FTL (for the loss, not faster than light). I’ll probably still be using Ruby for the project despite its failings. I haven’t even started on the background research (next: other research, analysis, design, and the development is where I start on the non-prototype code), so I still have plenty of time to change my mind.

Python (2.5, of course) and Perl also suffer the same sorry substring stupidity.

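Java, on the other hand, handles substrings properly: the usual String.substring method just works. When I ask for the first two characters of “英語が話せますか”, it gives me “英語”. (Java indexes strings by UTF-16 code unit, so this holds for characters in the Basic Multilingual Plane, which covers all of these.)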

For the past few years Java has been growing on me. Now I’m really starting to think it’s a good thing. I’m looking forward to the day they open source it. gcj/gij are pretty good, but they’re not quite compatible yet.

I noticed jython in Ubuntu’s universe today so I installed it (no JRuby yet). It’s pretty damn swanky, despite the fact that jythonc depends on Python 2.1 (which is over four years old) - so I can’t compile Python classes to Java bytecode. It’s pretty good. You can evaluate Python from Java and access Java classes from Python.

Stream of consciousness journal entries FTW. Splat.

ヤー、おひさしぶり

Saturday, October 14th, 2006

IMEs are the best invention in the world. This week I’ve typed π and done some box drawing without having to look at 文字マップ (Gucharmap in this case).

Today I learnt π to 9 digits thanks to the Japanese language and this post. It was HARAGUCHI Akira’s recent escapades into real numbers that made me want to find π in Japanese phrases. It’s a great language for memorising numbers.

Today I have also been looking at fun things like graphical toolkits. It’s hard to put into words why GTK+ 2 could beat Qt or Tk in a fight. Just say that GTK+ 2 has a wider deployment and hope for the best. And then try to think up a good reason to choose Ruby over Python (I don’t think “Ruby is more Japonz” will cut the mustard in final year project documentation).

I shall probably start some coding of that soon (maybe sooner than my Gantt chart tells me I should be doing - but I’m sure I’ll live).

Well. I should go to bed now so I can go swimming in the morning and then write lots and lots of documentation.

P.S. I subscribe to the Doctor Will documentation methodology: have everything in a wiki then just print it out and call it a report.