Monday, September 6, 2010

Tuesday, August 3, 2010

Microsoft Mines The Web To Build Up Its Online Chinese-English Dictionary

There's an interesting August 3, 2010 Wall Street Journal article titled "Microsoft Mines Web to Hone Language Tool" about how Microsoft is mining the web for data to build up its Engkoo.com Chinese web dictionary.

Eventually it'll include machine-generated dictation of sample sentences and even machine-generated videos of sentences being spoken so that language learners will see the movement of the lips when the words are spoken.

The focus so far is for Chinese learners of English but might eventually go the other way as well.

Plans are also in the works for other languages such as Japanese and English.

http://online.wsj.com/article/SB10001424052748703545604575406771145298614.html

I wonder why it's Engkoo and not Yingku?

Wednesday, July 28, 2010

Unihan & Mojikyo Both Down

I like looking up Chinese character variants, but the lookup features of both Mojikyo and Unicode's Unihan Database are down.

http://www.mojikyo.org/

This is what Mojikyo says:

We are now reconstructing our "mojikyo.org" pages.
The work will be completed in a few year.
Therefore, the download service is discontinued at a while*1.
Please use the actual expenses distribution service of CD-R*2that we are doing.

*1 We expect that it might extend for a considerable long term.
If you cannot wait, it might be good for you to visit the following sites, but those information is not the latest.
Information Technology standards Commission of Japan's IPSJ-TS 0002:2004 "Character Shapes Identification".
*2 However, this service is done only to those who demand in Japan at present.


"The work will be completed in a few year"?

What does that mean?

Will it be completed within a year? Or in a few years?

It's been down for quite a while already.

It says if you can't wait then visit the itscj site but the information isn't the latest.

This really pisses Kobo off. I feel like that stand-up comic Lewis Black who's always ranting about something or other in his act. :)

The Unicode Unihan database's look-up feature is down for maintenance and has been down for at least 2 days.

http://www.unicode.org/charts/unihanrsindex.html

I hope they get everything up and running soon.

Update: The Unihan look-up feature is back online.

Tuesday, July 27, 2010

Kobo's Blog: Stardict 3.0.2

My favorite dictionary software program is the fantastic free open source StarDict program available from the sourceforge.net web site.



Available for download here:

http://sourceforge.net/projects/stardict/

I know that for those of us who aren't computer geeks it might be difficult to figure out which files to download, but they've now made it a one-click button for the latest auto-installer executable version. No need to worry about source files, etc. Oh, this only applies to the Windows version. For others, you've still got to go through the C++ open source stuff.

The developer's web site:

http://stardict.sourceforge.net/

There are tons of free dictionaries available for download.

Even a copy of the Kangxi dictionary. I haven't got it to work because supposedly you need to download and install a DLL (dynamic link library) from one of the files available for download. I'm kind of worried about viruses here. More about that later.

They used to have the dictionaries available for download at this link:

http://stardict.sourceforge.net/Dictionaries.php

But, now, they've all since been removed.

I had a bitch of a time finding where they'd been moved to.

They've got a forum but it doesn't seem to be moderated so it's full of spam, some porn and many unanswered questions.

http://www.stardict.org/forum/

A few of the developers do visit the forum but only once in a blue moon. It was very difficult to wade through all the crap to get to the gold.

I found that the dictionary files are now housed at:

http://www.huzheng.org/stardict-iso/stardict-dic/

They've moved things around a bit since I last visited. They might even have removed some of the dictionary files, though I'm not certain. I haven't an eidetic memory, you know. :)

Hu Zheng is one of the developers working on the project, though I don't think he's the originator of the program. He and the originator of the program are both Chinese mainlanders so their English isn't all that.

I think a guy from Stanford University joined the project and helped to clean up some of the English. At least one of the help web pages gives him credit.

You might want to also go to the upper levels of the huzheng.org site to see what other interesting things Hu's working on. Mostly Linux but there is an interesting Chinese handwriting recognition IME (input method editor) project.

The site is in simplified Chinese though.

There are also two huge zip files of Chinese dictionaries available all bundled together at the free file hosting web site rapidshare.com

http://rapidshare.com/files/370422644/zh_CN_all1.zip (30 dictionary files)
http://rapidshare.com/files/370427660/zh_Cn_all2.zip (28 dictionary files)

Don't know if these are different from the ones available at huzheng.org or if there is any overlap.

There are a lot more "simplified" character dictionaries available than "traditional" character ones but there are plenty of both to meet most people's Chinese needs.

Another good thing about the StarDict project is that you can make your own dictionaries.

A pet peeve of mine about most of the programs available for Chinese learning is that they don't include an easy method for making your own user dictionary.

That's why I like the freeware CQuickTrans program.

Their dictionary files are just plain text files. You just include it in the program folder and when you run the program it automatically takes your text file and spits out a dictionary file in a format that works with the program.

Okay, you can't include Chinese characters in the glosses and it doesn't handle the newer Unicode CJK Unified Ideographs Extensions but beggars can't be choosers. It's priced right. Free!

I felt strongly enough about the program that I actually paid to register it for the rest of the features.

At least it doesn't make you jump through hoops. Download C++. Learn C++. Learn Java, Python, etc. Lie down, roll over, fetch.

Anyway, the CQuickTrans download used to be at www.coolest.com but now when I click on the link at the Help/About window of the program it automatically re-directs to:

http://www.postmeta.com/

CQuickTrans is the Chinese language related program that I use the most. I enjoy typing out dictionaries and have made tons of them for the program.

Back to StarDict.

StarDict has a ton of tools to work with the program. Okay, most of them are Python or C++ files so you do have to be a computer geek to use them, but their dictionary editor program is an executable, not source code (though I guess the source is also available for download).

It's the stardict-editor-3.0.1.rar file available for download at this page:

http://sourceforge.net/projects/stardict/files/

Okay back to the virus bit I alluded to above.

Usually when I download anything from the Internet, even sites that say they guarantee their stuff to be virus free, I run a virus scan program.

In the case of the StarDict tools, I ran the free edition of the AVG Anti-Virus software program and got this:




A trojan horse.

I don't know if this is a false positive or what.

I would have expected better from sourceforge.

I was planning to e-mail them or the developers but as you can see from the image the date of the download was sometime in February.

If anyone does plan on downloading KSDrip.exe and does contact them about it, please post a comment.

If I'm not too lazy and get around to e-mailing them I'll post an update. But Kobo's a very lazy person. See how long it's taken for me to get a blog up since my previous one. We're talking glacial in slowness. :)

Anyway, you probably wouldn't need to download KSDrip.exe.

It's for converting Kingsoft's Power Word dictionary files into the StarDict format and those dictionary files are already available in Stardict format for download at the huzheng.org site. Though they don't seem to be included with the rapidshare files.

The only tool that I'd really want to work with would be the stardict-editor-3.0.1 program because I like making personal dictionaries. But the virus warning has got me skittish.

If anyone who knows programming real well and is able to check out the source file please report back in the comments. I'd really like to be able to make my own dictionaries for the program.

If you do want to work with the editor you might want to check this web page:

http://filosofie.unibuc.ro/~solcan/wt/gnu/s/stardict.html

It's an explanation on how to use the stardict-editor program. Unfortunately, the guy, who is a professor at a Romanian university, is not a native English speaker so his writing is not exactly clear and also he's using Linux so...

It's put on your thinking cap time.

You really have to read carefully and pore over each sentence.

I haven't tried out the editor but basically it seems you run the Stardict-Editor.

Tell it the name of the file you want to work with. Only one of the files is the actual dictionary data. The editor will decompile the data into a Unicode text file.

You edit it. Even erase everything and put in your own dictionary data if you desire.

Get it to compile the text file back into a dictionary file. Put the dictionary files into where you keep the dictionaries and it should work.

I haven't actually done it myself but for those who are interested in making their own dictionaries for the StarDict program this is the way to go.
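For those who'd rather prepare the text file with a script than type it by hand, here is a minimal Python sketch of the "edit a Unicode text file" step. It assumes the editor's compile step accepts a UTF-8 file with one entry per line in the form headword, a tab, then the definition, with backslash escapes for newlines and tabs inside a definition. That layout is an assumption, not something confirmed against the editor, and the function names (write_tab_file, read_tab_file, escape, unescape) are made up for illustration; they are not part of StarDict.

```python
import os
import tempfile


def escape(text):
    # Assumed escaping convention: backslash first, then newline and tab,
    # so each dictionary entry stays on a single line.
    return text.replace("\\", "\\\\").replace("\n", "\\n").replace("\t", "\\t")


def unescape(text):
    # Reverse of escape(): walk the string and decode backslash pairs.
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "\\" and i + 1 < len(text):
            nxt = text[i + 1]
            out.append({"n": "\n", "t": "\t", "\\": "\\"}.get(nxt, nxt))
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


def write_tab_file(entries, path):
    """Write {headword: definition} as 'headword<TAB>definition' lines in UTF-8."""
    with open(path, "w", encoding="utf-8", newline="\n") as f:
        for word in sorted(entries):
            f.write(f"{word}\t{escape(entries[word])}\n")


def read_tab_file(path):
    """Read a file written by write_tab_file back into a dict."""
    entries = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if not line:
                continue  # skip blank lines
            word, _, definition = line.partition("\t")
            entries[word] = unescape(definition)
    return entries


if __name__ == "__main__":
    # Hypothetical two-entry personal dictionary written to a temp folder.
    my_dict = {"你好": "hello\nhi (informal)", "謝謝": "thanks"}
    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, "mydict.txt")
        write_tab_file(my_dict, path)
        print(read_tab_file(path) == my_dict)
```

The resulting text file is what you would hand to the editor's compile step. If the editor expects a different layout, only escape() and the line format in write_tab_file should need changing.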

Or you could write to the StarDict developers. In one of their forum replies they do solicit any interesting dictionaries that forumites wish to have converted for the StarDict software, and they'd gladly do the conversion and include it for download. Just as long as it's not copyrighted.

I remember a guy posted at chinese-forums.com that he'd gotten a Chinese-French dictionary project started and was asking about the format for dictionary files with the .dict file extension.

This might be a good program for him to port his dictionary project over to.

I'm amazed that there wasn't a good free Chinese-French dictionary available already. He said there was an earlier project but it seems to have stalled after only a few thousand entries. I think this new project is making a better go at it.

I'm also amazed there isn't a good free Chinese-Spanish dictionary project going. I read somewhere a guy was glad that there was a new Chinese language learning site available now that includes material for Spanish speakers interested in Chinese. Unfortunately it's a paid site and they only have a limited amount of material for free. But the monthly rate is only about $7 American.

Oh, one more thing before I end this post. At the huzheng.org dictionary download site they link to "a project to unite all existing open dictionaries and provide both users and developers with universal XML-based format, convertible to and from other popular dictionary formats".

Don't know what that is but it might be useful for those who know these things.

Here's the link:

http://xdxf.sourceforge.net/

Sorry to those who thought this was going to be a blog post on Mr. Spock, James T. Kirk, Dr. McCoy, Scotty, and the rest of the Star Trek gang.

Setting Up A Blog Is A Lot Harder Than It Seems


It sure is a lot harder to set up a blog than one would imagine. Just getting an image on my profile page is a chore. I tried to place the above photo into my profile but they wouldn't accept it.

Is it because of its size? Too big?

And they don't even give an explanation as to why it wouldn't take. Or even whether a picture is uploaded or not.

Monday, July 26, 2010

INTRODUCTION

Hello all! Welcome to my blog, Kobo-Daishi's Adventures in Chinese-Land.

You might remember my previous blog, Kobo's Grotto or something like that, hosted at Yahoo's Geocities.

It was a free site, a part of Yahoo's Geocities. Then they phased out Geocities so the site was removed. I guess they weren't making any money from the sites.

It didn't have much in the way of content.

There was a hack for using the NJStar Chinese Word Processor (the annotation feature) and a copy of an old Chaozhou (Teochew) dialect dictionary from Google's book program that had gone out of copyright and was therefore in the public domain.

When they announced that Geocities was being phased out, the Internet Archive offered to archive the sites of those who wished their sites to be saved.

I hadn't even saved a copy for myself.

The site was my first attempt at a blog so I didn't even publicize the site.

I didn't have it added to the Geocities directory, didn't submit it for inclusion in the major search engines by clicking their opt-in button, didn't add my site to my signature at the various forums I frequent, etc.

No web optimization for Kobo.

Anyway, this blog will mainly be stuff that interests me with an emphasis on Chinese language and culture.

There will also occasionally be stuff that interests me that might not necessarily have to do with Chinese.

Something that I've read at a forum or a reaction to something from the media, etc.

I'll try to keep it current.

Chinese is more of a hobby for me and a way for me to get in touch with the heritage of my Chinese ancestors.

I don't get paid for using Chinese or anything so it's not a necessity but I enjoy it.

Mostly I will post on tips that I've accumulated from my decades learning Chinese.

Sometimes my posts will be anecdotal, drawn from personal experience, but they will also be made generic so as not to be too personal.

I prefer to keep my personal life personal.

My Internet life is for public consumption but I prefer to be anonymous more or less.

That's why the pseudonym.

Kobo-Daishi is more of an artificial construct for me to surf anonymously on the internet.

Hope that you will enjoy this little niche of the Internet that I've carved out.

Kobo-Daishi's Adventures In Chinese-Land. :)