This should absolutely be part of the compulsory reading in any half-decent Computer Science undergraduate curriculum. Quoting:
So I have an announcement to make: if you are a programmer working in 2003 and you don’t know the basics of characters, character sets, encodings, and Unicode, and I catch you, I’m going to punish you by making you peel onions for 6 months in a submarine. I swear I will.
And one more thing:
IT’S NOT THAT HARD.
And as that last phrase says, it really isn’t all that complicated. It takes 20 minutes to read thoroughly, and it WILL save you a LOT of debugging time in the future (I speak from experience). A more low-level, hardcore explanation is here, for those interested.
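
To give a flavour of the kind of bug I mean, here is a minimal Python sketch of what happens when bytes written in one encoding are read back with another. This is my own illustration, not something taken from the article, and the sample string is arbitrary.

```python
# Illustrative only: the sample text and variable names are assumptions.
text = "café"

# Encoding turns characters into bytes; in UTF-8 the "é" takes two bytes.
raw = text.encode("utf-8")

# Decoding those bytes with the wrong encoding does not fail.
# It silently produces garbage (often called mojibake).
wrong = raw.decode("latin-1")

# Decoding with the encoding that was actually used restores the text.
right = raw.decode("utf-8")

print(len(text), len(raw))  # character count vs. byte count
print(wrong)
print(right)
```

The nasty part is that the wrong decode raises no error, so the damage usually shows up far away from where it was caused.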