Tuesday, 25 August 2015

Stop kissing Crystal and find Grandpabbie!

Surround Sound allows some neat effects, like hearing that creepily dangerous sound from behind you instead of from the screen... but sometimes there can be additional words hidden in the other channels that you otherwise mightn't be able to hear. When Kristoff brings Anna to meet his friends (well, they're more like family), he greets them all, in several cases by name. But since the audience's attention is on Olaf and Anna ("He's cray zee!"), the actual lines are easily lost. Here's what Kristoff is actually saying...

Hey, guys.
You are a sight for sore eyes.
Hey look, Magma's back from vacation!
Rocko's looking sharp, as usual.
Clay, whoa... I don't even recognize you. You lost so much weight!
Didn't realize how much I missed you guys.
Guys, I've got so much to tell you.
Stop kissing Crystal and find Grandpabbie!

So now you know. And knowing, as they say...

Sunday, 3 May 2015

Upgrading Ubuntu Karmic to Debian Jessie

My server had been running an ancient release of Ubuntu for far too long, and I was getting weary of manually patching things and hoping that I could stay on top of everything. So, with Debian Jessie freshly stable, I figured it's high time to upgrade. My options were to wipe the computer and start over, or attempt an upgrade; being certifiably insane, I chose the latter. Herein is notes from what took place this weekend... as a cautionary tale, perhaps!

First and foremost, try it out on a lesser system. (I wasn't quite insane enough - or maybe stupid enough - to just dive in and start fiddling with a live server.) Upgrading Ubuntu Maverick (10.10) to Debian Jessie (8.0) worked out fairly well, with just a few messinesses and complications, all of which also happened with the full upgrade. But there were rather more problems on the live system.

  1. Replace /etc/apt/sources.list with Jessie content. Easy enough. Don't forget to check /etc/apt/sources.list.d/ for any that are now redundant.
  2. Grab the new GPG keys from https://ftp-master.debian.org/keys.html so apt can check signatures.
  3. apt-get update, find out about a few more keys needing to be imported. Grab them with gpg --recv-keys 64481591B98321F9; gpg --armor --export 64481591B98321F9|sudo apt-key add - (after checking their validity in whatever way satisfies your level of paranoia).
  4. Due to some major bootstrapping problems, I couldn't simply apt-get dist-upgrade to do the upgrade. For everything to work, I actually had to do several steps: first, grab the Squeeze (Debian 6) repos, and install a somewhat newer kernel; then reboot into that, and finish the upgrade in single-user mode with broken mounts.
  5. STRONG recommendation: Use apt-get -d dist-upgrade to download all packages into the cache. This operation will complete quite happily, and is not bothered by package conflicts. After that, even if the network connection is broken, package upgrading can continue. At very worst, this just lets you leave the download chugging for a while, and then come back when it's done - saves quite a bit of time when you aren't doing this on a high-bandwidth server, like my first test.

  6. In order to complete the upgrade, I had to first upgrade udev, and then only afterward install a new Linux kernel... which udev requires. This meant a big fat warning about how this was very dangerous, but that I could touch a particular file to override the warning and do the installation - with the proviso that it might trash the system if I rebooted into the running kernel. Fortunately, such did not happen, as I was able to subsequently install a recent kernel, but it was cause to pause.

  7. Above all, this change MUST be done by someone prepared to take responsibility. This can't be managed by a script, it might cause downtime, and you need to have a fail-over ready in case something breaks badly. But hey. What's life without a little adventure... Some people go mountain climbing, I go upgrade climbing.

The part I was most impressed by was how much could be done on a running system. Upgrading a 2010 release of one distro to a 2015 release of a different distro, with no outage until the reboot at the end? Rather not bad, I think. Apt is a great tool.

Sunday, 29 March 2015

File systems: Case insensitivity is cultural insensitivity

There has long been a divide between case sensitive file systems and case insensitive ones. The former retain all names as provided, and define uniqueness according to very simple rules; the latter either force names to monocase (upper or lower), or retain them in the first form provided, and match without regard to case. Today, the divide roughly parallels the distinction between POSIX (Linux, Mac OS, BSD, etc) and Windows, so it tends to be tied in with the religious war that all that entails, but the concepts don't depend on the OS at all. For the purposes of discussion, I will be using Unix-like path names, because they are valid on all platforms.

If all file names are required to be ASCII and to represent English text, there is no problem. You can define case insensitivity easily: simply treat these 26 byte values as identical to those 26 byte values. But in today's world, it's not good enough to support only English. You need to support all the world's languages, and that usually means Unicode. Since Unicode incorporates ASCII, the file system needs to be able to cope with ASCII file names. Well and good. So, are file names "/ALDIRMA" and "/aldirma" identical? They look like ASCII, and if those letters represented something in English, they'd be the same, so they ought to be the same, right? Nope. That first word is Turkish, and the lower-case form is "/aldırma", with a dotless i. (The second isn't actually a word in any language, so far as I know, but it's a perfectly valid file name.) Should all four (those three plus "/ALDİRMA" with a dotted majuscule I) be considered to represent the same file? And what about the German eszett ("ß") - should that be identical to "ss", because they both upper-case to "SS"? Should the Greek letter sigma, which has distinct final ("ς") and medial ("σ") forms, be considered a single letter? This might conflate file names consisting of words strung together (in which case the final sigma is an indication that there is a word break there).

These distinctions make it virtually impossible to case-fold arbitrary text reliably. Whatever you do, you'll surprise someone - and maybe create a security hole, depending on how various checks are done. To be trustworthy, a file system must behave predictably. There's plenty of room for sloppy comparisons in text searching (for instance, a Google search for "aldirma" does return results containing "aldırma"), but the primary identifier for a file should be simple and faithful. So there are basically two options: Either the case folding rules are kept simple and generic, or there are no case folding rules at all. Will you impose arbitrary English rules on everyone else? And why English - why not, say, Hebrew? Let's treat "/letter" and "/litter" as the same file - after all, there are no vowels in Hebrew, so they should be ignored when comparing file names, that's fair right?

Or we could take the reliable, safe, and international approach, and maintain file names as strict text. Transform them only in ways which have no linguistic meaning (eg NFC normalization), and display them strictly as the user entered them. If you misspell a word, the file system won't fix it; if you get majuscule and minuscule wrong, the file system shouldn't fix that either.

One change I would make in the current POSIX model, though, and that's to require that file names be Unicode text, not arbitrary bytes. There's a broad convention that Linux systems use file names that consist of UTF-8 streams, but it's not enforced anywhere, and that means arbitrary bytes sometimes have to be smuggled around the place. That's necessary for legacy support, I suppose, but it'd be nice to eventually be able to drop all that compatibility code and just use Unicode file names everywhere. But even without that, I still prefer the Linux "it's all bytes, probably UTF-8" model to the Windows "it's all UTF-16 code units, case insensitively" model. Case insensitivity works only if one culture imposes its definition of case on everyone on the planet. Let's not do that.