Monday, 18 April 2016

Falsehoods Programmers Believe About PEP 8

We're computer programmers. We spend our days warping reality to our purposes, and then leaving behind a textual representation of the exact type of warping so the next person can use our modified reality. Over the years, many MANY Python programmers have looked at a document called PEP 8 and misunderstood it, just as many programmers misunderstand time, or people's names. The same PEP 8 misconceptions crop up over and over again, and after some discussion on python-list, I've collected these commonly-held fallacies.

Remember, all of these assumptions are wrong.

  1. All Python code should follow PEP 8.
  2. If you use a tool named pep8, your code will be PEP 8 compliant.
  3. If your code is PEP 8 compliant, a tool named pep8 will accept it.
  4. The Python Standard Library is PEP 8 compliant.
  5. Okay, at least the new parts of the standard library are PEP 8 compliant.
  6. PEP 8 compliant code is inherently better than non-compliant code.
  7. PEP8-ing existing code will improve it.
  8. Once code is PEP 8 compliant, it can easily be kept that way through subsequent edits.
  9. PEP 8 never changes.
  10. Well, it never materially changes.
  11. I mean, new advice, sure, but it'll never actually go back on a rule.
  12. The line length limit is obsolete in an age of high-resolution displays.
  13. Okay, but if you disregard side-by-side windows, lines of code can be arbitrarily long without hurting readability.
  14. Well, maybe not several hundred characters, but surely 120 characters of code on a line is easy enough to read.
  15. The only valid white space is line breaks and U+0020 SPACE.
  16. Okay, U+0009 TAB when lining up columns, but no other white space.
  17. Oh, come on, no-one would use U+000C FORM FEED in source code.
  18. Everyone uses the same sort of tools (visual text editors) to read and write code.
  19. Ignoring the few weirdos who can cope with their own bizarre choices, every NORMAL person uses the same sort of tools.
  20. Alright, everyone at my organization will use the same tools. I can mandate that, so it must be true.
  21. Readability is an inherent quality of code. It doesn't matter who reads it, good code is good code.
  22. Avoiding the "Names to Avoid" is a sure and simple way to make sure your identifiers aren't confusable.
  23. Unicode is good for identifiers.
  24. Unicode is bad for identifiers.
  25. Unicode is optional for identifiers.
  26. You know what I mean. I'm talking about *non-ASCII* characters. And you shouldn't use them.
  27. PEP 8 is a tool for denying patches/pull requests that you should reject.

As with the articles I'm riffing off, every one of these is false, and I can give examples. And this is far from an exhaustive list. If you want to avoid the worst of the errors, start by reading the actual document (not some tool that borrows its name), particularly the section entitled "A Foolish Consistency is the Hobgoblin of Little Minds".
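
A few of the claims above can be checked directly. Here's a minimal Python 3 sketch covering the form feed and identifier items; the variable names (`first`, `second`, `spam`) are made up purely for illustration, and this is one way to demonstrate the point rather than a definitive test suite.

```python
import unicodedata

# Fallacy 17: a line may begin with U+000C FORM FEED. The tokenizer ignores
# it when working out indentation, so this source compiles and runs.
source = "first = 1\n\x0csecond = 2\n"
namespace = {}
exec(compile(source, "<form-feed-demo>", "exec"), namespace)
assert namespace["second"] == 2

# Fallacies 23-26: identifiers are normalized to NFKC when parsed, so the
# fullwidth letters below name the same variable as plain ASCII "spam".
fullwidth_spam = "\uff53\uff50\uff41\uff4d"
exec(fullwidth_spam + " = 42", namespace)
assert namespace["spam"] == 42

# Fallacy 22: Cyrillic U+0430 looks like Latin "a", is a perfectly legal
# identifier, and is NOT merged with "a" by that normalization.
cyrillic_a = "\u0430"
assert cyrillic_a.isidentifier()
assert unicodedata.normalize("NFKC", cyrillic_a) != "a"
```

So some look-alike identifiers are silently merged and others are silently kept distinct, and neither case is covered by avoiding `l`, `O` and `I`.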

With thanks to Ben Finney and Dan Sommers for contributions to the above list.

Tuesday, 25 August 2015

Stop kissing Crystal and find Grandpabbie!

Surround Sound allows some neat effects, like hearing that creepily dangerous sound from behind you instead of from the screen... but sometimes there can be additional words hidden in the other channels that you otherwise mightn't be able to hear. When Kristoff brings Anna to meet his friends (well, they're more like family), he greets them all, in several cases by name. But since the audience's attention is on Olaf and Anna ("He's cray zee!"), the actual lines are easily lost. Here's what Kristoff is actually saying...

Hey, guys.
You are a sight for sore eyes.
Hey look, Magma's back from vacation!
Rocko's looking sharp, as usual.
Clay, whoa... I don't even recognize you. You lost so much weight!
Didn't realize how much I missed you guys.
Guys, I've got so much to tell you.
Stop kissing Crystal and find Grandpabbie!

So now you know. And knowing, as they say...

Sunday, 3 May 2015

Upgrading Ubuntu Karmic to Debian Jessie

My server had been running an ancient release of Ubuntu for far too long, and I was getting weary of manually patching things and hoping that I could stay on top of everything. So, with Debian Jessie freshly stable, I figured it was high time to upgrade. My options were to wipe the computer and start over, or attempt an upgrade; being certifiably insane, I chose the latter. Herein are notes from what took place this weekend... as a cautionary tale, perhaps!

First and foremost, try it out on a lesser system. (I wasn't quite insane enough - or maybe stupid enough - to just dive in and start fiddling with a live server.) Upgrading Ubuntu Maverick (10.10) to Debian Jessie (8.0) worked out fairly well, with just a few messinesses and complications, all of which also happened with the full upgrade. But there were rather more problems on the live system.

  1. Replace /etc/apt/sources.list with Jessie content. Easy enough. Don't forget to check /etc/apt/sources.list.d/ for any that are now redundant.
  2. Grab the new GPG keys from so apt can check signatures.
  3. apt-get update, find out about a few more keys needing to be imported. Grab them with gpg --recv-keys 64481591B98321F9; gpg --armor --export 64481591B98321F9|sudo apt-key add - (after checking their validity in whatever way satisfies your level of paranoia).
  4. Due to some major bootstrapping problems, I couldn't simply apt-get dist-upgrade to do the upgrade. For everything to work, I actually had to do several steps: first, grab the Squeeze (Debian 6) repos, and install a somewhat newer kernel; then reboot into that, and finish the upgrade in single-user mode with broken mounts.
  5. STRONG recommendation: Use apt-get -d dist-upgrade to download all packages into the cache. This operation will complete quite happily, and is not bothered by package conflicts. After that, even if the network connection is broken, package upgrading can continue. At very worst, this just lets you leave the download chugging for a while, and then come back when it's done - saves quite a bit of time when you aren't doing this on a high-bandwidth server, like my first test.

  6. In order to complete the upgrade, I had to first upgrade udev, and then only afterward install a new Linux kernel... which udev requires. This meant a big fat warning about how this was very dangerous, but that I could touch a particular file to override the warning and do the installation - with the proviso that it might trash the system if I rebooted into the running kernel. Fortunately, such did not happen, as I was able to subsequently install a recent kernel, but it was cause to pause.

  7. Above all, this change MUST be done by someone prepared to take responsibility. This can't be managed by a script, it might cause downtime, and you need to have a fail-over ready in case something breaks badly. But hey. What's life without a little adventure... Some people go mountain climbing, I go upgrade climbing.
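
The download-first ordering from step 5 can be sketched as a tiny Python helper. This only builds and prints the command lines; it doesn't run anything, and the function name `upgrade_plan` is hypothetical. It also assumes the simple case, and skips the intermediate Squeeze kernel detour from step 4.

```python
import shlex

def upgrade_plan(download_only_first=True):
    """Return the apt-get invocations, in order, as argv lists."""
    plan = [["apt-get", "update"]]
    if download_only_first:
        # -d fetches every package into the cache without installing any,
        # so the real upgrade no longer depends on the network staying up.
        plan.append(["apt-get", "-d", "dist-upgrade"])
    plan.append(["apt-get", "dist-upgrade"])
    return plan

# Print the plan for review; running it is deliberately left to a human.
for argv in upgrade_plan():
    print(shlex.join(argv))
```

Printing rather than executing is deliberate: as step 7 says, this isn't something to hand over to a script.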

The part I was most impressed by was how much could be done on a running system. Upgrading a 2010 release of one distro to a 2015 release of a different distro, with no outage until the reboot at the end? Rather not bad, I think. Apt is a great tool.

Sunday, 29 March 2015

File systems: Case insensitivity is cultural insensitivity

There has long been a divide between case sensitive file systems and case insensitive ones. The former retain all names as provided, and define uniqueness according to very simple rules; the latter either force names to monocase (upper or lower), or retain them in the first form provided, and match without regard to case. Today, the divide roughly parallels the distinction between POSIX (Linux, Mac OS, BSD, etc) and Windows, so it tends to be tied in with the religious war that all that entails, but the concepts don't depend on the OS at all. For the purposes of discussion, I will be using Unix-like path names, because they are valid on all platforms.

If all file names are required to be ASCII and to represent English text, there is no problem. You can define case insensitivity easily: simply treat these 26 byte values as identical to those 26 byte values. But in today's world, it's not good enough to support only English. You need to support all the world's languages, and that usually means Unicode. Since Unicode incorporates ASCII, the file system needs to be able to cope with ASCII file names. Well and good. So, are file names "/ALDIRMA" and "/aldirma" identical? They look like ASCII, and if those letters represented something in English, they'd be the same, so they ought to be the same, right? Nope. That first word is Turkish, and the lower-case form is "/aldırma", with a dotless i. (The second isn't actually a word in any language, so far as I know, but it's a perfectly valid file name.) Should all four (those three plus "/ALDİRMA" with a dotted majuscule I) be considered to represent the same file? And what about the German eszett ("ß") - should that be identical to "ss", because they both upper-case to "SS"? Should the Greek letter sigma, which has distinct final ("ς") and medial ("σ") forms, be considered a single letter? This might conflate file names consisting of words strung together (in which case the final sigma is an indication that there is a word break there).
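
These collisions are easy to reproduce. A minimal Python 3 sketch follows, relying only on the built-in `str` methods, which apply Unicode's locale-independent default case mappings (so they know nothing about Turkish rules); the constant names are just for illustration.

```python
DOTLESS_I = "\u0131"       # Turkish lower-case dotless i
DOTTED_CAP_I = "\u0130"    # Turkish upper-case dotted I
ESZETT = "\u00df"          # German sharp s
FINAL_SIGMA = "\u03c2"
MEDIAL_SIGMA = "\u03c3"

turkish_lower = "ald" + DOTLESS_I + "rma"

# Default lower-casing gives the dotted form, not the Turkish word...
assert "ALDIRMA".lower() == "aldirma"
assert "ALDIRMA".lower() != turkish_lower
# ...yet both lower-case spellings upper-case to the same string.
assert turkish_lower.upper() == "aldirma".upper() == "ALDIRMA"

# The dotted capital I lower-cases to TWO code points: "i" + combining dot.
assert DOTTED_CAP_I.lower() == "i\u0307"

# Eszett and "ss" collide once upper-cased.
assert ("stra" + ESZETT + "e").upper() == "strasse".upper() == "STRASSE"

# Final and medial sigma are different characters that fold together.
assert FINAL_SIGMA != MEDIAL_SIGMA
assert FINAL_SIGMA.casefold() == MEDIAL_SIGMA.casefold()
```

Whichever of these equalities a file system adopts as its definition of "same name", somebody's language gets the wrong answer.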

These distinctions make it virtually impossible to case-fold arbitrary text reliably. Whatever you do, you'll surprise someone - and maybe create a security hole, depending on how various checks are done. To be trustworthy, a file system must behave predictably. There's plenty of room for sloppy comparisons in text searching (for instance, a Google search for "aldirma" does return results containing "aldırma"), but the primary identifier for a file should be simple and faithful. So there are basically two options: Either the case folding rules are kept simple and generic, or there are no case folding rules at all. Will you impose arbitrary English rules on everyone else? And why English - why not, say, Hebrew? Let's treat "/letter" and "/litter" as the same file - after all, Hebrew is normally written without vowels, so they should be ignored when comparing file names. That's fair, right?

Or we could take the reliable, safe, and international approach, and maintain file names as strict text. Transform them only in ways which have no linguistic meaning (eg NFC normalization), and display them strictly as the user entered them. If you misspell a word, the file system won't fix it; if you get majuscule and minuscule wrong, the file system shouldn't fix that either.
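
As a rough illustration of that policy, here's a minimal Python 3 sketch using the standard `unicodedata` module. The helper name `same_file_name` is hypothetical; it simply compares names after NFC normalization and does nothing else.

```python
import unicodedata

def same_file_name(a, b):
    """Compare two names after NFC normalization only (no case folding)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

decomposed = "cafe\u0301"   # "e" followed by U+0301 COMBINING ACUTE ACCENT
composed = "caf\u00e9"      # the single code point U+00E9

# Different code point sequences, same text: these should be one file.
assert decomposed != composed
assert same_file_name(decomposed, composed)

# Case carries meaning, so it is left strictly alone.
assert not same_file_name("README", "readme")
```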

There is one change I would make to the current POSIX model, though: require that file names be Unicode text, not arbitrary bytes. There's a broad convention that Linux systems use file names that consist of UTF-8 streams, but it's not enforced anywhere, and that means arbitrary bytes sometimes have to be smuggled around the place. That's necessary for legacy support, I suppose, but it'd be nice to eventually be able to drop all that compatibility code and just use Unicode file names everywhere. But even without that, I still prefer the Linux "it's all bytes, probably UTF-8" model to the Windows "it's all UTF-16 code units, case insensitively" model. Case insensitivity works only if one culture imposes its definition of case on everyone on the planet. Let's not do that.
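
For a concrete picture of that byte smuggling, here's a minimal Python 3 sketch. Python handles it with the `surrogateescape` error handler (PEP 383), which is what `os.fsdecode` uses on POSIX systems; the file name below is invented for the example.

```python
raw_name = b"report-\xff.txt"   # 0xFF can never appear in valid UTF-8

# A strict decode refuses the name outright.
try:
    raw_name.decode("utf-8")
    strict_decode_ok = True
except UnicodeDecodeError:
    strict_decode_ok = False
assert not strict_decode_ok

# surrogateescape hides the bad byte in a lone surrogate (U+DCFF)...
smuggled = raw_name.decode("utf-8", "surrogateescape")
assert smuggled == "report-\udcff.txt"

# ...so the original bytes can be recovered exactly when talking to the OS.
assert smuggled.encode("utf-8", "surrogateescape") == raw_name
```

The resulting string round-trips, but it isn't valid Unicode text: it can't be encoded as strict UTF-8, which is exactly the sort of compatibility code that enforced Unicode file names would make unnecessary.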

Saturday, 13 December 2014

Unicode makes life easy!

Every now and then, the complaint is raised: "Unicode is hard!". My usual first response is that it's not Unicode that's hard, but human language; for instance, it's not Unicode's fault that:
* English is written left-to-right, Hebrew is written right-to-left, and you sometimes need to put both on the one line, like when you explain the name of PHP's scope resolution operator (Paamayim Nekudotayim, פעמיים נקודתיים‎, "double dot doubled");
* Some languages use some diacritical marks, others use different ones, so there are literally dozens of different marks that could be applied to letters; and each mark could be applied to any of quite a few letters;
* Not all letters in all languages have upper-case and lower-case forms, and in some cases, a single upper-case letter becomes multiple lower-case letters;
* And plenty more besides.

What Unicode does, of course, is bring a lot of these issues to the surface. A programmer who thought that "plain text" could fit inside ASCII, and now has to deal with all of this, will often tend to blame Unicode. But the truth is that Unicode actually makes so much of this easy - partly because you can push most of the work down to a lower-level library, and partly because of some excellent design decisions. Here's a little challenge for you: Make a way to transcribe text in any Latin script, with any diacriticals, using a simple US English keyboard layout. You'll need to devise a system for transforming an easy-to-type notation into all the different adorned characters you'd need to support. Your options: Use Unicode, or use eight-bit character sets from ISO-8859.

Here's how I did it with Unicode - specifically, in Pike, using GTK2 for my UI. First, translate a number of two-character sequences into the codepoints for combining characters; then perform a Unicode NFC normalization. And the second step is optional, done only because certain editors have buggy handling of combining characters (SciTE allows the cursor to get between them!), so really, the entire translation code consists of a handful of straight-forward translations - in my case, seven of them to cover all the combining marks that I need, plus four more for inverted question mark and exclamation mark, and the vowels not found on a US English keyboard (æ and ø); so there are a total of eleven transformations done.
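
The same two-step idea can be sketched in Python 3 (not the original Pike code). The notation here is invented for illustration: a backslash plus one character, typed after the letter it modifies. The table names and the `transcribe` function are likewise hypothetical; only the overall shape (seven combining marks, four standalone characters, then NFC) follows the description above.

```python
import unicodedata

# Seven two-character sequences that become combining marks.
COMBINING = {
    "\\`": "\u0300",  # grave
    "\\'": "\u0301",  # acute
    "\\^": "\u0302",  # circumflex
    "\\~": "\u0303",  # tilde
    "\\:": "\u0308",  # diaeresis
    "\\o": "\u030a",  # ring above
    "\\,": "\u0327",  # cedilla
}
# Four more for characters missing from a US English keyboard.
STANDALONE = {
    "\\?": "\u00bf",  # inverted question mark
    "\\!": "\u00a1",  # inverted exclamation mark
    "\\a": "\u00e6",  # ae ligature
    "\\/": "\u00f8",  # o with stroke
}

def transcribe(text):
    """Apply the eleven substitutions, then compose with NFC."""
    for typed, replacement in {**COMBINING, **STANDALONE}.items():
        text = text.replace(typed, replacement)
    return unicodedata.normalize("NFC", text)

print(transcribe("\\?Que\\' tal, man\\~ana?"))
print(transcribe("A\\ongstro\\:m, fac\\,ade, sm\\/rbr\\/d"))
```

Supporting another mark means adding one more line to `COMBINING`; the letter-by-letter combinations come for free from the normalization step.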

To do the same thing with ISO-8859 character sets would require: First, figure out which character set to use, and then enable only a subset of transformations. Then, have full translation tables including every letter-diacritical combination supported, and make sure you have them all correct. There'll be hundreds of transformation entries, and you'd need to redo them for every single character set; with Unicode, supporting a new language is simply a matter of seeing what's missing, and adding that.

Unicode made my code smaller AND more functional than ISO-8859 ever could. Unicode is awesome.

Thursday, 25 September 2014

Science, the Bible, and the theories of scientists

This is an open letter to NaClHv, in partial response to some things said in recent posts, though not specifically to any one of them. I invite a response, either here or on his own blog; if the latter, I will link to the post from here, for the benefit of future readers.

You've said that observations of nature are as trustworthy as Scripture, and should be taken equally as demonstrations of God's character. Great! I absolutely agree with you. For instance, if you shove a stick in the ground at noon on the summer solstice, measure the angle of its shadow, and compare that with the same measurement taken a known distance to the north or south, you can prove that the earth is round - and even calculate its radius, with a fair degree of accuracy. We can take that as good solid fact, and interpret Scripture in the light of it. That's science. So far, so good.

But that validity applies only to the raw facts - the purest observations. The raw initial data is what's infallible. Every interpretation based on that data is as fallible as the person who makes it - and more importantly, reflects the biases of the person who makes it, and we're all biased. When a big corporation commissions a scientific study to prove that Brand X Toothpaste produces whiter teeth and healthier gums than all its competitors, there'll be some raw data somewhere that's still perfectly correct, and then some massive interpretational bias (at least, I've yet to see any toothpaste ad that isn't subject to that). We know that the conclusion is heavily influenced by the funding, when it's that blatant. Do we acknowledge that, even when people are striving for true science, their initial preconceptions will affect their published conclusions?

If we do, then every piece of scientific consensus must be subject to review. Scientific consensus, especially today, generally means anti-Christian biases. If someone sets out to prove that God doesn't exist, and ends up concluding that everything happens by itself without any external influence, can we truly trust that declaration? No, because we know it's false - it contradicts the Bible. What if it's not quite so blatantly obvious? It's just as unreliable - it's still someone's interpretation of the facts.

So I ask you: Why are you going to great lengths to incorporate the popular anti-Christian view that we're descended from monkeys? There is no Scriptural support for this; there is no justification from the Word of God that suggests that this theory should be accepted. So what's the hard fact that you're incorporating? Where is the evidence that death occurred prior to Adam? (Romans 5 suggests that it didn't.)

I'm aware that there are messy convolutions in my own interpretation of Scripture, particularly the account of Creation. Convolutions are not, in themselves, fundamentally bad, but they're the equivalent of code smell in software - suggestive while not conclusive. The simpler solutions are the better ones, and I would love to find a pure, beautiful, clean theory, that explains everything perfectly. But until we have one, we have to accept a certain messiness. I hold to the Dr Humphreys theory that the "days" of Genesis 1 are perfectly literal, as observed from here on Earth's surface (we know that time is affected by gravity, so time and location must be bound up together, and any literal acceptance of those days must therefore stipulate an observer's location); that requires the assumption of an event horizon, crossing the observer probably during the fourth day of Creation. In contrast, your theory requires the assumption of two different types of person, genetically and visually identical, one of them bearing the image of God and the other not; it requires that there be people with whom the descendants of Adam and Eve interacted, yet who were not fully people, yet who were... and somehow, there has to be a boundary on the laws in Lev 18, which make it very clear that we're not supposed to interact in certain ways with non-humans. Where's the edge of non-humans, if it's not the sons of Adam and daughters of Eve?

But fundamentally, the question is: What facts are we using in forming our interpretations? Scripture is infallible. Raw observation of nature is infallible. Everything else isn't. If you can't duplicate my research yourself, you can't take my conclusions as perfect science - they're science filtered through me. If you were unable to read the Bible, you'd have to have someone else read it to you; would you allow your views of God to be based on an edited reading... by someone who openly hates God? And would you then go to great lengths to incorporate that into your world-view?

Then why accept it with science?

Tuesday, 9 September 2014

My not-so-secret chocolate stashes

Cecilia has a few different places to find chocolate. Scott Adams of "Dilbert" said that every strip he draws is based on truth, or is absolutely trumped by truth. So... I'm not a grad student, but how does my real-life chocolate stash measure up?

I'm not an artist, so I'll just tell you about the chocolate I have around, rather than combining it all into a nice pic. Let's see. There's my primary sharing chocolate, which everyone partakes of whenever we watch movies together - in a Cadbury Roses tin (without its lid; the lid is where my little Belle figurine is enjoying a picnic on the 'grass'). To restock the primary tin, I have a box of Guylian pralines, sitting on top of three spare hard drives and an Alice in Wonderland jigsaw puzzle. Also a case of Lindor bars, resting casually on the boxed Portal turret. The next most accessible is the makings of hot chocolate - in my In-Tray, which now contains nothing else. (What's more important than chocolate, after all!)

But the bulk of my chocolate is all in my bedroom. More barrels of hot chocolate, a variety of sharing chocolates, and a small collection of "presentable chocolate" - nicely boxed stuff that can become gifts.

I don't yet have any chocolate taped under the desk, but that's something I have definitely considered, ever since Ceci told me about her "ICE" chocolate. Seems a good idea.