Tuesday, February 28, 2012

In Which Cliff Stoll Destroys Ignorance

Rarely do those who experience great stories know at the beginning how the stories are going to end, precisely because it is the banishment of high orders of ignorance[1] that makes for a great story. If you already know the end of your story, you're already there, and there is no more story. All character development and all personal growth are a matter of coming to realize the answers to questions that one previously didn't even know existed to be asked. The higher the levels of ignorance overcome, the greater the story.
At one level, Cliff Stoll's The Cuckoo's Egg[2] is the story of how an astronomer became an expert in computer security, but Cliff could have expected this; he didn't know about security from the start, but he knew he would have to learn. He knew that there were questions to be asked on that topic. On another level, The Cuckoo's Egg is about how a self-styled irresponsible kid discovered responsibility and ethics. This is the greater story, because Cliff not only had to learn the answers, and not only had to learn the questions to ask to get those answers, but first had to learn that there was a topic about which ethical questions could be asked at all. That realization is the fundamental weltanschauung-altering event, and it is one Cliff struggled with throughout his pursuit of the German hacker. He knew that his views were changing; he worried about how he would be received by his "radical friends" in the "People's Republic of Berkeley"; and he worried because he knew that they would not merely disagree with his new politics, but were basically incapable of understanding them, because they did not know how to ask the questions that were the prerequisite for understanding. It is that realization, not the simple acquisition of technical knowledge, that made him a real expert in his new field.

[1] http://dl.acm.org/citation.cfm?id=352194
[2] http://www.amazon.com/Cuckoos-Egg-Tracking-Computer-Espionage/dp/1416507787/

Wednesday, February 15, 2012

Facebook Knows My Family Tree

Genealogical research is not about generating new information; it is about finding information that already exists somewhere, in some inconvenient format, and re-entering it. It means using human brains to look for pointers in written records and to join those records by hand. As more and more records are digitized, or collected originally in digital form, it is utterly insane that human researchers should still be required to do this menial work. No one should have to slave away manually searching through old records looking for the pointers that connect one human to another in the family graph. The greatest potential revolution in genealogy lies not in new software to streamline the process of doing research, but in software that eliminates the human bottleneck entirely. The pinnacle of genealogical technology will have arrived when family trees assemble themselves, requiring only that someone ask.
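To illustrate what that joining amounts to mechanically, here is a toy sketch of automated record linkage; the record fields, the names, and the exact-match rule are purely illustrative, and real linkage would need fuzzy matching over dates, places, and spellings:
// Toy digitized records; a real source would have thousands of these.
var records = [
    {id: 1, name: "John Smith", birth: 1850, father: "James Smith"},
    {id: 2, name: "James Smith", birth: 1820, father: null}
];

// Link each child record to plausible parent records by name,
// producing parent->child edges in the family graph.
function linkParents(records){
    var byName = {};
    records.forEach(function(r){
        (byName[r.name] = byName[r.name] || []).push(r);
    });
    var edges = [];
    records.forEach(function(child){
        (byName[child.father] || []).forEach(function(parent){
            // Crude plausibility check: a parent must be older than the child.
            if(parent.birth < child.birth) edges.push([parent.id, child.id]);
        });
    });
    return edges;
}

console.log(linkParents(records)); // [ [ 2, 1 ] ]: James is John's father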

Monday, February 13, 2012

Lobsters Are Bugs

Tonight, a major life milestone was passed: I went on my first Real Date.

OK, so I've said that about a lot of other events. And I will probably say it about many more to come. This is because the classification of "date or not date" is not binary, but runs along a continuum of more and less traditional activities. And on Valentine's Eve, I have moved further along this continuum than ever before: I took my fiancée to a Real Fancy Restaurant.

This was only because we got a Gift Card for Red Lobster last Christmas, but for purposes of moving along the continuum, it totally counts.

As a result, I have now had my first experience eating Lobster. I'm not sure what to make of it; it tasted OK, but felt like raw beef. And lobsters are totally giant bugs. I'm not sure how I feel about that. I also had it confirmed by the waiter that there is in fact no dignified way to eat non-de-shelled shrimp.

My fiancée is laughing silently as I write this. I shall go hug her now, and look forward to further adventures moving along the Continuum of Real Dates in time to come.

Circumventing Filters for the Good Guys

Discussions of the ethical implications of technology tend towards the negative; if 'ethics' and 'technology' are mentioned in the same paragraph, it's usually to warn that the technology in question is somehow dangerous. Some of the more frequent targets of this sort of derision are file-sharing software (like BitTorrent) and anonymizers (like Tor). BYU's own internet filters even block websites about Tor, since it can be used to circumvent them. So it's nice to see an example of both of these technologies being used on a large scale for what most Americans, at least, would find a very ethically appropriate goal: circumventing Iran's attempts to simplify totalitarian surveillance by eliminating its citizens' use of encryption[1]. Technology itself, after all, is neither good nor bad; it's all in how the technology in question is used.

[1] http://arstechnica.com/tech-policy/news/2012/02/tors-latest-project-helps-iran-get-back-online-amidst-internet-censorship-regime.ars

Monday, February 6, 2012

Eastern Europe: A Bastion of Freedom and Democracy?

In case you haven't heard yet, while the US public has been worrying about SOPA and PIPA, Europe has started dealing with its own version, called ACTA. While some were allegedly taken by surprise by the massive protests against SOPA and PIPA, everyone's pretty much used to Americans protesting things. But would you be surprised to hear about anti-ACTA demonstrations in Prague? Or Czech members of parliament refusing to support it "as a matter of principle", and claiming that the media "played a part in the hush-up"[1]? A Slovenian ambassador even made a public apology for having signed the agreement, saying that she had acted carelessly and in ignorance and had failed in her civic duty[2]. That is a level of candor that we never expect to hear from a US politician. Admitting ignorance as the Slovenian ambassador did is the first step towards gaining wisdom. Unfortunately, admitting ignorance is socially unacceptable around here. Everybody knows, tacitly, that no one person can possibly be an expert on everything they would need to know to decide every issue that faces the government. We'll start to make progress a lot faster when we get around to no longer being embarrassed by that fact.

[1] http://m.ceskapozice.cz/en/news/politics-policy/czech-euro-mps-oppose-%E2%80%98completely-wide-mark%E2%80%99-acta
[2] http://boingboing.net/2012/02/03/slovenias-ambassador-apologi.html

Saturday, February 4, 2012

JavaScript Needs Continuations

I am a JavaScript junkie. I love JavaScript. I love building things in JavaScript. And I love the fact that node.js lets me easily use JavaScript on the server as well as the client. But sometimes, JavaScript is just plain missing a really useful feature, and it gets on my nerves. One thing that I do not love about JavaScript, in which opinion I am far from alone, is the proliferation of nested callbacks.

The biggest problem with asynchronous callbacks is that they're infectious. Asynchronicity cannot be isolated and encapsulated.
Consider this code:
var returnHTML = renderAPage(name);
response.end(returnHTML);
...
function renderAPage(name){
    return "Hello " + (name || "World") + "!";
}

Now maybe I want to make it a little more interesting and read a template out of a file or something:
var returnHTML = renderAPage(name);
response.end(returnHTML);
...
function renderAPage(name){
    return fs.readFileSync("hello.html", 'utf8').replace("{name}", name || "World");
}

But this blocks the event loop while the file is read, so I clearly want to use asynchronous IO instead. But if I do, I cannot maintain the same interface! Altering the internal implementation of renderAPage requires leaking changes up through everything that calls it- and this is true no matter how deeply nested that one asynchronous call may be in other auxiliary functions. My new asynchronous code now looks like this:
renderAPage(name, function(err, returnHTML){
    if(err) throw err;
    response.end(returnHTML);
});
...
function renderAPage(name, callback){
    fs.readFile("hello.html", 'utf8', function(err, text){
        if(err) callback(err);
        else callback(null, text.replace("{name}", name || "World"));
    });
}
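As an aside, a complete, runnable version of this asynchronous variant might look something like the following minimal node.js server; the hello.html template, the port, and the query-string handling here are my own assumptions for illustration:
// Minimal sketch: serves hello.html with {name} filled in from ?name=...
var fs = require('fs');
var http = require('http');
var url = require('url');

function renderAPage(name, callback){
    fs.readFile("hello.html", 'utf8', function(err, text){
        if(err) callback(err);
        else callback(null, text.replace("{name}", name || "World"));
    });
}

http.createServer(function(request, response){
    var name = url.parse(request.url, true).query.name;
    renderAPage(name, function(err, returnHTML){
        if(err) throw err;
        response.end(returnHTML);
    });
}).listen(8080);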

Either way, this is much more cluttered, and I've essentially had to transform my entire program by hand into continuation-passing style just so the event loop can take control away after that asynchronous system call and give it back later. CPS transformations are supposed to be the job of a compiler, not a programmer. What I would really like to do is this:
var returnHTML = renderAPage(name);
response.end(returnHTML);
...
function renderAPage(name){
    var return_to = arguments.continuation;
    fs.readFile("hello.html", 'utf8', function(err, text){
        if(err) throw err;
        return_to(text.replace("{name}", name || "World"));
    });
}
Here, return_to is a genuine continuation: called like a function, but it passes its argument through to become the return value at the place where the continuation was saved. Being able to save continuations makes asynchronicity encapsulatable. And unlike other approaches to fixing JavaScript concurrency, this does not require any additional syntax or keywords; just an extra field on arguments representing the continuation at the place where the function was called.

Eh, but there is one complication- the way asynchronous calls are handled currently, renderAPage will end up returning twice, and the first time it'll return undefined, which is bad. We can just check the return value and terminate if it's not the real value, kind of like checking the return value on POSIX fork, but that fails to eliminate the leaking of implementation details. We could change the semantics of asynchronous calls, so they always suspend execution of that particular stack and never return. But then, what if you really do want to return twice?
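To see that complication concretely, here is a runnable emulation in today's JavaScript, with the continuation reified as an explicit callback k standing in for arguments.continuation; the file name and sample argument are illustrative:
var fs = require('fs');

function renderAPage(name, k){
    fs.readFile("hello.html", 'utf8', function(err, text){
        if(err) throw err;
        k(text.replace("{name}", name || "World"));
    });
    // Falling off the end here is the first return: undefined.
}

var first = renderAPage("Ada", function(html){
    console.log("second return, via the continuation:", html);
});
console.log("first return:", first); // undefined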

I don't think that can be addressed without some additional syntax. Fortunately, it's a very simple bit of additional syntax. The break and continue keywords can already be used with label arguments, and they seem the perfect words to use for continuation control with expressions as arguments:
var returnHTML = renderAPage(name);
response.end(returnHTML);
...
function renderAPage(name){
    var return_to = arguments.continuation;
    break fs.readFile("hello.html", 'utf8', function(err, text){
        if(err) throw err;
        return_to(text.replace("{name}", name || "World"));
    });
}
Here, I'm using the break keyword to signal that this function call will never return- if it tries to return, just terminate execution. Thus, the only way to get information out of it is to have the continuation called. But what if I want parallel execution?
var returnHTML = renderAPage(name);
var headers = renderHeaders();
response.writeHead(200, headers);
response.end(returnHTML);
renderAPage and renderHeaders might both contain asynchronous calls with break, but I have no need to run them sequentially, and I don't want to pause the whole thread while waiting for renderAPage to return via continuation. Well, that's where continue comes in:
var returnHTML, headers;
continue returnHTML = renderAPage(name);
continue headers = renderHeaders();
response.writeHead(200, headers);
response.end(returnHTML);
This usage of continue tells the interpreter not to worry about whatever might be going on inside the following expression- don't worry about side effects, don't worry about execution breaks; spawn a new thread to handle it if you want, but that's an implementation choice, not a requirement. You're allowed to keep going and execute some more lines; just remember that if you ever actually need to read the result of that expression, pause and wait for the return, whether it comes from an actual return statement or from calling a continuation. I'm not sure how one should handle the possibility of multiple returns in this situation; the simplest approach might be to say that continues are only allowed to return once, and that subsequent calls to that continuation will throw an error or be ignored.
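That "only once" rule could look like this as a plain function wrapper, with the helper name oneShot invented for illustration:
// Wrap a continuation so that resuming it a second time throws.
function oneShot(k){
    var called = false;
    return function(value){
        if(called) throw new Error("continuation already resumed");
        called = true;
        return k(value);
    };
}

var resume = oneShot(function(v){ console.log("resumed with", v); });
resume(42);   // resumed with 42
// resume(43); // would throw: continuation already resumed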

If the interpreter does feel free to actually spawn a new thread to handle "continues", this potentially gives the programmer great power to define new asynchronous functions without having to use the dreaded nextTick or setTimeout, perhaps something like this:
function myAsyncFunction(){
    var k = arguments.continuation;
    continue break (function(){
        ...
        k(return_value);
    })();
}
The continuation is saved; an anonymous internal function is called and specified not to return, but execution of the containing function is allowed to continue, and will initially return undefined, just like fs.readFile. At some point, however, the internal anonymous function calls the continuation, and it returns again.
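For comparison, a rough, runnable equivalent of that pattern in today's node.js falls back on exactly the explicit callback and process.nextTick that this proposal is meant to hide; the names are illustrative:
// callback plays the role of k; nextTick plays the role of the suspended stack.
function myAsyncFunction(callback){
    process.nextTick(function(){
        callback("the eventual return value");
    });
    // Implicit first return: undefined, just like fs.readFile.
}

myAsyncFunction(function(result){
    console.log(result); // "the eventual return value"
});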

These additional behaviors for break and continue do not conflict with any existing syntactically valid constructs, and since they don't require adding any new keywords, they're guaranteed not to break any pre-existing code. All of that extra syntax, however, is just about figuring out how to deal with concurrency. Concurrency is important because it's the main impetus for my annoyance over JavaScript's lack of continuations, but once you've got continuations, they're useful for much more than just that. And while adding real continuations may be a major implementation feat for the interpreter, the interface for it does not have to be, and again should be perfectly backwards-compatible. Will anyone give me my arguments.continuation?

EDIT: Somehow, my code samples got all messed up when this was first published. They are now all corrected.