Monday, 14 April 2008

Compensating for Spamhaus

Exercise. Some pundits claim that (so-called) end-users should never ever send their own mail directly. You beg to differ, especially after experience with rubbish ISPs, and for years at work and at home have happily delivered your e-mail directly from your network to destinations, using Plan 9's pleasant little upas/smtp. Why bother with middlemen?

One day, your Internet supplier provides a shiny new set-top box with separate cable modem. Plug in the RJ45, hit a few configuring web pages, DHCP, update Sender Policy entries on dyndns, and away you go. Except that your mail is now being rejected by some sites. Yahoo? MSN? Hate them anyway. Gmail?! Oh dear! Why? There is a peculiar organisation, let's call it Spamhaus (which really ought to be the name of a group that sends spam), that busies itself making lists of IP address ranges that supposedly belong to these horrible end-user people. (The ones that pay to connect to the Internet, but never mind.) It turns out that your new address, unlike the old, is on a list. (Same supplier, but again, never mind.)
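For readers who have not met these lists before: receiving mail servers usually consult them through DNS, by reversing the octets of the connecting address and looking the result up under the list's zone. The sketch below shows that lookup in Python. It assumes the list in question is the one published under the zone pbl.spamhaus.org, and 192.0.2.25 is only a documentation address standing in for your own; the function names are made up for illustration.

```python
import ipaddress
import socket


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Return the DNS name to look up for an IPv4 address in a DNSBL zone."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def is_listed(ip: str, zone: str) -> bool:
    """True if the zone answers with a 127.0.0.x listing code (needs network)."""
    try:
        answer = socket.gethostbyname(dnsbl_query_name(ip, zone))
    except socket.gaierror:
        return False  # no such name: the address is not on the list
    # Answers outside 127.0.0.x are treated as errors rather than listings.
    return answer.startswith("127.0.0.")


if __name__ == "__main__":
    # Placeholder address and assumed zone; substitute your own.
    print(dnsbl_query_name("192.0.2.25", "pbl.spamhaus.org"))
```

Some list operators refuse queries that arrive through large public resolvers, so a negative result from is_listed is only as reliable as the resolver used to obtain it.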

Others in your house are now distressed by rejected mail (they actually know people on Yahoo and MSN!). Fix it, using other computers and software as needed, but without introducing a store-and-forward phase or changing your mail domain.

Solution. The network in the exercise is running Plan 9, but saddled with a Spamhosed address. You have fortunately got access to a virtual server elsewhere with a safe address, but it is running Linux. Run hosted Inferno on theserver and export /net:

exec /usr/inferno/Linux/386/bin/emu /dis/sh.dis -c "\
listen 'tcp!*!port' {export -a /net}"

That makes the socket-based network interfaces of the Linux system accessible through the name space that the Inferno system exports on port, using the Styx protocol.

On the Plan 9 system, add a chunk similar to the following to /mail/lib/remotemail, before it calls smtp:

# reuse the connection cached in /srv/netexp; if the mount fails, redial and retry
while(! mount /srv/netexp /n/remote){
{rm -f /srv/netexp && srv tcp!theserver!port netexp} ||
exit 'import failed'
}
# take only the TCP stack from the other machine
bind /n/remote/tcp /net/tcp || exit 'failed bind'

The Styx (=9P) connection for the /net exported from theserver is cached in /srv/netexp, and mounted at /n/remote in remotemail's own name space, allowing just /net/tcp to be bound in from the other machine. (If the cached connection has hung up, we get another.)

When upas/smtp is later invoked by remotemail, it will dial the destination machines using the name /net/tcp as usual for Plan 9, but the name will refer to the instance bound from the other machine, which thus acts as a TCP/IP gateway, transparently. Because only the remote /net/tcp is used, DNS lookups will use the local /net/udp, and the local name server /net/cs. The outgoing TCP/IP traffic (just for smtp) will have theserver's IP address, because it is using the server's TCP/IP sockets via the hosted Inferno.

Adjust the Sender Policy Framework and related DNS entries to suit, and get back to productive work.
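As an illustration of that last step, a Sender Policy Framework entry that authorises only the gateway might look like the zone-file style TXT record below. The domain example.org and the address 192.0.2.25 are placeholders (the latter standing in for theserver's address), and the form your DNS provider expects may differ.

```text
example.org.  IN  TXT  "v=spf1 ip4:192.0.2.25 -all"
```

Assuming every outgoing SMTP connection goes through the gateway, the home network's own address need not appear in the record at all.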


Sunday, 13 April 2008

Summer of Code 2007 results and experience

Inferno projects in 2007 ended up within the Plan 9 from Bell Labs organisation. You can read about it in last year's blog. I was mentor for three of those projects: SPKI infrastructure for Inferno (Katie Reynolds/katelyn); Venti-like system in Limbo for Inferno, with added Rabin fingerprinting (Mechiel Lukkien/mjl); and a port of Inferno to the Nintendo DS (Noah Evans). The students were all talented, which was just as well, since otherwise acting as mentor for three projects (and helping a bit on some others) would have been quite impossible. The projects were modular, so that timing and expectations could be adjusted fairly easily as the summer wore on, and there was something to show for it all early on.

Here is an edited version of a post I made elsewhere just after the programme ended, of our experience of GSoC 2007.

I had three students to mentor and they all produced work that is being included in the organisation's distributions. One of the projects (SPKI) finally wrote code to implement some ideas I had originally intended to implement three years ago, but had to put aside for lack of time. That success in turn is leading to a significant change to ancient code and mechanisms in one of our systems, mainly by deleting code from its kernels. (So in a way, for us it was the Google Summer of Anti-code, which seems good to me.)

Those three projects were all quite hard. The SPKI one required installing two related but different operating systems and a large application suite, and then writing code in both C (for one part) and a concurrent programming language the student had never seen before (Limbo, for other parts). Another project required writing an archival storage subsystem broadly based on an existing design (Venti) but including some new techniques. In the third project, the student got the Inferno operating system settled as a native kernel on a new platform — first on an emulator, then on the hardware — without previous experience of doing kernel ports.

Despite the relative difficulty, the projects all worked out well, because the students settled down and did the work, and they kept it up. Sometimes I would receive e-mail starting "This is probably a silly question, but ...", and not only was it not silly, it revealed some long-standing flaw/deficiency/confusion in system or documentation or both. The students have also expressed interest in continuing to contribute to the underlying systems, time and graduate study permitting.

None of this would have happened without GSoC.

The style of interaction was different for each student. E-mail was used quite a bit, partly because of time zone differences and partly because I prefer it (it offers a chance to think about the questions or responses), but one of the students used Google chat with me quite a bit at critical points, to good effect. Interaction generally increased after mid-term evaluation, partly because the more complex stages had been reached, but mainly because I felt guilty about not spending as much time on the programme as I had originally intended, owing to a change in my own workload. Generally I found it similar to supervising a student project, but with the added complications of not being able to assess the student's ability as easily, having to cope with time zones, and meeting deadlines in the day job.

Another GSoC organisation suggested that detailed specifications might improve the outcome. One of my three projects was defined in considerable detail (including external references), another was fairly obvious in scope, and the third had an overall aim but no real detail. Thus, from my own experience, I cannot conclude that either over- or under-specification is critical (or not) to success. "It depends."

What else might help? All three projects were committing source code well before the mid-term point. Each of the projects had distinct stages identified, and all were deliberately or inherently open-ended (making it easy to add or remove items depending on actual progress). Insisting on fairly regular supply of code (or at least design material and discussion) helps to highlight writer's block.

Now, the Plan 9 organisation as a whole had 13 projects and 5 failed the final evaluation, which is not so much too high in absolute terms as too high too late (we ought to have failed more at mid-term). But then again, there were 8 successful projects, and none of those would have happened without GSoC. It's a bit like one of those tabloid newspaper articles bemoaning that "20% of people do/believe/hope some horrible thing" when a reasonable response might be that 80% of people do not. Our students succeeded more often than not. Still, we all hate wasting time and money on failure.

I later read through the original applications, the mid-term reviews and the final reviews, to see whether there was anything to distinguish the set that succeeded from the set that did not. I was reminded that during the application competition, I thought the quality of both GSoC applicants and the coherence of their applications was good. We had well over 100 applications, and there were only a few that were spam or no-hopers. The ones we accepted that subsequently failed still looked plausible to me.

For the projects I was going to mentor, I was particular about assessing the students' portfolios, which included any sort of code they'd previously written, ideally on their own, perhaps as a student project. I would not have declined to mentor someone who was relatively inexperienced, but I would never have agreed to mentor them on an ambitious project.

Still, only one of the three that I did mentor had any relevant experience of their chosen project area, so the previous work wasn't directly applicable, and they still had quite a bit of work to do. On the other hand, at least one project that failed had an apparently experienced student who could point to previous, plausible, and even relevant work. As a rule, though, we'd probably pay special attention to evident capability in future.

All but two of the failed projects were reasonable ones that could be expected to be done (to an adequate level) in the time available. We ought, however, to have failed the three simpler projects at mid-term for lack of progress. One of those would have been especially good fun, and had a good mentor for it, who helped agree a simplified project at mid-term, but the student simply did not seem to come to grips with it at all. I eventually realised that all the projects that did fail had a worried mentor at mid-term, and the ones that succeeded looked fine to their mentors (even when the students were worried about progress).

One pleasant surprise for me was that the native kernel port to Nintendo DS succeeded. One rule of thumb I originally had for GSoC projects was "never suggest doing a device driver, let alone a kernel port", but although one port did fail, one worked. The reason for avoiding drivers and ports is that although they seem like fine projects, we know historically it is quite difficult to help debug them at a distance, even when both parties ostensibly have the same hardware, and that seemed a big risk for a short summer project. The Nintendo port has attracted an enthusiastic group, and work continues in a project on Google Code.

Our rather loose (in several ways) organisation overall received a big benefit from participating in GSoC: there is now more code in public, and just as important, more visibility and more participants in the larger project.