Monday 14 April 2008

Compensating for Spamhaus

Exercise. Some pundits claim that (so-called) end-users should never ever send their own mail directly. You beg to differ, especially after experience with rubbish ISPs, and for years at work and at home have happily delivered your e-mail directly from your network to destinations, using Plan 9's pleasant little upas/smtp. Why bother with middlemen?

One day, your Internet supplier provides a shiny new set-top box with separate cable modem. Plug in the RJ45, hit a few configuring web pages, DHCP, update Sender Policy entries on dyndns, and away you go. Except that your mail is now being rejected by some sites. Yahoo? MSN? Hate them anyway. Gmail?! Oh dear! Why? There is a peculiar organisation, let's call it Spamhaus (which really ought to be the name of a group that sends spam), that busies itself making lists of IP address ranges that supposedly belong to these horrible end-user people. (The ones that pay to connect to the Internet, but never mind.) It turns out that your new address, unlike the old, is on a list. (Same supplier, but again, never mind.)

Others in your house are now distressed by rejected mail (they actually know people on Yahoo and MSN!). Fix it, using other computers and software as needed, but without introducing a store-and-forward phase or changing your mail domain.

Solution. The network in the exercise is running Plan 9, but saddled with a Spamhosed address. You have fortunately got access to a virtual server elsewhere with a safe address, but it is running Linux. Run hosted Inferno on theserver and export /net:

exec /usr/inferno/Linux/386/bin/emu /dis/sh.dis -c "\
listen 'tcp!*!port' {export -a /net}"

That makes the socket-based interfaces of the Linux system accessible through the name space exported on port by the Inferno system using the Styx protocol.

On the Plan 9 system, add a chunk similar to the following to /mail/lib/remotemail, before it calls smtp:

# Reuse the cached Styx connection in /srv/netexp if it still works;
# otherwise discard it and dial theserver again.
while(! mount /srv/netexp /n/remote){
{rm -f /srv/netexp && srv tcp!theserver!port netexp} ||
exit 'import failed'
}
# Replace only the TCP part of the local network name space.
bind /n/remote/tcp /net/tcp || exit 'failed bind'

The Styx (=9P) connection for the /net exported from theserver is cached in /srv/netexp, and mounted at /n/remote in remotemail's own name space, allowing just /net/tcp to be bound in from the other machine. (If the cached connection has hung up, we get another.) When upas/smtp is later invoked by remotemail, it will dial the destination machines using the name /net/tcp as usual for Plan 9, but the name will refer to the instance bound from the other machine, which thus acts as a TCP/IP gateway, transparently. Because only the remote /net/tcp is used, DNS lookups will use the local /net/udp, and local name server /net/cs. The outgoing TCP/IP traffic (just for smtp) will have theserver's IP address, because it is using the server's TCP/IP sockets via the hosted Inferno. Adjust the Sender Policy Framework and related DNS entries to suit, and get back to productive work.
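
As a rough illustration of that last step: outgoing mail now appears to come from theserver, so its address is the one the SPF entry must authorise. The zone-file fragment below is only a sketch. The domain example.org and the address 192.0.2.10 are placeholders for your own mail domain and theserver's public address; neither comes from the setup described above, and the hard-failing -all is likewise an assumption.

```
; Hypothetical names and addresses: substitute your own mail domain
; and the public IP address of theserver.
example.org.    IN  TXT  "v=spf1 ip4:192.0.2.10 -all"
```

A softer ~all may be the safer choice while checking that the change has taken effect.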


Sunday 13 April 2008

Summer of Code 2007 results and experience

Inferno projects in 2007 ended up within the Plan 9 from Bell Labs organisation. You can read about it in last year's blog. I was mentor for three of those projects: SPKI infrastructure for Inferno (Katie Reynolds/katelyn); Venti-like system in Limbo for Inferno, with added Rabin fingerprinting (Mechiel Lukkien/mjl); and a port of Inferno to the Nintendo DS (Noah Evans). The students were all talented, which was just as well, since otherwise acting as mentor for three projects (and helping a bit on some others) would have been quite impossible. The projects were modular, so that timing and expectations could be adjusted fairly easily as the summer wore on, and there was something to show for it all early on.

Here is an edited version of a post I made elsewhere just after the programme ended, of our experience of GSoC 2007.

I had three students to mentor and they all produced work that is being included in the organisation's distributions. One of the projects (SPKI) finally wrote code to implement some ideas I had originally intended to implement three years ago, but had to put aside for lack of time. That success in turn is leading to a significant change to ancient code and mechanisms in one of our systems, mainly by deleting code from its kernels. (So in a way, for us it was the Google Summer of Anti-code, which seems good to me.)

Those three projects were all quite hard. The SPKI one required installing two related but different operating systems and a large application suite, and then writing code in both C (for one part) and a concurrent programming language the student had never seen before (Limbo, for other parts). Another project required writing an archival storage subsystem broadly based on an existing design (Venti) but including some new techniques. In the third project, the student got the Inferno operating system settled as a native kernel on a new platform — first on an emulator, then on the hardware — without previous experience of doing kernel ports.

Despite the relative difficulty, the projects all worked out well, because the students settled down and did the work, and they kept it up. Sometimes I would receive e-mail starting "This is probably a silly question, but ...", and not only was it not silly, it revealed some long-standing flaw/deficiency/confusion in system or documentation or both. The students have also expressed interest in continuing to contribute to the underlying systems, time and graduate study permitting.

None of this would have happened without GSoC.

The style of interaction was different for each student. E-mail was used quite a bit, partly because of time zone differences, and partly because I prefer it: it offers a chance to think about the questions or responses. One of the students, though, used Google chat with me quite a bit at critical points to good effect. Interaction generally increased after mid-term evaluation, partly because the more complex stages had been reached, but mainly because I felt guilty about not spending as much time on the programme as I had originally intended, owing to a change in my own workload. Generally I found it similar to supervising a student project, but with the added complications of not being able to assess the student's ability as easily, having to cope with time zones, and deadlines in the day job.

Another GSoC organisation suggested that detailed specifications might improve the outcome. One of my three projects was defined in considerable detail (including external references), another was fairly obvious in scope, and the third had an overall aim but no real detail. Thus, from my own experience, I cannot conclude that either over- or under-specification is critical (or not) to success. "It depends."

What else might help? All three projects were committing source code well before the mid-term point. Each of the projects had distinct stages identified, and all were deliberately or inherently open-ended (making it easy to add or remove items depending on actual progress). Insisting on fairly regular supply of code (or at least design material and discussion) helps to highlight writer's block.

Now, the Plan 9 organisation as a whole had 13 projects and 5 failed the final evaluation, which is not so much too high in absolute terms as too high too late (we ought to have failed more at mid-term). But then again, there were 8 successful projects, and none of those would have happened without GSoC. It's a bit like one of those tabloid newspaper articles bemoaning that "20% of people do/believe/hope some horrible thing" when a reasonable response might be that 80% of people do not. Our students succeeded more often than not. Still, we all hate wasting time and money on failure.

I later read through the original applications, the mid-term reviews and the final reviews, to see whether there was anything to distinguish the set that succeeded from the set that did not. I was reminded that during the application competition, I thought that both the quality of the GSoC applicants and the coherence of their applications were good. We had well over 100 applications, and there were only a few that were spam or no-hopers. The ones we accepted that subsequently failed still looked plausible to me.

For the projects I was going to mentor, I was particular about assessing the students' portfolios, which included any sort of code they'd previously written, ideally on their own, perhaps as a student project. I would not have declined to mentor someone who was relatively inexperienced, but I would never have agreed to mentor them on an ambitious project.

Still, only one of the three that I did mentor had any relevant experience of their chosen project area, so the previous work wasn't directly applicable, and they still had quite a bit of work to do. On the other hand, at least one project that failed had an apparently experienced student who could point to previous, plausible, and even relevant work. As a rule, though, we'd probably pay special attention to evident capability in future.

All but two of the failed projects were reasonable ones that could be expected to be done (to an adequate level) in the time available. We ought, however, to have failed the three simpler projects at mid-term for lack of progress. One of those would have been especially good fun, and had a good mentor for it, who helped agree a simplified project at mid-term, but the student simply did not seem to come to grips with it at all. I eventually realised that all the projects that did fail had a worried mentor at mid-term, and the ones that succeeded looked fine to their mentors (even when the students were worried about progress).

One pleasant surprise for me was that the native kernel port to Nintendo DS succeeded. One rule of thumb I originally had for GSoC projects was "never suggest doing a device driver let alone a kernel port", but although one port did fail, one worked. The reason for avoiding drivers and ports is that although they seem like fine projects, we know historically it is quite difficult to help debug them at a distance, even when both parties ostensibly have the same hardware, and that seemed a big risk for a short summer project. The Nintendo port has attracted an enthusiastic group, and work continues in a project on Google Code.

Our rather loose (in several ways) organisation overall received a big benefit from participating in GSoC: there is now more code in public, and just as important, more visibility and more participants in the larger project.

Monday 5 March 2007

Summer of Code 2007

It is at last lighter earlier in the morning and later in the evening here in England, hinting at the end of hibernation, the arrival of spring and with it, the start of this year's Google Summer of Code. Last year we inadvertently slept through that. This year we are better prepared.

During this last week, we have extended the set of projects on the Wiki at the Inferno-os site at code.google.com, adding a new section specifically for Summer of Code projects. Of course, those projects could be done at any other time, and other projects on that page might also be good projects for students. Still, the hope was that the Summer suggestions themselves would be a little different. The topics include naming in networks, language implementation, constraint-based systems, data archiving, design and implementation of library modules, and Inferno-related plug-ins for browsers. Whether they have been given short paragraphs or just bullet-points as descriptions, all the projects are essentially open-ended. Even so, they are intended to allow development to be done in stages during the time available. In every case, at the end of the summer there should be something substantial and useful to show for it.

I thought the work should also be reasonably self-contained, without programmers having to become familiar with large chunks of existing code. After all, those that have not seen or used Inferno before will usually take a little while to come to grips with the Limbo language and the Inferno environment. There is plenty of interesting work to be done in each project, making use of the novel aspects of Inferno to be sure, but also learning about a given application area. What do I mean by "interesting"? Every one of them is a project I should like to do myself, and perhaps has languished on my own TODO list, and none of them is "grunge work".

Notably excluded from the Summer section are native ports of the Inferno kernel (native ports are ports to raw hardware). Native ports are good projects, but I have experience now of people doing ports remotely and it is often just too hard to diagnose trouble at a distance, even if both parties have supposedly identical bits of hardware. The debugging support is often a little too Spartan for novices. As summer projects they are quite risky. By contrast, there has been more success with so-called hosted ports, where the Inferno kernel runs as an application under another operating system, and a few such projects are included.

One of the earliest mediaeval rounds merrily celebrates the lively arrival of early summer after the chill of winter:
Sumer is icumin in,
Lhude sing, cuccu!

The BBC Schools' Radio web site notes about it that:
"As the cannon [sic!] develops, the melody weaves around itself to create a rich and complex rhythmic and harmonic pattern"

That rich, constructive result is one of the hopes for this collection of projects.

Thursday 21 December 2006

Subversive Inferno

Now that Inferno's source code is available via Subversion on Googlecode, with incremental updates there (if it all works), I thought I'd try to keep a related blog. Fortunately it is Christmas, and I am on holiday. Meanwhile, Caerwyn's Inferno Programmer's Notebook (http://www.caerwyn.com/ipn/) sets a sufficiently high standard to satisfy readers (and put me off even trying) until the New Year. Still, why "Subversive" in the title of my initial post?
  1. Sub*vert" (?), v. t. [imp. & p. p. /Subverted/; p. pr. & vb. n. /Subverting/.] [L. subvertere, subversum; sub under + vertere to turn: cf. F. subvertir. See /Verse/.]
1. To overturn from the foundation; to overthrow; to ruin utterly.
These are his substance, sinews, arms, and strength,
With which he yoketh your rebellious necks,
Razeth your cities, and subverts your towns. Shak.
This would subvert the principles of all knowledge. Locke.
2. To pervert, as the mind, and turn it from the truth; to corrupt; to confound. 2 Tim. iii. 14. Syn. -- To overturn; overthrow; destroy; invert; reverse; extinguish.

...
It seems to have been a poor choice: ruination, corruption, turning from the truth etc. are certainly not the targets. Changing something at the foundation is probably closer to the real aim, and given the vast amount of existing code and systems, aiming to displace them is not realistic, but turning them all into run-time platforms for Inferno is both practical and sound.