Integrating NT & Unix Email
NOTE: this document was originally written in 2000 yet remains interesting from an historical perspective. References to 'two years ago' therefore relate to 1998 or thereabouts.
Some two years ago we became involved in designing a corporate email solution for a customer based in the UK's Midlands. Many of the issues that we faced have popped up again in various forms since then, so it seemed a reasonable time to describe what they were and some of the techniques used to solve them.
In particular, we think that an innovative approach has been to mix Unix and NT email systems in a way that works well for both communities of users — something we haven't seen described much elsewhere.
We were working as part of a team involving staff from our client's IT department, their existing network consultants and from time to time representatives of preferred suppliers. This is a situation which tends to lead to compromise and indeed had its own impact on the final design — our own preference was for a slightly different configuration, but at the end of the day the system as built works and reliably at that. There's nothing better than a happy customer!
The requirement was, as most are, an attempt to mix the possible with the desirable and get the best value for money with the least overhead. Just what was needed wasn't fixed in stone; instead, everyone involved had to look at the existing setup, consider what could be done with it and, within reasonable cost and timescale constraints, deliver a usable email solution to the business.
Business at the company is geographically spread. From the base in the central UK, there are around 150 branch offices, all connected via multiplexed serial lines to base. At present these lines carry printer and green-screen (dumb terminal) traffic, giving the branches access to databases run on some large Unix machines managing stock levels and the usual paraphernalia of a large distribution concern.
Though they don't really know it, the green-screen users are connecting to the various Unix systems via simple menus. They are adept users of the keyboard. To them, control-alt-shift-function12 is almost mystical in its secret meaning. If you showed them a mouse, they'd most likely try to set a trap for it.
Head office has a mixture of green-screen terminals and OS/2 PCs. Most PC users have historically been attached to WordPerfect, but with the best will in the world it's hard to see OS/2 as a strategic direction, so Windows in some shape or form is seen as a likely destination.
There is a comprehensive TCP/IP infrastructure already in place.
Not only do all of the green-screen users want an email solution, but (naturally) so do the PC client users and it's clear that some kind of solution is also wanted by the mobile sales force, who currently have no real IT support at all, unless you count the ability to call back to their office by phone. They don't have laptops or anything even remotely whizz-bang.
Administration has to be done by trained staff who are NOT required to be gurus. The operation runs 24 hours a day and there is an operations staff available at all times to deal with basic problems and maintenance, but they already have enough to do running the existing systems. The email system must not impose any serious overhead on their time or training requirements.
The Solution in Outline
Since this is a paper primarily about the solution not the problem, I'll gloss over the hours spent considering management issues and going down numerous 'what-if' dead ends whilst the team looked at possible solution architectures.
Whilst critiquing the solution, bear in mind that it was designed getting on for two years ago. The territory has moved since then and we might make different decisions now — but in the year since the system went live, it's had very few real-life problems and the user base is extremely pleased with the result. This is no academic exercise, it's real life with hundreds of dispersed users and all the problems that real life throws at neat and elegant-sounding designs to teach their designers a lesson.
Once we'd concluded that there were no show-stoppers (we didn't really expect any), the detail had to be worked through. The basics of the solution were:
- Keep the green-screen users on hardware they are used to, exploiting the existing asynchronous network.
- Try where possible to send parallel notification of email to the branches, since they will not be used to receiving email. Each mail for a branch will trigger notification on their local printer. Live with the fact that they won't be able to handle attached word-processor documents easily.
- Provide the PC-based users with a fully-functional email client, but nothing that requires more than POP/SMTP facilities from the server (for future-proofing).
- Run a pilot scheme where mobile staff use cellphones and Win/CE clients for mail.
The majority of the email users were going to be coming through green-screen terminals that talk to the Unix systems. Though it came close to tossing a coin to choose, the final preference was to provide them with Pine, free software covered by a very unrestricted licence from the University of Washington. This proved to be a fortuitous choice, for reasons we'll cover later.
After some warm debate, the eventual decision was to serve the PC users through Microsoft Exchange running on NT.
A general feeling that some of the mail handling might prove to be hairy led to the choice of Unix as the principal tool for mail routing, leaving Exchange to do the fancy kind of stuff that PC users were thought to want. Unix was a VERY good decision once we moved to detailed design.
Since the company was already happily using SCO Unix in a number of places, we decided to choose that as the platform for the email routing systems, combined with the free Sendmail package. Years of experience of Sendmail has given us a mixture of respect and loathing for it, but it indisputably works and its flexibility is legendary (as were the Gorgon, Medusa and the Horseman of the Apocalypse).
Security is clearly an important subject. Nobody was going to risk their job by recommending anything other than a commercial package, so one of the well-known packages was bought on the strength of a claim that it could do Network Address Translation (NAT, sometimes known as IP masquerading). The claim turned out to be false (which we discovered only after the package was bought, installed and running), so I shall sidestep potentially libellous comments and leave it unnamed. It came with a filter that is supposed to scan email and its attachments for viruses but has never been seen to work, which is odd, because the FTP virus scanner does perform as advertised.
In the diagram below, which shows the whole setup, the connection to the outside world is via a 64 kbit/s Kilostream circuit from British Telecom delivered onto an Ethernet segment from the gizmo labelled ‘BT Router’. This talks to the PC known as SCO5 (the names have deep historical significance but no importance). SCO5 runs the firewall software, Sendmail for email forwarding and Netscape's web proxy software with its various options for permitting or denying access to sites and/or users.
An additional layer of defence against malicious intrusion lies behind the firewall: the network segment 192.168.250.0 is a private IP address which is inaccessible to the Internet at large (all external routers should drop packets to or from this address) and the company's real network is isolated from this segment by a second machine (SCO4) which is configured as a partial firewall in its own right — though not with commercial firewall software. The firewalling is done by restricting the services and routing offered by the box.
The company decided a long time ago to use the class A IP address 2.0 for its internal network (rather than the 10.0 that is reserved for private use). The cost of ‘correcting’ this didn't bear contemplating, so a dual-proxy approach is used instead. Anyone on the 2.0 network who wants to see the Web has to point their browser at the proxy on SCO4, which is running the free caching proxy Squid, an outstanding piece of software. After the event, the company was able to download various free utilities to analyse Squid's logfiles and spent many happy hours tinkering with them.
The dual-proxy method is a great way of hiding illegal network addresses: requests to the inner Squid are masked because when it requests pages from the outer (SCO5) proxy, all the requests bear the valid 192.168.250.224 address. As a result, the outer proxy never sees any references to the naughty network. We've used this approach with several other clients who have also chosen unfortunate network addresses and recommend it to anyone who has the same trouble.
All of the inbound email is delivered from the outside world — labelled ‘Internet’ in the diagram — via the should-check-viruses-but-doesn't-work firewall and onto a copy of Sendmail running on SCO5. This copy of Sendmail is configured to be a simple pass-through proxy: any mail for relevant domains is slung straight on to SCO3, whilst any other email is pushed out to the appropriate addressee on the Net (though with some header line rewriting to make sure the mail is seen to come from the company's preferred domain — this catches any accidental misconfiguration of internal clients and their addresses). Anything from the outside that isn't for the inside is rejected with a "we don't relay" error response; this prevents it from being used as a relay for the spamming community.
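The relay policy just described can be sketched in a few lines of Python. This is only an illustration of the decision logic, not the Sendmail configuration that was actually used; the domain names are placeholders, since the real ones are not given here.

```python
from email.utils import formataddr, parseaddr

# Placeholder values: the article does not give the real domains.
LOCAL_DOMAINS = {"example.org"}
PREFERRED_DOMAIN = "example.org"


def domain_of(address):
    """Return the lower-cased domain of an address, or '' if it has none."""
    addr = parseaddr(address)[1]
    _, at, domain = addr.rpartition("@")
    return domain.lower() if at else ""


def route(sender_is_internal, recipient):
    """Decide what the outer relay does with one recipient."""
    if domain_of(recipient) in LOCAL_DOMAINS:
        return "forward to SCO3"
    if sender_is_internal:
        return "deliver to Internet"
    return "reject: we don't relay"


def rewrite_from(from_header):
    """Keep the display name and local part, force the preferred domain."""
    name, addr = parseaddr(from_header)
    local = addr.rpartition("@")[0] or addr
    return formataddr((name, f"{local}@{PREFERRED_DOMAIN}"))
```

The point of the third branch in `route` is that mail which is neither from the inside nor for the inside has no business passing through the box at all.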
SCO3 is the real workhorse. It's also running Sendmail, which has a belt-and-braces header rewriting configuration just like SCO5.
All of the green-screen clients log on to this Unix system and use Pine to read and send their mail — more on the peculiarities of that setup in a moment.
All mail users are known to SCO3. If they actually use the Exchange mail server, the Sendmail aliases file is used to redirect their mail to the NT machine which then receives and processes the mail using the Exchange SMTP adapter. The way that the aliases file is maintained is one of the key parts of the setup and has its own section later.
There's not a lot to say about the Exchange server. It's a bog-standard Exchange configuration, costing the company a significant amount in client licences.
What it's most useful for is its addressbook. The addressbook has to be maintained by reception-desk staff and the point-and-click interface is easily explained. It's hard to see how Unix could do that without a fair bit of bespoke development, especially since it would have to be accessed from the reception-desk PCs. The addressbook is visible to smart email clients for the PC users; whether this justifies the cost of the client licences is debatable, but it's there now. If it weren't for that feature, the PC clients could just as easily have talked POP/SMTP direct to SCO3, which is precisely what the mobile clients do.
The addressbook plays a crucial part in generating the aliases file.
The aliases file is a wonderful part of Sendmail. It's a simple text file allowing you to create aliases for existing users. A line in the aliases file might read:
mikeb: firstname.lastname@example.org
sales: user1, user2, mikeb
That would have the effect of ensuring that mail for mikeb is redirected to the same user but on the NT machine (i.e. punting it off to the Exchange server). The second line would mean that mail for sales is sent to user1, user2 and mikeb, with the latter then being recognised as a user on NT via the first rule.
There's much more it can do; the most valuable to us was for entries on the right-hand-side of alias rules to be Unix commands, not just the names of email users.
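To make the chaining of alias rules concrete, here is a minimal Python sketch of how a name is expanded to its final destinations. It is not how Sendmail itself is implemented, and it ignores continuation lines and commas inside quoted commands. The Exchange host name and the notification command are made up for the example.

```python
def parse_aliases(text):
    """Parse 'name: target, target' lines, skipping blanks and comments."""
    aliases = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, _, targets = line.partition(":")
        aliases[name.strip().lower()] = [
            t.strip() for t in targets.split(",") if t.strip()
        ]
    return aliases


def expand(name, aliases, seen=None):
    """Follow alias rules until only final destinations remain."""
    seen = set() if seen is None else seen
    key = name.lower()
    if key not in aliases:
        return [name]  # a real mailbox, remote address or piped command
    if key in seen:
        return []  # already expanded, or an alias loop
    seen.add(key)
    result = []
    for target in aliases[key]:
        for final in expand(target, aliases, seen):
            if final not in result:
                result.append(final)
    return result


# Hypothetical entries in the spirit of the example above.
SAMPLE = """\
mikeb: mikeb@exchange.example.org
sales: user1, user2, mikeb
branch1234: user1, "|/usr/local/bin/mailnotify 1234"
"""
```

With this sample, `expand("sales", parse_aliases(SAMPLE))` yields user1, user2 and the Exchange address for mikeb, while the `branch1234` entry shows a command sitting alongside an ordinary recipient.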
The first task was to work out how to generate the aliases file from the Exchange address book. It's possible to get the address book to export itself in CSV format if you ask it nicely, though this seems to demand human intervention. If there's any way of telling it to do that automatically or on demand from some kind of scheduling task, then we've not spotted it. A day or so of scratting around and we'd arranged for the resulting file to be copied over to SCO3 using a DOS batch file written for NT.
If the format of the addressbook file is documented anywhere we've missed that too. Several hours of experimenting allowed us to divine as much as we needed to know and the first cut of the Perl script to interpret it was written. By recycling some of the more unlikely fields in the address book, we were able to flag whether the users were local to Exchange or really SCO3 users. Within a day, generating the Sendmail aliases file was working already, though not doing everything that we wanted.
The next step was to add a field to show that particular addresses were in branches and so would need the printed notification slip for new mail. The Perl script was modified and, lo and behold, those addresses had the mail delivered not only to them, but to another Perl script to invoke the Unix printer commands over the network.
The next-to-last step was to add translation from external email names to the names already chosen for logging in to the Unix system. Again history was having its effect: all the Unix users had names like u1234xxx, where the ‘1234’ identified their branch or cost centre number and the xxx was three initials taken from their name. Rather than change this and cause administrative mayhem, we decided to live with it. The result was that those users' email addresses would have to be the same as their Unix logins, which are not exactly memorable and are hard to explain on business cards. With a little prestidigitation, the Exchange database was coerced into taking another field or two and the Perl script rehacked, and now names like a.einstein could be slipped into the aliases file and directed to the u1234aei mailbox. Users could publish their chosen external email address and the internal ‘real’ one would be a secret known only to them and their systems administrator.
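The three steps so far (Exchange redirection, branch printer notification and friendly external names) can be sketched as follows. The original was a Perl script working on an undocumented export format, so everything specific here is an assumption made for illustration: the column names, the Y/N flag, the Exchange host name and the notification command.

```python
import csv
import io

EXCHANGE_HOST = "exchange.example.org"     # hypothetical host name
NOTIFY_CMD = "/usr/local/bin/mailnotify"   # hypothetical notification script


def build_aliases(csv_text):
    """Turn an address-book export into Sendmail alias lines."""
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        login = row["login"].strip()
        external = row["external"].strip()
        branch = row["branch"].strip()
        if row["on_exchange"].strip().upper() == "Y":
            # Punt the mail over to the NT machine.
            lines.append(f"{login}: {login}@{EXCHANGE_HOST}")
        elif branch:
            # Deliver locally (the backslash stops further aliasing)
            # and also pipe a copy to the printer notification command.
            lines.append(f'{login}: \\{login}, "|{NOTIFY_CMD} {branch}"')
        if external:
            # Friendly published name pointing at the real login.
            lines.append(f"{external}: {login}")
    return lines


SAMPLE_EXPORT = """\
login,external,on_exchange,branch
u1234aei,a.einstein,N,1234
mikeb,,Y,
"""
```

A plain Unix user with no branch and no external name needs no entry at all, since Sendmail delivers to the local mailbox by default.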
Everyone was happy, everything was working — or so we thought until we spotted the looming disaster.
The cause of the problem was Pine. Since Noah first booted Unix Version 6 on his Ark Mk. I, Unix email clients have ‘known’ their user's email address — they fish it out of the soup by using Unix library calls designed for just that purpose. This meant that even though our fancy aliasing worked like a dream for inbound mail, outbound mail from a.einstein was going to go out with a From: line showing u1234aei no matter what we did. It's considered a security and anti-spam feature to ensure that there's no legal way to override this, certainly not as far as any well-written Unix email clients are concerned. This may strike you as odd if you are only used to Windows-based clients which allow you to send mail out with any From: address you like — but then anyone who caught a dose of the Melissa virus recently might be having second thoughts about the wisdom of a laissez faire approach to security.
Whether it's a good thing or not, it was a severe problem for us. The fix is not a thing of beauty, but it indubitably works. The flexibility of Pine's global configuration files was what saved us and made Pine such a good choice, even if only by luck.
We did some more hacking. As well as having forward aliases in the aliases file, we added a huge section which inverted the Unix external-to-internal aliases. This is no help to Sendmail which will only use the aliases file for inbound mail — and we had to hide the reverse aliases by making them look like comments in the file, since they would just have confused Sendmail. Next, we altered the global Pine configuration file to tell it not to use Sendmail to despatch outbound mail, but to invoke yet another Perl program. This program scanned the aliases file, extracted the inverse aliases and rewrote the From: address before handing it on to Sendmail as usual for outbound delivery. We could have used two files rather than stick it all in the one aliases file, but it seemed to make sense to keep all that stuff in one place.
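In outline, the outbound rewriting looks something like the Python sketch below. The real program was written in Perl, and the marker used to hide the reverse aliases in comments is not recorded here, so the `#REV` prefix, the domain and the host names are all invented for the example. Folded (multi-line) From: headers are not handled.

```python
from email.utils import formataddr, parseaddr

REV_PREFIX = "#REV"  # hypothetical marker for reverse aliases in comments


def load_reverse(aliases_text):
    """Collect '#REV login external' comment lines into a dict."""
    reverse = {}
    for line in aliases_text.splitlines():
        parts = line.split()
        if len(parts) == 3 and parts[0] == REV_PREFIX:
            reverse[parts[1]] = parts[2]
    return reverse


def rewrite_from(message, reverse, domain):
    """Swap the login in the From: header for the published name."""
    headers, sep, body = message.partition("\n\n")
    out = []
    for line in headers.split("\n"):
        if line.lower().startswith("from:"):
            name, addr = parseaddr(line[5:].strip())
            local = addr.partition("@")[0]
            if local in reverse:
                new_addr = f"{reverse[local]}@{domain}"
                line = "From: " + formataddr((name, new_addr))
        out.append(line)
    return "\n".join(out) + sep + body


SAMPLE_ALIASES = """\
a.einstein: u1234aei
#REV u1234aei a.einstein
"""
```

A wrapper built on this would read the message from standard input, call `rewrite_from`, and pipe the result to the real Sendmail binary. Users with no reverse alias pass through unchanged.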
Amazingly, after a couple of days of head-scratching and Perl-writing, we had a fully-working and integrated email system. The Perl script that reads the Exchange addressbook dump was given a final workover to create a Pine global address book so that Pine users can see the same address book as the Exchange users and that was that. The system has been live for over a year now and during busy times SCO3 is processing two or three messages a second, a load that it laughs at until someone sends a multi-megabyte attachment to everyone.
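Generating the Pine address book is the simplest of the conversions. The sketch below assumes the plain-text layout of one entry per line with tab-separated nickname, full name and address fields; check the Pine documentation for your version before relying on it. The sample entries are invented.

```python
def pine_addressbook(entries):
    """Render (nickname, full name, address) tuples as address-book text."""
    lines = []
    for entry in sorted(entries):
        # Tabs are the field separator, so remove any from the data itself.
        fields = [field.replace("\t", " ") for field in entry]
        lines.append("\t".join(fields))
    return "\n".join(lines) + "\n"


# Hypothetical entries for illustration.
SAMPLE_ENTRIES = [
    ("mikeb", "Mike B", "mikeb@example.org"),
    ("albert", "Albert Einstein", "a.einstein@example.org"),
]
```

Feeding both this and the aliases file from the same export is what keeps the Pine and Exchange views of the address book in step.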
The internal email users have had all of their needs met by the system as we've described it. Though not crucial, the external users — mostly roving salespeople — also wanted a similar service.
The most promising client device at the time was a Windows CE handheld, using a cellphone for dial-up access. The chosen cellular supplier was able to offer a very competitive price for the service, including a fibre-optic link direct to one of their local exchanges. This offered a number of specific benefits that aren't relevant to the email side of things, but which excited the telecoms folk.
When the cellphones call the appropriate number, they pop up on what looks like a primary-rate ISDN circuit. This plugs straight into an Ascend access switch — known to ISPs the world over. There are slightly more mobile users than the Ascend can store passwords for, so it's configured in the usual way to talk to a RADIUS database running on yet another SCO box — SCO4. This must be a strong contender for the "most under-used PC" competition.
Along with the roving staff, it had become clear that PC users in the branches would want email too. The data links from the branches are purely remote asynch terminals — the IP network doesn't spread that far (yet), so the short-term fix was obvious: give them modems and let them dial the Ascend access switch too. We're angling to get some cheap Linux boxes in there running PPP to get IP services up and going but that'll be another phase.
The remote users come in on the ‘demilitarized zone’ part of the firewall setup, not subject to quite such stringent access controls as the badlands of the public internet, but still limited in their access: they can see SCO3 to send and receive POP/SMTP mail and they can also see the web proxy on SCO4, giving them access to the Intranet (which is in its early stages of development).
There's nothing technically unusual about the remote-access setup beyond this; it's mainstream and common to see. It works.
The project is considered a great success by senior management and was delivered on time, to budget and without pain or heartache. This is as it should be, but historically speaking ranks as pretty rare. If anyone finds something useful or helpful in what we've discussed, or has their own suggestions on how to improve it, we'd love to hear from you. The project is, we think, moderately interesting, and in our opinion a showcase for the flexibility of utilities like Perl and Sendmail when combined with Unix. Integrating both with MS Exchange is not something that we've seen discussed before and, having done it, we thought there might be one or two people who would be interested to see what problems we needed to solve as well as what wasn't a problem at all.