3. Usenet news software

3.1. A brief history of Usenet systems

Towards the end of this HOWTO, we have added some information about the history of Usenet server software by quoting sections from an earlier Usenet Periodic Posting. We consider this historical perspective, and the Usenix papers and other documents referred to in it, essential reading for any Usenet server administrator. Please see the section titled "Usenet software: a historical perspective>".

3.2. C-News and NNTPd

C-News was written by Henry Spencer and Geoff Collyer of the Department of Zoology, University of Toronto, almost entirely in shell and awk, as a replacement for an earlier system called B-News. The focus was on adding some extra features and a lot of performance. The first release was called Shellscript Release, which was deployed by a very large number of servers worldwide, as a natural upgrade to B-News. This version of C-News had upward compatibility with B-News meta-data, e.g. history files. This was the version of C-News which was initially rolled out in 1991 or so at the National Centre for Software Technology (NCST, http://www.ncst.ernet.in) and the Indian Institutes of Technology in India as part of the Indian educational and research network (ERNET). We received guidance from the NCST about Usenet news installation and management.

The Shellscript Release was soon followed by a re-write with a lot more C code, called Performance Release, and then a set of cleanup and component integration steps leading to the last release called the Cleanup Release. This Cleanup Release was patched many times by the authors, and the last one was CR.G (Cleanup Release revision G). The version of C-News discussed in this HOWTO is a set of small bug fixes on CR.G.

Since C-News came from shellscript-based antecedents, its architecture followed the set-of-programs style so typical of Unix, rather than large monolothic software systems traditional to some other OSs. All pieces had well-defined roles, and therefore could be easily replaced with other pieces as needed. This allowed easy adaptations and upgradations. This never affected performance, because key components which did a lot of work at high speed, e.g. newsrun, had been rewritten in C by that time. Even within the shellscripts, crucial components which handled binary data, e.g. a component called dbz to manipulate efficient on-disk hash arrays, were C programs with command-line interfaces, called from scripts.

C-News was born in a world with widely varying network line speeds, where bandwidth utilisation was a big issue and dialup links with UUCP file transfers was common. Therefore, it has strong support for batched feeds, specially with a variety of compression techniques and over a variety of fast and slow transport channels. And C-News virtually does not know the existence of TCP/IP, other than one or two tiny batch transport programs like viarsh. However, its design was so modular that there was absolutely no problem in plugging in NNTP functionality using a separate set of C programs without modifying a single line of C-News. This was done by a program suite called NNTP Reference Implementation, which we call NNTPd.

This software suite could work with B-News and C-News article repositories, and provided the full NNTP functionality. Since B-News died a gradual death, the combination of C-News and NNTPd became a freely redistributable, portable, modern, extensible, and high-performance software suite for Unix Usenet servers. Further refinements were added later, e.g. nov, the News Overview package and pgpverify, a public-key-based digital signature module to protect Usenet news servers against fraudulent control messages.

3.3. INN

INN is one of the two most widely used Usenet news server solutions. It was written by Rich Salz for Unix systems which have a socket API --- probably all Unix systems do, today.

INN has an architecture diametrically opposite to CNews. It is a monolithic program, which is started at bootup time, and keeps running till your server OS is shut down. This is like the way high performance HTTP servers are run in most cases, and allows INN to cache a lot of things in its memory, including message-IDs of recently posted messages, etc. This interesting architecture has been discussed in an interesting paper by the author, where he explains the problems of the older B-News and C-News systems that he tried to address. Anyone interested in Usenet software in general and INN in particular should study this paper.

INN addresses a Usenet news world which revolves around NNTP, though it has support for UUCP batches --- a fact that not many INN administrators seem to talk about. INN works faster than the CNews-NNTPd combination when processing multiple parallel incoming NNTP feeds. For multiple readers reading and posting news over NNTP, there is no difference between the efficiency of INN and NNTPd. Section 5.7> discusses the efficiency issues of INN over the earlier C-News architecture, based on Rich Salz' paper and our analyses of usage patterns.

INN's architecture has inspired a lot of high-performance Usenet news software, including a lot of commercial systems which address the ``carrier class'' market. That is the market for which the INN architecture has clear advantages over C-News.

3.4. Leafnode

This is an interesting software system, to set up a ``small'' Usenet news server on one computer which only receives newsfeeds but does not have the headache of sending out bulk feeds to other sites, i.e. it is a ``leaf node'' in the newsfeed flow diagram. According to its homepage (www.leafnode.org), ``Leafnode is a USENET software package designed for small sites running any flavour of Unix, with a few tens of readers and only a slow link to the net. [...] The current version is 1.9.24.''

This software is a sort of combination of article repository and NNTP news server, and receives articles, digests and stores them on the local hard disks, expires them periodically, and serves them to an NNTP reader. It is claimed that it is simple to manage and is ideal for installation on a desktop-class Unix or Linux box, since it does not take up much resources.

Leafnode is based on an appealing idea, but we find no problem using C-News and NNTPd on a desktop-class box. Its resource consumption is somewhat proportional to the volume of articles you want it to process, and the number of groups you'll want to retain for a small team of users will be easily handled by C-News on a desktop-class computer. An office of a hundred users can easily use C-News and NNTPd on a desktop computer running Linux, with 64 MBytes of RAM, IDE drives, and sufficient disk space. Of course, ease of configuration and management is dependent on familiarity, and we are more familiar with C-News than with Leafnode. We hope this HOWTO will help you in that direction.

There is, however, one area in which Leafnode is far easier to administer than INN or C-News. Leafnode constantly monitors the actual usage of the newsgroups it carries, based on readership statistics of its NNTP readers. If a particular newsgroup is not read at all by any user for a week, then Leafnode will delete all articles in that newsgroup, free up disk space, and stop fetching new articles for it. If it finds that a previously abandoned newsgroup is now again receiving attention, even from one user, then it'll fetch all articles for that group from its upstream server the next time it connects. This self-tuning feature of Leafnode is really an excellent advantage which makes a Leafnode site easier to manage, specially for small setups with bandwidth and disk space constraints.

The Leafnode Website gives a lot of details in an easily understood format.

TO BE EXTENDED AND CORRECTED.

3.5. Suck

Suck is a program which lets you pull out an NNTP feed from an NNTP server and file it locally. It does not contain any article repository management software, expecting you to do it using some other software system, e.g. C-News or INN. It can create batchfiles which can be fed to C-News, for instance. (Well, to be fair, Suck does have an option to store the fetched articles in a spool directory tree very much like what is used by C-News or INN in their article area, with one file per article. You can later read this raw message spool area using a mail client which supports the msgdir file layout for mail folders, like MH, perhaps. We don't find this option useful if you're running Suck on a Usenet server.) Suck finally boils down to a single command-line program which is invoked periodically, typically from cron. It has a zillion command-line options which are confusing at first, but later show how mature and finely tunable the software is.

If you need an NNTP pull feed, then we know of no better programs than Suck for the job. The nntpxfer program which forms part of the NNTPd package also implements an NNTP pull feed, for instance, but does not have one-tenth of the flexibility and fine-tuning of Suck. One of the banes of the NNTP pull feed is connection timeouts; Suck allows a lot of special tuning to handle this problem. If we had to set up a Usenet server with an NNTP pull feed, we'd use Suck right away.

TO BE EXTENDED AND CORRECTED.

3.6. Carrier class software

Carrier-class servers are expected to handle a complete feed of all articles in all newsgroups, including a lot of groups which have what we call a ``high noise-to-signal ratio.'' They do not have the luxury of choosing a ``useful'' subset like administrators of internal corporate Usenet servers do. Secondly, carrier-class servers are expected to turn articles around very fast, i.e. they are expected to have very low latency from the moment they receive an article to the time they retransmit it by NNTP to downstream servers. Third, they are supposed to provide very high availability, like other ``carrier class'' services. This usually means that they have parallel arrays of computers in load sharing configurations. And fourth, they usually do not cater to retail connections for reading and posting articles by human users. Usenet news carriers usually reserve separate computers to handle retail connections.

Thus, carrier-class servers do not need to maintain a repository of articles; they only need to focus on super-efficient real-time re-transmission. These highly specialised servers have software which receive an article over NNTP, parse it, and immediately re-queue it for outward transmission to dozens or hundreds of other servers. And since they work at these high throughputs, their downstream servers are also expected to be live on the Internet round the clock to receive incoming NNTP connections, or be prepared to lose articles. Therefore, there's no batching or long queueing needed, and C-News-style batching in fact is totally inapplicable.

Therefore, these carrier-class Usenet servers are more like packet routers than servers with repositories. They are referred to nowadays as NNTP routers or news routers.

It can be seen why batch-oriented repository management software like C-News is a total anachronism here, and why they need an NNTP-oriented, online, real-time design. The INN antecedents of some of these systems is therefore natural. We would love to hear from any Linux HOWTO reader whose Usenet server requirements include carrier-class behaviour.

We are aware of only one freely redistributable NNTP router: NNTPRelay (see http://nntprelay.maxwell.syr.edu/); this software runs on NT. There is no reason why such services cannot run off Linux servers, even Intel Linux, provided you have fast network links and arrays of servers. Linux as an OS platform is not an issue here.

TO BE EXTENDED AND CORRECTED.