...making Linux just a little more fun!
Minh Nguyen [nguyenminh2 at gmail.com]
So far, issues of LG have been compressed using tar and gzip. Is there any intention to use tar with bzip2 for future issues? Since most of the files in each issue are text files, bzip2 is more efficient (in terms of the size of the compressed file) than gzip. Here is a comparison of bzip2 and gzip using the current issue; i.e. November 2007 (#144):
    1028042 lg-144.tar.bz2
    1045337 lg-144.tar.gz

IMHO, providing a bzip2 compressed format of LG issues would save some download time.
Regards
Minh Van Nguyen
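Minh's point that bzip2 usually squeezes text harder than gzip is easy to test. Below is a minimal sketch using Python's standard gzip and bz2 modules, both at level 9. The sample input is a made-up, highly repetitive stand-in for an issue's HTML files, so the numbers it prints say nothing definite about real LG archives.

```python
import bz2
import gzip


def compressed_sizes(data: bytes) -> dict:
    """Return the original size and the gzip/bzip2 compressed sizes (level 9) of `data`."""
    return {
        "original": len(data),
        "gzip": len(gzip.compress(data, compresslevel=9)),
        "bzip2": len(bz2.compress(data, compresslevel=9)),
    }


# Hypothetical sample: repetitive HTML-like text standing in for an issue's files.
sample = b"".join(
    b"<p>Linux Gazette issue %d: making Linux just a little more fun!</p>\n" % i
    for i in range(5000)
)

sizes = compressed_sizes(sample)
for name, size in sizes.items():
    print(f"{name:>8}: {size:>8} bytes ({100 * size / sizes['original']:.1f}%)")
```

Feeding compressed_sizes() the bytes of an uncompressed issue tarball would give a fairer comparison than this synthetic text.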
Ramon van Alteren [ramon at forgottenland.net]
Minh Nguyen wrote:
> So far, issues of LG have been compressed using tar and gzip. Is there
> any intention to use tar with bzip2 for future issues? Since most of
> the files in each issue are text files, bzip2 is more efficient (in
> terms of the size of the compressed file) than gzip. Here is a
> comparison of bzip2 and gzip using the current issue; i.e. November
> 2007 (#144):
>
> 1028042 lg-144.tar.bz2
> 1045337 lg-144.tar.gz
That is a size decrease of less than 2%.
Best regards,
Ramon
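Working the numbers from Minh's listing gives the exact saving. A quick sketch; the byte counts are Minh's, and percent_saved() is just an illustrative helper.

```python
def percent_saved(smaller: int, larger: int) -> float:
    """Percentage by which `smaller` undercuts `larger`."""
    return 100 * (larger - smaller) / larger


bz2_size = 1028042  # lg-144.tar.bz2, from Minh's listing
gz_size = 1045337   # lg-144.tar.gz

print(f"Saved: {gz_size - bz2_size} bytes ({percent_saved(bz2_size, gz_size):.2f}%)")
```

That comes to roughly 17 KB out of about 1 MB per issue.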
Ben Okopnik [ben at linuxgazette.net]
On Fri, Nov 02, 2007 at 02:13:10PM +1100, Minh Nguyen wrote:
> So far, issues of LG have been compressed using tar and gzip. Is there
> any intention to use tar with bzip2 for future issues? Since most of
> the files in each issue are text files, bzip2 is more efficient (in
> terms of the size of the compressed file) than gzip. Here is a
> comparison of bzip2 and gzip using the current issue; i.e. November
> 2007 (#144):
>
> 1028042 lg-144.tar.bz2
> 1045337 lg-144.tar.gz
>
> IMHO, providing a bzip2 compressed format of LG issues would save some
> download time.
As I recall, we had a similar discussion here in TAG quite a while back (digging through my 'Sent_mail' says 2002 - but I can't find it in LG. Annoying, that.) In any case, here's the comparison that I ran then:
OK, I'm the curious type... Here's a bunch of files from many walks of life; let's see who does what.

    -rw-r--r-- 1 ben ben  1474560 May 20 05:51 test.bin
    -rw-rw-r-- 1 ben ben   102970 Sep 19  2000 test.bmp
    -rw-rw-r-- 1 ben ben   121880 Sep 19  2000 test.gif
    -rw-rw---- 1 ben ben   939783 Jun 17 15:29 test.jpg
    -rw-r--r-- 1 ben ben  1727320 Oct  6 15:51 test.mov
    -rw-r--r-- 1 ben ben  1048576 Oct 16 20:59 test.nulls
    -rw-r--r-- 1 ben ben  1048576 Oct 16 21:03 test.ones
    -rw-r--r-- 1 ben ben   490765 Sep  1  2001 test.pbm
    -rw-r--r-- 1 ben ben   197029 Oct 12 13:53 test.ps
    -rw-rw-r-- 1 ben ben  1995119 May 29  2001 test.txt
    -rw-r--r-- 1 ben ben 36354922 Oct 16 20:29 test.wav

    # So then, I was like, "Dude, check out some of this stuff:"

    rar a ../rar.rar *               # Very slow
    zip ../zip.zip *
    tar czf ../tgz.tgz *             # Uses gzip as compressor
    tar cjf ../tbz2.tbz2 *           # Uses bz2 as compressor, slowest of all
    tar cf - * | compress > ../Z.Z

    # And the winnah and champeen is...

    -rw-r--r-- 1 ben ben 26653542 Oct 16 21:09 rar.rar
    -rw-r--r-- 1 ben ben 33171830 Oct 16 21:26 tbz2.tbz2
    -rw-r--r-- 1 ben ben 36128937 Oct 16 21:10 zip.zip
    -rw-r--r-- 1 ben ben 36132733 Oct 16 21:14 tgz.tgz
    -rw-r--r-- 1 ben ben 43458125 Oct 16 21:21 Z.Z

I'll be darned. Looks like "rar" is it. Whodathunk?

Unfortunately, the only method that shows appreciable savings in size - 'rar', that is - uses a proprietary algorithm.
Given that there's no appreciable gain to be had by changing - and that a change may occasion problems (e.g., it would break any automated scripts that download and decompress the monthly archives), I don't see it changing any time soon. I'm usually pretty reluctant to change things like this without a really compelling reason.
-- * Ben Okopnik * Editor-in-Chief, Linux Gazette * http://LinuxGazette.NET *
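Since every input file in the 2002 test is listed, each archive's size can be expressed relative to the total. A quick sketch with the byte counts transcribed from Ben's listings; comparing against the summed file sizes is an assumption that ignores tar/zip header overhead, so the percentages are approximate.

```python
# Sizes (bytes) transcribed from Ben's 2002 listing.
INPUTS = {
    "test.bin": 1474560, "test.bmp": 102970, "test.gif": 121880,
    "test.jpg": 939783, "test.mov": 1727320, "test.nulls": 1048576,
    "test.ones": 1048576, "test.pbm": 490765, "test.ps": 197029,
    "test.txt": 1995119, "test.wav": 36354922,
}
ARCHIVES = {
    "rar.rar": 26653542, "tbz2.tbz2": 33171830, "zip.zip": 36128937,
    "tgz.tgz": 36132733, "Z.Z": 43458125,
}

total = sum(INPUTS.values())

# Archive size as a percentage of the summed input sizes (header overhead ignored).
ratios = {name: 100 * size / total for name, size in ARCHIVES.items()}

wav_share = 100 * INPUTS["test.wav"] / total
print(f"Input total: {total} bytes; test.wav is {wav_share:.1f}% of it")
for name, pct in sorted(ratios.items(), key=lambda item: item[1]):
    print(f"{name:<10} {pct:5.1f}% of original")
```

Because test.wav alone accounts for roughly 80% of the input bytes, these results probably say more about how each tool copes with that one audio file than about text compression.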
Kapil Hari Paranjape [kapil at imsc.res.in]
Hello,
On Fri, 02 Nov 2007, Ben Okopnik wrote:
> # And the winnah and champeen is...
>
> -rw-r--r-- 1 ben ben 26653542 Oct 16 21:09 rar.rar
> -rw-r--r-- 1 ben ben 33171830 Oct 16 21:26 tbz2.tbz2
> -rw-r--r-- 1 ben ben 36128937 Oct 16 21:10 zip.zip
> -rw-r--r-- 1 ben ben 36132733 Oct 16 21:14 tgz.tgz
> -rw-r--r-- 1 ben ben 43458125 Oct 16 21:21 Z.Z
>
> I'll be darned. Looks like "rar" is it. Whodathunk?
You should've tried "7zip".
Regards,
Kapil.
Breen Mullins [breen.mullins at gmail.com]
* Kapil Hari Paranjape <kapil@imsc.res.in> [2007-11-02 19:54 +0530]:
> You should've tried "7zip".

That would've been quite a trick in 2002...
Breen
-- Breen Mullins Menlo Park, California
Ben Okopnik [ben at linuxgazette.net]
On Fri, Nov 02, 2007 at 07:58:58AM -0700, Breen Mullins wrote:
> * Kapil Hari Paranjape <kapil@imsc.res.in> [2007-11-02 19:54 +0530]:
> >
> > You should've tried "7zip".
>
> That would've been quite a trick in 2002...
I was wondering about that. Like I said, I only heard about it much later - and it was being touted as a brand-new widget then.
-- * Ben Okopnik * Editor-in-Chief, Linux Gazette * http://LinuxGazette.NET *
Ben Okopnik [ben at linuxgazette.net]
On Fri, Nov 02, 2007 at 07:54:17PM +0530, Kapil Hari Paranjape wrote:
> Hello,
>
> On Fri, 02 Nov 2007, Ben Okopnik wrote:
> > # And the winnah and champeen is...
> >
> > -rw-r--r-- 1 ben ben 26653542 Oct 16 21:09 rar.rar
> > -rw-r--r-- 1 ben ben 33171830 Oct 16 21:26 tbz2.tbz2
> > -rw-r--r-- 1 ben ben 36128937 Oct 16 21:10 zip.zip
> > -rw-r--r-- 1 ben ben 36132733 Oct 16 21:14 tgz.tgz
> > -rw-r--r-- 1 ben ben 43458125 Oct 16 21:21 Z.Z
> >
> > I'll be darned. Looks like "rar" is it. Whodathunk?
>
> You should've tried "7zip".
I recall finding out about and playing with 7zip well after this discussion; I don't recall being particularly impressed with it one way or another. Looking at it now, one reason, at least, stands out:
From the man page:

    Backup and limitations
    DO NOT USE the 7-zip format for backup purpose on Linux/Unix because :
      - 7-zip does not store the owner/group of the file.

Compression-wise, using my 'Sent_mail' archive (I've trimmed the output for readability):
    ben@Tyr:/tmp/t$ time tar cvzf Sent_mail.tgz Sent_mail
    real    0m34.554s
    ben@Tyr:/tmp/t$ time tar cvjf Sent_mail.tbz Sent_mail
    real    1m52.085s
    ben@Tyr:/tmp/t$ time tar cvZf Sent_mail.tar.Z Sent_mail
    real    0m47.239s
    ben@Tyr:/tmp/t$ time tar cvf - Sent_mail | 7zr a -si Sent_mail.7z
    real    2m34.064s
    ben@Tyr:/tmp/t$ ls -lS
    total 551944
    -rw-r--r-- 1 ben ben 162769893 2007-11-02 10:40 Sent_mail
    -rw-r--r-- 1 ben ben 128435455 2007-11-02 10:56 Sent_mail.tar.Z
    -rw-r--r-- 1 ben ben  96948867 2007-11-02 10:52 Sent_mail.tgz
    -rw-r--r-- 1 ben ben  92754358 2007-11-02 10:54 Sent_mail.tbz
    -rw-r--r-- 1 ben ben  83686986 2007-11-02 11:00 Sent_mail.7z

Yep, "7zip" is smallest (I don't have "rar" anymore, and I won't be using proprietary software for key LG functions anyway). It's also slowest, by a large margin. TANSTAAFL, I guess.
-- * Ben Okopnik * Editor-in-Chief, Linux Gazette * http://LinuxGazette.NET *
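Ben's Sent_mail run makes the size/time trade-off easy to quantify. The sketch below uses figures transcribed from his session; treating gzip as the baseline is an assumption made purely for comparison. It computes each archive's size as a percentage of the original, the input throughput, and how many extra seconds each method spends per megabyte (10^6 bytes) it saves over gzip.

```python
# (archive, size in bytes, wall-clock seconds), transcribed from Ben's session.
ORIGINAL = 162_769_893  # uncompressed Sent_mail
RUNS = [
    ("Sent_mail.tgz", 96_948_867, 34.554),     # tar + gzip
    ("Sent_mail.tbz", 92_754_358, 112.085),    # tar + bzip2 (1m52.085s)
    ("Sent_mail.tar.Z", 128_435_455, 47.239),  # tar + compress
    ("Sent_mail.7z", 83_686_986, 154.064),     # tar | 7zr (2m34.064s)
]


def summarize(original, runs, baseline="Sent_mail.tgz"):
    """Compression ratio, input throughput, and extra seconds paid per MB saved versus `baseline`."""
    base_size, base_secs = next((size, secs) for name, size, secs in runs if name == baseline)
    rows = {}
    for name, size, secs in runs:
        saved_mb = (base_size - size) / 1e6  # MB smaller than the baseline
        extra_secs = secs - base_secs        # seconds slower than the baseline
        rows[name] = {
            "ratio_pct": 100 * size / original,
            "mb_per_s": original / 1e6 / secs,
            # Only meaningful when the archive is actually smaller than the baseline.
            "s_per_mb_saved": extra_secs / saved_mb if saved_mb > 0 else None,
        }
    return rows


rows = summarize(ORIGINAL, RUNS)
for name, row in rows.items():
    cost = row["s_per_mb_saved"]
    cost_txt = f"{cost:.1f} extra s per MB saved" if cost is not None else "-"
    print(f"{name:<16} {row['ratio_pct']:5.1f}%  {row['mb_per_s']:5.2f} MB/s  {cost_txt}")
```

On this single run, 7zr's extra time per megabyte saved works out to roughly half of bzip2's, even though it is the slowest method in absolute terms; one mailbox on one machine is an anecdote, not a benchmark.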