2-cent tip: Summing up file sizes

Ben Okopnik [ben at linuxgazette.net]

Thu, 12 Jul 2007 16:32:00 -0400

I just got an iPod Shuffle, and am about to load it up with my favorite tunes; however, I didn't want to dink around with multiple reloads of the song list if it was too big. Since flash devices only have so many write cycles before they start losing their little brains, minimizing writes is a Good Thing. So, I needed to figure out the total size of the files in my M3U playlist - and given the kind of questions that often come up in my shell scripting classes and on LG's Answer Gang lists, I thought that our readers would find some of these techniques (as well as the script itself) useful. Herewith, the "m3u_size" script. Enjoy!

#!/bin/bash
# Created by Ben Okopnik on Thu Jul 12 15:27:45 EDT 2007
 
# Exit with usage message unless specified file exists and is readable
[ -r "$1" ] || { printf 'Usage: %s <file.m3u>\n' "${0##*/}"; exit 1; }
 
# For the purposes of the loop, ignore spaces in filenames
old=$IFS
IFS='
'
# Get the file size and sum it up
for n in `cat "$1"`; do s=`ls -l "$n" | cut -d ' ' -f 5`; ((total+=$s)); done
# Restore the IFS
IFS=$old
  
# Define G, M, and k
Gb=$((1024**3)); Mb=$((1024**2)); kb=1024
# Calculate the number of G+M+k+bytes in file list
G=$(($total/$Gb))
M=$((($total-$G*$Gb)/$Mb))
k=$((($total-$G*$Gb-$M*$Mb)/$kb))
b=$((($total-$G*$Gb-$M*$Mb-$k*$kb)))
 
echo "Total: $total (${G}G ${M}M ${k}k ${b}b)"
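
A minimal sketch of the same summation that reads the playlist line by line instead of changing IFS, and asks "stat" for the size instead of counting columns in "ls -l" output. It assumes GNU "stat" (where "-c %s" prints the size in bytes); the function name "m3u_total" is hypothetical, and skipping lines that start with "#" assumes extended-M3U comment lines such as "#EXTM3U".

```shell
#!/bin/bash
# Hypothetical helper: print the total size, in bytes, of the files listed in "$1".
m3u_total() {
    local total=0 size line
    # One filename per line; IFS= and -r keep spaces and backslashes intact
    while IFS= read -r line || [ -n "$line" ]; do
        case $line in
            ''|'#'*) continue ;;    # skip blank lines and "#..." comment lines (assumption)
        esac
        # GNU stat: %s is the file size in bytes; skip entries that cannot be read
        size=$(stat -c %s -- "$line") || continue
        total=$((total + size))
    done < "$1"
    echo "$total"
}
```

Calling it as 'total=$(m3u_total "$1")' would slot into the script in place of the "for" loop.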

-- 
* Ben Okopnik * Editor-in-Chief, Linux Gazette * http://LinuxGazette.NET *


Rick Moen [rick at linuxmafia.com]

Thu, 12 Jul 2007 13:42:20 -0700

Quoting Ben Okopnik (ben@linuxgazette.net):

> I just got an iPod Shuffle, and am about to load it up with my favorite
> tunes; however, I didn't want to dink around with multiple reloads of
> the song list if it was too big. Since flash devices only have so many
> write cycles before they start losing their little brains, minimizing
> writes is a Good Thing. So, I needed to figure out the total size of the
> files in my M3U playlist - and given the kind of questions that often
> come up in my shell scripting classes and on LG's Answer Gang lists, I
> thought that our readers would find some of these techniques (as well as
> the script itself) useful. Herewith, the "m3u_size" script.  Enjoy!

Don't forget to mount using noatime, for reasons explained in my second favourite Linux magazine: http://www.linuxjournal.com/article/6867

-- 
Cheers,      "Transported to a surreal landscape, a young girl kills the first
Rick Moen     woman she meets, and then teams up with three complete strangers
rick@linuxmafia.com       to kill again."  -- Rick Polito's That TV Guy column,
              describing the movie _The Wizard of Oz_


Ben Okopnik [ben at linuxgazette.net]

Thu, 12 Jul 2007 23:38:46 -0400

On Thu, Jul 12, 2007 at 01:42:20PM -0700, Rick Moen wrote:

> Quoting Ben Okopnik (ben@linuxgazette.net):
> 
> > I just got an iPod Shuffle, and am about to load it up with my favorite
> > tunes; however, I didn't want to dink around with multiple reloads of
> > the song list if it was too big. Since flash devices only have so many
> > write cycles before they start losing their little brains, minimizing
> > writes is a Good Thing. So, I needed to figure out the total size of the
> > files in my M3U playlist - and given the kind of questions that often
> > come up in my shell scripting classes and on LG's Answer Gang lists, I
> > thought that our readers would find some of these techniques (as well as
> > the script itself) useful. Herewith, the "m3u_size" script.  Enjoy!
> 
> Don't forget to mount using noatime, for reasons explained in my second
> favourite Linux magazine:  http://www.linuxjournal.com/article/6867

Oooh, sweet bit of advice - I wasn't aware that mounting it with 'noatime' reduces the number of write ops, but it makes sense. I've modified my /etc/fstab, and all is well. Thanks, Rick!

-- 
* Ben Okopnik * Editor-in-Chief, Linux Gazette * http://LinuxGazette.NET *


Neil Youngman [ny at youngman.org.uk]

Sat, 14 Jul 2007 12:56:42 +0100

On or around Thursday 12 July 2007 21:32, Ben Okopnik reorganised a bunch of electrons to form the message:

> I just got an iPod Shuffle, and am about to load it up with my favorite
> tunes; however, I didn't want to dink around with multiple reloads of
> the song list if it was too big. Since flash devices only have so many
> write cycles before they start losing their little brains, minimizing
> writes is a Good Thing. So, I needed to figure out the total size of the
> files in my M3U playlist - and given the kind of questions that often
> come up in my shell scripting classes and on LG's Answer Gang lists, I
> thought that our readers would find some of these techniques (as well as
> the script itself) useful. Herewith, the "m3u_size" script.  Enjoy!

I think du would be better as you need to know how many blocks of your flash device it will take up. Do you know what block size it uses?

Having said that I think

  du -bch `cat $file`

will do almost exactly what your script does, with the exception that it won't handle spaces in file names. For that maybe

  du -bch `sed -e's/^/"/' -e's/$/"/' < $file`

might work?

N.B. I haven't tested any of the above.

To get the number of blocks, some other du options might be more appropriate, e.g. du -B 1024 if it's a 1k block (du -b gives you the number of bytes in the file, as ls does).

Neil


Ben Okopnik [ben at linuxgazette.net]

Sat, 14 Jul 2007 10:12:14 -0400

On Sat, Jul 14, 2007 at 12:56:42PM +0100, Neil Youngman wrote:

> On or around Thursday 12 July 2007 21:32, Ben Okopnik reorganised a bunch of 
> electrons to form the message:
> > I just got an iPod Shuffle, and am about to load it up with my favorite
> > tunes; however, I didn't want to dink around with multiple reloads of
> > the song list if it was too big. Since flash devices only have so many
> > write cycles before they start losing their little brains, minimizing
> > writes is a Good Thing. So, I needed to figure out the total size of the
> > files in my M3U playlist - and given the kind of questions that often
> > come up in my shell scripting classes and on LG's Answer Gang lists, I
> > thought that our readers would find some of these techniques (as well as
> > the script itself) useful. Herewith, the "m3u_size" script.  Enjoy!
> 
> I think du would be better as you need to know how many blocks of your flash 
> device it will take up. Do you know what block size it uses?

2048 bytes; at least that's what /var/log/messages claims at mount time.

> Having said that I think
> 
>   du -bch `cat $file`
> 
> will do almost exactly what your script does, with the exception that it won't 
> handle spaces in file names.

...
du: cannot access `/home/ben/Music/World/Szosztar': No such file or directory
du: cannot access `mange': No such file or directory
du: cannot access `-': No such file or directory
du: cannot access `Tim': No such file or directory
du: cannot access `Rayborn.mp3': No such file or directory
du: cannot access `/home/ben/Music/Yes': No such file or directory
du: cannot access `-': No such file or directory
du: cannot access `Owner': No such file or directory
du: cannot access `Of': No such file or directory
du: cannot access `A': No such file or directory
du: cannot access `Lonely': No such file or directory
du: cannot access `Heart.mp3': No such file or directory
du: cannot access `/home/ben/Music/Dave': No such file or directory
du: cannot access `Stringer/Japa/01': No such file or directory
du: cannot access `Ganapati': No such file or directory
du: cannot access `Om.mp3': No such file or directory
du: cannot access `/home/ben/Music/Dave': No such file or directory
du: cannot access `Stringer/Japa/02': No such file or directory
du: cannot access `Jay': No such file or directory
du: cannot access `Ambe.mp3': No such file or directory
...

...and so on. Of course.

> For that maybe
> 
>   du -bch `sed -e's/^/"/' -e's/$/"/' < $file`
> 
> might work?

Almost the same list of errors, I'm afraid.

The way that Bash splits the output of a command substitution into words depends on the IFS variable, which by default consists of a space, a tab, and a newline. So, no, "sed" isn't going to do anything useful: the quotes it adds show up after Bash has already parsed the command line, so they're treated as literal characters, and the output still gets split at every space. However, changing the value of IFS temporarily (just as I do in the script) will help:

ben@Tyr:~/devel/iPod$ old=$IFS
ben@Tyr:~/devel/iPod$ IFS='
'
ben@Tyr:~/devel/iPod$ du -bS `cat 0_ALL_0.m3u`
[ ... ]
2323696 /home/ben/Music/Vysotsky - Humor/Vladimir Visotsky - pis'mo iz durdoma.mp3
2965504 /home/ben/Music/World/Bora - Tim Rayborn.mp3
4538368 /home/ben/Music/World/Breathing - Shiva In Exile.mp3
5668826 /home/ben/Music/World/Fun Da Mental - Ja Sha Taan.mp3
2867200 /home/ben/Music/World/Szosztar mange - Tim Rayborn.mp3
3723180 /home/ben/Music/Yes - Owner Of A Lonely Heart.mp3
8085419 /home/ben/Music/Dave Stringer/Japa/01 Ganapati Om.mp3
7443434 /home/ben/Music/Dave Stringer/Japa/02 Jay Ambe.mp3
9048397 /home/ben/Music/Dave Stringer/Japa/04 Devakinandana gopala (major).mp3
5244132 /home/ben/Music/Dave Stringer/Mala/04 Gaja Nana.mp3
3398016 /home/ben/Music/Blossom Dearie and Lyle Lovett - Peel Me A Grape.mp3
3619944 /home/ben/Music/Luis Armstrong - Uncle Satchmo's Lullaby.mp3
ben@Tyr:~/devel/iPod$ du -bS `cat 0_ALL_0.m3u`|awk '{a+=$1}END{print a}'
889094121
ben@Tyr:~/devel/iPod$ IFS=$old
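
The splitting behaviour described above can be checked without touching any real files. This is only a sketch: "count_words" is a hypothetical helper and the path is made up for illustration.

```shell
#!/bin/bash
# Hypothetical helper: report how many words an unquoted expansion of "$1" yields.
count_words() {
    set -f          # turn off globbing so that only word splitting is at work
    set -- $1       # deliberately unquoted: split on the current IFS
    set +f
    echo "$#"
}

line='"/tmp/My Music/a song.mp3"'    # the quotes are part of the data

count_words "$line"                  # default IFS: prints 3
( IFS='
'; count_words "$line" )             # newline-only IFS: prints 1
```

The embedded quotes change nothing in the first call; only the IFS setting in the second one keeps the name in one piece.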

I do have to say that this solution is much faster than my "for" loop - so I'll update the script a bit. Incidentally, note the "du -bS" usage: I've played around with the "du" options, and there doesn't seem to be a way to tell it to display just the sum, in bytes, of a group of files. Seems rather silly... but that's OK; an accumulator isn't hard to write in "awk" or whatever.

#!/bin/bash
# Created by Ben Okopnik on Thu Jul 12 15:27:45 EDT 2007
  
# Exit with usage message unless specified file exists and is readable
[ -r "$1" ] || { printf 'Usage: %s <file.m3u>\n' "${0##*/}"; exit 1; }
  
################# User-configurable variables ###################
device="/dev/sda"
#################################################################
 
# For the purposes of the loop, ignore spaces in filenames
old=$IFS
IFS='
'
# Sum it up using Neil Youngman's kewl idea
total=$(du -bS $(cat "$1")|awk '{a+=$1}END{print a}')
# Restore the IFS
IFS=$old
 
# Calculate the number of G+M+k+bytes in file list
Gb=$((1024**3)); Mb=$((1024**2)); kb=1024
G=$(($total/$Gb))
M=$((($total-$G*$Gb)/$Mb))
k=$((($total-$G*$Gb-$M*$Mb)/$kb))
b=$((($total-$G*$Gb-$M*$Mb-$k*$kb)))
 
echo "Total: $total (${G}G ${M}M ${k}k ${b}b)"
 
# Test for fit in the defined device (number of blocks)
blocks=$(df --sync -B 2048 "$device"|awk '/^\//{print $2}')
[ "$(($total/2048))" -gt "$blocks" ] && printf "OH MY GAWD, IT'S TOO BIG! I CAN'T TAKE IT!\n"

Mmmm, tasty.
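
A rough sketch of a slightly stricter fit test: since each file occupies a whole number of blocks, rounding every file up separately gives a larger (and safer) count than dividing the grand total. The helper name "blocks_needed" is hypothetical, the 2048-byte default is the block size mentioned earlier in this message, and filesystem overhead such as directory entries is ignored.

```shell
#!/bin/bash
# Hypothetical helper: read one size in bytes per line on stdin and print the
# number of blocks needed when every file is rounded up to a whole block.
blocks_needed() {
    local bs=${1:-2048} size blocks=0
    while read -r size; do
        [ -z "$size" ] && continue
        blocks=$(( blocks + (size + bs - 1) / bs ))   # ceiling division per file
    done
    echo "$blocks"
}
```

Something like 'du -bS $(cat "$1") | cut -f1 | blocks_needed 2048', run while IFS is set to a newline, would feed it the per-file sizes.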

> N.B. I haven't tested any of the above.
> 
> To get number of blocks, some other du options might be more appropriate, e.g. 
> du -B 1024 if it's a 1k block (du -b gives you the number of bytes in 
> the file, as ls does).

Well, I still want to know the "human-readable" size, and have it broken down by G/M/k/b - although note that the error report at the end is predicated on the number of blocks reported by "df".
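
For what it's worth, the G/M/k/b breakdown can also be written with the remainder operator rather than repeated subtraction. A minimal sketch; the function name "breakdown" is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: split a byte count into G/M/k/b using division and remainder.
breakdown() {
    local total=$1
    local Gb=$((1024*1024*1024)) Mb=$((1024*1024)) kb=1024
    local G=$(( total / Gb ))
    local M=$(( total % Gb / Mb ))    # what is left after whole gigabytes
    local k=$(( total % Mb / kb ))    # what is left after whole megabytes
    local b=$(( total % kb ))         # what is left after whole kilobytes
    echo "${G}G ${M}M ${k}k ${b}b"
}

breakdown 889094121    # the total from the "du" run above: 0G 847M 927k 1001b
```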

-- 
* Ben Okopnik * Editor-in-Chief, Linux Gazette * http://LinuxGazette.NET *


Neil Youngman [ny at youngman.org.uk]

Sat, 14 Jul 2007 17:05:13 +0100

On or around Saturday 14 July 2007 15:12, Ben Okopnik reorganised a bunch of electrons to form the message:

> I do have to say that this solution is much faster than my "for" loop -
> so I'll update the script a bit. Incidentally, note the "du -bS" usage:
> I've played around with the "du" options, and there doesn't seem to be a
> way to tell it to display just the sum, in bytes, of a group of files.
> Seems rather silly... but that's OK; an accumulator isn't hard to write
> in "awk" or whatever.

I thought -bc would do it

       -c, --total
              produce a grand total

but their idea of a "grand total" isn't quite what I meant by a "grand total".

Neil


Neil Youngman [ny at youngman.org.uk]

Sat, 14 Jul 2007 21:58:00 +0100

On or around Saturday 14 July 2007 17:05, Neil Youngman reorganised a bunch of electrons to form the message:

> On or around Saturday 14 July 2007 15:12, Ben Okopnik reorganised a bunch
> of electrons to form the message:
>
> > I do have to say that this solution is much faster than my "for" loop -
> > so I'll update the script a bit. Incidentally, note the "du -bS" usage:
> > I've played around with the "du" options, and there doesn't seem to be a
> > way to tell it to display just the sum, in bytes, of a group of files.
> > Seems rather silly... but that's OK; an accumulator isn't hard to write
> > in "awk" or whatever.
>
> I thought -bc would do it
>
>        -c, --total
>               produce a grand total
>
> but their idea of a "grand total" isn't quite what I meant by a "grand
> total".

No, actually it is the same.

503 > du -bc *.txt | tail -1
47492   total
504 >

Doh.

Neil


Ben Okopnik [ben at linuxgazette.net]

Sat, 14 Jul 2007 18:28:57 -0400

On Sat, Jul 14, 2007 at 09:58:00PM +0100, Neil Youngman wrote:

> On or around Saturday 14 July 2007 17:05, Neil Youngman reorganised a bunch of 
> electrons to form the message:
> > On or around Saturday 14 July 2007 15:12, Ben Okopnik reorganised a bunch
> > of electrons to form the message:
> >
> > > I do have to say that this solution is much faster than my "for" loop -
> > > so I'll update the script a bit. Incidentally, note the "du -bS" usage:
> > > I've played around with the "du" options, and there doesn't seem to be a
> > > way to tell it to display just the sum, in bytes, of a group of files.
> > > Seems rather silly... but that's OK; an accumulator isn't hard to write
> > > in "awk" or whatever.
> >
> > I thought -bc would do it
> >
> >        -c, --total
> >               produce a grand total
> >
> > but their idea of a "grand total" isn't quite what I meant by a "grand
> > total".
> 
> No, actually it is the same.

Were you getting the directory sizes before? That's why I've been adding '-S' into the mix.

> 503 > du -bc *.txt | tail -1
> 47492   total
> 504 >
> 
> Doh.

du -Sbc $(cat "$1")|tail -1|cut -f1

is only a little shorter than what we've got now; I don't know that we can squeeze a whole lot more out of it.
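
One more variation, as a sketch only: assuming GNU "du", its "--files0-from" option reads NUL-terminated names from a file or from standard input, so the list never goes through the shell's word splitting and IFS can be left alone. The function name "list_total" is hypothetical; blank lines in the list would make "du" complain about a zero-length file name.

```shell
#!/bin/bash
# Hypothetical helper: print the total apparent size, in bytes, of the files listed in "$1".
# Assumes GNU du (for -b, -c and --files0-from).
list_total() {
    # Turn the newline-separated list into a NUL-separated one and let du read it
    tr '\n' '\0' < "$1" | du -bc --files0-from=- | tail -n 1 | cut -f1
}
```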

-- 
* Ben Okopnik * Editor-in-Chief, Linux Gazette * http://LinuxGazette.NET *


Neil Youngman [ny at youngman.org.uk]

Sun, 15 Jul 2007 10:02:45 +0100

On or around Saturday 14 July 2007 23:28, Ben Okopnik reorganised a bunch of electrons to form the message:

> Were you getting the directory sizes before? That's why I've been adding
> '-S' into the mix.

You shouldn't be seeing directory sizes from du unless you're passing it directories as parameters.

I get the exact same output for 'du -Sbc *.txt' as I do for 'du -bc *.txt'

Neil


Ben Okopnik [ben at linuxgazette.net]

Sun, 15 Jul 2007 10:33:00 -0400

On Sun, Jul 15, 2007 at 10:02:45AM +0100, Neil Youngman wrote:

> On or around Saturday 14 July 2007 23:28, Ben Okopnik reorganised a bunch of 
> electrons to form the message:
> > Were you getting the directory sizes before? That's why I've been adding
> > '-S' into the mix.
> 
> You shouldn't be seeing directory sizes from du unless you're passing it 
> parameters.
> 
> I get the exact same output for 'du -Sbc *.txt' as I do for 'du -bc *.txt'

You're right; I'm just going for the belt and suspenders approach. The reason I published the 2-cent tip in the first place was because of the two useful techniques in it - summing up file sizes from a disparate list of files and parsing a large number; in the case of a list of MP3s, the question of directories will, of course, be moot, but not if it's used on arbitrary lists. In fact, there probably needs to be a definite and explicit mechanism for ignoring directories; in the case of the former "for" loop that I was using, saying

[ -f "$n" ] && ((total+=$s))

would do it, whereas for using "du", you'd just want to make sure that you don't specify any directories (i.e., instead of saying "du foo/", you'd want to say "du foo/*".)

ben@Tyr:/tmp$ mkdir foo
ben@Tyr:/tmp$ for n in `seq 5`; do head -1000c /dev/full > foo/$n; done
ben@Tyr:/tmp$ du -bc foo/|tail -1		# WRONG!
9096    total
ben@Tyr:/tmp$ du -bc foo/*|tail -1		# Right.
5000    total
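
For an arbitrary list, one possible "explicit mechanism" is a small filter that drops everything except regular files before the names ever reach "du". This is a sketch, and "only_files" is a hypothetical name.

```shell
#!/bin/bash
# Hypothetical filter: copy stdin to stdout, keeping only lines that name regular files.
only_files() {
    local name
    while IFS= read -r name; do
        [ -f "$name" ] && printf '%s\n' "$name"
    done
    return 0    # don't report failure just because the last line was filtered out
}
```

With IFS set to a newline, 'du -bc $(only_files < "$1") | tail -1' would then ignore any directories in the list.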

-- 
* Ben Okopnik * Editor-in-Chief, Linux Gazette * http://LinuxGazette.NET *


Martin J Hooper [martinjh at blueyonder.co.uk]

Sun, 15 Jul 2007 20:42:29 +0100

Ben Okopnik wrote:

> ben@Tyr:/tmp$ for n in `seq 5`; do head -1000c /dev/full >
> foo/$n; done

Ben, can you explain this line a bit more? It's obvious it's looping 5 times to create the files, but how is that head command working? And should that be /dev/null, not /dev/full? ;)


Ben Okopnik [ben at linuxgazette.net]

Sun, 15 Jul 2007 16:27:46 -0400

On Sun, Jul 15, 2007 at 08:42:29PM +0100, Martin J Hooper wrote:

> Ben Okopnik wrote:
> > ben@Tyr:/tmp$ for n in `seq 5`; do head -1000c /dev/full >
> > foo/$n; done
> 
> Ben can you explain this line a bit more?

Sure.

>  It's obvious it's
> looping 5 times to create the files but how is that head command
> working?

It grabs the "first" 1000 bytes ('c'haracters) from "/dev/full" and redirects them into a file in "foo/"; the name of the file is, of course, the same as the loop iterator - 1 through 5 depending on the loop.

> and should that be /dev/null not /dev/full ? ;)

Nope. In electrical terms, "/dev/null" is a byte sink, whereas "/dev/full" is a byte source (although that's not what it's designed for; you generally want to use "/dev/zero" as an infinite source of null bytes.) See also "/dev/urandom" if you need lots of line-noise.

All of this stuff is documented in "/usr/src/linux/Documentation/devices.txt" (assuming the kernel source is installed in the usual place) - it's a good idea to be at least somewhat familiar with its contents.
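
The source-versus-sink distinction is easy to see by counting bytes. A small sketch, using the newer "head -c N" spelling of the option; "bytes_from" is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper: count how many bytes can be read from device "$1", up to "$2".
bytes_from() {
    head -c "$2" "$1" | wc -c | tr -d ' '
}

bytes_from /dev/zero 1000     # an endless source of NUL bytes: prints 1000
bytes_from /dev/null 1000     # reads hit end-of-file immediately: prints 0
```

On Linux, 'bytes_from /dev/full 1000' should also print 1000, while writing to it (e.g., 'echo x > /dev/full') fails with "No space left on device", which is what that device is actually for.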

-- 
* Ben Okopnik * Editor-in-Chief, Linux Gazette * http://LinuxGazette.NET *


Martin J Hooper [martinjh at blueyonder.co.uk]

Mon, 16 Jul 2007 05:32:43 +0100

Ben Okopnik wrote:

> On Sun, Jul 15, 2007 at 08:42:29PM +0100, Martin J Hooper wrote:
>> Ben Okopnik wrote:
>>> ben@Tyr:/tmp$ for n in `seq 5`; do head -1000c /dev/full >
>>> foo/$n; done
>> Ben can you explain this line a bit more?
> 
> Sure.

Snipped

Thanks Ben - That makes a bit more sense now.
