...making Linux just a little more fun!
Ben Okopnik [ben at linuxgazette.net]
Hi, all -
I've got something odd going on, and I'm trying to get some perspective on it. Perhaps someone here can shed some light.
I'm running a long-term rsync job (a full backup, after way too long of a hiatus. I know - really *bad* for someone who hounds people to keep on top of their backups professionally... but you know the saying about the cobbler's kids being the last to have new shoes.) It's copying the files to an external drive, connected via USB.

Here's the problem: now that it's running, my system has become extremely "sensitive" to any additional loads - even very light ones. Firing up an xterm visibly spikes the CPU (which, with nothing more than two xterms open, is running at a load average of ~4.) Starting 'vim' takes about 4 seconds. Opening a PDF with the rather lightweight 'xpdf' takes about 9 seconds. Reformatting a paragraph in 'vim' turns the xterm gray for a good 5 seconds and almost freezes the cursor. Opening Firefox does freeze the cursor and prevents me from being able to tab between open applications for a good 30 seconds - and when it's open, the system is essentially useless for anything else. Needless to say, all but the last one are usually nearly instant, and Firefox normally takes just a couple of seconds, and doesn't lock anything up while loading.
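(A quick way to put numbers on this sort of thing, if anyone wants to compare - just wrap 'time' around a command that starts and immediately exits; a rough sketch, nothing fancier:

$ time xterm -e true    # xterm startup: run 'true' and exit
$ time vim +q           # vim startup: quit as soon as it loads
$ uptime                # and the load average alongside, for reference

)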
Here's the kicker: "top" shows... nothing particularly unusual. vmstat/iostat report essentially the same story.
$ (vmstat -a; iostat)
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free  inact active   si   so    bi    bo   in   cs us sy id wa
 2  2 492908  15776 580684 341224    3    4    14     9    2   21 17  4 74  6
Linux 2.6.31-22-generic (Jotunheim)   01/10/2011   i686   (2 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          16.70    0.20    3.81    5.71    0.00   73.59

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda               3.27       131.12        48.13  219433573   80544152
sdb               0.62         0.85        72.43    1429100  121212480
Memory usage is reasonable, swap is barely being touched, the CPU is spending 3/4 of its time being idle, even the number of context switches is very reasonable as compared to the I/O rate. If I saw this on a remote machine, I'd figure it was being under-utilized. :\
Now, it is true that I'm not running some mad smokin' powerhouse machine that requires a dedicated nuclear power plant:
$ cat /proc/cpuinfo | egrep '^(processor|model name)'
processor	: 0
model name	: Intel(R) Atom(TM) CPU N270 @ 1.60GHz
processor	: 1
model name	: Intel(R) Atom(TM) CPU N270 @ 1.60GHz
$ cat /proc/meminfo | grep MemTotal
MemTotal:       1016764 kB
It's just a little Acer laptop... but this is usually enough for pretty much anything I need, including serving fairly heavy-duty Perl and PHP scripting via Apache. So... what's going on? What is "rsync" doing that is essentially invisible but is enough to make this thing behave like a 286 with 64k of memory? I thought I understood what the above numbers mean, and could reasonably estimate system state from them - but it seems that I'm wrong.
Does anyone see anything that I'm missing? Or is there another place I should be looking?
-- * Ben Okopnik * Editor-in-Chief, Linux Gazette * http://LinuxGazette.NET *
afsilva at gmail.com [(afsilva at gmail.com)]
> Does anyone see anything that I'm missing? Or is there another place I
> should be looking?
try this:
http://blog.internetnews.com/skerner/2010/11/forget-200-lines-red-hat-speed.html
AS
-- http://www.the-silvas.com
Ben Okopnik [ben at linuxgazette.net]
On Mon, Jan 10, 2011 at 09:56:59PM -0500, Anderson Silva wrote:
> > Does anyone see anything that I'm missing? Or is there another place I
> > should be looking?
>
> try this:
>
> http://blog.internetnews.com/skerner/2010/11/forget-200-lines-red-hat-speed.html
if [ "$PS1" ] ; then echo $$ > /sys/fs/cgroup/cpu/user/$$/tasks fi Then, as the superuser do this: mount -t cgroup cgroup /sys/fs/cgroup/cpu -o cpu mkdir -m 0777 /sys/fs/cgroup/cpu/user
Damn, that's pretty cool. However, it doesn't work for me:
$ sudo mkdir -m 0700 /sys/fs/cgroup/cpu/user/$$
mkdir: cannot create directory `/sys/fs/cgroup/cpu/user/23436': No such file or directory
$ dir=/sys/fs/cgroup/cpu/user/$$; until [ -d "$dir" ]; do echo "Does not exist: $dir"; dir=${dir%/*}; done
Does not exist: /sys/fs/cgroup/cpu/user/23436
Does not exist: /sys/fs/cgroup/cpu/user
Does not exist: /sys/fs/cgroup/cpu
Does not exist: /sys/fs/cgroup
Perhaps it takes a different kind of kernel option set or something. Besides, 'sudo nice' would probably whack 'rsync' over the head and make it behave... but that's not what the real problem is.
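One quick sanity check before chasing the recipe any further - whether this kernel exposes cgroups at all (paths as in the blog post; this is just a guess at why the mkdir failed):

$ grep cgroup /proc/filesystems
$ cat /proc/cgroups    # per-controller status, on kernels that support it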
To put it in different words, if a remote user called you up and said "my system is really dragging", how would you identify the problem if it was this one? As far as the system tools are telling me, everything is Just Peachy Keen - except that I see turtles passing my CPU at what looks like jet speeds, and foot-thick geological strata are accreting while it does a single NOP. It's making me doubt my sysadmin-fu.
-- * Ben Okopnik * Editor-in-Chief, Linux Gazette * http://LinuxGazette.NET *
Henry Grebler [henrygrebler at optusnet.com.au]
A smart doctor wouldn't treat you over the phone. Too much danger of missing something.
But, I'm not a doctor, and nobody's health is at stake. So, if we stipulate that, from this distance, I might not see the whole picture, here's my take.
I guess you're sort of asking 2 questions: what is responsible for the rotten performance? what can I do about it?
But first, a digression. Back in the days of the DECsystem-10 (maybe also VAX/VMS), there were 3 top-like commands: TOPCPU, TOPBIO, TOPDIO. TOPCPU was like top. Despite frequently looking, I have never found a Linux command that matched the other 2 tops (buffered IO and direct IO).
You can probably simulate your symptoms on most machines with something like
dd if=/dev/hda of=/dev/null
My theory is that rsync is hammering the disk bus. Whenever you try to do something, disk is involved so it runs slowly. My suspicion is that, if you had a CPU-bound job, you would notice no slowing down, because there's CPU to spare.
Instead of two CPUs, for this application you'd be better off with 2 paths to each disk. If you have another computer, you might be better off attaching the USB disk to the other computer and doing the rsync over the network.
Of course I could be entirely wrong, but try ^Z on the rsync and see if everything else gets to be ok.
Sadly, "nice" is for CPU usage and I'm not aware of anything that does the equivalent for IO. We want something that says, in effect, the human at the keyboard is important. In the overall scheme of things, humans actually don't take a lot of servicing, so prioritise the human. The moment you detect a single keystroke or mouse movement, pause any daemon or long running resource hog, and give the human your full attention. If there is no human activity for, oh, 10 seconds, then cause the paused job to continue. You could adjust the basic idea so that even resource hogs got a bit of a look in.
You ought to be able to detect if the disk is utilising much of the io bandwidth, because the disk light should be mainly on.
But, then again, I might have it completely wrong.
Ben Okopnik [ben at linuxgazette.net]
On Tue, Jan 11, 2011 at 03:14:21PM +1100, Henry Grebler wrote:
> A smart doctor wouldn't treat you over the phone. Too much danger of
> missing something.
There's a really serious problem with that analogy, though: a human cannot return precise self-diagnostic measurements. A computer can.
> But, I'm not a doctor, and nobody's health is at stake. So, if we
> stipulate that, from this distance, I might not see the whole picture,
> here's my take.
Hey, you're down there on the lower side of the planet, where you guys have to magnetize your shoes to stay attached or something. If I'm asking for a different perspective, who better?
> I guess you're sort of asking 2 questions: what is responsible for the
> rotten performance? what can I do about it?
Yep - with the second (currently) being a lot less important than the first. Actually, even the first breaks down further: "...and what can I look at that is a specific indicator of that rotten performance?"
> My theory is that rsync is hammering the disk bus.
Except that 'iostat' should show that. Besides, why would a running instance of Firefox (which I can't imagine doing a whole bunch of I/O, especially while it's not being used) bring everything to a screaming halt in that case?
$ iostat -dkz
Linux 2.6.31-22-generic (Jotunheim)   01/10/2011   i686   (2 CPU)

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
sda               3.36        70.90        24.07  119328822   40504964
sdb               0.71         0.52        41.38     879590   69641056
That's the rate in kilobytes per second (sda is my /, and sdb is the external drive.)
I've just restarted 'rsync' with a '--progress' option, and here's what that looks like:
...
ben/Music/Bach - Lifescapes/05 - Allegro (from Viola da Gamba Sonata in Gm).mp3
     3248339 100%  642.02kB/s    0:00:04  (xfer#100, to-check=1023/329223)
ben/Music/Bach - Lifescapes/06 - Andante (from Sonata for Violin in Am).mp3
     5043474 100%    1.11MB/s    0:00:04  (xfer#101, to-check=1022/329223)
ben/Music/Bach - Lifescapes/07 - Wachet Auf.mp3
     4196271 100%    2.88MB/s    0:00:01  (xfer#102, to-check=1021/329223)
ben/Music/Bach - Lifescapes/08 - Allegro (from Concerto for Oboe & Violin in Cm).mp3
     5174295 100%  998.42kB/s    0:00:05  (xfer#103, to-check=1020/329223)
ben/Music/Bach - Lifescapes/09 - Adagio & Allegro (from Viola da Gamba Sonata in D).mp3
     4826553 100%  677.02kB/s    0:00:06  (xfer#104, to-check=1019/329223)
ben/Music/Bach - Lifescapes/10 - Sarabande (from Violin Partita in Dm).mp3
     4276519 100%    1.86MB/s    0:00:02  (xfer#105, to-check=1018/329223)
ben/Music/Bach - Lifescapes/11 - Nun Komm der Heiden Heilund.mp3
     4377665 100%  675.47kB/s    0:00:06  (xfer#106, to-check=1017/329223)
ben/Music/Bach - Lifescapes/12 - Gavotte (from Violin Partita in E).mp3
     3596081 100%  688.59kB/s    0:00:05  (xfer#107, to-check=1016/329223)
ben/Music/Bach - Lifescapes/13 - Fantasia in Am.mp3
     5079000 100%  935.67kB/s    0:00:05  (xfer#108, to-check=1015/329223)
...
No huge network transfer speeds there. I mean, it's going over USB - that's enough of a bottleneck that the system bus shouldn't even notice it! Now, it could be that the USB code is hammering the kernel in some weird way... but how it could do so without showing up on the CPU is beyond me.
> Whenever you try to do something, disk is involved so it runs slowly.
> My suspicion is that, if you had a CPU-bound job, you would notice no
> slowing down, because there's CPU to spare.
Same deal; 'iostat' would show it.
$ iostat -c
Linux 2.6.31-22-generic (Jotunheim)   01/10/2011   i686   (2 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          16.59    0.23    3.82    6.38    0.00   72.98

It's just about asleep.
> Instead of two CPUs, for this application you'd be better off with 2
> paths to each disk. If you have another computer, you might be better
> off attaching the USB disk to the other computer and doing the rsync
> over the network.
Nope, don't have one on board. I used to have two of these laptops - one as a backup - but one died. Which reminds me - gotta get a replacement sometime soon.
> Of course I could be entirely wrong, but try ^Z on the rsync and see
> if everything else gets to be ok.
Yep. Load average starts dropping right away, etc.
> You ought to be able to detect if the disk is utilising much of the io
> bandwidth, because the disk light should be mainly on.
Bursts of activity, along with multi-second pauses - as expected.
> But, then again, I might have it completely wrong.
Hey, welcome to my club. :-} I'm clearly completely wrong in what I'm thinking, and can't find a way to re-init my thinking process; that's the worst kind of wrong for this sort of work. That's why I'm asking, anyway.
I really do appreciate the input, though - it's really useful to see that I'm not the only one thinking along this kind of track!
-- * Ben Okopnik * Editor-in-Chief, Linux Gazette * http://LinuxGazette.NET *
Mulyadi Santosa [mulyadi.santosa at gmail.com]
Dear Ben...
On Tue, Jan 11, 2011 at 09:21, Ben Okopnik <ben at linuxgazette.net> wrote:
> I'm running a long-term rsync job (a full backup, after way too long of
> a hiatus. I know - really *bad* for someone who hounds people to keep
> on top of their backups professionally... but you know the saying about
> the cobbler's kids being the last to have new shoes.) It's copying the
> files to an external drive, connected via USB. Here's the problem: now
> that it's running, my system has become extremely "sensitive" to any
> additional loads - even very light ones. Firing up an xterm visibly
> spikes the CPU (which, with nothing more than two xterms open, is
> running at a load average of ~4.) Starting 'vim' takes about 4 seconds.

<............ and lots other......>
Here's my advice (there's a sketch of these after the list):

- Switch the I/O scheduler. Specifically for the external drive, switch to "deadline"; avoid cfq, since its timeslicing sometimes isn't so good.
- Make sure you mount the external drive using "async", and add "relatime" too. Updating access times sometimes sucks...
- Is it using journalling I/O? Try switching it off temporarily (e.g. mount it as ext2 instead of ext3), or just mount it in writeback mode (assuming it's ext3).
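Roughly like this - assuming the external drive is /dev/sdb, with its filesystem on /dev/sdb1 mounted at /mnt/backup (all illustrative names), and run as root:

# 1. Switch the drive's I/O scheduler; the bracketed entry is the active one
cat /sys/block/sdb/queue/scheduler
echo deadline > /sys/block/sdb/queue/scheduler

# 2. Mount async (usually the default anyway) plus relatime
mount -o async,relatime /dev/sdb1 /mnt/backup

# 3. Dodge the journal: mount the ext3 filesystem as ext2...
mount -t ext2 /dev/sdb1 /mnt/backup
# ...or keep ext3 but use writeback journalling
mount -t ext3 -o data=writeback /dev/sdb1 /mnt/backup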
That's in userland; in kernel space, there's not much we can do but things like (a couple of quick checks follow this list):

- Use the latest stable kernel, not the longterm ones... I believe backporting is a pain in the ass. Better to use the latest one... but avoid -rc, -git, -next, etc., since those will ruin your days :D
- Make sure it's using the highest HZ (1000 right now), full preemption (CONFIG_PREEMPT=y), and CONFIG_NO_HZ. The latter brought some controversy, since most believe the timer re-arming adds its own latency, but I personally think it's negligible.
- Use Con Kolivas's Brain Fuck Scheduler. It somewhat lowers the latency in every aspect of your latency issues (I/O, context switches, you name it).
- This is really subjective, but if you have plenty of RAM, try lowering /proc/sys/vm/swappiness - try 10 or 20 - so the page cache isn't flushed too often. It might help keep the kernel block layer from re-reading your block devices too much (to some degree, of course).
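The config items and the swappiness knob are both easy to check - the config file path below is the usual Debian/Ubuntu location, so adjust for your distro:

grep -E 'CONFIG_(HZ_1000|PREEMPT|NO_HZ)' /boot/config-$(uname -r)

cat /proc/sys/vm/swappiness          # the usual default is 60
echo 10 > /proc/sys/vm/swappiness    # as root; 'sysctl vm.swappiness=10' also works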
As a bonus, if you're at all battery-savvy, turn off USB suspend... that just adds slight latency...
-- regards,
Mulyadi Santosa Freelance Linux trainer and consultant
blog: the-hydra.blogspot.com training: mulyaditraining.blogspot.com
Ben Okopnik [ben at linuxgazette.net]
On Tue, Jan 11, 2011 at 12:25:21PM +0700, Mulyadi Santosa wrote:
> Dear Ben...
Hey, Mulyadi!
> On Tue, Jan 11, 2011 at 09:21, Ben Okopnik <ben at linuxgazette.net> wrote:
> > I'm running a long-term rsync job (a full backup, after way too long of
> > a hiatus. I know - really *bad* for someone who hounds people to keep
> > on top of their backups professionally... but you know the saying about
> > the cobbler's kids being the last to have new shoes.) It's copying the
> > files to an external drive, connected via USB. Here's the problem: now
> > that it's running, my system has become extremely "sensitive" to any
> > additional loads - even very light ones. Firing up an xterm visibly
> > spikes the CPU (which, with nothing more than two xterms open, is
> > running at a load average of ~4.) Starting 'vim' takes about 4 seconds.
> <............ and lots other......>
>
> Here's my advice:
Well, all of this tries to address problem #2 (per Henry's email); "how to fix". OK, this isn't a bad thing - but right now, I'm still on "how to diagnose reliably".
> - Switch the I/O scheduler. Specifically for the external drive, switch
> to "deadline"; avoid cfq, since its timeslicing sometimes isn't so good.
This might actually be a decent idea for handling it. If we make the assumption that I/O priority is being given to the USB transfer, then - despite the relatively low I/O rate - that might explain everything grinding to a halt. That's the first glimmer of a possibility that I've seen so far - definitely worth exploring!
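Easy enough to check what's actually in play before switching anything, too - the bracketed name is the active scheduler for each disk:

$ cat /sys/block/sda/queue/scheduler /sys/block/sdb/queue/scheduler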
> - Make sure you mount the external drive using "async", and add
> "relatime" too. Updating access times sometimes sucks...
As I understand it, "relatime" has been the default for a while now.
$ perl -wle'print scalar localtime((stat "/etc/passwd")[8])'
Mon Jan 10 16:25:01 2011
$ cat /etc/passwd > /dev/null
$ perl -wle'print scalar localtime((stat "/etc/passwd")[8])'
Mon Jan 10 16:25:01 2011
Right, no change in atime after reading the file.
> - Is it using journalling I/O? Try switching it off temporarily (e.g.
> mount it as ext2 instead of ext3), or just mount it in writeback mode
> (assuming it's ext3).
The external drive? It's just plain ext2. Journalling would be too much of a good thing for a backup drive.
> That's in userland; in kernel space, there's not much we can do but
> things like:
>
> - Use the latest stable kernel, not the longterm ones... I believe
> backporting is a pain in the ass. Better to use the latest one... but
> avoid -rc, -git, -next, etc., since those will ruin your days :D
>
> - Make sure it's using the highest HZ (1000 right now), full preemption
> (CONFIG_PREEMPT=y), and CONFIG_NO_HZ. The latter brought some
> controversy, since most believe the timer re-arming adds its own
> latency, but I personally think it's negligible.
>
> - Use Con Kolivas's Brain Fuck Scheduler. It somewhat lowers the latency
> in every aspect of your latency issues (I/O, context switches, you name
> it).
Heh. That's a bit too much work... I can't see this becoming a huge issue; I just want to know what's going on, and how to reliably diagnose it.
> - This is really subjective, but if you have plenty of RAM, try lowering
> /proc/sys/vm/swappiness - try 10 or 20 - so the page cache isn't flushed
> too often. It might help keep the kernel block layer from re-reading
> your block devices too much (to some degree, of course).
Nice! I don't have a huge amount of RAM, but a little to spare at least. Just tried it... doesn't seem to make any difference, unfortunately.
> As a bonus, if you're at all battery-savvy, turn off USB suspend...
> that just adds slight latency...
Well, I don't think that trying to tune this is the answer; it's a pretty "binary" type of problem, where running this thing just whacks my system - and it really shouldn't.
Just for more data, I stopped 'rsync' and tried a simple 'cp' on a large directory. Yeah, it does about the same thing; the load average shoots right up - BUT, starting up a browser, although slow, does not make the system grind to a halt (in fact, I'm typing this at my normal speed, which I wouldn't have been able to do with rsync and FF running.)
-- * Ben Okopnik * Editor-in-Chief, Linux Gazette * http://LinuxGazette.NET *
Mulyadi Santosa [mulyadi.santosa at gmail.com]
Hi Ben
Shall we go straight to your last paragraph? :D
On Tue, Jan 11, 2011 at 13:01, Ben Okopnik <ben at linuxgazette.net> wrote:
> Just for more data, I stopped 'rsync' and tried a simple 'cp' on a large
> directory. Yeah, it does about the same thing; the load average shoots
> right up - BUT, starting up a browser, although slow, does not make the
> system grind to a halt (in fact, I'm typing this at my normal speed,
> which I wouldn't have been able to do with rsync and FF running.)
I read somewhere that it's the rsync algorithm itself that slows things down to some degree - quite likely its "finding the differences" algorithm (that's what I call it). It could be worse with compression, IMHO.
So, getting rid of rsync and simply using cp might be the answer.
As for pinpointing it, I think you need tools like oprofile or systemtap or ftrace. Oh wait, I forgot to mention latencytop too... have you tried it? It has dependencies on several kernel features, so just check it out. Plus iotop and/or dstat, I guess...
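Quick starts for those last two, for anyone following along (iotop needs root; the flags here are just the basics):

$ sudo iotop -o    # like 'top', but sorted by disk I/O; -o shows only
                   # processes actually doing I/O right now
$ dstat -cdm 5     # CPU, disk, and memory columns, one line every 5 seconds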
-- regards,
Mulyadi Santosa Freelance Linux trainer and consultant
blog: the-hydra.blogspot.com training: mulyaditraining.blogspot.com
Ben Okopnik [ben at okopnik.com]
On Tue, Jan 11, 2011 at 01:20:57PM +0700, Mulyadi Santosa wrote:
> Hi Ben
>
> Shall we go straight to your last paragraph? :D
Works for me.
> I read somewhere that it's the rsync algorithm itself that slows things
> down to some degree - quite likely its "finding the differences"
> algorithm (that's what I call it). It could be worse with compression,
> IMHO.
>
> So, getting rid of rsync and simply using cp might be the answer.
Whoops - can't do that. I'm using 'rsync' specifically for the features that 'cp' doesn't have. I suppose I could do the initial, full backup with 'cp -a', but the subsequent incrementals aren't going to be all that small.
> As for pinpointing it, I think you need tools like oprofile or systemtap
> or ftrace. Oh wait, I forgot to mention latencytop too... have you
> tried it?
Wow. Would you believe I've never used any of those? I think I've heard of 'oprofile', at least. I'll check'em out - thanks!
> It has dependencies on several kernel features, so just check it out.
> Plus iotop and/or dstat, I guess...
I just tried these two out - oh, cool! Two more nifty gadgets for my toolbox. I particularly like 'iotop' ('dstat' is cute and colorful, but easily done with a simple shell script.)
-- OKOPNIK CONSULTING Custom Computing Solutions For Your Business Expert-led Training | Dynamic, vital websites | Custom programming 443-250-7895 http://okopnik.com http://twitter.com/okopnik
Mulyadi Santosa [mulyadi.santosa at gmail.com]
On Tue, Jan 11, 2011 at 13:40, Ben Okopnik <ben at okopnik.com> wrote:
>> It has dependencies on several kernel features, so just check it out.
>> Plus iotop and/or dstat, I guess...
>
> I just tried these two out - oh, cool! Two more nifty gadgets for my
> toolbox. I particularly like 'iotop' ('dstat' is cute and colorful, but
> easily done with a simple shell script.)
Nice! Indeed, it's a must for every admin now... oh wait, let me clarify that... even casual users should know about them.
IMO, Linux really shines in the monitoring department lately. Once (I'm sure you'll agree), we were just left with top, vmstat, and ps. Now we have more. Of course, the old saying still applies here: "it's not about the gun, but the man behind the gun".
-- regards,
Mulyadi Santosa Freelance Linux trainer and consultant
blog: the-hydra.blogspot.com training: mulyaditraining.blogspot.com
Kapil Hari Paranjape [kapil at imsc.res.in]
Hello,
On Tue, 11 Jan 2011, Ben Okopnik wrote:
> > So, getting rid of rsync and simply using cp might be the answer.
>
> Whoops - can't do that. I'm using 'rsync' specifically for the features
> that 'cp' doesn't have. I suppose I could do the initial, full backup
> with 'cp -a', but the subsequent incrementals aren't going to be all
> that small.
Note that the delta-xfer mechanism is not used when source and destination are local (as there is no network transfer of data taking place!); in this case, the "-W" (whole-file) switch is the default.
Is there really much[*] of a difference between "rsync -a /src/. /dst/." and the following?

Step 1. Compute the list of files that are newer (time-stamp only) or do not exist in /dst/.

Step 2. Apply (cd /src/; cp -a ${list} /dst/; cd -)
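Sketched with GNU tools - /src, /dst, and the .last-run marker file are illustrative names only:

# Step 1: list the files changed since the previous run (illustrative paths)
cd /src
find . -type f -newer /dst/.last-run -print0 > /tmp/newer.list

# Step 2: copy them, preserving attributes and directory structure
xargs -0 cp -a --parents -t /dst < /tmp/newer.list
touch /dst/.last-run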
Regards,
Kapil.
[*] OK. "rsync" does not delete/unlink files before it has finished copying them which is safer for a backup. --
Ben Okopnik [ben at linuxgazette.net]
On Tue, Jan 11, 2011 at 07:39:32PM +0530, Kapil Hari Paranjape wrote:
> Hello,
>
> On Tue, 11 Jan 2011, Ben Okopnik wrote:
> > > So, getting rid of rsync and simply using cp might be the answer.
> >
> > Whoops - can't do that. I'm using 'rsync' specifically for the features
> > that 'cp' doesn't have. I suppose I could do the initial, full backup
> > with 'cp -a', but the subsequent incrementals aren't going to be all
> > that small.
>
> Note that the delta-xfer mechanism is not used when source and
> destination are local (as there is no network transfer of data taking
> place!); in this case, the "-W" (whole-file) switch is the default.
I figured that out in the process of watching 'rsync' run after enabling '--progress'. The files that had changed - that I knew had only been appended to, in fact - were being recopied whole.
> Is there really much[*] of a difference between "rsync -a /src/. /dst/."
> and the following?
>
> Step 1. Compute the list of files that are newer (time-stamp only) or do
> not exist in /dst/.
You mean 'cp -u' ?
> [*] OK. "rsync" does not delete/unlink files before it has finished
> copying them, which is safer for a backup.
Not much of an issue, since I'm not deleting the source files.
'rsync' has many other advantages, though. If I found that the system was I/O bound, I could use --bwlimit=NNN to limit it to a set kB/s. I can watch its progress. I can easily filter the file list to ignore, say, cache directories and stuff that I already have backed up elsewhere. I can print a change summary update at the end of the run. Handling sparse files is easy. I can do a dry run that'll show me what'll get done. I can prune empty directories. And, yes, I always have the option of shifting the whole shebang onto a remote machine just by adding a hostname and a colon.
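To make that concrete, here's the sort of invocation I have in mind - the flag values and paths are purely illustrative:

$ rsync -a --bwlimit=500 --sparse --prune-empty-dirs \
        --exclude='.cache/' --stats --progress --dry-run \
        /home/ben/ /mnt/backup/ben/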
'cp' has some pretty good options - most people have no idea what it can really do - but it would have to pump a lot of iron and climb a lot of mountains to even begin to compete with 'rsync'.
-- * Ben Okopnik * Editor-in-Chief, Linux Gazette * http://LinuxGazette.NET *
Kapil Hari Paranjape [kapil at imsc.res.in]
Hello,
On Tue, 11 Jan 2011, Ben Okopnik wrote:
> On Tue, Jan 11, 2011 at 07:39:32PM +0530, Kapil Hari Paranjape wrote:
> > Step 1. Compute the list of files that are newer (time-stamp only) or do
> > not exist in /dst/.
>
> You mean 'cp -u' ?
Uh-oh! I know I should have read the man page of cp more carefully.
Regards,
Kapil. --
Ben Okopnik [ben at okopnik.com]
On Tue, Jan 11, 2011 at 09:57:57PM +0530, Kapil Hari Paranjape wrote:
> Hello,
>
> On Tue, 11 Jan 2011, Ben Okopnik wrote:
> > On Tue, Jan 11, 2011 at 07:39:32PM +0530, Kapil Hari Paranjape wrote:
> > > Step 1. Compute the list of files that are newer (time-stamp only) or do
> > > not exist in /dst/.
> >
> > You mean 'cp -u' ?
>
> Uh-oh! I know I should have read the man page of cp more carefully.
The man page is (as is often the case with 'info'-centric religious fanatics^W^Wauthors) somewhat less than informative - e.g., it doesn't list any of the rich set of arguments to the options. In this case, 'info' is the best source.
Couple of nifty 'cp' tricks:
# Create date+time-stamped backups for any changed/new files
/bin/cp -ubaS .`date '+%Y%m%d%H%M%S'` file1 file2 file3 ... backup

# Create old-style Unix backup files (file~)
for n in *; do cp -bf "$n" "$n"; done
-- OKOPNIK CONSULTING Custom Computing Solutions For Your Business Expert-led Training | Dynamic, vital websites | Custom programming 443-250-7895 http://okopnik.com http://twitter.com/okopnik
Karl Vogel [vogelke+tag at pobox.com]
[ Ben, please excuse the CC: I'm not sure mail to "tag" is getting through. ]
>> On Mon, 10 Jan 2011 21:21:19 -0500,
>> Ben Okopnik <ben at linuxgazette.net> said:
B> I'm running a long-term rsync job (a full backup, after way too long of
B> a hiatus. [...] Here's the kicker: "top" shows... nothing particularly
B> unusual. vmstat/iostat report essentially the same story.
I've had the same problem with full backups via rsync. You might want to check the read-ahead setting on your drives -- the default setting is usually very conservative:

   root# cat /sys/block/sda/queue/read_ahead_kb
   128

After some fiddling, I found my system sweet spot (2Gb RAM, 2 CPUs) was 8k:

   root# echo 8192 > /sys/block/sda/queue/read_ahead_kb

Your system might expect the setting to be in blocks:

   root# blockdev --getra /dev/sda
   256
   root# blockdev --setra 16384 /dev/sda
   root# blockdev --getra /dev/sda
   16384
Even with these settings, rsync still occasionally slows to a dead crawl if the filetree is big enough and I'm doing a full backup. I've gotten faster results doing something like this once, and using rsync after:
   # mkdir /tmp/toc /tmp/work
   # cd /directory/to/copy
   # find . -depth -print | split -l500 - /tmp/toc/x
Then copy one batch at a time, using an unprivileged user (bkup) and a directory on the destination host that the user can write to:
   for file in /tmp/toc/x*; do
       arch=/tmp/work/${file##*/}.pax.gz    # strip the /tmp/toc/ prefix
       pax -wd < $file | gzip -1c > $arch
       logger -t bkup $arch
       su -f bkup -c "scp -c arcfour $arch some.host:/staging/area"
       rm $file $arch
   done
Unpack on some.host:
   cd /destination/directory
   for arch in /staging/area/*.pax.gz; do
       gunzip -c $arch | pax -rd -pe && rm $arch
   done
This is for copies over an insecure network -- if you run rsync plus ssh, definitely use the "arcfour" cipher, it's a hell of a lot faster than the default.
--
Karl Vogel
I don't speak for the USAF or my company
Diane, last night I dreamed I was eating a large, tasteless gumdrop, and awoke to discover I was chewing on one of my foam disposable earplugs. Perhaps I should consider moderating my nighttime coffee consumption. --FBI Special Agent Dale Cooper, "Twin Peaks"
Ben Okopnik [ben at linuxgazette.net]
On Tue, Jan 11, 2011 at 02:49:11PM -0500, Karl Vogel wrote:
> [ Ben, please excuse the CC: I'm not sure mail to "tag" is getting through. ]
I appreciate the thoughtfulness, actually.
> >> On Mon, 10 Jan 2011 21:21:19 -0500,
> >> Ben Okopnik <ben at linuxgazette.net> said:
>
> B> I'm running a long-term rsync job (a full backup, after way too long of
> B> a hiatus. [...] Here's the kicker: "top" shows... nothing particularly
> B> unusual. vmstat/iostat report essentially the same story.
>
> I've had the same problem with full backups via rsync. You might want
> to check the read-ahead setting on your drives -- the default setting is
> usually very conservative:
>
>    root# cat /sys/block/sda/queue/read_ahead_kb
>    128
>
> After some fiddling, I found my system sweet spot (2Gb RAM, 2 CPUs) was 8k:
>
>    root# echo 8192 > /sys/block/sda/queue/read_ahead_kb
>
> Your system might expect the setting to be in blocks:
>
>    root# blockdev --getra /dev/sda
>    256
>    root# blockdev --setra 16384 /dev/sda
>    root# blockdev --getra /dev/sda
>    16384
Wow. Significant improvement, just with those (untuned) values. I just ran an incremental backup, and LA stayed between 2.5 and 3, with only minimal latency in program response - and that's with Firefox, Evince, and 5 xterms up and running. Karl, you're officially awesome.
(BTW, that reminds me - when are you going to write up your "most-common directory changer" gadget for me? A lot of folks would enjoy it.)
> Even with these settings, rsync still occasionally slows to a dead crawl
> if the filetree is big enough and I'm doing a full backup. I've gotten
> faster results doing something like this once, and using rsync after:
>
>    # mkdir /tmp/toc /tmp/work
>    # cd /directory/to/copy
>    # find . -depth -print | split -l500 - /tmp/toc/x
>
> Then copy one batch at a time, using an unprivileged user (bkup) and a
> directory on the destination host that the user can write to:
>
>    for file in /tmp/toc/x*; do
>        arch=/tmp/work/${file##*/}.pax.gz    # strip the /tmp/toc/ prefix
>        pax -wd < $file | gzip -1c > $arch
>        logger -t bkup $arch
>        su -f bkup -c "scp -c arcfour $arch some.host:/staging/area"
>        rm $file $arch
>    done
>
> Unpack on some.host:
>
>    cd /destination/directory
>    for arch in /staging/area/*.pax.gz; do
>        gunzip -c $arch | pax -rd -pe && rm $arch
>    done
>
> This is for copies over an insecure network -- if you run rsync plus ssh,
> definitely use the "arcfour" cipher, it's a hell of a lot faster than
> the default.
[grin] You know me, always kibitzing on stuff like this. Besides, I find tar options so fascinatingly arcane that I'm still looking for the one where I can send it out to stand on a street corner and earn money for me while I lay about and enjoy the good life (I'm sure it exists; I just haven't found the right manpage yet.) Meanwhile, I keep running across stuff I can use. This should automate the above process a bit more.
1) Create a 'send' script in some temp directory. Pick a $DELAY value.
$ mkdir /tmp/x
$ cat <<'!' >/tmp/x/send
> #!/bin/sh
>
> ts=`date '+%s%N'`
> # Don't use '-C' (compression) on fast networks
> scp -C -2 -c arcfour /tmp/x/foo.tar some.host:/staging/area/foo$ts.tar
> echo "Chunk $ts sent; sleeping..."
> sleep $DELAY
> !
$ chmod +x /tmp/x/send
2) From now on, all you need is to issue this command in the directory to be backed up; chunking and delays (per the above script) are built in.
$ yes|tar -ML $CHUNK_SIZE_in_KB -F /tmp/x/send -cf /tmp/x/foo.tar *
On fast networks, compression just eats up CPU, so '-C' may be unnecessary.
-- * Ben Okopnik * Editor-in-Chief, Linux Gazette * http://LinuxGazette.NET *
Ben Okopnik [ben at linuxgazette.net]
On Tue, Jan 11, 2011 at 04:04:40PM -0500, Benjamin Okopnik wrote:
>
> 1) Create a 'send' script in some temp directory. Pick a $DELAY value.
>
> $ mkdir /tmp/x
> $ cat <<'!' >/tmp/x/send
> > #!/bin/sh
> >
> > ts=`date '+%s%N'`
> > # Don't use '-C' (compression) on fast networks
> > scp -C -2 -c arcfour /tmp/x/foo.tar some.host:/staging/area/foo$ts.tar
> > echo "Chunk $ts sent; sleeping..."
> > sleep $DELAY
> > !
I just realized that I made it sound like "this script has to be in that temp directory." It doesn't, of course - you can keep it wherever you like, as long as you use the same temp directory name (or change it in the script, or make it an option.) Just change the invocation in the '-F' tar option.
Oh - something I want to mention explicitly while I'm thinking about it: I'm always doing something like "oh, and here's a different way to do it" here, especially with scripts from people like Henry Grebler and Karl Vogel; that is NOT, by any stretch of imagination, a criticism of their scripts - in fact, both of these folks are excellent scripters, and I admire the hell out of their skills. It's because they come up with ideas that make me go "ooh, OOH!" - for which I owe them much gratitude. The script is just a wrapper around an idea - and some ideas are cool enough to get all bouncy about.
-- * Ben Okopnik * Editor-in-Chief, Linux Gazette * http://LinuxGazette.NET *
Raj Shekhar [rajlist2 at rajshekhar.net]
In infinite wisdom Ben Okopnik said the following On 1/10/11 6:21 PM:
> Memory usage is reasonable, swap is barely being touched, the CPU is
> spending 3/4 of its time being idle, even the number of context switches
> is very reasonable as compared to the I/O rate. If I saw this on a
> remote machine, I'd figure it was being under-utilized. :\
Have you tried doing an 'strace -c xterm' and checking which syscall takes the most amount of time? That can usually point you in the right direction.
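For reference, '-c' replaces the usual call-by-call trace with a summary table of time spent per syscall, and '-f' follows any children:

$ strace -c -f xterm -e true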
Ben Okopnik [ben at linuxgazette.net]
Whoops, just spotted this (digging out my inbox after a bit of a hiatus):
On Tue, Jan 11, 2011 at 02:05:50PM -0800, Raj Shekhar wrote:
> In infinite wisdom Ben Okopnik said the following On 1/10/11 6:21 PM:
>
> > Memory usage is reasonable, swap is barely being touched, the CPU is
> > spending 3/4 of its time being idle, even the number of context switches
> > is very reasonable as compared to the I/O rate. If I saw this on a
> > remote machine, I'd figure it was being under-utilized. :\
>
> Have you tried doing an 'strace -c xterm' and checking which syscall
> takes the most amount of time? That can usually point you in the
> right direction.
Why xterm? The problem was being caused by rsync. Besides, in situations like this one, strace is the perfect demonstration of the Heisenberg principle: it would slow rsync down by a huge factor so you'd never be sure whether it was rsync or strace that was causing the largest slowdown.
It's a fairly easy and reasonable guess that it wasn't getting bogged down in the network end: I can certainly transfer things faster than that via Ethernet and not get overloaded. It's also not disk I/O - same reasoning. Something on the CPU end, possibly rsync calculating all the necessary bits - transferring things via 'tar|ssh' didn't slam it quite as hard - plus a bit of disk read priority tuning. Karl's approach, including testing readahead settings for future reference, made a lot of sense and gave some improvement.
Overall, I think the answer is that you need some decent horsepower to run big transfers (and even more so if you're doing it via rsync.) That's a really good factor to be aware of.
-- * Ben Okopnik * Editor-in-Chief, Linux Gazette * http://LinuxGazette.NET *
Mulyadi Santosa [mulyadi.santosa at gmail.com]
On Tue, Jan 25, 2011 at 10:21, Ben Okopnik <ben at linuxgazette.net> wrote:
> Overall, I think the answer is that you need some decent horsepower to
> run big transfers (and even more so if you're doing it via rsync.)
> That's a really good factor to be aware of.
Just crossing my mind: do you have TCP offloading enabled in your Ethernet card (if it's supported, of course)?
-- regards,
Mulyadi Santosa Freelance Linux trainer and consultant
blog: the-hydra.blogspot.com training: mulyaditraining.blogspot.com
Ben Okopnik [ben at linuxgazette.net]
On Tue, Jan 25, 2011 at 11:32:51AM +0700, Mulyadi Santosa wrote:
> On Tue, Jan 25, 2011 at 10:21, Ben Okopnik <ben at linuxgazette.net> wrote:
> > Overall, I think the answer is that you need some decent horsepower to
> > run big transfers (and even more so if you're doing it via rsync.)
> > That's a really good factor to be aware of.
>
> Just crossing my mind: do you have TCP offloading enabled in your
> Ethernet card (if it's supported, of course)?
Not relevant to the original question, of course, since I'm just backing up to a USB-connected external drive, but - that would take a specially-built NIC, right? I've just got an Atheros AR8132/L1c in my little netbook; I doubt that it's supported. Not that I ever have traffic across it that would require anything like that.
-- * Ben Okopnik * Editor-in-Chief, Linux Gazette * http://LinuxGazette.NET *
Mulyadi Santosa [mulyadi.santosa at gmail.com]
On Tue, Jan 25, 2011 at 11:45, Ben Okopnik <ben at linuxgazette.net> wrote:
> Not relevant to the original question, of course, since I'm just backing
> up to a USB-connected external drive,
c**p, sorry, I forgot that....pass :D
-- regards,
Mulyadi Santosa Freelance Linux trainer and consultant
blog: the-hydra.blogspot.com training: mulyaditraining.blogspot.com
Ben Okopnik [ben at linuxgazette.net]
On Tue, Jan 25, 2011 at 12:04:20PM +0700, Mulyadi Santosa wrote:
> On Tue, Jan 25, 2011 at 11:45, Ben Okopnik <ben at linuxgazette.net> wrote:
> > Not relevant to the original question, of course, since I'm just backing
> > up to a USB-connected external drive,
>
> c**p, sorry, I forgot that....pass :D
No worries - it's all good technical knowledge, right? Besides, it's actually pretty cool to hear about all the stuff you know. You sound really knowledgeable about bits where I have little or no clue - all the kernel internals, for example. Great to have that on the list.
-- * Ben Okopnik * Editor-in-Chief, Linux Gazette * http://LinuxGazette.NET *
Mulyadi Santosa [mulyadi.santosa at gmail.com]
Hi Ben...
On Tue, Jan 25, 2011 at 12:15, Ben Okopnik <ben at linuxgazette.net> wrote:
> No worries - it's all good technical knowledge, right? Besides, it's
> actually pretty cool to hear about all the stuff you know. You sound
> really knowledgeable about bits where I have little or no clue - all the
> kernel internals, for example. Great to have that on the list.
Thanks a lot, Ben. I recall that my motivation for joining TAG was to share... and that's what I do, and I hope I still can. ...but...
People, I don't want to ruin your day or ask for special care or anything like that... but in the near future, I might not be able to contribute as much as I do now. Since I am regularly on epilepsy medication, the nerve pills (that's what I call them) are slowly chewing away at my thinking ability. I forget things easily, and sometimes I can hardly focus on anything.
So now you all know that I am not so healthy... and with this e-mail, I want to say "sorry" if in the near future I can't contribute like I did today... but I hope what I have done has meant something to you all.
-- regards,
Mulyadi Santosa Freelance Linux trainer and consultant
blog: the-hydra.blogspot.com training: mulyaditraining.blogspot.com
Ben Okopnik [ben at okopnik.com]
Hi, Mulyadi -
On Wed, Jan 26, 2011 at 01:02:46AM +0700, Mulyadi Santosa wrote:
>
> People, I don't want to ruin your day or ask for special care or
> anything like that... but in the near future, I might not be able to
> contribute as much as I do now. Since I am regularly on epilepsy
> medication, the nerve pills (that's what I call them) are slowly chewing
> away at my thinking ability. I forget things easily, and sometimes I can
> hardly focus on anything.
Man, that's really rough. You and I have talked about it a bit... I can only hope that things get better for you - perhaps you can see about getting a different type of medicine, or something that will counteract that effect. The main thing, the most important thing, is that you never, ever give up on yourself. Whatever ability you have is yours, to be used to its fullest, to bring you whatever power and joy can be extracted out of life. Fight for that every moment that you can, however tough the circumstances may be. I know that you do anyway; this post of yours is an example of your courage and your sense of responsibility toward others - but I just want to give you what encouragement I can, because I know how important it can be.
I've been going through some fairly harsh life stuff myself this past year, so staying focused on that upward climb is an ever-present challenge for me, every minute of the day. This awareness is at least a small bit of positive energy that I can share.
> So now you all know that I am not so healthy... and with this e-mail,
> I want to say "sorry" if in the near future I can't contribute like I
> did today... but I hope what I have done has meant something to you all.
Thanks for everything you have contributed, Mulyadi. You helped me with this last question that I had; I'm sure that what you've given here has helped other people as well (pretty much the entire point of LG, that.) You know my email address, and we cross paths in other places; I can always lend an ear, at least.
Ben
--
OKOPNIK CONSULTING
Custom Computing Solutions For Your Business
Expert-led Training | Dynamic, vital websites | Custom programming
443-250-7895    http://okopnik.com    http://twitter.com/okopnik