Hi



I need some advice on the best course of action with a frustrating issue.

I have raised this issue on an HP support forum, but I thought I would also tackle it from a SLES angle.


We have Data Protector Express 4.00-sp1 - 56906 running on two SUSE Linux Enterprise Server (SLES) v10 servers.



The domain server is running 32-bit SLES 10 SP1 with OES 2, while the remote agent server is running 64-bit SLES 10 SP3 with OES 2 SP2.



I have run test backups during the day and they have worked, so I am not sure why the 2100hrs scheduled backup has the issue.



Basically, the backup of the remote agents starts, files are counted and the file backup commences, but then it dies.



/var/log/messages shows the following for the job that was scheduled to run at 2100hrs.

The first entry (the page allocation failure) and this one look interesting:

Sep 7 21:11:46 srv2 dplinsdr: *** glibc detected *** /usr/local/hp/dpx/lin/x86_64/dplinsdr: double free or corruption (!prev): 0x00002aaaaca008d0 ***



------------------------------------------------------------------------------------------

Sep 7 21:10:52 srv2 kernel: dplinsdr: page allocation failure. order:4, mode:0xd0

Sep 7 21:10:52 srv2 kernel:

Sep 7 21:10:52 srv2 kernel: Call Trace: <ffffffff80167964>{__alloc_pages+796} <ffffffff80182e4c>{kmem_getpages+106}

Sep 7 21:10:52 srv2 kernel: <ffffffff80184231>{fallback_alloc+275} <ffffffff80184753>{__kmalloc+179}

Sep 7 21:10:52 srv2 kernel: <ffffffff8016d1a7>{kzalloc+9} <ffffffff801a74a9>{getxattr+137}

Sep 7 21:10:52 srv2 kernel: <ffffffff80196cf4>{link_path_walk+218} <ffffffff802f1209>{__down_write+21}

Sep 7 21:10:52 srv2 kernel: <ffffffff801fee72>{__up_write+20} <ffffffff80174544>{sys_brk+244}

Sep 7 21:10:52 srv2 kernel: <ffffffff801a75cf>{sys_lgetxattr+75} <ffffffff802f1209>{__down_write+21}

Sep 7 21:10:52 srv2 kernel: <ffffffff801fee72>{__up_write+20} <ffffffff80174544>{sys_brk+244}

Sep 7 21:10:52 srv2 kernel: <ffffffff8010ae36>{system_call+126}

Sep 7 21:10:52 srv2 kernel: Mem-info:

Sep 7 21:10:52 srv2 kernel: Node 0 DMA per-cpu:

Sep 7 21:10:52 srv2 kernel: CPU 0: Hot: hi: 0, btch: 1 usd: 0 Cold: hi: 0, btch: 1 usd: 0

Sep 7 21:10:52 srv2 kernel: CPU 1: Hot: hi: 0, btch: 1 usd: 0 Cold: hi: 0, btch: 1 usd: 0

Sep 7 21:10:52 srv2 kernel: CPU 2: Hot: hi: 0, btch: 1 usd: 0 Cold: hi: 0, btch: 1 usd: 0

Sep 7 21:10:52 srv2 kernel: CPU 3: Hot: hi: 0, btch: 1 usd: 0 Cold: hi: 0, btch: 1 usd: 0

Sep 7 21:10:52 srv2 kernel: CPU 4: Hot: hi: 0, btch: 1 usd: 0 Cold: hi: 0, btch: 1 usd: 0

Sep 7 21:10:52 srv2 kernel: CPU 5: Hot: hi: 0, btch: 1 usd: 0 Cold: hi: 0, btch: 1 usd: 0

Sep 7 21:10:52 srv2 kernel: CPU 6: Hot: hi: 0, btch: 1 usd: 0 Cold: hi: 0, btch: 1 usd: 0

Sep 7 21:10:52 srv2 kernel: CPU 7: Hot: hi: 0, btch: 1 usd: 0 Cold: hi: 0, btch: 1 usd: 0

Sep 7 21:10:52 srv2 kernel: Node 0 DMA32 per-cpu:

Sep 7 21:10:52 srv2 kernel: CPU 0: Hot: hi: 186, btch: 31 usd: 100 Cold: hi: 62, btch: 15 usd: 55

Sep 7 21:10:52 srv2 kernel: CPU 1: Hot: hi: 186, btch: 31 usd: 169 Cold: hi: 62, btch: 15 usd: 11

Sep 7 21:10:52 srv2 kernel: CPU 2: Hot: hi: 186, btch: 31 usd: 173 Cold: hi: 62, btch: 15 usd: 51

Sep 7 21:10:52 srv2 kernel: CPU 3: Hot: hi: 186, btch: 31 usd: 159 Cold: hi: 62, btch: 15 usd: 54

Sep 7 21:10:52 srv2 kernel: CPU 4: Hot: hi: 186, btch: 31 usd: 179 Cold: hi: 62, btch: 15 usd: 48

Sep 7 21:10:52 srv2 kernel: CPU 5: Hot: hi: 186, btch: 31 usd: 178 Cold: hi: 62, btch: 15 usd: 12

Sep 7 21:10:52 srv2 kernel: CPU 6: Hot: hi: 186, btch: 31 usd: 155 Cold: hi: 62, btch: 15 usd: 50

Sep 7 21:10:52 srv2 kernel: CPU 7: Hot: hi: 186, btch: 31 usd: 156 Cold: hi: 62, btch: 15 usd: 51

Sep 7 21:10:52 srv2 kernel: Node 0 Normal per-cpu:

Sep 7 21:10:52 srv2 kernel: CPU 0: Hot: hi: 186, btch: 31 usd: 124 Cold: hi: 62, btch: 15 usd: 48

Sep 7 21:10:52 srv2 kernel: CPU 1: Hot: hi: 186, btch: 31 usd: 154 Cold: hi: 62, btch: 15 usd: 6

Sep 7 21:10:52 srv2 kernel: CPU 2: Hot: hi: 186, btch: 31 usd: 15 Cold: hi: 62, btch: 15 usd: 57

Sep 7 21:10:52 srv2 kernel: CPU 3: Hot: hi: 186, btch: 31 usd: 139 Cold: hi: 62, btch: 15 usd: 55

Sep 7 21:10:52 srv2 kernel: CPU 4: Hot: hi: 186, btch: 31 usd: 115 Cold: hi: 62, btch: 15 usd: 3

Sep 7 21:10:52 srv2 kernel: CPU 5: Hot: hi: 186, btch: 31 usd: 155 Cold: hi: 62, btch: 15 usd: 14

Sep 7 21:10:52 srv2 kernel: CPU 6: Hot: hi: 186, btch: 31 usd: 168 Cold: hi: 62, btch: 15 usd: 48

Sep 7 21:10:52 srv2 kernel: CPU 7: Hot: hi: 186, btch: 31 usd: 175 Cold: hi: 62, btch: 15 usd: 60

Sep 7 21:10:52 srv2 kernel: Free pages: 804448kB (0kB HighMem)

Sep 7 21:10:52 srv2 kernel: Active:202289 inactive:137558 dirty:229 writeback:0 unstable:0 free:201112 slab:717254 mapped:26192 pagetables:2886

Sep 7 21:10:52 srv2 kernel: Node 0 DMA free:12188kB min:16kB low:20kB high:24kB active:0kB inactive:0kB present:11780kB pages_scanned:0 all_unreclaimable? yes

Sep 7 21:10:52 srv2 kernel: lowmem_reserve[]: 0 3630 6029 6029

Sep 7 21:10:52 srv2 kernel: Node 0 DMA32 free:649472kB min:5976kB low:7468kB high:8964kB active:258228kB inactive:460832kB present:3717536kB pages_scanned:0 all_unreclaimable? no

Sep 7 21:10:52 srv2 kernel: lowmem_reserve[]: 0 0 2398 2398

Sep 7 21:10:52 srv2 kernel: Node 0 Normal free:142788kB min:3948kB low:4932kB high:5920kB active:550928kB inactive:89400kB present:2456320kB pages_scanned:5 all_unreclaimable? no

Sep 7 21:10:52 srv2 kernel: lowmem_reserve[]: 0 0 0 0

Sep 7 21:10:52 srv2 kernel: Node 0 DMA: 3*4kB 2*8kB 2*16kB 5*32kB 1*64kB 3*128kB 1*256kB 0*512kB 1*1024kB 1*2048kB 2*4096kB = 12188kB

Sep 7 21:10:52 srv2 kernel: Node 0 DMA32: 120290*4kB 20847*8kB 26*16kB 1*32kB 1*64kB 0*128kB 0*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 649472kB

Sep 7 21:10:52 srv2 kernel: Node 0 Normal: 27731*4kB 3857*8kB 13*16kB 1*32kB 0*64kB 0*128kB 1*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 142788kB

Sep 7 21:10:52 srv2 kernel: Swap cache: add 58, delete 58, find 4/4, race 0+0

Sep 7 21:10:52 srv2 kernel: Free swap = 1052064kB

Sep 7 21:10:52 srv2 kernel: Total swap = 1052248kB

Sep 7 21:10:52 srv2 kernel: Free swap: 1052064kB

Sep 7 21:10:52 srv2 kernel: 1671167 pages of RAM

Sep 7 21:10:52 srv2 kernel: 144740 reserved pages

Sep 7 21:10:52 srv2 kernel: 225539 pages shared

Sep 7 21:10:52 srv2 kernel: 0 pages swap cached

Sep 7 21:11:46 srv2 dplinsdr: *** glibc detected *** /usr/local/hp/dpx/lin/x86_64/dplinsdr: double free or corruption (!prev): 0x00002aaaaca008d0 ***

Sep 7 21:27:34 srv2 syslog-ng[30490]: STATS: dropped 0

Sep 7 22:27:34 srv2 syslog-ng[30490]: STATS: dropped 0

Sep 7 23:27:34 srv2 syslog-ng[30490]: STATS: dropped 0



------------------------------------------------------------------------------------------
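For anyone who wants to check the numbers, here is a rough Python sketch that parses the three per-zone free-block lines from the extract above. It verifies each reported total and counts how many free blocks are large enough for the failed request. It assumes 4 kB pages (the usual x86_64 default, not something the log states), so an order:4 request would need 2^4 contiguous pages, i.e. 64 kB. The function names are my own, purely for illustration.

```python
import re

PAGE_KB = 4  # assumption: 4 kB pages, the usual x86_64 default

# Free-block lines copied from the log above (syslog prefix removed).
BUDDY_LINES = [
    "Node 0 DMA: 3*4kB 2*8kB 2*16kB 5*32kB 1*64kB 3*128kB 1*256kB 0*512kB 1*1024kB 1*2048kB 2*4096kB = 12188kB",
    "Node 0 DMA32: 120290*4kB 20847*8kB 26*16kB 1*32kB 1*64kB 0*128kB 0*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 649472kB",
    "Node 0 Normal: 27731*4kB 3857*8kB 13*16kB 1*32kB 0*64kB 0*128kB 1*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 142788kB",
]


def parse_buddy_line(line):
    """Return (zone, [(count, block_kb), ...], reported_total_kb)."""
    zone = re.match(r"Node \d+ (\w+):", line).group(1)
    blocks = [(int(c), int(kb)) for c, kb in re.findall(r"(\d+)\*(\d+)kB", line)]
    total = int(re.search(r"= (\d+)kB", line).group(1))
    return zone, blocks, total


def blocks_at_least(blocks, order):
    """Count free blocks big enough for an allocation of the given order."""
    need_kb = PAGE_KB * (2 ** order)
    return sum(count for count, kb in blocks if kb >= need_kb)


def summarize(lines, order=4):
    """Per zone: does the reported total add up, and how many blocks fit?"""
    result = {}
    for line in lines:
        zone, blocks, total = parse_buddy_line(line)
        computed = sum(count * kb for count, kb in blocks)
        result[zone] = {
            "total_ok": computed == total,
            "big_enough": blocks_at_least(blocks, order),
        }
    return result


if __name__ == "__main__":
    for zone, info in summarize(BUDDY_LINES).items():
        print(zone, info)
```

By my reading, the totals all add up, and the DMA32 and Normal zones each have only two free blocks of 64 kB or larger despite hundreds of MB free overall. That might point towards memory fragmentation rather than an outright shortage, but that is my interpretation and I would welcome correction.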



Some things I have tried:

1. Recreated the backup job

2. Tested backups several times during the day and they worked.



Where to from here?

Has anyone experienced similar issues on SLES with any software, not just Data Protector, and can you shed light on a possible fix?


ATTACHMENTS that I'll try to post:

(NB: The attachments are from Linux, so if you are using Windows it is best not to view them with Notepad. Use WordPad or a word processor instead.)



hpdx-error Sep7.txt

= /var/log/messages extract for 7 Sep 2010 on the remote agent server. Scheduled backup starts at 2100hrs.



var-log-messages-dplinsdr-page-alloc-failure.txt

= /var/log/messages grep of dplinsdr showing page allocation failures.



var-log-messages-extract.log

= /var/log/messages extract, full log excluding some irrelevant DNS and other daemon messages.
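
In case it helps anyone reproduce the grep behind the second attachment, below is a minimal Python sketch that picks out page allocation failures from syslog lines and tallies them by hour, to see whether they cluster around the 2100hrs job. The regular expression is modelled only on the single kernel line shown in the extract above, so other kernels or syslog formats may need adjusting. The function names are hypothetical.

```python
import re
from collections import Counter

# Modelled on this line from the extract:
# "Sep 7 21:10:52 srv2 kernel: dplinsdr: page allocation failure. order:4, mode:0xd0"
FAILURE_RE = re.compile(
    r"^(?P<month>\w{3})\s+(?P<day>\d+)\s+(?P<hour>\d{2}):\d{2}:\d{2}\s+\S+\s+kernel:\s+"
    r"(?P<proc>\S+): page allocation failure\. "
    r"order:(?P<order>\d+), mode:(?P<mode>0x[0-9a-fA-F]+)"
)

SAMPLE_LINES = [
    "Sep 7 21:10:52 srv2 kernel: dplinsdr: page allocation failure. order:4, mode:0xd0",
    "Sep 7 21:10:52 srv2 kernel: Mem-info:",
    "Sep 7 21:27:34 srv2 syslog-ng[30490]: STATS: dropped 0",
]


def find_failures(lines):
    """Return one dict per page allocation failure line."""
    failures = []
    for line in lines:
        match = FAILURE_RE.match(line)
        if match:
            info = match.groupdict()
            info["order"] = int(info["order"])
            failures.append(info)
    return failures


def count_by_hour(failures):
    """Tally failures per (month, day, hour) to spot time-of-day patterns."""
    return Counter((f["month"], f["day"], f["hour"]) for f in failures)


if __name__ == "__main__":
    found = find_failures(SAMPLE_LINES)
    for key, count in sorted(count_by_hour(found).items()):
        print(key, count)
```

To run it against the real log, pass an open file handle for /var/log/messages to find_failures() instead of SAMPLE_LINES.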