Thread: BackupExec: A timeout occurred waiting for NDMPD.NLM to init


    "Error: 1004 A timeout occurred waiting for NDMPD.NLM to initialize."

    A very common and well documented issue but every fix I have found has not worked. The list is extensive so I will start by describing the issue, then the environment and finally what I have tried so far.

    The issue
    I cannot load BackupExec 9.1 or 9.2 on NetWare 6.5 SP8. The loader starts, and when it reaches NDMPD.NLM the CPU hits 99% (as reported by monitor.nlm) and BackupExec stalls for about 80-120 seconds. The loader eventually gives up after a non-critical abend. From that point on I cannot unload becdm.cdm, so I generally reboot the server to clear the abend.
    BackupExec had been working on the server for about 2 years, and I cannot link any change to the time it stopped working.

    It loaded reliably every time for about 1.5 years; then it started to fail roughly 1 load in 10. It has gradually got worse and now will not load at all.

    Before you say it: yes, I know Symantec dropped BackupExec for NetWare years ago.

    It seems to me this can only be a hardware issue now, but I'm keen to hear any other ideas.

    The environment
    * The physical server is an HP DL385 G5 (SKU 411360-371). The tape drive is an external half-height HP Ultrium 448 LTO 2 (SKU DW086A), connected over SAS to an HP SC44Ge SAS HBA (SKU 416096-B21). The server has 18GB of RAM, two quad-core AMD Opteron 2352 2.1 GHz CPUs and 5 direct-attached SAS hard drives on the internal P400 RAID controller, configured as RAID 5.
    * I can provide firmware versions if needed; the BIOS is HP A09 2009-07-11. I don't have the tape drive or HBA versions to hand.
    * The server is running VMware ESXi 4.1.0 build 260247 with a free license, i.e. stand-alone.
    * The VMware server uses about 50% of RAM, 70% of storage and 12% of CPU.
    * The guest is NetWare 6.5 SP8 (overlay) with 4GB of vRAM and a single CPU. The tape drive is presented to it as a generic SCSI device.
    * I'm running BackupExec for NetWare 9.2.1401.5 build 287839 with the NetWare Open File Option and a Windows Remote Agent.
    * The storage driver is LSIMPTNW.HAM version 5.02, 5 Dec 2007, the default with SP8. The BackupExec 9.2 device driver is becdm.cdm version 7.50, 7 Feb 2006.
    * The NetWare server sees the tape drive as an "Unbound Device Object" until becdm.cdm is loaded, at which point it is recognised as "HP Ultrium 2-SCSI T65D".
    * The tape drive is listed on the BackupExec HCL as "StorageWorks Ultrium 448 LTO2", except mine is SAS, not SCSI (source: Symantec Enterprise Support, "Symantec Backup Exec for NetWare Servers (tm) 9.x (9.0, 9.1, 9.2) Hardware Compatibility List").

    What have I tried so far?
    * We have a rental tape drive identical to our own because the original was overheating, so I have tried two physical tape drives, both with the same server, HBA and SAS tape drive cable.
    * I have used tcpcon.nlm to check that port 10000 is unused.
    * I have tried the parameter -!X on the line of bestart.ncf that loads BackupExec, to stop it checking whether port 10000 is free.
    * I have added the set statement "set auto load of cdm modules = off" to c:\nwserver\startup.ncf so it will not try to load unwanted CDMs.
    * I have renamed all copies of nwtape.cdm on the C: drive to nwtape.cdx so they cannot be auto-loaded.
    * There are no load statements in startup.ncf for other tape drive device drivers, just scsihd.cdm and idecd.cdm.
    * There are no tape libraries or robots involved. It's just a basic tape drive.
    * The physical and virtual servers have been rebooted numerous times to rule out a memory issue etc.
    * I've loaded the server with no other significant modules, just the basics like eDirectory, and the same issue still occurs.
    * In addition to the NW65 SP8 default version of lsimptnw.ham, I have tried version 5.03.01, 23 Jan 2008, from the LSI website.
    * I have deleted the original BackupExec folder from the server (sys:\bkupexec) and re-installed BackupExec numerous times.
    * I have power-cycled the tape drive a number of times.
    * I have changed the tape drive's virtual SCSI ID, presented by VMware to the NetWare guest, to a higher number on the same virtual HBA as the virtual disk. I have also tried giving the tape drive a virtual SCSI ID on a dedicated virtual HBA, SCSI 1:0.
    * I have deleted the SMDR config file (sys:\etc\sms\smdr.cfg) and re-created it using the command "load smdr.nlm new".
    * I have deleted the NetWare server's backup queue and the SMS SMDR Group from eDirectory and re-created the queue using the command "load qman.nlm new". Note: as it's NW65SP8, the SMS RPC object does not exist and the SMS SMDR Group is not re-created.
    * I have deleted the two BackupExec objects from eDirectory and re-created them by loading BackupExec using "bestart.ncf".
    * I have checked the health of eDirectory.
    * I have tried both the LSI SAS and LSI Parallel virtual HBAs in the VMware guest.
    * I have tried adding the parameter qtags=off to the load command for lsimptnw.ham in startup.ncf.
    * I have tried the parameter -b on the line of bestart.ncf that loads BackupExec.
    * I have tried loading ipxspx.nlm to stop the public symbol error messages when loading BackupExec.
    * I have also tried BackupExec versions 9.1 build 0306.12 and 9.1.1158.10 build 289707.
    * I have built a second NetWare 6.5 SP8 server (SP8 overlay DVD, MD5-verified ISO media) using the Backup server profile during the GUI install stage. I loaded afreecon.nlm, installed VMware Tools (vmwtool.nlm) and BackupExec for NetWare 9.2.1401.5 build 287839. The same issue occurs.
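    Port 10000 is the well-known NDMP port, which is presumably why BackupExec checks it before NDMPD.NLM starts. As a complement to tcpcon.nlm on the server itself, here is a minimal Python 3 sketch, assuming a workstation that can reach the NetWare server over TCP/IP; the function name port_is_listening is hypothetical. It only reports whether something accepts a TCP connection on the port, so a firewall that silently drops packets will look the same as "nothing listening".

```python
import socket


def port_is_listening(host: str, port: int = 10000, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port is accepted.

    A refused connection, a timeout or an unreachable host is treated as
    "not listening". A silently dropping firewall is indistinguishable
    from an idle port with this check.
    """
    try:
        # create_connection resolves the name and completes the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # ConnectionRefusedError, socket.timeout, unreachable host, ...
        return False
```

    For example, port_is_listening("nw65srv") (a placeholder server name) should return False before BESTART is run and True once NDMPD.NLM has initialised and is listening.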

    System output
    1) Excerpt from "list storage adapters"
    0x04 [V358-A3] SAS1068:00008 [slot 8]
    0x06 [V358-A3-D0:0] HP Ultrium 2-SCSI T65D

    2) Backup Exec SureStart Console
    Module NETDB.NLM is already loaded
    Module SMDR.NLM is already loaded
    Module TSAFS.NLM is already loaded
    Loading module TSANDS.NLM
    Module BECDM.CDM is already loaded
    Loading module B2D.NLM
    Module NWIDK.NLM is already loaded
    Module NSS.NLM is already loaded
    Loading module OFM.NLM
    Loading module NRLTLI.NLM
    Loading module AD_ASPI.NLM
    Loading module NDMPD.NLM
    Error: 1004

    A timeout occurred waiting for NDMPD.NLM to initialize.

    Press <Alt+Esc> to return to the Server Console. After you resolve
    the error condition(s), re-execute the BESTART command.

    3) Abend log

    *********************************************************
    Novell Open Enterprise Server, NetWare 6.5
    PVER: 6.50.08

    Server halted Sunday, 6 May 2012 5:38:23.196
    Abend 1 on P00: Server-5.70.08: Page Fault Processor Exception (Error code 00000002)

    CS = 0008 DS = 0010 ES = 0010 FS = 0023 GS = 0023 SS = 0010
    EAX = 00000000 EBX = 00204B58 ECX = 896B77E0 EDX = 8A26157C
    ESI = 00204B58 EDI = 00000F9B EBP = 8A261DD4 ESP = 8A261DA8
    EIP = 8A1E03FD FLAGS = 00010082
    8A1E03FD C680B00E000000 MOV [EAX+00000EB0]=?, 00
    EIP in LSIMPTNW.HAM at code start +000013FDh
    Access Location: 0x00000EB0

    The violation occurred while processing the following instruction:
    8A1E03FD C680B00E000000 MOV [EAX+00000EB0], 00
    8A1E0404 8A45FC MOV AL, [EBP-04]
    8A1E0407 8845F8 MOV [EBP-08], AL
    8A1E040A 8A45F8 MOV AL, [EBP-08]
    8A1E040D 89EC MOV ESP, EBP
    8A1E040F 5D POP EBP
    8A1E0410 5F POP EDI
    8A1E0411 5E POP ESI
    8A1E0412 5B POP EBX
    8A1E0413 C3 RET

    Running process: Server 10 Process
    Thread Owned by NLM: SERVER.NLM
    Stack pointer: 8A261F5C
    OS Stack limit: 8A25A020
    Scheduling priority: 67371008
    Wait state: 50500F0 Waiting for work
    Stack: --8A261E01 ?
    --00000001 (LOADER.NLM|KernelAddressSpace+1)
    --00951740 ?
    8A1E0414 (LSIMPTNW.HAM|(Code Start)+1414)
    --0097BAA0 ?
    --00000000 (LOADER.NLM|KernelAddressSpace+0)
    --01900190 ?
    --157C157C ?
    --0D000001 ?
    --0097BACC ?
    --8A261D01 ?
    --8A261E18 ?
    --00000F9B (LOADER.NLM|KernelAddressSpace+F9B)
    --00204B58 ?
    --00204B58 ?
    8A1DFD6B (LSIMPTNW.HAM|(Code Start)+D6B)
    --0097BAA0 ?
    --0097BAC0 ?
    --00000001 (LOADER.NLM|KernelAddressSpace+1)
    --0097BA50 ?
    --00000000 (LOADER.NLM|KernelAddressSpace+0)
    --00F20000 ?
    --00000000 (LOADER.NLM|KernelAddressSpace+0)
    --00000000 (LOADER.NLM|KernelAddressSpace+0)
    --00200000 ?
    --0097BAA0 ?
    --0097BAC0 ?
    --8A261EB0 ?
    --8A261E44 ?
    --00000F9B (LOADER.NLM|KernelAddressSpace+F9B)
    --00204B58 ?
    --00204B58 ?
    8A1FAA64 ?
    --0097BA50 ?
    --00000F03 (LOADER.NLM|KernelAddressSpace+F03)
    --00000286 (LOADER.NLM|KernelAddressSpace+286)
    --00951740 ?
    8972FA00 (NWPA.NLM|HAI_Enable_Real_Mode_Access+34)
    --00000000 (LOADER.NLM|KernelAddressSpace+0)
    --8A261E70 ?
    --00000F9B (LOADER.NLM|KernelAddressSpace+F9B)
    --00204B58 ?
    --00204B58 ?
    8A1FAC47 ?
    --0097BA50 ?
    --00204B58 ?
    --0097BA50 ?
    --00000000 (LOADER.NLM|KernelAddressSpace+0)
    --01080000 ?
    --01080000 ?
    --8A261E90 ?
    --00000F9B (LOADER.NLM|KernelAddressSpace+F9B)
    --00204B58 ?
    --00204B58 ?
    8A1FAC95 ?
    --00952180 ?
    --010824A0 ?
    --02058001 ?
    --8A261EB8 ?
    --00000F9B (LOADER.NLM|KernelAddressSpace+F9B)
    --00204B58 ?
    --00204B58 ?
    8A207658 ?
    --010824A0 ?
    --01091240 ?
    --00951740 ?
    --00008100 (LOADER.NLM|KernelAddressSpace+8100)
    --00008134 (LOADER.NLM|KernelAddressSpace+8134)
    --8A261EEC ?
    --00000F9B (LOADER.NLM|KernelAddressSpace+F9B)
    --00204B58 ?
    --00204B58 ?
    8A208ED6 ?
    --010824A0 ?
    --00000003 (LOADER.NLM|KernelAddressSpace+3)
    --00000046 (LOADER.NLM|KernelAddressSpace+46)
    --02010002 ?
    --010824A0 ?
    --00000000 (LOADER.NLM|KernelAddressSpace+0)
    --00951740 ?
    --00000001 (LOADER.NLM|KernelAddressSpace+1)
    --00204B60 ?
    --00000F9B (LOADER.NLM|KernelAddressSpace+F9B)
    --00204B58 ?
    --00204B58 ?
    89729AE5 (NWPA.NLM|NPAThreadRoutine+3D)
    --00010002 ?
    -896E5110 (MM.NLM|PostponeEvent+0)
    8973173F (NWPA.NLM|CDM_MountActivate_Message+EB)
    --00000001 (LOADER.NLM|KernelAddressSpace+1)
    --00000000 (LOADER.NLM|KernelAddressSpace+0)
    --802171E4 ?
    --802171E4 ?
    80327F6B (MM.NLM|RetryPostponedMessages+237)
    --01081A00 ?
    --00204B60 ?
    --00000F9B (LOADER.NLM|KernelAddressSpace+F9B)
    --00204B58 ?
    --00204B58 ?
    003682B2 (SERVER.NLM|CallAESRoutineWithEsiSet+A)

    Additional Information:
    The CPU encountered a problem executing code in LSIMPTNW.HAM. The problem may be in that module or in data passed to that module by a process owned by SERVER.NLM.
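    The key lines of the abend can be cross-checked by hand. The Python sketch below is not a fix; it only re-derives, from numbers copied out of the log, what the faulting instruction did, using the standard IA-32 page-fault error-code bits and the encoding of MOV r/m8, imm8 (opcode C6 /0). The helper names are illustrative, and only the single instruction form seen here is decoded.

```python
"""Re-derive the LSIMPTNW.HAM page fault from the values in the abend log.

A sketch only: standard IA-32 decoding rules, and only the instruction
form seen in this log (C6 /0 with a [reg + disp32] operand) is handled.
"""

# Values copied from the abend log.
EAX = 0x00000000
EIP = 0x8A1E03FD
EIP_OFFSET = 0x13FD                       # "EIP in LSIMPTNW.HAM at code start +000013FDh"
ERROR_CODE = 0x00000002                   # "Page Fault Processor Exception (Error code 00000002)"
FAULTING_BYTES = bytes.fromhex("C680B00E000000")
STACK_RETURNS = (0x8A1E0414, 0x8A1DFD6B)  # stack entries labelled LSIMPTNW.HAM


def decode_page_fault_error_code(code: int) -> dict:
    """Decode bits 0-2 of an IA-32 page-fault error code."""
    return {
        "protection_violation": bool(code & 0x1),  # False: the page was not present
        "write_access": bool(code & 0x2),          # True: the faulting access was a write
        "user_mode": bool(code & 0x4),             # False: it happened in kernel mode (ring 0)
    }


def decode_mov_rm8_imm8(insn: bytes) -> tuple:
    """Decode MOV r/m8, imm8 (C6 /0) with mod=10; returns (base_reg, disp32, imm8)."""
    if insn[0] != 0xC6:
        raise ValueError("not MOV r/m8, imm8")
    modrm = insn[1]
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 0x7, modrm & 0x7
    if mod != 0b10 or reg != 0 or rm == 0b100:  # rm=100 would need a SIB byte
        raise ValueError("ModRM form not handled by this sketch")
    disp = int.from_bytes(insn[2:6], "little")  # 32-bit displacement, little-endian
    return rm, disp, insn[6]


def module_code_start(eip: int, offset: int) -> int:
    """The log reports EIP as code start + offset, so subtract to get the base."""
    return eip - offset


if __name__ == "__main__":
    base_reg, disp, imm = decode_mov_rm8_imm8(FAULTING_BYTES)
    assert base_reg == 0  # register number 0 is EAX
    target = (EAX + disp) & 0xFFFFFFFF
    print(decode_page_fault_error_code(ERROR_CODE))
    print(f"byte write of {imm:#04x} to {target:#010x}")
    start = module_code_start(EIP, EIP_OFFSET)
    print(f"LSIMPTNW.HAM code start: {start:#010x}")
    for addr in STACK_RETURNS:
        print(f"{addr:#010x} = code start +{addr - start:#x}")
```

    Run against the log values, it reports a kernel-mode write to a not-present page at 0x00000eb0, matching "Access Location: 0x00000EB0", and a module code start of 0x8a1df000, from which the two LSIMPTNW.HAM stack entries resolve to +0x1414 and +0xd6b as the log states. Because EAX was 0, the driver was storing a byte through a zero base pointer at offset 0xEB0. That is consistent with, though not proof of, a NULL structure pointer either inside LSIMPTNW.HAM or in data passed to it.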


    Last edited by Mike021548141; 07-May-2012 at 07:15 PM. Reason: Missed a line
    Things should be as simple as possible but no simpler
