Thread: ndsd cpu load 250%+

  1. #1
    bobbintb NNTP User

    ndsd cpu load 250%+


    So I am getting reports of timeouts when trying to authenticate against
    one of our eDirectory servers. We have two servers set up, both VMs. We
    intended to do a round robin but due to issues we mostly just split all
    the services. This one server lately is consistently over 250% CPU load.
    As I type this it is actually over 600%! What could be causing this
    issue?


    --
    bobbintb
    ------------------------------------------------------------------------
    bobbintb's Profile: https://forums.netiq.com/member.php?userid=5629
    View this thread: https://forums.netiq.com/showthread.php?t=51328


  2. #2
    ab NNTP User

    Re: ndsd cpu load 250%+

    What is the application doing, exactly and specifically? If you have a
    poorly-written application that is hitting the server hundreds of times
    per second, that could do it. If you have an application that is trying
    to do a subtree search for some attribute and you have not defined an
    index on it, that may also cause a problem. The first stop is to find out
    what the box is doing and often tracing LDAP is a good place to do that.

    Code:
    --------------------
    # Set LDAP tracing options properly.
    ldapconfig set 'LDAP Screen Level=all'

    # Run ndstrace to capture data to a file.
    ndstrace
    set dstrace=nodebug
    dstrace +time +tags +ldap
    set dstrace=*m9999999
    dstrace file on
    set dstrace=*r
    # Wait for a second here to capture data.
    dstrace file off
    quit
    --------------------

    Post the (by default) /var/opt/novell/eDirectory/log/ndstrace.log file and
    let's see what is happening.
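
    Once a trace file exists, one rough way to check whether something is
    hitting the server many times per second is to count LDAP trace lines per
    timestamp. The sketch below is only an illustration: it assumes each line
    starts with "HH:MM:SS <thread-id> LDAP:" (the shape produced with the
    +time and +tags options), and the function names are made up for this
    example, not part of any eDirectory tool.

```python
import re
from collections import Counter

# Assumed line shape: "HH:MM:SS <hex thread id> LDAP: <message>".
LINE_RE = re.compile(r"^\s*(\d{2}:\d{2}:\d{2})\s+[0-9A-Fa-f]+\s+LDAP:")


def ldap_lines_per_second(lines):
    """Count LDAP trace lines for each HH:MM:SS timestamp."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match:  # date-only lines and other noise are skipped
            counts[match.group(1)] += 1
    return counts


def busiest_seconds(lines, top=5):
    """Return the `top` timestamps with the most LDAP trace lines."""
    return ldap_lines_per_second(lines).most_common(top)


# Small illustrative sample in the assumed format.
sample_lines = [
    "09/23/2014",
    "14:19:14 87BAC700 LDAP: BIO ctrl called with unknown cmd 7",
    "14:19:14 86E9F700 LDAP: BIO ctrl called with unknown cmd 7",
    "14:19:15 86394700 LDAP: BIO ctrl called with unknown cmd 7",
]
print(busiest_seconds(sample_lines, top=1))

# Against a real trace (default path mentioned above):
# with open("/var/opt/novell/eDirectory/log/ndstrace.log", errors="replace") as fh:
#     print(busiest_seconds(fh))
```

    A count per second only shows how chatty the trace is, not which client
    is responsible, so it is a first filter rather than a diagnosis.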

    --
    Good luck.

    If you find this post helpful and are logged into the web interface,
    show your appreciation and click on the star below...

  3. #3
    bobbintb NNTP User

    Re: ndsd cpu load 250%+


    It stopped before I had a chance to try. I will try your suggestion if
    it happens again.




  4. #4
    bobbintb NNTP User

    Re: ndsd cpu load 250%+


    So this just started happening again. I restarted the service and the
    load immediately went critical again:


    Code:
    --------------------
    top - 13:59:56 up 104 days, 13:51, 3 users, load average: 18.72, 16.61, 11.57
    Tasks: 279 total, 1 running, 278 sleeping, 0 stopped, 0 zombie
    Cpu(s): 92.8%us, 5.5%sy, 0.0%ni, 1.5%id, 0.0%wa, 0.0%hi, 0.2%si, 0.0%st
    Mem: 16249880k total, 16023036k used, 226844k free, 432292k buffers
    Swap: 8388600k total, 134056k used, 8254544k free, 7064712k cached

    PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
    4698 root 20 0 8085m 7.0g 26m S 1216.6 45.1 3717:53 ndsd
    24778 anglarma 20 0 15172 1388 952 R 6.9 0.0 0:01.59 top
    51 root 20 0 0 0 0 S 0.3 0.0 7:32.74 events/0
    56 root 20 0 0 0 0 S 0.3 0.0 3:33.32 events/5
    57 root 20 0 0 0 0 S 0.3 0.0 3:59.70 events/6
    83 root 20 0 0 0 0 S 0.3 0.0 1:23.92 kblockd/1
    94 root 20 0 0 0 0 S 0.3 0.0 0:07.19 kacpid
    2064 root 20 0 326m 15m 12m S 0.3 0.1 1:07.53 EvMgrC
    7025 novlwww 20 0 771m 455m 6728 S 0.3 2.9 133:03.26 java
    1 root 20 0 19356 524 316 S 0.0 0.0 0:09.48 init
    2 root 20 0 0 0 0 S 0.0 0.0 0:01.38 kthreadd
    3 root RT 0 0 0 0 S 0.0 0.0 0:01.38 migration/0
    4 root 20 0 0 0 0 S 0.0 0.0 0:15.86 ksoftirqd/0
    5 root RT 0 0 0 0 S 0.0 0.0 0:00.00 migration/0
    6 root RT 0 0 0 0 S 0.0 0.0 0:10.14 watchdog/0
    7 root RT 0 0 0 0 S 0.0 0.0 0:01.34 migration/1
    8 root RT 0 0 0 0 S 0.0 0.0 0:00.00 migration/1

    --------------------


    I set the trace to only show LDAP and this is what I got:


    Code:
    --------------------
    09/23/2014
    14:19:06 84475700 LDAP: BIO ctrl called with unknown cmd 7
    14:19:08 80233700 LDAP: BIO ctrl called with unknown cmd 7
    14:19:11 86293700 LDAP: BIO ctrl called with unknown cmd 7
    14:19:14 87BAC700 LDAP: BIO ctrl called with unknown cmd 7
    14:19:14 86E9F700 LDAP: BIO ctrl called with unknown cmd 7
    14:19:14 871A2700 LDAP: BIO ctrl called with unknown cmd 7
    14:19:14 86697700 LDAP: BIO ctrl called with unknown cmd 7
    14:19:15 86394700 LDAP: BIO ctrl called with unknown cmd 7
    14:19:15 80637700 LDAP: BIO ctrl called with unknown cmd 7
    14:19:15 84B7C700 LDAP: BIO ctrl called with unknown cmd 7
    14:19:15 87AAB700 LDAP: BIO ctrl called with unknown cmd 7
    14:19:15 86FA0700 LDAP: BIO ctrl called with unknown cmd 7
    14:19:15 86D9E700 LDAP: BIO ctrl called with unknown cmd 7
    14:19:15 82657700 LDAP: BIO ctrl called with unknown cmd 7
    14:19:15 84172700 LDAP: BIO ctrl called with unknown cmd 7
    14:19:15 88CBD700 LDAP: BIO ctrl called with unknown cmd 7
    14:19:15 9F17B700 LDAP: BIO ctrl called with unknown cmd 7
    14:19:15 83162700 LDAP: BIO ctrl called with unknown cmd 7
    14:19:15 9A7B0700 LDAP: BIO ctrl called with unknown cmd 7
    14:19:16 9448F700 LDAP: BIO ctrl called with unknown cmd 7
    14:19:16 84475700 LDAP: BIO ctrl called with unknown cmd 7
    14:19:16 886B7700 LDAP: BIO ctrl called with unknown cmd 7
    14:19:16 81041700 LDAP: BIO ctrl called with unknown cmd 7
    14:19:16 86495700 LDAP: BIO ctrl called with unknown cmd 7
    14:19:16 87BAC700 LDAP: BIO ctrl called with unknown cmd 7
    14:19:16 81445700 LDAP: BIO ctrl called with unknown cmd 7
    14:19:16 85F90700 LDAP: BIO ctrl called with unknown cmd 7
    14:19:17 883B4700 LDAP: BIO ctrl called with unknown cmd 7
    14:19:17 99F98700 LDAP: BIO ctrl called with unknown cmd 7
    14:19:18 82859700 LDAP: BIO ctrl called with unknown cmd 7
    14:19:18 8598A700 LDAP: BIO ctrl called with unknown cmd 7
    14:19:18 95890700 LDAP: BIO ctrl called with unknown cmd 7
    14:19:18 85C8D700 LDAP: BIO ctrl called with unknown cmd 7
    14:19:19 82859700 LDAP: BIO ctrl called with unknown cmd 7
    14:19:19 82E5F700 LDAP: BIO ctrl called with unknown cmd 7
    14:19:20 86C9D700 LDAP: BIO ctrl called with unknown cmd 7
    14:19:21 82D5E700 LDAP: BIO ctrl called with unknown cmd 7
    14:19:22 81748700 LDAP: TLS accept failure 5 on connection 0x44490380, setting err = -5875. Error stack:
    14:19:22 81748700 LDAP: TLS handshake failed on connection 0x44490380, err = -5875
    14:19:23 82051700 LDAP: TLS accept failure 5 on connection 0x44490380, setting err = -5875. Error stack:
    14:19:23 82051700 LDAP: TLS handshake failed on connection 0x44490380, err = -5875
    14:19:41 874A5700 LDAP: BIO ctrl called with unknown cmd 7
    14:19:42 86293700 LDAP: TLS accept failure 5 on connection 0x430ee000, setting err = -5875. Error stack:
    14:19:42 86293700 LDAP: TLS handshake failed on connection 0x430ee000, err = -5875
    14:19:43 9F983700 LDAP: TLS accept failure 5 on connection 0x430ee000, setting err = -5875. Error stack:
    14:19:43 9F983700 LDAP: TLS handshake failed on connection 0x430ee000, err = -5875
    14:19:53 84879700 LDAP: BIO ctrl called with unknown cmd 7
    14:19:53 85485700 LDAP: BIO ctrl called with unknown cmd 7
    14:20:02 80A3B700 LDAP: TLS accept failure 5 on connection 0x430ee380, setting err = -5875. Error stack:
    14:20:02 80A3B700 LDAP: TLS handshake failed on connection 0x430ee380, err = -5875
    14:20:03 8A4ED700 LDAP: TLS accept failure 5 on connection 0x430ee380, setting err = -5875. Error stack:
    14:20:03 8A4ED700 LDAP: TLS handshake failed on connection 0x430ee380, err = -5875
    14:20:13 81344700 LDAP: BIO ctrl called with unknown cmd 7
    14:20:13 85889700 LDAP: BIO ctrl called with unknown cmd 7
    14:20:14 9EA74700 LDAP: BIO ctrl called with unknown cmd 7
    14:20:14 874A5700 LDAP: BIO ctrl called with unknown cmd 7
    14:20:17 86697700 LDAP: BIO ctrl called with unknown cmd 7
    14:20:17 81748700 LDAP: Failed to authenticate local on connection 0x436a6000, err = failed authentication (-669)
    14:20:17 85B8C700 LDAP: BIO ctrl called with unknown cmd 7
    14:20:17 883B4700 LDAP: Failed to authenticate local on connection 0x434ed180, err = failed authentication (-669)
    14:20:20 86899700 LDAP: Failed to authenticate local on connection 0x430ee380, err = failed authentication (-669)
    14:20:22 87AAB700 LDAP: TLS accept failure 5 on connection 0x430ee380, setting err = -5875. Error stack:
    14:20:22 87AAB700 LDAP: TLS handshake failed on connection 0x430ee380, err = -5875
    14:20:23 84E7F700 LDAP: TLS accept failure 5 on connection 0x430ee380, setting err = -5875. Error stack:
    14:20:23 84E7F700 LDAP: TLS handshake failed on connection 0x430ee380, err = -5875
    14:20:23 83667700 LDAP: BIO ctrl called with unknown cmd 7
    14:20:24 877A8700 LDAP: BIO ctrl called with unknown cmd 7
    14:20:27 86495700 LDAP: Failed to authenticate local on connection 0x430ee000, err = failed authentication (-669)
    14:20:30 83B6C700 LDAP: BIO ctrl called with unknown cmd 7
    14:20:30 82051700 LDAP: BIO ctrl called with unknown cmd 7
    14:20:31 82F60700 LDAP: BIO ctrl called with unknown cmd 7
    14:20:33 80E3F700 LDAP: Failed to authenticate local on connection 0x430ee000, err = failed authentication (-669)
    14:20:42 9448F700 LDAP: TLS accept failure 5 on connection 0x430ee380, setting err = -5875. Error stack:
    14:20:42 9448F700 LDAP: TLS handshake failed on connection 0x430ee380, err = -5875
    14:20:43 83667700 LDAP: TLS accept failure 5 on connection 0x430ee380, setting err = -5875. Error stack:
    14:20:43 83667700 LDAP: TLS handshake failed on connection 0x430ee380, err = -5875
    14:21:00 86B9C700 LDAP: BIO ctrl called with unknown cmd 7
    14:21:00 874A5700 LDAP: BIO ctrl called with unknown cmd 7
    14:21:02 82657700 LDAP: TLS accept failure 5 on connection 0x430ee380, setting err = -5875. Error stack:
    14:21:02 82657700 LDAP: TLS handshake failed on connection 0x430ee380, err = -5875
    14:21:03 9578F700 LDAP: TLS accept failure 5 on connection 0x430ee380, setting err = -5875. Error stack:
    14:21:03 9578F700 LDAP: TLS handshake failed on connection 0x430ee380, err = -5875
    14:21:04 83869700 LDAP: Failed to authenticate local on connection 0x430ee000, err = failed authentication (-669)
    --------------------



    I probably have to set some more trace options but I really don't know
    which. It seems to have subsided by itself for now.




  5. #5
    David Gersic NNTP User

    Re: ndsd cpu load 250%+

    On Tue, 23 Sep 2014 20:26:47 +0000, bobbintb wrote:

    > So this just started happening again. I restarted the service and the
    > load immediately went critical again:


    I'm not convinced this is a problem internal to eDirectory. Tracking down
    misperforming clients is always a pain, but I think that's what you're
    looking for here.


    > 14:19:22 81748700 LDAP: TLS accept failure 5 on connection 0x44490380,
    > setting err = -5875.

    You have a bunch of these, which basically indicates that your server and
    a client talking to it can't agree on the SSL layer. That's most likely a
    client problem if your server is otherwise working normally.

    > 14:20:20 86899700 LDAP: Failed to
    > authenticate local on connection 0x430ee380, err = failed
    > authentication (-669)


    Then you have a bunch of these. -669 is normally what you see when the
    username or password is wrong.

    So given this it looks to me like something is hammering on your server
    attempting first to get a working SSL connection, then trying to log in
    with an invalid DN or password.
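
    To see that pattern in numbers, the failures can be tallied per connection
    value. This is a minimal sketch, assuming the message wording shown in the
    trace above; the function names are hypothetical. Note that the
    "connection 0x..." value looks like a handle the server may reuse, so this
    groups by handle, not by client.

```python
import re
from collections import Counter, defaultdict

# Matches the "on connection 0x..." part of the trace messages.
CONN_RE = re.compile(r"on connection (0x[0-9a-fA-F]+)")


def classify(line):
    """Return a short label for the two failure types of interest, else None."""
    # Count only the "handshake failed" line; the preceding
    # "TLS accept failure" line describes the same event.
    if "TLS handshake failed" in line:
        return "tls_handshake_failed"
    if "failed authentication (-669)" in line:
        return "auth_failed_669"
    return None


def failures_by_connection(lines):
    """Map each connection value to a Counter of failure labels."""
    tally = defaultdict(Counter)
    for line in lines:
        kind = classify(line)
        conn = CONN_RE.search(line)
        if kind and conn:
            tally[conn.group(1)][kind] += 1
    return tally


# Lines copied from the trace posted earlier in the thread.
sample_lines = [
    "14:20:20 86899700 LDAP: Failed to authenticate local on connection 0x430ee380, err = failed authentication (-669)",
    "14:20:22 87AAB700 LDAP: TLS accept failure 5 on connection 0x430ee380, setting err = -5875. Error stack:",
    "14:20:22 87AAB700 LDAP: TLS handshake failed on connection 0x430ee380, err = -5875",
    "14:20:27 86495700 LDAP: Failed to authenticate local on connection 0x430ee000, err = failed authentication (-669)",
]

for conn, kinds in failures_by_connection(sample_lines).items():
    print(conn, dict(kinds))
```
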


    > I probably have to set some more trace options but I really don't know
    > which. It seems to have subsided by itself for now.


    I'm curious why you're not seeing the DN of the attempted authentication.
    You're also not seeing the IP address the connections are coming from.
    You need to go to the LDAP Server object, find the trace options tab, and
    enable everything there that isn't "packet dump".


    --
    --------------------------------------------------------------------------
    David Gersic dgersic_@_niu.edu
    Knowledge Partner http://forums.netiq.com

    Please post questions in the forums. No support provided via email.
    If you find this post helpful, please click on the star below.

  6. #6
    bobbintb NNTP User

    Re: ndsd cpu load 250%+


    dgersic;249108 Wrote:
    > I'm curious why you're not seeing the DN of the attempted
    > authentication.
    > You're also not seeing the IP address the connections are coming from.
    > You need to go to the LDAP Server object, find the trace options tab,
    > and enable everything there that isn't "packet dump".


    Ok, it looks like only Critical Error Messages and Non-critical Error
    Messages were selected. It looks like it's been about 104 days since this
    last happened. I guess I will have to wait and see when it happens again
    but I have a good idea of the culprit. One of our systems was reporting
    slowdown issues when this started. As I mentioned earlier, I restarted
    the service and it didn't help. As soon as I told the admin to move over
    to another server, the load went back to normal, although it did not
    cause any issue on the server it was moved to.




  7. #7
    bobbintb NNTP User

    Re: ndsd cpu load 250%+


    This started happening again today and I set the LDAP trace as mentioned
    and got a good trace. Can I send it to one or both of you to look at, as
    it has IP addresses and usernames I'd rather not post? I'm pretty certain
    I have identified the culprit but I still don't know what it is actually
    doing.




  8. #8
    ab NNTP User

    Re: ndsd cpu load 250%+

    Compress please; ab at novell.com


  9. #9
    ab NNTP User

    Re: ndsd cpu load 250%+

    Based on what I can see I would make sure you have value (vs. presence or
    substring) indexes on the following attributes on this server, as well as
    any others answering these same LDAP queries from various applications:

    objectClass - Not a default, but IMO should be on every server
    CN - Usually present already, but double-check
    uniqueID - Usually present already, but double-check
    gidNumber
    memberUid
    member

    This query, in particular, is taking a long, long time to return:

    "(&(objectClass=posixGroup)(gidNumber=1234))"

    Be sure that at least objectClass is indexed, and preferably gidNumber as
    well. This search is also taking a very long time to return:

    "(&(objectClass=posixGroup)(memberUid=something-here))"

    As a result, be sure that, besides objectClass, memberUid is indexed too.

    Let us know if that helps. Indexes add overhead, but generally if you add
    them based on queries that are happening and are slow (as shown above)
    they are well worth the memory and processing overhead, which is usually
    not noticed.
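
    As a rough aid for matching slow queries to indexes, the attribute names
    can be pulled out of the filter strings and compared with what is already
    indexed. The sketch below is an assumption-laden illustration: the helper
    names are invented, the set of indexed attributes is typed in by hand (it
    is not read from the server), and only simple filter items (=, ~=, >=,
    <=) are recognised, not extensible matches.

```python
import re

# An opening parenthesis, an attribute name, then a comparison operator.
ATTR_RE = re.compile(r"\(\s*([A-Za-z][A-Za-z0-9;.-]*)\s*(?:=|~=|>=|<=)")


def attributes_in_filter(ldap_filter):
    """Return attribute names used in an LDAP filter, in order, no repeats."""
    seen = []
    for name in ATTR_RE.findall(ldap_filter):
        if name not in seen:
            seen.append(name)
    return seen


def missing_indexes(filters, indexed):
    """List filter attributes not in `indexed` (names compared case-insensitively)."""
    indexed_lower = {name.lower() for name in indexed}
    missing = []
    for ldap_filter in filters:
        for attr in attributes_in_filter(ldap_filter):
            if attr.lower() not in indexed_lower and attr not in missing:
                missing.append(attr)
    return missing


slow_filters = [
    "(&(objectClass=posixGroup)(gidNumber=1234))",
    "(&(objectClass=posixGroup)(memberUid=something-here))",
]
# Hypothetical starting point: only CN and uniqueID are indexed.
already_indexed = {"CN", "uniqueID"}

print(missing_indexes(slow_filters, already_indexed))
```

    The output is only a candidate list; whether each attribute deserves a
    value index still depends on how often the query actually runs.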


  10. #10
    bobbintb NNTP User

    Re: ndsd cpu load 250%+


    I will look at the indexes and see if that helps but there is something
    else I noticed which might be relevant. Talking to the admin of the
    system in question and your response about the indexes led me to look
    into how groups are set up. When browsing the group objects in iManager
    I get the following error, especially on the "Dynamic" tab:


    Code:
    --------------------
    LDAP Error

    Unable to obtain a valid LDAP context.

    Creating secure SSL LDAP context failed:
    Invalid name: /:636
    --------------------


    I did a quick search and it looks like this error is related to a
    certificate. In addition to the indexes, could a bad certificate be
    compounding the issue, or is this totally off base?



