Overall Server Status is reported as "Severe Fault"

Version 1
    This document was generated from CDN thread

    Created by: John Devavaram on 04-09-2012 10:04:27 PM
    Hello team,
     
    We are seeing the following error message on the CIMC main page:
     
    Overall Server Status is reported as "Severe Fault".
     
    Fault Sensors display the following:
     
    Threshold Sensors
     
    Sensor Name     Status                 Reading  Units
    DDR3_P1_C0_ECC  Non-Recoverable Error  253      error
     
    Please refer to the attached screen captures for details.
     
    Look forward to hearing from you.
     
    Thank you,
    John
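
For anyone who wants to check pasted sensor rows like the one above in a script, here is a minimal Python sketch. It assumes the rows are copied as plain text with columns separated by two or more spaces, as in this post. The names `SensorRow`, `parse_sensor_row`, and `is_non_recoverable` are hypothetical helpers for illustration and are not part of CIMC or any Cisco tool.

```python
import re
from typing import NamedTuple


class SensorRow(NamedTuple):
    """One row of the pasted Threshold Sensors table (hypothetical type)."""
    name: str
    status: str
    reading: int
    units: str


def parse_sensor_row(line: str) -> SensorRow:
    """Split a pasted row on runs of two or more spaces.

    Assumption: single spaces occur only inside a column value
    (for example "Non-Recoverable Error"), never between columns.
    """
    name, status, reading, units = re.split(r"\s{2,}", line.strip())
    return SensorRow(name, status, int(reading), units)


def is_non_recoverable(row: SensorRow) -> bool:
    """Return True when the status text reports a non-recoverable error."""
    return "non-recoverable" in row.status.lower()


row = parse_sensor_row("DDR3_P1_C0_ECC  Non-Recoverable Error  253  error")
print(row.name, is_non_recoverable(row))
```

The sketch only reads text that was copied by hand from the page; it does not query the server.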

    Subject: RE: Overall Server Status is reported as "Severe Fault"
    Replied by: Brett Tiller on 05-09-2012 05:38:08 PM
    Hi John,

    Thanks for the information.  Please send us the data requested in step 1 and try step 2:

    1.  In CIMC, click the Inventory link, then the Memory tab, and let us know the displayed values.

    2.  Reload the server from the router CLI with 'ucse <slot>/0 reload' and let us know whether the fault remains or disappears.

    Thanks,
    Brett

    Subject: RE: Overall Server Status is reported as "Severe Fault"
    Replied by: Brett Tiller on 05-09-2012 05:55:49 PM
    Hi John,

    Our Engineering team says you can ignore this error message because it is reported incorrectly.  The false report is fixed in a later release, which we'll post soon.

    Thanks,

    Brett

    Subject: Re: New Message from John Devavaram in Unified Computing System E-Series Se
    Replied by: Jin Zhang on 05-09-2012 05:38:38 PM
    John,

    I believe this error was fixed in a later version of the BMC/CIMC image.

    Daniel, can you confirm that?

    Brett, which image version did you give John? Is it the 0607 one?

    Thanks,

    Jin

    Subject: RE: Overall Server Status is reported as "Severe Fault"
    Replied by: John Devavaram on 06-09-2012 02:13:08 PM
    Brett, Jin and team,
     
    Reloading the module temporarily stops the Severe Fault from appearing, for perhaps a few hours.
     
    However, the Severe Fault error reappears after some time, even after reloading the module or powering the router off and back on.
     
    Currently, the UCS-E160D-M1/K9 module is running the following versions:
     
    CIMC Firmware:
    Running Version: 1.0(1.20120607153607)
     
    BIOS Version: 4.6.4.8
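
The running CIMC version string above appears to end in a 14-digit build timestamp, which would make this the "0607" image mentioned earlier in the thread. That reading is an assumption based on the digits, not a documented format. A minimal Python sketch under that assumption, with `cimc_build_time` as a hypothetical helper name:

```python
import re
from datetime import datetime


def cimc_build_time(version: str) -> datetime:
    """Extract the trailing 14-digit field and read it as YYYYMMDDHHMMSS.

    Assumption: the digits before the closing parenthesis are a build
    timestamp; this layout is inferred from the string in this thread.
    """
    match = re.search(r"\.(\d{14})\)$", version)
    if match is None:
        raise ValueError(f"no 14-digit build field in {version!r}")
    return datetime.strptime(match.group(1), "%Y%m%d%H%M%S")


build = cimc_build_time("1.0(1.20120607153607)")
print(build.strftime("%m%d"))  # prints "0607"
```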
     
     

    Glad to hear that it is fixed in the most recent BMC/CIMC image.

     
    Thank you for all your help.
     
    Regards,
    John