Overall Server Status is reported as "Severe Fault"

Document created by cdnadmin on Jan 25, 2014
This document was generated from CDN thread

Created by: John Devavaram on 04-09-2012 10:04:27 PM
Hello team,
 
We are seeing the following error message on CIMC main page:
 
Overall Server Status is reported as "Severe Fault".
 
Fault Sensors display the following:
 
Threshold Sensors
 
Sensor Name      Status                 Reading  Units
DDR3_P1_C0_ECC   Non-Recoverable Error  253      error
 
Please refer to the attached screen captures for details.
 
Look forward to hearing from you.
 
Thank you,
John
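
For anyone who wants to check such a sensor row from a saved text capture rather than by eye, here is a minimal sketch. It assumes the row has the four whitespace-separated columns shown in the post (name, status, reading, units). The `SensorReading` type and both function names are hypothetical and not part of any CIMC tool or API.

```python
import re
from typing import NamedTuple


class SensorReading(NamedTuple):
    name: str
    status: str
    reading: int
    units: str


# Assumed layout: a name without spaces, a free-text status,
# an integer reading, and a one-word unit.
ROW = re.compile(
    r"^(?P<name>\S+)\s+(?P<status>.+?)\s+(?P<reading>\d+)\s+(?P<units>\S+)$"
)


def parse_sensor_row(line: str) -> SensorReading:
    """Split one threshold-sensor row into its four columns."""
    m = ROW.match(line.strip())
    if m is None:
        raise ValueError(f"unrecognized sensor row: {line!r}")
    return SensorReading(m["name"], m["status"], int(m["reading"]), m["units"])


def is_non_recoverable(reading: SensorReading) -> bool:
    """True when the status text reports a non-recoverable condition."""
    return "non-recoverable" in reading.status.lower()


row = parse_sensor_row("DDR3_P1_C0_ECC  Non-Recoverable Error  253  error")
print(row.name, "->", row.status, "| non-recoverable:", is_non_recoverable(row))
```

This only restates what the CIMC page already shows. It does not tell you whether the fault is real or spurious.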

Subject: RE: Overall Server Status is reported as "Severe Fault"
Replied by: Brett Tiller on 05-09-2012 05:38:08 PM
Hi John,

Thanks for the information.  Could you please provide the data and try the step below?

1.  In CIMC please click on the Inventory link, then the Memory tab and let us know the displayed values.

2.  Please reload the server from the router CLI with 'ucse <slot>/0 reload' and let us know whether the fault remains or disappears.

Thanks,
Brett

Subject: RE: Overall Server Status is reported as "Severe Fault"
Replied by: Brett Tiller on 05-09-2012 05:55:49 PM
Hi John,

Our Engineering team says that you can ignore this error message because it is incorrect.  The spurious message is fixed in a later release, which we'll post soon.

Thanks,

Brett

Subject: Re: New Message from John Devavaram in Unified Computing System E-Series Se
Replied by: Jin Zhang on 05-09-2012 05:38:38 PM
John,

I believe this error was fixed by a later version of the BMC/CIMC image.

Daniel, can you confirm that?

Brett, which version of the image did you give John? Is it the 0607 one?

Thanks,

Jin


Subject: RE: Overall Server Status is reported as "Severe Fault"
Replied by: John Devavaram on 06-09-2012 02:13:08 PM
Brett, Jin and team,
 
Reloading the module temporarily stops the Severe Fault from appearing, for perhaps a few hours.
 
However, the Severe Fault error reappears after some time, even after reloading the module or powering the router off and back on.
 
Currently, the UCS-E160D-M1/K9 module is running the following versions:
 
CIMC Firmware:
Running Version: 1.0(1.20120607153607)
 
BIOS Version: 4.6.4.8
 
 

Glad to hear that it is fixed in the most recent BMC/CIMC image.

 
Thank you for all your help.
 
Regards,
John
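
To compare the running version against the "0607" image mentioned earlier in the thread, the version string can be split into its parts. This is only a sketch. It assumes the 14 digits inside the parentheses are a build timestamp in YYYYMMDDhhmmss form, which the thread does not confirm, and `parse_cimc_version` is a hypothetical helper, not a Cisco API.

```python
import re
from datetime import datetime

# Assumed format: <major>.<minor>(<patch>.<14-digit build stamp>)
VERSION = re.compile(r"^(\d+)\.(\d+)\((\d+)\.(\d{14})\)$")


def parse_cimc_version(version: str):
    """Return ((major, minor, patch), build_datetime) for a CIMC version string."""
    m = VERSION.match(version.strip())
    if m is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major, minor, patch, stamp = m.groups()
    # Assumption: the trailing digits encode the build date and time.
    built = datetime.strptime(stamp, "%Y%m%d%H%M%S")
    return (int(major), int(minor), int(patch)), built


release, built = parse_cimc_version("1.0(1.20120607153607)")
print(release, built.isoformat(), built.strftime("%m%d"))
```

If that assumption holds, the running image carries a June 7 build stamp, which would match the "0607" image asked about above.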
 
