Handling "login: failure forking: Cannot allocate memory" on an Exadata X5 Database Machine node
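At 11:11 on May 13, node 2 of the Exadata X5 Database Machine failed: logins were rejected with "Cannot allocate memory" and the database cluster resources could not be stopped. Engineers responded promptly and restored service by restarting the server and replacing the faulty DIMM in slot /SYS/MB/P0/D5 (model M386A4G40DM0-CPB). Fault analysis indicates that the memory hardware fault left the system short of resources, which caused the operating-system-level errors. The machine and all of its databases are now running normally.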
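- At 11:11 on May 13, compute node 2 of the Exadata X5 Database Machine became unresponsive. Logins failed (the console login through ILOM reported login: failure forking: Cannot allocate memory), and the database cluster resources on the node could neither be stopped nor accept connections. Engineers responded as soon as the problem was detected and restored service, then went on site to restart the node 2 server and replace the faulty memory module. The machine and all databases on it are now running normally.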
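- Analysis of the node 2 server failure
- Time of the server failure alert
At 11:11 on May 13, the disaster recovery platform raised an alert reporting a connection failure on node 2 of the X5 machine:
Alert level: Disaster
Start time: 2026-05-13 11:10:18
End time:
DR group: X5B
Resource name: X5B-Master-25-TEST
Address: 10.134.126.14
Alert message: Agent status abnormal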
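- Server login error
When we went to restart the X5 node 2 server, the login failed with an error, and the database instance on that node was hung and unusable. The login error was as follows: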
-> start /SP/console
Are you sure you want to start /SP/console (y/n)? y
Serial console started. To stop, type ESC (
dm01db02.watone.com.cn login: root
Password:
Last login: Thu May 7 11:55:21 from dm01db01
login: failure forking: Cannot allocate memory
The database instance on node 2 was hung and could not be stopped with the cluster commands:
[oracle@dm01db01 ~]$ srvctl stop instance -d LSFCJYDB -i lsfcjydb2
PRCR-1133 : Failed to stop database lsfcjydb and its running services
PRCR-1132 : Failed to stop resources using a filter
ORA-12549: TNS:operating system resource quota exceeded
CRS-2675: Stop of 'ora.lsfcjydb.db' on 'dm01db02' failed
CRS-2678: 'ora.lsfcjydb.db' on 'dm01db02' has experienced an unrecoverable failure
CRS-5804: Communication error with agent process
ORA-12549: TNS:operating system resource quota exceeded
CRS-2675: Stop of 'ora.lsfcjydb.db' on 'dm01db02' failed
CRS-2678: 'ora.lsfcjydb.db' on 'dm01db02' has experienced an unrecoverable failure
CRS-5804: Communication error with agent process
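Both the login error and ORA-12549 say that the operating system refused to create a new process. If a shell is still available on a node in this state, the memory figures in /proc/meminfo give a first indication of how close the node is to exhaustion. The following is a minimal sketch and not part of the original investigation: the function names and the 5% threshold are illustrative assumptions, and MemAvailable is only present on newer kernels, so the code falls back to MemFree.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of field name -> integer value."""
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Name: value" line
        parts = rest.split()
        if parts and parts[0].isdigit():
            info[name.strip()] = int(parts[0])  # kB for most fields
    return info


def memory_pressure(info, min_free_ratio=0.05):
    """Return (available_ratio, is_low).

    min_free_ratio is a hypothetical threshold chosen for illustration.
    Uses MemAvailable when the kernel reports it, otherwise MemFree.
    """
    total = info["MemTotal"]
    available = info.get("MemAvailable", info["MemFree"])
    ratio = available / total
    return ratio, ratio < min_free_ratio
```

On the affected node this could be called as `memory_pressure(parse_meminfo(open("/proc/meminfo").read()))`. A low ratio only suggests memory pressure; it does not by itself identify a hardware fault.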
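- Server hardware logs
According to the server's hardware management platform, the DIMM in slot /SYS/MB/P0/D5 raised the fault fault.memory.intel.dimm_ce, and replacement is recommended.
The log output is as follows: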
[root@dm01db02 ~]# ipmitool sunoem cli "show faulty"
Connected. Use ^D to exit.
-> show faulty
Target | Property | Value
-------------------+-----------------------+-----------------------------------
/SP/faultmgmt/0 | fru | /SYS/MB/P0/D5
/SP/faultmgmt/0/ | class | fault.memory.intel.dimm_ce
faults/0 | |
/SP/faultmgmt/0/ | sunw-msg-id | SPX86A-8002-XM
faults/0 | |
/SP/faultmgmt/0/ | component | /SYS/MB/P0/D5
faults/0 | |
/SP/faultmgmt/0/ | uuid | e5329177-8e61-6d43-e291-aa3771226d
faults/0 | | b8
/SP/faultmgmt/0/ | serd_count | 0x69
faults/0 | |
/SP/faultmgmt/0/ | fru_part_number | 07075400,M386A4G40DM0-CPB
faults/0 | |
/SP/faultmgmt/0/ | fru_rev_level | 01
faults/0 | |
/SP/faultmgmt/0/ | fru_serial_number | 00CE0216393392A92B
faults/0 | |
/SP/faultmgmt/0/ | fru_manufacturer | Samsung
faults/0 | |
/SP/faultmgmt/0/ | fru_name | 32768MB DDR4 SDRAM DIMM
faults/0 | |
/SP/faultmgmt/0/ | system_component_manu | Oracle Corporation
faults/0 | facturer |
/SP/faultmgmt/0/ | system_component_name | ORACLE SERVER X5-2
faults/0 | |
/SP/faultmgmt/0/ | system_component_part | 7090664
faults/0 | _number |
/SP/faultmgmt/0/ | system_component_seri | 1517NM10DF
faults/0 | al_number |
/SP/faultmgmt/0/ | chassis_manufacturer | Oracle Corporation
faults/0 | |
/SP/faultmgmt/0/ | chassis_name | ORACLE SERVER X5-2
faults/0 | |
/SP/faultmgmt/0/ | chassis_part_number | 7090664
faults/0 | |
/SP/faultmgmt/0/ | chassis_serial_number | 1517NM10DF
faults/0 | |
/SP/faultmgmt/0/ | system_manufacturer | Oracle Corporation
faults/0 | |
/SP/faultmgmt/0/ | system_name | Exadata X5-2
faults/0 | |
/SP/faultmgmt/0/ | system_part_number | Exadata X5-2
faults/0 | |
/SP/faultmgmt/0/ | system_serial_number | AK00302721
faults/0 | |
/SP/faultmgmt/0/ | retire | no
faults/0 | |
-> Session closed
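The wrapped table above is awkward to read by eye, because long property names and values spill onto the following "faults/0" line. The sketch below shows one way to fold such output back into property/value pairs. It is an illustration under stated assumptions: `parse_show_faulty` is a hypothetical helper, it assumes a single fault entry (later entries would overwrite earlier ones), and it joins wrapped fragments without a space, which is correct for the wraps seen here but would drop a space if a value happened to wrap at a word boundary.

```python
def parse_show_faulty(text):
    """Collect property -> value pairs from ILOM `show faulty` table output.

    A row whose Target cell starts with "/" begins a new property. Any other
    three-cell row (for example "faults/0 | facturer |") is treated as a
    wrapped continuation and appended to the previous property and value.
    """
    rows = []
    for line in text.splitlines():
        cells = [c.strip() for c in line.split("|")]
        if len(cells) != 3:
            continue  # prompts, the separator line, "Session closed"
        target, prop, value = cells
        if target.startswith("/"):
            rows.append([prop, value])
        elif rows and target != "Target":
            rows[-1][0] += prop
            rows[-1][1] += value
    return {prop: value for prop, value in rows}
```

Applied to the output above, this yields fru = /SYS/MB/P0/D5, class = fault.memory.intel.dimm_ce, and the full uuid e5329177-8e61-6d43-e291-aa3771226db8. The serd_count value 0x69 is hexadecimal, that is, 105 in decimal (`int("0x69", 16)`).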
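- Summary and follow-up recommendations
- Summary of the analysis
Based on an in-depth analysis of the failure symptoms and the related logs of the Exadata X5 Database Machine, the failure manifested as an unresponsive server, and the hardware log shows a fault on the DIMM in slot /SYS/MB/P0/D5. The 代维科技 engineers responded as soon as the problem was detected and restored service, then went to the Zhuantang (转塘) data center to restart the node 2 server and replace the faulty memory module. The machine and all databases on it are now running normally.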