Alert Log中“Fatal NI connect error 12170”错误

0    105    1

Tags:

👉 本文共约15561个字,系统预计阅读时间或需59分钟。

Fatal NI connect error 12170.

VERSION INFORMATION:

​ TNS for Linux: Version 11.2.0.4.0 - Production

​ Oracle Bequeath NT Protocol Adapter for Linux: Version 11.2.0.4.0 - Production

​ TCP/IP NT Protocol Adapter for Linux: Version 11.2.0.4.0 - Production

Time: 08-MAY-2017 10:24:32

Tracing not turned on.

Tns error struct:

ns main err code: 12535

TNS-12535: TNS:operation timed out

ns secondary err code: 12560

nt main err code: 505

TNS-00505: Operation timed out

nt secondary err code: 110

nt OS err code: 0

Client address: (ADDRESS=(PROTOCOL=tcp)(HOST=10.176.172.44)(PORT=55353))

[oracle11g@localhost admin]$ oerr ora 12170

12170, 00000, "TNS:Connect timeout occurred"

// *Cause: The server shut down because connection establishment or

// communication with a client failed to complete within the allotted time

// interval. This may be a result of network or system delays; or this may

// indicate that a malicious client is trying to cause a Denial of Service

// attack on the server.

// *Action: If the error occurred because of a slow network or system,

// reconfigure one or all of the parameters SQLNET.INBOUND_CONNECT_TIMEOUT,

// SQLNET.SEND_TIMEOUT, SQLNET.RECV_TIMEOUT in sqlnet.ora to larger values.

// If a malicious client is suspected, use the address in sqlnet.log to

// identify the source and restrict access. Note that logged addresses may

// not be reliable as they can be forged (e.g. in TCP/IP).

[oracle11g@localhost admin]$

解决方法:

listener.ora

INBOUND_CONNECT_TIMEOUT_LISTENER = 0

DIAG_ADR_ENABLED_LISTENER = OFF

sqlnet.ora

DIAG_ADR_ENABLED = OFF

SQLNET.INBOUND_CONNECT_TIMEOUT =0

之后,重新reload监听器配置,或者重启监听器。



定期检查数据库alert log信息,是我们进行数据库日常维护、巡检和故障排除的重要工作手段。数据库系统“带病运行”、“负伤运行”往往是“小病致死”的主要杀手。所谓“防患于未然”就需要数据库管理员从日常的小事微情入手,时刻了解系统运行情况,并尽早进行处理。

本文主要介绍笔者使用Oracle 11gR2过程中日志巡检中出现的问题,虽然最后没有得到圆满解决。记录下来,留待需要朋友待查。

1**、问题说明**

笔者使用的一套开发环境,数据库版本是11gR2,小版本号为11.2.0.4。

SQL> select * from v$version;

BANNER

--------------------------------------------------------------------------------

Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production

PL/SQL Release 11.2.0.4.0 - Production

CORE 11.2.0.4.0 Production

TNS for Linux: Version 11.2.0.4.0 - Production

NLSRTL Version 11.2.0.4.0 – Production

在巡检数据库alert log过程中,发现一些错误提示。

Tue May 19 23:04:55 2015


Fatal NI connect error 12170.

VERSION INFORMATION:

​ TNS for Linux: Version 11.2.0.4.0 - Production

​ Oracle Bequeath NT Protocol Adapter for Linux: Version 11.2.0.4.0 - Production

​ TCP/IP NT Protocol Adapter for Linux: Version 11.2.0.4.0 - Production

Time: 19-MAY-2015 23:04:55

Tracing not turned on.

Tns error struct:

ns main err code: 12535

TNS-12535: TNS:operation timed out

ns secondary err code: 12560

nt main err code: 505

TNS-00505: Operation timed out

nt secondary err code: 110

nt OS err code: 0

Client address: (ADDRESS=(PROTOCOL=tcp)(HOST=172.xx.xx.xx)(PORT=50741))

相同类型错误在日志中反复出现,每天出现频率在10条左右,区别在于每次的Host对应IP地址不同。

2**、问题分析**

这类型错误在11gR2版本中经常出现。笔者之前的一些投产系统中,也常常出现这样的问题。当前进行开发的系统架构比较传统,是一个典型CS架构方式。客户端桌面应用是一个富客户端软件,所有业务逻辑都在客户端。客户端直接连接数据库。

这种架构方式是比较传统的方式,行业内对于这种方式的缺陷已经讨论很多年了。单从数据库角度看,这样的架构方式意味着更加多的数据库连接数量和更加频繁的访问结构。

利用IP地址,我们可以从监听器日志listener.log中,可以定位到这个IP地址连接是什么时候连接过来的。

[oracle@localhost trace]$ pwd

/u01/app/diag/tnslsnr/localhost/listener/trace

[oracle@localhost trace]$ cat listener.log | grep 172. xx.xx.xx

19-MAY-2015 13:51:10 (CONNECT_DATA=(CID=(PROGRAM=JDBC Thin Client)(HOST=jdbc)(USER=visvim))(SERVICE_NAME=sicsdb)) (ADDRESS=(PROTOCOL=tcp)(HOST=172.xx.xx.xx)(PORT=50741)) establish sicsdb * 0

从MOS上的信息反馈看,这个类型错误提示是一种正常的Oracle工作机制。当客户端进程Client Process与服务器进程Server Process建立联系之后,两者就形成了“同生共死”的关系(专有连接模式)。除非客户端主动发起中断或者Server Process被异常kill。

在实际运行环境中,这种理想状态常常被打破。如果Client Process只是保持连接,不执行语句,会话就处于idle状态。这种连接很容易被诸如防火墙等网络层面设备切断。

在Oracle11gR2中,如果长期没有连接动作的Server Process被外力切断,Oracle就会自动将信息作为提示错误写入到alert log中,作为一种提示。在11R1版本中,这种信息是会写入到sqlnet.log中。

3**、问题解决措施**

本人提供Oracle(OCP、OCM)、MySQL(OCP)、PostgreSQL(PGCA、PGCE、PGCM)等数据库的培训和考证业务,私聊QQ646634621或微信db_bao,谢谢!

归纳MOS和网络中的各种方法,大体有两重策略,分别为使用DCD和禁用ADR。

DCD全称Dead Connection Detection,是一种基于主动测探方式检查Oracle僵尸客户端进程Client Process的策略。配置DCD的关键是设置sqlnet.expire_time参数在SQL Net体系下,Oracle会依据这个时间间隔给所有的Client Process发送网络通信包,用来确定Client是否存活。

正是借助这个包通信,可以让防火墙认为这个网络连接还是处在active状态,不会进行强制断开动作。类似的机制还有Linux上的tcp keep live机制,也是使用类似的策略进行检查。

[oracle@localhost trace]$ cd /u01/app/oracle/network/admin/

[oracle@localhost admin]$ ls -l

total 16

-rw-r--r--. 1 oracle oinstall 343 Sep 2 2014 listener.ora

drwxr-xr-x. 2 oracle oinstall 4096 Jun 16 2014 samples

-rw-r--r--. 1 oracle oinstall 381 Dec 17 2012 shrept.lst

-rw-r--r--. 1 oracle oinstall 0 Sep 2 2014 sqlnet.ora

-rw-r-----. 1 oracle oinstall 308 Sep 5 2014 tnsnames.ora

[oracle@localhost admin]$ cat sqlnet.ora

[oracle@localhost admin]$ cat sqlnet.ora

sqlnet.expire_time=10

另一种方式也是Oracle推荐的,就是关闭11g的ADR机制。ADR(Automatic Diagnostic Repository)是Oracle进行自动诊断、自动提醒的工具组件。Oracle认为如果用户不需要在SQL Net组件中应用ADR,可以再sqlnet.ora中进行配置关闭。

[oracle@localhost admin]$ cat sqlnet.ora

sqlnet.expire_time=10

DIAG_ADR_ENABLED = OFF

DIAG_ADR_ENABLED_LISTENER=OFF

之后,重新reload监听器配置,或者重启监听器。

[oracle@localhost admin]$ lsnrctl reload

LSNRCTL for Linux: Version 11.2.0.4.0 - Production on 21-MAY-2015 10:13:34

Copyright (c) 1991, 2013, Oracle. All rights reserved.

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=localhost)(PORT=1521)))

4**、结论**

数据库“Fatal NI connect error 12170”问题,从本质上是由于长连接数据库交互方式造成的,严格意义上不应算什么错误问题。如果是一些三层架构体系应用,可以考虑使用连接池进行动态资源调配的方式,对问题进行缓解。



深蓝的blog:http://blog.csdn.net/huangyanlong/article/details/46372849

查看数据库告警日志,发现错误:Fatal NI connect error 12170报错

告警日志如下:


Fatal NI connect error 12170.

VERSION INFORMATION:

​ TNS for Linux: Version 11.2.0.1.0 - Production

​ Oracle Bequeath NT Protocol Adapter for Linux: Version 11.2.0.1.0 - Production

​ TCP/IP NT Protocol Adapter for Linux: Version 11.2.0.1.0 - Production

Time: 26-OCT-2014 06:05:44

Tracing not turned on.

Tns error struct:

ns main err code: 12535

VERSION INFORMATION:

​ TNS for Linux: Version 11.2.0.1.0 - Production

​ Oracle Bequeath NT Protocol Adapter for Linux: Version 11.2.0.1.0 - Production

​ TCP/IP NT Protocol Adapter for Linux: Version 11.2.0.1.0 - Production

Time: 26-OCT-2014 06:05:44

Tracing not turned on.

Tns error struct:

ns main err code: 12535

TNS-12535: TNS:operation timed out

ns secondary err code: 12606

nt main err code: 0

nt secondary err code: 0

nt OS err code: 0

Client address: (ADDRESS=(PROTOCOL=tcp)(HOST=10.53.105.20)(PORT=19164))

VERSION INFORMATION:

​ TNS for Linux: Version 11.2.0.1.0 - Production

​ Oracle Bequeath NT Protocol Adapter for Linux: Version 11.2.0.1.0 - Production

​ TCP/IP NT Protocol Adapter for Linux: Version 11.2.0.1.0 - Production

Time: 26-OCT-2014 06:05:44

Tracing not turned on.

Tns error struct:

ns main err code: 12535

TNS-12535: TNS:operation timed out

ns secondary err code: 12606

nt main err code: 0

nt secondary err code: 0

nt OS err code: 0

Client address: (ADDRESS=(PROTOCOL=tcp)(HOST=10.53.105.20)(PORT=19166))

TNS-12535: TNS:operation timed out

ns secondary err code: 12606

nt main err code: 0

nt secondary err code: 0

nt OS err code: 0

Client address: (ADDRESS=(PROTOCOL=tcp)(HOST=10.53.105.20)(PORT=19165))

Sun Oct 26 06:05:50 2014

Successfully onlined Undo Tablespace 2.

Verifying file header compatibility for 11g tablespace encryption..

Verifying 11g file header compatibility for tablespace encryption completed


【解决方式】

[oracle@node1 ~]$ lsnrctl status

LSNRCTL for Linux: Version 11.2.0.1.0 - Production on 02-MAR-2015 12:34:22

Copyright (c) 1991, 2009, Oracle. All rights reserved.

Connecting to (ADDRESS=(PROTOCOL=tcp)(HOST=)(PORT=1521))

STATUS of the LISTENER

------------------------

Alias LISTENER

Version TNSLSNR for Linux: Version 11.2.0.1.0 - Production

Start Date 02-MAR-2015 09:45:49

Uptime 0 days 2 hr. 48 min. 33 sec

Trace Level off

Security ON: Local OS Authentication

SNMP OFF

Listener Parameter File /u01/11.2.0/grid/network/admin/listener.ora

Listener Log File /u01/app/oracle/diag/tnslsnr/node1/listener/alert/log.xml

Listening Endpoints Summary...

(DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=LISTENER)))

(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=10.53.105.20)(PORT=1521)))

(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=10.53.105.24)(PORT=1521)))

Services Summary...

Service "+ASM" has 1 instance(s).

Instance "+ASM1", status READY, has 1 handler(s) for this service...

Service "xcky" has 1 instance(s).

Instance "xcky1", status READY, has 1 handler(s) for this service...

Service "xckyXDB" has 1 instance(s).

Instance "xcky1", status READY, has 1 handler(s) for this service...

The command completed successfully

[oracle@node1 ~]$ cd /u01/11.2.0/grid/network/admin/

[oracle@node1 admin]$ ls

endpoints_listener.ora listener.ora samples sqlnet.ora

listener1410255PM1539.bak listener.ora.bak.node1 shrept.lst

[root@node1 admin]# chmod 775 listener.ora

[root@node1 admin]# chmod 775 sqlnet.ora

--确认oracle用户对配置文件的访问权限

[oracle@node1 admin]$ vi listener.ora

--添加参数INBOUND_CONNECT_TIMEOUT_LISTENER = 0

--添加参数DIAG_ADR_ENABLED_LISTENER = OFF

LISTENER=(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=IPC)(KEY=LISTENER)))) # line added by Agent

LISTENER_SCAN1=(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=IPC)(KEY=LISTENER_SCAN1)))) # line added by Agent

ENABLE_GLOBAL_DYNAMIC_ENDPOINT_LISTENER_SCAN1=ON # line added by Agent

ENABLE_GLOBAL_DYNAMIC_ENDPOINT_LISTENER=ON # line added by Agent

INBOUND_CONNECT_TIMEOUT_LISTENER = 0

DIAG_ADR_ENABLED_LISTENER = OFF

~

~

[oracle@node1 admin]$ vi sqlnet.ora

--添加参数DIAG_ADR_ENABLED = OFF

--添加参数SQLNET.INBOUND_CONNECT_TIMEOUT =0

# sqlnet.ora.node1 Network Configuration File: /u01/11.2.0/grid/network/admin/sqlnet.ora.node1

# Generated by Oracle configuration tools.

NAMES.DIRECTORY_PATH= (TNSNAMES, EZCONNECT)

ADR_BASE = /u01/app/oracle

DIAG_ADR_ENABLED = OFF

SQLNET.INBOUND_CONNECT_TIMEOUT =0

~

~

【官方文档】

来看一下官方文档中的说明,如下:

ORA-12170: TNS:Connect timeout occurred

Cause: The client failed to establish a connection and complete authentication in the time specified by the SQLNET.INBOUND_CONNECT_TIMEOUT parameter in the sqlnet.ora file. This error may be a result of network or system delays, or it may indicate that a malicious client is trying to cause a denial-of-service attack on the database server.

See Also:

["Configuring the Listener and the Oracle Database To Limit Resource Consumption By Unauthorized Users"](file:///E:/files/book/concept/10g Release 2 (10.2) Oracle Database Documentation Library/B19306_01/network.102/b14212/performance.htm#i1006364) further information about setting the SQLNET.INBOUND_CONNECT_TIMEOUT parameter

Action: If the error occurred due to system or network delays that are normal for the particular environment, then perform these steps:

Turn on tracing to determine where clients are timing out.

See Also:

["Tracing Error Information for Oracle Net Services"](file:///E:/files/book/concept/10g Release 2 (10.2) Oracle Database Documentation Library/B19306_01/network.102/b14212/troublestng.htm#i468763)

Reconfigure the SQLNET.INBOUND_CONNECT_TIMEOUT parameter in sqlnet.ora to a larger value.

If you suspect a malicious client, then perform these steps:

Locate the IP address of the client in the sqlnet.log file on the database server to identify the source.

For example, the following sqlnet.log excerpt shows a client IP address of 10.10.150.35.

Fatal NI connect error 12170.

VERSION INFORMATION:

​ TNS for Solaris: Version 10.1.0.2.0

​ Oracle Bequeath NT Protocol Adapter for Solaris: Version 10.1.0.2.0

​ TCP/IP NT Protocol Adapter for Solaris: Version 10.1.0.2.0

Time: 03-JUL-2002 13:51:12

Tracing to file: /ora/trace/svr_13279.trc

Tns error struct:

nr err code: 0

ns main err code: 12637

TNS-12637: Packet receive failed

ns secondary err code: 12604

nt main err code: 0

nt secondary err code: 0

nt OS err code: 0

Client address: (ADDRESS=(PROTOCOL=tcp)(HOST=10.10.150.35)(PORT=52996))

Beware that an IP address can be forged.

If the time out occurs before the IP address can be retrieved by the database server, then enable listener tracing to determine the client that made the request.

See Also:

[Tracing Error Information for Oracle Net Services](file:///E:/files/book/concept/10g Release 2 (10.2) Oracle Database Documentation Library/B19306_01/network.102/b14212/troublestng.htm#i468763)

Restrict access to the client. For example, you can configure parameters for access rights in the sqlnet.ora file.

See Also:

["Configuring Database Access Control"](file:///E:/files/book/concept/10g Release 2 (10.2) Oracle Database Documentation Library/B19306_01/network.102/b14212/profile.htm#i485056)

【过程梳理】

上面是10G的官方文档内容,可以看到官方文档,给出了一个建议:Reconfigure the SQLNET.INBOUND_CONNECT_TIMEOUT parameter in sqlnet.ora to a larger value.

再找找官方文档对于SQLNET.INBOUND_CONNECT_TIMEOUT的说明,如下图:

Alert Log中“Fatal NI connect error 12170”错误

Alert Log中“Fatal NI connect error 12170”错误

​ 如上,这个问题的出现是由于数据库连接没能成功完成,在连接结果来看出现了延迟造成的。在官方文档中并没有找到明确的处理方式,只是一些配置建议。

​ 后续在网上搜了搜同类的错误处理方式(有资料说这个错误可以再MOS文档中找到),再结合官方文档,初步了解到对于ORA-12170错误,由于在Automatic Diagnostic Repository中的 Oracle Net diagnostic是开启状态,从而对连接延迟错误进行获取并且将其写入告警日志。

网友贴出来的MOS文档参考,如下:

To revert to Oracle Net Server tracing/logging, set following parameter in the server's sqlnet.ora :

DIAG_ADR_ENABLED = OFF

Also, to back out the ADR diag for the Listener component, set following parameter in the server's listener.ora:

DIAG_ADRENABLED = OFF

- Where the would be replaced with the actual name of the configured listener(s) in the listener.ora configuration file. For example, if the listener name is 'LISTENER', the parameter would read:

DIAG_ADR_ENABLED_LISTENER = OFF

-Reload or restart the TNS Listener for the parameter change to take effect.

网友贴出的Metalink上给出的解决方案

\1. set INBOUND_CONNECTTIMEOUT=0 in listener.ora

\2. set SQLNET.INBOUND_CONNECT_TIMEOUT = 0 in sqlnet.ora of server.

\3. stop and start both listener and database.

\4. Now try to connect to DB and observe the behaviour



Fatal NI Connect Error 12170, 'TNS-12535: TNS:operation timed out' Reported in 11g Alert Log (文档 ID 1286376.1)

In this Document

Symptoms
Changes
Cause
Solution
References

APPLIES TO:

Oracle Net Services - Version 11.1.0.6 to 12.1.0.2 [Release 11.1 to 12.1]
Information in this document applies to any platform.
TNS-12170, ORA-12170, TNS-12535, TNS-00505 alert.log

SYMPTOMS

Alert Log中“Fatal NI connect error 12170”错误

nt secondary err code: 110 Monitoring of the 11g database Alert log(s) may show frequent timeout related messages such as:

- On Oracle Solaris:


Fatal NI connect error 12170.

VERSION INFORMATION:
TNS for Solaris: Version 11.2.0.1.0 - Production
Oracle Bequeath NT Protocol Adapter for Solaris: Version 11.2.0.1.0 - Production
TCP/IP NT Protocol Adapter for Solaris: Version 11.2.0.1.0 - Production
Time: 22-JAN-2011 21:48:23
Tracing not turned on.
Tns error struct:
ns main err code: 12535

TNS-12535: TNS:operation timed out
ns secondary err code: 12560
nt main err code: 505

TNS-00505: Operation timed out
nt secondary err code: 145
nt OS err code: 0
Client address: (ADDRESS=(PROTOCOL=tcp)(HOST=10.xxx.yy.117)(PORT=1092))

---------
The "nt secondary err code" will be different based on the operating system.

Linux x86 or Linux x86-64: "nt secondary err code: 110"
HP-UX Server: "nt secondary err code: 238"
AIX: "nt secondary err code: 78"

CHANGES

No changes are necessary, but may have recently upgraded the database to 11g release 1 or higher, or installed a new Oracle11g database.

Note: Prior to 11gR1 these same 'Fatal NI connect error 12170' are written to the sqlnet.log

CAUSE

These time out related messages are mostly informational in nature. The messages indicate the specified client connection (identified by the 'Client address:' details) has experienced a time out. The 'nt secondary err code' identifies the underlying network transport, such as (TCP/IP) timeout limits after a client has abnormally terminated the database connection.

The 'nt secondary err code' translates to underlying network transport timeouts for the following Operating Systems:

For the Solaris system: nt secondary err code: 145:
ETIMEDOUT 145 / Connection timed out /

For the Linux operating system: nt secondary err code: 110
ETIMEDOUT 110 Connection timed out

For the HP-UX system: nt secondary err code: 238:
ETIMEDOUT 238 / Connection timed out /

For AIX: nt secondary err code: 78:
ETIMEDOUT 78 / Connection timed out /

For Windows based platforms: nt secondary err code: 60 (which translates to Winsock Error: 10060)

Description: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.

The reason the messages are written to the alert log is related to the use of the new 11g Automatic Diagnostic Repository (ADR) feature being enabled by default. See (Doc ID 454927.1).

SOLUTION

Suggested Actions:

- Search the corresponding text version of the listener log located on the database server for the corresponding client connection referenced by the Client address details referenced in the alert log message.

For the message incident below you would search the listener log for the 'Client address' string:

(ADDRESS=(PROTOCOL=tcp)(HOST=10.xxx.yy.117)(PORT=1092))

The search of the listener log should find the most recent connection before the time reference displayed in the alert log message, e.g. '22-JAN-2011 21:48:23'.

-Corresponding listener log entry:

22-JAN-2011 21:20:12 (CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=AMN11264.us.oracle.com)(CID=(PROGRAM=D:\app\mcassady\product\11.2.0\dbhome_1\bin\sqlplus.exe)(HOST=mcassady-lap)(USER=mca
ssady)))
(ADDRESS=(PROTOCOL=tcp)(HOST=10.xxx.yy.117)(PORT=1092)) establish AMN11264.us.oracle.com * 0

- Alert log entry:
------------

Fatal NI connect error 12170.

VERSION INFORMATION:
TNS for Solaris: Version 11.2.0.1.0 - Production
Oracle Bequeath NT Protocol Adapter for Solaris: Version 11.2.0.1.0 - Production
TCP/IP NT Protocol Adapter for Solaris: Version 11.2.0.1.0 - Production
Time: 22-JAN-2011 21:48:23
Tracing not turned on.
Tns error struct:
ns main err code: 12535

TNS-12535: TNS:operation timed out
ns secondary err code: 12560
nt main err code: 505

TNS-00505: Operation timed out
nt secondary err code: 145
nt OS err code: 0
Client address: (ADDRESS=(PROTOCOL=tcp)(HOST=10.xxx.yy.117)(PORT=1092))

------------

Note the time of the client corresponding client connection(s) in the listener log. Here you may find a particular client, set of clients or particular applications that are improperly disconnecting causing the timeout errors to be raised and recorder in the database alert log.

See the following for more information and a potential solution where a firewall may be causing this issue: Note:1628949.1 Alert Log Errors: 12170 TNS-12535/TNS-00505: Operation Timed Out

You may choose to revert from the new Automatic Diagnostic Repository (ADR) method to prevent the Oracle Net diagnostic details from being written to the alert log(s) by setting the following Oracle Net configuration parameters:

To revert to Oracle Net Server tracing/logging, set following parameter in the server's sqlnet.ora :

DIAG_ADR_ENABLED = OFF

Also, to back out the ADR diag for the Listener component, set following parameter in the server's listener.ora:

DIAG_ADRENABLED = OFF

- Where the would be replaced with the actual name of the configured listener(s) in the listener.ora configuration file. For example, if the listener name is 'LISTENER', the parameter would read:

DIAG_ADR_ENABLED_LISTENER = OFF

-Reload or restart the TNS Listener for the parameter change to take effect.

REFERENCES

NOTE:151972.1 - Dead Connection Detection (DCD) Explained
NOTE:454927.1 - Using and Disabling the Automatic Diagnostic Repository (ADR) with Oracle Net for 11g
NOTE:1628949.1 - Alert Log Errors: 12170 TNS-12535/TNS-00505: Operation Timed Out



Alert Log Errors: 12170 TNS-12535/TNS-00505: Operation Timed Out (文档 ID 1628949.1)

In this Document

Symptoms
Changes
Cause
Solution
References

APPLIES TO:

Oracle Net Services - Version 11.2.0.3 to 12.1.0.2 [Release 11.2 to 12.1]
Oracle Database - Standard Edition - Version 11.2.0.3 to 11.2.0.3 [Release 11.2]
Information in this document applies to any platform.

SYMPTOMS

The following error is reported in the database alert log.

*Note the "Client address**" is posted within the error stack in this case.

Fatal NI connect error 12170.

VERSION INFORMATION:
TNS for 64-bit Windows: Version 11.2.0.3.0 - Production
Oracle Bequeath NT Protocol Adapter for 64-bit Windows: Version 11.2.0.3.0 - Production
Windows NT TCP/IP NT Protocol Adapter for 64-bit Windows: Version 11.2.0.3.0 - Production
Time: 22-FEB-2014 12:45:09
Tracing not turned on.
Tns error struct:
ns main err code: 12535

TNS-12535: TNS:operation timed out
ns secondary err code: 12560
nt main err code: 505

TNS-00505: Operation timed out
nt secondary err code: 60
nt OS err code: 0
*\Client address: (ADDRESS=(PROTOCOL=tcp)(HOST=121.23.142.141)(PORT=45679))**

The PORT field here is the ephemeral port assigned to the client for this connection.
This does not correspond to the listener port.

CHANGES

No changes are necessary, but may have recently upgraded the database to 11g release 1 or higher, or installed a new Oracle11g database and they are now visible in the alert log.
Note: Prior to 11gR1 these same 'Fatal NI connect error 12170' are written to the sqlnet.log. This document describes a problem that arises when a firewall exists between the client and the database server.

CAUSE

We can search the listener log covering the same time period using this search criteria.

(HOST=121.23.142.141)(PORT=45679)

The 11g listener log in text format is located here:

$ORACLE_BASE/diag/tnslsnr///trace/.log

Again, this is the client's IP address and the unique ephemeral port assigned to the client for this connection.

In this case, we find that this connection was established at the listener at this timestamp:

22-FEB-2014 10:42:10 (CONNECT_DATA=(SID=test)(CID=(PROGRAM=)(HOST=jdbc)(USER=))) (ADDRESS=(PROTOCOL=tcp)(HOST=121.23.142.141)(PORT=45679)) establish test* 0 .

Compare this to the event in the alert.log with special attention to the timestamp.
The connection was dropped by the instance at 22-FEB-2014 12:45:09 or roughly 2 hours later.

Time: 22-FEB-2014 12:45:09
Tracing not turned on.
Tns error struct:
ns main err code: 12535

TNS-12535: TNS:operation timed out
ns secondary err code: 12560
nt main err code: 505

TNS-00505: Operation timed out
nt secondary err code: 60
nt OS err code: 0
*\Client address: (ADDRESS=(PROTOCOL=tcp)(HOST=121.23.142.141)(PORT=45679))**

The 'nt secondary err code' identifies the underlying network transport, such as (TCP/IP) timeout limit. In the current case 60 identifies Windows underlying transport layer.

The "nt secondary err code" will be different based on the operating system:

Linux x86 or Linux x86-64: "nt secondary err code: 110"
HP-UX : "nt secondary err code: 238"
AIX: "nt secondary err code: 78"
Solaris: "nt secondary err code: 145"

The alert.log message indicates that a connection was terminated AFTER it was established to the instance. In this case, it was terminated 2 hours and 3 minutes after the listener handed the connection to the database.

This would indicate an issue with a firewall where a maximum idle time setting is in place.

The connection would not necessarily be "idle". This issue can arise during a long running query or when using JDBC Thin connection pooling. If there is no data 'on the wire' for lengthy

periods of time for any reason, the firewall might terminate the connection.

SOLUTION

The non-Oracle solution would be to remove or increase the firewall setting for maximum idle time. In cases where this is not feasible, Oracle offers the following suggestion:

The following parameter, set at the **RDBMS_HOME/network/admin/sqlnet.ora, can resolve this kind of problem. DCD or SQLNET.EXPIRE_TIME can mimic data transmission between the server and the client during long periods of idle time.

SQLNET.EXPIRE_TIME=n Where is a non-zero value set in minutes.

Once this change is in place, there is NO need to restart the listener or the database. The change will be in place for all newly spawned server processes following the change.

Be aware that connections that were established prior to this setting would not be affected by this change. Therefore, you may continue to experience some timeouts until all remote connection are established with this setting in place.

See the following : Note 257650.1 Resolving Problems with Connection Idle Timeout With Firewall

\In an installation that includes GRID, this parameter should be set in the RDBMS_HOME/network/admin/sqlnet.ora file. This would be the default location for sqlnet.ora file parameters referenced by the instance.

Please consider your business requirement for allowing connections to remain or appear 'idle' before implementing these suggestions and note that this is a workaround which, on some occasions, may not overpass all the network timeouts.

REFERENCES

NOTE:257650.1 - Resolving Problems with Connection Idle Timeout With Firewall
NOTE:1286376.1 - Fatal NI Connect Error 12170, 'TNS-12535: TNS:operation timed out' Reported in 11g Alert Log



Dead Connection Detection (DCD) Explained (文档 ID 151972.1)



Resolving Problems with Connection Idle Timeout With Firewall (文档 ID 257650.1)

In this Document

Purpose
Scope
Details
An Overview
Blackout
Resolving problems with connection idle time-out

APPLIES TO:

Oracle Net Services - Version 9.2.0.2 to 12.1.0.1 [Release 9.2 to 12.1]
Oracle Net Services - Version 12.1.0.2 to 12.1.0.2 [Release 12.1]
Information in this document applies to any platform.
Checked for relevance on 29-SEPT-2015

PURPOSE

This article describes about the connection idle time-out issues that occur while a firewall or load balancer is used to monitor and control TCP traffic between clients (such as application servers / iAS components) and Oracle Databases.

SCOPE

This article is primarily intended for Application/Database Administrators and Network Administrators who would like to understand and try to workaround connection "IDLE TIMEOUT" issues in a firewall or F5 load balancer environment.

DETAILS

An Overview

Firewalls (FW) has become common in today's networking to protect the network environment. The firewall recognizes the TCP protocol and it records the client server socket end-points. Also, FW recognize the TCP connection closure, and then will release the resources allocated for recording the opening connection. For every end-point pairs , the firewall must also allocate some resources(may be small).

When the client or server closes the communication it sends TCP FIN type packet, this is a normal socket closure. However, it is not uncommon that the client server communication abruptly ending without closing the end points properly by sending FIN packet, for example, when the client or server crashed, power down or a network error which prevents sending the closure packet to the other end. In that cases, the firewall will not know that the end-points will no longer use the opened channel. As a passive intermediary, it had no way to determine if the endpoints are still active. As is it not possible to maintain resources forever, and also, it is a security threat keeping a port open for undefined time. So, firewall imposes a BLACKOUT on those connections that stay idle for a predefined amount of time.

Initially FW were designed to protect the application servers, network and then to protect client/server connection. With these in mind, a time-out in terms of hours (1 hour is the default for most FW) is reasonable. With the advent of more complex security schemes, FW are not only between client and server, but also between different application servers ( intranet, demilitarized zone (DMZ) , and such) and database servers. So, the horizon of 1 hour idle time for communication between servers maybe not be appropriate.

Idle connections can be expected from an application server. There is the case of J2EE using pooled JDBC connections. The pool usually returns the first available connection to the requester, so the first connections of the pool list are the most likely to be active. The last one, which are at the end of the list, are only used at peak loads, and most of the time it will be inactive.

Other cases are the connections established from a HTTP Server, either SQL connections from mod_plsql, or AJP connections from mod_oc4j.

Blackout

One of the inconvenience of theses blackout, is that they are passive. None of the endpoints will be notified that the communication was banned . Only when the client or server tries to contact its peer, it comes to know that the peer end is no more active and the communication has already been broken.

The worst of all scenarios are the so called "passive listeners" . They will never know. Because, passive listeners are those processes at an endpoint that are simply waiting for commands to arrive from the other end. A typical example of this are the backend database server processes, which are reading from the socket looking new SQL statements to execute , and after the request is answered, they return to their passive state. When a blackout occurs, they will stay forever in this reading state, unless some of the following techniques are applied.

Resolving problems with connection idle time-out

TCP KeepAlive

Ensure TCP KeepAlive is set appropriately for your environment. Refer to your OS documentation for details.

DCD for DataBase Servers

For database connections, one of the endpoints is a passive listener, either is a dedicated process or a dispatcher process. If the connection becomes blacked out , this backend will never know that client cannot send any more requests, and then will lock important resources as database sessions, locks , and at least , a file descriptor used for maintaining the socket.

A solution is to make this backend "not so" passive, using the DCD (dead connection detection) to figure out if the communication is still possible.

Simply, set in the $ORACLE_HOME/network/admin/sqlnet.ora, in the server side SQLNET.EXPIRE_TIME=10 (10 minutes, for example). With this parameter in place, after 10 minutes of inactivity, the server send a small 10 bytes probe packet to the client. If this packet is not acknowledge, the connection will be closed and the associated resources will be released.

There are two benefits with this DCD
1. If the SQLNET.EXPIRE_TIME is less than the FW connection idle time-out, then the firewall may consider this packet as activity, and the idle time-out (firewall blackout) will never happen until both the client and the server processes are alive.

2. If the SQLNET.EXPIRE_TIME (let's say a little bit higher) than the FW idle limit, then , as soon as the blackout happens , the RDBMS will know and will close the connection.

The first case is recommended when the connection comes from another application server , and the second makes sense for client applications.

DCD works at the application level and also works on top of TCP/IP protocol. If you have set the SQLNET.EXPIRE_TIME=10 then do not expect that the connections will be flagged as dead exactly after 10 minutes of the blackout or network outage. Please seeNote:151972.1 "Dead Connection Detection (DCD) Explained" for details on DCD. The TCP timeout and TCP retransmission values also adds to this time.

PLEASE NOTE:
DCD was never designed to be used as a "virtual traffic generator" as we are wanting to use it for here. This is merely a useful side-effect of the feature.
In fact, some later firewalls and updated firewall firmware may not see DCD packets as a valid traffic possibly because the packets that DCD sends are actually empty packets. Therefore, DCD may not work as expected and the firewall / switch may still terminate TCP sockets that are idle for the period monitored, even when DCD is enabled and working.
In such cases, the firewall timeout should be increased or users should not leave the application idle for longer than the idle time out configured on the firewall.


AJP Connections

It is not a default behavior in 9.0.2, but if Patch 2862660 is installed, the connection between and OHS server process and the J2EE can be maintained for more than a single request. If the parameter Oc4jConnTimeout is set, the OHS will maintain the connection for at least that time. The problem is that the child process may became inactive before that time-out occurs, and then the connection will remain open. While the child process is inactive, the connection will be idle, and there is chance to be blackout by the FW.

If this happens, the first thing that the child will do is to close it when it becomes active. But at this time, the TCP socket closing cannot be completed, due the blackout. Although the http child process can simply ignore the closing failure and continue the creation of a new connection, the passive listener at the
j2ee side (the worker thread) will be hook without a chance for the resources to be released.

To solve this , the Patch 3151686 must be installed and the java-option
-Dajp.keepalive=true
must be enabled.

After this, the blackout detection will rely on the TCP KeepAlive provided by the operating system.

As DCD , this process consist in send probes -empty packages- when a socket had been inactive for a period of time. If there is no response, the socket will be closed, and then, even the passive listener, will receive and exception or signal to let him know that the no further communication is possible.

CONCLUSION

As the firewalls extend their functionality , and are now are placed in between application servers, some tuning and parameter adjusting must be made to overcome the default rules established for client/server communications. However, remember that a firewall idle timeout setting is simply the way this product functions and any changes to it should be made with agreement between all parties involved.

标签:

头像

小麦苗

学习或考证,均可联系麦老师,请加微信db_bao或QQ646634621

您可能还喜欢...

发表回复

嘿,我是小麦,需要帮助随时找我哦
  • 18509239930
  • 个人微信

  • 麦老师QQ聊天
  • 个人邮箱
  • 点击加入QQ群
  • 个人微店

  • 回到顶部
返回顶部