Archive for the ‘Networking’ Category

RATE_LIMIT and Oracle RAC 11gR2?

Monday, November 4th, 2013

Oracle SQLNet/TNS connection rate limiting is an awesome way of increasing the stability of Oracle databases: it lets you control the insane behavior of starting-up Application Servers or Middleware components that push hundreds of connections at the databases, killing the CPU on the database server side (and impacting other critical work). The way it works is also very important: it does not simply refuse connections, it accept()s the socket but does not start processing it (the work is just queued). For more detail you can look at the official Oracle whitepaper, especially www.oracle.com/technetwork/database/enterprise-edition/oraclenetservices-connectionratelim-133050.pdf.
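For reference, enabling the feature boils down to a couple of listener.ora parameters; below is a minimal sketch along the lines of the whitepaper (the hostname and the limit of 10 new connections per second are made-up placeholders):

# listener.ora: global rate applied to all rate-limited endpoints of listener LISTENER
CONNECTION_RATE_LISTENER=10
LISTENER=
  (ADDRESS_LIST=
    (ADDRESS=(PROTOCOL=tcp)(HOST=dbhost01)(PORT=1521)(RATE_LIMIT=yes))
  )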

The way it works is important, because rejecting the TCP connections (via TCP FIN or TCP RST) would cause applications to get "Unable to get connection" type errors, which in most cases is not something you want, for various reasons. If this feature doesn't work, you won't be able to re-implement it in a different way, say using Linux's NetFilter, because there you can rate limit TCP connections primarily only by dropping them.
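Just to illustrate the difference, a NetFilter-based substitute has to drop the excess SYNs, so clients get timeouts or connection errors instead of being quietly queued; a rough sketch (the port and the limits are arbitrary examples):

# let through at most ~10 new connections per second to the listener port...
iptables -A INPUT -p tcp --dport 1521 --syn -m limit --limit 10/second --limit-burst 20 -j ACCEPT
# ...and drop everything above that rate (clients see retransmits/timeouts, not a graceful queue)
iptables -A INPUT -p tcp --dport 1521 --syn -j DROP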

You may be surprised that, despite what you read in the official Oracle documentation (http://docs.oracle.com/cd/E11882_01/network.112/e10835/listener.htm#NETRF008), rate limiting Oracle SQLNet/TNS connections is NOT supported on RAC installations. The documentation itself is silent on this topic, but it has been confirmed via an Oracle SR with support. The probable primary reason is that in 11gR2 CRS manages the normal and SCAN listeners, so you have no way of altering the Oracle-managed listener.ora and endpoints_listener.ora files, because they are overwritten by… and yet the documentation still asks you to edit them. Clearly a conflict here.

On RAC/CRS the listeners are supposed to be altered only via the "srvctl" command; srvctl is actually just a frontend, because since 11gR2 the oraagent.bin daemon takes care of monitoring and housekeeping the listeners. The problem is that srvctl does NOT provide a way to enable RATE_LIMIT, and there is also no option for altering more advanced parameters like QUEUESIZE (fortunately, on Linux 2.6.x the backlog depth for listen() seems to default to SOMAXCONN=/proc/sys/net/core/somaxconn=128), SDU, etc. (these are mentioned in Bug 11782958 "SRVCTL DOES NOT SUPPORT LISTENER CONFIGURATION LIKE MODIFYING QUEUESIZE, SEND/RE" and in Doc ID 1292915.1 "Setting Parameters for Scan and Node Listeners on RAC, Queuesize, SDU, Ports, etc").
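For reference, the kernel-side backlog cap mentioned above can be checked, and raised if needed, from the shell; the value 1024 below is just an example:

# current cap applied to listen() backlogs
cat /proc/sys/net/core/somaxconn
# raise it (only affects listen() calls made after the change)
sysctl -w net.core.somaxconn=1024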

Sample srvctl options for 11gR2:

[oracle@racnode1 ~]$ aso srvctl config listener -a
Name: LISTENER
Network: 1, Owner: oracle
Home: <CRS home>
  /u01/app/11.2.0/grid11203 on node(s) racnode1, racnode2
End points: TCP:1521
[oracle@racnode1 ~]$ aso srvctl modify listener -h

Modifies the configuration for the listener.

Usage: srvctl modify listener [-l <lsnr_name>] [-o <oracle_home>] [-p "[TCP:]<port>[, ...][/IPC:<key>][/NMP:<pipe_name>][/TCPS:<s_port>] [/SDP:<port>]"] [-u <oracle_user>] [-k <net_num>]
    -l <lsnr_name>           Listener name (default name is LISTENER)
    -o <oracle_home>         ORACLE_HOME path
    -p "[TCP:]<port>[, ...][/IPC:<key>][/NMP:<pipe_name>][/TCPS:<s_port>] [/SDP:<port>]"       Comma separated tcp ports or listener endpoints
    -u <oracle_user>         Oracle user
    -k <net_num>             network number (default number is 1)
    -h                       Print usage
[oracle@racnode1 ~]$

Additionally, Doc ID 1568591.1 "11gR2 Listener With RATE_LIMIT Set: Slow Connect Time and Tnsping Response Time High with TNS-1158 Error" mentions BUG:16409926 "LISTENER MEMORY LEAK IF RATE_LIMIT IS USED AND ENFORCED FREQUENTLY" (the fix is still not present in the 11.2.0.2.x and 11.2.0.3.x PSUs, but it is fixed in 12.1.0.1 and included from the start in 11.2.0.4.0). Overall this feature doesn't seem to be widely used and/or tested, and it doesn't inspire much confidence…

Raising Oracle VM’s maximal number of interfaces in domU

Saturday, August 2nd, 2008

Just edit /boot/grub/menu.lst and append "netloop.nloopbacks=X" to the dom0 kernel line (the "module /vmlinuz-…" line). Sample file after modification:

title Oracle VM Server vnull02
root (hd0,0)
kernel /xen.gz console=ttyS0,57600n8 console=tty dom0_mem=512M
module /vmlinuz-2.6.18-vnull02_8.1.6.0.18.el5xen ro root=/dev/md0 netloop.nloopbacks=8
module /initrd-2.6.18-vnull02_8.1.6.0.18.el5xen.img
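
After a reboot you can sanity-check that the option was picked up; this assumes the parameter shows up on the dom0 kernel command line and that the netloop driver uses the usual vethN naming:

# verify the option made it onto the dom0 kernel command line
grep -o 'netloop.nloopbacks=[0-9]*' /proc/cmdline
# count the loopback interfaces created by the netloop driver
ip -o link | grep -c veth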

OracleVM (XEN) network performance

Monday, March 31st, 2008

In OracleVM (Oracle's virtualization product for x86 and x86_64, based on the open-source XEN hypervisor) one can pin individual Virtual Machines (later called just VMs) to dedicated CPU cores. This can potentially give a great win if the XEN (dom0) scheduler doesn't have to switch VMs between CPUs or cores. You can also modify the default MTU (1500) for VMs, but more about that later.
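Pinning can be done at runtime with xm vcpu-pin (or persistently via the cpus setting in the VM's configuration file); a minimal sketch using the domain names from the listings below:

# in dom0: pin VCPU 0 of each NFS domU to its own physical core (runtime change)
xm vcpu-pin 18_nfs1 0 0
xm vcpu-pin 21_nfs2 0 1
# persistent alternative: in the VM's vm.cfg set, for example
# cpus = "0"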

I’ve performed some tests (on PC: QuadCore Q6600 4×2.4GHz + 8GB RAM, 1GB RAM per nfsX VM, 2GB RAM per vmracX VM, 3 SATA2 10kRPM disks in RAID0), here are the results (OracleVM 2.1 with Oracle Enterprise Linux 5):

  • using defaults (without VCPU pinning, dynamic VirtualCPU selection by XEN scheduler)
    [root@nfs2 ~]# ./iperf -c 10.98.1.101 -i 1 -u -b 2048M
    ------------------------------------------------------------
    Client connecting to 10.98.1.101, UDP port 5001
    Sending 1470 byte datagrams
    UDP buffer size: 256 KByte (default)
    ------------------------------------------------------------
    [ 3] local 10.98.1.102 port 1030 connected with 10.98.1.101 port 5001
    [ 3] 0.0- 1.0 sec 209 MBytes 1.75 Gbits/sec
    [ 3] 1.0- 2.0 sec 206 MBytes 1.73 Gbits/sec
    [ 3] 2.0- 3.0 sec 206 MBytes 1.73 Gbits/sec
    [ 3] 3.0- 4.0 sec 216 MBytes 1.82 Gbits/sec
    [ 3] 4.0- 5.0 sec 231 MBytes 1.93 Gbits/sec
    [ 3] 5.0- 6.0 sec 230 MBytes 1.93 Gbits/sec
    [ 3] 6.0- 7.0 sec 228 MBytes 1.91 Gbits/sec
    [ 3] 7.0- 8.0 sec 231 MBytes 1.94 Gbits/sec
    [ 3] 8.0- 9.0 sec 230 MBytes 1.93 Gbits/sec
    [ 3] 9.0-10.0 sec 222 MBytes 1.86 Gbits/sec
    [ 3] 0.0-10.0 sec 2.16 GBytes 1.85 Gbits/sec
    [ 3] Sent 1576401 datagrams
    [ 3] Server Report:
    [ 3] 0.0-10.0 sec 1.94 GBytes 1.66 Gbits/sec 0.026 ms 160868/1576400 (10%)
    [ 3] 0.0-10.0 sec 1 datagrams received out-of-order
    [root@nfs2 ~]#
  • after pinning:

    [root@quad OVS]# xm vcpu-list
    Name ID VCPU CPU State Time(s) CPU Affinity
    18_nfs1 4 0 0 -b- 220.5 0
    21_nfs2 7 0 1 -b- 264.1 1
    24_vmrac1 8 0 2 -b- 4.7 any cpu
    24_vmrac1 8 1 2 -b- 5.9 any cpu
    Domain-0 0 0 1 -b- 1242.9 any cpu
    Domain-0 0 1 0 -b- 224.2 any cpu
    Domain-0 0 2 2 r-- 71.8 any cpu
    Domain-0 0 3 3 -b- 60.2 any cpu

    Notice that 18_nfs1 and 21_nfs2 are pinned to different cores. At first glance you would expect this to give better performance, but…
    [root@nfs2 ~]# ./iperf -c 10.98.1.101 -i 1 -u -b 2048M
    ------------------------------------------------------------
    Client connecting to 10.98.1.101, UDP port 5001
    Sending 1470 byte datagrams
    UDP buffer size: 256 KByte (default)
    ------------------------------------------------------------
    [ 3] local 10.98.1.102 port 1030 connected with 10.98.1.101 port 5001
    [ 3] 0.0- 1.0 sec 105 MBytes 883 Mbits/sec
    [ 3] 1.0- 2.0 sec 107 MBytes 894 Mbits/sec
    [ 3] 2.0- 3.0 sec 108 MBytes 908 Mbits/sec
    [ 3] 3.0- 4.0 sec 118 MBytes 988 Mbits/sec
    [ 3] 4.0- 5.0 sec 130 MBytes 1.09 Gbits/sec
    [ 3] 5.0- 6.0 sec 112 MBytes 937 Mbits/sec
    [ 3] 6.0- 7.0 sec 110 MBytes 922 Mbits/sec
    [ 3] 7.0- 8.0 sec 111 MBytes 928 Mbits/sec
    [ 3] 8.0- 9.0 sec 121 MBytes 1.01 Gbits/sec
    [ 3] 9.0-10.0 sec 121 MBytes 1.02 Gbits/sec
    [ 3] 0.0-10.0 sec 1.12 GBytes 958 Mbits/sec
    [ 3] Sent 814834 datagrams
    [ 3] Server Report:
    [ 3] 0.0-10.0 sec 1.11 GBytes 957 Mbits/sec 0.004 ms 1166/814833 (0.14%)
    [ 3] 0.0-10.0 sec 1 datagrams received out-of-order

    As you can see there is no performance win in such a scenario (throughput actually dropped from ~1.85 Gbits/sec to ~0.96 Gbits/sec); the XEN scheduler knows better how to utilise the hardware.
  • The last test is the worst scenario that can happen under XEN: overloaded hardware. Pinning both nfs systems to one core (0) gives the following results:
    [root@quad OVS]# xm vcpu-list
    Name ID VCPU CPU State Time(s) CPU Affinity
    18_nfs1 4 0 0 -b- 226.1 0
    21_nfs2 7 0 0 -b- 268.7 0
    [..]

    again:

    [root@nfs2 ~]# ./iperf -c 10.98.1.101 -i 1 -u -b 2048M
    ------------------------------------------------------------
    Client connecting to 10.98.1.101, UDP port 5001
    Sending 1470 byte datagrams
    UDP buffer size: 256 KByte (default)
    ------------------------------------------------------------
    [ 3] local 10.98.1.102 port 1030 connected with 10.98.1.101 port 5001
    [ 3] 0.0- 1.0 sec 73.3 MBytes 615 Mbits/sec
    [ 3] 1.0- 2.0 sec 68.3 MBytes 573 Mbits/sec
    [ 3] 2.0- 3.0 sec 68.3 MBytes 573 Mbits/sec
    [ 3] 3.0- 4.0 sec 68.3 MBytes 573 Mbits/sec
    [ 3] 4.0- 5.0 sec 68.1 MBytes 572 Mbits/sec
    [ 3] 5.0- 6.0 sec 68.6 MBytes 575 Mbits/sec
    [ 3] 6.0- 7.0 sec 69.0 MBytes 579 Mbits/sec
    [ 3] 7.0- 8.0 sec 68.9 MBytes 578 Mbits/sec
    [ 3] 8.0- 9.0 sec 68.9 MBytes 578 Mbits/sec
    [ 3] 9.0-10.0 sec 66.6 MBytes 559 Mbits/sec
    [ 3] 0.0-10.0 sec 688 MBytes 577 Mbits/sec
    [ 3] Sent 490928 datagrams
    [ 3] Server Report:
    [ 3] 0.0-10.0 sec 680 MBytes 570 Mbits/sec 0.019 ms 6064/490927 (1.2%)
    [ 3] 0.0-10.0 sec 1 datagrams received out-of-order

WARNING: EXPERIMENTAL AND NOT VERY WELL TESTED (USE AT YOUR OWN RISK!):
MTU stands for Maximal Transmission Unit in network terminology. The bigger the MTU, the less overhead from the TCP/IP stack, so it can give great results by decreasing CPU utilisation for network-intensive operations between VMs (in XEN, packets between VMs traverse like this: domU_1 -> dom0(bridge) -> domU_2). Before altering the MTU for Virtual Machines you should be familiar with the way bridged interfaces work in XEN (there is a very good article on the web explaining that architecture). Before you can change the MTU of the bridge (sanbr0 in my case) you must change the MTU of each vifX.Y interface in XEN dom0 by running: ip link set dev vifX.Y mtu 9000. The list of those interfaces can be found by running: brctl show. Next you have to set the MTU of the bridge itself (in dom0): ip link set dev sanbr0 mtu 9000. Now you can use a larger MTU inside the VMs. The test was performed on the same Quad box mentioned earlier, but now from the vmrac2 VM node to one nfs VM node (yes, this vmrac2 node is running Oracle RAC on NFS, but it is idle; no transactions were performed during this test). First a short recap of the commands, then the test itself:
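A consolidated sketch of the sequence described above (the vifX.Y numbers are placeholders; check the brctl show output for the real ones):

# dom0: see which vifX.Y interfaces hang off the bridge
brctl show
# dom0: raise the MTU on every vif attached to the bridge
ip link set dev vif7.0 mtu 9000
ip link set dev vif8.0 mtu 9000
# dom0: raise the MTU on the bridge itself
ip link set dev sanbr0 mtu 9000
# domU: finally raise the MTU of the guest NIC (eth2 in this setup)
ip link set dev eth2 mtu 9000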

[root@vmrac2 ~]# cd /u03
[root@vmrac2 u03]# mkdir temp
[root@vmrac2 u03]# cd temp/
# used NFS mount options
[root@vmrac2 temp]# mount | grep /u03
10.98.1.102:/data on /u03 type nfs (rw,bg,hard,nointr,tcp,nfsvers=3,timeo=300,rsize=32768,wsize=32768,actimeo=0,addr=10.98.1.102)
[root@vmrac2 temp]# ip link ls dev eth2
5: eth2: mtu 1500 qdisc pfifo_fast qlen 1000
link/ether 00:16:3e:6c:e7:67 brd ff:ff:ff:ff:ff:ff
[root@vmrac2 temp]# dd if=/dev/zero of=test1 bs=1M count=200
200+0 records in
200+0 records out
209715200 bytes (210 MB) copied, 14.0485 seconds, 14.9 MB/s
# now we change MTU
[root@vmrac2 temp]# ip link set dev eth2 mtu 9000
[root@vmrac2 temp]# rm -f test1
[root@vmrac2 temp]# dd if=/dev/zero of=test2 bs=1M count=200
200+0 records in
200+0 records out
209715200 bytes (210 MB) copied, 2.28668 seconds, 91.7 MB/s
[root@vmrac2 temp]# rm test2
rm: remove regular file `test2'? y
# let's test again to be sure
[root@vmrac2 temp]# dd if=/dev/zero of=test3 bs=1M count=200
200+0 records in
200+0 records out
209715200 bytes (210 MB) copied, 2.14852 seconds, 97.6 MB/s
[root@vmrac2 temp]# rm test3
rm: remove regular file `test3'? y
# switch back to MTU=1500 to exclude other factors
[root@vmrac2 temp]# ip link set dev eth2 mtu 1500
[root@vmrac2 temp]# dd if=/dev/zero of=test4 bs=1M count=200
200+0 records in
200+0 records out
209715200 bytes (210 MB) copied, 10.3054 seconds, 20.4 MB/s
# and again to MTU=9000
[root@vmrac2 temp]# ip link set dev eth2 mtu 9000
[root@vmrac2 temp]# dd if=/dev/zero of=test4 bs=1M count=200
[root@vmrac2 temp]# rm test4
rm: remove regular file `test4'? y
[root@vmrac2 temp]# dd if=/dev/zero of=test5 bs=1M count=200
200+0 records in
200+0 records out
209715200 bytes (210 MB) copied, 2.37787 seconds, 88.2 MB/s
[root@vmrac2 temp]#

As you can see, we've increased sequential NFS write performance from about ~20 MB/s to ~90 MB/s, with both the NFS server and the NFS client running in Oracle VM, just by switching to a larger MTU (I'll try switching the MTU to 16k or even 32k to match the NFS rsize/wsize).

One more note: this is experimental, so don't try this on your OracleVM/XEN installations as it may be unsupported. I'm still experimenting with it, but I hope this trick won't break anything ;)

p.s.#1 Simple iperf TCP bandwidth test on LAN with MTU=9000 (with 1500 it was ~1.9Gbps, as you could read earlier)
[root@nfs2 ~]# /root/iperf -c 10.98.1.101
------------------------------------------------------------
Client connecting to 10.98.1.101, TCP port 5001
TCP window size: 73.8 KByte (default)
------------------------------------------------------------
[ 3] local 10.98.1.102 port 37660 connected with 10.98.1.101 port 5001
[ 3] 0.0-10.0 sec 7.30 GBytes 6.27 Gbits/sec

p.s.#2 Yes, Oracle RAC 11g works on Oracle VM on NFS3 :)