Using Jumbo Frames with Oracle RAC

First, let's look at what Jumbo Frames actually are.


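As we know, in the TCP/IP protocol suite the unit of communication at the Ethernet data link layer is the frame. The maximum size of a standard frame is 1,518 bytes: 14 bytes for the frame header, 4 bytes for the CRC checksum, and up to 1,500 bytes of payload, which is the MTU (Maximum Transmission Unit) of a traditional 10 Mb network card (see the example). After subtracting the 40 bytes of TCP/IP headers, 1,460 bytes remain for actual data. Later 100 Mb and 1000 Mb cards kept the 1,500-byte MTU for compatibility. For a 1000 Mb card, however, this means more interrupts and more processing time, so gigabit cards use "Jumbo Frames" to extend the frame to 9,000 bytes. Why 9,000 bytes and not more? Because the 32-bit CRC checksum loses its effectiveness for frames larger than about 12,000 bytes, and 9,000 bytes is already enough for applications that work in 8 KB units, such as NFS.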
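The byte counts above can be checked with a little arithmetic. The following is a minimal Python sketch, assuming IPv4 and TCP headers of 20 bytes each with no options and no VLAN tag; the function names are made up for illustration, and the preamble and inter-frame gap are ignored.

```python
# Header sizes in bytes (assumptions: no VLAN tag, no IP or TCP options).
ETH_HEADER = 14
ETH_FCS = 4        # 32-bit CRC trailer
IP_HEADER = 20
TCP_HEADER = 20


def frame_size(mtu: int) -> int:
    """Size of a full Ethernet frame carrying an MTU-sized payload."""
    return ETH_HEADER + mtu + ETH_FCS


def tcp_payload(mtu: int) -> int:
    """Application bytes that fit in one frame after the IP and TCP headers."""
    return mtu - IP_HEADER - TCP_HEADER


def header_overhead(mtu: int) -> float:
    """Fraction of a full frame that is headers and trailer rather than data."""
    return 1 - tcp_payload(mtu) / frame_size(mtu)


for mtu in (1500, 9000):
    print(mtu, frame_size(mtu), tcp_payload(mtu), f"{header_overhead(mtu):.2%}")
```

For an MTU of 1500 this gives the 1,518-byte frame and 1,460-byte payload quoted above; the per-frame header share shrinks as the MTU grows.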

For example:

[root@vm1 ~]# ifconfig
eth0     Link encap:Ethernet HWaddr 08:00:27:37:9C:D0
         inet addr:192.168.0.103 Bcast:192.168.0.255 Mask:255.255.255.0
         inet6 addr: fe80::a00:27ff:fe37:9cd0/64 Scope:Link
         UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
         RX packets:9093 errors:0 dropped:0 overruns:0 frame:0
         TX packets:10011 errors:0 dropped:0 overruns:0 carrier:0
         collisions:0 txqueuelen:1000
         RX bytes:749067 (731.5 KiB) TX bytes:4042337 (3.8 MiB)

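Suppose that, on a path configured with an MTU of about 1,500 bytes (1.5 KB) as in the example above, an 8 KB data block is sent from one node to another. Six packets are then needed to carry it: the 8 KB buffer is split into six IP packets and sent to the receiving side, where the six IP packets are received and reassembled into the 8 KB buffer. The reassembled buffer is finally handed to the application for further processing.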
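The count of six packets follows from the fragment size. Here is a minimal Python sketch, assuming the block travels as a single UDP datagram over IPv4 without IP options and ignoring any extra protocol headers Oracle adds to the block; `ip_fragments` is a hypothetical helper name.

```python
import math

IP_HEADER = 20    # bytes, assuming IPv4 without options
UDP_HEADER = 8    # bytes


def ip_fragments(block_size: int, mtu: int) -> int:
    """Number of IPv4 packets needed to carry one UDP datagram of block_size bytes."""
    datagram = block_size + UDP_HEADER
    if datagram <= mtu - IP_HEADER:
        return 1  # fits in a single packet, no fragmentation
    # Every fragment except the last must carry a multiple of 8 bytes.
    per_fragment = (mtu - IP_HEADER) // 8 * 8
    return math.ceil(datagram / per_fragment)


print(ip_fragments(8192, 1500))  # 6
print(ip_fragments(8192, 9000))  # 1
```

With an MTU of 9000 the same 8 KB block fits in one packet, so nothing has to be reassembled on the receiving side.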

[Figure 1: Oracle RAC using Jumbo Frames]

Figure 1 shows how a data block is split and reassembled. In the figure, the LMS process sends an 8 KB data block to a remote process. During transfer, the 8 KB buffer is split into six IP packets, which are sent over the network to the receiving side. There, kernel threads reassemble the six IP packets and place the 8 KB data block in a buffer. The foreground process reads it from the socket buffer into the PGA and copies it into the database buffer.

This process causes fragmentation and reassembly, and when it happens excessively it quietly increases CPU usage on the database nodes. In that situation, Jumbo Frames are the option to choose.

Networks today run at gigabit, 10-gigabit, or even higher speeds, so Jumbo Frames can be enabled on the system with the following command (provided that your environment uses gigabit Ethernet switches and a gigabit Ethernet network):

# ifconfig eth0 mtu 9000

To make the setting permanent:

# vi /etc/sysconfig/network-scripts/ifcfg-eth0

Add:

MTU=9000

For more details, see http://www.cyberciti.biz/faq/rhel-centos-debian-ubuntu-jumbo-frames-configuration/
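ifcfg files consist of KEY=VALUE lines, so the MTU entry is easy to check after editing. The following is a minimal Python sketch; `ifcfg_mtu` is a hypothetical helper, and the sample file content is invented for illustration rather than taken from a real system.

```python
def ifcfg_mtu(text: str):
    """Return the MTU set in ifcfg-style KEY=VALUE text, or None if absent."""
    mtu = None
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, comments, and anything that is not KEY=VALUE.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        if key.strip() == "MTU":
            mtu = int(value.strip().strip("\"'"))
    return mtu


# Hypothetical file content, for illustration only.
sample = """\
DEVICE=eth0
ONBOOT=yes
MTU=9000
"""
print(ifcfg_mtu(sample))  # 9000
```

A line written without the equals sign is not a KEY=VALUE entry, so this check returns None for it.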

The following article tests this setting thoroughly:

https://blogs.oracle.com/XPSONHA/entry/jumbo_frames_for_rac_interconn_1

Its test steps and results are excerpted below:

Theory tells us properly configured Jumbo Frames can eliminate 10% of overhead on UDP traffic.

So how to test ?

I guess an 'end to end' test would be the best way. So my first test is a 30 minute Swingbench run against a two node RAC, not too much stress at the beginning.

The MTU configuration of the network bond (and the slave NICs) will be 1500 initially.

After the test, collect the results on the total transactions, the average transactions per second, the maximum transaction rate (results.xml), interconnect traffic (awr) and cpu usage. Then, do exactly the same, but now with an MTU of 9000 bytes. For this we need to make sure the switch settings are also modified to use an MTU of 9000.

B.t.w.: yes, it's possible to measure network only, but real-life end-to-end testing with a real Oracle application talking to RAC feels like the best approach to see what the impact is on for example the avg. transactions per second.

In order to make the test as reliable as possible some remarks:
- use guaranteed snapshots to flashback the database to its original state.
- stop/start the database (clean the cache)

B.t.w: before starting the test with an MTU of 9000 bytes, the correct setting had to be verified.

One way to do this is using ping with a packet size (-s) of 8972 and prohibiting fragmentation (-M do).
One could send Jumbo Frames and see if they can be sent without fragmentation.

[root@node01 rk]# ping -s 8972 -M do node02-ic -c 5
PING node02-ic. (192.168.23.32) 8972(9000) bytes of data.
8980 bytes from node02-ic. (192.168.23.32): icmp_seq=0 ttl=64 time=0.914 ms

--- node02-ic. ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4003ms
rtt min/avg/max/mdev = 0.859/0.955/1.167/0.109 ms, pipe 2

As you can see this is not a problem. For packets larger than 9000 bytes, however, it is a problem:

[root@node01 rk]# ping -s 8973 -M do node02-ic -c 5
PING node02-ic. (192.168.23.32) 8973(9001) bytes of data.
From node02-ic. (192.168.23.52) icmp_seq=0 Frag needed and DF set (mtu = 9000)

Bringing the MTU size back to 1500 should also prevent unfragmented 9000-byte packets from being sent:

[root@node01 rk]# ping -s 8972 -M do node02-ic -c 5
PING node02-ic. (192.168.23.32) 8972(9000) bytes of data.
--- node02-ic. ping statistics ---
5 packets transmitted, 0 received, 100% packet loss, time 3999ms

With the MTU size back at 1500, sending 'normal' packets should work again:

[root@node01 rk]# ping node02-ic -M do -c 5
PING node02-ic. (192.168.23.32) 56(84) bytes of data.
64 bytes from node02-ic. (192.168.23.32): icmp_seq=0 ttl=64 time=0.174 ms

--- node02-ic. ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 3999ms
rtt min/avg/max/mdev = 0.174/0.186/0.198/0.008 ms, pipe 2
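The payload sizes used in the ping tests above follow from header arithmetic: the value given to -s excludes the IPv4 header and the ICMP echo header. Here is a minimal Python sketch, assuming a 20-byte IPv4 header without options; the function names are hypothetical.

```python
IP_HEADER = 20     # bytes, assuming IPv4 without options
ICMP_HEADER = 8    # bytes, ICMP echo header


def max_ping_payload(mtu: int) -> int:
    """Largest ping -s value that still fits in one unfragmented IPv4 packet."""
    return mtu - IP_HEADER - ICMP_HEADER


def fits_without_fragmentation(payload: int, mtu: int) -> bool:
    """True if a ping with this payload can be sent with the DF bit set."""
    return payload <= max_ping_payload(mtu)


print(max_ping_payload(9000))                  # 8972
print(fits_without_fragmentation(8973, 9000))  # False
print(fits_without_fragmentation(8972, 1500))  # False
```

This matches the four runs above: 8972 bytes passes at MTU 9000, 8973 bytes does not, 8972 bytes fails at MTU 1500, and the default 56-byte payload passes at MTU 1500.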

Another way to verify the correct usage of the MTU size is the command 'netstat -a -i -n' (the MTU column should show 9000 when you are performing tests on Jumbo Frames):

Kernel Interface table
Iface     MTU Met     RX-OK RX-ERR RX-DRP RX-OVR     TX-OK TX-ERR TX-DRP TX-OVR Flg
bond0    1500   0  10371535      0      0      0  15338093      0      0      0 BMmRU
bond0:1  1500   0      - no statistics available -                              BMmRU
bond1    9000   0  83383378      0      0      0  89645149      0      0      0 BMmRU
eth0     9000   0        36      0      0      0  88805888      0      0      0 BMsRU
eth2     1500   0   8036210      0      0      0  14235498      0      0      0 BMsRU
eth3     9000   0  83383342      0      0      0    839261      0      0      0 BMsRU
eth4     1500   0   2335325      0      0      0   1102595      0      0      0 BMsRU
eth5     1500   0 252075239      0      0      0 252020454      0      0      0 BMRU
eth6     1500   0         0      0      0      0         0      0      0      0 BM

As you can see, my interconnect is on bond1 (built on eth0 and eth3). All 9000 bytes.
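The same check can be scripted. Below is a minimal Python sketch that reads the MTU column from text shaped like the table above; `interface_mtus` is a hypothetical helper, and the sample is a shortened copy of that table rather than live command output.

```python
def interface_mtus(netstat_output: str) -> dict:
    """Map interface name to MTU from 'netstat -i' style text."""
    mtus = {}
    for line in netstat_output.splitlines():
        fields = line.split()
        # Data rows start with an interface name followed by a numeric MTU;
        # the title row and the column-header row do not.
        if len(fields) >= 2 and fields[1].isdigit():
            mtus[fields[0]] = int(fields[1])
    return mtus


sample = """\
Kernel Interface table
Iface MTU Met RX-OK RX-ERR RX-DRP RX-OVR TX-OK TX-ERR TX-DRP TX-OVR Flg
bond1 9000 0 83383378 0 0 0 89645149 0 0 0 BMmRU
eth0 9000 0 36 0 0 0 88805888 0 0 0 BMsRU
eth2 1500 0 8036210 0 0 0 14235498 0 0 0 BMsRU
eth3 9000 0 83383342 0 0 0 839261 0 0 0 BMsRU
"""

mtus = interface_mtus(sample)
interconnect = ("bond1", "eth0", "eth3")
print(all(mtus[name] == 9000 for name in interconnect))  # True
```

Checking the bond and both of its slave interfaces together catches the case where only one of them was changed.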

Not finished yet, no conclusions yet, but here is my first result.
You will notice the results are not that significant.

MTU 1500:
TotalFailedTransactions        : 0
AverageTransactionsPerSecond : 1364
MaximumTransactionRate      : 107767
TotalCompletedTransactions    : 4910834

MTU 9000:
TotalFailedTransactions        : 1
AverageTransactionsPerSecond : 1336
MaximumTransactionRate      : 109775
TotalCompletedTransactions    : 4812122

In a chart this will look like this:
[Figure: Oracle RAC using Jumbo Frames]

As you can see, the difference in the number of transactions between the two tests isn't really significant, but there is less UDP traffic! Still, I expected more from this test, so I have to put more stress on the test.
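To put the two result sets above in proportion, the relative change can be computed directly. This is a minimal Python sketch using only the figures reported above; the helper name `percent_change` is made up for illustration, and the failed-transaction count is left out because its baseline is zero.

```python
def percent_change(before: float, after: float) -> float:
    """Relative change from before to after, in percent."""
    return (after - before) / before * 100.0


mtu_1500 = {
    "AverageTransactionsPerSecond": 1364,
    "MaximumTransactionRate": 107767,
    "TotalCompletedTransactions": 4910834,
}
mtu_9000 = {
    "AverageTransactionsPerSecond": 1336,
    "MaximumTransactionRate": 109775,
    "TotalCompletedTransactions": 4812122,
}

for name, before in mtu_1500.items():
    change = percent_change(before, mtu_9000[name])
    print(f"{name}: {change:+.2f}%")
```

All three metrics move by roughly 2 percent in one direction or the other, which fits the remark that the first run shows no significant difference.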

I noticed the failed transaction, and found "ORA-12155 TNS-received bad datatype in NSWMARKER packet". I did verify this and I am sure this is not related to the MTU size. This is because I only changed the MTU size for the interconnect and there is no TNS traffic on that network.

As said, I will now continue with tests that put much more stress on the systems:
- number of users changed from 80 to 150 per database
- number of databases changed from 1 to 2
- more network traffic:
  - rebuilt the Swingbench indexes without the 'REVERSE' option
  - altered the sequences, lowering the increment-by value to 1 and the cache size to 3 (instead of 800)
  - full table scans all the time on each instance
- run longer (4 hours instead of half an hour)

Now, what you see is already improving. For the 4 hour test, the number of extra UDP packets sent with an MTU size of 1500 compared to an MTU size of 9000 is about 2.5 to 3 million, see this chart:

[Figure: Oracle RAC using Jumbo Frames]

Imagine what an impact this has. Each packet you do not send saves you the network overhead of the packet itself and a lot of CPU cycles that you don't need to spend.

The load average of the Linux box also decreases from an avg of 16 to 14.
[Figure: Oracle RAC using Jumbo Frames]

In terms of completed transactions on different MTU sizes within the same timeframe, the chart looks like this:

[Figure: Oracle RAC using Jumbo Frames]

To conclude this test two very high load runs are performed. Again, one with an MTU of 1500 and one with an MTU of 9000.

In the charts below you will see less CPU consumption when using 9000 bytes for MTU.

Also fewer packets are sent, although I think that number is not that significant compared to the total number of packets sent.

[Figure: Oracle RAC using Jumbo Frames]

My final thoughts on this test:

1. you will hardly notice the benefits of using Jumbo Frames on a system with no stress
2. you will notice the benefits of using Jumbo Frames on a stressed system, and such a system will then use less CPU and will have less network overhead.

This means Jumbo Frames help you scale out better than regular frames.

Depending on the interconnect usage of your applications, the results may vary of course. With interconnect-intensive applications you will see the benefits earlier than with applications that have relatively little interconnect activity.

I would use Jumbo Frames to scale better, since they save CPU and reduce network traffic, and this way leave space for growth.
