resin 是一款java實現(xiàn)的web服務器,在使用resin提供服務的時候發(fā)現(xiàn):在低流量的情況下偶爾會出現(xiàn)高負載的情況
創(chuàng)新互聯(lián)建站長期為上千客戶提供的網(wǎng)站建設服務,團隊從業(yè)經(jīng)驗10年,關注不同地域、不同群體,并針對不同對象提供差異化的產(chǎn)品和服務;打造開放共贏平臺,與合作伙伴共同營造健康的互聯(lián)網(wǎng)生態(tài)環(huán)境。為林周企業(yè)提供專業(yè)的成都做網(wǎng)站、網(wǎng)站制作,林周網(wǎng)站改版等技術(shù)服務。擁有十多年豐富建站經(jīng)驗和眾多成功案例,為您定制開發(fā)。挑戰(zhàn)的點:沒有流量但是有負載抖動
先說結(jié)論部署/tmp/cpu.sh
#!/bin/bash
date 2>&1 >>/tmp/pid_D.log
/usr/bin/ps -aux | awk '$8 == "D"' >>/tmp/pid_D.log
在crontab中每分鐘執(zhí)行一次
查看/tmp/pid_D.log,幾乎沒有D進程,得出結(jié)論:當前l(fā)oadavg主要由R構(gòu)成
r進程是被喚醒的進程,通過ps查看屬于一個采樣動作,不準確,于是打開內(nèi)核trace點,通過trace點能夠100%采集喚醒動作
#!/bin/bash
echo 'comm == "java"' >/sys/kernel/debug/tracing/events/sched/sched_wakeup_new/filter
echo 'comm == "java"' >/sys/kernel/debug/tracing/events/sched/sched_wakeup/filter
echo 1 >/sys/kernel/debug/tracing/events/sched/sched_wakeup_new/enable
echo 1 >/sys/kernel/debug/tracing/events/sched/sched_wakeup/enable
nohup cat /sys/kernel/debug/tracing/trace_pipe >wakeup_trace.log &
打開sched_wakeup_new sched_wakeup兩個trace點,輸出到wakeup_trace.log中
使用work.py分析輸出文件,按照固定時間段統(tǒng)計被喚醒的進程個數(shù)
from datetime import datetime
def calc_date(t):
boot_time = 1666330944.762007849
"""
date | cat >/dev/null # Make sure programs are in cache
set -- $(date +%s.%N) $(cat /proc/uptime)
echo "$1-$2" | bc
"""
return datetime.fromtimestamp(boot_time + t)
def main():
input = 'wakeup_trace.log'
index = 0
s = 0
interval = 50 #ms
dic = {}
diff = 0
for line in open(input):
try:
arr = line.split(':')
#print line.split()
pid = line.split()[6]
t = int(float(arr[0].split(' ')[-1])*1000)
cindex = t / interval
t1 = cindex*interval/1000.
if cindex != index:
print calc_date(t1), ' ', t1, 'per', interval, 'ms:', s, ' diff cnt: ', diff
s = 0
diff = 0
index = cindex
dic = {}
else:
if not dic.get(pid, False):
dic[pid] = True
diff += 1
s += 1
except Exception, e:
# print e
pass
if '__main__' == __name__:
main()
以50ms為周期統(tǒng)計, 輸出到sched_a_out文件中
通過這樣的手段分析出確實是進程過多喚醒導致高負載
現(xiàn)象復現(xiàn)wget https://caucho.com/download/resin-pro-4.0.66.tar.gz
Now time is:<%=new java.util.Date()%>
java -jar lib/resin.jar start
java -jar lib/resin.jar stop
# Throttle the number of active threads for a port
port_thread_max : 256
accept_thread_max : 32
accept_thread_min : 4
# keepalive_max : 512
# Throttle the number of active threads for a port
port_thread_max : 256
accept_thread_max : 200
accept_thread_min : 200
# keepalive_max : 512
ab -n 10000000 -c 1 http://127.0.0.1:8080/time.jsp
[root@VM-0-13-centos ~]# uptime
11:01:41 up 30 days, 12:20, 2 users, load average: 57.34, 15.76, 5.40
8.修改配置將accept_thread_max accept_thread_min都調(diào)整到1 負載降低下來
[root@VM-0-13-centos ~]# uptime
11:06:27 up 30 days, 12:24, 2 users, load average: 5.04, 17.69, 10.36
原因分析
strace角度strace -p 1016685 -f -yy -e accept,poll
[pid 1016758]<... poll resumed>) = 1 ([{fd=45, revents=POLLIN}])
[pid 1016757]<... poll resumed>) = 1 ([{fd=45, revents=POLLIN}])
[pid 1016756]<... poll resumed>) = 1 ([{fd=45, revents=POLLIN}])
[pid 1016755]<... poll resumed>) = 1 ([{fd=45, revents=POLLIN}])
[pid 1016754]<... poll resumed>) = 1 ([{fd=45, revents=POLLIN}])
...
有很多accept
[pid 1016733] accept(45,[pid 1016732] accept(45,[pid 1016731] accept(45,[pid 1016730] accept(45,[pid 1016729] accept(45,[pid 1016724] accept(45,[pid 1016723] accept(45,[pid 1016721] accept(45,[pid 1016720] accept(45,
只有一個accept調(diào)用成功
[pid 1017179]<... accept resumed>{sa_family=AF_INET6, sin6_port=htons(37432), sin6_flowinfo=htonl(0), inet_pton(AF_INET6, "::ffff:127.0.0.1", &sin6_addr), sin6_scope_id=0}, [28]) = 60[::ffff:127.0.0.1]:37432]>
其他都失敗
[pid 1016967]<... accept resumed>0x7fd965352100, [28]) = -1 EAGAIN (資源暫時不可用)
[pid 1016966]<... accept resumed>0x7fd965453340, [28]) = -1 EAGAIN (資源暫時不可用)
[pid 1016964]<... accept resumed>0x7fd965654f80, [28]) = -1 EAGAIN (資源暫時不可用)
...
[pid 1019707] poll([{fd=59[::ffff:127.0.0.1]:39736]>, events=POLLIN|POLLERR}], 1, 120000) = 1 ([{fd=59, revents=POLLIN}])
[pid 1019697] poll([{fd=45, events=POLLIN|POLLERR}], 1, -1) = 1 ([{fd=45, revents=POLLIN}])
[pid 1019697] accept(45, {sa_family=AF_INET6, sin6_port=htons(39738), sin6_flowinfo=htonl(0), inet_pton(AF_INET6, "::ffff:127.0.0.1", &sin6_addr), sin6_scope_id=0}, [28]) = 59[::ffff:127.0.0.1]:39738]>
從以上的信息來看,是因為resin的設計是“accept_thread_max accept_thread_min控制同時poll的線程數(shù)量,每一個poll成功之后會調(diào)用accept”,這就導致如果線程數(shù)量過多會造成“小流量喚醒很多線程的poll調(diào)用,但是真正只有很少的accept調(diào)用成功”,可以看成是一種“驚群”的現(xiàn)象。
用戶配置2000線程的原因用戶誤以為resin的網(wǎng)絡線程的處理方式是串行的,也就是“在一個請求未處理完之前不會處理下一個請求“,這個認知已經(jīng)通過如下的實驗幫助用戶糾正過來。
[root@VM-0-13-centos ~]# ab -n 1000 -c 20 http://127.0.0.1:8080/time.jsp
...
Concurrency Level: 20
Time taken for tests: 0.215 seconds
Complete requests: 1000
Failed requests: 0
Total transferred: 348000 bytes
HTML transferred: 91000 bytes
Requests per second: 4649.93 [#/sec] (mean)
Time per request: 4.301 [ms] (mean)
Time per request: 0.215 [ms] (mean, across all concurrent requests)
Transfer rate: 1580.25 [Kbytes/sec] received
從輸出可以看出, 單個監(jiān)聽線程每秒可以扛住4649個request,所以一定不是串行模式,而應該是多路io復用的模式# port_thread_max restricts the active threads available for a single port.
port_thread_max : 256
使用loadavg_tracer
insmod loadavg_tracer.ko load_threshold=20 dump_interval=5
上面的命令會在load高于20的時候打印R/D進程的棧
[2640203.171984] R [001] resin-port-8080 1025456
[2640203.172912]
poll_schedule_timeout.constprop.17+0x46/0x70
do_sys_poll+0x46b/0x500
__x64_sys_poll+0x3b/0x150
do_syscall_64+0x4d/0x150
entry_SYSCALL_64_after_hwframe+0x44/0xa9
[2640203.177809] R [001] resin-port-8080 1025457
[2640203.178725]
poll_schedule_timeout.constprop.17+0x46/0x70
do_sys_poll+0x46b/0x500
__x64_sys_poll+0x3b/0x150
do_syscall_64+0x4d/0x150
entry_SYSCALL_64_after_hwframe+0x44/0xa9
[2640203.183616] R [000] resin-port-8080 1025459
[2640203.184535]
poll_schedule_timeout.constprop.17+0x46/0x70
do_sys_poll+0x46b/0x500
__x64_sys_poll+0x3b/0x150
do_syscall_64+0x4d/0x150
entry_SYSCALL_64_after_hwframe+0x44/0xa9
[2640203.189394] R [001] resin-port-8080 1025460
[2640203.190102]
poll_schedule_timeout.constprop.17+0x46/0x70
do_sys_poll+0x46b/0x500
__x64_sys_poll+0x3b/0x150
do_syscall_64+0x4d/0x150
entry_SYSCALL_64_after_hwframe+0x44/0xa9
[2640203.193815] R [001] resin-port-8080 1025461
[2640203.194624]
poll_schedule_timeout.constprop.17+0x46/0x70
do_sys_poll+0x46b/0x500
__x64_sys_poll+0x3b/0x150
do_syscall_64+0x4d/0x150
entry_SYSCALL_64_after_hwframe+0x44/0xa9
[2640203.200751] R [001] resin-port-8080 1025511
[2640203.201538]
timer_callback.cold.2+0x154/0x1c8 [loadavg_tracer]
__hrtimer_run_queues+0x117/0x2a0
hrtimer_interrupt+0x101/0x210
smp_apic_timer_interrupt+0x74/0x140
apic_timer_interrupt+0xf/0x20
finish_task_switch+0x8b/0x250
poll_schedule_timeout.constprop.17+0x46/0x70
do_sys_poll+0x46b/0x500
__x64_sys_poll+0x3b/0x150
do_syscall_64+0x4d/0x150
entry_SYSCALL_64_after_hwframe+0x44/0xa9
你是否還在尋找穩(wěn)定的海外服務器提供商?創(chuàng)新互聯(lián)www.cdcxhl.cn海外機房具備T級流量清洗系統(tǒng)配攻擊溯源,準確流量調(diào)度確保服務器高可用性,企業(yè)級服務器適合批量采購,新人活動首月15元起,快前往官網(wǎng)查看詳情吧
網(wǎng)站題目:resin監(jiān)聽線程數(shù)配置不當導致loadavg升高-創(chuàng)新互聯(lián)
文章來源:http://www.muchs.cn/article18/cshhgp.html
成都網(wǎng)站建設公司_創(chuàng)新互聯(lián),為您提供域名注冊、網(wǎng)站內(nèi)鏈、移動網(wǎng)站建設、網(wǎng)站排名、全網(wǎng)營銷推廣、服務器托管
聲明:本網(wǎng)站發(fā)布的內(nèi)容(圖片、視頻和文字)以用戶投稿、用戶轉(zhuǎn)載內(nèi)容為主,如果涉及侵權(quán)請盡快告知,我們將會在第一時間刪除。文章觀點不代表本網(wǎng)站立場,如需處理請聯(lián)系客服。電話:028-86922220;郵箱:631063699@qq.com。內(nèi)容未經(jīng)允許不得轉(zhuǎn)載,或轉(zhuǎn)載時需注明來源: 創(chuàng)新互聯(lián)
猜你還喜歡下面的內(nèi)容