This problem probably affects all current OpenWRT-based systems.
System overview
- openwrt-23.05 branch (git-24.006.68745-9128656)
- x86-64 on a J1900, with an r8111 NIC driven by the r8168 driver
- ISP: China Unicom (Shandong), Huawei optical modem (ONT)
Fault analysis
PPPoE drops at irregular intervals.
The system log shows entries like these:
Tue Feb 6 11:32:20 2024 daemon.info pppd[14754]: No response to 5 echo-requests
Tue Feb 6 11:32:20 2024 daemon.notice pppd[14754]: Serial link appears to be disconnected.
Tue Feb 6 11:32:20 2024 daemon.info pppd[14754]: Connect time 43.1 minutes.
Tue Feb 6 11:32:20 2024 daemon.info pppd[14754]: Sent 13726252 bytes, received 37252270 bytes.
-------------------------------------------------------------------------
Tue Feb 6 12:08:47 2024 daemon.info pppd[22287]: LCP terminated by peer
Tue Feb 6 12:08:47 2024 daemon.info pppd[22287]: Connect time 36.4 minutes.
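To pull these events out of a busy log, a small filter is enough. This is a minimal sketch: the function name lcp_drop_events is hypothetical, and the patterns only cover the three pppd messages shown above.

```shell
#!/bin/sh
# Hypothetical helper: keep only the pppd messages that mark a dropped link.
# Reads log lines on stdin; the patterns cover only the messages shown above.
lcp_drop_events() {
	grep -E 'pppd\[[0-9]+\]: (No response to [0-9]+ echo-requests|Serial link appears to be disconnected|LCP terminated by peer)'
}
```

On the router this would be used as `logread | lcp_drop_events`.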
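Since the drops come from unanswered LCP echo requests, the obvious fix is to turn LCP echo off, as several articles on this topic suggest. The problem is that the current configuration is already set to ignore LCP echo failures.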
Analyzing the problem through the pppd process arguments
Use ps to find the arguments pppd was started with:
/usr/sbin/pppd nodetach ipparam wan_cu_0 ifname pppoe-wan_cu_0 lcp-echo-interval 1 lcp-echo-failure 5 lcp-echo-adaptive +ipv6 set AUTOIPV6=1 nodefaultroute usepeerdns maxfail 1 user ??????????? password ????????
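The LCP echo options can be picked out of that command line with a short filter. pppd_lcp_opts is a hypothetical helper, and it assumes the option names appear exactly as in the output above.

```shell
#!/bin/sh
# Hypothetical helper: list the LCP echo options in a pppd command line.
# Reads the command line on stdin and prints one option per line.
pppd_lcp_opts() {
	grep -oE 'lcp-echo-(interval|failure) [0-9]+|lcp-echo-adaptive'
}
```

Usage: `ps w | grep '[p]ppd' | pppd_lcp_opts`. With the command line above it prints lcp-echo-interval 1, lcp-echo-failure 5 and lcp-echo-adaptive; once LCP echo is really disabled it should print nothing.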
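These arguments do not match the settings in LuCI: pppd is still told to declare the link dead after 5 unanswered echo requests sent 1 second apart, which is exactly the "No response to 5 echo-requests" in the log. So this is a bug.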
An issue has already been filed in the OpenWRT GitHub repository, "LCP echo failure threshold 0 does not behave as described in LuCI", along with a proposed fix. The cause is that the default LCP keepalive="5 1" overrides the 0 set in LuCI.
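How a keepalive value turns into pppd arguments can be sketched as follows. This is a simplified reconstruction, not the verbatim contents of ppp.sh: it assumes keepalive has the form "<failure> <interval>" and that a failure threshold below 1 disables LCP echo. The function name keepalive_to_lcp_args and its second argument are hypothetical.

```shell
#!/bin/sh
# Simplified sketch of how ppp.sh maps "keepalive" to pppd LCP echo options.
# Assumption: keepalive is "<failure> [<interval>]", e.g. "5 1".
# Hypothetical helper; apply_default=0 simulates deleting the fallback line.
keepalive_to_lcp_args() {
	local keepalive="$1"
	local apply_default="${2:-1}"
	local lcp_failure lcp_interval

	# The fallback discussed here: an empty keepalive becomes "5 1".
	if [ "$apply_default" = 1 ] && [ -z "$keepalive" ]; then
		keepalive="5 1"
	fi

	# First field: failure threshold. Last field: interval in seconds.
	lcp_failure="${keepalive%%[, ]*}"
	lcp_interval="${keepalive##*[, ]}"

	# A threshold below 1 (or empty) disables LCP echo.
	[ "${lcp_failure:-0}" -lt 1 ] && lcp_failure=""
	# Only one field given: fall back to a 5-second interval.
	[ "$lcp_interval" != "$keepalive" ] || lcp_interval=5

	# Emit the echo options only when a failure threshold is set.
	echo "${lcp_failure:+lcp-echo-interval $lcp_interval lcp-echo-failure $lcp_failure}"
}
```

If the 0 from LuCI reaches the script as an empty value, `keepalive_to_lcp_args ''` yields `lcp-echo-interval 1 lcp-echo-failure 5`, the same options seen in the ps output, while `keepalive_to_lcp_args '' 0` (no fallback) yields nothing, just like an explicit `0`.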
Solution
In /lib/netifd/proto/ppp.sh, delete line 123:
[ -n "$keepalive" ] || keepalive="5 1"
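Applying the change can be scripted roughly as below. This is a sketch under two assumptions: the fallback line reads exactly as shown (its line number may differ between builds, so it is matched by content), and the interface is the wan_cu_0 from the ps output. remove_keepalive_default is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: drop the keepalive="5 1" fallback from ppp.sh.
# Matches the line by content, since its line number can change between builds.
remove_keepalive_default() {
	local file="${1:-/lib/netifd/proto/ppp.sh}"

	# Keep a backup so the change can be reverted.
	cp "$file" "$file.bak" || return 1

	# Delete every line containing keepalive="5 1".
	sed -i '/keepalive="5 1"/d' "$file"
}

# On the router (not run here):
#   remove_keepalive_default
#   ifup wan_cu_0            # restart pppd so it picks up the new arguments
#   ps w | grep '[p]ppd'     # lcp-echo-* should no longer appear
```

Note that a sysupgrade will probably restore the original ppp.sh, so the edit may need to be reapplied afterwards.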