android_kernel_motorola_sm6225/net/ipv4
Neal Cardwell df92c8394e tcp: fix xmit timer to only be reset if data ACKed/SACKed
Fix a TCP loss recovery performance bug raised recently on the netdev
list, in two threads:

(i)  July 26, 2017: netdev thread "TCP fast retransmit issues"
(ii) July 26, 2017: netdev thread:
     "[PATCH V2 net-next] TLP: Don't reschedule PTO when there's one
     outstanding TLP retransmission"

The basic problem is that incoming TCP packets that did not indicate
forward progress could cause the xmit timer (TLP or RTO) to be rearmed
and pushed back in time. In certain corner cases this could result in
the following problems noted in these threads:

 - Repeated ACKs coming in with bogus SACKs corrupted by middleboxes
   could cause TCP to repeatedly schedule TLPs forever. We kept
   sending TLPs after every ~200ms, which elicited bogus SACKs, which
   caused more TLPs, ad infinitum; we never fired an RTO to fill in
   the holes.

 - Incoming data segments could, in some cases, cause us to reschedule
   our RTO or TLP timer further out in time, for no good reason. This
   could cause repeated inbound data to result in stalls in outbound
   data, in the presence of packet loss.

This commit fixes these bugs by changing the TLP and RTO ACK
processing to:

 (a) Only reschedule the xmit timer once per ACK.

 (b) Only reschedule the xmit timer if tcp_clean_rtx_queue() deems the
     ACK indicates sufficient forward progress (a packet was
     cumulatively ACKed, or we got a SACK for a packet that was sent
     before the most recent retransmit of the write queue head).

This brings us back into closer compliance with the RFCs, since, as
the comment for tcp_rearm_rto() notes, we should only restart the RTO
timer after forward progress on the connection. Previously we were
restarting the xmit timer even in these cases where there was no
forward progress.
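
Rules (a) and (b) can be illustrated with a small userspace model. This is only a sketch, not the kernel code: the `toy_*` names and `SK_FLAG_*` bits are hypothetical and were invented for this example, and the real decision is made inside tcp_clean_rtx_queue() on actual write queue state rather than on two booleans.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits for this sketch; not the kernel's FLAG_* values. */
enum {
	SK_FLAG_DATA_ACKED   = 1 << 0, /* a packet was cumulatively ACKed */
	SK_FLAG_SACK_OLD_PKT = 1 << 1, /* SACK for a packet sent before the most
					* recent retransmit of the queue head */
	SK_FLAG_SET_TIMER    = 1 << 2, /* ACK showed sufficient forward progress */
};

struct toy_sock {
	int timer_resets; /* number of times the xmit timer was rearmed */
};

/* Stand-in for the queue-cleaning step: classify the ACK's progress. */
static int toy_clean_rtx_queue(bool cum_acked, bool sacked_old_pkt)
{
	int flag = 0;

	if (cum_acked)
		flag |= SK_FLAG_DATA_ACKED | SK_FLAG_SET_TIMER;
	if (sacked_old_pkt)
		flag |= SK_FLAG_SACK_OLD_PKT | SK_FLAG_SET_TIMER;
	return flag;
}

/* Stand-in for ACK processing: rearm at most once, and only on progress. */
static int toy_ack(struct toy_sock *sk, bool cum_acked, bool sacked_old_pkt)
{
	int before = sk->timer_resets;
	int flag = toy_clean_rtx_queue(cum_acked, sacked_old_pkt);

	if (flag & SK_FLAG_SET_TIMER)
		sk->timer_resets++;

	/* Rule (a): never more than one reset per ACK. */
	assert(sk->timer_resets - before <= 1);
	return flag;
}
```

In this model an ACK that neither cumulatively ACKs data nor SACKs an older packet (for example a duplicate ACK carrying a bogus SACK, or a pure inbound data segment) leaves `timer_resets` unchanged, so the pending timer keeps its original deadline.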

As a side benefit, this commit simplifies and speeds up the TCP timer
arming logic. We had been calling inet_csk_reset_xmit_timer() three
times on normal ACKs that cumulatively acknowledged some data:

1) Once near the top of tcp_ack() to switch from TLP timer to RTO:
        if (icsk->icsk_pending == ICSK_TIME_LOSS_PROBE)
                tcp_rearm_rto(sk);

2) Once in tcp_clean_rtx_queue(), to update the RTO:
        if (flag & FLAG_ACKED) {
                tcp_rearm_rto(sk);

3) Once in tcp_ack() after tcp_fastretrans_alert() to switch from RTO
   to TLP:
        if (icsk->icsk_pending == ICSK_TIME_RETRANS)
                tcp_schedule_loss_probe(sk);

This commit, by only rescheduling the xmit timer once per ACK,
simplifies the code and reduces CPU overhead.
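
The difference in timer resets can be counted in a toy model. The sketch below is an illustration under simplifying assumptions, not the kernel implementation: all `toy_*` names are hypothetical, and it assumes a loss probe can always be scheduled, which is not true of the real tcp_schedule_loss_probe().

```c
#include <stdbool.h>

enum toy_timer { TOY_TIMER_RTO, TOY_TIMER_TLP };

struct toy_csk {
	enum toy_timer pending; /* which timer is currently armed */
	int resets;             /* number of timer resets so far */
};

/* Stand-in for inet_csk_reset_xmit_timer(): arm a timer, count the call. */
static void toy_reset_timer(struct toy_csk *c, enum toy_timer what)
{
	c->pending = what;
	c->resets++;
}

/* Old flow: mirrors the three call sites listed above. */
static void toy_ack_old(struct toy_csk *c, bool data_acked)
{
	if (c->pending == TOY_TIMER_TLP)
		toy_reset_timer(c, TOY_TIMER_RTO); /* 1) TLP -> RTO */
	if (data_acked)
		toy_reset_timer(c, TOY_TIMER_RTO); /* 2) update the RTO */
	if (c->pending == TOY_TIMER_RTO)
		toy_reset_timer(c, TOY_TIMER_TLP); /* 3) RTO -> TLP */
}

/* New flow: a single reset, and only if the ACK showed forward progress. */
static void toy_ack_new(struct toy_csk *c, bool progress)
{
	if (progress)
		toy_reset_timer(c, TOY_TIMER_TLP);
}
```

Starting with a TLP pending, the old flow in this model resets the timer three times for an ACK of new data and still twice for an ACK showing no progress, which is the rearming behavior described earlier. The new flow resets once or not at all.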

This commit was tested in an A/B test with Google web server
traffic. SNMP stats and request latency metrics were within noise
levels, substantiating that for normal web traffic patterns this is a
rare issue. This commit was also tested with packetdrill tests to
verify that it fixes the timer behavior in the corner cases discussed
in the netdev threads mentioned above.

This patch is a bug fix patch intended to be queued for -stable
releases.

Fixes: 6ba8a3b19e ("tcp: Tail loss probe (TLP)")
Reported-by: Klavs Klavsen <kl@vsen.dk>
Reported-by: Mao Wenan <maowenan@huawei.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Nandita Dukkipati <nanditad@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-03 15:38:31 -07:00
netfilter netfilter: nf_tables: only allow in/output for arp packets 2017-07-17 17:02:44 +02:00
af_inet.c net: convert sock.sk_wmem_alloc from atomic_t to refcount_t 2017-07-01 07:39:08 -07:00
ah4.c Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next 2017-06-23 14:17:31 -04:00
arp.c networking: make skb_put & friends return void pointers 2017-06-16 11:48:39 -04:00
cipso_ipv4.c Cipso: cipso_v4_optptr enter infinite loop 2017-08-01 15:31:23 -07:00
datagram.c
devinet.c net: convert in_device.refcnt from atomic_t to refcount_t 2017-07-01 07:39:08 -07:00
esp4.c net: convert sock.sk_wmem_alloc from atomic_t to refcount_t 2017-07-01 07:39:08 -07:00
esp4_offload.c esp4/6: Fix GSO path for non-GSO SW-crypto packets 2017-04-19 07:48:57 +02:00
fib_frontend.c ipv4: initialize fib_trie prior to register_netdev_notifier call. 2017-07-20 15:24:45 -07:00
fib_lookup.h net: add extack arg to lwtunnel build state 2017-05-30 11:55:32 -04:00
fib_notifier.c ipv4: fib: Remove redundant argument 2017-03-10 09:45:09 -08:00
fib_rules.c ipv4: fib_rules: Dump FIB rules when registering FIB notifier 2017-03-16 10:18:34 -07:00
fib_semantics.c ipv4: fib: Fix NULL pointer deref during fib_sync_down_dev() 2017-07-31 17:51:11 -07:00
fib_trie.c net, ipv4: convert fib_info.fib_clntref from atomic_t to refcount_t 2017-07-04 01:29:04 -07:00
fou.c gue: fix remcsum when GRO on and CHECKSUM_PARTIAL boundary is outer UDP 2017-08-01 16:09:14 -07:00
gre_demux.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2016-06-30 05:03:36 -04:00
gre_offload.c net: add recursion limit to GRO 2016-10-20 14:32:22 -04:00
icmp.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2017-06-15 11:59:32 -04:00
igmp.c net: convert ip_mc_list.refcnt from atomic_t to refcount_t 2017-07-01 07:39:08 -07:00
inet_connection_sock.c net: convert sock.sk_refcnt from atomic_t to refcount_t 2017-07-01 07:39:08 -07:00
inet_diag.c tcp: remove early retransmit 2017-01-13 22:37:16 -05:00
inet_fragment.c net: convert inet_frag_queue.refcnt from atomic_t to refcount_t 2017-07-01 07:39:09 -07:00
inet_hashtables.c net: make sk_ehashfn() static 2017-07-03 03:29:14 -07:00
inet_timewait_sock.c net: convert sock.sk_refcnt from atomic_t to refcount_t 2017-07-01 07:39:08 -07:00
inetpeer.c net: convert inet_peer.refcnt from atomic_t to refcount_t 2017-07-01 07:39:07 -07:00
ip_forward.c ipv4: allow local fragmentation in ip_finish_output_gso() 2016-11-03 16:10:26 -04:00
ip_fragment.c net: convert inet_frag_queue.refcnt from atomic_t to refcount_t 2017-07-01 07:39:09 -07:00
ip_gre.c net: add netlink_ext_ack argument to rtnl_link_ops.validate 2017-06-26 23:13:22 -04:00
ip_input.c net: Add sysctl to toggle early demux for tcp and udp 2017-03-24 13:17:07 -07:00
ip_options.c Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
ip_output.c ipv4: ip_do_fragment: fix headroom tests 2017-07-15 14:38:31 -07:00
ip_sockglue.c do_ip_setsockopt(): don't open-code memdup_user() 2017-06-30 02:04:09 -04:00
ip_tunnel.c ip_tunnel: fix potential issue in ip_tunnel_rcv 2017-06-16 12:01:29 -04:00
ip_tunnel_core.c net: store port/representator id in metadata_dst 2017-06-25 11:42:01 -04:00
ip_vti.c net: add netlink_ext_ack argument to rtnl_link_ops.validate 2017-06-26 23:13:22 -04:00
ipcomp.c
ipconfig.c networking: convert many more places to skb_put_zero() 2017-06-16 11:48:35 -04:00
ipip.c net: add netlink_ext_ack argument to rtnl_link_ops.validate 2017-06-26 23:13:22 -04:00
ipmr.c net: ipmr: ipmr_get_table() returns NULL 2017-07-12 08:18:46 -07:00
Kconfig Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next 2017-02-16 21:25:49 -05:00
Makefile tcp: ULP infrastructure 2017-06-15 12:12:40 -04:00
netfilter.c netfilter: use skb_to_full_sk in ip_route_me_harder 2017-02-28 12:49:36 +01:00
ping.c net: convert sock.sk_refcnt from atomic_t to refcount_t 2017-07-01 07:39:08 -07:00
proc.c tcp: add TCPMemoryPressuresChrono counter 2017-06-08 11:26:19 -04:00
protocol.c net: Add sysctl to toggle early demux for tcp and udp 2017-03-24 13:17:07 -07:00
raw.c net: convert sock.sk_refcnt from atomic_t to refcount_t 2017-07-01 07:39:08 -07:00
raw_diag.c net: ip, raw_diag -- Use jump for exiting from nested loop 2016-11-03 15:25:26 -04:00
route.c Add wait_for_random_bytes() and get_random_*_wait() functions so that 2017-07-15 12:44:02 -07:00
syncookies.c ipv4: ipv6: initialize treq->txhash in cookie_v[46]_check() 2017-07-18 11:22:51 -07:00
sysctl_net_ipv4.c tcp: ULP infrastructure 2017-06-15 12:12:40 -04:00
tcp.c bpf: Add support for changing congestion control 2017-07-01 16:15:14 -07:00
tcp_bbr.c tcp_bbr: init pacing rate on first RTT sample 2017-07-15 14:43:29 -07:00
tcp_bic.c tcp: bic, cubic: use tcp_jiffies32 instead of tcp_time_stamp 2017-05-17 16:06:01 -04:00
tcp_cdg.c sched/headers: Prepare for new header dependencies before moving code to <linux/sched/clock.h> 2017-03-02 08:42:27 +01:00
tcp_cong.c bpf: Add support for changing congestion control 2017-07-01 16:15:14 -07:00
tcp_cubic.c tcp: bic, cubic: use tcp_jiffies32 instead of tcp_time_stamp 2017-05-17 16:06:01 -04:00
tcp_dctcp.c Revert "dctcp: update cwnd on congestion event" 2016-12-06 11:34:24 -05:00
tcp_diag.c net: diag: Fix refcnt leak in error path destroying socket 2016-08-23 23:11:36 -07:00
tcp_fastopen.c bpf: Add TCP connection BPF callbacks 2017-07-01 16:15:14 -07:00
tcp_highspeed.c tcp: add cwnd_undo functions to various tcp cc algorithms 2016-11-21 13:20:17 -05:00
tcp_htcp.c tcp: replace misc tcp_time_stamp to tcp_jiffies32 2017-05-17 16:06:01 -04:00
tcp_hybla.c tcp: make undo_cwnd mandatory for congestion modules 2016-11-21 13:20:17 -05:00
tcp_illinois.c tcp: add cwnd_undo functions to various tcp cc algorithms 2016-11-21 13:20:17 -05:00
tcp_input.c tcp: fix xmit timer to only be reset if data ACKed/SACKed 2017-08-03 15:38:31 -07:00
tcp_ipv4.c tcp: md5: tcp_md5_do_lookup_exact() can be static 2017-07-06 10:54:15 +01:00
tcp_lp.c tcp: switch TCP TS option (RFC 7323) to 1ms clock 2017-05-17 16:06:01 -04:00
tcp_metrics.c tcp: use tcp_jiffies32 to feed tp->snd_cwnd_stamp 2017-05-17 16:06:01 -04:00
tcp_minisocks.c bpf: Support for setting initial receive window 2017-07-01 16:15:13 -07:00
tcp_nv.c tcpnv: do not export local function 2017-05-21 13:42:36 -04:00
tcp_offload.c net: convert sock.sk_wmem_alloc from atomic_t to refcount_t 2017-07-01 07:39:08 -07:00
tcp_output.c tcp: fix xmit timer to only be reset if data ACKed/SACKed 2017-08-03 15:38:31 -07:00
tcp_probe.c tcp: Revert "tcp: tcp_probe: use spin_lock_bh()" 2017-02-21 13:26:03 -05:00
tcp_rate.c tcp: export do_tcp_sendpages and tcp_rate_check_app_limited functions 2017-06-15 12:12:40 -04:00
tcp_recovery.c tcp: switch TCP TS option (RFC 7323) to 1ms clock 2017-05-17 16:06:01 -04:00
tcp_scalable.c tcp: add cwnd_undo functions to various tcp cc algorithms 2016-11-21 13:20:17 -05:00
tcp_timer.c net: fix keepalive code vs TCP_FASTOPEN_CONNECT 2017-08-03 09:34:51 -07:00
tcp_ulp.c tcp: fix out-of-bounds access in ULP sysctl 2017-06-23 14:10:05 -04:00
tcp_vegas.c tcp: make undo_cwnd mandatory for congestion modules 2016-11-21 13:20:17 -05:00
tcp_vegas.h
tcp_veno.c tcp: add cwnd_undo functions to various tcp cc algorithms 2016-11-21 13:20:17 -05:00
tcp_westwood.c tcp_westwood: use tcp_jiffies32 instead of tcp_time_stamp 2017-05-17 16:06:01 -04:00
tcp_yeah.c tcp: add cwnd_undo functions to various tcp cc algorithms 2016-11-21 13:20:17 -05:00
tunnel4.c tunnels: correct conditional build of MPLS and IPv6 2016-07-11 13:27:06 -07:00
udp.c udp6: fix socket leak on early demux 2017-07-29 14:19:03 -07:00
udp_diag.c net: convert sock.sk_refcnt from atomic_t to refcount_t 2017-07-01 07:39:08 -07:00
udp_impl.h udp: make *udp*_queue_rcv_skb() functions static 2017-05-18 10:23:33 -04:00
udp_offload.c udp: disable inner UDP checksum offloads in IPsec case 2017-04-24 13:48:54 -04:00
udp_tunnel.c net: Remove deprecated tunnel specific UDP offload functions 2016-06-17 20:23:32 -07:00
udplite.c udplite: call proper backlog handlers 2016-11-24 15:32:14 -05:00
xfrm4_input.c esp: Add a software GRO codepath 2017-02-15 11:04:11 +01:00
xfrm4_mode_beet.c networking: make skb_pull & friends return void pointers 2017-06-16 11:48:39 -04:00
xfrm4_mode_transport.c xfrm: Add encapsulation header offsets while SKB is not encrypted 2017-04-14 10:07:39 +02:00
xfrm4_mode_tunnel.c xfrm: Add encapsulation header offsets while SKB is not encrypted 2017-04-14 10:07:39 +02:00
xfrm4_output.c xfrm: Add an IPsec hardware offloading API 2017-04-14 10:06:10 +02:00
xfrm4_policy.c xfrm: policy: make policy backend const 2017-02-09 10:22:19 +01:00
xfrm4_protocol.c xfrm: input: constify xfrm_input_afinfo 2017-02-09 10:22:17 +01:00
xfrm4_state.c xfrm: remove unused function 2017-01-10 10:57:12 +01:00
xfrm4_tunnel.c