From 2a458eddc4c270a435c26f0d22c46da36cbf00d2 Mon Sep 17 00:00:00 2001 From: Sabrina Dubroca Date: Fri, 12 Apr 2019 15:04:10 +0200 Subject: [PATCH 01/99] bonding: fix event handling for stacked bonds [ Upstream commit 92480b3977fd3884649d404cbbaf839b70035699 ] When a bond is enslaved to another bond, bond_netdev_event() only handles the event as if the bond is a master, and skips treating the bond as a slave. This leads to a refcount leak on the slave, since we don't remove the adjacency to its master and the master holds a reference on the slave. Reproducer: ip link add bondL type bond ip link add bondU type bond ip link set bondL master bondU ip link del bondL No "Fixes:" tag, this code is older than git history. Signed-off-by: Sabrina Dubroca Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- drivers/net/bonding/bond_main.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c index b2c42cae3081..091b454e83fc 100644 --- a/drivers/net/bonding/bond_main.c +++ b/drivers/net/bonding/bond_main.c @@ -3198,8 +3198,12 @@ static int bond_netdev_event(struct notifier_block *this, return NOTIFY_DONE; if (event_dev->flags & IFF_MASTER) { + int ret; + netdev_dbg(event_dev, "IFF_MASTER\n"); - return bond_master_netdev_event(event, event_dev); + ret = bond_master_netdev_event(event, event_dev); + if (ret != NOTIFY_DONE) + return ret; } if (event_dev->flags & IFF_SLAVE) { From fae6053d761122b272bdd5bca81561d7da306812 Mon Sep 17 00:00:00 2001 From: Si-Wei Liu Date: Mon, 8 Apr 2019 19:45:27 -0400 Subject: [PATCH 02/99] failover: allow name change on IFF_UP slave interfaces [ Upstream commit 8065a779f17e94536a1c4dcee4f9d88011672f97 ] When a netdev appears through hot plug then gets enslaved by a failover master that is already up and running, the slave will be opened right away after getting enslaved. Today there's a race that userspace (udev) may fail to rename the slave if the kernel (net_failover) opens the slave earlier than when the userspace rename happens. Unlike bond or team, the primary slave of failover can't be renamed by userspace ahead of time, since the kernel initiated auto-enslavement is unable to, or rather, is never meant to be synchronized with the rename request from userspace. As the failover slave interfaces are not designed to be operated directly by userspace apps: IP configuration, filter rules with regard to network traffic passing and etc., should all be done on master interface. In general, userspace apps only care about the name of master interface, while slave names are less important as long as admin users can see reliable names that may carry other information describing the netdev. For e.g., they can infer that "ens3nsby" is a standby slave of "ens3", while for a name like "eth0" they can't tell which master it belongs to. Historically the name of IFF_UP interface can't be changed because there might be admin script or management software that is already relying on such behavior and assumes that the slave name can't be changed once UP. But failover is special: with the in-kernel auto-enslavement mechanism, the userspace expectation for device enumeration and bring-up order is already broken. Previously initramfs and various userspace config tools were modified to bypass failover slaves because of auto-enslavement and duplicate MAC address. Similarly, in case that users care about seeing reliable slave name, the new type of failover slaves needs to be taken care of specifically in userspace anyway. It's less risky to lift up the rename restriction on failover slave which is already UP. Although it's possible this change may potentially break userspace component (most likely configuration scripts or management software) that assumes slave name can't be changed while UP, it's relatively a limited and controllable set among all userspace components, which can be fixed specifically to listen for the rename events on failover slaves. Userspace component interacting with slaves is expected to be changed to operate on failover master interface instead, as the failover slave is dynamic in nature which may come and go at any point. The goal is to make the role of failover slaves less relevant, and userspace components should only deal with failover master in the long run. Fixes: 30c8bd5aa8b2 ("net: Introduce generic failover module") Signed-off-by: Si-Wei Liu Reviewed-by: Liran Alon Acked-by: Sridhar Samudrala Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- include/linux/netdevice.h | 3 +++ net/core/dev.c | 16 +++++++++++++++- net/core/failover.c | 6 +++--- 3 files changed, 21 insertions(+), 4 deletions(-) diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index 21fef8c5eca7..8c2fec0bcb26 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -1456,6 +1456,7 @@ struct net_device_ops { * @IFF_FAILOVER: device is a failover master device * @IFF_FAILOVER_SLAVE: device is lower dev of a failover master device * @IFF_L3MDEV_RX_HANDLER: only invoke the rx handler of L3 master device + * @IFF_LIVE_RENAME_OK: rename is allowed while device is up and running */ enum netdev_priv_flags { IFF_802_1Q_VLAN = 1<<0, @@ -1488,6 +1489,7 @@ enum netdev_priv_flags { IFF_FAILOVER = 1<<27, IFF_FAILOVER_SLAVE = 1<<28, IFF_L3MDEV_RX_HANDLER = 1<<29, + IFF_LIVE_RENAME_OK = 1<<30, }; #define IFF_802_1Q_VLAN IFF_802_1Q_VLAN @@ -1519,6 +1521,7 @@ enum netdev_priv_flags { #define IFF_FAILOVER IFF_FAILOVER #define IFF_FAILOVER_SLAVE IFF_FAILOVER_SLAVE #define IFF_L3MDEV_RX_HANDLER IFF_L3MDEV_RX_HANDLER +#define IFF_LIVE_RENAME_OK IFF_LIVE_RENAME_OK /** * struct net_device - The DEVICE structure. diff --git a/net/core/dev.c b/net/core/dev.c index d47554307a6d..3bcec116a5f2 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -1180,7 +1180,21 @@ int dev_change_name(struct net_device *dev, const char *newname) BUG_ON(!dev_net(dev)); net = dev_net(dev); - if (dev->flags & IFF_UP) + + /* Some auto-enslaved devices e.g. failover slaves are + * special, as userspace might rename the device after + * the interface had been brought up and running since + * the point kernel initiated auto-enslavement. Allow + * live name change even when these slave devices are + * up and running. + * + * Typically, users of these auto-enslaving devices + * don't actually care about slave name change, as + * they are supposed to operate on master interface + * directly. + */ + if (dev->flags & IFF_UP && + likely(!(dev->priv_flags & IFF_LIVE_RENAME_OK))) return -EBUSY; write_seqcount_begin(&devnet_rename_seq); diff --git a/net/core/failover.c b/net/core/failover.c index 4a92a98ccce9..b5cd3c727285 100644 --- a/net/core/failover.c +++ b/net/core/failover.c @@ -80,14 +80,14 @@ static int failover_slave_register(struct net_device *slave_dev) goto err_upper_link; } - slave_dev->priv_flags |= IFF_FAILOVER_SLAVE; + slave_dev->priv_flags |= (IFF_FAILOVER_SLAVE | IFF_LIVE_RENAME_OK); if (fops && fops->slave_register && !fops->slave_register(slave_dev, failover_dev)) return NOTIFY_OK; netdev_upper_dev_unlink(slave_dev, failover_dev); - slave_dev->priv_flags &= ~IFF_FAILOVER_SLAVE; + slave_dev->priv_flags &= ~(IFF_FAILOVER_SLAVE | IFF_LIVE_RENAME_OK); err_upper_link: netdev_rx_handler_unregister(slave_dev); done: @@ -121,7 +121,7 @@ int failover_slave_unregister(struct net_device *slave_dev) netdev_rx_handler_unregister(slave_dev); netdev_upper_dev_unlink(slave_dev, failover_dev); - slave_dev->priv_flags &= ~IFF_FAILOVER_SLAVE; + slave_dev->priv_flags &= ~(IFF_FAILOVER_SLAVE | IFF_LIVE_RENAME_OK); if (fops && fops->slave_unregister && !fops->slave_unregister(slave_dev, failover_dev)) From bcb964012d1b17544f80a096bb4d6f8cfd32357b Mon Sep 17 00:00:00 2001 From: "Gustavo A. R. Silva" Date: Mon, 15 Apr 2019 15:57:23 -0500 Subject: [PATCH 03/99] net: atm: Fix potential Spectre v1 vulnerabilities [ Upstream commit 899537b73557aafbdd11050b501cf54b4f5c45af ] arg is controlled by user-space, hence leading to a potential exploitation of the Spectre variant 1 vulnerability. This issue was detected with the help of Smatch: net/atm/lec.c:715 lec_mcast_attach() warn: potential spectre issue 'dev_lec' [r] (local cap) Fix this by sanitizing arg before using it to index dev_lec. Notice that given that speculation windows are large, the policy is to kill the speculation on the first load and not worry if it can be completed with a dependent load/store [1]. [1] https://lore.kernel.org/lkml/20180423164740.GY17484@dhcp22.suse.cz/ Signed-off-by: Gustavo A. R. Silva Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- net/atm/lec.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/net/atm/lec.c b/net/atm/lec.c index d7f5cf5b7594..ad4f829193f0 100644 --- a/net/atm/lec.c +++ b/net/atm/lec.c @@ -710,7 +710,10 @@ static int lec_vcc_attach(struct atm_vcc *vcc, void __user *arg) static int lec_mcast_attach(struct atm_vcc *vcc, int arg) { - if (arg < 0 || arg >= MAX_LEC_ITF || !dev_lec[arg]) + if (arg < 0 || arg >= MAX_LEC_ITF) + return -EINVAL; + arg = array_index_nospec(arg, MAX_LEC_ITF); + if (!dev_lec[arg]) return -EINVAL; vcc->proto_data = dev_lec[arg]; return lec_mcast_make(netdev_priv(dev_lec[arg]), vcc); @@ -728,6 +731,7 @@ static int lecd_attach(struct atm_vcc *vcc, int arg) i = arg; if (arg >= MAX_LEC_ITF) return -EINVAL; + i = array_index_nospec(arg, MAX_LEC_ITF); if (!dev_lec[i]) { int size; From 08b0b4f28008ae81fa8fdf06d679a9edd6005d88 Mon Sep 17 00:00:00 2001 From: Nikolay Aleksandrov Date: Thu, 11 Apr 2019 13:56:39 +0300 Subject: [PATCH 04/99] net: bridge: fix per-port af_packet sockets [ Upstream commit 3b2e2904deb314cc77a2192f506f2fd44e3d10d0 ] When the commit below was introduced it changed two visible things: - the skb was no longer passed through the protocol handlers with the original device - the skb was passed up the stack with skb->dev = bridge The first change broke af_packet sockets on bridge ports. For example we use them for hostapd which listens for ETH_P_PAE packets on the ports. We discussed two possible fixes: - create a clone and pass it through NF_HOOK(), act on the original skb based on the result - somehow signal to the caller from the okfn() that it was called, meaning the skb is ok to be passed, which this patch is trying to implement via returning 1 from the bridge link-local okfn() Note that we rely on the fact that NF_QUEUE/STOLEN would return 0 and drop/error would return < 0 thus the okfn() is called only when the return was 1, so we signal to the caller that it was called by preserving the return value from nf_hook(). Fixes: 8626c56c8279 ("bridge: fix potential use-after-free when hook returns QUEUE or STOLEN verdict") Signed-off-by: Nikolay Aleksandrov Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- net/bridge/br_input.c | 23 ++++++++++++++--------- 1 file changed, 14 insertions(+), 9 deletions(-) diff --git a/net/bridge/br_input.c b/net/bridge/br_input.c index 72074276c088..fed0ff446abb 100644 --- a/net/bridge/br_input.c +++ b/net/bridge/br_input.c @@ -195,13 +195,10 @@ static void __br_handle_local_finish(struct sk_buff *skb) /* note: already called with rcu_read_lock */ static int br_handle_local_finish(struct net *net, struct sock *sk, struct sk_buff *skb) { - struct net_bridge_port *p = br_port_get_rcu(skb->dev); - __br_handle_local_finish(skb); - BR_INPUT_SKB_CB(skb)->brdev = p->br->dev; - br_pass_frame_up(skb); - return 0; + /* return 1 to signal the okfn() was called so it's ok to use the skb */ + return 1; } /* @@ -278,10 +275,18 @@ rx_handler_result_t br_handle_frame(struct sk_buff **pskb) goto forward; } - /* Deliver packet to local host only */ - NF_HOOK(NFPROTO_BRIDGE, NF_BR_LOCAL_IN, dev_net(skb->dev), - NULL, skb, skb->dev, NULL, br_handle_local_finish); - return RX_HANDLER_CONSUMED; + /* The else clause should be hit when nf_hook(): + * - returns < 0 (drop/error) + * - returns = 0 (stolen/nf_queue) + * Thus return 1 from the okfn() to signal the skb is ok to pass + */ + if (NF_HOOK(NFPROTO_BRIDGE, NF_BR_LOCAL_IN, + dev_net(skb->dev), NULL, skb, skb->dev, NULL, + br_handle_local_finish) == 1) { + return RX_HANDLER_PASS; + } else { + return RX_HANDLER_CONSUMED; + } } forward: From 97fd88e04c8d724301839129b63ef663e9397a65 Mon Sep 17 00:00:00 2001 From: Nikolay Aleksandrov Date: Thu, 11 Apr 2019 15:08:25 +0300 Subject: [PATCH 05/99] net: bridge: multicast: use rcu to access port list from br_multicast_start_querier [ Upstream commit c5b493ce192bd7a4e7bd073b5685aad121eeef82 ] br_multicast_start_querier() walks over the port list but it can be called from a timer with only multicast_lock held which doesn't protect the port list, so use RCU to walk over it. Fixes: c83b8fab06fc ("bridge: Restart queries when last querier expires") Signed-off-by: Nikolay Aleksandrov Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- net/bridge/br_multicast.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c index 20ed7adcf1cc..75901c4641b1 100644 --- a/net/bridge/br_multicast.c +++ b/net/bridge/br_multicast.c @@ -2152,7 +2152,8 @@ static void br_multicast_start_querier(struct net_bridge *br, __br_multicast_open(br, query); - list_for_each_entry(port, &br->port_list, list) { + rcu_read_lock(); + list_for_each_entry_rcu(port, &br->port_list, list) { if (port->state == BR_STATE_DISABLED || port->state == BR_STATE_BLOCKING) continue; @@ -2164,6 +2165,7 @@ static void br_multicast_start_querier(struct net_bridge *br, br_multicast_enable(&port->ip6_own_query); #endif } + rcu_read_unlock(); } int br_multicast_toggle(struct net_bridge *br, unsigned long val) From 2804598764f984a8c5cdee377a73e0bf86dc5dd4 Mon Sep 17 00:00:00 2001 From: Yuya Kusakabe Date: Tue, 16 Apr 2019 10:22:28 +0900 Subject: [PATCH 06/99] net: Fix missing meta data in skb with vlan packet [ Upstream commit d85e8be2a5a02869f815dd0ac2d743deb4cd7957 ] skb_reorder_vlan_header() should move XDP meta data with ethernet header if XDP meta data exists. Fixes: de8f3a83b0a0 ("bpf: add meta pointer for direct access") Signed-off-by: Yuya Kusakabe Signed-off-by: Takeru Hayasaka Co-developed-by: Takeru Hayasaka Reviewed-by: Toshiaki Makita Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- net/core/skbuff.c | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/net/core/skbuff.c b/net/core/skbuff.c index ceee28e184af..8b5768113acd 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -5071,7 +5071,8 @@ EXPORT_SYMBOL_GPL(skb_gso_validate_mac_len); static struct sk_buff *skb_reorder_vlan_header(struct sk_buff *skb) { - int mac_len; + int mac_len, meta_len; + void *meta; if (skb_cow(skb, skb_headroom(skb)) < 0) { kfree_skb(skb); @@ -5083,6 +5084,13 @@ static struct sk_buff *skb_reorder_vlan_header(struct sk_buff *skb) memmove(skb_mac_header(skb) + VLAN_HLEN, skb_mac_header(skb), mac_len - VLAN_HLEN - ETH_TLEN); } + + meta_len = skb_metadata_len(skb); + if (meta_len) { + meta = skb_metadata_end(skb) - meta_len; + memmove(meta + VLAN_HLEN, meta, meta_len); + } + skb->mac_header += VLAN_HLEN; return skb; } From 1cd8788368226016b70af2c1e7d46c1bdb4756c6 Mon Sep 17 00:00:00 2001 From: Lorenzo Bianconi Date: Tue, 9 Apr 2019 11:47:20 +0200 Subject: [PATCH 07/99] net: fou: do not use guehdr after iptunnel_pull_offloads in gue_udp_recv [ Upstream commit 988dc4a9a3b66be75b30405a5494faf0dc7cffb6 ] gue tunnels run iptunnel_pull_offloads on received skbs. This can determine a possible use-after-free accessing guehdr pointer since the packet will be 'uncloned' running pskb_expand_head if it is a cloned gso skb (e.g if the packet has been sent though a veth device) Fixes: a09a4c8dd1ec ("tunnels: Remove encapsulation offloads on decap") Signed-off-by: Lorenzo Bianconi Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- net/ipv4/fou.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/net/ipv4/fou.c b/net/ipv4/fou.c index 500a59906b87..854ff1e4c41f 100644 --- a/net/ipv4/fou.c +++ b/net/ipv4/fou.c @@ -120,6 +120,7 @@ static int gue_udp_recv(struct sock *sk, struct sk_buff *skb) struct guehdr *guehdr; void *data; u16 doffset = 0; + u8 proto_ctype; if (!fou) return 1; @@ -211,13 +212,14 @@ static int gue_udp_recv(struct sock *sk, struct sk_buff *skb) if (unlikely(guehdr->control)) return gue_control_message(skb, guehdr); + proto_ctype = guehdr->proto_ctype; __skb_pull(skb, sizeof(struct udphdr) + hdrlen); skb_reset_transport_header(skb); if (iptunnel_pull_offloads(skb)) goto drop; - return -guehdr->proto_ctype; + return -proto_ctype; drop: kfree_skb(skb); From 6728c6174a47b8a04ceec89aca9e1195dee7ff6b Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Tue, 16 Apr 2019 10:55:20 -0700 Subject: [PATCH 08/99] tcp: tcp_grow_window() needs to respect tcp_space() [ Upstream commit 50ce163a72d817a99e8974222dcf2886d5deb1ae ] For some reason, tcp_grow_window() correctly tests if enough room is present before attempting to increase tp->rcv_ssthresh, but does not prevent it to grow past tcp_space() This is causing hard to debug issues, like failing the (__tcp_select_window(sk) >= tp->rcv_wnd) test in __tcp_ack_snd_check(), causing ACK delays and possibly slow flows. Depending on tcp_rmem[2], MTU, skb->len/skb->truesize ratio, we can see the problem happening on "netperf -t TCP_RR -- -r 2000,2000" after about 60 round trips, when the active side no longer sends immediate acks. This bug predates git history. Signed-off-by: Eric Dumazet Acked-by: Soheil Hassas Yeganeh Acked-by: Neal Cardwell Acked-by: Wei Wang Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- net/ipv4/tcp_input.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index 572f79abd393..cfdd70e32755 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -402,11 +402,12 @@ static int __tcp_grow_window(const struct sock *sk, const struct sk_buff *skb) static void tcp_grow_window(struct sock *sk, const struct sk_buff *skb) { struct tcp_sock *tp = tcp_sk(sk); + int room; + + room = min_t(int, tp->window_clamp, tcp_space(sk)) - tp->rcv_ssthresh; /* Check #1 */ - if (tp->rcv_ssthresh < tp->window_clamp && - (int)tp->rcv_ssthresh < tcp_space(sk) && - !tcp_under_memory_pressure(sk)) { + if (room > 0 && !tcp_under_memory_pressure(sk)) { int incr; /* Check #2. Increase window, if skb with such overhead @@ -419,8 +420,7 @@ static void tcp_grow_window(struct sock *sk, const struct sk_buff *skb) if (incr) { incr = max_t(int, incr, 2 * skb->len); - tp->rcv_ssthresh = min(tp->rcv_ssthresh + incr, - tp->window_clamp); + tp->rcv_ssthresh += min(room, incr); inet_csk(sk)->icsk_ack.quick |= 1; } } From a60a47206a31b92af27ca65b66089b21c5b66d78 Mon Sep 17 00:00:00 2001 From: Hangbin Liu Date: Mon, 8 Apr 2019 16:45:17 +0800 Subject: [PATCH 09/99] team: set slave to promisc if team is already in promisc mode [ Upstream commit 43c2adb9df7ddd6560fd3546d925b42cef92daa0 ] After adding a team interface to bridge, the team interface will enter promisc mode. Then if we add a new slave to team0, the slave will keep promisc off. Fix it by setting slave to promisc on if team master is already in promisc mode, also do the same for allmulti. v2: add promisc and allmulti checking when delete ports Fixes: 3d249d4ca7d0 ("net: introduce ethernet teaming device") Signed-off-by: Hangbin Liu Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- drivers/net/team/team.c | 26 ++++++++++++++++++++++++++ 1 file changed, 26 insertions(+) diff --git a/drivers/net/team/team.c b/drivers/net/team/team.c index 95ee9d815d76..e23eaf3f6d03 100644 --- a/drivers/net/team/team.c +++ b/drivers/net/team/team.c @@ -1250,6 +1250,23 @@ static int team_port_add(struct team *team, struct net_device *port_dev, goto err_option_port_add; } + /* set promiscuity level to new slave */ + if (dev->flags & IFF_PROMISC) { + err = dev_set_promiscuity(port_dev, 1); + if (err) + goto err_set_slave_promisc; + } + + /* set allmulti level to new slave */ + if (dev->flags & IFF_ALLMULTI) { + err = dev_set_allmulti(port_dev, 1); + if (err) { + if (dev->flags & IFF_PROMISC) + dev_set_promiscuity(port_dev, -1); + goto err_set_slave_promisc; + } + } + netif_addr_lock_bh(dev); dev_uc_sync_multiple(port_dev, dev); dev_mc_sync_multiple(port_dev, dev); @@ -1266,6 +1283,9 @@ static int team_port_add(struct team *team, struct net_device *port_dev, return 0; +err_set_slave_promisc: + __team_option_inst_del_port(team, port); + err_option_port_add: team_upper_dev_unlink(team, port); @@ -1311,6 +1331,12 @@ static int team_port_del(struct team *team, struct net_device *port_dev) team_port_disable(team, port); list_del_rcu(&port->list); + + if (dev->flags & IFF_PROMISC) + dev_set_promiscuity(port_dev, -1); + if (dev->flags & IFF_ALLMULTI) + dev_set_allmulti(port_dev, -1); + team_upper_dev_unlink(team, port); netdev_rx_handler_unregister(port_dev); team_port_disable_netpoll(port); From 242e5746cb477bdb4c59d0f2d3c5a3e1c0a10629 Mon Sep 17 00:00:00 2001 From: Hoang Le Date: Tue, 9 Apr 2019 14:59:24 +0700 Subject: [PATCH 10/99] tipc: missing entries in name table of publications [ Upstream commit d1841533e54876f152a30ac398a34f47ad6590b1 ] When binding multiple services with specific type 1Ki, 2Ki.., this leads to some entries in the name table of publications missing when listed out via 'tipc name show'. The problem is at identify zero last_type conditional provided via netlink. The first is initial 'type' when starting name table dummping. The second is continuously with zero type (node state service type). Then, lookup function failure to finding node state service type in next iteration. To solve this, adding more conditional to marked as dirty type and lookup correct service type for the next iteration instead of select the first service as initial 'type' zero. Acked-by: Jon Maloy Signed-off-by: Hoang Le Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- net/tipc/name_table.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/net/tipc/name_table.c b/net/tipc/name_table.c index 66d5b2c5987a..d72985ca1d55 100644 --- a/net/tipc/name_table.c +++ b/net/tipc/name_table.c @@ -908,7 +908,8 @@ static int tipc_nl_service_list(struct net *net, struct tipc_nl_msg *msg, for (; i < TIPC_NAMETBL_SIZE; i++) { head = &tn->nametbl->services[i]; - if (*last_type) { + if (*last_type || + (!i && *last_key && (*last_lower == *last_key))) { service = tipc_service_find(net, *last_type); if (!service) return -EPIPE; From b82df42059fb90862505eb071dd0dd225a2791df Mon Sep 17 00:00:00 2001 From: Jason Wang Date: Tue, 9 Apr 2019 12:10:25 +0800 Subject: [PATCH 11/99] vhost: reject zero size iova range [ Upstream commit 813dbeb656d6c90266f251d8bd2b02d445afa63f ] We used to accept zero size iova range which will lead a infinite loop in translate_desc(). Fixing this by failing the request in this case. Reported-by: syzbot+d21e6e297322a900c128@syzkaller.appspotmail.com Fixes: 6b1e6cc7 ("vhost: new device IOTLB API") Signed-off-by: Jason Wang Acked-by: Michael S. Tsirkin Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- drivers/vhost/vhost.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c index b214a72d5caa..c163bc15976a 100644 --- a/drivers/vhost/vhost.c +++ b/drivers/vhost/vhost.c @@ -911,8 +911,12 @@ static int vhost_new_umem_range(struct vhost_umem *umem, u64 start, u64 size, u64 end, u64 userspace_addr, int perm) { - struct vhost_umem_node *tmp, *node = kmalloc(sizeof(*node), GFP_ATOMIC); + struct vhost_umem_node *tmp, *node; + if (!size) + return -EFAULT; + + node = kmalloc(sizeof(*node), GFP_ATOMIC); if (!node) return -ENOMEM; From 8a430e56a6485267a1b2d3747209d26c54d1a34b Mon Sep 17 00:00:00 2001 From: Stephen Suryaputra Date: Fri, 12 Apr 2019 16:19:27 -0400 Subject: [PATCH 12/99] ipv4: recompile ip options in ipv4_link_failure [ Upstream commit ed0de45a1008991fdaa27a0152befcb74d126a8b ] Recompile IP options since IPCB may not be valid anymore when ipv4_link_failure is called from arp_error_report. Refer to the commit 3da1ed7ac398 ("net: avoid use IPCB in cipso_v4_error") and the commit before that (9ef6b42ad6fd) for a similar issue. Signed-off-by: Stephen Suryaputra Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- net/ipv4/route.c | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/net/ipv4/route.c b/net/ipv4/route.c index 7a556e459375..444e0d0aa20b 100644 --- a/net/ipv4/route.c +++ b/net/ipv4/route.c @@ -1188,8 +1188,16 @@ static struct dst_entry *ipv4_dst_check(struct dst_entry *dst, u32 cookie) static void ipv4_link_failure(struct sk_buff *skb) { struct rtable *rt; + struct ip_options opt; - icmp_send(skb, ICMP_DEST_UNREACH, ICMP_HOST_UNREACH, 0); + /* Recompile ip options since IPCB may not be valid anymore. + */ + memset(&opt, 0, sizeof(opt)); + opt.optlen = ip_hdr(skb)->ihl*4 - sizeof(struct iphdr); + if (__ip_options_compile(dev_net(skb->dev), &opt, skb, NULL)) + return; + + __icmp_send(skb, ICMP_DEST_UNREACH, ICMP_HOST_UNREACH, 0, &opt); rt = skb_rtable(skb); if (rt) From 7ba5ec69e1a7ecfc662050b55e77078208dea9cc Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Sat, 13 Apr 2019 17:32:21 -0700 Subject: [PATCH 13/99] ipv4: ensure rcu_read_lock() in ipv4_link_failure() [ Upstream commit c543cb4a5f07e09237ec0fc2c60c9f131b2c79ad ] fib_compute_spec_dst() needs to be called under rcu protection. syzbot reported : WARNING: suspicious RCU usage 5.1.0-rc4+ #165 Not tainted include/linux/inetdevice.h:220 suspicious rcu_dereference_check() usage! other info that might help us debug this: rcu_scheduler_active = 2, debug_locks = 1 1 lock held by swapper/0/0: #0: 0000000051b67925 ((&n->timer)){+.-.}, at: lockdep_copy_map include/linux/lockdep.h:170 [inline] #0: 0000000051b67925 ((&n->timer)){+.-.}, at: call_timer_fn+0xda/0x720 kernel/time/timer.c:1315 stack backtrace: CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.1.0-rc4+ #165 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x172/0x1f0 lib/dump_stack.c:113 lockdep_rcu_suspicious+0x153/0x15d kernel/locking/lockdep.c:5162 __in_dev_get_rcu include/linux/inetdevice.h:220 [inline] fib_compute_spec_dst+0xbbd/0x1030 net/ipv4/fib_frontend.c:294 spec_dst_fill net/ipv4/ip_options.c:245 [inline] __ip_options_compile+0x15a7/0x1a10 net/ipv4/ip_options.c:343 ipv4_link_failure+0x172/0x400 net/ipv4/route.c:1195 dst_link_failure include/net/dst.h:427 [inline] arp_error_report+0xd1/0x1c0 net/ipv4/arp.c:297 neigh_invalidate+0x24b/0x570 net/core/neighbour.c:995 neigh_timer_handler+0xc35/0xf30 net/core/neighbour.c:1081 call_timer_fn+0x190/0x720 kernel/time/timer.c:1325 expire_timers kernel/time/timer.c:1362 [inline] __run_timers kernel/time/timer.c:1681 [inline] __run_timers kernel/time/timer.c:1649 [inline] run_timer_softirq+0x652/0x1700 kernel/time/timer.c:1694 __do_softirq+0x266/0x95a kernel/softirq.c:293 invoke_softirq kernel/softirq.c:374 [inline] irq_exit+0x180/0x1d0 kernel/softirq.c:414 exiting_irq arch/x86/include/asm/apic.h:536 [inline] smp_apic_timer_interrupt+0x14a/0x570 arch/x86/kernel/apic/apic.c:1062 apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:807 Fixes: ed0de45a1008 ("ipv4: recompile ip options in ipv4_link_failure") Signed-off-by: Eric Dumazet Reported-by: syzbot Cc: Stephen Suryaputra Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- net/ipv4/route.c | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-) diff --git a/net/ipv4/route.c b/net/ipv4/route.c index 444e0d0aa20b..98c81c21b753 100644 --- a/net/ipv4/route.c +++ b/net/ipv4/route.c @@ -1187,14 +1187,20 @@ static struct dst_entry *ipv4_dst_check(struct dst_entry *dst, u32 cookie) static void ipv4_link_failure(struct sk_buff *skb) { - struct rtable *rt; struct ip_options opt; + struct rtable *rt; + int res; /* Recompile ip options since IPCB may not be valid anymore. */ memset(&opt, 0, sizeof(opt)); opt.optlen = ip_hdr(skb)->ihl*4 - sizeof(struct iphdr); - if (__ip_options_compile(dev_net(skb->dev), &opt, skb, NULL)) + + rcu_read_lock(); + res = __ip_options_compile(dev_net(skb->dev), &opt, skb, NULL); + rcu_read_unlock(); + + if (res) return; __icmp_send(skb, ICMP_DEST_UNREACH, ICMP_HOST_UNREACH, 0, &opt); From 9de22b997fe40a0f783db73df5574d20242c6f09 Mon Sep 17 00:00:00 2001 From: Matteo Croce Date: Thu, 11 Apr 2019 12:26:32 +0200 Subject: [PATCH 14/99] net: thunderx: raise XDP MTU to 1508 [ Upstream commit 5ee15c101f29e0093ffb5448773ccbc786eb313b ] The thunderx driver splits frames bigger than 1530 bytes to multiple pages, making impossible to run an eBPF program on it. This leads to a maximum MTU of 1508 if QinQ is in use. The thunderx driver forbids to load an eBPF program if the MTU is higher than 1500 bytes. Raise the limit to 1508 so it is possible to use L2 protocols which need some more headroom. Fixes: 05c773f52b96e ("net: thunderx: Add basic XDP support") Signed-off-by: Matteo Croce Acked-by: Jesper Dangaard Brouer Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- drivers/net/ethernet/cavium/thunder/nicvf_main.c | 13 +++++++++++-- 1 file changed, 11 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/cavium/thunder/nicvf_main.c b/drivers/net/ethernet/cavium/thunder/nicvf_main.c index 9800738448ec..c9491dbea4d2 100644 --- a/drivers/net/ethernet/cavium/thunder/nicvf_main.c +++ b/drivers/net/ethernet/cavium/thunder/nicvf_main.c @@ -32,6 +32,13 @@ #define DRV_NAME "nicvf" #define DRV_VERSION "1.0" +/* NOTE: Packets bigger than 1530 are split across multiple pages and XDP needs + * the buffer to be contiguous. Allow XDP to be set up only if we don't exceed + * this value, keeping headroom for the 14 byte Ethernet header and two + * VLAN tags (for QinQ) + */ +#define MAX_XDP_MTU (1530 - ETH_HLEN - VLAN_HLEN * 2) + /* Supported devices */ static const struct pci_device_id nicvf_id_table[] = { { PCI_DEVICE_SUB(PCI_VENDOR_ID_CAVIUM, @@ -1795,8 +1802,10 @@ static int nicvf_xdp_setup(struct nicvf *nic, struct bpf_prog *prog) bool bpf_attached = false; int ret = 0; - /* For now just support only the usual MTU sized frames */ - if (prog && (dev->mtu > 1500)) { + /* For now just support only the usual MTU sized frames, + * plus some headroom for VLAN, QinQ. + */ + if (prog && dev->mtu > MAX_XDP_MTU) { netdev_warn(dev, "Jumbo frames not yet supported with XDP, current MTU %d.\n", dev->mtu); return -EOPNOTSUPP; From d1785bea2f3419507827b009b483a53fde80b928 Mon Sep 17 00:00:00 2001 From: Matteo Croce Date: Thu, 11 Apr 2019 12:26:33 +0200 Subject: [PATCH 15/99] net: thunderx: don't allow jumbo frames with XDP [ Upstream commit 1f227d16083b2e280b7dde4ca78883d75593f2fd ] The thunderx driver forbids to load an eBPF program if the MTU is too high, but this can be circumvented by loading the eBPF, then raising the MTU. Fix this by limiting the MTU if an eBPF program is already loaded. Fixes: 05c773f52b96e ("net: thunderx: Add basic XDP support") Signed-off-by: Matteo Croce Acked-by: Jesper Dangaard Brouer Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- drivers/net/ethernet/cavium/thunder/nicvf_main.c | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/drivers/net/ethernet/cavium/thunder/nicvf_main.c b/drivers/net/ethernet/cavium/thunder/nicvf_main.c index c9491dbea4d2..dca02b35c231 100644 --- a/drivers/net/ethernet/cavium/thunder/nicvf_main.c +++ b/drivers/net/ethernet/cavium/thunder/nicvf_main.c @@ -1554,6 +1554,15 @@ static int nicvf_change_mtu(struct net_device *netdev, int new_mtu) struct nicvf *nic = netdev_priv(netdev); int orig_mtu = netdev->mtu; + /* For now just support only the usual MTU sized frames, + * plus some headroom for VLAN, QinQ. + */ + if (nic->xdp_prog && new_mtu > MAX_XDP_MTU) { + netdev_warn(netdev, "Jumbo frames not yet supported with XDP, current MTU %d.\n", + netdev->mtu); + return -EINVAL; + } + netdev->mtu = new_mtu; if (!netif_running(netdev)) From 7cfddb81a81727362e0e4d3aea03af7d7d96d73f Mon Sep 17 00:00:00 2001 From: Saeed Mahameed Date: Tue, 19 Mar 2019 22:09:05 -0700 Subject: [PATCH 16/99] net/mlx5: FPGA, tls, hold rcu read lock a bit longer [ Upstream commit 31634bf5dcc418b5b2cacd954394c0c4620db6a2 ] To avoid use-after-free, hold the rcu read lock until we are done copying flow data into the command buffer. Fixes: ab412e1dd7db ("net/mlx5: Accel, add TLS rx offload routines") Reported-by: Eric Dumazet Signed-off-by: Saeed Mahameed Signed-off-by: Greg Kroah-Hartman --- .../net/ethernet/mellanox/mlx5/core/fpga/tls.c | 18 +++++++++--------- 1 file changed, 9 insertions(+), 9 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fpga/tls.c b/drivers/net/ethernet/mellanox/mlx5/core/fpga/tls.c index 8de64e88c670..08aa7266c8c0 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/fpga/tls.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/fpga/tls.c @@ -217,22 +217,22 @@ int mlx5_fpga_tls_resync_rx(struct mlx5_core_dev *mdev, u32 handle, u32 seq, void *cmd; int ret; - rcu_read_lock(); - flow = idr_find(&mdev->fpga->tls->rx_idr, ntohl(handle)); - rcu_read_unlock(); - - if (!flow) { - WARN_ONCE(1, "Received NULL pointer for handle\n"); - return -EINVAL; - } - buf = kzalloc(size, GFP_ATOMIC); if (!buf) return -ENOMEM; cmd = (buf + 1); + rcu_read_lock(); + flow = idr_find(&mdev->fpga->tls->rx_idr, ntohl(handle)); + if (unlikely(!flow)) { + rcu_read_unlock(); + WARN_ONCE(1, "Received NULL pointer for handle\n"); + kfree(buf); + return -EINVAL; + } mlx5_fpga_tls_flow_to_cmd(flow, cmd); + rcu_read_unlock(); MLX5_SET(tls_cmd, cmd, swid, ntohl(handle)); MLX5_SET64(tls_cmd, cmd, tls_rcd_sn, be64_to_cpu(rcd_sn)); From 785833b9eee027c0d31dfe96225e243f13110939 Mon Sep 17 00:00:00 2001 From: Jakub Kicinski Date: Mon, 8 Apr 2019 17:59:50 -0700 Subject: [PATCH 17/99] net/tls: prevent bad memory access in tls_is_sk_tx_device_offloaded() [ Upstream commit b4f47f3848eb70986f75d06112af7b48b7f5f462 ] Unlike '&&' operator, the '&' does not have short-circuit evaluation semantics. IOW both sides of the operator always get evaluated. Fix the wrong operator in tls_is_sk_tx_device_offloaded(), which would lead to out-of-bounds access for for non-full sockets. Fixes: 4799ac81e52a ("tls: Add rx inline crypto offload") Signed-off-by: Jakub Kicinski Reviewed-by: Dirk van der Merwe Reviewed-by: Simon Horman Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- include/net/tls.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/include/net/tls.h b/include/net/tls.h index 0a769cf2f5f3..c423b7d0b6ab 100644 --- a/include/net/tls.h +++ b/include/net/tls.h @@ -317,7 +317,7 @@ tls_validate_xmit_skb(struct sock *sk, struct net_device *dev, static inline bool tls_is_sk_tx_device_offloaded(struct sock *sk) { #ifdef CONFIG_SOCK_VALIDATE_XMIT - return sk_fullsock(sk) & + return sk_fullsock(sk) && (smp_load_acquire(&sk->sk_validate_xmit_skb) == &tls_validate_xmit_skb); #else From 1d2499b0860024c7cd973a8dee375105ffd34156 Mon Sep 17 00:00:00 2001 From: Saeed Mahameed Date: Tue, 19 Mar 2019 01:05:41 -0700 Subject: [PATCH 18/99] net/mlx5: FPGA, tls, idr remove on flow delete [ Upstream commit df3a8344d404a810b4aadbf19b08c8232fbaa715 ] Flow is kfreed on mlx5_fpga_tls_del_flow but kept in the idr data structure, this is risky and can cause use-after-free, since the idr_remove is delayed until tls_send_teardown_cmd completion. Instead of delaying idr_remove, in this patch we do it on mlx5_fpga_tls_del_flow, before actually kfree(flow). Added synchronize_rcu before kfree(flow) Fixes: ab412e1dd7db ("net/mlx5: Accel, add TLS rx offload routines") Signed-off-by: Saeed Mahameed Signed-off-by: Greg Kroah-Hartman --- .../ethernet/mellanox/mlx5/core/fpga/tls.c | 43 +++++++------------ 1 file changed, 15 insertions(+), 28 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fpga/tls.c b/drivers/net/ethernet/mellanox/mlx5/core/fpga/tls.c index 08aa7266c8c0..22a2ef111514 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/fpga/tls.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/fpga/tls.c @@ -148,14 +148,16 @@ static int mlx5_fpga_tls_alloc_swid(struct idr *idr, spinlock_t *idr_spinlock, return ret; } -static void mlx5_fpga_tls_release_swid(struct idr *idr, - spinlock_t *idr_spinlock, u32 swid) +static void *mlx5_fpga_tls_release_swid(struct idr *idr, + spinlock_t *idr_spinlock, u32 swid) { unsigned long flags; + void *ptr; spin_lock_irqsave(idr_spinlock, flags); - idr_remove(idr, swid); + ptr = idr_remove(idr, swid); spin_unlock_irqrestore(idr_spinlock, flags); + return ptr; } static void mlx_tls_kfree_complete(struct mlx5_fpga_conn *conn, @@ -165,20 +167,12 @@ static void mlx_tls_kfree_complete(struct mlx5_fpga_conn *conn, kfree(buf); } -struct mlx5_teardown_stream_context { - struct mlx5_fpga_tls_command_context cmd; - u32 swid; -}; - static void mlx5_fpga_tls_teardown_completion(struct mlx5_fpga_conn *conn, struct mlx5_fpga_device *fdev, struct mlx5_fpga_tls_command_context *cmd, struct mlx5_fpga_dma_buf *resp) { - struct mlx5_teardown_stream_context *ctx = - container_of(cmd, struct mlx5_teardown_stream_context, cmd); - if (resp) { u32 syndrome = MLX5_GET(tls_resp, resp->sg[0].data, syndrome); @@ -186,14 +180,6 @@ mlx5_fpga_tls_teardown_completion(struct mlx5_fpga_conn *conn, mlx5_fpga_err(fdev, "Teardown stream failed with syndrome = %d", syndrome); - else if (MLX5_GET(tls_cmd, cmd->buf.sg[0].data, direction_sx)) - mlx5_fpga_tls_release_swid(&fdev->tls->tx_idr, - &fdev->tls->tx_idr_spinlock, - ctx->swid); - else - mlx5_fpga_tls_release_swid(&fdev->tls->rx_idr, - &fdev->tls->rx_idr_spinlock, - ctx->swid); } mlx5_fpga_tls_put_command_ctx(cmd); } @@ -253,7 +239,7 @@ int mlx5_fpga_tls_resync_rx(struct mlx5_core_dev *mdev, u32 handle, u32 seq, static void mlx5_fpga_tls_send_teardown_cmd(struct mlx5_core_dev *mdev, void *flow, u32 swid, gfp_t flags) { - struct mlx5_teardown_stream_context *ctx; + struct mlx5_fpga_tls_command_context *ctx; struct mlx5_fpga_dma_buf *buf; void *cmd; @@ -261,7 +247,7 @@ static void mlx5_fpga_tls_send_teardown_cmd(struct mlx5_core_dev *mdev, if (!ctx) return; - buf = &ctx->cmd.buf; + buf = &ctx->buf; cmd = (ctx + 1); MLX5_SET(tls_cmd, cmd, command_type, CMD_TEARDOWN_STREAM); MLX5_SET(tls_cmd, cmd, swid, swid); @@ -272,8 +258,7 @@ static void mlx5_fpga_tls_send_teardown_cmd(struct mlx5_core_dev *mdev, buf->sg[0].data = cmd; buf->sg[0].size = MLX5_TLS_COMMAND_SIZE; - ctx->swid = swid; - mlx5_fpga_tls_cmd_send(mdev->fpga, &ctx->cmd, + mlx5_fpga_tls_cmd_send(mdev->fpga, ctx, mlx5_fpga_tls_teardown_completion); } @@ -283,13 +268,14 @@ void mlx5_fpga_tls_del_flow(struct mlx5_core_dev *mdev, u32 swid, struct mlx5_fpga_tls *tls = mdev->fpga->tls; void *flow; - rcu_read_lock(); if (direction_sx) - flow = idr_find(&tls->tx_idr, swid); + flow = mlx5_fpga_tls_release_swid(&tls->tx_idr, + &tls->tx_idr_spinlock, + swid); else - flow = idr_find(&tls->rx_idr, swid); - - rcu_read_unlock(); + flow = mlx5_fpga_tls_release_swid(&tls->rx_idr, + &tls->rx_idr_spinlock, + swid); if (!flow) { mlx5_fpga_err(mdev->fpga, "No flow information for swid %u\n", @@ -297,6 +283,7 @@ void mlx5_fpga_tls_del_flow(struct mlx5_core_dev *mdev, u32 swid, return; } + synchronize_rcu(); /* before kfree(flow) */ mlx5_fpga_tls_send_teardown_cmd(mdev, flow, swid, flags); } From 5f72cb2ab51d242158b6b963365d70f88844b983 Mon Sep 17 00:00:00 2001 From: Jonathan Lemon Date: Sun, 14 Apr 2019 14:21:29 -0700 Subject: [PATCH 19/99] route: Avoid crash from dereferencing NULL rt->from [ Upstream commit 9c69a13205151c0d801de9f9d83a818e6e8f60ec ] When __ip6_rt_update_pmtu() is called, rt->from is RCU dereferenced, but is never checked for null - rt6_flush_exceptions() may have removed the entry. [ 1913.989004] RIP: 0010:ip6_rt_cache_alloc+0x13/0x170 [ 1914.209410] Call Trace: [ 1914.214798] [ 1914.219226] __ip6_rt_update_pmtu+0xb0/0x190 [ 1914.228649] ip6_tnl_xmit+0x2c2/0x970 [ip6_tunnel] [ 1914.239223] ? ip6_tnl_parse_tlv_enc_lim+0x32/0x1a0 [ip6_tunnel] [ 1914.252489] ? __gre6_xmit+0x148/0x530 [ip6_gre] [ 1914.262678] ip6gre_tunnel_xmit+0x17e/0x3c7 [ip6_gre] [ 1914.273831] dev_hard_start_xmit+0x8d/0x1f0 [ 1914.283061] sch_direct_xmit+0xfa/0x230 [ 1914.291521] __qdisc_run+0x154/0x4b0 [ 1914.299407] net_tx_action+0x10e/0x1f0 [ 1914.307678] __do_softirq+0xca/0x297 [ 1914.315567] irq_exit+0x96/0xa0 [ 1914.322494] smp_apic_timer_interrupt+0x68/0x130 [ 1914.332683] apic_timer_interrupt+0xf/0x20 [ 1914.341721] Fixes: a68886a69180 ("net/ipv6: Make from in rt6_info rcu protected") Signed-off-by: Jonathan Lemon Reviewed-by: Eric Dumazet Reviewed-by: David Ahern Reviewed-by: Martin KaFai Lau Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- net/ipv6/route.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/net/ipv6/route.c b/net/ipv6/route.c index 9006bb3c9e72..06fa8425d82c 100644 --- a/net/ipv6/route.c +++ b/net/ipv6/route.c @@ -2367,6 +2367,10 @@ static void __ip6_rt_update_pmtu(struct dst_entry *dst, const struct sock *sk, rcu_read_lock(); from = rcu_dereference(rt6->from); + if (!from) { + rcu_read_unlock(); + return; + } nrt6 = ip6_rt_cache_alloc(from, daddr, saddr); if (nrt6) { rt6_do_update_pmtu(nrt6, mtu); From 490532225e20f2c3ef763c7da2c4e090ac42318a Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Toke=20H=C3=B8iland-J=C3=B8rgensen?= Date: Thu, 4 Apr 2019 15:01:33 +0200 Subject: [PATCH 20/99] sch_cake: Use tc_skb_protocol() helper for getting packet protocol MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit [ Upstream commit b2100cc56fca8c51d28aa42a9f1fbcb2cf351996 ] We shouldn't be using skb->protocol directly as that will miss cases with hardware-accelerated VLAN tags. Use the helper instead to get the right protocol number. Reported-by: Kevin Darbyshire-Bryant Signed-off-by: Toke Høiland-Jørgensen Signed-off-by: Greg Kroah-Hartman --- net/sched/sch_cake.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/sched/sch_cake.c b/net/sched/sch_cake.c index 793016d722ec..c5e87d1c26eb 100644 --- a/net/sched/sch_cake.c +++ b/net/sched/sch_cake.c @@ -1526,7 +1526,7 @@ static u8 cake_handle_diffserv(struct sk_buff *skb, u16 wash) { u8 dscp; - switch (skb->protocol) { + switch (tc_skb_protocol(skb)) { case htons(ETH_P_IP): dscp = ipv4_get_dsfield(ip_hdr(skb)) >> 2; if (wash && dscp) From cbce0413f783a5a9d49005a2187201df8eb15a38 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Toke=20H=C3=B8iland-J=C3=B8rgensen?= Date: Thu, 4 Apr 2019 15:01:33 +0200 Subject: [PATCH 21/99] sch_cake: Make sure we can write the IP header before changing DSCP bits MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit [ Upstream commit c87b4ecdbe8db27867a7b7f840291cd843406bd7 ] There is not actually any guarantee that the IP headers are valid before we access the DSCP bits of the packets. Fix this using the same approach taken in sch_dsmark. Reported-by: Kevin Darbyshire-Bryant Signed-off-by: Toke Høiland-Jørgensen Signed-off-by: Greg Kroah-Hartman --- net/sched/sch_cake.c | 11 +++++++++++ 1 file changed, 11 insertions(+) diff --git a/net/sched/sch_cake.c b/net/sched/sch_cake.c index c5e87d1c26eb..75ca80909cf8 100644 --- a/net/sched/sch_cake.c +++ b/net/sched/sch_cake.c @@ -1524,16 +1524,27 @@ static void cake_wash_diffserv(struct sk_buff *skb) static u8 cake_handle_diffserv(struct sk_buff *skb, u16 wash) { + int wlen = skb_network_offset(skb); u8 dscp; switch (tc_skb_protocol(skb)) { case htons(ETH_P_IP): + wlen += sizeof(struct iphdr); + if (!pskb_may_pull(skb, wlen) || + skb_try_make_writable(skb, wlen)) + return 0; + dscp = ipv4_get_dsfield(ip_hdr(skb)) >> 2; if (wash && dscp) ipv4_change_dsfield(ip_hdr(skb), INET_ECN_MASK, 0); return dscp; case htons(ETH_P_IPV6): + wlen += sizeof(struct ipv6hdr); + if (!pskb_may_pull(skb, wlen) || + skb_try_make_writable(skb, wlen)) + return 0; + dscp = ipv6_get_dsfield(ipv6_hdr(skb)) >> 2; if (wash && dscp) ipv6_change_dsfield(ipv6_hdr(skb), INET_ECN_MASK, 0); From 06f7d2182f9d850e762a8870633d10e99cfc10a8 Mon Sep 17 00:00:00 2001 From: Pieter Jansen van Vuuren Date: Mon, 1 Apr 2019 19:36:33 -0700 Subject: [PATCH 22/99] nfp: flower: replace CFI with vlan present [ Upstream commit f7ee799a51ddbcc205ef615fe424fb5084e9e0aa ] Replace vlan CFI bit with a vlan present bit that indicates the presence of a vlan tag. Previously the driver incorrectly assumed that an vlan id of 0 is not matchable, therefore we indicate vlan presence with a vlan present bit. Fixes: 5571e8c9f241 ("nfp: extend flower matching capabilities") Signed-off-by: Pieter Jansen van Vuuren Signed-off-by: Louis Peens Signed-off-by: Greg Kroah-Hartman --- drivers/net/ethernet/netronome/nfp/flower/cmsg.h | 2 +- drivers/net/ethernet/netronome/nfp/flower/match.c | 14 ++++++-------- 2 files changed, 7 insertions(+), 9 deletions(-) diff --git a/drivers/net/ethernet/netronome/nfp/flower/cmsg.h b/drivers/net/ethernet/netronome/nfp/flower/cmsg.h index 325954b829c8..3b7a8630530a 100644 --- a/drivers/net/ethernet/netronome/nfp/flower/cmsg.h +++ b/drivers/net/ethernet/netronome/nfp/flower/cmsg.h @@ -55,7 +55,7 @@ #define NFP_FLOWER_LAYER2_GENEVE_OP BIT(6) #define NFP_FLOWER_MASK_VLAN_PRIO GENMASK(15, 13) -#define NFP_FLOWER_MASK_VLAN_CFI BIT(12) +#define NFP_FLOWER_MASK_VLAN_PRESENT BIT(12) #define NFP_FLOWER_MASK_VLAN_VID GENMASK(11, 0) #define NFP_FLOWER_MASK_MPLS_LB GENMASK(31, 12) diff --git a/drivers/net/ethernet/netronome/nfp/flower/match.c b/drivers/net/ethernet/netronome/nfp/flower/match.c index 17acb8cc6044..b99d55cf81f1 100644 --- a/drivers/net/ethernet/netronome/nfp/flower/match.c +++ b/drivers/net/ethernet/netronome/nfp/flower/match.c @@ -56,14 +56,12 @@ nfp_flower_compile_meta_tci(struct nfp_flower_meta_tci *frame, FLOW_DISSECTOR_KEY_VLAN, target); /* Populate the tci field. */ - if (flow_vlan->vlan_id || flow_vlan->vlan_priority) { - tmp_tci = FIELD_PREP(NFP_FLOWER_MASK_VLAN_PRIO, - flow_vlan->vlan_priority) | - FIELD_PREP(NFP_FLOWER_MASK_VLAN_VID, - flow_vlan->vlan_id) | - NFP_FLOWER_MASK_VLAN_CFI; - frame->tci = cpu_to_be16(tmp_tci); - } + tmp_tci = NFP_FLOWER_MASK_VLAN_PRESENT; + tmp_tci |= FIELD_PREP(NFP_FLOWER_MASK_VLAN_PRIO, + flow_vlan->vlan_priority) | + FIELD_PREP(NFP_FLOWER_MASK_VLAN_VID, + flow_vlan->vlan_id); + frame->tci = cpu_to_be16(tmp_tci); } } From 8d9051a4680abf7316b440278fe52b9437503e90 Mon Sep 17 00:00:00 2001 From: Pieter Jansen van Vuuren Date: Mon, 1 Apr 2019 19:36:34 -0700 Subject: [PATCH 23/99] nfp: flower: remove vlan CFI bit from push vlan action [ Upstream commit 42cd5484a22f1a1b947e21e2af65fa7dab09d017 ] We no longer set CFI when pushing vlan tags, therefore we remove the CFI bit from push vlan. Fixes: 1a1e586f54bf ("nfp: add basic action capabilities to flower offloads") Signed-off-by: Pieter Jansen van Vuuren Signed-off-by: Louis Peens Signed-off-by: Greg Kroah-Hartman --- drivers/net/ethernet/netronome/nfp/flower/action.c | 3 +-- drivers/net/ethernet/netronome/nfp/flower/cmsg.h | 1 - 2 files changed, 1 insertion(+), 3 deletions(-) diff --git a/drivers/net/ethernet/netronome/nfp/flower/action.c b/drivers/net/ethernet/netronome/nfp/flower/action.c index 7a1e9cd9cc62..777b99416062 100644 --- a/drivers/net/ethernet/netronome/nfp/flower/action.c +++ b/drivers/net/ethernet/netronome/nfp/flower/action.c @@ -80,8 +80,7 @@ nfp_fl_push_vlan(struct nfp_fl_push_vlan *push_vlan, tmp_push_vlan_tci = FIELD_PREP(NFP_FL_PUSH_VLAN_PRIO, tcf_vlan_push_prio(action)) | - FIELD_PREP(NFP_FL_PUSH_VLAN_VID, tcf_vlan_push_vid(action)) | - NFP_FL_PUSH_VLAN_CFI; + FIELD_PREP(NFP_FL_PUSH_VLAN_VID, tcf_vlan_push_vid(action)); push_vlan->vlan_tci = cpu_to_be16(tmp_push_vlan_tci); } diff --git a/drivers/net/ethernet/netronome/nfp/flower/cmsg.h b/drivers/net/ethernet/netronome/nfp/flower/cmsg.h index 3b7a8630530a..9b018321e24e 100644 --- a/drivers/net/ethernet/netronome/nfp/flower/cmsg.h +++ b/drivers/net/ethernet/netronome/nfp/flower/cmsg.h @@ -109,7 +109,6 @@ #define NFP_FL_OUT_FLAGS_TYPE_IDX GENMASK(2, 0) #define NFP_FL_PUSH_VLAN_PRIO GENMASK(15, 13) -#define NFP_FL_PUSH_VLAN_CFI BIT(12) #define NFP_FL_PUSH_VLAN_VID GENMASK(11, 0) /* LAG ports */ From e24be8e38cd7fcc85174595690b20d153f8e204d Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Toke=20H=C3=B8iland-J=C3=B8rgensen?= Date: Fri, 5 Apr 2019 15:01:59 +0200 Subject: [PATCH 24/99] sch_cake: Simplify logic in cake_select_tin() MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit [ Upstream commit 4976e3c683f328bc6f2edef555a4ffee6524486f ] The logic in cake_select_tin() was getting a bit hairy, and it turns out we can simplify it quite a bit. This also allows us to get rid of one of the two diffserv parsing functions, which has the added benefit that already-zeroed DSCP fields won't get re-written. Suggested-by: Kevin Darbyshire-Bryant Signed-off-by: Toke Høiland-Jørgensen Signed-off-by: Greg Kroah-Hartman --- net/sched/sch_cake.c | 44 ++++++++++++++++---------------------------- 1 file changed, 16 insertions(+), 28 deletions(-) diff --git a/net/sched/sch_cake.c b/net/sched/sch_cake.c index 75ca80909cf8..9fd37d91b5ed 100644 --- a/net/sched/sch_cake.c +++ b/net/sched/sch_cake.c @@ -1508,20 +1508,6 @@ static unsigned int cake_drop(struct Qdisc *sch, struct sk_buff **to_free) return idx + (tin << 16); } -static void cake_wash_diffserv(struct sk_buff *skb) -{ - switch (skb->protocol) { - case htons(ETH_P_IP): - ipv4_change_dsfield(ip_hdr(skb), INET_ECN_MASK, 0); - break; - case htons(ETH_P_IPV6): - ipv6_change_dsfield(ipv6_hdr(skb), INET_ECN_MASK, 0); - break; - default: - break; - } -} - static u8 cake_handle_diffserv(struct sk_buff *skb, u16 wash) { int wlen = skb_network_offset(skb); @@ -1564,25 +1550,27 @@ static struct cake_tin_data *cake_select_tin(struct Qdisc *sch, { struct cake_sched_data *q = qdisc_priv(sch); u32 tin; + u8 dscp; - if (TC_H_MAJ(skb->priority) == sch->handle && - TC_H_MIN(skb->priority) > 0 && - TC_H_MIN(skb->priority) <= q->tin_cnt) { + /* Tin selection: Default to diffserv-based selection, allow overriding + * using firewall marks or skb->priority. + */ + dscp = cake_handle_diffserv(skb, + q->rate_flags & CAKE_FLAG_WASH); + + if (q->tin_mode == CAKE_DIFFSERV_BESTEFFORT) + tin = 0; + + else if (TC_H_MAJ(skb->priority) == sch->handle && + TC_H_MIN(skb->priority) > 0 && + TC_H_MIN(skb->priority) <= q->tin_cnt) tin = q->tin_order[TC_H_MIN(skb->priority) - 1]; - if (q->rate_flags & CAKE_FLAG_WASH) - cake_wash_diffserv(skb); - } else if (q->tin_mode != CAKE_DIFFSERV_BESTEFFORT) { - /* extract the Diffserv Precedence field, if it exists */ - /* and clear DSCP bits if washing */ - tin = q->tin_index[cake_handle_diffserv(skb, - q->rate_flags & CAKE_FLAG_WASH)]; + else { + tin = q->tin_index[dscp]; + if (unlikely(tin >= q->tin_cnt)) tin = 0; - } else { - tin = 0; - if (q->rate_flags & CAKE_FLAG_WASH) - cake_wash_diffserv(skb); } return &q->tins[tin]; From 702ddf862d9d74e670849df659b8706fd6878205 Mon Sep 17 00:00:00 2001 From: Peter Oskolkov Date: Tue, 23 Apr 2019 10:25:31 -0700 Subject: [PATCH 25/99] net: IP defrag: encapsulate rbtree defrag code into callable functions [ Upstream commit c23f35d19db3b36ffb9e04b08f1d91565d15f84f ] This is a refactoring patch: without changing runtime behavior, it moves rbtree-related code from IPv4-specific files/functions into .h/.c defrag files shared with IPv6 defragmentation code. v2: make handling of overlapping packets match upstream. Signed-off-by: Peter Oskolkov Cc: Eric Dumazet Cc: Florian Westphal Cc: Tom Herbert Signed-off-by: David S. Miller Signed-off-by: Sasha Levin --- include/net/inet_frag.h | 16 ++- net/ipv4/inet_fragment.c | 293 +++++++++++++++++++++++++++++++++++++ net/ipv4/ip_fragment.c | 302 +++++---------------------------------- 3 files changed, 342 insertions(+), 269 deletions(-) diff --git a/include/net/inet_frag.h b/include/net/inet_frag.h index 1662cbc0b46b..b02bf737d019 100644 --- a/include/net/inet_frag.h +++ b/include/net/inet_frag.h @@ -77,8 +77,8 @@ struct inet_frag_queue { struct timer_list timer; spinlock_t lock; refcount_t refcnt; - struct sk_buff *fragments; /* Used in IPv6. */ - struct rb_root rb_fragments; /* Used in IPv4. */ + struct sk_buff *fragments; /* used in 6lopwpan IPv6. */ + struct rb_root rb_fragments; /* Used in IPv4/IPv6. */ struct sk_buff *fragments_tail; struct sk_buff *last_run_head; ktime_t stamp; @@ -153,4 +153,16 @@ static inline void add_frag_mem_limit(struct netns_frags *nf, long val) extern const u8 ip_frag_ecn_table[16]; +/* Return values of inet_frag_queue_insert() */ +#define IPFRAG_OK 0 +#define IPFRAG_DUP 1 +#define IPFRAG_OVERLAP 2 +int inet_frag_queue_insert(struct inet_frag_queue *q, struct sk_buff *skb, + int offset, int end); +void *inet_frag_reasm_prepare(struct inet_frag_queue *q, struct sk_buff *skb, + struct sk_buff *parent); +void inet_frag_reasm_finish(struct inet_frag_queue *q, struct sk_buff *head, + void *reasm_data); +struct sk_buff *inet_frag_pull_head(struct inet_frag_queue *q); + #endif diff --git a/net/ipv4/inet_fragment.c b/net/ipv4/inet_fragment.c index 760a9e52e02b..9f69411251d0 100644 --- a/net/ipv4/inet_fragment.c +++ b/net/ipv4/inet_fragment.c @@ -25,6 +25,62 @@ #include #include #include +#include +#include + +/* Use skb->cb to track consecutive/adjacent fragments coming at + * the end of the queue. Nodes in the rb-tree queue will + * contain "runs" of one or more adjacent fragments. + * + * Invariants: + * - next_frag is NULL at the tail of a "run"; + * - the head of a "run" has the sum of all fragment lengths in frag_run_len. + */ +struct ipfrag_skb_cb { + union { + struct inet_skb_parm h4; + struct inet6_skb_parm h6; + }; + struct sk_buff *next_frag; + int frag_run_len; +}; + +#define FRAG_CB(skb) ((struct ipfrag_skb_cb *)((skb)->cb)) + +static void fragcb_clear(struct sk_buff *skb) +{ + RB_CLEAR_NODE(&skb->rbnode); + FRAG_CB(skb)->next_frag = NULL; + FRAG_CB(skb)->frag_run_len = skb->len; +} + +/* Append skb to the last "run". */ +static void fragrun_append_to_last(struct inet_frag_queue *q, + struct sk_buff *skb) +{ + fragcb_clear(skb); + + FRAG_CB(q->last_run_head)->frag_run_len += skb->len; + FRAG_CB(q->fragments_tail)->next_frag = skb; + q->fragments_tail = skb; +} + +/* Create a new "run" with the skb. */ +static void fragrun_create(struct inet_frag_queue *q, struct sk_buff *skb) +{ + BUILD_BUG_ON(sizeof(struct ipfrag_skb_cb) > sizeof(skb->cb)); + fragcb_clear(skb); + + if (q->last_run_head) + rb_link_node(&skb->rbnode, &q->last_run_head->rbnode, + &q->last_run_head->rbnode.rb_right); + else + rb_link_node(&skb->rbnode, NULL, &q->rb_fragments.rb_node); + rb_insert_color(&skb->rbnode, &q->rb_fragments); + + q->fragments_tail = skb; + q->last_run_head = skb; +} /* Given the OR values of all fragments, apply RFC 3168 5.3 requirements * Value : 0xff if frame should be dropped. @@ -123,6 +179,28 @@ static void inet_frag_destroy_rcu(struct rcu_head *head) kmem_cache_free(f->frags_cachep, q); } +unsigned int inet_frag_rbtree_purge(struct rb_root *root) +{ + struct rb_node *p = rb_first(root); + unsigned int sum = 0; + + while (p) { + struct sk_buff *skb = rb_entry(p, struct sk_buff, rbnode); + + p = rb_next(p); + rb_erase(&skb->rbnode, root); + while (skb) { + struct sk_buff *next = FRAG_CB(skb)->next_frag; + + sum += skb->truesize; + kfree_skb(skb); + skb = next; + } + } + return sum; +} +EXPORT_SYMBOL(inet_frag_rbtree_purge); + void inet_frag_destroy(struct inet_frag_queue *q) { struct sk_buff *fp; @@ -224,3 +302,218 @@ struct inet_frag_queue *inet_frag_find(struct netns_frags *nf, void *key) return fq; } EXPORT_SYMBOL(inet_frag_find); + +int inet_frag_queue_insert(struct inet_frag_queue *q, struct sk_buff *skb, + int offset, int end) +{ + struct sk_buff *last = q->fragments_tail; + + /* RFC5722, Section 4, amended by Errata ID : 3089 + * When reassembling an IPv6 datagram, if + * one or more its constituent fragments is determined to be an + * overlapping fragment, the entire datagram (and any constituent + * fragments) MUST be silently discarded. + * + * Duplicates, however, should be ignored (i.e. skb dropped, but the + * queue/fragments kept for later reassembly). + */ + if (!last) + fragrun_create(q, skb); /* First fragment. */ + else if (last->ip_defrag_offset + last->len < end) { + /* This is the common case: skb goes to the end. */ + /* Detect and discard overlaps. */ + if (offset < last->ip_defrag_offset + last->len) + return IPFRAG_OVERLAP; + if (offset == last->ip_defrag_offset + last->len) + fragrun_append_to_last(q, skb); + else + fragrun_create(q, skb); + } else { + /* Binary search. Note that skb can become the first fragment, + * but not the last (covered above). + */ + struct rb_node **rbn, *parent; + + rbn = &q->rb_fragments.rb_node; + do { + struct sk_buff *curr; + int curr_run_end; + + parent = *rbn; + curr = rb_to_skb(parent); + curr_run_end = curr->ip_defrag_offset + + FRAG_CB(curr)->frag_run_len; + if (end <= curr->ip_defrag_offset) + rbn = &parent->rb_left; + else if (offset >= curr_run_end) + rbn = &parent->rb_right; + else if (offset >= curr->ip_defrag_offset && + end <= curr_run_end) + return IPFRAG_DUP; + else + return IPFRAG_OVERLAP; + } while (*rbn); + /* Here we have parent properly set, and rbn pointing to + * one of its NULL left/right children. Insert skb. + */ + fragcb_clear(skb); + rb_link_node(&skb->rbnode, parent, rbn); + rb_insert_color(&skb->rbnode, &q->rb_fragments); + } + + skb->ip_defrag_offset = offset; + + return IPFRAG_OK; +} +EXPORT_SYMBOL(inet_frag_queue_insert); + +void *inet_frag_reasm_prepare(struct inet_frag_queue *q, struct sk_buff *skb, + struct sk_buff *parent) +{ + struct sk_buff *fp, *head = skb_rb_first(&q->rb_fragments); + struct sk_buff **nextp; + int delta; + + if (head != skb) { + fp = skb_clone(skb, GFP_ATOMIC); + if (!fp) + return NULL; + FRAG_CB(fp)->next_frag = FRAG_CB(skb)->next_frag; + if (RB_EMPTY_NODE(&skb->rbnode)) + FRAG_CB(parent)->next_frag = fp; + else + rb_replace_node(&skb->rbnode, &fp->rbnode, + &q->rb_fragments); + if (q->fragments_tail == skb) + q->fragments_tail = fp; + skb_morph(skb, head); + FRAG_CB(skb)->next_frag = FRAG_CB(head)->next_frag; + rb_replace_node(&head->rbnode, &skb->rbnode, + &q->rb_fragments); + consume_skb(head); + head = skb; + } + WARN_ON(head->ip_defrag_offset != 0); + + delta = -head->truesize; + + /* Head of list must not be cloned. */ + if (skb_unclone(head, GFP_ATOMIC)) + return NULL; + + delta += head->truesize; + if (delta) + add_frag_mem_limit(q->net, delta); + + /* If the first fragment is fragmented itself, we split + * it to two chunks: the first with data and paged part + * and the second, holding only fragments. + */ + if (skb_has_frag_list(head)) { + struct sk_buff *clone; + int i, plen = 0; + + clone = alloc_skb(0, GFP_ATOMIC); + if (!clone) + return NULL; + skb_shinfo(clone)->frag_list = skb_shinfo(head)->frag_list; + skb_frag_list_init(head); + for (i = 0; i < skb_shinfo(head)->nr_frags; i++) + plen += skb_frag_size(&skb_shinfo(head)->frags[i]); + clone->data_len = head->data_len - plen; + clone->len = clone->data_len; + head->truesize += clone->truesize; + clone->csum = 0; + clone->ip_summed = head->ip_summed; + add_frag_mem_limit(q->net, clone->truesize); + skb_shinfo(head)->frag_list = clone; + nextp = &clone->next; + } else { + nextp = &skb_shinfo(head)->frag_list; + } + + return nextp; +} +EXPORT_SYMBOL(inet_frag_reasm_prepare); + +void inet_frag_reasm_finish(struct inet_frag_queue *q, struct sk_buff *head, + void *reasm_data) +{ + struct sk_buff **nextp = (struct sk_buff **)reasm_data; + struct rb_node *rbn; + struct sk_buff *fp; + + skb_push(head, head->data - skb_network_header(head)); + + /* Traverse the tree in order, to build frag_list. */ + fp = FRAG_CB(head)->next_frag; + rbn = rb_next(&head->rbnode); + rb_erase(&head->rbnode, &q->rb_fragments); + while (rbn || fp) { + /* fp points to the next sk_buff in the current run; + * rbn points to the next run. + */ + /* Go through the current run. */ + while (fp) { + *nextp = fp; + nextp = &fp->next; + fp->prev = NULL; + memset(&fp->rbnode, 0, sizeof(fp->rbnode)); + fp->sk = NULL; + head->data_len += fp->len; + head->len += fp->len; + if (head->ip_summed != fp->ip_summed) + head->ip_summed = CHECKSUM_NONE; + else if (head->ip_summed == CHECKSUM_COMPLETE) + head->csum = csum_add(head->csum, fp->csum); + head->truesize += fp->truesize; + fp = FRAG_CB(fp)->next_frag; + } + /* Move to the next run. */ + if (rbn) { + struct rb_node *rbnext = rb_next(rbn); + + fp = rb_to_skb(rbn); + rb_erase(rbn, &q->rb_fragments); + rbn = rbnext; + } + } + sub_frag_mem_limit(q->net, head->truesize); + + *nextp = NULL; + skb_mark_not_on_list(head); + head->prev = NULL; + head->tstamp = q->stamp; +} +EXPORT_SYMBOL(inet_frag_reasm_finish); + +struct sk_buff *inet_frag_pull_head(struct inet_frag_queue *q) +{ + struct sk_buff *head; + + if (q->fragments) { + head = q->fragments; + q->fragments = head->next; + } else { + struct sk_buff *skb; + + head = skb_rb_first(&q->rb_fragments); + if (!head) + return NULL; + skb = FRAG_CB(head)->next_frag; + if (skb) + rb_replace_node(&head->rbnode, &skb->rbnode, + &q->rb_fragments); + else + rb_erase(&head->rbnode, &q->rb_fragments); + memset(&head->rbnode, 0, sizeof(head->rbnode)); + barrier(); + } + if (head == q->fragments_tail) + q->fragments_tail = NULL; + + sub_frag_mem_limit(q->net, head->truesize); + + return head; +} +EXPORT_SYMBOL(inet_frag_pull_head); diff --git a/net/ipv4/ip_fragment.c b/net/ipv4/ip_fragment.c index d95b32af4a0e..5a1d39e32196 100644 --- a/net/ipv4/ip_fragment.c +++ b/net/ipv4/ip_fragment.c @@ -57,57 +57,6 @@ */ static const char ip_frag_cache_name[] = "ip4-frags"; -/* Use skb->cb to track consecutive/adjacent fragments coming at - * the end of the queue. Nodes in the rb-tree queue will - * contain "runs" of one or more adjacent fragments. - * - * Invariants: - * - next_frag is NULL at the tail of a "run"; - * - the head of a "run" has the sum of all fragment lengths in frag_run_len. - */ -struct ipfrag_skb_cb { - struct inet_skb_parm h; - struct sk_buff *next_frag; - int frag_run_len; -}; - -#define FRAG_CB(skb) ((struct ipfrag_skb_cb *)((skb)->cb)) - -static void ip4_frag_init_run(struct sk_buff *skb) -{ - BUILD_BUG_ON(sizeof(struct ipfrag_skb_cb) > sizeof(skb->cb)); - - FRAG_CB(skb)->next_frag = NULL; - FRAG_CB(skb)->frag_run_len = skb->len; -} - -/* Append skb to the last "run". */ -static void ip4_frag_append_to_last_run(struct inet_frag_queue *q, - struct sk_buff *skb) -{ - RB_CLEAR_NODE(&skb->rbnode); - FRAG_CB(skb)->next_frag = NULL; - - FRAG_CB(q->last_run_head)->frag_run_len += skb->len; - FRAG_CB(q->fragments_tail)->next_frag = skb; - q->fragments_tail = skb; -} - -/* Create a new "run" with the skb. */ -static void ip4_frag_create_run(struct inet_frag_queue *q, struct sk_buff *skb) -{ - if (q->last_run_head) - rb_link_node(&skb->rbnode, &q->last_run_head->rbnode, - &q->last_run_head->rbnode.rb_right); - else - rb_link_node(&skb->rbnode, NULL, &q->rb_fragments.rb_node); - rb_insert_color(&skb->rbnode, &q->rb_fragments); - - ip4_frag_init_run(skb); - q->fragments_tail = skb; - q->last_run_head = skb; -} - /* Describe an entry in the "incomplete datagrams" queue. */ struct ipq { struct inet_frag_queue q; @@ -212,27 +161,9 @@ static void ip_expire(struct timer_list *t) * pull the head out of the tree in order to be able to * deal with head->dev. */ - if (qp->q.fragments) { - head = qp->q.fragments; - qp->q.fragments = head->next; - } else { - head = skb_rb_first(&qp->q.rb_fragments); - if (!head) - goto out; - if (FRAG_CB(head)->next_frag) - rb_replace_node(&head->rbnode, - &FRAG_CB(head)->next_frag->rbnode, - &qp->q.rb_fragments); - else - rb_erase(&head->rbnode, &qp->q.rb_fragments); - memset(&head->rbnode, 0, sizeof(head->rbnode)); - barrier(); - } - if (head == qp->q.fragments_tail) - qp->q.fragments_tail = NULL; - - sub_frag_mem_limit(qp->q.net, head->truesize); - + head = inet_frag_pull_head(&qp->q); + if (!head) + goto out; head->dev = dev_get_by_index_rcu(net, qp->iif); if (!head->dev) goto out; @@ -345,12 +276,10 @@ static int ip_frag_reinit(struct ipq *qp) static int ip_frag_queue(struct ipq *qp, struct sk_buff *skb) { struct net *net = container_of(qp->q.net, struct net, ipv4.frags); - struct rb_node **rbn, *parent; - struct sk_buff *skb1, *prev_tail; - int ihl, end, skb1_run_end; + int ihl, end, flags, offset; + struct sk_buff *prev_tail; struct net_device *dev; unsigned int fragsize; - int flags, offset; int err = -ENOENT; u8 ecn; @@ -382,7 +311,7 @@ static int ip_frag_queue(struct ipq *qp, struct sk_buff *skb) */ if (end < qp->q.len || ((qp->q.flags & INET_FRAG_LAST_IN) && end != qp->q.len)) - goto err; + goto discard_qp; qp->q.flags |= INET_FRAG_LAST_IN; qp->q.len = end; } else { @@ -394,82 +323,33 @@ static int ip_frag_queue(struct ipq *qp, struct sk_buff *skb) if (end > qp->q.len) { /* Some bits beyond end -> corruption. */ if (qp->q.flags & INET_FRAG_LAST_IN) - goto err; + goto discard_qp; qp->q.len = end; } } if (end == offset) - goto err; + goto discard_qp; err = -ENOMEM; if (!pskb_pull(skb, skb_network_offset(skb) + ihl)) - goto err; + goto discard_qp; err = pskb_trim_rcsum(skb, end - offset); if (err) - goto err; + goto discard_qp; /* Note : skb->rbnode and skb->dev share the same location. */ dev = skb->dev; /* Makes sure compiler wont do silly aliasing games */ barrier(); - /* RFC5722, Section 4, amended by Errata ID : 3089 - * When reassembling an IPv6 datagram, if - * one or more its constituent fragments is determined to be an - * overlapping fragment, the entire datagram (and any constituent - * fragments) MUST be silently discarded. - * - * We do the same here for IPv4 (and increment an snmp counter) but - * we do not want to drop the whole queue in response to a duplicate - * fragment. - */ - - err = -EINVAL; - /* Find out where to put this fragment. */ prev_tail = qp->q.fragments_tail; - if (!prev_tail) - ip4_frag_create_run(&qp->q, skb); /* First fragment. */ - else if (prev_tail->ip_defrag_offset + prev_tail->len < end) { - /* This is the common case: skb goes to the end. */ - /* Detect and discard overlaps. */ - if (offset < prev_tail->ip_defrag_offset + prev_tail->len) - goto discard_qp; - if (offset == prev_tail->ip_defrag_offset + prev_tail->len) - ip4_frag_append_to_last_run(&qp->q, skb); - else - ip4_frag_create_run(&qp->q, skb); - } else { - /* Binary search. Note that skb can become the first fragment, - * but not the last (covered above). - */ - rbn = &qp->q.rb_fragments.rb_node; - do { - parent = *rbn; - skb1 = rb_to_skb(parent); - skb1_run_end = skb1->ip_defrag_offset + - FRAG_CB(skb1)->frag_run_len; - if (end <= skb1->ip_defrag_offset) - rbn = &parent->rb_left; - else if (offset >= skb1_run_end) - rbn = &parent->rb_right; - else if (offset >= skb1->ip_defrag_offset && - end <= skb1_run_end) - goto err; /* No new data, potential duplicate */ - else - goto discard_qp; /* Found an overlap */ - } while (*rbn); - /* Here we have parent properly set, and rbn pointing to - * one of its NULL left/right children. Insert skb. - */ - ip4_frag_init_run(skb); - rb_link_node(&skb->rbnode, parent, rbn); - rb_insert_color(&skb->rbnode, &qp->q.rb_fragments); - } + err = inet_frag_queue_insert(&qp->q, skb, offset, end); + if (err) + goto insert_error; if (dev) qp->iif = dev->ifindex; - skb->ip_defrag_offset = offset; qp->q.stamp = skb->tstamp; qp->q.meat += skb->len; @@ -494,15 +374,24 @@ static int ip_frag_queue(struct ipq *qp, struct sk_buff *skb) skb->_skb_refdst = 0UL; err = ip_frag_reasm(qp, skb, prev_tail, dev); skb->_skb_refdst = orefdst; + if (err) + inet_frag_kill(&qp->q); return err; } skb_dst_drop(skb); return -EINPROGRESS; +insert_error: + if (err == IPFRAG_DUP) { + kfree_skb(skb); + return -EINVAL; + } + err = -EINVAL; + __IP_INC_STATS(net, IPSTATS_MIB_REASM_OVERLAPS); discard_qp: inet_frag_kill(&qp->q); - __IP_INC_STATS(net, IPSTATS_MIB_REASM_OVERLAPS); + __IP_INC_STATS(net, IPSTATS_MIB_REASMFAILS); err: kfree_skb(skb); return err; @@ -514,13 +403,8 @@ static int ip_frag_reasm(struct ipq *qp, struct sk_buff *skb, { struct net *net = container_of(qp->q.net, struct net, ipv4.frags); struct iphdr *iph; - struct sk_buff *fp, *head = skb_rb_first(&qp->q.rb_fragments); - struct sk_buff **nextp; /* To build frag_list. */ - struct rb_node *rbn; - int len; - int ihlen; - int delta; - int err; + void *reasm_data; + int len, err; u8 ecn; ipq_kill(qp); @@ -530,117 +414,23 @@ static int ip_frag_reasm(struct ipq *qp, struct sk_buff *skb, err = -EINVAL; goto out_fail; } + /* Make the one we just received the head. */ - if (head != skb) { - fp = skb_clone(skb, GFP_ATOMIC); - if (!fp) - goto out_nomem; - FRAG_CB(fp)->next_frag = FRAG_CB(skb)->next_frag; - if (RB_EMPTY_NODE(&skb->rbnode)) - FRAG_CB(prev_tail)->next_frag = fp; - else - rb_replace_node(&skb->rbnode, &fp->rbnode, - &qp->q.rb_fragments); - if (qp->q.fragments_tail == skb) - qp->q.fragments_tail = fp; - skb_morph(skb, head); - FRAG_CB(skb)->next_frag = FRAG_CB(head)->next_frag; - rb_replace_node(&head->rbnode, &skb->rbnode, - &qp->q.rb_fragments); - consume_skb(head); - head = skb; - } - - WARN_ON(head->ip_defrag_offset != 0); - - /* Allocate a new buffer for the datagram. */ - ihlen = ip_hdrlen(head); - len = ihlen + qp->q.len; + reasm_data = inet_frag_reasm_prepare(&qp->q, skb, prev_tail); + if (!reasm_data) + goto out_nomem; + len = ip_hdrlen(skb) + qp->q.len; err = -E2BIG; if (len > 65535) goto out_oversize; - delta = - head->truesize; + inet_frag_reasm_finish(&qp->q, skb, reasm_data); - /* Head of list must not be cloned. */ - if (skb_unclone(head, GFP_ATOMIC)) - goto out_nomem; + skb->dev = dev; + IPCB(skb)->frag_max_size = max(qp->max_df_size, qp->q.max_size); - delta += head->truesize; - if (delta) - add_frag_mem_limit(qp->q.net, delta); - - /* If the first fragment is fragmented itself, we split - * it to two chunks: the first with data and paged part - * and the second, holding only fragments. */ - if (skb_has_frag_list(head)) { - struct sk_buff *clone; - int i, plen = 0; - - clone = alloc_skb(0, GFP_ATOMIC); - if (!clone) - goto out_nomem; - skb_shinfo(clone)->frag_list = skb_shinfo(head)->frag_list; - skb_frag_list_init(head); - for (i = 0; i < skb_shinfo(head)->nr_frags; i++) - plen += skb_frag_size(&skb_shinfo(head)->frags[i]); - clone->len = clone->data_len = head->data_len - plen; - head->truesize += clone->truesize; - clone->csum = 0; - clone->ip_summed = head->ip_summed; - add_frag_mem_limit(qp->q.net, clone->truesize); - skb_shinfo(head)->frag_list = clone; - nextp = &clone->next; - } else { - nextp = &skb_shinfo(head)->frag_list; - } - - skb_push(head, head->data - skb_network_header(head)); - - /* Traverse the tree in order, to build frag_list. */ - fp = FRAG_CB(head)->next_frag; - rbn = rb_next(&head->rbnode); - rb_erase(&head->rbnode, &qp->q.rb_fragments); - while (rbn || fp) { - /* fp points to the next sk_buff in the current run; - * rbn points to the next run. - */ - /* Go through the current run. */ - while (fp) { - *nextp = fp; - nextp = &fp->next; - fp->prev = NULL; - memset(&fp->rbnode, 0, sizeof(fp->rbnode)); - fp->sk = NULL; - head->data_len += fp->len; - head->len += fp->len; - if (head->ip_summed != fp->ip_summed) - head->ip_summed = CHECKSUM_NONE; - else if (head->ip_summed == CHECKSUM_COMPLETE) - head->csum = csum_add(head->csum, fp->csum); - head->truesize += fp->truesize; - fp = FRAG_CB(fp)->next_frag; - } - /* Move to the next run. */ - if (rbn) { - struct rb_node *rbnext = rb_next(rbn); - - fp = rb_to_skb(rbn); - rb_erase(rbn, &qp->q.rb_fragments); - rbn = rbnext; - } - } - sub_frag_mem_limit(qp->q.net, head->truesize); - - *nextp = NULL; - head->next = NULL; - head->prev = NULL; - head->dev = dev; - head->tstamp = qp->q.stamp; - IPCB(head)->frag_max_size = max(qp->max_df_size, qp->q.max_size); - - iph = ip_hdr(head); + iph = ip_hdr(skb); iph->tot_len = htons(len); iph->tos |= ecn; @@ -653,7 +443,7 @@ static int ip_frag_reasm(struct ipq *qp, struct sk_buff *skb, * from one very small df-fragment and one large non-df frag. */ if (qp->max_df_size == qp->q.max_size) { - IPCB(head)->flags |= IPSKB_FRAG_PMTU; + IPCB(skb)->flags |= IPSKB_FRAG_PMTU; iph->frag_off = htons(IP_DF); } else { iph->frag_off = 0; @@ -751,28 +541,6 @@ struct sk_buff *ip_check_defrag(struct net *net, struct sk_buff *skb, u32 user) } EXPORT_SYMBOL(ip_check_defrag); -unsigned int inet_frag_rbtree_purge(struct rb_root *root) -{ - struct rb_node *p = rb_first(root); - unsigned int sum = 0; - - while (p) { - struct sk_buff *skb = rb_entry(p, struct sk_buff, rbnode); - - p = rb_next(p); - rb_erase(&skb->rbnode, root); - while (skb) { - struct sk_buff *next = FRAG_CB(skb)->next_frag; - - sum += skb->truesize; - kfree_skb(skb); - skb = next; - } - } - return sum; -} -EXPORT_SYMBOL(inet_frag_rbtree_purge); - #ifdef CONFIG_SYSCTL static int dist_min; From 684685326ab0cf8d71ae83ff614c748876f24938 Mon Sep 17 00:00:00 2001 From: Peter Oskolkov Date: Tue, 23 Apr 2019 10:25:32 -0700 Subject: [PATCH 26/99] net: IP6 defrag: use rbtrees for IPv6 defrag [ Upstream commit d4289fcc9b16b89619ee1c54f829e05e56de8b9a ] Currently, IPv6 defragmentation code drops non-last fragments that are smaller than 1280 bytes: see commit 0ed4229b08c1 ("ipv6: defrag: drop non-last frags smaller than min mtu") This behavior is not specified in IPv6 RFCs and appears to break compatibility with some IPv6 implemenations, as reported here: https://www.spinics.net/lists/netdev/msg543846.html This patch re-uses common IP defragmentation queueing and reassembly code in IPv6, removing the 1280 byte restriction. v2: change handling of overlaps to match that of upstream. Signed-off-by: Peter Oskolkov Reported-by: Tom Herbert Cc: Eric Dumazet Cc: Florian Westphal Signed-off-by: David S. Miller Signed-off-by: Sasha Levin --- include/net/ipv6_frag.h | 11 +- net/ipv6/reassembly.c | 236 +++++++++++----------------------------- 2 files changed, 73 insertions(+), 174 deletions(-) diff --git a/include/net/ipv6_frag.h b/include/net/ipv6_frag.h index 6ced1e6899b6..28aa9b30aece 100644 --- a/include/net/ipv6_frag.h +++ b/include/net/ipv6_frag.h @@ -82,8 +82,15 @@ ip6frag_expire_frag_queue(struct net *net, struct frag_queue *fq) __IP6_INC_STATS(net, __in6_dev_get(dev), IPSTATS_MIB_REASMTIMEOUT); /* Don't send error if the first segment did not arrive. */ - head = fq->q.fragments; - if (!(fq->q.flags & INET_FRAG_FIRST_IN) || !head) + if (!(fq->q.flags & INET_FRAG_FIRST_IN)) + goto out; + + /* sk_buff::dev and sk_buff::rbnode are unionized. So we + * pull the head out of the tree in order to be able to + * deal with head->dev. + */ + head = inet_frag_pull_head(&fq->q); + if (!head) goto out; head->dev = dev; diff --git a/net/ipv6/reassembly.c b/net/ipv6/reassembly.c index 7c943392c128..095825f964e2 100644 --- a/net/ipv6/reassembly.c +++ b/net/ipv6/reassembly.c @@ -69,8 +69,8 @@ static u8 ip6_frag_ecn(const struct ipv6hdr *ipv6h) static struct inet_frags ip6_frags; -static int ip6_frag_reasm(struct frag_queue *fq, struct sk_buff *prev, - struct net_device *dev); +static int ip6_frag_reasm(struct frag_queue *fq, struct sk_buff *skb, + struct sk_buff *prev_tail, struct net_device *dev); static void ip6_frag_expire(struct timer_list *t) { @@ -111,21 +111,26 @@ static int ip6_frag_queue(struct frag_queue *fq, struct sk_buff *skb, struct frag_hdr *fhdr, int nhoff, u32 *prob_offset) { - struct sk_buff *prev, *next; - struct net_device *dev; - int offset, end, fragsize; struct net *net = dev_net(skb_dst(skb)->dev); + int offset, end, fragsize; + struct sk_buff *prev_tail; + struct net_device *dev; + int err = -ENOENT; u8 ecn; if (fq->q.flags & INET_FRAG_COMPLETE) goto err; + err = -EINVAL; offset = ntohs(fhdr->frag_off) & ~0x7; end = offset + (ntohs(ipv6_hdr(skb)->payload_len) - ((u8 *)(fhdr + 1) - (u8 *)(ipv6_hdr(skb) + 1))); if ((unsigned int)end > IPV6_MAXPLEN) { *prob_offset = (u8 *)&fhdr->frag_off - skb_network_header(skb); + /* note that if prob_offset is set, the skb is freed elsewhere, + * we do not free it here. + */ return -1; } @@ -145,7 +150,7 @@ static int ip6_frag_queue(struct frag_queue *fq, struct sk_buff *skb, */ if (end < fq->q.len || ((fq->q.flags & INET_FRAG_LAST_IN) && end != fq->q.len)) - goto err; + goto discard_fq; fq->q.flags |= INET_FRAG_LAST_IN; fq->q.len = end; } else { @@ -162,70 +167,35 @@ static int ip6_frag_queue(struct frag_queue *fq, struct sk_buff *skb, if (end > fq->q.len) { /* Some bits beyond end -> corruption. */ if (fq->q.flags & INET_FRAG_LAST_IN) - goto err; + goto discard_fq; fq->q.len = end; } } if (end == offset) - goto err; + goto discard_fq; + err = -ENOMEM; /* Point into the IP datagram 'data' part. */ if (!pskb_pull(skb, (u8 *) (fhdr + 1) - skb->data)) - goto err; - - if (pskb_trim_rcsum(skb, end - offset)) - goto err; - - /* Find out which fragments are in front and at the back of us - * in the chain of fragments so far. We must know where to put - * this fragment, right? - */ - prev = fq->q.fragments_tail; - if (!prev || prev->ip_defrag_offset < offset) { - next = NULL; - goto found; - } - prev = NULL; - for (next = fq->q.fragments; next != NULL; next = next->next) { - if (next->ip_defrag_offset >= offset) - break; /* bingo! */ - prev = next; - } - -found: - /* RFC5722, Section 4, amended by Errata ID : 3089 - * When reassembling an IPv6 datagram, if - * one or more its constituent fragments is determined to be an - * overlapping fragment, the entire datagram (and any constituent - * fragments) MUST be silently discarded. - */ - - /* Check for overlap with preceding fragment. */ - if (prev && - (prev->ip_defrag_offset + prev->len) > offset) goto discard_fq; - /* Look for overlap with succeeding segment. */ - if (next && next->ip_defrag_offset < end) + err = pskb_trim_rcsum(skb, end - offset); + if (err) goto discard_fq; - /* Note : skb->ip_defrag_offset and skb->dev share the same location */ + /* Note : skb->rbnode and skb->dev share the same location. */ dev = skb->dev; - if (dev) - fq->iif = dev->ifindex; /* Makes sure compiler wont do silly aliasing games */ barrier(); - skb->ip_defrag_offset = offset; - /* Insert this fragment in the chain of fragments. */ - skb->next = next; - if (!next) - fq->q.fragments_tail = skb; - if (prev) - prev->next = skb; - else - fq->q.fragments = skb; + prev_tail = fq->q.fragments_tail; + err = inet_frag_queue_insert(&fq->q, skb, offset, end); + if (err) + goto insert_error; + + if (dev) + fq->iif = dev->ifindex; fq->q.stamp = skb->tstamp; fq->q.meat += skb->len; @@ -246,44 +216,48 @@ found: if (fq->q.flags == (INET_FRAG_FIRST_IN | INET_FRAG_LAST_IN) && fq->q.meat == fq->q.len) { - int res; unsigned long orefdst = skb->_skb_refdst; skb->_skb_refdst = 0UL; - res = ip6_frag_reasm(fq, prev, dev); + err = ip6_frag_reasm(fq, skb, prev_tail, dev); skb->_skb_refdst = orefdst; - return res; + return err; } skb_dst_drop(skb); - return -1; + return -EINPROGRESS; +insert_error: + if (err == IPFRAG_DUP) { + kfree_skb(skb); + return -EINVAL; + } + err = -EINVAL; + __IP6_INC_STATS(net, ip6_dst_idev(skb_dst(skb)), + IPSTATS_MIB_REASM_OVERLAPS); discard_fq: inet_frag_kill(&fq->q); -err: __IP6_INC_STATS(net, ip6_dst_idev(skb_dst(skb)), IPSTATS_MIB_REASMFAILS); +err: kfree_skb(skb); - return -1; + return err; } /* * Check if this packet is complete. - * Returns NULL on failure by any reason, and pointer - * to current nexthdr field in reassembled frame. * * It is called with locked fq, and caller must check that * queue is eligible for reassembly i.e. it is not COMPLETE, * the last and the first frames arrived and all the bits are here. */ -static int ip6_frag_reasm(struct frag_queue *fq, struct sk_buff *prev, - struct net_device *dev) +static int ip6_frag_reasm(struct frag_queue *fq, struct sk_buff *skb, + struct sk_buff *prev_tail, struct net_device *dev) { struct net *net = container_of(fq->q.net, struct net, ipv6.frags); - struct sk_buff *fp, *head = fq->q.fragments; - int payload_len, delta; unsigned int nhoff; - int sum_truesize; + void *reasm_data; + int payload_len; u8 ecn; inet_frag_kill(&fq->q); @@ -292,121 +266,40 @@ static int ip6_frag_reasm(struct frag_queue *fq, struct sk_buff *prev, if (unlikely(ecn == 0xff)) goto out_fail; - /* Make the one we just received the head. */ - if (prev) { - head = prev->next; - fp = skb_clone(head, GFP_ATOMIC); + reasm_data = inet_frag_reasm_prepare(&fq->q, skb, prev_tail); + if (!reasm_data) + goto out_oom; - if (!fp) - goto out_oom; - - fp->next = head->next; - if (!fp->next) - fq->q.fragments_tail = fp; - prev->next = fp; - - skb_morph(head, fq->q.fragments); - head->next = fq->q.fragments->next; - - consume_skb(fq->q.fragments); - fq->q.fragments = head; - } - - WARN_ON(head == NULL); - WARN_ON(head->ip_defrag_offset != 0); - - /* Unfragmented part is taken from the first segment. */ - payload_len = ((head->data - skb_network_header(head)) - + payload_len = ((skb->data - skb_network_header(skb)) - sizeof(struct ipv6hdr) + fq->q.len - sizeof(struct frag_hdr)); if (payload_len > IPV6_MAXPLEN) goto out_oversize; - delta = - head->truesize; - - /* Head of list must not be cloned. */ - if (skb_unclone(head, GFP_ATOMIC)) - goto out_oom; - - delta += head->truesize; - if (delta) - add_frag_mem_limit(fq->q.net, delta); - - /* If the first fragment is fragmented itself, we split - * it to two chunks: the first with data and paged part - * and the second, holding only fragments. */ - if (skb_has_frag_list(head)) { - struct sk_buff *clone; - int i, plen = 0; - - clone = alloc_skb(0, GFP_ATOMIC); - if (!clone) - goto out_oom; - clone->next = head->next; - head->next = clone; - skb_shinfo(clone)->frag_list = skb_shinfo(head)->frag_list; - skb_frag_list_init(head); - for (i = 0; i < skb_shinfo(head)->nr_frags; i++) - plen += skb_frag_size(&skb_shinfo(head)->frags[i]); - clone->len = clone->data_len = head->data_len - plen; - head->data_len -= clone->len; - head->len -= clone->len; - clone->csum = 0; - clone->ip_summed = head->ip_summed; - add_frag_mem_limit(fq->q.net, clone->truesize); - } - /* We have to remove fragment header from datagram and to relocate * header in order to calculate ICV correctly. */ nhoff = fq->nhoffset; - skb_network_header(head)[nhoff] = skb_transport_header(head)[0]; - memmove(head->head + sizeof(struct frag_hdr), head->head, - (head->data - head->head) - sizeof(struct frag_hdr)); - if (skb_mac_header_was_set(head)) - head->mac_header += sizeof(struct frag_hdr); - head->network_header += sizeof(struct frag_hdr); + skb_network_header(skb)[nhoff] = skb_transport_header(skb)[0]; + memmove(skb->head + sizeof(struct frag_hdr), skb->head, + (skb->data - skb->head) - sizeof(struct frag_hdr)); + if (skb_mac_header_was_set(skb)) + skb->mac_header += sizeof(struct frag_hdr); + skb->network_header += sizeof(struct frag_hdr); - skb_reset_transport_header(head); - skb_push(head, head->data - skb_network_header(head)); + skb_reset_transport_header(skb); - sum_truesize = head->truesize; - for (fp = head->next; fp;) { - bool headstolen; - int delta; - struct sk_buff *next = fp->next; + inet_frag_reasm_finish(&fq->q, skb, reasm_data); - sum_truesize += fp->truesize; - if (head->ip_summed != fp->ip_summed) - head->ip_summed = CHECKSUM_NONE; - else if (head->ip_summed == CHECKSUM_COMPLETE) - head->csum = csum_add(head->csum, fp->csum); - - if (skb_try_coalesce(head, fp, &headstolen, &delta)) { - kfree_skb_partial(fp, headstolen); - } else { - fp->sk = NULL; - if (!skb_shinfo(head)->frag_list) - skb_shinfo(head)->frag_list = fp; - head->data_len += fp->len; - head->len += fp->len; - head->truesize += fp->truesize; - } - fp = next; - } - sub_frag_mem_limit(fq->q.net, sum_truesize); - - head->next = NULL; - head->dev = dev; - head->tstamp = fq->q.stamp; - ipv6_hdr(head)->payload_len = htons(payload_len); - ipv6_change_dsfield(ipv6_hdr(head), 0xff, ecn); - IP6CB(head)->nhoff = nhoff; - IP6CB(head)->flags |= IP6SKB_FRAGMENTED; - IP6CB(head)->frag_max_size = fq->q.max_size; + skb->dev = dev; + ipv6_hdr(skb)->payload_len = htons(payload_len); + ipv6_change_dsfield(ipv6_hdr(skb), 0xff, ecn); + IP6CB(skb)->nhoff = nhoff; + IP6CB(skb)->flags |= IP6SKB_FRAGMENTED; + IP6CB(skb)->frag_max_size = fq->q.max_size; /* Yes, and fold redundant checksum back. 8) */ - skb_postpush_rcsum(head, skb_network_header(head), - skb_network_header_len(head)); + skb_postpush_rcsum(skb, skb_network_header(skb), + skb_network_header_len(skb)); rcu_read_lock(); __IP6_INC_STATS(net, __in6_dev_get(dev), IPSTATS_MIB_REASMOKS); @@ -414,6 +307,7 @@ static int ip6_frag_reasm(struct frag_queue *fq, struct sk_buff *prev, fq->q.fragments = NULL; fq->q.rb_fragments = RB_ROOT; fq->q.fragments_tail = NULL; + fq->q.last_run_head = NULL; return 1; out_oversize: @@ -425,6 +319,7 @@ out_fail: rcu_read_lock(); __IP6_INC_STATS(net, __in6_dev_get(dev), IPSTATS_MIB_REASMFAILS); rcu_read_unlock(); + inet_frag_kill(&fq->q); return -1; } @@ -463,10 +358,6 @@ static int ipv6_frag_rcv(struct sk_buff *skb) return 1; } - if (skb->len - skb_network_offset(skb) < IPV6_MIN_MTU && - fhdr->frag_off & htons(IP6_MF)) - goto fail_hdr; - iif = skb->dev ? skb->dev->ifindex : 0; fq = fq_find(net, fhdr->identification, hdr, iif); if (fq) { @@ -484,6 +375,7 @@ static int ipv6_frag_rcv(struct sk_buff *skb) if (prob_offset) { __IP6_INC_STATS(net, __in6_dev_get_safely(skb->dev), IPSTATS_MIB_INHDRERRORS); + /* icmpv6_param_prob() calls kfree_skb(skb) */ icmpv6_param_prob(skb, ICMPV6_HDR_FIELD, prob_offset); } return ret; From 6e2081f29392f878aa9e1bd19b976c7c8b82bad3 Mon Sep 17 00:00:00 2001 From: Peter Oskolkov Date: Tue, 23 Apr 2019 10:25:33 -0700 Subject: [PATCH 27/99] net: IP6 defrag: use rbtrees in nf_conntrack_reasm.c [ Upstream commit 997dd96471641e147cb2c33ad54284000d0f5e35 ] Currently, IPv6 defragmentation code drops non-last fragments that are smaller than 1280 bytes: see commit 0ed4229b08c1 ("ipv6: defrag: drop non-last frags smaller than min mtu") This behavior is not specified in IPv6 RFCs and appears to break compatibility with some IPv6 implemenations, as reported here: https://www.spinics.net/lists/netdev/msg543846.html This patch re-uses common IP defragmentation queueing and reassembly code in IP6 defragmentation in nf_conntrack, removing the 1280 byte restriction. Signed-off-by: Peter Oskolkov Reported-by: Tom Herbert Cc: Eric Dumazet Cc: Florian Westphal Signed-off-by: David S. Miller Signed-off-by: Sasha Levin --- net/ipv6/netfilter/nf_conntrack_reasm.c | 256 +++++++----------------- 1 file changed, 69 insertions(+), 187 deletions(-) diff --git a/net/ipv6/netfilter/nf_conntrack_reasm.c b/net/ipv6/netfilter/nf_conntrack_reasm.c index 043ed8eb0ab9..cb1b4772dac0 100644 --- a/net/ipv6/netfilter/nf_conntrack_reasm.c +++ b/net/ipv6/netfilter/nf_conntrack_reasm.c @@ -136,6 +136,9 @@ static void __net_exit nf_ct_frags6_sysctl_unregister(struct net *net) } #endif +static int nf_ct_frag6_reasm(struct frag_queue *fq, struct sk_buff *skb, + struct sk_buff *prev_tail, struct net_device *dev); + static inline u8 ip6_frag_ecn(const struct ipv6hdr *ipv6h) { return 1 << (ipv6_get_dsfield(ipv6h) & INET_ECN_MASK); @@ -177,9 +180,10 @@ static struct frag_queue *fq_find(struct net *net, __be32 id, u32 user, static int nf_ct_frag6_queue(struct frag_queue *fq, struct sk_buff *skb, const struct frag_hdr *fhdr, int nhoff) { - struct sk_buff *prev, *next; unsigned int payload_len; - int offset, end; + struct net_device *dev; + struct sk_buff *prev; + int offset, end, err; u8 ecn; if (fq->q.flags & INET_FRAG_COMPLETE) { @@ -254,55 +258,18 @@ static int nf_ct_frag6_queue(struct frag_queue *fq, struct sk_buff *skb, goto err; } - /* Find out which fragments are in front and at the back of us - * in the chain of fragments so far. We must know where to put - * this fragment, right? - */ - prev = fq->q.fragments_tail; - if (!prev || prev->ip_defrag_offset < offset) { - next = NULL; - goto found; - } - prev = NULL; - for (next = fq->q.fragments; next != NULL; next = next->next) { - if (next->ip_defrag_offset >= offset) - break; /* bingo! */ - prev = next; - } - -found: - /* RFC5722, Section 4: - * When reassembling an IPv6 datagram, if - * one or more its constituent fragments is determined to be an - * overlapping fragment, the entire datagram (and any constituent - * fragments, including those not yet received) MUST be silently - * discarded. - */ - - /* Check for overlap with preceding fragment. */ - if (prev && - (prev->ip_defrag_offset + prev->len) > offset) - goto discard_fq; - - /* Look for overlap with succeeding segment. */ - if (next && next->ip_defrag_offset < end) - goto discard_fq; - - /* Note : skb->ip_defrag_offset and skb->dev share the same location */ - if (skb->dev) - fq->iif = skb->dev->ifindex; + /* Note : skb->rbnode and skb->dev share the same location. */ + dev = skb->dev; /* Makes sure compiler wont do silly aliasing games */ barrier(); - skb->ip_defrag_offset = offset; - /* Insert this fragment in the chain of fragments. */ - skb->next = next; - if (!next) - fq->q.fragments_tail = skb; - if (prev) - prev->next = skb; - else - fq->q.fragments = skb; + prev = fq->q.fragments_tail; + err = inet_frag_queue_insert(&fq->q, skb, offset, end); + if (err) + goto insert_error; + + if (dev) + fq->iif = dev->ifindex; fq->q.stamp = skb->tstamp; fq->q.meat += skb->len; @@ -319,11 +286,25 @@ found: fq->q.flags |= INET_FRAG_FIRST_IN; } - return 0; + if (fq->q.flags == (INET_FRAG_FIRST_IN | INET_FRAG_LAST_IN) && + fq->q.meat == fq->q.len) { + unsigned long orefdst = skb->_skb_refdst; -discard_fq: + skb->_skb_refdst = 0UL; + err = nf_ct_frag6_reasm(fq, skb, prev, dev); + skb->_skb_refdst = orefdst; + return err; + } + + skb_dst_drop(skb); + return -EINPROGRESS; + +insert_error: + if (err == IPFRAG_DUP) + goto err; inet_frag_kill(&fq->q); err: + skb_dst_drop(skb); return -EINVAL; } @@ -333,147 +314,67 @@ err: * It is called with locked fq, and caller must check that * queue is eligible for reassembly i.e. it is not COMPLETE, * the last and the first frames arrived and all the bits are here. - * - * returns true if *prev skb has been transformed into the reassembled - * skb, false otherwise. */ -static bool -nf_ct_frag6_reasm(struct frag_queue *fq, struct sk_buff *prev, struct net_device *dev) +static int nf_ct_frag6_reasm(struct frag_queue *fq, struct sk_buff *skb, + struct sk_buff *prev_tail, struct net_device *dev) { - struct sk_buff *fp, *head = fq->q.fragments; - int payload_len, delta; + void *reasm_data; + int payload_len; u8 ecn; inet_frag_kill(&fq->q); - WARN_ON(head == NULL); - WARN_ON(head->ip_defrag_offset != 0); - ecn = ip_frag_ecn_table[fq->ecn]; if (unlikely(ecn == 0xff)) - return false; + goto err; - /* Unfragmented part is taken from the first segment. */ - payload_len = ((head->data - skb_network_header(head)) - + reasm_data = inet_frag_reasm_prepare(&fq->q, skb, prev_tail); + if (!reasm_data) + goto err; + + payload_len = ((skb->data - skb_network_header(skb)) - sizeof(struct ipv6hdr) + fq->q.len - sizeof(struct frag_hdr)); if (payload_len > IPV6_MAXPLEN) { net_dbg_ratelimited("nf_ct_frag6_reasm: payload len = %d\n", payload_len); - return false; - } - - delta = - head->truesize; - - /* Head of list must not be cloned. */ - if (skb_unclone(head, GFP_ATOMIC)) - return false; - - delta += head->truesize; - if (delta) - add_frag_mem_limit(fq->q.net, delta); - - /* If the first fragment is fragmented itself, we split - * it to two chunks: the first with data and paged part - * and the second, holding only fragments. */ - if (skb_has_frag_list(head)) { - struct sk_buff *clone; - int i, plen = 0; - - clone = alloc_skb(0, GFP_ATOMIC); - if (clone == NULL) - return false; - - clone->next = head->next; - head->next = clone; - skb_shinfo(clone)->frag_list = skb_shinfo(head)->frag_list; - skb_frag_list_init(head); - for (i = 0; i < skb_shinfo(head)->nr_frags; i++) - plen += skb_frag_size(&skb_shinfo(head)->frags[i]); - clone->len = clone->data_len = head->data_len - plen; - head->data_len -= clone->len; - head->len -= clone->len; - clone->csum = 0; - clone->ip_summed = head->ip_summed; - - add_frag_mem_limit(fq->q.net, clone->truesize); - } - - /* morph head into last received skb: prev. - * - * This allows callers of ipv6 conntrack defrag to continue - * to use the last skb(frag) passed into the reasm engine. - * The last skb frag 'silently' turns into the full reassembled skb. - * - * Since prev is also part of q->fragments we have to clone it first. - */ - if (head != prev) { - struct sk_buff *iter; - - fp = skb_clone(prev, GFP_ATOMIC); - if (!fp) - return false; - - fp->next = prev->next; - - iter = head; - while (iter) { - if (iter->next == prev) { - iter->next = fp; - break; - } - iter = iter->next; - } - - skb_morph(prev, head); - prev->next = head->next; - consume_skb(head); - head = prev; + goto err; } /* We have to remove fragment header from datagram and to relocate * header in order to calculate ICV correctly. */ - skb_network_header(head)[fq->nhoffset] = skb_transport_header(head)[0]; - memmove(head->head + sizeof(struct frag_hdr), head->head, - (head->data - head->head) - sizeof(struct frag_hdr)); - head->mac_header += sizeof(struct frag_hdr); - head->network_header += sizeof(struct frag_hdr); + skb_network_header(skb)[fq->nhoffset] = skb_transport_header(skb)[0]; + memmove(skb->head + sizeof(struct frag_hdr), skb->head, + (skb->data - skb->head) - sizeof(struct frag_hdr)); + skb->mac_header += sizeof(struct frag_hdr); + skb->network_header += sizeof(struct frag_hdr); - skb_shinfo(head)->frag_list = head->next; - skb_reset_transport_header(head); - skb_push(head, head->data - skb_network_header(head)); + skb_reset_transport_header(skb); - for (fp = head->next; fp; fp = fp->next) { - head->data_len += fp->len; - head->len += fp->len; - if (head->ip_summed != fp->ip_summed) - head->ip_summed = CHECKSUM_NONE; - else if (head->ip_summed == CHECKSUM_COMPLETE) - head->csum = csum_add(head->csum, fp->csum); - head->truesize += fp->truesize; - fp->sk = NULL; - } - sub_frag_mem_limit(fq->q.net, head->truesize); + inet_frag_reasm_finish(&fq->q, skb, reasm_data); - head->ignore_df = 1; - head->next = NULL; - head->dev = dev; - head->tstamp = fq->q.stamp; - ipv6_hdr(head)->payload_len = htons(payload_len); - ipv6_change_dsfield(ipv6_hdr(head), 0xff, ecn); - IP6CB(head)->frag_max_size = sizeof(struct ipv6hdr) + fq->q.max_size; + skb->ignore_df = 1; + skb->dev = dev; + ipv6_hdr(skb)->payload_len = htons(payload_len); + ipv6_change_dsfield(ipv6_hdr(skb), 0xff, ecn); + IP6CB(skb)->frag_max_size = sizeof(struct ipv6hdr) + fq->q.max_size; /* Yes, and fold redundant checksum back. 8) */ - if (head->ip_summed == CHECKSUM_COMPLETE) - head->csum = csum_partial(skb_network_header(head), - skb_network_header_len(head), - head->csum); + if (skb->ip_summed == CHECKSUM_COMPLETE) + skb->csum = csum_partial(skb_network_header(skb), + skb_network_header_len(skb), + skb->csum); fq->q.fragments = NULL; fq->q.rb_fragments = RB_ROOT; fq->q.fragments_tail = NULL; + fq->q.last_run_head = NULL; - return true; + return 0; + +err: + inet_frag_kill(&fq->q); + return -EINVAL; } /* @@ -542,7 +443,6 @@ find_prev_fhdr(struct sk_buff *skb, u8 *prevhdrp, int *prevhoff, int *fhoff) int nf_ct_frag6_gather(struct net *net, struct sk_buff *skb, u32 user) { u16 savethdr = skb->transport_header; - struct net_device *dev = skb->dev; int fhoff, nhoff, ret; struct frag_hdr *fhdr; struct frag_queue *fq; @@ -565,10 +465,6 @@ int nf_ct_frag6_gather(struct net *net, struct sk_buff *skb, u32 user) hdr = ipv6_hdr(skb); fhdr = (struct frag_hdr *)skb_transport_header(skb); - if (skb->len - skb_network_offset(skb) < IPV6_MIN_MTU && - fhdr->frag_off & htons(IP6_MF)) - return -EINVAL; - skb_orphan(skb); fq = fq_find(net, fhdr->identification, user, hdr, skb->dev ? skb->dev->ifindex : 0); @@ -580,31 +476,17 @@ int nf_ct_frag6_gather(struct net *net, struct sk_buff *skb, u32 user) spin_lock_bh(&fq->q.lock); ret = nf_ct_frag6_queue(fq, skb, fhdr, nhoff); - if (ret < 0) { - if (ret == -EPROTO) { - skb->transport_header = savethdr; - ret = 0; - } - goto out_unlock; + if (ret == -EPROTO) { + skb->transport_header = savethdr; + ret = 0; } /* after queue has assumed skb ownership, only 0 or -EINPROGRESS * must be returned. */ - ret = -EINPROGRESS; - if (fq->q.flags == (INET_FRAG_FIRST_IN | INET_FRAG_LAST_IN) && - fq->q.meat == fq->q.len) { - unsigned long orefdst = skb->_skb_refdst; + if (ret) + ret = -EINPROGRESS; - skb->_skb_refdst = 0UL; - if (nf_ct_frag6_reasm(fq, skb, dev)) - ret = 0; - skb->_skb_refdst = orefdst; - } else { - skb_dst_drop(skb); - } - -out_unlock: spin_unlock_bh(&fq->q.lock); inet_frag_put(&fq->q); return ret; From 8092ecc306d81186a64cda42411121f4d35aaff4 Mon Sep 17 00:00:00 2001 From: Aurelien Aptel Date: Fri, 29 Mar 2019 10:49:12 +0100 Subject: [PATCH 28/99] CIFS: keep FileInfo handle live during oplock break commit b98749cac4a695f084a5ff076f4510b23e353ecd upstream. In the oplock break handler, writing pending changes from pages puts the FileInfo handle. If the refcount reaches zero it closes the handle and waits for any oplock break handler to return, thus causing a deadlock. To prevent this situation: * We add a wait flag to cifsFileInfo_put() to decide whether we should wait for running/pending oplock break handlers * We keep an additionnal reference of the SMB FileInfo handle so that for the rest of the handler putting the handle won't close it. - The ref is bumped everytime we queue the handler via the cifs_queue_oplock_break() helper. - The ref is decremented at the end of the handler This bug was triggered by xfstest 464. Also important fix to address the various reports of oops in smb2_push_mandatory_locks Signed-off-by: Aurelien Aptel Signed-off-by: Steve French Reviewed-by: Pavel Shilovsky CC: Stable Signed-off-by: Greg Kroah-Hartman --- fs/cifs/cifsglob.h | 2 ++ fs/cifs/file.c | 30 +++++++++++++++++++++++++----- fs/cifs/misc.c | 25 +++++++++++++++++++++++-- fs/cifs/smb2misc.c | 6 +++--- 4 files changed, 53 insertions(+), 10 deletions(-) diff --git a/fs/cifs/cifsglob.h b/fs/cifs/cifsglob.h index 80f33582059e..6f227cc781e5 100644 --- a/fs/cifs/cifsglob.h +++ b/fs/cifs/cifsglob.h @@ -1263,6 +1263,7 @@ cifsFileInfo_get_locked(struct cifsFileInfo *cifs_file) } struct cifsFileInfo *cifsFileInfo_get(struct cifsFileInfo *cifs_file); +void _cifsFileInfo_put(struct cifsFileInfo *cifs_file, bool wait_oplock_hdlr); void cifsFileInfo_put(struct cifsFileInfo *cifs_file); #define CIFS_CACHE_READ_FLG 1 @@ -1763,6 +1764,7 @@ GLOBAL_EXTERN spinlock_t gidsidlock; #endif /* CONFIG_CIFS_ACL */ void cifs_oplock_break(struct work_struct *work); +void cifs_queue_oplock_break(struct cifsFileInfo *cfile); extern const struct slow_work_ops cifs_oplock_break_ops; extern struct workqueue_struct *cifsiod_wq; diff --git a/fs/cifs/file.c b/fs/cifs/file.c index d847132ab027..d6b45682833b 100644 --- a/fs/cifs/file.c +++ b/fs/cifs/file.c @@ -358,12 +358,30 @@ cifsFileInfo_get(struct cifsFileInfo *cifs_file) return cifs_file; } -/* - * Release a reference on the file private data. This may involve closing - * the filehandle out on the server. Must be called without holding - * tcon->open_file_lock and cifs_file->file_info_lock. +/** + * cifsFileInfo_put - release a reference of file priv data + * + * Always potentially wait for oplock handler. See _cifsFileInfo_put(). */ void cifsFileInfo_put(struct cifsFileInfo *cifs_file) +{ + _cifsFileInfo_put(cifs_file, true); +} + +/** + * _cifsFileInfo_put - release a reference of file priv data + * + * This may involve closing the filehandle @cifs_file out on the + * server. Must be called without holding tcon->open_file_lock and + * cifs_file->file_info_lock. + * + * If @wait_for_oplock_handler is true and we are releasing the last + * reference, wait for any running oplock break handler of the file + * and cancel any pending one. If calling this function from the + * oplock break handler, you need to pass false. + * + */ +void _cifsFileInfo_put(struct cifsFileInfo *cifs_file, bool wait_oplock_handler) { struct inode *inode = d_inode(cifs_file->dentry); struct cifs_tcon *tcon = tlink_tcon(cifs_file->tlink); @@ -411,7 +429,8 @@ void cifsFileInfo_put(struct cifsFileInfo *cifs_file) spin_unlock(&tcon->open_file_lock); - oplock_break_cancelled = cancel_work_sync(&cifs_file->oplock_break); + oplock_break_cancelled = wait_oplock_handler ? + cancel_work_sync(&cifs_file->oplock_break) : false; if (!tcon->need_reconnect && !cifs_file->invalidHandle) { struct TCP_Server_Info *server = tcon->ses->server; @@ -4170,6 +4189,7 @@ void cifs_oplock_break(struct work_struct *work) cinode); cifs_dbg(FYI, "Oplock release rc = %d\n", rc); } + _cifsFileInfo_put(cfile, false /* do not wait for ourself */); cifs_done_oplock_break(cinode); } diff --git a/fs/cifs/misc.c b/fs/cifs/misc.c index 6926685e513c..facc94e159a1 100644 --- a/fs/cifs/misc.c +++ b/fs/cifs/misc.c @@ -490,8 +490,7 @@ is_valid_oplock_break(char *buffer, struct TCP_Server_Info *srv) CIFS_INODE_DOWNGRADE_OPLOCK_TO_L2, &pCifsInode->flags); - queue_work(cifsoplockd_wq, - &netfile->oplock_break); + cifs_queue_oplock_break(netfile); netfile->oplock_break_cancelled = false; spin_unlock(&tcon->open_file_lock); @@ -588,6 +587,28 @@ void cifs_put_writer(struct cifsInodeInfo *cinode) spin_unlock(&cinode->writers_lock); } +/** + * cifs_queue_oplock_break - queue the oplock break handler for cfile + * + * This function is called from the demultiplex thread when it + * receives an oplock break for @cfile. + * + * Assumes the tcon->open_file_lock is held. + * Assumes cfile->file_info_lock is NOT held. + */ +void cifs_queue_oplock_break(struct cifsFileInfo *cfile) +{ + /* + * Bump the handle refcount now while we hold the + * open_file_lock to enforce the validity of it for the oplock + * break handler. The matching put is done at the end of the + * handler. + */ + cifsFileInfo_get(cfile); + + queue_work(cifsoplockd_wq, &cfile->oplock_break); +} + void cifs_done_oplock_break(struct cifsInodeInfo *cinode) { clear_bit(CIFS_INODE_PENDING_OPLOCK_BREAK, &cinode->flags); diff --git a/fs/cifs/smb2misc.c b/fs/cifs/smb2misc.c index 58700d2ba8cd..0a7ed2e3ad4f 100644 --- a/fs/cifs/smb2misc.c +++ b/fs/cifs/smb2misc.c @@ -555,7 +555,7 @@ smb2_tcon_has_lease(struct cifs_tcon *tcon, struct smb2_lease_break *rsp, clear_bit(CIFS_INODE_DOWNGRADE_OPLOCK_TO_L2, &cinode->flags); - queue_work(cifsoplockd_wq, &cfile->oplock_break); + cifs_queue_oplock_break(cfile); kfree(lw); return true; } @@ -719,8 +719,8 @@ smb2_is_valid_oplock_break(char *buffer, struct TCP_Server_Info *server) CIFS_INODE_DOWNGRADE_OPLOCK_TO_L2, &cinode->flags); spin_unlock(&cfile->file_info_lock); - queue_work(cifsoplockd_wq, - &cfile->oplock_break); + + cifs_queue_oplock_break(cfile); spin_unlock(&tcon->open_file_lock); spin_unlock(&cifs_tcp_ses_lock); From 8fb89b43b65fcd35f15d982712904b96fc64c68a Mon Sep 17 00:00:00 2001 From: ZhangXiaoxu Date: Sat, 6 Apr 2019 15:47:38 +0800 Subject: [PATCH 29/99] cifs: Fix use-after-free in SMB2_write commit 6a3eb3360667170988f8a6477f6686242061488a upstream. There is a KASAN use-after-free: BUG: KASAN: use-after-free in SMB2_write+0x1342/0x1580 Read of size 8 at addr ffff8880b6a8e450 by task ln/4196 Should not release the 'req' because it will use in the trace. Fixes: eccb4422cf97 ("smb3: Add ftrace tracepoints for improved SMB3 debugging") Signed-off-by: ZhangXiaoxu Signed-off-by: Steve French CC: Stable 4.18+ Reviewed-by: Pavel Shilovsky Signed-off-by: Greg Kroah-Hartman --- fs/cifs/smb2pdu.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/cifs/smb2pdu.c b/fs/cifs/smb2pdu.c index 71f32d983384..299d8e46f01f 100644 --- a/fs/cifs/smb2pdu.c +++ b/fs/cifs/smb2pdu.c @@ -3591,7 +3591,6 @@ SMB2_write(const unsigned int xid, struct cifs_io_parms *io_parms, rc = cifs_send_recv(xid, io_parms->tcon->ses, &rqst, &resp_buftype, flags, &rsp_iov); - cifs_small_buf_release(req); rsp = (struct smb2_write_rsp *)rsp_iov.iov_base; if (rc) { @@ -3609,6 +3608,7 @@ SMB2_write(const unsigned int xid, struct cifs_io_parms *io_parms, io_parms->offset, *nbytes); } + cifs_small_buf_release(req); free_rsp_buf(resp_buftype, rsp); return rc; } From c69330a855ab4342d304f67f8c1e7d1fa2686bec Mon Sep 17 00:00:00 2001 From: ZhangXiaoxu Date: Sat, 6 Apr 2019 15:47:39 +0800 Subject: [PATCH 30/99] cifs: Fix use-after-free in SMB2_read commit 088aaf17aa79300cab14dbee2569c58cfafd7d6e upstream. There is a KASAN use-after-free: BUG: KASAN: use-after-free in SMB2_read+0x1136/0x1190 Read of size 8 at addr ffff8880b4e45e50 by task ln/1009 Should not release the 'req' because it will use in the trace. Fixes: eccb4422cf97 ("smb3: Add ftrace tracepoints for improved SMB3 debugging") Signed-off-by: ZhangXiaoxu Signed-off-by: Steve French CC: Stable 4.18+ Reviewed-by: Pavel Shilovsky Signed-off-by: Greg Kroah-Hartman --- fs/cifs/smb2pdu.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/fs/cifs/smb2pdu.c b/fs/cifs/smb2pdu.c index 299d8e46f01f..c6fd3acc5560 100644 --- a/fs/cifs/smb2pdu.c +++ b/fs/cifs/smb2pdu.c @@ -3273,8 +3273,6 @@ SMB2_read(const unsigned int xid, struct cifs_io_parms *io_parms, rqst.rq_nvec = 1; rc = cifs_send_recv(xid, ses, &rqst, &resp_buftype, flags, &rsp_iov); - cifs_small_buf_release(req); - rsp = (struct smb2_read_rsp *)rsp_iov.iov_base; if (rc) { @@ -3293,6 +3291,8 @@ SMB2_read(const unsigned int xid, struct cifs_io_parms *io_parms, io_parms->tcon->tid, ses->Suid, io_parms->offset, io_parms->length); + cifs_small_buf_release(req); + *nbytes = le32_to_cpu(rsp->DataLength); if ((*nbytes > CIFS_MAX_MSGSIZE) || (*nbytes > io_parms->length)) { From 2fcee5eaae6ee1c2ccbe629ffd46c6674a98824c Mon Sep 17 00:00:00 2001 From: Ronnie Sahlberg Date: Wed, 10 Apr 2019 07:47:22 +1000 Subject: [PATCH 31/99] cifs: fix handle leak in smb2_query_symlink() commit e6d0fb7b34f264f72c33053558a360a6a734905e upstream. If we enter smb2_query_symlink() for something that is not a symlink and where the SMB2_open() would succeed we would never end up closing this handle and would thus leak a handle on the server. Fix this by immediately calling SMB2_close() on successfull open. Signed-off-by: Ronnie Sahlberg CC: Stable Signed-off-by: Steve French Reviewed-by: Pavel Shilovsky Signed-off-by: Greg Kroah-Hartman --- fs/cifs/smb2ops.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/fs/cifs/smb2ops.c b/fs/cifs/smb2ops.c index d4d7d61a6fe2..2001184afe70 100644 --- a/fs/cifs/smb2ops.c +++ b/fs/cifs/smb2ops.c @@ -1906,6 +1906,8 @@ smb2_query_symlink(const unsigned int xid, struct cifs_tcon *tcon, rc = SMB2_open(xid, &oparms, utf16_path, &oplock, NULL, &err_iov, &resp_buftype); + if (!rc) + SMB2_close(xid, tcon, fid.persistent_fid, fid.volatile_fid); if (!rc || !err_iov.iov_base) { rc = -ENOENT; goto free_path; From 18cf09a817714d7702f4dfa0956abefd4b4e2f63 Mon Sep 17 00:00:00 2001 From: Sean Christopherson Date: Tue, 2 Apr 2019 08:10:47 -0700 Subject: [PATCH 32/99] KVM: x86: Don't clear EFER during SMM transitions for 32-bit vCPU commit 8f4dc2e77cdfaf7e644ef29693fa229db29ee1de upstream. Neither AMD nor Intel CPUs have an EFER field in the legacy SMRAM save state area, i.e. don't save/restore EFER across SMM transitions. KVM somewhat models this, e.g. doesn't clear EFER on entry to SMM if the guest doesn't support long mode. But during RSM, KVM unconditionally clears EFER so that it can get back to pure 32-bit mode in order to start loading CRs with their actual non-SMM values. Clear EFER only when it will be written when loading the non-SMM state so as to preserve bits that can theoretically be set on 32-bit vCPUs, e.g. KVM always emulates EFER_SCE. And because CR4.PAE is cleared only to play nice with EFER, wrap that code in the long mode check as well. Note, this may result in a compiler warning about cr4 being consumed uninitialized. Re-read CR4 even though it's technically unnecessary, as doing so allows for more readable code and RSM emulation is not a performance critical path. Fixes: 660a5d517aaab ("KVM: x86: save/load state on SMM switch") Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson Signed-off-by: Paolo Bonzini Signed-off-by: Greg Kroah-Hartman --- arch/x86/kvm/emulate.c | 21 +++++++++++---------- 1 file changed, 11 insertions(+), 10 deletions(-) diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c index 106482da6388..860bd271619d 100644 --- a/arch/x86/kvm/emulate.c +++ b/arch/x86/kvm/emulate.c @@ -2575,15 +2575,13 @@ static int em_rsm(struct x86_emulate_ctxt *ctxt) * CR0/CR3/CR4/EFER. It's all a bit more complicated if the vCPU * supports long mode. */ - cr4 = ctxt->ops->get_cr(ctxt, 4); if (emulator_has_longmode(ctxt)) { struct desc_struct cs_desc; /* Zero CR4.PCIDE before CR0.PG. */ - if (cr4 & X86_CR4_PCIDE) { + cr4 = ctxt->ops->get_cr(ctxt, 4); + if (cr4 & X86_CR4_PCIDE) ctxt->ops->set_cr(ctxt, 4, cr4 & ~X86_CR4_PCIDE); - cr4 &= ~X86_CR4_PCIDE; - } /* A 32-bit code segment is required to clear EFER.LMA. */ memset(&cs_desc, 0, sizeof(cs_desc)); @@ -2597,13 +2595,16 @@ static int em_rsm(struct x86_emulate_ctxt *ctxt) if (cr0 & X86_CR0_PE) ctxt->ops->set_cr(ctxt, 0, cr0 & ~(X86_CR0_PG | X86_CR0_PE)); - /* Now clear CR4.PAE (which must be done before clearing EFER.LME). */ - if (cr4 & X86_CR4_PAE) - ctxt->ops->set_cr(ctxt, 4, cr4 & ~X86_CR4_PAE); + if (emulator_has_longmode(ctxt)) { + /* Clear CR4.PAE before clearing EFER.LME. */ + cr4 = ctxt->ops->get_cr(ctxt, 4); + if (cr4 & X86_CR4_PAE) + ctxt->ops->set_cr(ctxt, 4, cr4 & ~X86_CR4_PAE); - /* And finally go back to 32-bit mode. */ - efer = 0; - ctxt->ops->set_msr(ctxt, MSR_EFER, efer); + /* And finally go back to 32-bit mode. */ + efer = 0; + ctxt->ops->set_msr(ctxt, MSR_EFER, efer); + } smbase = ctxt->ops->get_smbase(ctxt); From c9e34935a3514d083cc2787f874fee786c775c6c Mon Sep 17 00:00:00 2001 From: Vitaly Kuznetsov Date: Wed, 3 Apr 2019 16:06:42 +0200 Subject: [PATCH 33/99] KVM: x86: svm: make sure NMI is injected after nmi_singlestep commit 99c221796a810055974b54c02e8f53297e48d146 upstream. I noticed that apic test from kvm-unit-tests always hangs on my EPYC 7401P, the hanging test nmi-after-sti is trying to deliver 30000 NMIs and tracing shows that we're sometimes able to deliver a few but never all. When we're trying to inject an NMI we may fail to do so immediately for various reasons, however, we still need to inject it so enable_nmi_window() arms nmi_singlestep mode. #DB occurs as expected, but we're not checking for pending NMIs before entering the guest and unless there's a different event to process, the NMI will never get delivered. Make KVM_REQ_EVENT request on the vCPU from db_interception() to make sure pending NMIs are checked and possibly injected. Signed-off-by: Vitaly Kuznetsov Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini Signed-off-by: Greg Kroah-Hartman --- arch/x86/kvm/svm.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c index 6dc72804fe6e..766070d7b4c5 100644 --- a/arch/x86/kvm/svm.c +++ b/arch/x86/kvm/svm.c @@ -2679,6 +2679,7 @@ static int npf_interception(struct vcpu_svm *svm) static int db_interception(struct vcpu_svm *svm) { struct kvm_run *kvm_run = svm->vcpu.run; + struct kvm_vcpu *vcpu = &svm->vcpu; if (!(svm->vcpu.guest_debug & (KVM_GUESTDBG_SINGLESTEP | KVM_GUESTDBG_USE_HW_BP)) && @@ -2689,6 +2690,8 @@ static int db_interception(struct vcpu_svm *svm) if (svm->nmi_singlestep) { disable_nmi_singlestep(svm); + /* Make sure we check for pending NMIs upon entry */ + kvm_make_request(KVM_REQ_EVENT, vcpu); } if (svm->vcpu.guest_debug & From c54d1258c637ac4318cab995be0965b866cb8e3b Mon Sep 17 00:00:00 2001 From: Leonard Pollak Date: Wed, 13 Feb 2019 11:19:52 +0100 Subject: [PATCH 34/99] Staging: iio: meter: fixed typo commit 0a8a29be499cbb67df79370aaf5109085509feb8 upstream. This patch fixes an obvious typo, which will cause erroneously returning the Peak Voltage instead of the Peak Current. Signed-off-by: Leonard Pollak Cc: Acked-by: Michael Hennerich Signed-off-by: Jonathan Cameron Signed-off-by: Greg Kroah-Hartman --- drivers/staging/iio/meter/ade7854.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/staging/iio/meter/ade7854.c b/drivers/staging/iio/meter/ade7854.c index 029c3bf42d4d..07774c000c5a 100644 --- a/drivers/staging/iio/meter/ade7854.c +++ b/drivers/staging/iio/meter/ade7854.c @@ -269,7 +269,7 @@ static IIO_DEV_ATTR_VPEAK(0644, static IIO_DEV_ATTR_IPEAK(0644, ade7854_read_32bit, ade7854_write_32bit, - ADE7854_VPEAK); + ADE7854_IPEAK); static IIO_DEV_ATTR_APHCAL(0644, ade7854_read_16bit, ade7854_write_16bit, From 6f3e66b155f02dd90a30c9307f4900256eafa3f1 Mon Sep 17 00:00:00 2001 From: Mircea Caprioru Date: Wed, 20 Feb 2019 13:08:20 +0200 Subject: [PATCH 35/99] staging: iio: ad7192: Fix ad7193 channel address commit 7ce0f216221856a17fc4934b39284678a5fef2e9 upstream. This patch fixes the differential channels addresses for the ad7193. Signed-off-by: Mircea Caprioru Cc: Signed-off-by: Jonathan Cameron Signed-off-by: Greg Kroah-Hartman --- drivers/staging/iio/adc/ad7192.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/drivers/staging/iio/adc/ad7192.c b/drivers/staging/iio/adc/ad7192.c index df0499fc4802..6857a4bf7297 100644 --- a/drivers/staging/iio/adc/ad7192.c +++ b/drivers/staging/iio/adc/ad7192.c @@ -109,10 +109,10 @@ #define AD7192_CH_AIN3 BIT(6) /* AIN3 - AINCOM */ #define AD7192_CH_AIN4 BIT(7) /* AIN4 - AINCOM */ -#define AD7193_CH_AIN1P_AIN2M 0x000 /* AIN1(+) - AIN2(-) */ -#define AD7193_CH_AIN3P_AIN4M 0x001 /* AIN3(+) - AIN4(-) */ -#define AD7193_CH_AIN5P_AIN6M 0x002 /* AIN5(+) - AIN6(-) */ -#define AD7193_CH_AIN7P_AIN8M 0x004 /* AIN7(+) - AIN8(-) */ +#define AD7193_CH_AIN1P_AIN2M 0x001 /* AIN1(+) - AIN2(-) */ +#define AD7193_CH_AIN3P_AIN4M 0x002 /* AIN3(+) - AIN4(-) */ +#define AD7193_CH_AIN5P_AIN6M 0x004 /* AIN5(+) - AIN6(-) */ +#define AD7193_CH_AIN7P_AIN8M 0x008 /* AIN7(+) - AIN8(-) */ #define AD7193_CH_TEMP 0x100 /* Temp senseor */ #define AD7193_CH_AIN2P_AIN2M 0x200 /* AIN2(+) - AIN2(-) */ #define AD7193_CH_AIN1 0x401 /* AIN1 - AINCOM */ From 8bd3fd46ec23375740e7f29bafce4a11f6cff1d2 Mon Sep 17 00:00:00 2001 From: Sergey Larin Date: Sat, 2 Mar 2019 19:54:55 +0300 Subject: [PATCH 36/99] iio: gyro: mpu3050: fix chip ID reading commit 409a51e0a4a5f908763191fae2c29008632eb712 upstream. According to the datasheet, the last bit of CHIP_ID register controls I2C bus, and the first one is unused. Handle this correctly. Note that there are chips out there that have a value such that the id check currently fails. Signed-off-by: Sergey Larin Reviewed-by: Linus Walleij Cc: Signed-off-by: Jonathan Cameron Signed-off-by: Greg Kroah-Hartman --- drivers/iio/gyro/mpu3050-core.c | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/drivers/iio/gyro/mpu3050-core.c b/drivers/iio/gyro/mpu3050-core.c index 77fac81a3adc..5ddebede31a6 100644 --- a/drivers/iio/gyro/mpu3050-core.c +++ b/drivers/iio/gyro/mpu3050-core.c @@ -29,7 +29,8 @@ #include "mpu3050.h" -#define MPU3050_CHIP_ID 0x69 +#define MPU3050_CHIP_ID 0x68 +#define MPU3050_CHIP_ID_MASK 0x7E /* * Register map: anything suffixed *_H is a big-endian high byte and always @@ -1176,8 +1177,9 @@ int mpu3050_common_probe(struct device *dev, goto err_power_down; } - if (val != MPU3050_CHIP_ID) { - dev_err(dev, "unsupported chip id %02x\n", (u8)val); + if ((val & MPU3050_CHIP_ID_MASK) != MPU3050_CHIP_ID) { + dev_err(dev, "unsupported chip id %02x\n", + (u8)(val & MPU3050_CHIP_ID_MASK)); ret = -ENODEV; goto err_power_down; } From f7ee6890caa59f71c4af92814afd4b4cb7dafcc9 Mon Sep 17 00:00:00 2001 From: Mike Looijmans Date: Wed, 13 Feb 2019 08:41:47 +0100 Subject: [PATCH 37/99] iio/gyro/bmg160: Use millidegrees for temperature scale commit 40a7198a4a01037003c7ca714f0d048a61e729ac upstream. Standard unit for temperature is millidegrees Celcius, whereas this driver was reporting in degrees. Fix the scale factor in the driver. Signed-off-by: Mike Looijmans Cc: Signed-off-by: Jonathan Cameron Signed-off-by: Greg Kroah-Hartman --- drivers/iio/gyro/bmg160_core.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/iio/gyro/bmg160_core.c b/drivers/iio/gyro/bmg160_core.c index 63ca31628a93..92c07ab826eb 100644 --- a/drivers/iio/gyro/bmg160_core.c +++ b/drivers/iio/gyro/bmg160_core.c @@ -582,11 +582,10 @@ static int bmg160_read_raw(struct iio_dev *indio_dev, case IIO_CHAN_INFO_LOW_PASS_FILTER_3DB_FREQUENCY: return bmg160_get_filter(data, val); case IIO_CHAN_INFO_SCALE: - *val = 0; switch (chan->type) { case IIO_TEMP: - *val2 = 500000; - return IIO_VAL_INT_PLUS_MICRO; + *val = 500; + return IIO_VAL_INT; case IIO_ANGL_VEL: { int i; @@ -594,6 +593,7 @@ static int bmg160_read_raw(struct iio_dev *indio_dev, for (i = 0; i < ARRAY_SIZE(bmg160_scale_table); ++i) { if (bmg160_scale_table[i].dps_range == data->dps_range) { + *val = 0; *val2 = bmg160_scale_table[i].scale; return IIO_VAL_INT_PLUS_MICRO; } From a3117576a73f6b2df88d96fb6ae0402bba0c24b3 Mon Sep 17 00:00:00 2001 From: Mike Looijmans Date: Wed, 6 Mar 2019 08:31:47 +0100 Subject: [PATCH 38/99] iio:chemical:bme680: Fix, report temperature in millidegrees commit 9436f45dd53595e21566a8c6627411077dfdb776 upstream. The standard unit for temperature is millidegrees Celcius. Adapt the driver to report in millidegrees instead of degrees. Signed-off-by: Mike Looijmans Fixes: 1b3bd8592780 ("iio: chemical: Add support for Bosch BME680 sensor"); Cc: Signed-off-by: Jonathan Cameron Signed-off-by: Greg Kroah-Hartman --- drivers/iio/chemical/bme680_core.c | 16 +++++++--------- 1 file changed, 7 insertions(+), 9 deletions(-) diff --git a/drivers/iio/chemical/bme680_core.c b/drivers/iio/chemical/bme680_core.c index 7d9bb62baa3f..ca665e872be9 100644 --- a/drivers/iio/chemical/bme680_core.c +++ b/drivers/iio/chemical/bme680_core.c @@ -591,8 +591,7 @@ static int bme680_gas_config(struct bme680_data *data) return ret; } -static int bme680_read_temp(struct bme680_data *data, - int *val, int *val2) +static int bme680_read_temp(struct bme680_data *data, int *val) { struct device *dev = regmap_get_device(data->regmap); int ret; @@ -625,10 +624,9 @@ static int bme680_read_temp(struct bme680_data *data, * compensate_press/compensate_humid to get compensated * pressure/humidity readings. */ - if (val && val2) { - *val = comp_temp; - *val2 = 100; - return IIO_VAL_FRACTIONAL; + if (val) { + *val = comp_temp * 10; /* Centidegrees to millidegrees */ + return IIO_VAL_INT; } return ret; @@ -643,7 +641,7 @@ static int bme680_read_press(struct bme680_data *data, s32 adc_press; /* Read and compensate temperature to get a reading of t_fine */ - ret = bme680_read_temp(data, NULL, NULL); + ret = bme680_read_temp(data, NULL); if (ret < 0) return ret; @@ -676,7 +674,7 @@ static int bme680_read_humid(struct bme680_data *data, u32 comp_humidity; /* Read and compensate temperature to get a reading of t_fine */ - ret = bme680_read_temp(data, NULL, NULL); + ret = bme680_read_temp(data, NULL); if (ret < 0) return ret; @@ -769,7 +767,7 @@ static int bme680_read_raw(struct iio_dev *indio_dev, case IIO_CHAN_INFO_PROCESSED: switch (chan->type) { case IIO_TEMP: - return bme680_read_temp(data, val, val2); + return bme680_read_temp(data, val); case IIO_PRESSURE: return bme680_read_press(data, val, val2); case IIO_HUMIDITYRELATIVE: From adfb0f0b17a3b1bdef44a5249def720534d7ac33 Mon Sep 17 00:00:00 2001 From: Mike Looijmans Date: Wed, 6 Mar 2019 08:31:48 +0100 Subject: [PATCH 39/99] iio:chemical:bme680: Fix SPI read interface commit 73f3bc6da506711302bb67572440eb84b1ec4a2c upstream. The SPI interface implementation was completely broken. When using the SPI interface, there are only 7 address bits, the upper bit is controlled by a page select register. The core needs access to both ranges, so implement register read/write for both regions. The regmap paging functionality didn't agree with a register that needs to be read and modified, so I implemented a custom paging algorithm. This fixes that the device wouldn't even probe in SPI mode. The SPI interface then isn't different from I2C, merged them into the core, and the I2C/SPI named registers are no longer needed. Implemented register value caching for the registers to reduce the I2C/SPI data transfers considerably. The calibration set reads as all zeroes until some undefined point in time, and I couldn't determine what makes it valid. The datasheet mentions these registers but does not provide any hints on when they become valid, and they aren't even enumerated in the memory map. So check the calibration and retry reading it from the device after each measurement until it provides something valid. Despite the size this is suitable for a stable backport given that it seems the SPI support never worked. Signed-off-by: Mike Looijmans Fixes: 1b3bd8592780 ("iio: chemical: Add support for Bosch BME680 sensor"); Cc: Signed-off-by: Jonathan Cameron Signed-off-by: Greg Kroah-Hartman --- drivers/iio/chemical/bme680.h | 6 +- drivers/iio/chemical/bme680_core.c | 38 ++++++++++ drivers/iio/chemical/bme680_i2c.c | 21 ------ drivers/iio/chemical/bme680_spi.c | 115 +++++++++++++++++++---------- 4 files changed, 118 insertions(+), 62 deletions(-) diff --git a/drivers/iio/chemical/bme680.h b/drivers/iio/chemical/bme680.h index e049323f209a..71dd635fce2d 100644 --- a/drivers/iio/chemical/bme680.h +++ b/drivers/iio/chemical/bme680.h @@ -2,11 +2,9 @@ #ifndef BME680_H_ #define BME680_H_ -#define BME680_REG_CHIP_I2C_ID 0xD0 -#define BME680_REG_CHIP_SPI_ID 0x50 +#define BME680_REG_CHIP_ID 0xD0 #define BME680_CHIP_ID_VAL 0x61 -#define BME680_REG_SOFT_RESET_I2C 0xE0 -#define BME680_REG_SOFT_RESET_SPI 0x60 +#define BME680_REG_SOFT_RESET 0xE0 #define BME680_CMD_SOFTRESET 0xB6 #define BME680_REG_STATUS 0x73 #define BME680_SPI_MEM_PAGE_BIT BIT(4) diff --git a/drivers/iio/chemical/bme680_core.c b/drivers/iio/chemical/bme680_core.c index ca665e872be9..b2db59812755 100644 --- a/drivers/iio/chemical/bme680_core.c +++ b/drivers/iio/chemical/bme680_core.c @@ -63,9 +63,23 @@ struct bme680_data { s32 t_fine; }; +static const struct regmap_range bme680_volatile_ranges[] = { + regmap_reg_range(BME680_REG_MEAS_STAT_0, BME680_REG_GAS_R_LSB), + regmap_reg_range(BME680_REG_STATUS, BME680_REG_STATUS), + regmap_reg_range(BME680_T2_LSB_REG, BME680_GH3_REG), +}; + +static const struct regmap_access_table bme680_volatile_table = { + .yes_ranges = bme680_volatile_ranges, + .n_yes_ranges = ARRAY_SIZE(bme680_volatile_ranges), +}; + const struct regmap_config bme680_regmap_config = { .reg_bits = 8, .val_bits = 8, + .max_register = 0xef, + .volatile_table = &bme680_volatile_table, + .cache_type = REGCACHE_RBTREE, }; EXPORT_SYMBOL(bme680_regmap_config); @@ -330,6 +344,10 @@ static s16 bme680_compensate_temp(struct bme680_data *data, s64 var1, var2, var3; s16 calc_temp; + /* If the calibration is invalid, attempt to reload it */ + if (!calib->par_t2) + bme680_read_calib(data, calib); + var1 = (adc_temp >> 3) - (calib->par_t1 << 1); var2 = (var1 * calib->par_t2) >> 11; var3 = ((var1 >> 1) * (var1 >> 1)) >> 12; @@ -903,8 +921,28 @@ int bme680_core_probe(struct device *dev, struct regmap *regmap, { struct iio_dev *indio_dev; struct bme680_data *data; + unsigned int val; int ret; + ret = regmap_write(regmap, BME680_REG_SOFT_RESET, + BME680_CMD_SOFTRESET); + if (ret < 0) { + dev_err(dev, "Failed to reset chip\n"); + return ret; + } + + ret = regmap_read(regmap, BME680_REG_CHIP_ID, &val); + if (ret < 0) { + dev_err(dev, "Error reading chip ID\n"); + return ret; + } + + if (val != BME680_CHIP_ID_VAL) { + dev_err(dev, "Wrong chip ID, got %x expected %x\n", + val, BME680_CHIP_ID_VAL); + return -ENODEV; + } + indio_dev = devm_iio_device_alloc(dev, sizeof(*data)); if (!indio_dev) return -ENOMEM; diff --git a/drivers/iio/chemical/bme680_i2c.c b/drivers/iio/chemical/bme680_i2c.c index 06d4be539d2e..cfc4449edf1b 100644 --- a/drivers/iio/chemical/bme680_i2c.c +++ b/drivers/iio/chemical/bme680_i2c.c @@ -23,8 +23,6 @@ static int bme680_i2c_probe(struct i2c_client *client, { struct regmap *regmap; const char *name = NULL; - unsigned int val; - int ret; regmap = devm_regmap_init_i2c(client, &bme680_regmap_config); if (IS_ERR(regmap)) { @@ -33,25 +31,6 @@ static int bme680_i2c_probe(struct i2c_client *client, return PTR_ERR(regmap); } - ret = regmap_write(regmap, BME680_REG_SOFT_RESET_I2C, - BME680_CMD_SOFTRESET); - if (ret < 0) { - dev_err(&client->dev, "Failed to reset chip\n"); - return ret; - } - - ret = regmap_read(regmap, BME680_REG_CHIP_I2C_ID, &val); - if (ret < 0) { - dev_err(&client->dev, "Error reading I2C chip ID\n"); - return ret; - } - - if (val != BME680_CHIP_ID_VAL) { - dev_err(&client->dev, "Wrong chip ID, got %x expected %x\n", - val, BME680_CHIP_ID_VAL); - return -ENODEV; - } - if (id) name = id->name; diff --git a/drivers/iio/chemical/bme680_spi.c b/drivers/iio/chemical/bme680_spi.c index c9fb05e8d0b9..881778e55d38 100644 --- a/drivers/iio/chemical/bme680_spi.c +++ b/drivers/iio/chemical/bme680_spi.c @@ -11,28 +11,93 @@ #include "bme680.h" +struct bme680_spi_bus_context { + struct spi_device *spi; + u8 current_page; +}; + +/* + * In SPI mode there are only 7 address bits, a "page" register determines + * which part of the 8-bit range is active. This function looks at the address + * and writes the page selection bit if needed + */ +static int bme680_regmap_spi_select_page( + struct bme680_spi_bus_context *ctx, u8 reg) +{ + struct spi_device *spi = ctx->spi; + int ret; + u8 buf[2]; + u8 page = (reg & 0x80) ? 0 : 1; /* Page "1" is low range */ + + if (page == ctx->current_page) + return 0; + + /* + * Data sheet claims we're only allowed to change bit 4, so we must do + * a read-modify-write on each and every page select + */ + buf[0] = BME680_REG_STATUS; + ret = spi_write_then_read(spi, buf, 1, buf + 1, 1); + if (ret < 0) { + dev_err(&spi->dev, "failed to set page %u\n", page); + return ret; + } + + buf[0] = BME680_REG_STATUS; + if (page) + buf[1] |= BME680_SPI_MEM_PAGE_BIT; + else + buf[1] &= ~BME680_SPI_MEM_PAGE_BIT; + + ret = spi_write(spi, buf, 2); + if (ret < 0) { + dev_err(&spi->dev, "failed to set page %u\n", page); + return ret; + } + + ctx->current_page = page; + + return 0; +} + static int bme680_regmap_spi_write(void *context, const void *data, size_t count) { - struct spi_device *spi = context; + struct bme680_spi_bus_context *ctx = context; + struct spi_device *spi = ctx->spi; + int ret; u8 buf[2]; memcpy(buf, data, 2); + + ret = bme680_regmap_spi_select_page(ctx, buf[0]); + if (ret) + return ret; + /* * The SPI register address (= full register address without bit 7) * and the write command (bit7 = RW = '0') */ buf[0] &= ~0x80; - return spi_write_then_read(spi, buf, 2, NULL, 0); + return spi_write(spi, buf, 2); } static int bme680_regmap_spi_read(void *context, const void *reg, size_t reg_size, void *val, size_t val_size) { - struct spi_device *spi = context; + struct bme680_spi_bus_context *ctx = context; + struct spi_device *spi = ctx->spi; + int ret; + u8 addr = *(const u8 *)reg; - return spi_write_then_read(spi, reg, reg_size, val, val_size); + ret = bme680_regmap_spi_select_page(ctx, addr); + if (ret) + return ret; + + addr |= 0x80; /* bit7 = RW = '1' */ + + return spi_write_then_read(spi, &addr, 1, val, val_size); } static struct regmap_bus bme680_regmap_bus = { @@ -45,8 +110,8 @@ static struct regmap_bus bme680_regmap_bus = { static int bme680_spi_probe(struct spi_device *spi) { const struct spi_device_id *id = spi_get_device_id(spi); + struct bme680_spi_bus_context *bus_context; struct regmap *regmap; - unsigned int val; int ret; spi->bits_per_word = 8; @@ -56,45 +121,21 @@ static int bme680_spi_probe(struct spi_device *spi) return ret; } + bus_context = devm_kzalloc(&spi->dev, sizeof(*bus_context), GFP_KERNEL); + if (!bus_context) + return -ENOMEM; + + bus_context->spi = spi; + bus_context->current_page = 0xff; /* Undefined on warm boot */ + regmap = devm_regmap_init(&spi->dev, &bme680_regmap_bus, - &spi->dev, &bme680_regmap_config); + bus_context, &bme680_regmap_config); if (IS_ERR(regmap)) { dev_err(&spi->dev, "Failed to register spi regmap %d\n", (int)PTR_ERR(regmap)); return PTR_ERR(regmap); } - ret = regmap_write(regmap, BME680_REG_SOFT_RESET_SPI, - BME680_CMD_SOFTRESET); - if (ret < 0) { - dev_err(&spi->dev, "Failed to reset chip\n"); - return ret; - } - - /* after power-on reset, Page 0(0x80-0xFF) of spi_mem_page is active */ - ret = regmap_read(regmap, BME680_REG_CHIP_SPI_ID, &val); - if (ret < 0) { - dev_err(&spi->dev, "Error reading SPI chip ID\n"); - return ret; - } - - if (val != BME680_CHIP_ID_VAL) { - dev_err(&spi->dev, "Wrong chip ID, got %x expected %x\n", - val, BME680_CHIP_ID_VAL); - return -ENODEV; - } - /* - * select Page 1 of spi_mem_page to enable access to - * to registers from address 0x00 to 0x7F. - */ - ret = regmap_write_bits(regmap, BME680_REG_STATUS, - BME680_SPI_MEM_PAGE_BIT, - BME680_SPI_MEM_PAGE_1_VAL); - if (ret < 0) { - dev_err(&spi->dev, "failed to set page 1 of spi_mem_page\n"); - return ret; - } - return bme680_core_probe(&spi->dev, regmap, id->name); } From 42eae0cff22a75823abfe198fce4418461b7cc6b Mon Sep 17 00:00:00 2001 From: Gwendal Grignou Date: Wed, 13 Mar 2019 12:40:02 +0100 Subject: [PATCH 40/99] iio: cros_ec: Fix the maths for gyro scale calculation commit 3d02d7082e5823598090530c3988a35f69689943 upstream. Calculation did not use IIO_DEGREE_TO_RAD and implemented a variant to avoid precision loss as we aim a nano value. The offset added to avoid rounding error, though, doesn't give us a close result to the expected value. E.g. For 1000dps, the result should be: (1000 * pi ) / 180 >> 15 ~= 0.000532632218 But with current calculation we get $ cat scale 0.000547890 Fix the calculation by just doing the maths involved for a nano value val * pi * 10e12 / (180 * 2^15) so we get a closer result. $ cat scale 0.000532632 Fixes: c14dca07a31d ("iio: cros_ec_sensors: add ChromeOS EC Contiguous Sensors driver") Signed-off-by: Gwendal Grignou Signed-off-by: Enric Balletbo i Serra Cc: Signed-off-by: Jonathan Cameron Signed-off-by: Greg Kroah-Hartman --- drivers/iio/common/cros_ec_sensors/cros_ec_sensors.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/drivers/iio/common/cros_ec_sensors/cros_ec_sensors.c b/drivers/iio/common/cros_ec_sensors/cros_ec_sensors.c index 89cb0066a6e0..8d76afb87d87 100644 --- a/drivers/iio/common/cros_ec_sensors/cros_ec_sensors.c +++ b/drivers/iio/common/cros_ec_sensors/cros_ec_sensors.c @@ -103,9 +103,10 @@ static int cros_ec_sensors_read(struct iio_dev *indio_dev, * Do not use IIO_DEGREE_TO_RAD to avoid precision * loss. Round to the nearest integer. */ - *val = div_s64(val64 * 314159 + 9000000ULL, 1000); - *val2 = 18000 << (CROS_EC_SENSOR_BITS - 1); - ret = IIO_VAL_FRACTIONAL; + *val = 0; + *val2 = div_s64(val64 * 3141592653ULL, + 180 << (CROS_EC_SENSOR_BITS - 1)); + ret = IIO_VAL_INT_PLUS_NANO; break; case MOTIONSENSE_TYPE_MAG: /* From 5ad173ea6c3a906c80c9e1afb3960c859d9ab037 Mon Sep 17 00:00:00 2001 From: Dragos Bogdan Date: Tue, 19 Mar 2019 12:47:00 +0200 Subject: [PATCH 41/99] iio: ad_sigma_delta: select channel when reading register commit fccfb9ce70ed4ea7a145f77b86de62e38178517f upstream. The desired channel has to be selected in order to correctly fill the buffer with the corresponding data. The `ad_sd_write_reg()` already does this, but for the `ad_sd_read_reg_raw()` this was omitted. Fixes: af3008485ea03 ("iio:adc: Add common code for ADI Sigma Delta devices") Signed-off-by: Dragos Bogdan Signed-off-by: Alexandru Ardelean Cc: Signed-off-by: Jonathan Cameron Signed-off-by: Greg Kroah-Hartman --- drivers/iio/adc/ad_sigma_delta.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/iio/adc/ad_sigma_delta.c b/drivers/iio/adc/ad_sigma_delta.c index fc9510716ac7..ae2a5097f449 100644 --- a/drivers/iio/adc/ad_sigma_delta.c +++ b/drivers/iio/adc/ad_sigma_delta.c @@ -121,6 +121,7 @@ static int ad_sd_read_reg_raw(struct ad_sigma_delta *sigma_delta, if (sigma_delta->info->has_registers) { data[0] = reg << sigma_delta->info->addr_shift; data[0] |= sigma_delta->info->read_mask; + data[0] |= sigma_delta->comm; spi_message_add_tail(&t[0], &m); } spi_message_add_tail(&t[1], &m); From 0e47edde91325efde88e937e1ccc5883c32b5a31 Mon Sep 17 00:00:00 2001 From: Jean-Francois Dagenais Date: Wed, 6 Mar 2019 15:56:06 -0500 Subject: [PATCH 42/99] iio: dac: mcp4725: add missing powerdown bits in store eeprom commit 06003531502d06bc89d32528f6ec96bf978790f9 upstream. When issuing the write DAC register and write eeprom command, the two powerdown bits (PD0 and PD1) are assumed by the chip to be present in the bytes sent. Leaving them at 0 implies "powerdown disabled" which is a different state that the current one. By adding the current state of the powerdown in the i2c write, the chip will correctly power-on exactly like as it is at the moment of store_eeprom call. This is documented in MCP4725's datasheet, FIGURE 6-2: "Write Commands for DAC Input Register and EEPROM" and MCP4726's datasheet, FIGURE 6-3: "Write All Memory Command". Signed-off-by: Jean-Francois Dagenais Acked-by: Peter Meerwald-Stadler Cc: Signed-off-by: Jonathan Cameron Signed-off-by: Greg Kroah-Hartman --- drivers/iio/dac/mcp4725.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/iio/dac/mcp4725.c b/drivers/iio/dac/mcp4725.c index 8b5aad4c32d9..30dc2775cbfb 100644 --- a/drivers/iio/dac/mcp4725.c +++ b/drivers/iio/dac/mcp4725.c @@ -98,6 +98,7 @@ static ssize_t mcp4725_store_eeprom(struct device *dev, inoutbuf[0] = 0x60; /* write EEPROM */ inoutbuf[0] |= data->ref_mode << 3; + inoutbuf[0] |= data->powerdown ? ((data->powerdown_mode + 1) << 1) : 0; inoutbuf[1] = data->dac_value >> 4; inoutbuf[2] = (data->dac_value & 0xf) << 4; From 36971130bb2f815fddae17cfe02ae5fed9b3ff56 Mon Sep 17 00:00:00 2001 From: Lars-Peter Clausen Date: Wed, 20 Feb 2019 17:11:32 +0200 Subject: [PATCH 43/99] iio: Fix scan mask selection commit 20ea39ef9f2f911bd01c69519e7d69cfec79fde3 upstream. The trialmask is expected to have all bits set to 0 after allocation. Currently kmalloc_array() is used which does not zero the memory and so random bits are set. This results in random channels being enabled when they shouldn't. Replace kmalloc_array() with kcalloc() which has the same interface but zeros the memory. Note the fix is actually required earlier than the below fixes tag, but will require a manual backport due to move from kmalloc to kmalloc_array. Signed-off-by: Lars-Peter Clausen Signed-off-by: Alexandru Ardelean Fixes commit 057ac1acdfc4 ("iio: Use kmalloc_array() in iio_scan_mask_set()"). Cc: Signed-off-by: Jonathan Cameron Signed-off-by: Greg Kroah-Hartman --- drivers/iio/industrialio-buffer.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/drivers/iio/industrialio-buffer.c b/drivers/iio/industrialio-buffer.c index cd5bfe39591b..dadd921a4a30 100644 --- a/drivers/iio/industrialio-buffer.c +++ b/drivers/iio/industrialio-buffer.c @@ -320,9 +320,8 @@ static int iio_scan_mask_set(struct iio_dev *indio_dev, const unsigned long *mask; unsigned long *trialmask; - trialmask = kmalloc_array(BITS_TO_LONGS(indio_dev->masklength), - sizeof(*trialmask), - GFP_KERNEL); + trialmask = kcalloc(BITS_TO_LONGS(indio_dev->masklength), + sizeof(*trialmask), GFP_KERNEL); if (trialmask == NULL) return -ENOMEM; if (!indio_dev->masklength) { From 98171e1947b6ce06fb7a439c0d8c599e0c090fcd Mon Sep 17 00:00:00 2001 From: Georg Ottinger Date: Wed, 30 Jan 2019 14:42:02 +0100 Subject: [PATCH 44/99] iio: adc: at91: disable adc channel interrupt in timeout case commit 09c6bdee51183a575bf7546890c8c137a75a2b44 upstream. Having a brief look at at91_adc_read_raw() it is obvious that in the case of a timeout the setting of AT91_ADC_CHDR and AT91_ADC_IDR registers is omitted. If 2 different channels are queried we can end up with a situation where two interrupts are enabled, but only one interrupt is cleared in the interrupt handler. Resulting in a interrupt loop and a system hang. Signed-off-by: Georg Ottinger Acked-by: Ludovic Desroches Cc: Signed-off-by: Jonathan Cameron Signed-off-by: Greg Kroah-Hartman --- drivers/iio/adc/at91_adc.c | 28 +++++++++++++++++----------- 1 file changed, 17 insertions(+), 11 deletions(-) diff --git a/drivers/iio/adc/at91_adc.c b/drivers/iio/adc/at91_adc.c index 75d2f73582a3..596841a3c4db 100644 --- a/drivers/iio/adc/at91_adc.c +++ b/drivers/iio/adc/at91_adc.c @@ -704,23 +704,29 @@ static int at91_adc_read_raw(struct iio_dev *idev, ret = wait_event_interruptible_timeout(st->wq_data_avail, st->done, msecs_to_jiffies(1000)); - if (ret == 0) - ret = -ETIMEDOUT; - if (ret < 0) { - mutex_unlock(&st->lock); - return ret; - } - - *val = st->last_value; + /* Disable interrupts, regardless if adc conversion was + * successful or not + */ at91_adc_writel(st, AT91_ADC_CHDR, AT91_ADC_CH(chan->channel)); at91_adc_writel(st, AT91_ADC_IDR, BIT(chan->channel)); - st->last_value = 0; - st->done = false; + if (ret > 0) { + /* a valid conversion took place */ + *val = st->last_value; + st->last_value = 0; + st->done = false; + ret = IIO_VAL_INT; + } else if (ret == 0) { + /* conversion timeout */ + dev_err(&idev->dev, "ADC Channel %d timeout.\n", + chan->channel); + ret = -ETIMEDOUT; + } + mutex_unlock(&st->lock); - return IIO_VAL_INT; + return ret; case IIO_CHAN_INFO_SCALE: *val = st->vref_mv; From bbe0bed4647ccc135a31c378b1257e9af6f9c4e9 Mon Sep 17 00:00:00 2001 From: Fabrice Gasnier Date: Mon, 25 Mar 2019 14:01:23 +0100 Subject: [PATCH 45/99] iio: core: fix a possible circular locking dependency commit 7f75591fc5a123929a29636834d1bcb8b5c9fee3 upstream. This fixes a possible circular locking dependency detected warning seen with: - CONFIG_PROVE_LOCKING=y - consumer/provider IIO devices (ex: "voltage-divider" consumer of "adc") When using the IIO consumer interface, e.g. iio_channel_get(), the consumer device will likely call iio_read_channel_raw() or similar that rely on 'info_exist_lock' mutex. typically: ... mutex_lock(&chan->indio_dev->info_exist_lock); if (chan->indio_dev->info == NULL) { ret = -ENODEV; goto err_unlock; } ret = do_some_ops() err_unlock: mutex_unlock(&chan->indio_dev->info_exist_lock); return ret; ... Same mutex is also hold in iio_device_unregister(). The following deadlock warning happens when: - the consumer device has called an API like iio_read_channel_raw() at least once. - the consumer driver is unregistered, removed (unbind from sysfs) ====================================================== WARNING: possible circular locking dependency detected 4.19.24 #577 Not tainted ------------------------------------------------------ sh/372 is trying to acquire lock: (kn->count#30){++++}, at: kernfs_remove_by_name_ns+0x3c/0x84 but task is already holding lock: (&dev->info_exist_lock){+.+.}, at: iio_device_unregister+0x18/0x60 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (&dev->info_exist_lock){+.+.}: __mutex_lock+0x70/0xa3c mutex_lock_nested+0x1c/0x24 iio_read_channel_raw+0x1c/0x60 iio_read_channel_info+0xa8/0xb0 dev_attr_show+0x1c/0x48 sysfs_kf_seq_show+0x84/0xec seq_read+0x154/0x528 __vfs_read+0x2c/0x15c vfs_read+0x8c/0x110 ksys_read+0x4c/0xac ret_fast_syscall+0x0/0x28 0xbedefb60 -> #0 (kn->count#30){++++}: lock_acquire+0xd8/0x268 __kernfs_remove+0x288/0x374 kernfs_remove_by_name_ns+0x3c/0x84 remove_files+0x34/0x78 sysfs_remove_group+0x40/0x9c sysfs_remove_groups+0x24/0x34 device_remove_attrs+0x38/0x64 device_del+0x11c/0x360 cdev_device_del+0x14/0x2c iio_device_unregister+0x24/0x60 release_nodes+0x1bc/0x200 device_release_driver_internal+0x1a0/0x230 unbind_store+0x80/0x130 kernfs_fop_write+0x100/0x1e4 __vfs_write+0x2c/0x160 vfs_write+0xa4/0x17c ksys_write+0x4c/0xac ret_fast_syscall+0x0/0x28 0xbe906840 other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&dev->info_exist_lock); lock(kn->count#30); lock(&dev->info_exist_lock); lock(kn->count#30); *** DEADLOCK *** ... cdev_device_del() can be called without holding the lock. It should be safe as info_exist_lock prevents kernelspace consumers to use the exported routines during/after provider removal. cdev_device_del() is for userspace. Help to reproduce: See example: Documentation/devicetree/bindings/iio/afe/voltage-divider.txt sysv { compatible = "voltage-divider"; io-channels = <&adc 0>; output-ohms = <22>; full-ohms = <222>; }; First, go to iio:deviceX for the "voltage-divider", do one read: $ cd /sys/bus/iio/devices/iio:deviceX $ cat in_voltage0_raw Then, unbind the consumer driver. It triggers above deadlock warning. $ cd /sys/bus/platform/drivers/iio-rescale/ $ echo sysv > unbind Note I don't actually expect stable will pick this up all the way back into IIO being in staging, but if's probably valid that far back. Signed-off-by: Fabrice Gasnier Fixes: ac917a81117c ("staging:iio:core set the iio_dev.info pointer to null on unregister") Cc: Signed-off-by: Jonathan Cameron Signed-off-by: Greg Kroah-Hartman --- drivers/iio/industrialio-core.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/iio/industrialio-core.c b/drivers/iio/industrialio-core.c index a062cfddc5af..49d4b4f1a457 100644 --- a/drivers/iio/industrialio-core.c +++ b/drivers/iio/industrialio-core.c @@ -1735,10 +1735,10 @@ EXPORT_SYMBOL(__iio_device_register); **/ void iio_device_unregister(struct iio_dev *indio_dev) { - mutex_lock(&indio_dev->info_exist_lock); - cdev_device_del(&indio_dev->chrdev, &indio_dev->dev); + mutex_lock(&indio_dev->info_exist_lock); + iio_device_unregister_debugfs(indio_dev); iio_disable_all_buffers(indio_dev); From b007c64d860f55db1064e4e7a56cbbabe87d1cc9 Mon Sep 17 00:00:00 2001 From: "he, bo" Date: Wed, 6 Mar 2019 10:32:20 +0800 Subject: [PATCH 46/99] io: accel: kxcjk1013: restore the range after resume. commit fe2d3df639a7940a125a33d6460529b9689c5406 upstream. On some laptops, kxcjk1013 is powered off when system enters S3. We need restore the range regiter during resume. Otherwise, the sensor doesn't work properly after S3. Signed-off-by: he, bo Signed-off-by: Chen, Hu Reviewed-by: Hans de Goede Cc: Signed-off-by: Jonathan Cameron Signed-off-by: Greg Kroah-Hartman --- drivers/iio/accel/kxcjk-1013.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/iio/accel/kxcjk-1013.c b/drivers/iio/accel/kxcjk-1013.c index 471caa5323e4..e5fdca74a630 100644 --- a/drivers/iio/accel/kxcjk-1013.c +++ b/drivers/iio/accel/kxcjk-1013.c @@ -1437,6 +1437,8 @@ static int kxcjk1013_resume(struct device *dev) mutex_lock(&data->mutex); ret = kxcjk1013_set_mode(data, OPERATION); + if (ret == 0) + ret = kxcjk1013_set_range(data, data->range); mutex_unlock(&data->mutex); return ret; From a1da981f66432d1d38c91cff0b19490b3bf7aace Mon Sep 17 00:00:00 2001 From: Christian Gromm Date: Tue, 2 Apr 2019 13:39:57 +0200 Subject: [PATCH 47/99] staging: most: core: use device description as name commit 131ac62253dba79daf4a6d83ab12293d2b9863d3 upstream. This patch uses the device description to clearly identity a device attached to the bus. It is needed as the currently useed mdevX notation is not sufficiant in case more than one network interface controller is being used at the same time. Cc: stable@vger.kernel.org Signed-off-by: Christian Gromm Signed-off-by: Greg Kroah-Hartman --- drivers/staging/most/core.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/staging/most/core.c b/drivers/staging/most/core.c index 52ad62722996..25a077f4ea94 100644 --- a/drivers/staging/most/core.c +++ b/drivers/staging/most/core.c @@ -1412,7 +1412,7 @@ int most_register_interface(struct most_interface *iface) INIT_LIST_HEAD(&iface->p->channel_list); iface->p->dev_id = id; - snprintf(iface->p->name, STRING_SIZE, "mdev%d", id); + strcpy(iface->p->name, iface->description); iface->dev.init_name = iface->p->name; iface->dev.bus = &mc.bus; iface->dev.parent = &mc.dev; From 1f01a970b8c2de057c7c1cf3afce3d116fb6645b Mon Sep 17 00:00:00 2001 From: Ian Abbott Date: Mon, 15 Apr 2019 12:10:14 +0100 Subject: [PATCH 48/99] staging: comedi: vmk80xx: Fix use of uninitialized semaphore commit 08b7c2f9208f0e2a32159e4e7a4831b7adb10a3e upstream. If `vmk80xx_auto_attach()` returns an error, the core comedi module code will call `vmk80xx_detach()` to clean up. If `vmk80xx_auto_attach()` successfully allocated the comedi device private data, `vmk80xx_detach()` assumes that a `struct semaphore limit_sem` contained in the private data has been initialized and uses it. Unfortunately, there are a couple of places where `vmk80xx_auto_attach()` can return an error after allocating the device private data but before initializing the semaphore, so this assumption is invalid. Fix it by initializing the semaphore just after allocating the private data in `vmk80xx_auto_attach()` before any other errors can be returned. I believe this was the cause of the following syzbot crash report : usb 1-1: config 0 has no interface number 0 usb 1-1: New USB device found, idVendor=10cf, idProduct=8068, bcdDevice=e6.8d usb 1-1: New USB device strings: Mfr=0, Product=0, SerialNumber=0 usb 1-1: config 0 descriptor?? vmk80xx 1-1:0.117: driver 'vmk80xx' failed to auto-configure device. INFO: trying to register non-static key. the code is fine but needs lockdep annotation. turning off the locking correctness validator. CPU: 0 PID: 12 Comm: kworker/0:1 Not tainted 5.1.0-rc4-319354-g9a33b36 #3 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Workqueue: usb_hub_wq hub_event Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0xe8/0x16e lib/dump_stack.c:113 assign_lock_key kernel/locking/lockdep.c:786 [inline] register_lock_class+0x11b8/0x1250 kernel/locking/lockdep.c:1095 __lock_acquire+0xfb/0x37c0 kernel/locking/lockdep.c:3582 lock_acquire+0x10d/0x2f0 kernel/locking/lockdep.c:4211 __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline] _raw_spin_lock_irqsave+0x44/0x60 kernel/locking/spinlock.c:152 down+0x12/0x80 kernel/locking/semaphore.c:58 vmk80xx_detach+0x59/0x100 drivers/staging/comedi/drivers/vmk80xx.c:829 comedi_device_detach+0xed/0x800 drivers/staging/comedi/drivers.c:204 comedi_device_cleanup.part.0+0x68/0x140 drivers/staging/comedi/comedi_fops.c:156 comedi_device_cleanup drivers/staging/comedi/comedi_fops.c:187 [inline] comedi_free_board_dev.part.0+0x16/0x90 drivers/staging/comedi/comedi_fops.c:190 comedi_free_board_dev drivers/staging/comedi/comedi_fops.c:189 [inline] comedi_release_hardware_device+0x111/0x140 drivers/staging/comedi/comedi_fops.c:2880 comedi_auto_config.cold+0x124/0x1b0 drivers/staging/comedi/drivers.c:1068 usb_probe_interface+0x31d/0x820 drivers/usb/core/driver.c:361 really_probe+0x2da/0xb10 drivers/base/dd.c:509 driver_probe_device+0x21d/0x350 drivers/base/dd.c:671 __device_attach_driver+0x1d8/0x290 drivers/base/dd.c:778 bus_for_each_drv+0x163/0x1e0 drivers/base/bus.c:454 __device_attach+0x223/0x3a0 drivers/base/dd.c:844 bus_probe_device+0x1f1/0x2a0 drivers/base/bus.c:514 device_add+0xad2/0x16e0 drivers/base/core.c:2106 usb_set_configuration+0xdf7/0x1740 drivers/usb/core/message.c:2021 generic_probe+0xa2/0xda drivers/usb/core/generic.c:210 usb_probe_device+0xc0/0x150 drivers/usb/core/driver.c:266 really_probe+0x2da/0xb10 drivers/base/dd.c:509 driver_probe_device+0x21d/0x350 drivers/base/dd.c:671 __device_attach_driver+0x1d8/0x290 drivers/base/dd.c:778 bus_for_each_drv+0x163/0x1e0 drivers/base/bus.c:454 __device_attach+0x223/0x3a0 drivers/base/dd.c:844 bus_probe_device+0x1f1/0x2a0 drivers/base/bus.c:514 device_add+0xad2/0x16e0 drivers/base/core.c:2106 usb_new_device.cold+0x537/0xccf drivers/usb/core/hub.c:2534 hub_port_connect drivers/usb/core/hub.c:5089 [inline] hub_port_connect_change drivers/usb/core/hub.c:5204 [inline] port_event drivers/usb/core/hub.c:5350 [inline] hub_event+0x138e/0x3b00 drivers/usb/core/hub.c:5432 process_one_work+0x90f/0x1580 kernel/workqueue.c:2269 worker_thread+0x9b/0xe20 kernel/workqueue.c:2415 kthread+0x313/0x420 kernel/kthread.c:253 ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:352 Reported-by: syzbot+54c2f58f15fe6876b6ad@syzkaller.appspotmail.com Signed-off-by: Ian Abbott Cc: stable Signed-off-by: Greg Kroah-Hartman --- drivers/staging/comedi/drivers/vmk80xx.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/staging/comedi/drivers/vmk80xx.c b/drivers/staging/comedi/drivers/vmk80xx.c index 6234b649d887..b035d662390b 100644 --- a/drivers/staging/comedi/drivers/vmk80xx.c +++ b/drivers/staging/comedi/drivers/vmk80xx.c @@ -800,6 +800,8 @@ static int vmk80xx_auto_attach(struct comedi_device *dev, devpriv->model = board->model; + sema_init(&devpriv->limit_sem, 8); + ret = vmk80xx_find_usb_endpoints(dev); if (ret) return ret; @@ -808,8 +810,6 @@ static int vmk80xx_auto_attach(struct comedi_device *dev, if (ret) return ret; - sema_init(&devpriv->limit_sem, 8); - usb_set_intfdata(intf, devpriv); if (devpriv->model == VMK8055_MODEL) From edf2f548baa959e90f1a2f0fce737e18701070d5 Mon Sep 17 00:00:00 2001 From: Ian Abbott Date: Mon, 15 Apr 2019 12:52:30 +0100 Subject: [PATCH 49/99] staging: comedi: vmk80xx: Fix possible double-free of ->usb_rx_buf commit 663d294b4768bfd89e529e069bffa544a830b5bf upstream. `vmk80xx_alloc_usb_buffers()` is called from `vmk80xx_auto_attach()` to allocate RX and TX buffers for USB transfers. It allocates `devpriv->usb_rx_buf` followed by `devpriv->usb_tx_buf`. If the allocation of `devpriv->usb_tx_buf` fails, it frees `devpriv->usb_rx_buf`, leaving the pointer set dangling, and returns an error. Later, `vmk80xx_detach()` will be called from the core comedi module code to clean up. `vmk80xx_detach()` also frees both `devpriv->usb_rx_buf` and `devpriv->usb_tx_buf`, but `devpriv->usb_rx_buf` may have already been freed, leading to a double-free error. Fix it by removing the call to `kfree(devpriv->usb_rx_buf)` from `vmk80xx_alloc_usb_buffers()`, relying on `vmk80xx_detach()` to free the memory. Signed-off-by: Ian Abbott Cc: stable Signed-off-by: Greg Kroah-Hartman --- drivers/staging/comedi/drivers/vmk80xx.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/drivers/staging/comedi/drivers/vmk80xx.c b/drivers/staging/comedi/drivers/vmk80xx.c index b035d662390b..65dc6c51037e 100644 --- a/drivers/staging/comedi/drivers/vmk80xx.c +++ b/drivers/staging/comedi/drivers/vmk80xx.c @@ -682,10 +682,8 @@ static int vmk80xx_alloc_usb_buffers(struct comedi_device *dev) size = usb_endpoint_maxp(devpriv->ep_tx); devpriv->usb_tx_buf = kzalloc(size, GFP_KERNEL); - if (!devpriv->usb_tx_buf) { - kfree(devpriv->usb_rx_buf); + if (!devpriv->usb_tx_buf) return -ENOMEM; - } return 0; } From 09f9bacae11874c6abe368fab7b093dd33ef11ea Mon Sep 17 00:00:00 2001 From: Ian Abbott Date: Mon, 15 Apr 2019 12:43:01 +0100 Subject: [PATCH 50/99] staging: comedi: ni_usb6501: Fix use of uninitialized mutex commit 660cf4ce9d0f3497cc7456eaa6d74c8b71d6282c upstream. If `ni6501_auto_attach()` returns an error, the core comedi module code will call `ni6501_detach()` to clean up. If `ni6501_auto_attach()` successfully allocated the comedi device private data, `ni6501_detach()` assumes that a `struct mutex mut` contained in the private data has been initialized and uses it. Unfortunately, there are a couple of places where `ni6501_auto_attach()` can return an error after allocating the device private data but before initializing the mutex, so this assumption is invalid. Fix it by initializing the mutex just after allocating the private data in `ni6501_auto_attach()` before any other errors can be retturned. Also move the call to `usb_set_intfdata()` just to keep the code a bit neater (either position for the call is fine). I believe this was the cause of the following syzbot crash report : usb 1-1: New USB device strings: Mfr=0, Product=0, SerialNumber=0 usb 1-1: config 0 descriptor?? usb 1-1: string descriptor 0 read error: -71 comedi comedi0: Wrong number of endpoints ni6501 1-1:0.233: driver 'ni6501' failed to auto-configure device. INFO: trying to register non-static key. the code is fine but needs lockdep annotation. turning off the locking correctness validator. CPU: 0 PID: 585 Comm: kworker/0:3 Not tainted 5.1.0-rc4-319354-g9a33b36 #3 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Workqueue: usb_hub_wq hub_event Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0xe8/0x16e lib/dump_stack.c:113 assign_lock_key kernel/locking/lockdep.c:786 [inline] register_lock_class+0x11b8/0x1250 kernel/locking/lockdep.c:1095 __lock_acquire+0xfb/0x37c0 kernel/locking/lockdep.c:3582 lock_acquire+0x10d/0x2f0 kernel/locking/lockdep.c:4211 __mutex_lock_common kernel/locking/mutex.c:925 [inline] __mutex_lock+0xfe/0x12b0 kernel/locking/mutex.c:1072 ni6501_detach+0x5b/0x110 drivers/staging/comedi/drivers/ni_usb6501.c:567 comedi_device_detach+0xed/0x800 drivers/staging/comedi/drivers.c:204 comedi_device_cleanup.part.0+0x68/0x140 drivers/staging/comedi/comedi_fops.c:156 comedi_device_cleanup drivers/staging/comedi/comedi_fops.c:187 [inline] comedi_free_board_dev.part.0+0x16/0x90 drivers/staging/comedi/comedi_fops.c:190 comedi_free_board_dev drivers/staging/comedi/comedi_fops.c:189 [inline] comedi_release_hardware_device+0x111/0x140 drivers/staging/comedi/comedi_fops.c:2880 comedi_auto_config.cold+0x124/0x1b0 drivers/staging/comedi/drivers.c:1068 usb_probe_interface+0x31d/0x820 drivers/usb/core/driver.c:361 really_probe+0x2da/0xb10 drivers/base/dd.c:509 driver_probe_device+0x21d/0x350 drivers/base/dd.c:671 __device_attach_driver+0x1d8/0x290 drivers/base/dd.c:778 bus_for_each_drv+0x163/0x1e0 drivers/base/bus.c:454 __device_attach+0x223/0x3a0 drivers/base/dd.c:844 bus_probe_device+0x1f1/0x2a0 drivers/base/bus.c:514 device_add+0xad2/0x16e0 drivers/base/core.c:2106 usb_set_configuration+0xdf7/0x1740 drivers/usb/core/message.c:2021 generic_probe+0xa2/0xda drivers/usb/core/generic.c:210 usb_probe_device+0xc0/0x150 drivers/usb/core/driver.c:266 really_probe+0x2da/0xb10 drivers/base/dd.c:509 driver_probe_device+0x21d/0x350 drivers/base/dd.c:671 __device_attach_driver+0x1d8/0x290 drivers/base/dd.c:778 bus_for_each_drv+0x163/0x1e0 drivers/base/bus.c:454 __device_attach+0x223/0x3a0 drivers/base/dd.c:844 bus_probe_device+0x1f1/0x2a0 drivers/base/bus.c:514 device_add+0xad2/0x16e0 drivers/base/core.c:2106 usb_new_device.cold+0x537/0xccf drivers/usb/core/hub.c:2534 hub_port_connect drivers/usb/core/hub.c:5089 [inline] hub_port_connect_change drivers/usb/core/hub.c:5204 [inline] port_event drivers/usb/core/hub.c:5350 [inline] hub_event+0x138e/0x3b00 drivers/usb/core/hub.c:5432 process_one_work+0x90f/0x1580 kernel/workqueue.c:2269 worker_thread+0x9b/0xe20 kernel/workqueue.c:2415 kthread+0x313/0x420 kernel/kthread.c:253 ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:352 Reported-by: syzbot+cf4f2b6c24aff0a3edf6@syzkaller.appspotmail.com Signed-off-by: Ian Abbott Cc: stable Signed-off-by: Greg Kroah-Hartman --- drivers/staging/comedi/drivers/ni_usb6501.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/staging/comedi/drivers/ni_usb6501.c b/drivers/staging/comedi/drivers/ni_usb6501.c index 808ed92ed66f..ed5e42655821 100644 --- a/drivers/staging/comedi/drivers/ni_usb6501.c +++ b/drivers/staging/comedi/drivers/ni_usb6501.c @@ -518,6 +518,9 @@ static int ni6501_auto_attach(struct comedi_device *dev, if (!devpriv) return -ENOMEM; + mutex_init(&devpriv->mut); + usb_set_intfdata(intf, devpriv); + ret = ni6501_find_endpoints(dev); if (ret) return ret; @@ -526,9 +529,6 @@ static int ni6501_auto_attach(struct comedi_device *dev, if (ret) return ret; - mutex_init(&devpriv->mut); - usb_set_intfdata(intf, devpriv); - ret = comedi_alloc_subdevices(dev, 2); if (ret) return ret; From 4e78a1fb8d1d0948aafac07465779e821d9dda2e Mon Sep 17 00:00:00 2001 From: Ian Abbott Date: Mon, 15 Apr 2019 12:43:02 +0100 Subject: [PATCH 51/99] staging: comedi: ni_usb6501: Fix possible double-free of ->usb_rx_buf commit af4b54a2e5ba18259ff9aac445bf546dd60d037e upstream. `ni6501_alloc_usb_buffers()` is called from `ni6501_auto_attach()` to allocate RX and TX buffers for USB transfers. It allocates `devpriv->usb_rx_buf` followed by `devpriv->usb_tx_buf`. If the allocation of `devpriv->usb_tx_buf` fails, it frees `devpriv->usb_rx_buf`, leaving the pointer set dangling, and returns an error. Later, `ni6501_detach()` will be called from the core comedi module code to clean up. `ni6501_detach()` also frees both `devpriv->usb_rx_buf` and `devpriv->usb_tx_buf`, but `devpriv->usb_rx_buf` may have already beed freed, leading to a double-free error. Fix it bu removing the call to `kfree(devpriv->usb_rx_buf)` from `ni6501_alloc_usb_buffers()`, relying on `ni6501_detach()` to free the memory. Signed-off-by: Ian Abbott Cc: stable Signed-off-by: Greg Kroah-Hartman --- drivers/staging/comedi/drivers/ni_usb6501.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/drivers/staging/comedi/drivers/ni_usb6501.c b/drivers/staging/comedi/drivers/ni_usb6501.c index ed5e42655821..1bb1cb651349 100644 --- a/drivers/staging/comedi/drivers/ni_usb6501.c +++ b/drivers/staging/comedi/drivers/ni_usb6501.c @@ -463,10 +463,8 @@ static int ni6501_alloc_usb_buffers(struct comedi_device *dev) size = usb_endpoint_maxp(devpriv->ep_tx); devpriv->usb_tx_buf = kzalloc(size, GFP_KERNEL); - if (!devpriv->usb_tx_buf) { - kfree(devpriv->usb_rx_buf); + if (!devpriv->usb_tx_buf) return -ENOMEM; - } return 0; } From 4171b6ee932828ad2c8258f4a0a84a2d1c05c896 Mon Sep 17 00:00:00 2001 From: Hui Wang Date: Wed, 17 Apr 2019 16:10:32 +0800 Subject: [PATCH 52/99] ALSA: hda/realtek - add two more pin configuration sets to quirk table commit b26e36b7ef36a8a3a147b1609b2505f8a4ecf511 upstream. We have two Dell laptops which have the codec 10ec0236 and 10ec0256 respectively, the headset mic on them can't work, need to apply the quirk of ALC255_FIXUP_DELL1_MIC_NO_PRESENCE. So adding their pin configurations in the pin quirk table. Cc: Signed-off-by: Hui Wang Signed-off-by: Takashi Iwai Signed-off-by: Greg Kroah-Hartman --- sound/pci/hda/patch_realtek.c | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/sound/pci/hda/patch_realtek.c b/sound/pci/hda/patch_realtek.c index bd60eb7168fa..0a745d677b1c 100644 --- a/sound/pci/hda/patch_realtek.c +++ b/sound/pci/hda/patch_realtek.c @@ -7170,6 +7170,8 @@ static const struct snd_hda_pin_quirk alc269_pin_fixup_tbl[] = { {0x12, 0x90a60140}, {0x14, 0x90170150}, {0x21, 0x02211020}), + SND_HDA_PIN_QUIRK(0x10ec0236, 0x1028, "Dell", ALC255_FIXUP_DELL1_MIC_NO_PRESENCE, + {0x21, 0x02211020}), SND_HDA_PIN_QUIRK(0x10ec0255, 0x1028, "Dell", ALC255_FIXUP_DELL2_MIC_NO_PRESENCE, {0x14, 0x90170110}, {0x21, 0x02211020}), @@ -7280,6 +7282,10 @@ static const struct snd_hda_pin_quirk alc269_pin_fixup_tbl[] = { {0x21, 0x0221101f}), SND_HDA_PIN_QUIRK(0x10ec0256, 0x1028, "Dell", ALC255_FIXUP_DELL1_MIC_NO_PRESENCE, ALC256_STANDARD_PINS), + SND_HDA_PIN_QUIRK(0x10ec0256, 0x1028, "Dell", ALC255_FIXUP_DELL1_MIC_NO_PRESENCE, + {0x14, 0x90170110}, + {0x1b, 0x01011020}, + {0x21, 0x0221101f}), SND_HDA_PIN_QUIRK(0x10ec0256, 0x1043, "ASUS", ALC256_FIXUP_ASUS_MIC, {0x14, 0x90170110}, {0x1b, 0x90a70130}, From b50e435df2d8b9a1d3e956e1c767dfc7e30a441b Mon Sep 17 00:00:00 2001 From: Takashi Iwai Date: Tue, 16 Apr 2019 17:06:33 +0200 Subject: [PATCH 53/99] ALSA: core: Fix card races between register and disconnect commit 2a3f7221acddfe1caa9ff09b3a8158c39b2fdeac upstream. There is a small race window in the card disconnection code that allows the registration of another card with the very same card id. This leads to a warning in procfs creation as caught by syzkaller. The problem is that we delete snd_cards and snd_cards_lock entries at the very beginning of the disconnection procedure. This makes the slot available to be assigned for another card object while the disconnection procedure is being processed. Then it becomes possible to issue a procfs registration with the existing file name although we check the conflict beforehand. The fix is simply to move the snd_cards and snd_cards_lock clearances at the end of the disconnection procedure. The references to these entries are merely either from the global proc files like /proc/asound/cards or from the card registration / disconnection, so it should be fine to shift at the very end. Reported-by: syzbot+48df349490c36f9f54ab@syzkaller.appspotmail.com Cc: Signed-off-by: Takashi Iwai Signed-off-by: Greg Kroah-Hartman --- sound/core/init.c | 18 +++++++++--------- 1 file changed, 9 insertions(+), 9 deletions(-) diff --git a/sound/core/init.c b/sound/core/init.c index 4849c611c0fe..16b7cc7aa66b 100644 --- a/sound/core/init.c +++ b/sound/core/init.c @@ -407,14 +407,7 @@ int snd_card_disconnect(struct snd_card *card) card->shutdown = 1; spin_unlock(&card->files_lock); - /* phase 1: disable fops (user space) operations for ALSA API */ - mutex_lock(&snd_card_mutex); - snd_cards[card->number] = NULL; - clear_bit(card->number, snd_cards_lock); - mutex_unlock(&snd_card_mutex); - - /* phase 2: replace file->f_op with special dummy operations */ - + /* replace file->f_op with special dummy operations */ spin_lock(&card->files_lock); list_for_each_entry(mfile, &card->files_list, list) { /* it's critical part, use endless loop */ @@ -430,7 +423,7 @@ int snd_card_disconnect(struct snd_card *card) } spin_unlock(&card->files_lock); - /* phase 3: notify all connected devices about disconnection */ + /* notify all connected devices about disconnection */ /* at this point, they cannot respond to any calls except release() */ #if IS_ENABLED(CONFIG_SND_MIXER_OSS) @@ -446,6 +439,13 @@ int snd_card_disconnect(struct snd_card *card) device_del(&card->card_dev); card->registered = false; } + + /* disable fops (user space) operations for ALSA API */ + mutex_lock(&snd_card_mutex); + snd_cards[card->number] = NULL; + clear_bit(card->number, snd_cards_lock); + mutex_unlock(&snd_card_mutex); + #ifdef CONFIG_PM wake_up(&card->power_sleep); #endif From ec96f65e121459eda46de186750a7e9646306288 Mon Sep 17 00:00:00 2001 From: KT Liao Date: Tue, 26 Mar 2019 17:28:32 -0700 Subject: [PATCH 54/99] Input: elan_i2c - add hardware ID for multiple Lenovo laptops commit 738c06d0e4562e0acf9f2c7438a22b2d5afc67aa upstream. There are many Lenovo laptops which need elan_i2c support, this patch adds relevant IDs to the Elan driver so that touchpads are recognized. Signed-off-by: KT Liao Cc: stable@vger.kernel.org Signed-off-by: Dmitry Torokhov Signed-off-by: Greg Kroah-Hartman --- drivers/input/mouse/elan_i2c_core.c | 25 +++++++++++++++++++++++++ 1 file changed, 25 insertions(+) diff --git a/drivers/input/mouse/elan_i2c_core.c b/drivers/input/mouse/elan_i2c_core.c index 628ef617bb2f..f9525d6f0bfe 100644 --- a/drivers/input/mouse/elan_i2c_core.c +++ b/drivers/input/mouse/elan_i2c_core.c @@ -1339,21 +1339,46 @@ static const struct acpi_device_id elan_acpi_id[] = { { "ELAN0600", 0 }, { "ELAN0601", 0 }, { "ELAN0602", 0 }, + { "ELAN0603", 0 }, + { "ELAN0604", 0 }, { "ELAN0605", 0 }, + { "ELAN0606", 0 }, + { "ELAN0607", 0 }, { "ELAN0608", 0 }, { "ELAN0609", 0 }, { "ELAN060B", 0 }, { "ELAN060C", 0 }, + { "ELAN060F", 0 }, + { "ELAN0610", 0 }, { "ELAN0611", 0 }, { "ELAN0612", 0 }, + { "ELAN0615", 0 }, + { "ELAN0616", 0 }, { "ELAN0617", 0 }, { "ELAN0618", 0 }, + { "ELAN0619", 0 }, + { "ELAN061A", 0 }, + { "ELAN061B", 0 }, { "ELAN061C", 0 }, { "ELAN061D", 0 }, { "ELAN061E", 0 }, + { "ELAN061F", 0 }, { "ELAN0620", 0 }, { "ELAN0621", 0 }, { "ELAN0622", 0 }, + { "ELAN0623", 0 }, + { "ELAN0624", 0 }, + { "ELAN0625", 0 }, + { "ELAN0626", 0 }, + { "ELAN0627", 0 }, + { "ELAN0628", 0 }, + { "ELAN0629", 0 }, + { "ELAN062A", 0 }, + { "ELAN062B", 0 }, + { "ELAN062C", 0 }, + { "ELAN062D", 0 }, + { "ELAN0631", 0 }, + { "ELAN0632", 0 }, { "ELAN1000", 0 }, { } }; From de6d6b8902fb2d26a46d266841364c3bfa70dc41 Mon Sep 17 00:00:00 2001 From: Geert Uytterhoeven Date: Fri, 29 Mar 2019 10:10:26 +0100 Subject: [PATCH 55/99] serial: sh-sci: Fix HSCIF RX sampling point adjustment commit 6b87784b53592a90d21576be8eff688b56d93cce upstream. The calculation of the sampling point has min() and max() exchanged. Fix this by using the clamp() helper instead. Fixes: 63ba1e00f178a448 ("serial: sh-sci: Support for HSCIF RX sampling point adjustment") Signed-off-by: Geert Uytterhoeven Reviewed-by: Ulrich Hecht Reviewed-by: Wolfram Sang Acked-by: Dirk Behme Cc: stable Reviewed-by: Simon Horman Signed-off-by: Greg Kroah-Hartman --- drivers/tty/serial/sh-sci.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/tty/serial/sh-sci.c b/drivers/tty/serial/sh-sci.c index cbbf239aea0f..3c25e9f0b579 100644 --- a/drivers/tty/serial/sh-sci.c +++ b/drivers/tty/serial/sh-sci.c @@ -2504,7 +2504,7 @@ done: * last stop bit; we can increase the error * margin by shifting the sampling point. */ - int shift = min(-8, max(7, deviation / 2)); + int shift = clamp(deviation / 2, -8, 7); hssrr |= (shift << HSCIF_SRHP_SHIFT) & HSCIF_SRHP_MASK; From 38b7f09a9e83da3d8d786429520aace041e1d29e Mon Sep 17 00:00:00 2001 From: Geert Uytterhoeven Date: Mon, 1 Apr 2019 13:25:10 +0200 Subject: [PATCH 56/99] serial: sh-sci: Fix HSCIF RX sampling point calculation commit ace965696da2611af759f0284e26342b7b6cec89 upstream. There are several issues with the formula used for calculating the deviation from the intended rate: 1. While min_err and last_stop are signed, srr and baud are unsigned. Hence the signed values are promoted to unsigned, which will lead to a bogus value of deviation if min_err is negative, 2. Srr is the register field value, which is one less than the actual sampling rate factor, 3. The divisions do not use rounding. Fix this by casting unsigned variables to int, adding one to srr, and using a single DIV_ROUND_CLOSEST(). Fixes: 63ba1e00f178a448 ("serial: sh-sci: Support for HSCIF RX sampling point adjustment") Signed-off-by: Geert Uytterhoeven Reviewed-by: Mukesh Ojha Cc: stable Reviewed-by: Ulrich Hecht Signed-off-by: Greg Kroah-Hartman --- drivers/tty/serial/sh-sci.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/tty/serial/sh-sci.c b/drivers/tty/serial/sh-sci.c index 3c25e9f0b579..03fe3fb4bff6 100644 --- a/drivers/tty/serial/sh-sci.c +++ b/drivers/tty/serial/sh-sci.c @@ -2497,7 +2497,9 @@ done: * center of the last stop bit in sampling clocks. */ int last_stop = bits * 2 - 1; - int deviation = min_err * srr * last_stop / 2 / baud; + int deviation = DIV_ROUND_CLOSEST(min_err * last_stop * + (int)(srr + 1), + 2 * (int)baud); if (abs(deviation) >= 2) { /* At least two sampling clocks off at the From 8f2ef0e8f9670f83b3fe7e620eb7ae98910ff627 Mon Sep 17 00:00:00 2001 From: Mikulas Patocka Date: Thu, 4 Apr 2019 20:53:28 -0400 Subject: [PATCH 57/99] vt: fix cursor when clearing the screen commit b2ecf00631362a83744e5ec249947620db5e240c upstream. The patch a6dbe4427559 ("vt: perform safe console erase in the right order") introduced a bug. The conditional do_update_region() was replaced by a call to update_region() that does contain the conditional already, but with unwanted extra side effects such as restoring the cursor drawing. In order to reproduce the bug: - use framebuffer console with the AMDGPU driver - type "links" to start the console www browser - press 'q' and space to exit links Now the cursor will be permanently visible in the center of the screen. It will stay there until something overwrites it. The bug goes away if we change update_region() back to the conditional do_update_region(). [ nico: reworded changelog ] Signed-off-by: Mikulas Patocka Reviewed-by: Nicolas Pitre Cc: stable@vger.kernel.org Fixes: a6dbe4427559 ("vt: perform safe console erase in the right order") Signed-off-by: Greg Kroah-Hartman --- drivers/tty/vt/vt.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/tty/vt/vt.c b/drivers/tty/vt/vt.c index b9a9a07f1ee9..3e5ec1cee059 100644 --- a/drivers/tty/vt/vt.c +++ b/drivers/tty/vt/vt.c @@ -1521,7 +1521,8 @@ static void csi_J(struct vc_data *vc, int vpar) return; } scr_memsetw(start, vc->vc_video_erase_char, 2 * count); - update_region(vc, (unsigned long) start, count); + if (con_should_update(vc)) + do_update_region(vc, (unsigned long) start, count); vc->vc_need_wrap = 0; } From 1aa2682d0a98bb025f8dd29df11f978a249a81ba Mon Sep 17 00:00:00 2001 From: Jaesoo Lee Date: Tue, 9 Apr 2019 17:02:22 -0700 Subject: [PATCH 58/99] scsi: core: set result when the command cannot be dispatched commit be549d49115422f846b6d96ee8fd7173a5f7ceb0 upstream. When SCSI blk-mq is enabled, there is a bug in handling errors in scsi_queue_rq. Specifically, the bug is not setting result field of scsi_request correctly when the dispatch of the command has been failed. Since the upper layer code including the sg_io ioctl expects to receive any error status from result field of scsi_request, the error is silently ignored and this could cause data corruptions for some applications. Fixes: d285203cf647 ("scsi: add support for a blk-mq based I/O path.") Cc: Signed-off-by: Jaesoo Lee Reviewed-by: Hannes Reinecke Reviewed-by: Bart Van Assche Signed-off-by: Martin K. Petersen Signed-off-by: Greg Kroah-Hartman --- drivers/scsi/scsi_lib.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c index 655790f30434..1fc832751a4f 100644 --- a/drivers/scsi/scsi_lib.c +++ b/drivers/scsi/scsi_lib.c @@ -2149,8 +2149,12 @@ out_put_budget: ret = BLK_STS_DEV_RESOURCE; break; default: + if (unlikely(!scsi_device_online(sdev))) + scsi_req(req)->result = DID_NO_CONNECT << 16; + else + scsi_req(req)->result = DID_ERROR << 16; /* - * Make sure to release all allocated ressources when + * Make sure to release all allocated resources when * we hit an error, as we will never see this command * again. */ From ee4b8e266229b061ff8d03f0ed8764ac5f8098b6 Mon Sep 17 00:00:00 2001 From: Saurav Kashyap Date: Thu, 18 Apr 2019 03:40:12 -0700 Subject: [PATCH 59/99] Revert "scsi: fcoe: clear FC_RP_STARTED flags when receiving a LOGO" commit 0228034d8e5915b98c33db35a98f5e909e848ae9 upstream. This patch clears FC_RP_STARTED flag during logoff, because of this re-login(flogi) didn't happen to the switch. This reverts commit 1550ec458e0cf1a40a170ab1f4c46e3f52860f65. Fixes: 1550ec458e0c ("scsi: fcoe: clear FC_RP_STARTED flags when receiving a LOGO") Cc: # v4.18+ Signed-off-by: Saurav Kashyap Reviewed-by: Hannes Reinecke Signed-off-by: Martin K. Petersen Signed-off-by: Greg Kroah-Hartman --- drivers/scsi/libfc/fc_rport.c | 1 - 1 file changed, 1 deletion(-) diff --git a/drivers/scsi/libfc/fc_rport.c b/drivers/scsi/libfc/fc_rport.c index 1797e47fab38..3d51a936f6d5 100644 --- a/drivers/scsi/libfc/fc_rport.c +++ b/drivers/scsi/libfc/fc_rport.c @@ -2153,7 +2153,6 @@ static void fc_rport_recv_logo_req(struct fc_lport *lport, struct fc_frame *fp) FC_RPORT_DBG(rdata, "Received LOGO request while in state %s\n", fc_rport_state(rdata)); - rdata->flags &= ~FC_RP_STARTED; fc_rport_enter_delete(rdata, RPORT_EV_STOP); mutex_unlock(&rdata->rp_mutex); kref_put(&rdata->kref, fc_rport_destroy); From 3e1b3e4d3c83a925e9c6fcd5aeed8713225b4a37 Mon Sep 17 00:00:00 2001 From: "Suthikulpanit, Suravee" Date: Wed, 20 Mar 2019 08:12:28 +0000 Subject: [PATCH 60/99] Revert "svm: Fix AVIC incomplete IPI emulation" MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit commit 4a58038b9e420276157785afa0a0bbb4b9bc2265 upstream. This reverts commit bb218fbcfaaa3b115d4cd7a43c0ca164f3a96e57. As Oren Twaig pointed out the old discussion: https://patchwork.kernel.org/patch/8292231/ that the change coud potentially cause an extra IPI to be sent to the destination vcpu because the AVIC hardware already set the IRR bit before the incomplete IPI #VMEXIT with id=1 (target vcpu is not running). Since writting to ICR and ICR2 will also set the IRR. If something triggers the destination vcpu to get scheduled before the emulation finishes, then this could result in an additional IPI. Also, the issue mentioned in the commit bb218fbcfaaa was misdiagnosed. Cc: Radim Krčmář Cc: Paolo Bonzini Reported-by: Oren Twaig Signed-off-by: Suravee Suthikulpanit Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini Signed-off-by: Greg Kroah-Hartman --- arch/x86/kvm/svm.c | 19 +++++++++++++++---- 1 file changed, 15 insertions(+), 4 deletions(-) diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c index 766070d7b4c5..813cb60eb401 100644 --- a/arch/x86/kvm/svm.c +++ b/arch/x86/kvm/svm.c @@ -4496,14 +4496,25 @@ static int avic_incomplete_ipi_interception(struct vcpu_svm *svm) kvm_lapic_reg_write(apic, APIC_ICR, icrl); break; case AVIC_IPI_FAILURE_TARGET_NOT_RUNNING: { + int i; + struct kvm_vcpu *vcpu; + struct kvm *kvm = svm->vcpu.kvm; struct kvm_lapic *apic = svm->vcpu.arch.apic; /* - * Update ICR high and low, then emulate sending IPI, - * which is handled when writing APIC_ICR. + * At this point, we expect that the AVIC HW has already + * set the appropriate IRR bits on the valid target + * vcpus. So, we just need to kick the appropriate vcpu. */ - kvm_lapic_reg_write(apic, APIC_ICR2, icrh); - kvm_lapic_reg_write(apic, APIC_ICR, icrl); + kvm_for_each_vcpu(i, vcpu, kvm) { + bool m = kvm_apic_match_dest(vcpu, apic, + icrl & KVM_APIC_SHORT_MASK, + GET_APIC_DEST_FIELD(icrh), + icrl & KVM_APIC_DEST_MASK); + + if (m && !avic_vcpu_is_running(vcpu)) + kvm_vcpu_wake_up(vcpu); + } break; } case AVIC_IPI_FAILURE_INVALID_TARGET: From 6ff17bc5936e5fab33de8064dc0690f6c8c789ca Mon Sep 17 00:00:00 2001 From: Andrea Arcangeli Date: Thu, 18 Apr 2019 17:50:52 -0700 Subject: [PATCH 61/99] coredump: fix race condition between mmget_not_zero()/get_task_mm() and core dumping commit 04f5866e41fb70690e28397487d8bd8eea7d712a upstream. The core dumping code has always run without holding the mmap_sem for writing, despite that is the only way to ensure that the entire vma layout will not change from under it. Only using some signal serialization on the processes belonging to the mm is not nearly enough. This was pointed out earlier. For example in Hugh's post from Jul 2017: https://lkml.kernel.org/r/alpine.LSU.2.11.1707191716030.2055@eggly.anvils "Not strictly relevant here, but a related note: I was very surprised to discover, only quite recently, how handle_mm_fault() may be called without down_read(mmap_sem) - when core dumping. That seems a misguided optimization to me, which would also be nice to correct" In particular because the growsdown and growsup can move the vm_start/vm_end the various loops the core dump does around the vma will not be consistent if page faults can happen concurrently. Pretty much all users calling mmget_not_zero()/get_task_mm() and then taking the mmap_sem had the potential to introduce unexpected side effects in the core dumping code. Adding mmap_sem for writing around the ->core_dump invocation is a viable long term fix, but it requires removing all copy user and page faults and to replace them with get_dump_page() for all binary formats which is not suitable as a short term fix. For the time being this solution manually covers the places that can confuse the core dump either by altering the vma layout or the vma flags while it runs. Once ->core_dump runs under mmap_sem for writing the function mmget_still_valid() can be dropped. Allowing mmap_sem protected sections to run in parallel with the coredump provides some minor parallelism advantage to the swapoff code (which seems to be safe enough by never mangling any vma field and can keep doing swapins in parallel to the core dumping) and to some other corner case. In order to facilitate the backporting I added "Fixes: 86039bd3b4e6" however the side effect of this same race condition in /proc/pid/mem should be reproducible since before 2.6.12-rc2 so I couldn't add any other "Fixes:" because there's no hash beyond the git genesis commit. Because find_extend_vma() is the only location outside of the process context that could modify the "mm" structures under mmap_sem for reading, by adding the mmget_still_valid() check to it, all other cases that take the mmap_sem for reading don't need the new check after mmget_not_zero()/get_task_mm(). The expand_stack() in page fault context also doesn't need the new check, because all tasks under core dumping are frozen. Link: http://lkml.kernel.org/r/20190325224949.11068-1-aarcange@redhat.com Fixes: 86039bd3b4e6 ("userfaultfd: add new syscall to provide memory externalization") Signed-off-by: Andrea Arcangeli Reported-by: Jann Horn Suggested-by: Oleg Nesterov Acked-by: Peter Xu Reviewed-by: Mike Rapoport Reviewed-by: Oleg Nesterov Reviewed-by: Jann Horn Acked-by: Jason Gunthorpe Acked-by: Michal Hocko Cc: Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman --- fs/proc/task_mmu.c | 18 ++++++++++++++++++ fs/userfaultfd.c | 9 +++++++++ include/linux/sched/mm.h | 21 +++++++++++++++++++++ mm/mmap.c | 7 ++++++- 4 files changed, 54 insertions(+), 1 deletion(-) diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c index d76fe166f6ce..c5819baee35c 100644 --- a/fs/proc/task_mmu.c +++ b/fs/proc/task_mmu.c @@ -1138,6 +1138,24 @@ static ssize_t clear_refs_write(struct file *file, const char __user *buf, count = -EINTR; goto out_mm; } + /* + * Avoid to modify vma->vm_flags + * without locked ops while the + * coredump reads the vm_flags. + */ + if (!mmget_still_valid(mm)) { + /* + * Silently return "count" + * like if get_task_mm() + * failed. FIXME: should this + * function have returned + * -ESRCH if get_task_mm() + * failed like if + * get_proc_task() fails? + */ + up_write(&mm->mmap_sem); + goto out_mm; + } for (vma = mm->mmap; vma; vma = vma->vm_next) { vma->vm_flags &= ~VM_SOFTDIRTY; vma_set_page_prot(vma); diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c index d8b8323e80f4..aaca81b5e119 100644 --- a/fs/userfaultfd.c +++ b/fs/userfaultfd.c @@ -630,6 +630,8 @@ static void userfaultfd_event_wait_completion(struct userfaultfd_ctx *ctx, /* the various vma->vm_userfaultfd_ctx still points to it */ down_write(&mm->mmap_sem); + /* no task can run (and in turn coredump) yet */ + VM_WARN_ON(!mmget_still_valid(mm)); for (vma = mm->mmap; vma; vma = vma->vm_next) if (vma->vm_userfaultfd_ctx.ctx == release_new_ctx) { vma->vm_userfaultfd_ctx = NULL_VM_UFFD_CTX; @@ -884,6 +886,8 @@ static int userfaultfd_release(struct inode *inode, struct file *file) * taking the mmap_sem for writing. */ down_write(&mm->mmap_sem); + if (!mmget_still_valid(mm)) + goto skip_mm; prev = NULL; for (vma = mm->mmap; vma; vma = vma->vm_next) { cond_resched(); @@ -906,6 +910,7 @@ static int userfaultfd_release(struct inode *inode, struct file *file) vma->vm_flags = new_flags; vma->vm_userfaultfd_ctx = NULL_VM_UFFD_CTX; } +skip_mm: up_write(&mm->mmap_sem); mmput(mm); wakeup: @@ -1334,6 +1339,8 @@ static int userfaultfd_register(struct userfaultfd_ctx *ctx, goto out; down_write(&mm->mmap_sem); + if (!mmget_still_valid(mm)) + goto out_unlock; vma = find_vma_prev(mm, start, &prev); if (!vma) goto out_unlock; @@ -1521,6 +1528,8 @@ static int userfaultfd_unregister(struct userfaultfd_ctx *ctx, goto out; down_write(&mm->mmap_sem); + if (!mmget_still_valid(mm)) + goto out_unlock; vma = find_vma_prev(mm, start, &prev); if (!vma) goto out_unlock; diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h index aebb370a0006..cebb79fe2c72 100644 --- a/include/linux/sched/mm.h +++ b/include/linux/sched/mm.h @@ -49,6 +49,27 @@ static inline void mmdrop(struct mm_struct *mm) __mmdrop(mm); } +/* + * This has to be called after a get_task_mm()/mmget_not_zero() + * followed by taking the mmap_sem for writing before modifying the + * vmas or anything the coredump pretends not to change from under it. + * + * NOTE: find_extend_vma() called from GUP context is the only place + * that can modify the "mm" (notably the vm_start/end) under mmap_sem + * for reading and outside the context of the process, so it is also + * the only case that holds the mmap_sem for reading that must call + * this function. Generally if the mmap_sem is hold for reading + * there's no need of this check after get_task_mm()/mmget_not_zero(). + * + * This function can be obsoleted and the check can be removed, after + * the coredump code will hold the mmap_sem for writing before + * invoking the ->core_dump methods. + */ +static inline bool mmget_still_valid(struct mm_struct *mm) +{ + return likely(!mm->core_state); +} + /** * mmget() - Pin the address space associated with a &struct mm_struct. * @mm: The address space to pin. diff --git a/mm/mmap.c b/mm/mmap.c index 43507f7e66b4..1480880ff814 100644 --- a/mm/mmap.c +++ b/mm/mmap.c @@ -45,6 +45,7 @@ #include #include #include +#include #include #include @@ -2491,7 +2492,8 @@ find_extend_vma(struct mm_struct *mm, unsigned long addr) vma = find_vma_prev(mm, addr, &prev); if (vma && (vma->vm_start <= addr)) return vma; - if (!prev || expand_stack(prev, addr)) + /* don't alter vm_end if the coredump is running */ + if (!prev || !mmget_still_valid(mm) || expand_stack(prev, addr)) return NULL; if (prev->vm_flags & VM_LOCKED) populate_vma_page_range(prev, addr, prev->vm_end, NULL); @@ -2517,6 +2519,9 @@ find_extend_vma(struct mm_struct *mm, unsigned long addr) return vma; if (!(vma->vm_flags & VM_GROWSDOWN)) return NULL; + /* don't alter vm_start if the coredump is running */ + if (!mmget_still_valid(mm)) + return NULL; start = vma->vm_start; if (expand_stack(vma, addr)) return NULL; From dacdbc115d23f1e4dbc091440bd5cdcdc01173b6 Mon Sep 17 00:00:00 2001 From: Corey Minyard Date: Wed, 3 Apr 2019 15:58:16 -0500 Subject: [PATCH 62/99] ipmi: fix sleep-in-atomic in free_user at cleanup SRCU user->release_barrier commit 3b9a907223d7f6b9d1dadea29436842ae9bcd76d upstream. free_user() could be called in atomic context. This patch pushed the free operation off into a workqueue. Example: BUG: sleeping function called from invalid context at kernel/workqueue.c:2856 in_atomic(): 1, irqs_disabled(): 0, pid: 177, name: ksoftirqd/27 CPU: 27 PID: 177 Comm: ksoftirqd/27 Not tainted 4.19.25-3 #1 Hardware name: AIC 1S-HV26-08/MB-DPSB04-06, BIOS IVYBV060 10/21/2015 Call Trace: dump_stack+0x5c/0x7b ___might_sleep+0xec/0x110 __flush_work+0x48/0x1f0 ? try_to_del_timer_sync+0x4d/0x80 _cleanup_srcu_struct+0x104/0x140 free_user+0x18/0x30 [ipmi_msghandler] ipmi_free_recv_msg+0x3a/0x50 [ipmi_msghandler] deliver_response+0xbd/0xd0 [ipmi_msghandler] deliver_local_response+0xe/0x30 [ipmi_msghandler] handle_one_recv_msg+0x163/0xc80 [ipmi_msghandler] ? dequeue_entity+0xa0/0x960 handle_new_recv_msgs+0x15c/0x1f0 [ipmi_msghandler] tasklet_action_common.isra.22+0x103/0x120 __do_softirq+0xf8/0x2d7 run_ksoftirqd+0x26/0x50 smpboot_thread_fn+0x11d/0x1e0 kthread+0x103/0x140 ? sort_range+0x20/0x20 ? kthread_destroy_worker+0x40/0x40 ret_from_fork+0x1f/0x40 Fixes: 77f8269606bf ("ipmi: fix use-after-free of user->release_barrier.rda") Reported-by: Konstantin Khlebnikov Signed-off-by: Corey Minyard Cc: stable@vger.kernel.org # 5.0 Cc: Yang Yingliang Signed-off-by: Greg Kroah-Hartman --- drivers/char/ipmi/ipmi_msghandler.c | 19 +++++++++++++++++-- 1 file changed, 17 insertions(+), 2 deletions(-) diff --git a/drivers/char/ipmi/ipmi_msghandler.c b/drivers/char/ipmi/ipmi_msghandler.c index d5f7a12e350e..3fb297b5fb17 100644 --- a/drivers/char/ipmi/ipmi_msghandler.c +++ b/drivers/char/ipmi/ipmi_msghandler.c @@ -213,6 +213,9 @@ struct ipmi_user { /* Does this interface receive IPMI events? */ bool gets_events; + + /* Free must run in process context for RCU cleanup. */ + struct work_struct remove_work; }; static struct ipmi_user *acquire_ipmi_user(struct ipmi_user *user, int *index) @@ -1078,6 +1081,15 @@ static int intf_err_seq(struct ipmi_smi *intf, } +static void free_user_work(struct work_struct *work) +{ + struct ipmi_user *user = container_of(work, struct ipmi_user, + remove_work); + + cleanup_srcu_struct(&user->release_barrier); + kfree(user); +} + int ipmi_create_user(unsigned int if_num, const struct ipmi_user_hndl *handler, void *handler_data, @@ -1121,6 +1133,8 @@ int ipmi_create_user(unsigned int if_num, goto out_kfree; found: + INIT_WORK(&new_user->remove_work, free_user_work); + rv = init_srcu_struct(&new_user->release_barrier); if (rv) goto out_kfree; @@ -1183,8 +1197,9 @@ EXPORT_SYMBOL(ipmi_get_smi_info); static void free_user(struct kref *ref) { struct ipmi_user *user = container_of(ref, struct ipmi_user, refcount); - cleanup_srcu_struct(&user->release_barrier); - kfree(user); + + /* SRCU cleanup must happen in task context. */ + schedule_work(&user->remove_work); } static void _ipmi_destroy_user(struct ipmi_user *user) From fbe5cff932295f2468524bf6ffe109a9d0849178 Mon Sep 17 00:00:00 2001 From: Eric Biggers Date: Sun, 31 Mar 2019 13:04:11 -0700 Subject: [PATCH 63/99] crypto: x86/poly1305 - fix overflow during partial reduction commit 678cce4019d746da6c680c48ba9e6d417803e127 upstream. The x86_64 implementation of Poly1305 produces the wrong result on some inputs because poly1305_4block_avx2() incorrectly assumes that when partially reducing the accumulator, the bits carried from limb 'd4' to limb 'h0' fit in a 32-bit integer. This is true for poly1305-generic which processes only one block at a time. However, it's not true for the AVX2 implementation, which processes 4 blocks at a time and therefore can produce intermediate limbs about 4x larger. Fix it by making the relevant calculations use 64-bit arithmetic rather than 32-bit. Note that most of the carries already used 64-bit arithmetic, but the d4 -> h0 carry was different for some reason. To be safe I also made the same change to the corresponding SSE2 code, though that only operates on 1 or 2 blocks at a time. I don't think it's really needed for poly1305_block_sse2(), but it doesn't hurt because it's already x86_64 code. It *might* be needed for poly1305_2block_sse2(), but overflows aren't easy to reproduce there. This bug was originally detected by my patches that improve testmgr to fuzz algorithms against their generic implementation. But also add a test vector which reproduces it directly (in the AVX2 case). Fixes: b1ccc8f4b631 ("crypto: poly1305 - Add a four block AVX2 variant for x86_64") Fixes: c70f4abef07a ("crypto: poly1305 - Add a SSE2 SIMD variant for x86_64") Cc: # v4.3+ Cc: Martin Willi Cc: Jason A. Donenfeld Signed-off-by: Eric Biggers Reviewed-by: Martin Willi Signed-off-by: Herbert Xu Signed-off-by: Greg Kroah-Hartman --- arch/x86/crypto/poly1305-avx2-x86_64.S | 14 +++++--- arch/x86/crypto/poly1305-sse2-x86_64.S | 22 ++++++++----- crypto/testmgr.h | 44 +++++++++++++++++++++++++- 3 files changed, 67 insertions(+), 13 deletions(-) diff --git a/arch/x86/crypto/poly1305-avx2-x86_64.S b/arch/x86/crypto/poly1305-avx2-x86_64.S index 3b6e70d085da..8457cdd47f75 100644 --- a/arch/x86/crypto/poly1305-avx2-x86_64.S +++ b/arch/x86/crypto/poly1305-avx2-x86_64.S @@ -323,6 +323,12 @@ ENTRY(poly1305_4block_avx2) vpaddq t2,t1,t1 vmovq t1x,d4 + # Now do a partial reduction mod (2^130)-5, carrying h0 -> h1 -> h2 -> + # h3 -> h4 -> h0 -> h1 to get h0,h2,h3,h4 < 2^26 and h1 < 2^26 + a small + # amount. Careful: we must not assume the carry bits 'd0 >> 26', + # 'd1 >> 26', 'd2 >> 26', 'd3 >> 26', and '(d4 >> 26) * 5' fit in 32-bit + # integers. It's true in a single-block implementation, but not here. + # d1 += d0 >> 26 mov d0,%rax shr $26,%rax @@ -361,16 +367,16 @@ ENTRY(poly1305_4block_avx2) # h0 += (d4 >> 26) * 5 mov d4,%rax shr $26,%rax - lea (%eax,%eax,4),%eax - add %eax,%ebx + lea (%rax,%rax,4),%rax + add %rax,%rbx # h4 = d4 & 0x3ffffff mov d4,%rax and $0x3ffffff,%eax mov %eax,h4 # h1 += h0 >> 26 - mov %ebx,%eax - shr $26,%eax + mov %rbx,%rax + shr $26,%rax add %eax,h1 # h0 = h0 & 0x3ffffff andl $0x3ffffff,%ebx diff --git a/arch/x86/crypto/poly1305-sse2-x86_64.S b/arch/x86/crypto/poly1305-sse2-x86_64.S index c88c670cb5fc..5851c7418fb7 100644 --- a/arch/x86/crypto/poly1305-sse2-x86_64.S +++ b/arch/x86/crypto/poly1305-sse2-x86_64.S @@ -253,16 +253,16 @@ ENTRY(poly1305_block_sse2) # h0 += (d4 >> 26) * 5 mov d4,%rax shr $26,%rax - lea (%eax,%eax,4),%eax - add %eax,%ebx + lea (%rax,%rax,4),%rax + add %rax,%rbx # h4 = d4 & 0x3ffffff mov d4,%rax and $0x3ffffff,%eax mov %eax,h4 # h1 += h0 >> 26 - mov %ebx,%eax - shr $26,%eax + mov %rbx,%rax + shr $26,%rax add %eax,h1 # h0 = h0 & 0x3ffffff andl $0x3ffffff,%ebx @@ -520,6 +520,12 @@ ENTRY(poly1305_2block_sse2) paddq t2,t1 movq t1,d4 + # Now do a partial reduction mod (2^130)-5, carrying h0 -> h1 -> h2 -> + # h3 -> h4 -> h0 -> h1 to get h0,h2,h3,h4 < 2^26 and h1 < 2^26 + a small + # amount. Careful: we must not assume the carry bits 'd0 >> 26', + # 'd1 >> 26', 'd2 >> 26', 'd3 >> 26', and '(d4 >> 26) * 5' fit in 32-bit + # integers. It's true in a single-block implementation, but not here. + # d1 += d0 >> 26 mov d0,%rax shr $26,%rax @@ -558,16 +564,16 @@ ENTRY(poly1305_2block_sse2) # h0 += (d4 >> 26) * 5 mov d4,%rax shr $26,%rax - lea (%eax,%eax,4),%eax - add %eax,%ebx + lea (%rax,%rax,4),%rax + add %rax,%rbx # h4 = d4 & 0x3ffffff mov d4,%rax and $0x3ffffff,%eax mov %eax,h4 # h1 += h0 >> 26 - mov %ebx,%eax - shr $26,%eax + mov %rbx,%rax + shr $26,%rax add %eax,h1 # h0 = h0 & 0x3ffffff andl $0x3ffffff,%ebx diff --git a/crypto/testmgr.h b/crypto/testmgr.h index 862ee1d04263..74e1454cae1e 100644 --- a/crypto/testmgr.h +++ b/crypto/testmgr.h @@ -5592,7 +5592,49 @@ static const struct hash_testvec poly1305_tv_template[] = { .psize = 80, .digest = "\x13\x00\x00\x00\x00\x00\x00\x00" "\x00\x00\x00\x00\x00\x00\x00\x00", - }, + }, { /* Regression test for overflow in AVX2 implementation */ + .plaintext = "\xff\xff\xff\xff\xff\xff\xff\xff" + "\xff\xff\xff\xff\xff\xff\xff\xff" + "\xff\xff\xff\xff\xff\xff\xff\xff" + "\xff\xff\xff\xff\xff\xff\xff\xff" + "\xff\xff\xff\xff\xff\xff\xff\xff" + "\xff\xff\xff\xff\xff\xff\xff\xff" + "\xff\xff\xff\xff\xff\xff\xff\xff" + "\xff\xff\xff\xff\xff\xff\xff\xff" + "\xff\xff\xff\xff\xff\xff\xff\xff" + "\xff\xff\xff\xff\xff\xff\xff\xff" + "\xff\xff\xff\xff\xff\xff\xff\xff" + "\xff\xff\xff\xff\xff\xff\xff\xff" + "\xff\xff\xff\xff\xff\xff\xff\xff" + "\xff\xff\xff\xff\xff\xff\xff\xff" + "\xff\xff\xff\xff\xff\xff\xff\xff" + "\xff\xff\xff\xff\xff\xff\xff\xff" + "\xff\xff\xff\xff\xff\xff\xff\xff" + "\xff\xff\xff\xff\xff\xff\xff\xff" + "\xff\xff\xff\xff\xff\xff\xff\xff" + "\xff\xff\xff\xff\xff\xff\xff\xff" + "\xff\xff\xff\xff\xff\xff\xff\xff" + "\xff\xff\xff\xff\xff\xff\xff\xff" + "\xff\xff\xff\xff\xff\xff\xff\xff" + "\xff\xff\xff\xff\xff\xff\xff\xff" + "\xff\xff\xff\xff\xff\xff\xff\xff" + "\xff\xff\xff\xff\xff\xff\xff\xff" + "\xff\xff\xff\xff\xff\xff\xff\xff" + "\xff\xff\xff\xff\xff\xff\xff\xff" + "\xff\xff\xff\xff\xff\xff\xff\xff" + "\xff\xff\xff\xff\xff\xff\xff\xff" + "\xff\xff\xff\xff\xff\xff\xff\xff" + "\xff\xff\xff\xff\xff\xff\xff\xff" + "\xff\xff\xff\xff\xff\xff\xff\xff" + "\xff\xff\xff\xff\xff\xff\xff\xff" + "\xff\xff\xff\xff\xff\xff\xff\xff" + "\xff\xff\xff\xff\xff\xff\xff\xff" + "\xff\xff\xff\xff\xff\xff\xff\xff" + "\xff\xff\xff\xff", + .psize = 300, + .digest = "\xfb\x5e\x96\xd8\x61\xd5\xc7\xc8" + "\x78\xe5\x87\xcc\x2d\x5a\x22\xe1", + } }; /* From 96800ba9e565ab752774cd88328f96aed28a1436 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Christian=20K=C3=B6nig?= Date: Tue, 2 Apr 2019 09:26:52 +0200 Subject: [PATCH 64/99] drm/ttm: fix out-of-bounds read in ttm_put_pages() v2 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit commit a66477b0efe511d98dde3e4aaeb189790e6f0a39 upstream. When ttm_put_pages() tries to figure out whether it's dealing with transparent hugepages, it just reads past the bounds of the pages array without a check. v2: simplify the test if enough pages are left in the array (Christian). Signed-off-by: Jann Horn Signed-off-by: Christian König Fixes: 5c42c64f7d54 ("drm/ttm: fix the fix for huge compound pages") Cc: stable@vger.kernel.org Reviewed-by: Michel Dänzer Reviewed-by: Junwei Zhang Reviewed-by: Huang Rui Signed-off-by: Alex Deucher Signed-off-by: Greg Kroah-Hartman --- drivers/gpu/drm/ttm/ttm_page_alloc.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/ttm/ttm_page_alloc.c b/drivers/gpu/drm/ttm/ttm_page_alloc.c index f841accc2c00..f77c81db161b 100644 --- a/drivers/gpu/drm/ttm/ttm_page_alloc.c +++ b/drivers/gpu/drm/ttm/ttm_page_alloc.c @@ -730,7 +730,8 @@ static void ttm_put_pages(struct page **pages, unsigned npages, int flags, } #ifdef CONFIG_TRANSPARENT_HUGEPAGE - if (!(flags & TTM_PAGE_FLAG_DMA32)) { + if (!(flags & TTM_PAGE_FLAG_DMA32) && + (npages - i) >= HPAGE_PMD_NR) { for (j = 0; j < HPAGE_PMD_NR; ++j) if (p++ != pages[i + j]) break; @@ -759,7 +760,7 @@ static void ttm_put_pages(struct page **pages, unsigned npages, int flags, unsigned max_size, n2free; spin_lock_irqsave(&huge->lock, irq_flags); - while (i < npages) { + while ((npages - i) >= HPAGE_PMD_NR) { struct page *p = pages[i]; unsigned j; From 5105fc758bdc4f7bb330248f1e2d2ea3b704421d Mon Sep 17 00:00:00 2001 From: Nathan Chancellor Date: Wed, 17 Apr 2019 00:21:21 -0700 Subject: [PATCH 65/99] arm64: futex: Restore oldval initialization to work around buggy compilers commit ff8acf929014b7f87315588e0daf8597c8aa9d1c upstream. Commit 045afc24124d ("arm64: futex: Fix FUTEX_WAKE_OP atomic ops with non-zero result value") removed oldval's zero initialization in arch_futex_atomic_op_inuser because it is not necessary. Unfortunately, Android's arm64 GCC 4.9.4 [1] does not agree: ../kernel/futex.c: In function 'do_futex': ../kernel/futex.c:1658:17: warning: 'oldval' may be used uninitialized in this function [-Wmaybe-uninitialized] return oldval == cmparg; ^ In file included from ../kernel/futex.c:73:0: ../arch/arm64/include/asm/futex.h:53:6: note: 'oldval' was declared here int oldval, ret, tmp; ^ GCC fails to follow that when ret is non-zero, futex_atomic_op_inuser returns right away, avoiding the uninitialized use that it claims. Restoring the zero initialization works around this issue. [1]: https://android.googlesource.com/platform/prebuilts/gcc/linux-x86/aarch64/aarch64-linux-android-4.9/ Cc: stable@vger.kernel.org Fixes: 045afc24124d ("arm64: futex: Fix FUTEX_WAKE_OP atomic ops with non-zero result value") Reviewed-by: Greg Kroah-Hartman Signed-off-by: Nathan Chancellor Signed-off-by: Catalin Marinas Signed-off-by: Greg Kroah-Hartman --- arch/arm64/include/asm/futex.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/arm64/include/asm/futex.h b/arch/arm64/include/asm/futex.h index b447b4db423a..fd1e722f3821 100644 --- a/arch/arm64/include/asm/futex.h +++ b/arch/arm64/include/asm/futex.h @@ -50,7 +50,7 @@ do { \ static inline int arch_futex_atomic_op_inuser(int op, int oparg, int *oval, u32 __user *_uaddr) { - int oldval, ret, tmp; + int oldval = 0, ret, tmp; u32 __user *uaddr = __uaccess_mask_ptr(_uaddr); pagefault_disable(); From 1fab567a270b8fb2f2b80c00b5c8c8106d377be8 Mon Sep 17 00:00:00 2001 From: Masami Hiramatsu Date: Sun, 24 Feb 2019 01:49:52 +0900 Subject: [PATCH 66/99] x86/kprobes: Verify stack frame on kretprobe commit 3ff9c075cc767b3060bdac12da72fc94dd7da1b8 upstream. Verify the stack frame pointer on kretprobe trampoline handler, If the stack frame pointer does not match, it skips the wrong entry and tries to find correct one. This can happen if user puts the kretprobe on the function which can be used in the path of ftrace user-function call. Such functions should not be probed, so this adds a warning message that reports which function should be blacklisted. Tested-by: Andrea Righi Signed-off-by: Masami Hiramatsu Acked-by: Steven Rostedt Cc: Linus Torvalds Cc: Mathieu Desnoyers Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/155094059185.6137.15527904013362842072.stgit@devbox Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- arch/x86/kernel/kprobes/core.c | 26 ++++++++++++++++++++++++++ include/linux/kprobes.h | 1 + 2 files changed, 27 insertions(+) diff --git a/arch/x86/kernel/kprobes/core.c b/arch/x86/kernel/kprobes/core.c index b0d1e81c96bb..acb901b43ce4 100644 --- a/arch/x86/kernel/kprobes/core.c +++ b/arch/x86/kernel/kprobes/core.c @@ -569,6 +569,7 @@ void arch_prepare_kretprobe(struct kretprobe_instance *ri, struct pt_regs *regs) unsigned long *sara = stack_addr(regs); ri->ret_addr = (kprobe_opcode_t *) *sara; + ri->fp = sara; /* Replace the return addr with trampoline addr */ *sara = (unsigned long) &kretprobe_trampoline; @@ -759,15 +760,21 @@ __visible __used void *trampoline_handler(struct pt_regs *regs) unsigned long flags, orig_ret_address = 0; unsigned long trampoline_address = (unsigned long)&kretprobe_trampoline; kprobe_opcode_t *correct_ret_addr = NULL; + void *frame_pointer; + bool skipped = false; INIT_HLIST_HEAD(&empty_rp); kretprobe_hash_lock(current, &head, &flags); /* fixup registers */ #ifdef CONFIG_X86_64 regs->cs = __KERNEL_CS; + /* On x86-64, we use pt_regs->sp for return address holder. */ + frame_pointer = ®s->sp; #else regs->cs = __KERNEL_CS | get_kernel_rpl(); regs->gs = 0; + /* On x86-32, we use pt_regs->flags for return address holder. */ + frame_pointer = ®s->flags; #endif regs->ip = trampoline_address; regs->orig_ax = ~0UL; @@ -789,8 +796,25 @@ __visible __used void *trampoline_handler(struct pt_regs *regs) if (ri->task != current) /* another task is sharing our hash bucket */ continue; + /* + * Return probes must be pushed on this hash list correct + * order (same as return order) so that it can be poped + * correctly. However, if we find it is pushed it incorrect + * order, this means we find a function which should not be + * probed, because the wrong order entry is pushed on the + * path of processing other kretprobe itself. + */ + if (ri->fp != frame_pointer) { + if (!skipped) + pr_warn("kretprobe is stacked incorrectly. Trying to fixup.\n"); + skipped = true; + continue; + } orig_ret_address = (unsigned long)ri->ret_addr; + if (skipped) + pr_warn("%ps must be blacklisted because of incorrect kretprobe order\n", + ri->rp->kp.addr); if (orig_ret_address != trampoline_address) /* @@ -808,6 +832,8 @@ __visible __used void *trampoline_handler(struct pt_regs *regs) if (ri->task != current) /* another task is sharing our hash bucket */ continue; + if (ri->fp != frame_pointer) + continue; orig_ret_address = (unsigned long)ri->ret_addr; if (ri->rp && ri->rp->handler) { diff --git a/include/linux/kprobes.h b/include/linux/kprobes.h index e909413e4e38..32cae0f35b9d 100644 --- a/include/linux/kprobes.h +++ b/include/linux/kprobes.h @@ -173,6 +173,7 @@ struct kretprobe_instance { struct kretprobe *rp; kprobe_opcode_t *ret_addr; struct task_struct *task; + void *fp; char data[0]; }; From 426e2a8024c23b3a664250b46e2ab0604c0a0a09 Mon Sep 17 00:00:00 2001 From: Masami Hiramatsu Date: Sun, 24 Feb 2019 01:50:20 +0900 Subject: [PATCH 67/99] kprobes: Mark ftrace mcount handler functions nokprobe commit fabe38ab6b2bd9418350284c63825f13b8a6abba upstream. Mark ftrace mcount handler functions nokprobe since probing on these functions with kretprobe pushes return address incorrectly on kretprobe shadow stack. Reported-by: Francis Deslauriers Tested-by: Andrea Righi Signed-off-by: Masami Hiramatsu Acked-by: Steven Rostedt Acked-by: Steven Rostedt (VMware) Cc: Linus Torvalds Cc: Mathieu Desnoyers Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/155094062044.6137.6419622920568680640.stgit@devbox Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- kernel/trace/ftrace.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c index e23eb9fc77aa..1688782f3dfb 100644 --- a/kernel/trace/ftrace.c +++ b/kernel/trace/ftrace.c @@ -34,6 +34,7 @@ #include #include #include +#include #include @@ -6250,7 +6251,7 @@ void ftrace_reset_array_ops(struct trace_array *tr) tr->ops->func = ftrace_stub; } -static inline void +static nokprobe_inline void __ftrace_ops_list_func(unsigned long ip, unsigned long parent_ip, struct ftrace_ops *ignored, struct pt_regs *regs) { @@ -6310,11 +6311,13 @@ static void ftrace_ops_list_func(unsigned long ip, unsigned long parent_ip, { __ftrace_ops_list_func(ip, parent_ip, NULL, regs); } +NOKPROBE_SYMBOL(ftrace_ops_list_func); #else static void ftrace_ops_no_ops(unsigned long ip, unsigned long parent_ip) { __ftrace_ops_list_func(ip, parent_ip, NULL, NULL); } +NOKPROBE_SYMBOL(ftrace_ops_no_ops); #endif /* @@ -6341,6 +6344,7 @@ static void ftrace_ops_assist_func(unsigned long ip, unsigned long parent_ip, preempt_enable_notrace(); trace_clear_recursion(bit); } +NOKPROBE_SYMBOL(ftrace_ops_assist_func); /** * ftrace_ops_get_func - get the function a trampoline should call From 23a926e5edd940b7311e0c511caa4b45eaf6b59f Mon Sep 17 00:00:00 2001 From: Masami Hiramatsu Date: Mon, 15 Apr 2019 15:01:25 +0900 Subject: [PATCH 68/99] kprobes: Fix error check when reusing optimized probes commit 5f843ed415581cfad4ef8fefe31c138a8346ca8a upstream. The following commit introduced a bug in one of our error paths: 819319fc9346 ("kprobes: Return error if we fail to reuse kprobe instead of BUG_ON()") it missed to handle the return value of kprobe_optready() as error-value. In reality, the kprobe_optready() returns a bool result, so "true" case must be passed instead of 0. This causes some errors on kprobe boot-time selftests on ARM: [ ] Beginning kprobe tests... [ ] Probe ARM code [ ] kprobe [ ] kretprobe [ ] ARM instruction simulation [ ] Check decoding tables [ ] Run test cases [ ] FAIL: test_case_handler not run [ ] FAIL: Test andge r10, r11, r14, asr r7 [ ] FAIL: Scenario 11 ... [ ] FAIL: Scenario 7 [ ] Total instruction simulation tests=1631, pass=1433 fail=198 [ ] kprobe tests failed This can happen if an optimized probe is unregistered and next kprobe is registered on same address until the previous probe is not reclaimed. If this happens, a hidden aggregated probe may be kept in memory, and no new kprobe can probe same address. Also, in that case register_kprobe() will return "1" instead of minus error value, which can mislead caller logic. Signed-off-by: Masami Hiramatsu Cc: Anil S Keshavamurthy Cc: David S . Miller Cc: Linus Torvalds Cc: Naveen N . Rao Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: stable@vger.kernel.org # v5.0+ Fixes: 819319fc9346 ("kprobes: Return error if we fail to reuse kprobe instead of BUG_ON()") Link: http://lkml.kernel.org/r/155530808559.32517.539898325433642204.stgit@devnote2 Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- kernel/kprobes.c | 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/kernel/kprobes.c b/kernel/kprobes.c index 4344381664cc..29ff6635d259 100644 --- a/kernel/kprobes.c +++ b/kernel/kprobes.c @@ -703,7 +703,6 @@ static void unoptimize_kprobe(struct kprobe *p, bool force) static int reuse_unused_kprobe(struct kprobe *ap) { struct optimized_kprobe *op; - int ret; BUG_ON(!kprobe_unused(ap)); /* @@ -715,9 +714,8 @@ static int reuse_unused_kprobe(struct kprobe *ap) /* Enable the probe again */ ap->flags &= ~KPROBE_FLAG_DISABLED; /* Optimize it again (remove from op->list) */ - ret = kprobe_optready(ap); - if (ret) - return ret; + if (!kprobe_optready(ap)) + return -EINVAL; optimize_kprobe(ap); return 0; From 852de0d53d1443404b00c6e7dc6f15adc68aec1d Mon Sep 17 00:00:00 2001 From: Vijayakumar Durai Date: Wed, 27 Mar 2019 11:03:17 +0100 Subject: [PATCH 69/99] rt2x00: do not increment sequence number while re-transmitting commit 746ba11f170603bf1eaade817553a6c2e9135bbe upstream. Currently rt2x00 devices retransmit the management frames with incremented sequence number if hardware is assigning the sequence. This is HW bug fixed already for non-QOS data frames, but it should be fixed for management frames except beacon. Without fix retransmitted frames have wrong SN: AlphaNet_e8:fb:36 Vivotek_52:31:51 Authentication, SN=1648, FN=0, Flags=........C Frame is not being retransmitted 1648 1 AlphaNet_e8:fb:36 Vivotek_52:31:51 Authentication, SN=1649, FN=0, Flags=....R...C Frame is being retransmitted 1649 1 AlphaNet_e8:fb:36 Vivotek_52:31:51 Authentication, SN=1650, FN=0, Flags=....R...C Frame is being retransmitted 1650 1 With the fix SN stays correctly the same: 88:6a:e3:e8:f9:a2 8c:f5:a3:88:76:87 Authentication, SN=1450, FN=0, Flags=........C 88:6a:e3:e8:f9:a2 8c:f5:a3:88:76:87 Authentication, SN=1450, FN=0, Flags=....R...C 88:6a:e3:e8:f9:a2 8c:f5:a3:88:76:87 Authentication, SN=1450, FN=0, Flags=....R...C Cc: stable@vger.kernel.org Signed-off-by: Vijayakumar Durai [sgruszka: simplify code, change comments and changelog] Signed-off-by: Stanislaw Gruszka Signed-off-by: Kalle Valo Signed-off-by: Greg Kroah-Hartman --- drivers/net/wireless/ralink/rt2x00/rt2x00.h | 1 - drivers/net/wireless/ralink/rt2x00/rt2x00mac.c | 10 ---------- drivers/net/wireless/ralink/rt2x00/rt2x00queue.c | 15 +++++++++------ 3 files changed, 9 insertions(+), 17 deletions(-) diff --git a/drivers/net/wireless/ralink/rt2x00/rt2x00.h b/drivers/net/wireless/ralink/rt2x00/rt2x00.h index a279a4363bc1..1d21424eae8a 100644 --- a/drivers/net/wireless/ralink/rt2x00/rt2x00.h +++ b/drivers/net/wireless/ralink/rt2x00/rt2x00.h @@ -672,7 +672,6 @@ enum rt2x00_state_flags { CONFIG_CHANNEL_HT40, CONFIG_POWERSAVING, CONFIG_HT_DISABLED, - CONFIG_QOS_DISABLED, CONFIG_MONITORING, /* diff --git a/drivers/net/wireless/ralink/rt2x00/rt2x00mac.c b/drivers/net/wireless/ralink/rt2x00/rt2x00mac.c index fa2fd64084ac..da526684596f 100644 --- a/drivers/net/wireless/ralink/rt2x00/rt2x00mac.c +++ b/drivers/net/wireless/ralink/rt2x00/rt2x00mac.c @@ -642,18 +642,8 @@ void rt2x00mac_bss_info_changed(struct ieee80211_hw *hw, rt2x00dev->intf_associated--; rt2x00leds_led_assoc(rt2x00dev, !!rt2x00dev->intf_associated); - - clear_bit(CONFIG_QOS_DISABLED, &rt2x00dev->flags); } - /* - * Check for access point which do not support 802.11e . We have to - * generate data frames sequence number in S/W for such AP, because - * of H/W bug. - */ - if (changes & BSS_CHANGED_QOS && !bss_conf->qos) - set_bit(CONFIG_QOS_DISABLED, &rt2x00dev->flags); - /* * When the erp information has changed, we should perform * additional configuration steps. For all other changes we are done. diff --git a/drivers/net/wireless/ralink/rt2x00/rt2x00queue.c b/drivers/net/wireless/ralink/rt2x00/rt2x00queue.c index 710e9641552e..85e320178a0e 100644 --- a/drivers/net/wireless/ralink/rt2x00/rt2x00queue.c +++ b/drivers/net/wireless/ralink/rt2x00/rt2x00queue.c @@ -200,15 +200,18 @@ static void rt2x00queue_create_tx_descriptor_seq(struct rt2x00_dev *rt2x00dev, if (!rt2x00_has_cap_flag(rt2x00dev, REQUIRE_SW_SEQNO)) { /* * rt2800 has a H/W (or F/W) bug, device incorrectly increase - * seqno on retransmited data (non-QOS) frames. To workaround - * the problem let's generate seqno in software if QOS is - * disabled. + * seqno on retransmitted data (non-QOS) and management frames. + * To workaround the problem let's generate seqno in software. + * Except for beacons which are transmitted periodically by H/W + * hence hardware has to assign seqno for them. */ - if (test_bit(CONFIG_QOS_DISABLED, &rt2x00dev->flags)) - __clear_bit(ENTRY_TXD_GENERATE_SEQ, &txdesc->flags); - else + if (ieee80211_is_beacon(hdr->frame_control)) { + __set_bit(ENTRY_TXD_GENERATE_SEQ, &txdesc->flags); /* H/W will generate sequence number */ return; + } + + __clear_bit(ENTRY_TXD_GENERATE_SEQ, &txdesc->flags); } /* From 39cad03c4360b72d4d4adda4be1b718c24d0af44 Mon Sep 17 00:00:00 2001 From: Felix Fietkau Date: Fri, 1 Mar 2019 14:48:37 +0100 Subject: [PATCH 70/99] mac80211: do not call driver wake_tx_queue op during reconfig commit 4856bfd230985e43e84c26473c91028ff0a533bd upstream. There are several scenarios in which mac80211 can call drv_wake_tx_queue after ieee80211_restart_hw has been called and has not yet completed. Driver private structs are considered uninitialized until mac80211 has uploaded the vifs, stations and keys again, so using private tx queue data during that time is not safe. The driver can also not rely on drv_reconfig_complete to figure out when it is safe to accept drv_wake_tx_queue calls again, because it is only called after all tx queues are woken again. To fix this, bail out early in drv_wake_tx_queue if local->in_reconfig is set. Cc: stable@vger.kernel.org Signed-off-by: Felix Fietkau Signed-off-by: Johannes Berg Signed-off-by: Greg Kroah-Hartman --- net/mac80211/driver-ops.h | 3 +++ 1 file changed, 3 insertions(+) diff --git a/net/mac80211/driver-ops.h b/net/mac80211/driver-ops.h index 8f6998091d26..2123f6e90fc0 100644 --- a/net/mac80211/driver-ops.h +++ b/net/mac80211/driver-ops.h @@ -1166,6 +1166,9 @@ static inline void drv_wake_tx_queue(struct ieee80211_local *local, { struct ieee80211_sub_if_data *sdata = vif_to_sdata(txq->txq.vif); + if (local->in_reconfig) + return; + if (!check_sdata_in_driver(sdata)) return; From ba407222f563af8cfc69c5ae5607899a039f8be2 Mon Sep 17 00:00:00 2001 From: Alex Deucher Date: Thu, 11 Apr 2019 14:54:40 -0500 Subject: [PATCH 71/99] drm/amdgpu/gmc9: fix VM_L2_CNTL3 programming MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit commit 1925e7d3d4677e681cc2e878c2bdbeaee988c8e2 upstream. Got accidently dropped when 2+1 level support was added. Fixes: 6a42fd6fbf534096 ("drm/amdgpu: implement 2+1 PD support for Raven v3") Reviewed-by: Christian König Signed-off-by: Alex Deucher Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman --- drivers/gpu/drm/amd/amdgpu/mmhub_v1_0.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/gpu/drm/amd/amdgpu/mmhub_v1_0.c b/drivers/gpu/drm/amd/amdgpu/mmhub_v1_0.c index e70a0d4d6db4..c963eec58c70 100644 --- a/drivers/gpu/drm/amd/amdgpu/mmhub_v1_0.c +++ b/drivers/gpu/drm/amd/amdgpu/mmhub_v1_0.c @@ -164,6 +164,7 @@ static void mmhub_v1_0_init_cache_regs(struct amdgpu_device *adev) tmp = REG_SET_FIELD(tmp, VM_L2_CNTL3, L2_CACHE_BIGK_FRAGMENT_SIZE, 6); } + WREG32_SOC15(MMHUB, 0, mmVM_L2_CNTL3, tmp); tmp = mmVM_L2_CNTL4_DEFAULT; tmp = REG_SET_FIELD(tmp, VM_L2_CNTL4, VMC_TAP_PDE_REQUEST_PHYSICAL, 0); From f45829e6250a84350e38402cc4dc2b5e3a9ea82e Mon Sep 17 00:00:00 2001 From: Kim Phillips Date: Thu, 21 Mar 2019 21:15:22 +0000 Subject: [PATCH 72/99] perf/x86/amd: Add event map for AMD Family 17h MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit commit 3fe3331bb285700ab2253dbb07f8e478fcea2f1b upstream. Family 17h differs from prior families by: - Does not support an L2 cache miss event - It has re-enumerated PMC counters for: - L2 cache references - front & back end stalled cycles So we add a new amd_f17h_perfmon_event_map[] so that the generic perf event names will resolve to the correct h/w events on family 17h and above processors. Reference sections 2.1.13.3.3 (stalls) and 2.1.13.3.6 (L2): https://www.amd.com/system/files/TechDocs/54945_PPR_Family_17h_Models_00h-0Fh.pdf Signed-off-by: Kim Phillips Cc: # v4.9+ Cc: Alexander Shishkin Cc: Arnaldo Carvalho de Melo Cc: Borislav Petkov Cc: H. Peter Anvin Cc: Janakarajan Natarajan Cc: Jiri Olsa Cc: Linus Torvalds Cc: Martin Liška Cc: Namhyung Kim Cc: Peter Zijlstra Cc: Pu Wen Cc: Suravee Suthikulpanit Cc: Thomas Gleixner Cc: linux-kernel@vger.kernel.org Fixes: e40ed1542dd7 ("perf/x86: Add perf support for AMD family-17h processors") [ Improved the formatting a bit. ] Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- arch/x86/events/amd/core.c | 35 ++++++++++++++++++++++++++--------- 1 file changed, 26 insertions(+), 9 deletions(-) diff --git a/arch/x86/events/amd/core.c b/arch/x86/events/amd/core.c index 3e5dd85b019a..263af6312329 100644 --- a/arch/x86/events/amd/core.c +++ b/arch/x86/events/amd/core.c @@ -117,22 +117,39 @@ static __initconst const u64 amd_hw_cache_event_ids }; /* - * AMD Performance Monitor K7 and later. + * AMD Performance Monitor K7 and later, up to and including Family 16h: */ static const u64 amd_perfmon_event_map[PERF_COUNT_HW_MAX] = { - [PERF_COUNT_HW_CPU_CYCLES] = 0x0076, - [PERF_COUNT_HW_INSTRUCTIONS] = 0x00c0, - [PERF_COUNT_HW_CACHE_REFERENCES] = 0x077d, - [PERF_COUNT_HW_CACHE_MISSES] = 0x077e, - [PERF_COUNT_HW_BRANCH_INSTRUCTIONS] = 0x00c2, - [PERF_COUNT_HW_BRANCH_MISSES] = 0x00c3, - [PERF_COUNT_HW_STALLED_CYCLES_FRONTEND] = 0x00d0, /* "Decoder empty" event */ - [PERF_COUNT_HW_STALLED_CYCLES_BACKEND] = 0x00d1, /* "Dispatch stalls" event */ + [PERF_COUNT_HW_CPU_CYCLES] = 0x0076, + [PERF_COUNT_HW_INSTRUCTIONS] = 0x00c0, + [PERF_COUNT_HW_CACHE_REFERENCES] = 0x077d, + [PERF_COUNT_HW_CACHE_MISSES] = 0x077e, + [PERF_COUNT_HW_BRANCH_INSTRUCTIONS] = 0x00c2, + [PERF_COUNT_HW_BRANCH_MISSES] = 0x00c3, + [PERF_COUNT_HW_STALLED_CYCLES_FRONTEND] = 0x00d0, /* "Decoder empty" event */ + [PERF_COUNT_HW_STALLED_CYCLES_BACKEND] = 0x00d1, /* "Dispatch stalls" event */ +}; + +/* + * AMD Performance Monitor Family 17h and later: + */ +static const u64 amd_f17h_perfmon_event_map[PERF_COUNT_HW_MAX] = +{ + [PERF_COUNT_HW_CPU_CYCLES] = 0x0076, + [PERF_COUNT_HW_INSTRUCTIONS] = 0x00c0, + [PERF_COUNT_HW_CACHE_REFERENCES] = 0xff60, + [PERF_COUNT_HW_BRANCH_INSTRUCTIONS] = 0x00c2, + [PERF_COUNT_HW_BRANCH_MISSES] = 0x00c3, + [PERF_COUNT_HW_STALLED_CYCLES_FRONTEND] = 0x0287, + [PERF_COUNT_HW_STALLED_CYCLES_BACKEND] = 0x0187, }; static u64 amd_pmu_event_map(int hw_event) { + if (boot_cpu_data.x86 >= 0x17) + return amd_f17h_perfmon_event_map[hw_event]; + return amd_perfmon_event_map[hw_event]; } From 293926b37013c0b9204d110eaaad9a4a5467d24b Mon Sep 17 00:00:00 2001 From: Andi Kleen Date: Fri, 29 Mar 2019 17:47:43 -0700 Subject: [PATCH 73/99] x86/cpu/bugs: Use __initconst for 'const' init data commit 1de7edbb59c8f1b46071f66c5c97b8a59569eb51 upstream. Some of the recently added const tables use __initdata which causes section attribute conflicts. Use __initconst instead. Fixes: fa1202ef2243 ("x86/speculation: Add command line control") Signed-off-by: Andi Kleen Signed-off-by: Thomas Gleixner Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20190330004743.29541-9-andi@firstfloor.org Signed-off-by: Greg Kroah-Hartman --- arch/x86/kernel/cpu/bugs.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c index 1e0c4c74195c..e5258bd64200 100644 --- a/arch/x86/kernel/cpu/bugs.c +++ b/arch/x86/kernel/cpu/bugs.c @@ -272,7 +272,7 @@ static const struct { const char *option; enum spectre_v2_user_cmd cmd; bool secure; -} v2_user_options[] __initdata = { +} v2_user_options[] __initconst = { { "auto", SPECTRE_V2_USER_CMD_AUTO, false }, { "off", SPECTRE_V2_USER_CMD_NONE, false }, { "on", SPECTRE_V2_USER_CMD_FORCE, true }, @@ -407,7 +407,7 @@ static const struct { const char *option; enum spectre_v2_mitigation_cmd cmd; bool secure; -} mitigation_options[] __initdata = { +} mitigation_options[] __initconst = { { "off", SPECTRE_V2_CMD_NONE, false }, { "on", SPECTRE_V2_CMD_FORCE, true }, { "retpoline", SPECTRE_V2_CMD_RETPOLINE, false }, @@ -643,7 +643,7 @@ static const char * const ssb_strings[] = { static const struct { const char *option; enum ssb_mitigation_cmd cmd; -} ssb_mitigation_options[] __initdata = { +} ssb_mitigation_options[] __initconst = { { "auto", SPEC_STORE_BYPASS_CMD_AUTO }, /* Platform decides */ { "on", SPEC_STORE_BYPASS_CMD_ON }, /* Disable Speculative Store Bypass */ { "off", SPEC_STORE_BYPASS_CMD_NONE }, /* Don't touch Speculative Store Bypass */ From 90e17512f1e43a67008082d0380bdc8a689c6c9b Mon Sep 17 00:00:00 2001 From: Kan Liang Date: Tue, 2 Apr 2019 12:44:58 -0700 Subject: [PATCH 74/99] perf/x86: Fix incorrect PEBS_REGS commit 9d5dcc93a6ddfc78124f006ccd3637ce070ef2fc upstream. PEBS_REGS used as mask for the supported registers for large PEBS. However, the mask cannot filter the sample_regs_user/sample_regs_intr correctly. (1ULL << PERF_REG_X86_*) should be used to replace PERF_REG_X86_*, which is only the index. Rename PEBS_REGS to PEBS_GP_REGS, because the mask is only for general purpose registers. Signed-off-by: Kan Liang Signed-off-by: Peter Zijlstra (Intel) Cc: Cc: Alexander Shishkin Cc: Arnaldo Carvalho de Melo Cc: Jiri Olsa Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Stephane Eranian Cc: Thomas Gleixner Cc: Vince Weaver Cc: acme@kernel.org Cc: jolsa@kernel.org Fixes: 2fe1bc1f501d ("perf/x86: Enable free running PEBS for REGS_USER/INTR") Link: https://lkml.kernel.org/r/20190402194509.2832-2-kan.liang@linux.intel.com [ Renamed it to PEBS_GP_REGS - as 'GPRS' is used elsewhere ;-) ] Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- arch/x86/events/intel/core.c | 2 +- arch/x86/events/perf_event.h | 38 ++++++++++++++++++------------------ 2 files changed, 20 insertions(+), 20 deletions(-) diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c index 12453cf7c11b..3dd204d1dd19 100644 --- a/arch/x86/events/intel/core.c +++ b/arch/x86/events/intel/core.c @@ -3014,7 +3014,7 @@ static unsigned long intel_pmu_large_pebs_flags(struct perf_event *event) flags &= ~PERF_SAMPLE_TIME; if (!event->attr.exclude_kernel) flags &= ~PERF_SAMPLE_REGS_USER; - if (event->attr.sample_regs_user & ~PEBS_REGS) + if (event->attr.sample_regs_user & ~PEBS_GP_REGS) flags &= ~(PERF_SAMPLE_REGS_USER | PERF_SAMPLE_REGS_INTR); return flags; } diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h index 42a36280d168..05659c7b43d4 100644 --- a/arch/x86/events/perf_event.h +++ b/arch/x86/events/perf_event.h @@ -96,25 +96,25 @@ struct amd_nb { PERF_SAMPLE_REGS_INTR | PERF_SAMPLE_REGS_USER | \ PERF_SAMPLE_PERIOD) -#define PEBS_REGS \ - (PERF_REG_X86_AX | \ - PERF_REG_X86_BX | \ - PERF_REG_X86_CX | \ - PERF_REG_X86_DX | \ - PERF_REG_X86_DI | \ - PERF_REG_X86_SI | \ - PERF_REG_X86_SP | \ - PERF_REG_X86_BP | \ - PERF_REG_X86_IP | \ - PERF_REG_X86_FLAGS | \ - PERF_REG_X86_R8 | \ - PERF_REG_X86_R9 | \ - PERF_REG_X86_R10 | \ - PERF_REG_X86_R11 | \ - PERF_REG_X86_R12 | \ - PERF_REG_X86_R13 | \ - PERF_REG_X86_R14 | \ - PERF_REG_X86_R15) +#define PEBS_GP_REGS \ + ((1ULL << PERF_REG_X86_AX) | \ + (1ULL << PERF_REG_X86_BX) | \ + (1ULL << PERF_REG_X86_CX) | \ + (1ULL << PERF_REG_X86_DX) | \ + (1ULL << PERF_REG_X86_DI) | \ + (1ULL << PERF_REG_X86_SI) | \ + (1ULL << PERF_REG_X86_SP) | \ + (1ULL << PERF_REG_X86_BP) | \ + (1ULL << PERF_REG_X86_IP) | \ + (1ULL << PERF_REG_X86_FLAGS) | \ + (1ULL << PERF_REG_X86_R8) | \ + (1ULL << PERF_REG_X86_R9) | \ + (1ULL << PERF_REG_X86_R10) | \ + (1ULL << PERF_REG_X86_R11) | \ + (1ULL << PERF_REG_X86_R12) | \ + (1ULL << PERF_REG_X86_R13) | \ + (1ULL << PERF_REG_X86_R14) | \ + (1ULL << PERF_REG_X86_R15)) /* * Per register state. From 5680b0635cdab1a1afb4f2078226584e008008b0 Mon Sep 17 00:00:00 2001 From: Thomas Gleixner Date: Sun, 14 Apr 2019 19:51:06 +0200 Subject: [PATCH 75/99] x86/speculation: Prevent deadlock on ssb_state::lock commit 2f5fb19341883bb6e37da351bc3700489d8506a7 upstream. Mikhail reported a lockdep splat related to the AMD specific ssb_state lock: CPU0 CPU1 lock(&st->lock); local_irq_disable(); lock(&(&sighand->siglock)->rlock); lock(&st->lock); lock(&(&sighand->siglock)->rlock); *** DEADLOCK *** The connection between sighand->siglock and st->lock comes through seccomp, which takes st->lock while holding sighand->siglock. Make sure interrupts are disabled when __speculation_ctrl_update() is invoked via prctl() -> speculation_ctrl_update(). Add a lockdep assert to catch future offenders. Fixes: 1f50ddb4f418 ("x86/speculation: Handle HT correctly on AMD") Reported-by: Mikhail Gavrilov Signed-off-by: Thomas Gleixner Tested-by: Mikhail Gavrilov Cc: Thomas Lendacky Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1904141948200.4917@nanos.tec.linutronix.de Signed-off-by: Greg Kroah-Hartman --- arch/x86/kernel/process.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c index 7d31192296a8..b8b08e61ac73 100644 --- a/arch/x86/kernel/process.c +++ b/arch/x86/kernel/process.c @@ -411,6 +411,8 @@ static __always_inline void __speculation_ctrl_update(unsigned long tifp, u64 msr = x86_spec_ctrl_base; bool updmsr = false; + lockdep_assert_irqs_disabled(); + /* * If TIF_SSBD is different, select the proper mitigation * method. Note that if SSBD mitigation is disabled or permanentely @@ -462,10 +464,12 @@ static unsigned long speculation_ctrl_update_tif(struct task_struct *tsk) void speculation_ctrl_update(unsigned long tif) { + unsigned long flags; + /* Forced update. Make sure all relevant TIF flags are different */ - preempt_disable(); + local_irq_save(flags); __speculation_ctrl_update(~tif, tif); - preempt_enable(); + local_irq_restore(flags); } /* Called from seccomp/prctl update */ From cd37fd46b4857a787571c3153bfa64d0ae28a407 Mon Sep 17 00:00:00 2001 From: Chang-An Chen Date: Fri, 29 Mar 2019 10:59:09 +0800 Subject: [PATCH 76/99] timers/sched_clock: Prevent generic sched_clock wrap caused by tick_freeze() commit 3f2552f7e9c5abef2775c53f7af66532f8bf65bc upstream. tick_freeze() introduced by suspend-to-idle in commit 124cf9117c5f ("PM / sleep: Make it possible to quiesce timers during suspend-to-idle") uses timekeeping_suspend() instead of syscore_suspend() during suspend-to-idle. As a consequence generic sched_clock will keep going because sched_clock_suspend() and sched_clock_resume() are not invoked during suspend-to-idle which can result in a generic sched_clock wrap. On a ARM system with suspend-to-idle enabled, sched_clock is registered as "56 bits at 13MHz, resolution 76ns, wraps every 4398046511101ns", which means the real wrapping duration is 8796093022202ns. [ 134.551779] suspend-to-idle suspend (timekeeping_suspend()) [ 1204.912239] suspend-to-idle resume (timekeeping_resume()) ...... [ 1206.912239] suspend-to-idle suspend (timekeeping_suspend()) [ 5880.502807] suspend-to-idle resume (timekeeping_resume()) ...... [ 6000.403724] suspend-to-idle suspend (timekeeping_suspend()) [ 8035.753167] suspend-to-idle resume (timekeeping_resume()) ...... [ 8795.786684] (2)[321:charger_thread]...... [ 8795.788387] (2)[321:charger_thread]...... [ 0.057226] (0)[0:swapper/0]...... [ 0.061447] (2)[0:swapper/2]...... sched_clock was not stopped during suspend-to-idle, and sched_clock_poll hrtimer was not expired because timekeeping_suspend() was invoked during suspend-to-idle. It makes sched_clock wrap at kernel time 8796s. To prevent this, invoke sched_clock_suspend() and sched_clock_resume() in tick_freeze() together with timekeeping_suspend() and timekeeping_resume(). Fixes: 124cf9117c5f (PM / sleep: Make it possible to quiesce timers during suspend-to-idle) Signed-off-by: Chang-An Chen Signed-off-by: Thomas Gleixner Cc: Frederic Weisbecker Cc: Matthias Brugger Cc: John Stultz Cc: Kees Cook Cc: Corey Minyard Cc: Cc: Cc: Stanley Chu Cc: Cc: Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/1553828349-8914-1-git-send-email-chang-an.chen@mediatek.com Signed-off-by: Greg Kroah-Hartman --- kernel/time/sched_clock.c | 4 ++-- kernel/time/tick-common.c | 2 ++ kernel/time/timekeeping.h | 7 +++++++ 3 files changed, 11 insertions(+), 2 deletions(-) diff --git a/kernel/time/sched_clock.c b/kernel/time/sched_clock.c index cbc72c2c1fca..78eb05aa8003 100644 --- a/kernel/time/sched_clock.c +++ b/kernel/time/sched_clock.c @@ -275,7 +275,7 @@ static u64 notrace suspended_sched_clock_read(void) return cd.read_data[seq & 1].epoch_cyc; } -static int sched_clock_suspend(void) +int sched_clock_suspend(void) { struct clock_read_data *rd = &cd.read_data[0]; @@ -286,7 +286,7 @@ static int sched_clock_suspend(void) return 0; } -static void sched_clock_resume(void) +void sched_clock_resume(void) { struct clock_read_data *rd = &cd.read_data[0]; diff --git a/kernel/time/tick-common.c b/kernel/time/tick-common.c index 14de3727b18e..a02e0f6b287c 100644 --- a/kernel/time/tick-common.c +++ b/kernel/time/tick-common.c @@ -491,6 +491,7 @@ void tick_freeze(void) trace_suspend_resume(TPS("timekeeping_freeze"), smp_processor_id(), true); system_state = SYSTEM_SUSPEND; + sched_clock_suspend(); timekeeping_suspend(); } else { tick_suspend_local(); @@ -514,6 +515,7 @@ void tick_unfreeze(void) if (tick_freeze_depth == num_online_cpus()) { timekeeping_resume(); + sched_clock_resume(); system_state = SYSTEM_RUNNING; trace_suspend_resume(TPS("timekeeping_freeze"), smp_processor_id(), false); diff --git a/kernel/time/timekeeping.h b/kernel/time/timekeeping.h index 7a9b4eb7a1d5..141ab3ab0354 100644 --- a/kernel/time/timekeeping.h +++ b/kernel/time/timekeeping.h @@ -14,6 +14,13 @@ extern u64 timekeeping_max_deferment(void); extern void timekeeping_warp_clock(void); extern int timekeeping_suspend(void); extern void timekeeping_resume(void); +#ifdef CONFIG_GENERIC_SCHED_CLOCK +extern int sched_clock_suspend(void); +extern void sched_clock_resume(void); +#else +static inline int sched_clock_suspend(void) { return 0; } +static inline void sched_clock_resume(void) { } +#endif extern void do_timer(unsigned long ticks); extern void update_wall_time(void); From 82a13a006ed5412fd817419e60113b7c922b21ff Mon Sep 17 00:00:00 2001 From: Dan Williams Date: Mon, 22 Apr 2019 16:08:10 -0700 Subject: [PATCH 77/99] nfit/ars: Remove ars_start_flags commit 317a992ab9266b86b774b9f6b0f87eb4f59879a1 upstream. The ars_start_flags property of 'struct acpi_nfit_desc' is no longer used since ARS_REQ_SHORT and ARS_REQ_LONG were added. Reviewed-by: Toshi Kani Signed-off-by: Dan Williams Signed-off-by: Sasha Levin --- drivers/acpi/nfit/core.c | 10 +++++----- drivers/acpi/nfit/nfit.h | 1 - 2 files changed, 5 insertions(+), 6 deletions(-) diff --git a/drivers/acpi/nfit/core.c b/drivers/acpi/nfit/core.c index df2175b1169a..b5237a506464 100644 --- a/drivers/acpi/nfit/core.c +++ b/drivers/acpi/nfit/core.c @@ -2539,11 +2539,11 @@ static int ars_continue(struct acpi_nfit_desc *acpi_desc) struct nvdimm_bus_descriptor *nd_desc = &acpi_desc->nd_desc; struct nd_cmd_ars_status *ars_status = acpi_desc->ars_status; - memset(&ars_start, 0, sizeof(ars_start)); - ars_start.address = ars_status->restart_address; - ars_start.length = ars_status->restart_length; - ars_start.type = ars_status->type; - ars_start.flags = acpi_desc->ars_start_flags; + ars_start = (struct nd_cmd_ars_start) { + .address = ars_status->restart_address, + .length = ars_status->restart_length, + .type = ars_status->type, + }; rc = nd_desc->ndctl(nd_desc, NULL, ND_CMD_ARS_START, &ars_start, sizeof(ars_start), &cmd_rc); if (rc < 0) diff --git a/drivers/acpi/nfit/nfit.h b/drivers/acpi/nfit/nfit.h index 02c10de50386..df31a2721573 100644 --- a/drivers/acpi/nfit/nfit.h +++ b/drivers/acpi/nfit/nfit.h @@ -194,7 +194,6 @@ struct acpi_nfit_desc { struct list_head idts; struct nvdimm_bus *nvdimm_bus; struct device *dev; - u8 ars_start_flags; struct nd_cmd_ars_status *ars_status; struct nfit_spa *scrub_spa; struct delayed_work dwork; From bc18c2593635364f0c71f76f8fc395bee33954ab Mon Sep 17 00:00:00 2001 From: Dan Williams Date: Mon, 22 Apr 2019 16:08:16 -0700 Subject: [PATCH 78/99] nfit/ars: Introduce scrub_flags commit e34b8252a3d2893ca55c82dbfcdaa302fa03d400 upstream. In preparation for introducing new flags to gate whether ARS results are stale, or poll the completion state, convert the existing flags to an unsigned long with enumerated values. This conversion allows the flags to be atomically updated outside of ->init_mutex. Reviewed-by: Toshi Kani Signed-off-by: Dan Williams Signed-off-by: Sasha Levin --- drivers/acpi/nfit/core.c | 30 +++++++++++++++++------------- drivers/acpi/nfit/nfit.h | 8 ++++++-- 2 files changed, 23 insertions(+), 15 deletions(-) diff --git a/drivers/acpi/nfit/core.c b/drivers/acpi/nfit/core.c index b5237a506464..6b5a3c3b4458 100644 --- a/drivers/acpi/nfit/core.c +++ b/drivers/acpi/nfit/core.c @@ -1298,19 +1298,23 @@ static ssize_t scrub_show(struct device *dev, struct device_attribute *attr, char *buf) { struct nvdimm_bus_descriptor *nd_desc; + struct acpi_nfit_desc *acpi_desc; ssize_t rc = -ENXIO; + bool busy; device_lock(dev); nd_desc = dev_get_drvdata(dev); - if (nd_desc) { - struct acpi_nfit_desc *acpi_desc = to_acpi_desc(nd_desc); - - mutex_lock(&acpi_desc->init_mutex); - rc = sprintf(buf, "%d%s", acpi_desc->scrub_count, - acpi_desc->scrub_busy - && !acpi_desc->cancel ? "+\n" : "\n"); - mutex_unlock(&acpi_desc->init_mutex); + if (!nd_desc) { + device_unlock(dev); + return rc; } + acpi_desc = to_acpi_desc(nd_desc); + + mutex_lock(&acpi_desc->init_mutex); + busy = test_bit(ARS_BUSY, &acpi_desc->scrub_flags) + && !test_bit(ARS_CANCEL, &acpi_desc->scrub_flags); + rc = sprintf(buf, "%d%s", acpi_desc->scrub_count, busy ? "+\n" : "\n"); + mutex_unlock(&acpi_desc->init_mutex); device_unlock(dev); return rc; } @@ -2960,7 +2964,7 @@ static unsigned int __acpi_nfit_scrub(struct acpi_nfit_desc *acpi_desc, lockdep_assert_held(&acpi_desc->init_mutex); - if (acpi_desc->cancel) + if (test_bit(ARS_CANCEL, &acpi_desc->scrub_flags)) return 0; if (query_rc == -EBUSY) { @@ -3034,7 +3038,7 @@ static void __sched_ars(struct acpi_nfit_desc *acpi_desc, unsigned int tmo) { lockdep_assert_held(&acpi_desc->init_mutex); - acpi_desc->scrub_busy = 1; + set_bit(ARS_BUSY, &acpi_desc->scrub_flags); /* note this should only be set from within the workqueue */ if (tmo) acpi_desc->scrub_tmo = tmo; @@ -3050,7 +3054,7 @@ static void notify_ars_done(struct acpi_nfit_desc *acpi_desc) { lockdep_assert_held(&acpi_desc->init_mutex); - acpi_desc->scrub_busy = 0; + clear_bit(ARS_BUSY, &acpi_desc->scrub_flags); acpi_desc->scrub_count++; if (acpi_desc->scrub_count_state) sysfs_notify_dirent(acpi_desc->scrub_count_state); @@ -3322,7 +3326,7 @@ int acpi_nfit_ars_rescan(struct acpi_nfit_desc *acpi_desc, struct nfit_spa *nfit_spa; mutex_lock(&acpi_desc->init_mutex); - if (acpi_desc->cancel) { + if (test_bit(ARS_CANCEL, &acpi_desc->scrub_flags)) { mutex_unlock(&acpi_desc->init_mutex); return 0; } @@ -3401,7 +3405,7 @@ void acpi_nfit_shutdown(void *data) mutex_unlock(&acpi_desc_lock); mutex_lock(&acpi_desc->init_mutex); - acpi_desc->cancel = 1; + set_bit(ARS_CANCEL, &acpi_desc->scrub_flags); cancel_delayed_work_sync(&acpi_desc->dwork); mutex_unlock(&acpi_desc->init_mutex); diff --git a/drivers/acpi/nfit/nfit.h b/drivers/acpi/nfit/nfit.h index df31a2721573..94710e579598 100644 --- a/drivers/acpi/nfit/nfit.h +++ b/drivers/acpi/nfit/nfit.h @@ -181,6 +181,11 @@ struct nfit_mem { bool has_lsw; }; +enum scrub_flags { + ARS_BUSY, + ARS_CANCEL, +}; + struct acpi_nfit_desc { struct nvdimm_bus_descriptor nd_desc; struct acpi_table_header acpi_header; @@ -202,8 +207,7 @@ struct acpi_nfit_desc { unsigned int max_ars; unsigned int scrub_count; unsigned int scrub_mode; - unsigned int scrub_busy:1; - unsigned int cancel:1; + unsigned long scrub_flags; unsigned long dimm_cmd_force_en; unsigned long bus_cmd_force_en; unsigned long bus_nfit_cmd_force_en; From 40221d56ae28eb0b3e86a482063b0e303903e397 Mon Sep 17 00:00:00 2001 From: Dan Williams Date: Mon, 22 Apr 2019 16:08:21 -0700 Subject: [PATCH 79/99] nfit/ars: Allow root to busy-poll the ARS state machine commit 5479b2757f26fe9908fc341d105b2097fe820b6f upstream. The ARS implementation implements exponential back-off on the poll interval to prevent high-frequency access to the DIMM / platform interface. Depending on when the ARS completes the poll interval may exceed the completion event by minutes. Allow root to reset the timeout each time it probes the status. A one-second timeout is still enforced, but root can otherwise can control the poll interval. Fixes: bc6ba8085842 ("nfit, address-range-scrub: rework and simplify ARS...") Cc: Reported-by: Erwin Tsaur Reviewed-by: Toshi Kani Signed-off-by: Dan Williams Signed-off-by: Sasha Levin --- drivers/acpi/nfit/core.c | 8 ++++++++ drivers/acpi/nfit/nfit.h | 1 + 2 files changed, 9 insertions(+) diff --git a/drivers/acpi/nfit/core.c b/drivers/acpi/nfit/core.c index 6b5a3c3b4458..4b489d14a680 100644 --- a/drivers/acpi/nfit/core.c +++ b/drivers/acpi/nfit/core.c @@ -1314,6 +1314,13 @@ static ssize_t scrub_show(struct device *dev, busy = test_bit(ARS_BUSY, &acpi_desc->scrub_flags) && !test_bit(ARS_CANCEL, &acpi_desc->scrub_flags); rc = sprintf(buf, "%d%s", acpi_desc->scrub_count, busy ? "+\n" : "\n"); + /* Allow an admin to poll the busy state at a higher rate */ + if (busy && capable(CAP_SYS_RAWIO) && !test_and_set_bit(ARS_POLL, + &acpi_desc->scrub_flags)) { + acpi_desc->scrub_tmo = 1; + mod_delayed_work(nfit_wq, &acpi_desc->dwork, HZ); + } + mutex_unlock(&acpi_desc->init_mutex); device_unlock(dev); return rc; @@ -3075,6 +3082,7 @@ static void acpi_nfit_scrub(struct work_struct *work) else notify_ars_done(acpi_desc); memset(acpi_desc->ars_status, 0, acpi_desc->max_ars); + clear_bit(ARS_POLL, &acpi_desc->scrub_flags); mutex_unlock(&acpi_desc->init_mutex); } diff --git a/drivers/acpi/nfit/nfit.h b/drivers/acpi/nfit/nfit.h index 94710e579598..b5fd3522abc7 100644 --- a/drivers/acpi/nfit/nfit.h +++ b/drivers/acpi/nfit/nfit.h @@ -184,6 +184,7 @@ struct nfit_mem { enum scrub_flags { ARS_BUSY, ARS_CANCEL, + ARS_POLL, }; struct acpi_nfit_desc { From be608583d9c4910877d7553d4240e998818c2d70 Mon Sep 17 00:00:00 2001 From: Dan Williams Date: Mon, 22 Apr 2019 16:08:26 -0700 Subject: [PATCH 80/99] nfit/ars: Avoid stale ARS results commit 78153dd45e7e0596ba32b15d02bda08e1513111e upstream. Gate ARS result consumption on whether the OS issued start-ARS since the previous consumption. The BIOS may only clear its result buffers after a successful start-ARS. Fixes: 0caeef63e6d2 ("libnvdimm: Add a poison list and export badblocks") Cc: Reported-by: Krzysztof Rusocki Reported-by: Vishal Verma Reviewed-by: Toshi Kani Signed-off-by: Dan Williams Signed-off-by: Sasha Levin --- drivers/acpi/nfit/core.c | 17 ++++++++++++++++- drivers/acpi/nfit/nfit.h | 1 + 2 files changed, 17 insertions(+), 1 deletion(-) diff --git a/drivers/acpi/nfit/core.c b/drivers/acpi/nfit/core.c index 4b489d14a680..925dbc751322 100644 --- a/drivers/acpi/nfit/core.c +++ b/drivers/acpi/nfit/core.c @@ -2540,7 +2540,10 @@ static int ars_start(struct acpi_nfit_desc *acpi_desc, if (rc < 0) return rc; - return cmd_rc; + if (cmd_rc < 0) + return cmd_rc; + set_bit(ARS_VALID, &acpi_desc->scrub_flags); + return 0; } static int ars_continue(struct acpi_nfit_desc *acpi_desc) @@ -2633,6 +2636,17 @@ static int ars_status_process_records(struct acpi_nfit_desc *acpi_desc) */ if (ars_status->out_length < 44) return 0; + + /* + * Ignore potentially stale results that are only refreshed + * after a start-ARS event. + */ + if (!test_and_clear_bit(ARS_VALID, &acpi_desc->scrub_flags)) { + dev_dbg(acpi_desc->dev, "skip %d stale records\n", + ars_status->num_records); + return 0; + } + for (i = 0; i < ars_status->num_records; i++) { /* only process full records */ if (ars_status->out_length @@ -3117,6 +3131,7 @@ static int acpi_nfit_register_regions(struct acpi_nfit_desc *acpi_desc) struct nfit_spa *nfit_spa; int rc; + set_bit(ARS_VALID, &acpi_desc->scrub_flags); list_for_each_entry(nfit_spa, &acpi_desc->spas, list) { switch (nfit_spa_type(nfit_spa->spa)) { case NFIT_SPA_VOLATILE: diff --git a/drivers/acpi/nfit/nfit.h b/drivers/acpi/nfit/nfit.h index b5fd3522abc7..68848fc4b7c9 100644 --- a/drivers/acpi/nfit/nfit.h +++ b/drivers/acpi/nfit/nfit.h @@ -184,6 +184,7 @@ struct nfit_mem { enum scrub_flags { ARS_BUSY, ARS_CANCEL, + ARS_VALID, ARS_POLL, }; From b2be40b73b29921e5c7f2e3a11ceaa31c508326c Mon Sep 17 00:00:00 2001 From: Adrian Hunter Date: Thu, 15 Nov 2018 15:53:41 +0200 Subject: [PATCH 81/99] mmc: sdhci: Fix data command CRC error handling [ Upstream commit 4bf780996669280171c9cd58196512849b93434e ] Existing data command CRC error handling is non-standard and does not work with some Intel host controllers. Specifically, the assumption that the host controller will continue operating normally after the error interrupt, is not valid. Change the driver to handle the error in the same manner as a data CRC error, taking care to ensure that the data line reset is done for single or multi-block transfers, and it is done before unmapping DMA. Signed-off-by: Adrian Hunter Signed-off-by: Ulf Hansson Signed-off-by: Sasha Levin --- drivers/mmc/host/sdhci.c | 40 +++++++++++++++------------------------- 1 file changed, 15 insertions(+), 25 deletions(-) diff --git a/drivers/mmc/host/sdhci.c b/drivers/mmc/host/sdhci.c index 654051e00117..4bfaca33a477 100644 --- a/drivers/mmc/host/sdhci.c +++ b/drivers/mmc/host/sdhci.c @@ -1078,8 +1078,7 @@ static bool sdhci_needs_reset(struct sdhci_host *host, struct mmc_request *mrq) return (!(host->flags & SDHCI_DEVICE_DEAD) && ((mrq->cmd && mrq->cmd->error) || (mrq->sbc && mrq->sbc->error) || - (mrq->data && ((mrq->data->error && !mrq->data->stop) || - (mrq->data->stop && mrq->data->stop->error))) || + (mrq->data && mrq->data->stop && mrq->data->stop->error) || (host->quirks & SDHCI_QUIRK_RESET_AFTER_REQUEST))); } @@ -1131,6 +1130,16 @@ static void sdhci_finish_data(struct sdhci_host *host) host->data = NULL; host->data_cmd = NULL; + /* + * The controller needs a reset of internal state machines upon error + * conditions. + */ + if (data->error) { + if (!host->cmd || host->cmd == data_cmd) + sdhci_do_reset(host, SDHCI_RESET_CMD); + sdhci_do_reset(host, SDHCI_RESET_DATA); + } + if ((host->flags & (SDHCI_REQ_USE_DMA | SDHCI_USE_ADMA)) == (SDHCI_REQ_USE_DMA | SDHCI_USE_ADMA)) sdhci_adma_table_post(host, data); @@ -1155,17 +1164,6 @@ static void sdhci_finish_data(struct sdhci_host *host) if (data->stop && (data->error || !data->mrq->sbc)) { - - /* - * The controller needs a reset of internal state machines - * upon error conditions. - */ - if (data->error) { - if (!host->cmd || host->cmd == data_cmd) - sdhci_do_reset(host, SDHCI_RESET_CMD); - sdhci_do_reset(host, SDHCI_RESET_DATA); - } - /* * 'cap_cmd_during_tfr' request must not use the command line * after mmc_command_done() has been called. It is upper layer's @@ -2642,7 +2640,7 @@ static void sdhci_timeout_data_timer(struct timer_list *t) * * \*****************************************************************************/ -static void sdhci_cmd_irq(struct sdhci_host *host, u32 intmask) +static void sdhci_cmd_irq(struct sdhci_host *host, u32 intmask, u32 *intmask_p) { if (!host->cmd) { /* @@ -2665,20 +2663,12 @@ static void sdhci_cmd_irq(struct sdhci_host *host, u32 intmask) else host->cmd->error = -EILSEQ; - /* - * If this command initiates a data phase and a response - * CRC error is signalled, the card can start transferring - * data - the card may have received the command without - * error. We must not terminate the mmc_request early. - * - * If the card did not receive the command or returned an - * error which prevented it sending data, the data phase - * will time out. - */ + /* Treat data command CRC error the same as data CRC error */ if (host->cmd->data && (intmask & (SDHCI_INT_CRC | SDHCI_INT_TIMEOUT)) == SDHCI_INT_CRC) { host->cmd = NULL; + *intmask_p |= SDHCI_INT_DATA_CRC; return; } @@ -2906,7 +2896,7 @@ static irqreturn_t sdhci_irq(int irq, void *dev_id) } if (intmask & SDHCI_INT_CMD_MASK) - sdhci_cmd_irq(host, intmask & SDHCI_INT_CMD_MASK); + sdhci_cmd_irq(host, intmask & SDHCI_INT_CMD_MASK, &intmask); if (intmask & SDHCI_INT_DATA_MASK) sdhci_data_irq(host, intmask & SDHCI_INT_DATA_MASK); From ba8a6c055677ef22c43fd31b6ac0b27998be36f9 Mon Sep 17 00:00:00 2001 From: Adrian Hunter Date: Thu, 15 Nov 2018 15:53:42 +0200 Subject: [PATCH 82/99] mmc: sdhci: Rename SDHCI_ACMD12_ERR and SDHCI_INT_ACMD12ERR [ Upstream commit 869f8a69bb3a4aec4eb914a330d4ba53a9eed495 ] The SDHCI_ACMD12_ERR register is used for auto-CMD23 and auto-CMD12 errors, as is the SDHCI_INT_ACMD12ERR interrupt bit. Rename them to SDHCI_AUTO_CMD_STATUS and SDHCI_INT_AUTO_CMD_ERR respectively. Signed-off-by: Adrian Hunter Signed-off-by: Ulf Hansson Signed-off-by: Sasha Levin --- drivers/mmc/host/sdhci-esdhc-imx.c | 12 ++++++------ drivers/mmc/host/sdhci.c | 4 ++-- drivers/mmc/host/sdhci.h | 4 ++-- 3 files changed, 10 insertions(+), 10 deletions(-) diff --git a/drivers/mmc/host/sdhci-esdhc-imx.c b/drivers/mmc/host/sdhci-esdhc-imx.c index 8dae12b841b3..629860f7327c 100644 --- a/drivers/mmc/host/sdhci-esdhc-imx.c +++ b/drivers/mmc/host/sdhci-esdhc-imx.c @@ -429,7 +429,7 @@ static u16 esdhc_readw_le(struct sdhci_host *host, int reg) val = readl(host->ioaddr + ESDHC_MIX_CTRL); else if (imx_data->socdata->flags & ESDHC_FLAG_STD_TUNING) /* the std tuning bits is in ACMD12_ERR for imx6sl */ - val = readl(host->ioaddr + SDHCI_ACMD12_ERR); + val = readl(host->ioaddr + SDHCI_AUTO_CMD_STATUS); } if (val & ESDHC_MIX_CTRL_EXE_TUNE) @@ -494,7 +494,7 @@ static void esdhc_writew_le(struct sdhci_host *host, u16 val, int reg) } writel(new_val , host->ioaddr + ESDHC_MIX_CTRL); } else if (imx_data->socdata->flags & ESDHC_FLAG_STD_TUNING) { - u32 v = readl(host->ioaddr + SDHCI_ACMD12_ERR); + u32 v = readl(host->ioaddr + SDHCI_AUTO_CMD_STATUS); u32 m = readl(host->ioaddr + ESDHC_MIX_CTRL); if (val & SDHCI_CTRL_TUNED_CLK) { v |= ESDHC_MIX_CTRL_SMPCLK_SEL; @@ -512,7 +512,7 @@ static void esdhc_writew_le(struct sdhci_host *host, u16 val, int reg) v &= ~ESDHC_MIX_CTRL_EXE_TUNE; } - writel(v, host->ioaddr + SDHCI_ACMD12_ERR); + writel(v, host->ioaddr + SDHCI_AUTO_CMD_STATUS); writel(m, host->ioaddr + ESDHC_MIX_CTRL); } return; @@ -957,9 +957,9 @@ static void esdhc_reset_tuning(struct sdhci_host *host) writel(ctrl, host->ioaddr + ESDHC_MIX_CTRL); writel(0, host->ioaddr + ESDHC_TUNE_CTRL_STATUS); } else if (imx_data->socdata->flags & ESDHC_FLAG_STD_TUNING) { - ctrl = readl(host->ioaddr + SDHCI_ACMD12_ERR); + ctrl = readl(host->ioaddr + SDHCI_AUTO_CMD_STATUS); ctrl &= ~ESDHC_MIX_CTRL_SMPCLK_SEL; - writel(ctrl, host->ioaddr + SDHCI_ACMD12_ERR); + writel(ctrl, host->ioaddr + SDHCI_AUTO_CMD_STATUS); } } } @@ -1319,7 +1319,7 @@ static int sdhci_esdhc_imx_probe(struct platform_device *pdev) /* clear tuning bits in case ROM has set it already */ writel(0x0, host->ioaddr + ESDHC_MIX_CTRL); - writel(0x0, host->ioaddr + SDHCI_ACMD12_ERR); + writel(0x0, host->ioaddr + SDHCI_AUTO_CMD_STATUS); writel(0x0, host->ioaddr + ESDHC_TUNE_CTRL_STATUS); } diff --git a/drivers/mmc/host/sdhci.c b/drivers/mmc/host/sdhci.c index 4bfaca33a477..0eb05a42a857 100644 --- a/drivers/mmc/host/sdhci.c +++ b/drivers/mmc/host/sdhci.c @@ -82,8 +82,8 @@ void sdhci_dumpregs(struct sdhci_host *host) SDHCI_DUMP("Int enab: 0x%08x | Sig enab: 0x%08x\n", sdhci_readl(host, SDHCI_INT_ENABLE), sdhci_readl(host, SDHCI_SIGNAL_ENABLE)); - SDHCI_DUMP("AC12 err: 0x%08x | Slot int: 0x%08x\n", - sdhci_readw(host, SDHCI_ACMD12_ERR), + SDHCI_DUMP("ACmd stat: 0x%08x | Slot int: 0x%08x\n", + sdhci_readw(host, SDHCI_AUTO_CMD_STATUS), sdhci_readw(host, SDHCI_SLOT_INT_STATUS)); SDHCI_DUMP("Caps: 0x%08x | Caps_1: 0x%08x\n", sdhci_readl(host, SDHCI_CAPABILITIES), diff --git a/drivers/mmc/host/sdhci.h b/drivers/mmc/host/sdhci.h index f0bd36ce3817..33a7728c71fa 100644 --- a/drivers/mmc/host/sdhci.h +++ b/drivers/mmc/host/sdhci.h @@ -144,7 +144,7 @@ #define SDHCI_INT_DATA_CRC 0x00200000 #define SDHCI_INT_DATA_END_BIT 0x00400000 #define SDHCI_INT_BUS_POWER 0x00800000 -#define SDHCI_INT_ACMD12ERR 0x01000000 +#define SDHCI_INT_AUTO_CMD_ERR 0x01000000 #define SDHCI_INT_ADMA_ERROR 0x02000000 #define SDHCI_INT_NORMAL_MASK 0x00007FFF @@ -166,7 +166,7 @@ #define SDHCI_CQE_INT_MASK (SDHCI_CQE_INT_ERR_MASK | SDHCI_INT_CQE) -#define SDHCI_ACMD12_ERR 0x3C +#define SDHCI_AUTO_CMD_STATUS 0x3C #define SDHCI_HOST_CONTROL2 0x3E #define SDHCI_CTRL_UHS_MASK 0x0007 From 87eadc0b8c2a574c3e48aa0d15e1318a6a24f654 Mon Sep 17 00:00:00 2001 From: Adrian Hunter Date: Thu, 15 Nov 2018 15:53:43 +0200 Subject: [PATCH 83/99] mmc: sdhci: Handle auto-command errors [ Upstream commit af849c86109d79222e549826068bbf4e7f9a2472 ] If the host controller supports auto-commands then enable the auto-command error interrupt and handle it. In the case of auto-CMD23, the error is treated the same as manual CMD23 error. In the case of auto-CMD12, commands-during-transfer are not permitted, so the error handling is treated the same as a data error. Signed-off-by: Adrian Hunter Signed-off-by: Ulf Hansson Signed-off-by: Sasha Levin --- drivers/mmc/host/sdhci.c | 35 +++++++++++++++++++++++++++++++++++ drivers/mmc/host/sdhci.h | 7 ++++++- 2 files changed, 41 insertions(+), 1 deletion(-) diff --git a/drivers/mmc/host/sdhci.c b/drivers/mmc/host/sdhci.c index 0eb05a42a857..c749d3dc1d36 100644 --- a/drivers/mmc/host/sdhci.c +++ b/drivers/mmc/host/sdhci.c @@ -841,6 +841,11 @@ static void sdhci_set_transfer_irqs(struct sdhci_host *host) else host->ier = (host->ier & ~dma_irqs) | pio_irqs; + if (host->flags & (SDHCI_AUTO_CMD23 | SDHCI_AUTO_CMD12)) + host->ier |= SDHCI_INT_AUTO_CMD_ERR; + else + host->ier &= ~SDHCI_INT_AUTO_CMD_ERR; + sdhci_writel(host, host->ier, SDHCI_INT_ENABLE); sdhci_writel(host, host->ier, SDHCI_SIGNAL_ENABLE); } @@ -2642,6 +2647,21 @@ static void sdhci_timeout_data_timer(struct timer_list *t) static void sdhci_cmd_irq(struct sdhci_host *host, u32 intmask, u32 *intmask_p) { + /* Handle auto-CMD12 error */ + if (intmask & SDHCI_INT_AUTO_CMD_ERR && host->data_cmd) { + struct mmc_request *mrq = host->data_cmd->mrq; + u16 auto_cmd_status = sdhci_readw(host, SDHCI_AUTO_CMD_STATUS); + int data_err_bit = (auto_cmd_status & SDHCI_AUTO_CMD_TIMEOUT) ? + SDHCI_INT_DATA_TIMEOUT : + SDHCI_INT_DATA_CRC; + + /* Treat auto-CMD12 error the same as data error */ + if (!mrq->sbc && (host->flags & SDHCI_AUTO_CMD12)) { + *intmask_p |= data_err_bit; + return; + } + } + if (!host->cmd) { /* * SDHCI recovers from errors by resetting the cmd and data @@ -2676,6 +2696,21 @@ static void sdhci_cmd_irq(struct sdhci_host *host, u32 intmask, u32 *intmask_p) return; } + /* Handle auto-CMD23 error */ + if (intmask & SDHCI_INT_AUTO_CMD_ERR) { + struct mmc_request *mrq = host->cmd->mrq; + u16 auto_cmd_status = sdhci_readw(host, SDHCI_AUTO_CMD_STATUS); + int err = (auto_cmd_status & SDHCI_AUTO_CMD_TIMEOUT) ? + -ETIMEDOUT : + -EILSEQ; + + if (mrq->sbc && (host->flags & SDHCI_AUTO_CMD23)) { + mrq->sbc->error = err; + sdhci_finish_mrq(host, mrq); + return; + } + } + if (intmask & SDHCI_INT_RESPONSE) sdhci_finish_command(host); } diff --git a/drivers/mmc/host/sdhci.h b/drivers/mmc/host/sdhci.h index 33a7728c71fa..0f8c4f3ccafc 100644 --- a/drivers/mmc/host/sdhci.h +++ b/drivers/mmc/host/sdhci.h @@ -151,7 +151,8 @@ #define SDHCI_INT_ERROR_MASK 0xFFFF8000 #define SDHCI_INT_CMD_MASK (SDHCI_INT_RESPONSE | SDHCI_INT_TIMEOUT | \ - SDHCI_INT_CRC | SDHCI_INT_END_BIT | SDHCI_INT_INDEX) + SDHCI_INT_CRC | SDHCI_INT_END_BIT | SDHCI_INT_INDEX | \ + SDHCI_INT_AUTO_CMD_ERR) #define SDHCI_INT_DATA_MASK (SDHCI_INT_DATA_END | SDHCI_INT_DMA_END | \ SDHCI_INT_DATA_AVAIL | SDHCI_INT_SPACE_AVAIL | \ SDHCI_INT_DATA_TIMEOUT | SDHCI_INT_DATA_CRC | \ @@ -167,6 +168,10 @@ #define SDHCI_CQE_INT_MASK (SDHCI_CQE_INT_ERR_MASK | SDHCI_INT_CQE) #define SDHCI_AUTO_CMD_STATUS 0x3C +#define SDHCI_AUTO_CMD_TIMEOUT 0x00000002 +#define SDHCI_AUTO_CMD_CRC 0x00000004 +#define SDHCI_AUTO_CMD_END_BIT 0x00000008 +#define SDHCI_AUTO_CMD_INDEX 0x00000010 #define SDHCI_HOST_CONTROL2 0x3E #define SDHCI_CTRL_UHS_MASK 0x0007 From aa0e8cc9d7a89bb0c23435ec1b62eac46b1f4984 Mon Sep 17 00:00:00 2001 From: Masahiro Yamada Date: Thu, 22 Nov 2018 13:28:41 +0900 Subject: [PATCH 84/99] modpost: file2alias: go back to simple devtable lookup [ Upstream commit ec91e78d378cc5d4b43805a1227d8e04e5dfa17d ] Commit e49ce14150c6 ("modpost: use linker section to generate table.") was not so cool as we had expected first; it ended up with ugly section hacks when commit dd2a3acaecd7 ("mod/file2alias: make modpost compile on darwin again") came in. Given a certain degree of unknowledge about the link stage of host programs, I really want to see simple, stupid table lookup so that this works in the same way regardless of the underlying executable format. Signed-off-by: Masahiro Yamada Acked-by: Mathieu Malaterre Signed-off-by: Sasha Levin --- scripts/mod/file2alias.c | 144 +++++++++++++-------------------------- 1 file changed, 49 insertions(+), 95 deletions(-) diff --git a/scripts/mod/file2alias.c b/scripts/mod/file2alias.c index 7be43697ff84..9f3ebde7a0cd 100644 --- a/scripts/mod/file2alias.c +++ b/scripts/mod/file2alias.c @@ -50,46 +50,6 @@ struct devtable { void *function; }; -#define ___cat(a,b) a ## b -#define __cat(a,b) ___cat(a,b) - -/* we need some special handling for this host tool running eventually on - * Darwin. The Mach-O section handling is a bit different than ELF section - * handling. The differnces in detail are: - * a) we have segments which have sections - * b) we need a API call to get the respective section symbols */ -#if defined(__MACH__) -#include - -#define INIT_SECTION(name) do { \ - unsigned long name ## _len; \ - char *__cat(pstart_,name) = getsectdata("__TEXT", \ - #name, &__cat(name,_len)); \ - char *__cat(pstop_,name) = __cat(pstart_,name) + \ - __cat(name, _len); \ - __cat(__start_,name) = (void *)__cat(pstart_,name); \ - __cat(__stop_,name) = (void *)__cat(pstop_,name); \ - } while (0) -#define SECTION(name) __attribute__((section("__TEXT, " #name))) - -struct devtable **__start___devtable, **__stop___devtable; -#else -#define INIT_SECTION(name) /* no-op for ELF */ -#define SECTION(name) __attribute__((section(#name))) - -/* We construct a table of pointers in an ELF section (pointers generally - * go unpadded by gcc). ld creates boundary syms for us. */ -extern struct devtable *__start___devtable[], *__stop___devtable[]; -#endif /* __MACH__ */ - -#if !defined(__used) -# if __GNUC__ == 3 && __GNUC_MINOR__ < 3 -# define __used __attribute__((__unused__)) -# else -# define __used __attribute__((__used__)) -# endif -#endif - /* Define a variable f that holds the value of field f of struct devid * based at address m. */ @@ -102,16 +62,6 @@ extern struct devtable *__start___devtable[], *__stop___devtable[]; #define DEF_FIELD_ADDR(m, devid, f) \ typeof(((struct devid *)0)->f) *f = ((m) + OFF_##devid##_##f) -/* Add a table entry. We test function type matches while we're here. */ -#define ADD_TO_DEVTABLE(device_id, type, function) \ - static struct devtable __cat(devtable,__LINE__) = { \ - device_id + 0*sizeof((function)((const char *)NULL, \ - (void *)NULL, \ - (char *)NULL)), \ - SIZE_##type, (function) }; \ - static struct devtable *SECTION(__devtable) __used \ - __cat(devtable_ptr,__LINE__) = &__cat(devtable,__LINE__) - #define ADD(str, sep, cond, field) \ do { \ strcat(str, sep); \ @@ -431,7 +381,6 @@ static int do_hid_entry(const char *filename, return 1; } -ADD_TO_DEVTABLE("hid", hid_device_id, do_hid_entry); /* Looks like: ieee1394:venNmoNspNverN */ static int do_ieee1394_entry(const char *filename, @@ -456,7 +405,6 @@ static int do_ieee1394_entry(const char *filename, add_wildcard(alias); return 1; } -ADD_TO_DEVTABLE("ieee1394", ieee1394_device_id, do_ieee1394_entry); /* Looks like: pci:vNdNsvNsdNbcNscNiN. */ static int do_pci_entry(const char *filename, @@ -500,7 +448,6 @@ static int do_pci_entry(const char *filename, add_wildcard(alias); return 1; } -ADD_TO_DEVTABLE("pci", pci_device_id, do_pci_entry); /* looks like: "ccw:tNmNdtNdmN" */ static int do_ccw_entry(const char *filename, @@ -524,7 +471,6 @@ static int do_ccw_entry(const char *filename, add_wildcard(alias); return 1; } -ADD_TO_DEVTABLE("ccw", ccw_device_id, do_ccw_entry); /* looks like: "ap:tN" */ static int do_ap_entry(const char *filename, @@ -535,7 +481,6 @@ static int do_ap_entry(const char *filename, sprintf(alias, "ap:t%02X*", dev_type); return 1; } -ADD_TO_DEVTABLE("ap", ap_device_id, do_ap_entry); /* looks like: "css:tN" */ static int do_css_entry(const char *filename, @@ -546,7 +491,6 @@ static int do_css_entry(const char *filename, sprintf(alias, "css:t%01X", type); return 1; } -ADD_TO_DEVTABLE("css", css_device_id, do_css_entry); /* Looks like: "serio:tyNprNidNexN" */ static int do_serio_entry(const char *filename, @@ -566,7 +510,6 @@ static int do_serio_entry(const char *filename, add_wildcard(alias); return 1; } -ADD_TO_DEVTABLE("serio", serio_device_id, do_serio_entry); /* looks like: "acpi:ACPI0003" or "acpi:PNP0C0B" or "acpi:LNXVIDEO" or * "acpi:bbsspp" (bb=base-class, ss=sub-class, pp=prog-if) @@ -604,7 +547,6 @@ static int do_acpi_entry(const char *filename, } return 1; } -ADD_TO_DEVTABLE("acpi", acpi_device_id, do_acpi_entry); /* looks like: "pnp:dD" */ static void do_pnp_device_entry(void *symval, unsigned long size, @@ -725,7 +667,6 @@ static int do_pcmcia_entry(const char *filename, add_wildcard(alias); return 1; } -ADD_TO_DEVTABLE("pcmcia", pcmcia_device_id, do_pcmcia_entry); static int do_vio_entry(const char *filename, void *symval, char *alias) @@ -745,7 +686,6 @@ static int do_vio_entry(const char *filename, void *symval, add_wildcard(alias); return 1; } -ADD_TO_DEVTABLE("vio", vio_device_id, do_vio_entry); #define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0])) @@ -818,7 +758,6 @@ static int do_input_entry(const char *filename, void *symval, do_input(alias, *swbit, 0, INPUT_DEVICE_ID_SW_MAX); return 1; } -ADD_TO_DEVTABLE("input", input_device_id, do_input_entry); static int do_eisa_entry(const char *filename, void *symval, char *alias) @@ -830,7 +769,6 @@ static int do_eisa_entry(const char *filename, void *symval, strcat(alias, "*"); return 1; } -ADD_TO_DEVTABLE("eisa", eisa_device_id, do_eisa_entry); /* Looks like: parisc:tNhvNrevNsvN */ static int do_parisc_entry(const char *filename, void *symval, @@ -850,7 +788,6 @@ static int do_parisc_entry(const char *filename, void *symval, add_wildcard(alias); return 1; } -ADD_TO_DEVTABLE("parisc", parisc_device_id, do_parisc_entry); /* Looks like: sdio:cNvNdN. */ static int do_sdio_entry(const char *filename, @@ -867,7 +804,6 @@ static int do_sdio_entry(const char *filename, add_wildcard(alias); return 1; } -ADD_TO_DEVTABLE("sdio", sdio_device_id, do_sdio_entry); /* Looks like: ssb:vNidNrevN. */ static int do_ssb_entry(const char *filename, @@ -884,7 +820,6 @@ static int do_ssb_entry(const char *filename, add_wildcard(alias); return 1; } -ADD_TO_DEVTABLE("ssb", ssb_device_id, do_ssb_entry); /* Looks like: bcma:mNidNrevNclN. */ static int do_bcma_entry(const char *filename, @@ -903,7 +838,6 @@ static int do_bcma_entry(const char *filename, add_wildcard(alias); return 1; } -ADD_TO_DEVTABLE("bcma", bcma_device_id, do_bcma_entry); /* Looks like: virtio:dNvN */ static int do_virtio_entry(const char *filename, void *symval, @@ -919,7 +853,6 @@ static int do_virtio_entry(const char *filename, void *symval, add_wildcard(alias); return 1; } -ADD_TO_DEVTABLE("virtio", virtio_device_id, do_virtio_entry); /* * Looks like: vmbus:guid @@ -942,7 +875,6 @@ static int do_vmbus_entry(const char *filename, void *symval, return 1; } -ADD_TO_DEVTABLE("vmbus", hv_vmbus_device_id, do_vmbus_entry); /* Looks like: rpmsg:S */ static int do_rpmsg_entry(const char *filename, void *symval, @@ -953,7 +885,6 @@ static int do_rpmsg_entry(const char *filename, void *symval, return 1; } -ADD_TO_DEVTABLE("rpmsg", rpmsg_device_id, do_rpmsg_entry); /* Looks like: i2c:S */ static int do_i2c_entry(const char *filename, void *symval, @@ -964,7 +895,6 @@ static int do_i2c_entry(const char *filename, void *symval, return 1; } -ADD_TO_DEVTABLE("i2c", i2c_device_id, do_i2c_entry); /* Looks like: spi:S */ static int do_spi_entry(const char *filename, void *symval, @@ -975,7 +905,6 @@ static int do_spi_entry(const char *filename, void *symval, return 1; } -ADD_TO_DEVTABLE("spi", spi_device_id, do_spi_entry); static const struct dmifield { const char *prefix; @@ -1030,7 +959,6 @@ static int do_dmi_entry(const char *filename, void *symval, strcat(alias, ":"); return 1; } -ADD_TO_DEVTABLE("dmi", dmi_system_id, do_dmi_entry); static int do_platform_entry(const char *filename, void *symval, char *alias) @@ -1039,7 +967,6 @@ static int do_platform_entry(const char *filename, sprintf(alias, PLATFORM_MODULE_PREFIX "%s", *name); return 1; } -ADD_TO_DEVTABLE("platform", platform_device_id, do_platform_entry); static int do_mdio_entry(const char *filename, void *symval, char *alias) @@ -1064,7 +991,6 @@ static int do_mdio_entry(const char *filename, return 1; } -ADD_TO_DEVTABLE("mdio", mdio_device_id, do_mdio_entry); /* Looks like: zorro:iN. */ static int do_zorro_entry(const char *filename, void *symval, @@ -1075,7 +1001,6 @@ static int do_zorro_entry(const char *filename, void *symval, ADD(alias, "i", id != ZORRO_WILDCARD, id); return 1; } -ADD_TO_DEVTABLE("zorro", zorro_device_id, do_zorro_entry); /* looks like: "pnp:dD" */ static int do_isapnp_entry(const char *filename, @@ -1091,7 +1016,6 @@ static int do_isapnp_entry(const char *filename, (function >> 12) & 0x0f, (function >> 8) & 0x0f); return 1; } -ADD_TO_DEVTABLE("isapnp", isapnp_device_id, do_isapnp_entry); /* Looks like: "ipack:fNvNdN". */ static int do_ipack_entry(const char *filename, @@ -1107,7 +1031,6 @@ static int do_ipack_entry(const char *filename, add_wildcard(alias); return 1; } -ADD_TO_DEVTABLE("ipack", ipack_device_id, do_ipack_entry); /* * Append a match expression for a single masked hex digit. @@ -1178,7 +1101,6 @@ static int do_amba_entry(const char *filename, return 1; } -ADD_TO_DEVTABLE("amba", amba_id, do_amba_entry); /* * looks like: "mipscdmm:tN" @@ -1194,7 +1116,6 @@ static int do_mips_cdmm_entry(const char *filename, sprintf(alias, "mipscdmm:t%02X*", type); return 1; } -ADD_TO_DEVTABLE("mipscdmm", mips_cdmm_device_id, do_mips_cdmm_entry); /* LOOKS like cpu:type:x86,venVVVVfamFFFFmodMMMM:feature:*,FEAT,* * All fields are numbers. It would be nicer to use strings for vendor @@ -1219,7 +1140,6 @@ static int do_x86cpu_entry(const char *filename, void *symval, sprintf(alias + strlen(alias), "%04X*", feature); return 1; } -ADD_TO_DEVTABLE("x86cpu", x86_cpu_id, do_x86cpu_entry); /* LOOKS like cpu:type:*:feature:*FEAT* */ static int do_cpu_entry(const char *filename, void *symval, char *alias) @@ -1229,7 +1149,6 @@ static int do_cpu_entry(const char *filename, void *symval, char *alias) sprintf(alias, "cpu:type:*:feature:*%04X*", feature); return 1; } -ADD_TO_DEVTABLE("cpu", cpu_feature, do_cpu_entry); /* Looks like: mei:S:uuid:N:* */ static int do_mei_entry(const char *filename, void *symval, @@ -1248,7 +1167,6 @@ static int do_mei_entry(const char *filename, void *symval, return 1; } -ADD_TO_DEVTABLE("mei", mei_cl_device_id, do_mei_entry); /* Looks like: rapidio:vNdNavNadN */ static int do_rio_entry(const char *filename, @@ -1268,7 +1186,6 @@ static int do_rio_entry(const char *filename, add_wildcard(alias); return 1; } -ADD_TO_DEVTABLE("rapidio", rio_device_id, do_rio_entry); /* Looks like: ulpi:vNpN */ static int do_ulpi_entry(const char *filename, void *symval, @@ -1281,7 +1198,6 @@ static int do_ulpi_entry(const char *filename, void *symval, return 1; } -ADD_TO_DEVTABLE("ulpi", ulpi_device_id, do_ulpi_entry); /* Looks like: hdaudio:vNrNaN */ static int do_hda_entry(const char *filename, void *symval, char *alias) @@ -1298,7 +1214,6 @@ static int do_hda_entry(const char *filename, void *symval, char *alias) add_wildcard(alias); return 1; } -ADD_TO_DEVTABLE("hdaudio", hda_device_id, do_hda_entry); /* Looks like: sdw:mNpN */ static int do_sdw_entry(const char *filename, void *symval, char *alias) @@ -1313,7 +1228,6 @@ static int do_sdw_entry(const char *filename, void *symval, char *alias) add_wildcard(alias); return 1; } -ADD_TO_DEVTABLE("sdw", sdw_device_id, do_sdw_entry); /* Looks like: fsl-mc:vNdN */ static int do_fsl_mc_entry(const char *filename, void *symval, @@ -1325,7 +1239,6 @@ static int do_fsl_mc_entry(const char *filename, void *symval, sprintf(alias, "fsl-mc:v%08Xd%s", vendor, *obj_type); return 1; } -ADD_TO_DEVTABLE("fslmc", fsl_mc_device_id, do_fsl_mc_entry); /* Looks like: tbsvc:kSpNvNrN */ static int do_tbsvc_entry(const char *filename, void *symval, char *alias) @@ -1350,7 +1263,6 @@ static int do_tbsvc_entry(const char *filename, void *symval, char *alias) add_wildcard(alias); return 1; } -ADD_TO_DEVTABLE("tbsvc", tb_service_id, do_tbsvc_entry); /* Looks like: typec:idNmN */ static int do_typec_entry(const char *filename, void *symval, char *alias) @@ -1363,7 +1275,6 @@ static int do_typec_entry(const char *filename, void *symval, char *alias) return 1; } -ADD_TO_DEVTABLE("typec", typec_device_id, do_typec_entry); /* Does namelen bytes of name exactly match the symbol? */ static bool sym_is(const char *name, unsigned namelen, const char *symbol) @@ -1396,6 +1307,48 @@ static void do_table(void *symval, unsigned long size, } } +static const struct devtable devtable[] = { + {"hid", SIZE_hid_device_id, do_hid_entry}, + {"ieee1394", SIZE_ieee1394_device_id, do_ieee1394_entry}, + {"pci", SIZE_pci_device_id, do_pci_entry}, + {"ccw", SIZE_ccw_device_id, do_ccw_entry}, + {"ap", SIZE_ap_device_id, do_ap_entry}, + {"css", SIZE_css_device_id, do_css_entry}, + {"serio", SIZE_serio_device_id, do_serio_entry}, + {"acpi", SIZE_acpi_device_id, do_acpi_entry}, + {"pcmcia", SIZE_pcmcia_device_id, do_pcmcia_entry}, + {"vio", SIZE_vio_device_id, do_vio_entry}, + {"input", SIZE_input_device_id, do_input_entry}, + {"eisa", SIZE_eisa_device_id, do_eisa_entry}, + {"parisc", SIZE_parisc_device_id, do_parisc_entry}, + {"sdio", SIZE_sdio_device_id, do_sdio_entry}, + {"ssb", SIZE_ssb_device_id, do_ssb_entry}, + {"bcma", SIZE_bcma_device_id, do_bcma_entry}, + {"virtio", SIZE_virtio_device_id, do_virtio_entry}, + {"vmbus", SIZE_hv_vmbus_device_id, do_vmbus_entry}, + {"rpmsg", SIZE_rpmsg_device_id, do_rpmsg_entry}, + {"i2c", SIZE_i2c_device_id, do_i2c_entry}, + {"spi", SIZE_spi_device_id, do_spi_entry}, + {"dmi", SIZE_dmi_system_id, do_dmi_entry}, + {"platform", SIZE_platform_device_id, do_platform_entry}, + {"mdio", SIZE_mdio_device_id, do_mdio_entry}, + {"zorro", SIZE_zorro_device_id, do_zorro_entry}, + {"isapnp", SIZE_isapnp_device_id, do_isapnp_entry}, + {"ipack", SIZE_ipack_device_id, do_ipack_entry}, + {"amba", SIZE_amba_id, do_amba_entry}, + {"mipscdmm", SIZE_mips_cdmm_device_id, do_mips_cdmm_entry}, + {"x86cpu", SIZE_x86_cpu_id, do_x86cpu_entry}, + {"cpu", SIZE_cpu_feature, do_cpu_entry}, + {"mei", SIZE_mei_cl_device_id, do_mei_entry}, + {"rapidio", SIZE_rio_device_id, do_rio_entry}, + {"ulpi", SIZE_ulpi_device_id, do_ulpi_entry}, + {"hdaudio", SIZE_hda_device_id, do_hda_entry}, + {"sdw", SIZE_sdw_device_id, do_sdw_entry}, + {"fslmc", SIZE_fsl_mc_device_id, do_fsl_mc_entry}, + {"tbsvc", SIZE_tb_service_id, do_tbsvc_entry}, + {"typec", SIZE_typec_device_id, do_typec_entry}, +}; + /* Create MODULE_ALIAS() statements. * At this time, we cannot write the actual output C source yet, * so we write into the mod->dev_table_buf buffer. */ @@ -1450,13 +1403,14 @@ void handle_moddevtable(struct module *mod, struct elf_info *info, else if (sym_is(name, namelen, "pnp_card")) do_pnp_card_entries(symval, sym->st_size, mod); else { - struct devtable **p; - INIT_SECTION(__devtable); + int i; - for (p = __start___devtable; p < __stop___devtable; p++) { - if (sym_is(name, namelen, (*p)->device_id)) { - do_table(symval, sym->st_size, (*p)->id_size, - (*p)->device_id, (*p)->function, mod); + for (i = 0; i < ARRAY_SIZE(devtable); i++) { + const struct devtable *p = &devtable[i]; + + if (sym_is(name, namelen, p->device_id)) { + do_table(symval, sym->st_size, p->id_size, + p->device_id, p->function, mod); break; } } From 7de43cb71116ab13adaf1f57a72edb6757ca57d0 Mon Sep 17 00:00:00 2001 From: Masahiro Yamada Date: Thu, 22 Nov 2018 13:28:42 +0900 Subject: [PATCH 85/99] modpost: file2alias: check prototype of handler [ Upstream commit f880eea68fe593342fa6e09be9bb661f3c297aec ] Use specific prototype instead of an opaque pointer so that the compiler can catch function prototype mismatch. Signed-off-by: Masahiro Yamada Reviewed-by: Mathieu Malaterre Signed-off-by: Sasha Levin --- scripts/mod/file2alias.c | 7 +++---- 1 file changed, 3 insertions(+), 4 deletions(-) diff --git a/scripts/mod/file2alias.c b/scripts/mod/file2alias.c index 9f3ebde7a0cd..7f40b6aab689 100644 --- a/scripts/mod/file2alias.c +++ b/scripts/mod/file2alias.c @@ -47,7 +47,7 @@ typedef struct { struct devtable { const char *device_id; /* name of table, __mod___*_device_table. */ unsigned long id_size; - void *function; + int (*do_entry)(const char *filename, void *symval, char *alias); }; /* Define a variable f that holds the value of field f of struct devid @@ -1288,12 +1288,11 @@ static bool sym_is(const char *name, unsigned namelen, const char *symbol) static void do_table(void *symval, unsigned long size, unsigned long id_size, const char *device_id, - void *function, + int (*do_entry)(const char *filename, void *symval, char *alias), struct module *mod) { unsigned int i; char alias[500]; - int (*do_entry)(const char *, void *entry, char *alias) = function; device_id_check(mod->name, device_id, size, id_size, symval); /* Leave last one: it's the terminator. */ @@ -1410,7 +1409,7 @@ void handle_moddevtable(struct module *mod, struct elf_info *info, if (sym_is(name, namelen, p->device_id)) { do_table(symval, sym->st_size, p->id_size, - p->device_id, p->function, mod); + p->device_id, p->do_entry, mod); break; } } From 18af9b7b91380825ac398ff3b94fac4e0621c5cc Mon Sep 17 00:00:00 2001 From: Jarkko Sakkinen Date: Fri, 8 Feb 2019 18:30:59 +0200 Subject: [PATCH 86/99] tpm/tpm_i2c_atmel: Return -E2BIG when the transfer is incomplete [ Upstream commit 442601e87a4769a8daba4976ec3afa5222ca211d ] Return -E2BIG when the transfer is incomplete. The upper layer does not retry, so not doing that is incorrect behaviour. Cc: stable@vger.kernel.org Fixes: a2871c62e186 ("tpm: Add support for Atmel I2C TPMs") Signed-off-by: Jarkko Sakkinen Reviewed-by: Stefan Berger Reviewed-by: Jerry Snitselaar Signed-off-by: Sasha Levin --- drivers/char/tpm/tpm_i2c_atmel.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/drivers/char/tpm/tpm_i2c_atmel.c b/drivers/char/tpm/tpm_i2c_atmel.c index 32a8e27c5382..cc4e642d3180 100644 --- a/drivers/char/tpm/tpm_i2c_atmel.c +++ b/drivers/char/tpm/tpm_i2c_atmel.c @@ -69,6 +69,10 @@ static int i2c_atmel_send(struct tpm_chip *chip, u8 *buf, size_t len) if (status < 0) return status; + /* The upper layer does not support incomplete sends. */ + if (status != len) + return -E2BIG; + return 0; } From 1c36862e8be8e45094024e7384a7b441779f34cb Mon Sep 17 00:00:00 2001 From: Yue Haibing Date: Tue, 23 Apr 2019 16:05:18 +0300 Subject: [PATCH 87/99] tpm: Fix the type of the return value in calc_tpm2_event_size() commit b9d0a85d6b2e76630cfd4c475ee3af4109bfd87a upstream calc_tpm2_event_size() has an invalid signature because it returns a 'size_t' where as its signature says that it returns 'int'. Cc: Fixes: 4d23cc323cdb ("tpm: add securityfs support for TPM 2.0 firmware event log") Suggested-by: Jarkko Sakkinen Signed-off-by: Yue Haibing Reviewed-by: Jarkko Sakkinen Signed-off-by: Jarkko Sakkinen Signed-off-by: James Morris Signed-off-by: Sasha Levin --- drivers/char/tpm/eventlog/tpm2.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/char/tpm/eventlog/tpm2.c b/drivers/char/tpm/eventlog/tpm2.c index 1b8fa9de2cac..41b9f6c92da7 100644 --- a/drivers/char/tpm/eventlog/tpm2.c +++ b/drivers/char/tpm/eventlog/tpm2.c @@ -37,8 +37,8 @@ * * Returns size of the event. If it is an invalid event, returns 0. */ -static int calc_tpm2_event_size(struct tcg_pcr_event2 *event, - struct tcg_pcr_event *event_header) +static size_t calc_tpm2_event_size(struct tcg_pcr_event2 *event, + struct tcg_pcr_event *event_header) { struct tcg_efi_specid_event *efispecid; struct tcg_event_field *event_field; From c21bcc2352e954202798588c41cd76c43073d207 Mon Sep 17 00:00:00 2001 From: Matthias Kaehlcke Date: Tue, 23 Apr 2019 12:04:23 -0700 Subject: [PATCH 88/99] Revert "kbuild: use -Oz instead of -Os when using clang" commit a75bb4eb9e565b9f5115e2e8c07377ce32cbe69a upstream. The clang option -Oz enables *aggressive* optimization for size, which doesn't necessarily result in smaller images, but can have negative impact on performance. Switch back to the less aggressive -Os. This reverts commit 6748cb3c299de1ffbe56733647b01dbcc398c419. Suggested-by: Peter Zijlstra Signed-off-by: Matthias Kaehlcke Reviewed-by: Nick Desaulniers Signed-off-by: Masahiro Yamada Signed-off-by: Nathan Chancellor Signed-off-by: Sasha Levin --- Makefile | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/Makefile b/Makefile index 3fac08f6a11e..91e265ca0ca5 100644 --- a/Makefile +++ b/Makefile @@ -661,8 +661,7 @@ KBUILD_CFLAGS += $(call cc-disable-warning, format-overflow) KBUILD_CFLAGS += $(call cc-disable-warning, int-in-bool-context) ifdef CONFIG_CC_OPTIMIZE_FOR_SIZE -KBUILD_CFLAGS += $(call cc-option,-Oz,-Os) -KBUILD_CFLAGS += $(call cc-disable-warning,maybe-uninitialized,) +KBUILD_CFLAGS += -Os $(call cc-disable-warning,maybe-uninitialized,) else ifdef CONFIG_PROFILE_ALL_BRANCHES KBUILD_CFLAGS += -O2 $(call cc-disable-warning,maybe-uninitialized,) From c3edd427d5389ca46734c343662cdba1b3048f12 Mon Sep 17 00:00:00 2001 From: Phil Auld Date: Tue, 23 Apr 2019 19:51:06 -0400 Subject: [PATCH 89/99] sched/fair: Limit sched_cfs_period_timer() loop to avoid hard lockup [ Upstream commit 2e8e19226398db8265a8e675fcc0118b9e80c9e8 ] With extremely short cfs_period_us setting on a parent task group with a large number of children the for loop in sched_cfs_period_timer() can run until the watchdog fires. There is no guarantee that the call to hrtimer_forward_now() will ever return 0. The large number of children can make do_sched_cfs_period_timer() take longer than the period. NMI watchdog: Watchdog detected hard LOCKUP on cpu 24 RIP: 0010:tg_nop+0x0/0x10 walk_tg_tree_from+0x29/0xb0 unthrottle_cfs_rq+0xe0/0x1a0 distribute_cfs_runtime+0xd3/0xf0 sched_cfs_period_timer+0xcb/0x160 ? sched_cfs_slack_timer+0xd0/0xd0 __hrtimer_run_queues+0xfb/0x270 hrtimer_interrupt+0x122/0x270 smp_apic_timer_interrupt+0x6a/0x140 apic_timer_interrupt+0xf/0x20 To prevent this we add protection to the loop that detects when the loop has run too many times and scales the period and quota up, proportionally, so that the timer can complete before then next period expires. This preserves the relative runtime quota while preventing the hard lockup. A warning is issued reporting this state and the new values. Signed-off-by: Phil Auld Signed-off-by: Peter Zijlstra (Intel) Cc: Cc: Anton Blanchard Cc: Ben Segall Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Thomas Gleixner Link: https://lkml.kernel.org/r/20190319130005.25492-1-pauld@redhat.com Signed-off-by: Ingo Molnar Signed-off-by: Sasha Levin --- kernel/sched/fair.c | 25 +++++++++++++++++++++++++ 1 file changed, 25 insertions(+) diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 640094391169..4aa8e7d90c25 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -4847,12 +4847,15 @@ static enum hrtimer_restart sched_cfs_slack_timer(struct hrtimer *timer) return HRTIMER_NORESTART; } +extern const u64 max_cfs_quota_period; + static enum hrtimer_restart sched_cfs_period_timer(struct hrtimer *timer) { struct cfs_bandwidth *cfs_b = container_of(timer, struct cfs_bandwidth, period_timer); int overrun; int idle = 0; + int count = 0; raw_spin_lock(&cfs_b->lock); for (;;) { @@ -4860,6 +4863,28 @@ static enum hrtimer_restart sched_cfs_period_timer(struct hrtimer *timer) if (!overrun) break; + if (++count > 3) { + u64 new, old = ktime_to_ns(cfs_b->period); + + new = (old * 147) / 128; /* ~115% */ + new = min(new, max_cfs_quota_period); + + cfs_b->period = ns_to_ktime(new); + + /* since max is 1s, this is limited to 1e9^2, which fits in u64 */ + cfs_b->quota *= new; + cfs_b->quota = div64_u64(cfs_b->quota, old); + + pr_warn_ratelimited( + "cfs_period_timer[cpu%d]: period too short, scaling up (new cfs_period_us %lld, cfs_quota_us = %lld)\n", + smp_processor_id(), + div_u64(new, NSEC_PER_USEC), + div_u64(cfs_b->quota, NSEC_PER_USEC)); + + /* reset count so we don't come right back in here */ + count = 0; + } + idle = do_sched_cfs_period_timer(cfs_b, overrun); } if (idle) From 628c99a836dde71a4e415859359b7ebecdb3d363 Mon Sep 17 00:00:00 2001 From: Jann Horn Date: Tue, 19 Mar 2019 02:36:59 +0100 Subject: [PATCH 90/99] device_cgroup: fix RCU imbalance in error case commit 0fcc4c8c044e117ac126ab6df4138ea9a67fa2a9 upstream. When dev_exception_add() returns an error (due to a failed memory allocation), make sure that we move the RCU preemption count back to where it was before we were called. We dropped the RCU read lock inside the loop body, so we can't just "break". sparse complains about this, too: $ make -s C=2 security/device_cgroup.o ./include/linux/rcupdate.h:647:9: warning: context imbalance in 'propagate_exception' - unexpected unlock Fixes: d591fb56618f ("device_cgroup: simplify cgroup tree walk in propagate_exception()") Cc: stable@vger.kernel.org Signed-off-by: Jann Horn Acked-by: Michal Hocko Signed-off-by: Tejun Heo Signed-off-by: Greg Kroah-Hartman --- security/device_cgroup.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/security/device_cgroup.c b/security/device_cgroup.c index cd97929fac66..dc28914fa72e 100644 --- a/security/device_cgroup.c +++ b/security/device_cgroup.c @@ -560,7 +560,7 @@ static int propagate_exception(struct dev_cgroup *devcg_root, devcg->behavior == DEVCG_DEFAULT_ALLOW) { rc = dev_exception_add(devcg, ex); if (rc) - break; + return rc; } else { /* * in the other possible cases: From 1343fd8f9629ef1e040bdb84626ae69382272741 Mon Sep 17 00:00:00 2001 From: Konstantin Khlebnikov Date: Thu, 18 Apr 2019 17:50:20 -0700 Subject: [PATCH 91/99] mm/vmstat.c: fix /proc/vmstat format for CONFIG_DEBUG_TLBFLUSH=y CONFIG_SMP=n commit e8277b3b52240ec1caad8e6df278863e4bf42eac upstream. Commit 58bc4c34d249 ("mm/vmstat.c: skip NR_TLB_REMOTE_FLUSH* properly") depends on skipping vmstat entries with empty name introduced in 7aaf77272358 ("mm: don't show nr_indirectly_reclaimable in /proc/vmstat") but reverted in b29940c1abd7 ("mm: rename and change semantics of nr_indirectly_reclaimable_bytes"). So skipping no longer works and /proc/vmstat has misformatted lines " 0". This patch simply shows debug counters "nr_tlb_remote_*" for UP. Link: http://lkml.kernel.org/r/155481488468.467.4295519102880913454.stgit@buzz Fixes: 58bc4c34d249 ("mm/vmstat.c: skip NR_TLB_REMOTE_FLUSH* properly") Signed-off-by: Konstantin Khlebnikov Acked-by: Vlastimil Babka Cc: Roman Gushchin Cc: Jann Horn Cc: Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman --- mm/vmstat.c | 5 ----- 1 file changed, 5 deletions(-) diff --git a/mm/vmstat.c b/mm/vmstat.c index 2878dc4e9af6..4a387937f9f5 100644 --- a/mm/vmstat.c +++ b/mm/vmstat.c @@ -1272,13 +1272,8 @@ const char * const vmstat_text[] = { #endif #endif /* CONFIG_MEMORY_BALLOON */ #ifdef CONFIG_DEBUG_TLBFLUSH -#ifdef CONFIG_SMP "nr_tlb_remote_flush", "nr_tlb_remote_flush_received", -#else - "", /* nr_tlb_remote_flush */ - "", /* nr_tlb_remote_flush_received */ -#endif /* CONFIG_SMP */ "nr_tlb_local_flush_all", "nr_tlb_local_flush_one", #endif /* CONFIG_DEBUG_TLBFLUSH */ From 8a6f2ea0c3dd3de75cc344fe8d216457287a2ab2 Mon Sep 17 00:00:00 2001 From: Takashi Iwai Date: Tue, 16 Apr 2019 15:25:00 +0200 Subject: [PATCH 92/99] ALSA: info: Fix racy addition/deletion of nodes commit 8c2f870890fd28e023b0fcf49dcee333f2c8bad7 upstream. The ALSA proc helper manages the child nodes in a linked list, but its addition and deletion is done without any lock. This leads to a corruption if they are operated concurrently. Usually this isn't a problem because the proc entries are added sequentially in the driver probe procedure itself. But the card registrations are done often asynchronously, and the crash could be actually reproduced with syzkaller. This patch papers over it by protecting the link addition and deletion with the parent's mutex. There is "access" mutex that is used for the file access, and this can be reused for this purpose as well. Reported-by: syzbot+48df349490c36f9f54ab@syzkaller.appspotmail.com Cc: Signed-off-by: Takashi Iwai Signed-off-by: Greg Kroah-Hartman --- sound/core/info.c | 12 ++++++++++-- 1 file changed, 10 insertions(+), 2 deletions(-) diff --git a/sound/core/info.c b/sound/core/info.c index fe502bc5e6d2..679136fba730 100644 --- a/sound/core/info.c +++ b/sound/core/info.c @@ -722,8 +722,11 @@ snd_info_create_entry(const char *name, struct snd_info_entry *parent) INIT_LIST_HEAD(&entry->children); INIT_LIST_HEAD(&entry->list); entry->parent = parent; - if (parent) + if (parent) { + mutex_lock(&parent->access); list_add_tail(&entry->list, &parent->children); + mutex_unlock(&parent->access); + } return entry; } @@ -805,7 +808,12 @@ void snd_info_free_entry(struct snd_info_entry * entry) list_for_each_entry_safe(p, n, &entry->children, list) snd_info_free_entry(p); - list_del(&entry->list); + p = entry->parent; + if (p) { + mutex_lock(&p->access); + list_del(&entry->list); + mutex_unlock(&p->access); + } kfree(entry->name); if (entry->private_free) entry->private_free(entry); From 6580376fe81093d3a796b7e8b8c3c0f9e620f89c Mon Sep 17 00:00:00 2001 From: Matteo Croce Date: Mon, 18 Mar 2019 02:32:36 +0100 Subject: [PATCH 93/99] percpu: stop printing kernel addresses commit 00206a69ee32f03e6f40837684dcbe475ea02266 upstream. Since commit ad67b74d2469d9b8 ("printk: hash addresses printed with %p"), at boot "____ptrval____" is printed instead of actual addresses: percpu: Embedded 38 pages/cpu @(____ptrval____) s124376 r0 d31272 u524288 Instead of changing the print to "%px", and leaking kernel addresses, just remove the print completely, cfr. e.g. commit 071929dbdd865f77 ("arm64: Stop printing the virtual memory layout"). Signed-off-by: Matteo Croce Signed-off-by: Dennis Zhou Signed-off-by: Greg Kroah-Hartman --- mm/percpu.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/mm/percpu.c b/mm/percpu.c index 4b90682623e9..41e58f3d8fbf 100644 --- a/mm/percpu.c +++ b/mm/percpu.c @@ -2529,8 +2529,8 @@ int __init pcpu_embed_first_chunk(size_t reserved_size, size_t dyn_size, ai->groups[group].base_offset = areas[group] - base; } - pr_info("Embedded %zu pages/cpu @%p s%zu r%zu d%zu u%zu\n", - PFN_DOWN(size_sum), base, ai->static_size, ai->reserved_size, + pr_info("Embedded %zu pages/cpu s%zu r%zu d%zu u%zu\n", + PFN_DOWN(size_sum), ai->static_size, ai->reserved_size, ai->dyn_size, ai->unit_size); rc = pcpu_setup_first_chunk(ai, base); @@ -2651,8 +2651,8 @@ int __init pcpu_page_first_chunk(size_t reserved_size, } /* we're ready, commit */ - pr_info("%d %s pages/cpu @%p s%zu r%zu d%zu\n", - unit_pages, psize_str, vm.addr, ai->static_size, + pr_info("%d %s pages/cpu s%zu r%zu d%zu\n", + unit_pages, psize_str, ai->static_size, ai->reserved_size, ai->dyn_size); rc = pcpu_setup_first_chunk(ai, vm.addr); From a782f8475715d6dc61f892127d4fb8e5f0775ac4 Mon Sep 17 00:00:00 2001 From: Arnaldo Carvalho de Melo Date: Tue, 25 Sep 2018 10:55:59 -0300 Subject: [PATCH 94/99] tools include: Adopt linux/bits.h commit ba4aa02b417f08a0bee5e7b8ed70cac788a7c854 upstream. So that we reduce the difference of tools/include/linux/bitops.h to the original kernel file, include/linux/bitops.h, trying to remove the need to define BITS_PER_LONG, to avoid clashes with asm/bitsperlong.h. And the things removed from tools/include/linux/bitops.h are really in linux/bits.h, so that we can have a copy and then tools/perf/check_headers.sh will tell us when new stuff gets added to linux/bits.h so that we can check if it is useful and if any adjustment needs to be done to the tools/{include,arch}/ copies. Cc: Adrian Hunter Cc: Alexander Sverdlin Cc: David Ahern Cc: Jiri Olsa Cc: Namhyung Kim Cc: Wang Nan Link: https://lkml.kernel.org/n/tip-y1sqyydvfzo0bjjoj4zsl562@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo Signed-off-by: Greg Kroah-Hartman --- tools/include/linux/bitops.h | 7 ++----- tools/include/linux/bits.h | 26 ++++++++++++++++++++++++++ tools/perf/check-headers.sh | 1 + 3 files changed, 29 insertions(+), 5 deletions(-) create mode 100644 tools/include/linux/bits.h diff --git a/tools/include/linux/bitops.h b/tools/include/linux/bitops.h index acc704bd3998..0b0ef3abc966 100644 --- a/tools/include/linux/bitops.h +++ b/tools/include/linux/bitops.h @@ -3,8 +3,6 @@ #define _TOOLS_LINUX_BITOPS_H_ #include -#include - #ifndef __WORDSIZE #define __WORDSIZE (__SIZEOF_LONG__ * 8) #endif @@ -12,10 +10,9 @@ #ifndef BITS_PER_LONG # define BITS_PER_LONG __WORDSIZE #endif +#include +#include -#define BIT_MASK(nr) (1UL << ((nr) % BITS_PER_LONG)) -#define BIT_WORD(nr) ((nr) / BITS_PER_LONG) -#define BITS_PER_BYTE 8 #define BITS_TO_LONGS(nr) DIV_ROUND_UP(nr, BITS_PER_BYTE * sizeof(long)) #define BITS_TO_U64(nr) DIV_ROUND_UP(nr, BITS_PER_BYTE * sizeof(u64)) #define BITS_TO_U32(nr) DIV_ROUND_UP(nr, BITS_PER_BYTE * sizeof(u32)) diff --git a/tools/include/linux/bits.h b/tools/include/linux/bits.h new file mode 100644 index 000000000000..2b7b532c1d51 --- /dev/null +++ b/tools/include/linux/bits.h @@ -0,0 +1,26 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef __LINUX_BITS_H +#define __LINUX_BITS_H +#include + +#define BIT(nr) (1UL << (nr)) +#define BIT_ULL(nr) (1ULL << (nr)) +#define BIT_MASK(nr) (1UL << ((nr) % BITS_PER_LONG)) +#define BIT_WORD(nr) ((nr) / BITS_PER_LONG) +#define BIT_ULL_MASK(nr) (1ULL << ((nr) % BITS_PER_LONG_LONG)) +#define BIT_ULL_WORD(nr) ((nr) / BITS_PER_LONG_LONG) +#define BITS_PER_BYTE 8 + +/* + * Create a contiguous bitmask starting at bit position @l and ending at + * position @h. For example + * GENMASK_ULL(39, 21) gives us the 64bit vector 0x000000ffffe00000. + */ +#define GENMASK(h, l) \ + (((~0UL) - (1UL << (l)) + 1) & (~0UL >> (BITS_PER_LONG - 1 - (h)))) + +#define GENMASK_ULL(h, l) \ + (((~0ULL) - (1ULL << (l)) + 1) & \ + (~0ULL >> (BITS_PER_LONG_LONG - 1 - (h)))) + +#endif /* __LINUX_BITS_H */ diff --git a/tools/perf/check-headers.sh b/tools/perf/check-headers.sh index 466540ee8ea7..c72cc73a6b09 100755 --- a/tools/perf/check-headers.sh +++ b/tools/perf/check-headers.sh @@ -14,6 +14,7 @@ include/uapi/linux/sched.h include/uapi/linux/stat.h include/uapi/linux/vhost.h include/uapi/sound/asound.h +include/linux/bits.h include/linux/hash.h include/uapi/linux/hw_breakpoint.h arch/x86/include/asm/disabled-features.h From 52dde1160f17c42bca85f1365227a69b2c6aa9e8 Mon Sep 17 00:00:00 2001 From: Katsuhiro Suzuki Date: Tue, 11 Sep 2018 01:39:32 +0900 Subject: [PATCH 95/99] ASoC: rockchip: add missing INTERLEAVED PCM attribute commit 24d6638302b48328a58c13439276d4531af4ca7d upstream. This patch adds SNDRV_PCM_INFO_INTERLEAVED into PCM hardware info. Signed-off-by: Katsuhiro Suzuki Signed-off-by: Mark Brown Signed-off-by: Greg Kroah-Hartman --- sound/soc/rockchip/rockchip_pcm.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/sound/soc/rockchip/rockchip_pcm.c b/sound/soc/rockchip/rockchip_pcm.c index 7029e0b85f9e..4ac78d7a4b2d 100644 --- a/sound/soc/rockchip/rockchip_pcm.c +++ b/sound/soc/rockchip/rockchip_pcm.c @@ -21,7 +21,8 @@ static const struct snd_pcm_hardware snd_rockchip_hardware = { .info = SNDRV_PCM_INFO_MMAP | SNDRV_PCM_INFO_MMAP_VALID | SNDRV_PCM_INFO_PAUSE | - SNDRV_PCM_INFO_RESUME, + SNDRV_PCM_INFO_RESUME | + SNDRV_PCM_INFO_INTERLEAVED, .period_bytes_min = 32, .period_bytes_max = 8192, .periods_min = 1, From 9c1862566176250bae343a5cee617a5ce41efa54 Mon Sep 17 00:00:00 2001 From: Linus Torvalds Date: Sat, 27 Oct 2018 09:10:48 -0700 Subject: [PATCH 96/99] i2c-hid: properly terminate i2c_hid_dmi_desc_override_table[] array MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit commit b59dfdaef173677b0b7e10f375226c0a1114fd20 upstream. Commit 9ee3e06610fd ("HID: i2c-hid: override HID descriptors for certain devices") added a new dmi_system_id quirk table to override certain HID report descriptors for some systems that lack them. But the table wasn't properly terminated, causing the dmi matching to walk off into la-la-land, and starting to treat random data as dmi descriptor pointers, causing boot-time oopses if you were at all unlucky. Terminate the array. We really should have some way to just statically check that arrays that should be terminated by an empty entry actually are so. But the HID people really should have caught this themselves, rather than have me deal with an oops during the merge window. Tssk, tssk. Cc: Julian Sax Cc: Benjamin Tissoires Cc: Jiri Kosina Signed-off-by: Linus Torvalds Cc: Ambrož Bizjak Signed-off-by: Greg Kroah-Hartman --- drivers/hid/i2c-hid/i2c-hid-dmi-quirks.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/hid/i2c-hid/i2c-hid-dmi-quirks.c b/drivers/hid/i2c-hid/i2c-hid-dmi-quirks.c index 1d645c9ab417..cac262a912c1 100644 --- a/drivers/hid/i2c-hid/i2c-hid-dmi-quirks.c +++ b/drivers/hid/i2c-hid/i2c-hid-dmi-quirks.c @@ -337,7 +337,8 @@ static const struct dmi_system_id i2c_hid_dmi_desc_override_table[] = { DMI_EXACT_MATCH(DMI_PRODUCT_NAME, "FlexBook edge11 - M-FBE11"), }, .driver_data = (void *)&sipodev_desc - } + }, + { } /* Terminate list */ }; From ac54bc121e1f491b70d71ac234b8e0f76f79d7ad Mon Sep 17 00:00:00 2001 From: Greg Kroah-Hartman Date: Thu, 25 Apr 2019 10:00:43 +0200 Subject: [PATCH 97/99] Revert "locking/lockdep: Add debug_locks check in __lock_downgrade()" This reverts commit 0e0f7b30721233ce610bde2395a8e58ff7771475 which was commit 71492580571467fb7177aade19c18ce7486267f5 upstream. Tetsuo rightly points out that the backport here is incorrect, as it touches the __lock_set_class function instead of the intended __lock_downgrade function. Reported-by: Tetsuo Handa Cc: Waiman Long Cc: Peter Zijlstra (Intel) Cc: Andrew Morton Cc: Linus Torvalds Cc: Paul E. McKenney Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: Will Deacon Cc: Ingo Molnar Cc: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- kernel/locking/lockdep.c | 3 --- 1 file changed, 3 deletions(-) diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c index 0cbdbbb0729f..26b57e24476f 100644 --- a/kernel/locking/lockdep.c +++ b/kernel/locking/lockdep.c @@ -3567,9 +3567,6 @@ __lock_set_class(struct lockdep_map *lock, const char *name, unsigned int depth; int i; - if (unlikely(!debug_locks)) - return 0; - depth = curr->lockdep_depth; /* * This function is about (re)setting the class of a held lock, From cdd369fe0f98c29d01b310f7ee1c89e43eecbd8c Mon Sep 17 00:00:00 2001 From: Will Deacon Date: Fri, 5 Apr 2019 18:39:38 -0700 Subject: [PATCH 98/99] kernel/sysctl.c: fix out-of-bounds access when setting file-max commit 9002b21465fa4d829edfc94a5a441005cffaa972 upstream. Commit 32a5ad9c2285 ("sysctl: handle overflow for file-max") hooked up min/max values for the file-max sysctl parameter via the .extra1 and .extra2 fields in the corresponding struct ctl_table entry. Unfortunately, the minimum value points at the global 'zero' variable, which is an int. This results in a KASAN splat when accessed as a long by proc_doulongvec_minmax on 64-bit architectures: | BUG: KASAN: global-out-of-bounds in __do_proc_doulongvec_minmax+0x5d8/0x6a0 | Read of size 8 at addr ffff2000133d1c20 by task systemd/1 | | CPU: 0 PID: 1 Comm: systemd Not tainted 5.1.0-rc3-00012-g40b114779944 #2 | Hardware name: linux,dummy-virt (DT) | Call trace: | dump_backtrace+0x0/0x228 | show_stack+0x14/0x20 | dump_stack+0xe8/0x124 | print_address_description+0x60/0x258 | kasan_report+0x140/0x1a0 | __asan_report_load8_noabort+0x18/0x20 | __do_proc_doulongvec_minmax+0x5d8/0x6a0 | proc_doulongvec_minmax+0x4c/0x78 | proc_sys_call_handler.isra.19+0x144/0x1d8 | proc_sys_write+0x34/0x58 | __vfs_write+0x54/0xe8 | vfs_write+0x124/0x3c0 | ksys_write+0xbc/0x168 | __arm64_sys_write+0x68/0x98 | el0_svc_common+0x100/0x258 | el0_svc_handler+0x48/0xc0 | el0_svc+0x8/0xc | | The buggy address belongs to the variable: | zero+0x0/0x40 | | Memory state around the buggy address: | ffff2000133d1b00: 00 00 00 00 00 00 00 00 fa fa fa fa 04 fa fa fa | ffff2000133d1b80: fa fa fa fa 04 fa fa fa fa fa fa fa 04 fa fa fa | >ffff2000133d1c00: fa fa fa fa 04 fa fa fa fa fa fa fa 00 00 00 00 | ^ | ffff2000133d1c80: fa fa fa fa 00 fa fa fa fa fa fa fa 00 00 00 00 | ffff2000133d1d00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 Fix the splat by introducing a unsigned long 'zero_ul' and using that instead. Link: http://lkml.kernel.org/r/20190403153409.17307-1-will.deacon@arm.com Fixes: 32a5ad9c2285 ("sysctl: handle overflow for file-max") Signed-off-by: Will Deacon Acked-by: Christian Brauner Cc: Kees Cook Cc: Alexey Dobriyan Cc: Matteo Croce Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman --- kernel/sysctl.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/kernel/sysctl.c b/kernel/sysctl.c index 9e22660153ff..9a85c7ae7362 100644 --- a/kernel/sysctl.c +++ b/kernel/sysctl.c @@ -125,6 +125,7 @@ static int zero; static int __maybe_unused one = 1; static int __maybe_unused two = 2; static int __maybe_unused four = 4; +static unsigned long zero_ul; static unsigned long one_ul = 1; static unsigned long long_max = LONG_MAX; static int one_hundred = 100; @@ -1696,7 +1697,7 @@ static struct ctl_table fs_table[] = { .maxlen = sizeof(files_stat.max_files), .mode = 0644, .proc_handler = proc_doulongvec_minmax, - .extra1 = &zero, + .extra1 = &zero_ul, .extra2 = &long_max, }, { From 19bb613acb9ad8e57593cad5118acaee117cc303 Mon Sep 17 00:00:00 2001 From: Greg Kroah-Hartman Date: Sat, 27 Apr 2019 09:36:41 +0200 Subject: [PATCH 99/99] Linux 4.19.37 --- Makefile | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Makefile b/Makefile index 91e265ca0ca5..7b495cad8c2e 100644 --- a/Makefile +++ b/Makefile @@ -1,7 +1,7 @@ # SPDX-License-Identifier: GPL-2.0 VERSION = 4 PATCHLEVEL = 19 -SUBLEVEL = 36 +SUBLEVEL = 37 EXTRAVERSION = NAME = "People's Front"