asgardius/android_kernel_motorola_sm6225

Author	SHA1	Message	Date
Daniel Borkmann	4f3446bb80	bpf: add generic constant blinding for use in jits This work adds a generic facility for use from eBPF JIT compilers that allows for further hardening of JIT generated images through blinding constants. In response to the original work on BPF JIT spraying published by Keegan McAllister [1], most BPF JITs were changed to make images read-only and start at a randomized offset in the page, where the rest was filled with trap instructions. We have this nowadays in x86, arm, arm64 and s390 JIT compilers. Additionally, later work also made eBPF interpreter images read only for kernels supporting DEBUG_SET_MODULE_RONX, that is, x86, arm, arm64 and s390 archs as well currently. This is done by default for mentioned JITs when JITing is enabled. Furthermore, we had a generic and configurable constant blinding facility on our todo for quite some time now to further make spraying harder, and first implementation since around netconf 2016. We found that for systems where untrusted users can load cBPF/eBPF code where JIT is enabled, start offset randomization helps a bit to make jumps into crafted payload harder, but in case where larger programs that cross page boundary are injected, we again have some part of the program opcodes at a page start offset. With improved guessing and more reliable payload injection, chances can increase to jump into such payload. Elena Reshetova recently wrote a test case for it [2, 3]. Moreover, eBPF comes with 64 bit constants, which can leave some more room for payloads. Note that for all this, additional bugs in the kernel are still required to make the jump (and of course to guess right, to not jump into a trap) and naturally the JIT must be enabled, which is disabled by default. For helping mitigation, the general idea is to provide an option bpf_jit_harden that admins can tweak along with bpf_jit_enable, so that for cases where JIT should be enabled for performance reasons, the generated image can be further hardened with blinding constants for unpriviledged users (bpf_jit_harden == 1), with trading off performance for these, but not for privileged ones. We also added the option of blinding for all users (bpf_jit_harden == 2), which is quite helpful for testing f.e. with test_bpf.ko. There are no further e.g. hardening levels of bpf_jit_harden switch intended, rationale is to have it dead simple to use as on/off. Since this functionality would need to be duplicated over and over for JIT compilers to use, which are already complex enough, we provide a generic eBPF byte-code level based blinding implementation, which is then just transparently JITed. JIT compilers need to make only a few changes to integrate this facility and can be migrated one by one. This option is for eBPF JITs and will be used in x86, arm64, s390 without too much effort, and soon ppc64 JITs, thus that native eBPF can be blinded as well as cBPF to eBPF migrations, so that both can be covered with a single implementation. The rule for JITs is that bpf_jit_blind_constants() must be called from bpf_int_jit_compile(), and in case blinding is disabled, we follow normally with JITing the passed program. In case blinding is enabled and we fail during the process of blinding itself, we must return with the interpreter. Similarly, in case the JITing process after the blinding failed, we return normally to the interpreter with the non-blinded code. Meaning, interpreter doesn't change in any way and operates on eBPF code as usual. For doing this pre-JIT blinding step, we need to make use of a helper/auxiliary register, here BPF_REG_AX. This is strictly internal to the JIT and not in any way part of the eBPF architecture. Just like in the same way as JITs internally make use of some helper registers when emitting code, only that here the helper register is one abstraction level higher in eBPF bytecode, but nevertheless in JIT phase. That helper register is needed since f.e. manually written program can issue loads to all registers of eBPF architecture. The core concept with the additional register is: blind out all 32 and 64 bit constants by converting BPF_K based instructions into a small sequence from K_VAL into ((RND ^ K_VAL) ^ RND). Therefore, this is transformed into: BPF_REG_AX := (RND ^ K_VAL), BPF_REG_AX ^= RND, and REG <OP> BPF_REG_AX, so actual operation on the target register is translated from BPF_K into BPF_X one that is operating on BPF_REG_AX's content. During rewriting phase when blinding, RND is newly generated via prandom_u32() for each processed instruction. 64 bit loads are split into two 32 bit loads to make translation and patching not too complex. Only basic thing required by JITs is to call the helper bpf_jit_blind_constants()/bpf_jit_prog_release_other() pair, and to map BPF_REG_AX into an unused register. Small bpf_jit_disasm extract from [2] when applied to x86 JIT: echo 0 > /proc/sys/net/core/bpf_jit_harden ffffffffa034f5e9 + <x>: [...] 39: mov $0xa8909090,%eax 3e: mov $0xa8909090,%eax 43: mov $0xa8ff3148,%eax 48: mov $0xa89081b4,%eax 4d: mov $0xa8900bb0,%eax 52: mov $0xa810e0c1,%eax 57: mov $0xa8908eb4,%eax 5c: mov $0xa89020b0,%eax [...] echo 1 > /proc/sys/net/core/bpf_jit_harden ffffffffa034f1e5 + <x>: [...] 39: mov $0xe1192563,%r10d 3f: xor $0x4989b5f3,%r10d 46: mov %r10d,%eax 49: mov $0xb8296d93,%r10d 4f: xor $0x10b9fd03,%r10d 56: mov %r10d,%eax 59: mov $0x8c381146,%r10d 5f: xor $0x24c7200e,%r10d 66: mov %r10d,%eax 69: mov $0xeb2a830e,%r10d 6f: xor $0x43ba02ba,%r10d 76: mov %r10d,%eax 79: mov $0xd9730af,%r10d 7f: xor $0xa5073b1f,%r10d 86: mov %r10d,%eax 89: mov $0x9a45662b,%r10d 8f: xor $0x325586ea,%r10d 96: mov %r10d,%eax [...] As can be seen, original constants that carry payload are hidden when enabled, actual operations are transformed from constant-based to register-based ones, making jumps into constants ineffective. Above extract/example uses single BPF load instruction over and over, but of course all instructions with constants are blinded. Performance wise, JIT with blinding performs a bit slower than just JIT and faster than interpreter case. This is expected, since we still get all the performance benefits from JITing and in normal use-cases not every single instruction needs to be blinded. Summing up all 296 test cases averaged over multiple runs from test_bpf.ko suite, interpreter was 55% slower than JIT only and JIT with blinding was 8% slower than JIT only. Since there are also some extremes in the test suite, I expect for ordinary workloads that the performance for the JIT with blinding case is even closer to JIT only case, f.e. nmap test case from suite has averaged timings in ns 29 (JIT), 35 (+ blinding), and 151 (interpreter). BPF test suite, seccomp test suite, eBPF sample code and various bigger networking eBPF programs have been tested with this and were running fine. For testing purposes, I also adapted interpreter and redirected blinded eBPF image to interpreter and also here all tests pass. [1] http://mainisusuallyafunction.blogspot.com/2012/11/attacking-hardened-linux-systems-with.html [2] https://github.com/01org/jit-spray-poc-for-ksp/ [3] http://www.openwall.com/lists/kernel-hardening/2016/05/03/5 Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Elena Reshetova <elena.reshetova@intel.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-05-16 13:49:32 -04:00
Daniel Borkmann	d1c55ab5e4	bpf: prepare bpf_int_jit_compile/bpf_prog_select_runtime apis Since the blinding is strictly only called from inside eBPF JITs, we need to change signatures for bpf_int_jit_compile() and bpf_prog_select_runtime() first in order to prepare that the eBPF program we're dealing with can change underneath. Hence, for call sites, we need to return the latest prog. No functional change in this patch. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-05-16 13:49:32 -04:00
Daniel Borkmann	c237ee5eb3	bpf: add bpf_patch_insn_single helper Move the functionality to patch instructions out of the verifier code and into the core as the new bpf_patch_insn_single() helper will be needed later on for blinding as well. No changes in functionality. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-05-16 13:49:32 -04:00
Daniel Borkmann	93a73d442d	bpf, x86/arm64: remove useless checks on prog There is never such a situation, where bpf_int_jit_compile() is called with either prog as NULL or len as 0, so the tests are unnecessary and confusing as people would just copy them. s390 doesn't have them, so no change is needed there. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-05-16 13:49:32 -04:00
Daniel Borkmann	6077776b59	bpf: split HAVE_BPF_JIT into cBPF and eBPF variant Split the HAVE_BPF_JIT into two for distinguishing cBPF and eBPF JITs. Current cBPF ones: # git grep -n HAVE_CBPF_JIT arch/ arch/arm/Kconfig:44: select HAVE_CBPF_JIT arch/mips/Kconfig:18: select HAVE_CBPF_JIT if !CPU_MICROMIPS arch/powerpc/Kconfig:129: select HAVE_CBPF_JIT arch/sparc/Kconfig:35: select HAVE_CBPF_JIT Current eBPF ones: # git grep -n HAVE_EBPF_JIT arch/ arch/arm64/Kconfig:61: select HAVE_EBPF_JIT arch/s390/Kconfig:126: select HAVE_EBPF_JIT if PACK_STACK && HAVE_MARCH_Z196_FEATURES arch/x86/Kconfig:94: select HAVE_EBPF_JIT if X86_64 Later code also needs this facility to check for eBPF JITs. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-05-16 13:49:31 -04:00
Daniel Borkmann	c94987e40e	bpf: move bpf_jit_enable declaration Move the bpf_jit_enable declaration to the filter.h file where most other core code is declared, also since we're going to add a second knob there. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-05-16 13:49:31 -04:00
Daniel Borkmann	4936e3528e	bpf: minor cleanups in ebpf code Besides others, remove redundant comments where the code is self documenting enough, and properly indent various bpf_verifier_ops and bpf_prog_type_list declarations. Moreover, remove two exports that actually have no module user. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-05-16 13:49:31 -04:00
Vivien Didelot	553eb54444	net: dsa: mv88e6xxx: remove bridge work Now that the bridge code defers the switchdev port state setting, there is no need to defer the port STP state change within the mv88e6xxx code. Thus get rid of the driver's bridge work code. This also fixes a race condition where the DSA layer assumes that the bridge code already set the unbridged port's STP state to Disabled before restoring the Forwarding state. As a consequence, this also fixes the FDB flush for the unbridged port which now correctly occurs during the Forwarding to Disabled transition. Fixes: `0bc05d585d` ("switchdev: allow caller to explicitly request attr_set as deferred") Reported-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-05-16 13:46:24 -04:00
David Ahern	b0e95ccdd7	net: vrf: protect changes to private data with rcu One cpu can be processing packets which includes using the cached route entries in the vrf device's private data and on another cpu the device gets deleted which releases the routes and sets the pointers in net_vrf to NULL. This results in datapath dereferencing a NULL pointer. Fix by protecting access to dst's with rcu. Fixes: `193125dbd8` ("net: Introduce VRF device driver") Fixes: `35402e3136` ("net: Add IPv6 support to VRF device") Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-05-16 13:46:24 -04:00
Eric Dumazet	ea1627c20c	tcp: minor optimizations around tcp_hdr() usage tcp_hdr() is slightly more expensive than using skb->data in contexts where we know they point to the same byte. In receive path, tcp_v4_rcv() and tcp_v6_rcv() are in this situation, as tcp header has not been pulled yet. In output path, the same can be said when we just pushed the tcp header in the skb, in tcp_transmit_skb() and tcp_make_synack() Also factorize the two checks for tcb->tcp_flags & TCPHDR_SYN in tcp_transmit_skb() and pass tcp header pointer to tcp_ecn_send(), so that compiler can further optimize and avoid a reload. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-05-16 13:46:23 -04:00
Nicolas Dichtel	5022524308	netlink: kill nla_put_u64() This function is not used anymore. nla_put_u64_64bit() should be used instead. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-05-16 13:46:23 -04:00
Eric Dumazet	2632616bc4	sock: propagate __sock_cmsg_send() error __sock_cmsg_send() might return different error codes, not only -EINVAL. Fixes: `24025c465f` ("ipv4: process socket-level control messages in IPv4") Fixes: `ad1e46a837` ("ipv6: process socket-level control messages in IPv6") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Soheil Hassas Yeganeh <soheil@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-05-16 13:46:23 -04:00
Arnd Bergmann	a986a05de9	net: qrtr: fix build problems Having multiple loadable modules with the same name cannot work with modprobe, and having both net/qrtr/smd.ko and drivers/soc/qcom/smd.ko results in a (somewhat cryptic) build error: ERROR: "qcom_smd_driver_unregister" [net/qrtr/smd.ko] undefined! ERROR: "qcom_smd_driver_register" [net/qrtr/smd.ko] undefined! ERROR: "qcom_smd_set_drvdata" [net/qrtr/smd.ko] undefined! ERROR: "qcom_smd_send" [net/qrtr/smd.ko] undefined! ERROR: "qcom_smd_get_drvdata" [net/qrtr/smd.ko] undefined! ERROR: "qcom_smd_driver_unregister" [drivers/soc/qcom/wcnss_ctrl.ko] undefined! ERROR: "qcom_smd_driver_register" [drivers/soc/qcom/wcnss_ctrl.ko] undefined! ERROR: "qcom_smd_set_drvdata" [drivers/soc/qcom/wcnss_ctrl.ko] undefined! ERROR: "qcom_smd_send" [drivers/soc/qcom/wcnss_ctrl.ko] undefined! ERROR: "qcom_smd_get_drvdata" [drivers/soc/qcom/wcnss_ctrl.ko] undefined! Also, the qrtr driver uses the SMD interface and has a Kconfig dependency, but also allows for compile-testing when SMD is disabled. However, if with QCOM_SMD=m and COMPILE_TEST=y we can end up with QRTR_SMD=y and that fails with a related link error. The changes the dependency so we can still compile-test the driver but not have it built-in if SMD is a module, to avoid running in the broken configuration, and changes the Makefile to provide the driver under a different module name. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Fixes: `bdabad3e36` ("net: Add Qualcomm IPC router") Signed-off-by: David S. Miller <davem@davemloft.net>	2016-05-16 13:44:58 -04:00
David S. Miller	148bd3a34b	Merge branch 'tc_flower_offload' Amir Vadai says: ==================== sched,mlx5: Offloaded TC flower filter statistics This patchset introduces counters support for offloaded cls_flower filters. When the user calls 'tc show -s ..', fl_dump is called. Before fl_dump() returns the statistics, it calls the NIC driver (using a new ndo_setup_tc() command - TC_CLSFLOWER_STATS) to read the hardware counters and update the statistics accordingly. A new TC action op was added (stats_update()) to be used by the NIC driver to update the statistics. Patchset was applied and tested over commit `ed7cbbc` ("udp: Resolve NULL pointer dereference over flow-based vxlan device") ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2016-05-16 13:43:52 -04:00
Amir Vadai	aad7e08d39	net/mlx5e: Hardware offloaded flower filter statistics support Introduce support in updating statistics of offloaded TC flower classifiers. Currently only the DROP action is supported. Signed-off-by: Amir Vadai <amirva@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-05-16 13:43:51 -04:00
Amir Vadai	43a335e055	net/mlx5_core: Flow counters infrastructure If a counter has the aging flag set when created, it is added to a list of counters that will be queried periodically from a workqueue. query result and last use timestamp are cached. add/del counter must be very efficient since thousands of such operations might be issued in a second. There is only a single reference to counters without aging, therefore no need for locks. But, counters with aging enabled are stored in a list. In order to make code as lockless as possible, all the list manipulation and access to hardware is done from a single context - the periodic counters query thread. The hardware supports multiple counters per FTE, however currently we are using one counter for each FTE. Signed-off-by: Amir Vadai <amirva@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-05-16 13:43:51 -04:00
Amir Vadai	bd5251dbf1	net/mlx5_core: Introduce flow steering destination of type counter When adding a flow steering rule with a counter, need to supply a destination of type MLX5_FLOW_DESTINATION_TYPE_COUNTER, with a pointer to a struct mlx5_fc. Also, MLX5_FLOW_CONTEXT_ACTION_COUNT bit should be set in the action. Signed-off-by: Amir Vadai <amirva@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-05-16 13:43:51 -04:00
Amir Vadai	9dc0b289c4	net/mlx5_core: Firmware commands to support flow counters Getting packet/byte statistics on flows is done through flow counters. Implement the firmware commands to alloc, free and query flow counters. Signed-off-by: Amir Vadai <amirva@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-05-16 13:43:51 -04:00
Amir Vadai	42ca502e17	net/mlx5_core: Use a macro in mlx5_command_str() Use a macro instead of copying the OP name. Signed-off-by: Amir Vadai <amirva@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-05-16 13:43:51 -04:00
Amir Vadai	10cbc68434	net/sched: cls_flower: Hardware offloaded filters statistics support Introduce a new command in ndo_setup_tc() for hardware offloaded filters, to call the NIC driver, and make it update the statistics. This will be done before dumping the filter and its statistics. Signed-off-by: Amir Vadai <amirva@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-05-16 13:43:50 -04:00
Amir Vadai	9fea47d93b	net/sched: act_gact: Update statistics when offloaded to hardware Implement the stats_update callback that will be called by NIC drivers for hardware offloaded filters. Signed-off-by: Amir Vadai <amirva@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-05-16 13:43:50 -04:00
Amir Vadai	3804070235	net/sched: Enable netdev drivers to update statistics of offloaded actions Introduce stats_update callback. netdev driver could call it for offloaded actions to update the basic statistics (packets, bytes and last use). Since bstats_update() and bstats_cpu_update() use skb as an argument to get the counters, _bstats_update() and _bstats_cpu_update(), that get bytes and packets as arguments, were added. Signed-off-by: Amir Vadai <amirva@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-05-16 13:43:50 -04:00
David S. Miller	388665a9be	Merge branch 'pxa168_eth-perf' Jisheng Zhang says: ==================== net: pxa168_eth: improve performance This series is to improve the pxa168_eth driver performance by using {readl\|writel}_relaxed or appropriate memory barriers. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2016-05-16 13:39:50 -04:00
Jisheng Zhang	b17d15592d	net: pxa168_eth: Use dma_wmb/rmb where appropriate Update the pxa168_eth driver to use the dma_rmb/wmb calls instead of the full barriers in order to improve performance: reduced 97ns/39ns on average in tx/rx path on Marvell BG4CT platform. Signed-off-by: Jisheng Zhang <jszhang@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-05-16 13:39:49 -04:00
Jisheng Zhang	3ed687823c	net: pxa168_eth: use {readl\|writel}_relaxed instead of readl/writel Since appropriate memory barriers are already there, use the relaxed version to improve performance a bit. Signed-off-by: Jisheng Zhang <jszhang@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-05-16 13:39:49 -04:00
Jiri Benc	8be0cfa4d3	vxlan: set mac_header correctly in GPE mode For VXLAN-GPE, the interface is ARPHRD_NONE, thus we need to reset mac_header after pulling the outer header. v2: Put the code to the existing conditional block as suggested by Shmulik Ladkani. Fixes: `e1e5314de0` ("vxlan: implement GPE") Signed-off-by: Jiri Benc <jbenc@redhat.com> Reviewed-by: Shmulik Ladkani <shmulik.ladkani@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-05-16 13:37:10 -04:00
David S. Miller	41ae56ce35	Merge branch 'xen-netback-control-ring' Paul Durrant says: ==================== xen-netback: support for control ring My recent patch to import an up-to-date include/xen/interface/io/netif.h from the Xen Project brought in the necessary definitions to support the new control shared ring and protocol. This patch series updates xen-netback to support the new ring. Patch #1 adds the necessary boilerplate to map the control ring and handle messages. No implementation of the new protocol is included in this patch so that it can be kept to a reasonable size. Patch #2 adds the protocol implementation. Patch #3 adds support for passing has values calculated by xen-netback to capable frontends. Patch #4 adds support for accepting hash values calculated by capable frontends and using them the set the socket buffer hash. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2016-05-16 13:35:57 -04:00
Paul Durrant	c2d09fde72	xen-netback: use hash value from the frontend My recent patch to include/xen/interface/io/netif.h defines a new extra info type that can be used to pass hash values between backend and guest frontend. This patch adds code to xen-netback to use the value in a hash extra info fragment passed from the guest frontend in a transmit-side (i.e. netback receive side) packet to set the skb hash accordingly. Signed-off-by: Paul Durrant <paul.durrant@citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-05-16 13:35:56 -04:00
Paul Durrant	f07f989338	xen-netback: pass hash value to the frontend My recent patch to include/xen/interface/io/netif.h defines a new extra info type that can be used to pass hash values between backend and guest frontend. This patch adds code to xen-netback to pass hash values calculated for guest receive-side packets (i.e. netback transmit side) to the frontend. Signed-off-by: Paul Durrant <paul.durrant@citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-05-16 13:35:56 -04:00
Paul Durrant	40d8abdee8	xen-netback: add control protocol implementation My recent patch to include/xen/interface/io/netif.h defines a new shared ring (in addition to the rx and tx rings) for passing control messages from a VM frontend driver to a backend driver. A previous patch added the necessary boilerplate for mapping the control ring from the frontend, should it be created. This patch adds implementations for each of the defined protocol messages. Signed-off-by: Paul Durrant <paul.durrant@citrix.com> Cc: Wei Liu <wei.liu2@citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-05-16 13:35:56 -04:00
Paul Durrant	4e15ee2cb4	xen-netback: add control ring boilerplate My recent patch to include/xen/interface/io/netif.h defines a new shared ring (in addition to the rx and tx rings) for passing control messages from a VM frontend driver to a backend driver. This patch adds the necessary code to xen-netback to map this new shared ring, should it be created by a frontend, but does not add implementations for any of the defined protocol messages. These are added in a subsequent patch for clarity. Signed-off-by: Paul Durrant <paul.durrant@citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-05-16 13:35:56 -04:00
David S. Miller	1ca4673432	Merge branch 'cls_u32_hw_sw' Sridhar Samudrala says: ==================== Enable SW only or HW only offloads with u32 classifier This set of patches export TCA_CLS_FLAGS_SKIP_HW to userspace and also introduces another flag TCA_CLS_FLAGS_SKIP_SW. These flags enable offloading u32 filters to either SW or HW only. The default semantics with no flags is to add the filter to HW if possible and also into SW. With SKIP_HW flag, the filter is only added to SW. With SKIP_SW flag, the filter is added to HW and an error is returned to user on failure. These flags are mutually exclusive. There was an earlier discussion on these semantics in the following email thread. http://thread.gmane.org/gmane.linux.network/401733 ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2016-05-16 13:30:57 -04:00
Samudrala, Sridhar	d34e3e1813	net: cls_u32: Add support for skip-sw flag to tc u32 classifier. On devices that support TC U32 offloads, this flag enables a filter to be added only to HW. skip-sw and skip-hw are mutually exclusive flags. By default without any flags, the filter is added to both HW and SW, but no error checks are done in case of failure to add to HW. With skip-sw, failure to add to HW is treated as an error. Here is a sample script that adds 2 filters, one with skip-sw and the other with skip-hw flag. # add ingress qdisc tc qdisc add dev p4p1 ingress # enable hw tc offload. ethtool -K p4p1 hw-tc-offload on # add u32 filter with skip-sw flag. tc filter add dev p4p1 parent ffff: protocol ip prio 99 \ handle 800:0:1 u32 ht 800: flowid 800:1 \ skip-sw \ match ip src 192.168.1.0/24 \ action drop # add u32 filter with skip-hw flag. tc filter add dev p4p1 parent ffff: protocol ip prio 99 \ handle 800:0:2 u32 ht 800: flowid 800:2 \ skip-hw \ match ip src 192.168.2.0/24 \ action drop Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com> Acked-by: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-05-16 13:30:57 -04:00
Samudrala, Sridhar	760edee8b5	net: sched: Move TCA_CLS_FLAGS_SKIP_HW to uapi header file. Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com> Acked-by: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-05-16 13:30:56 -04:00
David S. Miller	860d7ef64d	Merge branch 'hv_netvsc-races' Vitaly Kuznetsov says: ==================== hv_netvsc: avoid races on mtu change/set channels Changes since v1: - Rebased to net-next [Haiyang Zhang] Original description: MTU change and set channels operations are implemented as netvsc device re-creation destroying internal structures (struct net_device stays). This is really unfortunate but there is no support from Hyper-V host to do it in a different way. Such re-creation is unsurprisingly racy, Haiyang reported a crash when netvsc_change_mtu() is racing with netvsc_link_change() but I was able to identify additional races upon investigation. Both netvsc_set_channels() and netvsc_change_mtu() race against: 1) netvsc_link_change() 2) netvsc_remove() 3) netvsc_send() To solve these issues without introducing new locks some refactoring is required. We need to get rid of very complex link graph in all the internal structures and avoid traveling through structures which are being removed. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2016-05-16 13:26:01 -04:00
Vitaly Kuznetsov	8809883482	hv_netvsc: set nvdev link after populating chn_table Crash in netvsc_send() is observed when netvsc device is re-created on mtu change/set channels. The crash is caused by dereferencing of NULL channel pointer which comes from chn_table. The root cause is a mixture of two facts: - we set nvdev pointer in net_device_context in alloc_net_device() before we populate chn_table. - we populate chn_table[0] only. The issue could be papered over by checking channel != NULL in netvsc_send() but populating the whole chn_table and writing the nvdev pointer afterwards seems more appropriate. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-05-16 13:26:01 -04:00
Vitaly Kuznetsov	6da7225f5a	hv_netvsc: synchronize netvsc_change_mtu()/netvsc_set_channels() with netvsc_remove() When netvsc device is removed during mtu change or channels setup we get into troubles as both paths are trying to remove the device. Synchronize them with start_remove flag and rtnl lock. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-05-16 13:26:01 -04:00
Vitaly Kuznetsov	0a1275ca51	hv_netvsc: get rid of struct net_device pointer in struct netvsc_device Simplify netvsvc pointer graph by getting rid of the redundant ndev pointer. We can always get a pointer to struct net_device from somewhere else. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-05-16 13:26:00 -04:00
Vitaly Kuznetsov	3d541ac5a9	hv_netvsc: untangle the pointer mess We have the following structures keeping netvsc adapter state: - struct net_device - struct net_device_context - struct netvsc_device - struct rndis_device - struct hv_device and there are pointers/dependencies between them: - struct net_device_context is contained in struct net_device - struct hv_device has driver_data pointer which points to 'struct net_device' OR 'struct netvsc_device' depending on driver's state (!). - struct net_device_context has a pointer to 'struct hv_device'. - struct netvsc_device has pointers to 'struct hv_device' and 'struct net_device_context'. - struct rndis_device has a pointer to 'struct netvsc_device'. Different functions get different structures as parameters and use these pointers for traveling. The problem is (in addition to keeping in mind this complex graph) that some of these structures (struct netvsc_device and struct rndis_device) are being removed and re-created on mtu change (as we implement it as re-creation of hyper-v device) so our travel using these pointers is dangerous. Simplify this to a the following: - add struct netvsc_device pointer to struct net_device_context (which is a part of struct net_device and thus never disappears) - remove struct hv_device and struct net_device_context pointers from struct netvsc_device - replace pointer to 'struct netvsc_device' with pointer to 'struct net_device'. - always keep 'struct net_device' in hv_device driver_data. We'll end up with the following 'circular' structure: net_device: [net_device_context] -> netvsc_device -> rndis_device -> net_device -> hv_device -> net_device On MTU change we'll be removing the 'netvsc_device -> rndis_device' branch and re-creating it making the synchronization easier. There is one additional redundant pointer left, it is struct net_device link in struct netvsc_device, it is going to be removed in a separate commit. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-05-16 13:26:00 -04:00
Vitaly Kuznetsov	1bdcec8a5f	hv_netvsc: use start_remove flag to protect netvsc_link_change() netvsc_link_change() can race with netvsc_change_mtu() or netvsc_set_channels() as these functions destroy struct netvsc_device and rndis filter. Use start_remove flag for syncronization. As netvsc_change_mtu()/netvsc_set_channels() are called with rtnl lock held we need to take it before checking start_remove value in netvsc_link_change(). Reported-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-05-16 13:26:00 -04:00
Vitaly Kuznetsov	f580aec4bf	hv_netvsc: move start_remove flag to net_device_context struct netvsc_device is destroyed on mtu change so keeping the protection flag there is not a good idea. Move it to struct net_device_context which is preserved. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-05-16 13:26:00 -04:00
Uwe Kleine-König	da47b45720	phy: add support for a reset-gpio specification The framework only asserts (for now) that the reset gpio is not active. Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Reviewed-by: Roger Quadros <rogerq@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-05-16 13:22:53 -04:00
David S. Miller	f23e0f6507	Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue Jeff Kirsher says: ==================== 40GbE Intel Wired LAN Driver Updates 2016-05-14 This series contains updates to i40e and i40evf. Kevin adds support to disable link on all ports and changes bits set for telling firmware the PHY needs to be modified by the driver. Anjali adds a feature to enable/disable all multicast for a trusted VF. Added priv-flag knob to configure global true promiscuous support. Shannon adds the support code for calling the admin queue API call aq_set_switch_config(). Mitch modifies the VF, to log a message if an untrusted VF attempts to configure promiscuous mode, but lies to it and returns everything is ok instead of returning an error. Corrects the logic for reporting the receive packet hash. Fixed the adding of a broadcast filter for VFs, since that all VSIs are configured to receive broadcasts as default, so do not need to add a filter. Catherine refactors the ethtool get_settings to report the possible supported link modes from what we know about the current PHY type and that with the firmware supported PHY types. Jacob changes the driver to use WARN_ONCE in order to highlight the issue, but do not display a warning every time when receive hang message is received. Akeem corrects receive ptype payload layer for non_tunneled IPv6, when it should be layer 4 for UDP, instead of layer 3. Dan Carpenter fixes an uninitialized variable bug. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2016-05-15 13:47:27 -04:00
David S. Miller	bf14e9ec80	Merge branch 'bnxt_en-next' Michael Chan says: ==================== bnxt_en: updates for net-next. Non-critical bug fixes, improvements, a new ethtool feature, and a new device ID. v2: Fixed a bug in bnxt_get_module_eeprom() found by Ben Hutchings. Ajit Khaparde (2): bnxt_en: Add Support for ETHTOOL_GMODULEINFO and ETHTOOL_GMODULEEEPRO bnxt_en: Report PCIe link speed and width during driver load Michael Chan (6): bnxt_en: Reduce maximum ring pages if page size is 64K. bnxt_en: Improve the delay logic for firmware response. bnxt_en: Fix length value in dmesg log firmware error message. bnxt_en: Simplify and improve unsupported SFP+ module reporting. bnxt_en: Add BCM57314 device ID. bnxt_en: Use dma_rmb() instead of rmb(). Satish Baddipadige (1): bnxt_en: Fix invalid max channel parameter in ethtool -l. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2016-05-15 13:35:49 -04:00
Michael Chan	b67daab033	bnxt_en: Use dma_rmb() instead of rmb(). Use the weaker but more appropriate dma_rmb() to order the reading of the completion ring. Suggested-by: Ajit Khaparde <ajit.khaparde@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-05-15 13:35:48 -04:00
Michael Chan	5049e33b55	bnxt_en: Add BCM57314 device ID. Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-05-15 13:35:48 -04:00
Michael Chan	10289bec00	bnxt_en: Simplify and improve unsupported SFP+ module reporting. The current code is more complicated than necessary and can only report unsupported SFP+ module if it is plugged in after the device is up. Rename bnxt_port_module_event() to bnxt_get_port_module_status(). We already have the current module_status in the link_info structure, so just check that and report any unsupported SFP+ module status. Delete the unnecessary last_port_module_event. Call this function at the end of bnxt_open to report unsupported module already plugged in. Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-05-15 13:35:48 -04:00
Michael Chan	8578d6c19a	bnxt_en: Fix length value in dmesg log firmware error message. The len value in the hwrm error message is wrong. Use the properly adjusted value in the variable len. Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-05-15 13:35:48 -04:00
Michael Chan	a11fa2be6d	bnxt_en: Improve the delay logic for firmware response. The current code has 2 problems: 1. The maximum wait time is not long enough. It is about 60% of the duration specified by the firmware. It is calling usleep_range(600, 800) for every 1 msec we are supposed to wait. 2. The granularity of the delay is too coarse. Many simple firmware commands finish in 25 usec or less. We fix these 2 issues by multiplying the original 1 msec loop counter by 40 and calling usleep_range(25, 40) for each iteration. There is also a second delay loop to wait for the last DMA word to complete. This delay loop should be a very short 5 usec wait. This change results in much faster bring-up/down time: Before the patch: time ip link set p4p1 up real 0m0.120s user 0m0.001s sys 0m0.009s After the patch: time ip link set p4p1 up real 0m0.030s user 0m0.000s sys 0m0.010s Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-05-15 13:35:48 -04:00
Michael Chan	d0a42d6fc8	bnxt_en: Reduce maximum ring pages if page size is 64K. The chip supports 4K/8K/64K page sizes for the rings and we try to match it to the CPU PAGE_SIZE. The current page size limits for the rings are based on 4K/8K page size. If the page size is 64K, these limits are too large. Reduce them appropriately. Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-05-15 13:35:48 -04:00

1 2 3 4 5 ...

591897 commits