The IB specification ch. 9.9.3 table 58 says that a QP which isn't set
up for the operation should return a NAK invalid request.
Signed-off-by: Robert Walsh <robert.walsh@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
During compliance testing and when debugging some interconnect issues,
it is very useful to be able to send malformed packets, without having
the device signal them as malformed (drop, or terminate with EBP). The
hardware supports this, but the driver "diagnostic packet" interface
did not.
Extend capability to send specific malformed packets for testing.
Signed-off-by: Michael Albaugh <Michael.Albaugh@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Previously the driver and userspace code handled the case of 1 subport
somewhat inconsistently. The new interpretation of this situation is
that if one subport is requested, the driver turns on the subport
mechanism and arranges for the port to be "shared" by one process. In
normal use the userspace library does not use this configuration and
instead arranges for the port not to be shared at all. This
particular idiom can be useful for testing purposes.
Signed-off-by: Mark Debbage <mark.debbage@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
When subports are required to run a program, this patch checks that
the driver and the userspace library have compatible subport
implementations. This is achieved through checks on the swminor
version field built into the driver and userspace library. Bad
combinations are reported through syslog and result in an error when
opening the port.
Signed-off-by: Mark Debbage <mark.debbage@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
A duplicate RDMA read request can fool the responder into NAKing a new
RDMA read request because the responder wasn't keeping track of
whether the queue of RDMA read requests had been sent at least once.
For example, requester sends 4 2K byte RDMA read requests, times out,
and resends the first, then sees the 4 responses, then sends a 5th
RDMA read or atomic operation. The responder sees the 4 requests,
sends 4 responses, sees the resent 1st request, rewinds the queue,
then sees the 5th request but thinks the queue is full and that the
requester is invalidly sending a 5th new request.
Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The code to copy data from the receive queue buffers to the IB SGEs
doesn't check the SGE length, only the memory region/page length when
copying data. This could overwrite parts of the user's memory that
were not intended to be written. It can only happen if multiple SGEs
are used to describe a receive buffer which almost never happens in
practice.
Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The send function is called when posting new send work requests.
There is no point in trying to send a packet if the QP is already
waiting for a HW send buffer so don't clear the busy bit until the
buffer available interrupt happens.
Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
A RDMA read response or atomic response can ACK earlier sends and RDMA
writes. In this case, the wrong work request pointer was being used
to store the read first response or atomic result. Also, if a RDMA
read request is retried, the code to compute which request to resend
was incorrect.
Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
This centralizes the use of the abort functionality, removes the
unneeded buffer cancel (abort does the same thing), sets up to ignore
launch errors after abort, same as cancel. We need abort on exit from
freeze mode to avoid having buffers stuck in the busy state, if a user
process happened to complete the send while we were in freeze mode
doing the recovery.
Signed-off-by: Dave Olson <dave.olson@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The values passed have never been right for iba 6120 chips, but just
happened to work. We needed to select the right buffer offset in the
chip (both are in same register), and the total length was wrong also,
but was covered by the rounding up.
Signed-off-by: Dave Olson <dave.olson@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Define pkt rcvd 'type' in a way consistent with HW spec and chips.
The hardware considers received packets of type 0 to be expected, and
type 1 to be eager. The driver was calling the ipath_f_put_tid
functions using a variable called 'type' set to 0 for eager and to 1
for expected packets. Worse, the iba6110 and iba6120 drivers used
those values inconsistently. This was quite confusing. Now
everything is consistent with the hardware.
Signed-off-by: Dave Olson <dave.olson@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
According to chapter 17.2.8.1.1, QPs start in the migrated state and
should send packets with the M bit set in the BTH.
Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
This patch fixes a minor bug where the wrong QP was checked for a send
work request that should wait for an RNR timeout.
Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
This patch fixes a bug introduced when moving some code around for
readability.
Setting the wqe pointer at the end of the function is a NOP since it
isn't used. Move it back to where it is used.
Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
In ipath_query_device(), some of the struct ib_device_attr fields were
not being initialized.
Signed-off-by: Robert Walsh <robert.walsh@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Although our chip supports 4K MTUs, our driver doesn't yet support
this feature, so limit the maximum MTU to 2K until we get support for
4K MTUs implemented.
Signed-off-by: Robert Walsh <robert.walsh@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Recognize IBA 6110 Revision 4: same feature set, etc. as earlier revisions.
Signed-off-by: Dave Olson <dave.olson@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
We currently track various errors, now we enhance that capability by
logging some of them to EEPROM. We also now log a cumulative "active"
time defined by traffic though the InfiniPath HCA beyond the normal SM
traffic.
Signed-off-by: Michael Albaugh <michael.albaugh@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The IPATH_RUNTIME_PBC_REWRITE and the IPATH_RUNTIME_LOOSE_DMA_ALIGN
flags were not ever implemented correctly and did not turn out to be
necessary. Remove the last vestiges of these flags but mark the spot
with a comment to remind us to not reuse these flags in the interest
of binary compatibility. The INFINIPATH_XGXS_SUPPRESS_ARMLAUNCH_ERR
bit was also not found to be useful, so it was dropped in the cleanup
as well.
Signed-off-by: John Gregor <john.gregor@qlogic.com>
Signed-off-by: Arthur Jones <arthur.jones@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The new LED blinking interface adds more contention for the
unprotected GPIO pins that were already shared, though not commonly at
the same time. We add locks to the accesses to these pins so that
Read-Modify-Write is now safe. Some of these locks are added at
interrupt context, so we shadow the registers which drive and inspect
these pins to avoid the mmio read/writes. This mitigates the effects
of the locks and hastens us through the interrupt.
Add locking and always use shadows for registers controlling GPIO pins
(ExtCtrl and GPIOout). The use of shadows implies doing less I/O,
which can make I2C operation too fast on some platforms. An explicit
udelay(1) in SCL manipulation fixes that.
Signed-off-by: Michael Albaugh <michael.albaugh@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
When we want to find an InfiniPath HCA in a rack of nodes, it is often
expeditious to blink the status LEDs via a userspace /sys file.
A write-only led_override "file" is published per device. Writes to
this file are interpreted as (string form) numbers, and the resulting
value sent to ipath_set_led_override(). The upper eight bits are
interpretted as a 4.4 fixed-point "frequency in Hertz", and the bottom
two 4-bit values are alternately (D0..3, then D4..7) used by the
board-specific LED-setting function to override the normal state.
Signed-off-by: Michael Albaugh <michael.albaugh@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
mlx4_ib.h uses struct mutex, so although <linux/mutex.h> seems to be
pulled in indirectly by one of the headers it includes, the right
thing is to include <linux/mutex.h> directly.
Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Refactor the ehca changes from commit ed23a727 ("IB: Return "maybe
missed event" hint from ib_req_notify_cq()") so the queue arithmetic
is done in slightly fewer lines. Also, move the spinlock flags into
the block they're used in.
Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
We need to keep a spare entry in the SRQ so that there always is a
next WQE available when posting receives (so that we can tell the
difference between a full queue and an empty queue). So subtract 1
from the value HW gives us before reporting the limit on SRQ entries
to consumers.
Found by Mellanox QA.
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Inline data segments in send WQEs are not allowed to cross a 64 byte
boundary. We use inline data segments to hold the UD headers for MLX
QPs (QP0 and QP1). A send with GRH on QP1 will have a UD header that
is too big to fit in a single inline data segment without crossing a
64 byte boundary, so split the header into two inline data segments.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Upcoming firmware introduces command interface revision 3, which
changes the way port capabilities are queried and set. Update the
driver to handle both the new and old command interfaces by adding a
new MLX4_FLAG_OLD_PORT_CMDS that it is set after querying the firmware
interface revision and then using the correct interface based on the
setting of the flag.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
When compacting CQ entries, we need to set the correct value of the
ownership bit in case the value is different between the index we copy
the CQE from and the index we copy it to.
Found by Ronni Zimmerman of Mellanox.
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The calculation of max_inline_data in set_kernel_sq_size() is bogus,
since it doesn't take into account the fact that inline segments may
not cross a 64-byte boundary, and hence multiple inline segments will
probably need to be used to post large inline sends.
We don't support inline sends for kernel QPs anyway, so there's no
point in doing this calculation anyway, since the field is just zeroed
out a little later. So just delete the bogus calculation.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
New ConnectX firmware introduces FW command interface revision 2,
which requires that for each QP, a chunk of send queue entries (the
"headroom") is kept marked as invalid, so that the HCA doesn't get
confused if it prefetches entries that haven't been posted yet. Add
code to the driver to do this, and also update the user ABI so that
userspace can request that the prefetcher be turned off for userspace
QPs (we just leave the prefetcher on for all kernel QPs).
Unfortunately, marking send queue entries this way is confuses older
firmware, so we change the driver to allow only FW command interface
revisions 2. This means that users will have to update their firmware
to work with the new driver, but the firmware is changing quickly and
the old firmware has lots of other bugs anyway, so this shouldn't be too
big a deal.
Based on a patch from Jack Morgenstein <jackm@dev.mellanox.co.il>.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Doing max(1, foo) where foo is u32 generates a warning, because 1 is a
signed constant. Fix this by using 1U instead.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Cast the increment added to wq->tail when send completions are
processed to u16 to avoid using wrong values caused by standard
integer promotions.
The same bug was fixed in libmlx4 by Eli Cohen <eli@mellanox.co.il>.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
IB/mlx4: Make sure RQ allocation is always valid
RDMA/cma: Fix initialization of next_port
IB/mlx4: Fix zeroing of rnr_retry value in ib_modify_qp()
mlx4_core: Don't set MTT address in dMPT entries with PA set
mlx4_core: Check firmware command interface revision
IB/mthca, mlx4_core: Fix typo in comment
mlx4_core: Free catastrophic error MSI-X interrupt with correct dev_id
mlx4_core: Initialize ctx_list and ctx_lock earlier
mlx4_core: Fix CQ context layout
QPs attached to an SRQ must never have their own RQ, and QPs not
attached to SRQs must have an RQ with at least 1 entry. Enforce all
of this in set_rq_size().
Based on a patch by Eli Cohen <eli@mellanox.co.il>.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The code in __mlx4_ib_modify_qp() overwrites context->params1 after
the RNR retry parameter is ORed in, which results in the RNR retry
parameter always being set to 0. Fix this by moving where we OR in
the value to later in the function, after the initial assignment of
context->params1.
Found by the Mellanox firmware group.
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
This patch converts the ipv4_devconf config members (everything except
sysctl) to an array. This allows easier manipulation which will be
needed later on to provide better management of default config values.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
mthca_free_err_wqe() currently treats both send and receive CQEs
identically if a QP is using an SRQ. But for Tavor hardware, send
CQEs with error can be chained together even if the RQ is part of SRQ,
so we may miss some CQEs.
Fix by following the WQE chain for all send CQEs even for non-SRQ QPs.
This fixes crashes in IPoIB CM:
<https://bugs.openfabrics.org//show_bug.cgi?id=604>
Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Due to a typo, the driver was reporting the wrong number of "actual send
WRs" after ehca_create_qp().
Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
We need to initialize the owner bit of send queue WQEs to hardware
ownership whenever the QP is modified from reset to init, not just
when the QP is first allocated. This avoids having the hardware
process stale WQEs when the QP is moved to reset but not destroyed and
then modified to init again.
Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
If a QP is attached to a shared receive queue (SRQ), then it doesn't
have a receive queue (RQ). So don't allocate an RQ doorbell (or map a
doorbell from userspace for userspace QPs) for that QP.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
* 'for-linus' of master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband:
IB/cm: Improve local id allocation
IPoIB/cm: Fix SRQ WR leak
IB/ipoib: Fix typos in error messages
IB/mlx4: Check if SRQ is full when posting receive
IB/mlx4: Pass send queue sizes from userspace to kernel
IB/mlx4: Fix check of opcode in mlx4_ib_post_send()
mlx4_core: Fix array overrun in dump_dev_cap_flags()
IB/mlx4: Fix RESET to RESET and RESET to ERROR transitions
IB/mthca: Fix RESET to ERROR transition
IB/mlx4: Set GRH:HopLimit when sending globally routed MADs
IB/mthca: Set GRH:HopLimit when building MLX headers
IB/mlx4: Fix check of max_qp_dest_rdma in modify QP
IB/mthca: Fix use-after-free on device restart
IB/ehca: Return proper error code if register_mr fails
IPoIB: Handle P_Key table reordering
IB/core: Use start_port() and end_port()
IB/core: Add helpers for uncached GID and P_Key searches
IB/ipath: Fix potential deadlock with multicast spinlocks
IB/core: Free umem when mm is already gone
First thing mm.h does is including sched.h solely for can_do_mlock() inline
function which has "current" dereference inside. By dealing with can_do_mlock()
mm.h can be detached from sched.h which is good. See below, why.
This patch
a) removes unconditional inclusion of sched.h from mm.h
b) makes can_do_mlock() normal function in mm/mlock.c
c) exports can_do_mlock() to not break compilation
d) adds sched.h inclusions back to files that were getting it indirectly.
e) adds less bloated headers to some files (asm/signal.h, jiffies.h) that were
getting them indirectly
Net result is:
a) mm.h users would get less code to open, read, preprocess, parse, ... if
they don't need sched.h
b) sched.h stops being dependency for significant number of files:
on x86_64 allmodconfig touching sched.h results in recompile of 4083 files,
after patch it's only 3744 (-8.3%).
Cross-compile tested on
all arm defconfigs, all mips defconfigs, all powerpc defconfigs,
alpha alpha-up
arm
i386 i386-up i386-defconfig i386-allnoconfig
ia64 ia64-up
m68k
mips
parisc parisc-up
powerpc powerpc-up
s390 s390-up
sparc sparc-up
sparc64 sparc64-up
um-x86_64
x86_64 x86_64-up x86_64-defconfig x86_64-allnoconfig
as well as my two usual configs.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pass the number of WQEs for the send queue and their size from userspace
to the kernel to avoid having to keep the QP size calculations in sync
between the kernel driver and libmlx4. This fixes a bug seen with the
current mlx4_ib driver and current libmlx4 caused by a difference in the
calculated sizes for SQ WQEs. Also, this gives more flexibility for
userspace to experiment with using multiple WQE BBs for a single SQ WQE.
Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
wr->opcode is invalid if it's >= ARRAY_SIZE(mlx4_ib_opcode), not just
strictly >.
This was spotted by the Coverity checker (CID 1643).
Signed-off-by: Roland Dreier <rolandd@cisco.com>
According to the IB spec, a QP can be moved from RESET back to RESET
or to the ERROR state, but mlx4 firmware does not support this and
returns an error if we try. Fix the RESET to RESET transition by
just returning 0 without doing anything, and fix RESET to ERROR by
moving the QP from RESET to INIT with dummy parameters and then
transitioning from INIT to ERROR.
Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
According to the IB spec, a QP can be moved from RESET to the ERROR
state, but mthca firmware does not support this and returns an error if
we try. Work around this FW limitation by moving the QP from RESET to
INIT with dummy parameters and then transitioning from INIT to ERROR.
Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Global CM packets used by rmda_cm were being sent with a GRH:hopLimit
of zero, causing them to be dropped by the router. The problem is a
missing initialization of the hop_limit field in mthca_read_ah(),
which was called by build_mlx_header() when sending a MAD on QP1.
Signed-off-by: Rolf Manderscheid <rvm@obsidianresearch.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>