Commit graph

918 commits

Author SHA1 Message Date
Tom Tucker
64be8608c1 svcrdma: Add FRMR get/put services
Add services for the allocating, freeing, and unmapping Fast Reg MR. These
services will be used by the transport connection setup, send and receive
routines.

Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
2008-10-06 14:45:18 -05:00
Chuck Lever
2937391385 NLM: Remove unused argument from svc_addsock() function
Clean up: The svc_addsock() function no longer uses its "proto"
argument, so remove it.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: Neil Brown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-10-04 17:12:27 -04:00
Benny Halevy
d5b337b487 nfsd: use nfs client rpc callback program
since commit ff7d9756b5
"nfsd: use static memory for callback program and stats"
do_probe_callback uses a static callback program
(NFS4_CALLBACK) rather than the one set in clp->cl_callback.cb_prog
as passed in by the client in setclientid (4.0)
or create_session (4.1).

This patches introduces rpc_create_args.prognumber that allows
overriding program->number when creating rpc_clnt.

Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-09-29 18:13:40 -04:00
Chuck Lever
db820d6376 SUNRPC: Clean up debug messages in rpcb_clnt.c
The RPCB XDR functions are used for multiple procedures.  For instance,
rpcb_encode_getaddr() is used for RPCB_GETADDR, RPCB_SET, and
RPCB_UNSET.  Make the XDR debug messages more generic so they are less
confusing.

And, unlike in other RPC consumers in the kernel, a single debug flag
enables all levels of debug messages in the RPC bind client, including
XDR debug messages.  Since the XDR decoders already report success or
failure in this case, remove redundant debug messages in the mid-level
rpcb_register_call() function.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-09-29 18:13:40 -04:00
Chuck Lever
f6fb3f6f59 SUNRPC: Fix up svc_unregister()
With the new rpcbind code, a PMAP_UNSET will not have any effect on
services registered via rpcbind v3 or v4.

Implement a version of svc_unregister() that uses an RPCB_UNSET with
an empty netid string to make sure we have cleared *all* entries for
a kernel RPC service when shutting down, or before starting a fresh
instance of the service.

Use the new version only when CONFIG_SUNRPC_REGISTER_V4 is enabled;
otherwise, the legacy PMAP version is used to ensure complete
backwards-compatibility with the Linux portmapper daemon.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-09-29 18:13:40 -04:00
Chuck Lever
9d548b9c95 SUNRPC: Use short-hand IPv6 ANYADDR for RPCB_SET
Clean up: When doing an RPCB_SET, make the kernel's rpcb client use the
shorthand "::" for the universal form of the IPv6 ANY address.

Without this patch, rpcbind will advertise:

  0000:0000:0000:0000:0000:0000:0000:0000.x.y

This is cosmetic only.  It cleans up the display of information from
/sbin/rpcinfo.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-09-29 18:13:40 -04:00
Chuck Lever
2c7eb0b206 SUNRPC: Register both netids for AF_INET6 servers
TI-RPC is a user-space library of RPC functions that replaces ONC RPC
and allows RPC to operate in the new world of IPv6.

TI-RPC combines the concept of a transport protocol (UDP and TCP)
and a protocol family (PF_INET and PF_INET6) into a single identifier
called a "netid."  For example, "udp" means UDP over IPv4, and "udp6"
means UDP over IPv6.

For rpcbind, then, the RPC service tuple that is registered and
advertised is:

  [RPC program, RPC version, service address and port, netid]

instead of

  [RPC program, RPC version, port, protocol]

Service address is typically ANYADDR, but can be a specific address
of one of the interfaces on a multi-homed host.  The third item in
the new tuple is expressed as a universal address.

The current Linux rpcbind implementation registers a netid for both
protocol families when RPCB_SET is done for just the PF_INET6 version
of the netid (ie udp6 or tcp6).  So registering "udp6" causes a
registration for "udp" to appear automatically as well.

We've recently determined that this is incorrect behavior.  In the
TI-RPC world, "udp6" is not meant to imply that the registered RPC
service handles requests from AF_INET as well, even if the listener
socket does address mapping.  "udp" and "udp6" are entirely separate
capabilities, and must be registered separately.

The Linux kernel, unlike TI-RPC, leverages address mapping to allow a
single listener socket to handle requests for both AF_INET and AF_INET6.
This is still OK, but the kernel currently assumes registering "udp6"
will cover "udp" as well.  It registers only "udp6" for it's AF_INET6
services, even though they handle both AF_INET and AF_INET6 on the same
port.

So svc_register() actually needs to register both "udp" and "udp6"
explicitly (and likewise for TCP).  Until rpcbind is fixed, the
kernel can ignore the return code for the second RPCB_SET call.

Please merge this with commit 15231312:

    SUNRPC: Support IPv6 when registering kernel RPC services

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: Olaf Kirch <okir@suse.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-09-29 18:13:39 -04:00
Chuck Lever
a26cfad6e0 SUNRPC: Support IPv6 when registering kernel RPC services
In order to advertise NFS-related services on IPv6 interfaces via
rpcbind, the kernel RPC server implementation must use
rpcb_v4_register() instead of rpcb_register().

A new kernel build option allows distributions to use the legacy
v2 call until they integrate an appropriate user-space rpcbind
daemon that can support IPv6 RPC services.

I tried adding some automatic logic to fall back if registering
with a v4 protocol request failed, but there are too many corner
cases.  So I just made it a compile-time switch that distributions
can throw when they've replaced portmapper with rpcbind.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-09-29 18:13:38 -04:00
Chuck Lever
7252d575ab SUNRPC: Split portmap unregister API into separate function
Create a separate server-level interface for unregistering RPC services.

The mechanics of, and the API for, registering and unregistering RPC
services will diverge further as support for IPv6 is added.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-09-29 18:13:38 -04:00
Chuck Lever
14aeb2118d SUNRPC: Simplify rpcb_register() API
Bruce suggested there's no need to expose the difference between an error
sending the PMAP_SET request and an error reply from the portmapper to
rpcb_register's callers.  The user space equivalent of rpcb_register() is
pmap_set(3), which returns a bool_t : either the PMAP set worked, or it
didn't.  Simple.

So let's remove the "*okay" argument from rpcb_register() and
rpcb_v4_register(), and simply return an error if any part of the call
didn't work.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-09-29 18:13:37 -04:00
Chuck Lever
b6632339e3 SUNRPC: Set V6ONLY socket option for RPC listener sockets
My plan is to use an AF_INET listener on systems that support only IPv4,
and an AF_INET6 listener on systems that can support IPv6. Incoming
IPv4 packets will be posted to an AF_INET6 listener with a mapped IPv4
address.

Max Matveev <makc@sgi.com> says:
  Creating a single listener can be dangerous - if net.ipv6.bindv6only
  is enabled then it's possible to create another listener in v4
  namespace on the same port and steal the traffic from the "unifed"
  listener. You need to disable V6ONLY explicitly via a sockopt to stop
  that.

Set appropriate socket option on RPC server listener sockets to prevent
this.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-09-29 18:13:37 -04:00
Chuck Lever
5dd248f6f1 SUNRPC: Use proper INADDR_ANY when setting up RPC services on IPv6
Teach svc_create_xprt() to use the correct ANY address for AF_INET6 based
RPC services.

No caller uses AF_INET6 yet.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-09-29 17:56:56 -04:00
Chuck Lever
e851db5b05 SUNRPC: Add address family field to svc_serv data structure
Introduce and initialize an address family field in the svc_serv structure.

This field will determine what family to use for the service's listener
sockets and what families are advertised via the local rpcbind daemon.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-09-29 17:56:56 -04:00
Arnaldo Carvalho de Melo
6067804047 net: Use hton[sl]() instead of __constant_hton[sl]() where applicable
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-09-20 22:20:49 -07:00
Cyrill Gorcunov
27df6f25ff sunrpc: fix possible overrun on read of /proc/sys/sunrpc/transports
Vegard Nossum reported
----------------------
> I noticed that something weird is going on with /proc/sys/sunrpc/transports.
> This file is generated in net/sunrpc/sysctl.c, function proc_do_xprt(). When
> I "cat" this file, I get the expected output:
>    $ cat /proc/sys/sunrpc/transports
>    tcp 1048576
>    udp 32768

> But I think that it does not check the length of the buffer supplied by
> userspace to read(). With my original program, I found that the stack was
> being overwritten by the characters above, even when the length given to
> read() was just 1.

David Wagner added (among other things) that copy_to_user could be
probably used here.

Ingo Oeser suggested to use simple_read_from_buffer() here.

The conclusion is that proc_do_xprt doesn't check for userside buffer
size indeed so fix this by using Ingo's suggestion.

Reported-by: Vegard Nossum <vegard.nossum@gmail.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
CC: Ingo Oeser <ioe-lkml@rameria.de>
Cc: Neil Brown <neilb@suse.de>
Cc: Chuck Lever <chuck.lever@oracle.com>
Cc: Greg Banks <gnb@sgi.com>
Cc: Tom Tucker <tom@opengridcomputing.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-09-01 14:24:24 -04:00
Tom Tucker
24b8b44780 svcrdma: Fix race between svc_rdma_recvfrom thread and the dto_tasklet
RDMA_READ completions are kept on a separate queue from the general
I/O request queue. Since a separate lock is used to protect the RDMA_READ
completion queue, a race exists between the dto_tasklet and the
svc_rdma_recvfrom thread where the dto_tasklet sets the XPT_DATA
bit and adds I/O to the read-completion queue. Concurrently, the
recvfrom thread checks the generic queue, finds it empty and resets
the XPT_DATA bit. A subsequent svc_xprt_enqueue will fail to enqueue
the transport for I/O and cause the transport to "stall".

The fix is to protect both lists with the same lock and set the XPT_DATA
bit with this lock held.

Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-08-13 16:57:31 -04:00
Ingo Molnar
414f746d23 Merge branch 'linus' into cpus4096 2008-07-28 21:14:43 +02:00
Alexey Dobriyan
51cc50685a SL*B: drop kmem cache argument from constructor
Kmem cache passed to constructor is only needed for constructors that are
themselves multiplexeres.  Nobody uses this "feature", nor does anybody uses
passed kmem cache in non-trivial way, so pass only pointer to object.

Non-trivial places are:
	arch/powerpc/mm/init_64.c
	arch/powerpc/mm/hugetlbpage.c

This is flag day, yes.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>
Acked-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Jon Tollefson <kniht@linux.vnet.ibm.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Matt Mackall <mpm@selenic.com>
[akpm@linux-foundation.org: fix arch/powerpc/mm/hugetlbpage.c]
[akpm@linux-foundation.org: fix mm/slab.c]
[akpm@linux-foundation.org: fix ubifs]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-26 12:00:07 -07:00
FUJITA Tomonori
8d8bb39b9e dma-mapping: add the device argument to dma_mapping_error()
Add per-device dma_mapping_ops support for CONFIG_X86_64 as POWER
architecture does:

This enables us to cleanly fix the Calgary IOMMU issue that some devices
are not behind the IOMMU (http://lkml.org/lkml/2008/5/8/423).

I think that per-device dma_mapping_ops support would be also helpful for
KVM people to support PCI passthrough but Andi thinks that this makes it
difficult to support the PCI passthrough (see the above thread).  So I
CC'ed this to KVM camp.  Comments are appreciated.

A pointer to dma_mapping_ops to struct dev_archdata is added.  If the
pointer is non NULL, DMA operations in asm/dma-mapping.h use it.  If it's
NULL, the system-wide dma_ops pointer is used as before.

If it's useful for KVM people, I plan to implement a mechanism to register
a hook called when a new pci (or dma capable) device is created (it works
with hot plugging).  It enables IOMMUs to set up an appropriate
dma_mapping_ops per device.

The major obstacle is that dma_mapping_error doesn't take a pointer to the
device unlike other DMA operations.  So x86 can't have dma_mapping_ops per
device.  Note all the POWER IOMMUs use the same dma_mapping_error function
so this is not a problem for POWER but x86 IOMMUs use different
dma_mapping_error functions.

The first patch adds the device argument to dma_mapping_error.  The patch
is trivial but large since it touches lots of drivers and dma-mapping.h in
all the architecture.

This patch:

dma_mapping_error() doesn't take a pointer to the device unlike other DMA
operations.  So we can't have dma_mapping_ops per device.

Note that POWER already has dma_mapping_ops per device but all the POWER
IOMMUs use the same dma_mapping_error function.  x86 IOMMUs use device
argument.

[akpm@linux-foundation.org: fix sge]
[akpm@linux-foundation.org: fix svc_rdma]
[akpm@linux-foundation.org: build fix]
[akpm@linux-foundation.org: fix bnx2x]
[akpm@linux-foundation.org: fix s2io]
[akpm@linux-foundation.org: fix pasemi_mac]
[akpm@linux-foundation.org: fix sdhci]
[akpm@linux-foundation.org: build fix]
[akpm@linux-foundation.org: fix sparc]
[akpm@linux-foundation.org: fix ibmvscsi]
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Muli Ben-Yehuda <muli@il.ibm.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Avi Kivity <avi@qumranet.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-26 12:00:03 -07:00
Mike Travis
0bc3cc03fa cpumask: change cpumask_of_cpu_ptr to use new cpumask_of_cpu
* Replace previous instances of the cpumask_of_cpu_ptr* macros
    with a the new (lvalue capable) generic cpumask_of_cpu().

Signed-off-by: Mike Travis <travis@sgi.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Jack Steiner <steiner@sgi.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-26 16:40:33 +02:00
Ingo Molnar
eb6a12c242 Merge branch 'linus' into cpus4096-for-linus
Conflicts:

	net/sunrpc/svc.c

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-21 17:19:50 +02:00
Linus Torvalds
14b395e35d Merge branch 'for-2.6.27' of git://linux-nfs.org/~bfields/linux
* 'for-2.6.27' of git://linux-nfs.org/~bfields/linux: (51 commits)
  nfsd: nfs4xdr.c do-while is not a compound statement
  nfsd: Use C99 initializers in fs/nfsd/nfs4xdr.c
  lockd: Pass "struct sockaddr *" to new failover-by-IP function
  lockd: get host reference in nlmsvc_create_block() instead of callers
  lockd: minor svclock.c style fixes
  lockd: eliminate duplicate nlmsvc_lookup_host call from nlmsvc_lock
  lockd: eliminate duplicate nlmsvc_lookup_host call from nlmsvc_testlock
  lockd: nlm_release_host() checks for NULL, caller needn't
  file lock: reorder struct file_lock to save space on 64 bit builds
  nfsd: take file and mnt write in nfs4_upgrade_open
  nfsd: document open share bit tracking
  nfsd: tabulate nfs4 xdr encoding functions
  nfsd: dprint operation names
  svcrdma: Change WR context get/put to use the kmem cache
  svcrdma: Create a kmem cache for the WR contexts
  svcrdma: Add flush_scheduled_work to module exit function
  svcrdma: Limit ORD based on client's advertised IRD
  svcrdma: Remove unused wait q from svcrdma_xprt structure
  svcrdma: Remove unneeded spin locks from __svc_rdma_free
  svcrdma: Add dma map count and WARN_ON
  ...
2008-07-20 21:21:46 -07:00
Mike Travis
65c0118453 cpumask: Replace cpumask_of_cpu with cpumask_of_cpu_ptr
* This patch replaces the dangerous lvalue version of cpumask_of_cpu
    with new cpumask_of_cpu_ptr macros.  These are patterned after the
    node_to_cpumask_ptr macros.

    In general terms, if there is a cpumask_of_cpu_map[] then a pointer to
    the cpumask_of_cpu_map[cpu] entry is used.  The cpumask_of_cpu_map
    is provided when there is a large NR_CPUS count, reducing
    greatly the amount of code generated and stack space used for
    cpumask_of_cpu().  The pointer to the cpumask_t value is needed for
    calling set_cpus_allowed_ptr() to reduce the amount of stack space
    needed to pass the cpumask_t value.

    If there isn't a cpumask_of_cpu_map[], then a temporary variable is
    declared and filled in with value from cpumask_of_cpu(cpu) as well as
    a pointer variable pointing to this temporary variable.  Afterwards,
    the pointer is used to reference the cpumask value.  The compiler
    will optimize out the extra dereference through the pointer as well
    as the stack space used for the pointer, resulting in identical code.

    A good example of the orthogonal usages is in net/sunrpc/svc.c:

	case SVC_POOL_PERCPU:
	{
		unsigned int cpu = m->pool_to[pidx];
		cpumask_of_cpu_ptr(cpumask, cpu);

		*oldmask = current->cpus_allowed;
		set_cpus_allowed_ptr(current, cpumask);
		return 1;
	}
	case SVC_POOL_PERNODE:
	{
		unsigned int node = m->pool_to[pidx];
		node_to_cpumask_ptr(nodecpumask, node);

		*oldmask = current->cpus_allowed;
		set_cpus_allowed_ptr(current, nodecpumask);
		return 1;
	}

Signed-off-by: Mike Travis <travis@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-18 22:02:57 +02:00
Ingo Molnar
bb2c018b09 Merge branch 'linus' into cpus4096
Conflicts:

	drivers/acpi/processor_throttling.c

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-18 22:00:54 +02:00
David S. Miller
49997d7515 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/torvalds/linux-2.6
Conflicts:

	Documentation/powerpc/booting-without-of.txt
	drivers/atm/Makefile
	drivers/net/fs_enet/fs_enet-main.c
	drivers/pci/pci-acpi.c
	net/8021q/vlan.c
	net/iucv/iucv.c
2008-07-18 02:39:39 -07:00
Ingo Molnar
82638844d9 Merge branch 'linus' into cpus4096
Conflicts:

	arch/x86/xen/smp.c
	kernel/sched_rt.c
	net/iucv/iucv.c

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-16 00:29:07 +02:00
Trond Myklebust
a86dc496b7 SUNRPC: Remove the BKL from the callback functions
Push it into those callback functions that actually need it.

Note that all the NFS operations use their own locking, so don't need the
BKL. Ditto for the rpcbind client.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-07-15 18:10:57 -04:00
Chuck Lever
c2e1b09ff2 SUNRPC: Support registering IPv6 interfaces with local rpcbind daemon
Introduce a new API to register RPC services on IPv6 interfaces to allow
the NFS server and lockd to advertise on IPv6 networks.

Unlike rpcb_register(), the new rpcb_v4_register() function uses rpcbind
protocol version 4 to contact the local rpcbind daemon.  The version 4
SET/UNSET procedures allow services to register address families besides
AF_INET, register at specific network interfaces, and register transport
protocols besides UDP and TCP.  All of this functionality is exposed via
the new rpcb_v4_register() kernel API.

A user-space rpcbind daemon implementation that supports version 4 of the
rpcbind protocol is required in order to make use of this new API.

Note that rpcbind version 3 is sufficient to support the new rpcbind
facilities listed above, but most extant implementations use version 4.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-07-15 18:08:55 -04:00
Chuck Lever
babe80eb49 SUNRPC: Refactor rpcb_register to make rpcbindv4 support easier
rpcbind version 4 registration will reuse part of rpcb_register, so just
split it out into a separate function now.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-07-15 18:08:51 -04:00
Chuck Lever
423d8b0647 SUNRPC: None of rpcb_create's callers wants a privileged source port
Clean up: Callers that required a privileged source port now use
rpcb_create_local(), so we can remove the @privileged argument from
rpcb_create().

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-07-15 18:08:47 -04:00
Chuck Lever
cc5598b78f SUNRPC: Introduce a specific rpcb_create for contacting localhost
Add rpcb_create_local() for use by rpcb_register() and upcoming IPv6
registration functions.

Ensure any errors encountered by rpcb_create_local() are properly
reported.

We can also use a statically allocated constant loopback socket address
instead of one allocated on the stack and initialized every time the
function is called.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-07-15 18:08:44 -04:00
Chuck Lever
166b88d755 SUNRPC: Use correct XDR encoding procedure for rpcbind SET/UNSET
The rpcbind versions 3 and 4 SET and UNSET procedures use the same
arguments as the GETADDR procedure.

While definitely a bug, this hasn't been a problem so far since the
kernel hasn't used version 3 or 4 SET and UNSET.  But this will change
in just a moment.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-07-15 18:08:40 -04:00
Trond Myklebust
381ba74af5 SUNRPC: Ensure our task is notified when an rpcbind call is done
If another task is busy in rpcb_getport_async number, it is more efficient
to have it wake us up when it has finished instead of arbitrarily sleeping
for 5 seconds.

Also ensure that rpcb_wake_rpcbind_waiters() is called regardless of
whether or not rpcb_getport_done() gets called.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-07-09 12:09:45 -04:00
Chuck Lever
40fef8a649 SUNRPC: Use only rpcbind v2 for AF_INET requests
Some server vendors support the higher versions of rpcbind only for
AF_INET6.  The kernel doesn't need to use v3 or v4 for AF_INET anyway,
so change the kernel's rpcbind client to query AF_INET servers over
rpcbind v2 only.

This has a few interesting benefits:

1. If the rpcbind request is going over TCP, and the server doesn't
   support rpcbind versions 3 or 4, the client reduces by two the number
   of ephemeral ports left in TIME_WAIT for each rpcbind request.  This
   will help during NFS mount storms.

2. The rpcbind interaction with servers that don't support rpcbind
   versions 3 or 4 will use less network traffic.  Also helpful
   during mount storms.

3. We can eliminate the kernel build option that controls whether the
   kernel's rpcbind client uses rpcbind version 3 and 4 for AF_INET
   servers.  Less complicated kernel configuration...

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-07-09 12:09:37 -04:00
Chuck Lever
8842413aa4 SUNRPC: Use GETADDR for rpcbind version 4 queries
Some rpcbind servers that do support rpcbind version 4 do not support
the GETVERSADDR procedure.  Use GETADDR for querying rpcbind servers
via rpcbind version 4 instead of GETVERSADDR.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-07-09 12:09:36 -04:00
Chuck Lever
6a77405157 SUNRPC: Use rpcbind version 2 GETPORT
Clean up: Change the version 2 procedure name to GETPORT.  It's the same
procedure number as GETADDR, but version 2 implementations usually refer
to it as GETPORT.

This also now matches the procedure name used in the version 2 procedure
entry in the rpcb_next_version[] array, making it slightly less confusing.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-07-09 12:09:35 -04:00
Chuck Lever
fc200e794d SUNRPC: Document some naked integers in rpcbind client
Clean up: Replace naked integers that represent rpcbind protocol versions.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-07-09 12:09:34 -04:00
Chuck Lever
877fcf1039 SUNRPC: More useful debugging output for rpcb client
Clean up dprintk's in rpcb client's XDR decoder functions.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-07-09 12:09:33 -04:00
Chuck Lever
b22602a673 SUNRPC: Ensure all transports set rq_xtime consistently
The RPC client uses the rq_xtime field in each RPC request to determine the
round-trip time of the request.  Currently, the rq_xtime field is
initialized by each transport just before it starts enqueing a request to
be sent.  However, transports do not handle initializing this value
consistently; sometimes they don't initialize it at all.

To make the measurement of request round-trip time consistent for all
RPC client transport capabilities, pull rq_xtime initialization into the
RPC client's generic transport logic.  Now all transports will get a
standardized RTT measure automatically, from:

  xprt_transmit()

to

  xprt_complete_rqst()

This makes round-trip time calculation more accurate for the TCP transport.
The socket ->sendmsg() method can return "-EAGAIN" if the socket's output
buffer is full, so the TCP transport's ->send_request() method may call
the ->sendmsg() method repeatedly until it gets all of the request's bytes
queued in the socket's buffer.

Currently, the TCP transport sets the rq_xtime field every time through
that loop so the final value is the timestamp just before the *last* call
to the underlying socket's ->sendmsg() method.  After this patch, the
rq_xtime field contains a timestamp that reflects the time just before the
*first* call to ->sendmsg().

This is consequential under heavy workloads because large requests often
take multiple ->sendmsg() calls to get all the bytes of a request queued.
The TCP transport causes the request to sleep until the remote end of the
socket has received enough bytes to clear space in the socket's local
output buffer.  This delay can be quite significant.

The method introduced by this patch is a more accurate measure of RTT
for stream transports, since the server can cause enough back pressure
to delay (ie increase the latency of) requests from the client.

Additionally, this patch corrects the behavior of the RDMA transport, which
entirely neglected to initialize the rq_xtime field.  RPC performance
metrics for RDMA transports now display correct RPC request round trip
times.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Acked-by: Tom Talpey <thomas.talpey@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-07-09 12:09:15 -04:00
\\\"J. Bruce Fields\\\
a486aeda9b rpc: minor cleanup of scheduler callback code
Try to make the comment here a little more clear and concise.

Also, this macro definition seems unnecessary.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-07-09 12:09:14 -04:00
\\\"J. Bruce Fields\\\
d25a03cf96 rpc: remove some unused macros
There used to be a print_hexl() function that used isprint(), now gone.
I don't know why NFS_NGROUPS and CA_RUN_AS_MACHINE were here.

I also don't know why another #define that's actually used was marked
"unused".

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-07-09 12:09:12 -04:00
\\\"J. Bruce Fields\\\
720b8f2d6f rpc: eliminate unused variable in auth_gss upcall code
Also, a minor comment grammar fix in the same file.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-07-09 12:09:11 -04:00
Olga Kornievskaia
b6b6152c46 rpc: bring back cl_chatty
The cl_chatty flag alows us to control whether a given rpc client leaves

	"server X not responding, timed out"

messages in the syslog.  Such messages make sense for ordinary nfs
clients (where an unresponsive server means applications on the
mountpoint are probably hanging), but not for the callback client (which
can fail more commonly, with the only result just of disabling some
optimizations).

Previously cl_chatty was removed, do to lack of users; reinstate it, and
use it for the nfsd's callback client.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-07-09 12:09:10 -04:00
Chuck Lever
cd983ef81b SUNRPC: Remove obsolete messages during transport connect
Recent changes to the RPC client's transport connect logic make connect
status values ECONNREFUSED and ECONNRESET impossible.

Clean up xprt_connect_status() to account for these changes.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-07-09 12:09:06 -04:00
Chuck Lever
cb3997b5a0 SUNRPC: Display some debugging information as text rather than numbers
In rpc_show_tasks(), display the program name, version number, procedure
name and tk_action as human-readable variable-length text fields rather
than columnar numbers.

Doing the symbol lookup here helps in cases where we have actual
debugging output from a kernel log, but don't have access to the kernel
image or RPC module that generated the output.

Sample output:

 -pid- flgs status -client- --rqstp- -timeout ---ops--
  5608 0001    -11 eeb42690 f6d93710        0 f8fa1764 nfsv3 WRITE a:call_transmit_status q:none
  5609 0001    -11 eeb42690 f6d937e0        0 f8fa1764 nfsv3 WRITE a:call_status q:xprt_sending
  5610 0001    -11 eeb42690 f6d93230        0 f8fa1764 nfsv3 WRITE a:call_status q:xprt_sending
  5611 0001    -11 eeb42690 f6d93300        0 f8fa1764 nfsv3 WRITE a:call_status q:xprt_sending
  5612 0001    -11 eeb42690 f6d93090        0 f8fa1764 nfsv3 WRITE a:call_status q:xprt_sending
  5613 0001    -11 eeb42690 f6d933d0        0 f8fa1764 nfsv3 WRITE a:call_status q:xprt_sending
  5614 0001    -11 eeb42690 f6d93cc0        0 f8fa1764 nfsv3 WRITE a:call_status q:xprt_sending
  5615 0001    -11 eeb42690 f6d93a50        0 f8fa1764 nfsv3 WRITE a:call_status q:xprt_sending
  5616 0001    -11 eeb42690 f6d93640        0 f8fa1764 nfsv3 WRITE a:call_status q:xprt_sending
  5617 0001    -11 eeb42690 f6d93b20        0 f8fa1764 nfsv3 WRITE a:call_status q:xprt_sending
  5618 0001    -11 eeb42690 f6d93160        0 f8fa1764 nfsv3 WRITE a:call_status q:xprt_sending

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-07-09 12:08:58 -04:00
Chuck Lever
38e886e0c1 SUNRPC: Refactor rpc_show_tasks
Clean up: move the logic that displays each task to its own function.
This removes indentation and makes future changes easier.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-07-09 12:08:56 -04:00
Chuck Lever
68a23ee94e SUNRPC: Don't display the rpc_show_tasks header if there are no tasks
Clean up: don't display the rpc_show_tasks column header unless there is at
least one task to display.  As far as I can tell, it is safe to let the
list_for_each_entry macro decide that each list is empty.

scripts/checkpatch.pl also wants a KERN_FOO at the start of any newly added
printk() calls, so this and subsequent patches will also add KERN_INFO.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-07-09 12:08:55 -04:00
Chuck Lever
b0e1c57ea0 SUNRPC: Rename "call_" functions that are no longer FSM states
The RPC client uses a finite state machine to move RPC tasks through each
step of an RPC request.  Each state is contained in a function in
net/sunrpc/clnt.c, and named call_foo.

Some of the functions named call_foo have changed over the past few years and
are no longer states in the FSM.  These include: call_encode, call_header,
and call_verify.  As a clean up, rename the functions that have changed.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-07-09 12:08:53 -04:00
Chuck Lever
3748f1e447 SUNRPC: Add a function to display the name of an RPC procedure
Improve debugging messages in call_start() and call_verify() by having
them show the RPC procedure name instead of the procedure number.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-07-09 12:08:53 -04:00
Trond Myklebust
0f38b873ae SUNRPC: Use GFP_NOFS when allocating credentials
Since the credentials may be allocated during the call to rpc_new_task(),
which again may be called by a memory allocator...

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-07-09 12:08:48 -04:00
Trond Myklebust
b390c2b55c SUNRPC: An ENOMEM error from call_encode is always fatal
The special 'ENOMEM' case that was previously flagged as non-fatal is
bogus: auth_gss always returns EAGAIN for non-fatal errors, and may in fact
return ENOMEM in the special case where xdr_buf_read_netobj runs out of
preallocated buffer space (invariably a _fatal_ error, since there is no
provision for preallocating larger buffers).

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-07-09 12:08:43 -04:00
Trond Myklebust
8b39f2b410 SUNRPC: Ensure we exit early in case of an encode error
All errors from call_encode(), with exception of EAGAIN are fatal, so we
should immediately return instead of proceeding to xprt_transmit().

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-07-09 12:08:41 -04:00
Trond Myklebust
803a9067e1 SUNRPC: Fix an rpcbind breakage for the case of IPv6 lookups
Now that rpcb_next_version has been split into an IPv4 version and an IPv6
version, we Oops when rpcb_call_async attempts to look up the IPv6-specific
RPC procedure in rpcb_next_version.

Fix the Oops simply by having rpcb_getport_async pass the correct RPC
procedure as an argument.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-07-08 15:23:10 -04:00
Trond Myklebust
0d3a34b48c SUNRPC: Fix a double-free in rpcbind
It is wrong to be freeing up the rpcbind arguments if the call to
rpcb_call_async() fails, since they should already have been freed up by
rpcb_map_release().

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-07-08 15:23:00 -04:00
Ingo Molnar
68083e05d7 Merge commit 'v2.6.26-rc9' into cpus4096 2008-07-06 14:23:39 +02:00
J. Bruce Fields
e86322f611 Merge branch 'for-bfields' of git://linux-nfs.org/~tomtucker/xprt-switch-2.6 into for-2.6.27 2008-07-03 16:24:06 -04:00
J. Bruce Fields
b620754bfe svcrpc: fix handling of garbage args
To return garbage_args, the accept_stat must be 0, and we must have a
verifier.  So we shouldn't be resetting the write pointer as we reject
the call.

Also, we must add the two placeholder words here regardless of success
of the unwrap, to ensure the output buffer is left in a consistent state
for svcauth_gss_release().

This fixes a BUG() in svcauth_gss.c:svcauth_gss_release().

Thanks to Aime Le Rouzic for bug report, debugging help, and testing.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Tested-by: Aime Le Rouzic <aime.le-rouzic@bull.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-03 12:46:56 -07:00
Tom Tucker
8948896c9e svcrdma: Change WR context get/put to use the kmem cache
Change the WR context pool to be shared across mount points. This
reduces the RDMA transport memory footprint significantly since
idle mounts don't consume WR context memory.

Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
2008-07-02 15:02:02 -05:00
Tom Tucker
bf5927d84e svcrdma: Create a kmem cache for the WR contexts
Create a kmem cache to hold WR contexts. Next we will convert
the WR context get and put services to use this kmem cache.

Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
2008-07-02 15:02:01 -05:00
Tom Tucker
902a94e088 svcrdma: Add flush_scheduled_work to module exit function
Make certain all transports pending free are flushed from the wq
before unloading the module.

Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
2008-07-02 15:02:00 -05:00
Tom Tucker
36ef25e464 svcrdma: Limit ORD based on client's advertised IRD
When adapters have differing IRD limits, the RDMA transport will fail to
connect properly. The RDMA transport should use the client's advertised
inbound read limit when computing its outbound read limit. For iWARP
transports, there is currently no standard for exchanging IRD/ORD
during connection establishment so the 'responder_resources' field in the
connect event is the local device's limit. The RDMA transport can be
configured to use a smaller ORD by writing the desired number to the
/proc/sys/sunrpc/svc_rdma/max_outbound_read_requests file.

Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
2008-07-02 15:01:59 -05:00
Tom Tucker
94dba4918d svcrdma: Remove unneeded spin locks from __svc_rdma_free
At the time __svc_rdma_free is called, we are guaranteed that all references
to this transport are gone. There is, therefore, no need to protect the
resource lists with a spin lock.

Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
2008-07-02 15:01:57 -05:00
Tom Tucker
87295b6c5c svcrdma: Add dma map count and WARN_ON
Add a dma map count in order to verify that all DMA mapping resources
have been freed when the transport is closed.

Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
2008-07-02 15:01:56 -05:00
Tom Tucker
e6ab914371 svcrdma: Move the DMA unmap logic to the CQ handler
Separate DMA unmap from context destruction and perform DMA unmapping
in the SQ/RQ CQ reap functions. This is necessary to support software
based RDMA implementations that actually copy the data in their
ib_dma_unmap callback functions and architectures that don't have
cache coherent I/O busses.

Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
2008-07-02 15:01:55 -05:00
Tom Tucker
f820c57ebf svcrdma: Use reply and chunk map for RDMA_READ processing
Modify the RDMA_READ processing to use the reply and chunk list mapping data
types. Also add a special purpose 'hdr_count' field in in the context to hold
the header page count instead of overloading the SGE length field and
corrupting the DMA map length.

Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
2008-07-02 15:01:55 -05:00
Tom Tucker
34d16e42a6 svcrdma: Use RPC reply map for RDMA_WRITE processing
Use the new svc_rdma_req_map data type for mapping the client side memory
to the server side memory. Move the DMA mapping to the context pointed to
by each WR individually so that it is unmapped after the WR completes.

Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
2008-07-02 15:01:54 -05:00
Tom Tucker
ab96dddbed svcrdma: Add a type for keeping NFS RPC mapping
Create a new data structure to hold the remote client address space
to local server address space mapping.

Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
2008-07-02 15:01:53 -05:00
Kevin Coffman
863a24882e gss_krb5: Use random value to initialize confounder
Initialize the value used for the confounder to a random value
rather than starting from zero.
Allow for confounders of length 8 or 16 (which will be needed for AES).

Signed-off-by: Kevin Coffman <kwc@citi.umich.edu>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-06-23 13:47:38 -04:00
Kevin Coffman
db8add5789 gss_krb5: move gss_krb5_crypto into the krb5 module
The gss_krb5_crypto.o object belongs in the rpcsec_gss_krb5 module.
Also, there is no need to export symbols from gss_krb5_crypto.c

Signed-off-by: Kevin Coffman <kwc@citi.umich.edu>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-06-23 13:47:32 -04:00
Kevin Coffman
d00953a53e gss_krb5: create a define for token header size and clean up ptr location
cleanup:
Document token header size with a #define instead of open-coding it.

Don't needlessly increment "ptr" past the beginning of the header
which makes the values passed to functions more understandable and
eliminates the need for extra "krb5_hdr" pointer.

Clean up some intersecting  white-space issues flagged by checkpatch.pl.

This leaves the checksum length hard-coded at 8 for DES.  A later patch
cleans that up.

Signed-off-by: Kevin Coffman <kwc@citi.umich.edu>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-06-23 13:47:25 -04:00
Jeff Layton
a75c5d01e4 sunrpc: remove sv_kill_signal field from svc_serv struct
Since we no longer make any distinction between shutdown signals with
nfsd, then it becomes easier to just standardize on a particular signal
to use to bring it down (SIGINT, in this case).

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-06-23 13:02:49 -04:00
Jeff Layton
9867d76ca1 knfsd: convert knfsd to kthread API
This patch is rather large, but I couldn't figure out a way to break it
up that would remain bisectable. It does several things:

- change svc_thread_fn typedef to better match what kthread_create expects
- change svc_pool_map_set_cpumask to be more kthread friendly. Make it
  take a task arg and and get rid of the "oldmask"
- have svc_set_num_threads call kthread_create directly
- eliminate __svc_create_thread

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-06-23 13:02:49 -04:00
Neil Brown
bedbdd8bad knfsd: Replace lock_kernel with a mutex for nfsd thread startup/shutdown locking.
This removes the BKL from the RPC service creation codepath. The BKL
really isn't adequate for this job since some of this info needs
protection across sleeps.

Also, add some comments to try and clarify how the locking should work
and to make it clear that the BKL isn't necessary as long as there is
adequate locking between tasks when touching the svc_serv fields.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-06-23 13:02:49 -04:00
Adrian Bunk
0b04082995 net: remove CVS keywords
This patch removes CVS keywords that weren't updated for a long time
from comments.

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-11 21:00:38 -07:00
Mike Travis
3f9b48a758 net: Pass reference to cpumask variable in net/sunrpc/svc.c
* Pass reference to cpumask variable instead of using stack.

For inclusion into sched-devel/latest tree.

Based on:
	git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git
    +   sched-devel/latest  .../mingo/linux-2.6-sched-devel.git

Signed-off-by: Mike Travis <travis@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-23 18:44:36 +02:00
Linus Torvalds
d40ace0c7b Merge branch 'for-2.6.26' of git://linux-nfs.org/~bfields/linux
* 'for-2.6.26' of git://linux-nfs.org/~bfields/linux: (25 commits)
  svcrdma: Verify read-list fits within RPCSVC_MAXPAGES
  svcrdma: Change svc_rdma_send_error return type to void
  svcrdma: Copy transport address and arm CQ before calling rdma_accept
  svcrdma: Set rqstp transport address in rdma_read_complete function
  svcrdma: Use ib verbs version of dma_unmap
  svcrdma: Cleanup queued, but unprocessed I/O in svc_rdma_free
  svcrdma: Move the QP and cm_id destruction to svc_rdma_free
  svcrdma: Add reference for each SQ/RQ WR
  svcrdma: Move destroy to kernel thread
  svcrdma: Shrink scope of spinlock on RQ CQ
  svcrdma: Use standard Linux lists for context cache
  svcrdma: Simplify RDMA_READ deferral buffer management
  svcrdma: Remove unused READ_DONE context flags bit
  svcrdma: Return error from rdma_read_xdr so caller knows to free context
  svcrdma: Fix error handling during listening endpoint creation
  svcrdma: Free context on post_recv error in send_reply
  svcrdma: Free context on ib_post_recv error
  svcrdma: Add put of connection ESTABLISHED reference in rdma_cma_handler
  svcrdma: Fix return value in svc_rdma_send
  svcrdma: Fix race with dto_tasklet in svc_rdma_send
  ...
2008-05-20 19:30:54 -07:00
J. Bruce Fields
68432a03f8 Merge branch 'from-tomtucker' into for-2.6.26 2008-05-20 19:57:38 -04:00
Tom Tucker
a6f911c04e svcrdma: Verify read-list fits within RPCSVC_MAXPAGES
A RDMA read-list cannot contain more elements than RPCSVC_MAXPAGES or
it will overflow the DTO context. Verify this when processing the
protocol header.

Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
2008-05-19 07:34:02 -05:00
Tom Tucker
008fdbc571 svcrdma: Change svc_rdma_send_error return type to void
The svc_rdma_send_error function is called when an RPCRDMA protocol
error is detected. This function attempts to post an error reply message.
Since an error posting to a transport in error is ignored, change
the return type to void.

Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
2008-05-19 07:34:01 -05:00
Tom Tucker
af261af4db svcrdma: Copy transport address and arm CQ before calling rdma_accept
This race was found by inspection. Messages can be received from the peer
immediately following the rdma_accept call, however, the CQ have not yet
been armed and the transport address has not yet been set.

Set the transport address in the connect request handler and arm the CQ
prior to calling rdma_accept.

Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
2008-05-19 07:34:00 -05:00
Tom Tucker
69500c43b4 svcrdma: Set rqstp transport address in rdma_read_complete function
The rdma_read_complete function needs to copy the rqstp transport address
from the transport. Failure to do so can result in using the wrong
authentication method for the RPC or bug checking if the rqstp address
is not valid.

Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
2008-05-19 07:33:59 -05:00
Tom Tucker
97a3df382e svcrdma: Use ib verbs version of dma_unmap
Use the ib_verbs version of the dma_unmap service in the
svc_rdma_put_context function. This should support providers
using software rdma.

Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
2008-05-19 07:33:58 -05:00
Tom Tucker
356d0a1519 svcrdma: Cleanup queued, but unprocessed I/O in svc_rdma_free
When the transport is closing, the DTO tasklet may queue data
that never gets processed. Clean up resources associated with
this I/O.

Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
2008-05-19 07:33:57 -05:00
Tom Tucker
1711386c62 svcrdma: Move the QP and cm_id destruction to svc_rdma_free
Move the destruction of the QP and CM_ID to the free path so that the
QP cleanup code doesn't race with the dto_tasklet handling flushed WR.
The QP reference is not needed because we now have a reference for
every WR.

Also add a guard in the SQ and RQ completion handlers to ignore
calls generated by some providers when the QP is destroyed.

Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
2008-05-19 07:33:56 -05:00
Tom Tucker
0905c0f0a2 svcrdma: Add reference for each SQ/RQ WR
Add a reference on the transport for every outstanding WR.

Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
2008-05-19 07:33:55 -05:00
Tom Tucker
8da91ea8de svcrdma: Move destroy to kernel thread
Some providers may wait while destroying adapter resources.
Since it is possible that the last reference is put on the
dto_tasklet, the actual destroy must be scheduled as a work item.

Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
2008-05-19 07:33:54 -05:00
Tom Tucker
47698e083e svcrdma: Shrink scope of spinlock on RQ CQ
The rq_cq_reap function is only called from the dto_tasklet. The
only resource shared with other threads is the sc_rq_dto_q. Move the
spin lock to protect only this list.

Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
2008-05-19 07:33:53 -05:00
Tom Tucker
8740767376 svcrdma: Use standard Linux lists for context cache
Replace the one-off linked list implementation used to implement the
context cache with the standard Linux list_head lists. Add a context
counter to catch resource leaks. A WARN_ON will be added later to
ensure that we've freed all contexts.

Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
2008-05-19 07:33:52 -05:00
Tom Tucker
02e7452de7 svcrdma: Simplify RDMA_READ deferral buffer management
An NFS_WRITE requires a set of RDMA_READ requests to fetch the write
data from the client. There are two principal pieces of data that
need to be tracked: the list of pages that comprise the completed RPC
and the SGE of dma mapped pages to refer to this list of pages. Previously
this whole bit was managed as a linked list of contexts with the
context containing the page list buried in this list. This patch
simplifies this processing by not keeping a linked list, but rather only
a pionter from the last submitted RDMA_READ's context to the context
that maps the set of pages that describe the RPC.  This significantly
simplifies this code path. SGE contexts are cleaned up inline in the DTO
path instead of at read completion time.

Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
2008-05-19 07:33:51 -05:00
Tom Tucker
10a38c33f4 svcrdma: Remove unused READ_DONE context flags bit
The RDMACTXT_F_READ_DONE bit is not longer used. Remove it.

Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
2008-05-19 07:33:50 -05:00
Tom Tucker
d16d40093a svcrdma: Return error from rdma_read_xdr so caller knows to free context
The rdma_read_xdr function did not discriminate between no read-list and
an error posting the read-list. This results in a leak of a page if there
is an error posting the read-list.

Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
2008-05-19 07:33:49 -05:00
Tom Tucker
58e8f62137 svcrdma: Fix error handling during listening endpoint creation
A listening endpoint isn't known to the generic transport switch until
the svc_create_xprt function returns without error. Calling
svc_xprt_put within the xpo_create function causes the module reference
count to be erroneously decremented.

Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
2008-05-19 07:33:48 -05:00
Tom Tucker
5ac461a6f0 svcrdma: Free context on post_recv error in send_reply
If an error is encountered trying to post a recv buffer in send_reply,
free the passed in context. Return an error to the caller so it is
aware that the request was not posted.

Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
2008-05-19 07:33:47 -05:00
Tom Tucker
05a0826a6e svcrdma: Free context on ib_post_recv error
If there is an error posting the recv WR to the RQ, free the
context associated with the WR. This would leak a context when
asynchronous errors occurred on the transport while conccurent threads
were processing their RPC.

Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
2008-05-19 07:33:47 -05:00
Tom Tucker
120693d12c svcrdma: Add put of connection ESTABLISHED reference in rdma_cma_handler
The svcrdma transport takes a reference when it gets the ESTABLISHED
event from the provider. This reference is supposed to be removed when
the DISCONNECT event is received, however, the call to svc_xprt_put
was missing in the switch statement. This results in the memory
associated with the transport never being freed.

Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
2008-05-19 07:33:46 -05:00
Tom Tucker
9d6347acd2 svcrdma: Fix return value in svc_rdma_send
Fix the return value on close to -ENOTCONN so caller knows to free context.
Also if a thread is waiting for free SQ space, check for close when waking
to avoid posting WR to a closing transport.

Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
2008-05-19 07:33:45 -05:00
Tom Tucker
dbcd00eba9 svcrdma: Fix race with dto_tasklet in svc_rdma_send
The svc_rdma_send function will attempt to reap SQ WR to make room for
a new request if it finds the SQ full. This function races with the
dto_tasklet that also reaps SQ WR. To avoid polling and arming the CQ
unnecessarily move the test_and_clear_bit of the RDMAXPRT_SQ_PENDING
flag and arming of the CQ to the sq_cq_reap function.

Refactor the rq_cq_reap function to match sq_cq_reap so that the
code is easier to follow.

Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
2008-05-19 07:33:44 -05:00
Tom Tucker
0e7f011a19 svcrdma: Simplify receive buffer posting
The svcrdma transport provider currently allocates receive buffers
to the RQ through the xpo_release_rqst method. This approach is overly
complicated since it means that the rqstp rq_xprt_ctxt has to be
selectively set based on whether the RPC is going to be processed
immediately or deferred. Instead, just post the receive buffer when
we are certain that we are replying in the send_reply function.

Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
2008-05-19 07:33:43 -05:00
Tom Tucker
aa3314c8d6 svc: Remove unused header files from svc_xprt.c
This cosmetic patch removes unused header files that svc_xprt.c
inherited from svcsock.c

Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
2008-05-19 07:33:42 -05:00
Tom Tucker
fc63a05086 svc: Remove extra check for XPT_DEAD bit in svc_xprt_enqueue
Remove a redundant check for the XPT_DEAD bit in the svc_xprt_enqueue
function. This same bit is checked below while holding the pool lock
and prints a debug message if found to be dead.

Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
2008-05-19 07:33:41 -05:00