Commit graph

363539 commits

Author SHA1 Message Date
Mengdong Lin
2ef5692efa ALSA: hda - bug fix on return value when getting HDMI ELD info
In function snd_hdmi_get_eld(), the variable 'ret' should be initialized to 0.
Otherwise it will be returned uninitialized as non-zero after ELD info is got
successfully. Thus hdmi_present_sense() will always assume ELD info is invalid
by mistake, and /proc file system cannot show the proper ELD info.

Signed-off-by: Mengdong Lin <mengdong.lin@intel.com>
Cc: stable@vger.kernel.org
Acked-by: David Henningsson <david.henningsson@canonical.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
2013-04-02 11:55:23 +02:00
Takashi Iwai
ea019fdf1a ASoC: Fixes for v3.9
A few more fixes here and there, including quite a few nasty driver
 specific ones, but nothing that has a major general impact.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIcBAABAgAGBQJRUawIAAoJELSic+t+oim9RqsP/RGgUqjX68ewq4Ub9BbhsYGa
 Xz4dCN6LACQVXvgNlMhR/TypaJ+vJ3xjlgb/WXNDRYPnEoffoqzDhZOwIaJH8rPR
 wGvZGs+UZNuPeIwZ3TF3o0Ct3XYPzzPZPmLnZKL+ZWVgggbHgqGJmkK7smB1koNU
 ppnqhPaid6DIlTcSP3jHsUCxcahtCUDUMNPqIzQx7k1QquucUXvs7HWWXaboqyh3
 aci3NtDtaMJwM/qheBz0rxv0bk7FrtrukSdN0muSKmcd2Fy8VuwZWIXdB467gXUs
 I46ZEOxV2XM+/Tq7HfzLiTu35cgCwXv3z7ayr6Z5ZXjn8B1FnDmqanh3XaawwKbr
 Bx53x1X1ga83ms61z+S9QOZ8K74EcJcL6wE77QxsnuaOkgHZYdoZNC7u8i5+cHhN
 Lr9ymozK52dH1uJo0/kWvYmK/3SFeHDOMVGiBVVeRQw81PKEPrPZid7veL10UUoG
 WX224U1HeuH+0+3scFkaomd4mzhPoj//seIM+806OF4dzsiTLHC7SFU4i3HY23Fx
 OH9LJUGvr3FygNqgbdsuO3jEu5N5lXn6zY/Dumi7dLga75vtn0llhXqgM6ec1CLF
 jvlHN5UUN8hnxam4SBme5AcX2+rUbEXjD8VA82Y2NOtRWbkiYI5KYINj+SKR0lt+
 uRfo30zqYhhaUwmCfXcI
 =rovv
 -----END PGP SIGNATURE-----

Merge tag 'asoc-fix-v3.9-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus

ASoC: Fixes for v3.9

A few more fixes here and there, including quite a few nasty driver
specific ones, but nothing that has a major general impact.
2013-04-02 11:44:51 +02:00
Heiko Carstens
765a0cac56 s390/mm: provide emtpy check_pgt_cache() function
All architectures need to provide a check_pgt_cache() function. The s390 one
got lost somewhere.
So reintroduce it to prevent future compile errors e.g. if Thomas Gleixner's
idle loop rework patches get merged.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2013-04-02 08:53:11 +02:00
Heiko Carstens
ea81531de2 s390/uaccess: fix page table walk
When translating user space addresses to kernel addresses the follow_table()
function had two bugs:

- PROT_NONE mappings could be read accessed via the kernel mapping. That is
  e.g. putting a filename into a user page, then protecting the page with
  PROT_NONE and afterwards issuing the "open" syscall with a pointer to
  the filename would incorrectly succeed.

- when walking the page tables it used the pgd/pud/pmd/pte primitives which
  with dynamic page tables give no indication which real level of page tables
  is being walked (region2, region3, segment or page table). So in case of an
  exception the translation exception code passed to __handle_fault() is not
  necessarily correct.
  This is not really an issue since __handle_fault() doesn't evaluate the code.
  Only in case of e.g. a SIGBUS this code gets passed to user space. If user
  space can do something sane with the value is a different question though.

To fix these issues don't use any Linux primitives. Only walk the page tables
like the hardware would do it, however we leave quite some checks away since
we know that we only have full size page tables and each index is within bounds.

In theory this should fix all issues...

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Reviewed-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2013-04-02 08:53:08 +02:00
Rafael J. Wysocki
29896178cf ACPI / SPI: Use parent's ACPI_HANDLE() in acpi_register_spi_devices()
The ACPI handle of struct spi_master's dev member should not be
set, because this causes that struct spi_master to be associated
with the ACPI device node corresponding to its parent as the
second "physical_device", which is incorrect (this happens during
the registration of struct spi_master).  Consequently,
acpi_register_spi_devices() should use the ACPI handle of the
parent of the struct spi_master it is called for rather than that
struct spi_master's ACPI handle (which should be NULL).

Make that happen and modify the spi-pxa2xx driver, which currently is
the only driver for ACPI-enumerated SPI controller chips, not to set
the ACPI handle for the struct spi_master it creates.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Acked-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
2013-04-02 01:55:45 +02:00
Paolo Pisati
f5c3ef21db cpufreq: check OF node /cpus presence before dereferencing it
Check for the presence of the '/cpus' OF node before dereferencing it
blindly:

[    4.181793] Unable to handle kernel NULL pointer dereference at virtual address 0000001c
[    4.181793] pgd = c0004000
[    4.181823] [0000001c] *pgd=00000000
[    4.181823] Internal error: Oops: 5 [#1] SMP ARM
[    4.181823] Modules linked in:
[    4.181823] CPU: 1    Tainted: G        W     (3.8.0-15-generic #25~hbankD)
[    4.181854] PC is at of_get_next_child+0x64/0x70
[    4.181854] LR is at of_get_next_child+0x24/0x70
[    4.181854] pc : [<c04fda18>]    lr : [<c04fd9d8>]    psr: 60000113
[    4.181854] sp : ed891ec0  ip : ed891ec0  fp : ed891ed4
[    4.181884] r10: c04dafd0  r9 : c098690c  r8 : c0936208
[    4.181884] r7 : ed890000  r6 : c0a63d00  r5 : 00000000  r4 : 00000000
[    4.181884] r3 : 00000000  r2 : 00000000  r1 : 00000000  r0 : c0b2acc8
[    4.181884] Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment kernel
[    4.181884] Control: 10c5387d  Table: adcb804a  DAC: 00000015
[    4.181915] Process swapper/0 (pid: 1, stack limit = 0xed890238)
[    4.181915] Stack: (0xed891ec0 to 0xed892000)
[    4.181915] 1ec0: c09b7b70 00000007 ed891efc ed891ed8 c04daff4 c04fd9c0 00000000 c09b7b70
[    4.181915] 1ee0: 00000007 c0a63d00 ed890000 c0936208 ed891f54 ed891f00 c00088e0 c04dafdc
[    4.181945] 1f00: ed891f54 ed891f10 c006e940 00000000 00000000 00000007 00000007 c08a4914
[    4.181945] 1f20: 00000000 c07dbd30 c0a63d00 c09b7b70 00000007 c0a63d00 000000bc c0936208
[    4.181945] 1f40: c098690c c0986914 ed891f94 ed891f58 c0936a40 c00087bc 00000007 00000007
[    4.181976] 1f60: c0936208 be8bda20 b6eea010 c0a63d00 c064547c 00000000 00000000 00000000
[    4.181976] 1f80: 00000000 00000000 ed891fac ed891f98 c0645498 c09368c8 00000000 00000000
[    4.181976] 1fa0: 00000000 ed891fb0 c0014658 c0645488 00000000 00000000 00000000 00000000
[    4.182006] 1fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[    4.182006] 1fe0: 00000000 00000000 00000000 00000000 00000013 00000000 00000000 00000000
[    4.182037] [<c04fda18>] (of_get_next_child+0x64/0x70) from [<c04daff4>] (cpu0_cpufreq_driver_init+0x24/0x284)
[    4.182067] [<c04daff4>] (cpu0_cpufreq_driver_init+0x24/0x284) from [<c00088e0>] (do_one_initcall+0x130/0x1b0)
[    4.182067] [<c00088e0>] (do_one_initcall+0x130/0x1b0) from [<c0936a40>] (kernel_init_freeable+0x184/0x24c)
[    4.182098] [<c0936a40>] (kernel_init_freeable+0x184/0x24c) from [<c0645498>] (kernel_init+0x1c/0xf4)
[    4.182128] [<c0645498>] (kernel_init+0x1c/0xf4) from [<c0014658>] (ret_from_fork+0x14/0x20)
[    4.182128] Code: f57ff04f e320f004 e89da830 e89da830 (e595001c)
[    4.182128] ---[ end trace 634903a22e8609cb ]---
[    4.182189] Kernel panic - not syncing: Attempted to kill init!  exitcode=0x0000000b
[    4.182189]
[    4.642395] CPU0: stopping

[rjw: Changelog]
Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-04-02 01:36:08 +02:00
Rajagopal Venkat
5faaa035dd PM / devfreq: Fix compiler warnings for CONFIG_PM_DEVFREQ unset
Fix compiler warnings generated when devfreq is not enabled
(CONFIG_PM_DEVFREQ is not set).

Signed-off-by: Rajagopal Venkat <rajagopal.venkat@linaro.org>
Acked-by: MyungJoo Ham <myungjoo.ham@samsung.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-04-02 01:28:41 +02:00
holger@eitzenberger.org
5c33448c40 netfilter: xt_NFQUEUE: coalesce IPv4 and IPv6 hashing
Because rev1 and rev3 of the target share the same hashing
generalize it by introduing nfqueue_hash().

Signed-off-by: Holger Eitzenberger <holger@eitzenberger.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-04-02 01:26:10 +02:00
holger@eitzenberger.org
8746ddcf12 netfilter: xt_NFQUEUE: introduce CPU fanout
Current NFQUEUE target uses a hash, computed over source and
destination address (and other parameters), for steering the packet
to the actual NFQUEUE. This, however forgets about the fact that the
packet eventually is handled by a particular CPU on user request.

If E. g.

  1) IRQ affinity is used to handle packets on a particular CPU already
     (both single-queue or multi-queue case)

and/or

  2) RPS is used to steer packets to a specific softirq

the target easily chooses an NFQUEUE which is not handled by a process
pinned to the same CPU.

The idea is therefore to use the CPU index for determining the
NFQUEUE handling the packet.

E. g. when having a system with 4 CPUs, 4 MQ queues and 4 NFQUEUEs it
looks like this:

 +-----+  +-----+  +-----+  +-----+
 |NFQ#0|  |NFQ#1|  |NFQ#2|  |NFQ#3|
 +-----+  +-----+  +-----+  +-----+
    ^        ^        ^        ^
    |        |NFQUEUE |        |
    +        +        +        +
 +-----+  +-----+  +-----+  +-----+
 |rx-0 |  |rx-1 |  |rx-2 |  |rx-3 |
 +-----+  +-----+  +-----+  +-----+

The NFQUEUEs not necessarily have to start with number 0, setups with
less NFQUEUEs than packet-handling CPUs are not a problem as well.

This patch extends the NFQUEUE target to accept a new
NFQ_FLAG_CPU_FANOUT flag. If this is specified the target uses the
CPU index for determining the NFQUEUE being used. I have to introduce
rev3 for this. The 'flags' are folded into _v2 'bypass'.

By changing the way which queue is assigned, I'm able to improve the
performance if the processes reading on the NFQUEUs are pinned
correctly.

Signed-off-by: Holger Eitzenberger <holger@eitzenberger.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-04-02 01:25:44 +02:00
Rafael J. Wysocki
0f70306929 PM / QoS: Avoid possible deadlock related to sysfs access
Commit b81ea1b (PM / QoS: Fix concurrency issues and memory leaks in
device PM QoS) put calls to pm_qos_sysfs_add_latency(),
pm_qos_sysfs_add_flags(), pm_qos_sysfs_remove_latency(), and
pm_qos_sysfs_remove_flags() under dev_pm_qos_mtx, which was a
mistake, because it may lead to deadlocks in some situations.
For example, if pm_qos_remote_wakeup_store() is run in parallel
with dev_pm_qos_constraints_destroy(), they may deadlock in the
following way:

 ======================================================
 [ INFO: possible circular locking dependency detected ]
 3.9.0-rc4-next-20130328-sasha-00014-g91a3267 #319 Tainted: G        W
 -------------------------------------------------------
 trinity-child6/12371 is trying to acquire lock:
  (s_active#54){++++.+}, at: [<ffffffff81301631>] sysfs_addrm_finish+0x31/0x60

 but task is already holding lock:
  (dev_pm_qos_mtx){+.+.+.}, at: [<ffffffff81f07cc3>] dev_pm_qos_constraints_destroy+0x23/0x250

 which lock already depends on the new lock.

 the existing dependency chain (in reverse order) is:

 -> #1 (dev_pm_qos_mtx){+.+.+.}:
        [<ffffffff811811da>] lock_acquire+0x1aa/0x240
        [<ffffffff83dab809>] __mutex_lock_common+0x59/0x5e0
        [<ffffffff83dabebf>] mutex_lock_nested+0x3f/0x50
        [<ffffffff81f07f2f>] dev_pm_qos_update_flags+0x3f/0xc0
        [<ffffffff81f05f4f>] pm_qos_remote_wakeup_store+0x3f/0x70
        [<ffffffff81efbb43>] dev_attr_store+0x13/0x20
        [<ffffffff812ffdaa>] sysfs_write_file+0xfa/0x150
        [<ffffffff8127f2c1>] __kernel_write+0x81/0x150
        [<ffffffff812afc2d>] write_pipe_buf+0x4d/0x80
        [<ffffffff812af57c>] splice_from_pipe_feed+0x7c/0x120
        [<ffffffff812afa25>] __splice_from_pipe+0x45/0x80
        [<ffffffff812b14fc>] splice_from_pipe+0x4c/0x70
        [<ffffffff812b1538>] default_file_splice_write+0x18/0x30
        [<ffffffff812afae3>] do_splice_from+0x83/0xb0
        [<ffffffff812afb2e>] direct_splice_actor+0x1e/0x20
        [<ffffffff812b0277>] splice_direct_to_actor+0xe7/0x200
        [<ffffffff812b15bc>] do_splice_direct+0x4c/0x70
        [<ffffffff8127eda9>] do_sendfile+0x169/0x300
        [<ffffffff8127ff94>] SyS_sendfile64+0x64/0xb0
        [<ffffffff83db7d18>] tracesys+0xe1/0xe6

 -> #0 (s_active#54){++++.+}:
        [<ffffffff811800cf>] __lock_acquire+0x15bf/0x1e50
        [<ffffffff811811da>] lock_acquire+0x1aa/0x240
        [<ffffffff81300aa2>] sysfs_deactivate+0x122/0x1a0
        [<ffffffff81301631>] sysfs_addrm_finish+0x31/0x60
        [<ffffffff812ff77f>] sysfs_hash_and_remove+0x7f/0xb0
        [<ffffffff813035a1>] sysfs_unmerge_group+0x51/0x70
        [<ffffffff81f068f4>] pm_qos_sysfs_remove_flags+0x14/0x20
        [<ffffffff81f07490>] __dev_pm_qos_hide_flags+0x30/0x70
        [<ffffffff81f07cd5>] dev_pm_qos_constraints_destroy+0x35/0x250
        [<ffffffff81f06931>] dpm_sysfs_remove+0x11/0x50
        [<ffffffff81efcf6f>] device_del+0x3f/0x1b0
        [<ffffffff81efd128>] device_unregister+0x48/0x60
        [<ffffffff82d4083c>] usb_hub_remove_port_device+0x1c/0x20
        [<ffffffff82d2a9cd>] hub_disconnect+0xdd/0x160
        [<ffffffff82d36ab7>] usb_unbind_interface+0x67/0x170
        [<ffffffff81f001a7>] __device_release_driver+0x87/0xe0
        [<ffffffff81f00559>] device_release_driver+0x29/0x40
        [<ffffffff81effc58>] bus_remove_device+0x148/0x160
        [<ffffffff81efd07f>] device_del+0x14f/0x1b0
        [<ffffffff82d344f9>] usb_disable_device+0xf9/0x280
        [<ffffffff82d34ff8>] usb_set_configuration+0x268/0x840
        [<ffffffff82d3a7fc>] usb_remove_store+0x4c/0x80
        [<ffffffff81efbb43>] dev_attr_store+0x13/0x20
        [<ffffffff812ffdaa>] sysfs_write_file+0xfa/0x150
        [<ffffffff8127f71d>] do_loop_readv_writev+0x4d/0x90
        [<ffffffff8127f999>] do_readv_writev+0xf9/0x1e0
        [<ffffffff8127faba>] vfs_writev+0x3a/0x60
        [<ffffffff8127fc60>] SyS_writev+0x50/0xd0
        [<ffffffff83db7d18>] tracesys+0xe1/0xe6

 other info that might help us debug this:

  Possible unsafe locking scenario:

        CPU0                    CPU1
        ----                    ----
   lock(dev_pm_qos_mtx);
                                lock(s_active#54);
                                lock(dev_pm_qos_mtx);
   lock(s_active#54);

  *** DEADLOCK ***

To avoid that, remove the calls to functions mentioned above from
under dev_pm_qos_mtx and introduce a separate lock to prevent races
between functions that add or remove device PM QoS sysfs attributes
from happening.

Reported-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-04-02 01:25:24 +02:00
Rafael J. Wysocki
da259465d7 USB / PM: Don't try to hide PM QoS flags from usb_port_device_release()
Remove the call to dev_pm_qos_hide_flags(), added by commit 6e30d7cb
"usb: Add driver/usb/core/(port.c,hub.h) files", from
usb_port_device_release(), because (1) it is completely unnecessary
(the flags have been removed already by the PM core during the
unregistration of the device object) and (2) it triggers a NULL
pointer dereference in sysfs_find_dirent() (dev->kobj.sd is NULL at
this point).

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-04-02 01:25:09 +02:00
Linus Torvalds
fefcdbe4ac One reversion, a tiny leak fix, and a cc:stable locking fix, in two parts.
Thanks,
 Rusty.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABAgAGBQJRWhMkAAoJENkgDmzRrbjxqegP/jDUHptF9Z0xWgZ7W9ecpz8X
 Gsg5UmpD8WKf5c3EMlWjtkXEp4u1f+TNjNKXDgY7yP34Kp9ZgZKBMu9NLoaq72F9
 1Q9KKKXWh4HaKAvXYlVX63toBvsTgkwkohv7ZIWwDyBPFX3JH2GvSFGaC2WXLklM
 6KJaPn4yQSWYDlF3BfGj7m4NNpdec9JlWycVQ2FJ4esjbqZmztjiHrYN5rcWaUaG
 KHrVr0QuBTyt8wTo4FF+qFRCn/aJyvz7Bih9Mns3vGZB8fDs9fc8xMMDNXDpBUop
 rklU6pbPt0FGhjxD+OT3pQPtrZ0QZBIchEdfBcmf8r5kxucKq+FvitvX6l1DhZg1
 2dT5grVSXGICnkH/kDLNuUYLGyssJ75aI7mAYpTwniStkQm0fCFkPc3+JSqvYZTW
 3ht/sK6e7uOfsrhSPAMIa5NaNFjl8D8oJuWuzWspY/DYxG+jEMNV6N2mKENGir3F
 ViAtE+wvvf2gO6hkGGwTyOqYsKERbXlplcihmvV2CNMnslUWxeMlkdhFJ2oZzPr2
 YKOy7oCteWuToq8ZnYf2VZEFIr9/RM7CcDxK11xA2NHqwWq9MEMB9BJB5ay2GVzv
 sgzl6XYtmFDektbDL1WpZCbaqlF8qxcYHKbji/qhGrQl5KbllQYjRfI+HHgcfkax
 oLWAX9hxE2Y8/L4DKktd
 =qmfE
 -----END PGP SIGNATURE-----

Merge tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux

Pull virtio fixes from Rusty Russell:
 "One reversion, a tiny leak fix, and a cc:stable locking fix, in two
  parts"

* tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux:
  virtio: console: add locking around c_ovq operations
  virtio: console: rename cvq_lock to c_ivq_lock
  hw_random: free rng_buffer at module exit
  Revert "virtio_console: Initialize guest_connected=true for rproc_serial"
2013-04-01 16:17:52 -07:00
Gao feng
f016588861 netfilter: use IS_ENABLE to replace if defined in TRACE target
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-04-02 01:03:45 +02:00
Anatol Pomozov
c1681bf8a7 loop: prevent bdev freeing while device in use
struct block_device lifecycle is defined by its inode (see fs/block_dev.c) -
block_device allocated first time we access /dev/loopXX and deallocated on
bdev_destroy_inode. When we create the device "losetup /dev/loopXX afile"
we want that block_device stay alive until we destroy the loop device
with "losetup -d".

But because we do not hold /dev/loopXX inode its counter goes 0, and
inode/bdev can be destroyed at any moment. Usually it happens at memory
pressure or when user drops inode cache (like in the test below). When later in
loop_clr_fd() we want to use bdev we have use-after-free error with following
stack:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000280
  bd_set_size+0x10/0xa0
  loop_clr_fd+0x1f8/0x420 [loop]
  lo_ioctl+0x200/0x7e0 [loop]
  lo_compat_ioctl+0x47/0xe0 [loop]
  compat_blkdev_ioctl+0x341/0x1290
  do_filp_open+0x42/0xa0
  compat_sys_ioctl+0xc1/0xf20
  do_sys_open+0x16e/0x1d0
  sysenter_dispatch+0x7/0x1a

To prevent use-after-free we need to grab the device in loop_set_fd()
and put it later in loop_clr_fd().

The issue is reprodusible on current Linus head and v3.3. Here is the test:

  dd if=/dev/zero of=loop.file bs=1M count=1
  while [ true ]; do
    losetup /dev/loop0 loop.file
    echo 2 > /proc/sys/vm/drop_caches
    losetup -d /dev/loop0
  done

[ Doing bdgrab/bput in loop_set_fd/loop_clr_fd is safe, because every
  time we call loop_set_fd() we check that loop_device->lo_state is
  Lo_unbound and set it to Lo_bound If somebody will try to set_fd again
  it will get EBUSY.  And if we try to loop_clr_fd() on unbound loop
  device we'll get ENXIO.

  loop_set_fd/loop_clr_fd (and any other loop ioctl) is called under
  loop_device->lo_ctl_mutex. ]

Signed-off-by: Anatol Pomozov <anatol.pomozov@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-01 15:48:47 -07:00
Julian Anastasov
ac69269a45 ipvs: do not disable bh for long time
We used a global BH disable in LOCAL_OUT hook.
Add _bh suffix to all places that need it and remove
the disabling from LOCAL_OUT and sync code.

Functions like ip_defrag need protection from
BH, so add it. As for nf_nat_mangle_tcp_packet, it needs
RCU lock.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2013-04-02 00:23:58 +02:00
Julian Anastasov
ceec4c3816 ipvs: convert services to rcu
This is the final step in RCU conversion.

Things that are removed:

- svc->usecnt: now svc is accessed under RCU read lock
- svc->inc: and some unused code
- ip_vs_bind_pe and ip_vs_unbind_pe: no ability to replace PE
- __ip_vs_svc_lock: replaced with RCU
- IP_VS_WAIT_WHILE: now readers lookup svcs and dests under
	RCU and work in parallel with configuration

Other changes:

- before now, a RCU read-side critical section included the
calling of the schedule method, now it is extended to include
service lookup
- ip_vs_svc_table and ip_vs_svc_fwm_table are now using hlist
- svc->pe and svc->scheduler remain to the end (of grace period),
	the schedulers are prepared for such RCU readers
	even after done_service is called but they need
	to use synchronize_rcu because last ip_vs_scheduler_put
	can happen while RCU read-side critical sections
	use an outdated svc->scheduler pointer
- as planned, update_service is removed
- empty services can be freed immediately after grace period.
	If dests were present, the services are freed from
	the dest trash code

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2013-04-02 00:23:58 +02:00
Julian Anastasov
413c2d04e9 ipvs: convert dests to rcu
In previous commits the schedulers started to access
svc->destinations with _rcu list traversal primitives
because the IP_VS_WAIT_WHILE macro still plays the role of
grace period. Now it is time to finish the updating part,
i.e. adding and deleting of dests with _rcu suffix before
removing the IP_VS_WAIT_WHILE in next commit.

We use the same rule for conns as for the
schedulers: dests can be searched in RCU read-side critical
section where ip_vs_dest_hold can be called by ip_vs_bind_dest.

Some things are not perfect, for example, calling
functions like ip_vs_lookup_dest from updating code under
RCU, just because we use some function both from reader
and from updater.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2013-04-02 00:23:57 +02:00
Julian Anastasov
ba3a3ce14e ipvs: convert sched_lock to spin lock
As all read_locks are gone spin lock is preferred.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2013-04-02 00:23:56 +02:00
Julian Anastasov
ed3ffc4e48 ipvs: do not expect result from done_service
This method releases the scheduler state,
it can not fail. Such change will help to properly
replace the scheduler in following patch.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2013-04-02 00:23:56 +02:00
Julian Anastasov
578bc3ef1e ipvs: reorganize dest trash
All dests will go to trash, no exceptions.
But we have to use new list node t_list for this, due
to RCU changes in following patches. Dests will wait there
initial grace period and later all conns and schedulers to
put their reference. The dests don't get reference for
staying in dest trash as before.

	As result, we do not load ip_vs_dest_put with
extra checks for last refcnt and the schedulers do not
need to play games with atomic_inc_not_zero while
selecting best destination.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2013-04-02 00:23:55 +02:00
Julian Anastasov
08cb2d032f ipvs: convert wrr scheduler to rcu
The schedule method now needs _rcu list-traversal
primitive for svc->destinations. As the weight for some
dest can be reduced during dest selection, change the
algorithm to check weights by using minimum weights in the
1 .. max_weight-(di-1) range, with the same step (di). By this
way we ensure that there will be always a weight >= 1 check
before claiming that all destinations are overloaded.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2013-04-02 00:23:54 +02:00
Julian Anastasov
b310faad3e ipvs: convert wlc scheduler to rcu
The schedule method now needs _rcu list-traversal
primitive for svc->destinations.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2013-04-02 00:23:54 +02:00
Julian Anastasov
1acb7f6761 ipvs: convert sh scheduler to rcu
Use the 3 new methods to reassign dests.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2013-04-02 00:23:53 +02:00
Julian Anastasov
9be52aba7a ipvs: convert sed scheduler to rcu
The schedule method now needs _rcu list-traversal
primitive for svc->destinations.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2013-04-02 00:23:52 +02:00
Julian Anastasov
c0d0c0a1c0 ipvs: convert rr scheduler to rcu
The schedule method now needs _rcu list-traversal
primitive for svc->destinations. As the previous entry
could be unlinked, limit the list traversals to 2 when
lookup started from previous entry.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2013-04-02 00:23:52 +02:00
Julian Anastasov
f92ea8f096 ipvs: convert nq scheduler to rcu
The schedule method now needs _rcu list-traversal
primitive for svc->destinations.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2013-04-02 00:23:51 +02:00
Julian Anastasov
4ebd288b69 ipvs: convert lc scheduler to rcu
The schedule method now needs _rcu list-traversal
primitive for svc->destinations.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2013-04-02 00:23:50 +02:00
Julian Anastasov
c5549571f9 ipvs: convert lblcr scheduler to rcu
The schedule method now needs _rcu list-traversal
primitive for svc->destinations. The read_lock for sched_lock is
removed. The set.lock is removed because now it is used in
rare cases, mostly under sched_lock.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2013-04-02 00:23:50 +02:00
Julian Anastasov
c2a4ffb70e ipvs: convert lblc scheduler to rcu
The schedule method now needs _rcu list-traversal
primitive for svc->destinations. The read_lock for sched_lock is
removed. Use a dead flag to prevent new entries to be created
while scheduler is reclaimed. Use hlist for the hash table.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2013-04-02 00:23:49 +02:00
Julian Anastasov
8f3d0023b9 ipvs: convert dh scheduler to rcu
Use the new add_dest and del_dest methods
to reassign dests.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2013-04-02 00:23:49 +02:00
Julian Anastasov
fca9c20ae1 ipvs: add ip_vs_dest_hold and ip_vs_dest_put
ip_vs_dest_hold will be used under RCU lock
while ip_vs_dest_put can be called even after dest
is removed from service, as it happens for conns and
some schedulers.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2013-04-02 00:23:48 +02:00
Julian Anastasov
6b6df46663 ipvs: preparations for using rcu in schedulers
Allow schedulers to use rcu_dereference when
returning destination on lookup. The RCU read-side critical
section will allow ip_vs_bind_dest to get dest refcnt as
preparation for the step where destinations will be
deleted without an IP_VS_WAIT_WHILE guard that holds the
packet processing during update.

	Add new optional scheduler methods add_dest,
del_dest and upd_dest. For now the methods are called
together with update_service but update_service will be
removed in a following change.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2013-04-02 00:23:47 +02:00
Julian Anastasov
71dfa982f1 ipvs: change ip_vs_sched_lock to mutex
The global list with schedulers ip_vs_schedulers
is accessed only from user context - configuration and
scheduler module [un]registration. Use ip_vs_sched_mutex
instead.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2013-04-02 00:23:47 +02:00
Julian Anastasov
9a05475ceb ipvs: avoid kmem_cache_zalloc in ip_vs_conn_new
We have many fields to set and few to reset,
use kmem_cache_alloc instead to save some cycles.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off by: Hans Schillstrom <hans@schillstrom.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2013-04-02 00:23:46 +02:00
Julian Anastasov
1845ed0bb2 ipvs: reorder keys in connection structure
__ip_vs_conn_in_get and ip_vs_conn_out_get are
hot places. Optimize them, so that ports are matched first.
By moving net and fwmark below, on 32-bit arch we can fit
caddr in 32-byte cache line and all addresses in 64-byte
cache line.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off by: Hans Schillstrom <hans@schillstrom.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2013-04-02 00:23:45 +02:00
Julian Anastasov
088339a57d ipvs: convert connection locking
Convert __ip_vs_conntbl_lock_array as follows:

- readers that do not modify conn lists will use RCU lock
- updaters that modify lists will use spinlock_t

Now for conn lookups we will use RCU read-side
critical section. Without using __ip_vs_conn_get such
places have access to connection fields and can
dereference some pointers like pe and pe_data plus
the ability to update timer expiration. If full access
is required we contend for reference.

We add barrier in __ip_vs_conn_put, so that
other CPUs see the refcnt operation after other writes.

With the introduction of ip_vs_conn_unlink()
we try to reorganize ip_vs_conn_expire(), so that
unhashing of connections that should stay more time is
avoided, even if it is for very short time.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off by: Hans Schillstrom <hans@schillstrom.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2013-04-02 00:23:45 +02:00
Julian Anastasov
60b6aa3b31 ipvs: convert locks used in persistence engines
Allow the readers to use RCU lock and for
PE module registrations use global mutex instead of
spinlock. All PE modules need to use synchronize_rcu
in their module exit handler.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off by: Hans Schillstrom <hans@schillstrom.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2013-04-02 00:23:44 +02:00
Julian Anastasov
276472eae0 ipvs: remove rs_lock by using RCU
rs_lock was used to protect rs_table (hash table)
from updaters (under global mutex) and readers (packet handlers).
We can remove rs_lock by using RCU lock for readers. Reclaiming
dest only with kfree_rcu is enough because the readers access
only fields from the ip_vs_dest structure.

Use hlist for rs_table.

As we are now using hlist_del_rcu, introduce in_rs_table
flag as replacement for the list_empty checks which do not
work with RCU. It is needed because only NAT dests are in
the rs_table.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off by: Hans Schillstrom <hans@schillstrom.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2013-04-02 00:23:43 +02:00
Julian Anastasov
363c97d743 ipvs: convert app locks
We use locks like tcp_app_lock, udp_app_lock,
sctp_app_lock to protect access to the protocol hash tables
from readers in packet context while the application
instances (inc) are [un]registered under global mutex.

As the hash tables are mostly read when conns are
created and bound to app, use RCU for readers and reclaim
app instance after grace period.

Simplify ip_vs_app_inc_get because we use usecnt
only for statistics and rely on module refcounting.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off by: Hans Schillstrom <hans@schillstrom.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2013-04-02 00:23:43 +02:00
Julian Anastasov
026ace060d ipvs: optimize dst usage for real server
Currently when forwarding requests to real servers
we use dst_lock and atomic operations when cloning the
dst_cache value. As the dst_cache value does not change
most of the time it is better to use RCU and to lock
dst_lock only when we need to replace the obsoleted dst.
For this to work we keep dst_cache in new structure protected
by RCU. For packets to remote real servers we will use noref
version of dst_cache, it will be valid while we are in RCU
read-side critical section because now dst_release for replaced
dsts will be invoked after the grace period. Packets to
local real servers that are passed to local stack with
NF_ACCEPT need a dst clone.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off by: Hans Schillstrom <hans@schillstrom.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2013-04-02 00:23:42 +02:00
Julian Anastasov
4115ded131 ipvs: consolidate all dst checks on transmit in one place
Consolidate the PMTU checks, ICMP sending and
skb_dst modification in __ip_vs_get_out_rt and
__ip_vs_get_out_rt_v6. Now skb_dst is changed early
to simplify the transmitters.

Make sure update_pmtu is called only for local clients.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off by: Hans Schillstrom <hans@schillstrom.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2013-04-02 00:23:41 +02:00
Julian Anastasov
f11cb2c2aa ipvs: do not use skb_share_check
We run in contexts like ip_rcv, ipv6_rcv, br_handle_frame,
do not expect shared skbs.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off by: Hans Schillstrom <hans@schillstrom.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2013-04-02 00:23:41 +02:00
Julian Anastasov
183dce554a ipvs: no need to reroute anymore on DNAT over loopback
After commit 70e7341673 (ipv4: Show that ip_send_reply()
is purely unicast routine.) we do not need to reroute DNAT-ed
traffic over loopback because reply uses iph daddr and not
rt_spec_dst.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off by: Hans Schillstrom <hans@schillstrom.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2013-04-02 00:23:40 +02:00
Julian Anastasov
d1deae4d3a ipvs: rename functions related to dst_cache reset
Move and give better names to two functions:

- ip_vs_dst_reset to __ip_vs_dst_cache_reset
- __ip_vs_dev_reset to ip_vs_forget_dev

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off by: Hans Schillstrom <hans@schillstrom.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2013-04-02 00:23:39 +02:00
Julian Anastasov
b8abdf0984 ipvs: convert the IP_VS_XMIT macros to functions
It was a bad idea to hide return statements in macros.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off by: Hans Schillstrom <hans@schillstrom.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2013-04-02 00:23:39 +02:00
Julian Anastasov
313eae637f ipvs: prefer NETDEV_DOWN event to free cached dsts
The real server becomes unreachable on down event,
no need to wait device unregistration. Should help in
releasing dsts early before dst->dev is replaced with lo.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off by: Hans Schillstrom <hans@schillstrom.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2013-04-02 00:23:38 +02:00
Julian Anastasov
c90558dae5 ipvs: avoid routing by TOS for real server
Avoid replacing the cached route for real server
on every packet with different TOS. I doubt that routing
by TOS for real server is used at all, so we should be
better with such optimization.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off by: Hans Schillstrom <hans@schillstrom.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2013-04-02 00:23:37 +02:00
Julian Anastasov
932bc4d7a5 net: add skb_dst_set_noref_force
Rename skb_dst_set_noref to __skb_dst_set_noref and
add force flag as suggested by David Miller. The new wrapper
skb_dst_set_noref_force will force dst entries that are not
cached to be attached as skb dst without taking reference
as long as provided dst is reclaimed after RCU grace period.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off by: Hans Schillstrom <hans@schillstrom.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Simon Horman <horms@verge.net.au>
2013-04-02 00:22:53 +02:00
Linus Torvalds
aae92db9f0 Missing base address in Tegra clock driver results in non-operational
PCIe.  On some devices this means that Ethernet will go uninitialized
 and other devices will fail.  This pull request fixes it with a single
 patch to pass the proper base address in the Tegra clock driver.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABAgAGBQJRWfJOAAoJEDqPOy9afJhJWnkP/i2zWVD+0r7eyr6LSQzcp/NI
 4KKANm/numhqUb+gijmI1z5iL05bSV7xsg9loiM89uJmgFh6FUL8/LPMX3jVQ1fp
 wPoxf913D2PSVHOyonOgPqEQjLwdC4rADjAQsDOXBbJMBPlm8mrRZuB0ZIQw6MCR
 9tfGyqSN2WSabJgTzb+NY6odc5mZUA7u1IqJ2gmu2SRrIvgyeplMjX7rPvXJiifd
 6zhIhPyrFv8HvrX+Jex4YyGiumtrCj7uTvGqHsVG+0Iiv8CVFc9NgEXCmym1+ssw
 JnqjdnBSq8UbDO0DyL54j6pXWfDtVGstw2oUe0uUW4QrCCojvC3tPXOGi/OG6gTF
 e3/18WIHj6U6XtDiMNLk5ZyGkUjxVNBXrZ2VNXWGd7uKu02AlzZ67Mh3v5oPRE2L
 JtpJg6dH0Xv9h6dVAE9/t3U2YO+XBwRJp6rTrzEs/f8dnGEt6C4WJPUQcJtnLrKf
 OldqSZZmRNjHRKH8hF8bbTqCKzobyhYDMgW9lJ7SQt2qcw9ddZeLS5VJGW0uCZrL
 XWwG29TVXosjjRR8v+i+XHg+HjCc08QQeSLHLB+ZwJQ90k3DiYkd3tuSgdDYPiY8
 mggOnBJMb6d5gogKKR6m+q+qo381tgW6lVMdGEXQkgr5Ix6JuQaXBrd0SZXQE6/Z
 bUxw6/GJec1W0RuT4R8J
 =WU9c
 -----END PGP SIGNATURE-----

Merge tag 'clk-fixes-for-linus' of git://git.linaro.org/people/mturquette/linux

Pull tegra clock driver fix from Mike Turquette:
 "Missing base address in Tegra clock driver results in non-operational
  PCIe.  On some devices this means that Ethernet will go uninitialized
  and other devices will fail.  This pull request fixes it with a single
  patch to pass the proper base address in the Tegra clock driver."

* tag 'clk-fixes-for-linus' of git://git.linaro.org/people/mturquette/linux:
  clk: tegra: Allow PLLE training to succeed
2013-04-01 15:06:34 -07:00
Linus Torvalds
dc543f9e2d Critical patches to fix FCoE VN2VN mode with new interfaces targeting 3.9-rc
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABAgAGBQJRUNwyAAoJEEajxTw9cn4H/u8P/RfsrptbQ2buaWmdEaq1NfG6
 +UFyFQknsEsNlk/mxBCwwsNcSi6Jj6RuKLo+mTWNmcx4BYudjlD2/bFkaqwqXKGi
 g5ukpx+0a9jwNDsN6LV55vKqgBRabfIucISil1RFzGS9+vmxudbrk4CWDknDiCRw
 XGZeuWw6ei0OhZcxzyithwVJvjAqT1MrTpJk9Szo8ufjfkbs1EmSorFHKU92sFsp
 tGChzoYVUIdAAvrMjnOJtlgtI+c1sq/dIYBq7kDYYsRpqa5YJUo5ZZfPpM0aCCVs
 hEIq9454d/Ze3QlxlXfDsY/EWCI0HziXV15OmPCDzZ7DtuIvUGgHRAEBuZH8tMBb
 K09Fh1f5ybFhnnMNEC7f8FKItIZs5AK71+e6S8ots6k5b6RrGFZHkkeL9qwatgIu
 Jz1CnQYHM4TgOq0WrGLV/CkdG/I7IJIGSFZv4vbHSLhMAugtA2buL4pOvQw5w/jt
 Bj5ZY/CuKOOD5OcedhwZMRUc0gcSUwQFhp34pOQ6vi9/QVCpkddtM7YwwfT2DTWf
 YAEMFNPHF0vAm/XD5raaCcYgoCRzh8SsBhqsw9F9xd+SsjxDIvVJZ+d4juOX9Bmk
 ZaKbAWviNzGzfqUOpXy3t3k8zZRHYLWEt7r4i9AzJ94lQydrAEXTHEb/apm3KWp1
 oiyJ/iVx6T4UDhW8jLa0
 =3kzY
 -----END PGP SIGNATURE-----

Merge tag 'for-3.9-rc' of git://git.kernel.org/pub/scm/linux/kernel/git/rwlove/fcoe

Pull FCoE fixes from Robert Love:
 "Critical patches to fix FCoE VN2VN mode with new interfaces targeting
  3.9-rc"

* tag 'for-3.9-rc' of git://git.kernel.org/pub/scm/linux/kernel/git/rwlove/fcoe:
  libfcoe: Fix fcoe_sysfs VN2VN mode
  libfc, fcoe, bnx2fc: Split fc_disc_init into fc_disc_{init, config}
  libfc, fcoe, bnx2fc: Always use fcoe_disc_init for discovery layer initialization
  fcoe: Fix deadlock between create and destroy paths
  bnx2fc: Make the fcoe_cltr the SCSI host parent
2013-04-01 15:06:00 -07:00