Some IOMMUs allocate memory areas spanning LLD's segment boundary limit. It
forces low level drivers to have a workaround to adjust scatter lists that the
IOMMU builds. We are in the process of making all the IOMMUs respect the
segment boundary limits to remove such work around in LLDs.
SPARC64 IOMMUs were rewritten to use the IOMMU helper functions and the commit
89c94f2f70 made the IOMMUs not allocate memory
areas spanning the segment boundary limit.
However, SPARC64 IOMMUs allocate memory areas first then try to merge them
(while some IOMMUs walk through all the sg entries to see how they can be
merged first and allocate memory areas). So SPARC64 IOMMUs also need the
boundary limit checking when they try to merge sg entries.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Mimicks almost perfectly the powerpc IOMMU code, except that it
doesn't have the IOMMU_PAGE_SIZE != PAGE_SIZE handling, and it also
lacks the device dma mask support bits.
I'll add that later as time permits, but this gets us at least back to
where we were beforehand.
Signed-off-by: David S. Miller <davem@davemloft.net>
Changeset fde6a3c82d ("iommu sg merging:
sparc64: make iommu respect the segment size limits") broke sparc64
because whilst it added the segment limiting code to the first pass of
SG mapping (in prepare_sg()) it did not add matching code to the
second pass handling (in fill_sg())
As a result the two passes disagree where the segment boundaries
should be, resulting in OOPSes, DMA corruption, and corrupted
superblocks.
Signed-off-by: David S. Miller <davem@davemloft.net>
WARNING: vmlinux.o(.text+0x39be4): Section mismatch in reference from the function probe_existing_entries() to the function .init.text:page_in_phys_avail()
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix following Section mismatch warning in sparc64:
WARNING: arch/sparc64/kernel/built-in.o(.text+0x13dec): Section mismatch: reference to .devinit.text:pci_scan_one_pbm (between 'psycho_scan_bus' and 'psycho_pbm_init')
WARNING: arch/sparc64/kernel/built-in.o(.text+0x14b58): Section mismatch: reference to .devinit.text:pci_scan_one_pbm (between 'sabre_scan_bus' and 'sabre_init')
WARNING: arch/sparc64/kernel/built-in.o(.text+0x15ea4): Section mismatch: reference to .devinit.text:pci_scan_one_pbm (between 'schizo_scan_bus' and 'schizo_pbm_init')
WARNING: arch/sparc64/kernel/built-in.o(.text+0x17780): Section mismatch: reference to .devinit.text:pci_scan_one_pbm (between 'pci_sun4v_scan_bus' and 'pci_sun4v_get_head')
WARNING: arch/sparc64/kernel/built-in.o(.text+0x17d5c): Section mismatch: reference to .devinit.text:pci_scan_one_pbm (between 'pci_fire_scan_bus' and 'pci_fire_get_head')
WARNING: arch/sparc64/kernel/built-in.o(.text+0x23860): Section mismatch: reference to .devinit.text:vio_dev_release (between 'vio_create_one' and 'vio_add')
WARNING: arch/sparc64/kernel/built-in.o(.text+0x23868): Section mismatch: reference to .devinit.text:vio_dev_release (between 'vio_create_one' and 'vio_add')
The pci_* were all missing __init annotations.
For the vio.c case it was a function with a wrong annotation which was removed.
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds checking for possible NULL pointer dereference
if of_find_property() failed.
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 2c941a2040 looks incomplete. The
helper functions like prepare_sg() need to support sg chaining too.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
This updates the sparc64 iommu/pci dma mappers to sg chaining.
Acked-by: David S. Miller <davem@davemloft.net>
Later updated to newer kernel with unified sparc64 iommu sg handling.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
This also makes us use the MSI queues correctly.
Each MSI queue is serviced by a normal sun4u/sun4v INO interrupt
handler. This handler runs the MSI queue and dispatches the
virtual interrupts indicated by arriving MSIs in that MSI queue.
All of the common logic is placed in pci_msi.c, with callbacks to
handle the PCI controller specific aspects of the operations.
This common infrastructure will make it much easier to add MSG
support.
Signed-off-by: David S. Miller <davem@davemloft.net>
1) sun4{u,v}_build_msi() have improper return value handling.
We should always return negative error codes, instead of
using the magic value "0" which could in fact be a valid
MSI number.
2) sun4{u,v}_build_msi() should return -ENOMEM instead of
calling prom_prom() halt with kzalloc() of the interrupt
data fails.
3) We 'remembered' the MSI number using a singleton in the
struct device archdata area, this doesn't work for MSI-X
which can cause multiple MSIs assosciated with one device.
Delete that archdata member, and instead store the MSI
number in the IRQ chip data area.
Signed-off-by: David S. Miller <davem@davemloft.net>
Fully unify all of the DMA ops so that subordinate bus types to
the DMA operation providers (such as ebus, isa, of_device) can
work transparently.
Basically, we just make sure that for every system device we
create, the dev->archdata 'iommu' and 'stc' fields are filled
in.
Then we have two platform variants of the DMA ops, one for SUN4U which
actually programs the real hardware, and one for SUN4V which makes
hypervisor calls.
This also fixes the crashes in parport_pc on sparc64, reported by
Meelis Roos.
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix following warning:
WARNING: vmlinux.o(.text+0x3cf50): Section mismatch: reference to .init.text:page_in_phys_avail (between 'pci_sun4v_pbm_init' and 'sun4v_pci_init')
pci_sun4v_pbm_init and sun4v_pci_init was only used under __init
context so declare them _init.
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Handle arbitrary base and length values as long as they
are multiples of IO_PAGE_SIZE.
Bug found by Arun Kumar Rao.
Signed-off-by: David S. Miller <davem@davemloft.net>
All the sun4u controllers do the same thing to compute the physical
I/O address to poke, and we can move the sun4v code into this common
location too.
This one needs a bit of testing, in particular the Sabre code had some
funny stuff that would break up u16 and/or u32 accesses into pieces
and I didn't think that was needed any more. If it is we need to find
out why and add back code to do it again.
Signed-off-by: David S. Miller <davem@davemloft.net>
The idea is to move more and more things into the pbm,
with the eventual goal of eliminating the pci_controller_info
entirely as there really isn't any need for it.
This stage of the transformations requires some reworking of
the PCI error interrupt handling.
It might be tricky to get rid of the pci_controller_info parenting for
a few reasons:
1) When we get an uncorrectable or correctable error we want
to interrogate the IOMMU and streaming cache of both
PBMs for error status. These errors come from the UPA
front-end which is shared between the two PBM PCI bus
segments.
Historically speaking this is why I choose the datastructure
hierarchy of pci_controller_info-->pci_pbm_info
2) The probing does a portid/devhandle match to look for the
'other' pbm, but this is entirely an artifact and can be
eliminated trivially.
What we could do to solve #1 is to have a "buddy" pointer from one pbm
to another.
Signed-off-by: David S. Miller <davem@davemloft.net>
Namely bus-range and ino-bitmap.
This allows us also to eliminate pci_controller_info's
pci_{first,last}_busno fields as only the pbm ones are
used now.
Signed-off-by: David S. Miller <davem@davemloft.net>
set_irq_msi() currently connects an irq_desc to an msi_desc. The archs call
it at some point in their setup routine, and then the generic code sets up the
reverse mapping from the msi_desc back to the irq.
set_irq_msi() should do both connections, making it the one and only call
required to connect an irq with it's MSI desc and vice versa.
The arch code MUST call set_irq_msi(), and it must do so only once it's sure
it's not going to fail the irq allocation.
Given that there's no need for the arch to return the irq anymore, the return
value from the arch setup routine just becomes 0 for success and anything else
for failure.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
We fake up a dummy one in all cases because that is the simplest
thing to do and it happens to be necessary for hypervisor systems.
Signed-off-by: David S. Miller <davem@davemloft.net>
We don't do the "Simba APB is a PBM" bogosity for Sabre
controllers any longer, so this pbms_same_domain thing
is no longer necessary.
Signed-off-by: David S. Miller <davem@davemloft.net>
Almost entirely taken from the 64-bit PowerPC PCI code.
This allowed to eliminate a ton of cruft from the sparc64
PCI layer.
Signed-off-by: David S. Miller <davem@davemloft.net>
This is kind of hokey, we could use the hardware provided facilities
much better.
MSIs are assosciated with MSI Queues. MSI Queues generate interrupts
when any MSI assosciated with it is signalled. This suggests a
two-tiered IRQ dispatch scheme:
MSI Queue interrupt --> queue interrupt handler
MSI dispatch --> driver interrupt handler
But we just get one-level under Linux currently. What I'd like to do
is possibly stick the IRQ actions into a per-MSI-Queue data structure,
and dispatch them form there, but the generic IRQ layer doesn't
provide a way to do that right now.
So, the current kludge is to "ACK" the interrupt by processing the
MSI Queue data structures and ACK'ing them, then we run the actual
handler like normal.
We are wasting a lot of useful information, for example the MSI data
and address are provided with ever MSI, as well as a system tick if
available. If we could pass this into the IRQ handler it could help
with certain things, in particular for PCI-Express error messages.
The MSI entries on sparc64 also tell you exactly which bus/device/fn
sent the MSI, which would be great for error handling when no
registered IRQ handler can service the interrupt.
We override the disable/enable IRQ chip methods in sun4v_msi, so we
have to call {mask,unmask}_msi_irq() directly from there. This is
another ugly wart.
Signed-off-by: David S. Miller <davem@davemloft.net>
Do IRQ determination generically by parsing the PROM properties,
and using IRQ controller drivers for final resolution.
One immediate positive effect is that all of the IRQ frobbing
in the EBUS, ISA, and PCI controller layers has been eliminated.
We just look up the of_device and use the properly computed
value.
The PCI controller irq_build() routines are gone and no longer
used. Unfortunately sbus_build_irq() has to remain as there is
a direct reference to this in the sunzilog driver. That can be
killed off once the sparc32 side of this is written and the
sunzilog driver is transformed into an "of" bus driver.
Signed-off-by: David S. Miller <davem@davemloft.net>
One thing this change pointed out was that we really should
pull the "get 'local-mac-address' property" logic into a helper
function all the network drivers can call.
Signed-off-by: David S. Miller <davem@davemloft.net>
On some sun4v systems, after netboot the ethernet controller and it's
DMA mappings can be left active. The net result is that the kernel
can end up using memory the ethernet controller will continue to DMA
into, resulting in corruption.
To deal with this, we are more careful about importing IOMMU
translations which OBP has left in the IO-TLB. If the mapping maps
into an area the firmware claimed was free and available memory for
the kernel to use, we demap instead of import that IOMMU entry.
This is going to cause the network chip to take a PCI master abort on
the next DMA it attempts, if it has been left going like this. All
tests show that this is handled properly by the PCI layer and the e1000
drivers.
Signed-off-by: David S. Miller <davem@davemloft.net>
This is the long overdue conversion of sparc64 over to
the generic IRQ layer.
The kernel image is slightly larger, but the BSS is ~60K
smaller due to the reduced size of struct ino_bucket.
A lot of IRQ implementation details, including ino_bucket,
were moved out of asm-sparc64/irq.h and are now private to
arch/sparc64/kernel/irq.c, and most of the code in irq.c
totally disappeared.
One thing that's different at the moment is IRQ distribution,
we do it at enable_irq() time. If the cpu mask is ALL then
we round-robin using a global rotating cpu counter, else
we pick the first cpu in the mask to support single cpu
targetting. This is similar to what powerpc's XICS IRQ
support code does.
This works fine on my UP SB1000, and the SMP build goes
fine and runs on that machine, but lots of testing on
different setups is needed.
Signed-off-by: David S. Miller <davem@davemloft.net>