Commit graph

53 commits

Author SHA1 Message Date
Rusty Russell
8c79873da0 lguest: turn Waker into a thread, not a process
lguest uses a Waker process to break it out of the kernel (i.e.
actually running the guest) when a file descriptor needs attention.

Changing this from a process to a thread somewhat simplifies things:
it can directly access the fd_set of things to watch.  More
importantly, it means that the Waker can see Guest memory correctly,
so /dev/vring file descriptors will work as anticipated (the
alternative is to actually mmap MAP_SHARED, but you can't do that with
/dev/zero).

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-07-29 09:58:39 +10:00
Rusty Russell
0f0c4fab82 lguest: Enlarge virtio rings
With big packets, 128 entries is a little small.

Guest -> Host 1GB TCP:
Before: 8.43625 seconds xmit 95640 recv 198266 timeout 49771 usec 1252
After: 8.01099 seconds xmit 49200 recv 102263 timeout 26014 usec 2118

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-07-29 09:58:38 +10:00
Rusty Russell
398f187d74 lguest: Use GSO/IFF_VNET_HDR extensions on tun/tap
Guest -> Host 1GB TCP:
Before: 20.1974 seconds xmit 214510 recv 5 timeout 214491 usec 278
After: 8.43625 seconds xmit 95640 recv 198266 timeout 49771 usec 1252

Host -> Guest 1GB TCP:
Before: 9.98854 seconds xmit 172166 recv 5344 timeout 172157 usec 251
After: 5.72803 seconds xmit 244322 recv 9919 timeout 244302 usec 156

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-07-29 09:58:37 +10:00
Rusty Russell
9254926f85 lguest: Remove 'network: no dma buffer!' warning
This warning can happen a lot under load, and it should be warnx not
warn anyway.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-07-29 09:58:37 +10:00
Rusty Russell
aa1249840b lguest: Adaptive timeout
Since the correct timeout value varies, use a heuristic which adjusts
the timeout depending on how many packets we've seen.  This gives
slightly worse results, but doesn't need tweaking when GSO is
introduced.

500 usec	19.1887		xmit 561141 recv 1 timeout 559657
Dynamic (278)	20.1974		xmit 214510 recv 5 timeout 214491 usec 278

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-07-29 09:58:36 +10:00
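The commit message for aa1249840b does not spell out the heuristic, so the following is only a sketch of one plausible shape for it, not the launcher's actual rule. The function name, the step sizes and the bounds are all assumptions made for illustration: the timeout grows when the timer found a batch of packets and shrinks when it did not.

```c
/* Illustrative bounds, not taken from lguest. */
#define TIMEOUT_MIN_USEC	1
#define TIMEOUT_MAX_USEC	2500

/* Hypothetical rule: if the timer fired and more than one packet was
 * waiting, batching paid off, so wait a little longer next time.
 * Otherwise back off slowly towards the minimum. */
static unsigned int adjust_timeout(unsigned int timeout_usec,
				   unsigned int packets_seen)
{
	if (packets_seen > 1) {
		timeout_usec += 10;
		if (timeout_usec > TIMEOUT_MAX_USEC)
			timeout_usec = TIMEOUT_MAX_USEC;
	} else if (timeout_usec > TIMEOUT_MIN_USEC) {
		timeout_usec--;
	}
	return timeout_usec;
}
```

A caller would feed the returned value back in on the next timer expiry, so the timeout drifts with the traffic pattern instead of being tuned by hand.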
Rusty Russell
a161883a29 lguest: Tell Guest net not to notify us on every packet xmit
virtio_ring has the ability to suppress notifications.  This prevents
a guest exit for every packet, but we need to set a timer on packet
receipt to re-check if there were any remaining packets.

Here are the times for 1G TCP Guest->Host with different timeout
settings (it matters because the TCP window doesn't grow big enough to
fill the entire buffer):

Timeout value	Seconds		Xmit/Recv/Timeout
None (before)	25.3784		xmit 7750233 recv 1
2500 usec	62.5119		xmit 207020 recv 2 timeout 207020
1000 usec	34.5379		xmit 207003 recv 2 timeout 207003
750 usec	29.2305		xmit 207002 recv 1 timeout 207002
500 usec	19.1887		xmit 561141 recv 1 timeout 559657
250 usec	20.0465		xmit 214128 recv 2 timeout 214110
100 usec	19.2583		xmit 561621 recv 1 timeout 560153

(Note that these values are sensitive to the GSO patches which come
 later, and probably other traffic-related variables, so take with a
 large grain of salt).

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-07-29 09:58:36 +10:00
Rusty Russell
5dae785a82 lguest: net block unneeded receive queue update notifications
Number of exits transmitting 10GB Guest->Host before:
	network xmit 7858610 recv 118136

After:
	network xmit 7750233 recv 1

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-07-29 09:58:35 +10:00
Rusty Russell
b5111790fa lguest: wrap last_avail accesses.
To simplify the transition to when we publish indices in the ring
(and make shuffling my patch queue easier), wrap them in a lg_last_avail()
macro.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-07-29 09:58:35 +10:00
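As a rough illustration of the wrapping described in b5111790fa, the sketch below hides the index behind an accessor macro. The struct is a trimmed-down stand-in with a single field, and consume_one() is a hypothetical caller; neither is copied from the launcher.

```c
#include <stdint.h>

/* Hypothetical, trimmed-down virtqueue: only the field the macro hides. */
struct virtqueue {
	uint16_t last_avail_idx;	/* next "available" entry to consume */
};

/* Every reader and writer goes through this accessor, so relocating the
 * index later (for example into the ring itself) means editing one line. */
#define lg_last_avail(vq)	((vq)->last_avail_idx)

/* Example caller: consume one entry and return the index it had. */
static uint16_t consume_one(struct virtqueue *vq)
{
	return lg_last_avail(vq)++;
}
```

Because the macro expands to an lvalue, existing code that increments or assigns the index keeps working unchanged.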
Rusty Russell
28fd6d7f95 lguest: virtio-rng support
This is a simple patch to add support for the virtio "hardware random
generator" to lguest.  It gets about 1.2 MB/sec reading from /dev/hwrng
in the guest.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-07-29 09:58:34 +10:00
Mark McLoughlin
dec6a2be08 lguest: Support assigning a MAC address
If you've got a nice DHCP configuration which maps MAC
addresses to specific IP addresses, then you're going to
want to start your guest with one of those MAC addresses.

Also, in Fedora, we have persistent network interface naming
based on the MAC address, so with randomly assigned
addresses you're soon going to hit eth13. Who knows what
will happen then!

Allow assigning a MAC address to the network interface with
e.g.

  --tunnet=bridge:eth0:00:FF:95:6B:DA:3D

or:

  --tunnet=192.168.121.1:00:FF:95:6B:DA:3D

which is pretty unintelligible, but ...

(includes Rusty's minor rework)

Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-07-29 09:58:33 +10:00
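For the MAC address option in dec6a2be08, a minimal sketch of turning the trailing "00:FF:95:6B:DA:3D" part into six bytes might look like this. parse_mac() is a hypothetical helper written for illustration; how the launcher splits the rest of the --tunnet argument is not shown in the commit message and is not attempted here.

```c
#include <stdio.h>

/* Parse "00:FF:95:6B:DA:3D" into six bytes.
 * Returns 0 on success, -1 on malformed input.
 * Hypothetical helper, not the launcher's own parser. */
static int parse_mac(const char *str, unsigned char mac[6])
{
	unsigned int b[6];
	char trailing;
	int i;

	/* The final %c only matches if something follows the sixth byte,
	 * so a well-formed string converts exactly six items. */
	if (sscanf(str, "%2x:%2x:%2x:%2x:%2x:%2x%c",
		   &b[0], &b[1], &b[2], &b[3], &b[4], &b[5], &trailing) != 6)
		return -1;

	for (i = 0; i < 6; i++)
		mac[i] = (unsigned char)b[i];
	return 0;
}
```

The sscanf-based check is deliberately loose (it accepts single-digit bytes such as "0:1:2:3:4:5"); a stricter parser would also verify the string length.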
Mark McLoughlin
34bdaab44d lguest: Don't leak /dev/zero fd
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-07-29 09:58:33 +10:00
Rusty Russell
32c68e5c56 lguest: fix verbose printing of device features.
%02x is more appropriate for bytes than %08x.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-07-29 09:58:32 +10:00
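To see why %02x suits bytes, as 32c68e5c56 notes, here is a small sketch that prints a feature bitmap two hex digits per byte. format_features() and its buffer convention are assumptions for illustration only; with %08x each byte would instead be padded out to eight digits.

```c
#include <stdio.h>
#include <stddef.h>

/* Render bytes as two hex digits each, e.g. {0x01, 0xa0} -> "01a0".
 * Returns the number of characters written (excluding the NUL),
 * or -1 if out is too small. Hypothetical helper. */
static int format_features(const unsigned char *bytes, size_t len,
			   char *out, size_t outlen)
{
	size_t i;

	if (outlen < 2 * len + 1)
		return -1;
	out[0] = '\0';
	for (i = 0; i < len; i++)
		snprintf(out + 2 * i, 3, "%02x", (unsigned int)bytes[i]);
	return (int)(2 * len);
}
```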
Rusty Russell
2088761152 lguest: notify on empty
This is the lguest implementation of the VIRTIO_F_NOTIFY_ON_EMPTY feature.
It is currently only published for network devices, but it is turned on for
everyone.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-05-30 15:09:46 +10:00
Rusty Russell
a007a751d9 lguest: make Launcher see device status updates
This brings us closer to Real Life, where we'd examine the device
features once it's set the DRIVER_OK status bit.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-05-02 21:50:54 +10:00
Rusty Russell
cb38fa23c1 virtio: de-structify virtio_block status byte
Ron Minnich points out that a struct containing a char is not always
sizeof(char); simplest to remove the structure to avoid confusion.

Cc: "ron minnich" <rminnich@gmail.com>

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-05-02 21:50:45 +10:00
Rusty Russell
a6bd8e1303 lguest: comment documentation update.
Took some cycles to re-read the Lguest Journey end-to-end, fix some
rot and tighten some phrases.

Only comments change.  No new jokes, but a couple of recycled old jokes.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-03-28 11:05:54 +11:00
Rusty Russell
e18b094f0f lguest: Don't need comment terminator before disk section.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-03-28 11:05:53 +11:00
Paul Bolle
9b7a448e2b lguest: lguest.txt documentation fix
Mention the config options for the Virtio drivers and move the Virtualization
menu to the toplevel.

Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-03-28 11:05:52 +11:00
Tim Ansell
b488f22d70 lguest: Add puppies which were previously missing.
lguest doesn't have features, it has puppies!

Signed-off-by: Timothy R Ansell <mithro@mithis.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-03-28 11:05:52 +11:00
Paul Bolle
1ef36fa64e lguest: Do not append space to guest's kernel command line
The lguest launcher appends a space to the kernel command line (if kernel
arguments are specified on its command line). This space is unneeded. More
importantly, this appended space will make Red Hat's nash script interpreter
(used in a Fedora style initramfs) add an empty argument to init's command
line. This empty argument will make kernel arguments like "init=/bin/bash"
fail (because the shell will try to execute a script with an empty name).
This could be considered a bug in nash, but is easily fixed in the lguest
launcher too.

Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-03-11 09:35:58 +11:00
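The fix in 1ef36fa64e amounts to placing the separator between arguments rather than after each one. A minimal sketch under that reading, with join_args() as a hypothetical helper rather than the launcher's real function:

```c
#include <stddef.h>
#include <string.h>

/* Join args[0..nargs-1] into dst with one space between arguments and
 * none after the last. Returns 0 on success, -1 if dst is too small. */
static int join_args(char *dst, size_t dstlen,
		     const char *const args[], int nargs)
{
	size_t used = 0;
	int i;

	if (dstlen == 0)
		return -1;
	dst[0] = '\0';

	for (i = 0; i < nargs; i++) {
		size_t len = strlen(args[i]);
		size_t sep = (i > 0) ? 1 : 0;	/* separator goes before, never after */

		if (used + sep + len + 1 > dstlen)
			return -1;
		if (sep)
			dst[used++] = ' ';
		memcpy(dst + used, args[i], len + 1);	/* copies the NUL too */
		used += len;
	}
	return 0;
}
```

With this shape, "init=/bin/bash" as the last argument reaches the guest without a trailing space, so an interpreter such as nash has no empty word to turn into an extra argument.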
Rusty Russell
6e5aa7efb2 virtio: reset function
A reset function solves three problems:

1) It allows us to renegotiate features, eg. if we want to upgrade a
   guest driver without rebooting the guest.

2) It gives us a clean way of shutting down virtqueues: after a reset,
   we know that the buffers won't be used by the host, and

3) It helps the guest recover from messed-up drivers.

So we remove the ->shutdown hook, and the only way we now remove
feature bits is via reset.

We leave it to the driver to do the reset before it deletes queues:
the balloon driver, for example, needs to chat to the host in its
remove function.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-02-04 23:50:03 +11:00
Rusty Russell
426e3e0af5 virtio: clarify NO_NOTIFY flag usage
The other side (host) can set the NO_NOTIFY flag as an optimization,
to say "no need to kick me when you add things".  Make it clear that
this is advisory only; especially that we should always notify when
the ring is full.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-02-04 23:50:00 +11:00
Rusty Russell
a586d4f601 virtio: simplify config mechanism.
Previously we used a type/len pair within the config space, but this
seems overkill.  We now simply define a structure which represents the
layout in the config space: the config space can now only be extended
at the end.

The main driver-visible changes:
1) We indicate what fields are present with an explicit feature bit.
2) Virtqueues are explicitly numbered, and not in the config space.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-02-04 23:49:57 +11:00
Glauber de Oliveira Costa
e3283fa0cc lguest: adapt launcher to per-cpuness
This patch makes use of pread() and pwrite() in the lguest launcher
to communicate the vcpu id to the lguest driver. The id is kept in
a thread variable, which means that in the future we'll spawn vcpus as
threads. But right now, only the infrastructure is out there.

Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-01-30 22:50:05 +11:00
Balaji Rao
ec04b13f67 lguest: Reboot support
Reboot Implemented

(Prevent fd leak, fix style and fix documentation --RR)

Signed-off-by: Balaji Rao <balajirrao@gmail.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-01-30 22:50:04 +11:00
Sheela
2e12a7fb0d Fix lguest documentation
Share net is not supported, Rusty is an "idiot".

Signed-off-by: Sheela Sequeira <sheela.sequeira@gmail.com>
Reviewed-by: James Morris <jmorris@namei.org>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-12-17 19:28:16 -08:00
Rusty Russell
d1c856e0f1 lguest: Fix uninitialized members in example launcher
Thanks valgrind!

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2007-11-19 11:20:41 +11:00
Rusty Russell
42b36cc0ce virtio: Force use of power-of-two for descriptor ring sizes
The virtio descriptor rings of size N-1 were nicely set up to be
aligned to an N-byte boundary.  But as Anthony Liguori points out, the
free-running indices used by virtio require that the sizes be a power
of 2, otherwise we get problems on wrap (demonstrated with lguest).

So we replace the clever "2^n-1" scheme with a simple "align to page
boundary" scheme: this means that all virtio rings take at least two
pages, but it's safer than guessing cache alignment.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2007-11-12 13:59:40 +11:00
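The wrap problem behind 42b36cc0ce can be reproduced in a few lines. The sketch below assumes 16-bit free-running indices reduced modulo the ring size; the function names are made up for illustration. The slot sequence stays contiguous across the 65535 to 0 wrap only when the ring size divides 65536, which is the case for a power of two such as 128 but not for 127.

```c
#include <stdint.h>

/* True if num is a non-zero power of two. */
static int is_power_of_two(unsigned int num)
{
	return num != 0 && (num & (num - 1)) == 0;
}

/* Map a free-running 16-bit index onto a ring slot. */
static unsigned int ring_slot(uint16_t idx, unsigned int num)
{
	return idx % num;
}

/* Does the slot after index 65535 follow on from the slot before it?
 * Only when num divides 65536. */
static int wraps_cleanly(unsigned int num)
{
	uint16_t before = 65535;
	uint16_t after = (uint16_t)(before + 1);	/* wraps to 0 */

	return ring_slot(after, num) == (ring_slot(before, num) + 1) % num;
}
```

With num = 127, index 65535 lands in slot 3 and the next index lands in slot 0 instead of slot 4, so both sides lose track of which descriptors are in flight.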
Anthony Liguori
1200e646ae lguest: Fix lguest virtio-blk backend size computation
This seems like an obvious typo but it's worked in the past because the virtio
blk frontend just ignores the length field on completion.

Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2007-11-12 13:59:26 +11:00
Rusty Russell
e1e72965ec lguest: documentation update
Went through the documentation doing typo and content fixes.  This
patch contains only comment and whitespace changes.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2007-10-25 15:02:50 +10:00
Rusty Russell
db24e8c2ef lguest: example launcher header cleanup.
Now the kernel headers are clean for userspace export, we don't need
to typedef kernel types before including them.  We also don't need
pci_ids.h (that was from an earlier virtio draft).

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2007-10-25 14:09:25 +10:00
Rusty Russell
43d33b21a0 Use "struct boot_params" in example launcher
Now that the "struct boot_params" is userspace accessible, we don't need
magic numbers.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2007-10-23 15:49:57 +10:00
Rusty Russell
5bbf89fc26 Loading bzImage directly.
Now arch/i386/boot/compressed/head.S understands the hardware_platform field,
we can directly execute bzImages.  No more horrific unpacking code.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2007-10-23 15:49:57 +10:00
Rusty Russell
814a0e5cdf Revert lguest magic and use hook in head.S
Version 2.07 of the boot protocol uses 0x23C for the hardware_subarch
field, which for lguest is "1".  This allows us to use the standard
boot entry point rather than the "GenuineLguest" string hack.

The standard entry point also clears the BSS and copies the boot parameters
and commandline for us, saving more code.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2007-10-23 15:49:57 +10:00
Chris Malley
1f5a29022a Update lguest documentation to reflect the new virtual block device name.
Signed-off-by: Chris Malley <mail@chrismalley.co.uk>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2007-10-23 15:49:57 +10:00
Rusty Russell
56ae43dfe2 Example launcher handle guests not being ready for input
We currently discard console and network input when the guest has no
input buffers.  This patch changes that, so that we simply stop
listening to that fd until the guest refills its input buffers.

This is particularly important because hvc_console without interrupts
does backoff polling and so often loses characters if we discard.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2007-10-23 15:49:56 +10:00
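One way to picture the change in 56ae43dfe2 is as toggling a descriptor's membership in the set passed to select(). This is a sketch under that assumption; pause_input() and resume_input() are hypothetical names, and the launcher's real bookkeeping is not shown in the commit message.

```c
#include <sys/select.h>

/* Guest has no input buffers: stop watching fd so its data stays queued
 * in the kernel instead of being read and thrown away. */
static void pause_input(fd_set *watched, int fd)
{
	FD_CLR(fd, watched);
}

/* Guest has refilled its buffers: start watching fd again. */
static void resume_input(fd_set *watched, int fd)
{
	FD_SET(fd, watched);
}
```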
Rusty Russell
17cbca2ba3 Update example launcher for virtio
Implements virtio-based console, network and block servers.  The block
server uses a thread so it's async, which is an improvement over the
old synchronous implementation (but a little more complex).

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2007-10-23 15:49:56 +10:00
Rusty Russell
47436aa4ad Boot with virtual == physical to get closer to native Linux.
1) This allows us to get a lot closer to booting bzImages.

2) It means we don't have to know page_offset.

3) The Guest needs to modify the boot pagetables to create the
   PAGE_OFFSET mapping before jumping to C code.

4) guest_pa() walks the page tables rather than using page_offset.

5) We don't use page_offset to figure out whether to emulate: it was
   always kinda questionable, and won't work for instructions done
   before remapping (bzImage unpacking in particular).

6) We still want the kernel address for tlb flushing: have the initial
   hypercall give us that, too.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2007-10-23 15:49:54 +10:00
Jes Sorensen
511801dc31 Change example launcher to use unsigned long not u32
Apply Clue 2x4 to lguest userland<->kernel handling code and the
lguest launcher. Pointers are not to be passed in u32's!

Basic rule of thumb: Anything passing u32's back and forth should be
passing unsigned longs to be portable to 64 bit archs.

For those who have forgotten already, I repeat: NO POINTERS IN u32!

Signed-off-by: Jes Sorensen <jes@sgi.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2007-10-23 15:49:52 +10:00
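The rule in 511801dc31 is easy to demonstrate. In this sketch (function names are illustrative only), an address pushed through a 32-bit field loses its upper half, while unsigned long, which is pointer-sized on the Linux ABIs, round-trips a pointer intact.

```c
#include <stdint.h>

/* What happens when an address is squeezed through a 32-bit field. */
static uint64_t through_u32(uint64_t addr)
{
	uint32_t narrow = (uint32_t)addr;	/* upper 32 bits are dropped here */

	return narrow;
}

/* The portable alternative on Linux: unsigned long is pointer-sized. */
static void *through_ulong(void *p)
{
	unsigned long wide = (unsigned long)p;

	return (void *)wide;
}
```

On a 32-bit build both paths happen to work, which is exactly why this class of bug tends to surface only when the code is first run on a 64-bit architecture.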
Rusty Russell
3c6b5bfa3c Introduce guest mem offset, static link example launcher
In order to avoid problematic special linking of the Launcher, we give
the Host an offset: this means we can use any memory region in the
Launcher as Guest memory rather than insisting on mmap() at 0.

The result is quite pleasing: a number of casts are replaced with
simple additions.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2007-10-23 15:49:50 +10:00
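The "casts replaced with simple additions" remark in 3c6b5bfa3c can be sketched as a pair of translation helpers. guest_base and both function names are assumptions for illustration: Guest physical address 0 corresponds to wherever the Launcher happened to allocate the Guest's memory, so translation is an add or a subtract.

```c
#include <stddef.h>

/* Hypothetical: start of the region the Launcher set aside as Guest memory. */
static char *guest_base;

/* Guest physical address -> pointer usable inside the Launcher. */
static void *from_guest_phys(unsigned long addr)
{
	return guest_base + addr;
}

/* Launcher pointer (inside the Guest region) -> Guest physical address. */
static unsigned long to_guest_phys(const void *ptr)
{
	return (unsigned long)((const char *)ptr - guest_base);
}
```

Since the region no longer has to sit at address 0, nothing else in the Launcher's address space (shared libraries included) can collide with it.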
Ronald G. Minnich
6649bb7af6 Accept elf files that are valid but have sections that can not be mmap'ed for some reason.
Plan9 kernel binaries don't neatly align their ELF sections to our
page boundaries.

Signed-off-by: Ronald G. Minnich <rminnich@gmail.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2007-10-23 15:49:50 +10:00
Rusty Russell
b45d8cb054 Make lguest_launcher.h types userspace-friendly
lguest_launcher.h uses "u32" not "__u32", which sets a bad example.  Fix that,
and include <linux/types.h>.

This means we need to use -I on the Launcher build line so types.h is found.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2007-10-23 15:49:49 +10:00
Rusty Russell
9653c4aff9 lguest.txt update
o Describe the new split configurations
o Highlight code documentation in drivers/lguest/README
o Point out necessity of having a getty on /dev/hvc0
o Remove gratuitous "m" in example
o Don't discuss I/O model here, stick to user documentation.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2007-10-23 15:49:48 +10:00
Glauber de Oliveira Costa
babed5c002 turn err into errx in lguest call sites
These two callsites should really be errx instead of err, since there is
no errno associated with them at the moment they are issued.

Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Glauber de Oliveira Costa <gcosta@redhat.com>
2007-10-23 15:49:47 +10:00
Rusty Russell
ee8e7cfe9d Make asm-x86/bootparam.h includable from userspace.
To actually write a bootloader (or, say, the lguest launcher)
currently requires duplication of these structures.  Making them
includable from userspace is much nicer.

We merge the common userspace-required definitions of e820_32/64.h
into e820.h for export.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2007-10-23 15:49:47 +10:00
Thomas Gleixner
96a388de5d i386/x86_64: move headers to include/asm-x86
Move the headers to include/asm-x86 and fixup the
header install make rules

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2007-10-11 11:20:03 +02:00
Chris Malley
f6a592e8ab lguest example launcher truncates block device file to 0 length on problems
The function should also use ftruncate64() rather than ftruncate() to prevent
files over 4GB (not uncommon for a root filesystem) being zeroed.

Signed-off-by: Chris Malley <mail@chrismalley.co.uk>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-09-26 09:22:04 -07:00
Ronald G. Minnich
e3bcf5e278 lguest: avoid shared libraries mapped over guest memory
Some versions of ld.so mmap the shared libraries right in over guest
memory, so compile lguest statically by default.

[ FC7 maps shared libraries very low, where the launcher maps guest's
  physical memory.  Quick fix is to link Launcher static, real fix is
  for 2.6.24. ]

-static is a simple fix. I expect this problem will be more common than we
like, as different distros make different "improvements" to ld.so.

Signed-off-by: Ronald G. Minnich <rminnich@gmail.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-08-09 08:14:56 -07:00
Rusty Russell
f56a384e98 lguest: documentation VII: FIXMEs
Documentation: The FIXMEs

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-26 11:35:17 -07:00
Rusty Russell
dde797899a lguest: documentation IV: Launcher
Documentation: The Launcher

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-26 11:35:17 -07:00