Commit Graph

21492 Commits

Author SHA1 Message Date
Ronnie Sahlberg
31459f463a iscsi: Pick default initiator-name based on the name of the VM
This patch updates the iscsi layer to automatically pick a 'unique'
initiator-name based on the name of the vm in case the user has not set
an explicit iqn-name to use.

Create a new function qemu_get_vm_name() that returns the name of the VM,
if specified.

This way we can thus create default names to use as the initiator name
based on the guest session.

If the VM is not named via the '-name' command line argument, the iscsi
initiator-name used wiull simply be

    iqn.2008-11.org.linux-kvm

If a name for the VM was specified with the '-name' option, iscsi will
use a default initiatorname of

    iqn.2008-11.org.linux-kvm:<name>

These names are just the default iscsi initiator name that qemu will
generate/use only when the user has not set an explicit initiator name
via the commandlines or config files.

Signed-off-by: Ronnie Sahlberg <ronniesahlberg@gmail.com>
2012-08-09 15:04:09 +02:00
Paolo Bonzini
f2ef4a6dd9 iscsi: reorganize code for parse_initiator_name
Merge the occurrences of the "iqn.2008-11.org.linux-kvm" string
to avoid duplication.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2012-08-08 14:51:59 +02:00
Paolo Bonzini
b93c94f7ec iscsi: do not leak initiator_name
The argument of iscsi_create_context is never freed by libiscsi,
which in fact calls strdup on it.  Avoid a leak.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2012-08-08 14:51:59 +02:00
Juan Quintela
dd051c7217 Restart optimization on stage3 update version
Signed-off-by: Juan Quintela <quintela@redhat.com>

Reviewed-by: Luiz Capitulino <lcapitulino@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2012-08-08 13:51:12 +02:00
Orit Wasserman
f36d55af74 Add XBZRLE statistics
Signed-off-by: Benoit Hudzia <benoit.hudzia@sap.com>
Signed-off-by: Petter Svard <petters@cs.umu.se>
Signed-off-by: Aidan Shribman <aidan.shribman@sap.com>
Signed-off-by: Orit Wasserman <owasserm@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>

Reviewed-by: Luiz Capitulino <lcapitulino@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2012-08-08 13:51:12 +02:00
Orit Wasserman
004d4c10ae Add migration accounting for normal and duplicate pages
Signed-off-by: Benoit Hudzia <benoit.hudzia@sap.com>
Signed-off-by: Petter Svard <petters@cs.umu.se>
Signed-off-by: Aidan Shribman <aidan.shribman@sap.com>
Signed-off-by: Orit Wasserman <owasserm@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>

Reviewed-by: Luiz Capitulino <lcapitulino@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2012-08-08 13:51:12 +02:00
Orit Wasserman
62d4e3fe31 Change total_time to total-time in MigrationStats
migration total_time was introduced in commit
d5f8a5701d for QEMU 1.2

Signed-off-by: Orit Wasserman <owasserm@redhat.com>

Reviewed-by: Luiz Capitulino <lcapitulino@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2012-08-08 13:51:12 +02:00
Orit Wasserman
9e1ba4cc4e Add migrate_set_cache_size command
Change XBZRLE cache size in bytes (the size should be a power of 2, it will be
rounded down to the nearest power of 2).
If XBZRLE cache size is too small there will be many cache miss.

New query-migrate-cache-size QMP command and 'info migrate_cache_size' HMP
command to query cache value.

Signed-off-by: Benoit Hudzia <benoit.hudzia@sap.com>
Signed-off-by: Petter Svard <petters@cs.umu.se>
Signed-off-by: Aidan Shribman <aidan.shribman@sap.com>
Signed-off-by: Orit Wasserman <owasserm@redhat.com>

Reviewed-by: Luiz Capitulino <lcapitulino@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2012-08-08 13:51:12 +02:00
Orit Wasserman
17ad9b358b Add XBZRLE to ram_save_block and ram_save_live
In the outgoing migration check to see if the page is cached and
changed, then send compressed page by using save_xbrle_page function.
In the incoming migration check to see if RAM_SAVE_FLAG_XBZRLE is set
and decompress the page (by using load_xbrle function).

Signed-off-by: Benoit Hudzia <benoit.hudzia@sap.com>
Signed-off-by: Petter Svard <petters@cs.umu.se>
Signed-off-by: Aidan Shribman <aidan.shribman@sap.com>
Signed-off-by: Orit Wasserman <owasserm@redhat.com>

Reviewed-by: Luiz Capitulino <lcapitulino@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2012-08-08 13:51:12 +02:00
Orit Wasserman
302dfbeb21 Add xbzrle_encode_buffer and xbzrle_decode_buffer functions
For performance we are encoding long word at a time.
For nzrun we use long-word-at-a-time NULL-detection tricks from strcmp():
using ((lword - 0x0101010101010101) & (~lword) & 0x8080808080808080) test
to find out if any byte in the long word is zero.

Signed-off-by: Benoit Hudzia <benoit.hudzia@sap.com>
Signed-off-by: Petter Svard <petters@cs.umu.se>
Signed-off-by: Aidan Shribman <aidan.shribman@sap.com>
Signed-off-by: Orit Wasserman <owasserm@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>

Reviewed-by: Luiz Capitulino <lcapitulino@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2012-08-08 13:51:12 +02:00
Orit Wasserman
e6546bb938 Add uleb encoding/decoding functions
Implement Unsigned Little Endian Base 128.

Signed-off-by: Orit Wasserman <owasserm@redhat.com>

Reviewed-by: Luiz Capitulino <lcapitulino@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2012-08-08 13:51:12 +02:00
Orit Wasserman
9fb26641ab Add cache handling functions
Add MRU page cache mechanism.
The page are accessed by their address.

Signed-off-by: Benoit Hudzia <benoit.hudzia@sap.com>
Signed-off-by: Petter Svard <petters@cs.umu.se>
Signed-off-by: Aidan Shribman <aidan.shribman@sap.com>
Signed-off-by: Orit Wasserman <owasserm@redhat.com>

Reviewed-by: Luiz Capitulino <lcapitulino@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2012-08-08 13:51:12 +02:00
Orit Wasserman
34c26412b7 Add XBZRLE documentation
Signed-off-by: Orit Wasserman <owasserm@redhat.com>

Reviewed-by: Luiz Capitulino <lcapitulino@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2012-08-08 13:51:11 +02:00
Orit Wasserman
0045843324 Add migrate-set-capabilities
The management can enable/disable a capability for the next migration by using
migrate-set-capabilities QMP command.
The user can use migrate_set_capability HMP command.

Signed-off-by: Orit Wasserman <owasserm@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>

Reviewed-by: Luiz Capitulino <lcapitulino@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2012-08-08 13:51:11 +02:00
Orit Wasserman
bbf6da32b5 Add migration capabilities
The management can query the current migration capabilities using
query-migrate-capabilities QMP command.
The user can use 'info migrate_capabilities' HMP command.
Currently only XBZRLE capability is available.

Signed-off-by: Orit Wasserman <owasserm@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>

Reviewed-by: Luiz Capitulino <lcapitulino@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2012-08-08 13:51:11 +02:00
Gerd Hoffmann
01afdadc92 update seabios to latest master
Upstream seabios commit 5a023065388287e261ae9212452ff541f9fa9cd3

Major changes since 1.7.0:
 - Usual share of bugfixes and cleanups ;)
 - Support for 64bit PCI bars and mapping those above 4G.
 - Stack switching for real mode irq handlers to reduce
   seabios stack footprint.
 - Support for booting from lsi scsi hba.
 - Support for booting from usb attached scsi.
 - Support for non-linear apic ids.

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2012-08-07 17:11:12 +02:00
Anthony Liguori
c03b0aa0ca Merge remote-tracking branch 'kraxel/usb.58' into staging
* kraxel/usb.58:
  usb-storage: fix SYNCHRONIZE_CACHE
  usb-storage: improve debug logging
2012-08-07 09:46:40 -05:00
Anthony Liguori
b262fce11a Merge remote-tracking branch 'kwolf/for-anthony' into staging
* kwolf/for-anthony:
  qemu-img: use QemuOpts instead of QEMUOptionParameter in resize function
  qemu-iotests: Be more flexible with image creation options
  qemu-iotests: add 039 qcow2 lazy refcounts test
  qemu-io: add "abort" command to simulate program crash
  qcow2: implement lazy refcounts
  qemu-iotests: ignore qemu-img create lazy_refcounts output
  docs: add lazy refcounts bit to qcow2 specification
  qcow2: introduce dirty bit
  docs: add dirty bit to qcow2 specification
  qemu-iotests: add qed.py image manipulation utility
  qapi: generalize documentation of streaming commands
  ide scsi: Mess with geometry only for hard disk devices
2012-08-07 09:46:24 -05:00
Gerd Hoffmann
54414218d7 usb-storage: fix SYNCHRONIZE_CACHE
Commit 5931065907 is incomplete,
we'll arrive in the scsi command complete callback in CSW state
and must handle that case correctly.

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2012-08-07 10:49:06 +02:00
Gerd Hoffmann
06f9847dc3 usb-storage: improve debug logging
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2012-08-07 10:49:06 +02:00
Anthony Liguori
0b8db8fe15 slirp: fix build on mingw32
in_addr_t isn't available on mingw32.  Just use an unsigned long instead.  I
considered typedef'ing in_addr_t on mingw32 but this would potentially be
brittle if mingw32 did introduce the type.

Cc: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2012-08-06 19:31:55 -05:00
Dong Xu Wang
20caf0f766 qemu-img: use QemuOpts instead of QEMUOptionParameter in resize function
Signed-off-by: Dong Xu Wang <wdongxu@linux.vnet.ibm.com>
Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-08-06 22:39:14 +02:00
Kevin Wolf
b0869a46b2 qemu-iotests: Be more flexible with image creation options
qemu-iotests already filters out image creation options that may be
present or not in order to get the same output in both cases. However,
often it only considers the default value of the option. Cover all valid
values instead so that ./check -o name=value can be used successfull for
all of them.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-08-06 22:39:14 +02:00
Stefan Hajnoczi
dc68afe0f3 qemu-iotests: add 039 qcow2 lazy refcounts test
This tests establishes the basic post-conditions of the qcow2 lazy
refcounts features:

  1. If the image was closed normally, it is marked clean.

  2. If an allocating write was performed and the image was not closed
     normally, then it is marked dirty.

     a. Written data can be read back successfully.
     b. The image file can be repaired and will be marked clean again.
     c. The image file is automatically repaired when opened read/write.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-08-06 22:39:14 +02:00
Stefan Hajnoczi
e01c30d3e2 qemu-io: add "abort" command to simulate program crash
Avoiding data loss and corruption is the top requirement for image file
formats.  The qemu-io "abort" command makes it possible to simulate
program crashes and does not give the image format a chance to cleanly
shut down.  This command is useful for data integrity test cases.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-08-06 22:39:14 +02:00
Stefan Hajnoczi
bfe8043e92 qcow2: implement lazy refcounts
Lazy refcounts is a performance optimization for qcow2 that postpones
refcount metadata updates and instead marks the image dirty.  In the
case of crash or power failure the image will be left in a dirty state
and repaired next time it is opened.

Reducing metadata I/O is important for cache=writethrough and
cache=directsync because these modes guarantee that data is on disk
after each write (hence we cannot take advantage of caching updates in
RAM).  Refcount metadata is not needed for guest->file block address
translation and therefore does not need to be on-disk at the time of
write completion - this is the motivation behind the lazy refcount
optimization.

The lazy refcount optimization must be enabled at image creation time:

  qemu-img create -f qcow2 -o compat=1.1,lazy_refcounts=on a.qcow2 10G
  qemu-system-x86_64 -drive if=virtio,file=a.qcow2,cache=writethrough

Update qemu-iotests 031 and 036 since the extension header size changes
when we add feature bit table entries.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-08-06 22:39:14 +02:00
Stefan Hajnoczi
91cf8a35e7 qemu-iotests: ignore qemu-img create lazy_refcounts output
Hide the default lazy_refcounts=off output from qemu-img like we do with
other image creation options.  This ensures that existing golden outputs
continue to pass despite the new option that has been added.

Note that this patch applies before the one that actually introduces the
lazy_refcounts=on|off option.  This ensures git-bisect(1) continues to
work.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-08-06 22:39:14 +02:00
Stefan Hajnoczi
dae8796d00 docs: add lazy refcounts bit to qcow2 specification
The lazy refcounts bit indicates that this image can take advantage of
the dirty bit and that refcount updates can be postponed.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-08-06 22:39:14 +02:00
Stefan Hajnoczi
c61d0004bc qcow2: introduce dirty bit
This patch adds an incompatible feature bit to mark images that have not
been closed cleanly.  When a dirty image file is opened a consistency
check and repair is performed.

Update qemu-iotests 031 and 036 since the extension header size changes
when we add feature bit table entries.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-08-06 22:39:14 +02:00
Stefan Hajnoczi
0f6d767aa8 docs: add dirty bit to qcow2 specification
The dirty bit will make it possible to perform lazy refcount updates,
where the image file is not kept consistent all the time.  Upon opening
a dirty image file, it is necessary to perform a consistency check and
repair any incorrect refcounts.

Therefore the dirty bit must be an incompatible feature bit.  We don't
want old programs accessing a file with stale refcounts.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-08-06 22:39:14 +02:00
Stefan Hajnoczi
e77964f79b qemu-iotests: add qed.py image manipulation utility
The qed.py utility can inspect and manipulate QED image files.  It can
be used for testing to see the state of image metadata and also to
inject corruptions into the image file.  It also has a scrubbing feature
to copy just the metadata out of an image file, allowing users to share
broken image files without revealing data in bug reports.

This has lived in my local repo for a long time but could be useful
to others.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-08-06 22:39:14 +02:00
Paolo Bonzini
05290d80c8 qapi: generalize documentation of streaming commands
Talk about background operations in general, rather than specifically
about streaming.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-08-06 22:39:14 +02:00
Markus Armbruster
b2df431407 ide scsi: Mess with geometry only for hard disk devices
Legacy -drive cyls=... are now ignored completely when the drive
doesn't back a hard disk device.  Before, they were first checked
against a hard disk's limits, then ignored.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-08-06 22:39:14 +02:00
Paolo Bonzini
26b9b5fe17 virtio: fix vhost handling
Commit b1f416aa8d breaks vhost_net
because it always registers the virtio_pci_host_notifier_read() handler
function on the ioeventfd, even when vhost_net.ko is using the ioeventfd.
The result is both QEMU and vhost_net.ko polling on the same eventfd
and the virtio_net.ko guest driver seeing inconsistent results:

  # ifconfig eth0 192.168.0.1 netmask 255.255.255.0
  virtio_net virtio0: output:id 0 is not a head!

To fix this, proceed the same as we do for irqfd: add a parameter to
virtio_queue_set_host_notifier_fd_handler and in that case only set
the notifier, not the handler.

Cc: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Tested-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2012-08-06 14:01:44 -05:00
Anthony Liguori
22d48de65c Merge remote-tracking branch 'kiszka/queues/slirp' into staging
* kiszka/queues/slirp:
  slirp: Handle whole 127.0.0.0/8 network as local addresses.
2012-08-06 13:59:59 -05:00
Blue Swirl
f777501cc9 Merge branch 'axp-next' of git://repo.or.cz/qemu/rth
* 'axp-next' of git://repo.or.cz/qemu/rth:
  alpha-linux-user: Fix the getpriority syscall
  alpha-linux-user: Properly handle the non-rt sigprocmask syscall.
  alpha-linux-user: Fix a3 error return with v0 error bypass.
  linux-user: Translate pipe2 flags; add to strace
  linux-user: Allocate the right amount of space for non-fixed file maps
  linux-user: Handle O_SYNC, O_NOATIME, O_CLOEXEC, O_PATH
  linux-user: Sync fcntl.h bits with the kernel
  alpha-linux-user: Handle TARGET_SSI_IEEE_RAISE_EXCEPTION properly
  alpha-linux-user: Work around hosted mmap allocation problems
  alpha-linux-user: Fix signal handling
2012-08-04 17:58:23 +00:00
Richard Henderson
95c098286b alpha-linux-user: Fix the getpriority syscall
Alpha uses unbiased priority values in the syscall, with the a3
return value signaling error conditions.  Therefore, properly
interpret the libc getpriority as needed for the guest rather
than passing the host value through unchanged.

Signed-off-by: Richard Henderson <rth@twiddle.net>
2012-08-04 09:37:50 -07:00
Richard Henderson
0229f5a30e alpha-linux-user: Properly handle the non-rt sigprocmask syscall.
Name the syscall properly for QEMU, kernel source notwithstanding.
Fix syntax errors in the code thus enabled within do_syscall.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2012-08-04 09:37:49 -07:00
Richard Henderson
0e141977e6 alpha-linux-user: Fix a3 error return with v0 error bypass.
We were failing to initialize a3 for syscalls that bypass the
negative return value error check.

Signed-off-by: Richard Henderson <rth@twiddle.net>
2012-08-04 09:37:49 -07:00
Richard Henderson
e7ea6cbefd linux-user: Translate pipe2 flags; add to strace
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2012-08-04 09:37:49 -07:00
Richard Henderson
a5e7ee467c linux-user: Allocate the right amount of space for non-fixed file maps
If we let the kernel handle the implementation of mmap_find_vma,
via an anon mmap, we must use the size as indicated by the user
and not the size truncated to the filesize.

This happens often in ld.so, where we initially mmap the file to
the size of the text+data+bss to reserve an area, then mmap+fixed
over the top to properly handle data and bss.

Signed-off-by: Richard Henderson <rth@twiddle.net>
2012-08-04 09:37:49 -07:00
Richard Henderson
afc8763f9d linux-user: Handle O_SYNC, O_NOATIME, O_CLOEXEC, O_PATH
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2012-08-04 09:37:49 -07:00
Richard Henderson
4eeea4f3f1 linux-user: Sync fcntl.h bits with the kernel
For each target, only define the bits that appear in
arch/target/include/asm/fcntl.h.  Mirror the kernel's
asm-generic layout by handling anything undefined afterward.

Signed-off-by: Richard Henderson <rth@twiddle.net>
2012-08-04 09:37:49 -07:00
Richard Henderson
6e06d515d4 alpha-linux-user: Handle TARGET_SSI_IEEE_RAISE_EXCEPTION properly
We weren't aggregating the exceptions, nor raising signals properly.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2012-08-04 09:37:49 -07:00
Richard Henderson
76393642ae alpha-linux-user: Work around hosted mmap allocation problems
Signed-off-by: Richard Henderson <rth@twiddle.net>
2012-08-04 09:37:48 -07:00
Richard Henderson
d0f204952a alpha-linux-user: Fix signal handling
Proper signal numbers were not defined, and EXCP_INTERRUPT
was unhandled, leading to all sorts of subtle confusion.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2012-08-04 09:37:48 -07:00
Blue Swirl
17a4ed8a5e bitops: drop volatile qualifier
Qualifier 'volatile' is not useful for applications, it's too strict
for single threaded code but does not give the real atomicity guarantees
needed for multithreaded code.

Drop them and now useless casts.

Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2012-08-04 15:51:23 +00:00
Peter Maydell
9b4c0b56b5 configure: Fix set-but-not-used warning in Xen 4.1 probe
The Xen 4.1 probe never uses the return value from xc_interface_open(),
so was provoking a compiler warning on newer gcc. Fix by not bothering
to put the return value anywhere.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2012-08-04 13:25:05 +00:00
Peter Maydell
69deef089d configure: Don't run Xen compile checks in subshells
The Xen compile checks are currently run inside subshells. This
is unnecessary and has the effect that if do_cc() exits with
an error message then this only causes the subshell to exit,
not the whole of configure, which is confusing. Remove the
subshells, changing:
  if ( cat ; compile_prog ) ; then ...
to
  if cat && compile_prog ; then ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2012-08-04 13:25:04 +00:00
Chegu Vinod
ee785fed5d Fixes related to processing of qemu's -numa option
The -numa option to qemu is used to create [fake] numa nodes
and expose them to the guest OS instance.

There are a couple of issues with the -numa option:

a) Max VCPU's that can be specified for a guest while using
   the qemu's -numa option is 64. Due to a typecasting issue
   when the number of VCPUs is > 32 the VCPUs don't show up
   under the specified [fake] numa nodes.

b) KVM currently has support for 160VCPUs per guest. The
   qemu's -numa option has only support for upto 64VCPUs
   per guest.
This patch addresses these two issues.

Below are examples of (a) and (b)

a) >32 VCPUs are specified with the -numa option:

/usr/local/bin/qemu-system-x86_64 \
-enable-kvm \
71:01:01 \
-net tap,ifname=tap0,script=no,downscript=no \
-vnc :4

...
Upstream qemu :
--------------

QEMU 1.1.50 monitor - type 'help' for more information
(qemu) info numa
6 nodes
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 32 33 34 35 36 37 38 39 40 41
node 0 size: 131072 MB
node 1 cpus: 10 11 12 13 14 15 16 17 18 19 42 43 44 45 46 47 48 49 50 51
node 1 size: 131072 MB
node 2 cpus: 20 21 22 23 24 25 26 27 28 29 52 53 54 55 56 57 58 59
node 2 size: 131072 MB
node 3 cpus: 30
node 3 size: 131072 MB
node 4 cpus:
node 4 size: 131072 MB
node 5 cpus: 31
node 5 size: 131072 MB

With the patch applied :
-----------------------

QEMU 1.1.50 monitor - type 'help' for more information
(qemu) info numa
6 nodes
node 0 cpus: 0 1 2 3 4 5 6 7 8 9
node 0 size: 131072 MB
node 1 cpus: 10 11 12 13 14 15 16 17 18 19
node 1 size: 131072 MB
node 2 cpus: 20 21 22 23 24 25 26 27 28 29
node 2 size: 131072 MB
node 3 cpus: 30 31 32 33 34 35 36 37 38 39
node 3 size: 131072 MB
node 4 cpus: 40 41 42 43 44 45 46 47 48 49
node 4 size: 131072 MB
node 5 cpus: 50 51 52 53 54 55 56 57 58 59
node 5 size: 131072 MB

b) >64 VCPUs specified with -numa option:

/usr/local/bin/qemu-system-x86_64 \
-enable-kvm \
-cpu Westmere,+rdtscp,+pdpe1gb,+dca,+pdcm,+xtpr,+tm2,+est,+smx,+vmx,+ds_cpl,+monitor,+dtes64,+pclmuldq,+pbe,+tm,+ht,+ss,+acpi,+d-vnc :4

...

Upstream qemu :
--------------

only 63 CPUs in NUMA mode supported.
only 64 CPUs in NUMA mode supported.
QEMU 1.1.50 monitor - type 'help' for more information
(qemu) info numa
8 nodes
node 0 cpus: 6 7 8 9 38 39 40 41 70 71 72 73
node 0 size: 65536 MB
node 1 cpus: 10 11 12 13 14 15 16 17 18 19 42 43 44 45 46 47 48 49 50 51 74 75 76 77 78 79
node 1 size: 65536 MB
node 2 cpus: 20 21 22 23 24 25 26 27 28 29 52 53 54 55 56 57 58 59 60 61
node 2 size: 65536 MB
node 3 cpus: 30 62
node 3 size: 65536 MB
node 4 cpus:
node 4 size: 65536 MB
node 5 cpus:
node 5 size: 65536 MB
node 6 cpus: 31 63
node 6 size: 65536 MB
node 7 cpus: 0 1 2 3 4 5 32 33 34 35 36 37 64 65 66 67 68 69
node 7 size: 65536 MB

With the patch applied :
-----------------------

QEMU 1.1.50 monitor - type 'help' for more information
(qemu) info numa
8 nodes
node 0 cpus: 0 1 2 3 4 5 6 7 8 9
node 0 size: 65536 MB
node 1 cpus: 10 11 12 13 14 15 16 17 18 19
node 1 size: 65536 MB
node 2 cpus: 20 21 22 23 24 25 26 27 28 29
node 2 size: 65536 MB
node 3 cpus: 30 31 32 33 34 35 36 37 38 39
node 3 size: 65536 MB
node 4 cpus: 40 41 42 43 44 45 46 47 48 49
node 4 size: 65536 MB
node 5 cpus: 50 51 52 53 54 55 56 57 58 59
node 5 size: 65536 MB
node 6 cpus: 60 61 62 63 64 65 66 67 68 69
node 6 size: 65536 MB
node 7 cpus: 70 71 72 73 74 75 76 77 78 79

Signed-off-by: Chegu Vinod <chegu_vinod@hp.com>, Jim Hull <jim.hull@hp.com>, Craig Hada <craig.hada@hp.com>
Tested-by: Eduardo Habkost <ehabkost@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2012-08-04 13:23:58 +00:00