We can get rid of the switch(asc) in mmu_translate_asc() by simply
selecting the right control register ASCE in the mmu_translate()
function already.
This patch is based on an original patch/idea by Ralf Hoppe.
Signed-off-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Bit 52 in a page table entry has always to be zero, or a translation
specification exception is to be recognized.
Signed-off-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
An Address Space Control Element (ASCE) is only the very first unit of
an s390 address translation (normally residing in one of the control
registers). The entries in the page tables are called differently.
So let's call the relevant variable pt_entry instead of asce in
mmu_translate_pte() to avoid future confusion (thus there is no
functional change in this patch, just renaming).
Signed-off-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
If the "DAT-protection" bit is set in the region table entry and EDAT is
enabled, only read accesses are allowed in the corresponding memory area.
Signed-off-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Each different level of region/segment table has a dedicated
exception type for illegal entries.
Signed-off-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
If an ASCE has illegal bits set, an ASCE-type exception should be
generated instead of a translation specification exception.
Signed-off-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
The address space bits in the translation exception code were wrong.
In fact, we can simply copy the bits from the PSW, so there's no need
for the trans_bits() function anymore.
Additionally, we now also set the fetch/store bits in the translation
exception code, so a guest can determine whether the exception occured
during a write or during a read.
Signed-off-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
When a fault occurs during the MMU lookup in s390_cpu_get_phys_page_debug(),
the trigger_page_fault() function writes the translation exception code
into the lowcore - something you would not expect during a memory access
by the debugger. Ease this problem by adding an additional parameter to
mmu_translate() which can be used to specify whether a program check and
the translation exception code should be injected or not.
Signed-off-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
The ACSEs have a table length field and the region entries have
table length and offset fields which must be checked during
translation to see whether the given virtual address is really
covered by the translation table.
Signed-off-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
The current code used a wrong and very confusing way of dealing with
the table levels by introducing a "fake level above current". However,
the real problem was simply that the checks for the region/segment
invalid bit and for the matching region/segment level was done at the
wrong spot in the code - it has to be done after the first table entry
has been looked up instead (e.g. there is also no "invalid" bit in the
ASCE itself and the current "level" has to be the same as the level in
the entry that we just looked up).
Also the entries for the segment table are quite a bit different compared
to the region table entries. So this patch moves the related code into the
function mmu_translate_segment() to make it clear at which table level we
currently are and to get rid of the ugly switch-statement in the function
mmu_translate_region().
Signed-off-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
The real-space designation bits live in the ASCEs, not in the table entries,
so the check must be done before we start walking the MMU table.
Signed-off-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
helper.c is quite overcrowded already, so let's move the MMU
translation to a separate file instead (like it has been done
with the other targets already).
Signed-off-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
Acked-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
The command is not implemented correctly yet. The documentation allows
to not pass any value to set, in which case the time is re-read from
RTC. However, reading CMOS on Windows is not trivial to implement. So
instead of pretending we've set the correct time, fail explicitly.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
For memory block command, we only support for linux with sysfs.
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
This conveys general information about guest memory blocks. Currently,
just the memory block size.
The size of a memory block is architecture dependent, it represents the logical
unit upon which memory online/offline operations are to be performed.
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
*generalized guest-get-memory-block-size to get-get-memory-block-info
for future extensibility
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
We can change guest's online/offline state of memory blocks, by using
command 'guest-set-memory-blocks'.
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
We can get guest's memory block information by using command
"guest-get-memory-blocks", the returned value contains a list of memory block
info, such as phys-index, online state, can-offline info.
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
*replaced guest-triggerable assertion with an error msg
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Introduce three new guest commands:
guest-get-memory-blocks, guest-set-memory-blocks, guest-get-memory-block-size.
With these three commands, we can support online/offline guest's memory block
(logical memory hotplug/unplug) as required from host.
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
*generalized guest-get-memory-block-size to get-get-memory-block-info
for future extensibility
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
The following commands are implemented:
- guest_file_open
- guest_file_close
- guest_file_write
- guest_file_read
- guest_file_seek
- guest_file_flush
Motivation is quite simple: Windows guests should be supported with the
same set of features as Linux one. Also this patch is a prerequisite for
Windows guest-exec command support.
Signed-off-by: Olga Krishtal <okrishtal@parallels.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Michael Roth <mdroth@linux.vnet.ibm.com>
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Moved the code that sets non-blocking flag on fd into a separate function.
Signed-off-by: Simon Zolin <szolin@parallels.com>
Reviewed-by: Roman Kagan <rkagan@parallels.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Michael Roth <mdroth@linux.vnet.ibm.com>
CC: Eric Blake <eblake@redhat.com>
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
The problem is that mingw 4.9.1 fails to compile the code with the
following warning:
/mingw/include/string.h:88:9: note: previous declaration of 'strtok_r'
was here
char *strtok_r(char * __restrict__ _Str,
const char * __restrict__ _Delim,
char ** __restrict__ __last);
/include/sysemu/os-win32.h:83:7: warning: redundant redeclaration of
'strtok_r' [-Wredundant-decls]
char *strtok_r(char *str, const char *delim, char **saveptr);
The problem is that compiles just fine on previous versions of mingw.
Compiler version check here is not a good idea. Though fortunately
strtok_r is used only once in the code and we could simply rewrite
the code without it.
Signed-off-by: Olga Krishtal <okrishtal@parallels.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Eric Blake <eblake@redhat.com>
CC: Michael Roth <mdroth@linux.vnet.ibm.com>
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Add a new 'guest-set-user-password' command for changing the password
of guest OS user accounts. This command is needed to enable OpenStack
to support its API for changing the admin password of guests running
on KVM/QEMU. It is not practical to provide a command at the QEMU
level explicitly targetting administrator account password change
only, since different guest OS have different names for the admin
account. While UNIX systems use 'root', Windows systems typically
use 'Administrator' and even that can be renamed. Higher level apps
like OpenStack have the ability to figure out the correct admin
account name since they have info that QEMU/libvirt do not.
The command accepts either the clear text password string, encoded
in base64 to make it 8-bit safe in JSON:
$ echo -n "123456" | base64
MTIzNDU2
$ virsh -c qemu:///system qemu-agent-command f21x86_64 \
'{ "execute": "guest-set-user-password",
"arguments": { "crypted": false,
"username": "root",
"password": "MTIzNDU2" } }'
{"return":{}}
Or a password that has already been run though a crypt(3) like
algorithm appropriate for the guest, again then base64 encoded:
$ echo -n '$6$n01A2Tau$e...snip...DfMOP7of9AJ1I8q0' | base64
JDYkb...snip...YT2Ey
$ virsh -c qemu:///system qemu-agent-command f21x86_64 \
'{ "execute": "guest-set-user-password",
"arguments": { "crypted": true,
"username": "root",
"password": "JDYkb...snip...YT2Ey" } }'
NB windows support is desirable, but not implemented in this
patch.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Allow "unlocked" reads of the ram_list by using an RCU-enabled QLIST.
The ramlist mutex is kept. call_rcu callbacks are run with the iothread
lock taken, but that may change in the future. Writers still take the
ramlist mutex, but they no longer need to assume that the iothread lock
is taken.
Readers of the list, instead, no longer require either the iothread
or ramlist mutex, but they need to use rcu_read_lock() and
rcu_read_unlock().
One place in arch_init.c was downgrading from write side to read side
like this:
qemu_mutex_lock_iothread()
qemu_mutex_lock_ramlist()
...
qemu_mutex_unlock_iothread()
...
qemu_mutex_unlock_ramlist()
and the equivalent idiom is:
qemu_mutex_lock_ramlist()
rcu_read_lock()
...
qemu_mutex_unlock_ramlist()
...
rcu_read_unlock()
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Mike Day <ncmike@ncultra.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
QLIST has RCU-friendly primitives, so switch to it.
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Mike Day <ncmike@ncultra.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Add RCU-enabled variants on the existing bsd DQ facility. Each
operation has the same interface as the existing (non-RCU)
version. Also, each operation is implemented as macro.
Using the RCU-enabled QLIST, existing QLIST users will be able to
convert to RCU without using a different list interface.
Signed-off-by: Mike Day <ncmike@ncultra.org>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Note that even after this patch, most callers of address_space_*
functions must still be under the big QEMU lock, otherwise the memory
region returned by address_space_translate can disappear as soon as
address_space_translate returns. This will be fixed in the next part
of this series.
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
After the previous patch, TLBs will be flushed on every change to
the memory mapping. This patch augments that with synchronization
of the MemoryRegionSections referred to in the iotlb array.
With this change, it is guaranteed that iotlb_to_region will access
the correct memory map, even once the TLB will be accessed outside
the BQL.
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This for now is a simple TLB flush. This can change later for two
reasons:
1) an AddressSpaceDispatch will be cached in the CPUState object
2) it will not be possible to do tlb_flush once the TCG-generated code
runs outside the BQL.
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Now that objects actually obey the rules, document them.
Reviewed-by: Matthew Rosato <mjrosato@linux.vnet.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
object_unparent should not be called until the parent device is going to be
destroyed. Only remove the capability and do memory_region_del_subregion
at unrealize time. Freeing the data structures is left in shpc_free, to
be called from the instance_finalize callback.
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Matthew Rosato <mjrosato@linux.vnet.ibm.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This memory leak was introduced inadvertently by omitting object_unparent.
A better fix is to use the new memory_region_set_size instead of destroying
and recreating the MMIO region on the fly.
Also, ensure that unmapping and remapping the region is done atomically.
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Do not throw away the value returned by bdrv_check_request() and
bdrv_check_byte_request().
Fix up some coding style issues in the proximity of the affected hunks.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-id: 1423162705-32065-17-git-send-email-mreitz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Now that request clamping is done in the BlockBackend, the "growable"
field can be removed from the BlockDriverState. All BDSs are now treated
as being "growable" (that is, they are allowed to grow; they are not
necessarily actually able to).
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 1423162705-32065-16-git-send-email-mreitz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
BlockBackend is used as the interface between the block layer and guest
devices. It should therefore assure that all requests are clamped to the
image size.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-id: 1423162705-32065-15-git-send-email-mreitz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
qemu-io should behave like a guest, therefore it should use BlockBackend
to access the block layer.
There are a couple of places where that is infeasible: First, the
bdrv_debug_* functions could theoretically be mirrored in the
BlockBackend, but since these are functions internal to the block layer,
they should not be visible externally (qemu-io as a test tool is exempt
from this).
Second, bdrv_get_info() and bdrv_get_specific_info() work on a single
BDS alone, therefore they should stay BDS-specific.
Third, bdrv_is_allocated() mainly works on a single BDS as well. Some
data may be passed through from the BDS's file (if sectors which are
apparently allocated in the file are not really allocated there but just
zero).
[Fixed conflicts around block_acct_start() usage from Fam Zheng's
"qemu-io: Account IO by aio_read and aio_write" commit. Use
BlockBackend and blk_get_stats() instead of BlockDriverState.
--Stefan]
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 1423162705-32065-14-git-send-email-mreitz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Remove "growable" option from the "open" command and from the qemu-io
command line. qemu-io is about to be converted to BlockBackend which
will make sure that no request exceeds the image size, so the only way
to keep "growable" would be to use BlockBackend if it is not given and
to directly access the BDS if it is.
qemu-io is a debugging tool, therefore removing a rarely used option
will have only a very small impact, if any. There was only one
qemu-iotest which used the option; since it is not critical, this patch
just removes it.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 1423162705-32065-13-git-send-email-mreitz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Although qemu-img already creates BlockBackends, it does not do accesses
to the images through them. This patch converts all of the bdrv_* calls
for which this is currently possible to blk_* calls. Most of the
remaining calls will probably stay bdrv_* calls because they really do
operate on the BDS level instead of the BB level.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 1423162705-32065-10-git-send-email-mreitz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 1423162705-32065-9-git-send-email-mreitz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 1423162705-32065-8-git-send-email-mreitz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
As part of the required changes, this fixes a bug where specifying an
invalid driver would result in the block layer probing the image format;
now it will result in an error, unless "<unset>" is specified as the
driver name. Fixing this would require further work on the xen_disk code
which does not seem worth it (at this point and for this patch).
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 1423162705-32065-7-git-send-email-mreitz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Due to different error propagation, this breaks tests 051 and 087; fix
their output.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 1423162705-32065-6-git-send-email-mreitz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
While specifying a different driver and format is obviously invalid,
specifying the same driver once through driver and once through format
is invalid as well. Add a test for it.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 1423162705-32065-5-git-send-email-mreitz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
The argument given to bdrv_find_protocol() is just a file name, which
makes it difficult for the caller to reconstruct what protocol
bdrv_find_protocol() was hoping to find. This patch adds an Error
parameter to that function to solve this issue.
Suggested-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 1423162705-32065-4-git-send-email-mreitz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
blk_new_with_bs() creates a BlockBackend with an empty BlockDriverState
attached to it. Empty BDSs are not nice, therefore add an alternative
function which combines blk_new_with_bs() with bdrv_open().
Note: In contrast to bdrv_open() which takes a BlockDriver parameter,
blk_new_open() does not take such a parameter. This is because
bdrv_open() opens a BlockDriverState, therefore it is natural to be able
to set the BlockDriver for that BDS. The fact that bdrv_open() can open
more than a single BDS is merely some form of a byproduct.
blk_new_open() on the other hand is intended to be used to create a
whole tree of BlockDriverStates. Therefore, setting a single BlockDriver
does not make much sense. Instead, the drivers to be used for each of
the nodes must be configured through the "options" QDict; including the
driver of the root BDS.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-id: 1423162705-32065-3-git-send-email-mreitz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Create the blk_* counterparts for the following bdrv_* functions (which
make sense to call on the BlockBackend level):
- bdrv_co_write_zeroes()
- bdrv_write_compressed()
- bdrv_truncate()
- bdrv_nb_sectors()
- bdrv_discard()
- bdrv_load_vmstate()
- bdrv_save_vmstate()
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 1423162705-32065-2-git-send-email-mreitz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>