AMD has a separate guest CPL field, because on AMD, the SYSCALL/SYSRET
instructions do not force SS.DPL to predefined values. On Intel they do,
so the CPL on Intel is just the guest's SS.DPL value.
Even though it is technically possible on AMD, there is no sane reason for a
guest kernel to set a non-three SS.DPL: doing so would mess up several
common segmentation practices and wouldn't be compatible with Intel.
So, force the Intel behavior on AMD, by always setting SS.DPL<=>CPL.
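For illustration, a minimal sketch of the enforcement, with hypothetical
structure and field names (this is not the actual NVMM code):

    /*
     * Illustrative only: mirror SS.DPL into the VMCB's hidden CPL
     * field whenever the guest segment state is written, so that the
     * Intel invariant CPL == SS.DPL always holds on AMD too.
     */
    struct seg_attrib { unsigned dpl : 2; };
    struct seg_state  { struct seg_attrib attrib; };
    struct vmcb_state { unsigned char cpl; struct seg_state ss; };

    static void
    svm_set_ss(struct vmcb_state *st, const struct seg_state *ss)
    {
            st->ss = *ss;
            st->cpl = ss->attrib.dpl;       /* enforce SS.DPL <=> CPL */
    }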
Remove the now-unused CPL field from nvmm_x64_state::misc[]. This actually
increases performance on AMD: to detect interrupt windows the virtualizer
has to modify some fields of misc[], and because CPL was there, we had to
flush the SEG set of the VMCB cache. Now no flush is necessary.
While here, remove the CPL check for XSETBV on Intel: contrary to AMD,
Intel checks the CPL before the intercept, so if we receive an XSETBV
VMEXIT, we are certain that it was executed at CPL=0 in the guest. Besides,
my check was wrong in the first place: it was reading SS.RPL instead
of SS.DPL.
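For reference, the distinction that made the old check wrong, as
hypothetical helpers: the RPL lives in the segment selector, while the DPL
lives in the descriptor attributes (bits 5-6 of the VMX access-rights
word):

    #include <stdint.h>

    static inline uint8_t
    seg_rpl(uint16_t selector)
    {
            return selector & 3;            /* requested privilege level */
    }

    static inline uint8_t
    seg_dpl(uint32_t attrib)
    {
            return (attrib >> 5) & 3;       /* descriptor privilege level */
    }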
tree, don't display a CTLESC which is there only to protect a CTL*
char (a data char that happens to have the same value). No actual
CTL* chars are printed as data, so no escaping is needed to protect
data which just happens to look the same. Dropping this avoids the
possibility of confusion/ambiguity in what the word actually contains.
NFC for any normal shell build (very little of this file gets compiled there
anyway). Discussed on tech-userlevel with no adverse response.
This allows the magic of vars like HOSTNAME, SECONDS, ToD (etc) to be
restored should it be lost - perhaps by having a var of the same name
imported from the environment (which needs to remove the magic in case
a set of scripts is using the env to pass data, and the var name chosen
happens to be one of our magic ones).
No change to SMALL shells (or smaller) - none of the magic vars (except
LINENO, which is exempt from all of this) exist in those, hence such a
shell has no need for this command either.
crtbegin.o has a read-only .eh_frame, and libstdc++ builds.
2017-09-01 Joerg Sonnenberger <joerg@bec.de>
Jeff Law <law@redhat.com>
* varasm.c (bss_initializer_p): Do not put constants into .bss.
(categorize_decl_for_section): Handle bss_initializer_p returning
false when DECL_INITIAL is NULL.
2017-11-27 Jakub Jelinek <jakub@redhat.com>
PR target/83100
* varasm.c (bss_initializer_p): Return true for DECL_COMMON
TREE_READONLY decls.
2018-02-09 Jakub Jelinek <jakub@redhat.com>
PR middle-end/84237
* output.h (bss_initializer_p): Add NAMED argument, defaulted to false.
* varasm.c (bss_initializer_p): Add NAMED argument, if true, ignore
TREE_READONLY bit.
(get_variable_section): For decls in named .bss* sections pass true as
second argument to bss_initializer_p.
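A paraphrased sketch of the predicate's combined logic after all three
changes, modeled with a plain struct instead of GCC's tree nodes (the flag
names mirror the macros cited above; this is not the verbatim varasm.c
code):

    #include <stdbool.h>

    struct decl_flags {
            bool tree_readonly;     /* TREE_READONLY */
            bool decl_common;       /* DECL_COMMON */
            bool has_initial;       /* DECL_INITIAL != NULL */
            bool initial_is_zero;   /* initializer_zerop (DECL_INITIAL) */
    };

    static bool
    bss_initializer_p(const struct decl_flags *d, bool named)
    {
            /* Constants stay out of .bss, unless the decl is common
               or the caller asked about an explicitly named .bss*
               section. */
            if (d->tree_readonly && !d->decl_common && !named)
                    return false;
            return !d->has_initial || d->initial_is_zero;
    }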
is an extra "unusable" bit, which has a twisted meaning. We can't just
ignore this bit, because when unset, the CPU performs extra checks on the
other attributes, which may cause VMENTRY to fail and the guest to be
killed.
Typically, on Qemu, some guests like Windows XP trigger two consecutive
getstate+setstate calls, and while processing them, we end up wrongly
removing the "unusable" bits that were previously set.
Fix that by forcing "unusable = !present". Each hypervisor I could check
does something different, but this seems to be the least problematic
solution for now.
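A minimal sketch of the normalization, assuming the Intel VMX
access-rights layout (P is bit 7, "unusable" is bit 16); the helper name
is illustrative:

    #include <stdint.h>

    #define SEG_AR_P        (1U << 7)       /* Present */
    #define SEG_AR_UNUS     (1U << 16)      /* unusable */

    static inline uint32_t
    vmx_fix_access_rights(uint32_t ar)
    {
            if (ar & SEG_AR_P)
                    ar &= ~SEG_AR_UNUS;     /* present => usable */
            else
                    ar |= SEG_AR_UNUS;      /* !present => unusable */
            return ar;
    }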
While here, the fields of vmx_guest_segs are VMX indexes, so make them
uint64_t (no functional change).
It is UP only, with xbd(4) and xennet(4) as PV drivers.
The console is com0 at isa, and the native portion is a very
rudimentary AT architecture, so it is probably suboptimal to
run without PV support.
Add new tests traceme_raisesignal_masked[1-8].
New tests verify that masking a signal (with SIG_BLOCK) in the tracee
stops the tracer from catching this raised signal. Masked crash signals
are invisible to the tracer as well.
All tests pass.
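A hedged sketch of the scenario these tests exercise (not the actual ATF
test source): the tracee masks a signal and raises it; the tracer must not
observe a signal-stop, since the masked signal stays pending and invisible:

    #include <sys/types.h>
    #include <sys/ptrace.h>
    #include <sys/wait.h>
    #include <signal.h>
    #include <unistd.h>

    int
    main(void)
    {
            pid_t child = fork();

            if (child == 0) {
                    sigset_t set;

                    ptrace(PT_TRACE_ME, 0, NULL, 0);
                    sigemptyset(&set);
                    sigaddset(&set, SIGUSR1);
                    sigprocmask(SIG_BLOCK, &set, NULL); /* mask it... */
                    raise(SIGUSR1);             /* ...then raise it */
                    _exit(0);                   /* no stop expected */
            }

            int status;
            waitpid(child, &status, 0);

            /* With the signal masked, the tracee runs to exit instead
               of stopping; a signal-stop here would be a regression. */
            return WIFEXITED(status) ? 0 : 1;
    }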
Verify correct behavior of crash signals (SIGTRAP, SIGBUS, SIGILL, SIGFPE,
SIGSEGV) in existing test scenarios:
- traceme_raise
- traceme_sendsignal_handle
- traceme_sendsignal_masked
- traceme_sendsignal_ignored
- traceme_sendsignal_simple
- traceme_vfork_raise
These tests verify signals outside the context of a CPU trap. These new
tests will help retain the expected behavior across future changes to the
semantics of trap signals in the kernel.
Pass -DACPI_MISALIGNMENT_NOT_SUPPORTED when kUBSan is enabled. This option
is dedicated to alignment-sensitive CPUs in acpica. It was originally
designed for Itanium CPUs, but nowadays it's wanted for aarch64 as well.
Define it in acpica code under kUBSan in order to pacify Undefined Behavior
reports on all ports (in particular x86). The number of reports is
halved with this patch applied. The remaining alignment alarms in acpica
will be addressed in the future.
Patch contributed by <Akul Pillai>
VMs on Intel CPUs. Overall this implementation is fast and reliable; I am
able to run NetBSD VMs with many VCPUs on a quad-core Intel i5.
NVMM-Intel applies several optimizations already present in NVMM-AMD, and
has a code structure similar to it. No change was needed in the NVMM MI
frontend, or in libnvmm.
Some differences exist against AMD:
- On Intel the ASID space is big, so we don't fall back to a shared ASID
when there are more VCPUs executing than available ASIDs in the host,
contrary to AMD. There are enough ASIDs for the maximum number of VCPUs
supported by NVMM.
- On Intel there are two TLBs we need to take care of: one for the host
  (EPT) and one for the guest (VPID). Changes in EPT paging flush the
  host TLB; changes to the guest mode flush the guest TLB.
- On Intel there is no easy way to set/fetch the VTPR, so we intercept
  reads/writes to CR8 and maintain a software TPR, which we give to the
  virtualizer as if it were the effective TPR in the guest (see the
  sketch after this list).
- On Intel, because of SVS, the host CR4 and LSTAR are not static, so
we're forced to save them on each VMENTRY.
- There is extra Intel weirdness we need to take care of, for example the
reserved bits in CR0 and CR4 when accesses trap.
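A hedged sketch of the software-TPR idea mentioned above; the structure
and function names are illustrative, not the actual NVMM code:

    #include <stdint.h>

    struct vcpu_sw_state {
            uint8_t tpr;    /* software copy of the task priority */
    };

    /* Guest executed "mov %reg, %cr8": record the new TPR. */
    static void
    vmx_exit_cr8_write(struct vcpu_sw_state *st, uint64_t gpr)
    {
            st->tpr = (uint8_t)(gpr & 0xf);
    }

    /* Guest executed "mov %cr8, %reg": feed back the software TPR. */
    static uint64_t
    vmx_exit_cr8_read(const struct vcpu_sw_state *st)
    {
            return st->tpr;
    }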
While this implementation is functional and can already run many OSes, we
likely have a problem with 32bit-PAE guests, because they require special
care on Intel CPUs, and currently we don't handle that correctly; such
guests may misbehave for now (without affecting host stability). I
expect to fix that soon.
the three event types available on AMD, but Intel has seven of them, all
with weird and twisted meanings, and they require extra parameters.
Software interrupts should not be used anyway.
The idea is that we don't want to implement the hypervisor page tables
manually in NVMM itself, because we want pageable guests; that is,
we want to allow UVM to unmap guest pages when the host comes under
memory pressure.
Contrary to AMD-SVM, Intel-VMX uses a different set of PTE bits from
native, and this has three important consequences:
- We can't use the native PTE bits, so each time we want to modify the
  page tables, we need to know whether we're dealing with a native pmap
  or an EPT pmap. This is accomplished with callbacks that handle
  everything PTE-related (see the sketch after this list).
- There is no recursive slot possible, so we can't use pmap_map_ptes().
Rather, we walk down the EPT trees via the direct map, and that's
actually a lot simpler (and probably faster too...).
- The kernel is never mapped in an EPT pmap. An EPT pmap cannot be loaded
on the host. This has two sub-consequences: at creation time we must
zero out all of the top-level PTEs, and at destruction time we force
  the page out of the pool cache and into the pool, to ensure that the
  next allocation will invoke pmap_pdp_ctor() to create a native pmap and
  not recycle some stale EPT entries.
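An illustrative sketch of the callback idea from the first point; the
types and names are hypothetical, not the actual NetBSD pmap interface:

    #include <stdint.h>
    #include <stdbool.h>

    typedef uint64_t pt_entry_t;

    struct pmap_ptops {
            pt_entry_t (*pte_make)(uint64_t pa, int prot); /* encode */
            bool       (*pte_valid)(pt_entry_t pte);       /* V bit? */
            bool       (*pte_used)(pt_entry_t pte);        /* Accessed? */
    };

    /* Each pmap carries the ops matching its PTE format, so the same
       MI logic can drive either native or EPT PTE encodings. */
    struct pmap_sketch {
            const struct pmap_ptops *pm_ptops;  /* native or EPT */
    };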
To create an EPT pmap, the caller must invoke pmap_ept_transform() on a
newly-allocated native pmap. And that's about it: from then on the EPT
callbacks will be invoked, and the pmap can be destroyed via the usual
pmap_destroy(). The TLB shootdown callback is not initialized, however;
it is the responsibility of the hypervisor (NVMM) to set it.
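A sketch of the expected call sequence from the hypervisor side, based on
the description above; the exact signatures may differ from the real pmap
API:

    struct pmap;
    struct pmap *pmap_create(void);
    void pmap_ept_transform(struct pmap *);
    void pmap_destroy(struct pmap *);

    static struct pmap *
    nvmm_alloc_ept_pmap(void)
    {
            struct pmap *pm = pmap_create();    /* native pmap */

            pmap_ept_transform(pm);     /* EPT callbacks from now on */
            /* NVMM must set the TLB shootdown callback itself. */
            return pm;
    }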
There are some twisted cases that we need to handle. For example, if
pmap_is_referenced() is called on a physical page that is entered both by
a native pmap and by an EPT pmap, we take the Accessed bits from the
two pmaps (using different PTE sets in each case) and combine them into a
generic PP_ATTRS_U flag that does not depend on the pmap type.
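A sketch of the attribute merge; PP_ATTRS_U matches the flag named above,
while the PTE bit positions shown are the architectural Accessed bits
(x86 bit 5, EPT bit 8), used here for illustration:

    #include <stdint.h>

    #define PTE_A_NATIVE    (1ULL << 5)     /* x86 Accessed bit */
    #define EPT_A           (1ULL << 8)     /* EPT Accessed bit */
    #define PP_ATTRS_U      0x01            /* generic "used" flag */

    static inline int
    pp_attrs_from_pte(uint64_t pte, int is_ept)
    {
            uint64_t a = is_ept ? EPT_A : PTE_A_NATIVE;

            return (pte & a) ? PP_ATTRS_U : 0;
    }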
Given that the EPT layout is a 4-Level tree with the same address space as
native x86_64, we allow ourselves to use a few native macros in EPT, such
as pmap_pa2pte(), rather than re-defining them with "ept" in the name.
Even though this EPT code is rather complex, it is not too intrusive: just
a few callbacks in a few pmap functions, predicted-false to give priority
to native. So this comes with no messy #ifdef or performance cost.
run without a XEN domain loader having previously overwritten the
hypercall page with its hypercall trampoline machine code, we still
get to detect its presence by calling the xen_version hypercall stub.
We use this hack to detect the presence or absence of the hypervisor,
without relying on the MSR support on HVM domains.
This works as an added sanity check that the hypercall page
registration has indeed succeeded in HVM mode.
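A hedged sketch of the detection idea (not the exact NetBSD code): issue
xen_version through the registered hypercall page and check for a sane
answer:

    #define XENVER_version  0   /* returns (major << 16) | minor */

    extern int HYPERVISOR_xen_version(int cmd, void *arg);

    static int
    xen_check_hypercall_page(void)
    {
            int ver = HYPERVISOR_xen_version(XENVER_version, NULL);

            /* A plausible version (nonzero major) means the hypercall
               trampoline works and Xen actually answered. */
            return (ver >> 16) != 0;
    }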