mirrors/qemu - qemu - SynapseOS git

Author	SHA1	Message	Date
Nicholas Piggin	0dfe59fe77	target/ppc: add SMT support to msgsnd broadcast msgsnd has a broadcast mode that sends hypervisor doorbells to all threads belonging to the same core as the target. A "subcore" mode sends to all or one thread depending on 1LPAR mode. Reviewed-by: Glenn Miles <milesg@linux.ibm.com> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>	2024-05-24 09:34:41 +10:00
Nicholas Piggin	45693f94dd	target/ppc: Implement attn instruction on BookS 64-bit processors attn is an implementation-specific instruction that on POWER (and G5/ 970) can be enabled with a HID bit (disabled = illegal), and executing it causes the host processor to stop and the service processor to be notified. Generally used for debugging. Implement attn and make it checkstop the system, which should be good enough for QEMU debugging. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>	2024-05-24 09:34:38 +10:00
Glenn Miles	6bfcf1dc23	target/ppc: Add clrbhrb and mfbhrbe instructions Add support for the clrbhrb and mfbhrbe instructions. Since neither instruction is believed to be critical to performance, both instructions were implemented using helper functions. Access to both instructions is controlled by bits in the HFSCR (for privileged state) and MMCR0 (for problem state). A new function, helper_mmcr0_facility_check, was added for checking MMCR0[BHRBA] and raising a facility_unavailable exception if required. NOTE: For P8 and P9, due to a performance issue, branch history will not be kept, but the instructions will be allowed to execute as normal with the exception that the mfbhrbe instruction will always return a zero value. Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Glenn Miles <milesg@linux.vnet.ibm.com> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>	2024-05-24 09:33:32 +10:00
Glenn Miles	4de4a4705f	target/ppc: Add recording of taken branches to BHRB This commit continues adding support for the Branch History Rolling Buffer (BHRB) as is provided starting with the P8 processor and continuing with its successors. This commit is limited to the recording and filtering of taken branches. The following changes were made: - Enabled functionality on P10 processors only due to performance impact seen with P8 and P9 where it is not disabled for non problem state branches. - Added a BHRB buffer for storing branch instruction and target addresses for taken branches - Renamed gen_update_cfar to gen_update_branch_history and added a 'target' parameter to hold the branch target address and 'inst_type' parameter to use for filtering - Added TCG code to gen_update_branch_history that stores data to the BHRB and updates the BHRB offset. - Added BHRB resource initialization and reset functions Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Glenn Miles <milesg@linux.vnet.ibm.com> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>	2024-05-24 09:33:06 +10:00
Chinmay Rath	687a30ad3c	target/ppc: Move VMX integer max/min instructions to decodetree. Moving the following instructions to decodetree specification : v{max, min}{u, s}{b, h, w, d} : VX-form The changes were verified by validating that the tcg ops generated by those instructions remain the same, which were captured with the '-d in_asm,op' flag. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Chinmay Rath <rathc@linux.ibm.com> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>	2024-05-24 08:57:50 +10:00
Chinmay Rath	664eb39ec9	target/ppc: Move VMX integer logical instructions to decodetree. Moving the following instructions to decodetree specification: v{and, andc, nand, or, orc, nor, xor, eqv} : VX-form The changes were verified by validating that the tcp ops generated by those instructions remain the same, which were captured with the '-d in_asm,op' flag. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Chinmay Rath <rathc@linux.ibm.com> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>	2024-05-24 08:57:50 +10:00
Chinmay Rath	21b5f5464f	target/ppc: Move VMX storage access instructions to decodetree Moving the following instructions to decodetree specification : {l,st}ve{b,h,w}x, {l,st}v{x,xl}, lvs{l,r} : X-form The changes were verified by validating that the tcg ops generated by those instructions remain the same, which were captured using the '-d in_asm,op' flag. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Chinmay Rath <rathc@linux.ibm.com> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>	2024-05-24 08:57:50 +10:00
Chinmay Rath	948e257c48	target/ppc: Move logical fixed-point instructions to decodetree. Moving the below instructions to decodetree specification : andi[s]., {ori, xori}[s] : D-form {and, andc, nand, or, orc, nor, xor, eqv}[.], exts{b, h, w}[.], cnt{l, t}z{w, d}[.], popcnt{b, w, d}, prty{w, d}, cmp, bpermd : X-form With this patch, all the fixed-point logical instructions have been moved to decodetree. The changes were verified by validating that the tcg ops generated by those instructions remain the same, which were captured with the '-d in_asm,op' flag. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Chinmay Rath <rathc@linux.ibm.com> [np: 32-bit compile fix] Signed-off-by: Nicholas Piggin <npiggin@gmail.com>	2024-05-24 08:57:50 +10:00
Chinmay Rath	ae556c6a49	target/ppc: Move cmp{rb, eqb}, tw[i], td[i], isel instructions to decodetree. Moving the following instructions to decodetree specification : cmp{rb, eqb}, t{w, d} : X-form t{w, d}i : D-form isel : A-form The changes were verified by validating that the tcg ops generated by those instructions remain the same, which were captured using the '-d in_asm,op' flag. Also for CMPRB, following review comments : Replaced repetition of arithmetic right shifting (tcg_gen_shri_i32) followed by extraction of last 8 bits (tcg_gen_ext8u_i32) with extraction of the required bits using offsets (tcg_gen_extract_i32). Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Chinmay Rath <rathc@linux.ibm.com> [np: 32-bit compile fix] Signed-off-by: Nicholas Piggin <npiggin@gmail.com>	2024-05-24 08:57:50 +10:00
Chinmay Rath	f424bc10eb	target/ppc: Move div/mod fixed-point insns (64 bits operands) to decodetree. Moving the below instructions to decodetree specification : divd[u, e, eu][o][.] : XO-form mod{sd, ud} : X-form With this patch, all the fixed-point arithmetic instructions have been moved to decodetree. The changes were verified by validating that the tcg ops generated by those instructions remain the same, which were captured using the '-d in_asm,op' flag. Also, remaned do_divwe method in fixedpoint-impl.c.inc to do_dive because it is now used to divide doubleword operands as well, and not just words. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Chinmay Rath <rathc@linux.ibm.com> [np: 32-bit compile fix] Signed-off-by: Nicholas Piggin <npiggin@gmail.com>	2024-05-24 08:57:50 +10:00
Chinmay Rath	703e88f723	target/ppc: Move multiply fixed-point insns (64-bit operands) to decodetree. Moving the following instructions to decodetree : mul{ld, ldo, hd, hdu}[.] : XO-form madd{hd, hdu, ld} : VA-form The changes were verified by validating that the tcg ops generated by those instructions remain the same, which were captured with the '-d in_asm,op' flag. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Chinmay Rath <rathc@linux.ibm.com> [np: 32-bit compile fix] Signed-off-by: Nicholas Piggin <npiggin@gmail.com>	2024-05-24 08:57:50 +10:00
Chinmay Rath	a81b5c1867	target/ppc: Move neg, darn, mod{sw, uw} to decodetree. Moving the below instructions to decodetree specification : neg[o][.] : XO-form mod{sw, uw}, darn : X-form The changes were verified by validating that the tcg ops generated by those instructions remain the same, which were captured with the '-d in_asm,op' flag. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Chinmay Rath <rathc@linux.ibm.com> [np: 32-bit compile fix] Signed-off-by: Nicholas Piggin <npiggin@gmail.com>	2024-05-24 08:57:50 +10:00
Chinmay Rath	2871921d85	target/ppc: Move divw[u, e, eu] instructions to decodetree. Moving the following instructions to decodetree specification : divw[u, e, eu][o][.] : XO-form The changes were verified by validating that the tcg ops generated by those instructions remain the same, which were captured with the '-d in_asm,op' flag. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Chinmay Rath <rathc@linux.ibm.com> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>	2024-05-24 08:57:50 +10:00
Chinmay Rath	a1faff873a	target/ppc: Move mul{li, lw, lwo, hw, hwu} instructions to decodetree. Moving the following instructions to decodetree specification : mulli : D-form mul{lw, lwo, hw, hwu}[.] : XO-form The changes were verified by validating that the tcg ops generated by those instructions remain the same, which were captured with the '-d in_asm,op' flag. Also cleaned up code for mullw[o][.] as per review comments while keeping the logic of the tcg ops generated semantically same. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Chinmay Rath <rathc@linux.ibm.com> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>	2024-05-24 08:57:50 +10:00
Chinmay Rath	177fcc06dc	target/ppc: Move floating-point arithmetic instructions to decodetree. This patch moves the below instructions to decodetree specification : f{add, sub, mul, div, re, rsqrte, madd, msub, nmadd, nmsub}[s][.] : A-form ft{div, sqrt} : X-form With this patch, all the floating-point arithmetic instructions have been moved to decodetree. The changes were verified by validating that the tcg ops generated by those instructions remain the same, which were captured with the '-d in_asm,op' flag. Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Chinmay Rath <rathc@linux.ibm.com> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>	2024-05-24 08:57:50 +10:00
Nicholas Piggin	b3cfa2dd2b	target/ppc: Add ISA v3.1 variants of sync instruction POWER10 adds a new field to sync for store-store syncs, and some new variants of the existing syncs that include persistent memory. Implement the store-store syncs and plwsync/phwsync. Reviewed-by: Chinmay Rath <rathc@linux.ibm.com> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>	2024-05-24 08:57:50 +10:00
Nicholas Piggin	ab4f174bae	target/ppc: Fix embedded memory barriers Memory barriers are supposed to do something on BookE systems, these were probably just missed during MTTCG enablement, maybe no targets support SMP. Either way, add proper BookE implementations. Reviewed-by: Chinmay Rath <rathc@linux.ibm.com> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>	2024-05-24 08:57:50 +10:00
Nicholas Piggin	13f5086783	target/ppc: Move sync instructions to decodetree This tries to faithfully reproduce the odd BookE logic. Note the e206 check in gen_msync_4xx() is always false, so not carried over. It does change the handling of non-zero reserved bits outside the defined fields from being illegal to being ignored, which the architecture specifies ot help with backward compatibility of new fields. The existing behaviour causes illegal instruction exceptions when using new POWER10 sync variants that add new fields, after this the instructions are accepted and are implemented as supersets of the new behaviour, as intended. Reviewed-by: Chinmay Rath <rathc@linux.ibm.com> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>	2024-05-24 08:57:50 +10:00
Nicholas Piggin	82676f1fc4	target/ppc: Fix broadcast tlbie synchronisation With mttcg, broadcast tlbie instructions do not wait until other vCPUs have been kicked out of TCG execution before they complete (including necessary subsequent tlbsync, etc., instructions). This is contrary to the ISA, and it permits other vCPUs to use translations after the TLB flush. For example: CPU0 // memP is initially 0, memV maps to memP with pte pte = 0; ptesync ; tlbie ; eieio ; tlbsync ; ptesync memP = 1; CPU1 assert(*memV == 0); It is possible for the assertion to fail because CPU1 translates memV using the TLB after CPU0 has stored 1 to the underlying memory. This race was observed with a careful test case where CPU1 checks run in a very large expensive TB so it can run for the entire CPU0 period between clearing the pte and storing the memory, but host vCPU thread preemption could cause the race to hit anywhere. As explained in commit `4ddc104689` ("target/ppc: Fix tlbie"), it is not enough to just use tlb_flush_all_cpus_synced(), because that does not execute until the calling CPU has finished its TB. It is also required that the TB is ended at the point where the TLB flush must subsequently take effect. Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>	2024-05-24 08:57:50 +10:00
Chinmay Rath	a9bd40d937	target/ppc: Move add and subf type fixed-point arithmetic instructions to decodetree This patch moves the below instructions to decodetree specification: {add, subf}[c,e,me,ze][o][.] : XO-form addic[.], subfic : D-form addex : Z23-form This patch introduces XO form instructions into decode tree specification, for which all the four variations([o][.]) have been handled with a single pattern. The changes were verified by validating that the tcg ops generated by those instructions remain the same, which were captured with the '-d in_asm,op' flag. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Harsh Prateek Bora <harshpb@linux.ibm.com> Signed-off-by: Chinmay Rath <rathc@linux.ibm.com> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>	2024-03-13 02:47:04 +10:00
Nicholas Piggin	2cc0e449d1	target/ppc: Fix lxv/stxv MSR facility check The move to decodetree flipped the inequality test for the VEC / VSX MSR facility check. This caused application crashes under Linux, where these facility unavailable interrupts are used for lazy-switching of VEC/VSX register sets. Getting the incorrect interrupt would result in wrong registers being loaded, potentially overwriting live values and/or exposing stale ones. Cc: qemu-stable@nongnu.org Reported-by: Joel Stanley <joel@jms.id.au> Fixes: `70426b5bb7` ("target/ppc: moved stxvx and lxvx from legacy to decodtree") Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1769 Reviewed-by: Harsh Prateek Bora <harshpb@linux.ibm.com> Tested-by: Harsh Prateek Bora <harshpb@linux.ibm.com> Reviewed-by: Cédric Le Goater <clg@kaod.org> Tested-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>	2024-02-23 23:16:34 +10:00
Manos Pitsidianakis	2bd55fd394	ppc: correct typos Correct typos automatically found with the `typos` tool <https://crates.io/crates/typos> Signed-off-by: Manos Pitsidianakis <manos.pitsidianakis@linaro.org> Reviewed-by: Michael Tokarev <mjt@tls.msk.ru> (mjt: remove 2 "arbitrer" hunks, suggested by BALATON) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>	2024-02-20 22:21:25 +03:00
Richard Henderson	ad75a51e84	tcg: Rename cpu_env to tcg_env Allow the name 'cpu_env' to be used for something else. Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2023-10-03 08:01:02 -07:00
Michael Tokarev	e6a19a6477	ppc: spelling fixes Signed-off-by: Michael Tokarev <mjt@tls.msk.ru> Reviewed-by: Cédric Le Goater <clg@kaod.org>	2023-09-20 07:54:34 +03:00
Nicholas Piggin	718209358f	target/ppc: Fix LQ, STQ register-pair order for big-endian LQ, STQ have the same register-pair ordering as LQARX/STQARX., which is the even (lower) register contains the most significant bits. This is not implemented correctly for big-endian. do_ldst_quad() has variables low_addr_gpr and high_addr_gpr which is confusing because they are low and high addresses, whereas LQARX/STQARX. and most such things use the low and high values for lo/hi variables. The conversion to native 128-bit memory access functions missed this strangeness. Fix this by changing the if condition, and change the variable names to hi/lo to match convention. Cc: qemu-stable@nongnu.org Reported-by: Ivan Warren <ivan@vmfacility.fr> Fixes: `57b38ffd0c` ("target/ppc: Use tcg_gen_qemu_{ld,st}_i128 for LQARX, LQ, STQ") Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1836 Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Cédric Le Goater <clg@kaod.org>	2023-09-06 11:19:33 +02:00
Richard Henderson	253d110dba	target/ppc: Use tcg_gen_negsetcond_* Tested-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2023-08-24 11:22:42 -07:00
Philippe Mathieu-Daudé	283a917772	target/ppc: Inline gen_icount_io_start() Now that gen_icount_io_start() is a simple wrapper to translator_io_start(), inline it. Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Message-Id: <20230602095439.48102-1-philmd@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2023-06-05 12:04:29 -07:00
Richard Purdie	5260ecffd2	target/ppc: Fix fallback to MFSS for MFFS* instructions on pre 3.0 ISAs The following commits changed the code such that the fallback to MFSS for MFFSCRN, MFFSCRNI, MFFSCE and MFFSL on pre 3.0 ISAs was removed and became an illegal instruction: `bf8adfd88b` - target/ppc: Move mffscrn[i] to decodetree `394c2e2fda` - target/ppc: Move mffsce to decodetree `3e5bce70ef` - target/ppc: Move mffsl to decodetree The hardware will handle them as a MFFS instruction as the code did previously. This means applications that were segfaulting under qemu when encountering these instructions which is used in glibc libm functions for example. The fallback for MFFSCDRN and MFFSCDRNI added in a later patch was also missing. This patch restores the fallback to MFSS for these instructions on pre 3.0s ISAs as the hardware decoder would, fixing the segfaulting libm code. It doesn't have the fallback for 3.0 onwards to match hardware behaviour. Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org> Reviewed-by: Matheus Ferst <matheus.ferst@eldorado.org.br> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20230510111913.1718734-1-richard.purdie@linuxfoundation.org> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>	2023-05-27 08:25:19 -03:00
Richard Henderson	57b38ffd0c	target/ppc: Use tcg_gen_qemu_{ld,st}_i128 for LQARX, LQ, STQ No need to roll our own, as this is now provided by tcg. This was the last use of retxl, so remove that too. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2023-05-23 16:51:19 -07:00
Shivaprasad G Bhat	6a5d81b172	tcg: ppc64: Fix mask generation for vextractdm In function do_extractm() the mask is calculated as dup_const(1 << (element_width - 1)). '1' being signed int works fine for MO_8,16,32. For MO_64, on PPC64 host this ends up becoming 0 on compilation. The vextractdm uses MO_64, and it ends up having mask as 0. Explicitly use 1ULL instead of signed int 1 like its used everywhere else. Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1536 Signed-off-by: Shivaprasad G Bhat <sbhat@linux.ibm.com> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Lucas Mateus Castro <lucas.araujo@eldorado.org.br> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Cédric Le Goater <clg@redhat.com> Message-Id: <168319292809.1159309.5817546227121323288.stgit@ltc-boston1.aus.stglabs.ibm.com> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>	2023-05-05 12:34:22 -03:00
Richard Henderson	4fe0e9db0a	target/ppc: Rewrite trans_ADDG6S Compute all carry bits in parallel instead of a loop. Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2023-03-13 07:03:39 -07:00
Richard Henderson	61d4bf3338	target/ppc: Avoid tcg_const_* in fp-impl.c.inc All uses are strictly read-only. Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2023-03-13 07:03:39 -07:00
Richard Henderson	36052a7aa9	target/ppc: Avoid tcg_const_* in vsx-impl.c.inc All remaining uses are strictly read-only. Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2023-03-13 07:03:39 -07:00
Richard Henderson	06c005f65c	target/ppc: Avoid tcg_const_* in xxeval Initialize a new temp instead of tcg_const_*. Fix a pasto in a comment. Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2023-03-13 07:03:39 -07:00
Richard Henderson	4528d720b4	target/ppc: Avoid tcg_const_* in vmx-impl.c.inc All remaining uses are strictly read-only. Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2023-03-13 07:03:39 -07:00
Richard Henderson	ffc0ce24fd	target/ppc: Avoid tcg_const_i64 in do_vcntmb Compute both partial results separately and accumulate at the end, instead of accumulating in the middle. Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2023-03-13 07:03:39 -07:00
Richard Henderson	c3be8116d9	target/ppc: Avoid tcg_const_i64 in do_vector_shift_quad Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2023-03-13 06:44:37 -07:00
Richard Henderson	5b7a8b81d2	target/ppc: Split out gen_vx_vmul10 Move the body out of this large macro. Use tcg_constant_i64. Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2023-03-13 06:44:37 -07:00
Richard Henderson	571f850722	target/ppc: Drop tcg_temp_free Translators are no longer required to free tcg temporaries. Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2023-03-05 13:44:08 -08:00
Richard Henderson	9723281fbb	target/ppc: Don't use tcg_temp_local_new Since tcg_temp_new is now identical, use that. Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2023-03-01 07:33:28 -10:00
Lucas Mateus Castro (alqotel)	bbd8dd5e45	target/ppc: Use gvec to decode XVTSTDC[DS]P Used gvec to translate XVTSTDCSP and XVTSTDCDP. xvtstdcsp: rept loop imm master version prev version current version 25 4000 0 0,206200 0,040730 (-80.2%) 0,040740 (-80.2%) 25 4000 1 0,205120 0,053650 (-73.8%) 0,053510 (-73.9%) 25 4000 3 0,206160 0,058630 (-71.6%) 0,058570 (-71.6%) 25 4000 51 0,217110 0,191490 (-11.8%) 0,192320 (-11.4%) 25 4000 127 0,206160 0,191490 (-7.1%) 0,192640 (-6.6%) 8000 12 0 1,234719 0,418833 (-66.1%) 0,386365 (-68.7%) 8000 12 1 1,232417 1,435979 (+16.5%) 1,462792 (+18.7%) 8000 12 3 1,232760 1,766073 (+43.3%) 1,743990 (+41.5%) 8000 12 51 1,239281 1,319562 (+6.5%) 1,423479 (+14.9%) 8000 12 127 1,231708 1,315760 (+6.8%) 1,426667 (+15.8%) xvtstdcdp: rept loop imm master version prev version current version 25 4000 0 0,159930 0,040830 (-74.5%) 0,040610 (-74.6%) 25 4000 1 0,160640 0,053670 (-66.6%) 0,053480 (-66.7%) 25 4000 3 0,160020 0,063030 (-60.6%) 0,062960 (-60.7%) 25 4000 51 0,160410 0,128620 (-19.8%) 0,127470 (-20.5%) 25 4000 127 0,160330 0,127670 (-20.4%) 0,128690 (-19.7%) 8000 12 0 1,190365 0,422146 (-64.5%) 0,388417 (-67.4%) 8000 12 1 1,191292 1,445312 (+21.3%) 1,428698 (+19.9%) 8000 12 3 1,188687 1,980656 (+66.6%) 1,975354 (+66.2%) 8000 12 51 1,191250 1,264500 (+6.1%) 1,355083 (+13.8%) 8000 12 127 1,197313 1,266729 (+5.8%) 1,349156 (+12.7%) Overall, these instructions are the hardest ones to measure performance as the gvec implementation is affected by the immediate. Above there are 5 different scenarios when it comes to immediate and 2 when it comes to rept/loop combination. The immediates scenarios are: all bits are 0 therefore the target register should just be changed to 0, with 1 bit set, with 2 bits set in a combination the new implementation can deal with using gvec, 4 bits set and the new implementation can't deal with it using gvec and all bits set. The rept/loop scenarios are high loop and low rept (so it should spend more time executing it than translating it) and high rept low loop (so it should spend more time translating it than executing this code). These comparisons are between the upstream version, a previous similar implementation and a one with a cleaner code(this one). For a comparison with o previous different implementation: <20221010191356.83659-13-lucas.araujo@eldorado.org.br> Signed-off-by: Lucas Mateus Castro (alqotel) <lucas.araujo@eldorado.org.br> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20221019125040.48028-13-lucas.araujo@eldorado.org.br> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>	2022-10-28 13:15:22 -03:00
Lucas Mateus Castro (alqotel)	da3c53bac3	target/ppc: Moved XSTSTDC[QDS]P to decodetree Moved XSTSTDCSP, XSTSTDCDP and XSTSTDCQP to decodetree and moved some of its decoding away from the helper as previously the DCMX, XB and BF were calculated in the helper with the help of cpu_env, now that part was moved to the decodetree with the rest. xvtstdcsp: rept loop master patch 8 12500 1,85393600 1,94683600 (+5.0%) 25 4000 1,78779800 1,92479000 (+7.7%) 100 1000 2,12775000 2,28895500 (+7.6%) 500 200 2,99655300 3,23102900 (+7.8%) 2500 40 6,89082200 7,44827500 (+8.1%) 8000 12 17,50585500 18,95152100 (+8.3%) xvtstdcdp: rept loop master patch 8 12500 1,39043100 1,33539800 (-4.0%) 25 4000 1,35731800 1,37347800 (+1.2%) 100 1000 1,51514800 1,56053000 (+3.0%) 500 200 2,21014400 2,47906000 (+12.2%) 2500 40 5,39488200 6,68766700 (+24.0%) 8000 12 13,98623900 18,17661900 (+30.0%) xvtstdcdp: rept loop master patch 8 12500 1,35123800 1,34455800 (-0.5%) 25 4000 1,36441200 1,36759600 (+0.2%) 100 1000 1,49763500 1,54138400 (+2.9%) 500 200 2,19020200 2,46196400 (+12.4%) 2500 40 5,39265700 6,68147900 (+23.9%) 8000 12 14,04163600 18,19669600 (+29.6%) As some values are now decoded outside the helper and passed to it as an argument the number of arguments of the helper increased, the number of TCGop needed to load the arguments increased. I suspect that's why the slow-down in the tests with a high REPT but low LOOP. Signed-off-by: Lucas Mateus Castro (alqotel) <lucas.araujo@eldorado.org.br> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20221019125040.48028-12-lucas.araujo@eldorado.org.br> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>	2022-10-28 13:15:22 -03:00
Lucas Mateus Castro (alqotel)	a70a524710	target/ppc: Moved XVTSTDC[DS]P to decodetree Moved XVTSTDCSP and XVTSTDCDP to decodetree an restructured the helper to be simpler and do all decoding in the decodetree (so XB, XT and DCMX are all calculated outside the helper). Obs: The tests in this one are slightly different, these are the sum of these instructions with all possible immediate and those instructions are repeated 10 times. xvtstdcsp: rept loop master patch 8 12500 2,76402100 2,70699100 (-2.1%) 25 4000 2,64867100 2,67884100 (+1.1%) 100 1000 2,73806300 2,78701000 (+1.8%) 500 200 3,44666500 3,61027600 (+4.7%) 2500 40 5,85790200 6,47475500 (+10.5%) 8000 12 15,22102100 17,46062900 (+14.7%) xvtstdcdp: rept loop master patch 8 12500 2,11818000 1,61065300 (-24.0%) 25 4000 2,04573400 1,60132200 (-21.7%) 100 1000 2,13834100 1,69988100 (-20.5%) 500 200 2,73977000 2,48631700 (-9.3%) 2500 40 5,05067000 5,25914100 (+4.1%) 8000 12 14,60507800 15,93704900 (+9.1%) Signed-off-by: Lucas Mateus Castro (alqotel) <lucas.araujo@eldorado.org.br> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20221019125040.48028-11-lucas.araujo@eldorado.org.br> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>	2022-10-28 13:15:22 -03:00
Lucas Mateus Castro (alqotel)	95a89d3118	target/ppc: Use gvec to decode XVCPSGN[SD]P Moved XVCPSGNSP and XVCPSGNDP to decodetree and used gvec to translate them. xvcpsgnsp: rept loop master patch 8 12500 0,00561400 0,00537900 (-4.2%) 25 4000 0,00562100 0,00400000 (-28.8%) 100 1000 0,00696900 0,00416300 (-40.3%) 500 200 0,02211900 0,00840700 (-62.0%) 2500 40 0,09328600 0,02728300 (-70.8%) 8000 12 0,27295300 0,06867800 (-74.8%) xvcpsgndp: rept loop master patch 8 12500 0,00556300 0,00584200 (+5.0%) 25 4000 0,00482700 0,00431700 (-10.6%) 100 1000 0,00585800 0,00464400 (-20.7%) 500 200 0,01565300 0,00839700 (-46.4%) 2500 40 0,05766500 0,02430600 (-57.8%) 8000 12 0,19875300 0,07947100 (-60.0%) Like the previous instructions there seemed to be a improvement on translation time. Signed-off-by: Lucas Mateus Castro (alqotel) <lucas.araujo@eldorado.org.br> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20221019125040.48028-10-lucas.araujo@eldorado.org.br> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>	2022-10-28 13:15:22 -03:00
Lucas Mateus Castro (alqotel)	a5b3680519	target/ppc: Use gvec to decode XV[N]ABS[DS]P/XVNEG[DS]P Moved XVABSSP, XVABSDP, XVNABSSP,XVNABSDP, XVNEGSP and XVNEGDP to decodetree and used gvec to translate them. xvabssp: rept loop master patch 8 12500 0,00477900 0,00476000 (-0.4%) 25 4000 0,00442800 0,00353300 (-20.2%) 100 1000 0,00478700 0,00366100 (-23.5%) 500 200 0,00973200 0,00649400 (-33.3%) 2500 40 0,03165200 0,02226700 (-29.7%) 8000 12 0,09315900 0,06674900 (-28.3%) xvabsdp: rept loop master patch 8 12500 0,00475000 0,00474400 (-0.1%) 25 4000 0,00355600 0,00367500 (+3.3%) 100 1000 0,00444200 0,00366000 (-17.6%) 500 200 0,00942700 0,00732400 (-22.3%) 2500 40 0,02990000 0,02308500 (-22.8%) 8000 12 0,08770300 0,06683800 (-23.8%) xvnabssp: rept loop master patch 8 12500 0,00494500 0,00492900 (-0.3%) 25 4000 0,00397700 0,00338600 (-14.9%) 100 1000 0,00421400 0,00353500 (-16.1%) 500 200 0,01048000 0,00707100 (-32.5%) 2500 40 0,03251500 0,02238300 (-31.2%) 8000 12 0,08889100 0,06469800 (-27.2%) xvnabsdp: rept loop master patch 8 12500 0,00511000 0,00492700 (-3.6%) 25 4000 0,00398800 0,00381500 (-4.3%) 100 1000 0,00390500 0,00365900 (-6.3%) 500 200 0,00924800 0,00784600 (-15.2%) 2500 40 0,03138900 0,02391600 (-23.8%) 8000 12 0,09654200 0,05684600 (-41.1%) xvnegsp: rept loop master patch 8 12500 0,00493900 0,00452800 (-8.3%) 25 4000 0,00369100 0,00366800 (-0.6%) 100 1000 0,00371100 0,00380000 (+2.4%) 500 200 0,00991100 0,00652300 (-34.2%) 2500 40 0,03025800 0,02422300 (-19.9%) 8000 12 0,09251100 0,06457600 (-30.2%) xvnegdp: rept loop master patch 8 12500 0,00474900 0,00454400 (-4.3%) 25 4000 0,00353100 0,00325600 (-7.8%) 100 1000 0,00398600 0,00366800 (-8.0%) 500 200 0,01032300 0,00702400 (-32.0%) 2500 40 0,03125000 0,02422400 (-22.5%) 8000 12 0,09475100 0,06173000 (-34.9%) This one to me seemed the opposite of the previous instructions, as it looks like there was an improvement in the translation time (itself not a surprise as operations were done twice before so there was the need to translate twice as many TCGop) Signed-off-by: Lucas Mateus Castro (alqotel) <lucas.araujo@eldorado.org.br> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20221019125040.48028-9-lucas.araujo@eldorado.org.br> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>	2022-10-28 13:15:22 -03:00
Lucas Mateus Castro (alqotel)	26c964f851	target/ppc: Move VABSDU[BHW] to decodetree and use gvec Moved VABSDUB, VABSDUH and VABSDUW to decodetree and use gvec to translate them. vabsdub: rept loop master patch 8 12500 0,03601600 0,00688500 (-80.9%) 25 4000 0,03651000 0,00532100 (-85.4%) 100 1000 0,03666900 0,00595300 (-83.8%) 500 200 0,04305800 0,01244600 (-71.1%) 2500 40 0,06893300 0,04273700 (-38.0%) 8000 12 0,14633200 0,12660300 (-13.5%) vabsduh: rept loop master patch 8 12500 0,02172400 0,00687500 (-68.4%) 25 4000 0,02154100 0,00531500 (-75.3%) 100 1000 0,02235400 0,00596300 (-73.3%) 500 200 0,02827500 0,01245100 (-56.0%) 2500 40 0,05638400 0,04285500 (-24.0%) 8000 12 0,13166000 0,12641400 (-4.0%) vabsduw: rept loop master patch 8 12500 0,01646400 0,00688300 (-58.2%) 25 4000 0,01454500 0,00475500 (-67.3%) 100 1000 0,01545800 0,00511800 (-66.9%) 500 200 0,02168200 0,01114300 (-48.6%) 2500 40 0,04571300 0,04138800 (-9.5%) 8000 12 0,12209500 0,12178500 (-0.3%) Same as VADDCUW and VSUBCUW, overall performance gain but it uses more TCGop (4 before the patch, 6 after). Signed-off-by: Lucas Mateus Castro (alqotel) <lucas.araujo@eldorado.org.br> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20221019125040.48028-8-lucas.araujo@eldorado.org.br> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>	2022-10-28 13:15:22 -03:00
Lucas Mateus Castro (alqotel)	c85929b2dd	target/ppc: Move VAVG[SU][BHW] to decodetree and use gvec Moved the instructions VAVGUB, VAVGUH, VAVGUW, VAVGSB, VAVGSH, VAVGSW, to decodetree and use gvec with them. For these one the right shift had to be made before the sum as to avoid an overflow, so add 1 at the end if any of the entries had 1 in its LSB as to replicate the "+ 1" before the shift described by the ISA. vavgub: rept loop master patch 8 12500 0,02616600 0,00754200 (-71.2%) 25 4000 0,02530000 0,00637700 (-74.8%) 100 1000 0,02604600 0,00790100 (-69.7%) 500 200 0,03189300 0,01838400 (-42.4%) 2500 40 0,06006900 0,06851000 (+14.1%) 8000 12 0,13941000 0,20548500 (+47.4%) vavguh: rept loop master patch 8 12500 0,01818200 0,00780600 (-57.1%) 25 4000 0,01789300 0,00641600 (-64.1%) 100 1000 0,01899100 0,00787200 (-58.5%) 500 200 0,02527200 0,01828400 (-27.7%) 2500 40 0,05361800 0,06773000 (+26.3%) 8000 12 0,12886600 0,20291400 (+57.5%) vavguw: rept loop master patch 8 12500 0,01423100 0,00776600 (-45.4%) 25 4000 0,01780800 0,00638600 (-64.1%) 100 1000 0,02085500 0,00787000 (-62.3%) 500 200 0,02737100 0,01828800 (-33.2%) 2500 40 0,05572600 0,06774200 (+21.6%) 8000 12 0,13101700 0,20311600 (+55.0%) vavgsb: rept loop master patch 8 12500 0,03006000 0,00788600 (-73.8%) 25 4000 0,02882200 0,00637800 (-77.9%) 100 1000 0,02958000 0,00791400 (-73.2%) 500 200 0,03548800 0,01860400 (-47.6%) 2500 40 0,06360000 0,06850800 (+7.7%) 8000 12 0,13816500 0,20550300 (+48.7%) vavgsh: rept loop master patch 8 12500 0,01965900 0,00776600 (-60.5%) 25 4000 0,01875400 0,00638700 (-65.9%) 100 1000 0,01952200 0,00786900 (-59.7%) 500 200 0,02562000 0,01760300 (-31.3%) 2500 40 0,05384300 0,06742800 (+25.2%) 8000 12 0,13240800 0,20330000 (+53.5%) vavgsw: rept loop master patch 8 12500 0,01407700 0,00775600 (-44.9%) 25 4000 0,01762300 0,00640000 (-63.7%) 100 1000 0,02046500 0,00788500 (-61.5%) 500 200 0,02745600 0,01843000 (-32.9%) 2500 40 0,05375500 0,06820500 (+26.9%) 8000 12 0,13068300 0,20304900 (+55.4%) These results to me seems to indicate that with gvec the results have a slower translation but faster execution. Signed-off-by: Lucas Mateus Castro (alqotel) <lucas.araujo@eldorado.org.br> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20221019125040.48028-7-lucas.araujo@eldorado.org.br> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>	2022-10-28 13:15:22 -03:00
Lucas Mateus Castro (alqotel)	d57fbd8fd9	target/ppc: Move VPRTYB[WDQ] to decodetree and use gvec Moved VPRTYBW and VPRTYBD to use gvec and both of them and VPRTYBQ to decodetree. VPRTYBW and VPRTYBD now also use .fni4 and .fni8, respectively. vprtybw: rept loop master patch 8 12500 0,01198900 0,00703100 (-41.4%) 25 4000 0,01070100 0,00571400 (-46.6%) 100 1000 0,01123300 0,00678200 (-39.6%) 500 200 0,01601500 0,01535600 (-4.1%) 2500 40 0,03872900 0,05562100 (43.6%) 8000 12 0,10047000 0,16643000 (65.7%) vprtybd: rept loop master patch 8 12500 0,00757700 0,00788100 (4.0%) 25 4000 0,00652500 0,00669600 (2.6%) 100 1000 0,00714400 0,00825400 (15.5%) 500 200 0,01211000 0,01903700 (57.2%) 2500 40 0,03483800 0,07021200 (101.5%) 8000 12 0,09591800 0,21036200 (119.3%) vprtybq: rept loop master patch 8 12500 0,00675600 0,00667200 (-1.2%) 25 4000 0,00619400 0,00643200 (3.8%) 100 1000 0,00707100 0,00751100 (6.2%) 500 200 0,01199300 0,01342000 (11.9%) 2500 40 0,03490900 0,04092900 (17.2%) 8000 12 0,09588200 0,11465100 (19.6%) I wasn't expecting such a performance lost in both VPRTYBD and VPRTYBQ, I'm not sure if it's worth to move those instructions. Comparing the assembly of the helper with the TCGop they are pretty similar, so I'm not sure why vprtybd took so much more time. Signed-off-by: Lucas Mateus Castro (alqotel) <lucas.araujo@eldorado.org.br> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20221019125040.48028-6-lucas.araujo@eldorado.org.br> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>	2022-10-28 13:15:22 -03:00
Lucas Mateus Castro (alqotel)	90b5aadb09	target/ppc: Move VNEG[WD] to decodtree and use gvec Moved the instructions VNEGW and VNEGD to decodetree and used gvec to decode it. vnegw: rept loop master patch 8 12500 0,01053200 0,00548400 (-47.9%) 25 4000 0,01030500 0,00390000 (-62.2%) 100 1000 0,01096300 0,00395400 (-63.9%) 500 200 0,01472000 0,00712300 (-51.6%) 2500 40 0,03809000 0,02147700 (-43.6%) 8000 12 0,09957100 0,06202100 (-37.7%) vnegd: rept loop master patch 8 12500 0,00594600 0,00543800 (-8.5%) 25 4000 0,00575200 0,00396400 (-31.1%) 100 1000 0,00676100 0,00394800 (-41.6%) 500 200 0,01149300 0,00709400 (-38.3%) 2500 40 0,03441500 0,02169600 (-37.0%) 8000 12 0,09516900 0,06337000 (-33.4%) Signed-off-by: Lucas Mateus Castro (alqotel) <lucas.araujo@eldorado.org.br> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20221019125040.48028-5-lucas.araujo@eldorado.org.br> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>	2022-10-28 13:15:22 -03:00
Lucas Mateus Castro (alqotel)	611bc69bf6	target/ppc: Move V(ADD\|SUB)CUW to decodetree and use gvec This patch moves VADDCUW and VSUBCUW to decodtree with gvec using an implementation based on the helper, with the main difference being changing the -1 (aka all bits set to 1) result returned by cmp when true to +1. It also implemented a .fni4 version of those instructions and dropped the helper. vaddcuw: rept loop master patch 8 12500 0,01008200 0,00612400 (-39.3%) 25 4000 0,01091500 0,00471600 (-56.8%) 100 1000 0,01332500 0,00593700 (-55.4%) 500 200 0,01998500 0,01275700 (-36.2%) 2500 40 0,04704300 0,04364300 (-7.2%) 8000 12 0,10748200 0,11241000 (+4.6%) vsubcuw: rept loop master patch 8 12500 0,01226200 0,00571600 (-53.4%) 25 4000 0,01493500 0,00462100 (-69.1%) 100 1000 0,01522700 0,00455100 (-70.1%) 500 200 0,02384600 0,01133500 (-52.5%) 2500 40 0,04935200 0,03178100 (-35.6%) 8000 12 0,09039900 0,09440600 (+4.4%) Overall there was a gain in performance, but the TCGop code was still slightly bigger in the new version (it went from 4 to 5). Signed-off-by: Lucas Mateus Castro (alqotel) <lucas.araujo@eldorado.org.br> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20221019125040.48028-4-lucas.araujo@eldorado.org.br> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>	2022-10-28 13:15:22 -03:00

1 2 3 4 5 ...

381 Commits