be2cdc5e35
This paves the way for enabling scalable parallel generation of TCG code. Instead of tracking TBs with a single binary search tree (BST), use a BST for each TCG region, protecting it with a lock. This is as scalable as it gets, since each TCG thread operates on a separate region. The core of this change is the introduction of struct tcg_region_tree, which contains a pointer to a GTree and an associated lock to serialize accesses to it. We then allocate an array of tcg_region_tree's, adding the appropriate padding to avoid false sharing based on qemu_dcache_linesize. Given a tc_ptr, we first find the corresponding region_tree. This is done by special-casing the first and last regions first, since they might be of size != region.size; otherwise we just divide the offset by region.stride. I was worried about this division (several dozen cycles of latency), but profiling shows that this is not a fast path. Note that region.stride is not required to be a power of two; it is only required to be a multiple of the host's page size. Note that with this design we can also provide consistent snapshots about all region trees at once; for instance, tcg_tb_foreach acquires/releases all region_tree locks before/after iterating over them. For this reason we now drop tb_lock in dump_exec_info(). As an alternative I considered implementing a concurrent BST, but this can be tricky to get right, offers no consistent snapshots of the BST, and performance and scalability-wise I don't think it could ever beat having separate GTrees, given that our workload is insert-mostly (all concurrent BST designs I've seen focus, understandably, on making lookups fast, which comes at the expense of convoluted, non-wait-free insertions/removals). Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
46 lines
1.3 KiB
C
46 lines
1.3 KiB
C
/*
|
|
* Internal structs that QEMU exports to TCG
|
|
*
|
|
* Copyright (c) 2003 Fabrice Bellard
|
|
*
|
|
* This library is free software; you can redistribute it and/or
|
|
* modify it under the terms of the GNU Lesser General Public
|
|
* License as published by the Free Software Foundation; either
|
|
* version 2 of the License, or (at your option) any later version.
|
|
*
|
|
* This library is distributed in the hope that it will be useful,
|
|
* but WITHOUT ANY WARRANTY; without even the implied warranty of
|
|
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
|
|
* Lesser General Public License for more details.
|
|
*
|
|
* You should have received a copy of the GNU Lesser General Public
|
|
* License along with this library; if not, see <http://www.gnu.org/licenses/>.
|
|
*/
|
|
|
|
#ifndef QEMU_TB_CONTEXT_H
|
|
#define QEMU_TB_CONTEXT_H
|
|
|
|
#include "qemu/thread.h"
|
|
#include "qemu/qht.h"
|
|
|
|
#define CODE_GEN_HTABLE_BITS 15
|
|
#define CODE_GEN_HTABLE_SIZE (1 << CODE_GEN_HTABLE_BITS)
|
|
|
|
typedef struct TranslationBlock TranslationBlock;
|
|
typedef struct TBContext TBContext;
|
|
|
|
struct TBContext {
|
|
|
|
struct qht htable;
|
|
/* any access to the tbs or the page table must use this lock */
|
|
QemuMutex tb_lock;
|
|
|
|
/* statistics */
|
|
unsigned tb_flush_count;
|
|
int tb_phys_invalidate_count;
|
|
};
|
|
|
|
extern TBContext tb_ctx;
|
|
|
|
#endif
|