author | Kenneth Graunke <[email protected]> | 2017-03-28 20:20:00 -0700 |
---|---|---|
committer | Kenneth Graunke <[email protected]> | 2017-04-10 14:32:00 -0700 |
commit | eb41aa82c42c249d0b5c94853f876e3e0d5a6a0e | |
tree | 8c672fae3edecffe731eb3b9435a6a534492efbb /src/mesa/drivers/dri/i965/brw_bufmgr.h | |
parent | e7ab0ea5e7e7a128023928d77c1f1346a27509c2 |
i965/drm: Rewrite relocation handling.
The execbuf2 kernel API requires us to construct two kinds of lists.
First is a "validation list" (struct drm_i915_gem_exec_object2[])
containing each BO referenced by the batch. (The batch buffer itself
must be the last entry in this list.) Each validation list entry
contains a pointer to the second kind of list: a relocation list.
The relocation list contains information about pointers to BOs that
the kernel may need to patch up if it relocates objects within the VMA.
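To make the two lists concrete, here is a minimal sketch in C. The structs are hypothetical, trimmed stand-ins that mirror a subset of the fields of struct drm_i915_gem_relocation_entry and struct drm_i915_gem_exec_object2 from the kernel's i915_drm.h (real code should include that header), and build_lists() is an illustrative helper, not a libdrm or Mesa function. It only shows the shape of the data: each validation list entry may point at a relocation list, and the batch comes last.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical trimmed stand-in for struct drm_i915_gem_relocation_entry. */
struct reloc_entry {
        uint32_t target_handle;   /* GEM handle of the BO being pointed to */
        uint32_t delta;           /* constant added to the target's address */
        uint64_t offset;          /* byte offset of the pointer in this BO */
        uint64_t presumed_offset; /* address userspace assumed for the target */
        uint32_t read_domains;
        uint32_t write_domain;
};

/* Hypothetical trimmed stand-in for struct drm_i915_gem_exec_object2. */
struct exec_object {
        uint32_t handle;           /* GEM handle of this BO */
        uint32_t relocation_count; /* number of entries at relocs_ptr */
        uint64_t relocs_ptr;       /* user pointer to the relocation list */
        uint64_t offset;           /* address of this BO in the GPU VMA */
};

/* Fill a two-entry validation list: one target BO, then the batch.
 * The batch must be the last entry; here only it carries relocations. */
static void build_lists(struct exec_object objs[2],
                        struct reloc_entry *relocs, uint32_t nr_relocs,
                        uint32_t target_handle, uint32_t batch_handle)
{
        assert(nr_relocs > 0);
        /* Compound literals zero every field not named. */
        objs[0] = (struct exec_object) { .handle = target_handle };
        objs[1] = (struct exec_object) {
                .handle = batch_handle,
                .relocation_count = nr_relocs,
                .relocs_ptr = (uint64_t)(uintptr_t)relocs,
        };
}
```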
This is a very general mechanism, allowing every BO to contain pointers
to other BOs. libdrm_intel models this by giving each drm_intel_bo a
list of relocations to other BOs. Together, these form "reloc trees".
Processing relocations involves a depth-first search of the relocation
trees, starting from the batch buffer. Care has to be taken not to
double-visit buffers. Creating the validation list has to be deferred
until the last minute, after all relocations are emitted, so we have the
full tree present. Calculating the amount of aperture space required to
pin those BOs also involves tree walking, which is expensive, so libdrm
has hacks that try to compute cheaper estimates instead.
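A minimal sketch of that tree walk, assuming a hypothetical, simplified tree_bo type in place of libdrm's drm_intel_bo (libdrm's actual code differs). The visited flag is what keeps a BO shared by two parents from being listed twice; it also has to be cleared again after each submission, which is yet another walk.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a BO in the old scheme: every BO carries its
 * own relocation targets, so the BOs form trees rooted at the batch. */
struct tree_bo {
        struct tree_bo *targets[4];  /* BOs this BO has relocations to */
        int num_targets;
        unsigned long size;
        bool visited;                /* set during a walk; must be reset later */
};

/* Depth-first walk from bo, appending each BO to out[] at most once.
 * Children are appended before their parent, so when called on the
 * batch, the batch lands last in the list. Returns the new count. */
static int build_validation_list(struct tree_bo *bo, struct tree_bo **out,
                                 int n, int cap)
{
        if (bo->visited)
                return n;    /* shared BO already listed: don't double-visit */
        bo->visited = true;
        for (int i = 0; i < bo->num_targets; i++)
                n = build_validation_list(bo->targets[i], out, n, cap);
        assert(n < cap);
        out[n++] = bo;
        return n;
}
```

For a batch that points at BO `a` and at `shared`, where `a` also points at `shared`, the walk yields `shared, a, batch`.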
For some reason, it also stored the validation list in the global
(per-screen) bufmgr structure, rather than as a local variable in the
execbuffer function, requiring locking for no good reason.
It also assumed that the batch would probably contain a relocation
every 2 DWords - which is absurdly high - and simply aborted if there
were more relocations than the max. This meant the first relocation
from a BO would allocate 180kB of data structures!
This is way too complicated for our needs. i965 only emits relocations
from the batchbuffer - all GPU commands and state such as SURFACE_STATE
live in the batch BO. No other buffer uses relocations. This means we
can have a single relocation list for the batchbuffer. We can add a BO
to the validation list (set) the first time we emit a relocation to it.
We can easily keep a running tally of the aperture space required for
that list by adding the BO size when we add it to the validation list.
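The scheme just described can be sketched as follows. Every type and function name here (sketch_bo, batch_lists, grow, add_exec_bo, emit_reloc) is hypothetical and simplified, not Mesa's actual code; the set-membership test is a plain linear scan because the text above does not say how the patch performs it. The point is the shape: one flat relocation list for the batch, a validation set filled on first reference, and an aperture total updated in the same step.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins; not Mesa's actual types. */
struct sketch_bo {
        uint32_t gem_handle;
        uint64_t size;
};

struct sketch_reloc {
        uint64_t offset;         /* where in the batch the pointer lives */
        uint32_t target_handle;  /* GEM handle of the BO pointed to */
        uint32_t delta;          /* constant added to the target's address */
};

/* All per-batch (context-local) state: no per-screen globals, no locks. */
struct batch_lists {
        struct sketch_bo **exec_bos;  /* validation list, kept as a set */
        int exec_count, exec_array_size;
        struct sketch_reloc *relocs;  /* the batch's single, flat reloc list */
        int reloc_count, reloc_array_size;
        uint64_t aperture_space;      /* running sum of exec_bos[i]->size */
};

/* Grow an array of elem_size-byte items when full: start small, double. */
static void *grow(void *array, int count, int *capacity, size_t elem_size)
{
        if (count < *capacity)
                return array;
        *capacity = *capacity ? 2 * *capacity : 8;
        array = realloc(array, (size_t)*capacity * elem_size);
        assert(array != NULL);
        return array;
}

/* Add bo to the validation set the first time it is referenced, and
 * charge its size to the running aperture total exactly once. */
static void add_exec_bo(struct batch_lists *b, struct sketch_bo *bo)
{
        for (int i = 0; i < b->exec_count; i++) {
                if (b->exec_bos[i] == bo)
                        return;  /* already in the set */
        }
        b->exec_bos = grow(b->exec_bos, b->exec_count, &b->exec_array_size,
                           sizeof(*b->exec_bos));
        b->exec_bos[b->exec_count++] = bo;
        b->aperture_space += bo->size;
}

/* Record a pointer at batch_offset to target (+ delta). */
static void emit_reloc(struct batch_lists *b, uint64_t batch_offset,
                       struct sketch_bo *target, uint32_t delta)
{
        b->relocs = grow(b->relocs, b->reloc_count, &b->reloc_array_size,
                         sizeof(*b->relocs));
        b->relocs[b->reloc_count++] = (struct sketch_reloc) {
                .offset = batch_offset,
                .target_handle = target->gem_handle,
                .delta = delta,
        };
        add_exec_bo(b, target);
}
```

Two relocations to the same texture plus one to a vertex buffer leave three relocation entries but only two validation list entries, and the aperture total counts the texture once.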
This patch overhauls the relocation system to do exactly that. There
are many nice benefits:
- We have a flat relocation list instead of trees.
- We can produce the validation list up front.
- We can allocate smaller arrays and dynamically grow them.
- Aperture space checks are now (a + b <= c) instead of a tree walk.
- brw_batch_references() is a trivial validation list walk.
It should be straightforward to make it O(1) in the future.
- We don't need to bloat each drm_bacon_bo with 32B of reloc data.
- We don't need to lock in execbuffer, as the data structures are
context-local, and not per-screen.
- Significantly less code and a better match for what we're doing.
- The simpler system should make it easier to take advantage of
I915_EXEC_NO_RELOC in a future patch.
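Two of the benefits above reduce to very small functions once the validation list is flat. In this sketch, batch_state, fits_in_aperture() and batch_references() are hypothetical names, and the threshold argument stands in for whatever aperture limit the driver actually uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_EXEC_BOS 64

struct bo {
        uint64_t size;
};

struct batch_state {
        struct bo *exec_bos[MAX_EXEC_BOS];  /* flat validation list */
        int exec_count;
        uint64_t aperture_space;            /* running total of BO sizes */
};

/* Would adding `extra` bytes of new BOs still fit? One comparison,
 * a + b <= c, instead of walking relocation trees. */
static bool fits_in_aperture(const struct batch_state *b, uint64_t extra,
                             uint64_t threshold)
{
        return b->aperture_space + extra <= threshold;
}

/* Does the batch reference bo? A linear scan of the validation list. */
static bool batch_references(const struct batch_state *b, const struct bo *bo)
{
        assert(b->exec_count <= MAX_EXEC_BOS);
        for (int i = 0; i < b->exec_count; i++) {
                if (b->exec_bos[i] == bo)
                        return true;
        }
        return false;
}
```

One possible route to the O(1) version mentioned above (an assumption, not something this patch does) would be to cache each BO's position in the validation list and check that slot directly.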
Improves performance in Synmark 7.0's OglBatch7:
- Skylake GT4e: 12.1499% +/- 2.29531% (n=130)
- Apollolake: 3.89245% +/- 0.598945% (n=35)
Improves performance in GFXBench4's gl_driver2 test:
- Skylake GT4e: 3.18616% +/- 0.867791% (n=229)
- Apollolake: 4.1776% +/- 0.240847% (n=120)
v2: Feedback from Chris Wilson:
- Omit explicit zero initializers for garbage execbuf fields.
- Use .rsvd1 = ctx_id rather than i915_execbuffer2_set_context_id
- Drop unnecessary fencing assertions.
- Only use _WR variant of execbuf ioctl when necessary.
- Shrink the arrays to be smaller by default.
Reviewed-by: Jason Ekstrand <[email protected]>
Diffstat (limited to 'src/mesa/drivers/dri/i965/brw_bufmgr.h')
-rw-r--r-- | src/mesa/drivers/dri/i965/brw_bufmgr.h | 53 |
1 files changed, 9 insertions, 44 deletions
diff --git a/src/mesa/drivers/dri/i965/brw_bufmgr.h b/src/mesa/drivers/dri/i965/brw_bufmgr.h
index 237f39bb078..d3db6a3967b 100644
--- a/src/mesa/drivers/dri/i965/brw_bufmgr.h
+++ b/src/mesa/drivers/dri/i965/brw_bufmgr.h
@@ -88,6 +88,15 @@ struct _drm_bacon_bo {
         * entries when calling drm_bacon_bo_emit_reloc()
         */
        uint64_t offset64;
+
+       /**
+        * Boolean of whether the GPU is definitely not accessing the buffer.
+        *
+        * This is only valid when reusable, since non-reusable
+        * buffers are those that have been shared with other
+        * processes, so we don't know their state.
+        */
+       bool idle;
 };
 
 #define BO_ALLOC_FOR_RENDER (1<<0)
@@ -178,37 +187,6 @@ void drm_bacon_bo_wait_rendering(drm_bacon_bo *bo);
  */
 void drm_bacon_bufmgr_destroy(drm_bacon_bufmgr *bufmgr);
 
-/** Executes the command buffer pointed to by bo. */
-int drm_bacon_bo_exec(drm_bacon_bo *bo, int used);
-
-/** Executes the command buffer pointed to by bo on the selected ring buffer */
-int drm_bacon_bo_mrb_exec(drm_bacon_bo *bo, int used, unsigned int flags);
-int drm_bacon_bufmgr_check_aperture_space(drm_bacon_bo ** bo_array, int count);
-
-/**
- * Add relocation entry in reloc_buf, which will be updated with the
- * target buffer's real offset on on command submission.
- *
- * Relocations remain in place for the lifetime of the buffer object.
- *
- * \param bo Buffer to write the relocation into.
- * \param offset Byte offset within reloc_bo of the pointer to
- *                      target_bo.
- * \param target_bo Buffer whose offset should be written into the
- *                  relocation entry.
- * \param target_offset Constant value to be added to target_bo's
- *                      offset in relocation entry.
- * \param read_domains GEM read domains which the buffer will be
- *                      read into by the command that this relocation
- *                      is part of.
- * \param write_domains GEM read domains which the buffer will be
- *                      dirtied in by the command that this
- *                      relocation is part of.
- */
-int drm_bacon_bo_emit_reloc(drm_bacon_bo *bo, uint32_t offset,
-                            drm_bacon_bo *target_bo, uint32_t target_offset,
-                            uint32_t read_domains, uint32_t write_domain);
-
 /**
  * Ask that the buffer be placed in tiling mode
  *
@@ -271,9 +249,6 @@ int drm_bacon_bo_disable_reuse(drm_bacon_bo *bo);
  */
 int drm_bacon_bo_is_reusable(drm_bacon_bo *bo);
 
-/** Returns true if target_bo is in the relocation tree rooted at bo. */
-int drm_bacon_bo_references(drm_bacon_bo *bo, drm_bacon_bo *target_bo);
-
 /* drm_bacon_bufmgr_gem.c */
 drm_bacon_bufmgr *drm_bacon_bufmgr_gem_init(struct gen_device_info *devinfo,
                                             int fd, int batch_size);
@@ -290,8 +265,6 @@ void *drm_bacon_gem_bo_map__cpu(drm_bacon_bo *bo);
 void *drm_bacon_gem_bo_map__gtt(drm_bacon_bo *bo);
 void *drm_bacon_gem_bo_map__wc(drm_bacon_bo *bo);
 
-int drm_bacon_gem_bo_get_reloc_count(drm_bacon_bo *bo);
-void drm_bacon_gem_bo_clear_relocs(drm_bacon_bo *bo, int start);
 void drm_bacon_gem_bo_start_gtt_access(drm_bacon_bo *bo, int write_enable);
 
 int drm_bacon_gem_bo_wait(drm_bacon_bo *bo, int64_t timeout_ns);
@@ -300,14 +273,6 @@ drm_bacon_context *drm_bacon_gem_context_create(drm_bacon_bufmgr *bufmgr);
 int drm_bacon_gem_context_get_id(drm_bacon_context *ctx, uint32_t *ctx_id);
 void drm_bacon_gem_context_destroy(drm_bacon_context *ctx);
 
-int drm_bacon_gem_bo_context_exec(drm_bacon_bo *bo, drm_bacon_context *ctx,
-                                  int used, unsigned int flags);
-int drm_bacon_gem_bo_fence_exec(drm_bacon_bo *bo,
-                                drm_bacon_context *ctx,
-                                int used,
-                                int in_fence,
-                                int *out_fence,
-                                unsigned int flags);
 int drm_bacon_bo_gem_export_to_prime(drm_bacon_bo *bo, int *prime_fd);
 
 drm_bacon_bo *drm_bacon_bo_gem_create_from_prime(drm_bacon_bufmgr *bufmgr,