author | Kenneth Graunke <[email protected]> | 2014-12-03 14:26:48 -0800 |
---|---|---|
committer | Kenneth Graunke <[email protected]> | 2014-12-04 17:50:52 -0800 |
commit | ae45a5a28d8c8a48e7353e37da2ce28a6f2bdef4 (patch) | |
tree | 81ceb657d99bc330ba4cb5d50fb8ac54ac221dee /src/mesa/drivers/dri/i965/brw_draw.c | |
parent | 0b4a6886915571540cfa26fec6fd460d3b81216f (diff) |
i965: Compute VS attribute WA bits earlier and check if they changed.

BRW_NEW_VERTICES is flagged every time we draw a primitive. Having
the brw_vs_prog atom depend on BRW_NEW_VERTICES meant that we had to
compute the VS program key and do a program cache lookup for every
single primitive. This is painfully expensive.

The workaround bit computation is almost entirely based on the vertex
attribute arrays (brw->vb.inputs[i]), which are set by brw_merge_inputs.
The only thing it uses the VS program for is to see which VS inputs are
actually read. brw_merge_inputs() happens once per primitive, and can
safely look at the currently bound vertex program, as it doesn't change
in the middle of a draw.

This patch moves the workaround bit computation to brw_merge_inputs(),
right after assigning brw->vb.inputs[i], and stores the previous WA bit
values in the context. If they've actually changed from the last draw
(which is uncommon), we signal that we need a new vertex program,
causing brw_vs_prog to compute a new key.

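The compare-against-cached-value idea described above can be sketched in
isolation. This is a minimal illustration, not the driver code: the
sketch_ names, the four-slot array, and the single dirty bit are all
hypothetical stand-ins for brw->vb.attrib_wa_flags[] and
brw->state.dirty.brw.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_MAX_ATTRIBS 4            /* hypothetical; not VERT_ATTRIB_MAX */
#define SKETCH_NEW_ATTRIB_WA (1u << 0)  /* hypothetical dirty bit */

struct sketch_context {
   uint8_t cached_wa_flags[SKETCH_MAX_ATTRIBS]; /* values from the last draw */
   uint32_t dirty;                              /* pending dirty bits */
};

/* Store freshly computed flags and raise the dirty bit only when a slot
 * actually differs from the cached value. Returns true if anything changed.
 */
static bool
sketch_update_wa_flags(struct sketch_context *ctx,
                       const uint8_t new_flags[SKETCH_MAX_ATTRIBS])
{
   bool changed = false;

   for (int i = 0; i < SKETCH_MAX_ATTRIBS; i++) {
      if (ctx->cached_wa_flags[i] != new_flags[i]) {
         ctx->cached_wa_flags[i] = new_flags[i];
         ctx->dirty |= SKETCH_NEW_ATTRIB_WA;
         changed = true;
      }
   }

   return changed;
}
```

In this sketch, calling sketch_update_wa_flags() twice with the same
flags leaves the dirty bit clear on the second call, which is the case
the patch expects to hit on most draws.
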
Improves performance in Gl32Batch7 by 13.6123% +/- 0.739652% (n=166)
on Haswell GT3e. I'm told Baytrail shows similar gains.

v2: Introduce a new BRW_NEW_VS_ATTRIB_WORKAROUNDS dirty bit, rather
than reusing BRW_NEW_VERTEX_PROGRAM (suggested by Chris Forbes).
This prevents unnecessary re-emission of surface/sampler related
atoms (and an SOL atom on Sandybridge).

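Why a dedicated dirty bit helps can be shown with a toy atom table. This
is only a sketch under assumed names: the three atoms, their dependency
masks, and the SKETCH_ bits are hypothetical and do not reproduce the
real i965 atom list.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_NEW_PROGRAM   (1u << 0)  /* stands in for a "new program" bit */
#define SKETCH_NEW_ATTRIB_WA (1u << 1)  /* stands in for the dedicated WA bit */

struct sketch_atom {
   const char *name;
   uint32_t depends_on;  /* dirty bits that force this atom to be re-emitted */
};

/* Hypothetical atom table: only the program atom listens to the WA bit. */
static const struct sketch_atom sketch_atoms[] = {
   { "vs_prog",  SKETCH_NEW_PROGRAM | SKETCH_NEW_ATTRIB_WA },
   { "surfaces", SKETCH_NEW_PROGRAM },
   { "samplers", SKETCH_NEW_PROGRAM },
};

/* Count how many atoms would be re-emitted for a given set of dirty bits. */
static unsigned
sketch_count_emitted(uint32_t dirty)
{
   unsigned emitted = 0;

   for (size_t i = 0; i < sizeof(sketch_atoms) / sizeof(sketch_atoms[0]); i++) {
      if (sketch_atoms[i].depends_on & dirty)
         emitted++;
   }

   return emitted;
}
```

With this table, flagging SKETCH_NEW_ATTRIB_WA re-emits one atom, while
flagging SKETCH_NEW_PROGRAM for the same event would re-emit all three.
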
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Chris Forbes <[email protected]>
Diffstat (limited to 'src/mesa/drivers/dri/i965/brw_draw.c')
-rw-r--r-- | src/mesa/drivers/dri/i965/brw_draw.c | 42 |
1 files changed, 42 insertions, 0 deletions
diff --git a/src/mesa/drivers/dri/i965/brw_draw.c b/src/mesa/drivers/dri/i965/brw_draw.c
index 4c2802ac66f..c581cc0f5c8 100644
--- a/src/mesa/drivers/dri/i965/brw_draw.c
+++ b/src/mesa/drivers/dri/i965/brw_draw.c
@@ -46,6 +46,7 @@
 #include "brw_defines.h"
 #include "brw_context.h"
 #include "brw_state.h"
+#include "brw_vs.h"
 
 #include "intel_batchbuffer.h"
 #include "intel_buffers.h"
@@ -281,6 +282,7 @@ static void brw_emit_prim(struct brw_context *brw,
 static void brw_merge_inputs( struct brw_context *brw,
                               const struct gl_client_array *arrays[])
 {
+   const struct gl_context *ctx = &brw->ctx;
    GLuint i;
 
    for (i = 0; i < brw->vb.nr_buffers; i++) {
@@ -293,6 +295,46 @@ static void brw_merge_inputs( struct brw_context *brw,
       brw->vb.inputs[i].buffer = -1;
       brw->vb.inputs[i].glarray = arrays[i];
    }
+
+   if (brw->gen < 8 && !brw->is_haswell) {
+      struct gl_program *vp = &ctx->VertexProgram._Current->Base;
+      /* Prior to Haswell, the hardware can't natively support GL_FIXED or
+       * 2_10_10_10_REV vertex formats. Set appropriate workaround flags.
+       */
+      for (i = 0; i < VERT_ATTRIB_MAX; i++) {
+         if (!(vp->InputsRead & BITFIELD64_BIT(i)))
+            continue;
+
+         uint8_t wa_flags = 0;
+
+         switch (brw->vb.inputs[i].glarray->Type) {
+
+         case GL_FIXED:
+            wa_flags = brw->vb.inputs[i].glarray->Size;
+            break;
+
+         case GL_INT_2_10_10_10_REV:
+            wa_flags |= BRW_ATTRIB_WA_SIGN;
+            /* fallthough */
+
+         case GL_UNSIGNED_INT_2_10_10_10_REV:
+            if (brw->vb.inputs[i].glarray->Format == GL_BGRA)
+               wa_flags |= BRW_ATTRIB_WA_BGRA;
+
+            if (brw->vb.inputs[i].glarray->Normalized)
+               wa_flags |= BRW_ATTRIB_WA_NORMALIZE;
+            else if (!brw->vb.inputs[i].glarray->Integer)
+               wa_flags |= BRW_ATTRIB_WA_SCALE;
+
+            break;
+         }
+
+         if (brw->vb.attrib_wa_flags[i] != wa_flags) {
+            brw->vb.attrib_wa_flags[i] = wa_flags;
+            brw->state.dirty.brw |= BRW_NEW_VS_ATTRIB_WORKAROUNDS;
+         }
+      }
+   }
 }
 
 /**