| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
| |
I forgot to delete this in 9ef2b9277d3bead6dbfa47e95794ca61e8be4e84.
Signed-off-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
| |
This code is far too complicated to cut and paste.
v2: Update the newly added genX_gpu_memcpy.c; const a few things.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Topi Pohjolainen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
So much of this code was cut and pasted per stage. We can accomplish
much of it by looping over shader stages.
Improves performance of OglBatch7 (version 6) by 1.50783% +/- 0.287049%
(n = 71) at 1024x768 on Cherryview.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Topi Pohjolainen <[email protected]>
|
|
|
|
|
|
|
|
| |
The context fields are for Gen4-5; setting them has always been useless.
There's no point in spending the cost in the hottest path in the driver.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Topi Pohjolainen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
Matt intentionally switched the VS calculation to be float-based in
commit c1da15709a0c0c2775bd9e534f67c60f7dc95ce8. Tessellation support
was written before this and rebased forward, and missed the change.
Now it's consistent.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Topi Pohjolainen <[email protected]>
|
|
|
|
|
|
|
|
| |
If geometry/tessellation shaders are disabled, prog_data will be NULL
(see brw_state_upload.c). This consolidates dirty bits a little.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Topi Pohjolainen <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Topi Pohjolainen <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Topi Pohjolainen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Just say no to:
- brw->gs.base.prog_data = &brw->gs.prog_data->base.base;
We'll just use the brw_stage_prog_data pointer in brw_stage_state
and downcast it to brw_gs_prog_data as needed.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Just say no to:
- brw->tes.base.prog_data = &brw->tes.prog_data->base.base;
We'll just use the brw_stage_prog_data pointer in brw_stage_state
and downcast it to brw_tes_prog_data as needed.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Just say no to:
- brw->tcs.base.prog_data = &brw->tcs.prog_data->base.base;
We'll just use the brw_stage_prog_data pointer in brw_stage_state
and downcast it to brw_tcs_prog_data as needed.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Just say no to:
- brw->vs.base.prog_data = &brw->vs.prog_data->base.base;
We'll just use the brw_stage_prog_data pointer in brw_stage_state
and downcast it to brw_vs_prog_data as needed.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
|
|
| |
Using consistent naming allows us to create macros more easily.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
| |
Using consistent naming allows us to create macros more easily.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
| |
Now that we have gen_device_info mutable, we can update its values and drop
all copies we had in brw_context.
Signed-off-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Make gen_device_info a mutable structure so we can update the fields that
can be refined by querying the kernel (like subslices and EU numbers).
This patch does not make any functional change, it just makes
gen_get_device_info() fill a structure rather than returning a const
pointer.
Signed-off-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
| |
"intelScreen" is wordy and also doesn't fit our style guidelines.
"screen" is shorter, which is nice, because we use it fairly often.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Anuj Phogat <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Generated by:
sed -i -e 's/brw_device_info/gen_device_info/g' src/intel/**/*.c
sed -i -e 's/brw_device_info/gen_device_info/g' src/intel/**/*.h
sed -i -e 's/brw_device_info/gen_device_info/g' **/i965/*.c
sed -i -e 's/brw_device_info/gen_device_info/g' **/i965/*.cpp
sed -i -e 's/brw_device_info/gen_device_info/g' **/i965/*.h
Signed-off-by: Jason Ekstrand <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This makes it possible to skip urb re-configuration if the
subsequent renders agree with the settings.
Also allows blorp to allocate the maximun amount of vs entries
available. Core upload logic already knows how to calculate this.
Helps one synthetic benchmark.
Signed-off-by: Topi Pohjolainen <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Topi Pohjolainen <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Fixes regression introduced by af5ca43f2676bff7499f93277f908b681cb821d0
Cc: "12.0 11.2" <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95419
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Cherryview/Broxton annoyingly have a minimum number of VS URB entries
of 34, which is not a multiple of 8. When the VS size is less than 9,
the number of VS entries has to be a multiple of 8.
Notably, BLORP programmed the minimum number of VS URB entries (34), with
a size of 1 (less than 9), which is invalid.
It seemed like this could be a problem in the regular URB code as well,
so I went ahead and updated that to be safe.
Cc: "12.0" <[email protected]>
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
|
|
|
|
| |
This reverts commit 2a8aa1e3deb99a1ae16d942318da648c1327ece5.
|
|
|
|
|
|
|
| |
Fixes regression introduced by af5ca43f2676bff7499f93277f908b681cb821d0
Reviewed-by: Matt Turner <[email protected]>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95419
|
|
|
|
|
| |
Signed-off-by: Topi Pohjolainen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
| |
Reviewed-by: Topi Pohjolainen <[email protected]
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is trying to enforce the fact that the hardware requires HS, TE,
and DS to be enabled or disabled together. But it's kind of an ad-hoc
attempt, and not too useful.
More importantly, we aren't going to have a gl_shader_program for the
TCS which is automatically generated when none is present. (We'll just
handle it in the driver backend.) So, these will trip for no reason.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
For several reasons, I don't think it's particularly useful to have
separate flags:
1. Most of the time, tessellation shaders are paired, so both will be
replaced at the same time.
2. The data layout is tightly coupled. Both need to agree on the number
of per-patch slots in the VUE map. Even adding extra TCS outputs
that aren't read by the TES will trigger the need for recompiles.
3. The TCS is optional from an API perspective, but required by the
hardware whenever tessellation is enabled. So, atoms that deal with
the TCS must check brw->tess_eval_program (BRW_NEW_TESS_EVAL_PROGRAM?)
rather than brw->tess_ctrl_program to tell whether tessellation is
enabled.
So, not only is it unlikely to be useful, it's a bit confusing to get
right. Simply using one flag for both simplifies this.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
v2: (by Ken, incorporating feedback from Matt Turner):
- Rewrite the push constant allocation code to be clearer.
- Only apply the minimum VS entries workaround on Gen 8.
v3: (by Ken)
- Fix a bug in v2 where we failed to allocate the full push constant
space when the number of enabled stages didn't divide the available
push constant space evenly. (Any left over space is now allocated
to the PS, as it was in v1.)
- Fix an off-by-one error in v2's number of enabled stages calculation.
- Use DIV_ROUND_UP for nicer formatting.
- Line wrapping fixes.
Signed-off-by: Chris Forbes <[email protected]>
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
Reviewed-by: Kristian Høgsberg <[email protected]>
|
|
|
|
|
|
|
| |
This is a newer convention, which we prefer over ALIGN(x, n) / n.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Kristian Høgsberg <[email protected]>
|
|
|
|
|
|
|
|
| |
This will make sure that we recalculate the URB layout anytime the URB
size is modified by the L3 partitioning code.
Reviewed-by: Jordan Justen <[email protected]>
Reviewed-by: Kristian Høgsberg <[email protected]>
|
|
|
|
| |
PIPE_CONTOL!!!
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Literals without an f/F suffix are of type double, and implicit
conversion rules specify that the float in (float op double) be
converted to a double before the operation is performed. I believe float
execution was intended (in nearly all cases) or is sufficient (in the
case of gen7_urb.c).
Removes a lot of float <-> double conversion instructions and replaces
many double instructions with float instructions which are cheaper.
text data bss dec hex filename
4928659 195160 26192 5150011 4e953b i965_dri.so before
4928315 195152 26192 5149659 4e93db i965_dri.so after
Reviewed-by: Iago Toral Quiroga <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Now, we only use ctx->NewDriverState.
I used this bash & sed command in the i965 directory:
for file in *.[ch] *.[ch]pp; do
sed -i -e 's/state\.dirty\.brw/ctx.NewDriverState/g' $file
done
Followed by manual changes to brw_state_upload.c.
Signed-off-by: Jordan Justen <[email protected]>
Reviewed-by: Kristian Høgsberg <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
| |
It's been merged into brw_state_flags::brw for simplicity and
efficiency.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Kristian Høgsberg <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
I put the BRW_NEW_*_PROG_DATA flags at the beginning so that
brw_state_cache.c can still continue using 1 << brw_cache_id.
I also added a comment explaining the difference between
BRW_NEW_*_PROG_DATA and BRW_NEW_*_PROGRAM, as it took me a long time
to remember it.
Non-mechanical changes:
- brw_state_cache.c and brw_ff_gs.c now signal .brw, not .cache.
- brw_state_upload.c - INTEL_DEBUG=state changes.
- brw_context.h - bit definition merging.
v2: Correct the explanation of BRW_NEW_*_PROG_DATA to mention
state-based recompiles, and nix the "proper subset" claim,
as it's false. (Caught by Kristian Høgsberg).
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Kristian Høgsberg <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Now that we've moved a bunch of CACHE_NEW_* bits to BRW_NEW_*, the only
ones that are left are legitimately related to the program cache. Yet,
it seems a bit wasteful to have an entire bitfield for only 7 bits.
State upload is one of the hottest paths in the driver. For each atom
in the list, we call check_state() to see if it needs to be emitted.
Currently, this involves comparing three separate bitfields (mesa, brw,
and cache). Consolidating the brw and cache bitfields would save a
small amount of CPU overhead per atom. Broadwell, for example, has
57 state atoms, so this small savings can add up.
CACHE_NEW_*_PROG covers the brw_*_prog_data structures, as well as the
offset into the program cache BO (prog_offset). Since most uses refer
to brw_*_prog_data, I decided to use BRW_NEW_*_PROG_DATA as the name.
Removing "cache" completely is a bit painful, so I decided to do it in
several patches for easier review, and to separate mechanical changes
from manual ones. This one simply renames things, and was made via:
$ for file in *.[ch]; do
sed -i -e 's/CACHE_NEW_\([A-Z_\*]*\)_PROG/BRW_NEW_\1_PROG_DATA/g' \
-e 's/BRW_NEW_WM_PROG_DATA/BRW_NEW_FS_PROG_DATA/g' $file
done
Note that BRW_NEW_*_PROG_DATA is still in .cache, not .brw!
The next patch will remedy this flaw. It will also fix the
alphabetization issues.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Kristian Høgsberg <[email protected]>
Acked-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Most of the dirty flags were listed in some arbitrary order. Some used
bonus parenthesis. Some put multiple flags on one line, others put one
per line. Some used tabs instead of spaces...but only on some lines.
This patch settles on one flag per line, in alphabetical order, using
spaces instead of tabs, and sheds the unnecessary parentheses.
Sorting was mostly done with vim's visual block feature and !sort,
although I alphabetized short lists by hand; it was pretty manual.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Kristian Høgsberg <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
| |
We only get here if the VS/GS compiled programs change, but we can even
skip it if the VS/GS size didn't change.
Affects cairo runtime on glamor by -1.26471% +/- 0.674335% (n=234)
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Reverts
* "i965: Modify state upload to allow 2 different sets of state atoms."
8e27a4d2b3e4e74e9a77446bce49607433d86be3
* "i965: Modify dirty bit handling to support 2 pipelines."
373143ed9187c4d4ce1e3c486b5dd0880d18ec8b
* "i965: Create a macro for checking a dirty bit."
c5bdf9be1eca190417998d548fd140c1eca37a54
Conflicts:
src/mesa/drivers/dri/i965/brw_context.h
* "i965: Create a macro for setting all dirty bits."
6f56e1424d923fd80c84090fbf4506c9eaaffea1
Conflicts:
src/mesa/drivers/dri/i965/brw_blorp.cpp
src/mesa/drivers/dri/i965/brw_state_cache.c
src/mesa/drivers/dri/i965/brw_state_upload.c
* "i965: Create a macro for setting a dirty bit."
88e3d404dad009d8cff5124cf8acee7daeaceb64
Signed-off-by: Jordan Justen <[email protected]>
|
|
|
|
|
|
|
| |
This will make it easier to extend dirty bit handling to support
compute shaders.
Reviewed-by: Jordan Justen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
According to the workarounds list, these stalls aren't needed on
production Baytrail systems. Piglit confirms that as well.
These cause a small slowdown when we are sending a large number of small
batches to the GPU. Removing these improves performance by up to 5% on
some CPU bound SynMark tests (Batch[4-7], DrvState1, HdrBloom,
Multithread, ShMapPcf).
Signed-off-by: Gregory Hunt <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
According to the BSpec's 3D workarounds page, this is unnecessary on
shipping Haswell hardware, and was never necessary on Broadwell. It
unfortunately doesn't say anything about Baytrail.
The workaround database confirms those results for Ivybridge, Haswell,
and Broadwell. Baytrail is less clear - one page says it's necessary,
while the other says it isn't. For now, be conservative and leave it
enabled.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
| |
Broadwell has 2Kb push constant size increments like Haswell GT3.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
| |
If it wasn't necessary for Haswell, it's likely not to be necessary for
Broadwell either.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
| |
When adding geometry shader support, we accidentally reversed the size
and offset parameters.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Paul Berry <[email protected]>
Cc: "10.0" <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
v2: Don't go to extra work to avoid extraneous flushes. (Previous
experiments in the kernel have suggested that flushing the pipeline
when it is already empty is extremely cheap).
Cc: "10.0" <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Haswell GT2 and GT3 require the number of vertex shader URB entries to
be at least 64, not 32.
At the moment, we always meet this requirement automatically, because
in the absence of a geometry shader, we assign all available URB space
to the vertex shader. But when we turn on support for geometry
shaders, this lower limit will become important.
Reviewed-by: Chad Versace <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The hardware requires that after constant buffers for a stage are
allocated using a 3DSTATE_PUSH_CONSTANT_ALLOC_{VS,HS,DS,GS,PS}
command, and prior to execution of a 3DPRIMITIVE, the corresponding
stage's constant buffers must be reprogrammed using a
3DSTATE_CONSTANT_{VS,HS,DS,GS,PS} command.
Previously we didn't need to worry about this, because we only
programmed 3DSTATE_PUSH_CONSTANT_ALLOC_{VS,HS,DS,GS,PS} once on
startup (or, previous to that, whenever BRW_NEW_CONTEXT was flagged).
But now that we reallocate the constant buffers whenever geometry
shaders are switched on and off, we need to make sure the constant
buffers are reprogrammed.
We do this by adding a new bit, BRW_NEW_PUSH_CONSTANT_ALLOCATION, to
brw->state.dirty.brw.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously, we would always use the same push constant allocation
regardless of what shader programs were being run: the available push
constant space was split into 2 equal size partitions, one for the
vertex shader, and one for the fragment shader.
Now that we are adding geometry shader support, we need to do
something smarter. This patch adjusts things so that when a geometry
shader is in use, we split the available push constant space into 3
nearly-equal size partitions instead of 2.
Since the push constant allocation is now affected by GL state, it can
no longer be set up by brw_upload_initial_gpu_state(); instead it must
be set up by a state atom.
Reviewed-by: Kenneth Graunke <[email protected]>
|