| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
move_grf_array_access_to_scratch() calculates scratch buffer offsets in
bytes. However, emit_scratch_read/write() expects the base_offset
parameter to be measured in OWords.
As a result, a shader using a scratch read/write offset greater than
zero (in practice, a shader containing more than one variable in
scratch) would use too large an offset, frequently exceeding the
available scratch space.
This patch corrects the mismatch by removing spurious conversion from
OWords to bytes in move_grf_array_access_to_scratch().
This is based on a patch by Paul Berry.
NOTE: This is a candidate for stable release branches.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
| |
It's prohibited by ARB_vp and NV_vp, and not used by fixed function t&l.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Saves 96MB of wasted memory in the l4d2 demo.
v2: Rebase on compare func change, change brace style.
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently, this just avoids comparing all unused parts of param[] and
pull_param[], but it's a step toward getting rid of those giant statically
sized arrays.
v2: Actually use the new function instead of just looking at its
address. This required changing the args to const pointers.
(review by Kenneth)
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
| |
Eric added support for WM key debugging. This adds it for the VS.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
| |
This never worked. brwProgramStringNotify also explicitly rejects
programs that use CAL and RET. So there's no need for this to exist.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
With this and the previous patch, 640x480 nexuiz is running 0.169118%
+/- 0.0863696% faster (n=121). On a VS state change microbenchmark,
performance is increased 8.28645% +/- 0.460478% (n=52).
v2: Fix CACHE_NEW_VS comment.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
| |
Now that this is all factored out, it's trivial to do.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
I initially produced the patch using this bash command:
for file in {intel,i915,i965}/*.{c,cpp,h}; do [ ! -h $file ] && sed -i
's/GLboolean/bool/g' $file && sed -i 's/GL_TRUE/true/g' $file && sed -i
's/GL_FALSE/false/g' $file; done
Then I manually added #include <stdbool.h> to fix compilation errors,
and converted a few functions back to GLboolean that were used in core
Mesa's function pointer table to avoid "incompatible pointer" warnings.
Finally, I cleaned up some whitespace issues introduced by the change.
Signed-off-by: Kenneth Graunke <[email protected]>
Acked-by: Chad Versace <[email protected]>
Acked-by: Paul Berry <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously, if the user enabled a non-consecutive set of clip planes
(e.g. 0, 1, and 3), the driver would compact them down to a
consecutive set starting at 0. This optimization was of dubious
value, and complicated the implementation of gl_ClipDistance.
This patch changes the driver so that with Gen6 and later chipsets, we
no longer compact the clip planes. However, we still discard any clip
planes beyond the highest number that is in use, so performance should
not be affected for applications that use clip planes consecutively
from 0.
With chipsets previous to Gen6, we still compact the clip planes,
since the pre-Gen6 clipper thread relies on this behavior.
Reviewed-by: Ian Romanick <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
The only remaining uses of brw_vs_prog_key::nr_userclip only occurred
when using clip planes (as opposed to gl_ClipDistance). This patch
renames the value to nr_userclip_planes and sets it to zero when
gl_ClipDistance is in use. This avoids unnecessary VS recompiles.
Reviewed-by: Ian Romanick <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously, brw_compute_vue_map required an argument indicating the
number of clip planes in use, but all it did with it was check if it
was nonzero.
This patch changes brw_compute_vue_map to take a boolean instead.
This allows us to avoid some unnecessary recompilation of the Gen4/5
GS and SF threads.
Reviewed-by: Ian Romanick <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previous to this patch, setup_uniform_clipplane_values() was setting
up clip plane uniforms based on ctx->Transform.ClipPlanesEnabled, a
piece of state not stored in the vertex shader cache key. As a
result, a change to this piece of state might not trigger a necessary
vertex shader recompile.
The patch adds a field to the vertex shader cache key,
userclip_planes_enabled, to store the current value of
ctx->Transform.ClipPlanesEnabled. Also, it changes
setup_uniform_clipplane_values() to read from this new field, so that
it's manifestly clear that the vertex shader isn't depending on state
not stored in the cache key.
Note: when the vertex shader uses gl_ClipDistance, the VS backend
doesn't need to know which clip planes are in use, so we leave the
field as zero in that case to avoid unnecessary recompiles.
Fixes Piglit test vs-clip-vertex-enables.
Reviewed-by: Ian Romanick <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
No functional change. This patch rearranges the struct
brw_vs_prog_key so that the two fields related to clipping are
together, and documents those fields. This should make the patches
that follow easier to comprehend, since they add additional
clipping-related fields to this structure.
Reviewed-by: Ian Romanick <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Now that i965 supports 8 clip planes instead of 6, the size of the
brw_vs_compile::userplane array needs to be increased to 8. Changed
the array size to MAX_CLIP_PLANES so that if the number changes again
in the future, this array size won't be missed.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
| |
When the vertex shader writes to gl_ClipDistance, we do clipping based
on clip distances rather than user clip planes, so don't waste push
constant space storing user clip planes that won't be used.
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
Since we now lay out the VUE the same way regardless of whether
two-sided color is enabled, brw_compute_vue_map() no longer needs to
know whether two-sided color is enabled. This allows the two-sided
color flag to be removed from the clip, GS, and VS keys, so that fewer
GPU programs need to be recompiled when turning two-sided color on and
off.
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously, the old VS backend computed the URB entry size by adding
the number of vertex shader outputs to the size of the URB header.
This often produced a larger result than necessary, because some
vertex shader outputs are stored in the header, so they were being
double counted. This patch changes the old VS backend to compute the
URB entry size directly from the number of slots in the VUE map.
Note: there's a subtle change in that we no longer count header
registers towards the size of the VF input. I believe this is
correct, because the header is only emitted in the output of the VS
stage--it is not present in the input. (As evidence for this, note
that brw_vs_state.c sets urb_entry_read_offset to 0--it does not
include space for the header as part of the VS input).
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
| |
structure.
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
| |
Link failure is something that shouldn't happen, but we sometimes want
it during development. The precompile also allows analysis of shader
codegen with shader-db.
|
| |
|
|
|
|
|
|
| |
The low-level IR is a mashup of brw_fs.cpp and ir_to_mesa.cpp. It's
currently controlled by the INTEL_NEW_VS=1 environment variable, and
only tested for the trivial "gl_Position = gl_Vertex;" shader so far.
|
|
|
|
|
|
|
|
|
| |
This sadly requires work in the VS to rescale them, because the
hardware doesn't support this format natively.
Fixes arb_es2_compatibility-fixed-type and gtf/fixed_data_type.
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
| |
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
|
| |
VS places color attributes together so that SF unit can fetch the right
attribute according to object orientation. This fixes light issue in
mesa demo geartrain, projtex.
|
| |
|
|
|
|
|
|
|
|
| |
This basically restores the previous state, where a vertex result slot
is set up for the texcoord to be replaced with point coord. Fixes
piglit point-sprite test.
Bug #27625
|
|
|
|
|
|
|
| |
The pull constants require sending out to an overworked shared unit
and waiting for a response, while push constants are nicely loaded in
for us at thread dispatch time. By putting things we access in every
VS invocation there, ETQW performance improved by 2.5% +/- 1.6% (n=6).
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
If we can't fit all the VS outputs into the MRF, we need to overflow into
temporary GRF registers, then use some MOVs and a second brw_urb_WRITE()
instruction to place the overflow vertex results into the URB.
This is hit when a vertex/fragment shader pair has a large number of varying
variables (12 or more).
There's still something broken here, but it seems close...
|
|
|
|
|
|
|
| |
Make the use_const_buffer field per-program and only call the code which
updates the constant buffer's data if the flag is set.
This should undo the perf regression from 20f3497e4b6756e330f7b3f54e8acaa1d6c92052
|
|
|
|
|
|
| |
Hook up a constant buffer, binding table, etc for the VS unit.
This will allow using large constant buffers with vertex shaders.
The new code is disabled at this time (use_const_buffer=FALSE).
|
| |
|
|
|
|
|
|
|
|
| |
The i965 driver previously had it's own set of code to convert
fixed-function TNL state to a vertex program. Core Mesa has code to
do this, so there is no reason to duplicate that effort in the driver.
In fact, this duplication leads to bugs when other aspects of the Mesa
infrastructure change.
|
|\
| |
| |
| |
| |
| |
| | |
Conflicts:
src/mesa/drivers/dri/i965/brw_sf.h
src/mesa/drivers/dri/i965/intel_context.c
|
| |
| |
| |
| | |
most of the sample working with some small modification
|
| |
| |
| |
| | |
else instruction fix.
|
| | |
|
|/
|
|
| |
eg: #include "shader/program.h" and remove -I$(TOP)/src/mesa/program
|
| |
|
|
This driver comes from Tungsten Graphics, with a few further modifications by
Intel.
|