| Commit message | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
| |
and _mesa_bitcount_64 with util_bitcount_64. This fixes a build problem
in nir for platforms that don't have popcount or popcountll, such as
32bit msvc.
v2: - Fix additional uses of _mesa_bitcount added after this was
originally written
Acked-by: Eric Engestrom <[email protected]> (v1)
Acked-by: Eric Anholt <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
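
For illustration only, a bit count that does not depend on popcount or popcountll could be sketched as below. The names portable_bitcount and portable_bitcount_64 are hypothetical and this is not the actual util_bitcount implementation; it is just one portable way to get the same result on a compiler without the builtins.

```c
#include <stdint.h>

/* Hypothetical portable population count (illustrative only, not
 * Mesa's util_bitcount). Each iteration clears the lowest set bit. */
static unsigned
portable_bitcount(uint32_t n)
{
   unsigned count = 0;
   while (n) {
      n &= n - 1;   /* drop the lowest set bit */
      count++;
   }
   return count;
}

/* 64-bit variant built from two 32-bit halves, so no 64-bit
 * builtin is required. */
static unsigned
portable_bitcount_64(uint64_t n)
{
   return portable_bitcount((uint32_t)n) +
          portable_bitcount((uint32_t)(n >> 32));
}
```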
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
I'd like to reuse the upload logic for a new program cache, but the
buffers will need to have a different lifetime than the default
uploader, and also some address space restrictions. So, we can't
use a single uploader for both situations - we'll need two of them.
This creates a public 'uploader' structure, and adjusts the interface
to take an uploader rather than always using brw->upload. It should
have no functional change at the moment.
Reviewed-by: Chris Wilson <[email protected]>
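
A rough sketch of the interface shape described above, assuming a simple linear allocator: the upload function takes an explicit uploader instead of reaching for a single global one, so two uploaders with different lifetimes can coexist. The struct and function names (sketch_uploader, sketch_upload_data) and the fixed-size storage are assumptions made for the example, not the driver's real types.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical uploader: a fixed buffer plus a write cursor. */
struct sketch_uploader {
   unsigned char storage[256];
   size_t next_offset;
};

/* Copies size bytes into the given uploader and returns the offset
 * where they were placed, or (size_t)-1 if there is no room left. */
static size_t
sketch_upload_data(struct sketch_uploader *upload,
                   const void *data, size_t size)
{
   if (size > sizeof(upload->storage) - upload->next_offset)
      return (size_t)-1;

   const size_t offset = upload->next_offset;
   memcpy(upload->storage + offset, data, size);
   upload->next_offset += size;
   return offset;
}
```

Because the uploader is a parameter, a caller can keep one instance for ordinary data and a second one for a program cache without either affecting the other's offsets.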
|
|
|
|
|
|
|
|
|
|
| |
This burns an extra 10k of memory or so in the case where you don't have
any images. However, if you have several shaders which use images, this
should be much less memory. It also gets rid of a part of prog_data
that really has nothing to do with the compiler.
Reviewed-by: Jordan Justen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Jordan Justen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
| |
This makes it easier to loop over programs.
Reviewed-by: Alejandro Piñeiro <[email protected]>
|
|
|
|
|
|
| |
Signed-off-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]>
Reviewed-by: Emil Velikov <[email protected]>
|
|
|
|
|
|
| |
Signed-off-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]>
Reviewed-by: Emil Velikov <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The kernel only cares about whether the object is to be written to or
not; it reduces (reloc.read_domains, reloc.write_domain) down to just
!!reloc.write_domain. When we use NO_RELOC, the kernel doesn't even read
those relocs and instead userspace has to pass that information in the
execobject.flags. We can simplify our reloc api by also removing the
unused read/write domains and only pass the resultant flags.
The caveat to the above is when we need to make the kernel aware that
certain objects need to take into account different workarounds.
Previously, this was done using the magic (INSTRUCTION, INSTRUCTION)
reloc domains. NO_RELOC requires this to be passed in the execobject
flags as well, and now we push that up the callstack.
The API is more compact, more expressive of what happens underneath, but
unfortunately requires more knowledge of the system at the point of use.
Conversely it also means that knowledge is specific and not generally
applied and so not overused.
   text    data     bss      dec     hex  filename
8502991  356912  424944  9284847  8dacef  lib/i965_dri.so (before)
8500455  356912  424944  9282311  8da307  lib/i965_dri.so (after)
v2: (by Ken) Rebase.
Reviewed-by: Kenneth Graunke <[email protected]>
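
The reduction described above, from a (read_domains, write_domain) pair to a single write flag, can be sketched as follows. SKETCH_RELOC_WRITE and sketch_reloc_flags are hypothetical names chosen for the example; the bit value is not the kernel's actual execobject flag.

```c
#include <stdint.h>

/* Hypothetical flag bit for illustration; not a real kernel value. */
#define SKETCH_RELOC_WRITE (1u << 0)

/* Collapses the old domain pair into the one bit that matters:
 * whether the object will be written. read_domains is ignored. */
static unsigned
sketch_reloc_flags(uint32_t read_domains, uint32_t write_domain)
{
   (void)read_domains;   /* only the write side affects the result */
   return write_domain ? SKETCH_RELOC_WRITE : 0;
}
```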
|
|
|
|
|
|
|
|
|
|
|
|
| |
The BRW_NEW_CURBE_OFFSETS dirty bit is signalled when changing the
partitioning of the Constant Buffer URB section between the various
shader stages, on Gen4-5.
BRW_NEW_PUSH_CONSTANT_ALLOCATION is basically the same thing on Gen7+.
So, save a bit, and use the new name.
Reviewed-by: Topi Pohjolainen <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Here we move OriginUpperLeft and PixelCenterInteger into gl_program;
all other fields have been replaced by shader_info.
V2: Don't use anonymous union/structs to hold vertex/fragment fields,
as suggested by Ian.
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
| |
Note we access shader_info from the program struct rather than the
nir_shader pointer because shader cache won't create a nir_shader.
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
| |
Here we move the only field in gl_vertex_program to the
ARB program fields in gl_program.
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
When restoring something from shader cache we won't have, and don't
want to create, a nir_shader; this change detaches the two.
There are other advantages such as being able to reuse the
shader info populated by GLSL IR.
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
This is a step towards dropping the GLSL IR version of
do_set_program_inouts() in i965 and moving towards native nir support.
This is important because we want to eventually convert to nir and
use its optimisation passes before we can call this GLSL IR pass.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Just say no to:
- brw->wm.base.prog_data = &brw->wm.prog_data->base.base;
We'll just use the brw_stage_prog_data pointer in brw_stage_state
and downcast it to brw_wm_prog_data as needed.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Timothy Arceri <[email protected]>
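
A minimal sketch of the base-pointer-plus-downcast pattern described above, assuming the stage-specific struct embeds the base struct as its first member (which is what makes the cast valid in C). All type and field names here are hypothetical stand-ins, not the driver's real definitions.

```c
/* Hypothetical base data shared by all stages. */
struct sketch_stage_prog_data {
   unsigned nr_params;
};

/* Hypothetical stage-specific data; the base must be the first member. */
struct sketch_wm_prog_data {
   struct sketch_stage_prog_data base;
   unsigned num_inputs;
};

/* Per-stage state keeps only the base pointer. */
struct sketch_stage_state {
   struct sketch_stage_prog_data *prog_data;
};

/* Downcast helper: valid only when prog_data really points at the
 * base member of a sketch_wm_prog_data. */
static struct sketch_wm_prog_data *
sketch_wm_prog_data(struct sketch_stage_state *stage)
{
   return (struct sketch_wm_prog_data *)stage->prog_data;
}
```

Keeping a single pointer avoids having to update two fields in lockstep, at the cost of an explicit cast wherever the stage-specific fields are needed.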
|
|
|
|
|
|
|
|
|
|
|
|
| |
Just say no to:
- brw->vs.base.prog_data = &brw->vs.prog_data->base.base;
We'll just use the brw_stage_prog_data pointer in brw_stage_state
and downcast it to brw_vs_prog_data as needed.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
Replaces a loop that iterates over and tests every bit in a bitmask
with a loop that iterates only over the bits set in the bitmask.
v2: Use _mesa_bit_scan{,64} instead of open coding.
v3: Use u_bit_scan{,64} instead of _mesa_bit_scan{,64}.
Reviewed-by: Brian Paul <[email protected]>
Signed-off-by: Mathias Fröhlich <[email protected]>
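
The loop shape this refers to can be sketched as below. sketch_bit_scan is a hypothetical stand-in written for the example: it returns the index of the lowest set bit and clears it from the mask. It is not the real u_bit_scan, and it finds the bit with a plain loop rather than a find-first-set instruction.

```c
#include <stdint.h>

/* Hypothetical stand-in for a bit-scan helper: returns the index of
 * the lowest set bit and clears that bit in *mask.
 * The caller must ensure *mask is non-zero. */
static int
sketch_bit_scan(uint32_t *mask)
{
   int i = 0;
   while (!((*mask >> i) & 1u))
      i++;
   *mask &= *mask - 1;   /* clear the bit just found */
   return i;
}

/* Sums weights[i] for each bit i set in enabled. The loop body runs
 * once per set bit, not once per possible bit position. */
static int
sketch_sum_enabled(uint32_t enabled, const int weights[32])
{
   int total = 0;
   while (enabled) {
      const int i = sketch_bit_scan(&enabled);
      total += weights[i];
   }
   return total;
}
```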
|
|
|
|
| |
Reviewed-by: Topi Pohjolainen <[email protected]>
|
|
|
|
|
|
| |
It's never used.
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
| |
Reviewed-by: Topi Pohjolainen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
I finally managed to dig up some information on our mysterious GPU hangs.
A wiki page from the Crestline validation team mentions that they found
a GPU hang in "Serious Sam 2" (on Windows) with remarkably similar
conditions to the ones we've seen in Google Chrome and glmark2.
Apparently, if WM_STATE has "PS Use Source Depth" enabled, CC_STATE has
most depth state disabled, and you issue a CONSTANT_BUFFER command and
immediately draw, the depth interpolator makes a small mistake that
leads to hangs.
Most of the traces I looked at contained a CONSTANT_BUFFER packet
immediately followed by 3DPRIMITIVE, or at least very few packets.
It appears they also have "PS Use Source Depth" enabled - either at the
hang, or a little before it. So I think this is our bug.
The workaround is to emit a non-pipelined state packet after issuing a
CONSTANT_BUFFER packet. This is really similar to the workaround I
developed in commit c4fd0c9052dd391d6f2e9bb8e6da209dfc7ef35b.
v2: Fix word-wrapping issues.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Francisco Jerez <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Now, we only use ctx->NewDriverState.
I used this bash & sed command in the i965 directory:
for file in *.[ch] *.[ch]pp; do
sed -i -e 's/state\.dirty\.brw/ctx.NewDriverState/g' $file
done
Followed by manual changes to brw_state_upload.c.
Signed-off-by: Jordan Justen <[email protected]>
Reviewed-by: Kristian Høgsberg <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Gen4 hardware appears to GPU hang frequently when using Chromium, and
also when running 'glmark2 -b ideas'. Most of the error states contain
3DPRIMITIVE commands in quick succession, with very few state packets
between them - usually VERTEX_BUFFERS/ELEMENTS and CONSTANT_BUFFER.
I trimmed an apitrace of the glmark2 hang down to two draw calls with a
glUniformMatrix4fv call between the two. Either draw by itself works
fine, but together, they hang the GPU. Removing the glUniform call
makes the hangs disappear. In the hardware state, this translates to
removing the CONSTANT_BUFFER packet between the two 3DPRIMITIVE packets.
Flushing before emitting CONSTANT_BUFFER packets also appears to make
the hangs disappear. I observed a slowdown in glxgears by doing it all
the time, so I've chosen to only do it when BRW_NEW_BATCH and
BRW_NEW_PSP are unset (i.e. we haven't done a CS_URB_STATE change or
already flushed the whole pipeline).
I'd much rather understand the problem, but at this point, I don't see
how we'd ever be able to track it down further. We have no real tools,
and the hardware people moved on years ago. I've analyzed 20+ error
states and read every scrap of documentation I could find.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=80568
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=85367
Signed-off-by: Kenneth Graunke <[email protected]>
Acked-by: Matt Turner <[email protected]>
Cc: "10.4 10.3" <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
I put the BRW_NEW_*_PROG_DATA flags at the beginning so that
brw_state_cache.c can still continue using 1 << brw_cache_id.
I also added a comment explaining the difference between
BRW_NEW_*_PROG_DATA and BRW_NEW_*_PROGRAM, as it took me a long time
to remember it.
Non-mechanical changes:
- brw_state_cache.c and brw_ff_gs.c now signal .brw, not .cache.
- brw_state_upload.c - INTEL_DEBUG=state changes.
- brw_context.h - bit definition merging.
v2: Correct the explanation of BRW_NEW_*_PROG_DATA to mention
state-based recompiles, and nix the "proper subset" claim,
as it's false. (Caught by Kristian Høgsberg).
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Kristian Høgsberg <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Now that we've moved a bunch of CACHE_NEW_* bits to BRW_NEW_*, the only
ones that are left are legitimately related to the program cache. Yet,
it seems a bit wasteful to have an entire bitfield for only 7 bits.
State upload is one of the hottest paths in the driver. For each atom
in the list, we call check_state() to see if it needs to be emitted.
Currently, this involves comparing three separate bitfields (mesa, brw,
and cache). Consolidating the brw and cache bitfields would save a
small amount of CPU overhead per atom. Broadwell, for example, has
57 state atoms, so this small savings can add up.
CACHE_NEW_*_PROG covers the brw_*_prog_data structures, as well as the
offset into the program cache BO (prog_offset). Since most uses refer
to brw_*_prog_data, I decided to use BRW_NEW_*_PROG_DATA as the name.
Removing "cache" completely is a bit painful, so I decided to do it in
several patches for easier review, and to separate mechanical changes
from manual ones. This one simply renames things, and was made via:
$ for file in *.[ch]; do
sed -i -e 's/CACHE_NEW_\([A-Z_\*]*\)_PROG/BRW_NEW_\1_PROG_DATA/g' \
-e 's/BRW_NEW_WM_PROG_DATA/BRW_NEW_FS_PROG_DATA/g' $file
done
Note that BRW_NEW_*_PROG_DATA is still in .cache, not .brw!
The next patch will remedy this flaw. It will also fix the
alphabetization issues.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Kristian Høgsberg <[email protected]>
Acked-by: Matt Turner <[email protected]>
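
To illustrate the per-atom check discussed above, here is a sketch of what the test looks like once the brw and cache bitfields are consolidated, leaving two fields to compare instead of three. The struct layout and names (sketch_state_flags, sketch_check_state) and the field widths are assumptions for the example, not the driver's actual definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical dirty-flag set after consolidation: core Mesa bits
 * plus a single driver bitfield. */
struct sketch_state_flags {
   uint32_t mesa;
   uint64_t brw;
};

/* Returns true if any bit the atom listens for is currently dirty. */
static bool
sketch_check_state(const struct sketch_state_flags *dirty,
                   const struct sketch_state_flags *atom)
{
   return ((dirty->mesa & atom->mesa) | (dirty->brw & atom->brw)) != 0;
}
```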
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Most of the dirty flags were listed in some arbitrary order. Some used
bonus parentheses. Some put multiple flags on one line, others put one
per line. Some used tabs instead of spaces...but only on some lines.
This patch settles on one flag per line, in alphabetical order, using
spaces instead of tabs, and sheds the unnecessary parentheses.
Sorting was mostly done with vim's visual block feature and !sort,
although I alphabetized short lists by hand; it was pretty manual.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Kristian Høgsberg <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In commit 5e37a2a4a8a, I made the pull constant code stop calling
_mesa_load_state_parameters() when there were no pull parameters.
This worked fine on Gen6+ because the push constant code also called
it if there were any push constants. However, the Gen4-5 push constant
code wasn't doing this. This patch makes it do so, like the Gen6+ code.
A better long term solution would be to make core Mesa just handle this
for us when necessary.
Fixes around 8766 Piglit tests on Ironlake, and probably Gen4 as well.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
Tested-by: Mark Janes <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Reverts
* "i965: Modify state upload to allow 2 different sets of state atoms."
8e27a4d2b3e4e74e9a77446bce49607433d86be3
* "i965: Modify dirty bit handling to support 2 pipelines."
373143ed9187c4d4ce1e3c486b5dd0880d18ec8b
* "i965: Create a macro for checking a dirty bit."
c5bdf9be1eca190417998d548fd140c1eca37a54
Conflicts:
src/mesa/drivers/dri/i965/brw_context.h
* "i965: Create a macro for setting all dirty bits."
6f56e1424d923fd80c84090fbf4506c9eaaffea1
Conflicts:
src/mesa/drivers/dri/i965/brw_blorp.cpp
src/mesa/drivers/dri/i965/brw_state_cache.c
src/mesa/drivers/dri/i965/brw_state_upload.c
* "i965: Create a macro for setting a dirty bit."
88e3d404dad009d8cff5124cf8acee7daeaceb64
Signed-off-by: Jordan Justen <[email protected]>
|
|
|
|
|
|
|
| |
This will make it easier to extend dirty bit handling to support
compute shaders.
Reviewed-by: Jordan Justen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The brw_stage_prog_data struct previously contained an array of float pointers
to the values of parameters. These were then copied into a batch buffer to
upload the values using a regular assignment. However the float values were
also being overloaded to store integer values for integer uniforms. This can
break if x87 floating-point registers are used to do the assignment because
the fst instruction tries to fix up invalid float values. If an integer
constant happened to look like an invalid float value then it would get
altered when it was copied into the batch buffer.
This patch changes the pointers to be gl_constant_value instead so that the
assignment should end up copying without any alteration. This also makes it
more obvious that the values being stored here are overloaded for multiple
types.
There are some static asserts where the values are uploaded to ensure that the
size of gl_constant_value is the same as a float.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=81150
Reviewed-by: Kenneth Graunke <[email protected]>
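
A small sketch of the idea, under stated assumptions: a union holds each 32-bit slot so that the copy never goes through a float load and store, and a static assert guards the size. The union name sketch_constant_value and the upload helper are hypothetical, and the sketch copies with memcpy, whereas the patch itself changes the pointer type and keeps plain assignment.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical analogue of gl_constant_value: one 32-bit slot that
 * may logically hold a float or an integer. */
union sketch_constant_value {
   float f;
   int32_t i;
   uint32_t u;
};

/* The upload path assumes one slot is exactly the size of a float. */
static_assert(sizeof(union sketch_constant_value) == sizeof(float),
              "constant slot must be the size of a float");

/* Copies count slots as raw bytes, so every bit pattern survives
 * unchanged regardless of which type each slot logically holds. */
static void
sketch_upload_constants(uint32_t *dst,
                        const union sketch_constant_value *src,
                        size_t count)
{
   memcpy(dst, src, count * sizeof(*src));
}
```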
|
|
|
|
| |
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
| |
If we had some NOS affecting VS compilation that resulted in optimization
changing the set of constants to be uploaded, we might not have reuploaded
the constants.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
At this point, the extra copy of the data and memcmp are as expensive as
just re-uploading.
Note: now that we'll always upload, and brw_constant_buffer watches
BRW_NEW_BATCH anyway, we don't need to explicitly unref the old curbe_bo
at batch reset time.
No significant performance difference on glamor copywinwin10 (n=55),
despite that test having a 98% hit rate on the cache.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
| |
No performance difference on glamor with copywinwin10 (n=40) on my gm45.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Chad Versace <[email protected]>
|
|
|
|
|
|
| |
Reviewed-by: Ian Romanick <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
brw_stage_prog_data.
There doesn't seem to be any reason for nr_params, nr_pull_params,
param, and pull_param to be duplicated in the stage-specific
subclasses of brw_stage_prog_data. Moving their definition to the
common base class will allow some code sharing in a future commit, the
removal of brw_vec4_prog_data_compare and brw_*_prog_data_free, and
the simplification of the stage-specific brw_*_prog_data_compare.
Reviewed-by: Paul Berry <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Using an unoptimized variant of glamor spending 50% of its CPU time in
brw_draw_prims() (and hitting the cache *very* frequently):
     N    Min    Max  Median    Avg     Stddev
x  200  29200  40500   34900  34750  958.43256
+  200  31000  40300   34700  34622  916.35941
No difference proven at 95.0% confidence
Similarly, no difference on GLB2.7:
    N    Min    Max  Median        Avg     Stddev
x  63   64.1  71.36   70.69  70.113175  1.6782026
+  63   63.6  71.18   70.75  70.223651  1.6044186
No difference proven at 95.0% confidence
v2: Rebase on master (by anholt)
v3: Add a missing BEGIN_BATCH(3) to aa_line_parameters -- CACHED_BATCH
didn't have the asserts about batchbuffer usage that ADVANCE_BATCH
does, so we started assertion failing.
Signed-off-by: Kenneth Graunke <[email protected]>
Signed-off-by: Eric Anholt <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Tungsten Graphics Inc. was acquired by VMware Inc. in 2008. Leaving the
old copyright name is creating unnecessary confusion, hence this change.
This was the sed script I used:
$ cat tg2vmw.sed
# Run as:
#
# git reset --hard HEAD && find include scons src -type f -not -name 'sed*' -print0 | xargs -0 sed -i -f tg2vmw.sed
#
# Rename copyrights
s/Tungsten Gra\(ph\|hp\)ics,\? [iI]nc\.\?\(, Cedar Park\)\?\(, Austin\)\?\(, \(Texas\|TX\)\)\?\.\?/VMware, Inc./g
/Copyright/s/Tungsten Graphics\(,\? [iI]nc\.\)\?\(, Cedar Park\)\?\(, Austin\)\?\(, \(Texas\|TX\)\)\?\.\?/VMware, Inc./
s/TUNGSTEN GRAPHICS/VMWARE/g
# Rename emails
s/[email protected]/[email protected]/
s/[email protected]/[email protected]/g
s/jrfonseca-at-tungstengraphics-dot-com/jfonseca-at-vmware-dot-com/
s/jrfonseca\[email protected]/[email protected]/g
s/keithw\[email protected]/[email protected]/g
s/[email protected]/[email protected]/g
s/thomas-at-tungstengraphics-dot-com/thellstom-at-vmware-dot-com/
s/[email protected]/[email protected]/
# Remove dead links
s@Tungsten Graphics (http://www.tungstengraphics.com)@Tungsten Graphics@g
# C string src/gallium/state_trackers/vega/api_misc.c
s/"Tungsten Graphics, Inc"/"VMware, Inc"/
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
|
|
| |
Performed via:
$ for file in *; do sed -i 's/ *//g'; done
Signed-off-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
This makes brw_context inherit directly from gl_context; that was the
only thing left in intel_context.
Signed-off-by: Kenneth Graunke <[email protected]>
Acked-by: Chris Forbes <[email protected]>
Acked-by: Paul Berry <[email protected]>
Acked-by: Anuj Phogat <[email protected]>
|
|
|
|
|
|
|
| |
Signed-off-by: Kenneth Graunke <[email protected]>
Acked-by: Chris Forbes <[email protected]>
Acked-by: Paul Berry <[email protected]>
Acked-by: Anuj Phogat <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This makes brw_context available in every function that used
intel_context. This makes it possible to start migrating fields from
intel_context to brw_context.
Surprisingly, this actually removes some code, as functions that use
OUT_BATCH don't need to declare "intel"; they just use "brw."
Signed-off-by: Kenneth Graunke <[email protected]>
Acked-by: Chris Forbes <[email protected]>
Acked-by: Paul Berry <[email protected]>
Acked-by: Anuj Phogat <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
This will allow the generic parts to be re-used for geometry shaders.
Reviewed-by: Jordan Justen <[email protected]>
v2: Put urb_read_length and urb_entry_size in the generic struct.
Reviewed-by: Eric Anholt <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
| |
These have been wrong since f428255bde93a452a7cdd48fba21839c99beb6cb
back in 2009!
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
| |
The old VS backend doesn't exist, but I believe these still need to be
delivered to the clipper thread.
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
| |
Only the old backend used it.
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
| |
Ever since ctx->NativeIntegers was set, the conversion flag has been
PARAM_NO_CONVERT.
Reviewed-by: Ian Romanick <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
| |
This is consumed by the unit state.
Reviewed-by: Kenneth Graunke <[email protected]>
Acked-by: Paul Berry <[email protected]>
|
|
|
|
|
|
|
|
| |
While other units need to know about our constant buffer offsets,
nothing else cared about which particular BO other than the emit() half.
Reviewed-by: Kenneth Graunke <[email protected]>
Acked-by: Paul Berry <[email protected]>
|