| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
| |
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
| |
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
BRW_CACHE_VS_PROG is more easily associated with program caches than
plain BRW_VS_PROG.
While we're at it, rename BRW_WM_PROG to BRW_CACHE_FS_PROG, to move away
from the outdated Windowizer/Masker name.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Kristian Høgsberg <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This flag signifies that we've emitted a new SAMPLER_STATE table.
Given that we haven't cached those in years, CACHE_NEW_SAMPLER isn't
a great name. Putting it in the BRW_NEW_* hierarchy would make more
sense; BRW_NEW_SAMPLER_STATE_TABLE better reflects its actual purpose.
When this flag is raised, the pointer to the SAMPLER_STATE table has
changed, so we need to re-issue any packets which point to it (unit
state on Gen4-5, 3DSTATE_SAMPLER_STATE_POINTERS on Gen6, and the
per-stage variants on Gen7+).
Saves 2 * sizeof(void *) bytes per context, as we remove useless
aux_compare/aux_free function pointers.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Kristian Høgsberg <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Marking brw_stage_state::sampler_count as CACHE_NEW_SAMPLER is wrong.
The number of samplers used by each program is actually computed at
draw time (brw_try_draw_prims), based purely on the currently bound
shader programs (gl_program::SamplersUsed).
CACHE_NEW_SAMPLER means that we've emitted a new SAMPLER_STATE table.
Although this could indicate that the number of samplers has changed,
it could also simply mean that the contents of the table has changed
(i.e. we've bound different textures).
The real reason these atoms depend on CACHE_NEW_SAMPLER is because they
include a pointer to the SAMPLER_STATE table. This was not commented.
So, move the comments to the appropriate place.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Kristian Høgsberg <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
We've been streaming these out for ages, so they basically have nothing
to do with brw_state_cache.c.
Saves 6 * sizeof(void *) bytes per context, as we won't have useless
aux_compare/aux_free functions for them.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Kristian Høgsberg <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
| |
These always happen together; the extra atom just means another item to
iterate through, flags to check, and a call through a function pointer.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Kristian Høgsberg <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
On Gen4-5, unit state is specified as indirect state, rather than
commands. If any unit state changes, we upload it via brw_state_batch
and arrange for 3DSTATE_PIPELINED_POINTERS to be re-emitted, which
updates pointers to all unit state at once.
Since there's only one command and state atom (brw_psp_urb_cs) that
needs to know about this, there's no benefit to having six separate
flags. We can combine CACHE_NEW_*_UNIT into a single flag.
We also haven't cached these in a long time, so it doesn't make sense
to use the "CACHE_NEW_" prefix. Instead, use the "BRW_NEW_" prefix.
This also saves 12 * sizeof(void *) bytes of memory per context, as
we remove useless aux_compare/aux_free functions for each CACHE bit.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Kristian Høgsberg <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Most of the dirty flags were listed in some arbitrary order. Some used
bonus parenthesis. Some put multiple flags on one line, others put one
per line. Some used tabs instead of spaces...but only on some lines.
This patch settles on one flag per line, in alphabetical order, using
spaces instead of tabs, and sheds the unnecessary parentheses.
Sorting was mostly done with vim's visual block feature and !sort,
although I alphabetized short lists by hand; it was pretty manual.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Kristian Høgsberg <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When using MRT on Gen4-5, we have to simulate GL's alpha test feature
by emitting discards in the fragment shader. In this case, it makes
sense to set prog_data->uses_kill, which means the fragment shader may
kill pixels via the discard mechanism.
This saves us from having to look an extra key value in a couple of
places, including in the generator.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
| |
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
| |
This means the generator doesn't have to look at the key, which is a
little nicer - we're pretty close to no key dependencies at all.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Kristian noted that there's very little use of brw_wm_prog_key in the
generator, and that it basically just generates what it's told, without
caring about what stage it's handling.
One exception to this is derivative handling. When handling dFdxCoarse
and dFdxFine, we packed an enum value in a second source register,
explicitly telling the generator what to do. For dFdx, we specified an
enum value of "please use the hint", then checked the program key in the
generator level code.
A natural method is to define separate FS_OPCODE_DD[XY]_{COARSE,FINE}
opcodes, and have the front-end (which already decides what IR to
generate based on the program key) decide which dPdx/dPdy should
correspond to. This consolidates the decision making in one place.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
| |
prog_data->foo is a bit more readable than brw->wm.prog_data->foo.
The local variable definition is also a great location to put the
obligatory /* CACHE_NEW_WM_PROG */ comment.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
brw->wm.prog_data is covered by CACHE_NEW_WM_PROG, not
BRW_NEW_FRAGMENT_PROGRAM. So, we should listen to it.
However, I believe that BRW_NEW_FRAGMENT_PROGRAM is sufficient to cover
all the necessary cases - CACHE_NEW_WM_PROG happens in a subset of
cases. So, the code being wrong shouldn't have triggered bugs.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
|
|
|
|
|
|
|
| |
Just use the same entrypoints we use for st/wgl's opengl32.dll.
Reviewed-by: Brian Paul <[email protected]>
Reviewed-by: Roland Scheidegger <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Since pack_bytes expands to two mov(4) align1 instructions, we can't use
swizzles directly. For an instruction like
pack_bytes m4.y:UD, vgrf13.xyzw:UD
we can write into the .y component by settings the offset based on the
swizzle.
Also while we're doing this, we can set the dependency control hints
properly, so that a series of pack_bytes writing into separate
components of a register can issue without blocking.
|
|
|
|
|
| |
Reduces the number of instructions needed to implement packSnorm4x8()
from 13 -> 7.
|
|
|
|
|
| |
Reduces the number of instructions needed to implement packUnorm4x8()
from 11 -> 6.
|
|
|
|
| |
Will be used by emit_pack_{s,u}norm_4x8().
|
|
|
|
|
|
|
| |
Reduces the number of instructions needed to implement unpackSnorm4x8()
from 16 -> 6.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
| |
Reduces the number of instructions needed to implement unpackUnorm4x8()
from 11 -> 4.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
| |
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
| |
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Francisco Jerez <[email protected]>
|
|
|
|
|
| |
Using Eric's original VF -> float conversion code to initialize the
table.
|
|
|
|
| |
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Fixes broken rendering in Windows-based QtQuick2 apps run through Wine.
This library sets all texture units' GL_COORD_REPLACE, leaves point
sprite mode enabled, and then draws a triangle fan.
Will need a slightly different fix for Gen4-5, but I don't have my old
machines in a usable state currently.
V2: - Simplify patch -- the real changes are no longer duplicated across
the Gen6 and Gen7 atoms.
- Also don't clobber attr overrides -- which matters on Haswell too,
and fixes the other half of the problem
- Fix newly-introduced warnings
V3: - Use BRW_NEW_GEOMETRY_PROGRAM and brw->geometry_program rather than
core flag and state; keep the state flags in order.
Signed-off-by: Chris Forbes <[email protected]>
Cc: "10.4" <[email protected]>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=84651
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
We already precompile GLSL programs; it seems logical to precompile ARB
programs as well. We just never hooked it up.
This also makes the programs compile even if no drawing occurs, which is
useful for shader-db.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Kristian Høgsberg <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously, the prototypes for brw_vs/gs/fs_precompile were scattered
between brw_vs.h (C), brw_gs.h (C), and brw_fs.h (C++ only). Also,
brw_fs_precompile had C++ linkage, while the others were C.
This patch moves all the prototypes to a central location (brw_shader.h)
and makes brw_fs_precompile have C linkage.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Kristian Høgsberg <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
We'd like to do precompiling for ARB vertex and fragment programs,
which only have gl_program structures - gl_shader_program is NULL.
This patch makes the various precompile functions take a gl_program
parameter directly, rather than accessing it via gl_shader_program.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Kristian Høgsberg <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
| |
brw_shader_precompile should just do a precompile; it makes more sense
for the caller to decide whether we should do one. Simpler.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Kristian Høgsberg <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
| |
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
| |
Reviewed-by: Chris Forbes <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Ben was asking about the undocumented restriction that the math
instruction cannot use the dependency control hints. I went to reconfirm
and disabled the is_math() check in opt_set_dependency_control() and saw
that the disassembled math instructions with dependency hints had a
bogus math function. We were mistakenly overwriting it by setting an
empty conditional mod.
Unfortunately, this wasn't the cause of the aforementioned problem (I
reproduced it). This bug is benign, since we don't set dependeny hints
on math instructions -- but maybe some day.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
| |
The math function field is at the same location as conditional mod.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When converting a uniform array reference to a pull constant load, the
`reladdr` expression itself may have its own `reladdr`, arbitrarily
deeply. This arises from expressions like:
a[b[x]] where a, b are uniform arrays (or lowered const arrays),
and x is not a constant.
Just iterate the lowering to pull constants until we stop seeing these
nested. For most shaders, there will be only one pass through this loop.
Fixes the piglit test:
tests/spec/glsl-1.20/linker/double-indirect-1.shader_test
Signed-off-by: Chris Forbes <[email protected]>
Cc: "10.3 10.4" <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In commit 5e37a2a4a8a, I made the pull constant code stop calling
_mesa_load_state_parameters() when there were no pull parameters.
This worked fine on Gen6+ because the push constant code also called
it if there were any push constants. However, the Gen4-5 push constant
code wasn't doing this. This patch makes it do so, like the Gen6+ code.
A better long term solution would be to make core Mesa just handle this
for us when necessary.
Fixes around 8766 Piglit tests on Ironlake, and probably Gen4 as well.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
Tested-by: Mark Janes <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
Fix one of the few cases where we can't reliable touch the destination hazard
bits. I am explicitly doing this patch individually so it is easy to backport. I
was tempted to do this patch before the previous patch which reorganized the
code, but I believe even doing that first, this is still easy to backport.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=84212
Signed-off-by: Ben Widawsky <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Move this to a separate function so that we can begin to add other little
caveats without making too big a mess.
NOTE: There is some desire to improve this function eventually, but we need to
fix a bug first.
v2:
Use const for the inst for the hazard check (Matt)
Invert safe logic to get rid of the double negative (Matt)
Add PRM reference for predicates (Matt)
Add note about empirical evidence for math (Matt)
Signed-off-by: Ben Widawsky <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
| |
Reviewed-by: Anuj Phogat <[email protected]>
|
|
|
|
|
|
|
| |
The visitor emits MOVs to temporary registers for immediates, so these
never trigger. For further proof, check case ir_triop_fma.
Reviewed-by: Anuj Phogat <[email protected]>
|
|
|
|
| |
Reviewed-by: Anuj Phogat <[email protected]>
|
|
|
|
| |
Reviewed-by: Anuj Phogat <[email protected]>
|
|
|
|
| |
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
| |
texture_offset was only used by some texturing operations, and offset
was only used by spill/unspill and some URB operations. These fields are
never used at the same time.
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Check that the target is GL_TEXTURE_CUBE_MAP before emitting
TEXCOORDTYPE_VECTOR texture coordinates.
I'm not sure if the hardware would like CARTESIAN coordinates
with cube maps, and as I'm too lazy to find out just emit the
VECTOR coordinates for cube maps always. For other targets use
CARTESIAN or HOMOGENOUS depending on the number of texture
coordinates provided.
Fixes rendering of the "electric" background texture in chromium-bsu
main menu. We appear to be provided with three texture coordinates
there (I'm guessing due to the funky texture matrix rotation it does).
So the code would decide to use TEXCOORDTYPE_VECTOR instead of
TEXCOORDTYPE_CARTESIAN even though we're dealing with a 2D texure.
The results weren't what one might expect.
demos/cubemap still works, which hopefully indicates that this doesn't
break things.
Also tested with:
bin/glean -o -v -v -v -t +texCube --quick
bin/cubemap -auto
from piglit.
Reviewed-by: Ian Romanick <[email protected]>
Signed-off-by: Ville Syrjälä <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add support for decoding the new branch control bit. I saw two things wrong with
the existing code.
1. It didn't bother trying to decode the bit.
- While we do not *intentionally* emit this bit today, I think it's interesting
to see if we somehow ended up with the bit set. It may also be useful in the
future.
2. It seemed to be the wrong bit.
- The docs are pretty poor wrt which bit this actually occupies. To me, it
/looks/ like it should be bit 28. I am not sure where Ken got 30 from. I
verified it should be 28 by looking at the simulator code.
I also added the most basic support for GOTO simply so we don't need to remember
to change the function in the future.
v2:
Move the branch_ctrl check out of the if gen >= 6 check to make it more
readable. (Matt)
ENDIF doesn't have branch_ctrl (Matt + Ken)
Signed-off-by: Ben Widawsky <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
| |
Saves a tiny bit of CPU overhead.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
Acked-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Fixes regression of WebGL Conformance test texture-size-limit [1] on
Ivybridge Mobile GT2 0x0166 with Google Chrome R38.
Regression introduced by
commit 6c044231535b93c5d16404528946cad618d96bd9
Author: Kenneth Graunke <[email protected]>
Date: Sun Feb 2 02:58:42 2014 -0800
i965: Bump GL_MAX_CUBE_MAP_TEXTURE_SIZE to 8192.
The test regressed because the pointer offset arithmetic in
intel_miptree_map_gtt() overflows for large textures. The pointer
arithmetic is not 64-bit safe.
[1] https://github.com/KhronosGroup/WebGL/blob/52f0dc240f04dce31b1b8e2b8107fe2b8332dc90/sdk/tests/conformance/textures/texture-size-limit.html
Cc: "10.3 10.4" <[email protected]>
Fixes: https://bugs.freedesktop.org/show_bug.cgi?id=78770
Fixes: Intel CHRMOS-1377
Reported-by: Lu Hua <[email protected]>
Reviewed-by: Ian Romanic <[email protected]>
Signed-off-by: Chad Versace <[email protected]>
|