path: root/src/mesa
Commit message Author Age Files Lines
* vbo: add assertion in ATTR_UNION macroBrian Paul2015-10-131-0/+3
| | | | Reviewed-by: Marek Olšák <[email protected]>
* vbo: add comments, braces in ATTR_UNION() in vbo_exec_api.cBrian Paul2015-10-131-2/+12
| | | | Reviewed-by: Marek Olšák <[email protected]>
* vbo: fix whitespace in vbo_exec_draw.cBrian Paul2015-10-131-13/+12
| | | | Reviewed-by: Marek Olšák <[email protected]>
* vbo: move 'tmp' var initializationBrian Paul2015-10-131-1/+2
| | | | | | Improve readability a bit. Reviewed-by: Marek Olšák <[email protected]>
* vbo: improve fprintf() formattingBrian Paul2015-10-131-1/+1
| | | | Reviewed-by: Marek Olšák <[email protected]>
* vbo: simplify vertex array initializations in vbo_context.cBrian Paul2015-10-131-52/+43
| | | | Reviewed-by: Marek Olšák <[email protected]>
* vbo: get rid of needless NR_MAT_ATTRIBS constantBrian Paul2015-10-131-6/+3
| | | | Reviewed-by: Marek Olšák <[email protected]>
* vbo: fix incorrect switch statement in init_mat_currval()Brian Paul2015-10-131-1/+1
| | | | | | | | | | | | The variable 'i' is a value in [0, MAT_ATTRIB_MAX-1] so subtracting VERT_ATTRIB_GENERIC0 gave a bogus value and we executed the default switch clause for all loop iterations. This doesn't fix any known issues but was clearly incorrect. Cc: [email protected] Reviewed-by: Marek Olšák <[email protected]>
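The indexing mistake described in this commit message can be sketched as follows. This is only an illustration: the enum values, the case labels, and the function name `sketch_mat_size` are hypothetical stand-ins, not the code from vbo_context.c. It shows why subtracting a large base from an index that already starts at 0 sends every iteration to the default clause.

```c
#include <assert.h>

/* Hypothetical stand-in values; the real constants live in Mesa headers. */
enum { SKETCH_GENERIC0 = 16, SKETCH_MAT_ATTRIB_MAX = 12 };

/* Returns the component count selected for material attribute i.
 * With buggy != 0 the old indexing (i - SKETCH_GENERIC0) is reproduced;
 * since i is in [0, SKETCH_MAT_ATTRIB_MAX - 1], the key is always
 * negative and only the default clause can match. */
static int sketch_mat_size(int i, int buggy)
{
   int key = buggy ? i - SKETCH_GENERIC0 : i;

   assert(i >= 0 && i < SKETCH_MAT_ATTRIB_MAX);

   switch (key) {
   case 8:   /* hypothetical one-component slots */
   case 9:
      return 1;
   case 10:  /* hypothetical three-component slots */
   case 11:
      return 3;
   default:
      return 4;
   }
}
```

With the buggy key every attribute gets the default size; with the corrected key the special cases are reachable again.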
* mesa: pass caller name to create_textures()Brian Paul2015-10-131-7/+6
| | | | Simpler than the dsa flag approach.
* i965/vs: Simplify fs_visitor's ATTR file.Kenneth Graunke2015-10-123-21/+49
| | | | | | | | | | | | | | | | | | Previously, ATTR was indexed by VERT_ATTRIB_* slots; at the end of compilation, assign_vs_urb_setup() translated those into GRF units, and converted ATTR to HW_REGs. This patch moves the translation earlier, making ATTR work in terms of GRF units from the beginning. assign_vs_urb_setup() simply has to add the number of payload registers and push constants to obtain the final hardware GRF number. (We can't do this earlier as those values aren't known.) ATTR still supports reg_offset; however, it's simply added to reg. It's not clear whether this is valuable or not. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* ff_fragment_shader: Use binding to set the sampler unitIan Romanick2015-10-121-6/+4
| | | | | | | | | | | | This is the way layout(binding=xxx) works from GLSL. The old method just happened to work (and significantly predated support for layout(binding=xxx)), but future changes will break this. v2: Remove some stale comments. Suggested by Matt and Chris Forbes. Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Matt Turner <[email protected]> Cc: "10.6 11.0" <[email protected]>
* i965: Fix unsafe pointer when dumping VS/FS IRIago Toral Quiroga2015-10-122-2/+2
| | | | | | | | | | | | | | | | | | | | | For the VS and FS stages that use ARB_vertex_program or ARB_fragment_program we don't have a shader program. However, when debugging is enabled, we call brw_dump_ir like this: brw_dump_ir("vertex", prog, &vs->base, &vp->program.Base); where vs will be NULL (since prog is NULL). As pointed out by Chris, this &vs->base is not really a dereference, it simply computes a new address that just happens to be 0x0 because the offset of base in brw_shader is 0. Then brw_dump_ir will see a NULL pointer and not do anything. This is why this does not crash at the moment. However, this does not look very safe (it would crash for any location of base that is not the first in brw_shader), so patch it to prevent a potential (even if unlikely) problem in the future. Reviewed-by: Topi Pohjolainen <[email protected]>
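The offset argument in this commit message can be checked with a small sketch. The struct and function names below are hypothetical, not the real brw_shader layout. The sketch uses offsetof() rather than forming &p->base from a NULL pointer, because the latter is not something portable C guarantees to be harmless.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for a shader struct with an embedded base. */
struct sketch_base { int stage; };

struct sketch_shader_first {   /* base is the first member: offset 0 */
   struct sketch_base base;
   int extra;
};

struct sketch_shader_later {   /* base is not first: non-zero offset */
   int extra;
   struct sketch_base base;
};

/* Guarded pattern: only take the member address when the outer pointer
 * is valid, so the result is NULL regardless of where base sits. */
static struct sketch_base *sketch_base_or_null(struct sketch_shader_later *s)
{
   return s ? &s->base : NULL;
}
```

If base sat at a non-zero offset, an unguarded address computation on a NULL pointer would yield a small non-NULL address, and a callee that only checks for NULL would go on to dereference it.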
* mesa/uniforms: fix get_uniform for doubles (v2)Dave Airlie2015-10-121-16/+37
| | | | | | | | | | | | | | | The initial glGetUniformdv support didn't cover all the casting cases that are apparently legal, and cts seems to test for them. I've updated the piglit test to cover these cases now. v2: fix indentation - it's all broken in this file (Ilia) fix src/dst index tracking in light of fp64 support (Ilia) cc: "11.0" <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* i965/vec4: Implement b2f and b2i using negation.Matt Turner2015-10-111-7/+1
| | | | | | | | | | Curro added this in commit 3ee2daf23d (before the vec4/NIR backend was added) but it was missed in the new NIR backend. Add it there as well. instructions in affected programs: 1857 -> 1810 (-2.53%) helped: 15 Reviewed-by: Francisco Jerez <[email protected]>
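The negation trick can be illustrated on the CPU. The sketch below assumes that a boolean is stored as 0 for false and ~0 (that is, -1 as a signed 32-bit integer) for true; this representation is an assumption made for the example and is not stated in the commit message. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: b is 0 (false) or -1 (true, all bits set). */
static int32_t sketch_b2i(int32_t b)
{
   assert(b == 0 || b == -1);
   return -b;                 /* -(-1) == 1, -(0) == 0 */
}

/* The same negated value converted to float gives 1.0f or 0.0f. */
static float sketch_b2f(int32_t b)
{
   return (float)sketch_b2i(b);
}
```

Under that assumption a single negate-and-move covers the conversion, with no need to mask the value with 1 first.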
* i965/gs: Make MAX_GS_INPUT_VERTICES a #define in brw_context.h.Kenneth Graunke2015-10-103-4/+2
| | | | | | | | For scalar VS, I'll need this in brw_fs.cpp as well. It seems silly to redeclare it in three places. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965/vs: Map scalar VS input locations properly; avoid tons of MOVs.Kenneth Graunke2015-10-102-52/+42
| | | | | | | | | | | | | | | | | | | | Previously, we used nir_lower_io with the scalar type_size function, which mapped VERT_ATTRIB_* locations to...some numbers. Then, in fs_visitor::nir_setup_inputs(), we created temporaries indexed by those numbers, and emitted MOVs from the actual ATTR registers to those temporaries. Virtually all of these were copy propagated away, but it's still ugly. This patch reworks our input lowering to produce NIR lower_input intrinsics that properly index into the ATTR file, so we can access it directly. No changes in shader-db. v2: Fix unreachable() message (Ken), update commit message (Matt). Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965/vs: Fix a subtlety in the nr_attributes == 0 workaround.Kenneth Graunke2015-10-101-5/+7
| | | | | | | | | | | | | | | | | | | nr_attributes is used to compute first_non_payload_grf, which is the first register we're allowed to use for ordinary register allocation. The hardware requires us to read at least one pair of values, but we're completely free to overwrite that garbage register with whatever we like. Instead of altering nr_attributes, we should alter urb_read_length, which only affects the amount we ask the VF to read. This should save us a register in trivial cases (which admittedly isn't very useful). While we're at it, improve the explanation in the comments. v2: Actually do what I said (caught by Ilia). Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Iago Toral Quiroga <[email protected]>
* i965/vs: Unify URB entry size/read length calculations between backends.Kenneth Graunke2015-10-104-30/+38
| | | | | | | | | | | | | | | | | Both the vec4 and scalar VS backends had virtually identical URB entry size and read length calculations. We can move those up a level to backend-agnostic code and reuse it for both. Unfortunately, the backends need to know nr_attributes to compute first_non_payload_grf, so I had to store that in prog_data. We could use urb_read_length, but that's nr_attributes rounded up to a multiple of two, so doing so would waste a register in some cases. There's more code to be removed in the vec4 backend, but that will come in a follow-on patch. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]>
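The wasted register mentioned here follows from simple rounding. A minimal sketch, taking the commit message's statement that the read length corresponds to nr_attributes rounded up to a multiple of two; the helper names are hypothetical and the real units of urb_read_length are not shown here.

```c
#include <assert.h>

/* Round n up to the next multiple of two. */
static unsigned sketch_round_up_to_even(unsigned n)
{
   return (n + 1u) & ~1u;
}

/* Registers that would be wasted if the rounded value were used in
 * place of the exact attribute count. */
static unsigned sketch_wasted_regs(unsigned nr_attributes)
{
   return sketch_round_up_to_even(nr_attributes) - nr_attributes;
}
```

For an odd attribute count such as 3, the rounded value is 4, so one register would be lost; for even counts nothing is lost. That is the reason given for storing nr_attributes separately.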
* i965/cfg: Fix cfg_t::dump() when a block has no immediate dominator.Kenneth Graunke2015-10-101-1/+5
| | | | | | | | | | | Switch statements introduce a bogus loop with an unconditional break at the end of the loop, just before the while...so the while is unreachable and has no immediate dominator. v2: With less exuberance Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965/gen8: Remove gen<8 checks in gen8 codeChad Versace2015-10-091-4/+4
| | | | | | | Some assertions in gen8_surface_state.c checked for gen < 8. Reviewed-by: Topi Pohjolainen <[email protected]> Reviewed-by: Anuj Phogat <[email protected]>
* i965/gen9: Enable rep clears on gen9Chad Versace2015-10-092-1/+6
| | | | | | | | | | | | | The (gen < 9) check in brw_clear() was too broad. It disabled all types of fast color clears: a. singlesample rep clears b. singlesample MCS fast clears c. multisample MCS fast clears The MCS clears are still buggy, but the rep clear works well. So let's enable it. Reviewed-by: Neil Roberts <[email protected]>
* i965/gen9: Disable MCS for 1x color surfacesChad Versace2015-10-091-0/+8
| | | | | | | | Fast color clears are disabled for gen9 (see the checks in brw_meta_fast_clear), so there is no reason to allocate the MCS and track its clear/resolve state. Reviewed-by: Neil Roberts <[email protected]>
* program: remove _mesa_init_*_program wrappersMarek Olšák2015-10-0910-184/+50
| | | | | | | They didn't do anything useful. Reviewed-by: Emil Velikov <[email protected]> Reviewed-by: Brian Paul <[email protected]>
* program: remove other unused functionsMarek Olšák2015-10-092-143/+0
| | | | | Reviewed-by: Emil Velikov <[email protected]> Reviewed-by: Brian Paul <[email protected]>
* program: remove unused cloning and combining functionsMarek Olšák2015-10-092-294/+0
| | | | | Reviewed-by: Emil Velikov <[email protected]> Reviewed-by: Brian Paul <[email protected]>
* program: remove unused function _mesa_find_line_columnMarek Olšák2015-10-092-48/+0
| | | | | Reviewed-by: Emil Velikov <[email protected]> Reviewed-by: Brian Paul <[email protected]>
* st/mesa: release the glsl_to_tgsi visitor after translationMarek Olšák2015-10-091-2/+17
| | | | | | Reviewed-by: Dave Airlie <[email protected]> Reviewed-by: Brian Paul <[email protected]> Tested-by: Brian Paul <[email protected]>
* st/mesa: translate tessellation shaders into TGSI when we get themMarek Olšák2015-10-093-36/+64
| | | | | | Reviewed-by: Dave Airlie <[email protected]> Reviewed-by: Brian Paul <[email protected]> Tested-by: Brian Paul <[email protected]>
* st/mesa: translate geometry shaders into TGSI when we get themMarek Olšák2015-10-093-15/+30
| | | | | | Reviewed-by: Dave Airlie <[email protected]> Reviewed-by: Brian Paul <[email protected]> Tested-by: Brian Paul <[email protected]>
* st/mesa: translate fragment shaders into TGSI when we get themMarek Olšák2015-10-094-37/+55
| | | | | | Reviewed-by: Dave Airlie <[email protected]> Reviewed-by: Brian Paul <[email protected]> Tested-by: Brian Paul <[email protected]>
* st/mesa: translate vertex shaders into TGSI when we get themMarek Olšák2015-10-093-36/+44
| | | | | | | | | | The translate function is split into two: - translation to TGSI - creating the variant (TGSI transformations only) Reviewed-by: Dave Airlie <[email protected]> Reviewed-by: Brian Paul <[email protected]> Tested-by: Brian Paul <[email protected]>
* st/mesa: fix glDrawPixels with a textureMarek Olšák2015-10-095-29/+111
| | | | | | | | | | | The samplers for DrawPixels data and the pixel map are assigned to slots which don't overlap with the existing sampler slots. The texture coordinates for the user texture are uploaded as a constant. Reviewed-by: Dave Airlie <[email protected]> Reviewed-by: Brian Paul <[email protected]> Tested-by: Brian Paul <[email protected]>
* st/mesa: implement DrawPixels shader transformation using tgsi_transform_shaderMarek Olšák2015-10-0910-504/+303
| | | | | | Reviewed-by: Dave Airlie <[email protected]> Reviewed-by: Brian Paul <[email protected]> Tested-by: Brian Paul <[email protected]>
* st/mesa: make Z/S drawpix shaders independent of variants, don't use Mesa IR v2Marek Olšák2015-10-095-136/+60
| | | | | | | | | | | | | - there is no connection to user fragment shaders, so having these as shader variants makes no sense - don't use Mesa IR, use TGSI - don't create gl_fragment_program, just create the shader CSO v2: generate exactly the same shader as before to fix llvmpipe Reviewed-by: Dave Airlie <[email protected]> Reviewed-by: Brian Paul <[email protected]> Tested-by: Brian Paul <[email protected]>
* st/mesa: implement glBitmap shader transformation using tgsi_transform_shaderMarek Olšák2015-10-097-244/+202
| | | | | | Reviewed-by: Dave Airlie <[email protected]> Reviewed-by: Brian Paul <[email protected]> Tested-by: Brian Paul <[email protected]>
* st/mesa: remove old emulation for VS and FS variantsMarek Olšák2015-10-095-107/+17
| | | | | | Reviewed-by: Dave Airlie <[email protected]> Reviewed-by: Brian Paul <[email protected]> Tested-by: Brian Paul <[email protected]>
* st/mesa: use TGSI utility to emulate features for FS variantsMarek Olšák2015-10-091-6/+21
| | | | | | Reviewed-by: Dave Airlie <[email protected]> Reviewed-by: Brian Paul <[email protected]> Tested-by: Brian Paul <[email protected]>
* st/mesa: use TGSI utility to emulate features for VS variantsMarek Olšák2015-10-091-12/+29
| | | | | | Reviewed-by: Dave Airlie <[email protected]> Reviewed-by: Brian Paul <[email protected]> Tested-by: Brian Paul <[email protected]>
* st/mesa: decrease the size of st_vertex_programMarek Olšák2015-10-092-51/+48
| | | | | | | | The other variables can't be moved. Reviewed-by: Dave Airlie <[email protected]> Reviewed-by: Brian Paul <[email protected]> Tested-by: Brian Paul <[email protected]>
* st/mesa: inline st_prepare_vertex_programMarek Olšák2015-10-092-40/+11
| | | | | | | | | | | No other shader stage has a "prepare" function. This will allow removing some variables from st_vertex_program. Also, prepare_fragment_program was a dead prototype. Reviewed-by: Dave Airlie <[email protected]> Reviewed-by: Brian Paul <[email protected]> Tested-by: Brian Paul <[email protected]>
* mesa: call ProgramStringNotify for fixed-function vertex programsMarek Olšák2015-10-091-2/+1
| | | | | | | | | Drivers weren't notified about this at all. This allows disabling on-demand compilation in drivers. Reviewed-by: Dave Airlie <[email protected]> Reviewed-by: Brian Paul <[email protected]> Tested-by: Brian Paul <[email protected]>
* glsl: move shader_enums into nirRob Clark2015-10-092-4/+2
| | | | | | | | | | | | | | | | | | | | First step towards inverting the dependency between glsl and nir (so nir can be used without glsl). Also solves this issue with 'make distclean' Making distclean in mesa make[2]: Entering directory '/mnt/sdb1/Src64/Mesa-git/mesa/src/mesa' Makefile:2486: ../glsl/.deps/shader_enums.Plo: No such file or directory make[2]: *** No rule to make target '../glsl/.deps/shader_enums.Plo'. Stop. make[2]: Leaving directory '/mnt/sdb1/Src64/Mesa-git/mesa/src/mesa' Makefile:684: recipe for target 'distclean-recursive' failed make[1]: *** [distclean-recursive] Error 1 make[1]: Leaving directory '/mnt/sdb1/Src64/Mesa-git/mesa/src' Makefile:615: recipe for target 'distclean-recursive' failed make: *** [distclean-recursive] Error 1 Reported-by: Andy Furniss <[email protected]> Reviewed-by: Emil Velikov <[email protected]> Signed-off-by: Rob Clark <[email protected]>
* mesa: Get rid of texture-dependent image unit derived state.Francisco Jerez2015-10-094-33/+0
| | | | | | | | | | | | | | The point is to avoid having to re-validate all image units when _NEW_TEXTURE is flagged, which can be expensive if the driver exposes a large number of image units. This has been reported to fix a 36% performance regression in the Synmark2 Multithread benchmark on the i965 driver which exposes 192 image units. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91788 Reported-by: Wendy Wang <[email protected]> Tested-by: Ye Tian <[email protected]> CC: "11.0" <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
* i965: Use _mesa_is_image_unit_valid() instead of gl_image_unit::_Valid.Francisco Jerez2015-10-093-6/+10
| | | | | | | | gl_image_unit::_Valid will be removed in a future commit. Tested-by: Ye Tian <[email protected]> CC: "11.0" <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
* mesa: Skip redundant texture completeness checking during image validation.Francisco Jerez2015-10-091-1/+2
| | | | | | | | | | | | | The call to _mesa_test_texobj_completeness() is unnecessary if the texture is already known to be complete. If the texture object is dirtied in the meantime _BaseComplete and _MipmapComplete will be reset to false. _mesa_is_image_unit_valid() will start to be called more frequently in a future commit, so it seems desirable to avoid the unnecessary work. Tested-by: Ye Tian <[email protected]> CC: "11.0" <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
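The caching idea in this commit message can be sketched as below. The struct, its fields, and the functions are hypothetical simplifications; the real texture object and _mesa_test_texobj_completeness() carry far more state. The counter exists only so the skipped work is observable.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified texture object. */
struct sketch_texobj {
   bool base_complete;
   bool mipmap_complete;
   int checks_run;          /* counts expensive completeness checks */
};

/* Stand-in for the expensive completeness test. */
static void sketch_test_completeness(struct sketch_texobj *t)
{
   t->checks_run++;
   t->base_complete = true;
   t->mipmap_complete = true;
}

/* Dirtying the object resets the cached flags. */
static void sketch_dirty(struct sketch_texobj *t)
{
   t->base_complete = false;
   t->mipmap_complete = false;
}

/* Only re-run the test when the cached result is not already "complete". */
static bool sketch_is_complete(struct sketch_texobj *t)
{
   if (!t->base_complete || !t->mipmap_complete)
      sketch_test_completeness(t);
   return t->base_complete && t->mipmap_complete;
}
```

Repeated validation of an unchanged texture then costs only a flag check, which matters once the validity function is called more often.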
* mesa: Expose function to calculate whether a shader image unit is valid.Francisco Jerez2015-10-092-4/+15
| | | | | | | | | | | | | A future commit will remove all texture object-dependent derived state from the image unit struct to make validation unnecessary on texture state changes. Instead of checking gl_image_unit::_Valid drivers will be required to call this function when needed to find out whether an image unit is in a valid state and whether access from the shader is allowed. Tested-by: Ye Tian <[email protected]> CC: "11.0" <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
* i965: Don't tell the hardware about our UAV access.Francisco Jerez2015-10-096-19/+41
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The hardware documentation relating to the UAV HW-assisted coherency mechanism and UAV access enable bits is scarce and sometimes contradictory, and there's quite some guesswork behind this commit, so let me summarize the background first: HSW and later hardware have infrastructure to support a stricter form of data coherency between shader invocations from separate primitives. The mechanism is controlled by the "Accesses UAV" bits on 3DSTATE_VS, _HS, _DS, _GS and _PS (or _PS_EXTRA on BDW+), and the "UAV Coherency Required" bit on the 3DPRIMITIVE command. Regardless of whether "UAV Coherency Required" is set, the hardware fixed-function units will increment a per-stage semaphore for each request received if "Accesses UAV" is set for the same or any lower stage. An implicit DC flush is emitted by the lowermost stage with "Accesses UAV" set once it's done processing the request, this also happens regardless of the value of "UAV Coherency Required". The completion of the DC flush will cause the same stage and all previous ones to decrement the semaphore, marking the UAV accesses for the primitive as coherent with L3. The "UAV Coherency Required" 3DPRIMITIVE bit will cause a pipeline stall before any threads are dispatched for the first FF stage with "Accesses UAV" set until the semaphore is cleared for the same stage. Effectively this guarantees that UAV memory accesses performed by previous primitives from any stage will be strictly ordered (and thanks to the implicit DC flush visible in memory) with UAV accesses from the following primitives. 
None of this is required by the usual image, atomic counter and SSBO GL APIs which have very relaxed cross-primitive coherency and ordering requirements, so we don't actually ever set the "UAV Coherency Required" bit -- Ordering with respect to shader invocations from previous stages on the same primitive where there is a data dependency is of course already guaranteed as the spec requires, regardless of this mechanism being enabled. We do set the "Accesses UAV" bits though since my commit ac7664e493655e290783c23a0412b9c70936da50 (which this patch partially reverts), mainly because of comments like the following from the BDW PRM: > 3DSTATE_GS >[...] > 12 Accesses UAV > Format: Enable > This field must be set when GS has a UAV access. There are similar comments in the documentation for the other 3DSTATE_*S commands. The "must" part is misleading and unjustified AFAIK. Most of the "Accesses UAV" bits don't seem to have any side effects other than the implicit DC flushes and the related book-keeping in anticipation for a subsequent primitive with "UAV Coherency Required" set, so in most cases they are unnecessary and may incur a performance penalty. There is an exception though. On Gen8+ the PS_EXTRA UAV access bit influences the calculation of the PS UAV-only and ThreadDispatchEnable signals which on previous generations were set explicitly by the driver, so we cannot always avoid enabling it on the PS stage. The primary motivation for this change is that in fact the hardware coherency mechanism is buggy and will cause a rather non-deterministic hang on Gen8 when VS is the only stage with "Accesses UAV" set and the processing of a request terminates immediately after the implicit DC flush is sent for a previous primitive with no additional vertices being emitted for the second primitive, which will cause the hardware to skip sending a second DC flush and cause the VS to stall indefinitely waiting for a response from the DC (BDWGFX HSD 1912017). 
This hardware bug can be reproduced on current master with the spec@arb_shader_image_load_store@host-mem-barrier@Indirect/RaW piglit subtest (if you have the patience to run it a few dozen times). The proposed workaround is to insert CS STALLs speculatively between 3DPRIMITIVE commands when "Accesses UAV" is enabled for the VS stage only. Because this would affect one of the hottest paths in the driver and likely decrease performance even further due to the unnecessary serialization, and because we don't actually need the implicit DC flushes, it seems better to just disable them. Cc: 11.0 <[email protected]>
* i965/fs: Handle non-const sample number in interpolateAtSampleNeil Roberts2015-10-094-43/+130
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If a non-const sample number is given to interpolateAtSample it will now generate an indirect send message with the sample ID similar to how non-const sampler array indexing works. Previously non-const values were ignored and instead it ended up using a constant 0 value. The generator will try to determine if the sample ID is dynamically uniform via nir_src_is_dynamically_uniform. If not it will query the pixel interpolator in a loop, once for each different live sample number. The next live sample number is found using emit_uniformize. If multiple live channels have the same sample number then they will be handled in a single iteration of the loop. The loop is necessary because the indirect send message doesn't seem to have a way to specify a different value for each fragment. This fixes the following two Piglit tests: arb_gpu_shader5-interpolateAtSample-nonconst arb_gpu_shader5-interpolateAtSample-dynamically-nonuniform v2: Handle dynamically non-uniform sample ids. v3: Remove the BREAK instruction and predicate the WHILE directly. Make the tokens arrays const. (Matt Turner) v4: Iterate over the live channels instead of each possible sample number. v5: Don't special case immediate values in brw_pixel_interpolator_query. Make a better wrapper for the function to set up the PI send instruction. Ensure that the SHL instructions are scalar. (Francisco Jerez). Reviewed-by: Francisco Jerez <[email protected]>
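The per-sample loop described in this commit message can be modelled on the CPU. The sketch below is a simulation only: the channel count, the function name, and the placeholder "query" (sample number times ten) are hypothetical and do not correspond to the actual i965 code or to emit_uniformize. It shows the control flow: pick the sample number of the first live channel, handle every live channel sharing it, retire those channels, and repeat.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_CHANNELS 8

/* Handles all live channels, one iteration per distinct sample number.
 * Returns the number of iterations. out[c] is written only for channels
 * that were live on entry. */
static int sketch_process_by_sample(const int sample_id[SKETCH_CHANNELS],
                                    const bool live_in[SKETCH_CHANNELS],
                                    int out[SKETCH_CHANNELS])
{
   bool live[SKETCH_CHANNELS];
   int iterations = 0;

   for (int c = 0; c < SKETCH_CHANNELS; c++)
      live[c] = live_in[c];

   for (;;) {
      /* Find the first channel still waiting to be handled. */
      int first = -1;
      for (int c = 0; c < SKETCH_CHANNELS; c++) {
         if (live[c]) {
            first = c;
            break;
         }
      }
      if (first < 0)
         break;                       /* every channel has been handled */

      /* One value for the whole iteration, taken from that channel. */
      int uniform_id = sample_id[first];

      /* All channels with the same sample number share this iteration. */
      for (int c = 0; c < SKETCH_CHANNELS; c++) {
         if (live[c] && sample_id[c] == uniform_id) {
            out[c] = uniform_id * 10; /* placeholder for the real query */
            live[c] = false;
         }
      }
      iterations++;
   }
   return iterations;
}
```

The iteration count equals the number of distinct sample numbers among live channels, which matches the v4 note about iterating over live channels instead of every possible sample number.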
* i965: Add a second successor to BRW_OPCODE_WHILENeil Roberts2015-10-091-0/+4
| | | | | | | | | It is possible to directly predicate the WHILE instruction. In this case there will be a second successor block because the execution can resume from the instruction after the loop. This will be used in a subsequent patch. Reviewed-by: Matt Turner <[email protected]>
* main: fix length of values written to glGetProgramResourceiv() for ACTIVE_VARIABLESSamuel Iglesias Gonsalvez2015-10-091-4/+10
| | | | | | | | | | Return the number of values written. Signed-off-by: Samuel Iglesias Gonsalvez <[email protected]> Reviewed-by: Tapani Pälli <[email protected]> Reviewed-by: Timothy Arceri <[email protected]>