Commit message | Author | Age | Files | Lines
* gallium: add PIPE_BIND_COMMAND_ARGS_BUFFERChristoph Bumiller2014-07-022-0/+4
| | | | | Intended for use with GL_ARB_draw_indirect's DRAW_INDIRECT_BUFFER target or for D3D11_RESOURCE_MISC_DRAWINDIRECT_ARGS.
* xmlconfig/dri: bool -> unsigned charDave Airlie2014-07-024-12/+9
  Drop stdbool, because the X server is a pain and has struct members called bool. Although I've sent a patch to fix that, we should retain the stupidity here.

  Use unsigned char, which is what GLboolean is anyway.

  Signed-off-by: Dave Airlie <[email protected]>
* i965/fs: Update discard jump to preserve uniform loads via sampler.Cody Northrop2014-07-011-6/+5
  Commit 17c7ead7 exposed a bug in how uniform loading happens in the presence of discard. It manifested itself in an application as randomly incorrect pixels on the borders of conditional areas.

  This is due to how discards jump to the end of the shader incorrectly for some channels. The current implementation checks each 2x2 subspan to preserve derivatives. When uniform loading via samplers was turned on, it uses a full execution mask, as stated in lower_uniform_pull_constant_loads(), and only populates four channels of the destination (see generate_uniform_pull_constant_load_gen7()). It happens incorrectly when the first subspan has been jumped over.

  The series that implemented this optimization was done before the changes to use samplers for uniform loads. Uniform sampler loads use special execution masks and only populate four channels, so we can't jump over those or corruption ensues. This fix only jumps to the end of the shader if all relevant channels are disabled, i.e. all 8 or 16, depending on dispatch. This preserves the original GLbenchmark 2.7 speedup noted in commit beafced2.

  It changes the shader assembly accordingly:

    before   : (-f0.1.any4h) halt(8) 17 2 null { align1 WE_all 1Q };
    after(8) : (-f0.1.any8h) halt(8) 17 2 null { align1 WE_all 1Q };
    after(16): (-f0.1.any16h) halt(16) 17 2 null { align1 WE_all 1H };

  v2: Cleaned up comments and conditional ordering.
  v3: Fix typo.

  Signed-off-by: Cody Northrop <[email protected]>
  Reviewed-by: Mike Stroyan <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=79948
* i965/fs: Mark case unreachable to silence warning.Matt Turner2014-07-011-0/+2
| | | | Reviewed-by: Ian Romanick <[email protected]>
* i965: Use unreachable() instead of unconditional assert().Matt Turner2014-07-0155-324/+182
| | | | Reviewed-by: Ian Romanick <[email protected]>
* mesa: Make unreachable macro take a string argument.Matt Turner2014-07-017-16/+17
| | | | | | To aid in debugging. Reviewed-by: Ian Romanick <[email protected]>
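As a rough illustration of why a string argument aids debugging, here is a minimal sketch of such a macro. The macro body, the `channel` enum, and `channel_index()` are assumptions made for this example and may differ from what the commit actually added; `__builtin_unreachable()` is a GCC/Clang builtin.

```c
#include <assert.h>

/* Sketch only: in debug builds the assertion fails and the message text
 * appears in the assert output; in release builds (NDEBUG) the compiler
 * is merely told that the path cannot be reached. */
#define unreachable(str)            \
    do {                            \
        assert(!str);               \
        __builtin_unreachable();    \
    } while (0)

/* Hypothetical enum, used only for this example. */
enum channel { CH_X, CH_Y };

static int channel_index(enum channel c)
{
    switch (c) {
    case CH_X: return 0;
    case CH_Y: return 1;
    }
    /* The string explains what went wrong if control ever gets here. */
    unreachable("invalid channel");
}
```

Because `!"some string"` evaluates to 0, a failed assertion prints the expression including the message, which is more helpful than a bare `assert(0)`.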
* i965/vec4: Remove useless conditionals.Matt Turner2014-07-011-6/+3
  Setting a couple of bits costs the same as or less than conditionally setting a couple of bits.
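The point can be sketched in isolation. The flag names and both helper functions below are hypothetical and are not taken from the i965 sources; they only show that an unconditional OR yields the same result as a guarded one while avoiding the compare and branch.

```c
#include <assert.h>

#define FLAG_A (1u << 0)  /* hypothetical flag bits */
#define FLAG_B (1u << 3)

/* Guarded version: test first, then set the bits if they are missing. */
static unsigned set_conditionally(unsigned bits)
{
    if ((bits & (FLAG_A | FLAG_B)) != (FLAG_A | FLAG_B))
        bits |= FLAG_A | FLAG_B;
    return bits;
}

/* Unguarded version: OR is idempotent, so the test adds nothing. */
static unsigned set_unconditionally(unsigned bits)
{
    return bits | FLAG_A | FLAG_B;
}

/* Exhaustively compares both versions over all 4-bit inputs. */
static void check_equivalence(void)
{
    for (unsigned bits = 0; bits < 16; bits++)
        assert(set_conditionally(bits) == set_unconditionally(bits));
}
```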
* i965/fs: Pass cfg to calculate_live_intervals().Matt Turner2014-07-016-12/+15
| | | | | | | We've often created the CFG immediately before, so use it when available. Reviewed-by: Ian Romanick <[email protected]>
* i965: Mark fields in the live interval classes protected.Matt Turner2014-07-012-16/+19
| | | | | | | | cfg, for instance, is a pointer to a local variable in calculate_live_intervals, certainly not valid after that function has returned. Reviewed-by: Ian Romanick <[email protected]>
* glsl: Remove now unused foreach_list* macros.Matt Turner2014-07-011-24/+0
| | | | | | foreach_list_typed_const was never used as far as I can tell. Reviewed-by: Ian Romanick <[email protected]>
* i965: Use typed foreach_in_list_safe instead of foreach_list_safe.Matt Turner2014-07-0111-55/+20
| | | | Acked-by: Ian Romanick <[email protected]>
* i965: Use typed foreach_in_list instead of foreach_list.Matt Turner2014-07-0118-184/+76
| | | | Acked-by: Ian Romanick <[email protected]>
* i965: Add and use foreach_inst_in_block macros.Matt Turner2014-07-017-25/+17
| | | | Reviewed-by: Ian Romanick <[email protected]>
* i965/fs: Use is_head_sentinel() instead of ->prev == NULL.Matt Turner2014-07-011-1/+1
| | | | | | | Makes it more clear what we're doing and requires less knowledge of exec_list. Reviewed-by: Ian Romanick <[email protected]>
* mesa: Add and use foreach_list_typed_safe.Matt Turner2014-07-012-3/+10
| | | | Acked-by: Ian Romanick <[email protected]>
* mesa: Add and use foreach_in_list_use_after.Matt Turner2014-07-013-9/+7
| | | | Reviewed-by: Ian Romanick <[email protected]>
* glsl: Replace uses of foreach_list_const.Matt Turner2014-07-012-17/+6
| | | | Reviewed-by: Ian Romanick <[email protected]>
* glsl: Replace another couple uses of foreach_list.Matt Turner2014-07-011-6/+4
| | | | Reviewed-by: Ian Romanick <[email protected]>
* glsl: Use foreach_list_typed when possible.Matt Turner2014-07-013-31/+18
| | | | Reviewed-by: Ian Romanick <[email protected]>
* mesa: Use typed foreach_in_list_safe instead of foreach_list_safe.Matt Turner2014-07-011-3/+1
| | | | Reviewed-by: Ian Romanick <[email protected]>
* glsl: Use typed foreach_in_list_safe instead of foreach_list_safe.Matt Turner2014-07-0120-87/+41
| | | | Reviewed-by: Ian Romanick <[email protected]>
* mesa: Use typed foreach_in_list instead of foreach_list.Matt Turner2014-07-013-85/+41
| | | | Reviewed-by: Ian Romanick <[email protected]>
* glsl: Use typed foreach_in_list instead of foreach_list.Matt Turner2014-07-0139-291/+184
| | | | Reviewed-by: Ian Romanick <[email protected]>
* glsl: Add typed foreach_in_list_safe macro.Matt Turner2014-07-011-0/+9
| | | | Reviewed-by: Ian Romanick <[email protected]>
* glsl: Add typed foreach_in_list/_reverse macros.Matt Turner2014-07-011-0/+10
| | | | Reviewed-by: Ian Romanick <[email protected]>
* mesa: fix the condition in src/loader/Makefile.amAxel Davy2014-07-011-0/+2
| | | | | | | | We want to have the dri common files compiled to define USE_DRICONF. We need to check both NEED_OPENGL_COMMON and HAVE_DRICOMMON Signed-off-by: Axel Davy <[email protected]> Tested-by: Brian Paul <[email protected]>
* mesa: update comment for UniformBufferSize to indicate size is in bytesBrian Paul2014-07-011-1/+1
| | | | Reviewed-by: Roland Scheidegger <[email protected]>
* st/mesa: fix incorrect size of UBO declarationsBrian Paul2014-07-011-1/+8
| | | | | | | | | UniformBufferSize is in bytes so we need to divide by 16 to get the number of constant buffer slots. Also, the ureg_DECL_constant2D() function takes first..last parameters so we need to subtract one for the last value. Reviewed-by: Roland Scheidegger <[email protected]>
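The arithmetic described above can be sketched as follows. `ubo_last_slot()` is a hypothetical helper written for this example, not a Mesa function, and rounding a partial slot up is an assumption; the commit message only states the division by 16 and the subtraction of one.

```c
#include <assert.h>

/* Hypothetical helper: converts a UBO size in bytes into the index of the
 * last 16-byte constant buffer slot, suitable for a first..last range
 * whose first index is 0. */
static unsigned ubo_last_slot(unsigned size_in_bytes)
{
    assert(size_in_bytes > 0);

    /* One slot holds 16 bytes; round up so a partial slot still counts
     * (assumption, see the note above). */
    unsigned num_slots = (size_in_bytes + 15) / 16;

    /* The range is inclusive, so the last index is one less than the count. */
    return num_slots - 1;
}
```

For a 64-byte buffer this gives four slots, so the declared range would be 0..3 rather than 0..64.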
* st/mesa: don't use address register for constant-indexed ir_binop_ubo_loadBrian Paul2014-07-011-6/+11
  Before, we were always using the address register and indirect addressing to index into a UBO constant buffer. With this change we only do that when necessary.

  Using the piglit bin/arb_uniform_buffer_object-rendering test as an example:

  Shader code:
    uniform ub_rot {float rotation; };
    ...
    m[1][1] = cos(rotation);

  Before:
    IMM[1] INT32 {0, 1, 0, 0}
    1: UARL ADDR[0].x, IMM[1].xxxx
    2: MOV TEMP[0].x, CONST[3][ADDR[0].x].xxxx
    3: COS TEMP[1].x, TEMP[0].xxxx

  After:
    0: COS TEMP[0].x, CONST[3][0].xxxx

  Reviewed-by: Roland Scheidegger <[email protected]>
* st/mesa: allow 2D indexing for all shader types in translate_src()Brian Paul2014-07-011-1/+4
| | | | Reviewed-by: Roland Scheidegger <[email protected]>
* st/mesa: don't ignore const buf index in src_register()Brian Paul2014-07-011-1/+1
| | | | | | | Otherwise, if we were creating a const buffer src register for a UBO the index into the UBO was always zero. Reviewed-by: Roland Scheidegger <[email protected]>
* nvc0: expose 4 vertex streams, use stream ids in xfbIlia Mirkin2014-07-016-3/+6
| | | | Signed-off-by: Ilia Mirkin <[email protected]>
* nvc0/ir: only merge emit/restart for identical streamsIlia Mirkin2014-07-011-3/+10
| | | | Signed-off-by: Ilia Mirkin <[email protected]>
* nvc0/ir: avoid creating restarts with non-0 streamIlia Mirkin2014-07-011-3/+7
| | | | Signed-off-by: Ilia Mirkin <[email protected]>
* nvc0/ir: fix emitting vertex streamIlia Mirkin2014-07-011-7/+8
| | | | Signed-off-by: Ilia Mirkin <[email protected]>
* mesa/st: add vertex stream supportIlia Mirkin2014-07-012-4/+8
| | | | | | | Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Marek Olšák <[email protected]> Reviewed-by: Brian Paul <[email protected]> Reviewed-by: Roland Scheidegger <[email protected]>
* gallium: add a cap for max vertex streamsIlia Mirkin2014-07-0115-1/+28
| | | | | | | Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Marek Olšák <[email protected]> Reviewed-by: Brian Paul <[email protected]> Reviewed-by: Roland Scheidegger <[email protected]>
* gallium: add an index argument to create_queryIlia Mirkin2014-07-0125-33/+52
| | | | | | Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Marek Olšák <[email protected]> Reviewed-by: Roland Scheidegger <[email protected]>
* gallium: add support for stream in so infoIlia Mirkin2014-07-013-0/+3
| | | | | | | Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Marek Olšák <[email protected]> Reviewed-by: Brian Paul <[email protected]> Reviewed-by: Roland Scheidegger <[email protected]>
* gallium: add vertex stream argument to EMIT/ENDPRIMIlia Mirkin2014-07-013-8/+8
| | | | | | | Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Marek Olšák <[email protected]> Reviewed-by: Brian Paul <[email protected]> Reviewed-by: Roland Scheidegger <[email protected]>
* i965/fs: Mark predicated PLN instructions with dependency hints.Matt Turner2014-06-301-4/+9
  To implement the unlit_centroid_workaround, previously we emitted

    (+f0) pln(8) g20<1>F g16.4<0,1,0>F g4<8,8,1>F { align1 1Q };
    (-f0) pln(8) g20<1>F g16.4<0,1,0>F g2<8,8,1>F { align1 1Q };

  where the flag register contains the channel enable bits from g0.

  Since the predicates are complementary, the pair of pln instructions write to non-overlapping components of the destination, which is the case that the dependency control hints are designed for.

  Typically setting dependency control hints on predicated instructions isn't safe (if an instruction doesn't execute due to the predicate, it won't update the scoreboard, leaving it in a bad state) but since we must have at least one channel executing (i.e., +f0 is true for some channel) by virtue of the fact that the thread is running, we can put the +f0 pln instruction last and set the hints:

    (-f0) pln(8) g20<1>F g16.4<0,1,0>F g2<8,8,1>F { align1 NoDDClr 1Q };
    (+f0) pln(8) g20<1>F g16.4<0,1,0>F g4<8,8,1>F { align1 NoDDChk 1Q };

  Reviewed-by: Kristian Høgsberg <[email protected]>
* i965/fs: Predicate PLN instructions used in unlit centroid WA.Matt Turner2014-06-301-6/+14
| | | | | | Maybe lets us skip some PLN instructions if whole subspans are disabled? Reviewed-by: Kristian Høgsberg <[email protected]>
* i965/fs: Add no_dd_{clear,check} fields to fs_inst.Matt Turner2014-06-302-6/+10
| | | | | | | And plumb them through. Also make the assert in the generator look like the vec4 one. Reviewed-by: Kristian Høgsberg <[email protected]>
* i965/fs: Let sat-prop ignore live ranges if producer already has sat.Matt Turner2014-06-301-4/+7
  This sequence (where both x and w are used afterwards) wasn't handled.

    mul.sat x, y, z
    ...
    mov.sat w, x

  We assumed that if x was used after the mov.sat, that we couldn't propagate the saturate modifier, but in fact x was already saturated.

  So ignore the live range check if the producing instruction already saturates its result.

  Cuts one instruction from hundreds of TF2 shaders.

    total instructions in shared programs: 1995631 -> 1994951 (-0.03%)
    instructions in affected programs:     155248 -> 154568 (-0.44%)

  Reviewed-by: Kenneth Graunke <[email protected]>
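The reasoning rests on saturation being idempotent: clamping an already clamped value changes nothing, so the mov.sat result equals x itself. A minimal sketch, using hypothetical scalar stand-ins for the two instructions rather than anything from the i965 backend:

```c
#include <assert.h>

/* Clamp to [0, 1], which is what the .sat modifier does to a result. */
static float saturate(float v)
{
    return v < 0.0f ? 0.0f : (v > 1.0f ? 1.0f : v);
}

/* Hypothetical scalar models of "mul.sat x, y, z" and "mov.sat w, x". */
static float mul_sat(float y, float z) { return saturate(y * z); }
static float mov_sat(float x)          { return saturate(x); }

/* When the producer already saturates, the mov.sat is a plain copy, so
 * later uses of x still see the value they expect. */
static void check_sat_is_idempotent(void)
{
    const float inputs[] = { -2.0f, 0.0f, 0.5f, 1.0f, 3.0f };
    for (unsigned i = 0; i < sizeof(inputs) / sizeof(inputs[0]); i++) {
        float x = mul_sat(inputs[i], 1.5f);
        assert(mov_sat(x) == x);
    }
}
```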
* i965/fs: Pass const references to emit functions.Matt Turner2014-06-302-12/+14
| | | | Cuts 10k of .text and saves a bunch of useless struct copies.
* i965/vec4: Pass const references to instruction functions.Matt Turner2014-06-302-41/+67
     text    data   bss     dec    hex filename
  4231165  123200 39648 4394013 430c1d i965_dri.so
  4186277  123200 39648 4349125 425cc5 i965_dri.so

  Cuts 43k of .text and saves a bunch of useless struct copies.

  Reviewed-by: Kenneth Graunke <[email protected]>
* i965/vec4: Pass const references to vec4_instruction().Matt Turner2014-06-302-6/+7
     text    data   bss     dec    hex filename
  4244821  123200 39648 4407669 434175 i965_dri.so
  4231165  123200 39648 4394013 430c1d i965_dri.so

  Cuts 13k of .text and saves a bunch of useless struct copies.

  Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Pass const references to instruction functions.Matt Turner2014-06-302-34/+41
     text    data   bss     dec    hex filename
  4270747  123200 39648 4433595 43a6bb i965_dri.so
  4244821  123200 39648 4407669 434175 i965_dri.so

  Cuts 25k of .text and saves a bunch of useless struct copies.

  Reviewed-by: Kenneth Graunke <[email protected]>
* radeonsi: Use dma_copy when possible for si_blit.Axel Davy2014-07-011-0/+19
  This significantly improves GLX DRI3 GPU offloading, particularly on CPU-bound benchmarks.

  No performance impact for DRI2 GPU offloading.

  v2: Add missing tests

  Signed-off-by: Axel Davy <[email protected]>
  Reviewed-by: Marek Olšák <[email protected]>
  Signed-off-by: Dave Airlie <[email protected]>
* glx/dri3: add GPU offloading support.Axel Davy2014-07-012-32/+188
  The differences with DRI2 GPU offloading are:

  a) There's no logic for GPU offloading needed in the Xserver.

  b) For DRI2, the card would render to a back buffer, and the content would be copied to the front buffer (the same buffers every time). Here we can potentially use several back buffers and copy to buffers with no tiling to share with X. We send them with the Present extension.
     That means that the DRI2 solution is forced to have tearing with GPU offloading. In the ideal scenario, this DRI3 solution doesn't have this problem.
     However, without dma-buf fences, a race can appear (if the card is slow and the rendering hasn't finished before the server card reads the buffer), and then old content is displayed. If a user hits this, he should probably revert to the DRI2 solution (LIBGL_DRI3_DISABLE). Users with cards fast enough seem to not hit this in practice (I have an AMD HD 7730M, and I don't hit this, except if I force a low dpm mode).

  c) For non-fullscreen apps, the DRI2 GPU offloading solution requires compositing. This DRI3 solution doesn't have this requirement. Rendering to a pixmap also works.

  d) There is no need to have a DDX loaded for the secondary card.

  V4: Fixes some piglit tests

  Signed-off-by: Axel Davy <[email protected]>
  Signed-off-by: Dave Airlie <[email protected]>