summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* gm107/ra: fix constraints for surface operationsSamuel Pitoiset2016-07-201-2/+23
| | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* gm107/ir: lower surface operationsSamuel Pitoiset2016-07-202-1/+77
| | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* nvc0: bind images for 3d/cp shaders on GM107+Samuel Pitoiset2016-07-205-18/+207
| | | | | | | | | On Maxwell, images binding is slightly different (and much better) regarding Fermi and Kepler because a texture view needs to be uploaded for each image and this is going to simplify the thing a lot. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* nvc0: increase the tex handles area size in the driver cbSamuel Pitoiset2016-07-201-11/+11
| | | | | | | | | | | | Currently, we can store 32 tex handles of 32-bits integer each and that fits perfectly with the underlying hardware except on GM107+ which requires to upload a texture view for each images. This patch increases the number of storable texture handles in the driver constant buffer from 32 to 40 because we expose 8 images. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* nir: Fix uninitialized use of 'replacement'.Kenneth Graunke2016-07-191-1/+1
| | | | | | | | | | | For intrinsics we don't care about, just skip to the next loop iteration and process the next instruction. We don't want to execute the rest of the code. This was a bug in commit cdfc05ea6e8c87876cdbf588aa8e03d70f3da4bb. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Anuj Phogat <[email protected]>
* i965: Use tex_mocs instead of rb_mocs for GL images.Kenneth Graunke2016-07-191-1/+1
| | | | | | | | | | | | | | Fixes a 10-20% performance regression in OglCSDof caused by commit 5a8c89038abab0184ea72664ab390ec6ca58b4d6, which made images (in the image load/store sense) use BDW_MOCS_PTE instead of BDW_MOCS_WB. This seems sketchy, as the default PTE value is supposed to be WB LLC eLLC, which is the same as our MOCS WB setting. It's only supposed to change when using a surface for display, which won't ever happen for images. Something may be wrong in the kernel... Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* winsys/amdgpu: use pb_cache buckets for fewer pb_cache missesMarek Olšák2016-07-191-6/+21
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* winsys/radeon: use pb_cache buckets for fewer pb_cache missesMarek Olšák2016-07-191-7/+22
| | | | | | This makes Bioshock Infinite with deferred flushing 2.2% faster. Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium/pb_cache: reduce the number of pointer dereferencesMarek Olšák2016-07-191-7/+9
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium/pb_cache: divide the cache into buckets for reducing cache missesMarek Olšák2016-07-195-26/+47
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium/pb_cache: check parameters that are more likely to fail firstMarek Olšák2016-07-191-8/+7
| | | | | | This makes Bioshock Infinite with deferred flushing 2% faster. Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: emit PS exports lastMarek Olšák2016-07-191-13/+31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This effectively removes s_waitcnt instructions after FP16 exports. Before: v_cvt_pkrtz_f16_f32_e32 v0, v0, v1 ; 5E000300 v_cvt_pkrtz_f16_f32_e32 v1, v2, v3 ; 5E020702 exp 15, 0, 1, 0, 0, v0, v1, v0, v0 ; F800040F 00000100 s_waitcnt expcnt(0) ; BF8C0F0F v_cvt_pkrtz_f16_f32_e32 v0, v4, v5 ; 5E000B04 v_cvt_pkrtz_f16_f32_e32 v1, v6, v7 ; 5E020F06 exp 15, 1, 1, 0, 0, v0, v1, v0, v0 ; F800041F 00000100 s_waitcnt expcnt(0) ; BF8C0F0F v_cvt_pkrtz_f16_f32_e32 v0, v8, v9 ; 5E001308 v_cvt_pkrtz_f16_f32_e32 v1, v10, v11 ; 5E02170A exp 15, 2, 1, 0, 0, v0, v1, v0, v0 ; F800042F 00000100 s_waitcnt expcnt(0) ; BF8C0F0F v_cvt_pkrtz_f16_f32_e32 v0, v12, v13 ; 5E001B0C v_cvt_pkrtz_f16_f32_e32 v1, v14, v15 ; 5E021F0E exp 15, 3, 1, 1, 1, v0, v1, v0, v0 ; F8001C3F 00000100 s_endpgm ; BF810000 After: v_cvt_pkrtz_f16_f32_e32 v0, v0, v1 ; 5E000300 v_cvt_pkrtz_f16_f32_e32 v1, v2, v3 ; 5E020702 v_cvt_pkrtz_f16_f32_e32 v2, v4, v5 ; 5E040B04 v_cvt_pkrtz_f16_f32_e32 v3, v6, v7 ; 5E060F06 exp 15, 0, 1, 0, 0, v0, v1, v0, v0 ; F800040F 00000100 v_cvt_pkrtz_f16_f32_e32 v4, v8, v9 ; 5E081308 v_cvt_pkrtz_f16_f32_e32 v5, v10, v11 ; 5E0A170A exp 15, 1, 1, 0, 0, v2, v3, v0, v0 ; F800041F 00000302 v_cvt_pkrtz_f16_f32_e32 v6, v12, v13 ; 5E0C1B0C v_cvt_pkrtz_f16_f32_e32 v7, v14, v15 ; 5E0E1F0E exp 15, 2, 1, 0, 0, v4, v5, v0, v0 ; F800042F 00000504 exp 15, 3, 1, 1, 1, v6, v7, v0, v0 ; F8001C3F 00000706 s_endpgm ; BF810000 Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: set optimal settings in COMPUTE_RESOURCE_LIMITSMarek Olšák2016-07-191-2/+6
| | | | | | ported from Vulkan Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: really wait for the second EOP event and not the first oneMarek Olšák2016-07-191-1/+5
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium/radeon: remove RADEON_FLUSH_KEEP_TILING_FLAGS flagMarek Olšák2016-07-195-16/+4
| | | | | | always set Reviewed-by: Nicolai Hähnle <[email protected]>
* nir/algebraic: Optimize fabs(u2f(x))Ian Romanick2016-07-191-0/+1
| | | | | | | | | | I noticed this when I tried to do frexp(float(some_unsigned)) in the ir_unop_find_lsb lowering pass. The code generated for frexp() uses fabs, and this resulted in an extra instruction. Ultimately I ended up not using frexp. Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* st/mesa: Enable MESA_shader_integer_functions on all GLSL 1.30 platformsIan Romanick2016-07-192-1/+16
| | | | | Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965: Enable MESA_shader_integer_functions on all GLSL 1.30 platformsIan Romanick2016-07-192-6/+15
| | | | | Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965: Don't lower uaddCarry and usubBorrow in both GLSL IR and NIRIan Romanick2016-07-191-3/+1
| | | | | Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965: Update assertion to account for Gen < 7Ian Romanick2016-07-191-1/+4
| | | | | | | | | | | | Previously SHADER_OPCODE_MULH could only exist on Gen7+, so the assertion assumed the Gen7+ accumulator rules. A future patch will allow this instruction on at least Gen6, so update the assertion. v2: Use get_lowered_simd_width instead of open coding it. Suggested by Curro. Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Matt Turner <[email protected]> [v1]
* i965: Use LZD to implement nir_op_find_lsb on Gen < 7Ian Romanick2016-07-192-3/+45
| | | | | | | v2: Rebase on changes to previous two patches. Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965: Use LZD to implement nir_op_ifind_msb on Gen < 7Ian Romanick2016-07-192-21/+90
| | | | | | | | | v2: Retype LZD source as UD to avoid potential problems with 0x80000000. Suggested by Matt. Also update comment about problem values with LZD(abs(x)). Suggested by Curro. Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965: Use LZD to implement nir_op_ufind_msbIan Romanick2016-07-194-1/+54
| | | | | | | | | | This uses one less instruction. v2: Move emit_find_msb_using_lzd out of the visitor classes. Suggested by Curro. Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965: Always enable GL_ARB_shading_language_packingIan Romanick2016-07-191-1/+1
| | | | | | | | | With the existing lowering passes, the functions from this extension become a bunch of bit twiddling operations that have always been supported. Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965: Move enable of EXT_shader_integer_mixIan Romanick2016-07-191-1/+2
| | | | | | | | This extension does not depend on the Gen. It only depends on the availability of GLSL 1.30. Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* glsl: Add lowering pass for ir_bin_imul_highIan Romanick2016-07-192-0/+150
| | | | | | | | | | This isn't the lowering pass you want. Most GPUs that can support GLSL 1.30 have a multiply unit that can do something more interesting than 32x32->32. Many have 32x16->48. Any GPU that does, should do the lowering in the backend. This is just the thing that will always work. Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* glsl: Add lowering pass for ir_unop_find_msbIan Romanick2016-07-192-0/+107
| | | | | Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* glsl: Add lowering pass for ir_unop_find_lsbIan Romanick2016-07-192-0/+87
| | | | | Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* glsl: Add lowering pass for ir_unop_bitfield_reverseIan Romanick2016-07-192-0/+92
| | | | | Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* glsl: Add lowering pass for ir_quadop_bitfield_insertIan Romanick2016-07-192-0/+74
| | | | | Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* glsl: Add lowering pass for ir_triop_bitfield_extractIan Romanick2016-07-192-0/+81
| | | | | Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* glsl: Add lowering pass for ir_unop_bit_countIan Romanick2016-07-192-0/+54
| | | | | Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* MESA_shader_integer_functions: Allow new function overload matching rulesIan Romanick2016-07-191-5/+7
| | | | | Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* MESA_shader_integer_functions: Allow implicit int->uint conversionsIan Romanick2016-07-192-6/+10
| | | | | Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* MESA_shader_integer_functions: Expose new built-in functionsIan Romanick2016-07-191-11/+20
| | | | | Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* MESA_shader_integer_functions: Boiler plate extension trackingIan Romanick2016-07-196-0/+10
| | | | | Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* MESA_shader_integer_functions: Add extension specificationIan Romanick2016-07-191-0/+520
| | | | | | | | | v2: Fix typo in #extension line noticed by Ken. v3: Update spec status. Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* gm107/ir: make use of ADD32I for all immediatesSamuel Pitoiset2016-07-191-1/+1
| | | | | | | | ADD only allows to emit 19-bits immediates. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]> Cc: <[email protected]>
* gm107/ir: add missing NEG modifier for IADD32ISamuel Pitoiset2016-07-191-0/+1
| | | | | | | | Like FADD32I, the NEG modifier of src0 is at position 56. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]> Cc: [email protected]
* ddebug: Fix trivial typo in stderr messageAndreas Boll2016-07-191-1/+1
| | | | Signed-off-by: Andreas Boll <[email protected]>
* configure.ac: Use ${datarootdir} for --with-vulkan-icddir help string tooAndreas Boll2016-07-191-1/+1
| | | | | | | | | | | | The help string wasn't updated in cbc37f7. Fixes: cbc37f7 ("anv: install the intel_icd.json to ${datarootdir} by default") Signed-off-by: Andreas Boll <[email protected]> Reviewed-by: Eric Engestrom <[email protected]> Reviewed-by: Emil Velikov <[email protected]> Cc: [email protected]
* vl: fix memory leakEric Engestrom2016-07-191-7/+1
| | | | | | | CovID: 1363008 Signed-off-by: Eric Engestrom <[email protected]> Reviewed-by: Nayan Deshmukh <[email protected]> Reviewed-by: Christian König <[email protected]>
* vl: add entry pointBoyuan Zhang2016-07-191-0/+1
| | | | | | | | | | | | | Add entrypoint to distinguish H.264 decode and encode. For example, in patch 5/11 when is calling "VaCreateContext", "pps" and "sps" shouldn't be allocated for H.264 encoding. So we need to use the entry_point to determine this is H.264 decode or H.264 encode. We can use config to determine the entrypoint since config_id is passed to us for VaCreateContext call. However, for VaDestoyContext call, only context_id is passed to us. So we need to know the entrypoint in order to not free the pps/sps for encoding case. Signed-off-by: Boyuan Zhang <[email protected]> Reviewed-by: Christian König <[email protected]>
* nv50,nvc0: srgb rendering is only available for rgba/bgraIlia Mirkin2016-07-181-2/+2
| | | | | | | | | | | | | | | | Mark both L8_SRGB and L8A8_SRGB as non-renderable (the latter already didn't have the bind flags). This makes the state tracker pick a different format when rendering is required, or mark the fb as incomplete. This fixes: bin/getteximage-formats init-by-clear-and-render -auto -fbo bin/getteximage-formats init-by-rendering -auto -fbo which previously ran into srgb-encoding differences. Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Samuel Pitoiset <[email protected]> Cc: [email protected]
* nvc0: add support for BGRA8 imagesIlia Mirkin2016-07-187-1/+16
| | | | | | | | This is useful for pbo downloads, which are now accelerated with images. BGRA8 is a moderately common format to do that in. Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Samuel Pitoiset <[email protected]>
* i965: Skip update_texture_surface when the plane doesn't existJason Ekstrand2016-07-181-8/+10
| | | | | | | | | Thanks to rebase fail, recent surface state changes (commits 7e951cd56, 8521ce1a7, and 69c0dc5c53) effectively reverted 727a9b24933 and 367cf3a2e3e which was unintentional. This should bring it back. Signed-off-by: Jason Ekstrand <[email protected]> Reviewed-by: Jordan Justen <[email protected]>
* glsl: use linked shaders rather than compiled shadersTimothy Arceri2016-07-191-4/+4
| | | | | | | | | | | At this point there is no reason not to be using the linked shaders, using the linked shaders should be faster and will make things simpler for upcoming shader cache work. The previous variable name suggests the linked shaders were intended to be used here anyway. Reviewed-by: Iago Toral Quiroga <[email protected]>
* The extension is already exposed, this simply marks it as done.Lars Hamre2016-07-191-1/+1
| | | | | Signed-off-by: Lars Hamre <[email protected]> Reviewed-by: Roland Scheidegger <[email protected]>
* docs: Fix typo in extension nameAnuj Phogat2016-07-181-1/+1
| | | | Signed-off-by: Anuj Phogat <[email protected]>
* docs: Add support for GL_KHR_texture_compression_astc_sliced_3dAnuj Phogat2016-07-181-0/+1
| | | | | Signed-off-by: Anuj Phogat <[email protected]> Reported-by: Ilia Mirkin <[email protected]>