path: root/src
Commit message, Author, Age, Files, Lines
* i965: Add missing persample_shading field to brw_wm_debug_recompile.Kenneth Graunke2014-07-211-0/+2
| | | | | | | | Otherwise, the performance warning for shader recompiles will just say "something else". Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Jordan Justen <[email protected]>
* i965/disasm: Don't disassemble the URB complete field on Broadwell.Kenneth Graunke2014-07-211-2/+4
| | | | | | | | It doesn't exist, so attempting to read it will trigger generation assertions in the brw_inst API. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965: Disable hex offset printing in disassembly.Kenneth Graunke2014-07-211-1/+2
| | | | | | | | | | | | | | | Printing the hex offsets makes it basically impossible to diff assembly: if you add even a single instruction, the entire shader shows up as a difference. So, every time I want to compare assembly, I have to strip this out. The hex offsets might be useful when debugging compaction, or when inspecting the program cache buffer. Since it's occasionally useful, but uncommon, this patch disables it by default, but makes it easy to re-enable it temporarily when the need arises. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]>
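A minimal sketch of the "off by default, easy to re-enable" idea described above. The flag `dump_hex_offsets` and the helper `format_inst_line` are hypothetical names for illustration and are not taken from the i965 disassembler.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical switch: off by default so that assembly dumps diff cleanly.
 * Flip it to true temporarily when debugging compaction or the program cache. */
static bool dump_hex_offsets = false;

/* Format one disassembly line into buf and return snprintf's result. */
static int
format_inst_line(char *buf, size_t size, unsigned offset, const char *text)
{
   assert(buf != NULL && text != NULL);

   if (dump_hex_offsets)
      return snprintf(buf, size, "0x%08x: %s", offset, text);

   return snprintf(buf, size, "%s", text);
}
```

With the flag off, inserting one instruction changes one line of the dump rather than every offset after it.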
* i965/vec4: Use foreach_inst_in_block a couple more places.Matt Turner2014-07-212-8/+2
| | | | Reviewed-by: Topi Pohjolainen <[email protected]>
* i965: Replace cfg instances with calls to calculate_cfg().Matt Turner2014-07-215-22/+22
| | | | | | | | | | | Avoids regenerating it unnecessarily. Every program in shader-db improved, none by an amount less than a 1/3 reduction. One Dota2 shader decreased from 62 -> 24. cfg calculations: 429492 -> 193197 (-55.02%) Reviewed-by: Topi Pohjolainen <[email protected]>
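The saving comes from computing the CFG once and reusing it until something invalidates it. The sketch below shows that caching pattern only; the struct layout, the `cfg_builds` counter, and `invalidate_cfg` are assumptions made for illustration, not the driver's actual code.

```c
#include <assert.h>
#include <stddef.h>

struct cfg {
   int num_blocks;
};

struct visitor {
   struct cfg storage;   /* backing store for the cached CFG */
   struct cfg *cfg;      /* NULL means "not computed yet" or "invalidated" */
   int cfg_builds;       /* counts how often the CFG was really rebuilt */
};

/* Build the CFG only if no valid cached copy exists. */
static struct cfg *
calculate_cfg(struct visitor *v)
{
   assert(v != NULL);

   if (v->cfg == NULL) {
      v->storage.num_blocks = 1;   /* stand-in for the real analysis */
      v->cfg = &v->storage;
      v->cfg_builds++;
   }
   return v->cfg;
}

/* Hypothetical hook: a pass that changes control flow drops the cache. */
static void
invalidate_cfg(struct visitor *v)
{
   assert(v != NULL);
   v->cfg = NULL;
}
```

Passes that only read the CFG call `calculate_cfg()` and pay nothing after the first call; only passes that alter control flow need to invalidate it.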
* i965/cfg: Add a foreach_block_and_inst macro.Matt Turner2014-07-211-0/+4
| | | | | | Will let us abstract how the instructions are stored. Reviewed-by: Topi Pohjolainen <[email protected]>
* i965: Add cfg to backend_visitor.Matt Turner2014-07-219-33/+48
| | | | Reviewed-by: Topi Pohjolainen <[email protected]>
* radeonsi/compute: Add scratch buffer support v2Tom Stellard2014-07-213-2/+85
| | | | | | | | The scratch buffer will be used for private memory and also register spilling. v2: - Code cleanups
* radeonsi/compute: Bump number of user sgprs for LLVM 3.5Tom Stellard2014-07-211-1/+6
| | | | Reviewed-by: Marek Olšák <[email protected]>
* winsys/radeon: Query the kernel for the number of SEs and SHs per SETom Stellard2014-07-212-0/+8
| | | | Reviewed-by: Marek Olšák <[email protected]>
* radeonsi/compute: Share COMPUTE_DBG macro with r600gTom Stellard2014-07-213-13/+10
| | | | Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: Read rodata from ELF and append it to the end of shadersTom Stellard2014-07-213-1/+22
| | | | | | | This is used for programs that have arrays of constants that are accessed using dynamic indices. The shader will compute the base address of the constants and then access them using SMRD instructions.
* glsl: Fix bad indentationIan Romanick2014-07-191-1/+1
| | | | | Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965: Silence unused parameter warningIan Romanick2014-07-191-1/+1
| | | | | | | brw_fs_visitor.cpp:2400:1: warning: unused parameter 'ir' [-Wunused-parameter] Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965: Silence 'comparison is always true' warningIan Romanick2014-07-191-2/+0
| | | | | | | | | | | | | The parameter is an int16_t, and we're checking that its value will fit in 16 bits. Yes, the value that is stored in 16 bits will surely fit in 16 bits. brw_inst.h: In function 'brw_inst_set_gen6_jump_count': brw_inst.h:321:66: warning: comparison is always true due to limited range of data type [-Wtype-limits] brw_inst.h:321:66: warning: comparison is always true due to limited range of data type [-Wtype-limits] Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Matt Turner <[email protected]>
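To illustrate why the check was redundant: a range test only means something while the value is still held in a wider type. In the sketch below all function names are hypothetical and the bit layout is invented; it is not the `brw_inst` API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Meaningful: a 32-bit value may or may not fit in 16 bits. */
static bool
fits_in_int16(int32_t value)
{
   return value >= INT16_MIN && value <= INT16_MAX;
}

/* The range check belongs at the point where the value is narrowed. */
static int16_t
narrow_to_int16(int32_t wide)
{
   assert(fits_in_int16(wide));
   return (int16_t)wide;
}

/* Hypothetical setter: the int16_t parameter already guarantees the range,
 * so repeating the comparison here would be "always true". */
static uint32_t
set_jump_count(uint32_t inst_word, int16_t value)
{
   return (inst_word & 0xffff0000u) | (uint16_t)value;
}
```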
* i965: Silence many unused parameter warningsIan Romanick2014-07-191-0/+10
| | | | | | | | brw_inst.h: In function 'brw_inst_set_src1_vstride': brw_inst.h:118:76: warning: unused parameter 'brw' [-Wunused-parameter] Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* main/format_pack: Fix a wrong datatype in pack_ubyte_R8G8_UNORMJason Ekstrand2014-07-181-1/+1
| | | | | | | | Before it was only storing one of the color components due to truncation. With this patch it now properly stores all of them. Reviewed-by: Brian Paul <[email protected]> Cc: "10.2" <[email protected]>
* Add support for RGBA8 and RGBX8 textures in intel_texsubimage_tiled_memcpyJason Ekstrand2014-07-171-0/+11
| | | | | | Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Chad Versace <[email protected]>
* i965: Improve debug output in intelTexImage and intelTexSubimageJason Ekstrand2014-07-172-1/+9
| | | | | Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Chad Versace <[email protected]>
* radeonsi: only update vertex buffers when they need updatingMarek Olšák2014-07-183-2/+22
| | | | Reviewed-by: Michel Dänzer <[email protected]>
* radeonsi: remove nr_vertex_buffersMarek Olšák2014-07-183-6/+23
| | | | | | | | Unused. Also inline util_set_vertex_buffers_count and simplify it. Reviewed-by: Michel Dänzer <[email protected]>
* radeonsi: move vertex buffer descriptors from IB to memoryMarek Olšák2014-07-187-106/+133
| | | | | | | | | | This removes the intermediate storage (pm4 state) and generates descriptors directly in a staging buffer. It also reduces the number of flushes, because the descriptors no longer take CS space. Reviewed-by: Michel Dänzer <[email protected]>
* radeonsi: add support for fine-grained sampler view updatesMarek Olšák2014-07-183-30/+21
| | | | Reviewed-by: Michel Dänzer <[email protected]>
* radeonsi: move si_set_sampler_views to si_descriptors.cMarek Olšák2014-07-183-73/+68
| | | | Reviewed-by: Michel Dänzer <[email protected]>
* radeonsi: move sampler descriptors from IB to memoryMarek Olšák2014-07-185-82/+82
| | | | | | | | | | | | | | Sampler descriptors are now represented by si_descriptors. This also adds support for fine-grained sampler state updates and the border color update is now isolated in a separate function. Border colors have been broken if texturing from multiple shader stages is used. This patch doesn't change that. BTW, blitting already makes use of fine-grained state updates. u_blitter uses 2 textures at most, so we only have to save 2. Reviewed-by: Michel Dänzer <[email protected]>
* radeonsi: implement ARB_draw_indirectMarek Olšák2014-07-185-17/+128
| | | | Reviewed-by: Michel Dänzer <[email protected]>
* radeonsi: don't add info->start to the index buffer offsetMarek Olšák2014-07-181-11/+25
| | | | | | | info->start will be invalid once info->indirect isn't NULL, so it shouldn't be added to ib.offset. Reviewed-by: Michel Dänzer <[email protected]>
* radeonsi: use an SGPR instead of VGT_INDX_OFFSETMarek Olšák2014-07-184-14/+23
| | | | | | | | The draw indirect packets cannot set VGT_INDX_OFFSET, they can only set user data SGPRs. This is the only way to support start/index_bias with indirect drawing. Reviewed-by: Michel Dänzer <[email protected]>
* radeonsi: assume LLVM 3.4.2 is always presentMarek Olšák2014-07-186-56/+7
| | | | Reviewed-by: Michel Dänzer <[email protected]>
* st/mesa,gallium: add a workaround for Unigine Heaven 4.0 and Valley 1.0Marek Olšák2014-07-186-3/+29
| | | | | | | Most (all?) Unigine shaders fail to compile without this if sample shading is advertised. This is, of course, Unigine developers' fault. Reviewed-by: Brian Paul <[email protected]>
* glsl: add a mechanism to allow #extension directives in the middle of shadersMarek Olšák2014-07-184-0/+17
| | | | | | | | | | | This is needed to make Unigine Heaven 4.0 and Unigine Valley 1.0 work with sample shading. Also, if this is disabled, the error message at least makes sense now. Reviewed-by: Brian Paul <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
* r600g: Implement GL_ARB_texture_gatherGlenn Kennard2014-07-182-7/+42
| | | | | | | | | | | | Only supported on evergreen and later. Currently limited to single component textures as the hardware GATHER4 instruction ignores texture swizzles. Piglit quick run passes on radeon 6670 with all applicable textureGather tests, no regressions. Signed-off-by: Glenn Kennard <[email protected]> Signed-off-by: Marek Olšák <[email protected]>
* i965: Fix z_offset computation in intel_miptree_unmap_depthstencil()Anuj Phogat2014-07-171-2/+2
| | | | | | | | | | | | | | | | The bug is triggered by using glTexSubImage2d() with GL_DEPTH_STENCIL as base internal format and non-zero x, y offsets. Currently x, y offsets are ignored while updating the texture image. Fixes Khronos GLES3 CTS tests: npot_tex_sub_image_2d npot_tex_sub_image_3d npot_pbo_tex_sub_image_2d npot_pbo_tex_sub_image_2d Cc: <[email protected]> Signed-off-by: Anuj Phogat <[email protected]> Reviewed-by: Jordan Justen <[email protected]>
* Revert "i965: Extend compute-to-mrf pass to understand blocks of MOVs"Anuj Phogat2014-07-171-53/+10
| | | | | | | | | | | | | | This reverts commit bbefb15e01e1c16af69646898918982ae00f8c92. Fixes the 11 regressions it caused in the framebuffer_blit tests of the Khronos GLES3 CTS. The original patch reduced the instruction count but had no performance benefits, so it's safe to revert it without causing any performance regressions. Signed-off-by: Anuj Phogat <[email protected]> Acked-by: Kristian Høgsberg <[email protected]> Reviewed-by: Jordan Justen <[email protected]>
* i915: Fix up intelInitScreen2 for DRI3Adel Gadllah2014-07-171-1/+2
| | | | | | | | | | | | | | | | | Commit 442442026eb updated both i915 and i965 for DRI3 support, but one check in intelInitScreen2 was missed for i915 causing crashes when trying to use i915 with DRI3. So fix that up. Reported-by: Igor Gnatenko <[email protected]> References: https://bugzilla.redhat.com/show_bug.cgi?id=1115323 References: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=754297 Tested-by: František Zatloukal <[email protected]> Tested-by: Dirk Griesbach <[email protected]> Signed-off-by: Adel Gadllah <[email protected]> Acked-by: Kenneth Graunke <[email protected]> Cc: "10.2" <[email protected]>
* mesa: Fix regression introduced by commit "mesa: fix packing of float texels to GL_SHORT/GL_BYTE".Pavel Popov2014-07-181-8/+8
| | | | | | | | | | | | | | | The commit "mesa: fix packing of float texels to GL_SHORT/GL_BYTE" replaced *_TO_BYTE with *_TO_BYTE_TEX because *_TO_FLOAT_TEX are used to unpack the texels to floats. In this case, *_TO_FLOATZ in function extract_float_rgba should also be replaced with *_TO_FLOAT_TEX. Note that these macros automatically preserve zero when converting. The regression was observed on 3 oglconform tests: snorm-textures basic.getTexImage snorm-textures advanced.mipmap.manual.getTex snorm-textures advanced.mipmap.upload.getTex Signed-off-by: Pavel Popov <[email protected]> Reviewed-by: Chris Forbes <[email protected]>
* nv50: fix build failure on m68k due to invalid struct alignment assumptionsThorsten Glaser2014-07-171-0/+5
| | | | | | | | Make alignment assumptions explicit by inserting correct padding with unknown struct members. Reviewed-by: Ilia Mirkin <[email protected]> Cc: <[email protected]>
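A sketch of what "making alignment assumptions explicit" can look like. The struct and its field names are invented for illustration and are not the nv50 structures; the idea is that on an ABI where 32-bit members need only 2-byte alignment (m68k being the case named above), compiler-inserted padding cannot be relied on, so the padding is declared as a member and the layout is checked at compile time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical hardware-facing record. */
struct hw_record {
   uint16_t id;
   uint16_t unk02;     /* explicit padding in place of implicit padding */
   uint32_t address;   /* must sit at byte offset 4 on every ABI */
};

/* Fail the build, not the GPU, if the layout ever differs. */
static_assert(offsetof(struct hw_record, address) == 4,
              "address must be at offset 4");
static_assert(sizeof(struct hw_record) == 8,
              "record must be exactly 8 bytes");
```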
* clover: Call end_query before getting timestamp result v2Tom Stellard2014-07-171-0/+1
| | | | | | | | | | | v2: - Move the end_query() call into the timestamp constructor. - Still pass false as the wait parameter to get_query_result(). Reviewed-by: Niels Ole Salscheider <[email protected]> Reviewed-by: Francisco Jerez <[email protected]> CC: "10.2" <[email protected]>
* glsl: handle a switch where default is in the middle of casesTapani Pälli2014-07-172-3/+83
| | | | | | | | | | | | | | | | | | | | | This fixes the following tests in es3conform: shaders.switch.default_not_last_dynamic_vertex shaders.switch.default_not_last_dynamic_fragment and makes the following tests in Piglit pass: glsl-1.30/execution/switch/fs-default-notlast-fallthrough glsl-1.30/execution/switch/fs-default_notlast No Piglit regressions. v2: take away unnecessary ir_if, just use conditional assignment v3: use foreach_in_list instead of foreach_list Signed-off-by: Tapani Pälli <[email protected]> Reviewed-by: Roland Scheidegger <[email protected]> (v2) Reviewed-by: Ian Romanick <[email protected]> (v3)
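The behaviour under test can be shown with a C analogue, assuming (as the test names suggest) that a GLSL `default` label placed between cases behaves like C's: it is selected only when no case matches, and it falls through into the next label unless it ends with `break`. The function name is hypothetical.

```c
#include <assert.h>

static int
switch_default_not_last(int selector)
{
   int result = 0;

   switch (selector) {
   case 0:
      result += 1;
      break;
   default:
      result += 10;
      /* falls through */
   case 1:
      result += 100;
      break;
   }

   assert(result != 0);   /* every path above adds something */
   return result;
}
```

Here a selector of 0 yields 1, a selector of 1 yields 100, and any other value runs the default body and then falls through to give 110.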
* glsl: Make the tree rebalancer use vector_elements, not components().Kenneth Graunke2014-07-161-2/+2
| | | | | | | | | | | components() includes matrix columns, so if this code encountered a matrix, it would ask for something like a vec9 or vec16. This is clearly not what you want. Earlier code now prevents this from seeing matrices, but we should still use vector_elements, for clarity. Signed-off-by: Kenneth Graunke <[email protected]>
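The difference between the two quantities, in a simplified model. The struct below is a stand-in with only the two size fields and is not the real `glsl_type`; `vector_width` is a hypothetical helper.

```c
#include <assert.h>

/* Simplified stand-in for a GLSL type: only the two size fields. */
struct type_dims {
   unsigned vector_elements;   /* rows per column, 1..4 */
   unsigned matrix_columns;    /* 1 for scalars and vectors */
};

/* Total scalar count: includes matrix columns. */
static unsigned
components(struct type_dims t)
{
   return t.vector_elements * t.matrix_columns;
}

/* Width to request for a vector temporary; matrices are rejected earlier. */
static unsigned
vector_width(struct type_dims t)
{
   assert(t.matrix_columns == 1);
   return t.vector_elements;
}
```

For a 3x3 matrix `components()` gives 9 and for a 4x4 matrix 16, which is where the nonexistent "vec9" and "vec16" requests come from.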
* glsl: Guard against error_type in the tree rebalancer.Kenneth Graunke2014-07-161-1/+3
| | | | | | This helped me track down the bug fixed in the previous commit. Signed-off-by: Kenneth Graunke <[email protected]>
* glsl: Make the tree rebalancer bail on matrix operands.Kenneth Graunke2014-07-161-1/+3
| | | | | | | | | It doesn't handle things like (vector * matrix) correctly, and apparently Matt's intention was to bail. Fixes shader compilation in Natural Selection 2. Signed-off-by: Kenneth Graunke <[email protected]>
* Revert "i965: Implement GL_PRIMITIVES_GENERATED with non-zero streams."Kenneth Graunke2014-07-162-26/+7
| | | | | | | | | | | | | | | | This reverts commit 3178d2474ae5bdd1102fb3d76a60d1d63c961ff5. This caused GPU hangs on Ivybridge for some users and huge (80%) performance regressions across the board on multiple platforms. We need to find a better solution. I've made several attempts, but none of them have worked yet. In the meantime, we should revert this. Reverting it breaks GL_PRIMITIVES_GENERATED for non-zero streams, but that's okay, since we don't expose GL_ARB_gpu_shader5 yet. Fixes Piglit's EXT_transform_feedback/generatemipmap prims_generated test case on Haswell.
* ilo: add some missing formatsChia-I Wu2014-07-161-21/+22
| | | | Map more pipe formats to hardware formats. Enable more VB formats on Haswell.
* ilo: update and tailor the surface format tableChia-I Wu2014-07-161-286/+258
| | | | | Recreate the table from scratch with the help of a pdf-table-to-csv converter. Switch to a form that is more suitable for ilo.
* i965: Don't copy propagate abs into Broadwell logic instructions.Kenneth Graunke2014-07-152-12/+6
| | | | | | | | | | | | It's not clear what abs on logical instructions means on Broadwell, and it doesn't appear to do anything sensible. Fixes 270 Piglit tests (the bitand/bitor/bitxor tests with abs). Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=81157 Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]> Cc: "10.2" <[email protected]>
* i965/fs: Use WE_all for gl_SampleID header register munging.Kenneth Graunke2014-07-151-5/+9
| | | | | | | | | | | | This code should execute without regard to the currently executing channels. Asking for gl_SampleID inside control flow might break in strange ways. It appears to break even at the top of the program in SIMD16 mode occasionally as well. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Chris Forbes <[email protected]> Cc: [email protected]
* i965/fs: Set force_uncompressed and force_sechalf on samplepos setup.Kenneth Graunke2014-07-151-6/+8
| | | | | | | | | | | | | | gen8_fs_generator uses these to decide whether to set the execution size to 8 or 16, so we incorrectly made both of these MOVs the full width in SIMD16 shaders. (It happened to work out on Gen4-7.) Setting them should also help inform optimization passes what's really going on, which could help avoid bugs. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Chris Forbes <[email protected]> Cc: [email protected]
* i965: Set execution size to 8 for instructions with force_sechalf set.Kenneth Graunke2014-07-151-1/+1
| | | | | | | | | | | | | | | | | Both inst->force_uncompressed and inst->force_sechalf mean that the generated instruction should be uncompressed and have an execution size of 8. We don't require the visitor to set both flags - setting inst->force_sechalf by itself is supposed to be enough. On Gen4-7, guess_execution_size() demoted instructions to 8-wide based on the default compression state. On Gen8+, we instead set a default execution size, which worked great...except that we forgot to check inst->force_sechalf when deciding whether to use 8 or 16. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Chris Forbes <[email protected]> Cc: [email protected]
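The rule described above can be sketched as one decision function. The struct and function names are hypothetical and the real generator code is organised differently; the point is only that either flag alone must force an execution size of 8.

```c
#include <assert.h>
#include <stdbool.h>

struct inst_flags {
   bool force_uncompressed;
   bool force_sechalf;
};

/* Pick the execution size for one instruction in a SIMD8 or SIMD16 shader. */
static unsigned
default_exec_size(unsigned dispatch_width, struct inst_flags f)
{
   assert(dispatch_width == 8 || dispatch_width == 16);

   /* The bug class: testing only force_uncompressed here and forgetting
    * that force_sechalf on its own also implies an 8-wide instruction. */
   if (dispatch_width == 16 && !f.force_uncompressed && !f.force_sechalf)
      return 16;

   return 8;
}
```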
* nvc0: fix translate path for PRIM_RESTART_WITH_DRAW_ARRAYSChristoph Bumiller2014-07-151-13/+28
| | | | Reviewed-by: Ilia Mirkin <[email protected]>