summaryrefslogtreecommitdiffstats
path: root/src
Commit message (Collapse)AuthorAgeFilesLines
* nir: Simplify 0 >= b2f(a)Ian Romanick2016-03-221-0/+7
| | | | | | | | | | | | | | | | | | | | | | | This also prevented some regressions with other patches in my local tree. Broadwell / Skylake total instructions in shared programs: 8980835 -> 8980833 (-0.00%) instructions in affected programs: 45 -> 43 (-4.44%) helped: 1 HURT: 0 total cycles in shared programs: 70077904 -> 70077900 (-0.00%) cycles in affected programs: 122 -> 118 (-3.28%) helped: 1 HURT: 0 No changes on earlier platforms. v2: Modify the comments to look more like a proof. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>
* nir: Simplify i2b with negated or abs operandIan Romanick2016-03-221-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This enables removing ssa_201 and ssa_202 in sequences like: vec1 ssa_200 = flt ssa_199, ssa_194 vec1 ssa_201 = b2i ssa_200 vec1 ssa_202 = i2b -ssa_201 shader-db results: Sandy Bridge total instructions in shared programs: 8462257 -> 8462180 (-0.00%) instructions in affected programs: 3846 -> 3769 (-2.00%) helped: 35 HURT: 0 total cycles in shared programs: 117542934 -> 117542462 (-0.00%) cycles in affected programs: 20072 -> 19600 (-2.35%) helped: 20 HURT: 1 Ivy Bridge total instructions in shared programs: 7775252 -> 7775137 (-0.00%) instructions in affected programs: 3645 -> 3530 (-3.16%) helped: 35 HURT: 0 total cycles in shared programs: 65760522 -> 65760068 (-0.00%) cycles in affected programs: 21082 -> 20628 (-2.15%) helped: 25 HURT: 2 Haswell total instructions in shared programs: 7108666 -> 7108589 (-0.00%) instructions in affected programs: 3253 -> 3176 (-2.37%) helped: 35 HURT: 0 total cycles in shared programs: 64675726 -> 64675272 (-0.00%) cycles in affected programs: 21034 -> 20580 (-2.16%) helped: 26 HURT: 1 Broadwell / Skylake total instructions in shared programs: 8980912 -> 8980835 (-0.00%) instructions in affected programs: 3223 -> 3146 (-2.39%) helped: 35 HURT: 0 total cycles in shared programs: 70077926 -> 70077904 (-0.00%) cycles in affected programs: 21886 -> 21864 (-0.10%) helped: 21 HURT: 6 G45 and Ironlake showed no change. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Suggested-by: Jason Ekstrand <jason.ekstrand@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>
* nir: Lower flrp with Boolean interpolator to bcselIan Romanick2016-03-221-2/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | On Intel platforms that don't set lower_flrp, using bcsel instead of flrp seems to be a small amount worse. On those platforms, the use of flrp, bcsel, and multiply of b2f is still an active area of research. In review, Matt suggested this is because bcsel turns into CMP+SEL, and because of the flag register we can't schedule instructions well. shader-db results: G4X / Ironlake total instructions in shared programs: 4016538 -> 4012279 (-0.11%) instructions in affected programs: 161556 -> 157297 (-2.64%) helped: 1077 HURT: 1 total cycles in shared programs: 84328296 -> 84315862 (-0.01%) cycles in affected programs: 4174570 -> 4162136 (-0.30%) helped: 926 HURT: 53 Unsurprisingly, no changes on later platforms. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965: Have NIR lower flrp on pre-GEN6 vec4 backendIan Romanick2016-03-221-2/+26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Previously we were doing the lowering by hand in vec4_visitor::emit_lrp. By doing it in NIR, we have the opportunity for NIR to do additional optimization of the expanded code. This also enables optimizations added by the next commit. shader-db results: G4X / Ironlake total instructions in shared programs: 4024401 -> 4016538 (-0.20%) instructions in affected programs: 447686 -> 439823 (-1.76%) helped: 2623 HURT: 0 total cycles in shared programs: 84375846 -> 84328296 (-0.06%) cycles in affected programs: 16964960 -> 16917410 (-0.28%) helped: 2556 HURT: 41 Unsurprisingly, no changes on later platforms. v2: Formatting and comment changes suggested by Matt. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>
* swrast: fix discarded const warning in s_texture.cBrian Paul2016-03-221-1/+1
| | | | Signed-off-by: Brian Paul <brianp@vmware.com>
* i965: fix invalid memory writeMarc-André Lureau2016-03-211-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | I noticed some heap corruption running virgl tests, and valgrind helped me to track it down to the following error: ==29272== Invalid write of size 4 ==29272== at 0x90283D4: push_loop_stack (brw_eu_emit.c:1307) ==29272== by 0x9029A7D: brw_DO (brw_eu_emit.c:1750) ==29272== by 0x90554B0: fs_generator::generate_code(cfg_t const*, int) (brw_fs_generator.cpp:1999) ==29272== by 0x904491F: brw_compile_fs (brw_fs.cpp:5685) ==29272== by 0x8FC5DC5: brw_codegen_wm_prog (brw_wm.c:137) ==29272== by 0x8FC7663: brw_fs_precompile (brw_wm.c:638) ==29272== by 0x8FA4040: brw_shader_precompile(gl_context*, gl_shader_program*) (brw_link.cpp:51) ==29272== by 0x8FA4A9A: brw_link_shader (brw_link.cpp:260) ==29272== by 0x8DEF751: _mesa_glsl_link_shader (ir_to_mesa.cpp:3006) ==29272== by 0x8C84325: _mesa_link_program (shaderapi.c:1042) ==29272== by 0x8C851D7: _mesa_LinkProgram (shaderapi.c:1515) ==29272== by 0x4E4B8E8: add_shader_program (vrend_renderer.c:880) ==29272== Address 0xf2f3cb0 is 0 bytes after a block of size 112 alloc'd ==29272== at 0x4C2AA98: calloc (vg_replace_malloc.c:711) ==29272== by 0x8ED11F7: ralloc_size (ralloc.c:113) ==29272== by 0x8ED1282: rzalloc_size (ralloc.c:134) ==29272== by 0x8ED14C0: rzalloc_array_size (ralloc.c:196) ==29272== by 0x9019C7B: brw_init_codegen (brw_eu.c:291) ==29272== by 0x904F565: fs_generator::fs_generator(brw_compiler const*, void*, void*, void const*, brw_stage_prog_data*, unsigned int, bool, gl_shader_stage) (brw_fs_generator.cpp:124) ==29272== by 0x9044883: brw_compile_fs (brw_fs.cpp:5675) ==29272== by 0x8FC5DC5: brw_codegen_wm_prog (brw_wm.c:137) ==29272== by 0x8FC7663: brw_fs_precompile (brw_wm.c:638) ==29272== by 0x8FA4040: brw_shader_precompile(gl_context*, gl_shader_program*) (brw_link.cpp:51) ==29272== by 0x8FA4A9A: brw_link_shader (brw_link.cpp:260) ==29272== by 0x8DEF751: _mesa_glsl_link_shader (ir_to_mesa.cpp:3006) if_depth_in_loop is an array of size p->loop_stack_array_size, and push_loop_stack() will access if_depth_in_loop[p->loop_stack_depth+1], thus the condition to grow the array should be p->loop_stack_array_size <= (p->loop_stack_depth + 1) (it's currently off by 2...) This can be reproduced by running the following test with virgl test server: LIBGL_ALWAYS_SOFTWARE=y GALLIUM_DRIVER=virpipe bin/shader_runner ./tests/shaders/glsl-fs-unroll-explosion.shader_test -auto Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* tgsi: drop unused set_exec/kill_mask interfaces.Dave Airlie2016-03-223-37/+0
| | | | | | | | | These don't get used and haven't been in git history from what I can see, so drop them. Reviewed-by: Brian Paul <brianp@vmware.com> Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com> Signed-off-by: Dave Airlie <airlied@redhat.com>
* st/mesa: add support for internalformat query2.Dave Airlie2016-03-222-10/+45
| | | | | | | | Add code to handle GL_INTERNALFORMAT_PREFERRED. Add code to deal with GL_RENDERBUFFER being passes into ChooseTextureFormat. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Signed-off-by: Dave Airlie <airlied@redhat.com>
* i965: Fix assert conditions for src/dst x/y offsetsAnuj Phogat2016-03-211-3/+3
| | | | | | Cc: mesa-stable@lists.freedesktop.org Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* swrast: Move assert for 'slice' in to check_map_teximageAnuj Phogat2016-03-211-1/+1
| | | | | Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com> Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
* r600/sb: Do not distribute neg in expr_handler::fold_assoc() when folding ↵xavier2016-03-221-2/+6
| | | | | | | | | | | | | | | multiplications. Previously it was doing this transformation for a Trine 3 shader: MUL R6.x.12, R13.x.23, 0.5|3f000000 - MULADD R4.x.12, -R6.x.12, 2|40000000, 1|3f800000 + MULADD R4.x.12, -R13.x.23, -1|bf800000, 1|3f800000 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94412 Signed-off-by: Xavier Bouchoux <xavierb@gmail.com> Cc: "11.0 11.1 11.2" <mesa-stable@lists.freedesktop.org> Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com> Signed-off-by: Dave Airlie <airlied@redhat.com>
* nvc0: make sure to delete samplers used by compute shadersSamuel Pitoiset2016-03-211-1/+1
| | | | | | Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu> Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
* i965/blorp: Make BlitFramebuffer() do sRGB encoding in ES 3.x.Kenneth Graunke2016-03-211-1/+4
| | | | | | | | | | | | | | | | | | According to the ES 3.0 and GL 4.4 specifications, glBlitFramebuffer is supposed to perform sRGB decoding and encoding whenever sRGB formats are in use. The ES 3.0 specification is completely clear, and has always stated this. However, the GL specification has changed behavior in 4.1, 4.2, and 4.4. The original behavior stated that no sRGB encoding should occur. The 4.4 behavior matches ES 3.0's wording. However, implementing the new behavior appears to break applications such as Left 4 Dead 2. This patch changes Meta to apply the ES 3.x rules in ES 3.x, but leaves OpenGL alone for now, to avoid breaking applications. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
* i965/blorp: Refactor sRGB encoding/decoding.Kenneth Graunke2016-03-214-11/+23
| | | | | | | | | | | | Because the rules for sRGB are so insane, we change brw_blorp_miptrees to take decode_srgb and encode_srgb flags, which control linearization of the source and destination separately. This should make it easy to implement whatever crazy combination of rules people throw at us. For now, it should be equivalent. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
* meta: Make BlitFramebuffer() do sRGB encoding in ES 3.x.Kenneth Graunke2016-03-213-9/+43
| | | | | | | | | | | | | | | | | | | | | | | | | | According to the ES 3.0 and GL 4.4 specifications, glBlitFramebuffer is supposed to perform sRGB decoding and encoding whenever sRGB formats are in use. The ES 3.0 specification is completely clear, and has always stated this. However, the GL specification has changed behavior in 4.1, 4.2, and 4.4. The original behavior stated that no sRGB encoding should occur. The 4.4 behavior matches ES 3.0's wording. However, implementing the new behavior appears to break applications such as Left 4 Dead 2. This patch changes Meta to apply the ES 3.x rules in ES 3.x, but leaves OpenGL alone for now, to avoid breaking applications. Meta implements several other functions in terms of BlitFramebuffer, and many of those explicitly do not perform sRGB encoding. So, this patch explicitly disables sRGB encoding in those other functions, preserving the existing (correct) behavior. If you're from the future and are reading this, hi! Welcome to the "fun" of debugging sRGB problems! Best of luck! Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
* radeonsi: Set PIPE_SHADER_CAP_MAX_SHADER_IMAGESEdward O'Callaghan2016-03-211-1/+2
| | | | | | | | | This enables ARB_shader_image_load_store and ARB_shader_image_size. Signed-off-by: Edward O'Callaghan <eocallaghan@alterapraxis.com> [allow the same number of images for all shader stages and require LLVM 3.9] Reviewed-by: Marek Olšák <marek.olsak@amd.com>
* radeonsi: disable early Z if the fragment shader writes to memoryNicolai Hähnle2016-03-211-2/+12
| | | | | | Empirically, both the EXEC_ON_* flags and LATE_Z are necessary. Reviewed-by: Marek Olšák <marek.olsak@amd.com>
* tgsi/scan: add writes_memory to flag presence of stores or atomicsNicolai Hähnle2016-03-212-4/+9
| | | | Reviewed-by: Marek Olšák <marek.olsak@amd.com>
* radeonsi: force the DCC enable bit off in image descriptors for writing (v2)Nicolai Hähnle2016-03-211-8/+49
| | | | | | | | This avoids a lockup at least on Tonga. v2: only force DCC off on VI+ (Marek) Reviewed-by: Marek Olšák <marek.olsak@amd.com>
* radeonsi: implement MemoryBarrier (v2)Nicolai Hähnle2016-03-211-0/+37
| | | | | | v2: invalidate both constant and VMEM/TC L1 for constant buffers (Marek) Reviewed-by: Marek Olšák <marek.olsak@amd.com>
* radeonsi: implement volatile memory accessNicolai Hähnle2016-03-211-0/+4
| | | | | | | | | | Prevent loads from being re-ordered or coalesced. Atomics don't need special handling by definition, and stores don't need special handling because LLVM is unable to detect dead image or buffer stores. Reviewed-by: Marek Olšák <marek.olsak@amd.com>
* radeonsi: implement coherent memory access (v2)Nicolai Hähnle2016-03-211-4/+13
| | | | | | v2: set glc=1 for volatile also on buffers Reviewed-by: Marek Olšák <marek.olsak@amd.com>
* radeonsi: Lower TGSI_OPCODE_MEMBAR down to LLVM opNicolai Hähnle2016-03-211-0/+31
| | | | Reviewed-by: Marek Olšák <marek.olsak@amd.com>
* radeonsi: Lower TGSI_OPCODE_ATOM* down to LLVM opNicolai Hähnle2016-03-211-8/+113
| | | | Reviewed-by: Marek Olšák <marek.olsak@amd.com>
* radeonsi: Lower TGSI_OPCODE_STORE down to LLVM opNicolai Hähnle2016-03-211-3/+80
| | | | Reviewed-by: Marek Olšák <marek.olsak@amd.com>
* radeonsi: Lower TGSI_OPCODE_LOAD down to LLVM op (v3)Nicolai Hähnle2016-03-211-0/+139
| | | | | | | v2: new signature style for buffer intrinsics (offsets) v3: new signature style for llvm.amdgcn.buffer.load.format (overloaded return) Reviewed-by: Marek Olšák <marek.olsak@amd.com> (v2)
* radeonsi: extract the LLVM type name construction into its own functionNicolai Hähnle2016-03-211-7/+19
| | | | Reviewed-by: Marek Olšák <marek.olsak@amd.com>
* radeonsi: Lower TGSI_OPCODE_RESQ down to LLVM opNicolai Hähnle2016-03-211-0/+129
| | | | Reviewed-by: Marek Olšák <marek.olsak@amd.com>
* radeonsi: extract TXQ buffer size computation into its own functionNicolai Hähnle2016-03-211-20/+35
| | | | | | This will allow it to be reused for RESQ. Reviewed-by: Marek Olšák <marek.olsak@amd.com>
* radeonsi: decompress shader imagesNicolai Hähnle2016-03-211-3/+33
| | | | Reviewed-by: Marek Olšák <marek.olsak@amd.com>
* radeonsi: update shader image descriptor for invalidated bufferNicolai Hähnle2016-03-211-1/+21
| | | | Reviewed-by: Marek Olšák <marek.olsak@amd.com>
* radeonsi: implement set_shader_images (v2)Nicolai Hähnle2016-03-216-29/+254
| | | | | | | | | Whether DCC is disabled depends on the access flags with which the image is bound: image_load supports DCC, but store and atomic don't. v2: remove an unnecessary masking of images->desc.enabled_mask Reviewed-by: Marek Olšák <marek.olsak@amd.com>
* gallium/radeon: make r600_texture_disable_dcc externally accessibleNicolai Hähnle2016-03-212-2/+4
| | | | | | We will need it in radeonsi for shader images. Reviewed-by: Marek Olšák <marek.olsak@amd.com>
* tgsi/scan: track which shader images are really buffersNicolai Hähnle2016-03-212-0/+7
| | | | | Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Marek Olšák <marek.olsak@amd.com>
* tgsi/scan: add images_writemaskNicolai Hähnle2016-03-212-2/+21
| | | | | Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Marek Olšák <marek.olsak@amd.com>
* st/mesa: translate additional flags in MemoryBarrierNicolai Hähnle2016-03-211-3/+18
| | | | | | | | Re-order flags in the order in which they appear in the OpenGL spec in the description of MemoryBarrier(). Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Marek Olšák <marek.olsak@amd.com>
* gallium: add additional PIPE_BARRIER_* bitsNicolai Hähnle2016-03-211-0/+7
| | | | | Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Marek Olšák <marek.olsak@amd.com>
* svga: add svga_winsys_context::pipe_debug_callback pointerBrian Paul2016-03-212-2/+9
| | | | | | | | The svga winsys modules can use this to send debug messages to the state tracker and Mesa. Reviewed-by: Charmaine Lee <charmainel@vmware.com> Reviewed-by: José Fonseca <jfonseca@vmware.com>
* svga: Fix the index buffer rebind regressionCharmaine Lee2016-03-211-7/+2
| | | | | | | | | | | | | The index buffer handle saved in the hw_state structure could be invalid after the index buffer is destroyed. Instead of rebinding the index buffer using the saved index buffer handle, we will reset the index buffer handle in the hw_state structure to force resending of the index buffer. Fixes bug 1593320 Reviewed-by: Brian Paul <brianp@vmware.com> Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
* svga: rebind stream output targetsCharmaine Lee2016-03-213-0/+27
| | | | | | | | | To ensure stream output target surfaces are available for the draw commands, we need to rebind the current stream output targets at the first draw in the command buffer. Reviewed-by: Brian Paul <brianp@vmware.com> Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
* svga: rebind index bufferCharmaine Lee2016-03-212-0/+9
| | | | | | | | | | | Similar to other resources, current index buffer needs to be rebound at the first draw of the current command buffer to make sure the buffer is available for the draw command. Fixes bug 1587263. Reviewed-by: Brian Paul <brianp@vmware.com> Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
* svga: minor formatting fix, comment additionBrian Paul2016-03-211-1/+4
| | | | | | To sync with our internal tree. Signed-off-by: Brian Paul <brianp@vmware.com>
* svga: optimize constant buffer uploadsCharmaine Lee2016-03-211-1/+11
| | | | | | | | | | | | | | | | | When a constant buffer slot is allocated in the upload buffer, the allocated slot size is always in multiple of 256. But the actual buffer size might not be in multiple of 256. This causes a gap between the ending offset of a slot and the starting offset of the next slot. The gap will prevent the two slots to be updated in a single update command. In order to maximize the chance of merging the contiguous dirty ranges, when a slot is to be allocated in the constant upload buffer, specify a buffer size in multiple of 256. There is about 10% performance improvement with Lightsmark2008 and 30% with Cinebench R11. Reviewed-by: Brian Paul <brianp@vmware.com> Reviewed-by: Thomas Hellstrom <thellstrom@vmware.com>
* svga: add a few more resource updates HUD queryCharmaine Lee2016-03-216-22/+91
| | | | | | | | | | | | This patch adds the following HUD queries: .num-resource-updates -- number of resource update. Commands include UPDATE_SUBRESOURCE, UPDATE_GB_IMAGE. .num-buffer-uploads -- number of buffer uploads. .num-const-buf-updates -- number of set constant buffer. .num-const-updates -- number of set shader constant. Reviewed-by: Brian Paul <brianp@vmware.com> Reviewed-by: Thomas Hellstrom <thellstrom@vmware.com>
* svga: add new num-readbacks HUD queryCharmaine Lee2016-03-215-7/+24
| | | | | | | To find out how many image readback command is issued. Reviewed-by: Brian Paul <brianp@vmware.com> Reviewed-by: Thomas Hellstrom <thellstrom@vmware.com>
* svga: use shader sampler view declarationsBrian Paul2016-03-216-47/+74
| | | | | | | | | | | | | | Previously, we looked at the bound textures (via the pipe_sampler_views) to determine texture dimensions (1D/2D/3D/etc) and datatype (float vs. int). But this could fail in out of memory conditions. If we failed to allocate a texture and didn't create a pipe_sampler_view, we'd default to using 0 (PIPE_BUFFER) as the texture type. This led to device errors because of inconsistent shader code. This change relies on all TGSI shaders having an SVIEW declaration for each SAMP declaration. The previous patch series does that. Reviewed-by: Charmaine Lee <charmainel@vmware.com>
* gallium/tests: declare sampler views in shadersBrian Paul2016-03-213-0/+3
| | | | | Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Charmaine Lee <charmainel@vmware.com>
* gallium/util: declare sampler view in util_make_fs_blit_msaa_depthstencil()Brian Paul2016-03-211-1/+2
| | | | | Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Charmaine Lee <charmainel@vmware.com>
* postprocess: declare sampler views in shadersBrian Paul2016-03-212-0/+9
| | | | | Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Charmaine Lee <charmainel@vmware.com>
* hud: add sampler view declaration in text fragment shaderBrian Paul2016-03-211-0/+1
| | | | | Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Charmaine Lee <charmainel@vmware.com>