summaryrefslogtreecommitdiffstats
path: root/src/mesa
Commit message (Collapse)AuthorAgeFilesLines
* main/get_hash_params: Add GL_SAMPLE_SHADING_ARBJason Ekstrand2014-07-291-0/+1
| | | | | | | | | | | GL_SAMPLE_SHADING is specified as a valid pname for glGet in the GL_ARB_sample_shading extension. It seems as if we forgot to add it to the table of pnames. Signed-off-by: Jason Ekstrand <[email protected]> Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Cc: [email protected]
* xmlconfig: Use program_invocation_short_name when building for cygwinYaakov Selkowitz2014-07-291-0/+2
| | | | | | | | | mesa/mesa/src/mesa/drivers/dri/common/xmlconfig.c:104:10: warning: #warning "Per application configuration won't work with your OS version." [-Wcpp] # warning "Per application configuration won't work with your OS version." Signed-off-by: Yaakov Selkowitz <[email protected]> Reviewed-by: Jon TURNEY <[email protected]> Reviewed-by: Brian Paul <[email protected]>
* glapi: add indexed blend functions (GL 4.0)Tapani Pälli2014-07-281-5/+5
| | | | | | | | | | | This makes some of the UE4 engine demos (Stylized, Mobile Temple) render correctly, tested on Intel Haswell machine. Signed-off-by: Tapani Pälli <[email protected]> Acked-by: Anuj Phogat <[email protected]> Reviewed-by: Brian Paul <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=78716
* gallium: rename shader cap MAX_CONSTS to MAX_CONST_BUFFER_SIZEMarek Olšák2014-07-281-2/+3
| | | | | | | | | | This new name isn't so confusing. I also changed the gallivm limit, because it looked wrong. Reviewed-by: Brian Paul <[email protected]> v2: use sizeof(float[4])
* main/cs: Add additional compute shader constant valuesJordan Justen2014-07-272-0/+18
| | | | | | | | With MESA_EXTENSION_OVERRIDE=GL_ARB_compute_shader, this fixes piglit: * arb_compute_shader-minmax Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Chris Forbes <[email protected]>
* Add an accelerated version of F_TO_I for x86_64Jason Ekstrand2014-07-241-1/+5
| | | | | | | | | | | | | According to a quick micro-benchmark, this new version is 20% faster on my Haswell laptop. v2: Removed the XXX note about x86_64 from the comment v3: Use an intrinsic instead of an __asm__ block. This should give us MSVC support for free. v4: Enable it for all x86_64 builds, not just with USE_X86_64_ASM Signed-off-by: Jason Ekstrand <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965/fs: Decide predicate/predicate_inverse outside of the for loop.Matt Turner2014-07-241-9/+14
| | | | Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Swap if/else conditions in SEL peephole.Matt Turner2014-07-241-3/+3
| | | | | | Will clarify make the next commit easier to read. Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Improve dead control flow elimination.Matt Turner2014-07-241-10/+15
| | | | | | | | ... to eliminate an ELSE instruction followed immediately by an ENDIF. instructions in affected programs: 704 -> 700 (-0.57%) Reviewed-by: Kenneth Graunke <[email protected]>
* mesa/st: add support for interpolate_at_* opsIlia Mirkin2014-07-241-3/+9
| | | | | Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Chris Forbes <[email protected]>
* mesa: add ARB_clear_texture.xml to file list, remove duplicate declsIlia Mirkin2014-07-241-12/+0
| | | | | Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Emil Velikov <[email protected]>
* i965: Accelerate uploads of RGBA and BGRA GL_UNSIGNED_INT_8_8_8_8_REV texturesJason Ekstrand2014-07-231-1/+5
| | | | | | | | | | Since intel is always going to be little-endian, GL_UNSIGNED_INT_8_8_8_8_REV is the same as GL_UNSIGNED_BYTE for RGBA and BGRA textures, so the same acceleration code will work. We might as well use it. Signed-off-by: Jason Ekstrand <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* mesa: Fix the name in the error messageIan Romanick2014-07-231-1/+1
| | | | | | | Obvious copy-and-paste bug. Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965/fs: Set LastRT on the final FB write on Broadwell.Kenneth Graunke2014-07-231-4/+2
| | | | | | | | | | | | | | | | | In Piglit's EXT_framebuffer_multisample/alpha-to-coverage-dual-src-blend test, key->nr_color_regions == 2, but the dual source blend FB write has ir->target set to 0. So we failed to set "Last Render Target Select" on any FB write message. We only emit one FB write per render target, so my comment about setting LastRT on every FB write directed at the last color region is a bit... misinformed. According to the documentation, depth buffer writes and scoreboard updates happen on the FB write with LastRT set, so I believe we want to set it only once. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]> Cc: "10.2" <[email protected]>
* i965: Port INTEL_DEBUG=optimizer to the vec4 backend.Kenneth Graunke2014-07-231-6/+36
| | | | | | | Largely via copy and paste. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965: Save the gl_shader_stage enum in backend_visitor.Kenneth Graunke2014-07-232-1/+4
| | | | | | | | | This will be useful for INTEL_DEBUG=optimizer in the vec4 backend, which needs to know whether it's currently processing a VS or GS. It isn't worth adding virtual methods for this case. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965: Don't print WE_normal in disassembly.Kenneth Graunke2014-07-231-1/+1
| | | | | | | | | Dropping this helps most lines fit in an 80 column terminal. The absence of WE_normal also helps call attention to WE_all, where something unusual is going on. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* meta: Add a meta implementation of GL_ARB_clear_textureNeil Roberts2014-07-234-0/+198
| | | | | | | | | | | | | | | | | | | | Adds an implementation of the ClearTexSubImage driver entry point that tries to set up an FBO to render to the texture and then calls glClearBuffer with a scissor to perform the actual clear. If an FBO can't be created for the texture then it will fall back to using _mesa_store_ClearTexSubImage. When used in combination with _mesa_store_ClearTexSubImage this should provide an implementation that works for all DRI-based drivers. However as this has only been tested with the i965 driver it is currently only enabled there. v2: Only enable the extension for the i965 driver instead of all DRI drivers. Remove an unnecessary goto. Don't require GL_ARB_framebuffer_object. Add some more comments. v3: Use glClearBuffer* to avoid having to modify glClearColor and friends. Handle sRGB textures. Explicitly disable dithering. Reviewed-by: Topi Pohjolainen <topi.pohjolainen at intel.com>
* meta: Add a state flag for the GL_DITHERNeil Roberts2014-07-232-0/+12
| | | | | | | | | The Meta implementation of glClearTexSubImage is going to want to ensure that dithering is disabled so that it can get a consistent color across the whole texture when clearing. This adds a state flag to easily save it and set it to the default value when performing meta operations. Reviewed-by: Topi Pohjolainen <[email protected]>
* texstore: Add a generic implementation of GL_ARB_clear_textureNeil Roberts2014-07-232-0/+79
| | | | | | | | | Adds an implmentation of the ClearTexSubImage driver entry point that just maps the texture and writes the values in. The extension is not yet enabled by default because it doesn't work with multisample textures as they don't have a simple linear layout. Reviewed-by: Jason Ekstrand <[email protected]>
* mesa/main: Add generic bits of ARB_clear_texture implementationNeil Roberts2014-07-233-1/+271
| | | | | | | | | This adds the driver entry point for glClearTexSubImage and fills in the _mesa_ClearTexImage and _mesa_ClearTexSubImage functions that call it. v2: Don't clear some of the images if only one of them makes an error Reviewed-by: Jason Ekstrand <[email protected]>
* teximage: Add utility func for format/internalFormat compatibility checkNeil Roberts2014-07-231-21/+38
| | | | | | | In texture_error_check() there was a snippet of code to check whether the given format and internal format are basically compatible. This has been split out into its own static helper function so that it can be used by an implementation of glClearTexImage too.
* mesa/main: add ARB_clear_texture entrypointsIlia Mirkin2014-07-235-0/+32
| | | | | Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Neil Roberts <[email protected]>
* mesa: Don't use memcpy() in _mesa_texstore() for float depth texture dataAnuj Phogat2014-07-211-0/+15
| | | | | | | | | | | | | | | | because float depth texture data needs clamping to [0.0, 1.0]. Let the _mesa_texstore() fallback to slower path. Fixes Khronos GLES3 CTS tests: shadow_execution_vert shadow_execution_frag V2: Move the check to _mesa_texstore_can_use_memcpy() function. Add check for floating point data types. Cc: <[email protected]> Signed-off-by: Anuj Phogat <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* i965/fs: Fix gl_SampleMask handling for SIMD16 on Gen8+.Kenneth Graunke2014-07-211-5/+0
| | | | | | | | | | | | We actually want to use mov(16), not mov(8). Fixes 7 Piglit tests: ARB_sample_shading/builtin-gl-sample-mask [2468] and ARB_sample_shading/builtin-gl-sample-mask-simple [468]. Signed-off-by: Kenneth Graunke <[email protected]> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=80991 Reviewed-by: Matt Turner <[email protected]> Cc: "10.2" <[email protected]>
* i965/fs: Fix gl_SampleID for 2x MSAA and SIMD16 mode.Kenneth Graunke2014-07-213-1/+11
| | | | | | | | | | | | | | | | | | | | | | | | | We might be able to do this without an extra program key field, but this is non-invasive and fixes the bug, for now. This fixes the following Piglit tests on Broadwell: - ARB_sample_shading/builtin-gl-sample-id 2 - ARB_sample_shading/builtin-gl-sample-position 2 - EXT_framebuffer_multisample/multisample-blit 2 color - EXT_framebuffer_multisample/multisample-blit 2 color linear - EXT_framebuffer_multisample/multisample-blit 2 depth - EXT_framebuffer_multisample/no-color 2 depth combined - EXT_framebuffer_multisample/no-color 2 depth separate - EXT_framebuffer_multisample/no-color 2 depth single - EXT_framebuffer_multisample/no-color 2 depth-computed combined - EXT_framebuffer_multisample/no-color 2 depth-computed separate - EXT_framebuffer_multisample/no-color 2 depth-computed single - EXT_framebuffer_multisample/unaligned-blit 2 color msaa - EXT_framebuffer_multisample/unaligned-blit 2 depth msaa Signed-off-by: Kenneth Graunke <[email protected]> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=80991 Reviewed-by: Matt Turner <[email protected]> Cc: "10.2" <[email protected]>
* i965: Add missing persample_shading field to brw_wm_debug_recompile.Kenneth Graunke2014-07-211-0/+2
| | | | | | | | Otherwise, the performance warning for shader recompiles will just say "something else". Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Jordan Justen <[email protected]>
* i965/disasm: Don't disassemble the URB complete field on Broadwell.Kenneth Graunke2014-07-211-2/+4
| | | | | | | | It doesn't exist, so attempting to read it will trigger generation assertions in the brw_inst API. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965: Disable hex offset printing in disassembly.Kenneth Graunke2014-07-211-1/+2
| | | | | | | | | | | | | | | Printing the hex offsets makes it basically impossible to diff assembly: if you add even a single instruction, the entire shader shows up as a difference. So, every time I want to compare assembly, I have to strip this out. The hex offsets might be useful when debugging compaction, or when inspecting the program cache buffer. Since it's occasionally useful, but uncommon, this patch disables it by default, but makes it easy to re-enable it temporarily when the need arises. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965/vec4: Use foreach_inst_in_block a couple more places.Matt Turner2014-07-212-8/+2
| | | | Reviewed-by: Topi Pohjolainen <[email protected]>
* i965: Replace cfg instances with calls to calculate_cfg().Matt Turner2014-07-215-22/+22
| | | | | | | | | | | Avoids regenerating it unnecessarily. Every program in shader-db improved, none by an amount less than a 1/3 reduction. One Dota2 shader decreased from 62 -> 24. cfg calculations: 429492 -> 193197 (-55.02%) Reviewed-by: Topi Pohjolainen <[email protected]>
* i965/cfg: Add a foreach_block_and_inst macro.Matt Turner2014-07-211-0/+4
| | | | | | Will let us abstract how the instructions are stored. Reviewed-by: Topi Pohjolainen <[email protected]>
* i965: Add cfg to backend_visitor.Matt Turner2014-07-219-33/+48
| | | | Reviewed-by: Topi Pohjolainen <[email protected]>
* i965: Silence unused parameter warningIan Romanick2014-07-191-1/+1
| | | | | | | brw_fs_visitor.cpp:2400:1: warning: unused parameter 'ir' [-Wunused-parameter] Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965: Silence 'comparison is always true' warningIan Romanick2014-07-191-2/+0
| | | | | | | | | | | | | The parameter is an int16_t, and we're check that it's value will fit in 16-bits. Yes, the value that is stored in 16-bits will surely fit in 16-bits. brw_inst.h: In function 'brw_inst_set_gen6_jump_count': brw_inst.h:321:66: warning: comparison is always true due to limited range of data type [-Wtype-limits] brw_inst.h:321:66: warning: comparison is always true due to limited range of data type [-Wtype-limits] Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965: Silence many unused parameter warningsIan Romanick2014-07-191-0/+10
| | | | | | | | brw_inst.h: In function 'brw_inst_set_src1_vstride': brw_inst.h:118:76: warning: unused parameter 'brw' [-Wunused-parameter] Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* main/format_pack: Fix a wrong datatype in pack_ubyte_R8G8_UNORMJason Ekstrand2014-07-181-1/+1
| | | | | | | | Before it was only storing one of the color components due to truncation. With this patch it now properly stores all of them. Reviewed-by: Brian Paul <[email protected]> Cc: "10.2" <[email protected]>
* Add support for RGBA8 and RGBX8 textures in intel_texsubimage_tiled_memcpyJason Ekstrand2014-07-171-0/+11
| | | | | | Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Chad Versace <[email protected]>
* i965: Improve debug output in intelTexImage and intelTexSubimageJason Ekstrand2014-07-172-1/+9
| | | | | Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Chad Versace <[email protected]>
* st/mesa,gallium: add a workaround for Unigine Heaven 4.0 and Valley 1.0Marek Olšák2014-07-183-3/+25
| | | | | | | Most (all?) Unigine shaders fail to compile without this if sample shading is advertised. This is, of course, Unigine developers' fault. Reviewed-by: Brian Paul <[email protected]>
* glsl: add a mechanism to allow #extension directives in the middle of shadersMarek Olšák2014-07-181-0/+5
| | | | | | | | | | | This is needed to make Unigine Heaven 4.0 and Unigine Valley 1.0 work with sample shading. Also, if this is disabled, the error message at least makes sense now. Reviewed-by: Brian Paul <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
* i965: Fix z_offset computation in intel_miptree_unmap_depthstencil()Anuj Phogat2014-07-171-2/+2
| | | | | | | | | | | | | | | | The bug is triggered by using glTexSubImage2d() with GL_DEPTH_STENCIL as base internal format and non-zero x, y offsets. Currently x, y offsets are ignored while updating the texture image. Fixes Khronos GLES3 CTS tests: npot_tex_sub_image_2d npot_tex_sub_image_3d npot_pbo_tex_sub_image_2d npot_pbo_tex_sub_image_2d Cc: <[email protected]> Signed-off-by: Anuj Phogat <[email protected]> Reviewed-by: Jordan Justen <[email protected]>
* Revert "i965: Extend compute-to-mrf pass to understand blocks of MOVs"Anuj Phogat2014-07-171-53/+10
| | | | | | | | | | | | | | This reverts commit bbefb15e01e1c16af69646898918982ae00f8c92. Fixes the 11 regressions caused in framebuffer_blit tests in Khronos GLES3 CTS tests: Original patch reduced the instruction count but had no performance benefits. So, it's safe to revert it without causing any performance regressions. Signed-off-by: Anuj Phogat <[email protected]> Acked-by: Kristian Høgsberg <[email protected]> Reviewed-by: Jordan Justen <[email protected]>
* i915: Fix up intelInitScreen2 for DRI3Adel Gadllah2014-07-171-1/+2
| | | | | | | | | | | | | | | | | Commit 442442026eb updated both i915 and i965 for DRI3 support, but one check in intelInitScreen2 was missed for i915 causing crashes when trying to use i915 with DRI3. So fix that up. Reported-by: Igor Gnatenko <[email protected]> References: https://bugzilla.redhat.com/show_bug.cgi?id=1115323 References: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=754297 Tested-by: František Zatloukal <[email protected]> Tested-by: Dirk Griesbach <[email protected]> Signed-off-by: Adel Gadllah <[email protected]> Acked-by: Kenneth Graunke <[email protected]> Cc: "10.2" <[email protected]>
* mesa: Fix regression introduced by commit "mesa: fix packing of float texels ↵Pavel Popov2014-07-181-8/+8
| | | | | | | | | | | | | | | to GL_SHORT/GL_BYTE". This commit "mesa: fix packing of float texels to GL_SHORT/GL_BYTE" replaced *_TO_BYTE to *_TO_BYTE_TEX because *_TO_FLOAT_TEX are used to unpack the texels to floats. In this case *_TO_FLOATZ in function extract_float_rgba also should be replaced to *_TO_FLOAT_TEX. Underline that these macros automatically preserve zero when converting. The regression was observed on 3 oglconform tests: snorm-textures basic.getTexImage snorm-textures advanced.mipmap.manual.getTex snorm-textures advanced.mipmap.upload.getTex Signed-off-by: Pavel Popov <[email protected]> Reviewed-by: Chris Forbes <[email protected]>
* Revert "i965: Implement GL_PRIMITIVES_GENERATED with non-zero streams."Kenneth Graunke2014-07-162-26/+7
| | | | | | | | | | | | | | | | This reverts commit 3178d2474ae5bdd1102fb3d76a60d1d63c961ff5. This caused GPU hangs on Ivybridge for some users and huge (80%) performance regressions across the board on multiple platforms. We need to find a better solution. I've made several attempts, but none of them have worked yet. In the meantime, we should revert this. Reverting it breaks GL_PRIMITIVES_GENERATED for non-zero streams, but that's okay, since we don't expose GL_ARB_gpu_shader5 yet. Fixes Piglit's EXT_transform_feedback/generatemipmap prims_generated test case on Haswell.
* i965: Don't copy propagate abs into Broadwell logic instructions.Kenneth Graunke2014-07-152-12/+6
| | | | | | | | | | | | It's not clear what abs on logical instructions means on Broadwell, and it doesn't appear to do anything sensible. Fixes 270 Piglit tests (the bitand/bitor/bitxor tests with abs). Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=81157 Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]> Cc: "10.2" <[email protected]>
* i965/fs: Use WE_all for gl_SampleID header register munging.Kenneth Graunke2014-07-151-5/+9
| | | | | | | | | | | | This code should execute without regard to the currently executing channels. Asking for gl_SampleID inside control flow might break in strange ways. It appears to break even at the top of the program in SIMD16 mode occasionally as well. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Chris Forbes <[email protected]> Cc: [email protected]
* i965/fs: Set force_uncompressed and force_sechalf on samplepos setup.Kenneth Graunke2014-07-151-6/+8
| | | | | | | | | | | | | | gen8_fs_generator uses these to decide whether to set the execution size to 8 or 16, so we incorrectly made both of these MOVs the full width in SIMD16 shaders. (It happened to work out on Gen4-7.) Setting them should also help inform optimization passes what's really going on, which could help avoid bugs. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Chris Forbes <[email protected]> Cc: [email protected]
* i965: Set execution size to 8 for instructions with force_sechalf set.Kenneth Graunke2014-07-151-1/+1
| | | | | | | | | | | | | | | | | Both inst->force_uncompressed and inst->force_sechalf mean that the generated instruction should be uncompressed and have an execution size of 8. We don't require the visitor to set both flags - setting inst->force_sechalf by itself is supposed to be enough. On Gen4-7, guess_execution_size() demoted instructions to 8-wide based on the default compression state. On Gen8+, we instead set a default execution size, which worked great...except that we forgot to check inst->force_sechalf when deciding whether to use 8 or 16. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Chris Forbes <[email protected]> Cc: [email protected]