path: root/src/mesa
Commit message | Author | Age | Files | Lines
* i965: Use 8x4 aligned rectangles for HiZ operations on Broadwell.Kenneth Graunke2014-06-161-4/+16
| | | | | | | | | | | | | | | | | Like on Haswell, we need to use 8x4 aligned rectangle primitives for hierarchical depth buffer resolves and depth clears. See the comments in brw_blorp.cpp's brw_hiz_op_params() constructor. (The Broadwell documentation confirms that this is still necessary.) This patch makes the Broadwell code follow the same behavior as Chad and Jordan's Gen7 BLORP code. Based on a patch by Topi Pohjolainen. This fixes es3conform's framebuffer_blit_functionality_scissor_blit test, with no Piglit regressions. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Jordan Justen <[email protected]> Cc: "10.2" <[email protected]>
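The 8x4 alignment described above amounts to rounding the rectangle width up to a multiple of 8 and the height up to a multiple of 4. A minimal sketch in C, assuming hypothetical names (`align_up`, `rect_size`, `hiz_rect_size`) that are illustrative only and not taken from the driver:

```c
#include <stdint.h>

/* Round value up to the next multiple of alignment.
 * Assumes alignment is a power of two. */
static uint32_t align_up(uint32_t value, uint32_t alignment)
{
    return (value + alignment - 1) & ~(alignment - 1);
}

/* Hypothetical container for the rectangle dimensions. */
struct rect_size {
    uint32_t width;
    uint32_t height;
};

/* Hypothetical helper: expand a width x height area to an
 * 8x4 aligned rectangle. */
static struct rect_size hiz_rect_size(uint32_t width, uint32_t height)
{
    struct rect_size r = { align_up(width, 8), align_up(height, 4) };
    return r;
}
```

For example, a 13x5 area would be expanded to 16x8, while an 8x4 area is already aligned and stays unchanged.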
* i965: Make INTEL_DEBUG=mip print out whether HiZ is enabled.Kenneth Graunke2014-06-161-0/+2
| | | | | | | | | We only enable HiZ for miplevels which are aligned on 8x4 blocks. When debugging HiZ failures, it's useful to know whether a particular miplevel is using HiZ or not. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Jordan Justen <[email protected]>
* main/extensions: Only parse MESA_EXTENSION_OVERRIDE onceJordan Justen2014-06-161-74/+40
| | | | | | | | | Previously, we would parse MESA_EXTENSION_OVERRIDE each time a context was created. Now we will save the results of that parsing and use it during context initialization. Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Anuj Phogat <[email protected]>
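The parse-once idea can be sketched as a cached lookup. This is only an illustration: `get_extension_override` and `parse_count` are hypothetical names, the sketch is not thread-safe, and it does not reproduce Mesa's actual parsing code.

```c
#include <stdbool.h>
#include <stdlib.h>

static bool override_parsed = false;        /* has the variable been read yet? */
static const char *override_value = NULL;   /* cached result of getenv() */
static int parse_count = 0;                 /* illustration only: counts real reads */

/* Hypothetical accessor: reads MESA_EXTENSION_OVERRIDE on the first call
 * and returns the cached pointer on every later call. */
static const char *get_extension_override(void)
{
    if (!override_parsed) {
        override_value = getenv("MESA_EXTENSION_OVERRIDE");
        override_parsed = true;
        parse_count++;
    }
    return override_value;
}
```

Every context created after the first would then reuse the saved result instead of re-reading the environment. Note that the cached pointer is only safe as long as the environment is not modified afterwards.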
* main/extensions: Build list of extensions that can't be disabledJordan Justen2014-06-161-5/+20
| | | | | | | | This will allow us to utilize the early MESA_EXTENSION_OVERRIDE parsing at the later extension string initialization step. Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Anuj Phogat <[email protected]>
* main/extensions: Create extra extensions override stringJordan Justen2014-06-161-0/+38
| | | | | | | | This will allow us to utilize the early MESA_EXTENSION_OVERRIDE parsing at the later extension string initialization step. Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Anuj Phogat <[email protected]>
* i965/cs: Use override structure rather than separate env varJordan Justen2014-06-162-4/+2
| | | | | | | | | | | | In 25268b93, we added a new environment variable (INTEL_COMPUTE_SHADER) to allow some constant values to be upgraded for the ARB_compute_shader extension. Now, we can look to see if the extension was enabled via the MESA_EXTENSION_OVERRIDE environment variable. Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Anuj Phogat <[email protected]>
* main/extensions: Add early extension override structuresJordan Justen2014-06-163-0/+59
During the early one_time_init phase of context creation, we initialize two global gl_extensions structures. We read the MESA_EXTENSION_OVERRIDE environment variable, and store positive and negative overrides in two structures:

 * struct gl_extensions _mesa_extension_override_enables
 * struct gl_extensions _mesa_extension_override_disables

These are filled before the driver initializes extensions and constants, therefore the driver can make adjustments based on the desired overrides. This can be useful during development of a new extension where the extension is only partially ready.

The driver can't actually advertise support for the extension, but if it sees that the override is set for the extension, then it can expose more supported parts of the extension, such as upgrading context constants.

Signed-off-by: Jordan Justen <[email protected]>
Reviewed-by: Anuj Phogat <[email protected]>
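A rough sketch of how the two override structures might be filled. Everything here is simplified and hypothetical: `ext_flags` stands in for `struct gl_extensions` with only two flags, `find_flag` and `parse_overrides` are invented names, and the override string is assumed to be a space-separated list of names with an optional `+` or `-` prefix.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Simplified stand-in for struct gl_extensions (assumption: two flags only). */
struct ext_flags {
    bool ARB_compute_shader;
    bool ARB_explicit_uniform_location;
};

static struct ext_flags override_enables;
static struct ext_flags override_disables;

/* Return a pointer to the flag matching name, or NULL if the name is unknown. */
static bool *find_flag(struct ext_flags *flags, const char *name)
{
    if (strcmp(name, "GL_ARB_compute_shader") == 0)
        return &flags->ARB_compute_shader;
    if (strcmp(name, "GL_ARB_explicit_uniform_location") == 0)
        return &flags->ARB_explicit_uniform_location;
    return NULL;
}

/* Hypothetical parser: record positive overrides in override_enables and
 * negative overrides in override_disables. Unknown names are ignored. */
static void parse_overrides(const char *env)
{
    char buf[256];

    memset(&override_enables, 0, sizeof override_enables);
    memset(&override_disables, 0, sizeof override_disables);
    if (env == NULL)
        return;

    /* Work on a copy, because strtok() modifies its input. */
    snprintf(buf, sizeof buf, "%s", env);

    for (char *tok = strtok(buf, " "); tok != NULL; tok = strtok(NULL, " ")) {
        bool enable = true;

        if (tok[0] == '+') {
            tok++;
        } else if (tok[0] == '-') {
            enable = false;
            tok++;
        }

        bool *flag = find_flag(enable ? &override_enables : &override_disables, tok);
        if (flag != NULL)
            *flag = true;
    }
}
```

A driver could then check something like `override_enables.ARB_compute_shader` during its own initialization and raise the relevant context constants, without advertising the extension itself.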
* main/extensions: Create a context-less set_extensions functionJordan Justen2014-06-161-5/+20
| | | | | | | | | | | | We will add new gl_extensions structures that capture the environment variable extension overrides and are available early in context creation. This will allow a driver to take actions during its initialization based on the extension overrides. Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Anuj Phogat <[email protected]>
* main/extensions: Don't advertise unknown extensions overrides with (-)Jordan Justen2014-06-161-1/+1
Previously setting:

    MESA_EXTENSION_OVERRIDE=-GL_MESA_ham_sandwich

would cause Mesa to advertise support for the GL_MESA_ham_sandwich extension, even though the override specifically asked for it to be disabled.

Signed-off-by: Jordan Justen <[email protected]>
Reviewed-by: Anuj Phogat <[email protected]>
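The intended behavior can be sketched as follows: an unrecognized name is only added to the advertised extras when it is not prefixed with `-`. The helper name `append_unknown` and the buffer handling are assumptions made for illustration, not Mesa's actual code.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: append an unrecognized override token to the
 * space-separated "extra" string, unless the token asks for the
 * extension to be disabled. */
static void append_unknown(char *extra, size_t size, const char *token)
{
    if (token[0] == '-')
        return;                 /* disabled: never advertise it */
    if (token[0] == '+')
        token++;                /* skip the optional enable prefix */

    /* Separate entries with a single space. */
    if (extra[0] != '\0')
        strncat(extra, " ", size - strlen(extra) - 1);
    strncat(extra, token, size - strlen(extra) - 1);
}
```

With this check, `-GL_MESA_ham_sandwich` leaves the extra string untouched, while an unprefixed or `+`-prefixed unknown name is still appended.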
* Enable GL_ARB_explicit_uniform_location in the drivers.Tapani Pälli2014-06-163-0/+3
| | | | | | | v2: enable also for i915 (Ian) Signed-off-by: Tapani Pälli <[email protected]> Reviewed-by: Petri Latvala <[email protected]>
* mesa: support inactive uniforms in glUniform* functionsTapani Pälli2014-06-161-0/+15
| | | | | | | | | | Support inactive uniforms that have explicit location set in glUniform* functions. v2: remove unnecessary extension check, use new define (Ian) Signed-off-by: Tapani Pälli <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
* mesa: add new enum MAX_UNIFORM_LOCATIONSTapani Pälli2014-06-165-0/+12
| | | | | | | | | | | | | Patch adds new implementation dependent value required by the GL_ARB_explicit_uniform_location extension. Default value for user assignable locations is calculated as sum of MaxUniformComponents for each stage. v2: fix descriptor in get_hash_params.py (Petri) v3: simpler formula for calculating initial value (Ian) Signed-off-by: Tapani Pälli <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
* mesa: add enable bit for ARB_explicit_uniform_locationTapani Pälli2014-06-162-0/+2
| | | | | Signed-off-by: Tapani Pälli <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
* i965/vec4: Use the sampler for pull constant loads on Broadwell.Kenneth Graunke2014-06-151-8/+8
| | | | | | | | | | | | | | | | | | | | | We've used the LD sampler message for pull constant loads on earlier hardware for some time, and also were already using it for the FS on Broadwell. This patch makes us use it for Broadwell VS/GS as well. I believe that when I wrote this code in 2012, we still used the data port in some cases, and I somehow neglected to convert it while rebasing. Improves performance in GLBenchmark 2.7 Egypt by 416.978% +/- 2.25821% (n = 17). Many other applications should benefit similarly: this speeds up uniform array access in the VS, which is commonly used for skinning shaders, among other things. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Jordan Justen <[email protected]> Tested-by: Ben Widawsky <[email protected]> Cc: "10.2" <[email protected]>
* i965: Add missing newlines to a few perf_debug messages.Kenneth Graunke2014-06-151-2/+2
| | | | | | Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]> Cc: "10.2" <[email protected]>
* i965: Drop Broadwell perf_debugs about missing MOCS that aren't missing.Kenneth Graunke2014-06-152-4/+0
| | | | | | | | | I actually added MOCS support for these things, but forgot to delete the corresponding perf_debug() warnings. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]> Cc: "10.2" <[email protected]>
* i965: Add missing MOCS setup for 3DSTATE_INDEX_BUFFER on Broadwell.Kenneth Graunke2014-06-151-3/+1
| | | | | | | | Somehow I missed this when adding all of the other MOCS values. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]> Cc: "10.2" <[email protected]>
* i965/vec4: Fix dead code elimination for VGRFs of size > 1.Kenneth Graunke2014-06-151-1/+2
When faced with code such as:

    mov vgrf31.0:UD, 960D
    mov vgrf31.1:UD, vgrf30.xxxx:UD

The dead code eliminator didn't consider reg_offsets, so it decided that the second instruction was writing to the same register as the first one, and eliminated the first one. But they're actually different registers.

This fixes INTEL_DEBUG=shader_time for vertex shaders. In the above code, vgrf31.0 represents the offset into the shader_time buffer where the data should be written, and vgrf31.1 represents the actual time data. With a completely undefined offset, results were...unexpected.

I think this is probably one of the few cases (maybe only case) where we generate multiple MOVs to a large VGRF. Normally, we just use them as texturing results; the other SEND-from-GRF uses a size 1 VGRF.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=79029
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
Cc: [email protected]
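The essence of the fix is that two writes only target the same destination when both the register number and the offset within it match. A minimal sketch with a hypothetical, heavily simplified `dst_reg` type (the real driver structures carry far more state):

```c
#include <stdbool.h>

/* Simplified destination register: VGRF number plus the offset inside it.
 * Hypothetical stand-in, not the driver's dst_reg class. */
struct dst_reg {
    int reg;
    int reg_offset;
};

/* Two writes hit the same destination only if both the VGRF number and
 * the reg_offset agree. Comparing reg alone treats vgrf31.0 and
 * vgrf31.1 as identical, which is the bug described above. */
static bool same_destination(struct dst_reg a, struct dst_reg b)
{
    return a.reg == b.reg && a.reg_offset == b.reg_offset;
}
```

Under this check, the write to `vgrf31.1` no longer makes the earlier write to `vgrf31.0` look dead.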
* i965: Add SHADER_OPCODE_SHADER_TIME_ADD to dump_instructions() decode.Kenneth Graunke2014-06-151-0/+2
| | | | | | | "shader_time_add" is a lot more informative than "op152". Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* mesa/drivers: Fix clang constant-logical-operand warnings.Vinson Lee2014-06-144-13/+13
This patch fixes several clang constant-logical-operand warnings such as the following.

../../../../../src/mesa/tnl_dd/t_dd_tritmp.h:130:32: warning: use of logical '||' with constant operand [-Wconstant-logical-operand]
   if (DO_TWOSIDE || DO_OFFSET || DO_UNFILLED || DO_TWOSTENCIL)
                               ^  ~~~~~~~~~~~
../../../../../src/mesa/tnl_dd/t_dd_tritmp.h:130:32: note: use '|' for a bitwise operation
   if (DO_TWOSIDE || DO_OFFSET || DO_UNFILLED || DO_TWOSTENCIL)
                               ^~
                               |

Signed-off-by: Vinson Lee <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
* meta_blit: properly compute texture width for the CopyTexSubImage fallbackJason Ekstrand2014-06-131-1/+1
| | | | | | | Cc: "10.2" <[email protected]> Reviewed-by: Brian Paul <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* Remove _mesa_is_type_integer and _mesa_is_enum_format_or_type_integerNeil Roberts2014-06-132-36/+0
The comment for _mesa_is_type_integer is confusing because it says that it returns whether the type is an “integer (non-normalized)” format. I don't think it makes sense to say whether a type is normalized or not because it depends on what format it is used with. For example, GL_RGBA+GL_UNSIGNED_BYTE is normalized but GL_RGBA_INTEGER+GL_UNSIGNED_BYTE isn't. If the normalized comment is just a mistake then it still doesn't make much sense because it is missing the packed-pixel types such as GL_UNSIGNED_INT_5_6_5. If those were added then it effectively just returns type != GL_FLOAT.

That function was only used in _mesa_is_enum_format_or_type_integer. This function effectively checks whether the format is non-normalized or the type is an integer. I can't think of any situation where that check would make sense.

As far as I can tell neither of these functions has ever been used anywhere, so we should just remove them to avoid confusion.

These functions were added in 9ad8f431b2a47060bf05517246ab0fa8d249c800.

Reviewed-by: Brian Paul <[email protected]>
* i965: Set the fast clear color value for texture surfacesNeil Roberts2014-06-122-2/+6
| | | | | | | | | | | | | | When a multisampled texture is used for sampling the fast clear color value needs to be programmed into the surface state. This was being left as all zeroes so if the surface was cleared to a value other than black then it wouldn't work properly. This doesn't matter for single-sample textures because in that case the MCS buffer is resolved before it is used as a texture source. https://bugs.freedesktop.org/show_bug.cgi?id=79729 Reviewed-by: Chris Forbes <[email protected]> Reviewed-by: Ian Romanick <[email protected]> Cc: "10.1 10.2" <[email protected]>
* i965: Fix disassembly of BLORP clear programs.Kenneth Graunke2014-06-121-1/+1
| | | | | | | Too many levels of indirection. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965/fs: Move FB write default state mashing in a level.Kenneth Graunke2014-06-121-7/+7
We only need to alter the default state if we're emitting MOVs for header related fields. So, we can simply move the push/pop of state into the if (header_present) block, bypassing it in the common case.

Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=79903
* i965: Fix Haswell discard regressions since Gen4-5 line AA fix.Kenneth Graunke2014-06-121-2/+7
| | | | | | | | | | | | | | | | In commit dc2d3a7f5c217a7cee92380fbf503924a9591bea, Iago accidentally moved fire_fb_write() above the brw_pop_insn_state(), which caused the SEND to lose its predication and change from WE_normal to WE_all. Haswell uses predicated SENDs for discards, so this broke Piglit's tests for discards. We want the Gen4-5 MOV to be uncompressed, unpredicated, and unmasked, but the actual FB write itself should respect those. So, pop state first, and force it again around the single MOV. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=79903
* i965: Use brw->gen in some generation checks.Matt Turner2014-06-115-11/+17
| | | | | | | Will simplify the automated conversion if we want to allow compiling the driver for a single generation. Reviewed-by: Kristian Høgsberg <[email protected]>
* i965/fs: Clean up tabs in brw_fs_cse.cpp.Matt Turner2014-06-111-43/+43
| | | | I'm adding vec4 CSE, and I want to diff the files.
* meta: save and restore swizzle for _GenerateMipmapRobert Bragg2014-06-111-0/+12
This makes sure to use a no-op swizzle while iteratively rendering each level of a mipmap; otherwise we may lose components and effectively apply the swizzle twice by the time these levels are sampled.

Signed-off-by: Robert Bragg <[email protected]>
Reviewed-by: Chris Forbes <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
* i965/vec4: Emit smarter code for b2f of a comparisonIan Romanick2014-06-112-0/+48
Previously we would emit the comparison, emit an AND to mask off extra bits from the comparison result, then convert the result to float. Now, do the comparison, then use a cleverly constructed SEL to pick either 0.0f or 1.0f.

No piglit regressions on Ivybridge.

total instructions in shared programs: 1642311 -> 1639449 (-0.17%)
instructions in affected programs:     136533 -> 133671 (-2.10%)
GAINED: 0
LOST:   0

Programs that are affected appear to save between 1 and 5 instructions (just by skimming the output from shader-db report.py).

v2: s/b2i/b2f/ in commit subject (noticed by Chris Forbes). Remove extraneous fix_3src_operand (suggested by Matt). The latter change required swapping the order of the operands and using predicate_inverse.

Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
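The difference between the two code sequences can be illustrated in plain C. This is only an analogy for the generated GPU instructions: the function names are hypothetical, and the assumption that the low bit of the comparison result carries the boolean is made purely for the sketch.

```c
#include <stdint.h>

/* Old sequence, roughly: take the raw comparison result, AND it down to
 * 0 or 1, then convert that integer to float. */
static float b2f_and_convert(uint32_t cmp_result)
{
    return (float)(cmp_result & 1u);
}

/* New sequence, roughly: do the comparison and select directly between
 * the two float constants, with no separate mask or conversion step. */
static float b2f_select(int a, int b)
{
    return (a < b) ? 1.0f : 0.0f;
}
```

Both produce 0.0f or 1.0f; the select form simply needs fewer steps, which is consistent with the instruction savings reported above.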
* i965/vec4: Silence a couple unused parameter warningsIan Romanick2014-06-111-2/+2
| | | | | | | | brw_vec4_visitor.cpp:2717:1: warning: unused parameter 'ir' [-Wunused-parameter] brw_vec4_visitor.cpp:2723:1: warning: unused parameter 'ir' [-Wunused-parameter] Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* glsl: Store gl_uniform_driver_storage::format as the actual typeIan Romanick2014-06-111-1/+1
| | | | | | | | And delete the incorrect comment. Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Tapani Pälli <[email protected]>
* i965: Add GPU BLIT of texture image to PBO in Intel driverJon Ashburn2014-06-101-0/+110
Add Intel driver hook for glGetTexImage to accelerate the case of reading texture image into a PBO. This case gets huge performance gains by using GPU BLIT directly to PBO rather than GPU BLIT to temporary texture followed by memcpy.

No regressions on Piglit tests with Intel driver.

Performance gain (1280 x 800 FBO, Ivybridge):
glGetTexImage + glMapBufferRange with patch    1.45 msec
glGetTexImage + glMapBufferRange without patch 4.68 msec

v3: (by Kenneth Graunke)
 - Fix compile after Eric's change to drop the tiling argument to intel_miptree_create_for_bo.
 - Add GL_TEXTURE_3D to blacklisted texture targets to prevent Piglit regressions.
 - Squash in several whitespace and coding style fixes.
* i965: Invalidate live intervals when inserting Gen4 SEND workarounds.Kenneth Graunke2014-06-101-0/+6
| | | | | | | | | We need to invalidate the live intervals when inserting new instructions. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]> Cc: [email protected]
* i965: Don't use the head sentinel as an fs_inst in Gen4 workaround code.Kenneth Graunke2014-06-101-1/+1
| | | | | | | | | | | When walking backwards, we want to stop at the head sentinel, which is where scan_inst->prev->prev == NULL, not scan_inst->prev == NULL. Fixes random crashes, as well as valgrind errors. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]> Cc: [email protected]
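The sentinel issue can be sketched with a simplified doubly linked list in which the head sentinel is the one node whose `prev` is NULL. The `node` type and `count_before` function are hypothetical and much simpler than Mesa's exec_list; the point is only that the backward walk has to stop before treating the sentinel as a real element.

```c
#include <stddef.h>

/* Simplified list node. Assumption for this sketch: the head sentinel is
 * the only node with prev == NULL, and it carries no payload. */
struct node {
    struct node *prev;
    struct node *next;
    int value;
};

/* Count the real nodes that precede start, walking backwards.
 * start must be a real node, so start->prev is never NULL.
 * The loop stops when scan is the head sentinel (scan->prev == NULL),
 * so the sentinel itself is never treated as an instruction. */
static int count_before(const struct node *start)
{
    int count = 0;

    for (const struct node *scan = start->prev; scan->prev != NULL; scan = scan->prev)
        count++;

    return count;
}
```

Testing `scan != NULL` instead would run the loop body once more with `scan` pointing at the sentinel, which corresponds to the kind of invalid access the commit describes.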
* meta: Label the meta GLSL clear program.Kenneth Graunke2014-06-101-0/+1
| | | | | | | | | Giving the meta clear program a meaningful name makes it easier to find in output such as INTEL_DEBUG=fs or INTEL_DEBUG=shader_time. We already did so for integer programs, but neglected to label the primary program. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965/fs: Combine generate_math[12]_gen6 methods.Kenneth Graunke2014-06-102-33/+13
| | | | | | | | | | | | | | These used to call different math emitters (brw_math vs. brw_math2). Now that they both call gen6_math, they're virtually identical. When unrolling SIMD16 to multiple SIMD8 operations, we should take care not to apply sechalf to brw_null_reg for src1. Otherwise, we'd end up with BRW_ARF_NULL + 1 as the register number, and I'm not sure if that's valid. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Jordan Justen <[email protected]>
* i965/fs: Drop the generate_math[12]_gen7 methods.Kenneth Graunke2014-06-102-30/+5
| | | | | | | | | | These functions are basically identical, so we should combine them. However, they're so trivial, we may as well just fold them into their only call sites. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Jordan Justen <[email protected]>
* i965/vec4: Combine generate_math[12]_gen6 methods.Kenneth Graunke2014-06-102-28/+12
| | | | | | | | | These are trivial to combine: we should just avoid checking the second operand if it's brw_null_reg. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Jordan Justen <[email protected]>
* i965/vec4: Drop the generate_math2_gen7() method.Kenneth Graunke2014-06-102-14/+1
| | | | | | | | | It's now a single line of code, so we may as well fold it into the caller. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Jordan Justen <[email protected]>
* i965: Rename brw_math to gen4_math.Kenneth Graunke2014-06-106-46/+46
| | | | | | | | | Usually, I try to use "brw" for functions that apply to all generations, and "gen4" for dead end/legacy code that is only used on Gen4-5. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Jordan Justen <[email protected]>
* i965: Split Gen4-5 and Gen6+ MATH instruction emitters.Kenneth Graunke2014-06-104-89/+39
| | | | | | | | | | | | | | | | | | | | Our existing functions, brw_math and brw_math2, had unclear roles: Gen4-5 used brw_math for both unary and binary math functions; it never used brw_math2. Since operands are already in message registers, this is reasonable. Gen6+ used brw_math for unary math functions, and brw_math2 for binary math functions, duplicating a lot of code. The only real difference was that brw_math used brw_null_reg() for src1. This patch improves brw_math2's assertions to allow both unary and binary operations, renames it to gen6_math(), and drops the Gen6+ code out of brw_math(). Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Jordan Justen <[email protected]>
* i965: Make src_reg::equals() take a constant reference, not a pointer.Kenneth Graunke2014-06-103-14/+14
| | | | | | | This is more typical C++ style. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Topi Pohjolainen <[email protected]>
* i965: Don't set the "switch" flag on control flow instructions on Gen6+.Kenneth Graunke2014-06-101-6/+4
| | | | | | | | | | | | | | Thread switching on control flow instructions is a documented workaround for Gen4-5 errata. As far as I can tell, it hasn't been needed since Sandybridge. Thread switching is not free, so in theory this may help performance slightly. Flow control instructions with the "switch" flag cannot be compacted, so removing it will make these instructions compactable. (Of course, we still have to implement compaction for flow control instructions...) Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965/fs: Allow CSE on math opcodes on Gen6+.Kenneth Graunke2014-06-101-0/+11
| | | | | | | | | | total instructions in shared programs: 2081469 -> 2081248 (-0.01%) instructions in affected programs: 22606 -> 22385 (-0.98%) No programs were hurt by this patch. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Chris Forbes <[email protected]>
* i965: Replace open-coded linked list with exec_list.Matt Turner2014-06-104-62/+45
| | | | Reviewed-by: Ian Romanick <[email protected]>
* mesa: Fix substitution of large shadersCody Northrop2014-06-101-3/+14
| | | | | Signed-off-by: Cody Northrop <[email protected]> Reviewed-by: Brian Paul <[email protected]>
* i965: Make gen7_pi field of brw_instruction use unsigned instead of GLuintKristian Høgsberg2014-06-091-12/+12
| | | | | | | | Nothing else uses GL-types here. Signed-off-by: Kristian Høgsberg <[email protected]> Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Chris Forbes <[email protected]>
* i965: Don't include mtypes.h in brw_disasm.cKristian Høgsberg2014-06-091-2/+0
| | | | | | | | It's not used. Signed-off-by: Kristian Høgsberg <[email protected]> Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Chris Forbes <[email protected]>
* i965/fs: initialize src as reg_undef for texture opcodes on Gen4.Matt Turner2014-06-091-6/+6
| | | | Untested.