path: root/src/mesa
Commit message | Author | Age | Files | Lines
* mesa: Permanently enable features supported by target CPU at compile time.Siavash Eliasi2014-11-261-0/+26
| | | | | | | | | | | This removes the need for runtime checks of CPU features that the target CPU is already known to support at compile time, resulting in smaller and less branchy code. V2: - Removed the SSSE3-related part, which belongs to a patch that is not yet merged. - Avoid redefining macros. Tested-by: David Heidelberg <[email protected]>
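A minimal sketch of the idea in the commit above, not the actual Mesa patch: when the compiler already targets SSE4.1 (GCC and Clang then predefine `__SSE4_1__`), the feature check folds to a constant, and otherwise it falls back to a runtime probe. The names `CPU_HAS_SSE41`, `probe_sse41_at_runtime` and `use_sse41_path` are hypothetical, and the probe is a stub.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a runtime probe; a real one would query CPUID. */
bool probe_sse41_at_runtime(void)
{
   return false;
}

/* Define the macro only once, so an earlier definition is never redefined. */
#ifndef CPU_HAS_SSE41
#  ifdef __SSE4_1__
     /* The compiler already targets SSE4.1: the check is a constant. */
#    define CPU_HAS_SSE41 1
#  else
     /* Unknown at compile time: ask at run time. */
#    define CPU_HAS_SSE41 (probe_sse41_at_runtime() ? 1 : 0)
#  endif
#endif

/* Returns 1 when the SSE4.1 path is chosen, 0 for the generic path. */
int use_sse41_path(void)
{
   if (CPU_HAS_SSE41)
      return 1;   /* with a constant macro the branch below is dead code */
   return 0;
}
```

With `-msse4.1` (or a suitable `-march`), the compiler can drop the generic branch entirely, which is where the "smaller and less branchy" code comes from.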
* i965/vec4: Handle destination writemasks in VEC4_OPCODE_PACK_BYTES.Matt Turner2014-11-251-2/+13
| | | | | | | | | | | | | | Since pack_bytes expands to two mov(4) align1 instructions, we can't use swizzles directly. For an instruction like pack_bytes m4.y:UD, vgrf13.xyzw:UD we can write into the .y component by setting the offset based on the swizzle. Also while we're doing this, we can set the dependency control hints properly, so that a series of pack_bytes writing into separate components of a register can issue without blocking.
* i965/vec4: Optimize packSnorm4x8().Matt Turner2014-11-253-4/+29
| | | | | Reduces the number of instructions needed to implement packSnorm4x8() from 13 -> 7.
* i965/vec4: Optimize packUnorm4x8().Matt Turner2014-11-253-4/+27
| | | | | Reduces the number of instructions needed to implement packUnorm4x8() from 11 -> 6.
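For reference, here is a CPU-side sketch of what the two built-ins compute, following the GLSL definitions (clamp, scale by 255 or 127, round, and place the first component in the least significant byte). It is not the i965 instruction sequence; the function names are hypothetical, and rounding halfway cases away from zero is an assumption, since GLSL leaves that choice to the implementation.

```c
#include <stdint.h>

static float clampf(float x, float lo, float hi)
{
   return x < lo ? lo : (x > hi ? hi : x);
}

/* Round to nearest, halfway cases away from zero (assumed rounding mode). */
static int round_half_away(float x)
{
   return (int)(x >= 0.0f ? x + 0.5f : x - 0.5f);
}

/* packUnorm4x8: each component becomes round(clamp(c, 0, 1) * 255). */
uint32_t pack_unorm_4x8(const float v[4])
{
   uint32_t result = 0;
   for (int i = 0; i < 4; i++) {
      uint32_t byte = (uint32_t)round_half_away(clampf(v[i], 0.0f, 1.0f) * 255.0f);
      result |= byte << (8 * i);
   }
   return result;
}

/* packSnorm4x8: each component becomes round(clamp(c, -1, 1) * 127). */
uint32_t pack_snorm_4x8(const float v[4])
{
   uint32_t result = 0;
   for (int i = 0; i < 4; i++) {
      int value = round_half_away(clampf(v[i], -1.0f, 1.0f) * 127.0f);
      result |= ((uint32_t)value & 0xffu) << (8 * i);  /* two's complement byte */
   }
   return result;
}
```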
* i965/vec4: Add VEC4_OPCODE_PACK_4_BYTES.Matt Turner2014-11-254-0/+52
| | | | Will be used by emit_pack_{s,u}norm_4x8().
* i965/vec4: Optimize unpackSnorm4x8().Matt Turner2014-11-253-3/+33
| | | | | | | Reduces the number of instructions needed to implement unpackSnorm4x8() from 16 -> 6. Reviewed-by: Kenneth Graunke <[email protected]>
* i965/vec4: Optimize unpackUnorm4x8().Matt Turner2014-11-253-3/+31
| | | | | | | Reduces the number of instructions needed to implement unpackUnorm4x8() from 11 -> 4. Reviewed-by: Kenneth Graunke <[email protected]>
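The unpack built-ins are the inverse operations. A reference sketch in C, following the GLSL definitions (divide each byte by 255, or by 127 with a clamp to -1 for the signed case); the function names are hypothetical and this is not the optimized i965 code:

```c
#include <stdint.h>

/* unpackUnorm4x8: each byte b becomes b / 255.0. */
void unpack_unorm_4x8(uint32_t packed, float out[4])
{
   for (int i = 0; i < 4; i++) {
      uint32_t byte = (packed >> (8 * i)) & 0xffu;
      out[i] = (float)byte / 255.0f;
   }
}

/* unpackSnorm4x8: each signed byte b becomes clamp(b / 127.0, -1, 1). */
void unpack_snorm_4x8(uint32_t packed, float out[4])
{
   for (int i = 0; i < 4; i++) {
      int value = (int)((packed >> (8 * i)) & 0xffu);
      if (value >= 128)
         value -= 256;               /* sign-extend the byte */
      float f = (float)value / 127.0f;
      out[i] = f < -1.0f ? -1.0f : f;  /* only -128 needs the clamp */
   }
}
```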
* i965/vec4: Add vector float immediate infrastructure.Matt Turner2014-11-253-0/+23
| | | | Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Add vector float immediate infrastructure.Matt Turner2014-11-253-0/+24
| | | | Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Disassemble vector float immediates properly.Matt Turner2014-11-251-1/+5
| | | | | Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Francisco Jerez <[email protected]>
* i965: Add unit test for float <-> VF conversions.Matt Turner2014-11-252-0/+105
| | | | | Using Eric's original VF -> float conversion code to initialize the table.
* i965: Add functions to convert float <-> VF.Matt Turner2014-11-253-0/+80
| | | | Reviewed-by: Kenneth Graunke <[email protected]>
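A sketch of such a conversion pair, assuming the VF ("vector float") immediate is an 8-bit restricted float with 1 sign bit, 3 exponent bits (bias 3) and 4 mantissa bits, and that 0x00 and 0x80 encode +0.0 and -0.0. That layout is my understanding of the hardware format rather than something stated in the log, and the function names are hypothetical, not the ones the commit adds.

```c
#include <stdint.h>

union float_bits {
   float f;
   uint32_t u;
};

/* Expand an 8-bit VF value to an IEEE-754 single (assumed layout: s eee mmmm). */
float vf_to_float(uint8_t vf)
{
   union float_bits fb;

   /* +0.0 and -0.0 are special encodings. */
   if (vf == 0x00 || vf == 0x80) {
      fb.u = (uint32_t)vf << 24;
      return fb.f;
   }

   fb.u  = (uint32_t)(vf & 0x80) << 24;   /* sign bit */
   fb.u |= (uint32_t)(vf & 0x7f) << 19;   /* exponent and mantissa */
   fb.u += (uint32_t)(127 - 3) << 23;     /* rebias exponent from 3 to 127 */
   return fb.f;
}

/* Returns the VF encoding of f, or -1 if f is not exactly representable. */
int float_to_vf(float f)
{
   union float_bits fb = { .f = f };

   if (f == 0.0f)
      return (int)(fb.u >> 24);           /* 0x00 or 0x80, keeps the sign */

   uint32_t sign = fb.u >> 31;
   int exponent = (int)((fb.u >> 23) & 0xff) - (127 - 3);
   uint32_t mantissa = fb.u & 0x7fffffu;

   if (exponent < 0 || exponent > 7)
      return -1;                          /* out of range (also NaN, infinity) */
   if (mantissa & 0x7ffffu)
      return -1;                          /* needs more than 4 mantissa bits */

   int vf = (int)((sign << 7) | ((uint32_t)exponent << 4) | (mantissa >> 19));
   if ((vf & 0x7f) == 0)
      return -1;                          /* 0.125 would alias the zero encoding */
   return vf;
}
```

Under these assumptions, 1.0 encodes as 0x30, and the largest representable magnitude is 31.0 (0x7F).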
* i965/Gen6-7: Do not replace texcoords with point coord if not drawing pointsChris Forbes2014-11-252-12/+49
| | | | | | | | | | | | | | | | | | | | | | Fixes broken rendering in Windows-based QtQuick2 apps run through Wine. This library sets all texture units' GL_COORD_REPLACE, leaves point sprite mode enabled, and then draws a triangle fan. Will need a slightly different fix for Gen4-5, but I don't have my old machines in a usable state currently. V2: - Simplify patch -- the real changes are no longer duplicated across the Gen6 and Gen7 atoms. - Also don't clobber attr overrides -- which matters on Haswell too, and fixes the other half of the problem - Fix newly-introduced warnings V3: - Use BRW_NEW_GEOMETRY_PROGRAM and brw->geometry_program rather than core flag and state; keep the state flags in order. Signed-off-by: Chris Forbes <[email protected]> Cc: "10.4" <[email protected]> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=84651 Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Precompile ARB programs.Kenneth Graunke2014-11-241-2/+9
| | | | | | | | | | | | We already precompile GLSL programs; it seems logical to precompile ARB programs as well. We just never hooked it up. This also makes the programs compile even if no drawing occurs, which is useful for shader-db. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Kristian Høgsberg <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965: Make precompile functions accessible from C.Kenneth Graunke2014-11-245-10/+19
| | | | | | | | | | | | | Previously, the prototypes for brw_vs/gs/fs_precompile were scattered between brw_vs.h (C), brw_gs.h (C), and brw_fs.h (C++ only). Also, brw_fs_precompile had C++ linkage, while the others were C. This patch moves all the prototypes to a central location (brw_shader.h) and makes brw_fs_precompile have C linkage. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Kristian Høgsberg <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965: Pass gl_program pointers into precompile functions.Kenneth Graunke2014-11-247-29/+33
| | | | | | | | | | | | We'd like to do precompiling for ARB vertex and fragment programs, which only have gl_program structures - gl_shader_program is NULL. This patch makes the various precompile functions take a gl_program parameter directly, rather than accessing it via gl_shader_program. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Kristian Høgsberg <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965: Move brw->precompile checks out a level.Kenneth Graunke2014-11-241-4/+4
| | | | | | | | | brw_shader_precompile should just do a precompile; it makes more sense for the caller to decide whether we should do one. Simpler. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Kristian Høgsberg <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* mesa: Drop unused NV_fragment_program opcodes.Eric Anholt2014-11-244-177/+0
| | | | | | | The extension itself was deleted 2 years ago. There are still some prog_instruction opcodes from NV_fp that exist because they're used by ir_to_mesa.cpp, though. Reviewed-by: Jose Fonseca <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
* mesa: Drop unused SFL/STR opcodes.Eric Anholt2014-11-243-16/+0
| | | | | | They're part of NV_vertex_program2, which I'm pretty sure we're never going to support. Reviewed-by: Jose Fonseca <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
* i965/gen6/gs: Don't declare a src_reg with struct.Matt Turner2014-11-241-1/+1
| | | | Reviewed-by: Kenneth Graunke <[email protected]>
* i965/disasm: Fix all32h/any32h predicate disassembly.Matt Turner2014-11-241-1/+1
| | | | Reviewed-by: Chris Forbes <[email protected]>
* i965: Don't overwrite the math function with conditional mod.Matt Turner2014-11-242-2/+4
| | | | | | | | | | | | | | | Ben was asking about the undocumented restriction that the math instruction cannot use the dependency control hints. I went to reconfirm and disabled the is_math() check in opt_set_dependency_control() and saw that the disassembled math instructions with dependency hints had a bogus math function. We were mistakenly overwriting it by setting an empty conditional mod. Unfortunately, this wasn't the cause of the aforementioned problem (I reproduced it). This bug is benign, since we don't set dependency hints on math instructions -- but maybe some day. Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Assert that math instructions don't have conditional mod.Matt Turner2014-11-242-0/+4
| | | | | | The math function field is at the same location as conditional mod. Reviewed-by: Kenneth Graunke <[email protected]>
* mesa: Fix Get(GL_TRANSPOSE_CURRENT_MATRIX_ARB) to transposeChris Forbes2014-11-241-1/+1
| | | | | | | | | This was just returning the same value as GL_CURRENT_MATRIX_ARB. Spotted while investigating something else in apitrace. Signed-off-by: Chris Forbes <[email protected]> Cc: "10.3 10.4" <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
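To illustrate what the transposed query should return: OpenGL stores matrices as 16 floats in column-major order, and the transposed query returns the same matrix with rows and columns swapped. A minimal sketch, with a hypothetical helper name rather than the function Mesa uses:

```c
/* Write the transpose of the 4x4 matrix src (16 floats) into dst.
 * dst and src must not overlap. */
void transpose_4x4(float dst[16], const float src[16])
{
   for (int row = 0; row < 4; row++) {
      for (int col = 0; col < 4; col++)
         dst[col * 4 + row] = src[row * 4 + col];
   }
}
```

Returning `src` unchanged, as the bug did, only gives the right answer for symmetric matrices such as the identity, which may be why it went unnoticed.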
* i965: Handle nested uniform array indexingChris Forbes2014-11-241-29/+37
| | | | | | | | | | | | | | | | | | | When converting a uniform array reference to a pull constant load, the `reladdr` expression itself may have its own `reladdr`, arbitrarily deeply. This arises from expressions like: a[b[x]] where a, b are uniform arrays (or lowered const arrays), and x is not a constant. Just iterate the lowering to pull constants until we stop seeing these nested. For most shaders, there will be only one pass through this loop. Fixes the piglit test: tests/spec/glsl-1.20/linker/double-indirect-1.shader_test Signed-off-by: Chris Forbes <[email protected]> Cc: "10.3 10.4" <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* mesa: Fix function name in GetActiveUniformName errorChris Forbes2014-11-231-1/+1
| | | | Signed-off-by: Chris Forbes <[email protected]>
* i965: Make Gen4-5 push constants call _mesa_load_state_parameters too.Kenneth Graunke2014-11-211-0/+4
| | | | | | | | | | | | | | | | | | In commit 5e37a2a4a8a, I made the pull constant code stop calling _mesa_load_state_parameters() when there were no pull parameters. This worked fine on Gen6+ because the push constant code also called it if there were any push constants. However, the Gen4-5 push constant code wasn't doing this. This patch makes it do so, like the Gen6+ code. A better long term solution would be to make core Mesa just handle this for us when necessary. Fixes around 8766 Piglit tests on Ironlake, and probably Gen4 as well. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]> Tested-by: Mark Janes <[email protected]>
* i965/vec4/gen8: Handle the MUL dest hazard exceptionBen Widawsky2014-11-212-2/+19
| | | | | | | | | Fix one of the few cases where we can't reliably touch the destination hazard bits. I am explicitly doing this patch individually so it is easy to backport. I was tempted to do this patch before the previous patch which reorganized the code, but I believe even doing that first, this is still easy to backport. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=84212 Signed-off-by: Ben Widawsky <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965/vec4: Extract depctrl hazardsBen Widawsky2014-11-211-22/+27
| | | | | | | | | | | | | | | | | Move this to a separate function so that we can begin to add other little caveats without making too big a mess. NOTE: There is some desire to improve this function eventually, but we need to fix a bug first. v2: Use const for the inst for the hazard check (Matt) Invert safe logic to get rid of the double negative (Matt) Add PRM reference for predicates (Matt) Add note about empirical evidence for math (Matt) Signed-off-by: Ben Widawsky <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965/fs: Remove is_valid_3src().Matt Turner2014-11-213-8/+1
| | | | Reviewed-by: Anuj Phogat <[email protected]>
* i965/fs: Remove is_valid_3src() checks from emit_lrp.Matt Turner2014-11-211-4/+1
| | | | | | | The visitor emits MOVs to temporary registers for immediates, so these never trigger. For further proof, check case ir_triop_fma. Reviewed-by: Anuj Phogat <[email protected]>
* i965/fs: Remove unused apply_stride().Matt Turner2014-11-212-11/+0
| | | | Reviewed-by: Anuj Phogat <[email protected]>
* i965/fs: Move ip_record class to its one use.Matt Turner2014-11-212-12/+12
| | | | Reviewed-by: Anuj Phogat <[email protected]>
* i965: Move common fields into backend_instruction.Matt Turner2014-11-213-5/+2
| | | | Reviewed-by: Jason Ekstrand <[email protected]>
* i965: Combine offset/texture_offset fields.Matt Turner2014-11-216-15/+13
| | | | | | | | texture_offset was only used by some texturing operations, and offset was only used by spill/unspill and some URB operations. These fields are never used at the same time. Reviewed-by: Jason Ekstrand <[email protected]>
* i915: Only use TEXCOORDTYPE_VECTOR with cube maps on gen2Ville Syrjälä2014-11-201-18/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | Check that the target is GL_TEXTURE_CUBE_MAP before emitting TEXCOORDTYPE_VECTOR texture coordinates. I'm not sure if the hardware would like CARTESIAN coordinates with cube maps, and as I'm too lazy to find out, just emit the VECTOR coordinates for cube maps always. For other targets use CARTESIAN or HOMOGENOUS depending on the number of texture coordinates provided. Fixes rendering of the "electric" background texture in chromium-bsu main menu. We appear to be provided with three texture coordinates there (I'm guessing due to the funky texture matrix rotation it does). So the code would decide to use TEXCOORDTYPE_VECTOR instead of TEXCOORDTYPE_CARTESIAN even though we're dealing with a 2D texture. The results weren't what one might expect. demos/cubemap still works, which hopefully indicates that this doesn't break things. Also tested with: bin/glean -o -v -v -v -t +texCube --quick bin/cubemap -auto from piglit. Reviewed-by: Ian Romanick <[email protected]> Signed-off-by: Ville Syrjälä <[email protected]>
* i965/disasm: Properly decode branch_ctrl (gen8+)Ben Widawsky2014-11-203-2/+23
| | | | | | | | | | | | | | | | | | | | | | | | | | | Add support for decoding the new branch control bit. I saw two things wrong with the existing code. 1. It didn't bother trying to decode the bit. - While we do not *intentionally* emit this bit today, I think it's interesting to see if we somehow ended up with the bit set. It may also be useful in the future. 2. It seemed to be the wrong bit. - The docs are pretty poor wrt which bit this actually occupies. To me, it /looks/ like it should be bit 28. I am not sure where Ken got 30 from. I verified it should be 28 by looking at the simulator code. I also added the most basic support for GOTO simply so we don't need to remember to change the function in the future. v2: Move the branch_ctrl check out of the if gen >= 6 check to make it more readable. (Matt) ENDIF doesn't have branch_ctrl (Matt + Ken) Signed-off-by: Ben Widawsky <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965: Skip _mesa_load_state_parameters when there are zero parameters.Kenneth Graunke2014-11-202-11/+11
| | | | | | | | Saves a tiny bit of CPU overhead. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]> Acked-by: Eric Anholt <[email protected]>
* i965: Fix segfault in WebGL Conformance on IvybridgeChad Versace2014-11-181-3/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | Fixes regression of WebGL Conformance test texture-size-limit [1] on Ivybridge Mobile GT2 0x0166 with Google Chrome R38. Regression introduced by commit 6c044231535b93c5d16404528946cad618d96bd9 Author: Kenneth Graunke <[email protected]> Date: Sun Feb 2 02:58:42 2014 -0800 i965: Bump GL_MAX_CUBE_MAP_TEXTURE_SIZE to 8192. The test regressed because the pointer offset arithmetic in intel_miptree_map_gtt() overflows for large textures. The pointer arithmetic is not 64-bit safe. [1] https://github.com/KhronosGroup/WebGL/blob/52f0dc240f04dce31b1b8e2b8107fe2b8332dc90/sdk/tests/conformance/textures/texture-size-limit.html Cc: "10.3 10.4" <[email protected]> Fixes: https://bugs.freedesktop.org/show_bug.cgi?id=78770 Fixes: Intel CHRMOS-1377 Reported-by: Lu Hua <[email protected]> Reviewed-by: Ian Romanick <[email protected]> Signed-off-by: Chad Versace <[email protected]>
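The kind of overflow described above can be sketched as follows. This is an illustration only, not the code from intel_miptree_map_gtt(): the function names, parameter names and the numbers in the usage note are hypothetical, and unsigned 32-bit arithmetic is used so that the wraparound is well defined in C.

```c
#include <stdint.h>

/* Byte offset of texel (x, y) computed in 32 bits: wraps for large surfaces. */
uint32_t texel_offset_32(uint32_t x, uint32_t y, uint32_t pitch, uint32_t cpp)
{
   return y * pitch + x * cpp;
}

/* Same computation widened to 64 bits before multiplying: no wraparound. */
uint64_t texel_offset_64(uint32_t x, uint32_t y, uint32_t pitch, uint32_t cpp)
{
   return (uint64_t)y * pitch + (uint64_t)x * cpp;
}
```

For example, row 40000 of a surface with a 131072-byte pitch lies 5242880000 bytes from the base, which does not fit in 32 bits; the narrow version returns 947912704 instead, so a pointer built from it would land in the wrong place.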
* mesa/main: Fix tmp_row memory leak in texstore_rgba_integer.Siavash Eliasi2014-11-181-1/+3
| | | | Reviewed-by: Jason Ekstrand <[email protected]>
* st/mesa: add a fallback for clear_with_quad when no vs_layerIlia Mirkin2014-11-172-5/+21
| | | | | | | | | | | | | | | | | Not all drivers can set gl_Layer from VS. Add a fallback that passes the instance id from VS to GS, and then uses the GS to set the layer. Tested by adding quad_buffers |= clear_buffers; clear_buffers = 0; to the st_Clear logic, and forcing set_vertex_shader_layered in all cases. No piglit regressions (on piglits with 'clear' in the name). Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Marek Olšák <[email protected]> Cc: "10.4 10.3" <[email protected]>
* nine: Implement threadpoolAxel Davy2014-11-181-0/+5
| | | | | | | | | | | | | DRI_PRIME setups have different issues due to the lack of dma-buf fence support in the drivers. For DRI3 DRI_PRIME, a race can appear, causing visible tearing or, worse, showing older content than expected. Until dma-buf fences are well supported (and by all drivers), an alternative is to send the buffers to the server only when rendering has finished. Since waiting in the main thread for rendering to finish has a performance impact, this patch uses an additional thread to offload the wait and the sending of the buffers to the server. Acked-by: Jose Fonseca <[email protected]> Reviewed-by: David Heidelberg <[email protected]> Signed-off-by: Axel Davy <[email protected]>
* nine: Add drirc options (v2)Axel Davy2014-11-181-0/+13
| | | | | | | Implements vblank_mode and throttling, which allow us to change the default ratio between framerate and input lag. Acked-by: Jose Fonseca <[email protected]> Signed-off-by: David Heidelberg <[email protected]> Signed-off-by: Axel Davy <[email protected]>
* st/mesa: copy sampler_array_size field when copying instructionsBrian Paul2014-11-171-1/+6
| | | | | | | | | | | | | | | | | | The sampler_array_size field was added by "mesa/st: add support for dynamic sampler offsets". But the field wasn't getting copied in the get_pixel_transfer_visitor() or get_bitmap_visitor() functions. The count_resources() function then didn't properly compute the glsl_to_tgsi_visitor::samplers_used bitmask. Then, we didn't declare all the sampler registers in st_translate_program(). Finally, we asserted when we tried to emit a tgsi ureg src register with File = TGSI_FILE_UNDEFINED. Add the missing assignments and some new assertions to catch the invalid register sooner. Cc: "10.3, 10.4" <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* gallium: remove unused pipe_viewport_state::translate[3] and scale[3]Marek Olšák2014-11-165-10/+0
| | | | Almost all drivers ignore them.
* tgsi/ureg: simplify code for declaring propertiesMarek Olšák2014-11-163-17/+30
| | | | Tested-by: Nick Sarnie <[email protected]>
* gallium/util: add a window_space option to the passthrough vertex shaderMarek Olšák2014-11-163-3/+5
| | | | Tested-by: Nick Sarnie <[email protected]>
* Revert "mesa: Wrap SSE4.1 code in #ifdef __SSE4_1__."Emil Velikov2014-11-151-3/+0
| | | | | | | | | | | | | | | This reverts commit 8d3f739383fbdf671752fdec707f1c2b9b2aa6a3. In the last commit we've updated our check to determine if the actual code is buildable, rather than if the compiler acknowledges the option. I.e. did anyone provide -mno-sse4.1 vs is my compiler too old. Now no attempt will be made to build this code in either case. Confirmed by building mesa with export CFLAGS='-march=native -mno-sse4.1' ./configure && make Tested-by: David Heidelberg <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965: Move fs_visitor ra pass to new fs_visitor::allocate_registers()10.4-branchpointKristian Høgsberg2014-11-142-59/+69
| | | | | | | | | This will be reused for the scalar VS pass. v2 (Ken): Rebase on master. Signed-off-by: Kristian Høgsberg <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Move fs_visitor optimization pass into new method fs_visitor::optimize()Kristian Høgsberg2014-11-142-65/+72
| | | | | | | We'll reuse this toplevel optimization driver for the scalar VS. Signed-off-by: Kristian Høgsberg <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>