summaryrefslogtreecommitdiffstats
path: root/src/mesa/drivers
Commit message (Collapse)AuthorAgeFilesLines
* dri/i965: extend GLES3 sRGB workaround to cover all formatsHaixia Shi2016-04-121-4/+3
| | | | | | | | | | It is incorrect to assume BGRA byte order for the GLES3 sRGB workaround. v2: use _mesa_get_srgb_format_linear to handle all formats Signed-off-by: Haixia Shi <[email protected]> Reviewed-by: Stéphane Marchesin <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Add autogenerated 'brw_nir_trig_workarounds.c' to gitignoreEduardo Lima Mitev2016-04-121-0/+1
| | | | Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Port INTEL_PRECISE_TRIG=1 to NIR.Kenneth Graunke2016-04-117-28/+58
| | | | | | | | | | | | | | | | | | | | This makes the extra multiply visible to NIR's algebraic optimizations (for constant reassociation) as well as constant folding. This means that when the result of sin/cos are multiplied by an constant, we can eliminate the extra multiply altogether, reducing the cost of the workaround. It also means we only have to implement it one place, rather than in both backends. This makes INTEL_PRECISE_TRIG=1 cost nothing on GPUTest/Volplosion, which has a ton of sin() calls, but always multiplies them by an immediate constant. The extra multiply gets folded away. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Eduardo Lima Mitev <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965: Pass brw_compiler into brw_preprocess_nir() instead of is_scalar.Kenneth Graunke2016-04-112-3/+6
| | | | | | | | | | I want to be able to read other fields. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Alejandro Piñeiro <[email protected]> Reviewed-by: Eduardo Lima Mitev <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* nir/lower_system_values: Add support for several computed valuesJason Ekstrand2016-04-111-1/+2
| | | | Reviewed-by: Rob Clark <[email protected]>
* i965: fix struct type in commentTimothy Arceri2016-04-111-1/+1
| | | | Reviewed-by: Eduardo Lima Mitev <[email protected]>
* i965: enable OES_texture_buffer on gen7+Ilia Mirkin2016-04-101-0/+1
| | | | | | | | | It will only end up getting exposed on gen8+ since it requires GL ES 3.1, but it should be ready to go on gen7 when support for GL ES 3.1 is completed there. Signed-off-by: Ilia Mirkin <[email protected]> Tested-by: Kenneth Graunke <[email protected]>
* i965/disasm: Decode per-slot offsets.Kenneth Graunke2016-04-091-0/+5
| | | | | | | We just never bothered to decode this. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Ben Widawsky <[email protected]>
* i965/disasm: Decode "channel mask present" bit correctly.Kenneth Graunke2016-04-091-4/+15
| | | | | | | | Bit 15 means "interleave" for most messages, but for SIMD8 messages it means "use channel masks". Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Ben Widawsky <[email protected]>
* i965/disasm: Simplify the URB opcode printing with ?:.Kenneth Graunke2016-04-091-7/+6
| | | | | Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Ben Widawsky <[email protected]>
* i965/tiled_memcopy: Get rid of the direction parameter to get_memcpyJason Ekstrand2016-04-085-22/+5
| | | | | | | | | Now that we can use the much simpler rgba8_copy function, we don't need to hand different functions out based on direction. Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Roland Scheidegger <[email protected]> Reviewed-by: Chad Versace <[email protected]>
* i965/tiled_memcpy: Rework the RGBA -> BGRA mem_copy functionsJason Ekstrand2016-04-081-76/+63
| | | | | | | | | | | | | This splits the two copy functions into three: One for unaligned copies, one for aligned sources, and one for aligned destinations. Thanks to the previous commit, we are now guaranteed that the aligned ones will *only* operate on aligned memory so they should be safe. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93962 Cc: "11.1 11.2" <[email protected]> Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Roland Scheidegger <[email protected]> Reviewed-by: Chad Versace <[email protected]>
* i965/tiled_memcopy: Add aligned mem_copy parameters to the [de]tiling functionsJason Ekstrand2016-04-081-32/+43
| | | | | | | | | | | | | | | | | | Each of the [de]tiling functions has three mem_copy calls: 1) Left edge to tile boundary 2) Tile boundary to tile boundary in a loop 3) Tile boundary to right edge Copies 2 and 3 start at a tile edge so the pointer to tiled memory is guaranteed to be at least 16-byte aligned. Copy 1, on the other hand, starts at some arbitrary place in the tile so it doesn't have any such alignment guarantees. Cc: "11.1 11.2" <[email protected]> Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Roland Scheidegger <[email protected]> Reviewed-by: Chad Versace <[email protected]>
* i965: Check eu/subslices are > 0Ben Widawsky2016-04-081-1/+1
| | | | | | | | Now that the check is restricted to gen8+, we should always get back a non-zero positive value for the EU and subslice counts. Signed-off-by: Ben Widawsky <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Fix eu/subslice warningBen Widawsky2016-04-081-11/+23
| | | | | | | | | | | Older gen platforms do not actually return a value for sublice and eu total (IMO, confusingly) they return -ENODEV. This patch defers the SSEU setup until we have the actual GPU generation to avoid useless warnings when running on older platforms with older kernels. Reported-by: Mark Janes <[email protected]> Signed-off-by: Ben Widawsky <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Extract SSEU configuration infoBen Widawsky2016-04-081-14/+21
| | | | | Signed-off-by: Ben Widawsky <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965/sf_state: Pull flat_enables out of prog_dataJason Ekstrand2016-04-064-27/+5
| | | | | | | | | | Previously, we were walking over the shader source to figure out which inputs should be marked flat. Now, we can just pull it out of prog_data. This is needed for properly setting up 3DSTATE_SF/SBE for Vulkan and it also means that it will get properly cached. Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Add a flat_inputs field to prog_dataJason Ekstrand2016-04-062-0/+37
| | | | | Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* brw/device_info: Add a helper for getting a device nameJason Ekstrand2016-04-062-0/+13
| | | | | | | This is needed by the Vulkan driver Reviewed-by: Jordan Justen <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs_surface_builder: Mask signed integers after conversionJason Ekstrand2016-04-061-0/+18
| | | | | Reviewed-by: Francisco Jerez <[email protected]> Reviewed-by: Jordan Justen <[email protected]>
* i965/fs: Make the repclear shader support either a uniform or a flat inputJason Ekstrand2016-04-061-5/+18
| | | | | | | | | | In the Vulkan driver we use a single flat input instead of a uniform because setting up push constants is more disruptive to the pipeline than setting up another vertex input. This uses the number of uniforms as a key to keep it working for the GL driver. Reviewed-by: Jordan Justen <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Move get_hw_prim_for_gl_prim to brw_util.cJason Ekstrand2016-04-062-29/+28
| | | | | | | | It's used by brw_compile_gs in brw_vec4_gs_visitor.cpp so it needs to be in a file that's linked into libi965_compiler.la. Reviewed-by: Jordan Justen <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* drirc: add a workaround for blackness in WarsowMarek Olšák2016-04-061-0/+8
| | | | Cc: 11.1 11.2 <[email protected]>
* i965/fs: Move the code for load/store_shared to emit_cs_intrinsicJason Ekstrand2016-04-041-76/+76
| | | | | | | They are compute-shader only and that's where the code for doing atomics on shared variables lives so it seemes to make sense. Reviewed-by: Jordan Justen <[email protected]>
* i965/nir: Provide a default LOD for buffer texturesJason Ekstrand2016-04-042-0/+8
| | | | | | | | | Our hardware requires an LOD for all texelFetch commands even if they are on buffer textures. GLSL IR gives us an LOD of 0 in that case, but the LOD is really rather meaningless. This commit allows other NIR producers to be more lazy and not provide one at all. Reviewed-by: Jordan Justen <[email protected]>
* i965: Fix invalid pointer read in dead_control_flow_eliminate().Kenneth Graunke2016-04-041-0/+4
| | | | | | | | | | | There may not be a previous block. In this case, there's no real work to do, so just continue on to the next one. v2: Update for bblock->prev() API change. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965: Make bblock_t::next and friends return NULL at sentinels.Kenneth Graunke2016-04-042-1/+13
| | | | | | | | | The bblock_t::prev/prev_const/next/next_const API returns bblock_t pointers, rather than exec_nodes. So it's a bit surprising. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965/peephole_ffma: Only match a mul+add if none of the ops are exactJason Ekstrand2016-04-041-0/+11
| | | | Reviewed-by: Ian Romanick <[email protected]>
* i965: Add an INTEL_PRECISE_TRIG=1 option to fix SIN/COS output range.Kenneth Graunke2016-04-044-4/+36
| | | | | | | | | | | | | | | | | | | | | | | | | | The SIN and COS instructions on Intel hardware can produce values slightly outside of the [-1.0, 1.0] range for a small set of values. Obviously, this can break everyone's expectations about trig functions. According to an internal presentation, the COS instruction can produce a value up to 1.000027 for inputs in the range (0.08296, 0.09888). One suggested workaround is to multiply by 0.99997, scaling down the amplitude slightly. Apparently this also minimizes the error function, reducing the maximum error from 0.00006 to about 0.00003. When enabled, fixes 16 dEQP precision tests dEQP-GLES31.functional.shaders.builtin_functions.precision. {cos,sin}.{highp,mediump}_compute.{scalar,vec2,vec4,vec4}. at the cost of making every sin and cos call more expensive (about twice the number of cycles on recent hardware). Enabling this option has been shown to reduce GPUTest Volplosion performance by about 10%. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* i965: Allow 8x MSAA on >= 64bpp formats on Gen8+.Kenneth Graunke2016-04-041-1/+2
| | | | | | | | | | | See commit 3b0279a69 - this restriction is documented in the "Surface Format" field of RENDER_SURFACE_STATE. Looking at newer documentation, this restriction appears to exist on Haswell, but no longer applies on Gen8+. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Ben Widawsky <[email protected]>
* i965: Add an implemnetation of nir_op_fquantize2f16Jason Ekstrand2016-04-012-0/+53
| | | | Reviewed-by: Matt Turner <[email protected]>
* i965: Use brw->urb.min_vs_urb_entries instead of 32 for BLORP.Kenneth Graunke2016-03-311-4/+1
| | | | | | | | | | | Haswell GT2 and GT3 have a minimum of 64 entries. Hardcoding 32 is not legal. v2: Delete stale comment (caught by Alejandro). Cc: [email protected] Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Topi Pohjolainen <[email protected]>
* i965: Fix textureSize() depth value for 1 layer surfaces on Gen4-6.Kenneth Graunke2016-03-312-6/+18
| | | | | | | | | | | | | | | | | | | | | | | | | According to the Sandybridge PRM's description of the resinfo message, the .z value returned will be Depth == 0 ? 0 : Depth + 1. The earlier PRMs have the same table. This means we return 0 for array textures with a single slice, when we ought to return 1. Just override it to max(depth, 1). Fixes 10 dEQP-GLES3.functional tests on Sandybridge: shaders.texture_functions.texturesize.sampler2darray_fixed_vertex shaders.texture_functions.texturesize.sampler2darray_fixed_fragment shaders.texture_functions.texturesize.sampler2darray_float_vertex shaders.texture_functions.texturesize.sampler2darray_float_fragment shaders.texture_functions.texturesize.isampler2darray_vertex shaders.texture_functions.texturesize.isampler2darray_fragment shaders.texture_functions.texturesize.usampler2darray_vertex shaders.texture_functions.texturesize.usampler2darray_fragment shaders.texture_functions.texturesize.sampler2darrayshadow_vertex shaders.texture_functions.texturesize.sampler2darrayshadow_fragment Cc: [email protected] Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965: Don't add barrier deps for FB write messages.Matt Turner2016-03-301-1/+2
| | | | | | | Ken did this earlier, and this is just me reimplementing his patch a little differently. Reviewed-by: Francisco Jerez <[email protected]>
* i965: Add and use is_scheduling_barrier() function.Matt Turner2016-03-301-4/+17
|
* i965: Remove NOP insertion kludge in scheduler.Matt Turner2016-03-301-20/+5
| | | | | | | | | | | Instead of removing every instruction in add_insts_from_block(), just move the instruction to its scheduled location. This is a step towards doing both bottom-up and top-down scheduling without conflicts. Note that this patch changes cycle counts for programs because it begins including control flow instructions in the estimates. Reviewed-by: Francisco Jerez <[email protected]>
* i965: Assert that an instruction is not inserted around itself.Matt Turner2016-03-301-0/+4
| | | | Reviewed-by: Francisco Jerez <[email protected]>
* i965: Relax restriction on scheduling last instruction.Matt Turner2016-03-301-20/+3
| | | | | | | | | | | | | | | | | | | | | | I think when this code was written, basic blocks were always ended by a control flow instruction or an end-of-thread message. That's no longer the case, and removing this restriction actually helps things: instructions in affected programs: 7267 -> 7244 (-0.32%) helped: 4 total cycles in shared programs: 66559580 -> 66431900 (-0.19%) cycles in affected programs: 28310152 -> 28182472 (-0.45%) helped: 9577 HURT: 879 GAINED: 2 The addition of the is_control_flow() checks is not a functional change, since the add_insts_from_block() does not put them in the list of instructions to schedule. I plan to change this in a later patch. Reviewed-by: Francisco Jerez <[email protected]>
* i965/vec4/tcs: Set conditional mod on TCS_OPCODE_SRC0_010_IS_ZERO.Matt Turner2016-03-302-2/+3
| | | | | | | | | | | | | | | | | | Missing this causes an assertion failure in the scheduler with the next patch. Additionally, this gives cmod propagation enough information to optimize code better. total instructions in shared programs: 7112991 -> 7112852 (-0.00%) instructions in affected programs: 25704 -> 25565 (-0.54%) helped: 139 total cycles in shared programs: 64812898 -> 64810674 (-0.00%) cycles in affected programs: 127224 -> 125000 (-1.75%) helped: 139 Acked-by: Francisco Jerez <[email protected]>
* Revert "i965: Don't add barrier deps for FB write messages."Matt Turner2016-03-301-4/+3
| | | | | | | | | | | | | | This reverts commit d0e1d6b7e27bf5f05436e47080d326d7daa63af2. The change in the vec4 code is a mistake -- there's never an FS_OPCODE_FB_WRITE in vec4 code. The change in the fs code had the (harmless) effect of not recognizing an FB_WRITE as a scheduling barrier even if it was marked EOT -- harmless because the scheduler marked the last instruction of a block as a barrier, something I'm changing in the following patches. This will be reimplemented later in the series.
* i965: Simplify full scheduling-barrier conditions.Matt Turner2016-03-301-27/+8
| | | | | | | All of these were simply code for "architecture register file" (and in the case of destinations, "not the null register"). Reviewed-by: Francisco Jerez <[email protected]>
* i965: Remove incorrect cycle estimates.Matt Turner2016-03-301-10/+0
| | | | | | | | These printed the cycle count the last basic block (sched.time is set per basic block!). We have accurate, full program, data printed elsewhere. Reviewed-by: Francisco Jerez <[email protected]>
* glsl: add transform feedback buffers to resource listTimothy Arceri2016-03-311-1/+1
| | | | Reviewed-by: Dave Airlie <[email protected]>
* mesa: split transform feedback buffer into its own structTimothy Arceri2016-03-313-7/+7
| | | | | | | This will be used in a following patch to implement interface query support for TRANSFORM_FEEDBACK_BUFFER. Reviewed-by: Dave Airlie <[email protected]>
* glsl: use bitmask of active xfb buffer indicesTimothy Arceri2016-03-311-1/+1
| | | | | | | | | | | | | This allows us to print the correct binding point when not all buffers declared in the shader are bound. For example if we use a single buffer: layout(xfb_buffer=2, offset=0) out vec4 v; We now print '2' when the buffer is not bound rather than '0'. Reviewed-by: Dave Airlie <[email protected]>
* i965: Don't inline intel_batchbuffer_require_space().Matt Turner2016-03-302-26/+28
| | | | | | | | | | | | It's called by the inline intel_batchbuffer_begin() function which itself is used in BEGIN_BATCH. So in sequence of code emitting multiple packets, we have inlined this ~200 byte function multiple times. Making it an out-of-line function presumably improved icache usage. Improves performance of Gl32Batch7 by 3.39898% +/- 0.358674% (n=155) on Ivybridge. Reviewed-by: Abdiel Janulgue <[email protected]>
* meta: use _mesa_prepare_mipmap_levels()Brian Paul2016-03-291-24/+8
| | | | | | | | | | | | | | | | | | | | | | | | The prepare_mipmap_level() wrapper for _mesa_prepare_mipmap_level() is not needed. It only served to undo the GL_TEXTURE_1D_ARRAY height/depth change was was made before the call to prepare_mipmap_level() Said another way, regardless of how the meta code manipulates the height/ depth dims for GL_TEXTURE_1D_ARRAY, the gl_texture_image dimensions are correctly set up by _mesa_prepare_mipmap_levels(). Tested by plugging _mesa_meta_GenerateMipmap() into the swrast driver and testing with piglit. v2 (idr): Early out of the mipmap generation loop with dstImage is NULL. This can occur for immutable textures that have a limited range of levels or in the presense of memory allocation failures. Fixes arb_texture_view-mipgen on Intel platforms. Reviewed-by: José Fonseca <[email protected]> Reviewed-by: Roland Scheidegger <[email protected]> Reviewed-by: Ian Romanick <[email protected]> Tested-by: Ian Romanick <[email protected]> Reviewed-by: Brian Paul <[email protected]>
* xlib: add support for GLX_ARB_create_contextBrian Paul2016-03-293-0/+77
| | | | | | | | | | | | | | | | | This adds the glXCreateContextAttribsARB() function for the xlib/swrast driver. This allows more piglit tests to run with this driver. For example, without this patch we get: $ bin/fbo-generatemipmap-1d -auto piglit: error: waffle_config_choose failed due to WAFFLE_ERROR_UNSUPPORTED_ ON_PLATFORM: GLX_ARB_create_context is required in order to request an OpenGL version not equal to the default value 1.0 piglit: error: Failed to create waffle_config for OpenGL 2.0 Compatibility Context piglit: info: Failed to create any GL context PIGLIT: {"result": "skip" } Reviewed-by: Jose Fonseca <[email protected]> Acked-by: Roland Scheidegger <[email protected]>
* i965: Don't use CUBE wrap modes for integer formats on IVB/BYT.Kenneth Graunke2016-03-291-1/+5
| | | | | | | | | | | | | | | | There is no linear filtering for integer formats, so we should always be using CLAMP_TO_EDGE mode. Fixes 46 dEQP cases on Ivybridge (which were likely broken by commit 0faf26e6a0a34c3544644852802484f2404cc83e). This workaround doesn't appear to be necessary on any other hardware; I haven't found any documentation mentioning errata in this area. v2: Only apply on Ivybridge/Baytrail to avoid regressing GLES3.1 tests. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]> [v1]
* Revert "i965: Set address rounding bits for GL_NEAREST filtering as well."Kenneth Graunke2016-03-291-6/+3
| | | | | | | This reverts commit 60d6a8989ab44cf47accee6bc692ba6fb98f6a9f. It's pretty sketchy, and apparently regressed a bunch of dEQP tests on Sandybridge.