path: root/src
Commit message | Author | Age | Files | Lines
* glapi: add ARB_gpu_shader_fp64 (v2) | Dave Airlie | 2015-02-19 | 7 | -37/+465
    Just add the xml file covering this extension, and dummy interface
    files in mesa, and fix up sanity tests.

    v2: Enable ProgramUniform*d* from ARB_separate_shader_objects (Ian)
        use 40 instead of 43 for dispatch_sanity.cpp (Chris)
        uncomment PU sanity tests.

    Signed-off-by: Dave Airlie <[email protected]>
    Reviewed-by: Ian Romanick <[email protected]>
    Reviewed-by: Ilia Mirkin <[email protected]>
* freedreno: add missing PIPE_CAP_RESOURCE_FROM_USER_MEMORY to switch | Ilia Mirkin | 2015-02-19 | 1 | -0/+1
    Signed-off-by: Ilia Mirkin <[email protected]>
* freedreno/a3xx: add ARB_instanced_arrays support | Ilia Mirkin | 2015-02-19 | 2 | -2/+3
    Signed-off-by: Ilia Mirkin <[email protected]>
* freedreno/a3xx: add support for vertexid and instanceid sysvals | Ilia Mirkin | 2015-02-19 | 4 | -16/+119
    Signed-off-by: Ilia Mirkin <[email protected]>
* freedreno: pass number of instances to draw | Ilia Mirkin | 2015-02-19 | 8 | -18/+22
    Signed-off-by: Ilia Mirkin <[email protected]>
* freedreno/a3xx: add ETC2 decoding support | Ilia Mirkin | 2015-02-19 | 2 | -4/+17
    Signed-off-by: Ilia Mirkin <[email protected]>
* st/mesa: pass etc2 textures to driver if supported | Ilia Mirkin | 2015-02-19 | 4 | -11/+40
    If the driver actually supports ETC2, don't decode it in software.

    Signed-off-by: Ilia Mirkin <[email protected]>
    Reviewed-by: Marek Olšák <[email protected]>
* llvmpipe,softpipe: only support ETC1, not the upcoming ETC2 | Ilia Mirkin | 2015-02-18 | 2 | -0/+8
    Signed-off-by: Ilia Mirkin <[email protected]>
* gallium: add ETC2 format support | Ilia Mirkin | 2015-02-18 | 7 | -114/+104
    No actual decoding is added, similar faking mechanism to bptc.

    Signed-off-by: Ilia Mirkin <[email protected]>
    Reviewed-by: Marek Olšák <[email protected]>
* freedreno/a3xx: add hardware ETC1 support | Ilia Mirkin | 2015-02-18 | 2 | -0/+4
    Signed-off-by: Ilia Mirkin <[email protected]>
* gallium/dri: Shut up a compiler warning. | Eric Anholt | 2015-02-18 | 1 | -1/+1
    The compiler doesn't see that buffers is set in the !image case and
    used in the !image case.

    Reviewed-by: Matt Turner <[email protected]>
* nir: Recognize and reduce duplicated fsats. | Eric Anholt | 2015-02-18 | 1 | -0/+2
    No effect on vc4 shader-db.

    v2: Rebase to master (no TGSI->NIR present)

    Reviewed-by: Kenneth Graunke <[email protected]> (v1)
* nir: Add a flag for lowering fsat. | Eric Anholt | 2015-02-18 | 2 | -1/+3
    vc4 cse/algebraic-disabled stats:
    total instructions in shared programs: 44356 -> 44354 (-0.00%)
    instructions in affected programs: 55 -> 53 (-3.64%)

    v2: Rebase to master (no TGSI->NIR present)

    Reviewed-by: Kenneth Graunke <[email protected]> (v1)
* nir: Add a flag for lowering ffma. | Eric Anholt | 2015-02-18 | 2 | -1/+3
    vc4 cse/algebraic-disabled stats:
    total uniforms in shared programs: 13966 -> 13791 (-1.25%)
    uniforms in affected programs: 435 -> 260 (-40.23%)
    total instructions in shared programs: 44732 -> 44356 (-0.84%)
    instructions in affected programs: 9599 -> 9223 (-3.92%)

    v2: Rebase to master (no TGSI->NIR present)

    Reviewed-by: Kenneth Graunke <[email protected]> (v1)
* nir: Add a flag for lowering fneg/ineg. | Eric Anholt | 2015-02-18 | 2 | -0/+12
    vc4 cse/algebraic-disabled stats:
    total instructions in shared programs: 44911 -> 44732 (-0.40%)
    instructions in affected programs: 11371 -> 11192 (-1.57%)

    v2: Fix broken iabs(isub(0, a)) transformation.
    v3: Rebase to master (no TGSI->NIR present)

    Reviewed-by: Kenneth Graunke <[email protected]> (v1)
* nir: Add a flag for lowering fsqrt(x) to frcp(frsqrt(x)). | Eric Anholt | 2015-02-18 | 2 | -1/+3
    vc4 cse/algebraic-disabled stats:
    total uniforms in shared programs: 13972 -> 13966 (-0.04%)
    uniforms in affected programs: 408 -> 402 (-1.47%)
    total instructions in shared programs: 44973 -> 44911 (-0.14%)
    instructions in affected programs: 1551 -> 1489 (-4.00%)

    v2: Rebase to master (no TGSI->NIR present)

    Reviewed-by: Kenneth Graunke <[email protected]> (v1)
* nir: Add lowering of POW instructions if the lower flag is set. | Eric Anholt | 2015-02-18 | 1 | -0/+1
    This could be done in a separate pass like we do in GLSL IR, but it
    seems to me like having the definitions of the transformations in the
    two directions next to each other makes a lot of sense.

    v2: Reorder the comment about the transformation.

    Reviewed-by: Kenneth Graunke <[email protected]>
* nir: Conditionalize the POW reconstruction on shader compiler options. | Eric Anholt | 2015-02-18 | 3 | -2/+6
    Mesa has a shader compiler struct flagging whether GLSL IR's
    opt_algebraic and other passes should try and generate certain types of
    opcodes or patterns. Extend that to NIR by defining our own struct,
    which is automatically generated from the Mesa struct in glsl_to_nir
    and provided directly by the driver in TGSI-to-NIR.

    v2: Split out the previous two prep patches.
    v3: Rebase to master (no TGSI->NIR present)

    Reviewed-by: Kenneth Graunke <[email protected]> (v2)
* nir: Add an optional expression controlling nir_algebraic xforms. | Eric Anholt | 2015-02-18 | 1 | -7/+32
    This will be used so that we can customize the transforms for the
    target GPU, so we don't un-lower expressions that had already been
    lowered (or introduce new lowering transformations that not all GPUs
    want)

    v2: Drop the complication of having the condition->index dictionary,
        since we don't actually expect there to be many different
        conditions (change by Kenneth).

    Reviewed-by: Kenneth Graunke <[email protected]>
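    The idea of attaching an optional condition to each algebraic transform
    can be pictured with a toy rewriter. This is a sketch only, not Mesa's
    nir_algebraic code: the tuple layout, the Options class and the rewrite()
    helper are assumptions made for illustration, and only the flag names
    lower_fsqrt and lower_ffma are borrowed from the neighbouring commits.

```python
from dataclasses import dataclass


@dataclass
class Options:
    """Hypothetical stand-in for per-driver shader compiler options."""
    lower_fsqrt: bool = False
    lower_ffma: bool = False


# Each rule is (search, replace, condition). Patterns are nested tuples whose
# first element is an opcode; bare strings elsewhere are pattern variables.
RULES = [
    (("fsqrt", "a"), ("frcp", ("frsqrt", "a")), lambda o: o.lower_fsqrt),
    (("ffma", "a", "b", "c"), ("fadd", ("fmul", "a", "b"), "c"),
     lambda o: o.lower_ffma),
    # The reverse direction is only allowed when the GPU wants ffma, so an
    # expression that was lowered is never "un-lowered" again.
    (("fadd", ("fmul", "a", "b"), "c"), ("ffma", "a", "b", "c"),
     lambda o: not o.lower_ffma),
]


def match(pattern, expr, env):
    """Try to bind pattern variables in `env` so that pattern equals expr."""
    if isinstance(pattern, str):
        if pattern in env:
            return env[pattern] == expr
        env[pattern] = expr
        return True
    if not isinstance(expr, tuple) or len(expr) != len(pattern):
        return False
    if pattern[0] != expr[0]:
        return False
    return all(match(p, e, env) for p, e in zip(pattern[1:], expr[1:]))


def substitute(template, env):
    """Build the replacement expression from the bound variables."""
    if isinstance(template, str):
        return env[template]
    return (template[0],) + tuple(substitute(t, env) for t in template[1:])


def rewrite(expr, options, rules=RULES):
    """One bottom-up pass; rules whose condition is false are skipped."""
    if isinstance(expr, tuple):
        expr = (expr[0],) + tuple(rewrite(e, options, rules) for e in expr[1:])
    for search, replace, cond in rules:
        if cond is not None and not cond(options):
            continue
        env = {}
        if match(search, expr, env):
            return substitute(replace, env)
    return expr
```

    With Options(lower_fsqrt=True), the expression ("fsqrt", "x") is rewritten
    to ("frcp", ("frsqrt", "x")); with default options it is left alone.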
* nir: Add a nir_shader_compiler_options struct pointed to by the shaders. | Eric Anholt | 2015-02-18 | 4 | -4/+40
    This will be used to give the optimization passes a chance to customize
    behavior for the particular target device.

    v2: Rebase to master (no TGSI->NIR present)

    Reviewed-by: Kenneth Graunke <[email protected]> (v1)
* i965/simd8vs: Fix SIMD8 atomics (read-only) | Jordan Justen | 2015-02-18 | 1 | -8/+16
    An update for d9cd982d556be560af3bcbcdaf62b6b93eb934a5.

    A similar change was needed for CS to allow the piglit test
    tests/spec/arb_compute_shader/execution/simple-barrier-atomics.shader_test
    to pass.

    The previous change (d9cd982d) should fix cases that write atomics,
    such as atomicCounterIncrement, and this change will fix cases that
    only read atomics, such as atomicCounter.

    Signed-off-by: Jordan Justen <[email protected]>
    Reviewed-by: Ben Widawsky <[email protected]>
    Reviewed-by: Francisco Jerez <[email protected]>
* ilo: fix PCB alloc asserts on Gen7.5 GT3 | Chia-I Wu | 2015-02-18 | 1 | -1/+5
    GT3 has two slices and all limits are doubled.
* ilo: fix compiler warnings | Chia-I Wu | 2015-02-18 | 3 | -8/+12
    Fix -Wmaybe-uninitialized warnings. The change to
    ilo_blit_resolve_slices_for_hiz() is a potential bug fix.
* i915: For the love of all that is holy, stop saying "IGD" | Adam Jackson | 2015-02-18 | 1 | -7/+7
    a001 and a011 are pineview chips. Say so.

    Reviewed-by: Matt Turner <[email protected]>
    Signed-off-by: Adam Jackson <[email protected]>
* auxiliary/vl: honour the DRI2PROTO_CFLAGS | Emil Velikov | 2015-02-18 | 1 | -0/+1
    Otherwise for non-default installations the build will fail to find the
    headers and error out.

    Cc: "10.5" <[email protected]>
    Signed-off-by: Emil Velikov <[email protected]>
* auxiliary/vl: Build vl_winsys_dri.c only when needed. | Emil Velikov | 2015-02-18 | 1 | -0/+4
    With commit c39dbfdd0f7 (auxiliary/vl: bring back the VL code for the
    dri targets) we did not fully consider users of dri-swrast alone. Thus
    we ended up trying to compile the dri2 specific code on platforms which
    lack it, Cygwin for example.

    Cc: "10.5" <[email protected]>
    Reported-by: Jon TURNEY <[email protected]>
    Signed-off-by: Emil Velikov <[email protected]>
    Reviewed-by: Jon TURNEY <[email protected]>
* glx: do not leak the dri2 extension information | Emil Velikov | 2015-02-18 | 1 | -1/+2
    The XExtensionInfo is allocated dynamically (if the pointer is NULL) in
    the XEXT_GENERATE_FIND_DISPLAY macro. On the other hand the macro
    XEXT_GENERATE_CLOSE_DISPLAY does not check/free the memory.

    Follow the example set by dri1 and appledri, and use a static variable.

    Spotted while hunting "still reachable" leaks in Waffle.

    Signed-off-by: Emil Velikov <[email protected]>
* Revert "radeon/llvm: enable unsafe math for graphics shaders" | Michel Dänzer | 2015-02-18 | 1 | -4/+0
    This reverts commit 0e9cdedd2e3943bdb7f3543a3508b883b167e427.

    It caused the grass to disappear in The Talos Principle.

    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89069
    Cc: "10.5 10.4" <[email protected]>
    Reviewed-by: Tom Stellard <[email protected]>
    Reviewed-by: Marek Olšák <[email protected]>
* st/mesa: add ARB_pipeline_statistics_query support | Ilia Mirkin | 2015-02-18 | 2 | -4/+55
    Signed-off-by: Ilia Mirkin <[email protected]>
    Reviewed-by: Marek Olšák <[email protected]>
* i965: implement ARB_pipeline_statistics_query | Ben Widawsky | 2015-02-17 | 3 | -0/+106
    NOTE: The implementation was initially one patch, this. All the history
    is kept here, even though all the core mesa changes were moved to the
    parent of this patch.

    This patch implements ARB_pipeline_statistics_query. This addition to
    GL does not add a new API. Instead, it adds new tokens to the existing
    query APIs. The work to hook up the new tokens is trivial due to its
    similarity to the previous work done for the query APIs. I've
    implemented all the new tokens to some degree, but have stubbed out the
    untested ones at the entry point for Begin(). Doing this should allow
    the remainder of the code to be left in.

    The new tokens give GL clients a way to obtain stats about the GL
    pipeline. Generally, you get the number of things going in
    (invocations) and the number of things coming out (primitives) of the
    various stages. There are two immediate uses for this: performance
    information, and debugging various types of misrendering. I doubt one
    can use these for debugging very complex applications, but for piglit
    tests, it should be quite useful.

    Tessellation shaders, and compute shaders are not addressed in this
    patch because there is no upstream implementation. I've implemented how
    I believe tessellation shader stats will work for Intel hardware
    (though there is a bit of ambiguity). Compute shaders are a bit more
    interesting though, and I don't yet know what we'll do there.

    For the lazy, here is a link to the relevant part of the spec:
    https://www.opengl.org/registry/specs/ARB/pipeline_statistics_query.txt

    Running the piglit tests
    http://lists.freedesktop.org/archives/piglit/2014-November/013321.html
    (http://cgit.freedesktop.org/~bwidawsk/piglit/log/?h=pipe_stats)
    yields the following results:
    > piglit-run.py -t stats tests/all.py output/pipeline_stats
    > [5/5] pass: 5 Running Test(s): 5

    v2:
    - Don't allow pipeline_stats to be per stream (Ilia). This may (not
      sure) be needed for AMD_transform_feedback4, which we do not support.
      > If AMD_transform_feedback4 is supported then
      > GEOMETRY_SHADER_PRIMITIVES_EMITTED_ARB counts primitives emitted to
      > any of the vertex streams for which STREAM_RASTERIZATION_AMD is
      > enabled.
    - Remove comment from GL3.txt because it is only used for extensions
      that are part of required versions (Ilia)
    - Move the new tokens to a new XML doc instead of using the main
      GL4x.xml (Ilia)
    - Add a fallthrough comment (Ilia)
    - Only divide PS invocations by 4 on HSW+ (Ben)

    v3:
    - Add ARB_pipeline_statistics_query to relnotes.html
    - Add ARB_pipeline_statistics_query.xml to the Makefile.am, and master
      XML (Ilia)
    - Correct extension number (Ilia)
    - Add link to xml in the main GL API xml (Ilia)
    - remove special GS case from gen6_end_query (Ian)
    - Make lookup table static so gcc doesn't initialize it on every call
      (Ian)
    - Use if (_mesa_has_geometry_shaders(ctx)) instead of explicit checks
      (Ian)
    - Core mesa parts moved into a prep patch (Ilia)

    v4:
    - Change to 10.6 relnotes since we missed 10.5 window
    - Moved compute shader stuff into the switch statement (Jordan)
    - Jordan: Add compute shader support

    v5:
    - Fixed relnote style (Ilia)

    v6:
    - Rebased on master which beat me to adding the first relnotes -
      essentially this undoes v5 (which had a typo anyway)
    - Some code style fixes (Ken)
    - Remove some excess comments (Ken)
    - Unify tessellation failure style - unreachable (Ken)
    - Fix workaround comment for PS invocations (Ken)

    Signed-off-by: Ben Widawsky <[email protected]>
    Reviewed-by: Kenneth Graunke <[email protected]>
* mesa: Add support for the ARB_pipeline_statistics_query extension | Ben Widawsky | 2015-02-17 | 7 | -0/+136
    This was originally part of a single patch which added the extension,
    and implemented it for i965 classic. For information about the
    evolution of the patch, please see the subsequent commit.

    One difference here as compared to the original mega patch is this does
    build support for the compute shader query. Since it cannot be tested
    on any platform, it will always return NULL for now. Jordan has already
    written a patch to address this, and when that patch lands, this logic
    can be modified.

    v2:
    Fix typo in subject (Brian Paul)
    Add checks for desktop gl (Ilia)
    Fail for any callers for now (Ilia)
    Update QueryCounterBits for new tokens (Ilia)
    Jordan: Use _mesa_has_compute_shaders

    Signed-off-by: Ben Widawsky <[email protected]>
    Reviewed-by: Ilia Mirkin <[email protected]>

    v3: Rebased on patch which adds the proper information to unstub
    tessellation shaders.

    Reviewed-by: Kenneth Graunke <[email protected]>
* mesa: Add _mesa_has_compute_shaders | Jordan Justen | 2015-02-17 | 1 | -0/+11
    v2 (Ben): Change GLboolean to bool as requested by Ian

    Signed-off-by: Jordan Justen <[email protected]>
    Reviewed-by: Ilia Mirkin <[email protected]>
    Reviewed-by: Ben Widawsky <[email protected]>
    Reviewed-by: Kenneth Graunke <[email protected]>
    Signed-off-by: Ben Widawsky <[email protected]>
* mesa: Add ARB_tessellation_shader to extension table. | Fabian Bieler | 2015-02-17 | 2 | -0/+2
    Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Prefer Meta over the BLT for BlitFramebuffer. | Kenneth Graunke | 2015-02-17 | 1 | -7/+7
    There's some debate about whether we should use Meta or BLORP, but
    either should run circles around the BLT engine.

    In particular, this means that Gen8+ will use the 3D engine for blits,
    like we do on Gen6-7.

    Improves performance in "copypixrate -blit -back" (from Mesa demos) by
    232.037% +/- 3.15795% (n=10) on Broadwell GT3e.

    v2: Rebase on Laura's changes.

    Signed-off-by: Kenneth Graunke <[email protected]>
    Reviewed-by: Matt Turner <[email protected]>
    Cc: "10.5" <[email protected]>
* i965/fs: Add algebraic optimizations for MAD. | Matt Turner | 2015-02-17 | 1 | -0/+43
    total instructions in shared programs: 5764176 -> 5763808 (-0.01%)
    instructions in affected programs: 25121 -> 24753 (-1.46%)
    helped: 164
    HURT: 2

    Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Emit MAD instructions when possible. | Matt Turner | 2015-02-17 | 2 | -8/+8
    Previously we didn't emit MAD instructions since they cannot take
    immediate arguments, but with the opt_combine_constants() pass we can
    handle this properly.

    total instructions in shared programs: 5920017 -> 5733278 (-3.15%)
    instructions in affected programs: 3625153 -> 3438414 (-5.15%)
    helped: 22017
    HURT: 870
    GAINED: 91
    LOST: 49

    Without constant pooling, this patch is a complete loss:

    total instructions in shared programs: 5912589 -> 5987888 (1.27%)
    instructions in affected programs: 3190050 -> 3265349 (2.36%)
    helped: 1564
    HURT: 17827
    GAINED: 27
    LOST: 101

    And since the constant pooling patch by itself hurt a bunch of things,
    from before constant pooling to this patch the results are:

    total instructions in shared programs: 5895414 -> 5747946 (-2.50%)
    instructions in affected programs: 3617993 -> 3470525 (-4.08%)
    helped: 20478
    HURT: 4469
    GAINED: 54
    LOST: 146

    Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Allow immediates in MAD and LRP instructions. | Matt Turner | 2015-02-17 | 2 | -3/+33
    And then the opt_combine_constants() pass will pull them out into
    registers. This will allow us to do some algebraic optimizations on MAD
    and LRP.

    total instructions in shared programs: 5946656 -> 5931320 (-0.26%)
    instructions in affected programs: 778247 -> 762911 (-1.97%)
    helped: 3780
    HURT: 6
    GAINED: 12
    LOST: 12

    Reviewed-by: Kenneth Graunke <[email protected]>
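    The two-step scheme described here (let MAD and LRP carry immediates in
    the IR, then have a later pass move them into registers) can be sketched
    with a toy pass. This is not the i965 opt_combine_constants()
    implementation; the instruction encoding, the register naming and the
    combine_constants() helper below are all assumptions for illustration.

```python
def combine_constants(instructions, no_imm_ops=("mad", "lrp")):
    """Hoist immediates out of ops that cannot encode them.

    Each instruction is a tuple (opcode, dst, *srcs). A float source is an
    immediate and a str source is a register name. Every distinct immediate
    used by an op in `no_imm_ops` gets exactly one MOV into a fresh register.
    Note: values are keyed by Python equality, so 0.0 and -0.0 would share a
    register here; a real pass would have to key on the bit pattern.
    """
    const_regs = {}   # immediate value -> register holding it
    movs = []         # MOVs emitted ahead of the original code
    body = []
    for op, dst, *srcs in instructions:
        if op in no_imm_ops:
            new_srcs = []
            for src in srcs:
                if isinstance(src, float):
                    if src not in const_regs:
                        reg = f"c{len(const_regs)}"
                        const_regs[src] = reg
                        movs.append(("mov", reg, src))
                    src = const_regs[src]
                new_srcs.append(src)
            srcs = new_srcs
        body.append((op, dst, *srcs))
    return movs + body
```

    Ops outside no_imm_ops keep their immediates, which is why the pass on
    its own only adds MOVs and pays off once MAD is actually emitted.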
* i965/fs: Add pass to combine immediates. | Matt Turner | 2015-02-17 | 4 | -0/+287
    total instructions in shared programs: 5885407 -> 5940958 (0.94%)
    instructions in affected programs: 3617311 -> 3672862 (1.54%)
    helped: 3
    HURT: 23556
    GAINED: 31
    LOST: 165

    ... but will allow us to always emit MAD instructions.

    Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Remove force_writemask_all assertion for execsize < 8. | Matt Turner | 2015-02-17 | 1 | -1/+0
    This doesn't seem to be necessary.

    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=86974
    Reviewed-by: Kenneth Graunke <[email protected]>
* i965/cfg: Add function to generate a dot file of the dominator tree. | Matt Turner | 2015-02-17 | 2 | -0/+11
    Reviewed-by: Kenneth Graunke <[email protected]>
* i965/cfg: Add function to generate a dot file of the CFG. | Matt Turner | 2015-02-17 | 2 | -0/+15
    Reviewed-by: Kenneth Graunke <[email protected]>
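    For readers unfamiliar with the format, a Graphviz dot dump of a CFG is
    just a list of directed edges. The sketch below is not the i965 code; the
    cfg_to_dot() helper and the successor-map input are assumptions chosen to
    show what such a file looks like.

```python
def cfg_to_dot(successors, name="CFG"):
    """Render a {block: [successor blocks]} map as Graphviz dot text."""
    lines = [f"digraph {name} {{"]
    for block in sorted(successors):
        for succ in successors[block]:
            lines.append(f"\t{block} -> {succ};")
    lines.append("}")
    return "\n".join(lines)
```

    The returned text can be written to a file and rendered with the dot
    tool, for example "dot -Tpng cfg.dot -o cfg.png".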
* i965/cfg: Calculate the immediate dominators. | Matt Turner | 2015-02-17 | 2 | -4/+76
    Reviewed-by: Kenneth Graunke <[email protected]>
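    The commit message does not say how the immediate dominators are
    computed. One common approach is the iterative scheme of Cooper, Harvey
    and Kennedy, sketched below on a plain successor map. The function name
    and input format are assumptions, and this should not be read as a
    description of the i965 implementation.

```python
def immediate_dominators(successors, entry):
    """Return {block: immediate dominator}; the entry block maps to itself."""
    # Depth-first search to obtain a reverse postorder of reachable blocks.
    order, seen = [], set()

    def dfs(block):
        seen.add(block)
        for succ in successors.get(block, ()):
            if succ not in seen:
                dfs(succ)
        order.append(block)

    dfs(entry)
    order.reverse()
    rpo = {block: index for index, block in enumerate(order)}

    preds = {block: [] for block in order}
    for block in order:
        for succ in successors.get(block, ()):
            preds[succ].append(block)

    idom = {entry: entry}

    def intersect(a, b):
        # Walk both blocks up the current dominator tree until they meet.
        while a != b:
            while rpo[a] > rpo[b]:
                a = idom[a]
            while rpo[b] > rpo[a]:
                b = idom[b]
        return a

    changed = True
    while changed:
        changed = False
        for block in order[1:]:
            # Only predecessors that already have an idom can contribute.
            done = [p for p in preds[block] if p in idom]
            new = done[0]
            for pred in done[1:]:
                new = intersect(pred, new)
            if idom.get(block) != new:
                idom[block] = new
                changed = True
    return idom
```

    On a diamond-shaped CFG (0 branches to 1 and 2, both join at 3), every
    block other than the entry ends up with block 0 as its immediate
    dominator.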
* i965/cfg: Allow cfg::dump to be called without a visitor. | Matt Turner | 2015-02-17 | 1 | -1/+2
    The fs_visitor's dump_instruction() implementation calls cfg_t()
    indirectly through calculate_live_intervals, so if you have an infinite
    loop in the CFG code, you can't call cfg::dump(fs_visitor *) to debug
    it.

    Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Allow exec_list sentinels as arguments to insert functions. | Matt Turner | 2015-02-17 | 1 | -2/+4
    To insert an instruction at the end of a basic block, we typically do
    something like

       inst = block->last_non_control_flow_inst();
       inst->insert_after(block, new_inst);

    But blocks can consist of a single control flow instruction, so inst
    will actually be the exec_list's head sentinel. We shouldn't use it as
    if it were a regular instruction, but it is safe to insert something
    after it.

    This patch avoids assert-failing because an exec_list sentinel wasn't
    in the basic block's instruction list.

    Reviewed-by: Kenneth Graunke <[email protected]>
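    Why inserting after the head sentinel is safe can be seen in a small
    model of a sentinel-terminated doubly linked list. The Node and ExecList
    classes below are hypothetical and use two separate sentinel nodes for
    clarity; they are not Mesa's exec_list layout.

```python
class Node:
    def __init__(self, value=None):
        self.value = value
        self.prev = None
        self.next = None

    def insert_after(self, new):
        """Link `new` right after this node; works for the head sentinel."""
        assert self.next is not None, "cannot insert after the tail sentinel"
        new.prev, new.next = self, self.next
        self.next.prev = new
        self.next = new


class ExecList:
    def __init__(self):
        self.head = Node()   # head sentinel, never holds an instruction
        self.tail = Node()   # tail sentinel, never holds an instruction
        self.head.next = self.tail
        self.tail.prev = self.head

    def values(self):
        out, node = [], self.head.next
        while node is not self.tail:
            out.append(node.value)
            node = node.next
        return out

    def last_non_control_flow(self, is_control_flow):
        """Walk back past control-flow nodes; may return the head sentinel."""
        node = self.tail.prev
        while node is not self.head and is_control_flow(node.value):
            node = node.prev
        return node
```

    If the block holds only a control-flow instruction, the backwards walk
    stops at the head sentinel, and insert_after() on it still places the new
    instruction in front of the control flow.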
* Make _mesa_swizzle_and_convert argument types in .c match those in .h | Alan Coopersmith | 2015-02-17 | 1 | -2/+2
    Caused Solaris Studio compilers to fail to build with errors about
    incompatible function redefinitions.

    Signed-off-by: Alan Coopersmith <[email protected]>
    Cc: "10.5" <[email protected]>
    Reviewed-by: Emil Velikov <[email protected]>
    Reviewed-by: Brian Paul <[email protected]>
* Use __typeof instead of typeof with Solaris Studio compilers | Alan Coopersmith | 2015-02-17 | 1 | -3/+3
    While the C compiler accepts typeof, C++ requires __typeof.

    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=86944
    Signed-off-by: Alan Coopersmith <[email protected]>
    Cc: "10.5" <[email protected]>
    Reviewed-by: Emil Velikov <[email protected]>
    Reviewed-by: Brian Paul <[email protected]>
* Avoid fighting with Solaris headers over isnormal() | Alan Coopersmith | 2015-02-17 | 1 | -1/+1
    When compiling in C99 or C++11 modes, Solaris defines isnormal() as a
    macro via <math.h>, which causes the function definition to become too
    mangled to compile.

    Signed-off-by: Alan Coopersmith <[email protected]>
    Cc: "10.5" <[email protected]>
    Reviewed-by: Emil Velikov <[email protected]>
    Reviewed-by: Brian Paul <[email protected]>
* Remove extraneous ; after DECL_TYPE usage | Alan Coopersmith | 2015-02-17 | 1 | -33/+33
    The macro is defined to provide a trailing ; so this caused the
    expansion to end in ";;" which made the Solaris Studio compilers issue
    warnings for every line of:

    "builtin_type_macros.h", line 113: Warning: extra ";" ignored.

    for every file that included the header, filling build logs with
    thousands of useless warnings.

    Signed-off-by: Alan Coopersmith <[email protected]>
    Cc: "10.5" <[email protected]>
    Reviewed-by: Emil Velikov <[email protected]>
    Reviewed-by: Brian Paul <[email protected]>
* glsl: Reduce memory consumption of copy propagation passes. | Kenneth Graunke | 2015-02-17 | 2 | -6/+25
    opt_copy_propagation and opt_copy_propagation_elements create new ACP
    and Kill sets each time they enter a new control flow block. For if
    blocks, they also copy the entire existing ACP set contents into the
    new set.

    When we exit the control flow block, we discard the new sets. However,
    we weren't freeing them - so they lived on until the pass finished.
    This can waste a lot of memory (57MB on one pessimal shader).

    This patch makes the pass allocate ACP entries using this->acp as the
    memory context, and Kill entries out of this->kill. It also steals kill
    entries when moving them from the inner kill list to the parent. It
    then frees the lists, including their contents.

    v2: Move ralloc_free(this->acp) just before this->acp = orig_acp
        (suggested by Eric Anholt).

    Signed-off-by: Kenneth Graunke <[email protected]>
    Reviewed-by: Eric Anholt <[email protected]>
    Reviewed-by: Ian Romanick <[email protected]>
    Cc: "10.5 10.4" <[email protected]>
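    The ownership pattern in this fix (allocate entries out of the list that
    holds them, reparent the ones that must survive, then free the list with
    everything still in it) can be modelled with a toy hierarchical
    allocator. The Ctx class and leave_block() helper are hypothetical; they
    only imitate the parent/child idea and are not the ralloc API.

```python
class Ctx:
    """Toy hierarchical allocation context (hypothetical, ralloc-like)."""
    live = 0   # number of contexts currently allocated

    def __init__(self, parent=None):
        self.parent = None
        self.children = []
        Ctx.live += 1
        if parent is not None:
            parent.adopt(self)

    def adopt(self, child):
        """Reparent `child` under this context (the 'steal' operation)."""
        if child.parent is not None:
            child.parent.children.remove(child)
        child.parent = self
        self.children.append(child)

    def free(self):
        """Free this context and everything still allocated out of it."""
        for child in list(self.children):
            child.free()
        if self.parent is not None:
            self.parent.children.remove(self)
            self.parent = None
        Ctx.live -= 1


def leave_block(inner_kill, outer_kill, survivors):
    """On leaving a control-flow block: keep `survivors`, free the rest."""
    for entry in survivors:
        outer_kill.adopt(entry)
    inner_kill.free()
```

    Because the surviving entries were moved to the outer list first, freeing
    the inner list releases only the entries nobody needs any more, instead
    of leaving all of them alive until the pass finishes.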
* i965: Add device limits for tess threads & URB entries | Chris Forbes | 2015-02-17 | 4 | -0/+48
    This should cover all platforms prior to Skylake.

    Signed-off-by: Chris Forbes <[email protected]>
    Signed-off-by: Kenneth Graunke <[email protected]>
    Acked-by: Ben Widawsky <[email protected]>