summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* automake,scons: Put NIR source files in a separate var to fix SCons build.Jose Fonseca2015-04-012-1/+4
| | | | | | SCons does not build NIR yet. Trivial.
* automake: Fix out-of-source builds.Jose Fonseca2015-04-011-0/+1
| | | | | | Add include path for generated nir_opcodes.h. Trivial.
* mesa: don't include colormac.h in format codeBrian Paul2015-04-013-2/+2
| | | | | Acked-by: Matt Turner <[email protected]> Reviewed-by: Mark Janes <[email protected]>
* mesa: remove unneeded #include of colormac.hBrian Paul2015-04-0113-13/+5
| | | | | Acked-by: Matt Turner <[email protected]> Reviewed-by: Mark Janes <[email protected]>
* tnl: remove unneeded #include of colormac.hBrian Paul2015-04-0111-11/+1
| | | | | Acked-by: Matt Turner <[email protected]> Reviewed-by: Mark Janes <[email protected]>
* swrast: remove unneeded #include of colormac.hBrian Paul2015-04-0119-19/+4
| | | | | Acked-by: Matt Turner <[email protected]> Reviewed-by: Mark Janes <[email protected]>
* mesa: remove unused macros from colormac.hBrian Paul2015-04-011-45/+0
| | | | | Acked-by: Matt Turner <[email protected]> Reviewed-by: Mark Janes <[email protected]>
* nir: Recognize a pattern of bool frobbing from TGSI KILL_IF.Eric Anholt2015-04-011-0/+2
| | | | | | | | | | | | TGSI's conditional discards take float arg and negate it, so GLSL to TGSI generates a b2f and negates that value. Only, in NIR we want a proper bool once again, so we compare with 0. This is a lot of pointless extra instructions. total instructions in shared programs: 39735 -> 39702 (-0.08%) instructions in affected programs: 1342 -> 1309 (-2.46%) Reviewed-by: Connor Abbott <[email protected]>
* nir: Recognize a pattern for doing b2f without the opcode.Eric Anholt2015-04-011-0/+1
| | | | | | | Since we have patterns based on b2f, generate them if we see the b2f equivalent using an iand. This is common when generating NIR from TGSI. Reviewed-by: Connor Abbott <[email protected]>
* vc4: Add shader-db dumping of NIR instruction count.Eric Anholt2015-04-011-0/+29
| | | | | | | | I was previously using temporary disables of VC4 optimization to show the benefits of improved NIR optimization, but this can get me quick and dirty numbers for NIR-only improvements without having to add hacks to disable VC4's code (disabling of which might hide ways that the NIR changes would hurt actual VC4 codegen).
* vc4: Convert to consuming NIR.Eric Anholt2015-04-015-720/+707
| | | | | | | | | | | | | | | | | | | NIR brings us better optimization than I would have bothered to write within the driver, developers sharing future optimization work, and the ability to share device-specific lowering code that we and other GLES2-level drivers need. total uniforms in shared programs: 13421 -> 13422 (0.01%) uniforms in affected programs: 62 -> 63 (1.61%) total instructions in shared programs: 39961 -> 39707 (-0.64%) instructions in affected programs: 15494 -> 15240 (-1.64%) v2: Add missing imov support, and assert that there are no dest saturates. v3: Rebase on the target-specific algebraic series. v4: Rebase on gallium-includes-from-NIR changes in mater. v5: Rebase on variables being in lists instead of hash tables. v6: Squash in intermediate changes that used the NIR-to-TGSI pass (which I'm not committing)
* gallium: Add tgsi_to_nir to get a nir_shader for a TGSI shader.Eric Anholt2015-04-013-0/+1454
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This will be used by the VC4 driver for doing device-independent optimization, and hopefully eventually replacing its whole IR. It also may be useful to other drivers for the same reason. v2: Add all of the instructions I was relying on tgsi_lowering to remove, and more. v3: Rebase on SSA rework of the builder. v4: Use the NIR ineg operation instead of doing a src modifier. v5: Don't use ineg for fnegs. (infer_src_type on MOV doesn't do what I expect, again). v6: Fix handling of multi-channel KILL_IF sources. v7: Make ttn_get_f() return a swizzle of a scalar load_const, rather than a vector load_const. CSE doesn't recognize that srcs out of those channels are actually all the same. v8: Rebase on nir_builder auto-sizing, make the scalar arguments to non-ALU instructions actually be scalars. v9: Add support for if/loop instructions, additional texture targets, and untested support for indirect addressing on temps. v10: Rebase on master, drop bad comment about control flow and just choose the X channel, use int comparison opcodes in LIT for now, drop unused pipe_context argument.. v11: Fix translation of LRP (previously missed because I mis-translated back out), use nir_builder init helpers. v12: Rebase on master, adding explicit include of mtypes.h to get INTERP_QUALIFIER_* v13: Rebase on variables being in lists instead of hash tables, drop use of mtypes.h in favor of util/pipeline.h. Use Ken's nir_builder swizzle and fmov/imov_alu helpers, drop "struct" in front of nir_builder, use nir_builder directly as the function arg in a lot of cases, drop redundant members of ttn_compile that are also in nir_builder, drop some half-baked malloc failure handling. v14: The indirect uniform src0 should be scalar, not vector (noticed as odd by robclark, confirmed by cwabbott). Apply Ken's review to initialize s->num_uniforms and friends, skip ttn_channel for dot products, and use the simpler discard_if intrinsic. Reviewed-by: Kenneth Graunke <[email protected]> (v13) Acked-by: Rob Clark <[email protected]>
* vc4: Tell shader-db how big our UBOs are, if present.Eric Anholt2015-04-011-0/+6
| | | | I had regressed them for a while with the NIR work.
* mesa: Make a shared header for 3D pipeline enum / #defines.Eric Anholt2015-04-013-142/+173
| | | | | | | | | | | | | | | | | | NIR uses these enums/#defines in nir_variables and associated intrinsics, but I want to be able to use them from TGSI->NIR and NIR->TGSI. Otherwise, we had to pull in all of mtypes.h. This doesn't cover all of the enums we might want from a shared compiler core (like varying slots or vert attribs), but it at least covers what I need at the moment (system values and interp qualifiers). v2: Move to src/glsl since util/ is really vague. Include in Makefile.am list. Use plain bitshifts and stdint types instead of undefined BITFIELD64_BIT. v3: Rename to shader_enums.h. Move it into Makefile.sources. Reviewed-by: Kenneth Graunke <[email protected]> (v2, with recommendation to rename)
* nir: add nir_builder.h to the tarballEmil Velikov2015-04-011-0/+1
| | | | | | | | | | | | | | The header was added with commit 2a135c470e3(nir: Add an ALU op builder kind of like ir_builder.h) but did not made it into to the sources list. Fortunately it remained unused until a recent commit faf6106c6f6(nir: Implement a Mesa IR -> NIR translator.) v2: Remove the bogus dependency. Tweak commit message. Signed-off-by: Emil Velikov <[email protected]> Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* xmlpool: remove the clean targetEmil Velikov2015-04-011-6/+4
| | | | | | | | ... by folding it into CLEANFILES. Don't worry about $(LANG) as it is essentially the first folder of $(POS). With the latter already handled. Signed-off-by: Emil Velikov <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* xmlpool: don't forget to ship the MOSEmil Velikov2015-04-011-1/+8
| | | | | | | | | | | | This will allow us to finally remove python from the build time dependencies list. Considering that you're building from a release tarball of course :-) Cc: Bernd Kuhls <[email protected]> Reported-by: Bernd Kuhls <[email protected]> Cc: "10.5" <[email protected]> Signed-off-by: Emil Velikov <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* osmesa: don't try to bundle osmesa.def SConscriptEmil Velikov2015-04-011-2/+0
| | | | | | | | Both of which were removed with commit 69db422218b(scons: Don't build osmesa.) Signed-off-by: Emil Velikov <[email protected]> Reviewed-by: Jose Fonseca <[email protected]>
* docs: note that classic osmesa/libEGL no longer builds with sconsEmil Velikov2015-04-012-4/+2
| | | | | | | Plus nuke the final reference to osmesa from README.WIN32. Reviewed-by: Jose Fonseca <[email protected]> Signed-off-by: Emil Velikov <[email protected]>
* i965: Handle scratch accesses where reladdr also points to scratch spaceIago Toral Quiroga2015-04-012-26/+76
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is a problem when we have IR like this: (array_ref (var_ref temps) (swiz x (expression ivec4 bitcast_f2i (swiz xxxx (array_ref (var_ref temps) (constant int (2)) ) )) )) ) ) where we are indexing an array with the result of an expression that accesses the same array. In this scenario, temps will be moved to scratch space and we will need to add scratch reads/writes for all accesses to temps, however, the current implementation does not consider the case where a reladdr pointer (obtained by indexing into temps trough a expression) points to a register that is also stored in scratch space (as in this case, where the expression used to index temps access temps[2]), and thus, requires a scratch read before it is accessed. v2 (Francisco Jerez): - Handle also recursive reladdr addressing. - Do not memcpy dst_reg into src_reg when rewriting reladdr. v3 (Francisco Jerez): - Reduce complexity by moving recursive reladdr scratch access handling to a separate recursive function. - Do not skip demoting reladdr index registers to scratch space if the top level GRF has already been visited. v4 (Francisco Jerez) - Remove redundant checks. - Simplify code by making emit_resolve_reladdr return a register with the original src data except for reg, reg_offset and reladdr. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89508 Reviewed-by: Francisco Jerez <[email protected]>
* gallivm: (trivial) fix the logic deciding if function call should be used...Roland Scheidegger2015-04-011-3/+1
| | | | | Copy and paste bug with the img filter decision. Since there's only 2 different filters anyway just drop this bit.
* mesa/fbo: lock ctx->Shared->Mutex when allocating renderbuffersMartin Peres2015-04-011-0/+2
| | | | | | | | | | | | | | | | This mutex is used to make sure the shared context does not change while some shared code is looking into it. Calling BindRenderbufferEXT BindRenderbuffer with a gles context would not take the mutex before allocating an entry. Commit a34669b then moved out the allocation out of bind_renderbuffer into allocate_renderbuffer before using it for the CreateRenderBuffer entry point. This thus also made this entry point unsafe. The issue has been hinted by Ilia Mirkin. Reviewed-by: Brian Paul <[email protected]> Signed-off-by: Martin Peres <[email protected]>
* mesa/fbo: do not assign a value that is never read later onMartin Peres2015-04-011-6/+3
| | | | | | | | | | | | | The issue has been detected by coverty. v2: - move the declaration of obj to the else clause (Brian Paul) v3: Review by Brian Paul - get rid of the obj declaration in favor of a direct reference Reviewed-by: Brian Paul <[email protected]> Signed-off-by: Martin Peres <[email protected]>
* egl: add initial EGL_MESA_image_dma_buf_export v2.4Dave Airlie2015-04-0110-5/+332
| | | | | | | | | | | | | | | | | | | | | | | | | | At the moment to get an EGL image to a dma-buf file descriptor, you have to use EGL_MESA_drm_image, and then use libdrm to convert this to a file descriptor. This extension just provides an API modelled on EGL_MESA_drm_image, to return a dma-buf file descriptor. v2: update spec for new API proposal add internal queries to get the fourcc back from intel driver. v2.1: add gallium pieces. v2.2: add offsets to spec and API, rename fd->fds, stride->strides in API. rewrite spec a bit more, add some q/a v2.3: add modifiers to query interface and 64-bit type for that (Daniel Stone) specifiy what happens to num fds vs num planes differences. (Chad Versace) v2.4: fix grammar (Daniel Stone) Signed-off-by: Dave Airlie <[email protected]>
* i965/state: Remove brw->state.dirtyJordan Justen2015-03-312-7/+0
| | | | | | | | | We now use brw->NewGLState and brw->ctx.NewDriverState instead. Suggested-by: Kenneth Graunke <[email protected]> Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Kristian Høgsberg <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965/state: Don't use brw->state.dirty.mesaJordan Justen2015-03-315-12/+11
| | | | | | | | | | | | | | | Now, we only use brw->NewGLState. I used this bash & sed command in the i965 directory: for file in *.[ch] *.[ch]pp; do sed -i -e 's/brw->state\.dirty\.mesa/brw->NewGLState/g' $file done Followed by manual changes to brw_state_upload.c. Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Kristian Høgsberg <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965/state: Don't use brw->state.dirty.brwJordan Justen2015-03-3133-83/+82
| | | | | | | | | | | | | | | Now, we only use ctx->NewDriverState. I used this bash & sed command in the i965 directory: for file in *.[ch] *.[ch]pp; do sed -i -e 's/state\.dirty\.brw/ctx.NewDriverState/g' $file done Followed by manual changes to brw_state_upload.c. Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Kristian Høgsberg <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965/state: Add compute pipeline with empty atom listsJordan Justen2015-03-313-1/+37
| | | | | | Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Kristian Høgsberg <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965/state: Only upload render programs for render state uploadsJordan Justen2015-03-311-20/+25
| | | | | | Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Kristian Høgsberg <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965/state: Create separate dirty state bits for each pipelineJordan Justen2015-03-312-27/+75
| | | | | | | | | | | | | | | | | | When clearing the state for a pipeline, we will save changed state for the other pipelines. v3: * Adjust brw_upload_pipeline_state * Don't pull pipeline state bits into common state bits * Don't clear pipeline state bits * Adjust 'clear' phase * brw_clear_dirty_bits is now brw_render_state_finished * Move cross-pipeline state flagging to brw_pipeline_state_finished * Move pipeline clears to brw_pipeline_state_finished Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Kristian Høgsberg <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965/state: Support multiple pipelines in brw->num_atomsJordan Justen2015-03-312-39/+65
| | | | | | | | | | | | | | | | | brw->num_atoms is converted to an array, but currently just an array of length 1. Adds brw_copy_pipeline_atoms which copies the atoms for a pipeline, and sets brw->num_atoms[p] for pipeline p. v2: * Rename brw->atoms[] to render_atoms * Rename brw_add_pipeline_atoms to brw_copy_pipeline_atoms * Rename brw_pipeline_first_atom to brw_get_pipeline_atoms Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Kristian Høgsberg <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965/state: Rename brw_clear_dirty_bits to brw_render_state_finishedJordan Justen2015-03-313-3/+3
| | | | | | Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Kristian Høgsberg <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965/state: Rename brw_upload_state to brw_upload_render_stateJordan Justen2015-03-313-10/+11
| | | | | | Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Kristian Høgsberg <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* gallivm: do some hack heuristic to disable texture functionsRoland Scheidegger2015-04-011-0/+40
| | | | | | | | | | | | | | We've seen some cases where performance can hurt quite a bit. Technically, the more simple the function the more overhead there is for using a function for this (and the less benefits this provides). Hence don't do this if we expect the generated code to be simple. There's an even more important reason why this hurts performance, which is shaders reusing the same unit with some of the same inputs, as llvm cannot figure out the calculations are the same if they are performned in the function (even just reusing the same unit without any input being the same provides such optimization opportunities though not very much). This is something which would need to be handled by IPO passes however.
* i965/fs: Allow CSE to handle MULs with negated arguments.Matt Turner2015-03-311-5/+37
| | | | | | | | | | | | | | | | | | | | | mul x, -y is equivalent to mul -x, y; and mul x, y is the negation of mul x, -y. With NIR: total instructions in shared programs: 6167779 -> 6161193 (-0.11%) instructions in affected programs: 983511 -> 976925 (-0.67%) helped: 4106 HURT: 16 GAINED: 18 LOST: 7 Without NIR: total instructions in shared programs: 6192323 -> 6185299 (-0.11%) instructions in affected programs: 987875 -> 980851 (-0.71%) helped: 4146 HURT: 16 GAINED: 16 LOST: 0
* i965: Mark brw_inst_bits' brw_inst* parameter const.Matt Turner2015-03-311-12/+12
| | | | Reviewed-by: Kenneth Graunke <[email protected]>
* glsl: Remove bogus Makefile dependency.Matt Turner2015-03-311-2/+0
|
* glsl: Reassociate multiplication of mat*mat*vec.Matt Turner2015-03-311-0/+14
| | | | | | | | | | | | | | | | | | | The typical case of mat4*mat4*vec4 is 80 scalar multiplications, but mat4*(mat4*vec4) is only 32. On HSW (with vec4 vertex shaders): instructions in affected programs: 4420 -> 3194 (-27.74%) On BDW (with scalar vertex shaders): instructions in affected programs: 12756 -> 6726 (-47.27%) Implementing a general matrix chain ordering is harder (or at least tedious) because of having to walk the GLSL IR to create a list of multiplicands. I'm guessing that this patch handles 90+% of cases, but of course to tell definitively you'd have to implement the general thing. Reviewed-by: Chris Forbes <[email protected]>
* glsl: Implement type inferencing of matrix types.Matt Turner2015-03-311-4/+6
| | | | Reviewed-by: Chris Forbes <[email protected]>
* glsl: Factor out a get_mul_type() function.Matt Turner2015-03-313-57/+78
| | | | Reviewed-by: Chris Forbes <[email protected]>
* nouveau: synchronize "scratch runout" destruction with the command streamMarcin Ślusarz2015-03-312-19/+37
| | | | | | | | | | | | | | | | | | | | | When nvc0_push_vbo calls nouveau_scratch_done it does not mean scratch buffers can be freed immediately. It means "when hardware advances to this place in the command stream the scratch buffers can be freed". To fix it, just postpone scratch runout destruction after current fence is signalled. The bug existed for a very long time. Nobody noticed, because "scratch runout" code path is rarely executed. Fixes hang at the very beginning of first mission in "Serious Sam 3" on nve7/gk107. It manifested as: nouveau E[ PFIFO][0000:01:00.0] read fault at 0x000a9e0000 [PTE] from GR/GPC0/PE_2 on channel 0x007f853000 [Sam3[17056]] Cc: "10.4 10.5" <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* docs: document Viewperf 12 issuesBrian Paul2015-03-311-9/+85
| | | | Signed-off-by: Brian Paul <[email protected]>
* i965/skl: Avoid using the 1D stencil layout for stencil-only imagesNeil Roberts2015-03-311-1/+2
| | | | | | | | | | | | | Commit cf67ca9ffa9 made the layouting code pick a special layout for 1D images on Skylake. This should not be used for depth and stencil buffers because these need to be treated as 2D tiled images. However the patch was missing a check for images with a base format of GL_STENCIL_INDEX. In practice I don't think it's currently possible to hit this because Mesa doesn't support GL_ARB_texture_stencil8 and it's not possible to create a 1D renderbuffer, but it'll be good to be ready for when the extension is supported. Reviewed-by: Anuj Phogat <[email protected]>
* clover: Return CL_BUILD_ERROR for CL_PROGRAM_BUILD_STATUS when compilation ↵Tom Stellard2015-03-311-0/+2
| | | | | | | | | | | fails v2 v2: - Don't use _errs map Cc: 10.5 10.4 <[email protected]> Reviewed-by: Francisco Jerez <[email protected]>
* radeonsi/compute: Default to the same PIPE_SHADER_CAP values as other shader ↵Tom Stellard2015-03-311-1/+5
| | | | | | | | | types v2 v2: - Fix typo Reviewed-by: Marek Olšák <[email protected]>
* radeon/vce: implement video usability information supportLeo Liu2015-03-313-0/+59
| | | | | | | | | This will help encoding VUI into the bitstream v2: make backward compatible Signed-off-by: Leo Liu <[email protected]> Reviewed-by: Christian König <[email protected]>
* st/omx/enc: export framerate to vce driverLeo Liu2015-03-311-4/+4
| | | | | | | The framerate will be used for video usability info support by VCE driver Signed-off-by: Leo Liu <[email protected]> Reviewed-by: Christian König <[email protected]>
* llvmpipe: enable ARB_texture_gatherRoland Scheidegger2015-03-312-3/+4
| | | | | | | | | | | | Just announce support for 4 components. While here also increase the max/min texel offsets (the limit is completely artificial, was chosen because that's what other hardware did, however there's other drivers using larger limits). Over a thousand little piglits skip->pass. v2: update docs/GL3.txt Reviewed-by: Jose Fonseca <[email protected]>
* gallivm: implement TG4 for ARB_texture_gatherRoland Scheidegger2015-03-312-40/+133
| | | | | | | | | | | | | | | | This is quite trivial, essentially just follow all the same code you'd use with linear min/mag (and no mip) filter, then just skip the filtering after looking up the texels in favor of direct assignment of the right channel to the result. (This is though not true for the multi-offset version if we'd want to support it - for this would probably need to do something along the lines of 4x nearest sampling due to the necessity of doing coord wrapping individually per texel.) Supports multi-channel formats. From the SM5 gather cap bit, should support non-constant offsets, plus shadow comparisons (the former untested), but not component selection (should be easy to implement but all this stuff is not really exposable anyway for now). Reviewed-by: Jose Fonseca <[email protected]>
* gallivm: add gather support to sampler interfaceRoland Scheidegger2015-03-313-21/+34
| | | | | | Luckily thanks to the revamped interface this is a lot less work now... Reviewed-by: Jose Fonseca <[email protected]>