summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* i965: Rename do_<stage>_prog to brw_compile_<stage>_prog (and export)Carl Worth2015-04-029-32/+51
| | | | | | | | | | | | This is in preparation for these functions to be called from other files. This commit is intended to have no functional change. It exists in preparation for some upcoming code movement in preparation for the shader cache. Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Split out per-stage dirty-bit checking into separate functionsCarl Worth2015-04-024-35/+59
| | | | | | | | | | | | The dirty-bit checking from each brw_upload_<stage>_prog function is split out into its a new brw_<stage>_state_dirty function. This commit is intended to have no functional change. It exists in preparation for some upcoming code movement in preparation for the shader cache. Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Split out brw_<stage>_populate_key into their own functionsCarl Worth2015-04-023-40/+64
| | | | | | | | | | | | | | This commit splits portions of the existing brw_upload_vs_prog and brw_upload_gs_prog function into new brw_vs_populate_key and brw_gs_populate_key functions. This follows the same style as is already present for all other stages, (see brw_wm_populate_key, etc.). This commit is intended to have no functional change. It exists in preparation for some upcoming code movement in preparation for the shader cache. Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* nv50/ir: avoid folding immediates into imad operationsIlia Mirkin2015-04-021-1/+2
| | | | | | | | Commit 09ee907266 added logic to fold immediates into mad operations, but the emission code is only there for fmad. Only allow it on float types. Signed-off-by: Ilia Mirkin <[email protected]>
* nv50/ir: fix imad emission when dst == src2Ilia Mirkin2015-04-021-1/+1
| | | | | | | Commit fb63df22151f added 4-byte mad support, but only supported emission for floats. Disable it for ints for now. Signed-off-by: Ilia Mirkin <[email protected]>
* nir: Allocate nir_tex_instr::sources out of the instruction itself.Kenneth Graunke2015-04-021-1/+1
| | | | | | | | The lifetime of the sources array needs to be match the nir_tex_instr itself. So, allocate it using the instruction itself as the context. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* nir: Allocate predecessor and dominance frontier sets from block itself.Kenneth Graunke2015-04-021-2/+2
| | | | | | | | These sets are part of the block, and their lifetime needs to match the block itself. So, allocate them using the block itself as the context. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* nir: Allocate register fields out of the register itself.Kenneth Graunke2015-04-021-3/+3
| | | | | | | | | The lifetime of each register's use/def/if_use sets needs to match the register itself. So, allocate them using the register itself as the context. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* nir: Make nir_create_function() strdup the function name.Kenneth Graunke2015-04-021-1/+1
| | | | | | | | | | | | glsl_to_nir passes in the ir_function's name field; we were copying the pointer, but not duplicating the memory. We want to be able to free the linked GLSL IR program after translating to NIR, so we'll need to create a copy of the function name that the NIR shader actually owns. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* nir: Free dead variables when removing them.Kenneth Graunke2015-04-021-1/+3
| | | | | Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* nir: Combine remove_dead_local_vars() and remove_dead_global_vars().Kenneth Graunke2015-04-021-14/+4
| | | | | | | We can just pass a pointer to the list of variables, and reuse the code. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* ralloc: Implement a new ralloc_adopt() API.Kenneth Graunke2015-04-022-0/+33
| | | | | | | | | | | | | | | | | | | | | ralloc_adopt() reparents all children from one context to another. Conceptually, ralloc_adopt(new_ctx, old_ctx) behaves like this pseudocode: foreach child of old_ctx: ralloc_steal(new_ctx, child) However, ralloc provides no way to iterate over a memory context's children, and ralloc_adopt does this task more efficiently anyway. One potential use of this is to implement a memory-sweeper pass: first, steal all of a context's memory to a temporary context. Then, walk over anything that should be kept, and ralloc_steal it back to the original context. Finally, free the temporary context. This works when the context is something that can't be freed (i.e. an important structure). Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* nir/opt_peephole_ffma: Fix a couple typos in a commentJason Ekstrand2015-04-021-2/+2
| | | | Acked-by: Matt Turner <[email protected]>
* mesa: add ARB_depth_buffer_float to ES3.0 required extension listIlia Mirkin2015-04-021-0/+1
| | | | | Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* vc4: Add support for nir_iabs.Eric Anholt2015-04-021-0/+5
| | | | | Tested using the GLSL 1.30 tests for integer abs(). Not currently used, but it was one of the new opcodes used by robclark's idiv lowering.
* i965/generator: Get rid of the ! in the unreachable statementJason Ekstrand2015-04-021-1/+1
| | | | Reviewed-by: Mark Janes <[email protected]>
* nir/print: Correctly print swizzles for explicitly sized alu sourcesJason Ekstrand2015-04-021-12/+12
| | | | Reviewed-by: Connor Abbott <[email protected]>
* freedreno/a3xx: add MRT supportIlia Mirkin2015-04-029-139/+221
| | | | | | | The hardware only supports 4 MRTs. It should be possible to emulate support for 8, but doesn't seem worth the trouble. Signed-off-by: Ilia Mirkin <[email protected]>
* freedreno: convert blit program to array for each number of rtsIlia Mirkin2015-04-0212-21/+45
| | | | Signed-off-by: Ilia Mirkin <[email protected]>
* freedreno: add support for laying out MRTs in gmemIlia Mirkin2015-04-022-16/+43
| | | | Signed-off-by: Ilia Mirkin <[email protected]>
* freedreno: add core infrastructure support for MRTsIlia Mirkin2015-04-024-8/+14
| | | | Signed-off-by: Ilia Mirkin <[email protected]>
* freedreno/ir3: add support for FS_COLOR0_WRITES_ALL_CBUFS propertyIlia Mirkin2015-04-022-1/+10
| | | | | | | This will enable the driver to tell which regids to link up to which MRT outputs. Signed-off-by: Ilia Mirkin <[email protected]>
* freedreno/a3xx: add independent blend function supportIlia Mirkin2015-04-022-8/+9
| | | | | | This is needed for MRT support Signed-off-by: Ilia Mirkin <[email protected]>
* freedreno: remove alpha key from ir3_shaderIlia Mirkin2015-04-029-42/+8
| | | | | | | This complication is unnecessary and makes MRTs more complicated and likely to generate tons of variants. Signed-off-by: Ilia Mirkin <[email protected]>
* i915g: Implement EGL_EXT_image_dma_buf_importStéphane Marchesin2015-04-015-7/+38
| | | | | | | This adds all the plumbing to get EGL_EXT_image_dma_buf_import in i915g. Signed-off-by: Stéphane Marchesin <[email protected]>
* i965/fs: Relax type check in cmod propagation.Matt Turner2015-04-011-1/+3
| | | | | | | | | | | The thing we want to avoid is int/float comparisons, but int/unsigned comparisons with 0 are equivalent. total instructions in shared programs: 6194829 -> 6193996 (-0.01%) instructions in affected programs: 117192 -> 116359 (-0.71%) helped: 471 Reviewed-by: Jason Ekstrand <[email protected]>
* nir: Remove useless ftrunc inside f2i/f2u.Matt Turner2015-04-011-0/+4
| | | | | | | No shader-db changes, probably because they're all removed by the GLSL compiler optimization added in commit 69ad5fd4. Reviewed-by: Eric Anholt <[email protected]>
* nir: Recognize (a < b || a < c) as a < max(b, c).Matt Turner2015-04-011-0/+2
| | | | | | | | | | Doesn't work for analogous && cases, because of NaNs. total instructions in shared programs: 6195712 -> 6194829 (-0.01%) instructions in affected programs: 42000 -> 41117 (-2.10%) helped: 403 Reviewed-by: Eric Anholt <[email protected]>
* nir: Add addition/multiplication identities of exp/log.Matt Turner2015-04-011-0/+6
| | | | | | | instructions in affected programs: 2858 -> 2808 (-1.75%) helped: 12 Reviewed-by: Eric Anholt <[email protected]>
* nir: Add identities for the log function.Matt Turner2015-04-011-0/+8
| | | | | | | | | The rcp(log(x)) pattern affects instruction counts. instructions in affected programs: 144 -> 138 (-4.17%) helped: 6 Reviewed-by: Eric Anholt <[email protected]>
* nir: Add identities for the exponential function.Matt Turner2015-04-011-0/+6
| | | | | | No changes in shader-db. Reviewed-by: Eric Anholt <[email protected]>
* nir: Recognize another open coded lrp.Matt Turner2015-04-011-0/+1
| | | | | | | | | total instructions in shared programs: 6195924 -> 6195768 (-0.00%) instructions in affected programs: 4876 -> 4720 (-3.20%) helped: 58 HURT: 10 Reviewed-by: Eric Anholt <[email protected]>
* nir: Recognize open coded lrp.Matt Turner2015-04-011-0/+1
| | | | | | | | | total instructions in shared programs: 6197614 -> 6195924 (-0.03%) instructions in affected programs: 34773 -> 33083 (-4.86%) helped: 147 HURT: 6 Reviewed-by: Eric Anholt <[email protected]>
* nir: Use _mesa_flsll(InputsRead) in prog->nir.Kenneth Graunke2015-04-011-2/+2
| | | | | | | | | | | | | | | | InputsRead is a 64-bit bitfield. Using _mesa_fls would silently truncate off the high bits, claiming inputs 32..56 (VARYING_SLOT_MAX) were never read. Using <= here was a hack I threw in at the last minute to fix programs which happened to use input slot 32. Switch back to using < now that the underlying problem is fixed. Fixes crashes in "Euro Truck Simulator 2" when using prog->nir, which uses input slot 33. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* mesa: Implement _mesa_flsll().Kenneth Graunke2015-04-011-0/+24
| | | | | | | This is _mesa_fls() for 64-bit values. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* nir: In prog->nir, don't wrap dot products with ptn_channel(..., X).Kenneth Graunke2015-04-011-4/+4
| | | | | | | | ptn_move_dest and nir_fadd already take care of replicating the last channel out, so we can just use a scalar and skip splatting it. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* i965: Use the same nir options for all gensJason Ekstrand2015-04-011-10/+2
| | | | | | | If we tell NIR to split ffma's, then we don't need seperate options anymore. Reviewed-by: Matt Turner <[email protected]>
* i965/nir: Run DCE again before going out of SSAJason Ekstrand2015-04-011-0/+2
| | | | | | | We run lowering and optimization passes that might leave garbage lying around. This keeps the FS cse from having to clean it up. Reviewed-by: Matt Turner <[email protected]>
* i965/nir: Run the ffma peephole after the rest of the optimizationsJason Ekstrand2015-04-012-2/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | The idea here is that fusing multiply-add combinations too early can reduce our ability to perform CSE and value-numbering. Instead, we split ffma opcodes up-front, hope CSE cleans up, and then fuse after-the-fact. Unless an algebraic pass does something silly where it inserts something between the multiply and the add, splitting and re-fusing should never cause a problem. We run the late algebraic optimizations after this so that things like compare-with-zero don't hurt our ability to fuse things. shader-db results for fragment shaders on Haswell: total instructions in shared programs: 4390538 -> 4379236 (-0.26%) instructions in affected programs: 989359 -> 978057 (-1.14%) helped: 5308 HURT: 97 GAINED: 78 LOST: 5 This does, unfortunately, cause some substantial hurt to a shader in Kerbal Space Program. However, the damage is caused by changing a single instruction from a ffma to an add. This, in turn, *decreases* register pressure in one part of the program causing it to fail to register allocate and spill. Given the overwhelmingly positive results in other shaders and the fact that the NIR for the Kerbal shaders is actually better, this should be considered a positive. Reviewed-by: Matt Turner <[email protected]>
* nir/peephole_ffma: Be less agressive about fusing multiply-addsJason Ekstrand2015-04-011-0/+41
| | | | | | | | | | | | shader-db results for fragment shaders on Haswell: total instructions in shared programs: 4395688 -> 4389623 (-0.14%) instructions in affected programs: 355876 -> 349811 (-1.70%) helped: 1455 HURT: 14 GAINED: 5 LOST: 0 Reviewed-by: Matt Turner <[email protected]>
* nir: Add a dedicated ffma peephole optimizationJason Ekstrand2015-04-013-0/+223
| | | | | | | | | | | | | i965/nir: Use the dedicated ffma peephole total instructions in shared programs: 4418748 -> 4394618 (-0.55%) instructions in affected programs: 1292790 -> 1268660 (-1.87%) helped: 5999 HURT: 457 GAINED: 4 LOST: 9 Reviewed-by: Matt Turner <[email protected]>
* nir: Move the compare-with-zero optimizations to the late sectionJason Ekstrand2015-04-011-4/+4
| | | | | | | | | | | | | total instructions in shared programs: 4422307 -> 4422363 (0.00%) instructions in affected programs: 4230 -> 4286 (1.32%) helped: 0 HURT: 12 While this does hurt some things, the losses are minor and it prevents the compare-with-zero optimization from fighting with ffma which is much more important. Reviewed-by: Matt Turner <[email protected]>
* nir/algebraic: Add a seperate section for "late" optimizationsJason Ekstrand2015-04-013-0/+13
| | | | | | i965/nir: Use the late optimizations Reviewed-by: Matt Turner <[email protected]>
* nir/algebraic: Remove a duplicate optimizationJason Ekstrand2015-04-011-3/+0
| | | | | | This optimization is repeated verbatim above Reviewed-by: Matt Turner <[email protected]>
* nir/algebraic: #define around structure definitionsJason Ekstrand2015-04-011-6/+11
| | | | | | | | Previously, we couldn't generate two algebraic passes in the same file because of multiple structure definitions. To solve this, we play the age-old header file trick and just #define around it. Reviewed-by: Matt Turner <[email protected]>
* nir/print: Don't print extra swizzzle componentsJason Ekstrand2015-04-011-7/+19
| | | | | | | | Previously, NIR would just print 4 swizzle components if the swizzle was anything other than foo.xyzw. This creates lots of noise if, for example, you have a one-component element with a swizzle of foo.xxxx. Reviewed-by: Kenneth Grunke <[email protected]>
* configure: nuke --with-max-{width,height}Emil Velikov2015-04-014-21/+4
| | | | | | | | | | Unused as of commit 630ab0d27ba(mesa: remove last of MAX_WIDTH, MAX_HEIGHT). Update all the remaining references to the defines. v2: Use the correct variable name in the comments Signed-off-by: Emil Velikov <[email protected]> Reviewed-by: Brian Paul <[email protected]>
* gallium: ship tgsi_to_nir.h in the tarballEmil Velikov2015-04-011-1/+2
| | | | | Acked-by: Matt Turner <[email protected]> Signed-off-by: Emil Velikov <[email protected]>
* configure.ac: error out if python/mako is not found when requiredEmil Velikov2015-04-011-2/+11
| | | | | | | | | | | | In case of using a distribution tarball (or a dirty git tree) one can have the generated sources locally. Make configure.ac error out otherwise, to alert that about the unmet requirement(s) of python/mako. v2: Check only for a single file for each dependency. Suggested-by: Matt Turner <[email protected]> Signed-off-by: Emil Velikov <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* glsl: Make sure not to dereference NULL.Matt Turner2015-04-011-0/+2
| | | | Found by Coverity.