path: root/src
Commit message | Author | Age | Files | Lines
* nir: Add nir_lower_alu_to_scalar. | Eric Anholt | 2015-01-23 | 3 | -0/+188
| This is the equivalent of brw_fs_channel_expressions.cpp, which I wanted for vc4.
|
| v2: Use the nir_src_for_ssa() helper, and another instance of nir_alu_src_copy().
| v3: Drop the non-SSA support. All intended callers will have SSA-only ALU ops.
| v4: Use insert_before, drop stale bcsel/fcsel comment, drop now-unused unsupported() function, drop lower_context struct.
| v5: Completely rename the pass to nir_lower_alu_to_scalar(), add an assert about weird input_sizes[].
|
| Reviewed-by: Jason Ekstrand <[email protected]>
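The idea of lowering vector ALU operations to scalar ones can be sketched in a few lines. This is only an illustration under assumed conventions, not the code of the pass: instructions are modelled as hypothetical `(op, dest, srcs)` tuples, and the channel naming (`r_x`, `a.x`) is invented for the example.

```python
def scalarize(op, dest, srcs, num_components):
    """Split one vector ALU op into per-channel scalar ops plus a final vecN.

    Instructions are hypothetical (op, dest, srcs) tuples; this models the
    idea of scalarizing, not NIR's real data structures.
    """
    instrs = []
    channel_dests = []
    for c in range(num_components):
        name = "xyzw"[c]
        chan_dest = f"{dest}_{name}"
        # Each scalar op reads the matching channel of every source.
        chan_srcs = [f"{s}.{name}" for s in srcs]
        instrs.append((op, chan_dest, chan_srcs))
        channel_dests.append(chan_dest)
    # Recombine the scalar results into the original vector destination.
    instrs.append((f"vec{num_components}", dest, channel_dests))
    return instrs
```

For a two-component `fadd`, the sketch yields two scalar `fadd` instructions followed by one `vec2` that rebuilds the destination.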
* nir: Make some helpers for copying ALU src/dests. | Eric Anholt | 2015-01-23 | 4 | -9/+25
| There aren't many users yet, but I wanted to do this from my scalarizing pass.
|
| v2: Constify the src arguments.
|
| Reviewed-by: Jason Ekstrand <[email protected]>
* nir: Add algebraic optimizations for division and reciprocal. | Kenneth Graunke | 2015-01-23 | 1 | -0/+5
| These also exist in opt_algebraic.cpp.
|
| total NIR instructions in shared programs: 2011430 -> 2011211 (-0.01%)
| NIR instructions in affected programs: 42221 -> 42002 (-0.52%)
| helped: 198
|
| total i965 instructions in shared programs: 6020553 -> 6020116 (-0.01%)
| i965 instructions in affected programs: 84322 -> 83885 (-0.52%)
| helped: 394
| HURT: 1 (by 1 instruction)
|
| Signed-off-by: Kenneth Graunke <[email protected]>
| Reviewed-by: Jason Ekstrand <[email protected]>
| Reviewed-by: Matt Turner <[email protected]>
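An algebraic optimization of this kind is a search-and-replace on expression trees. The sketch below is a toy rewriter, not the NIR implementation: expressions are nested tuples, bare strings in a pattern are variables, and the two rules in `RULES` are illustrative assumptions (the commit's actual rule list is not shown here). It also ignores floating-point edge cases.

```python
# Hypothetical rules, written as (search pattern, replacement).
RULES = [
    (("fdiv", 1.0, "a"), ("frcp", "a")),   # 1.0 / a      -> rcp(a)
    (("frcp", ("frcp", "a")), "a"),        # rcp(rcp(a))  -> a
]

def match(pattern, expr, env):
    """Match expr against pattern, binding pattern variables in env."""
    if isinstance(pattern, tuple):
        if (not isinstance(expr, tuple) or len(expr) != len(pattern)
                or expr[0] != pattern[0]):
            return False
        return all(match(p, e, env) for p, e in zip(pattern[1:], expr[1:]))
    if isinstance(pattern, str):           # pattern variable
        if pattern in env:
            return env[pattern] == expr
        env[pattern] = expr
        return True
    return pattern == expr                 # constant

def substitute(repl, env):
    """Build the replacement expression from the bound variables."""
    if isinstance(repl, tuple):
        return (repl[0],) + tuple(substitute(r, env) for r in repl[1:])
    if isinstance(repl, str):
        return env[repl]
    return repl

def rewrite(expr, rules):
    """Rewrite operands first, then try each rule on the result."""
    if isinstance(expr, tuple):
        expr = (expr[0],) + tuple(rewrite(e, rules) for e in expr[1:])
    for pattern, repl in rules:
        env = {}
        if match(pattern, expr, env):
            return rewrite(substitute(repl, env), rules)
    return expr
```

With these rules, `rewrite(("fdiv", 1.0, ("frcp", "x")), RULES)` first becomes `("frcp", ("frcp", "x"))` and then collapses to `"x"`.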
* nir: Add algebraic optimizations for exponential/logarithmic functions. | Kenneth Graunke | 2015-01-23 | 1 | -0/+10
| Most of these exist in the GLSL IR algebraic pass already. However, SSA allows us to find more instances of the patterns.
|
| total NIR instructions in shared programs: 2015593 -> 2011430 (-0.21%)
| NIR instructions in affected programs: 124189 -> 120026 (-3.35%)
| helped: 604
|
| total i965 instructions in shared programs: 6025505 -> 6018717 (-0.11%)
| i965 instructions in affected programs: 261295 -> 254507 (-2.60%)
| helped: 1295
| HURT: 3
|
| Signed-off-by: Kenneth Graunke <[email protected]>
| Reviewed-by: Jason Ekstrand <[email protected]>
| Reviewed-by: Matt Turner <[email protected]>
* nir: Add algebraic optimizations for simplifying comparisons. | Kenneth Graunke | 2015-01-23 | 1 | -0/+9
| The first batch removes bonus fnot/inot operations, possibly allowing other optimizations to better recognize patterns.
|
| The next batch replaces a fadd and constant 0.0 with an fneg - negation is usually free on GPUs, while addition is not.
|
| total NIR instructions in shared programs: 2020814 -> 2015593 (-0.26%)
| NIR instructions in affected programs: 411143 -> 405922 (-1.27%)
| helped: 2233
| HURT: 214
|
| A few shaders are hurt by a few instructions due to moving neg such that it has a constant operand, which is then folded, resulting in two distinct load_consts for x and -x. We can always clean that up later.
|
| total i965 instructions in shared programs: 6035392 -> 6025505 (-0.16%)
| i965 instructions in affected programs: 784980 -> 775093 (-1.26%)
| helped: 4508
| HURT: 2
|
| Signed-off-by: Kenneth Graunke <[email protected]>
| Reviewed-by: Jason Ekstrand <[email protected]>
| Reviewed-by: Matt Turner <[email protected]>
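The two batches can be illustrated with one plausible rewrite each. The specific forms below are assumptions made for the example, not taken from the patch: `not (a < b)` becoming `a >= b`, and `(a + b) < 0.0` becoming `a < -b`. The check uses only finite, ordered sample values; NaN behaviour is deliberately out of scope.

```python
import itertools

# Finite sample values only; NaN would break the not-removal identity.
SAMPLES = (-2.5, -1.0, 0.0, 0.5, 3.0)

def equivalent(before, after):
    """Return True if both forms agree on every pair of sample values."""
    return all(before(a, b) == after(a, b)
               for a, b in itertools.product(SAMPLES, repeat=2))

# Assumed example of removing a "not": not (a < b)  ->  a >= b
not_removed = equivalent(lambda a, b: not (a < b),
                         lambda a, b: a >= b)

# Assumed example of trading an add against 0.0 for a negation:
# (a + b) < 0.0  ->  a < -b
add_to_neg = equivalent(lambda a, b: (a + b) < 0.0,
                        lambda a, b: a < -b)
```

Both flags come out true for these samples, which is the property a rewrite rule needs before it can replace one form with the other.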
* nir: Add algebraic optimizations for pointless shifts. | Kenneth Graunke | 2015-01-23 | 1 | -0/+7
| The GLSL IR optimization pass contained these; we may as well include them too.
|
| v2: Fix a >> 0 and a << 0 optimizations (caught by Matt).
|
| No change in the number of NIR instructions on a shader-db run.
|
| total i965 instructions in shared programs: 6035397 -> 6035392 (-0.00%)
| i965 instructions in affected programs: 542 -> 537 (-0.92%)
| helped: 2 (in glamor)
|
| Signed-off-by: Kenneth Graunke <[email protected]>
| Reviewed-by: Jason Ekstrand <[email protected]>
| Reviewed-by: Matt Turner <[email protected]>
* nir: Add a bunch of algebraic optimizations on logic/bit operations. | Kenneth Graunke | 2015-01-23 | 1 | -0/+13
| Matt and I noticed a bunch of "val <- ior a a" operations in a shader, so we decided to add an algebraic optimization for that. While there, I decided to add a bunch more of them.
|
| v2: Delete bogus fand/for optimizations (caught by Jason).
|
| total NIR instructions in shared programs: 2023511 -> 2020814 (-0.13%)
| NIR instructions in affected programs: 149634 -> 146937 (-1.80%)
| helped: 1032
|
| total i965 instructions in shared programs: 6035392 -> 6035397 (0.00%)
| i965 instructions in affected programs: 537 -> 542 (0.93%)
| HURT: 2
|
| Signed-off-by: Kenneth Graunke <[email protected]>
| Reviewed-by: Jason Ekstrand <[email protected]>
| Reviewed-by: Matt Turner <[email protected]>
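The "ior a a" case rests on a simple bitwise identity. The sketch below checks it on 32-bit values, together with three sibling identities that are assumptions added for illustration (only `ior a a` is named in the message). The helper names mirror NIR opcode names but are plain Python functions.

```python
MASK = 0xFFFFFFFF  # model 32-bit integer registers

def ior(a, b):
    return (a | b) & MASK

def iand(a, b):
    return (a & b) & MASK

def ixor(a, b):
    return (a ^ b) & MASK

def inot(a):
    return ~a & MASK

SAMPLES = (0, 1, 0x80000000, 0xDEADBEEF, MASK)

def identities_hold():
    """Check a | a == a, plus assumed siblings, on the sample values."""
    return all(
        ior(a, a) == a            # the case named in the commit message
        and iand(a, a) == a       # assumed sibling rule
        and ixor(a, a) == 0       # assumed sibling rule
        and inot(inot(a)) == a    # assumed sibling rule
        for a in SAMPLES
    )
```

Each identity lets an optimizer replace the operation with one of its operands or a constant, which removes an instruction.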
* nir: Implement CSE on intrinsics that can be eliminated and reordered. | Kenneth Graunke | 2015-01-23 | 1 | -2/+38
| Matt and I noticed that one of the shaders hurt by INTEL_USE_NIR=1 had load_input and load_uniform intrinsics repeated several times, with the same parameters, but each one generating a distinct SSA value. This made ALU operations on those values appear distinct as well.
|
| Generating distinct SSA values is silly - these are read only variables. CSE'ing them makes everything use a single SSA value, which then allows other operations to be CSE'd away as well.
|
| Generalizing a bit, it seems like we should be able to safely CSE any intrinsics that can be eliminated and reordered. I didn't implement support for variables for the time being.
|
| v2: Assert that info->num_variables == 0 (requested by Jason).
|
| total NIR instructions in shared programs: 2435936 -> 2023511 (-16.93%)
| NIR instructions in affected programs: 2413496 -> 2001071 (-17.09%)
| helped: 16872
|
| total i965 instructions in shared programs: 6028987 -> 6008427 (-0.34%)
| i965 instructions in affected programs: 640654 -> 620094 (-3.21%)
| helped: 2071
| HURT: 585
| GAINED: 14
| LOST: 25
|
| Signed-off-by: Kenneth Graunke <[email protected]>
| Reviewed-by: Jason Ekstrand <[email protected]>
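The cascading effect described above (merging duplicate loads lets the ALU operations that use them merge too) can be shown with a toy CSE pass. This is a sketch under assumptions, not the NIR pass: instructions are hypothetical `(dest, op, args)` tuples in SSA form, and `CSE_SAFE` is an invented stand-in for "can be eliminated and reordered".

```python
# Hypothetical set of ops that are safe to merge when op and args match.
CSE_SAFE = {"load_input", "load_uniform", "fadd", "fmul"}

def cse(instrs):
    """Remove repeated (op, args) computations from a straight-line list."""
    seen = {}     # (op, args) -> dest of the first occurrence
    rename = {}   # eliminated dest -> surviving dest
    out = []
    for dest, op, args in instrs:
        # Rewrite uses of already-eliminated values first, so that
        # later instructions can become identical and merge as well.
        args = tuple(rename.get(a, a) for a in args)
        key = (op, args)
        if op in CSE_SAFE:
            if key in seen:
                rename[dest] = seen[key]
                continue
            seen[key] = dest
        out.append((dest, op, args))
    return out

program = [
    ("ssa_0", "load_uniform", (0,)),
    ("ssa_1", "load_uniform", (0,)),          # duplicate load
    ("ssa_2", "fmul", ("ssa_0", "ssa_0")),
    ("ssa_3", "fmul", ("ssa_1", "ssa_1")),    # same value once loads merge
    ("ssa_4", "store_output", ("ssa_3",)),
]
optimized = cse(program)
```

After the pass only one `load_uniform` and one `fmul` remain, and `store_output` reads `ssa_2`.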
* nir: Pull nir_instr_can_cse()'s SSA checks out of the switch. | Kenneth Graunke | 2015-01-23 | 1 | -2/+6
| This should not be a change in behavior, as all current cases that potentially answer "yes" require SSA.
|
| The next patch will introduce another case that requires SSA.
|
| Signed-off-by: Kenneth Graunke <[email protected]>
| Reviewed-by: Jason Ekstrand <[email protected]>
| Reviewed-by: Matt Turner <[email protected]>
* i965/nir: Report NIR instruction counts (in SSA form) via KHR_debug. | Kenneth Graunke | 2015-01-23 | 1 | -0/+32
| This allows us to count NIR instructions via shader-db. Use "run" as normal. The results file will contain both NIR and assembly.
|
| Then, to generate a NIR report:
|    ./report.py <(grep NIR results/foo) <(grep NIR results/bar)
|
| Or, to generate an i965 report:
|    ./report.py <(grep -v NIR results/foo) <(grep -v NIR results/bar)
|
| Signed-off-by: Kenneth Graunke <[email protected]>
| Reviewed-by: Jason Ekstrand <[email protected]>
| Reviewed-by: Matt Turner <[email protected]>
* i965/nir: Print NIR on INTEL_DEBUG=fs. | Kenneth Graunke | 2015-01-23 | 1 | -0/+11
| This is useful for debugging and looking for optimization opportunities.
|
| It will need to be expanded when we add support for other scalar stages.
|
| Signed-off-by: Kenneth Graunke <[email protected]>
| Reviewed-by: Matt Turner <[email protected]>
* i965/nir: Do optimizations again just before lowering source mods. | Kenneth Graunke | 2015-01-23 | 1 | -13/+21
| We want to run CSE and algebraic optimizations again after lowering IO. Some of the passes in the optimization loop don't handle saturates and other modifiers, so run it before lowering to source modifiers.
|
| total instructions in shared programs: 6046190 -> 6045768 (-0.01%)
| instructions in affected programs: 22406 -> 21984 (-1.88%)
| helped: 47
| HURT: 0
| GAINED: 0
| LOST: 0
|
| Signed-off-by: Kenneth Graunke <[email protected]>
| Reviewed-by: Jason Ekstrand <[email protected]>
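The percentages in these shader-db summaries are relative changes against the "before" count. A small sketch reproduces the two figures quoted above; the function name `pct_change` is made up for the example and is not part of shader-db's report.py.

```python
def pct_change(before, after):
    """Relative change from before to after, formatted like the log lines."""
    change = (after - before) / before * 100.0
    return f"{change:.2f}%"

shared = pct_change(6046190, 6045768)    # total instructions in shared programs
affected = pct_change(22406, 21984)      # instructions in affected programs
```

The same 422 saved instructions round to -0.01% of all shared programs but to -1.88% of the affected ones, which is why both lines are reported.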
* loader: Remove NEED_OPENGL_COMMON check. | Matt Turner | 2015-01-23 | 1 | -2/+0
| HAVE_DRICOMMON is sufficient since OpenGL must be enabled for DRI.
* mesa: Build with subdir-objects. | Matt Turner | 2015-01-23 | 5 | -573/+562
* glsl: Build a libglsl_util library. | Matt Turner | 2015-01-23 | 2 | -17/+23
| Rather than sourcing files with ../dir/file.c which leads to distclean wiping out ../dir's .deps directory.
* mapi: Build with subdir-objects. | Matt Turner | 2015-01-23 | 4 | -99/+53
* mapi: Remove vgapi from SUBDIRS. | Matt Turner | 2015-01-23 | 1 | -3/+5
| OpenVG is disabled via autotools.
* mesa: Drop inclusion of glapi_gen.mk. | Matt Turner | 2015-01-23 | 1 | -5/+1
| Some glapi headers used to be generated from this Makefile.am, but no longer.
* glsl: Build with subdir-objects. | Matt Turner | 2015-01-23 | 3 | -190/+188
| Apparently $(top_srcdir) is not expanded in a source list when using subdir-objects, so remove that. It's not clear to me why we were going to such lengths to prefix each source file anyway.
* nir: Add headers to distribution. | Matt Turner | 2015-01-23 | 1 | -0/+2
* nir: Add nir_{opt_,}algebraic.py to distribution. | Matt Turner | 2015-01-23 | 1 | -0/+2
* mesa: Add format_{un,}pack.py to distribution. | Matt Turner | 2015-01-23 | 1 | -0/+2
* mesa: Remove pack_tmp.h from sources. | Matt Turner | 2015-01-23 | 1 | -1/+0
| Missed in commit 3a4de321.
* nir: add generated file to .gitignore | Connor Abbott | 2015-01-23 | 1 | -0/+1
| Signed-off-by: Connor Abbott <[email protected]>
| Reviewed-by: Jason Ekstrand <[email protected]>
* i965: Fix min_vs_entries for CHV | Ville Syrjälä | 2015-01-23 | 1 | -1/+1
| According to BSpec the correct number for min_vs_entries is 34 for CHV.
|
| Reviewed-by: Kenneth Graunke <[email protected]>
| Signed-off-by: Ville Syrjälä <[email protected]>
* i965: Fix max_wm_threads for CHV | Ville Syrjälä | 2015-01-23 | 1 | -1/+1
| Change max_wm_threads to match the spec on CHV.
|
| The max number of threads in 3DSTATE_PS is always programmed to 64 and the hardware internally scales that depending on the GT SKU. So this doesn't change the max number of threads actually used, but it does affect the scratch space calculation.
|
| On CHV the old value was too small, so the amount of scratch space allocated wasn't sufficient to satisfy the actual max number of threads used.
|
| Cc: [email protected]
| Reviewed-by: Kenneth Graunke <[email protected]>
| Signed-off-by: Ville Syrjälä <[email protected]>
* glsl: fix stale comment | Connor Abbott | 2015-01-23 | 1 | -5/+4
| Reviewed-by: Kenneth Graunke <[email protected]>
| Signed-off-by: Connor Abbott <[email protected]>
* i965/emit: Assert that src1 is not an MRF after doing the MRF->GRF conversion | Jason Ekstrand | 2015-01-22 | 1 | -1/+1
| When emitting texturing from indirect texture units, we need to be able to scratch around in the header message. Since we only do this for >= HSW, this is ok since there are no MRFs.
|
| Reviewed-by: Kenneth Graunke <[email protected]>
| Reviewed-by: Anuj phogat <[email protected]>
* i965/emit: Do the sampler index adjustment directly in header.0.3 | Jason Ekstrand | 2015-01-22 | 4 | -7/+5
| Prior to this commit, the adjust_sampler_state_pointer function took an extra register that it could use as scratch space. The usual candidate was the destination of the sampler instruction. However, if that register ever aliased anything important such as the sampler index, this would scratch all over important data. Fortunately, the calculation is such that we can just do it in place and we don't need the scratch space at all.
|
| Reviewed-by: Chris Forbes <[email protected]>
* st/nine: Correctly handle when ff vs should have no texture coord input/output | Axel Davy | 2015-01-22 | 1 | -11/+20
| Previous code semantic was:
| . if ff ps will not run a ff stage, then do not output texture coords for this stage for vs
| . if XYZRHW is used (position_t), use only the mode where input coordinates are copied to the outputs.
|
| Problem is when apps don't give texture inputs. When apps specify PASSTHRU, it means copy texture coord input to texture coord output if there is such input. The case where there is no texture coord input wasn't handled correctly. Drivers like r300 dislike when vs has inputs that are not fed.
|
| Moreover, if the app uses ff vs with a programmable ps, we shouldn't look at the parameters of the ff ps to decide whether to output texture coordinates.
|
| The new code semantic is:
| . if XYZRHW is used, restrict to PASSTHRU
| . if PASSTHRU is used and no texture input is declared, then do not output texture coords for this stage
|
| The case where ff ps needs a texture coord input and ff vs doesn't output it is not handled, and should probably be a runtime error.
|
| This fixes 3Dmark05, which uses ff vs with programmable ps.
|
| Reviewed-by: Tiziano Bacocco <[email protected]>
| Signed-off-by: Axel Davy <[email protected]>
* st/nine: Change comment relating to vertex shader inputs not matching declaration | Axel Davy | 2015-01-22 | 1 | -5/+6
| Reviewed-by: Tiziano Bacocco <[email protected]>
| Signed-off-by: Axel Davy <[email protected]>
* st/nine: Allocate vs constbuf buffer for indirect addressing once. | Axel Davy | 2015-01-22 | 3 | -5/+6
| When the shader does indirect addressing on the constants, we allocate a temporary constant buffer to which we copy the constants from the app given user constants and the constants filled in the shader.
|
| This patch makes this buffer be allocated once.
|
| Reviewed-by: Ilia Mirkin <[email protected]>
| Signed-off-by: Axel Davy <[email protected]>
| Signed-off-by: Tiziano Bacocco <[email protected]>
| Cc: "10.4" <[email protected]>
* st/nine: Allocate the correct size for the user constant buffer | Axel Davy | 2015-01-22 | 3 | -7/+8
| Reviewed-by: Ilia Mirkin <[email protected]>
| Signed-off-by: Axel Davy <[email protected]>
| Cc: "10.4" <[email protected]>
* st/nine: Add variables containing the size of the constant buffers | Axel Davy | 2015-01-22 | 3 | -6/+10
| Reviewed-by: Tiziano Bacocco <[email protected]>
| Signed-off-by: Axel Davy <[email protected]>
| Cc: "10.4" <[email protected]>
* st/nine: Fix sm3 relative addressing for non-debug build | Axel Davy | 2015-01-22 | 1 | -4/+0
| Relative addressing needs the constant buffer to get all the correct constants, even those defined by the shader.
|
| The code to copy the shader constants to the constant buffer was enabled only for debug build. Enable it always.
|
| Cc: "10.4" <[email protected]>
| Reviewed-by: Ilia Mirkin <[email protected]>
| Reviewed-by: David Heidelberg <[email protected]>
| Signed-off-by: Axel Davy <[email protected]>
* st/nine: Remove unused code for ps | Axel Davy | 2015-01-22 | 3 | -40/+15
| Since constant indirect addressing is not allowed for ps, we can remove our code to handle that.
|
| Reviewed-by: Ilia Mirkin <[email protected]>
| Signed-off-by: Axel Davy <[email protected]>
| Cc: "10.4" <[email protected]>
* st/nine: Correct rules for relative addressing and constants. | Axel Davy | 2015-01-22 | 1 | -6/+8
| Relative addressing for constants is possible only for vs float constants.
|
| Reviewed-by: Ilia Mirkin <[email protected]>
| Signed-off-by: Axel Davy <[email protected]>
| Cc: "10.4" <[email protected]>
* st/nine: Implement TEXREG2AR, TEXREG2GB and TEXREG2RGB | Axel Davy | 2015-01-22 | 1 | -3/+36
| Reviewed-by: Ilia Mirkin <[email protected]>
| Signed-off-by: Axel Davy <[email protected]>
| Cc: "10.4" <[email protected]>
* st/nine: Implement TEXDP3TEX | Axel Davy | 2015-01-22 | 1 | -1/+19
| Reviewed-by: Ilia Mirkin <[email protected]>
| Signed-off-by: Axel Davy <[email protected]>
| Cc: "10.4" <[email protected]>
* st/nine: Implement TEXDP3 | Axel Davy | 2015-01-22 | 1 | -1/+11
| Reviewed-by: Ilia Mirkin <[email protected]>
| Signed-off-by: Axel Davy <[email protected]>
| Cc: "10.4" <[email protected]>
* st/nine: Implement TEXDEPTH | Axel Davy | 2015-01-22 | 1 | -1/+22
| Reviewed-by: Ilia Mirkin <[email protected]>
| Reviewed-by: David Heidelberg <[email protected]>
| Signed-off-by: Axel Davy <[email protected]>
| Cc: "10.4" <[email protected]>
* st/nine: Implement TEXM3x3SPEC | Axel Davy | 2015-01-22 | 1 | -1/+38
| Reviewed-by: Ilia Mirkin <[email protected]>
| Signed-off-by: Axel Davy <[email protected]>
| Cc: "10.4" <[email protected]>
* st/nine: Implement TEXM3x2TEX | Axel Davy | 2015-01-22 | 1 | -1/+19
| Reviewed-by: Ilia Mirkin <[email protected]>
| Reviewed-by: David Heidelberg <[email protected]>
| Signed-off-by: Axel Davy <[email protected]>
| Cc: "10.4" <[email protected]>
* st/nine: implement TEXM3x2DEPTH | Axel Davy | 2015-01-22 | 1 | -1/+26
| Reviewed-by: Ilia Mirkin <[email protected]>
| Signed-off-by: Axel Davy <[email protected]>
| Cc: "10.4" <[email protected]>
* st/nine: Fix TEXM3x3 and implement TEXM3x3VSPEC | Axel Davy | 2015-01-22 | 1 | -17/+36
| The fix is that this line:
|    "src[s] = tx->regs.vT[s];"
| is wrong if s doesn't start from 0. Instead access tx->regs.vT directly when needed.
|
| Reviewed-by: Ilia Mirkin <[email protected]>
| Signed-off-by: Axel Davy <[email protected]>
| Cc: "10.4" <[email protected]>
* st/nine: Fill missing dst and src number for some instructions. | Axel Davy | 2015-01-22 | 1 | -23/+23
| Not filling them correctly results in bad padding and later crash.
|
| Reviewed-by: David Heidelberg <[email protected]>
| Reviewed-by: Ilia Mirkin <[email protected]>
| Signed-off-by: Axel Davy <[email protected]>
| Cc: "10.4" <[email protected]>
* st/nine: Implement TEXCOORD special behaviours | Axel Davy | 2015-01-22 | 1 | -5/+26
| texcoord for ps < 1_4 should clamp the values between 0 and 1.
|
| texcrd (texcoord ps 1_4) does not clamp and can be used with two modifiers, _dw and _dz, which mean the channels are divided by w or z. Implement those in shared code, since the same modifiers can be used for texld ps 1_4.
|
| v2: replace DIV by RCP + MUL
| v3: Remove a useless MOV
|
| Reviewed-by: Tiziano Bacocco <[email protected]>
| Signed-off-by: Axel Davy <[email protected]>
| Cc: "10.4" <[email protected]>
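The two behaviours can be sketched numerically. This is an illustration only, not the st/nine shader translation code: `saturate` models the 0..1 clamp, and `divide_channels` models a division done as one reciprocal followed by multiplies (the RCP + MUL form mentioned in v2). Which channels a given modifier actually writes is not stated above, so the caller chooses them here; both function names are invented.

```python
def saturate(channels):
    """Clamp each channel to the range [0, 1]."""
    return tuple(min(max(c, 0.0), 1.0) for c in channels)

def divide_channels(channels, divisor):
    """Divide channels by divisor using one reciprocal and N multiplies."""
    rcp = 1.0 / divisor                      # RCP: computed once
    return tuple(c * rcp for c in channels)  # MUL: one per channel

clamped = saturate((-0.5, 0.25, 1.5))
projected = divide_channels((2.0, 4.0), 2.0)
```

Computing the reciprocal once and reusing it replaces a division per channel with a cheaper multiply per channel.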
* st/nine: Fix CALLNZ implementation | Axel Davy | 2015-01-22 | 1 | -9/+4
| Nothing seems to indicate the negation modifier would be stored in the instruction flags instead of the source modifier.
|
| tx_src_param has already handled it if it is in the source modifier.
|
| In addition, when the card supports native integers, booleans are stored as 32-bit ints and are equal to 0 or 0xFFFFFFFF. Given that 0xFFFFFFFF is NaN when interpreted as a float, it is better to use UIF than IF.
|
| Reviewed-by: Ilia Mirkin <[email protected]>
| Signed-off-by: Axel Davy <[email protected]>
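The NaN claim is easy to verify by reinterpreting the bit pattern as an IEEE 754 single-precision float. The helper name below is made up for the example; how a particular driver's float-based IF treats NaN is not covered here.

```python
import math
import struct

def bits_to_float(bits):
    """Reinterpret a 32-bit unsigned integer as a single-precision float."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]

true_as_float = bits_to_float(0xFFFFFFFF)   # boolean "true" pattern
false_as_float = bits_to_float(0x00000000)  # boolean "false" pattern

true_is_nan = math.isnan(true_as_float)
```

An all-ones exponent with a non-zero mantissa is a NaN, so a float comparison on the "true" value is unreliable, while an integer test against zero is unambiguous.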
* st/nine: Fix some fixed function pipeline operation | Axel Davy | 2015-01-22 | 1 | -2/+4
| Reviewed-by: Ilia Mirkin <[email protected]>
| Signed-off-by: Axel Davy <[email protected]>
| Cc: "10.4" <[email protected]>
* st/nine: Clamp ps 1.X constants | Axel Davy | 2015-01-22 | 1 | -0/+7
| This is Wine (and Windows) behaviour.
|
| Reviewed-by: Ilia Mirkin <[email protected]>
| Signed-off-by: Axel Davy <[email protected]>
| Cc: "10.4" <[email protected]>