summaryrefslogtreecommitdiffstats
path: root/src/mesa/drivers/dri
Commit message (Collapse)AuthorAgeFilesLines
* i965/fs: Emit MADs from (x + -(y * z)).Matt Turner2015-01-151-0/+12
| | | | | | | | | | Just use the negation source modifier on one of the multiplicand arguments. total instructions in shared programs: 5889529 -> 5880016 (-0.16%) instructions in affected programs: 600846 -> 591333 (-1.58%) Reviewed-by: Kristian Høgsberg <[email protected]>
* i965/nir: Do a final copy lowering pass before lowering locals to regsJason Ekstrand2015-01-151-0/+3
| | | | Reviewed-by: Connor Abbott <[email protected]>
* nir: Rename lower_variables to lower_vars_to_ssaJason Ekstrand2015-01-151-1/+1
| | | | | | | | The original name wasn't particularly descriptive. This one indicates that it actually gives you SSA values as opposed to the old pass which lowered variables to registers. Reviewed-by: Connor Abbott <[email protected]>
* nir/tex_instr: Add a nir_tex_src struct and dynamically allocate the src arrayJason Ekstrand2015-01-151-2/+2
| | | | | | | | This solves a number of problems. First is the ability to change the number of sources that a texture instruction has. Second, it solves the delema that may occur if a texture instruction has more than 4 sources. Reviewed-by: Connor Abbott <[email protected]>
* i965/fs_nir: Handle sample ID, position, and mask betterJason Ekstrand2015-01-152-12/+71
| | | | | | | | | | Before, we were emitting the full pile of setup instructions for sample_id and sample_pos every time they were used. With this commit, we emit them in their own pass once at the beginning of the shader and simply emit uses later on. When it comes time for setting up VS, we can put setup for its special values in the same pass. Reviewed-by: Connor Abbott <[email protected]>
* nir: Make load_const SSA-onlyJason Ekstrand2015-01-152-26/+3
| | | | | | | | As it was, we weren't ever using load_const in a non-SSA way. This allows us to substantially simplify the load_const instruction. If we ever need a non-SSA constant load, we can do a load_const and an imov. Reviewed-by: Connor Abbott <[email protected]>
* i965/nir: Move the other lowering passes to before out-of-SSAJason Ekstrand2015-01-151-6/+6
| | | | Reviewed-by: Connor Abbott <[email protected]>
* nir/lower_atomics: Use/support SSAJason Ekstrand2015-01-151-3/+3
| | | | | | | | | | | Previously, lower_atomics was non-SSA only. We assert-failed if the destination of an atomic operation intrinsic was an SSA def and we used temporary registers for computing offsets. This commit changes both of these behaviors. We now use SSA values for computing offsets (so we can optimize them) and we handle SSA destinations. We also move the pass to run before we go out of SSA on i965 as it now generates SSA values. Reviewed-by: Connor Abbott <[email protected]>
* nir: Remove predicationJason Ekstrand2015-01-151-62/+11
| | | | | | | | We stopped generating predicates in glsl_to_nir some time ago. Right now, it's all dead untested code that I'm not convinced always worked in the first place. If we decide we want them back, we can revert this patch. Reviewed-by: Connor Abbott <[email protected]>
* nir: Make bcsel a fully vector operationJason Ekstrand2015-01-151-3/+8
| | | | | | | | Previously, the condition was a scalar that applied to all components simultaneously. As of this commit, the condition is a vector and each component is switched seperately. Reviewed-by: Connor Abbott <[email protected]>
* i965/fs_nir: Add support for indirect texture arraysJason Ekstrand2015-01-151-4/+21
| | | | | | | | v2 Jason Ekstrand <[email protected]>: - Use the nir_tex_src_sampler_offset source type instead of the sampler_indirect thing that I cooked up before. Reviewed-by: Chris Forbes <[email protected]>
* nir/tex_instr: Rename the indirect source type and add an array sizeJason Ekstrand2015-01-151-1/+1
| | | | | | | | | In particular, we rename nir_tex_src_sampler_index to _sampler_offset and add a sampler_array_size field to nir_tex_instr. This way we can pass the size of sampler arrays through to backends even after removing the variable information and, with it, the type. Reviewed-by: Connor Abbott <[email protected]>
* nir: Use a source for uniform buffer indices instead of an indexJason Ekstrand2015-01-151-37/+59
| | | | | | | | | | In GLSL-to-NIR we were just setting the base index to 0 whenever there was an indirect so having it expressed as a sum makes no sense. Also, while a base offset may make sense for the memory location (first element in the array, etc.) it makes less sense for the actual uniform buffer index. This may change later, but it seems to make more sense for now. Reviewed-by: Connor Abbott <[email protected]>
* nir: Make texture instruction names more consistentJason Ekstrand2015-01-151-2/+2
| | | | | | | | This commit renames nir_instr_as_texture to nir_instr_as_tex and renames nir_instr_type_texture to nir_instr_type_tex to be consistent with nir_tex_instr. Reviewed-by: Connor Abbott <[email protected]>
* nir: Add a basic constant folding passJason Ekstrand2015-01-151-0/+2
| | | | Reviewed-by: Connor Abbott <[email protected]>
* nir: Add an algebraic optimization passJason Ekstrand2015-01-151-1/+1
| | | | | | | | | This pass uses the previously built algebraic transformations framework and should act as an example for anyone else wanting to make an algebraic transformation pass for NIR. Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Connor Abbott <[email protected]>
* nir: Add a lowering pass for adding source modifiers where possibleJason Ekstrand2015-01-151-0/+5
| | | | Reviewed-by: Connor Abbott <[email protected]>
* i965/fs_nir: Implement the ARB_gpu_shader5 interpolation intrinsicsJason Ekstrand2015-01-151-0/+120
| | | | Reviewed-by: Chris Forbes <[email protected]>
* i965/fs_nir: Add a has_indirect flag and clean up some of the input/output codeJason Ekstrand2015-01-151-63/+14
| | | | Reviewed-by: Connor Abbott <[email protected]>
* nir: Vectorize intrinsicsJason Ekstrand2015-01-151-48/+16
| | | | | | | | | | We used to have the number of components built into the intrinsic. This meant that all of our load/store intrinsics had vec1, vec2, vec3, and vec4 variants. This lead to piles of switch statements to generate the correct intrinsic names, and introspection to figure out the number of components. We can make things much nicer by allowing "vectorized" intrinsics. Reviewed-by: Connor Abbott <[email protected]>
* i965/fs_nir: Use the new variable lowering codeJason Ekstrand2015-01-151-19/+25
| | | | | | | This commit switches us over to the new variable lowering code which is capable of properly handling lowering indirects as we go. Reviewed-by: Connor Abbott <[email protected]>
* i965/fs_nir: Don't dump the shader.Jason Ekstrand2015-01-151-5/+0
| | | | | | This is killing piglit. I'll leave the logging local Reviewed-by: Connor Abbott <[email protected]>
* i965/fs_nir: Properly saturate multipliesJason Ekstrand2015-01-151-1/+1
| | | | Reviewed-by: Connor Abbott <[email protected]>
* i965/fs_nir: Handle SSA constantsJason Ekstrand2015-01-151-17/+33
| | | | Reviewed-by: Connor Abbott <[email protected]>
* i965/fs_nir: Use an array rather than a hash table for register lookupJason Ekstrand2015-01-153-23/+30
| | | | Reviewed-by: Connor Abbott <[email protected]>
* i965/fs_nir: Add the CSE pass and actually run in a loopJason Ekstrand2015-01-151-13/+18
| | | | Reviewed-by: Connor Abbott <[email protected]>
* nir: Add a fused multiply-add peepholeJason Ekstrand2015-01-151-0/+2
|
* i965/fs_nir: Turn on the peephole select optimizationJason Ekstrand2015-01-151-0/+2
| | | | Reviewed-by: Connor Abbott <[email protected]>
* i965/fs_nir: Validate optimization passesJason Ekstrand2015-01-151-8/+15
| | | | Reviewed-by: Connor Abbott <[email protected]>
* nir: Differentiate between signed and unsigned versions of find_msbJason Ekstrand2015-01-151-6/+8
| | | | | | | | | | | We also make the return types match GLSL. The GLSL spec specifies that findMSB and findLSB return a signed integer. Previously, nir had them return unsigned. This updates nir's behavior to match what GLSL expects. We also update the nir-to-fs generator to take the new instructions. While we're at it, we fix the case where the input to findMSB is zero. Reviewed-by: Connor Abbott <[email protected]>
* i965/fs_nir: Do retyping for ALU srouces in get_nir_alu_srcJason Ekstrand2015-01-151-15/+8
| | | | Reviewed-by: Connor Abbott <[email protected]>
* i965/fs_nir: Convert the shader to/from SSAJason Ekstrand2015-01-151-0/+9
| | | | Reviewed-by: Connor Abbott <[email protected]>
* i965/fs_nir: Don't duplicate emit_general_interpolationJason Ekstrand2015-01-152-110/+4
| | | | Reviewed-by: Connor Abbott <[email protected]>
* i965/fs: Don't take an ir_variable for emit_general_interpolationJason Ekstrand2015-01-154-35/+41
| | | | | | | | | | Previously, emit_general_interpolation took an ir_variable and pulled the information it needed from that. This meant that in fs_fp, we were constructing a dummy ir_variable just to pass into it. This commit makes emit_general_interpolation take only the information it needs and gets rid of the fs_fp cruft. Reviewed-by: Connor Abbott <[email protected]>
* i965/fs_nir: Add atomic counters supportJason Ekstrand2015-01-151-3/+22
| | | | Reviewed-by: Connor Abbott <[email protected]>
* i965/fs_nir: Handle coarse/fine derivativesJason Ekstrand2015-01-151-0/+18
| | | | Reviewed-by: Connor Abbott <[email protected]>
* i965/fs_nir: Add support for sample_pos and sample_idJason Ekstrand2015-01-151-3/+14
|
* Fix up varying pull constantsJason Ekstrand2015-01-151-1/+1
| | | | Reviewed-by: Connor Abbott <[email protected]>
* i965/fs_nir: Use the correct texture offset immediateJason Ekstrand2015-01-151-4/+3
| | | | Reviewed-by: Connor Abbott <[email protected]>
* i965/fs_nir: Use the correct types for texture inputsJason Ekstrand2015-01-151-7/+25
| | | | Reviewed-by: Connor Abbott <[email protected]>
* i965/fs_nir: Make the sampler register always unsignedJason Ekstrand2015-01-151-2/+2
| | | | Reviewed-by: Connor Abbott <[email protected]>
* i965/fs: Only use nir for 8-wide non-fast-clear shaders.Jason Ekstrand2015-01-151-1/+2
| | | | Reviewed-by: Connor Abbott <[email protected]>
* i965/fs: add a NIR frontendConnor Abbott2015-01-155-4/+1756
| | | | | | | | | | This is similar to the GLSL IR frontend, except consuming NIR. This lets us test NIR as part of an actual compiler. v2: Jason Ekstrand <[email protected]>: Make brw_fs_nir build again Only use NIR of INTEL_USE_NIR is set whitespace fixes
* i965/fs: Don't pass through the coordinate typeConnor Abbott2015-01-153-22/+21
| | | | All we really need is the number of components.
* i965/fs: make emit_fragcoord_interpolation() not take an ir_variableConnor Abbott2015-01-154-9/+14
|
* i965: Store the atoms directly in the contextIan Romanick2015-01-142-4/+17
| | | | | | | | | | | | | | | | | Instead of having an extra pointer indirection in one of the hottest loops in the driver. On Bay Trail-D using Fedora 20 compile flags (-m64 -O2 -mtune=generic for 64-bit and -m32 -march=i686 -mtune=atom for 32-bit), affects Gl32Batch7: 32-bit: Difference at 95.0% confidence 1.98515% +/- 0.20814% (n=40) 64-bit: Difference at 95.0% confidence 1.5163% +/- 0.811016% (n=60) v2 (Ken): Cut size of array from 64 to 57 to save memory. Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Micro-optimize brw_get_index_typeIan Romanick2015-01-143-14/+23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | With the switch-statement, GCC 4.8.3 produces a small pile of code with a branch. 00000000 <brw_get_index_type>: 000000: 8b 54 24 04 mov 0x4(%esp),%edx 000004: b8 01 00 00 00 mov $0x1,%eax 000009: 81 fa 03 14 00 00 cmp $0x1403,%edx 00000f: 74 0d je 00001e <brw_get_index_type+0x1e> 000011: 31 c0 xor %eax,%eax 000013: 81 fa 05 14 00 00 cmp $0x1405,%edx 000019: 0f 94 c0 sete %al 00001c: 01 c0 add %eax,%eax 00001e: c3 ret However, this could be two instructions. 00000000 <brw_get_index_type>: 000000: 2d 01 14 00 00 sub $0x1401,%eax 000005: d1 e8 shr %eax 000007: 90 nop 000008: 90 nop 000009: 90 nop 00000a: 90 nop 00000b: c3 ret The function was also moved to the header so that it could be inlined at the two call sites. Without this, 32-bit also needs to pull the parameter from the stack. This means there is a push, a call, a move, and a ret added to a two instruction function. The above code shows the function with __attribute__((regparm=1)), but even this adds several extra instructions. There is also an extra instruction on 64-bit to move the parameter to %eax for the subtract. On Bay Trail-D using Fedora 20 compile flags (-m64 -O2 -mtune=generic for 64-bit and -m32 -march=i686 -mtune=atom for 32-bit), affects Gl32Batch7: 32-bit: Difference at 95.0% confidence 0.818589% +/- 0.234661% (n=40) 64-bit: Difference at 95.0% confidence 0.54554% +/- 0.354092% (n=40) Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965: Fix "vertex" vs. "geometry" and "VS" vs. "GS" in debug output.Kenneth Graunke2015-01-144-10/+21
| | | | | | | | | We were happily printing "Native code for unnamed vertex shader" and "VS vec4" program for geometry shaders in our INTEL_DEBUG=gs output, as well as the KHR_debug output used by shader-db. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965: Pass a shader stage abbreviation to fs_generator().Kenneth Graunke2015-01-145-11/+15
| | | | | | | | | | | A lot of messages hardcoded the string "FS", which is confusing on Broadwell, where we use this code for VS support as well. shader-db particularly got confused, as it reported two "FS SIMD8" shaders, and no vertex shaders at all. Craziness ensued. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965/miptree_map_blit: Don't do the initial copy if INVALIDATE_RANGE is setJason Ekstrand2015-01-131-8/+15
| | | | | | | | | | | Before we were always coping from the buffer being mapped into the temporary buffer. However, if INVALIDATE_RANGE is set, then we know that the data is going to be junk after we unmap so there's no point in doing the blit. This is important because doing the blit will cause a stall 3 lines later when we map the buffer. Signed-off-by: Jason Ekstrand <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>