path: root/src/mesa
Commit message | Author | Age | Files | Lines
* nir: Remove predication | Jason Ekstrand | 2015-01-15 | 1 | -62/+11
    We stopped generating predicates in glsl_to_nir some time ago. Right now, it's all dead untested code that I'm not convinced always worked in the first place. If we decide we want them back, we can revert this patch.
    Reviewed-by: Connor Abbott <[email protected]>
* nir: Make bcsel a fully vector operation | Jason Ekstrand | 2015-01-15 | 1 | -3/+8
    Previously, the condition was a scalar that applied to all components simultaneously. As of this commit, the condition is a vector and each component is switched separately.
    Reviewed-by: Connor Abbott <[email protected]>
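The difference between the two bcsel behaviours can be sketched in plain C. This is only an illustration: the `vec4f` and `vec4b` types and both function names are hypothetical and are not NIR's actual data structures.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical 4-component vector types, used only for this illustration. */
typedef struct { float v[4]; } vec4f;
typedef struct { bool v[4]; } vec4b;

/* Old behaviour: one scalar condition selects all of a or all of b. */
static vec4f bcsel_scalar(bool cond, vec4f a, vec4f b)
{
    return cond ? a : b;
}

/* New behaviour: each component of cond selects the matching component. */
static vec4f bcsel_vector(vec4b cond, vec4f a, vec4f b)
{
    vec4f r;
    for (int i = 0; i < 4; i++)
        r.v[i] = cond.v[i] ? a.v[i] : b.v[i];
    return r;
}
```

With a condition of (true, false, false, true), `bcsel_vector` takes components 0 and 3 from `a` and components 1 and 2 from `b`, which the scalar form cannot express in one operation.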
* i965/fs_nir: Add support for indirect texture arrays | Jason Ekstrand | 2015-01-15 | 1 | -4/+21
    v2 Jason Ekstrand <[email protected]>:
     - Use the nir_tex_src_sampler_offset source type instead of the sampler_indirect thing that I cooked up before.
    Reviewed-by: Chris Forbes <[email protected]>
* nir/tex_instr: Rename the indirect source type and add an array size | Jason Ekstrand | 2015-01-15 | 1 | -1/+1
    In particular, we rename nir_tex_src_sampler_index to _sampler_offset and add a sampler_array_size field to nir_tex_instr. This way we can pass the size of sampler arrays through to backends even after removing the variable information and, with it, the type.
    Reviewed-by: Connor Abbott <[email protected]>
* nir: Use a source for uniform buffer indices instead of an index | Jason Ekstrand | 2015-01-15 | 1 | -37/+59
    In GLSL-to-NIR we were just setting the base index to 0 whenever there was an indirect, so having it expressed as a sum makes no sense. Also, while a base offset may make sense for the memory location (first element in the array, etc.), it makes less sense for the actual uniform buffer index. This may change later, but it seems to make more sense for now.
    Reviewed-by: Connor Abbott <[email protected]>
* nir: Make texture instruction names more consistent | Jason Ekstrand | 2015-01-15 | 1 | -2/+2
    This commit renames nir_instr_as_texture to nir_instr_as_tex and renames nir_instr_type_texture to nir_instr_type_tex to be consistent with nir_tex_instr.
    Reviewed-by: Connor Abbott <[email protected]>
* nir: Add a basic constant folding pass | Jason Ekstrand | 2015-01-15 | 1 | -0/+2
    Reviewed-by: Connor Abbott <[email protected]>
* nir: Add an algebraic optimization pass | Jason Ekstrand | 2015-01-15 | 1 | -1/+1
    This pass uses the previously built algebraic transformations framework and should act as an example for anyone else wanting to make an algebraic transformation pass for NIR.
    Reviewed-by: Matt Turner <[email protected]>
    Reviewed-by: Connor Abbott <[email protected]>
* nir: Add a lowering pass for adding source modifiers where possible | Jason Ekstrand | 2015-01-15 | 1 | -0/+5
    Reviewed-by: Connor Abbott <[email protected]>
* i965/fs_nir: Implement the ARB_gpu_shader5 interpolation intrinsics | Jason Ekstrand | 2015-01-15 | 1 | -0/+120
    Reviewed-by: Chris Forbes <[email protected]>
* i965/fs_nir: Add a has_indirect flag and clean up some of the input/output code | Jason Ekstrand | 2015-01-15 | 1 | -63/+14
    Reviewed-by: Connor Abbott <[email protected]>
* nir: Vectorize intrinsics | Jason Ekstrand | 2015-01-15 | 1 | -48/+16
    We used to have the number of components built into the intrinsic. This meant that all of our load/store intrinsics had vec1, vec2, vec3, and vec4 variants. This led to piles of switch statements to generate the correct intrinsic names, and introspection to figure out the number of components. We can make things much nicer by allowing "vectorized" intrinsics.
    Reviewed-by: Connor Abbott <[email protected]>
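A rough sketch of why per-size variants force switch statements, and how carrying the component count as data avoids them. Every enum, struct, and function name below is hypothetical and chosen for illustration; none of it is NIR's real API.

```c
#include <assert.h>

/* Hypothetical "before": one opcode per vector size. */
enum old_op { OLD_LOAD_VEC1, OLD_LOAD_VEC2, OLD_LOAD_VEC3, OLD_LOAD_VEC4 };

/* Picking the opcode needs a switch on the component count. */
static enum old_op old_load_op(unsigned num_components)
{
    switch (num_components) {
    case 1: return OLD_LOAD_VEC1;
    case 2: return OLD_LOAD_VEC2;
    case 3: return OLD_LOAD_VEC3;
    default:
        assert(num_components == 4);
        return OLD_LOAD_VEC4;
    }
}

/* Hypothetical "after": one opcode, with the size stored alongside it. */
enum new_op { NEW_LOAD };

struct new_intrinsic {
    enum new_op op;
    unsigned num_components;
};

static struct new_intrinsic new_load(unsigned num_components)
{
    assert(num_components >= 1 && num_components <= 4);
    struct new_intrinsic instr = { NEW_LOAD, num_components };
    return instr;
}
```

In the second form, a consumer reads `num_components` directly instead of decoding it from the opcode name.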
* i965/fs_nir: Use the new variable lowering code | Jason Ekstrand | 2015-01-15 | 1 | -19/+25
    This commit switches us over to the new variable lowering code which is capable of properly handling lowering indirects as we go.
    Reviewed-by: Connor Abbott <[email protected]>
* i965/fs_nir: Don't dump the shader. | Jason Ekstrand | 2015-01-15 | 1 | -5/+0
    This is killing piglit. I'll leave the logging local.
    Reviewed-by: Connor Abbott <[email protected]>
* i965/fs_nir: Properly saturate multiplies | Jason Ekstrand | 2015-01-15 | 1 | -1/+1
    Reviewed-by: Connor Abbott <[email protected]>
* i965/fs_nir: Handle SSA constants | Jason Ekstrand | 2015-01-15 | 1 | -17/+33
    Reviewed-by: Connor Abbott <[email protected]>
* i965/fs_nir: Use an array rather than a hash table for register lookup | Jason Ekstrand | 2015-01-15 | 3 | -23/+30
    Reviewed-by: Connor Abbott <[email protected]>
* i965/fs_nir: Add the CSE pass and actually run in a loop | Jason Ekstrand | 2015-01-15 | 1 | -13/+18
    Reviewed-by: Connor Abbott <[email protected]>
* nir: Add a fused multiply-add peephole | Jason Ekstrand | 2015-01-15 | 1 | -0/+2
* i965/fs_nir: Turn on the peephole select optimization | Jason Ekstrand | 2015-01-15 | 1 | -0/+2
    Reviewed-by: Connor Abbott <[email protected]>
* i965/fs_nir: Validate optimization passes | Jason Ekstrand | 2015-01-15 | 1 | -8/+15
    Reviewed-by: Connor Abbott <[email protected]>
* nir: Differentiate between signed and unsigned versions of find_msb | Jason Ekstrand | 2015-01-15 | 1 | -6/+8
    We also make the return types match GLSL. The GLSL spec specifies that findMSB and findLSB return a signed integer. Previously, nir had them return unsigned. This updates nir's behavior to match what GLSL expects.
    We also update the nir-to-fs generator to take the new instructions. While we're at it, we fix the case where the input to findMSB is zero.
    Reviewed-by: Connor Abbott <[email protected]>
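For reference, a minimal C sketch of the GLSL findMSB semantics that the signed and unsigned variants need to reproduce, including the zero case mentioned above. The function names are illustrative, and the loop is a plain reference implementation, not the code the i965 backend generates.

```c
#include <assert.h>
#include <stdint.h>

/* Unsigned findMSB: bit index of the most significant 1 bit.
 * Returns -1 when the input is zero, hence the signed return type. */
static int ufind_msb(uint32_t x)
{
    int bit = -1;
    while (x != 0) {
        bit++;
        x >>= 1;
    }
    return bit;
}

/* Signed findMSB: for non-negative inputs this matches the unsigned case.
 * For negative inputs GLSL asks for the most significant 0 bit, which is
 * the most significant 1 bit of the bitwise complement.
 * Both 0 and -1 therefore return -1. */
static int ifind_msb(int32_t x)
{
    uint32_t bits = (uint32_t)x;
    if (x < 0)
        bits = ~bits;
    return ufind_msb(bits);
}
```

The same bit pattern can give different answers: 0xFFFFFFFE yields 31 as an unsigned input, but as the signed value -2 it yields 0.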
* i965/fs_nir: Do retyping for ALU sources in get_nir_alu_src | Jason Ekstrand | 2015-01-15 | 1 | -15/+8
    Reviewed-by: Connor Abbott <[email protected]>
* nir: Add an SSA-based liveness analysis pass. | Jason Ekstrand | 2015-01-15 | 1 | -0/+1
    Reviewed-by: Connor Abbott <[email protected]>
* i965/fs_nir: Convert the shader to/from SSA | Jason Ekstrand | 2015-01-15 | 1 | -0/+9
    Reviewed-by: Connor Abbott <[email protected]>
* i965/fs_nir: Don't duplicate emit_general_interpolation | Jason Ekstrand | 2015-01-15 | 2 | -110/+4
    Reviewed-by: Connor Abbott <[email protected]>
* i965/fs: Don't take an ir_variable for emit_general_interpolation | Jason Ekstrand | 2015-01-15 | 4 | -35/+41
    Previously, emit_general_interpolation took an ir_variable and pulled the information it needed from that. This meant that in fs_fp, we were constructing a dummy ir_variable just to pass into it. This commit makes emit_general_interpolation take only the information it needs and gets rid of the fs_fp cruft.
    Reviewed-by: Connor Abbott <[email protected]>
* i965/fs_nir: Add atomic counters support | Jason Ekstrand | 2015-01-15 | 1 | -3/+22
    Reviewed-by: Connor Abbott <[email protected]>
* i965/fs_nir: Handle coarse/fine derivatives | Jason Ekstrand | 2015-01-15 | 1 | -0/+18
    Reviewed-by: Connor Abbott <[email protected]>
* i965/fs_nir: Add support for sample_pos and sample_id | Jason Ekstrand | 2015-01-15 | 1 | -3/+14
* Fix up varying pull constants | Jason Ekstrand | 2015-01-15 | 1 | -1/+1
    Reviewed-by: Connor Abbott <[email protected]>
* i965/fs_nir: Use the correct texture offset immediate | Jason Ekstrand | 2015-01-15 | 1 | -4/+3
    Reviewed-by: Connor Abbott <[email protected]>
* i965/fs_nir: Use the correct types for texture inputs | Jason Ekstrand | 2015-01-15 | 1 | -7/+25
    Reviewed-by: Connor Abbott <[email protected]>
* i965/fs_nir: Make the sampler register always unsigned | Jason Ekstrand | 2015-01-15 | 1 | -2/+2
    Reviewed-by: Connor Abbott <[email protected]>
* i965/fs: Only use nir for 8-wide non-fast-clear shaders. | Jason Ekstrand | 2015-01-15 | 1 | -1/+2
    Reviewed-by: Connor Abbott <[email protected]>
* i965/fs: add a NIR frontend | Connor Abbott | 2015-01-15 | 5 | -4/+1756
    This is similar to the GLSL IR frontend, except consuming NIR. This lets us test NIR as part of an actual compiler.
    v2: Jason Ekstrand <[email protected]>:
      Make brw_fs_nir build again
      Only use NIR if INTEL_USE_NIR is set
      whitespace fixes
* i965/fs: Don't pass through the coordinate type | Connor Abbott | 2015-01-15 | 3 | -22/+21
    All we really need is the number of components.
* i965/fs: make emit_fragcoord_interpolation() not take an ir_variable | Connor Abbott | 2015-01-15 | 4 | -9/+14
* mesa: Micro-optimize _mesa_is_valid_prim_mode | Ian Romanick | 2015-01-14 | 1 | -18/+12
    You would not believe the mess GCC 4.8.3 generated for the old switch-statement.
    On Bay Trail-D using Fedora 20 compile flags (-m64 -O2 -mtune=generic for 64-bit and -m32 -march=i686 -mtune=atom for 32-bit), affects Gl32Batch7:
      32-bit: Difference at 95.0% confidence -0.37374% +/- 0.184057% (n=40)
      64-bit: Difference at 95.0% confidence 0.966722% +/- 0.338442% (n=40)
    The regression on 32-bit is odd. Callgrind says the caller, _mesa_is_valid_prim_mode is faster. Before it says 2,293,760 cycles, and after it says 917,504.
    Signed-off-by: Ian Romanick <[email protected]>
    Reviewed-by: Kenneth Graunke <[email protected]>
* mesa: Check for vertex program the same way in desktop GL and ES | Ian Romanick | 2015-01-14 | 1 | -11/+3
    On Bay Trail-D using Fedora 20 compile flags (-m64 -O2 -mtune=generic for 64-bit and -m32 -march=i686 -mtune=atom for 32-bit), affects Gl32Multithread:
      32-bit: Difference at 95.0% confidence 0.416027% +/- 0.163529% (n=40)
      64-bit: Difference at 95.0% confidence 0.494771% +/- 0.259985% (n=40)
    Gl32Batch7 had no difference proven at 95.0% confidence (n=120) on 32-bit or 64-bit.
    Signed-off-by: Ian Romanick <[email protected]>
    Reviewed-by: Kenneth Graunke <[email protected]>
* mesa: Drop index buffer bounds check | Ian Romanick | 2015-01-14 | 1 | -48/+7
    The previous check was insufficient (as it did not take 'indices' into consideration), and DX10 hardware does not need this check anyway.
    Since index_bytes is no longer used, remove it.
    On Bay Trail-D using Fedora 20 compile flags (-m64 -O2 -mtune=generic for 64-bit and -m32 -march=i686 -mtune=atom for 32-bit), affects Gl32Batch7:
      32-bit: Difference at 95.0% confidence 1.66929% +/- 0.230107% (n=40)
      64-bit: Difference at 95.0% confidence -1.40848% +/- 0.288038% (n=40)
    The regression on 64-bit is odd. Callgrind says the caller, validate_DrawElements_common is faster. Before it says 10,321,920 cycles, and after it says 8,945,664.
    Signed-off-by: Ian Romanick <[email protected]>
    Reviewed-by: Kenneth Graunke <[email protected]>
* mesa: Only check for a current vertex shader in core profile | Ian Romanick | 2015-01-14 | 1 | -1/+13
    This doesn't affect performance, but it feels more correct.
    On Bay Trail-D using Fedora 20 compile flags (-m64 -O2 -mtune=generic for 64-bit and -m32 -march=i686 -mtune=atom for 32-bit), affects Gl32Batch7:
      32-bit: No difference proven at 95.0% confidence (n=120)
      64-bit: No difference proven at 95.0% confidence (n=120)
    Signed-off-by: Ian Romanick <[email protected]>
    Reviewed-by: Kenneth Graunke <[email protected]>
* mesa: Only validate shaders that can exist in the context | Ian Romanick | 2015-01-14 | 1 | -29/+49
    On Bay Trail-D using Fedora 20 compile flags (-m64 -O2 -mtune=generic for 64-bit and -m32 -march=i686 -mtune=atom for 32-bit), affects Gl32Batch7:
      32-bit: Difference at 95.0% confidence 0.495267% +/- 0.202063% (n=40)
      64-bit: Difference at 95.0% confidence 3.57576% +/- 0.288175% (n=40)
    Signed-off-by: Ian Romanick <[email protected]>
    Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Store the atoms directly in the context | Ian Romanick | 2015-01-14 | 2 | -4/+17
    Instead of having an extra pointer indirection in one of the hottest loops in the driver.
    On Bay Trail-D using Fedora 20 compile flags (-m64 -O2 -mtune=generic for 64-bit and -m32 -march=i686 -mtune=atom for 32-bit), affects Gl32Batch7:
      32-bit: Difference at 95.0% confidence 1.98515% +/- 0.20814% (n=40)
      64-bit: Difference at 95.0% confidence 1.5163% +/- 0.811016% (n=60)
    v2 (Ken): Cut size of array from 64 to 57 to save memory.
    Signed-off-by: Ian Romanick <[email protected]>
    Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Micro-optimize brw_get_index_type | Ian Romanick | 2015-01-14 | 3 | -14/+23
    With the switch-statement, GCC 4.8.3 produces a small pile of code with a branch.
      00000000 <brw_get_index_type>:
        000000: 8b 54 24 04          mov    0x4(%esp),%edx
        000004: b8 01 00 00 00       mov    $0x1,%eax
        000009: 81 fa 03 14 00 00    cmp    $0x1403,%edx
        00000f: 74 0d                je     00001e <brw_get_index_type+0x1e>
        000011: 31 c0                xor    %eax,%eax
        000013: 81 fa 05 14 00 00    cmp    $0x1405,%edx
        000019: 0f 94 c0             sete   %al
        00001c: 01 c0                add    %eax,%eax
        00001e: c3                   ret
    However, this could be two instructions.
      00000000 <brw_get_index_type>:
        000000: 2d 01 14 00 00       sub    $0x1401,%eax
        000005: d1 e8                shr    %eax
        000007: 90                   nop
        000008: 90                   nop
        000009: 90                   nop
        00000a: 90                   nop
        00000b: c3                   ret
    The function was also moved to the header so that it could be inlined at the two call sites. Without this, 32-bit also needs to pull the parameter from the stack. This means there is a push, a call, a move, and a ret added to a two instruction function. The above code shows the function with __attribute__((regparm=1)), but even this adds several extra instructions. There is also an extra instruction on 64-bit to move the parameter to %eax for the subtract.
    On Bay Trail-D using Fedora 20 compile flags (-m64 -O2 -mtune=generic for 64-bit and -m32 -march=i686 -mtune=atom for 32-bit), affects Gl32Batch7:
      32-bit: Difference at 95.0% confidence 0.818589% +/- 0.234661% (n=40)
      64-bit: Difference at 95.0% confidence 0.54554% +/- 0.354092% (n=40)
    Signed-off-by: Ian Romanick <[email protected]>
    Reviewed-by: Kenneth Graunke <[email protected]>
    Reviewed-by: Matt Turner <[email protected]>
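The two disassembly listings correspond roughly to the following C. This is a sketch reconstructed from the listings, not the driver's source: the function names are made up, and the return values 0, 1, and 2 are simply what the first listing computes for the three index types. The hex values are the standard OpenGL enums for the unsigned byte, short, and int types.

```c
#include <assert.h>

/* Standard OpenGL enum values for the three legal index types. */
#define GL_UNSIGNED_BYTE  0x1401
#define GL_UNSIGNED_SHORT 0x1403
#define GL_UNSIGNED_INT   0x1405

/* Branching form, matching the first listing: 1 for 0x1403,
 * 2 for 0x1405, and 0 otherwise. */
static unsigned index_type_switch(unsigned type)
{
    switch (type) {
    case GL_UNSIGNED_SHORT: return 1;
    case GL_UNSIGNED_INT:   return 2;
    default:                return 0;
    }
}

/* Arithmetic form, matching the second listing: a subtract and a shift.
 * The enums are spaced two apart, so (type - 0x1401) >> 1 gives 0, 1, 2.
 * It is only valid when the caller has already checked the type. */
static inline unsigned index_type_arith(unsigned type)
{
    assert(type == GL_UNSIGNED_BYTE ||
           type == GL_UNSIGNED_SHORT ||
           type == GL_UNSIGNED_INT);
    return (type - GL_UNSIGNED_BYTE) >> 1;
}
```

The arithmetic version trades the switch's tolerance of unexpected values for branch-free code, which is why it relies on the type having been validated earlier.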
* meta: Put _mesa_meta_in_progress in the header file | Ian Romanick | 2015-01-14 | 2 | -12/+5
    ...so that it can be inlined in the two places that call it.
    On Bay Trail-D using Fedora 20 compile flags (-m64 -O2 -mtune=generic for 64-bit and -m32 -march=i686 -mtune=atom for 32-bit), affects Gl32Batch7:
      32-bit: No difference proven at 95.0% confidence (n=120)
      64-bit: Difference at 95.0% confidence 1.24042% +/- 0.382277% (n=40)
    Signed-off-by: Ian Romanick <[email protected]>
    Reviewed-by: Kenneth Graunke <[email protected]>
    Reviewed-by: Matt Turner <[email protected]>
* i965: Fix "vertex" vs. "geometry" and "VS" vs. "GS" in debug output. | Kenneth Graunke | 2015-01-14 | 4 | -10/+21
    We were happily printing "Native code for unnamed vertex shader" and "VS vec4" program for geometry shaders in our INTEL_DEBUG=gs output, as well as the KHR_debug output used by shader-db.
    Signed-off-by: Kenneth Graunke <[email protected]>
    Reviewed-by: Matt Turner <[email protected]>
* i965: Pass a shader stage abbreviation to fs_generator(). | Kenneth Graunke | 2015-01-14 | 5 | -11/+15
    A lot of messages hardcoded the string "FS", which is confusing on Broadwell, where we use this code for VS support as well.
    shader-db particularly got confused, as it reported two "FS SIMD8" shaders, and no vertex shaders at all. Craziness ensued.
    Signed-off-by: Kenneth Graunke <[email protected]>
    Reviewed-by: Matt Turner <[email protected]>
* mesa: rename RGBA8888_* format constants to something appropriate. | Iago Toral Quiroga | 2015-01-14 | 6 | -22/+22
    The 8888 suggests 8-bit components which is not correct, so replace that with the actual size of the components in each format.
    Reviewed-by: Jason Ekstrand <[email protected]>
* i965/miptree_map_blit: Don't do the initial copy if INVALIDATE_RANGE is set | Jason Ekstrand | 2015-01-13 | 1 | -8/+15
    Before, we were always copying from the buffer being mapped into the temporary buffer. However, if INVALIDATE_RANGE is set, then we know that the data is going to be junk after we unmap, so there's no point in doing the blit. This is important because doing the blit will cause a stall 3 lines later when we map the buffer.
    Signed-off-by: Jason Ekstrand <[email protected]>
    Reviewed-by: Kenneth Graunke <[email protected]>
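The control flow described here can be sketched with ordinary host memory. This is a simplified stand-in, not the driver's code: `map_via_temp`, `SKETCH_MAP_INVALIDATE_RANGE`, and the `copied` out-parameter are all hypothetical, and `memcpy` takes the place of the GPU blit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical flag for this sketch; it stands in for INVALIDATE_RANGE. */
#define SKETCH_MAP_INVALIDATE_RANGE 0x4u

/* Map a region through a temporary buffer. The initial copy (a blit in the
 * driver, memcpy here) is skipped when the caller promises to overwrite
 * the whole range, so nothing has to wait for that copy to finish.
 * The caller owns the returned buffer and must free() it. */
static void *map_via_temp(const void *src, size_t size, unsigned flags,
                          bool *copied)
{
    assert(src != NULL && size > 0 && copied != NULL);

    void *temp = malloc(size);
    *copied = false;
    if (temp == NULL)
        return NULL;

    if (!(flags & SKETCH_MAP_INVALIDATE_RANGE)) {
        memcpy(temp, src, size);
        *copied = true;
    }
    return temp;
}
```

When the flag is set, the temporary buffer's contents are left uninitialized, which is acceptable only because the caller has declared that it will not read the old data.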