Commit message | Author | Age | Files | Lines
* nir: add an optimization to turn global registers into local registers | Connor Abbott | 2015-01-15 | 3 | -0/+106
| After linking and inlining, this allows us to convert these registers into
| SSA values and optimise more code.
* nir: add a pass to lower atomics | Connor Abbott | 2015-01-15 | 3 | -0/+130
| v2: Jason Ekstrand <[email protected]>: whitespace fixes
* nir: add a pass to lower system value reads | Connor Abbott | 2015-01-15 | 3 | -0/+109
| v2: Jason Ekstrand <[email protected]>: whitespace fixes
* nir: add a pass to lower sampler instructions | Connor Abbott | 2015-01-15 | 3 | -0/+176
|
* nir: add a pass to remove unused variables | Connor Abbott | 2015-01-15 | 3 | -0/+141
| After we lower variables, we want to delete them in order to free up some
| memory.
|
| v2: Jason Ekstrand <[email protected]>: whitespace fixes
* nir: keep track of the number of input, output, and uniform slots | Connor Abbott | 2015-01-15 | 3 | -4/+16
|
* nir: add a pass to lower variables for scalar backends | Connor Abbott | 2015-01-15 | 3 | -0/+1237
|
* nir: add a glsl-to-nir pass | Connor Abbott | 2015-01-15 | 3 | -1/+1797
| v2: Jason Ekstrand <[email protected]>:
|    Make glsl_to_nir build again
|    fix whitespace
* nir: add a validation pass | Connor Abbott | 2015-01-15 | 3 | -0/+793
| This is similar to ir_validate.cpp.
|
| v2: Jason Ekstrand <[email protected]>: whitespace fixes
* nir: add a printer | Connor Abbott | 2015-01-15 | 3 | -0/+915
| This is similar to ir_print_visitor.cpp.
|
| v2: Jason Ekstrand <[email protected]>: whitespace fixes
* SQUASH: Fix comments from eric | Jason Ekstrand | 2015-01-15 | 1 | -0/+3
| Reviewed-by: Eric Anholt <[email protected]>
* SQUASH: Add an assert | Jason Ekstrand | 2015-01-15 | 1 | -0/+1
|
* nir: add core helper functions | Connor Abbott | 2015-01-15 | 3 | -3/+1815
| These include functions for adding and removing various bits of IR and
| helpers for iterating over all the sources and destinations of an
| instruction. This is similar to ir.cpp.
|
| v2: Jason Ekstrand <[email protected]>: whitespace and automake fixes
* SQUASH: Use the enum for the variable mode | Jason Ekstrand | 2015-01-15 | 1 | -1/+1
|
* nir: add the core datastructures | Connor Abbott | 2015-01-15 | 6 | -0/+1751
| This includes all the instructions, ifs, loops, functions, etc. This is
| similar to the information in ir.h.
|
| v2: Jason Ekstrand <[email protected]>:
|    Include ralloc and hash_table from the util directory
|    whitespace fixes
|
| Signed-off-by: Jason Ekstrand <[email protected]>
| Reviewed-by: Eric Anholt <[email protected]>
| Reviewed-By glenn.kennard <[email protected]>
* nir: add a simple C wrapper around glsl_types.h | Connor Abbott | 2015-01-15 | 4 | -1/+238
| v2: Jason Ekstrand <[email protected]>: whitespace and automake fixes
|
| Reviewed-by: Eric Anholt <[email protected]>
* nir: add initial README | Connor Abbott | 2015-01-15 | 1 | -0/+118
| Reviewed-by: Eric Anholt <[email protected]>
* exec_list: add a list_foreach_typed_reverse() macro | Connor Abbott | 2015-01-15 | 1 | -0/+6
| Reviewed-by: Eric Anholt <[email protected]>
* vc4: Add some dumping for STORE_TILE_BUFFER_GENERAL. | Eric Anholt | 2015-01-15 | 1 | -1/+79
|
* vc4: Add dumping for the TILE_RENDERING_MODE_CONFIG packet. | Eric Anholt | 2015-01-15 | 1 | -1/+70
| I wanted to read it, so I wrote parsing.
* vc4: Fix CL dumping trying to dump too far. | Eric Anholt | 2015-01-15 | 1 | -2/+2
| Execution will end at the cl->next, because that's what ct0ea/ct1ea get
| programmed to.
* vc4: Fix texture type masking. | Eric Anholt | 2015-01-15 | 1 | -1/+1
| Everything from ETC1 to RGBA64 was getting its top bit dropped, but we
| didn't use any of those formats.
* vc4: Colormask should apply after all other fragment ops (like logic op). | Eric Anholt | 2015-01-15 | 1 | -9/+18
| Theoretically it should apply after dithering as well, but dithering for
| 565 happens in fixed function in the TLB store.
* vc4: No turning unpack arguments into small immediates. | Eric Anholt | 2015-01-15 | 1 | -0/+3
| Since unpack only happens on things read from the A register file, we have
| to leave them as something that can be allocated to A (temp or uniform).
* vc4: Move the tests for src needing to be an A register to vc4_qir.c. | Eric Anholt | 2015-01-15 | 3 | -17/+28
| I want it from another location.
* vc4: Don't swap the raddr on instructions doing unpacks. | Eric Anholt | 2015-01-15 | 1 | -0/+5
| It would mean different unpacking behavior, since only the A file does
| unpack (with PM==0).
* vc4: Don't let pairing happen with badly mismatched unpack flags. | Eric Anholt | 2015-01-15 | 1 | -0/+39
| No difference on shader-db, but prevents definite regressions in the
| blending changes.
* vc4: Don't let pairing happen with badly mismatched pack flags. | Eric Anholt | 2015-01-15 | 1 | -0/+39
| No difference on shader-db, but will become more important as I introduce
| more use of pack flags with the blending changes.
* vc4: Fix early Z behavior on hardware. | Eric Anholt | 2015-01-15 | 1 | -2/+1
| It turns out the simulator was not treating this bit the same as the RPi,
| and I'd forgotten to remove it when turning on early Z. The result was
| that you'd get big chunks of your rendering missing.
* Revert "radeonsi: only set BC_OPTIMIZE_DISABLE when necessary" | Michel Dänzer | 2015-01-15 | 2 | -15/+6
| This reverts commit 0543630d0b0d9d9f6eefbc14fbd3385d4de37ba0.
|
| It caused flickering artifacts in Steam games such as Team Fortress 2 or
| Left 4 Dead 2.
|
| We could probably only enable this optimization by also making sure the
| shader code only uses either SI_PARAM_LINEAR_CENTROID or
| SI_PARAM_LINEAR_CENTER, not both. This would probably require a shader
| variant.
|
| Sorry I didn't remember this when reviewing the reverted change.
|
| Reviewed-by: Marek Olšák <[email protected]>
* st/clover: Adapt to TargetLibraryInfo.h move in LLVM SVN r226078 | Michel Dänzer | 2015-01-15 | 1 | -0/+4
| Trivial.
* mesa: Micro-optimize _mesa_is_valid_prim_mode | Ian Romanick | 2015-01-14 | 1 | -18/+12
| You would not believe the mess GCC 4.8.3 generated for the old
| switch-statement.
|
| On Bay Trail-D using Fedora 20 compile flags (-m64 -O2 -mtune=generic for
| 64-bit and -m32 -march=i686 -mtune=atom for 32-bit), affects Gl32Batch7:
|
| 32-bit: Difference at 95.0% confidence -0.37374% +/- 0.184057% (n=40)
| 64-bit: Difference at 95.0% confidence 0.966722% +/- 0.338442% (n=40)
|
| The regression on 32-bit is odd. Callgrind says the caller,
| _mesa_is_valid_prim_mode is faster. Before it says 2,293,760 cycles, and
| after it says 917,504.
|
| Signed-off-by: Ian Romanick <[email protected]>
| Reviewed-by: Kenneth Graunke <[email protected]>
* mesa: Check for vertex program the same way in desktop GL and ES | Ian Romanick | 2015-01-14 | 1 | -11/+3
| On Bay Trail-D using Fedora 20 compile flags (-m64 -O2 -mtune=generic for
| 64-bit and -m32 -march=i686 -mtune=atom for 32-bit), affects
| Gl32Multithread:
|
| 32-bit: Difference at 95.0% confidence 0.416027% +/- 0.163529% (n=40)
| 64-bit: Difference at 95.0% confidence 0.494771% +/- 0.259985% (n=40)
|
| Gl32Batch7 had no difference proven at 95.0% confidence (n=120) on 32-bit
| or 64-bit.
|
| Signed-off-by: Ian Romanick <[email protected]>
| Reviewed-by: Kenneth Graunke <[email protected]>
* mesa: Drop index buffer bounds check | Ian Romanick | 2015-01-14 | 1 | -48/+7
| The previous check was insufficient (as it did not take 'indices' into
| consideration), and DX10 hardware does not need this check anyway.
|
| Since index_bytes is no longer used, remove it.
|
| On Bay Trail-D using Fedora 20 compile flags (-m64 -O2 -mtune=generic for
| 64-bit and -m32 -march=i686 -mtune=atom for 32-bit), affects Gl32Batch7:
|
| 32-bit: Difference at 95.0% confidence 1.66929% +/- 0.230107% (n=40)
| 64-bit: Difference at 95.0% confidence -1.40848% +/- 0.288038% (n=40)
|
| The regression on 64-bit is odd. Callgrind says the caller,
| validate_DrawElements_common is faster. Before it says 10,321,920 cycles,
| and after it says 8,945,664.
|
| Signed-off-by: Ian Romanick <[email protected]>
| Reviewed-by: Kenneth Graunke <[email protected]>
* mesa: Only check for a current vertex shader in core profile | Ian Romanick | 2015-01-14 | 1 | -1/+13
| This doesn't affect performance, but it feels more correct.
|
| On Bay Trail-D using Fedora 20 compile flags (-m64 -O2 -mtune=generic for
| 64-bit and -m32 -march=i686 -mtune=atom for 32-bit), affects Gl32Batch7:
|
| 32-bit: No difference proven at 95.0% confidence (n=120)
| 64-bit: No difference proven at 95.0% confidence (n=120)
|
| Signed-off-by: Ian Romanick <[email protected]>
| Reviewed-by: Kenneth Graunke <[email protected]>
* mesa: Only validate shaders that can exist in the context | Ian Romanick | 2015-01-14 | 1 | -29/+49
| On Bay Trail-D using Fedora 20 compile flags (-m64 -O2 -mtune=generic for
| 64-bit and -m32 -march=i686 -mtune=atom for 32-bit), affects Gl32Batch7:
|
| 32-bit: Difference at 95.0% confidence 0.495267% +/- 0.202063% (n=40)
| 64-bit: Difference at 95.0% confidence 3.57576% +/- 0.288175% (n=40)
|
| Signed-off-by: Ian Romanick <[email protected]>
| Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Store the atoms directly in the context | Ian Romanick | 2015-01-14 | 2 | -4/+17
| Instead of having an extra pointer indirection in one of the hottest loops
| in the driver.
|
| On Bay Trail-D using Fedora 20 compile flags (-m64 -O2 -mtune=generic for
| 64-bit and -m32 -march=i686 -mtune=atom for 32-bit), affects Gl32Batch7:
|
| 32-bit: Difference at 95.0% confidence 1.98515% +/- 0.20814% (n=40)
| 64-bit: Difference at 95.0% confidence 1.5163% +/- 0.811016% (n=60)
|
| v2 (Ken): Cut size of array from 64 to 57 to save memory.
|
| Signed-off-by: Ian Romanick <[email protected]>
| Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Micro-optimize brw_get_index_type | Ian Romanick | 2015-01-14 | 3 | -14/+23
| With the switch-statement, GCC 4.8.3 produces a small pile of code with a
| branch.
|
| 00000000 <brw_get_index_type>:
|   000000: 8b 54 24 04          mov    0x4(%esp),%edx
|   000004: b8 01 00 00 00       mov    $0x1,%eax
|   000009: 81 fa 03 14 00 00    cmp    $0x1403,%edx
|   00000f: 74 0d                je     00001e <brw_get_index_type+0x1e>
|   000011: 31 c0                xor    %eax,%eax
|   000013: 81 fa 05 14 00 00    cmp    $0x1405,%edx
|   000019: 0f 94 c0             sete   %al
|   00001c: 01 c0                add    %eax,%eax
|   00001e: c3                   ret
|
| However, this could be two instructions.
|
| 00000000 <brw_get_index_type>:
|   000000: 2d 01 14 00 00       sub    $0x1401,%eax
|   000005: d1 e8                shr    %eax
|   000007: 90                   nop
|   000008: 90                   nop
|   000009: 90                   nop
|   00000a: 90                   nop
|   00000b: c3                   ret
|
| The function was also moved to the header so that it could be inlined at
| the two call sites. Without this, 32-bit also needs to pull the parameter
| from the stack. This means there is a push, a call, a move, and a ret
| added to a two instruction function. The above code shows the function
| with __attribute__((regparm=1)), but even this adds several extra
| instructions. There is also an extra instruction on 64-bit to move the
| parameter to %eax for the subtract.
|
| On Bay Trail-D using Fedora 20 compile flags (-m64 -O2 -mtune=generic for
| 64-bit and -m32 -march=i686 -mtune=atom for 32-bit), affects Gl32Batch7:
|
| 32-bit: Difference at 95.0% confidence 0.818589% +/- 0.234661% (n=40)
| 64-bit: Difference at 95.0% confidence 0.54554% +/- 0.354092% (n=40)
|
| Signed-off-by: Ian Romanick <[email protected]>
| Reviewed-by: Kenneth Graunke <[email protected]>
| Reviewed-by: Matt Turner <[email protected]>
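The two-instruction form in the brw_get_index_type commit works because the three legal index types are the OpenGL enums 0x1401, 0x1403 and 0x1405, which are evenly spaced by two. A minimal sketch of the idea in C follows; the function names are hypothetical and this is not the code from the commit, only an illustration of the arithmetic that the `sub`/`shr` pair performs.

```c
#include <assert.h>

/* Standard OpenGL enum values for the three legal index types. */
#define GL_UNSIGNED_BYTE  0x1401
#define GL_UNSIGNED_SHORT 0x1403
#define GL_UNSIGNED_INT   0x1405

/* Hypothetical switch-based version: maps each enum to 0, 1 or 2,
 * matching the values the first disassembly returns. */
static inline unsigned index_type_switch(unsigned type)
{
   switch (type) {
   case GL_UNSIGNED_BYTE:  return 0;
   case GL_UNSIGNED_SHORT: return 1;
   case GL_UNSIGNED_INT:   return 2;
   default:                return 0;
   }
}

/* Hypothetical arithmetic version: the enums differ by two, so
 * subtracting the first one and halving yields 0, 1 or 2 with no
 * branch. The caller must already have validated the type. */
static inline unsigned index_type_arith(unsigned type)
{
   assert(type == GL_UNSIGNED_BYTE ||
          type == GL_UNSIGNED_SHORT ||
          type == GL_UNSIGNED_INT);
   return (type - GL_UNSIGNED_BYTE) >> 1;
}
```

The arithmetic version only agrees with the switch for the three valid enums, which is why it depends on validation having happened earlier in the draw path.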
* meta: Put _mesa_meta_in_progress in the header file | Ian Romanick | 2015-01-14 | 2 | -12/+5
| ...so that it can be inlined in the two places that call it.
|
| On Bay Trail-D using Fedora 20 compile flags (-m64 -O2 -mtune=generic for
| 64-bit and -m32 -march=i686 -mtune=atom for 32-bit), affects Gl32Batch7:
|
| 32-bit: No difference proven at 95.0% confidence (n=120)
| 64-bit: Difference at 95.0% confidence 1.24042% +/- 0.382277% (n=40)
|
| Signed-off-by: Ian Romanick <[email protected]>
| Reviewed-by: Kenneth Graunke <[email protected]>
| Reviewed-by: Matt Turner <[email protected]>
* i965: Fix "vertex" vs. "geometry" and "VS" vs. "GS" in debug output. | Kenneth Graunke | 2015-01-14 | 4 | -10/+21
| We were happily printing "Native code for unnamed vertex shader" and
| "VS vec4" program for geometry shaders in our INTEL_DEBUG=gs output, as
| well as the KHR_debug output used by shader-db.
|
| Signed-off-by: Kenneth Graunke <[email protected]>
| Reviewed-by: Matt Turner <[email protected]>
* i965: Pass a shader stage abbreviation to fs_generator(). | Kenneth Graunke | 2015-01-14 | 5 | -11/+15
| A lot of messages hardcoded the string "FS", which is confusing on
| Broadwell, where we use this code for VS support as well.
|
| shader-db particularly got confused, as it reported two "FS SIMD8"
| shaders, and no vertex shaders at all. Craziness ensued.
|
| Signed-off-by: Kenneth Graunke <[email protected]>
| Reviewed-by: Matt Turner <[email protected]>
* configure: add check for GNU indent | Samuel Iglesias Gonsalvez | 2015-01-14 | 1 | -1/+7
| Only GNU indent is supported when indenting the autogenerated
| format_pack.c and format_unpack.c files. Some non-GNU indent
| implementations (Mac OS X and FreeBSD) add extra whitespace that breaks
| the build of those files.
|
| Fall back to 'cat' if a non-GNU indent is found.
|
| Fixes: https://bugs.freedesktop.org/show_bug.cgi?id=88335
|
| Signed-off-by: Samuel Iglesias Gonsalvez <[email protected]>
| Tested-by: Vinson Lee <[email protected]>
| Reviewed-by: Matt Turner <[email protected]>
* configure: change required Python Mako version to 0.3.4 | Samuel Iglesias Gonsalvez | 2015-01-14 | 1 | -1/+1
| Signed-off-by: Samuel Iglesias Gonsalvez <[email protected]>
| Reviewed-by: Dave Airlie <[email protected]>
* mesa: rename RGBA8888_* format constants to something appropriate. | Iago Toral Quiroga | 2015-01-14 | 6 | -22/+22
| The 8888 suggests 8-bit components which is not correct, so replace that
| with the actual size of the components in each format.
|
| Reviewed-by: Jason Ekstrand <[email protected]>
* i965/miptree_map_blit: Don't do the initial copy if INVALIDATE_RANGE is set | Jason Ekstrand | 2015-01-13 | 1 | -8/+15
| Before, we were always copying from the buffer being mapped into the
| temporary buffer. However, if INVALIDATE_RANGE is set, then we know that
| the data is going to be junk after we unmap, so there's no point in doing
| the blit. This is important because doing the blit will cause a stall 3
| lines later when we map the buffer.
|
| Signed-off-by: Jason Ekstrand <[email protected]>
| Reviewed-by: Kenneth Graunke <[email protected]>
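The decision described in the miptree_map_blit commit reduces to a single flag test on the map mode. The following C sketch shows only that test; the function name is hypothetical, and the flag value is the standard OpenGL `GL_MAP_INVALIDATE_RANGE_BIT`, which is assumed here to be what the commit message calls INVALIDATE_RANGE.

```c
#include <stdbool.h>

/* Standard OpenGL map-access flags. */
#define GL_MAP_WRITE_BIT            0x0002
#define GL_MAP_INVALIDATE_RANGE_BIT 0x0004

/* Hypothetical helper: the initial copy into the temporary buffer is
 * only worth doing when the caller has not promised to overwrite the
 * whole mapped range. */
static bool needs_initial_copy(unsigned mode)
{
   return (mode & GL_MAP_INVALIDATE_RANGE_BIT) == 0;
}
```

Skipping the copy avoids both the blit itself and the stall that would follow when the temporary buffer is mapped.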
* mesa/glsl/glapi: enable GL_EXT_draw_buffers extension | Tapani Pälli | 2015-01-14 | 6 | -2/+18
| This patch enables an ES2 extension that utilizes existing ES3
| functionality.
|
| With these changes, all the subtests run and pass in the WebGL
| conformance test 'webgl-draw-buffers' when running Chrome on OpenGL ES,
| and the Piglit test 'draw_buffers_gles2' also passes.
|
| v2: remove unused boolean (Ilia Mirkin)
| v3: proper error checking for invalid values (Chad Versace)
| v4: run error check explicitly for ES2 and ES3 (Kenneth Graunke)
|
| Signed-off-by: Tapani Pälli <[email protected]>
| Reviewed-by: Matt Turner <[email protected]>
| Reviewed-by: Chad Versace <[email protected]>
* i965/fs: Allow constant propagation between different types | Jason Ekstrand | 2015-01-13 | 1 | -2/+2
| This will be needed for NIR because it is typeless and treats all
| constants as uint32 values and reinterprets them when they are used
| later. This commit allows those values to be properly propagated.
|
| Also, this helps some synmark shaders because it allows us to copy
| propagate a 0x00000000UD into a 0.0F in a load_payload, which then lets
| us combine 4 load_payloads.
|
| instructions in affected programs: 2288 -> 2144 (-6.29%)
|
| Reviewed-by: Matt Turner <[email protected]>
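The reason a 0x00000000UD constant can stand in for 0.0F is that the two share the same 32-bit pattern, so only the interpretation at the point of use differs. A small C sketch of that reinterpretation follows, assuming IEEE-754 single-precision floats; the helper name is hypothetical and is not taken from the i965 backend.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: reinterpret a 32-bit unsigned constant as a
 * float without changing any bits. memcpy avoids the undefined
 * behaviour of casting between pointer types. */
static float bits_to_float(uint32_t bits)
{
   float f;
   memcpy(&f, &bits, sizeof f);
   return f;
}
```

Under that assumption, `bits_to_float(0x00000000u)` is positive zero, which is why the propagation in the commit is safe for this particular constant.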
* egl/wayland: Fix unused variable warnings | Chad Versace | 2015-01-13 | 1 | -2/+0
| Remove ctx variables unused as of 70e8ccc459.
* mesa: Enable GL_RGB/GL_RGBA in GLES3 glGetInternalformativ | Mike Mason | 2015-01-13 | 2 | -7/+15
| Removes the changes from commit 7894278 and moves the fix to
| _mesa_GetInternalformativ(). The original commit enabled the GL_RGB and
| GL_RGBA unsized internal formats as valid for render buffers in GLES3,
| but this is incorrect. They should only have been enabled for
| GetInternalformativ().
|
| Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88079
|
| Reviewed-by: Chad Versace <[email protected]>
| Reviewed-by: Matt Turner <[email protected]>
* freedreno/ir3: handle "holes" in inputs | Rob Clark | 2015-01-13 | 1 | -1/+31
| If, for example, only the x/y/w components of in.xyzw are actually used,
| we still need to have a group of four registers and assign all four
| components. The hardware can't write in.xy and in.w to discontiguous
| registers. To handle this, pad with a dummy NOP instruction, to keep the
| neighbor chain contiguous.
|
| This fixes a problem noticed with firefox OMTC.
|
| Signed-off-by: Rob Clark <[email protected]>
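The padding idea in the freedreno/ir3 commit can be illustrated with a toy model: a four-slot input group where unused slots are filled with a placeholder so that the group stays contiguous. Everything in this C sketch (the `struct instr` type, the `pad_input_group` name, the single shared placeholder) is a hypothetical simplification and not the ir3 data structures.

```c
#include <stddef.h>

/* Hypothetical stand-in for an IR instruction. */
struct instr {
   int is_nop;
};

/* Fill every unused (NULL) component of a four-wide input group with
 * the given placeholder so all four slots are assigned. Returns the
 * number of slots that had to be padded. */
static unsigned pad_input_group(const struct instr *comps[4],
                                const struct instr *nop)
{
   unsigned padded = 0;
   for (int i = 0; i < 4; i++) {
      if (comps[i] == NULL) {
         comps[i] = nop;
         padded++;
      }
   }
   return padded;
}
```

For the x/y/w example from the commit message, only the z slot is padded, and the register allocator then sees four neighbouring values instead of a gap.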