summaryrefslogtreecommitdiffstats
path: root/src
Commit message (Collapse)AuthorAgeFilesLines
* Mesa: Add support for GL_OES_texture_*float* extensions.Kalyan Kondapally2015-01-294-0/+132
| | | | | | | | | | | | | | | | | | | This patch series adds support for following GLES2 Texture Float extensions: 1)GL_OES_texture_float, 2)GL_OES_texture_half_float, 3)GL_OES_texture_float_linear, 4)GL_OES_texture_half_float_linear. This patch adds basic infrastructure and needed boolean flags to advertise support for these extensions, by default the support is disabled. Next patch in the series introduces support for HALF_FLOAT_OES token. v4: take assert away and make valid_filter_for_float conditional (Tapani), fix the alphabetical order (Emil) Signed-off-by: Kevin Rogovin <[email protected]> Signed-off-by: Kalyan Kondapally <[email protected]> Reviewed-by: Tapani Pälli <[email protected]>
* nir: Make vec-to-movs handle src/dest aliasing.Eric Anholt2015-01-281-10/+72
| | | | | | | | | | | | | | | It now emits vector MOVs instead of a series of individual MOVs, which should be useful to any vector backends. This pushes the problem of src/dest aliasing of channels on a scalar chip to the backend, but if there are any vector operations in your shader then you needed to be handling this already. Fixes fs-swap-problem with my scalarizing patches. v2: Rename to insert_mov(), and add a comment about what it does. v3: Rewrite the comment. Reviewed-by: Connor Abbott <[email protected]> (v3)
* gallium: Replace u_simple_list.h with util/simple_list.hEric Anholt2015-01-2825-228/+23
| | | | | | | The code was exactly the same, except util/ has c++ guards and a struct simple_node declaration. Reviewed-by: Marek Olšák <[email protected]>
* mesa: Port a variant of 68afbe89c72d085dcbbf2b264f0201ab73fe339e to util/Eric Anholt2015-01-281-0/+1
| | | | | | | | The idea is that after a remove_from_list(), you might want to be able to do a remove_from_list() on it again or an is_empty_list(). This is apparently relied on by r300g. Reviewed-by: Marek Olšák <[email protected]>
* mesa: Move simple_list.h to src/util.Eric Anholt2015-01-2831-28/+29
| | | | | | We have two copies of it in the tree, I'm going to delete one. Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: Enable VGPR spilling for all shader types v5Tom Stellard2015-01-288-52/+217
| | | | | | | | | | | | | | | | | | | | v2: - Only emit write SPI_TMPRING_SIZE once per packet. - Use context global scratch buffer. v3: - Patch shaders using WRITE_DATA packet instead of map/unmap. - Emit ICACHE_FLUSH, CS_PARTIAL_FLUSH, PS_PARTIAL_FLUSH, and VS_PARTIAL_FLUSH when patching shaders. v4: - Code cleanups. - Remove unnecessary multiplies. v5: - Patch shaders in system memory and re-upload to vram. Reviewed-by: Michel Dänzer <[email protected]>
* radeonsi/compute: Allocate the scratch buffer during state creationTom Stellard2015-01-282-24/+62
| | | | | | | | | | This moves scratch buffer allocation from si_launch_grid() to si_create_compute_state(). This helps to reduce the overhead of launching a kernel and also fixes a bug in the code that would cause the scratch buffer to be too small if a kernel with smaller scratch size was launched before a kernel with a larger scratch size. Reviewed-by: Michel Dänzer <[email protected]>
* radeonsi: Add radeon_shader_binary member to struct si_shaderTom Stellard2015-01-282-6/+6
| | | | Reviewed-by: Michel Dänzer <[email protected]>
* radeonsi/compute: Rename si_compute::program to si_compute::shaderTom Stellard2015-01-281-5/+5
| | | | Reviewed-by: Michel Dänzer <[email protected]>
* radeonsi: Avoid leaking memory when rebuilding shader statesMarek Olšák2015-01-283-4/+13
| | | | Reviewed-by: Michel Dänzer <[email protected]>
* nir/opcodes: Use a return type of tfloat for ldexpJason Ekstrand2015-01-281-1/+1
| | | | | Reviewed-by: Connor Abbott <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* Revert "util: Move the alternate fpclassify implementation to util"Jason Ekstrand2015-01-282-63/+50
| | | | | | | | | | | | | This reverts commits d6eb572905e39c36168b8f5da240af961f9dde0a and 58e8468d113c7d3d4a59ea4a8d70fd45b78e85e6. This is no longer necessary as we aren't using it in NIR anymore. Also, it broke the build on some strange systems so let's put it back in querymatrix where it came from. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88852 Acked-by: Matt Turner <[email protected]>
* Revert "nir/opcodes: Use fpclassify() instead of isnormal() for ldexp"Jason Ekstrand2015-01-281-1/+1
| | | | | | | | | | | This reverts commit d7d340fb2f68c46bd5a0008ecf53c6693e29c916. We have an isnormal() implementation available, the only problem was that we had the wrong return type (fixed in a later patch). Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88806 Acked-by: Matt Turner <[email protected]>
* util: Predicate the fpclassify fallback on !defined(__cplusplus)Jason Ekstrand2015-01-281-2/+12
| | | | | | | | | | | The problem is that the fallbacks we have at the moment don't work in C++. While we could theoretically fix the fallbacks it would also raise the issue of correctly detecting the fpclassify function. So, for now, we'll just disable it until we actually have a C++ user. Reported-by: Tom Stellard <[email protected]> Tested-by: Tom Stellard <[email protected]> Tested-by: EdB <[email protected]>
* drirc: set allow_glsl_extension_directive_midshader for Dead Island.Sven Arvidsson2015-01-281-0/+4
| | | | | | Signed-off-by: Sven Arvidsson <[email protected]> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=87076 Signed-off-by: Marek Olšák <[email protected]>
* nir/opcodes: Use fpclassify() instead of isnormal() for ldexpJason Ekstrand2015-01-281-1/+1
| | | | | Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88806 Reviewed-by: Ian Romanick <[email protected]>
* util: Move the alternate fpclassify implementation to utilJason Ekstrand2015-01-282-50/+53
| | | | Reviewed-by: Ian Romanick <[email protected]>
* i965/tex: Don't create read-write textures with non-renderable formatsJason Ekstrand2015-01-281-0/+5
| | | | | | | | | I haven't actually seen this bug in the wild, but it's possible that someone could ask to do a S3TC PBO download or something. This protects us from accidentally creating a render target with a compressed or otherwise non-renderable format. Reviewed-by: Kenneth Graunke <[email protected]>
* i965/gen8: Include the buffer offset when emitting renderbuffer relocsJason Ekstrand2015-01-281-1/+1
| | | | | Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88792 Reviewed-by: Kenneth Graunke <[email protected]>
* mesa: improve error messaging for format CSV parserTapani Pälli2015-01-282-2/+7
| | | | | | | | Patch adds 2 error messages that point user directly to fix mispelled or impossible swizzle field for a format. Signed-off-by: Tapani Pälli <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* clover/llvm: Dump the OpenCL C code earlier.EdB2015-01-281-3/+3
| | | | | | | | [ Francisco Jerez: As discussed on the mailing list, this is intended to produce more useful debug output in cases where the compilation terminates unexpectedly. ] Reviewed-by: Francisco Jerez <[email protected]>
* clover/llvm: Move CLOVER_DEBUG stuff into anonymous namespace.EdB2015-01-281-13/+20
| | | | | | | [ Francisco Jerez: As we're at it make debug_options[] local to its only user and remove temporary. ] Reviewed-by: Francisco Jerez <[email protected]>
* r600g: add support for primitive id without geom shader (v2)Dave Airlie2015-01-286-1/+51
| | | | | | | | | | | | | | | | | | | GLSL 1.50 specifies a fragment shader may have a primitive id input without a geometry shader present. On r600 hw there is a special GS scenario for this, you have to enable GS_SCENARIO_A and pass the primitive id through the vertex shader which operates in GS_A mode. This is a first pass attempt at this, and passes the piglit tests that test for this. v1.1: clean up debug print + no need to assign key value to setup output. v2: add r600 support Reviewed-by: Glenn Kennard <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* r600g: move selecting the pixel shader earlier.Dave Airlie2015-01-281-3/+4
| | | | | | | | | | | In order to detect that a pixel shader has a prim id input when we have no geometry shader we need to reorder the shader selection so the pixel shader is selected first, then the vertex shader key can take into account the primitive id input requirement and lack of geom shader. Reviewed-by: Glenn Kennard <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* st/clover: Pass target instead of target.begin() to std::string()Michel Dänzer2015-01-271-3/+3
| | | | | | | | | | | | | | | | | | | | | | Fixes reading beyond allocated memory: ==1936== Invalid read of size 1 ==1936== at 0x4C2C1B4: strlen (vg_replace_strmem.c:412) ==1936== by 0x9E00C30: std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string(char const*, std::allocator<char> const&) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.20) ==1936== by 0x5B44FAE: clover::compile_program_llvm(clover::compat::string const&, clover::compat::vector<clover::compat::pair<clover::compat::string, clover::compat::string> > const&, pipe_shader_ir, clover::compat::string const&, clover::compat::string const&, clover::compat::string&) (invocation.cpp:698) ==1936== by 0x5B39A20: clover::program::build(clover::ref_vector<clover::device> const&, char const*, clover::compat::vector<clover::compat::pair<clover::compat::string, clover::compat::string> > const&) (program.cpp:63) ==1936== by 0x5B20152: clBuildProgram (program.cpp:182) ==1936== by 0x400F41: main (hello_world.c:109) ==1936== Address 0x56fee1f is 0 bytes after a block of size 15 alloc'd ==1936== at 0x4C28C20: malloc (vg_replace_malloc.c:296) ==1936== by 0x5B398F0: alloc (compat.hpp:59) ==1936== by 0x5B398F0: vector<std::basic_string<char> > (compat.hpp:98) ==1936== by 0x5B398F0: string<std::basic_string<char> > (compat.hpp:327) ==1936== by 0x5B398F0: clover::program::build(clover::ref_vector<clover::device> const&, char const*, clover::compat::vector<clover::compat::pair<clover::compat::string, clover::compat::string> > const&) (program.cpp:63) ==1936== by 0x5B20152: clBuildProgram (program.cpp:182) ==1936== by 0x400F41: main (hello_world.c:109) Reviewed-by: Francisco Jerez <[email protected]>
* r600g,radeonsi: Fix calculation of IR target cap string buffer sizeMichel Dänzer2015-01-271-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fixes writing beyond the allocated buffer: ==31855== Invalid write of size 1 ==31855== at 0x50AB2A9: vsprintf (iovsprintf.c:43) ==31855== by 0x508F6F6: sprintf (sprintf.c:32) ==31855== by 0xB59C7EC: r600_get_compute_param (r600_pipe_common.c:526) ==31855== by 0x5B2B7DE: get_compute_param<char> (device.cpp:37) ==31855== by 0x5B2B7DE: clover::device::ir_target() const (device.cpp:201) ==31855== by 0x5B398E0: clover::program::build(clover::ref_vector<clover::device> const&, char const*, clover::compat::vector<clover::compat::pair<clover::compat::string, clover::compat::string> > const&) (program.cpp:63) ==31855== by 0x5B20152: clBuildProgram (program.cpp:182) ==31855== by 0x400F41: main (hello_world.c:109) ==31855== Address 0x56fed5f is 0 bytes after a block of size 15 alloc'd ==31855== at 0x4C29180: operator new(unsigned long) (vg_replace_malloc.c:324) ==31855== by 0x5B2B7C2: allocate (new_allocator.h:104) ==31855== by 0x5B2B7C2: allocate (alloc_traits.h:357) ==31855== by 0x5B2B7C2: _M_allocate (stl_vector.h:170) ==31855== by 0x5B2B7C2: _M_create_storage (stl_vector.h:185) ==31855== by 0x5B2B7C2: _Vector_base (stl_vector.h:136) ==31855== by 0x5B2B7C2: vector (stl_vector.h:278) ==31855== by 0x5B2B7C2: get_compute_param<char> (device.cpp:35) ==31855== by 0x5B2B7C2: clover::device::ir_target() const (device.cpp:201) ==31855== by 0x5B398E0: clover::program::build(clover::ref_vector<clover::device> const&, char const*, clover::compat::vector<clover::compat::pair<clover::compat::string, clover::compat::string> > const&) (program.cpp:63) ==31855== by 0x5B20152: clBuildProgram (program.cpp:182) ==31855== by 0x400F41: main (hello_world.c:109) Reviewed-by: Marek Olšák <[email protected]> Reviewed-by: Tom Stellard <[email protected]>
* nir: fix a bug with constant folding non-per-component instructionsConnor Abbott2015-01-261-1/+2
| | | | | | | | | | | | | Before, we were only copying the first N channels, where N is the size of the SSA destination, which is fine for per-component instructions, but non-per-component instructions like fdot3 can have more source components than destination components. Fix this using the helper function introduced in the last patch. v2: use new helper name Reviewed-by: Jason Ekstrand <[email protected]> Signed-off-by: Connor Abbott <[email protected]>
* nir: add a helper function for getting the number of source componentsConnor Abbott2015-01-261-0/+15
| | | | | | | | | | | | | | Unlike with non-SSA ALU instructions, where if they're per-component you have to look at the writemask to know which source channels are being used, SSA ALU instructions always have all the possible channels enabled so we can just look at the number of components in the SSA definition for per-component instructions to say how many source components are being used. v2: use new name nir_ssa_alu_instr_src_components() Reviewed-by: Jason Ekstrand <[email protected]> Signed-off-by: Connor Abbott <[email protected]>
* i965: Implemente a tiled fast-path for glReadPixels and glGetTexImageSisinty Sasmita Patra2015-01-263-1/+271
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Added intel_readpixels_tiled_mempcpy and intel_gettexsubimage_tiled_mempcpy functions. These are the fast paths for glReadPixels and glGetTexImage. On chrome, using the RoboHornet 2D Canvas toDataURL test, this patch cuts amount of time spent in glReadPixels by more than half and reduces the time of the entire test by 10%. v2: Jason Ekstrand <[email protected]> - Refactor to make the functions look more like the old intel_tex_subimage_tiled_memcpy - Don't export the readpixels_tiled_memcpy function - Fix some pointer arithmatic bugs in partial image downloads (using ReadPixels with a non-zero x or y offset) - Fix a bug when ReadPixels is performed on an FBO wrapping a texture miplevel other than zero. v3: Jason Ekstrand <[email protected]> - Better documentation fot the *_tiled_memcpy functions - Add target restrictions for renderbuffers wrapping textures v4: Jason Ekstrand <[email protected]> - Only check the return value of brw_bo_map for error and not bo->virtual v5: Jason Ekstrand <[email protected]> - Don't unnecessarily repeat a comment Signed-off-by: Jason Ekstrand <[email protected]> Reviewed-by: Chad Versace <[email protected]>
* i965/tiled_memcpy: Add tiled-to-linear pathsSisinty Sasmita Patra2015-01-262-0/+281
| | | | | | | | | | | | | | | | | This commit addes tiled copy functions for coping from tiled memory to linear memory. These are very similar to the existing linear-to-tiled paths. v2: Jason Ekstrand <[email protected]> - New commit message - Various whitespace fixes - Added ptrdiff_t casts as done in commit 225a09790 v3: Jason Ekstrand <[email protected]> - Fixed a comment Signed-off-by: Jason Ekstrand <[email protected]> Reviewed-by: Chad Versace <[email protected]>
* i965: Refactor tiled memcpy functions and move them into their own fileSisinty Sasmita Patra2015-01-264-392/+506
| | | | | | | | | | | | | | | | | | | | | | | | | | | | This commit refactors the tiled_memcpy code in intel_tex_subimage.c and moves it into its own file intel_tiled_memcpy files. Also, xtile_copy and ytile_copy are renamed to linear_to_xtiled and linear_to_ytiled respectively. The *_faster functions are similarly renamed. There was also a bit of logic to select between the the libc provided memcpy function and our custom memcpy that does an RGBA -> BGRA swizzle. This was moved into an intel_get_memcpy function so that rgba8_copy can live (and be inlined) in intel_tiled_memcpy.c. v2: Jason Ekstrand <[email protected]> - Better commit message - Fix up the copyright on the intel_tiled_memcpy files - Various whitespace fixes - Moved a bunch of stuff that did not need to be exposed from intel_tiled_memcpy.h to intel_tiled_memcpy.c - Added proper documentation for intel_get_memcpy - Incorperated the ptrdiff_t tweaks from commit 225a09790 v3: Jason Ekstrand <[email protected]> - Fixed a comment - Move the tile size constants into the .c file Signed-off-by: Jason Ekstrand <[email protected]> Reviewed-by: Chad Versace <[email protected]>
* i965/tex_subimage: Use the fast tiled path for rectangle texturesJason Ekstrand2015-01-261-1/+2
| | | | | | | | There's no reason why we should be doing this for 2D textures and not rectangles. Just a matter of adding another hunk to the condition. Signed-off-by: Jason Ekstrand <[email protected]> Reviewed-by: Chad Versace <[email protected]>
* mesa: simplify detection of fpclassifyFelix Janda2015-01-261-11/+7
| | | | | | Fixes compilation with musl libc. Reviewed-by: Ian Romanick <[email protected]>
* nir/opcodes: Don't go through doubles when constant-folding iabsJason Ekstrand2015-01-261-1/+1
| | | | | | | | | Previously, we called the abs() function in math.h. However, this involves unnecessarily going through double. This commit changes it to use integers directly with a ternary. Reviewed-by: Connor Abbott <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* nir/opcodes: Simplify and fix the unpack_half_*_split_* constant expressionsJason Ekstrand2015-01-261-6/+4
| | | | | | | | | | Previously, these functions were explicitly writing to dst.x and dst.y. However they both return only one component so writing to dst.y is invalid. Also, since they only return one component, we don't need the explicit assignment in the expression and can simplify it use an implicit assignment. Reviewed-by: Connor Abbott <[email protected]>
* nir: Use pointers for nir_src_copy and nir_dest_copyJason Ekstrand2015-01-2610-53/+47
| | | | | | | | This avoids the overhead of copying structures and better matches the newly added nir_alu_src_copy and nir_alu_dest_copy. Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Connor Abbott <[email protected]>
* i965: Handle CMP.nz ... 0 and MOV.nz similarly in cmod propagation.Kenneth Graunke2015-01-261-6/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | "MOV.nz null src" and "CMP.nz null src 0" are equivalent instructions. Previously, we deleted MOV.nz instructions when the instruction generating the MOV's source also wrote the flag register (as the flag register already contains the desired value). However, we wouldn't delete CMP.nz instructions that served the same purpose. We also didn't attempt true cmod propagation on MOV.nz instructions, while we would for the equivalent CMP.nz form. This patch fixes both limitations, treating both forms equally. CMP.nz instructions will now be deleted (helping the NIR backend), and MOV.nz instructions will have their .nz propagated. No changes in shader-db without NIR. With NIR, total instructions in shared programs: 6006153 -> 5969364 (-0.61%) instructions in affected programs: 2087139 -> 2050350 (-1.76%) helped: 10704 HURT: 0 GAINED: 2 LOST: 2 Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* clover: Fix build with llvm after r226981Jan Vesely2015-01-261-0/+4
| | | | | Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88783 Signed-off-by: Jan Vesely <[email protected]>
* nir/constant_folding: use the new constant folding infrastructureConnor Abbott2015-01-241-158/+21
| | | | | Signed-off-by: Connor Abbott <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* nir: add new constant folding infrastructureJason Ekstrand2015-01-246-184/+787
| | | | | | | | | | | | | | | | | | | | | | | | | Add a required field to the Opcode class, const_expr, that contains an expression or statement that computes the result of the opcode given known constant inputs. Then take those const_expr's and expand them into a function that takes an opcode and an array of constant inputs and spits out the constant result. This means that when adding opcodes, there's one less place to update, and almost all the opcodes are self-documenting since the information on how to compute the result is right next to the definition. The helper functions in nir_constant_expressions.c were taken from ir_constant_expressions.cpp. v3 Jason Ekstrand <[email protected]> - Use mako to generate one function per opcode instead of doing piles of string splicing v4 Jason Ekstrand <[email protected]> - More comments and better indentation in the mako - Add a description of the constant expression language in nir_opcodes.py - Added nir_constant_expressions.py to EXTRA_DIST in Makefile.am Signed-off-by: Jason Ekstrand <[email protected]> Reviewed-by: Connor Abbott <[email protected]>
* nir: use Python to autogenerate opcode informationConnor Abbott2015-01-249-401/+479
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Before, we used a system where a file, nir_opcodes.h, defined some macros that were included to generate the enum values and the nir_op_infos structure. This worked pretty well, but for development the error messages were never very useful, Python tools couldn't understand the opcode list, and it was difficult to use nir_opcodes.h to do other things like autogenerate a builder API. Now, we store opcode information in nir_opcodes.py, and we have nir_opcodes_c.py to generate the old nir_opcodes.c and nir_opcodes_h.py to generate nir_opcodes.h, which contains all the enum names and gets included into nir.h like before. In addition to solving the above problems, using Python and Mako to generate everything means that it's much easier to add keep information centralized as we add new things like constant propagation that require per-opcode information. v2: - make Opcode derive from object (Dylan) - don't use assert like it's a function (Dylan) - style fixes for fnoise, use xrange (Dylan) - use iterkeys() in nir_opcodes_h.py (Dylan) - use pydoc-style comments (Jason) - don't make fmin/fmax commutative and associative yet (Jason) Signed-off-by: Connor Abbott <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]> v3 Jason Ekstrand <[email protected]> - Alphabetize source file lists - Generate nir_opcodes.h in the builddir instead of the source dir - Include $(builddir)/src/glsl/nir in the i965 build - Rework nir_opcodes.h generation so it generates a complete header file instead of one that has to be embedded inside an enum declaration
* i965: Convert CMP.GE -(abs)reg 0 -> CMP.Z reg 0.Matt Turner2015-01-232-0/+24
| | | | | | | | | total instructions in shared programs: 5952059 -> 5951603 (-0.01%) instructions in affected programs: 138812 -> 138356 (-0.33%) GAINED: 1 LOST: 0 Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Add support for removing MOV.NZ instructions.Matt Turner2015-01-232-3/+52
| | | | | | | | | | | | | | | | | | | | | | For some reason, we occasionally write the flag register with a MOV.NZ instruction: add(8) g25<1>F -g6<0,1,0>F g15<8,8,1>F cmp.l.f0(8) g26<1>D g25<8,8,1>F 0F mov.nz.f0(8) null g26<8,8,1>D A MOV.NZ instruction on the result of a CMP is like comparing for equality with true in C. It's useless. Removing it allows us to generate: add.l.f0(8) null -g6<0,1,0>F g15<8,8,1>F total instructions in shared programs: 5955701 -> 5951657 (-0.07%) instructions in affected programs: 302910 -> 298866 (-1.34%) GAINED: 1 LOST: 0 Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Allow flipping cond mod for negated arguments.Matt Turner2015-01-232-3/+39
| | | | | | | | | | | | | | | | | | | | | This allows us to apply the optimization in cases where the CMP's argument is negated, by flipping the conditional mod. For example, it allows us to optimize this: add(8) temp a b cmp.l.f0(8) null -temp 0.0 into add.g.f0(8) temp a b total instructions in shared programs: 5958360 -> 5955701 (-0.04%) instructions in affected programs: 466880 -> 464221 (-0.57%) GAINED: 0 LOST: 1 Reviewed-by: Jason Ekstrand <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Propagate cmod across flag read if it contains the same value.Matt Turner2015-01-232-2/+55
| | | | | | | | total instructions in shared programs: 5959463 -> 5958900 (-0.01%) instructions in affected programs: 70031 -> 69468 (-0.80%) Reviewed-by: Jason Ekstrand <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Add unit tests for cmod propagation pass.Matt Turner2015-01-232-0/+318
| | | | | Reviewed-by: Jason Ekstrand <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Add pass to propagate conditional modifiers.Matt Turner2015-01-234-0/+101
| | | | | | | | | total instructions in shared programs: 5974160 -> 5959463 (-0.25%) instructions in affected programs: 1743737 -> 1729040 (-0.84%) GAINED: 0 LOST: 12 Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Eliminate null-dst instructions without side-effects.Matt Turner2015-01-231-0/+11
| | | | Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Apply conditional mod specially to split MAD/LRP.Matt Turner2015-01-231-4/+20
| | | | | | | | | | Otherwise we'll apply the conditional mod to only one of SIMD8 instructions and trigger an assertion. NoDDClr/NoDDChk have the same problem but we never apply those to these instructions, so I'm leaving them for a later time. Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Add a pass to fixup 3-src instructions that have a null dest.Matt Turner2015-01-232-0/+18
| | | | | | | | | | 3-src instructions can only have GRF/MRF destinations. It's really difficult to deal with that restriction in dead code elimination (that wants to give instructions null destinations to show that their result isn't used) while allowing 3-src instructions to have conditional mod, so don't, and just give then a destination before register allocation. Reviewed-by: Kenneth Graunke <[email protected]>