path: root/src/mesa
Commit message  Author  Age  Files  Lines
* i965/tiled_memcpy: Add tiled-to-linear pathsSisinty Sasmita Patra2015-01-262-0/+281
| This commit adds tiled copy functions for copying from tiled memory to
| linear memory. These are very similar to the existing linear-to-tiled
| paths.
|
| v2: Jason Ekstrand <[email protected]>
|  - New commit message
|  - Various whitespace fixes
|  - Added ptrdiff_t casts as done in commit 225a09790
|
| v3: Jason Ekstrand <[email protected]>
|  - Fixed a comment
|
| Signed-off-by: Jason Ekstrand <[email protected]>
| Reviewed-by: Chad Versace <[email protected]>
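As a rough illustration of what a tiled-to-linear path does, here is a minimal sketch of copying a sub-rectangle out of a single X tile. It assumes an X tile is 512 bytes wide by 8 rows with rows stored back to back, and it ignores any address swizzling; the function name and parameters are hypothetical and are not taken from intel_tiled_memcpy.c.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed X-tile geometry: 512 bytes per row, 8 rows, 4096 bytes total. */
enum { XTILE_WIDTH = 512, XTILE_HEIGHT = 8 };

/*
 * Copy the byte range [x0, x1) of rows [y0, y1) from one X tile into a
 * linear buffer whose rows are dst_pitch bytes apart. Hypothetical helper;
 * the caller must keep x1 <= XTILE_WIDTH and y1 <= XTILE_HEIGHT.
 */
static void
xtile_to_linear(uint8_t *dst, ptrdiff_t dst_pitch, const uint8_t *tile,
                uint32_t x0, uint32_t x1, uint32_t y0, uint32_t y1)
{
   for (uint32_t y = y0; y < y1; y++) {
      /* ptrdiff_t casts keep the offset arithmetic signed and pointer-sized. */
      memcpy(dst + (ptrdiff_t)(y - y0) * dst_pitch,
             tile + (ptrdiff_t)y * XTILE_WIDTH + x0,
             x1 - x0);
   }
}
```

A full copy would walk every tile that the requested rectangle touches and call a helper like this once per tile.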
* i965: Refactor tiled memcpy functions and move them into their own fileSisinty Sasmita Patra2015-01-264-392/+506
| This commit refactors the tiled_memcpy code in intel_tex_subimage.c and
| moves it into its own intel_tiled_memcpy files. Also, xtile_copy and
| ytile_copy are renamed to linear_to_xtiled and linear_to_ytiled
| respectively. The *_faster functions are similarly renamed.
|
| There was also a bit of logic to select between the libc-provided memcpy
| function and our custom memcpy that does an RGBA -> BGRA swizzle. This
| was moved into an intel_get_memcpy function so that rgba8_copy can live
| (and be inlined) in intel_tiled_memcpy.c.
|
| v2: Jason Ekstrand <[email protected]>
|  - Better commit message
|  - Fix up the copyright on the intel_tiled_memcpy files
|  - Various whitespace fixes
|  - Moved a bunch of stuff that did not need to be exposed from
|    intel_tiled_memcpy.h to intel_tiled_memcpy.c
|  - Added proper documentation for intel_get_memcpy
|  - Incorporated the ptrdiff_t tweaks from commit 225a09790
|
| v3: Jason Ekstrand <[email protected]>
|  - Fixed a comment
|  - Move the tile size constants into the .c file
|
| Signed-off-by: Jason Ekstrand <[email protected]>
| Reviewed-by: Chad Versace <[email protected]>
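The memcpy selection described in this commit can be pictured roughly as follows. This is only a sketch under assumptions: copy_swap_rb and select_copy are hypothetical stand-ins for rgba8_copy and intel_get_memcpy, whose real signatures are not shown in this log.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same shape as memcpy so either function can be used interchangeably. */
typedef void *(*copy_fn)(void *dst, const void *src, size_t bytes);

/*
 * Copy 4-byte pixels while swapping bytes 0 and 2 (RGBA -> BGRA).
 * Hypothetical stand-in for rgba8_copy; bytes should be a multiple of 4,
 * and any trailing partial pixel is left untouched.
 */
static void *
copy_swap_rb(void *dst, const void *src, size_t bytes)
{
   uint8_t *d = dst;
   const uint8_t *s = src;

   for (size_t i = 0; i + 4 <= bytes; i += 4) {
      d[i + 0] = s[i + 2];
      d[i + 1] = s[i + 1];
      d[i + 2] = s[i + 0];
      d[i + 3] = s[i + 3];
   }
   return dst;
}

/* Pick the copy routine once, outside the per-tile inner loops. */
static copy_fn
select_copy(bool needs_swizzle)
{
   return needs_swizzle ? copy_swap_rb : memcpy;
}
```

Choosing the function pointer up front keeps the format decision out of the copy loops, which is one plausible reason for moving the selection into a helper of its own.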
* i965/tex_subimage: Use the fast tiled path for rectangle texturesJason Ekstrand2015-01-261-1/+2
| | | | | | | | There's no reason why we should be doing this for 2D textures and not rectangles. Just a matter of adding another hunk to the condition. Signed-off-by: Jason Ekstrand <[email protected]> Reviewed-by: Chad Versace <[email protected]>
* mesa: simplify detection of fpclassifyFelix Janda2015-01-261-11/+7
| | | | | | Fixes compilation with musl libc. Reviewed-by: Ian Romanick <[email protected]>
* i965: Handle CMP.nz ... 0 and MOV.nz similarly in cmod propagation.Kenneth Graunke2015-01-261-6/+10
| "MOV.nz null src" and "CMP.nz null src 0" are equivalent instructions.
|
| Previously, we deleted MOV.nz instructions when the instruction
| generating the MOV's source also wrote the flag register (as the flag
| register already contains the desired value). However, we wouldn't
| delete CMP.nz instructions that served the same purpose.
|
| We also didn't attempt true cmod propagation on MOV.nz instructions,
| while we would for the equivalent CMP.nz form.
|
| This patch fixes both limitations, treating both forms equally. CMP.nz
| instructions will now be deleted (helping the NIR backend), and MOV.nz
| instructions will have their .nz propagated.
|
| No changes in shader-db without NIR. With NIR,
|
| total instructions in shared programs: 6006153 -> 5969364 (-0.61%)
| instructions in affected programs:     2087139 -> 2050350 (-1.76%)
| helped:                                10704
| HURT:                                  0
| GAINED:                                2
| LOST:                                  2
|
| Signed-off-by: Kenneth Graunke <[email protected]>
| Reviewed-by: Matt Turner <[email protected]>
* nir: use Python to autogenerate opcode informationConnor Abbott2015-01-241-0/+1
| Before, we used a system where a file, nir_opcodes.h, defined some macros
| that were included to generate the enum values and the nir_op_infos
| structure. This worked pretty well, but for development the error
| messages were never very useful, Python tools couldn't understand the
| opcode list, and it was difficult to use nir_opcodes.h to do other things
| like autogenerate a builder API. Now, we store opcode information in
| nir_opcodes.py, and we have nir_opcodes_c.py to generate the old
| nir_opcodes.c and nir_opcodes_h.py to generate nir_opcodes.h, which
| contains all the enum names and gets included into nir.h like before. In
| addition to solving the above problems, using Python and Mako to generate
| everything means that it's much easier to keep information centralized
| as we add new things like constant propagation that require per-opcode
| information.
|
| v2:
|  - make Opcode derive from object (Dylan)
|  - don't use assert like it's a function (Dylan)
|  - style fixes for fnoise, use xrange (Dylan)
|  - use iterkeys() in nir_opcodes_h.py (Dylan)
|  - use pydoc-style comments (Jason)
|  - don't make fmin/fmax commutative and associative yet (Jason)
|
| Signed-off-by: Connor Abbott <[email protected]>
| Reviewed-by: Jason Ekstrand <[email protected]>
|
| v3: Jason Ekstrand <[email protected]>
|  - Alphabetize source file lists
|  - Generate nir_opcodes.h in the builddir instead of the source dir
|  - Include $(builddir)/src/glsl/nir in the i965 build
|  - Rework nir_opcodes.h generation so it generates a complete header file
|    instead of one that has to be embedded inside an enum declaration
* i965: Convert CMP.GE -(abs)reg 0 -> CMP.Z reg 0.Matt Turner2015-01-232-0/+24
| | | | | | | | | total instructions in shared programs: 5952059 -> 5951603 (-0.01%) instructions in affected programs: 138812 -> 138356 (-0.33%) GAINED: 1 LOST: 0 Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Add support for removing MOV.NZ instructions.Matt Turner2015-01-232-3/+52
| For some reason, we occasionally write the flag register with a MOV.NZ
| instruction:
|
|    add(8)        g25<1>F  -g6<0,1,0>F  g15<8,8,1>F
|    cmp.l.f0(8)   g26<1>D  g25<8,8,1>F  0F
|    mov.nz.f0(8)  null     g26<8,8,1>D
|
| A MOV.NZ instruction on the result of a CMP is like comparing for
| equality with true in C. It's useless. Removing it allows us to
| generate:
|
|    add.l.f0(8)   null     -g6<0,1,0>F  g15<8,8,1>F
|
| total instructions in shared programs: 5955701 -> 5951657 (-0.07%)
| instructions in affected programs:     302910 -> 298866 (-1.34%)
| GAINED:                                1
| LOST:                                  0
|
| Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Allow flipping cond mod for negated arguments.Matt Turner2015-01-232-3/+39
| This allows us to apply the optimization in cases where the CMP's
| argument is negated, by flipping the conditional mod. For example, it
| allows us to optimize this:
|
|    add(8)       temp  a      b
|    cmp.l.f0(8)  null  -temp  0.0
|
| into
|
|    add.g.f0(8)  temp  a      b
|
| total instructions in shared programs: 5958360 -> 5955701 (-0.04%)
| instructions in affected programs:     466880 -> 464221 (-0.57%)
| GAINED:                                0
| LOST:                                  1
|
| Reviewed-by: Jason Ekstrand <[email protected]>
| Reviewed-by: Kenneth Graunke <[email protected]>
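The reasoning behind flipping the conditional mod is that comparing -x against zero gives the same answer as comparing x against zero with the ordering reversed. Below is a small sketch of that mapping; the enum and both helpers use hypothetical names for illustration and are not the driver's own definitions.

```c
#include <stdbool.h>

/* Hypothetical conditional-modifier enum, for illustration only. */
enum cmod { CMOD_Z, CMOD_NZ, CMOD_G, CMOD_GE, CMOD_L, CMOD_LE };

/* Evaluate "value <mod> 0", i.e. what the flag would hold after the compare. */
static bool
cmod_holds(enum cmod mod, float value)
{
   switch (mod) {
   case CMOD_Z:  return value == 0.0f;
   case CMOD_NZ: return value != 0.0f;
   case CMOD_G:  return value >  0.0f;
   case CMOD_GE: return value >= 0.0f;
   case CMOD_L:  return value <  0.0f;
   case CMOD_LE: return value <= 0.0f;
   }
   return false;
}

/*
 * Return the modifier that gives the same result on x as `mod` gives on -x.
 * Orderings reverse; equality tests are unaffected by negation.
 */
static enum cmod
flip_cmod_for_negation(enum cmod mod)
{
   switch (mod) {
   case CMOD_G:  return CMOD_L;
   case CMOD_GE: return CMOD_LE;
   case CMOD_L:  return CMOD_G;
   case CMOD_LE: return CMOD_GE;
   default:      return mod;
   }
}
```

With this mapping, the cmp.l on -temp in the example above corresponds to a .g modifier on the add that produces temp.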
* i965/fs: Propagate cmod across flag read if it contains the same value.Matt Turner2015-01-232-2/+55
| | | | | | | | total instructions in shared programs: 5959463 -> 5958900 (-0.01%) instructions in affected programs: 70031 -> 69468 (-0.80%) Reviewed-by: Jason Ekstrand <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Add unit tests for cmod propagation pass.Matt Turner2015-01-232-0/+318
| | | | | Reviewed-by: Jason Ekstrand <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Add pass to propagate conditional modifiers.Matt Turner2015-01-234-0/+101
| | | | | | | | | total instructions in shared programs: 5974160 -> 5959463 (-0.25%) instructions in affected programs: 1743737 -> 1729040 (-0.84%) GAINED: 0 LOST: 12 Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Eliminate null-dst instructions without side-effects.Matt Turner2015-01-231-0/+11
| | | | Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Apply conditional mod specially to split MAD/LRP.Matt Turner2015-01-231-4/+20
| | | | | | | | | | Otherwise we'll apply the conditional mod to only one of SIMD8 instructions and trigger an assertion. NoDDClr/NoDDChk have the same problem but we never apply those to these instructions, so I'm leaving them for a later time. Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Add a pass to fixup 3-src instructions that have a null dest.Matt Turner2015-01-232-0/+18
| 3-src instructions can only have GRF/MRF destinations. It's really
| difficult to deal with that restriction in dead code elimination (that
| wants to give instructions null destinations to show that their result
| isn't used) while allowing 3-src instructions to have conditional mod,
| so don't, and just give them a destination before register allocation.
|
| Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Add is_3src() to backend_instruction.Matt Turner2015-01-233-5/+8
| | | | Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Add backend_instruction::can_do_cmod().Matt Turner2015-01-232-0/+46
| | | | Reviewed-by: Kenneth Graunke <[email protected]>
* i965/cfg: Add a foreach_block_reverse macro.Matt Turner2015-01-231-0/+3
| | | | | Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* i965/cfg: Add a foreach_inst_in_block_reverse_safe macro.Matt Turner2015-01-231-0/+3
| | | | | Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* i965: Don't make instructions with a null dest a barrier to scheduling.Matt Turner2015-01-231-4/+8
| Now that we properly track accumulator dependencies, the scheduler is
| able to schedule instructions between the mach and mov in the common
| integer multiplication pattern:
|
|    mul  acc0, x, y
|    mach null, x, y
|    mov  dest, acc0
|
| Since a null destination implies no dependency on the destination, we
| can also safely schedule instructions (that don't write the accumulator)
| between the mul and mach.
|
| GAINED: 103
| LOST:   43
|
| Causes one program to spill (643 -> 1076 instructions).
|
| I committed this patch last year (commit 42a26cb5) but reverted it
| (commit 0d3f83f4) after inexplicable artifacts in Kerbal Space Program
| (bug 78648). Tapani reapplied this patch and could not reproduce the bug
| with current Mesa.
|
| Reviewed-by: Kenneth Graunke <[email protected]>
| Reviewed-by: Jason Ekstrand <[email protected]>
* i965/fs: Allow SIMD16 on pre-SNB when try_replace_with_sel is successfulIan Romanick2015-01-233-13/+13
| | | | | | | | | | | | | | | | | | | | | If try_replace_with_sel is able to replace the flow control with a SEL instruction, then there is no flow control... failing SIMD16 because of nonexistent flow control is wrong. No piglit regressions on any i965 platform in Jenkins. total instructions in shared programs: 4382707 -> 4382707 (0.00%) instructions in affected programs: 0 -> 0 helped: 0 HURT: 0 GAINED: 2089 LOST: 0 No other platforms affected in shader-db. Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965/nir: Report NIR instruction counts (in SSA form) via KHR_debug.Kenneth Graunke2015-01-231-0/+32
| This allows us to count NIR instructions via shader-db. Use "run" as
| normal. The results file will contain both NIR and assembly.
|
| Then, to generate a NIR report:
|
|    ./report.py <(grep NIR results/foo) <(grep NIR results/bar)
|
| Or, to generate an i965 report:
|
|    ./report.py <(grep -v NIR results/foo) <(grep -v NIR results/bar)
|
| Signed-off-by: Kenneth Graunke <[email protected]>
| Reviewed-by: Jason Ekstrand <[email protected]>
| Reviewed-by: Matt Turner <[email protected]>
* i965/nir: Print NIR on INTEL_DEBUG=fs.Kenneth Graunke2015-01-231-0/+11
| | | | | | | | | This is useful for debugging and looking for optimization opportunities. It will need to be expanded when we add support for other scalar stages. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965/nir: Do optimizations again just before lowering source mods.Kenneth Graunke2015-01-231-13/+21
| | | | | | | | | | | | | | | | We want to run CSE and algebraic optimizations again after lowering IO. Some of the passes in the optimization loop don't handle saturates and other modifiers, so run it before lowering to source modifiers. total instructions in shared programs: 6046190 -> 6045768 (-0.01%) instructions in affected programs: 22406 -> 21984 (-1.88%) helped: 47 HURT: 0 GAINED: 0 LOST: 0 Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* mesa: Build with subdir-objects.Matt Turner2015-01-235-573/+562
|
* mesa: Drop inclusion of glapi_gen.mk.Matt Turner2015-01-231-5/+1
| | | | | Some glapi headers used to be generated from this Makefile.am, but no longer.
* mesa: Add format_{un,}pack.py to distribution.Matt Turner2015-01-231-0/+2
|
* mesa: Remove pack_tmp.h from sources.Matt Turner2015-01-231-1/+0
| | | | Missed in commit 3a4de321.
* i965: Fix min_vs_entries for CHVVille Syrjälä2015-01-231-1/+1
| | | | | | | According to BSpec the correct number for min_vs_entries is 34 for CHV. Reviewed-by: Kenneth Graunke <[email protected]> Signed-off-by: Ville Syrjälä <[email protected]>
* i965: Fix max_wm_threads for CHVVille Syrjälä2015-01-231-1/+1
| | | | | | | | | | | | | | | | Change max_wm_threads to match the spec on CHV. The max number of threads in 3DSTATE_PS is always programmed to 64 and the hardware internally scales that depending on the GT SKU. So this doesn't change the max number of threads actually used, but it does affect the scratch space calculation. On CHV the old value was too small, so the amount of scratch space allocated wasn't sufficient to satisfy the actual max number of threads used. Cc: [email protected] Reviewed-by: Kenneth Graunke <[email protected]> Signed-off-by: Ville Syrjälä <[email protected]>
* i965/emit: Assert that src1 is not an MRF after doing the MRF->GRF conversionJason Ekstrand2015-01-221-1/+1
| | | | | | | | | When emitting texturing from indirect texture units, we need to be able to scratch around in the header message. Since we only do this for >= HSW, this is ok since there are no MRFs. Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Anuj phogat <[email protected]>
* i965/emit: Do the sampler index adjustment directly in header.0.3Jason Ekstrand2015-01-224-7/+5
| | | | | | | | | | | Prior to this commit, the adjust_sampler_state_pointer function took an extra register that it could use as scratch space. The usual candidate was the destination of the sampler instruction. However, if that register ever aliased anything important such as the sampler index, this would scratch all over important data. Fortunately, the calculation is such that we can just do it in place and we don't need the scratch space at all. Reviewed-by: Chris Forbes <[email protected]>
* meta: Move loop declaration to top of block.José Fonseca2015-01-221-2/+4
| Fixes MSVC build.
|
| Trivial.
* i965/tex_subimage: use meta instead of the blitter for PBO TexSubImageJason Ekstrand2015-01-221-100/+15
| | | | Reviewed-by: Neil Roberts <[email protected]>
* i965/tex_image: Use meta instead of the blitter for PBO TexImage and GetTexImageJason Ekstrand2015-01-221-179/+18
| | | | Reviewed-by: Neil Roberts <[email protected]>
* i965/pixel_read: Use meta_pbo_GetTexSubImage for PBO ReadPixelsJason Ekstrand2015-01-221-118/+3
| | | | | | | Since the meta path can do strictly more than the blitter path, we just remove the blitter path entirely. Reviewed-by: Neil Roberts <[email protected]>
* meta: Add an implementation of GetTexSubImage for PBOsJason Ekstrand2015-01-222-0/+125
| | | | Reviewed-by: Neil Roberts <[email protected]>
* meta: Add a BlitFramebuffers-based implementation of TexSubImageJason Ekstrand2015-01-223-0/+248
| | | | | | | | | | | | | | This meta path, designed for use with PBO's, creates a temporary texture out of the PBO and uses BlitFramebuffers to do the actual texture upload. v2 Jason Ekstrand <[email protected]>: - Add support for handling simple packing options v3 Jason Ekstrand <[email protected]>: - Refactor to split out the texture-from-pbo code - Rename to _mesa_meta_pbo_TexSubImage Reviewed-by: Neil Roberts <[email protected]>
* formats: Use a hash table for _mesa_format_from_array_formatJason Ekstrand2015-01-221-12/+56
| Going through the for loop every time has noticeable overhead. This
| fixes things up so we only do that once ever and then just do a hash
| table lookup which should be much cheaper.
|
| v2 Jason Ekstrand <[email protected]>:
|  - Use once_flag and call_once from c11/threads.h instead of pthreads
|
| Reviewed-by: Neil Roberts <[email protected]>
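The build-once-then-look-up idea can be sketched as below. Everything here is an assumption made for illustration: the entry struct, the sample keys, the table size and the hash are invented, and a plain flag stands in for the once_flag/call_once pair the commit mentions, so this sketch is not thread-safe.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mapping from an array-format key to a format id. */
struct format_entry {
   uint32_t array_format;
   int format;
};

/* Made-up sample data; the real list is the full set of known formats. */
static const struct format_entry known_formats[] = {
   { 0x1001u, 1 }, { 0x2002u, 2 }, { 0x3003u, 3 },
};

#define TABLE_SIZE 8 /* power of two, larger than the entry count */
static const struct format_entry *table[TABLE_SIZE];
static bool table_ready; /* stand-in for once_flag; not thread-safe */

static size_t
slot_for(uint32_t key)
{
   return (key * 2654435761u) & (TABLE_SIZE - 1);
}

/* The one-time pass over every format, inserting with linear probing. */
static void
build_table(void)
{
   for (size_t i = 0; i < sizeof(known_formats) / sizeof(known_formats[0]); i++) {
      size_t s = slot_for(known_formats[i].array_format);
      while (table[s] != NULL)
         s = (s + 1) & (TABLE_SIZE - 1);
      table[s] = &known_formats[i];
   }
   table_ready = true;
}

/* Look up a key; returns 0 when the key is unknown. */
static int
format_from_array_format(uint32_t array_format)
{
   if (!table_ready)
      build_table();

   for (size_t s = slot_for(array_format); table[s] != NULL;
        s = (s + 1) & (TABLE_SIZE - 1)) {
      if (table[s]->array_format == array_format)
         return table[s]->format;
   }
   return 0;
}
```

In threaded code the flag check would be replaced by a proper one-time initialization primitive, which is what the v2 note above refers to.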
* i965: Implement SetTextureStorageForBufferObjectJason Ekstrand2015-01-221-0/+57
| | | | Reviewed-by: Neil Roberts <[email protected]>
* i965: Apply the miptree offset to surface state for renderbuffersJason Ekstrand2015-01-224-4/+8
| | | | | | | | | Previously, we were completely ignoring the mt->offset field for renderbuffers. While it does have some alignment constraints, it is valid to use it. This patch adds the code to each of the 4 surface state setup functions to handle it. Reviewed-by: Neil Roberts <[email protected]>
* i965/mipmap_tree: Add a depth parameter to create_for_boJason Ekstrand2015-01-226-7/+14
| | | | Reviewed-by: Neil Roberts <[email protected]>
* mesa/dd: Add a function for creating a texture from a buffer objectJason Ekstrand2015-01-221-0/+16
| | | | Reviewed-by: Neil Roberts <[email protected]>
* i965/vec4: Fix fprintf argument ordering.Matt Turner2015-01-211-2/+2
| | | | Introduced in commit 3167a80b.
* mesa: change assert to unreachable in two format functionsTobias Klausmann2015-01-212-2/+2
| This fixes two problems reported by osc:
|
| I: Program returns random data in a function
| E: Mesa no-return-in-nonvoid-function ../../src/mesa/main/format_utils.c:180
| E: Mesa no-return-in-nonvoid-function ../../src/mesa/main/glformats.c:2714
|
| Reviewed-by: Matt Turner <[email protected]>
| Signed-off-by: Tobias Klausmann <[email protected]>
* mesa: Add assert to check number of vector elementsJan Vesely2015-01-212-0/+2
| | | | | | | | The below code crashes when vector_elements <= 0 Fixes Warray-bounds warnings Signed-off-by: Jan Vesely <[email protected]> Reviewed-by: Jose Fonseca <[email protected]>
* mesa: Fix some signed-unsigned comparison warningsJan Vesely2015-01-2127-52/+54
| | | | | | | | v2: s/unsigned int/unsigned/ in prog_optimize.c Signed-off-by: Jan Vesely <[email protected]> Reviewed-by: David Heidelberg <[email protected]> Reviewed-by: Jose Fonseca <[email protected]>
* mesa: remove comparisons that are always trueJan Vesely2015-01-212-3/+0
| | | | | Signed-off-by: Jan Vesely <[email protected]> Reviewed-by: Jose Fonseca <[email protected]>
* i965: Extract scalar region checking logicBen Widawsky2015-01-203-7/+15
| | | | | | | | | | | There are currently 2 users of this functionality. I have 2 more users coming up, and having a simple function makes the results much cleaner. The existing interface semantics was proposed by Matt. v2 (Ken): Rename to region_matches()/has_scalar_region(). Signed-off-by: Ben Widawsky <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Add QWORD sizes to type_sz macroBen Widawsky2015-01-201-0/+3
| GEN8 added the QWORD as a valid type for certain operations on the EU.
| In order to calculate the number of registers used one must have the
| type size as part of the equation. Quoting the formula in the code:
|
|    regs_written = (dst.width * dst.stride * type_sz(dst.type) + 31) / 32;
|
| Adding this separately for bisection since there is no simple way to add
| an assert in the type_sz function.
|
| NOTE: As a side note, I was confused for a while because it's impossible
| to calculate the region, i.e. registers needed, without vstride. However,
| at this point these are all part of the IR, and so no vstride must exist.
|
| Signed-off-by: Ben Widawsky <[email protected]>
| Reviewed-by: Matt Turner <[email protected]>
| Reviewed-by: Kenneth Graunke <[email protected]>
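To make the quoted formula concrete, here is a small sketch of the arithmetic: bytes written are rounded up to whole 32-byte registers. The enum, the per-type sizes and the function names are assumptions for illustration and stand in for the driver's register types and its type_sz macro.

```c
/* Hypothetical register types, for illustration only. */
enum reg_type { TYPE_WORD, TYPE_DWORD, TYPE_FLOAT, TYPE_QWORD };

/* Size of one element in bytes; QWORD is the 8-byte case this commit adds. */
static unsigned
type_size_bytes(enum reg_type type)
{
   switch (type) {
   case TYPE_WORD:  return 2;
   case TYPE_DWORD: return 4;
   case TYPE_FLOAT: return 4;
   case TYPE_QWORD: return 8;
   }
   return 0;
}

/* Round the bytes written up to a count of 32-byte registers. */
static unsigned
regs_written(unsigned width, unsigned stride, enum reg_type type)
{
   return (width * stride * type_size_bytes(type) + 31) / 32;
}
```

For instance, a width of 8 with stride 1 covers 32 bytes (one register) for a 4-byte type but 64 bytes (two registers) for QWORD, which is why the type size has to be known before the count can be computed.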