mesa.git - Unnamed repository; edit this file 'description' to name the repository.

	Commit message (Collapse)	Author	Age	Files	Lines
*	drirc: set allow_glsl_extension_directive_midshader for Dead Island.	Sven Arvidsson	2015-01-28	1	-0/+4
\| \| \| \| \| \|	Signed-off-by: Sven Arvidsson <[email protected]> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=87076 Signed-off-by: Marek Olšák <[email protected]>
*	i965/tex: Don't create read-write textures with non-renderable formats	Jason Ekstrand	2015-01-28	1	-0/+5
\| \| \| \| \| \| \| \| \|	I haven't actually seen this bug in the wild, but it's possible that someone could ask to do a S3TC PBO download or something. This protects us from accidentally creating a render target with a compressed or otherwise non-renderable format. Reviewed-by: Kenneth Graunke <[email protected]>
*	i965/gen8: Include the buffer offset when emitting renderbuffer relocs	Jason Ekstrand	2015-01-28	1	-1/+1
\| \| \| \| \|	Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88792 Reviewed-by: Kenneth Graunke <[email protected]>
*	i965: Implemente a tiled fast-path for glReadPixels and glGetTexImage	Sisinty Sasmita Patra	2015-01-26	3	-1/+271
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Added intel_readpixels_tiled_mempcpy and intel_gettexsubimage_tiled_mempcpy functions. These are the fast paths for glReadPixels and glGetTexImage. On chrome, using the RoboHornet 2D Canvas toDataURL test, this patch cuts amount of time spent in glReadPixels by more than half and reduces the time of the entire test by 10%. v2: Jason Ekstrand <[email protected]> - Refactor to make the functions look more like the old intel_tex_subimage_tiled_memcpy - Don't export the readpixels_tiled_memcpy function - Fix some pointer arithmatic bugs in partial image downloads (using ReadPixels with a non-zero x or y offset) - Fix a bug when ReadPixels is performed on an FBO wrapping a texture miplevel other than zero. v3: Jason Ekstrand <[email protected]> - Better documentation fot the *_tiled_memcpy functions - Add target restrictions for renderbuffers wrapping textures v4: Jason Ekstrand <[email protected]> - Only check the return value of brw_bo_map for error and not bo->virtual v5: Jason Ekstrand <[email protected]> - Don't unnecessarily repeat a comment Signed-off-by: Jason Ekstrand <[email protected]> Reviewed-by: Chad Versace <[email protected]>
*	i965/tiled_memcpy: Add tiled-to-linear paths	Sisinty Sasmita Patra	2015-01-26	2	-0/+281
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This commit addes tiled copy functions for coping from tiled memory to linear memory. These are very similar to the existing linear-to-tiled paths. v2: Jason Ekstrand <[email protected]> - New commit message - Various whitespace fixes - Added ptrdiff_t casts as done in commit 225a09790 v3: Jason Ekstrand <[email protected]> - Fixed a comment Signed-off-by: Jason Ekstrand <[email protected]> Reviewed-by: Chad Versace <[email protected]>
*	i965: Refactor tiled memcpy functions and move them into their own file	Sisinty Sasmita Patra	2015-01-26	4	-392/+506
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This commit refactors the tiled_memcpy code in intel_tex_subimage.c and moves it into its own file intel_tiled_memcpy files. Also, xtile_copy and ytile_copy are renamed to linear_to_xtiled and linear_to_ytiled respectively. The *_faster functions are similarly renamed. There was also a bit of logic to select between the the libc provided memcpy function and our custom memcpy that does an RGBA -> BGRA swizzle. This was moved into an intel_get_memcpy function so that rgba8_copy can live (and be inlined) in intel_tiled_memcpy.c. v2: Jason Ekstrand <[email protected]> - Better commit message - Fix up the copyright on the intel_tiled_memcpy files - Various whitespace fixes - Moved a bunch of stuff that did not need to be exposed from intel_tiled_memcpy.h to intel_tiled_memcpy.c - Added proper documentation for intel_get_memcpy - Incorperated the ptrdiff_t tweaks from commit 225a09790 v3: Jason Ekstrand <[email protected]> - Fixed a comment - Move the tile size constants into the .c file Signed-off-by: Jason Ekstrand <[email protected]> Reviewed-by: Chad Versace <[email protected]>
*	i965/tex_subimage: Use the fast tiled path for rectangle textures	Jason Ekstrand	2015-01-26	1	-1/+2
\| \| \| \| \| \| \| \|	There's no reason why we should be doing this for 2D textures and not rectangles. Just a matter of adding another hunk to the condition. Signed-off-by: Jason Ekstrand <[email protected]> Reviewed-by: Chad Versace <[email protected]>
*	i965: Handle CMP.nz ... 0 and MOV.nz similarly in cmod propagation.	Kenneth Graunke	2015-01-26	1	-6/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	"MOV.nz null src" and "CMP.nz null src 0" are equivalent instructions. Previously, we deleted MOV.nz instructions when the instruction generating the MOV's source also wrote the flag register (as the flag register already contains the desired value). However, we wouldn't delete CMP.nz instructions that served the same purpose. We also didn't attempt true cmod propagation on MOV.nz instructions, while we would for the equivalent CMP.nz form. This patch fixes both limitations, treating both forms equally. CMP.nz instructions will now be deleted (helping the NIR backend), and MOV.nz instructions will have their .nz propagated. No changes in shader-db without NIR. With NIR, total instructions in shared programs: 6006153 -> 5969364 (-0.61%) instructions in affected programs: 2087139 -> 2050350 (-1.76%) helped: 10704 HURT: 0 GAINED: 2 LOST: 2 Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]>
*	nir: use Python to autogenerate opcode information	Connor Abbott	2015-01-24	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Before, we used a system where a file, nir_opcodes.h, defined some macros that were included to generate the enum values and the nir_op_infos structure. This worked pretty well, but for development the error messages were never very useful, Python tools couldn't understand the opcode list, and it was difficult to use nir_opcodes.h to do other things like autogenerate a builder API. Now, we store opcode information in nir_opcodes.py, and we have nir_opcodes_c.py to generate the old nir_opcodes.c and nir_opcodes_h.py to generate nir_opcodes.h, which contains all the enum names and gets included into nir.h like before. In addition to solving the above problems, using Python and Mako to generate everything means that it's much easier to add keep information centralized as we add new things like constant propagation that require per-opcode information. v2: - make Opcode derive from object (Dylan) - don't use assert like it's a function (Dylan) - style fixes for fnoise, use xrange (Dylan) - use iterkeys() in nir_opcodes_h.py (Dylan) - use pydoc-style comments (Jason) - don't make fmin/fmax commutative and associative yet (Jason) Signed-off-by: Connor Abbott <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]> v3 Jason Ekstrand <[email protected]> - Alphabetize source file lists - Generate nir_opcodes.h in the builddir instead of the source dir - Include $(builddir)/src/glsl/nir in the i965 build - Rework nir_opcodes.h generation so it generates a complete header file instead of one that has to be embedded inside an enum declaration
*	i965: Convert CMP.GE -(abs)reg 0 -> CMP.Z reg 0.	Matt Turner	2015-01-23	2	-0/+24
\| \| \| \| \| \| \| \| \|	total instructions in shared programs: 5952059 -> 5951603 (-0.01%) instructions in affected programs: 138812 -> 138356 (-0.33%) GAINED: 1 LOST: 0 Reviewed-by: Kenneth Graunke <[email protected]>
*	i965/fs: Add support for removing MOV.NZ instructions.	Matt Turner	2015-01-23	2	-3/+52
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	For some reason, we occasionally write the flag register with a MOV.NZ instruction: add(8) g25<1>F -g6<0,1,0>F g15<8,8,1>F cmp.l.f0(8) g26<1>D g25<8,8,1>F 0F mov.nz.f0(8) null g26<8,8,1>D A MOV.NZ instruction on the result of a CMP is like comparing for equality with true in C. It's useless. Removing it allows us to generate: add.l.f0(8) null -g6<0,1,0>F g15<8,8,1>F total instructions in shared programs: 5955701 -> 5951657 (-0.07%) instructions in affected programs: 302910 -> 298866 (-1.34%) GAINED: 1 LOST: 0 Reviewed-by: Kenneth Graunke <[email protected]>
*	i965/fs: Allow flipping cond mod for negated arguments.	Matt Turner	2015-01-23	2	-3/+39
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This allows us to apply the optimization in cases where the CMP's argument is negated, by flipping the conditional mod. For example, it allows us to optimize this: add(8) temp a b cmp.l.f0(8) null -temp 0.0 into add.g.f0(8) temp a b total instructions in shared programs: 5958360 -> 5955701 (-0.04%) instructions in affected programs: 466880 -> 464221 (-0.57%) GAINED: 0 LOST: 1 Reviewed-by: Jason Ekstrand <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
*	i965/fs: Propagate cmod across flag read if it contains the same value.	Matt Turner	2015-01-23	2	-2/+55
\| \| \| \| \| \| \| \|	total instructions in shared programs: 5959463 -> 5958900 (-0.01%) instructions in affected programs: 70031 -> 69468 (-0.80%) Reviewed-by: Jason Ekstrand <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
*	i965/fs: Add unit tests for cmod propagation pass.	Matt Turner	2015-01-23	2	-0/+318
\| \| \| \| \|	Reviewed-by: Jason Ekstrand <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
*	i965/fs: Add pass to propagate conditional modifiers.	Matt Turner	2015-01-23	4	-0/+101
\| \| \| \| \| \| \| \| \|	total instructions in shared programs: 5974160 -> 5959463 (-0.25%) instructions in affected programs: 1743737 -> 1729040 (-0.84%) GAINED: 0 LOST: 12 Reviewed-by: Kenneth Graunke <[email protected]>
*	i965/fs: Eliminate null-dst instructions without side-effects.	Matt Turner	2015-01-23	1	-0/+11
\| \| \| \|	Reviewed-by: Kenneth Graunke <[email protected]>
*	i965/fs: Apply conditional mod specially to split MAD/LRP.	Matt Turner	2015-01-23	1	-4/+20
\| \| \| \| \| \| \| \| \| \|	Otherwise we'll apply the conditional mod to only one of SIMD8 instructions and trigger an assertion. NoDDClr/NoDDChk have the same problem but we never apply those to these instructions, so I'm leaving them for a later time. Reviewed-by: Kenneth Graunke <[email protected]>
*	i965/fs: Add a pass to fixup 3-src instructions that have a null dest.	Matt Turner	2015-01-23	2	-0/+18
\| \| \| \| \| \| \| \| \| \|	3-src instructions can only have GRF/MRF destinations. It's really difficult to deal with that restriction in dead code elimination (that wants to give instructions null destinations to show that their result isn't used) while allowing 3-src instructions to have conditional mod, so don't, and just give then a destination before register allocation. Reviewed-by: Kenneth Graunke <[email protected]>
*	i965: Add is_3src() to backend_instruction.	Matt Turner	2015-01-23	3	-5/+8
\| \| \| \|	Reviewed-by: Kenneth Graunke <[email protected]>
*	i965: Add backend_instruction::can_do_cmod().	Matt Turner	2015-01-23	2	-0/+46
\| \| \| \|	Reviewed-by: Kenneth Graunke <[email protected]>
*	i965/cfg: Add a foreach_block_reverse macro.	Matt Turner	2015-01-23	1	-0/+3
\| \| \| \| \|	Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
*	i965/cfg: Add a foreach_inst_in_block_reverse_safe macro.	Matt Turner	2015-01-23	1	-0/+3
\| \| \| \| \|	Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
*	i965: Don't make instructions with a null dest a barrier to scheduling.	Matt Turner	2015-01-23	1	-4/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Now that we properly track accumulator dependencies, the scheduler is able to schedule instructions between the mach and mov in the common the integer multiplication pattern: mul acc0, x, y mach null, x, y mov dest, acc0 Since a null destination implies no dependency on the destination, we can also safely schedule instructions (that don't write the accumulator) between the mul and mach. GAINED: 103 LOST: 43 Causes one program to spill (643 -> 1076 instructions). I committed this patch last year (commit 42a26cb5) but reverted it (commit 0d3f83f4) after inexplicable artifacts in Kerbal Space Program (bug 78648). Tapani reapplied this patch and could not reproduce the bug with current Mesa. Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
*	i965/fs: Allow SIMD16 on pre-SNB when try_replace_with_sel is successful	Ian Romanick	2015-01-23	3	-13/+13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If try_replace_with_sel is able to replace the flow control with a SEL instruction, then there is no flow control... failing SIMD16 because of nonexistent flow control is wrong. No piglit regressions on any i965 platform in Jenkins. total instructions in shared programs: 4382707 -> 4382707 (0.00%) instructions in affected programs: 0 -> 0 helped: 0 HURT: 0 GAINED: 2089 LOST: 0 No other platforms affected in shader-db. Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
*	i965/nir: Report NIR instruction counts (in SSA form) via KHR_debug.	Kenneth Graunke	2015-01-23	1	-0/+32
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This allows us to count NIR instructions via shader-db. Use "run" as normal. The results file will contain both NIR and assembly. Then, to generate a NIR report: ./report.py <(grep NIR results/foo) <(grep NIR results/bar) Or, to generate an i965 report: ./report.py <(grep -v NIR results/foo) <(grep -v NIR results/bar) Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]> Reviewed-by: Matt Turner <[email protected]>
*	i965/nir: Print NIR on INTEL_DEBUG=fs.	Kenneth Graunke	2015-01-23	1	-0/+11
\| \| \| \| \| \| \| \| \|	This is useful for debugging and looking for optimization opportunities. It will need to be expanded when we add support for other scalar stages. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]>
*	i965/nir: Do optimizations again just before lowering source mods.	Kenneth Graunke	2015-01-23	1	-13/+21
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We want to run CSE and algebraic optimizations again after lowering IO. Some of the passes in the optimization loop don't handle saturates and other modifiers, so run it before lowering to source modifiers. total instructions in shared programs: 6046190 -> 6045768 (-0.01%) instructions in affected programs: 22406 -> 21984 (-1.88%) helped: 47 HURT: 0 GAINED: 0 LOST: 0 Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
*	i965: Fix min_vs_entries for CHV	Ville Syrjälä	2015-01-23	1	-1/+1
\| \| \| \| \| \| \|	According to BSpec the correct number for min_vs_entries is 34 for CHV. Reviewed-by: Kenneth Graunke <[email protected]> Signed-off-by: Ville Syrjälä <[email protected]>
*	i965: Fix max_wm_threads for CHV	Ville Syrjälä	2015-01-23	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Change max_wm_threads to match the spec on CHV. The max number of threads in 3DSTATE_PS is always programmed to 64 and the hardware internally scales that depending on the GT SKU. So this doesn't change the max number of threads actually used, but it does affect the scratch space calculation. On CHV the old value was too small, so the amount of scratch space allocated wasn't sufficient to satisfy the actual max number of threads used. Cc: [email protected] Reviewed-by: Kenneth Graunke <[email protected]> Signed-off-by: Ville Syrjälä <[email protected]>
*	i965/emit: Assert that src1 is not an MRF after doing the MRF->GRF conversion	Jason Ekstrand	2015-01-22	1	-1/+1
\| \| \| \| \| \| \| \| \|	When emitting texturing from indirect texture units, we need to be able to scratch around in the header message. Since we only do this for >= HSW, this is ok since there are no MRFs. Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Anuj phogat <[email protected]>
*	i965/emit: Do the sampler index adjustment directly in header.0.3	Jason Ekstrand	2015-01-22	4	-7/+5
\| \| \| \| \| \| \| \| \| \| \|	Prior to this commit, the adjust_sampler_state_pointer function took an extra register that it could use as scratch space. The usual candidate was the destination of the sampler instruction. However, if that register ever aliased anything important such as the sampler index, this would scratch all over important data. Fortunately, the calculation is such that we can just do it in place and we don't need the scratch space at all. Reviewed-by: Chris Forbes <[email protected]>
*	meta: Move loop declaration to top of block.	José Fonseca	2015-01-22	1	-2/+4
\| \| \| \| \| \|	Fixes MSVC build. Trvial.
*	i965/tex_subimage: use meta instead of the blitter for PBO TexSubImage	Jason Ekstrand	2015-01-22	1	-100/+15
\| \| \| \|	Reviewed-by: Neil Roberts <[email protected]>
*	i965/tex_image: Use meta for instead of the blitter PBO TexImage and GetTexImage	Jason Ekstrand	2015-01-22	1	-179/+18
\| \| \| \|	Reviewed-by: Neil Roberts <[email protected]>
*	i965/pixel_read: Use meta_pbo_GetTexSubImage for PBO ReadPixels	Jason Ekstrand	2015-01-22	1	-118/+3
\| \| \| \| \| \| \|	Since the meta path can do strictly more than the blitter path, we just remove the blitter path entirely. Reviewed-by: Neil Roberts <[email protected]>
*	meta: Add an implementation of GetTexSubImage for PBOs	Jason Ekstrand	2015-01-22	2	-0/+125
\| \| \| \|	Reviewed-by: Neil Roberts <[email protected]>
*	meta: Add a BlitFramebuffers-based implementation of TexSubImage	Jason Ekstrand	2015-01-22	2	-0/+247
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	This meta path, designed for use with PBO's, creates a temporary texture out of the PBO and uses BlitFramebuffers to do the actual texture upload. v2 Jason Ekstrand <[email protected]>: - Add support for handling simple packing options v3 Jason Ekstrand <[email protected]>: - Refactor to split out the texture-from-pbo code - Rename to _mesa_meta_pbo_TexSubImage Reviewed-by: Neil Roberts <[email protected]>
*	i965: Implement SetTextureStorageForBufferObject	Jason Ekstrand	2015-01-22	1	-0/+57
\| \| \| \|	Reviewed-by: Neil Roberts <[email protected]>
*	i965: Apply the miptree offset to surface state for renderbuffers	Jason Ekstrand	2015-01-22	4	-4/+8
\| \| \| \| \| \| \| \| \|	Previously, we were completely ignoring the mt->offset field for renderbuffers. While it does have some alignment constraints, it is valid to use it. This patch adds the code to each of the 4 surface state setup functions to handle it. Reviewed-by: Neil Roberts <[email protected]>
*	i965/mipmap_tree: Add a depth parameter to create_for_bo	Jason Ekstrand	2015-01-22	6	-7/+14
\| \| \| \|	Reviewed-by: Neil Roberts <[email protected]>
*	i965/vec4: Fix fprintf argument ordering.	Matt Turner	2015-01-21	1	-2/+2
\| \| \| \|	Introduced in commit 3167a80b.
*	i965: Extract scalar region checking logic	Ben Widawsky	2015-01-20	3	-7/+15
\| \| \| \| \| \| \| \| \| \| \|	There are currently 2 users of this functionality. I have 2 more users coming up, and having a simple function makes the results much cleaner. The existing interface semantics was proposed by Matt. v2 (Ken): Rename to region_matches()/has_scalar_region(). Signed-off-by: Ben Widawsky <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
*	i965: Add QWORD sizes to type_sz macro	Ben Widawsky	2015-01-20	1	-0/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	GEN8 added the QWORD as a valid type for certain operations on the EU. In order to calculate the number of registers used one must have the type size as part of the equation. Quoting the formula in the code: regs_written = (dst.width * dst.stride * type_sz(dst.type) + 31) / 32; Adding this separately for bisection since there is no simple way to add an assert in the type_sz function. NOTE: As a side note, I was confused for a while because it's impossible to calculate the region, ie. registers needed, without vstride. However, at this point these are all part of the IR, and so no vstride must exist. Signed-off-by: Ben Widawsky <[email protected]> Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
*	i965: Work around mysterious Gen4 GPU hangs with minimal state changes.	Kenneth Graunke	2015-01-19	1	-0/+13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Gen4 hardware appears to GPU hang frequently when using Chromium, and also when running 'glmark2 -b ideas'. Most of the error states contain 3DPRIMITIVE commands in quick succession, with very few state packets between them - usually VERTEX_BUFFERS/ELEMENTS and CONSTANT_BUFFER. I trimmed an apitrace of the glmark2 hang down to two draw calls with a glUniformMatrix4fv call between the two. Either draw by itself works fine, but together, they hang the GPU. Removing the glUniform call makes the hangs disappear. In the hardware state, this translates to removing the CONSTANT_BUFFER packet between the two 3DPRIMITIVE packets. Flushing before emitting CONSTANT_BUFFER packets also appears to make the hangs disappear. I observed a slowdown in glxgears by doing it all the time, so I've chosen to only do it when BRW_NEW_BATCH and BRW_NEW_PSP are unset (i.e. we haven't done a CS_URB_STATE change or already flushed the whole pipeline). I'd much rather understand the problem, but at this point, I don't see how we'd ever be able to track it down further. We have no real tools, and the hardware people moved on years ago. I've analyzed 20+ error states and read every scrap of documentation I could find. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=80568 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=85367 Signed-off-by: Kenneth Graunke <[email protected]> Acked-by: Matt Turner <[email protected]> Cc: "10.4 10.3" <[email protected]>
*	i965/nir: Enable SIMD16 support in the NIR FS backend.	Kenneth Graunke	2015-01-19	1	-2/+1
\| \| \| \| \| \| \| \|	With the previous commits in place, it just works. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]> Reviewed-by: Matt Turner <[email protected]>
*	i965/nir: Use offset() instead of altering reg_offset directly.	Kenneth Graunke	2015-01-19	1	-59/+32
\| \| \| \| \| \| \| \| \| \| \|	offset() properly handles reg_width, so it'll work for SIMD16. While we're in the area, simplify a few cases, and use retype() to cut a few more lines of code. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]> Reviewed-by: Matt Turner <[email protected]>
*	i965/nir: Replace fs_reg(GRF, virtual_grf_alloc(...)) with vgrf(...).	Kenneth Graunke	2015-01-19	3	-13/+23
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	brw_fs_nir.cpp creates almost all of its registers via: fs_reg reg = fs_reg(GRF, virtual_grf_alloc(num_components)); When we add SIMD16 support, we'll need to set reg->width = 16 and double the VGRF size...on pretty much every VGRF it allocates. This patch replaces that pattern with a new "vgrf" helper method: fs_reg reg = vgrf(num_components); The new function correctly takes reg_width into account. For now, reg_width is always 1, so this should have no functional change. v2: Just make vgrf() account for reg_width right away, rather than changing the behavior in the next patch. v3: Replace one last virtual_grf_alloc I missed. It's used in code that only runs for dispatch_width == 8, so it doesn't matter, but consistency is nice. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
*	i965: Replace fs_reg(fs_visitor, type) with fs_visitor::vgrf(type).	Kenneth Graunke	2015-01-19	6	-128/+122
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	I dislike how fs_reg has a constructor that knows about fs_visitor. Apart from that, it stands alone, with no need to interact with the rest of the compiler. Which is sensible - a class that represents a register should do just that. Allocating virtual register numbers should be left up to the compiler (fs_visitor). This patch replaces the constructor with a new fs_visitor::vgrf method, eliminating fs_reg's dependency on fs_visitor. It ends up being no more code. v2: Rebase from May 2014 -> January 2015. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]> Reviewed-by: Matt Turner <[email protected]>
*	i965/fs: Fix the dummy fragment shader.	Kenneth Graunke	2015-01-17	1	-7/+32
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We hit an assertion that the destination of the FB write should not be an immediate. (I don't know what we were thinking.) Use ARF null. Trying to substitute real shaders with the dummy shader would crash when trying to upload non-existent uniforms. Say there are none. It also wouldn't generate any code because we didn't compute the CFG, and code generation now requires it. Compute it. Gen4-5 also require a message header to be present. On Gen6+, there were assertion failures in SF/SBE state because urb_setup was memset to 0 instad of -1, causing it to think there were attributes when nothing was set up right. Set to no attributes. Finally, you have to ensure "Setup URB Entry Read Length" is non-zero or you get GPU hangs, at least on Crestline. It now works on at least Crestline and Haswell. Signed-off-by: Kenneth Graunke <[email protected]>
*	i965: Fix up too-wide comment	Kristian Høgsberg	2015-01-16	1	-4/+3
\| \| \| \|	Signed-off-by: Kristian Høgsberg <[email protected]>