path: root/src/mesa/drivers/dri/i965/brw_reg.h
Commit message | Author | Age | Files | Lines
* i965/hsw: Initialize SLM index in state register | Jordan Justen | 2016-03-08 | 1 | -0/+16
    For Haswell, we need to initialize the SLM index in the state register. This can be copied out of the CS header dword 0.

    v2:
     * Use UW move to avoid changing upper 16-bits of sr0.1 (mattst88)

    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94081
    Fixes: piglit arb_compute_shader/execution/shared-atomics.shader_test
    Signed-off-by: Jordan Justen <[email protected]>
    Cc: "11.2" <[email protected]>
    Tested-by: Ilia Mirkin <[email protected]> (v1)
    Reviewed-by: Matt Turner <[email protected]>
* i965: Add support for swizzling arbitrary immediates to (brw_)swizzle(). | Francisco Jerez | 2016-03-06 | 1 | -2/+5
    Scalar immediates used to be handled correctly by swizzle() (as the identity) but since commit 58fa9d47b536403c4e3ca5d6a2495691338388fd it will corrupt the contents of the immediate. Vector immediates were never handled correctly, but we had ad-hoc code to swizzle VF immediates in the vec4 copy propagation pass. This takes care of swizzling V and UV in addition.

    v2: Don't implement swizzling of V/UV immediates (Matt). If you need to swizzle an integer vector immediate in the future apply the following diff to go back to v1:

    --- a/src/mesa/drivers/dri/i965/brw_eu.c
    +++ b/src/mesa/drivers/dri/i965/brw_eu.c
    @@ -119,11 +119,10 @@ brw_swap_cmod(uint32_t cmod)
     static unsigned
     imm_shift(enum brw_reg_type type, unsigned i)
     {
    -   assert(type != BRW_REGISTER_TYPE_UV && type != BRW_REGISTER_TYPE_V &&
    -          "Not implemented.");
    -
        if (type == BRW_REGISTER_TYPE_VF)
           return 8 * (i & 3);
    +   else if (type == BRW_REGISTER_TYPE_UV || type == BRW_REGISTER_TYPE_V)
    +      return 4 * (i & 7);
        else
           return 0;
     }

    Reviewed-by: Iago Toral Quiroga <[email protected]>
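As an illustration of what swizzling a vector-float immediate involves, here is a minimal sketch; it is not the driver's code. It assumes a VF immediate packs four 8-bit components into one 32-bit dword with component i at bit offset 8 * (i & 3), as the VF branch of imm_shift() in the diff suggests. The names vf_shift(), vf_component() and swizzle_vf_imm(), and the plain four-entry swizzle array, are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Bit offset of component i inside a packed VF immediate. This mirrors
 * the VF branch of imm_shift() quoted in the commit message. */
static unsigned
vf_shift(unsigned i)
{
   return 8 * (i & 3);
}

/* Extract the 8-bit component i from the packed immediate. */
static uint32_t
vf_component(uint32_t imm, unsigned i)
{
   return (imm >> vf_shift(i)) & 0xff;
}

/* Hypothetical helper: build a new immediate whose component i is the
 * source component selected by swz[i] (0 = X, 1 = Y, 2 = Z, 3 = W). */
static uint32_t
swizzle_vf_imm(uint32_t imm, const unsigned swz[4])
{
   uint32_t result = 0;

   for (unsigned i = 0; i < 4; i++) {
      assert(swz[i] < 4);
      result |= vf_component(imm, swz[i]) << vf_shift(i);
   }
   return result;
}
```

With this layout a scalar immediate needs no shuffling at all, which is why the identity is the right answer for it.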
* i965: Pass symbolic swizzle to brw_swizzle() as a single argument. | Francisco Jerez | 2016-03-06 | 1 | -11/+4
    And replace brw_swizzle1() with brw_swizzle(). Seems slightly cleaner and will allow reusing brw_swizzle() in the vec4 back-end more easily.

    Reviewed-by: Iago Toral Quiroga <[email protected]>
    Reviewed-by: Matt Turner <[email protected]>
* i965/gen8: Always use BRW_REGISTER_TYPE_UW for MUL on GEN8+ | Marta Lofstedt | 2015-12-30 | 1 | -27/+0
    The imulExtended tests in the shader bitfield tests of the OpenGL ES 3.1 CTS fail on gen8+ when BRW_REGISTER_TYPE_W is used for SHADER_OPCODE_MULH.

    Also, remove unused helper function: static inline bool type_is_signed(unsigned type)

    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92595
    Signed-off-by: Marta Lofstedt <[email protected]>
    Reviewed-by: Matt Turner <[email protected]>
* i965: Add tessellation control shaders. | Kenneth Graunke | 2015-12-22 | 1 | -0/+1
    The TCS is the first tessellation shader stage, and the most complicated. It has access to each of the control points in the input patch, and computes a new output patch. There is one logical invocation per output control point; all invocations run in parallel, and can communicate by reading and writing output variables.

    One of the main responsibilities of the TCS is to write the special gl_TessLevelOuter[] and gl_TessLevelInner[] output variables which control how much new geometry the hardware tessellation engine will produce. Otherwise, it simply writes outputs that are passed along to the TES.

    We run in SIMD4x2 mode, handling two logical invocations per EU thread. The hardware doesn't properly manage the dispatch mask for us; it always initializes it to 0xFF. We wrap the whole program in an IF..ENDIF block to handle an odd number of invocations, essentially falling back to SIMD4x1 on the last thread.

    v2: Update comments (requested by Jordan Justen).

    Signed-off-by: Kenneth Graunke <[email protected]>
    Reviewed-by: Jordan Justen <[email protected]>
* i965: Add brw_imm_uv(). | Matt Turner | 2015-11-20 | 1 | -0/+9
* i965: Don't bother setting regioning on immediates. | Matt Turner | 2015-11-20 | 1 | -6/+0
    The region fields are unioned with the immediate storage.
* i965: Make brw_imm_vf4() take 8-bit restricted floats. | Matt Turner | 2015-11-19 | 1 | -31/+7
    This partially reverts commit bbf8239f92ecd79431dfa41402e1c85318e7267f.

    I didn't like that commit to begin with -- computing things at compile time is fine -- but for purposes of verifying that the resulting values are correct, looking up 0x00 and 0x30 in a table is a lot better than evaluating a recursive function.

    Anyway, by making brw_imm_vf4() take the actual 8-bit restricted floats directly (instead of only integral values that would be converted to restricted float), we can use this function as a replacement for the vector float src_reg/fs_reg constructors.

    brw_float_to_vf() is not currently an inline function, so it will not be evaluated at compile time. I'll address that in a follow-up patch.

    Reviewed-by: Emil Velikov <[email protected]>
    Reviewed-by: Kenneth Graunke <[email protected]>
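A minimal sketch of what "taking the 8-bit restricted floats directly" could look like, assuming the four encodings are packed into a single dword with the first component in the low byte. pack_vf4() and the VF_ZERO/VF_ONE names are hypothetical; 0x00 and 0x30 are the two encodings the message mentions, taken here to stand for 0.0 and 1.0.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed meanings of the two encodings named in the commit message. */
#define VF_ZERO 0x00u   /* assumed: 0.0 */
#define VF_ONE  0x30u   /* assumed: 1.0 */

/* Hypothetical helper: pack four already-encoded 8-bit restricted
 * floats into one 32-bit immediate, first component in the low byte. */
static uint32_t
pack_vf4(unsigned v0, unsigned v1, unsigned v2, unsigned v3)
{
   /* Each argument must already be an 8-bit encoding. */
   assert(v0 <= 0xff && v1 <= 0xff && v2 <= 0xff && v3 <= 0xff);

   return ((uint32_t)v0 << 0) |
          ((uint32_t)v1 << 8) |
          ((uint32_t)v2 << 16) |
          ((uint32_t)v3 << 24);
}
```

Because the caller passes the encodings themselves, checking a result against a table of known byte values is a direct comparison rather than a float-to-VF conversion.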
* i965: Combine register file field. | Matt Turner | 2015-11-13 | 1 | -2/+2
    The first four values (2 bits) are hardware values, and VGRF, ATTR, and UNIFORM remain values used in the IR.

    Reviewed-by: Emil Velikov <[email protected]>
    Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Use brw_reg's nr field to store register number. | Matt Turner | 2015-11-13 | 1 | -5/+5
    In addition to combining another field, we get to replace silliness like "reg.reg" with something that actually makes sense, "reg.nr"; and no one will ever wonder again why dst.reg isn't a dst_reg.

    Moving the now 16-bit nr field to a 16-bit boundary decreases code size by about 3k.

    Reviewed-by: Emil Velikov <[email protected]>
    Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Add and use enum brw_reg_file. | Matt Turner | 2015-11-13 | 1 | -12/+13
    Reviewed-by: Emil Velikov <[email protected]>
    Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Reorganize brw_reg fields. | Matt Turner | 2015-11-13 | 1 | -8/+8
    Put fields that are meaningless with an immediate in the same storage with the immediate. This leaves fields type, file, nr, subnr in the first dword where there's now extra room for expansion.

    Reviewed-by: Emil Velikov <[email protected]>
    Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Make 'dw1' and 'bits' unnamed structures in brw_reg. | Matt Turner | 2015-11-13 | 1 | -20/+20
    Generated by

       sed -i -e 's/\.bits\././g' *.c *.h *.cpp
       sed -i -e 's/dw1\.//g' *.c *.h *.cpp

    and then reverting changes to comments in gen7_blorp.cpp and brw_fs_generator.cpp.

    There wasn't any utility offered by forcing the programmer to list these to access their fields. Removing them will reduce churn in future commits.

    This is C11 (and gcc has apparently supported it for some time, for "compatibility with other compilers").

    See https://gcc.gnu.org/onlinedocs/gcc/Unnamed-Fields.html

    Reviewed-by: Emil Velikov <[email protected]>
    Reviewed-by: Kenneth Graunke <[email protected]>
* i965/brw_reg: Add a brw_VxH_indirect helper | Jason Ekstrand | 2015-11-11 | 1 | -0/+11
    Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Note that the UV immediate type is Gen6+. | Matt Turner | 2015-10-22 | 1 | -1/+1
* i965: Turn BRW_MAX_MRF into a macro that accepts a hardware generation | Iago Toral Quiroga | 2015-09-21 | 1 | -1/+1
    There are some bug reports about shaders failing to compile in gen6 because MRF 14 is used when we need to spill. For example:
    https://bugs.freedesktop.org/show_bug.cgi?id=86469
    https://bugs.freedesktop.org/show_bug.cgi?id=90631

    Discussion in bugzilla pointed to the fact that gen6 might actually have 24 MRF registers available instead of 16, so we could use other MRF registers and avoid these conflicts (we still need to investigate why some shaders need up to MRF 14 anyway, since this is not expected).

    Notice that the hardware docs are not clear about this fact:

    SNB PRM Vol4 Part2's "Table 5-4. MRF Registers Available in Device Hardware" says "Number per Thread" - "24 registers"

    However, SNB PRM Vol4 Part1, 1.6.1 Message Register File (MRF) says:

    "Normal threads should construct their messages in m1..m15. (...) Regardless of actual hardware implementation, the thread should not assume that MRF addresses above m15 wrap to legal MRF registers."

    Therefore experimentation was necessary to evaluate if we had these extra MRF registers available or not. This was tested in gen6 using MRF registers 21..23 for spilling and doing a full piglit run (all.py) forcing spilling of everything on the FS backend. It was also tested by doing spilling of everything on both the FS and the VS backends with a piglit run of shader.py. In both cases no regressions were observed. In fact, many of these tests were helped in the cases where we forced spilling, since that triggered the same underlying problem described in the bug reports. Here are some results using INTEL_DEBUG=spill_fs,spill_vec4 for a shader.py run on gen6 hardware:

    Using MRFs 13..15 for spilling:
    crash: 2, fail: 113, pass: 6621, skip: 5461

    Using MRFs 21..23 for spilling:
    crash: 2, fail: 12, pass: 6722, skip: 5461

    This patch sets the ground for later patches to implement spilling using MRF registers 21..23 in gen6.

    Reviewed-by: Kenneth Graunke <[email protected]>
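The idea of a generation-aware limit can be sketched as follows. This is an illustration under assumptions, not the macro as committed: the message says this patch only prepares the ground, so the 24-on-gen6 value is taken from the discussion above rather than from the code. MAX_MRF(), assert_mrf_in_range() and first_spill_mrf() are hypothetical names.

```c
#include <assert.h>

/* Hypothetical definition: 24 message registers on gen6, 16 elsewhere.
 * Both numbers come from the commit message; which generations get
 * which value is an assumption of this sketch. */
#define MAX_MRF(gen) ((gen) == 6 ? 24 : 16)

/* An assertion of this kind needs the hardware generation, which is why
 * such checks fit better where the generation is already at hand. */
static void
assert_mrf_in_range(int gen, unsigned nr)
{
   assert(nr < (unsigned)MAX_MRF(gen));
}

/* First of the three registers at the top of the MRF space, matching
 * the 13..15 and 21..23 ranges quoted in the test results. */
static unsigned
first_spill_mrf(int gen)
{
   return (unsigned)MAX_MRF(gen) - 3;
}
```

Deriving the spill registers from the limit keeps the two in step if the per-generation value changes later.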
* i965: Move MRF register asserts out of brw_reg.h | Iago Toral Quiroga | 2015-09-21 | 1 | -3/+4
    In a later patch we will make BRW_MAX_MRF return a different value depending on the hardware generation, but it is inconvenient to add a gen parameter to the brw_reg functions only for the assertions, so move these to places where we have the hardware generation available.

    Ken suggested to add the asserts to brw_set_src0 and brw_set_dest since that would make sure that we catch all uses of MRF registers, even those coming from modules that generate native code directly, like blorp. Unfortunately, this is very late in the process which can make things harder to debug, so add asserts to the generator as well.

    Reviewed-by: Kenneth Graunke <[email protected]>
* i965/vec4: Add auxiliary func to build a writemask from a component size | Eduardo Lima Mitev | 2015-08-03 | 1 | -0/+6
    New method brw_writemask_for_size() will return a writemask with the first 'size' components activated.

    Reviewed-by: Jason Ekstrand <[email protected]>
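A writemask with the first 'size' components set can be computed with a single shift. The sketch below assumes the usual bit layout (bit 0 = X, bit 1 = Y, bit 2 = Z, bit 3 = W); writemask_for_size() is a stand-in name and the body is an illustration, not a copy of the committed helper.

```c
#include <assert.h>

/* Return a mask with the low 'size' bits set: 1 -> X, 2 -> XY,
 * 3 -> XYZ, 4 -> XYZW (assuming bit 0 is X). */
static unsigned
writemask_for_size(unsigned size)
{
   assert(size >= 1 && size <= 4);
   return (1u << size) - 1;
}
```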
* Delete duplicate function is_power_of_two() and use _mesa_is_pow_two() | Anuj Phogat | 2015-07-29 | 1 | -1/+1
    Signed-off-by: Anuj Phogat <[email protected]>
    Reviewed-by: Iago Toral Quiroga <[email protected]>
    Reviewed-by: Tapani Pälli <[email protected]>
* i965: Add notification register | Jordan Justen | 2015-06-12 | 1 | -0/+16
    This will be used by the wait instruction when implementing the barrier() function.

    v2:
     * Changes suggested by mattst88

    Signed-off-by: Jordan Justen <[email protected]>
    Reviewed-by: Chris Forbes <[email protected]>
    Reviewed-by: Matt Turner <[email protected]>
* i965: Document brw_mask_reg(). | Francisco Jerez | 2015-05-12 | 1 | -1/+5
    Reviewed-by: Matt Turner <[email protected]>
* i965: Make the brw_inst helpers take a device_info instead of a context | Jason Ekstrand | 2015-04-22 | 1 | -2/+2
    Reviewed-by: Matt Turner <[email protected]>
* i965/fs: Calculate delta_x and delta_y together. | Matt Turner | 2015-04-21 | 1 | -0/+7
    This lets SIMD16 programs on G45 and Gen5 use the PLN instruction.

    On Ironlake:

    total instructions in shared programs: 5634757 -> 5518055 (-2.07%)
    instructions in affected programs: 1745837 -> 1629135 (-6.68%)
    helped: 11439
    HURT: 4

    Reviewed-by: Jason Ekstrand <[email protected]>
* i965: Make type_sz() return unsigned. | Matt Turner | 2015-04-21 | 1 | -1/+1
    Avoids annoying warnings when comparing with sizeof(...).

    Reviewed-by: Jason Ekstrand <[email protected]>
* i965/vec4: Some more trivial swizzle clean-up. | Francisco Jerez | 2015-03-23 | 1 | -4/+2
    Reviewed-by: Matt Turner <[email protected]>
* i965/vec4: Fix signedness of brw_is_single_value_swizzle() argument. | Francisco Jerez | 2015-03-23 | 1 | -1/+1
    Reviewed-by: Matt Turner <[email protected]>
* i965: Define some useful swizzle helper functions. | Francisco Jerez | 2015-03-23 | 1 | -0/+97
    This defines helper functions implementing some common swizzle transformations that are usually open-coded in the compiler back-end, causing a lot of clutter. Some optimization passes will become almost trivial implemented in terms of these functions (e.g. vec4_visitor::opt_reduce_swizzle()).

    Reviewed-by: Matt Turner <[email protected]>
* i965/fs: Replace ud_reg_to_w() with a more general helper function. | Francisco Jerez | 2015-02-19 | 1 | -0/+22
    Reviewed-by: Matt Turner <[email protected]>
* i965: Extract scalar region checking logic | Ben Widawsky | 2015-01-20 | 1 | -0/+13
    There are currently 2 users of this functionality. I have 2 more users coming up, and having a simple function makes the results much cleaner.

    The existing interface semantics was proposed by Matt.

    v2 (Ken): Rename to region_matches()/has_scalar_region().

    Signed-off-by: Ben Widawsky <[email protected]>
    Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Add QWORD sizes to type_sz macro | Ben Widawsky | 2015-01-20 | 1 | -0/+3
    GEN8 added the QWORD as a valid type for certain operations on the EU. In order to calculate the number of registers used one must have the type size as part of the equation. Quoting the formula in the code:

    regs_written = (dst.width * dst.stride * type_sz(dst.type) + 31) / 32;

    Adding this separately for bisection since there is no simple way to add an assert in the type_sz function.

    NOTE: As a side note, I was confused for a while because it's impossible to calculate the region, i.e. registers needed, without vstride. However, at this point these are all part of the IR, and so no vstride must exist.

    Signed-off-by: Ben Widawsky <[email protected]>
    Reviewed-by: Matt Turner <[email protected]>
    Reviewed-by: Kenneth Graunke <[email protected]>
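The quoted formula rounds a byte count up to whole 32-byte registers, so doubling the type size can double the register count. A small worked sketch, with regs_written() as a hypothetical free function that takes the three factors directly instead of a destination register:

```c
#include <assert.h>

/* Bytes per register, taken from the "/ 32" in the quoted formula. */
#define REG_BYTES 32u

/* Number of registers covered by 'width' channels of 'type_size' bytes
 * each, spaced 'stride' elements apart, rounded up to a whole register. */
static unsigned
regs_written(unsigned width, unsigned stride, unsigned type_size)
{
   /* 8 is the QWORD size the commit adds; smaller sizes already existed. */
   assert(type_size == 1 || type_size == 2 || type_size == 4 || type_size == 8);
   return (width * stride * type_size + (REG_BYTES - 1)) / REG_BYTES;
}
```

For an 8-wide destination with stride 1, a 4-byte type gives (32 + 31) / 32 = 1 register, while an 8-byte QWORD type gives (64 + 31) / 32 = 2.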
* i965/brw_reg: struct constructor now needs explicit negate and abs values. | Andres Gomez | 2014-12-15 | 1 | -2/+20
    We were assuming, when constructing a new brw_reg struct, that the negate and abs register modifiers would not be present by default in the new register.

    Now, we force explicitly setting these values when constructing a new register.

    This will avoid problems like forgetting to properly set them when we are using a previous register to generate this new register, as it was happening in the dFdx and dFdy generation functions.

    Fixes piglit test shaders/glsl-deriv-varyings

    Cc: "10.4 10.3" <[email protected]>
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=82991
    Reviewed-by: Matt Turner <[email protected]>
* i965: Add functions to convert float <-> VF. | Matt Turner | 2014-11-25 | 1 | -0/+4
    Reviewed-by: Kenneth Graunke <[email protected]>
* i965/brw_reg: Make the accumulator register take an explicit width. | Jason Ekstrand | 2014-09-30 | 1 | -2/+3
    The big pile of patches I just pushed regresses about 25 piglit tests on SNB. This fixes the regressions.

    Signed-off-by: Jason Ekstrand <[email protected]>
* i965/brw_reg: Add a firsthalf function and use it in the generator | Jason Ekstrand | 2014-09-30 | 1 | -0/+6
    Right now, this function is a no-op but it indicates that we intend to only use the first half of the 16-wide register.

    Signed-off-by: Jason Ekstrand <[email protected]>
    Reviewed-by: Matt Turner <[email protected]>
* i965/fs: Make null_reg_* const members of fs_visitor instead of globals | Jason Ekstrand | 2014-09-30 | 1 | -0/+6
    We also set the register width equal to the dispatch width. Right now, this is effectively a no-op since we don't do anything with it. However, it will be important once we add an actual width field to fs_reg.

    Signed-off-by: Jason Ekstrand <[email protected]>
    Reviewed-by: Matt Turner <[email protected]>
* i965: forward-declare struct brw_context in brw_reg.h | Ilia Mirkin | 2014-07-09 | 1 | -0/+2
    Commit 54e91e7420 introduced a function declaration that uses brw_context. While brw_context tends to get included in most files, it is not when compiling intel_asm_annotation.c resulting in the following warning:

    In file included from brw_shader.h:25:0,
                     from brw_cfg.h:32,
                     from intel_asm_annotation.c:24:
    brw_reg.h:122:39: warning: 'struct brw_context' declared inside parameter list [enabled by default]
    brw_reg.h:122:39: warning: its scope is only this definition or declaration, which is probably not what you want [enabled by default]

    Add a forward-declaration for struct brw_context to avoid the issue.

    Signed-off-by: Ilia Mirkin <[email protected]>
    Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Use enum brw_reg_type for register types. | Matt Turner | 2014-07-05 | 1 | -4/+4
    Reviewed-by: Topi Pohjolainen <[email protected]>
* i965: Use unreachable() instead of unconditional assert(). | Matt Turner | 2014-07-01 | 1 | -2/+1
    Reviewed-by: Ian Romanick <[email protected]>
* mesa: Make unreachable macro take a string argument. | Matt Turner | 2014-07-01 | 1 | -2/+1
    To aid in debugging.

    Reviewed-by: Ian Romanick <[email protected]>
* Revert "i965: Add 'wait' instruction support" | Matt Turner | 2014-06-17 | 1 | -16/+0
    This reverts commit 20be3ff57670529a410b30a1008a71e768d08428.

    No evidence of ever being used.
* i965: Mark brw_reg_type and register_file enums as PACKED. | Matt Turner | 2014-02-21 | 1 | -1/+2
    The C99 spec says the type of an enum is implementation defined (but can be char, signed int, or unsigned int). gcc appears to always give enums four bytes, even when they can fit in less. It does so because this is what other compilers seem to do [0] and therefore to maintain ABI compatibility with them.

    gcc has an -fshort-enums flag that tells the compiler to use only as much space as needed for an enum. Adding __attribute__((__packed__)) to an enum definition has the same behavior, but on a per-enum basis.

    brw_reg_type and register_file are not part of the ABI, so we can safely mark them as PACKED so that they'll take only a byte, rather than four.

    [0] http://gcc.gnu.org/onlinedocs/gcc/Non-bugs.html#index-fshort-enums-3868

    Acked-by: Kenneth Graunke <[email protected]>
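The size effect described above can be checked directly. This sketch assumes GCC or Clang, since __attribute__((__packed__)) is a compiler extension, and uses made-up enum names rather than the driver's; the C11 static_assert from <assert.h> verifies the one-byte claim at compile time.

```c
#include <assert.h>
#include <stddef.h>

/* An ordinary enum: its size is implementation defined. */
enum plain_file { PLAIN_ARF, PLAIN_GRF, PLAIN_MRF, PLAIN_IMM };

/* The same enumerators with the packed attribute (GCC/Clang extension):
 * the compiler picks the smallest type that holds every value. */
enum __attribute__((__packed__)) packed_file {
   PACKED_ARF, PACKED_GRF, PACKED_MRF, PACKED_IMM
};

/* Four small values fit in one byte once the enum is packed. */
static_assert(sizeof(enum packed_file) == 1, "packed enum should be 1 byte");

/* Bytes saved per stored value; the exact figure depends on the ABI. */
static size_t
bytes_saved_per_value(void)
{
   return sizeof(enum plain_file) - sizeof(enum packed_file);
}
```

The saving only matters where the enum is stored, for example as a struct member, which is why it is safe to apply to types that are not part of an ABI.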
* i965: Have brw_imm_vf4() take the vector components as integer values. | Francisco Jerez | 2014-02-19 | 1 | -10/+30
    Reviewed-by: Paul Berry <[email protected]>
* i965: Add helper function to find out the signedness of a register type. | Francisco Jerez | 2014-02-19 | 1 | -0/+28
    Reviewed-by: Paul Berry <[email protected]>
* i965/vec4: Use swizzle() in the ARB_vertex_program code. | Francisco Jerez | 2014-02-19 | 1 | -0/+2
    Reviewed-by: Paul Berry <[email protected]>
* i965: Fix register types in dump_instructions(). | Kenneth Graunke | 2014-02-05 | 1 | -0/+1
    This regressed when I converted BRW_REGISTER_TYPE_* to be an abstract type that doesn't match the hardware description. dump_instruction() was using reg_encoding[] from brw_disasm.c, which no longer matches (and was incorrect for Gen8+ anyway).

    This patch introduces a new function to convert the abstract enum values into the letter suffix we expect.

    Signed-off-by: Kenneth Graunke <[email protected]>
    Reported-by: Matt Turner <[email protected]>
    Reviewed-by: Matt Turner <[email protected]>
* s/Tungsten Graphics/VMware/ | José Fonseca | 2014-01-17 | 1 | -2/+2
    Tungsten Graphics Inc. was acquired by VMware Inc. in 2008. Leaving the old copyright name is creating unnecessary confusion, hence this change.

    This was the sed script I used:

    $ cat tg2vmw.sed
    # Run as:
    #
    #   git reset --hard HEAD && find include scons src -type f -not -name 'sed*' -print0 | xargs -0 sed -i -f tg2vmw.sed
    #

    # Rename copyrights
    s/Tungsten Gra\(ph\|hp\)ics,\? [iI]nc\.\?\(, Cedar Park\)\?\(, Austin\)\?\(, \(Texas\|TX\)\)\?\.\?/VMware, Inc./g
    /Copyright/s/Tungsten Graphics\(,\? [iI]nc\.\)\?\(, Cedar Park\)\?\(, Austin\)\?\(, \(Texas\|TX\)\)\?\.\?/VMware, Inc./
    s/TUNGSTEN GRAPHICS/VMWARE/g

    # Rename emails
    s/[email protected]/[email protected]/
    s/[email protected]/[email protected]/g
    s/jrfonseca-at-tungstengraphics-dot-com/jfonseca-at-vmware-dot-com/
    s/jrfonseca\[email protected]/[email protected]/g
    s/keithw\[email protected]/[email protected]/g
    s/[email protected]/[email protected]/g
    s/thomas-at-tungstengraphics-dot-com/thellstom-at-vmware-dot-com/
    s/[email protected]/[email protected]/

    # Remove dead links
    s@Tungsten Graphics (http://www.tungstengraphics.com)@Tungsten Graphics@g

    # C string src/gallium/state_trackers/vega/api_misc.c
    s/"Tungsten Graphics, Inc"/"VMware, Inc"/

    Reviewed-by: Brian Paul <[email protected]>
* i965: Add support for Broadwell's new register types. | Kenneth Graunke | 2013-12-20 | 1 | -0/+5
    Broadwell introduces support for Q, UQ, and HF types. It also extends DF support to allow immediate values.

    Irritatingly, although HF and DF both support immediates, they're represented by a different value depending on the register file.

    Signed-off-by: Kenneth Graunke <[email protected]>
    Reviewed-by: Jordan Justen <[email protected]>
    Reviewed-by: Eric Anholt <[email protected]>
* i965: Add BRW_REGISTER_TYPE_DF. | Kenneth Graunke | 2013-12-20 | 1 | -0/+2
    Ivybridge, Baytrail, and Haswell support double float register types, but do not support them as immediate values.

    Signed-off-by: Kenneth Graunke <[email protected]>
    Reviewed-by: Jordan Justen <[email protected]>
    Reviewed-by: Eric Anholt <[email protected]>
* i965: Abstract BRW_REGISTER_TYPE_* into an enum with unique values. | Kenneth Graunke | 2013-12-20 | 1 | -0/+22
    On released hardware, values 4-6 are overloaded. For normal registers, they mean UB/B/DF. But for immediates, they mean UV/VF/V.

    Previously, we just created #defines for each name, reusing the same value. This meant we could directly splat the brw_reg::type field into the assembly encoding, which was fairly nice, and worked well.

    Unfortunately, Broadwell makes this infeasible: the HF and DF types are represented as different numeric values depending on whether the source register is an immediate or not.

    To preserve sanity, I decided to simply convert BRW_REGISTER_TYPE_* to an abstract enum that has a unique value for each register type, and write translation functions. One nice benefit is that we can add assertions about register files and generations.

    I've chosen not to convert brw_reg::type to the enum, since converting it caused a lot of trouble due to C++ enum rules (even though it's defined in an extern "C" block...).

    Signed-off-by: Kenneth Graunke <[email protected]>
    Reviewed-by: Jordan Justen <[email protected]>
    Reviewed-by: Eric Anholt <[email protected]>
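The split between an abstract type and its hardware encoding can be sketched as below. Only the six types behind the overloaded values 4-6 are covered, and the pairing (UB/UV = 4, B/VF = 5, DF/V = 6) is assumed from the order in which the message lists them. The enum and reg_type_to_hw() are hypothetical names, not the driver's.

```c
#include <assert.h>
#include <stdbool.h>

/* Abstract register types: one unique value per type, unrelated to the
 * hardware encoding. */
enum reg_type {
   TYPE_UB, TYPE_B, TYPE_DF,   /* normal-register meanings of 4-6 */
   TYPE_UV, TYPE_VF, TYPE_V    /* immediate meanings of 4-6 */
};

/* Hypothetical translation function: map an abstract type to the value
 * stored in the instruction, which depends on whether the operand is an
 * immediate. The asserts show where file checks can now live. */
static unsigned
reg_type_to_hw(enum reg_type type, bool is_immediate)
{
   switch (type) {
   case TYPE_UB: assert(!is_immediate); return 4;
   case TYPE_B:  assert(!is_immediate); return 5;
   case TYPE_DF: assert(!is_immediate); return 6;
   case TYPE_UV: assert(is_immediate);  return 4;
   case TYPE_VF: assert(is_immediate);  return 5;
   case TYPE_V:  assert(is_immediate);  return 6;
   }
   assert(!"unknown register type");
   return 0;
}
```

Two distinct abstract types can share one encoding here, which is exactly what made a direct copy of the type field into the instruction ambiguous.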
* i965: Delete bogus BRW_REGISTER_TYPE_HF define. | Kenneth Graunke | 2013-12-20 | 1 | -1/+0
    git blame ascribes this to the initial commit of the driver. No released hardware has ever supported half float, according to the documentation for SrcType in the ISA reference.

    Signed-off-by: Kenneth Graunke <[email protected]>
    Reviewed-by: Matt Turner <[email protected]>