aboutsummaryrefslogtreecommitdiffstats
path: root/src/mesa/drivers/dri/i965/brw_defines.h
Commit message (Collapse)AuthorAgeFilesLines
* i965: Implement "Static Vertex Count" geometry shader optimization.Kenneth Graunke2015-09-261-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Broadwell's 3DSTATE_GS contains new "Static Output" and "Static Vertex Count" fields, which control a new optimization. Normally, geometry shaders can output arbitrary numbers of vertices, which means that resource allocation has to be done on the fly. However, if the number of vertices is statically known, the hardware can pre-allocate resources up front, which is more efficient. Thanks to the new NIR GS intrinsics, this is easy. We just call the function introduced in the previous commit to get the vertex count. If it obtains a count, we stop emitting the extra 32-bit "Vertex Count" field in the VUE, and instead fill out the 3DSTATE_GS fields. Improves performance of Gl32GSCloth by 5.16347% +/- 0.12611% (n=91) on my Lenovo X250 laptop (Broadwell GT2) at 1024x768. shader-db statistics for geometry shaders only: total instructions in shared programs: 3227 -> 3207 (-0.62%) instructions in affected programs: 242 -> 222 (-8.26%) helped: 10 v2: Don't break non-NIR paths (just skip this optimization). Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Jordan Justen <[email protected]>
* i965/fs: Implement FS_OPCODE_GET_BUFFER_SIZESamuel Iglesias Gonsalvez2015-09-251-0/+1
| | | | | Signed-off-by: Samuel Iglesias Gonsalvez <[email protected]> Reviewed-by: Kristian Høgsberg <[email protected]>
* i965/vec4: Implement VS_OPCODE_GET_BUFFER_SIZESamuel Iglesias Gonsalvez2015-09-251-0/+3
| | | | | | | | Notice that Skylake needs to include a header in the sampler message so it will need some tweaks to work there. Signed-off-by: Samuel Iglesias Gonsalvez <[email protected]> Reviewed-by: Kristian Høgsberg <[email protected]>
* i965/cs: Implement DispatchComputeIndirect supportJordan Justen2015-09-241-0/+2
| | | | | Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Kristian Høgsberg <[email protected]>
* i965: Add defines for tessellation stagesChris Forbes2015-09-221-0/+72
| | | | | | | | | | | | | v2 (Ken): - Squash together commits for HS, DS, and TE, as well as fixes. - Add INTEL_MASK variants so we can use SET_FIELD if we want. - Rename GEN7_HS_INSTANCE_CONTROL to GEN7_HS_INSTANCE_COUNT to match the documentation. - Add some more fields from the PRMs. - Add Broadwell variants. Signed-off-by: Chris Forbes <[email protected]> Signed-off-by: Kenneth Graunke <[email protected]>
* i965/cs: Enable barrier in MEDIA_INTERFACE_DESCRIPTORJordan Justen2015-09-101-0/+2
| | | | | | | | | | | | | Enable barrier in MEDIA_INTERFACE_DESCRIPTOR if the program uses the barrier() GLSL function. On Ivy Bridge and Haswell, this allows the piglit test tests/spec/arb_compute_shader/execution/simple-barrier-atomics.shader_test to pass. On gen8, this enables a similar test with a local group size of 896 to pass. Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Kristian Høgsberg <[email protected]>
* i965: add support for textureSamples functionIlia Mirkin2015-09-101-0/+2
| | | | | | Signed-off-by: Ilia Mirkin <[email protected]> [v2: kayden-supplied code in fs_nir replacing need for logical opcode] Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Mark topologies with adjacency information as G45+.Kenneth Graunke2015-09-081-4/+4
| | | | | | | These didn't exist on the original 965. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Chris Forbes <[email protected]>
* i965: Fix value of _3DPRIM_TRIFAN_NOSTIPPLE.Kenneth Graunke2015-09-081-1/+1
| | | | | | | | | | | | TRIFAN_NOSTIPPLE has always been 0x16 - 0x15 is marked "Reserved" on all platforms. See the 965 PRM, Volume 2, Table 3-1, "3D Primitive Topology Type Encoding" for a list. We don't currently use this, and I don't expect we will, but we may as well not leave the bogus value around. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Chris Forbes <[email protected]>
* i965: Add defines for all new Gen7/8 URB opcodesChris Forbes2015-09-081-1/+7
| | | | | | | Tessellation needs to emit URB reads and atomics; Signed-off-by: Chris Forbes <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965/gen9: Annotate input coverage mask changeBen Widawsky2015-09-031-0/+16
| | | | | | | | | | | | | | | | | | | | | | | | | As far as I can tell, the behavior is preserved from the previous generations. Before we set a single bit to tell the FS whether or not we'll be using an input coverage mask. Now we have some options which are implementing various extensions. These bits are used for the various conservative rasterization mechanisms (for collision detection, binning, and whatever else). I believe that the behavior is preserved because the problem which conservative rasterization is attempting to fix would go away with the "NORMAL" mode (at the cost of performance, I believe). This patch serves as documentation of the change by creating the enums, as well as giving some of the history with the links here so that the next person who comes along and looks at it doesn't spend as long as I had to in order to determine if there is an issue or not. Previously, this algorithm had been done in software, and this can still be used as long as we don't export an extension stating otherwise. References: https://www.opengl.org/registry/specs/NV/conservative_raster.txt References: https://http.developer.nvidia.com/GPUGems2/gpugems2_chapter42.html Signed-off-by: Ben Widawsky <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965/cs: Setup push constant data for uniformsJordan Justen2015-09-021-0/+6
| | | | | | | | | | brw_upload_cs_push_constants was based on gen6_upload_push_constants. v2: * Add FINISHME comments about more efficient ways to push uniforms Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Ben Widawsky <[email protected]>
* i965/chv|skl: Apply sampler bypass w/aBen Widawsky2015-08-311-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | Certain compressed formats require this setting. The docs don't go into much detail as to why it's needed exactly. This patch introduces no piglit regressions on gen9 (bsw is untested). Note that the SKL "regressions" are fixed tests, and the egl_khr_gl_colorspace tests are WTF. The patch also fixes nothing I can find. http://otc-mesa-ci.jf.intel.com/job/Leeroy/127820/ v2: Reworded commit message (Matt); Added piglit results link. Restructured condition (Matt) Moved check out to function (Nanley). I left the setting of the bit in the surface state open coded because it seems to go better with the existing code. v3: Use and inline function only in gen8_emit_texture_surface_state() (Matt). Cc: Matt Turner <[email protected]> Cc: Nanley Chery <[email protected]> Signed-off-by: Ben Widawsky <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965/surface_formats: add support for 2D ASTC surface formatsNanley Chery2015-08-261-0/+32
| | | | | | | | | | | | | | | | | | Define two-thirds of the 2D Intel ASTC surface formats (LDR-only). This allows a 1-to-1 mapping from the mesa format to the Intel format. ASTC textures will default to being processed in LDR mode. If there is hardware support for HDR/Full mode and the texture is not sRGB, add the format bit necessary to process it in HDR/Full mode. v2: remove extra newlines. v3: follow existing coding style in translate_tex_format(). v4: expound on the GEN9_SURFACE_ASTC_HDR_FORMAT_BIT comment. update SF table - ASTC is actually supported in Gen8. v5: conform the ASTC MESA_FORMAT enums to the existing naming convention. Reviewed-by: Anuj Phogat <[email protected]> Signed-off-by: Nanley Chery <[email protected]>
* i965/gen7-8: Set up early depth/stencil control appropriately for image ↵Francisco Jerez2015-08-111-0/+3
| | | | | | | | | load/store. v2: Store early fragment test mode in brw_wm_prog_data instead of getting it from core mesa data structures (Ken). Reviewed-by: Kenneth Graunke <[email protected]>
* i965/gen7-8: Poke the 3DSTATE UAV access enable bits.Francisco Jerez2015-08-111-0/+4
| | | | | | v2: Set the PS UAV-only bit on HSW (Ken). Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Define virtual instruction to calculate the high 32 bits of a multiply.Francisco Jerez2015-08-061-0/+5
| | | | | | | | | | | This instruction will translate to the MUL/MACH sequence that computes the high 32-bits of the result of a 64-bit multiply. Before Gen8 integer operations that used the accumulator were limited to 8-wide, but the SIMD lowering pass can easily be hooked up to sidestep this limitation, we just need a virtual opcode to represent the MUL/MACH sequence in the IR. Reviewed-by: Matt Turner <[email protected]>
* i965/fs: Define logical typed and untyped surface opcodes.Francisco Jerez2015-07-291-0/+20
| | | | | | | | | | | | | | | Each logical variant is largely equivalent to the original opcode but instead of taking a single payload source it expects its arguments separately as individual sources, like: typed_surface_write_logical null, coordinates, source, surface, num_coordinates, num_components This patch defines the opcodes and usual instruction boilerplate, including a placeholder lowering function provided mainly as documentation for their source registers. Reviewed-by: Jason Ekstrand <[email protected]>
* i965/fs: Define logical texture sampling opcodes.Francisco Jerez2015-07-291-0/+31
| | | | | | | | | | | | | | | | | Each logical variant is largely equivalent to the original opcode but instead of taking a single payload source it expects the arguments separately as individual sources, like: tex_logical dst, coordinates, shadow_c, lod, lod2, sample_index, mcs, sampler, offset, num_coordinate_components, num_grad_components This patch defines the opcodes and usual instruction boilerplate, including a placeholder lowering function provided mostly as documentation for their source registers. Reviewed-by: Jason Ekstrand <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Remove the FS_OPCODE_SET_OMASK pseudo-opcode.Francisco Jerez2015-07-291-1/+0
| | | | | | | This is now unused. Reviewed-by: Jason Ekstrand <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Define logical framebuffer write opcode.Francisco Jerez2015-07-291-0/+15
| | | | | | | | | | | | | | | | The logical variant is largely equivalent to the original opcode but instead of taking a single payload source it expects its arguments that make up the payload separately as individual sources, like: fb_write_logical null, color0, color1, src0_alpha, src_depth, dst_depth, sample_mask, num_components This patch defines the opcode and usual instruction boilerplate, including a placeholder lowering function provided mainly as self-documentation. Reviewed-by: Jason Ekstrand <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Define HW-binding table and resource streamer control opcodesAbdiel Janulgue2015-07-181-0/+30
| | | | | | | | | | | v2: Use macros for HW binding table edits (Topi) v3: Add Broadwell support. v4: Make hardware binding table bit definitions even more clearer (Ken) Cc: [email protected] Reviewed-by: Topi Pohjolainen <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Signed-off-by: Abdiel Janulgue <[email protected]>
* i965/gen9: Use custom MOCS entries set up by the kernel.Francisco Jerez2015-07-161-5/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | Instead of relying on hardware defaults the i915 kernel driver is going program custom MOCS tables system-wide on Gen9 hardware. The "WT" entry previously used for renderbuffers had a number of problems: It disabled caching on eLLC, it used a reserved L3 cacheability setting, and it used to override the PTE controls making renderbuffers always WT on LLC regardless of the kernel's setting. Instead use an entry from the new MOCS tables with parameters: TC=LLC/eLLC, LeCC=PTE, L3CC=WB. The "WB" entry previously used for anything other than renderbuffers has moved to a different index in the new MOCS tables but it should have the same caching semantics as the old entry. Even though the corresponding kernel change ("drm/i915: Added Programming of the MOCS") is in a way an ABI break it doesn't seem necessary to check that the kernel is recent enough because the change should only affect Gen9 which is still unreleased hardware. v2: Update MOCS values for the new Android-incompatible tables introduced in v7 of the kernel patch. Cc: 10.6 <[email protected]> Reference: http://lists.freedesktop.org/archives/intel-gfx/2015-July/071080.html Reviewed-by: Ben Widawsky <[email protected]>
* i965/cs: Initialize GPGPU Thread CountJordan Justen2015-07-141-0/+5
| | | | | | | | | | | | | | | | This field should always be set for gen8. In the bdw PRM, Volume 2d: Command Reference: Structures under INTERFACE_DESCRIPTOR_DATA, DWORD 6, Bits 9:0, Number of Threads in GPGPU Thread Group: "This field should not be set to 0 even if the barrier is disabled, since an accurate value is needed for proper pre-emption." In the HSW PRM, the it doesn't mention that it must always be set, but it should not hurt. Reported-by: Kristian Høgsberg <[email protected]> Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Ben Widawsky <[email protected]>
* i965/skl: Set the pulls bary bit in 3DSTATE_PS_EXTRANeil Roberts2015-07-061-0/+1
| | | | | | | | | | | On Gen9+ there is a new bit in 3DSTATE_PS_EXTRA that must be set if the shader sends a message to the pixel interpolator. This fixes the interpolateAt* tests on SKL, apart from interpolateatsample-nonconst but that is not implemented anywhere so it's not a regression. Reviewed-by: Ben Widawsky <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Cc: "10.6 10.5" <[email protected]>
* i965/gen9: Don't use encrypted MOCSBen Widawsky2015-06-231-2/+2
| | | | | | | | | | | | | | | | | | On gen9+ MOCS is an index into a table. It is 7 bits, and AFAICT, bit 0 is for doing encrypted reads. I don't recall how I decided to do this for BXT. I don't know this patch was ever needed, since it seems nothing is broken today on SKL. Furthermore, this patch may no longer be needed because of the ongoing changes with MOCS setup. It is what is being used/tested, so it's included in the series. The chosen values are the old values left shifted. That was also an arbitrary choice. v2: Use shift in MOCS to make it clear what we're doing. (Ken) Signed-off-by: Ben Widawsky <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965/gen9: Disable Mip Tail for YF/YS tiled surfacesAnuj Phogat2015-06-161-0/+3
| | | | | | | | | | | | | | | Disabling miptails fixed the buffer corruption happening in FBO which use YF/YS tiled renderbuffer or texture as color attachment. Spec recommends disabling mip tails only for non-mip-mapped surfaces. But, without disabling miptails I couldn't get correct data out of mipmapped YF/YS tiled surface. We need better understanding of miptails before start using them. For now this patch helps move things forward. Signed-off-by: Anuj Phogat <[email protected]> Reviewed-by: Ben Widawsky <[email protected]>
* i965/gen9: Set tiled resource mode in surface stateAnuj Phogat2015-06-161-0/+6
| | | | | | | | This patch sets the tiled resource mode for texture and renderbuffer surfaces. Signed-off-by: Anuj Phogat <[email protected]> Reviewed-by: Topi Pohjolainen <[email protected]>
* i965/fs: Implement support for ir_barrierJordan Justen2015-06-121-0/+5
| | | | | Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Chris Forbes <[email protected]>
* i965: Add GATEWAY_SFID definitionsJordan Justen2015-06-121-0/+8
| | | | | | Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Chris Forbes <[email protected]> Reviewed-by: Ben Widawsky <[email protected]>
* i965: Create a shader_dispatch_mode enum to replace VS/GS fields.Kenneth Graunke2015-06-011-3/+2
| | | | | | | | | | | | | We used to store the GS dispatch mode in brw_gs_prog_data while separately storing the VS dispatch mode in brw_vue_prog_data::simd8. This patch introduces an enum to represent all possible dispatch modes, and stores it in brw_vue_prog_data::dispatch_mode, unifying the two. Based on a suggestion by Matt Turner. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Ben Widawsky <[email protected]>
* i965: Add Gen9 surface state decodingBen Widawsky2015-05-181-0/+2
| | | | | | | | | | | | | | Gen9 surface state is very similar to the previous generation. The important changes here are aux mode, and the way clear colors work. NOTE: There are some things intentionally left out of this decoding. v2: Redo the string for the aux buffer type to address compressed variants. v3: Use the shift for compression enable (instead of compression mode) (Topi) Signed-off-by: Ben Widawsky <[email protected]> Reviewed-by: Topi Pohjolainen <[email protected]>
* i965: Add gen8 surface state debug infoBen Widawsky2015-05-181-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | AFAICT, none of the old data was wrong (the gen7 decoder), but it wa smissing a bunch of stuff. Adds a tick (') to denote the beginning of the surface state for easier reading. This will be replaced later with some better, but more risky code. OLD: 0x00007980: 0x23016000: SURF: 2D BRW_SURFACEFORMAT_B8G8R8A8_UNORM 0x00007984: 0x18000000: SURF: offset 0x00007988: 0x00ff00ff: SURF: 256x256 size, 0 mips, 1 slices 0x0000798c: 0x000003ff: SURF: pitch 1024, tiled 0x00007990: 0x00000000: SURF: min array element 0, array extent 1 0x00007994: 0x00000000: SURF: mip base 0 0x00007998: 0x00000000: SURF: x,y offset: 0,0 0x0000799c: 0x09770000: SURF: 0x00007940: 0x231d7000: SURF: 2D BRW_SURFACEFORMAT_R8G8B8A8_UNORM 0x00007944: 0x78000000: SURF: offset 0x00007948: 0x001f001f: SURF: 32x32 size, 0 mips, 1 slices 0x0000794c: 0x0000007f: SURF: pitch 128, tiled 0x00007950: 0x00000000: SURF: min array element 0, array extent 1 0x00007954: 0x00000000: SURF: mip base 0 0x00007958: 0x00000000: SURF: x,y offset: 0,0 0x0000795c: 0x09770000: SURF: NEW (v1): 0x00007980: 0x23016000: SURF': 2D B8G8R8A8_UNORM VALIGN4 HALIGN4 X-tiled 0x00007984: 0x18000000: SURF: MOCS: 0x18 Base MIP: 0.0 (0 mips) Surface QPitch: 0 0x00007988: 0x00ff00ff: SURF: 256x256 [AUX_NONE] 0x0000798c: 0x000003ff: SURF: 1 slices (depth), pitch: 1024 0x00007990: 0x00000000: SURF: min array element: 0, array extent 1, MULTISAMPLE_1 0x00007994: 0x00000000: SURF: x,y offset: 0,0, min LOD: 0 0x00007998: 0x00000000: SURF: AUX pitch: 0 qpitch: 0 0x0000799c: 0x09770000: SURF: Clear color: ---- 0x00007940: 0x231d7000: SURF': 2D R8G8B8A8_UNORM VALIGN4 HALIGN4 Y-tiled 0x00007944: 0x78000000: SURF: MOCS: 0x78 Base MIP: 0 (0 mips) Surface QPitch: ff0000 0x00007948: 0x001f001f: SURF: 32x32 [AUX_NONE] 0x0000794c: 0x0000007f: SURF: 1 slices (depth), pitch: 128 0x00007950: 0x00000000: SURF: min array element: 0, array extent 1, MULTISAMPLE_1 0x00007954: 0x00000000: SURF: x,y offset: 0,0, min LOD: 0 0x00007958: 0x00000000: SURF: AUX pitch: 0 qpitch: 0 0x0000795c: 0x09770000: SURF: Clear color: ---- 0x00007920: 0x00007980: BIND0: surface state address 0x00007924: 0x00007940: BIND1: surface state address v2: Style cleanups (Matt) Fix aux mode dword 7->6 (Topi) Use exp2 instead of pow (Matt) Add dwords 8-12 to the dump v3: Needed to update the surface format name getter for the change in the first patch in the series Signed-off-by: Ben Widawsky <[email protected]> Cc: Matt Turner <[email protected]> Reviewed-by: Topi Pohjolainen <[email protected]>
* i965: Add gen7+ sampler state to batch debugBen Widawsky2015-05-181-0/+1
| | | | | | | | | | | | | | | | | | | | OLD: 0x00007e00: 0x10000000: WM SAMP0: filtering 0x00007e04: 0x000d0000: WM SAMP0: wrapping, lod 0x00007e08: 0x00000000: WM SAMP0: default color pointer 0x00007e0c: 0x00000090: WM SAMP0: chroma key, aniso NEW: 0x00007e00: 0x10000000: SAMPLER_STATE 0: Disabled = no, Base Mip: 0.0, Mip/Mag/Min Filter: NONE/NEAREST/NEAREST, LOD Bias: 0.0 0x00007e04: 0x000d0000: SAMPLER_STATE 0: Min LOD: 0.0, Max LOD: 13.0 0x00007e08: 0x00000000: SAMPLER_STATE 0: Border Color 0x00007e0c: 0x00000090: SAMPLER_STATE 0: Max aniso: RATIO 2:1, TC[XYZ] Address Control: CLAMP|CLAMP|WRAP v2: Move GET_BITS macro to here (with paren protection) Ben/Topi Add const to the sampler pointer (Topi) Signed-off-by: Ben Widawsky <[email protected]> Reviewed-by: Topi Pohjolainen <[email protected]>
* i965: Use predicate enable bit for conditional rendering w/o stallingNeil Roberts2015-05-121-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Previously whenever a primitive is drawn the driver would call _mesa_check_conditional_render which blocks waiting for the result of the query to determine whether to render. On Gen7+ there is a bit in the 3DPRIMITIVE command which can be used to disable the primitive based on the value of a state bit. This state bit can be set based on whether two registers have different values using the MI_PREDICATE command. We can load these two registers with the pixel count values stored in the query begin and end to implement conditional rendering without stalling. Unfortunately these two source registers were not in the whitelist of available registers in the kernel driver until v3.19. This patch uses the command parser version from intel_screen to detect whether to attempt to set the predicate data registers. The predicate enable bit is currently only used for drawing 3D primitives. For blits, clears, bitmaps, copypixels and drawpixels it still causes a stall. For most of these it would probably just work to call the new brw_check_conditional_render function instead of _mesa_check_conditional_render because they already work in terms of rendering primitives. However it's a bit trickier for blits because it can use the BLT ring or the blorp codepath. I think these operations are less useful for conditional rendering than rendering primitives so it might be best to leave it for a later patch. v2: Use the command parser version to detect whether we can write to the predicate data registers instead of trying to execute a register load command. v3: Simple rebase v4: Changes suggested by Kenneth Graunke: Split the load_64bit_register function out to a separate patch so it can be a shared public function. Avoid calling _mesa_check_conditional_render if we've already determined that there's no query object. Some styling fixes. Reviewed-by: Kenneth Graunke <[email protected]>
* i965/gen6: setup limits for ARB_viewport_arrayChris Forbes2015-05-061-1/+1
| | | | | Signed-off-by: Chris Forbes <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Introduce the FIND_LIVE_CHANNEL pseudo-opcode.Francisco Jerez2015-05-041-0/+8
| | | | | | | | | | | | | This instruction calculates the index of an arbitrary channel enabled in the current execution mask. It's expected to be used as input for the BROADCAST opcode, but it's implemented as a separate instruction rather than being baked into BROADCAST because FIND_LIVE_CHANNEL has no dependencies so it can always be CSE'ed with other instances of the same instruction within a basic block. v2: Whitespace fixes. Reviewed-by: Matt Turner <[email protected]>
* i965: Introduce the BROADCAST pseudo-opcode.Francisco Jerez2015-05-041-0/+6
| | | | | | | | | | | | | | | | | | | The BROADCAST instruction picks the channel from its first source given by an index passed in as second source. This will be used in situations where all channels from the same SIMD thread have to agree on the value of something, e.g. a surface binding table index. This is in particular the case for UBO, sampler and image arrays, which can be indexed dynamically with the restriction that all active SIMD channels access the same index, provided to the shared unit as part of a single scalar field of the message descriptor. Simply taking the index value from the first channel as we were doing until now is incorrect, because it might contain an uninitialized value if the channel had previously been disabled by non-uniform control flow. v2: Minor style fixes. Improve commit message. Reviewed-by: Matt Turner <[email protected]>
* i965: Add memory fence opcode.Francisco Jerez2015-05-041-0/+2
| | | | | Acked-by: Kenneth Graunke <[email protected]> Reviewed-by: Topi Pohjolainen <[email protected]>
* i965: Add typed surface access opcodes.Francisco Jerez2015-05-041-0/+4
| | | | | Acked-by: Kenneth Graunke <[email protected]> Reviewed-by: Topi Pohjolainen <[email protected]>
* i965: Add untyped surface write opcode.Francisco Jerez2015-05-041-0/+1
| | | | | Reviewed-by: Topi Pohjolainen <[email protected]> Acked-by: Kenneth Graunke <[email protected]>
* i965/cs: Emit MEDIA_STATE_FLUSH after WALKERJordan Justen2015-05-021-0/+1
| | | | | Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965/cs: Implement brw_emit_gpgpu_walkerJordan Justen2015-05-021-0/+13
| | | | | | | | | | | | | Tested on Ivybridge, Haswell and Broadwell. v2: * Use SET_FIELD. (Ken) * Use simd_size / 16 to support SIMD8/16/32. Ken suggested that we might be able to do it arithmetically rather than just supporting SIMD8 and SIMD16 with a conditional. Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965/cs: Upload brw_cs_stateJordan Justen2015-05-021-0/+20
| | | | | | | | v3: * Add defines. Misc cleanup suggestions. (Ken) Signed-off-by: Jordan Justen <[email protected]> Acked-by: Kenneth Graunke <[email protected]>
* i965/cs: Add CS_OPCODE_CS_TERMINATEJordan Justen2015-05-021-0/+5
| | | | | Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965/ps: Use SET_FIELD() for sampler countTopi Pohjolainen2015-04-301-0/+1
| | | | | | | | The value is actually clamped to 0-16 as sample state pointer can be used to support more than 16 samplers. Reviewed-by: Kenneth Graunke <[email protected]> Signed-off-by: Topi Pohjolainen <[email protected]>
* i965/fs: Combine pixel center calculation into one inst.Matt Turner2015-04-211-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | The X and Y values come interleaved in g1 (.4-.11 inclusive), so we can calculate them together with a single add(32) instruction on some platforms like Broadwell and newer or in SIMD8 elsewhere. Note that I also moved the PIXEL_X/PIXEL_Y virtual opcodes from before LINTERP to after it. That's because the writes_accumulator_implicitly() function in backend_instruction tests for <= LINTERP for determining whether the instruction indeed writes the accumulator implicitly. The old FS_OPCODE_PIXEL_X/Y emitted ADD instructions, which did, but the new opcodes just emit MOVs, which don't. It doesn't matter, since we don't use these opcodes on Gen4/5 anymore, but in the case that we do... On Broadwell: total instructions in shared programs: 7192355 -> 7186224 (-0.09%) instructions in affected programs: 1190700 -> 1184569 (-0.51%) helped: 6131 On Haswell: total instructions in shared programs: 6155979 -> 6152800 (-0.05%) instructions in affected programs: 652362 -> 649183 (-0.49%) helped: 3179 Reviewed-by: Jason Ekstrand <[email protected]>
* i965/fs: Emit ADDs for gl_FragCoord, not virtual opcodes.Matt Turner2015-04-211-2/+0
| | | | | | | | | | These were used only on Gen4 and 5. emit_interpolation_setup_gen6() emits ADDs directly. The virtual opcodes weren't providing anything useful. I'm going to repurpose these opcodes, so deleting and readding them makes it simpler to see what's going on. Reviewed-by: Jason Ekstrand <[email protected]>
* i965/skl: Add the header for constant loads outside of the generatorNeil Roberts2015-04-161-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 5a06ee738 added a step to the generator to set up the message header when generating the VS_OPCODE_PULL_CONSTANT_LOAD_GEN7 instruction. That pseudo opcode is implemented in terms of multiple actual opcodes, one of which writes to one of the source registers in order to set up the message header. This causes problems because the scheduler isn't aware that the source register is written to and it can end up reorganising the instructions incorrectly such that the write to the source register overwrites a needed value from a previous instruction. This problem was presenting itself as a rendering error in the weapon in Enemy Territory: Quake Wars. Since commit 588859e1 there is an additional problem that the double register allocated to include the message header would end up being split into two. This wasn't happening previously because the code to split registers was explicitly avoided for instructions that are sending from the GRF. This patch fixes both problems by splitting the code to set up the message header into a new pseudo opcode so that it will be done outside of the generator. This new opcode has the header register as a destination so the scheduler can recognise that the register is written to. This has the additional benefit that the scheduler can optimise the message header slightly better by moving the mov instructions further away from the send instructions. On Skylake it appears to fix the following three Piglit tests without causing any regressions: gs-float-array-variable-index gs-mat3x4-row-major gs-mat4x3-row-major I think we actually may need to do something similar for the fs backend and possibly for message headers from regular texture sampling but I'm not entirely sure. v2: Make sure the exec-size is retained as 8 for the mov instruction to initialise the header from g0. This was accidentally lost during a rebase on top of 07c571a39fa1. Split the patch into two so that the helper function is a separate change. Fix emitting the MOV instruction on Gen7. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89058 Reviewed-by: Ben Widawsky <[email protected]>
* i965: Add missing defines for render cache messages.Francisco Jerez2015-03-021-1/+7
| | | | | | And remove duplicated definition of OWORD_DUAL_BLOCK_WRITE. Reviewed-by: Paul Berry <[email protected]>