path: root/src/mesa/drivers/dri/i965/brw_defines.h
Commit message | Author | Age | Files | Lines
* i965: Add #defines for Memory Object Control State fields on Gen7-7.5. | Kenneth Graunke | 2013-07-18 | 1 | -0/+26
    The L3 controls are identical on all platforms, but LLC differs:
    - Ivybridge has a "cache in LLC" flag
    - Baytrail has no LLC, but instead has a snoop bit: "data accesses in
      this page must be snooped in the CPU caches."
    - Haswell has writeback/uncached flags for LLC and eLLC (eDRAM).

    Signed-off-by: Kenneth Graunke <[email protected]>
    Reviewed-by: Chad Versace <[email protected]>
* i965: Add MOCS shift and mask for SURFACE_STATE entries. | Kenneth Graunke | 2013-07-18 | 1 | -0/+3
    Signed-off-by: Kenneth Graunke <[email protected]>
    Reviewed-by: Chad Versace <[email protected]>
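A shift/mask pair like the one this commit adds is normally used to pack a field into a state dword. The following is a minimal sketch of that pattern; the `EXAMPLE_MOCS_*` names and the bit position (19:16) are hypothetical and chosen only for illustration, not taken from brw_defines.h.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical field position, for illustration only. */
#define EXAMPLE_MOCS_SHIFT 16
#define EXAMPLE_MOCS_MASK  (0xfu << EXAMPLE_MOCS_SHIFT)

/* Return dword with its (hypothetical) MOCS field replaced by mocs. */
static uint32_t example_set_mocs(uint32_t dword, uint32_t mocs)
{
   assert(mocs <= 0xfu);
   return (dword & ~EXAMPLE_MOCS_MASK) |
          ((mocs << EXAMPLE_MOCS_SHIFT) & EXAMPLE_MOCS_MASK);
}

/* Extract the (hypothetical) MOCS field from dword. */
static uint32_t example_get_mocs(uint32_t dword)
{
   return (dword & EXAMPLE_MOCS_MASK) >> EXAMPLE_MOCS_SHIFT;
}
```

Masking after the shift keeps an out-of-range value from spilling into neighbouring fields even when assertions are compiled out.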
* i965: Cite the Ivybridge PRM for SFID enum values. | Kenneth Graunke | 2013-07-15 | 1 | -2/+1
    The Ivybridge PRM adds new SFIDs and lists them in a different volume
    than Sandybridge, so it's worth adding a reference.

    I also removed the BSpec reference, as the section it referred to was
    moved somewhere, and I couldn't find it.

    This leaves one Haswell SFID without a citation, but we can add one
    once the PRMs are out.

    Signed-off-by: Kenneth Graunke <[email protected]>
* i965/blorp: Write blorp code to do render target resolves. | Paul Berry | 2013-06-12 | 1 | -0/+1
    This patch implements the "render target resolve" blorp operation.
    This will be needed when a buffer that has experienced a fast color
    clear is later used for a purpose other than as a render target
    (texturing, glReadPixels, or swapped to the screen). It resolves any
    remaining deferred clear operation that was not taken care of during
    normal rendering.

    Fortunately not much work is necessary; all we need to do is scale
    down the size of the rectangle primitive being emitted, run the
    fragment shader with the "Render Target Resolve Enable" bit set, and
    ensure that the fragment shader writes to the render target using the
    "replicated color" message. We already have a fragment shader that
    does that (the shader that we use for fast color clears), so for
    simplicity we re-use it.

    Reviewed-by: Eric Anholt <[email protected]>
* i965/gen7+: Implement fast color clear operation in BLORP. | Paul Berry | 2013-06-12 | 1 | -0/+2
    Since we defer allocation of the MCS miptree until the time of the
    fast clear operation, this patch also implements creation of the MCS
    miptree.

    In addition, this patch adds the field
    intel_mipmap_tree::fast_clear_color_value, which holds the most recent
    fast color clear value, if any. We use it to set the SURFACE_STATE's
    clear color for render targets.

    v2: Flag BRW_NEW_SURFACES when allocating the MCS miptree. Generate a
    perf_debug message if clearing to a color that isn't compatible with
    fast color clear. Fix "control reaches end of non-void function"
    build warning.

    Reviewed-by: Eric Anholt <[email protected]>
* i965 gen7: use SURFACE_STATE fields to select render level/layer | Jordan Justen | 2013-06-02 | 1 | -0/+2
    Rather than pointing the surface_state directly at a single sub-image
    of the texture for rendering, we now point the surface_state at the
    top level of the texture, and configure the surface_state as needed
    based on this.

    Signed-off-by: Jordan Justen <[email protected]>
    Reviewed-by: Eric Anholt <[email protected]>
* i965: Add surface format defines from the public specs. | Eric Anholt | 2013-05-08 | 1 | -0/+45
    Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Add support for emitting and disassembling bit instructions. | Matt Turner | 2013-05-06 | 1 | -0/+7
    Specifically
    bfe - for bitfieldExtract()
    bfi1 and bfi2 - for bitfieldInsert()
    bfrev - for bitfieldReverse()
    cbit - for bitCount()
    fbh - for findMSB()
    fbl - for findLSB()

    Reviewed-by: Chris Forbes <[email protected]>
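For reference, the GLSL built-ins named above can be written as plain C for the unsigned 32-bit case. This is only a sketch of what the built-ins compute, not of how the hardware instructions or the driver implement them; all `ref_*` function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* bitfieldExtract(): bits [offset, offset + bits) of value. */
static uint32_t ref_bitfield_extract(uint32_t value, int offset, int bits)
{
   assert(offset >= 0 && bits >= 0 && offset + bits <= 32);
   if (bits == 0)
      return 0;
   uint32_t mask = bits == 32 ? 0xffffffffu : ((1u << bits) - 1u);
   return (value >> offset) & mask;
}

/* bitfieldInsert(): replace bits [offset, offset + bits) of base. */
static uint32_t ref_bitfield_insert(uint32_t base, uint32_t insert,
                                    int offset, int bits)
{
   assert(offset >= 0 && bits >= 0 && offset + bits <= 32);
   if (bits == 0)
      return base;
   uint32_t mask = (bits == 32 ? 0xffffffffu : ((1u << bits) - 1u)) << offset;
   return (base & ~mask) | ((insert << offset) & mask);
}

/* bitfieldReverse(): bit 0 becomes bit 31, bit 1 becomes bit 30, ... */
static uint32_t ref_bitfield_reverse(uint32_t v)
{
   uint32_t r = 0;
   for (int i = 0; i < 32; i++) {
      r = (r << 1) | (v & 1u);
      v >>= 1;
   }
   return r;
}

/* bitCount(): number of set bits. */
static int ref_bit_count(uint32_t v)
{
   int n = 0;
   for (; v != 0; v >>= 1)
      n += (int)(v & 1u);
   return n;
}

/* findLSB(): index of the lowest set bit, or -1 if v is 0. */
static int ref_find_lsb(uint32_t v)
{
   if (v == 0)
      return -1;
   int i = 0;
   while ((v & 1u) == 0) {
      v >>= 1;
      i++;
   }
   return i;
}

/* findMSB() for unsigned input: index of the highest set bit, or -1. */
static int ref_find_msb(uint32_t v)
{
   int i = -1;
   for (; v != 0; v >>= 1)
      i++;
   return i;
}
```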
* i965: Add 3-src destination and shared-source type macros. | Matt Turner | 2013-05-06 | 1 | -0/+11
    Reviewed-by: Chris Forbes <[email protected]>
* i965: Remove strange comments about math functions. | Matt Turner | 2013-04-24 | 1 | -3/+3
    Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Remove traces of nonexistent TAN math function. | Matt Turner | 2013-04-24 | 1 | -1/+0
    Never existed? At least never supported. Doesn't appear in 965, G45,
    or ILK documentation.

    Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Trim trailing whitespace in brw_defines.h. | Eric Anholt | 2013-04-17 | 1 | -144/+144
    It was all over the formats section I wanted to edit.

    Reviewed-by: Kenneth Graunke <[email protected]>
* i965/vs: Use GRFs for pull constant offsets on gen7. | Eric Anholt | 2013-04-10 | 1 | -0/+1
    This allows the computation of the offset to get written directly into
    the message source.

    shader-db results:
    total instructions in shared programs: 3308390 -> 3283025 (-0.77%)
    instructions in affected programs:     442998 -> 417633 (-5.73%)

    No difference in GLB2.7 low res (n=9).

    Reviewed-by: Matt Turner <[email protected]>
* i965: Fix INTEL_DEBUG=shader_time for fragment shaders with discards. | Kenneth Graunke | 2013-03-29 | 1 | -0/+1
    "discard" instructions generate HALT instructions which jump to a
    final HALT near the end of the shader. Previously, fs_generator
    created this final jump target when it saw the first
    FS_OPCODE_FB_WRITE, causing it to jump right before the FB write
    epilogue. This is normally good.

    However, INTEL_DEBUG=shader_time also has an epilogue section which
    records the final timestamp. The frontend emits IR for this just
    before FS_OPCODE_FB_WRITE.

    Unfortunately, this led to the following ordering:
    1. Shader Time Epilogue
    2. Final HALT (where discards jump)
    3. Framebuffer Write Epilogue

    This meant that discarded pixels completely skipped the shader time
    epilogue, causing no ending timestamp to be written. This obviously
    led to inaccurate results.

    This patch adds a new FS_OPCODE_PLACEHOLDER_HALT in the IR stream just
    before any epilogue sections. This is where the final HALT should be
    generated, and makes it easy to ensure the correct ordering:
    1. Final HALT
    2. Shader Time Epilogue
    3. Framebuffer Write Epilogue

    For shaders that don't discard, this opcode compiles away to nothing.

    The scheduler adds barrier dependencies to make sure that it doesn't
    get moved above any FS_OPCODE_DISCARD_JUMP instructions.

    One 8-wide shader in GLBenchmark 2.7 dropped from 2291.67 Gcycles to a
    mere 5.13 Gcycles.

    Signed-off-by: Kenneth Graunke <[email protected]>
    Reviewed-by: Eric Anholt <[email protected]>
* i965/fs: Generate LOD sampler message from ir_lod. | Matt Turner | 2013-03-29 | 1 | -0/+2
    v2: Support Ironlake as well.

    Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Make INTEL_DEBUG=shader_time use the RAW surface format. | Kenneth Graunke | 2013-03-14 | 1 | -0/+1
    Untyped Atomic Operation messages are illegal for non-RAW formats.

    The IVB hardware proceeds happily (after all, who cares what the
    format of the surface is if you're doing untyped ops on it?), but
    later hardware apparently doesn't. The simulator for gen7 does
    complain, though.

    v2: Rebase against updates to previous patches. (by anholt)

    NOTE: This is a candidate for the 9.1 branch.

    Reviewed-by: Eric Anholt <[email protected]>
    Reviewed-by: Paul Berry <[email protected]>
    Signed-off-by: Kenneth Graunke <[email protected]>
* i965: Fix INTEL_DEBUG=shader_time for Haswell. | Kenneth Graunke | 2013-03-14 | 1 | -0/+1
    Haswell's "Data Cache" data port is a single unit, but split into two
    SFIDs to allow for more message types without adding more bits in the
    message descriptor.

    Untyped Atomic Operations are now message 0010 in the second data
    cache data port, rather than 6 in the first.

    v2: Use the #defines from the previous commit. (by anholt)

    NOTE: This is a candidate for the 9.1 branch.

    Signed-off-by: Kenneth Graunke <[email protected]>
    Reviewed-by: Eric Anholt <[email protected]> (v1)
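The platform split described above amounts to choosing a different (SFID, message type) pair for the same operation. A rough sketch follows; the enum and struct names are hypothetical, the numeric SFID encodings are deliberately not modelled, and only the two message type values (6 and binary 0010) come from the commit message.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical identifiers; these are not the hardware SFID encodings. */
enum example_sfid {
   EXAMPLE_SFID_DATA_CACHE,    /* first data cache data port */
   EXAMPLE_SFID_DATA_CACHE_1   /* second data cache data port (Haswell) */
};

struct example_msg {
   enum example_sfid sfid;
   unsigned msg_type;
};

/* Pick the target and message type for an Untyped Atomic Operation. */
static struct example_msg example_untyped_atomic_msg(bool is_haswell)
{
   struct example_msg m;

   if (is_haswell) {
      m.sfid = EXAMPLE_SFID_DATA_CACHE_1;
      m.msg_type = 0x2;   /* binary 0010, per the commit message */
   } else {
      m.sfid = EXAMPLE_SFID_DATA_CACHE;
      m.msg_type = 6;     /* per the commit message */
   }
   return m;
}
```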
* i965: Add definitions for gen7+ data cache messages. | Eric Anholt | 2013-03-14 | 1 | -0/+37
    We were sparsely using some of these message types, but I'll just fill
    them all in now. It will be used for fixing shader_time on HSW.

    v2: Add missing MEDIA_BLOCK_READ.

    NOTE: This is a candidate for the 9.1 branch.

    Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Switch to using sampler LD messages for uniform pull constants. | Eric Anholt | 2013-03-11 | 1 | -1/+1
    When forcing the compiler to always generate pull constants instead of
    push constants (in order to have an easy to use testcase), improves
    performance of my old GLSL demo 23.3553% +/- 1.42968% (n=7).

    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=60866
    Reviewed-by: Kenneth Graunke <[email protected]>
* i965: add a new virtual opcode: SHADER_OPCODE_TXF_MS | Chris Forbes | 2013-03-02 | 1 | -0/+1
    This is very similar to the TXF opcode, but lowers to `ld2dms` rather
    than `ld` on Gen7.

    V4: - add SHADER_OPCODE_TXF_MS to is_tex() functions, so regalloc
          thinks it actually writes the correct number of registers.
          Otherwise in nontrivial shaders some of the registers tend to
          get clobbered, producing bad results.

    Signed-off-by: Chris Forbes <[email protected]>
    Reviewed-by: Paul Berry <[email protected]>
* i965: Add support for emitting the LRP instruction. | Kenneth Graunke | 2013-02-28 | 1 | -0/+1
    Like MAD, this is another three-source instruction.

    Reviewed-by: Matt Turner <[email protected]>
    Reviewed-by: Eric Anholt <[email protected]>
    Signed-off-by: Kenneth Graunke <[email protected]>
* i965/fs/gen7: Emit code for GLSL 3.00 pack/unpack operations (v4) | Chad Versace | 2013-01-24 | 1 | -0/+3
    v2: Remove lewd comment. [for idr]
    v3: - Optimize away tmp register for packHalf2x16. [for anholt, paul]
        - Improve comments. [for anholt, paul]
        - Reduce near-duplicate code by removing vec4_visitor
          emit_pack/unpack methods. [for chadv]
    v4: Factor out UD/W register conversion into helper function.
        [for anholt]

    Reviewed-by: Eric Anholt <[email protected]>
    Reviewed-by: Ian Romanick <[email protected]> (v2)
    Signed-off-by: Chad Versace <[email protected]>
* i965: Add opcodes for F32TO16 and F16TO32 | Chad Versace | 2013-01-24 | 1 | -0/+2
    The GLSL ES 3.00 operations packHalf2x16 and unpackHalf2x16 will emit
    these opcodes.

    - Define the opcodes BRW_OPCODE_{F32TO16,F16TO32}.
    - Add the opcodes to the brw_disasm table.
    - Define convenience functions brw_{F32TO16,F16TO32}.

    Reviewed-by: Ian Romanick <[email protected]>
    Acked-by: Paul Berry <[email protected]>
    Signed-off-by: Chad Versace <[email protected]>
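As a software reference for the F16TO32 direction, the sketch below widens an IEEE 754 half-precision value to a float and uses it for unpackHalf2x16-style unpacking (x from the low 16 bits, y from the high 16 bits). It illustrates the arithmetic only and assumes a 32-bit IEEE 754 `float`; the function names are hypothetical and nothing here reflects how the driver emits the opcodes.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Widen an IEEE 754 binary16 value to binary32 (exact conversion). */
static float example_half_to_float(uint16_t h)
{
   uint32_t sign = (uint32_t)(h >> 15) << 31;
   uint32_t exp = (h >> 10) & 0x1fu;
   uint32_t man = h & 0x3ffu;
   uint32_t bits;
   float f;

   if (exp == 0) {
      if (man == 0) {
         bits = sign;                       /* signed zero */
      } else {
         /* Subnormal half: shift the mantissa up until it is normalized. */
         int e = -1;
         do {
            e++;
            man <<= 1;
         } while ((man & 0x400u) == 0);
         bits = sign | ((uint32_t)(127 - 15 - e) << 23) |
                ((man & 0x3ffu) << 13);
      }
   } else if (exp == 31) {
      bits = sign | 0x7f800000u | (man << 13);   /* infinity or NaN */
   } else {
      bits = sign | ((exp + (127 - 15)) << 23) | (man << 13);
   }

   assert(sizeof f == sizeof bits);
   memcpy(&f, &bits, sizeof f);
   return f;
}

/* unpackHalf2x16-style unpacking: low half to *x, high half to *y. */
static void example_unpack_half_2x16(uint32_t v, float *x, float *y)
{
   *x = example_half_to_float((uint16_t)(v & 0xffffu));
   *y = example_half_to_float((uint16_t)(v >> 16));
}
```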
* i965: Add #defines for GL_FIXED vertex formats. | Kenneth Graunke | 2013-01-07 | 1 | -0/+4
    Reviewed-by: Eric Anholt <[email protected]>
* i965: Add remaining #defines for packed vertex formats. | Kenneth Graunke | 2013-01-07 | 1 | -0/+9
    Reviewed-by: Eric Anholt <[email protected]>
* i965: Use Haswell's sample_d_c for textureGrad with shadow samplers. | Kenneth Graunke | 2013-01-07 | 1 | -0/+1
    The new hardware actually just supports this now.

    Reviewed-by: Eric Anholt <[email protected]>
* i965: Replace structs with bit-shifting for Gen7 SURFACE_STATE entries. | Kenneth Graunke | 2013-01-03 | 1 | -7/+36
    Every generation except Gen7 creates SURFACE_STATE entries via a
    uint32_t array. Only Gen7 uses the older bitfield structure, which we
    moved away from because it was less efficient. Convert it for
    consistency.

    This reduces the compiled size of gen7_wm_surface_state.o by 2.86% in
    a release build.

    v2: Fix accidental use of BRW_SURFACE_WIDTH/HEIGHT in
    brw_state_dump.c; switch back to gen7_set_surface_mcs_info setting
    surf[6] directly (both per Eric's review comments).

    Acked-by: Ian Romanick <[email protected]>
    Reviewed-by: Eric Anholt <[email protected]>
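The two styles contrasted in this commit can be sketched side by side. The struct below stands in for the older bitfield approach, and the function for the `uint32_t` array approach; every field name, width, and bit position is hypothetical and does not match the real Gen7 SURFACE_STATE layout.

```c
#include <assert.h>
#include <stdint.h>

/* Older style (sketch): a bitfield struct. The in-memory layout depends
 * on the compiler and ABI, which is one reason to avoid it.
 */
struct example_surface_dw0 {
   unsigned int tiled:1;
   unsigned int pad0:17;
   unsigned int format:9;
   unsigned int pad1:2;
   unsigned int type:3;
};

/* Newer style (sketch): explicit shifts into a dword array.
 * All positions below are hypothetical.
 */
#define EXAMPLE_SURFACE_TILED         (1u << 0)
#define EXAMPLE_SURFACE_FORMAT_SHIFT  18
#define EXAMPLE_SURFACE_TYPE_SHIFT    29

static void example_fill_surface(uint32_t surf[8], unsigned type,
                                 unsigned format, int tiled)
{
   assert(type < 8 && format < 512);

   for (int i = 0; i < 8; i++)
      surf[i] = 0;

   surf[0] = (type << EXAMPLE_SURFACE_TYPE_SHIFT) |
             (format << EXAMPLE_SURFACE_FORMAT_SHIFT) |
             (tiled ? EXAMPLE_SURFACE_TILED : 0u);
}
```

With the array form, each dword is written once with a fully computed value, so the compiler does not have to emit read-modify-write sequences per field.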
* i965/fs: Set up gen7 UBO loads as sends from GRFs. | Eric Anholt | 2012-12-14 | 1 | -0/+2
    This gives the instruction scheduler a chance to schedule between the
    loads, whereas before it was restricted due to the dependencies
    between the MRFs for setting them up.

    For one shader in gles3conform, it goes from getting stuck in register
    allocation for as long as anybody's bothered to leave it running down
    to 23 seconds, thanks to the LIFO scheduling.

    Acked-by: Kenneth Graunke <[email protected]>
* i965/fs: Improve performance of shaders that start out with a discard. | Eric Anholt | 2012-12-11 | 1 | -0/+1
    I had tried this in the past, but ran into trouble with applications
    that sample from undiscarded pixels in the same subspan. To fix that
    issue, only jump to the end for an entire subspan at a time.

    Improves GLbenchmark 2.7 (1024x768) performance by 7.9 +/- 1.5% (n=8).

    v2: Drop the br variable in the jump instruction -- if I ever do jumps
    pre-gen6, it'll be a different code block anyway since we don't have
    HALT until gen6.

    Reviewed-by: Kenneth Graunke <[email protected]>
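The "entire subspan at a time" rule can be modelled on the CPU as a mask computation: a group of four pixels may jump to the end only when none of its pixels is still live. The sketch below assumes a 16-bit live mask in which each run of four consecutive bits is one subspan; that mapping and the function name are assumptions for illustration, not a description of the generated shader code.

```c
#include <assert.h>
#include <stdint.h>

/* Given a mask of still-live pixels, return the mask of channels that may
 * jump to the end of the shader. A subspan is assumed to occupy four
 * consecutive bits.
 */
static uint16_t example_halt_mask(uint16_t live_mask)
{
   uint16_t halt = 0;

   for (int s = 0; s < 4; s++) {
      uint16_t subspan_bits = (uint16_t)(0xfu << (4 * s));

      /* Only halt when every pixel of the subspan has been discarded, so
       * live pixels can still sample using their neighbours.
       */
      if ((live_mask & subspan_bits) == 0)
         halt |= subspan_bits;
   }
   return halt;
}
```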
* i965/fs: Rewrite discards to use a flag subreg to track discarded pixels. | Eric Anholt | 2012-12-11 | 1 | -1/+0
    This makes much more sense on gen6+, and will also prove useful for
    early exit of shaders on discard.

    v2: fix up a stale comment from before converting gen4-5.

    Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Add a debug flag for counting cycles spent in each compiled shader. | Eric Anholt | 2012-12-05 | 1 | -2/+21
    This can be used for two purposes: Using hand-coded shaders to
    determine per-instruction timings, or figuring out which shader to
    optimize in a whole application.

    Note that this doesn't cover the instructions that set up the message
    to the URB/FB write -- we'd need to convert the MRF usage in these
    instructions to GRFs so that our offsets/times don't overwrite our
    shader outputs.

    Reviewed-by: Kenneth Graunke <[email protected]> (v1)

    v2: Check the timestamp reset flag in the VS, which is apparently
        getting set fairly regularly in the range we watch, resulting in
        negative numbers getting added to our 32-bit counter, and thus
        large values added to our uint64_t.
    v3: Rebase on reladdr changes, removing a new safety check that proved
        impossible to satisfy. Add a comment to the AOP defs from Ken's
        review, and put them in a slightly more sensible spot.
    v4: Check timestamp reset in the FS as well.
* i965/fs: Add instruction emit for varying-index reads of uniforms. | Eric Anholt | 2012-12-04 | 1 | -0/+3
    The gen7 send-from-GRF path is sufficiently different from the
    perspective of IR generation and optimization that I just made it a
    separate opcode.

    v2: fix whitespace, rebase on Ken's recent refactor.
* i965/fs: Rename the existing pull constant load opcode. | Eric Anholt | 2012-12-04 | 1 | -1/+1
    We're going to use another send message for handling loads with a
    varying per-fragment array index.
* i965: Fix primitive restart on Haswell. | Kenneth Graunke | 2012-09-06 | 1 | -0/+3
    Haswell moved the "Cut Index Enable" bit from the INDEX_BUFFER packet
    to a new 3DSTATE_VF packet, so we need to emit that. Also, it requires
    us to specify the cut index rather than assuming it's 0xffffffff.

    This adds a new Haswell-specific tracked state atom to gen7_atoms.
    Normally, we would create a new generation-specific atom list, but
    since there's only one difference over Ivybridge so far, I chose to
    simply make it return without doing any work on non-Haswell systems.

    Fixes five piglit tests:
    - general/primitive-restart-DISABLE_VBO
    - general/primitive-restart-VBO_COMBINED_VERTEX_AND_INDEX
    - general/primitive-restart-VBO_INDEX_ONLY
    - general/primitive-restart-VBO_SEPARATE_VERTEX_AND_INDEX
    - general/primitive-restart-VBO_VERTEX_ONLY

    Signed-off-by: Kenneth Graunke <[email protected]>
    Reviewed-by: Jordan Justen <[email protected]>
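To see why the cut index has to be specified explicitly, consider how it partitions an index stream: each occurrence ends the current primitive run and starts a new one. The sketch below counts the non-empty runs for a given cut index; it is a CPU-side illustration with a hypothetical function name and is unrelated to how the 3DSTATE_VF packet is built.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Count the non-empty runs of indices separated by cut_index. */
static size_t example_count_primitive_runs(const uint32_t *indices,
                                           size_t count, uint32_t cut_index)
{
   size_t runs = 0;
   size_t run_length = 0;

   assert(indices != NULL || count == 0);

   for (size_t i = 0; i < count; i++) {
      if (indices[i] == cut_index) {
         /* The cut index closes the current run, if there is one. */
         if (run_length > 0)
            runs++;
         run_length = 0;
      } else {
         run_length++;
      }
   }

   if (run_length > 0)
      runs++;
   return runs;
}
```

The same buffer yields a different number of runs depending on the cut index, so hardware that assumes 0xffffffff would mis-handle an application that chose 0xffff.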
* i965/gen6+: Add support for edge flags. | Eric Anholt | 2012-08-09 | 1 | -0/+1
    Fixes the 3 new piglit edgeflag tests.

    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=40707
    Reviewed-by: Kenneth Graunke <[email protected]>
* i965/msaa: Add CMS-related sampler messages to brw_defines.h. | Paul Berry | 2012-07-11 | 1 | -0/+2
    Reviewed-by: Chad Versace <[email protected]>
* i965/fs: Add FS_OPCODE_MOV_DISPATCH_TO_FLAGS to fragment shader backend. | Paul Berry | 2012-07-02 | 1 | -0/+1
    In order to compute centroid varyings correctly, the fragment shader
    needs to be able to load the current pixel/sample mask into a flag
    register. This patch adds an opcode to the fragment shader back-end to
    do this; the opcode gets translated into the instruction

        mov(1) f0<1>UW g1.14<0,1,0>UW { align1 WE_all }

    Since this instruction clobbers f0, instruction scheduling has to
    treat it the same as instructions that have a conditional modifier.

    Reviewed-by: Eric Anholt <[email protected]>
* i965/msaa: Adapt clip setup for centroid noperspective interpolation. | Paul Berry | 2012-06-25 | 1 | -0/+4
    To save time, we only instruct the clip stage of the pipeline to
    compute noperspective barycentric coordinates if those coordinates are
    needed by the fragment shader.

    Previously, we would determine whether the coordinates were needed by
    seeing whether the fragment shader used the
    BRW_WM_NONPERSPECTIVE_PIXEL_BARYCENTRIC interpolation mode. However,
    with MSAA, it's possible that the fragment shader might use
    BRW_WM_NONPERSPECTIVE_CENTROID_BARYCENTRIC instead. In the future,
    when we support ARB_sample_shading, it might use
    BRW_WM_NONPERSPECTIVE_SAMPLE_BARYCENTRIC.

    This patch modifies the upload_clip_state() functions to check for all
    three possible noperspective interpolation modes.

    Reviewed-by: Eric Anholt <[email protected]>
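Checking for "all three possible noperspective interpolation modes" is naturally expressed as a test against a combined bit mask. The sketch below assumes the fragment shader's used modes are tracked as one bit per mode; the enum, its ordering, and the function name are hypothetical stand-ins for the BRW_WM_* values named above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mode numbering; only the grouping matters here. */
enum example_barycentric_mode {
   EXAMPLE_PERSPECTIVE_PIXEL,
   EXAMPLE_PERSPECTIVE_CENTROID,
   EXAMPLE_PERSPECTIVE_SAMPLE,
   EXAMPLE_NONPERSPECTIVE_PIXEL,
   EXAMPLE_NONPERSPECTIVE_CENTROID,
   EXAMPLE_NONPERSPECTIVE_SAMPLE
};

#define EXAMPLE_NONPERSPECTIVE_BITS             \
   ((1u << EXAMPLE_NONPERSPECTIVE_PIXEL) |      \
    (1u << EXAMPLE_NONPERSPECTIVE_CENTROID) |   \
    (1u << EXAMPLE_NONPERSPECTIVE_SAMPLE))

/* True if the clip stage must compute noperspective barycentrics. */
static bool example_needs_noperspective(unsigned used_modes)
{
   return (used_modes & EXAMPLE_NONPERSPECTIVE_BITS) != 0;
}
```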
* i965/msaa: Add defines for Gen7. | Paul Berry | 2012-05-25 | 1 | -0/+5
    Reviewed-by: Chad Versace <[email protected]>
    Reviewed-by: Ian Romanick <[email protected]>
    Reviewed-by: Kenneth Graunke <[email protected]>
* i965/blorp: Implement proper texel fetch messages for Gen7. | Paul Berry | 2012-05-25 | 1 | -0/+1
    On Gen6, texel fetch is always accomplished using the SAMPLE_LD
    message, which accepts arguments (u, v, r, lod, si). On Gen7, there
    are two* texel fetch messages: SAMPLE_LD for non-MSAA surfaces, taking
    arguments (u, lod, v), and SAMPLE_LD2DSS for MSAA surfaces, taking
    arguments (si, u, v).

    *Technically, there are other texel fetch messages, but they are used
    for "compressed" MSAA surfaces, which we don't yet support.

    This patch adds the proper message types and argument orderings for
    Gen7.

    Reviewed-by: Ian Romanick <[email protected]>
    Reviewed-by: Kenneth Graunke <[email protected]>
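The argument orderings listed above can be captured in a small lookup. The sketch below returns the order for a given generation and surface kind; the orderings themselves come from the commit message, while the enum and function names are hypothetical and the Gen6 path ignores the MSAA flag because the message says SAMPLE_LD is always used there.

```c
#include <assert.h>
#include <stdbool.h>

enum example_arg {
   EXAMPLE_ARG_U,
   EXAMPLE_ARG_V,
   EXAMPLE_ARG_R,
   EXAMPLE_ARG_LOD,
   EXAMPLE_ARG_SI
};

/* Write the texel fetch argument order into out and return its length. */
static int example_texel_fetch_args(int gen, bool msaa,
                                    enum example_arg out[5])
{
   static const enum example_arg gen6_ld[] = {
      EXAMPLE_ARG_U, EXAMPLE_ARG_V, EXAMPLE_ARG_R,
      EXAMPLE_ARG_LOD, EXAMPLE_ARG_SI
   };
   static const enum example_arg gen7_ld[] = {
      EXAMPLE_ARG_U, EXAMPLE_ARG_LOD, EXAMPLE_ARG_V
   };
   static const enum example_arg gen7_ld2dss[] = {
      EXAMPLE_ARG_SI, EXAMPLE_ARG_U, EXAMPLE_ARG_V
   };
   const enum example_arg *src;
   int count;

   assert(gen == 6 || gen == 7);

   if (gen == 6) {
      src = gen6_ld;        /* SAMPLE_LD: (u, v, r, lod, si) */
      count = 5;
   } else if (msaa) {
      src = gen7_ld2dss;    /* SAMPLE_LD2DSS: (si, u, v) */
      count = 3;
   } else {
      src = gen7_ld;        /* SAMPLE_LD: (u, lod, v) */
      count = 3;
   }

   for (int i = 0; i < count; i++)
      out[i] = src[i];
   return count;
}
```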
* i965: add flag to enable cut_index | Jordan Justen | 2012-05-23 | 1 | -0/+2
    When brw->prim_restart.enable_cut_index is set, the cut index will be
    enabled when uploading index_buffer commands.

    Signed-off-by: Jordan Justen <[email protected]>
    Reviewed-by: Kenneth Graunke <[email protected]>
    Reviewed-by: Ian Romanick <[email protected]>
* i965/gen6+: Add support for fast depth clears. | Eric Anholt | 2012-05-23 | 1 | -1/+3
    Improves citybench high-res performance 3.0% +- 0.4%, n=10. Improves
    Lightsmark 1024x768 performance 0.74% +/- 0.20% (n=78). No significant
    difference on openarena (n=5, didn't fast clear) or nexuiz (n=3).

    Reviewed-by: Ian Romanick <[email protected]>
    Reviewed-by: Chad Versace <[email protected]>
* i965/gen6: Initial implementation of MSAA. | Paul Berry | 2012-05-15 | 1 | -0/+7
    This patch enables MSAA for Gen6, by modifying intel_mipmap_tree to
    understand multisampled buffers, adapting the rendering pipeline setup
    to enable multisampled rendering, and adding multisample resolve
    operations to brw_blorp_blit.cpp. Some preparation work is also
    included for Gen7, but it is not yet enabled.

    MSAA support is still fairly preliminary. In particular, the following
    are not yet supported:
    - Fully general blits between MSAA and non-MSAA buffers.
    - Formats other than RGBA8, DEPTH24, and STENCIL8.
    - Centroid interpolation.
    - Coverage parameters (glSampleCoverage, GL_SAMPLE_ALPHA_TO_COVERAGE,
      GL_SAMPLE_ALPHA_TO_ONE, GL_SAMPLE_COVERAGE, GL_SAMPLE_COVERAGE_VALUE,
      GL_SAMPLE_COVERAGE_INVERT).

    Fixes piglit tests "EXT_framebuffer_multisample/accuracy" on
    i965/Gen6.

    v2:
    - In intel_alloc_renderbuffer_storage(), quantize the requested number
      of samples to the next higher sample count supported by the
      hardware. This ensures that a query of GL_SAMPLES will return the
      correct value. It also ensures that MSAA is fully disabled on Gen7
      for now (since Gen7 MSAA support doesn't work yet).
    - When reading from a non-MSAA surface, ensure that s_is_zero is true
      so that we won't try to read from a nonexistent sample.
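The quantization described in v2 can be sketched as a search over the sample counts a platform supports. The function below is hypothetical and takes the supported counts as a parameter; passing an empty list models a platform where MSAA is disabled, and returning 0 for a request above the largest supported count is an assumption (the caller is presumed to have rejected such requests already).

```c
#include <assert.h>
#include <stddef.h>

/* Round requested up to the next supported sample count.
 * supported must be sorted in ascending order. Returns 0 (no
 * multisampling) when requested is 0 or nothing large enough exists.
 */
static unsigned example_quantize_num_samples(unsigned requested,
                                             const unsigned *supported,
                                             size_t num_supported)
{
   assert(supported != NULL || num_supported == 0);

   if (requested == 0)
      return 0;

   for (size_t i = 0; i < num_supported; i++) {
      if (supported[i] >= requested)
         return supported[i];
   }
   return 0;
}
```

For example, with a supported list containing only 4, a request for 1, 2, or 3 samples is quantized to 4, so a later GL_SAMPLES query reports what the hardware actually uses.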
* i965: Set "Shader Channel Select" fields in Haswell's SURFACE_STATE. | Kenneth Graunke | 2012-03-30 | 1 | -0/+8
    These can be used to implement EXT_texture_swizzle without baking
    state-dependent swizzle instructions into the shader and forcing
    recompiles.

    For now, just set them to pass-through mode, so everything continues
    to work as it did on Ivybridge. We can optimize this later.

    Signed-off-by: Kenneth Graunke <[email protected]>
    Reviewed-by: Eric Anholt <[email protected]>
* i965: Fill in Sample Mask in Haswell's 3DSTATE_PS. | Kenneth Graunke | 2012-03-30 | 1 | -0/+2
    We only need one sample, since we don't support multisampling yet.

    Signed-off-by: Kenneth Graunke <[email protected]>
    Reviewed-by: Eric Anholt <[email protected]>
* i965: Set "Stencil Buffer Enable" bit on Haswell. | Kenneth Graunke | 2012-03-30 | 1 | -0/+1
    Signed-off-by: Kenneth Graunke <[email protected]>
    Reviewed-by: Eric Anholt <[email protected]>
* i965: Set Line Stipple enable bit in 3DSTATE_SF for Haswell. | Kenneth Graunke | 2012-03-30 | 1 | -0/+2
    Apparently this needs to be the same as in 3DSTATE_WM.

    Signed-off-by: Kenneth Graunke <[email protected]>
    Reviewed-by: Eric Anholt <[email protected]>
* i965: Update max VS/PS threads shift offsets for Haswell. | Kenneth Graunke | 2012-03-30 | 1 | -1/+3
    These now start at bit 23 instead of bit 24/25.

    Signed-off-by: Kenneth Graunke <[email protected]>
    Reviewed-by: Eric Anholt <[email protected]>
* i965: Add support for the MAD opcode on gen6+. | Eric Anholt | 2012-02-10 | 1 | -0/+1
    v2: Fix MRF handling on gen7.

    Reviewed-by: Kenneth Graunke <[email protected]> (v1)
* i965: fix inverted point sprite origin when rendering to FBO | Yuanhan Liu | 2012-01-28 | 1 | -0/+1
    When rendering to an FBO, rendering is inverted. We must therefore
    also invert the point sprite origin; otherwise the result is inverted
    relative to rendering to the default winsys FBO.

    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=44613

    NOTE: This is a candidate for stable release branches.

    v2: add the similar logic to ivb, too (comments from Ian)
        simplify the logic operation (comments from Brian)
    v3: pick a better comment from Eric
        use != for the logic instead of ^ (comments from Ian)

    Signed-off-by: Yuanhan Liu <[email protected]>
    Reviewed-by: Ian Romanick <[email protected]>
    Reviewed-by: Kenneth Graunke <[email protected]>
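The `!=` mentioned in v3 acts as a logical exclusive-or on two booleans: the origin requested through GL is flipped exactly when the render target is an FBO. A minimal sketch, assuming the hardware bit means "lower left" and using a hypothetical function name:

```c
#include <assert.h>
#include <stdbool.h>

/* Decide whether to program a lower-left point sprite origin.
 * gl_origin_is_lower_left: the origin the application asked for.
 * render_to_fbo: true when rendering is inverted relative to the winsys
 * framebuffer, in which case the origin is inverted as well.
 */
static bool example_use_lower_left_origin(bool gl_origin_is_lower_left,
                                          bool render_to_fbo)
{
   return gl_origin_is_lower_left != render_to_fbo;
}
```

Using `!=` on `bool` operands rather than `^` avoids surprises if either side is ever computed as a non-0/1 integer.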