summaryrefslogtreecommitdiffstats
path: root/src/mesa/drivers/dri
Commit message (Collapse)AuthorAgeFilesLines
* i965: Add Gen8+ VS dispatch_mode assertion.Kenneth Graunke2015-06-011-0/+3
| | | | | | | Suggested by Ben Widawsky. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Ben Widawsky <[email protected]>
* i965: Drop LOAD_PAYLOAD workaround in fs_visitor::emit_urb_writes().Kenneth Graunke2015-06-011-12/+4
| | | | | | | | | Now that Jason's LOAD_PAYLOAD improvements have landed, we don't need this. Passing 1 for the number of header registers already takes care of setting force_writemask_all on the header copy. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* i965: Use proper pitch for scalar GS pull constants and UBOs.Kenneth Graunke2015-06-011-3/+7
| | | | | | | | | See the corresponding code in brw_vs_surface_state.c. v2: const more things (requested by Topi Pohjolainen) Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Ben Widawsky <[email protected]>
* i965: Create a shader_dispatch_mode enum to replace VS/GS fields.Kenneth Graunke2015-06-018-22/+24
| | | | | | | | | | | | | We used to store the GS dispatch mode in brw_gs_prog_data while separately storing the VS dispatch mode in brw_vue_prog_data::simd8. This patch introduces an enum to represent all possible dispatch modes, and stores it in brw_vue_prog_data::dispatch_mode, unifying the two. Based on a suggestion by Matt Turner. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Ben Widawsky <[email protected]>
* i965: Drop "Vector Mask Enable" bit from 3DSTATE_GS on Gen8+.Kenneth Graunke2015-06-011-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The documentation makes it pretty clear that we shouldn't use this: "Under normal conditions SW shall specify DMask, as the GS stage will provide a Dispatch Mask appropriate to SIMD4x2 or SIMD8 thread execution (as a function of dispatch mode). E.g., for SIMD4x2 execution, the GS stage will generate a Dispatch Mask that is equal to what the EU would use as the Vector Mask. For SIMD8 execution there is no known usage model for use of Vector Mask (as there is for PS shaders)." I also managed to find descriptions of DMask and VMask, in the "State Register" (sr0.2/3) field descriptions: "Dispatch Mask (DMask). This 32-bit field specifies which channels are active at Dispatch time." "Vector Mask (VMask). This 32-bit field contains, for each 4-bit group, the OR of the corresponding 4-bit group in the dispatch mask." SIMD4x2 shaders process one or two vec4 values, with each 4-bit group corresponding to xyzw channel enables (either all on, or all off). Thus, DMask = VMask in SIMD4x2 mode. But in SIMD8 mode, 4-bit groups are meaningless, so it just messes up your values. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Ben Widawsky <[email protected]>
* i965: Don't add base_binding_table_index if it's zeroNeil Roberts2015-05-312-2/+4
| | | | | | | | | | | | | | | When calculating the binding table index for non-constant sampler array indexing it needs to add the base binding table index which is a constant within the generated code. Often this base is zero so we can avoid a redundant instruction in that case. It looks like nothing in shader-db is doing non-constant sampler array indexing so this patch doesn't make any difference but it might be worth having anyway. Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Chris Forbes <[email protected]> Acked-by: Ben Widawsky <[email protected]>
* i965: Don't use a temporary when generating an indirect sampleNeil Roberts2015-05-312-26/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Previously when generating the send instruction for a sample instruction with an indirect sampler it would use the destination register as a temporary store. This breaks when used in combination with the opt_sampler_eot optimisation because that forces the destination to be null. This patch fixes that by avoiding the temp register altogether. The reason the temporary register was needed was because it was trying to ensure the binding table index doesn't overflow a byte by and'ing it with 0xff. The result is then or'd with samper_index<<8. This patch instead just and's the whole thing by 0xfff. This will ensure that a bogus sampler index won't overflow into the rest of the message descriptor but unlike the previous code it won't ensure that the binding table index doesn't overflow into the sampler index. It doesn't seem like that should matter very much though because if the shader is generating a bogus sampler index then it's going to just get garbage out either way. Instead of doing sampler_index<<8|(sampler_index+base_table_index) the new code avoids one operation by doing sampler_index*0x101+base_table_index which should be equivalent. However if we wanted to avoid the multiply for some reason we could do this by adding an extra or instruction still without needing the temporary register. This fixes a number of Piglit tests on Skylake that were using indirect samplers such as: spec@arb_gpu_shader5@execution@sampler_array_indexing@fs-simple Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Chris Forbes <[email protected]> Acked-by: Ben Widawsky <[email protected]> Tested-by: Anuj Phogat <[email protected]>
* dri_util: make version var unsigned to silence warningsBrian Paul2015-05-291-1/+1
| | | | | | | _mesa_override_gl_version_contextless() takes an unsigned version parameter. Reviewed-by: Matt Turner <[email protected]>
* i965: Disable compaction for EOT send messagesBen Widawsky2015-05-291-0/+6
| | | | | | | | | | | | | | | | | | | | | | AFAICT, there is no real way to make sure a send message with EOT is properly ignored from compact, nor can I see a way to actually encode EOT while compacting. Before the single send optimization we'd always bail because we hit the is_immediate && !is_compactable_immediate case. However, with single send, is_immediate is not true, and so we end up trying to compact the un-compactible. Without this, any compacting single send instruction will hang because the EOT isn't there. I am not sure how I didn't hit this when I originally enabled the optimization. I didn't check if some surrounding code changed. I know Neil and Matt were both looking into this. I did a quick search and didn't see any patches out there to handle this. Please ignore if this has already been sent by someone. (Direct me to it and I will review it). Reported-by: Neil Roberts <[email protected]> Reported-by: Mark Janes <[email protected]> Tested-by: Mark Janes <[email protected]> Signed-off-by: Ben Widawsky <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965/vs: Rework the logic for generating NIR from ARB vertex programsJason Ekstrand2015-05-281-12/+11
| | | | | | | | Whether or not to use NIR is now equivalent to brw->scalar_vs. We can simplify the logic and make it far less confusing. Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Remove the ir_visitor codeJason Ekstrand2015-05-283-2228/+2
| | | | | | | Now that everything is running through NIR, this is all dead. Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Remove the old fragment program codeJason Ekstrand2015-05-283-769/+0
| | | | | | | Now that everything is running through NIR, this is all dead. Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Make NIR non-optional for scalar shadersJason Ekstrand2015-05-282-27/+5
| | | | | Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Make fs/vec4_visitor inherit from ir_visitor directlyJason Ekstrand2015-05-283-3/+3
| | | | | | | | This is using multiple inheritance in C++. However, ir_visitor is really just an interface with no data so it shouldn't be so bad. Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Rename backend_visitor to backend_shaderJason Ekstrand2015-05-2813-44/+44
| | | | | | | | The backend_shader class really is a representation of a shader. The fact that it inherits from ir_visitor is somewhat immaterial. Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* Revert "i915: Enable ARB_direct_state_access"Ian Romanick2015-05-281-1/+0
| | | | | | | This reverts commit 121030eed8fc41789d2f4f7517bbc0dd6199667b. Acked-by: Fredrik Höglund <[email protected]> Cc: "10.6" <[email protected]>
* Revert "i965: Enable ARB_direct_state_access"Ian Romanick2015-05-281-1/+0
| | | | | | | This reverts commit a57feba0a35de35728269aeb26b039e4f2393d69. Acked-by: Fredrik Höglund <[email protected]> Cc: "10.6" <[email protected]>
* mesa: Allow overriding the version of ES2+ contextsIan Romanick2015-05-281-0/+4
| | | | | Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Tapani Pälli <[email protected]>
* dri_util: Use _mesa_override_gl_version_contextlessIan Romanick2015-05-281-7/+11
| | | | | | | | | Remove _mesa_get_gl_version_override. We don't need two functions that do basically the same thing. This change seemed easier (esp. with the next patch) than going the other way. Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Tapani Pälli <[email protected]>
* i965/fs: Properly handle explicit depth in SIMD16 with dual-source blendJason Ekstrand2015-05-281-1/+5
| | | | | | | Cc: "10.6" <[email protected]> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=90629 Tested-by: Markus Wick <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965: Silence warning in 3-src type-setting.Matt Turner2015-05-281-0/+2
| | | | Reviewed-by: Topi Pohjolainen <[email protected]>
* i965/fs: Fix lowering of integer multiplication with cmod.Matt Turner2015-05-281-0/+11
| | | | | | | | | | | | If the multiplication's result is unused, except by a conditional_mod, the destination will be null. Since the final instruction in the lowered sequence is a partial-write, we can't put the conditional mod on it and we have to store the full result to a register and do a MOV with a conditional mod. Cc: "10.6" <[email protected]> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=90580 Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Emit 3DSTATE_MULTISAMPLE before WM_HZ_OP (gen8+)Ben Widawsky2015-05-271-0/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Starting with GEN8, there is documentation that the multisample state command must be emitted before the 3DSTATE_WM_HZ_OP command any time the multisample count changes. The 3DSTATE_WM_HZ_OP packet gets emitted as a result of a intel_hix_exec(), which is called upon a fast clear and/or a resolve. This can happen before the state atoms are checked, and so the multisample state must be put directly in the function. v1: - In v0, I was always emitting the command, but Ken came up with the condition to determine whether or not the sample count actually changed. - Ken's recommendation was to set brw->num_multisamples after emitting 3DSTATE_MULTISAMPLE. This doesn't work. I put my best guess as to why in the XXX (it was causing 7 regressions on BDW). v2: Flag NEW_MULTISAMPLE state. As Ken found, in state upload we check for the multisample change to determine whether or not to emit certain packets. Since the hiz code doesn't actually care about the number of multisamples, set the flag and let the later code take care of it. Jenkins results: http://otc-mesa-ci.jf.intel.com/view/dev/job/bwidawsk/136/ Fixes around 200 piglit tests on SKL. I'm somewhat surprised that it seems to have no impact on BDW as the restriction is needed there as well. Cc: "10.5 10.6" <[email protected]> Signed-off-by: Ben Widawsky <[email protected]> Reviewed-by: Neil Roberts <[email protected]> (v0) Reviewed-by: Kenneth Graunke <[email protected]> (v2)
* i965: Remove _NEW_MULTISAMPLE dirty bit from 3DSTATE_PS_EXTRA.Kenneth Graunke2015-05-271-2/+2
| | | | | | | BRW_NEW_NUM_SAMPLES is sufficient. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Chris Forbes <[email protected]>
* i965: Delete GS scratch space workaround warning.Kenneth Graunke2015-05-271-4/+0
| | | | | | | | | | | This workaround is documented in the 3DSTATE_GS documentation. It appears to only apply to early steppings of Broadwell and Skylake. I don't think it ever affected production hardware, so at this point it probably makes sense to delete it. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965/skl: Add a message header for the TXF_MCS instruction in vec4vsNeil Roberts2015-05-261-2/+18
| | | | | | | | | | | | | | | When using SIMD4x2 on Skylake, the sampler instructions need a message header to select the correct mode. This was added for most sample instructions in 0ac4c2727 but the TXF_MCS instruction is emitted separately and it was missed. This fixes a bunch of Piglit tests which test texelFetch in a geometry shader, for example: spec/arb_texture_multisample/texelfetch/2-gs-sampler2dms Cc: [email protected] Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Fix implied_mrf_writes for scratch writesJason Ekstrand2015-05-231-1/+1
| | | | | | | | We build the entire message in the generator so all the MRF writes are implied. Cc: "10.5 10.6" <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* prog_to_nir: Use a variable for uniform dataJason Ekstrand2015-05-231-12/+3
| | | | | | | | | | | | | Previously, the prog_to_nir pass was directly generating uniform load/store intrinsics. This converts it to use a single giant "parameters" variable and we now depend on lowering to get the uniform load/store intrinsics. One advantage of this is that we now have one code-path after we do the initial conversion into NIR. No shader-db changes. Signed-off-by: Jason Ekstrand <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965/disasm: Skip swizzle disassembly when using 3-src repctrl.Matt Turner2015-05-221-9/+12
| | | | | ... since it's always .x, and also always print the subreg offset when using repctrl.
* swrast: Build fix for SolarisAlan Coopersmith2015-05-201-1/+3
| | | | | | | | Fixes regression from commit 5b2d3480f57168d50ad24cf0b8c9244414bd3701 Cc: "10.5 10.6" <[email protected]> Signed-off-by: Alan Coopersmith <[email protected]> Reviewed-by: Jeremy Huddleston Sequoia <[email protected]>
* nir: Get rid of the array elements parameter on load/store intrinsicsJason Ekstrand2015-05-201-31/+25
| | | | | | | | | | | | | Previously, we used intrinsic->const_index[1] to represent "the number of array elements to load" for load/store intrinsics. However, this set to 1 by every pass that ever creates a load/store intrinsic. Also, while it might make some sense for registers, it makes no sense whatsoever in SSA. On top of that, the i965 backend was the only backend to ever support it; freedreno and vc4 just assert that it's always 1. Let's just delete it. Signed-off-by: Jason Ekstrand <[email protected]> Reviewed-by: Connor Abbott <[email protected]> Reviewed-by: Rob Clark <[email protected]>
* i965: add brw_cs.h to the sources listEmil Velikov2015-05-191-0/+1
| | | | Signed-off-by: Emil Velikov <[email protected]>
* i965: Use NIR by default for vertex shaders on GEN8+Jason Ekstrand2015-05-181-1/+1
| | | | | | | | | | | | | | | | | | GLSL IR vs. NIR shader-db results for SIMD8 vertex shaders on Broadwell: total instructions in shared programs: 2742062 -> 2681339 (-2.21%) instructions in affected programs: 1514770 -> 1454047 (-4.01%) helped: 5813 HURT: 1120 The gained programs are ARB vertext programs that were previously going through the vec4 backend. Now that we have prog_to_nir, ARB vertex programs can go through the scalar backend so they show up as "gained" in the shader-db results. Acked-by: Kenneth Graunke <[email protected]> Reviewed-by: Ian Romanick <[email protected]> Acked-by: Matt Turner <[email protected]>
* i965: Add gen8 blend stateBen Widawsky2015-05-181-2/+110
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | OLD: 0x00007340: 0x00800000: BLEND: 0x00007344: 0x84202100: BLEND: NEW: 0x00007340: 0x00800000: BLEND: Alpha blend/test 0x00007344: 0x0000000b84202100: BLEND_ENTRY00: Color Buffer Blend factor ONE,ONE,ONE,ONE (src,dst,src alpha, dst alpha) function ADD,ADD (color, alpha), Disables: ---- 0x0000734c: 0x0000000b84202100: BLEND_ENTRY01: Color Buffer Blend factor ONE,ONE,ONE,ONE (src,dst,src alpha, dst alpha) function ADD,ADD (color, alpha), Disables: ---- 0x00007354: 0x0000000b84202100: BLEND_ENTRY02: Color Buffer Blend factor ONE,ONE,ONE,ONE (src,dst,src alpha, dst alpha) function ADD,ADD (color, alpha), Disables: ---- 0x0000735c: 0x0000000b84202100: BLEND_ENTRY03: Color Buffer Blend factor ONE,ONE,ONE,ONE (src,dst,src alpha, dst alpha) function ADD,ADD (color, alpha), Disables: ---- 0x00007364: 0x0000000b84202100: BLEND_ENTRY04: Color Buffer Blend factor ONE,ONE,ONE,ONE (src,dst,src alpha, dst alpha) function ADD,ADD (color, alpha), Disables: ---- 0x0000736c: 0x0000000b84202100: BLEND_ENTRY05: Color Buffer Blend factor ONE,ONE,ONE,ONE (src,dst,src alpha, dst alpha) function ADD,ADD (color, alpha), Disables: ---- 0x00007374: 0x0000000b84202100: BLEND_ENTRY06: Color Buffer Blend factor ONE,ONE,ONE,ONE (src,dst,src alpha, dst alpha) function ADD,ADD (color, alpha), Disables: ---- 0x0000737c: 0x0000000b84202100: BLEND_ENTRY07: Color Buffer Blend factor ONE,ONE,ONE,ONE (src,dst,src alpha, dst alpha) function ADD,ADD (color, alpha), Disables: ---- v2: Line length fixes, and const usage (Topi) Safer initialization of name string (Topi) Signed-off-by: Ben Widawsky <[email protected]> Reviewed-by: Topi Pohjolainen <[email protected]>
* i965: Add renderbuffer surface indexes to debugBen Widawsky2015-05-181-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch is optional in the series. It does make the output much cleaner, but there is some risk. Sample output (v3): 0x00007e80: 0x231d7000: SURF000: 2D R8G8B8A8_UNORM VALIGN4 HALIGN4 Y-tiled 0x00007e84: 0x05000000: SURF000: MOCS: 0x5 Base MIP: 0.0 (0 mips) Surface QPitch: 0 0x00007e88: 0x009f009f: SURF000: 160x160 [AUX_NONE] 0x00007e8c: 0x0000027f: SURF000: 1 slices (depth), pitch: 640 0x00007e90: 0x00000000: SURF000: min array element: 0, array extent 1, MULTISAMPLE_1 0x00007e94: 0x00000000: SURF000: x,y offset: 0,0, min LOD: 0 0x00007e98: 0x00000000: SURF000: AUX pitch: 0 qpitch: 0 0x00007e9c: 0x09770000: SURF000: Clear color: R(0)G(0)B(0)A(0) 0x00007ea0: 0x00001000: SURF000: 0x00001000 0x00007ea4: 0x00000000: SURF000: 0x00000000 0x00007ea8: 0x00000000: SURF000: 0x00000000 0x00007eac: 0x00000000: SURF000: 0x00000000 0x00007e40: 0x234df000: SURF001: 2D R11G11B10_FLOAT VALIGN4 HALIGN16 Y-tiled 0x00007e44: 0x09000000: SURF001: MOCS: 0x9 Base MIP: 0.0 (0 mips) Surface QPitch: 0 0x00007e48: 0x009f009f: SURF001: 160x160 [AUX_CCS_D (Uncompressed, MULTISAMPLE_COUNT=1)] 0x00007e4c: 0x0000027f: SURF001: 1 slices (depth), pitch: 640 0x00007e50: 0x00000000: SURF001: min array element: 0, array extent 1, MULTISAMPLE_1 0x00007e54: 0x00000000: SURF001: x,y offset: 0,0, min LOD: 0 0x00007e58: 0x00000001: SURF001: AUX pitch: 0 qpitch: 0 0x00007e5c: 0x09770000: SURF001: Clear color: R(0)G(0)B(0)A(0) 0x00007e60: 0x0002b000: SURF001: 0x0002b000 0x00007e64: 0x00000000: SURF001: 0x00000000 0x00007e68: 0x0002a000: SURF001: 0x0002a000 0x00007e6c: 0x00000000: SURF001: 0x00000000 v2: Rebased on Topi's recent series which changed around some of the gen8 surface setup code. v3: Use ralloc_asprintf instead of asprintf to be more friendly to non-GNU platforms. Signed-off-by: Ben Widawsky <[email protected]>
* i965: Add Gen9 surface state decodingBen Widawsky2015-05-186-39/+68
| | | | | | | | | | | | | | Gen9 surface state is very similar to the previous generation. The important changes here are aux mode, and the way clear colors work. NOTE: There are some things intentionally left out of this decoding. v2: Redo the string for the aux buffer type to address compressed variants. v3: Use the shift for compression enable (instead of compression mode) (Topi) Signed-off-by: Ben Widawsky <[email protected]> Reviewed-by: Topi Pohjolainen <[email protected]>
* i965: Add gen8 surface state debug infoBen Widawsky2015-05-182-6/+81
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | AFAICT, none of the old data was wrong (the gen7 decoder), but it wa smissing a bunch of stuff. Adds a tick (') to denote the beginning of the surface state for easier reading. This will be replaced later with some better, but more risky code. OLD: 0x00007980: 0x23016000: SURF: 2D BRW_SURFACEFORMAT_B8G8R8A8_UNORM 0x00007984: 0x18000000: SURF: offset 0x00007988: 0x00ff00ff: SURF: 256x256 size, 0 mips, 1 slices 0x0000798c: 0x000003ff: SURF: pitch 1024, tiled 0x00007990: 0x00000000: SURF: min array element 0, array extent 1 0x00007994: 0x00000000: SURF: mip base 0 0x00007998: 0x00000000: SURF: x,y offset: 0,0 0x0000799c: 0x09770000: SURF: 0x00007940: 0x231d7000: SURF: 2D BRW_SURFACEFORMAT_R8G8B8A8_UNORM 0x00007944: 0x78000000: SURF: offset 0x00007948: 0x001f001f: SURF: 32x32 size, 0 mips, 1 slices 0x0000794c: 0x0000007f: SURF: pitch 128, tiled 0x00007950: 0x00000000: SURF: min array element 0, array extent 1 0x00007954: 0x00000000: SURF: mip base 0 0x00007958: 0x00000000: SURF: x,y offset: 0,0 0x0000795c: 0x09770000: SURF: NEW (v1): 0x00007980: 0x23016000: SURF': 2D B8G8R8A8_UNORM VALIGN4 HALIGN4 X-tiled 0x00007984: 0x18000000: SURF: MOCS: 0x18 Base MIP: 0.0 (0 mips) Surface QPitch: 0 0x00007988: 0x00ff00ff: SURF: 256x256 [AUX_NONE] 0x0000798c: 0x000003ff: SURF: 1 slices (depth), pitch: 1024 0x00007990: 0x00000000: SURF: min array element: 0, array extent 1, MULTISAMPLE_1 0x00007994: 0x00000000: SURF: x,y offset: 0,0, min LOD: 0 0x00007998: 0x00000000: SURF: AUX pitch: 0 qpitch: 0 0x0000799c: 0x09770000: SURF: Clear color: ---- 0x00007940: 0x231d7000: SURF': 2D R8G8B8A8_UNORM VALIGN4 HALIGN4 Y-tiled 0x00007944: 0x78000000: SURF: MOCS: 0x78 Base MIP: 0 (0 mips) Surface QPitch: ff0000 0x00007948: 0x001f001f: SURF: 32x32 [AUX_NONE] 0x0000794c: 0x0000007f: SURF: 1 slices (depth), pitch: 128 0x00007950: 0x00000000: SURF: min array element: 0, array extent 1, MULTISAMPLE_1 0x00007954: 0x00000000: SURF: x,y offset: 0,0, min LOD: 0 0x00007958: 0x00000000: SURF: AUX pitch: 0 qpitch: 0 0x0000795c: 0x09770000: SURF: Clear color: ---- 0x00007920: 0x00007980: BIND0: surface state address 0x00007924: 0x00007940: BIND1: surface state address v2: Style cleanups (Matt) Fix aux mode dword 7->6 (Topi) Use exp2 instead of pow (Matt) Add dwords 8-12 to the dump v3: Needed to update the surface format name getter for the change in the first patch in the series Signed-off-by: Ben Widawsky <[email protected]> Cc: Matt Turner <[email protected]> Reviewed-by: Topi Pohjolainen <[email protected]>
* i965: Add gen7+ sampler state to batch debugBen Widawsky2015-05-182-1/+71
| | | | | | | | | | | | | | | | | | | | OLD: 0x00007e00: 0x10000000: WM SAMP0: filtering 0x00007e04: 0x000d0000: WM SAMP0: wrapping, lod 0x00007e08: 0x00000000: WM SAMP0: default color pointer 0x00007e0c: 0x00000090: WM SAMP0: chroma key, aniso NEW: 0x00007e00: 0x10000000: SAMPLER_STATE 0: Disabled = no, Base Mip: 0.0, Mip/Mag/Min Filter: NONE/NEAREST/NEAREST, LOD Bias: 0.0 0x00007e04: 0x000d0000: SAMPLER_STATE 0: Min LOD: 0.0, Max LOD: 13.0 0x00007e08: 0x00000000: SAMPLER_STATE 0: Border Color 0x00007e0c: 0x00000090: SAMPLER_STATE 0: Max aniso: RATIO 2:1, TC[XYZ] Address Control: CLAMP|CLAMP|WRAP v2: Move GET_BITS macro to here (with paren protection) Ben/Topi Add const to the sampler pointer (Topi) Signed-off-by: Ben Widawsky <[email protected]> Reviewed-by: Topi Pohjolainen <[email protected]>
* i965: Add viewport extents (gen8) to batch decodeBen Widawsky2015-05-181-4/+11
| | | | | | | | | | | | | | | | | | 0x00007da0: 0xc1da740e: SF_CLIP VP: guardband xmin = -27.306667 0x00007da4: 0x41da740e: SF_CLIP VP: guardband xmax = 27.306667 0x00007da4: 0x41da740e: SF_CLIP VP: guardband ymin = -23.405714 0x00007da8: 0xc1bb3ee7: SF_CLIP VP: guardband ymax = 23.405714 0x00007db0: 0x00000000: SF_CLIP VP: Min extents: 0.00x0.00 0x00007db8: 0x00000000: SF_CLIP VP: Max extents: 299.00x349.00 While here, fix the wrong offsets for the guardband (I didn't check if it used to be valid on GEN4). v2: Remove leftover GET_BITS which belongs later in the series. (Topi) Signed-off-by: Ben Widawsky <[email protected]> Reviewed-by: Topi Pohjolainen <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Add all surface types to the batch decodeBen Widawsky2015-05-183-15/+10
| | | | | | | | | | | | | | | | | | | | | It's true that not all surfaces apply for every gen, but for the most part this is what we want. (The unfortunate case is when we use a valid surface, but not for the specific GEN). This was automated with a vim macro. v2: Shortened common forms such as R8G8B8A8->RGBA8. Note that this makes some of the sample output in subsequent commits slightly incorrect. v3: Use the name from the table (Ken). This requires declaring the surface format array as extern, and declaring the struct in the .h file. v4: Move the struct back and create a helper function to obtain the name (Ken) Get rid of the now useless helper in the state_dump.c Signed-off-by: Ben Widawsky <[email protected]> Reviewed-by: Topi Pohjolainen <[email protected]> (v3) Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Add string for surface format to tableBen Widawsky2015-05-181-217/+219
| | | | | | | Recommended-by: Kenneth Graunke <[email protected]> Signed-off-by: Ben Widawsky <[email protected]> Acked-by: Kenneth Graunke <[email protected]> Reviewed-by: Topi Pohjolainen <[email protected]>
* i965/fs: Implement integer multiply without mul/mach.Matt Turner2015-05-181-28/+66
| | | | | | | Ivybridge and Baytrail can't use mach with 2Q quarter control, so just do it without the accumulator. Stupid accumulator. Reviewed-by: Jason Ekstrand <[email protected]>
* i965/fs: Rework compression control selection.Matt Turner2015-05-181-3/+6
| | | | | | | | The next commit uses an add(16) with a UW destination with a stride of 2, which needs compression control since it's writing two registers. The old code would have failed to set compression control correctly. Reviewed-by: Jason Ekstrand <[email protected]>
* i965/fs: Support integer multiplication in SIMD16 on Haswell.Matt Turner2015-05-181-5/+47
| | | | | | | Ivybridge (and presumably Baytrail) have a bug that prevents this from working. Reviewed-by: Jason Ekstrand <[email protected]>
* i965/fs: Add set_sechalf() method.Matt Turner2015-05-181-0/+10
| | | | | | Used in the next commit. Reviewed-by: Jason Ekstrand <[email protected]>
* i965/fs: Unrestrict constant propagation into integer multiply.Matt Turner2015-05-181-1/+9
| | | | | | | | | | | | | | | | | | | | | | | | | Gen8+'s MUL instruction doesn't ignore the high 16-bits of one source like on earlier platforms, so we can constant propagate into it without worry. Integer multiplies (not into the accumulator, which is done for imul_high) are lowered in lower_integer_multiplication(), so it's safe there as well. On Broadwell, fragment shaders only: total instructions in shared programs: 4377769 -> 4377451 (-0.01%) instructions in affected programs: 48064 -> 47746 (-0.66%) helped: 156 On Broadwell, vertex shaders only: total instructions in shared programs: 2858885 -> 2856313 (-0.09%) instructions in affected programs: 26380 -> 23808 (-9.75%) helped: 134 On Broadwell, vertex shaders only (with INTEL_USE_NIR=1): total instructions in shared programs: 2911688 -> 2865984 (-1.57%) instructions in affected programs: 1421715 -> 1376011 (-3.21%) helped: 6186 Reviewed-by: Jason Ekstrand <[email protected]>
* i965/fs: Lower integer multiplication after optimizations.Matt Turner2015-05-184-64/+70
| | | | | | | | | | | | 32-bit x 32-bit integer multiplication requires multiple instructions until Broadwell. This patch just lets us treat the MUL instruction in the FS backend like it operates on Broadwell, and after optimizations we lower it into a sequence of instructions on older platforms. Doing this will allow us to some extra optimization on integer multiplies. Reviewed-by: Jason Ekstrand <[email protected]>
* i965: Fix textureSize for Lod > 0 with non-mipmap filtersIago Toral Quiroga2015-05-181-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | Currently, when the MinFilter is GL_LINEAR or GL_NEAREST we hide the actual miplevel count from the hardware (and we avoid re-creating the miptree structure with all the levels), since we don't expect levels other than the base level to be needed. Unfortunately, GLSL's textureSize() function is an exception to this rule. This function takes a lod parameter that we need to use to return the size of the appropriate miplevel (if it exists). The spec only requires that the miplevel exists, so even if the sampler is configured with a linear or nearest MinFilter, as far as the user has uploaded miplevels for the texture, textureSize() should return the appropriate sizes. This patch fixes this by exposing the actual miplevel count for all sampling engine textures while keeping the original implementation for render targets (for render targets textures we do not provide the miplevel count but the actual LOD we are wrting to, so we want to make sure that we make this the base level). Fixes 28 dEQP tests in the following category: dEQP-GLES3.functional.shaders.texture_functions.texturesize.* Reviewed-by: Ben Widawsky <[email protected]>
* i965: Fix FS unit testsIan Romanick2015-05-152-2/+4
| | | | | | | | | | | Commit 3687d75 changed the fs_visitor constructors, but it didn't update all the users. As a result, 'make check' fails. I added the explicit cast to the gl_program* parameter to make it more clear which NULL was which. Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Combine the fs_visitor constructors.Kenneth Graunke2015-05-145-74/+20
| | | | | | | | | | | | | | For scalar GS support, we either need to add a fourth constructor which takes the GS structures, or combine the existing two and pass the shader stage. Given that they're not significantly different, I opted for the latter. v2: Remove more stuff from the .h file (Jason and Jordan). Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]> Reviewed-by: Jordan Justen <[email protected]>