summaryrefslogtreecommitdiffstats
path: root/src/intel
Commit message (Collapse)AuthorAgeFilesLines
* intel: Use Clear Color struct size.Rafael Antognolli2018-04-056-15/+35
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | The size of the clear color struct (expected by the hardware) is 8 dwords (isl_dev.ss.clear_value_state_size here). But we still need to track the size of the clear color, used when memcopying it to/from the state buffer. For that we keep isl_dev.ss.clear_value_size. v4: - Add struct to gen11 too (Jason, Jordan) - Add field for Converted Clear Color to gen11 (Jason) - Add clear_color_state_offset to differentiate from clear_value_offset. - Fix all the places where clear_value_size was used. v5 (Jason): - Split genxml changes to another commit. - Remove unnecessary gen checks. - Bring back missing offset increment to init_fast_clear_color(). v6 (Jason): - On init_fast_clear_color, change: addr.offset += 4 => sdi.Address.offset += i * 4 - Use GEN_GEN instead of GEN_VERSIONx10. [[email protected]: isl_device_init changes] Signed-off-by: Rafael Antognolli <[email protected]> Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* intel/genxml: Add Clear Color struct to gen10+.Rafael Antognolli2018-04-052-0/+18
| | | | | | | v5: Split genxml changes into its own commit (Jason). Signed-off-by: Rafael Antognolli <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* intel/genxml: Use a single field for clear color address on gen10.Rafael Antognolli2018-04-052-8/+6
| | | | | | | | | | | | | | | | | | | | genxml does not support having two address fields with different names but same position in the state struct. Both "Clear Color Address" and "Clear Depth Address Low" mean the same thing, only for different surface types. To workaround this genxml limitation, rename "Clear Color Address" to "Clear Value Address" and use it for both color and depth. Do the same for the high bits. TODO: add support for multiple addresses at the same position in the xml. v2: Combine high and low order bits into a single address field. Signed-off-by: Rafael Antognolli <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]> Reviewed-by: Jordan Justen <[email protected]>
* genxml: Preserve fields that share dword space with addresses.Rafael Antognolli2018-04-051-2/+6
| | | | | | | | | | | | | | | | | Some instructions contain fields that are either an address or a value of some type based on the content of other fields, such as clear color values vs address. That works fine if these fields are in the less significant dword, the lower 32 bits of the address, because they get OR'ed with the address. But if they are in the higher 32 bits, they get discarded. On Gen10 we have fields that share space with the higher 16 bits of the address too. This commit makes sure those fields don't get discarded. v5: Remove spurious whitespace (Jason). Signed-off-by: Rafael Antognolli <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* anv/image: Do not override lower bits of dword.Rafael Antognolli2018-04-052-15/+12
| | | | | | | | | | | | | The lower bits seem to have extra fields in every platform but gen8 (even though we don't use them in gen9). So just go ahead and avoid using them for the address. v4: Use Jason's suggestion for comment explaining the change. v5: Fix aux_address comment in anv_private.h (Jason) Signed-off-by: Rafael Antognolli <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]> Reviewed-by: Topi Pohjolainen <[email protected]>
* intel: compiler: silence compiler warningLionel Landwerlin2018-04-041-0/+1
| | | | | | | | | | ../src/intel/compiler/brw_reg.h: In function ‘bool brw_regs_negative_equal(const brw_reg*, const brw_reg*)’: ../src/intel/compiler/brw_reg.h:305:1: warning: control reaches end of non-void function [-Wreturn-type] Introduced by 8f83eea71e233 ("i965: Add negative_equals methods"). Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Iago Toral Quiroga <[email protected]>
* anv: Fix close(fd) before import issue in vkCreateDmaBufImageINTELKevin Strasser2018-04-031-2/+2
| | | | | | | | | If we close the fd before calling DRM_IOCTL_PRIME_FD_TO_HANDLE the kernel will hit a -EBADF error. Move the close(fd) call to the end of anv_CreateDmaBufImageINTEL(). Signed-off-by: Kevin Strasser <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* intel: gen-decoder: print all dword a field belongs toLionel Landwerlin2018-04-032-7/+9
| | | | | | | | | | Prior to printing a decoded field, print out all dwords that field belongs to. In particular with address fields spanning multiple dwords, we want to have all the dwords presented before the field is decoded to make it easier to read. Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Scott D Phillips <[email protected]>
* intel: genxml: decode variable length MI_LRILionel Landwerlin2018-04-0310-0/+40
| | | | | | | | | | | | | | | MI_LOAD_REGISTER_IMM can load multiple (register, value) tuples in one command. In our drivers we only use one tuple at a time, but the kernel might load more than one at a time. Instead of making all the tuple part of a group, we leave out the first tuple (the one we use in the generated packing structures). This is particularly useful for looking at error stats generated by the kernel. Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Scott D Phillips <[email protected]>
* intel: gen-decoder: don't decode fields beyond a dword lengthLionel Landwerlin2018-04-031-15/+26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For example, a PIPE_CONTROL with DWordLength = 2 should look like this : 0xffffe374: 0x7a000002: PIPE_CONTROL 0xffffe374: 0x7a000002 : Dword 0 DWord Length: 2 0xffffe378: 0x00800000 : Dword 1 Depth Cache Flush Enable: false Stall At Pixel Scoreboard: false State Cache Invalidation Enable: false Constant Cache Invalidation Enable: false VF Cache Invalidation Enable: false DC Flush Enable: false Pipe Control Flush Enable: false Notify Enable: false Indirect State Pointers Disable: false Texture Cache Invalidation Enable: false Instruction Cache Invalidate Enable: false Render Target Cache Flush Enable: false Depth Stall Enable: false Post Sync Operation: 0 (No Write) Generic Media State Clear: false TLB Invalidate: false Global Snapshot Count Reset: false Command Streamer Stall Enable: false Store Data Index: 0 LRI Post Sync Operation: 1 (MMIO Write Immediate Data) Destination Address Type: 0 (PPGTT) Flush LLC: false 0xffffe37c: 0x00000000 : Dword 2 Address: 0x00000000 0xffffe384: 0x05000000: MI_BATCH_BUFFER_END Prior to this change, fields beyond the length of the command would be decoded (notice the MI_BATCH_BUFFER_END decoded as part of the previous PIPE_CONTROL) : 0xffffe374: 0x7a000002: PIPE_CONTROL 0xffffe374: 0x7a000002 : Dword 0 DWord Length: 2 0xffffe378: 0x00800000 : Dword 1 Depth Cache Flush Enable: false Stall At Pixel Scoreboard: false State Cache Invalidation Enable: false Constant Cache Invalidation Enable: false VF Cache Invalidation Enable: false DC Flush Enable: false Pipe Control Flush Enable: false Notify Enable: false Indirect State Pointers Disable: false Texture Cache Invalidation Enable: false Instruction Cache Invalidate Enable: false Render Target Cache Flush Enable: false Depth Stall Enable: false Post Sync Operation: 0 (No Write) Generic Media State Clear: false TLB Invalidate: false Global Snapshot Count Reset: false Command Streamer Stall Enable: false Store Data Index: 0 LRI Post Sync Operation: 1 (MMIO Write Immediate Data) Destination Address Type: 0 (PPGTT) Flush LLC: false 0xffffe37c: 0x00000000 : Dword 2 Address: 0x00000000 0xffffe380: 0x00000000 : Dword 3 0xffffe384: 0x05000000 : Dword 4 Immediate Data: 83886080 0xffffe384: 0x05000000: MI_BATCH_BUFFER_END Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Scott D Phillips <[email protected]>
* intel: error_decode: add an option to decode all buffersLionel Landwerlin2018-04-031-2/+7
| | | | | | | | | | | The kernel reports workaround batch buffers, but we're not presenting them currently. Also they might not be useful for debugging purely userspace driver issues, when problems arise because of interactions between kernel & userspace drivers, it's nice to be able to decode them. Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Scott D Phillips <[email protected]>
* intel: genxml: add preemption control instructionsLionel Landwerlin2018-04-034-0/+26
| | | | | | | Helpful to debug kernel workaround batchbuffers. Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Scott D Phillips <[email protected]>
* nir+drivers: add helpers to get # of src/dest componentsRob Clark2018-04-031-6/+5
| | | | | | | | | Add helpers to get the number of src/dest components for an intrinsic, and update spots that were open-coding this logic to use the helpers instead. Signed-off-by: Rob Clark <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* anv/cmd_buffer: honor pending clear views for depth/stencil attachmentsIago Toral Quiroga2018-04-021-1/+21
| | | | | | | | | | | | v2: rebased on top of subpass rework. v3: rebased v4: - rebased - reset pending clear views in one go rather one bit at a time (Caio) Reviewed-by: Jason Ekstrand <[email protected]>
* anv/cmd_buffer: consider multiview masks for tracking pending clear aspectsIago Toral Quiroga2018-04-022-3/+96
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When multiview is active a subpass clear may only clear a subset of the attachment layers. Other subpasses in the same render pass may also clear too and we want to honor those clears as well, however, we need to ensure that we only clear a layer once, on the first subpass that uses a particular layer (view) of a given attachment. This means that when we check if a subpass attachment needs to be cleared we need to check if all the layers used by that subpass (as indicated by its view_mask) have already been cleared in previous subpasses or not, in which case, we must clear any pending layers used by the subpass, and only those pending. v2: - track pending clear views in the attachment state (Jason) - rebased on top of fast-clear rework. v3: - rebased on top of subpass rework. v4: rebased. v5 (Caio): - Rebased. - Initialize pending clear views to only have bits set for layers that exist. - Reset pending clear views in one go rather one bit at a time. - Put "last subpass for this attachment" condition in a separate function to simplify the conditional that resets pending_clear_aspects. Fixes: dEQP-VK.multiview.readback_implicit_clear.* Reviewed-by: Jason Ekstrand <[email protected]>
* intel/vec4: Set channel_sizes for MOV_INDIRECT sourcesJason Ekstrand2018-03-301-1/+4
| | | | | | | | | | | Otherwise, any indirect push constant access results in an assertion failure when we start digging through the channel_sizes array. This fixes dEQP-VK.pipeline.push_constant.graphics_pipeline.dynamic_index_vert on Haswell. It should be a harmless no-op for GL since indirect push constants aren't used there. Reviewed-by: Kenneth Graunke <[email protected]> Fixes: e69e5c7006d "i965/vec4: load dvec3/4 uniforms first in the..."
* util: Add and use util_is_power_of_two_nonzeroIan Romanick2018-03-291-2/+2
| | | | | Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Eduardo Lima Mitev <[email protected]>
* util: Move util_is_power_of_two to bitscan.h and rename to ↵Ian Romanick2018-03-294-8/+8
| | | | | | | | | | | util_is_power_of_two_or_zero The new name make the zero-input behavior more obvious. The next patch adds a new function with different zero-input behavior. Signed-off-by: Ian Romanick <[email protected]> Suggested-by: Matt Turner <[email protected]> Reviewed-by: Alejandro Piñeiro <[email protected]>
* autotools: Include intel/dev/meson.build in tarballDylan Baker2018-03-281-0/+1
| | | | | | | Fixes: 272bef0601a1bdb5292771aefc8d62fcbdf4c47f ("intel: Split gen_device_info out into libintel_dev") Signed-off-by: Dylan Baker <[email protected]> Reviewed-by: Eric Engestrom <[email protected]>
* intel/fs: Don't emit a des copy for image ops with has_dest == falseJason Ekstrand2018-03-271-3/+6
| | | | | | | | | | This was causing us to walk dest_components times over a thing with no destination. This happened to work because all of the image intrinsics without a destination also happened to have dest_components == 0. We shouldn't be reading dest_components if has_dest == false. Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
* intel/aubinator_error_decode: Decode more registers.Rafael Antognolli2018-03-261-0/+12
| | | | | | Decode SC_INSTDONE, ROW_INSTDONE and SAMPLER_INSTDONE. Reviewed-by: Lionel Landwerlin <[email protected]>
* intel/genxml: Add SAMPLER_INSTDONE register.Rafael Antognolli2018-03-266-0/+139
| | | | Reviewed-by: Lionel Landwerlin <[email protected]>
* intel/genxml: Add ROW_INSTDONE register.Rafael Antognolli2018-03-266-0/+114
| | | | Reviewed-by: Lionel Landwerlin <[email protected]>
* intel/genxml: Add SC_INSTDONE register.Rafael Antognolli2018-03-266-0/+140
| | | | Reviewed-by: Lionel Landwerlin <[email protected]>
* i965/vec4: Fix null destination register in 3-source instructionsIan Romanick2018-03-262-0/+27
| | | | | | | | | | | | | | | | | | | | | | | A recent commit (see below) triggered some cases where conditional modifier propagation and dead code elimination would cause a MAD instruction like the following to be generated: mad.l.f0 null, ... Matt pointed out that fs_visitor::fixup_3src_null_dest() fixes cases like this in the scalar backend. This commit basically ports that code to the vec4 backend. NOTE: I have sent a couple tests to the piglit list that reproduce this bug *without* the commit mentioned below. This commit fixes those tests. Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Matt Turner <[email protected]> Tested-by: Tapani Pälli <[email protected]> Cc: [email protected] Fixes: ee63933a7 ("nir: Distribute binary operations with constants into bcsel") Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105704
* i965/vec4: Propagate conditional modifiers from compares to addsIan Romanick2018-03-261-5/+65
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | No changes on Broadwell or later as those platforms do not use the vec4 backend. Ivy Bridge and Haswell had similar results. (Ivy Bridge shown) total instructions in shared programs: 11682119 -> 11681056 (<.01%) instructions in affected programs: 150403 -> 149340 (-0.71%) helped: 950 HURT: 0 helped stats (abs) min: 1 max: 16 x̄: 1.12 x̃: 1 helped stats (rel) min: 0.23% max: 2.78% x̄: 0.82% x̃: 0.71% 95% mean confidence interval for instructions value: -1.19 -1.04 95% mean confidence interval for instructions %-change: -0.84% -0.79% Instructions are helped. total cycles in shared programs: 257495842 -> 257495238 (<.01%) cycles in affected programs: 270302 -> 269698 (-0.22%) helped: 271 HURT: 13 helped stats (abs) min: 2 max: 14 x̄: 2.42 x̃: 2 helped stats (rel) min: 0.06% max: 1.13% x̄: 0.32% x̃: 0.28% HURT stats (abs) min: 2 max: 12 x̄: 4.00 x̃: 4 HURT stats (rel) min: 0.15% max: 1.18% x̄: 0.30% x̃: 0.26% 95% mean confidence interval for cycles value: -2.41 -1.84 95% mean confidence interval for cycles %-change: -0.31% -0.26% Cycles are helped. Sandy Bridge total instructions in shared programs: 10430493 -> 10429727 (<.01%) instructions in affected programs: 120860 -> 120094 (-0.63%) helped: 766 HURT: 0 helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 helped stats (rel) min: 0.30% max: 2.70% x̄: 0.78% x̃: 0.73% 95% mean confidence interval for instructions value: -1.00 -1.00 95% mean confidence interval for instructions %-change: -0.80% -0.75% Instructions are helped. total cycles in shared programs: 146138718 -> 146138446 (<.01%) cycles in affected programs: 244114 -> 243842 (-0.11%) helped: 132 HURT: 0 helped stats (abs) min: 2 max: 4 x̄: 2.06 x̃: 2 helped stats (rel) min: 0.03% max: 0.43% x̄: 0.16% x̃: 0.19% 95% mean confidence interval for cycles value: -2.12 -2.00 95% mean confidence interval for cycles %-change: -0.18% -0.15% Cycles are helped. GM45 and Iron Lake had identical results. (Iron Lake shown) total instructions in shared programs: 7780251 -> 7780248 (<.01%) instructions in affected programs: 175 -> 172 (-1.71%) helped: 3 HURT: 0 helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 helped stats (rel) min: 1.49% max: 2.44% x̄: 1.81% x̃: 1.49% total cycles in shared programs: 177851584 -> 177851578 (<.01%) cycles in affected programs: 9796 -> 9790 (-0.06%) helped: 3 HURT: 0 helped stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2 helped stats (rel) min: 0.05% max: 0.08% x̄: 0.06% x̃: 0.05% Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Alejandro Piñeiro <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965/vec4: Allow cmod propagation when src0 is a uniform or shader inputIan Romanick2018-03-261-1/+2
| | | | | | | | | | No shader-db changes. This source must have been written by a previous instruction, so it cannot be a uniform or a shader input. However, this change allows the next commit to help more shaders. Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Alejandro Piñeiro <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965/fs: Propagate conditional modifiers from compares to addsIan Romanick2018-03-262-5/+400
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The math inside the add and the cmp in this instruction sequence is the same. We can utilize this to eliminate the compare. add(8) g5<1>F g2<8,8,1>F g64.5<0,1,0>F { align1 1Q compacted }; cmp.z.f0(8) null<1>F g2<8,8,1>F -g64.5<0,1,0>F { align1 1Q switch }; (-f0) sel(8) g8<1>F (abs)g5<8,8,1>F 3e-37F { align1 1Q }; This is reduced to: add.z.f0(8) g5<1>F g2<8,8,1>F g64.5<0,1,0>F { align1 1Q compacted }; (-f0) sel(8) g8<1>F (abs)g5<8,8,1>F 3e-37F { align1 1Q }; This optimization pass could do even better. The nature of converting vectorized code from the GLSL front end to scalar code in NIR results in sequences like: add(8) g7<1>F g4<8,8,1>F g64.5<0,1,0>F { align1 1Q compacted }; add(8) g6<1>F g3<8,8,1>F g64.5<0,1,0>F { align1 1Q compacted }; add(8) g5<1>F g2<8,8,1>F g64.5<0,1,0>F { align1 1Q compacted }; cmp.z.f0(8) null<1>F g2<8,8,1>F -g64.5<0,1,0>F { align1 1Q switch }; (-f0) sel(8) g8<1>F (abs)g5<8,8,1>F 3e-37F { align1 1Q }; cmp.z.f0(8) null<1>F g3<8,8,1>F -g64.5<0,1,0>F { align1 1Q switch }; (-f0) sel(8) g10<1>F (abs)g6<8,8,1>F 3e-37F { align1 1Q }; cmp.z.f0(8) null<1>F g4<8,8,1>F -g64.5<0,1,0>F { align1 1Q switch }; (-f0) sel(8) g12<1>F (abs)g7<8,8,1>F 3e-37F { align1 1Q }; In this sequence, only the first cmp.z is removed. With different scheduling, all 3 could get removed. Skylake total instructions in shared programs: 14407009 -> 14400173 (-0.05%) instructions in affected programs: 1307274 -> 1300438 (-0.52%) helped: 4880 HURT: 0 helped stats (abs) min: 1 max: 33 x̄: 1.40 x̃: 1 helped stats (rel) min: 0.03% max: 8.70% x̄: 0.70% x̃: 0.52% 95% mean confidence interval for instructions value: -1.45 -1.35 95% mean confidence interval for instructions %-change: -0.72% -0.69% Instructions are helped. total cycles in shared programs: 532943169 -> 532923528 (<.01%) cycles in affected programs: 14065798 -> 14046157 (-0.14%) helped: 2703 HURT: 339 helped stats (abs) min: 1 max: 1062 x̄: 12.27 x̃: 2 helped stats (rel) min: <.01% max: 28.72% x̄: 0.38% x̃: 0.21% HURT stats (abs) min: 1 max: 739 x̄: 39.86 x̃: 12 HURT stats (rel) min: 0.02% max: 27.69% x̄: 1.38% x̃: 0.41% 95% mean confidence interval for cycles value: -8.66 -4.26 95% mean confidence interval for cycles %-change: -0.24% -0.14% Cycles are helped. LOST: 0 GAINED: 1 Broadwell total instructions in shared programs: 14719636 -> 14712949 (-0.05%) instructions in affected programs: 1288188 -> 1281501 (-0.52%) helped: 4845 HURT: 0 helped stats (abs) min: 1 max: 33 x̄: 1.38 x̃: 1 helped stats (rel) min: 0.03% max: 8.00% x̄: 0.70% x̃: 0.52% 95% mean confidence interval for instructions value: -1.43 -1.33 95% mean confidence interval for instructions %-change: -0.72% -0.68% Instructions are helped. total cycles in shared programs: 559599253 -> 559581699 (<.01%) cycles in affected programs: 13315565 -> 13298011 (-0.13%) helped: 2600 HURT: 269 helped stats (abs) min: 1 max: 2128 x̄: 12.24 x̃: 2 helped stats (rel) min: <.01% max: 23.95% x̄: 0.41% x̃: 0.20% HURT stats (abs) min: 1 max: 790 x̄: 53.07 x̃: 20 HURT stats (rel) min: 0.02% max: 15.96% x̄: 1.55% x̃: 0.75% 95% mean confidence interval for cycles value: -8.47 -3.77 95% mean confidence interval for cycles %-change: -0.27% -0.18% Cycles are helped. LOST: 0 GAINED: 8 Haswell total instructions in shared programs: 12978609 -> 12973483 (-0.04%) instructions in affected programs: 932921 -> 927795 (-0.55%) helped: 3480 HURT: 0 helped stats (abs) min: 1 max: 33 x̄: 1.47 x̃: 1 helped stats (rel) min: 0.03% max: 7.84% x̄: 0.78% x̃: 0.58% 95% mean confidence interval for instructions value: -1.53 -1.42 95% mean confidence interval for instructions %-change: -0.80% -0.75% Instructions are helped. total cycles in shared programs: 410270788 -> 410250531 (<.01%) cycles in affected programs: 10986161 -> 10965904 (-0.18%) helped: 2087 HURT: 254 helped stats (abs) min: 1 max: 2672 x̄: 14.63 x̃: 4 helped stats (rel) min: <.01% max: 39.61% x̄: 0.42% x̃: 0.21% HURT stats (abs) min: 1 max: 519 x̄: 40.49 x̃: 16 HURT stats (rel) min: 0.01% max: 12.83% x̄: 1.20% x̃: 0.47% 95% mean confidence interval for cycles value: -12.82 -4.49 95% mean confidence interval for cycles %-change: -0.31% -0.18% Cycles are helped. LOST: 0 GAINED: 5 Ivy Bridge total instructions in shared programs: 11686082 -> 11681548 (-0.04%) instructions in affected programs: 937696 -> 933162 (-0.48%) helped: 3150 HURT: 0 helped stats (abs) min: 1 max: 33 x̄: 1.44 x̃: 1 helped stats (rel) min: 0.03% max: 7.84% x̄: 0.69% x̃: 0.49% 95% mean confidence interval for instructions value: -1.49 -1.38 95% mean confidence interval for instructions %-change: -0.71% -0.67% Instructions are helped. total cycles in shared programs: 257514962 -> 257492471 (<.01%) cycles in affected programs: 11524149 -> 11501658 (-0.20%) helped: 1970 HURT: 239 helped stats (abs) min: 1 max: 3525 x̄: 17.48 x̃: 3 helped stats (rel) min: <.01% max: 49.60% x̄: 0.46% x̃: 0.17% HURT stats (abs) min: 1 max: 1358 x̄: 50.00 x̃: 15 HURT stats (rel) min: 0.02% max: 59.88% x̄: 1.84% x̃: 0.65% 95% mean confidence interval for cycles value: -17.01 -3.35 95% mean confidence interval for cycles %-change: -0.33% -0.08% Cycles are helped. LOST: 9 GAINED: 1 Sandy Bridge total instructions in shared programs: 10432841 -> 10429893 (-0.03%) instructions in affected programs: 685071 -> 682123 (-0.43%) helped: 2453 HURT: 0 helped stats (abs) min: 1 max: 9 x̄: 1.20 x̃: 1 helped stats (rel) min: 0.02% max: 7.55% x̄: 0.64% x̃: 0.46% 95% mean confidence interval for instructions value: -1.23 -1.17 95% mean confidence interval for instructions %-change: -0.67% -0.62% Instructions are helped. total cycles in shared programs: 146133660 -> 146134195 (<.01%) cycles in affected programs: 3991634 -> 3992169 (0.01%) helped: 1237 HURT: 153 helped stats (abs) min: 1 max: 2853 x̄: 6.93 x̃: 2 helped stats (rel) min: <.01% max: 29.00% x̄: 0.24% x̃: 0.14% HURT stats (abs) min: 1 max: 1740 x̄: 59.56 x̃: 12 HURT stats (rel) min: 0.03% max: 78.98% x̄: 1.96% x̃: 0.42% 95% mean confidence interval for cycles value: -5.13 5.90 95% mean confidence interval for cycles %-change: -0.17% 0.16% Inconclusive result (value mean confidence interval includes 0). LOST: 0 GAINED: 1 GM45 and Iron Lake had similar results (GM45 shown): total instructions in shared programs: 4800332 -> 4798380 (-0.04%) instructions in affected programs: 565995 -> 564043 (-0.34%) helped: 1451 HURT: 0 helped stats (abs) min: 1 max: 20 x̄: 1.35 x̃: 1 helped stats (rel) min: 0.05% max: 5.26% x̄: 0.47% x̃: 0.31% 95% mean confidence interval for instructions value: -1.40 -1.29 95% mean confidence interval for instructions %-change: -0.50% -0.45% Instructions are helped. total cycles in shared programs: 122032318 -> 122027798 (<.01%) cycles in affected programs: 8334868 -> 8330348 (-0.05%) helped: 1029 HURT: 1 helped stats (abs) min: 2 max: 40 x̄: 4.43 x̃: 2 helped stats (rel) min: <.01% max: 1.83% x̄: 0.09% x̃: 0.04% HURT stats (abs) min: 38 max: 38 x̄: 38.00 x̃: 38 HURT stats (rel) min: 0.25% max: 0.25% x̄: 0.25% x̃: 0.25% 95% mean confidence interval for cycles value: -4.70 -4.08 95% mean confidence interval for cycles %-change: -0.09% -0.08% Cycles are helped. Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Alejandro Piñeiro <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965/fs: Allow cmod propagation when src0 is a uniform or shader inputIan Romanick2018-03-261-1/+2
| | | | | | | | | | No shader-db changes. This source must have been written by a previous instruction, so it cannot be a uniform or a shader input. However, this change allows the next commit to help about 900 more shaders. Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Alejandro Piñeiro <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965: Add negative_equals methodsIan Romanick2018-03-267-0/+72
| | | | | | | | | | | | | | | | | | | | | | This method is similar to the existing ::equals methods. Instead of testing that two src_regs are equal to each other, it tests that one is the negation of the other. v2: Simplify various checks based on suggestions from Matt. Use src_reg::type instead of fixed_hw_reg.type in a check. Also suggested by Matt. v3: Rebase on 3 years. Fix some problems with negative_equals with VF constants. Add fs_reg::negative_equals. v4: Replace the existing default case with BRW_REGISTER_TYPE_UB, BRW_REGISTER_TYPE_B, and BRW_REGISTER_TYPE_NF. Suggested by Matt. Expand the FINISHME comment to better explain why it isn't already finished. Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Alejandro Piñeiro <[email protected]> [v3] Reviewed-by: Matt Turner <[email protected]>
* anv: Set genX_table for gen11Jordan Justen2018-03-231-0/+3
| | | | | Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]>
* anv: Add gen11 to anv_genX_callJordan Justen2018-03-231-0/+3
| | | | | Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]>
* android: Use local i915_drm.h rather than the system one.Kenneth Graunke2018-03-231-0/+2
| | | | | | Fixes: 2d26c9993389a8eb8f712 (intel: devinfo: meson: include drm uapi) Reviewed-by: Lionel Landwerlin <[email protected]> Tested-by: Clayton Craft <[email protected]>
* nir: Rename image intrinsics to image_varJason Ekstrand2018-03-233-36/+36
| | | | | | | | | | | Generated with git grep -l nir_intrinsic_image | xargs \ sed -i 's/nir_intrinsic_image/nir_intrinsic_image_var/g' and some manual fixing in nir_intrinsics.h Reviewed-by: Timothy Arceri <[email protected]>
* i965: perf: query topologyLionel Landwerlin2018-03-221-0/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | With the introduction of asymmetric slices in CNL, we cannot rely on the previous SUBSLICE_MASK getparam to tell userspace what subslices are available. We introduce a new uAPI in the kernel driver to report exactly what part of the GPU are fused and require this to be available on Gen10+. Prior generations can continue to rely on GETPARAM on older kernels. This patch is quite a lot of code because we have to support lots of different kernel versions, ranging from not providing any information (for Haswell on 4.13 through 4.17), to being able to query through GETPARAM (for gen8/9 on 4.13 through 4.17), to finally requiring 4.17 for Gen10+. This change stores topology information in a unified way on brw_context.topology from the various kernel APIs. And then generates the appropriate values for the equations from that unified topology. v2: Move slice/subslice masks fields to gen_device_info (Rafael) v3: Add a gen_device_info_subslice_available() helper (Lionel) Signed-off-by: Lionel Landwerlin <[email protected]> Acked-by: Rafael Antognolli <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* intel: devinfo: add helper functions to fill fusing masks valuesLionel Landwerlin2018-03-222-1/+140
| | | | | | | | | | | | | | | | | | | | | | There are a couple of ways we can get the fusing information from the kernel : - Through DRM_I915_GETPARAM with the SLICE_MASK/SUBSLICE_MASK parameters - Through the new DRM_IOCTL_I915_QUERY by requesting the DRM_I915_QUERY_TOPOLOGY_INFO The second method is more accurate and also gives us the EUs fusing masks. It's also a requirement for CNL as this platform has asymetric subslices and the first method SUBSLICE_MASK value is assumed uniform across slices. v2: Change gen_device_info_update_from_masks() to generate topology and call into gen_device_info_update_from_topology (Lionel/Ken) Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* intel: devinfo: meson: include drm uapiLionel Landwerlin2018-03-221-1/+1
| | | | | | | Already available with the autotools build. Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* intel: devinfo: store slice/subslice/eu masksLionel Landwerlin2018-03-222-1/+91
| | | | | | | | | | | We want to store values coming from the kernel but as a first step, we can generate mask values out the numbers already stored in the gen_device_info masks. v2: Add a helper to set EU masks (Lionel/Ken) Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* intel: devinfo: store number of EUs per subsliceLionel Landwerlin2018-03-222-2/+38
| | | | | | | | | | | This will be reused to store values reported by the kernel. The main use case will be for use as the input values of the metric sets equations for the INTEL_performance_queries extension. By storing this information in the gen_device_info we make this non GL specific so this can be reused by Vulkan if we ever have an equivalent extension. Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* anv/radv: autotools: include vulkan_*.h headersJuan A. Suarez Romero2018-03-221-0/+4
| | | | Reviewed-by: Emil Velikov <[email protected]>
* intel/compiler: Readd ICL to test_eu_validate.cppMatt Turner2018-03-221-0/+1
| | | | Now that the PCI IDs are upstream, this can be readded.
* intel/compiler: Skip 64-bit type tests when types not availableMatt Turner2018-03-221-0/+19
| | | | Reviewed-by: Kenneth Graunke <[email protected]>
* intel/compiler/icl: Clear "null render target" bit in extended message ↵Jason Ekstrand2018-03-222-0/+6
| | | | | | | | | descriptor Otherwise all our render target writes go no where. Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* intel/compiler/icl: Update the assert in brw_stage_has_packed_dispatch()Anuj Phogat2018-03-221-1/+1
| | | | | | | | Rafael ran piglit with the test code enabled and saw no additional GPU hangs. Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* intel/common/icl: Disable hiz surface samplingAnuj Phogat2018-03-221-0/+1
| | | | | | | | On gen11+ AUX_HIZ is not a supported value for surfaces being sampled by the 3D sampler. Reviewed-by: Rafael Antognolli <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* intel/common/icl: Add L3 configAnuj Phogat2018-03-221-0/+18
| | | | | | ICL uses the same L3 configs as CNL, just leaving the SLM configs out. Reviewed-by: Kenneth Graunke <[email protected]>
* intel/tools/aubinator: Drop platform list from print_help()Matt Turner2018-03-221-1/+1
| | | | | | | | We all know the platform names, and I don't want to update this list continually. Reviewed-by: Rafael Antognolli <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* anv/pipeline: don't pass constant view index in multiviewCaio Marcelo de Oliveira Filho2018-03-211-6/+11
| | | | | | | | | | | If view mask has only one bit set, view index is effectively a constant, so doesn't need to be passed to the next stages, just always set it. Part of this was in the original patch that added anv_nir_lower_multiview.c but disabled. Reviewed-by: Jason Ekstrand <[email protected]>
* anv/pipeline: use less instructions for multiviewCaio Marcelo de Oliveira Filho2018-03-211-1/+1
| | | | | | | | | | | | | | | The view_index is encoded in the remainder of dividing instance id by the number of views in the view mask (n). In the general case (handled by the else clause), there is a need to map from 0..n-1 into the number of the view being masked. For that a map is encoded. In the case only the first n bits in the mask are set, the mapping is trivial, 0..n-1 already represent what view is being referred to. That case was in the original patch that added anv_nir_lower_multiview.c but disabled. Reviewed-by: Jason Ekstrand <[email protected]>
* aubinator_error_decode: Compare only the class_name of the ring.Rafael Antognolli2018-03-211-1/+1
| | | | | | | | | | | ring_name is "<class_name> + <instance_id>" (e.g. rcs0). So we need to first compare the class name only, then get the instance id. Without this, INSTDONE is not being decoded. Signed-off-by: Rafael Antognolli <[email protected]> Cc: Chris Wilson <[email protected]> Reviewed-by: Chris Wilson <[email protected]>