summaryrefslogtreecommitdiffstats
path: root/src/intel
Commit message (Collapse)AuthorAgeFilesLines
* anv: wire up the state_pool_padding testEmil Velikov2019-02-051-0/+5
| | | | | | | | | | Cc: Jason Ekstrand <[email protected]> Fixes: 927ba12b53c ("anv/tests: Adding test for the state_pool padding.") Signed-off-by: Emil Velikov <[email protected]> Reviewed-by: Eric Engestrom <[email protected]> Reviewed-by: Rafael Antognolli <[email protected]><Paste> Reviewed-by: Dylan Baker <[email protected]> (cherry picked from commit 8943eb8f03fe67710ce65fc0a54024751ff2b5bd)
* anv: Fix VK_EXT_transform_feedback working with varyings packed in PSIZDanylo Piliaiev2019-02-041-3/+20
| | | | | | | | | | | | | | Transform feedback did not set correct SO_DECL.ComponentMask for varyings packed in VARYING_SLOT_PSIZ: gl_Layer - VARYING_SLOT_LAYER in VARYING_SLOT_PSIZ.y gl_ViewportIndex - VARYING_SLOT_VIEWPORT in VARYING_SLOT_PSIZ.z gl_PointSize - VARYING_SLOT_PSIZ in VARYING_SLOT_PSIZ.w Fixes: 36ee2fd61c8f94 "anv: Implement the basic form of VK_EXT_transform_feedback" Signed-off-by: Danylo Piliaiev <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]> (cherry picked from commit 64d3b148fe71453c296ba9525f49ffe728171582)
* intel/fs: Do the grf127 hack on SIMD8 instructions in SIMD16 modeJason Ekstrand2019-02-041-7/+6
| | | | | | | | | | | | Previously, we only applied the fix to shaders with a dispatch mode of SIMD8 but the code it relies on for SIMD16 mode only applies to SIMD16 instructions. If you have a SIMD8 instruction in a SIMD16 shader, neither would trigger and the restriction could still be hit. Fixes: 232ed8980217dd "i965/fs: Register allocator shoudn't use grf127..." Reviewed-by: Jose Maria Casanova Crespo <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> (cherry picked from commit b4f0d062cd12b4f675bac900ac41d1085a79239a)
* intel/fs: Use split sends for surface writes on gen9+Jason Ekstrand2019-01-292-18/+47
| | | | | | | | | | | | | Surface reads don't need them because they just have the one address payload. With surface writes, on the other hand, we can put the address and the data in the different halves and avoid building the payload all together. The decrease in register pressure and added freedom in register allocation resulting from this change reduces spilling enough to improve the performance of one customer benchmark by about 2x. Reviewed-by: Iago Toral Quiroga <[email protected]>
* intel/fs: Add interference between SENDS sourcesJason Ekstrand2019-01-291-0/+27
| | | | Reviewed-by: Iago Toral Quiroga <[email protected]>
* intel/fs: Support SENDS in SHADER_OPCODE_SENDJason Ekstrand2019-01-293-8/+66
| | | | Reviewed-by: Iago Toral Quiroga <[email protected]>
* intel/disasm: Properly disassemble split sendsJason Ekstrand2019-01-291-19/+142
| | | | Reviewed-by: Iago Toral Quiroga <[email protected]>
* intel/eu: Add support for the SENDS[C] messagesJason Ekstrand2019-01-294-19/+255
| | | | Reviewed-by: Iago Toral Quiroga <[email protected]>
* intel/inst: Indent some codeJason Ekstrand2019-01-291-177/+183
| | | | | | | We're about to add some more if cases so let's have the giant re-indent in it's own patch to make review easier. Acked-by: Iago Toral Quiroga <[email protected]>
* intel/inst: Fix the ia16_addr_imm helpersJason Ekstrand2019-01-291-4/+5
| | | | | | | | These have clearly never seen any use.... On gen8, the bottom 4 bits are missing so we need to shift them off before we call set_bits and shift again when we get the bits. Found by inspection. Reviewed-by: Iago Toral Quiroga <[email protected]>
* intel/disasm: Rework SEND decoding to use descriptorsJason Ekstrand2019-01-291-36/+50
| | | | | | | | | | | | Instead of fetching the information out of the instruction directly, fetch the descriptor and then pluck the information out of the descriptor. The current scheme works ok for SEND but with SENDS, it all falls to pieces because the descriptor is completely shuffled around. This commit doesn't actually convert everything. One notable exception is URB messages which don't even use descriptors in emit_urb_WRITE yet. Reviewed-by: Iago Toral Quiroga <[email protected]>
* intel/eu: Add more message descriptor helpersJason Ekstrand2019-01-291-27/+216
| | | | | | | | | | | | | | | | We want to be able to extract data from descriptors as well as unify a bit of the descriptor construction. One of the unifications we do is to unify the read/write and dataport descriptors. On gen4-5, read/write are substantially different and the read descriptors change between gen4 and gen4.x. On gen6, they unified layouts between read, write, and dataport. Then, on gen8, they added one bit to the message type field but left it reserved MBZ for read/write messages. This commit chooses to treat that as if they expanded the field everywhere and just didn't have enough enum values for read/write to bother with the extra bit. Reviewed-by: Iago Toral Quiroga <[email protected]>
* intel/eu/validate: SEND restrictions also apply to SENDCJason Ekstrand2019-01-291-1/+2
| | | | Reviewed-by: Iago Toral Quiroga <[email protected]>
* intel/eu: Use GET_BITS in brw_inst_set_send_ex_descJason Ekstrand2019-01-291-5/+5
| | | | | | It's a bit more readable Reviewed-by: Iago Toral Quiroga <[email protected]>
* intel/fs: Use SHADER_OPCODE_SEND for varying UBO pulls on gen7+Jason Ekstrand2019-01-297-88/+25
| | | | Reviewed-by: Iago Toral Quiroga <[email protected]>
* intel/fs: Use SHADER_OPCODE_SEND for texturing on gen7+Jason Ekstrand2019-01-294-142/+177
| | | | Reviewed-by: Iago Toral Quiroga <[email protected]>
* intel/fs: Use a logical opcode for IMAGE_SIZEJason Ekstrand2019-01-294-6/+21
| | | | Reviewed-by: Iago Toral Quiroga <[email protected]>
* intel/fs: Use SHADER_OPCODE_SEND for surface messagesJason Ekstrand2019-01-295-214/+201
| | | | Reviewed-by: Iago Toral Quiroga <[email protected]>
* intel/fs: Add a generic SEND opcodeJason Ekstrand2019-01-299-3/+91
| | | | Reviewed-by: Iago Toral Quiroga <[email protected]>
* intel/eu: Rework surface descriptor helpersJason Ekstrand2019-01-292-234/+234
| | | | | | | | | | This commit pulls the surface descriptor helpers out into brw_eu.h and makes them no longer depend on the codegen infrastructure. This should allow us to use them directly from the IR code instead of the generator. This change is unfortunately less mechanical than perhaps one would like but it should be fairly straightforward. Reviewed-by: Iago Toral Quiroga <[email protected]>
* intel/eu: Add has_simd4x2 bools to surface_write functionsJason Ekstrand2019-01-291-6/+8
| | | | Reviewed-by: Iago Toral Quiroga <[email protected]>
* intel/fs: Take an explicit exec size in brw_surface_payload_size()Jason Ekstrand2019-01-291-20/+39
| | | | | | | | Instead of magically falling back to SIMD8 for atomics and typed messages on Ivy Bridge, explicitly figure out the exec size and pass that into brw_surface_payload_size. Reviewed-by: Iago Toral Quiroga <[email protected]>
* intel/fs: Handle IMAGE_SIZE in size_read() and is_send_from_grf()Jason Ekstrand2019-01-291-0/+2
| | | | | | | Like all the other sends, it's just mlen * REG_SIZE. Fixes: 3cbc02e4693 "intel: Use TXS for image_size when we have..." Reviewed-by: Iago Toral Quiroga <[email protected]>
* intel/defines: Explicitly cast to uint32_t in SET_FIELD and SET_BITSJason Ekstrand2019-01-291-2/+2
| | | | | | | | | If you pass a bool in as the value to set, the C standard says that it gets converted to an int prior to shifting. If you try to set a bool to bit 31, this lands you in undefined behavior. It's better just to add the explicit cast and let the compiler delete it for us. Reviewed-by: Iago Toral Quiroga <[email protected]>
* intel/fs: Get rid of fs_inst::equalsJason Ekstrand2019-01-292-23/+7
| | | | | | | | | | There are piles of fields that it doesn't check so using it is a lie. The only reason why it's not causing problem is because it has exactly one user which only uses it for MOV instructions (which aren't very interesting) and only on Sandy Bridge and earlier hardware. Just get rid of it and inline it in the one place that it's actually used. Reviewed-by: Iago Toral Quiroga <[email protected]>
* intel/compiler: Add a file-level description of brw_eu_validate.cMatt Turner2019-01-261-1/+13
| | | | | | Acked-by: Jose Maria Casanova Crespo <[email protected]> Reviewed-by: Iago Toral Quiroga <[email protected]> Reviewed-by: Francisco Jerez <[email protected]>
* android: fix build issues with libmesa_anv_gen* librariesTapani Pälli2019-01-251-0/+1
| | | | | | | We need this include path to find nir/nir_xfb_info.h. Signed-off-by: Tapani Pälli <[email protected]> Reviewed-by: Eric Engestrom <[email protected]>
* intel/batch-decoder: fix a vb end address calculationAndrii Simiklit2019-01-251-1/+3
| | | | | | | | | | According to the loop implementation (in 'ctx_print_buffer' function), which advances dword by dword over vertex buffer(vb), the vb size should be aligned by 4 bytes too. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109449 Signed-off-by: Andrii Simiklit <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]>
* intel/batch-decoder: fix vertex buffer size calculation for gen<8Andrii Simiklit2019-01-251-1/+1
| | | | | | | | | | | | | | | | It should be incremented by one according to how it is calculated by 'emit_vertex_buffer_state': "\#if GEN_GEN < 8 .BufferAccessType = step_rate ? INSTANCEDATA : VERTEXDATA, .InstanceDataStepRate = step_rate, \#if GEN_GEN >= 5 .EndAddress = ro_bo(bo, end_offset - 1), \#endif \#endif" Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109449 Signed-off-by: Andrii Simiklit <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]>
* anv: drop always-successful VkResultEric Engestrom2019-01-251-9/+4
| | | | | Signed-off-by: Eric Engestrom <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* anv/allocator: Avoid race condition in anv_block_pool_map.Rafael Antognolli2019-01-242-6/+27
| | | | | | | | | | | | | | | | | Accessing bo->map and then pool->center_bo_offset without a lock is racy. One way of avoiding such race condition is to store the bo->map + center_bo_offset into pool->map at the time the block pool is growing, which happens within a lock. v2: Only set pool->map if not using softpin (Jason). v3: Move things around and only update center_bo_offset if not using softpin too (Jason). Cc: Jason Ekstrand <[email protected]> Reported-by: Ian Romanick <[email protected]> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109442 Fixes: fc3f58832015cbb177179e7f3420d3611479b4a9 Reviewed-by: Jason Ekstrand <[email protected]>
* intel/compiler: Reset default flag register in brw_find_live_channel()Matt Turner2019-01-231-2/+11
| | | | | | | | | | | | emit_uniformize() emits SHADER_OPCODE_FIND_LIVE_CHANNEL with its flag_subreg set, so that the IR knows which flag is accessed. However the flag is only used on Gen7 in Align1 mode. To avoid setting unnecessary bits in the instruction words, get the information we need and reset the default flag register. This allows round-tripping through the assembler/disassembler. Reviewed-by: Francisco Jerez <[email protected]>
* anv: Implement transform feedback queriesJason Ekstrand2019-01-223-2/+73
| | | | Reviewed-by: Lionel Landwerlin <[email protected]>
* genxml: Add SO_PRIM_STORAGE_NEEDED and SO_NUM_PRIMS_WRITTENJason Ekstrand2019-01-226-0/+192
| | | | Reviewed-by: Lionel Landwerlin <[email protected]>
* anv: Implement CmdBegin/EndQueryIndexedJason Ekstrand2019-01-221-1/+20
| | | | Reviewed-by: Lionel Landwerlin <[email protected]>
* anv: Implement vkCmdDrawIndirectByteCountEXTJason Ekstrand2019-01-222-1/+148
| | | | | | | | Annoyingly, this requires that we implement integer division on the command streamer. Fortunately, we're only ever dividing by constants so we can use the mulh+add+shift trick and it's not as bad as it sounds. Reviewed-by: Lionel Landwerlin <[email protected]>
* anv: Implement the basic form of VK_EXT_transform_feedbackJason Ekstrand2019-01-227-2/+329
| | | | Reviewed-by: Lionel Landwerlin <[email protected]>
* anv: Add pipeline cache support for xfb_infoJason Ekstrand2019-01-224-9/+52
| | | | Reviewed-by: Lionel Landwerlin <[email protected]>
* anv: Add but do not enable VK_EXT_transform_feedbackJason Ekstrand2019-01-221-0/+1
| | | | Reviewed-by: Lionel Landwerlin <[email protected]>
* anv: Always emit at least one vertex elementJason Ekstrand2019-01-221-3/+1
| | | | | | | | | This seems to make the simulator happier. The early return wasn't really protecting anything and the code that follows will happily initialize the dummy element to STORE_0 and emit it. Reviewed-by: Alejandro Piñeiro <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]>
* anv/pipeline: Add a pdevice helper variableJason Ekstrand2019-01-211-9/+10
| | | | | Reviewed-by: Lionel Landwerlin <[email protected]> Reviewed-by: Iago Toral Quiroga <[email protected]>
* anv: Only parse pImmutableSamplers if the descriptor has samplersJason Ekstrand2019-01-211-12/+31
| | | | | | | Cc: [email protected] Reviewed-by: Tapani Pälli <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]> Reviewed-by: Iago Toral Quiroga <[email protected]>
* nir: replace more nir_load_system_value calls with builder functionsKarol Herbst2019-01-211-2/+1
| | | | | | Signed-off-by: Karol Herbst <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* nir: rename nir_var_ssbo to nir_var_mem_ssboKarol Herbst2019-01-191-1/+1
| | | | | | | | Signed-off-by: Karol Herbst <[email protected]> Acked-by: Jason Ekstrand <[email protected]> Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* nir: rename nir_var_ubo to nir_var_mem_uboKarol Herbst2019-01-191-1/+1
| | | | | | | | Signed-off-by: Karol Herbst <[email protected]> Acked-by: Jason Ekstrand <[email protected]> Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* nir: rename nir_var_function to nir_var_function_tempKarol Herbst2019-01-192-9/+9
| | | | | | | | Signed-off-by: Karol Herbst <[email protected]> Acked-by: Jason Ekstrand <[email protected]> Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* intel/genxml: add missing MI_PREDICATE compare operationsLionel Landwerlin2019-01-197-1/+12
| | | | | | | | Doesn't save us a great deal of lines but at least they get decoded in aubinators. Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Rafael Antognolli <[email protected]>
* anv: document cache flushes & invalidationsLionel Landwerlin2019-01-191-0/+67
| | | | | | | | | | | | | | | | | A little bit of explanation regarding how vkCmdPipelineBarrier() works. v2: Avoid referring to data port cache when it's actually sampler caches (Jason) Complete explanation for indirect draws (Jason) v3: s/samplers/sampler/ (Jason) s/UBOs/data port/ Add documentation for VK_ACCESS_CONDITIONAL_RENDERING_READ_BIT_EXT (Lionel) Signed-off-by: Lionel Landwerlin <[email protected]> Acked-by: Eric Engestrom <[email protected]> (v1) Reviewed-by: Jason Ekstrand <[email protected]> (v2)
* anv: narrow flushing of the render target to buffer writesLionel Landwerlin2019-01-196-20/+15
| | | | | | | | | | | | | | In commit 9a7b3199037ac4 ("anv/query: flush render target before copying results") we tracked all the render target writes to apply a flushes in the vkCopyQueryResults(). But we can narrow this down to only when we write a buffer (which is the only input of vkCopyQueryResults). v2: Drop newer render target write flags introduce by 1952fd8d2ce905 ("anv: Implement VK_EXT_conditional_rendering for gen 7.5+") Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]> (v1)
* intel/fs: Promote execution type to 32-bit when any half-float conversion is ↵Francisco Jerez2019-01-181-0/+21
| | | | | | | | | | | | | | | needed. The docs are fairly incomplete and inconsistent about it, but this seems to be the reason why half-float destinations are required to be DWORD-aligned on BDW+ projects. This way the regioning lowering pass will make sure that the destination components of W to HF and HF to W conversions are aligned like the corresponding conversion operation with 32-bit execution data type. Tested-by: Iago Toral Quiroga <[email protected]> Reviewed-by: Iago Toral Quiroga <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>