path: root/src/intel
Commit message (Collapse)AuthorAgeFilesLines
* anv/query: Use snooping on !LLC platformsJason Ekstrand2017-04-071-13/+11
| | | | | | | | | | | | | | Commit b2c97bc789198427043cd902bc76e194e7e81c7d which made us start using a busy-wait for individual query results also messed up cache flushing on !LLC platforms. For one thing, I forgot the mfence after the clflush so memory access wasn't properly getting fenced. More importantly, however, was that we were clflushing the whole query range and then waiting for individual queries and then trying to read the results without clflushing again. Getting the clflushing both correct and efficient is very subtle and painful. Instead, let's side-step the problem by just snooping. Reviewed-by: Chris Wilson <[email protected]>
* anv: provide anv_gem_busy() stub for the testsEmil Velikov2017-04-071-0/+6
| | | | | | | | | | | Otherwise linking may fail. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100600 Fixes: f195d40eca4 ("anv/device: Add a helper for querying whether a BO is busy") Signed-off-by: Emil Velikov <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]> Reviewed-by: Juan A. Suarez Romero <[email protected]> Tested-by: Vinson Lee <[email protected]>
* anv/blorp: sample input attachments with resolves on BDWSamuel Iglesias Gonsálvez2017-04-071-0/+11
| | | | | | | | | | | | | | On Broadwell we still need to do a resolve between the subpass that writes and the subpass that reads when there is a self-dependency, because the HW cannot see fast-clears and works on the render cache as if it were a regular non-fast-clear surface. Fixes 16 tests on BDW: dEQP-VK.renderpass.formats.*.input.clear.store.self_dep* Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* intel/aubinator: Stop searching after a custom handler is foundJordan Justen2017-04-061-1/+3
| | | | | Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]>
* intel/gen_decoder: return -1 for unknown command formatsJordan Justen2017-04-063-13/+23
| | | | | | | | | | | | | | | | | Decoding with aubinator encountered a command of 0xffffffff. With the previous code, it caused aubinator to jump 255 + 2 dwords to start decoding again. Instead we can attempt to detect the known instruction formats. If the format is not recognized, then we can advance just 1 dword. v2: * Update aubinator_error_decode * Actually convert the length variable returned into a *signed* integer in aubinator.c, intel_batchbuffer.c and aubinator_error_decode.c. Signed-off-by: Jordan Justen <[email protected]> Acked-by: Lionel Landwerlin <[email protected]>
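The recovery strategy described above can be sketched as follows. This is a simplified illustration, not the gen_decoder code: the names `sketch_packet_length` and `sketch_count_packets` are hypothetical, and the format check (command type in the top 3 bits, with a 6-bit or 8-bit length field) is an assumption made for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the decoder's length lookup. Returns the packet
 * length in dwords, or -1 when the header format is not recognized.
 * The format rules below are simplified assumptions for this sketch. */
static int sketch_packet_length(uint32_t header)
{
   uint32_t type = header >> 29;            /* top 3 bits: command type */
   if (type == 0)                            /* assumed: length in bits 5:0 */
      return (int)(header & 0x3f) + 2;
   if (type == 3)                            /* assumed: length in bits 7:0 */
      return (int)(header & 0xff) + 2;
   return -1;                                /* unknown format */
}

/* Walk a batch and count recognized packets. On an unknown header, advance
 * by exactly one dword instead of trusting a bogus length field. */
static size_t sketch_count_packets(const uint32_t *batch, size_t ndwords,
                                   size_t *skipped)
{
   size_t count = 0;
   *skipped = 0;
   for (size_t i = 0; i < ndwords; ) {
      int length = sketch_packet_length(batch[i]);   /* signed on purpose */
      if (length < 0) {
         (*skipped)++;
         i += 1;
         continue;
      }
      count++;
      i += (size_t)length;
   }
   return count;
}
```

Keeping `length` signed is the point of the v2 note: if the caller stored the result in an unsigned variable, the -1 sentinel would turn into a very large jump.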
* intel/gen_decoder: Fix length for Media State/Object commandsJordan Justen2017-04-061-2/+10
| | | | | | | | From BDW PRM, Volume 6: Command Stream Programming, 'Render Command Header Format'. Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]>
* intel/aubinator_error_decode: Fix structure decode dataJordan Justen2017-04-061-1/+1
| | | | | | | | | | The call to gen_print_group should provide a pointer to the beginning of the structure data, not the start of the batch data. Cc: Lionel Landwerlin <[email protected]> Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]>
* anv/query: Busy-wait for available query entriesJason Ekstrand2017-04-051-6/+56
| | | | | | | | | | | | | | | | | | | Before, we were just looking at whether or not the user wanted us to wait and waiting on the BO. Some clients, such as the Serious engine, use a single query pool for hundreds of individual query results where the writes for those queries may be split across several command buffers. In this scenario, the individual query we're looking for may become available long before the BO is idle so waiting on the query pool BO to be finished is wasteful. This commit makes us instead busy-loop on each query until it's available. This significantly reduces pipeline bubbles and improves performance of The Talos Principle on medium settings (where the GPU isn't overloaded with drawing) by around 20% on my SkyLake gt4. Reviewed-by: Chris Wilson <[email protected]> Tested-by: Eero Tamminen <[email protected]> Tested-by: Grazvydas Ignotas <[email protected]>
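A minimal sketch of the per-query busy-loop idea, under stated assumptions: the slot layout (`sketch_query_slot`) and the function name are hypothetical, and the spin bound `max_spins` exists only so the example cannot hang. A real driver would presumably need a different exit condition, such as checking whether the BO is still busy.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout: one slot per query, with an availability word that
 * is written once the result is valid. */
struct sketch_query_slot {
   volatile uint64_t available;
   uint64_t value;
};

/* Spin on a single slot rather than waiting for the whole pool to go idle.
 * Returns true if the query became available within max_spins polls. */
static bool sketch_wait_for_query(const struct sketch_query_slot *slot,
                                  uint64_t max_spins)
{
   for (uint64_t i = 0; i < max_spins; i++) {
      if (slot->available)
         return true;
   }
   /* One last look so that max_spins == 0 still reports a ready query. */
   return slot->available != 0;
}
```

The design choice is that each query is polled independently, so a result written early in a long chain of command buffers can be returned without waiting for the rest of the work that touches the same pool.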
* anv/device: Add a helper for querying whether a BO is busyJason Ekstrand2017-04-053-6/+47
| | | | | | This is a bit more efficient than using GEM_WAIT with a timeout of 0. Reviewed-by: Chris Wilson <[email protected]>
* anv: provide required gem stubs for the testsEmil Velikov2017-04-051-0/+19
| | | | | | | | | | | | | | | | | | Introduce stubs to anv_gem_stub.c that match the anv_gem.c ones. Otherwise we may get link-time errors when building the tests. v2: Introduce all the missing stubs at once. Cc: Jason Ekstrand <[email protected]> Cc: Vinson Lee <[email protected]> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100574 Fixes: c964f0e485d ("anv: Query the kernel for reset status") Fixes: 651ec926fc1 ("anv: Add support for 48-bit addresses") Fixes: 060a6434eca ("anv: Advertise larger heap sizes") Signed-off-by: Emil Velikov <[email protected]> --- I've intentionally kept the order identical to that of anv_gem.c. This way we can easily grep & diff in the future ;-)
* intel: genxml: automake: include gen_bits_header.py in the tarballEmil Velikov2017-04-051-0/+1
| | | | Signed-off-by: Emil Velikov <[email protected]>
* intel: genxml: automake: polish automake rulesEmil Velikov2017-04-051-2/+2
| | | | Signed-off-by: Emil Velikov <[email protected]>
* anv: Advertise larger heap sizesJason Ekstrand2017-04-043-14/+75
| | | | | | | | | | | Instead of just advertising the aperture size, we do something more intelligent. On systems with a full 48-bit PPGTT, we can address 100% of the available system RAM from the GPU. In order to keep clients from burning 100% of your available RAM for graphics resources, we have a nice little heuristic (which has received exactly zero tuning) to keep things under a reasonable level of control. Reviewed-by: Kristian H. Kristensen <[email protected]>
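The commit message does not spell out the heuristic, so the following is only an illustration of what such a cap could look like. The thresholds (half of RAM at or below 4 GiB, three quarters above it) and the name `sketch_heap_size` are assumptions for this sketch, not values taken from the text above.

```c
#include <stdint.h>

/* Illustrative heap-size cap. All thresholds are assumptions:
 *  - with 4 GiB of RAM or less, advertise at most half of it;
 *  - with more, advertise at most three quarters;
 *  - never advertise more than the GPU address space can map. */
static uint64_t sketch_heap_size(uint64_t total_ram, uint64_t gtt_size)
{
   const uint64_t four_gib = 4ull << 30;
   uint64_t available_ram;

   if (total_ram <= four_gib)
      available_ram = total_ram / 2;
   else
      available_ram = total_ram / 4 * 3;   /* divide first to avoid overflow */

   return available_ram < gtt_size ? available_ram : gtt_size;
}
```

With a full 48-bit PPGTT the address-space limit is far above any realistic RAM size, so the RAM-based cap is what decides the advertised heap size.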
* anv: Add support for 48-bit addressesJason Ekstrand2017-04-045-0/+54
| | | | | | | | | | | | | | | | | | This commit adds support for using the full 48-bit address space on Broadwell and newer hardware. Thanks to certain limitations, not all objects can be placed above the 32-bit boundary. In particular, general and state base address need to live within 32 bits. (See also Wa32bitGeneralStateOffset and Wa32bitInstructionBaseOffset.) In order to handle this, we add a supports_48bit_address field to anv_bo and only set EXEC_OBJECT_SUPPORTS_48B_ADDRESS if that bit is set. We set the bit for all client-allocated memory objects but leave it false for driver-allocated objects. While this is more conservative than needed, all driver allocations should easily fit in the first 32 bits of address space and keeps things simple because we don't have to think about whether or not any given one of our allocation data structures will be used in a 48-bit-unsafe way. Reviewed-by: Kristian H. Kristensen <[email protected]>
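The per-BO opt-in described above could be sketched like this. The struct and function names are hypothetical, and `SKETCH_EXEC_OBJECT_SUPPORTS_48B_ADDRESS` is a local stand-in value for the example; the real flag comes from the kernel's i915 uAPI header.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for the kernel flag; the value is an assumption for this sketch. */
#define SKETCH_EXEC_OBJECT_SUPPORTS_48B_ADDRESS (1u << 3)

/* Hypothetical, trimmed-down buffer object. */
struct sketch_bo {
   uint64_t size;
   bool supports_48bit_address;
};

/* Client-allocated memory opts in to high addresses; driver-internal
 * allocations stay below the 32-bit boundary. */
static struct sketch_bo sketch_bo_init(uint64_t size, bool client_allocated)
{
   struct sketch_bo bo = {
      .size = size,
      .supports_48bit_address = client_allocated,
   };
   return bo;
}

/* Compute the execbuf object flags for one BO. */
static uint32_t sketch_exec_object_flags(const struct sketch_bo *bo)
{
   uint32_t flags = 0;
   if (bo->supports_48bit_address)
      flags |= SKETCH_EXEC_OBJECT_SUPPORTS_48B_ADDRESS;
   return flags;
}
```

Defaulting to false keeps the conservative behaviour the message describes: anything the driver allocates for itself never has to be audited for 48-bit safety.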
* anv: Replace anv_bo::is_winsys_bo with a uint32_t flagsJason Ekstrand2017-04-043-9/+11
| | | | Reviewed-by: Kristian H. Kristensen <[email protected]>
* anv/blorp: Align vertex buffers to 64BJason Ekstrand2017-04-041-1/+14
| | | | | | | | | | This fixes issues seen when adding support for full 48-bit addresses. The 48-bit addresses themselves have nothing to do with it other than that it caused the kernel to place buffers slightly differently so they interacted differently with the caches. Reviewed-by: Kenneth Graunke <[email protected]> Cc: "13.0 17.0" <[email protected]>
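For illustration, rounding an offset up to a 64-byte boundary can be done with the usual power-of-two trick. The helper name `sketch_align_u32` is hypothetical and this is not the code from the commit.

```c
#include <stdint.h>

/* Round value up to the next multiple of alignment.
 * alignment must be a non-zero power of two. */
static uint32_t sketch_align_u32(uint32_t value, uint32_t alignment)
{
   return (value + alignment - 1) & ~(alignment - 1);
}
```

For example, `sketch_align_u32(offset, 64)` leaves an offset that is already a multiple of 64 unchanged and bumps any other offset to the next 64-byte boundary.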
* anv: Query the kernel for reset statusJason Ekstrand2017-04-044-40/+107
| | | | | | | | | | | | When a client causes a GPU hang (or experiences issues due to a hang in another client) we want to let it know as soon as possible. In particular, if it submits work with a fence and calls vkWaitForFences or vkQueueWaitIdle and it returns VK_SUCCESS, then the client should be able to trust the results of that rendering. In order to provide this guarantee, we have to ask the kernel for context status in a few key locations. Reviewed-by: Kenneth Graunke <[email protected]>
* anv: Check for device loss at the end of WaitForFencesJason Ekstrand2017-04-041-5/+14
| | | | | | | It's possible that the device could have been lost while we were waiting. We should let the user know if this has happened. Reviewed-by: Kenneth Graunke <[email protected]>
* anv/pipeline: Properly handle unset gl_Layer and gl_ViewportIndexJason Ekstrand2017-04-041-3/+24
| | | | | | | | | | When the shader does not set one of these values, they are supposed to get a default value of 0. We have hardware bits in 3DSTATE_CLIP for this but haven't been setting them. This fixes the intermittent failure of dEQP-VK.geometry.layered.3d.render_to_default_layer. Reviewed-by: Kenneth Graunke <[email protected]> Cc: "13.0 17.0" <[email protected]>
* i965/fs: Always provide a default LOD of 0 for TXS and TXLJason Ekstrand2017-04-041-9/+9
| | | | | | | | | | | | | We already provide a default LOD for textureQueryLevels and texture() on non-fragment stages. However, there are more cases where one is needed such as textureSize(gsampler2DMS*) in SPIR-V. Instead of trying to list out all of the cases one at a time, just provide the default for all TXS and TXL operations. This fixes a shader validation error in the new Sascha deferredmultisampling demo which uses textureSize(gsampler2DMS). Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100391 Reviewed-by: Anuj Phogat <[email protected]> Cc: "13.0 17.0" <[email protected]>
* intel/isl: Refactor and clarify gen8 alignment calculationsJason Ekstrand2017-04-041-15/+49
| | | | | | | | Adding the actual table from the docs makes it clearer exactly what the restrictions are. In particular, it becomes clear that compressed textures ignore the alignment parameters in RENDER_SURFACE_STATE. Reviewed-by: Chad Versace <[email protected]>
* intel: tools: add aubinator_error_decode toolLionel Landwerlin2017-04-045-1/+766
| | | | | | | | | | | | | | | | This is pretty much the same tool as what i-g-t has, only with a more fancy decoding of the instructions/registers. It also doesn't support anything before gen4. v2 (from Matt): Drop authors Remove undefined automake variable v3: Fix incorrect offsets for dword > 1 (Jordan) v4: Fix decompression error with large blobs (Jordan) Signed-off-by: Lionel Landwerlin <[email protected]> Acked-by: Matt Turner <[email protected]>
* intel: genxml: add RING_BUFFER_CTL registersLionel Landwerlin2017-04-045-0/+272
| | | | | Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* intel: genxml: add FAULT_REG registerLionel Landwerlin2017-04-045-0/+206
| | | | | Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* intel: genxml: add gen7 ERR_INT registerLionel Landwerlin2017-04-042-0/+22
| | | | | | | v2: add register to gen7.5 (Matt) Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* intel: genxml: add ACTHD registersLionel Landwerlin2017-04-042-0/+32
| | | | | Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* intel: genxml: add GFX_ARB_ERROR_RPT registerLionel Landwerlin2017-04-045-0/+73
| | | | | Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* intel: genxml: add INSTDONE registersLionel Landwerlin2017-04-045-0/+387
| | | | | Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* android: intel: genxml: fix genX_xml.h generation rulesMauro Rossi2017-04-041-0/+5
| | | | | | | | | | | | | Recent changes in Makefile.sources merged the aubinator files in a unique list of generated files and genxml/genX_xml.h is now needed to avoid the following building error: ninja: error: '.../genxml/genX_xml.h', needed by '.../genxml/genX_xml.h', missing and no known rule to make it build/core/ninja.mk:148: recipe for target 'ninja_wrapper' failed Fixes: 0f83c05 "intel: genxml: compress all gen files into one" Acked-by: Lionel Landwerlin <[email protected]>
* intel/vec4: Add some fall through commentsJason Ekstrand2017-04-031-0/+4
| | | | Reviewed-by: Matt Turner <[email protected]>
* anv: Implement VK_KHR_incremental_presentJason Ekstrand2017-04-033-2/+15
| | | | | Reviewed-by: Lionel Landwerlin <[email protected]> Reviewed-by: Daniel Stone <[email protected]>
* vulkan/wsi: Plumb present regions through the common codeJason Ekstrand2017-04-031-1/+2
| | | | | | Reviewed-by: Lionel Landwerlin <[email protected]> Reviewed-by: Daniel Stone <[email protected]> Acked-by: Dave Airlie <[email protected]>
* aubinator/gen_decoder/i965: decode instructions from dword 0Lionel Landwerlin2017-04-033-8/+20
| | | | | | | | | Some packets like 3DSTATE_VF_STATISTICS, 3DSTATE_DRAWING_RECTANGLE, 3DPRIMITIVE, PIPELINE_SELECT, etc... have configurable fields in dword0, we probably want to print those. Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* intel: gen_decoder: store pointer to current decoded field in iteratorLionel Landwerlin2017-04-032-25/+26
| | | | | Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* intel: genxml: fix out of tree buildsLionel Landwerlin2017-03-311-2/+2
| | | | | | | | v2: use Emil's recommendation; change the rule to be closer to that of genxml/genX_bits.h Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Emil Velikov <[email protected]>
* anv: change BLOCK_POOL_MEMFD_SIZE to 1GBTapani Pälli2017-03-311-2/+2
| | | | | | | | This allows us to run 32-bit Vulkan apps on Android; the ftruncate call would fail at 2GB (the max size being 2GB - 1). Signed-off-by: Tapani Pälli <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
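The size limit can be checked with a small sketch, assuming the file offset type is a signed 32-bit integer on the target (which is what a maximum of 2GB - 1 implies). The helper name `sketch_fits_in_off32` is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if size can be represented in a signed 32-bit file offset,
 * i.e. it is at most 2^31 - 1 bytes. */
static bool sketch_fits_in_off32(uint64_t size)
{
   return size <= (uint64_t)INT32_MAX;
}
```

A 2GB size (2^31 bytes) is one byte past that limit, while 1GB (2^30 bytes) fits comfortably.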
* android: add libmesa_genxml as dep to libmesa_islTapani Pälli2017-03-311-1/+2
| | | | | | | | | This is to fix the following compile error with libmesa_isl: mesa/src/intel/isl/isl.c:28:10: fatal error: 'genxml/genX_bits.h' file not found Fixes: f0eaf38 ("genxml: New generated header genX_bits.h (v6)") Signed-off-by: Tapani Pälli <[email protected]> Reviewed-by: Emil Velikov <[email protected]>
* aubinator: enable snb/ilk through --genLionel Landwerlin2017-03-311-0/+2
| | | | | Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Jordan Justen <[email protected]>
* intel: genxml: compress all gen files into oneLionel Landwerlin2017-03-315-62/+58
| | | | | | | | | | | | | Combining all the files into a single string didn't make any difference in the size of the aubinator binary. With this change we now also embed gen4/4.5/5 descriptions, which increases the aubinator size by ~16Kb. v2 (Lionel): rebase makefiles Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Jordan Justen <[email protected]>
* intel: Add INTEL_CFLAGS to aubinator CFLAGS.Kenneth Graunke2017-03-301-1/+2
| | | | It still needs intel_aub.h. Fixes the build.
* intel: automake: move INTEL_CFLAGS as applicableEmil Velikov2017-03-302-1/+1
| | | | | | | | | | Only common/decoder.[ch] requires it [for intel_aub.h]. v2: The code was moved from intel/tools to intel/common, update accordingly. Signed-off-by: Emil Velikov <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* intel: android: remove libdrm_intel requirementEmil Velikov2017-03-303-12/+6
| | | | | | | | The only part which requires libdrm_intel, tools/aubinator, is not built on Android. Signed-off-by: Emil Velikov <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* anv/cmd_buffer: fix host memory leakCraig Stout2017-03-291-1/+9
| | | | | | | | push_constants must be free'd. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100452 Reviewed-by: Jason Ekstrand <[email protected]> Cc: "17.0 13.0" <[email protected]>
* anv/batch_chain: Handle another OOM in cmd_buffer_execbufJason Ekstrand2017-03-291-2/+4
| | | | | | Found by inspection while rebasing other patches. Reviewed-by: Topi Pohjolainen <[email protected]>
* i965: expose BRW_OPCODE_[F32TO16/F16TO32] name on gen8+Alejandro Piñeiro2017-03-291-0/+9
| | | | | | | | | | | | | | | | | | | | | | Technically those hw operations are only available on gen7, as gen8+ support the conversion on the MOV. But, when using the builder to implement nir operations (example: nir_op_fquantize2f16), it is not needed to do the gen check. This check is done later, on the final emission at brw_F32TO16 (brw_eu_emit), choosing between the MOV or the specific operation accordingly. So in the middle, during optimization phases those hw operations can be around for gen8+ too. Without this patch, several (at least 95) vulkan-cts quantize tests crash when using INTEL_DEBUG=optimizer. For example: dEQP-VK.spirv_assembly.instruction.graphics.opquantize.too_small_vert v2: simplify the code using GEN_GE (Ilia Mirkin) v3: tweak brw_instruction_name instead of changing opcode_descs table, that is used for validation (Matt Turner) Reviewed-by: Matt Turner <[email protected]>
* anv/cmd_buffer: Refactor flush_pipeline_select_*Jason Ekstrand2017-03-281-26/+16
| | | | | | | While having the _3d and _gpgpu versions is nice, there's no reason why we need to have duplicated logic for tracking the current pipeline. Reviewed-by: Iago Toral Quiroga <[email protected]>
* anv: Flush caches prior to PIPELINE_SELECT on all gensJason Ekstrand2017-03-281-2/+1
| | | | | | | | | | | | | | | | The programming note that says we need to do this still exists in the SkyLake PRM and, from looking at the bspec, seems like it may apply to all hardware generations SNB+. Unfortunately, this isn't particularly clear cut since there is also language in the bspec that says you can skip the flushing and stall to get better throughput. Experimentation with the "Car Chase" benchmark in GL seems to indicate that some form of flushing is still needed. This commit makes us do the full set of flushes regardless of hardware generation. We can always reduce the flushing later. Reported-by: Topi Pohjolainen <[email protected]> Reviewed-by: Iago Toral Quiroga <[email protected]> Cc: "17.0 13.0" <[email protected]>
* anv/cmd_buffer: Fix bad indentationJason Ekstrand2017-03-281-24/+25
| | | | | | | | A bunch of code was indented in such a way that it looked like it went with the if statement above but it definitely didn't. Reviewed-by: Iago Toral Quiroga <[email protected]> Cc: "17.0 13.0" <[email protected]>
* anv/cmd_buffer: Apply flush operations prior to executing secondariesJason Ekstrand2017-03-281-0/+5
| | | | | | | This fixes rendering issues in the Vulkan port of skia on some hardware. Reviewed-by: Lionel Landwerlin <[email protected]> Cc: "13.0 17.0" <[email protected]>
* anv/blorp: Use anv_get_layerCount everywhereJason Ekstrand2017-03-281-8/+12
| | | | | Reviewed-by: Lionel Landwerlin <[email protected]> Cc: "13.0 17.0" <[email protected]>