Commit message | Author | Age | Files | Lines
* radv: Avoid binning RAVEN hangs. | Bas Nieuwenhuizen | 2019-08-08 | 1 | -1/+2
    Mirroring radeonsi.
    CC: <[email protected]>
    Reviewed-by: Dave Airlie <[email protected]>
* radv: Fix off by one for S_028C48_MAX_ALLOC_COUNT. | Bas Nieuwenhuizen | 2019-08-08 | 1 | -1/+1
    Reviewed-by: Dave Airlie <[email protected]>
* swr/rasterizer: modernize thread TLB | Jan Zielinski | 2019-08-08 | 14 | -30/+135
    Reviewed-by: Alok Hota <[email protected]>
* swr/rasterizer: Refactor events collection mechanism | Jan Zielinski | 2019-08-08 | 10 | -439/+382
    Several improvements and cleanups in the events and statistics mechanisms.
    Reviewed-by: Alok Hota <[email protected]>
* swr/rasterizer: improvements in simdlib | Jan Zielinski | 2019-08-08 | 17 | -492/+49
    1. Fix build issues with the MSVC 2019 compiler. MSVC 2019 seems to have
       an issue with optimized code-gen when using the _mm256_and_si256()
       intrinsic. Disable the integer vpand intrinsic only on buggy versions
       of MSVC 2019; otherwise keep using it.
    2. Remove unused vec/matrix functionality.
    Reviewed-by: Alok Hota <[email protected]>
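A minimal sketch of the kind of workaround described in the entry above, not the actual simdlib change (which is C++ template code); the helper name and the affected _MSC_VER range are assumptions:

```c
/* Hypothetical workaround sketch: do the bitwise AND in the float domain
 * on compiler versions where the integer vpand intrinsic is miscompiled.
 * The _MSC_VER range below is an assumed placeholder for the affected
 * MSVC 2019 releases. */
#include <immintrin.h>

static inline __m256i
and_si256_workaround(__m256i a, __m256i b)
{
#if defined(_MSC_VER) && (_MSC_VER >= 1920) && (_MSC_VER <= 1922) /* assumed range */
   /* Same bit pattern, but avoids the buggy integer intrinsic. */
   return _mm256_castps_si256(_mm256_and_ps(_mm256_castsi256_ps(a),
                                            _mm256_castsi256_ps(b)));
#else
   return _mm256_and_si256(a, b);
#endif
}
```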
* swr/rasterizer: Events are now grouped and enabled by knobs | Jan Zielinski | 2019-08-08 | 15 | -202/+372
    All events are now grouped as follows:
    - Framework (e.g. ThreadStart) [always ON]
    - Api (e.g. SwrSync) [always ON]
    - Pipeline [default ON]
    - Shader [default ON]
    - SWTag [default OFF]
    - Memory [default OFF]
    Reviewed-by: Alok Hota <[email protected]>
* swr/rasterizer: do not mark tiles dirty until actually rendered | Jan Zielinski | 2019-08-08 | 13 | -8/+72
    Reviewed-by: Alok Hota <[email protected]>
* swr/rasterizer: enable size accumulation in mem stats | Jan Zielinski | 2019-08-08 | 11 | -104/+128
    A small refactoring is also performed.
    Reviewed-by: Alok Hota <[email protected]>
* swr/rasterizer: enable using AOS vertex data format | Jan Zielinski | 2019-08-08 | 3 | -21/+81
    Reviewed-by: Alok Hota <[email protected]>
* v3d: handle wait requirement when retrieving query results correctly | Iago Toral Quiroga | 2019-08-08 | 1 | -2/+2
    Reviewed-by: Eric Anholt <[email protected]>
* v3d: use the GPU to record primitives written to transform feedback | Iago Toral Quiroga | 2019-08-08 | 8 | -16/+132
    We can use the PRIMITIVE_COUNTS_FEEDBACK packet to write various
    primitive counts to a buffer, including the number of primitives written
    to transform feedback buffers, which will handle buffer overflow
    correctly.

    There are a couple of caveats with this: primitive counters are reset
    when we emit a 'Tile Binning Mode Configuration' packet, which can happen
    in the middle of a primitives query, so we need to read the buffer when
    we submit a job and accumulate the counts in the context so we don't lose
    them. We also need to do the same when we switch primitive type during
    transform feedback so we can compute the correct number of recorded
    vertices from the number of primitives. This is necessary so we can
    provide an accurate vertex count for draws from transform feedback.

    v2:
    - When computing the number of vertices for a primitive, pass in the
      base primitive, since that is what the hardware will count.
    - No need to update primitive counts when switching primitive types if
      the base primitives are the same.
    - Log a perf warning when mapping the primitive counts BO for readback
      (Eric).
    - Only emit the primitive counts packet once at job end (Eric).
    - Use the u_upload mechanism for the primitive counts buffer (Eric).
    - Use the XML to generate indices into the primitive counters buffer
      (Eric).

    Fixes piglit tests:
    spec/ext_transform_feedback/overflow-edge-cases
    spec/ext_transform_feedback/query-primitives_written-bufferrange
    spec/ext_transform_feedback/query-primitives_written-bufferrange-discard
    spec/ext_transform_feedback/change-size base-shrink
    spec/ext_transform_feedback/change-size base-grow
    spec/ext_transform_feedback/change-size offset-shrink
    spec/ext_transform_feedback/change-size offset-grow
    spec/ext_transform_feedback/change-size range-shrink
    spec/ext_transform_feedback/change-size range-grow
    spec/ext_transform_feedback/intervening-read prims-written

    Reviewed-by: Eric Anholt <[email protected]>
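A small illustrative sketch of the accumulation idea described in the entry above; the structure and field names are hypothetical, not the actual v3d code:

```c
/* Sketch only: the counters BO written by the PRIMITIVE_COUNTS_FEEDBACK
 * packet is read back at job submission and folded into context-level
 * totals, so a later counter reset (e.g. from a new Tile Binning Mode
 * Configuration packet) does not lose already-recorded primitives. */
#include <stdint.h>

struct prim_counts {
   uint32_t written;    /* primitives written to transform feedback */
   uint32_t generated;  /* primitives generated */
};

static void
accumulate_prim_counts(struct prim_counts *ctx_totals,
                       const uint32_t *counters_map)
{
   /* The real indices into the counters buffer come from XML-generated
    * definitions (see the v2 note above); 0 and 1 are placeholders. */
   ctx_totals->written   += counters_map[0];
   ctx_totals->generated += counters_map[1];
}
```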
* gallium/util: add a helper to compute vertex count from primitive count | Iago Toral Quiroga | 2019-08-08 | 3 | -1/+91
    v2:
    - Only compute vertex counts for base primitives.
    - Add a unit test (Eric).
    Reviewed-by: Eric Anholt <[email protected]>
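A minimal sketch of the idea behind such a helper, using illustrative names rather than the actual gallium/util interface; only base primitives are handled, matching the v2 note above:

```c
/* Sketch: number of vertices needed to draw a given number of base
 * primitives.  Strips and fans share vertices, so they add a constant
 * rather than multiplying. */
#include <assert.h>

enum base_prim { PRIM_POINTS, PRIM_LINES, PRIM_LINE_STRIP,
                 PRIM_TRIANGLES, PRIM_TRIANGLE_STRIP, PRIM_TRIANGLE_FAN };

static unsigned
vertices_for_prims(enum base_prim prim, unsigned prims)
{
   if (prims == 0)
      return 0;

   switch (prim) {
   case PRIM_POINTS:         return prims;        /* 1 vertex per point */
   case PRIM_LINES:          return prims * 2;    /* 2 vertices per line */
   case PRIM_LINE_STRIP:     return prims + 1;    /* vertices are shared */
   case PRIM_TRIANGLES:      return prims * 3;
   case PRIM_TRIANGLE_STRIP: return prims + 2;
   case PRIM_TRIANGLE_FAN:   return prims + 2;
   default:
      assert(!"unexpected base primitive");
      return 0;
   }
}
```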
* v3d: be more explicit about the query types supported | Iago Toral Quiroga | 2019-08-08 | 1 | -3/+11
    Reviewed-by: Eric Anholt <[email protected]>
* v3d: generate packet unpack functions | Iago Toral Quiroga | 2019-08-08 | 1 | -0/+10
    These were not being compiled because of the lack of __gen_unpack_address.
    v2:
    - Shift the raw address correctly (Eric).
    Reviewed-by: Eric Anholt <[email protected]>
* v3d: add header guards in v3d_packet_helpers.h | Iago Toral Quiroga | 2019-08-08 | 1 | -0/+4
    Reviewed-by: Eric Anholt <[email protected]>
* panfrost: Print errors from kernel | Tomeu Vizoso | 2019-08-08 | 1 | -5/+5
    Signed-off-by: Tomeu Vizoso <[email protected]>
    Reviewed-by: Alyssa Rosenzweig <[email protected]>
* panfrost: Mark buffers as PANFROST_BO_HEAP | Tomeu Vizoso | 2019-08-08 | 1 | -0/+6
    What we call GROWABLE in Mesa corresponds to the HEAP BO flag in the
    kernel. These buffers cannot be memory-mapped on the CPU side at the
    moment, so make sure they are also marked INVISIBLE. This allows us to
    allocate a big heap upfront (16MB) without actually reserving space
    unless it's needed.
    Signed-off-by: Tomeu Vizoso <[email protected]>
    Reviewed-by: Alyssa Rosenzweig <[email protected]>
* panfrost: Mark BOs as NOEXEC | Tomeu Vizoso | 2019-08-08 | 4 | -2/+37
    Unless a BO has the EXECUTABLE flag, mark it as NOEXEC.
    v2:
    - Rework version detection (Alyssa).
    Signed-off-by: Tomeu Vizoso <[email protected]>
    Reviewed-by: Alyssa Rosenzweig <[email protected]>
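A hedged sketch of how the allocation flags from this entry and the HEAP entry above might be translated into kernel BO flags; the Mesa-side flag names and the numeric values below are placeholders, not the real UAPI definitions:

```c
/* Sketch only: in real code the kernel flags come from the panfrost
 * drm-uapi header; the Mesa-side names here are hypothetical. */
#include <stdint.h>

#define PAN_ALLOC_GROWABLE   (1 << 0)   /* hypothetical Mesa-side flag */
#define PAN_ALLOC_EXECUTE    (1 << 1)   /* hypothetical Mesa-side flag */

#define PANFROST_BO_NOEXEC   (1 << 0)   /* placeholder for the UAPI value */
#define PANFROST_BO_HEAP     (1 << 1)   /* placeholder for the UAPI value */

static uint32_t
to_kernel_bo_flags(uint32_t mesa_flags)
{
   uint32_t kflags = 0;

   /* GROWABLE maps to the kernel HEAP flag; heap BOs cannot be CPU-mapped,
    * so callers must also treat them as INVISIBLE. */
   if (mesa_flags & PAN_ALLOC_GROWABLE)
      kflags |= PANFROST_BO_HEAP;

   /* Anything not explicitly executable is marked NOEXEC. */
   if (!(mesa_flags & PAN_ALLOC_EXECUTE))
      kflags |= PANFROST_BO_NOEXEC;

   return kflags;
}
```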
* panfrost: Take into account flags when looking up in the BO cache | Tomeu Vizoso | 2019-08-08 | 3 | -3/+5
    This is useful right now so that we avoid retrieving a non-executable
    buffer when an executable one is needed. As we support more flags, this
    logic will need to be extended to consider the different trade-offs to be
    made when matching BO specifications to BOs in the cache.
    Signed-off-by: Tomeu Vizoso <[email protected]>
    Reviewed-by: Alyssa Rosenzweig <[email protected]>
* panfrost: Allocate shaders in their own BOs | Tomeu Vizoso | 2019-08-08 | 7 | -32/+61
    Instead of all shaders being stored in a single BO, have each shader in
    its own. This removes the need for a 16MB allocation per context, and
    allows us to place transient blend shaders in BOs marked as executable
    (before, they were allocated in the transient pool, which shouldn't be
    executable).

    v2:
    - Store compiled blend shaders in a malloc'ed buffer, to avoid reading
      from GPU-accessible memory when patching (Alyssa).
    - Free struct panfrost_blend_shader (Alyssa).
    - Give the job a reference to regular shaders when emitting (Alyssa).

    v3:
    - Split out the allocation flags change (Rob).

    Signed-off-by: Tomeu Vizoso <[email protected]>
    Reviewed-by: Alyssa Rosenzweig <[email protected]>
* util/hash_table: Fix hashing in clears on 32-bit | Tomeu Vizoso | 2019-08-08 | 1 | -2/+12
    Some hash functions (e.g. key_u64_hash) will attempt to dereference the
    key, causing an invalid access when passed DELETED_KEY_VALUE (0x1) or
    FREED_KEY_VALUE (0x0).

    On 32-bit architectures a 64-bit key value doesn't fit into a pointer,
    so hash_table_u64 internally uses a pointer to a struct containing the
    64-bit key value.

    Fix _mesa_hash_table_u64_clear() to handle the 32-bit case by creating a
    temporary hash_key_u64 to pass to the hash function.

    Signed-off-by: Tomeu Vizoso <[email protected]>
    Suggested-by: Caio Marcelo de Oliveira Filho <[email protected]>
    Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
    Cc: Samuel Pitoiset <[email protected]>
    Cc: Nicolai Hähnle <[email protected]>
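An illustrative sketch of the idea behind the fix described above, not the exact patch: on 32-bit builds a sentinel key is wrapped in a temporary hash_key_u64 before being handed to a hash function that dereferences its key:

```c
/* Sketch only: the real code lives in src/util/hash_table.c and may be
 * structured differently. */
#include <stdint.h>

struct hash_key_u64 {
   uint64_t value;
};

static uint32_t
hash_sentinel_u64(uint32_t (*key_hash)(const void *key), uint64_t sentinel)
{
   if (sizeof(void *) == 8) {
      /* 64-bit: the key value itself is stored in the pointer. */
      return key_hash((void *)(uintptr_t)sentinel);
   } else {
      /* 32-bit: pass a temporary wrapper so a dereferencing hash
       * function (e.g. key_u64_hash) never reads address 0x0 or 0x1. */
      struct hash_key_u64 tmp = { .value = sentinel };
      return key_hash(&tmp);
   }
}
```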
* anv: support GetSwapchainGrallocUsage2ANDROID for Android | Tapani Pälli | 2019-08-08 | 3 | -22/+88
    The new function supports gralloc1 usage flags that are set separately
    for the producer and the consumer. As we still need to support the old
    method too, share the common code and use the
    android_convertGralloc0To1Usage helper.

    Bump the VK_ANDROID_native_buffer version to indicate support for the
    new call.

    Changes were tested on Android Celadon P with Basemark GPU and various
    Sascha Willems Vulkan demos.

    Signed-off-by: Tapani Pälli <[email protected]>
    Reviewed-by: Jason Ekstrand <[email protected]>
* st/mesa: eliminate unnecessary redirection | Mark Janes | 2019-08-07 | 1 | -1/+1
    Reviewed-by: Kenneth Graunke <[email protected]>
* intel/perf: fix debug typo | Mark Janes | 2019-08-07 | 1 | -5/+5
    The misspelling was seen with INTEL_DEBUG=perfmon.
    Reviewed-by: Kenneth Graunke <[email protected]>
* intel/perf: make gen_perf_query_object private | Mark Janes | 2019-08-07 | 2 | -72/+80
    Encapsulate the details of this structure within the perf implementation.
    Reviewed-by: Kenneth Graunke <[email protected]>
* intel/perf: make perf context private | Mark Janes | 2019-08-07 | 5 | -103/+136
    Encapsulate the details of this data structure.
    Reviewed-by: Kenneth Graunke <[email protected]>
* intel/perf: print debug information | Mark Janes | 2019-08-07 | 3 | -25/+43
    INTEL_DEBUG=perfmon will iterate over the perf queries, printing
    information about the state of each query. Some of this information will
    be private to intel/perf and needs a dump routine that can be called
    from i965.
    Reviewed-by: Kenneth Graunke <[email protected]>
* intel/perf: make internal methods private | Mark Janes | 2019-08-07 | 2 | -95/+62
    Now that all references from i965 have been moved to perf, we can make
    internal methods private again.
    Reviewed-by: Kenneth Graunke <[email protected]>
* intel/perf: make oa_sample_buffers private | Mark Janes | 2019-08-07 | 2 | -119/+120
    All references to this data structure have been moved inside the perf
    subsystem.
    Reviewed-by: Kenneth Graunke <[email protected]>
* intel/perf: expose method to create query | Mark Janes | 2019-08-07 | 3 | -10/+22
    By encapsulating this implementation within perf, we can eventually make
    struct gen_perf_ctx private.
    Reviewed-by: Kenneth Graunke <[email protected]>
* intel/perf: move initialization of pipeline statistics metrics to gen_perf | Mark Janes | 2019-08-07 | 3 | -217/+222
    Reviewed-by: Kenneth Graunke <[email protected]>
* intel/perf: move get_query_data into gen_perf | Mark Janes | 2019-08-07 | 4 | -399/+378
    This refactor moves several helper functions for get_query_data as well:
    - accumulate_oa_reports
    - read_gt_frequency
    - get_pipeline_stats_data
    - get_oa_counter_data
    Functions which are no longer referenced in brw_performance_query.c have
    been removed.
    Reviewed-by: Kenneth Graunke <[email protected]>
* intel/perf: move delete_query to gen_perf | Mark Janes | 2019-08-07 | 3 | -39/+94
    Reviewed-by: Kenneth Graunke <[email protected]>
* intel/perf: move is_query_ready to gen_perf | Mark Janes | 2019-08-07 | 3 | -133/+32
    Reviewed-by: Kenneth Graunke <[email protected]>
* intel/perf: move wait_query to perf | Mark Janes | 2019-08-07 | 3 | -38/+168
    The following methods in brw_performance_query.c are now duplicate
    implementations of read_oa_samples_until:
    - read_oa_samples_for_query
    - read_oa_samples_until
    They are still referenced by other methods in the file and will be
    removed in a subsequent commit.
    Reviewed-by: Kenneth Graunke <[email protected]>
* intel/perf: create a vtable entry for bo_busy | Mark Janes | 2019-08-07 | 2 | -4/+6
    Iris and i965 variants of this method need to be called by perf routines.
    Reviewed-by: Kenneth Graunke <[email protected]>
* intel/perf: create a vtable entry for bo_wait_rendering | Mark Janes | 2019-08-07 | 2 | -1/+4
    Iris and i965 variants of this method need to be called by perf routines.
    Reviewed-by: Kenneth Graunke <[email protected]>
* intel/perf: create a vtable entry for batch_references | Mark Janes | 2019-08-07 | 2 | -6/+12
    Iris and i965 variants of this method need to be called by perf routines.
    Reviewed-by: Kenneth Graunke <[email protected]>
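The three vtable entries above (and the emit_mi_flush, buffer map/unmap, and store_register_mem64 entries further down) all follow the same pattern: the shared perf code calls back into the driver (i965 or iris) through function pointers instead of linking against driver internals. A hedged sketch of such a vtable, with illustrative types and signatures; the real gen_perf vtable may differ:

```c
/* Sketch only: member names mirror the commit titles in this log, but the
 * parameter types are placeholders, not the actual intel/perf interface. */
#include <stdbool.h>
#include <stdint.h>

struct perf_driver_vtable {
   bool (*bo_busy)(void *bo);
   void (*bo_wait_rendering)(void *bo);
   bool (*batch_references)(void *batch, void *bo);
   void (*emit_mi_flush)(void *ctx);
   void *(*bo_map)(void *ctx, void *bo, unsigned flags);
   void (*bo_unmap)(void *bo);
   void (*store_register_mem64)(void *ctx, void *bo,
                                uint32_t reg, uint32_t offset);
};
```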
* intel/perf: refactor gen_perf_end_query into gen_perf | Mark Janes | 2019-08-07 | 3 | -45/+57
    Reviewed-by: Kenneth Graunke <[email protected]>
* intel/perf: refactor gen_perf_begin_query into gen_perf | Mark Janes | 2019-08-07 | 3 | -247/+260
    Reviewed-by: Kenneth Graunke <[email protected]>
* intel/perf: move perf-related state into gen_perf_context | Mark Janes | 2019-08-07 | 3 | -49/+80
    To move more operations into intel/perf, several state items are needed.
    Save references to that state in the perf context, rather than passing
    them in for every operation.
    This commit includes an initializer for gen_perf_context, to set those
    references and also encapsulate the initialization of the sample buffer
    state.
    Reviewed-by: Kenneth Graunke <[email protected]>
* intel/perf: create vtable entries for buffer object map/unmap | Mark Janes | 2019-08-07 | 3 | -6/+17
    These operations are needed to refactor subsequent methods into perf.
    Reviewed-by: Kenneth Graunke <[email protected]>
* intel/perf: move client reference counts into perf | Mark Janes | 2019-08-07 | 3 | -36/+37
    Reviewed-by: Kenneth Graunke <[email protected]>
* intel/perf: move open_perf into perf | Mark Janes | 2019-08-07 | 3 | -48/+49
    Reviewed-by: Kenneth Graunke <[email protected]>
* intel/perf: move close_perf into perf | Mark Janes | 2019-08-07 | 3 | -18/+20
    Reviewed-by: Kenneth Graunke <[email protected]>
* intel/perf: create a vtable entry for emit_mi_flush | Mark Janes | 2019-08-07 | 2 | -2/+5
    This method is needed to move subsequent methods into perf.
    Reviewed-by: Kenneth Graunke <[email protected]>
* intel/perf: use temporary pointers to simplify access to perf state | Mark Janes | 2019-08-07 | 1 | -78/+92
    Most accesses to perf state were made through repeated dereferences of
    brw_context members. Preferring temporary variables for perf_ctx and
    perf_cfg has the following advantages:
    - a more concise implementation
    - easier refactoring when moving subsequent methods to perf
    Reviewed-by: Kenneth Graunke <[email protected]>
* intel/perf: move snapshot_statistics_registers into perf | Mark Janes | 2019-08-07 | 3 | -28/+33
    Reviewed-by: Kenneth Graunke <[email protected]>
* intel/perf: move query_object into perf | Mark Janes | 2019-08-07 | 4 | -159/+136
    Query objects can now be encapsulated within the perf subsystem.
    Reviewed-by: Kenneth Graunke <[email protected]>
* intel/perf: create a vtable entry for store_register_mem64 | Mark Janes | 2019-08-07 | 2 | -3/+9
    This method is needed to move subsequent methods into perf.
    Reviewed-by: Kenneth Graunke <[email protected]>