path: root/src
Commit message | Author | Age | Files | Lines
* i965: Compile with -msse2 (instead of -msse3) | Matt Turner | 2017-07-14 | 1 | -1/+1
  Ian noted that there were two Pentium 4 Extreme Edition LGA 775 CPUs, and they only have SSE2.
* i965: Compile with -msse3 | Matt Turner | 2017-07-14 | 1 | -1/+2
  All CPUs that can be paired with a GPU supported by i965_dri.so support SSE3. This allows us to ensure that some vectorized version of the tiled memcpy path is enabled on 32-bit systems. This also ensures that __builtin_ia32_clflush is always usable.
  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101774
  Tested-by: Tobias Klausmann <[email protected]>
  Acked-by: Kenneth Graunke <[email protected]>
* egl: Fix precedence problem when setting __DRI_CTX_FLAG_NO_ERROR | Kenneth Graunke | 2017-07-14 | 1 | -1/+1
  This accidentally set __DRI_CTX_FLAG_NO_ERROR whenever any flags were present. It just needs extra parentheses.
  Fixes: 4909519a6655 (egl: Add EGL_KHR_create_context_no_error support)
  Reviewed-by: Grigori Goronzy <[email protected]>
  Tested-by: Mark Janes <[email protected]>
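The bug class here is plain C operator precedence: `|` binds tighter than `?:`, so `flags | cond ? FLAG : 0` is parsed as `(flags | cond) ? FLAG : 0`. A minimal sketch, assuming hypothetical flag names and values (`FLAG_DEBUG`, `FLAG_NO_ERROR`) in place of the real `__DRI_CTX_FLAG_*` constants; it is not the actual EGL code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the real __DRI_CTX_FLAG_* values. */
#define FLAG_DEBUG    (1u << 0)
#define FLAG_NO_ERROR (1u << 3)

/* Buggy: `|` binds tighter than `?:`, so this parses as
 * (base | no_error) ? FLAG_NO_ERROR : 0. Any non-zero base flag turns on
 * NO_ERROR, and the base flags themselves are lost. */
static uint32_t ctx_flags_buggy(uint32_t base, bool no_error)
{
    return base | no_error ? FLAG_NO_ERROR : 0;
}

/* Fixed: the parentheses make the conditional an operand of `|`. */
static uint32_t ctx_flags_fixed(uint32_t base, bool no_error)
{
    return base | (no_error ? FLAG_NO_ERROR : 0);
}
```

With a debug flag set and no-error not requested, the buggy form returns only `FLAG_NO_ERROR`, which matches the symptom described in the commit.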
* drirc: whitelist glthread for Euro Truck Simulator 2 | Petr Sebor | 2017-07-14 | 1 | -0/+3
* drirc: whitelist glthread for American Truck Simulator | Petr Sebor | 2017-07-14 | 1 | -0/+3
* mesa/marshal: fix Windows build | Grigori Goronzy | 2017-07-14 | 1 | -3/+3
  This was broken by commit 1ad24faa.
  Reported by AppVeyor: https://ci.appveyor.com/project/mesa3d/mesa/build/4918
* drirc: whitelist glthread for The Witcher 2 | Edmondo Tommasina | 2017-07-14 | 1 | -0/+3
  Performance delta on AMD Phenom II X3 720 / RX 470:
  The Witcher 2: +18%
  Signed-off-by: Marek Olšák <[email protected]>
* drirc: whitelist glthread for Civilization 5 | Edmondo Tommasina | 2017-07-14 | 1 | -0/+3
  Performance delta on AMD Phenom II X3 720:
  Civilization 5: +28%
  Signed-off-by: Marek Olšák <[email protected]>
* swr: JitManager runtime determination of architecture | Tim Rowley | 2017-07-14 | 1 | -1/+2
  Fixes a performance regression from f50aa21456d, which was forcing internal code generation to target AVX (no gather, etc.).
  Reviewed-by: Bruce Cherniak <[email protected]>
* st/mesa: Add KHR_no_error toggle to driconf | Grigori Goronzy | 2017-07-14 | 3 | -0/+9
  Allows applications to be whitelisted.
  v2: Remove misguided DRI common part.
  Reviewed-by: Marek Olšák <[email protected]>
* egl: Add EGL_KHR_create_context_no_error support | Grigori Goronzy | 2017-07-14 | 6 | -2/+53
  This only adds the EGL side; it still needs to be plumbed into the Mesa frontend.
  v2: Add check for extension availability.
  Reviewed-by: Marek Olšák <[email protected]>
* st/mesa: Add support for KHR_no_error flag | Grigori Goronzy | 2017-07-14 | 5 | -5/+18
  Add a new context flag and plumb it through the various layers of the context creation code to set up dispatch tables for the no-error mode.
  Reviewed-by: Marek Olšák <[email protected]>
* dri: Add KHR_no_error DRI extension | Grigori Goronzy | 2017-07-14 | 10 | -3/+23
  This basic extension allows usage of the __DRI_CTX_FLAG_NO_ERROR flag. This includes support code for classic Mesa drivers to switch on the no-error mode if the flag is set.
  v2: Move to common DRI code.
  Reviewed-by: Marek Olšák <[email protected]>
* mesa/marshal: fix glNamedBufferData with NULL data | Grigori Goronzy | 2017-07-14 | 1 | -4/+13
  The semantics are similar to glBufferData: a NULL data pointer allocates a store of the given size without initializing it, so there is no data to copy.
  Tested-by: Marc Dietrich <[email protected]>
  Reviewed-by: Marek Olšák <[email protected]>
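In a glthread-style command queue, a call is packed into a header plus an inline copy of its payload, so a NULL `data` pointer has to change both the command size and the copy. A minimal sketch, assuming a hypothetical command layout and helper names (`named_buffer_data_cmd`, `pack_named_buffer_data`); the real marshalling code is generated and laid out differently.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical fixed-size header; any payload is stored right after it. */
struct named_buffer_data_cmd {
    uint32_t buffer;
    int64_t size;
    uint32_t usage;
    uint8_t data_null;   /* non-zero when the caller passed data == NULL */
};

/* Bytes the command occupies in the queue. With data == NULL there is
 * nothing to copy, so only the header is needed. */
static size_t named_buffer_data_cmd_size(const void *data, int64_t size)
{
    return sizeof(struct named_buffer_data_cmd) + (data ? (size_t)size : 0);
}

/* Writes the command into dst, which must hold
 * named_buffer_data_cmd_size(data, size) bytes. Never reads from a NULL
 * data pointer. */
static void pack_named_buffer_data(void *dst, uint32_t buffer, int64_t size,
                                   const void *data, uint32_t usage)
{
    struct named_buffer_data_cmd cmd = {buffer, size, usage, data == NULL};
    memcpy(dst, &cmd, sizeof cmd);
    if (data)
        memcpy((uint8_t *)dst + sizeof cmd, data, (size_t)size);
}
```

On the consumer side, `data_null` would tell the unmarshal step to pass NULL through instead of a pointer into the queue.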
* mesa/marshal: add marshalling for glClearBuffer* | Grigori Goronzy | 2017-07-14 | 3 | -5/+163
  Add async marshalling/unmarshalling for all glClearBuffer variants. These entry points are commonly used in general, and Alien Isolation specifically uses glClearBufferiv. This slightly reduces the number of thread synchronizations with glthread in that game.
  Reviewed-by: Marek Olšák <[email protected]>
* mesa/marshal: extract ClearBuffer helpers | Grigori Goronzy | 2017-07-14 | 2 | -29/+50
  Extract clear buffer helper functions in preparation for adding marshal/unmarshal functions for the various glClearBuffer variants.
  v2: Fix command size.
  Reviewed-by: Marek Olšák <[email protected]>
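The command size for a marshalled glClearBuffer{iv,uiv,fv} call depends on how many values the `value` pointer refers to, which in turn depends on the `buffer` enum: four for `GL_COLOR`, one for `GL_DEPTH` or `GL_STENCIL` (`GL_DEPTH_STENCIL` goes through glClearBufferfi, which takes scalars). A minimal sketch of such a helper; the function names are hypothetical and are not Mesa's actual helpers, and which buffers each variant accepts is assumed to be validated elsewhere.

```c
#include <stddef.h>

/* Enum values as defined in the OpenGL headers. */
#define GL_COLOR   0x1800
#define GL_DEPTH   0x1801
#define GL_STENCIL 0x1802

/* Hypothetical helper: how many values a glClearBuffer{iv,uiv,fv} call
 * reads from `value`. Returns 0 for buffers these calls never accept. */
static size_t clear_buffer_value_count(unsigned buffer)
{
    switch (buffer) {
    case GL_COLOR:
        return 4;           /* RGBA */
    case GL_DEPTH:
    case GL_STENCIL:
        return 1;           /* a single depth or stencil value */
    default:
        return 0;           /* invalid enum: the error is raised elsewhere */
    }
}

/* Total size of a command: fixed header plus the inline value array. */
static size_t clear_buffer_cmd_size(size_t header_size, unsigned buffer,
                                    size_t elem_size)
{
    return header_size + clear_buffer_value_count(buffer) * elem_size;
}
```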
* gallium/hud: use double values for all graphs | Christoph Haag | 2017-07-14 | 3 | -8/+14
  The fps graph, for example, calculates the fps as a double with small variations depending on when query_new_value() is called, which causes many values to be truncated by the cast to uint64_t. The HUD internally stores the values as double, so just use double everywhere instead of fixing this with rounding. Using doubles also allows the HUD to show small variations instead of being clamped to discrete values.
  v2: Don't print decimals in the dump file when not necessary.
  Signed-off-by: Christoph Haag <[email protected]>
  Signed-off-by: Marek Olšák <[email protected]>
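The truncation described above is ordinary C floating-to-integer conversion, which discards the fractional part. A minimal sketch with hypothetical helper names (`fps_sample`, `fps_as_u64`), not the HUD's actual functions:

```c
#include <stdint.h>

/* A frame-rate sample computed in floating point (frames / seconds); its
 * exact value depends on when the sample is taken. */
static double fps_sample(unsigned frames, double elapsed_s)
{
    return frames / elapsed_s;
}

/* The old path: converting to uint64_t truncates toward zero, so a
 * sample of 59.9 is recorded as 59. */
static uint64_t fps_as_u64(double fps)
{
    return (uint64_t)fps;
}
```

Keeping the value as `double` end to end avoids the lossy conversion entirely, which is why the commit prefers it over adding rounding at the cast.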
* Revert "etnaviv: add support for snorm textures" | Lucas Stach | 2017-07-14 | 2 | -7/+3
  This reverts commit d8b2ccdb880f, which causes piglit regressions on GPUs with SNORM support. We'll have another try at enabling this feature after the 17.2 branchpoint.
  Signed-off-by: Lucas Stach <[email protected]>
* etnaviv: reset indexed rendering information when not rendering indexed | Wladimir J. van der Laan | 2017-07-14 | 1 | -1/+6
  A dangling BO would result in memory corruption while loading a level in ioquake3_opengl2.
  Fixes: 330d0607ed60 (gallium: remove pipe_index_buffer and set_index_buffer)
  Suggested-by: Lucas Stach <[email protected]>
  Signed-off-by: Wladimir J. van der Laan <[email protected]>
  Reviewed-by: Lucas Stach <[email protected]>
* etnaviv: Use the correct LOG instruction on GC3000 | Wladimir J. van der Laan | 2017-07-14 | 3 | -10/+59
  GC3000 has a new LOG instruction, similar to the new SIN and COS instructions. Generate the new instruction sequence when appropriate; there are two occasions: as part of LIT, and in the generator for the LG2 instruction itself.
  Signed-off-by: Wladimir J. van der Laan <[email protected]>
  Reviewed-by: Lucas Stach <[email protected]>
* etnaviv: flush source TS before resolve | Lucas Stach | 2017-07-14 | 1 | -0/+4
  If we blit from a rendertarget or a depthstencil buffer, there might still be dirty data in the TS buffer which needs to be flushed out. Fixes missing shadow tiles in glmark2 shadow.
  Signed-off-by: Lucas Stach <[email protected]>
  Reviewed-by: Philipp Zabel <[email protected]>
* etnaviv: flush color cache and depth cache together before resolves | Philipp Zabel | 2017-07-14 | 1 | -9/+13
  Before resolving a rendertarget or a depth/stencil resource into a texture, flush both the color cache and the depth cache together. It is unclear whether this is necessary for the following stall to work properly, or whether the depth flush just adds enough time for the color cache flush to finish before the resolver is started, but this change removes artifacts that otherwise appear if a texture is sampled directly after rendering into it.
  The test case is a simple QML scene graph with a QtWebEngine-based WebView rendered on top of a blue background:

      import QtQuick 2.0
      import QtQuick.Window 2.2
      import QtWebView 1.1

      Window {
          Rectangle {
              id: background
              anchors.fill: parent
              color: "blue"
          }
          WebView {
              id: webView
              anchors.fill: parent
          }
          Component.onCompleted: {
              webView.url = "<some animated website>"
          }
      }

  If the website is animated, the WebView renders the site contents into texture tiles and immediately afterwards samples from them to draw the tiles into the Qt renderbuffer. Without this patch, a small irregular triangle in the lower right of each browser tile appears solid blue, as if the texture sampler samples zeroes instead of the website contents, and the previously rendered blue Rectangle shows through.
  Other attempts, such as adding a pipeline stall before the color flush, a TS cache flush afterwards, or flushing multiple times with stalls before and after each flush, have shown no effect.
  Signed-off-by: Philipp Zabel <[email protected]>
* st/mesa: handle stfbi being NULL on entry of st_framebuffer_reuse_or_create | Lucas Stach | 2017-07-14 | 1 | -0/+3
  Apparently this can happen. Just bail out early in that case, as all the called functions would return NULL anyway. Fixes weston-terminal for me.
  Fixes: 147d7fb772a7 ("st/mesa: add a winsys buffers list in st_context")
  Signed-off-by: Lucas Stach <[email protected]>
  Reviewed-by: Charmaine Lee <[email protected]>
* egl/wayland: Use MIN2 for wl_drm version | Daniel Stone | 2017-07-14 | 1 | -3/+1
  Use a slightly more explicit version cap for binding wl_drm, so we can add other interfaces with different versioning schemes later.
  Reviewed-by: Eric Engestrom <[email protected]>
  Reviewed-by: Emil Velikov <[email protected]>
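A version cap of this kind binds a Wayland global at the lower of the version the compositor advertises and the highest version the client understands; the result is what gets passed as the `version` argument of `wl_registry_bind()`. A minimal sketch: the `MIN2` definition follows the usual shape of Mesa's macro, while `CLIENT_WL_DRM_VERSION` and its value of 2 are hypothetical, chosen only for illustration.

```c
#include <stdint.h>

/* Same shape as Mesa's MIN2 helper macro. */
#define MIN2(A, B) ((A) < (B) ? (A) : (B))

/* Hypothetical: the highest wl_drm version this client is written for. */
#define CLIENT_WL_DRM_VERSION 2u

/* Never bind above what either side supports. Each interface gets its own
 * cap, so other globals can follow their own versioning schemes. */
static uint32_t wl_drm_bind_version(uint32_t advertised)
{
    return MIN2(advertised, CLIENT_WL_DRM_VERSION);
}
```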
* egl/wayland: Fix whitespace damage | Daniel Stone | 2017-07-14 | 1 | -18/+20
  Convert tabs to spaces, fix misalignments.
  Reviewed-by: Eric Engestrom <[email protected]>
  Reviewed-by: Emil Velikov <[email protected]>
* util: Remove u_math from u_vector | Daniel Stone | 2017-07-14 | 2 | -1/+3
  u_vector.h doesn't actually use anything from u_math, but including it means everyone has to pull in src/gallium/auxiliary/util includes. Just remove it, adding a <string.h> include to u_vector.c to cover memcpy.
  Reviewed-by: Eric Engestrom <[email protected]>
  Reviewed-by: Emil Velikov <[email protected]>
* configure: only install khrplatform.h if needed | Eric Engestrom | 2017-07-14 | 1 | -0/+2
  khrplatform.h is only used by EGL and GLES; let's only install it when one of those is enabled.
  Cc: [email protected]
  Signed-off-by: Eric Engestrom <[email protected]>
  Reviewed-by: Jussi Kukkonen <[email protected]>
  Reviewed-by: Emil Velikov <[email protected]>
* anv/pipeline: do not use BITFIELD64_BIT() | Juan A. Suarez Romero | 2017-07-14 | 1 | -1/+1
  The previous commit forgot to apply the v2 suggestions.
  Fixes: 28d0c38 (anv/pipeline: use unsigned long long constant to check enable vertex inputs)
  Signed-off-by: Juan A. Suarez Romero <[email protected]>
* anv/pipeline: use unsigned long long constant to check enable vertex inputs | Juan A. Suarez Romero | 2017-07-14 | 1 | -1/+1
  When initializing the ANV pipeline, one of the tasks is checking which vertex inputs are enabled. This is done by checking the enabled bits in inputs_read. But the mask to use is computed as `(1 << (VERT_ATTRIB_GENERIC0 + desc->location))`. The problem here is that if location is 15 or greater, the sum is 32 or greater. But C treats the literal 1 as a 32-bit int, so the shifted bit falls outside its range (shifting by the type's width or more is undefined behavior), and the mask no longer selects the intended input.
  Thus, use 1ull, which is an unsigned long long value.
  This fixes: dEQP-VK.pipeline.vertex_input.max_attributes.16_attributes.binding_one_to_one.interleaved
  v2: use 1ull instead of BITFIELD64_BIT() (Matt Turner)
  Reviewed-by: Lionel Landwerlin <[email protected]>
  Reviewed-by: Matt Turner <[email protected]>
  Signed-off-by: Juan A. Suarez Romero <[email protected]>
  Cc: [email protected]
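The fix comes down to doing the shift in a 64-bit type. A minimal sketch: the value 17 for `VERT_ATTRIB_GENERIC0` is an assumption inferred from the message (location 15 yields a shift of 32), and the helper names are hypothetical rather than anv's code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed value: the message implies VERT_ATTRIB_GENERIC0 + 15 == 32. */
#define VERT_ATTRIB_GENERIC0 17

/* Builds the mask from a 64-bit constant so bits 32..63 are representable.
 * The buggy form, (1 << (VERT_ATTRIB_GENERIC0 + location)), shifts a
 * 32-bit int by 32 or more once location >= 15. */
static uint64_t generic_attrib_mask(unsigned location)
{
    return 1ull << (VERT_ATTRIB_GENERIC0 + location);
}

/* True if the shader reads the generic attribute at `location`. */
static bool vertex_input_enabled(uint64_t inputs_read, unsigned location)
{
    return (inputs_read & generic_attrib_mask(location)) != 0;
}
```

With 16 attributes, location 15 maps to bit 32, which is exactly the case the dEQP test above exercises.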
* i965: Use pushed UBO data in the scalar backend. | Kenneth Graunke | 2017-07-13 | 3 | -1/+64
  This actually takes advantage of the newly pushed UBO data, avoiding pull loads.
  Improves performance in GLBenchmark Manhattan 3.1 by: HSW: ~1%, BDW/SKL/KBL GT2: 3-4%, SKL GT4: 7-8%, APL: 4-5%. (thanks to Eero Tamminen for these numbers)
  shader-db results on Skylake, ignoring programs with spill/fill changes:
      total instructions in shared programs: 13963994 -> 13651893 (-2.24%)
      instructions in affected programs: 4250328 -> 3938227 (-7.34%)
      helped: 28527
      HURT: 0
      total cycles in shared programs: 179808608 -> 172535170 (-4.05%)
      cycles in affected programs: 79720410 -> 72446972 (-9.12%)
      helped: 26951
      HURT: 1248
      LOST: 46
      GAINED: 21
  Many "Deus Ex: Mankind Divided" shaders which already spilled end up spilling a lot more (about 240 programs hurt, 9 helped). The cycle estimator suggests this is still overall a win (-0.23% in cycle counts), presumably because we trade pull loads for fills.
  v2: Drop "PULL" environment variable left in for initial debugging (caught by Matt).
  Reviewed-by: Matt Turner <[email protected]>
* i965: Factor out push locations. | Kenneth Graunke | 2017-07-13 | 2 | -16/+25
  With UBOs, the answer to "have we decided to push this uniform" gets a bit more complicated; for one, we have multiple surfaces. This patch refactors things so we can add the new code in a single place.
  Reviewed-by: Matt Turner <[email protected]>
* i965: Push UBO data, but don't use it just yet. | Kenneth Graunke | 2017-07-13 | 4 | -8/+75
  This patch starts uploading UBO data via 3DSTATE_CONSTANT_* packets, and updates the compiler to know that there's extra payload data, so things continue working. However, it still issues pull loads for all data. I wanted to separate the two aspects for greater bisectability.
  v2: Update for new intel_bufferobj_buffer parameter.
  Reviewed-by: Matt Turner <[email protected]>
* i965: Pad buffer objects by 2kB in robust contexts to avoid OOB access. | Kenneth Graunke | 2017-07-13 | 1 | -2/+20
  This is an annoyingly big hammer, but it seems less mean than disabling UBO pushing, and I'm not sure what else to do.
  Reviewed-by: Matt Turner <[email protected]>
* i965: Stop re-uploading push constants after URB reconfiguration. | Kenneth Graunke | 2017-07-13 | 2 | -8/+7
  Previously we would re-upload the constant data to the batchbuffer, then re-emit the packets. We only need to do the last step (causing the existing data in the batchbuffer to be re-uploaded to the push constant staging area in the L3). Now that we've separated the two, it's pretty easy to accomplish.
  Reviewed-by: Matt Turner <[email protected]>
* i965: Separate uploading push constant data from the pointer packets. | Kenneth Graunke | 2017-07-13 | 3 | -34/+52
  I hope to upload UBO data via 3DSTATE_CONSTANT_XS packets, in addition to normal uniforms. In order to do that, I'll need to re-emit the packets when UBOs change. But I don't want to re-copy the regular uniform data to the batchbuffer every time.
  This patch separates out the data uploading from the packet submission. We're running low on dirty bits, so I made the new atom happen on every draw call, and added a flag to stage_state indicating that we want the packet for that stage emitted.
  I would have preferred to do this outside the atom system, but it has to happen between the uploading of push constant data and the binding table upload.
  Reviewed-by: Matt Turner <[email protected]>
* i965: Introduce a BRW_NEW_DRAW_CALL dirty bit. | Kenneth Graunke | 2017-07-13 | 3 | -0/+8
  This allows us to have atoms which are signalled on every draw call.
  Reviewed-by: Matt Turner <[email protected]>
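In a dirty-bit state system, each state "atom" lists the bits it depends on and is emitted only when one of them is set; a bit that is raised on every draw therefore turns its atoms into per-draw work. A minimal sketch of that idea with hypothetical bit names, struct, and dispatcher; it does not reproduce i965's actual atom tables.

```c
#include <stdint.h>

/* Hypothetical dirty bits; a real driver has many more. */
#define NEW_PROGRAM   (1ull << 0)
#define NEW_UNIFORMS  (1ull << 1)
#define NEW_DRAW_CALL (1ull << 2)   /* raised unconditionally for every draw */

struct atom {
    uint64_t dirty;                 /* bits this atom listens to */
    void (*emit)(void *ctx);
};

/* Emits every atom whose bits intersect the current dirty set. Because
 * NEW_DRAW_CALL is OR-ed in on each draw, atoms listening to it run every
 * time; the others still run only when their state changed. */
static unsigned run_atoms(const struct atom *atoms, unsigned count,
                          uint64_t dirty, void *ctx)
{
    unsigned emitted = 0;
    dirty |= NEW_DRAW_CALL;
    for (unsigned i = 0; i < count; i++) {
        if (atoms[i].dirty & dirty) {
            atoms[i].emit(ctx);
            emitted++;
        }
    }
    return emitted;
}

/* Example emit callback for demonstration: counts invocations in *ctx. */
static void count_emit(void *ctx)
{
    ++*(int *)ctx;
}
```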
* i965: Store per-stage push constant BO pointers. | Kenneth Graunke | 2017-07-13 | 3 | -3/+6
  Right now, we always upload new push constant data, and immediately emit 3DSTATE_CONSTANT_* packets. We call intel_upload_space and store the resulting BO pointer in brw->curbe.curbe_bo. We read that when emitting the packets.
  This works today, but is fragile: it depends on upload and packet emission being interleaved. If we instead were to upload all the data, then emit all the packets, upload BO wrapping would get us into trouble. For example, the VS constants may land in one upload BO, but the FS constants may not fit and land in a second upload BO. Uploading FS constants would overwrite the brw->curbe.curbe_bo pointer, so when we emitted 3DSTATE_CONSTANT_VS, we'd get the wrong BO.
  I intend to separate out this code in a future commit, so I need to fix this. To fix it, we simply store a per-stage BO pointer.
  Reviewed-by: Matt Turner <[email protected]>
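The hazard described above can be reduced to a few lines: with a single shared pointer, "upload everything, then emit everything" leaves every stage seeing the last upload's BO. A minimal sketch in which BOs are plain integer ids; the types and names are hypothetical and only illustrate the reasoning, not the driver's data structures.

```c
/* Hypothetical model: an upload BO is represented by an integer id. */
enum stage { STAGE_VS, STAGE_FS, STAGE_COUNT };

struct push_state {
    int shared_bo;              /* old scheme: one pointer for all stages */
    int stage_bo[STAGE_COUNT];  /* new scheme: one pointer per stage */
};

/* Records which BO a stage's constants landed in. */
static void record_upload(struct push_state *s, enum stage st, int bo)
{
    s->shared_bo = bo;          /* clobbered by every later upload */
    s->stage_bo[st] = bo;       /* only replaced by this stage's next upload */
}
```

If VS constants land in BO 1 and FS constants wrap into BO 2, emitting the VS packet afterwards from `shared_bo` would reference BO 2, while `stage_bo[STAGE_VS]` still holds BO 1.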
* i965: Select ranges of UBO data to be uploaded as push constants. | Kenneth Graunke | 2017-07-13 | 9 | -0/+322
  This adds a NIR pass that decides which portions of UBOs we should upload as push constants, rather than pull constants.
  v2: Switch to uint16_t for the UBO block number, because we may have a lot of them in Vulkan (suggested by Jason). Add more comments about bitfield trickery (requested by Matt).
  v3: Skip vec4 stages for now... I haven't finished wiring up support in the vec4 backend, and so pushing the data but not using it will just be wasteful.
  Reviewed-by: Matt Turner <[email protected]>
* i965: Require a UBO offset alignment of 32 bytes. | Kenneth Graunke | 2017-07-13 | 1 | -1/+4
  Soon, we're going to start providing UBO data to shaders as push constants, rather than requiring them to issue pull loads. The 3DSTATE_CONSTANT_* commands require 32-byte aligned pointers. So, we need to increase this from 16 to 32.
  Reviewed-by: Matt Turner <[email protected]>
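From the application side, this limit is reported as `GL_UNIFORM_BUFFER_OFFSET_ALIGNMENT`, and offsets passed to glBindBufferRange for UBOs must be multiples of it. A minimal sketch of the usual round-up helper (the name `align_up` is hypothetical); note that an offset such as 48 satisfies the old 16-byte requirement but not the new 32-byte one.

```c
#include <assert.h>
#include <stdint.h>

/* Rounds offset up to the next multiple of a power-of-two alignment. */
static uint32_t align_up(uint32_t offset, uint32_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (offset + alignment - 1) & ~(alignment - 1);
}
```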
* i965: Switch to absolute addressing for constant buffer 0. | Kenneth Graunke | 2017-07-13 | 4 | -0/+37
  By default, 3DSTATE_CONSTANT_* Constant Buffer 0 is relative to dynamic state base address. This makes it unusable for pushing UBOs. I'd like to be able to use all four push buffers.
  There is a bit in the INSTPM register (or CS_DEBUG_MODE2 on Skylake) which controls whether buffer 0 is relative to dynamic state base address, or simply a normal pointer. Setting that gives us full flexibility.
  We can't currently write this on Haswell and earlier, and will need to update the kernel command parser, and then do the whole version checking song and dance.
  Reviewed-by: Matt Turner <[email protected]>
* i965: Use async maps for BufferSubData to regions with no valid data. | Kenneth Graunke | 2017-07-13 | 1 | -1/+3
  When writing a region of a buffer via glBufferSubData(), we can write the data asynchronously if the destination doesn't contain any data. Even if it's busy, the data was undefined, so the new data is fine too.
  Removes all stall avoidance blits on BufferSubData calls in "Total War: WARHAMMER" on my Skylake GT4.
  Decreases the number of stall avoidance blits in Manhattan 3.1:
  - Skylake GT4: -18.3544% +/- 6.76483% (n=13)
  - Apollolake: -12.1095% +/- 5.24458% (n=13)
  Reviewed-by: Jason Ekstrand <[email protected]>
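The decision above needs only a conservative record of which bytes may hold defined data. A minimal sketch, assuming a single `[start, end)` interval per buffer, as the following commit's subject ("Track a range of the buffer which contains valid data") suggests; the struct and function names are hypothetical, and GPU-side writes would also have to extend the range for this to be safe.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical tracker for the byte range [start, end) that may hold
 * defined data. The range is empty while start >= end. */
struct valid_range {
    uint64_t start, end;
};

static void valid_range_init(struct valid_range *r)
{
    r->start = UINT64_MAX;
    r->end = 0;
}

/* Grows the range to cover a write of [offset, offset + size). A single
 * interval over-approximates: gaps between writes count as valid too. */
static void valid_range_mark_written(struct valid_range *r,
                                     uint64_t offset, uint64_t size)
{
    if (offset < r->start)
        r->start = offset;
    if (offset + size > r->end)
        r->end = offset + size;
}

/* A write may skip stall avoidance when it does not touch the valid range:
 * the old contents there are undefined, so nothing can depend on them. */
static bool can_write_async(const struct valid_range *r,
                            uint64_t offset, uint64_t size)
{
    return offset + size <= r->start || offset >= r->end;
}
```

The over-approximation only costs performance (a write into a gap is treated as unsafe); it never lets an unsafe write go through asynchronously.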
* i965: Track a range of the buffer which contains valid data. | Kenneth Graunke | 2017-07-13 | 2 | -4/+48
  Reviewed-by: Chris Wilson <[email protected]>
  Reviewed-by: Jason Ekstrand <[email protected]>
* i965: Add a "write" parameter to intel_bufferobj_buffer. | Kenneth Graunke | 2017-07-13 | 9 | -19/+26
  This doesn't do anything yet, but soon we'll want to know whether an access to a buffer section may write that data, or only reads it.
  Reviewed-by: Jason Ekstrand <[email protected]>
* i965: Convert GS_STATE to genxml. | Rafael Antognolli | 2017-07-13 | 5 | -172/+54
  Merge the code with gen6+ 3DSTATE_GS, and delete brw_gs_state.c, together with brw_gs_unit_state.
  Signed-off-by: Rafael Antognolli <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Prepare gs_state emitting code to include gen4-5. | Rafael Antognolli | 2017-07-13 | 1 | -13/+11
  Since we always call brw_batch_emit anyway, we can hopefully make things simpler by calling it only once, and then branching inside its body. This can be helpful when bringing the gen4-5 code into this function.
  Additionally, check for GEN_GEN == 6 instead of < 7 in cases that won't apply to lower gens.
  Signed-off-by: Rafael Antognolli <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Remove upload_gs_state_for_tf. | Rafael Antognolli | 2017-07-13 | 4 | -60/+16
  This function only emits a particular case of 3DSTATE_GS. Instead, we can do that inside genX(upload_gs_state), and later reuse part of that code for emitting gen4-5 state.
  There's the additional benefit of allowing us to remove gen6_gs_state.c, which was only left because of this function.
  Signed-off-by: Rafael Antognolli <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Convert BLEND_CONSTANT_COLOR state to genxml. | Rafael Antognolli | 2017-07-13 | 3 | -64/+27
  It's a very simple conversion, and it allows us to delete brw_cc.c.
  Signed-off-by: Rafael Antognolli <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Convert CC state on gen4-5 to genxml. | Rafael Antognolli | 2017-07-13 | 5 | -284/+68
  Use set_blend_entry_bits and set_depth_stencil_bits to fill most of the color calc struct, and then manually update the rest.
  v2:
  - Always check for depth_irb (Ken)
  - Always set Backface Stencil Ref (Ken)
  - Always set alpha reference value (Ken)
  Signed-off-by: Rafael Antognolli <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Move color calc code around a bit. | Rafael Antognolli | 2017-07-13 | 1 | -8/+8
  This makes the code more consistent across generations.
  Signed-off-by: Rafael Antognolli <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Check for alpha channel just like in gen6+. | Rafael Antognolli | 2017-07-13 | 1 | -1/+4
  gen6+ uses _mesa_base_format_has_channel() to check for the alpha channel, while gen4-5 use ctx->DrawBuffer->Visual.alphaBits. By using _mesa_base_format_has_channel() here we keep the same behavior across all gens.
  While initially both ways of checking the alpha channel seemed correct to me, this change also seems to fix the fbo-blending-formats piglit test on gen4.
  Signed-off-by: Rafael Antognolli <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>