summaryrefslogtreecommitdiffstats
path: root/src
Commit message (Collapse)AuthorAgeFilesLines
* intel/blorp: Drop blorp_resolve_ccs_attachmentJason Ekstrand2017-11-272-61/+20
| | | | | | | | | | The only reason why we needed that version was because the Vulkan driver needed to be able to create the surface states so it could handle indirect clear colors. Now that blorp handles them natively, there's no need for the extra entrypoint. Reviewed-by: Lionel Landwerlin <[email protected]> Reviewed-by: Nanley Chery <[email protected]>
* anv: Let blorp handle indirect clear colors for CCS resolvesJason Ekstrand2017-11-273-67/+20
| | | | | Reviewed-by: Lionel Landwerlin <[email protected]> Reviewed-by: Nanley Chery <[email protected]>
* anv: Move get_fast_clear_state_address into anv_private.hJason Ekstrand2017-11-272-50/+33
| | | | | | | While we're at it, we break it into two nicely named functions. Reviewed-by: Lionel Landwerlin <[email protected]> Reviewed-by: Nanley Chery <[email protected]>
* intel/blorp: Take a range of layers in blorp_ccs_resolveJason Ekstrand2017-11-273-4/+8
| | | | | Reviewed-by: Lionel Landwerlin <[email protected]> Reviewed-by: Nanley Chery <[email protected]>
* intel/blorp: Add initial support for indirect clear colorsJason Ekstrand2017-11-276-0/+109
| | | | Reviewed-by: Lionel Landwerlin <[email protected]>
* i965/blorp: Use a designated initializer for blorp_surfJason Ekstrand2017-11-271-8/+9
| | | | | | | | This way uninitialized fields get automatically zeroed and it's safe to add more fields to blorp_surf. Reviewed-by: Lionel Landwerlin <[email protected]> Reviewed-by: Nanley Chery <[email protected]>
* intel/blorp: Add fast-clear to the special case in MSAA resolvesJason Ekstrand2017-11-271-2/+9
| | | | | | | | | | | | This doesn't go all the way of avoiding the txf_ms if it's fast-cleared, however it does at least make us only do it once. This should improve performance of MSAA resolves in the presence of lots of clear color. Without the patch, enabling fast-clears in the multisampling Sascha demo drops the framerate by about 10%. With this patch, enabling fast-clears increases the demo's framerate by 25%. Reviewed-by: Lionel Landwerlin <[email protected]> Reviewed-by: Nanley Chery <[email protected]>
* intel/blorp/blit: Rename blorp_nir_txf_ms_mcsJason Ekstrand2017-11-271-4/+5
| | | | | | | | | That name is already taken by one of the helpers in blorp_nir_builder.h and, while we haven't moved the guts of blorp_blit.c there yet, we'd like to start using some things from that header. Reviewed-by: Lionel Landwerlin <[email protected]> Reviewed-by: Nanley Chery <[email protected]>
* st/glsl_to_tgsi: make use of driver_cache_blob with the disk cacheTimothy Arceri2017-11-284-231/+110
| | | | | | | | | | | | driver_cache_blob was introduced with the i965 disk cache, it allows us to simplify the cache a little and possibly offers some minor speed improvements since we load the GLSL metadata and TGSI from disk in one pass. Using driver_cache_blob should also make it straight forward to implement binary support for ARB_get_program_binary in gallium. Reviewed-by: Marek Olšák <[email protected]>
* glsl: Fix typo nagivation -> navigationGwan-gyeong Mun2017-11-281-1/+1
| | | | | | Signed-off-by: Mun Gwan-gyeong <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]> Reviewed-by: Timothy Arceri <[email protected]>
* gl_table.py: add extern C guard for the generated glapitable.hEmil Velikov2017-11-271-0/+8
| | | | | | | | | | The header can be included from C++, hence contents should have appropriate notation. Cc: [email protected] Cc: Dylan Baker <[email protected]> Signed-off-by: Emil Velikov <[email protected]> Reviewed-by: Eric Engestrom <[email protected]>
* ac: pack legacy_surf_level betterMarek Olšák2017-11-271-3/+3
| | | | | | r600_texture: 1488 -> 1248 bytes Reviewed-by: Nicolai Hähnle <[email protected]>
* ac: change legacy_surf_level::slice_size to dword unitsMarek Olšák2017-11-2712-36/+38
| | | | | | | | | The next commit will reduce the size even more. v2: typecast to uint64_t manually v3: add more typecasts, add asserts Reviewed-by: Nicolai Hähnle <[email protected]>
* ac: pack ac_surface betterMarek Olšák2017-11-273-11/+12
| | | | | | r600_texture: 1736 -> 1488 bytes Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: always initialize max_forced_staging_uploadsMarek Olšák2017-11-271-0/+2
| | | | | | | | | r600_resource is malloc'd. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103808 Fixes: 4b0dc098b256 ("gallium/u_threaded: don't map big VRAM buffers for the first upload directly") Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: remove an old hack for evergreenMarek Olšák2017-11-271-10/+0
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: set COMPUTE_RESOURCE_LIMITS.FORCE_SIMD_DIST when profitableMarek Olšák2017-11-271-1/+16
| | | | | | ported from Vulkan Reviewed-by: Nicolai Hähnle <[email protected]>
* ac/nir: don't write tcs outputs to LDS that aren't read back.Dave Airlie2017-11-271-1/+16
| | | | | | | | | | | | If the TCS doesn't read back the outputs, no need to store them to LDS in the first place. (except for tess factors). This seems to give about 50fps (3290->3330) with tessellation demo. I haven't tested if it impacts DoW3 at all. Reviewed-by: Marek Olšák <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* nir: fill outputs_read field and add patch outputs read (v2)Dave Airlie2017-11-272-12/+30
| | | | | | | | This is to be used for TCS optimisations on radv. v2: don't set written on reads (nha) Reviewed-by: Timothy Arceri <[email protected]>
* r600/eg: dump event type in dumpsDave Airlie2017-11-271-0/+1
| | | | | | This just makes it easier to debug some things. Signed-off-by: Dave Airlie <[email protected]>
* nouveau/compiler: Allow to omit line numbers when printing instructionsTobias Klausmann2017-11-265-4/+13
| | | | | | | | | | | | | | | | This comes in handy when checking "NV50_PROG_DEBUG=1" outputs with diff! V2: - Use environmental variable (Karol Herbst) V3: - Use the already populated nv50_ir_prog_info to forward information to the print pass (Pierre Moreau) V4: - get rid of default value in PrintPass constructor Signed-off-by: Tobias Klausmann <[email protected]> Reviewed-by: Pierre Moreau <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* radeonsi: try flushing unflushed fences in si_fence_finish even when timeout ↵Nicolai Hähnle2017-11-261-3/+3
| | | | | | | | | | | | | | | | | == 0 Under certain conditions, waiting on a GL sync objects should act like a flush, regardless of the timeout. Portal 2, CS:GO, and presumably other Source engine games rely on this behavior and hang during loading without this fix. Fixes: bc65dcab3bc4 ("radeonsi: avoid syncing the driver thread in si_fence_finish") Signed-off-by: Marek Olšák <[email protected]> Tested-by: Kai Wasserbäch <[email protected]> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103902 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103904
* nv50/ir: move LateAlgebraicOpt to the very endIlia Mirkin2017-11-261-1/+1
| | | | | | | | | | | | | | | | | | | | | | Memory loads can take offsets, but the SHLADD will often attempt to consume the offsets too. As there may be multiple memory loads with the same base but different offsets, those would end up in a SHLADD instead of the offset of the memory operation. This moves the pass after we've had a chance to attempt to propagate immediate adds into the indirect offset. total instructions in shared programs : 6580681 -> 6567716 (-0.20%) total gprs used in shared programs : 944261 -> 943375 (-0.09%) total shared used in shared programs : 0 -> 0 (0.00%) total local used in shared programs : 15328 -> 15328 (0.00%) total bytes used in shared programs : 60339896 -> 60221504 (-0.20%) local shared gpr inst bytes helped 0 0 555 2698 2698 hurt 0 0 138 336 336 Signed-off-by: Ilia Mirkin <[email protected]>
* nv50/ir: when merging immediates/consts, load directlyIlia Mirkin2017-11-261-1/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | When a MERGE operation gets its constraint moves added, it susbstantially extends live ranges to be reusing an immediate from earlier in the program (not to mention the silliness of loading an immediate into a register, and then moving into another register). We detect these scenarios and insert moves that take the immediate or constbuf load directly into the register. If it's the last use, then we can just move that operation to the closer location. With SM35 (255 regs) we get these results: total instructions in shared programs : 6583670 -> 6580681 (-0.05%) total gprs used in shared programs : 950818 -> 944261 (-0.69%) total shared used in shared programs : 0 -> 0 (0.00%) total local used in shared programs : 15328 -> 15328 (0.00%) total bytes used in shared programs : 60367456 -> 60339896 (-0.05%) local shared gpr inst bytes helped 0 0 4584 3186 3186 hurt 0 0 55 968 968 I suspect they will be better for SM20 and SM30. Signed-off-by: Ilia Mirkin <[email protected]>
* nv50/ir: add optimization for modulo by a non-power-of-2 valueIlia Mirkin2017-11-261-0/+15
| | | | | | | | We can still use the optimized division methods which make use of multiplication with overflow. Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Tobias Klausmann <[email protected]>
* nv50/ir: optimize signed integer modulo by pow-of-2Ilia Mirkin2017-11-252-10/+29
| | | | | | | | | It's common to use signed int modulo in GLSL. As it happens, the GLSL specs allow the result to be undefined, but that seems fairly surprising. It's not that much more effort to get it right, at least for positive modulo operators. Signed-off-by: Ilia Mirkin <[email protected]>
* util: Just give up and define PIPE_ARCH_LITTLE_ENDIAN on MSVCMatt Turner2017-11-251-2/+3
| | | | MSVC doesn't support #warning?! Getting really tired of this.
* util: Use preprocessor correctlyMatt Turner2017-11-251-1/+1
| | | | | Fixes: 6a353479a757 ("util: Assume little endian in the absence of platform-specific handling")
* freedreno/a4xx: add ARB_framebuffer_no_attachments supportIlia Mirkin2017-11-252-1/+6
| | | | | Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Rob Clark <[email protected]>
* freedreno/a4xx: add indirect draw supportIlia Mirkin2017-11-252-0/+33
| | | | | | | | This is a copy of the a5xx logic. Fails a few tests, but basic functionality is there. Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Rob Clark <[email protected]>
* freedreno: regenerate pm4 header, adjust code for new namesIlia Mirkin2017-11-253-114/+171
| | | | | Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Rob Clark <[email protected]>
* freedreno/a4xx: add stencil texturing supportIlia Mirkin2017-11-253-12/+35
| | | | | | | Copied from a5xx, should be identical. Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Rob Clark <[email protected]>
* freedreno/ir3: add a pass to lower tg4 to txl, enable gather on a4xxIlia Mirkin2017-11-257-3/+151
| | | | | | | | | | Unfortunately Adreno A4xx hardware returns incorrect results with the GATHER4 opcodes. As a result, we have to lower to 4 individual texture calls (txl since we have to force lod to 0). We achieve this using offsets, including on cube maps which normally never have offsets. Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Rob Clark <[email protected]>
* nir: allow texture offsets with cube mapsIlia Mirkin2017-11-251-2/+13
| | | | | | | | | | GL doesn't have this, but some hardware supports it. This is convenient for lowering tg4 to plain texture calls, which is necessary on Adreno A4xx hardware. Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]> Reviewed-by: Rob Clark <[email protected]>
* util: Fix disk_cache index calculation on big endianMatt Turner2017-11-251-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The cache-test test program attempts to create a collision (using key_a and key_a_collide) by making the first two bytes identical. The idea is fine -- the shader cache wants to use the first four characters of a SHA1 hex digest as the index. The following program unsigned char array[4] = {1, 2, 3, 4}; int *ptr = (int *)array; for (int i = 0; i < 4; i++) { printf("%02x", array[i]); } printf("\n"); printf("%08x\n", *ptr); prints 01020304 04030201 on little endian, and 01020304 01020304 on big endian. On big endian platforms reading the character array back as an int (as is done in disk_cache.c) does not yield the same results as reading the byte array. To get the first four characters of the SHA1 hex digest when we mask with CACHE_INDEX_KEY_MASK, we need to byte swap the int on big endian platforms. Bugzilla: https://bugs.freedesktop.org/103668 Bugzilla: https://bugs.gentoo.org/637060 Bugzilla: https://bugs.gentoo.org/636326 Fixes: 87ab26b2ab35 ("glsl: Add initial functions to implement an on-disk cache") Reviewed-by: Emil Velikov <[email protected]>
* util: Add a SHA1 unit test programMatt Turner2017-11-252-1/+67
| | | | Reviewed-by: Emil Velikov <[email protected]>
* util: Fix SHA1 implementation on big endianMatt Turner2017-11-251-1/+2
| | | | | | | | | | | The code defines a macro blk0(i) based on the preprocessor condition BYTE_ORDER == LITTLE_ENDIAN. If true, blk0(i) is defined as a byte swap operation. Unfortunately, if the preprocessor macros used in the test are no defined, then the comparison becomes 0 == 0 and it evaluates as true. Fixes: d1efa09d342b ("util: import sha1 implementation from OpenBSD") Reviewed-by: Emil Velikov <[email protected]>
* util: Assume little endian in the absence of platform-specific handlingMatt Turner2017-11-251-0/+3
|
* mesa: shrink VERT_ATTRIB bitfields to 32 bitsMarek Olšák2017-11-2514-49/+59
| | | | | | There are only 32 vertex attribs now. Reviewed-by: Ian Romanick <[email protected]>
* mesa: remove unused vertex attrib WEIGHTMarek Olšák2017-11-2511-27/+24
| | | | | | | | | | | | We don't support ARB_vertex_blend. Note that the attribute aliasing check for ARB_vertex_program had to be rewritten. vbo_context: 20344 -> 20008 bytes gl_context: 74672 -> 74616 bytes Reviewed-by: Ian Romanick <[email protected]>
* mesa: don't assign numbers to vertex attrib enums manuallyMarek Olšák2017-11-253-133/+133
| | | | | | I plan to remove one of them. Reviewed-by: Ian Romanick <[email protected]>
* gallium/hud: add HUD sharing within a context share groupMarek Olšák2017-11-254-14/+106
| | | | | | | This is needed for profiling multi-context applications like Chrome. One context can record queries and another context can draw the HUD. Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium/hud: update the HUD interface for multiple contextsMarek Olšák2017-11-259-19/+22
| | | | | | | This is the boring subset of the following commit. All new parameters are optional. Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium/hud: prevent a crash if the recording context is inactiveMarek Olšák2017-11-251-1/+4
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium/hud: separate code for record context init/releaseMarek Olšák2017-11-252-16/+37
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium/hud: separate code for draw context init/releaseMarek Olšák2017-11-251-70/+111
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium/hud: don't use hud->pipe in hud_parse_env_varMarek Olšák2017-11-251-6/+7
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium/hud: use cso_get_pipe_contextMarek Olšák2017-11-256-7/+8
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* cso: add cso_get_pipe_contextMarek Olšák2017-11-252-0/+5
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium/hud: pass pipe_context explicitly to most functionsMarek Olšák2017-11-255-64/+57
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>