summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* meta: Fix make check failures in setup_glsl_msaa_blit_scaled_shader()Anuj Phogat2014-10-011-8/+9
| | | | | | | introduced by commit 68ee950. Signed-off-by: Anuj Phogat <[email protected]> Reported-by: Mark Janes <[email protected]>
* mesa: fix _mesa_alloc_dispatch_table() declarationBrian Paul2014-10-011-1/+1
| | | | Insert 'void' parameter to match declaration in api_exec.h. Trivial.
* meta: (trivial) remove accidental double semicolonRoland Scheidegger2014-10-011-1/+1
|
* i965: Enable EXT_framebuffer_multisample_blit_scaled for gen8Anuj Phogat2014-10-011-2/+1
| | | | | Signed-off-by: Anuj Phogat <[email protected]> Reviewed-by: Jordan Justen <[email protected]>
* meta: Implement ext_framebuffer_multisample_blit_scaled extensionAnuj Phogat2014-10-012-13/+199
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Extension enables doing a multisample buffer resolve and buffer scaling using a single glBlitFrameBuffer() call. Currently, we have this extension implemented in BLORP which is only used by SNB and IVB. This patch implements the extension in meta path which makes it available to Broadwell. Implementation features: - Supports scaled resolves of 2X, 4X and 8X multisample buffers. - Avoids unnecessary shader compilations by storing the pre compiled shaders for each supported sample count. - Uses bilinear filtering for both GL_SCALED_RESOLVE_FASTEST_EXT and GL_SCALED_RESOLVE_NICEST_EXT filter options. This is an allowed behavior in the extension's spec. - I tried doing bicubic filtering for GL_SCALED_RESOLVE_NICEST_EXT filter. It made the edges in the image look little smoother but the image gets blurred causing no overall quality improvement. For now I have dropped the idea of doing different filtering for nicest filter. V2: - Minor changes to simplify the fragment shader. - Refactor the code to move i965 specific sample_map computation out of Meta. We now use ctx->Const.SampleMap{2,4,8}x variables initialized by the driver. - Use a simple msaa resolve shader for scaled resolves with scaling factor = 1.0. V3: - Make changes to create a string out of ctx->Const.SampleMap{2,4,8}x variables and use it in fragment shader. V4: - Make changes to use uint8_t type ctx->Const.SampleMap{2,4,8}x variables. Signed-off-by: Anuj Phogat <[email protected]> Reviewed-by: Jordan Justen <[email protected]>
* i965: Initialize the SampleMap{2,4,8}x variablesAnuj Phogat2014-10-013-0/+55
| | | | | | | | | | | | | with values specific to Intel hardware. V2: Define and use gen6_get_sample_map() function to initialize the variables. V3: Change the function name to gen6_set_sample_maps() and use memcpy() to fill in the data. Signed-off-by: Anuj Phogat <[email protected]> Reviewed-by: Jordan Justen <[email protected]>
* mesa: Add new variables in gl_context to store sample layoutAnuj Phogat2014-10-011-0/+32
| | | | | | | | | | | | | SampleMap{2,4,8}x variables are used in later patches to implement EXT_framebuffer_multisample_blit_scaled extension. V2: Use integer array instead of a string. Bump up the comment. V3: Use uint8_t type array. Signed-off-by: Anuj Phogat <[email protected]> Reviewed-by: Jordan Justen <[email protected]>
* st/va: implement vlVa(Query|Create|Get|Put|Destroy)ImageLeo Liu2014-10-013-9/+281
| | | | | | | | | | | | | | This patch implements functions for images support, which basically supports copy data between video surface and user buffers, in this case supports SW decode, and other video output v2: fix buffer size for odd-sized image case expose I420 format as well v3: fix YUV 4:2:2 format data buffer size cleanup I420 format exposure Signed-off-by: Leo Liu <[email protected]>
* st/va: implement Picture functions for mpeg2 h264 and vc1Christian König2014-10-014-6/+371
| | | | | | | | This patch implements codec for mpeg2 h264 and vc1, populates codec parameters and pass them to HW driver. Signed-off-by: Christian König <[email protected]> Signed-off-by: Leo Liu <[email protected]>
* st/va: implement Context Surface and BufferChristian König2014-10-014-21/+320
| | | | | | | | | | | | | | This patch implements context managements, relate it HW driver, functions for video surface managements, and functions for application data memory buffer managements. implemented functions: vlVa(Create|Destroy)Context vlVa(Create|Destroy|Put)Surfaces vlVa(Create|Destroy)Buffer Signed-off-by: Christian König <[email protected]> Signed-off-by: Leo Liu <[email protected]>
* st/va: implement vlVa(Create|Destroy|Query|Get)ConfigChristian König2014-10-013-5/+143
| | | | | | | | | | | This patch is for application to query configuration, such as profiles, entrypoints, and attributes v2: fix missing profile with query Signed-off-by: Michael Varga <[email protected]> Signed-off-by: Christian König <[email protected]> Signed-off-by: Leo Liu <[email protected]>
* st/va: skeleton VAAPI state trackerChristian König2014-10-0116-0/+1052
| | | | | | | | | | | | | | | | This patch adds a skeleton VA-API state tracker, which is filled with live in the subsequent patches. v2: fixes in configure.ac and va state_tracker Makefile.am v3: do not link against libva. detect libva version, and correctly set driver entrypoint name. rebase(cleanup) targets/va/Makefile.am v4: cleanup va version auto detection add back targets/va/va.sym Signed-off-by: Christian König <[email protected]> Signed-off-by: Leo Liu <[email protected]> Reviewed-by: Alex Deucher <[email protected]>
* st/vdpau: move common functions to utilLeo Liu2014-10-012-77/+78
| | | | | | | | Break out these functions so that they can be shared with a other state trackers. They will be used in subsequent patches for the new VA-API state tracker. Signed-off-by: Leo Liu <[email protected]>
* freedreno: max-texture-lod-bias should be 15.0fRob Clark2014-10-011-1/+1
| | | | | | Fixes piglit lodbias test. Signed-off-by: Rob Clark <[email protected]>
* mesa: Avoid flagging _NEW_VIEWPORT on redundant viewport updates.Kenneth Graunke2014-10-011-0/+6
| | | | | | | | | Cuts the number of i965 color calculator viewport uploads by 100x (11017983 -> 113385) in 'x11perf -gc' with Glamor in Xephyr. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Brian Paul <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
* i965: Drop CACHE_NEW_VS_PROG from the gen7_sf_state atom.Kenneth Graunke2014-10-011-1/+1
| | | | | | | | | I believe when I wrote this code, gen6_sf_state used CACHE_NEW_VS_PROG, which has since been replaced by BRW_NEW_VUE_MAP_GEOM_OUT. It's not needed here anyway - only SBE needs it. Just a copy and paste mistake. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
* i965: Drop brwBindProgram driver hook.Kenneth Graunke2014-10-011-20/+0
| | | | | | | | | | | | | | | | This function flagged BRW_NEW_*_PROGRAM When ctx->{Vertex,Geometry,Fragment}Program._Current changes, core Mesa calls the BindProgram driver hook, which flagged BRW_NEW_*_PROGRAM. However, brw_upload_state also checks for that changing, sets the same flags, and also updates brw->fragment_program and so on. So, this looks to be entirely redundant. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Chris Forbes <[email protected]>
* i965: Add missing /* BRW_NEW_FRAGMENT_PROGRAM */ comments.Kenneth Graunke2014-10-013-6/+7
| | | | | | | | | I had to dig a bit to figure out why this was necessary. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Chris Forbes <[email protected]>
* i965: Use "1ull" instead of "1" in BRW_NEW_* defines.Kenneth Graunke2014-10-011-32/+32
| | | | | | | | | | Now that the bitfield is a uint64_t, we should use 1ull. Currently, we only have 32 entries, so 1 works fine, but it's not future-proof. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Chris Forbes <[email protected]>
* i965: Use ~0ull when flagging all BRW_NEW_* dirty flags.Kenneth Graunke2014-10-013-4/+4
| | | | | | | | | ~0 is 0xFFFFFFFF, which only covers the first 32 bits. We need all 64. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Chris Forbes <[email protected]>
* i965: Fix INTEL_DEBUG=state to work with 64-bit dirty bits.Kenneth Graunke2014-10-011-16/+7
| | | | | | | | | | | This will keep INTEL_DEBUG=state working when we add BRW_NEW_* bits beyond 1 << 31. We missed doing this when widening the driver flags from uint32_t to uint64_t. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Chris Forbes <[email protected]>
* i965: Delete CACHE_NEW_BLORP_CONST_COLOR_PROG.Kenneth Graunke2014-10-012-3/+0
| | | | | | | | | Unused since krh rewrote fast clears to use meta. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Chris Forbes <[email protected]>
* i965: Fix typo in commentChris Forbes2014-10-011-1/+1
| | | | Signed-off-by: Chris Forbes <[email protected]>
* i965: Fix spelling of GEN7_SAMPLER_EWA_ANISOTROPIC_ALGORITHMChris Forbes2014-10-012-2/+2
| | | | Signed-off-by: Chris Forbes <[email protected]>
* llvmpipe: Add missing LLVMGetGlobalContext() arg in lp_test_format.c.Vinson Lee2014-09-301-1/+1
| | | | | | | | | | | | | | | | | | Fix build error introduced with commit eedbce9c63a3f385908bdc8a69e8be98dd3522ff. lp_test_format.c: In function ‘test_format_unorm8’: lp_test_format.c:226:4: error: too few arguments to function ‘gallivm_create’ gallivm = gallivm_create("test_module_unorm8"); ^ In file included from ../../../../src/gallium/auxiliary/gallivm/lp_bld_format.h:38:0, from lp_test_format.c:42: ../../../../src/gallium/auxiliary/gallivm/lp_bld_init.h:58:1: note: declared here gallivm_create(const char *name, LLVMContextRef context); ^ Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=84538 Signed-off-by: Vinson Lee <[email protected]>
* glx/dri3: Provide error diagnostics when DRI3 allocation failsKeith Packard2014-09-301-8/+25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Instead of just segfaulting in the driver when a buffer allocation fails, report error messages indicating what went wrong so that we can debug things. As a simple example, chromium wraps Mesa in a sandbox which doesn't allow access to most syscalls, including the ability to create shared memory segments for fences. Before, you'd get a simple segfault in mesa and your 3D acceleration would fail. Now you get: $ chromium --disable-gpu-blacklist [10618:10643:0930/200525:ERROR:nss_util.cc(856)] After loading Root Certs, loaded==false: NSS error code: -8018 libGL: pci id for fd 12: 8086:0a16, driver i965 libGL: OpenDriver: trying /local-miki/src/mesa/mesa/lib/i965_dri.so libGL: Can't open configuration file /home/keithp/.drirc: Operation not permitted. libGL: Can't open configuration file /home/keithp/.drirc: Operation not permitted. libGL error: DRI3 Fence object allocation failure Operation not permitted [10618:10618:0930/200525:ERROR:command_buffer_proxy_impl.cc(153)] Could not send GpuCommandBufferMsg_Initialize. [10618:10618:0930/200525:ERROR:webgraphicscontext3d_command_buffer_impl.cc(236)] CommandBufferProxy::Initialize failed. [10618:10618:0930/200525:ERROR:webgraphicscontext3d_command_buffer_impl.cc(256)] Failed to initialize command buffer. This made it pretty easy to diagnose the problem in the referenced bug report. Bugzilla: https://code.google.com/p/chromium/issues/detail?id=415681 Signed-off-by: Keith Packard <[email protected]> Cc: [email protected] Reviewed-by: Matt Turner <[email protected]>
* glx/dri3: Use four buffers until X driver supports async flipsKeith Packard2014-09-302-2/+24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A driver which doesn't have async flip support will queue up flips without any way to replace them afterwards. This means we've got a scanout buffer pinned as soon as we schedule a flip and so we need another buffer to keep from stalling. When vblank_mode=0, if there are only three buffers we do: current scanout buffer = 0 at MSC 0 Render frame 1 to buffer 1 PresentPixmap for buffer 1 at MSC 1 This is sitting down in the kernel waiting for vblank to become the next scanout buffer Render frame 2 to buffer 2 PresentPixmap for buffer 2 at MSC 1 This cannot be displayed at MSC 1 because the kernel doesn't have any way to replace buffer 1 as the pending scanout buffer. So, best case this will get displayed at MSC 2. Now we block after this, waiting for one of the three buffers to become idle. We can't use buffer 0 because it is the scanout buffer. We can't use buffer 1 because it's sitting in the kernel waiting to become the next scanout buffer and we can't use buffer 2 because that's the most recent frame which will become the next scanout buffer if the application doesn't manage to generate another complete frame by MSC 2. With four buffers, we get: current scanout buffer = 0 at MSC 0 Render frame 1 to buffer 1 PresentPixmap for buffer 1 at MSC 1 This is sitting down in the kernel waiting for vblank to become the next scanout buffer Render frame 2 to buffer 2 PresentPixmap for buffer 2 at MSC 1 This cannot be displayed at MSC 1 because the kernel doesn't have any way to replace buffer 1 as the pending scanout buffer. So, best case this will get displayed at MSC 2. The X server will queue this swap until buffer 1 becomes the scanout buffer. Render frame 3 to buffer 3 PresentPixmap for buffer 3 at MSC 1 As soon as the X server sees this, it will replace the pending buffer 2 swap with this swap and release buffer 2 back to the application Render frame 4 to buffer 2 PresentPixmap for buffer 2 at MSC 1 Now we're in a steady state, flipping between buffer 2 and 3 waiting for one of them to be queued to the kernel. ... current scanout buffer = 1 at MSC 1 Now buffer 0 is free and (e.g.) buffer 2 is queued in the kernel to be the scanout buffer at MSC 2 Render frames, flipping between buffer 0 and 3 When the system can replace a queued buffer, and we update Present to take advantage of that, we can use three buffers and get: current scanout buffer = 0 at MSC 0 Render frame 1 to buffer 1 PresentPixmap for buffer 1 at MSC 1 This is sitting waiting for vblank to become the next scanout buffer Render frame 2 to buffer 2 PresentPixmap for buffer 2 at MSC 1 Queue this for display at MSC 1 1. There are three possible results: 1) We're still before MSC 1. Buffer 1 is released, buffer 2 is queued waiting for MSC 1. 2) We're now after MSC 1. Buffer 0 was released at MSC 1. Buffer 1 is the current scanout buffer. a) If the user asked for a tearing update, we swap scanout from buffer 1 to buffer 2 and release buffer 1. b) If the user asked for non-tearing update, we queue buffer 2 for the MSC 2. In all three cases, we have a buffer released (call it 'n'), ready to receive the next frame. Render frame 3 to buffer n PresentPixmap for buffer n If we're still before MSC 1, then we'll ask to present at MSC 1. Otherwise, we'll ask to present at MSC 2. Present already does this if the driver offers async flips, however it does this by waiting for the right vblank event and sending an async flip right at that point. I've hacked the intel driver to offer this, but I get tearing at the top of the screen. I think this is because flips are always done from within the ring, and so the latency between the vblank event and the async flip happening can cause tearing at the top of the screen. That's why I'm keying the need for the extra buffer on the lack of 2D driver support for async flips. Signed-off-by: Keith Packard <[email protected]> Acked-by: Jason Ekstrand <[email protected]> Tested-by: Dylan Baker <[email protected]>
* i965/fs: Fix the buildJason Ekstrand2014-09-301-1/+1
|
* i965/fs: Fix an uninitialized value warningsJason Ekstrand2014-09-301-3/+4
| | | | | Signed-off-by: Jason Ekstrand <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* galahad: fix indirect drawRoland Scheidegger2014-10-011-2/+9
| | | | | | | | Need to unwrap the indirect resource otherwise bad things will happen. Fixes random crashes and timeouts with piglit's arb_indirect_draw tests. Reviewed-by: Jose Fonseca <[email protected]>
* galahad: (trivial) handle cubemap arraysRoland Scheidegger2014-10-011-0/+7
| | | | Reviewed-by: Jose Fonseca <[email protected]>
* i965/fs: Emit compressed BFI2 instructions on Gen > 7.Matt Turner2014-09-301-1/+1
| | | | | | | IVB had a restriction that prevented us from emitting compressed three-source instructions, and although that was lifted on Haswell, Haswell had a new restriction that said BFI instructions specifically couldn't be compressed.
* i965/fs: Allow SIMD16 borrow/carry/64-bit multiply on Gen > 7.Matt Turner2014-09-301-3/+3
| | | | | | | These checks were intended for Gen 7 only. None of these restrictions apply to Gen 8. Reviewed-by: Ian Romanick <[email protected]>
* i965/fs: Set MUL source type to W/UW in 64-bit mul macro on Gen8.Matt Turner2014-09-301-1/+22
| | | | Reviewed-by: Ian Romanick <[email protected]>
* i965/fs: Optimize sqrt+inv into rsq.Matt Turner2014-09-301-0/+11
| | | | | | | | | | | | | | | | | | Transform sqrt a, b rcp c, a into sqrt a, b rsq c, b The improvement here is that we've broken a dependency between these instructions. Leads to 330 fewer INV instructions and 330 more RSQ. Reviewed-by: Anuj Phogat <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
* i965/vec4: Optimize sqrt+inv into rsq.Matt Turner2014-09-301-0/+11
| | | | | | | | | | | | | | | | | | | | | | | Transform sqrt a, b rcp c, a into sqrt a, b rsq c, b In most cases the sqrt's result is still used, so the improvement here is that we've broken a dependency between these instructions. Leads to 80 fewer INV instructions and 80 more RSQ. Occasionally the sqrt's result is no longer used, leading to: instructions in affected programs: 5005 -> 4949 (-1.12%) Reviewed-by: Anuj Phogat <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
* i965/vec4: Call opt_algebraic after opt_cse.Matt Turner2014-09-301-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The next patch adds an algebraic optimization for the pattern sqrt a, b rcp c, a and turns it into sqrt a, b rsq c, b but many vertex shaders do a = sqrt(b); var1 /= a; var2 /= a; which generates sqrt a, b rcp c, a rcp d, a If we apply the algebraic optimization before CSE, we'll end up with sqrt a, b rsq c, b rcp d, a Applying CSE combines the RCP instructions, preventing this from happening. No shader-db changes. Reviewed-by: Anuj Phogat <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
* i965/fs: Extend predicated break pass to predicate WHILE.Matt Turner2014-09-301-0/+36
| | | | | | | | Helps a handful of programs in Serious Sam 3 that use do-while loops. instructions in affected programs: 16114 -> 16075 (-0.24%) Reviewed-by: Ian Romanick <[email protected]>
* gallivm: Fix build for LLVM 3.2Mathias Fröhlich2014-10-014-3/+21
| | | | | | | | | | | | Do not rely on LLVMMCJITMemoryManagerRef being available. The c binding to the memory manager objects only appeared on llvm-3.4. The change is based on an initial patch of Brian Paul. Reviewed-by: Brian Paul <[email protected]> Tested-by: Brian Paul <[email protected]> Reviewed-by: Roland Scheidegger <[email protected]> Signed-off-by: Mathias Froehlich <[email protected]>
* freedreno: destroy transfer pool after blitterRob Clark2014-09-301-2/+2
| | | | | | | | Blitter can still have transfers hanging around which it frees in util_blitter_destroy(). So let it clean up before we yank the transfer_pool from under it. Signed-off-by: Rob Clark <[email protected]>
* freedreno/lowering: fix token calculation for loweringRob Clark2014-09-301-16/+39
| | | | | | | Indirect registers consume an additional token. Try to clean up the token calculation math a bit, and fix it at the same time. Signed-off-by: Rob Clark <[email protected]>
* i965/fs: Don't make a name for a vector splitting temporaryIan Romanick2014-09-301-3/+8
| | | | | | | | | | If the name is just going to get dropped, don't bother making it. If the name is made, release it sooner (rather than later). No change Valgrind massif results for a trimmed apitrace of dota2. Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* glsl: Don't make a name for the function return variableIan Romanick2014-09-301-4/+7
| | | | | | | | | | If the name is just going to get dropped, don't bother making it. If the name is made, release it sooner (rather than later). No change Valgrind massif results for a trimmed apitrace of dota2. Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* glsl: Don't allocate a name for ir_var_temporary variablesIan Romanick2014-09-304-0/+28
| | | | | | | | | | | | | | | | Valgrind massif results for a trimmed apitrace of dota2: n time(i) total(B) useful-heap(B) extra-heap(B) stacks(B) Before (32-bit): 74 40,578,719,715 67,762,208 62,263,404 5,498,804 0 After (32-bit): 52 40,565,579,466 66,359,800 61,187,818 5,171,982 0 Before (64-bit): 74 37,129,541,061 95,195,160 87,369,671 7,825,489 0 After (64-bit): 76 37,134,691,404 93,271,352 85,900,223 7,371,129 0 A real savings of 1.0MiB on 32-bit and 1.4MiB on 64-bit. Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* glsl: Use ir_var_temporary for compiler generated temporariesIan Romanick2014-09-303-3/+4
| | | | | | | | | | These few places were using ir_var_auto for seemingly no reason. The names were not added to the symbol table. No change Valgrind massif results for a trimmed apitrace of dota2. Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* glsl: Add context-level controls for whether temporaries have real namesIan Romanick2014-09-304-0/+23
| | | | | | | | | No change Valgrind massif results for a trimmed apitrace of dota2. v2: Minor rebase on _mesa_init_constants changes. Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* glsl: Never put ir_var_temporary variables in the symbol tableIan Romanick2014-09-305-5/+14
| | | | | | | | | | | Later patches will give every ir_var_temporary the same name in release builds. Adding a bunch of variables named "compiler_temp" to the symbol table can only cause problems. No change Valgrind massif results for a trimmed apitrace of dota2. Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* glsl: Add the possibility for ir_variable to have a non-ralloced nameIan Romanick2014-09-303-2/+30
| | | | | | | | | | Specifically, ir_var_temporary variables constructed with a NULL name will all have the name "compiler_temp" in static storage. No change Valgrind massif results for a trimmed apitrace of dota2. Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* glsl: Store ir_variable_data::_num_state_slots and ::binding in 16-bits eachIan Romanick2014-09-301-8/+16
| | | | | | | | | | | | | | | | | Valgrind massif results for a trimmed apitrace of dota2: n time(i) total(B) useful-heap(B) extra-heap(B) stacks(B) Before (32-bit): 44 40,577,049,140 68,118,608 62,441,063 5,677,545 0 After (32-bit): 71 40,583,408,411 67,761,528 62,263,519 5,498,009 0 Before (64-bit): 63 37,122,829,194 95,153,008 87,333,600 7,819,408 0 After (64-bit): 67 37,123,303,706 95,150,544 87,333,600 7,816,944 0 A real savings of 173KiB on 32-bit and no change on 64-bit. Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Tapani Pälli <[email protected]>
* glsl: Squish ir_variable::max_ifc_array_access and ::state_slots togetherIan Romanick2014-09-303-36/+48
| | | | | | | | | | | | | | | | | | | | | | | | At least one of these pointers must be NULL, and we can determine which will be NULL by looking at other fields. Use this information to store both pointers in the same location. If anyone can think of a better name for the union than "u", I'm all ears. Valgrind massif results for a trimmed apitrace of dota2: n time(i) total(B) useful-heap(B) extra-heap(B) stacks(B) Before (32-bit): 63 40,574,239,515 68,117,280 62,618,607 5,498,673 0 After (32-bit): 44 40,577,049,140 68,118,608 62,441,063 5,677,545 0 Before (64-bit): 53 37,126,451,468 95,150,256 87,711,304 7,438,952 0 After (64-bit): 63 37,122,829,194 95,153,008 87,333,600 7,819,408 0 A real savings of 173KiB on 32-bit and 368KiB on 64-bit. Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Tapani Pälli <[email protected]>