summaryrefslogtreecommitdiffstats
path: root/src
Commit message (Collapse)AuthorAgeFilesLines
* mesa: Make get_shader_flags publicly availableGregory Hainaut2014-02-212-3/+6
| | | | | | | | | | Future patches will use this function outside shaderapi.c. This was originally included in another patch, but it was split out by Ian Romanick. Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Jordan Justen <[email protected]>
* mesa/sso: Add extension entry points for GL_ARB_separate_shader_objectsGregory Hainaut2014-02-2112-44/+1154
| | | | | | | | | | | | | | | | | | | | | | | | | Nothings implemented yet but glProgramUniform* which are mostly a copy/paste of the older function glUniform* I create dedicated pipelineobj.[ch] file that will contains function related to the "new" pipeline container object. V2: formatting improvement V3: * indentation fix * Update copyright * Add a comment on ProgramParameteri already present in another extension * Remove TODO, will be readded on correct patch V4 (idr): * Fix dispatch_sanity unit test * Make extension string available in core profiles (instead of just compatibility). * Trivial reformating Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Jordan Justen <[email protected]>
* glsl/sso: Add parser and AST-to-HIR support for separate shader object layoutsIan Romanick2014-02-214-13/+71
| | | | | | | | | | | | GL_ARB_separate_shader_objects adds the ability to specify location layouts for interstage inputs and outputs. In addition, this extension makes 'in' and 'out' generally available for shader inputs and outputs. This mimics the behavior of GL_ARB_explicit_attrib_location. Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Jordan Justen <[email protected]>
* mesa/sso: Add extension tracking for ARB_separate_shader_objectsIan Romanick2014-02-214-0/+10
| | | | | | | This adds the necessary bits for both the API and the GLSL compiler. Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Jordan Justen <[email protected]>
* mesa: Refactor per-stage link check to its own functionIan Romanick2014-02-211-68/+34
| | | | | Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Jordan Justen <[email protected]>
* i965: Stop throwing away our double precision for time calculations.Eric Anholt2014-02-213-4/+4
| | | | | | Fixes negative times being reported in our perf debug. Reviewed-by: Kenneth Graunke <[email protected]>
* meta: Add support for integer blits.Eric Anholt2014-02-212-7/+71
| | | | | | | | Compared to i965, the code generated doesn't use the AVG instruction. But I'm not sure that multisampled integer resolves are really that important to worry about. Reviewed-by: Kenneth Graunke <[email protected]>
* meta: Add support for doing MSAA to MSAA blits.Eric Anholt2014-02-212-49/+104
| | | | | | | These are non-stretched, non-resolving blits, so it's just a matter of sampling once from our gl_SampleID and storing that to our color/depth. Reviewed-by: Kenneth Graunke <[email protected]>
* meta: Save and restore a bunch of MSAA state.Eric Anholt2014-02-212-4/+38
| | | | | | | | | | We're disabling GL_MULTISAMPLE, so we didn't need to worry about a lot of that state. But to do MSAA to MSAA blits, we need to start handling more state. v2: Fix pasteo caught by Kenneth. Reviewed-by: Kenneth Graunke <[email protected]>
* meta: Try to do blending of sRGB values in linear colorspace.Eric Anholt2014-02-211-5/+25
| | | | | | | Blending of values would occur when doing GL_LINEAR filtering with scaling, and in an upcoming commit when doing MSAA resolves. Reviewed-by: Kenneth Graunke <[email protected]>
* meta: Add support for doing multisample resolves.Eric Anholt2014-02-212-12/+197
| | | | | | | | | Note that this doesn't handle GL_EXT_multisample_scaled_blit yet. The i965 code for that extension bakes in knowledge of the sample positions (well, knowledge of the sample positions aligned to a lower-resolution grid), which we would have to do at runtime somehow for meta. Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Fix miptree matching for multisampled, non-interleaved miptrees.Eric Anholt2014-02-212-1/+16
| | | | | | | | We haven't been executing this code before the meta-blit case, because we've been flagging the miptree as validated at texstorage time, and never having to revalidate. Reviewed-by: Kenneth Graunke <[email protected]>
* mesa: Remove unnecessary condition.Courtney Goeltzenleuchter2014-02-211-2/+1
| | | | | | | | | Identified by Valgrind memory check. Initialized block-opaque in a different patch. This test seems unnecessary. If opaque must be true, just set to true. Reviewed-by: Matt Turner <[email protected]> Signed-off-by: Courtney Goeltzenleuchter <[email protected]>
* clover: Unabbreviate a few data accessor names for consistency.Francisco Jerez2014-02-219-28/+28
| | | | Tested-by: Tom Stellard <[email protected]>
* clover: Replace the transfer(new ...) idiom with a safer create(...) helper ↵Francisco Jerez2014-02-216-56/+43
| | | | | | function. Tested-by: Tom Stellard <[email protected]>
* clover: Migrate a bunch of pointers and references in the object tree to ↵Francisco Jerez2014-02-2129-163/+179
| | | | | | smart references. Tested-by: Tom Stellard <[email protected]>
* clover: Allow storing a range into a container of different (but compatible) ↵Francisco Jerez2014-02-211-7/+7
| | | | | | element type. Tested-by: Tom Stellard <[email protected]>
* clover: Define an intrusive smart reference class.Francisco Jerez2014-02-211-0/+66
| | | | Tested-by: Tom Stellard <[email protected]>
* clover: Some improvements for the intrusive pointer class.Francisco Jerez2014-02-216-30/+45
| | | | | | | | Define some additional convenience operators, clean up the implementation slightly, and rename it to 'intrusive_ptr' for reasons that will be obvious in the next commit. Tested-by: Tom Stellard <[email protected]>
* clover: Fix up NULL constant pointer arguments.Francisco Jerez2014-02-211-1/+2
| | | | Tested-by: Tom Stellard <[email protected]>
* tgsi_ureg: add property_gs_invocationsJordan Justen2014-02-202-0/+11
| | | | | | | | Fixes a build break in state_tracker/st_program.c Signed-off-by: Jordan Justen <[email protected]> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=75278 Reviewed-by: Dave Airlie <[email protected]>
* i965/fs: Implement FS_OPCODE_[UN]PACK_HALF_2x16_SPLIT[_XY] opcodes.Kenneth Graunke2014-02-202-2/+81
| | | | | | | | | | | | I'd neglected to port these to Broadwell. Most of this code is copy and pasted from Gen7, but instead of using F32TO16/F16TO32, we just use MOV with HF register types. Fixes fs-packHalf2x16 and fs-unpackHalf2x16 tests (both the ARB extension and ES 3.0 variants). Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* i965: Drop bogus F32TO16/F16TO32 instructions on Broadwell - use MOV.Kenneth Graunke2014-02-203-6/+6
| | | | | | | | | | | | | | | Broadwell removed the F32TO16 and F16TO32 instructions. However, it has actual support for HF values, so they're actually just MOV. Fixes vs-packHalf2x16 and vs-unpackHalf2x16 tests (both the ARB extension and ES 3.0 variants). v2: Emulate F32TO16's align16 zeroing bug, since Chad's front end code relies on it happening. We can probably refactor this code to be better later. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* i965: Create a hardware context before initializing state module.Kenneth Graunke2014-02-201-6/+6
| | | | | | | | | | | | | | | | | | | | | brw_init_state() calls brw_upload_initial_gpu_state(). If hardware contexts are enabled (brw->hw_ctx != NULL), this will upload some initial invariant state for the GPU. Without hardware contexts, we rely on this state being uploaded via atoms that subscribe to the BRW_NEW_CONTEXT bit. Commit 46d3c2bf4ddd227193b98861f1e632498fe547d8 accidentally moved the call to brw_init_state() before creating a hardware context. This meant brw_upload_initial_gpu_state would always early return. Except on Gen6+, we stopped uploading the initial GPU state via state atoms, so it never happened. Fixes a regression since 46d3c2bf4ddd227193b98861f1e632498fe547d8. Cc: "10.0 10.1" <[email protected]> Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* i965/fs: Implement scratch read/write support for Broadwell.Kenneth Graunke2014-02-201-9/+109
| | | | | | | | | | | | | | To make sure that both the Gen4 and Gen7 style messages work, I initially disabled the SHADER_OPCODE_GEN7_SCRATCH_READ optimization, ran Piglit, re-enabled it, and ran Piglit again. Both worked fine. Fixes 40 Piglit tests (most of the varying-packing category). v2: Move num_regs assertion from gen8_fs_generator to gen8_set_dp_scratch_message() (suggested by Eric). Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* i965: Add Gen8 assembly support for DP Scratch messages.Kenneth Graunke2014-02-202-0/+47
| | | | | | | | | | The new accessors will make it easy to do Gen7-style scratch messages. v2: Move num_regs assertion from gen8_fs_generator into gen8_set_dp_scratch_message() (suggested by Eric). Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* i965: Store absolute thread count in max_wm_threads on Broadwell.Kenneth Graunke2014-02-202-2/+5
| | | | | | | | | | | | | | | | | | In the past, 3DSTATE_PS took an absolute number of threads. Conversely, on Broadwell you always program 64, and it implicitly scales based on the GT-level with no special programming. So, I stored 64 in brw_device_info::max_wm_threads. However, I didn't realize that we also use max_wm_threads to compute the size of the scratch space buffer. In that case, we really need the absolute number of threads. This patch hardcodes 3DSTATE_PS to use the value it expects, and changes max_wm_threads back to a (completely fake) absolute thread count (once again copied from Haswell). Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* i965: Use MOV, not OR for setting URB write channel enables on Gen8+.Kenneth Graunke2014-02-201-5/+2
| | | | | | | | | | | | On Broadwell, g0.5 contains the "Scratch Space Pointer"; using OR puts some bits of that into "ignored" sections of our message header. While this doesn't hurt, it's also not terribly /useful/. Using MOV is sufficient to set the only interesting bits in this part of the message header. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* i965: Implement a CS stall workaround on Broadwell.Kenneth Graunke2014-02-201-0/+36
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | According to the latest documentation, any PIPE_CONTROL with the "Command Streamer Stall" bit set must also have another bit set, with five different options: - Render Target Cache Flush - Depth Cache Flush - Stall at Pixel Scoreboard - Post-Sync Operation - Depth Stall I chose "Stall at Pixel Scoreboard" since we've used it effectively in the past, but the choice is fairly arbitrary. Implementing this in the PIPE_CONTROL emit helpers ensures that the workaround will always take effect when it ought to. Apparently, this workaround may be necessary on older hardware as well; for now I've only added it to Broadwell as it's absolutely necessary there. Subsequent patches could add it to older platforms, provided someone tests it there. v2: Only flag "Stall at Pixel Scoreboard" when none of the other bits are set (suggested by Ian Romanick). v3: Prefix the function with "gen8" (requested by Eric). Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Ian Romanick <[email protected]> (v2) Reviewed-by: Eric Anholt <[email protected]>
* i965: support instanced GS on gen7Jordan Justen2014-02-205-2/+11
| | | | | | | | | | v3: * Properly prevent dual object mode execution when the invocation count > 1 Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Paul Berry <[email protected]> Reviewed-by: Anuj Phogat <[email protected]>
* i965: support gl_InvocationID for gen7Jordan Justen2014-02-205-3/+49
| | | | | | | | | | | | | v2: * Make gl_InvocationID a system value v3: * Properly shift from R0.1 into DST.4 by adding GS_OPCODE_GET_INSTANCE_ID Signed-off-by: Jordan Justen <[email protected]> Acked-by: Paul Berry <[email protected]> Reviewed-by: Anuj Phogat <[email protected]>
* glsl: add gl_InvocationID variable for ARB_gpu_shader5Jordan Justen2014-02-202-0/+3
| | | | | | | | | v2: * Make gl_InvocationID a system value Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Paul Berry <[email protected]> Reviewed-by: Anuj Phogat <[email protected]>
* main/shaderapi: GL_GEOMETRY_SHADER_INVOCATIONS GetProgramiv supportJordan Justen2014-02-201-0/+6
| | | | | | | | | v3: * Add check for ARB_gpu_shader5 Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Paul Berry <[email protected]> Reviewed-by: Anuj Phogat <[email protected]>
* mesa: initialize gl_geometry_program Invocations fieldJordan Justen2014-02-205-0/+5
| | | | | | Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Paul Berry <[email protected]> Reviewed-by: Anuj Phogat <[email protected]>
* glsl/linker: produce gl_shader_program Geom.InvocationsJordan Justen2014-02-203-0/+31
| | | | | | | | | | Grab the parsed invocation count, check for consistency during linking, and finally save the result in gl_shader_program Geom.Invocations. Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Paul Berry <[email protected]> Reviewed-by: Anuj Phogat <[email protected]>
* glsl: parse invocations layout qualifier for ARB_gpu_shader5Jordan Justen2014-02-203-0/+43
| | | | | | | | | | | | _mesa_glsl_parse_state in_qualifier->invocations will store the invocations count. v3: * Use in_qualifier to allow the primitive to be specied separately from the invocations count (merge_qualifiers) Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Anuj Phogat <[email protected]>
* glsl: Generate error for invalid input layout declarationsJordan Justen2014-02-201-0/+13
| | | | | | | | Fixes various piglit tests: spec/glsl-1.50/compiler/incorrect-in-layout-qualifier-*.geom Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Anuj Phogat <[email protected]>
* glsl: convert GS input primitive to use ast_type_qualifierJordan Justen2014-02-206-66/+96
| | | | | | | | | | | | | | | | | | | | | | | | We introduce a new merge_in_qualifier ast_type_qualifier which allows specialized handling of merging input layout qualifiers. By merging layout qualifiers into state->in_qualifier, we allow multiple input qualifiers. For example, the primitive type can be specified specified separately from the invocations count (ARB_gpu_shader5). state->gs_input_prim_type is moved into state->in_qualifier->prim_type state->gs_input_prim_type_specified is still processed separately so we can determine when the input primitive is specified. This is important since certain scenerios are not supported until after the primitive type has been specified in the shader code. v4: * Merge with compute shader input layout qualifiers Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Anuj Phogat <[email protected]>
* i965: Fix extra return value after winsys rb update refactor.Eric Anholt2014-02-201-1/+1
| | | | | Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=75172 Reviewed-by: Kenneth Graunke <[email protected]>
* i965/vs: Use samplers for UBOs in the VS like we do for non-UBO pulls.Eric Anholt2014-02-201-5/+18
| | | | | | | Improves performance of a dolphin emulator trace I had laying around by 3.60131% +/- 0.995887% (n=128). Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Add an optimization pass to remove redundant flags movs.Eric Anholt2014-02-202-0/+34
| | | | | | | | | | | | | | We generate steaming piles of these for the centroid workaround, and this quickly cleans them up. total instructions in shared programs: 1591228 -> 1590047 (-0.07%) instructions in affected programs: 26111 -> 24930 (-4.52%) GAINED: 0 LOST: 0 (Improved apps are l4d2, csgo, and dolphin) Reviewed-by: Matt Turner <[email protected]>
* gallivm: add smallfloat to float conversion not relying on cpu denorm handlingRoland Scheidegger2014-02-201-20/+65
| | | | | | | | | | | | | | | | | | | | | | The previous code relied on cpu denorm support for converting small float formats (such r11g11b10_float and r16_float) to floats, otherwise denorms are flushed to zero. We worked around that in llvmpipe blend code by reenabling denorms, but this did nothing for texture sampling. Now it would be possible to reenable it there too but I'm not really a fan of messing with fpu flags (and it seems we can't actually do it reliably with llvm in any case looking at some bug reports). (Not to mention if you actually have a lot of denorms in there, you can expect some order-of-magnitude slowdown with x86 cpus.) So instead use code which adjusts exponents etc. directly hence not relying on cpu denorm support for the rescaling mul. (We still need the fpu flag handling as we can't do float-to-smallfloat without using cpu denorms at least for now - I actually wanted to keep both the old and new code and using one or the other depending on from where it's called but that didn't work out as the parameter would have to be passed through too many layers than I'd like.) Reviewed-by: Zack Rusin <[email protected]> Reviewed-by: Si Chen <[email protected]>
* st/omx/enc: add multi scaling buffers for performance improvementLeo Liu2014-02-202-16/+29
| | | | | Signed-off-by: Leo Liu <[email protected]> Reviewed-by: Christian König <[email protected]>
* st/omx/dec/h264: fix prevFrameNumOffset handlingChristian König2014-02-201-0/+4
| | | | Signed-off-by: Christian König <[email protected]>
* i965: Actually claim to support MSAA on Broadwell.Kenneth Graunke2014-02-192-1/+10
| | | | | | | | | We need to advertise 8x, 4x, and 2x multisamples. Previously, we only claimed to support 0/1 samples. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Anuj Phogat <[email protected]>
* i965: Update physical width/height munging for 2x IMS MSAA.Kenneth Graunke2014-02-191-1/+6
| | | | | | | | | | | | | | | | | | I can't find any documentation to explain what ought to be done here, so I simply guessed based on the pattern I observed in the 4x/8x cases. It appears to work, but it could be totally wrong. I was able to find the Sandybridge PRM quote from the comments in the latest documentation: Shared Functions > 3D Sampler > Multisampled Surface Behavior. However, it only mentions 4x MSAA - not even 8x. After a substantial amount more digging, I was able to find a second page (incorrectly tagged) which confirmed the formulas in our code for 8x MSAA. However, that page didn't mention 2x MSAA at all. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Anuj Phogat <[email protected]>
* i965: Enable smooth points when multisampling without point sprites.Kenneth Graunke2014-02-191-1/+5
| | | | | | | | | | | | | | | | | According to the "Point Multisample Rasterization" of the OpenGL specification (3.0 or later), smooth points are supposed to be enabled implicitly when multisampling, regardless of the GL_POINT_SMOOTH flag. However, if GL_POINT_SPRITE is enabled, you get square points no matter what. Core contexts always enable point sprites, so this effectively makes smooth points go away, even in the case of multisampling. Fixes Piglit's EXT_framebuffer_multisample/point-smooth tests. (Yes, that's right folks, we actually have Piglit tests for this.) Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Anuj Phogat <[email protected]>
* i965: Thwack multisample enable bit in 3DSTATE_RASTER.Kenneth Graunke2014-02-192-0/+5
| | | | | | | | | | The meaning and effects of this bit are surprisingly complicated. See Rasterization > Windower > Multisampling > Multisample ModesState. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Anuj Phogat <[email protected]>
* i965: Only use the SIMD16 program for per-sample shading on Broadwell.Kenneth Graunke2014-02-191-9/+32
| | | | | | | | | | | | This restriction carries forward from earlier platforms. The code is ported straight from gen7_wm_state.c. v2: Actually do it right. v3: Add missing _NEW_MULTISAMPLE bit (caught by Eric). Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Anuj Phogat <[email protected]>
* i965: Set "Position XY Offset Select" bits in 3DSTATE_PS on Broadwell.Kenneth Graunke2014-02-191-0/+18
| | | | | | Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Anuj Phogat <[email protected]>