Commit message | Author | Age | Files | Lines
* i965: Add Gen8 assembly support for DP Scratch messages. | Kenneth Graunke | 2014-02-20 | 2 | -0/+47
  The new accessors will make it easy to do Gen7-style scratch messages.

  v2: Move num_regs assertion from gen8_fs_generator into gen8_set_dp_scratch_message() (suggested by Eric).

  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
* i965: Store absolute thread count in max_wm_threads on Broadwell. | Kenneth Graunke | 2014-02-20 | 2 | -2/+5
  In the past, 3DSTATE_PS took an absolute number of threads. Conversely, on Broadwell you always program 64, and it implicitly scales based on the GT-level with no special programming. So, I stored 64 in brw_device_info::max_wm_threads.

  However, I didn't realize that we also use max_wm_threads to compute the size of the scratch space buffer. In that case, we really need the absolute number of threads.

  This patch hardcodes 3DSTATE_PS to use the value it expects, and changes max_wm_threads back to a (completely fake) absolute thread count (once again copied from Haswell).

  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
* i965: Use MOV, not OR for setting URB write channel enables on Gen8+. | Kenneth Graunke | 2014-02-20 | 1 | -5/+2
  On Broadwell, g0.5 contains the "Scratch Space Pointer"; using OR puts some bits of that into "ignored" sections of our message header. While this doesn't hurt, it's also not terribly /useful/.

  Using MOV is sufficient to set the only interesting bits in this part of the message header.

  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
* i965: Implement a CS stall workaround on Broadwell. | Kenneth Graunke | 2014-02-20 | 1 | -0/+36
  According to the latest documentation, any PIPE_CONTROL with the "Command Streamer Stall" bit set must also have another bit set, with five different options:

   - Render Target Cache Flush
   - Depth Cache Flush
   - Stall at Pixel Scoreboard
   - Post-Sync Operation
   - Depth Stall

  I chose "Stall at Pixel Scoreboard" since we've used it effectively in the past, but the choice is fairly arbitrary.

  Implementing this in the PIPE_CONTROL emit helpers ensures that the workaround will always take effect when it ought to.

  Apparently, this workaround may be necessary on older hardware as well; for now I've only added it to Broadwell as it's absolutely necessary there. Subsequent patches could add it to older platforms, provided someone tests it there.

  v2: Only flag "Stall at Pixel Scoreboard" when none of the other bits are set (suggested by Ian Romanick).

  v3: Prefix the function with "gen8" (requested by Eric).

  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Ian Romanick <[email protected]> (v2)
  Reviewed-by: Eric Anholt <[email protected]>
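The v2 rule above can be sketched in a few lines of C. This is only an illustration: the flag names and bit positions are hypothetical stand-ins, not the driver's real PIPE_CONTROL definitions, and the helper name is made up.

```c
#include <stdint.h>

/* Hypothetical flag bits for illustration; the real driver defines its own. */
enum {
    PC_CS_STALL               = 1 << 0,
    PC_RENDER_TARGET_FLUSH    = 1 << 1,
    PC_DEPTH_CACHE_FLUSH      = 1 << 2,
    PC_STALL_AT_SCOREBOARD    = 1 << 3,
    PC_POST_SYNC_OP           = 1 << 4,
    PC_DEPTH_STALL            = 1 << 5
};

/* If CS Stall is requested and none of the five companion bits is set,
 * add "Stall at Pixel Scoreboard"; otherwise leave the flags untouched. */
static uint32_t add_cs_stall_workaround_bits(uint32_t flags)
{
    const uint32_t companions = PC_RENDER_TARGET_FLUSH |
                                PC_DEPTH_CACHE_FLUSH |
                                PC_STALL_AT_SCOREBOARD |
                                PC_POST_SYNC_OP |
                                PC_DEPTH_STALL;

    if ((flags & PC_CS_STALL) && !(flags & companions))
        flags |= PC_STALL_AT_SCOREBOARD;

    return flags;
}
```

Calling such a helper from the single place that emits PIPE_CONTROL is what makes the workaround apply everywhere, as the message describes.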
* i965: support instanced GS on gen7 | Jordan Justen | 2014-02-20 | 5 | -2/+11
  v3:
   * Properly prevent dual object mode execution when the invocation count > 1

  Signed-off-by: Jordan Justen <[email protected]>
  Reviewed-by: Paul Berry <[email protected]>
  Reviewed-by: Anuj Phogat <[email protected]>
* i965: support gl_InvocationID for gen7 | Jordan Justen | 2014-02-20 | 5 | -3/+49
  v2:
   * Make gl_InvocationID a system value

  v3:
   * Properly shift from R0.1 into DST.4 by adding GS_OPCODE_GET_INSTANCE_ID

  Signed-off-by: Jordan Justen <[email protected]>
  Acked-by: Paul Berry <[email protected]>
  Reviewed-by: Anuj Phogat <[email protected]>
* glsl: add gl_InvocationID variable for ARB_gpu_shader5 | Jordan Justen | 2014-02-20 | 2 | -0/+3
  v2:
   * Make gl_InvocationID a system value

  Signed-off-by: Jordan Justen <[email protected]>
  Reviewed-by: Paul Berry <[email protected]>
  Reviewed-by: Anuj Phogat <[email protected]>
* main/shaderapi: GL_GEOMETRY_SHADER_INVOCATIONS GetProgramiv support | Jordan Justen | 2014-02-20 | 1 | -0/+6
  v3:
   * Add check for ARB_gpu_shader5

  Signed-off-by: Jordan Justen <[email protected]>
  Reviewed-by: Paul Berry <[email protected]>
  Reviewed-by: Anuj Phogat <[email protected]>
* mesa: initialize gl_geometry_program Invocations field | Jordan Justen | 2014-02-20 | 5 | -0/+5
  Signed-off-by: Jordan Justen <[email protected]>
  Reviewed-by: Paul Berry <[email protected]>
  Reviewed-by: Anuj Phogat <[email protected]>
* glsl/linker: produce gl_shader_program Geom.Invocations | Jordan Justen | 2014-02-20 | 3 | -0/+31
  Grab the parsed invocation count, check for consistency during linking, and finally save the result in gl_shader_program Geom.Invocations.

  Signed-off-by: Jordan Justen <[email protected]>
  Reviewed-by: Paul Berry <[email protected]>
  Reviewed-by: Anuj Phogat <[email protected]>
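The grab/check/save sequence might look roughly like the following C sketch. It is not the linker's actual code: the function name, the convention that 0 means "no invocations qualifier in this shader", and the fallback of 1 when nothing is declared are all assumptions made for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* counts[i] is the invocation count parsed from the i-th geometry shader
 * being linked, or 0 if that shader did not declare one (assumed convention).
 * Returns false if two shaders disagree; otherwise stores the linked value. */
static bool link_gs_invocations(const int *counts, size_t n, int *linked)
{
    int result = 0;

    for (size_t i = 0; i < n; i++) {
        if (counts[i] == 0)
            continue;                 /* not specified here */
        if (result != 0 && result != counts[i])
            return false;             /* conflicting declarations */
        result = counts[i];
    }

    /* Assumed default: a single invocation when nothing was declared. */
    *linked = (result != 0) ? result : 1;
    return true;
}
```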
* glsl: parse invocations layout qualifier for ARB_gpu_shader5 | Jordan Justen | 2014-02-20 | 3 | -0/+43
  _mesa_glsl_parse_state in_qualifier->invocations will store the invocations count.

  v3:
   * Use in_qualifier to allow the primitive to be specified separately from the invocations count (merge_qualifiers)

  Signed-off-by: Jordan Justen <[email protected]>
  Reviewed-by: Anuj Phogat <[email protected]>
* glsl: Generate error for invalid input layout declarations | Jordan Justen | 2014-02-20 | 1 | -0/+13
  Fixes various piglit tests:
  spec/glsl-1.50/compiler/incorrect-in-layout-qualifier-*.geom

  Signed-off-by: Jordan Justen <[email protected]>
  Reviewed-by: Anuj Phogat <[email protected]>
* glsl: convert GS input primitive to use ast_type_qualifier | Jordan Justen | 2014-02-20 | 6 | -66/+96
  We introduce a new merge_in_qualifier ast_type_qualifier which allows specialized handling of merging input layout qualifiers.

  By merging layout qualifiers into state->in_qualifier, we allow multiple input qualifiers. For example, the primitive type can be specified separately from the invocations count (ARB_gpu_shader5).

  state->gs_input_prim_type is moved into state->in_qualifier->prim_type

  state->gs_input_prim_type_specified is still processed separately so we can determine when the input primitive is specified. This is important since certain scenarios are not supported until after the primitive type has been specified in the shader code.

  v4:
   * Merge with compute shader input layout qualifiers

  Signed-off-by: Jordan Justen <[email protected]>
  Reviewed-by: Anuj Phogat <[email protected]>
* i965: Fix extra return value after winsys rb update refactor. | Eric Anholt | 2014-02-20 | 1 | -1/+1
  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=75172
  Reviewed-by: Kenneth Graunke <[email protected]>
* i965/vs: Use samplers for UBOs in the VS like we do for non-UBO pulls. | Eric Anholt | 2014-02-20 | 1 | -5/+18
  Improves performance of a dolphin emulator trace I had lying around by 3.60131% +/- 0.995887% (n=128).

  Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Add an optimization pass to remove redundant flags movs. | Eric Anholt | 2014-02-20 | 2 | -0/+34
  We generate steaming piles of these for the centroid workaround, and this quickly cleans them up.

  total instructions in shared programs: 1591228 -> 1590047 (-0.07%)
  instructions in affected programs:     26111 -> 24930 (-4.52%)
  GAINED:                                0
  LOST:                                  0

  (Improved apps are l4d2, csgo, and dolphin)

  Reviewed-by: Matt Turner <[email protected]>
* gallivm: add smallfloat to float conversion not relying on cpu denorm handling | Roland Scheidegger | 2014-02-20 | 1 | -20/+65
  The previous code relied on cpu denorm support for converting small float formats (such as r11g11b10_float and r16_float) to floats, otherwise denorms are flushed to zero. We worked around that in llvmpipe blend code by reenabling denorms, but this did nothing for texture sampling. Now it would be possible to reenable it there too but I'm not really a fan of messing with fpu flags (and it seems we can't actually do it reliably with llvm in any case looking at some bug reports). (Not to mention if you actually have a lot of denorms in there, you can expect some order-of-magnitude slowdown with x86 cpus.)

  So instead use code which adjusts exponents etc. directly, hence not relying on cpu denorm support for the rescaling mul. (We still need the fpu flag handling as we can't do float-to-smallfloat without using cpu denorms at least for now. I actually wanted to keep both the old and new code and use one or the other depending on where it's called from, but that didn't work out as the parameter would have to be passed through more layers than I'd like.)

  Reviewed-by: Zack Rusin <[email protected]>
  Reviewed-by: Si Chen <[email protected]>
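As a rough scalar illustration of "adjusting exponents directly", the C sketch below widens a 16-bit float to 32 bits using integer operations only, so small-format denormals are renormalized by shifting rather than by a floating-point multiply. It is not the gallivm code (which builds vectorized LLVM IR), and the function name is hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Convert an IEEE 754 binary16 bit pattern to float without relying on
 * the CPU's denormal handling: all work is done on the integer fields. */
static float half_to_float(uint16_t h)
{
    uint32_t sign = (uint32_t)(h & 0x8000u) << 16;
    uint32_t exp  = (h >> 10) & 0x1fu;
    uint32_t man  = h & 0x3ffu;
    uint32_t bits;
    float f;

    if (exp == 0x1fu) {
        /* Inf or NaN: all-ones exponent, mantissa carried over. */
        bits = sign | 0x7f800000u | (man << 13);
    } else if (exp != 0) {
        /* Normal number: rebias the exponent from 15 to 127. */
        bits = sign | ((exp + 112u) << 23) | (man << 13);
    } else if (man != 0) {
        /* Denormal: shift the mantissa up until the implicit bit appears,
         * lowering the float exponent by one for each shift. */
        uint32_t e = 113u;
        while (!(man & 0x400u)) {
            man <<= 1;
            e--;
        }
        bits = sign | (e << 23) | ((man & 0x3ffu) << 13);
    } else {
        /* Signed zero. */
        bits = sign;
    }

    memcpy(&f, &bits, sizeof f);
    return f;
}
```

Every binary16 denormal becomes a normal binary32 value this way, so the result does not depend on whether denormals are flushed to zero.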
* st/omx/enc: add multi scaling buffers for performance improvement | Leo Liu | 2014-02-20 | 2 | -16/+29
  Signed-off-by: Leo Liu <[email protected]>
  Reviewed-by: Christian König <[email protected]>
* st/omx/dec/h264: fix prevFrameNumOffset handling | Christian König | 2014-02-20 | 1 | -0/+4
  Signed-off-by: Christian König <[email protected]>
* i965: Actually claim to support MSAA on Broadwell. | Kenneth Graunke | 2014-02-19 | 2 | -1/+10
  We need to advertise 8x, 4x, and 2x multisamples. Previously, we only claimed to support 0/1 samples.

  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
  Reviewed-by: Anuj Phogat <[email protected]>
* i965: Update physical width/height munging for 2x IMS MSAA. | Kenneth Graunke | 2014-02-19 | 1 | -1/+6
  I can't find any documentation to explain what ought to be done here, so I simply guessed based on the pattern I observed in the 4x/8x cases. It appears to work, but it could be totally wrong.

  I was able to find the Sandybridge PRM quote from the comments in the latest documentation: Shared Functions > 3D Sampler > Multisampled Surface Behavior. However, it only mentions 4x MSAA - not even 8x.

  After a substantial amount more digging, I was able to find a second page (incorrectly tagged) which confirmed the formulas in our code for 8x MSAA. However, that page didn't mention 2x MSAA at all.

  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
  Reviewed-by: Anuj Phogat <[email protected]>
* i965: Enable smooth points when multisampling without point sprites. | Kenneth Graunke | 2014-02-19 | 1 | -1/+5
  According to the "Point Multisample Rasterization" section of the OpenGL specification (3.0 or later), smooth points are supposed to be enabled implicitly when multisampling, regardless of the GL_POINT_SMOOTH flag.

  However, if GL_POINT_SPRITE is enabled, you get square points no matter what. Core contexts always enable point sprites, so this effectively makes smooth points go away, even in the case of multisampling.

  Fixes Piglit's EXT_framebuffer_multisample/point-smooth tests. (Yes, that's right folks, we actually have Piglit tests for this.)

  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
  Reviewed-by: Anuj Phogat <[email protected]>
* i965: Thwack multisample enable bit in 3DSTATE_RASTER. | Kenneth Graunke | 2014-02-19 | 2 | -0/+5
  The meaning and effects of this bit are surprisingly complicated. See Rasterization > Windower > Multisampling > Multisample ModesState.

  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
  Reviewed-by: Anuj Phogat <[email protected]>
* i965: Only use the SIMD16 program for per-sample shading on Broadwell. | Kenneth Graunke | 2014-02-19 | 1 | -9/+32
  This restriction carries forward from earlier platforms. The code is ported straight from gen7_wm_state.c.

  v2: Actually do it right.

  v3: Add missing _NEW_MULTISAMPLE bit (caught by Eric).

  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
  Reviewed-by: Anuj Phogat <[email protected]>
* i965: Set "Position XY Offset Select" bits in 3DSTATE_PS on Broadwell. | Kenneth Graunke | 2014-02-19 | 1 | -0/+18
  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
  Reviewed-by: Anuj Phogat <[email protected]>
* i965: Add missing sample shading bits to Gen8's 3DSTATE_PS_EXTRA. | Kenneth Graunke | 2014-02-19 | 1 | -1/+15
  v2: Also set the "oMask Present to Render Target" bit, which is required for shaders that write oMask. Otherwise the hardware won't expect the extra data.

  v3: Add missing _NEW_MULTISAMPLE (caught by Eric).

  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
  Reviewed-by: Anuj Phogat <[email protected]>
* i965/fs: Implement FS_OPCODE_SET_OMASK on Broadwell. | Kenneth Graunke | 2014-02-19 | 2 | -1/+38
  I made a few changes which I think simplify the code a bit compared to the Gen7 implementation, but which are largely pointless.

  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
  Reviewed-by: Anuj Phogat <[email protected]>
* i965/fs: Implement FS_OPCODE_SET_SAMPLE_ID on Broadwell. | Kenneth Graunke | 2014-02-19 | 2 | -1/+32
  Largely cut and paste from Gen7; it works the same way.

  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
  Reviewed-by: Anuj Phogat <[email protected]>
* i965: Disable MCS on Broadwell for now. | Kenneth Graunke | 2014-02-19 | 1 | -0/+8
  v2: Add a perf_debug() message to remind us to come back to this.

  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
  Reviewed-by: Anuj Phogat <[email protected]>
* i965: Use gen7_surface_msaa_bits in Broadwell SURFACE_STATE code. | Kenneth Graunke | 2014-02-19 | 1 | -14/+2
  We already set the number of samples, but were missing the MSAA layout mode. Reusing gen7_surface_msaa_bits makes it easy to set both.

  This also lets us drop the Gen8 surface_num_multisamples function.

  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
  Reviewed-by: Anuj Phogat <[email protected]>
* i965: Use ffs() for sample counting in gen7_surface_msaa_bits(). | Kenneth Graunke | 2014-02-19 | 1 | -6/+4
  The enumerations are just log2(num_samples) shifted by 3, which we can easily compute via ffs(). This also makes it reusable for Broadwell, which has 2x MSAA.

  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
  Reviewed-by: Anuj Phogat <[email protected]>
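A minimal C sketch of the ffs() trick follows. For a power of two, ffs() returns the 1-based index of the only set bit, so ffs(n) - 1 equals log2(n). The helper name is hypothetical and the function only models the "log2 shifted by 3" encoding described above, not the full gen7_surface_msaa_bits().

```c
#include <assert.h>
#include <strings.h>   /* ffs() is declared here on POSIX systems */

/* Encode a power-of-two sample count as log2(num_samples) << 3. */
static unsigned msaa_sample_count_bits(unsigned num_samples)
{
    /* Only powers of two (1, 2, 4, 8, ...) make sense here. */
    assert(num_samples != 0 && (num_samples & (num_samples - 1)) == 0);

    return (unsigned)(ffs((int)num_samples) - 1) << 3;
}
```

With this encoding, 1 sample maps to 0, 2 to 1 << 3, 4 to 2 << 3, and 8 to 3 << 3, which is why the same helper covers 2x MSAA without a new table entry.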
* i965: Simplify Broadwell's 3DSTATE_MULTISAMPLE sample count handling. | Kenneth Graunke | 2014-02-19 | 1 | -23/+3
  These enumerations are simply log2 of the number of multisamples shifted by a bit, so we can calculate them using ffs() in a lot less code.

  Suggested by Eric Anholt.

  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
  Reviewed-by: Anuj Phogat <[email protected]>
* glsl: Silence "type qualifiers ignored on function return type" warning | Ian Romanick | 2014-02-19 | 1 | -1/+1
  The const in

      const unsigned foo(void);

  is meaningless. Removing it silences this warning:

      src/glsl/ast_to_hir.cpp:1802:56: warning: type qualifiers ignored on function return type [-Wignored-qualifiers]

  Signed-off-by: Ian Romanick <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Anuj Phogat <[email protected]>
* glsl: Only warn for macro names containing __ | Ian Romanick | 2014-02-19 | 1 | -3/+10
  From page 14 (page 20 of the PDF) of the GLSL 1.10 spec:

      "In addition, all identifiers containing two consecutive underscores (__) are reserved as possible future keywords."

  The intention is that names containing __ are reserved for internal use by the implementation, and names prefixed with GL_ are reserved for use by Khronos. Names simply containing __ are dangerous to use, but should be allowed.

  Per the Khronos bug mentioned below, a future version of the GLSL specification will clarify this.

  Signed-off-by: Ian Romanick <[email protected]>
  Cc: "9.2 10.0 10.1" <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
  Tested-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Anuj Phogat <[email protected]>
  Tested-by: Darius Spitznagel <[email protected]>
  Cc: Tapani Pälli <[email protected]>
  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=71870
  Bugzilla: Khronos #11702
* glcpp: Only warn for macro names containing __ | Ian Romanick | 2014-02-19 | 2 | -5/+21
  Section 3.3 (Preprocessor) of the GLSL 1.30 spec (and later) and the GLSL ES spec (all versions) say:

      "All macro names containing two consecutive underscores ( __ ) are reserved for future use as predefined macro names. All macro names prefixed with "GL_" ("GL" followed by a single underscore) are also reserved."

  The intention is that names containing __ are reserved for internal use by the implementation, and names prefixed with GL_ are reserved for use by Khronos. Since every extension adds a name prefixed with GL_ (i.e., the name of the extension), that should be an error. Names simply containing __ are dangerous to use, but should be allowed. In similar cases, the C++ preprocessor specification says, "no diagnostic is required."

  Per the Khronos bug mentioned below, a future version of the GLSL specification will clarify this.

  Signed-off-by: Ian Romanick <[email protected]>
  Cc: "9.2 10.0 10.1" <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
  Tested-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Anuj Phogat <[email protected]>
  Tested-by: Darius Spitznagel <[email protected]>
  Cc: Tapani Pälli <[email protected]>
  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=71870
  Bugzilla: Khronos #11702
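The policy described above (error for a GL_ prefix, warning for an embedded __) can be sketched as a small C classifier. The enum and function names are hypothetical and this is not the glcpp implementation, just the decision rule in isolation.

```c
#include <string.h>

enum macro_name_status {
    MACRO_NAME_OK,
    MACRO_NAME_WARN,    /* contains "__": risky, but allowed */
    MACRO_NAME_ERROR    /* starts with "GL_": reserved for Khronos */
};

/* Classify a macro name according to the rule in the commit message. */
static enum macro_name_status check_macro_name(const char *name)
{
    if (strncmp(name, "GL_", 3) == 0)
        return MACRO_NAME_ERROR;

    if (strstr(name, "__") != NULL)
        return MACRO_NAME_WARN;

    return MACRO_NAME_OK;
}
```

In this sketch the GL_ check runs first, so a name such as GL__foo is reported as an error rather than merely a warning.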
* configure: Use LLVM shared libraries by default | Tom Stellard | 2014-02-19 | 1 | -4/+4
  Linking with LLVM static libraries is easily broken by changes to the llvm-config program or when LLVM adds, removes, or changes library components. Keeping up with these changes requires a lot of maintenance effort to keep the build working on the master and stable branches.

  Also, because of past issues with LLVM static libraries, the release manager is currently configuring with --with-llvm-shared-libs when checking the build before release. Enabling shared libraries by default would allow the release manager to run ./configure with no arguments, and be reasonably confident that the build would succeed.

  Acked-by: Emil Velikov <[email protected]>
* i965/fs: Allocate the param_size array dynamically. | Francisco Jerez | 2014-02-19 | 2 | -2/+2
  Useful because the total number of uniform components might exceed MAX_UNIFORMS * 4 in some cases because of the image metadata we'll be passing as push constants.

  Reviewed-by: Paul Berry <[email protected]>
* i965/fs: Use a separate variable to keep track of the last uniform index seen. | Francisco Jerez | 2014-02-19 | 5 | -35/+35
  Like the VEC4 back-end does. It will make dynamic allocation of the param_size array easier in a future commit.

  Reviewed-by: Paul Berry <[email protected]>
* freedreno: tweak ringbuffer sizes/count | Rob Clark | 2014-02-19 | 2 | -2/+2
  Since we are now consuming two ringbuffers at a time, we probably want a pool larger than 4, but we don't need each individual ringbuffer to be so large, so offset the pool size increase by reducing rb size.

  Signed-off-by: Rob Clark <[email protected]>
* freedreno/a3xx/compiler: scheduling/legalize fixes | Rob Clark | 2014-02-19 | 3 | -2/+30
  It seems the write-after-read hazard that applies to texture fetch instructions also applies to sfu instructions.

  Also, cat5/cat6 instructions do not have a (ss) bit, so in these cases we need to insert a dummy nop instruction with the (ss) bit set.

  Signed-off-by: Rob Clark <[email protected]>
* i965: Have brw_imm_vf4() take the vector components as integer values. | Francisco Jerez | 2014-02-19 | 2 | -11/+31
  Reviewed-by: Paul Berry <[email protected]>
* i965: Add helper function to find out the signedness of a register type. | Francisco Jerez | 2014-02-19 | 1 | -0/+28
  Reviewed-by: Paul Berry <[email protected]>
* i965/vec4: Use swizzle() in the ARB_vertex_program code. | Francisco Jerez | 2014-02-19 | 2 | -24/+11
  Reviewed-by: Paul Berry <[email protected]>
* i965/fs: Use offset() in the ARB_fragment_program code. | Francisco Jerez | 2014-02-19 | 1 | -69/+62
  Reviewed-by: Paul Berry <[email protected]>
* i965/fs: Remove fs_reg::retype. | Francisco Jerez | 2014-02-19 | 3 | -20/+12
  There doesn't seem to be any reason for it to be a method, and it's surprising that the expression 'reg.retype(t)' doesn't retype its object but rather it creates a temporary with the new type. Use 'retype(reg, t)' instead.

  Reviewed-by: Paul Berry <[email protected]>
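The non-mutating free-function shape can be illustrated in plain C, where passing the struct by value makes the copy explicit. The struct layout and type enum below are simplified, hypothetical stand-ins rather than the real fs_reg.

```c
/* Simplified stand-ins for the register types (hypothetical). */
enum reg_type { TYPE_F, TYPE_D, TYPE_UD };

struct reg {
    int nr;               /* register number */
    enum reg_type type;   /* how the bits are interpreted */
};

/* Return a copy of `r` with a different type; the caller's register is
 * untouched because the argument is passed by value. */
static struct reg retype(struct reg r, enum reg_type t)
{
    r.type = t;
    return r;
}
```

Written as `retype(reg, t)`, the call reads as producing a new value, which avoids the surprise the message describes with `reg.retype(t)`.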
* i965/vec4: Trivial improvements to the with_writemask() function. | Francisco Jerez | 2014-02-19 | 3 | -18/+15
  Add assertion that the register is not in the HW_REG or IMM file, calculate the conjunction of the old and new mask instead of replacing the old [consistent with the behavior of brw_writemask(), causes no functional changes right now], make it static inline to let the compiler do a slightly better job at optimizing things, and shorten its name.

  v2: Assert that the new writemask is not zero to avoid undefined hardware behaviour.

  Reviewed-by: Paul Berry <[email protected]>
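The behaviour listed above can be sketched as a small static inline C helper. The struct, the file enum, and the helper name are hypothetical simplifications; only the three checks (register file, mask conjunction, non-zero result) come from the commit message.

```c
#include <assert.h>

/* Hypothetical register files and destination register, for illustration. */
enum reg_file { FILE_GRF, FILE_HW_REG, FILE_IMM };

struct dst_reg {
    enum reg_file file;
    unsigned writemask;   /* one bit per channel: x=1, y=2, z=4, w=8 */
};

/* Return a copy of `r` whose writemask is the conjunction (bitwise AND)
 * of the old mask and `mask`, rather than a plain replacement. */
static inline struct dst_reg apply_writemask(struct dst_reg r, unsigned mask)
{
    assert(r.file != FILE_HW_REG && r.file != FILE_IMM);

    r.writemask &= mask;

    /* An empty writemask would be undefined behaviour on the hardware. */
    assert(r.writemask != 0);
    return r;
}
```

For example, narrowing a register that already writes only .yz (mask 0x6) with .xy (mask 0x3) leaves just .y (mask 0x2), where a plain replacement would have re-enabled .x.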
* i965: Make sure that backend_reg::type and brw_reg::type are consistent for fixed regs. | Francisco Jerez | 2014-02-19 | 5 | -0/+26
  And define non-mutating helper functions to retype fixed and normal regs with a common interface. At some point we may want to get rid of ::fixed_hw_reg completely and have fixed regs use the normal register data members (e.g. backend_reg::reg to select a fixed GRF number, src_reg::swizzle to store the swizzle, etc.). I have the feeling that this is not the last headache we're going to get because of the multiple ways to represent the same thing and the different register interface depending on the file a register is stored in...

  Reviewed-by: Paul Berry <[email protected]>
* i965/vec4: Add non-mutating helper functions to modify src_reg::swizzle and ::negate. | Francisco Jerez | 2014-02-19 | 1 | -0/+24
  Reviewed-by: Paul Berry <[email protected]>
* i965: Add non-mutating helper functions to modify the register offset. | Francisco Jerez | 2014-02-19 | 2 | -0/+24
  Yes, we could avoid having four copies of essentially the same code by using templates here.

  Reviewed-by: Paul Berry <[email protected]>
* i965/vec4: Fix off-by-one register class overallocation. | Francisco Jerez | 2014-02-19 | 1 | -1/+1
  Reviewed-by: Paul Berry <[email protected]>