Commit message | Author | Age | Files | Lines
* i965: Add brw_setup_tex_for_precompile. Use in VS, GS & FS. | Jordan Justen | 2015-05-02 | 3 | -24/+24
| Suggested-by: Kristian Høgsberg <[email protected]>
| Signed-off-by: Jordan Justen <[email protected]>
| Reviewed-by: Kenneth Graunke <[email protected]>
* i965/cs: Emit compute shader code and upload programs | Jordan Justen | 2015-05-02 | 3 | -0/+212
| v2:
| * Don't bother checking for 'gen > 5' (krh)
| * Populate sampler data in key (krh)
| v3:
| * Drop no8 support, and simplify code in several places (Ken)
| Signed-off-by: Jordan Justen <[email protected]>
| Reviewed-by: Kenneth Graunke <[email protected]>
* i965/cs: Set invocation counts based on max_cs_threads | Jordan Justen | 2015-05-02 | 1 | -0/+24
| For ES, we set the max counts based on SIMD8, which is currently accurate.
| For desktop GL, we set the max counts based on SIMD16, which can fail in some cases where a SIMD16 program is not currently supported. Therefore, this value is not currently accurate, but will work fine in many cases, and lets us run more test cases.
| Eventually we want to always be able to generate a SIMD16 program.
| Signed-off-by: Jordan Justen <[email protected]>
| Reviewed-by: Kenneth Graunke <[email protected]>
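The relationship described in the commit above can be illustrated with a small sketch. It assumes the maximum invocation count is the SIMD width multiplied by max_cs_threads; that formula, the function name `sketch_max_cs_invocations`, and the `is_gles` parameter are assumptions made for illustration and are not taken from the driver source.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper, not the driver's actual code.
 * Assumption: one hardware thread covers simd_width invocations, so the
 * upper bound is simd_width * max_cs_threads. */
static unsigned
sketch_max_cs_invocations(unsigned max_cs_threads, bool is_gles)
{
   assert(max_cs_threads > 0);

   /* ES counts are based on SIMD8, desktop GL counts on SIMD16,
    * as stated in the commit message. */
   const unsigned simd_width = is_gles ? 8 : 16;

   return simd_width * max_cs_threads;
}
```

With an illustrative thread count of 64 (not a claim about any particular GPU), the sketch gives 512 for ES and 1024 for desktop GL. The desktop figure is the optimistic one, since it only holds when a SIMD16 program can actually be generated.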
* i965/cs: Add max_cs_threads | Jordan Justen | 2015-05-02 | 4 | -1/+14
| Add values for gen7 & gen8. These are the number of threads in a subslice.
| Signed-off-by: Jordan Justen <[email protected]>
| Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Remove comment about chv device numbers being preliminary | Jordan Justen | 2015-05-02 | 1 | -3/+0
| Signed-off-by: Jordan Justen <[email protected]>
| Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Support compute programs in fs_visitor | Jordan Justen | 2015-05-02 | 4 | -3/+93
| v2:
| * Clean out some unneeded code copied from run_fs (krh)
| * Always use NIR
| * Split shader time out into a separate commit
| Signed-off-by: Jordan Justen <[email protected]>
| Reviewed-by: Kenneth Graunke <[email protected]>
* i965/cache: Add support for CS in program state cache | Jordan Justen | 2015-05-02 | 4 | -0/+54
| Signed-off-by: Jordan Justen <[email protected]>
| Reviewed-by: Kristian Høgsberg <[email protected]>
| Reviewed-by: Kenneth Graunke <[email protected]>
* i965/cs: Add brw_cs_prog_data, brw_cs_prog_key and brw_context::cs. | Paul Berry | 2015-05-02 | 2 | -0/+62
| [email protected]:
| * Added brw_cs_prog_key structure
| * Added brw_cs_prog_data::dispatch_grf_start_reg_16
| * Added brw_cs_prog_data::local_size
| * Added brw_cs_prog_data::simd_size
| Signed-off-by: Jordan Justen <[email protected]>
| Reviewed-by: Kristian Høgsberg <[email protected]>
| Reviewed-by: Kenneth Graunke <[email protected]>
* i965/cs: Add generator support for CS_OPCODE_CS_TERMINATE | Jordan Justen | 2015-05-02 | 2 | -0/+36
| v2:
| * Don't rely on brw_eu* to generate the send instruction. We now generate the send here, and drop the "i965/cs: Add support for the SEND message that terminates a CS thread" brw_eu* patch.
| Signed-off-by: Jordan Justen <[email protected]>
| Reviewed-by: Kenneth Graunke <[email protected]>
* i965/cs: Mark g0 as used by CS_OPCODE_CS_TERMINATE | Jordan Justen | 2015-05-02 | 1 | -0/+4
| Signed-off-by: Jordan Justen <[email protected]>
| Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Add emit_cs_terminate to emit CS_OPCODE_CS_TERMINATE | Jordan Justen | 2015-05-02 | 2 | -0/+23
| v2:
| * Do more work at the visitor level. g0 is loaded and sent to the generator now.
| v3:
| * Use Ken's comment explaining g0 usage
| Signed-off-by: Jordan Justen <[email protected]>
| Reviewed-by: Kenneth Graunke <[email protected]>
* i965/cs: Add CS_OPCODE_CS_TERMINATE | Jordan Justen | 2015-05-02 | 2 | -0/+7
| Signed-off-by: Jordan Justen <[email protected]>
| Reviewed-by: Kenneth Graunke <[email protected]>
* i965/cs: Add BRW_NEW_CS_PROG_DATA and BRW_CACHE_CS_PROG | Jordan Justen | 2015-05-02 | 3 | -0/+6
| Signed-off-by: Jordan Justen <[email protected]>
| Reviewed-by: Kristian Høgsberg <[email protected]>
| Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Add an INTEL_DEBUG=cs option. | Paul Berry | 2015-05-02 | 2 | -2/+4
| At the moment it's not wired up to anything. Later patches will hook it up to the compute shader back-end.
| Reviewed-by: Jordan Justen <[email protected]>
| Reviewed-by: Kristian Høgsberg <[email protected]>
| Reviewed-by: Kenneth Graunke <[email protected]>
* mesa/cs: Add compute support to update_program(). | Paul Berry | 2015-05-02 | 1 | -0/+21
| Reviewed-by: Jordan Justen <[email protected]>
| Reviewed-by: Kristian Høgsberg <[email protected]>
| Reviewed-by: Kenneth Graunke <[email protected]>
* mesa/cs: Update program.c for compute shaders. | Paul Berry | 2015-05-02 | 1 | -0/+3
| Reviewed-by: Jordan Justen <[email protected]>
| Reviewed-by: Kristian Høgsberg <[email protected]>
| Reviewed-by: Kenneth Graunke <[email protected]>
* mesa/cs: Add inline functions for dealing with compute shaders. | Paul Berry | 2015-05-02 | 1 | -0/+22
| Reviewed-by: Jordan Justen <[email protected]>
| Reviewed-by: Kristian Høgsberg <[email protected]>
| Reviewed-by: Kenneth Graunke <[email protected]>
* i965/cs: Add BRW_NEW_COMPUTE_PROGRAM state flag. | Paul Berry | 2015-05-02 | 2 | -0/+9
| Also add code to brw_upload_state to set it when the compute program changes.
| Reviewed-by: Jordan Justen <[email protected]>
| Reviewed-by: Kristian Høgsberg <[email protected]>
| Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Strip trailing constant zeroes in sample messages | Neil Roberts | 2015-05-01 | 2 | -0/+50
| If a send message is emitted with a message length that is less than required for the message then the remaining parameters default to zero. We can take advantage of this to save a register when a shader passes constant zeroes as the final coordinates to the sample function.
| I think this might be useful for GLES applications that are using 2D textures to simulate 1D textures.
| On Skylake it will be useful for shaders that do texelFetch(tex,something,0) which I think is fairly common. This helps more on Skylake because in that case the order of the instruction operands is u,v,lod,r which is good for 2D textures whereas before it was u,lod,v,r which is only good for 1D textures.
| On Haswell:
| total instructions in shared programs: 8535730 -> 8533261 (-0.03%)
| instructions in affected programs: 236968 -> 234499 (-1.04%)
| helped: 1174
| On Skylake:
| total instructions in shared programs: 10345646 -> 10341237 (-0.04%)
| instructions in affected programs: 293011 -> 288602 (-1.50%)
| helped: 1218
| Reviewed-by: Matt Turner <[email protected]>
| v2: Applied suggestions by Kenneth Graunke:
| - Only apply on Gen5+
| - Apply to all texture opcodes, not just TEX and TXF.
| Moved the optimisation into the loop as suggested by Matt Turner.
| Fix the array index when there is a header.
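The trimming idea in the commit above can be sketched as follows. This is a simplified model, not the pass from the driver: the `sketch_param` struct and `sketch_strip_trailing_zeroes` function are hypothetical, each parameter is assumed to occupy exactly one register (as in SIMD8), and keeping at least one payload parameter is a conservative assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one message parameter. */
struct sketch_param {
   bool is_constant;   /* true if the value is known at compile time */
   float value;        /* only meaningful when is_constant is true */
};

/* Returns the trimmed message length in registers, header included.
 * Trailing parameters that are constant zero are dropped because the
 * hardware defaults omitted parameters to zero. */
static size_t
sketch_strip_trailing_zeroes(const struct sketch_param *params,
                             size_t num_params, bool has_header)
{
   assert(params != NULL && num_params > 0);

   size_t keep = num_params;

   /* Walk backwards from the last parameter. Assumption: always keep
    * at least one payload parameter. */
   while (keep > 1 &&
          params[keep - 1].is_constant &&
          params[keep - 1].value == 0.0f) {
      keep--;
   }

   /* The header, when present, takes one extra register in front of the
    * parameters and is never trimmed. */
   return keep + (has_header ? 1 : 0);
}
```

A zero in the middle of the parameter list is not removed, since only the tail of the message can be shortened. This is why the operand order matters in the commit message: with u,v,lod,r a zero lod and r sit at the end for a 2D fetch.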
* i965/skl: Force the exec size to 8 when initing header for SIMD4x2 | Neil Roberts | 2015-05-01 | 2 | -0/+2
| On Gen9+ there needs to be a header when sampling using SIMD4x2. The header is set up by copying from the g0 register. Commit 07c571a39f tried to fix this mov instruction to always use an exec size of 8 because previously it was incorrectly using 4. It did this by casting the type of the destination register to vec8. This was done because there is code in brw_set_dest to guess the exec size based on the width of the dest register. However I misunderstood how this works because it is actually only used when the width is less than 8. That means the patch actually changed it to use the default exec size which on SIMD16 would be 16 and the MOV would clobber the first register in the send message. This patch makes it additionally set the default exec size to 8. This is similar to how the message is set up in fs_generator::generate_tex.
| I think this wasn't picked up by any Piglit tests because we don't have any fragment shaders that hit this code path so nothing was using SIMD16. However the patch caused failures in deqp tests.
| Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=90153
| Reviewed-by: Matt Turner <[email protected]>
| Tested-by: Tapani Pälli <[email protected]>
* i965: Unhardcode a few more stage names and abbreviations. | Kenneth Graunke | 2015-04-30 | 2 | -11/+5
| The stage_abbrev and stage_name fields in backend_visitor provide what we need without any additional effort. It also means we'll get the right names for compute shaders, SIMD8 geometry shaders, and both kinds of tessellation shaders.
| This does unfortunately change the capitalization of the stage abbreviation in the INTEL_DEBUG=optimizer output filenames. It doesn't seem worth adding code to handle, though.
| Signed-off-by: Kenneth Graunke <[email protected]>
| Reviewed-by: Matt Turner <[email protected]>
| Reviewed-by: Anuj Phogat <[email protected]>
* docs/relnotes: document the new EGL sync extensions | Marek Olšák | 2015-04-30 | 1 | -0/+4
* st/dri: implement the fence interface for CL events | Marek Olšák | 2015-04-30 | 3 | -1/+81
* gallium,clover: add OpenCL interoperability support for CL events | Marek Olšák | 2015-04-30 | 5 | -0/+114
| v2:
| - move interop.cpp to clover/api
| - change intptr_t to void* in the interface
| - add a virtual function fence() to simplify some code
| v3:
| - use bool in the interface
| v4:
| - enclose the last two interop functions in try..catch
| Reviewed-by: Francisco Jerez <[email protected]>
* st/dri: implement the fence interface | Marek Olšák | 2015-04-30 | 1 | -0/+80
* egl/dri2: return the latest sync status in eglGetSyncAttribKHR | Marek Olšák | 2015-04-30 | 1 | -1/+8
* egl/dri2: implement EGL_KHR_cl_event2 (v2) | Marek Olšák | 2015-04-30 | 6 | -12/+111
| v2: fix the SYNC_CONDITION query
* egl/dri2: implement EGL_KHR_wait_sync | Marek Olšák | 2015-04-30 | 5 | -0/+47
* egl/dri2: implement EGL_KHR_fence_sync | Marek Olšák | 2015-04-30 | 3 | -5/+133
* mesa: add GL_OES_EGL_sync | Marek Olšák | 2015-04-30 | 1 | -0/+1
| This is an empty extension whose presence means that EGL sync objects can be used with ES contexts.
* dri_interface: add an interface for fences | Marek Olšák | 2015-04-30 | 1 | -0/+60
* egl/dri: don't expose configs with an accumulation buffer | Marek Olšák | 2015-04-30 | 1 | -0/+9
* nvc0/ir: fix predicated PFETCH for real | Ilia Mirkin | 2015-04-30 | 2 | -2/+2
| Commit a9d08a250 accidentally didn't make use of the new src1 variable. Use it.
| Signed-off-by: Ilia Mirkin <[email protected]>
| Cc: [email protected]
* nv50/ir: fix asFlow() const helper for OP_JOIN | Ilia Mirkin | 2015-04-29 | 1 | -1/+1
| Signed-off-by: Ilia Mirkin <[email protected]>
| Cc: [email protected]
* nvc0/ir: fix predicated PFETCH emission | Ilia Mirkin | 2015-04-29 | 2 | -2/+6
| src1 would contain the predicate, which would get emitted as a register source by an undiscerning srcId helper. Work around this in the same way as in emitTEX.
| Signed-off-by: Ilia Mirkin <[email protected]>
| Cc: [email protected]
* gk110/ir: fix set with a register dest to not auto-set the abs flag | Ilia Mirkin | 2015-04-29 | 1 | -1/+1
| This was causing src0 to always have the absolute value flag set.
| Signed-off-by: Ilia Mirkin <[email protected]>
| Cc: [email protected]
* i965/blorp: Prepare drawing rectangle for flipped coordinates | Topi Pohjolainen | 2015-04-30 | 1 | -2/+2
| Reviewed-by: Kenneth Graunke <[email protected]>
| Signed-off-by: Topi Pohjolainen <[email protected]>
* i965/blorp: Add support for layered rendering | Topi Pohjolainen | 2015-04-30 | 4 | -5/+9
| Reviewed-by: Kenneth Graunke <[email protected]>
| Signed-off-by: Topi Pohjolainen <[email protected]>
* i965/blorp: Allow blend state to be set for multiple render targets | Topi Pohjolainen | 2015-04-30 | 3 | -19/+18
| Original blorp writes only one buffer per shader invocation. Once the launch mechanism is shared with glsl-based programs there will be a need to support multiple render targets.
| Also drop the always-constant color write disable settings.
| Reviewed-by: Kenneth Graunke <[email protected]>
| Signed-off-by: Topi Pohjolainen <[email protected]>
* i965/blorp: Prepare for attributes other than render position | Topi Pohjolainen | 2015-04-30 | 4 | -7/+12
| Note that the magic number of one in gen7 logic is replaced by BRW_SF_URB_ENTRY_READ_OFFSET (also equal to 1) for clarity.
| On gen6 the change from zero to one (BRW_SF_URB_ENTRY_READ_OFFSET) has no effect for native blorp as blorp doesn't use any additional attributes. In fact, regular pipeline setup always uses BRW_SF_URB_ENTRY_READ_OFFSET even when there are no additional attributes. Hence the change makes the two (blorp and regular) consistent.
| Reviewed-by: Kenneth Graunke <[email protected]>
| Signed-off-by: Topi Pohjolainen <[email protected]>
* i965/blorp: Remove unused arguments | Topi Pohjolainen | 2015-04-30 | 3 | -21/+12
| Reviewed-by: Kenneth Graunke <[email protected]>
| Signed-off-by: Topi Pohjolainen <[email protected]>
* i965/gen7/blorp: Remove unused arguments | Topi Pohjolainen | 2015-04-30 | 1 | -47/+28
| Reviewed-by: Kenneth Graunke <[email protected]>
| Signed-off-by: Topi Pohjolainen <[email protected]>
* i965/blorp: Allow caller to provide sampler settings | Topi Pohjolainen | 2015-04-30 | 3 | -8/+14
| v2 (Ken): s/use_unorm_coords/non_normalized_coords/
| Reviewed-by: Kenneth Graunke <[email protected]>
| Signed-off-by: Topi Pohjolainen <[email protected]>
* i965/blorp: Refactor vertex buffer state setup | Topi Pohjolainen | 2015-04-30 | 1 | -26/+34
| Reviewed-by: Kenneth Graunke <[email protected]>
| Signed-off-by: Topi Pohjolainen <[email protected]>
* i965/blorp: Remove constant parameter | Topi Pohjolainen | 2015-04-30 | 3 | -20/+0
| This was still needed when we had support for blorp clears but now this is fixed to nop.
| Reviewed-by: Kenneth Graunke <[email protected]>
| Signed-off-by: Topi Pohjolainen <[email protected]>
* i965/gen8: Expose state base address setup | Topi Pohjolainen | 2015-04-30 | 2 | -2/+5
| Reviewed-by: Kenneth Graunke <[email protected]>
| Reviewed-by: Matt Turner <[email protected]>
| Signed-off-by: Topi Pohjolainen <[email protected]>
* i965/ps/gen8: Refactor state uploading | Topi Pohjolainen | 2015-04-30 | 2 | -26/+58
| v2: Use SET_FIELD() for sampler count, and for that reason added GEN7_PS_SAMPLER_COUNT_MASK.
| Reviewed-by: Kenneth Graunke <[email protected]>
| Reviewed-by: Matt Turner <[email protected]>
| Signed-off-by: Topi Pohjolainen <[email protected]>
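The v2 note above explains why a mask definition had to be added alongside the use of SET_FIELD(): a shift-and-mask packing macro needs both a shift and a mask for each field. The sketch below shows that general pattern only. `SKETCH_SET_FIELD`, `SKETCH_SAMPLER_COUNT_SHIFT`, `SKETCH_SAMPLER_COUNT_MASK`, their bit positions, and `sketch_pack_sampler_count` are all hypothetical and are not the driver's definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical field layout: a 3-bit field starting at bit 27.
 * These values are chosen for illustration only. */
#define SKETCH_SAMPLER_COUNT_SHIFT 27
#define SKETCH_SAMPLER_COUNT_MASK  (UINT32_C(0x7) << SKETCH_SAMPLER_COUNT_SHIFT)

/* Shift the value into position, then mask off anything outside the
 * field. The field name is pasted onto the _SHIFT and _MASK suffixes,
 * which is why both definitions must exist for every field. */
#define SKETCH_SET_FIELD(value, field) \
   ((((uint32_t)(value)) << field ## _SHIFT) & field ## _MASK)

/* Hypothetical wrapper that also checks the value fits in the field. */
static uint32_t
sketch_pack_sampler_count(uint32_t count)
{
   assert(count <= 0x7);
   return SKETCH_SET_FIELD(count, SKETCH_SAMPLER_COUNT);
}
```

Compared with an open-coded shift, the mask keeps an out-of-range value from spilling into neighbouring fields of the same dword.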
* i965/ps/gen7: Refactor state uploading | Topi Pohjolainen | 2015-04-30 | 2 | -20/+45
| Now the uploading depends only on the input parameters instead of consulting the current gl-state.
| v2: Rebased on top of sampler count clamping
| Reviewed-by: Kenneth Graunke <[email protected]>
| Reviewed-by: Matt Turner <[email protected]>
| Signed-off-by: Topi Pohjolainen <[email protected]>
* i965: Refactor sampler state setup | Topi Pohjolainen | 2015-04-30 | 2 | -22/+47
| v2 (Matt): Moved * to the name.
| Reviewed-by: Kenneth Graunke <[email protected]>
| Reviewed-by: Matt Turner <[email protected]>
| Signed-off-by: Topi Pohjolainen <[email protected]>
* i965: Remove dependency to tex object in default color setup | Topi Pohjolainen | 2015-04-30 | 1 | -11/+11
| Reviewed-by: Kenneth Graunke <[email protected]>
| Reviewed-by: Matt Turner <[email protected]>
| Signed-off-by: Topi Pohjolainen <[email protected]>