summaryrefslogtreecommitdiffstats
path: root/src
Commit message (Collapse)AuthorAgeFilesLines
* radeonsi: use a bitmask for tracking dirty atomsMarek Olšák2015-09-013-13/+18
| | | | | | | | This mainly removes the cache misses when checking the dirty flags. Not much else though. Reviewed-by: Alex Deucher <[email protected]> Acked-by: Christian König <[email protected]>
* radeonsi: initialize atom IDs for external atomsMarek Olšák2015-09-012-4/+13
| | | | | Reviewed-by: Alex Deucher <[email protected]> Acked-by: Christian König <[email protected]>
* radeonsi: call si_init_atom for remaining radeonsi atomsMarek Olšák2015-09-018-41/+29
| | | | | | | | | | I need to initialize more atom IDs. This adds 4 more si_init_atom calls, which simplifies the code. (si_init_atom needs a different context type of the emit functions though) Reviewed-by: Alex Deucher <[email protected]> Acked-by: Christian König <[email protected]>
* radeonsi: initialize atom IDsMarek Olšák2015-09-011-6/+8
| | | | | Reviewed-by: Alex Deucher <[email protected]> Acked-by: Christian König <[email protected]>
* radeonsi: define the state atom array separatelyMarek Olšák2015-09-014-21/+23
| | | | | Reviewed-by: Alex Deucher <[email protected]> Acked-by: Christian König <[email protected]>
* radeonsi: optimize viewport statesMarek Olšák2015-09-016-26/+54
| | | | | | | same as scissors Reviewed-by: Alex Deucher <[email protected]> Acked-by: Christian König <[email protected]>
* radeonsi: optimize scissor statesMarek Olšák2015-09-018-27/+79
| | | | | | | | | - convert 16 states to 1 atom - only emit 1 scissor if VIEWPORT_INDEX isn't written - use only one packet when emitting consecutive scissors Reviewed-by: Alex Deucher <[email protected]> Acked-by: Christian König <[email protected]>
* radeonsi: add SI_MAX_ATTRIBSMarek Olšák2015-09-012-5/+6
| | | | | | | PIPE_MAX_ATTRIBS is 32, but we currently only support 16. Reviewed-by: Alex Deucher <[email protected]> Acked-by: Christian König <[email protected]>
* radeonsi: fix memory usage checking for big IBsMarek Olšák2015-09-011-8/+9
| | | | | | Cc: 11.0 <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Acked-by: Christian König <[email protected]>
* radeonsi: set all 16 viewport Z bounds for GL 4.1Marek Olšák2015-09-011-2/+6
| | | | | | Cc: 11.0 <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Acked-by: Christian König <[email protected]>
* radeonsi: fix a Unigine Heaven hang when drirc is missingMarek Olšák2015-09-014-1/+28
| | | | | | Cc: 10.6 11.0 <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Acked-by: Christian König <[email protected]>
* winsys/amdgpu: use small IBs for better performance on VIMarek Olšák2015-09-011-7/+9
| | | | | Reviewed-by: Alex Deucher <[email protected]> Acked-by: Christian König <[email protected]>
* gallium/util: add u_bit_scan_consecutive_rangeMarek Olšák2015-09-011-0/+20
| | | | | Reviewed-by: Alex Deucher <[email protected]> Acked-by: Christian König <[email protected]>
* i965: Prevent coordinate overflow in intel_emit_linear_blitChris Wilson2015-09-011-38/+34
| | | | | | | | | | | | | | | | | | | | | | | | | Fixes regression from commit 8c17d53823c77ac1c56b0548e4e54f69a33285f1 Author: Kenneth Graunke <[email protected]> Date: Wed Apr 15 03:04:33 2015 -0700 i965: Make intel_emit_linear_blit handle Gen8+ alignment restrictions. which adjusted the coordinates to be relative to the nearest cacheline. However, this then offsets the coordinates by up to 63 and this may then cause them to overflow the BLT limits. For the well aligned large transfer case, we can use 32bpp pixels and so reduce the coordinates by 4 (versus the current 8bpp pixels). We also have to be more careful doing the last line just in case it may exceed the coordinate limit. Reported-and-tested-by: [email protected] Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=90734 Signed-off-by: Chris Wilson <[email protected]> Cc: Kenneth Graunke <[email protected]> Cc: Ian Romanick <[email protected]> Cc: Anuj Phogat <[email protected]> Cc: [email protected] Reviewed-by: Anuj Phogat <[email protected]>
* i965/nir: enable the dead control flow optimizationConnor Abbott2015-09-011-0/+2
| | | | | | | | total instructions in shared programs: 7541551 -> 7541381 (-0.00%) instructions in affected programs: 3054 -> 2884 (-5.57%) helped: 29 Reviewed-by: Kenneth Graunke <[email protected]>
* nir/dead_cf: add support for removing useless loopsConnor Abbott2015-09-011-12/+109
| | | | | | | | | | | | v2: fix detecting if the loop has any phi nodes after it. v2: use nir_foreach_ssa_def() instead of nir_foreach_dest() when checking for values live after the loop to catch const_load instructions. v2: fix handling return instructions v2: add some documentation to loop_is_dead() Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* nir: add a helper for iterating over blocks in a cf nodeConnor Abbott2015-09-012-0/+9
| | | | | | | We were already doing this internally for iterating over a function implementation, so just expose it directly. Reviewed-by: Kenneth Graunke <[email protected]>
* nir: add nir_block_get_following_loop() helperConnor Abbott2015-09-012-0/+18
| | | | Reviewed-by: Kenneth Graunke <[email protected]>
* nir/dead_cf: delete code that's unreachable due to jumpsConnor Abbott2015-09-011-8/+115
| | | | | | | | v2: use nir_cf_node_remove_after(). v2: use foreach_list_typed() instead of hardcoding a list walk. v3: update to new control flow modification helpers. Reviewed-by: Kenneth Graunke <[email protected]>
* nir: add an optimization for removing dead control flowConnor Abbott2015-09-013-0/+158
| | | | | | | v2: use nir_cf_node_remove_after() instead of our own broken thing. v3: use the new control flow modification helpers. Reviewed-by: Kenneth Graunke <[email protected]>
* r600g: fix calculation for gpr allocationDave Airlie2015-09-011-1/+1
| | | | | | | | | | | | | | | I've been chasing a geom shader hang on rv635 since I wrote r600 geom code, and finally I hacked some values from fglrx in and I could run texelfetch without failures. This is totally my fault as well, maths fail 101. This makes geom shaders on r600 not fail heavily. Cc: "10.6" "11.0" <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Reviewed-by: Edward O'Callaghan <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* mesa: Limit Framebuffer Parameter OpenGL ES 3.1 usageMarta Lofstedt2015-09-011-1/+17
| | | | | | | | | | | | | | | According to OpenGL ES 3.1 specification, section 9.2.1 for glFramebufferParameter and section 9.2.3 for glGetFramebufferParameteriv: "An INVALID_ENUM error is generated if pname is not FRAMEBUFFER_DEFAULT_WIDTH, FRAMEBUFFER_DEFAULT_HEIGHT, FRAMEBUFFER_DEFAULT_SAMPLES, or FRAMEBUFFER_DEFAULT_FIXED_SAMPLE_LOCATIONS." Therefore exclude OpenGL ES 3.1 from using the GL_FRAMEBUFFER_DEFAULT_LAYERS parameter. Signed-off-by: Marta Lofstedt <[email protected]> Reviewed-by: Kevin Rogovin <kevin.rogovin at intel.com>
* mesa: Expose GL_ARB_framebuffer_no_attachments to GLES 3.1Marta Lofstedt2015-09-015-12/+12
| | | | | | | V2: Conform to new standard for exposing enums for OpenGL ES 3.1. Signed-off-by: Marta Lofstedt <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* nir/builder: Use nir_after_instr to advance the cursorJason Ekstrand2015-08-311-2/+1
| | | | | | | | | | | | This *should* ensure that the cursor gets properly advanced in all cases. We had a problem before where, if the cursor was created using nir_after_cf_node on a non-block cf_node, that would call nir_before_block on the block following the cf node. Instructions would then get inserted in backwards order at the top of the block which is not at all what you would expect from nir_after_cf_node. By just resetting to after_instr, we avoid all these problems. Reviewed-by: Kenneth Graunke <[email protected]>
* i965: advertise ASTC support for SkylakeNanley Chery2015-08-311-0/+5
| | | | | | | v2: remove OES ASTC extension reference. Reviewed-by: Anuj Phogat <[email protected]> Signed-off-by: Nanley Chery <[email protected]>
* mesa/glformats: recognize ASTC formats as color formatsNanley Chery2015-08-311-0/+28
| | | | | | | ASTC formats contain RGBA components. Reviewed-by: Chad Versace <[email protected]> Signed-off-by: Nanley Chery <[email protected]>
* mesa/texformat: use format conversion function in _mesa_choose_tex_formatvulkan-protex-2015.09.24.r01-baseNanley Chery2015-08-311-81/+13
| | | | | | | | | | | This function's cases for non-generic compressed formats duplicate the GL to MESA translation in _mesa_glenum_to_compressed_format(). This patch replaces the switch cases with a call to the translation function. This change teaches this function about ASTC, thus enabling ASTC for glTex*Storage*() calls. Reviewed-by: Chad Versace <[email protected]> Signed-off-by: Nanley Chery <[email protected]>
* mesa/texcompress: correct mapping of S3TC formats in conversion functionNanley Chery2015-08-311-2/+2
| | | | | | | | | MESA_FORMAT_RGBA_DXT5 should actually be reserved for GL_RGBA[4]_DXT5_S3TC. Also, Gallium and other dri drivers (radeon and nouveau) follow this mapping scheme. Reviewed-by: Chad Versace <[email protected]> Signed-off-by: Nanley Chery <[email protected]>
* r600/sb: update last_cf for finalize if.Dave Airlie2015-09-011-0/+3
| | | | | | | | | | | | As Glenn did for finalize_loop we need to update_cf when we add a POP at the end of a shader. I think this fixes one of the earlier shader going off end of memory problems we've stopped. Reviewed-by: Glenn Kennard <[email protected]> Cc: "10.6" "11.0" <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* i965/fs: Use greater-equal cmod to implement maximum.Matt Turner2015-08-312-4/+6
| | | | | | | | | | The docs specifically call out SEL with .l and .ge as the implementations of MIN and MAX respectively. Among other things, SEL with these conditional mods are commutative. See commit 3b7f683f. Reviewed-by: Jordan Justen <[email protected]>
* i965/chv|skl: Apply sampler bypass w/aBen Widawsky2015-08-312-0/+15
| | | | | | | | | | | | | | | | | | | | | | | | Certain compressed formats require this setting. The docs don't go into much detail as to why it's needed exactly. This patch introduces no piglit regressions on gen9 (bsw is untested). Note that the SKL "regressions" are fixed tests, and the egl_khr_gl_colorspace tests are WTF. The patch also fixes nothing I can find. http://otc-mesa-ci.jf.intel.com/job/Leeroy/127820/ v2: Reworded commit message (Matt); Added piglit results link. Restructured condition (Matt) Moved check out to function (Nanley). I left the setting of the bit in the surface state open coded because it seems to go better with the existing code. v3: Use and inline function only in gen8_emit_texture_surface_state() (Matt). Cc: Matt Turner <[email protected]> Cc: Nanley Chery <[email protected]> Signed-off-by: Ben Widawsky <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* st/mesa: move to renumbering registers in a groupDave Airlie2015-08-311-19/+38
| | | | | | | | This can be done with a single pass for the instruction base, and takes renumber_registers out of its spot on the profile. Acked-by: Marek Olšák <[email protected] Signed-off-by: Dave Airlie <[email protected]>
* st/mesa: reduce time spent in calculating temp read/writesDave Airlie2015-08-311-74/+79
| | | | | | | | | | | | | The glsl->tgsi convertor does some temporary register reduction however in profiling shader-db this shows up quite highly, so optimise things to reduce the number of loops through all the instructions we do. This drops merge_registers from 4-5% on the profile to 1%. I think this can be reduced further by possibly optimising the renumber pass. Acked-by: Marek Olšák <[email protected] Signed-off-by: Dave Airlie <[email protected]>
* st/mesa: cache tgsi opcode info in the instructionDave Airlie2015-08-311-23/+16
| | | | | | | | | Instead of looking this up lots, lets just cache it in the instruction translation up front. I just noticed this function what high in a profile of shader-db on radeonsi. Reviewed-by: Marek Olšák <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* r600: move prim convert from geom shader to function.Dave Airlie2015-08-312-25/+26
| | | | | | | This should avoid C++ fail including this header. Reviewed-by: Marek Olšák <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* glsl: remove specical case subroutine type countingTimothy Arceri2015-08-311-3/+2
| | | | | | | Unlike samplers we can get the correct value for subroutines from component_slots() Reviewed-by: Dave Airlie <[email protected]>
* r600g: Use TGSI parse results instead of manually exfiltratingEdward O'Callaghan2015-08-301-1/+1
| | | | | | | | This makes better use of the work that the TGSI API has done for us. Signed-off-by: Edward O'Callaghan <[email protected]> Signed-off-by: Marek Olšák <[email protected]>
* r600g: Set geometry properties in r600_create_shader_state()Edward O'Callaghan2015-08-303-25/+23
| | | | | | | | | | | The selector is shared by all shader variants, so the individual shaders shouldn't change it. Use tgsi_shader_scan() results to set geometry properties within a r600_create_shader_state() call and treat said propertices in the selector as read-only within r600_shader_from_tgsi(). Signed-off-by: Edward O'Callaghan <[email protected]> Signed-off-by: Marek Olšák <[email protected]>
* r600g: Move geometry properties state from shader to selectorEdward O'Callaghan2015-08-306-22/+23
| | | | | Signed-off-by: Edward O'Callaghan <[email protected]> Signed-off-by: Marek Olšák <[email protected]>
* r600g: Remove dead assigment to 'gs_input_prim' in shader stateEdward O'Callaghan2015-08-302-4/+0
| | | | | | | | Note that 'geometry shader properties' should be carried in the selector state over the shader state in any case. Signed-off-by: Edward O'Callaghan <[email protected]> Signed-off-by: Marek Olšák <[email protected]>
* radeonsi: don't use the emit qt keyword in si_init_atomMarek Olšák2015-08-291-2/+2
| | | | It confuses my editor.
* radeonsi: remove no-op 32-bit maskingMarek Olšák2015-08-295-7/+7
| | | | Reviewed-by: Alex Deucher <[email protected]>
* gallium/radeon: fix the ADDRESS_HI mask for EVENT_WRITE CIK packetsMarek Olšák2015-08-291-8/+8
| | | | | Cc: [email protected] Reviewed-by: Alex Deucher <[email protected]>
* winsys/radeon: handle non-zero finite timeout when waiting for buffersMarek Olšák2015-08-292-38/+41
| | | | Reviewed-by: Alex Deucher <[email protected]>
* freedreno/a3xx: implement half-z clippingIlia Mirkin2015-08-293-2/+4
| | | | Signed-off-by: Ilia Mirkin <[email protected]>
* freedreno/a3xx: add basic clip plane supportIlia Mirkin2015-08-293-1/+24
| | | | | | | | | The hardware is capable of dealing with GL1-style user clip planes. No clip vertex, no clip distances. Fixes a number of ucp tests, as well as neverball. Signed-off-by: Ilia Mirkin <[email protected]> Cc: "11.0" <[email protected]>
* nvc0: change prefix of MP performance counters to HW_SMSamuel Pitoiset2015-08-292-149/+149
| | | | | | | According to NVIDIA, local performance counters (MP) are prefixed with SM, while global performance counters (PCOUNTER) are called PM. Signed-off-by: Samuel Pitoiset <[email protected]>
* nvc0: sort performance counter queries by nameSamuel Pitoiset2015-08-292-142/+142
| | | | Signed-off-by: Samuel Pitoiset <[email protected]>
* nvc0: make names of performance counter queries consistentSamuel Pitoiset2015-08-292-56/+56
| | | | Signed-off-by: Samuel Pitoiset <[email protected]>
* nvc0: use enumerations for driver queriesSamuel Pitoiset2015-08-291-120/+123
| | | | Signed-off-by: Samuel Pitoiset <[email protected]>