Commit message | Author | Age | Files | Lines
* r600g/compute: Fix warnings | Tom Stellard | 2014-07-23 | 2 | -12/+16
* r600g: Use hardware sqrt instruction | Glenn Kennard | 2014-07-23 | 2 | -7/+4
  Piglit quick tests including sqrt pass, no other regressions, tested on radeon 6670.
  Reviewed-by: Alex Deucher <[email protected]>
* r600g/compute: Remove unneeded code from compute_memory_promote_item | Bruno Jiménez | 2014-07-23 | 2 | -36/+12
  Now that we know that the pool is defragmented, we know that allocated + unallocated will be the total size of the current pool plus all the items that will be promoted. So we only need to grow the pool once.
  This will allow us to just add the new items to the end of the item_list without needing to look for a place for the new item.
  Reviewed-by: Tom Stellard <[email protected]>
* r600g/compute: Quick exit if there's nothing to add to the pool | Bruno Jiménez | 2014-07-23 | 1 | -0/+4
  This way we can avoid defragmenting the pool, even if it needs defragmenting, and looping again through the list of unallocated items.
  Reviewed-by: Tom Stellard <[email protected]>
* r600g/compute: Defrag the pool if it's necesary | Bruno Jiménez | 2014-07-23 | 2 | -17/+19
  This patch adds a new member to the pool to track its status. For now it is used only for the 'fragmented' status, but if needed it could be used for more statuses.
  The pool will be considered fragmented if: An item that isn't the last is freed or demoted.
  This 'strategy' has a problem, although it shouldn't cause any bug. Suppose, for example, we have two items, A and B. If we free A first, the pool will have the 'fragmented' status. If we now free B, the pool will retain its 'fragmented' status even though it isn't fragmented.
  Reviewed-by: Tom Stellard <[email protected]>
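The 'fragmented' rule above, including the false positive it describes, can be sketched in a few lines of C. This is a minimal illustration and not the driver code: `sketch_pool`, `SKETCH_POOL_FRAGMENTED` and `sketch_pool_free` are hypothetical names, and the pool is reduced to an item count plus a status bitmask.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical status bit; the commit only says a status member exists. */
#define SKETCH_POOL_FRAGMENTED (1u << 0)

struct sketch_pool {
    size_t count;    /* items currently in the pool, in pool order */
    unsigned status; /* bitmask of SKETCH_POOL_* flags */
};

/* Free the item at position idx. Freeing anything but the last item
 * leaves a gap, so the pool is marked fragmented. */
static void sketch_pool_free(struct sketch_pool *pool, size_t idx)
{
    assert(idx < pool->count);
    if (idx != pool->count - 1)
        pool->status |= SKETCH_POOL_FRAGMENTED;
    pool->count--;
}
```

With two items A (position 0) and B (position 1), freeing A sets the flag. Freeing B afterwards empties the pool, but the flag stays set, which is the harmless false positive the message mentions.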
* r600g/compute: Add a function for defragmenting the pool | Bruno Jiménez | 2014-07-23 | 2 | -0/+28
  This new function will move items forward in the pool, so that there's no gap between them, effectively defragmenting the pool.
  For now this function is a bit dumb as it just moves items forward without trying to see if other items in the pool could fit in the gaps.
  Reviewed-by: Tom Stellard <[email protected]>
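The simple packing strategy described above can be sketched as follows. This is an illustration under stated assumptions, not the code from the patch: `sketch_item` and `sketch_defrag` are hypothetical names, items are assumed to be sorted by offset, "forward" is taken to mean toward offset 0, and only the bookkeeping is shown (a real implementation would also move each item's data).

```c
#include <assert.h>
#include <stddef.h>

struct sketch_item {
    size_t start; /* offset of the item inside the pool */
    size_t size;  /* size of the item */
};

/* Pack every item directly after the previous one, in list order.
 * No attempt is made to fit later items into earlier gaps.
 * Returns the first free offset after the last item. */
static size_t sketch_defrag(struct sketch_item *items, size_t n)
{
    size_t next = 0;
    for (size_t i = 0; i < n; i++) {
        assert(items[i].start >= next); /* items only ever move forward */
        items[i].start = next;
        next += items[i].size;
    }
    return next;
}
```

For items of sizes 4, 2 and 3 starting at offsets 0, 10 and 20, the new offsets are 0, 4 and 6, and the first free offset is 9.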
* r600g/compute: Add a function for moving items in the pool | Bruno Jiménez | 2014-07-23 | 2 | -0/+93
  This function will be used in the future by compute_memory_defrag to move items forward in the pool.
  It does so by first checking for overlapping ranges. If the ranges don't overlap, it copies the contents directly. If they overlap, it first tries to make a temporary buffer; if this buffer fails to allocate, it finally falls back to a mapping.
  Note that since items only need to be moved forward, it only checks for overlapping ranges in that case. If needed, the other case can easily be added by changing the first if.
  Reviewed-by: Tom Stellard <[email protected]>
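The three-step decision described above (direct copy, temporary buffer, fallback) can be sketched on plain CPU memory. This is only an analogy for the control flow, not the driver code: `sketch_move` is a hypothetical name, `memcpy` stands in for the direct copy, and `memmove` stands in for the mapping-based fallback.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Move `size` bytes inside `pool` from offset `src` to offset `dst`.
 * Only forward moves (dst <= src) are handled, as in the message. */
static void sketch_move(unsigned char *pool, size_t dst, size_t src, size_t size)
{
    assert(dst <= src);
    if (dst == src || size == 0)
        return;

    if (dst + size <= src) {
        /* Ranges do not overlap: copy directly. */
        memcpy(pool + dst, pool + src, size);
        return;
    }

    /* Ranges overlap: try a temporary buffer first. */
    unsigned char *tmp = malloc(size);
    if (tmp != NULL) {
        memcpy(tmp, pool + src, size);
        memcpy(pool + dst, tmp, size);
        free(tmp);
    } else {
        /* Last resort; stands in for the mapping path. */
        memmove(pool + dst, pool + src, size);
    }
}
```

Moving 6 bytes from offset 2 to offset 0 takes the overlapping path, while moving 2 bytes from offset 4 to offset 0 takes the direct copy.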
* freedreno/a3xx: more vtx formats | Rob Clark | 2014-07-23 | 1 | -0/+17
  Actually what we currently handle is just the SCALED versions, and not the int versions. The difference probably matters more when we actually support integer in the compiler.
  Signed-off-by: Rob Clark <[email protected]>
* freedreno/a3xx/compiler: const file relative addressing | Rob Clark | 2014-07-23 | 8 | -68/+203
  Teach new compiler scheduling and register assignment how to deal with relative addressing. This gets us what we need to avoid falling back to old compiler for CONST[ADDR[0].x+n].
  It is also a prerequisite for temp file relative addressing, although that is going to also need some cleverness in register assignment to keep arrays grouped together.
  NOTE: doing address calculation in full precision and then narrowing to s16 in the mov to addr reg seems to sometimes cause lockups (and sometimes work?!). It seems more reliable to do the address calculation in s16, like the blob does. Which means teaching RA how to deal with mixed half and full precision allocation. Fortunately that didn't turn out to be too hard, so that is a nice bonus which we could probably take better advantage of elsewhere.
  Signed-off-by: Rob Clark <[email protected]>
* freedreno/a3xx/compiler: move function | Rob Clark | 2014-07-23 | 1 | -35/+35
  Signed-off-by: Rob Clark <[email protected]>
* freedreno/a3xx: add back a few stalls | Rob Clark | 2014-07-23 | 1 | -0/+8
  Technically we should not need these. CP_LOAD_STATE can be pipelined. But removing them broke a few piglit tests, like fbo-depth-GL_DEPTH_COMPONENT24-readpixels.
  I expect these are just masking a problem elsewhere, or perhaps they are only needed under some more specific circumstances. But until that is understood properly, give back a bit of the perf boost we got from c63450e8.
  Signed-off-by: Rob Clark <[email protected]>
* targets/dri: fix freedreno targets | Rob Clark | 2014-07-23 | 2 | -3/+11
  The kernel driver name is either "kgsl" (downstream/android) or "msm" (upstream).
  Signed-off-by: Rob Clark <[email protected]>
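A check that accepts both kernel driver names could look like the sketch below. The function name is hypothetical and the snippet does not reproduce the actual change to the DRI targets; it only illustrates matching either of the two names given in the message.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Returns true when `name` is one of the two kernel driver names
 * mentioned above: "kgsl" (downstream/android) or "msm" (upstream). */
static bool sketch_is_freedreno_kernel_driver(const char *name)
{
    assert(name != NULL);
    return strcmp(name, "kgsl") == 0 || strcmp(name, "msm") == 0;
}
```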
* freedreno: update generated headers | Rob Clark | 2014-07-23 | 4 | -14/+14
  Signed-off-by: Rob Clark <[email protected]>
* docs: Update GL3.txt and relnotes for GL_ARB_clear_texture | Neil Roberts | 2014-07-23 | 2 | -1/+2
* meta: Add a meta implementation of GL_ARB_clear_texture | Neil Roberts | 2014-07-23 | 4 | -0/+198
  Adds an implementation of the ClearTexSubImage driver entry point that tries to set up an FBO to render to the texture and then calls glClearBuffer with a scissor to perform the actual clear. If an FBO can't be created for the texture then it will fall back to using _mesa_store_ClearTexSubImage.
  When used in combination with _mesa_store_ClearTexSubImage this should provide an implementation that works for all DRI-based drivers. However as this has only been tested with the i965 driver it is currently only enabled there.
  v2: Only enable the extension for the i965 driver instead of all DRI drivers. Remove an unnecessary goto. Don't require GL_ARB_framebuffer_object. Add some more comments.
  v3: Use glClearBuffer* to avoid having to modify glClearColor and friends. Handle sRGB textures. Explicitly disable dithering.
  Reviewed-by: Topi Pohjolainen <topi.pohjolainen at intel.com>
* meta: Add a state flag for the GL_DITHER | Neil Roberts | 2014-07-23 | 2 | -0/+12
  The Meta implementation of glClearTexSubImage is going to want to ensure that dithering is disabled so that it can get a consistent color across the whole texture when clearing. This adds a state flag to easily save it and set it to the default value when performing meta operations.
  Reviewed-by: Topi Pohjolainen <[email protected]>
* texstore: Add a generic implementation of GL_ARB_clear_texture | Neil Roberts | 2014-07-23 | 2 | -0/+79
  Adds an implementation of the ClearTexSubImage driver entry point that just maps the texture and writes the values in. The extension is not yet enabled by default because it doesn't work with multisample textures as they don't have a simple linear layout.
  Reviewed-by: Jason Ekstrand <[email protected]>
* mesa/main: Add generic bits of ARB_clear_texture implementation | Neil Roberts | 2014-07-23 | 3 | -1/+271
  This adds the driver entry point for glClearTexSubImage and fills in the _mesa_ClearTexImage and _mesa_ClearTexSubImage functions that call it.
  v2: Don't clear some of the images if only one of them causes an error
  Reviewed-by: Jason Ekstrand <[email protected]>
* teximage: Add utility func for format/internalFormat compatibility check | Neil Roberts | 2014-07-23 | 1 | -21/+38
  In texture_error_check() there was a snippet of code to check whether the given format and internal format are basically compatible. This has been split out into its own static helper function so that it can be used by an implementation of glClearTexImage too.
* mesa/main: add ARB_clear_texture entrypoints | Ilia Mirkin | 2014-07-23 | 7 | -1/+69
  Signed-off-by: Ilia Mirkin <[email protected]>
  Reviewed-by: Neil Roberts <[email protected]>
* r600g/radeonsi: Use write-combined CPU mappings of some BOs in GTT | Michel Dänzer | 2014-07-23 | 17 | -26/+77
  Reviewed-by: Marek Olšák <[email protected]>
* winsys/radeon: Use separate caching buffer managers for VRAM and GTT | Michel Dänzer | 2014-07-23 | 3 | -9/+20
  Should reduce overhead because the caching buffer manager doesn't need to consider buffers of the wrong type.
  Reviewed-by: Marek Olšák <[email protected]>
* docs/GL3.txt: update status for ARB_compute_shader | Dave Airlie | 2014-07-23 | 1 | -1/+1
  since some bits are done in tree, but nobody is working on it anymore.
  Reviewed-by: Chris Forbes <[email protected]>
  Signed-off-by: Dave Airlie <[email protected]>
* mesa: Don't use memcpy() in _mesa_texstore() for float depth texture data | Anuj Phogat | 2014-07-21 | 1 | -0/+15
  because float depth texture data needs clamping to [0.0, 1.0]. Let _mesa_texstore() fall back to the slower path.
  Fixes Khronos GLES3 CTS tests: shadow_execution_vert shadow_execution_frag
  V2: Move the check to _mesa_texstore_can_use_memcpy() function. Add check for floating point data types.
  Cc: <[email protected]>
  Signed-off-by: Anuj Phogat <[email protected]>
  Reviewed-by: Marek Olšák <[email protected]>
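The reason a raw memcpy() is not enough here can be seen in a small sketch: each value has to be clamped to [0.0, 1.0] on the way in, which a byte-for-byte copy cannot do. `sketch_store_depth` is a hypothetical helper for illustration and is not the slower path that _mesa_texstore() actually takes.

```c
#include <assert.h>
#include <stddef.h>

/* Copy n float depth values from src to dst, clamping each to [0.0, 1.0].
 * A plain memcpy() would preserve out-of-range values unchanged. */
static void sketch_store_depth(float *dst, const float *src, size_t n)
{
    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < n; i++) {
        float v = src[i];
        if (v < 0.0f)
            v = 0.0f;
        else if (v > 1.0f)
            v = 1.0f;
        dst[i] = v;
    }
}
```

Given the inputs -0.5, 0.25 and 2.0, the stored values are 0.0, 0.25 and 1.0.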
* i965/fs: Fix gl_SampleMask handling for SIMD16 on Gen8+. | Kenneth Graunke | 2014-07-21 | 1 | -5/+0
  We actually want to use mov(16), not mov(8).
  Fixes 7 Piglit tests: ARB_sample_shading/builtin-gl-sample-mask [2468] and ARB_sample_shading/builtin-gl-sample-mask-simple [468].
  Signed-off-by: Kenneth Graunke <[email protected]>
  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=80991
  Reviewed-by: Matt Turner <[email protected]>
  Cc: "10.2" <[email protected]>
* i965/fs: Fix gl_SampleID for 2x MSAA and SIMD16 mode. | Kenneth Graunke | 2014-07-21 | 3 | -1/+11
  We might be able to do this without an extra program key field, but this is non-invasive and fixes the bug, for now.
  This fixes the following Piglit tests on Broadwell:
  - ARB_sample_shading/builtin-gl-sample-id 2
  - ARB_sample_shading/builtin-gl-sample-position 2
  - EXT_framebuffer_multisample/multisample-blit 2 color
  - EXT_framebuffer_multisample/multisample-blit 2 color linear
  - EXT_framebuffer_multisample/multisample-blit 2 depth
  - EXT_framebuffer_multisample/no-color 2 depth combined
  - EXT_framebuffer_multisample/no-color 2 depth separate
  - EXT_framebuffer_multisample/no-color 2 depth single
  - EXT_framebuffer_multisample/no-color 2 depth-computed combined
  - EXT_framebuffer_multisample/no-color 2 depth-computed separate
  - EXT_framebuffer_multisample/no-color 2 depth-computed single
  - EXT_framebuffer_multisample/unaligned-blit 2 color msaa
  - EXT_framebuffer_multisample/unaligned-blit 2 depth msaa
  Signed-off-by: Kenneth Graunke <[email protected]>
  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=80991
  Reviewed-by: Matt Turner <[email protected]>
  Cc: "10.2" <[email protected]>
* i965: Add missing persample_shading field to brw_wm_debug_recompile. | Kenneth Graunke | 2014-07-21 | 1 | -0/+2
  Otherwise, the performance warning for shader recompiles will just say "something else".
  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Jordan Justen <[email protected]>
* i965/disasm: Don't disassemble the URB complete field on Broadwell. | Kenneth Graunke | 2014-07-21 | 1 | -2/+4
  It doesn't exist, so attempting to read it will trigger generation assertions in the brw_inst API.
  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Matt Turner <[email protected]>
* i965: Disable hex offset printing in disassembly. | Kenneth Graunke | 2014-07-21 | 1 | -1/+2
  Printing the hex offsets makes it basically impossible to diff assembly: if you add even a single instruction, the entire shader shows up as a difference. So, every time I want to compare assembly, I have to strip this out.
  The hex offsets might be useful when debugging compaction, or when inspecting the program cache buffer. Since it's occasionally useful, but uncommon, this patch disables it by default, but makes it easy to re-enable it temporarily when the need arises.
  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Matt Turner <[email protected]>
* i965/vec4: Use foreach_inst_in_block a couple more places. | Matt Turner | 2014-07-21 | 2 | -8/+2
  Reviewed-by: Topi Pohjolainen <[email protected]>
* i965: Replace cfg instances with calls to calculate_cfg(). | Matt Turner | 2014-07-21 | 5 | -22/+22
  Avoids regenerating it unnecessarily. Every program in shader-db improved, none by an amount less than a 1/3 reduction. One Dota2 shader decreased from 62 -> 24.
  cfg calculations: 429492 -> 193197 (-55.02%)
  Reviewed-by: Topi Pohjolainen <[email protected]>
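The saving described above is the usual compute-on-demand pattern: build the structure only when no valid copy exists. The sketch below assumes that calculate_cfg() behaves this way; the struct, the function names, the build counter and the explicit invalidation step are all hypothetical and only illustrate the pattern.

```c
#include <assert.h>
#include <stdbool.h>

struct sketch_visitor {
    bool cfg_valid;      /* true while the cached CFG is still usable */
    unsigned cfg_builds; /* counts how often the CFG was (re)built */
};

/* Build the CFG only if there is no valid one already. */
static void sketch_calculate_cfg(struct sketch_visitor *v)
{
    if (v->cfg_valid)
        return;
    v->cfg_builds++; /* stands in for the expensive construction */
    v->cfg_valid = true;
}

/* Assumed to be called by passes that change the instruction stream. */
static void sketch_invalidate_cfg(struct sketch_visitor *v)
{
    v->cfg_valid = false;
}
```

Two consecutive calls build the CFG once; a rebuild happens only after an invalidation.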
* i965/cfg: Add a foreach_block_and_inst macro. | Matt Turner | 2014-07-21 | 1 | -0/+4
  Will let us abstract how the instructions are stored.
  Reviewed-by: Topi Pohjolainen <[email protected]>
* i965: Add cfg to backend_visitor. | Matt Turner | 2014-07-21 | 9 | -33/+48
  Reviewed-by: Topi Pohjolainen <[email protected]>
* radeonsi/compute: Add support scratch buffer support v2 | Tom Stellard | 2014-07-21 | 3 | -2/+85
  The scratch buffer will be used for private memory and also register spilling.
  v2:
  - Code cleanups
* radeonsi/compute: Bump number of user sgprs for LLVM 3.5 | Tom Stellard | 2014-07-21 | 1 | -1/+6
  Reviewed-by: Marek Olšák <[email protected]>
* winsys/radeon: Query the kernel for the number of SEs and SHs per SE | Tom Stellard | 2014-07-21 | 2 | -0/+8
  Reviewed-by: Marek Olšák <[email protected]>
* radeonsi/compute: Share COMPUTE_DBG macro with r600g | Tom Stellard | 2014-07-21 | 3 | -13/+10
  Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: Read rodata from ELF and append it to the end of shaders | Tom Stellard | 2014-07-21 | 3 | -1/+22
  This is used for programs that have arrays of constants that are accessed using dynamic indices. The shader will compute the base address of the constants and then access them using SMRD instructions.
* glsl: Fix bad indentation | Ian Romanick | 2014-07-19 | 1 | -1/+1
  Signed-off-by: Ian Romanick <[email protected]>
  Reviewed-by: Matt Turner <[email protected]>
* i965: Silence unused parameter warning | Ian Romanick | 2014-07-19 | 1 | -1/+1
  brw_fs_visitor.cpp:2400:1: warning: unused parameter 'ir' [-Wunused-parameter]
  Signed-off-by: Ian Romanick <[email protected]>
  Reviewed-by: Matt Turner <[email protected]>
* i965: Silence 'comparison is always true' warning | Ian Romanick | 2014-07-19 | 1 | -2/+0
  The parameter is an int16_t, and we're checking that its value will fit in 16 bits. Yes, the value that is stored in 16 bits will surely fit in 16 bits.
  brw_inst.h: In function 'brw_inst_set_gen6_jump_count':
  brw_inst.h:321:66: warning: comparison is always true due to limited range of data type [-Wtype-limits]
  brw_inst.h:321:66: warning: comparison is always true due to limited range of data type [-Wtype-limits]
  Signed-off-by: Ian Romanick <[email protected]>
  Reviewed-by: Matt Turner <[email protected]>
* i965: Silence many unused parameter warnings | Ian Romanick | 2014-07-19 | 1 | -0/+10
  brw_inst.h: In function 'brw_inst_set_src1_vstride':
  brw_inst.h:118:76: warning: unused parameter 'brw' [-Wunused-parameter]
  Signed-off-by: Ian Romanick <[email protected]>
  Reviewed-by: Matt Turner <[email protected]>
* configure.ac: Add LLVM patch version to error message. | Vinson Lee | 2014-07-18 | 1 | -1/+1
  Signed-off-by: Vinson Lee <[email protected]>
  Reviewed-by: Marek Olšák <[email protected]>
* main/format_pack: Fix a wrong datatype in pack_ubyte_R8G8_UNORM | Jason Ekstrand | 2014-07-18 | 1 | -1/+1
  Before it was only storing one of the color components due to truncation. With this patch it now properly stores all of them.
  Reviewed-by: Brian Paul <[email protected]>
  Cc: "10.2" <[email protected]>
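The kind of truncation described above can be reproduced with a small sketch: writing a packed 16-bit value through an 8-bit pointer keeps only its low byte. All names below are hypothetical, and the byte order chosen for the two components is an assumption for illustration rather than Mesa's definition of the format.

```c
#include <assert.h>
#include <stdint.h>

/* Combine two 8-bit components into one 16-bit value. */
static uint16_t sketch_pack_88(uint8_t hi, uint8_t lo)
{
    return (uint16_t)((hi << 8) | lo);
}

/* Wrong datatype: the store goes through a uint8_t pointer, so the
 * upper component is lost and only one byte of dst is written. */
static void sketch_store_truncated(void *dst, uint8_t r, uint8_t g)
{
    uint8_t *d = dst;
    *d = (uint8_t)sketch_pack_88(g, r);
}

/* Correct datatype: the full 16-bit value is stored. */
static void sketch_store_fixed(void *dst, uint8_t r, uint8_t g)
{
    uint16_t *d = dst;
    *d = sketch_pack_88(g, r);
}
```

With r = 0x12 and g = 0x34, the truncating version writes a single byte, 0x12, while the fixed version stores 0x3412.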
* docs: Import 10.2.4 release notes | Carl Worth | 2014-07-18 | 3 | -0/+134
  And add a news item.
* Add support for RGBA8 and RGBX8 textures in intel_texsubimage_tiled_memcpy | Jason Ekstrand | 2014-07-17 | 1 | -0/+11
  Reviewed-by: Matt Turner <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Chad Versace <[email protected]>
* i965: Improve debug output in intelTexImage and intelTexSubimage | Jason Ekstrand | 2014-07-17 | 2 | -1/+9
  Reviewed-by: Matt Turner <[email protected]>
  Reviewed-by: Chad Versace <[email protected]>
* radeonsi: only update vertex buffers when they need updating | Marek Olšák | 2014-07-18 | 3 | -2/+22
  Reviewed-by: Michel Dänzer <[email protected]>
* radeonsi: remove nr_vertex_buffers | Marek Olšák | 2014-07-18 | 3 | -6/+23
  Unused. Also inline util_set_vertex_buffers_count and simplify it.
  Reviewed-by: Michel Dänzer <[email protected]>
* radeonsi: move vertex buffer descriptors from IB to memory | Marek Olšák | 2014-07-18 | 7 | -106/+133
  This removes the intermediate storage (pm4 state) and generates descriptors directly in a staging buffer.
  It also reduces the number of flushes, because the descriptors no longer take CS space.
  Reviewed-by: Michel Dänzer <[email protected]>