summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* i965/fs: Decide predicate/predicate_inverse outside of the for loop.Matt Turner2014-07-241-9/+14
| | | | Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Swap if/else conditions in SEL peephole.Matt Turner2014-07-241-3/+3
| | | | | | Will clarify make the next commit easier to read. Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Improve dead control flow elimination.Matt Turner2014-07-241-10/+15
| | | | | | | | ... to eliminate an ELSE instruction followed immediately by an ENDIF. instructions in affected programs: 704 -> 700 (-0.57%) Reviewed-by: Kenneth Graunke <[email protected]>
* nvc0/ir: support 2d constbuf indexingIlia Mirkin2014-07-241-0/+14
| | | | Signed-off-by: Ilia Mirkin <[email protected]>
* gm107/ir: emit LDC subopsIlia Mirkin2014-07-241-0/+1
| | | | Signed-off-by: Ilia Mirkin <[email protected]>
* gk110/ir: emit load constant subopIlia Mirkin2014-07-241-0/+1
| | | | Signed-off-by: Ilia Mirkin <[email protected]>
* mesa/st: add support for interpolate_at_* opsIlia Mirkin2014-07-241-3/+9
| | | | | Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Chris Forbes <[email protected]>
* nv50/ir: fix phi/union sources when their def has been mergedIlia Mirkin2014-07-241-0/+8
| | | | | | | | | | | | | In a situation where double-register values are used, the phi nodes can still end up being u32 values. They all get merged into one RA node though. When fixing up the merge (which comes after the phi node), the phi node's def would get fixed, but not its sources which would remain at the low register value. This maintains the invariant that a phi node's defs and sources are allocated the same register. Signed-off-by: Ilia Mirkin <[email protected]>
* nv50/ir: fix hard-coded TYPE_U32 sized registerIlia Mirkin2014-07-241-3/+4
| | | | Signed-off-by: Ilia Mirkin <[email protected]>
* nvc0: mark shader header if fp64 is usedIlia Mirkin2014-07-241-0/+2
| | | | Signed-off-by: Ilia Mirkin <[email protected]>
* nv50/ir: keep track of whether the program uses fp64Ilia Mirkin2014-07-242-2/+7
| | | | Signed-off-by: Ilia Mirkin <[email protected]>
* nvc0: make sure that the local memory allocation is aligned to 0x10Ilia Mirkin2014-07-241-1/+1
| | | | | Signed-off-by: Ilia Mirkin <[email protected]> Cc: <[email protected]>
* mesa: add ARB_clear_texture.xml to file list, remove duplicate declsIlia Mirkin2014-07-242-12/+1
| | | | | Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Emil Velikov <[email protected]>
* ilo: check the tilings of imported handlesChia-I Wu2014-07-241-30/+36
| | | | Just to be cautious.
* ilo: clean up resource bo renamingChia-I Wu2014-07-244-51/+63
| | | | | s/alloc_bo/rename_bo/ as that is what the functions do. Simplify bo allocation and move the complexity to bo renaming.
* ilo: share some code between {tex,buf}_create_boChia-I Wu2014-07-241-59/+55
| | | | | Add resource_get_bo_name() and resource_get_bo_initial_domain() for use by both functions.
* ilo: use native 3-component vertex formats on GEN7.5+Chia-I Wu2014-07-242-1/+6
| | | | GEN7.5 gains support for those formats natively.
* ilo: allow for device-dependent format translationChia-I Wu2014-07-245-32/+39
| | | | Pass ilo_dev_info to all format translation functions.
* i965: Accelerate uploads of RGBA and BGRA GL_UNSIGNED_INT_8_8_8_8_REV texturesJason Ekstrand2014-07-231-1/+5
| | | | | | | | | | Since intel is always going to be little-endian, GL_UNSIGNED_INT_8_8_8_8_REV is the same as GL_UNSIGNED_BYTE for RGBA and BGRA textures, so the same acceleration code will work. We might as well use it. Signed-off-by: Jason Ekstrand <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* mesa: Fix the name in the error messageIan Romanick2014-07-231-1/+1
| | | | | | | Obvious copy-and-paste bug. Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* glsl: Fix some bad indentationIan Romanick2014-07-231-3/+3
| | | | | Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965/fs: Set LastRT on the final FB write on Broadwell.Kenneth Graunke2014-07-231-4/+2
| | | | | | | | | | | | | | | | | In Piglit's EXT_framebuffer_multisample/alpha-to-coverage-dual-src-blend test, key->nr_color_regions == 2, but the dual source blend FB write has ir->target set to 0. So we failed to set "Last Render Target Select" on any FB write message. We only emit one FB write per render target, so my comment about setting LastRT on every FB write directed at the last color region is a bit... misinformed. According to the documentation, depth buffer writes and scoreboard updates happen on the FB write with LastRT set, so I believe we want to set it only once. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]> Cc: "10.2" <[email protected]>
* i965: Port INTEL_DEBUG=optimizer to the vec4 backend.Kenneth Graunke2014-07-231-6/+36
| | | | | | | Largely via copy and paste. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965: Save the gl_shader_stage enum in backend_visitor.Kenneth Graunke2014-07-232-1/+4
| | | | | | | | | This will be useful for INTEL_DEBUG=optimizer in the vec4 backend, which needs to know whether it's currently processing a VS or GS. It isn't worth adding virtual methods for this case. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965: Don't print WE_normal in disassembly.Kenneth Graunke2014-07-231-1/+1
| | | | | | | | | Dropping this helps most lines fit in an 80 column terminal. The absence of WE_normal also helps call attention to WE_all, where something unusual is going on. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* freedreno/a3xx/compiler: fix p0 (kill, etc)Rob Clark2014-07-231-1/+2
| | | | | | | Don't assert (debug builds) or assign random uninitialized value for predicate register (p0).. that screws up kill, etc. Signed-off-by: Rob Clark <[email protected]>
* Revert "r600g/compute: Fix warnings"Tom Stellard2014-07-232-16/+12
| | | | | | This reverts commit 467f1585e28adba0e94ef593de131bc327f098bb. This breaks the build on some systems.
* radeon/llvm: fix formattingGrigori Goronzy2014-07-231-10/+14
| | | | | | | Use K&R and same indent as most other code. No functional change intended. Reviewed-by: Tom Stellard <[email protected]>
* radeon/llvm: enable unsafe math for graphics shadersGrigori Goronzy2014-07-231-0/+5
| | | | | | | | | | | Accuracy of some operations was recently improved in the R600 backend, at the cost of slower code. This is required for compute shaders, but not for graphics shaders. Add unsafe-fp-math hint to make LLVM generate faster but possibly less accurate code. Piglit didn't indicate any regressions. Reviewed-by: Tom Stellard <[email protected]>
* r600g/compute: Fix warningsTom Stellard2014-07-232-12/+16
|
* r600g: Use hardware sqrt instructionGlenn Kennard2014-07-232-7/+4
| | | | | | | Piglit quick tests including sqrt pass, no other regressions, tested on radeon 6670. Reviewed-by: Alex Deucher <[email protected]>
* r600g/compute: Remove unneeded code from compute_memory_promote_itemBruno Jiménez2014-07-232-36/+12
| | | | | | | | | | | | Now that we know that the pool is defragmented, we positively know that allocated + unallocated will be the total size of the current pool plus all the items that will be promoted. So we only need to grow the pool once. This will allow us to just add the new items to the end of the item_list without the need of looking for a place to the new item. Reviewed-by: Tom Stellard <[email protected]>
* r600g/compute: Quick exit if there's nothing to add to the poolBruno Jiménez2014-07-231-0/+4
| | | | | | | | This way we can avoid defragmenting the pool, even if it is needed to defragment it, and looping again through the list of unallocated items. Reviewed-by: Tom Stellard <[email protected]>
* r600g/compute: Defrag the pool if it's necesaryBruno Jiménez2014-07-232-17/+19
| | | | | | | | | | | | | | | | | This patch adds a new member to the pool to track its status. For now it is used only for the 'fragmented' status, but if needed it could be used for more statuses. The pool will be considered fragmented if: An item that isn't the last is freed or demoted. This 'strategy' has a problem, although it shouldn't cause any bug. If for example we have two items, A and B. We choose to free A first, now the pool will have the 'fragmented' status. If we now free B, the pool will retain its 'fragmented' status even if it isn't fragmented. Reviewed-by: Tom Stellard <[email protected]>
* r600g/compute: Add a function for defragmenting the poolBruno Jiménez2014-07-232-0/+28
| | | | | | | | | | | This new function will move items forward in the pool, so that there's no gap between them, effectively defragmenting the pool. For now this function is a bit dumb as it just moves items forward without trying to see if other items in the pool could fit in the gaps. Reviewed-by: Tom Stellard <[email protected]>
* r600g/compute: Add a function for moving items in the poolBruno Jiménez2014-07-232-0/+93
| | | | | | | | | | | | | | | | This function will be used in the future by compute_memory_defrag to move items forward in the pool. It does so by first checking for overlaping ranges, if the ranges don't overlap it will copy the contents directly. If they overlap it will try first to make a temporary buffer, if this buffer fails to allocate, it will finally fall back to a mapping. Note that it will only be needed to move items forward, it only checks for overlapping ranges in that case. If needed, it can easily be added by changing the first if. Reviewed-by: Tom Stellard <[email protected]>
* freedreno/a3xx: more vtx formatsRob Clark2014-07-231-0/+17
| | | | | | | | Actually what we currently handle is just the SCALED versions, and not the int versions. The difference probably matters more when we actually support integer in the compiler. Signed-off-by: Rob Clark <[email protected]>
* freedreno/a3xx/compiler: const file relative addressingRob Clark2014-07-238-68/+203
| | | | | | | | | | | | | | | | | | Teach new compiler scheduling and register assignment how to deal with relative addressing. This gets us what we need to avoid falling back to old compiler for CONST[ADDR[0].x+n]. It is also a prerequisite for temp file relative addressing, although that is going to also need some cleverness in register assignment to keep arrays grouped together. NOTE: doing address calculation in full precision and then narrowing to s16 in the mov to addr reg seems to sometimes cause lockups (and sometimes work?!). It seems more reliable to do the address calculation in s16, like the blob does. Which means teaching RA how to deal with mixed half and full precision allocation. Fortunately that didn't turn out to be too hard, so that is a nice bonus which we could probably take better advantage of elsewhere. Signed-off-by: Rob Clark <[email protected]>
* freedreno/a3xx/compiler: move functionRob Clark2014-07-231-35/+35
| | | | Signed-off-by: Rob Clark <[email protected]>
* freedreno/a3xx: add back a few stallsRob Clark2014-07-231-0/+8
| | | | | | | | | | | Technically we should not need these. CP_LOAD_STATE can be pipelined. But removing them broke a few piglit tests, like fbo-depth- GL_DEPTH_COMPONENT24-readpixels. I expect these are just masking a problem elsewhere, or perhaps they are only needed under some more specific circumstances. But until that is understood properly, give back a bit of the perf boost we got from c63450e8. Signed-off-by: Rob Clark <[email protected]>
* targets/dri: fix freedreno targetsRob Clark2014-07-232-3/+11
| | | | | | | The kernel driver name is either "kgsl" (downstream/android) or "msm" (upstream). Signed-off-by: Rob Clark <[email protected]>
* freedreno: update generated headersRob Clark2014-07-234-14/+14
| | | | Signed-off-by: Rob Clark <[email protected]>
* docs: Update GL3.txt and relnotes for GL_ARB_clear_textureNeil Roberts2014-07-232-1/+2
|
* meta: Add a meta implementation of GL_ARB_clear_textureNeil Roberts2014-07-234-0/+198
| | | | | | | | | | | | | | | | | | | | Adds an implementation of the ClearTexSubImage driver entry point that tries to set up an FBO to render to the texture and then calls glClearBuffer with a scissor to perform the actual clear. If an FBO can't be created for the texture then it will fall back to using _mesa_store_ClearTexSubImage. When used in combination with _mesa_store_ClearTexSubImage this should provide an implementation that works for all DRI-based drivers. However as this has only been tested with the i965 driver it is currently only enabled there. v2: Only enable the extension for the i965 driver instead of all DRI drivers. Remove an unnecessary goto. Don't require GL_ARB_framebuffer_object. Add some more comments. v3: Use glClearBuffer* to avoid having to modify glClearColor and friends. Handle sRGB textures. Explicitly disable dithering. Reviewed-by: Topi Pohjolainen <topi.pohjolainen at intel.com>
* meta: Add a state flag for the GL_DITHERNeil Roberts2014-07-232-0/+12
| | | | | | | | | The Meta implementation of glClearTexSubImage is going to want to ensure that dithering is disabled so that it can get a consistent color across the whole texture when clearing. This adds a state flag to easily save it and set it to the default value when performing meta operations. Reviewed-by: Topi Pohjolainen <[email protected]>
* texstore: Add a generic implementation of GL_ARB_clear_textureNeil Roberts2014-07-232-0/+79
| | | | | | | | | Adds an implmentation of the ClearTexSubImage driver entry point that just maps the texture and writes the values in. The extension is not yet enabled by default because it doesn't work with multisample textures as they don't have a simple linear layout. Reviewed-by: Jason Ekstrand <[email protected]>
* mesa/main: Add generic bits of ARB_clear_texture implementationNeil Roberts2014-07-233-1/+271
| | | | | | | | | This adds the driver entry point for glClearTexSubImage and fills in the _mesa_ClearTexImage and _mesa_ClearTexSubImage functions that call it. v2: Don't clear some of the images if only one of them makes an error Reviewed-by: Jason Ekstrand <[email protected]>
* teximage: Add utility func for format/internalFormat compatibility checkNeil Roberts2014-07-231-21/+38
| | | | | | | In texture_error_check() there was a snippet of code to check whether the given format and internal format are basically compatible. This has been split out into its own static helper function so that it can be used by an implementation of glClearTexImage too.
* mesa/main: add ARB_clear_texture entrypointsIlia Mirkin2014-07-237-1/+69
| | | | | Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Neil Roberts <[email protected]>
* r600g/radeonsi: Use write-combined CPU mappings of some BOs in GTTMichel Dänzer2014-07-2317-26/+77
| | | | Reviewed-by: Marek Olšák <[email protected]>