summaryrefslogtreecommitdiffstats
path: root/src
Commit message (Collapse)AuthorAgeFilesLines
* i965/fs: Compute q-values for register allocation manuallyJason Ekstrand2014-10-241-2/+56
| | | | | | | | | | | | | | | | | | | | | Previously, we were allowing the register allocation code to do the computation for us in ra_set_finalize. However, the runtime for this computation is O(c^4 * g) where c is the number of classes and g is the number of GRF registers. However, these q-values are directly computable based on the way we lay out our register classes so there is no need for the aweful runtime algorithm. We were doing ok until commit 7210583eb where we bumped the number of register classes from 11 to 16. While startup times don't normally matter, this caused piglit to take 4 times as long to run on Bay Trail. This patch should make generating the ra_set much faster and melt the piglit run times. v2: Fixed a couple of bugs. I have now verified that the same q-values are generated both ways. Signed-off-by: Jason Ekstrand <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965/fs: Don't interfere with too many base registersJason Ekstrand2014-10-241-2/+2
| | | | | | | | | | | | On older GENs in SIMD16 mode, we were accidentally building too much interference into our register classes. Since everything is divided by 2, the reigster allocator thinks we have 64 base registers instead of 128. The actual GRF mapping still needs to be doubled, but as far as the ra_set is concerned, we only have 64. We were accidentally adding way too much interference. Signed-off-by: Jason Ekstrand <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965/fs: Properly precolor payload registers on GEN5 in SIMD16Jason Ekstrand2014-10-241-1/+10
| | | | | | | | | | | For GEN6 SIMD16 mode, we have to 2-align all the registers, so we only have the even-numbered ones. This means that we have to divide the register number by 2 when we precolor. This wasn't a problem before because we were setting up the interference between ra_node registers wrong. This will be fixed in the next commit. Signed-off-by: Jason Ekstrand <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965/fs: Add another use of MAX_VGRF_SIZEJason Ekstrand2014-10-241-1/+1
| | | | | Signed-off-by: Jason Ekstrand <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* util: Use reg_belongs_to_class instead of BITSET_TESTJason Ekstrand2014-10-241-1/+1
| | | | | | | | | This shouldn't be a functional change since reg_belongs_to_class is just a wrapper around BITSET_TEST. It just makes the code a little easier to read. Signed-off-by: Jason Ekstrand <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* llvmpipe: Ensure the packed input of the lp_test_format is aligned.José Fonseca2014-10-241-2/+10
| | | | | | | | Fixes: - https://bugs.freedesktop.org/show_bug.cgi?id=85377 - http://llvm.org/bugs/show_bug.cgi?id=21365 Reviewed-by: Roland Scheidegger <[email protected]>
* llvmpipe: Flush stdout on lp_test_* unit tests.José Fonseca2014-10-242-0/+3
| | | | | | | So that the order of test messages and gallivm/llvmpipe debug output is preserved. Reviewed-by: Roland Scheidegger <[email protected]>
* gallium: Enable ARB_clip_control for gallium drivers.Mathias Fröhlich2014-10-242-1/+14
| | | | | | | | | | | | | | | | Gallium should be prepared fine for ARB_clip_control. So enable this and mention it in the release notes. v2: Only enable for drivers announcing the freshly introduced PIPE_CAP_CLIP_HALFZ capability. v3: Use extension enable infrastructure to connect PIPE_CAP_CLIP_HALFZ with ARB_clip_control. Reviewed-by: Brian Paul <[email protected]> Signed-off-by: Mathias Froehlich <[email protected]>
* gallium: introduce PIPE_CAP_CLIP_HALFZ.Mathias Fröhlich2014-10-2415-0/+20
| | | | | | | | | | | | In preparation of ARB_clip_control. Let the driver decide if it supports pipe_rasterizer_state::clip_halfz being set to true. v3: Initially enable on ilo. Reviewed-by: Roland Scheidegger <[email protected]> Reviewed-by: Brian Paul <[email protected]> Signed-off-by: Mathias Froehlich <[email protected]
* mesa: Handle clip control in meta operations.Mathias Fröhlich2014-10-242-0/+9
| | | | | | | | | | | Restore clip control to the default state if MESA_META_VIEWPORT or MESA_META_DEPTH_TEST is requested. v3: Handle clip control state with MESA_META_TRANSFORM. Reviewed-by: Brian Paul <[email protected]> Signed-off-by: Mathias Froehlich <[email protected]>
* mesa: Implement ARB_clip_control.Mathias Fröhlich2014-10-2413-7/+159
| | | | | | | | | | | | | | | Implement the mesa parts of ARB_clip_control. So far no driver enables this. v3: Restrict getting clip control state to the availability of ARB_clip_control. Move to transformation state. Handle clip control state with the GL_TRANSFORM_BIT. Move _FrontBit update into state.c. Reviewed-by: Brian Paul <[email protected]> Signed-off-by: Mathias Froehlich <[email protected]>
* mesa: Refactor viewport transform computation.Mathias Fröhlich2014-10-247-61/+73
| | | | | | | | | | This is for preparation of ARB_clip_control. v3: Add comments. Reviewed-by: Brian Paul <[email protected]> Signed-off-by: Mathias Froehlich <[email protected]>
* vc4: Reuse uniform_data/contents indices when making uniforms.Eric Anholt2014-10-241-0/+7
| | | | | | | | | | This allows vc4_opt_cse.c to CSE-away operations involving the same uniform values. total instructions in shared programs: 37341 -> 36906 (-1.16%) instructions in affected programs: 10233 -> 9798 (-4.25%) total uniforms in shared programs: 10523 -> 10320 (-1.93%) uniforms in affected programs: 2467 -> 2264 (-8.23%)
* vc4: When asked to discard-map a whole resource, discard it.Eric Anholt2014-10-241-14/+28
| | | | | | | This saves a bunch of extra flushes when texsubimaging a whole texture that's been used for rendering, or subdataing a whole BO. In particular, this massively reduces the runtime of piglit texture-packed-formats (when the probes have been moved out of the inner loop).
* vc4: Refactor flushing before mapping a BO.Eric Anholt2014-10-243-12/+13
| | | | I'm going to want to make some other decisions here before flushing.
* vc4: Allow dead code elimination of unused varyings.Eric Anholt2014-10-245-5/+31
| | | | | | | total instructions in shared programs: 39022 -> 37341 (-4.31%) instructions in affected programs: 26979 -> 25298 (-6.23%) total uniforms in shared programs: 11242 -> 10523 (-6.40%) uniforms in affected programs: 5836 -> 5117 (-12.32%)
* vc4: Add debug output to match shaderdb info to program dumps.Eric Anholt2014-10-244-7/+29
| | | | | | I'm going to be using VC4_DEBUG=shaderdb,norast to do shaderdb stats, but when debugging regressions, I want to match shaderdb output to shader disassembly.
* radeon: enable Hyper-Z on r600g and radeonsi by defaultAndreas Boll2014-10-244-5/+5
| | | | | | | | | | | | | | | | | This reverts commit 01e637114914453451becc0dc8afe60faff48d84. Since then many Hyper-Z issues have been fixed or worked around. Enable Hyper-Z by default so that we get enough feedback for the upcoming mesa 10.4 release. If you have issues with Hyper-Z try to disable Hyper-Z using the enviroment variable R600_DEBUG=nohyperz and please report the issue on the bugtracker. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=75011 See also: https://bugs.freedesktop.org/show_bug.cgi?id=75112 Signed-off-by: Andreas Boll <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* i965: Silence unused variable warning.Matt Turner2014-10-231-2/+1
|
* i965/fs: Silence uninitialized variable warning.Matt Turner2014-10-231-0/+1
| | | | | | | The compiler isn't privy to the knowledge that we're doing at least one framebuffer write. Reviewed-by: Jason Ekstrand <[email protected]>
* util: Add assume() macro.Matt Turner2014-10-231-0/+14
| | | | Reviewed-by: Ian Romanick <[email protected]>
* glapi: Fix compiler warning and script nameJan Vesely2014-10-231-2/+2
| | | | | Signed-off-by: Jan Vesely <[email protected]> Reviewed-by: Emil Velikov <[email protected]>
* Revert "freedreno/a3xx: only emit dirty consts"Rob Clark2014-10-232-9/+5
| | | | | | | This reverts commit 94bb33617d1e8978dc52b8aaa4eb41bfb6703f79. Which somehow broke gnome-shell.. and needs more investigation. For now, revert..
* freedreno: fix PIPE_TRANSFER_DISCARD_WHOLE_RESOURCERob Clark2014-10-231-7/+6
| | | | | | | | | | | | | | | fd_bo_cpu_prep() doesn't realize the bo is already referenced in unflushed cmdstream. It could be made to do so (but would have to be implemented twice, ie. both for msm and kgsl). But we still can't do the expected thing if the caller isn't using _NOSYNC. Because of the way the tiling works, we need to build quite a bit of cmdstream at flush time, which is not possible to do at the libdrm level. So rather than trying to make fd_bo_cpu_prep() smarter than it can possibly be, just *always* discard and reallocate if the PIPE_TRANSFER_DISCARD_WHOLE_RESOURCE flag is set. Signed-off-by: Rob Clark <[email protected]>
* clover: use correct typenames for compat::pair's first/secondEmil Velikov2014-10-231-2/+2
| | | | | | | | | | Seems to be a typo judging from the overall declaration of the template. Cc: EdB <[email protected]> Cc: Francisco Jerez <[email protected]> Signed-off-by: Emil Velikov <[email protected]> Reviewed-by: Francisco Jerez <[email protected]>
* auxiliary/os: get the mmap/munmap wrappers working with androidEmil Velikov2014-10-231-5/+12
| | | | | | | | | | - Use macro for munmap under Android - the STATIC_ASSERT uses a off_t which is not used under Android for mmap. As loff_t size does not vary as does off_t just ignore the assert. - Wrap the long lines to improve readability. Signed-off-by: Emil Velikov <[email protected]>
* gallium/nouveau: fully build the driver under androidMauro Rossi2014-10-231-1/+1
| | | | | | Fix the trivial typo in the variable name. Cc: "10.2 10.3" <[email protected]>
* mesa/shaderimage.c: fix inconsistent sign warningAlon Levy2014-10-231-1/+1
| | | | | Signed-off-by: Alon Levy <[email protected]> Reviewed-by: Brian Paul <[email protected]>
* wgl: stw_pixelformat_get_info: correct type for index variableAlon Levy2014-10-231-1/+1
| | | | | Signed-off-by: Alon Levy <[email protected]> Reviewed-by: Brian Paul <[email protected]>
* u_math.h: fix 64 to 32 bit truncation warningAlon Levy2014-10-231-1/+1
| | | | | Signed-off-by: Alon Levy <[email protected]> Reviewed-by: Brian Paul <[email protected]>
* gallivm: Fix build with LLVM 3.3.José Fonseca2014-10-231-0/+2
| | | | | | | | | The setMCJITMemoryManager method doesn't exist in LLVM 3.3. I thought I had tested the latest version of my earlier change with LLVM 3.3, but it looks I missed it. Trivial.
* gallivm: Properly update for removal of JITMemoryManager in LLVM 3.6.José Fonseca2014-10-232-38/+41
| | | | | | | | | | | | | | | | | | JITMemoryManager was removed in LLVM 3.6, and replaced by its base class RTDyldMemoryManager. This change fixes our JIT memory managers specializations to derive from RTDyldMemoryManager in LLVM 3.6 instead of JITMemoryManager. This enables llvmpipe to run with LLVM 3.6. However, lp_free_generated_code is basically a no-op because there are not enough hook points in RTDyldMemoryManager to track and free the code of a module. In other words, with MCJIT, code once created, stays forever allocated until process destruction. This is not speicfic to LLVM 3.6 -- it will happen whenever MCJIT is used regardless of version. Reviewed-by: Roland Scheidegger <[email protected]>
* gallivm: Fix white-space.José Fonseca2014-10-231-7/+7
| | | | | | Replace tabs with spaces. Reviewed-by: Roland Scheidegger <[email protected]>
* gallivm,llvmpipe,clover: Bump required LLVM version to 3.3.José Fonseca2014-10-237-119/+8
| | | | | | | | | | | | | | We'll need to update gallivm for the interface changes in LLVM 3.6, and the fewer the number of older LLVM versions we support the less hairy that will be. As consequence HAVE_AVX define can disappear. (Note HAVE_AVX meant whether LLVM version supports AVX or not. Runtime support for AVX is always checked and enforced independently.) Verified llvmpipe builds and runs with with LLVM 3.3, 3.4, and 3.5. Reviewed-by: Roland Scheidegger <[email protected]>
* mesa: remove conditional render and rgtc from ES3 requirementsIlia Mirkin2014-10-231-2/+0
| | | | | | | The functionality exposed by those extensions does not appear in ES3 Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* u_blitter: put a comment on util_blitter_cache_all_shaders()Brian Paul2014-10-221-0/+7
| | | | Trivial.
* u_blitter: use ctx->bind_fs_state(), not pipe->bind_fs_state()Brian Paul2014-10-221-3/+3
| | | | | | Consistently use the function pointer we saved earlier. Reviewed-by: Marek Olšák <[email protected]>
* u_blitter: create basic fs shaders in util_blitter_cache_all_shaders()Brian Paul2014-10-221-1/+12
| | | | | | We need to create all fs shaders in this function. Reviewed-by: Marek Olšák <[email protected]>
* u_blitter: do error checking assertions for shader cachingBrian Paul2014-10-221-21/+30
| | | | | | | | | If the user calls util_blitter_cache_all_shaders() set a flag and assert that we never try to create any new fragment shaders after that point. If the assertions fails, it means we missed generating some shader in util_blitter_cache_all_shaders(). Reviewed-by: Marek Olšák <[email protected]>
* glsl: Use signed array index in update_max_array_access()Anuj Phogat2014-10-221-3/+3
| | | | | | | | | Avoids a crash in case of negative array index is used in a shader program. Cc: <[email protected]> Signed-off-by: Anuj Phogat <[email protected]> Reviewed-by: Chris Forbes <[email protected]>
* glsl: Fix crash due to negative array indexAnuj Phogat2014-10-221-1/+1
| | | | | | | | | | | | | | | | Currently Mesa crashes with a shader like this: [fragmnet shader] float[5] array; int idx = -2; void main() { gl_FragColor = vec4(0.0, 1.0, 0.0, array[idx]); } Cc: <[email protected]> Signed-off-by: Anuj Phogat <[email protected]> Reviewed-by: Chris Forbes <[email protected]>
* radeonsi: implement pipe_rasterizer_state::clip_halfzMarek Olšák2014-10-221-0/+1
| | | | | Reviewed-by: Michel Dänzer <[email protected]> Reviewed-by: Alex Deucher <[email protected]>
* r600g: implement pipe_rasterizer_state::clip_halfzMarek Olšák2014-10-222-0/+2
| | | | Reviewed-by: Alex Deucher <[email protected]>
* r300g: implement pipe_rasterizer_state::clip_halfzMarek Olšák2014-10-223-0/+9
| | | | Reviewed-by: Alex Deucher <[email protected]>
* r600g: Drop references to destroyed blend stateMichel Dänzer2014-10-221-1/+8
| | | | | | | | | | | | Fixes use-after-free when the currently bound blend state is destroyed. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=85267 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=84140 Reviewed-by: Marek Olšák <[email protected]> Tested-by: Dieter Nützel <[email protected]> Cc: [email protected]
* i965/vec4: Generate better code for ir_triop_csel.Kenneth Graunke2014-10-211-3/+15
| | | | | | | | | | | | | | | | | | | | | | | | Previously, we generated an extra CMP instruction: cmp.ge.f0(8) g6<1>D g1<0,4,1>F 0F cmp.nz.f0(8) null g6<4,4,1>D 0D (+f0) sel(8) g5<1>F g1.4<0,4,1>F g2<0,4,1>F The first operand is always a boolean, and we want to predicate the SEL on that. Rather than producing a boolean value and comparing it against zero, we can just produce a condition code in the flag register. Now we generate: cmp.ge.f0(8) null g1<0,4,1>F 0F (+f0) sel(8) g5<1>F g1.4<0,4,1>F g2<0,4,1>F No difference in shader-db. v2: Remember to delete the old code (thanks Matt). Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965/vec4: Simplify visit(ir_expression *)'s result_src/dst setup.Kenneth Graunke2014-10-211-13/+6
| | | | | | | | | | | | | Using dst_reg(this, ir->type) automatically sets the writemask to the proper size for the type; src_reg(dst_reg) preserves that. This should be equivalent, but less code. Note that src_reg(dst_reg) either uses SWIZZLE_XXXX or SWIZZLE_XYZW, so the old code did need the manual writemask adjustment, since it constructed the registers the other way around. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965/vec4: Delete some dead code in visit(ir_expression *).Kenneth Graunke2014-10-211-8/+0
| | | | | | | | | | | Nothing uses the vector_elements temporary variable. Setting this->result.file is dead because we overwrite this->result a few lines later. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965/fs: Generate better code for ir_triop_csel.Kenneth Graunke2014-10-211-5/+13
| | | | | | | | | | | | | | | | | | | | | | | | Previously, we generated an extra CMP instruction: cmp.ge.f0(8) g4<1>D g2<0,1,0>F 0F cmp.nz.f0(8) null g4<8,8,1>D 0D (+f0) sel(8) g120<1>F g2.4<0,1,0>F g3<0,1,0>F The first operand is always a boolean, and we want to predicate the SEL on that. Rather than producing a boolean value and comparing it against zero, we can just produce a condition code in the flag register. Now we generate: cmp.ge.f0(8) null g2<0,1,0>F 0F (+f0) sel(8) g124<1>F g2.4<0,1,0>F g3<0,1,0>F total instructions in shared programs: 5473459 -> 5473253 (-0.00%) instructions in affected programs: 6219 -> 6013 (-3.31%) Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* glsl: Delete unused gl_uniform_driver_format enum values.Kenneth Graunke2014-10-212-36/+2
| | | | | | | | | | | | | | A while back, Matt made the uniform upload functions simply upload ctx->Const.UniformBooleanTrue for boolean values instead of 0/1, which removed the need to convert it later. We also set UniformBooleanTrue to 1.0f for drivers which want to treat booleans as 0.0/1.0f. Nothing ever sets these, so they are dead. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Tapani Pälli <[email protected]> Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Ian Romanick <[email protected]>