Commit message | Author | Age | Files | Lines
* vc4: Handle SF on instructions that write r4.Eric Anholt2016-04-081-10/+14
| | | | | | | Normal SFU writes couldn't have SF because they were marked as multi_instruction, but tex_result and tlb_color_read weren't. This ended up not being a problem according to anything in shader-db, but it seems possible.
* vc4: Allow multi-instruction QIR nodes to get VPM optimization.Eric Anholt2016-04-081-2/+2
| | | | | | | | | | | There used to be multi-instruction operations that would use src[] twice, which is why we couldn't do some optimizations on them. This is no longer the case. total instructions in shared programs: 77973 -> 77969 (-0.01%) instructions in affected programs: 84 -> 80 (-4.76%) total estimated cycles in shared programs: 234165 -> 234157 (-0.00%) estimated cycles in affected programs: 92 -> 84 (-8.70%)
* vc4: Switch to using NIR_PASS macros.Eric Anholt2016-04-085-33/+32
| | | | This gets us better validation of our NIR transformations.
* vc4: Handle nir_intrinsic_load_user_clip_plane as a vec4.Eric Anholt2016-04-082-20/+12
| | | | | | | | I liked having all my NIR be scalar, but nir_validate() complains that the intrinsic writes 4 components but the destination we set up was only 1 component. I could generate a new scalar variant, but it's a lot easier to just leave it as a vec4. This doesn't hurt codegen since we GC unused uniforms, and UCP dot products use all the components anyway.
* vc4: Emit a warning and proceed for handling loops in NIR.Rhys Kidd2016-04-081-1/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | We don't really support control flow yet, but it's a lot nicer to render something and warn on stderr than to crash. Fixes the following piglit tests: - shaders/complex-loop-analysis-bug - shaders/glsl-fs-discard-04 Converts the following piglit tests from crash to fail: - shaders/glsl-fs-continue-inside-do-while - shaders/glsl-fs-loop - shaders/glsl-fs-loop-continue - shaders/glsl-fs-loop-nested - shaders/glsl-texcoord-array - shaders/glsl-vs-continue-inside-do-while - shaders/glsl-vs-loop - shaders/glsl-vs-loop-continue - shaders/glsl-vs-loop-nested No piglit regressions. v2 (Eric): Add stronger stderr warning. Signed-off-by: Rhys Kidd <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* vc4: Add a stub for NIR->QIR of control flow function nodesRhys Kidd2016-04-081-0/+11
| | | | | | | | We shouldn't have any NIR functions present since all GLSL functions get inlined, but this would be a more informative error if it does happen. Signed-off-by: Rhys Kidd <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* vc4: Add better debug of NIR->QIR control flow graph failureRhys Kidd2016-04-081-1/+2
| | | | | | | | | | | | | Ensure NIR control flow graph nodes that are unhandled in QIR are reported with sufficient verbosity to aid debugging. This improves piglit outputs, amongst other tools. There are no other remaining uses of assert(0) as a blunt tool within vc4. Signed-off-by: Rhys Kidd <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* vc4: Remove unused include from vc4_program.cRhys Kidd2016-04-081-1/+0
| | | | | | | | Found with grep and inspection. Test compiled on RPi hw. Assists any future effort to remove TGSI as an intermediate stage. Signed-off-by: Rhys Kidd <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* glsl: handle unsigned int wraparound in link_shaders()Lars Hamre2016-04-091-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | v2: change check_explicit_uniform_locations() to return an unsigned 0 (Timothy Arceri) We were storing the int result of check_explicit_uniform_locations() in num_explicit_uniform_locs as an unsigned int which caused it to be 4294967295 when a -1 was returned. This in turn would cause the following error during linking: error: count of uniform locations > MAX_UNIFORM_LOCATIONS(4294967295 > 98304) Results from running piglit tests/all with this patch and when ARB_explicit_uniform_location disabled: changes: 178 fixes: 176 regressions: 2 The two regressions are for the following tests: glean@glsl1-matrix column check (1) glean@glsl1-matrix column check (2) which regress from FAIL to CRASH. The regressions are acceptable because the tests are currently failing due to the aforementioned linker error. Signed-off-by: Lars Hamre <[email protected]> Reviewed-by: Timothy Arceri <[email protected]>
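The wraparound described in the commit above can be reproduced in isolation. The following is a minimal sketch, not the Mesa code: `check_signed`, `check_unsigned`, `store_in_unsigned`, and `exceeds_limit` are hypothetical stand-ins, and only the limit 98304 is taken from the quoted error message. It assumes a 32-bit `unsigned int`, which is what produces the 4294967295 in the message.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Limit quoted in the linker error message above. */
#define MAX_UNIFORM_LOCATIONS 98304u

/* Hypothetical stand-in for the old behaviour: failure is reported as -1. */
int check_signed(bool ok)
{
    return ok ? 0 : -1;
}

/* Hypothetical stand-in for the v2 behaviour: failure is reported as an
 * unsigned 0, so there is nothing negative to convert. */
unsigned check_unsigned(bool ok)
{
    (void)ok;
    return 0u;
}

/* Storing a signed result in an unsigned variable converts it modulo
 * UINT_MAX + 1, so -1 becomes UINT_MAX. */
unsigned store_in_unsigned(int result)
{
    unsigned num_explicit_uniform_locs = (unsigned)result;
    assert(result != -1 || num_explicit_uniform_locs == UINT_MAX);
    return num_explicit_uniform_locs;
}

/* The comparison that triggers the spurious "count of uniform locations"
 * error once the count has wrapped. */
bool exceeds_limit(unsigned count)
{
    return count > MAX_UNIFORM_LOCATIONS;
}
```

With these helpers, `exceeds_limit(store_in_unsigned(check_signed(false)))` is true, while `exceeds_limit(check_unsigned(false))` is false.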
* i965/tiled_memcopy: Get rid of the direction parameter to get_memcpyJason Ekstrand2016-04-085-22/+5
| | | | | | | | | Now that we can use the much simpler rgba8_copy function, we don't need to hand different functions out based on direction. Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Roland Scheidegger <[email protected]> Reviewed-by: Chad Versace <[email protected]>
* i965/tiled_memcpy: Rework the RGBA -> BGRA mem_copy functionsJason Ekstrand2016-04-081-76/+63
| | | | | | | | | | | | | This splits the two copy functions into three: One for unaligned copies, one for aligned sources, and one for aligned destinations. Thanks to the previous commit, we are now guaranteed that the aligned ones will *only* operate on aligned memory so they should be safe. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93962 Cc: "11.1 11.2" <[email protected]> Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Roland Scheidegger <[email protected]> Reviewed-by: Chad Versace <[email protected]>
* i965/tiled_memcopy: Add aligned mem_copy parameters to the [de]tiling functionsJason Ekstrand2016-04-081-32/+43
| | | | | | | | | | | | | | | | | | Each of the [de]tiling functions has three mem_copy calls: 1) Left edge to tile boundary 2) Tile boundary to tile boundary in a loop 3) Tile boundary to right edge Copies 2 and 3 start at a tile edge so the pointer to tiled memory is guaranteed to be at least 16-byte aligned. Copy 1, on the other hand, starts at some arbitrary place in the tile so it doesn't have any such alignment guarantees. Cc: "11.1 11.2" <[email protected]> Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Roland Scheidegger <[email protected]> Reviewed-by: Chad Versace <[email protected]>
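The three-copy structure described above can be illustrated by splitting a byte span at tile boundaries. This is a sketch under stated assumptions rather than the i965 implementation: `span_split` and `split_span` are hypothetical names, and the tile width is a parameter chosen by the caller. Only the first piece may start at an unaligned offset; the other two always start on a tile boundary.

```c
#include <assert.h>
#include <stddef.h>

/* Byte counts for the three copies: left edge to first tile boundary,
 * whole tiles, and last tile boundary to right edge. */
struct span_split {
    size_t left;        /* may start at an arbitrary offset in a tile */
    size_t full_tiles;  /* number of whole tiles, each starting on a boundary */
    size_t right;       /* starts on a tile boundary */
};

/* Split the half-open span [x0, x1) at multiples of tile_w. */
struct span_split split_span(size_t x0, size_t x1, size_t tile_w)
{
    struct span_split s = { 0, 0, 0 };
    size_t first_boundary;

    assert(tile_w > 0 && x0 <= x1);

    /* Round x0 up to the next multiple of tile_w. */
    first_boundary = (x0 + tile_w - 1) / tile_w * tile_w;

    if (first_boundary >= x1) {
        /* The span ends at or before the first boundary it could reach:
         * a single copy with no alignment guarantee. */
        s.left = x1 - x0;
        return s;
    }

    s.left = first_boundary - x0;
    s.full_tiles = (x1 - first_boundary) / tile_w;
    s.right = (x1 - first_boundary) % tile_w;
    return s;
}
```

For example, with a 512-byte tile width the span [100, 1100) splits into 412 unaligned bytes, one whole tile, and 76 trailing bytes that begin on a tile boundary.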
* i965: Check eu/subslices are > 0Ben Widawsky2016-04-081-1/+1
| | | | | | | | Now that the check is restricted to gen8+, we should always get back a non-zero positive value for the EU and subslice counts. Signed-off-by: Ben Widawsky <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Fix eu/subslice warningBen Widawsky2016-04-081-11/+23
| | | | | | | | | Older gen platforms do not actually return a value for the subslice and EU totals; instead (IMO, confusingly) they return -ENODEV. This patch defers the SSEU setup until we have the actual GPU generation to avoid useless warnings when running on older platforms with older kernels. Reported-by: Mark Janes <[email protected]> Signed-off-by: Ben Widawsky <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Extract SSEU configuration infoBen Widawsky2016-04-081-14/+21
| | | | | Signed-off-by: Ben Widawsky <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* st/mesa: fix glReadBuffer() assertion failureBrian Paul2016-04-081-0/+2
| | | | | | | | | | | | | | | | | | If the first call in a GL app is glReadPixels(GL_FRONT) we'd fail the assert(st->ctx->FragmentProgram._Current) at st_atom_shader.c:114 in update_fp(). This is because we were calling st_validate_state() without first updating Mesa state with _mesa_update_state(). The regression came from commit 83b589301f4a150f4 "st/mesa: fix frontbuffer glReadPixels regressions". The new piglit gl-1.0-simple-readbuffer test exercises this. Cc: "11.1 11.2" <[email protected]> Reviewed-by: Roland Scheidegger <[email protected]> Reviewed-by: Jose Fonseca <[email protected]>
* st/va: avoid dereference after free in vlVaDestroyImageThomas Hindoe Paaboel Andersen2016-04-081-1/+3
| | | | | | Cc: "11.1 11.2" <[email protected]> Reviewed-by: Emil Velikov <[email protected]> Tested-by: Julien Isorce <[email protected]>
* radeonsi: do per-pixel clipping based on viewport statesMarek Olšák2016-04-082-11/+85
| | | | | | | | | In other words, vport scissors are derived from viewport states. If the scissor test is enabled, the intersection of both is used. The guard band will disable clipping, so we have to clip per-pixel. Reviewed-by: Nicolai Hähnle <[email protected]>
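The intersection rule stated above can be sketched with plain rectangles. This is an illustration only: `struct rect`, `rect_intersect`, and `effective_clip` are hypothetical names, and how radeonsi actually derives the viewport rectangle from viewport state is not shown here.

```c
#include <assert.h>

/* Axis-aligned rectangle with min <= max on both axes. */
struct rect {
    int minx, miny, maxx, maxy;
};

static int imax(int a, int b) { return a > b ? a : b; }
static int imin(int a, int b) { return a < b ? a : b; }

/* Intersection of two rectangles; an empty result is collapsed to zero
 * width or height instead of having max < min. */
struct rect rect_intersect(struct rect a, struct rect b)
{
    struct rect r = {
        imax(a.minx, b.minx), imax(a.miny, b.miny),
        imin(a.maxx, b.maxx), imin(a.maxy, b.maxy)
    };
    if (r.maxx < r.minx)
        r.maxx = r.minx;
    if (r.maxy < r.miny)
        r.maxy = r.miny;
    return r;
}

/* The viewport-derived rectangle always applies; the user scissor only
 * narrows it when the scissor test is enabled. */
struct rect effective_clip(struct rect vport_scissor,
                           struct rect user_scissor,
                           int scissor_enabled)
{
    assert(vport_scissor.minx <= vport_scissor.maxx);
    assert(vport_scissor.miny <= vport_scissor.maxy);
    return scissor_enabled ? rect_intersect(vport_scissor, user_scissor)
                           : vport_scissor;
}
```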
* nv50/ir: do not try to attach JOIN ops to ATOMSamuel Pitoiset2016-04-071-1/+1
| | | | | | | | | | | This might result in an INVALID_OPCODE dmesg error in case a join is attached to an atomic operation. Spotted with arb_shader_image_load_store-host-mem-barrier on GK104. Signed-off-by: Samuel Pitoiset <[email protected]> Acked-by: Ilia Mirkin <[email protected]> Cc: [email protected]
* radeonsi: raise number of samplers per shader to 32Nicolai Hähnle2016-04-071-3/+3
| | | | | Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94835 Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: expand the compressed color and depth texture masks to 64 bitsNicolai Hähnle2016-04-073-18/+18
| | | | | | | | | This is in preparation for raising the number of exposed sampler views to 32, which will raise the total number of sampler views to 33 for the polygon stipple texture. That texture should never be compressed (and it's certainly not a depth texture), but this approach seems cleaner to me than special-casing the last slot in all affected code paths. Reviewed-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: replace magic 16 by SI_NUM_USER_SAMPLERSNicolai Hähnle2016-04-071-1/+1
| | | | | Reviewed-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* gallium: raise PIPE_MAX_SAMPLERS to 32Nicolai Hähnle2016-04-071-1/+1
| | | | | | | | | | | | | | | The previous value of 18 was motivated by having drivers that want to expose 16 samplers but also use some additional samplers for internal use. Raising the value even higher isn't going to hurt that case. On the other hand, some drivers actually use PIPE_MAX_SAMPLERS as the number of samplers they expose externally, so raising this number above 32 is fragile (because several places in the code use bitfields, and tracking down and widening all of them is prone to miss some case). Reviewed-by: Brian Paul <[email protected]> Reviewed-by: Roland Scheidegger <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* st/glsl_to_tgsi: make samplers_used an uint32_t (v2)Nicolai Hähnle2016-04-071-3/+5
| | | | | | | | | | | It is used as a bitfield, so it seems cleaner to keep it unsigned. The literal 1 is a (signed) int, and shifting into the sign bit is undefined in C, so change occurrences of 1 to 1u. v2: add an assert for bitfield size and use 1u << idx Reviewed-by: Brian Paul <[email protected]> Reviewed-by: Roland Scheidegger <[email protected]> (v1) Reviewed-by: Marek Olšák <[email protected]> (v1)
* tgsi/scan: add an assert for the size of the samplers_declared bitfieldNicolai Hähnle2016-04-071-1/+2
| | | | | | | The literal 1 is a (signed) int, and shifting into the sign bit is undefined in C, so change occurrences of 1 to 1u. Reviewed-by: Brian Paul <[email protected]>
* draw/aaline: stronger guard against no free samplers (v2)Nicolai Hähnle2016-04-071-2/+4
| | | | | | | | | | | | | | Line anti-aliasing will fail when there is no free sampler available. Make the corresponding guard more robust in preparation of raising PIPE_MAX_SAMPLERS to 32. The literal 1 is a (signed) int, and shifting into the sign bit is undefined in C, so change occurrences of 1 to 1u. v2: add an assert for bitfield size and use 1u << idx Reviewed-by: Brian Paul <[email protected]> Reviewed-by: Roland Scheidegger <[email protected]> (v1) Reviewed-by: Bas Nieuwenhuizen <[email protected]> (v1) Reviewed-by: Marek Olšák <[email protected]> (v1)
* util/pstipple: stronger guard against no free samplers (v2)Nicolai Hähnle2016-04-071-2/+4
| | | | | | | | | | | | | | When hasFixedUnit is false, polygon stippling will fail when there is no free sampler available. Make the corresponding guard more robust in preparation of raising PIPE_MAX_SAMPLERS to 32. The literal 1 is a (signed) int, and shifting into the sign bit is undefined in C, so change occurrences of 1 to 1u. v2: add an assert for bitfield size and use 1u << idx Reviewed-by: Brian Paul <[email protected]> Reviewed-by: Roland Scheidegger <[email protected]> (v1) Reviewed-by: Bas Nieuwenhuizen <[email protected]> (v1) Reviewed-by: Marek Olšák <[email protected]> (v1)
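The pattern shared by the sampler bitfield commits above (an unsigned mask, `1u << idx`, and an assert on the bitfield size) might look like the sketch below. The names `MAX_SAMPLERS`, `mark_sampler_used`, and `find_free_sampler` are hypothetical and not taken from Mesa, and the code assumes a 32-bit `unsigned int`, so that `1u << 31` is well defined where `1 << 31` would shift into the sign bit.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Illustrative limit, mirroring the value PIPE_MAX_SAMPLERS is raised to. */
#define MAX_SAMPLERS 32

/* Set bit idx in the mask. The assert guards against an index that does
 * not fit in the bitfield, which would make the shift undefined. */
uint32_t mark_sampler_used(uint32_t samplers_used, unsigned idx)
{
    assert(idx < sizeof(samplers_used) * CHAR_BIT);
    return samplers_used | (1u << idx);
}

/* Return the lowest unused sampler index, or -1 when every slot is taken.
 * Callers have to handle -1 rather than assume a free slot exists. */
int find_free_sampler(uint32_t samplers_used)
{
    unsigned i;

    assert(MAX_SAMPLERS <= sizeof(samplers_used) * CHAR_BIT);

    for (i = 0; i < MAX_SAMPLERS; i++) {
        if (!(samplers_used & (1u << i)))
            return (int)i;
    }
    return -1;
}
```

The -1 return corresponds to the "no free samplers" case that the draw/aaline and util/pstipple guards have to handle once all 32 slots can be in use.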
* svga: new SVGA_MSAA env var to disable/enable MSAA pixel formatsBrian Paul2016-04-071-2/+4
| | | | | | On by default. Reviewed-by: Jose Fonseca <[email protected]>
* svga: add some trivial null pointer checksBrian Paul2016-04-073-0/+9
| | | | | | | | These small mallocs will probably never fail, but static analysis tools may complain about the missing checks. Reviewed-by: Edward O'Callaghan <[email protected]> Reviewed-by: Jose Fonseca <[email protected]>
* trace: add missing set_shader_images()Samuel Pitoiset2016-04-073-0/+81
| | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: disable perfect ZPASS counts for PIPE_QUERY_OCCLUSION_PREDICATEMarek Olšák2016-04-073-5/+16
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: don't use the real barrier instruction in tess ctrl shadersMarek Olšák2016-04-071-0/+8
| | | | | Reviewed-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* Revert "clover: Fix build against clang SVN >= r265359"Michel Dänzer2016-04-071-3/+0
| | | | | | | | This reverts commit 0daab9878d2b96356cf667591a2c877d912be52d. The corresponding clang change was reverted. Trivial.
* nir/types: Add a wrapper for count_attribute_slotsJason Ekstrand2016-04-072-0/+10
| | | | Reviewed-by: Rob Clark <[email protected]>
* r600: use radeon_emit in a few more places in evergreen_computeDave Airlie2016-04-071-4/+4
| | | | | | | | This is just a cleanup of the code. Acked-by: Tom Stellard <[email protected]> Reviewed-by: Edward O'Callaghan <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* r600: make compute global buffer functions static.Dave Airlie2016-04-072-98/+86
| | | | | | | | | This moves things around so that the global buffer handling functions in evergreen_compute.c are static. Acked-by: Tom Stellard <[email protected]> Reviewed-by: Edward O'Callaghan <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* r600: make two compute functions static.Dave Airlie2016-04-072-5/+3
| | | | | | | | These aren't used outside evergreen_compute.c Acked-by: Tom Stellard <[email protected]> Reviewed-by: Edward O'Callaghan <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* r600: using pipe_grid_info more in evergreen_compute.Dave Airlie2016-04-072-26/+21
| | | | | | | | | No reason to pull the pieces apart here, also make one of the functions static as it's unused outside this. Acked-by: Tom Stellard <[email protected]> Reviewed-by: Edward O'Callaghan <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* r600: in evergreen_compute use ctx consistently instead of ctx_Dave Airlie2016-04-071-25/+25
| | | | | | Acked-by: Tom Stellard <[email protected]> Reviewed-by: Edward O'Callaghan <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* r600: use rctx consistently in evergreen_compute.cDave Airlie2016-04-071-74/+74
| | | | | | | | Another step towards cleaning this up. Acked-by: Tom Stellard <[email protected]> Reviewed-by: Edward O'Callaghan <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* r600: cleanup whitespace in evergreen_compute.cDave Airlie2016-04-071-87/+75
| | | | | | | | | | This aligns the code with the style of the rest of the driver. Makes editing it a lot less painful. Acked-by: Tom Stellard <[email protected]> Reviewed-by: Edward O'Callaghan <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* GL3.txt: Mark ARB_framebuffer_no_attachments as doneEdward O'Callaghan2016-04-072-1/+2
| | | | | Signed-off-by: Edward O'Callaghan <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* r600g: Enable ARB_framebuffer_no_attachmentsEdward O'Callaghan2016-04-071-1/+1
| | | | | Signed-off-by: Edward O'Callaghan <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: Enable ARB_framebuffer_no_attachmentsEdward O'Callaghan2016-04-071-1/+1
| | | | | Signed-off-by: Edward O'Callaghan <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: Improve assert info out of si_set_framebuffer_state()Edward O'Callaghan2016-04-071-0/+2
| | | | | | Let's give the developer a little hand if we are going to assert on a zero literal at the end of a branch. Signed-off-by: Edward O'Callaghan <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: Allow 16 samples MSAA mode for PIPE_FORMAT_NONEEdward O'Callaghan2016-04-071-0/+5
| | | | | | | For ARB_framebuffer_no_attachments: an is_format_supported() query with 'PIPE_FORMAT_NONE' passed implies a query of the number of samples supported for a framebuffer with no attachments. Signed-off-by: Edward O'Callaghan <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* softpipe: Set samples and layers in set_framebuffer_state() cbEdward O'Callaghan2016-04-071-0/+2
| | | | | | | | | Carries across the number of samples and layers state in the 'softpipe_set_framebuffer_state()' callback. This state is part of 'ARB_framebuffer_no_attachments' support. Signed-off-by: Edward O'Callaghan <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* mesa/st: Update framebuffer state with no.of samples,layersEdward O'Callaghan2016-04-071-3/+5
| | | | | | Handle the case of ARB_framebuffer_no_attachments. Also, kill off a dead debug printf() call while we are here. Signed-off-by: Edward O'Callaghan <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* gallium/trace: Dump no.of samples and layers in fb stateEdward O'Callaghan2016-04-071-0/+2
| | | | | Signed-off-by: Edward O'Callaghan <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* gallium: Put no.of {samples,layers} into pipe_framebuffer_stateEdward O'Callaghan2016-04-073-0/+40
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Here we store the number of samples and layers directly in the pipe_framebuffer_state so that in the case of ARB_framebuffer_no_attachments we may make use of them directly. Further, we adjust various gallium/auxiliary helper functions accordingly. V2: Convert branches in util_framebuffer_get_num_layers() and util_framebuffer_get_num_samples() to their canonical form. V3: 'git stash pop' the typo fix of 'cbufs' which should be 'nr_cbufs' that was missing in V2, whoops! Thanks Marek for pointing this out yet again. V4: Squash in the following patch: 'gallium/util: Ensure util_framebuffer_get_num_samples() is valid' Upon context creation, internal driver structures are malloc()'ed and memset() to zero them. This results in an invalid number of samples 'by default'. Handle this in the simplest way to avoid elaborate and probably equally sub-optimal solutions. Signed-off-by: Edward O'Callaghan <[email protected]> Reviewed-by: Marek Olšák <[email protected]>