path: root/src/mesa
Commit message (Collapse) | Author | Age | Files | Lines
* mesa/queryobject: return INVALID_VALUE if offset < 0 (v2) | Dave Airlie | 2016-05-23 | 1 | -0/+5
| This fixes:
| GL45-CTS.direct_state_access.queries_errors
|
| The ARB_direct_state_access spec agrees.
|
| v2: move check down further (Ilia)
|
| Reviewed-by: Ilia Mirkin <[email protected]>
| Signed-off-by: Dave Airlie <[email protected]>
* mesa: Unlock mutex on error path. | Matt Turner | 2016-05-22 | 1 | -0/+1
| Caught by Coverity (CID 1362021). Caused by commit 015f2207c.
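The fix above is an instance of a general rule: every return path out of a locked region has to release the lock. A minimal sketch with POSIX mutexes follows. The names `table_mutex`, `table_count`, `TABLE_CAPACITY` and `insert_entry` are hypothetical and chosen for illustration; this is not the Mesa code that the commit touched.

```c
#include <pthread.h>
#include <stdbool.h>

#define TABLE_CAPACITY 4 /* hypothetical limit, for illustration only */

static pthread_mutex_t table_mutex = PTHREAD_MUTEX_INITIALIZER;
static int table_count;

/* Returns true on success, false when the (hypothetical) table is full.
 * Both outcomes leave through the single unlock at "out", so the mutex
 * is never left held on the error path. */
bool
insert_entry(void)
{
   bool ok = true;

   pthread_mutex_lock(&table_mutex);
   if (table_count >= TABLE_CAPACITY) {
      ok = false; /* error path: still falls through to the unlock */
      goto out;
   }
   table_count++;
out:
   pthread_mutex_unlock(&table_mutex);
   return ok;
}
```

Routing all exits through one label is only one way to get this right; unlocking explicitly before each early return works as well, which is closer to a one-line fix.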
* i965: remove redundant NULL check | Timothy Arceri | 2016-05-22 | 1 | -1/+1
| We would have segfaulted in the above code if prog could be NULL.
|
| Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Just read the existing tally on EndTransformFeedback if paused. | Kenneth Graunke | 2016-05-20 | 1 | -20/+22
| If the transform feedback object is paused when ending, then there are
| no new snapshots to add to the tally. In fact, we haven't written a
| starting snapshot, so we'd best not try and compute (end - start).
|
| Just load the existing tally so we can convert it to the number of
| vertices written and store it to the final result location.
|
| This is the Haswell+ equivalent of the previous commit.
|
| Signed-off-by: Kenneth Graunke <[email protected]>
| Reviewed-by: Jordan Justen <[email protected]>
* i965: Don't write a counter snapshot on EndTransformFeedback if paused. | Kenneth Graunke | 2016-05-20 | 1 | -1/+2
| If the transform feedback object is paused, then we've already written
| an ending counter snapshot. We don't want to write another one.
|
| This fixes assertions in GL33-CTS.transform_feedback.api_errors_test,
| which calls EndTransformFeedback after PauseTransformFeedback. On the
| next BeginTransformFeedback, we tried to tally up the results, saw an
| odd number of snapshots (due to the double-end), and tripped an
| assertion.
|
| Signed-off-by: Kenneth Graunke <[email protected]>
| Reviewed-by: Jordan Justen <[email protected]>
* mesa: Call TransformFeedback driver hooks before setting flags. | Kenneth Graunke | 2016-05-20 | 1 | -5/+5
| This way, the driver's EndTransformFeedback() hook can tell whether
| the transform feedback operation was paused.
|
| It's also convenient to have Paused remain false until the driver's
| PauseTransformFeedback hook finishes.
|
| Signed-off-by: Kenneth Graunke <[email protected]>
| Reviewed-by: Dave Airlie <[email protected]>
| Reviewed-by: Jordan Justen <[email protected]>
* nir: remove dead glsl variables before lowering io. | Dave Airlie | 2016-05-21 | 1 | -0/+1
| For cull distance, GLSL will let unsized unused arrays get into the
| backend; we should nuke those straight away, to save caring about
| them later.
|
| This fixes:
| arb_separate_shader_objects/linker/large-number-of-unused-varyings
| as a side effect (even without culling changes).
|
| Reviewed-by: Jason Ekstrand <[email protected]>
| Signed-off-by: Dave Airlie <[email protected]>
* i965: Delete dead dFdy flipping code. | Kenneth Graunke | 2016-05-20 | 1 | -19/+5
| Rob's nir_lower_wpos_ytransform() pass flips dFdy in the opposite case
| of what I expected, so we always take the negate_value case. It doesn't
| really matter.
|
| v2: Write src0 before src1 in ADD instructions (requested by Matt).
|
| Signed-off-by: Kenneth Graunke <[email protected]>
| Reviewed-by: Matt Turner <[email protected]>
* i965: Delete brw_wm_prog_key::render_to_fbo and drawable_height. | Kenneth Graunke | 2016-05-20 | 2 | -46/+0
| Now that we handle flipping and other gl_FragCoord transformations
| via a uniform, these key fields have no users.
|
| This patch actually eliminates the associated recompiles. The Tomb
| Raider benchmark's minimum FPS increases from ~1 FPS to a reasonable
| number.
|
| Signed-off-by: Kenneth Graunke <[email protected]>
| Reviewed-by: Matt Turner <[email protected]>
* i965, anv: Use NIR FragCoord re-center and y-transform passes. | Kenneth Graunke | 2016-05-20 | 7 | -57/+34
| This handles gl_FragCoord transformations and other window system vs.
| user FBO coordinate system flipping by multiplying/adding uniform
| values, rather than recompiles.
|
| This is much better because we have no decent way to guess whether
| the application is going to use a shader with the window system FBO
| or a user FBO, much less the drawable height. This led to a lot of
| recompiles in many applications.
|
| Signed-off-by: Kenneth Graunke <[email protected]>
| Reviewed-by: Matt Turner <[email protected]>
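The "multiply/add by uniform values" idea can be sketched on the CPU side as below. This is an assumption-laden illustration, not the NIR pass: `struct wpos_transform`, `wpos_transform_for` and `transform_frag_coord_y` are hypothetical names, and which of the two coordinate systems actually needs the flip depends on the driver's convention.

```c
/* Hypothetical pair of uniform values. A driver would upload these per
 * draw instead of baking the flip into the compiled shader. */
struct wpos_transform {
   float y_scale;
   float y_offset;
};

/* Pick the uniform values for the currently bound framebuffer.
 * flip_y says whether this framebuffer's Y axis is inverted relative to
 * what the shader expects (an assumption made for this sketch). */
struct wpos_transform
wpos_transform_for(int flip_y, float drawable_height)
{
   struct wpos_transform t;

   if (flip_y) {
      t.y_scale = -1.0f;
      t.y_offset = drawable_height;
   } else {
      t.y_scale = 1.0f;
      t.y_offset = 0.0f;
   }
   return t;
}

/* The per-fragment work: one multiply and one add, no branch and no
 * shader variant per framebuffer type or height. */
float
transform_frag_coord_y(float y, struct wpos_transform t)
{
   return y * t.y_scale + t.y_offset;
}
```

Because the framebuffer type and height only change the uniform values, switching between a window-system framebuffer and a user FBO no longer changes the shader key.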
* i965: Fix brw_regs_equal() for NaN and positive/negative zero. | Kenneth Graunke | 2016-05-20 | 1 | -1/+2
| We'd like the comparisons to mean "the exact same bits". Comparing
| doubles won't do that for NaN values or positive vs. negative zero.
|
| Signed-off-by: Kenneth Graunke <[email protected]>
| Reviewed-by: Matt Turner <[email protected]>
* mesa: Replace uses of Shared->Mutex with hash-table mutexes | Matt Turner | 2016-05-20 | 9 | -50/+78
| We were locking the Shared->Mutex and then calling functions like
| _mesa_HashInsert that do additional per-hash-table locking internally.
|
| Instead just lock each hash-table's mutex and use functions like
| _mesa_HashInsertLocked and the new _mesa_HashRemoveLocked.
|
| In order to do this, we need to remove the locking from
| _mesa_HashFindFreeKeyBlock since it will always be called with the
| per-hash-table lock taken.
|
| Reviewed-by: Timothy Arceri <[email protected]>
| Reviewed-by: Brian Paul <[email protected]>
* hash: Add _mesa_HashRemoveLocked() function. | Matt Turner | 2016-05-20 | 2 | -4/+17
| Reviewed-by: Timothy Arceri <[email protected]>
| Reviewed-by: Brian Paul <[email protected]>
* i965: Pass nir_src/nir_dest by reference. | Matt Turner | 2016-05-20 | 4 | -18/+18
| Cuts 6K of .text.
|
|    text   data   bss     dec    hex filename
| 5772372 264648 29320 6066340 5c90a4 lib/i965_dri.so before
| 5766074 264648 29320 6060042 5c780a lib/i965_dri.so after
|
| Reviewed-by: Jason Ekstrand <[email protected]>
* i965: Fix strerror error code sign | Mark Janes | 2016-05-20 | 1 | -1/+1
| This trivial fix to error-handling corrects the sign of drm error
| codes before passing them to strerror.
|
| Identified by Coverity: CID1358581
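The sign issue arises whenever a call reports failure as a negative errno value while `strerror()` expects the positive code. A minimal sketch follows, assuming that negative-errno convention; `describe_negative_errno` is a hypothetical helper, not the function changed by the commit.

```c
#include <errno.h>
#include <string.h>

/* ret is assumed to be a negative errno value such as -ENOENT, as
 * returned by many kernel-facing wrappers. strerror() wants the
 * positive code, so the sign is flipped before the lookup. */
const char *
describe_negative_errno(int ret)
{
   return strerror(-ret);
}
```

Passing the negative value straight to `strerror()` typically yields an "Unknown error" string rather than the real cause.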
* i965/fs: Recognize and emit ld_lz, sample_lz, sample_c_lz. | Matt Turner | 2016-05-19 | 1 | -2/+10
| Instead of a big and complicated optimization pass, Ken suggested
| just recognizing the operations here. It's certainly less code and a
| lot prettier, but it seems to actually perform worse for currently
| unknown reasons.
|
| total instructions in shared programs: 8923452 -> 8904108 (-0.22%)
| instructions in affected programs: 814563 -> 795219 (-2.37%)
| helped: 3336
| HURT: 10
|
| total cycles in shared programs: 66970734 -> 66651476 (-0.48%)
| cycles in affected programs: 10582686 -> 10263428 (-3.02%)
| helped: 2438
| HURT: 691
|
| total spills in shared programs: 1811 -> 1789 (-1.21%)
| spills in affected programs: 85 -> 63 (-25.88%)
| helped: 4
|
| total fills in shared programs: 3143 -> 3109 (-1.08%)
| fills in affected programs: 167 -> 133 (-20.36%)
| helped: 4
|
| LOST: 2
| GAINED: 36
|
| Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Add infrastructure for sample lod-zero operations. | Matt Turner | 2016-05-19 | 6 | -0/+33
| Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Add and use get_nir_src_imm(). | Matt Turner | 2016-05-19 | 2 | -4/+19
| The next patch wants to inspect the LOD argument and do something
| different if it's 0.0f. But at that point we've emitted a MOV for it
| and we just have a register to look at.
|
| Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Silence warnings related to use of uninitialized values | Eduardo Lima Mitev | 2016-05-19 | 1 | -2/+2
| brw_fs.cpp: In function ‘const unsigned int* brw_compile_fs(const [...]
| brw_fs.cpp:6093:64: warning: ‘simd16_grf_start’ may be used uninitialized [...]
|     prog_data->base.dispatch_grf_start_reg = simd16_grf_start;
| brw_fs.cpp:5996:29: note: ‘simd16_grf_start’ was declared here
|     uint8_t simd8_grf_start, simd16_grf_start;
| brw_fs.cpp:6094:52: warning: ‘simd16_grf_used’ may be used uninitialized [...]
|     prog_data->reg_blocks_0 = brw_register_blocks(simd16_grf_used);
| brw_fs.cpp:5997:29: note: ‘simd16_grf_used’ was declared here
|     unsigned simd8_grf_used, simd16_grf_used;
|
| (and more)
|
| Reviewed-by: Anuj Phogat <[email protected]>
* Revert "i965/urb: fixes division by zero" | Matt Turner | 2016-05-18 | 1 | -5/+19
| This reverts commit 2a8aa1e3deb99a1ae16d942318da648c1327ece5.
* i965/urb: fixes division by zero | Ardinartsev Nikita | 2016-05-18 | 1 | -19/+5
| Fixes regression introduced by af5ca43f2676bff7499f93277f908b681cb821d0
|
| Reviewed-by: Matt Turner <[email protected]>
| Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95419
* mesa: fclose() filename on error. | Matt Turner | 2016-05-18 | 1 | -1/+5
| Pretty useless, as it's in debugging code.
|
| Found by Coverity (CID 1257016).
* i965/fs: Assert that nir_op_extract_*'s src1 is a constant. | Matt Turner | 2016-05-18 | 1 | -0/+2
* i965: Silence unused parameter warnings | Ian Romanick | 2016-05-18 | 14 | -35/+19
| The only place that actually used the type parameter was the GS
| visitor, and it was always passed glsl_type::int. Just remove the
| parameter.
|
| brw_vec4_vs_visitor.cpp:38:61: warning: unused parameter ‘type’ [-Wunused-parameter]
|     const glsl_type *type)
|                      ^
|
| Signed-off-by: Ian Romanick <[email protected]>
| Reviewed-by: Timothy Arceri <[email protected]>
* mesa: Don't advertise GLES 3.1 without compute support | Daniel Scharrer | 2016-05-18 | 1 | -1/+2
| The MaxComputeWorkGroupInvocations constant is used in
| compute_version_es2() instead of extensions->ARB_compute_shader as
| ES has lower requirements than desktop GL. Both i965 and gallium set
| this constant before enabling compute support.
|
| Signed-off-by: Daniel Scharrer <[email protected]>
| Signed-off-by: Marek Olšák <[email protected]>
* mesa/st: don't leak name | Rob Clark | 2016-05-18 | 1 | -2/+5
| Pointed out by Coverity.
|
| Signed-off-by: Rob Clark <[email protected]>
* st/mesa: remove unused st_context::default_texture | Brian Paul | 2016-05-17 | 2 | -7/+0
| The code which used this was removed quite a while ago.
|
| Reviewed-by: Ilia Mirkin <[email protected]>
| Reviewed-by: Sinclair Yeh <[email protected]>
| Reviewed-by: Charmaine Lee <[email protected]>
* i965: Make brw_reg_from_fs_reg() halve exec_size when compressed. | Kenneth Graunke | 2016-05-17 | 1 | -4/+6
| In a5d7e144eaf43fee37e6ff9e2de194407087632b, Connor generalized the
| exec_size halving code to handle more cases. As part of this, he made
| it not halve anything if the region accessed falls completely in a
| single register. Unfortunately, it started producing some invalid
| regions:
|
|     -add(16) g6<1>F g10<8,8,1>UW -g1<0,1,0>F { align1 compr };
|     -add(16) g8<1>F g12<8,8,1>UW -g1.1<0,1,0>F { align1 compr };
|     +add(16) g6<1>F g10<16,16,1>UW -g1<0,1,0>F { align1 compr };
|     +add(16) g8<1>F g12<16,16,1>UW -g1.1<0,1,0>F { align1 compr };
|
| Here, the UW source region completely fits within a register.
| However, we have to use instruction compression because the
| destination region spans two registers. <16,16,1> is invalid because
| it's compressed.
|
| To handle this, skip the "everything fits in one register" case and
| fall through to the exec_size halving case when compressed.
|
| Fixes hundreds of Piglit regressions on GM965.
|
| Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95370
| Signed-off-by: Kenneth Graunke <[email protected]>
| Reviewed-by: Matt Turner <[email protected]>
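The decision described in that message can be sketched as a small function. This is a simplified illustration under stated assumptions: 32-byte registers, a hypothetical `region_width` helper, and none of the other cases (such as scalar regions) that the real brw_reg_from_fs_reg() handles.

```c
#include <stdbool.h>

#define REG_SIZE 32 /* bytes per register; assumed for this sketch */

/* Returns the region width to use for one source operand.
 *
 * The "everything fits in one register" shortcut is only taken when the
 * instruction is NOT compressed. A compressed instruction always falls
 * through to the halving case, even if this particular source would fit
 * in a single register. */
unsigned
region_width(unsigned exec_size, unsigned stride, unsigned type_size,
             bool compressed)
{
   if (!compressed && exec_size * stride * type_size <= REG_SIZE)
      return exec_size;

   return exec_size / 2;
}
```

For the UW source in the listing above (16 channels, stride 1, 2 bytes each), the region is exactly 32 bytes. Without the `compressed` test the shortcut yields a width of 16, which corresponds to the invalid `<16,16,1>` region; with it, the width is 8, matching `<8,8,1>`.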
* i965: Move compression decisions before brw_reg_from_fs_reg(). | Kenneth Graunke | 2016-05-17 | 1 | -26/+26
| brw_reg_from_fs_reg() needs to know whether the instruction will be
| compressed or not.
|
| Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95370
| Signed-off-by: Kenneth Graunke <[email protected]>
| Reviewed-by: Matt Turner <[email protected]>
* i965: Enable ES 3.2 sample shading extensions. | Kenneth Graunke | 2016-05-17 | 1 | -0/+1
| This enables:
| - GL_OES_sample_shading
| - GL_OES_sample_variables
| - GL_OES_shader_multisample_interpolation
|
| On Gen8, we pass all the CTS tests, and all but 4 of the dEQP-GLES31
| tests (dealing with 1x/2x MSAA at half rate sampling). We believe
| those 4 dEQP-GLES31 tests are incorrect.
|
| Signed-off-by: Kenneth Graunke <[email protected]>
| Reviewed-by: Jason Ekstrand <[email protected]>
| Reviewed-by: Ian Romanick <[email protected]>
* android: fix building error in libmesa_st_mesa | Mauro Rossi | 2016-05-17 | 1 | -0/+2
| Fixes the following building error due to libmesa_nir dependency:
|
| In file included from external/mesa/src/mesa/state_tracker/st_glsl_to_nir.cpp:44:0:
| external/mesa/src/compiler/nir/nir.h:42:25: fatal error: nir_opcodes.h: No such file or directory
|  #include "nir_opcodes.h"
|           ^
| compilation terminated.
| build/core/binary.mk:706: recipe for target 'out/target/product/x86/obj/STATIC_LIBRARIES/libmesa_st_mesa_intermediates/state_tracker/st_glsl_to_nir.o' failed
| make: *** [out/target/product/x86/obj/STATIC_LIBRARIES/libmesa_st_mesa_intermediates/state_tracker/st_glsl_to_nir.o] Error 1
| make: *** Waiting for unfinished jobs....
|
| Reviewed-by: Rob Herring <[email protected]>
| Signed-off-by: Rob Clark <[email protected]>
* st/mesa: fix reversed copyimage canonical format | Nicolai Hähnle | 2016-05-17 | 1 | -3/+3
| The format_desc swizzle describes where in the array each color
| channel comes from - but the existing code was written as if each
| entry in the swizzle described the meaning of an array element.
|
| Fixes piglit's arb_copy_image-format-swizzle.
|
| Cc: "11.1 11.2" <[email protected]>
| Reviewed-by: Marek Olšák <[email protected]>
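The two readings of a swizzle are inverse permutations of each other, which is easy to confuse because they coincide for formats that only swap two channels. A minimal sketch follows, assuming a swizzle that is a pure permutation of four channels (no constant 0/1 entries); `invert_swizzle` is a hypothetical helper, not the state tracker's code.

```c
/* source_index[c] is the array position that color channel c is read
 * from, with channels numbered R=0, G=1, B=2, A=3.
 *
 * meaning_out[i] receives the color channel stored in array element i.
 *
 * Assumes source_index is a permutation of {0, 1, 2, 3}. */
void
invert_swizzle(const unsigned char source_index[4],
               unsigned char meaning_out[4])
{
   for (unsigned c = 0; c < 4; c++)
      meaning_out[source_index[c]] = (unsigned char)c;
}
```

For an array laid out as A, R, G, B the "where does each channel come from" form is {1, 2, 3, 0}, while the "what does each element mean" form is {3, 0, 1, 2}. For a B, G, R, A layout both forms are {2, 1, 0, 3}, so a mix-up of the two would go unnoticed there.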
* mesa/st: add support for NIR as possible driver IR | Rob Clark | 2016-05-17 | 8 | -12/+632
| Signed-off-by: Rob Clark <[email protected]>
| Acked-by: Eric Anholt <[email protected]>
* mesa/st: move things around a bit in st_create_fp_variant() | Rob Clark | 2016-05-17 | 1 | -12/+8
| Prep work for next patch.
|
| Signed-off-by: Rob Clark <[email protected]>
| Reviewed-by: Marek Olšák <[email protected]>
* mesa/st: add nir pass for lowering builtin uniforms | Rob Clark | 2016-05-17 | 3 | -0/+278
| Signed-off-by: Rob Clark <[email protected]>
| Reviewed-by: Eric Anholt <[email protected]>
* i965/fs: Add an allow_spilling flag to brw_compile_fs | Jason Ekstrand | 2016-05-17 | 6 | -19/+26
| This allows us to disable spilling for blorp shaders since blorp state
| setup doesn't handle spilling. Without this, blorp fails hard if you
| run with INTEL_DEBUG=spill.
|
| Reviewed-by: Francisco Jerez <[email protected]>
| Tested-by: Francisco Jerez <[email protected]>
* i965: Expose OpenGL 4.2 for gen8+ | Alejandro Piñeiro | 2016-05-17 | 2 | -2/+2
| ARB_vertex_attrib_64bit was the only feature missing.
|
| v2: we can expose 4.2 instead of 4.1 (Ian Romanick)
|
| Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Enable ARB_vertex_attrib_64bit for gen8+ | Alejandro Piñeiro | 2016-05-17 | 1 | -0/+1
| Reviewed-by: Kenneth Graunke <[email protected]>
* i965: take care of doubles when lowering VS inputs | Juan A. Suarez Romero | 2016-05-17 | 3 | -1/+16
| Input attributes can require 2 vec4 or 1 vec4 depending on whether
| they are double-precision or not.
|
| Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: calculate first non-payload GRF using attrib slots | Juan A. Suarez Romero | 2016-05-17 | 3 | -1/+3
| When computing where the first non-payload GRF starts, we can't rely
| on the number of attributes, as each attribute can be using 1 or 2
| slots depending on whether they are a dvec3/4 or other.
|
| Instead, we need to use the number of slots used by the attributes.
|
| Reviewed-by: Kenneth Graunke <[email protected]>
* i965/vec4: use attribute slots to calculate URB read length | Juan A. Suarez Romero | 2016-05-17 | 1 | -3/+9
| Do not use total attributes because a dvec3/dvec4 attribute requires
| two slots. So rather use total attribute slots.
|
| v2: do not use loop to calculate required attribute slots (Kenneth
| Graunke)
|
| Reviewed-by: Kenneth Graunke <[email protected]>
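Counting slots rather than attributes can be sketched with two bitmasks. The names below (`inputs_read`, `two_slot_inputs`, `total_attribute_slots`, `count_bits`) are hypothetical and the mask layout is an assumption made for illustration; the driver's actual fields may differ.

```c
#include <stdint.h>

/* Portable population count. A compiler builtin could replace this. */
static unsigned
count_bits(uint64_t v)
{
   unsigned n = 0;

   while (v) {
      v &= v - 1; /* clear the lowest set bit */
      n++;
   }
   return n;
}

/* inputs_read:     one bit per vertex attribute the shader reads.
 * two_slot_inputs: one bit per attribute that needs two slots
 *                  (dvec3/dvec4 in the commit's terms).
 *
 * Every attribute costs one slot; two-slot attributes cost one more.
 * No per-attribute loop is needed. */
unsigned
total_attribute_slots(uint64_t inputs_read, uint64_t two_slot_inputs)
{
   return count_bits(inputs_read) +
          count_bits(inputs_read & two_slot_inputs);
}
```

A shader reading three attributes, one of which is a dvec4, therefore needs four slots, not three.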
* i965: take care of doubles when remapping VS attributes | Juan A. Suarez Romero | 2016-05-17 | 1 | -15/+11
| Double-precision types require 1 slot in VUE for double and dvec2,
| and 2 slots for anything else.
|
| Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: shuffle 32bits into 64bits for doubles | Juan A. Suarez Romero | 2016-05-17 | 1 | -0/+8
| VS Thread Payload handles attributes in URB as vec4, no matter if
| they are actually single or double precision.
|
| So with double-precision types, the value ends up in the registers
| split into 32-bit chunks, in different positions. We need to shuffle
| the chunks to get the doubles correctly.
|
| v2:
|  * Extra blank line. Add { } on if body (Ian Romanick)
|  * Use dest directly (Kenneth Graunke)
|
| Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: halve exec_size when dealing with 64-bit attributes | Alejandro Piñeiro | 2016-05-17 | 1 | -2/+19
| The HW has a restriction that only vertical stride may cross register
| boundaries. Until now this was only handled on VGRFs at
| brw_reg_from_fs_reg, but it is also needed for attributes.
|
| v2:
|  * Remove reference to commit id on commit message (Juan Suarez)
|  * Simplify code that computes final exec_size (Ian Romanick)
|  * Use REG_SIZE on that same code (Kenneth Graunke)
|
| Reviewed-by: Kenneth Graunke <[email protected]>
* i965: passthru formats cannot be used with edge flag enabled | Alejandro Piñeiro | 2016-05-17 | 1 | -0/+20
| Add an assertion to detect this case.
|
| Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Configure how to store *64*PASSTHRU vertex components | Antia Puentes | 2016-05-17 | 1 | -0/+35
| From the Broadwell specification, structure VERTEX_ELEMENT_STATE
| description:
|
|    "When SourceElementFormat is set to one of the *64*_PASSTHRU
|    formats, 64-bit components are stored in the URB without any
|    conversion. In this case, vertex elements must be written as 128
|    or 256 bits, with VFCOMP_STORE_0 being used to pad the output
|    as required. E.g., if R64_PASSTHRU is used to copy a 64-bit Red
|    component into the URB, Component 1 must be specified as
|    VFCOMP_STORE_0 (with Components 2,3 set to VFCOMP_NOSTORE)
|    in order to output a 128-bit vertex element, or Components 1-3 must
|    be specified as VFCOMP_STORE_0 in order to output a 256-bit vertex
|    element. Likewise, use of R64G64B64_PASSTHRU requires Component 3
|    to be specified as VFCOMP_STORE_0 in order to output a 256-bit
|    vertex element."
|
| Uses 128-bits to write double and dvec2 vertex elements, and 256-bits
| for dvec3 and dvec4 vertex elements.
|
| Signed-off-by: Juan A. Suarez Romero <[email protected]>
| Signed-off-by: Antia Puentes <[email protected]>
| Reviewed-by: Kenneth Graunke <[email protected]>
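The padding rule quoted above, combined with the commit's choice of 128 bits for double/dvec2 and 256 bits for dvec3/dvec4, can be sketched as follows. The enum values and the `passthru_64bit_components` helper are hypothetical; only the VFCOMP_STORE_0 and VFCOMP_NOSTORE names come from the quoted text, and VFCOMP_STORE_SRC is used here as an assumed name for "store the source component".

```c
/* Component controls; the numeric values are illustrative only. */
enum vfcomp {
   VFCOMP_NOSTORE,
   VFCOMP_STORE_SRC,
   VFCOMP_STORE_0
};

/* Fill the four component controls for a *64*_PASSTHRU vertex element
 * with num_components 64-bit components (1 to 4).
 *
 * One or two components are written as a 128-bit element (two 64-bit
 * slots); three or four as a 256-bit element (four slots). Slots past
 * the real components but inside the element are padded with
 * VFCOMP_STORE_0, and anything beyond the element is not stored. */
void
passthru_64bit_components(unsigned num_components, enum vfcomp comp[4])
{
   const unsigned stored_slots = num_components <= 2 ? 2 : 4;

   for (unsigned i = 0; i < 4; i++) {
      if (i < num_components)
         comp[i] = VFCOMP_STORE_SRC;
      else if (i < stored_slots)
         comp[i] = VFCOMP_STORE_0;
      else
         comp[i] = VFCOMP_NOSTORE;
   }
}
```

For a single double this gives STORE_SRC, STORE_0, NOSTORE, NOSTORE, which is the 128-bit R64_PASSTHRU case from the specification text; for a dvec3 it gives three STORE_SRC entries followed by STORE_0.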
* i965: get the proper vertex surface type for doubles on gen8+ | Alejandro Piñeiro | 2016-05-17 | 1 | -3/+27
| This commit adds support for PASSTHRU format when pushing
| double-precision attributes.
|
| Check glarray->Doubles in order to know if we should choose a format
| that does a conversion to float, or just passthru the 64-bit double.
|
| Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Enable ARB_shader_precision on Gen8+. | Kenneth Graunke | 2016-05-16 | 1 | -0/+1
| I recently fixed a bug in the Piglit tests:
| https://lists.freedesktop.org/archives/piglit/2016-May/019802.html
|
| With that patch in place, we pass all the tests. So, turn it on.
|
| We could probably expose this earlier than Gen8, but the extension
| says that OpenGL 4.0 is required, and all of our tests are written
| against GLSL 4.00 (which is only supported on Gen8+).
|
| Signed-off-by: Kenneth Graunke <[email protected]>
* mesa/version.c: enable cull distance in version check. | Dave Airlie | 2016-05-17 | 1 | -1/+1
| Reviewed-by: Ilia Mirkin <[email protected]>
| Signed-off-by: Dave Airlie <[email protected]>
* i965: check tcs for NULL dereference | Mark Janes | 2016-05-16 | 1 | -3/+5
| Coverity issue 1361544 found an instance where the tcs variable is
| checked for NULL, but unconditionally dereferenced later in the same
| function.
|
| Reviewed-by: Kenneth Graunke <[email protected]>