path: root/src/mesa/drivers
Commit message | Author | Age | Files | Lines
* i965: remove redundant NULL check | Timothy Arceri | 2016-05-22 | 1 | -1/+1
  We would have segfaulted in the above code if prog could be NULL.

  Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Just read the existing tally on EndTransformFeedback if paused. | Kenneth Graunke | 2016-05-20 | 1 | -20/+22
  If the transform feedback object is paused when ending, then there are no new snapshots to add to the tally. In fact, we haven't written a starting snapshot, so we'd best not try and compute (end - start).

  Just load the existing tally so we can convert it to the number of vertices written and store it to the final result location.

  This is the Haswell+ equivalent of the previous commit.

  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Jordan Justen <[email protected]>
* i965: Don't write a counter snapshot on EndTransformFeedback if paused. | Kenneth Graunke | 2016-05-20 | 1 | -1/+2
  If the transform feedback object is paused, then we've already written an ending counter snapshot. We don't want to write another one.

  This fixes assertions in GL33-CTS.transform_feedback.api_errors_test, which calls EndTransformFeedback after PauseTransformFeedback. On the next BeginTransformFeedback, we tried to tally up the results, saw an odd number of snapshots (due to the double-end), and tripped an assertion.

  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Jordan Justen <[email protected]>
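The snapshot bookkeeping described in the two commit messages above can be modeled on the CPU. What follows is a minimal sketch under stated assumptions, not the driver's code: `struct xfb_model` and the `xfb_*` functions are hypothetical names, and the `counter` arguments stand in for whatever counter value the hardware would report. It assumes that begin and resume write a starting snapshot, pause writes an ending one, and end writes an ending one only when the object is not paused, so snapshots always form (start, end) pairs.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define XFB_MAX_SNAPSHOTS 16   /* arbitrary limit for this sketch */

/* Hypothetical CPU-side model of a transform feedback object. */
struct xfb_model {
   uint64_t snapshots[XFB_MAX_SNAPSHOTS];
   size_t count;
   bool paused;
};

static void
xfb_write_snapshot(struct xfb_model *xfb, uint64_t counter)
{
   if (xfb->count < XFB_MAX_SNAPSHOTS)
      xfb->snapshots[xfb->count++] = counter;
}

static void
xfb_begin(struct xfb_model *xfb, uint64_t counter)
{
   xfb->count = 0;
   xfb->paused = false;
   xfb_write_snapshot(xfb, counter);   /* starting snapshot */
}

static void
xfb_pause(struct xfb_model *xfb, uint64_t counter)
{
   xfb_write_snapshot(xfb, counter);   /* ending snapshot */
   xfb->paused = true;
}

static void
xfb_resume(struct xfb_model *xfb, uint64_t counter)
{
   xfb_write_snapshot(xfb, counter);   /* new starting snapshot */
   xfb->paused = false;
}

static void
xfb_end(struct xfb_model *xfb, uint64_t counter)
{
   /* If paused, xfb_pause() already wrote the ending snapshot. */
   if (!xfb->paused)
      xfb_write_snapshot(xfb, counter);
   xfb->paused = false;
}

/* Sums (end - start) over all pairs; fails on an odd snapshot count. */
static bool
xfb_tally(const struct xfb_model *xfb, uint64_t *total)
{
   if (xfb->count % 2 != 0)
      return false;
   *total = 0;
   for (size_t i = 0; i < xfb->count; i += 2)
      *total += xfb->snapshots[i + 1] - xfb->snapshots[i];
   return true;
}
```

In this model, calling `xfb_end()` after `xfb_pause()` leaves the snapshot count even, which is the property the assertion mentioned in the commit message checks for.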
* nir: remove dead glsl variables before lowering io. | Dave Airlie | 2016-05-21 | 1 | -0/+1
  For cull distance, GLSL will let unsized unused arrays get into the backend; we should nuke those straight away to save caring about them later.

  This fixes: arb_separate_shader_objects/linker/large-number-of-unused-varyings as a side effect (even without culling changes).

  Reviewed-by: Jason Ekstrand <[email protected]>
  Signed-off-by: Dave Airlie <[email protected]>
* i965: Delete dead dFdy flipping code. | Kenneth Graunke | 2016-05-20 | 1 | -19/+5
  Rob's nir_lower_wpos_ytransform() pass flips dFdy in the opposite case of what I expected, so we always take the negate_value case. It doesn't really matter.

  v2: Write src0 before src1 in ADD instructions (requested by Matt).

  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Matt Turner <[email protected]>
* i965: Delete brw_wm_prog_key::render_to_fbo and drawable_height. | Kenneth Graunke | 2016-05-20 | 2 | -46/+0
  Now that we handle flipping and other gl_FragCoord transformations via a uniform, these key fields have no users.

  This patch actually eliminates the associated recompiles. The Tomb Raider benchmark's minimum FPS increases from ~1 FPS to a reasonable number.

  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Matt Turner <[email protected]>
* i965, anv: Use NIR FragCoord re-center and y-transform passes. | Kenneth Graunke | 2016-05-20 | 7 | -57/+34
  This handles gl_FragCoord transformations and other window system vs. user FBO coordinate system flipping by multiplying/adding uniform values, rather than recompiles.

  This is much better because we have no decent way to guess whether the application is going to use a shader with the window system FBO or a user FBO, much less the drawable height. This led to a lot of recompiles in many applications.

  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Matt Turner <[email protected]>
* i965: Fix brw_regs_equal() for NaN and positive/negative zero. | Kenneth Graunke | 2016-05-20 | 1 | -1/+2
  We'd like the comparisons to mean "the exact same bits". Comparing doubles won't do that for NaN values or positive vs. negative zero.

  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Matt Turner <[email protected]>
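The difference between value equality and "the exact same bits" can be illustrated in isolation. This is only a sketch of the general technique, not the change made to brw_regs_equal(); the helper name `doubles_bitwise_equal` is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: true only when a and b have identical bit patterns.
 * memcpy is used to read the bits without violating aliasing rules. */
static bool
doubles_bitwise_equal(double a, double b)
{
   uint64_t abits, bbits;

   memcpy(&abits, &a, sizeof(abits));
   memcpy(&bbits, &b, sizeof(bbits));
   return abits == bbits;
}
```

With `==`, 0.0 and -0.0 compare equal and a NaN never equals itself; with the bitwise helper, 0.0 and -0.0 differ and a NaN value equals a copy of itself.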
* i965: Pass nir_src/nir_dest by reference. | Matt Turner | 2016-05-20 | 4 | -18/+18
  Cuts 6K of .text.

     text     data    bss      dec     hex  filename
  5772372   264648  29320  6066340  5c90a4  lib/i965_dri.so before
  5766074   264648  29320  6060042  5c780a  lib/i965_dri.so after

  Reviewed-by: Jason Ekstrand <[email protected]>
* i965: Fix strerror error code sign | Mark Janes | 2016-05-20 | 1 | -1/+1
  This trivial fix to error-handling corrects the sign of drm error codes before passing them to strerror.

  Identified by Coverity: CID1358581
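The sign issue can be shown with a short sketch. It assumes the common convention that a call reports failure by returning a negative errno value, while strerror() expects the positive errno value; `describe_negative_errno` is a hypothetical helper, not a function from the driver.

```c
#include <errno.h>
#include <string.h>

/* Hypothetical helper: 'ret' is a negative errno-style return value,
 * e.g. -ENOENT. strerror() wants the positive code, so negate it. */
static const char *
describe_negative_errno(int ret)
{
   return strerror(-ret);
}
```

Passing the negative value straight to strerror() would typically produce an "Unknown error" string rather than the intended message.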
* i965/fs: Recognize and emit ld_lz, sample_lz, sample_c_lz. | Matt Turner | 2016-05-19 | 1 | -2/+10
  Ken suggested instead of a big and complicated optimization pass, to just recognize the operations here. It's certainly less code and a lot prettier, but it seems to actually perform worse for currently unknown reasons.

  total instructions in shared programs: 8923452 -> 8904108 (-0.22%)
  instructions in affected programs: 814563 -> 795219 (-2.37%)
  helped: 3336
  HURT: 10

  total cycles in shared programs: 66970734 -> 66651476 (-0.48%)
  cycles in affected programs: 10582686 -> 10263428 (-3.02%)
  helped: 2438
  HURT: 691

  total spills in shared programs: 1811 -> 1789 (-1.21%)
  spills in affected programs: 85 -> 63 (-25.88%)
  helped: 4

  total fills in shared programs: 3143 -> 3109 (-1.08%)
  fills in affected programs: 167 -> 133 (-20.36%)
  helped: 4

  LOST: 2
  GAINED: 36

  Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Add infrastructure for sample lod-zero operations. | Matt Turner | 2016-05-19 | 6 | -0/+33
  Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Add and use get_nir_src_imm(). | Matt Turner | 2016-05-19 | 2 | -4/+19
  The next patch wants to inspect the LOD argument and do something different if it's 0.0f. But at that point we've emitted a MOV for it and we just have a register to look at.

  Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Silence warnings related to use of uninitialized values | Eduardo Lima Mitev | 2016-05-19 | 1 | -2/+2
  brw_fs.cpp: In function ‘const unsigned int* brw_compile_fs(const [...]
  brw_fs.cpp:6093:64: warning: ‘simd16_grf_start’ may be used uninitialized [...]
      prog_data->base.dispatch_grf_start_reg = simd16_grf_start;
  brw_fs.cpp:5996:29: note: ‘simd16_grf_start’ was declared here
      uint8_t simd8_grf_start, simd16_grf_start;
  brw_fs.cpp:6094:52: warning: ‘simd16_grf_used’ may be used uninitialized [...]
      prog_data->reg_blocks_0 = brw_register_blocks(simd16_grf_used);
  brw_fs.cpp:5997:29: note: ‘simd16_grf_used’ was declared here
      unsigned simd8_grf_used, simd16_grf_used;

  (and more)

  Reviewed-by: Anuj Phogat <[email protected]>
* Revert "i965/urb: fixes division by zero" | Matt Turner | 2016-05-18 | 1 | -5/+19
  This reverts commit 2a8aa1e3deb99a1ae16d942318da648c1327ece5.
* i965/urb: fixes division by zero | Ardinartsev Nikita | 2016-05-18 | 1 | -19/+5
  Fixes regression introduced by af5ca43f2676bff7499f93277f908b681cb821d0

  Reviewed-by: Matt Turner <[email protected]>
  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95419
* i965/fs: Assert that nir_op_extract_*'s src1 is a constant. | Matt Turner | 2016-05-18 | 1 | -0/+2
* i965: Silence unused parameter warnings | Ian Romanick | 2016-05-18 | 14 | -35/+19
  The only place that actually used the type parameter was the GS visitor, and it was always passed glsl_type::int. Just remove the parameter.

  brw_vec4_vs_visitor.cpp:38:61: warning: unused parameter ‘type’ [-Wunused-parameter]
      const glsl_type *type)
      ^

  Signed-off-by: Ian Romanick <[email protected]>
  Reviewed-by: Timothy Arceri <[email protected]>
* i965: Make brw_reg_from_fs_reg() halve exec_size when compressed. | Kenneth Graunke | 2016-05-17 | 1 | -4/+6
  In a5d7e144eaf43fee37e6ff9e2de194407087632b, Connor generalized the exec_size halving code to handle more cases. As part of this, he made it not halve anything if the region accessed falls completely in a single register. Unfortunately, it started producing some invalid regions:

  -add(16) g6<1>F g10<8,8,1>UW -g1<0,1,0>F { align1 compr };
  -add(16) g8<1>F g12<8,8,1>UW -g1.1<0,1,0>F { align1 compr };
  +add(16) g6<1>F g10<16,16,1>UW -g1<0,1,0>F { align1 compr };
  +add(16) g8<1>F g12<16,16,1>UW -g1.1<0,1,0>F { align1 compr };

  Here, the UW source region completely fits within a register. However, we have to use instruction compression because the destination region spans two registers. <16,16,1> is invalid because it's compressed.

  To handle this, skip the "everything fits in one register" case and fall through to the exec_size halving case when compressed.

  Fixes hundreds of Piglit regressions on GM965.

  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95370
  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Matt Turner <[email protected]>
* i965: Move compression decisions before brw_reg_from_fs_reg(). | Kenneth Graunke | 2016-05-17 | 1 | -26/+26
  brw_reg_from_fs_reg() needs to know whether the instruction will be compressed or not.

  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95370
  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Matt Turner <[email protected]>
* i965: Enable ES 3.2 sample shading extensions. | Kenneth Graunke | 2016-05-17 | 1 | -0/+1
  This enables:
  - GL_OES_sample_shading
  - GL_OES_sample_variables
  - GL_OES_shader_multisample_interpolation

  On Gen8, we pass all the CTS tests, and all but 4 of the dEQP-GLES31 tests (dealing with 1x/2x MSAA at half rate sampling). We believe those 4 dEQP-GLES31 tests are incorrect.

  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Jason Ekstrand <[email protected]>
  Reviewed-by: Ian Romanick <[email protected]>
* i965/fs: Add an allow_spilling flag to brw_compile_fs | Jason Ekstrand | 2016-05-17 | 6 | -19/+26
  This allows us to disable spilling for blorp shaders since blorp state setup doesn't handle spilling. Without this, blorp fails hard if you run with INTEL_DEBUG=spill.

  Reviewed-by: Francisco Jerez <[email protected]>
  Tested-by: Francisco Jerez <[email protected]>
* i965: Expose OpenGL 4.2 for gen8+ | Alejandro Piñeiro | 2016-05-17 | 2 | -2/+2
  ARB_vertex_attrib_64bit was the only feature missing.

  v2: we can expose 4.2 instead of 4.1 (Ian Romanick)

  Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Enable ARB_vertex_attrib_64bit for gen8+ | Alejandro Piñeiro | 2016-05-17 | 1 | -0/+1
  Reviewed-by: Kenneth Graunke <[email protected]>
* i965: take care of doubles when lowering VS inputs | Juan A. Suarez Romero | 2016-05-17 | 3 | -1/+16
  Input attributes can require 2 vec4 or 1 vec4 depending on whether they are double-precision or not.

  Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: calculate first non-payload GRF using attrib slots | Juan A. Suarez Romero | 2016-05-17 | 3 | -1/+3
  When computing where the first non-payload GRF starts, we can't rely on the number of attributes, as each attribute can use 1 or 2 slots depending on whether it is a dvec3/dvec4 or another type.

  Instead, we need to use the number of slots used by the attributes.

  Reviewed-by: Kenneth Graunke <[email protected]>
* i965/vec4: use attribute slots to calculate URB read length | Juan A. Suarez Romero | 2016-05-17 | 1 | -3/+9
  Do not use total attributes because a dvec3/dvec4 attribute requires two slots. So rather use total attribute slots.

  v2: do not use loop to calculate required attribute slots (Kenneth Graunke)

  Reviewed-by: Kenneth Graunke <[email protected]>
* i965: take care of doubles when remapping VS attributes | Juan A. Suarez Romero | 2016-05-17 | 1 | -15/+11
  Double-precision types require 1 slot in VUE for double and dvec2, and 2 slots for anything else.

  Reviewed-by: Kenneth Graunke <[email protected]>
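The slot rule stated in the commit message above can be written out as a small function. This is a sketch, not the driver's code: `vs_attrib_slots` is a hypothetical name, a slot is taken to be one vec4 (128 bits), and arrays and matrices are deliberately left out.

```c
#include <stdbool.h>

/* Hypothetical helper: number of vec4 slots taken by one vertex attribute
 * with 'components' components (1..4). Arrays and matrices are ignored. */
static unsigned
vs_attrib_slots(bool is_double, unsigned components)
{
   /* double and dvec2 fit in 128 bits; dvec3 and dvec4 need 256 bits. */
   if (is_double && components > 2)
      return 2;

   /* Every single-precision type up to vec4 fits in one slot. */
   return 1;
}
```

Summing this value over all attributes gives a slot count, which can exceed the attribute count as soon as one dvec3 or dvec4 input is present.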
* i965/fs: shuffle 32bits into 64bits for doubles | Juan A. Suarez Romero | 2016-05-17 | 1 | -0/+8
  The VS thread payload handles attributes in the URB as vec4, no matter if they are actually single or double precision.

  So with double-precision types, the value ends up in the registers split into 32-bit chunks, in different positions.

  We need to shuffle the chunks to get the doubles correctly.

  v2:
  * Extra blank line. Add { } on if body (Ian Romanick)
  * Use dest directly (Kenneth Graunke)

  Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: halve exec_size when dealing with 64-bit attributes | Alejandro Piñeiro | 2016-05-17 | 1 | -2/+19
  The HW has a restriction that only vertical stride may cross register boundaries. Until now this was only handled for VGRFs in brw_reg_from_fs_reg, but it is also needed for attributes.

  v2:
  * Remove reference to commit id on commit message (Juan Suarez)
  * Simplify code that computes the final exec_size (Ian Romanick)
  * Use REG_SIZE on that same code (Kenneth Graunke)

  Reviewed-by: Kenneth Graunke <[email protected]>
* i965: passthru formats cannot be used with edge flag enabled | Alejandro Piñeiro | 2016-05-17 | 1 | -0/+20
  Add an assertion to detect this case.

  Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Configure how to store *64*PASSTHRU vertex components | Antia Puentes | 2016-05-17 | 1 | -0/+35
  From the Broadwell specification, structure VERTEX_ELEMENT_STATE description:

  "When SourceElementFormat is set to one of the *64*_PASSTHRU formats, 64-bit components are stored in the URB without any conversion. In this case, vertex elements must be written as 128 or 256 bits, with VFCOMP_STORE_0 being used to pad the output as required. E.g., if R64_PASSTHRU is used to copy a 64-bit Red component into the URB, Component 1 must be specified as VFCOMP_STORE_0 (with Components 2,3 set to VFCOMP_NOSTORE) in order to output a 128-bit vertex element, or Components 1-3 must be specified as VFCOMP_STORE_0 in order to output a 256-bit vertex element. Likewise, use of R64G64B64_PASSTHRU requires Component 3 to be specified as VFCOMP_STORE_0 in order to output a 256-bit vertex element."

  Uses 128-bits to write double and dvec2 vertex elements, and 256-bits for dvec3 and dvec4 vertex elements.

  Signed-off-by: Juan A. Suarez Romero <[email protected]>
  Signed-off-by: Antia Puentes <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
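The padding rule quoted above, combined with the 128-bit/256-bit choice the commit describes, can be sketched as a function that picks the four component controls. This is an illustration only: `enum vfcomp` and `passthru_component_controls` are hypothetical, the enum values are arbitrary, and `VFCOMP_STORE_SRC` (store the source component) is an assumed name that does not appear in the quoted text.

```c
/* Local stand-ins for the component controls; the numeric values are
 * arbitrary in this sketch and VFCOMP_STORE_SRC is an assumed name. */
enum vfcomp {
   VFCOMP_NOSTORE,
   VFCOMP_STORE_SRC,
   VFCOMP_STORE_0,
};

/* Fills comp[0..3] for a vertex element with 'n' 64-bit components (1..4).
 * Returns the number of bits written to the URB, or 0 if n is invalid. */
static unsigned
passthru_component_controls(unsigned n, enum vfcomp comp[4])
{
   if (n < 1 || n > 4)
      return 0;

   /* double and dvec2 are written as 128 bits, dvec3 and dvec4 as 256. */
   const unsigned stored = (n <= 2) ? 2 : 4;

   for (unsigned i = 0; i < 4; i++) {
      if (i < n)
         comp[i] = VFCOMP_STORE_SRC;   /* real 64-bit component */
      else if (i < stored)
         comp[i] = VFCOMP_STORE_0;     /* zero padding */
      else
         comp[i] = VFCOMP_NOSTORE;     /* nothing written */
   }
   return stored * 64;
}
```

For a single double this yields store-source, store-zero, no-store, no-store (128 bits), and for a dvec3 it yields three store-source controls followed by one store-zero (256 bits), matching the two examples in the quoted specification text.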
* i965: get the proper vertex surface type for doubles on gen8+ | Alejandro Piñeiro | 2016-05-17 | 1 | -3/+27
  This commit adds support for PASSTHRU format when pushing double-precision attributes.

  Check glarray->Doubles in order to know if we should choose a format that does a conversion to float, or just passthru the 64-bit double.

  Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Enable ARB_shader_precision on Gen8+. | Kenneth Graunke | 2016-05-16 | 1 | -0/+1
  I recently fixed a bug in the Piglit tests:
  https://lists.freedesktop.org/archives/piglit/2016-May/019802.html

  With that patch in place, we pass all the tests. So, turn it on.

  We could probably expose this earlier than Gen8, but the extension says that OpenGL 4.0 is required, and all of our tests are written against GLSL 4.00 (which is only supported on Gen8+).

  Signed-off-by: Kenneth Graunke <[email protected]>
* i965: check tcs for NULL dereference | Mark Janes | 2016-05-16 | 1 | -3/+5
  Coverity issue 1361544 found an instance where the tcs variable is checked for NULL, but unconditionally dereferenced later in the same function.

  Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Mark is_lossless_compressed_aux UNUSED to silence warning. | Matt Turner | 2016-05-16 | 1 | -1/+1
  Used only in assert().
* i965: Expose OpenGL 4.0 for gen8+ | Iago Toral Quiroga | 2016-05-16 | 2 | -2/+4
  ARB_gpu_shader_fp64 was the only feature missing.

  Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Enable ARB_gpu_shader_fp64 for gen8+ | Iago Toral Quiroga | 2016-05-16 | 1 | -0/+1
  Reviewed-by: Kenneth Graunke <[email protected]>
* i965/tes/scalar: Fix load input for doubles | Iago Toral Quiroga | 2016-05-16 | 1 | -2/+2
  Reviewed-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Francisco Jerez <[email protected]>
* i965/tcs/scalar: fix store output for doubles | Iago Toral Quiroga | 2016-05-16 | 1 | -21/+96
  Reviewed-by: Kenneth Graunke <[email protected]>
* i965/tcs/scalar: fix load input for doubles | Iago Toral Quiroga | 2016-05-16 | 1 | -25/+73
  v2: do not write to the original indirect_offset since that is an expression that could be used somewhere else (Ken)

  Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: fix nir_intrinsic_store_output for doubles | Iago Toral Quiroga | 2016-05-16 | 1 | -1/+14
  Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: fix number of output components for doubles | Iago Toral Quiroga | 2016-05-16 | 1 | -4/+9
  Reviewed-by: Kenneth Graunke <[email protected]>
* i965/vec4: handle doubles in type_size_vec4() | Iago Toral Quiroga | 2016-05-16 | 1 | -8/+10
  The scalar backend uses this to check URB input sizes.

  v2: Removed redundant break after return (Curro)

  Reviewed-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Francisco Jerez <[email protected]>
* i965/fs: support doubles with shared variable stores | Iago Toral Quiroga | 2016-05-16 | 1 | -5/+35
  This is pretty much the same as what we do with SSBOs.

  v2: do not shuffle in-place; it is not safe since the original 64-bit data could be used after the write. Instead use a temporary like we do for SSBO stores (Iago)

  Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: support doubles with ssbo stores | Iago Toral Quiroga | 2016-05-16 | 1 | -4/+35
  Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: add shuffle_64bit_data_for_32bit_write helper | Iago Toral Quiroga | 2016-05-16 | 2 | -0/+37
  This does the inverse operation of shuffle_32bit_load_result_to_64bit_data and we will use it when we need to write 64-bit data in the layout expected by untyped write messages.

  v2 (curro):
  - Use subscript() instead of stride()
  - Assert on the input types rather than silently retyping.
  - Use offset() instead of horiz_offset(), drop the multiplier definition.
  - Drop the temporary vgrf and force_writemask_all.
  - Make component_i const.
  - Move to brw_fs_nir.cpp

  v3 (curro):
  - Pass dst and src by reference.
  - Simplify allocation of tmp register.
  - Move to brw_fs_nir.cpp.
  - Get rid of the temporary.

  v3 (Iago):
  - Check that the src and dst regions do not overlap, since that would typically be a bug in the caller.

  Reviewed-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Francisco Jerez <[email protected]>
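The idea of shuffling between 64-bit values and a 32-bit message layout can be modeled with plain arrays. This is a CPU-side sketch under an assumption that is not stated in the commit message: that the 32-bit layout holds the low dwords of all channels first, followed by the high dwords of all channels. Both function names are hypothetical and do not correspond to the driver helpers' signatures.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the 64-bit -> 32-bit direction: split n 64-bit
 * values into a plane of low dwords followed by a plane of high dwords.
 * dst must hold 2 * n dwords and must not overlap src. */
static void
split_64bit_for_32bit_write(uint32_t *dst, const uint64_t *src, size_t n)
{
   for (size_t i = 0; i < n; i++) {
      dst[i]     = (uint32_t)(src[i] & 0xffffffffu);  /* low dword  */
      dst[n + i] = (uint32_t)(src[i] >> 32);          /* high dword */
   }
}

/* Hypothetical model of the inverse direction: rebuild n 64-bit values
 * from the two dword planes produced above. */
static void
join_32bit_planes_to_64bit(uint64_t *dst, const uint32_t *src, size_t n)
{
   for (size_t i = 0; i < n; i++)
      dst[i] = ((uint64_t)src[n + i] << 32) | src[i];
}
```

Writing to a separate destination rather than shuffling in place mirrors the non-overlap check mentioned in the v3 notes, since the source values may still be needed after the write.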
* i965/fs: support doubles with SSBO loads | Iago Toral Quiroga | 2016-05-16 | 1 | -7/+2
  Reviewed-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Francisco Jerez <[email protected]>
* i965/fs: support doubles with shared variable loads | Iago Toral Quiroga | 2016-05-16 | 1 | -8/+2
  Reviewed-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Francisco Jerez <[email protected]>
* i965/fs: Add do_untyped_vector_read helper | Iago Toral Quiroga | 2016-05-16 | 1 | -0/+63
  We are going to need the same logic for anything that reads doubles via untyped messages (CS shared variables and SSBOs). Add a helper function with that logic so that we can reuse it.

  v2:
  - Make this a static function instead of a method of fs_visitor (Iago)
  - We only support types with a size of 4 or 8 (Curro)
  - Avoid retypes by using a separate vgrf for the packed result (Curro)
  - Put dst parameter before source parameters (Curro)

  Reviewed-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Francisco Jerez <[email protected]>