path: root/src/mesa
Commit message | Author | Age | Files | Lines
* i965: Use align1 mode for barrier messages. | Kenneth Graunke | 2017-01-15 | 1 | -0/+3

  In commit 7428e6f86ab5 we switched the barrier SEND message's destination type to UW to avoid problems in SIMD16 compute shaders.

  Tessellation control shaders also use barriers, and in vec4 mode, we were emitting them in align16 mode. The simulator warns that only UD, D, F, and DF are valid destination types - UW is technically illegal. So, switch to align1 mode. Either mode should work fine.

  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Jordan Justen <[email protected]>
* i965: Move Gen4-5 interpolation stuff to brw_wm_prog_data. | Kenneth Graunke | 2017-01-13 | 11 | -70/+52

  This fixes glxgears rendering, which had surprisingly been broken since late October! Specifically, commit 91d61fbf7cb61a44adcaae51ee08ad0dd6b.

  glxgears uses glShadeModel(GL_FLAT) when drawing the main portion of the gears, then uses glShadeModel(GL_SMOOTH) for drawing the Gouraud-shaded inner portion of the gears. This results in the same fragment program having two different state-dependent interpolation maps: one where gl_Color is flat, and another where it's smooth.

  The problem is that there's only one gen4_fragment_program, so it can't store both. Each FS compile would trash the last one. But, the FS compiles are cached, so the first one would store FLAT, and the second would see a matching program in the cache and never bother to compile one with SMOOTH. (Clearing the program cache on every draw made it render correctly.)

  Instead, move it to brw_wm_prog_data, where we can keep a copy for every specialization of the program. The only downside is bloating the structure a bit, but we can tighten that up a bit if we need to.

  This also lets us kill gen4_fragment_program entirely!

  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Timothy Arceri <[email protected]>
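The staleness described above can be modelled outside the driver. The following is a minimal Python sketch, not Mesa code: the class names are hypothetical, and it assumes the cache key distinguishes the two shade models while the interpolation map is only written when a compile actually happens.

```python
FLAT, SMOOTH = "flat", "smooth"


class SharedRecordCache:
    """Old layout (sketch): one interpolation map per program, written at compile time."""

    def __init__(self):
        self.compiled = set()    # cache keys that already have a compiled variant
        self.interp_map = None   # single shared slot, overwritten by every compile

    def draw(self, shade_model):
        if shade_model not in self.compiled:   # cache miss: compile and overwrite
            self.compiled.add(shade_model)
            self.interp_map = shade_model
        return self.interp_map                 # cache hit: whatever was stored last


class PerVariantCache:
    """New layout (sketch): every compiled specialization keeps its own copy."""

    def __init__(self):
        self.prog_data = {}      # cache key -> interpolation map for that variant

    def draw(self, shade_model):
        if shade_model not in self.prog_data:
            self.prog_data[shade_model] = shade_model
        return self.prog_data[shade_model]


# glxgears alternates between the two shade models on every frame.
frames = [FLAT, SMOOTH, FLAT, SMOOTH]
old, new = SharedRecordCache(), PerVariantCache()
old_result = [old.draw(mode) for mode in frames]
new_result = [new.draw(mode) for mode in frames]
```

In this model the shared record goes stale as soon as a draw hits the cache after a different variant was compiled, while the per-variant store always returns the map that matches the requested state.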
* i965/vec4: Fix mapping attributes | Juan A. Suarez Romero | 2017-01-13 | 2 | -23/+11

  This patch reverts 57bab6708f2bbc1ab8a3d202e9a467963596d462, which was causing issues with ILK and earlier VS programs.

  1. brw_nir.c: Revert "i965/vec4/nir: vec4 also needs to remap vs attributes". Do not perform a remap in the vec4 backend. Rather, do it later when setting up attributes.
  2. brw_vec4.cpp: This fixes mapping ATTRx to the proper GRFn.

  Suggested-by: Kenneth Graunke <[email protected]>
  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99391
  [[email protected]: merge Juan's two patches from bugzilla]
  Signed-off-by: Jordan Justen <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Fix textureGather with RG32I/UI on Gen7. | Kenneth Graunke | 2017-01-13 | 2 | -8/+37

  According to the "Gather4 R32G32_FLOAT Bug" internal documentation page, the R32G32_UINT and R32G32_SINT formats are affected by the same bug as R32G32_FLOAT. Applying the same workarounds should be viable - apparently the R32G32_FLOAT_LD format shouldn't corrupt integer data which is NaN or other sketchy floating point values.

  One irritating caveat is that, because it's a FLOAT format, the alpha channel or any channel set to SCS_ONE returns 0x3f8 (1.0) rather than integer 1. So we need shader code to whack those channels to 1.

  Fixes GL45-CTS.texture_gather.plain-gather-int-cube-rg on Haswell.

  v2: Fix swizzle component zeroing (caught by Jordan Justen).

  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Jordan Justen <[email protected]>
* mesa/get: Remove unused extra_ARB_viewport_array | Boyan Ding | 2017-01-13 | 1 | -1/+0

  Unused since 0a7691ee (mesa: Enable enums for OES_viewport_array). Silence a warning of unused variable.

  Signed-off-by: Boyan Ding <[email protected]>
  Reviewed-by: Emil Velikov <[email protected]>
* xlib: Unify the style of function pointer calls in structs | Boyan Ding | 2017-01-13 | 1 | -74/+74

  Signed-off-by: Boyan Ding <[email protected]>
  [Emil Velikov: handle the final case in glXCreateContextAttribsARB]
  Signed-off-by: Emil Velikov <[email protected]>
* radeon: Unify the style of function pointer calls in structs | Boyan Ding | 2017-01-13 | 3 | -17/+17

  Signed-off-by: Boyan Ding <[email protected]>
  [Emil Velikov: handle all the cases]
  Signed-off-by: Emil Velikov <[email protected]>
* nouveau: Unify the style of function pointer calls in structs | Boyan Ding | 2017-01-13 | 1 | -3/+3

  Signed-off-by: Boyan Ding <[email protected]>
* i915: Add XRGB8888 format to intel_screen_make_configs | Derek Foreman | 2017-01-13 | 1 | -1/+2

  This is a copy of commit 536003c11e4cb1172c540932ce3cce06f03bf44e except for i915.

  Original log for the i965 commit follows:

  Some applications, such as the drm backend of weston, use an XRGB8888 config as default. i965 doesn't provide this format, but before commit 65c8965d, the drm platform of EGL took ARGB8888 as XRGB8888. Now that commit 65c8965d makes EGL recognize the format correctly, weston won't start because it can't find XRGB8888. Add the XRGB8888 format to i965 just as other drivers do.

  Signed-off-by: Derek Foreman <[email protected]>
  Acked-by: Boyan Ding <[email protected]>
  Tested-by: Mark Janes <[email protected]>
* main/fbobject: throw invalid operation when get_attachment fails if needed | Alejandro Piñeiro | 2017-01-13 | 1 | -7/+42

  In most cases, if a call to get_attachment fails, it is because the attachment is an invalid enum, and INVALID_ENUM is raised. But for some specific cases, if COLOR_ATTACHMENTm (where m >= MAX_COLOR_ATTACHMENTS) is used, it should raise an INVALID_OPERATION error instead.

  Fixes:
  GL45-CTS.direct_state_access.framebuffers_get_attachment_parameter_errors
  GL45-CTS.direct_state_access.framebuffers_renderbuffer_attachment_errors

  v2: extra new line before quote block. Include "color attachment" on both new error messages (Nicolai).

  Reviewed-by: Nicolai Hähnle <[email protected]>
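The error selection described above can be sketched as follows. This is an illustrative Python model, not the Mesa implementation: `attachment_error` is a hypothetical helper, `MAX_COLOR_ATTACHMENTS = 8` is an assumed limit, and only a subset of the valid attachment enums is handled. The numeric values are the standard OpenGL enum values.

```python
GL_INVALID_ENUM = 0x0500
GL_INVALID_OPERATION = 0x0502
GL_COLOR_ATTACHMENT0 = 0x8CE0
GL_COLOR_ATTACHMENT31 = 0x8CFF
GL_DEPTH_ATTACHMENT = 0x8D00
GL_STENCIL_ATTACHMENT = 0x8D20

MAX_COLOR_ATTACHMENTS = 8  # assumed implementation limit for this sketch


def attachment_error(attachment):
    """Return None for a usable attachment, otherwise the GL error to raise."""
    if GL_COLOR_ATTACHMENT0 <= attachment <= GL_COLOR_ATTACHMENT31:
        index = attachment - GL_COLOR_ATTACHMENT0
        # A well-formed COLOR_ATTACHMENTm beyond the limit is an invalid
        # operation, not an invalid enum.
        return None if index < MAX_COLOR_ATTACHMENTS else GL_INVALID_OPERATION
    if attachment in (GL_DEPTH_ATTACHMENT, GL_STENCIL_ATTACHMENT):
        return None
    return GL_INVALID_ENUM
```

The point of the split is that the caller needs to know whether the failed lookup was for a color attachment at all, which is the information the following commit makes get_attachment return.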
* main/fboject: return if it is color_attachment on get_attachment | Alejandro Piñeiro | 2017-01-13 | 1 | -11/+19

  Some callers need that info to know whether they should raise INVALID_ENUM or INVALID_OPERATION. An alternative would be for the caller to check whether the attachment is a GL_COLOR_ATTACHMENTm, but that seems redundant as get_attachment is already doing that.

  Reviewed-by: Nicolai Hähnle <[email protected]>
* mesa/main: fix version/extension checks in _mesa_ClampColor | Nicolai Hähnle | 2017-01-13 | 1 | -6/+10

  Add a proper check for feature support, and raise an invalid enum for GL_CLAMP_VERTEX/FRAGMENT_COLOR unconditionally in core profiles, since those enums were explicitly removed after the extension was promoted to core functionality (not in the profile sense) with OpenGL 3.0.

  This matches the behavior of the AMD closed source driver and fixes GL45-CTS.gtf30.GL3Tests.half_float.half_float_textures.

  Cc: "12.0 13.0" <[email protected]>
  Reviewed-by: Ilia Mirkin <[email protected]>
* nir/i965: assert first is always less than 64 | Juan A. Suarez Romero | 2017-01-12 | 1 | -0/+1

  This fixes a defect detected by Coverity Scan.

  Reviewed-by: Iago Toral Quiroga <[email protected]>
* i965/gen7: expose OpenGL 4.2 on Haswell when supported | Juan A. Suarez Romero | 2017-01-12 | 2 | -2/+2

  GL_ARB_vertex_attrib_64bit was the last piece missing.

  v2: update docs (Jordan)

  Signed-off-by: Juan A. Suarez Romero <[email protected]>
  Reviewed-by: Jordan Justen <[email protected]>
* i965: enable ARB_shader_precision to HSW+ | Samuel Iglesias Gonsálvez | 2017-01-12 | 1 | -1/+1

  v2: update docs (Jordan)

  Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]>
  Signed-off-by: Juan A. Suarez Romero <[email protected]>
  Reviewed-by: Jordan Justen <[email protected]>
* i965: unify the code to enable of ARB_gpu_shader_fp64 and ARB_vertex_attrib_64bit for HSW+ | Samuel Iglesias Gonsálvez | 2017-01-12 | 1 | -7/+2

  Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]>
  Signed-off-by: Juan A. Suarez Romero <[email protected]>
  Reviewed-by: Jordan Justen <[email protected]>
* i965: Enable ARB_vertex_attrib_64bit for Haswell | Alejandro Piñeiro | 2017-01-12 | 1 | -1/+3

  v2: update docs (Jordan)

  Signed-off-by: Alejandro Piñeiro <[email protected]>
  Signed-off-by: Juan A. Suarez Romero <[email protected]>
  Reviewed-by: Jordan Justen <[email protected]>
* i965: check for dual slot attributes on any gen | Juan A. Suarez Romero | 2017-01-12 | 1 | -2/+1

  Those not supporting 64 bit input vertex attributes will have the dual_slot value as false.

  Reviewed-by: Jordan Justen <[email protected]>
* i965/vec4: emit correctly load_inputs for 64bit data | Juan A. Suarez Romero | 2017-01-12 | 1 | -6/+15

  For dvec3 and dvec4 types, a single GRF does not have enough space to hold two inputs from two different vertices (SIMD4x2). So the GRF only contains the first two components for the two vertices, and the next GRF has the remaining components.

  We want to put all the components for the same vertex in the same register. Thus, we do a shuffle to reorder the data.

  Reviewed-by: Jordan Justen <[email protected]>
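The reordering described above can be pictured with a small Python model. It is a sketch under assumptions, not the driver's code: each register is modelled as a list of four 64-bit components, the incoming layout is taken to be "first half of both vertices, then second half of both vertices", and `shuffle_dvec4_pair` is a hypothetical name.

```python
def shuffle_dvec4_pair(grf_lo, grf_hi):
    """Regroup two registers so that each one holds a single vertex.

    grf_lo holds components 0-1 of vertex 0 followed by components 0-1 of vertex 1;
    grf_hi holds components 2-3 of vertex 0 followed by components 2-3 of vertex 1.
    """
    vertex0 = grf_lo[0:2] + grf_hi[0:2]
    vertex1 = grf_lo[2:4] + grf_hi[2:4]
    return vertex0, vertex1


# Layout before the shuffle (SIMD4x2, two vertices interleaved across registers).
lo = ["v0.x", "v0.y", "v1.x", "v1.y"]
hi = ["v0.z", "v0.w", "v1.z", "v1.w"]
reg_a, reg_b = shuffle_dvec4_pair(lo, hi)
```

After the shuffle, `reg_a` carries only vertex 0 and `reg_b` only vertex 1, which is the property the commit wants for later instructions.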
* i965/vec4: take into account doubles when creating attribute mapping | Alejandro Piñeiro | 2017-01-12 | 1 | -4/+9

  Doubles need more than one slot per attribute. So when filling the attribute_map we check if it is a double in order to allocate one extra register.

  Signed-off-by: Alejandro Piñeiro <[email protected]>
  Signed-off-by: Juan A. Suarez Romero <[email protected]>
  Reviewed-by: Jordan Justen <[email protected]>
* i965/vec4/nir: vec4 also needs to remap vs attributes | Alejandro Piñeiro | 2017-01-12 | 1 | -10/+22

  Doubles need extra space, so we would need to do a remapping for vec4 too in order to take that into account.

  We reuse the already existing remap_vs_attrs, but passing is_scalar, so they could remap accordingly.

  v2: code-format remap_vs_attrs_params initialization (Matt)

  Signed-off-by: Alejandro Piñeiro <[email protected]>
  Signed-off-by: Juan A. Suarez Romero <[email protected]>
  Reviewed-by: Jordan Justen <[email protected]>
* i965/vec4: use attribute slots for first non payload GRF | Alejandro Piñeiro | 2017-01-12 | 1 | -1/+1

  As part of the payload setup, setup_attributes is called with the first GRF that can be used for the attributes (the first ones are used for uniforms, for example) and returns the first GRF that is not part of the payload. Before this patch, it directly added the number of attributes. But since 64-bit attributes can consume more than one slot, that is not valid anymore. This patch changes the addition to use the number of slots consumed.

  gen >= 8 would not be affected, as they use the scalar mode. For that case, the vs configuration is done at fs_visitor::assign_vs_urb_setup.

  v2: add explanation in commit log (Jordan)

  Reviewed-by: Jordan Justen <[email protected]>
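The difference between counting attributes and counting slots can be shown with a few lines of Python. This is a hedged sketch rather than the backend's code: `first_non_payload_grf` is a hypothetical helper, and the slot table assumes that dvec3 and dvec4 consume two slots while every other type consumes one.

```python
# Assumed slot cost per attribute type for this sketch.
SLOTS_PER_TYPE = {
    "float": 1, "vec2": 1, "vec3": 1, "vec4": 1,
    "double": 1, "dvec2": 1, "dvec3": 2, "dvec4": 2,
}


def first_non_payload_grf(first_attr_grf, attr_types, count_slots=True):
    """Return the first register after the attribute payload."""
    if count_slots:
        # New behaviour: advance by the number of slots actually consumed.
        return first_attr_grf + sum(SLOTS_PER_TYPE[t] for t in attr_types)
    # Old behaviour: advance by the number of attributes.
    return first_attr_grf + len(attr_types)


attrs = ["vec4", "dvec4", "dvec3"]
old_end = first_non_payload_grf(2, attrs, count_slots=False)
new_end = first_non_payload_grf(2, attrs)
```

With one vec4 and two wide double attributes starting at register 2, the attribute count gives 5 while the slot count gives 7, so the old value would point into registers that still hold attribute data.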
* i965: downsize *64*PASSTHRU formats to equivalent *32*FLOAT formats on gen < 8 | Alejandro Piñeiro | 2017-01-12 | 1 | -30/+139

  gen < 8 doesn't support *64*PASSTHRU formats when emitting vertices. So in order to provide the equivalent functionality, we need to downsize the format to equivalent *32*FLOAT, and in some cases (R64G64B64 and R64G64B64A64) submit two 3DSTATE_VERTEX_ELEMENTS for each vertex element.

  Signed-off-by: Alejandro Piñeiro <[email protected]>
  Signed-off-by: Juan A. Suarez Romero <[email protected]>
  Reviewed-by: Jordan Justen <[email protected]>
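A rough Python model of the downsizing follows. The commit message only states that R64G64B64 and R64G64B64A64 need two vertex elements; the specific 32-bit formats chosen for each entry below are an assumption based purely on matching bit widths, and `DOWNSIZED` and `format_bits` are hypothetical names.

```python
# Assumed mapping: each 64-bit passthrough format becomes one or two
# 32-bit float formats carrying the same number of bits.
DOWNSIZED = {
    "R64_PASSTHRU": ["R32G32_FLOAT"],
    "R64G64_PASSTHRU": ["R32G32B32A32_FLOAT"],
    "R64G64B64_PASSTHRU": ["R32G32B32A32_FLOAT", "R32G32_FLOAT"],
    "R64G64B64A64_PASSTHRU": ["R32G32B32A32_FLOAT", "R32G32B32A32_FLOAT"],
}


def format_bits(name):
    """Total bits in a format name made of 32-bit or 64-bit channels."""
    return name.count("64") * 64 + name.count("32") * 32


def elements_needed(fmt):
    """Number of vertex elements submitted for one 64-bit attribute."""
    return len(DOWNSIZED[fmt])
```

The useful invariant is that no data is lost: the 32-bit formats for each entry add up to the same number of bits as the original 64-bit format, and only the three- and four-component cases overflow a single 128-bit element.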
* i965: return PASSTHRU surface types also on gen7 | Alejandro Piñeiro | 2017-01-12 | 1 | -2/+6

  Although gen7 doesn't include surface types as a valid conversion format, we return it, as it reflects what we want to achieve, even if we need to work around it on gen < 8.

  Reviewed-by: Jordan Justen <[email protected]>
* main/buffers: take into account FRONT_AND_BACK on ReadBuffer | Alejandro Piñeiro | 2017-01-12 | 1 | -0/+2

  From OpenGL 3.1 spec, section 4.3.1 "Reading Pixels", page 190 (203 PDF):

  "When READ_FRAMEBUFFER_BINDING is zero, i.e. the default framebuffer, src must be one of the values listed in table 4.4, including NONE. FRONT_AND_BACK, FRONT, and LEFT refer to the front left buffer."

  There is an equivalent text on OpenGL 4.5 spec, section 18.2.1 "Selecting Buffers for Reading", page 502 (524 PDF), so the behaviour is still the same.

  Part of the fix for:
  GL45-CTS.direct_state_access.framebuffers_draw_read_buffers_errors

  Reviewed-by: Anuj Phogat <[email protected]>
* main/buffers: update error handling on DrawBuffers for 4.5 | Alejandro Piñeiro | 2017-01-12 | 1 | -13/+33

  Before 4.5, GL_BACK was not allowed as a value of bufs. Since 4.5 it is allowed under some circumstances:

  From the OpenGL 4.5 specification, Section 17.4.1 "Selecting Buffers for Writing", page 493 (page 515 of the PDF):

  "An INVALID_ENUM error is generated if any value in bufs is FRONT, LEFT, RIGHT, or FRONT_AND_BACK. This restriction applies to both the default framebuffer and framebuffer objects, and exists because these constants may themselves refer to multiple buffers, as shown in table 17.4."

  And on page 492 (page 514 of the PDF):

  "If the default framebuffer is affected, then each of the constants must be one of the values listed in table 17.6 or the special value BACK. When BACK is used, n must be 1 and color values are written into the left buffer for single-buffered contexts, or into the back left buffer for double-buffered contexts."

  This patch keeps the same behaviour if OpenGL version is < 4. We assume that for 4.x this is the intended behaviour (so this is a fix), but that for 3.x the intended behaviour is the one already in place.

  Part of the fix for:
  GL45-CTS.direct_state_access.framebuffers_draw_read_buffers_errors

  v2: remove forgotten printf
  v3: remove spaces before commas on spec quote, split line too long (Anuj)

  Reviewed-by: Anuj Phogat <[email protected]>
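The quoted rules can be condensed into a small validation sketch for the default framebuffer. This is an illustrative Python model and not the Mesa code: `draw_buffers_error` is a hypothetical helper, enums are plain strings, and two details are assumptions rather than statements from the commit message, namely that the pre-4.x rejection of BACK is reported as INVALID_ENUM and that BACK with n != 1 is reported as INVALID_OPERATION.

```python
# Constants that may refer to several buffers at once; always rejected.
MULTI_BUFFER_ENUMS = {"FRONT", "LEFT", "RIGHT", "FRONT_AND_BACK"}


def draw_buffers_error(bufs, gl_major):
    """Return a GL error name, or None, for DrawBuffers on the default framebuffer."""
    for buf in bufs:
        if buf in MULTI_BUFFER_ENUMS:
            return "INVALID_ENUM"
        if buf == "BACK":
            if gl_major < 4:
                return "INVALID_ENUM"        # assumed error for the old behaviour
            if len(bufs) != 1:
                return "INVALID_OPERATION"   # assumed error code when n != 1
    return None
```

Keeping the version check next to the BACK case mirrors the commit's intent: only the handling of BACK changes with the GL version, while the multi-buffer constants are rejected everywhere.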
* i965: Enable predicate support on gen >= 8. | Rafael Antognolli | 2017-01-11 | 1 | -1/+1

  Predication needs cmd parser only on gen7. For newer platforms, it should be available without it.

  v2 (Ken): rebase on recent changes.

  Signed-off-by: Rafael Antognolli <[email protected]>
  Signed-off-by: Kenneth Graunke <[email protected]>
* i965: Use the nir_move_comparisons pass. | Kenneth Graunke | 2017-01-12 | 1 | -0/+1

  While the below stats are encouraging, this pass will also become very useful for avoiding regressions once brw_do_channel_expressions() and brw_do_vector_splitting() are disabled.

  On Broadwell:

  total instructions in shared programs: 13078787 -> 13060898 (-0.14%)
  instructions in affected programs: 1809827 -> 1791938 (-0.99%)
  helped: 4527
  HURT: 157

  total cycles in shared programs: 256562762 -> 256590424 (0.01%)
  cycles in affected programs: 159749392 -> 159777054 (0.02%)
  helped: 5583
  HURT: 2289

  total spills in shared programs: 14929 -> 14923 (-0.04%)
  spills in affected programs: 62 -> 56 (-9.68%)
  helped: 1
  HURT: 0

  total fills in shared programs: 20144 -> 20141 (-0.01%)
  fills in affected programs: 253 -> 250 (-1.19%)
  helped: 1
  HURT: 3

  LOST: 0
  GAINED: 2

  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Timothy Arceri <[email protected]>
  Reviewed-by: Jason Ekstrand <[email protected]>
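The percentages in the report above are plain relative changes, which can be re-derived with a short Python check. `pct_change` is a hypothetical helper written for this note; it is not part of shader-db.

```python
def pct_change(before, after):
    """Relative change from before to after, in percent, rounded to two decimals."""
    return round((after - before) / before * 100, 2)


# Figures copied from the Broadwell report above.
total_instructions = pct_change(13078787, 13060898)
affected_instructions = pct_change(1809827, 1791938)
affected_spills = pct_change(62, 56)
```

The same absolute saving of 17889 instructions shows up as a much larger relative change in the "affected programs" line because the denominator only covers shaders whose code actually changed.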
* i965: Move nir_lower_locals_to_regs a bit later. | Kenneth Graunke | 2017-01-12 | 1 | -2/+2

  I'm going to add a boolean scheduling pass that I want run late, but after copy propagation and dead code elimination. Yet, I don't want to have to think about registers. So, move the register conversion a little later.

  No impact on shader-db.

  Suggested by Jason Ekstrand.

  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Jason Ekstrand <[email protected]>
  Reviewed-by: Timothy Arceri <[email protected]>
* android: st/mesa: fix building error in libmesa_st_mesa | Mauro Rossi | 2017-01-11 | 1 | -1/+3

  Fixes building error due to dependency on nir generated headers

  Reviewed-by: Tapani Pälli <[email protected]>
  Reviewed-by: Emil Velikov <[email protected]>
* compiler: Merge shader_info's tcs and tes structs. | Kenneth Graunke | 2017-01-10 | 8 | -32/+34

  Annoyingly, SPIR-V lets you specify all of these fields in either the TCS or TES, which means that we need to be able to store all of them for either shader stage. Putting them in a union won't work.

  Combining both is an easy solution, and given that the TCS struct only had a single field, it's pretty inexpensive. This patch renames the combined struct to "tess" to indicate that it's for tessellation in general, not one of the two stages.

  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Jason Ekstrand <[email protected]>
* i965: Fix number of slots in SSO mode when there are no user varyings. | Kenneth Graunke | 2017-01-09 | 1 | -4/+2

  We want vue_map->num_slots to be one more than the final slot. When assigning fixed slots, built-in slots, and non-SSO user varyings, we do slot++. This leaves "slot" as one past the most recently assigned slot.

  But for SSO user varyings, we computed slot based on the varying location value...and left it at that slot value. To work around this inconsistency, I made num_slots be "slot + 1" if separate and "slot" otherwise.

  The problem is...if there are no user varyings in SSO mode...then we would have done slot++ when assigning built-ins, so it would be off by one. This resulted in loops from 0 to vue_map->num_slots hitting a bonus BRW_VARYING_SLOT_PAD at the end.

  This used to break the SIMD8 VS/TES backends, but I fixed that in commit 480d6c1653713dcae617ac523b2ca5deee01c845. It's probably safe at this point, but we should fix it anyway.

  To fix this, do slot++ in all cases. For SSO mode, we overwrite slot for every varying, so this increment only matters on the last varying. Because we process varyings in order, this will set slot to 1 more than the highest assigned slot.

  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Jordan Justen <[email protected]>
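The off-by-one is easy to reproduce in a toy model of the SSO path. The Python below is a sketch with hypothetical names and a simplified layout (built-ins first, user varyings placed at `first_user_slot + location`); it is not the VUE map code itself.

```python
def num_slots_old(n_builtins, sso_locations, first_user_slot):
    """Old SSO logic: slot is left AT the last user varying, then 1 is added."""
    slot = 0
    for _ in range(n_builtins):
        slot += 1                         # built-ins leave slot one past the end
    for location in sorted(sso_locations):
        slot = first_user_slot + location  # no increment afterwards
    return slot + 1                       # "slot + 1" workaround for separate shaders


def num_slots_new(n_builtins, sso_locations, first_user_slot):
    """Fixed logic: every assignment is followed by slot++."""
    slot = 0
    for _ in range(n_builtins):
        slot += 1
    for location in sorted(sso_locations):
        slot = first_user_slot + location
        slot += 1                         # only the last increment matters
    return slot


with_varyings = (num_slots_old(2, [0, 3], 2), num_slots_new(2, [0, 3], 2))
without_varyings = (num_slots_old(2, [], 2), num_slots_new(2, [], 2))
```

Both versions agree when user varyings exist, but with none the old logic reports one slot too many, which corresponds to the stray padding slot mentioned in the commit message.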
* mesa: set GLSL 1.20 for the fixed-function fragment shader | Marek Olšák | 2017-01-10 | 1 | -1/+13

  This fixes broken depth texturing after:

    commit 22639a6e19f95902aef23474ad672bf489231ea7
    Author: Timothy Arceri <[email protected]>
    Date: Mon Nov 21 00:29:29 2016 +1100

    st/mesa: get Version from gl_program rather than gl_shader_program

  Reviewed-by: Roland Scheidegger <[email protected]>
* nir/i965: use two slots from inputs_read for dvec3/dvec4 vertex input attributes | Juan A. Suarez Romero | 2017-01-09 | 6 | -29/+16

  So far, inputs_read was a bitmap tracking which vertex input locations were being used.

  In OpenGL, an attribute bigger than a vec4 (like a dvec3 or dvec4) consumes just one location, like any other smaller attribute. So we mark the proper bit in inputs_read, and also the same bit in double_inputs_read if the attribute is a dvec3/dvec4.

  But in Vulkan, this is slightly different: a dvec3/dvec4 attribute consumes two locations, not just one. And hence two bits would be marked in inputs_read for the same vertex input attribute.

  To avoid handling two different situations in NIR, we just choose the latter one: in OpenGL, when creating NIR from GLSL/IR, any dvec3/dvec4 vertex input attribute is marked with two bits in the inputs_read bitmap (and also in the double_inputs_read), and following attributes are adjusted accordingly.

  As an example, if in our GLSL/IR shader we have three attributes:

    layout(location = 0) vec3 attr0;
    layout(location = 1) dvec4 attr1;
    layout(location = 2) dvec3 attr2;

  then in our NIR shader we put attr0 in location 0, attr1 in locations 1 and 2, and attr2 in locations 3 and 4.

  Checking carefully, basically we are using slots rather than locations in NIR.

  When emitting the vertices, we do an inverse map to know the corresponding location for each slot.

  v2 (Jason):
  - use two slots from inputs_read for dvec3/dvec4 NIR from GLSL/IR.

  v3 (Jason):
  - Fix commit log error.
  - Use ladder ifs and fix braces.
  - elements_double is divisible by 2, don't need DIV_ROUND_UP().
  - Use if ladder instead of a switch.
  - Add comment about hardware restriction in 64bit vertex attributes.

  Reviewed-by: Jason Ekstrand <[email protected]>
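The location-to-slot renumbering in the example above can be reproduced with a short Python sketch. The helper names are hypothetical and the model assumes attributes are listed in location order starting at 0, as in the commit message's example; it is not the NIR code.

```python
TWO_SLOT_TYPES = ("dvec3", "dvec4")


def assign_slots(attrs):
    """Map each attribute name to the list of slots it occupies.

    attrs is a list of (name, glsl_type) tuples ordered by GL location.
    """
    slots = {}
    next_slot = 0
    for name, glsl_type in attrs:
        count = 2 if glsl_type in TWO_SLOT_TYPES else 1
        slots[name] = list(range(next_slot, next_slot + count))
        next_slot += count
    return slots


def slot_to_location(attrs):
    """Inverse map used when emitting vertices: slot index -> GL location."""
    inverse = {}
    for location, (name, _) in enumerate(attrs):
        for slot in assign_slots(attrs)[name]:
            inverse[slot] = location
    return inverse


attrs = [("attr0", "vec3"), ("attr1", "dvec4"), ("attr2", "dvec3")]
slots = assign_slots(attrs)
inverse = slot_to_location(attrs)
```

For the three attributes from the commit message this yields slot 0 for attr0, slots 1 and 2 for attr1, and slots 3 and 4 for attr2, and the inverse map folds both halves of a wide attribute back onto its single GL location.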
* i965: call intel_prepare_render always when reading pixels | Tapani Pälli | 2017-01-09 | 1 | -6/+6

  Currently we do this only in the fallback code (when tiled memcpy version failed) but it needs to be done always so that we have correct read and write buffer in place. No regressions seen in CI.

  Fixes: dEQP-EGL.functional.buffer_age.*
  Signed-off-by: Tapani Pälli <[email protected]>
  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98330
  Reviewed-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Chad Versace <[email protected]>
* st/mesa: pass gl_program to st_bind_ubos() | Timothy Arceri | 2017-01-09 | 1 | -18/+18

  We no longer need anything from gl_linked_shader.

  Reviewed-by: Eric Anholt <[email protected]>
* st/mesa: pass gl_program to st_bind_images() | Timothy Arceri | 2017-01-09 | 1 | -24/+22

  We no longer need anything from gl_linked_shader.

  Reviewed-by: Nicolai Hähnle <[email protected]>
* st/mesa: stop passing gl_linked_shader to set_affected_state_flags() | Timothy Arceri | 2017-01-09 | 1 | -7/+6

  We now get everything we need from the gl_program param.

  Reviewed-by: Nicolai Hähnle <[email protected]>
* st/mesa/glsl: set num_images directly in shader_info | Timothy Arceri | 2017-01-09 | 4 | -16/+8

  This change also removes the now duplicate NumImages field.

  Reviewed-by: Nicolai Hähnle <[email protected]>
* st/mesa: pass gl_program to st_bind_ssbos() | Timothy Arceri | 2017-01-09 | 1 | -21/+21

  We no longer need to pass gl_shader_program.

  Reviewed-by: Nicolai Hähnle <[email protected]>
* i965: Move TES input VUE map calculation out a layer. | Kenneth Graunke | 2017-01-07 | 3 | -9/+11

  In Vulkan, we'll compile the TCS and TES at the same time, so I can just pass the TCS output VUE map to brw_compile_tes as the TES input VUE map. So, we only need to do this in GL. Move it to the GL-specific layer.

  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Timothy Arceri <[email protected]>
  Reviewed-by: Jason Ekstrand <[email protected]>
* i965: Pass NULL for gl_program when compiling TES. | Kenneth Graunke | 2017-01-07 | 1 | -1/+1

  This isn't needed, and Vulkan doesn't have one.

  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Timothy Arceri <[email protected]>
  Reviewed-by: Jason Ekstrand <[email protected]>
* i965: Move TES spacing/domain/topology setup to brw_compile_tes(). | Kenneth Graunke | 2017-01-07 | 2 | -33/+34

  Moving this down a layer lets us share code between Vulkan and GL.

  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Timothy Arceri <[email protected]>
  Reviewed-by: Jason Ekstrand <[email protected]>
* i965: Access TES shader info via NIR. | Kenneth Graunke | 2017-01-07 | 1 | -6/+6

  NIR exists in both GL and Vulkan, but gl_program is GL specific.

  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Timothy Arceri <[email protected]>
  Reviewed-by: Jason Ekstrand <[email protected]>
* mesa: Introduce a compiler enum for tessellation spacing. | Kenneth Graunke | 2017-01-07 | 5 | -37/+36

  It feels weird using GL_* enums in a Vulkan driver.

  v2: Fix the TESS_SPACING -> PIPE_TESS_SPACING conversion.

  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Timothy Arceri <[email protected]>
  Reviewed-by: Jason Ekstrand <[email protected]>
* compiler: Change shader_info->tes.vertex_order into a ccw boolean. | Kenneth Graunke | 2017-01-07 | 3 | -12/+5

  The vertex order is either clockwise or counterclockwise. We can just store a "ccw" boolean rather than GLenum values. I don't want to use GLenums in a Vulkan driver, and even in GL a simple boolean works fine.

  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Timothy Arceri <[email protected]>
  Reviewed-by: Jason Ekstrand <[email protected]>
* drirc: Allow extension midshader for Divinity: Original Sin (EE) | Kai Wasserbäch | 2017-01-07 | 1 | -0/+4

  See also <https://bugs.freedesktop.org/show_bug.cgi?id=93551#c27> where this was first observed as a requirement.

  Signed-off-by: Kai Wasserbäch <[email protected]>
  Signed-off-by: Marek Olšák <[email protected]>
* i965/compiler: Use the new nir_opt_copy_prop_vars pass | Jason Ekstrand | 2017-01-06 | 1 | -0/+1

  We run this after nir_lower_vars_to_ssa so that as many load/store_var intrinsics as possible are eliminated before copy_prop_vars executes. This is because the pass isn't particularly efficient (it does a lot of linear walks of a linked list) so we'd like as much of the work as possible to be done before copy_prop_vars runs.

  Shader DB results on Sky Lake:

  total instructions in shared programs: 12020290 -> 12013627 (-0.06%)
  instructions in affected programs: 26033 -> 19370 (-25.59%)
  helped: 16
  HURT: 13

  total cycles in shared programs: 137772848 -> 137549012 (-0.16%)
  cycles in affected programs: 6955660 -> 6731824 (-3.22%)
  helped: 217
  HURT: 237

  total loops in shared programs: 3208 -> 3208 (0.00%)
  loops in affected programs: 0 -> 0
  helped: 0
  HURT: 0

  total spills in shared programs: 4112 -> 4057 (-1.34%)
  spills in affected programs: 483 -> 428 (-11.39%)
  helped: 2
  HURT: 0

  total fills in shared programs: 5519 -> 5102 (-7.56%)
  fills in affected programs: 993 -> 576 (-41.99%)
  helped: 2
  HURT: 0

  LOST: 0
  GAINED: 0

  Broadwell had similar results. On older hardware, the impact isn't as large because they don't advertise GL 4.5. Of the hurt programs, all but one are hurt by a single instruction and the one is hurt by 3 instructions. All of the helped programs, on the other hand, are helped by at least 3 instructions and one Kerbal Space Program shader is helped by 44.59%.

  The real star of the show, however, is the Gl43CSDof synmark2 benchmark which has two shaders which are cut by 28% and 40% and the over-all runtime performance of the benchmark on my Sky Lake laptop is improved by around 25-30% (it's a bit hard to be exact due to thermal throttling).

  Reviewed-by: Timothy Arceri <[email protected]>
* i965: Rework gl_TessLevel*[] handling to use NIR compact arrays. | Kenneth Graunke | 2017-01-06 | 10 | -364/+92

  Treating everything as scalar arrays allows us to drop a bunch of special case input/output munging all throughout the backend. Instead, we just need to remap the TessLevel components to the appropriate patch URB header locations in remap_patch_urb_offsets().

  We also switch to treating the TES input versions of these as ordinary shader inputs rather than system values, as remap_patch_urb_offsets() just makes everything work out without special handling.

  This regresses one Piglit test:
  arb_tessellation_shader-large-uniforms/GL_TESS_CONTROL_SHADER-array-at-limit

  The compiler starts promoting the constant arrays assigned to gl_TessLevel* to uniform arrays. Since the shader also has a uniform array that uses the maximum number of uniform components, this puts it over the uniform component limit enforced by the linker. This is arguably a bug in the constant array promotion code (it should avoid pushing us over limits), but is unlikely to penalize any real application.

  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Jason Ekstrand <[email protected]>
* i965: Inline store_output helper in quads workaround code. | Kenneth Graunke | 2017-01-06 | 1 | -14/+10

  It's only used in one place, it ignores the offset parameter currently, and I want to add more parameters...at which point, passing in a bunch of integers seems less obvious than writing it out.

  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Jason Ekstrand <[email protected]>