| Commit message | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
| |
Now that we're always growing the param array as-needed, we can
allocate the param array in common code and stop repeating the
allocation everywhere. In order to keep things sane, we ralloc the
[pull_]param array off of the compile context and then steal it back
to a NULL context later. This doesn't get us all the way to where
prog_data::[pull_]param is purely an out parameter of the back-end
compiler but it gets us a lot closer.
Reviewed-by: Jordan Justen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
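As a rough illustration of the grow-as-needed param array described above, here is a minimal sketch. It uses plain realloc() as a stand-in for ralloc, and struct param_array and param_array_add() are hypothetical names, not the driver's actual helpers.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the param bookkeeping in prog_data. */
struct param_array {
   uint32_t *param;     /* heap storage, NULL while the array is empty */
   unsigned nr_params;  /* number of valid entries */
};

/* Append nr_new zeroed params and return a pointer to the first new one,
 * or NULL if the allocation fails (the array is then left unchanged).
 * realloc() is used here only because ralloc is not available outside Mesa.
 */
static uint32_t *
param_array_add(struct param_array *arr, unsigned nr_new)
{
   assert(arr != NULL && nr_new > 0);

   unsigned old_nr = arr->nr_params;
   size_t new_count = (size_t)old_nr + nr_new;
   uint32_t *grown = realloc(arr->param, new_count * sizeof(*grown));
   if (grown == NULL)
      return NULL;

   memset(grown + old_nr, 0, nr_new * sizeof(*grown));
   arr->param = grown;
   arr->nr_params = old_nr + nr_new;
   return grown + old_nr;
}
```

A caller would start from an empty array, call param_array_add() whenever a pass needs more slots, and free the storage once the compiled program has been copied elsewhere.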
|
|
|
|
|
|
|
| |
Reviewed-by: Tapani Pälli <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
| |
Now that the only thing we put in the array up-front are client push
constants, we can simplify anv_pipeline_compile a bit.
Reviewed-by: Jordan Justen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
| |
Before, we were calculating up-front and then filling in later. Now we
just grow as needed in anv_nir_apply_pipeline_layout.
Reviewed-by: Jordan Justen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
This way any image uniforms end up having locations higher than
MAX_PUSH_CONSTANT_SIZE. There's no bug here at the moment, but this
consistency will make the next commit easier. Also, because
nir_apply_pipeline_layout properly increments nir->num_uniforms when
it expands the param array, we no longer need to stomp it to match
prog_data::nr_params because it already does.
Reviewed-by: Jordan Justen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
| |
Instead of requiring the caller of brw_compile_vs to figure it out, just
grow the param array on-demand.
Reviewed-by: Jordan Justen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Instead of making the caller of brw_compile_cs add something to the
param array for thread_local_id_index, just add it on-demand in
brw_nir_intrinsics and grow the array. This is now safe to do because
everyone is now using ralloc for prog_data::param.
Reviewed-by: Jordan Justen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
| |
It's already only ever called from brw_compile_cs and only handles
compute intrinsics. Let's just make it CS-specific. We can always
make it handle other stages again later if we want.
Reviewed-by: Jordan Justen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Jordan Justen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
| |
We haven't needed this ever since we started using NIR for lowering
rectangle textures.
Reviewed-by: Jordan Justen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Otherwise, in the ARB program case _mesa_add_state_reference may grow
the parameter array which will cause brw_nir_setup_arb_uniforms to write
past the end of the param array because it only looks at the parameter
list length but the param array is allocated based on nir->num_uniforms.
The only reason this hasn't caused us problems is because we are padding
out the param array for fragment programs unnecessarily.
Reviewed-by: Jordan Justen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
The Vulkan driver does not support pull constants. It simply limits
things such that we can always push everything. Previously, we were
determining whether or not to push things based on whether or not the
prog_data::pull_param array is non-null. This is rather hackish and
about to stop working.
Reviewed-by: Jordan Justen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
| |
This way we stop leaking it. This is completely safe because, when we
hand it off to anv_shader_bin_create or anv_pipeline_cache_upload_kernel,
they make a copy of the entire param array.
Reviewed-by: Jordan Justen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
| |
This lets us avoid some of the manual ralloc stealing and prepares for
future commits in which we will want to ralloc prog_data::param.
Reviewed-by: Jordan Justen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
This burns an extra 10k of memory or so in the case where you don't have
any images. However, if you have several shaders which use images, this
should be much less memory. It also gets rid of a part of prog_data
that really has nothing to do with the compiler.
Reviewed-by: Jordan Justen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
| |
This should be just as good as looking in prog_data but removes our one
state setup dependency on brw_stage_prog_data::nr_image_param.
Reviewed-by: Jordan Justen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This moves us away from the array of pointers model and onto a model where
each param is represented by a generic uint32_t handle. We reserve 2^16
of these handles for builtins that get generated by somewhere inside the
compiler and have well-defined meanings. Generic params have handles
whose meanings are defined by the driver.
The primary downside to this new approach is that it moves a little bit
of the work that we would normally do at compile time to draw time. On
my laptop this hurts OglBatch6 by no more than 1% and doesn't seem to
have any measurable effect on OglBatch7. So, while this may come back
to bite us, it doesn't look too bad.
Reviewed-by: Jordan Justen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
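The handle model can be sketched roughly as follows. This assumes the 2^16 reserved builtin handles are the lowest values and that driver handles are plain indices offset past them; both choices, and every name below, are illustrative assumptions rather than the encoding the commit actually uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: builtins occupy handles [0, 2^16). */
#define PARAM_BUILTIN_COUNT (1u << 16)

/* Hypothetical builtin handles with compiler-defined meanings. */
enum {
   PARAM_BUILTIN_ZERO = 0,
   PARAM_BUILTIN_THREAD_LOCAL_ID = 1,
};

static bool
param_is_builtin(uint32_t handle)
{
   return handle < PARAM_BUILTIN_COUNT;
}

/* Hypothetical driver encoding: an index into a driver-owned constant
 * table, offset so that it never collides with a builtin handle.
 */
static uint32_t
param_from_driver_index(uint32_t index)
{
   assert(index <= UINT32_MAX - PARAM_BUILTIN_COUNT);
   return PARAM_BUILTIN_COUNT + index;
}

/* The work that moves to draw time: turn a handle into a value. */
static uint32_t
param_resolve(uint32_t handle, const uint32_t *driver_constants,
              uint32_t thread_local_id)
{
   if (param_is_builtin(handle)) {
      switch (handle) {
      case PARAM_BUILTIN_THREAD_LOCAL_ID:
         return thread_local_id;
      default:
         return 0; /* PARAM_BUILTIN_ZERO and unknown builtins */
      }
   }
   return driver_constants[handle - PARAM_BUILTIN_COUNT];
}
```

The per-handle lookup in param_resolve() is the kind of draw-time cost the message refers to; with an array of pointers the upload code could simply dereference each entry.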
|
|
|
|
|
|
|
|
|
| |
The only thing it was handling was push constants. We pull the actual
constant upload code into gen6_constant_state.c and the atoms into
genX_state_upload.c.
Reviewed-by: Jordan Justen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Jordan Justen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Jordan Justen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
| |
This looks like a copy+paste error. They don't actually write into that
variable as would be implied by putting the return there.
Reviewed-by: Lionel Landwerlin <[email protected]>
Cc: [email protected]
|
|
|
|
|
|
|
|
|
| |
We didn't fold correctly in the case of 0x1 because we never let the
loop counter hit 0. Switching it to bit >= 0 solves this problem.
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
Reviewed-by: Connor Abbott <[email protected]>
Cc: [email protected]
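To make the off-by-one concrete, here is a small sketch of a most-significant-bit search. It assumes the original loop condition was bit > 0; the function names are made up for illustration and this is not the actual constant-folding code.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed shape of the old loop: the counter stops at 1, so bit 0 is
 * never tested and an input of 0x1 falls through to "not found".
 */
static int
find_msb_buggy(uint32_t value)
{
   for (int bit = 31; bit > 0; bit--) {
      if ((value >> bit) & 1)
         return bit;
   }
   return -1;
}

/* With bit >= 0 the loop also tests bit 0, so 0x1 yields 0. */
static int
find_msb_fixed(uint32_t value)
{
   for (int bit = 31; bit >= 0; bit--) {
      if ((value >> bit) & 1)
         return bit;
   }
   return -1;
}
```

Both versions agree for every input except 0x1, which is why the bug is easy to miss in testing.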
|
|
|
|
| |
Acked-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
| |
This gets rid of all of our hand-rolled size calculation and
serialization code and replaces it with safe "standards" that are used
elsewhere in anv and mesa. This should be significantly safer than
rolling our own.
Reviewed-by: Jordan Justen <[email protected]>
|
|
|
|
|
|
| |
This is just a trivial cleanup.
Reviewed-by: Jordan Justen <[email protected]>
|
|
|
|
| |
Reviewed-by: Jordan Justen <[email protected]>
|
|
|
|
|
|
|
|
|
| |
There are certain advantages to using uint8_t internally such as
well-defined arithmetic on all platforms. However, interfaces that
work in terms of raw data should use a void* type.
Reviewed-by: Nicolai Hähnle <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
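A small sketch of the convention described above: the interface takes raw data as a void pointer, while the body works on uint8_t. The helper name sum_bytes() is hypothetical and only serves as an example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Callers can pass a pointer to any object without a cast, because the
 * parameter is const void *. Internally the bytes are read as uint8_t so
 * that the arithmetic is well defined on every platform.
 */
static uint32_t
sum_bytes(const void *data, size_t size)
{
   assert(data != NULL || size == 0);

   const uint8_t *bytes = data;
   uint32_t sum = 0;
   for (size_t i = 0; i < size; i++)
      sum += bytes[i];
   return sum;
}
```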
|
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
|
|
|
|
|
|
|
|
| |
These helpers not only call blob_reserve_bytes but also make sure that
the blob is properly aligned as if blob_write_* were called.
Reviewed-by: Nicolai Hähnle <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
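The alignment step these helpers add can be sketched as below. This is only an illustration of rounding the current size up to the size of the value being reserved; align_up() and reserve_uint32_offset() are hypothetical names, not the blob API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round offset up to the next multiple of alignment (a power of two). */
static size_t
align_up(size_t offset, size_t alignment)
{
   assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
   return (offset + alignment - 1) & ~(alignment - 1);
}

/* Hypothetical helper: given the current blob size, return the offset at
 * which a reserved uint32_t would start once padding has been added.
 */
static size_t
reserve_uint32_offset(size_t current_size)
{
   return align_up(current_size, sizeof(uint32_t));
}
```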
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Despite the name, it could only be used if you immediately wrote to the
pointer. Nobody was using it outside of one test, so clearly this
behavior wasn't that useful. Instead, make it return an offset into the
data buffer so that the result isn't invalidated if you later write to
the blob. In conjunction with blob_overwrite_bytes(), this will be
useful for leaving a placeholder and then filling it in later, which
we'll need to do for handling phi nodes when serializing NIR.
v2 (Jason Ekstrand):
- Detect overflow in the offset + to_write computation
Reviewed-by: Nicolai Hähnle <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
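The placeholder pattern described above might look roughly like this. The toy_blob type and its functions are a hypothetical miniature written for this example, not the real blob implementation; the point is that an offset stays valid across reallocation where a pointer would not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical miniature of a growable blob. */
struct toy_blob {
   uint8_t *data;
   size_t size;       /* bytes in use */
   size_t allocated;  /* bytes of storage behind data */
};

static bool
toy_grow(struct toy_blob *b, size_t additional)
{
   if (additional > SIZE_MAX - b->size)
      return false;
   size_t needed = b->size + additional;
   if (needed <= b->allocated)
      return true;

   size_t new_alloc = b->allocated ? b->allocated : 64;
   while (new_alloc < needed) {
      if (new_alloc > SIZE_MAX / 2) {
         new_alloc = needed;
         break;
      }
      new_alloc *= 2;
   }
   uint8_t *p = realloc(b->data, new_alloc);
   if (p == NULL)
      return false;
   b->data = p;
   b->allocated = new_alloc;
   return true;
}

/* Reserve n bytes and return their offset, or -1 on failure. */
static intptr_t
toy_reserve(struct toy_blob *b, size_t n)
{
   assert(b != NULL);
   if (!toy_grow(b, n))
      return -1;
   intptr_t offset = (intptr_t)b->size;
   b->size += n;
   return offset;
}

static bool
toy_write(struct toy_blob *b, const void *bytes, size_t n)
{
   intptr_t offset = toy_reserve(b, n);
   if (offset < 0)
      return false;
   if (n > 0)
      memcpy(b->data + offset, bytes, n);
   return true;
}

/* Fill in previously reserved bytes. The bounds check is written so that
 * offset + n can never wrap around.
 */
static bool
toy_overwrite(struct toy_blob *b, size_t offset, const void *bytes, size_t n)
{
   if (offset > b->size || n > b->size - offset)
      return false;
   if (n > 0)
      memcpy(b->data + offset, bytes, n);
   return true;
}
```

A typical use is to reserve space for a count, write a variable number of items (which may reallocate the buffer), and then overwrite the reserved slot with the final count.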
|
|
|
|
|
|
|
|
| |
These can be used to easily count up the number of bytes that will be
required by "writing" it into the NULL blob.
Reviewed-by: Nicolai Hähnle <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
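The counting trick can be sketched with a toy fixed-size writer: when the data pointer is NULL, writes only advance the size. The counting_blob type and its functions are hypothetical and written for this example only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical fixed-size writer; not the real blob API. */
struct counting_blob {
   uint8_t *data;    /* NULL means "count only, store nothing" */
   size_t capacity;  /* ignored when data is NULL */
   size_t size;      /* bytes written (or counted) so far */
};

static void
counting_blob_init(struct counting_blob *b, void *data, size_t capacity)
{
   assert(b != NULL);
   b->data = data;
   b->capacity = data ? capacity : 0;
   b->size = 0;
}

static bool
counting_blob_write(struct counting_blob *b, const void *bytes, size_t n)
{
   if (b->data != NULL) {
      if (n > b->capacity - b->size)
         return false;
      memcpy(b->data + b->size, bytes, n);
   }
   b->size += n;
   return true;
}

/* Example payload: a 32-bit length followed by that many bytes. */
static bool
write_record(struct counting_blob *b, const char *text)
{
   uint32_t len = (uint32_t)strlen(text);
   return counting_blob_write(b, &len, sizeof(len)) &&
          counting_blob_write(b, text, len);
}
```

Running write_record() once against a NULL-backed blob gives the exact size needed; the same call against a buffer of that size then performs the real serialization, so the size calculation cannot drift from the writer.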
|
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
|
|
|
|
|
|
|
|
|
| |
There's no reason why that tiny bit of memory needs to be on the heap.
We always put blob_reader on the stack, so why not do the same with the
writable blob?
Reviewed-by: Nicolai Hähnle <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
|
|
|
|
|
|
|
|
| |
We're going to want to use the blob for Vulkan pipeline caching so it
makes sense to have it in libcompiler not libglsl.
Reviewed-by: Nicolai Hähnle <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
Otherwise we could have a failure followed by a smaller write that
succeeds and get a corrupted blob. If we ever OOM, we should stop.
v2 (Jason Ekstrand):
- Initialize the new boolean member in create_blob
Reviewed-by: Nicolai Hähnle <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
Cc: [email protected]
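The failure mode and the fix can be sketched with a tiny writer whose capacity is fixed, so that "out of memory" is easy to trigger. The sticky_writer type is hypothetical; only the sticky flag mirrors the behavior described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical writer with a small fixed capacity standing in for a heap
 * that can run out.
 */
struct sticky_writer {
   uint8_t buf[8];
   size_t size;
   bool out_of_memory;  /* once set, every later write fails */
};

static bool
sticky_write(struct sticky_writer *w, const void *bytes, size_t n)
{
   assert(w != NULL);

   /* Without this early return, a small write after a failed large one
    * would succeed and leave a hole in the serialized stream.
    */
   if (w->out_of_memory)
      return false;

   if (n > sizeof(w->buf) - w->size) {
      w->out_of_memory = true;
      return false;
   }
   memcpy(w->buf + w->size, bytes, n);
   w->size += n;
   return true;
}
```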
|
|
|
|
|
|
|
|
|
| |
Otherwise, if you have a large read fail and then try to do a small
read, the small read may succeed even though it's at the wrong offset.
Reviewed-by: Nicolai Hähnle <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
Cc: [email protected]
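The reader-side equivalent can be sketched the same way: once a read overruns the buffer, later reads fail too, even if they would fit. The toy_reader type below is hypothetical and not the real blob_reader.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical reader over a caller-owned byte buffer. */
struct toy_reader {
   const uint8_t *data;
   size_t size;
   size_t offset;
   bool overrun;  /* once set, every later read fails */
};

static bool
toy_read(struct toy_reader *r, void *dest, size_t n)
{
   assert(r != NULL && r->offset <= r->size);

   /* A read that follows a failed one would start at the wrong offset,
    * so it is rejected even when enough bytes remain.
    */
   if (r->overrun || n > r->size - r->offset) {
      r->overrun = true;
      return false;
   }
   if (n > 0)
      memcpy(dest, r->data + r->offset, n);
   r->offset += n;
   return true;
}
```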
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
All users of brw_blorp_miptree_download() must emit a full pipeline
and cache flush when targeting a user PBO, as that PBO may then be
subsequently bound, or already *be* bound, anywhere outside of the
driver's dirty tracking. Move that flush into
brw_blorp_miptree_download() itself.
v2 (Ken): Rebase without userptr stuff so it can land sooner.
Reviewed-by: Topi Pohjolainen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Topi Pohjolainen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Topi Pohjolainen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Topi Pohjolainen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
This improves the FillTex benchmark in GLBench 2.7 by 30% on my Broxton.
On Ken's Broxton which only has single-channel ram, it improves by 210%.
v2 (Ken): Check mt->aux_usage == ISL_AUX_USAGE_CCS_E rather than using
intel_miptree_is_lossless_compressed().
Reviewed-by: Topi Pohjolainen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Topi Pohjolainen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
v1 (Topi Pohjolainen): original patch.
v2 (Topi Pohjolainen):
- Fix return value (s/MESA_FORMAT_NONE/false/) (Anuj)
- Move _mesa_tex_format_from_format_and_type() to the
end, avoiding an additional if-block (Anuj)
- Explain better the array alignment restriction (Anuj)
- Do not bail out in case of gl_pixelstore_attrib::ImageHeight,
it is handled by _mesa_image_offset() automatically (Ken).
- Support 1D_ARRAY by flipping depth, width and y, z (Ken).
v3 (Topi Pohjolainen):
- Contrary to v2, do not try to handle
gl_pixelstore_attrib::ImageHeight. Currently there are no
tests in piglit or cts for it. One could possibly copy or
modify tests/texturing/texsubimage.c. There, however, seem
to be a number of corner cases to consider. Moreover, current
meta path applies the packing height for both source and
targets when determining the offset. This would probably
require re-visiting also.
v4 (Topi Pohjolainen): Rebased on top of merged drm-bacon
v5 (Jason Ekstrand):
- Move to brw_blorp.c
- Significant refactoring
- Fixed 1-D array textures
- Simplified handling of PBOs vs. CPU data.
- Handle gl_pixelstore_attrib::ImageHeight. It turns out there are
piglit tests that cover this. The original version was failing them
because of an error in the way it handled 1-D array textures.
- Add support for texture download
v6 (Kenneth Graunke): Rebase fixes:
- Use intel_miptree_check_level_layer instead of deleted fields
- Update for mesa_format_supports_render[] rename.
- Pass 'false' (read-only) to intel_bufferobj_buffer
v7 (Kenneth Graunke):
- Fix brw_blorp_download_miptree to pass 'false' (not read only) for
the destination buffer (caught by Chris Wilson).
- Fix blorp_get_client_bo to pass intel_bufferobj_buffer !read_only
for the 'writable' parameter instead of 'false' (caught by Jason).
- Support GL_BGR, GL_BGRA, GL_BGRA_INTEGER, GL_BGR_INTEGER, allowing
us to use this for ReadPixels on the window system buffer (caught
by Chris Wilson).
- Fix y-flipping bugs in download path (exposed by BGRA support).
- Fix false vs. NULL return value in blorp_get_client_bo.
Signed-off-by: Jason Ekstrand <[email protected]>
Reviewed-by: Topi Pohjolainen <[email protected]>
|
|
|
|
|
|
| |
I want to reuse it for the BLORP download path.
Reviewed-by: Topi Pohjolainen <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Topi Pohjolainen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Framebuffer access includes framebuffer reads so we need to invalidate
the texture cache. We do not, however, need to flush the depth cache
because you cannot bind a depth texture as an image.
Reviewed-by: Topi Pohjolainen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Texture uploads and downloads may go through the render pipe which may
result in texturing from or rendering to the texture or the PBO. We
need to flush accordingly.
Reviewed-by: Topi Pohjolainen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
| |
They made a mistake in the MESA_swap_control XML, which I'm pursuing in
their github. Until then, we can just back this piece out.
Tested-by: Mark Janes <[email protected]>
Reviewed-by: Mark Janes <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The GL_EXT_texture_sRGB_decode spec says:
"The conversion of sRGB color space components to linear color space is
always performed if the texel lookup function is one of the texelFetch
builtin functions.
Otherwise, if the texel lookup function is one of the texture builtin
functions or one of the texture gather functions, the conversion of sRGB
color space components to linear color space is controlled by the
TEXTURE_SRGB_DECODE_EXT parameter.
If the TEXTURE_SRGB_DECODE_EXT parameter is DECODE_EXT, the conversion
of sRGB color space components to linear color space is performed.
If the TEXTURE_SRGB_DECODE_EXT parameter is SKIP_DECODE_EXT, the value
is returned without decoding. However, if the texture is also accessed
with a texelFetch function, then the result of texture builtin functions
and/or texture gather functions may be returned with decoding or without
decoding."
This patch makes i965 force sRGB decoding for any textures accessed via
texelFetch(). If textures are accessed via texelFetch() and a regular
texture access function, this will affect the other ones too - which is
fine - it's undefined according to the last paragraph quoted.
We could make both work, but we'd have to emit multiple SURFACE_STATEs,
and have two binding table sections, like we do for texture gather hacks
on older platforms.
Fixes the following Android O CTS test:
dEQP-GLES31.functional.srgb_texture_decode.skip_decode.srgba8.texel_fetch
Reviewed-by: Jason Ekstrand <[email protected]>
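The decision this patch makes can be summarized in a short sketch. The enum and function names are hypothetical; the logic only restates the rule from the message above, namely that decoding is forced whenever the texture is accessed via texelFetch() and otherwise follows the TEXTURE_SRGB_DECODE_EXT parameter.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mirror of the two TEXTURE_SRGB_DECODE_EXT settings. */
enum srgb_decode_mode {
   SRGB_DECODE,       /* DECODE_EXT */
   SRGB_SKIP_DECODE,  /* SKIP_DECODE_EXT */
};

/* Returns true when the surface should use an sRGB (decoding) format.
 * If any access goes through texelFetch(), decoding wins for all accesses
 * to that texture, which the spec leaves undefined for the non-texelFetch
 * lookups and therefore permits.
 */
static bool
should_decode_srgb(bool accessed_via_texel_fetch, enum srgb_decode_mode mode)
{
   return accessed_via_texel_fetch || mode == SRGB_DECODE;
}
```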
|