Writing to Compressed Textures

In general, it's not possible to use a block-compressed texture as a render target or as a compute shader output. Instead, you have to either alias the block-compressed texture with an uncompressed texture in which each texel corresponds to a block, or output the compressed blocks to an uncompressed texture or buffer and then copy them from that intermediate memory location to the final compressed texture.
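Both strategies depend on the same piece of arithmetic: the uncompressed surface has one texel per compressed block, so its dimensions are the texture dimensions divided by the block footprint, rounded up. The following is a minimal sketch of that calculation. The names BlockGrid, div_round_up and block_grid are hypothetical helpers, and the 4x4 default assumes the BC family of formats.

```cpp
#include <cassert>
#include <cstdint>

// Size of a texture measured in compressed blocks rather than texels.
struct BlockGrid {
    uint32_t blocks_x;
    uint32_t blocks_y;
};

// Number of blocks needed to cover `texels`, rounding up so that a partial
// block at the edge still gets a full block of storage.
constexpr uint32_t div_round_up(uint32_t texels, uint32_t block_dim) {
    return (texels + block_dim - 1) / block_dim;
}

// Dimensions of the uncompressed texture that aliases (or stages) a
// block-compressed texture: one uncompressed texel per compressed block.
// block_w/block_h default to 4x4, the footprint assumed for BC formats.
inline BlockGrid block_grid(uint32_t width, uint32_t height,
                            uint32_t block_w = 4, uint32_t block_h = 4) {
    assert(block_w > 0 && block_h > 0);
    return { div_round_up(width, block_w), div_round_up(height, block_h) };
}
```

For example, a 256x128 BC7 texture would be paired with a 64x32 texture whose texels are 128 bits wide.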

Each of the graphics APIs exposes this functionality in a different way. This document explains the options available under the following APIs:

Direct3D
Vulkan
Metal
OpenGL
OpenGL ES

Direct3D

Direct3D 10.1 enables copies between prestructured-typed textures and block-compressed textures of the same bit widths. The functions that can accomplish this are CopyResource and CopySubresourceRegion.

For additional information see: Format Conversion Using Direct3D 10.1

The following table lists the allowable source and destination formats that you can use in this reinterpretation type of format conversion:

Bit Width	Uncompressed Resource	Block-Compressed Resource
64	DXGI_FORMAT_R16G16B16A16_UINT DXGI_FORMAT_R16G16B16A16_SINT DXGI_FORMAT_R32G32_UINT DXGI_FORMAT_R32G32_SINT	DXGI_FORMAT_BC1_UNORM[_SRGB] DXGI_FORMAT_BC4_UNORM DXGI_FORMAT_BC4_SNORM
128	DXGI_FORMAT_R32G32B32A32_UINT DXGI_FORMAT_R32G32B32A32_SINT	DXGI_FORMAT_BC2_UNORM[_SRGB] DXGI_FORMAT_BC3_UNORM[_SRGB] DXGI_FORMAT_BC5_UNORM DXGI_FORMAT_BC5_SNORM

This functionality has been carried over to Direct3D 11 and Direct3D 12. In addition, like Vulkan, Direct3D 12 offers the possibility of creating an uncompressed Unordered Access View (UAV) of a block-compressed texture. To use this functionality, you first need to check that the device supports the RelaxedFormatCastingSupported feature:

// RelaxedFormatCastingSupported is reported through the OPTIONS12 feature struct.
bool supports_relaxed_format_casting = false;
D3D12_FEATURE_DATA_D3D12_OPTIONS12 feature_options12 = {};
HRESULT hr = device->CheckFeatureSupport(D3D12_FEATURE_D3D12_OPTIONS12,
                                         &feature_options12, sizeof(feature_options12));
if (SUCCEEDED(hr)) {
    supports_relaxed_format_casting = feature_options12.RelaxedFormatCastingSupported;
}

You also need to create an ID3D12Device10 in order to use the CreateCommittedResource3 method, and provide the list of compatible formats up front:

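// Formats the resource may be cast to: the compressed format used for
// sampling and the 128-bit uncompressed format used for the UAV.
DXGI_FORMAT format_list[2] = {
    DXGI_FORMAT_BC7_UNORM, DXGI_FORMAT_R32G32B32A32_UINT
};
hr = device10->CreateCommittedResource3(
    &heap_properties,
    D3D12_HEAP_FLAG_NONE,
    &texture_desc,                  // D3D12_RESOURCE_DESC1
    D3D12_BARRIER_LAYOUT_COMMON,    // initial layout
    nullptr,                        // no optimized clear value
    nullptr,                        // no protected session
    _countof(format_list),
    format_list,
    IID_PPV_ARGS(bc_texture));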

Another option in Direct3D 12 is to use placed resources to create a compressed and an uncompressed texture that share the same memory. This is not officially supported and only works on hardware from some vendors (NVIDIA). If there's enough interest I can document this mechanism in more detail, but I would consider that approach deprecated.

Vulkan

Like OpenGL, Vulkan supports copies from uncompressed to block-compressed textures. This feature has been available since Vulkan 1.0 through the vkCmdCopyImage API. Unfortunately, many Adreno devices have non-conformant implementations that do not handle this correctly, and some PowerVR devices have undocumented restrictions on the texture sizes and formats that are supported.

An alternative approach is to use compute shaders to output the compressed data to a buffer and copy the contents of the buffer to the compressed image using vkCmdCopyBufferToImage. This appears to work well in all cases.
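When the texture has several mip levels, the intermediate buffer needs a layout that both the compute shader and the copy agree on. The sketch below computes one possible layout, assuming 4x4 blocks and levels packed tightly one after another. LevelLayout and layout_mip_chain are hypothetical names. Each entry would map to one VkBufferImageCopy region: offset to bufferOffset, width and height to imageExtent (which is expressed in texels), with bufferRowLength and bufferImageHeight left at 0 to indicate tightly packed data.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Placement of one mip level inside the intermediate buffer.
struct LevelLayout {
    uint64_t offset;   // byte offset of the level's first block
    uint32_t width;    // level size in texels
    uint32_t height;
    uint64_t size;     // bytes of compressed data in this level
};

// Packs the levels back to back. Every level size is a multiple of
// block_bytes, so every offset stays aligned to the block size.
inline std::vector<LevelLayout> layout_mip_chain(uint32_t width, uint32_t height,
                                                 uint32_t mip_count,
                                                 uint32_t block_bytes) {
    assert(block_bytes == 8 || block_bytes == 16);
    assert(mip_count <= 32);
    std::vector<LevelLayout> levels;
    uint64_t offset = 0;
    for (uint32_t i = 0; i < mip_count; ++i) {
        // Mip dimensions halve at each level and never drop below 1.
        const uint32_t w = std::max<uint32_t>(width >> i, 1);
        const uint32_t h = std::max<uint32_t>(height >> i, 1);
        // Assumption: 4x4 block footprint; edge blocks are rounded up.
        const uint64_t blocks_x = (w + 3) / 4;
        const uint64_t blocks_y = (h + 3) / 4;
        const uint64_t size = blocks_x * blocks_y * block_bytes;
        levels.push_back({ offset, w, h, size });
        offset += size;
    }
    return levels;
}
```

The sum of the level sizes gives the minimum size of the buffer that the compute shader writes to.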

Additionally, the KHR_maintenance2 extension allows the creation of uncompressed image views of compressed images, which avoids the copy entirely. This feature was promoted to core in Vulkan 1.1, is widely available, and happens to work correctly on the majority of Android devices. The setup is somewhat convoluted, but the performance benefits make it worthwhile.

To create block-texel views you first have to create an image with the following flags:

EXTENDED_USAGE: This allows us to create a block-compressed image with the otherwise unsupported STORAGE usage flag, as long as that flag is removed from the block-compressed views.
MUTABLE_FORMAT: This allows us to create image views with different, but compatible formats.
BLOCK_TEXEL_VIEW_COMPATIBLE: This extends the list of compatible formats to include uncompressed formats where the texel size is equal to the block size.

Here's an example:

VkFormat compressed_format = VK_FORMAT_ASTC_4x4_UNORM_BLOCK;
VkFormat uncompressed_format = VK_FORMAT_R32G32B32A32_UINT;

VkImageCreateInfo image_info = { VK_STRUCTURE_TYPE_IMAGE_CREATE_INFO };
image_info.imageType = VK_IMAGE_TYPE_2D;
image_info.format = compressed_format;
image_info.extent = { w, h, 1 };
image_info.mipLevels = 1;
image_info.arrayLayers = 1;
image_info.samples = VK_SAMPLE_COUNT_1_BIT;
image_info.tiling = VK_IMAGE_TILING_OPTIMAL;
image_info.initialLayout = VK_IMAGE_LAYOUT_UNDEFINED;

// Note, we create the compressed image with the *STORAGE* usage flag. This is only allowed thanks to EXTENDED_USAGE image flag.
image_info.usage = 
    VK_IMAGE_USAGE_SAMPLED_BIT | 
    VK_IMAGE_USAGE_TRANSFER_DST_BIT | 
    VK_IMAGE_USAGE_STORAGE_BIT;

// Provide the required flags:
image_info.flags = 
    VK_IMAGE_CREATE_EXTENDED_USAGE_BIT | 
    VK_IMAGE_CREATE_BLOCK_TEXEL_VIEW_COMPATIBLE_BIT | 
    VK_IMAGE_CREATE_MUTABLE_FORMAT_BIT;

After this you would create the image, allocate and bind its memory, and upload the data as you normally would.

In order to create a view for sampling the texture in the shader, you use the compressed format, but for that to succeed you have to explicitly remove the USAGE_STORAGE_BIT:

VkImageViewCreateInfo sample_view_info = {};
sample_view_info.sType = VK_STRUCTURE_TYPE_IMAGE_VIEW_CREATE_INFO;
sample_view_info.image = image;
sample_view_info.viewType = VK_IMAGE_VIEW_TYPE_2D;
sample_view_info.format = compressed_format;
sample_view_info.subresourceRange.aspectMask = VK_IMAGE_ASPECT_COLOR_BIT;
sample_view_info.subresourceRange.levelCount = 1;
sample_view_info.subresourceRange.layerCount = 1;

// Remove the STORAGE usage flag from this view.
VkImageViewUsageCreateInfo sample_view_usage_info = {};
sample_view_usage_info.sType = VK_STRUCTURE_TYPE_IMAGE_VIEW_USAGE_CREATE_INFO;
sample_view_usage_info.usage = image_info.usage & ~VK_IMAGE_USAGE_STORAGE_BIT;
sample_view_info.pNext = &sample_view_usage_info;

VkImageView sample_view = VK_NULL_HANDLE;
vkCreateImageView(device, &sample_view_info, nullptr, &sample_view);

And to create a view to use the texture as storage in the compute shader, you use the uncompressed format. For maximum compatibility you should also remove the USAGE_SAMPLED_BIT:

// Create texel view for storage:
VkImageViewCreateInfo store_view_info = {};
store_view_info.sType = VK_STRUCTURE_TYPE_IMAGE_VIEW_CREATE_INFO;
store_view_info.image = image;
store_view_info.viewType = VK_IMAGE_VIEW_TYPE_2D;
store_view_info.format = uncompressed_format;
store_view_info.subresourceRange.aspectMask = VK_IMAGE_ASPECT_COLOR_BIT;
store_view_info.subresourceRange.levelCount = 1;
store_view_info.subresourceRange.layerCount = 1;

// Remove the SAMPLED usage flag from this view.
VkImageViewUsageCreateInfo store_view_usage_info = {};
store_view_usage_info.sType = VK_STRUCTURE_TYPE_IMAGE_VIEW_USAGE_CREATE_INFO;
store_view_usage_info.usage = image_info.usage & ~VK_IMAGE_USAGE_SAMPLED_BIT;
store_view_info.pNext = &store_view_usage_info;

VkImageView store_view = VK_NULL_HANDLE;
VK_CHECK(vkCreateImageView(device, &store_view_info, nullptr, &store_view));
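Through the storage view, the image is addressed in blocks rather than texels, so the compute dispatch has to be sized accordingly. The following sketch assumes that the encoder shader processes one block per invocation and uses an 8x8 local workgroup size. Both are assumptions for illustration, and DispatchSize and encoder_dispatch_size are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Workgroup counts to pass as the x and y arguments of vkCmdDispatch.
struct DispatchSize {
    uint32_t groups_x;
    uint32_t groups_y;
};

// width/height are the texel dimensions of the compressed image.
// Assumes one shader invocation per block and a square local size.
inline DispatchSize encoder_dispatch_size(uint32_t width, uint32_t height,
                                          uint32_t block_dim = 4,
                                          uint32_t local_size = 8) {
    assert(block_dim > 0 && local_size > 0);
    // Extent of the storage view, in blocks.
    const uint32_t blocks_x = (width + block_dim - 1) / block_dim;
    const uint32_t blocks_y = (height + block_dim - 1) / block_dim;
    // Round up so that partially filled workgroups still cover the edges.
    return { (blocks_x + local_size - 1) / local_size,
             (blocks_y + local_size - 1) / local_size };
}
```

Because the group counts are rounded up, the shader should compare its invocation ID against the size of the storage image and return early when it falls outside.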

Vulkan also provides the KHR_image_format_list extension, which was promoted to core in Vulkan 1.2. This allows you to provide a list of compatible formats at creation time. Using this extension is not strictly required, but it is recommended because it can be an optimization on some devices.

Here's a usage example:

VkImageFormatListCreateInfo format_list_info = {
    VK_STRUCTURE_TYPE_IMAGE_FORMAT_LIST_CREATE_INFO_KHR 
};

VkFormat view_formats[2];
if (physical_device_properties.apiVersion >= VK_API_VERSION_1_2 || supported_extensions.KHR_image_format_list)
{
    view_formats[0] = compressed_format;
    view_formats[1] = uncompressed_format;
    format_list_info.viewFormatCount = 2;
    format_list_info.pViewFormats = view_formats;
    image_info.pNext = &format_list_info;
}

You can use this same procedure to create render target textures with the USAGE_COLOR_ATTACHMENT_BIT, removing it from the corresponding compressed views used for sampling. However, this is broken on some Adreno devices, so for simplicity I recommend always running the codecs in a compute shader.

That said, on some devices (PowerVR), the codecs can run faster when executed in the pixel shader, so if that's important to you, then it may be worth also exploring that option as an optimization.

Metal

Metal does not support copies from uncompressed to compressed textures. The only method available is to use compute shaders to output the compressed data to a buffer and then use a blit encoder to copy the contents of the buffer over to the texture:

// makeBlitCommandEncoder() returns an optional, so unwrap it before use.
let blitEncoder = commandBuffer.makeBlitCommandEncoder()!

blitEncoder.copy(
    from: buffer, 
    sourceOffset: bufferOffset, 
    sourceBytesPerRow: bufferRowLength, 
    sourceBytesPerImage: 0, 
    sourceSize: MTLSize(width:width, height:height, depth:1), 
    to: outputTexture, 
    destinationSlice: 0, 
    destinationLevel: 0, 
    destinationOrigin: MTLOrigin(x: 0, y: 0, z: 0))

blitEncoder.endEncoding()

If for any reason you choose to have the encoder output the compressed data to an uncompressed texture, then you will have to perform two additional copies: One to copy from the texture to a buffer, and then another from the buffer to the compressed texture. So, the recommended approach is to have the encoder output the compressed blocks to a buffer directly.
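The value passed as sourceBytesPerRow deserves some care. For compressed pixel formats, my understanding is that it is the stride between consecutive rows of blocks, not rows of texels. The sketch below shows the arithmetic for tightly packed data. It is written in C++ rather than Swift, but the calculation carries over unchanged. BlitLayout and blit_layout are hypothetical names, and the 4x4 default block footprint is an assumption.

```cpp
#include <cassert>
#include <cstdint>

// Values needed to describe tightly packed compressed data in a buffer.
struct BlitLayout {
    uint64_t bytes_per_row;  // stride between rows of blocks
    uint64_t total_bytes;    // bytes occupied by the whole level
};

// width/height are texel dimensions; block_bytes is 8 or 16 depending on
// the compressed format.
inline BlitLayout blit_layout(uint32_t width, uint32_t height,
                              uint32_t block_bytes, uint32_t block_dim = 4) {
    assert(block_dim > 0 && block_bytes > 0);
    const uint64_t blocks_x = (width + block_dim - 1) / block_dim;
    const uint64_t blocks_y = (height + block_dim - 1) / block_dim;
    const uint64_t bytes_per_row = blocks_x * block_bytes;
    return { bytes_per_row, bytes_per_row * blocks_y };
}
```

For a 256x256 texture with 16-byte blocks this gives a row stride of 1024 bytes and a total of 65536 bytes.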

OpenGL

Before OpenGL 4.3, or without the GL_ARB_copy_image extension, the only way to write to compressed textures was to use Pixel Buffer Objects (PBOs). This required two copies: one from the uncompressed render target to the pixel buffer object, and another from the pixel buffer object to the block-compressed texture.

My old NVIDIA OpenGL SDK examples demonstrate this procedure.

This process was simplified with the introduction of glCopyImageSubData in the GL_ARB_copy_image extension, which became a core feature in OpenGL 4.3.

glCopyImageSubData allows copying data from an uncompressed texture to a block-compressed texture as long as the two formats are compatible, which generally means that the size of the texel is equal to the size of the block.

The following table shows some of the compatible formats:

Block Size	Uncompressed Internal Format	Compressed Internal Format(s)
64-bit	GL_RGBA16UI, GL_RG32UI	GL_COMPRESSED_RGB_S3TC_DXT1_EXT, GL_COMPRESSED_SRGB_S3TC_DXT1_EXT, GL_COMPRESSED_RGBA_S3TC_DXT1_EXT, GL_COMPRESSED_SRGB_ALPHA_S3TC_DXT1_EXT, GL_COMPRESSED_RED_RGTC1, GL_COMPRESSED_SIGNED_RED_RGTC1
128-bit	GL_RGBA32UI	GL_COMPRESSED_RGBA_S3TC_DXT3_EXT, GL_COMPRESSED_SRGB_ALPHA_S3TC_DXT3_EXT, GL_COMPRESSED_RGBA_S3TC_DXT5_EXT, GL_COMPRESSED_SRGB_ALPHA_S3TC_DXT5_EXT, GL_COMPRESSED_RG_RGTC2, GL_COMPRESSED_SIGNED_RG_RGTC2, GL_COMPRESSED_RGBA_BPTC_UNORM, GL_COMPRESSED_SRGB_ALPHA_BPTC_UNORM, GL_COMPRESSED_RGB_BPTC_SIGNED_FLOAT, GL_COMPRESSED_RGB_BPTC_UNSIGNED_FLOAT

Another alternative is to use compute shaders to output the compressed data to a buffer and then bind that buffer to the GL_PIXEL_UNPACK_BUFFER target so that it's used as the data source of the following calls to glCompressedTexSubImage2D. The following snippet shows how to do that:

glBindBuffer(GL_PIXEL_UNPACK_BUFFER, tmp_buffer);
glBindTexture(GL_TEXTURE_2D, dst_texture);
glCompressedTexSubImage2D(GL_TEXTURE_2D, dst_level, dst_x, dst_y, width, height, 
    gl_format, tmp_buffer_size, (const void*)tmp_buffer_offset);
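glCompressedTexSubImage2D is strict about the region it accepts. For the 4x4 block formats, the offsets have to fall on block boundaries, the width and height have to be multiples of four unless the region reaches the edge of the level, and the image size argument has to match the number of blocks exactly. The sketch below validates a region before issuing the call. Both function names are hypothetical, and the rules encoded here are my reading of the S3TC-style restrictions rather than a complete statement of the specification.

```cpp
#include <cassert>
#include <cstdint>

// Returns true when the rectangle (x, y, w, h) is a legal update region for
// a mip level of size level_w x level_h that uses square blocks.
inline bool is_block_aligned_region(uint32_t x, uint32_t y,
                                    uint32_t w, uint32_t h,
                                    uint32_t level_w, uint32_t level_h,
                                    uint32_t block_dim = 4) {
    assert(block_dim > 0);
    // The region has to lie inside the level.
    if (x + w > level_w || y + h > level_h) return false;
    // The origin has to sit on a block boundary.
    if (x % block_dim != 0 || y % block_dim != 0) return false;
    // Partial blocks are only accepted at the right and bottom edges.
    const bool w_ok = (w % block_dim == 0) || (x + w == level_w);
    const bool h_ok = (h % block_dim == 0) || (y + h == level_h);
    return w_ok && h_ok;
}

// Byte count to pass as the image size for a region of w x h texels.
inline uint32_t compressed_region_size(uint32_t w, uint32_t h,
                                       uint32_t block_bytes,
                                       uint32_t block_dim = 4) {
    assert(block_dim > 0);
    const uint32_t blocks_x = (w + block_dim - 1) / block_dim;
    const uint32_t blocks_y = (h + block_dim - 1) / block_dim;
    return blocks_x * blocks_y * block_bytes;
}
```

In the snippet above, tmp_buffer_size would correspond to the result of compressed_region_size for the same width and height.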

Neither of these two approaches is supported by the latest OpenGL drivers available on macOS, which only implement OpenGL 4.1. On that OS the only options are to rely on Pixel Buffer Objects or to use the Metal API.

OpenGL ES

Like OpenGL, OpenGL ES supports glCopyImageSubData, in this case through the GL_EXT_copy_image extension, which became a core feature in OpenGL ES 3.2, so you can use the same procedure as in OpenGL.

There's one caveat: OpenGL ES does not support the rg32ui output image layout. Instead, to output 64-bit blocks, you have to use rgba16ui:

layout(rgba16ui) uniform restrict writeonly highp uimage2D dst;

And pack the 64-bit uvec2 block as follows when passing it to imageStore:

uvec4 pack_uvec2(uvec2 v) {
    return uvec4(v.x & 0xFFFFu, v.x >> 16, v.y & 0xFFFFu, v.y >> 16);
}
...
imageStore(dst, uv, pack_uvec2(block));
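To see why this repacking preserves the block, it helps to mirror the function on the CPU and compare the resulting bits. The following sketch does that in C++. It assumes that the four 16-bit channels are stored in order, with the first channel in the lowest bits, which is the layout the GLSL function relies on. channels_to_bits and block_to_bits are hypothetical helpers added for the comparison.

```cpp
#include <array>
#include <cstdint>

// CPU mirror of the GLSL pack_uvec2: splits two 32-bit words into four
// 16-bit channels, low half first.
inline std::array<uint16_t, 4> pack_uvec2(uint32_t x, uint32_t y) {
    return { static_cast<uint16_t>(x & 0xFFFFu), static_cast<uint16_t>(x >> 16),
             static_cast<uint16_t>(y & 0xFFFFu), static_cast<uint16_t>(y >> 16) };
}

// The 64 bits that result from storing the four channels in order,
// assuming the first channel occupies the lowest bits.
inline uint64_t channels_to_bits(const std::array<uint16_t, 4>& c) {
    return  static_cast<uint64_t>(c[0])        |
           (static_cast<uint64_t>(c[1]) << 16) |
           (static_cast<uint64_t>(c[2]) << 32) |
           (static_cast<uint64_t>(c[3]) << 48);
}

// The 64 bits of the original block, with x in the low word.
inline uint64_t block_to_bits(uint32_t x, uint32_t y) {
    return static_cast<uint64_t>(x) | (static_cast<uint64_t>(y) << 32);
}
```

For any pair of words, channels_to_bits(pack_uvec2(x, y)) equals block_to_bits(x, y), so the rgba16ui store writes the same 64 bits that an rg32ui store would have written.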

Unfortunately, the glCopyImageSubData code path is not heavily tested on Android, and on some drivers it fails for certain texture types. My recommendation is to use the intermediate buffer approach and glCompressedTexSubImage2D, which I've found to be more reliable.

Note that this requires the use of compute shaders, which were introduced in OpenGL ES 3.1.

Again, as with macOS, none of these approaches work with the latest OpenGL ES version available on iOS, so the best option there is to use the Metal API.