fast_image_resize-5.3.0/.cargo_vcs_info.json

{
  "git": {
    "sha1": "1ca7f37c73c90f5257b727dc9570e6a25b73a76a"
  },
  "path_in_vcs": ""
}

fast_image_resize-5.3.0/.gitignore

/target
.*
!/.gitignore
data/result

fast_image_resize-5.3.0/CHANGELOG.md

## [5.3.0] - 2025-09-02

### Added

- Added support for multi-thread image resizing using the `ResizeAlg::Nearest` algorithm
  ([#54](https://github.com/Cykooz/fast_image_resize/issues/54)).

## [5.2.2] - 2025-08-29

### Fixed

- Fixed a "divide by zero" error that occurred when multithreading was used to resize images
  of particular sizes ([#55](https://github.com/Cykooz/fast_image_resize/issues/55)).

## [5.2.1] - 2025-07-27

### Changed

- Added minimum supported Rust version (MSRV) into `Cargo.toml`.

## [5.2.0] - 2025-07-12

### Added

- Added support of `DynamicImage::ImageRgb32F` and `DynamicImage::ImageRgba32F` from the `image`
  crate ([#50](https://github.com/Cykooz/fast_image_resize/pull/50)).

## [5.1.4] - 2025-05-16

### Fixed

- Fixed `SSE4.1` and `AVX2` implementations of dividing an image by its alpha channel
  for images with `U16x2` pixels.
- Fixed `NEON` implementation of dividing an image by its alpha channel
  for images with `U16x2` and `U16x4` pixels.

## [5.1.3] - 2025-04-06

### Fixed

- Fixed error in `NEON` implementation of `MulDiv::multiply_alpha()` and
  `MulDiv::multiply_alpha_inplace()` for `U8x2` pixels
  ([#49](https://github.com/Cykooz/fast_image_resize/issues/49)).
- Replaced the internal crate `testing` with the corresponding module in the `tests` directory
  ([#48](https://github.com/Cykooz/fast_image_resize/issues/48)).

## [5.1.2] - 2025-02-16

### Fixed

- Fixed error in implementation of `ImageView::split_by_width()`, `ImageView::split_by_height()`,
  `ImageViewMut::split_by_width_mut()` and `ImageViewMut::split_by_height_mut()`
  ([#46](https://github.com/Cykooz/fast_image_resize/issues/46)).

## [5.1.1] - 2025-01-13

### Fixed

- Fixed error in implementation of `ImageView::split_by_width()`, `ImageView::split_by_height()`,
  `ImageViewMut::split_by_width_mut()` and `ImageViewMut::split_by_height_mut()`
  ([#43](https://github.com/Cykooz/fast_image_resize/issues/43)).

## [5.1.0] - 2024-12-09

### Changed

- Improved speed (about 9%) of the `SSE4.1` implementation of the vertical convolution pass
  for pixel types based on `u8` components.

### Fixed

- `is_aarch64_feature_detected()` is now used in the `CpuExtensions::is_supported()` method
  for the `aarch64` architecture.

## [5.0.0] - 2024-10-03

### Added

- Added support for multi-thread image processing with the help of the `rayon` crate.
  You should enable the `rayon` feature to turn on this behavior.
- Added methods to split an image in different directions:
  - `ImageView::split_by_height()`
  - `ImageView::split_by_width()`
  - `ImageViewMut::split_by_height_mut()`
  - `ImageViewMut::split_by_width_mut()`

  These methods have a default implementation and are used for multi-thread image processing.

### Changed

- **BREAKING**: Added supertraits `Send`, `Sync` and `Sized` to the `ImageView` trait.
- Optimized the convolution algorithm by deleting zero coefficients from the start and end of bounds.

## [4.2.3] - 2025-05-16

### Fixed

- Fixed `SSE4.1` and `AVX2` implementations of dividing an image by its alpha channel
  for images with `U16x2` pixels.
- Fixed `NEON` implementation of dividing an image by its alpha channel
  for images with `U16x2` and `U16x4` pixels.
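Both alpha-channel fixes in the entry above concern the `MulDiv` step that converts between straight and premultiplied alpha. As a rough illustration only (my own standalone sketch, not the crate's code; the rounding is an assumption), this is what multiplying and then dividing a single `U16x2` luma-plus-alpha pixel by its alpha channel amounts to; the crate applies the same per-pixel arithmetic, using SIMD where available:

```rust
/// Premultiply: scale the colour component by `alpha / u16::MAX`, with rounding.
fn multiply_alpha_u16(luma: u16, alpha: u16) -> u16 {
    ((luma as u32 * alpha as u32 + u16::MAX as u32 / 2) / u16::MAX as u32) as u16
}

/// Un-premultiply: the inverse operation, saturating at the component maximum.
/// A fully transparent pixel (alpha == 0) simply stays zero.
fn divide_alpha_u16(luma: u16, alpha: u16) -> u16 {
    if alpha == 0 {
        return 0;
    }
    let restored = (luma as u32 * u16::MAX as u32 + alpha as u32 / 2) / alpha as u32;
    restored.min(u16::MAX as u32) as u16
}

fn main() {
    let (luma, alpha) = (40_000u16, 32_768u16); // roughly 50% opaque
    let premultiplied = multiply_alpha_u16(luma, alpha);
    let restored = divide_alpha_u16(premultiplied, alpha);
    // The round trip loses at most a small rounding error; the fixes above are
    // about getting exactly this division step right in the SIMD code paths.
    println!("premultiplied = {premultiplied}, restored = {restored}");
}
```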
## [4.2.2] - 2025-04-06

### Fixed

- Fixed error in `NEON` implementation of `MulDiv::multiply_alpha()` and
  `MulDiv::multiply_alpha_inplace()` for `U8x2` pixels
  ([#49](https://github.com/Cykooz/fast_image_resize/issues/49)).

## [4.2.1] - 2024-07-24

### Fixed

- Disabled default features of the `image` crate (#36).

## [4.2.0] - 2024-07-19

### Added

- Added new resize algorithm `ResizeAlg::Interpolation` (#32). It is like `ResizeAlg::Convolution`
  but with a fixed kernel size. This algorithm can be useful if you want to get a result similar
  to `OpenCV` (except `INTER_AREA` interpolation).

## [4.1.0] - 2024-07-14

### Added

- Added support for optimization with the help of `SSE4.1` and `AVX2` for the `F32` pixel type.
- Added support for new pixel types `F32x2`, `F32x3` and `F32x4` with optimizations
  for `SSE4.1` and `AVX2` (#30).

## [4.0.0] - 2024-05-13

### Added

- Added Gaussian filter for convolution algorithm.
- Method `PixelType::size()` was made public.
- Added new image containers:
  - `ImageRef`
  - `TypedImageRef`
  - `TypedImage`
  - `TypedCroppedImage`
  - `TypedCroppedImageMut`
  - `CroppedImage`
  - `CroppedImageMut`

### Fixed

- Fixed dividing image by alpha channel.

### Changed

A lot of **breaking changes** have been made in this release:

- Structures `ImageView` and `ImageViewMut` have been removed. They always did unnecessary
  memory allocation to store references to image rows. Instead of these structures,
  the `ImageView` and `ImageViewMut` traits have been added. The crate accepts any image
  container that provides these traits.
- Also, traits `IntoImageView` and `IntoImageViewMut` have been added. They allow you to write
  runtime adapters to convert your particular image container into something that provides
  the `ImageView`/`ImageViewMut` traits.
- `Resizer` now has two resize methods (dynamic and typed):
  - `resize()` accepts references to `impl IntoImageView` and `impl IntoImageViewMut`;
  - `resize_typed()` accepts references to `impl ImageView` and `impl ImageViewMut`.
- Resize methods also accept the `options` argument. With the help of this argument, you can specify:
  - resize algorithm (default: Lanczos3);
  - how to crop the source image;
  - whether to multiply the source image by the alpha channel and divide the destination image
    by the alpha channel. By default, `Resizer` multiplies and divides by alpha channel images
    with `U8x2`, `U8x4`, `U16x2` and `U16x4` pixels.
- The `resize_alg` argument was removed from the `Resizer::new()` method; use the `options`
  argument of the resize methods instead.
- The `MulDiv` implementation has been changed in the same way as `Resizer`. It now has
  two versions of each method: dynamic and typed.
- The type of image dimensions has been changed from `NonZeroU32` to `u32`. Now you can create
  and use zero-sized images.
- `Image` (the embedded implementation of an image container) was moved from the crate root
  into the `images` module.
- Added optional feature "image". It adds implementations of the `IntoImageView` and
  `IntoImageViewMut` traits for the
  [DynamicImage](https://docs.rs/image/latest/image/enum.DynamicImage.html) type from
  the `image` crate. This implementation allows you to use `DynamicImage` instances
  as arguments for methods of this crate.

Look at the difference between versions 3 and 4 using the example of resizing an RGBA8 image
from a given u8 buffer of pixel data.
3.x version:

```rust
use fast_image_resize::{Image, MulDiv, PixelType, Resizer};
use std::num::NonZeroU32;

fn my_resize(
    src_width: u32,
    src_height: u32,
    src_pixels: &mut [u8],
    dst_width: u32,
    dst_height: u32,
) -> Image {
    let src_width = NonZeroU32::new(src_width).unwrap();
    let src_height = NonZeroU32::new(src_height).unwrap();
    let src_image = Image::from_slice_u8(
        src_width,
        src_height,
        src_pixels,
        PixelType::U8x4,
    ).unwrap();

    // Multiply RGB channels of source image by alpha channel.
    let alpha_mul_div = MulDiv::default();
    let mut tmp_image = Image::new(
        src_width,
        src_height,
        PixelType::U8x4,
    );
    alpha_mul_div
        .multiply_alpha(
            &src_image.view(),
            &mut tmp_image.view_mut(),
        ).unwrap();

    // Create container for data of destination image.
    let dst_width = NonZeroU32::new(dst_width).unwrap();
    let dst_height = NonZeroU32::new(dst_height).unwrap();
    let mut dst_image = Image::new(
        dst_width,
        dst_height,
        PixelType::U8x4,
    );

    // Get mutable view of destination image data.
    let mut dst_view = dst_image.view_mut();

    // Create Resizer instance and resize source image
    // into buffer of destination image.
    let mut resizer = Resizer::default();
    resizer.resize(&tmp_image.view(), &mut dst_view).unwrap();

    // Divide RGB channels of destination image by alpha.
    alpha_mul_div.divide_alpha_inplace(&mut dst_view).unwrap();

    dst_image
}
```

4.x version:

```rust
use fast_image_resize::images::{Image, ImageRef};
use fast_image_resize::{PixelType, Resizer};

fn my_resize(
    src_width: u32,
    src_height: u32,
    src_pixels: &[u8],
    dst_width: u32,
    dst_height: u32,
) -> Image {
    let src_image = ImageRef::new(
        src_width,
        src_height,
        src_pixels,
        PixelType::U8x4,
    ).unwrap();

    // Create container for data of destination image.
    let mut dst_image = Image::new(
        dst_width,
        dst_height,
        PixelType::U8x4,
    );

    // Create Resizer instance and resize source image
    // into buffer of destination image.
    let mut resizer = Resizer::new();
    // By default, Resizer multiplies and divides by alpha channel
    // images with U8x2, U8x4, U16x2 and U16x4 pixels.
    resizer.resize(&src_image, &mut dst_image, None).unwrap();

    dst_image
}
```

## [3.0.4] - 2024-02-15

### Fixed

- Fixed error with incorrect cropping of source image.

## [3.0.3] - 2024-02-07

### Fixed

- Fixed version of `num-traits` in the `Cargo.toml`.

## [3.0.2] - 2024-02-07

### Added

- Added `Custom` variant for `FilterType` enum and corresponding `Filter` structure.
- **BREAKING**: Added a new variant of enum `CropBoxError::WidthOrHeightLessOrEqualToZero`.

### Changed

- Slightly improved (about 3%) speed of `AVX2` implementation of `Convolution` trait
  for `U8x3` and `U8x4` images.
- **BREAKING**: Changed internal data type for `U8x4` structure. Now it is `[u8; 4]` instead of `u32`.
- Significantly improved (4.5 times on `x86_64`) speed of vertical convolution pass implemented
  in native Rust for `U8`, `U8x2`, `U8x3` and `U8x4` images.
- Changed order of convolution passes for `U8`, `U8x2`, `U8x3` and `U8x4` images.
  Now the vertical pass is first and the horizontal pass is second.
- **BREAKING**: Type of the `CropBox` fields has been changed to `f64`. Now you can use
  fractional size and position of the crop box.
- **BREAKING**: Type of the `centering` argument of `ImageView::set_crop_box_to_fit_dst_size()`
  and `DynamicImageView::set_crop_box_to_fit_dst_size()` methods has been changed
  to `Option<(f64, f64)>`.
- **BREAKING**: The `crop_box` argument of `ImageViewMut::crop()` and `DynamicImageViewMut::crop()`
  methods has been replaced with separate `left`, `top`, `width` and `height` arguments.
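With the `CropBox` fields stored as `f64`, a crop box no longer has to land on whole pixels. Below is a minimal standalone sketch of the fit-to-destination arithmetic that a method like `set_crop_box_to_fit_dst_size()` performs conceptually; the struct and helper are hypothetical illustrations, not the crate's implementation:

```rust
/// Hypothetical stand-in for the crate's crop box with fractional fields.
struct CropBox {
    left: f64,
    top: f64,
    width: f64,
    height: f64,
}

/// Pick the largest crop of the source that matches the destination aspect ratio,
/// positioned by a centering point in 0.0..=1.0 ((0.5, 0.5) keeps it centred).
fn fit_crop_box(src_w: f64, src_h: f64, dst_w: f64, dst_h: f64, centering: (f64, f64)) -> CropBox {
    let scale = (src_w / dst_w).min(src_h / dst_h);
    let (width, height) = (dst_w * scale, dst_h * scale);
    CropBox {
        left: (src_w - width) * centering.0,
        top: (src_h - height) * centering.1,
        width,
        height,
    }
}

fn main() {
    // Crop a 1000x700 source to fit a 320x200 (16:10) destination.
    let b = fit_crop_box(1000.0, 700.0, 320.0, 200.0, (0.5, 0.5));
    // Prints: left=0, top=37.5, width=1000, height=625 — fractional positions are representable.
    println!("left={}, top={}, width={}, height={}", b.left, b.top, b.width, b.height);
}
```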
## [2.7.3] - 2023-05-07

### Fixed

- Fixed size of rows in cropped `ImageViewMut` created by the `ImageViewMut::crop` method
  ([#17](https://github.com/Cykooz/fast_image_resize/issues/17)).

## [2.7.2] - 2023-05-04

### Fixed

- Added use of `(read|write)_unaligned` for unaligned pointers on `arm64` and `wasm32`
  architectures ([#15](https://github.com/Cykooz/fast_image_resize/issues/15)).

## [2.7.1] - 2023-04-28

### Fixed

- Added use of `(read|write)_unaligned` for unaligned pointers on the `x86_64` architecture
  ([#16](https://github.com/Cykooz/fast_image_resize/pull/16)).

## [2.7.0] - 2023-03-24

### Added

- Added method `DynamicImageViewMut::crop()` to create a cropped version of `DynamicImageViewMut`
  ([#13](https://github.com/Cykooz/fast_image_resize/issues/13)).
- Added method `ImageViewMut::crop()` to create a cropped version of `ImageViewMut`.

## [2.6.0] - 2023-03-01

### Crate

- Slightly improved speed of the `Convolution` implementation for `U8x2` images
  with `Wasm32 SIMD128` instructions.
- Method `Image::buffer_mut()` was made public
  ([#14](https://github.com/Cykooz/fast_image_resize/pull/14)).

## [2.5.0] - 2023-01-29

### Crate

- Added support of optimisation with the help of `Wasm32 SIMD128` for all types of images
  except `I32` and `F32` (thanks to @cdmurph32,
  [#11](https://github.com/Cykooz/fast_image_resize/pull/11)).

### Benchmarks

- Benchmark framework `glassbench` replaced by `criterion`.
- Added report with results of benchmarks for the `wasm32-wasi` target.

## [2.4.0] - 2022-12-11

### Crate

- Slightly improved speed of the `MulDiv` implementation for `U8x2`, `U8x4`, `U16x2`
  and `U16x4` images.
- Added optimisation for processing `U16x2` images by `MulDiv` with the help of
  `NEON SIMD` instructions.
- Excluded possibility of unnecessary operations during resize of a cropped image
  by the convolution algorithm.
- Added implementation of the `From` trait to convert `ImageViewMut` into `ImageView`.
- Added implementation of the `From` trait to convert `DynamicImageViewMut` into `DynamicImageView`.

## [2.3.0] - 2022-11-25

### Crate

- Added support of optimisation with the help of `NEON SIMD` for convolution of `U16` images.
- Added support of optimisation with the help of `NEON SIMD` for convolution of `U16x2` images.
- Added support of optimisation with the help of `NEON SIMD` for convolution of `U16x3` images.
- Improved optimisation of convolution with the help of `NEON SIMD` for `U8` images.

## [2.2.0] - 2022-11-18

### Crate

- Added support of optimisation with the help of `NEON SIMD` for convolution of `U8` images.
- Added support of optimisation with the help of `NEON SIMD` for convolution of `U8x2` images.
- Added support of optimisation with the help of `NEON SIMD` for convolution of `U8x3` images.
- Added optimisation for processing `U8x2` images by `MulDiv` with the help of
  `NEON SIMD` instructions.

## [2.1.0] - 2022-11-11

### Crate

- Added method `CpuExtensions::is_supported(&self)`.
- Internals of `PixelComponentMapper` changed to use the heap to store its data.
- Added support of optimisation with the help of `NEON SIMD` for convolution of `U16x4` images.
- Added optimisation for processing `U16x4` images by `MulDiv` with the help of
  `NEON SIMD` instructions.
- Added full optimisation for convolution of `U8` images with the help of `SSE4.1` instructions.
- Fixed link to documentation page in the `README.md` file.
- Fixed error in implementation of `MulDiv::divide_alpha()` and `MulDiv::divide_alpha_inplace()`
  for `U16x4` pixels with optimisation with the help of `SSE4.1` and `AVX2`.
- Improved optimisation of `MulDiv` with the help of `NEON SIMD` for `U8x4` pixels.
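The `CpuExtensions::is_supported()` method added in 2.1.0 makes it possible to check at runtime whether a SIMD instruction set is available before forcing it on a `Resizer`. A small sketch, written against the later 4.x/5.x API (argument-less `Resizer::new()` and an `unsafe` `set_cpu_extensions()` setter); the variant name below is my assumption, so check the crate docs for the exact spelling:

```rust
use fast_image_resize::{CpuExtensions, Resizer};

fn main() {
    // By default, Resizer auto-detects the best available CPU extensions,
    // so an explicit override is only needed for testing or benchmarking.
    let mut resizer = Resizer::new();

    #[cfg(target_arch = "x86_64")]
    {
        let wanted = CpuExtensions::Avx2; // assumed variant name
        if wanted.is_supported() {
            // The setter is unsafe: forcing an instruction set that the CPU
            // does not actually support would crash at resize time.
            unsafe { resizer.set_cpu_extensions(wanted) };
        }
    }

    let _ = resizer;
}
```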
## [2.0.0] - 2022-10-28

### Crate

- Breaking changes:
  - Struct `ImageView` replaced by enum `DynamicImageView`.
  - Struct `ImageViewMut` replaced by enum `DynamicImageViewMut`.
  - Trait `Pixel` renamed into `PixelExt` and some of its internals changed:
    - associated type `ComponentsCount` renamed into `CountOfComponents`;
    - associated type `ComponentCountOfValues` deleted;
    - associated method `components_count` renamed into `count_of_components`;
    - associated method `component_count_of_values` renamed into `count_of_component_values`.
  - All pixel types (`U8`, `U8x2`, ...) replaced by type aliases for the new generic structure
    `Pixel`. Use the `new()` method to create an instance of one pixel.
- Added structure `PixelComponentMapper` that holds tables for mapping values of pixel
  components in forward and backward directions.
- Added function `create_gamma_22_mapper()` to create an instance of `PixelComponentMapper`
  that converts images with gamma 2.2 to linear colorspace and back.
- Added function `create_srgb_mapper()` to create an instance of `PixelComponentMapper`
  that converts images from the sRGB colorspace to linear RGB and back.
- Added generic structs `ImageView` and `ImageViewMut`.
- Added functions `change_type_of_pixel_components` and `change_type_of_pixel_components_dyn`
  that change the type of pixel components in a whole image.
- Added generic trait `IntoPixelComponent`.
- Added generic structure `Pixel` for creating all types of pixels.
- Added full support of optimisation with the help of `SSE4.1` for convolution of `U8x3` images.
- Added support of optimisation with the help of `NEON SIMD` for convolution of `U8x4` images.
- Added optimisation for processing `U8x4` images by `MulDiv` with the help of
  `NEON SIMD` instructions.

### Example application

- Added option `--high_precision` to use `u16` as pixel components for intermediate
  image representation.
- Added conversion of the source image into linear colorspace before it is resized.
  The destination image is converted back into the original colorspace before it is saved.

## [1.0.0] - 2022-07-24

- Added example of command line application "resizer".

## [0.9.7] - 2022-07-14

- Fixed resizing when the destination image has the same dimensions as the source image
  ([#9](https://github.com/Cykooz/fast_image_resize/issues/9)).

## [0.9.6] - 2022-06-28

- Added support of new type of pixels `PixelType::U16x4`.
- Fixed benchmarks for resizing images with an alpha channel using the `resizer` crate.
- Removed `image` crate from benchmarks for resizing images with alpha.
- Added method `Image::copy(&self) -> Image<'static>`.

## [0.9.5] - 2022-06-22

- Fixed README.md.

## [0.9.4] - 2022-06-22

- Added support of new type of pixels `PixelType::U16x2` (e.g. luma with alpha channel).

## [0.9.3] - 2022-05-31

- Added support of new type of pixels `PixelType::U16`.

## [0.9.2] - 2022-05-19

- Added optimisation for convolution of `U8x2` images with the help of `SSE4.1`.

## [0.9.1] - 2022-05-12

- Added optimisation for processing `U8x2` images by `MulDiv` with the help of `SSE4.1`
  and `AVX2` instructions.
- Added optimisation for convolution of `U16x2` images with the help of `AVX2` instructions.

## [0.9.0] - 2022-05-01

- Added support of new type of pixels `PixelType::U8x2`.
- Added support of images with pixel type `U8x2` into `MulDiv`.
- Added method `Image::into_vec(self) -> Vec`
  ([#7](https://github.com/Cykooz/fast_image_resize/pull/7)).

## [0.8.0] - 2022-03-23

- Added optimisation for convolution of U16x3 images with the help of `SSE4.1`
  and `AVX2` instructions.
- Added partial optimisation for convolution of U8 images with the help of `SSE4.1` instructions.
- Allowed creating an instance of `Image`, `ImageView` and `ImageViewMut` from a buffer
  larger than necessary ([#5](https://github.com/Cykooz/fast_image_resize/issues/5)).
- Breaking changes:
  - Removed methods: `Image::from_vec_u32()`, `Image::from_slice_u32()`.
  - Removed error `InvalidBufferSizeError`.

## [0.7.0] - 2022-01-27

- Added support of new type of pixels `PixelType::U16x3`.
- Breaking changes:
  - Added variant `U16x3` into the enum `PixelType`.

## [0.6.0] - 2022-01-12

- Added optimisation of multiplying and dividing an image by its alpha channel with the help
  of `SSE4.1` instructions.
- Improved performance of dividing an image by its alpha channel without forced SIMD instructions.
- Breaking changes:
  - Deleted variant `SSE2` from enum `CpuExtensions`.

## [0.5.3] - 2021-12-14

- Added optimisation of convolution of U8x3 images with the help of `AVX2` instructions.
- Fixed error in code for convolution of U8x4 images with the help of `SSE4.1` instructions.
- Fixed error in code for convolution of U8 images with the help of `AVX2` instructions.

## [0.5.2] - 2021-11-26

- Fixed compile errors on non-x86 architectures.

## [0.5.1] - 2021-11-24

- Fixed compile errors on non-x86 architectures.

## [0.5.0] - 2021-11-18

- Added support of new type of pixels `PixelType::U8x3` (with auto-vectorization for SSE4.1).
- Exposed module `fast_image_resize::pixels` with types `U8x3`, `U8x4`, `F32`, `I32`, `U8`
  used as wrappers to represent the type of one pixel of an image.
- Some optimisations in the convolution code written in Rust (without intrinsics for SIMD).
- Breaking changes:
  - Added variant `U8x3` into the enum `PixelType`.
  - Changed internal tuple structures inside the variants of the `ImageRows`
    and `ImageRowsMut` enums.

## [0.4.1] - 2021-11-13

- Added optimisation of convolution of grayscale images (U8) with the help of `AVX2` instructions.

## [0.4.0] - 2021-10-23

- Added support of new type of pixels `PixelType::U8` (without forced SIMD).
- Breaking changes:
  - `ImageData` renamed into `Image`.
  - `SrcImageView` and `DstImageView` replaced by `ImageView` and `ImageViewMut`.
  - Method `Resizer.resize()` now returns `Result<(), DifferentTypesOfPixelsError>`.

## [0.3.1] - 2021-10-09

- Added support of compilation for architectures other than x86_64.

## [0.3.0] - 2021-08-28

- Added method `SrcImageView.set_crop_box_to_fit_dst_size()`.
- Fixed out-of-bounds error during resize with cropping.
- Refactored `ImageData`:
  - Added methods: `from_vec_u32()`, `from_vec_u8()`, `from_slice_u32()`, `from_slice_u8()`.
  - Removed methods: `from_buffer()`, `from_pixels()`.

## [0.2.0] - 2021-08-02

- Fixed typo in name of CatmullRom filter type.

fast_image_resize-5.3.0/Cargo.lock

# This file is automatically @generated by Cargo.
# It is not intended for manual editing.
version = 4 [[package]] name = "adler2" version = "2.0.1" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "320119579fcad9c21884f5c4861d16174d0e06250625266f50fe6898340abefa" [[package]] name = "aho-corasick" version = "1.1.3" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "8e60d3430d3a69478ad0993f19238d2df97c507009a52b3c10addcd7f6bcb916" dependencies = [ "memchr", ] [[package]] name = "aligned-vec" version = "0.6.4" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "dc890384c8602f339876ded803c97ad529f3842aba97f6392b3dba0dd171769b" dependencies = [ "equator", ] [[package]] name = "android-tzdata" version = "0.1.1" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "e999941b234f3131b00bc13c22d06e8c5ff726d1b6318ac7eb276997bbb4fef0" [[package]] name = "android_system_properties" version = "0.1.5" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "819e7219dbd41043ac279b19830f2efc897156490d7fd6ea916720117ee66311" dependencies = [ "libc", ] [[package]] name = "anes" version = "0.1.6" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "4b46cbb362ab8752921c97e041f5e366ee6297bd428a31275b9fcf1e380f7299" [[package]] name = "anstyle" version = "1.0.11" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "862ed96ca487e809f1c8e5a8447f6ee2cf102f846893800b20cebdf541fc6bbd" [[package]] name = "anyhow" version = "1.0.99" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "b0674a1ddeecb70197781e945de4b3b8ffb61fa939a5597bcf48503737663100" [[package]] name = "arbitrary" version = "1.4.2" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "c3d036a3c4ab069c7b410a2ce876bd74808d2d0888a82667669f8e783a898bf1" [[package]] name = "arg_enum_proc_macro" version = "0.3.4" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "0ae92a5119aa49cdbcf6b9f893fe4e1d98b04ccbf82ee0584ad948a44a734dea" dependencies = [ "proc-macro2", "quote", "syn", ] [[package]] name = "arrayvec" version = "0.7.6" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "7c02d123df017efcdfbd739ef81735b36c5ba83ec3c59c80a9d7ecc718f92e50" [[package]] name = "autocfg" version = "1.5.0" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "c08606f8c3cbf4ce6ec8e28fb0014a2c086708fe954eaa885384a6165172e7e8" [[package]] name = "av1-grain" version = "0.2.4" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "4f3efb2ca85bc610acfa917b5aaa36f3fcbebed5b3182d7f877b02531c4b80c8" dependencies = [ "anyhow", "arrayvec", "log", "nom", "num-rational", "v_frame", ] [[package]] name = "avif-serialize" version = "0.8.6" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "47c8fbc0f831f4519fe8b810b6a7a91410ec83031b8233f730a0480029f6a23f" dependencies = [ "arrayvec", ] [[package]] name = "bitflags" version = "1.3.2" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "bef38d45163c2f1dde094a7dfd33ccf595c92905c8f8f4fdc18d06fb1037718a" [[package]] name = "bitflags" version = "2.9.4" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "2261d10cca569e4643e526d8dc2e62e433cc8aba21ab764233731f8d369bf394" [[package]] name = "bitstream-io" version = "2.6.0" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "6099cdc01846bc367c4e7dd630dc5966dccf36b652fae7a74e17b640411a91b2" 
[[package]] name = "block-buffer" version = "0.10.4" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "3078c7629b62d3f0439517fa394996acacc5cbc91c5a20d8c658e77abd503a71" dependencies = [ "generic-array", ] [[package]] name = "bstr" version = "1.12.0" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "234113d19d0d7d613b40e86fb654acf958910802bcceab913a4f9e7cda03b1a4" dependencies = [ "memchr", "serde", ] [[package]] name = "built" version = "0.7.7" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "56ed6191a7e78c36abdb16ab65341eefd73d64d303fffccdbb00d51e4205967b" [[package]] name = "bumpalo" version = "3.19.0" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "46c5e41b57b8bba42a04676d81cb89e9ee8e859a1a66f80a5a72e1cb76b34d43" [[package]] name = "bytemuck" version = "1.23.2" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "3995eaeebcdf32f91f980d360f78732ddc061097ab4e39991ae7a6ace9194677" [[package]] name = "byteorder-lite" version = "0.1.0" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "8f1fe948ff07f4bd06c30984e69f5b4899c516a3ef74f34df92a2df2ab535495" [[package]] name = "cast" version = "0.3.0" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "37b2a672a2cb129a2e41c10b1224bb368f9f37a2b16b612598138befd7b37eb5" [[package]] name = "cc" version = "1.2.35" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "590f9024a68a8c40351881787f1934dc11afd69090f5edb6831464694d836ea3" dependencies = [ "find-msvc-tools", "jobserver", "libc", "shlex", ] [[package]] name = "cfg-expr" version = "0.15.8" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "d067ad48b8650848b989a59a86c6c36a995d02d2bf778d45c3c5d57bc2718f02" dependencies = [ "smallvec", "target-lexicon", ] [[package]] name = "cfg-if" version = "1.0.3" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "2fd1289c04a9ea8cb22300a459a72a385d7c73d3259e2ed7dcb2af674838cfa9" [[package]] name = "cfg_aliases" version = "0.2.1" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "613afe47fcd5fac7ccf1db93babcb082c5994d996f20b8b159f2ad1658eb5724" [[package]] name = "chrono" version = "0.4.41" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "c469d952047f47f91b68d1cba3f10d63c11d73e4636f24f08daf0278abf01c4d" dependencies = [ "android-tzdata", "iana-time-zone", "num-traits", "windows-link", ] [[package]] name = "chrono-tz" version = "0.9.0" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "93698b29de5e97ad0ae26447b344c482a7284c737d9ddc5f9e52b74a336671bb" dependencies = [ "chrono", "chrono-tz-build", "phf", ] [[package]] name = "chrono-tz-build" version = "0.3.0" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "0c088aee841df9c3041febbb73934cfc39708749bf96dc827e3359cd39ef11b1" dependencies = [ "parse-zoneinfo", "phf", "phf_codegen", ] [[package]] name = "ciborium" version = "0.2.2" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "42e69ffd6f0917f5c029256a24d0161db17cea3997d185db0d35926308770f0e" dependencies = [ "ciborium-io", "ciborium-ll", "serde", ] [[package]] name = "ciborium-io" version = "0.2.2" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "05afea1e0a06c9be33d539b876f1ce3692f4afea2cb41f740e7743225ed1c757" [[package]] name = "ciborium-ll" version 
= "0.2.2" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "57663b653d948a338bfb3eeba9bb2fd5fcfaecb9e199e87e1eda4d9e8b240fd9" dependencies = [ "ciborium-io", "half", ] [[package]] name = "clap" version = "4.5.47" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "7eac00902d9d136acd712710d71823fb8ac8004ca445a89e73a41d45aa712931" dependencies = [ "clap_builder", ] [[package]] name = "clap_builder" version = "4.5.47" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "2ad9bbf750e73b5884fb8a211a9424a1906c1e156724260fdae972f31d70e1d6" dependencies = [ "anstyle", "clap_lex", ] [[package]] name = "clap_lex" version = "0.7.5" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "b94f61472cee1439c0b966b47e3aca9ae07e45d070759512cd390ea2bebc6675" [[package]] name = "core-foundation-sys" version = "0.8.7" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "773648b94d0e5d620f64f280777445740e61fe701025087ec8b57f45c791888b" [[package]] name = "cpufeatures" version = "0.2.17" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "59ed5838eebb26a2bb2e58f6d5b5316989ae9d08bab10e0e6d103e656d1b0280" dependencies = [ "libc", ] [[package]] name = "crc32fast" version = "1.5.0" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "9481c1c90cbf2ac953f07c8d4a58aa3945c425b7185c9154d67a65e4230da511" dependencies = [ "cfg-if", ] [[package]] name = "criterion" version = "0.7.0" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "e1c047a62b0cc3e145fa84415a3191f628e980b194c2755aa12300a4e6cbd928" dependencies = [ "anes", "cast", "ciborium", "clap", "criterion-plot", "itertools 0.13.0", "num-traits", "oorandom", "regex", "serde", "serde_json", "tinytemplate", "walkdir", ] [[package]] name = "criterion-plot" version = "0.6.0" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "9b1bcc0dc7dfae599d84ad0b1a55f80cde8af3725da8313b528da95ef783e338" dependencies = [ "cast", "itertools 0.13.0", ] [[package]] name = "crossbeam-deque" version = "0.8.6" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "9dd111b7b7f7d55b72c0a6ae361660ee5853c9af73f70c3c2ef6858b950e2e51" dependencies = [ "crossbeam-epoch", "crossbeam-utils", ] [[package]] name = "crossbeam-epoch" version = "0.9.18" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "5b82ac4a3c2ca9c3460964f020e1402edd5753411d7737aa39c3714ad1b5420e" dependencies = [ "crossbeam-utils", ] [[package]] name = "crossbeam-utils" version = "0.8.21" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "d0a5c400df2834b80a4c3327b3aad3a4c4cd4de0629063962b03235697506a28" [[package]] name = "crunchy" version = "0.2.4" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "460fbee9c2c2f33933d720630a6a0bac33ba7053db5344fac858d4b8952d77d5" [[package]] name = "crypto-common" version = "0.1.6" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "1bfb12502f3fc46cca1bb51ac28df9d618d813cdc3d2f25b9fe775a34af26bb3" dependencies = [ "generic-array", "typenum", ] [[package]] name = "deunicode" version = "1.6.2" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "abd57806937c9cc163efc8ea3910e00a62e2aeb0b8119f1793a978088f8f6b04" [[package]] name = "digest" version = "0.10.7" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = 
"9ed9a281f7bc9b7576e61468ba615a66a5c8cfdff42420a70aa82701a3b1e292" dependencies = [ "block-buffer", "crypto-common", ] [[package]] name = "document-features" version = "0.2.11" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "95249b50c6c185bee49034bcb378a49dc2b5dff0be90ff6616d31d64febab05d" dependencies = [ "litrs", ] [[package]] name = "either" version = "1.15.0" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "48c757948c5ede0e46177b7add2e67155f70e33c07fea8284df6576da70b3719" [[package]] name = "equator" version = "0.4.2" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "4711b213838dfee0117e3be6ac926007d7f433d7bbe33595975d4190cb07e6fc" dependencies = [ "equator-macro", ] [[package]] name = "equator-macro" version = "0.4.2" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "44f23cf4b44bfce11a86ace86f8a73ffdec849c9fd00a386a53d278bd9e81fb3" dependencies = [ "proc-macro2", "quote", "syn", ] [[package]] name = "equivalent" version = "1.0.2" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "877a4ace8713b0bcf2a4e7eec82529c029f1d0619886d18145fea96c3ffe5c0f" [[package]] name = "fast_image_resize" version = "5.3.0" dependencies = [ "bytemuck", "cfg-if", "criterion", "document-features", "image", "itertools 0.14.0", "libvips", "nix", "num-traits", "png 0.18.0", "rayon", "resize", "rgb", "serde", "serde_json", "tera", "thiserror 2.0.16", "walkdir", ] [[package]] name = "fdeflate" version = "0.3.7" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "1e6853b52649d4ac5c0bd02320cddc5ba956bdb407c4b75a2c6b75bf51500f8c" dependencies = [ "simd-adler32", ] [[package]] name = "find-msvc-tools" version = "0.1.0" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "e178e4fba8a2726903f6ba98a6d221e76f9c12c650d5dc0e6afdc50677b49650" [[package]] name = "flate2" version = "1.1.2" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "4a3d7db9596fecd151c5f638c0ee5d5bd487b6e0ea232e5dc96d5250f6f94b1d" dependencies = [ "crc32fast", "miniz_oxide", ] [[package]] name = "generic-array" version = "0.14.7" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "85649ca51fd72272d7821adaf274ad91c288277713d9c18820d8499a7ff69e9a" dependencies = [ "typenum", "version_check", ] [[package]] name = "getrandom" version = "0.2.16" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "335ff9f135e4384c8150d6f27c6daed433577f86b4750418338c01a1a2528592" dependencies = [ "cfg-if", "libc", "wasi 0.11.1+wasi-snapshot-preview1", ] [[package]] name = "getrandom" version = "0.3.3" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "26145e563e54f2cadc477553f1ec5ee650b00862f0a58bcd12cbdc5f0ea2d2f4" dependencies = [ "cfg-if", "libc", "r-efi", "wasi 0.14.3+wasi-0.2.4", ] [[package]] name = "globset" version = "0.4.16" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "54a1028dfc5f5df5da8a56a73e6c153c9a9708ec57232470703592a3f18e49f5" dependencies = [ "aho-corasick", "bstr", "log", "regex-automata", "regex-syntax", ] [[package]] name = "globwalk" version = "0.9.1" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "0bf760ebf69878d9fd8f110c89703d90ce35095324d1f1edcb595c63945ee757" dependencies = [ "bitflags 2.9.4", "ignore", "walkdir", ] [[package]] name = "half" version = "2.6.0" source = 
"registry+https://github.com/rust-lang/crates.io-index" checksum = "459196ed295495a68f7d7fe1d84f6c4b7ff0e21fe3017b2f283c6fac3ad803c9" dependencies = [ "cfg-if", "crunchy", ] [[package]] name = "hashbrown" version = "0.15.5" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "9229cfe53dfd69f0609a49f65461bd93001ea1ef889cd5529dd176593f5338a1" [[package]] name = "heck" version = "0.5.0" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "2304e00983f87ffb38b55b444b5e3b60a884b5d30c0fca7d82fe33449bbe55ea" [[package]] name = "humansize" version = "2.1.3" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "6cb51c9a029ddc91b07a787f1d86b53ccfa49b0e86688c946ebe8d3555685dd7" dependencies = [ "libm", ] [[package]] name = "iana-time-zone" version = "0.1.63" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "b0c919e5debc312ad217002b8048a17b7d83f80703865bbfcfebb0458b0b27d8" dependencies = [ "android_system_properties", "core-foundation-sys", "iana-time-zone-haiku", "js-sys", "log", "wasm-bindgen", "windows-core", ] [[package]] name = "iana-time-zone-haiku" version = "0.1.2" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "f31827a206f56af32e590ba56d5d2d085f558508192593743f16b2306495269f" dependencies = [ "cc", ] [[package]] name = "ignore" version = "0.4.23" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "6d89fd380afde86567dfba715db065673989d6253f42b88179abd3eae47bda4b" dependencies = [ "crossbeam-deque", "globset", "log", "memchr", "regex-automata", "same-file", "walkdir", "winapi-util", ] [[package]] name = "image" version = "0.25.6" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "db35664ce6b9810857a38a906215e75a9c879f0696556a39f59c62829710251a" dependencies = [ "bytemuck", "byteorder-lite", "num-traits", "png 0.17.16", "ravif", "rayon", ] [[package]] name = "imgref" version = "1.11.0" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "d0263a3d970d5c054ed9312c0057b4f3bde9c0b33836d3637361d4a9e6e7a408" [[package]] name = "indexmap" version = "2.11.0" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "f2481980430f9f78649238835720ddccc57e52df14ffce1c6f37391d61b563e9" dependencies = [ "equivalent", "hashbrown", ] [[package]] name = "interpolate_name" version = "0.2.4" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "c34819042dc3d3971c46c2190835914dfbe0c3c13f61449b2997f4e9722dfa60" dependencies = [ "proc-macro2", "quote", "syn", ] [[package]] name = "itertools" version = "0.12.1" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "ba291022dbbd398a455acf126c1e341954079855bc60dfdda641363bd6922569" dependencies = [ "either", ] [[package]] name = "itertools" version = "0.13.0" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "413ee7dfc52ee1a4949ceeb7dbc8a33f2d6c088194d9f922fb8318faf1f01186" dependencies = [ "either", ] [[package]] name = "itertools" version = "0.14.0" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "2b192c782037fadd9cfa75548310488aabdbf3d2da73885b31bd0abd03351285" dependencies = [ "either", ] [[package]] name = "itoa" version = "1.0.15" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "4a5f13b858c8d314ee3e8f639011f7ccefe71f97f96e50151fb991f267928e2c" [[package]] name = "jobserver" version = "0.1.34" source = 
"registry+https://github.com/rust-lang/crates.io-index" checksum = "9afb3de4395d6b3e67a780b6de64b51c978ecf11cb9a462c66be7d4ca9039d33" dependencies = [ "getrandom 0.3.3", "libc", ] [[package]] name = "js-sys" version = "0.3.77" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "1cfaf33c695fc6e08064efbc1f72ec937429614f25eef83af942d0e227c3a28f" dependencies = [ "once_cell", "wasm-bindgen", ] [[package]] name = "lazy_static" version = "1.5.0" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "bbd2bcb4c963f2ddae06a2efc7e9f3591312473c50c6685e1f298068316e66fe" [[package]] name = "libc" version = "0.2.175" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "6a82ae493e598baaea5209805c49bbf2ea7de956d50d7da0da1164f9c6d28543" [[package]] name = "libfuzzer-sys" version = "0.4.10" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "5037190e1f70cbeef565bd267599242926f724d3b8a9f510fd7e0b540cfa4404" dependencies = [ "arbitrary", "cc", ] [[package]] name = "libm" version = "0.2.15" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "f9fbbcab51052fe104eb5e5d351cf728d30a5be1fe14d9be8a3b097481fb97de" [[package]] name = "libvips" version = "1.7.1" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "d9930af49f9da4f576a2ec52ed8d767f6389f0c8cadfcdeaccb9c7b75ca83f8a" dependencies = [ "num-derive", "num-traits", ] [[package]] name = "litrs" version = "0.4.2" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "f5e54036fe321fd421e10d732f155734c4e4afd610dd556d9a82833ab3ee0bed" [[package]] name = "log" version = "0.4.27" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "13dc2df351e3202783a1fe0d44375f7295ffb4049267b0f3018346dc122a1d94" [[package]] name = "loop9" version = "0.1.5" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "0fae87c125b03c1d2c0150c90365d7d6bcc53fb73a9acaef207d2d065860f062" dependencies = [ "imgref", ] [[package]] name = "maybe-rayon" version = "0.1.1" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "8ea1f30cedd69f0a2954655f7188c6a834246d2bcf1e315e2ac40c4b24dc9519" dependencies = [ "cfg-if", "rayon", ] [[package]] name = "memchr" version = "2.7.5" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "32a282da65faaf38286cf3be983213fcf1d2e2a58700e808f83f4ea9a4804bc0" [[package]] name = "minimal-lexical" version = "0.2.1" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "68354c5c6bd36d73ff3feceb05efa59b6acb7626617f4962be322a825e61f79a" [[package]] name = "miniz_oxide" version = "0.8.9" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "1fa76a2c86f704bdb222d66965fb3d63269ce38518b83cb0575fca855ebb6316" dependencies = [ "adler2", "simd-adler32", ] [[package]] name = "new_debug_unreachable" version = "1.0.6" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "650eef8c711430f1a879fdd01d4745a7deea475becfb90269c06775983bbf086" [[package]] name = "nix" version = "0.30.1" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "74523f3a35e05aba87a1d978330aef40f67b0304ac79c1c00b294c9830543db6" dependencies = [ "bitflags 2.9.4", "cfg-if", "cfg_aliases", "libc", ] [[package]] name = "nom" version = "7.1.3" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = 
"d273983c5a657a70a3e8f2a01329822f3b8c8172b73826411a55751e404a0a4a" dependencies = [ "memchr", "minimal-lexical", ] [[package]] name = "noop_proc_macro" version = "0.3.0" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "0676bb32a98c1a483ce53e500a81ad9c3d5b3f7c920c28c24e9cb0980d0b5bc8" [[package]] name = "num-bigint" version = "0.4.6" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "a5e44f723f1133c9deac646763579fdb3ac745e418f2a7af9cd0c431da1f20b9" dependencies = [ "num-integer", "num-traits", ] [[package]] name = "num-derive" version = "0.4.2" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "ed3955f1a9c7c0c15e092f9c887db08b1fc683305fdf6eb6684f22555355e202" dependencies = [ "proc-macro2", "quote", "syn", ] [[package]] name = "num-integer" version = "0.1.46" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "7969661fd2958a5cb096e56c8e1ad0444ac2bbcd0061bd28660485a44879858f" dependencies = [ "num-traits", ] [[package]] name = "num-rational" version = "0.4.2" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "f83d14da390562dca69fc84082e73e548e1ad308d24accdedd2720017cb37824" dependencies = [ "num-bigint", "num-integer", "num-traits", ] [[package]] name = "num-traits" version = "0.2.19" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "071dfc062690e90b734c0b2273ce72ad0ffa95f0c74596bc250dcfd960262841" dependencies = [ "autocfg", ] [[package]] name = "once_cell" version = "1.21.3" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "42f5e15c9953c5e4ccceeb2e7382a716482c34515315f7b03532b8b4e8393d2d" [[package]] name = "oorandom" version = "11.1.5" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "d6790f58c7ff633d8771f42965289203411a5e5c68388703c06e14f24770b41e" [[package]] name = "parse-zoneinfo" version = "0.3.1" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "1f2a05b18d44e2957b88f96ba460715e295bc1d7510468a2f3d3b44535d26c24" dependencies = [ "regex", ] [[package]] name = "paste" version = "1.0.15" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "57c0d7b74b563b49d38dae00a0c37d4d6de9b432382b2892f0574ddcae73fd0a" [[package]] name = "percent-encoding" version = "2.3.2" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "9b4f627cb1b25917193a259e49bdad08f671f8d9708acfd5fe0a8c1455d87220" [[package]] name = "pest" version = "2.8.1" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "1db05f56d34358a8b1066f67cbb203ee3e7ed2ba674a6263a1d5ec6db2204323" dependencies = [ "memchr", "thiserror 2.0.16", "ucd-trie", ] [[package]] name = "pest_derive" version = "2.8.1" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "bb056d9e8ea77922845ec74a1c4e8fb17e7c218cc4fc11a15c5d25e189aa40bc" dependencies = [ "pest", "pest_generator", ] [[package]] name = "pest_generator" version = "2.8.1" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "87e404e638f781eb3202dc82db6760c8ae8a1eeef7fb3fa8264b2ef280504966" dependencies = [ "pest", "pest_meta", "proc-macro2", "quote", "syn", ] [[package]] name = "pest_meta" version = "2.8.1" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "edd1101f170f5903fde0914f899bb503d9ff5271d7ba76bbb70bea63690cc0d5" dependencies = [ "pest", "sha2", ] [[package]] name = "phf" version = "0.11.3" source = 
"registry+https://github.com/rust-lang/crates.io-index" checksum = "1fd6780a80ae0c52cc120a26a1a42c1ae51b247a253e4e06113d23d2c2edd078" dependencies = [ "phf_shared", ] [[package]] name = "phf_codegen" version = "0.11.3" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "aef8048c789fa5e851558d709946d6d79a8ff88c0440c587967f8e94bfb1216a" dependencies = [ "phf_generator", "phf_shared", ] [[package]] name = "phf_generator" version = "0.11.3" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "3c80231409c20246a13fddb31776fb942c38553c51e871f8cbd687a4cfb5843d" dependencies = [ "phf_shared", "rand", ] [[package]] name = "phf_shared" version = "0.11.3" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "67eabc2ef2a60eb7faa00097bd1ffdb5bd28e62bf39990626a582201b7a754e5" dependencies = [ "siphasher", ] [[package]] name = "pkg-config" version = "0.3.32" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "7edddbd0b52d732b21ad9a5fab5c704c14cd949e5e9a1ec5929a24fded1b904c" [[package]] name = "png" version = "0.17.16" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "82151a2fc869e011c153adc57cf2789ccb8d9906ce52c0b39a6b5697749d7526" dependencies = [ "bitflags 1.3.2", "crc32fast", "fdeflate", "flate2", "miniz_oxide", ] [[package]] name = "png" version = "0.18.0" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "97baced388464909d42d89643fe4361939af9b7ce7a31ee32a168f832a70f2a0" dependencies = [ "bitflags 2.9.4", "crc32fast", "fdeflate", "flate2", "miniz_oxide", ] [[package]] name = "ppv-lite86" version = "0.2.21" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "85eae3c4ed2f50dcfe72643da4befc30deadb458a9b590d720cde2f2b1e97da9" dependencies = [ "zerocopy", ] [[package]] name = "proc-macro2" version = "1.0.101" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "89ae43fd86e4158d6db51ad8e2b80f313af9cc74f5c0e03ccb87de09998732de" dependencies = [ "unicode-ident", ] [[package]] name = "profiling" version = "1.0.17" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "3eb8486b569e12e2c32ad3e204dbaba5e4b5b216e9367044f25f1dba42341773" dependencies = [ "profiling-procmacros", ] [[package]] name = "profiling-procmacros" version = "1.0.17" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "52717f9a02b6965224f95ca2a81e2e0c5c43baacd28ca057577988930b6c3d5b" dependencies = [ "quote", "syn", ] [[package]] name = "quick-error" version = "2.0.1" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "a993555f31e5a609f617c12db6250dedcac1b0a85076912c436e6fc9b2c8e6a3" [[package]] name = "quote" version = "1.0.40" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "1885c039570dc00dcb4ff087a89e185fd56bae234ddc7f056a945bf36467248d" dependencies = [ "proc-macro2", ] [[package]] name = "r-efi" version = "5.3.0" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "69cdb34c158ceb288df11e18b4bd39de994f6657d83847bdffdbd7f346754b0f" [[package]] name = "rand" version = "0.8.5" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "34af8d1a0e25924bc5b7c43c079c942339d8f0a8b57c39049bef581b46327404" dependencies = [ "libc", "rand_chacha", "rand_core", ] [[package]] name = "rand_chacha" version = "0.3.1" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = 
"e6c10a63a0fa32252be49d21e7709d4d4baf8d231c2dbce1eaa8141b9b127d88" dependencies = [ "ppv-lite86", "rand_core", ] [[package]] name = "rand_core" version = "0.6.4" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "ec0be4795e2f6a28069bec0b5ff3e2ac9bafc99e6a9a7dc3547996c5c816922c" dependencies = [ "getrandom 0.2.16", ] [[package]] name = "rav1e" version = "0.7.1" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "cd87ce80a7665b1cce111f8a16c1f3929f6547ce91ade6addf4ec86a8dda5ce9" dependencies = [ "arbitrary", "arg_enum_proc_macro", "arrayvec", "av1-grain", "bitstream-io", "built", "cfg-if", "interpolate_name", "itertools 0.12.1", "libc", "libfuzzer-sys", "log", "maybe-rayon", "new_debug_unreachable", "noop_proc_macro", "num-derive", "num-traits", "once_cell", "paste", "profiling", "rand", "rand_chacha", "simd_helpers", "system-deps", "thiserror 1.0.69", "v_frame", "wasm-bindgen", ] [[package]] name = "ravif" version = "0.11.20" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "5825c26fddd16ab9f515930d49028a630efec172e903483c94796cfe31893e6b" dependencies = [ "avif-serialize", "imgref", "loop9", "quick-error", "rav1e", "rayon", "rgb", ] [[package]] name = "rayon" version = "1.11.0" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "368f01d005bf8fd9b1206fb6fa653e6c4a81ceb1466406b81792d87c5677a58f" dependencies = [ "either", "rayon-core", ] [[package]] name = "rayon-core" version = "1.13.0" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "22e18b0f0062d30d4230b2e85ff77fdfe4326feb054b9783a3460d8435c8ab91" dependencies = [ "crossbeam-deque", "crossbeam-utils", ] [[package]] name = "regex" version = "1.11.2" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "23d7fd106d8c02486a8d64e778353d1cffe08ce79ac2e82f540c86d0facf6912" dependencies = [ "aho-corasick", "memchr", "regex-automata", "regex-syntax", ] [[package]] name = "regex-automata" version = "0.4.10" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "6b9458fa0bfeeac22b5ca447c63aaf45f28439a709ccd244698632f9aa6394d6" dependencies = [ "aho-corasick", "memchr", "regex-syntax", ] [[package]] name = "regex-syntax" version = "0.8.6" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "caf4aa5b0f434c91fe5c7f1ecb6a5ece2130b02ad2a590589dda5146df959001" [[package]] name = "resize" version = "0.8.8" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "87a103d0b47e783f4579149402f7499397ab25540c7a57b2f70487a5d2d20ef0" dependencies = [ "rayon", "rgb", ] [[package]] name = "rgb" version = "0.8.52" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "0c6a884d2998352bb4daf0183589aec883f16a6da1f4dde84d8e2e9a5409a1ce" dependencies = [ "bytemuck", ] [[package]] name = "rustversion" version = "1.0.22" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "b39cdef0fa800fc44525c84ccb54a029961a8215f9619753635a9c0d2538d46d" [[package]] name = "ryu" version = "1.0.20" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "28d3b2b1366ec20994f1fd18c3c594f05c5dd4bc44d8bb0c1c632c8d6829481f" [[package]] name = "same-file" version = "1.0.6" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "93fc1dc3aaa9bfed95e02e6eadabb4baf7e3078b0bd1b4d7b6b0b68378900502" dependencies = [ "winapi-util", ] [[package]] name = "serde" version = "1.0.219" source = 
"registry+https://github.com/rust-lang/crates.io-index" checksum = "5f0e2c6ed6606019b4e29e69dbaba95b11854410e5347d525002456dbbb786b6" dependencies = [ "serde_derive", ] [[package]] name = "serde_derive" version = "1.0.219" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "5b0276cf7f2c73365f7157c8123c21cd9a50fbbd844757af28ca1f5925fc2a00" dependencies = [ "proc-macro2", "quote", "syn", ] [[package]] name = "serde_json" version = "1.0.143" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "d401abef1d108fbd9cbaebc3e46611f4b1021f714a0597a71f41ee463f5f4a5a" dependencies = [ "itoa", "memchr", "ryu", "serde", ] [[package]] name = "serde_spanned" version = "0.6.9" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "bf41e0cfaf7226dca15e8197172c295a782857fcb97fad1808a166870dee75a3" dependencies = [ "serde", ] [[package]] name = "sha2" version = "0.10.9" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "a7507d819769d01a365ab707794a4084392c824f54a7a6a7862f8c3d0892b283" dependencies = [ "cfg-if", "cpufeatures", "digest", ] [[package]] name = "shlex" version = "1.3.0" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "0fda2ff0d084019ba4d7c6f371c95d8fd75ce3524c3cb8fb653a3023f6323e64" [[package]] name = "simd-adler32" version = "0.3.7" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "d66dc143e6b11c1eddc06d5c423cfc97062865baf299914ab64caa38182078fe" [[package]] name = "simd_helpers" version = "0.1.0" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "95890f873bec569a0362c235787f3aca6e1e887302ba4840839bcc6459c42da6" dependencies = [ "quote", ] [[package]] name = "siphasher" version = "1.0.1" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "56199f7ddabf13fe5074ce809e7d3f42b42ae711800501b5b16ea82ad029c39d" [[package]] name = "slug" version = "0.1.6" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "882a80f72ee45de3cc9a5afeb2da0331d58df69e4e7d8eeb5d3c7784ae67e724" dependencies = [ "deunicode", "wasm-bindgen", ] [[package]] name = "smallvec" version = "1.15.1" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "67b1b7a3b5fe4f1376887184045fcf45c69e92af734b7aaddc05fb777b6fbd03" [[package]] name = "syn" version = "2.0.106" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "ede7c438028d4436d71104916910f5bb611972c5cfd7f89b8300a8186e6fada6" dependencies = [ "proc-macro2", "quote", "unicode-ident", ] [[package]] name = "system-deps" version = "6.2.2" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "a3e535eb8dded36d55ec13eddacd30dec501792ff23a0b1682c38601b8cf2349" dependencies = [ "cfg-expr", "heck", "pkg-config", "toml", "version-compare", ] [[package]] name = "target-lexicon" version = "0.12.16" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "61c41af27dd6d1e27b1b16b489db798443478cef1f06a660c96db617ba5de3b1" [[package]] name = "tera" version = "1.20.0" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "ab9d851b45e865f178319da0abdbfe6acbc4328759ff18dafc3a41c16b4cd2ee" dependencies = [ "chrono", "chrono-tz", "globwalk", "humansize", "lazy_static", "percent-encoding", "pest", "pest_derive", "rand", "regex", "serde", "serde_json", "slug", "unic-segment", ] [[package]] name = "thiserror" version = "1.0.69" source = 
"registry+https://github.com/rust-lang/crates.io-index" checksum = "b6aaf5339b578ea85b50e080feb250a3e8ae8cfcdff9a461c9ec2904bc923f52" dependencies = [ "thiserror-impl 1.0.69", ] [[package]] name = "thiserror" version = "2.0.16" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "3467d614147380f2e4e374161426ff399c91084acd2363eaf549172b3d5e60c0" dependencies = [ "thiserror-impl 2.0.16", ] [[package]] name = "thiserror-impl" version = "1.0.69" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "4fee6c4efc90059e10f81e6d42c60a18f76588c3d74cb83a0b242a2b6c7504c1" dependencies = [ "proc-macro2", "quote", "syn", ] [[package]] name = "thiserror-impl" version = "2.0.16" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "6c5e1be1c48b9172ee610da68fd9cd2770e7a4056cb3fc98710ee6906f0c7960" dependencies = [ "proc-macro2", "quote", "syn", ] [[package]] name = "tinytemplate" version = "1.2.1" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "be4d6b5f19ff7664e8c98d03e2139cb510db9b0a60b55f8e8709b689d939b6bc" dependencies = [ "serde", "serde_json", ] [[package]] name = "toml" version = "0.8.23" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "dc1beb996b9d83529a9e75c17a1686767d148d70663143c7854d8b4a09ced362" dependencies = [ "serde", "serde_spanned", "toml_datetime", "toml_edit", ] [[package]] name = "toml_datetime" version = "0.6.11" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "22cddaf88f4fbc13c51aebbf5f8eceb5c7c5a9da2ac40a13519eb5b0a0e8f11c" dependencies = [ "serde", ] [[package]] name = "toml_edit" version = "0.22.27" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "41fe8c660ae4257887cf66394862d21dbca4a6ddd26f04a3560410406a2f819a" dependencies = [ "indexmap", "serde", "serde_spanned", "toml_datetime", "winnow", ] [[package]] name = "typenum" version = "1.18.0" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "1dccffe3ce07af9386bfd29e80c0ab1a8205a2fc34e4bcd40364df902cfa8f3f" [[package]] name = "ucd-trie" version = "0.1.7" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "2896d95c02a80c6d6a5d6e953d479f5ddf2dfdb6a244441010e373ac0fb88971" [[package]] name = "unic-char-property" version = "0.9.0" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "a8c57a407d9b6fa02b4795eb81c5b6652060a15a7903ea981f3d723e6c0be221" dependencies = [ "unic-char-range", ] [[package]] name = "unic-char-range" version = "0.9.0" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "0398022d5f700414f6b899e10b8348231abf9173fa93144cbc1a43b9793c1fbc" [[package]] name = "unic-common" version = "0.9.0" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "80d7ff825a6a654ee85a63e80f92f054f904f21e7d12da4e22f9834a4aaa35bc" [[package]] name = "unic-segment" version = "0.9.0" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "e4ed5d26be57f84f176157270c112ef57b86debac9cd21daaabbe56db0f88f23" dependencies = [ "unic-ucd-segment", ] [[package]] name = "unic-ucd-segment" version = "0.9.0" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "2079c122a62205b421f499da10f3ee0f7697f012f55b675e002483c73ea34700" dependencies = [ "unic-char-property", "unic-char-range", "unic-ucd-version", ] [[package]] name = "unic-ucd-version" version = "0.9.0" source = 
"registry+https://github.com/rust-lang/crates.io-index" checksum = "96bd2f2237fe450fcd0a1d2f5f4e91711124f7857ba2e964247776ebeeb7b0c4" dependencies = [ "unic-common", ] [[package]] name = "unicode-ident" version = "1.0.18" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "5a5f39404a5da50712a4c1eecf25e90dd62b613502b7e925fd4e4d19b5c96512" [[package]] name = "v_frame" version = "0.3.9" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "666b7727c8875d6ab5db9533418d7c764233ac9c0cff1d469aec8fa127597be2" dependencies = [ "aligned-vec", "num-traits", "wasm-bindgen", ] [[package]] name = "version-compare" version = "0.2.0" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "852e951cb7832cb45cb1169900d19760cfa39b82bc0ea9c0e5a14ae88411c98b" [[package]] name = "version_check" version = "0.9.5" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "0b928f33d975fc6ad9f86c8f283853ad26bdd5b10b7f1542aa2fa15e2289105a" [[package]] name = "walkdir" version = "2.5.0" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "29790946404f91d9c5d06f9874efddea1dc06c5efe94541a7d6863108e3a5e4b" dependencies = [ "same-file", "winapi-util", ] [[package]] name = "wasi" version = "0.11.1+wasi-snapshot-preview1" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "ccf3ec651a847eb01de73ccad15eb7d99f80485de043efb2f370cd654f4ea44b" [[package]] name = "wasi" version = "0.14.3+wasi-0.2.4" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "6a51ae83037bdd272a9e28ce236db8c07016dd0d50c27038b3f407533c030c95" dependencies = [ "wit-bindgen", ] [[package]] name = "wasm-bindgen" version = "0.2.100" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "1edc8929d7499fc4e8f0be2262a241556cfc54a0bea223790e71446f2aab1ef5" dependencies = [ "cfg-if", "once_cell", "rustversion", "wasm-bindgen-macro", ] [[package]] name = "wasm-bindgen-backend" version = "0.2.100" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "2f0a0651a5c2bc21487bde11ee802ccaf4c51935d0d3d42a6101f98161700bc6" dependencies = [ "bumpalo", "log", "proc-macro2", "quote", "syn", "wasm-bindgen-shared", ] [[package]] name = "wasm-bindgen-macro" version = "0.2.100" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "7fe63fc6d09ed3792bd0897b314f53de8e16568c2b3f7982f468c0bf9bd0b407" dependencies = [ "quote", "wasm-bindgen-macro-support", ] [[package]] name = "wasm-bindgen-macro-support" version = "0.2.100" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "8ae87ea40c9f689fc23f209965b6fb8a99ad69aeeb0231408be24920604395de" dependencies = [ "proc-macro2", "quote", "syn", "wasm-bindgen-backend", "wasm-bindgen-shared", ] [[package]] name = "wasm-bindgen-shared" version = "0.2.100" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "1a05d73b933a847d6cccdda8f838a22ff101ad9bf93e33684f39c1f5f0eece3d" dependencies = [ "unicode-ident", ] [[package]] name = "winapi-util" version = "0.1.10" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "0978bf7171b3d90bac376700cb56d606feb40f251a475a5d6634613564460b22" dependencies = [ "windows-sys", ] [[package]] name = "windows-core" version = "0.61.2" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "c0fdd3ddb90610c7638aa2b3a3ab2904fb9e5cdbecc643ddb3647212781c4ae3" dependencies = [ "windows-implement", 
"windows-interface", "windows-link", "windows-result", "windows-strings", ] [[package]] name = "windows-implement" version = "0.60.0" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "a47fddd13af08290e67f4acabf4b459f647552718f683a7b415d290ac744a836" dependencies = [ "proc-macro2", "quote", "syn", ] [[package]] name = "windows-interface" version = "0.59.1" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "bd9211b69f8dcdfa817bfd14bf1c97c9188afa36f4750130fcdf3f400eca9fa8" dependencies = [ "proc-macro2", "quote", "syn", ] [[package]] name = "windows-link" version = "0.1.3" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "5e6ad25900d524eaabdbbb96d20b4311e1e7ae1699af4fb28c17ae66c80d798a" [[package]] name = "windows-result" version = "0.3.4" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "56f42bd332cc6c8eac5af113fc0c1fd6a8fd2aa08a0119358686e5160d0586c6" dependencies = [ "windows-link", ] [[package]] name = "windows-strings" version = "0.4.2" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "56e6c93f3a0c3b36176cb1327a4958a0353d5d166c2a35cb268ace15e91d3b57" dependencies = [ "windows-link", ] [[package]] name = "windows-sys" version = "0.60.2" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "f2f500e4d28234f72040990ec9d39e3a6b950f9f22d3dba18416c35882612bcb" dependencies = [ "windows-targets", ] [[package]] name = "windows-targets" version = "0.53.3" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "d5fe6031c4041849d7c496a8ded650796e7b6ecc19df1a431c1a363342e5dc91" dependencies = [ "windows-link", "windows_aarch64_gnullvm", "windows_aarch64_msvc", "windows_i686_gnu", "windows_i686_gnullvm", "windows_i686_msvc", "windows_x86_64_gnu", "windows_x86_64_gnullvm", "windows_x86_64_msvc", ] [[package]] name = "windows_aarch64_gnullvm" version = "0.53.0" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "86b8d5f90ddd19cb4a147a5fa63ca848db3df085e25fee3cc10b39b6eebae764" [[package]] name = "windows_aarch64_msvc" version = "0.53.0" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "c7651a1f62a11b8cbd5e0d42526e55f2c99886c77e007179efff86c2b137e66c" [[package]] name = "windows_i686_gnu" version = "0.53.0" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "c1dc67659d35f387f5f6c479dc4e28f1d4bb90ddd1a5d3da2e5d97b42d6272c3" [[package]] name = "windows_i686_gnullvm" version = "0.53.0" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "9ce6ccbdedbf6d6354471319e781c0dfef054c81fbc7cf83f338a4296c0cae11" [[package]] name = "windows_i686_msvc" version = "0.53.0" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "581fee95406bb13382d2f65cd4a908ca7b1e4c2f1917f143ba16efe98a589b5d" [[package]] name = "windows_x86_64_gnu" version = "0.53.0" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "2e55b5ac9ea33f2fc1716d1742db15574fd6fc8dadc51caab1c16a3d3b4190ba" [[package]] name = "windows_x86_64_gnullvm" version = "0.53.0" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "0a6e035dd0599267ce1ee132e51c27dd29437f63325753051e71dd9e42406c57" [[package]] name = "windows_x86_64_msvc" version = "0.53.0" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "271414315aff87387382ec3d271b52d7ae78726f5d44ac98b4f4030c91880486" [[package]] name = "winnow" version 
= "0.7.13" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "21a0236b59786fed61e2a80582dd500fe61f18b5dca67a4a067d0bc9039339cf" dependencies = [ "memchr", ] [[package]] name = "wit-bindgen" version = "0.45.0" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "052283831dbae3d879dc7f51f3d92703a316ca49f91540417d38591826127814" [[package]] name = "zerocopy" version = "0.8.26" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "1039dd0d3c310cf05de012d8a39ff557cb0d23087fd44cad61df08fc31907a2f" dependencies = [ "zerocopy-derive", ] [[package]] name = "zerocopy-derive" version = "0.8.26" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "9ecf5b4cc5364572d7f4c329661bcc82724222973f2cab6f050a4e5c22f75181" dependencies = [ "proc-macro2", "quote", "syn", ] fast_image_resize-5.3.0/Cargo.toml0000644000000113620000000000100125220ustar # THIS FILE IS AUTOMATICALLY GENERATED BY CARGO # # When uploading crates to the registry Cargo will automatically # "normalize" Cargo.toml files for maximal compatibility # with all versions of Cargo and also rewrite `path` dependencies # to registry (e.g., crates.io) dependencies. # # If you are reading this file be aware that the original Cargo.toml # will likely look very different (and much more reasonable). # See Cargo.toml.orig for the original contents. [package] edition = "2021" rust-version = "1.87.0" name = "fast_image_resize" version = "5.3.0" authors = ["Kirill Kuzminykh "] build = false exclude = [ "/data", "/.github", ] autolib = false autobins = false autoexamples = false autotests = false autobenches = false description = "Library for fast image resizing with using of SIMD instructions" documentation = "https://docs.rs/crate/fast_image_resize" readme = "README.md" keywords = [ "image", "resize", ] license = "MIT OR Apache-2.0" repository = "https://github.com/cykooz/fast_image_resize" [[package.metadata.release.pre-release-replacements]] file = "CHANGELOG.md" replace = "{{version}}" search = "Unreleased" [[package.metadata.release.pre-release-replacements]] file = "CHANGELOG.md" replace = "{{date}}" search = "ReleaseDate" [features] for_testing = [ "image", "image/png", ] image = [ "dep:image", "dep:bytemuck", ] only_u8x4 = [] rayon = [ "dep:rayon", "resize/rayon", "image/rayon", ] [lib] name = "fast_image_resize" path = "src/lib.rs" [[test]] name = "alpha_tests" path = "tests/alpha_tests.rs" [[test]] name = "color_tests" path = "tests/color_tests.rs" [[test]] name = "image_view" path = "tests/image_view.rs" [[test]] name = "images_tests" path = "tests/images_tests.rs" [[test]] name = "resize_tests" path = "tests/resize_tests.rs" [[test]] name = "testing" path = "tests/testing.rs" [[bench]] name = "bench_alpha" path = "benches/bench_alpha.rs" harness = false [[bench]] name = "bench_color_mapper" path = "benches/bench_color_mapper.rs" harness = false [[bench]] name = "bench_compare_l" path = "benches/bench_compare_l.rs" harness = false [[bench]] name = "bench_compare_l16" path = "benches/bench_compare_l16.rs" harness = false [[bench]] name = "bench_compare_l32f" path = "benches/bench_compare_l32f.rs" harness = false [[bench]] name = "bench_compare_la" path = "benches/bench_compare_la.rs" harness = false [[bench]] name = "bench_compare_la16" path = "benches/bench_compare_la16.rs" harness = false [[bench]] name = "bench_compare_la32f" path = "benches/bench_compare_la32f.rs" harness = false [[bench]] name = "bench_compare_rgb" path = 
"benches/bench_compare_rgb.rs" harness = false [[bench]] name = "bench_compare_rgb16" path = "benches/bench_compare_rgb16.rs" harness = false [[bench]] name = "bench_compare_rgb32f" path = "benches/bench_compare_rgb32f.rs" harness = false [[bench]] name = "bench_compare_rgba" path = "benches/bench_compare_rgba.rs" harness = false [[bench]] name = "bench_compare_rgba16" path = "benches/bench_compare_rgba16.rs" harness = false [[bench]] name = "bench_compare_rgba32f" path = "benches/bench_compare_rgba32f.rs" harness = false [[bench]] name = "bench_resize" path = "benches/bench_resize.rs" harness = false [[bench]] name = "bench_threads" path = "benches/bench_threads.rs" harness = false [dependencies.bytemuck] version = "1.23" optional = true [dependencies.cfg-if] version = "1.0" [dependencies.document-features] version = "0.2.11" [dependencies.image] version = "0.25.6" optional = true default-features = false [dependencies.num-traits] version = "0.2.19" [dependencies.rayon] version = "1.11" optional = true [dependencies.thiserror] version = "2.0" [dev-dependencies.criterion] version = "0.7.0" features = ["cargo_bench_support"] default-features = false [dev-dependencies.itertools] version = "0.14.0" [dev-dependencies.png] version = "0.18.0" [dev-dependencies.resize] version = "0.8.8" features = ["std"] default-features = false [dev-dependencies.rgb] version = "0.8.52" [dev-dependencies.serde] version = "1.0" features = ["serde_derive"] [dev-dependencies.serde_json] version = "1.0" [dev-dependencies.tera] version = "1.20" [dev-dependencies.walkdir] version = "2.5" [target.'cfg(all(not(target_arch = "wasm32"), not(target_os = "windows")))'.dev-dependencies.libvips] version = "1.7" [target.'cfg(not(target_arch = "wasm32"))'.dev-dependencies.nix] version = "0.30.1" features = ["sched"] default-features = false [profile.dev.package."*"] opt-level = 3 [profile.release] opt-level = 3 incremental = true strip = true [profile.release.package.image] codegen-units = 1 [profile.release.package.resize] codegen-units = 1 [profile.release.build-override] opt-level = 2 debug = 0 [profile.test] opt-level = 1 incremental = true fast_image_resize-5.3.0/Cargo.toml.orig000064400000000000000000000071751046102023000162120ustar 00000000000000[workspace] members = [ "resizer", ] [package] name = "fast_image_resize" version = "5.3.0" authors = ["Kirill Kuzminykh "] edition = "2021" rust-version = "1.87.0" license = "MIT OR Apache-2.0" description = "Library for fast image resizing with using of SIMD instructions" readme = "README.md" keywords = ["image", "resize"] repository = "https://github.com/cykooz/fast_image_resize" documentation = "https://docs.rs/crate/fast_image_resize" exclude = ["/data", "/.github"] [dependencies] cfg-if = "1.0" num-traits = "0.2.19" thiserror = "2.0" document-features = "0.2.11" # Optional dependencies image = { version = "0.25.6", optional = true, default-features = false } bytemuck = { version = "1.23", optional = true } rayon = { version = "1.11", optional = true } [features] ## Enable this feature to implement traits [IntoImageView](crate::IntoImageView) and ## [IntoImageViewMut](crate::IntoImageViewMut) for the ## [DynamicImage](https://docs.rs/image/latest/image/enum.DynamicImage.html) ## type from the `image` crate. image = ["dep:image", "dep:bytemuck"] ## This feature enables image processing in the ` rayon ` thread pool. rayon = ["dep:rayon", "resize/rayon", "image/rayon"] for_testing = ["image", "image/png"] only_u8x4 = [] # This can be used to experiment with the crate's code. 
[dev-dependencies] fast_image_resize = { path = ".", features = ["for_testing"] } resize = { version = "0.8.8", default-features = false, features = ["std"] } rgb = "0.8.52" png = "0.18.0" serde = { version = "1.0", features = ["serde_derive"] } serde_json = "1.0" walkdir = "2.5" itertools = "0.14.0" criterion = { version = "0.7.0", default-features = false, features = ["cargo_bench_support"] } tera = "1.20" [target.'cfg(not(target_arch = "wasm32"))'.dev-dependencies] nix = { version = "0.30.1", default-features = false, features = ["sched"] } [target.'cfg(all(not(target_arch = "wasm32"), not(target_os = "windows")))'.dev-dependencies] libvips = "1.7" [profile.test] opt-level = 1 incremental = true # debug builds for deps [profile.dev.package.'*'] opt-level = 3 # release build for procmacros - same config as debug build for procmacros [profile.release.build-override] opt-level = 2 debug = false # when possible [profile.release] opt-level = 3 incremental = true #lto = true #codegen-units = 1 strip = true #[profile.release.package.fast_image_resize] #codegen-units = 1 [profile.release.package.image] codegen-units = 1 [profile.release.package.resize] codegen-units = 1 [package.metadata.release] pre-release-replacements = [ { file = "CHANGELOG.md", search = "Unreleased", replace = "{{version}}" }, { file = "CHANGELOG.md", search = "ReleaseDate", replace = "{{date}}" } ] [[bench]] name = "bench_resize" harness = false [[bench]] name = "bench_alpha" harness = false [[bench]] name = "bench_compare_rgb" harness = false [[bench]] name = "bench_compare_rgb16" harness = false [[bench]] name = "bench_compare_rgb32f" harness = false [[bench]] name = "bench_compare_rgba" harness = false [[bench]] name = "bench_compare_rgba16" harness = false [[bench]] name = "bench_compare_rgba32f" harness = false [[bench]] name = "bench_compare_l" harness = false [[bench]] name = "bench_compare_la" harness = false [[bench]] name = "bench_compare_l16" harness = false [[bench]] name = "bench_compare_la16" harness = false [[bench]] name = "bench_compare_l32f" harness = false [[bench]] name = "bench_compare_la32f" harness = false [[bench]] name = "bench_color_mapper" harness = false [[bench]] name = "bench_threads" harness = false # Header of the next release in CHANGELOG.md: # ## [Unreleased] - ReleaseDate fast_image_resize-5.3.0/LICENSE-APACHE000064400000000000000000000261261046102023000152440ustar 00000000000000 Apache License Version 2.0, January 2004 http://www.apache.org/licenses/ TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION 1. Definitions. "License" shall mean the terms and conditions for use, reproduction, and distribution as defined by Sections 1 through 9 of this document. "Licensor" shall mean the copyright owner or entity authorized by the copyright owner that is granting the License. "Legal Entity" shall mean the union of the acting entity and all other entities that control, are controlled by, or are under common control with that entity. For the purposes of this definition, "control" means (i) the power, direct or indirect, to cause the direction or management of such entity, whether by contract or otherwise, or (ii) ownership of fifty percent (50%) or more of the outstanding shares, or (iii) beneficial ownership of such entity. "You" (or "Your") shall mean an individual or Legal Entity exercising permissions granted by this License. 
"Source" form shall mean the preferred form for making modifications, including but not limited to software source code, documentation source, and configuration files. "Object" form shall mean any form resulting from mechanical transformation or translation of a Source form, including but not limited to compiled object code, generated documentation, and conversions to other media types. "Work" shall mean the work of authorship, whether in Source or Object form, made available under the License, as indicated by a copyright notice that is included in or attached to the work (an example is provided in the Appendix below). "Derivative Works" shall mean any work, whether in Source or Object form, that is based on (or derived from) the Work and for which the editorial revisions, annotations, elaborations, or other modifications represent, as a whole, an original work of authorship. For the purposes of this License, Derivative Works shall not include works that remain separable from, or merely link (or bind by name) to the interfaces of, the Work and Derivative Works thereof. "Contribution" shall mean any work of authorship, including the original version of the Work and any modifications or additions to that Work or Derivative Works thereof, that is intentionally submitted to Licensor for inclusion in the Work by the copyright owner or by an individual or Legal Entity authorized to submit on behalf of the copyright owner. For the purposes of this definition, "submitted" means any form of electronic, verbal, or written communication sent to the Licensor or its representatives, including but not limited to communication on electronic mailing lists, source code control systems, and issue tracking systems that are managed by, or on behalf of, the Licensor for the purpose of discussing and improving the Work, but excluding communication that is conspicuously marked or otherwise designated in writing by the copyright owner as "Not a Contribution." "Contributor" shall mean Licensor and any individual or Legal Entity on behalf of whom a Contribution has been received by Licensor and subsequently incorporated within the Work. 2. Grant of Copyright License. Subject to the terms and conditions of this License, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable copyright license to reproduce, prepare Derivative Works of, publicly display, publicly perform, sublicense, and distribute the Work and such Derivative Works in Source or Object form. 3. Grant of Patent License. Subject to the terms and conditions of this License, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable (except as stated in this section) patent license to make, have made, use, offer to sell, sell, import, and otherwise transfer the Work, where such license applies only to those patent claims licensable by such Contributor that are necessarily infringed by their Contribution(s) alone or by combination of their Contribution(s) with the Work to which such Contribution(s) was submitted. If You institute patent litigation against any entity (including a cross-claim or counterclaim in a lawsuit) alleging that the Work or a Contribution incorporated within the Work constitutes direct or contributory patent infringement, then any patent licenses granted to You under this License for that Work shall terminate as of the date such litigation is filed. 4. Redistribution. 
You may reproduce and distribute copies of the Work or Derivative Works thereof in any medium, with or without modifications, and in Source or Object form, provided that You meet the following conditions: (a) You must give any other recipients of the Work or Derivative Works a copy of this License; and (b) You must cause any modified files to carry prominent notices stating that You changed the files; and (c) You must retain, in the Source form of any Derivative Works that You distribute, all copyright, patent, trademark, and attribution notices from the Source form of the Work, excluding those notices that do not pertain to any part of the Derivative Works; and (d) If the Work includes a "NOTICE" text file as part of its distribution, then any Derivative Works that You distribute must include a readable copy of the attribution notices contained within such NOTICE file, excluding those notices that do not pertain to any part of the Derivative Works, in at least one of the following places: within a NOTICE text file distributed as part of the Derivative Works; within the Source form or documentation, if provided along with the Derivative Works; or, within a display generated by the Derivative Works, if and wherever such third-party notices normally appear. The contents of the NOTICE file are for informational purposes only and do not modify the License. You may add Your own attribution notices within Derivative Works that You distribute, alongside or as an addendum to the NOTICE text from the Work, provided that such additional attribution notices cannot be construed as modifying the License. You may add Your own copyright statement to Your modifications and may provide additional or different license terms and conditions for use, reproduction, or distribution of Your modifications, or for any such Derivative Works as a whole, provided Your use, reproduction, and distribution of the Work otherwise complies with the conditions stated in this License. 5. Submission of Contributions. Unless You explicitly state otherwise, any Contribution intentionally submitted for inclusion in the Work by You to the Licensor shall be under the terms and conditions of this License, without any additional terms or conditions. Notwithstanding the above, nothing herein shall supersede or modify the terms of any separate license agreement you may have executed with Licensor regarding such Contributions. 6. Trademarks. This License does not grant permission to use the trade names, trademarks, service marks, or product names of the Licensor, except as required for reasonable and customary use in describing the origin of the Work and reproducing the content of the NOTICE file. 7. Disclaimer of Warranty. Unless required by applicable law or agreed to in writing, Licensor provides the Work (and each Contributor provides its Contributions) on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied, including, without limitation, any warranties or conditions of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A PARTICULAR PURPOSE. You are solely responsible for determining the appropriateness of using or redistributing the Work and assume any risks associated with Your exercise of permissions under this License. 8. Limitation of Liability. 
In no event and under no legal theory, whether in tort (including negligence), contract, or otherwise, unless required by applicable law (such as deliberate and grossly negligent acts) or agreed to in writing, shall any Contributor be liable to You for damages, including any direct, indirect, special, incidental, or consequential damages of any character arising as a result of this License or out of the use or inability to use the Work (including but not limited to damages for loss of goodwill, work stoppage, computer failure or malfunction, or any and all other commercial damages or losses), even if such Contributor has been advised of the possibility of such damages. 9. Accepting Warranty or Additional Liability. While redistributing the Work or Derivative Works thereof, You may choose to offer, and charge a fee for, acceptance of support, warranty, indemnity, or other liability obligations and/or rights consistent with this License. However, in accepting such obligations, You may act only on Your own behalf and on Your sole responsibility, not on behalf of any other Contributor, and only if You agree to indemnify, defend, and hold each Contributor harmless for any liability incurred by, or claims asserted against, such Contributor by reason of your accepting any such warranty or additional liability. END OF TERMS AND CONDITIONS APPENDIX: How to apply the Apache License to your work. To apply the Apache License to your work, attach the following boilerplate notice, with the fields enclosed by brackets "[]" replaced with your own identifying information. (Don't include the brackets!) The text should be enclosed in the appropriate comment syntax for the file format. We also recommend that a file or class name and description of purpose be included on the same "printed page" as the copyright notice for easier identification within third-party archives. Copyright (c) 2021 Kirill Kuzminykh Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. fast_image_resize-5.3.0/LICENSE-MIT000064400000000000000000000020611046102023000147440ustar 00000000000000MIT License Copyright (c) 2021 Kirill Kuzminykh Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. fast_image_resize-5.3.0/README.md000064400000000000000000000220161046102023000145710ustar 00000000000000# fast_image_resize [![github](https://img.shields.io/badge/github-Cykooz%2Ffast__image__resize-8da0cb?logo=github)](https://github.com/Cykooz/fast_image_resize) [![crates.io](https://img.shields.io/crates/v/fast_image_resize.svg?logo=rust)](https://crates.io/crates/fast_image_resize) [![docs.rs](https://img.shields.io/badge/docs.rs-fast__image__resize-66c2a5?logo=docs.rs)](https://docs.rs/fast_image_resize) Rust library for fast image resizing with using of SIMD instructions. [CHANGELOG](https://github.com/Cykooz/fast_image_resize/blob/main/CHANGELOG.md) Supported pixel formats and available optimizations: | Format | Description | SSE4.1 | AVX2 | Neon | Wasm32 SIMD128 | |:------:|:--------------------------------------------------------------|:------:|:----:|:----:|:--------------:| | U8 | One `u8` component per pixel (e.g. L) | + | + | + | + | | U8x2 | Two `u8` components per pixel (e.g. LA) | + | + | + | + | | U8x3 | Three `u8` components per pixel (e.g. RGB) | + | + | + | + | | U8x4 | Four `u8` components per pixel (e.g. RGBA, RGBx, CMYK) | + | + | + | + | | U16 | One `u16` components per pixel (e.g. L16) | + | + | + | + | | U16x2 | Two `u16` components per pixel (e.g. LA16) | + | + | + | + | | U16x3 | Three `u16` components per pixel (e.g. RGB16) | + | + | + | + | | U16x4 | Four `u16` components per pixel (e.g. RGBA16, RGBx16, CMYK16) | + | + | + | + | | I32 | One `i32` component per pixel (e.g. L32) | - | - | - | - | | F32 | One `f32` component per pixel (e.g. L32F) | + | + | - | - | | F32x2 | Two `f32` components per pixel (e.g. LA32F) | + | + | - | - | | F32x3 | Three `f32` components per pixel (e.g. RGB32F) | + | + | - | - | | F32x4 | Four `f32` components per pixel (e.g. RGBA32F) | + | + | - | - | ## Colorspace Resizer from this crate does not convert image into linear colorspace during a resize process. If it is important for you to resize images with a non-linear color space (e.g. sRGB) correctly, then you have to convert it to a linear color space before resizing and convert back to the color space of result image. [Read more](http://www.ericbrasseur.org/gamma.html) about resizing with respect to color space. This crate provides the [PixelComponentMapper](https://docs.rs/fast_image_resize/latest/fast_image_resize/struct.PixelComponentMapper.html) structure that allows you to create colorspace converters for images whose pixels based on `u8` and `u16` components. In addition, the crate contains functions `create_gamma_22_mapper()` and `create_srgb_mapper()` to create instance of `PixelComponentMapper` that converts images from sRGB or gamma 2.2 into linear colorspace and back. ## Multi-threading You should enable `"rayon"` feature to turn on image processing in [rayon](https://docs.rs/rayon/latest/rayon/) thread pool. ## Some benchmarks in single-threaded mode for x86_64 _All benchmarks:_ [_x86_64_](https://github.com/Cykooz/fast_image_resize/blob/main/benchmarks-x86_64.md), [_ARM64_](https://github.com/Cykooz/fast_image_resize/blob/main/benchmarks-arm64.md), [_WASM32_](https://github.com/Cykooz/fast_image_resize/blob/main/benchmarks-wasm32.md). 
Other libraries used to compare of resizing speed: - image () - resize (, single-threaded mode) - libvips (single-threaded mode) ### Resize RGB8 image (U8x3) 4928x3279 => 852x567 Pipeline: `src_image => resize => dst_image` - Source image [nasa-4928x3279.png](https://github.com/Cykooz/fast_image_resize/blob/main/data/nasa-4928x3279.png) - Numbers in the table mean a duration of image resizing in milliseconds. | | Nearest | Box | Bilinear | Bicubic | Lanczos3 | |------------|:-------:|:-----:|:--------:|:-------:|:--------:| | image | 29.28 | - | 83.28 | 136.97 | 189.93 | | resize | 7.42 | 26.82 | 49.29 | 93.22 | 140.26 | | libvips | 2.42 | 61.73 | 5.66 | 9.81 | 15.78 | | fir rust | 0.28 | 10.87 | 16.12 | 26.63 | 38.08 | | fir sse4.1 | 0.28 | 3.37 | 5.34 | 9.89 | 15.30 | | fir avx2 | 0.28 | 2.52 | 3.67 | 6.80 | 13.21 | ### Resize RGBA8 image (U8x4) 4928x3279 => 852x567 Pipeline: `src_image => multiply by alpha => resize => divide by alpha => dst_image` - Source image [nasa-4928x3279-rgba.png](https://github.com/Cykooz/fast_image_resize/blob/main/data/nasa-4928x3279-rgba.png) - Numbers in the table mean a duration of image resizing in milliseconds. - The `image` crate does not support multiplying and dividing by alpha channel. | | Nearest | Box | Bilinear | Bicubic | Lanczos3 | |------------|:-------:|:------:|:--------:|:-------:|:--------:| | resize | 9.59 | 34.02 | 64.61 | 126.43 | 187.18 | | libvips | 4.19 | 169.02 | 142.22 | 228.64 | 330.24 | | fir rust | 0.19 | 20.30 | 25.25 | 36.57 | 49.69 | | fir sse4.1 | 0.19 | 9.51 | 11.90 | 17.78 | 24.49 | | fir avx2 | 0.19 | 7.11 | 8.39 | 13.68 | 21.72 | ### Resize L8 image (U8) 4928x3279 => 852x567 Pipeline: `src_image => resize => dst_image` - Source image [nasa-4928x3279.png](https://github.com/Cykooz/fast_image_resize/blob/main/data/nasa-4928x3279.png) has converted into grayscale image with one byte per pixel. - Numbers in the table mean a duration of image resizing in milliseconds. | | Nearest | Box | Bilinear | Bicubic | Lanczos3 | |------------|:-------:|:-----:|:--------:|:-------:|:--------:| | image | 26.90 | - | 56.49 | 85.11 | 112.72 | | resize | 6.57 | 11.06 | 18.83 | 38.44 | 63.98 | | libvips | 2.62 | 24.92 | 6.81 | 9.84 | 12.73 | | fir rust | 0.16 | 4.42 | 5.45 | 8.69 | 12.04 | | fir sse4.1 | 0.16 | 1.45 | 2.02 | 3.37 | 5.44 | | fir avx2 | 0.16 | 1.51 | 1.73 | 2.74 | 4.11 | ## Examples ### Resize RGBA8 image Note: You must enable `"image"` feature to support of [image::DynamicImage](https://docs.rs/image/latest/image/enum.DynamicImage.html). Otherwise, you have to convert such images into supported by the crate image type. 
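For cases where the `image` feature is disabled, the raw pixel buffer can be wrapped into the crate's own `Image` container and resized directly. The sketch below assumes an 8-bit RGBA buffer whose length is `width * height * 4` bytes; the helper name `resize_raw_rgba8` is illustrative only.

```rust
use fast_image_resize::images::Image;
use fast_image_resize::{PixelType, Resizer};

fn resize_raw_rgba8(buffer: Vec<u8>, width: u32, height: u32) -> Image<'static> {
    // Wrap the raw pixel data into the crate's own image container.
    let src_image = Image::from_vec_u8(width, height, buffer, PixelType::U8x4).unwrap();

    // Destination image with the same pixel type and half the size.
    let mut dst_image = Image::new(width / 2, height / 2, PixelType::U8x4);

    // Resize with default options (Lanczos3 convolution).
    let mut resizer = Resizer::new();
    resizer.resize(&src_image, &mut dst_image, None).unwrap();
    dst_image
}
```

The full example below uses `DynamicImage` from the `image` crate instead.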
```rust use std::io::BufWriter; use image::codecs::png::PngEncoder; use image::{ExtendedColorType, ImageEncoder, ImageReader}; use fast_image_resize::{IntoImageView, Resizer}; use fast_image_resize::images::Image; fn main() { // Read source image from file let src_image = ImageReader::open("./data/nasa-4928x3279.png") .unwrap() .decode() .unwrap(); // Create container for data of destination image let dst_width = 1024; let dst_height = 768; let mut dst_image = Image::new( dst_width, dst_height, src_image.pixel_type().unwrap(), ); // Create Resizer instance and resize source image // into buffer of destination image let mut resizer = Resizer::new(); resizer.resize(&src_image, &mut dst_image, None).unwrap(); // Write destination image as PNG-file let mut result_buf = BufWriter::new(Vec::new()); PngEncoder::new(&mut result_buf) .write_image( dst_image.buffer(), dst_width, dst_height, src_image.color().into(), ) .unwrap(); } ``` ### Resize with cropping ```rust use image::codecs::png::PngEncoder; use image::{ColorType, ImageReader, GenericImageView}; use fast_image_resize::{IntoImageView, Resizer, ResizeOptions}; use fast_image_resize::images::Image; fn main() { let img = ImageReader::open("./data/nasa-4928x3279.png") .unwrap() .decode() .unwrap(); // Create container for data of destination image let mut dst_image = Image::new( 1024, 768, img.pixel_type().unwrap(), ); // Create Resizer instance and resize cropped source image // into buffer of destination image let mut resizer = Resizer::new(); resizer.resize( &img, &mut dst_image, &ResizeOptions::new().crop( 10.0, // left 10.0, // top 2000.0, // width 2000.0, // height ), ).unwrap(); } ``` ### Change CPU extensions used by resizer ```rust, no_run use fast_image_resize as fr; fn main() { let mut resizer = fr::Resizer::new(); #[cfg(target_arch = "x86_64")] unsafe { resizer.set_cpu_extensions(fr::CpuExtensions::Sse4_1); } } ``` fast_image_resize-5.3.0/benches/bench_alpha.rs000064400000000000000000000122361046102023000175160ustar 00000000000000use fast_image_resize::images::Image; use fast_image_resize::{CpuExtensions, MulDiv, PixelType}; use num_traits::ToBytes; use utils::testing::cpu_ext_into_str; mod utils; // Multiplies by alpha fn get_src_image(width: u32, height: u32, pixel_type: PixelType, pixel: &[u8]) -> Image<'static> { let pixels_count = width as usize * height as usize; let buffer = (0..pixels_count) .flat_map(|_| pixel.iter().copied()) .collect(); Image::from_vec_u8(width, height, buffer, pixel_type).unwrap() } fn multiplies_alpha( bench_group: &mut utils::BenchGroup, pixel_type: PixelType, cpu_extensions: CpuExtensions, ) { let sample_size = 100; let width = 4096; let height = 2048; let f32x2_bytes: Vec = [1.0, 0.5].iter().flat_map(|v| v.to_le_bytes()).collect(); let f32x4_bytes: Vec = [1.0, 0.5, 0., 0.5] .iter() .flat_map(|v| v.to_le_bytes()) .collect(); let pixel: &[u8] = match pixel_type { PixelType::U8x4 => &[255, 128, 0, 128], PixelType::U8x2 => &[255, 128], PixelType::U16x2 => &[255, 255, 0, 128], PixelType::U16x4 => &[0, 255, 0, 128, 0, 0, 0, 128], PixelType::F32x2 => &f32x2_bytes, PixelType::F32x4 => &f32x4_bytes, _ => unreachable!(), }; let src_data = get_src_image(width, height, pixel_type, pixel); let mut dst_data = Image::new(width, height, pixel_type); let mut alpha_mul_div: MulDiv = Default::default(); unsafe { alpha_mul_div.set_cpu_extensions(cpu_extensions); } utils::bench( bench_group, sample_size, format!("Multiplies alpha {pixel_type:?}"), cpu_ext_into_str(cpu_extensions), |bencher| { bencher.iter(|| { 
alpha_mul_div .multiply_alpha(&src_data, &mut dst_data) .unwrap(); }) }, ); let src_image = get_src_image(width, height, pixel_type, pixel); utils::bench( bench_group, sample_size, format!("Multiplies alpha inplace {pixel_type:?}"), cpu_ext_into_str(cpu_extensions), |bencher| { let mut image = src_image.copy(); bencher.iter(|| { alpha_mul_div.multiply_alpha_inplace(&mut image).unwrap(); }) }, ); } fn divides_alpha( bench_group: &mut utils::BenchGroup, pixel_type: PixelType, cpu_extensions: CpuExtensions, ) { let sample_size = 100; let width = 4095; let height = 2048; let f32x2_bytes: Vec = [0.5, 0.5].iter().flat_map(|v| v.to_le_bytes()).collect(); let f32x4_bytes: Vec = [0.5, 0.25, 0., 0.5] .iter() .flat_map(|v| v.to_le_bytes()) .collect(); let pixel: &[u8] = match pixel_type { PixelType::U8x4 => &[128, 64, 0, 128], PixelType::U8x2 => &[128, 128], PixelType::U16x2 => &[0, 128, 0, 128], PixelType::U16x4 => &[0, 128, 0, 64, 0, 0, 0, 128], PixelType::F32x2 => &f32x2_bytes, PixelType::F32x4 => &f32x4_bytes, _ => unreachable!(), }; let src_data = get_src_image(width, height, pixel_type, pixel); let mut dst_data = Image::new(width, height, pixel_type); let mut alpha_mul_div: MulDiv = Default::default(); unsafe { alpha_mul_div.set_cpu_extensions(cpu_extensions); } utils::bench( bench_group, sample_size, format!("Divides alpha {pixel_type:?}"), cpu_ext_into_str(cpu_extensions), |bencher| { bencher.iter(|| { alpha_mul_div .divide_alpha(&src_data, &mut dst_data) .unwrap(); }) }, ); let src_image = get_src_image(width, height, pixel_type, pixel); utils::bench( bench_group, sample_size, format!("Divides alpha inplace {pixel_type:?}"), cpu_ext_into_str(cpu_extensions), |bencher| { let mut image = src_image.copy(); bencher.iter(|| { alpha_mul_div.divide_alpha_inplace(&mut image).unwrap(); }) }, ); } fn bench_alpha(bench_group: &mut utils::BenchGroup) { let pixel_types = [ PixelType::U8x2, PixelType::U8x4, PixelType::U16x2, PixelType::U16x4, PixelType::F32x2, PixelType::F32x4, ]; let mut cpu_extensions = vec![CpuExtensions::None]; #[cfg(target_arch = "x86_64")] { cpu_extensions.push(CpuExtensions::Sse4_1); cpu_extensions.push(CpuExtensions::Avx2); } #[cfg(target_arch = "aarch64")] { cpu_extensions.push(CpuExtensions::Neon); } #[cfg(target_arch = "wasm32")] { cpu_extensions.push(CpuExtensions::Simd128); } for pixel_type in pixel_types { for &cpu_ext in cpu_extensions.iter() { multiplies_alpha(bench_group, pixel_type, cpu_ext); } } for pixel_type in pixel_types { for &cpu_ext in cpu_extensions.iter() { divides_alpha(bench_group, pixel_type, cpu_ext); } } } fn main() { let res = utils::run_bench(bench_alpha, "Bench Alpha"); println!("{}", utils::build_md_table(&res)); } fast_image_resize-5.3.0/benches/bench_color_mapper.rs000064400000000000000000000014601046102023000211100ustar 00000000000000use fast_image_resize::create_srgb_mapper; use fast_image_resize::images::Image; use fast_image_resize::pixels::U8x3; use utils::pin_process_to_cpu0; use utils::testing::PixelTestingExt; mod utils; pub fn bench_color_mapper(bench_group: &mut utils::BenchGroup) { let src_image = U8x3::load_big_src_image(); let mut dst_image = Image::new( src_image.width(), src_image.height(), src_image.pixel_type(), ); let mapper = create_srgb_mapper(); bench_group .criterion_group .bench_function("SRGB U8x3 => RGB U8x3", |bencher| { bencher.iter(|| { mapper.forward_map(&src_image, &mut dst_image).unwrap(); }) }); } fn main() { pin_process_to_cpu0(); utils::run_bench(bench_color_mapper, "Color mapper"); } 
fast_image_resize-5.3.0/benches/bench_compare_l.rs000064400000000000000000000013041046102023000203640ustar 00000000000000use fast_image_resize::pixels::U8; use resize::Pixel::Gray8; use rgb::FromSlice; use utils::testing::PixelTestingExt; mod utils; pub fn bench_compare_l(bench_group: &mut utils::BenchGroup) { type P = U8; let src_image = P::load_big_image(); utils::image_resize(bench_group, &src_image); utils::resize_resize( bench_group, Gray8, src_image.as_raw().as_gray(), src_image.width(), src_image.height(), ); utils::libvips_resize::
<P>
(bench_group, false); utils::fir_resize::
<P>
(bench_group, false); } fn main() { let res = utils::run_bench(bench_compare_l, "Compare resize of U8 image"); utils::print_and_write_compare_result(&res); } fast_image_resize-5.3.0/benches/bench_compare_l16.rs000064400000000000000000000013211046102023000205320ustar 00000000000000use fast_image_resize::pixels::U16; use resize::Pixel::Gray16; use rgb::FromSlice; use utils::testing::PixelTestingExt; mod utils; pub fn bench_downscale_l16(bench_group: &mut utils::BenchGroup) { type P = U16; let src_image = P::load_big_image(); utils::image_resize(bench_group, &src_image); utils::resize_resize( bench_group, Gray16, src_image.as_raw().as_gray(), src_image.width(), src_image.height(), ); utils::libvips_resize::
<P>
(bench_group, false); utils::fir_resize::
<P>
(bench_group, false); } fn main() { let res = utils::run_bench(bench_downscale_l16, "Compare resize of U16 image"); utils::print_and_write_compare_result(&res); } fast_image_resize-5.3.0/benches/bench_compare_l32f.rs000064400000000000000000000013261046102023000207030ustar 00000000000000use fast_image_resize::pixels::F32; use resize::Pixel::GrayF32; use rgb::FromSlice; use utils::testing::PixelTestingExt; mod utils; pub fn bench_downscale_l32f(bench_group: &mut utils::BenchGroup) { type P = F32; let src_image = P::load_big_image(); utils::image_resize(bench_group, &src_image); utils::resize_resize( bench_group, GrayF32, src_image.as_raw().as_gray(), src_image.width(), src_image.height(), ); utils::libvips_resize::
<P>
(bench_group, false); utils::fir_resize::
<P>
(bench_group, false); } fn main() { let res = utils::run_bench(bench_downscale_l32f, "Compare resize of L32F image"); utils::print_and_write_compare_result(&res); } fast_image_resize-5.3.0/benches/bench_compare_la.rs000064400000000000000000000005741046102023000205350ustar 00000000000000use fast_image_resize::pixels::U8x2; mod utils; pub fn bench_downscale_la(bench_group: &mut utils::BenchGroup) { type P = U8x2; utils::libvips_resize::
<P>
(bench_group, true); utils::fir_resize::
<P>
(bench_group, true); } fn main() { let res = utils::run_bench(bench_downscale_la, "Compare resize of LA image"); utils::print_and_write_compare_result(&res); } fast_image_resize-5.3.0/benches/bench_compare_la16.rs000064400000000000000000000006041046102023000206760ustar 00000000000000use fast_image_resize::pixels::U16x2; mod utils; pub fn bench_downscale_la16(bench_group: &mut utils::BenchGroup) { type P = U16x2; utils::libvips_resize::
<P>
(bench_group, true); utils::fir_resize::
<P>
(bench_group, true); } fn main() { let res = utils::run_bench(bench_downscale_la16, "Compare resize of LA16 image"); utils::print_and_write_compare_result(&res); } fast_image_resize-5.3.0/benches/bench_compare_la32f.rs000064400000000000000000000006071046102023000210450ustar 00000000000000use fast_image_resize::pixels::F32x2; mod utils; pub fn bench_downscale_la32f(bench_group: &mut utils::BenchGroup) { type P = F32x2; utils::libvips_resize::
<P>
(bench_group, true); utils::fir_resize::
<P>
(bench_group, true); } fn main() { let res = utils::run_bench(bench_downscale_la32f, "Compare resize of LA32F image"); utils::print_and_write_compare_result(&res); } fast_image_resize-5.3.0/benches/bench_compare_rgb.rs000064400000000000000000000013161046102023000207060ustar 00000000000000use fast_image_resize::pixels::U8x3; use resize::Pixel::RGB8; use rgb::FromSlice; use utils::testing::PixelTestingExt; mod utils; pub fn bench_downscale_rgb(bench_group: &mut utils::BenchGroup) { type P = U8x3; let src_image = P::load_big_image(); utils::image_resize(bench_group, &src_image); utils::resize_resize( bench_group, RGB8, src_image.as_raw().as_rgb(), src_image.width(), src_image.height(), ); utils::libvips_resize::
<P>
(bench_group, false); utils::fir_resize::
<P>
(bench_group, false); } fn main() { let res = utils::run_bench(bench_downscale_rgb, "Compare resize of RGB image"); utils::print_and_write_compare_result(&res); } fast_image_resize-5.3.0/benches/bench_compare_rgb16.rs000064400000000000000000000013301046102023000210510ustar 00000000000000use fast_image_resize::pixels::U16x3; use resize::Pixel::RGB16; use rgb::FromSlice; use utils::testing::PixelTestingExt; mod utils; pub fn bench_downscale_rgb16(bench_group: &mut utils::BenchGroup) { type P = U16x3; let src_image = P::load_big_image(); utils::image_resize(bench_group, &src_image); utils::resize_resize( bench_group, RGB16, src_image.as_raw().as_rgb(), src_image.width(), src_image.height(), ); utils::libvips_resize::
<P>
(bench_group, false); utils::fir_resize::
<P>
(bench_group, false); } fn main() { let res = utils::run_bench(bench_downscale_rgb16, "Compare resize of RGB16 image"); utils::print_and_write_compare_result(&res); } fast_image_resize-5.3.0/benches/bench_compare_rgb32f.rs000064400000000000000000000013351046102023000212220ustar 00000000000000use fast_image_resize::pixels::F32x3; use resize::Pixel::RGBF32; use rgb::FromSlice; use utils::testing::PixelTestingExt; mod utils; pub fn bench_downscale_rgb32f(bench_group: &mut utils::BenchGroup) { type P = F32x3; let src_image = P::load_big_image(); utils::image_resize(bench_group, &src_image); utils::resize_resize( bench_group, RGBF32, src_image.as_raw().as_rgb(), src_image.width(), src_image.height(), ); utils::libvips_resize::
<P>
(bench_group, false); utils::fir_resize::
<P>
(bench_group, false); } fn main() { let res = utils::run_bench(bench_downscale_rgb32f, "Compare resize of RGB32F image"); utils::print_and_write_compare_result(&res); } fast_image_resize-5.3.0/benches/bench_compare_rgba.rs000064400000000000000000000012421046102023000210450ustar 00000000000000use fast_image_resize::pixels::U8x4; use resize::Pixel::RGBA8P; use rgb::FromSlice; use utils::testing::PixelTestingExt; mod utils; pub fn bench_downscale_rgba(bench_group: &mut utils::BenchGroup) { type P = U8x4; let src_image = P::load_big_image(); utils::resize_resize( bench_group, RGBA8P, src_image.as_raw().as_rgba(), src_image.width(), src_image.height(), ); utils::libvips_resize::
<P>
(bench_group, true); utils::fir_resize::
<P>
(bench_group, true); } fn main() { let res = utils::run_bench(bench_downscale_rgba, "Compare resize of RGBA image"); utils::print_and_write_compare_result(&res); } fast_image_resize-5.3.0/benches/bench_compare_rgba16.rs000064400000000000000000000012541046102023000212170ustar 00000000000000use fast_image_resize::pixels::U16x4; use resize::Pixel::RGBA16P; use rgb::FromSlice; use utils::testing::PixelTestingExt; mod utils; pub fn bench_downscale_rgba16(bench_group: &mut utils::BenchGroup) { type P = U16x4; let src_image = P::load_big_image(); utils::resize_resize( bench_group, RGBA16P, src_image.as_raw().as_rgba(), src_image.width(), src_image.height(), ); utils::libvips_resize::
<P>
(bench_group, true); utils::fir_resize::
<P>
(bench_group, true); } fn main() { let res = utils::run_bench(bench_downscale_rgba16, "Compare resize of RGBA16 image"); utils::print_and_write_compare_result(&res); } fast_image_resize-5.3.0/benches/bench_compare_rgba32f.rs000064400000000000000000000006151046102023000213630ustar 00000000000000use fast_image_resize::pixels::F32x4; mod utils; pub fn bench_downscale_rgba32f(bench_group: &mut utils::BenchGroup) { type P = F32x4; utils::libvips_resize::
<P>
(bench_group, true); utils::fir_resize::
<P>
(bench_group, true); } fn main() { let res = utils::run_bench(bench_downscale_rgba32f, "Compare resize of RGBA32F image"); utils::print_and_write_compare_result(&res); } fast_image_resize-5.3.0/benches/bench_resize.rs000064400000000000000000000177761046102023000177500ustar 00000000000000use fast_image_resize::images::Image; use fast_image_resize::pixels::*; use fast_image_resize::{CpuExtensions, FilterType, PixelType, ResizeAlg, ResizeOptions, Resizer}; use utils::testing::{cpu_ext_into_str, PixelTestingExt}; mod utils; const NEW_SIZE: u32 = 695; fn native_nearest_u8x4_bench(bench_group: &mut utils::BenchGroup) { let src_image = U8x4::load_big_square_src_image(); let mut dst_image = Image::new(NEW_SIZE, NEW_SIZE, PixelType::U8x4); let mut resizer = Resizer::new(); let options = ResizeOptions::new().resize_alg(ResizeAlg::Nearest); unsafe { resizer.set_cpu_extensions(CpuExtensions::None); } utils::bench(bench_group, 100, "U8x4 Nearest", "rust", |bencher| { bencher.iter(|| { resizer .resize(&src_image, &mut dst_image, &options) .unwrap() }) }); } #[cfg(not(feature = "only_u8x4"))] fn native_nearest_u8_bench(bench_group: &mut utils::BenchGroup) { let src_image = U8::load_big_square_src_image(); let mut dst_image = Image::new(NEW_SIZE, NEW_SIZE, PixelType::U8); let mut resizer = Resizer::new(); let options = ResizeOptions::new().resize_alg(ResizeAlg::Nearest); unsafe { resizer.set_cpu_extensions(CpuExtensions::None); } utils::bench(bench_group, 100, "U8 Nearest", "rust", |bencher| { bencher.iter(|| { resizer .resize(&src_image, &mut dst_image, &options) .unwrap() }) }); } fn downscale_bench( bench_group: &mut utils::BenchGroup, image: &Image<'static>, cpu_extensions: CpuExtensions, filter_type: FilterType, dst_width: u32, dst_height: u32, name_prefix: &str, ) { let mut res_image = Image::new(dst_width, dst_height, image.pixel_type()); let mut resizer = Resizer::new(); let options = ResizeOptions::new() .resize_alg(ResizeAlg::Convolution(filter_type)) .use_alpha(false); unsafe { resizer.set_cpu_extensions(cpu_extensions); } let prefix = if name_prefix.is_empty() { "".to_string() } else { format!(" {}", name_prefix) }; utils::bench( bench_group, 100, format!("{:?} {:?}", image.pixel_type(), filter_type), format!("{}{}", cpu_ext_into_str(cpu_extensions), prefix), |bencher| bencher.iter(|| resizer.resize(image, &mut res_image, &options).unwrap()), ); } pub fn resize_in_one_dimension_bench(bench_group: &mut utils::BenchGroup) { let pixel_types = [ PixelType::U8, PixelType::U8x2, PixelType::U8x3, PixelType::U8x4, PixelType::U16, PixelType::U16x2, PixelType::U16x3, PixelType::U16x4, PixelType::F32, PixelType::F32x2, PixelType::F32x3, PixelType::F32x4, ]; let mut cpu_extensions = vec![CpuExtensions::None]; #[cfg(target_arch = "x86_64")] { cpu_extensions.push(CpuExtensions::Sse4_1); cpu_extensions.push(CpuExtensions::Avx2); } #[cfg(target_arch = "aarch64")] { cpu_extensions.push(CpuExtensions::Neon); } #[cfg(target_arch = "wasm32")] { cpu_extensions.push(CpuExtensions::Simd128); } for pixel_type in pixel_types { #[cfg(feature = "only_u8x4")] if pixel_type != PixelType::U8x4 { continue; } for &cpu_extension in cpu_extensions.iter() { #[cfg(not(feature = "only_u8x4"))] let image = match pixel_type { PixelType::U8 => U8::load_big_square_src_image(), PixelType::U8x2 => U8x2::load_big_square_src_image(), PixelType::U8x3 => U8x3::load_big_square_src_image(), PixelType::U8x4 => U8x4::load_big_square_src_image(), PixelType::U16 => U16::load_big_square_src_image(), PixelType::U16x2 => 
U16x2::load_big_square_src_image(), PixelType::U16x3 => U16x3::load_big_square_src_image(), PixelType::U16x4 => U16x4::load_big_square_src_image(), PixelType::I32 => I32::load_big_square_src_image(), PixelType::F32 => F32::load_big_square_src_image(), PixelType::F32x2 => F32x2::load_big_square_src_image(), PixelType::F32x3 => F32x3::load_big_square_src_image(), PixelType::F32x4 => F32x4::load_big_square_src_image(), _ => unreachable!(), }; #[cfg(feature = "only_u8x4")] let image = match pixel_type { PixelType::U8x4 => U8x4::load_big_square_src_image(), _ => unreachable!(), }; downscale_bench( bench_group, &image, cpu_extension, FilterType::Lanczos3, NEW_SIZE, image.height(), "H", ); downscale_bench( bench_group, &image, cpu_extension, FilterType::Lanczos3, image.height(), NEW_SIZE, "V", ); } } } pub fn resize_bench(bench_group: &mut utils::BenchGroup) { let pixel_types = [ PixelType::U8, PixelType::U8x2, PixelType::U8x3, PixelType::U8x4, PixelType::U16, PixelType::U16x2, PixelType::U16x3, PixelType::U16x4, PixelType::I32, PixelType::F32, PixelType::F32x2, PixelType::F32x3, PixelType::F32x4, ]; let mut cpu_extensions = vec![CpuExtensions::None]; #[cfg(target_arch = "x86_64")] { cpu_extensions.push(CpuExtensions::Sse4_1); cpu_extensions.push(CpuExtensions::Avx2); } #[cfg(target_arch = "aarch64")] { cpu_extensions.push(CpuExtensions::Neon); } #[cfg(target_arch = "wasm32")] { cpu_extensions.push(CpuExtensions::Simd128); } for pixel_type in pixel_types { #[cfg(feature = "only_u8x4")] if pixel_type != PixelType::U8x4 { continue; } for &cpu_extension in cpu_extensions.iter() { #[cfg(not(feature = "only_u8x4"))] let image = match pixel_type { PixelType::U8 => U8::load_big_square_src_image(), PixelType::U8x2 => U8x2::load_big_square_src_image(), PixelType::U8x3 => U8x3::load_big_square_src_image(), PixelType::U8x4 => U8x4::load_big_square_src_image(), PixelType::U16 => U16::load_big_square_src_image(), PixelType::U16x2 => U16x2::load_big_square_src_image(), PixelType::U16x3 => U16x3::load_big_square_src_image(), PixelType::U16x4 => U16x4::load_big_square_src_image(), PixelType::I32 => I32::load_big_square_src_image(), PixelType::F32 => F32::load_big_square_src_image(), PixelType::F32x2 => F32x2::load_big_square_src_image(), PixelType::F32x3 => F32x3::load_big_square_src_image(), PixelType::F32x4 => F32x4::load_big_square_src_image(), _ => unreachable!(), }; #[cfg(feature = "only_u8x4")] let image = match pixel_type { PixelType::U8x4 => U8x4::load_big_square_src_image(), _ => unreachable!(), }; downscale_bench( bench_group, &image, cpu_extension, FilterType::Lanczos3, NEW_SIZE, NEW_SIZE, "", ); } } native_nearest_u8x4_bench(bench_group); #[cfg(not(feature = "only_u8x4"))] native_nearest_u8_bench(bench_group); } fn main1() { let results = utils::run_bench(resize_bench, "Resize"); println!("{}", utils::build_md_table(&results)); } fn main() { let results = utils::run_bench(resize_in_one_dimension_bench, "Resize one dimension"); println!("{}", utils::build_md_table(&results)); } fast_image_resize-5.3.0/benches/bench_threads.rs000064400000000000000000000041231046102023000200570ustar 00000000000000use fast_image_resize::images::Image; use fast_image_resize::pixels::U8x3; use fast_image_resize::{CpuExtensions, FilterType, ResizeAlg, ResizeOptions, Resizer}; use utils::testing::PixelTestingExt; use crate::utils::{bench, build_md_table, BenchGroup}; mod utils; pub fn fir_resize(bench_group: &mut BenchGroup, use_alpha: bool) { let src_sizes: Vec = vec![2, 10, 100, 500, 1000, 5000, 10000, 65536]; let mut 
resizer = Resizer::new(); unsafe { resizer.set_cpu_extensions(CpuExtensions::None); } let resize_options = ResizeOptions::new() .resize_alg(ResizeAlg::Convolution(FilterType::Bilinear)) .use_alpha(false); for &src_width in &src_sizes { for &src_height in &src_sizes { let dst_width = src_width / 2; let src_image = Image::new(src_width, src_height, P::pixel_type()); let mut dst_image = Image::new(dst_width, src_height, src_image.pixel_type()); for thread_count in 1..=8 { bench( bench_group, 10, format!("{src_width}x{src_height}"), format!("{thread_count}"), |bencher| { let mut builder = rayon::ThreadPoolBuilder::new(); builder = builder.num_threads(thread_count); let pool = builder.build().unwrap(); pool.install(|| { bencher.iter(|| { resizer .resize(&src_image, &mut dst_image, &resize_options) .unwrap() }) }) }, ); } } } } pub fn bench_threads(bench_group: &mut BenchGroup) { type P = U8x3; fir_resize::
<P>
(bench_group, false); } fn main() { let res = utils::run_bench(bench_threads, "Compare resize by width images with threads"); let md_table = build_md_table(&res); println!("{}", md_table); } fast_image_resize-5.3.0/benches/templates/bench_compare_l.md.tera000064400000000000000000000005611046102023000232740ustar 00000000000000### Resize L8 image (U8) 4928x3279 => 852x567 Pipeline: `src_image => resize => dst_image` - Source image [nasa-4928x3279.png](https://github.com/Cykooz/fast_image_resize/blob/main/data/nasa-4928x3279.png) has converted into grayscale image with one byte per pixel. - Numbers in the table mean a duration of image resizing in milliseconds. {{ compare_results -}} fast_image_resize-5.3.0/benches/templates/bench_compare_l16.md.tera000064400000000000000000000005641046102023000234460ustar 00000000000000### Resize L16 image (U16) 4928x3279 => 852x567 Pipeline: `src_image => resize => dst_image` - Source image [nasa-4928x3279.png](https://github.com/Cykooz/fast_image_resize/blob/main/data/nasa-4928x3279.png) has converted into grayscale image with two bytes per pixel. - Numbers in the table mean a duration of image resizing in milliseconds. {{ compare_results -}} fast_image_resize-5.3.0/benches/templates/bench_compare_l32f.md.tera000064400000000000000000000005651046102023000236130ustar 00000000000000### Resize L32F image (F32) 4928x3279 => 852x567 Pipeline: `src_image => resize => dst_image` - Source image [nasa-4928x3279.png](https://github.com/Cykooz/fast_image_resize/blob/main/data/nasa-4928x3279.png) has converted into grayscale image with two bytes per pixel. - Numbers in the table mean a duration of image resizing in milliseconds. {{ compare_results -}} fast_image_resize-5.3.0/benches/templates/bench_compare_la.md.tera000064400000000000000000000011051046102023000234300ustar 00000000000000### Resize LA8 image (U8x2) 4928x3279 => 852x567 Pipeline: `src_image => multiply by alpha => resize => divide by alpha => dst_image` - Source image [nasa-4928x3279-rgba.png](https://github.com/Cykooz/fast_image_resize/blob/main/data/nasa-4928x3279-rgba.png) has converted into grayscale image with an alpha channel (two bytes per pixel). - Numbers in the table mean a duration of image resizing in milliseconds. - The `image` crate does not support multiplying and dividing by alpha channel. - The `resize` crate does not support this pixel format. {{ compare_results -}} fast_image_resize-5.3.0/benches/templates/bench_compare_la16.md.tera000064400000000000000000000011421046102023000236000ustar 00000000000000### Resize LA16 (luma with alpha channel) image (U16x2) 4928x3279 => 852x567 Pipeline: `src_image => multiply by alpha => resize => divide by alpha => dst_image` - Source image [nasa-4928x3279-rgba.png](https://github.com/Cykooz/fast_image_resize/blob/main/data/nasa-4928x3279-rgba.png) has converted into grayscale image with an alpha channel (four bytes per pixel). - Numbers in the table mean a duration of image resizing in milliseconds. - The `image` crate does not support multiplying and dividing by alpha channel. - The `resize` crate does not support this pixel format. 
{{ compare_results -}} fast_image_resize-5.3.0/benches/templates/bench_compare_la32f.md.tera000064400000000000000000000011511046102023000237440ustar 00000000000000### Resize LA32F (luma with alpha channel) image (F32x2) 4928x3279 => 852x567 Pipeline: `src_image => multiply by alpha => resize => divide by alpha => dst_image` - Source image [nasa-4928x3279-rgba.png](https://github.com/Cykooz/fast_image_resize/blob/main/data/nasa-4928x3279-rgba.png) has converted into grayscale image with an alpha channel (two `f32` values per pixel). - Numbers in the table mean a duration of image resizing in milliseconds. - The `image` crate does not support multiplying and dividing by alpha channel. - The `resize` crate does not support this pixel format. {{ compare_results -}} fast_image_resize-5.3.0/benches/templates/bench_compare_rgb.md.tera000064400000000000000000000004671046102023000236200ustar 00000000000000### Resize RGB8 image (U8x3) 4928x3279 => 852x567 Pipeline: `src_image => resize => dst_image` - Source image [nasa-4928x3279.png](https://github.com/Cykooz/fast_image_resize/blob/main/data/nasa-4928x3279.png) - Numbers in the table mean a duration of image resizing in milliseconds. {{ compare_results -}} fast_image_resize-5.3.0/benches/templates/bench_compare_rgb16.md.tera000064400000000000000000000005331046102023000237610ustar 00000000000000### Resize RGB16 image (U16x3) 4928x3279 => 852x567 Pipeline: `src_image => resize => dst_image` - Source image [nasa-4928x3279.png](https://github.com/Cykooz/fast_image_resize/blob/main/data/nasa-4928x3279.png) has converted into RGB16 image. - Numbers in the table mean a duration of image resizing in milliseconds. {{ compare_results -}} fast_image_resize-5.3.0/benches/templates/bench_compare_rgb32f.md.tera000064400000000000000000000005351046102023000241270ustar 00000000000000### Resize RGB32F image (F32x3) 4928x3279 => 852x567 Pipeline: `src_image => resize => dst_image` - Source image [nasa-4928x3279.png](https://github.com/Cykooz/fast_image_resize/blob/main/data/nasa-4928x3279.png) has converted into RGB32F image. - Numbers in the table mean a duration of image resizing in milliseconds. {{ compare_results -}} fast_image_resize-5.3.0/benches/templates/bench_compare_rgba.md.tera000064400000000000000000000006741046102023000237610ustar 00000000000000### Resize RGBA8 image (U8x4) 4928x3279 => 852x567 Pipeline: `src_image => multiply by alpha => resize => divide by alpha => dst_image` - Source image [nasa-4928x3279-rgba.png](https://github.com/Cykooz/fast_image_resize/blob/main/data/nasa-4928x3279-rgba.png) - Numbers in the table mean a duration of image resizing in milliseconds. - The `image` crate does not support multiplying and dividing by alpha channel. {{ compare_results -}} fast_image_resize-5.3.0/benches/templates/bench_compare_rgba16.md.tera000064400000000000000000000006761046102023000241320ustar 00000000000000### Resize RGBA16 image (U16x4) 4928x3279 => 852x567 Pipeline: `src_image => multiply by alpha => resize => divide by alpha => dst_image` - Source image [nasa-4928x3279-rgba.png](https://github.com/Cykooz/fast_image_resize/blob/main/data/nasa-4928x3279-rgba.png) - Numbers in the table mean a duration of image resizing in milliseconds. - The `image` crate does not support multiplying and dividing by alpha channel. 
{{ compare_results -}} fast_image_resize-5.3.0/benches/templates/bench_compare_rgba32f.md.tera000064400000000000000000000010511046102023000242620ustar 00000000000000### Resize RGBA32F image (F32x4) 4928x3279 => 852x567 Pipeline: `src_image => multiply by alpha => resize => divide by alpha => dst_image` - Source image [nasa-4928x3279-rgba.png](https://github.com/Cykooz/fast_image_resize/blob/main/data/nasa-4928x3279-rgba.png) - Numbers in the table mean a duration of image resizing in milliseconds. - The `image` crate does not support multiplying and dividing by alpha channel. - The `resize` crate does not support multiplying and dividing by alpha channel for this pixel format. {{ compare_results -}} fast_image_resize-5.3.0/benches/templates/introduction.md.tera000064400000000000000000000016711046102023000227200ustar 00000000000000## Benchmarks of fast_image_resize crate for {{ arch_name }} architecture Environment: {% if arch_id == "arm64" -%} - CPU: Neoverse-N1 2GHz (Oracle Cloud Compute, VM.Standard.A1.Flex) {% else -%} - CPU: AMD Ryzen 9 5950X - RAM: DDR4 4000 MHz {% endif -%} - Ubuntu 24.04 (linux 6.11.0) - Rust 1.87.0 - criterion = "0.5.1" - fast_image_resize = "5.1.4" {% if arch_id == "wasm32" -%} - wasmtime = "32.0.0" {% endif %} Other libraries used to compare of resizing speed: - image = "0.25.6" () - resize = "0.8.8" (, single-threaded mode) {% if arch_id != "wasm32" -%} - libvips = "8.15.1" (single-threaded mode) {% endif %} Resize algorithms: - Nearest - Box - convolution with minimal kernel size 1x1 px - Bilinear - convolution with minimal kernel size 2x2 px - Bicubic (CatmullRom) - convolution with minimal kernel size 4x4 px - Lanczos3 - convolution with minimal kernel size 6x6 px fast_image_resize-5.3.0/benches/utils/bencher.rs000064400000000000000000000076141046102023000200440ustar 00000000000000use std::env; use std::path::PathBuf; use std::time::{Duration, SystemTime}; use criterion::measurement::WallTime; use criterion::{Bencher, BenchmarkGroup, BenchmarkId, Criterion}; use super::{cargo_target_directory, get_arch_id_and_name, get_results, BenchResult}; pub struct BenchGroup<'a> { pub criterion_group: BenchmarkGroup<'a, WallTime>, old_results: Vec, results: Vec, } impl<'a> BenchGroup<'a> { fn finish(self) -> Vec { self.criterion_group.finish(); self.results } } pub fn run_bench(bench_fn: F, name: &str) -> Vec where F: FnOnce(&mut BenchGroup), { if env::var("PIN_TO_CPU0").is_ok() { pin_process_to_cpu0(); } let arch_id = get_arch_id_and_name().0; let output_dir = criterion_output_directory().join(arch_id); let mut criterion = Criterion::default() .output_directory(&output_dir) .configure_from_args(); let now = SystemTime::now(); let results_dir = output_dir.join(name); let results_lifetime: u32 = env::var("RESULTS_LIFETIME") .unwrap_or_else(|_| "0".to_owned()) .parse() .unwrap_or_default(); let old_results = if results_lifetime > 0 && name.starts_with("Compare ") { let old_now = now - Duration::from_secs(results_lifetime as u64 * 24 * 3600); get_results(&results_dir, &old_now) } else { vec![] }; let mut group = BenchGroup { criterion_group: criterion.benchmark_group(name), old_results, results: vec![], }; bench_fn(&mut group); let mut results = group.finish(); criterion.final_summary(); let new_results = get_results(&results_dir, &now); if new_results.is_empty() { new_results } else { for res in results.iter_mut().filter(|r| r.estimate < 0.) 
{ res.estimate = new_results .iter() .find(|new_res| { new_res.function_name == res.function_name && new_res.parameter == res.parameter }) .map(|r| r.estimate) .unwrap_or(0.) } results } } pub fn bench( group: &mut BenchGroup, sample_size: usize, func_name: S1, parameter: S2, mut f: F, ) where S1: Into, S2: Into, F: FnMut(&mut Bencher), { let parameter = parameter.into(); let func_name = func_name.into(); // Use old results only for other libraries, not for 'fast_image_resize' if !func_name.starts_with("fir ") { if let Some(old_res) = group .old_results .iter() .find(|res| res.function_name == func_name && res.parameter == parameter) { group.results.push(old_res.clone()); println!( "SKIP benching of '{}' function with '{}' parameter due to using old result.", func_name, parameter ); return; } } group.results.push(BenchResult { function_name: func_name.clone(), parameter: parameter.clone(), estimate: -1., // Unknown result }); group.criterion_group.sample_size(sample_size); group.criterion_group.bench_with_input( BenchmarkId::new(func_name, ¶meter), ¶meter, |bencher, _| f(bencher), ); } /// Pin process to #0 CPU core pub fn pin_process_to_cpu0() { #[cfg(not(target_arch = "wasm32"))] { let mut cpu_set = nix::sched::CpuSet::new(); cpu_set.set(0).unwrap(); nix::sched::sched_setaffinity(nix::unistd::Pid::from_raw(0), &cpu_set).unwrap(); } } fn criterion_output_directory() -> PathBuf { if let Some(value) = env::var_os("CRITERION_HOME") { PathBuf::from(value) } else if let Some(path) = cargo_target_directory() { path.join("criterion") } else { PathBuf::from("target/criterion") } } fast_image_resize-5.3.0/benches/utils/mod.rs000064400000000000000000000024131046102023000172050ustar 00000000000000use std::env; use std::path::PathBuf; use std::process::Command; pub use bencher::*; pub use resize_functions::*; pub use results::*; use serde::Deserialize; mod bencher; mod resize_functions; mod results; pub mod testing; const fn get_arch_id_and_name() -> (&'static str, &'static str) { #[cfg(target_arch = "x86_64")] return ("x86_64", "x86_64"); #[cfg(target_arch = "aarch64")] return ("arm64", "arm64"); #[cfg(target_arch = "wasm32")] return ("wasm32", "Wasm32"); #[cfg(not(any( target_arch = "x86_64", target_arch = "aarch64", target_arch = "wasm32" )))] return ("unknown", "Unknown"); } /// Returns the Cargo target directory, possibly calling `cargo metadata` to /// figure it out. fn cargo_target_directory() -> Option { #[derive(Deserialize)] struct Metadata { target_directory: PathBuf, } env::var_os("CARGO_TARGET_DIR") .map(PathBuf::from) .or_else(|| { let output = Command::new(env::var_os("CARGO")?) 
.args(["metadata", "--format-version", "1"]) .output() .ok()?; let metadata: Metadata = serde_json::from_slice(&output.stdout).ok()?; Some(metadata.target_directory) }) } fast_image_resize-5.3.0/benches/utils/resize_functions.rs000064400000000000000000000220341046102023000220200ustar 00000000000000use std::ops::Deref; use criterion::black_box; use fast_image_resize::images::Image; use fast_image_resize::{CpuExtensions, FilterType, ResizeAlg, ResizeOptions, Resizer}; use image::{imageops, ImageBuffer}; use super::bencher::{bench, BenchGroup}; use super::testing::{cpu_ext_into_str, PixelTestingExt}; const ALG_NAMES: [&str; 5] = ["Nearest", "Box", "Bilinear", "Bicubic", "Lanczos3"]; const NEW_WIDTH: u32 = 852; const NEW_HEIGHT: u32 = 567; /// Resize image with help of "image" crate (https://crates.io/crates/image) pub fn image_resize(bench_group: &mut BenchGroup, src_image: &ImageBuffer) where P: image::Pixel + 'static, C: Deref, { for alg_name in ALG_NAMES { let (filter, sample_size) = match alg_name { "Nearest" => (imageops::Nearest, 80), "Bilinear" => (imageops::Triangle, 50), "Bicubic" => (imageops::CatmullRom, 30), "Lanczos3" => (imageops::Lanczos3, 20), _ => continue, }; bench(bench_group, sample_size, "image", alg_name, |bencher| { bencher.iter(|| { imageops::resize(src_image, NEW_WIDTH, NEW_HEIGHT, filter); }) }); } } /// Resize image with help of "resize" crate (https://crates.io/crates/resize) pub fn resize_resize( bench_group: &mut BenchGroup, pixel_format: Format, src_image: &[Format::InputPixel], src_width: u32, src_height: u32, ) where Out: Clone, Format: resize::PixelFormat + Copy, { for alg_name in ALG_NAMES { let mut dst = vec![pixel_format.into_pixel(Format::new()); (NEW_WIDTH * NEW_HEIGHT) as usize]; let sample_size = if alg_name == "Lanczos3" { 60 } else { 100 }; bench(bench_group, sample_size, "resize", alg_name, |bencher| { let filter = match alg_name { "Nearest" => resize::Type::Point, "Box" => resize::Type::Custom(resize::Filter::box_filter(0.5)), "Bilinear" => resize::Type::Triangle, "Bicubic" => resize::Type::Catrom, "Lanczos3" => resize::Type::Lanczos3, _ => return, }; let mut resizer = resize::new( src_width as usize, src_height as usize, NEW_WIDTH as usize, NEW_HEIGHT as usize, pixel_format, filter, ) .unwrap(); bencher.iter(|| { resizer.resize(src_image, &mut dst).unwrap(); }) }); } } /// Resize image with help of "libvips" crate (https://crates.io/crates/libvips) pub fn libvips_resize(bench_group: &mut BenchGroup, has_alpha: bool) { #[cfg(all(not(target_arch = "wasm32"), not(target_os = "windows")))] vips::libvips_resize_inner::
<P>
(bench_group, has_alpha); } #[cfg(all(not(target_arch = "wasm32"), not(target_os = "windows")))] mod vips { use libvips::ops::{self, BandFormat, Kernel, ReduceOptions}; use libvips::{VipsApp, VipsImage}; use super::*; const SAMPLE_SIZE: usize = 100; pub(crate) fn libvips_resize_inner( bench_group: &mut BenchGroup, has_alpha: bool, ) { let app = VipsApp::new("Test Libvips", false).expect("Cannot initialize libvips"); let num_threads: u32 = std::env::var("RAYON_NUM_THREADS") .map(|s| s.parse().unwrap_or(1)) .unwrap_or(1); if num_threads > 0 { app.concurrency_set(num_threads as i32); } app.cache_set_max(0); app.cache_set_max_mem(0); let src_image_data = P::load_big_src_image(); let src_width = src_image_data.width() as i32; let src_height = src_image_data.height() as i32; let band_format = match P::count_of_component_values() { 0x100 => BandFormat::Uchar, 0x10000 => BandFormat::Ushort, 0 => BandFormat::Float, _ => panic!("Unknown type of pixel"), }; let src_vips_image = VipsImage::new_from_memory( src_image_data.buffer(), src_width, src_height, P::count_of_components() as i32, band_format, ) .unwrap(); let hshrink = src_width as f64 / NEW_WIDTH as f64; let vshrink = src_height as f64 / NEW_HEIGHT as f64; for alg_name in ALG_NAMES { let kernel = match alg_name { "Nearest" => Kernel::Nearest, "Box" => { bench(bench_group, SAMPLE_SIZE, "libvips", alg_name, |bencher| { if has_alpha { bencher.iter(|| { let premultiplied = ops::premultiply(&src_vips_image).unwrap(); let resized = ops::shrink(&premultiplied, hshrink, vshrink).unwrap(); let result = ops::unpremultiply(&resized).unwrap(); let result = ops::cast(&result, band_format).unwrap(); let res_bytes = result.image_write_to_memory(); black_box(&res_bytes); }) } else { bencher.iter(|| { let result = ops::shrink(&src_vips_image, hshrink, vshrink).unwrap(); let res_bytes = result.image_write_to_memory(); black_box(&res_bytes); }) } }); continue; } "Bilinear" => Kernel::Linear, "Bicubic" => Kernel::Cubic, "Lanczos3" => Kernel::Lanczos3, _ => continue, }; let options = ReduceOptions { kernel, gap: 0. 
}; bench(bench_group, SAMPLE_SIZE, "libvips", alg_name, |bencher| { if has_alpha && alg_name != "Nearest" { bencher.iter(|| { let premultiplied = ops::premultiply(&src_vips_image).unwrap(); let resized = ops::reduce_with_opts(&premultiplied, hshrink, vshrink, &options) .unwrap(); let result = ops::unpremultiply(&resized).unwrap(); let result = ops::cast(&result, band_format).unwrap(); let res_bytes = result.image_write_to_memory(); black_box(&res_bytes); }) } else { bencher.iter(|| { let result = ops::reduce_with_opts(&src_vips_image, hshrink, vshrink, &options) .unwrap(); let res_bytes = result.image_write_to_memory(); black_box(&res_bytes); }) } }); } } } /// Resize image with help of "fast_imager_resize" crate pub fn fir_resize(bench_group: &mut BenchGroup, use_alpha: bool) { let src_image_data = P::load_big_src_image(); let mut dst_image = Image::new(NEW_WIDTH, NEW_HEIGHT, src_image_data.pixel_type()); let mut cpu_extensions = vec![CpuExtensions::None]; #[cfg(target_arch = "x86_64")] { cpu_extensions.push(CpuExtensions::Sse4_1); cpu_extensions.push(CpuExtensions::Avx2); } #[cfg(target_arch = "aarch64")] { cpu_extensions.push(CpuExtensions::Neon); } #[cfg(target_arch = "wasm32")] { cpu_extensions.push(CpuExtensions::Simd128); } for cpu_ext in cpu_extensions { for alg_name in ALG_NAMES { let resize_alg = match alg_name { "Nearest" => ResizeAlg::Nearest, "Box" => ResizeAlg::Convolution(FilterType::Box), "Bilinear" => ResizeAlg::Convolution(FilterType::Bilinear), "Bicubic" => ResizeAlg::Convolution(FilterType::CatmullRom), "Lanczos3" => ResizeAlg::Convolution(FilterType::Lanczos3), _ => continue, }; let mut fast_resizer = Resizer::new(); unsafe { fast_resizer.set_cpu_extensions(cpu_ext); } let sample_size = 100; let resize_options = ResizeOptions::new() .resize_alg(resize_alg) .use_alpha(use_alpha); bench( bench_group, sample_size, format!("fir {}", cpu_ext_into_str(cpu_ext)), alg_name, |bencher| { bencher.iter(|| { fast_resizer .resize(&src_image_data, &mut dst_image, &resize_options) .unwrap() }) }, ); } } } fast_image_resize-5.3.0/benches/utils/results.rs000064400000000000000000000242771046102023000201430ustar 00000000000000use std::borrow::Cow; use std::collections::HashMap; use std::env; use std::path::{Path, PathBuf}; use std::time::SystemTime; use itertools::Itertools; use serde::Deserialize; use walkdir::WalkDir; use super::get_arch_id_and_name; #[derive(Debug, Clone)] pub struct BenchResult { pub function_name: String, pub parameter: String, /// Estimate time in nanoseconds pub estimate: f64, } impl BenchResult { pub fn new(function_name: String, parameter: Option, path: &Path) -> Self { #[derive(Deserialize)] struct Mean { point_estimate: f64, } #[derive(Deserialize)] struct Estimates { mean: Mean, } let data = std::fs::read_to_string(path).expect("Unable to read file with benchmark results"); let estimates: Estimates = serde_json::from_str(&data).expect("Unable to parse JSON data with benchmark results"); Self { function_name, parameter: parameter.unwrap_or_default(), estimate: estimates.mean.point_estimate, } } } /// Find all "new/estimates.json" files inside of given directory. /// Get only files what were created after the given time. /// Read estimate time from this files and return vector of `BenchResult` instances. 
pub fn get_results(parent_dir: &PathBuf, modified_after: &SystemTime) -> Vec { let mut result = vec![]; if !parent_dir.is_dir() { println!("WARNING: Directory with bench results is absent"); return result; } let result_paths = WalkDir::new(parent_dir) .follow_links(true) .into_iter() .map(|e| e.expect("Invalid FS entry")) .filter(|e| e.path().ends_with("new/estimates.json")) .map(|e| (e.metadata().expect("Unable get metadata for FS entity"), e)) .filter(|(m, _)| m.is_file()) .map(|(m, e)| { ( m.modified() .expect("Unable to get last modification time of estimates.json file"), e, ) }) // Exclude old results .filter(|(modified, _)| modified >= modified_after) .sorted_by_key(|(modified, _)| modified.to_owned()) .map(|(_, e)| e.into_path()); for path in result_paths { let rel_path = path.strip_prefix(parent_dir).unwrap_or(&path).to_path_buf(); let path_components: Vec = rel_path .iter() .map(|os_str| { os_str .to_str() .expect("Unable to convert FS entry name into String") .to_string() }) .collect(); let (function_name, parameter_name) = match path_components.as_slice() { [f, p, _, _] => (f.to_string(), Some(p.to_string())), [f, _, _] => (f.to_string(), None), _ => panic!("Relative path to bench result is invalid"), }; result.push(BenchResult::new(function_name, parameter_name, &path)); } result } static COL_ORDER: [&str; 5] = ["Nearest", "Box", "Bilinear", "Bicubic", "Lanczos3"]; pub fn build_md_table(bench_results: &[BenchResult]) -> String { let mut row_names: Vec = Vec::new(); let mut row_indexes: HashMap = HashMap::new(); let mut col_names: Vec = Vec::new(); for result in bench_results { let row_name = result.function_name.clone(); if !row_names.contains(&row_name) { row_names.push(row_name.clone()); row_indexes.insert(row_name.clone(), row_names.len() - 1); } let col_name = result.parameter.clone(); if !col_names.contains(&col_name) { col_names.push(col_name.clone()); } } // Reorder columns let mut ordered_pos = 0; for name in COL_ORDER { if let Some((cur_pos, _)) = col_names.iter().find_position(|s| s.as_str() == name) { if cur_pos != ordered_pos { col_names.swap(cur_pos, ordered_pos); } ordered_pos += 1; } } let col_indexes: HashMap = col_names .iter() .enumerate() .map(|(i, v)| (v.clone(), i)) .collect(); let cols_count = col_names.len(); let mut values = vec![Cow::Borrowed("-"); row_names.len() * cols_count]; for result in bench_results { let row_index = row_indexes.get(&result.function_name).copied(); let col_index = col_indexes.get(&result.parameter).copied(); if let (Some(row_index), Some(col_index)) = (row_index, col_index) { let value = result.estimate / 1000000.; if value >= 0.01 { let value_index = row_index * cols_count + col_index; values[value_index] = Cow::Owned(format!("{:.2}", value)); } } } let first_column_width = row_names.iter().map(|s| s.len()).max().unwrap_or(0); let mut column_width: Vec = vec![first_column_width]; for (col_index, col_name) in col_names.iter().enumerate() { let width = (0..row_names.len()) .map(|row_index| { let value_index = row_index * cols_count + col_index; values.get(value_index).map(|v| v.len()).unwrap_or(0) }) .max() .unwrap_or(0); column_width.push(width.max(col_name.len())); } let mut first_row: Vec = vec!["".to_owned()]; col_names.iter().for_each(|s| first_row.push(s.to_owned())); let mut str_buffer: Vec = vec![]; table_row(&mut str_buffer, &column_width, &first_row); table_header_underline(&mut str_buffer, &column_width); for row_name in row_names.iter() { let mut row = vec![row_name.clone()]; for col_name in col_names.iter() { let 
row_index = row_indexes.get(row_name).copied(); let col_index = col_indexes.get(col_name).copied(); if let (Some(row_index), Some(col_index)) = (row_index, col_index) { let value_index = row_index * cols_count + col_index; let value = values .get(value_index) .map(|v| v.to_string()) .unwrap_or_default(); row.push(value); } } table_row(&mut str_buffer, &column_width, &row); } str_buffer.join("") } fn table_row(buffer: &mut Vec, widths: &[usize], values: &[String]) { for (i, (&width, value)) in widths.iter().zip(values).enumerate() { match i { 0 => buffer.push(format!("| {:width$} ", value, width = width)), _ => buffer.push(format!("| {:^width$} ", value, width = width)), } } buffer.push("|\n".to_string()); } fn table_header_underline(buffer: &mut Vec, widths: &[usize]) { for (i, &width) in widths.iter().enumerate() { match i { 0 => buffer.push(format!("|{:- buffer.push(format!("|:{:-\n", placeholder_name); let start = match content.find(&start_maker) { Some(s) => s, None => { println!( "WARNING: Can't find start marker for placeholder '{}' in file {:?}", placeholder_name, path ); return; } }; let end_maker = format!("", placeholder_name); let end = match content.find(&end_maker) { Some(s) => s, None => { println!( "WARNING: Can't find end marker for placeholder '{}' in file {:?}", placeholder_name, path ); return; } }; let replace_str = [start_maker.as_str(), string].join(""); content.replace_range(start..end, &replace_str); std::fs::write(path, content).expect("Unable to save string into file"); } #[cfg(not(target_arch = "wasm32"))] static TEMPLATES_PATH: &str = "benches/templates/*.tera"; // WASI doesn't support `std::fs::canonicalize` function #[cfg(target_arch = "wasm32")] static TEMPLATES_PATH: &str = "/benches/templates/*.tera"; fn write_bench_results_into_file(md_table: &str) { let (arch_id, arch_name) = get_arch_id_and_name(); let file_name = format!("benchmarks-{}.md", arch_id); let file_path_buf = PathBuf::from(file_name); if !file_path_buf.is_file() { panic!("Can't find file {:?} in current directory", file_path_buf); } let file_path = file_path_buf.as_path(); let tera_engine = match tera::Tera::new(TEMPLATES_PATH) { Ok(t) => t, Err(e) => { println!("Parsing error(s): {}", e); ::std::process::exit(1); } }; let mut context = tera::Context::new(); context.insert("arch_id", &arch_id); context.insert("arch_name", &arch_name); // Update introduction text let introduction = tera_engine .render("introduction.md.tera", &context) .unwrap(); insert_string_into_file(file_path, "introduction", &introduction); // Update benchmark results let crate_name = env!("CARGO_CRATE_NAME"); context.insert("compare_results", md_table); let tpl_file_name = format!("{}.md.tera", crate_name); let results_block = tera_engine.render(&tpl_file_name, &context).unwrap(); insert_string_into_file(file_path, crate_name, &results_block); if arch_id == "x86_64" { let file_path = PathBuf::from("README.md"); if !file_path.is_file() { panic!("Can't find file {:?} in current directory", file_path); } insert_string_into_file(file_path.as_path(), crate_name, &results_block); } } pub fn print_and_write_compare_result(bench_results: &[BenchResult]) { if !bench_results.is_empty() { let md_table = build_md_table(bench_results); println!("{}", md_table); if env::var("WRITE_COMPARE_RESULT").unwrap_or_else(|_| "".to_owned()) == "1" { write_bench_results_into_file(&md_table); } } } fast_image_resize-5.3.0/benches/utils/testing.rs000064400000000000000000000350421046102023000201070ustar 00000000000000use std::fs::File; use 
std::io::BufReader; use std::num::NonZeroU32; use std::ops::Deref; use fast_image_resize::images::Image; use fast_image_resize::pixels::*; use fast_image_resize::{change_type_of_pixel_components, CpuExtensions, PixelTrait, PixelType}; use image::{ColorType, ExtendedColorType, ImageBuffer, ImageReader}; pub fn non_zero_u32(v: u32) -> NonZeroU32 { NonZeroU32::new(v).unwrap() } pub fn image_checksum(image: &Image) -> [u64; N] { let buffer = image.buffer(); let mut res = [0u64; N]; let component_size = P::size() / P::count_of_components(); match component_size { 1 => { for pixel in buffer.chunks_exact(N) { res.iter_mut().zip(pixel).for_each(|(d, &s)| *d += s as u64); } } 2 => { let buffer_u16 = unsafe { buffer.align_to::().1 }; for pixel in buffer_u16.chunks_exact(N) { res.iter_mut().zip(pixel).for_each(|(d, &s)| *d += s as u64); } } 4 => { let buffer_u32 = unsafe { buffer.align_to::().1 }; for pixel in buffer_u32.chunks_exact(N) { res.iter_mut() .zip(pixel) .for_each(|(d, &s)| *d = d.overflowing_add(s as u64).0); } } _ => (), }; res } pub trait PixelTestingExt: PixelTrait { type ImagePixel: image::Pixel; type Container: Deref::Subpixel]>; fn pixel_type_str() -> &'static str { match Self::pixel_type() { PixelType::U8 => "u8", PixelType::U8x2 => "u8x2", PixelType::U8x3 => "u8x3", PixelType::U8x4 => "u8x4", PixelType::U16 => "u16", PixelType::U16x2 => "u16x2", PixelType::U16x3 => "u16x3", PixelType::U16x4 => "u16x4", PixelType::I32 => "i32", PixelType::F32 => "f32", PixelType::F32x2 => "f32x2", PixelType::F32x3 => "f32x3", PixelType::F32x4 => "f32x4", _ => unreachable!(), } } fn cpu_extensions() -> Vec { let mut cpu_extensions_vec = vec![CpuExtensions::None]; #[cfg(target_arch = "x86_64")] { cpu_extensions_vec.push(CpuExtensions::Sse4_1); cpu_extensions_vec.push(CpuExtensions::Avx2); } #[cfg(target_arch = "aarch64")] { cpu_extensions_vec.push(CpuExtensions::Neon); } #[cfg(target_arch = "wasm32")] { cpu_extensions_vec.push(CpuExtensions::Simd128); } cpu_extensions_vec } fn img_paths() -> (&'static str, &'static str, &'static str) { match Self::pixel_type() { PixelType::U8 | PixelType::U8x3 | PixelType::U16 | PixelType::U16x3 | PixelType::I32 | PixelType::F32 | PixelType::F32x3 => ( "./data/nasa-4928x3279.png", "./data/nasa-4019x4019.png", "./data/nasa-852x567.png", ), PixelType::U8x2 | PixelType::U8x4 | PixelType::U16x2 | PixelType::U16x4 | PixelType::F32x2 | PixelType::F32x4 => ( "./data/nasa-4928x3279-rgba.png", "./data/nasa-4019x4019-rgba.png", "./data/nasa-852x567-rgba.png", ), _ => unreachable!(), } } fn load_image_buffer( img_reader: ImageReader>, ) -> ImageBuffer; fn load_big_image() -> ImageBuffer { Self::load_image_buffer(ImageReader::open(Self::img_paths().0).unwrap()) } fn load_big_square_image() -> ImageBuffer { Self::load_image_buffer(ImageReader::open(Self::img_paths().1).unwrap()) } fn load_small_image() -> ImageBuffer { Self::load_image_buffer(ImageReader::open(Self::img_paths().2).unwrap()) } fn load_big_src_image() -> Image<'static> { let img = Self::load_big_image(); Image::from_vec_u8( img.width(), img.height(), Self::img_into_bytes(img), Self::pixel_type(), ) .unwrap() } fn load_big_square_src_image() -> Image<'static> { let img = Self::load_big_square_image(); Image::from_vec_u8( img.width(), img.height(), Self::img_into_bytes(img), Self::pixel_type(), ) .unwrap() } fn load_small_src_image() -> Image<'static> { let img = Self::load_small_image(); Image::from_vec_u8( img.width(), img.height(), Self::img_into_bytes(img), Self::pixel_type(), ) .unwrap() } fn img_into_bytes(img: 
ImageBuffer) -> Vec; } #[cfg(not(feature = "only_u8x4"))] pub mod not_u8x4 { use super::*; impl PixelTestingExt for U8 { type ImagePixel = image::Luma; type Container = Vec; fn load_image_buffer( img_reader: ImageReader>, ) -> ImageBuffer { img_reader.decode().unwrap().to_luma8() } fn img_into_bytes(img: ImageBuffer) -> Vec { img.into_raw() } } impl PixelTestingExt for U8x2 { type ImagePixel = image::LumaA; type Container = Vec; fn load_image_buffer( img_reader: ImageReader>, ) -> ImageBuffer { img_reader.decode().unwrap().to_luma_alpha8() } fn img_into_bytes(img: ImageBuffer) -> Vec { img.into_raw() } } impl PixelTestingExt for U8x3 { type ImagePixel = image::Rgb; type Container = Vec; fn load_image_buffer( img_reader: ImageReader>, ) -> ImageBuffer { img_reader.decode().unwrap().to_rgb8() } fn img_into_bytes(img: ImageBuffer) -> Vec { img.into_raw() } } impl PixelTestingExt for U16 { type ImagePixel = image::Luma; type Container = Vec; fn load_image_buffer( img_reader: ImageReader>, ) -> ImageBuffer { img_reader.decode().unwrap().to_luma16() } fn img_into_bytes(img: ImageBuffer) -> Vec { // img.as_raw() // .iter() // .enumerate() // .flat_map(|(i, &c)| ((i & 0xffff) as u16).to_le_bytes()) // .collect() img.as_raw().iter().flat_map(|&c| c.to_le_bytes()).collect() } } impl PixelTestingExt for U16x2 { type ImagePixel = image::LumaA; type Container = Vec; fn load_image_buffer( img_reader: ImageReader>, ) -> ImageBuffer { img_reader.decode().unwrap().to_luma_alpha16() } fn img_into_bytes(img: ImageBuffer) -> Vec { img.as_raw().iter().flat_map(|&c| c.to_le_bytes()).collect() } } impl PixelTestingExt for U16x3 { type ImagePixel = image::Rgb; type Container = Vec; fn load_image_buffer( img_reader: ImageReader>, ) -> ImageBuffer { img_reader.decode().unwrap().to_rgb16() } fn img_into_bytes(img: ImageBuffer) -> Vec { img.as_raw().iter().flat_map(|&c| c.to_le_bytes()).collect() } } impl PixelTestingExt for U16x4 { type ImagePixel = image::Rgba; type Container = Vec; fn load_image_buffer( img_reader: ImageReader>, ) -> ImageBuffer { img_reader.decode().unwrap().to_rgba16() } fn img_into_bytes(img: ImageBuffer) -> Vec { img.as_raw().iter().flat_map(|&c| c.to_le_bytes()).collect() } } impl PixelTestingExt for I32 { type ImagePixel = image::Luma; type Container = Vec; fn cpu_extensions() -> Vec { vec![CpuExtensions::None] } fn load_image_buffer( img_reader: ImageReader>, ) -> ImageBuffer { let image_u16 = img_reader.decode().unwrap().to_luma32f(); ImageBuffer::from_fn(image_u16.width(), image_u16.height(), |x, y| { let pixel = image_u16.get_pixel(x, y); image::Luma::from([(pixel.0[0] * i32::MAX as f32).round() as i32]) }) } fn img_into_bytes(img: ImageBuffer) -> Vec { img.as_raw() .iter() .flat_map(|val| val.to_le_bytes()) .collect() } } impl PixelTestingExt for F32 { type ImagePixel = image::Luma; type Container = Vec; fn load_image_buffer( img_reader: ImageReader>, ) -> ImageBuffer { img_reader.decode().unwrap().to_luma32f() } fn img_into_bytes(img: ImageBuffer) -> Vec { img.as_raw() .iter() .flat_map(|val| val.to_le_bytes()) .collect() } } impl PixelTestingExt for F32x2 { type ImagePixel = image::LumaA; type Container = Vec; fn load_image_buffer( img_reader: ImageReader>, ) -> ImageBuffer { img_reader.decode().unwrap().to_luma_alpha32f() } fn img_into_bytes(img: ImageBuffer) -> Vec { img.as_raw() .iter() .flat_map(|val| val.to_le_bytes()) .collect() } } impl PixelTestingExt for F32x3 { type ImagePixel = image::Rgb; type Container = Vec; fn load_image_buffer( img_reader: ImageReader>, ) -> 
ImageBuffer { img_reader.decode().unwrap().to_rgb32f() } fn img_into_bytes(img: ImageBuffer) -> Vec { img.as_raw().iter().flat_map(|&c| c.to_le_bytes()).collect() } } impl PixelTestingExt for F32x4 { type ImagePixel = image::Rgba; type Container = Vec; fn load_image_buffer( img_reader: ImageReader>, ) -> ImageBuffer { img_reader.decode().unwrap().to_rgba32f() } fn img_into_bytes(img: ImageBuffer) -> Vec { img.as_raw().iter().flat_map(|&c| c.to_le_bytes()).collect() } } } impl PixelTestingExt for U8x4 { type ImagePixel = image::Rgba; type Container = Vec; fn load_image_buffer( img_reader: ImageReader>, ) -> ImageBuffer { img_reader.decode().unwrap().to_rgba8() } fn img_into_bytes(img: ImageBuffer) -> Vec { img.into_raw() } } pub fn save_result(image: &Image, name: &str) { if std::env::var("SAVE_RESULT") .unwrap_or_else(|_| "".to_owned()) .is_empty() { return; } std::fs::create_dir_all("./data/result").unwrap(); let path = format!("./data/result/{name}.png"); let color_type: ExtendedColorType = match image.pixel_type() { PixelType::U8 => ColorType::L8.into(), PixelType::U8x2 => ColorType::La8.into(), PixelType::U8x3 => ColorType::Rgb8.into(), PixelType::U8x4 => ColorType::Rgba8.into(), PixelType::U16 => ColorType::L16.into(), PixelType::U16x2 => ColorType::La16.into(), PixelType::U16x3 => ColorType::Rgb16.into(), PixelType::U16x4 => ColorType::Rgba16.into(), PixelType::I32 | PixelType::F32 => { let mut image_u16 = Image::new(image.width(), image.height(), PixelType::U16); change_type_of_pixel_components(image, &mut image_u16).unwrap(); save_result(&image_u16, name); return; } PixelType::F32x2 => { let mut image_u16 = Image::new(image.width(), image.height(), PixelType::U16x2); change_type_of_pixel_components(image, &mut image_u16).unwrap(); save_result(&image_u16, name); return; } PixelType::F32x3 => { let mut image_u16 = Image::new(image.width(), image.height(), PixelType::U16x3); change_type_of_pixel_components(image, &mut image_u16).unwrap(); save_result(&image_u16, name); return; } PixelType::F32x4 => { let mut image_u16 = Image::new(image.width(), image.height(), PixelType::U16x4); change_type_of_pixel_components(image, &mut image_u16).unwrap(); save_result(&image_u16, name); return; } _ => panic!("Unsupported type of pixels"), }; image::save_buffer( path, image.buffer(), image.width(), image.height(), color_type, ) .unwrap(); } pub const fn cpu_ext_into_str(cpu_extensions: CpuExtensions) -> &'static str { match cpu_extensions { CpuExtensions::None => "rust", #[cfg(target_arch = "x86_64")] CpuExtensions::Sse4_1 => "sse4.1", #[cfg(target_arch = "x86_64")] CpuExtensions::Avx2 => "avx2", #[cfg(target_arch = "aarch64")] CpuExtensions::Neon => "neon", #[cfg(target_arch = "wasm32")] CpuExtensions::Simd128 => "simd128", } } fast_image_resize-5.3.0/benchmarks-arm64.md000064400000000000000000000261631046102023000167070ustar 00000000000000 ## Benchmarks of fast_image_resize crate for arm64 architecture Environment: - CPU: Neoverse-N1 2GHz (Oracle Cloud Compute, VM.Standard.A1.Flex) - Ubuntu 24.04 (linux 6.11.0) - Rust 1.87.0 - criterion = "0.5.1" - fast_image_resize = "5.1.4" Other libraries used to compare of resizing speed: - image = "0.25.6" () - resize = "0.8.8" (, single-threaded mode) - libvips = "8.15.1" (single-threaded mode) Resize algorithms: - Nearest - Box - convolution with minimal kernel size 1x1 px - Bilinear - convolution with minimal kernel size 2x2 px - Bicubic (CatmullRom) - convolution with minimal kernel size 4x4 px - Lanczos3 - convolution with minimal kernel size 6x6 px ### 
Resize RGB8 image (U8x3) 4928x3279 => 852x567 Pipeline: `src_image => resize => dst_image` - Source image [nasa-4928x3279.png](https://github.com/Cykooz/fast_image_resize/blob/main/data/nasa-4928x3279.png) - Numbers in the table mean a duration of image resizing in milliseconds. | | Nearest | Box | Bilinear | Bicubic | Lanczos3 | |----------|:-------:|:------:|:--------:|:-------:|:--------:| | image | 83.12 | - | 170.54 | 305.90 | 433.80 | | resize | 30.38 | 59.92 | 101.62 | 183.87 | 273.82 | | libvips | 9.59 | 137.52 | 26.91 | 66.06 | 88.65 | | fir rust | 0.92 | 19.25 | 32.19 | 82.86 | 110.04 | | fir neon | 0.92 | 16.46 | 23.75 | 42.43 | 62.16 | ### Resize RGBA8 image (U8x4) 4928x3279 => 852x567 Pipeline: `src_image => multiply by alpha => resize => divide by alpha => dst_image` - Source image [nasa-4928x3279-rgba.png](https://github.com/Cykooz/fast_image_resize/blob/main/data/nasa-4928x3279-rgba.png) - Numbers in the table mean a duration of image resizing in milliseconds. - The `image` crate does not support multiplying and dividing by alpha channel. | | Nearest | Box | Bilinear | Bicubic | Lanczos3 | |----------|:-------:|:------:|:--------:|:-------:|:--------:| | resize | 20.69 | 73.58 | 113.62 | 195.21 | 291.60 | | libvips | 13.87 | 326.93 | 234.51 | 471.17 | 599.71 | | fir rust | 0.92 | 44.76 | 58.95 | 126.45 | 165.50 | | fir neon | 0.92 | 29.08 | 40.81 | 64.57 | 90.29 | ### Resize L8 image (U8) 4928x3279 => 852x567 Pipeline: `src_image => resize => dst_image` - Source image [nasa-4928x3279.png](https://github.com/Cykooz/fast_image_resize/blob/main/data/nasa-4928x3279.png) has converted into grayscale image with one byte per pixel. - Numbers in the table mean a duration of image resizing in milliseconds. | | Nearest | Box | Bilinear | Bicubic | Lanczos3 | |----------|:-------:|:-----:|:--------:|:-------:|:--------:| | image | 78.35 | - | 119.04 | 186.56 | 258.32 | | resize | 11.31 | 26.78 | 40.38 | 70.32 | 93.02 | | libvips | 5.68 | 51.41 | 14.45 | 24.51 | 31.19 | | fir rust | 0.48 | 9.30 | 10.73 | 18.47 | 25.09 | | fir neon | 0.48 | 4.92 | 7.48 | 13.16 | 20.15 | ### Resize LA8 image (U8x2) 4928x3279 => 852x567 Pipeline: `src_image => multiply by alpha => resize => divide by alpha => dst_image` - Source image [nasa-4928x3279-rgba.png](https://github.com/Cykooz/fast_image_resize/blob/main/data/nasa-4928x3279-rgba.png) has converted into grayscale image with an alpha channel (two bytes per pixel). - Numbers in the table mean a duration of image resizing in milliseconds. - The `image` crate does not support multiplying and dividing by alpha channel. - The `resize` crate does not support this pixel format. | | Nearest | Box | Bilinear | Bicubic | Lanczos3 | |----------|:-------:|:------:|:--------:|:-------:|:--------:| | libvips | 8.81 | 188.05 | 134.45 | 231.87 | 290.01 | | fir rust | 0.66 | 18.79 | 25.40 | 43.31 | 55.68 | | fir neon | 0.66 | 16.88 | 21.07 | 32.55 | 44.99 | ### Resize RGB16 image (U16x3) 4928x3279 => 852x567 Pipeline: `src_image => resize => dst_image` - Source image [nasa-4928x3279.png](https://github.com/Cykooz/fast_image_resize/blob/main/data/nasa-4928x3279.png) has converted into RGB16 image. - Numbers in the table mean a duration of image resizing in milliseconds. 
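The `fir rust` and `fir neon` rows below come from the same code path; the benchmark only switches which SIMD backend the resizer is allowed to use. A minimal sketch of that switch, condensed from the `fir_resize` helper in `benches/utils/resize_functions.rs` (the helper name `resizer_with` is illustrative, not part of the crate):

```rust
use fast_image_resize::{CpuExtensions, Resizer};

/// Build a resizer pinned to one SIMD backend: `CpuExtensions::None` produces
/// the "fir rust" rows, `CpuExtensions::Neon` (aarch64 targets only) the
/// "fir neon" rows.
fn resizer_with(ext: CpuExtensions) -> Resizer {
    let mut resizer = Resizer::new();
    // Safety: the caller must ensure `ext` is actually supported by the CPU.
    unsafe {
        resizer.set_cpu_extensions(ext);
    }
    resizer
}
```
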
| | Nearest | Box | Bilinear | Bicubic | Lanczos3 | |----------|:-------:|:------:|:--------:|:-------:|:--------:| | image | 82.91 | - | 173.38 | 338.57 | 493.64 | | resize | 19.50 | 58.09 | 98.29 | 183.34 | 267.44 | | libvips | 23.67 | 197.37 | 109.00 | 228.02 | 301.44 | | fir rust | 1.39 | 48.64 | 76.07 | 135.93 | 191.26 | | fir neon | 1.39 | 54.50 | 72.78 | 111.57 | 138.66 | ### Resize RGBA16 image (U16x4) 4928x3279 => 852x567 Pipeline: `src_image => multiply by alpha => resize => divide by alpha => dst_image` - Source image [nasa-4928x3279-rgba.png](https://github.com/Cykooz/fast_image_resize/blob/main/data/nasa-4928x3279-rgba.png) - Numbers in the table mean a duration of image resizing in milliseconds. - The `image` crate does not support multiplying and dividing by alpha channel. | | Nearest | Box | Bilinear | Bicubic | Lanczos3 | |----------|:-------:|:------:|:--------:|:-------:|:--------:| | resize | 23.64 | 77.94 | 115.55 | 206.91 | 303.21 | | libvips | 33.79 | 329.22 | 236.24 | 466.96 | 591.03 | | fir rust | 1.51 | 73.59 | 108.54 | 200.42 | 267.85 | | fir neon | 1.51 | 45.25 | 60.21 | 90.64 | 120.82 | ### Resize L16 image (U16) 4928x3279 => 852x567 Pipeline: `src_image => resize => dst_image` - Source image [nasa-4928x3279.png](https://github.com/Cykooz/fast_image_resize/blob/main/data/nasa-4928x3279.png) has converted into grayscale image with two bytes per pixel. - Numbers in the table mean a duration of image resizing in milliseconds. | | Nearest | Box | Bilinear | Bicubic | Lanczos3 | |----------|:-------:|:-----:|:--------:|:-------:|:--------:| | image | 78.13 | - | 123.28 | 195.56 | 272.13 | | resize | 11.84 | 25.06 | 38.29 | 68.43 | 95.25 | | libvips | 9.27 | 69.76 | 38.82 | 76.85 | 100.79 | | fir rust | 0.64 | 23.31 | 35.76 | 58.21 | 85.60 | | fir neon | 0.64 | 11.80 | 16.48 | 26.00 | 36.87 | ### Resize LA16 (luma with alpha channel) image (U16x2) 4928x3279 => 852x567 Pipeline: `src_image => multiply by alpha => resize => divide by alpha => dst_image` - Source image [nasa-4928x3279-rgba.png](https://github.com/Cykooz/fast_image_resize/blob/main/data/nasa-4928x3279-rgba.png) has converted into grayscale image with an alpha channel (four bytes per pixel). - Numbers in the table mean a duration of image resizing in milliseconds. - The `image` crate does not support multiplying and dividing by alpha channel. - The `resize` crate does not support this pixel format. | | Nearest | Box | Bilinear | Bicubic | Lanczos3 | |----------|:-------:|:------:|:--------:|:-------:|:--------:| | libvips | 18.43 | 202.03 | 145.95 | 242.56 | 302.87 | | fir rust | 0.93 | 41.87 | 60.11 | 101.73 | 140.40 | | fir neon | 0.93 | 24.30 | 33.30 | 52.33 | 71.84 | ### Resize L32F image (F32) 4928x3279 => 852x567 Pipeline: `src_image => resize => dst_image` - Source image [nasa-4928x3279.png](https://github.com/Cykooz/fast_image_resize/blob/main/data/nasa-4928x3279.png) has converted into grayscale image with two bytes per pixel. - Numbers in the table mean a duration of image resizing in milliseconds. | | Nearest | Box | Bilinear | Bicubic | Lanczos3 | |----------|:-------:|:-----:|:--------:|:-------:|:--------:| | image | 7.06 | - | 39.97 | 72.83 | 106.05 | | resize | 12.14 | 23.30 | 32.60 | 55.84 | 83.37 | | libvips | 8.20 | 67.17 | 40.19 | 90.36 | 118.79 | | fir rust | 0.93 | 18.92 | 30.89 | 54.12 | 78.40 | Note: The `resize` crate uses `f32` for intermediate calculations. The `fast_image_resize` uses `f64`. 
This is a reason why `fast_image_resize` is slower or equal in cases with `f32`-based pixels. ### Resize LA32F (luma with alpha channel) image (F32x2) 4928x3279 => 852x567 Pipeline: `src_image => multiply by alpha => resize => divide by alpha => dst_image` - Source image [nasa-4928x3279-rgba.png](https://github.com/Cykooz/fast_image_resize/blob/main/data/nasa-4928x3279-rgba.png) has converted into grayscale image with an alpha channel (two `f32` values per pixel). - Numbers in the table mean a duration of image resizing in milliseconds. - The `image` crate does not support multiplying and dividing by alpha channel. - The `resize` crate does not support this pixel format. | | Nearest | Box | Bilinear | Bicubic | Lanczos3 | |----------|:-------:|:------:|:--------:|:-------:|:--------:| | libvips | 16.89 | 184.38 | 129.68 | 225.33 | 283.49 | | fir rust | 1.52 | 38.62 | 61.88 | 116.00 | 162.75 | ### Resize RGB32F image (F32x3) 4928x3279 => 852x567 Pipeline: `src_image => resize => dst_image` - Source image [nasa-4928x3279.png](https://github.com/Cykooz/fast_image_resize/blob/main/data/nasa-4928x3279.png) has converted into RGB32F image. - Numbers in the table mean a duration of image resizing in milliseconds. | | Nearest | Box | Bilinear | Bicubic | Lanczos3 | |----------|:-------:|:------:|:--------:|:-------:|:--------:| | image | 8.00 | - | 51.74 | 95.36 | 139.32 | | resize | 19.96 | 45.73 | 69.25 | 133.08 | 190.53 | | libvips | 19.48 | 197.65 | 114.33 | 274.50 | 354.63 | | fir rust | 2.29 | 39.09 | 71.75 | 149.96 | 214.76 | ### Resize RGBA32F image (F32x4) 4928x3279 => 852x567 Pipeline: `src_image => multiply by alpha => resize => divide by alpha => dst_image` - Source image [nasa-4928x3279-rgba.png](https://github.com/Cykooz/fast_image_resize/blob/main/data/nasa-4928x3279-rgba.png) - Numbers in the table mean a duration of image resizing in milliseconds. - The `image` crate does not support multiplying and dividing by alpha channel. - The `resize` crate does not support multiplying and dividing by alpha channel for this pixel format. | | Nearest | Box | Bilinear | Bicubic | Lanczos3 | |----------|:-------:|:------:|:--------:|:-------:|:--------:| | libvips | 32.13 | 323.66 | 230.94 | 456.97 | 587.84 | | fir rust | 3.01 | 68.00 | 111.35 | 209.27 | 315.19 | fast_image_resize-5.3.0/benchmarks-wasm32.md000064400000000000000000000246401046102023000170700ustar 00000000000000 ## Benchmarks of fast_image_resize crate for Wasm32 architecture Environment: - CPU: AMD Ryzen 9 5950X - RAM: DDR4 4000 MHz - Ubuntu 24.04 (linux 6.11.0) - Rust 1.87.0 - criterion = "0.5.1" - fast_image_resize = "5.1.4" - wasmtime = "32.0.0" Other libraries used to compare of resizing speed: - image = "0.25.6" () - resize = "0.8.8" (, single-threaded mode) Resize algorithms: - Nearest - Box - convolution with minimal kernel size 1x1 px - Bilinear - convolution with minimal kernel size 2x2 px - Bicubic (CatmullRom) - convolution with minimal kernel size 4x4 px - Lanczos3 - convolution with minimal kernel size 6x6 px ### Resize RGB8 image (U8x3) 4928x3279 => 852x567 Pipeline: `src_image => resize => dst_image` - Source image [nasa-4928x3279.png](https://github.com/Cykooz/fast_image_resize/blob/main/data/nasa-4928x3279.png) - Numbers in the table mean a duration of image resizing in milliseconds. 
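For reference, the `src_image => resize => dst_image` pipeline measured here is a single `Resizer::resize()` call per algorithm. A minimal sketch, mirroring the algorithm mapping used by the `fir_resize` helper (image sizes are illustrative and the buffers are left zero-filled):

```rust
use fast_image_resize::images::Image;
use fast_image_resize::{FilterType, PixelType, ResizeAlg, ResizeOptions, Resizer};

fn main() {
    // RGB8 (U8x3) source and destination images; no alpha handling is involved.
    let src = Image::new(4928, 3279, PixelType::U8x3);
    let mut dst = Image::new(852, 567, PixelType::U8x3);
    let mut resizer = Resizer::new();

    // One resize per column of the table below.
    let algorithms = [
        ("Nearest", ResizeAlg::Nearest),
        ("Box", ResizeAlg::Convolution(FilterType::Box)),
        ("Bilinear", ResizeAlg::Convolution(FilterType::Bilinear)),
        ("Bicubic", ResizeAlg::Convolution(FilterType::CatmullRom)),
        ("Lanczos3", ResizeAlg::Convolution(FilterType::Lanczos3)),
    ];
    for (name, alg) in algorithms {
        let options = ResizeOptions::new().resize_alg(alg);
        resizer.resize(&src, &mut dst, &options).unwrap();
        println!("resized with {name}");
    }
}
```
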
| | Nearest | Box | Bilinear | Bicubic | Lanczos3 | |-------------|:-------:|:-----:|:--------:|:-------:|:--------:| | image | 22.36 | - | 114.09 | 207.35 | 299.62 | | resize | 11.40 | 32.71 | 59.71 | 113.30 | 167.48 | | fir rust | 0.39 | 36.51 | 67.08 | 128.85 | 191.63 | | fir simd128 | 0.39 | 4.82 | 7.46 | 13.43 | 19.83 | ### Resize RGBA8 image (U8x4) 4928x3279 => 852x567 Pipeline: `src_image => multiply by alpha => resize => divide by alpha => dst_image` - Source image [nasa-4928x3279-rgba.png](https://github.com/Cykooz/fast_image_resize/blob/main/data/nasa-4928x3279-rgba.png) - Numbers in the table mean a duration of image resizing in milliseconds. - The `image` crate does not support multiplying and dividing by alpha channel. | | Nearest | Box | Bilinear | Bicubic | Lanczos3 | |-------------|:-------:|:-----:|:--------:|:-------:|:--------:| | resize | 11.65 | 38.44 | 73.46 | 140.39 | 209.13 | | fir rust | 0.25 | 89.18 | 129.75 | 211.68 | 294.66 | | fir simd128 | 0.25 | 12.16 | 14.94 | 21.81 | 29.56 | ### Resize L8 image (U8) 4928x3279 => 852x567 Pipeline: `src_image => resize => dst_image` - Source image [nasa-4928x3279.png](https://github.com/Cykooz/fast_image_resize/blob/main/data/nasa-4928x3279.png) has converted into grayscale image with one byte per pixel. - Numbers in the table mean a duration of image resizing in milliseconds. | | Nearest | Box | Bilinear | Bicubic | Lanczos3 | |-------------|:-------:|:-----:|:--------:|:-------:|:--------:| | image | 18.89 | - | 85.80 | 152.05 | 216.91 | | resize | 6.67 | 16.13 | 27.43 | 50.73 | 74.16 | | fir rust | 0.21 | 12.83 | 23.04 | 44.08 | 65.74 | | fir simd128 | 0.21 | 2.66 | 3.43 | 5.53 | 8.26 | ### Resize LA8 image (U8x2) 4928x3279 => 852x567 Pipeline: `src_image => multiply by alpha => resize => divide by alpha => dst_image` - Source image [nasa-4928x3279-rgba.png](https://github.com/Cykooz/fast_image_resize/blob/main/data/nasa-4928x3279-rgba.png) has converted into grayscale image with an alpha channel (two bytes per pixel). - Numbers in the table mean a duration of image resizing in milliseconds. - The `image` crate does not support multiplying and dividing by alpha channel. - The `resize` crate does not support this pixel format. | | Nearest | Box | Bilinear | Bicubic | Lanczos3 | |-------------|:-------:|:-----:|:--------:|:-------:|:--------:| | fir rust | 0.19 | 43.68 | 65.07 | 108.59 | 153.40 | | fir simd128 | 0.19 | 7.04 | 8.48 | 11.78 | 16.11 | ### Resize RGB16 image (U16x3) 4928x3279 => 852x567 Pipeline: `src_image => resize => dst_image` - Source image [nasa-4928x3279.png](https://github.com/Cykooz/fast_image_resize/blob/main/data/nasa-4928x3279.png) has converted into RGB16 image. - Numbers in the table mean a duration of image resizing in milliseconds. | | Nearest | Box | Bilinear | Bicubic | Lanczos3 | |-------------|:-------:|:-----:|:--------:|:-------:|:--------:| | image | 21.29 | - | 103.21 | 186.77 | 269.27 | | resize | 10.66 | 31.93 | 54.70 | 106.72 | 153.68 | | fir rust | 0.41 | 31.09 | 52.53 | 97.53 | 142.71 | | fir simd128 | 0.41 | 28.80 | 48.69 | 88.22 | 127.73 | ### Resize RGBA16 image (U16x4) 4928x3279 => 852x567 Pipeline: `src_image => multiply by alpha => resize => divide by alpha => dst_image` - Source image [nasa-4928x3279-rgba.png](https://github.com/Cykooz/fast_image_resize/blob/main/data/nasa-4928x3279-rgba.png) - Numbers in the table mean a duration of image resizing in milliseconds. - The `image` crate does not support multiplying and dividing by alpha channel. 
| | Nearest | Box | Bilinear | Bicubic | Lanczos3 | |-------------|:-------:|:-----:|:--------:|:-------:|:--------:| | resize | 12.68 | 39.06 | 74.18 | 141.37 | 209.51 | | fir rust | 0.42 | 84.14 | 117.31 | 178.55 | 243.12 | | fir simd128 | 0.42 | 47.40 | 71.17 | 122.44 | 171.00 | ### Resize L16 image (U16) 4928x3279 => 852x567 Pipeline: `src_image => resize => dst_image` - Source image [nasa-4928x3279.png](https://github.com/Cykooz/fast_image_resize/blob/main/data/nasa-4928x3279.png) has converted into grayscale image with two bytes per pixel. - Numbers in the table mean a duration of image resizing in milliseconds. | | Nearest | Box | Bilinear | Bicubic | Lanczos3 | |-------------|:-------:|:-----:|:--------:|:-------:|:--------:| | image | 19.11 | - | 82.39 | 145.23 | 207.70 | | resize | 7.85 | 18.17 | 29.70 | 53.69 | 78.64 | | fir rust | 0.20 | 18.15 | 25.96 | 44.14 | 61.74 | | fir simd128 | 0.20 | 10.18 | 17.22 | 29.79 | 42.83 | ### Resize LA16 (luma with alpha channel) image (U16x2) 4928x3279 => 852x567 Pipeline: `src_image => multiply by alpha => resize => divide by alpha => dst_image` - Source image [nasa-4928x3279-rgba.png](https://github.com/Cykooz/fast_image_resize/blob/main/data/nasa-4928x3279-rgba.png) has converted into grayscale image with an alpha channel (four bytes per pixel). - Numbers in the table mean a duration of image resizing in milliseconds. - The `image` crate does not support multiplying and dividing by alpha channel. - The `resize` crate does not support this pixel format. | | Nearest | Box | Bilinear | Bicubic | Lanczos3 | |-------------|:-------:|:-----:|:--------:|:-------:|:--------:| | fir rust | 0.25 | 44.73 | 61.06 | 92.55 | 124.11 | | fir simd128 | 0.25 | 24.57 | 37.51 | 63.99 | 90.32 | ### Resize L32F image (F32) 4928x3279 => 852x567 Pipeline: `src_image => resize => dst_image` - Source image [nasa-4928x3279.png](https://github.com/Cykooz/fast_image_resize/blob/main/data/nasa-4928x3279.png) has converted into grayscale image with two bytes per pixel. - Numbers in the table mean a duration of image resizing in milliseconds. | | Nearest | Box | Bilinear | Bicubic | Lanczos3 | |----------|:-------:|:-----:|:--------:|:-------:|:--------:| | image | 11.33 | - | 58.92 | 106.75 | 155.04 | | resize | 8.98 | 16.53 | 24.59 | 45.82 | 67.67 | | fir rust | 0.25 | 10.13 | 18.35 | 36.05 | 60.08 | Note: The `resize` crate uses `f32` for intermediate calculations. The `fast_image_resize` uses `f64`. This is a reason why `fast_image_resize` is slower or equal in cases with `f32`-based pixels. ### Resize LA32F (luma with alpha channel) image (F32x2) 4928x3279 => 852x567 Pipeline: `src_image => multiply by alpha => resize => divide by alpha => dst_image` - Source image [nasa-4928x3279-rgba.png](https://github.com/Cykooz/fast_image_resize/blob/main/data/nasa-4928x3279-rgba.png) has converted into grayscale image with an alpha channel (two `f32` values per pixel). - Numbers in the table mean a duration of image resizing in milliseconds. - The `image` crate does not support multiplying and dividing by alpha channel. - The `resize` crate does not support this pixel format. 
| | Nearest | Box | Bilinear | Bicubic | Lanczos3 | |----------|:-------:|:-----:|:--------:|:-------:|:--------:| | fir rust | 0.42 | 30.89 | 43.06 | 70.18 | 101.63 | ### Resize RGB32F image (F32x3) 4928x3279 => 852x567 Pipeline: `src_image => resize => dst_image` - Source image [nasa-4928x3279.png](https://github.com/Cykooz/fast_image_resize/blob/main/data/nasa-4928x3279.png) has converted into RGB32F image. - Numbers in the table mean a duration of image resizing in milliseconds. | | Nearest | Box | Bilinear | Bicubic | Lanczos3 | |----------|:-------:|:-----:|:--------:|:-------:|:--------:| | image | 11.78 | - | 68.25 | 126.64 | 184.94 | | resize | 10.76 | 21.80 | 34.28 | 64.89 | 96.11 | | fir rust | 1.02 | 24.73 | 45.06 | 85.19 | 128.12 | ### Resize RGBA32F image (F32x4) 4928x3279 => 852x567 Pipeline: `src_image => multiply by alpha => resize => divide by alpha => dst_image` - Source image [nasa-4928x3279-rgba.png](https://github.com/Cykooz/fast_image_resize/blob/main/data/nasa-4928x3279-rgba.png) - Numbers in the table mean a duration of image resizing in milliseconds. - The `image` crate does not support multiplying and dividing by alpha channel. - The `resize` crate does not support multiplying and dividing by alpha channel for this pixel format. | | Nearest | Box | Bilinear | Bicubic | Lanczos3 | |----------|:-------:|:-----:|:--------:|:-------:|:--------:| | fir rust | 1.22 | 50.02 | 73.15 | 118.86 | 168.38 | fast_image_resize-5.3.0/benchmarks-x86_64.md000064400000000000000000000300341046102023000167040ustar 00000000000000 ## Benchmarks of fast_image_resize crate for x86_64 architecture Environment: - CPU: AMD Ryzen 9 5950X - RAM: DDR4 4000 MHz - Ubuntu 24.04 (linux 6.11.0) - Rust 1.87.0 - criterion = "0.5.1" - fast_image_resize = "5.1.4" Other libraries used to compare of resizing speed: - image = "0.25.6" () - resize = "0.8.8" (, single-threaded mode) - libvips = "8.15.1" (single-threaded mode) Resize algorithms: - Nearest - Box - convolution with minimal kernel size 1x1 px - Bilinear - convolution with minimal kernel size 2x2 px - Bicubic (CatmullRom) - convolution with minimal kernel size 4x4 px - Lanczos3 - convolution with minimal kernel size 6x6 px ### Resize RGB8 image (U8x3) 4928x3279 => 852x567 Pipeline: `src_image => resize => dst_image` - Source image [nasa-4928x3279.png](https://github.com/Cykooz/fast_image_resize/blob/main/data/nasa-4928x3279.png) - Numbers in the table mean a duration of image resizing in milliseconds. | | Nearest | Box | Bilinear | Bicubic | Lanczos3 | |------------|:-------:|:-----:|:--------:|:-------:|:--------:| | image | 29.28 | - | 83.28 | 136.97 | 189.93 | | resize | 7.42 | 26.82 | 49.29 | 93.22 | 140.26 | | libvips | 2.42 | 61.73 | 5.66 | 9.81 | 15.78 | | fir rust | 0.28 | 10.87 | 16.12 | 26.63 | 38.08 | | fir sse4.1 | 0.28 | 3.37 | 5.34 | 9.89 | 15.30 | | fir avx2 | 0.28 | 2.52 | 3.67 | 6.80 | 13.21 | ### Resize RGBA8 image (U8x4) 4928x3279 => 852x567 Pipeline: `src_image => multiply by alpha => resize => divide by alpha => dst_image` - Source image [nasa-4928x3279-rgba.png](https://github.com/Cykooz/fast_image_resize/blob/main/data/nasa-4928x3279-rgba.png) - Numbers in the table mean a duration of image resizing in milliseconds. - The `image` crate does not support multiplying and dividing by alpha channel. 
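The `multiply by alpha => resize => divide by alpha` steps are not hand-written in the benchmark; they are requested through `ResizeOptions::use_alpha`, as in the `fir_resize` helper in `benches/utils/resize_functions.rs`. A minimal sketch of that pipeline (sizes are illustrative, buffers are left zero-filled):

```rust
use fast_image_resize::images::Image;
use fast_image_resize::{FilterType, PixelType, ResizeAlg, ResizeOptions, Resizer};

fn main() {
    // RGBA8 (U8x4) source and destination images.
    let src = Image::new(4928, 3279, PixelType::U8x4);
    let mut dst = Image::new(852, 567, PixelType::U8x4);

    // `use_alpha(true)` makes the resizer multiply the source by its alpha
    // channel before convolution and divide the destination by it afterwards,
    // which is the pipeline timed in this table.
    let options = ResizeOptions::new()
        .resize_alg(ResizeAlg::Convolution(FilterType::Lanczos3))
        .use_alpha(true);

    let mut resizer = Resizer::new();
    resizer.resize(&src, &mut dst, &options).unwrap();
}
```
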
| | Nearest | Box | Bilinear | Bicubic | Lanczos3 | |------------|:-------:|:------:|:--------:|:-------:|:--------:| | resize | 9.59 | 34.02 | 64.61 | 126.43 | 187.18 | | libvips | 4.19 | 169.02 | 142.22 | 228.64 | 330.24 | | fir rust | 0.19 | 20.30 | 25.25 | 36.57 | 49.69 | | fir sse4.1 | 0.19 | 9.51 | 11.90 | 17.78 | 24.49 | | fir avx2 | 0.19 | 7.11 | 8.39 | 13.68 | 21.72 | ### Resize L8 image (U8) 4928x3279 => 852x567 Pipeline: `src_image => resize => dst_image` - Source image [nasa-4928x3279.png](https://github.com/Cykooz/fast_image_resize/blob/main/data/nasa-4928x3279.png) has converted into grayscale image with one byte per pixel. - Numbers in the table mean a duration of image resizing in milliseconds. | | Nearest | Box | Bilinear | Bicubic | Lanczos3 | |------------|:-------:|:-----:|:--------:|:-------:|:--------:| | image | 26.90 | - | 56.49 | 85.11 | 112.72 | | resize | 6.57 | 11.06 | 18.83 | 38.44 | 63.98 | | libvips | 2.62 | 24.92 | 6.81 | 9.84 | 12.73 | | fir rust | 0.16 | 4.42 | 5.45 | 8.69 | 12.04 | | fir sse4.1 | 0.16 | 1.45 | 2.02 | 3.37 | 5.44 | | fir avx2 | 0.16 | 1.51 | 1.73 | 2.74 | 4.11 | ### Resize LA8 image (U8x2) 4928x3279 => 852x567 Pipeline: `src_image => multiply by alpha => resize => divide by alpha => dst_image` - Source image [nasa-4928x3279-rgba.png](https://github.com/Cykooz/fast_image_resize/blob/main/data/nasa-4928x3279-rgba.png) has converted into grayscale image with an alpha channel (two bytes per pixel). - Numbers in the table mean a duration of image resizing in milliseconds. - The `image` crate does not support multiplying and dividing by alpha channel. - The `resize` crate does not support this pixel format. | | Nearest | Box | Bilinear | Bicubic | Lanczos3 | |------------|:-------:|:-----:|:--------:|:-------:|:--------:| | libvips | 3.66 | 94.55 | 79.09 | 122.64 | 165.17 | | fir rust | 0.17 | 14.54 | 16.90 | 23.02 | 29.07 | | fir sse4.1 | 0.17 | 5.82 | 7.03 | 9.72 | 13.47 | | fir avx2 | 0.17 | 4.05 | 4.78 | 6.49 | 8.91 | ### Resize RGB16 image (U16x3) 4928x3279 => 852x567 Pipeline: `src_image => resize => dst_image` - Source image [nasa-4928x3279.png](https://github.com/Cykooz/fast_image_resize/blob/main/data/nasa-4928x3279.png) has converted into RGB16 image. - Numbers in the table mean a duration of image resizing in milliseconds. | | Nearest | Box | Bilinear | Bicubic | Lanczos3 | |------------|:-------:|:-----:|:--------:|:-------:|:--------:| | image | 29.20 | - | 82.74 | 136.66 | 191.22 | | resize | 8.31 | 24.85 | 48.08 | 92.32 | 138.14 | | libvips | 14.15 | 95.89 | 67.43 | 131.07 | 175.11 | | fir rust | 0.35 | 26.30 | 45.04 | 78.18 | 113.16 | | fir sse4.1 | 0.35 | 14.65 | 22.51 | 38.61 | 55.30 | | fir avx2 | 0.35 | 12.59 | 17.97 | 28.09 | 36.83 | ### Resize RGBA16 image (U16x4) 4928x3279 => 852x567 Pipeline: `src_image => multiply by alpha => resize => divide by alpha => dst_image` - Source image [nasa-4928x3279-rgba.png](https://github.com/Cykooz/fast_image_resize/blob/main/data/nasa-4928x3279-rgba.png) - Numbers in the table mean a duration of image resizing in milliseconds. - The `image` crate does not support multiplying and dividing by alpha channel. 
| | Nearest | Box | Bilinear | Bicubic | Lanczos3 | |------------|:-------:|:------:|:--------:|:-------:|:--------:| | resize | 10.41 | 34.13 | 63.85 | 126.67 | 190.86 | | libvips | 21.24 | 181.32 | 152.83 | 241.51 | 344.01 | | fir rust | 0.39 | 62.48 | 84.00 | 127.42 | 174.28 | | fir sse4.1 | 0.39 | 31.00 | 41.70 | 63.55 | 85.98 | | fir avx2 | 0.39 | 21.07 | 26.54 | 37.35 | 48.88 | ### Resize L16 image (U16) 4928x3279 => 852x567 Pipeline: `src_image => resize => dst_image` - Source image [nasa-4928x3279.png](https://github.com/Cykooz/fast_image_resize/blob/main/data/nasa-4928x3279.png) has converted into grayscale image with two bytes per pixel. - Numbers in the table mean a duration of image resizing in milliseconds. | | Nearest | Box | Bilinear | Bicubic | Lanczos3 | |------------|:-------:|:-----:|:--------:|:-------:|:--------:| | image | 27.15 | - | 57.20 | 86.11 | 114.59 | | resize | 6.94 | 10.38 | 16.53 | 34.22 | 59.67 | | libvips | 5.74 | 34.76 | 23.81 | 43.92 | 59.54 | | fir rust | 0.17 | 13.12 | 20.60 | 32.93 | 45.22 | | fir sse4.1 | 0.17 | 5.01 | 7.17 | 12.71 | 18.59 | | fir avx2 | 0.17 | 5.02 | 6.15 | 9.08 | 13.87 | ### Resize LA16 (luma with alpha channel) image (U16x2) 4928x3279 => 852x567 Pipeline: `src_image => multiply by alpha => resize => divide by alpha => dst_image` - Source image [nasa-4928x3279-rgba.png](https://github.com/Cykooz/fast_image_resize/blob/main/data/nasa-4928x3279-rgba.png) has converted into grayscale image with an alpha channel (four bytes per pixel). - Numbers in the table mean a duration of image resizing in milliseconds. - The `image` crate does not support multiplying and dividing by alpha channel. - The `resize` crate does not support this pixel format. | | Nearest | Box | Bilinear | Bicubic | Lanczos3 | |------------|:-------:|:------:|:--------:|:-------:|:--------:| | libvips | 11.30 | 104.96 | 87.94 | 133.05 | 177.05 | | fir rust | 0.19 | 26.01 | 34.51 | 57.11 | 79.20 | | fir sse4.1 | 0.19 | 14.57 | 20.95 | 32.23 | 45.14 | | fir avx2 | 0.19 | 11.33 | 14.61 | 21.68 | 29.04 | ### Resize L32F image (F32) 4928x3279 => 852x567 Pipeline: `src_image => resize => dst_image` - Source image [nasa-4928x3279.png](https://github.com/Cykooz/fast_image_resize/blob/main/data/nasa-4928x3279.png) has converted into grayscale image with two bytes per pixel. - Numbers in the table mean a duration of image resizing in milliseconds. | | Nearest | Box | Bilinear | Bicubic | Lanczos3 | |------------|:-------:|:-----:|:--------:|:-------:|:--------:| | image | 2.25 | - | 17.43 | 34.22 | 52.11 | | resize | 5.37 | 8.92 | 13.48 | 30.89 | 45.56 | | libvips | 4.61 | 34.03 | 23.89 | 45.80 | 64.98 | | fir rust | 0.19 | 7.11 | 11.74 | 25.79 | 39.27 | | fir sse4.1 | 0.19 | 4.33 | 6.75 | 11.59 | 16.93 | | fir avx2 | 0.19 | 3.87 | 5.17 | 7.81 | 11.16 | ### Resize LA32F (luma with alpha channel) image (F32x2) 4928x3279 => 852x567 Pipeline: `src_image => multiply by alpha => resize => divide by alpha => dst_image` - Source image [nasa-4928x3279-rgba.png](https://github.com/Cykooz/fast_image_resize/blob/main/data/nasa-4928x3279-rgba.png) has converted into grayscale image with an alpha channel (two `f32` values per pixel). - Numbers in the table mean a duration of image resizing in milliseconds. - The `image` crate does not support multiplying and dividing by alpha channel. - The `resize` crate does not support this pixel format. 
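If the resizer is not configured to handle the alpha channel for this pixel type, the multiply/divide steps of the pipeline can be done explicitly with the crate's `MulDiv` helper. A sketch (method names taken from the changelog; the exact dynamic signatures are assumptions):

```rust
use fast_image_resize::images::Image;
use fast_image_resize::{MulDiv, PixelType, Resizer};

fn resize_la32f(src: &Image, dst: &mut Image) -> Result<(), Box<dyn std::error::Error>> {
    // Premultiply luma by alpha, resize, then un-premultiply the result.
    let mul_div = MulDiv::default();
    let mut premultiplied = Image::new(src.width(), src.height(), PixelType::F32x2);
    mul_div.multiply_alpha(src, &mut premultiplied)?;

    let mut resizer = Resizer::new();
    resizer.resize(&premultiplied, dst, None)?;

    mul_div.divide_alpha_inplace(dst)?;
    Ok(())
}
```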
| | Nearest | Box | Bilinear | Bicubic | Lanczos3 | |------------|:-------:|:-----:|:--------:|:-------:|:--------:| | libvips | 10.74 | 91.69 | 77.46 | 119.06 | 161.51 | | fir rust | 0.40 | 20.41 | 27.92 | 46.32 | 68.22 | | fir sse4.1 | 0.40 | 16.80 | 21.28 | 30.42 | 40.64 | | fir avx2 | 0.40 | 15.46 | 17.60 | 23.08 | 28.89 | ### Resize RGB32F image (F32x3) 4928x3279 => 852x567 Pipeline: `src_image => resize => dst_image` - Source image [nasa-4928x3279.png](https://github.com/Cykooz/fast_image_resize/blob/main/data/nasa-4928x3279.png) has converted into RGB32F image. - Numbers in the table mean a duration of image resizing in milliseconds. | | Nearest | Box | Bilinear | Bicubic | Lanczos3 | |------------|:-------:|:-----:|:--------:|:-------:|:--------:| | image | 2.94 | - | 26.03 | 50.30 | 75.89 | | resize | 8.77 | 14.26 | 24.16 | 48.92 | 71.69 | | libvips | 10.81 | 91.85 | 69.09 | 136.77 | 190.14 | | fir rust | 0.87 | 13.98 | 24.68 | 48.56 | 73.59 | | fir sse4.1 | 0.87 | 11.18 | 18.11 | 31.87 | 47.17 | | fir avx2 | 0.87 | 9.24 | 13.08 | 21.25 | 29.01 | ### Resize RGBA32F image (F32x4) 4928x3279 => 852x567 Pipeline: `src_image => multiply by alpha => resize => divide by alpha => dst_image` - Source image [nasa-4928x3279-rgba.png](https://github.com/Cykooz/fast_image_resize/blob/main/data/nasa-4928x3279-rgba.png) - Numbers in the table mean a duration of image resizing in milliseconds. - The `image` crate does not support multiplying and dividing by alpha channel. - The `resize` crate does not support multiplying and dividing by alpha channel for this pixel format. | | Nearest | Box | Bilinear | Bicubic | Lanczos3 | |------------|:-------:|:------:|:--------:|:-------:|:--------:| | libvips | 20.21 | 153.92 | 125.00 | 214.27 | 313.49 | | fir rust | 1.03 | 35.41 | 44.55 | 68.91 | 92.62 | | fir sse4.1 | 1.03 | 30.65 | 38.50 | 57.21 | 76.65 | | fir avx2 | 1.03 | 28.66 | 30.10 | 38.79 | 48.73 | fast_image_resize-5.3.0/dev.md000064400000000000000000000031141046102023000144100ustar 00000000000000# Preparation Install system libraries: - libvips-dev (used in benchmarks) Install additional toolchains: - Arm64: ```shell rustup target add aarch64-unknown-linux-gnu ``` - Wasm32: ```shell rustup target add wasm32-wasip2 ``` Install [Wasmtime](https://wasmtime.dev/). # Tests Run tests with saving result images as files in `./data` directory: ```shell SAVE_RESULT=1 cargo test ``` # Benchmarks Run benchmarks to compare with other crates for image resizing and write results into report files, such as `./benchmarks-x86_64.md`: ```shell WRITE_COMPARE_RESULT=1 cargo bench -- Compare ``` If you want to use old benchmark results for other crates, you must add an env variable with the number of days as a result lifetime: ```shell WRITE_COMPARE_RESULT=1 RESULTS_LIFETIME=5 cargo bench -- Compare ``` # Wasm32 Specify build target and runner in `.cargo/config.toml` file. ```toml [build] target = "wasm32-wasip2" [target.wasm32-wasip2] runner = "wasmtime --dir=. --" ``` Run tests: ```shell cargo test ``` Run tests with saving result images as files in `./data` directory: ```shell CARGO_TARGET_WASM32_WASIP2_RUNNER="wasmtime --dir=. --env SAVE_RESULT=1 --" cargo test ``` Run a specific benchmark in `quick` mode: ```shell cargo bench --bench bench_resize -- --color=always --quick ``` Run benchmarks to compare with other crates for image resizing and write results into report files, such as `./benchmarks-wasm32.md`: ```shell CARGO_TARGET_WASM32_WASIP2_RUNNER="wasmtime --dir=. 
--env WRITE_COMPARE_RESULT=1 --" cargo bench --no-fail-fast -- --color=always Compare ``` fast_image_resize-5.3.0/rustfmt.toml000064400000000000000000000001341046102023000157100ustar 00000000000000unstable_features = true imports_granularity = "Module" group_imports = "StdExternalCrate" fast_image_resize-5.3.0/src/alpha/common.rs000064400000000000000000000106201046102023000170220ustar 00000000000000#[inline(always)] pub(crate) fn mul_div_255(a: u8, b: u8) -> u8 { let tmp = a as u32 * b as u32 + 128; (((tmp >> 8) + tmp) >> 8) as u8 } #[inline(always)] pub(crate) fn mul_div_65535(a: u16, b: u16) -> u16 { let tmp = a as u32 * b as u32 + 0x8000; (((tmp >> 16) + tmp) >> 16) as u16 } const fn recip_alpha_array(precision: u32) -> [u32; 256] { let mut res = [0; 256]; let scale = 1 << (precision + 1); let scaled_max = 255 * scale; let mut i: usize = 1; while i < 256 { res[i] = ((scaled_max / i as u32) + 1) >> 1; i += 1; } res } const fn recip_alpha16_array(precision: u64) -> [u64; 65536] { let mut res = [0; 65536]; let scale = 1 << (precision + 1); let scaled_max = 0xffff * scale; let mut i: usize = 1; while i < 65536 { res[i] = ((scaled_max / i as u64) + 1) >> 1; i += 1; } res } const PRECISION: u32 = 8; const ROUND_CORRECTION: u32 = 1 << (PRECISION - 1); const PRECISION16: u64 = 33; const ROUND_CORRECTION16: u64 = 1 << (PRECISION16 - 1); #[inline(always)] pub(crate) fn div_and_clip(v: u8, recip_alpha: u32) -> u8 { ((v as u32 * recip_alpha + ROUND_CORRECTION) >> PRECISION).min(0xff) as u8 } #[inline(always)] pub(crate) fn div_and_clip16(v: u16, recip_alpha: u64) -> u16 { ((v as u64 * recip_alpha + ROUND_CORRECTION16) >> PRECISION16).min(0xffff) as u16 } pub(crate) const RECIP_ALPHA: [u32; 256] = recip_alpha_array(PRECISION); pub(crate) static RECIP_ALPHA16: [u64; 65536] = recip_alpha16_array(PRECISION16); macro_rules! process_two_images { {$op: ident($src_view: ident, $dst_view: ident, $($arg: ident),+);} => { #[allow(unused_labels)] 'block: { #[cfg(feature = "rayon")] { use crate::threading::split_h_two_images_for_threading; use rayon::prelude::*; if let Some(iter) = split_h_two_images_for_threading($src_view, $dst_view, 0) { iter.for_each(|(src, mut dst)| { $op(&src, &mut dst, $($arg),+); }); break 'block; } } $op($src_view, $dst_view, $($arg),+); } }; } macro_rules! process_one_images { {$op: ident($image_view: ident, $($arg: ident),+);} => { #[allow(unused_labels)] 'block: { #[cfg(feature = "rayon")] { use crate::threading::split_h_one_image_for_threading; use rayon::prelude::*; if let Some(iter) = split_h_one_image_for_threading($image_view) { iter.for_each(|mut img| { $op(&mut img, $($arg),+); }); break 'block; } } $op($image_view, $($arg),+); } }; } #[cfg(test)] mod tests { use super::*; #[test] fn test_recip_alpha_array() { for alpha in 0..=255u8 { let expected = if alpha == 0 { 0 } else { let scale = (1 << PRECISION) as f64; (255.0 * scale / alpha as f64).round() as u32 }; let recip_alpha = RECIP_ALPHA[alpha as usize]; assert_eq!(expected, recip_alpha, "alpha {}", alpha); } } #[test] fn test_div_and_clip() { let mut err_sum: i32 = 0; for alpha in 0..=255u8 { for color in 0..=255u8 { let expected_color = if alpha == 0 { 0 } else { let res = color as f64 / (alpha as f64 / 255.); res.round().min(255.) 
as u8 }; let recip_alpha = RECIP_ALPHA[alpha as usize]; let result_color = div_and_clip(color, recip_alpha); let delta = result_color as i32 - expected_color as i32; err_sum += delta.abs(); } } assert_eq!(err_sum, 2512); } #[test] fn test_recip_alpha16_array() { for alpha in 0..=0xffffu16 { let expected = if alpha == 0 { 0 } else { let scale = (1u64 << PRECISION16) as f64; (65535.0 * scale / alpha as f64).round() as u64 }; let recip_alpha = RECIP_ALPHA16[alpha as usize]; assert_eq!(expected, recip_alpha, "alpha {}", alpha); } } } fast_image_resize-5.3.0/src/alpha/errors.rs000064400000000000000000000006571046102023000170570ustar 00000000000000use thiserror::Error; use crate::ImageError; #[derive(Error, Debug, Clone, Copy)] #[non_exhaustive] pub enum MulDivImagesError { #[error("Source or destination image is not supported")] ImageError(#[from] ImageError), #[error("Size of source image does not match to destination image")] SizeIsDifferent, #[error("Pixel type of source image does not match to destination image")] PixelTypesAreDifferent, } fast_image_resize-5.3.0/src/alpha/f32x2/avx2.rs000064400000000000000000000124071046102023000172630ustar 00000000000000use std::arch::x86_64::*; use crate::pixels::F32x2; use crate::{ImageView, ImageViewMut}; use super::sse4; #[target_feature(enable = "avx2")] pub(crate) unsafe fn multiply_alpha( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, ) { let src_rows = src_view.iter_rows(0); let dst_rows = dst_view.iter_rows_mut(0); for (src_row, dst_row) in src_rows.zip(dst_rows) { multiply_alpha_row(src_row, dst_row); } } #[target_feature(enable = "avx2")] pub(crate) unsafe fn multiply_alpha_inplace(image_view: &mut impl ImageViewMut) { for row in image_view.iter_rows_mut(0) { multiply_alpha_row_inplace(row); } } #[inline] #[target_feature(enable = "avx2")] pub(crate) unsafe fn multiply_alpha_row(src_row: &[F32x2], dst_row: &mut [F32x2]) { let src_chunks = src_row.chunks_exact(8); let src_remainder = src_chunks.remainder(); let mut dst_chunks = dst_row.chunks_exact_mut(8); for (src_chunk, dst_chunk) in src_chunks.zip(&mut dst_chunks) { let src_ptr = src_chunk.as_ptr() as *const f32; let src_pixels03 = _mm256_loadu_ps(src_ptr); let src_pixels47 = _mm256_loadu_ps(src_ptr.add(8)); multiply_alpha_8_pixels(src_pixels03, src_pixels47, dst_chunk); } if !src_remainder.is_empty() { let dst_reminder = dst_chunks.into_remainder(); sse4::multiply_alpha_row(src_remainder, dst_reminder); } } #[inline] #[target_feature(enable = "avx2")] pub(crate) unsafe fn multiply_alpha_row_inplace(row: &mut [F32x2]) { let mut chunks = row.chunks_exact_mut(8); for chunk in &mut chunks { let src_ptr = chunk.as_ptr() as *const f32; let src_pixels01 = _mm256_loadu_ps(src_ptr); let src_pixels23 = _mm256_loadu_ps(src_ptr.add(8)); multiply_alpha_8_pixels(src_pixels01, src_pixels23, chunk); } let reminder = chunks.into_remainder(); if !reminder.is_empty() { sse4::multiply_alpha_row_inplace(reminder); } } #[inline] #[target_feature(enable = "avx2")] unsafe fn multiply_alpha_8_pixels(pixels03: __m256, pixels47: __m256, dst_chunk: &mut [F32x2]) { let luma07 = _mm256_shuffle_ps::<0b10_00_10_00>(pixels03, pixels47); let alpha07 = _mm256_shuffle_ps::<0b11_01_11_01>(pixels03, pixels47); let multiplied_luma07 = _mm256_mul_ps(luma07, alpha07); let dst_pixel03 = _mm256_unpacklo_ps(multiplied_luma07, alpha07); let dst_pixel47 = _mm256_unpackhi_ps(multiplied_luma07, alpha07); let dst_ptr = dst_chunk.as_mut_ptr() as *mut f32; _mm256_storeu_ps(dst_ptr, dst_pixel03); _mm256_storeu_ps(dst_ptr.add(8), 
dst_pixel47); } // Divide #[target_feature(enable = "avx2")] pub(crate) unsafe fn divide_alpha( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, ) { let src_rows = src_view.iter_rows(0); let dst_rows = dst_view.iter_rows_mut(0); for (src_row, dst_row) in src_rows.zip(dst_rows) { divide_alpha_row(src_row, dst_row); } } #[target_feature(enable = "avx2")] pub(crate) unsafe fn divide_alpha_inplace(image_view: &mut impl ImageViewMut) { for row in image_view.iter_rows_mut(0) { divide_alpha_row_inplace(row); } } #[target_feature(enable = "avx2")] pub(crate) unsafe fn divide_alpha_row(src_row: &[F32x2], dst_row: &mut [F32x2]) { let src_chunks = src_row.chunks_exact(8); let src_remainder = src_chunks.remainder(); let mut dst_chunks = dst_row.chunks_exact_mut(8); for (src_chunk, dst_chunk) in src_chunks.zip(&mut dst_chunks) { let src_ptr = src_chunk.as_ptr() as *const f32; let src_pixels03 = _mm256_loadu_ps(src_ptr); let src_pixels47 = _mm256_loadu_ps(src_ptr.add(8)); divide_alpha_8_pixels(src_pixels03, src_pixels47, dst_chunk); } if !src_remainder.is_empty() { let dst_reminder = dst_chunks.into_remainder(); sse4::divide_alpha_row(src_remainder, dst_reminder); } } #[target_feature(enable = "avx2")] pub(crate) unsafe fn divide_alpha_row_inplace(row: &mut [F32x2]) { let mut chunks = row.chunks_exact_mut(8); for chunk in &mut chunks { let src_ptr = chunk.as_ptr() as *const f32; let src_pixels01 = _mm256_loadu_ps(src_ptr); let src_pixels23 = _mm256_loadu_ps(src_ptr.add(8)); divide_alpha_8_pixels(src_pixels01, src_pixels23, chunk); } let reminder = chunks.into_remainder(); if !reminder.is_empty() { sse4::divide_alpha_row_inplace(reminder); } } #[inline] #[target_feature(enable = "avx2")] unsafe fn divide_alpha_8_pixels(pixels03: __m256, pixels47: __m256, dst_chunk: &mut [F32x2]) { let zero = _mm256_set1_ps(0.); let luma07 = _mm256_shuffle_ps::<0b10_00_10_00>(pixels03, pixels47); let alpha07 = _mm256_shuffle_ps::<0b11_01_11_01>(pixels03, pixels47); let mut multiplied_luma07 = _mm256_div_ps(luma07, alpha07); let mask_zero = _mm256_cmp_ps::<_CMP_NEQ_UQ>(alpha07, zero); multiplied_luma07 = _mm256_and_ps(mask_zero, multiplied_luma07); let dst_pixel03 = _mm256_unpacklo_ps(multiplied_luma07, alpha07); let dst_pixel47 = _mm256_unpackhi_ps(multiplied_luma07, alpha07); let dst_ptr = dst_chunk.as_mut_ptr() as *mut f32; _mm256_storeu_ps(dst_ptr, dst_pixel03); _mm256_storeu_ps(dst_ptr.add(8), dst_pixel47); } fast_image_resize-5.3.0/src/alpha/f32x2/mod.rs000064400000000000000000000104121046102023000171540ustar 00000000000000use crate::cpu_extensions::CpuExtensions; use crate::pixels::F32x2; use crate::{ImageError, ImageView, ImageViewMut}; use super::AlphaMulDiv; #[cfg(target_arch = "x86_64")] mod avx2; mod native; #[cfg(target_arch = "x86_64")] mod sse4; type P = F32x2; impl AlphaMulDiv for P { fn multiply_alpha( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, cpu_extensions: CpuExtensions, ) -> Result<(), ImageError> { process_two_images! { multiple(src_view, dst_view, cpu_extensions); } Ok(()) } fn multiply_alpha_inplace( image_view: &mut impl ImageViewMut, cpu_extensions: CpuExtensions, ) -> Result<(), ImageError> { process_one_images! { multiply_inplace(image_view, cpu_extensions); } Ok(()) } fn divide_alpha( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, cpu_extensions: CpuExtensions, ) -> Result<(), ImageError> { process_two_images! 
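// Defined in alpha/common.rs: with the "rayon" feature enabled this macro splits
// both images into horizontal parts and runs `divide` on them in parallel;
// without it, `divide` is simply called once on the whole images.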
{ divide(src_view, dst_view, cpu_extensions); } Ok(()) } fn divide_alpha_inplace( image_view: &mut impl ImageViewMut, cpu_extensions: CpuExtensions, ) -> Result<(), ImageError> { process_one_images! { divide_inplace(image_view, cpu_extensions); } Ok(()) } } fn multiple( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, cpu_extensions: CpuExtensions, ) { match cpu_extensions { #[cfg(target_arch = "x86_64")] CpuExtensions::Avx2 => unsafe { avx2::multiply_alpha(src_view, dst_view) }, #[cfg(target_arch = "x86_64")] CpuExtensions::Sse4_1 => unsafe { sse4::multiply_alpha(src_view, dst_view) }, // #[cfg(target_arch = "aarch64")] // CpuExtensions::Neon => unsafe { neon::multiply_alpha(src_view, dst_view) }, // #[cfg(target_arch = "wasm32")] // CpuExtensions::Simd128 => unsafe { wasm32::multiply_alpha(src_view, dst_view) }, _ => native::multiply_alpha(src_view, dst_view), } } fn multiply_inplace(image_view: &mut impl ImageViewMut, cpu_extensions: CpuExtensions) { match cpu_extensions { #[cfg(target_arch = "x86_64")] CpuExtensions::Avx2 => unsafe { avx2::multiply_alpha_inplace(image_view) }, #[cfg(target_arch = "x86_64")] CpuExtensions::Sse4_1 => unsafe { sse4::multiply_alpha_inplace(image_view) }, // #[cfg(target_arch = "aarch64")] // CpuExtensions::Neon => unsafe { neon::multiply_alpha_inplace(image_view) }, // #[cfg(target_arch = "wasm32")] // CpuExtensions::Simd128 => unsafe { wasm32::multiply_alpha_inplace(image_view) }, _ => native::multiply_alpha_inplace(image_view), } } fn divide( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, cpu_extensions: CpuExtensions, ) { match cpu_extensions { #[cfg(target_arch = "x86_64")] CpuExtensions::Avx2 => unsafe { avx2::divide_alpha(src_view, dst_view) }, #[cfg(target_arch = "x86_64")] CpuExtensions::Sse4_1 => unsafe { sse4::divide_alpha(src_view, dst_view) }, // #[cfg(target_arch = "aarch64")] // CpuExtensions::Neon => unsafe { neon::divide_alpha(src_view, dst_view) }, // #[cfg(target_arch = "wasm32")] // CpuExtensions::Simd128 => unsafe { wasm32::divide_alpha(src_view, dst_view) }, _ => native::divide_alpha(src_view, dst_view), } } fn divide_inplace(image_view: &mut impl ImageViewMut, cpu_extensions: CpuExtensions) { match cpu_extensions { #[cfg(target_arch = "x86_64")] CpuExtensions::Avx2 => unsafe { avx2::divide_alpha_inplace(image_view) }, #[cfg(target_arch = "x86_64")] CpuExtensions::Sse4_1 => unsafe { sse4::divide_alpha_inplace(image_view) }, // #[cfg(target_arch = "aarch64")] // CpuExtensions::Neon => unsafe { neon::divide_alpha_inplace(image_view) }, // #[cfg(target_arch = "wasm32")] // CpuExtensions::Simd128 => unsafe { wasm32::divide_alpha_inplace(image_view) }, _ => native::divide_alpha_inplace(image_view), } } fast_image_resize-5.3.0/src/alpha/f32x2/native.rs000064400000000000000000000046151046102023000176730ustar 00000000000000use num_traits::Zero; use crate::pixels::F32x2; use crate::utils::foreach_with_pre_reading; use crate::{ImageView, ImageViewMut}; pub(crate) fn multiply_alpha( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, ) { let src_rows = src_view.iter_rows(0); let dst_rows = dst_view.iter_rows_mut(0); for (src_row, dst_row) in src_rows.zip(dst_rows) { multiply_alpha_row(src_row, dst_row); } } pub(crate) fn multiply_alpha_inplace(image_view: &mut impl ImageViewMut) { for row in image_view.iter_rows_mut(0) { multiply_alpha_row_inplace(row); } } #[inline(always)] pub(crate) fn multiply_alpha_row(src_row: &[F32x2], dst_row: &mut [F32x2]) { for (src_pixel, dst_pixel) in src_row.iter().zip(dst_row) { 
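// F32x2 is (luma, alpha): premultiply the luma component and keep alpha unchanged.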
let components: [f32; 2] = src_pixel.0; let alpha = components[1]; dst_pixel.0 = [components[0] * alpha, alpha]; } } #[inline(always)] pub(crate) fn multiply_alpha_row_inplace(row: &mut [F32x2]) { for pixel in row { pixel.0[0] *= pixel.0[1]; } } // Divide #[inline] pub(crate) fn divide_alpha( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, ) { let src_rows = src_view.iter_rows(0); let dst_rows = dst_view.iter_rows_mut(0); for (src_row, dst_row) in src_rows.zip(dst_rows) { divide_alpha_row(src_row, dst_row); } } #[inline] pub(crate) fn divide_alpha_inplace(image_view: &mut impl ImageViewMut) { for row in image_view.iter_rows_mut(0) { divide_alpha_row_inplace(row); } } #[inline(always)] pub(crate) fn divide_alpha_row(src_row: &[F32x2], dst_row: &mut [F32x2]) { foreach_with_pre_reading( src_row.iter().zip(dst_row), |(&src_pixel, dst_pixel)| (src_pixel, dst_pixel), |(src_pixel, dst_pixel)| { let alpha = src_pixel.0[1]; if alpha.is_zero() { dst_pixel.0 = [0.; 2]; } else { dst_pixel.0 = [src_pixel.0[0] / alpha, alpha]; } }, ); } #[inline(always)] pub(crate) fn divide_alpha_row_inplace(row: &mut [F32x2]) { for pixel in row { let components: [f32; 2] = pixel.0; let alpha = components[1]; if alpha.is_zero() { pixel.0[0] = 0.; } else { pixel.0[0] = components[0] / alpha; } } } fast_image_resize-5.3.0/src/alpha/f32x2/sse4.rs000064400000000000000000000123171046102023000172610ustar 00000000000000use std::arch::x86_64::*; use crate::pixels::F32x2; use crate::{ImageView, ImageViewMut}; use super::native; #[target_feature(enable = "sse4.1")] pub(crate) unsafe fn multiply_alpha( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, ) { let src_rows = src_view.iter_rows(0); let dst_rows = dst_view.iter_rows_mut(0); for (src_row, dst_row) in src_rows.zip(dst_rows) { multiply_alpha_row(src_row, dst_row); } } #[target_feature(enable = "sse4.1")] pub(crate) unsafe fn multiply_alpha_inplace(image_view: &mut impl ImageViewMut) { for row in image_view.iter_rows_mut(0) { multiply_alpha_row_inplace(row); } } #[inline] #[target_feature(enable = "sse4.1")] pub(crate) unsafe fn multiply_alpha_row(src_row: &[F32x2], dst_row: &mut [F32x2]) { let src_chunks = src_row.chunks_exact(4); let src_remainder = src_chunks.remainder(); let mut dst_chunks = dst_row.chunks_exact_mut(4); for (src_chunk, dst_chunk) in src_chunks.zip(&mut dst_chunks) { let src_ptr = src_chunk.as_ptr() as *const f32; let src_pixels01 = _mm_loadu_ps(src_ptr); let src_pixels23 = _mm_loadu_ps(src_ptr.add(4)); multiply_alpha_4_pixels(src_pixels01, src_pixels23, dst_chunk); } if !src_remainder.is_empty() { let dst_reminder = dst_chunks.into_remainder(); native::multiply_alpha_row(src_remainder, dst_reminder); } } #[inline] #[target_feature(enable = "sse4.1")] pub(crate) unsafe fn multiply_alpha_row_inplace(row: &mut [F32x2]) { let mut chunks = row.chunks_exact_mut(4); for chunk in &mut chunks { let src_ptr = chunk.as_ptr() as *const f32; let src_pixels01 = _mm_loadu_ps(src_ptr); let src_pixels23 = _mm_loadu_ps(src_ptr.add(4)); multiply_alpha_4_pixels(src_pixels01, src_pixels23, chunk); } let reminder = chunks.into_remainder(); if !reminder.is_empty() { native::multiply_alpha_row_inplace(reminder); } } #[inline] #[target_feature(enable = "sse4.1")] unsafe fn multiply_alpha_4_pixels(pixels01: __m128, pixels23: __m128, dst_chunk: &mut [F32x2]) { let luma03 = _mm_shuffle_ps::<0b10_00_10_00>(pixels01, pixels23); let alpha03 = _mm_shuffle_ps::<0b11_01_11_01>(pixels01, pixels23); let multiplied_luma03 = _mm_mul_ps(luma03, alpha03); let 
dst_pixel01 = _mm_unpacklo_ps(multiplied_luma03, alpha03); let dst_pixel23 = _mm_unpackhi_ps(multiplied_luma03, alpha03); let dst_ptr = dst_chunk.as_mut_ptr() as *mut f32; _mm_storeu_ps(dst_ptr, dst_pixel01); _mm_storeu_ps(dst_ptr.add(4), dst_pixel23); } // Divide #[target_feature(enable = "sse4.1")] pub(crate) unsafe fn divide_alpha( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, ) { let src_rows = src_view.iter_rows(0); let dst_rows = dst_view.iter_rows_mut(0); for (src_row, dst_row) in src_rows.zip(dst_rows) { divide_alpha_row(src_row, dst_row); } } #[target_feature(enable = "sse4.1")] pub(crate) unsafe fn divide_alpha_inplace(image_view: &mut impl ImageViewMut) { for row in image_view.iter_rows_mut(0) { divide_alpha_row_inplace(row); } } #[target_feature(enable = "sse4.1")] pub(crate) unsafe fn divide_alpha_row(src_row: &[F32x2], dst_row: &mut [F32x2]) { let src_chunks = src_row.chunks_exact(4); let src_remainder = src_chunks.remainder(); let mut dst_chunks = dst_row.chunks_exact_mut(4); for (src_chunk, dst_chunk) in src_chunks.zip(&mut dst_chunks) { let src_ptr = src_chunk.as_ptr() as *const f32; let src_pixels01 = _mm_loadu_ps(src_ptr); let src_pixels23 = _mm_loadu_ps(src_ptr.add(4)); divide_alpha_4_pixels(src_pixels01, src_pixels23, dst_chunk); } if !src_remainder.is_empty() { let dst_reminder = dst_chunks.into_remainder(); native::divide_alpha_row(src_remainder, dst_reminder); } } #[target_feature(enable = "sse4.1")] pub(crate) unsafe fn divide_alpha_row_inplace(row: &mut [F32x2]) { let mut chunks = row.chunks_exact_mut(4); for chunk in &mut chunks { let src_ptr = chunk.as_ptr() as *const f32; let src_pixels01 = _mm_loadu_ps(src_ptr); let src_pixels23 = _mm_loadu_ps(src_ptr.add(4)); divide_alpha_4_pixels(src_pixels01, src_pixels23, chunk); } let reminder = chunks.into_remainder(); if !reminder.is_empty() { native::divide_alpha_row_inplace(reminder); } } #[inline] #[target_feature(enable = "sse4.1")] unsafe fn divide_alpha_4_pixels(pixels01: __m128, pixels23: __m128, dst_chunk: &mut [F32x2]) { let zero = _mm_set_ps1(0.); let luma03 = _mm_shuffle_ps::<0b10_00_10_00>(pixels01, pixels23); let alpha03 = _mm_shuffle_ps::<0b11_01_11_01>(pixels01, pixels23); let mut multiplied_luma03 = _mm_div_ps(luma03, alpha03); let mask_zero = _mm_cmpneq_ps(alpha03, zero); multiplied_luma03 = _mm_and_ps(mask_zero, multiplied_luma03); let dst_pixel01 = _mm_unpacklo_ps(multiplied_luma03, alpha03); let dst_pixel23 = _mm_unpackhi_ps(multiplied_luma03, alpha03); let dst_ptr = dst_chunk.as_mut_ptr() as *mut f32; _mm_storeu_ps(dst_ptr, dst_pixel01); _mm_storeu_ps(dst_ptr.add(4), dst_pixel23); } fast_image_resize-5.3.0/src/alpha/f32x4/avx2.rs000064400000000000000000000137501046102023000172670ustar 00000000000000use std::arch::x86_64::*; use crate::pixels::F32x4; use crate::{ImageView, ImageViewMut}; use super::native; #[target_feature(enable = "avx2")] pub(crate) unsafe fn multiply_alpha( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, ) { let src_rows = src_view.iter_rows(0); let dst_rows = dst_view.iter_rows_mut(0); for (src_row, dst_row) in src_rows.zip(dst_rows) { multiply_alpha_row(src_row, dst_row); } } #[target_feature(enable = "avx2")] pub(crate) unsafe fn multiply_alpha_inplace(image_view: &mut impl ImageViewMut) { for row in image_view.iter_rows_mut(0) { multiply_alpha_row_inplace(row); } } #[inline] #[target_feature(enable = "avx2")] pub(crate) unsafe fn multiply_alpha_row(src_row: &[F32x4], dst_row: &mut [F32x4]) { let src_chunks = src_row.chunks_exact(8); let 
src_remainder = src_chunks.remainder(); let mut dst_chunks = dst_row.chunks_exact_mut(8); for (src_chunk, dst_chunk) in src_chunks.zip(&mut dst_chunks) { let src_pixels = load_8_pixels(src_chunk); multiply_alpha_8_pixels(src_pixels, dst_chunk); } if !src_remainder.is_empty() { let dst_reminder = dst_chunks.into_remainder(); native::multiply_alpha_row(src_remainder, dst_reminder); } } #[inline] #[target_feature(enable = "avx2")] pub(crate) unsafe fn multiply_alpha_row_inplace(row: &mut [F32x4]) { let mut chunks = row.chunks_exact_mut(8); for chunk in &mut chunks { let src_pixels = load_8_pixels(chunk); multiply_alpha_8_pixels(src_pixels, chunk); } let reminder = chunks.into_remainder(); if !reminder.is_empty() { native::multiply_alpha_row_inplace(reminder); } } #[inline] #[target_feature(enable = "avx2")] unsafe fn multiply_alpha_8_pixels(pixels: [__m256; 4], dst_chunk: &mut [F32x4]) { let r_f32x8 = _mm256_mul_ps(pixels[0], pixels[3]); let g_f32x8 = _mm256_mul_ps(pixels[1], pixels[3]); let b_f32x8 = _mm256_mul_ps(pixels[2], pixels[3]); store_8_pixels([r_f32x8, g_f32x8, b_f32x8, pixels[3]], dst_chunk); } // Divide #[target_feature(enable = "avx2")] pub(crate) unsafe fn divide_alpha( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, ) { let src_rows = src_view.iter_rows(0); let dst_rows = dst_view.iter_rows_mut(0); for (src_row, dst_row) in src_rows.zip(dst_rows) { divide_alpha_row(src_row, dst_row); } } #[target_feature(enable = "avx2")] pub(crate) unsafe fn divide_alpha_inplace(image_view: &mut impl ImageViewMut) { for row in image_view.iter_rows_mut(0) { divide_alpha_row_inplace(row); } } #[target_feature(enable = "avx2")] pub(crate) unsafe fn divide_alpha_row(src_row: &[F32x4], dst_row: &mut [F32x4]) { let src_chunks = src_row.chunks_exact(8); let src_remainder = src_chunks.remainder(); let mut dst_chunks = dst_row.chunks_exact_mut(8); for (src_chunk, dst_chunk) in src_chunks.zip(&mut dst_chunks) { let src_pixels = load_8_pixels(src_chunk); divide_alpha_8_pixels(src_pixels, dst_chunk); } if !src_remainder.is_empty() { let dst_reminder = dst_chunks.into_remainder(); native::divide_alpha_row(src_remainder, dst_reminder); } } #[target_feature(enable = "avx2")] pub(crate) unsafe fn divide_alpha_row_inplace(row: &mut [F32x4]) { let mut chunks = row.chunks_exact_mut(8); for chunk in &mut chunks { let src_pixels = load_8_pixels(chunk); divide_alpha_8_pixels(src_pixels, chunk); } let reminder = chunks.into_remainder(); if !reminder.is_empty() { native::divide_alpha_row_inplace(reminder); } } #[inline] #[target_feature(enable = "avx2")] unsafe fn divide_alpha_8_pixels(pixels: [__m256; 4], dst_chunk: &mut [F32x4]) { let mut r_f32x8 = _mm256_div_ps(pixels[0], pixels[3]); let mut g_f32x8 = _mm256_div_ps(pixels[1], pixels[3]); let mut b_f32x8 = _mm256_div_ps(pixels[2], pixels[3]); let zero = _mm256_setzero_ps(); let mask_zero = _mm256_cmp_ps::<_CMP_NEQ_UQ>(pixels[3], zero); r_f32x8 = _mm256_and_ps(mask_zero, r_f32x8); g_f32x8 = _mm256_and_ps(mask_zero, g_f32x8); b_f32x8 = _mm256_and_ps(mask_zero, b_f32x8); store_8_pixels([r_f32x8, g_f32x8, b_f32x8, pixels[3]], dst_chunk); } #[inline] #[target_feature(enable = "avx2")] unsafe fn load_8_pixels(pixels: &[F32x4]) -> [__m256; 4] { let ptr = pixels.as_ptr() as *const f32; cols_into_rows([ _mm256_loadu_ps(ptr), _mm256_loadu_ps(ptr.add(8)), _mm256_loadu_ps(ptr.add(16)), _mm256_loadu_ps(ptr.add(24)), ]) } #[inline] #[target_feature(enable = "avx2")] unsafe fn store_8_pixels(pixels: [__m256; 4], dst_chunk: &mut [F32x4]) { let pixels = 
cols_into_rows(pixels); let mut dst_ptr = dst_chunk.as_mut_ptr() as *mut f32; for rgba in pixels { _mm256_storeu_ps(dst_ptr, rgba); dst_ptr = dst_ptr.add(8) } } #[inline] #[target_feature(enable = "avx2")] unsafe fn cols_into_rows(pixels: [__m256; 4]) -> [__m256; 4] { let rrgg02_rrgg13 = _mm256_unpacklo_ps(pixels[0], pixels[1]); let rrgg46_rrgg57 = _mm256_unpacklo_ps(pixels[2], pixels[3]); let r0246_r1357 = _mm256_castsi256_ps(_mm256_unpacklo_epi64( _mm256_castps_si256(rrgg02_rrgg13), _mm256_castps_si256(rrgg46_rrgg57), )); let g0246_g1357 = _mm256_castsi256_ps(_mm256_unpackhi_epi64( _mm256_castps_si256(rrgg02_rrgg13), _mm256_castps_si256(rrgg46_rrgg57), )); let bbaa02_bbaa13 = _mm256_unpackhi_ps(pixels[0], pixels[1]); let bbaa46_bbaa57 = _mm256_unpackhi_ps(pixels[2], pixels[3]); let b0246_b1357 = _mm256_castsi256_ps(_mm256_unpacklo_epi64( _mm256_castps_si256(bbaa02_bbaa13), _mm256_castps_si256(bbaa46_bbaa57), )); let a0246_a1357 = _mm256_castsi256_ps(_mm256_unpackhi_epi64( _mm256_castps_si256(bbaa02_bbaa13), _mm256_castps_si256(bbaa46_bbaa57), )); [r0246_r1357, g0246_g1357, b0246_b1357, a0246_a1357] } fast_image_resize-5.3.0/src/alpha/f32x4/mod.rs000064400000000000000000000104121046102023000171560ustar 00000000000000use crate::cpu_extensions::CpuExtensions; use crate::pixels::F32x4; use crate::{ImageError, ImageView, ImageViewMut}; use super::AlphaMulDiv; #[cfg(target_arch = "x86_64")] mod avx2; mod native; #[cfg(target_arch = "x86_64")] mod sse4; type P = F32x4; impl AlphaMulDiv for P { fn multiply_alpha( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, cpu_extensions: CpuExtensions, ) -> Result<(), ImageError> { process_two_images! { multiple(src_view, dst_view, cpu_extensions); } Ok(()) } fn multiply_alpha_inplace( image_view: &mut impl ImageViewMut, cpu_extensions: CpuExtensions, ) -> Result<(), ImageError> { process_one_images! { multiply_inplace(image_view, cpu_extensions); } Ok(()) } fn divide_alpha( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, cpu_extensions: CpuExtensions, ) -> Result<(), ImageError> { process_two_images! { divide(src_view, dst_view, cpu_extensions); } Ok(()) } fn divide_alpha_inplace( image_view: &mut impl ImageViewMut, cpu_extensions: CpuExtensions, ) -> Result<(), ImageError> { process_one_images! 
{ divide_inplace(image_view, cpu_extensions); } Ok(()) } } fn multiple( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, cpu_extensions: CpuExtensions, ) { match cpu_extensions { #[cfg(target_arch = "x86_64")] CpuExtensions::Avx2 => unsafe { avx2::multiply_alpha(src_view, dst_view) }, #[cfg(target_arch = "x86_64")] CpuExtensions::Sse4_1 => unsafe { sse4::multiply_alpha(src_view, dst_view) }, // #[cfg(target_arch = "aarch64")] // CpuExtensions::Neon => unsafe { neon::multiply_alpha(src_view, dst_view) }, // #[cfg(target_arch = "wasm32")] // CpuExtensions::Simd128 => unsafe { wasm32::multiply_alpha(src_view, dst_view) }, _ => native::multiply_alpha(src_view, dst_view), } } fn multiply_inplace(image_view: &mut impl ImageViewMut, cpu_extensions: CpuExtensions) { match cpu_extensions { #[cfg(target_arch = "x86_64")] CpuExtensions::Avx2 => unsafe { avx2::multiply_alpha_inplace(image_view) }, #[cfg(target_arch = "x86_64")] CpuExtensions::Sse4_1 => unsafe { sse4::multiply_alpha_inplace(image_view) }, // #[cfg(target_arch = "aarch64")] // CpuExtensions::Neon => unsafe { neon::multiply_alpha_inplace(image_view) }, // #[cfg(target_arch = "wasm32")] // CpuExtensions::Simd128 => unsafe { wasm32::multiply_alpha_inplace(image_view) }, _ => native::multiply_alpha_inplace(image_view), } } fn divide( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, cpu_extensions: CpuExtensions, ) { match cpu_extensions { #[cfg(target_arch = "x86_64")] CpuExtensions::Avx2 => unsafe { avx2::divide_alpha(src_view, dst_view) }, #[cfg(target_arch = "x86_64")] CpuExtensions::Sse4_1 => unsafe { sse4::divide_alpha(src_view, dst_view) }, // #[cfg(target_arch = "aarch64")] // CpuExtensions::Neon => unsafe { neon::divide_alpha(src_view, dst_view) }, // #[cfg(target_arch = "wasm32")] // CpuExtensions::Simd128 => unsafe { wasm32::divide_alpha(src_view, dst_view) }, _ => native::divide_alpha(src_view, dst_view), } } fn divide_inplace(image_view: &mut impl ImageViewMut, cpu_extensions: CpuExtensions) { match cpu_extensions { #[cfg(target_arch = "x86_64")] CpuExtensions::Avx2 => unsafe { avx2::divide_alpha_inplace(image_view) }, #[cfg(target_arch = "x86_64")] CpuExtensions::Sse4_1 => unsafe { sse4::divide_alpha_inplace(image_view) }, // #[cfg(target_arch = "aarch64")] // CpuExtensions::Neon => unsafe { neon::divide_alpha_inplace(image_view) }, // #[cfg(target_arch = "wasm32")] // CpuExtensions::Simd128 => unsafe { wasm32::divide_alpha_inplace(image_view) }, _ => native::divide_alpha_inplace(image_view), } } fast_image_resize-5.3.0/src/alpha/f32x4/native.rs000064400000000000000000000057611046102023000177000ustar 00000000000000use num_traits::Zero; use crate::pixels::F32x4; use crate::utils::foreach_with_pre_reading; use crate::{ImageView, ImageViewMut}; pub(crate) fn multiply_alpha( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, ) { let src_rows = src_view.iter_rows(0); let dst_rows = dst_view.iter_rows_mut(0); for (src_row, dst_row) in src_rows.zip(dst_rows) { multiply_alpha_row(src_row, dst_row); } } pub(crate) fn multiply_alpha_inplace(image_view: &mut impl ImageViewMut) { for row in image_view.iter_rows_mut(0) { multiply_alpha_row_inplace(row); } } #[inline(always)] pub(crate) fn multiply_alpha_row(src_row: &[F32x4], dst_row: &mut [F32x4]) { for (src_pixel, dst_pixel) in src_row.iter().zip(dst_row) { let components = src_pixel.0; let alpha = components[3]; dst_pixel.0 = [ components[0] * alpha, components[1] * alpha, components[2] * alpha, alpha, ]; } } #[inline(always)] pub(crate) fn 
multiply_alpha_row_inplace(row: &mut [F32x4]) { for pixel in row { let alpha = pixel.0[3]; pixel.0[0] *= alpha; pixel.0[1] *= alpha; pixel.0[2] *= alpha; } } // Divide #[inline] pub(crate) fn divide_alpha( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, ) { let src_rows = src_view.iter_rows(0); let dst_rows = dst_view.iter_rows_mut(0); for (src_row, dst_row) in src_rows.zip(dst_rows) { divide_alpha_row(src_row, dst_row); } } #[inline] pub(crate) fn divide_alpha_inplace(image_view: &mut impl ImageViewMut) { for row in image_view.iter_rows_mut(0) { divide_alpha_row_inplace(row); } } #[inline(always)] pub(crate) fn divide_alpha_row(src_row: &[F32x4], dst_row: &mut [F32x4]) { foreach_with_pre_reading( src_row.iter().zip(dst_row), |(&src_pixel, dst_pixel)| (src_pixel, dst_pixel), |(src_pixel, dst_pixel)| { let components = src_pixel.0; let alpha = components[3]; if alpha.is_zero() { dst_pixel.0 = [0.; 4]; } else { let recip_alpha = 1. / alpha; dst_pixel.0 = [ components[0] * recip_alpha, components[1] * recip_alpha, components[2] * recip_alpha, alpha, ]; } }, ); } #[inline(always)] pub(crate) fn divide_alpha_row_inplace(row: &mut [F32x4]) { for pixel in row { let components = pixel.0; let alpha = components[3]; if alpha.is_zero() { pixel.0 = [0.; 4]; } else { let recip_alpha = 1. / alpha; pixel.0 = [ components[0] * recip_alpha, components[1] * recip_alpha, components[2] * recip_alpha, alpha, ]; } } } fast_image_resize-5.3.0/src/alpha/f32x4/sse4.rs000064400000000000000000000134041046102023000172610ustar 00000000000000use std::arch::x86_64::*; use crate::pixels::F32x4; use crate::{ImageView, ImageViewMut}; use super::native; #[target_feature(enable = "sse4.1")] pub(crate) unsafe fn multiply_alpha( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, ) { let src_rows = src_view.iter_rows(0); let dst_rows = dst_view.iter_rows_mut(0); for (src_row, dst_row) in src_rows.zip(dst_rows) { multiply_alpha_row(src_row, dst_row); } } #[target_feature(enable = "sse4.1")] pub(crate) unsafe fn multiply_alpha_inplace(image_view: &mut impl ImageViewMut) { for row in image_view.iter_rows_mut(0) { multiply_alpha_row_inplace(row); } } #[inline] #[target_feature(enable = "sse4.1")] pub(crate) unsafe fn multiply_alpha_row(src_row: &[F32x4], dst_row: &mut [F32x4]) { let src_chunks = src_row.chunks_exact(4); let src_remainder = src_chunks.remainder(); let mut dst_chunks = dst_row.chunks_exact_mut(4); for (src_chunk, dst_chunk) in src_chunks.zip(&mut dst_chunks) { let src_pixels = load_4_pixels(src_chunk); multiply_alpha_4_pixels(src_pixels, dst_chunk); } if !src_remainder.is_empty() { let dst_reminder = dst_chunks.into_remainder(); native::multiply_alpha_row(src_remainder, dst_reminder); } } #[inline] #[target_feature(enable = "sse4.1")] pub(crate) unsafe fn multiply_alpha_row_inplace(row: &mut [F32x4]) { let mut chunks = row.chunks_exact_mut(4); for chunk in &mut chunks { let src_pixels = load_4_pixels(chunk); multiply_alpha_4_pixels(src_pixels, chunk); } let reminder = chunks.into_remainder(); if !reminder.is_empty() { native::multiply_alpha_row_inplace(reminder); } } #[inline] #[target_feature(enable = "sse4.1")] unsafe fn multiply_alpha_4_pixels(pixels: [__m128; 4], dst_chunk: &mut [F32x4]) { let r_f32x4 = _mm_mul_ps(pixels[0], pixels[3]); let g_f32x4 = _mm_mul_ps(pixels[1], pixels[3]); let b_f32x4 = _mm_mul_ps(pixels[2], pixels[3]); store_4_pixels([r_f32x4, g_f32x4, b_f32x4, pixels[3]], dst_chunk); } // Divide #[target_feature(enable = "sse4.1")] pub(crate) unsafe fn divide_alpha( src_view: 
&impl ImageView, dst_view: &mut impl ImageViewMut, ) { let src_rows = src_view.iter_rows(0); let dst_rows = dst_view.iter_rows_mut(0); for (src_row, dst_row) in src_rows.zip(dst_rows) { divide_alpha_row(src_row, dst_row); } } #[target_feature(enable = "sse4.1")] pub(crate) unsafe fn divide_alpha_inplace(image_view: &mut impl ImageViewMut) { for row in image_view.iter_rows_mut(0) { divide_alpha_row_inplace(row); } } #[target_feature(enable = "sse4.1")] pub(crate) unsafe fn divide_alpha_row(src_row: &[F32x4], dst_row: &mut [F32x4]) { let src_chunks = src_row.chunks_exact(4); let src_remainder = src_chunks.remainder(); let mut dst_chunks = dst_row.chunks_exact_mut(4); for (src_chunk, dst_chunk) in src_chunks.zip(&mut dst_chunks) { let src_pixels = load_4_pixels(src_chunk); divide_alpha_4_pixels(src_pixels, dst_chunk); } if !src_remainder.is_empty() { let dst_reminder = dst_chunks.into_remainder(); native::divide_alpha_row(src_remainder, dst_reminder); } } #[target_feature(enable = "sse4.1")] pub(crate) unsafe fn divide_alpha_row_inplace(row: &mut [F32x4]) { let mut chunks = row.chunks_exact_mut(4); for chunk in &mut chunks { let src_pixels = load_4_pixels(chunk); divide_alpha_4_pixels(src_pixels, chunk); } let reminder = chunks.into_remainder(); if !reminder.is_empty() { native::divide_alpha_row_inplace(reminder); } } #[inline] #[target_feature(enable = "sse4.1")] unsafe fn divide_alpha_4_pixels(pixels: [__m128; 4], dst_chunk: &mut [F32x4]) { let mut r_f32x4 = _mm_div_ps(pixels[0], pixels[3]); let mut g_f32x4 = _mm_div_ps(pixels[1], pixels[3]); let mut b_f32x4 = _mm_div_ps(pixels[2], pixels[3]); let zero = _mm_setzero_ps(); let mask_zero = _mm_cmpneq_ps(pixels[3], zero); r_f32x4 = _mm_and_ps(mask_zero, r_f32x4); g_f32x4 = _mm_and_ps(mask_zero, g_f32x4); b_f32x4 = _mm_and_ps(mask_zero, b_f32x4); store_4_pixels([r_f32x4, g_f32x4, b_f32x4, pixels[3]], dst_chunk); } #[inline] #[target_feature(enable = "sse4.1")] unsafe fn load_4_pixels(pixels: &[F32x4]) -> [__m128; 4] { let ptr = pixels.as_ptr() as *const f32; cols_into_rows([ _mm_loadu_ps(ptr), _mm_loadu_ps(ptr.add(4)), _mm_loadu_ps(ptr.add(8)), _mm_loadu_ps(ptr.add(12)), ]) } #[inline] #[target_feature(enable = "sse4.1")] unsafe fn store_4_pixels(pixels: [__m128; 4], dst_chunk: &mut [F32x4]) { let pixels = cols_into_rows(pixels); let mut dst_ptr = dst_chunk.as_mut_ptr() as *mut f32; for rgba in pixels { _mm_storeu_ps(dst_ptr, rgba); dst_ptr = dst_ptr.add(4) } } #[inline] #[target_feature(enable = "sse4.1")] unsafe fn cols_into_rows(pixels: [__m128; 4]) -> [__m128; 4] { let rrgg01 = _mm_unpacklo_ps(pixels[0], pixels[1]); let rrgg23 = _mm_unpacklo_ps(pixels[2], pixels[3]); let r0123 = _mm_castsi128_ps(_mm_unpacklo_epi64( _mm_castps_si128(rrgg01), _mm_castps_si128(rrgg23), )); let g0123 = _mm_castsi128_ps(_mm_unpackhi_epi64( _mm_castps_si128(rrgg01), _mm_castps_si128(rrgg23), )); let bbaa01 = _mm_unpackhi_ps(pixels[0], pixels[1]); let bbaa23 = _mm_unpackhi_ps(pixels[2], pixels[3]); let b0123 = _mm_castsi128_ps(_mm_unpacklo_epi64( _mm_castps_si128(bbaa01), _mm_castps_si128(bbaa23), )); let a0123 = _mm_castsi128_ps(_mm_unpackhi_epi64( _mm_castps_si128(bbaa01), _mm_castps_si128(bbaa23), )); [r0123, g0123, b0123, a0123] } fast_image_resize-5.3.0/src/alpha/mod.rs000064400000000000000000000037451046102023000163230ustar 00000000000000use crate::{pixels, CpuExtensions, ImageError, ImageView, ImageViewMut}; #[macro_use] mod common; pub(crate) mod errors; mod u8x4; cfg_if::cfg_if! 
{ if #[cfg(not(feature = "only_u8x4"))] { mod u16x2; mod u16x4; mod u8x2; mod f32x2; mod f32x4; } } pub(crate) trait AlphaMulDiv: pixels::InnerPixel { /// Multiplies RGB-channels of source image by alpha-channel and store /// result into destination image. #[allow(unused_variables)] fn multiply_alpha( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, cpu_extensions: CpuExtensions, ) -> Result<(), ImageError> { Err(ImageError::UnsupportedPixelType) } /// Multiplies RGB-channels of image by alpha-channel inplace. #[allow(unused_variables)] fn multiply_alpha_inplace( image_view: &mut impl ImageViewMut, cpu_extensions: CpuExtensions, ) -> Result<(), ImageError> { Err(ImageError::UnsupportedPixelType) } /// Divides RGB-channels of source image by alpha-channel and store /// result into destination image. #[allow(unused_variables)] fn divide_alpha( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, cpu_extensions: CpuExtensions, ) -> Result<(), ImageError> { Err(ImageError::UnsupportedPixelType) } /// Divides RGB-channels of image by alpha-channel inplace. #[allow(unused_variables)] fn divide_alpha_inplace( image_view: &mut impl ImageViewMut, cpu_extensions: CpuExtensions, ) -> Result<(), ImageError> { Err(ImageError::UnsupportedPixelType) } } impl AlphaMulDiv for pixels::U8 {} impl AlphaMulDiv for pixels::U8x3 {} impl AlphaMulDiv for pixels::U16 {} impl AlphaMulDiv for pixels::U16x3 {} impl AlphaMulDiv for pixels::I32 {} impl AlphaMulDiv for pixels::F32 {} impl AlphaMulDiv for pixels::F32x3 {} fast_image_resize-5.3.0/src/alpha/u16x2/avx2.rs000064400000000000000000000176561046102023000173170ustar 00000000000000use std::arch::x86_64::*; use super::sse4; use crate::pixels::U16x2; use crate::utils::foreach_with_pre_reading; use crate::{ImageView, ImageViewMut}; #[target_feature(enable = "avx2")] pub(crate) unsafe fn multiply_alpha( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, ) { let src_rows = src_view.iter_rows(0); let dst_rows = dst_view.iter_rows_mut(0); for (src_row, dst_row) in src_rows.zip(dst_rows) { multiply_alpha_row(src_row, dst_row); } } #[target_feature(enable = "avx2")] pub(crate) unsafe fn multiply_alpha_inplace(image_view: &mut impl ImageViewMut) { for row in image_view.iter_rows_mut(0) { multiply_alpha_row_inplace(row); } } #[inline] #[target_feature(enable = "avx2")] pub(crate) unsafe fn multiply_alpha_row(src_row: &[U16x2], dst_row: &mut [U16x2]) { let src_chunks = src_row.chunks_exact(8); let src_remainder = src_chunks.remainder(); let mut dst_chunks = dst_row.chunks_exact_mut(8); let src_dst = src_chunks.zip(&mut dst_chunks); foreach_with_pre_reading( src_dst, |(src, dst)| { let pixels = _mm256_loadu_si256(src.as_ptr() as *const __m256i); let dst_ptr = dst.as_mut_ptr() as *mut __m256i; (pixels, dst_ptr) }, |(mut pixels, dst_ptr)| { pixels = multiply_alpha_8_pixels(pixels); _mm256_storeu_si256(dst_ptr, pixels); }, ); if !src_remainder.is_empty() { let dst_reminder = dst_chunks.into_remainder(); sse4::multiply_alpha_row(src_remainder, dst_reminder); } } #[inline] #[target_feature(enable = "avx2")] pub(crate) unsafe fn multiply_alpha_row_inplace(row: &mut [U16x2]) { let mut chunks = row.chunks_exact_mut(8); foreach_with_pre_reading( &mut chunks, |chunk| { let pixels = _mm256_loadu_si256(chunk.as_ptr() as *const __m256i); let dst_ptr = chunk.as_mut_ptr() as *mut __m256i; (pixels, dst_ptr) }, |(mut pixels, dst_ptr)| { pixels = multiply_alpha_8_pixels(pixels); _mm256_storeu_si256(dst_ptr, pixels); }, ); let reminder = chunks.into_remainder(); if 
!reminder.is_empty() { sse4::multiply_alpha_row_inplace(reminder); } } #[inline] #[target_feature(enable = "avx2")] unsafe fn multiply_alpha_8_pixels(pixels: __m256i) -> __m256i { let zero = _mm256_setzero_si256(); let half = _mm256_set1_epi32(0x8000); const MAX_A: i32 = 0xffff0000u32 as i32; let max_alpha = _mm256_set1_epi32(MAX_A); /* |L0 A0 | |L1 A1 | |L2 A2 | |L3 A3 | |0001 0203| |0405 0607| |0809 1011| |1213 1415| */ #[rustfmt::skip] let factor_mask = _mm256_set_epi8( 15, 14, 15, 14, 11, 10, 11, 10, 7, 6, 7, 6, 3, 2, 3, 2, 15, 14, 15, 14, 11, 10, 11, 10, 7, 6, 7, 6, 3, 2, 3, 2 ); let factor_pixels = _mm256_shuffle_epi8(pixels, factor_mask); let factor_pixels = _mm256_or_si256(factor_pixels, max_alpha); let src_i32_lo = _mm256_unpacklo_epi16(pixels, zero); let factors = _mm256_unpacklo_epi16(factor_pixels, zero); let src_i32_lo = _mm256_add_epi32(_mm256_mullo_epi32(src_i32_lo, factors), half); let dst_i32_lo = _mm256_add_epi32(src_i32_lo, _mm256_srli_epi32::<16>(src_i32_lo)); let dst_i32_lo = _mm256_srli_epi32::<16>(dst_i32_lo); let src_i32_hi = _mm256_unpackhi_epi16(pixels, zero); let factors = _mm256_unpackhi_epi16(factor_pixels, zero); let src_i32_hi = _mm256_add_epi32(_mm256_mullo_epi32(src_i32_hi, factors), half); let dst_i32_hi = _mm256_add_epi32(src_i32_hi, _mm256_srli_epi32::<16>(src_i32_hi)); let dst_i32_hi = _mm256_srli_epi32::<16>(dst_i32_hi); _mm256_packus_epi32(dst_i32_lo, dst_i32_hi) } // Divide #[target_feature(enable = "avx2")] pub(crate) unsafe fn divide_alpha( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, ) { let src_rows = src_view.iter_rows(0); let dst_rows = dst_view.iter_rows_mut(0); for (src_row, dst_row) in src_rows.zip(dst_rows) { divide_alpha_row(src_row, dst_row); } } #[target_feature(enable = "avx2")] pub(crate) unsafe fn divide_alpha_inplace(image_view: &mut impl ImageViewMut) { for row in image_view.iter_rows_mut(0) { divide_alpha_row_inplace(row); } } #[target_feature(enable = "avx2")] pub(crate) unsafe fn divide_alpha_row(src_row: &[U16x2], dst_row: &mut [U16x2]) { let src_chunks = src_row.chunks_exact(8); let src_remainder = src_chunks.remainder(); let mut dst_chunks = dst_row.chunks_exact_mut(8); let src_dst = src_chunks.zip(&mut dst_chunks); foreach_with_pre_reading( src_dst, |(src, dst)| { let pixels = _mm256_loadu_si256(src.as_ptr() as *const __m256i); let dst_ptr = dst.as_mut_ptr() as *mut __m256i; (pixels, dst_ptr) }, |(mut pixels, dst_ptr)| { pixels = divide_alpha_8_pixels(pixels); _mm256_storeu_si256(dst_ptr, pixels); }, ); if !src_remainder.is_empty() { let dst_reminder = dst_chunks.into_remainder(); let mut src_pixels = [U16x2::new([0, 0]); 8]; src_pixels .iter_mut() .zip(src_remainder) .for_each(|(d, s)| *d = *s); let mut dst_pixels = [U16x2::new([0, 0]); 8]; let mut pixels = _mm256_loadu_si256(src_pixels.as_ptr() as *const __m256i); pixels = divide_alpha_8_pixels(pixels); _mm256_storeu_si256(dst_pixels.as_mut_ptr() as *mut __m256i, pixels); dst_pixels .iter() .zip(dst_reminder) .for_each(|(s, d)| *d = *s); } } #[target_feature(enable = "avx2")] pub(crate) unsafe fn divide_alpha_row_inplace(row: &mut [U16x2]) { let mut chunks = row.chunks_exact_mut(8); // Using a simple for-loop in this case is faster than implementation with pre-reading for chunk in &mut chunks { let mut pixels = _mm256_loadu_si256(chunk.as_ptr() as *const __m256i); pixels = divide_alpha_8_pixels(pixels); _mm256_storeu_si256(chunk.as_mut_ptr() as *mut __m256i, pixels); } let reminder = chunks.into_remainder(); if !reminder.is_empty() { let mut src_pixels = 
[U16x2::new([0, 0]); 8]; src_pixels .iter_mut() .zip(reminder.iter()) .for_each(|(d, s)| *d = *s); let mut dst_pixels = [U16x2::new([0, 0]); 8]; let mut pixels = _mm256_loadu_si256(src_pixels.as_ptr() as *const __m256i); pixels = divide_alpha_8_pixels(pixels); _mm256_storeu_si256(dst_pixels.as_mut_ptr() as *mut __m256i, pixels); dst_pixels.iter().zip(reminder).for_each(|(s, d)| *d = *s); } } #[inline] #[target_feature(enable = "avx2")] unsafe fn divide_alpha_8_pixels(pixels: __m256i) -> __m256i { let alpha_mask = _mm256_set1_epi32(0xffff0000u32 as i32); let luma_mask = _mm256_set1_epi32(0xffff); let alpha_max = _mm256_set1_ps(65535.0); /* |L0 A0 | |L1 A1 | |L2 A2 | |L3 A3 | |0001 0203| |0405 0607| |0809 1011| |1213 1415| */ #[rustfmt::skip] let alpha32_sh = _mm256_set_epi8( -1, -1, 15, 14, -1, -1, 11, 10, -1, -1, 7, 6, -1, -1, 3, 2, -1, -1, 15, 14, -1, -1, 11, 10, -1, -1, 7, 6, -1, -1, 3, 2, ); let alpha_f32x8 = _mm256_cvtepi32_ps(_mm256_shuffle_epi8(pixels, alpha32_sh)); let luma_i32x8 = _mm256_and_si256(pixels, luma_mask); let luma_f32x8 = _mm256_cvtepi32_ps(luma_i32x8); let scaled_luma_f32x8 = _mm256_mul_ps(luma_f32x8, alpha_max); let divided_luma_f32x8 = _mm256_div_ps(scaled_luma_f32x8, alpha_f32x8); let mut divided_luma_i32x8 = _mm256_cvtps_epi32(divided_luma_f32x8); // Clamp result to [0..0xffff] divided_luma_i32x8 = _mm256_min_epi32(divided_luma_i32x8, luma_mask); let alpha = _mm256_and_si256(pixels, alpha_mask); _mm256_blendv_epi8(divided_luma_i32x8, alpha, alpha_mask) } fast_image_resize-5.3.0/src/alpha/u16x2/mod.rs000064400000000000000000000104241046102023000172000ustar 00000000000000use crate::pixels::U16x2; use crate::{CpuExtensions, ImageError, ImageView, ImageViewMut}; use super::AlphaMulDiv; #[cfg(target_arch = "x86_64")] mod avx2; mod native; #[cfg(target_arch = "aarch64")] mod neon; #[cfg(target_arch = "x86_64")] mod sse4; #[cfg(target_arch = "wasm32")] mod wasm32; type P = U16x2; impl AlphaMulDiv for P { fn multiply_alpha( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, cpu_extensions: CpuExtensions, ) -> Result<(), ImageError> { process_two_images! { multiple(src_view, dst_view, cpu_extensions); } Ok(()) } fn multiply_alpha_inplace( image_view: &mut impl ImageViewMut, cpu_extensions: CpuExtensions, ) -> Result<(), ImageError> { process_one_images! { multiply_inplace(image_view, cpu_extensions); } Ok(()) } fn divide_alpha( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, cpu_extensions: CpuExtensions, ) -> Result<(), ImageError> { process_two_images! { divide(src_view, dst_view, cpu_extensions); } Ok(()) } fn divide_alpha_inplace( image_view: &mut impl ImageViewMut, cpu_extensions: CpuExtensions, ) -> Result<(), ImageError> { process_one_images! 
{ divide_inplace(image_view, cpu_extensions); } Ok(()) } } fn multiple( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, cpu_extensions: CpuExtensions, ) { match cpu_extensions { #[cfg(target_arch = "x86_64")] CpuExtensions::Avx2 => unsafe { avx2::multiply_alpha(src_view, dst_view) }, #[cfg(target_arch = "x86_64")] CpuExtensions::Sse4_1 => unsafe { sse4::multiply_alpha(src_view, dst_view) }, #[cfg(target_arch = "aarch64")] CpuExtensions::Neon => unsafe { neon::multiply_alpha(src_view, dst_view) }, #[cfg(target_arch = "wasm32")] CpuExtensions::Simd128 => unsafe { wasm32::multiply_alpha(src_view, dst_view) }, _ => native::multiply_alpha(src_view, dst_view), } } fn multiply_inplace(image_view: &mut impl ImageViewMut, cpu_extensions: CpuExtensions) { match cpu_extensions { #[cfg(target_arch = "x86_64")] CpuExtensions::Avx2 => unsafe { avx2::multiply_alpha_inplace(image_view) }, #[cfg(target_arch = "x86_64")] CpuExtensions::Sse4_1 => unsafe { sse4::multiply_alpha_inplace(image_view) }, #[cfg(target_arch = "aarch64")] CpuExtensions::Neon => unsafe { neon::multiply_alpha_inplace(image_view) }, #[cfg(target_arch = "wasm32")] CpuExtensions::Simd128 => unsafe { wasm32::multiply_alpha_inplace(image_view) }, _ => native::multiply_alpha_inplace(image_view), } } fn divide( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, cpu_extensions: CpuExtensions, ) { match cpu_extensions { #[cfg(target_arch = "x86_64")] CpuExtensions::Avx2 => unsafe { avx2::divide_alpha(src_view, dst_view) }, #[cfg(target_arch = "x86_64")] CpuExtensions::Sse4_1 => unsafe { sse4::divide_alpha(src_view, dst_view) }, #[cfg(target_arch = "aarch64")] CpuExtensions::Neon => unsafe { neon::divide_alpha(src_view, dst_view) }, #[cfg(target_arch = "wasm32")] CpuExtensions::Simd128 => unsafe { wasm32::divide_alpha(src_view, dst_view) }, _ => native::divide_alpha(src_view, dst_view), } } fn divide_inplace(image_view: &mut impl ImageViewMut, cpu_extensions: CpuExtensions) { match cpu_extensions { #[cfg(target_arch = "x86_64")] CpuExtensions::Avx2 => unsafe { avx2::divide_alpha_inplace(image_view) }, #[cfg(target_arch = "x86_64")] CpuExtensions::Sse4_1 => unsafe { sse4::divide_alpha_inplace(image_view) }, #[cfg(target_arch = "aarch64")] CpuExtensions::Neon => unsafe { neon::divide_alpha_inplace(image_view) }, #[cfg(target_arch = "wasm32")] CpuExtensions::Simd128 => unsafe { wasm32::divide_alpha_inplace(image_view) }, _ => native::divide_alpha_inplace(image_view), } } fast_image_resize-5.3.0/src/alpha/u16x2/native.rs000064400000000000000000000047301046102023000177120ustar 00000000000000use crate::alpha::common::{div_and_clip16, mul_div_65535, RECIP_ALPHA16}; use crate::pixels::U16x2; use crate::{ImageView, ImageViewMut}; pub(crate) fn multiply_alpha( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, ) { let src_rows = src_view.iter_rows(0); let dst_rows = dst_view.iter_rows_mut(0); for (src_row, dst_row) in src_rows.zip(dst_rows) { multiply_alpha_row(src_row, dst_row); } } pub(crate) fn multiply_alpha_inplace(image_view: &mut impl ImageViewMut) { for row in image_view.iter_rows_mut(0) { multiply_alpha_row_inplace(row); } } #[inline(always)] pub(crate) fn multiply_alpha_row(src_row: &[U16x2], dst_row: &mut [U16x2]) { for (src_pixel, dst_pixel) in src_row.iter().zip(dst_row) { let components: [u16; 2] = src_pixel.0; let alpha = components[1]; dst_pixel.0 = [mul_div_65535(components[0], alpha), alpha]; } } #[inline(always)] pub(crate) fn multiply_alpha_row_inplace(row: &mut [U16x2]) { for pixel in row { let 
components: [u16; 2] = pixel.0; let alpha = components[1]; pixel.0 = [mul_div_65535(components[0], alpha), alpha]; } } // Divide #[inline] pub(crate) fn divide_alpha( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, ) { let src_rows = src_view.iter_rows(0); let dst_rows = dst_view.iter_rows_mut(0); for (src_row, dst_row) in src_rows.zip(dst_rows) { divide_alpha_row(src_row, dst_row); } } #[inline] pub(crate) fn divide_alpha_inplace(image_view: &mut impl ImageViewMut) { for row in image_view.iter_rows_mut(0) { divide_alpha_row_inplace(row); } } #[inline(always)] pub(crate) fn divide_alpha_row(src_row: &[U16x2], dst_row: &mut [U16x2]) { src_row .iter() .zip(dst_row) .for_each(|(src_pixel, dst_pixel)| { let components: [u16; 2] = src_pixel.0; let alpha = components[1]; let recip_alpha = RECIP_ALPHA16[alpha as usize]; dst_pixel.0 = [div_and_clip16(components[0], recip_alpha), alpha]; }); } #[inline(always)] pub(crate) fn divide_alpha_row_inplace(row: &mut [U16x2]) { for pixel in row { let components: [u16; 2] = pixel.0; let alpha = components[1]; let recip_alpha = RECIP_ALPHA16[alpha as usize]; pixel.0 = [div_and_clip16(components[0], recip_alpha), alpha]; } } fast_image_resize-5.3.0/src/alpha/u16x2/neon.rs000064400000000000000000000157101046102023000173630ustar 00000000000000use std::arch::aarch64::*; use crate::neon_utils; use crate::pixels::U16x2; use crate::{ImageView, ImageViewMut}; use super::native; #[target_feature(enable = "neon")] pub(crate) unsafe fn multiply_alpha( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, ) { let src_rows = src_view.iter_rows(0); let dst_rows = dst_view.iter_rows_mut(0); for (src_row, dst_row) in src_rows.zip(dst_rows) { multiply_alpha_row(src_row, dst_row); } } #[target_feature(enable = "neon")] pub(crate) unsafe fn multiply_alpha_inplace(image_view: &mut impl ImageViewMut) { for row in image_view.iter_rows_mut(0) { multiply_alpha_row_inplace(row); } } #[inline(always)] unsafe fn multiply_alpha_row(src_row: &[U16x2], dst_row: &mut [U16x2]) { let src_chunks = src_row.chunks_exact(8); let src_remainder = src_chunks.remainder(); let mut dst_chunks = dst_row.chunks_exact_mut(8); // Using a simple for-loop in this case is faster than implementation with pre-reading for (src, dst) in src_chunks.zip(&mut dst_chunks) { let mut pixels = neon_utils::load_deintrel_u16x8x2(src, 0); pixels.0 = neon_utils::multiply_color_to_alpha_u16x8(pixels.0, pixels.1); let dst_ptr = dst.as_mut_ptr() as *mut u16; vst2q_u16(dst_ptr, pixels); } if !src_remainder.is_empty() { let src_chunks = src_remainder.chunks_exact(4); let src_remainder = src_chunks.remainder(); let dst_reminder = dst_chunks.into_remainder(); let mut dst_chunks = dst_reminder.chunks_exact_mut(4); let mut src_dst = src_chunks.zip(&mut dst_chunks); if let Some((src, dst)) = src_dst.next() { let mut pixels = neon_utils::load_deintrel_u16x4x2(src, 0); pixels.0 = neon_utils::multiply_color_to_alpha_u16x4(pixels.0, pixels.1); let dst_ptr = dst.as_mut_ptr() as *mut u16; vst2_u16(dst_ptr, pixels); } if !src_remainder.is_empty() { let dst_reminder = dst_chunks.into_remainder(); native::multiply_alpha_row(src_remainder, dst_reminder); } } } #[inline(always)] unsafe fn multiply_alpha_row_inplace(row: &mut [U16x2]) { let mut chunks = row.chunks_exact_mut(8); // Using a simple for-loop in this case is faster than implementation with pre-reading for chunk in &mut chunks { let mut pixels = neon_utils::load_deintrel_u16x8x2(chunk, 0); pixels.0 = neon_utils::multiply_color_to_alpha_u16x8(pixels.0, pixels.1); 
let dst_ptr = chunk.as_mut_ptr() as *mut u16; vst2q_u16(dst_ptr, pixels); } let reminder = chunks.into_remainder(); if !reminder.is_empty() { let mut chunks = reminder.chunks_exact_mut(4); if let Some(chunk) = chunks.next() { let mut pixels = neon_utils::load_deintrel_u16x4x2(chunk, 0); pixels.0 = neon_utils::multiply_color_to_alpha_u16x4(pixels.0, pixels.1); let dst_ptr = chunk.as_mut_ptr() as *mut u16; vst2_u16(dst_ptr, pixels); } let reminder = chunks.into_remainder(); if !reminder.is_empty() { native::multiply_alpha_row_inplace(reminder); } } } // Divide #[target_feature(enable = "neon")] pub(crate) unsafe fn divide_alpha( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, ) { let src_rows = src_view.iter_rows(0); let dst_rows = dst_view.iter_rows_mut(0); for (src_row, dst_row) in src_rows.zip(dst_rows) { divide_alpha_row(src_row, dst_row); } } #[target_feature(enable = "neon")] pub(crate) unsafe fn divide_alpha_inplace(image_view: &mut impl ImageViewMut) { for row in image_view.iter_rows_mut(0) { divide_alpha_row_inplace(row); } } #[inline(always)] pub(crate) unsafe fn divide_alpha_row(src_row: &[U16x2], dst_row: &mut [U16x2]) { let src_chunks = src_row.chunks_exact(8); let src_remainder = src_chunks.remainder(); let mut dst_chunks = dst_row.chunks_exact_mut(8); // Using a simple for-loop in this case is faster than implementation with pre-reading for (src, dst) in src_chunks.zip(&mut dst_chunks) { let mut pixels = neon_utils::load_deintrel_u16x8x2(src, 0); pixels = divide_alpha_8_pixels(pixels); let dst_ptr = dst.as_mut_ptr() as *mut u16; vst2q_u16(dst_ptr, pixels); } if !src_remainder.is_empty() { let dst_reminder = dst_chunks.into_remainder(); let mut src_pixels = [U16x2::new([0, 0]); 8]; src_pixels .iter_mut() .zip(src_remainder) .for_each(|(d, s)| *d = *s); let mut dst_pixels = [U16x2::new([0, 0]); 8]; let mut pixels = neon_utils::load_deintrel_u16x8x2(&src_pixels, 0); pixels = divide_alpha_8_pixels(pixels); let dst_ptr = dst_pixels.as_mut_ptr() as *mut u16; vst2q_u16(dst_ptr, pixels); dst_pixels .iter() .zip(dst_reminder) .for_each(|(s, d)| *d = *s); } } #[inline(always)] pub(crate) unsafe fn divide_alpha_row_inplace(row: &mut [U16x2]) { let mut chunks = row.chunks_exact_mut(8); // Using a simple for-loop in this case is faster than implementation with pre-reading for chunk in &mut chunks { let mut pixels = neon_utils::load_deintrel_u16x8x2(chunk, 0); pixels = divide_alpha_8_pixels(pixels); let dst_ptr = chunk.as_mut_ptr() as *mut u16; vst2q_u16(dst_ptr, pixels); } let reminder = chunks.into_remainder(); if !reminder.is_empty() { let mut src_pixels = [U16x2::new([0, 0]); 8]; src_pixels .iter_mut() .zip(reminder.iter()) .for_each(|(d, s)| *d = *s); let mut dst_pixels = [U16x2::new([0, 0]); 8]; let mut pixels = neon_utils::load_deintrel_u16x8x2(&src_pixels, 0); pixels = divide_alpha_8_pixels(pixels); let dst_ptr = dst_pixels.as_mut_ptr() as *mut u16; vst2q_u16(dst_ptr, pixels); dst_pixels.iter().zip(reminder).for_each(|(s, d)| *d = *s); } } #[inline(always)] unsafe fn divide_alpha_8_pixels(mut pixels: uint16x8x2_t) -> uint16x8x2_t { let zero = vdupq_n_u16(0); let alpha_scale = vdupq_n_f32(65535.0); let nonzero_alpha_mask = vmvnq_u16(vceqzq_u16(pixels.1)); // Low let alpha_scaled_u32 = vreinterpretq_u32_u16(vzip1q_u16(pixels.1, zero)); let alpha_scaled_f32 = vcvtq_f32_u32(alpha_scaled_u32); let recip_alpha_lo_f32 = vdivq_f32(alpha_scale, alpha_scaled_f32); // High let alpha_scaled_u32 = vreinterpretq_u32_u16(vzip2q_u16(pixels.1, zero)); let alpha_scaled_f32 = 
vcvtq_f32_u32(alpha_scaled_u32); let recip_alpha_hi_f32 = vdivq_f32(alpha_scale, alpha_scaled_f32); pixels.0 = neon_utils::mul_color_recip_alpha_u16x8( pixels.0, recip_alpha_lo_f32, recip_alpha_hi_f32, zero, ); pixels.0 = vandq_u16(pixels.0, nonzero_alpha_mask); pixels } fast_image_resize-5.3.0/src/alpha/u16x2/sse4.rs000064400000000000000000000170321046102023000173010ustar 00000000000000use std::arch::x86_64::*; use super::native; use crate::pixels::U16x2; use crate::utils::foreach_with_pre_reading; use crate::{ImageView, ImageViewMut}; #[target_feature(enable = "sse4.1")] pub(crate) unsafe fn multiply_alpha( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, ) { let src_rows = src_view.iter_rows(0); let dst_rows = dst_view.iter_rows_mut(0); for (src_row, dst_row) in src_rows.zip(dst_rows) { multiply_alpha_row(src_row, dst_row); } } #[target_feature(enable = "sse4.1")] pub(crate) unsafe fn multiply_alpha_inplace(image_view: &mut impl ImageViewMut) { for row in image_view.iter_rows_mut(0) { multiply_alpha_row_inplace(row); } } #[inline] #[target_feature(enable = "sse4.1")] pub(crate) unsafe fn multiply_alpha_row(src_row: &[U16x2], dst_row: &mut [U16x2]) { let src_chunks = src_row.chunks_exact(4); let src_remainder = src_chunks.remainder(); let mut dst_chunks = dst_row.chunks_exact_mut(4); let src_dst = src_chunks.zip(&mut dst_chunks); foreach_with_pre_reading( src_dst, |(src, dst)| { let pixels = _mm_loadu_si128(src.as_ptr() as *const __m128i); let dst_ptr = dst.as_mut_ptr() as *mut __m128i; (pixels, dst_ptr) }, |(mut pixels, dst_ptr)| { pixels = multiply_alpha_4_pixels(pixels); _mm_storeu_si128(dst_ptr, pixels); }, ); if !src_remainder.is_empty() { let dst_reminder = dst_chunks.into_remainder(); native::multiply_alpha_row(src_remainder, dst_reminder); } } #[inline] #[target_feature(enable = "sse4.1")] pub(crate) unsafe fn multiply_alpha_row_inplace(row: &mut [U16x2]) { let mut chunks = row.chunks_exact_mut(4); foreach_with_pre_reading( &mut chunks, |chunk| { let pixels = _mm_loadu_si128(chunk.as_ptr() as *const __m128i); let dst_ptr = chunk.as_mut_ptr() as *mut __m128i; (pixels, dst_ptr) }, |(mut pixels, dst_ptr)| { pixels = multiply_alpha_4_pixels(pixels); _mm_storeu_si128(dst_ptr, pixels); }, ); let reminder = chunks.into_remainder(); if !reminder.is_empty() { native::multiply_alpha_row_inplace(reminder); } } #[inline] #[target_feature(enable = "sse4.1")] unsafe fn multiply_alpha_4_pixels(pixels: __m128i) -> __m128i { let zero = _mm_setzero_si128(); let half = _mm_set1_epi32(0x8000); const MAX_A: i32 = 0xffff0000u32 as i32; let max_alpha = _mm_set1_epi32(MAX_A); /* |L0 A0 | |L1 A1 | |L2 A2 | |L3 A3 | |0001 0203| |0405 0607| |0809 1011| |1213 1415| */ let factor_mask = _mm_set_epi8(15, 14, 15, 14, 11, 10, 11, 10, 7, 6, 7, 6, 3, 2, 3, 2); let factor_pixels = _mm_shuffle_epi8(pixels, factor_mask); let factor_pixels = _mm_or_si128(factor_pixels, max_alpha); let src_i32_lo = _mm_unpacklo_epi16(pixels, zero); let factors = _mm_unpacklo_epi16(factor_pixels, zero); let src_i32_lo = _mm_add_epi32(_mm_mullo_epi32(src_i32_lo, factors), half); let dst_i32_lo = _mm_add_epi32(src_i32_lo, _mm_srli_epi32::<16>(src_i32_lo)); let dst_i32_lo = _mm_srli_epi32::<16>(dst_i32_lo); let src_i32_hi = _mm_unpackhi_epi16(pixels, zero); let factors = _mm_unpackhi_epi16(factor_pixels, zero); let src_i32_hi = _mm_add_epi32(_mm_mullo_epi32(src_i32_hi, factors), half); let dst_i32_hi = _mm_add_epi32(src_i32_hi, _mm_srli_epi32::<16>(src_i32_hi)); let dst_i32_hi = _mm_srli_epi32::<16>(dst_i32_hi); 
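    // At this point each 32-bit lane holds approximately round(c * factor / 65535),
    // computed as ((c * factor + 0x8000) + ((c * factor + 0x8000) >> 16)) >> 16.
    // The luma lanes were multiplied by their pixel's alpha, while the alpha lanes
    // were multiplied by the constant 0xffff (`max_alpha` above), so alpha values
    // pass through unchanged. The pack below narrows both halves back to eight u16
    // components.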
_mm_packus_epi32(dst_i32_lo, dst_i32_hi) } // Divide #[target_feature(enable = "sse4.1")] pub(crate) unsafe fn divide_alpha( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, ) { let src_rows = src_view.iter_rows(0); let dst_rows = dst_view.iter_rows_mut(0); for (src_row, dst_row) in src_rows.zip(dst_rows) { divide_alpha_row(src_row, dst_row); } } #[target_feature(enable = "sse4.1")] pub(crate) unsafe fn divide_alpha_inplace(image_view: &mut impl ImageViewMut) { for row in image_view.iter_rows_mut(0) { divide_alpha_row_inplace(row); } } #[target_feature(enable = "sse4.1")] pub(crate) unsafe fn divide_alpha_row(src_row: &[U16x2], dst_row: &mut [U16x2]) { let src_chunks = src_row.chunks_exact(4); let src_remainder = src_chunks.remainder(); let mut dst_chunks = dst_row.chunks_exact_mut(4); let src_dst = src_chunks.zip(&mut dst_chunks); foreach_with_pre_reading( src_dst, |(src, dst)| { let pixels = _mm_loadu_si128(src.as_ptr() as *const __m128i); let dst_ptr = dst.as_mut_ptr() as *mut __m128i; (pixels, dst_ptr) }, |(mut pixels, dst_ptr)| { pixels = divide_alpha_4_pixels(pixels); _mm_storeu_si128(dst_ptr, pixels); }, ); if !src_remainder.is_empty() { let dst_reminder = dst_chunks.into_remainder(); let mut src_pixels = [U16x2::new([0, 0]); 4]; src_pixels .iter_mut() .zip(src_remainder) .for_each(|(d, s)| *d = *s); let mut dst_pixels = [U16x2::new([0, 0]); 4]; let mut pixels = _mm_loadu_si128(src_pixels.as_ptr() as *const __m128i); pixels = divide_alpha_4_pixels(pixels); _mm_storeu_si128(dst_pixels.as_mut_ptr() as *mut __m128i, pixels); dst_pixels .iter() .zip(dst_reminder) .for_each(|(s, d)| *d = *s); } } #[target_feature(enable = "sse4.1")] pub(crate) unsafe fn divide_alpha_row_inplace(row: &mut [U16x2]) { let mut chunks = row.chunks_exact_mut(4); // Using a simple for-loop in this case is faster than implementation with pre-reading for chunk in &mut chunks { let mut pixels = _mm_loadu_si128(chunk.as_ptr() as *const __m128i); pixels = divide_alpha_4_pixels(pixels); _mm_storeu_si128(chunk.as_mut_ptr() as *mut __m128i, pixels); } let reminder = chunks.into_remainder(); if !reminder.is_empty() { let mut src_pixels = [U16x2::new([0, 0]); 4]; src_pixels .iter_mut() .zip(reminder.iter()) .for_each(|(d, s)| *d = *s); let mut dst_pixels = [U16x2::new([0, 0]); 4]; let mut pixels = _mm_loadu_si128(src_pixels.as_ptr() as *const __m128i); pixels = divide_alpha_4_pixels(pixels); _mm_storeu_si128(dst_pixels.as_mut_ptr() as *mut __m128i, pixels); dst_pixels.iter().zip(reminder).for_each(|(s, d)| *d = *s); } } #[inline] #[target_feature(enable = "sse4.1")] unsafe fn divide_alpha_4_pixels(pixels: __m128i) -> __m128i { let alpha_mask = _mm_set1_epi32(0xffff0000u32 as i32); let luma_mask = _mm_set1_epi32(0xffff); let alpha_max = _mm_set1_ps(65535.0); /* |L0 A0 | |L1 A1 | |L2 A2 | |L3 A3 | |0001 0203| |0405 0607| |0809 1011| |1213 1415| */ let alpha32_sh = _mm_set_epi8(-1, -1, 15, 14, -1, -1, 11, 10, -1, -1, 7, 6, -1, -1, 3, 2); let alpha_f32x4 = _mm_cvtepi32_ps(_mm_shuffle_epi8(pixels, alpha32_sh)); let luma_f32x4 = _mm_cvtepi32_ps(_mm_and_si128(pixels, luma_mask)); let scaled_luma_f32x4 = _mm_mul_ps(luma_f32x4, alpha_max); let mut divided_luma_i32x4 = _mm_cvtps_epi32(_mm_div_ps(scaled_luma_f32x4, alpha_f32x4)); // Clamp result to [0..0xffff] divided_luma_i32x4 = _mm_min_epi32(divided_luma_i32x4, luma_mask); let alpha = _mm_and_si128(pixels, alpha_mask); _mm_blendv_epi8(divided_luma_i32x4, alpha, alpha_mask) } 
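// A scalar reference sketch of the math that the kernels in this file approximate:
// premultiplying is round(color * alpha / 65535) and unpremultiplying is
// round(color * 65535 / alpha) clamped to u16::MAX, with zero alpha mapping to zero.
// The helper and module names below are illustrative only, and the SIMD paths may
// differ from these formulas by a unit of rounding.
#[cfg(test)]
mod reference_sketch {
    /// round(color * alpha / 65535)
    fn premultiply_u16(color: u16, alpha: u16) -> u16 {
        ((color as u32 * alpha as u32 + 32767) / 65535) as u16
    }

    /// round(color * 65535 / alpha), clamped to u16::MAX; zero alpha maps to zero.
    fn unpremultiply_u16(color: u16, alpha: u16) -> u16 {
        if alpha == 0 {
            return 0;
        }
        let value = (color as u32 * 65535 + alpha as u32 / 2) / alpha as u32;
        value.min(65535) as u16
    }

    #[test]
    fn reference_round_trip_at_full_alpha() {
        // With alpha == 65535, both operations must be the identity.
        for color in [0u16, 1, 255, 32768, 65535] {
            assert_eq!(premultiply_u16(color, 65535), color);
            assert_eq!(unpremultiply_u16(color, 65535), color);
        }
    }
}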
fast_image_resize-5.3.0/src/alpha/u16x2/wasm32.rs000064400000000000000000000163231046102023000175410ustar 00000000000000use std::arch::wasm32::*; use crate::pixels::U16x2; use crate::utils::foreach_with_pre_reading; use crate::{ImageView, ImageViewMut}; use super::native; pub(crate) unsafe fn multiply_alpha( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, ) { let src_rows = src_view.iter_rows(0); let dst_rows = dst_view.iter_rows_mut(0); for (src_row, dst_row) in src_rows.zip(dst_rows) { multiply_alpha_row(src_row, dst_row); } } pub(crate) unsafe fn multiply_alpha_inplace(image_view: &mut impl ImageViewMut) { for row in image_view.iter_rows_mut(0) { multiply_alpha_row_inplace(row); } } #[inline] #[target_feature(enable = "simd128")] unsafe fn multiply_alpha_row(src_row: &[U16x2], dst_row: &mut [U16x2]) { let src_chunks = src_row.chunks_exact(4); let src_remainder = src_chunks.remainder(); let mut dst_chunks = dst_row.chunks_exact_mut(4); let src_dst = src_chunks.zip(&mut dst_chunks); // A simple for-loop in this case is faster than implementation with pre-reading for (src, dst) in src_dst { let mut pixels = v128_load(src.as_ptr() as *const v128); pixels = multiply_alpha_4_pixels(pixels); v128_store(dst.as_mut_ptr() as *mut v128, pixels); } if !src_remainder.is_empty() { let dst_reminder = dst_chunks.into_remainder(); native::multiply_alpha_row(src_remainder, dst_reminder); } } #[inline] #[target_feature(enable = "simd128")] unsafe fn multiply_alpha_row_inplace(row: &mut [U16x2]) { let mut chunks = row.chunks_exact_mut(4); // A simple for-loop in this case is faster than implementation with pre-reading for chunk in &mut chunks { let mut pixels = v128_load(chunk.as_ptr() as *const v128); pixels = multiply_alpha_4_pixels(pixels); v128_store(chunk.as_mut_ptr() as *mut v128, pixels); } let reminder = chunks.into_remainder(); if !reminder.is_empty() { native::multiply_alpha_row_inplace(reminder); } } #[inline] #[target_feature(enable = "simd128")] unsafe fn multiply_alpha_4_pixels(pixels: v128) -> v128 { const HALF: v128 = u32x4(0x8000, 0x8000, 0x8000, 0x8000); const MAX_ALPHA: v128 = u32x4(0xffff0000u32, 0xffff0000u32, 0xffff0000u32, 0xffff0000u32); /* |L0 A0 | |L1 A1 | |L2 A2 | |L3 A3 | |0001 0203| |0405 0607| |0809 1011| |1213 1415| */ const FACTOR_MASK: v128 = i8x16(2, 3, 2, 3, 6, 7, 6, 7, 10, 11, 10, 11, 14, 15, 14, 15); let factor_pixels = u8x16_swizzle(pixels, FACTOR_MASK); let factor_pixels = v128_or(factor_pixels, MAX_ALPHA); let src_u32_lo = u32x4_extend_low_u16x8(pixels); let factors = u32x4_extend_low_u16x8(factor_pixels); let mut dst_i32_lo = u32x4_add(u32x4_mul(src_u32_lo, factors), HALF); dst_i32_lo = u32x4_add(dst_i32_lo, u32x4_shr(dst_i32_lo, 16)); dst_i32_lo = u32x4_shr(dst_i32_lo, 16); let src_u32_hi = u32x4_extend_high_u16x8(pixels); let factors = u32x4_extend_high_u16x8(factor_pixels); let mut dst_i32_hi = u32x4_add(u32x4_mul(src_u32_hi, factors), HALF); dst_i32_hi = u32x4_add(dst_i32_hi, u32x4_shr(dst_i32_hi, 16)); dst_i32_hi = u32x4_shr(dst_i32_hi, 16); u16x8_narrow_i32x4(dst_i32_lo, dst_i32_hi) } // Divide pub(crate) unsafe fn divide_alpha( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, ) { let src_rows = src_view.iter_rows(0); let dst_rows = dst_view.iter_rows_mut(0); for (src_row, dst_row) in src_rows.zip(dst_rows) { divide_alpha_row(src_row, dst_row); } } pub(crate) unsafe fn divide_alpha_inplace(image_view: &mut impl ImageViewMut) { for row in image_view.iter_rows_mut(0) { divide_alpha_row_inplace(row); } } #[inline] #[target_feature(enable 
= "simd128")] unsafe fn divide_alpha_row(src_row: &[U16x2], dst_row: &mut [U16x2]) { let src_chunks = src_row.chunks_exact(4); let src_remainder = src_chunks.remainder(); let mut dst_chunks = dst_row.chunks_exact_mut(4); let src_dst = src_chunks.zip(&mut dst_chunks); foreach_with_pre_reading( src_dst, |(src, dst)| { let pixels = v128_load(src.as_ptr() as *const v128); let dst_ptr = dst.as_mut_ptr() as *mut v128; (pixels, dst_ptr) }, |(mut pixels, dst_ptr)| { pixels = divide_alpha_4_pixels(pixels); v128_store(dst_ptr, pixels); }, ); if !src_remainder.is_empty() { let dst_reminder = dst_chunks.into_remainder(); let mut src_pixels = [U16x2::new([0, 0]); 4]; src_pixels .iter_mut() .zip(src_remainder) .for_each(|(d, s)| *d = *s); let mut dst_pixels = [U16x2::new([0, 0]); 4]; let mut pixels = v128_load(src_pixels.as_ptr() as *const v128); pixels = divide_alpha_4_pixels(pixels); v128_store(dst_pixels.as_mut_ptr() as *mut v128, pixels); dst_pixels .iter() .zip(dst_reminder) .for_each(|(s, d)| *d = *s); } } #[inline] #[target_feature(enable = "simd128")] unsafe fn divide_alpha_row_inplace(row: &mut [U16x2]) { let mut chunks = row.chunks_exact_mut(4); // A simple for-loop in this case is as fast as implementation with pre-reading for chunk in &mut chunks { let mut pixels = v128_load(chunk.as_ptr() as *const v128); pixels = divide_alpha_4_pixels(pixels); v128_store(chunk.as_mut_ptr() as *mut v128, pixels); } let reminder = chunks.into_remainder(); if !reminder.is_empty() { let mut src_pixels = [U16x2::new([0, 0]); 4]; src_pixels .iter_mut() .zip(reminder.iter()) .for_each(|(d, s)| *d = *s); let mut dst_pixels = [U16x2::new([0, 0]); 4]; let mut pixels = v128_load(src_pixels.as_ptr() as *const v128); pixels = divide_alpha_4_pixels(pixels); v128_store(dst_pixels.as_mut_ptr() as *mut v128, pixels); dst_pixels.iter().zip(reminder).for_each(|(s, d)| *d = *s); } } #[inline] #[target_feature(enable = "simd128")] unsafe fn divide_alpha_4_pixels(pixels: v128) -> v128 { /* |L0 A0 | |L1 A1 | |L2 A2 | |L3 A3 | |0001 0203| |0405 0607| |0809 1011| |1213 1415| */ const ALPHA32_SH: v128 = i8x16(2, 3, -1, -1, 6, 7, -1, -1, 10, 11, -1, -1, 14, 15, -1, -1); let alpha_f32x4 = f32x4_convert_i32x4(u8x16_swizzle(pixels, ALPHA32_SH)); let luma_mask = u32x4_splat(0xffff); let luma_f32x4 = f32x4_convert_i32x4(v128_and(pixels, luma_mask)); let alpha_max = f32x4_splat(65535.0); let scaled_luma_f32x4 = f32x4_mul(luma_f32x4, alpha_max); // In case of zero division the result will be u32::MAX or 0. let divided_luma_u32x4 = u32x4_trunc_sat_f32x4(f32x4_add( f32x4_div(scaled_luma_f32x4, alpha_f32x4), f32x4_splat(0.5), )); // All u32::MAX values in arguments will interpreted as -1i32. // u16x8_narrow_i32x4() converts all negative values into 0. 
let divided_luma_u16 = u16x8_narrow_i32x4(divided_luma_u32x4, divided_luma_u32x4); let alpha_mask = u32x4_splat(0xffff0000); let alpha = v128_and(pixels, alpha_mask); v128_or(u32x4_extend_low_u16x8(divided_luma_u16), alpha) } fast_image_resize-5.3.0/src/alpha/u16x4/avx2.rs000064400000000000000000000213321046102023000173030ustar 00000000000000use std::arch::x86_64::*; use super::sse4; use crate::pixels::U16x4; use crate::utils::foreach_with_pre_reading; use crate::{ImageView, ImageViewMut}; #[target_feature(enable = "avx2")] pub(crate) unsafe fn multiply_alpha( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, ) { let src_rows = src_view.iter_rows(0); let dst_rows = dst_view.iter_rows_mut(0); for (src_row, dst_row) in src_rows.zip(dst_rows) { multiply_alpha_row(src_row, dst_row); } } #[target_feature(enable = "avx2")] pub(crate) unsafe fn multiply_alpha_inplace(image_view: &mut impl ImageViewMut) { for row in image_view.iter_rows_mut(0) { multiply_alpha_row_inplace(row); } } #[inline] #[target_feature(enable = "avx2")] pub(crate) unsafe fn multiply_alpha_row(src_row: &[U16x4], dst_row: &mut [U16x4]) { let src_chunks = src_row.chunks_exact(4); let src_remainder = src_chunks.remainder(); let mut dst_chunks = dst_row.chunks_exact_mut(4); let src_dst = src_chunks.zip(&mut dst_chunks); foreach_with_pre_reading( src_dst, |(src, dst)| { let pixels = _mm256_loadu_si256(src.as_ptr() as *const __m256i); let dst_ptr = dst.as_mut_ptr() as *mut __m256i; (pixels, dst_ptr) }, |(mut pixels, dst_ptr)| { pixels = multiply_alpha_4_pixels(pixels); _mm256_storeu_si256(dst_ptr, pixels); }, ); if !src_remainder.is_empty() { let dst_reminder = dst_chunks.into_remainder(); sse4::multiply_alpha_row(src_remainder, dst_reminder); } } #[inline] #[target_feature(enable = "avx2")] pub(crate) unsafe fn multiply_alpha_row_inplace(row: &mut [U16x4]) { let mut chunks = row.chunks_exact_mut(4); foreach_with_pre_reading( &mut chunks, |chunk| { let pixels = _mm256_loadu_si256(chunk.as_ptr() as *const __m256i); let dst_ptr = chunk.as_mut_ptr() as *mut __m256i; (pixels, dst_ptr) }, |(mut pixels, dst_ptr)| { pixels = multiply_alpha_4_pixels(pixels); _mm256_storeu_si256(dst_ptr, pixels); }, ); let reminder = chunks.into_remainder(); if !reminder.is_empty() { sse4::multiply_alpha_row_inplace(reminder); } } #[inline] #[target_feature(enable = "avx2")] unsafe fn multiply_alpha_4_pixels(pixels: __m256i) -> __m256i { let zero = _mm256_setzero_si256(); let half = _mm256_set1_epi32(0x8000); const MAX_A: i64 = 0xffff000000000000u64 as i64; let max_alpha = _mm256_set1_epi64x(MAX_A); /* |R0 G0 B0 A0 | |R1 G1 B1 A1 | |R0 G0 B0 A0 | |R1 G1 B1 A1 | |0001 0203 0405 0607| |0809 1011 1213 1415| |0001 0203 0405 0607| |0809 1011 1213 1415| */ let factor_mask = _mm256_set_m128i( _mm_set_epi8(15, 14, 15, 14, 15, 14, 15, 14, 7, 6, 7, 6, 7, 6, 7, 6), _mm_set_epi8(15, 14, 15, 14, 15, 14, 15, 14, 7, 6, 7, 6, 7, 6, 7, 6), ); let factor_pixels = _mm256_shuffle_epi8(pixels, factor_mask); let factor_pixels = _mm256_or_si256(factor_pixels, max_alpha); let src_i32_lo = _mm256_unpacklo_epi16(pixels, zero); let factors = _mm256_unpacklo_epi16(factor_pixels, zero); let src_i32_lo = _mm256_add_epi32(_mm256_mullo_epi32(src_i32_lo, factors), half); let dst_i32_lo = _mm256_add_epi32(src_i32_lo, _mm256_srli_epi32::<16>(src_i32_lo)); let dst_i32_lo = _mm256_srli_epi32::<16>(dst_i32_lo); let src_i32_hi = _mm256_unpackhi_epi16(pixels, zero); let factors = _mm256_unpackhi_epi16(factor_pixels, zero); let src_i32_hi = 
_mm256_add_epi32(_mm256_mullo_epi32(src_i32_hi, factors), half); let dst_i32_hi = _mm256_add_epi32(src_i32_hi, _mm256_srli_epi32::<16>(src_i32_hi)); let dst_i32_hi = _mm256_srli_epi32::<16>(dst_i32_hi); _mm256_packus_epi32(dst_i32_lo, dst_i32_hi) } // Divide #[target_feature(enable = "avx2")] pub(crate) unsafe fn divide_alpha( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, ) { let src_rows = src_view.iter_rows(0); let dst_rows = dst_view.iter_rows_mut(0); for (src_row, dst_row) in src_rows.zip(dst_rows) { divide_alpha_row(src_row, dst_row); } } #[target_feature(enable = "avx2")] pub(crate) unsafe fn divide_alpha_inplace(image_view: &mut impl ImageViewMut) { for row in image_view.iter_rows_mut(0) { divide_alpha_row_inplace(row); } } #[target_feature(enable = "avx2")] pub(crate) unsafe fn divide_alpha_row(src_row: &[U16x4], dst_row: &mut [U16x4]) { let src_chunks = src_row.chunks_exact(4); let src_remainder = src_chunks.remainder(); let mut dst_chunks = dst_row.chunks_exact_mut(4); let src_dst = src_chunks.zip(&mut dst_chunks); foreach_with_pre_reading( src_dst, |(src, dst)| { let pixels = _mm256_loadu_si256(src.as_ptr() as *const __m256i); let dst_ptr = dst.as_mut_ptr() as *mut __m256i; (pixels, dst_ptr) }, |(mut pixels, dst_ptr)| { pixels = divide_alpha_4_pixels(pixels); _mm256_storeu_si256(dst_ptr, pixels); }, ); if !src_remainder.is_empty() { let dst_reminder = dst_chunks.into_remainder(); let mut src_pixels = [U16x4::new([0, 0, 0, 0]); 4]; src_pixels .iter_mut() .zip(src_remainder) .for_each(|(d, s)| *d = *s); let mut dst_pixels = [U16x4::new([0, 0, 0, 0]); 4]; let mut pixels = _mm256_loadu_si256(src_pixels.as_ptr() as *const __m256i); pixels = divide_alpha_4_pixels(pixels); _mm256_storeu_si256(dst_pixels.as_mut_ptr() as *mut __m256i, pixels); dst_pixels .iter() .zip(dst_reminder) .for_each(|(s, d)| *d = *s); } } #[target_feature(enable = "avx2")] pub(crate) unsafe fn divide_alpha_row_inplace(row: &mut [U16x4]) { let mut chunks = row.chunks_exact_mut(4); foreach_with_pre_reading( &mut chunks, |chunk| { let pixels = _mm256_loadu_si256(chunk.as_ptr() as *const __m256i); let dst_ptr = chunk.as_mut_ptr() as *mut __m256i; (pixels, dst_ptr) }, |(mut pixels, dst_ptr)| { pixels = divide_alpha_4_pixels(pixels); _mm256_storeu_si256(dst_ptr, pixels); }, ); let reminder = chunks.into_remainder(); if !reminder.is_empty() { let mut src_pixels = [U16x4::new([0, 0, 0, 0]); 4]; src_pixels .iter_mut() .zip(reminder.iter()) .for_each(|(d, s)| *d = *s); let mut dst_pixels = [U16x4::new([0, 0, 0, 0]); 4]; let mut pixels = _mm256_loadu_si256(src_pixels.as_ptr() as *const __m256i); pixels = divide_alpha_4_pixels(pixels); _mm256_storeu_si256(dst_pixels.as_mut_ptr() as *mut __m256i, pixels); dst_pixels.iter().zip(reminder).for_each(|(s, d)| *d = *s); } } #[inline] #[target_feature(enable = "avx2")] unsafe fn divide_alpha_4_pixels(pixels: __m256i) -> __m256i { let zero = _mm256_setzero_si256(); let alpha_mask = _mm256_set1_epi64x(0xffff000000000000u64 as i64); let alpha_max = _mm256_set1_ps(65535.0); /* |R0 G0 B0 A0 | |R1 G1 B1 A1 | |R0 G0 B0 A0 | |R1 G1 B1 A1 | |0001 0203 0405 0607| |0809 1011 1213 1415| |0001 0203 0405 0607| |0809 1011 1213 1415| */ let alpha32_sh0 = _mm256_set_m128i( _mm_set_epi8(-1, -1, 7, 6, -1, -1, 7, 6, -1, -1, 7, 6, -1, -1, 7, 6), _mm_set_epi8(-1, -1, 7, 6, -1, -1, 7, 6, -1, -1, 7, 6, -1, -1, 7, 6), ); let alpha32_sh1 = _mm256_set_m128i( _mm_set_epi8( -1, -1, 15, 14, -1, -1, 15, 14, -1, -1, 15, 14, -1, -1, 15, 14, ), _mm_set_epi8( -1, -1, 15, 14, -1, -1, 15, 14, -1, -1, 15, 
14, -1, -1, 15, 14, ), ); let alpha0_f32x8 = _mm256_cvtepi32_ps(_mm256_shuffle_epi8(pixels, alpha32_sh0)); let alpha1_f32x8 = _mm256_cvtepi32_ps(_mm256_shuffle_epi8(pixels, alpha32_sh1)); let pix0_f32x8 = _mm256_cvtepi32_ps(_mm256_unpacklo_epi16(pixels, zero)); let pix1_f32x8 = _mm256_cvtepi32_ps(_mm256_unpackhi_epi16(pixels, zero)); let scaled_pix0_f32x8 = _mm256_mul_ps(pix0_f32x8, alpha_max); let scaled_pix1_f32x8 = _mm256_mul_ps(pix1_f32x8, alpha_max); let divided_pix0_i32x8 = _mm256_cvtps_epi32(_mm256_div_ps(scaled_pix0_f32x8, alpha0_f32x8)); let divided_pix1_i32x8 = _mm256_cvtps_epi32(_mm256_div_ps(scaled_pix1_f32x8, alpha1_f32x8)); // All negative values will be stored as 0. let two_pixels_i16x16 = _mm256_packus_epi32(divided_pix0_i32x8, divided_pix1_i32x8); let alpha = _mm256_and_si256(pixels, alpha_mask); _mm256_blendv_epi8(two_pixels_i16x16, alpha, alpha_mask) } fast_image_resize-5.3.0/src/alpha/u16x4/mod.rs000064400000000000000000000104241046102023000172020ustar 00000000000000use crate::pixels::U16x4; use crate::{CpuExtensions, ImageError, ImageView, ImageViewMut}; use super::AlphaMulDiv; #[cfg(target_arch = "x86_64")] mod avx2; mod native; #[cfg(target_arch = "aarch64")] mod neon; #[cfg(target_arch = "x86_64")] mod sse4; #[cfg(target_arch = "wasm32")] mod wasm32; type P = U16x4; impl AlphaMulDiv for P { fn multiply_alpha( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, cpu_extensions: CpuExtensions, ) -> Result<(), ImageError> { process_two_images! { multiple(src_view, dst_view, cpu_extensions); } Ok(()) } fn multiply_alpha_inplace( image_view: &mut impl ImageViewMut, cpu_extensions: CpuExtensions, ) -> Result<(), ImageError> { process_one_images! { multiply_inplace(image_view, cpu_extensions); } Ok(()) } fn divide_alpha( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, cpu_extensions: CpuExtensions, ) -> Result<(), ImageError> { process_two_images! { divide(src_view, dst_view, cpu_extensions); } Ok(()) } fn divide_alpha_inplace( image_view: &mut impl ImageViewMut, cpu_extensions: CpuExtensions, ) -> Result<(), ImageError> { process_one_images! 
{ divide_inplace(image_view, cpu_extensions); } Ok(()) } } fn multiple( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, cpu_extensions: CpuExtensions, ) { match cpu_extensions { #[cfg(target_arch = "x86_64")] CpuExtensions::Avx2 => unsafe { avx2::multiply_alpha(src_view, dst_view) }, #[cfg(target_arch = "x86_64")] CpuExtensions::Sse4_1 => unsafe { sse4::multiply_alpha(src_view, dst_view) }, #[cfg(target_arch = "aarch64")] CpuExtensions::Neon => unsafe { neon::multiply_alpha(src_view, dst_view) }, #[cfg(target_arch = "wasm32")] CpuExtensions::Simd128 => unsafe { wasm32::multiply_alpha(src_view, dst_view) }, _ => native::multiply_alpha(src_view, dst_view), } } fn multiply_inplace(image_view: &mut impl ImageViewMut, cpu_extensions: CpuExtensions) { match cpu_extensions { #[cfg(target_arch = "x86_64")] CpuExtensions::Avx2 => unsafe { avx2::multiply_alpha_inplace(image_view) }, #[cfg(target_arch = "x86_64")] CpuExtensions::Sse4_1 => unsafe { sse4::multiply_alpha_inplace(image_view) }, #[cfg(target_arch = "aarch64")] CpuExtensions::Neon => unsafe { neon::multiply_alpha_inplace(image_view) }, #[cfg(target_arch = "wasm32")] CpuExtensions::Simd128 => unsafe { wasm32::multiply_alpha_inplace(image_view) }, _ => native::multiply_alpha_inplace(image_view), } } fn divide( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, cpu_extensions: CpuExtensions, ) { match cpu_extensions { #[cfg(target_arch = "x86_64")] CpuExtensions::Avx2 => unsafe { avx2::divide_alpha(src_view, dst_view) }, #[cfg(target_arch = "x86_64")] CpuExtensions::Sse4_1 => unsafe { sse4::divide_alpha(src_view, dst_view) }, #[cfg(target_arch = "aarch64")] CpuExtensions::Neon => unsafe { neon::divide_alpha(src_view, dst_view) }, #[cfg(target_arch = "wasm32")] CpuExtensions::Simd128 => unsafe { wasm32::divide_alpha(src_view, dst_view) }, _ => native::divide_alpha(src_view, dst_view), } } fn divide_inplace(image_view: &mut impl ImageViewMut, cpu_extensions: CpuExtensions) { match cpu_extensions { #[cfg(target_arch = "x86_64")] CpuExtensions::Avx2 => unsafe { avx2::divide_alpha_inplace(image_view) }, #[cfg(target_arch = "x86_64")] CpuExtensions::Sse4_1 => unsafe { sse4::divide_alpha_inplace(image_view) }, #[cfg(target_arch = "aarch64")] CpuExtensions::Neon => unsafe { neon::divide_alpha_inplace(image_view) }, #[cfg(target_arch = "wasm32")] CpuExtensions::Simd128 => unsafe { wasm32::divide_alpha_inplace(image_view) }, _ => native::divide_alpha_inplace(image_view), } } fast_image_resize-5.3.0/src/alpha/u16x4/native.rs000064400000000000000000000060551046102023000177160ustar 00000000000000use crate::alpha::common::{div_and_clip16, mul_div_65535, RECIP_ALPHA16}; use crate::pixels::U16x4; use crate::{ImageView, ImageViewMut}; pub(crate) fn multiply_alpha( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, ) { let src_rows = src_view.iter_rows(0); let dst_rows = dst_view.iter_rows_mut(0); for (src_row, dst_row) in src_rows.zip(dst_rows) { multiply_alpha_row(src_row, dst_row); } } pub(crate) fn multiply_alpha_inplace(image_view: &mut impl ImageViewMut) { for row in image_view.iter_rows_mut(0) { multiply_alpha_row_inplace(row); } } #[inline(always)] pub(crate) fn multiply_alpha_row(src_row: &[U16x4], dst_row: &mut [U16x4]) { for (src_pixel, dst_pixel) in src_row.iter().zip(dst_row) { let components: [u16; 4] = src_pixel.0; let alpha = components[3]; dst_pixel.0 = [ mul_div_65535(components[0], alpha), mul_div_65535(components[1], alpha), mul_div_65535(components[2], alpha), alpha, ]; } } #[inline(always)] pub(crate) fn 
multiply_alpha_row_inplace(row: &mut [U16x4]) { for pixel in row { let components: [u16; 4] = pixel.0; let alpha = components[3]; pixel.0 = [ mul_div_65535(components[0], alpha), mul_div_65535(components[1], alpha), mul_div_65535(components[2], alpha), alpha, ]; } } // Divide #[inline] pub(crate) fn divide_alpha( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, ) { let src_rows = src_view.iter_rows(0); let dst_rows = dst_view.iter_rows_mut(0); for (src_row, dst_row) in src_rows.zip(dst_rows) { divide_alpha_row(src_row, dst_row); } } #[inline] pub(crate) fn divide_alpha_inplace(image_view: &mut impl ImageViewMut) { for row in image_view.iter_rows_mut(0) { divide_alpha_row_inplace(row); } } #[inline(always)] pub(crate) fn divide_alpha_row(src_row: &[U16x4], dst_row: &mut [U16x4]) { src_row .iter() .zip(dst_row) .for_each(|(src_pixel, dst_pixel)| { let components: [u16; 4] = src_pixel.0; let alpha = components[3]; let recip_alpha = RECIP_ALPHA16[alpha as usize]; dst_pixel.0 = [ div_and_clip16(components[0], recip_alpha), div_and_clip16(components[1], recip_alpha), div_and_clip16(components[2], recip_alpha), alpha, ]; }); } #[inline(always)] pub(crate) fn divide_alpha_row_inplace(row: &mut [U16x4]) { row.iter_mut().for_each(|pixel| { let components: [u16; 4] = pixel.0; let alpha = components[3]; let recip_alpha = RECIP_ALPHA16[alpha as usize]; pixel.0 = [ div_and_clip16(components[0], recip_alpha), div_and_clip16(components[1], recip_alpha), div_and_clip16(components[2], recip_alpha), alpha, ]; }); } fast_image_resize-5.3.0/src/alpha/u16x4/neon.rs000064400000000000000000000201531046102023000173620ustar 00000000000000use std::arch::aarch64::*; use crate::neon_utils; use crate::pixels::U16x4; use crate::utils::foreach_with_pre_reading; use crate::{ImageView, ImageViewMut}; use super::native; #[target_feature(enable = "neon")] pub(crate) unsafe fn multiply_alpha( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, ) { let src_rows = src_view.iter_rows(0); let dst_rows = dst_view.iter_rows_mut(0); for (src_row, dst_row) in src_rows.zip(dst_rows) { multiply_alpha_row(src_row, dst_row); } } #[target_feature(enable = "neon")] pub(crate) unsafe fn multiply_alpha_inplace(image_view: &mut impl ImageViewMut) { for row in image_view.iter_rows_mut(0) { multiply_alpha_row_inplace(row); } } #[inline(always)] unsafe fn multiply_alpha_row(src_row: &[U16x4], dst_row: &mut [U16x4]) { let src_chunks = src_row.chunks_exact(8); let src_remainder = src_chunks.remainder(); let mut dst_chunks = dst_row.chunks_exact_mut(8); let src_dst = src_chunks.zip(&mut dst_chunks); foreach_with_pre_reading( src_dst, |(src, dst)| { let pixels = neon_utils::load_deintrel_u16x8x4(src, 0); let dst_ptr = dst.as_mut_ptr() as *mut u16; (pixels, dst_ptr) }, |(mut pixels, dst_ptr)| { pixels.0 = neon_utils::multiply_color_to_alpha_u16x8(pixels.0, pixels.3); pixels.1 = neon_utils::multiply_color_to_alpha_u16x8(pixels.1, pixels.3); pixels.2 = neon_utils::multiply_color_to_alpha_u16x8(pixels.2, pixels.3); vst4q_u16(dst_ptr, pixels); }, ); let src_chunks = src_remainder.chunks_exact(4); let src_remainder = src_chunks.remainder(); let dst_reminder = dst_chunks.into_remainder(); let mut dst_chunks = dst_reminder.chunks_exact_mut(4); let mut src_dst = src_chunks.zip(&mut dst_chunks); if let Some((src, dst)) = src_dst.next() { let mut pixels = neon_utils::load_deintrel_u16x4x4(src, 0); pixels.0 = neon_utils::multiply_color_to_alpha_u16x4(pixels.0, pixels.3); pixels.1 = neon_utils::multiply_color_to_alpha_u16x4(pixels.1, 
pixels.3); pixels.2 = neon_utils::multiply_color_to_alpha_u16x4(pixels.2, pixels.3); let dst_ptr = dst.as_mut_ptr() as *mut u16; vst4_u16(dst_ptr, pixels); } if !src_remainder.is_empty() { let dst_reminder = dst_chunks.into_remainder(); native::multiply_alpha_row(src_remainder, dst_reminder); } } #[inline(always)] unsafe fn multiply_alpha_row_inplace(row: &mut [U16x4]) { let mut chunks = row.chunks_exact_mut(8); foreach_with_pre_reading( &mut chunks, |chunk| { let pixels = neon_utils::load_deintrel_u16x8x4(chunk, 0); let dst_ptr = chunk.as_mut_ptr() as *mut u16; (pixels, dst_ptr) }, |(mut pixels, dst_ptr)| { pixels.0 = neon_utils::multiply_color_to_alpha_u16x8(pixels.0, pixels.3); pixels.1 = neon_utils::multiply_color_to_alpha_u16x8(pixels.1, pixels.3); pixels.2 = neon_utils::multiply_color_to_alpha_u16x8(pixels.2, pixels.3); vst4q_u16(dst_ptr, pixels); }, ); let reminder = chunks.into_remainder(); let mut chunks = reminder.chunks_exact_mut(4); if let Some(chunk) = chunks.next() { let mut pixels = neon_utils::load_deintrel_u16x4x4(chunk, 0); pixels.0 = neon_utils::multiply_color_to_alpha_u16x4(pixels.0, pixels.3); pixels.1 = neon_utils::multiply_color_to_alpha_u16x4(pixels.1, pixels.3); pixels.2 = neon_utils::multiply_color_to_alpha_u16x4(pixels.2, pixels.3); let dst_ptr = chunk.as_mut_ptr() as *mut u16; vst4_u16(dst_ptr, pixels); } let reminder = chunks.into_remainder(); if !reminder.is_empty() { native::multiply_alpha_row_inplace(reminder); } } // Divide #[target_feature(enable = "neon")] pub(crate) unsafe fn divide_alpha( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, ) { let src_rows = src_view.iter_rows(0); let dst_rows = dst_view.iter_rows_mut(0); for (src_row, dst_row) in src_rows.zip(dst_rows) { divide_alpha_row(src_row, dst_row); } } #[target_feature(enable = "neon")] pub(crate) unsafe fn divide_alpha_inplace(image_view: &mut impl ImageViewMut) { for row in image_view.iter_rows_mut(0) { divide_alpha_row_inplace(row); } } #[inline(always)] pub(crate) unsafe fn divide_alpha_row(src_row: &[U16x4], dst_row: &mut [U16x4]) { let src_chunks = src_row.chunks_exact(8); let src_remainder = src_chunks.remainder(); let mut dst_chunks = dst_row.chunks_exact_mut(8); let src_dst = src_chunks.zip(&mut dst_chunks); foreach_with_pre_reading( src_dst, |(src, dst)| { let pixels = neon_utils::load_deintrel_u16x8x4(src, 0); let dst_ptr = dst.as_mut_ptr() as *mut u16; (pixels, dst_ptr) }, |(mut pixels, dst_ptr)| { pixels = divide_alpha_8_pixels(pixels); vst4q_u16(dst_ptr, pixels); }, ); if !src_remainder.is_empty() { let dst_reminder = dst_chunks.into_remainder(); let mut src_pixels = [U16x4::new([0; 4]); 8]; src_pixels .iter_mut() .zip(src_remainder) .for_each(|(d, s)| *d = *s); let mut dst_pixels = [U16x4::new([0; 4]); 8]; let mut pixels = neon_utils::load_deintrel_u16x8x4(&src_pixels, 0); pixels = divide_alpha_8_pixels(pixels); let dst_ptr = dst_pixels.as_mut_ptr() as *mut u16; vst4q_u16(dst_ptr, pixels); dst_pixels .iter() .zip(dst_reminder) .for_each(|(s, d)| *d = *s); } } #[inline(always)] pub(crate) unsafe fn divide_alpha_row_inplace(row: &mut [U16x4]) { let mut chunks = row.chunks_exact_mut(8); foreach_with_pre_reading( &mut chunks, |chunk| { let pixels = neon_utils::load_deintrel_u16x8x4(chunk, 0); let dst_ptr = chunk.as_mut_ptr() as *mut u16; (pixels, dst_ptr) }, |(mut pixels, dst_ptr)| { pixels = divide_alpha_8_pixels(pixels); vst4q_u16(dst_ptr, pixels); }, ); let reminder = chunks.into_remainder(); if !reminder.is_empty() { let mut src_pixels = [U16x4::new([0; 4]); 8]; 
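        // Fewer than eight pixels are left: copy them into a zero-padded stack buffer,
        // run the eight-pixel NEON kernel on it, and write only the valid results back.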
src_pixels .iter_mut() .zip(reminder.iter()) .for_each(|(d, s)| *d = *s); let mut dst_pixels = [U16x4::new([0; 4]); 8]; let mut pixels = neon_utils::load_deintrel_u16x8x4(&src_pixels, 0); pixels = divide_alpha_8_pixels(pixels); let dst_ptr = dst_pixels.as_mut_ptr() as *mut u16; vst4q_u16(dst_ptr, pixels); dst_pixels.iter().zip(reminder).for_each(|(s, d)| *d = *s); } } #[inline(always)] unsafe fn divide_alpha_8_pixels(mut pixels: uint16x8x4_t) -> uint16x8x4_t { let zero = vdupq_n_u16(0); let alpha_scale = vdupq_n_f32(65535.0); let nonzero_alpha_mask = vmvnq_u16(vceqzq_u16(pixels.3)); // Low let alpha_scaled_u32 = vreinterpretq_u32_u16(vzip1q_u16(pixels.3, zero)); let alpha_scaled_f32 = vcvtq_f32_u32(alpha_scaled_u32); let recip_alpha_lo_f32 = vdivq_f32(alpha_scale, alpha_scaled_f32); // High let alpha_scaled_u32 = vreinterpretq_u32_u16(vzip2q_u16(pixels.3, zero)); let alpha_scaled_f32 = vcvtq_f32_u32(alpha_scaled_u32); let recip_alpha_hi_f32 = vdivq_f32(alpha_scale, alpha_scaled_f32); pixels.0 = neon_utils::mul_color_recip_alpha_u16x8( pixels.0, recip_alpha_lo_f32, recip_alpha_hi_f32, zero, ); pixels.0 = vandq_u16(pixels.0, nonzero_alpha_mask); pixels.1 = neon_utils::mul_color_recip_alpha_u16x8( pixels.1, recip_alpha_lo_f32, recip_alpha_hi_f32, zero, ); pixels.1 = vandq_u16(pixels.1, nonzero_alpha_mask); pixels.2 = neon_utils::mul_color_recip_alpha_u16x8( pixels.2, recip_alpha_lo_f32, recip_alpha_hi_f32, zero, ); pixels.2 = vandq_u16(pixels.2, nonzero_alpha_mask); pixels } fast_image_resize-5.3.0/src/alpha/u16x4/sse4.rs000064400000000000000000000174111046102023000173040ustar 00000000000000use std::arch::x86_64::*; use crate::pixels::U16x4; use crate::utils::foreach_with_pre_reading; use crate::{ImageView, ImageViewMut}; use super::native; #[target_feature(enable = "sse4.1")] pub(crate) unsafe fn multiply_alpha( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, ) { let src_rows = src_view.iter_rows(0); let dst_rows = dst_view.iter_rows_mut(0); for (src_row, dst_row) in src_rows.zip(dst_rows) { multiply_alpha_row(src_row, dst_row); } } #[target_feature(enable = "sse4.1")] pub(crate) unsafe fn multiply_alpha_inplace(image_view: &mut impl ImageViewMut) { for row in image_view.iter_rows_mut(0) { multiply_alpha_row_inplace(row); } } #[inline] #[target_feature(enable = "sse4.1")] pub(crate) unsafe fn multiply_alpha_row(src_row: &[U16x4], dst_row: &mut [U16x4]) { let src_chunks = src_row.chunks_exact(2); let src_remainder = src_chunks.remainder(); let mut dst_chunks = dst_row.chunks_exact_mut(2); let src_dst = src_chunks.zip(&mut dst_chunks); foreach_with_pre_reading( src_dst, |(src, dst)| { let pixels = _mm_loadu_si128(src.as_ptr() as *const __m128i); let dst_ptr = dst.as_mut_ptr() as *mut __m128i; (pixels, dst_ptr) }, |(mut pixels, dst_ptr)| { pixels = multiply_alpha_2_pixels(pixels); _mm_storeu_si128(dst_ptr, pixels); }, ); if !src_remainder.is_empty() { let dst_reminder = dst_chunks.into_remainder(); native::multiply_alpha_row(src_remainder, dst_reminder); } } #[inline] #[target_feature(enable = "sse4.1")] pub(crate) unsafe fn multiply_alpha_row_inplace(row: &mut [U16x4]) { let mut chunks = row.chunks_exact_mut(2); foreach_with_pre_reading( &mut chunks, |chunk| { let pixels = _mm_loadu_si128(chunk.as_ptr() as *const __m128i); let dst_ptr = chunk.as_mut_ptr() as *mut __m128i; (pixels, dst_ptr) }, |(mut pixels, dst_ptr)| { pixels = multiply_alpha_2_pixels(pixels); _mm_storeu_si128(dst_ptr, pixels); }, ); let remainder = chunks.into_remainder(); if !remainder.is_empty() { 
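        // The chunks above are two U16x4 pixels wide, so at most one pixel remains;
        // it is handled by the scalar implementation.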
native::multiply_alpha_row_inplace(remainder); } } #[inline] #[target_feature(enable = "sse4.1")] unsafe fn multiply_alpha_2_pixels(pixels: __m128i) -> __m128i { let zero = _mm_setzero_si128(); let half = _mm_set1_epi32(0x8000); const MAX_A: i64 = 0xffff000000000000u64 as i64; let max_alpha = _mm_set1_epi64x(MAX_A); /* |R0 G0 B0 A0 | |R1 G1 B1 A1 | |0001 0203 0405 0607| |0809 1011 1213 1415| */ let factor_mask = _mm_set_epi8(15, 14, 15, 14, 15, 14, 15, 14, 7, 6, 7, 6, 7, 6, 7, 6); let factor_pixels = _mm_shuffle_epi8(pixels, factor_mask); let factor_pixels = _mm_or_si128(factor_pixels, max_alpha); let src_i32_lo = _mm_unpacklo_epi16(pixels, zero); let factors = _mm_unpacklo_epi16(factor_pixels, zero); let src_i32_lo = _mm_add_epi32(_mm_mullo_epi32(src_i32_lo, factors), half); let dst_i32_lo = _mm_add_epi32(src_i32_lo, _mm_srli_epi32::<16>(src_i32_lo)); let dst_i32_lo = _mm_srli_epi32::<16>(dst_i32_lo); let src_i32_hi = _mm_unpackhi_epi16(pixels, zero); let factors = _mm_unpackhi_epi16(factor_pixels, zero); let src_i32_hi = _mm_add_epi32(_mm_mullo_epi32(src_i32_hi, factors), half); let dst_i32_hi = _mm_add_epi32(src_i32_hi, _mm_srli_epi32::<16>(src_i32_hi)); let dst_i32_hi = _mm_srli_epi32::<16>(dst_i32_hi); _mm_packus_epi32(dst_i32_lo, dst_i32_hi) } // Divide #[target_feature(enable = "sse4.1")] pub(crate) unsafe fn divide_alpha( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, ) { let src_rows = src_view.iter_rows(0); let dst_rows = dst_view.iter_rows_mut(0); for (src_row, dst_row) in src_rows.zip(dst_rows) { divide_alpha_row(src_row, dst_row); } } #[target_feature(enable = "sse4.1")] pub(crate) unsafe fn divide_alpha_inplace(image_view: &mut impl ImageViewMut) { for row in image_view.iter_rows_mut(0) { divide_alpha_row_inplace(row); } } #[target_feature(enable = "sse4.1")] pub(crate) unsafe fn divide_alpha_row(src_row: &[U16x4], dst_row: &mut [U16x4]) { let src_chunks = src_row.chunks_exact(2); let src_remainder = src_chunks.remainder(); let mut dst_chunks = dst_row.chunks_exact_mut(2); let src_dst = src_chunks.zip(&mut dst_chunks); foreach_with_pre_reading( src_dst, |(src, dst)| { let pixels = _mm_loadu_si128(src.as_ptr() as *const __m128i); let dst_ptr = dst.as_mut_ptr() as *mut __m128i; (pixels, dst_ptr) }, |(mut pixels, dst_ptr)| { pixels = divide_alpha_2_pixels(pixels); _mm_storeu_si128(dst_ptr, pixels); }, ); if let Some(src) = src_remainder.first() { let src_pixels = [*src, U16x4::new([0, 0, 0, 0])]; let mut dst_pixels = [U16x4::new([0, 0, 0, 0]); 2]; let mut pixels = _mm_loadu_si128(src_pixels.as_ptr() as *const __m128i); pixels = divide_alpha_2_pixels(pixels); _mm_storeu_si128(dst_pixels.as_mut_ptr() as *mut __m128i, pixels); let dst_reminder = dst_chunks.into_remainder(); if let Some(dst) = dst_reminder.get_mut(0) { *dst = dst_pixels[0]; } } } #[target_feature(enable = "sse4.1")] pub(crate) unsafe fn divide_alpha_row_inplace(row: &mut [U16x4]) { let mut chunks = row.chunks_exact_mut(2); foreach_with_pre_reading( &mut chunks, |chunk| { let pixels = _mm_loadu_si128(chunk.as_ptr() as *const __m128i); let dst_ptr = chunk.as_mut_ptr() as *mut __m128i; (pixels, dst_ptr) }, |(mut pixels, dst_ptr)| { pixels = divide_alpha_2_pixels(pixels); _mm_storeu_si128(dst_ptr, pixels); }, ); let reminder = chunks.into_remainder(); if let Some(pixel) = reminder.first_mut() { let src_pixels = [*pixel, U16x4::new([0, 0, 0, 0])]; let mut dst_pixels = [U16x4::new([0, 0, 0, 0]); 2]; let mut pixels = _mm_loadu_si128(src_pixels.as_ptr() as *const __m128i); pixels = 
divide_alpha_2_pixels(pixels); _mm_storeu_si128(dst_pixels.as_mut_ptr() as *mut __m128i, pixels); *pixel = dst_pixels[0]; } } #[inline] #[target_feature(enable = "sse4.1")] unsafe fn divide_alpha_2_pixels(pixels: __m128i) -> __m128i { let zero = _mm_setzero_si128(); let alpha_mask = _mm_set1_epi64x(0xffff000000000000u64 as i64); let alpha_max = _mm_set1_ps(65535.0); /* |R0 G0 B0 A0 | |R1 G1 B1 A1 | |0001 0203 0405 0607| |0809 1011 1213 1415| */ let alpha32_sh0 = _mm_set_epi8(-1, -1, 7, 6, -1, -1, 7, 6, -1, -1, 7, 6, -1, -1, 7, 6); let alpha32_sh1 = _mm_set_epi8( -1, -1, 15, 14, -1, -1, 15, 14, -1, -1, 15, 14, -1, -1, 15, 14, ); let alpha0_f32x4 = _mm_cvtepi32_ps(_mm_shuffle_epi8(pixels, alpha32_sh0)); let alpha1_f32x4 = _mm_cvtepi32_ps(_mm_shuffle_epi8(pixels, alpha32_sh1)); let pix0_f32x4 = _mm_cvtepi32_ps(_mm_unpacklo_epi16(pixels, zero)); let pix1_f32x4 = _mm_cvtepi32_ps(_mm_unpackhi_epi16(pixels, zero)); let scaled_pix0_f32x4 = _mm_mul_ps(pix0_f32x4, alpha_max); let scaled_pix1_f32x4 = _mm_mul_ps(pix1_f32x4, alpha_max); let divided_pix0_i32x4 = _mm_cvtps_epi32(_mm_div_ps(scaled_pix0_f32x4, alpha0_f32x4)); let divided_pix1_i32x4 = _mm_cvtps_epi32(_mm_div_ps(scaled_pix1_f32x4, alpha1_f32x4)); let two_pixels_i16x8 = _mm_packus_epi32(divided_pix0_i32x4, divided_pix1_i32x4); let alpha = _mm_and_si128(pixels, alpha_mask); _mm_blendv_epi8(two_pixels_i16x8, alpha, alpha_mask) } fast_image_resize-5.3.0/src/alpha/u16x4/wasm32.rs000064400000000000000000000173101046102023000175400ustar 00000000000000use std::arch::wasm32::*; use crate::pixels::U16x4; use crate::utils::foreach_with_pre_reading; use crate::{ImageView, ImageViewMut}; use super::native; pub(crate) unsafe fn multiply_alpha( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, ) { let src_rows = src_view.iter_rows(0); let dst_rows = dst_view.iter_rows_mut(0); for (src_row, dst_row) in src_rows.zip(dst_rows) { multiply_alpha_row(src_row, dst_row); } } pub(crate) unsafe fn multiply_alpha_inplace(image_view: &mut impl ImageViewMut) { for row in image_view.iter_rows_mut(0) { multiply_alpha_row_inplace(row); } } #[inline] #[target_feature(enable = "simd128")] unsafe fn multiply_alpha_row(src_row: &[U16x4], dst_row: &mut [U16x4]) { let src_chunks = src_row.chunks_exact(2); let src_remainder = src_chunks.remainder(); let mut dst_chunks = dst_row.chunks_exact_mut(2); let src_dst = src_chunks.zip(&mut dst_chunks); foreach_with_pre_reading( src_dst, |(src, dst)| { let pixels = v128_load(src.as_ptr() as *const v128); let dst_ptr = dst.as_mut_ptr() as *mut v128; (pixels, dst_ptr) }, |(mut pixels, dst_ptr)| { pixels = multiply_alpha_2_pixels(pixels); v128_store(dst_ptr, pixels); }, ); if !src_remainder.is_empty() { let dst_reminder = dst_chunks.into_remainder(); native::multiply_alpha_row(src_remainder, dst_reminder); } } #[inline] #[target_feature(enable = "simd128")] unsafe fn multiply_alpha_row_inplace(row: &mut [U16x4]) { let mut chunks = row.chunks_exact_mut(2); foreach_with_pre_reading( &mut chunks, |chunk| { let pixels = v128_load(chunk.as_ptr() as *const v128); let dst_ptr = chunk.as_mut_ptr() as *mut v128; (pixels, dst_ptr) }, |(mut pixels, dst_ptr)| { pixels = multiply_alpha_2_pixels(pixels); v128_store(dst_ptr, pixels); }, ); let remainder = chunks.into_remainder(); if !remainder.is_empty() { native::multiply_alpha_row_inplace(remainder); } } #[inline] #[target_feature(enable = "simd128")] unsafe fn multiply_alpha_2_pixels(pixels: v128) -> v128 { let half = u32x4_splat(0x8000); let max_alpha = u64x2_splat(0xffff000000000000); /* 
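       Each v128 register holds two U16x4 pixels (one per 64-bit lane); byte offsets: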
|R0 G0 B0 A0 | |R1 G1 B1 A1 | |0001 0203 0405 0607| |0809 1011 1213 1415| */ const FACTOR_MASK: v128 = i8x16(6, 7, 6, 7, 6, 7, 6, 7, 14, 15, 14, 15, 14, 15, 14, 15); let factor_pixels = u8x16_swizzle(pixels, FACTOR_MASK); let factor_pixels = v128_or(factor_pixels, max_alpha); let src_u32_lo = u32x4_extend_low_u16x8(pixels); let factors = u32x4_extend_low_u16x8(factor_pixels); let mut dst_u32_lo = u32x4_add(u32x4_mul(src_u32_lo, factors), half); dst_u32_lo = u32x4_add(dst_u32_lo, u32x4_shr(dst_u32_lo, 16)); dst_u32_lo = u32x4_shr(dst_u32_lo, 16); let src_u32_hi = u32x4_extend_high_u16x8(pixels); let factors = u32x4_extend_high_u16x8(factor_pixels); let mut dst_u32_hi = u32x4_add(u32x4_mul(src_u32_hi, factors), half); dst_u32_hi = u32x4_add(dst_u32_hi, u32x4_shr(dst_u32_hi, 16)); dst_u32_hi = u32x4_shr(dst_u32_hi, 16); u16x8_narrow_i32x4(dst_u32_lo, dst_u32_hi) } // Divide pub(crate) unsafe fn divide_alpha( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, ) { let src_rows = src_view.iter_rows(0); let dst_rows = dst_view.iter_rows_mut(0); for (src_row, dst_row) in src_rows.zip(dst_rows) { divide_alpha_row(src_row, dst_row); } } pub(crate) unsafe fn divide_alpha_inplace(image_view: &mut impl ImageViewMut) { for row in image_view.iter_rows_mut(0) { divide_alpha_row_inplace(row); } } #[inline] #[target_feature(enable = "simd128")] unsafe fn divide_alpha_row(src_row: &[U16x4], dst_row: &mut [U16x4]) { let src_chunks = src_row.chunks_exact(2); let src_remainder = src_chunks.remainder(); let mut dst_chunks = dst_row.chunks_exact_mut(2); let src_dst = src_chunks.zip(&mut dst_chunks); foreach_with_pre_reading( src_dst, |(src, dst)| { let pixels = v128_load(src.as_ptr() as *const v128); let dst_ptr = dst.as_mut_ptr() as *mut v128; (pixels, dst_ptr) }, |(mut pixels, dst_ptr)| { pixels = divide_alpha_2_pixels(pixels); v128_store(dst_ptr, pixels); }, ); if let Some(src) = src_remainder.first() { let src_pixels = [*src, U16x4::new([0, 0, 0, 0])]; let mut dst_pixels = [U16x4::new([0, 0, 0, 0]); 2]; let mut pixels = v128_load(src_pixels.as_ptr() as *const v128); pixels = divide_alpha_2_pixels(pixels); v128_store(dst_pixels.as_mut_ptr() as *mut v128, pixels); let dst_reminder = dst_chunks.into_remainder(); if let Some(dst) = dst_reminder.get_mut(0) { *dst = dst_pixels[0]; } } } #[inline] #[target_feature(enable = "simd128")] unsafe fn divide_alpha_row_inplace(row: &mut [U16x4]) { let mut chunks = row.chunks_exact_mut(2); foreach_with_pre_reading( &mut chunks, |chunk| { let pixels = v128_load(chunk.as_ptr() as *const v128); let dst_ptr = chunk.as_mut_ptr() as *mut v128; (pixels, dst_ptr) }, |(mut pixels, dst_ptr)| { pixels = divide_alpha_2_pixels(pixels); v128_store(dst_ptr, pixels); }, ); let reminder = chunks.into_remainder(); if let Some(pixel) = reminder.first_mut() { let src_pixels = [*pixel, U16x4::new([0, 0, 0, 0])]; let mut dst_pixels = [U16x4::new([0, 0, 0, 0]); 2]; let mut pixels = v128_load(src_pixels.as_ptr() as *const v128); pixels = divide_alpha_2_pixels(pixels); v128_store(dst_pixels.as_mut_ptr() as *mut v128, pixels); *pixel = dst_pixels[0]; } } #[inline] #[target_feature(enable = "simd128")] unsafe fn divide_alpha_2_pixels(pixels: v128) -> v128 { let zero = u64x2_splat(0); /* |R0 G0 B0 A0 | |R1 G1 B1 A1 | |0001 0203 0405 0607| |0809 1011 1213 1415| */ const ALPHA32_LO_SH: v128 = i8x16(6, 7, -1, -1, 6, 7, -1, -1, 6, 7, -1, -1, -1, -1, -1, -1); const ALPHA32_HI_SH: v128 = i8x16( 14, 15, -1, -1, 14, 15, -1, -1, 14, 15, -1, -1, -1, -1, -1, -1, ); let alpha_lo_f32x4 = 
f32x4_convert_i32x4(u8x16_swizzle(pixels, ALPHA32_LO_SH)); let alpha_hi_f32x4 = f32x4_convert_i32x4(u8x16_swizzle(pixels, ALPHA32_HI_SH)); let pix_lo_f32x4 = f32x4_convert_i32x4(i16x8_shuffle::<0, 8, 1, 9, 2, 10, 3, 11>(pixels, zero)); let pix_hi_f32x4 = f32x4_convert_i32x4(i16x8_shuffle::<4, 12, 5, 13, 6, 14, 7, 15>(pixels, zero)); let alpha_max = f32x4_splat(65535.0); let scaled_pix_lo_f32x4 = f32x4_mul(pix_lo_f32x4, alpha_max); let scaled_pix_hi_f32x4 = f32x4_mul(pix_hi_f32x4, alpha_max); // In case of zero division the result will be u32::MAX or 0. let divided_pix_lo_u32x4 = u32x4_trunc_sat_f32x4(f32x4_add( f32x4_div(scaled_pix_lo_f32x4, alpha_lo_f32x4), f32x4_splat(0.5), )); let divided_pix_hi_u32x4 = u32x4_trunc_sat_f32x4(f32x4_add( f32x4_div(scaled_pix_hi_f32x4, alpha_hi_f32x4), f32x4_splat(0.5), )); // All u32::MAX values in arguments will interpreted as -1i32. // u16x8_narrow_i32x4() converts all negative values into 0. let two_pixels_i16x8 = u16x8_narrow_i32x4(divided_pix_lo_u32x4, divided_pix_hi_u32x4); let alpha_mask = u64x2_splat(0xffff000000000000); let alpha = v128_and(pixels, alpha_mask); v128_or(two_pixels_i16x8, alpha) } fast_image_resize-5.3.0/src/alpha/u8x2/avx2.rs000064400000000000000000000163421046102023000172270ustar 00000000000000use std::arch::x86_64::*; use crate::pixels::U8x2; use crate::utils::foreach_with_pre_reading; use crate::{simd_utils, ImageView, ImageViewMut}; use super::sse4; #[target_feature(enable = "avx2")] pub(crate) unsafe fn multiply_alpha( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, ) { let src_rows = src_view.iter_rows(0); let dst_rows = dst_view.iter_rows_mut(0); for (src_row, dst_row) in src_rows.zip(dst_rows) { multiply_alpha_row(src_row, dst_row); } } #[target_feature(enable = "avx2")] pub(crate) unsafe fn multiply_alpha_inplace(image_view: &mut impl ImageViewMut) { for row in image_view.iter_rows_mut(0) { multiply_alpha_row_inplace(row); } } #[inline] #[target_feature(enable = "avx2")] unsafe fn multiply_alpha_row(src_row: &[U8x2], dst_row: &mut [U8x2]) { let src_chunks = src_row.chunks_exact(16); let src_tail = src_chunks.remainder(); let mut dst_chunks = dst_row.chunks_exact_mut(16); let src_dst = src_chunks.zip(&mut dst_chunks); foreach_with_pre_reading( src_dst, |(src, dst)| { let pixels = simd_utils::loadu_si256(src, 0); let dst_ptr = dst.as_mut_ptr() as *mut __m256i; (pixels, dst_ptr) }, |(mut pixels, dst_ptr)| { pixels = multiply_alpha_16_pixels(pixels); _mm256_storeu_si256(dst_ptr, pixels); }, ); if !src_tail.is_empty() { let dst_tail = dst_chunks.into_remainder(); sse4::multiply_alpha_row(src_tail, dst_tail); } } #[inline] #[target_feature(enable = "avx2")] unsafe fn multiply_alpha_row_inplace(row: &mut [U8x2]) { let mut chunks = row.chunks_exact_mut(16); foreach_with_pre_reading( &mut chunks, |chunk| { let pixels = simd_utils::loadu_si256(chunk, 0); let dst_ptr = chunk.as_mut_ptr() as *mut __m256i; (pixels, dst_ptr) }, |(mut pixels, dst_ptr)| { pixels = multiply_alpha_16_pixels(pixels); _mm256_storeu_si256(dst_ptr, pixels); }, ); let tail = chunks.into_remainder(); if !tail.is_empty() { sse4::multiply_alpha_row_inplace(tail); } } #[inline] #[target_feature(enable = "avx2")] unsafe fn multiply_alpha_16_pixels(pixels: __m256i) -> __m256i { let zero = _mm256_setzero_si256(); let half = _mm256_set1_epi16(128); const MAX_A: i16 = 0xff00u16 as i16; let max_alpha = _mm256_set1_epi16(MAX_A); /* |L A | |L A | |L A | |L A | |L A | |L A | |L A | |L A | |00 01| |02 03| |04 05| |06 07| |08 09| |10 11| |12 13| |14 15| */ 
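    // The shuffle below copies every pixel's alpha byte into both bytes of its 16-bit
    // lane; OR-ing with `max_alpha` (0xff00) then yields a factor of `A` for the luma
    // byte and 255 for the alpha byte, so alpha itself is left unchanged by the multiply.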
#[rustfmt::skip] let factor_mask = _mm256_set_epi8( 15, 15, 13, 13, 11, 11, 9, 9, 7, 7, 5, 5, 3, 3, 1, 1, 15, 15, 13, 13, 11, 11, 9, 9, 7, 7, 5, 5, 3, 3, 1, 1 ); let factor_pixels = _mm256_shuffle_epi8(pixels, factor_mask); let factor_pixels = _mm256_or_si256(factor_pixels, max_alpha); let src_i16_lo = _mm256_unpacklo_epi8(pixels, zero); let factors = _mm256_unpacklo_epi8(factor_pixels, zero); let src_i16_lo = _mm256_add_epi16(_mm256_mullo_epi16(src_i16_lo, factors), half); let dst_i16_lo = _mm256_add_epi16(src_i16_lo, _mm256_srli_epi16::<8>(src_i16_lo)); let dst_i16_lo = _mm256_srli_epi16::<8>(dst_i16_lo); let src_i16_hi = _mm256_unpackhi_epi8(pixels, zero); let factors = _mm256_unpackhi_epi8(factor_pixels, zero); let src_i16_hi = _mm256_add_epi16(_mm256_mullo_epi16(src_i16_hi, factors), half); let dst_i16_hi = _mm256_add_epi16(src_i16_hi, _mm256_srli_epi16::<8>(src_i16_hi)); let dst_i16_hi = _mm256_srli_epi16::<8>(dst_i16_hi); _mm256_packus_epi16(dst_i16_lo, dst_i16_hi) } // Divide #[target_feature(enable = "avx2")] pub(crate) unsafe fn divide_alpha( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, ) { let src_rows = src_view.iter_rows(0); let dst_rows = dst_view.iter_rows_mut(0); for (src_row, dst_row) in src_rows.zip(dst_rows) { divide_alpha_row(src_row, dst_row); } } #[target_feature(enable = "avx2")] pub(crate) unsafe fn divide_alpha_inplace(image_view: &mut impl ImageViewMut) { for row in image_view.iter_rows_mut(0) { divide_alpha_row_inplace(row); } } #[target_feature(enable = "avx2")] unsafe fn divide_alpha_row(src_row: &[U8x2], dst_row: &mut [U8x2]) { let src_chunks = src_row.chunks_exact(16); let src_remainder = src_chunks.remainder(); let mut dst_chunks = dst_row.chunks_exact_mut(16); let src_dst = src_chunks.zip(&mut dst_chunks); foreach_with_pre_reading( src_dst, |(src, dst)| { let pixels = simd_utils::loadu_si256(src, 0); let dst_ptr = dst.as_mut_ptr() as *mut __m256i; (pixels, dst_ptr) }, |(mut pixels, dst_ptr)| { pixels = divide_alpha_16_pixels(pixels); _mm256_storeu_si256(dst_ptr, pixels); }, ); if !src_remainder.is_empty() { let dst_reminder = dst_chunks.into_remainder(); sse4::divide_alpha_row(src_remainder, dst_reminder); } } #[target_feature(enable = "avx2")] unsafe fn divide_alpha_row_inplace(row: &mut [U8x2]) { let mut chunks = row.chunks_exact_mut(16); foreach_with_pre_reading( &mut chunks, |chunk| { let pixels = simd_utils::loadu_si256(chunk, 0); let dst_ptr = chunk.as_mut_ptr() as *mut __m256i; (pixels, dst_ptr) }, |(mut pixels, dst_ptr)| { pixels = divide_alpha_16_pixels(pixels); _mm256_storeu_si256(dst_ptr, pixels); }, ); let reminder = chunks.into_remainder(); if !reminder.is_empty() { sse4::divide_alpha_row_inplace(reminder); } } #[inline] #[target_feature(enable = "avx2")] unsafe fn divide_alpha_16_pixels(pixels: __m256i) -> __m256i { let alpha_mask = _mm256_set1_epi16(0xff00u16 as i16); let luma_mask = _mm256_set1_epi16(0xff); #[rustfmt::skip] let alpha32_sh_lo = _mm256_set_epi8( -1, -1, -1, 7, -1, -1, -1, 5, -1, -1, -1, 3, -1, -1, -1, 1, -1, -1, -1, 7, -1, -1, -1, 5, -1, -1, -1, 3, -1, -1, -1, 1, ); #[rustfmt::skip] let alpha32_sh_hi = _mm256_set_epi8( -1, -1, -1, 15, -1, -1, -1, 13, -1, -1, -1, 11, -1, -1, -1, 9, -1, -1, -1, 15, -1, -1, -1, 13, -1, -1, -1, 11, -1, -1, -1, 9, ); let alpha_scale = _mm256_set1_ps(255.0 * 256.0); let alpha_lo_f32 = _mm256_cvtepi32_ps(_mm256_shuffle_epi8(pixels, alpha32_sh_lo)); let scaled_alpha_lo_i32 = _mm256_cvtps_epi32(_mm256_div_ps(alpha_scale, alpha_lo_f32)); let alpha_hi_f32 = 
_mm256_cvtepi32_ps(_mm256_shuffle_epi8(pixels, alpha32_sh_hi)); let scaled_alpha_hi_i32 = _mm256_cvtps_epi32(_mm256_div_ps(alpha_scale, alpha_hi_f32)); let scaled_alpha_i16 = _mm256_packus_epi32(scaled_alpha_lo_i32, scaled_alpha_hi_i32); let luma_i16 = _mm256_and_si256(pixels, luma_mask); let luma_i16 = _mm256_slli_epi16::<7>(luma_i16); let scaled_luma_i16 = _mm256_mulhrs_epi16(luma_i16, scaled_alpha_i16); let scaled_luma_i16 = _mm256_min_epu16(scaled_luma_i16, luma_mask); let alpha = _mm256_and_si256(pixels, alpha_mask); _mm256_blendv_epi8(scaled_luma_i16, alpha, alpha_mask) } fast_image_resize-5.3.0/src/alpha/u8x2/mod.rs000064400000000000000000000104551046102023000171250ustar 00000000000000use crate::cpu_extensions::CpuExtensions; use crate::pixels::U8x2; use crate::{ImageError, ImageView, ImageViewMut}; use super::AlphaMulDiv; #[cfg(target_arch = "x86_64")] mod avx2; mod native; #[cfg(target_arch = "aarch64")] mod neon; #[cfg(target_arch = "x86_64")] mod sse4; #[cfg(target_arch = "wasm32")] mod wasm32; type P = U8x2; impl AlphaMulDiv for P { fn multiply_alpha( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, cpu_extensions: CpuExtensions, ) -> Result<(), ImageError> { process_two_images! { multiple(src_view, dst_view, cpu_extensions); } Ok(()) } fn multiply_alpha_inplace( image_view: &mut impl ImageViewMut, cpu_extensions: CpuExtensions, ) -> Result<(), ImageError> { process_one_images! { multiply_inplace(image_view, cpu_extensions); } Ok(()) } fn divide_alpha( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, cpu_extensions: CpuExtensions, ) -> Result<(), ImageError> { process_two_images! { divide(src_view, dst_view, cpu_extensions); } Ok(()) } fn divide_alpha_inplace( image_view: &mut impl ImageViewMut, cpu_extensions: CpuExtensions, ) -> Result<(), ImageError> { process_one_images! 
{ divide_inplace(image_view, cpu_extensions); } Ok(()) } } fn multiple( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, cpu_extensions: CpuExtensions, ) { match cpu_extensions { #[cfg(target_arch = "x86_64")] CpuExtensions::Avx2 => unsafe { avx2::multiply_alpha(src_view, dst_view) }, #[cfg(target_arch = "x86_64")] CpuExtensions::Sse4_1 => unsafe { sse4::multiply_alpha(src_view, dst_view) }, #[cfg(target_arch = "aarch64")] CpuExtensions::Neon => unsafe { neon::multiply_alpha(src_view, dst_view) }, #[cfg(target_arch = "wasm32")] CpuExtensions::Simd128 => unsafe { wasm32::multiply_alpha(src_view, dst_view) }, _ => native::multiply_alpha(src_view, dst_view), } } fn multiply_inplace(image_view: &mut impl ImageViewMut, cpu_extensions: CpuExtensions) { match cpu_extensions { #[cfg(target_arch = "x86_64")] CpuExtensions::Avx2 => unsafe { avx2::multiply_alpha_inplace(image_view) }, #[cfg(target_arch = "x86_64")] CpuExtensions::Sse4_1 => unsafe { sse4::multiply_alpha_inplace(image_view) }, #[cfg(target_arch = "aarch64")] CpuExtensions::Neon => unsafe { neon::multiply_alpha_inplace(image_view) }, #[cfg(target_arch = "wasm32")] CpuExtensions::Simd128 => unsafe { wasm32::multiply_alpha_inplace(image_view) }, _ => native::multiply_alpha_inplace(image_view), } } fn divide( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, cpu_extensions: CpuExtensions, ) { match cpu_extensions { #[cfg(target_arch = "x86_64")] CpuExtensions::Avx2 => unsafe { avx2::divide_alpha(src_view, dst_view) }, #[cfg(target_arch = "x86_64")] CpuExtensions::Sse4_1 => unsafe { sse4::divide_alpha(src_view, dst_view) }, #[cfg(target_arch = "aarch64")] CpuExtensions::Neon => unsafe { neon::divide_alpha(src_view, dst_view) }, #[cfg(target_arch = "wasm32")] CpuExtensions::Simd128 => unsafe { wasm32::divide_alpha(src_view, dst_view) }, _ => native::divide_alpha(src_view, dst_view), } } fn divide_inplace(image_view: &mut impl ImageViewMut, cpu_extensions: CpuExtensions) { match cpu_extensions { #[cfg(target_arch = "x86_64")] CpuExtensions::Avx2 => unsafe { avx2::divide_alpha_inplace(image_view) }, #[cfg(target_arch = "x86_64")] CpuExtensions::Sse4_1 => unsafe { sse4::divide_alpha_inplace(image_view) }, #[cfg(target_arch = "aarch64")] CpuExtensions::Neon => unsafe { neon::divide_alpha_inplace(image_view) }, #[cfg(target_arch = "wasm32")] CpuExtensions::Simd128 => unsafe { wasm32::divide_alpha_inplace(image_view) }, _ => native::divide_alpha_inplace(image_view), } } fast_image_resize-5.3.0/src/alpha/u8x2/native.rs000064400000000000000000000043451046102023000176350ustar 00000000000000use crate::alpha::common::{div_and_clip, mul_div_255, RECIP_ALPHA}; use crate::pixels::U8x2; use crate::{ImageView, ImageViewMut}; pub(crate) fn multiply_alpha( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, ) { let src_rows = src_view.iter_rows(0); let dst_rows = dst_view.iter_rows_mut(0); for (src_row, dst_row) in src_rows.zip(dst_rows) { multiply_alpha_row(src_row, dst_row); } } pub(crate) fn multiply_alpha_inplace(image_view: &mut impl ImageViewMut) { for row in image_view.iter_rows_mut(0) { multiply_alpha_row_inplace(row); } } #[inline(always)] pub(crate) fn multiply_alpha_row(src_row: &[U8x2], dst_row: &mut [U8x2]) { for (src_pixel, dst_pixel) in src_row.iter().zip(dst_row) { let components: [u8; 2] = src_pixel.0; let alpha = components[1]; dst_pixel.0 = [mul_div_255(components[0], alpha), alpha]; } } #[inline(always)] pub(crate) fn multiply_alpha_row_inplace(row: &mut [U8x2]) { for pixel in row { let components: [u8; 2] = 
pixel.0; let alpha = components[1]; pixel.0 = [mul_div_255(components[0], alpha), alpha]; } } // Divide #[inline] pub(crate) fn divide_alpha( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, ) { let src_rows = src_view.iter_rows(0); let dst_rows = dst_view.iter_rows_mut(0); for (src_row, dst_row) in src_rows.zip(dst_rows) { divide_alpha_row(src_row, dst_row); } } #[inline] pub(crate) fn divide_alpha_inplace(image_view: &mut impl ImageViewMut) { for dst_row in image_view.iter_rows_mut(0) { let src_row = unsafe { std::slice::from_raw_parts(dst_row.as_ptr(), dst_row.len()) }; divide_alpha_row(src_row, dst_row); } } #[inline(always)] pub(crate) fn divide_alpha_row(src_row: &[U8x2], dst_row: &mut [U8x2]) { src_row .iter() .zip(dst_row) .for_each(|(src_pixel, dst_pixel)| { let components: [u8; 2] = src_pixel.0; let alpha = components[1]; let recip_alpha = RECIP_ALPHA[alpha as usize]; dst_pixel.0 = [div_and_clip(components[0], recip_alpha), alpha]; }); } fast_image_resize-5.3.0/src/alpha/u8x2/neon.rs000064400000000000000000000302451046102023000173040ustar 00000000000000use std::arch::aarch64::*; use crate::neon_utils; use crate::pixels::U8x2; use crate::utils::foreach_with_pre_reading; use crate::{ImageView, ImageViewMut}; use super::native; #[target_feature(enable = "neon")] pub(crate) unsafe fn multiply_alpha( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, ) { let src_rows = src_view.iter_rows(0); let dst_rows = dst_view.iter_rows_mut(0); for (src_row, dst_row) in src_rows.zip(dst_rows) { multiply_alpha_row(src_row, dst_row); } } #[target_feature(enable = "neon")] pub(crate) unsafe fn multiply_alpha_inplace(image_view: &mut impl ImageViewMut) { for row in image_view.iter_rows_mut(0) { multiply_alpha_row_inplace(row); } } #[inline(always)] unsafe fn multiply_alpha_row(src_row: &[U8x2], dst_row: &mut [U8x2]) { let src_chunks = src_row.chunks_exact(32); let src_remainder = src_chunks.remainder(); let mut dst_chunks = dst_row.chunks_exact_mut(32); let src_dst = src_chunks.zip(&mut dst_chunks); foreach_with_pre_reading( src_dst, |(src, dst)| { let pixels = neon_utils::load_deintrel_u8x16x4(src, 0); let dst_ptr = dst.as_mut_ptr() as *mut u8; (pixels, dst_ptr) }, |(mut pixels, dst_ptr)| { pixels = multiplies_alpha_32_pixels(pixels); vst4q_u8(dst_ptr, pixels); }, ); let src_chunks = src_remainder.chunks_exact(16); let src_remainder = src_chunks.remainder(); let dst_reminder = dst_chunks.into_remainder(); let mut dst_chunks = dst_reminder.chunks_exact_mut(16); let mut src_dst = src_chunks.zip(&mut dst_chunks); if let Some((src, dst)) = src_dst.next() { let mut pixels = neon_utils::load_deintrel_u8x8x4(src, 0); pixels = multiplies_alpha_16_pixels(pixels); let dst_ptr = dst.as_mut_ptr() as *mut u8; vst4_u8(dst_ptr, pixels); } let src_chunks = src_remainder.chunks_exact(8); let src_remainder = src_chunks.remainder(); let dst_reminder = dst_chunks.into_remainder(); let mut dst_chunks = dst_reminder.chunks_exact_mut(8); let mut src_dst = src_chunks.zip(&mut dst_chunks); if let Some((src, dst)) = src_dst.next() { let mut pixels = neon_utils::load_deintrel_u8x8x2(src, 0); pixels = multiplies_alpha_8_pixels(pixels); let dst_ptr = dst.as_mut_ptr() as *mut u8; vst2_u8(dst_ptr, pixels); } if !src_remainder.is_empty() { let dst_reminder = dst_chunks.into_remainder(); native::multiply_alpha_row(src_remainder, dst_reminder); } } #[inline(always)] unsafe fn multiply_alpha_row_inplace(row: &mut [U8x2]) { let mut chunks = row.chunks_exact_mut(32); foreach_with_pre_reading( &mut chunks, |chunk| 
{ let pixels = neon_utils::load_deintrel_u8x16x4(chunk, 0); let dst_ptr = chunk.as_mut_ptr() as *mut u8; (pixels, dst_ptr) }, |(mut pixels, dst_ptr)| { pixels = multiplies_alpha_32_pixels(pixels); vst4q_u8(dst_ptr, pixels); }, ); let reminder = chunks.into_remainder(); let mut chunks = reminder.chunks_exact_mut(16); if let Some(chunk) = chunks.next() { let mut pixels = neon_utils::load_deintrel_u8x8x4(chunk, 0); pixels = multiplies_alpha_16_pixels(pixels); let chunk_ptr = chunk.as_mut_ptr() as *mut u8; vst4_u8(chunk_ptr, pixels); } let reminder = chunks.into_remainder(); let mut chunks = reminder.chunks_exact_mut(8); if let Some(chunk) = chunks.next() { let mut pixels = neon_utils::load_deintrel_u8x8x2(chunk, 0); pixels = multiplies_alpha_8_pixels(pixels); let chunk_ptr = chunk.as_mut_ptr() as *mut u8; vst2_u8(chunk_ptr, pixels); } let reminder = chunks.into_remainder(); if !reminder.is_empty() { native::multiply_alpha_row_inplace(reminder); } } #[inline(always)] unsafe fn multiplies_alpha_32_pixels(mut pixels: uint8x16x4_t) -> uint8x16x4_t { let zero_u8x16 = vdupq_n_u8(0); let alpha_u16 = uint16x8x2_t( vreinterpretq_u16_u8(vzip1q_u8(pixels.1, zero_u8x16)), vreinterpretq_u16_u8(vzip2q_u8(pixels.1, zero_u8x16)), ); pixels.0 = neon_utils::mul_color_to_alpha_u8x16(pixels.0, alpha_u16, zero_u8x16); let alpha_u16 = uint16x8x2_t( vreinterpretq_u16_u8(vzip1q_u8(pixels.3, zero_u8x16)), vreinterpretq_u16_u8(vzip2q_u8(pixels.3, zero_u8x16)), ); pixels.2 = neon_utils::mul_color_to_alpha_u8x16(pixels.2, alpha_u16, zero_u8x16); pixels } #[inline(always)] unsafe fn multiplies_alpha_16_pixels(mut pixels: uint8x8x4_t) -> uint8x8x4_t { let zero_u8x8 = vdup_n_u8(0); let alpha_u16_lo = vreinterpret_u16_u8(vzip1_u8(pixels.1, zero_u8x8)); let alpha_u16_hi = vreinterpret_u16_u8(vzip2_u8(pixels.1, zero_u8x8)); let alpha_u16 = vcombine_u16(alpha_u16_lo, alpha_u16_hi); pixels.0 = neon_utils::mul_color_to_alpha_u8x8(pixels.0, alpha_u16, zero_u8x8); let alpha_u16_lo = vreinterpret_u16_u8(vzip1_u8(pixels.3, zero_u8x8)); let alpha_u16_hi = vreinterpret_u16_u8(vzip2_u8(pixels.3, zero_u8x8)); let alpha_u16 = vcombine_u16(alpha_u16_lo, alpha_u16_hi); pixels.2 = neon_utils::mul_color_to_alpha_u8x8(pixels.2, alpha_u16, zero_u8x8); pixels } #[inline(always)] unsafe fn multiplies_alpha_8_pixels(mut pixels: uint8x8x2_t) -> uint8x8x2_t { let zero_u8x8 = vdup_n_u8(0); let alpha_u16_lo = vreinterpret_u16_u8(vzip1_u8(pixels.1, zero_u8x8)); let alpha_u16_hi = vreinterpret_u16_u8(vzip2_u8(pixels.1, zero_u8x8)); let alpha_u16 = vcombine_u16(alpha_u16_lo, alpha_u16_hi); pixels.0 = neon_utils::mul_color_to_alpha_u8x8(pixels.0, alpha_u16, zero_u8x8); pixels } // Divide #[target_feature(enable = "neon")] pub(crate) unsafe fn divide_alpha( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, ) { let src_rows = src_view.iter_rows(0); let dst_rows = dst_view.iter_rows_mut(0); for (src_row, dst_row) in src_rows.zip(dst_rows) { divide_alpha_row(src_row, dst_row); } } #[target_feature(enable = "neon")] pub(crate) unsafe fn divide_alpha_inplace(image_view: &mut impl ImageViewMut) { for row in image_view.iter_rows_mut(0) { divide_alpha_row_inplace(row); } } #[inline(always)] unsafe fn divide_alpha_row(src_row: &[U8x2], dst_row: &mut [U8x2]) { let src_chunks = src_row.chunks_exact(16); let src_remainder = src_chunks.remainder(); let mut dst_chunks = dst_row.chunks_exact_mut(16); let src_dst = src_chunks.zip(&mut dst_chunks); foreach_with_pre_reading( src_dst, |(src, dst)| { let pixels = neon_utils::load_deintrel_u8x16x2(src, 0); let 
dst_ptr = dst.as_mut_ptr() as *mut u8; (pixels, dst_ptr) }, |(mut pixels, dst_ptr)| { pixels = divide_alpha_16_pixels(pixels); vst2q_u8(dst_ptr, pixels); }, ); let src_chunks = src_remainder.chunks_exact(8); let src_remainder = src_chunks.remainder(); let dst_reminder = dst_chunks.into_remainder(); let mut dst_chunks = dst_reminder.chunks_exact_mut(8); let mut src_dst = src_chunks.zip(&mut dst_chunks); if let Some((src, dst)) = src_dst.next() { let mut pixels = neon_utils::load_deintrel_u8x8x2(src, 0); pixels = divide_alpha_8_pixels(pixels); let dst_ptr = dst.as_mut_ptr() as *mut u8; vst2_u8(dst_ptr, pixels); } if !src_remainder.is_empty() { let dst_reminder = dst_chunks.into_remainder(); let mut src_pixels = [U8x2::new([0; 2]); 8]; src_pixels .iter_mut() .zip(src_remainder) .for_each(|(d, s)| *d = *s); let mut dst_pixels = [U8x2::new([0; 2]); 8]; let mut pixels = neon_utils::load_deintrel_u8x8x2(&src_pixels, 0); pixels = divide_alpha_8_pixels(pixels); let dst_ptr = dst_pixels.as_mut_ptr() as *mut u8; vst2_u8(dst_ptr, pixels); dst_pixels .iter() .zip(dst_reminder) .for_each(|(s, d)| *d = *s); } } #[inline(always)] unsafe fn divide_alpha_row_inplace(row: &mut [U8x2]) { let mut chunks = row.chunks_exact_mut(16); foreach_with_pre_reading( &mut chunks, |chunk| { let pixels = neon_utils::load_deintrel_u8x16x2(chunk, 0); let dst_ptr = chunk.as_mut_ptr() as *mut u8; (pixels, dst_ptr) }, |(mut pixels, dst_ptr)| { pixels = divide_alpha_16_pixels(pixels); vst2q_u8(dst_ptr, pixels); }, ); let reminder = chunks.into_remainder(); let mut chunks = reminder.chunks_exact_mut(8); if let Some(chunk) = chunks.next() { let mut pixels = neon_utils::load_deintrel_u8x8x2(chunk, 0); pixels = divide_alpha_8_pixels(pixels); let chunk_ptr = chunk.as_mut_ptr() as *mut u8; vst2_u8(chunk_ptr, pixels); } let reminder = chunks.into_remainder(); if !reminder.is_empty() { let mut src_pixels = [U8x2::new([0; 2]); 8]; src_pixels .iter_mut() .zip(reminder.iter()) .for_each(|(d, s)| *d = *s); let mut dst_pixels = [U8x2::new([0; 2]); 8]; let mut pixels = neon_utils::load_deintrel_u8x8x2(&src_pixels, 0); pixels = divide_alpha_8_pixels(pixels); let dst_ptr = dst_pixels.as_mut_ptr() as *mut u8; vst2_u8(dst_ptr, pixels); dst_pixels.iter().zip(reminder).for_each(|(s, d)| *d = *s); } } #[inline(always)] unsafe fn divide_alpha_16_pixels(mut pixels: uint8x16x2_t) -> uint8x16x2_t { let zero = vdupq_n_u8(0); let alpha_scale = vdupq_n_f32(255.0 * 256.0); let nonzero_alpha_mask = vmvnq_u8(vceqzq_u8(pixels.1)); let alpha_u16_lo = vzip1q_u8(pixels.1, zero); let alpha_u16_hi = vzip2q_u8(pixels.1, zero); let alpha_f32_0 = vcvtq_f32_u32(vreinterpretq_u32_u8(vzip1q_u8(alpha_u16_lo, zero))); let recip_alpha_f32_0 = vdivq_f32(alpha_scale, alpha_f32_0); let recip_alpha_u16_0 = vmovn_u32(vcvtaq_u32_f32(recip_alpha_f32_0)); let alpha_f32_1 = vcvtq_f32_u32(vreinterpretq_u32_u8(vzip2q_u8(alpha_u16_lo, zero))); let recip_alpha_f32_1 = vdivq_f32(alpha_scale, alpha_f32_1); let recip_alpha_u16_1 = vmovn_u32(vcvtaq_u32_f32(recip_alpha_f32_1)); let alpha_f32_2 = vcvtq_f32_u32(vreinterpretq_u32_u8(vzip1q_u8(alpha_u16_hi, zero))); let recip_alpha_f32_2 = vdivq_f32(alpha_scale, alpha_f32_2); let recip_alpha_u16_2 = vmovn_u32(vcvtaq_u32_f32(recip_alpha_f32_2)); let alpha_f32_3 = vcvtq_f32_u32(vreinterpretq_u32_u8(vzip2q_u8(alpha_u16_hi, zero))); let recip_alpha_f32_3 = vdivq_f32(alpha_scale, alpha_f32_3); let recip_alpha_u16_3 = vmovn_u32(vcvtaq_u32_f32(recip_alpha_f32_3)); let recip_alpha = uint16x8x2_t( vcombine_u16(recip_alpha_u16_0, recip_alpha_u16_1), 
vcombine_u16(recip_alpha_u16_2, recip_alpha_u16_3), ); pixels.0 = neon_utils::mul_color_recip_alpha_u8x16(pixels.0, recip_alpha, zero); pixels.0 = vandq_u8(pixels.0, nonzero_alpha_mask); pixels } #[inline(always)] unsafe fn divide_alpha_8_pixels(mut pixels: uint8x8x2_t) -> uint8x8x2_t { let zero_u8x8 = vdup_n_u8(0); let zero_u8x16 = vdupq_n_u8(0); let alpha_scale = vdupq_n_f32(255.0 * 256.0); let nonzero_alpha_mask = vmvn_u8(vceqz_u8(pixels.1)); let alpha_u16_lo = vzip1_u8(pixels.1, zero_u8x8); let alpha_u16_hi = vzip2_u8(pixels.1, zero_u8x8); let alpha_u16 = vcombine_u8(alpha_u16_lo, alpha_u16_hi); let alpha_f32_0 = vcvtq_f32_u32(vreinterpretq_u32_u8(vzip1q_u8(alpha_u16, zero_u8x16))); let recip_alpha_f32_0 = vdivq_f32(alpha_scale, alpha_f32_0); let recip_alpha_u16_0 = vmovn_u32(vcvtaq_u32_f32(recip_alpha_f32_0)); let alpha_f32_1 = vcvtq_f32_u32(vreinterpretq_u32_u8(vzip2q_u8(alpha_u16, zero_u8x16))); let recip_alpha_f32_1 = vdivq_f32(alpha_scale, alpha_f32_1); let recip_alpha_u16_1 = vmovn_u32(vcvtaq_u32_f32(recip_alpha_f32_1)); let recip_alpha = vcombine_u16(recip_alpha_u16_0, recip_alpha_u16_1); pixels.0 = neon_utils::mul_color_recip_alpha_u8x8(pixels.0, recip_alpha, zero_u8x8); pixels.0 = vand_u8(pixels.0, nonzero_alpha_mask); pixels } fast_image_resize-5.3.0/src/alpha/u8x2/sse4.rs000064400000000000000000000177161046102023000172330ustar 00000000000000use std::arch::x86_64::*; use super::native; use crate::pixels::U8x2; use crate::utils::foreach_with_pre_reading; use crate::{ImageView, ImageViewMut}; #[target_feature(enable = "sse4.1")] pub(crate) unsafe fn multiply_alpha( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, ) { let src_rows = src_view.iter_rows(0); let dst_rows = dst_view.iter_rows_mut(0); for (src_row, dst_row) in src_rows.zip(dst_rows) { multiply_alpha_row(src_row, dst_row); } } #[target_feature(enable = "sse4.1")] pub(crate) unsafe fn multiply_alpha_inplace(image_view: &mut impl ImageViewMut) { for row in image_view.iter_rows_mut(0) { multiply_alpha_row_inplace(row); } } #[inline] #[target_feature(enable = "sse4.1")] pub(crate) unsafe fn multiply_alpha_row(src_row: &[U8x2], dst_row: &mut [U8x2]) { let src_chunks = src_row.chunks_exact(8); let src_remainder = src_chunks.remainder(); let mut dst_chunks = dst_row.chunks_exact_mut(8); let src_dst = src_chunks.zip(&mut dst_chunks); foreach_with_pre_reading( src_dst, |(src, dst)| { let pixels = _mm_loadu_si128(src.as_ptr() as *const __m128i); let dst_ptr = dst.as_mut_ptr() as *mut __m128i; (pixels, dst_ptr) }, |(mut pixels, dst_ptr)| { pixels = multiplies_alpha_8_pixels(pixels); _mm_storeu_si128(dst_ptr, pixels); }, ); if !src_remainder.is_empty() { let dst_reminder = dst_chunks.into_remainder(); native::multiply_alpha_row(src_remainder, dst_reminder); } } #[inline] #[target_feature(enable = "sse4.1")] pub(crate) unsafe fn multiply_alpha_row_inplace(row: &mut [U8x2]) { let mut chunks = row.chunks_exact_mut(8); // Using a simple for-loop in this case is faster than implementation with pre-reading for chunk in &mut chunks { let src_pixels = _mm_loadu_si128(chunk.as_ptr() as *const __m128i); let dst_pixels = multiplies_alpha_8_pixels(src_pixels); _mm_storeu_si128(chunk.as_mut_ptr() as *mut __m128i, dst_pixels); } let reminder = chunks.into_remainder(); if !reminder.is_empty() { native::multiply_alpha_row_inplace(reminder); } } #[inline] #[target_feature(enable = "sse4.1")] unsafe fn multiplies_alpha_8_pixels(pixels: __m128i) -> __m128i { let zero = _mm_setzero_si128(); let half = _mm_set1_epi16(128); const MAX_A: 
i16 = 0xff00u16 as i16; let max_alpha = _mm_set1_epi16(MAX_A); /* |L A | |L A | |L A | |L A | |L A | |L A | |L A | |L A | |00 01| |02 03| |04 05| |06 07| |08 09| |10 11| |12 13| |14 15| */ let factor_mask = _mm_set_epi8(15, 15, 13, 13, 11, 11, 9, 9, 7, 7, 5, 5, 3, 3, 1, 1); let factor_pixels = _mm_shuffle_epi8(pixels, factor_mask); let factor_pixels = _mm_or_si128(factor_pixels, max_alpha); let src_i16_lo = _mm_unpacklo_epi8(pixels, zero); let factors = _mm_unpacklo_epi8(factor_pixels, zero); let src_i16_lo = _mm_add_epi16(_mm_mullo_epi16(src_i16_lo, factors), half); let dst_i16_lo = _mm_add_epi16(src_i16_lo, _mm_srli_epi16::<8>(src_i16_lo)); let dst_i16_lo = _mm_srli_epi16::<8>(dst_i16_lo); let src_i16_hi = _mm_unpackhi_epi8(pixels, zero); let factors = _mm_unpackhi_epi8(factor_pixels, zero); let src_i16_hi = _mm_add_epi16(_mm_mullo_epi16(src_i16_hi, factors), half); let dst_i16_hi = _mm_add_epi16(src_i16_hi, _mm_srli_epi16::<8>(src_i16_hi)); let dst_i16_hi = _mm_srli_epi16::<8>(dst_i16_hi); _mm_packus_epi16(dst_i16_lo, dst_i16_hi) } // Divide #[target_feature(enable = "sse4.1")] pub(crate) unsafe fn divide_alpha( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, ) { let src_rows = src_view.iter_rows(0); let dst_rows = dst_view.iter_rows_mut(0); for (src_row, dst_row) in src_rows.zip(dst_rows) { divide_alpha_row(src_row, dst_row); } } #[target_feature(enable = "sse4.1")] pub(crate) unsafe fn divide_alpha_inplace(image_view: &mut impl ImageViewMut) { for row in image_view.iter_rows_mut(0) { divide_alpha_row_inplace(row); } } #[inline] #[target_feature(enable = "sse4.1")] pub(crate) unsafe fn divide_alpha_row(src_row: &[U8x2], dst_row: &mut [U8x2]) { let src_chunks = src_row.chunks_exact(8); let src_remainder = src_chunks.remainder(); let mut dst_chunks = dst_row.chunks_exact_mut(8); let src_dst = src_chunks.zip(&mut dst_chunks); foreach_with_pre_reading( src_dst, |(src, dst)| { let pixels = _mm_loadu_si128(src.as_ptr() as *const __m128i); let dst_ptr = dst.as_mut_ptr() as *mut __m128i; (pixels, dst_ptr) }, |(mut pixels, dst_ptr)| { pixels = divide_alpha_8_pixels(pixels); _mm_storeu_si128(dst_ptr, pixels); }, ); if !src_remainder.is_empty() { let dst_reminder = dst_chunks.into_remainder(); let mut src_pixels = [U8x2::new([0; 2]); 8]; src_pixels .iter_mut() .zip(src_remainder) .for_each(|(d, s)| *d = *s); let mut dst_pixels = [U8x2::new([0; 2]); 8]; let mut pixels = _mm_loadu_si128(src_pixels.as_ptr() as *const __m128i); pixels = divide_alpha_8_pixels(pixels); _mm_storeu_si128(dst_pixels.as_mut_ptr() as *mut __m128i, pixels); dst_pixels .iter() .zip(dst_reminder) .for_each(|(s, d)| *d = *s); } } #[inline] #[target_feature(enable = "sse4.1")] pub(crate) unsafe fn divide_alpha_row_inplace(row: &mut [U8x2]) { let mut chunks = row.chunks_exact_mut(8); foreach_with_pre_reading( &mut chunks, |chunk| { let pixels = _mm_loadu_si128(chunk.as_ptr() as *const __m128i); let dst_ptr = chunk.as_mut_ptr() as *mut __m128i; (pixels, dst_ptr) }, |(mut pixels, dst_ptr)| { pixels = divide_alpha_8_pixels(pixels); _mm_storeu_si128(dst_ptr, pixels); }, ); let reminder = chunks.into_remainder(); if !reminder.is_empty() { let mut src_pixels = [U8x2::new([0; 2]); 8]; src_pixels .iter_mut() .zip(reminder.iter()) .for_each(|(d, s)| *d = *s); let mut dst_pixels = [U8x2::new([0; 2]); 8]; let mut pixels = _mm_loadu_si128(src_pixels.as_ptr() as *const __m128i); pixels = divide_alpha_8_pixels(pixels); _mm_storeu_si128(dst_pixels.as_mut_ptr() as *mut __m128i, pixels); dst_pixels.iter().zip(reminder).for_each(|(s, 
d)| *d = *s); } } #[inline] #[target_feature(enable = "sse4.1")] unsafe fn divide_alpha_8_pixels(pixels: __m128i) -> __m128i { let alpha_mask = _mm_set1_epi16(0xff00u16 as i16); let luma_mask = _mm_set1_epi16(0xff); let alpha32_sh_lo = _mm_set_epi8(-1, -1, -1, 7, -1, -1, -1, 5, -1, -1, -1, 3, -1, -1, -1, 1); let alpha32_sh_hi = _mm_set_epi8( -1, -1, -1, 15, -1, -1, -1, 13, -1, -1, -1, 11, -1, -1, -1, 9, ); let alpha_scale = _mm_set1_ps(255.0 * 256.0); let alpha_lo_f32 = _mm_cvtepi32_ps(_mm_shuffle_epi8(pixels, alpha32_sh_lo)); // In case of zero division the `scaled_alpha_lo_i32` will contain negative value (-2147483648). let scaled_alpha_lo_i32 = _mm_cvtps_epi32(_mm_div_ps(alpha_scale, alpha_lo_f32)); let alpha_hi_f32 = _mm_cvtepi32_ps(_mm_shuffle_epi8(pixels, alpha32_sh_hi)); let scaled_alpha_hi_i32 = _mm_cvtps_epi32(_mm_div_ps(alpha_scale, alpha_hi_f32)); // All negative values will be stored as 0. let scaled_alpha_i16 = _mm_packus_epi32(scaled_alpha_lo_i32, scaled_alpha_hi_i32); let luma_i16 = _mm_and_si128(pixels, luma_mask); let luma_i16 = _mm_slli_epi16::<7>(luma_i16); let scaled_luma_i16 = _mm_mulhrs_epi16(luma_i16, scaled_alpha_i16); let scaled_luma_i16 = _mm_min_epu16(scaled_luma_i16, luma_mask); let alpha = _mm_and_si128(pixels, alpha_mask); _mm_blendv_epi8(scaled_luma_i16, alpha, alpha_mask) } fast_image_resize-5.3.0/src/alpha/u8x2/wasm32.rs000064400000000000000000000167211046102023000174640ustar 00000000000000use std::arch::wasm32::*; use crate::pixels::U8x2; use crate::wasm32_utils::u16x8_mul_add_shr16; use crate::{ImageView, ImageViewMut}; use super::native; pub(crate) unsafe fn multiply_alpha( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, ) { let src_rows = src_view.iter_rows(0); let dst_rows = dst_view.iter_rows_mut(0); for (src_row, dst_row) in src_rows.zip(dst_rows) { multiply_alpha_row(src_row, dst_row); } } pub(crate) unsafe fn multiply_alpha_inplace(image_view: &mut impl ImageViewMut) { for row in image_view.iter_rows_mut(0) { multiply_alpha_row_inplace(row); } } #[inline] #[target_feature(enable = "simd128")] unsafe fn multiply_alpha_row(src_row: &[U8x2], dst_row: &mut [U8x2]) { let src_chunks = src_row.chunks_exact(8); let src_remainder = src_chunks.remainder(); let mut dst_chunks = dst_row.chunks_exact_mut(8); let src_dst = src_chunks.zip(&mut dst_chunks); // A simple for-loop in this case is as fast as implementation with pre-reading for (src, dst) in src_dst { let src_pixels = v128_load(src.as_ptr() as *const v128); let dst_pixels = multiplies_alpha_8_pixels(src_pixels); v128_store(dst.as_mut_ptr() as *mut v128, dst_pixels); } if !src_remainder.is_empty() { let dst_reminder = dst_chunks.into_remainder(); native::multiply_alpha_row(src_remainder, dst_reminder); } } #[inline] #[target_feature(enable = "simd128")] unsafe fn multiply_alpha_row_inplace(row: &mut [U8x2]) { let mut chunks = row.chunks_exact_mut(8); // Using a simple for-loop in this case is as fast as implementation with pre-reading for chunk in &mut chunks { let src_pixels = v128_load(chunk.as_ptr() as *const v128); let dst_pixels = multiplies_alpha_8_pixels(src_pixels); v128_store(chunk.as_mut_ptr() as *mut v128, dst_pixels); } let reminder = chunks.into_remainder(); if !reminder.is_empty() { native::multiply_alpha_row_inplace(reminder); } } #[inline] #[target_feature(enable = "simd128")] unsafe fn multiplies_alpha_8_pixels(pixels: v128) -> v128 { let half = u16x8_splat(128); let max_alpha = u16x8_splat(0xff00); /* |L A | |L A | |L A | |L A | |L A | |L A | |L A | |L A | |00 01| |02 03| 
|04 05| |06 07| |08 09| |10 11| |12 13| |14 15| */ const FACTOR_MASK: v128 = i8x16(1, 1, 3, 3, 5, 5, 7, 7, 9, 9, 11, 11, 13, 13, 15, 15); let factor_pixels = u8x16_swizzle(pixels, FACTOR_MASK); let factor_pixels = v128_or(factor_pixels, max_alpha); let src_u16_lo = u16x8_extend_low_u8x16(pixels); let factors = u16x8_extend_low_u8x16(factor_pixels); let mut dst_u16_lo = u16x8_add(u16x8_mul(src_u16_lo, factors), half); dst_u16_lo = u16x8_add(dst_u16_lo, u16x8_shr(dst_u16_lo, 8)); dst_u16_lo = u16x8_shr(dst_u16_lo, 8); let src_u16_hi = u16x8_extend_high_u8x16(pixels); let factors = u16x8_extend_high_u8x16(factor_pixels); let mut dst_u16_hi = u16x8_add(u16x8_mul(src_u16_hi, factors), half); dst_u16_hi = u16x8_add(dst_u16_hi, u16x8_shr(dst_u16_hi, 8)); dst_u16_hi = u16x8_shr(dst_u16_hi, 8); u8x16_narrow_i16x8(dst_u16_lo, dst_u16_hi) } // Divide pub(crate) unsafe fn divide_alpha( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, ) { let src_rows = src_view.iter_rows(0); let dst_rows = dst_view.iter_rows_mut(0); for (src_row, dst_row) in src_rows.zip(dst_rows) { divide_alpha_row(src_row, dst_row); } } pub(crate) unsafe fn divide_alpha_inplace(image_view: &mut impl ImageViewMut) { for row in image_view.iter_rows_mut(0) { divide_alpha_row_inplace(row); } } #[inline] #[target_feature(enable = "simd128")] unsafe fn divide_alpha_row(src_row: &[U8x2], dst_row: &mut [U8x2]) { let src_chunks = src_row.chunks_exact(8); let src_remainder = src_chunks.remainder(); let mut dst_chunks = dst_row.chunks_exact_mut(8); let src_dst = src_chunks.zip(&mut dst_chunks); // Using a simple for-loop in this case is as fast as implementation with pre-reading for (src, dst) in src_dst { let src_pixels = v128_load(src.as_ptr() as *const v128); let dst_pixels = divide_alpha_8_pixels(src_pixels); v128_store(dst.as_mut_ptr() as *mut v128, dst_pixels); } if !src_remainder.is_empty() { let dst_reminder = dst_chunks.into_remainder(); let mut src_pixels = [U8x2::new([0, 0]); 8]; src_pixels .iter_mut() .zip(src_remainder) .for_each(|(d, s)| *d = *s); let mut dst_pixels = [U8x2::new([0; 2]); 8]; let mut pixels = v128_load(src_pixels.as_ptr() as *const v128); pixels = divide_alpha_8_pixels(pixels); v128_store(dst_pixels.as_mut_ptr() as *mut v128, pixels); dst_pixels .iter() .zip(dst_reminder) .for_each(|(s, d)| *d = *s); } } #[inline] #[target_feature(enable = "simd128")] unsafe fn divide_alpha_row_inplace(row: &mut [U8x2]) { let mut chunks = row.chunks_exact_mut(8); // Using a simple for-loop in this case is as fast as implementation with pre-reading for chunk in &mut chunks { let src_pixels = v128_load(chunk.as_ptr() as *const v128); let dst_pixels = divide_alpha_8_pixels(src_pixels); v128_store(chunk.as_mut_ptr() as *mut v128, dst_pixels); } let reminder = chunks.into_remainder(); if !reminder.is_empty() { let mut src_pixels = [U8x2::new([0; 2]); 8]; src_pixels .iter_mut() .zip(reminder.iter()) .for_each(|(d, s)| *d = *s); let mut dst_pixels = [U8x2::new([0; 2]); 8]; let mut pixels = v128_load(src_pixels.as_ptr() as *const v128); pixels = divide_alpha_8_pixels(pixels); v128_store(dst_pixels.as_mut_ptr() as *mut v128, pixels); dst_pixels.iter().zip(reminder).for_each(|(s, d)| *d = *s); } } #[inline] #[target_feature(enable = "simd128")] unsafe fn divide_alpha_8_pixels(pixels: v128) -> v128 { const ALPHA32_SH_LO: v128 = i8x16(1, -1, -1, -1, 3, -1, -1, -1, 5, -1, -1, -1, 7, -1, -1, -1); const ALPHA32_SH_HI: v128 = i8x16( 9, -1, -1, -1, 11, -1, -1, -1, 13, -1, -1, -1, 15, -1, -1, -1, ); let alpha_scale = f32x4_splat(255.0 * 
256.0); let alpha_lo_f32 = f32x4_convert_u32x4(u8x16_swizzle(pixels, ALPHA32_SH_LO)); // In case of zero division the result will be u32::MAX or 0. let scaled_alpha_lo_u32 = u32x4_trunc_sat_f32x4(f32x4_add( f32x4_div(alpha_scale, alpha_lo_f32), f32x4_splat(0.5), )); let alpha_hi_f32 = f32x4_convert_u32x4(u8x16_swizzle(pixels, ALPHA32_SH_HI)); let scaled_alpha_hi_u32 = u32x4_trunc_sat_f32x4(f32x4_add( f32x4_div(alpha_scale, alpha_hi_f32), f32x4_splat(0.5), )); // All u32::MAX values in arguments will interpreted as -1i32. // u16x8_narrow_i32x4() converts all negative values into 0. let scaled_alpha_u16 = u16x8_narrow_i32x4(scaled_alpha_lo_u32, scaled_alpha_hi_u32); let luma_u16 = u16x8_shl(pixels, 8); let scaled_luma_u16 = u16x8_mul_add_shr16(luma_u16, scaled_alpha_u16, u32x4_splat(0x8000)); let luma_max = u16x8_splat(0xff); let scaled_luma_u16 = u16x8_min(scaled_luma_u16, luma_max); // Blend scaled luma with original alpha channel. let alpha_mask = u16x8_splat(0xff00); let alpha = v128_and(pixels, alpha_mask); v128_or(scaled_luma_u16, alpha) } fast_image_resize-5.3.0/src/alpha/u8x4/avx2.rs000064400000000000000000000166431046102023000172350ustar 00000000000000use crate::pixels::U8x4; use crate::utils::foreach_with_pre_reading; use crate::{simd_utils, ImageView, ImageViewMut}; use std::arch::x86_64::*; use super::sse4; #[target_feature(enable = "avx2")] pub(crate) unsafe fn multiply_alpha( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, ) { let src_rows = src_view.iter_rows(0); let dst_rows = dst_view.iter_rows_mut(0); for (src_row, dst_row) in src_rows.zip(dst_rows) { multiply_alpha_row(src_row, dst_row); } } #[target_feature(enable = "avx2")] pub(crate) unsafe fn multiply_alpha_inplace(image_view: &mut impl ImageViewMut) { let rows = image_view.iter_rows_mut(0); for row in rows { multiply_alpha_row_inplace(row); } } #[inline] #[target_feature(enable = "avx2")] unsafe fn multiply_alpha_row(src_row: &[U8x4], dst_row: &mut [U8x4]) { let src_chunks = src_row.chunks_exact(8); let src_tail = src_chunks.remainder(); let mut dst_chunks = dst_row.chunks_exact_mut(8); let src_dst = src_chunks.zip(&mut dst_chunks); foreach_with_pre_reading( src_dst, |(src, dst)| { let pixels = simd_utils::loadu_si256(src, 0); let dst_ptr = dst.as_mut_ptr() as *mut __m256i; (pixels, dst_ptr) }, |(mut pixels, dst_ptr)| { pixels = multiply_alpha_8_pixels(pixels); _mm256_storeu_si256(dst_ptr, pixels); }, ); if !src_tail.is_empty() { let dst_tail = dst_chunks.into_remainder(); sse4::multiply_alpha_row(src_tail, dst_tail); } } #[inline] #[target_feature(enable = "avx2")] unsafe fn multiply_alpha_row_inplace(row: &mut [U8x4]) { let mut chunks = row.chunks_exact_mut(8); foreach_with_pre_reading( &mut chunks, |chunk| { let pixels = simd_utils::loadu_si256(chunk, 0); let dst_ptr = chunk.as_mut_ptr() as *mut __m256i; (pixels, dst_ptr) }, |(mut pixels, dst_ptr)| { pixels = multiply_alpha_8_pixels(pixels); _mm256_storeu_si256(dst_ptr, pixels); }, ); let tail = chunks.into_remainder(); if !tail.is_empty() { sse4::multiply_alpha_row_inplace(tail); } } #[inline] #[target_feature(enable = "avx2")] unsafe fn multiply_alpha_8_pixels(pixels: __m256i) -> __m256i { let zero = _mm256_setzero_si256(); let half = _mm256_set1_epi16(128); const MAX_A: i32 = 0xff000000u32 as i32; let max_alpha = _mm256_set1_epi32(MAX_A); #[rustfmt::skip] let factor_mask = _mm256_set_epi8( 15, 15, 15, 15, 11, 11, 11, 11, 7, 7, 7, 7, 3, 3, 3, 3, 15, 15, 15, 15, 11, 11, 11, 11, 7, 7, 7, 7, 3, 3, 3, 3, ); let factor_pixels = _mm256_shuffle_epi8(pixels, 
factor_mask); let factor_pixels = _mm256_or_si256(factor_pixels, max_alpha); let pix1 = _mm256_unpacklo_epi8(pixels, zero); let factors = _mm256_unpacklo_epi8(factor_pixels, zero); let pix1 = _mm256_add_epi16(_mm256_mullo_epi16(pix1, factors), half); let pix1 = _mm256_add_epi16(pix1, _mm256_srli_epi16::<8>(pix1)); let pix1 = _mm256_srli_epi16::<8>(pix1); let pix2 = _mm256_unpackhi_epi8(pixels, zero); let factors = _mm256_unpackhi_epi8(factor_pixels, zero); let pix2 = _mm256_add_epi16(_mm256_mullo_epi16(pix2, factors), half); let pix2 = _mm256_add_epi16(pix2, _mm256_srli_epi16::<8>(pix2)); let pix2 = _mm256_srli_epi16::<8>(pix2); _mm256_packus_epi16(pix1, pix2) } // Divide #[target_feature(enable = "avx2")] pub(crate) unsafe fn divide_alpha( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, ) { let src_rows = src_view.iter_rows(0); let dst_rows = dst_view.iter_rows_mut(0); let rows = src_rows.zip(dst_rows); for (src_row, dst_row) in rows { divide_alpha_row(src_row, dst_row); } } #[target_feature(enable = "avx2")] pub(crate) unsafe fn divide_alpha_inplace(image_view: &mut impl ImageViewMut) { let rows = image_view.iter_rows_mut(0); for row in rows { divide_alpha_row_inplace(row); } } #[target_feature(enable = "avx2")] unsafe fn divide_alpha_row(src_row: &[U8x4], dst_row: &mut [U8x4]) { let src_chunks = src_row.chunks_exact(8); let src_remainder = src_chunks.remainder(); let mut dst_chunks = dst_row.chunks_exact_mut(8); let src_dst = src_chunks.zip(&mut dst_chunks); foreach_with_pre_reading( src_dst, |(src, dst)| { let pixels = simd_utils::loadu_si256(src, 0); let dst_ptr = dst.as_mut_ptr() as *mut __m256i; (pixels, dst_ptr) }, |(mut pixels, dst_ptr)| { pixels = divide_alpha_8_pixels(pixels); _mm256_storeu_si256(dst_ptr, pixels); }, ); if !src_remainder.is_empty() { let dst_reminder = dst_chunks.into_remainder(); sse4::divide_alpha_row(src_remainder, dst_reminder); } } #[target_feature(enable = "avx2")] unsafe fn divide_alpha_row_inplace(row: &mut [U8x4]) { let mut chunks = row.chunks_exact_mut(8); foreach_with_pre_reading( &mut chunks, |chunk| { let pixels = simd_utils::loadu_si256(chunk, 0); let dst_ptr = chunk.as_mut_ptr() as *mut __m256i; (pixels, dst_ptr) }, |(mut pixels, dst_ptr)| { pixels = divide_alpha_8_pixels(pixels); _mm256_storeu_si256(dst_ptr, pixels); }, ); let tail = chunks.into_remainder(); if !tail.is_empty() { sse4::divide_alpha_row_inplace(tail); } } #[inline] #[target_feature(enable = "avx2")] unsafe fn divide_alpha_8_pixels(pixels: __m256i) -> __m256i { let zero = _mm256_setzero_si256(); let alpha_mask = _mm256_set1_epi32(0xff000000u32 as i32); #[rustfmt::skip] let shuffle1 = _mm256_set_epi8( 5, 4, 5, 4, 5, 4, 5, 4, 1, 0, 1, 0, 1, 0, 1, 0, 5, 4, 5, 4, 5, 4, 5, 4, 1, 0, 1, 0, 1, 0, 1, 0, ); #[rustfmt::skip] let shuffle2 = _mm256_set_epi8( 13, 12, 13, 12, 13, 12, 13, 12, 9, 8, 9, 8, 9, 8, 9, 8, 13, 12, 13, 12, 13, 12, 13, 12, 9, 8, 9, 8, 9, 8, 9, 8, ); let alpha_scale = _mm256_set1_ps(255.0 * 256.0); let max_value = _mm256_set1_epi16(0xff); let alpha_f32 = _mm256_cvtepi32_ps(_mm256_srli_epi32::<24>(pixels)); let recip_alpha_i32 = _mm256_cvtps_epi32(_mm256_div_ps(alpha_scale, alpha_f32)); // Recip alpha in Q8.8 format let recip_alpha_lo_q8_8 = _mm256_shuffle_epi8(recip_alpha_i32, shuffle1); let recip_alpha_hi_q8_8 = _mm256_shuffle_epi8(recip_alpha_i32, shuffle2); // Pixels components in format Q9.7 let components_lo_q9_7 = _mm256_slli_epi16::<7>(_mm256_unpacklo_epi8(pixels, zero)); let components_hi_q9_7 = _mm256_slli_epi16::<7>(_mm256_unpackhi_epi8(pixels, 
zero)); // Multiplied pixels components as i16. // // fn _mm256_mulhrs_epi16(a: i16, b: i16) -> i16 { // let tmp: i32 = ((a as i32 * b as i32) >> 14) + 1; // (tmp >> 1) as i16 // } let res_components_lo_i16 = _mm256_min_epu16( _mm256_mulhrs_epi16(components_lo_q9_7, recip_alpha_lo_q8_8), max_value, ); let res_components_hi_i16 = _mm256_min_epu16( _mm256_mulhrs_epi16(components_hi_q9_7, recip_alpha_hi_q8_8), max_value, ); let alpha = _mm256_and_si256(pixels, alpha_mask); let rgb = _mm256_packus_epi16(res_components_lo_i16, res_components_hi_i16); _mm256_blendv_epi8(rgb, alpha, alpha_mask) } fast_image_resize-5.3.0/src/alpha/u8x4/mod.rs000064400000000000000000000104211046102023000171200ustar 00000000000000use super::AlphaMulDiv; use crate::pixels::U8x4; use crate::{CpuExtensions, ImageError, ImageView, ImageViewMut}; #[cfg(target_arch = "x86_64")] mod avx2; mod native; #[cfg(target_arch = "aarch64")] mod neon; #[cfg(target_arch = "x86_64")] mod sse4; #[cfg(target_arch = "wasm32")] mod wasm32; type P = U8x4; impl AlphaMulDiv for P { fn multiply_alpha( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, cpu_extensions: CpuExtensions, ) -> Result<(), ImageError> { process_two_images! { multiple(src_view, dst_view, cpu_extensions); } Ok(()) } fn multiply_alpha_inplace( image_view: &mut impl ImageViewMut, cpu_extensions: CpuExtensions, ) -> Result<(), ImageError> { process_one_images! { multiply_inplace(image_view, cpu_extensions); } Ok(()) } fn divide_alpha( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, cpu_extensions: CpuExtensions, ) -> Result<(), ImageError> { process_two_images! { divide(src_view, dst_view, cpu_extensions); } Ok(()) } fn divide_alpha_inplace( image_view: &mut impl ImageViewMut, cpu_extensions: CpuExtensions, ) -> Result<(), ImageError> { process_one_images! 
{ divide_inplace(image_view, cpu_extensions); } Ok(()) } } fn multiple( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, cpu_extensions: CpuExtensions, ) { match cpu_extensions { #[cfg(target_arch = "x86_64")] CpuExtensions::Avx2 => unsafe { avx2::multiply_alpha(src_view, dst_view) }, #[cfg(target_arch = "x86_64")] CpuExtensions::Sse4_1 => unsafe { sse4::multiply_alpha(src_view, dst_view) }, #[cfg(target_arch = "aarch64")] CpuExtensions::Neon => unsafe { neon::multiply_alpha(src_view, dst_view) }, #[cfg(target_arch = "wasm32")] CpuExtensions::Simd128 => unsafe { wasm32::multiply_alpha(src_view, dst_view) }, _ => native::multiply_alpha(src_view, dst_view), } } fn multiply_inplace(image_view: &mut impl ImageViewMut, cpu_extensions: CpuExtensions) { match cpu_extensions { #[cfg(target_arch = "x86_64")] CpuExtensions::Avx2 => unsafe { avx2::multiply_alpha_inplace(image_view) }, #[cfg(target_arch = "x86_64")] CpuExtensions::Sse4_1 => unsafe { sse4::multiply_alpha_inplace(image_view) }, #[cfg(target_arch = "aarch64")] CpuExtensions::Neon => unsafe { neon::multiply_alpha_inplace(image_view) }, #[cfg(target_arch = "wasm32")] CpuExtensions::Simd128 => unsafe { wasm32::multiply_alpha_inplace(image_view) }, _ => native::multiply_alpha_inplace(image_view), } } fn divide( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, cpu_extensions: CpuExtensions, ) { match cpu_extensions { #[cfg(target_arch = "x86_64")] CpuExtensions::Avx2 => unsafe { avx2::divide_alpha(src_view, dst_view) }, #[cfg(target_arch = "x86_64")] CpuExtensions::Sse4_1 => unsafe { sse4::divide_alpha(src_view, dst_view) }, #[cfg(target_arch = "aarch64")] CpuExtensions::Neon => unsafe { neon::divide_alpha(src_view, dst_view) }, #[cfg(target_arch = "wasm32")] CpuExtensions::Simd128 => unsafe { wasm32::divide_alpha(src_view, dst_view) }, _ => native::divide_alpha(src_view, dst_view), } } fn divide_inplace(image_view: &mut impl ImageViewMut, cpu_extensions: CpuExtensions) { match cpu_extensions { #[cfg(target_arch = "x86_64")] CpuExtensions::Avx2 => unsafe { avx2::divide_alpha_inplace(image_view) }, #[cfg(target_arch = "x86_64")] CpuExtensions::Sse4_1 => unsafe { sse4::divide_alpha_inplace(image_view) }, #[cfg(target_arch = "aarch64")] CpuExtensions::Neon => unsafe { neon::divide_alpha_inplace(image_view) }, #[cfg(target_arch = "wasm32")] CpuExtensions::Simd128 => unsafe { wasm32::divide_alpha_inplace(image_view) }, _ => native::divide_alpha_inplace(image_view), } } fast_image_resize-5.3.0/src/alpha/u8x4/native.rs000064400000000000000000000051241046102023000176330ustar 00000000000000use crate::alpha::common::{div_and_clip, mul_div_255, RECIP_ALPHA}; use crate::pixels::U8x4; use crate::{ImageView, ImageViewMut}; pub(crate) fn multiply_alpha( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, ) { let src_rows = src_view.iter_rows(0); let dst_rows = dst_view.iter_rows_mut(0); let rows = src_rows.zip(dst_rows); for (src_row, dst_row) in rows { for (src_pixel, dst_pixel) in src_row.iter().zip(dst_row.iter_mut()) { *dst_pixel = multiply_alpha_pixel(*src_pixel); } } } pub(crate) fn multiply_alpha_inplace(image_view: &mut impl ImageViewMut) { let rows = image_view.iter_rows_mut(0); for row in rows { multiply_alpha_row_inplace(row); } } #[inline(always)] pub(crate) fn multiply_alpha_row(src_row: &[U8x4], dst_row: &mut [U8x4]) { for (src_pixel, dst_pixel) in src_row.iter().zip(dst_row) { *dst_pixel = multiply_alpha_pixel(*src_pixel); } } #[inline(always)] pub(crate) fn multiply_alpha_row_inplace(row: &mut [U8x4]) { for 
pixel in row.iter_mut() { *pixel = multiply_alpha_pixel(*pixel); } } #[inline(always)] fn multiply_alpha_pixel(mut pixel: U8x4) -> U8x4 { let alpha = pixel.0[3]; pixel.0 = [ mul_div_255(pixel.0[0], alpha), mul_div_255(pixel.0[1], alpha), mul_div_255(pixel.0[2], alpha), alpha, ]; pixel } // Divide #[inline] pub(crate) fn divide_alpha( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, ) { let src_rows = src_view.iter_rows(0); let dst_rows = dst_view.iter_rows_mut(0); let rows = src_rows.zip(dst_rows); for (src_row, dst_row) in rows { divide_alpha_row(src_row, dst_row); } } #[inline] pub(crate) fn divide_alpha_inplace(image_view: &mut impl ImageViewMut) { let rows = image_view.iter_rows_mut(0); for row in rows { row.iter_mut().for_each(|pixel| { *pixel = divide_alpha_pixel(*pixel); }); } } #[inline(always)] pub(crate) fn divide_alpha_row(src_row: &[U8x4], dst_row: &mut [U8x4]) { for (src_pixel, dst_pixel) in src_row.iter().zip(dst_row) { *dst_pixel = divide_alpha_pixel(*src_pixel); } } #[inline(always)] fn divide_alpha_pixel(mut pixel: U8x4) -> U8x4 { let alpha = pixel.0[3]; let recip_alpha = RECIP_ALPHA[alpha as usize]; pixel.0 = [ div_and_clip(pixel.0[0], recip_alpha), div_and_clip(pixel.0[1], recip_alpha), div_and_clip(pixel.0[2], recip_alpha), alpha, ]; pixel } fast_image_resize-5.3.0/src/alpha/u8x4/neon.rs000064400000000000000000000262151046102023000173100ustar 00000000000000use std::arch::aarch64::*; use crate::neon_utils; use crate::pixels::U8x4; use crate::utils::foreach_with_pre_reading; use crate::{ImageView, ImageViewMut}; use super::native; #[target_feature(enable = "neon")] pub(crate) unsafe fn multiply_alpha( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, ) { let src_rows = src_view.iter_rows(0); let dst_rows = dst_view.iter_rows_mut(0); for (src_row, dst_row) in src_rows.zip(dst_rows) { multiply_alpha_row(src_row, dst_row); } } #[target_feature(enable = "neon")] pub(crate) unsafe fn multiply_alpha_inplace(image_view: &mut impl ImageViewMut) { for row in image_view.iter_rows_mut(0) { multiply_alpha_row_inplace(row); } } #[inline(always)] unsafe fn multiply_alpha_row(src_row: &[U8x4], dst_row: &mut [U8x4]) { let src_chunks = src_row.chunks_exact(16); let src_remainder = src_chunks.remainder(); let mut dst_chunks = dst_row.chunks_exact_mut(16); let src_dst = src_chunks.zip(&mut dst_chunks); foreach_with_pre_reading( src_dst, |(src, dst)| { let pixels = neon_utils::load_deintrel_u8x16x4(src, 0); let dst_ptr = dst.as_mut_ptr() as *mut u8; (pixels, dst_ptr) }, |(mut pixels, dst_ptr)| { pixels = multiplies_alpha_16_pixles(pixels); vst4q_u8(dst_ptr, pixels); }, ); let src_chunks = src_remainder.chunks_exact(8); let src_remainder = src_chunks.remainder(); let dst_reminder = dst_chunks.into_remainder(); let mut dst_chunks = dst_reminder.chunks_exact_mut(8); for (src, dst) in src_chunks.zip(&mut dst_chunks) { let mut pixels = neon_utils::load_deintrel_u8x8x4(src, 0); pixels = multiplies_alpha_8_pixles(pixels); let dst_ptr = dst.as_mut_ptr() as *mut u8; vst4_u8(dst_ptr, pixels); } if !src_remainder.is_empty() { let dst_reminder = dst_chunks.into_remainder(); native::multiply_alpha_row(src_remainder, dst_reminder); } } #[inline(always)] unsafe fn multiply_alpha_row_inplace(row: &mut [U8x4]) { let mut chunks = row.chunks_exact_mut(16); foreach_with_pre_reading( &mut chunks, |chunk| { let pixels = neon_utils::load_deintrel_u8x16x4(chunk, 0); let dst_ptr = chunk.as_mut_ptr() as *mut u8; (pixels, dst_ptr) }, |(mut pixels, dst_ptr)| { pixels = 
multiplies_alpha_16_pixles(pixels); vst4q_u8(dst_ptr, pixels); }, ); let reminder = chunks.into_remainder(); let mut chunks = reminder.chunks_exact_mut(8); if let Some(chunk) = chunks.next() { let mut pixels = neon_utils::load_deintrel_u8x8x4(chunk, 0); pixels = multiplies_alpha_8_pixles(pixels); let dst_ptr = chunk.as_mut_ptr() as *mut u8; vst4_u8(dst_ptr, pixels); } let tail = chunks.into_remainder(); if !tail.is_empty() { native::multiply_alpha_row_inplace(tail); } } #[inline(always)] unsafe fn multiplies_alpha_16_pixles(mut pixels: uint8x16x4_t) -> uint8x16x4_t { let zero_u8x16 = vdupq_n_u8(0); let alpha_u16 = uint16x8x2_t( vreinterpretq_u16_u8(vzip1q_u8(pixels.3, zero_u8x16)), vreinterpretq_u16_u8(vzip2q_u8(pixels.3, zero_u8x16)), ); pixels.0 = neon_utils::mul_color_to_alpha_u8x16(pixels.0, alpha_u16, zero_u8x16); pixels.1 = neon_utils::mul_color_to_alpha_u8x16(pixels.1, alpha_u16, zero_u8x16); pixels.2 = neon_utils::mul_color_to_alpha_u8x16(pixels.2, alpha_u16, zero_u8x16); pixels } #[inline(always)] unsafe fn multiplies_alpha_8_pixles(mut pixels: uint8x8x4_t) -> uint8x8x4_t { let zero_u8x8 = vdup_n_u8(0); let alpha_u8 = pixels.3; let alpha_u16_lo = vreinterpret_u16_u8(vzip1_u8(alpha_u8, zero_u8x8)); let alpha_u16_hi = vreinterpret_u16_u8(vzip2_u8(alpha_u8, zero_u8x8)); let alpha_u16 = vcombine_u16(alpha_u16_lo, alpha_u16_hi); pixels.0 = neon_utils::mul_color_to_alpha_u8x8(pixels.0, alpha_u16, zero_u8x8); pixels.1 = neon_utils::mul_color_to_alpha_u8x8(pixels.1, alpha_u16, zero_u8x8); pixels.2 = neon_utils::mul_color_to_alpha_u8x8(pixels.2, alpha_u16, zero_u8x8); pixels } // Divide #[target_feature(enable = "neon")] pub(crate) unsafe fn divide_alpha( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, ) { let src_rows = src_view.iter_rows(0); let dst_rows = dst_view.iter_rows_mut(0); for (src_row, dst_row) in src_rows.zip(dst_rows) { divide_alpha_row(src_row, dst_row); } } #[target_feature(enable = "neon")] pub(crate) unsafe fn divide_alpha_inplace(image_view: &mut impl ImageViewMut) { for row in image_view.iter_rows_mut(0) { divide_alpha_row_inline(row); } } #[inline(always)] unsafe fn divide_alpha_row(src_row: &[U8x4], dst_row: &mut [U8x4]) { let src_chunks = src_row.chunks_exact(16); let src_remainder = src_chunks.remainder(); let mut dst_chunks = dst_row.chunks_exact_mut(16); let src_dst = src_chunks.zip(&mut dst_chunks); foreach_with_pre_reading( src_dst, |(src, dst)| { let pixels = neon_utils::load_deintrel_u8x16x4(src, 0); let dst_ptr = dst.as_mut_ptr() as *mut u8; (pixels, dst_ptr) }, |(mut pixels, dst_ptr)| { pixels = divide_alpha_16_pixels(pixels); vst4q_u8(dst_ptr, pixels); }, ); let src_chunks = src_remainder.chunks_exact(8); let src_remainder = src_chunks.remainder(); let dst_reminder = dst_chunks.into_remainder(); let mut dst_chunks = dst_reminder.chunks_exact_mut(8); for (src, dst) in src_chunks.zip(&mut dst_chunks) { let mut pixels = neon_utils::load_deintrel_u8x8x4(src, 0); pixels = divide_alpha_8_pixels(pixels); let dst_ptr = dst.as_mut_ptr() as *mut u8; vst4_u8(dst_ptr, pixels); } if !src_remainder.is_empty() { let dst_reminder = dst_chunks.into_remainder(); let mut src_pixels = [U8x4::new([0; 4]); 8]; src_pixels .iter_mut() .zip(src_remainder) .for_each(|(d, s)| *d = *s); let mut dst_pixels = [U8x4::new([0; 4]); 8]; let mut pixels = neon_utils::load_deintrel_u8x8x4(src_pixels.as_slice(), 0); pixels = divide_alpha_8_pixels(pixels); let dst_ptr = dst_pixels.as_mut_ptr() as *mut u8; vst4_u8(dst_ptr, pixels); dst_pixels .iter() .zip(dst_reminder) 
.for_each(|(s, d)| *d = *s); } } #[inline(always)] unsafe fn divide_alpha_row_inline(row: &mut [U8x4]) { let mut chunks = row.chunks_exact_mut(16); foreach_with_pre_reading( &mut chunks, |chunk| { let pixels = neon_utils::load_deintrel_u8x16x4(chunk, 0); let dst_ptr = chunk.as_mut_ptr() as *mut u8; (pixels, dst_ptr) }, |(mut pixels, dst_ptr)| { pixels = divide_alpha_16_pixels(pixels); vst4q_u8(dst_ptr, pixels); }, ); let reminder = chunks.into_remainder(); let mut chunks = reminder.chunks_exact_mut(8); if let Some(chunk) = chunks.next() { let mut pixels = neon_utils::load_deintrel_u8x8x4(chunk, 0); pixels = divide_alpha_8_pixels(pixels); let dst_ptr = chunk.as_mut_ptr() as *mut u8; vst4_u8(dst_ptr, pixels); } let tail = chunks.into_remainder(); if !tail.is_empty() { let mut src_pixels = [U8x4::new([0; 4]); 8]; src_pixels .iter_mut() .zip(tail.iter()) .for_each(|(d, s)| *d = *s); let mut dst_pixels = [U8x4::new([0; 4]); 8]; let mut pixels = neon_utils::load_deintrel_u8x8x4(src_pixels.as_slice(), 0); pixels = divide_alpha_8_pixels(pixels); let dst_ptr = dst_pixels.as_mut_ptr() as *mut u8; vst4_u8(dst_ptr, pixels); dst_pixels.iter().zip(tail).for_each(|(s, d)| *d = *s); } } #[inline(always)] unsafe fn divide_alpha_16_pixels(mut pixels: uint8x16x4_t) -> uint8x16x4_t { let zero = vdupq_n_u8(0); let alpha_scale = vdupq_n_f32(255.0 * 256.0); let nonzero_alpha_mask = vmvnq_u8(vceqzq_u8(pixels.3)); let alpha_u16_lo = vzip1q_u8(pixels.3, zero); let alpha_u16_hi = vzip2q_u8(pixels.3, zero); let alpha_f32_0 = vcvtq_f32_u32(vreinterpretq_u32_u8(vzip1q_u8(alpha_u16_lo, zero))); let recip_alpha_f32_0 = vdivq_f32(alpha_scale, alpha_f32_0); let recip_alpha_u16_0 = vmovn_u32(vcvtaq_u32_f32(recip_alpha_f32_0)); let alpha_f32_1 = vcvtq_f32_u32(vreinterpretq_u32_u8(vzip2q_u8(alpha_u16_lo, zero))); let recip_alpha_f32_1 = vdivq_f32(alpha_scale, alpha_f32_1); let recip_alpha_u16_1 = vmovn_u32(vcvtaq_u32_f32(recip_alpha_f32_1)); let alpha_f32_2 = vcvtq_f32_u32(vreinterpretq_u32_u8(vzip1q_u8(alpha_u16_hi, zero))); let recip_alpha_f32_2 = vdivq_f32(alpha_scale, alpha_f32_2); let recip_alpha_u16_2 = vmovn_u32(vcvtaq_u32_f32(recip_alpha_f32_2)); let alpha_f32_3 = vcvtq_f32_u32(vreinterpretq_u32_u8(vzip2q_u8(alpha_u16_hi, zero))); let recip_alpha_f32_3 = vdivq_f32(alpha_scale, alpha_f32_3); let recip_alpha_u16_3 = vmovn_u32(vcvtaq_u32_f32(recip_alpha_f32_3)); let recip_alpha = uint16x8x2_t( vcombine_u16(recip_alpha_u16_0, recip_alpha_u16_1), vcombine_u16(recip_alpha_u16_2, recip_alpha_u16_3), ); pixels.0 = neon_utils::mul_color_recip_alpha_u8x16(pixels.0, recip_alpha, zero); pixels.0 = vandq_u8(pixels.0, nonzero_alpha_mask); pixels.1 = neon_utils::mul_color_recip_alpha_u8x16(pixels.1, recip_alpha, zero); pixels.1 = vandq_u8(pixels.1, nonzero_alpha_mask); pixels.2 = neon_utils::mul_color_recip_alpha_u8x16(pixels.2, recip_alpha, zero); pixels.2 = vandq_u8(pixels.2, nonzero_alpha_mask); pixels } #[inline(always)] unsafe fn divide_alpha_8_pixels(mut pixels: uint8x8x4_t) -> uint8x8x4_t { let zero_u8x8 = vdup_n_u8(0); let zero_u8x16 = vdupq_n_u8(0); let alpha_scale = vdupq_n_f32(255.0 * 256.0); let nonzero_alpha_mask = vmvn_u8(vceqz_u8(pixels.3)); let alpha_u16_lo = vzip1_u8(pixels.3, zero_u8x8); let alpha_u16_hi = vzip2_u8(pixels.3, zero_u8x8); let alpha_u16 = vcombine_u8(alpha_u16_lo, alpha_u16_hi); let alpha_f32_0 = vcvtq_f32_u32(vreinterpretq_u32_u8(vzip1q_u8(alpha_u16, zero_u8x16))); let recip_alpha_f32_0 = vdivq_f32(alpha_scale, alpha_f32_0); let recip_alpha_u16_0 = vmovn_u32(vcvtaq_u32_f32(recip_alpha_f32_0)); let 
alpha_f32_1 = vcvtq_f32_u32(vreinterpretq_u32_u8(vzip2q_u8(alpha_u16, zero_u8x16))); let recip_alpha_f32_1 = vdivq_f32(alpha_scale, alpha_f32_1); let recip_alpha_u16_1 = vmovn_u32(vcvtaq_u32_f32(recip_alpha_f32_1)); let recip_alpha = vcombine_u16(recip_alpha_u16_0, recip_alpha_u16_1); pixels.0 = neon_utils::mul_color_recip_alpha_u8x8(pixels.0, recip_alpha, zero_u8x8); pixels.0 = vand_u8(pixels.0, nonzero_alpha_mask); pixels.1 = neon_utils::mul_color_recip_alpha_u8x8(pixels.1, recip_alpha, zero_u8x8); pixels.1 = vand_u8(pixels.1, nonzero_alpha_mask); pixels.2 = neon_utils::mul_color_recip_alpha_u8x8(pixels.2, recip_alpha, zero_u8x8); pixels.2 = vand_u8(pixels.2, nonzero_alpha_mask); pixels } fast_image_resize-5.3.0/src/alpha/u8x4/sse4.rs000064400000000000000000000200541046102023000172220ustar 00000000000000use std::arch::x86_64::*; use crate::pixels::U8x4; use crate::utils::foreach_with_pre_reading; use crate::{ImageView, ImageViewMut}; use super::native; #[target_feature(enable = "sse4.1")] pub(crate) unsafe fn multiply_alpha( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, ) { let src_rows = src_view.iter_rows(0); let dst_rows = dst_view.iter_rows_mut(0); let rows = src_rows.zip(dst_rows); for (src_row, dst_row) in rows { multiply_alpha_row(src_row, dst_row); } } #[target_feature(enable = "sse4.1")] pub(crate) unsafe fn multiply_alpha_inplace(image_view: &mut impl ImageViewMut) { let rows = image_view.iter_rows_mut(0); for row in rows { multiply_alpha_row_inplace(row); } } #[inline] #[target_feature(enable = "sse4.1")] pub(crate) unsafe fn multiply_alpha_row(src_row: &[U8x4], dst_row: &mut [U8x4]) { let src_chunks = src_row.chunks_exact(4); let src_remainder = src_chunks.remainder(); let mut dst_chunks = dst_row.chunks_exact_mut(4); let src_dst = src_chunks.zip(&mut dst_chunks); foreach_with_pre_reading( src_dst, |(src, dst)| { let pixels = _mm_loadu_si128(src.as_ptr() as *const __m128i); let dst_ptr = dst.as_mut_ptr() as *mut __m128i; (pixels, dst_ptr) }, |(mut pixels, dst_ptr)| { pixels = multiply_alpha_4_pixels(pixels); _mm_storeu_si128(dst_ptr, pixels); }, ); if !src_remainder.is_empty() { let dst_reminder = dst_chunks.into_remainder(); native::multiply_alpha_row(src_remainder, dst_reminder); } } #[inline] #[target_feature(enable = "sse4.1")] pub(crate) unsafe fn multiply_alpha_row_inplace(row: &mut [U8x4]) { let mut chunks = row.chunks_exact_mut(4); // Using a simple for-loop in this case is faster than implementation with pre-reading for chunk in &mut chunks { let mut pixels = _mm_loadu_si128(chunk.as_ptr() as *const __m128i); pixels = multiply_alpha_4_pixels(pixels); _mm_storeu_si128(chunk.as_mut_ptr() as *mut __m128i, pixels); } let tail = chunks.into_remainder(); if !tail.is_empty() { native::multiply_alpha_row_inplace(tail); } } #[inline] #[target_feature(enable = "sse4.1")] unsafe fn multiply_alpha_4_pixels(pixels: __m128i) -> __m128i { let zero = _mm_setzero_si128(); let half = _mm_set1_epi16(128); const MAX_A: i32 = 0xff000000u32 as i32; let max_alpha = _mm_set1_epi32(MAX_A); let factor_mask = _mm_set_epi8(15, 15, 15, 15, 11, 11, 11, 11, 7, 7, 7, 7, 3, 3, 3, 3); let factor_pixels = _mm_shuffle_epi8(pixels, factor_mask); let factor_pixels = _mm_or_si128(factor_pixels, max_alpha); let pix1 = _mm_unpacklo_epi8(pixels, zero); let factors = _mm_unpacklo_epi8(factor_pixels, zero); let pix1 = _mm_add_epi16(_mm_mullo_epi16(pix1, factors), half); let pix1 = _mm_add_epi16(pix1, _mm_srli_epi16::<8>(pix1)); let pix1 = _mm_srli_epi16::<8>(pix1); let pix2 = 
_mm_unpackhi_epi8(pixels, zero); let factors = _mm_unpackhi_epi8(factor_pixels, zero); let pix2 = _mm_add_epi16(_mm_mullo_epi16(pix2, factors), half); let pix2 = _mm_add_epi16(pix2, _mm_srli_epi16::<8>(pix2)); let pix2 = _mm_srli_epi16::<8>(pix2); _mm_packus_epi16(pix1, pix2) } // Divide #[target_feature(enable = "sse4.1")] pub(crate) unsafe fn divide_alpha( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, ) { let src_rows = src_view.iter_rows(0); let dst_rows = dst_view.iter_rows_mut(0); let rows = src_rows.zip(dst_rows); for (src_row, dst_row) in rows { divide_alpha_row(src_row, dst_row); } } #[target_feature(enable = "sse4.1")] pub(crate) unsafe fn divide_alpha_inplace(image_view: &mut impl ImageViewMut) { let rows = image_view.iter_rows_mut(0); for row in rows { divide_alpha_row_inplace(row); } } #[inline] #[target_feature(enable = "sse4.1")] pub(crate) unsafe fn divide_alpha_row(src_row: &[U8x4], dst_row: &mut [U8x4]) { let src_chunks = src_row.chunks_exact(4); let src_remainder = src_chunks.remainder(); let mut dst_chunks = dst_row.chunks_exact_mut(4); let src_dst = src_chunks.zip(&mut dst_chunks); foreach_with_pre_reading( src_dst, |(src, dst)| { let pixels = _mm_loadu_si128(src.as_ptr() as *const __m128i); let dst_ptr = dst.as_mut_ptr() as *mut __m128i; (pixels, dst_ptr) }, |(mut pixels, dst_ptr)| { pixels = divide_alpha_4_pixels(pixels); _mm_storeu_si128(dst_ptr, pixels); }, ); if !src_remainder.is_empty() { let dst_reminder = dst_chunks.into_remainder(); let mut src_buffer = [U8x4::new([0; 4]); 4]; src_buffer .iter_mut() .zip(src_remainder) .for_each(|(d, s)| *d = *s); let mut dst_buffer = [U8x4::new([0; 4]); 4]; let src_pixels = _mm_loadu_si128(src_buffer.as_ptr() as *const __m128i); let dst_pixels = divide_alpha_4_pixels(src_pixels); _mm_storeu_si128(dst_buffer.as_mut_ptr() as *mut __m128i, dst_pixels); dst_buffer .iter() .zip(dst_reminder) .for_each(|(s, d)| *d = *s); } } #[inline] #[target_feature(enable = "sse4.1")] pub(crate) unsafe fn divide_alpha_row_inplace(row: &mut [U8x4]) { let mut chunks = row.chunks_exact_mut(4); foreach_with_pre_reading( &mut chunks, |chunk| { let pixels = _mm_loadu_si128(chunk.as_ptr() as *const __m128i); let dst_ptr = chunk.as_mut_ptr() as *mut __m128i; (pixels, dst_ptr) }, |(mut pixels, dst_ptr)| { pixels = divide_alpha_4_pixels(pixels); _mm_storeu_si128(dst_ptr, pixels); }, ); let tail = chunks.into_remainder(); if !tail.is_empty() { let mut src_buffer = [U8x4::new([0; 4]); 4]; src_buffer .iter_mut() .zip(tail.iter()) .for_each(|(d, s)| *d = *s); let mut dst_buffer = [U8x4::new([0; 4]); 4]; let src_pixels = _mm_loadu_si128(src_buffer.as_ptr() as *const __m128i); let dst_pixels = divide_alpha_4_pixels(src_pixels); _mm_storeu_si128(dst_buffer.as_mut_ptr() as *mut __m128i, dst_pixels); dst_buffer.iter().zip(tail).for_each(|(s, d)| *d = *s); } } #[inline] #[target_feature(enable = "sse4.1")] unsafe fn divide_alpha_4_pixels(src_pixels: __m128i) -> __m128i { let zero = _mm_setzero_si128(); let alpha_mask = _mm_set1_epi32(0xff000000u32 as i32); let shuffle1 = _mm_set_epi8(5, 4, 5, 4, 5, 4, 5, 4, 1, 0, 1, 0, 1, 0, 1, 0); let shuffle2 = _mm_set_epi8(13, 12, 13, 12, 13, 12, 13, 12, 9, 8, 9, 8, 9, 8, 9, 8); let alpha_scale = _mm_set1_ps(255.0 * 256.0); let alpha_f32 = _mm_cvtepi32_ps(_mm_srli_epi32::<24>(src_pixels)); let recip_alpha_i32 = _mm_cvtps_epi32(_mm_div_ps(alpha_scale, alpha_f32)); // Recip alpha in Q8.8 format let recip_alpha_lo_q8_8 = _mm_shuffle_epi8(recip_alpha_i32, shuffle1); let recip_alpha_hi_q8_8 = 
_mm_shuffle_epi8(recip_alpha_i32, shuffle2); // Pixels components in format Q9.7 let components_lo_q9_7 = _mm_slli_epi16::<7>(_mm_unpacklo_epi8(src_pixels, zero)); let components_hi_q9_7 = _mm_slli_epi16::<7>(_mm_unpackhi_epi8(src_pixels, zero)); // Multiplied pixels components as i16. // // fn _mm_mulhrs_epi16(a: i16, b: i16) -> i16 { // let tmp: i32 = ((a as i32 * b as i32) >> 14) + 1; // (tmp >> 1) as i16 // } let max_value = _mm_set1_epi16(0xff); let res_components_lo_i16 = _mm_min_epu16( _mm_mulhrs_epi16(components_lo_q9_7, recip_alpha_lo_q8_8), max_value, ); let res_components_hi_i16 = _mm_min_epu16( _mm_mulhrs_epi16(components_hi_q9_7, recip_alpha_hi_q8_8), max_value, ); let alpha = _mm_and_si128(src_pixels, alpha_mask); let rgba = _mm_packus_epi16(res_components_lo_i16, res_components_hi_i16); _mm_blendv_epi8(rgba, alpha, alpha_mask) } fast_image_resize-5.3.0/src/alpha/u8x4/wasm32.rs000064400000000000000000000165441046102023000174710ustar 00000000000000use std::arch::wasm32::*; use crate::pixels::U8x4; use crate::wasm32_utils::u16x8_mul_add_shr16; use crate::{ImageView, ImageViewMut}; use super::native; pub(crate) unsafe fn multiply_alpha( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, ) { let src_rows = src_view.iter_rows(0); let dst_rows = dst_view.iter_rows_mut(0); for (src_row, dst_row) in src_rows.zip(dst_rows) { multiply_alpha_row(src_row, dst_row); } } pub(crate) unsafe fn multiply_alpha_inplace(image_view: &mut impl ImageViewMut) { for row in image_view.iter_rows_mut(0) { multiply_alpha_row_inplace(row); } } #[inline] #[target_feature(enable = "simd128")] unsafe fn multiply_alpha_row(src_row: &[U8x4], dst_row: &mut [U8x4]) { let src_chunks = src_row.chunks_exact(4); let src_remainder = src_chunks.remainder(); let mut dst_chunks = dst_row.chunks_exact_mut(4); let src_dst = src_chunks.zip(&mut dst_chunks); // A simple for-loop in this case is as fast as implementation with pre-reading for (src, dst) in src_dst { let mut pixels = v128_load(src.as_ptr() as *const v128); pixels = multiply_alpha_4_pixels(pixels); v128_store(dst.as_mut_ptr() as *mut v128, pixels); } if !src_remainder.is_empty() { let dst_reminder = dst_chunks.into_remainder(); native::multiply_alpha_row(src_remainder, dst_reminder); } } #[inline] #[target_feature(enable = "simd128")] unsafe fn multiply_alpha_row_inplace(row: &mut [U8x4]) { let mut chunks = row.chunks_exact_mut(4); // A simple for-loop in this case is as fast as implementation with pre-reading for chunk in &mut chunks { let mut pixels = v128_load(chunk.as_ptr() as *const v128); pixels = multiply_alpha_4_pixels(pixels); v128_store(chunk.as_mut_ptr() as *mut v128, pixels); } let tail = chunks.into_remainder(); if !tail.is_empty() { native::multiply_alpha_row_inplace(tail); } } #[inline] #[target_feature(enable = "simd128")] unsafe fn multiply_alpha_4_pixels(pixels: v128) -> v128 { const FACTOR_MASK: v128 = i8x16(3, 3, 3, 3, 7, 7, 7, 7, 11, 11, 11, 11, 15, 15, 15, 15); let max_alpha = u32x4_splat(0xff000000); let factor_pixels = v128_or(u8x16_swizzle(pixels, FACTOR_MASK), max_alpha); let half = u16x8_splat(128); let src_u16_lo = u16x8_extend_low_u8x16(pixels); let factors = u16x8_extend_low_u8x16(factor_pixels); let mut dst_u16_lo = u16x8_add(u16x8_mul(src_u16_lo, factors), half); dst_u16_lo = u16x8_add(dst_u16_lo, u16x8_shr(dst_u16_lo, 8)); dst_u16_lo = u16x8_shr(dst_u16_lo, 8); let src_u16_hi = u16x8_extend_high_u8x16(pixels); let factors = u16x8_extend_high_u8x16(factor_pixels); let mut dst_u16_hi = u16x8_add(u16x8_mul(src_u16_hi, 
factors), half); dst_u16_hi = u16x8_add(dst_u16_hi, u16x8_shr(dst_u16_hi, 8)); dst_u16_hi = u16x8_shr(dst_u16_hi, 8); u8x16_narrow_i16x8(dst_u16_lo, dst_u16_hi) } // Divide pub(crate) unsafe fn divide_alpha( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, ) { let src_rows = src_view.iter_rows(0); let dst_rows = dst_view.iter_rows_mut(0); for (src_row, dst_row) in src_rows.zip(dst_rows) { divide_alpha_row(src_row, dst_row); } } pub(crate) unsafe fn divide_alpha_inplace(image_view: &mut impl ImageViewMut) { for row in image_view.iter_rows_mut(0) { divide_alpha_row_inplace(row); } } #[inline] #[target_feature(enable = "simd128")] unsafe fn divide_alpha_row(src_row: &[U8x4], dst_row: &mut [U8x4]) { let src_chunks = src_row.chunks_exact(4); let src_remainder = src_chunks.remainder(); let mut dst_chunks = dst_row.chunks_exact_mut(4); let src_dst = src_chunks.zip(&mut dst_chunks); // A simple for-loop in this case is faster than implementation with pre-reading for (src, dst) in src_dst { let mut pixels = v128_load(src.as_ptr() as *const v128); pixels = divide_alpha_4_pixels(pixels); v128_store(dst.as_mut_ptr() as *mut v128, pixels); } if !src_remainder.is_empty() { let dst_reminder = dst_chunks.into_remainder(); let mut src_buffer = [U8x4::new([0; 4]); 4]; src_buffer .iter_mut() .zip(src_remainder) .for_each(|(d, s)| *d = *s); let mut dst_buffer = [U8x4::new([0; 4]); 4]; let src_pixels = v128_load(src_buffer.as_ptr() as *const v128); let dst_pixels = divide_alpha_4_pixels(src_pixels); v128_store(dst_buffer.as_mut_ptr() as *mut v128, dst_pixels); dst_buffer .iter() .zip(dst_reminder) .for_each(|(s, d)| *d = *s); } } #[inline] #[target_feature(enable = "simd128")] unsafe fn divide_alpha_row_inplace(row: &mut [U8x4]) { let mut chunks = row.chunks_exact_mut(4); // A simple for-loop in this case is faster than implementation with pre-reading for chunk in &mut chunks { let mut pixels = v128_load(chunk.as_ptr() as *const v128); pixels = divide_alpha_4_pixels(pixels); v128_store(chunk.as_mut_ptr() as *mut v128, pixels); } let tail = chunks.into_remainder(); if !tail.is_empty() { let mut src_buffer = [U8x4::new([0; 4]); 4]; src_buffer .iter_mut() .zip(tail.iter()) .for_each(|(d, s)| *d = *s); let mut dst_buffer = [U8x4::new([0; 4]); 4]; let src_pixels = v128_load(src_buffer.as_ptr() as *const v128); let dst_pixels = divide_alpha_4_pixels(src_pixels); v128_store(dst_buffer.as_mut_ptr() as *mut v128, dst_pixels); dst_buffer.iter().zip(tail).for_each(|(s, d)| *d = *s); } } #[inline] #[target_feature(enable = "simd128")] unsafe fn divide_alpha_4_pixels(pixels: v128) -> v128 { const FACTOR_LO_SHUFFLE: v128 = i8x16(0, 1, 0, 1, 0, 1, -1, -1, 2, 3, 2, 3, 2, 3, -1, -1); const FACTOR_HI_SHUFFLE: v128 = i8x16(4, 5, 4, 5, 4, 5, -1, -1, 6, 7, 6, 7, 6, 7, -1, -1); let alpha_scale = f32x4_splat(255.0 * 256.0); let alpha_f32 = f32x4_convert_i32x4(u32x4_shr(pixels, 24)); // In case of zero division the result will be u32::MAX or 0. let scaled_alpha_u32 = u32x4_trunc_sat_f32x4(f32x4_add( f32x4_div(alpha_scale, alpha_f32), f32x4_splat(0.5), )); // All u32::MAX values in arguments will interpreted as -1i32. // u16x8_narrow_i32x4() converts all negative values into 0. 
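// Illustrative sanity check of the fixed-point math above (assuming the
// `u16x8_mul_add_shr16` helper computes `(a * b + c) >> 16` per lane, as its
// name suggests): for alpha = 128 the factor is round(255 * 256 / 128) = 510,
// a component value of 64 is held as 64 << 8 = 16384, and the result is
// (16384 * 510 + 0x8000) >> 16 = 128, i.e. round(64 * 255 / 128).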
let scaled_alpha_u16 = u16x8_narrow_i32x4(scaled_alpha_u32, scaled_alpha_u32); let factor_lo_u16x8 = u8x16_swizzle(scaled_alpha_u16, FACTOR_LO_SHUFFLE); let factor_hi_u16x8 = u8x16_swizzle(scaled_alpha_u16, FACTOR_HI_SHUFFLE); let zero = u32x4_splat(0); let src_u16_lo = u8x16_shuffle::<0, 16, 0, 17, 0, 18, 0, 19, 0, 20, 0, 21, 0, 22, 0, 23>(zero, pixels); let src_u16_hi = u8x16_shuffle::<0, 24, 0, 25, 0, 26, 0, 27, 0, 28, 0, 29, 0, 30, 0, 31>(zero, pixels); let color_max = u16x8_splat(0xff); let dst_lo = u16x8_mul_add_shr16(src_u16_lo, factor_lo_u16x8, u32x4_splat(0x8000)); let dst_lo = u16x8_min(dst_lo, color_max); let dst_hi = u16x8_mul_add_shr16(src_u16_hi, factor_hi_u16x8, u32x4_splat(0x8000)); let dst_hi = u16x8_min(dst_hi, color_max); let alpha = v128_and(pixels, u32x4_splat(0xff000000)); let rgb = u8x16_narrow_i16x8(dst_lo, dst_hi); v128_or(rgb, alpha) } fast_image_resize-5.3.0/src/array_chunks.rs000064400000000000000000000165021046102023000171430ustar 00000000000000use core::array::IntoIter; use core::iter::Take; use core::iter::{FusedIterator, Iterator}; use std::mem::MaybeUninit; /// An iterator over `N` elements of the iterator at a time. /// /// The chunks do not overlap. If `N` does not divide the length of the /// iterator, then the last up to `N-1` elements will be omitted. #[derive(Debug, Clone)] #[must_use = "iterators are lazy and do nothing unless consumed"] pub struct ArrayChunks { iter: I, remainder: Option>>, } impl ArrayChunks where I: Iterator, { pub fn new(iter: I) -> Self { assert_ne!(N, 0, "chunk size must be non-zero"); Self { iter, remainder: None, } } /// Returns an iterator over the remaining elements of the original iterator /// that are not going to be returned by this iterator. The returned /// iterator will yield at most `N-1` elements. #[inline] pub fn into_remainder(self) -> Option>> { self.remainder } } impl Iterator for ArrayChunks where I: Iterator, { type Item = [I::Item; N]; #[inline] fn next(&mut self) -> Option { match next_chunk(&mut self.iter) { Ok(chunk) => Some(chunk), Err(remainder) => { // Make sure to not override `self.remainder` with an empty array // when `next` is called after `ArrayChunks` exhaustion. self.remainder.get_or_insert(remainder); None } } // self.try_for_each(ControlFlow::Break).break_value() } #[inline] fn size_hint(&self) -> (usize, Option) { let (lower, upper) = self.iter.size_hint(); (lower / N, upper.map(|n| n / N)) } #[inline] fn count(self) -> usize { self.iter.count() / N } } #[inline] fn next_chunk( iter: &mut I, ) -> Result<[I::Item; N], Take>> { iter_next_chunk(iter) } impl FusedIterator for ArrayChunks where I: FusedIterator {} impl ExactSizeIterator for ArrayChunks where I: ExactSizeIterator, { #[inline] fn len(&self) -> usize { self.iter.len() / N } } /// Pulls `N` items from `iter` and returns them as an array. If the iterator /// yields fewer than `N` items, `Err` is returned containing an iterator over /// the already yielded items. /// /// Since the iterator is passed as a mutable reference and this function calls /// `next` at most `N` times, the iterator can still be used afterwards to /// retrieve the remaining items. /// /// If `iter.next()` panicks, all items already yielded by the iterator are /// dropped. /// /// Used for [`Iterator::next_chunk`]. #[inline] fn iter_next_chunk( iter: &mut impl Iterator, ) -> Result<[T; N], Take>> { let mut array = uninit_array::(); let r = iter_next_chunk_erased(&mut array, iter); match r { Ok(()) => { // SAFETY: All elements of `array` were populated. 
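// Note: `array_assume_init` below appears to mirror the unstable
// `MaybeUninit::array_assume_init` API from the standard library,
// reimplemented locally so this code builds on stable Rust.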
Ok(unsafe { array_assume_init(array) }) } Err(initialized) => { // SAFETY: Only the first `initialized` elements were populated let array = unsafe { array_assume_init(array) }; Err(array.into_iter().take(initialized)) // Err(unsafe { IntoIter::new_unchecked(array, 0..initialized) }) } } } #[inline(always)] const fn uninit_array() -> [MaybeUninit; N] { // SAFETY: An uninitialized `[MaybeUninit<_>; LEN]` is valid. unsafe { MaybeUninit::<[MaybeUninit; N]>::uninit().assume_init() } } #[inline(always)] unsafe fn array_assume_init(array: [MaybeUninit; N]) -> [T; N] { // SAFETY: // * The caller guarantees that all elements of the array are initialized // * `MaybeUninit` and T are guaranteed to have the same layout // * `MaybeUninit` does not drop, so there are no double-frees // And thus the conversion is safe let ret = unsafe { // core::intrinsics::assert_inhabited::<[T; N]>(); (&array as *const _ as *const [T; N]).read() }; // FIXME: required to avoid `~const Destruct` bound core::mem::forget(array); ret } /// Version of [`iter_next_chunk`] using a passed-in slice in order to avoid /// needing to monomorphize for every array length. /// /// Unfortunately this loop has two exit conditions, the buffer filling up /// or the iterator running out of items, making it tend to optimize poorly. #[inline] fn iter_next_chunk_erased( buffer: &mut [MaybeUninit], iter: &mut impl Iterator, ) -> Result<(), usize> { let mut guard = Guard { array_mut: buffer, initialized: 0, }; while guard.initialized < guard.array_mut.len() { let Some(item) = iter.next() else { // Unlike `try_from_fn_erased`, we want to keep the partial results, // so we need to defuse the guard instead of using `?`. let initialized = guard.initialized; core::mem::forget(guard); return Err(initialized); }; // SAFETY: The loop condition ensures we have space to push the item unsafe { guard.push_unchecked(item) }; } core::mem::forget(guard); Ok(()) } /// Panic guard for incremental initialization of arrays. /// /// Disarm the guard with `mem::forget` once the array has been initialized. /// /// # Safety /// /// All write accesses to this structure are unsafe and must maintain a correct /// count of `initialized` elements. /// /// To minimize indirection fields are still pub but callers should at least use /// `push_unchecked` to signal that something unsafe is going on. struct Guard<'a, T> { /// The array to be initialized. pub array_mut: &'a mut [MaybeUninit], /// The number of items that have been initialized so far. pub initialized: usize, } impl Guard<'_, T> { /// Adds an item to the array and updates the initialized item counter. /// /// # Safety /// /// No more than N elements must be initialized. #[inline] pub unsafe fn push_unchecked(&mut self, item: T) { // SAFETY: If `initialized` was correct before and the caller does not // invoke this method more than N times then writes will be in-bounds // and slots will not be initialized more than once. unsafe { self.array_mut .get_unchecked_mut(self.initialized) .write(item); self.initialized += 1; } } } impl Drop for Guard<'_, T> { fn drop(&mut self) { debug_assert!(self.initialized <= self.array_mut.len()); // SAFETY: this slice will contain only initialized objects. 
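// Dropping only the initialized prefix here is what provides the panic
// safety promised by `iter_next_chunk`: if `iter.next()` panics mid-fill,
// the items already written into the buffer are still dropped exactly once.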
unsafe { core::ptr::drop_in_place(slice_assume_init_mut( self.array_mut.get_unchecked_mut(..self.initialized), )); } } } #[inline(always)] unsafe fn slice_assume_init_mut(slice: &mut [MaybeUninit]) -> &mut [T] { // SAFETY: similar to safety notes for `slice_get_ref`, but we have a // mutable reference which is also guaranteed to be valid for writes. unsafe { &mut *(slice as *mut [MaybeUninit] as *mut [T]) } } fast_image_resize-5.3.0/src/change_components_type.rs000064400000000000000000000122521046102023000212030ustar 00000000000000use crate::pixels::{ F32x2, F32x3, F32x4, InnerPixel, IntoPixelComponent, U16x2, U16x3, U16x4, U8x2, U8x3, U8x4, F32, I32, U16, U8, }; use crate::{ try_pixel_type, DifferentDimensionsError, ImageView, ImageViewMut, IntoImageView, IntoImageViewMut, MappingError, PixelTrait, PixelType, }; pub fn change_type_of_pixel_components( src_image: &impl IntoImageView, dst_image: &mut impl IntoImageViewMut, ) -> Result<(), MappingError> { macro_rules! map_dst { ( $src_pt:ty, $dst_type:expr, $(($dst_enum:path, $dst_pt:ty)),* ) => { match $dst_type { $( $dst_enum => change_components_type::<$src_pt, $dst_pt>(src_image, dst_image), )* _ => Err(MappingError::UnsupportedCombinationOfImageTypes), } } } let src_pixel_type = try_pixel_type(src_image)?; let dst_pixel_type = try_pixel_type(dst_image)?; use PixelType as PT; #[cfg(not(feature = "only_u8x4"))] match src_pixel_type { PixelType::U8 => map_dst!( U8, dst_pixel_type, (PT::U8, U8), (PT::U16, U16), (PT::I32, I32), (PT::F32, F32) ), PixelType::U8x2 => map_dst!( U8x2, dst_pixel_type, (PT::U8x2, U8x2), (PT::U16x2, U16x2), (PT::F32x2, F32x2) ), PixelType::U8x3 => map_dst!( U8x3, dst_pixel_type, (PT::U8x3, U8x3), (PT::U16x3, U16x3), (PT::F32x3, F32x3) ), PixelType::U8x4 => map_dst!( U8x4, dst_pixel_type, (PT::U8x4, U8x4), (PT::U16x4, U16x4), (PT::F32x4, F32x4) ), PixelType::U16 => map_dst!( U16, dst_pixel_type, (PT::U8, U8), (PT::U16, U16), (PT::I32, I32), (PT::F32, F32) ), PixelType::U16x2 => map_dst!( U16x2, dst_pixel_type, (PT::U8x2, U8x2), (PT::U16x2, U16x2), (PT::F32x2, F32x2) ), PixelType::U16x3 => map_dst!( U16x3, dst_pixel_type, (PT::U8x3, U8x3), (PT::U16x3, U16x3), (PT::F32x3, F32x3) ), PixelType::U16x4 => map_dst!( U16x4, dst_pixel_type, (PT::U8x4, U8x4), (PT::U16x4, U16x4), (PT::F32x4, F32x4) ), PixelType::I32 => map_dst!( I32, dst_pixel_type, (PT::U8, U8), (PT::U16, U16), (PT::I32, I32), (PT::F32, F32) ), PixelType::F32 => map_dst!( F32, dst_pixel_type, (PT::U8, U8), (PT::U16, U16), (PT::I32, I32), (PT::F32, F32) ), PixelType::F32x2 => map_dst!( F32x2, dst_pixel_type, (PT::U8x2, U8x2), (PT::U16x2, U16x2), (PT::F32x2, F32x2) ), PixelType::F32x3 => map_dst!( F32x3, dst_pixel_type, (PT::U8x3, U8x3), (PT::U16x3, U16x3), (PT::F32x3, F32x3) ), PixelType::F32x4 => map_dst!( F32x4, dst_pixel_type, (PT::U8x4, U8x4), (PT::U16x4, U16x4), (PT::F32x4, F32x4) ), } #[cfg(feature = "only_u8x4")] match src_pixel_type { PixelType::U8x4 => map_dst!(U8x4, dst_pixel_type, (PT::U8x4, U8x4)), _ => Err(MappingError::UnsupportedCombinationOfImageTypes), } } #[inline(always)] fn change_components_type( src_image: &impl IntoImageView, dst_image: &mut impl IntoImageViewMut, ) -> Result<(), MappingError> where S: PixelTrait, D: PixelTrait, ::Component: IntoPixelComponent<::Component>, { match (src_image.image_view::(), dst_image.image_view_mut::()) { (Some(src_view), Some(mut dst_view)) => { change_type_of_pixel_components_typed(&src_view, &mut dst_view).map_err(|e| e.into()) } _ => Err(MappingError::UnsupportedCombinationOfImageTypes), } } pub fn 
change_type_of_pixel_components_typed( src_image: &impl ImageView, dst_image: &mut impl ImageViewMut, ) -> Result<(), DifferentDimensionsError> where S: InnerPixel, D: InnerPixel, ::Component: IntoPixelComponent<::Component>, { if src_image.width() != dst_image.width() || src_image.height() != dst_image.height() { return Err(DifferentDimensionsError); } for (s_row, d_row) in src_image.iter_rows(0).zip(dst_image.iter_rows_mut(0)) { let s_components = S::components(s_row); let d_components = D::components_mut(d_row); for (&s_comp, d_comp) in s_components.iter().zip(d_components) { *d_comp = s_comp.into_component(); } } Ok(()) } fast_image_resize-5.3.0/src/color/mappers.rs000064400000000000000000000022351046102023000172350ustar 00000000000000use crate::PixelComponentMapper; fn gamma_into_linear(input: f32) -> f32 { input.powf(2.2) } fn linear_into_gamma(input: f32) -> f32 { input.powf(1.0 / 2.2) } /// Create mapper to convert an image from Gamma 2.2 to linear colorspace and back. pub fn create_gamma_22_mapper() -> PixelComponentMapper { PixelComponentMapper::new(gamma_into_linear, linear_into_gamma) } /// https://en.wikipedia.org/wiki/SRGB#From_sRGB_to_CIE_XYZ /// http://www.ericbrasseur.org/gamma.html?i=2#formulas fn srgb_to_linear(input: f32) -> f32 { if input < 0.04045 { input / 12.92 } else { const A: f32 = 0.055; ((input + A) / (1. + A)).powf(2.4) } } /// https://en.wikipedia.org/wiki/SRGB#From_CIE_XYZ_to_sRGB /// http://www.ericbrasseur.org/gamma.html?i=2#formulas fn linear_to_srgb(input: f32) -> f32 { if input < 0.0031308 { 12.92 * input } else { const A: f32 = 0.055; (1. + A) * input.powf(1. / 2.4) - A } } /// Create mapper to convert an image from sRGB to linear RGB colorspace and back. pub fn create_srgb_mapper() -> PixelComponentMapper { PixelComponentMapper::new(srgb_to_linear, linear_to_srgb) } fast_image_resize-5.3.0/src/color/mod.rs000064400000000000000000000305051046102023000163460ustar 00000000000000//! Functions and structs for working with colorspace and gamma. 
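//!
//! A minimal end-to-end sketch (assuming `create_srgb_mapper` is re-exported
//! at the crate root, as `PixelComponentMapper` is, and using the `Image`
//! container from the `images` module):
//!
//! ```no_run
//! use fast_image_resize::images::Image;
//! use fast_image_resize::{create_srgb_mapper, PixelType};
//!
//! let mapper = create_srgb_mapper();
//! let src = Image::new(64, 64, PixelType::U8x3);
//! let mut linear = Image::new(64, 64, PixelType::U8x3);
//! // Convert sRGB-encoded pixels into linear RGB before processing...
//! mapper.forward_map(&src, &mut linear).unwrap();
//! // ...process the linear image (resize, etc.)...
//! // ...and convert the result back into sRGB afterwards.
//! mapper.backward_map_inplace(&mut linear).unwrap();
//! ```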
use num_traits::bounds::UpperBounded; use num_traits::Zero; use crate::pixels::{ GetCount, InnerPixel, IntoPixelComponent, PixelComponent, PixelType, U16x2, U16x3, U16x4, U8x2, U8x3, U8x4, Values, U16, U8, }; use crate::{ try_pixel_type, ImageView, ImageViewMut, IntoImageView, IntoImageViewMut, MappingError, PixelTrait, }; pub(crate) mod mappers; trait FromF32 { fn from_f32(x: f32) -> Self; } impl FromF32 for u8 { fn from_f32(x: f32) -> Self { x as Self } } impl FromF32 for u16 { fn from_f32(x: f32) -> Self { x as Self } } struct MappingTable([Out; SIZE]); impl MappingTable where Out: PixelComponent + Zero + UpperBounded + FromF32 + Into, { pub fn new(map_func: &F) -> Self where F: Fn(f32) -> f32, { let mut table: [Out; SIZE] = [Out::zero(); SIZE]; table.iter_mut().enumerate().for_each(|(input, output)| { let input_f32 = input as f32 / (SIZE - 1) as f32; *output = Out::from_f32((map_func(input_f32) * Out::max_value().into()).round()); }); Self(table) } fn map(&self, src_buffer: &[In], dst_buffer: &mut [Out]) where In: PixelComponent + Into, { for (&src, dst) in src_buffer.iter().zip(dst_buffer) { *dst = self.0[src.into()]; } } fn map_inplace(&self, buffer: &mut [Out]) where Out: Into, { for c in buffer.iter_mut() { let i: usize = (*c).into(); *c = self.0[i]; } } fn map_with_gaps(&self, src_buffer: &[In], dst_buffer: &mut [Out], gap_step: usize) where In: IntoPixelComponent + Into, { for (i, (&src, dst)) in src_buffer.iter().zip(dst_buffer).enumerate() { if (i + 1) % gap_step != 0 { *dst = self.0[src.into()]; } else { *dst = src.into_component(); } } } fn map_with_gaps_inplace(&self, buffer: &mut [Out], gap_step: usize) where Out: Into, { for (i, c) in buffer.iter_mut().enumerate() { if (i + 1) % gap_step != 0 { let i: usize = (*c).into(); *c = self.0[i]; } } } pub fn map_image( &self, src_image: &impl IntoImageView, dst_image: &mut impl IntoImageViewMut, ) -> Result<(), MappingError> where S: PixelTrait, ::Component: PixelComponent> + IntoPixelComponent + Into, D: PixelTrait, { let (src_view, dst_view) = match (src_image.image_view::(), dst_image.image_view_mut::()) { (Some(src_view), Some(dst_view)) => (src_view, dst_view), _ => return Err(MappingError::UnsupportedCombinationOfImageTypes), }; self.map_image_typed(src_view, dst_view); Ok(()) } pub fn map_image_typed( &self, src_view: impl ImageView, mut dst_view: impl ImageViewMut, ) where S: InnerPixel, ::Component: PixelComponent> + IntoPixelComponent + Into, D: InnerPixel, { for (s_row, d_row) in src_view.iter_rows(0).zip(dst_view.iter_rows_mut(0)) { let s_comp = S::components(s_row); let d_comp = D::components_mut(d_row); match S::CountOfComponents::count() { 2 => self.map_with_gaps(s_comp, d_comp, 2), // Don't map alpha channel 4 => self.map_with_gaps(s_comp, d_comp, 4), // Don't map alpha channel _ => self.map(s_comp, d_comp), } } } pub fn map_image_inplace( &self, image: &mut impl IntoImageViewMut, ) -> Result<(), MappingError> where Out: Into, S: PixelTrait, { if let Some(image_view) = image.image_view_mut::() { self.map_image_inplace_typed(image_view); Ok(()) } else { Err(MappingError::UnsupportedCombinationOfImageTypes) } } pub fn map_image_inplace_typed(&self, mut image_view: impl ImageViewMut) where Out: Into, S: InnerPixel, { for row in image_view.iter_rows_mut(0) { let comp = S::components_mut(row); match S::CountOfComponents::count() { 2 => self.map_with_gaps_inplace(comp, 2), // Don't map alpha channel 4 => self.map_with_gaps_inplace(comp, 4), // Don't map alpha channel _ => self.map_inplace(comp), } } } } struct 
MappingTablesGroup { u8_u8: Box>, u8_u16: Box>, u16_u8: Box>, u16_u16: Box>, } /// Mapper of pixel's components. /// /// This structure holds tables for mapping values of pixel's /// components in forward and backward directions. /// /// All pixel types except `I32` and `F32xN` are supported. /// /// Source and destination images may have different bit depth of one /// pixel component. /// But count of components must be equal. /// For example, you can convert `U8x3` image with sRGB colorspace into /// `U16x3` image with linear colorspace. /// /// Alpha channel from such pixel types as `U8x2`, `U8x4`, `U16x2` and `U16x4` /// is not mapped with tables. This component is transformed into destination /// component type with help of [IntoPixelComponent] trait. pub struct PixelComponentMapper { forward_mapping_tables: MappingTablesGroup, backward_mapping_tables: MappingTablesGroup, } impl PixelComponentMapper { /// Create an instance of the structure by filling its tables with /// given functions. /// /// Each function takes one argument with the value of the pixel component /// converted into `f32` in the range `[0.0, 1.0]`. /// The return value must also be `f32` in the range `[0.0, 1.0]`. /// /// Example: /// ``` /// # use fast_image_resize::PixelComponentMapper; /// # /// fn gamma_into_linear(input: f32) -> f32 { /// input.powf(2.2) /// } /// /// fn linear_into_gamma(input: f32) -> f32 { /// input.powf(1.0 / 2.2) /// } /// /// let gamma22_to_linear = PixelComponentMapper::new( /// gamma_into_linear, /// linear_into_gamma, /// ); /// ``` pub fn new(forward_map_func: FF, backward_map_func: BF) -> Self where FF: Fn(f32) -> f32, BF: Fn(f32) -> f32, { Self { forward_mapping_tables: MappingTablesGroup { u8_u8: Box::new(MappingTable::new(&forward_map_func)), u8_u16: Box::new(MappingTable::new(&forward_map_func)), u16_u8: Box::new(MappingTable::new(&forward_map_func)), u16_u16: Box::new(MappingTable::new(&forward_map_func)), }, backward_mapping_tables: MappingTablesGroup { u8_u8: Box::new(MappingTable::new(&backward_map_func)), u8_u16: Box::new(MappingTable::new(&backward_map_func)), u16_u8: Box::new(MappingTable::new(&backward_map_func)), u16_u16: Box::new(MappingTable::new(&backward_map_func)), }, } } fn map( tables: &MappingTablesGroup, src_image: &impl IntoImageView, dst_image: &mut impl IntoImageViewMut, ) -> Result<(), MappingError> { let src_pixel_type = try_pixel_type(src_image)?; let dst_pixel_type = try_pixel_type(dst_image)?; if src_image.width() != dst_image.width() || src_image.height() != dst_image.height() { return Err(MappingError::DifferentDimensions); } use PixelType as PT; macro_rules! 
match_img { ( $tables: ident, $(($p8: path, $pt8: tt, $p16: path, $pt16: tt),)* ) => { match (src_pixel_type, dst_pixel_type) { $( ($p8, $p8) => $tables.u8_u8.map_image::<$pt8, $pt8>( src_image, dst_image, ), ($p8, $p16) => $tables.u8_u16.map_image::<$pt8, $pt16>( src_image, dst_image, ), ($p16, $p8) => $tables.u16_u8.map_image::<$pt16, $pt8>( src_image, dst_image, ), ($p16, $p16) => $tables.u16_u16.map_image::<$pt16, $pt16>( src_image, dst_image, ), )* _ => return Err(MappingError::UnsupportedCombinationOfImageTypes), } }; } #[cfg(not(feature = "only_u8x4"))] { match_img!( tables, (PT::U8, U8, PT::U16, U16), (PT::U8x2, U8x2, PT::U16x2, U16x2), (PT::U8x3, U8x3, PT::U16x3, U16x3), (PT::U8x4, U8x4, PT::U16x4, U16x4), ) } #[cfg(feature = "only_u8x4")] match (src_pixel_type, dst_pixel_type) { (PT::U8x4, PT::U8x4) => tables.u8_u8.map_image::(src_image, dst_image), _ => return Err(MappingError::UnsupportedCombinationOfImageTypes), } } fn map_inplace( tables: &MappingTablesGroup, image: &mut impl IntoImageViewMut, ) -> Result<(), MappingError> { let pixel_type = try_pixel_type(image)?; use PixelType as PT; macro_rules! match_img { ( $tables: ident, $image: ident, $(($p8: path, $pt8: tt, $p16: path, $pt16: tt),)* ) => { match pixel_type { $( $p8 => $tables.u8_u8.map_image_inplace::<$pt8>($image), $p16 => $tables.u16_u16.map_image_inplace::<$pt16>($image), )* _ => return Err(MappingError::UnsupportedCombinationOfImageTypes), } }; } #[cfg(not(feature = "only_u8x4"))] { match_img!( tables, image, (PT::U8, U8, PT::U16, U16), (PT::U8x2, U8x2, PT::U16x2, U16x2), (PT::U8x3, U8x3, PT::U16x3, U16x3), (PT::U8x4, U8x4, PT::U16x4, U16x4), ) } #[cfg(feature = "only_u8x4")] match pixel_type { PT::U8x4 => tables.u8_u8.map_image_inplace::(image), _ => return Err(MappingError::UnsupportedCombinationOfImageTypes), } } /// Mapping in the forward direction of pixel's components of source image /// into corresponding components of destination image. pub fn forward_map( &self, src_image: &impl IntoImageView, dst_image: &mut impl IntoImageViewMut, ) -> Result<(), MappingError> { Self::map(&self.forward_mapping_tables, src_image, dst_image) } pub fn forward_map_inplace( &self, image: &mut impl IntoImageViewMut, ) -> Result<(), MappingError> { Self::map_inplace(&self.forward_mapping_tables, image) } /// Mapping in the backward direction of pixel's components of source image /// into corresponding components of destination image. 
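/// For mappers created by `create_gamma_22_mapper()` or `create_srgb_mapper()`,
/// this is the linear-to-gamma (or linear-to-sRGB) direction.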
pub fn backward_map( &self, src_image: &impl IntoImageView, dst_image: &mut impl IntoImageViewMut, ) -> Result<(), MappingError> { Self::map(&self.backward_mapping_tables, src_image, dst_image) } pub fn backward_map_inplace( &self, image: &mut impl IntoImageViewMut, ) -> Result<(), MappingError> { Self::map_inplace(&self.backward_mapping_tables, image) } } fast_image_resize-5.3.0/src/convolution/f32x1/avx2.rs000064400000000000000000000110501046102023000205450ustar 00000000000000use std::arch::x86_64::*; use crate::convolution::{Coefficients, CoefficientsChunk}; use crate::pixels::F32; use crate::{simd_utils, ImageView, ImageViewMut}; #[inline] pub(crate) fn horiz_convolution( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, offset: u32, coeffs: &Coefficients, ) { let coefficients_chunks = coeffs.get_chunks(); let dst_height = dst_view.height(); let src_iter = src_view.iter_4_rows(offset, dst_height + offset); let dst_iter = dst_view.iter_4_rows_mut(); for (src_rows, dst_rows) in src_iter.zip(dst_iter) { unsafe { horiz_convolution_rows(src_rows, dst_rows, &coefficients_chunks); } } let yy = dst_height - dst_height % 4; let src_rows = src_view.iter_rows(yy + offset); let dst_rows = dst_view.iter_rows_mut(yy); for (src_row, dst_row) in src_rows.zip(dst_rows) { unsafe { horiz_convolution_rows([src_row], [dst_row], &coefficients_chunks); } } } /// For safety, it is necessary to ensure the following conditions: /// - length of all rows in src_rows must be equal /// - length of all rows in dst_rows must be equal /// - coefficients_chunks.len() == dst_rows.0.len() /// - max(chunk.start + chunk.values.len() for chunk in coefficients_chunks) <= src_row.0.len() /// - precision <= MAX_COEFS_PRECISION #[target_feature(enable = "avx2")] unsafe fn horiz_convolution_rows( src_rows: [&[F32]; ROWS_COUNT], dst_rows: [&mut [F32]; ROWS_COUNT], coefficients_chunks: &[CoefficientsChunk], ) { let mut ll_buf = [0f64; 2]; for (dst_x, coeffs_chunk) in coefficients_chunks.iter().enumerate() { let mut x: usize = coeffs_chunk.start as usize; let mut sums = [_mm256_set1_pd(0.); ROWS_COUNT]; let mut coeffs = coeffs_chunk.values; let coeffs_by_8 = coeffs.chunks_exact(8); coeffs = coeffs_by_8.remainder(); for k in coeffs_by_8 { let coeff03_f64x4 = simd_utils::loadu_pd256(k, 0); let coeff47_f64x4 = simd_utils::loadu_pd256(k, 4); for i in 0..ROWS_COUNT { let mut sum = sums[i]; let source = simd_utils::loadu_ps256(src_rows[i], x); let pixels03_f64x4 = _mm256_cvtps_pd(_mm256_extractf128_ps::<0>(source)); sum = _mm256_add_pd(sum, _mm256_mul_pd(pixels03_f64x4, coeff03_f64x4)); let pixels47_f64x4 = _mm256_cvtps_pd(_mm256_extractf128_ps::<1>(source)); sum = _mm256_add_pd(sum, _mm256_mul_pd(pixels47_f64x4, coeff47_f64x4)); sums[i] = sum; } x += 8; } let coeffs_by_4 = coeffs.chunks_exact(4); coeffs = coeffs_by_4.remainder(); for k in coeffs_by_4 { let coeff03_f64x4 = simd_utils::loadu_pd256(k, 0); for i in 0..ROWS_COUNT { let mut sum = sums[i]; let source = simd_utils::loadu_ps(src_rows[i], x); let pixels03_f64x4 = _mm256_cvtps_pd(source); sum = _mm256_add_pd(sum, _mm256_mul_pd(pixels03_f64x4, coeff03_f64x4)); sums[i] = sum; } x += 4; } let coeffs_by_2 = coeffs.chunks_exact(2); coeffs = coeffs_by_2.remainder(); for k in coeffs_by_2 { let coeff01_f64x4 = _mm256_set_pd(0., 0., k[1], k[0]); for i in 0..ROWS_COUNT { let pixel0 = src_rows[i].get_unchecked(x).0; let pixel1 = src_rows[i].get_unchecked(x + 1).0; let pixel01_f64x4 = _mm256_set_pd(0., 0., pixel1 as f64, pixel0 as f64); sums[i] = _mm256_add_pd(sums[i], 
_mm256_mul_pd(pixel01_f64x4, coeff01_f64x4)); } x += 2; } if let Some(&k) = coeffs.first() { let coeff0_f64x4 = _mm256_set_pd(0., 0., 0., k); for i in 0..ROWS_COUNT { let pixel0 = src_rows[i].get_unchecked(x).0; let pixel0_f64x4 = _mm256_set_pd(0., 0., 0., pixel0 as f64); sums[i] = _mm256_add_pd(sums[i], _mm256_mul_pd(pixel0_f64x4, coeff0_f64x4)); } } for i in 0..ROWS_COUNT { let sum_f64x2 = _mm_add_pd( _mm256_extractf128_pd::<0>(sums[i]), _mm256_extractf128_pd::<1>(sums[i]), ); _mm_storeu_pd(ll_buf.as_mut_ptr(), sum_f64x2); let dst_pixel = dst_rows[i].get_unchecked_mut(dst_x); dst_pixel.0 = (ll_buf[0] + ll_buf[1]) as f32; } } } fast_image_resize-5.3.0/src/convolution/f32x1/mod.rs000064400000000000000000000052211046102023000204470ustar 00000000000000use super::{Coefficients, Convolution}; use crate::convolution::vertical_f32::vert_convolution_f32; use crate::cpu_extensions::CpuExtensions; use crate::pixels::F32; use crate::{ImageView, ImageViewMut}; #[cfg(target_arch = "x86_64")] mod avx2; mod native; // #[cfg(target_arch = "aarch64")] // mod neon; #[cfg(target_arch = "x86_64")] mod sse4; // #[cfg(target_arch = "wasm32")] // mod wasm32; type P = F32; impl Convolution for P { fn horiz_convolution( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, offset: u32, coeffs: Coefficients, cpu_extensions: CpuExtensions, ) { debug_assert!(src_view.height() - offset >= dst_view.height()); let coeffs_ref = &coeffs; try_process_in_threads_h! { horiz_convolution( src_view, dst_view, offset, coeffs_ref, cpu_extensions, ); } } fn vert_convolution( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, offset: u32, coeffs: Coefficients, cpu_extensions: CpuExtensions, ) { debug_assert!(src_view.width() - offset >= dst_view.width()); let coeffs_ref = &coeffs; try_process_in_threads_v! 
{ vert_convolution( src_view, dst_view, offset, coeffs_ref, cpu_extensions, ); } } } fn horiz_convolution( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, offset: u32, coeffs: &Coefficients, cpu_extensions: CpuExtensions, ) { match cpu_extensions { #[cfg(target_arch = "x86_64")] CpuExtensions::Avx2 => avx2::horiz_convolution(src_view, dst_view, offset, coeffs), #[cfg(target_arch = "x86_64")] CpuExtensions::Sse4_1 => sse4::horiz_convolution(src_view, dst_view, offset, coeffs), // #[cfg(target_arch = "aarch64")] // CpuExtensions::Neon => neon::horiz_convolution(src_view, dst_view, offset, coeffs), // #[cfg(target_arch = "wasm32")] // CpuExtensions::Simd128 => wasm32::horiz_convolution(src_view, dst_view, offset, coeffs), _ => native::horiz_convolution(src_view, dst_view, offset, coeffs), } } fn vert_convolution( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, offset: u32, coeffs: &Coefficients, cpu_extensions: CpuExtensions, ) { vert_convolution_f32(src_view, dst_view, offset, coeffs, cpu_extensions); } fast_image_resize-5.3.0/src/convolution/f32x1/native.rs000064400000000000000000000032351046102023000211610ustar 00000000000000use crate::convolution::Coefficients; use crate::pixels::F32; use crate::{ImageView, ImageViewMut}; pub(crate) fn horiz_convolution( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, offset: u32, coeffs: &Coefficients, ) { let coefficients_chunks = coeffs.get_chunks(); let src_rows = src_view.iter_rows(offset); let dst_rows = dst_view.iter_rows_mut(0); for (dst_row, src_row) in dst_rows.zip(src_rows) { for (dst_pixel, coeffs_chunk) in dst_row.iter_mut().zip(&coefficients_chunks) { let first_x_src = coeffs_chunk.start as usize; let end_x_src = first_x_src + coeffs_chunk.values.len(); let mut ss = 0.; let mut src_pixels = unsafe { src_row.get_unchecked(first_x_src..end_x_src) }; let mut coefs = coeffs_chunk.values; (coefs, src_pixels) = convolution_by_chunks::<8>(coefs, src_pixels, &mut ss); for (&k, &pixel) in coefs.iter().zip(src_pixels) { ss += pixel.0 as f64 * k; } dst_pixel.0 = ss as f32; } } } #[inline(always)] fn convolution_by_chunks<'a, 'b, const CHUNK_SIZE: usize>( coefs: &'a [f64], src_pixels: &'b [F32], ss: &mut f64, ) -> (&'a [f64], &'b [F32]) { let coef_chunks = coefs.chunks_exact(CHUNK_SIZE); let coefs = coef_chunks.remainder(); let pixel_chunks = src_pixels.chunks_exact(CHUNK_SIZE); let src_pixels = pixel_chunks.remainder(); for (ks, pixels) in coef_chunks.zip(pixel_chunks) { for (&k, &pixel) in ks.iter().zip(pixels) { *ss += pixel.0 as f64 * k; } } (coefs, src_pixels) } fast_image_resize-5.3.0/src/convolution/f32x1/sse4.rs000064400000000000000000000073201046102023000205500ustar 00000000000000use std::arch::x86_64::*; use crate::convolution::{Coefficients, CoefficientsChunk}; use crate::pixels::F32; use crate::{simd_utils, ImageView, ImageViewMut}; #[inline] pub(crate) fn horiz_convolution( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, offset: u32, coeffs: &Coefficients, ) { let coefficients_chunks = coeffs.get_chunks(); let dst_height = dst_view.height(); let src_iter = src_view.iter_4_rows(offset, dst_height + offset); let dst_iter = dst_view.iter_4_rows_mut(); for (src_rows, dst_rows) in src_iter.zip(dst_iter) { unsafe { horiz_convolution_rows(src_rows, dst_rows, &coefficients_chunks); } } let yy = dst_height - dst_height % 4; let src_rows = src_view.iter_rows(yy + offset); let dst_rows = dst_view.iter_rows_mut(yy); for (src_row, dst_row) in src_rows.zip(dst_rows) { unsafe { 
horiz_convolution_rows([src_row], [dst_row], &coefficients_chunks); } } } /// For safety, it is necessary to ensure the following conditions: /// - length of all rows in src_rows must be equal /// - length of all rows in dst_rows must be equal /// - coefficients_chunks.len() == dst_rows.0.len() /// - max(chunk.start + chunk.values.len() for chunk in coefficients_chunks) <= src_row.0.len() /// - precision <= MAX_COEFS_PRECISION #[target_feature(enable = "sse4.1")] unsafe fn horiz_convolution_rows( src_rows: [&[F32]; ROWS_COUNT], dst_rows: [&mut [F32]; ROWS_COUNT], coefficients_chunks: &[CoefficientsChunk], ) { let mut ll_buf = [0f64; 2]; for (dst_x, coeffs_chunk) in coefficients_chunks.iter().enumerate() { let mut x: usize = coeffs_chunk.start as usize; let mut sums = [_mm_set1_pd(0.); ROWS_COUNT]; let mut coeffs = coeffs_chunk.values; let coeffs_by_4 = coeffs.chunks_exact(4); coeffs = coeffs_by_4.remainder(); for k in coeffs_by_4 { let coeff01_f64x2 = simd_utils::loadu_pd(k, 0); let coeff23_f64x2 = simd_utils::loadu_pd(k, 2); for i in 0..ROWS_COUNT { let mut sum = sums[i]; let source = simd_utils::loadu_ps(src_rows[i], x); let pixel01_f64 = _mm_cvtps_pd(source); sum = _mm_add_pd(sum, _mm_mul_pd(pixel01_f64, coeff01_f64x2)); let pixel23_f64 = _mm_cvtps_pd(_mm_movehl_ps(source, source)); sum = _mm_add_pd(sum, _mm_mul_pd(pixel23_f64, coeff23_f64x2)); sums[i] = sum; } x += 4; } let coeffs_by_2 = coeffs.chunks_exact(2); coeffs = coeffs_by_2.remainder(); for k in coeffs_by_2 { let coeff01_f64x2 = simd_utils::loadu_pd(k, 0); for i in 0..ROWS_COUNT { let pixel0 = src_rows[i].get_unchecked(x).0; let pixel1 = src_rows[i].get_unchecked(x + 1).0; let pixel01_f64 = _mm_set_pd(pixel1 as f64, pixel0 as f64); sums[i] = _mm_add_pd(sums[i], _mm_mul_pd(pixel01_f64, coeff01_f64x2)); } x += 2; } if let Some(&k) = coeffs.first() { let coeff0_f64x2 = _mm_set1_pd(k); for i in 0..ROWS_COUNT { let pixel0 = src_rows[i].get_unchecked(x).0; let pixel0_f64 = _mm_set_pd(0., pixel0 as f64); sums[i] = _mm_add_pd(sums[i], _mm_mul_pd(pixel0_f64, coeff0_f64x2)); } } for i in 0..ROWS_COUNT { _mm_storeu_pd(ll_buf.as_mut_ptr(), sums[i]); let dst_pixel = dst_rows[i].get_unchecked_mut(dst_x); dst_pixel.0 = (ll_buf[0] + ll_buf[1]) as f32; } } } fast_image_resize-5.3.0/src/convolution/f32x2/avx2.rs000064400000000000000000000101641046102023000205530ustar 00000000000000use std::arch::x86_64::*; use crate::convolution::{Coefficients, CoefficientsChunk}; use crate::pixels::F32x2; use crate::{simd_utils, ImageView, ImageViewMut}; #[inline] pub(crate) fn horiz_convolution( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, offset: u32, coeffs: &Coefficients, ) { let coefficients_chunks = coeffs.get_chunks(); let dst_height = dst_view.height(); let src_iter = src_view.iter_4_rows(offset, dst_height + offset); let dst_iter = dst_view.iter_4_rows_mut(); for (src_rows, dst_rows) in src_iter.zip(dst_iter) { unsafe { horiz_convolution_rows(src_rows, dst_rows, &coefficients_chunks); } } let yy = dst_height - dst_height % 4; let src_rows = src_view.iter_rows(yy + offset); let dst_rows = dst_view.iter_rows_mut(yy); for (src_row, dst_row) in src_rows.zip(dst_rows) { unsafe { horiz_convolution_rows([src_row], [dst_row], &coefficients_chunks); } } } /// For safety, it is necessary to ensure the following conditions: /// - length of all rows in src_rows must be equal /// - length of all rows in dst_rows must be equal /// - coefficients_chunks.len() == dst_rows.0.len() /// - max(chunk.start + chunk.values.len() for chunk in 
coefficients_chunks) <= src_row.0.len() /// - precision <= MAX_COEFS_PRECISION #[target_feature(enable = "avx2")] unsafe fn horiz_convolution_rows( src_rows: [&[F32x2]; ROWS_COUNT], dst_rows: [&mut [F32x2]; ROWS_COUNT], coefficients_chunks: &[CoefficientsChunk], ) { let mut ll_buf = [0f64; 2]; for (dst_x, coeffs_chunk) in coefficients_chunks.iter().enumerate() { let mut x: usize = coeffs_chunk.start as usize; let mut ll_sum = [_mm256_set1_pd(0.); ROWS_COUNT]; let mut coeffs = coeffs_chunk.values; let coeffs_by_4 = coeffs.chunks_exact(4); coeffs = coeffs_by_4.remainder(); for k in coeffs_by_4 { let coeff0_f64x4 = _mm256_set_pd(k[1], k[1], k[0], k[0]); let coeff1_f64x4 = _mm256_set_pd(k[3], k[3], k[2], k[2]); for i in 0..ROWS_COUNT { let mut sum = ll_sum[i]; let pixels04_f32x8 = simd_utils::loadu_ps256(src_rows[i], x); let pixels01_f64x4 = _mm256_cvtps_pd(_mm256_extractf128_ps::<0>(pixels04_f32x8)); sum = _mm256_add_pd(sum, _mm256_mul_pd(pixels01_f64x4, coeff0_f64x4)); let pixels23_f64x4 = _mm256_cvtps_pd(_mm256_extractf128_ps::<1>(pixels04_f32x8)); sum = _mm256_add_pd(sum, _mm256_mul_pd(pixels23_f64x4, coeff1_f64x4)); ll_sum[i] = sum; } x += 4; } let coeffs_by_2 = coeffs.chunks_exact(2); coeffs = coeffs_by_2.remainder(); for k in coeffs_by_2 { let coeff_f64x4 = _mm256_set_pd(k[1], k[1], k[0], k[0]); for i in 0..ROWS_COUNT { let mut sum = ll_sum[i]; let pixels01_f32x4 = simd_utils::loadu_ps(src_rows[i], x); let pixels01_f64x4 = _mm256_cvtps_pd(pixels01_f32x4); sum = _mm256_add_pd(sum, _mm256_mul_pd(pixels01_f64x4, coeff_f64x4)); ll_sum[i] = sum; } x += 2; } if let Some(&k) = coeffs.first() { let coeff0_f64x4 = _mm256_set1_pd(k); for i in 0..ROWS_COUNT { let mut sum = ll_sum[i]; let pixel = src_rows[i].get_unchecked(x); let pixel0_f64x4 = _mm256_set_pd(0., 0., pixel.0[1] as f64, pixel.0[0] as f64); sum = _mm256_add_pd(sum, _mm256_mul_pd(pixel0_f64x4, coeff0_f64x4)); ll_sum[i] = sum; } } for i in 0..ROWS_COUNT { let sum_f64x2 = _mm_add_pd( _mm256_extractf128_pd::<0>(ll_sum[i]), _mm256_extractf128_pd::<1>(ll_sum[i]), ); _mm_storeu_pd(ll_buf.as_mut_ptr(), sum_f64x2); let dst_pixel = dst_rows[i].get_unchecked_mut(dst_x); dst_pixel.0 = ll_buf.map(|v| v as f32); } } } fast_image_resize-5.3.0/src/convolution/f32x2/mod.rs000064400000000000000000000052261046102023000204550ustar 00000000000000use crate::convolution::vertical_f32::vert_convolution_f32; use crate::cpu_extensions::CpuExtensions; use crate::pixels::F32x2; use crate::{ImageView, ImageViewMut}; use super::{Coefficients, Convolution}; #[cfg(target_arch = "x86_64")] mod avx2; mod native; // #[cfg(target_arch = "aarch64")] // mod neon; #[cfg(target_arch = "x86_64")] mod sse4; // #[cfg(target_arch = "wasm32")] // mod wasm32; type P = F32x2; impl Convolution for P { fn horiz_convolution( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, offset: u32, coeffs: Coefficients, cpu_extensions: CpuExtensions, ) { debug_assert!(src_view.height() - offset >= dst_view.height()); let coeffs_ref = &coeffs; try_process_in_threads_h! { horiz_convolution( src_view, dst_view, offset, coeffs_ref, cpu_extensions, ); } } fn vert_convolution( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, offset: u32, coeffs: Coefficients, cpu_extensions: CpuExtensions, ) { debug_assert!(src_view.width() - offset >= dst_view.width()); let coeffs_ref = &coeffs; try_process_in_threads_v! 
{ vert_convolution( src_view, dst_view, offset, coeffs_ref, cpu_extensions, ); } } } fn horiz_convolution( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, offset: u32, coeffs: &Coefficients, cpu_extensions: CpuExtensions, ) { match cpu_extensions { #[cfg(target_arch = "x86_64")] CpuExtensions::Avx2 => avx2::horiz_convolution(src_view, dst_view, offset, coeffs), #[cfg(target_arch = "x86_64")] CpuExtensions::Sse4_1 => sse4::horiz_convolution(src_view, dst_view, offset, coeffs), // #[cfg(target_arch = "aarch64")] // CpuExtensions::Neon => neon::horiz_convolution(src_view, dst_view, offset, coeffs), // #[cfg(target_arch = "wasm32")] // CpuExtensions::Simd128 => wasm32::horiz_convolution(src_view, dst_view, offset, coeffs), _ => native::horiz_convolution(src_view, dst_view, offset, coeffs), } } fn vert_convolution( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, offset: u32, coeffs: &Coefficients, cpu_extensions: CpuExtensions, ) { vert_convolution_f32(src_view, dst_view, offset, coeffs, cpu_extensions); } fast_image_resize-5.3.0/src/convolution/f32x2/native.rs000064400000000000000000000017771046102023000211730ustar 00000000000000use crate::convolution::Coefficients; use crate::pixels::F32x2; use crate::{ImageView, ImageViewMut}; pub(crate) fn horiz_convolution( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, offset: u32, coeffs: &Coefficients, ) { let coefficients_chunks = coeffs.get_chunks(); let src_rows = src_view.iter_rows(offset); let dst_rows = dst_view.iter_rows_mut(0); for (dst_row, src_row) in dst_rows.zip(src_rows) { for (dst_pixel, coeffs_chunk) in dst_row.iter_mut().zip(&coefficients_chunks) { let first_x_src = coeffs_chunk.start as usize; let mut ss = [0.; 2]; let src_pixels = unsafe { src_row.get_unchecked(first_x_src..) 
}; for (&k, &src_pixel) in coeffs_chunk.values.iter().zip(src_pixels) { for (s, c) in ss.iter_mut().zip(src_pixel.0) { *s += c as f64 * k; } } dst_pixel.0 = ss.map(|v| v as f32); } } } fast_image_resize-5.3.0/src/convolution/f32x2/sse4.rs000064400000000000000000000064301046102023000205520ustar 00000000000000use std::arch::x86_64::*; use crate::convolution::{Coefficients, CoefficientsChunk}; use crate::pixels::F32x2; use crate::{simd_utils, ImageView, ImageViewMut}; #[inline] pub(crate) fn horiz_convolution( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, offset: u32, coeffs: &Coefficients, ) { let coefficients_chunks = coeffs.get_chunks(); let dst_height = dst_view.height(); let src_iter = src_view.iter_4_rows(offset, dst_height + offset); let dst_iter = dst_view.iter_4_rows_mut(); for (src_rows, dst_rows) in src_iter.zip(dst_iter) { unsafe { horiz_convolution_rows(src_rows, dst_rows, &coefficients_chunks); } } let yy = dst_height - dst_height % 4; let src_rows = src_view.iter_rows(yy + offset); let dst_rows = dst_view.iter_rows_mut(yy); for (src_row, dst_row) in src_rows.zip(dst_rows) { unsafe { horiz_convolution_rows([src_row], [dst_row], &coefficients_chunks); } } } /// For safety, it is necessary to ensure the following conditions: /// - length of all rows in src_rows must be equal /// - length of all rows in dst_rows must be equal /// - coefficients_chunks.len() == dst_rows.0.len() /// - max(chunk.start + chunk.values.len() for chunk in coefficients_chunks) <= src_row.0.len() /// - precision <= MAX_COEFS_PRECISION #[target_feature(enable = "sse4.1")] unsafe fn horiz_convolution_rows( src_rows: [&[F32x2]; ROWS_COUNT], dst_rows: [&mut [F32x2]; ROWS_COUNT], coefficients_chunks: &[CoefficientsChunk], ) { let mut ll_buf = [0f64; 2]; for (dst_x, coeffs_chunk) in coefficients_chunks.iter().enumerate() { let mut x: usize = coeffs_chunk.start as usize; let mut ll_sum = [_mm_set1_pd(0.); ROWS_COUNT]; let mut coeffs = coeffs_chunk.values; let coeffs_by_2 = coeffs.chunks_exact(2); coeffs = coeffs_by_2.remainder(); for k in coeffs_by_2 { let coeff0_f64x2 = _mm_set1_pd(k[0]); let coeff1_f64x2 = _mm_set1_pd(k[1]); for i in 0..ROWS_COUNT { let mut sum = ll_sum[i]; let source = simd_utils::loadu_ps(src_rows[i], x); let pixel0_f64 = _mm_cvtps_pd(source); sum = _mm_add_pd(sum, _mm_mul_pd(pixel0_f64, coeff0_f64x2)); let pixel1_f64 = _mm_cvtps_pd(_mm_movehl_ps(source, source)); sum = _mm_add_pd(sum, _mm_mul_pd(pixel1_f64, coeff1_f64x2)); ll_sum[i] = sum; } x += 2; } if let Some(&k) = coeffs.first() { let coeff0_f64x2 = _mm_set1_pd(k); for i in 0..ROWS_COUNT { let mut sum = ll_sum[i]; let pixel = src_rows[i].get_unchecked(x); let source = _mm_set_ps(0., 0., pixel.0[1], pixel.0[0]); let pixel0_f64 = _mm_cvtps_pd(source); sum = _mm_add_pd(sum, _mm_mul_pd(pixel0_f64, coeff0_f64x2)); ll_sum[i] = sum; } } for i in 0..ROWS_COUNT { _mm_storeu_pd(ll_buf.as_mut_ptr(), ll_sum[i]); let dst_pixel = dst_rows[i].get_unchecked_mut(dst_x); dst_pixel.0 = ll_buf.map(|v| v as f32); } } } fast_image_resize-5.3.0/src/convolution/f32x3/avx2.rs000064400000000000000000000205531046102023000205570ustar 00000000000000use std::arch::x86_64::*; use crate::convolution::{Coefficients, CoefficientsChunk}; use crate::pixels::{F32x3, InnerPixel}; use crate::{simd_utils, ImageView, ImageViewMut}; #[inline] pub(crate) fn horiz_convolution( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, offset: u32, coeffs: &Coefficients, ) { let coefficients_chunks = coeffs.get_chunks(); let dst_height = dst_view.height(); let 
src_iter = src_view.iter_4_rows(offset, dst_height + offset); let dst_iter = dst_view.iter_4_rows_mut(); for (src_rows, dst_rows) in src_iter.zip(dst_iter) { unsafe { horiz_convolution_rows(src_rows, dst_rows, &coefficients_chunks); } } let yy = dst_height - dst_height % 4; let src_rows = src_view.iter_rows(yy + offset); let dst_rows = dst_view.iter_rows_mut(yy); for (src_row, dst_row) in src_rows.zip(dst_rows) { unsafe { horiz_convolution_rows([src_row], [dst_row], &coefficients_chunks); } } } /// For safety, it is necessary to ensure the following conditions: /// - length of all rows in src_rows must be equal /// - length of all rows in dst_rows must be equal /// - coefficients_chunks.len() == dst_rows.0.len() /// - max(chunk.start + chunk.values.len() for chunk in coefficients_chunks) <= src_row.0.len() /// - precision <= MAX_COEFS_PRECISION #[target_feature(enable = "avx2")] unsafe fn horiz_convolution_rows( src_rows: [&[F32x3]; ROWS_COUNT], dst_rows: [&mut [F32x3]; ROWS_COUNT], coefficients_chunks: &[CoefficientsChunk], ) { /* |R0 G0 B0| |R1 G1 B1| |R2 G2| |00 01 02| |03 04 05| |06 07| |B2| |R3 G3 B3| |R4 G4 B4| |R5| |00| |01 02 03| |04 05 06| |07| |G5 B5| |R6 G6 B6| |R7 G7 B7| |00 01| |02 03 04| |05 06 07| */ let mut rg_buf = [0f64; 2]; let mut br_buf = [0f64; 2]; let mut gb_buf = [0f64; 2]; for (dst_x, coeffs_chunk) in coefficients_chunks.iter().enumerate() { let mut x: usize = coeffs_chunk.start as usize; let mut rgbr_sums = [_mm256_set1_pd(0.); ROWS_COUNT]; let mut gbrg_sums = [_mm256_set1_pd(0.); ROWS_COUNT]; let mut brgb_sums = [_mm256_set1_pd(0.); ROWS_COUNT]; let mut coeffs = coeffs_chunk.values; let coeffs_by_8 = coeffs.chunks_exact(8); coeffs = coeffs_by_8.remainder(); for k in coeffs_by_8 { let coeff0001_f64x4 = _mm256_set_pd(k[1], k[0], k[0], k[0]); let coeff1122_f64x4 = _mm256_set_pd(k[2], k[2], k[1], k[1]); let coeff2333_f64x4 = _mm256_set_pd(k[3], k[3], k[3], k[2]); let coeff4445_f64x4 = _mm256_set_pd(k[5], k[4], k[4], k[4]); let coeff5566_f64x4 = _mm256_set_pd(k[6], k[6], k[5], k[5]); let coeff6777_f64x4 = _mm256_set_pd(k[7], k[7], k[7], k[6]); for i in 0..ROWS_COUNT { let c = x * 3; let components = F32x3::components(src_rows[i]); let rgb0rgb1rg2 = simd_utils::loadu_ps256(components, c); let rgb0r1_f64x4 = _mm256_cvtps_pd(_mm256_extractf128_ps::<0>(rgb0rgb1rg2)); rgbr_sums[i] = _mm256_add_pd(rgbr_sums[i], _mm256_mul_pd(rgb0r1_f64x4, coeff0001_f64x4)); let gb1rg2_f64x4 = _mm256_cvtps_pd(_mm256_extractf128_ps::<1>(rgb0rgb1rg2)); gbrg_sums[i] = _mm256_add_pd(gbrg_sums[i], _mm256_mul_pd(gb1rg2_f64x4, coeff1122_f64x4)); let b2rgb3rgb4r5 = simd_utils::loadu_ps256(components, c + 8); let b2rgb3_f64x4 = _mm256_cvtps_pd(_mm256_extractf128_ps::<0>(b2rgb3rgb4r5)); brgb_sums[i] = _mm256_add_pd(brgb_sums[i], _mm256_mul_pd(b2rgb3_f64x4, coeff2333_f64x4)); let rgb4r5_f64x4 = _mm256_cvtps_pd(_mm256_extractf128_ps::<1>(b2rgb3rgb4r5)); rgbr_sums[i] = _mm256_add_pd(rgbr_sums[i], _mm256_mul_pd(rgb4r5_f64x4, coeff4445_f64x4)); let gb5rgb6rgb7 = simd_utils::loadu_ps256(components, c + 16); let gb5rg6_f64x4 = _mm256_cvtps_pd(_mm256_extractf128_ps::<0>(gb5rgb6rgb7)); gbrg_sums[i] = _mm256_add_pd(gbrg_sums[i], _mm256_mul_pd(gb5rg6_f64x4, coeff5566_f64x4)); let b6rgb7_f64x4 = _mm256_cvtps_pd(_mm256_extractf128_ps::<1>(gb5rgb6rgb7)); brgb_sums[i] = _mm256_add_pd(brgb_sums[i], _mm256_mul_pd(b6rgb7_f64x4, coeff6777_f64x4)); } x += 8; } let coeffs_by_4 = coeffs.chunks_exact(4); coeffs = coeffs_by_4.remainder(); for k in coeffs_by_4 { let coeff0001_f64x4 = _mm256_set_pd(k[1], k[0], k[0], k[0]); let 
coeff1122_f64x4 = _mm256_set_pd(k[2], k[2], k[1], k[1]); let coeff2333_f64x4 = _mm256_set_pd(k[3], k[3], k[3], k[2]); for i in 0..ROWS_COUNT { let c = x * 3; let components = F32x3::components(src_rows[i]); let rgb0rgb1rg2 = simd_utils::loadu_ps256(components, c); let rgb0r1_f64x4 = _mm256_cvtps_pd(_mm256_extractf128_ps::<0>(rgb0rgb1rg2)); rgbr_sums[i] = _mm256_add_pd(rgbr_sums[i], _mm256_mul_pd(rgb0r1_f64x4, coeff0001_f64x4)); let gb1rg2_f64x4 = _mm256_cvtps_pd(_mm256_extractf128_ps::<1>(rgb0rgb1rg2)); gbrg_sums[i] = _mm256_add_pd(gbrg_sums[i], _mm256_mul_pd(gb1rg2_f64x4, coeff1122_f64x4)); let b2rgb3 = simd_utils::loadu_ps(components, c + 8); let b2rgb3_f64x4 = _mm256_cvtps_pd(b2rgb3); brgb_sums[i] = _mm256_add_pd(brgb_sums[i], _mm256_mul_pd(b2rgb3_f64x4, coeff2333_f64x4)); } x += 4; } let coeffs_by_2 = coeffs.chunks_exact(2); coeffs = coeffs_by_2.remainder(); for k in coeffs_by_2 { let coeff0001_f64x4 = _mm256_set_pd(k[1], k[0], k[0], k[0]); let coeff11xx_f64x4 = _mm256_set_pd(0., 0., k[1], k[1]); for i in 0..ROWS_COUNT { let c = x * 3; let components = F32x3::components(src_rows[i]); let rgb0r1 = simd_utils::loadu_ps(components, c); let rgb0r1_f64x4 = _mm256_cvtps_pd(rgb0r1); rgbr_sums[i] = _mm256_add_pd(rgbr_sums[i], _mm256_mul_pd(rgb0r1_f64x4, coeff0001_f64x4)); let g1 = *components.get_unchecked(c + 4); let b1 = *components.get_unchecked(c + 5); let gb1xx = _mm_set_ps(0., 0., b1, g1); let gb1xx_f64x4 = _mm256_cvtps_pd(gb1xx); gbrg_sums[i] = _mm256_add_pd(gbrg_sums[i], _mm256_mul_pd(gb1xx_f64x4, coeff11xx_f64x4)); } x += 2; } for &k in coeffs { let coeff0000_f64x2 = _mm256_set1_pd(k); for i in 0..ROWS_COUNT { let pixel = src_rows[i].get_unchecked(x); let rgb0x = _mm_set_ps(0., pixel.0[2], pixel.0[1], pixel.0[0]); let rgb0x_f64x4 = _mm256_cvtps_pd(rgb0x); rgbr_sums[i] = _mm256_add_pd(rgbr_sums[i], _mm256_mul_pd(rgb0x_f64x4, coeff0000_f64x2)); } x += 1; } for i in 0..ROWS_COUNT { let rg0_f64x2 = _mm256_extractf128_pd::<0>(rgbr_sums[i]); let rg1_f64x2 = _mm256_extractf128_pd::<1>(gbrg_sums[i]); let rg_f64x2 = _mm_add_pd(rg0_f64x2, rg1_f64x2); _mm_storeu_pd(rg_buf.as_mut_ptr(), rg_f64x2); let br0_f64x2 = _mm256_extractf128_pd::<1>(rgbr_sums[i]); let br1_f64x2 = _mm256_extractf128_pd::<0>(brgb_sums[i]); let br_f64x2 = _mm_add_pd(br0_f64x2, br1_f64x2); _mm_storeu_pd(br_buf.as_mut_ptr(), br_f64x2); let gb0_f64x2 = _mm256_extractf128_pd::<0>(gbrg_sums[i]); let gb1_f64x2 = _mm256_extractf128_pd::<1>(brgb_sums[i]); let gb_f64x2 = _mm_add_pd(gb0_f64x2, gb1_f64x2); _mm_storeu_pd(gb_buf.as_mut_ptr(), gb_f64x2); let dst_pixel = dst_rows[i].get_unchecked_mut(dst_x); dst_pixel.0 = [ (rg_buf[0] + br_buf[1]) as f32, (rg_buf[1] + gb_buf[0]) as f32, (br_buf[0] + gb_buf[1]) as f32, ]; } } } fast_image_resize-5.3.0/src/convolution/f32x3/mod.rs000064400000000000000000000052261046102023000204560ustar 00000000000000use crate::convolution::vertical_f32::vert_convolution_f32; use crate::cpu_extensions::CpuExtensions; use crate::pixels::F32x3; use crate::{ImageView, ImageViewMut}; use super::{Coefficients, Convolution}; #[cfg(target_arch = "x86_64")] mod avx2; mod native; // #[cfg(target_arch = "aarch64")] // mod neon; #[cfg(target_arch = "x86_64")] mod sse4; // #[cfg(target_arch = "wasm32")] // mod wasm32; type P = F32x3; impl Convolution for P { fn horiz_convolution( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, offset: u32, coeffs: Coefficients, cpu_extensions: CpuExtensions, ) { debug_assert!(src_view.height() - offset >= dst_view.height()); let coeffs_ref = &coeffs; try_process_in_threads_h! 
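// NOTE (illustrative sketch, not part of the original sources): per destination pixel
// and per row, the rotating `rgbr`/`gbrg`/`brgb` f64 accumulators in the AVX2 routine
// above compute nothing more than the plain per-channel weighted sum that native.rs
// below spells out directly:
//
//     let mut sum = [0f64; 3];
//     for (&k, px) in coeffs_chunk.values.iter().zip(&src_row[start..]) {
//         for (s, &c) in sum.iter_mut().zip(px.0.iter()) {
//             *s += c as f64 * k;
//         }
//     }
//     dst_pixel.0 = sum.map(|v| v as f32);
//
// The SIMD version differs only in bookkeeping: 3-component pixels do not align with
// the 4-lane f64 registers, so each accumulator holds a rotating mix of channels
// (R,G,B,R / G,B,R,G / B,R,G,B) that is untangled back into R, G and B at the end.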
{ horiz_convolution( src_view, dst_view, offset, coeffs_ref, cpu_extensions, ); } } fn vert_convolution( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, offset: u32, coeffs: Coefficients, cpu_extensions: CpuExtensions, ) { debug_assert!(src_view.width() - offset >= dst_view.width()); let coeffs_ref = &coeffs; try_process_in_threads_v! { vert_convolution( src_view, dst_view, offset, coeffs_ref, cpu_extensions, ); } } } fn horiz_convolution( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, offset: u32, coeffs: &Coefficients, cpu_extensions: CpuExtensions, ) { match cpu_extensions { #[cfg(target_arch = "x86_64")] CpuExtensions::Avx2 => avx2::horiz_convolution(src_view, dst_view, offset, coeffs), #[cfg(target_arch = "x86_64")] CpuExtensions::Sse4_1 => sse4::horiz_convolution(src_view, dst_view, offset, coeffs), // #[cfg(target_arch = "aarch64")] // CpuExtensions::Neon => neon::horiz_convolution(src_view, dst_view, offset, coeffs), // #[cfg(target_arch = "wasm32")] // CpuExtensions::Simd128 => wasm32::horiz_convolution(src_view, dst_view, offset, coeffs), _ => native::horiz_convolution(src_view, dst_view, offset, coeffs), } } fn vert_convolution( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, offset: u32, coeffs: &Coefficients, cpu_extensions: CpuExtensions, ) { vert_convolution_f32(src_view, dst_view, offset, coeffs, cpu_extensions); } fast_image_resize-5.3.0/src/convolution/f32x3/native.rs000064400000000000000000000017771046102023000211740ustar 00000000000000use crate::convolution::Coefficients; use crate::pixels::F32x3; use crate::{ImageView, ImageViewMut}; pub(crate) fn horiz_convolution( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, offset: u32, coeffs: &Coefficients, ) { let coefficients_chunks = coeffs.get_chunks(); let src_rows = src_view.iter_rows(offset); let dst_rows = dst_view.iter_rows_mut(0); for (dst_row, src_row) in dst_rows.zip(src_rows) { for (dst_pixel, coeffs_chunk) in dst_row.iter_mut().zip(&coefficients_chunks) { let first_x_src = coeffs_chunk.start as usize; let mut ss = [0.; 3]; let src_pixels = unsafe { src_row.get_unchecked(first_x_src..) 
}; for (&k, &src_pixel) in coeffs_chunk.values.iter().zip(src_pixels) { for (s, c) in ss.iter_mut().zip(src_pixel.0) { *s += c as f64 * k; } } dst_pixel.0 = ss.map(|v| v as f32); } } } fast_image_resize-5.3.0/src/convolution/f32x3/sse4.rs000064400000000000000000000142101046102023000205460ustar 00000000000000use std::arch::x86_64::*; use crate::convolution::{Coefficients, CoefficientsChunk}; use crate::pixels::{F32x3, InnerPixel}; use crate::{simd_utils, ImageView, ImageViewMut}; #[inline] pub(crate) fn horiz_convolution( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, offset: u32, coeffs: &Coefficients, ) { let coefficients_chunks = coeffs.get_chunks(); let dst_height = dst_view.height(); let src_iter = src_view.iter_2_rows(offset, dst_height + offset); let dst_iter = dst_view.iter_2_rows_mut(); for (src_rows, dst_rows) in src_iter.zip(dst_iter) { unsafe { horiz_convolution_rows(src_rows, dst_rows, &coefficients_chunks); } } let yy = dst_height - dst_height % 2; let src_rows = src_view.iter_rows(yy + offset); let dst_rows = dst_view.iter_rows_mut(yy); for (src_row, dst_row) in src_rows.zip(dst_rows) { unsafe { horiz_convolution_rows([src_row], [dst_row], &coefficients_chunks); } } } /// For safety, it is necessary to ensure the following conditions: /// - length of all rows in src_rows must be equal /// - length of all rows in dst_rows must be equal /// - coefficients_chunks.len() == dst_rows.0.len() /// - max(chunk.start + chunk.values.len() for chunk in coefficients_chunks) <= src_row.0.len() /// - precision <= MAX_COEFS_PRECISION #[target_feature(enable = "sse4.1")] unsafe fn horiz_convolution_rows( src_rows: [&[F32x3]; ROWS_COUNT], dst_rows: [&mut [F32x3]; ROWS_COUNT], coefficients_chunks: &[CoefficientsChunk], ) { /* |R0 G0 B0| |R1| |00 01 02| |03| |G1 B1| |R2 G2| |00 01| |02 03| |B2| |R3 G3 B3| |00| |01 02 03| */ let mut rg_buf = [0f64; 2]; let mut br_buf = [0f64; 2]; let mut gb_buf = [0f64; 2]; for (dst_x, coeffs_chunk) in coefficients_chunks.iter().enumerate() { let mut x: usize = coeffs_chunk.start as usize; let mut rg_sums = [_mm_set1_pd(0.); ROWS_COUNT]; let mut br_sums = [_mm_set1_pd(0.); ROWS_COUNT]; let mut gb_sums = [_mm_set1_pd(0.); ROWS_COUNT]; let mut coeffs = coeffs_chunk.values; let coeffs_by_4 = coeffs.chunks_exact(4); coeffs = coeffs_by_4.remainder(); for k in coeffs_by_4 { let coeff00_f64x2 = _mm_set1_pd(k[0]); let coeff01_f64x2 = _mm_set_pd(k[1], k[0]); let coeff11_f64x2 = _mm_set1_pd(k[1]); let coeff22_f64x2 = _mm_set1_pd(k[2]); let coeff23_f64x2 = _mm_set_pd(k[3], k[2]); let coeff33_f64x2 = _mm_set1_pd(k[3]); for i in 0..ROWS_COUNT { let c = x * 3; let components = F32x3::components(src_rows[i]); let rgb0r1 = simd_utils::loadu_ps(components, c); let rg0_f64x2 = _mm_cvtps_pd(rgb0r1); rg_sums[i] = _mm_add_pd(rg_sums[i], _mm_mul_pd(rg0_f64x2, coeff00_f64x2)); let b0r1_f64x2 = _mm_cvtps_pd(_mm_movehl_ps(rgb0r1, rgb0r1)); br_sums[i] = _mm_add_pd(br_sums[i], _mm_mul_pd(b0r1_f64x2, coeff01_f64x2)); let gb1rg2 = simd_utils::loadu_ps(components, c + 4); let gb1_f64x2 = _mm_cvtps_pd(gb1rg2); gb_sums[i] = _mm_add_pd(gb_sums[i], _mm_mul_pd(gb1_f64x2, coeff11_f64x2)); let rg2_f64x2 = _mm_cvtps_pd(_mm_movehl_ps(gb1rg2, gb1rg2)); rg_sums[i] = _mm_add_pd(rg_sums[i], _mm_mul_pd(rg2_f64x2, coeff22_f64x2)); let b2rgb3 = simd_utils::loadu_ps(components, c + 8); let b2r3_f64x2 = _mm_cvtps_pd(b2rgb3); br_sums[i] = _mm_add_pd(br_sums[i], _mm_mul_pd(b2r3_f64x2, coeff23_f64x2)); let gb3_f64x2 = _mm_cvtps_pd(_mm_movehl_ps(b2rgb3, b2rgb3)); gb_sums[i] = _mm_add_pd(gb_sums[i], 
_mm_mul_pd(gb3_f64x2, coeff33_f64x2)); } x += 4; } let coeffs_by_2 = coeffs.chunks_exact(2); coeffs = coeffs_by_2.remainder(); for k in coeffs_by_2 { let coeff00_f64x2 = _mm_set1_pd(k[0]); let coeff01_f64x2 = _mm_set_pd(k[1], k[0]); let coeff11_f64x2 = _mm_set1_pd(k[1]); for i in 0..ROWS_COUNT { let c = x * 3; let components = F32x3::components(src_rows[i]); let rgb0r1 = simd_utils::loadu_ps(components, c); let rg0_f64x2 = _mm_cvtps_pd(rgb0r1); rg_sums[i] = _mm_add_pd(rg_sums[i], _mm_mul_pd(rg0_f64x2, coeff00_f64x2)); let b0r1_f64x2 = _mm_cvtps_pd(_mm_movehl_ps(rgb0r1, rgb0r1)); br_sums[i] = _mm_add_pd(br_sums[i], _mm_mul_pd(b0r1_f64x2, coeff01_f64x2)); let g1 = *components.get_unchecked(c + 4); let b1 = *components.get_unchecked(c + 5); let gb1_f64x2 = _mm_set_pd(b1 as f64, g1 as f64); gb_sums[i] = _mm_add_pd(gb_sums[i], _mm_mul_pd(gb1_f64x2, coeff11_f64x2)); } x += 2; } for &k in coeffs { let coeff00_f64x2 = _mm_set1_pd(k); let coeff0x_f64x2 = _mm_set_pd(0., k); for i in 0..ROWS_COUNT { let pixel = src_rows[i].get_unchecked(x); let rgb0x = _mm_set_ps(0., pixel.0[2], pixel.0[1], pixel.0[0]); let rg0_f64x2 = _mm_cvtps_pd(rgb0x); rg_sums[i] = _mm_add_pd(rg_sums[i], _mm_mul_pd(rg0_f64x2, coeff00_f64x2)); let b0x_f64x2 = _mm_cvtps_pd(_mm_movehl_ps(rgb0x, rgb0x)); br_sums[i] = _mm_add_pd(br_sums[i], _mm_mul_pd(b0x_f64x2, coeff0x_f64x2)); } x += 1; } for i in 0..ROWS_COUNT { _mm_storeu_pd(rg_buf.as_mut_ptr(), rg_sums[i]); _mm_storeu_pd(br_buf.as_mut_ptr(), br_sums[i]); _mm_storeu_pd(gb_buf.as_mut_ptr(), gb_sums[i]); let dst_pixel = dst_rows[i].get_unchecked_mut(dst_x); dst_pixel.0 = [ (rg_buf[0] + br_buf[1]) as f32, (rg_buf[1] + gb_buf[0]) as f32, (br_buf[0] + gb_buf[1]) as f32, ]; } } } fast_image_resize-5.3.0/src/convolution/f32x4/avx2.rs000064400000000000000000000063751046102023000205660ustar 00000000000000use std::arch::x86_64::*; use crate::convolution::{Coefficients, CoefficientsChunk}; use crate::pixels::F32x4; use crate::{simd_utils, ImageView, ImageViewMut}; #[inline] pub(crate) fn horiz_convolution( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, offset: u32, coeffs: &Coefficients, ) { let coefficients_chunks = coeffs.get_chunks(); let dst_height = dst_view.height(); let src_iter = src_view.iter_4_rows(offset, dst_height + offset); let dst_iter = dst_view.iter_4_rows_mut(); for (src_rows, dst_rows) in src_iter.zip(dst_iter) { unsafe { horiz_convolution_rows(src_rows, dst_rows, &coefficients_chunks); } } let yy = dst_height - dst_height % 4; let src_rows = src_view.iter_rows(yy + offset); let dst_rows = dst_view.iter_rows_mut(yy); for (src_row, dst_row) in src_rows.zip(dst_rows) { unsafe { horiz_convolution_rows([src_row], [dst_row], &coefficients_chunks); } } } /// For safety, it is necessary to ensure the following conditions: /// - length of all rows in src_rows must be equal /// - length of all rows in dst_rows must be equal /// - coefficients_chunks.len() == dst_rows.0.len() /// - max(chunk.start + chunk.values.len() for chunk in coefficients_chunks) <= src_row.0.len() /// - precision <= MAX_COEFS_PRECISION #[target_feature(enable = "avx2")] unsafe fn horiz_convolution_rows( src_rows: [&[F32x4]; ROWS_COUNT], dst_rows: [&mut [F32x4]; ROWS_COUNT], coefficients_chunks: &[CoefficientsChunk], ) { for (dst_x, coeffs_chunk) in coefficients_chunks.iter().enumerate() { let mut x: usize = coeffs_chunk.start as usize; let mut rgba_sums = [_mm256_set1_pd(0.); ROWS_COUNT]; let mut coeffs = coeffs_chunk.values; let coeffs_by_2 = coeffs.chunks_exact(2); coeffs = 
coeffs_by_2.remainder(); for k in coeffs_by_2 { let coeff0_f64x4 = _mm256_set1_pd(k[0]); let coeff1_f64x4 = _mm256_set1_pd(k[1]); for r in 0..ROWS_COUNT { let pixel01 = simd_utils::loadu_ps256(src_rows[r], x); let pixel0_f64x4 = _mm256_cvtps_pd(_mm256_extractf128_ps::<0>(pixel01)); rgba_sums[r] = _mm256_add_pd(rgba_sums[r], _mm256_mul_pd(pixel0_f64x4, coeff0_f64x4)); let pixels1_f64x4 = _mm256_cvtps_pd(_mm256_extractf128_ps::<1>(pixel01)); rgba_sums[r] = _mm256_add_pd(rgba_sums[r], _mm256_mul_pd(pixels1_f64x4, coeff1_f64x4)); } x += 2; } if let Some(&k) = coeffs.first() { let coeff0_f64x4 = _mm256_set1_pd(k); for r in 0..ROWS_COUNT { let pixel0 = simd_utils::loadu_ps(src_rows[r], x); let pixel0_f64x4 = _mm256_cvtps_pd(pixel0); rgba_sums[r] = _mm256_add_pd(rgba_sums[r], _mm256_mul_pd(pixel0_f64x4, coeff0_f64x4)); } } for r in 0..ROWS_COUNT { let dst_pixel = dst_rows[r].get_unchecked_mut(dst_x); let rgba_f32x4 = _mm256_cvtpd_ps(rgba_sums[r]); _mm_storeu_ps(dst_pixel.0.as_mut_ptr(), rgba_f32x4); } } } fast_image_resize-5.3.0/src/convolution/f32x4/mod.rs000064400000000000000000000052261046102023000204570ustar 00000000000000use crate::convolution::vertical_f32::vert_convolution_f32; use crate::cpu_extensions::CpuExtensions; use crate::pixels::F32x4; use crate::{ImageView, ImageViewMut}; use super::{Coefficients, Convolution}; #[cfg(target_arch = "x86_64")] mod avx2; mod native; // #[cfg(target_arch = "aarch64")] // mod neon; #[cfg(target_arch = "x86_64")] mod sse4; // #[cfg(target_arch = "wasm32")] // mod wasm32; type P = F32x4; impl Convolution for P { fn horiz_convolution( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, offset: u32, coeffs: Coefficients, cpu_extensions: CpuExtensions, ) { debug_assert!(src_view.height() - offset >= dst_view.height()); let coeffs_ref = &coeffs; try_process_in_threads_h! { horiz_convolution( src_view, dst_view, offset, coeffs_ref, cpu_extensions, ); } } fn vert_convolution( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, offset: u32, coeffs: Coefficients, cpu_extensions: CpuExtensions, ) { debug_assert!(src_view.width() - offset >= dst_view.width()); let coeffs_ref = &coeffs; try_process_in_threads_v! 
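// NOTE (illustrative sketch, not part of the original sources): unlike the 3-component
// case, an `F32x4` pixel fills a whole 128-bit load, so the AVX2 routine above needs no
// channel rotation. Each pixel is widened with `_mm256_cvtps_pd` into one f64x4
// accumulator per row, which is exactly the scalar
//
//     for c in 0..4 { sum[c] += pixel.0[c] as f64 * k; }
//
// evaluated for all four channels at once.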
{ vert_convolution( src_view, dst_view, offset, coeffs_ref, cpu_extensions, ); } } } fn horiz_convolution( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, offset: u32, coeffs: &Coefficients, cpu_extensions: CpuExtensions, ) { match cpu_extensions { #[cfg(target_arch = "x86_64")] CpuExtensions::Avx2 => avx2::horiz_convolution(src_view, dst_view, offset, coeffs), #[cfg(target_arch = "x86_64")] CpuExtensions::Sse4_1 => sse4::horiz_convolution(src_view, dst_view, offset, coeffs), // #[cfg(target_arch = "aarch64")] // CpuExtensions::Neon => neon::horiz_convolution(src_view, dst_view, offset, coeffs), // #[cfg(target_arch = "wasm32")] // CpuExtensions::Simd128 => wasm32::horiz_convolution(src_view, dst_view, offset, coeffs), _ => native::horiz_convolution(src_view, dst_view, offset, coeffs), } } fn vert_convolution( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, offset: u32, coeffs: &Coefficients, cpu_extensions: CpuExtensions, ) { vert_convolution_f32(src_view, dst_view, offset, coeffs, cpu_extensions); } fast_image_resize-5.3.0/src/convolution/f32x4/native.rs000064400000000000000000000017771046102023000211750ustar 00000000000000use crate::convolution::Coefficients; use crate::pixels::F32x4; use crate::{ImageView, ImageViewMut}; pub(crate) fn horiz_convolution( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, offset: u32, coeffs: &Coefficients, ) { let coefficients_chunks = coeffs.get_chunks(); let src_rows = src_view.iter_rows(offset); let dst_rows = dst_view.iter_rows_mut(0); for (dst_row, src_row) in dst_rows.zip(src_rows) { for (dst_pixel, coeffs_chunk) in dst_row.iter_mut().zip(&coefficients_chunks) { let first_x_src = coeffs_chunk.start as usize; let mut ss = [0.; 4]; let src_pixels = unsafe { src_row.get_unchecked(first_x_src..) 
}; for (&k, &src_pixel) in coeffs_chunk.values.iter().zip(src_pixels) { for (s, c) in ss.iter_mut().zip(src_pixel.0) { *s += c as f64 * k; } } dst_pixel.0 = ss.map(|v| v as f32); } } } fast_image_resize-5.3.0/src/convolution/f32x4/sse4.rs000064400000000000000000000055151046102023000205570ustar 00000000000000use std::arch::x86_64::*; use crate::convolution::{Coefficients, CoefficientsChunk}; use crate::pixels::F32x4; use crate::{simd_utils, ImageView, ImageViewMut}; #[inline] pub(crate) fn horiz_convolution( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, offset: u32, coeffs: &Coefficients, ) { let coefficients_chunks = coeffs.get_chunks(); let dst_height = dst_view.height(); let src_iter = src_view.iter_4_rows(offset, dst_height + offset); let dst_iter = dst_view.iter_4_rows_mut(); for (src_rows, dst_rows) in src_iter.zip(dst_iter) { unsafe { horiz_convolution_rows(src_rows, dst_rows, &coefficients_chunks); } } let yy = dst_height - dst_height % 4; let src_rows = src_view.iter_rows(yy + offset); let dst_rows = dst_view.iter_rows_mut(yy); for (src_row, dst_row) in src_rows.zip(dst_rows) { unsafe { horiz_convolution_rows([src_row], [dst_row], &coefficients_chunks); } } } /// For safety, it is necessary to ensure the following conditions: /// - length of all rows in src_rows must be equal /// - length of all rows in dst_rows must be equal /// - coefficients_chunks.len() == dst_rows.0.len() /// - max(chunk.start + chunk.values.len() for chunk in coefficients_chunks) <= src_row.0.len() /// - precision <= MAX_COEFS_PRECISION #[target_feature(enable = "sse4.1")] unsafe fn horiz_convolution_rows( src_rows: [&[F32x4]; ROWS_COUNT], dst_rows: [&mut [F32x4]; ROWS_COUNT], coefficients_chunks: &[CoefficientsChunk], ) { let mut rg_buf = [0f64; 2]; let mut ba_buf = [0f64; 2]; for (dst_x, coeffs_chunk) in coefficients_chunks.iter().enumerate() { let mut x: usize = coeffs_chunk.start as usize; let mut rg_sums = [_mm_set1_pd(0.); ROWS_COUNT]; let mut ba_sums = [_mm_set1_pd(0.); ROWS_COUNT]; for &k in coeffs_chunk.values { let coeffs_f64x2 = _mm_set1_pd(k); for r in 0..ROWS_COUNT { let pixel = simd_utils::loadu_ps(src_rows[r], x); let rg_f64x2 = _mm_cvtps_pd(pixel); rg_sums[r] = _mm_add_pd(rg_sums[r], _mm_mul_pd(rg_f64x2, coeffs_f64x2)); let ba_f64x2 = _mm_cvtps_pd(_mm_movehl_ps(pixel, pixel)); ba_sums[r] = _mm_add_pd(ba_sums[r], _mm_mul_pd(ba_f64x2, coeffs_f64x2)); } x += 1; } for i in 0..ROWS_COUNT { _mm_storeu_pd(rg_buf.as_mut_ptr(), rg_sums[i]); _mm_storeu_pd(ba_buf.as_mut_ptr(), ba_sums[i]); let dst_pixel = dst_rows[i].get_unchecked_mut(dst_x); dst_pixel.0 = [ rg_buf[0] as f32, rg_buf[1] as f32, ba_buf[0] as f32, ba_buf[1] as f32, ]; } } } fast_image_resize-5.3.0/src/convolution/filters.rs000064400000000000000000000155531046102023000205060ustar 00000000000000use std::f64::consts::PI; use std::fmt::{Debug, Formatter}; use thiserror::Error; type FilterFn = fn(f64) -> f64; /// Description of custom filter for image convolution. 
#[derive(Clone, Copy)] pub struct Filter { /// Name of filter name: &'static str, /// Filter function func: FilterFn, /// Minimal "radius" of kernel in pixels support: f64, } impl PartialEq for Filter { fn eq(&self, other: &Self) -> bool { self.support == other.support && self.name == other.name } } impl Eq for Filter {} impl Debug for Filter { fn fmt(&self, f: &mut Formatter<'_>) -> std::fmt::Result { f.debug_struct("Filter") .field("name", &self.name) .field("support", &self.support) .finish() } } #[derive(Error, Debug, Clone, Copy, PartialEq, Eq)] pub enum CreateFilterError { /// Value of 'support' argument must be finite and greater than 0.0 #[error("Value of 'support' argument must be finite and greater than 0.0")] InvalidSupport, } impl Filter { /// # Arguments /// /// * `name` - Name of filter /// * `func` - Filter function /// * `support` - Minimal "radius" of kernel in pixels pub fn new( name: &'static str, func: FilterFn, support: f64, ) -> Result { if support.is_finite() && support > 0.0 { Ok(Self { name, func, support, }) } else { Err(CreateFilterError::InvalidSupport) } } /// Name of filter pub fn name(&self) -> &'static str { self.name } /// Minimal "radius" of kernel in pixels pub fn support(&self) -> f64 { self.support } } /// Type of filter used for image convolution. #[derive(Default, Clone, Copy, Debug, PartialEq, Eq)] #[non_exhaustive] pub enum FilterType { /// Each pixel of source image contributes to one pixel of the /// destination image with identical weights. For upscaling is equivalent /// of `Nearest` resize algorithm. /// /// Minimal kernel size 1x1 px. Box, /// Bilinear filter calculates the output pixel value using linear /// interpolation on all pixels that may contribute to the output value. /// /// Minimal kernel size 2x2 px. Bilinear, /// Hamming filter has the same performance as `Bilinear` filter while /// providing the image downscaling quality comparable to bicubic /// (`CatmulRom` or `Mitchell`). Produces a sharper image than `Bilinear`, /// doesn't have dislocations on local level like with `Box`. /// The filter doesn't show good quality for the image upscaling. /// /// Minimal kernel size 2x2 px. Hamming, /// Catmull-Rom bicubic filter calculates the output pixel value using /// cubic interpolation on all pixels that may contribute to the output /// value. /// /// Minimal kernel size 4x4 px. CatmullRom, /// Mitchell–Netravali bicubic filter calculate the output pixel value /// using cubic interpolation on all pixels that may contribute to the /// output value. /// /// Minimal kernel size 4x4 px. Mitchell, /// Gaussian filter with a standard deviation of 0.5. /// /// Minimal kernel size 6x6 px. Gaussian, /// Lanczos3 filter calculate the output pixel value using a high-quality /// Lanczos filter (a truncated sinc) on all pixels that may contribute /// to the output value. /// /// Minimal kernel size 6x6 px. #[default] Lanczos3, /// Custom filter function. /// /// # Examples /// /// ```rust /// use fast_image_resize::{Filter, FilterType}; /// /// fn sinc_filter(mut x: f64) -> f64 { /// if x == 0.0 { /// 1.0 /// } else { /// x *= std::f64::consts::PI; /// x.sin() / x /// } /// } /// /// fn lanczos4_filter(x: f64) -> f64 { /// if (-4.0..4.0).contains(&x) { /// sinc_filter(x) * sinc_filter(x / 4.) 
/// } else { /// 0.0 /// } /// } /// /// let lanczos4 = FilterType::Custom( /// Filter::new("Lanczos4", lanczos4_filter, 4.0).unwrap() /// ); /// /// assert_eq!( /// format!("{:?}", lanczos4), /// "Custom(Filter { name: \"Lanczos4\", support: 4.0 })" /// ); /// ``` Custom(Filter), } /// Returns reference to filter function and value of `filter_support`. #[inline] pub(crate) fn get_filter_func(filter_type: FilterType) -> (FilterFn, f64) { match filter_type { FilterType::Box => (box_filter, 0.5), FilterType::Bilinear => (bilinear_filter, 1.0), FilterType::Hamming => (hamming_filter, 1.0), FilterType::CatmullRom => (catmul_filter, 2.0), FilterType::Mitchell => (mitchell_filter, 2.0), FilterType::Gaussian => (gaussian_filter, 3.0), FilterType::Lanczos3 => (lanczos_filter, 3.0), FilterType::Custom(custom) => (custom.func, custom.support), } } #[inline] fn box_filter(x: f64) -> f64 { if x > -0.5 && x <= 0.5 { 1.0 } else { 0.0 } } #[inline] fn bilinear_filter(mut x: f64) -> f64 { x = x.abs(); if x < 1.0 { 1.0 - x } else { 0.0 } } #[inline] fn hamming_filter(mut x: f64) -> f64 { x = x.abs(); if x == 0.0 { 1.0 } else if x >= 1.0 { 0.0 } else { x *= PI; (0.54 + 0.46 * x.cos()) * x.sin() / x } } /// Catmull-Rom (bicubic) filter /// https://en.wikipedia.org/wiki/Bicubic_interpolation#Bicubic_convolution_algorithm #[inline] fn catmul_filter(mut x: f64) -> f64 { const A: f64 = -0.5; x = x.abs(); if x < 1.0 { ((A + 2.) * x - (A + 3.)) * x * x + 1. } else if x < 2.0 { (((x - 5.) * x + 8.) * x - 4.) * A } else { 0.0 } } /// Mitchell–Netravali filter (B = C = 1/3) /// https://en.wikipedia.org/wiki/Mitchell%E2%80%93Netravali_filters #[inline] fn mitchell_filter(mut x: f64) -> f64 { x = x.abs(); if x < 1.0 { (7. * x / 6. - 2.) * x * x + 16. / 18. } else if x < 2.0 { ((2. - 7. * x / 18.) * x - 10. / 3.) * x + 16. / 9. } else { 0.0 } } /// The Gaussian Function. /// `r` is the standard deviation. fn gaussian(x: f64, r: f64) -> f64 { ((2.0 * PI).sqrt() * r).recip() * (-x.powi(2) / (2.0 * r.powi(2))).exp() } /// Calculate the gaussian function with a /// standard deviation of 0.5. fn gaussian_filter(x: f64) -> f64 { if (-3.0..3.0).contains(&x) { gaussian(x, 0.5) } else { 0.0 } } #[inline] fn sinc_filter(mut x: f64) -> f64 { if x == 0.0 { 1.0 } else { x *= PI; x.sin() / x } } #[inline] fn lanczos_filter(x: f64) -> f64 { // truncated sinc if (-3.0..3.0).contains(&x) { sinc_filter(x) * sinc_filter(x / 3.) } else { 0.0 } } fast_image_resize-5.3.0/src/convolution/i32x1/mod.rs000064400000000000000000000033511046102023000204540ustar 00000000000000use super::{Coefficients, Convolution}; use crate::pixels::I32; use crate::{CpuExtensions, ImageView, ImageViewMut}; mod native; type P = I32; impl Convolution for P { fn horiz_convolution( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, offset: u32, coeffs: Coefficients, _cpu_extensions: CpuExtensions, ) { debug_assert!(src_view.height() - offset >= dst_view.height()); let coeffs_ref = &coeffs; try_process_in_threads_h! { horiz_convolution( src_view, dst_view, offset, coeffs_ref, ); } } fn vert_convolution( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, offset: u32, coeffs: Coefficients, _cpu_extensions: CpuExtensions, ) { debug_assert!(src_view.width() - offset >= dst_view.width()); let coeffs_ref = &coeffs; try_process_in_threads_v! 
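// NOTE (illustrative sketch, not part of the original sources; refers to the filter
// kernels defined in filters.rs above): `get_filter_func()` pairs every `FilterType`
// with its kernel function and support radius, and `precompute_coefficients()` derives
// the per-pixel window from it as `2 * ceil(support * max(scale, 1.0)) + 1` samples:
//
//     let (kernel, support) = get_filter_func(FilterType::Lanczos3); // support == 3.0
//     assert_eq!(kernel(0.0), 1.0); // the Lanczos3 kernel is 1.0 at the centre
//     assert_eq!(kernel(3.0), 0.0); // and 0.0 outside the half-open +/- support range
//     // downscaling by 4x: filter_radius = 3.0 * 4.0 = 12.0, window_size = 2 * 12 + 1 = 25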
{ vert_convolution( src_view, dst_view, offset, coeffs_ref, ); } } } #[inline(always)] fn horiz_convolution( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, offset: u32, coefficients: &Coefficients, ) { native::horiz_convolution(src_view, dst_view, offset, coefficients); } #[inline(always)] fn vert_convolution( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, offset: u32, coefficients: &Coefficients, ) { native::vert_convolution(src_view, dst_view, offset, coefficients); } fast_image_resize-5.3.0/src/convolution/i32x1/native.rs000064400000000000000000000034401046102023000211620ustar 00000000000000use crate::convolution::Coefficients; use crate::pixels::I32; use crate::{ImageView, ImageViewMut}; pub(crate) fn horiz_convolution( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, offset: u32, coeffs: &Coefficients, ) { let coefficients_chunks = coeffs.get_chunks(); let src_rows = src_view.iter_rows(offset); let dst_rows = dst_view.iter_rows_mut(0); for (dst_row, src_row) in dst_rows.zip(src_rows) { for (dst_pixel, coeffs_chunk) in dst_row.iter_mut().zip(&coefficients_chunks) { let first_x_src = coeffs_chunk.start as usize; let mut ss = 0.; let src_pixels = unsafe { src_row.get_unchecked(first_x_src..) }; for (&k, &pixel) in coeffs_chunk.values.iter().zip(src_pixels) { ss += pixel.0 as f64 * k; } dst_pixel.0 = ss.round() as i32; } } } pub(crate) fn vert_convolution( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, offset: u32, coeffs: &Coefficients, ) { let coefficients_chunks = coeffs.get_chunks(); let dst_rows = dst_view.iter_rows_mut(0); let start_src_x = offset as usize; for (&coeffs_chunk, dst_row) in coefficients_chunks.iter().zip(dst_rows) { let first_y_src = coeffs_chunk.start; let mut src_x = start_src_x; for dst_pixel in dst_row.iter_mut() { let mut ss = 0.; let src_rows = src_view.iter_rows(first_y_src); for (src_row, &k) in src_rows.zip(coeffs_chunk.values) { let src_pixel = unsafe { src_row.get_unchecked(src_x) }; ss += src_pixel.0 as f64 * k; } dst_pixel.0 = ss.round() as i32; src_x += 1; } } } fast_image_resize-5.3.0/src/convolution/macros.rs000064400000000000000000000064561046102023000203240ustar 00000000000000macro_rules! constify_imm8 { ($imm8:expr, $expand:ident) => { #[allow(overflowing_literals)] match ($imm8) & 0b0011_1111 { 0 => {} 1 => $expand!(1), 2 => $expand!(2), 3 => $expand!(3), 4 => $expand!(4), 5 => $expand!(5), 6 => $expand!(6), 7 => $expand!(7), 8 => $expand!(8), 9 => $expand!(9), 10 => $expand!(10), 12 => $expand!(12), 13 => $expand!(13), 14 => $expand!(14), 15 => $expand!(15), 16 => $expand!(16), 17 => $expand!(17), 18 => $expand!(18), 19 => $expand!(19), 20 => $expand!(20), 21 => $expand!(21), 22 => $expand!(22), 23 => $expand!(23), 24 => $expand!(24), 25 => $expand!(25), 26 => $expand!(26), 27 => $expand!(27), 28 => $expand!(28), 29 => $expand!(29), 30 => $expand!(30), 31 => $expand!(31), _ => unreachable!(), } }; } #[cfg(target_arch = "aarch64")] macro_rules! 
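// NOTE (descriptive comment, not part of the original sources): `constify_imm8` above
// and the wider `constify_64_imm8` defined next turn a runtime value into a
// monomorphised call, so that shift/extract intrinsics can receive the immediate as a
// const generic parameter. See the NEON u16 code further below, where the normalizer
// precision is dispatched as
//
//     constify_64_imm8!(precision, call); // expands to `call!(N)` for the matching N,
//                                         // i.e. horiz_convolution_p::<N>(...)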
constify_64_imm8 { ($imm8:expr, $expand:ident) => { #[allow(overflowing_literals)] match ($imm8) & 0b0111_1111 { 0 => {} 1 => $expand!(1), 2 => $expand!(2), 3 => $expand!(3), 4 => $expand!(4), 5 => $expand!(5), 6 => $expand!(6), 7 => $expand!(7), 8 => $expand!(8), 9 => $expand!(9), 10 => $expand!(10), 12 => $expand!(12), 13 => $expand!(13), 14 => $expand!(14), 15 => $expand!(15), 16 => $expand!(16), 17 => $expand!(17), 18 => $expand!(18), 19 => $expand!(19), 20 => $expand!(20), 21 => $expand!(21), 22 => $expand!(22), 23 => $expand!(23), 24 => $expand!(24), 25 => $expand!(25), 26 => $expand!(26), 27 => $expand!(27), 28 => $expand!(28), 29 => $expand!(29), 30 => $expand!(30), 31 => $expand!(31), 32 => $expand!(32), 33 => $expand!(33), 34 => $expand!(34), 35 => $expand!(35), 36 => $expand!(36), 37 => $expand!(37), 38 => $expand!(38), 39 => $expand!(39), 40 => $expand!(40), 41 => $expand!(41), 42 => $expand!(42), 43 => $expand!(43), 44 => $expand!(44), 45 => $expand!(45), 46 => $expand!(46), 47 => $expand!(47), 48 => $expand!(48), 49 => $expand!(49), 50 => $expand!(50), 51 => $expand!(51), 52 => $expand!(52), 53 => $expand!(53), 54 => $expand!(54), 55 => $expand!(55), 56 => $expand!(56), 57 => $expand!(57), 58 => $expand!(58), 59 => $expand!(59), 60 => $expand!(60), 61 => $expand!(61), 62 => $expand!(62), 63 => $expand!(63), _ => unreachable!(), } }; } fast_image_resize-5.3.0/src/convolution/mod.rs000064400000000000000000000122001046102023000175770ustar 00000000000000pub use filters::*; use crate::pixels::InnerPixel; use crate::{CpuExtensions, ImageView, ImageViewMut}; #[macro_use] mod macros; mod filters; #[macro_use] mod optimisations; mod u8x4; mod vertical_u8; cfg_if::cfg_if! { if #[cfg(not(feature = "only_u8x4"))] { mod u8x1; mod u8x2; mod u8x3; mod u16x1; mod u16x2; mod u16x3; mod u16x4; mod i32x1; mod f32x1; mod f32x2; mod f32x3; mod f32x4; mod vertical_u16; mod vertical_f32; } } pub(crate) trait Convolution: InnerPixel { fn horiz_convolution( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, offset: u32, coeffs: Coefficients, cpu_extensions: CpuExtensions, ); fn vert_convolution( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, offset: u32, coeffs: Coefficients, cpu_extensions: CpuExtensions, ); } #[derive(Debug, Clone, Copy)] pub(crate) struct Bound { pub start: u32, pub size: u32, } #[derive(Debug, Clone, Default)] pub(crate) struct Coefficients { pub values: Vec, pub window_size: usize, pub bounds: Vec, } #[derive(Debug, Clone, Copy)] pub(crate) struct CoefficientsChunk<'a> { pub start: u32, pub values: &'a [f64], } impl Coefficients { pub fn get_chunks(&self) -> Vec> { let mut coeffs = self.values.as_slice(); let mut res = Vec::with_capacity(self.bounds.len()); for bound in &self.bounds { let (left, right) = coeffs.split_at(self.window_size); coeffs = right; let size = bound.size as usize; res.push(CoefficientsChunk { start: bound.start, values: &left[0..size], }); } res } } pub(crate) fn precompute_coefficients( in_size: u32, in0: f64, // Left/top border for cropping in1: f64, // Right/bottom border for cropping out_size: u32, filter: fn(f64) -> f64, filter_support: f64, adaptive_kernel_size: bool, ) -> Coefficients { if in_size == 0 || out_size == 0 { return Coefficients::default(); } let scale = (in1 - in0) / out_size as f64; if scale <= 0. 
{ return Coefficients::default(); } let filter_scale = if adaptive_kernel_size { scale.max(1.0) } else { 1.0 }; // Determine filter radius size (length of resampling filter) let filter_radius = filter_support * filter_scale; // Maximum number of coeffs per out pixel let window_size = filter_radius.ceil() as usize * 2 + 1; // Optimization: replace division by filter_scale // with multiplication by recip_filter_scale let recip_filter_scale = 1.0 / filter_scale; let count_of_coeffs = window_size * out_size as usize; let mut coeffs: Vec = Vec::with_capacity(count_of_coeffs); let mut bounds: Vec = Vec::with_capacity(out_size as usize); for out_x in 0..out_size { // Find the point in the input image corresponding to the center // of the current pixel in the output image. let in_center = in0 + (out_x as f64 + 0.5) * scale; // x_min and x_max are slice bounds for the input pixels relevant // to the output pixel we are calculating. Pixel x is relevant // if and only if (x >= x_min) && (x < x_max). // Invariant: 0 <= x_min < x_max <= width let x_min = (in_center - filter_radius).floor().max(0.) as u32; let x_max = (in_center + filter_radius).ceil().min(in_size as f64) as u32; let cur_index = coeffs.len(); let mut ww: f64 = 0.0; // Optimisation for follow for-cycle: // (x + 0.5) - in_center => x - (in_center - 0.5) => x - center let center = in_center - 0.5; let mut bound_start = x_min; let mut bound_end = x_max; // Calculate the weight of each input pixel from the given x-range. for x in x_min..x_max { let w: f64 = filter((x as f64 - center) * recip_filter_scale); if x == bound_start && w == 0. { // Don't use zero coefficients at the start of bound; bound_start += 1; } else { coeffs.push(w); ww += w; } } for &c in coeffs.iter().rev() { if bound_end <= bound_start || c != 0. { break; } // Don't use zero coefficients at the end of bound; bound_end -= 1; } if ww != 0.0 { // Normalise values of weights. // The sum of weights must be equal to 1.0. coeffs[cur_index..].iter_mut().for_each(|w| *w /= ww); } // Remaining values should stay empty if they are used despite x_max. coeffs.resize(cur_index + window_size, 0.); bounds.push(Bound { start: bound_start, size: bound_end - bound_start, }); } Coefficients { values: coeffs, window_size, bounds, } } fast_image_resize-5.3.0/src/convolution/optimisations.rs000064400000000000000000000200611046102023000217260ustar 00000000000000use crate::convolution::Coefficients; // This code is based on C-implementation from Pillow-SIMD package for Python // https://github.com/uploadcare/pillow-simd const fn get_clip_table() -> [u8; 1280] { let mut table = [0u8; 1280]; let mut i: usize = 640; while i < 640 + 255 { table[i] = (i - 640) as u8; i += 1; } while i < 1280 { table[i] = 255; i += 1; } table } // Handles values form -640 to 639. static CLIP8_LOOKUPS: [u8; 1280] = get_clip_table(); // 8 bits for a result. Filter can have negative areas. // In one case, the sum of the coefficients will be negative, // in the other it will be more than 1.0. That is why we need // two extra bits for overflow and i32 type. const PRECISION_BITS: u8 = 32 - 8 - 2; // We use i16 type to store coefficients. 
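
// Illustrative example (not part of the original crate sources): a small test showing
// the coefficients produced by `precompute_coefficients()` above for a 4 -> 2 pixel
// downscale with a Box kernel (support 0.5). The module and test names are arbitrary.
#[cfg(test)]
mod precompute_example {
    use crate::convolution::precompute_coefficients;

    // Same definition as the private `box_filter` in filters.rs.
    fn box_kernel(x: f64) -> f64 {
        if x > -0.5 && x <= 0.5 {
            1.0
        } else {
            0.0
        }
    }

    #[test]
    fn box_downscale_4_to_2() {
        // scale = 2.0, so the adaptive kernel radius is 0.5 * 2.0 = 1.0
        // and window_size = ceil(1.0) * 2 + 1 = 3.
        let coeffs = precompute_coefficients(4, 0.0, 4.0, 2, box_kernel, 0.5, true);
        assert_eq!(coeffs.window_size, 3);
        // Each output pixel averages two input pixels with equal weights;
        // the third slot of every window is zero padding.
        assert_eq!(coeffs.values, vec![0.5, 0.5, 0.0, 0.5, 0.5, 0.0]);
        assert_eq!(coeffs.bounds.len(), 2);
        assert_eq!((coeffs.bounds[0].start, coeffs.bounds[0].size), (0, 2));
        assert_eq!((coeffs.bounds[1].start, coeffs.bounds[1].size), (2, 2));
    }
}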
const MAX_COEFFS_PRECISION: u8 = 16 - 1; #[derive(Debug, Clone)] pub(crate) struct CoefficientsI16Chunk { pub start: u32, values: Vec, } impl CoefficientsI16Chunk { #[inline(always)] pub fn values(&self) -> &[i16] { &self.values } } pub(crate) struct Normalizer16 { precision: u8, chunks: Vec, } impl Normalizer16 { #[inline] pub fn new(coefficients: Coefficients) -> Self { let max_weight = coefficients .values .iter() .max_by(|&x, &y| x.partial_cmp(y).unwrap()) .unwrap_or(&0.0) .to_owned(); let mut precision = 0u8; for cur_precision in 0..PRECISION_BITS { precision = cur_precision; let next_value: i32 = (max_weight * (1 << (precision + 1)) as f64).round() as i32; if next_value >= (1 << MAX_COEFFS_PRECISION) { // The next value will be outside the range, so stop break; } } debug_assert!(precision >= 4); // required for some SIMD optimisations let mut chunks = Vec::with_capacity(coefficients.bounds.len()); if coefficients.window_size > 0 { let scale = (1 << precision) as f64; let coef_chunks = coefficients.values.chunks_exact(coefficients.window_size); for (chunk, bound) in coef_chunks.zip(&coefficients.bounds) { let chunk_i16: Vec = chunk .iter() .take(bound.size as usize) .map(|&v| (v * scale).round() as i16) .collect(); chunks.push(CoefficientsI16Chunk { start: bound.start, values: chunk_i16, }); } } Self { precision, chunks } } #[inline(always)] pub fn precision(&self) -> u8 { self.precision } #[inline(always)] pub fn chunks(&self) -> &[CoefficientsI16Chunk] { &self.chunks } pub fn chunks_len(&self) -> usize { self.chunks.len() } /// # Safety /// The function must be used with the `v` /// such that the expression `v >> self.precision` /// produces a result in the range `[-512, 511]`. #[inline(always)] pub unsafe fn clip(&self, v: i32) -> u8 { let index = (640 + (v >> self.precision)) as usize; // index must be in range [(640-512)..(640+511)] debug_assert!((128..=1151).contains(&index)); *CLIP8_LOOKUPS.get_unchecked(index) } } // 16 bits for a result. Filter can have negative areas. // In one cases the sum of the coefficients will be negative, // in the other it will be more than 1.0. That is why we need // two extra bits for overflow and i64 type. const PRECISION16_BITS: u8 = 64 - 16 - 2; // We use i32 type to store coefficients. const MAX_COEFFS_PRECISION16: u8 = 32 - 1; #[derive(Debug, Clone)] pub(crate) struct CoefficientsI32Chunk { pub start: u32, values: Vec, } impl CoefficientsI32Chunk { #[inline(always)] pub fn values(&self) -> &[i32] { &self.values } } /// Converts `Vec` into `Vec`. 
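// NOTE (illustrative, not part of the original sources): `Normalizer16` above picks,
// essentially, the largest `precision` for which `max_weight * 2^precision` still stays
// below 2^MAX_COEFFS_PRECISION, so that every f64 weight can be stored as
// `round(w * 2^precision)` in an i16. For example, with weights [0.25, 0.5, 0.25] the
// chosen precision is 15 and the stored chunk becomes [8192, 16384, 8192]; after
// convolution, `clip()` undoes the scaling and clamps to the u8 range:
//
//     clip(120 << 15) == 120     clip(-5 << 15) == 0     clip(300 << 15) == 255
//
// `Normalizer32` below applies the same scheme for 16-bit samples, with i32
// coefficients, i64 accumulators and a plain clamp to the u16 range instead of the
// lookup table.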
pub(crate) struct Normalizer32 { precision: u8, chunks: Vec, } impl Normalizer32 { #[inline] pub fn new(coefficients: Coefficients) -> Self { let max_weight = coefficients .values .iter() .max_by(|&x, &y| x.partial_cmp(y).unwrap()) .unwrap_or(&0.0) .to_owned(); let mut precision = 0u8; for cur_precision in 0..PRECISION16_BITS { precision = cur_precision; let next_value: i64 = (max_weight * (1i64 << (precision + 1)) as f64).round() as i64; // The next value will be outside the range, so just stop if next_value >= (1i64 << MAX_COEFFS_PRECISION16) { break; } } debug_assert!(precision >= 4); // required for some SIMD optimisations let mut chunks = Vec::with_capacity(coefficients.bounds.len()); if coefficients.window_size > 0 { let scale = (1i64 << precision) as f64; let coef_chunks = coefficients.values.chunks_exact(coefficients.window_size); for (chunk, bound) in coef_chunks.zip(&coefficients.bounds) { let chunk_i32: Vec = chunk .iter() .take(bound.size as usize) .map(|&v| (v * scale).round() as i32) .collect(); chunks.push(CoefficientsI32Chunk { start: bound.start, values: chunk_i32, }); } } Self { precision, chunks } } #[inline] pub fn precision(&self) -> u8 { self.precision } #[inline(always)] pub fn chunks(&self) -> &[CoefficientsI32Chunk] { &self.chunks } #[inline(always)] pub fn chunks_len(&self) -> usize { self.chunks.len() } #[inline(always)] pub fn clip(&self, v: i64) -> u16 { (v >> self.precision).min(u16::MAX as i64).max(0) as u16 } } macro_rules! try_process_in_threads_h { {$op: ident($src_view: ident, $dst_view: ident, $offset: ident, $($arg: ident),+$(,)?);} => { #[allow(unused_labels)] 'block: { #[cfg(feature = "rayon")] { use crate::threading::split_h_two_images_for_threading; use rayon::prelude::*; if let Some(iter) = split_h_two_images_for_threading($src_view, $dst_view, $offset) { iter.for_each(|(src, mut dst)| { $op(&src, &mut dst, 0, $($arg),+); }); break 'block; } } $op($src_view, $dst_view, $offset, $($arg),+); } }; } macro_rules! 
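// NOTE (descriptive comment, not part of the original sources): like
// `try_process_in_threads_h` above, the `try_process_in_threads_v` macro defined next
// only parallelises when the `rayon` feature is enabled. It asks
// `split_v_two_images_for_threading()` for matching stripes of the source and
// destination views and runs `$op` on every (source, destination) pair; if the feature
// is off, or the split function returns `None`, it falls back to a single
// `$op(src_view, dst_view, offset, ...)` call on the current thread.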
try_process_in_threads_v { {$op: ident($src_view: ident, $dst_view: ident, $offset: ident, $($arg: ident),+$(,)?);} => { #[allow(unused_labels)] 'block: { #[cfg(feature = "rayon")] { use crate::threading::split_v_two_images_for_threading; use rayon::prelude::*; if let Some(iter) = split_v_two_images_for_threading($src_view, $dst_view, $offset) { iter.for_each(|(src, mut dst)| { $op(&src, &mut dst, 0, $($arg),+); }); break 'block; } } $op($src_view, $dst_view, $offset, $($arg),+); } }; } #[cfg(test)] mod tests { use super::*; use crate::convolution::Bound; fn get_coefficients(value: f64) -> Coefficients { Coefficients { values: vec![value], window_size: 1, bounds: vec![Bound { start: 0, size: 1 }], } } #[test] fn test_minimal_precision() { // required for some SIMD optimisations assert!(Normalizer16::new(get_coefficients(0.0)).precision() >= 4); assert!(Normalizer16::new(get_coefficients(2.0)).precision() >= 4); assert!(Normalizer32::new(get_coefficients(0.0)).precision() >= 4); assert!(Normalizer32::new(get_coefficients(2.0)).precision() >= 4); } } fast_image_resize-5.3.0/src/convolution/u16x1/avx2.rs000064400000000000000000000324341046102023000205770ustar 00000000000000use std::arch::x86_64::*; use crate::convolution::optimisations::Normalizer32; use crate::pixels::U16; use crate::{simd_utils, ImageView, ImageViewMut}; #[inline] pub(crate) fn horiz_convolution( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, offset: u32, normalizer: &Normalizer32, ) { let dst_height = dst_view.height(); let src_iter = src_view.iter_4_rows(offset, dst_height + offset); let dst_iter = dst_view.iter_4_rows_mut(); for (src_rows, dst_rows) in src_iter.zip(dst_iter) { unsafe { horiz_convolution_four_rows(src_rows, dst_rows, normalizer); } } let yy = dst_height - dst_height % 4; let src_rows = src_view.iter_rows(yy + offset); let dst_rows = dst_view.iter_rows_mut(yy); for (src_row, dst_row) in src_rows.zip(dst_rows) { unsafe { horiz_convolution_one_row(src_row, dst_row, normalizer); } } } /// For safety, it is necessary to ensure the following conditions: /// - length of all rows in src_rows must be equal /// - length of all rows in dst_rows must be equal /// - coefficients_chunks.len() == dst_rows.0.len() /// - max(chunk.start + chunk.values.len() for chunk in coefficients_chunks) <= src_row.0.len() #[target_feature(enable = "avx2")] unsafe fn horiz_convolution_four_rows( src_rows: [&[U16]; 4], dst_rows: [&mut [U16]; 4], normalizer: &Normalizer32, ) { let precision = normalizer.precision(); let half_error = 1i64 << (precision - 1); let mut ll_buf = [0i64; 4]; /* |L0 | |L1 | |L2 | |L3 | |L4 | |L5 | |L6 | |L7 | |0001| |0203| |0405| |0607| |0809| |1011| |1213| |1415| Shuffle to extract L0 and L1 as i64: -1, -1, -1, -1, -1, -1, 3, 2, -1, -1, -1, -1, -1, -1, 1, 0 Shuffle to extract L2 and L3 as i64: -1, -1, -1, -1, -1, -1, 7, 6, -1, -1, -1, -1, -1, -1, 5, 4 Shuffle to extract L4 and L5 as i64: -1, -1, -1, -1, -1, -1, 11, 10, -1, -1, -1, -1, -1, -1, 9, 8 Shuffle to extract L6 and L7 as i64: -1, -1, -1, -1, -1, -1, 15, 14, -1, -1, -1, -1, -1, -1, 13, 12 */ #[rustfmt::skip] let l0l1_shuffle = _mm256_set_epi8( -1, -1, -1, -1, -1, -1, 3, 2, -1, -1, -1, -1, -1, -1, 1, 0, -1, -1, -1, -1, -1, -1, 3, 2, -1, -1, -1, -1, -1, -1, 1, 0, ); #[rustfmt::skip] let l2l3_shuffle = _mm256_set_epi8( -1, -1, -1, -1, -1, -1, 7, 6, -1, -1, -1, -1, -1, -1, 5, 4, -1, -1, -1, -1, -1, -1, 7, 6, -1, -1, -1, -1, -1, -1, 5, 4, ); #[rustfmt::skip] let l4l5_shuffle = _mm256_set_epi8( -1, -1, -1, -1, -1, -1, 11, 10, -1, -1, -1, -1, -1, 
-1, 9, 8, -1, -1, -1, -1, -1, -1, 11, 10, -1, -1, -1, -1, -1, -1, 9, 8, ); #[rustfmt::skip] let l6l7_shuffle = _mm256_set_epi8( -1, -1, -1, -1, -1, -1, 15, 14, -1, -1, -1, -1, -1, -1, 13, 12, -1, -1, -1, -1, -1, -1, 15, 14, -1, -1, -1, -1, -1, -1, 13, 12, ); for (dst_x, chunk) in normalizer.chunks().iter().enumerate() { let mut x = chunk.start as usize; let mut ll_sum = [_mm256_set1_epi64x(0); 2]; let mut coeffs = chunk.values(); let coeffs_by_8 = coeffs.chunks_exact(8); coeffs = coeffs_by_8.remainder(); for k in coeffs_by_8 { let coeff01_i64x4 = _mm256_set_epi64x(k[1] as i64, k[0] as i64, k[1] as i64, k[0] as i64); let coeff23_i64x4 = _mm256_set_epi64x(k[3] as i64, k[2] as i64, k[3] as i64, k[2] as i64); let coeff45_i64x4 = _mm256_set_epi64x(k[5] as i64, k[4] as i64, k[5] as i64, k[4] as i64); let coeff67_i64x4 = _mm256_set_epi64x(k[7] as i64, k[6] as i64, k[7] as i64, k[6] as i64); for (i, sum) in ll_sum.iter_mut().enumerate() { let source = _mm256_set_m128i( simd_utils::loadu_si128(src_rows[i * 2 + 1], x), simd_utils::loadu_si128(src_rows[i * 2], x), ); let l0l1_i64x4 = _mm256_shuffle_epi8(source, l0l1_shuffle); *sum = _mm256_add_epi64(*sum, _mm256_mul_epi32(l0l1_i64x4, coeff01_i64x4)); let l2l3_i64x4 = _mm256_shuffle_epi8(source, l2l3_shuffle); *sum = _mm256_add_epi64(*sum, _mm256_mul_epi32(l2l3_i64x4, coeff23_i64x4)); let l4l5_i64x4 = _mm256_shuffle_epi8(source, l4l5_shuffle); *sum = _mm256_add_epi64(*sum, _mm256_mul_epi32(l4l5_i64x4, coeff45_i64x4)); let l6l7_i64x4 = _mm256_shuffle_epi8(source, l6l7_shuffle); *sum = _mm256_add_epi64(*sum, _mm256_mul_epi32(l6l7_i64x4, coeff67_i64x4)); } x += 8; } let coeffs_by_4 = coeffs.chunks_exact(4); coeffs = coeffs_by_4.remainder(); for k in coeffs_by_4 { let coeff01_i64x4 = _mm256_set_epi64x(k[1] as i64, k[0] as i64, k[1] as i64, k[0] as i64); let coeff23_i64x4 = _mm256_set_epi64x(k[3] as i64, k[2] as i64, k[3] as i64, k[2] as i64); for (i, sum) in ll_sum.iter_mut().enumerate() { let source = _mm256_set_m128i( simd_utils::loadl_epi64(src_rows[i * 2 + 1], x), simd_utils::loadl_epi64(src_rows[i * 2], x), ); let l0l1_i64x4 = _mm256_shuffle_epi8(source, l0l1_shuffle); *sum = _mm256_add_epi64(*sum, _mm256_mul_epi32(l0l1_i64x4, coeff01_i64x4)); let l2l3_i64x4 = _mm256_shuffle_epi8(source, l2l3_shuffle); *sum = _mm256_add_epi64(*sum, _mm256_mul_epi32(l2l3_i64x4, coeff23_i64x4)); } x += 4; } let coeffs_by_2 = coeffs.chunks_exact(2); coeffs = coeffs_by_2.remainder(); for k in coeffs_by_2 { let coeff01_i64x4 = _mm256_set_epi64x(k[1] as i64, k[0] as i64, k[1] as i64, k[0] as i64); for (i, sum) in ll_sum.iter_mut().enumerate() { let source = _mm256_set_m128i( simd_utils::loadl_epi32(src_rows[i * 2 + 1], x), simd_utils::loadl_epi32(src_rows[i * 2], x), ); let l0l1_i64x4 = _mm256_shuffle_epi8(source, l0l1_shuffle); *sum = _mm256_add_epi64(*sum, _mm256_mul_epi32(l0l1_i64x4, coeff01_i64x4)); } x += 2; } for &k in coeffs { let coeff0_i64x4 = _mm256_set1_epi64x(k as i64); for (i, sum) in ll_sum.iter_mut().enumerate() { let source = _mm256_set_epi64x( 0, src_rows[i * 2 + 1].get_unchecked(x).0 as i64, 0, src_rows[i * 2].get_unchecked(x).0 as i64, ); *sum = _mm256_add_epi64(*sum, _mm256_mul_epi32(source, coeff0_i64x4)); } } // ll_sum.into_iter().enumerate() executes slowly than ll_sum.iter().enumerate() for (i, &ll) in ll_sum.iter().enumerate() { _mm256_storeu_si256(ll_buf.as_mut_ptr() as *mut __m256i, ll); let dst_pixel = dst_rows[i * 2].get_unchecked_mut(dst_x); dst_pixel.0 = normalizer.clip(ll_buf[0] + ll_buf[1] + half_error); let dst_pixel = dst_rows[i * 2 + 
1].get_unchecked_mut(dst_x); dst_pixel.0 = normalizer.clip(ll_buf[2] + ll_buf[3] + half_error); } } } /// For safety, it is necessary to ensure the following conditions: /// - bounds.len() == dst_row.len() /// - coefficients_chunks.len() == dst_row.len() /// - max(chunk.start + chunk.values.len() for chunk in coefficients_chunks) <= src_row.len() #[target_feature(enable = "avx2")] unsafe fn horiz_convolution_one_row( src_row: &[U16], dst_row: &mut [U16], normalizer: &Normalizer32, ) { let precision = normalizer.precision(); let half_error = 1i64 << (precision - 1); let mut ll_buf = [0i64; 4]; /* |L0 | |L1 | |L2 | |L3 | |L4 | |L5 | |L6 | |L7 | |0001| |0203| |0405| |0607| |0809| |1011| |1213| |1415| Shuffle to extract L0 and L1 as i64: -1, -1, -1, -1, -1, -1, 3, 2, -1, -1, -1, -1, -1, -1, 1, 0 Shuffle to extract L2 and L3 as i64: -1, -1, -1, -1, -1, -1, 7, 6, -1, -1, -1, -1, -1, -1, 5, 4 Shuffle to extract L4 and L5 as i64: -1, -1, -1, -1, -1, -1, 11, 10, -1, -1, -1, -1, -1, -1, 9, 8 Shuffle to extract L6 and L7 as i64: -1, -1, -1, -1, -1, -1, 15, 14, -1, -1, -1, -1, -1, -1, 13, 12 */ #[rustfmt::skip] let l0l1_shuffle = _mm256_set_epi8( -1, -1, -1, -1, -1, -1, 3, 2, -1, -1, -1, -1, -1, -1, 1, 0, -1, -1, -1, -1, -1, -1, 3, 2, -1, -1, -1, -1, -1, -1, 1, 0, ); #[rustfmt::skip] let l2l3_shuffle = _mm256_set_epi8( -1, -1, -1, -1, -1, -1, 7, 6, -1, -1, -1, -1, -1, -1, 5, 4, -1, -1, -1, -1, -1, -1, 7, 6, -1, -1, -1, -1, -1, -1, 5, 4, ); #[rustfmt::skip] let l4l5_shuffle = _mm256_set_epi8( -1, -1, -1, -1, -1, -1, 11, 10, -1, -1, -1, -1, -1, -1, 9, 8, -1, -1, -1, -1, -1, -1, 11, 10, -1, -1, -1, -1, -1, -1, 9, 8, ); #[rustfmt::skip] let l6l7_shuffle = _mm256_set_epi8( -1, -1, -1, -1, -1, -1, 15, 14, -1, -1, -1, -1, -1, -1, 13, 12, -1, -1, -1, -1, -1, -1, 15, 14, -1, -1, -1, -1, -1, -1, 13, 12, ); for (dst_x, chunk) in normalizer.chunks().iter().enumerate() { let mut x = chunk.start as usize; let mut ll_sum = _mm256_set1_epi64x(0); let mut coeffs = chunk.values(); let coeffs_by_16 = coeffs.chunks_exact(16); coeffs = coeffs_by_16.remainder(); for k in coeffs_by_16 { let coeff0189_i64x4 = _mm256_set_epi64x(k[9] as i64, k[8] as i64, k[1] as i64, k[0] as i64); let coeff23ab_i64x4 = _mm256_set_epi64x(k[11] as i64, k[10] as i64, k[3] as i64, k[2] as i64); let coeff45cd_i64x4 = _mm256_set_epi64x(k[13] as i64, k[12] as i64, k[5] as i64, k[4] as i64); let coeff67ef_i64x4 = _mm256_set_epi64x(k[15] as i64, k[14] as i64, k[7] as i64, k[6] as i64); let source = simd_utils::loadu_si256(src_row, x); let l0l1_i64x4 = _mm256_shuffle_epi8(source, l0l1_shuffle); ll_sum = _mm256_add_epi64(ll_sum, _mm256_mul_epi32(l0l1_i64x4, coeff0189_i64x4)); let l2l3_i64x4 = _mm256_shuffle_epi8(source, l2l3_shuffle); ll_sum = _mm256_add_epi64(ll_sum, _mm256_mul_epi32(l2l3_i64x4, coeff23ab_i64x4)); let l4l5_i64x4 = _mm256_shuffle_epi8(source, l4l5_shuffle); ll_sum = _mm256_add_epi64(ll_sum, _mm256_mul_epi32(l4l5_i64x4, coeff45cd_i64x4)); let l6l7_i64x4 = _mm256_shuffle_epi8(source, l6l7_shuffle); ll_sum = _mm256_add_epi64(ll_sum, _mm256_mul_epi32(l6l7_i64x4, coeff67ef_i64x4)); x += 16; } let coeffs_by_8 = coeffs.chunks_exact(8); coeffs = coeffs_by_8.remainder(); for k in coeffs_by_8 { let coeff0145_i64x4 = _mm256_set_epi64x(k[5] as i64, k[4] as i64, k[1] as i64, k[0] as i64); let coeff2367_i64x4 = _mm256_set_epi64x(k[7] as i64, k[6] as i64, k[3] as i64, k[2] as i64); let source = _mm256_set_m128i( simd_utils::loadl_epi64(src_row, x + 4), simd_utils::loadl_epi64(src_row, x), ); let l0l1_i64x4 = _mm256_shuffle_epi8(source, l0l1_shuffle); 
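// The shuffle above widens two u16 samples into the two i64 lanes of each 128-bit half
// (the upper six bytes of every lane are zeroed); `_mm256_mul_epi32` below multiplies
// the low 32 bits of each i64 lane by the matching coefficient, so every lane
// accumulates the exact 64-bit product `sample as i64 * coeff as i64`, i.e. the same
// `ss += src_pixel as i64 * k as i64` step written out in native.rs, four lanes at once.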
ll_sum = _mm256_add_epi64(ll_sum, _mm256_mul_epi32(l0l1_i64x4, coeff0145_i64x4)); let l2l3_i64x4 = _mm256_shuffle_epi8(source, l2l3_shuffle); ll_sum = _mm256_add_epi64(ll_sum, _mm256_mul_epi32(l2l3_i64x4, coeff2367_i64x4)); x += 8; } let coeffs_by_4 = coeffs.chunks_exact(4); coeffs = coeffs_by_4.remainder(); for k in coeffs_by_4 { let coeff0123_i64x4 = _mm256_set_epi64x(k[3] as i64, k[2] as i64, k[1] as i64, k[0] as i64); let source = _mm256_set_m128i( simd_utils::loadl_epi32(src_row, x + 2), simd_utils::loadl_epi32(src_row, x), ); let l0l1_i64x4 = _mm256_shuffle_epi8(source, l0l1_shuffle); ll_sum = _mm256_add_epi64(ll_sum, _mm256_mul_epi32(l0l1_i64x4, coeff0123_i64x4)); x += 4; } let coeffs_by_2 = coeffs.chunks_exact(2); coeffs = coeffs_by_2.remainder(); for k in coeffs_by_2 { let coeff01_i64x4 = _mm256_set_epi64x(0, 0, k[1] as i64, k[0] as i64); let source = _mm256_set_m128i(_mm_setzero_si128(), simd_utils::loadl_epi32(src_row, x)); let l0l1_i64x4 = _mm256_shuffle_epi8(source, l0l1_shuffle); ll_sum = _mm256_add_epi64(ll_sum, _mm256_mul_epi32(l0l1_i64x4, coeff01_i64x4)); x += 2; } for &k in coeffs { let coeff0_i64x4 = _mm256_set1_epi64x(k as i64); let source = _mm256_set_epi64x(0, 0, 0, src_row.get_unchecked(x).0 as i64); ll_sum = _mm256_add_epi64(ll_sum, _mm256_mul_epi32(source, coeff0_i64x4)); } _mm256_storeu_si256(ll_buf.as_mut_ptr() as *mut __m256i, ll_sum); let dst_pixel = dst_row.get_unchecked_mut(dst_x); dst_pixel.0 = normalizer.clip(ll_buf.iter().sum::() + half_error); } } fast_image_resize-5.3.0/src/convolution/u16x1/mod.rs000064400000000000000000000050341046102023000204720ustar 00000000000000use super::{Coefficients, Convolution}; use crate::convolution::optimisations::Normalizer32; use crate::convolution::vertical_u16::vert_convolution_u16; use crate::pixels::U16; use crate::{CpuExtensions, ImageView, ImageViewMut}; #[cfg(target_arch = "x86_64")] mod avx2; mod native; #[cfg(target_arch = "aarch64")] mod neon; #[cfg(target_arch = "x86_64")] mod sse4; #[cfg(target_arch = "wasm32")] mod wasm32; type P = U16; impl Convolution for P { fn horiz_convolution( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, offset: u32, coeffs: Coefficients, cpu_extensions: CpuExtensions, ) { debug_assert!(src_view.height() - offset >= dst_view.height()); let normalizer = Normalizer32::new(coeffs); let normalizer_ref = &normalizer; try_process_in_threads_h! { horiz_convolution( src_view, dst_view, offset, normalizer_ref, cpu_extensions, ); } } fn vert_convolution( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, offset: u32, coeffs: Coefficients, cpu_extensions: CpuExtensions, ) { debug_assert!(src_view.width() - offset >= dst_view.width()); let normalizer = Normalizer32::new(coeffs); let normalizer_ref = &normalizer; try_process_in_threads_v! 
{ vert_convolution_u16( src_view, dst_view, offset, normalizer_ref, cpu_extensions, ); } } } fn horiz_convolution( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, offset: u32, normalizer: &Normalizer32, cpu_extensions: CpuExtensions, ) { match cpu_extensions { #[cfg(target_arch = "x86_64")] CpuExtensions::Avx2 => avx2::horiz_convolution(src_view, dst_view, offset, normalizer), #[cfg(target_arch = "x86_64")] CpuExtensions::Sse4_1 => sse4::horiz_convolution(src_view, dst_view, offset, normalizer), #[cfg(target_arch = "aarch64")] CpuExtensions::Neon => neon::horiz_convolution(src_view, dst_view, offset, normalizer), #[cfg(target_arch = "wasm32")] CpuExtensions::Simd128 => wasm32::horiz_convolution(src_view, dst_view, offset, normalizer), _ => native::horiz_convolution(src_view, dst_view, offset, normalizer), } } fast_image_resize-5.3.0/src/convolution/u16x1/native.rs000064400000000000000000000020721046102023000212000ustar 00000000000000use crate::convolution::optimisations::Normalizer32; use crate::pixels::U16; use crate::{ImageView, ImageViewMut}; #[inline(always)] pub(crate) fn horiz_convolution( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, offset: u32, normalizer: &Normalizer32, ) { let precision = normalizer.precision(); let coefficients_chunks = normalizer.chunks(); let initial = 1i64 << (precision - 1); let src_rows = src_view.iter_rows(offset); let dst_rows = dst_view.iter_rows_mut(0); for (dst_row, src_row) in dst_rows.zip(src_rows) { for (coeffs_chunk, dst_pixel) in coefficients_chunks.iter().zip(dst_row.iter_mut()) { let first_x_src = coeffs_chunk.start as usize; let mut ss = initial; let src_pixels = unsafe { src_row.get_unchecked(first_x_src..) }; for (&k, src_pixel) in coeffs_chunk.values().iter().zip(src_pixels) { ss += src_pixel.0 as i64 * (k as i64); } dst_pixel.0 = normalizer.clip(ss); } } } fast_image_resize-5.3.0/src/convolution/u16x1/neon.rs000064400000000000000000000175551046102023000206650ustar 00000000000000use std::arch::aarch64::*; use crate::convolution::optimisations::{CoefficientsI32Chunk, Normalizer32}; use crate::neon_utils; use crate::pixels::U16; use crate::{ImageView, ImageViewMut}; #[inline] pub(crate) fn horiz_convolution( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, offset: u32, normalizer: &Normalizer32, ) { let precision = normalizer.precision(); macro_rules! 
call { ($imm8:expr) => {{ horiz_convolution_p::<$imm8>(src_view, dst_view, offset, normalizer); }}; } constify_64_imm8!(precision, call); } fn horiz_convolution_p( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, offset: u32, normalizer: &Normalizer32, ) { let coefficients_chunks = normalizer.chunks(); let dst_height = dst_view.height(); let src_iter = src_view.iter_4_rows(offset, dst_height + offset); let dst_iter = dst_view.iter_4_rows_mut(); for (src_rows, dst_rows) in src_iter.zip(dst_iter) { unsafe { horiz_convolution_four_rows::(src_rows, dst_rows, coefficients_chunks); } } let yy = dst_height - dst_height % 4; let src_rows = src_view.iter_rows(yy + offset); let dst_rows = dst_view.iter_rows_mut(yy); for (src_row, dst_row) in src_rows.zip(dst_rows) { unsafe { horiz_convolution_one_row::(src_row, dst_row, coefficients_chunks); } } } /// For safety, it is necessary to ensure the following conditions: /// - length of all rows in src_rows must be equal /// - length of all rows in dst_rows must be equal /// - coefficients_chunks.len() == dst_rows.0.len() /// - max(chunk.start + chunk.values.len() for chunk in coefficients_chunks) <= src_row.0.len() /// - precision <= MAX_COEFS_PRECISION #[target_feature(enable = "neon")] unsafe fn horiz_convolution_four_rows( src_rows: [&[U16]; 4], dst_rows: [&mut [U16]; 4], coefficients_chunks: &[CoefficientsI32Chunk], ) { let initial = vdupq_n_s64(1i64 << (PRECISION - 2)); let zero_u16x4 = vdup_n_u16(0); for (dst_x, coeffs_chunk) in coefficients_chunks.iter().enumerate() { let mut x: usize = coeffs_chunk.start as usize; let mut sss_a = [initial; 4]; let mut coeffs = coeffs_chunk.values(); let coeffs_by_4 = coeffs.chunks_exact(4); coeffs = coeffs_by_4.remainder(); for k in coeffs_by_4 { let coeffs_i32x4 = neon_utils::load_i32x4(k, 0); let coeff0 = vget_low_s32(coeffs_i32x4); let coeff1 = vget_high_s32(coeffs_i32x4); for i in 0..4 { let source = neon_utils::load_u16x4(src_rows[i], x); let mut sss = sss_a[i]; let pix_i32 = vreinterpret_s32_u16(vzip1_u16(source, zero_u16x4)); sss = vmlal_s32(sss, pix_i32, coeff0); let pix_i32 = vreinterpret_s32_u16(vzip2_u16(source, zero_u16x4)); sss = vmlal_s32(sss, pix_i32, coeff1); sss_a[i] = sss; } x += 4; } let mut coeffs_by_2 = coeffs.chunks_exact(2); coeffs = coeffs_by_2.remainder(); if let Some(k) = coeffs_by_2.next() { let coeffs_i32x2 = neon_utils::load_i32x2(k, 0); for i in 0..4 { let source = neon_utils::load_u16x2(src_rows[i], x); let pix_i32 = vreinterpret_s32_u16(vzip1_u16(source, zero_u16x4)); sss_a[i] = vmlal_s32(sss_a[i], pix_i32, coeffs_i32x2); } x += 2; } if !coeffs.is_empty() { let coeffs_i32x2 = neon_utils::load_i32x1(coeffs, 0); for i in 0..4 { let source = neon_utils::load_u16x1(src_rows[i], x); let pix_i32 = vreinterpret_s32_u16(vzip1_u16(source, zero_u16x4)); sss_a[i] = vmlal_s32(sss_a[i], pix_i32, coeffs_i32x2); } } let mut sss_a_i64 = [ vadd_s64(vget_low_s64(sss_a[0]), vget_high_s64(sss_a[0])), vadd_s64(vget_low_s64(sss_a[1]), vget_high_s64(sss_a[1])), vadd_s64(vget_low_s64(sss_a[2]), vget_high_s64(sss_a[2])), vadd_s64(vget_low_s64(sss_a[3]), vget_high_s64(sss_a[3])), ]; sss_a_i64[0] = vshr_n_s64::(sss_a_i64[0]); sss_a_i64[1] = vshr_n_s64::(sss_a_i64[1]); sss_a_i64[2] = vshr_n_s64::(sss_a_i64[2]); sss_a_i64[3] = vshr_n_s64::(sss_a_i64[3]); for i in 0..4 { let res = vdupd_lane_s64::<0>(sss_a_i64[i]); dst_rows[i].get_unchecked_mut(dst_x).0 = vqmovns_u32(vqmovund_s64(res)); } } } /// For safety, it is necessary to ensure the following conditions: /// - bounds.len() == dst_row.len() /// - 
coefficients_chunks.len() == dst_row.len() /// - max(chunk.start + chunk.values.len() for chunk in coefficients_chunks) <= src_row.len() /// - precision <= MAX_COEFS_PRECISION #[target_feature(enable = "neon")] unsafe fn horiz_convolution_one_row( src_row: &[U16], dst_row: &mut [U16], coefficients_chunks: &[CoefficientsI32Chunk], ) { let initial = vdupq_n_s64(1i64 << (PRECISION - 2)); let zero_u16x8 = vdupq_n_u16(0); let zero_u16x4 = vdup_n_u16(0); for (dst_x, coeffs_chunk) in coefficients_chunks.iter().enumerate() { let mut x: usize = coeffs_chunk.start as usize; let mut sss = initial; let mut coeffs = coeffs_chunk.values(); let coeffs_by_8 = coeffs.chunks_exact(8); coeffs = coeffs_by_8.remainder(); for k in coeffs_by_8 { let coeffs_i32x4x2 = neon_utils::load_i32x4x2(k, 0); let source = neon_utils::load_u16x8(src_row, x); let pix_i32 = vreinterpretq_s32_u16(vzip1q_u16(source, zero_u16x8)); sss = vmlal_s32(sss, vget_low_s32(pix_i32), vget_low_s32(coeffs_i32x4x2.0)); sss = vmlal_s32(sss, vget_high_s32(pix_i32), vget_high_s32(coeffs_i32x4x2.0)); let pix_i32 = vreinterpretq_s32_u16(vzip2q_u16(source, zero_u16x8)); sss = vmlal_s32(sss, vget_low_s32(pix_i32), vget_low_s32(coeffs_i32x4x2.1)); sss = vmlal_s32(sss, vget_high_s32(pix_i32), vget_high_s32(coeffs_i32x4x2.1)); x += 8; } let mut coeffs_by_4 = coeffs.chunks_exact(4); coeffs = coeffs_by_4.remainder(); if let Some(k) = coeffs_by_4.next() { let coeffs_i32x4 = neon_utils::load_i32x4(k, 0); let source = neon_utils::load_u16x4(src_row, x); let pix_i32 = vreinterpret_s32_u16(vzip1_u16(source, zero_u16x4)); sss = vmlal_s32(sss, pix_i32, vget_low_s32(coeffs_i32x4)); let pix_i32 = vreinterpret_s32_u16(vzip2_u16(source, zero_u16x4)); sss = vmlal_s32(sss, pix_i32, vget_high_s32(coeffs_i32x4)); x += 4; } let mut coeffs_by_2 = coeffs.chunks_exact(2); coeffs = coeffs_by_2.remainder(); if let Some(k) = coeffs_by_2.next() { let coeffs_i32x2 = neon_utils::load_i32x2(k, 0); let source = neon_utils::load_u16x2(src_row, x); let pix_i32 = vreinterpret_s32_u16(vzip1_u16(source, zero_u16x4)); sss = vmlal_s32(sss, pix_i32, coeffs_i32x2); x += 2; } if !coeffs.is_empty() { let coeffs_i32x2 = neon_utils::load_i32x1(coeffs, 0); let source = neon_utils::load_u16x1(src_row, x); let pix_i32 = vreinterpret_s32_u16(vzip1_u16(source, zero_u16x4)); sss = vmlal_s32(sss, pix_i32, coeffs_i32x2); } let mut sss_i64 = vadd_s64(vget_low_s64(sss), vget_high_s64(sss)); sss_i64 = vshr_n_s64::(sss_i64); let res = vdupd_lane_s64::<0>(sss_i64); dst_row.get_unchecked_mut(dst_x).0 = vqmovns_u32(vqmovund_s64(res)); } } fast_image_resize-5.3.0/src/convolution/u16x1/sse4.rs000064400000000000000000000243571046102023000206020ustar 00000000000000use std::arch::x86_64::*; use crate::convolution::optimisations::Normalizer32; use crate::pixels::U16; use crate::{simd_utils, ImageView, ImageViewMut}; #[inline] pub(crate) fn horiz_convolution( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, offset: u32, normalizer: &Normalizer32, ) { let dst_height = dst_view.height(); let src_iter = src_view.iter_4_rows(offset, dst_height + offset); let dst_iter = dst_view.iter_4_rows_mut(); for (src_rows, dst_rows) in src_iter.zip(dst_iter) { unsafe { horiz_convolution_four_rows(src_rows, dst_rows, normalizer); } } let yy = dst_height - dst_height % 4; let src_rows = src_view.iter_rows(yy + offset); let dst_rows = dst_view.iter_rows_mut(yy); for (src_row, dst_row) in src_rows.zip(dst_rows) { unsafe { horiz_convolution_one_row(src_row, dst_row, normalizer); } } } /// For safety, it is necessary to 
ensure the following conditions: /// - length of all rows in src_rows must be equal /// - length of all rows in dst_rows must be equal /// - coefficients_chunks.len() == dst_rows.0.len() /// - max(chunk.start + chunk.values.len() for chunk in coefficients_chunks) <= src_row.0.len() #[target_feature(enable = "sse4.1")] unsafe fn horiz_convolution_four_rows( src_rows: [&[U16]; 4], dst_rows: [&mut [U16]; 4], normalizer: &Normalizer32, ) { let precision = normalizer.precision(); let half_error = 1i64 << (precision - 1); let mut ll_buf = [0i64; 2]; /* |L0 | |L1 | |L2 | |L3 | |L4 | |L5 | |L6 | |L7 | |0001| |0203| |0405| |0607| |0809| |1011| |1213| |1415| Shuffle to extract L0 and L1 as i64: -1, -1, -1, -1, -1, -1, 3, 2, -1, -1, -1, -1, -1, -1, 1, 0 Shuffle to extract L2 and L3 as i64: -1, -1, -1, -1, -1, -1, 7, 6, -1, -1, -1, -1, -1, -1, 5, 4 Shuffle to extract L4 and L5 as i64: -1, -1, -1, -1, -1, -1, 11, 10, -1, -1, -1, -1, -1, -1, 9, 8 Shuffle to extract L6 and L7 as i64: -1, -1, -1, -1, -1, -1, 15, 14, -1, -1, -1, -1, -1, -1, 13, 12 */ let l0l1_shuffle = _mm_set_epi8(-1, -1, -1, -1, -1, -1, 3, 2, -1, -1, -1, -1, -1, -1, 1, 0); let l2l3_shuffle = _mm_set_epi8(-1, -1, -1, -1, -1, -1, 7, 6, -1, -1, -1, -1, -1, -1, 5, 4); let l4l5_shuffle = _mm_set_epi8(-1, -1, -1, -1, -1, -1, 11, 10, -1, -1, -1, -1, -1, -1, 9, 8); let l6l7_shuffle = _mm_set_epi8( -1, -1, -1, -1, -1, -1, 15, 14, -1, -1, -1, -1, -1, -1, 13, 12, ); for (dst_x, chunk) in normalizer.chunks().iter().enumerate() { let mut x = chunk.start as usize; let mut ll_sum = [_mm_set1_epi64x(0); 4]; let mut coeffs = chunk.values(); let coeffs_by_8 = coeffs.chunks_exact(8); coeffs = coeffs_by_8.remainder(); for k in coeffs_by_8 { let coeff01_i64x2 = _mm_set_epi64x(k[1] as i64, k[0] as i64); let coeff23_i64x2 = _mm_set_epi64x(k[3] as i64, k[2] as i64); let coeff45_i64x2 = _mm_set_epi64x(k[5] as i64, k[4] as i64); let coeff67_i64x2 = _mm_set_epi64x(k[7] as i64, k[6] as i64); for i in 0..4 { let mut sum = ll_sum[i]; let source = simd_utils::loadu_si128(src_rows[i], x); let l0l1_i64x2 = _mm_shuffle_epi8(source, l0l1_shuffle); sum = _mm_add_epi64(sum, _mm_mul_epi32(l0l1_i64x2, coeff01_i64x2)); let l2l3_i64x2 = _mm_shuffle_epi8(source, l2l3_shuffle); sum = _mm_add_epi64(sum, _mm_mul_epi32(l2l3_i64x2, coeff23_i64x2)); let l4l5_i64x2 = _mm_shuffle_epi8(source, l4l5_shuffle); sum = _mm_add_epi64(sum, _mm_mul_epi32(l4l5_i64x2, coeff45_i64x2)); let l6l7_i64x2 = _mm_shuffle_epi8(source, l6l7_shuffle); sum = _mm_add_epi64(sum, _mm_mul_epi32(l6l7_i64x2, coeff67_i64x2)); ll_sum[i] = sum; } x += 8; } let coeffs_by_4 = coeffs.chunks_exact(4); coeffs = coeffs_by_4.remainder(); for k in coeffs_by_4 { let coeff01_i64x2 = _mm_set_epi64x(k[1] as i64, k[0] as i64); let coeff23_i64x2 = _mm_set_epi64x(k[3] as i64, k[2] as i64); for i in 0..4 { let mut sum = ll_sum[i]; let source = simd_utils::loadl_epi64(src_rows[i], x); let l0l1_i64x2 = _mm_shuffle_epi8(source, l0l1_shuffle); sum = _mm_add_epi64(sum, _mm_mul_epi32(l0l1_i64x2, coeff01_i64x2)); let l2l3_i64x2 = _mm_shuffle_epi8(source, l2l3_shuffle); sum = _mm_add_epi64(sum, _mm_mul_epi32(l2l3_i64x2, coeff23_i64x2)); ll_sum[i] = sum; } x += 4; } let coeffs_by_2 = coeffs.chunks_exact(2); coeffs = coeffs_by_2.remainder(); for k in coeffs_by_2 { let coeff01_i64x2 = _mm_set_epi64x(k[1] as i64, k[0] as i64); for i in 0..4 { let source = simd_utils::loadl_epi32(src_rows[i], x); let l_i64x2 = _mm_shuffle_epi8(source, l0l1_shuffle); ll_sum[i] = _mm_add_epi64(ll_sum[i], _mm_mul_epi32(l_i64x2, coeff01_i64x2)); } x += 2; } if let 
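// After the 8-, 4- and 2-coefficient passes, at most one coefficient can be
// left over; the branch below applies it to a single source pixel in each of
// the four rows before the accumulators are stored, rounded with `half_error`
// and clipped back to u16 by the normalizer.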
Some(&k) = coeffs.first() { let coeff01_i64x2 = _mm_set_epi64x(0, k as i64); for i in 0..4 { let pixel = src_rows[i].get_unchecked(x).0 as i64; let source = _mm_set_epi64x(0, pixel); ll_sum[i] = _mm_add_epi64(ll_sum[i], _mm_mul_epi32(source, coeff01_i64x2)); } } for i in 0..4 { _mm_storeu_si128(ll_buf.as_mut_ptr() as *mut __m128i, ll_sum[i]); let dst_pixel = dst_rows[i].get_unchecked_mut(dst_x); dst_pixel.0 = normalizer.clip(ll_buf.iter().sum::() + half_error); } } } /// For safety, it is necessary to ensure the following conditions: /// - bounds.len() == dst_row.len() /// - coefficients_chunks.len() == dst_row.len() /// - max(chunk.start + chunk.values.len() for chunk in coefficients_chunks) <= src_row.len() #[target_feature(enable = "sse4.1")] unsafe fn horiz_convolution_one_row( src_row: &[U16], dst_row: &mut [U16], normalizer: &Normalizer32, ) { let precision = normalizer.precision(); let half_error = 1i64 << (precision - 1); let mut ll_buf = [0i64; 2]; /* |L0 | |L1 | |L2 | |L3 | |L4 | |L5 | |L6 | |L7 | |0001| |0203| |0405| |0607| |0809| |1011| |1213| |1415| Shuffle to extract L0 and L1 as i64: -1, -1, -1, -1, -1, -1, 3, 2, -1, -1, -1, -1, -1, -1, 1, 0 Shuffle to extract L2 and L3 as i64: -1, -1, -1, -1, -1, -1, 7, 6, -1, -1, -1, -1, -1, -1, 5, 4 Shuffle to extract L4 and L5 as i64: -1, -1, -1, -1, -1, -1, 11, 10, -1, -1, -1, -1, -1, -1, 9, 8 Shuffle to extract L6 and L7 as i64: -1, -1, -1, -1, -1, -1, 15, 14, -1, -1, -1, -1, -1, -1, 13, 12 */ let l01_shuffle = _mm_set_epi8(-1, -1, -1, -1, -1, -1, 3, 2, -1, -1, -1, -1, -1, -1, 1, 0); let l23_shuffle = _mm_set_epi8(-1, -1, -1, -1, -1, -1, 7, 6, -1, -1, -1, -1, -1, -1, 5, 4); let l45_shuffle = _mm_set_epi8(-1, -1, -1, -1, -1, -1, 11, 10, -1, -1, -1, -1, -1, -1, 9, 8); let l67_shuffle = _mm_set_epi8( -1, -1, -1, -1, -1, -1, 15, 14, -1, -1, -1, -1, -1, -1, 13, 12, ); for (dst_x, chunk) in normalizer.chunks().iter().enumerate() { let mut x = chunk.start as usize; let mut ll_sum = _mm_set1_epi64x(0); let mut coeffs = chunk.values(); let coeffs_by_8 = coeffs.chunks_exact(8); coeffs = coeffs_by_8.remainder(); for k in coeffs_by_8 { let coeff01_i64x2 = _mm_set_epi64x(k[1] as i64, k[0] as i64); let coeff23_i64x2 = _mm_set_epi64x(k[3] as i64, k[2] as i64); let coeff45_i64x2 = _mm_set_epi64x(k[5] as i64, k[4] as i64); let coeff67_i64x2 = _mm_set_epi64x(k[7] as i64, k[6] as i64); let source = simd_utils::loadu_si128(src_row, x); let l_i64x2 = _mm_shuffle_epi8(source, l01_shuffle); ll_sum = _mm_add_epi64(ll_sum, _mm_mul_epi32(l_i64x2, coeff01_i64x2)); let l_i64x2 = _mm_shuffle_epi8(source, l23_shuffle); ll_sum = _mm_add_epi64(ll_sum, _mm_mul_epi32(l_i64x2, coeff23_i64x2)); let l_i64x2 = _mm_shuffle_epi8(source, l45_shuffle); ll_sum = _mm_add_epi64(ll_sum, _mm_mul_epi32(l_i64x2, coeff45_i64x2)); let l_i64x2 = _mm_shuffle_epi8(source, l67_shuffle); ll_sum = _mm_add_epi64(ll_sum, _mm_mul_epi32(l_i64x2, coeff67_i64x2)); x += 8; } let coeffs_by_4 = coeffs.chunks_exact(4); coeffs = coeffs_by_4.remainder(); for k in coeffs_by_4 { let coeff01_i64x2 = _mm_set_epi64x(k[1] as i64, k[0] as i64); let coeff23_i64x2 = _mm_set_epi64x(k[3] as i64, k[2] as i64); let source = simd_utils::loadl_epi64(src_row, x); let l_i64x2 = _mm_shuffle_epi8(source, l01_shuffle); ll_sum = _mm_add_epi64(ll_sum, _mm_mul_epi32(l_i64x2, coeff01_i64x2)); let l_i64x2 = _mm_shuffle_epi8(source, l23_shuffle); ll_sum = _mm_add_epi64(ll_sum, _mm_mul_epi32(l_i64x2, coeff23_i64x2)); x += 4; } let coeffs_by_2 = coeffs.chunks_exact(2); coeffs = coeffs_by_2.remainder(); for k in coeffs_by_2 { let 
coeff01_i64x2 = _mm_set_epi64x(k[1] as i64, k[0] as i64); let source = simd_utils::loadl_epi32(src_row, x); let l_i64x2 = _mm_shuffle_epi8(source, l01_shuffle); ll_sum = _mm_add_epi64(ll_sum, _mm_mul_epi32(l_i64x2, coeff01_i64x2)); x += 2; } if let Some(&k) = coeffs.first() { let coeff01_i64x2 = _mm_set_epi64x(0, k as i64); let pixel = src_row.get_unchecked(x).0 as i64; let source = _mm_set_epi64x(0, pixel); ll_sum = _mm_add_epi64(ll_sum, _mm_mul_epi32(source, coeff01_i64x2)); } _mm_storeu_si128(ll_buf.as_mut_ptr() as *mut __m128i, ll_sum); let dst_pixel = dst_row.get_unchecked_mut(dst_x); dst_pixel.0 = normalizer.clip(ll_buf[0] + ll_buf[1] + half_error); } } fast_image_resize-5.3.0/src/convolution/u16x1/wasm32.rs000064400000000000000000000244611046102023000210340ustar 00000000000000use std::arch::wasm32::*; use crate::convolution::optimisations::Normalizer32; use crate::pixels::U16; use crate::wasm32_utils; use crate::{ImageView, ImageViewMut}; #[inline] pub(crate) fn horiz_convolution( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, offset: u32, normalizer: &Normalizer32, ) { let dst_height = dst_view.height(); let src_iter = src_view.iter_4_rows(offset, dst_height + offset); let dst_iter = dst_view.iter_4_rows_mut(); for (src_rows, dst_rows) in src_iter.zip(dst_iter) { unsafe { horiz_convolution_four_rows(src_rows, dst_rows, normalizer); } } let yy = dst_height - dst_height % 4; let src_rows = src_view.iter_rows(yy + offset); let dst_rows = dst_view.iter_rows_mut(yy); for (src_row, dst_row) in src_rows.zip(dst_rows) { unsafe { horiz_convolution_one_row(src_row, dst_row, normalizer); } } } /// For safety, it is necessary to ensure the following conditions: /// - length of all rows in src_rows must be equal /// - length of all rows in dst_rows must be equal /// - coefficients_chunks.len() == dst_rows.0.len() /// - max(chunk.start + chunk.values.len() for chunk in coefficients_chunks) <= src_row.0.len() #[target_feature(enable = "simd128")] unsafe fn horiz_convolution_four_rows( src_rows: [&[U16]; 4], dst_rows: [&mut [U16]; 4], normalizer: &Normalizer32, ) { let precision = normalizer.precision(); let half_error = 1i64 << (precision - 1); let mut ll_buf = [0i64; 2]; /* |L0 | |L1 | |L2 | |L3 | |L4 | |L5 | |L6 | |L7 | |0001| |0203| |0405| |0607| |0809| |1011| |1213| |1415| Shuffle to extract L0 and L1 as i64: 0, 1, -1, -1, -1, -1, -1, -1, 2, 3, -1, -1, -1, -1, -1, -1 Shuffle to extract L2 and L3 as i64: 4, 5, -1, -1, -1, -1, -1, -1, 6, 7, -1, -1, -1, -1, -1, -1 Shuffle to extract L4 and L5 as i64: 8, 9, -1, -1, -1, -1, -1, -1, 10, 11, -1, -1, -1, -1, -1, -1 Shuffle to extract L6 and L7 as i64: 12, 13, -1, -1, -1, -1, -1, -1, 14, 15, -1, -1, -1, -1, -1, -1 */ const L0L1_SHUFFLE: v128 = i8x16(0, 1, -1, -1, -1, -1, -1, -1, 2, 3, -1, -1, -1, -1, -1, -1); const L2L3_SHUFFLE: v128 = i8x16(4, 5, -1, -1, -1, -1, -1, -1, 6, 7, -1, -1, -1, -1, -1, -1); const L4L5_SHUFFLE: v128 = i8x16(8, 9, -1, -1, -1, -1, -1, -1, 10, 11, -1, -1, -1, -1, -1, -1); const L6L7_SHUFFLE: v128 = i8x16( 12, 13, -1, -1, -1, -1, -1, -1, 14, 15, -1, -1, -1, -1, -1, -1, ); for (dst_x, coeffs_chunk) in normalizer.chunks().iter().enumerate() { let mut x: usize = coeffs_chunk.start as usize; let mut ll_sum: [v128; 4] = [i64x2_splat(0i64); 4]; let mut coeffs = coeffs_chunk.values(); let coeffs_by_8 = coeffs.chunks_exact(8); coeffs = coeffs_by_8.remainder(); for k in coeffs_by_8 { let coeff01_i64x2 = i64x2(k[0] as i64, k[1] as i64); let coeff23_i64x2 = i64x2(k[2] as i64, k[3] as i64); let coeff45_i64x2 = i64x2(k[4] as 
i64, k[5] as i64); let coeff67_i64x2 = i64x2(k[6] as i64, k[7] as i64); for i in 0..4 { let mut sum = ll_sum[i]; let source = wasm32_utils::load_v128(src_rows[i], x); let l0l1_i64x2 = i8x16_swizzle(source, L0L1_SHUFFLE); sum = i64x2_add(sum, wasm32_utils::i64x2_mul_lo(l0l1_i64x2, coeff01_i64x2)); let l2l3_i64x2 = i8x16_swizzle(source, L2L3_SHUFFLE); sum = i64x2_add(sum, wasm32_utils::i64x2_mul_lo(l2l3_i64x2, coeff23_i64x2)); let l4l5_i64x2 = i8x16_swizzle(source, L4L5_SHUFFLE); sum = i64x2_add(sum, wasm32_utils::i64x2_mul_lo(l4l5_i64x2, coeff45_i64x2)); let l6l7_i64x2 = i8x16_swizzle(source, L6L7_SHUFFLE); sum = i64x2_add(sum, wasm32_utils::i64x2_mul_lo(l6l7_i64x2, coeff67_i64x2)); ll_sum[i] = sum; } x += 8; } let coeffs_by_4 = coeffs.chunks_exact(4); coeffs = coeffs_by_4.remainder(); for k in coeffs_by_4 { let coeff01_i64x2 = i64x2(k[0] as i64, k[1] as i64); let coeff23_i64x2 = i64x2(k[2] as i64, k[3] as i64); for i in 0..4 { let mut sum = ll_sum[i]; let source = wasm32_utils::load_v128(src_rows[i], x); let l0l1_i64x2 = i8x16_swizzle(source, L0L1_SHUFFLE); sum = i64x2_add(sum, wasm32_utils::i64x2_mul_lo(l0l1_i64x2, coeff01_i64x2)); let l2l3_i64x2 = i8x16_swizzle(source, L2L3_SHUFFLE); sum = i64x2_add(sum, wasm32_utils::i64x2_mul_lo(l2l3_i64x2, coeff23_i64x2)); ll_sum[i] = sum; } x += 4; } let coeffs_by_2 = coeffs.chunks_exact(2); coeffs = coeffs_by_2.remainder(); for k in coeffs_by_2 { let coeff01_i64x2 = i64x2(k[0] as i64, k[1] as i64); for i in 0..4 { let source = wasm32_utils::load_v128(src_rows[i], x); let l_i64x2 = i8x16_swizzle(source, L0L1_SHUFFLE); ll_sum[i] = i64x2_add( ll_sum[i], wasm32_utils::i64x2_mul_lo(l_i64x2, coeff01_i64x2), ); } x += 2; } if let Some(&k) = coeffs.first() { let coeff01_i64x2 = i64x2(k as i64, 0); for i in 0..4 { let pixel = src_rows[i].get_unchecked(x).0 as i64; let source = i64x2(pixel, 0); ll_sum[i] = i64x2_add(ll_sum[i], wasm32_utils::i64x2_mul_lo(source, coeff01_i64x2)); } } for i in 0..4 { v128_store(ll_buf.as_mut_ptr() as *mut v128, ll_sum[i]); let dst_pixel = dst_rows[i].get_unchecked_mut(dst_x); dst_pixel.0 = normalizer.clip(ll_buf.iter().sum::() + half_error); } } } /// For safety, it is necessary to ensure the following conditions: /// - bounds.len() == dst_row.len() /// - coefficients_chunks.len() == dst_row.len() /// - max(chunk.start + chunk.values.len() for chunk in coefficients_chunks) <= src_row.len() #[target_feature(enable = "simd128")] unsafe fn horiz_convolution_one_row( src_row: &[U16], dst_row: &mut [U16], normalizer: &Normalizer32, ) { let precision = normalizer.precision(); let half_error = 1i64 << (precision - 1); let mut ll_buf = [0i64; 2]; /* |L0 | |L1 | |L2 | |L3 | |L4 | |L5 | |L6 | |L7 | |0001| |0203| |0405| |0607| |0809| |1011| |1213| |1415| Shuffle to extract L0 and L1 as i64: 0, 1, -1, -1, -1, -1, -1, -1, 2, 3, -1, -1, -1, -1, -1, -1 Shuffle to extract L2 and L3 as i64: 4, 5, -1, -1, -1, -1, -1, -1, 6, 7, -1, -1, -1, -1, -1, -1 Shuffle to extract L4 and L5 as i64: 8, 9, -1, -1, -1, -1, -1, -1, 10, 11, -1, -1, -1, -1, -1, -1 Shuffle to extract L6 and L7 as i64: 12, 13, -1, -1, -1, -1, -1, -1, 14, 15, -1, -1, -1, -1, -1, -1 */ const L01_SHUFFLE: v128 = i8x16(0, 1, -1, -1, -1, -1, -1, -1, 2, 3, -1, -1, -1, -1, -1, -1); const L23_SHUFFLE: v128 = i8x16(4, 5, -1, -1, -1, -1, -1, -1, 6, 7, -1, -1, -1, -1, -1, -1); const L45_SHUFFLE: v128 = i8x16(8, 9, -1, -1, -1, -1, -1, -1, 10, 11, -1, -1, -1, -1, -1, -1); const L67_SHUFFLE: v128 = i8x16( 12, 13, -1, -1, -1, -1, -1, -1, 14, 15, -1, -1, -1, -1, -1, -1, ); for (dst_x, coeffs_chunk) 
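// One destination pixel per iteration: source u16 values are widened to two
// i64 lanes with the swizzle masks above, multiplied by the matching
// coefficients (8, 4, 2 and finally 1 at a time) and accumulated in `ll_sum`;
// the two lanes plus the `half_error` rounding term are then summed and
// clipped back to u16.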
in normalizer.chunks().iter().enumerate() { let mut x: usize = coeffs_chunk.start as usize; let mut ll_sum = i64x2_splat(0); let mut coeffs = coeffs_chunk.values(); let coeffs_by_8 = coeffs.chunks_exact(8); coeffs = coeffs_by_8.remainder(); for k in coeffs_by_8 { let coeff01_i64x2 = i64x2(k[0] as i64, k[1] as i64); let coeff23_i64x2 = i64x2(k[2] as i64, k[3] as i64); let coeff45_i64x2 = i64x2(k[4] as i64, k[5] as i64); let coeff67_i64x2 = i64x2(k[6] as i64, k[7] as i64); let source = wasm32_utils::load_v128(src_row, x); let l_i64x2 = i8x16_swizzle(source, L01_SHUFFLE); ll_sum = i64x2_add(ll_sum, wasm32_utils::i64x2_mul_lo(l_i64x2, coeff01_i64x2)); let l_i64x2 = i8x16_swizzle(source, L23_SHUFFLE); ll_sum = i64x2_add(ll_sum, wasm32_utils::i64x2_mul_lo(l_i64x2, coeff23_i64x2)); let l_i64x2 = i8x16_swizzle(source, L45_SHUFFLE); ll_sum = i64x2_add(ll_sum, wasm32_utils::i64x2_mul_lo(l_i64x2, coeff45_i64x2)); let l_i64x2 = i8x16_swizzle(source, L67_SHUFFLE); ll_sum = i64x2_add(ll_sum, wasm32_utils::i64x2_mul_lo(l_i64x2, coeff67_i64x2)); x += 8; } let coeffs_by_4 = coeffs.chunks_exact(4); coeffs = coeffs_by_4.remainder(); for k in coeffs_by_4 { let coeff01_i64x2 = i64x2(k[0] as i64, k[1] as i64); let coeff23_i64x2 = i64x2(k[2] as i64, k[3] as i64); let source = wasm32_utils::load_v128(src_row, x); let l_i64x2 = i8x16_swizzle(source, L01_SHUFFLE); ll_sum = i64x2_add(ll_sum, wasm32_utils::i64x2_mul_lo(l_i64x2, coeff01_i64x2)); let l_i64x2 = i8x16_swizzle(source, L23_SHUFFLE); ll_sum = i64x2_add(ll_sum, wasm32_utils::i64x2_mul_lo(l_i64x2, coeff23_i64x2)); x += 4; } let coeffs_by_2 = coeffs.chunks_exact(2); coeffs = coeffs_by_2.remainder(); for k in coeffs_by_2 { let coeff01_i64x2 = i64x2(k[0] as i64, k[1] as i64); let source = wasm32_utils::load_v128(src_row, x); let l_i64x2 = i8x16_swizzle(source, L01_SHUFFLE); ll_sum = i64x2_add(ll_sum, wasm32_utils::i64x2_mul_lo(l_i64x2, coeff01_i64x2)); x += 2; } if let Some(&k) = coeffs.first() { let coeff01_i64x2 = i64x2(k as i64, 0); let pixel = src_row.get_unchecked(x).0 as i64; let source = i64x2(pixel, 0); ll_sum = i64x2_add(ll_sum, wasm32_utils::i64x2_mul_lo(source, coeff01_i64x2)); } v128_store(ll_buf.as_mut_ptr() as *mut v128, ll_sum); let dst_pixel = dst_row.get_unchecked_mut(dst_x); dst_pixel.0 = normalizer.clip(ll_buf[0] + ll_buf[1] + half_error); } } fast_image_resize-5.3.0/src/convolution/u16x2/avx2.rs000064400000000000000000000275701046102023000206050ustar 00000000000000use std::arch::x86_64::*; use crate::convolution::optimisations::Normalizer32; use crate::pixels::U16x2; use crate::{simd_utils, ImageView, ImageViewMut}; #[inline] pub(crate) fn horiz_convolution( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, offset: u32, normalizer: &Normalizer32, ) { let dst_height = dst_view.height(); let src_iter = src_view.iter_4_rows(offset, dst_height + offset); let dst_iter = dst_view.iter_4_rows_mut(); for (src_rows, dst_rows) in src_iter.zip(dst_iter) { unsafe { horiz_convolution_four_rows(src_rows, dst_rows, normalizer); } } let yy = dst_height - dst_height % 4; let src_rows = src_view.iter_rows(yy + offset); let dst_rows = dst_view.iter_rows_mut(yy); for (src_row, dst_row) in src_rows.zip(dst_rows) { unsafe { horiz_convolution_one_row(src_row, dst_row, normalizer); } } } /// For safety, it is necessary to ensure the following conditions: /// - length of all rows in src_rows must be equal /// - length of all rows in dst_rows must be equal /// - coefficients_chunks.len() == dst_rows.0.len() /// - max(chunk.start + chunk.values.len() for 
chunk in coefficients_chunks) <= src_row.0.len() #[target_feature(enable = "avx2")] unsafe fn horiz_convolution_four_rows( src_rows: [&[U16x2]; 4], dst_rows: [&mut [U16x2]; 4], normalizer: &Normalizer32, ) { let precision = normalizer.precision(); let half_error = 1i64 << (precision - 1); let mut ll_buf = [0i64; 4]; /* |L0 A0 | |L1 A1 | |L2 A2 | |L3 A3 | |0001 0203| |0405 0607| |0809 1011| |1213 1415| Shuffle to extract L0 and A0 as i64: -1, -1, -1, -1, -1, -1, 3, 2, -1, -1, -1, -1, -1, -1, 1, 0 Shuffle to extract L1 and A1 as i64: -1, -1, -1, -1, -1, -1, 7, 6, -1, -1, -1, -1, -1, -1, 5, 4 Shuffle to extract L2 and A2 as i64: -1, -1, -1, -1, -1, -1, 11, 10, -1, -1, -1, -1, -1, -1, 9, 8 Shuffle to extract L3 and A3 as i64: -1, -1, -1, -1, -1, -1, 15, 14, -1, -1, -1, -1, -1, -1, 13, 12 */ #[rustfmt::skip] let p0_shuffle = _mm256_set_epi8( -1, -1, -1, -1, -1, -1, 3, 2, -1, -1, -1, -1, -1, -1, 1, 0, -1, -1, -1, -1, -1, -1, 3, 2, -1, -1, -1, -1, -1, -1, 1, 0, ); #[rustfmt::skip] let p1_shuffle = _mm256_set_epi8( -1, -1, -1, -1, -1, -1, 7, 6, -1, -1, -1, -1, -1, -1, 5, 4, -1, -1, -1, -1, -1, -1, 7, 6, -1, -1, -1, -1, -1, -1, 5, 4, ); #[rustfmt::skip] let p2_shuffle = _mm256_set_epi8( -1, -1, -1, -1, -1, -1, 11, 10, -1, -1, -1, -1, -1, -1, 9, 8, -1, -1, -1, -1, -1, -1, 11, 10, -1, -1, -1, -1, -1, -1, 9, 8, ); #[rustfmt::skip] let p3_shuffle = _mm256_set_epi8( -1, -1, -1, -1, -1, -1, 15, 14, -1, -1, -1, -1, -1, -1, 13, 12, -1, -1, -1, -1, -1, -1, 15, 14, -1, -1, -1, -1, -1, -1, 13, 12, ); for (dst_x, chunk) in normalizer.chunks().iter().enumerate() { let mut x = chunk.start as usize; let mut ll_sum = [_mm256_set1_epi64x(half_error); 2]; let mut coeffs = chunk.values(); let coeffs_by_4 = coeffs.chunks_exact(4); coeffs = coeffs_by_4.remainder(); for k in coeffs_by_4 { let coeff0_i64x4 = _mm256_set1_epi64x(k[0] as i64); let coeff1_i64x4 = _mm256_set1_epi64x(k[1] as i64); let coeff2_i64x4 = _mm256_set1_epi64x(k[2] as i64); let coeff3_i64x4 = _mm256_set1_epi64x(k[3] as i64); for (i, sum) in ll_sum.iter_mut().enumerate() { let source = _mm256_set_m128i( simd_utils::loadu_si128(src_rows[i * 2 + 1], x), simd_utils::loadu_si128(src_rows[i * 2], x), ); let pp_i64x4 = _mm256_shuffle_epi8(source, p0_shuffle); *sum = _mm256_add_epi64(*sum, _mm256_mul_epi32(pp_i64x4, coeff0_i64x4)); let pp_i64x4 = _mm256_shuffle_epi8(source, p1_shuffle); *sum = _mm256_add_epi64(*sum, _mm256_mul_epi32(pp_i64x4, coeff1_i64x4)); let pp_i64x4 = _mm256_shuffle_epi8(source, p2_shuffle); *sum = _mm256_add_epi64(*sum, _mm256_mul_epi32(pp_i64x4, coeff2_i64x4)); let pp_i64x4 = _mm256_shuffle_epi8(source, p3_shuffle); *sum = _mm256_add_epi64(*sum, _mm256_mul_epi32(pp_i64x4, coeff3_i64x4)); } x += 4; } let coeffs_by_2 = coeffs.chunks_exact(2); coeffs = coeffs_by_2.remainder(); for k in coeffs_by_2 { let coeff0_i64x4 = _mm256_set1_epi64x(k[0] as i64); let coeff1_i64x4 = _mm256_set1_epi64x(k[1] as i64); for (i, sum) in ll_sum.iter_mut().enumerate() { let source = _mm256_set_m128i( simd_utils::loadl_epi64(src_rows[i * 2 + 1], x), simd_utils::loadl_epi64(src_rows[i * 2], x), ); let pp_i64x4 = _mm256_shuffle_epi8(source, p0_shuffle); *sum = _mm256_add_epi64(*sum, _mm256_mul_epi32(pp_i64x4, coeff0_i64x4)); let pp_i64x4 = _mm256_shuffle_epi8(source, p1_shuffle); *sum = _mm256_add_epi64(*sum, _mm256_mul_epi32(pp_i64x4, coeff1_i64x4)); } x += 2; } if let Some(&k) = coeffs.first() { let coeff0_i64x4 = _mm256_set1_epi64x(k as i64); for (i, sum) in ll_sum.iter_mut().enumerate() { let source = _mm256_set_m128i( simd_utils::loadl_epi32(src_rows[i * 2 + 
1], x), simd_utils::loadl_epi32(src_rows[i * 2], x), ); let pp_i64x4 = _mm256_shuffle_epi8(source, p0_shuffle); *sum = _mm256_add_epi64(*sum, _mm256_mul_epi32(pp_i64x4, coeff0_i64x4)); } } // ll_sum.into_iter().enumerate() executes slowly than ll_sum.iter().enumerate() for (i, &ll) in ll_sum.iter().enumerate() { _mm256_storeu_si256(ll_buf.as_mut_ptr() as *mut __m256i, ll); let dst_pixel = dst_rows[i * 2].get_unchecked_mut(dst_x); dst_pixel.0 = [normalizer.clip(ll_buf[0]), normalizer.clip(ll_buf[1])]; let dst_pixel = dst_rows[i * 2 + 1].get_unchecked_mut(dst_x); dst_pixel.0 = [normalizer.clip(ll_buf[2]), normalizer.clip(ll_buf[3])]; } } } /// For safety, it is necessary to ensure the following conditions: /// - bounds.len() == dst_row.len() /// - coefficients_chunks.len() == dst_row.len() /// - max(chunk.start + chunk.values.len() for chunk in coefficients_chunks) <= src_row.len() #[target_feature(enable = "avx2")] unsafe fn horiz_convolution_one_row( src_row: &[U16x2], dst_row: &mut [U16x2], normalizer: &Normalizer32, ) { let precision = normalizer.precision(); let half_error = 1i64 << (precision - 1); let mut ll_buf = [0i64; 4]; /* |L0 A0 | |L1 A1 | |L2 A2 | |L3 A3 | |0001 0203| |0405 0607| |0809 1011| |1213 1415| Shuffle to extract L0 and A0 as i64: -1, -1, -1, -1, -1, -1, 3, 2, -1, -1, -1, -1, -1, -1, 1, 0 Shuffle to extract L1 and A1 as i64: -1, -1, -1, -1, -1, -1, 7, 6, -1, -1, -1, -1, -1, -1, 5, 4 Shuffle to extract L2 and A2 as i64: -1, -1, -1, -1, -1, -1, 11, 10, -1, -1, -1, -1, -1, -1, 9, 8 Shuffle to extract L3 and A3 as i64: -1, -1, -1, -1, -1, -1, 15, 14, -1, -1, -1, -1, -1, -1, 13, 12 */ #[rustfmt::skip] let p0_shuffle = _mm256_set_epi8( -1, -1, -1, -1, -1, -1, 3, 2, -1, -1, -1, -1, -1, -1, 1, 0, -1, -1, -1, -1, -1, -1, 3, 2, -1, -1, -1, -1, -1, -1, 1, 0, ); #[rustfmt::skip] let p1_shuffle = _mm256_set_epi8( -1, -1, -1, -1, -1, -1, 7, 6, -1, -1, -1, -1, -1, -1, 5, 4, -1, -1, -1, -1, -1, -1, 7, 6, -1, -1, -1, -1, -1, -1, 5, 4, ); #[rustfmt::skip] let p2_shuffle = _mm256_set_epi8( -1, -1, -1, -1, -1, -1, 11, 10, -1, -1, -1, -1, -1, -1, 9, 8, -1, -1, -1, -1, -1, -1, 11, 10, -1, -1, -1, -1, -1, -1, 9, 8, ); #[rustfmt::skip] let p3_shuffle = _mm256_set_epi8( -1, -1, -1, -1, -1, -1, 15, 14, -1, -1, -1, -1, -1, -1, 13, 12, -1, -1, -1, -1, -1, -1, 15, 14, -1, -1, -1, -1, -1, -1, 13, 12, ); for (dst_x, chunk) in normalizer.chunks().iter().enumerate() { let mut x = chunk.start as usize; let mut ll_sum = _mm256_setzero_si256(); let mut coeffs = chunk.values(); let coeffs_by_8 = coeffs.chunks_exact(8); coeffs = coeffs_by_8.remainder(); for k in coeffs_by_8 { let coeff04_i64x4 = _mm256_set_epi64x(k[4] as i64, k[4] as i64, k[0] as i64, k[0] as i64); let coeff15_i64x4 = _mm256_set_epi64x(k[5] as i64, k[5] as i64, k[1] as i64, k[1] as i64); let coeff26_i64x4 = _mm256_set_epi64x(k[6] as i64, k[6] as i64, k[2] as i64, k[2] as i64); let coeff37_i64x4 = _mm256_set_epi64x(k[7] as i64, k[7] as i64, k[3] as i64, k[3] as i64); let source = simd_utils::loadu_si256(src_row, x); let pp_i64x4 = _mm256_shuffle_epi8(source, p0_shuffle); ll_sum = _mm256_add_epi64(ll_sum, _mm256_mul_epi32(pp_i64x4, coeff04_i64x4)); let pp_i64x4 = _mm256_shuffle_epi8(source, p1_shuffle); ll_sum = _mm256_add_epi64(ll_sum, _mm256_mul_epi32(pp_i64x4, coeff15_i64x4)); let pp_i64x4 = _mm256_shuffle_epi8(source, p2_shuffle); ll_sum = _mm256_add_epi64(ll_sum, _mm256_mul_epi32(pp_i64x4, coeff26_i64x4)); let pp_i64x4 = _mm256_shuffle_epi8(source, p3_shuffle); ll_sum = _mm256_add_epi64(ll_sum, _mm256_mul_epi32(pp_i64x4, 
coeff37_i64x4)); x += 8; } let coeffs_by_4 = coeffs.chunks_exact(4); coeffs = coeffs_by_4.remainder(); for k in coeffs_by_4 { let coeff02_i64x4 = _mm256_set_epi64x(k[2] as i64, k[2] as i64, k[0] as i64, k[0] as i64); let coeff13_i64x4 = _mm256_set_epi64x(k[3] as i64, k[3] as i64, k[1] as i64, k[1] as i64); let source = _mm256_set_m128i( simd_utils::loadl_epi64(src_row, x + 2), simd_utils::loadl_epi64(src_row, x), ); let pp_i64x4 = _mm256_shuffle_epi8(source, p0_shuffle); ll_sum = _mm256_add_epi64(ll_sum, _mm256_mul_epi32(pp_i64x4, coeff02_i64x4)); let pp_i64x4 = _mm256_shuffle_epi8(source, p1_shuffle); ll_sum = _mm256_add_epi64(ll_sum, _mm256_mul_epi32(pp_i64x4, coeff13_i64x4)); x += 4; } let coeffs_by_2 = coeffs.chunks_exact(2); coeffs = coeffs_by_2.remainder(); for k in coeffs_by_2 { let coeff01_i64x4 = _mm256_set_epi64x(k[1] as i64, k[1] as i64, k[0] as i64, k[0] as i64); let source = _mm256_set_m128i( simd_utils::loadl_epi32(src_row, x + 1), simd_utils::loadl_epi32(src_row, x), ); let pp_i64x4 = _mm256_shuffle_epi8(source, p0_shuffle); ll_sum = _mm256_add_epi64(ll_sum, _mm256_mul_epi32(pp_i64x4, coeff01_i64x4)); x += 2; } if let Some(&k) = coeffs.first() { let coeff0_i64x4 = _mm256_set_epi64x(0, 0, k as i64, k as i64); let source = _mm256_set_m128i(_mm_setzero_si128(), simd_utils::loadl_epi32(src_row, x)); let p_i64x4 = _mm256_shuffle_epi8(source, p0_shuffle); ll_sum = _mm256_add_epi64(ll_sum, _mm256_mul_epi32(p_i64x4, coeff0_i64x4)); } _mm256_storeu_si256(ll_buf.as_mut_ptr() as *mut __m256i, ll_sum); let dst_pixel = dst_row.get_unchecked_mut(dst_x); dst_pixel.0 = [ normalizer.clip(ll_buf[0] + ll_buf[2] + half_error), normalizer.clip(ll_buf[1] + ll_buf[3] + half_error), ]; } } fast_image_resize-5.3.0/src/convolution/u16x2/mod.rs000064400000000000000000000050401046102023000204700ustar 00000000000000use super::{Coefficients, Convolution}; use crate::convolution::optimisations::Normalizer32; use crate::convolution::vertical_u16::vert_convolution_u16; use crate::pixels::U16x2; use crate::{CpuExtensions, ImageView, ImageViewMut}; #[cfg(target_arch = "x86_64")] mod avx2; mod native; #[cfg(target_arch = "aarch64")] mod neon; #[cfg(target_arch = "x86_64")] mod sse4; #[cfg(target_arch = "wasm32")] mod wasm32; type P = U16x2; impl Convolution for P { fn horiz_convolution( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, offset: u32, coeffs: Coefficients, cpu_extensions: CpuExtensions, ) { debug_assert!(src_view.height() - offset >= dst_view.height()); let normalizer = Normalizer32::new(coeffs); let normalizer_ref = &normalizer; try_process_in_threads_h! { horiz_convolution( src_view, dst_view, offset, normalizer_ref, cpu_extensions, ); } } fn vert_convolution( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, offset: u32, coeffs: Coefficients, cpu_extensions: CpuExtensions, ) { debug_assert!(src_view.width() - offset >= dst_view.width()); let normalizer = Normalizer32::new(coeffs); let normalizer_ref = &normalizer; try_process_in_threads_v! 
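// Judging by its name, this macro (defined elsewhere in the crate) presumably
// runs the wrapped vertical-convolution call on several threads, splitting
// the work between them, when the crate is built with multi-threading
// support, and simply executes the call inline otherwise.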
{ vert_convolution_u16( src_view, dst_view, offset, normalizer_ref, cpu_extensions, ); } } } fn horiz_convolution( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, offset: u32, normalizer: &Normalizer32, cpu_extensions: CpuExtensions, ) { match cpu_extensions { #[cfg(target_arch = "x86_64")] CpuExtensions::Avx2 => avx2::horiz_convolution(src_view, dst_view, offset, normalizer), #[cfg(target_arch = "x86_64")] CpuExtensions::Sse4_1 => sse4::horiz_convolution(src_view, dst_view, offset, normalizer), #[cfg(target_arch = "aarch64")] CpuExtensions::Neon => neon::horiz_convolution(src_view, dst_view, offset, normalizer), #[cfg(target_arch = "wasm32")] CpuExtensions::Simd128 => wasm32::horiz_convolution(src_view, dst_view, offset, normalizer), _ => native::horiz_convolution(src_view, dst_view, offset, normalizer), } } fast_image_resize-5.3.0/src/convolution/u16x2/native.rs000064400000000000000000000023511046102023000212010ustar 00000000000000use crate::convolution::optimisations::Normalizer32; use crate::pixels::U16x2; use crate::{ImageView, ImageViewMut}; #[inline(always)] pub(crate) fn horiz_convolution( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, offset: u32, normalizer: &Normalizer32, ) { let precision = normalizer.precision(); let coefficients_chunks = normalizer.chunks(); let initial: i64 = 1 << (precision - 1); let src_rows = src_view.iter_rows(offset); let dst_rows = dst_view.iter_rows_mut(0); for (dst_row, src_row) in dst_rows.zip(src_rows) { for (coeffs_chunk, dst_pixel) in coefficients_chunks.iter().zip(dst_row.iter_mut()) { let first_x_src = coeffs_chunk.start as usize; let mut ss = [initial; 2]; let src_pixels = unsafe { src_row.get_unchecked(first_x_src..) }; for (&k, src_pixel) in coeffs_chunk.values().iter().zip(src_pixels) { for (i, s) in ss.iter_mut().enumerate() { *s += src_pixel.0[i] as i64 * (k as i64); } } for (i, s) in ss.iter().copied().enumerate() { dst_pixel.0[i] = normalizer.clip(s); } } } } fast_image_resize-5.3.0/src/convolution/u16x2/neon.rs000064400000000000000000000163241046102023000206570ustar 00000000000000use std::arch::aarch64::*; use crate::convolution::optimisations::{CoefficientsI32Chunk, Normalizer32}; use crate::neon_utils; use crate::pixels::U16x2; use crate::{ImageView, ImageViewMut}; #[inline] pub(crate) fn horiz_convolution( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, offset: u32, normalizer: &Normalizer32, ) { let precision = normalizer.precision(); macro_rules! 
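// The `call` macro defined below forwards a literal precision value into
// `horiz_convolution_p` as a compile-time constant, and `constify_64_imm8!`
// presumably expands to a `match` over the possible `precision` values,
// roughly `match precision { 1 => call!(1), 2 => call!(2), /* ... */ }`,
// so that the NEON shift intrinsics further down receive the immediate
// operand they require.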
call { ($imm8:expr) => {{ horiz_convolution_p::<$imm8>(src_view, dst_view, offset, normalizer); }}; } constify_64_imm8!(precision, call); } fn horiz_convolution_p( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, offset: u32, normalizer: &Normalizer32, ) { let coefficients_chunks = normalizer.chunks(); let dst_height = dst_view.height(); let src_iter = src_view.iter_4_rows(offset, dst_height + offset); let dst_iter = dst_view.iter_4_rows_mut(); for (src_rows, dst_rows) in src_iter.zip(dst_iter) { unsafe { horiz_convolution_four_rows::(src_rows, dst_rows, coefficients_chunks); } } let yy = dst_height - dst_height % 4; let src_rows = src_view.iter_rows(yy + offset); let dst_rows = dst_view.iter_rows_mut(yy); for (src_row, dst_row) in src_rows.zip(dst_rows) { unsafe { horiz_convolution_one_row::(src_row, dst_row, coefficients_chunks); } } } /// For safety, it is necessary to ensure the following conditions: /// - length of all rows in src_rows must be equal /// - length of all rows in dst_rows must be equal /// - coefficients_chunks.len() == dst_rows.0.len() /// - max(chunk.start + chunk.values.len() for chunk in coefficients_chunks) <= src_row.0.len() /// - precision <= MAX_COEFS_PRECISION #[target_feature(enable = "neon")] unsafe fn horiz_convolution_four_rows( src_rows: [&[U16x2]; 4], dst_rows: [&mut [U16x2]; 4], coefficients_chunks: &[CoefficientsI32Chunk], ) { let initial = vdupq_n_s64(1i64 << (PRECISION - 1)); // let zero_u16x8 = vdupq_n_u16(0); let zero_u16x4 = vdup_n_u16(0); for (dst_x, coeffs_chunk) in coefficients_chunks.iter().enumerate() { let mut x: usize = coeffs_chunk.start as usize; let mut sss_a = [initial; 4]; let mut coeffs = coeffs_chunk.values(); let coeffs_by_2 = coeffs.chunks_exact(2); coeffs = coeffs_by_2.remainder(); for k in coeffs_by_2 { let coeffs_i32x2 = neon_utils::load_i32x2(k, 0); let coeff0 = vzip1_s32(coeffs_i32x2, coeffs_i32x2); let coeff1 = vzip2_s32(coeffs_i32x2, coeffs_i32x2); for i in 0..4 { let mut sss = sss_a[i]; let source = neon_utils::load_u16x4(src_rows[i], x); let pix_i32 = vreinterpret_s32_u16(vzip1_u16(source, zero_u16x4)); sss = vmlal_s32(sss, pix_i32, coeff0); let pix_i32 = vreinterpret_s32_u16(vzip2_u16(source, zero_u16x4)); sss = vmlal_s32(sss, pix_i32, coeff1); sss_a[i] = sss; } x += 2; } if !coeffs.is_empty() { let coeffs_i32x2 = neon_utils::load_i32x1(coeffs, 0); let coeff = vzip1_s32(coeffs_i32x2, coeffs_i32x2); for i in 0..4 { let source = neon_utils::load_u16x2(src_rows[i], x); let pix_i32 = vreinterpret_s32_u16(vzip1_u16(source, zero_u16x4)); sss_a[i] = vmlal_s32(sss_a[i], pix_i32, coeff); } } sss_a[0] = vshrq_n_s64::(sss_a[0]); sss_a[1] = vshrq_n_s64::(sss_a[1]); sss_a[2] = vshrq_n_s64::(sss_a[2]); sss_a[3] = vshrq_n_s64::(sss_a[3]); for i in 0..4 { let res_u16x4 = vqmovun_s32(vcombine_s32( vqmovn_s64(sss_a[i]), vreinterpret_s32_u16(zero_u16x4), )); dst_rows[i].get_unchecked_mut(dst_x).0 = [ vduph_lane_u16::<0>(res_u16x4), vduph_lane_u16::<1>(res_u16x4), ]; } } } /// For safety, it is necessary to ensure the following conditions: /// - bounds.len() == dst_row.len() /// - coefficients_chunks.len() == dst_row.len() /// - max(chunk.start + chunk.values.len() for chunk in coefficients_chunks) <= src_row.len() /// - precision <= MAX_COEFS_PRECISION #[target_feature(enable = "neon")] unsafe fn horiz_convolution_one_row( src_row: &[U16x2], dst_row: &mut [U16x2], coefficients_chunks: &[CoefficientsI32Chunk], ) { let initial = vdupq_n_s64(1i64 << (PRECISION - 1)); let zero_u16x8 = vdupq_n_u16(0); let zero_u16x4 = vdup_n_u16(0); 
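// For every destination pixel: the L and A components of the source pixels
// are widened and accumulated as `component * coefficient` into 64-bit lanes
// with `vmlal_s32`, two coefficients per iteration plus an optional single
// trailing coefficient. The accumulators start at `1 << (PRECISION - 1)` so
// that the arithmetic shift right by `PRECISION` rounds to nearest, and the
// result is narrowed back to u16 with saturation.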
for (dst_x, coeffs_chunk) in coefficients_chunks.iter().enumerate() { let mut x: usize = coeffs_chunk.start as usize; let mut sss = initial; let mut coeffs = coeffs_chunk.values(); let coeffs_by_4 = coeffs.chunks_exact(4); coeffs = coeffs_by_4.remainder(); for k in coeffs_by_4 { let coeffs_i32x4 = neon_utils::load_i32x4(k, 0); let coeff0 = vzip1q_s32(coeffs_i32x4, coeffs_i32x4); let coeff1 = vzip2q_s32(coeffs_i32x4, coeffs_i32x4); let source = neon_utils::load_u16x8(src_row, x); let pix_i32 = vreinterpretq_s32_u16(vzip1q_u16(source, zero_u16x8)); sss = vmlal_s32(sss, vget_low_s32(pix_i32), vget_low_s32(coeff0)); sss = vmlal_s32(sss, vget_high_s32(pix_i32), vget_high_s32(coeff0)); let pix_i32 = vreinterpretq_s32_u16(vzip2q_u16(source, zero_u16x8)); sss = vmlal_s32(sss, vget_low_s32(pix_i32), vget_low_s32(coeff1)); sss = vmlal_s32(sss, vget_high_s32(pix_i32), vget_high_s32(coeff1)); x += 4; } let mut coeffs_by_2 = coeffs.chunks_exact(2); coeffs = coeffs_by_2.remainder(); if let Some(k) = coeffs_by_2.next() { let coeffs_i32x2 = neon_utils::load_i32x2(k, 0); let coeff0 = vzip1_s32(coeffs_i32x2, coeffs_i32x2); let coeff1 = vzip2_s32(coeffs_i32x2, coeffs_i32x2); let source = neon_utils::load_u16x4(src_row, x); let pix_i32 = vreinterpret_s32_u16(vzip1_u16(source, zero_u16x4)); sss = vmlal_s32(sss, pix_i32, coeff0); let pix_i32 = vreinterpret_s32_u16(vzip2_u16(source, zero_u16x4)); sss = vmlal_s32(sss, pix_i32, coeff1); x += 2; } if !coeffs.is_empty() { let coeffs_i32x2 = neon_utils::load_i32x1(coeffs, 0); let coeff = vzip1_s32(coeffs_i32x2, coeffs_i32x2); let source = neon_utils::load_u16x2(src_row, x); let pix_i32 = vreinterpret_s32_u16(vzip1_u16(source, zero_u16x4)); sss = vmlal_s32(sss, pix_i32, coeff); } sss = vshrq_n_s64::(sss); let res_u16x4 = vqmovun_s32(vcombine_s32( vqmovn_s64(sss), vreinterpret_s32_u16(zero_u16x4), )); dst_row.get_unchecked_mut(dst_x).0 = [ vduph_lane_u16::<0>(res_u16x4), vduph_lane_u16::<1>(res_u16x4), ]; } } fast_image_resize-5.3.0/src/convolution/u16x2/sse4.rs000064400000000000000000000222671046102023000206010ustar 00000000000000use std::arch::x86_64::*; use crate::convolution::optimisations::Normalizer32; use crate::pixels::U16x2; use crate::{simd_utils, ImageView, ImageViewMut}; #[inline] pub(crate) fn horiz_convolution( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, offset: u32, normalizer: &Normalizer32, ) { let dst_height = dst_view.height(); let src_iter = src_view.iter_4_rows(offset, dst_height + offset); let dst_iter = dst_view.iter_4_rows_mut(); for (src_rows, dst_rows) in src_iter.zip(dst_iter) { unsafe { horiz_convolution_four_rows(src_rows, dst_rows, normalizer); } } let yy = dst_height - dst_height % 4; let src_rows = src_view.iter_rows(yy + offset); let dst_rows = dst_view.iter_rows_mut(yy); for (src_row, dst_row) in src_rows.zip(dst_rows) { unsafe { horiz_convolution_one_row(src_row, dst_row, normalizer); } } } /// For safety, it is necessary to ensure the following conditions: /// - length of all rows in src_rows must be equal /// - length of all rows in dst_rows must be equal /// - coefficients_chunks.len() == dst_rows.0.len() /// - max(chunk.start + chunk.values.len() for chunk in coefficients_chunks) <= src_row.0.len() /// - precision <= MAX_COEFS_PRECISION #[target_feature(enable = "sse4.1")] unsafe fn horiz_convolution_four_rows( src_rows: [&[U16x2]; 4], dst_rows: [&mut [U16x2]; 4], normalizer: &Normalizer32, ) { let precision = normalizer.precision(); let half_error = 1i64 << (precision - 1); let mut ll_buf = [0i64; 2]; /* |L0 A0 | 
|L1 A1 | |L2 A2 | |L3 A3 | |0001 0203| |0405 0607| |0809 1011| |1213 1415| Shuffle to extract L0 and A0 as i64: -1, -1, -1, -1, -1, -1, 3, 2, -1, -1, -1, -1, -1, -1, 1, 0 Shuffle to extract L1 and A1 as i64: -1, -1, -1, -1, -1, -1, 7, 6, -1, -1, -1, -1, -1, -1, 5, 4 Shuffle to extract L2 and A2 as i64: -1, -1, -1, -1, -1, -1, 11, 10, -1, -1, -1, -1, -1, -1, 9, 8 Shuffle to extract L3 and A3 as i64: -1, -1, -1, -1, -1, -1, 15, 14, -1, -1, -1, -1, -1, -1, 13, 12 */ let p0_shuffle = _mm_set_epi8(-1, -1, -1, -1, -1, -1, 3, 2, -1, -1, -1, -1, -1, -1, 1, 0); let p1_shuffle = _mm_set_epi8(-1, -1, -1, -1, -1, -1, 7, 6, -1, -1, -1, -1, -1, -1, 5, 4); let p2_shuffle = _mm_set_epi8(-1, -1, -1, -1, -1, -1, 11, 10, -1, -1, -1, -1, -1, -1, 9, 8); let p3_shuffle = _mm_set_epi8( -1, -1, -1, -1, -1, -1, 15, 14, -1, -1, -1, -1, -1, -1, 13, 12, ); for (dst_x, chunk) in normalizer.chunks().iter().enumerate() { let mut x = chunk.start as usize; let mut ll_sum = [_mm_set1_epi64x(half_error); 4]; let mut coeffs = chunk.values(); let coeffs_by_4 = coeffs.chunks_exact(4); coeffs = coeffs_by_4.remainder(); for k in coeffs_by_4 { let coeff0_i64x2 = _mm_set1_epi64x(k[0] as i64); let coeff1_i64x2 = _mm_set1_epi64x(k[1] as i64); let coeff2_i64x2 = _mm_set1_epi64x(k[2] as i64); let coeff3_i64x2 = _mm_set1_epi64x(k[3] as i64); for i in 0..4 { let mut sum = ll_sum[i]; let source = simd_utils::loadu_si128(src_rows[i], x); let p_i64x2 = _mm_shuffle_epi8(source, p0_shuffle); sum = _mm_add_epi64(sum, _mm_mul_epi32(p_i64x2, coeff0_i64x2)); let p_i64x2 = _mm_shuffle_epi8(source, p1_shuffle); sum = _mm_add_epi64(sum, _mm_mul_epi32(p_i64x2, coeff1_i64x2)); let p_i64x2 = _mm_shuffle_epi8(source, p2_shuffle); sum = _mm_add_epi64(sum, _mm_mul_epi32(p_i64x2, coeff2_i64x2)); let p_i64x2 = _mm_shuffle_epi8(source, p3_shuffle); sum = _mm_add_epi64(sum, _mm_mul_epi32(p_i64x2, coeff3_i64x2)); ll_sum[i] = sum; } x += 4; } let coeffs_by_2 = coeffs.chunks_exact(2); coeffs = coeffs_by_2.remainder(); for k in coeffs_by_2 { let coeff0_i64x2 = _mm_set1_epi64x(k[0] as i64); let coeff1_i64x2 = _mm_set1_epi64x(k[1] as i64); for i in 0..4 { let mut sum = ll_sum[i]; let source = simd_utils::loadl_epi64(src_rows[i], x); let p_i64x2 = _mm_shuffle_epi8(source, p0_shuffle); sum = _mm_add_epi64(sum, _mm_mul_epi32(p_i64x2, coeff0_i64x2)); let p_i64x2 = _mm_shuffle_epi8(source, p1_shuffle); sum = _mm_add_epi64(sum, _mm_mul_epi32(p_i64x2, coeff1_i64x2)); ll_sum[i] = sum; } x += 2; } if let Some(&k) = coeffs.first() { let coeff0_i64x2 = _mm_set1_epi64x(k as i64); for i in 0..4 { let source = simd_utils::loadl_epi32(src_rows[i], x); let p_i64x2 = _mm_shuffle_epi8(source, p0_shuffle); ll_sum[i] = _mm_add_epi64(ll_sum[i], _mm_mul_epi32(p_i64x2, coeff0_i64x2)); } } for i in 0..4 { _mm_storeu_si128(ll_buf.as_mut_ptr() as *mut __m128i, ll_sum[i]); let dst_pixel = dst_rows[i].get_unchecked_mut(dst_x); dst_pixel.0 = [normalizer.clip(ll_buf[0]), normalizer.clip(ll_buf[1])]; } } } /// For safety, it is necessary to ensure the following conditions: /// - bounds.len() == dst_row.len() /// - coeffs.len() == dst_rows.0.len() * window_size /// - max(bound.start + bound.size for bound in bounds) <= src_row.len() /// - precision <= MAX_COEFS_PRECISION #[inline] #[target_feature(enable = "sse4.1")] unsafe fn horiz_convolution_one_row( src_row: &[U16x2], dst_row: &mut [U16x2], normalizer: &Normalizer32, ) { let precision = normalizer.precision(); let half_error = 1i64 << (precision - 1); let mut ll_buf = [0i64; 2]; /* |L0 A0 | |L1 A1 | |L2 A2 | |L3 A3 | |0001 0203| |0405 0607| 
|0809 1011| |1213 1415| Shuffle to extract L0 and A0 as i64: -1, -1, -1, -1, -1, -1, 3, 2, -1, -1, -1, -1, -1, -1, 1, 0 Shuffle to extract L1 and A1 as i64: -1, -1, -1, -1, -1, -1, 7, 6, -1, -1, -1, -1, -1, -1, 5, 4 Shuffle to extract L2 and A2 as i64: -1, -1, -1, -1, -1, -1, 11, 10, -1, -1, -1, -1, -1, -1, 9, 8 Shuffle to extract L3 and A3 as i64: -1, -1, -1, -1, -1, -1, 15, 14, -1, -1, -1, -1, -1, -1, 13, 12 */ let p0_shuffle = _mm_set_epi8(-1, -1, -1, -1, -1, -1, 3, 2, -1, -1, -1, -1, -1, -1, 1, 0); let p1_shuffle = _mm_set_epi8(-1, -1, -1, -1, -1, -1, 7, 6, -1, -1, -1, -1, -1, -1, 5, 4); let p2_shuffle = _mm_set_epi8(-1, -1, -1, -1, -1, -1, 11, 10, -1, -1, -1, -1, -1, -1, 9, 8); let p3_shuffle = _mm_set_epi8( -1, -1, -1, -1, -1, -1, 15, 14, -1, -1, -1, -1, -1, -1, 13, 12, ); for (dst_x, chunk) in normalizer.chunks().iter().enumerate() { let mut x = chunk.start as usize; let mut ll_sum = _mm_set1_epi64x(half_error); let mut coeffs = chunk.values(); let coeffs_by_4 = coeffs.chunks_exact(4); coeffs = coeffs_by_4.remainder(); for k in coeffs_by_4 { let coeff0_i64x2 = _mm_set1_epi64x(k[0] as i64); let coeff1_i64x2 = _mm_set1_epi64x(k[1] as i64); let coeff2_i64x2 = _mm_set1_epi64x(k[2] as i64); let coeff3_i64x2 = _mm_set1_epi64x(k[3] as i64); let source = simd_utils::loadu_si128(src_row, x); let p_i64x2 = _mm_shuffle_epi8(source, p0_shuffle); ll_sum = _mm_add_epi64(ll_sum, _mm_mul_epi32(p_i64x2, coeff0_i64x2)); let p_i64x2 = _mm_shuffle_epi8(source, p1_shuffle); ll_sum = _mm_add_epi64(ll_sum, _mm_mul_epi32(p_i64x2, coeff1_i64x2)); let p_i64x2 = _mm_shuffle_epi8(source, p2_shuffle); ll_sum = _mm_add_epi64(ll_sum, _mm_mul_epi32(p_i64x2, coeff2_i64x2)); let p_i64x2 = _mm_shuffle_epi8(source, p3_shuffle); ll_sum = _mm_add_epi64(ll_sum, _mm_mul_epi32(p_i64x2, coeff3_i64x2)); x += 4; } let coeffs_by_2 = coeffs.chunks_exact(2); coeffs = coeffs_by_2.remainder(); for k in coeffs_by_2 { let coeff0_i64x2 = _mm_set1_epi64x(k[0] as i64); let coeff1_i64x2 = _mm_set1_epi64x(k[1] as i64); let source = simd_utils::loadl_epi64(src_row, x); let p_i64x2 = _mm_shuffle_epi8(source, p0_shuffle); ll_sum = _mm_add_epi64(ll_sum, _mm_mul_epi32(p_i64x2, coeff0_i64x2)); let p_i64x2 = _mm_shuffle_epi8(source, p1_shuffle); ll_sum = _mm_add_epi64(ll_sum, _mm_mul_epi32(p_i64x2, coeff1_i64x2)); x += 2; } if let Some(&k) = coeffs.first() { let coeff0_i64x2 = _mm_set1_epi64x(k as i64); let source = simd_utils::loadl_epi32(src_row, x); let p_i64x2 = _mm_shuffle_epi8(source, p0_shuffle); ll_sum = _mm_add_epi64(ll_sum, _mm_mul_epi32(p_i64x2, coeff0_i64x2)); } _mm_storeu_si128(ll_buf.as_mut_ptr() as *mut __m128i, ll_sum); let dst_pixel = dst_row.get_unchecked_mut(dst_x); dst_pixel.0 = [normalizer.clip(ll_buf[0]), normalizer.clip(ll_buf[1])]; } } fast_image_resize-5.3.0/src/convolution/u16x2/wasm32.rs000064400000000000000000000224101046102023000210250ustar 00000000000000use std::arch::wasm32::*; use crate::convolution::optimisations::Normalizer32; use crate::pixels::U16x2; use crate::wasm32_utils; use crate::{ImageView, ImageViewMut}; #[inline] pub(crate) fn horiz_convolution( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, offset: u32, normalizer: &Normalizer32, ) { let dst_height = dst_view.height(); let src_iter = src_view.iter_4_rows(offset, dst_height + offset); let dst_iter = dst_view.iter_4_rows_mut(); for (src_rows, dst_rows) in src_iter.zip(dst_iter) { unsafe { horiz_convolution_four_rows(src_rows, dst_rows, normalizer); } } let yy = dst_height - dst_height % 4; let src_rows = src_view.iter_rows(yy + offset); 
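// Rows were consumed in groups of four above; the loop below walks the
// remaining 0..=3 destination rows (starting at row `yy`) one at a time with
// the single-row kernel so that every row of the destination image is filled.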
let dst_rows = dst_view.iter_rows_mut(yy); for (src_row, dst_row) in src_rows.zip(dst_rows) { unsafe { horiz_convolution_one_row(src_row, dst_row, normalizer); } } } /// For safety, it is necessary to ensure the following conditions: /// - length of all rows in src_rows must be equal /// - length of all rows in dst_rows must be equal /// - coefficients_chunks.len() == dst_rows.0.len() /// - max(chunk.start + chunk.values.len() for chunk in coefficients_chunks) <= src_row.0.len() /// - precision <= MAX_COEFS_PRECISION #[target_feature(enable = "simd128")] unsafe fn horiz_convolution_four_rows( src_rows: [&[U16x2]; 4], dst_rows: [&mut [U16x2]; 4], normalizer: &Normalizer32, ) { let precision = normalizer.precision(); let half_error = 1i64 << (precision - 1); let mut ll_buf = [0i64; 2]; /* |L0 A0 | |L1 A1 | |L2 A2 | |L3 A3 | |0001 0203| |0405 0607| |0809 1011| |1213 1415| Shuffle to extract L0 and A0 as i64: 0, 1, -1, -1, -1, -1, -1, -1, 2, 3, -1, -1, -1, -1, -1, -1 Shuffle to extract L1 and A1 as i64: 4, 5, -1, -1, -1, -1, -1, -1, 6, 7, -1, -1, -1, -1, -1, -1 Shuffle to extract L2 and A2 as i64: 8, 9, -1, -1, -1, -1, -1, -1, 10, 11, -1, -1, -1, -1, -1, -1 Shuffle to extract L3 and A3 as i64: 12, 13, -1, -1, -1, -1, -1, -1, 14, 15, -1, -1, -1, -1, -1, -1 */ const P0_SHUFFLE: v128 = i8x16(0, 1, -1, -1, -1, -1, -1, -1, 2, 3, -1, -1, -1, -1, -1, -1); const P1_SHUFFLE: v128 = i8x16(4, 5, -1, -1, -1, -1, -1, -1, 6, 7, -1, -1, -1, -1, -1, -1); const P2_SHUFFLE: v128 = i8x16(8, 9, -1, -1, -1, -1, -1, -1, 10, 11, -1, -1, -1, -1, -1, -1); const P3_SHUFFLE: v128 = i8x16( 12, 13, -1, -1, -1, -1, -1, -1, 14, 15, -1, -1, -1, -1, -1, -1, ); for (dst_x, coeffs_chunk) in normalizer.chunks().iter().enumerate() { let mut x: usize = coeffs_chunk.start as usize; let mut ll_sum = [i64x2_splat(half_error); 4]; let mut coeffs = coeffs_chunk.values(); let coeffs_by_4 = coeffs.chunks_exact(4); coeffs = coeffs_by_4.remainder(); for k in coeffs_by_4 { let coeff0_i64x2 = i64x2_splat(k[0] as i64); let coeff1_i64x2 = i64x2_splat(k[1] as i64); let coeff2_i64x2 = i64x2_splat(k[2] as i64); let coeff3_i64x2 = i64x2_splat(k[3] as i64); for i in 0..4 { let mut sum = ll_sum[i]; let source = wasm32_utils::load_v128(src_rows[i], x); let p_i64x2 = i8x16_swizzle(source, P0_SHUFFLE); sum = i64x2_add(sum, wasm32_utils::i64x2_mul_lo(p_i64x2, coeff0_i64x2)); let p_i64x2 = i8x16_swizzle(source, P1_SHUFFLE); sum = i64x2_add(sum, wasm32_utils::i64x2_mul_lo(p_i64x2, coeff1_i64x2)); let p_i64x2 = i8x16_swizzle(source, P2_SHUFFLE); sum = i64x2_add(sum, wasm32_utils::i64x2_mul_lo(p_i64x2, coeff2_i64x2)); let p_i64x2 = i8x16_swizzle(source, P3_SHUFFLE); sum = i64x2_add(sum, wasm32_utils::i64x2_mul_lo(p_i64x2, coeff3_i64x2)); ll_sum[i] = sum; } x += 4; } let coeffs_by_2 = coeffs.chunks_exact(2); coeffs = coeffs_by_2.remainder(); for k in coeffs_by_2 { let coeff0_i64x2 = i64x2_splat(k[0] as i64); let coeff1_i64x2 = i64x2_splat(k[1] as i64); for i in 0..4 { let mut sum = ll_sum[i]; let source = wasm32_utils::loadl_i64(src_rows[i], x); let p_i64x2 = i8x16_swizzle(source, P0_SHUFFLE); sum = i64x2_add(sum, wasm32_utils::i64x2_mul_lo(p_i64x2, coeff0_i64x2)); let p_i64x2 = i8x16_swizzle(source, P1_SHUFFLE); sum = i64x2_add(sum, wasm32_utils::i64x2_mul_lo(p_i64x2, coeff1_i64x2)); ll_sum[i] = sum; } x += 2; } if let Some(&k) = coeffs.first() { let coeff0_i64x2 = i64x2_splat(k as i64); for i in 0..4 { let source = wasm32_utils::loadl_i32(src_rows[i], x); let p_i64x2 = i8x16_swizzle(source, P0_SHUFFLE); ll_sum[i] = i64x2_add(ll_sum[i], 
wasm32_utils::i64x2_mul_lo(p_i64x2, coeff0_i64x2)); } } for i in 0..4 { v128_store(ll_buf.as_mut_ptr() as *mut v128, ll_sum[i]); let dst_pixel = dst_rows[i].get_unchecked_mut(dst_x); dst_pixel.0 = [normalizer.clip(ll_buf[0]), normalizer.clip(ll_buf[1])]; } } } /// For safety, it is necessary to ensure the following conditions: /// - bounds.len() == dst_row.len() /// - coeffs.len() == dst_rows.0.len() * window_size /// - max(bound.start + bound.size for bound in bounds) <= src_row.len() /// - precision <= MAX_COEFS_PRECISION #[inline] #[target_feature(enable = "simd128")] unsafe fn horiz_convolution_one_row( src_row: &[U16x2], dst_row: &mut [U16x2], normalizer: &Normalizer32, ) { let precision = normalizer.precision(); let half_error = 1i64 << (precision - 1); let mut ll_buf = [0i64; 2]; /* |L0 A0 | |L1 A1 | |L2 A2 | |L3 A3 | |0001 0203| |0405 0607| |0809 1011| |1213 1415| Shuffle to extract L0 and A0 as i64: 0, 1, -1, -1, -1, -1, -1, -1, 2, 3, -1, -1, -1, -1, -1, -1 Shuffle to extract L1 and A1 as i64: 4, 5, -1, -1, -1, -1, -1, -1, 6, 7, -1, -1, -1, -1, -1, -1 Shuffle to extract L2 and A2 as i64: 8, 9, -1, -1, -1, -1, -1, -1, 10, 11, -1, -1, -1, -1, -1, -1 Shuffle to extract L3 and A3 as i64: 12, 13, -1, -1, -1, -1, -1, -1, 14, 15, -1, -1, -1, -1, -1, -1 */ const P0_SHUFFLE: v128 = i8x16(0, 1, -1, -1, -1, -1, -1, -1, 2, 3, -1, -1, -1, -1, -1, -1); const P1_SHUFFLE: v128 = i8x16(4, 5, -1, -1, -1, -1, -1, -1, 6, 7, -1, -1, -1, -1, -1, -1); const P2_SHUFFLE: v128 = i8x16(8, 9, -1, -1, -1, -1, -1, -1, 10, 11, -1, -1, -1, -1, -1, -1); const P3_SHUFFLE: v128 = i8x16( 12, 13, -1, -1, -1, -1, -1, -1, 14, 15, -1, -1, -1, -1, -1, -1, ); for (dst_x, coeffs_chunk) in normalizer.chunks().iter().enumerate() { let mut x: usize = coeffs_chunk.start as usize; let mut ll_sum = i64x2_splat(half_error); let mut coeffs = coeffs_chunk.values(); let coeffs_by_4 = coeffs.chunks_exact(4); coeffs = coeffs_by_4.remainder(); for k in coeffs_by_4 { let coeff0_i64x2 = i64x2_splat(k[0] as i64); let coeff1_i64x2 = i64x2_splat(k[1] as i64); let coeff2_i64x2 = i64x2_splat(k[2] as i64); let coeff3_i64x2 = i64x2_splat(k[3] as i64); let source = wasm32_utils::load_v128(src_row, x); let p_i64x2 = i8x16_swizzle(source, P0_SHUFFLE); ll_sum = i64x2_add(ll_sum, wasm32_utils::i64x2_mul_lo(p_i64x2, coeff0_i64x2)); let p_i64x2 = i8x16_swizzle(source, P1_SHUFFLE); ll_sum = i64x2_add(ll_sum, wasm32_utils::i64x2_mul_lo(p_i64x2, coeff1_i64x2)); let p_i64x2 = i8x16_swizzle(source, P2_SHUFFLE); ll_sum = i64x2_add(ll_sum, wasm32_utils::i64x2_mul_lo(p_i64x2, coeff2_i64x2)); let p_i64x2 = i8x16_swizzle(source, P3_SHUFFLE); ll_sum = i64x2_add(ll_sum, wasm32_utils::i64x2_mul_lo(p_i64x2, coeff3_i64x2)); x += 4; } let coeffs_by_2 = coeffs.chunks_exact(2); coeffs = coeffs_by_2.remainder(); for k in coeffs_by_2 { let coeff0_i64x2 = i64x2_splat(k[0] as i64); let coeff1_i64x2 = i64x2_splat(k[1] as i64); let source = wasm32_utils::loadl_i64(src_row, x); let p_i64x2 = i8x16_swizzle(source, P0_SHUFFLE); ll_sum = i64x2_add(ll_sum, wasm32_utils::i64x2_mul_lo(p_i64x2, coeff0_i64x2)); let p_i64x2 = i8x16_swizzle(source, P1_SHUFFLE); ll_sum = i64x2_add(ll_sum, wasm32_utils::i64x2_mul_lo(p_i64x2, coeff1_i64x2)); x += 2; } if let Some(&k) = coeffs.first() { let coeff0_i64x2 = i64x2_splat(k as i64); let source = wasm32_utils::loadl_i32(src_row, x); let p_i64x2 = i8x16_swizzle(source, P0_SHUFFLE); ll_sum = i64x2_add(ll_sum, wasm32_utils::i64x2_mul_lo(p_i64x2, coeff0_i64x2)); } v128_store(ll_buf.as_mut_ptr() as *mut v128, ll_sum); let dst_pixel = 
dst_row.get_unchecked_mut(dst_x); dst_pixel.0 = [normalizer.clip(ll_buf[0]), normalizer.clip(ll_buf[1])]; } } fast_image_resize-5.3.0/src/convolution/u16x3/avx2.rs000064400000000000000000000302561046102023000206010ustar 00000000000000use std::arch::x86_64::*; use crate::convolution::optimisations::Normalizer32; use crate::pixels::U16x3; use crate::{simd_utils, ImageView, ImageViewMut}; #[inline] pub(crate) fn horiz_convolution( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, offset: u32, normalizer: &Normalizer32, ) { let dst_height = dst_view.height(); let src_iter = src_view.iter_4_rows(offset, dst_height + offset); let dst_iter = dst_view.iter_4_rows_mut(); for (src_rows, dst_rows) in src_iter.zip(dst_iter) { unsafe { horiz_convolution_four_rows(src_rows, dst_rows, normalizer); } } let yy = dst_height - dst_height % 4; let src_rows = src_view.iter_rows(yy + offset); let dst_rows = dst_view.iter_rows_mut(yy); for (src_row, dst_row) in src_rows.zip(dst_rows) { unsafe { horiz_convolution_one_row(src_row, dst_row, normalizer); } } } /// For safety, it is necessary to ensure the following conditions: /// - length of all rows in src_rows must be equal /// - length of all rows in dst_rows must be equal /// - coefficients_chunks.len() == dst_rows.0.len() /// - max(chunk.start + chunk.values.len() for chunk in coefficients_chunks) <= src_row.0.len() #[target_feature(enable = "avx2")] unsafe fn horiz_convolution_four_rows( src_rows: [&[U16x3]; 4], dst_rows: [&mut [U16x3]; 4], normalizer: &Normalizer32, ) { let precision = normalizer.precision(); let half_error = 1i64 << (precision - 1); let mut rg_buf = [0i64; 4]; let mut rg_bb_buf = [0i64; 4]; let mut bbb_buf = [0i64; 4]; /* |R G B | |R G B | |R G | - |B | |R G B | |R G B | |R | |0001 0203 0405| |0607 0809 1011| |1213 1415| - |0001| |0203 0405 0607| |0809 1011 1213| |1415| Shuffle to extract RG components of pixels 0 and 3 as i64: lo: -1, -1, -1, -1, -1, -1, 3, 2, -1, -1, -1, -1, -1, -1, 1, 0 hi: -1, -1, -1, -1, -1, -1, 5, 4, -1, -1, -1, -1, -1, -1, 3, 2 Shuffle to extract RG components of pixels 1 and 4 as i64: lo: -1, -1, -1, -1, -1, -1, 9, 8, -1, -1, -1, -1, -1, -1, 7, 6 hi: -1, -1, -1, -1, -1, -1, 11, 10, -1, -1, -1, -1, -1, -1, 9, 8 Shuffle to extract RG components of pixel 2 and BB of pixels 2-3 as i64: lo: -1, -1, -1, -1, -1, -1, 15, 14, -1, -1, -1, -1, -1, -1, 13, 12 hi: -1, -1, -1, -1, -1, -1, 7, 6, -1, -1, -1, -1, -1, -1, 1, 0 Shuffle to extract BB components of pixels 0, 1 and 4 as i64: lo: -1, -1, -1, -1, -1, -1, 11, 10, -1, -1, -1, -1, -1, -1, 5, 4 hi: -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, 13, 12 */ let rg03_shuffle = _mm256_set_m128i( _mm_set_epi8(-1, -1, -1, -1, -1, -1, 5, 4, -1, -1, -1, -1, -1, -1, 3, 2), _mm_set_epi8(-1, -1, -1, -1, -1, -1, 3, 2, -1, -1, -1, -1, -1, -1, 1, 0), ); let rg14_shuffle = _mm256_set_m128i( _mm_set_epi8(-1, -1, -1, -1, -1, -1, 11, 10, -1, -1, -1, -1, -1, -1, 9, 8), _mm_set_epi8(-1, -1, -1, -1, -1, -1, 9, 8, -1, -1, -1, -1, -1, -1, 7, 6), ); let rg3_b3b4_shuffle = _mm256_set_m128i( _mm_set_epi8(-1, -1, -1, -1, -1, -1, 7, 6, -1, -1, -1, -1, -1, -1, 1, 0), _mm_set_epi8( -1, -1, -1, -1, -1, -1, 15, 14, -1, -1, -1, -1, -1, -1, 13, 12, ), ); let b1b2_b5_shuffle = _mm256_set_m128i( _mm_set_epi8( -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, 13, 12, ), _mm_set_epi8(-1, -1, -1, -1, -1, -1, 11, 10, -1, -1, -1, -1, -1, -1, 5, 4), ); let width = src_rows[0].len(); for (dst_x, chunk) in normalizer.chunks().iter().enumerate() { let mut x = chunk.start as usize; let mut rg_sum 
= [_mm256_set1_epi8(0); 4]; let mut rg_bb_sum = [_mm256_set1_epi8(0); 4]; let mut bbb_sum = [_mm256_set1_epi8(0); 4]; let mut coeffs = chunk.values(); let end_x = x + coeffs.len(); if width - end_x >= 1 { let coeffs_by_5 = coeffs.chunks_exact(5); coeffs = coeffs_by_5.remainder(); for k in coeffs_by_5 { let coeff0033_i64x4 = _mm256_set_epi64x(k[3] as i64, k[3] as i64, k[0] as i64, k[0] as i64); let coeff1144_i64x2 = _mm256_set_epi64x(k[4] as i64, k[4] as i64, k[1] as i64, k[1] as i64); let coeff2223_i64x2 = _mm256_set_epi64x(k[3] as i64, k[2] as i64, k[2] as i64, k[2] as i64); let coeff014_i64x2 = _mm256_set_epi64x(0, k[4] as i64, k[1] as i64, k[0] as i64); for i in 0..4 { let source = simd_utils::loadu_si256(src_rows[i], x); let rg03_i64x4 = _mm256_shuffle_epi8(source, rg03_shuffle); rg_sum[i] = _mm256_add_epi64(rg_sum[i], _mm256_mul_epi32(rg03_i64x4, coeff0033_i64x4)); let rg14_i64x4 = _mm256_shuffle_epi8(source, rg14_shuffle); rg_sum[i] = _mm256_add_epi64(rg_sum[i], _mm256_mul_epi32(rg14_i64x4, coeff1144_i64x2)); let rg_bb_i64x4 = _mm256_shuffle_epi8(source, rg3_b3b4_shuffle); rg_bb_sum[i] = _mm256_add_epi64( rg_bb_sum[i], _mm256_mul_epi32(rg_bb_i64x4, coeff2223_i64x2), ); let bbb_i64x4 = _mm256_shuffle_epi8(source, b1b2_b5_shuffle); bbb_sum[i] = _mm256_add_epi64(bbb_sum[i], _mm256_mul_epi32(bbb_i64x4, coeff014_i64x2)); } x += 5; } } for &k in coeffs { let coeff_i64x4 = _mm256_set1_epi64x(k as i64); for i in 0..4 { let &pixel = src_rows[i].get_unchecked(x); let rgb_i64x4 = _mm256_set_epi64x(0, pixel.0[2] as i64, pixel.0[1] as i64, pixel.0[0] as i64); rg_bb_sum[i] = _mm256_add_epi64(rg_bb_sum[i], _mm256_mul_epi32(rgb_i64x4, coeff_i64x4)); } x += 1; } for i in 0..4 { _mm256_storeu_si256(rg_buf.as_mut_ptr() as *mut __m256i, rg_sum[i]); _mm256_storeu_si256(rg_bb_buf.as_mut_ptr() as *mut __m256i, rg_bb_sum[i]); _mm256_storeu_si256(bbb_buf.as_mut_ptr() as *mut __m256i, bbb_sum[i]); let dst_pixel = dst_rows[i].get_unchecked_mut(dst_x); dst_pixel.0[0] = normalizer.clip(rg_buf[0] + rg_buf[2] + rg_bb_buf[0] + half_error); dst_pixel.0[1] = normalizer.clip(rg_buf[1] + rg_buf[3] + rg_bb_buf[1] + half_error); dst_pixel.0[2] = normalizer.clip( rg_bb_buf[2] + rg_bb_buf[3] + bbb_buf[0] + bbb_buf[1] + bbb_buf[2] + half_error, ); } } } /// For safety, it is necessary to ensure the following conditions: /// - bounds.len() == dst_row.len() /// - coefficients_chunks.len() == dst_row.len() /// - max(chunk.start + chunk.values.len() for chunk in coefficients_chunks) <= src_row.len() #[target_feature(enable = "avx2")] unsafe fn horiz_convolution_one_row( src_row: &[U16x3], dst_row: &mut [U16x3], normalizer: &Normalizer32, ) { let precision = normalizer.precision(); let half_error = 1i64 << (precision - 1); let mut rg_buf = [0i64; 4]; let mut rg_bb_buf = [0i64; 4]; let mut bbb_buf = [0i64; 4]; /* |R G B | |R G B | |R G | - |B | |R G B | |R G B | |R | |0001 0203 0405| |0607 0809 1011| |1213 1415| - |0001| |0203 0405 0607| |0809 1011 1213| |1415| Shuffle to extract RG components of pixels 0 and 3 as i64: lo: -1, -1, -1, -1, -1, -1, 3, 2, -1, -1, -1, -1, -1, -1, 1, 0 hi: -1, -1, -1, -1, -1, -1, 5, 4, -1, -1, -1, -1, -1, -1, 3, 2 Shuffle to extract RG components of pixels 1 and 4 as i64: lo: -1, -1, -1, -1, -1, -1, 9, 8, -1, -1, -1, -1, -1, -1, 7, 6 hi: -1, -1, -1, -1, -1, -1, 11, 10, -1, -1, -1, -1, -1, -1, 9, 8 Shuffle to extract RG components of pixel 2 and BB of pixels 2-3 as i64: lo: -1, -1, -1, -1, -1, -1, 15, 14, -1, -1, -1, -1, -1, -1, 13, 12 hi: -1, -1, -1, -1, -1, -1, 7, 6, -1, -1, -1, -1, -1, -1, 1, 0 
Shuffle to extract BB components of pixels 0, 1 and 4 as i64: lo: -1, -1, -1, -1, -1, -1, 11, 10, -1, -1, -1, -1, -1, -1, 5, 4 hi: -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, 13, 12 */ let rg03_shuffle = _mm256_set_m128i( _mm_set_epi8(-1, -1, -1, -1, -1, -1, 5, 4, -1, -1, -1, -1, -1, -1, 3, 2), _mm_set_epi8(-1, -1, -1, -1, -1, -1, 3, 2, -1, -1, -1, -1, -1, -1, 1, 0), ); let rg14_shuffle = _mm256_set_m128i( _mm_set_epi8(-1, -1, -1, -1, -1, -1, 11, 10, -1, -1, -1, -1, -1, -1, 9, 8), _mm_set_epi8(-1, -1, -1, -1, -1, -1, 9, 8, -1, -1, -1, -1, -1, -1, 7, 6), ); let rg3_b3b4_shuffle = _mm256_set_m128i( _mm_set_epi8(-1, -1, -1, -1, -1, -1, 7, 6, -1, -1, -1, -1, -1, -1, 1, 0), _mm_set_epi8( -1, -1, -1, -1, -1, -1, 15, 14, -1, -1, -1, -1, -1, -1, 13, 12, ), ); let b1b2_b5_shuffle = _mm256_set_m128i( _mm_set_epi8( -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, 13, 12, ), _mm_set_epi8(-1, -1, -1, -1, -1, -1, 11, 10, -1, -1, -1, -1, -1, -1, 5, 4), ); let zero_i64x4 = _mm256_set1_epi8(0); let width = src_row.len(); for (dst_x, chunk) in normalizer.chunks().iter().enumerate() { let mut x = chunk.start as usize; let mut rg_sum = zero_i64x4; let mut rg_bb_sum = zero_i64x4; let mut bbb_sum = zero_i64x4; let mut coeffs = chunk.values(); let end_x = x + coeffs.len(); if width - end_x >= 1 { let coeffs_by_5 = coeffs.chunks_exact(5); coeffs = coeffs_by_5.remainder(); for k in coeffs_by_5 { let coeff0033_i64x4 = _mm256_set_epi64x(k[3] as i64, k[3] as i64, k[0] as i64, k[0] as i64); let coeff1144_i64x2 = _mm256_set_epi64x(k[4] as i64, k[4] as i64, k[1] as i64, k[1] as i64); let coeff2223_i64x2 = _mm256_set_epi64x(k[3] as i64, k[2] as i64, k[2] as i64, k[2] as i64); let coeff014_i64x2 = _mm256_set_epi64x(0, k[4] as i64, k[1] as i64, k[0] as i64); let source = simd_utils::loadu_si256(src_row, x); let rg03_i64x4 = _mm256_shuffle_epi8(source, rg03_shuffle); rg_sum = _mm256_add_epi64(rg_sum, _mm256_mul_epi32(rg03_i64x4, coeff0033_i64x4)); let rg14_i64x4 = _mm256_shuffle_epi8(source, rg14_shuffle); rg_sum = _mm256_add_epi64(rg_sum, _mm256_mul_epi32(rg14_i64x4, coeff1144_i64x2)); let rg_bb_i64x4 = _mm256_shuffle_epi8(source, rg3_b3b4_shuffle); rg_bb_sum = _mm256_add_epi64(rg_bb_sum, _mm256_mul_epi32(rg_bb_i64x4, coeff2223_i64x2)); let bbb_i64x4 = _mm256_shuffle_epi8(source, b1b2_b5_shuffle); bbb_sum = _mm256_add_epi64(bbb_sum, _mm256_mul_epi32(bbb_i64x4, coeff014_i64x2)); x += 5; } } for &k in coeffs { let coeff_i64x4 = _mm256_set1_epi64x(k as i64); let &pixel = src_row.get_unchecked(x); let rgb_i64x4 = _mm256_set_epi64x(0, pixel.0[2] as i64, pixel.0[1] as i64, pixel.0[0] as i64); rg_bb_sum = _mm256_add_epi64(rg_bb_sum, _mm256_mul_epi32(rgb_i64x4, coeff_i64x4)); x += 1; } _mm256_storeu_si256(rg_buf.as_mut_ptr() as *mut __m256i, rg_sum); _mm256_storeu_si256(rg_bb_buf.as_mut_ptr() as *mut __m256i, rg_bb_sum); _mm256_storeu_si256(bbb_buf.as_mut_ptr() as *mut __m256i, bbb_sum); let dst_pixel = dst_row.get_unchecked_mut(dst_x); dst_pixel.0[0] = normalizer.clip(rg_buf[0] + rg_buf[2] + rg_bb_buf[0] + half_error); dst_pixel.0[1] = normalizer.clip(rg_buf[1] + rg_buf[3] + rg_bb_buf[1] + half_error); dst_pixel.0[2] = normalizer .clip(rg_bb_buf[2] + rg_bb_buf[3] + bbb_buf[0] + bbb_buf[1] + bbb_buf[2] + half_error); } } fast_image_resize-5.3.0/src/convolution/u16x3/mod.rs000064400000000000000000000050401046102023000204710ustar 00000000000000use super::{Coefficients, Convolution}; use crate::convolution::optimisations::Normalizer32; use crate::convolution::vertical_u16::vert_convolution_u16; use 
crate::pixels::U16x3; use crate::{CpuExtensions, ImageView, ImageViewMut}; #[cfg(target_arch = "x86_64")] mod avx2; mod native; #[cfg(target_arch = "aarch64")] mod neon; #[cfg(target_arch = "x86_64")] mod sse4; #[cfg(target_arch = "wasm32")] mod wasm32; type P = U16x3; impl Convolution for P { fn horiz_convolution( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, offset: u32, coeffs: Coefficients, cpu_extensions: CpuExtensions, ) { debug_assert!(src_view.height() - offset >= dst_view.height()); let normalizer = Normalizer32::new(coeffs); let normalizer_ref = &normalizer; try_process_in_threads_h! { horiz_convolution( src_view, dst_view, offset, normalizer_ref, cpu_extensions, ); } } fn vert_convolution( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, offset: u32, coeffs: Coefficients, cpu_extensions: CpuExtensions, ) { debug_assert!(src_view.width() - offset >= dst_view.width()); let normalizer = Normalizer32::new(coeffs); let normalizer_ref = &normalizer; try_process_in_threads_v! { vert_convolution_u16( src_view, dst_view, offset, normalizer_ref, cpu_extensions, ); } } } fn horiz_convolution( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, offset: u32, normalizer: &Normalizer32, cpu_extensions: CpuExtensions, ) { match cpu_extensions { #[cfg(target_arch = "x86_64")] CpuExtensions::Avx2 => avx2::horiz_convolution(src_view, dst_view, offset, normalizer), #[cfg(target_arch = "x86_64")] CpuExtensions::Sse4_1 => sse4::horiz_convolution(src_view, dst_view, offset, normalizer), #[cfg(target_arch = "aarch64")] CpuExtensions::Neon => neon::horiz_convolution(src_view, dst_view, offset, normalizer), #[cfg(target_arch = "wasm32")] CpuExtensions::Simd128 => wasm32::horiz_convolution(src_view, dst_view, offset, normalizer), _ => native::horiz_convolution(src_view, dst_view, offset, normalizer), } } fast_image_resize-5.3.0/src/convolution/u16x3/native.rs000064400000000000000000000023371046102023000212060ustar 00000000000000use crate::convolution::optimisations::Normalizer32; use crate::pixels::U16x3; use crate::{ImageView, ImageViewMut}; #[inline(always)] pub(crate) fn horiz_convolution( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, offset: u32, normalizer: &Normalizer32, ) { let precision = normalizer.precision(); let coefficients_chunks = normalizer.chunks(); let initial = 1i64 << (precision - 1); let src_rows = src_view.iter_rows(offset); let dst_rows = dst_view.iter_rows_mut(0); for (dst_row, src_row) in dst_rows.zip(src_rows) { for (coeffs_chunk, dst_pixel) in coefficients_chunks.iter().zip(dst_row.iter_mut()) { let first_x_src = coeffs_chunk.start as usize; let mut ss = [initial; 3]; let src_pixels = unsafe { src_row.get_unchecked(first_x_src..) }; for (&k, src_pixel) in coeffs_chunk.values().iter().zip(src_pixels) { for (s, c) in ss.iter_mut().zip(src_pixel.0) { *s += c as i64 * (k as i64); } } for (i, s) in ss.iter().copied().enumerate() { dst_pixel.0[i] = normalizer.clip(s); } } } } fast_image_resize-5.3.0/src/convolution/u16x3/neon.rs000064400000000000000000000136161046102023000206610ustar 00000000000000use std::arch::aarch64::*; use crate::convolution::optimisations::{CoefficientsI32Chunk, Normalizer32}; use crate::neon_utils; use crate::pixels::U16x3; use crate::{ImageView, ImageViewMut}; #[inline] pub(crate) fn horiz_convolution( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, offset: u32, normalizer: &Normalizer32, ) { let precision = normalizer.precision(); macro_rules! 
call { ($imm8:expr) => {{ horiz_convolution_p::<$imm8>(src_view, dst_view, offset, normalizer); }}; } constify_64_imm8!(precision, call); }
fn horiz_convolution_p<const PRECISION: i32>( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, offset: u32, normalizer: &Normalizer32, ) { let coefficients_chunks = normalizer.chunks(); let src_iter = src_view.iter_rows(offset); let dst_iter = dst_view.iter_rows_mut(0); for (src_row, dst_row) in src_iter.zip(dst_iter) { unsafe { horiz_convolution_one_row::<PRECISION>(src_row, dst_row, coefficients_chunks); } } }
/// For safety, it is necessary to ensure the following conditions: /// - bounds.len() == dst_row.len() /// - coefficients_chunks.len() == dst_row.len() /// - max(chunk.start + chunk.values.len() for chunk in coefficients_chunks) <= src_row.len() /// - precision <= MAX_COEFS_PRECISION #[target_feature(enable = "neon")] unsafe fn horiz_convolution_one_row<const PRECISION: i32>( src_row: &[U16x3], dst_row: &mut [U16x3], coefficients_chunks: &[CoefficientsI32Chunk], ) { let initial = vdupq_n_s64(1i64 << (PRECISION - 2)); let zero_u16x8 = vdupq_n_u16(0); let zero_u16x4 = vdup_n_u16(0); for (dst_x, coeffs_chunk) in coefficients_chunks.iter().enumerate() { let mut x: usize = coeffs_chunk.start as usize; let mut sss = [initial; 3]; let mut coeffs = coeffs_chunk.values(); let coeffs_by_8 = coeffs.chunks_exact(8); coeffs = coeffs_by_8.remainder(); for k in coeffs_by_8 { let coeffs_i32x4x2 = neon_utils::load_i32x4x2(k, 0); let source = neon_utils::load_deintrel_u16x8x3(src_row, x); sss[0] = conv_8_comp(sss[0], source.0, coeffs_i32x4x2, zero_u16x8); sss[1] = conv_8_comp(sss[1], source.1, coeffs_i32x4x2, zero_u16x8); sss[2] = conv_8_comp(sss[2], source.2, coeffs_i32x4x2, zero_u16x8); x += 8; } let mut coeffs_by_4 = coeffs.chunks_exact(4); coeffs = coeffs_by_4.remainder(); if let Some(k) = coeffs_by_4.next() { let coeffs_i32x4 = neon_utils::load_i32x4(k, 0); let source = neon_utils::load_deintrel_u16x4x3(src_row, x); sss[0] = conv_4_comp(sss[0], source.0, coeffs_i32x4, zero_u16x4); sss[1] = conv_4_comp(sss[1], source.1, coeffs_i32x4, zero_u16x4); sss[2] = conv_4_comp(sss[2], source.2, coeffs_i32x4, zero_u16x4); x += 4; } let mut coeffs_by_2 = coeffs.chunks_exact(2); coeffs = coeffs_by_2.remainder(); if let Some(k) = coeffs_by_2.next() { let coeffs_i32x2 = neon_utils::load_i32x2(k, 0); let source = neon_utils::load_deintrel_u16x2x3(src_row, x); sss[0] = conv_2_comp(sss[0], source.0, coeffs_i32x2, zero_u16x4); sss[1] = conv_2_comp(sss[1], source.1, coeffs_i32x2, zero_u16x4); sss[2] = conv_2_comp(sss[2], source.2, coeffs_i32x2, zero_u16x4); x += 2; } if !coeffs.is_empty() { let coeffs_i32x2 = neon_utils::load_i32x1(coeffs, 0); let source = neon_utils::load_deintrel_u16x1x3(src_row, x); sss[0] = conv_2_comp(sss[0], source.0, coeffs_i32x2, zero_u16x4); sss[1] = conv_2_comp(sss[1], source.1, coeffs_i32x2, zero_u16x4); sss[2] = conv_2_comp(sss[2], source.2, coeffs_i32x2, zero_u16x4); } let mut sss_i64 = [ vadd_s64(vget_low_s64(sss[0]), vget_high_s64(sss[0])), vadd_s64(vget_low_s64(sss[1]), vget_high_s64(sss[1])), vadd_s64(vget_low_s64(sss[2]), vget_high_s64(sss[2])), ]; sss_i64[0] = vshr_n_s64::<PRECISION>(sss_i64[0]); sss_i64[1] = vshr_n_s64::<PRECISION>(sss_i64[1]); sss_i64[2] = vshr_n_s64::<PRECISION>(sss_i64[2]); dst_row.get_unchecked_mut(dst_x).0 = [ vqmovns_u32(vqmovund_s64(vdupd_lane_s64::<0>(sss_i64[0]))), vqmovns_u32(vqmovund_s64(vdupd_lane_s64::<0>(sss_i64[1]))), vqmovns_u32(vqmovund_s64(vdupd_lane_s64::<0>(sss_i64[2]))), ]; } } #[inline(always)] unsafe fn conv_8_comp( mut sss: int64x2_t, source: uint16x8_t, coeffs: int32x4x2_t,
zero_u16x8: uint16x8_t, ) -> int64x2_t { let pix_i32 = vreinterpretq_s32_u16(vzip1q_u16(source, zero_u16x8)); sss = vmlal_s32(sss, vget_low_s32(pix_i32), vget_low_s32(coeffs.0)); sss = vmlal_s32(sss, vget_high_s32(pix_i32), vget_high_s32(coeffs.0)); let pix_i32 = vreinterpretq_s32_u16(vzip2q_u16(source, zero_u16x8)); sss = vmlal_s32(sss, vget_low_s32(pix_i32), vget_low_s32(coeffs.1)); sss = vmlal_s32(sss, vget_high_s32(pix_i32), vget_high_s32(coeffs.1)); sss } #[inline(always)] unsafe fn conv_4_comp( mut sss: int64x2_t, source: uint16x4_t, coeffs: int32x4_t, zero_u16x4: uint16x4_t, ) -> int64x2_t { let pix_i32 = vreinterpret_s32_u16(vzip1_u16(source, zero_u16x4)); sss = vmlal_s32(sss, pix_i32, vget_low_s32(coeffs)); let pix_i32 = vreinterpret_s32_u16(vzip2_u16(source, zero_u16x4)); sss = vmlal_s32(sss, pix_i32, vget_high_s32(coeffs)); sss } #[inline(always)] unsafe fn conv_2_comp( mut sss: int64x2_t, source: uint16x4_t, coeffs: int32x2_t, zero_u16x4: uint16x4_t, ) -> int64x2_t { let pix_i32 = vreinterpret_s32_u16(vzip1_u16(source, zero_u16x4)); sss = vmlal_s32(sss, pix_i32, coeffs); sss } fast_image_resize-5.3.0/src/convolution/u16x3/sse4.rs000064400000000000000000000204711046102023000205750ustar 00000000000000use std::arch::x86_64::*; use crate::convolution::optimisations::Normalizer32; use crate::pixels::U16x3; use crate::{simd_utils, ImageView, ImageViewMut}; #[inline] pub(crate) fn horiz_convolution( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, offset: u32, normalizer: &Normalizer32, ) { let dst_height = dst_view.height(); let src_iter = src_view.iter_4_rows(offset, dst_height + offset); let dst_iter = dst_view.iter_4_rows_mut(); for (src_rows, dst_rows) in src_iter.zip(dst_iter) { unsafe { horiz_convolution_four_rows(src_rows, dst_rows, normalizer); } } let yy = dst_height - dst_height % 4; let src_rows = src_view.iter_rows(yy + offset); let dst_rows = dst_view.iter_rows_mut(yy); for (src_row, dst_row) in src_rows.zip(dst_rows) { unsafe { horiz_convolution_one_row(src_row, dst_row, normalizer); } } } /// For safety, it is necessary to ensure the following conditions: /// - length of all rows in src_rows must be equal /// - length of all rows in dst_rows must be equal /// - coefficients_chunks.len() == dst_rows.0.len() /// - max(chunk.start + chunk.values.len() for chunk in coefficients_chunks) <= src_row.0.len() /// - precision <= MAX_COEFS_PRECISION #[target_feature(enable = "sse4.1")] unsafe fn horiz_convolution_four_rows( src_rows: [&[U16x3]; 4], dst_rows: [&mut [U16x3]; 4], normalizer: &Normalizer32, ) { let precision = normalizer.precision(); let half_error = 1i64 << (precision - 1); let mut rg_buf = [0i64; 2]; let mut bb_buf = [0i64; 2]; /* |R G B | |R G B | |R G | |0001 0203 0405| |0607 0809 1011| |1213 1415| Shuffle to extract RG components of first pixel as i64: -1, -1, -1, -1, -1, -1, 3, 2, -1, -1, -1, -1, -1, -1, 1, 0 Shuffle to extract RG components of second pixel as i64: -1, -1, -1, -1, -1, -1, 9, 8, -1, -1, -1, -1, -1, -1, 7, 6 Shuffle to extract B components of two pixels as i64: -1, -1, -1, -1, -1, -1, 11, 10, -1, -1, -1, -1, -1, -1, 5, 4 */ let rg0_shuffle = _mm_set_epi8(-1, -1, -1, -1, -1, -1, 3, 2, -1, -1, -1, -1, -1, -1, 1, 0); let rg1_shuffle = _mm_set_epi8(-1, -1, -1, -1, -1, -1, 9, 8, -1, -1, -1, -1, -1, -1, 7, 6); let bb_shuffle = _mm_set_epi8(-1, -1, -1, -1, -1, -1, 11, 10, -1, -1, -1, -1, -1, -1, 5, 4); let width = src_rows[0].len(); for (dst_x, chunk) in normalizer.chunks().iter().enumerate() { let mut x = chunk.start as usize; let mut 
rg_sum = [_mm_set1_epi8(0); 4]; let mut bb_sum = [_mm_set1_epi8(0); 4]; let mut coeffs = chunk.values(); let end_x = x + coeffs.len(); if width - end_x >= 1 { let coeffs_by_2 = coeffs.chunks_exact(2); coeffs = coeffs_by_2.remainder(); for k in coeffs_by_2 { let coeff0_i64x2 = _mm_set1_epi64x(k[0] as i64); let coeff1_i64x2 = _mm_set1_epi64x(k[1] as i64); let coeff_i64x2 = _mm_set_epi64x(k[1] as i64, k[0] as i64); for i in 0..4 { let source = simd_utils::loadu_si128(src_rows[i], x); let rg0_i64x2 = _mm_shuffle_epi8(source, rg0_shuffle); rg_sum[i] = _mm_add_epi64(rg_sum[i], _mm_mul_epi32(rg0_i64x2, coeff0_i64x2)); let rg1_i64x2 = _mm_shuffle_epi8(source, rg1_shuffle); rg_sum[i] = _mm_add_epi64(rg_sum[i], _mm_mul_epi32(rg1_i64x2, coeff1_i64x2)); let bb_i64x2 = _mm_shuffle_epi8(source, bb_shuffle); bb_sum[i] = _mm_add_epi64(bb_sum[i], _mm_mul_epi32(bb_i64x2, coeff_i64x2)); } x += 2; } } for &k in coeffs { let coeff_i64x2 = _mm_set1_epi64x(k as i64); for i in 0..4 { let &pixel = src_rows[i].get_unchecked(x); let rg_i64x2 = _mm_set_epi64x(pixel.0[1] as i64, pixel.0[0] as i64); rg_sum[i] = _mm_add_epi64(rg_sum[i], _mm_mul_epi32(rg_i64x2, coeff_i64x2)); let bb_i64x2 = _mm_set_epi64x(0, pixel.0[2] as i64); bb_sum[i] = _mm_add_epi64(bb_sum[i], _mm_mul_epi32(bb_i64x2, coeff_i64x2)); } x += 1; } for i in 0..4 { _mm_storeu_si128(rg_buf.as_mut_ptr() as *mut __m128i, rg_sum[i]); _mm_storeu_si128(bb_buf.as_mut_ptr() as *mut __m128i, bb_sum[i]); let dst_pixel = dst_rows[i].get_unchecked_mut(dst_x); dst_pixel.0[0] = normalizer.clip(rg_buf[0] + half_error); dst_pixel.0[1] = normalizer.clip(rg_buf[1] + half_error); dst_pixel.0[2] = normalizer.clip(bb_buf[0] + bb_buf[1] + half_error); } } } /// For safety, it is necessary to ensure the following conditions: /// - bounds.len() == dst_row.len() /// - coefficients_chunks.len() == dst_row.len() /// - max(chunk.start + chunk.values.len() for chunk in coefficients_chunks) <= src_row.len() /// - precision <= MAX_COEFS_PRECISION #[target_feature(enable = "sse4.1")] unsafe fn horiz_convolution_one_row( src_row: &[U16x3], dst_row: &mut [U16x3], normalizer: &Normalizer32, ) { let precision = normalizer.precision(); let rg_initial = _mm_set1_epi64x(1 << (precision - 1)); let bb_initial = _mm_set1_epi64x(1 << (precision - 2)); /* |R G B | |R G B | |R G | |0001 0203 0405| |0607 0809 1011| |1213 1415| Shuffle to extract RG components of first pixel as i64: -1, -1, -1, -1, -1, -1, 3, 2, -1, -1, -1, -1, -1, -1, 1, 0 Shuffle to extract RG components of second pixel as i64: -1, -1, -1, -1, -1, -1, 9, 8, -1, -1, -1, -1, -1, -1, 7, 6 Shuffle to extract B components of two pixels as i64: -1, -1, -1, -1, -1, -1, 11, 10, -1, -1, -1, -1, -1, -1, 5, 4 */ let rg0_shuffle = _mm_set_epi8(-1, -1, -1, -1, -1, -1, 3, 2, -1, -1, -1, -1, -1, -1, 1, 0); let rg1_shuffle = _mm_set_epi8(-1, -1, -1, -1, -1, -1, 9, 8, -1, -1, -1, -1, -1, -1, 7, 6); let bb_shuffle = _mm_set_epi8(-1, -1, -1, -1, -1, -1, 11, 10, -1, -1, -1, -1, -1, -1, 5, 4); let mut rg_buf = [0i64; 2]; let mut bb_buf = [0i64; 2]; let width = src_row.len(); for (dst_x, chunk) in normalizer.chunks().iter().enumerate() { let mut x = chunk.start as usize; let mut rg_sum = rg_initial; let mut bb_sum = bb_initial; let mut coeffs = chunk.values(); let end_x = x + coeffs.len(); if width - end_x >= 1 { let coeffs_by_2 = coeffs.chunks_exact(2); coeffs = coeffs_by_2.remainder(); for k in coeffs_by_2 { let coeff0_i64x2 = _mm_set1_epi64x(k[0] as i64); let coeff1_i64x2 = _mm_set1_epi64x(k[1] as i64); let coeff_i64x2 = _mm_set_epi64x(k[1] as i64, 
k[0] as i64); let source = simd_utils::loadu_si128(src_row, x); let rg0_i64x2 = _mm_shuffle_epi8(source, rg0_shuffle); rg_sum = _mm_add_epi64(rg_sum, _mm_mul_epi32(rg0_i64x2, coeff0_i64x2)); let rg1_i64x2 = _mm_shuffle_epi8(source, rg1_shuffle); rg_sum = _mm_add_epi64(rg_sum, _mm_mul_epi32(rg1_i64x2, coeff1_i64x2)); let bb_i64x2 = _mm_shuffle_epi8(source, bb_shuffle); bb_sum = _mm_add_epi64(bb_sum, _mm_mul_epi32(bb_i64x2, coeff_i64x2)); x += 2; } } for &k in coeffs { let coeff_i64x2 = _mm_set1_epi64x(k as i64); let &pixel = src_row.get_unchecked(x); let rg_i64x2 = _mm_set_epi64x(pixel.0[1] as i64, pixel.0[0] as i64); rg_sum = _mm_add_epi64(rg_sum, _mm_mul_epi32(rg_i64x2, coeff_i64x2)); let bb_i64x2 = _mm_set_epi64x(0, pixel.0[2] as i64); bb_sum = _mm_add_epi64(bb_sum, _mm_mul_epi32(bb_i64x2, coeff_i64x2)); x += 1; } _mm_storeu_si128(rg_buf.as_mut_ptr() as *mut __m128i, rg_sum); _mm_storeu_si128(bb_buf.as_mut_ptr() as *mut __m128i, bb_sum); let dst_pixel = dst_row.get_unchecked_mut(dst_x); dst_pixel.0[0] = normalizer.clip(rg_buf[0]); dst_pixel.0[1] = normalizer.clip(rg_buf[1]); dst_pixel.0[2] = normalizer.clip(bb_buf[0] + bb_buf[1]); } } fast_image_resize-5.3.0/src/convolution/u16x3/wasm32.rs000064400000000000000000000210061046102023000210260ustar 00000000000000use std::arch::wasm32::*; use crate::convolution::optimisations::Normalizer32; use crate::pixels::U16x3; use crate::wasm32_utils; use crate::{ImageView, ImageViewMut}; #[inline] pub(crate) fn horiz_convolution( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, offset: u32, normalizer: &Normalizer32, ) { let dst_height = dst_view.height(); let src_iter = src_view.iter_4_rows(offset, dst_height + offset); let dst_iter = dst_view.iter_4_rows_mut(); for (src_rows, dst_rows) in src_iter.zip(dst_iter) { unsafe { horiz_convolution_four_rows(src_rows, dst_rows, normalizer); } } let yy = dst_height - dst_height % 4; let src_rows = src_view.iter_rows(yy + offset); let dst_rows = dst_view.iter_rows_mut(yy); for (src_row, dst_row) in src_rows.zip(dst_rows) { unsafe { horiz_convolution_one_row(src_row, dst_row, normalizer); } } } /// For safety, it is necessary to ensure the following conditions: /// - length of all rows in src_rows must be equal /// - length of all rows in dst_rows must be equal /// - coefficients_chunks.len() == dst_rows.0.len() /// - max(chunk.start + chunk.values.len() for chunk in coefficients_chunks) <= src_row.0.len() /// - precision <= MAX_COEFS_PRECISION #[target_feature(enable = "simd128")] unsafe fn horiz_convolution_four_rows( src_rows: [&[U16x3]; 4], dst_rows: [&mut [U16x3]; 4], normalizer: &Normalizer32, ) { const ZERO: v128 = i64x2(0, 0); let precision = normalizer.precision(); let half_error = 1i64 << (precision - 1); let mut rg_buf = [0i64; 2]; let mut bb_buf = [0i64; 2]; /* |R G B | |R G B | |R G | |0001 0203 0405| |0607 0809 1011| |1213 1415| Shuffle to extract RG components of first pixel as i64: 0, 1, -1, -1, -1, -1, -1, -1, 2, 3, -1, -1, -1, -1, -1, -1 Shuffle to extract RG components of second pixel as i64: 6, 7, -1, -1, -1, -1, -1, -1, 8, 9, -1, -1, -1, -1, -1, -1 Shuffle to extract B components of two pixels as i64: 4, 5, -1, -1, -1, -1, -1, -1, 10, 11, -1, -1, -1, -1, -1, -1 */ const RG0_SHUFFLE: v128 = i8x16(0, 1, -1, -1, -1, -1, -1, -1, 2, 3, -1, -1, -1, -1, -1, -1); const RG1_SHUFFLE: v128 = i8x16(6, 7, -1, -1, -1, -1, -1, -1, 8, 9, -1, -1, -1, -1, -1, -1); const BB_SHUFFLE: v128 = i8x16(4, 5, -1, -1, -1, -1, -1, -1, 10, 11, -1, -1, -1, -1, -1, -1); let width = src_rows[0].len(); for 
(dst_x, coeffs_chunk) in normalizer.chunks().iter().enumerate() { let mut x: usize = coeffs_chunk.start as usize; let mut rg_sum = [ZERO; 4]; let mut bb_sum = [ZERO; 4]; let mut coeffs = coeffs_chunk.values(); let end_x = x + coeffs.len(); if width - end_x >= 1 { let coeffs_by_2 = coeffs.chunks_exact(2); coeffs = coeffs_by_2.remainder(); for k in coeffs_by_2 { let coeff0_i64x2 = i64x2_splat(k[0] as i64); let coeff1_i64x2 = i64x2_splat(k[1] as i64); let coeff_i64x2 = i64x2(k[0] as i64, k[1] as i64); for i in 0..4 { let source = wasm32_utils::load_v128(src_rows[i], x); let rg0_i64x2 = i8x16_swizzle(source, RG0_SHUFFLE); rg_sum[i] = i64x2_add( rg_sum[i], wasm32_utils::i64x2_mul_lo(rg0_i64x2, coeff0_i64x2), ); let rg1_i64x2 = i8x16_swizzle(source, RG1_SHUFFLE); rg_sum[i] = i64x2_add( rg_sum[i], wasm32_utils::i64x2_mul_lo(rg1_i64x2, coeff1_i64x2), ); let bb_i64x2 = i8x16_swizzle(source, BB_SHUFFLE); bb_sum[i] = i64x2_add(bb_sum[i], wasm32_utils::i64x2_mul_lo(bb_i64x2, coeff_i64x2)); } x += 2; } } for &k in coeffs { let coeff_i64x2 = i64x2_splat(k as i64); for i in 0..4 { let &pixel = src_rows[i].get_unchecked(x); let rg_i64x2 = i64x2(pixel.0[0] as i64, pixel.0[1] as i64); rg_sum[i] = i64x2_add(rg_sum[i], wasm32_utils::i64x2_mul_lo(rg_i64x2, coeff_i64x2)); let bb_i64x2 = i64x2(pixel.0[2] as i64, 0); bb_sum[i] = i64x2_add(bb_sum[i], wasm32_utils::i64x2_mul_lo(bb_i64x2, coeff_i64x2)); } x += 1; } for i in 0..4 { v128_store(rg_buf.as_mut_ptr() as *mut v128, rg_sum[i]); v128_store(bb_buf.as_mut_ptr() as *mut v128, bb_sum[i]); let dst_pixel = dst_rows[i].get_unchecked_mut(dst_x); dst_pixel.0[0] = normalizer.clip(rg_buf[0] + half_error); dst_pixel.0[1] = normalizer.clip(rg_buf[1] + half_error); dst_pixel.0[2] = normalizer.clip(bb_buf[0] + bb_buf[1] + half_error); } } } /// For safety, it is necessary to ensure the following conditions: /// - bounds.len() == dst_row.len() /// - coefficients_chunks.len() == dst_row.len() /// - max(chunk.start + chunk.values.len() for chunk in coefficients_chunks) <= src_row.len() /// - precision <= MAX_COEFS_PRECISION #[target_feature(enable = "simd128")] unsafe fn horiz_convolution_one_row( src_row: &[U16x3], dst_row: &mut [U16x3], normalizer: &Normalizer32, ) { let precision = normalizer.precision(); let rg_initial = i64x2_splat(1 << (precision - 1)); let bb_initial = i64x2_splat(1 << (precision - 2)); /* |R G B | |R G B | |R G | |0001 0203 0405| |0607 0809 1011| |1213 1415| Shuffle to extract RG components of first pixel as i64: 0, 1, -1, -1, -1, -1, -1, -1, 2, 3, -1, -1, -1, -1, -1, -1 Shuffle to extract RG components of second pixel as i64: 6, 7, -1, -1, -1, -1, -1, -1, 8, 9, -1, -1, -1, -1, -1, -1 Shuffle to extract B components of two pixels as i64: 4, 5, -1, -1, -1, -1, -1, -1, 10, 11, -1, -1, -1, -1, -1, -1 */ const RG0_SHUFFLE: v128 = i8x16(0, 1, -1, -1, -1, -1, -1, -1, 2, 3, -1, -1, -1, -1, -1, -1); const RG1_SHUFFLE: v128 = i8x16(6, 7, -1, -1, -1, -1, -1, -1, 8, 9, -1, -1, -1, -1, -1, -1); const BB_SHUFFLE: v128 = i8x16(4, 5, -1, -1, -1, -1, -1, -1, 10, 11, -1, -1, -1, -1, -1, -1); let mut rg_buf = [0i64; 2]; let mut bb_buf = [0i64; 2]; let width = src_row.len(); for (dst_x, coeffs_chunk) in normalizer.chunks().iter().enumerate() { let mut x: usize = coeffs_chunk.start as usize; let mut rg_sum = rg_initial; let mut bb_sum = bb_initial; let mut coeffs = coeffs_chunk.values(); let end_x = x + coeffs.len(); if width - end_x >= 1 { let coeffs_by_2 = coeffs.chunks_exact(2); coeffs = coeffs_by_2.remainder(); for k in coeffs_by_2 { let coeff0_i64x2 = 
i64x2_splat(k[0] as i64); let coeff1_i64x2 = i64x2_splat(k[1] as i64); let coeff_i64x2 = i64x2(k[0] as i64, k[1] as i64); let source = wasm32_utils::load_v128(src_row, x); let rg0_i64x2 = i8x16_swizzle(source, RG0_SHUFFLE); rg_sum = i64x2_add(rg_sum, wasm32_utils::i64x2_mul_lo(rg0_i64x2, coeff0_i64x2)); let rg1_i64x2 = i8x16_swizzle(source, RG1_SHUFFLE); rg_sum = i64x2_add(rg_sum, wasm32_utils::i64x2_mul_lo(rg1_i64x2, coeff1_i64x2)); let bb_i64x2 = i8x16_swizzle(source, BB_SHUFFLE); bb_sum = i64x2_add(bb_sum, wasm32_utils::i64x2_mul_lo(bb_i64x2, coeff_i64x2)); x += 2; } } for &k in coeffs { let coeff_i64x2 = i64x2_splat(k as i64); let &pixel = src_row.get_unchecked(x); let rg_i64x2 = i64x2(pixel.0[0] as i64, pixel.0[1] as i64); rg_sum = i64x2_add(rg_sum, wasm32_utils::i64x2_mul_lo(rg_i64x2, coeff_i64x2)); let bb_i64x2 = i64x2(pixel.0[2] as i64, 0); bb_sum = i64x2_add(bb_sum, wasm32_utils::i64x2_mul_lo(bb_i64x2, coeff_i64x2)); x += 1; } v128_store(rg_buf.as_mut_ptr() as *mut v128, rg_sum); v128_store(bb_buf.as_mut_ptr() as *mut v128, bb_sum); let dst_pixel = dst_row.get_unchecked_mut(dst_x); dst_pixel.0[0] = normalizer.clip(rg_buf[0]); dst_pixel.0[1] = normalizer.clip(rg_buf[1]); dst_pixel.0[2] = normalizer.clip(bb_buf[0] + bb_buf[1]); } } fast_image_resize-5.3.0/src/convolution/u16x4/avx2.rs000064400000000000000000000262421046102023000206020ustar 00000000000000use std::arch::x86_64::*; use crate::convolution::optimisations::Normalizer32; use crate::pixels::U16x4; use crate::{simd_utils, ImageView, ImageViewMut}; #[inline] pub(crate) fn horiz_convolution( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, offset: u32, normalizer: &Normalizer32, ) { let dst_height = dst_view.height(); let src_iter = src_view.iter_4_rows(offset, dst_height + offset); let dst_iter = dst_view.iter_4_rows_mut(); for (src_rows, dst_rows) in src_iter.zip(dst_iter) { unsafe { horiz_convolution_four_rows(src_rows, dst_rows, normalizer); } } let yy = dst_height - dst_height % 4; let src_rows = src_view.iter_rows(yy + offset); let dst_rows = dst_view.iter_rows_mut(yy); for (src_row, dst_row) in src_rows.zip(dst_rows) { unsafe { horiz_convolution_one_row(src_row, dst_row, normalizer); } } } /// For safety, it is necessary to ensure the following conditions: /// - length of all rows in src_rows must be equal /// - length of all rows in dst_rows must be equal /// - coefficients_chunks.len() == dst_rows.0.len() /// - max(chunk.start + chunk.values.len() for chunk in coefficients_chunks) <= src_row.0.len() /// - precision <= MAX_COEFS_PRECISION #[target_feature(enable = "avx2")] unsafe fn horiz_convolution_four_rows( src_rows: [&[U16x4]; 4], dst_rows: [&mut [U16x4]; 4], normalizer: &Normalizer32, ) { let precision = normalizer.precision(); let half_error = 1i64 << (precision - 1); let mut rg_buf = [0i64; 4]; let mut ba_buf = [0i64; 4]; /* |R0 G0 B0 A0 | |R1 G1 B1 A1 | |0001 0203 0405 0607| |0809 1011 1213 1415| Shuffle to extract R0 and G0 as i64: -1, -1, -1, -1, -1, -1, 3, 2, -1, -1, -1, -1, -1, -1, 1, 0 Shuffle to extract R1 and G1 as i64: -1, -1, -1, -1, -1, -1, 11, 10, -1, -1, -1, -1, -1, -1, 9, 8 Shuffle to extract B0 and A0 as i64: -1, -1, -1, -1, -1, -1, 7, 6, -1, -1, -1, -1, -1, -1, 5, 4 Shuffle to extract B1 and A1 as i64: -1, -1, -1, -1, -1, -1, 15, 14, -1, -1, -1, -1, -1, -1, 13, 12 */ #[rustfmt::skip] let rg0_shuffle = _mm256_set_epi8( -1, -1, -1, -1, -1, -1, 3, 2, -1, -1, -1, -1, -1, -1, 1, 0, -1, -1, -1, -1, -1, -1, 3, 2, -1, -1, -1, -1, -1, -1, 1, 0, ); #[rustfmt::skip] let rg1_shuffle = 
_mm256_set_epi8( -1, -1, -1, -1, -1, -1, 11, 10, -1, -1, -1, -1, -1, -1, 9, 8, -1, -1, -1, -1, -1, -1, 11, 10, -1, -1, -1, -1, -1, -1, 9, 8, ); #[rustfmt::skip] let ba0_shuffle = _mm256_set_epi8( -1, -1, -1, -1, -1, -1, 7, 6, -1, -1, -1, -1, -1, -1, 5, 4, -1, -1, -1, -1, -1, -1, 7, 6, -1, -1, -1, -1, -1, -1, 5, 4, ); #[rustfmt::skip] let ba1_shuffle = _mm256_set_epi8( -1, -1, -1, -1, -1, -1, 15, 14, -1, -1, -1, -1, -1, -1, 13, 12, -1, -1, -1, -1, -1, -1, 15, 14, -1, -1, -1, -1, -1, -1, 13, 12, ); for (dst_x, chunk) in normalizer.chunks().iter().enumerate() { let mut x = chunk.start as usize; let mut coeffs = chunk.values(); let mut rg_sum = [_mm256_set1_epi64x(half_error); 2]; let mut ba_sum = [_mm256_set1_epi64x(half_error); 2]; let coeffs_by_2 = coeffs.chunks_exact(2); coeffs = coeffs_by_2.remainder(); for k in coeffs_by_2 { let coeff0_i64x4 = _mm256_set1_epi64x(k[0] as i64); let coeff1_i64x4 = _mm256_set1_epi64x(k[1] as i64); for i in 0..2 { let source = _mm256_set_m128i( simd_utils::loadu_si128(src_rows[i * 2 + 1], x), simd_utils::loadu_si128(src_rows[i * 2], x), ); let mut sum = rg_sum[i]; let rg_i64x4 = _mm256_shuffle_epi8(source, rg0_shuffle); sum = _mm256_add_epi64(sum, _mm256_mul_epi32(rg_i64x4, coeff0_i64x4)); let rg_i64x4 = _mm256_shuffle_epi8(source, rg1_shuffle); sum = _mm256_add_epi64(sum, _mm256_mul_epi32(rg_i64x4, coeff1_i64x4)); rg_sum[i] = sum; let mut sum = ba_sum[i]; let ba_i64x4 = _mm256_shuffle_epi8(source, ba0_shuffle); sum = _mm256_add_epi64(sum, _mm256_mul_epi32(ba_i64x4, coeff0_i64x4)); let ba_i64x4 = _mm256_shuffle_epi8(source, ba1_shuffle); sum = _mm256_add_epi64(sum, _mm256_mul_epi32(ba_i64x4, coeff1_i64x4)); ba_sum[i] = sum; } x += 2; } if let Some(&k) = coeffs.first() { let coeff0_i64x4 = _mm256_set1_epi64x(k as i64); for i in 0..2 { let source = _mm256_set_m128i( simd_utils::loadl_epi64(src_rows[i * 2 + 1], x), simd_utils::loadl_epi64(src_rows[i * 2], x), ); let mut sum = rg_sum[i]; let rg_i64x4 = _mm256_shuffle_epi8(source, rg0_shuffle); sum = _mm256_add_epi64(sum, _mm256_mul_epi32(rg_i64x4, coeff0_i64x4)); rg_sum[i] = sum; let mut sum = ba_sum[i]; let ba_i64x4 = _mm256_shuffle_epi8(source, ba0_shuffle); sum = _mm256_add_epi64(sum, _mm256_mul_epi32(ba_i64x4, coeff0_i64x4)); ba_sum[i] = sum; } } for i in 0..2 { _mm256_storeu_si256(rg_buf.as_mut_ptr() as *mut __m256i, rg_sum[i]); _mm256_storeu_si256(ba_buf.as_mut_ptr() as *mut __m256i, ba_sum[i]); let dst_pixel = dst_rows[i * 2].get_unchecked_mut(dst_x); dst_pixel.0 = [ normalizer.clip(rg_buf[0]), normalizer.clip(rg_buf[1]), normalizer.clip(ba_buf[0]), normalizer.clip(ba_buf[1]), ]; let dst_pixel = dst_rows[i * 2 + 1].get_unchecked_mut(dst_x); dst_pixel.0 = [ normalizer.clip(rg_buf[2]), normalizer.clip(rg_buf[3]), normalizer.clip(ba_buf[2]), normalizer.clip(ba_buf[3]), ]; } } } /// For safety, it is necessary to ensure the following conditions: /// - bounds.len() == dst_row.len() /// - coeffs.len() == dst_rows.0.len() * window_size /// - max(bound.start + bound.size for bound in bounds) <= src_row.len() /// - precision <= MAX_COEFS_PRECISION #[inline] #[target_feature(enable = "avx2")] unsafe fn horiz_convolution_one_row( src_row: &[U16x4], dst_row: &mut [U16x4], normalizer: &Normalizer32, ) { let precision = normalizer.precision(); let half_error = 1i64 << (precision - 1); let mut rg_buf = [0i64; 4]; let mut ba_buf = [0i64; 4]; /* |R0 G0 B0 A0 | |R1 G1 B1 A1 | |0001 0203 0405 0607| |0809 1011 1213 1415| Shuffle to extract R0 and G0 as i64: -1, -1, -1, -1, -1, -1, 3, 2, -1, -1, -1, -1, -1, -1, 1, 0 Shuffle to 
extract R1 and G1 as i64: -1, -1, -1, -1, -1, -1, 11, 10, -1, -1, -1, -1, -1, -1, 9, 8 Shuffle to extract B0 and A0 as i64: -1, -1, -1, -1, -1, -1, 7, 6, -1, -1, -1, -1, -1, -1, 5, 4 Shuffle to extract B1 and A1 as i64: -1, -1, -1, -1, -1, -1, 15, 14, -1, -1, -1, -1, -1, -1, 13, 12 */ #[rustfmt::skip] let rg02_shuffle = _mm256_set_epi8( -1, -1, -1, -1, -1, -1, 3, 2, -1, -1, -1, -1, -1, -1, 1, 0, -1, -1, -1, -1, -1, -1, 3, 2, -1, -1, -1, -1, -1, -1, 1, 0, ); #[rustfmt::skip] let rg13_shuffle = _mm256_set_epi8( -1, -1, -1, -1, -1, -1, 11, 10, -1, -1, -1, -1, -1, -1, 9, 8, -1, -1, -1, -1, -1, -1, 11, 10, -1, -1, -1, -1, -1, -1, 9, 8, ); #[rustfmt::skip] let ba02_shuffle = _mm256_set_epi8( -1, -1, -1, -1, -1, -1, 7, 6, -1, -1, -1, -1, -1, -1, 5, 4, -1, -1, -1, -1, -1, -1, 7, 6, -1, -1, -1, -1, -1, -1, 5, 4, ); #[rustfmt::skip] let ba13_shuffle = _mm256_set_epi8( -1, -1, -1, -1, -1, -1, 15, 14, -1, -1, -1, -1, -1, -1, 13, 12, -1, -1, -1, -1, -1, -1, 15, 14, -1, -1, -1, -1, -1, -1, 13, 12, ); for (dst_x, chunk) in normalizer.chunks().iter().enumerate() { let mut x = chunk.start as usize; let mut coeffs = chunk.values(); let mut rg_sum = _mm256_setzero_si256(); let mut ba_sum = _mm256_setzero_si256(); let coeffs_by_4 = coeffs.chunks_exact(4); coeffs = coeffs_by_4.remainder(); for k in coeffs_by_4 { let coeff02_i64x4 = _mm256_set_epi64x(k[2] as i64, k[2] as i64, k[0] as i64, k[0] as i64); let coeff13_i64x4 = _mm256_set_epi64x(k[3] as i64, k[3] as i64, k[1] as i64, k[1] as i64); let source = simd_utils::loadu_si256(src_row, x); let rg_i64x4 = _mm256_shuffle_epi8(source, rg02_shuffle); rg_sum = _mm256_add_epi64(rg_sum, _mm256_mul_epi32(rg_i64x4, coeff02_i64x4)); let rg_i64x4 = _mm256_shuffle_epi8(source, rg13_shuffle); rg_sum = _mm256_add_epi64(rg_sum, _mm256_mul_epi32(rg_i64x4, coeff13_i64x4)); let ba_i64x4 = _mm256_shuffle_epi8(source, ba02_shuffle); ba_sum = _mm256_add_epi64(ba_sum, _mm256_mul_epi32(ba_i64x4, coeff02_i64x4)); let ba_i64x4 = _mm256_shuffle_epi8(source, ba13_shuffle); ba_sum = _mm256_add_epi64(ba_sum, _mm256_mul_epi32(ba_i64x4, coeff13_i64x4)); x += 4; } let coeffs_by_2 = coeffs.chunks_exact(2); coeffs = coeffs_by_2.remainder(); for k in coeffs_by_2 { let coeff01_i64x4 = _mm256_set_epi64x(k[1] as i64, k[1] as i64, k[0] as i64, k[0] as i64); let source = _mm256_set_m128i( simd_utils::loadl_epi64(src_row, x + 1), simd_utils::loadl_epi64(src_row, x), ); let rg_i64x4 = _mm256_shuffle_epi8(source, rg02_shuffle); rg_sum = _mm256_add_epi64(rg_sum, _mm256_mul_epi32(rg_i64x4, coeff01_i64x4)); let ba_i64x4 = _mm256_shuffle_epi8(source, ba02_shuffle); ba_sum = _mm256_add_epi64(ba_sum, _mm256_mul_epi32(ba_i64x4, coeff01_i64x4)); x += 2; } if let Some(&k) = coeffs.first() { let coeff_i64x4 = _mm256_set_epi64x(0, 0, k as i64, k as i64); let source = _mm256_set_m128i(_mm_setzero_si128(), simd_utils::loadl_epi64(src_row, x)); let rg_i64x4 = _mm256_shuffle_epi8(source, rg02_shuffle); rg_sum = _mm256_add_epi64(rg_sum, _mm256_mul_epi32(rg_i64x4, coeff_i64x4)); let ba_i64x4 = _mm256_shuffle_epi8(source, ba02_shuffle); ba_sum = _mm256_add_epi64(ba_sum, _mm256_mul_epi32(ba_i64x4, coeff_i64x4)); } _mm256_storeu_si256(rg_buf.as_mut_ptr() as *mut __m256i, rg_sum); _mm256_storeu_si256(ba_buf.as_mut_ptr() as *mut __m256i, ba_sum); let dst_pixel = dst_row.get_unchecked_mut(dst_x); dst_pixel.0 = [ normalizer.clip(rg_buf[0] + rg_buf[2] + half_error), normalizer.clip(rg_buf[1] + rg_buf[3] + half_error), normalizer.clip(ba_buf[0] + ba_buf[2] + half_error), normalizer.clip(ba_buf[1] + ba_buf[3] + half_error), ]; } 
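// Layout note for the loop above: each 256-bit accumulator holds four i64 lanes.
// `rg_sum` collects [sum(R), sum(G)] in its low 128-bit half for the pixels fed into that
// half and the same pair in its high half; `ba_sum` has the identical layout for the B and
// A components. The scalar additions `rg_buf[0] + rg_buf[2]`, `rg_buf[1] + rg_buf[3]`, and
// their `ba_buf` counterparts merge the two halves before `normalizer.clip()` produces the
// final u16 components.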
} fast_image_resize-5.3.0/src/convolution/u16x4/mod.rs000064400000000000000000000050401046102023000204720ustar 00000000000000use super::{Coefficients, Convolution}; use crate::convolution::optimisations::Normalizer32; use crate::convolution::vertical_u16::vert_convolution_u16; use crate::pixels::U16x4; use crate::{CpuExtensions, ImageView, ImageViewMut}; #[cfg(target_arch = "x86_64")] mod avx2; mod native; #[cfg(target_arch = "aarch64")] mod neon; #[cfg(target_arch = "x86_64")] mod sse4; #[cfg(target_arch = "wasm32")] mod wasm32; type P = U16x4; impl Convolution for P { fn horiz_convolution( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, offset: u32, coeffs: Coefficients, cpu_extensions: CpuExtensions, ) { debug_assert!(src_view.height() - offset >= dst_view.height()); let normalizer = Normalizer32::new(coeffs); let normalizer_ref = &normalizer; try_process_in_threads_h! { horiz_convolution( src_view, dst_view, offset, normalizer_ref, cpu_extensions, ); } } fn vert_convolution( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, offset: u32, coeffs: Coefficients, cpu_extensions: CpuExtensions, ) { debug_assert!(src_view.width() - offset >= dst_view.width()); let normalizer = Normalizer32::new(coeffs); let normalizer_ref = &normalizer; try_process_in_threads_v! { vert_convolution_u16( src_view, dst_view, offset, normalizer_ref, cpu_extensions, ); } } } fn horiz_convolution( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, offset: u32, normalizer: &Normalizer32, cpu_extensions: CpuExtensions, ) { match cpu_extensions { #[cfg(target_arch = "x86_64")] CpuExtensions::Avx2 => avx2::horiz_convolution(src_view, dst_view, offset, normalizer), #[cfg(target_arch = "x86_64")] CpuExtensions::Sse4_1 => sse4::horiz_convolution(src_view, dst_view, offset, normalizer), #[cfg(target_arch = "aarch64")] CpuExtensions::Neon => neon::horiz_convolution(src_view, dst_view, offset, normalizer), #[cfg(target_arch = "wasm32")] CpuExtensions::Simd128 => wasm32::horiz_convolution(src_view, dst_view, offset, normalizer), _ => native::horiz_convolution(src_view, dst_view, offset, normalizer), } } fast_image_resize-5.3.0/src/convolution/u16x4/native.rs000064400000000000000000000023511046102023000212030ustar 00000000000000use crate::convolution::optimisations::Normalizer32; use crate::pixels::U16x4; use crate::{ImageView, ImageViewMut}; #[inline(always)] pub(crate) fn horiz_convolution( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, offset: u32, normalizer: &Normalizer32, ) { let precision = normalizer.precision(); let coefficients_chunks = normalizer.chunks(); let initial: i64 = 1 << (precision - 1); let src_rows = src_view.iter_rows(offset); let dst_rows = dst_view.iter_rows_mut(0); for (dst_row, src_row) in dst_rows.zip(src_rows) { for (coeffs_chunk, dst_pixel) in coefficients_chunks.iter().zip(dst_row.iter_mut()) { let first_x_src = coeffs_chunk.start as usize; let mut ss = [initial; 4]; let src_pixels = unsafe { src_row.get_unchecked(first_x_src..) 
}; for (&k, src_pixel) in coeffs_chunk.values().iter().zip(src_pixels) { for (i, s) in ss.iter_mut().enumerate() { *s += src_pixel.0[i] as i64 * (k as i64); } } for (i, s) in ss.iter().copied().enumerate() { dst_pixel.0[i] = normalizer.clip(s); } } } }
fast_image_resize-5.3.0/src/convolution/u16x4/neon.rs000064400000000000000000000241451046102023000206610ustar 00000000000000use std::arch::aarch64::*; use crate::convolution::optimisations::{CoefficientsI32Chunk, Normalizer32}; use crate::neon_utils; use crate::pixels::U16x4; use crate::{ImageView, ImageViewMut}; #[inline] pub(crate) fn horiz_convolution( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, offset: u32, normalizer: &Normalizer32, ) { let precision = normalizer.precision(); macro_rules! call { ($imm8:expr) => {{ horiz_convolution_p::<$imm8>(src_view, dst_view, offset, normalizer); }}; } constify_64_imm8!(precision, call); }
fn horiz_convolution_p<const PRECISION: i32>( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, offset: u32, normalizer: &Normalizer32, ) { let coefficients_chunks = normalizer.chunks(); let dst_height = dst_view.height(); let src_iter = src_view.iter_4_rows(offset, dst_height + offset); let dst_iter = dst_view.iter_4_rows_mut(); for (src_rows, dst_rows) in src_iter.zip(dst_iter) { unsafe { horiz_convolution_four_rows::<PRECISION>(src_rows, dst_rows, coefficients_chunks); } } let yy = dst_height - dst_height % 4; let src_rows = src_view.iter_rows(yy + offset); let dst_rows = dst_view.iter_rows_mut(yy); for (src_row, dst_row) in src_rows.zip(dst_rows) { unsafe { horiz_convolution_one_row::<PRECISION>(src_row, dst_row, coefficients_chunks); } } }
/// For safety, it is necessary to ensure the following conditions: /// - length of all rows in src_rows must be equal /// - length of all rows in dst_rows must be equal /// - coefficients_chunks.len() == dst_rows.0.len() /// - max(chunk.start + chunk.values.len() for chunk in coefficients_chunks) <= src_row.0.len() /// - precision <= MAX_COEFS_PRECISION #[target_feature(enable = "neon")] unsafe fn horiz_convolution_four_rows<const PRECISION: i32>( src_rows: [&[U16x4]; 4], dst_rows: [&mut [U16x4]; 4], coefficients_chunks: &[CoefficientsI32Chunk], ) { let initial = vdupq_n_s64(1i64 << (PRECISION - 1)); let zero_u16x8 = vdupq_n_u16(0); let zero_u16x4 = vdup_n_u16(0); for (dst_x, coeffs_chunk) in coefficients_chunks.iter().enumerate() { let mut x: usize = coeffs_chunk.start as usize; let mut sss_a = [int64x2x2_t(initial, initial); 4]; let mut coeffs = coeffs_chunk.values(); let coeffs_by_8 = coeffs.chunks_exact(4); coeffs = coeffs_by_8.remainder(); for k in coeffs_by_8 { let coeffs_i32x4 = neon_utils::load_i32x4(k, 0); let coeff0 = vdup_laneq_s32::<0>(coeffs_i32x4); let coeff1 = vdup_laneq_s32::<1>(coeffs_i32x4); let coeff2 = vdup_laneq_s32::<2>(coeffs_i32x4); let coeff3 = vdup_laneq_s32::<3>(coeffs_i32x4); for i in 0..4 { let mut sss = sss_a[i]; let source = neon_utils::load_u16x8(src_rows[i], x); let pix_i32 = vreinterpretq_s32_u16(vzip1q_u16(source, zero_u16x8)); sss.0 = vmlal_s32(sss.0, vget_low_s32(pix_i32), coeff0); sss.1 = vmlal_s32(sss.1, vget_high_s32(pix_i32), coeff0); let pix_i32 = vreinterpretq_s32_u16(vzip2q_u16(source, zero_u16x8)); sss.0 = vmlal_s32(sss.0, vget_low_s32(pix_i32), coeff1); sss.1 = vmlal_s32(sss.1, vget_high_s32(pix_i32), coeff1); let source = neon_utils::load_u16x8(src_rows[i], x + 2); let pix_i32 = vreinterpretq_s32_u16(vzip1q_u16(source, zero_u16x8)); sss.0 = vmlal_s32(sss.0, vget_low_s32(pix_i32), coeff2); sss.1 = vmlal_s32(sss.1, vget_high_s32(pix_i32), coeff2); let pix_i32 = vreinterpretq_s32_u16(vzip2q_u16(source, zero_u16x8)); sss.0 = vmlal_s32(sss.0, vget_low_s32(pix_i32), coeff3); sss.1 = vmlal_s32(sss.1, vget_high_s32(pix_i32), coeff3); sss_a[i] = sss; } x += 4; } let coeffs_by_4 = coeffs.chunks_exact(2); coeffs = coeffs_by_4.remainder(); for k in coeffs_by_4 { let coeffs_i32x2 = neon_utils::load_i32x2(k, 0); let coeff0 = vdup_lane_s32::<0>(coeffs_i32x2); let coeff1 = vdup_lane_s32::<1>(coeffs_i32x2); for i in 0..4 { let mut sss = sss_a[i]; let source = neon_utils::load_u16x8(src_rows[i], x); let pix_i32 = vreinterpretq_s32_u16(vzip1q_u16(source, zero_u16x8)); sss.0 = vmlal_s32(sss.0, vget_low_s32(pix_i32), coeff0); sss.1 = vmlal_s32(sss.1, vget_high_s32(pix_i32), coeff0); let pix_i32 = vreinterpretq_s32_u16(vzip2q_u16(source, zero_u16x8)); sss.0 = vmlal_s32(sss.0, vget_low_s32(pix_i32), coeff1); sss.1 = vmlal_s32(sss.1, vget_high_s32(pix_i32), coeff1); sss_a[i] = sss; } x += 2; } if let Some(&k) = coeffs.first() { let coeff = vdup_n_s32(k); for i in 0..4 { let mut sss = sss_a[i]; let source = vcombine_u16(neon_utils::load_u16x4(src_rows[i], x), zero_u16x4); let pix_i32 = vreinterpretq_s32_u16(vzip1q_u16(source, zero_u16x8)); sss.0 = vmlal_s32(sss.0, vget_low_s32(pix_i32), coeff); sss.1 = vmlal_s32(sss.1, vget_high_s32(pix_i32), coeff); sss_a[i] = sss; } } sss_a[0].0 = vshrq_n_s64::<PRECISION>(sss_a[0].0); sss_a[0].1 = vshrq_n_s64::<PRECISION>(sss_a[0].1); sss_a[1].0 = vshrq_n_s64::<PRECISION>(sss_a[1].0); sss_a[1].1 = vshrq_n_s64::<PRECISION>(sss_a[1].1); sss_a[2].0 = vshrq_n_s64::<PRECISION>(sss_a[2].0); sss_a[2].1 = vshrq_n_s64::<PRECISION>(sss_a[2].1); sss_a[3].0 = vshrq_n_s64::<PRECISION>(sss_a[3].0); sss_a[3].1 = vshrq_n_s64::<PRECISION>(sss_a[3].1); for i in 0..4 { let sss = sss_a[i]; let sss_i32x4 = vcombine_s32(vqmovn_s64(sss.0), vqmovn_s64(sss.1)); let sss_u16x4 = vqmovun_s32(sss_i32x4); let dst_pix = dst_rows[i].get_unchecked_mut(dst_x); let ptr = dst_pix as *mut U16x4 as *mut u16; vst1_u16(ptr, sss_u16x4); } } }
/// For safety, it is necessary to ensure the following conditions: /// - bounds.len() == dst_row.len() /// - coefficients_chunks.len() == dst_row.len() /// - max(chunk.start + chunk.values.len() for chunk in coefficients_chunks) <= src_row.len() /// - precision <= MAX_COEFS_PRECISION #[target_feature(enable = "neon")] unsafe fn horiz_convolution_one_row<const PRECISION: i32>( src_row: &[U16x4], dst_row: &mut [U16x4], coefficients_chunks: &[CoefficientsI32Chunk], ) { let initial = vdupq_n_s64(1i64 << (PRECISION - 1)); let zero_u16x8 = vdupq_n_u16(0); let zero_u16x4 = vdup_n_u16(0); for (coeffs_chunk, dst_pix) in coefficients_chunks.iter().zip(dst_row) { let mut x: usize = coeffs_chunk.start as usize; let mut sss = int64x2x2_t(initial, initial); let mut coeffs = coeffs_chunk.values(); let coeffs_by_4 = coeffs.chunks_exact(4); coeffs = coeffs_by_4.remainder(); for k in coeffs_by_4 { let coeffs_i32x4 = neon_utils::load_i32x4(k, 0); let source = neon_utils::load_u16x8(src_row, x); let coeff = vdup_laneq_s32::<0>(coeffs_i32x4); let pix_i32 = vreinterpretq_s32_u16(vzip1q_u16(source, zero_u16x8)); sss.0 = vmlal_s32(sss.0, vget_low_s32(pix_i32), coeff); sss.1 = vmlal_s32(sss.1, vget_high_s32(pix_i32), coeff); let coeff = vdup_laneq_s32::<1>(coeffs_i32x4); let pix_i32 = vreinterpretq_s32_u16(vzip2q_u16(source, zero_u16x8)); sss.0 = vmlal_s32(sss.0, vget_low_s32(pix_i32), coeff); sss.1 = vmlal_s32(sss.1, vget_high_s32(pix_i32), coeff); let source = neon_utils::load_u16x8(src_row, x + 2); let coeff = vdup_laneq_s32::<2>(coeffs_i32x4); let pix_i32 = vreinterpretq_s32_u16(vzip1q_u16(source, zero_u16x8)); sss.0 = vmlal_s32(sss.0, vget_low_s32(pix_i32), coeff); sss.1 = vmlal_s32(sss.1, vget_high_s32(pix_i32), coeff); let coeff = vdup_laneq_s32::<3>(coeffs_i32x4); let pix_i32 = vreinterpretq_s32_u16(vzip2q_u16(source, zero_u16x8)); sss.0 = vmlal_s32(sss.0, vget_low_s32(pix_i32), coeff); sss.1 = vmlal_s32(sss.1, vget_high_s32(pix_i32), coeff); x += 4; } let coeffs_by_2 = coeffs.chunks_exact(2); coeffs = coeffs_by_2.remainder(); for k in coeffs_by_2 { let coeffs_i32x2 = neon_utils::load_i32x2(k, 0); let source = neon_utils::load_u16x8(src_row, x); let coeff = vdup_lane_s32::<0>(coeffs_i32x2); let pix_i32 = vreinterpretq_s32_u16(vzip1q_u16(source, zero_u16x8)); sss.0 = vmlal_s32(sss.0, vget_low_s32(pix_i32), coeff); sss.1 = vmlal_s32(sss.1, vget_high_s32(pix_i32), coeff); let coeff = vdup_lane_s32::<1>(coeffs_i32x2); let pix_i32 = vreinterpretq_s32_u16(vzip2q_u16(source, zero_u16x8)); sss.0 = vmlal_s32(sss.0, vget_low_s32(pix_i32), coeff); sss.1 = vmlal_s32(sss.1, vget_high_s32(pix_i32), coeff); x += 2; } if let Some(&k) = coeffs.first() { let source = vcombine_u16(neon_utils::load_u16x4(src_row, x), zero_u16x4); let coeff = vdup_n_s32(k); let pix_i32 = vreinterpretq_s32_u16(vzip1q_u16(source, zero_u16x8)); sss.0 = vmlal_s32(sss.0, vget_low_s32(pix_i32), coeff); sss.1 = vmlal_s32(sss.1, vget_high_s32(pix_i32), coeff); } sss.0 = vshrq_n_s64::<PRECISION>(sss.0); sss.1 = vshrq_n_s64::<PRECISION>(sss.1); let sss_i32x4 = vcombine_s32(vqmovn_s64(sss.0), vqmovn_s64(sss.1)); let sss_u16x4 = vqmovun_s32(sss_i32x4); let ptr = dst_pix as *mut U16x4 as *mut u16; vst1_u16(ptr, sss_u16x4); } }
fast_image_resize-5.3.0/src/convolution/u16x4/sse4.rs000064400000000000000000000210351046102023000205730ustar 00000000000000use std::arch::x86_64::*; use crate::convolution::optimisations::Normalizer32; use crate::pixels::U16x4; use crate::{simd_utils, ImageView, ImageViewMut}; #[inline] pub(crate) fn horiz_convolution( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, offset: u32, normalizer: &Normalizer32, ) { let dst_height = dst_view.height(); let src_iter = src_view.iter_4_rows(offset, dst_height + offset); let dst_iter = dst_view.iter_4_rows_mut(); for (src_rows, dst_rows) in src_iter.zip(dst_iter) { unsafe { horiz_convolution_four_rows(src_rows, dst_rows, normalizer); } } let yy = dst_height - dst_height % 4; let src_rows = src_view.iter_rows(yy + offset); let dst_rows = dst_view.iter_rows_mut(yy); for (src_row, dst_row) in src_rows.zip(dst_rows) { unsafe { horiz_convolution_one_row(src_row, dst_row, normalizer); } } } /// For safety, it is necessary to ensure the following conditions: /// - length of all rows in src_rows must be equal /// - length of all rows in dst_rows must be equal /// - coefficients_chunks.len() == dst_rows.0.len() /// - max(chunk.start + chunk.values.len() for chunk in coefficients_chunks) <= src_row.0.len() /// - precision <= MAX_COEFS_PRECISION #[target_feature(enable = "sse4.1")] unsafe fn horiz_convolution_four_rows( src_rows: [&[U16x4]; 4], dst_rows: [&mut [U16x4]; 4], normalizer: &Normalizer32, ) { let precision = normalizer.precision(); let half_error = 1i64 << (precision - 1); let mut rg_buf = [0i64; 2]; let mut ba_buf = [0i64; 2]; /* |R0 G0 B0 A0 | |R1 G1 B1 A1 | |0001 0203 0405 0607| |0809 1011 1213 1415| Shuffle to extract R0 and G0 as i64: -1, -1, -1, -1, -1, -1, 3, 2, -1, -1, -1, -1, -1, -1, 1, 0 Shuffle to extract R1 and G1 as i64: -1, -1, -1, -1, -1, -1, 11, 10, -1, -1, -1, -1, -1, -1, 9, 8 Shuffle to extract B0 and A0 as i64: -1, -1, -1, -1, -1, -1, 7, 6, -1, -1, -1, -1, -1, -1, 5, 4 Shuffle to extract B1 and A1 as i64: -1, -1, -1,
-1, -1, -1, 15, 14, -1, -1, -1, -1, -1, -1, 13, 12 */ let rg0_shuffle = _mm_set_epi8(-1, -1, -1, -1, -1, -1, 3, 2, -1, -1, -1, -1, -1, -1, 1, 0); let rg1_shuffle = _mm_set_epi8(-1, -1, -1, -1, -1, -1, 11, 10, -1, -1, -1, -1, -1, -1, 9, 8); let ba0_shuffle = _mm_set_epi8(-1, -1, -1, -1, -1, -1, 7, 6, -1, -1, -1, -1, -1, -1, 5, 4); let ba1_shuffle = _mm_set_epi8( -1, -1, -1, -1, -1, -1, 15, 14, -1, -1, -1, -1, -1, -1, 13, 12, ); for (dst_x, chunk) in normalizer.chunks().iter().enumerate() { let mut x = chunk.start as usize; let mut coeffs = chunk.values(); let mut rg_sum = [_mm_set1_epi64x(half_error); 4]; let mut ba_sum = [_mm_set1_epi64x(half_error); 4]; let coeffs_by_2 = coeffs.chunks_exact(2); coeffs = coeffs_by_2.remainder(); for k in coeffs_by_2 { let coeff0_i64x2 = _mm_set1_epi64x(k[0] as i64); let coeff1_i64x2 = _mm_set1_epi64x(k[1] as i64); for i in 0..4 { let source = simd_utils::loadu_si128(src_rows[i], x); let mut sum = rg_sum[i]; let rg_i64x2 = _mm_shuffle_epi8(source, rg0_shuffle); sum = _mm_add_epi64(sum, _mm_mul_epi32(rg_i64x2, coeff0_i64x2)); let rg_i64x2 = _mm_shuffle_epi8(source, rg1_shuffle); sum = _mm_add_epi64(sum, _mm_mul_epi32(rg_i64x2, coeff1_i64x2)); rg_sum[i] = sum; let mut sum = ba_sum[i]; let ba_i64x2 = _mm_shuffle_epi8(source, ba0_shuffle); sum = _mm_add_epi64(sum, _mm_mul_epi32(ba_i64x2, coeff0_i64x2)); let ba_i64x2 = _mm_shuffle_epi8(source, ba1_shuffle); sum = _mm_add_epi64(sum, _mm_mul_epi32(ba_i64x2, coeff1_i64x2)); ba_sum[i] = sum; } x += 2; } if let Some(&k) = coeffs.first() { let coeff0_i64x2 = _mm_set1_epi64x(k as i64); for i in 0..4 { let source = simd_utils::loadl_epi64(src_rows[i], x); let rg_i64x2 = _mm_shuffle_epi8(source, rg0_shuffle); rg_sum[i] = _mm_add_epi64(rg_sum[i], _mm_mul_epi32(rg_i64x2, coeff0_i64x2)); let ba_i64x2 = _mm_shuffle_epi8(source, ba0_shuffle); ba_sum[i] = _mm_add_epi64(ba_sum[i], _mm_mul_epi32(ba_i64x2, coeff0_i64x2)); } } for i in 0..4 { _mm_storeu_si128(rg_buf.as_mut_ptr() as *mut __m128i, rg_sum[i]); _mm_storeu_si128(ba_buf.as_mut_ptr() as *mut __m128i, ba_sum[i]); let dst_pixel = dst_rows[i].get_unchecked_mut(dst_x); dst_pixel.0 = [ normalizer.clip(rg_buf[0]), normalizer.clip(rg_buf[1]), normalizer.clip(ba_buf[0]), normalizer.clip(ba_buf[1]), ]; } } } /// For safety, it is necessary to ensure the following conditions: /// - bounds.len() == dst_row.len() /// - coeffs.len() == dst_rows.0.len() * window_size /// - max(bound.start + bound.size for bound in bounds) <= src_row.len() /// - precision <= MAX_COEFS_PRECISION #[inline] #[target_feature(enable = "sse4.1")] unsafe fn horiz_convolution_one_row( src_row: &[U16x4], dst_row: &mut [U16x4], normalizer: &Normalizer32, ) { let precision = normalizer.precision(); let half_error = 1i64 << (precision - 1); let mut rg_buf = [0i64; 2]; let mut ba_buf = [0i64; 2]; /* |R0 G0 B0 A0 | |R1 G1 B1 A1 | |0001 0203 0405 0607| |0809 1011 1213 1415| Shuffle to extract R0 and G0 as i64: -1, -1, -1, -1, -1, -1, 3, 2, -1, -1, -1, -1, -1, -1, 1, 0 Shuffle to extract R1 and G1 as i64: -1, -1, -1, -1, -1, -1, 11, 10, -1, -1, -1, -1, -1, -1, 9, 8 Shuffle to extract B0 and A0 as i64: -1, -1, -1, -1, -1, -1, 7, 6, -1, -1, -1, -1, -1, -1, 5, 4 Shuffle to extract B1 and A1 as i64: -1, -1, -1, -1, -1, -1, 15, 14, -1, -1, -1, -1, -1, -1, 13, 12 */ let rg0_shuffle = _mm_set_epi8(-1, -1, -1, -1, -1, -1, 3, 2, -1, -1, -1, -1, -1, -1, 1, 0); let rg1_shuffle = _mm_set_epi8(-1, -1, -1, -1, -1, -1, 11, 10, -1, -1, -1, -1, -1, -1, 9, 8); let ba0_shuffle = _mm_set_epi8(-1, -1, -1, -1, -1, -1, 7, 6, -1, -1, -1, -1, 
-1, -1, 5, 4); let ba1_shuffle = _mm_set_epi8( -1, -1, -1, -1, -1, -1, 15, 14, -1, -1, -1, -1, -1, -1, 13, 12, ); for (dst_x, chunk) in normalizer.chunks().iter().enumerate() { let mut x = chunk.start as usize; let mut coeffs = chunk.values(); let mut rg_sum = _mm_set1_epi64x(half_error); let mut ba_sum = _mm_set1_epi64x(half_error); let coeffs_by_2 = coeffs.chunks_exact(2); coeffs = coeffs_by_2.remainder(); for k in coeffs_by_2 { let coeff0_i64x2 = _mm_set1_epi64x(k[0] as i64); let coeff1_i64x2 = _mm_set1_epi64x(k[1] as i64); let source = simd_utils::loadu_si128(src_row, x); let rg_i64x2 = _mm_shuffle_epi8(source, rg0_shuffle); rg_sum = _mm_add_epi64(rg_sum, _mm_mul_epi32(rg_i64x2, coeff0_i64x2)); let rg_i64x2 = _mm_shuffle_epi8(source, rg1_shuffle); rg_sum = _mm_add_epi64(rg_sum, _mm_mul_epi32(rg_i64x2, coeff1_i64x2)); let ba_i64x2 = _mm_shuffle_epi8(source, ba0_shuffle); ba_sum = _mm_add_epi64(ba_sum, _mm_mul_epi32(ba_i64x2, coeff0_i64x2)); let ba_i64x2 = _mm_shuffle_epi8(source, ba1_shuffle); ba_sum = _mm_add_epi64(ba_sum, _mm_mul_epi32(ba_i64x2, coeff1_i64x2)); x += 2; } if let Some(&k) = coeffs.first() { let coeff0_i64x2 = _mm_set1_epi64x(k as i64); let source = simd_utils::loadl_epi64(src_row, x); let rg_i64x2 = _mm_shuffle_epi8(source, rg0_shuffle); rg_sum = _mm_add_epi64(rg_sum, _mm_mul_epi32(rg_i64x2, coeff0_i64x2)); let ba_i64x2 = _mm_shuffle_epi8(source, ba0_shuffle); ba_sum = _mm_add_epi64(ba_sum, _mm_mul_epi32(ba_i64x2, coeff0_i64x2)); } _mm_storeu_si128(rg_buf.as_mut_ptr() as *mut __m128i, rg_sum); _mm_storeu_si128(ba_buf.as_mut_ptr() as *mut __m128i, ba_sum); let dst_pixel = dst_row.get_unchecked_mut(dst_x); dst_pixel.0 = [ normalizer.clip(rg_buf[0]), normalizer.clip(rg_buf[1]), normalizer.clip(ba_buf[0]), normalizer.clip(ba_buf[1]), ]; } } fast_image_resize-5.3.0/src/convolution/u16x4/wasm32.rs000064400000000000000000000213371046102023000210360ustar 00000000000000use std::arch::wasm32::*; use crate::convolution::optimisations::Normalizer32; use crate::pixels::U16x4; use crate::wasm32_utils; use crate::{ImageView, ImageViewMut}; #[inline] pub(crate) fn horiz_convolution( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, offset: u32, normalizer: &Normalizer32, ) { let dst_height = dst_view.height(); let src_iter = src_view.iter_4_rows(offset, dst_height + offset); let dst_iter = dst_view.iter_4_rows_mut(); for (src_rows, dst_rows) in src_iter.zip(dst_iter) { unsafe { horiz_convolution_four_rows(src_rows, dst_rows, normalizer); } } let yy = dst_height - dst_height % 4; let src_rows = src_view.iter_rows(yy + offset); let dst_rows = dst_view.iter_rows_mut(yy); for (src_row, dst_row) in src_rows.zip(dst_rows) { unsafe { horiz_convolution_one_row(src_row, dst_row, normalizer); } } } /// For safety, it is necessary to ensure the following conditions: /// - length of all rows in src_rows must be equal /// - length of all rows in dst_rows must be equal /// - coefficients_chunks.len() == dst_rows.0.len() /// - max(chunk.start + chunk.values.len() for chunk in coefficients_chunks) <= src_row.0.len() /// - precision <= MAX_COEFS_PRECISION #[target_feature(enable = "simd128")] unsafe fn horiz_convolution_four_rows( src_rows: [&[U16x4]; 4], dst_rows: [&mut [U16x4]; 4], normalizer: &Normalizer32, ) { let precision = normalizer.precision(); let half_error = 1i64 << (precision - 1); let mut rg_buf = [0i64; 2]; let mut ba_buf = [0i64; 2]; /* |R0 G0 B0 A0 | |R1 G1 B1 A1 | |0001 0203 0405 0607| |0809 1011 1213 1415| Shuffle to extract R0 and G0 as i64: 0, 1, -1, -1, -1, -1, -1, -1, 
2, 3, -1, -1, -1, -1, -1, -1 Shuffle to extract R1 and G1 as i64: 8, 9, -1, -1, -1, -1, -1, -1, 10, 11, -1, -1, -1, -1, -1, -1 Shuffle to extract B0 and A0 as i64: 4, 5, -1, -1, -1, -1, -1, -1, 6, 7, -1, -1, -1, -1, -1, -1 Shuffle to extract B1 and A1 as i64: 12, 13, -1, -1, -1, -1, -1, -1, 14, 15, -1, -1, -1, -1, -1, -1 */ const RG0_SHUFFLE: v128 = i8x16(0, 1, -1, -1, -1, -1, -1, -1, 2, 3, -1, -1, -1, -1, -1, -1); const RG1_SHUFFLE: v128 = i8x16(8, 9, -1, -1, -1, -1, -1, -1, 10, 11, -1, -1, -1, -1, -1, -1); const BA0_SHUFFLE: v128 = i8x16(4, 5, -1, -1, -1, -1, -1, -1, 6, 7, -1, -1, -1, -1, -1, -1); const BA1_SHUFFLE: v128 = i8x16( 12, 13, -1, -1, -1, -1, -1, -1, 14, 15, -1, -1, -1, -1, -1, -1, ); for (dst_x, coeffs_chunk) in normalizer.chunks().iter().enumerate() { let mut x: usize = coeffs_chunk.start as usize; let mut rg_sum = [i64x2_splat(half_error); 4]; let mut ba_sum = [i64x2_splat(half_error); 4]; let mut coeffs = coeffs_chunk.values(); let coeffs_by_2 = coeffs.chunks_exact(2); coeffs = coeffs_by_2.remainder(); for k in coeffs_by_2 { let coeff0_i64x2 = i64x2_splat(k[0] as i64); let coeff1_i64x2 = i64x2_splat(k[1] as i64); for i in 0..4 { let source = wasm32_utils::load_v128(src_rows[i], x); let mut sum = rg_sum[i]; let rg_i64x2 = i8x16_swizzle(source, RG0_SHUFFLE); sum = i64x2_add(sum, wasm32_utils::i64x2_mul_lo(rg_i64x2, coeff0_i64x2)); let rg_i64x2 = i8x16_swizzle(source, RG1_SHUFFLE); sum = i64x2_add(sum, wasm32_utils::i64x2_mul_lo(rg_i64x2, coeff1_i64x2)); rg_sum[i] = sum; let mut sum = ba_sum[i]; let ba_i64x2 = i8x16_swizzle(source, BA0_SHUFFLE); sum = i64x2_add(sum, wasm32_utils::i64x2_mul_lo(ba_i64x2, coeff0_i64x2)); let ba_i64x2 = i8x16_swizzle(source, BA1_SHUFFLE); sum = i64x2_add(sum, wasm32_utils::i64x2_mul_lo(ba_i64x2, coeff1_i64x2)); ba_sum[i] = sum; } x += 2; } if let Some(&k) = coeffs.first() { let coeff0_i64x2 = i64x2_splat(k as i64); for i in 0..4 { let source = wasm32_utils::loadl_i64(src_rows[i], x); let rg_i64x2 = i8x16_swizzle(source, RG0_SHUFFLE); rg_sum[i] = i64x2_add( rg_sum[i], wasm32_utils::i64x2_mul_lo(rg_i64x2, coeff0_i64x2), ); let ba_i64x2 = i8x16_swizzle(source, BA0_SHUFFLE); ba_sum[i] = i64x2_add( ba_sum[i], wasm32_utils::i64x2_mul_lo(ba_i64x2, coeff0_i64x2), ); } } for i in 0..4 { v128_store(rg_buf.as_mut_ptr() as *mut v128, rg_sum[i]); v128_store(ba_buf.as_mut_ptr() as *mut v128, ba_sum[i]); let dst_pixel = dst_rows[i].get_unchecked_mut(dst_x); dst_pixel.0 = [ normalizer.clip(rg_buf[0]), normalizer.clip(rg_buf[1]), normalizer.clip(ba_buf[0]), normalizer.clip(ba_buf[1]), ]; } } } /// For safety, it is necessary to ensure the following conditions: /// - bounds.len() == dst_row.len() /// - coeffs.len() == dst_rows.0.len() * window_size /// - max(bound.start + bound.size for bound in bounds) <= src_row.len() /// - precision <= MAX_COEFS_PRECISION #[inline] #[target_feature(enable = "simd128")] unsafe fn horiz_convolution_one_row( src_row: &[U16x4], dst_row: &mut [U16x4], normalizer: &Normalizer32, ) { let precision = normalizer.precision(); let half_error = 1i64 << (precision - 1); let mut rg_buf = [0i64; 2]; let mut ba_buf = [0i64; 2]; /* |R0 G0 B0 A0 | |R1 G1 B1 A1 | |0001 0203 0405 0607| |0809 1011 1213 1415| Shuffle to extract R0 and G0 as i64: 0, 1, -1, -1, -1, -1, -1, -1, 2, 3, -1, -1, -1, -1, -1, -1 Shuffle to extract R1 and G1 as i64: 8, 9, -1, -1, -1, -1, -1, -1, 10, 11, -1, -1, -1, -1, -1, -1 Shuffle to extract B0 and A0 as i64: 4, 5, -1, -1, -1, -1, -1, -1, 6, 7, -1, -1, -1, -1, -1, -1 Shuffle to extract B1 and A1 as i64: 12, 13, -1, -1, -1, 
-1, -1, -1, 14, 15, -1, -1, -1, -1, -1, -1 */ const RG0_SHUFFLE: v128 = i8x16(0, 1, -1, -1, -1, -1, -1, -1, 2, 3, -1, -1, -1, -1, -1, -1); const RG1_SHUFFLE: v128 = i8x16(8, 9, -1, -1, -1, -1, -1, -1, 10, 11, -1, -1, -1, -1, -1, -1); const BA0_SHUFFLE: v128 = i8x16(4, 5, -1, -1, -1, -1, -1, -1, 6, 7, -1, -1, -1, -1, -1, -1); const BA1_SHUFFLE: v128 = i8x16( 12, 13, -1, -1, -1, -1, -1, -1, 14, 15, -1, -1, -1, -1, -1, -1, ); for (dst_x, coeffs_chunk) in normalizer.chunks().iter().enumerate() { let mut x: usize = coeffs_chunk.start as usize; let mut coeffs = coeffs_chunk.values(); let mut rg_sum = i64x2_splat(half_error); let mut ba_sum = i64x2_splat(half_error); let coeffs_by_2 = coeffs.chunks_exact(2); coeffs = coeffs_by_2.remainder(); for k in coeffs_by_2 { let coeff0_i64x2 = i64x2_splat(k[0] as i64); let coeff1_i64x2 = i64x2_splat(k[1] as i64); let source = wasm32_utils::load_v128(src_row, x); let rg_i64x2 = i8x16_swizzle(source, RG0_SHUFFLE); rg_sum = i64x2_add(rg_sum, wasm32_utils::i64x2_mul_lo(rg_i64x2, coeff0_i64x2)); let rg_i64x2 = i8x16_swizzle(source, RG1_SHUFFLE); rg_sum = i64x2_add(rg_sum, wasm32_utils::i64x2_mul_lo(rg_i64x2, coeff1_i64x2)); let ba_i64x2 = i8x16_swizzle(source, BA0_SHUFFLE); ba_sum = i64x2_add(ba_sum, wasm32_utils::i64x2_mul_lo(ba_i64x2, coeff0_i64x2)); let ba_i64x2 = i8x16_swizzle(source, BA1_SHUFFLE); ba_sum = i64x2_add(ba_sum, wasm32_utils::i64x2_mul_lo(ba_i64x2, coeff1_i64x2)); x += 2; } if let Some(&k) = coeffs.first() { let coeff0_i64x2 = i64x2_splat(k as i64); let source = wasm32_utils::loadl_i64(src_row, x); let rg_i64x2 = i8x16_swizzle(source, RG0_SHUFFLE); rg_sum = i64x2_add(rg_sum, wasm32_utils::i64x2_mul_lo(rg_i64x2, coeff0_i64x2)); let ba_i64x2 = i8x16_swizzle(source, BA0_SHUFFLE); ba_sum = i64x2_add(ba_sum, wasm32_utils::i64x2_mul_lo(ba_i64x2, coeff0_i64x2)); } v128_store(rg_buf.as_mut_ptr() as *mut v128, rg_sum); v128_store(ba_buf.as_mut_ptr() as *mut v128, ba_sum); let dst_pixel = dst_row.get_unchecked_mut(dst_x); dst_pixel.0 = [ normalizer.clip(rg_buf[0]), normalizer.clip(rg_buf[1]), normalizer.clip(ba_buf[0]), normalizer.clip(ba_buf[1]), ]; } } fast_image_resize-5.3.0/src/convolution/u8x1/avx2.rs000064400000000000000000000145301046102023000205150ustar 00000000000000use std::arch::x86_64::*; use crate::convolution::optimisations::Normalizer16; use crate::pixels::U8; use crate::{simd_utils, ImageView, ImageViewMut}; #[inline] pub(crate) fn horiz_convolution( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, offset: u32, normalizer: &Normalizer16, ) { let dst_height = dst_view.height(); let src_iter = src_view.iter_4_rows(offset, dst_height + offset); let dst_iter = dst_view.iter_4_rows_mut(); for (src_rows, dst_rows) in src_iter.zip(dst_iter) { unsafe { horiz_convolution_four_rows(src_rows, dst_rows, normalizer); } } let yy = dst_height - dst_height % 4; let src_rows = src_view.iter_rows(yy + offset); let dst_rows = dst_view.iter_rows_mut(yy); for (src_row, dst_row) in src_rows.zip(dst_rows) { unsafe { horiz_convolution_one_row(src_row, dst_row, normalizer); } } } /// For safety, it is necessary to ensure the following conditions: /// - length of all rows in src_rows must be equal /// - length of all rows in dst_rows must be equal /// - coefficients_chunks.len() == dst_rows.0.len() /// - max(chunk.start + chunk.values.len() for chunk in coefficients_chunks) <= src_row.0.len() /// - precision <= MAX_COEFS_PRECISION #[inline] #[target_feature(enable = "avx2")] unsafe fn horiz_convolution_four_rows( src_rows: [&[U8]; 4], dst_rows: [&mut 
[U8]; 4], normalizer: &Normalizer16, ) { let zero = _mm_setzero_si128(); // 8 components will be added, use only 1/8 of the error let initial = _mm256_set1_epi32(1 << (normalizer.precision() - 4)); for (dst_x, chunk) in normalizer.chunks().iter().enumerate() { let mut x = chunk.start as usize; let mut result_i32x8x4 = [initial, initial, initial, initial]; let coeffs_by_16 = chunk.values().chunks_exact(16); let reminder16 = coeffs_by_16.remainder(); for k in coeffs_by_16 { let coeffs_i16x16 = _mm256_loadu_si256(k.as_ptr() as *const __m256i); for i in 0..4 { let pixels_u8x16 = simd_utils::loadu_si128(src_rows[i], x); let pixels_i16x16 = _mm256_cvtepu8_epi16(pixels_u8x16); result_i32x8x4[i] = _mm256_add_epi32( result_i32x8x4[i], _mm256_madd_epi16(pixels_i16x16, coeffs_i16x16), ); } x += 16; } let mut coeffs_by_8 = reminder16.chunks_exact(8); let reminder8 = coeffs_by_8.remainder(); if let Some(k) = coeffs_by_8.next() { let coeffs_i16x8 = _mm_loadu_si128(k.as_ptr() as *const __m128i); for i in 0..4 { let pixels_u8x8 = simd_utils::loadl_epi64(src_rows[i], x); let pixels_i16x8 = _mm_cvtepu8_epi16(pixels_u8x8); result_i32x8x4[i] = _mm256_add_epi32( result_i32x8x4[i], _mm256_set_m128i(zero, _mm_madd_epi16(pixels_i16x8, coeffs_i16x8)), ); } x += 8; } let mut result_i32x4 = result_i32x8x4.map(|v| hsum_i32x8_avx2(v)); for &coeff in reminder8 { let coeff_i32 = coeff as i32; for i in 0..4 { result_i32x4[i] += src_rows[i].get_unchecked(x).0.to_owned() as i32 * coeff_i32; } x += 1; } let result_u8x4 = result_i32x4.map(|v| normalizer.clip(v)); for i in 0..4 { dst_rows[i].get_unchecked_mut(dst_x).0 = result_u8x4[i]; } } } /// For safety, it is necessary to ensure the following conditions: /// - bounds.len() == dst_row.len() /// - coeffs.len() == dst_rows.0.len() * window_size /// - max(bound.start + bound.size for bound in bounds) <= src_row.len() /// - precision <= MAX_COEFS_PRECISION #[inline] #[target_feature(enable = "avx2")] unsafe fn horiz_convolution_one_row(src_row: &[U8], dst_row: &mut [U8], normalizer: &Normalizer16) { let zero = _mm_setzero_si128(); // 8 components will be added, use only 1/8 of the error let initial = _mm256_set1_epi32(1 << (normalizer.precision() - 4)); for (dst_x, chunk) in normalizer.chunks().iter().enumerate() { let mut x = chunk.start as usize; let mut result_i32x8 = initial; let coeffs_by_16 = chunk.values().chunks_exact(16); let reminder16 = coeffs_by_16.remainder(); for k in coeffs_by_16 { let coeffs_i16x16 = _mm256_loadu_si256(k.as_ptr() as *const __m256i); let pixels_u8x16 = simd_utils::loadu_si128(src_row, x); let pixels_i16x16 = _mm256_cvtepu8_epi16(pixels_u8x16); result_i32x8 = _mm256_add_epi32( result_i32x8, _mm256_madd_epi16(pixels_i16x16, coeffs_i16x16), ); x += 16; } let mut coeffs_by_8 = reminder16.chunks_exact(8); let reminder8 = coeffs_by_8.remainder(); if let Some(k) = coeffs_by_8.next() { let coeffs_i16x8 = _mm_loadu_si128(k.as_ptr() as *const __m128i); let pixels_u8x8 = simd_utils::loadl_epi64(src_row, x); let pixels_i16x8 = _mm_cvtepu8_epi16(pixels_u8x8); result_i32x8 = _mm256_add_epi32( result_i32x8, _mm256_set_m128i(zero, _mm_madd_epi16(pixels_i16x8, coeffs_i16x8)), ); x += 8; } let mut result_i32 = hsum_i32x8_avx2(result_i32x8); for &coeff in reminder8 { let coeff_i32 = coeff as i32; result_i32 += src_row.get_unchecked(x).0 as i32 * coeff_i32; x += 1; } dst_row.get_unchecked_mut(dst_x).0 = normalizer.clip(result_i32); } } // only needs AVX2 #[inline] #[target_feature(enable = "avx2")] unsafe fn hsum_i32x8_avx2(v: __m256i) -> i32 { let sum128 = 
_mm_add_epi32(_mm256_castsi256_si128(v), _mm256_extracti128_si256::<1>(v)); hsum_epi32_avx(sum128) } #[inline] #[target_feature(enable = "avx2")] unsafe fn hsum_epi32_avx(x: __m128i) -> i32 { // 3-operand non-destructive AVX lets us save a byte without needing a movdqa let hi64 = _mm_unpackhi_epi64(x, x); let sum64 = _mm_add_epi32(hi64, x); const I: i32 = (2 << 6) | (3 << 4) | 1; let hi32 = _mm_shuffle_epi32::<I>(sum64); // Swap the low two elements let sum32 = _mm_add_epi32(sum64, hi32); _mm_cvtsi128_si32(sum32) // movd } fast_image_resize-5.3.0/src/convolution/u8x1/mod.rs000064400000000000000000000050271046102023000204150ustar 00000000000000use super::{Coefficients, Convolution}; use crate::convolution::optimisations::Normalizer16; use crate::convolution::vertical_u8::vert_convolution_u8; use crate::pixels::U8; use crate::{CpuExtensions, ImageView, ImageViewMut}; #[cfg(target_arch = "x86_64")] mod avx2; mod native; #[cfg(target_arch = "aarch64")] mod neon; #[cfg(target_arch = "x86_64")] mod sse4; #[cfg(target_arch = "wasm32")] mod wasm32; type P = U8; impl Convolution for P { fn horiz_convolution( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, offset: u32, coeffs: Coefficients, cpu_extensions: CpuExtensions, ) { debug_assert!(src_view.height() - offset >= dst_view.height()); let normalizer = Normalizer16::new(coeffs); let normalizer_ref = &normalizer; try_process_in_threads_h! { horiz_convolution( src_view, dst_view, offset, normalizer_ref, cpu_extensions, ); } } fn vert_convolution( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, offset: u32, coeffs: Coefficients, cpu_extensions: CpuExtensions, ) { debug_assert!(src_view.width() - offset >= dst_view.width()); let normalizer = Normalizer16::new(coeffs); let normalizer_ref = &normalizer; try_process_in_threads_v! { vert_convolution_u8( src_view, dst_view, offset, normalizer_ref, cpu_extensions, ); } } } fn horiz_convolution( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, offset: u32, normalizer: &Normalizer16, cpu_extensions: CpuExtensions, ) { match cpu_extensions { #[cfg(target_arch = "x86_64")] CpuExtensions::Avx2 => avx2::horiz_convolution(src_view, dst_view, offset, normalizer), #[cfg(target_arch = "x86_64")] CpuExtensions::Sse4_1 => sse4::horiz_convolution(src_view, dst_view, offset, normalizer), #[cfg(target_arch = "aarch64")] CpuExtensions::Neon => neon::horiz_convolution(src_view, dst_view, offset, normalizer), #[cfg(target_arch = "wasm32")] CpuExtensions::Simd128 => wasm32::horiz_convolution(src_view, dst_view, offset, normalizer), _ => native::horiz_convolution(src_view, dst_view, offset, normalizer), } } fast_image_resize-5.3.0/src/convolution/u8x1/native.rs000064400000000000000000000020621046102023000211200ustar 00000000000000use crate::convolution::optimisations::Normalizer16; use crate::pixels::U8; use crate::{ImageView, ImageViewMut}; #[inline(always)] pub(crate) fn horiz_convolution( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, offset: u32, normalizer: &Normalizer16, ) { let precision = normalizer.precision(); let initial = 1 << (precision - 1); let coefficients = normalizer.chunks(); let src_rows = src_view.iter_rows(offset); let dst_rows = dst_view.iter_rows_mut(0); for (dst_row, src_row) in dst_rows.zip(src_rows) { for (coeffs_chunk, dst_pixel) in coefficients.iter().zip(dst_row.iter_mut()) { let first_x_src = coeffs_chunk.start as usize; let mut ss = initial; let src_pixels = unsafe { src_row.get_unchecked(first_x_src..)
}; for (&k, &src_pixel) in coeffs_chunk.values().iter().zip(src_pixels) { ss += src_pixel.0 as i32 * (k as i32); } dst_pixel.0 = unsafe { normalizer.clip(ss) }; } } } fast_image_resize-5.3.0/src/convolution/u8x1/neon.rs000064400000000000000000000217731046102023000206030ustar 00000000000000use std::arch::aarch64::*; use crate::convolution::optimisations::Normalizer16; use crate::neon_utils; use crate::pixels::U8; use crate::{ImageView, ImageViewMut}; #[inline] pub(crate) fn horiz_convolution( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, offset: u32, normalizer: &Normalizer16, ) { let dst_height = dst_view.height(); let src_iter = src_view.iter_4_rows(offset, dst_height + offset); let dst_iter = dst_view.iter_4_rows_mut(); for (src_rows, dst_rows) in src_iter.zip(dst_iter) { unsafe { horiz_convolution_four_rows(src_rows, dst_rows, normalizer); } } let yy = dst_height - dst_height % 4; let src_rows = src_view.iter_rows(yy + offset); let dst_rows = dst_view.iter_rows_mut(yy); for (src_row, dst_row) in src_rows.zip(dst_rows) { unsafe { horiz_convolution_one_row(src_row, dst_row, normalizer); } } } /// For safety, it is necessary to ensure the following conditions: /// - length of all rows in src_rows must be equal /// - length of all rows in dst_rows must be equal /// - coefficients_chunks.len() == dst_rows.0.len() /// - max(chunk.start + chunk.values.len() for chunk in coefficients_chunks) <= src_row.0.len() /// - precision <= MAX_COEFS_PRECISION #[target_feature(enable = "neon")] unsafe fn horiz_convolution_four_rows( src_rows: [&[U8]; 4], dst_rows: [&mut [U8]; 4], normalizer: &Normalizer16, ) { let precision = normalizer.precision(); let initial = vdupq_n_s32(1 << (precision - 3)); let zero_u8x16 = vdupq_n_u8(0); let zero_u8x8 = vdup_n_u8(0); for (dst_x, coeffs_chunk) in normalizer.chunks().iter().enumerate() { let mut x = coeffs_chunk.start as usize; let mut sss_a = [initial; 4]; let mut coeffs = coeffs_chunk.values(); let coeffs_by_16 = coeffs.chunks_exact(16); coeffs = coeffs_by_16.remainder(); for k in coeffs_by_16 { let coeffs_i16x8x2 = neon_utils::load_i16x8x2(k, 0); let coeff0 = vget_low_s16(coeffs_i16x8x2.0); let coeff1 = vget_high_s16(coeffs_i16x8x2.0); let coeff2 = vget_low_s16(coeffs_i16x8x2.1); let coeff3 = vget_high_s16(coeffs_i16x8x2.1); for i in 0..4 { let source = neon_utils::load_u8x16(src_rows[i], x); let mut sss = sss_a[i]; let source_i16 = vreinterpretq_s16_u8(vzip1q_u8(source, zero_u8x16)); sss = vmlal_s16(sss, vget_low_s16(source_i16), coeff0); sss = vmlal_s16(sss, vget_high_s16(source_i16), coeff1); let source_i16 = vreinterpretq_s16_u8(vzip2q_u8(source, zero_u8x16)); sss = vmlal_s16(sss, vget_low_s16(source_i16), coeff2); sss = vmlal_s16(sss, vget_high_s16(source_i16), coeff3); sss_a[i] = sss; } x += 16; } let mut coeffs_by_8 = coeffs.chunks_exact(8); coeffs = coeffs_by_8.remainder(); if let Some(k) = coeffs_by_8.next() { let coeffs_i16x8 = neon_utils::load_i16x8(k, 0); let coeff0 = vget_low_s16(coeffs_i16x8); let coeff1 = vget_high_s16(coeffs_i16x8); for i in 0..4 { let source = neon_utils::load_u8x8(src_rows[i], x); let mut sss = sss_a[i]; let pix = vreinterpret_s16_u8(vzip1_u8(source, zero_u8x8)); sss = vmlal_s16(sss, pix, coeff0); let pix = vreinterpret_s16_u8(vzip2_u8(source, zero_u8x8)); sss = vmlal_s16(sss, pix, coeff1); sss_a[i] = sss; } x += 8; } let mut coeffs_by_4 = coeffs.chunks_exact(4); coeffs = coeffs_by_4.remainder(); if let Some(k) = coeffs_by_4.next() { let coeffs_i16x4 = neon_utils::load_i16x4(k, 0); for i in 0..4 { let source = 
neon_utils::load_u8x4(src_rows[i], x); sss_a[i] = conv_4_pixels(sss_a[i], coeffs_i16x4, source, zero_u8x8); } x += 4; } let mut coeffs_by_2 = coeffs.chunks_exact(2); coeffs = coeffs_by_2.remainder(); if let Some(k) = coeffs_by_2.next() { let coeffs_i16x4 = neon_utils::load_i16x2(k, 0); for i in 0..4 { let source = neon_utils::load_u8x2(src_rows[i], x); sss_a[i] = conv_4_pixels(sss_a[i], coeffs_i16x4, source, zero_u8x8); } x += 2; } if !coeffs.is_empty() { let coeffs_i16x4 = neon_utils::load_i16x1(coeffs, 0); for i in 0..4 { let source = neon_utils::load_u8x1(src_rows[i], x); sss_a[i] = conv_4_pixels(sss_a[i], coeffs_i16x4, source, zero_u8x8); } } for i in 0..4 { let sss = sss_a[i]; let res_i32x2 = vadd_s32(vget_low_s32(sss), vget_high_s32(sss)); let res = vget_lane_s32::<0>(res_i32x2) + vget_lane_s32::<1>(res_i32x2); dst_rows[i].get_unchecked_mut(dst_x).0 = normalizer.clip(res); } } } /// For safety, it is necessary to ensure the following conditions: /// - bounds.len() == dst_row.len() /// - coefficients_chunks.len() == dst_row.len() /// - max(chunk.start + chunk.values.len() for chunk in coefficients_chunks) <= src_row.len() /// - precision <= MAX_COEFS_PRECISION #[target_feature(enable = "neon")] unsafe fn horiz_convolution_one_row(src_row: &[U8], dst_row: &mut [U8], normalizer: &Normalizer16) { let precision = normalizer.precision(); let initial = vdupq_n_s32(1 << (precision - 3)); let zero_u8x16 = vdupq_n_u8(0); let zero_u8x8 = vdup_n_u8(0); for (dst_x, coeffs_chunk) in normalizer.chunks().iter().enumerate() { let mut x = coeffs_chunk.start as usize; let mut sss = initial; let mut coeffs = coeffs_chunk.values(); let coeffs_by_16 = coeffs.chunks_exact(16); coeffs = coeffs_by_16.remainder(); for k in coeffs_by_16 { let coeffs_i16x8x2 = neon_utils::load_i16x8x2(k, 0); let source = neon_utils::load_u8x16(src_row, x); let source_i16 = vreinterpretq_s16_u8(vzip1q_u8(source, zero_u8x16)); sss = vmlal_s16( sss, vget_low_s16(source_i16), vget_low_s16(coeffs_i16x8x2.0), ); sss = vmlal_s16( sss, vget_high_s16(source_i16), vget_high_s16(coeffs_i16x8x2.0), ); let source_i16 = vreinterpretq_s16_u8(vzip2q_u8(source, zero_u8x16)); sss = vmlal_s16( sss, vget_low_s16(source_i16), vget_low_s16(coeffs_i16x8x2.1), ); sss = vmlal_s16( sss, vget_high_s16(source_i16), vget_high_s16(coeffs_i16x8x2.1), ); x += 16; } let mut coeffs_by_8 = coeffs.chunks_exact(8); coeffs = coeffs_by_8.remainder(); if let Some(k) = coeffs_by_8.next() { let coeffs_i16x8 = neon_utils::load_i16x8(k, 0); let source = neon_utils::load_u8x8(src_row, x); let source_i16 = vreinterpret_s16_u8(vzip1_u8(source, zero_u8x8)); sss = vmlal_s16(sss, source_i16, vget_low_s16(coeffs_i16x8)); let source_i16 = vreinterpret_s16_u8(vzip2_u8(source, zero_u8x8)); sss = vmlal_s16(sss, source_i16, vget_high_s16(coeffs_i16x8)); x += 8; } let mut coeffs_by_4 = coeffs.chunks_exact(4); coeffs = coeffs_by_4.remainder(); if let Some(k) = coeffs_by_4.next() { let coeffs_i16x4 = neon_utils::load_i16x4(k, 0); let source = neon_utils::load_u8x4(src_row, x); sss = conv_4_pixels(sss, coeffs_i16x4, source, zero_u8x8); x += 4; } let mut coeffs_by_2 = coeffs.chunks_exact(2); coeffs = coeffs_by_2.remainder(); if let Some(k) = coeffs_by_2.next() { let coeffs_i16x4 = neon_utils::load_i16x2(k, 0); let source = neon_utils::load_u8x2(src_row, x); sss = conv_4_pixels(sss, coeffs_i16x4, source, zero_u8x8); x += 2; } if !coeffs.is_empty() { let coeffs_i16x4 = neon_utils::load_i16x1(coeffs, 0); let source = neon_utils::load_u8x1(src_row, x); sss = conv_4_pixels(sss, coeffs_i16x4, 
source, zero_u8x8); } let res_i32x2 = vadd_s32(vget_low_s32(sss), vget_high_s32(sss)); let res = vget_lane_s32::<0>(res_i32x2) + vget_lane_s32::<1>(res_i32x2); dst_row.get_unchecked_mut(dst_x).0 = normalizer.clip(res); } } #[inline] #[target_feature(enable = "neon")] unsafe fn conv_4_pixels( sss: int32x4_t, coeffs_i16x4: int16x4_t, source: uint8x8_t, zero_u8x8: uint8x8_t, ) -> int32x4_t { let source_i16 = vreinterpret_s16_u8(vzip1_u8(source, zero_u8x8)); vmlal_s16(sss, source_i16, coeffs_i16x4) } fast_image_resize-5.3.0/src/convolution/u8x1/sse4.rs000064400000000000000000000126411046102023000205140ustar 00000000000000use std::arch::x86_64::*; use crate::convolution::optimisations::Normalizer16; use crate::pixels::U8; use crate::{simd_utils, ImageView, ImageViewMut}; #[inline] pub(crate) fn horiz_convolution( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, offset: u32, normalizer: &Normalizer16, ) { let dst_height = dst_view.height(); let src_iter = src_view.iter_4_rows(offset, dst_height + offset); let dst_iter = dst_view.iter_4_rows_mut(); for (src_rows, dst_rows) in src_iter.zip(dst_iter) { unsafe { horiz_convolution_four_rows(src_rows, dst_rows, normalizer); } } let yy = dst_height - dst_height % 4; let src_rows = src_view.iter_rows(yy + offset); let dst_rows = dst_view.iter_rows_mut(yy); for (src_row, dst_row) in src_rows.zip(dst_rows) { unsafe { horiz_convolution_one_row(src_row, dst_row, normalizer); } } } /// For safety, it is necessary to ensure the following conditions: /// - length of all rows in src_rows must be equal /// - length of all rows in dst_rows must be equal /// - coefficients_chunks.len() == dst_rows.0.len() /// - max(chunk.start + chunk.values.len() for chunk in coefficients_chunks) <= src_row.0.len() /// - precision <= MAX_COEFS_PRECISION #[inline] #[target_feature(enable = "sse4.1")] unsafe fn horiz_convolution_four_rows( src_rows: [&[U8]; 4], dst_rows: [&mut [U8]; 4], normalizer: &Normalizer16, ) { let zero = _mm_setzero_si128(); let initial = 1 << (normalizer.precision() - 1); let mut buf = [0, 0, 0, 0, initial]; for (dst_x, chunk) in normalizer.chunks().iter().enumerate() { let mut x = chunk.start as usize; let mut result_i32x4 = [zero, zero, zero, zero]; let coeffs_by_8 = chunk.values().chunks_exact(8); let reminder8 = coeffs_by_8.remainder(); for k in coeffs_by_8 { let coeffs_i16x8 = _mm_loadu_si128(k.as_ptr() as *const __m128i); for i in 0..4 { let pixels_u8x8 = simd_utils::loadl_epi64(src_rows[i], x); let pixels_i16x8 = _mm_cvtepu8_epi16(pixels_u8x8); result_i32x4[i] = _mm_add_epi32(result_i32x4[i], _mm_madd_epi16(pixels_i16x8, coeffs_i16x8)); } x += 8; } let mut coeffs_by_4 = reminder8.chunks_exact(4); let reminder4 = coeffs_by_4.remainder(); if let Some(k) = coeffs_by_4.next() { let coeffs_i16x4 = simd_utils::loadl_epi64(k, 0); for i in 0..4 { let pixels_u8x4 = simd_utils::loadl_epi32(src_rows[i], x); let pixels_i16x4 = _mm_cvtepu8_epi16(pixels_u8x4); result_i32x4[i] = _mm_add_epi32(result_i32x4[i], _mm_madd_epi16(pixels_i16x4, coeffs_i16x4)); } x += 4; } let mut result_i32x4 = result_i32x4.map(|v| { _mm_storeu_si128(buf.as_mut_ptr() as *mut __m128i, v); buf.iter().sum() }); for &coeff in reminder4 { let coeff_i32 = coeff as i32; for i in 0..4 { result_i32x4[i] += src_rows[i].get_unchecked(x).0.to_owned() as i32 * coeff_i32; } x += 1; } let result_u8x4 = result_i32x4.map(|v| normalizer.clip(v)); for i in 0..4 { dst_rows[i].get_unchecked_mut(dst_x).0 = result_u8x4[i]; } } } /// For safety, it is necessary to ensure the following conditions: /// 
- bounds.len() == dst_row.len() /// - coeffs.len() == dst_rows.0.len() * window_size /// - max(bound.start + bound.size for bound in bounds) <= src_row.len() /// - precision <= MAX_COEFS_PRECISION #[inline] #[target_feature(enable = "sse4.1")] unsafe fn horiz_convolution_one_row(src_row: &[U8], dst_row: &mut [U8], normalizer: &Normalizer16) { let zero = _mm_setzero_si128(); let initial = 1 << (normalizer.precision() - 1); let mut buf = [0, 0, 0, 0, initial]; for (dst_x, chunk) in normalizer.chunks().iter().enumerate() { let mut x = chunk.start as usize; let mut result_i32x4 = zero; let coeffs_by_8 = chunk.values().chunks_exact(8); let reminder8 = coeffs_by_8.remainder(); for k in coeffs_by_8 { let coeffs_i16x8 = _mm_loadu_si128(k.as_ptr() as *const __m128i); let pixels_u8x8 = simd_utils::loadl_epi64(src_row, x); let pixels_i16x8 = _mm_cvtepu8_epi16(pixels_u8x8); result_i32x4 = _mm_add_epi32(result_i32x4, _mm_madd_epi16(pixels_i16x8, coeffs_i16x8)); x += 8; } let mut coeffs_by_4 = reminder8.chunks_exact(4); let reminder4 = coeffs_by_4.remainder(); if let Some(k) = coeffs_by_4.next() { let coeffs_i16x4 = simd_utils::loadl_epi64(k, 0); let pixels_u8x4 = simd_utils::loadl_epi32(src_row, x); let pixels_i16x4 = _mm_cvtepu8_epi16(pixels_u8x4); result_i32x4 = _mm_add_epi32(result_i32x4, _mm_madd_epi16(pixels_i16x4, coeffs_i16x4)); x += 4; } _mm_storeu_si128(buf.as_mut_ptr() as *mut __m128i, result_i32x4); let mut result_i32 = buf.iter().sum(); for &coeff in reminder4 { let coeff_i32 = coeff as i32; result_i32 += src_row.get_unchecked(x).0 as i32 * coeff_i32; x += 1; } dst_row.get_unchecked_mut(dst_x).0 = normalizer.clip(result_i32); } } fast_image_resize-5.3.0/src/convolution/u8x1/wasm32.rs000064400000000000000000000131361046102023000207520ustar 00000000000000use std::arch::wasm32::*; use crate::convolution::optimisations::Normalizer16; use crate::pixels::U8; use crate::wasm32_utils; use crate::{ImageView, ImageViewMut}; #[inline] pub(crate) fn horiz_convolution( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, offset: u32, normalizer: &Normalizer16, ) { let dst_height = dst_view.height(); let src_iter = src_view.iter_4_rows(offset, dst_height + offset); let dst_iter = dst_view.iter_4_rows_mut(); for (src_rows, dst_rows) in src_iter.zip(dst_iter) { unsafe { horiz_convolution_four_rows(src_rows, dst_rows, normalizer); } } let yy = dst_height - dst_height % 4; let src_rows = src_view.iter_rows(yy + offset); let dst_rows = dst_view.iter_rows_mut(yy); for (src_row, dst_row) in src_rows.zip(dst_rows) { unsafe { horiz_convolution_one_row(src_row, dst_row, normalizer); } } } /// For safety, it is necessary to ensure the following conditions: /// - length of all rows in src_rows must be equal /// - length of all rows in dst_rows must be equal /// - coefficients_chunks.len() == dst_rows.0.len() /// - max(chunk.start + chunk.values.len() for chunk in coefficients_chunks) <= src_row.0.len() /// - precision <= MAX_COEFS_PRECISION #[inline] #[target_feature(enable = "simd128")] unsafe fn horiz_convolution_four_rows( src_rows: [&[U8]; 4], dst_rows: [&mut [U8]; 4], normalizer: &Normalizer16, ) { const ZERO: v128 = i64x2(0, 0); let initial = 1 << (normalizer.precision() - 1); let mut buf = [0, 0, 0, 0, initial]; let coefficients_chunks = normalizer.chunks(); for (dst_x, coeffs_chunk) in coefficients_chunks.iter().enumerate() { let coeffs = coeffs_chunk.values(); let mut x = coeffs_chunk.start as usize; let mut result_i32x4 = [ZERO, ZERO, ZERO, ZERO]; let coeffs_by_8 = coeffs.chunks_exact(8); let 
reminder8 = coeffs_by_8.remainder(); for k in coeffs_by_8 { let coeffs_i16x8 = v128_load(k.as_ptr() as *const v128); for i in 0..4 { let pixels_u8x8 = wasm32_utils::loadl_i64(src_rows[i], x); let pixels_i16x8 = u16x8_extend_low_u8x16(pixels_u8x8); result_i32x4[i] = i32x4_add(result_i32x4[i], i32x4_dot_i16x8(pixels_i16x8, coeffs_i16x8)); } x += 8; } let mut coeffs_by_4 = reminder8.chunks_exact(4); let reminder4 = coeffs_by_4.remainder(); if let Some(k) = coeffs_by_4.next() { let coeffs_i16x4 = wasm32_utils::loadl_i64(k, 0); for i in 0..4 { let pixels_u8x4 = wasm32_utils::loadl_i32(src_rows[i], x); let pixels_i16x4 = u16x8_extend_low_u8x16(pixels_u8x4); result_i32x4[i] = i32x4_add(result_i32x4[i], i32x4_dot_i16x8(pixels_i16x4, coeffs_i16x4)); } x += 4; } let mut result_i32x4 = result_i32x4.map(|v| { v128_store(buf.as_mut_ptr() as *mut v128, v); buf.iter().sum() }); for &coeff in reminder4 { let coeff_i32 = coeff as i32; for i in 0..4 { result_i32x4[i] += src_rows[i].get_unchecked(x).0.to_owned() as i32 * coeff_i32; } x += 1; } let result_u8x4 = result_i32x4.map(|v| normalizer.clip(v)); for i in 0..4 { dst_rows[i].get_unchecked_mut(dst_x).0 = result_u8x4[i]; } } } /// For safety, it is necessary to ensure the following conditions: /// - bounds.len() == dst_row.len() /// - coeffs.len() == dst_rows.0.len() * window_size /// - max(bound.start + bound.size for bound in bounds) <= src_row.len() /// - precision <= MAX_COEFS_PRECISION #[inline] #[target_feature(enable = "simd128")] unsafe fn horiz_convolution_one_row(src_row: &[U8], dst_row: &mut [U8], normalizer: &Normalizer16) { const ZERO: v128 = i64x2(0, 0); let initial = 1 << (normalizer.precision() - 1); let mut buf = [0, 0, 0, 0, initial]; let coefficients_chunks = normalizer.chunks(); for (dst_x, coeffs_chunk) in coefficients_chunks.iter().enumerate() { let coeffs = coeffs_chunk.values(); let mut x = coeffs_chunk.start as usize; let mut result_i32x4 = ZERO; let coeffs_by_8 = coeffs.chunks_exact(8); let reminder8 = coeffs_by_8.remainder(); for k in coeffs_by_8 { let coeffs_i16x8 = v128_load(k.as_ptr() as *const v128); let pixels_u8x8 = wasm32_utils::loadl_i64(src_row, x); let pixels_i16x8 = u16x8_extend_low_u8x16(pixels_u8x8); result_i32x4 = i32x4_add(result_i32x4, i32x4_dot_i16x8(pixels_i16x8, coeffs_i16x8)); x += 8; } let mut coeffs_by_4 = reminder8.chunks_exact(4); let reminder4 = coeffs_by_4.remainder(); if let Some(k) = coeffs_by_4.next() { let coeffs_i16x4 = wasm32_utils::loadl_i64(k, 0); let pixels_u8x4 = wasm32_utils::loadl_i32(src_row, x); let pixels_i16x4 = u16x8_extend_low_u8x16(pixels_u8x4); result_i32x4 = i32x4_add(result_i32x4, i32x4_dot_i16x8(pixels_i16x4, coeffs_i16x4)); x += 4; } v128_store(buf.as_mut_ptr() as *mut v128, result_i32x4); let mut result_i32 = buf.iter().sum(); for &coeff in reminder4 { let coeff_i32 = coeff as i32; result_i32 += src_row.get_unchecked(x).0 as i32 * coeff_i32; x += 1; } dst_row.get_unchecked_mut(dst_x).0 = normalizer.clip(result_i32); } } fast_image_resize-5.3.0/src/convolution/u8x2/avx2.rs000064400000000000000000000350151046102023000205170ustar 00000000000000use std::arch::x86_64::*; use crate::convolution::optimisations::Normalizer16; use crate::pixels::U8x2; use crate::{simd_utils, ImageView, ImageViewMut}; #[inline] pub(crate) fn horiz_convolution( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, offset: u32, normalizer: &Normalizer16, ) { let dst_height = dst_view.height(); let src_iter = src_view.iter_4_rows(offset, dst_height + offset); let dst_iter = dst_view.iter_4_rows_mut(); 
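// NOTE (editorial comment, not part of the upstream crate sources): the loop below
// walks the source and destination views four rows at a time, so each chunk of
// convolution coefficients is loaded into SIMD registers once and applied to all
// four rows. The trailing `dst_height % 4` rows are then handled by the single-row
// routine further down. For example, with dst_height == 10 the four-row path
// covers rows 0..8 and the one-row path covers rows 8 and 9.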
for (src_rows, dst_rows) in src_iter.zip(dst_iter) { unsafe { horiz_convolution_four_rows(src_rows, dst_rows, normalizer); } } let yy = dst_height - dst_height % 4; let src_rows = src_view.iter_rows(yy + offset); let dst_rows = dst_view.iter_rows_mut(yy); for (src_row, dst_row) in src_rows.zip(dst_rows) { unsafe { horiz_convolution_one_row(src_row, dst_row, normalizer); } } } /// For safety, it is necessary to ensure the following conditions: /// - length of all rows in src_rows must be equal /// - length of all rows in dst_rows must be equal /// - coefficients_chunks.len() == dst_rows.0.len() /// - max(chunk.start + chunk.values.len() for chunk in coefficients_chunks) <= src_row.0.len() /// - precision <= MAX_COEFS_PRECISION #[inline] #[target_feature(enable = "avx2")] unsafe fn horiz_convolution_four_rows( src_rows: [&[U8x2]; 4], dst_rows: [&mut [U8x2]; 4], normalizer: &Normalizer16, ) { let precision = normalizer.precision(); let initial = _mm256_set1_epi32(1 << (precision - 2)); /* |L A | |L A | |L A | |L A | |L A | |L A | |L A | |L A | |00 01| |02 03| |04 05| |06 07| |08 09| |10 11| |12 13| |14 15| Shuffle components with converting from u8 into i16: A: |-1 07| |-1 05| |-1 03| |-1 01| L: |-1 06| |-1 04| |-1 02| |-1 00| */ #[rustfmt::skip] let sh1 = _mm256_set_epi8( -1, 7, -1, 5, -1, 3, -1, 1, -1, 6, -1, 4, -1, 2, -1, 0, -1, 7, -1, 5, -1, 3, -1, 1, -1, 6, -1, 4, -1, 2, -1, 0, ); /* A: |-1 15| |-1 13| |-1 11| |-1 09| L: |-1 14| |-1 12| |-1 10| |-1 08| */ #[rustfmt::skip] let sh2 = _mm256_set_epi8( -1, 15, -1, 13, -1, 11, -1, 9, -1, 14, -1, 12, -1, 10, -1, 8, -1, 15, -1, 13, -1, 11, -1, 9, -1, 14, -1, 12, -1, 10, -1, 8, ); for (dst_x, chunk) in normalizer.chunks().iter().enumerate() { let mut x = chunk.start as usize; let mut sss0 = initial; let mut sss1 = initial; let coeffs = chunk.values(); let coeffs_by_8 = coeffs.chunks_exact(8); let reminder = coeffs_by_8.remainder(); for k in coeffs_by_8 { let mmk0 = simd_utils::ptr_i16_to_256set1_epi64x(k, 0); let mmk1 = simd_utils::ptr_i16_to_256set1_epi64x(k, 4); let source = _mm256_inserti128_si256::<1>( _mm256_castsi128_si256(simd_utils::loadu_si128(src_rows[0], x)), simd_utils::loadu_si128(src_rows[1], x), ); let pix = _mm256_shuffle_epi8(source, sh1); sss0 = _mm256_add_epi32(sss0, _mm256_madd_epi16(pix, mmk0)); let pix = _mm256_shuffle_epi8(source, sh2); sss0 = _mm256_add_epi32(sss0, _mm256_madd_epi16(pix, mmk1)); let source = _mm256_inserti128_si256::<1>( _mm256_castsi128_si256(simd_utils::loadu_si128(src_rows[2], x)), simd_utils::loadu_si128(src_rows[3], x), ); let pix = _mm256_shuffle_epi8(source, sh1); sss1 = _mm256_add_epi32(sss1, _mm256_madd_epi16(pix, mmk0)); let pix = _mm256_shuffle_epi8(source, sh2); sss1 = _mm256_add_epi32(sss1, _mm256_madd_epi16(pix, mmk1)); x += 8; } let coeffs_by_4 = reminder.chunks_exact(4); let reminder = coeffs_by_4.remainder(); for k in coeffs_by_4 { let mmk = simd_utils::ptr_i16_to_256set1_epi64x(k, 0); let source = _mm256_inserti128_si256::<1>( _mm256_castsi128_si256(simd_utils::loadl_epi64(src_rows[0], x)), simd_utils::loadl_epi64(src_rows[1], x), ); let pix = _mm256_shuffle_epi8(source, sh1); sss0 = _mm256_add_epi32(sss0, _mm256_madd_epi16(pix, mmk)); let source = _mm256_inserti128_si256::<1>( _mm256_castsi128_si256(simd_utils::loadl_epi64(src_rows[2], x)), simd_utils::loadl_epi64(src_rows[3], x), ); let pix = _mm256_shuffle_epi8(source, sh1); sss1 = _mm256_add_epi32(sss1, _mm256_madd_epi16(pix, mmk)); x += 4; } let coeffs_by_2 = reminder.chunks_exact(2); let reminder = coeffs_by_2.remainder(); for k in 
coeffs_by_2 { let mmk = simd_utils::mm256_load_and_clone_i16x2(k); let source = _mm256_inserti128_si256::<1>( _mm256_castsi128_si256(simd_utils::loadl_epi32(src_rows[0], x)), simd_utils::loadl_epi32(src_rows[1], x), ); let pix = _mm256_shuffle_epi8(source, sh1); sss0 = _mm256_add_epi32(sss0, _mm256_madd_epi16(pix, mmk)); let source = _mm256_inserti128_si256::<1>( _mm256_castsi128_si256(simd_utils::loadl_epi32(src_rows[2], x)), simd_utils::loadl_epi32(src_rows[3], x), ); let pix = _mm256_shuffle_epi8(source, sh1); sss1 = _mm256_add_epi32(sss1, _mm256_madd_epi16(pix, mmk)); x += 2; } if let Some(&k) = reminder.first() { // [16] xx k0 xx k0 xx k0 xx k0 xx k0 xx k0 xx k0 xx k0 let mmk = _mm256_set1_epi32(k as i32); // [16] xx a0 xx b0 xx g0 xx r0 xx a0 xx b0 xx g0 xx r0 let source = _mm256_inserti128_si256::<1>( _mm256_castsi128_si256(simd_utils::loadl_epi16(src_rows[0], x)), simd_utils::loadl_epi16(src_rows[1], x), ); let pix = _mm256_shuffle_epi8(source, sh1); sss0 = _mm256_add_epi32(sss0, _mm256_madd_epi16(pix, mmk)); let source = _mm256_inserti128_si256::<1>( _mm256_castsi128_si256(simd_utils::loadl_epi16(src_rows[2], x)), simd_utils::loadl_epi16(src_rows[3], x), ); let pix = _mm256_shuffle_epi8(source, sh1); sss1 = _mm256_add_epi32(sss1, _mm256_madd_epi16(pix, mmk)); } let lo128 = _mm256_extracti128_si256::<0>(sss0); let hi128 = _mm256_extracti128_si256::<1>(sss0); set_dst_pixel(lo128, dst_rows[0], dst_x, normalizer); set_dst_pixel(hi128, dst_rows[1], dst_x, normalizer); let lo128 = _mm256_extracti128_si256::<0>(sss1); let hi128 = _mm256_extracti128_si256::<1>(sss1); set_dst_pixel(lo128, dst_rows[2], dst_x, normalizer); set_dst_pixel(hi128, dst_rows[3], dst_x, normalizer); } } #[inline] #[target_feature(enable = "avx2")] unsafe fn set_dst_pixel(raw: __m128i, d_row: &mut [U8x2], dst_x: usize, normalizer: &Normalizer16) { let l32x2 = _mm_extract_epi64::<0>(raw); let a32x2 = _mm_extract_epi64::<1>(raw); let l32 = ((l32x2 >> 32) as i32).saturating_add((l32x2 & 0xffffffff) as i32); let a32 = ((a32x2 >> 32) as i32).saturating_add((a32x2 & 0xffffffff) as i32); let l8 = normalizer.clip(l32); let a8 = normalizer.clip(a32); d_row.get_unchecked_mut(dst_x).0 = [l8, a8]; } /// For safety, it is necessary to ensure the following conditions: /// - bounds.len() == dst_row.len() /// - coeffs.len() == dst_rows.0.len() * window_size /// - max(bound.start + bound.size for bound in bounds) <= src_row.len() /// - precision <= MAX_COEFS_PRECISION #[inline] #[target_feature(enable = "avx2")] unsafe fn horiz_convolution_one_row( src_row: &[U8x2], dst_row: &mut [U8x2], normalizer: &Normalizer16, ) { let precision = normalizer.precision(); /* |L A | |L A | |L A | |L A | |L A | |L A | |L A | |L A | |00 01| |02 03| |04 05| |06 07| |08 09| |10 11| |12 13| |14 15| Scale first four pixels into i16: A: |-1 07| |-1 05| L: |-1 06| |-1 04| A: |-1 03| |-1 01| L: |-1 02| |-1 00| */ #[rustfmt::skip] let pix_sh1 = _mm256_set_epi8( -1, 7, -1, 5, -1, 6, -1, 4, -1, 3, -1, 1, -1, 2, -1, 0, -1, 7, -1, 5, -1, 6, -1, 4, -1, 3, -1, 1, -1, 2, -1, 0, ); /* |C0 | |C1 | |C2 | |C3 | |C4 | |C5 | |C6 | |C7 | |00 01| |02 03| |04 05| |06 07| |08 09| |10 11| |12 13| |14 15| Duplicate first four coefficients for A and L components of pixels: CA: |07 06| |05 04| CL: |07 06| |05 04| CA: |03 02| |01 00| CL: |03 02| |01 00| */ #[rustfmt::skip] let coeff_sh1 = _mm256_set_epi8( 7, 6, 5, 4, 7, 6, 5, 4, 3, 2, 1, 0, 3, 2, 1, 0, 7, 6, 5, 4, 7, 6, 5, 4, 3, 2, 1, 0, 3, 2, 1, 0, ); /* |L A | |L A | |L A | |L A | |L A | |L A | |L A | |L A | |00 01| |02 03| |04 
05| |06 07| |08 09| |10 11| |12 13| |14 15| Scale second four pixels into i16: A: |-1 15| |-1 13| L: |-1 14| |-1 12| A: |-1 11| |-1 09| L: |-1 10| |-1 08| */ #[rustfmt::skip] let pix_sh2 = _mm256_set_epi8( -1, 15, -1, 13, -1, 14, -1, 12, -1, 11, -1, 9, -1, 10, -1, 8, -1, 15, -1, 13, -1, 14, -1, 12, -1, 11, -1, 9, -1, 10, -1, 8, ); /* |C0 | |C1 | |C2 | |C3 | |C4 | |C5 | |C6 | |C7 | |00 01| |02 03| |04 05| |06 07| |08 09| |10 11| |12 13| |14 15| Duplicate second four coefficients for A and L components of pixels: CA: |15 14| |13 12| CL: |15 14| |13 12| CA: |11 10| |09 08| CL: |11 10| |09 08| */ #[rustfmt::skip] let coeff_sh2 = _mm256_set_epi8( 15, 14, 13, 12, 15, 14, 13, 12, 11, 10, 9, 8, 11, 10, 9, 8, 15, 14, 13, 12, 15, 14, 13, 12, 11, 10, 9, 8, 11, 10, 9, 8, ); /* Scale to i16 first four pixels in first half of register, and second four pixels in second half of register. */ #[rustfmt::skip] let pix_sh3 = _mm256_set_epi8( -1, 15, -1, 13, -1, 14, -1, 12, -1, 11, -1, 9, -1, 10, -1, 8, -1, 7, -1, 5, -1, 6, -1, 4, -1, 3, -1, 1, -1, 2, -1, 0, ); /* Duplicate first four coefficients in first half of register, and second four coefficients in second half of register. */ #[rustfmt::skip] let coeff_sh3 = _mm256_set_epi8( 15, 14, 13, 12, 15, 14, 13, 12, 11, 10, 9, 8, 11, 10, 9, 8, 7, 6, 5, 4, 7, 6, 5, 4, 3, 2, 1, 0, 3, 2, 1, 0, ); /* |L A | |L A | |L A | |L A | |00 01| |02 03| |04 05| |06 07| |08 09| |10 11| |12 13| |14 15| Scale four pixels into i16: A: |-1 07| |-1 05| L: |-1 06| |-1 04| A: |-1 03| |-1 01| L: |-1 02| |-1 00| */ let pix_sh4 = _mm_set_epi8(-1, 7, -1, 5, -1, 6, -1, 4, -1, 3, -1, 1, -1, 2, -1, 0); for (dst_x, chunk) in normalizer.chunks().iter().enumerate() { let mut x = chunk.start as usize; let mut coeffs = chunk.values(); let mut sss = if coeffs.len() < 16 { // Lower part will be added to higher, use only half of the error _mm_set1_epi32(1 << (precision - 2)) } else { // Lower part will be added to higher twice, use only quarter of the error let mut sss256 = _mm256_set1_epi32(1 << (precision - 3)); let coeffs_by_16 = coeffs.chunks_exact(16); let reminder = coeffs_by_16.remainder(); for k in coeffs_by_16 { let ksource = simd_utils::loadu_si256(k, 0); let source = simd_utils::loadu_si256(src_row, x); let pix = _mm256_shuffle_epi8(source, pix_sh1); let mmk = _mm256_shuffle_epi8(ksource, coeff_sh1); sss256 = _mm256_add_epi32(sss256, _mm256_madd_epi16(pix, mmk)); let pix = _mm256_shuffle_epi8(source, pix_sh2); let mmk = _mm256_shuffle_epi8(ksource, coeff_sh2); sss256 = _mm256_add_epi32(sss256, _mm256_madd_epi16(pix, mmk)); x += 16; } let coeffs_by_8 = reminder.chunks_exact(8); coeffs = coeffs_by_8.remainder(); for k in coeffs_by_8 { let tmp = simd_utils::loadu_si128(k, 0); let ksource = _mm256_insertf128_si256::<1>(_mm256_castsi128_si256(tmp), tmp); let tmp = simd_utils::loadu_si128(src_row, x); let source = _mm256_insertf128_si256::<1>(_mm256_castsi128_si256(tmp), tmp); let pix = _mm256_shuffle_epi8(source, pix_sh3); let mmk = _mm256_shuffle_epi8(ksource, coeff_sh3); sss256 = _mm256_add_epi32(sss256, _mm256_madd_epi16(pix, mmk)); x += 8; } _mm_add_epi32( _mm256_extracti128_si256::<0>(sss256), _mm256_extracti128_si256::<1>(sss256), ) }; let coeffs_by_4 = coeffs.chunks_exact(4); let reminder1 = coeffs_by_4.remainder(); for k in coeffs_by_4 { let mmk = _mm_set_epi16(k[3], k[2], k[3], k[2], k[1], k[0], k[1], k[0]); let source = simd_utils::loadl_epi64(src_row, x); let pix = _mm_shuffle_epi8(source, pix_sh4); sss = _mm_add_epi32(sss, _mm_madd_epi16(pix, mmk)); x += 4 } if 
!reminder1.is_empty() { let mut pixels: [i16; 6] = [0; 6]; let mut coeffs: [i16; 3] = [0; 3]; for (i, &coeff) in reminder1.iter().enumerate() { coeffs[i] = coeff; let pixel: [u8; 2] = src_row.get_unchecked(x).0; pixels[i * 2] = pixel[0] as i16; pixels[i * 2 + 1] = pixel[1] as i16; x += 1; } let pix = _mm_set_epi16( 0, pixels[5], 0, pixels[4], pixels[3], pixels[1], pixels[2], pixels[0], ); let mmk = _mm_set_epi16( 0, coeffs[2], 0, coeffs[2], coeffs[1], coeffs[0], coeffs[1], coeffs[0], ); sss = _mm_add_epi32(sss, _mm_madd_epi16(pix, mmk)); } let lo = _mm_extract_epi64::<0>(sss); let hi = _mm_extract_epi64::<1>(sss); let a32 = ((lo >> 32) as i32).saturating_add((hi >> 32) as i32); let l32 = ((lo & 0xffffffff) as i32).saturating_add((hi & 0xffffffff) as i32); let a8 = normalizer.clip(a32); let l8 = normalizer.clip(l32); dst_row.get_unchecked_mut(dst_x).0 = [l8, a8]; } } fast_image_resize-5.3.0/src/convolution/u8x2/mod.rs000064400000000000000000000050331046102023000204130ustar 00000000000000use super::{Coefficients, Convolution}; use crate::convolution::optimisations::Normalizer16; use crate::convolution::vertical_u8::vert_convolution_u8; use crate::pixels::U8x2; use crate::{CpuExtensions, ImageView, ImageViewMut}; #[cfg(target_arch = "x86_64")] mod avx2; mod native; #[cfg(target_arch = "aarch64")] mod neon; #[cfg(target_arch = "x86_64")] mod sse4; #[cfg(target_arch = "wasm32")] mod wasm32; type P = U8x2; impl Convolution for P { fn horiz_convolution( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, offset: u32, coeffs: Coefficients, cpu_extensions: CpuExtensions, ) { debug_assert!(src_view.height() - offset >= dst_view.height()); let normalizer = Normalizer16::new(coeffs); let normalizer_ref = &normalizer; try_process_in_threads_h! { horiz_convolution( src_view, dst_view, offset, normalizer_ref, cpu_extensions, ); } } fn vert_convolution( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, offset: u32, coeffs: Coefficients, cpu_extensions: CpuExtensions, ) { debug_assert!(src_view.width() - offset >= dst_view.width()); let normalizer = Normalizer16::new(coeffs); let normalizer_ref = &normalizer; try_process_in_threads_v! 
{ vert_convolution_u8( src_view, dst_view, offset, normalizer_ref, cpu_extensions, ); } } } fn horiz_convolution( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, offset: u32, normalizer: &Normalizer16, cpu_extensions: CpuExtensions, ) { match cpu_extensions { #[cfg(target_arch = "x86_64")] CpuExtensions::Avx2 => avx2::horiz_convolution(src_view, dst_view, offset, normalizer), #[cfg(target_arch = "x86_64")] CpuExtensions::Sse4_1 => sse4::horiz_convolution(src_view, dst_view, offset, normalizer), #[cfg(target_arch = "aarch64")] CpuExtensions::Neon => neon::horiz_convolution(src_view, dst_view, offset, normalizer), #[cfg(target_arch = "wasm32")] CpuExtensions::Simd128 => wasm32::horiz_convolution(src_view, dst_view, offset, normalizer), _ => native::horiz_convolution(src_view, dst_view, offset, normalizer), } } fast_image_resize-5.3.0/src/convolution/u8x2/native.rs000064400000000000000000000022501046102023000211200ustar 00000000000000use crate::convolution::optimisations::Normalizer16; use crate::pixels::U8x2; use crate::{ImageView, ImageViewMut}; pub(crate) fn horiz_convolution( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, offset: u32, normalizer: &Normalizer16, ) { let precision = normalizer.precision(); let coefficients_chunks = normalizer.chunks(); let initial = 1 << (precision - 1); let src_rows = src_view.iter_rows(offset); let dst_rows = dst_view.iter_rows_mut(0); for (dst_row, src_row) in dst_rows.zip(src_rows) { for (coeffs_chunk, dst_pixel) in coefficients_chunks.iter().zip(dst_row.iter_mut()) { let first_x_src = coeffs_chunk.start as usize; let ks = coeffs_chunk.values(); let mut ss = [initial; 2]; let src_pixels = unsafe { src_row.get_unchecked(first_x_src..) }; for (&k, &src_pixel) in ks.iter().zip(src_pixels) { for (s, c) in ss.iter_mut().zip(src_pixel.0) { *s += c as i32 * (k as i32); } } dst_pixel.0 = ss.map(|v| unsafe { normalizer.clip(v) }); } } } fast_image_resize-5.3.0/src/convolution/u8x2/neon.rs000064400000000000000000000230421046102023000205730ustar 00000000000000use std::arch::aarch64::*; use crate::convolution::optimisations::{CoefficientsI16Chunk, Normalizer16}; use crate::pixels::U8x2; use crate::{neon_utils, ImageView, ImageViewMut}; #[inline] pub(crate) fn horiz_convolution( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, offset: u32, normalizer: &Normalizer16, ) { let precision = normalizer.precision(); macro_rules! 
call { ($imm8:expr) => {{ horiz_convolution_p::<$imm8>(src_view, dst_view, offset, normalizer); }}; } constify_imm8!(precision, call); } fn horiz_convolution_p<const PRECISION: i32>( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, offset: u32, normalizer: &Normalizer16, ) { let coefficients_chunks = normalizer.chunks(); let dst_height = dst_view.height(); let src_iter = src_view.iter_4_rows(offset, dst_height + offset); let dst_iter = dst_view.iter_4_rows_mut(); for (src_rows, dst_rows) in src_iter.zip(dst_iter) { unsafe { horiz_convolution_four_rows::<PRECISION>(src_rows, dst_rows, coefficients_chunks); } } let yy = dst_height - dst_height % 4; let src_rows = src_view.iter_rows(yy + offset); let dst_rows = dst_view.iter_rows_mut(yy); for (src_row, dst_row) in src_rows.zip(dst_rows) { unsafe { horiz_convolution_one_row::<PRECISION>(src_row, dst_row, coefficients_chunks); } } } /// For safety, it is necessary to ensure the following conditions: /// - length of all rows in src_rows must be equal /// - length of all rows in dst_rows must be equal /// - coefficients_chunks.len() == dst_rows.0.len() /// - max(chunk.start + chunk.values.len() for chunk in coefficients_chunks) <= src_row.0.len() /// - precision <= MAX_COEFS_PRECISION #[target_feature(enable = "neon")] unsafe fn horiz_convolution_four_rows<const PRECISION: i32>( src_rows: [&[U8x2]; 4], dst_rows: [&mut [U8x2]; 4], coefficients_chunks: &[CoefficientsI16Chunk], ) { let initial = vdupq_n_s32(1 << (PRECISION - 2)); let zero_u8x16 = vdupq_n_u8(0); let zero_u8x8 = vdup_n_u8(0); for (dst_x, coeffs_chunk) in coefficients_chunks.iter().enumerate() { let mut x: usize = coeffs_chunk.start as usize; let mut sss_a = [initial; 4]; let mut coeffs = coeffs_chunk.values(); let coeffs_by_8 = coeffs.chunks_exact(8); coeffs = coeffs_by_8.remainder(); for k in coeffs_by_8 { let coeffs_i16x8 = neon_utils::load_i16x8(k, 0); let coeff01 = vzip1q_s16(coeffs_i16x8, coeffs_i16x8); let coeff23 = vzip2q_s16(coeffs_i16x8, coeffs_i16x8); let coeff0 = vget_low_s16(coeff01); let coeff1 = vget_high_s16(coeff01); let coeff2 = vget_low_s16(coeff23); let coeff3 = vget_high_s16(coeff23); for i in 0..4 { let source = neon_utils::load_u8x16(src_rows[i], x); let mut sss = sss_a[i]; let source_i16 = vreinterpretq_s16_u8(vzip1q_u8(source, zero_u8x16)); sss = vmlal_s16(sss, vget_low_s16(source_i16), coeff0); sss = vmlal_s16(sss, vget_high_s16(source_i16), coeff1); let source_i16 = vreinterpretq_s16_u8(vzip2q_u8(source, zero_u8x16)); sss = vmlal_s16(sss, vget_low_s16(source_i16), coeff2); sss = vmlal_s16(sss, vget_high_s16(source_i16), coeff3); sss_a[i] = sss; } x += 8; } let coeffs_by_4 = coeffs.chunks_exact(4); coeffs = coeffs_by_4.remainder(); for k in coeffs_by_4 { let coeffs_i16x4 = neon_utils::load_i16x4(k, 0); let coeff0 = vzip1_s16(coeffs_i16x4, coeffs_i16x4); let coeff1 = vzip2_s16(coeffs_i16x4, coeffs_i16x4); for i in 0..4 { let source = neon_utils::load_u8x8(src_rows[i], x); let mut sss = sss_a[i]; let pix = vreinterpret_s16_u8(vzip1_u8(source, zero_u8x8)); sss = vmlal_s16(sss, pix, coeff0); let pix = vreinterpret_s16_u8(vzip2_u8(source, zero_u8x8)); sss = vmlal_s16(sss, pix, coeff1); sss_a[i] = sss; } x += 4; } if !coeffs.is_empty() { let mut four_coeffs = [0i16; 4]; four_coeffs .iter_mut() .zip(coeffs) .for_each(|(d, s)| *d = *s); let coeffs_i16x4 = neon_utils::load_i16x4(&four_coeffs, 0); let coeff0 = vzip1_s16(coeffs_i16x4, coeffs_i16x4); let coeff1 = vzip2_s16(coeffs_i16x4, coeffs_i16x4); let mut four_pixels = [U8x2::new([0; 2]); 4]; for i in 0..4 { four_pixels .iter_mut() .zip(src_rows[i].get_unchecked(x..))
.for_each(|(d, s)| *d = *s); let source = neon_utils::load_u8x8(&four_pixels, 0); let mut sss = sss_a[i]; let pix = vreinterpret_s16_u8(vzip1_u8(source, zero_u8x8)); sss = vmlal_s16(sss, pix, coeff0); let pix = vreinterpret_s16_u8(vzip2_u8(source, zero_u8x8)); sss = vmlal_s16(sss, pix, coeff1); sss_a[i] = sss; } } let mut res_i32x2x4 = sss_a.map(|sss| vadd_s32(vget_low_s32(sss), vget_high_s32(sss))); res_i32x2x4[0] = vshr_n_s32::<PRECISION>(res_i32x2x4[0]); res_i32x2x4[1] = vshr_n_s32::<PRECISION>(res_i32x2x4[1]); res_i32x2x4[2] = vshr_n_s32::<PRECISION>(res_i32x2x4[2]); res_i32x2x4[3] = vshr_n_s32::<PRECISION>(res_i32x2x4[3]); for i in 0..4 { let sss = vcombine_s32(res_i32x2x4[i], vreinterpret_s32_u8(zero_u8x8)); let s = vreinterpret_u16_u8(vqmovun_s16(vcombine_s16( vqmovn_s32(sss), vreinterpret_s16_u8(zero_u8x8), ))); dst_rows[i].get_unchecked_mut(dst_x).0 = vget_lane_u16::<0>(s).to_le_bytes(); } } } /// For safety, it is necessary to ensure the following conditions: /// - bounds.len() == dst_row.len() /// - coefficients_chunks.len() == dst_row.len() /// - max(chunk.start + chunk.values.len() for chunk in coefficients_chunks) <= src_row.len() /// - precision <= MAX_COEFS_PRECISION #[target_feature(enable = "neon")] unsafe fn horiz_convolution_one_row<const PRECISION: i32>( src_row: &[U8x2], dst_row: &mut [U8x2], coefficients_chunks: &[CoefficientsI16Chunk], ) { let initial = vdupq_n_s32(1 << (PRECISION - 2)); let zero_u8x16 = vdupq_n_u8(0); let zero_u8x8 = vdup_n_u8(0); for (dst_x, coeffs_chunk) in coefficients_chunks.iter().enumerate() { let mut x: usize = coeffs_chunk.start as usize; let mut sss = initial; let mut coeffs = coeffs_chunk.values(); let coeffs_by_8 = coeffs.chunks_exact(8); coeffs = coeffs_by_8.remainder(); for k in coeffs_by_8 { let coeffs_i16x8 = neon_utils::load_i16x8(k, 0); let coeff01 = vzip1q_s16(coeffs_i16x8, coeffs_i16x8); let coeff23 = vzip2q_s16(coeffs_i16x8, coeffs_i16x8); let source = neon_utils::load_u8x16(src_row, x); let source_i16 = vreinterpretq_s16_u8(vzip1q_u8(source, zero_u8x16)); sss = vmlal_s16(sss, vget_low_s16(source_i16), vget_low_s16(coeff01)); sss = vmlal_s16(sss, vget_high_s16(source_i16), vget_high_s16(coeff01)); let source_i16 = vreinterpretq_s16_u8(vzip2q_u8(source, zero_u8x16)); sss = vmlal_s16(sss, vget_low_s16(source_i16), vget_low_s16(coeff23)); sss = vmlal_s16(sss, vget_high_s16(source_i16), vget_high_s16(coeff23)); x += 8; } let coeffs_by_4 = coeffs.chunks_exact(4); coeffs = coeffs_by_4.remainder(); for k in coeffs_by_4 { sss = conv_four_pixels(sss, k, src_row, x, zero_u8x8); x += 4; } if !coeffs.is_empty() { let mut four_coeffs = [0i16; 4]; four_coeffs .iter_mut() .zip(coeffs) .for_each(|(d, s)| *d = *s); let mut four_pixels = [U8x2::new([0; 2]); 4]; four_pixels .iter_mut() .zip(src_row.get_unchecked(x..)) .for_each(|(d, s)| *d = *s); sss = conv_four_pixels(sss, &four_coeffs, &four_pixels, 0, zero_u8x8); } let mut res_i32x2 = vadd_s32(vget_low_s32(sss), vget_high_s32(sss)); res_i32x2 = vshr_n_s32::<PRECISION>(res_i32x2); let sss = vcombine_s32(res_i32x2, vreinterpret_s32_u8(zero_u8x8)); let s = vreinterpret_u16_u8(vqmovun_s16(vcombine_s16( vqmovn_s32(sss), vreinterpret_s16_u8(zero_u8x8), ))); dst_row.get_unchecked_mut(dst_x).0 = vget_lane_u16::<0>(s).to_le_bytes(); } } #[inline] #[target_feature(enable = "neon")] unsafe fn conv_four_pixels( mut sss: int32x4_t, coeffs: &[i16], src_row: &[U8x2], x: usize, zero_u8x8: uint8x8_t, ) -> int32x4_t { let coeffs_i16x4 = neon_utils::load_i16x4(coeffs, 0); let coeff0 = vzip1_s16(coeffs_i16x4, coeffs_i16x4); let coeff1 = vzip2_s16(coeffs_i16x4, coeffs_i16x4); let source =
neon_utils::load_u8x8(src_row, x); let pix = vreinterpret_s16_u8(vzip1_u8(source, zero_u8x8)); sss = vmlal_s16(sss, pix, coeff0); let pix = vreinterpret_s16_u8(vzip2_u8(source, zero_u8x8)); sss = vmlal_s16(sss, pix, coeff1); sss } fast_image_resize-5.3.0/src/convolution/u8x2/sse4.rs000064400000000000000000000237571046102023000205270ustar 00000000000000use std::arch::x86_64::*; use crate::convolution::optimisations::Normalizer16; use crate::pixels::U8x2; use crate::{simd_utils, ImageView, ImageViewMut}; #[inline] pub(crate) fn horiz_convolution( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, offset: u32, normalizer: &Normalizer16, ) { let dst_height = dst_view.height(); let src_iter = src_view.iter_4_rows(offset, dst_height + offset); let dst_iter = dst_view.iter_4_rows_mut(); for (src_rows, dst_rows) in src_iter.zip(dst_iter) { unsafe { horiz_convolution_four_rows(src_rows, dst_rows, normalizer); } } let yy = dst_height - dst_height % 4; let src_rows = src_view.iter_rows(yy + offset); let dst_rows = dst_view.iter_rows_mut(yy); for (src_row, dst_row) in src_rows.zip(dst_rows) { unsafe { horiz_convolution_one_row(src_row, dst_row, normalizer); } } } /// For safety, it is necessary to ensure the following conditions: /// - length of all rows in src_rows must be equal /// - length of all rows in dst_rows must be equal /// - coefficients_chunks.len() == dst_rows.0.len() /// - max(chunk.start + chunk.values.len() for chunk in coefficients_chunks) <= src_row.0.len() /// - precision <= MAX_COEFS_PRECISION #[inline] #[target_feature(enable = "sse4.1")] unsafe fn horiz_convolution_four_rows( src_rows: [&[U8x2]; 4], dst_rows: [&mut [U8x2]; 4], normalizer: &Normalizer16, ) { let precision = normalizer.precision(); let initial = _mm_set1_epi32(1 << (precision - 2)); /* |L A | |L A | |L A | |L A | |L A | |L A | |L A | |L A | |00 01| |02 03| |04 05| |06 07| |08 09| |10 11| |12 13| |14 15| Shuffle components with converting from u8 into i16: A: |-1 07| |-1 05| |-1 03| |-1 01| L: |-1 06| |-1 04| |-1 02| |-1 00| */ #[rustfmt::skip] let sh1 = _mm_set_epi8( -1, 7, -1, 5, -1, 3, -1, 1, -1, 6, -1, 4, -1, 2, -1, 0, ); /* A: |-1 15| |-1 13| |-1 11| |-1 09| L: |-1 14| |-1 12| |-1 10| |-1 08| */ #[rustfmt::skip] let sh2 = _mm_set_epi8( -1, 15, -1, 13, -1, 11, -1, 9, -1, 14, -1, 12, -1, 10, -1, 8, ); for (dst_x, chunk) in normalizer.chunks().iter().enumerate() { let mut x = chunk.start as usize; let coeffs = chunk.values(); let mut sss: [__m128i; 4] = [initial; 4]; let coeffs_by_8 = coeffs.chunks_exact(8); let reminder = coeffs_by_8.remainder(); for k in coeffs_by_8 { let mmk0 = simd_utils::ptr_i16_to_set1_epi64x(k, 0); let mmk1 = simd_utils::ptr_i16_to_set1_epi64x(k, 4); for i in 0..4 { let source = simd_utils::loadu_si128(src_rows[i], x); let pix = _mm_shuffle_epi8(source, sh1); let tmp_sum = _mm_add_epi32(sss[i], _mm_madd_epi16(pix, mmk0)); let pix = _mm_shuffle_epi8(source, sh2); sss[i] = _mm_add_epi32(tmp_sum, _mm_madd_epi16(pix, mmk1)); } x += 8; } let coeffs_by_4 = reminder.chunks_exact(4); let reminder = coeffs_by_4.remainder(); for k in coeffs_by_4 { let mmk = simd_utils::ptr_i16_to_set1_epi64x(k, 0); for i in 0..4 { let source = simd_utils::loadl_epi64(src_rows[i], x); let pix = _mm_shuffle_epi8(source, sh1); sss[i] = _mm_add_epi32(sss[i], _mm_madd_epi16(pix, mmk)); } x += 4; } let coeffs_by_2 = reminder.chunks_exact(2); let reminder = coeffs_by_2.remainder(); for k in coeffs_by_2 { let mmk = simd_utils::mm_load_and_clone_i16x2(k); for i in 0..4 { let source = 
simd_utils::loadl_epi32(src_rows[i], x); let pix = _mm_shuffle_epi8(source, sh1); sss[i] = _mm_add_epi32(sss[i], _mm_madd_epi16(pix, mmk)); } x += 2; } if let Some(&k) = reminder.first() { let mmk = _mm_set1_epi32(k as i32); for i in 0..4 { let source = simd_utils::loadl_epi16(src_rows[i], x); let pix = _mm_shuffle_epi8(source, sh1); sss[i] = _mm_add_epi32(sss[i], _mm_madd_epi16(pix, mmk)); } } for i in 0..4 { set_dst_pixel(sss[i], dst_rows[i], dst_x, normalizer); } } } #[inline] #[target_feature(enable = "sse4.1")] unsafe fn set_dst_pixel(raw: __m128i, d_row: &mut [U8x2], dst_x: usize, normalizer: &Normalizer16) { let l32x2 = _mm_extract_epi64::<0>(raw); let a32x2 = _mm_extract_epi64::<1>(raw); let l32 = ((l32x2 >> 32) as i32).saturating_add((l32x2 & 0xffffffff) as i32); let a32 = ((a32x2 >> 32) as i32).saturating_add((a32x2 & 0xffffffff) as i32); let l8 = normalizer.clip(l32); let a8 = normalizer.clip(a32); d_row.get_unchecked_mut(dst_x).0 = [l8, a8]; } /// For safety, it is necessary to ensure the following conditions: /// - bounds.len() == dst_row.len() /// - coeffs.len() == dst_rows.0.len() * window_size /// - max(bound.start + bound.size for bound in bounds) <= src_row.len() /// - precision <= MAX_COEFS_PRECISION #[inline] #[target_feature(enable = "sse4.1")] unsafe fn horiz_convolution_one_row( src_row: &[U8x2], dst_row: &mut [U8x2], normalizer: &Normalizer16, ) { let precision = normalizer.precision(); /* |L A | |L A | |L A | |L A | |L A | |L A | |L A | |L A | |00 01| |02 03| |04 05| |06 07| |08 09| |10 11| |12 13| |14 15| Scale first four pixels into i16: A: |-1 07| |-1 05| L: |-1 06| |-1 04| A: |-1 03| |-1 01| L: |-1 02| |-1 00| */ #[rustfmt::skip] let pix_sh1 = _mm_set_epi8( -1, 7, -1, 5, -1, 6, -1, 4, -1, 3, -1, 1, -1, 2, -1, 0, ); /* |C0 | |C1 | |C2 | |C3 | |C4 | |C5 | |C6 | |C7 | |00 01| |02 03| |04 05| |06 07| |08 09| |10 11| |12 13| |14 15| Duplicate first four coefficients for A and L components of pixels: CA: |07 06| |05 04| CL: |07 06| |05 04| CA: |03 02| |01 00| CL: |03 02| |01 00| */ #[rustfmt::skip] let coeff_sh1 = _mm_set_epi8( 7, 6, 5, 4, 7, 6, 5, 4, 3, 2, 1, 0, 3, 2, 1, 0, ); /* |L A | |L A | |L A | |L A | |L A | |L A | |L A | |L A | |00 01| |02 03| |04 05| |06 07| |08 09| |10 11| |12 13| |14 15| Scale second four pixels into i16: A: |-1 15| |-1 13| L: |-1 14| |-1 12| A: |-1 11| |-1 09| L: |-1 10| |-1 08| */ #[rustfmt::skip] let pix_sh2 = _mm_set_epi8( -1, 15, -1, 13, -1, 14, -1, 12, -1, 11, -1, 9, -1, 10, -1, 8, ); /* |C0 | |C1 | |C2 | |C3 | |C4 | |C5 | |C6 | |C7 | |00 01| |02 03| |04 05| |06 07| |08 09| |10 11| |12 13| |14 15| Duplicate second four coefficients for A and L components of pixels: CA: |15 14| |13 12| CL: |15 14| |13 12| CA: |11 10| |09 08| CL: |11 10| |09 08| */ #[rustfmt::skip] let coeff_sh2 = _mm_set_epi8( 15, 14, 13, 12, 15, 14, 13, 12, 11, 10, 9, 8, 11, 10, 9, 8, ); /* |L A | |L A | |L A | |L A | |00 01| |02 03| |04 05| |06 07| |08 09| |10 11| |12 13| |14 15| Scale four pixels into i16: A: |-1 07| |-1 05| L: |-1 06| |-1 04| A: |-1 03| |-1 01| L: |-1 02| |-1 00| */ let pix_sh3 = _mm_set_epi8(-1, 7, -1, 5, -1, 6, -1, 4, -1, 3, -1, 1, -1, 2, -1, 0); for (dst_x, chunk) in normalizer.chunks().iter().enumerate() { let mut x = chunk.start as usize; let mut coeffs = chunk.values(); // Lower part will be added to higher, use only half of the error let mut sss = _mm_set1_epi32(1 << (precision - 2)); let coeffs_by_8 = coeffs.chunks_exact(8); coeffs = coeffs_by_8.remainder(); for k in coeffs_by_8 { let ksource = simd_utils::loadu_si128(k, 0); let source = 
simd_utils::loadu_si128(src_row, x); let pix = _mm_shuffle_epi8(source, pix_sh1); let mmk = _mm_shuffle_epi8(ksource, coeff_sh1); sss = _mm_add_epi32(sss, _mm_madd_epi16(pix, mmk)); let pix = _mm_shuffle_epi8(source, pix_sh2); let mmk = _mm_shuffle_epi8(ksource, coeff_sh2); sss = _mm_add_epi32(sss, _mm_madd_epi16(pix, mmk)); x += 8; } let coeffs_by_4 = coeffs.chunks_exact(4); let reminder1 = coeffs_by_4.remainder(); for k in coeffs_by_4 { let mmk = _mm_set_epi16(k[3], k[2], k[3], k[2], k[1], k[0], k[1], k[0]); let source = simd_utils::loadl_epi64(src_row, x); let pix = _mm_shuffle_epi8(source, pix_sh3); sss = _mm_add_epi32(sss, _mm_madd_epi16(pix, mmk)); x += 4 } if !reminder1.is_empty() { let mut pixels: [i16; 6] = [0; 6]; let mut coeffs: [i16; 3] = [0; 3]; for (i, &coeff) in reminder1.iter().enumerate() { coeffs[i] = coeff; let pixel: [u8; 2] = src_row.get_unchecked(x).0; pixels[i * 2] = pixel[0] as i16; pixels[i * 2 + 1] = pixel[1] as i16; x += 1; } let pix = _mm_set_epi16( 0, pixels[5], 0, pixels[4], pixels[3], pixels[1], pixels[2], pixels[0], ); let mmk = _mm_set_epi16( 0, coeffs[2], 0, coeffs[2], coeffs[1], coeffs[0], coeffs[1], coeffs[0], ); sss = _mm_add_epi32(sss, _mm_madd_epi16(pix, mmk)); } let lo = _mm_extract_epi64::<0>(sss); let hi = _mm_extract_epi64::<1>(sss); let a32 = ((lo >> 32) as i32).saturating_add((hi >> 32) as i32); let l32 = ((lo & 0xffffffff) as i32).saturating_add((hi & 0xffffffff) as i32); let a8 = normalizer.clip(a32); let l8 = normalizer.clip(l32); dst_row.get_unchecked_mut(dst_x).0 = [l8, a8]; } } fast_image_resize-5.3.0/src/convolution/u8x2/wasm32.rs000064400000000000000000000170141046102023000207520ustar 00000000000000use std::arch::wasm32::*; use crate::convolution::optimisations::Normalizer16; use crate::pixels::U8x2; use crate::wasm32_utils; use crate::{ImageView, ImageViewMut}; #[inline] pub(crate) fn horiz_convolution( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, offset: u32, normalizer: &Normalizer16, ) { let dst_height = dst_view.height(); let src_iter = src_view.iter_4_rows(offset, dst_height + offset); let dst_iter = dst_view.iter_4_rows_mut(); for (src_rows, dst_rows) in src_iter.zip(dst_iter) { unsafe { horiz_convolution_four_rows(src_rows, dst_rows, normalizer); } } let yy = dst_height - dst_height % 4; let src_rows = src_view.iter_rows(yy + offset); let dst_rows = dst_view.iter_rows_mut(yy); for (src_row, dst_row) in src_rows.zip(dst_rows) { unsafe { horiz_convolution_one_row(src_row, dst_row, normalizer); } } } /// For safety, it is necessary to ensure the following conditions: /// - length of all rows in src_rows must be equal /// - length of all rows in dst_rows must be equal /// - coefficients_chunks.len() == dst_rows.0.len() /// - max(chunk.start + chunk.values.len() for chunk in coefficients_chunks) <= src_row.0.len() /// - precision <= MAX_COEFS_PRECISION #[inline] #[target_feature(enable = "simd128")] unsafe fn horiz_convolution_four_rows( src_rows: [&[U8x2]; 4], dst_rows: [&mut [U8x2]; 4], normalizer: &Normalizer16, ) { let precision = normalizer.precision(); let initial = i32x4_splat(1 << (precision - 2)); /* |L A | |L A | |L A | |L A | |L A | |L A | |L A | |L A | |00 01| |02 03| |04 05| |06 07| |08 09| |10 11| |12 13| |14 15| Shuffle components with converting from u8 into i16: A: |-1 07| |-1 05| |-1 03| |-1 01| L: |-1 06| |-1 04| |-1 02| |-1 00| */ const SH1: v128 = i8x16(0, -1, 2, -1, 4, -1, 6, -1, 1, -1, 3, -1, 5, -1, 7, -1); /* A: |-1 15| |-1 13| |-1 11| |-1 09| L: |-1 14| |-1 12| |-1 10| |-1 08| */ const SH2: 
v128 = i8x16(8, -1, 10, -1, 12, -1, 14, -1, 9, -1, 11, -1, 13, -1, 15, -1); for (dst_x, coeffs_chunk) in normalizer.chunks().iter().enumerate() { let mut x = coeffs_chunk.start as usize; let mut sss: [v128; 4] = [initial; 4]; let coeffs = coeffs_chunk.values(); let coeffs_by_8 = coeffs.chunks_exact(8); let reminder = coeffs_by_8.remainder(); for k in coeffs_by_8 { let mmk0 = wasm32_utils::ptr_i16_to_set1_i64(k, 0); let mmk1 = wasm32_utils::ptr_i16_to_set1_i64(k, 4); for i in 0..4 { let source = wasm32_utils::load_v128(src_rows[i], x); let pix = i8x16_swizzle(source, SH1); let tmp_sum = i32x4_add(sss[i], i32x4_dot_i16x8(pix, mmk0)); let pix = i8x16_swizzle(source, SH2); sss[i] = i32x4_add(tmp_sum, i32x4_dot_i16x8(pix, mmk1)); } x += 8; } let coeffs_by_4 = reminder.chunks_exact(4); let reminder = coeffs_by_4.remainder(); for k in coeffs_by_4 { let mmk = wasm32_utils::ptr_i16_to_set1_i64(k, 0); for i in 0..4 { let source = wasm32_utils::loadl_i64(src_rows[i], x); let pix = i8x16_swizzle(source, SH1); sss[i] = i32x4_add(sss[i], i32x4_dot_i16x8(pix, mmk)); } x += 4; } let coeffs_by_2 = reminder.chunks_exact(2); let reminder = coeffs_by_2.remainder(); for k in coeffs_by_2 { let mmk = wasm32_utils::ptr_i16_to_set1_i32(k, 0); for i in 0..4 { let source = wasm32_utils::loadl_i32(src_rows[i], x); let pix = i8x16_swizzle(source, SH1); sss[i] = i32x4_add(sss[i], i32x4_dot_i16x8(pix, mmk)); } x += 2; } if let Some(&k) = reminder.first() { let mmk = i32x4_splat(k as i32); for i in 0..4 { let source = wasm32_utils::loadl_i16(src_rows[i], x); let pix = i8x16_swizzle(source, SH1); sss[i] = i32x4_add(sss[i], i32x4_dot_i16x8(pix, mmk)); } } for i in 0..4 { set_dst_pixel(sss[i], dst_rows[i], dst_x, normalizer); } } } /// For safety, it is necessary to ensure the following conditions: /// - bounds.len() == dst_row.len() /// - coeffs.len() == dst_rows.0.len() * window_size /// - max(bound.start + bound.size for bound in bounds) <= src_row.len() /// - precision <= MAX_COEFS_PRECISION #[inline] #[target_feature(enable = "simd128")] unsafe fn horiz_convolution_one_row( src_row: &[U8x2], dst_row: &mut [U8x2], normalizer: &Normalizer16, ) { const SH1: v128 = i8x16(0, -1, 2, -1, 4, -1, 6, -1, 1, -1, 3, -1, 5, -1, 7, -1); /* A: |-1 15| |-1 13| |-1 11| |-1 09| L: |-1 14| |-1 12| |-1 10| |-1 08| */ const SH2: v128 = i8x16(8, -1, 10, -1, 12, -1, 14, -1, 9, -1, 11, -1, 13, -1, 15, -1); // Lower part will be added to higher, use only half of the error let precision = normalizer.precision(); let initial = i32x4_splat(1 << (precision - 2)); for (dst_x, coeffs_chunk) in normalizer.chunks().iter().enumerate() { let mut x = coeffs_chunk.start as usize; let mut coeffs = coeffs_chunk.values(); let mut sss = initial; let coeffs_by_8 = coeffs.chunks_exact(8); coeffs = coeffs_by_8.remainder(); for k in coeffs_by_8 { let mmk0 = wasm32_utils::ptr_i16_to_set1_i64(k, 0); let mmk1 = wasm32_utils::ptr_i16_to_set1_i64(k, 4); let source = wasm32_utils::load_v128(src_row, x); let pix = i8x16_swizzle(source, SH1); let tmp_sum = i32x4_add(sss, i32x4_dot_i16x8(pix, mmk0)); let pix = i8x16_swizzle(source, SH2); sss = i32x4_add(tmp_sum, i32x4_dot_i16x8(pix, mmk1)); x += 8; } let coeffs_by_4 = coeffs.chunks_exact(4); let reminder1 = coeffs_by_4.remainder(); for k in coeffs_by_4 { let mmk = wasm32_utils::ptr_i16_to_set1_i64(k, 0); let source = wasm32_utils::loadl_i64(src_row, x); let pix = i8x16_swizzle(source, SH1); sss = i32x4_add(sss, i32x4_dot_i16x8(pix, mmk)); x += 4 } let coeffs_by_2 = reminder1.chunks_exact(2); let reminder = 
coeffs_by_2.remainder(); for k in coeffs_by_2 { let mmk = wasm32_utils::ptr_i16_to_set1_i32(k, 0); let source = wasm32_utils::loadl_i32(src_row, x); let pix = i8x16_swizzle(source, SH1); sss = i32x4_add(sss, i32x4_dot_i16x8(pix, mmk)); x += 2; } if let Some(&k) = reminder.first() { let mmk = i32x4_splat(k as i32); let source = wasm32_utils::loadl_i16(src_row, x); let pix = i8x16_swizzle(source, SH1); sss = i32x4_add(sss, i32x4_dot_i16x8(pix, mmk)); } set_dst_pixel(sss, dst_row, dst_x, normalizer); } } #[inline] #[target_feature(enable = "simd128")] unsafe fn set_dst_pixel(raw: v128, d_row: &mut [U8x2], dst_x: usize, normalizer: &Normalizer16) { let mut buf = [0i32; 4]; v128_store(buf.as_mut_ptr() as _, raw); let l32 = buf[0].saturating_add(buf[1]); let a32 = buf[2].saturating_add(buf[3]); let l8 = normalizer.clip(l32); let a8 = normalizer.clip(a32); d_row.get_unchecked_mut(dst_x).0 = [l8, a8]; } fast_image_resize-5.3.0/src/convolution/u8x3/avx2.rs000064400000000000000000000335131046102023000205210ustar 00000000000000use std::arch::x86_64::*; use crate::convolution::optimisations::Normalizer16; use crate::pixels::U8x3; use crate::{simd_utils, ImageView, ImageViewMut}; #[inline] pub(crate) fn horiz_convolution( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, offset: u32, normalizer: &Normalizer16, ) { let precision = normalizer.precision(); macro_rules! call { ($imm8:expr) => {{ horiz_convolution_p::<$imm8>(src_view, dst_view, offset, normalizer); }}; } constify_imm8!(precision, call); } fn horiz_convolution_p( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, offset: u32, normalizer: &Normalizer16, ) { let dst_height = dst_view.height(); let src_iter = src_view.iter_4_rows(offset, dst_height + offset); let dst_iter = dst_view.iter_4_rows_mut(); for (src_rows, dst_rows) in src_iter.zip(dst_iter) { unsafe { horiz_convolution_four_rows::(src_rows, dst_rows, normalizer); } } let yy = dst_height - dst_height % 4; let src_rows = src_view.iter_rows(yy + offset); let dst_rows = dst_view.iter_rows_mut(yy); for (src_row, dst_row) in src_rows.zip(dst_rows) { unsafe { horiz_convolution_one_row::(src_row, dst_row, normalizer); } } } /// For safety, it is necessary to ensure the following conditions: /// - length of all rows in src_rows must be equal /// - length of all rows in dst_rows must be equal /// - coefficients_chunks.len() == dst_rows.0.len() /// - max(chunk.start + chunk.values.len() for chunk in coefficients_chunks) <= src_row.0.len() /// - precision <= MAX_COEFS_PRECISION #[inline] #[target_feature(enable = "avx2")] unsafe fn horiz_convolution_four_rows( src_rows: [&[U8x3]; 4], dst_rows: [&mut [U8x3]; 4], normalizer: &Normalizer16, ) { let zero = _mm256_setzero_si256(); let initial = _mm256_set1_epi32(1 << (PRECISION - 1)); let src_width = src_rows[0].len(); /* |R G B | |R G B | |R G B | |R G B | |R G B | |R | |00 01 02| |03 04 05| |06 07 08| |09 10 11| |12 13 14| |15| Ignore 12-15 bytes in each half of 32-bytes register and shuffle other components with converting from u8 into i16: x: |-1 -1| |-1 -1| B: |-1 05| |-1 02| G: |-1 04| |-1 01| R: |-1 03| |-1 00| */ #[rustfmt::skip] let sh1 = _mm256_set_epi8( -1, -1, -1, -1, -1, 5, -1, 2, -1, 4, -1, 1, -1, 3, -1, 0, -1, -1, -1, -1, -1, 5, -1, 2, -1, 4, -1, 1, -1, 3, -1, 0, ); /* x: |-1 -1| |-1 -1| B: |-1 11| |-1 08| G: |-1 10| |-1 07| R: |-1 09| |-1 06| */ #[rustfmt::skip] let sh2 = _mm256_set_epi8( -1, -1, -1, -1, -1, 11, -1, 8, -1, 10, -1, 7, -1, 9, -1, 6, -1, -1, -1, -1, -1, 11, -1, 8, -1, 10, -1, 7, -1, 9, -1, 6, ); for 
(dst_x, chunk) in normalizer.chunks().iter().enumerate() { let x_start = chunk.start as usize; let mut x = x_start; let mut sss0 = initial; let mut sss1 = initial; let mut coeffs = chunk.values(); // (16 bytes) / (3 bytes per pixel) = 5 whole pixels + 1 byte let max_x = src_width.saturating_sub(5); if x < max_x { let coeffs_by_4 = coeffs.chunks_exact(4); for k in coeffs_by_4 { let mmk0 = simd_utils::mm256_load_and_clone_i16x2(k); let mmk1 = simd_utils::mm256_load_and_clone_i16x2(&k[2..]); let source = _mm256_inserti128_si256::<1>( _mm256_castsi128_si256(simd_utils::loadu_si128(src_rows[0], x)), simd_utils::loadu_si128(src_rows[1], x), ); let pix = _mm256_shuffle_epi8(source, sh1); sss0 = _mm256_add_epi32(sss0, _mm256_madd_epi16(pix, mmk0)); let pix = _mm256_shuffle_epi8(source, sh2); sss0 = _mm256_add_epi32(sss0, _mm256_madd_epi16(pix, mmk1)); let source = _mm256_inserti128_si256::<1>( _mm256_castsi128_si256(simd_utils::loadu_si128(src_rows[2], x)), simd_utils::loadu_si128(src_rows[3], x), ); let pix = _mm256_shuffle_epi8(source, sh1); sss1 = _mm256_add_epi32(sss1, _mm256_madd_epi16(pix, mmk0)); let pix = _mm256_shuffle_epi8(source, sh2); sss1 = _mm256_add_epi32(sss1, _mm256_madd_epi16(pix, mmk1)); x += 4; if x >= max_x { break; } } } // (8 bytes) / (3 bytes per pixel) = 2 whole pixels + 2 bytes let max_x = src_width.saturating_sub(2); if x < max_x { let coeffs_by_2 = coeffs[x - x_start..].chunks_exact(2); for k in coeffs_by_2 { let mmk = simd_utils::mm256_load_and_clone_i16x2(k); let source = _mm256_inserti128_si256::<1>( _mm256_castsi128_si256(simd_utils::loadl_epi64(src_rows[0], x)), simd_utils::loadl_epi64(src_rows[1], x), ); let pix = _mm256_shuffle_epi8(source, sh1); sss0 = _mm256_add_epi32(sss0, _mm256_madd_epi16(pix, mmk)); let source = _mm256_inserti128_si256::<1>( _mm256_castsi128_si256(simd_utils::loadl_epi64(src_rows[2], x)), simd_utils::loadl_epi64(src_rows[3], x), ); let pix = _mm256_shuffle_epi8(source, sh1); sss1 = _mm256_add_epi32(sss1, _mm256_madd_epi16(pix, mmk)); x += 2; if x >= max_x { break; } } } coeffs = coeffs.split_at(x - x_start).1; for &k in coeffs { // [16] xx k0 xx k0 xx k0 xx k0 xx k0 xx k0 xx k0 xx k0 let mmk = _mm256_set1_epi32(k as i32); // [16] xx a0 xx b0 xx g0 xx r0 xx a0 xx b0 xx g0 xx r0 let pix = _mm256_inserti128_si256::<1>( _mm256_castsi128_si256(simd_utils::mm_cvtepu8_epi32_u8x3(src_rows[0], x)), simd_utils::mm_cvtepu8_epi32_u8x3(src_rows[1], x), ); sss0 = _mm256_add_epi32(sss0, _mm256_madd_epi16(pix, mmk)); let pix = _mm256_inserti128_si256::<1>( _mm256_castsi128_si256(simd_utils::mm_cvtepu8_epi32_u8x3(src_rows[2], x)), simd_utils::mm_cvtepu8_epi32_u8x3(src_rows[3], x), ); sss1 = _mm256_add_epi32(sss1, _mm256_madd_epi16(pix, mmk)); x += 1; } sss0 = _mm256_srai_epi32::(sss0); sss1 = _mm256_srai_epi32::(sss1); sss0 = _mm256_packs_epi32(sss0, zero); sss1 = _mm256_packs_epi32(sss1, zero); sss0 = _mm256_packus_epi16(sss0, zero); sss1 = _mm256_packus_epi16(sss1, zero); let pixel: u32 = i32::cast_unsigned(_mm_cvtsi128_si32(_mm256_extracti128_si256::<0>(sss0))); let bytes = pixel.to_le_bytes(); dst_rows[0].get_unchecked_mut(dst_x).0 = [bytes[0], bytes[1], bytes[2]]; let pixel: u32 = i32::cast_unsigned(_mm_cvtsi128_si32(_mm256_extracti128_si256::<1>(sss0))); let bytes = pixel.to_le_bytes(); dst_rows[1].get_unchecked_mut(dst_x).0 = [bytes[0], bytes[1], bytes[2]]; let pixel: u32 = i32::cast_unsigned(_mm_cvtsi128_si32(_mm256_extracti128_si256::<0>(sss1))); let bytes = pixel.to_le_bytes(); dst_rows[2].get_unchecked_mut(dst_x).0 = [bytes[0], bytes[1], 
bytes[2]]; let pixel: u32 = i32::cast_unsigned(_mm_cvtsi128_si32(_mm256_extracti128_si256::<1>(sss1))); let bytes = pixel.to_le_bytes(); dst_rows[3].get_unchecked_mut(dst_x).0 = [bytes[0], bytes[1], bytes[2]]; } } /// For safety, it is necessary to ensure the following conditions: /// - bounds.len() == dst_row.len() /// - coeffs.len() == dst_rows.0.len() * window_size /// - max(bound.start + bound.size for bound in bounds) <= src_row.len() /// - precision <= MAX_COEFS_PRECISION #[inline] #[target_feature(enable = "avx2")] unsafe fn horiz_convolution_one_row( src_row: &[U8x3], dst_row: &mut [U8x3], normalizer: &Normalizer16, ) { #[rustfmt::skip] let sh1 = _mm256_set_epi8( -1, -1, -1, -1, -1, 5, -1, 2, -1, 4, -1, 1, -1, 3, -1, 0, -1, -1, -1, -1, -1, 5, -1, 2, -1, 4, -1, 1, -1, 3, -1, 0, ); #[rustfmt::skip] let sh2 = _mm256_set_epi8( 11, 10, 9, 8, 11, 10, 9, 8, 11, 10, 9, 8, 11, 10, 9, 8, 3, 2, 1, 0, 3, 2, 1, 0, 3, 2, 1, 0, 3, 2, 1, 0, ); #[rustfmt::skip] let sh3 = _mm256_set_epi8( -1, -1, -1, -1, -1, 11, -1, 8, -1, 10, -1, 7, -1, 9, -1, 6, -1, -1, -1, -1, -1, 11, -1, 8, -1, 10, -1, 7, -1, 9, -1, 6, ); #[rustfmt::skip] let sh4 = _mm256_set_epi8( 15, 14, 13, 12, 15, 14, 13, 12, 15, 14, 13, 12, 15, 14, 13, 12, 7, 6, 5, 4, 7, 6, 5, 4, 7, 6, 5, 4, 7, 6, 5, 4, ); #[rustfmt::skip] let sh5 = _mm256_set_epi8( -1, -1, -1, -1, -1, 11, -1, 8, -1, 10, -1, 7, -1, 9, -1, 6, -1, -1, -1, -1, -1, 5, -1, 2, -1, 4, -1, 1, -1, 3, -1, 0, ); #[rustfmt::skip] let sh6 = _mm256_set_epi8( 7, 6, 5, 4, 7, 6, 5, 4, 7, 6, 5, 4, 7, 6, 5, 4, 3, 2, 1, 0, 3, 2, 1, 0, 3, 2, 1, 0, 3, 2, 1, 0, ); /* Load 8 bytes from memory into low half of 16-bytes register: |R G B | |R G B | |R G | |00 01 02| |03 04 05| |06 07| 08 09 10 11 12 13 14 15 Ignore 06-16 bytes in 16-bytes register and shuffle other components with converting from u8 into i16: x: |-1 -1| |-1 -1| B: |-1 05| |-1 02| G: |-1 04| |-1 01| R: |-1 03| |-1 00| */ let sh7 = _mm_set_epi8(-1, -1, -1, -1, -1, 5, -1, 2, -1, 4, -1, 1, -1, 3, -1, 0); let src_width = src_row.len(); for (dst_x, chunk) in normalizer.chunks().iter().enumerate() { let x_start = chunk.start as usize; let mut x = x_start; let mut coeffs = chunk.values(); // (16 bytes) / (3 bytes per pixel) = 5 whole pixels + 1 bytes // 4 + 5 = 9 let max_x = src_width.saturating_sub(9); // (32 bytes) / (3 bytes per pixel) = 10 whole pixels + 2 bytes let mut sss = if coeffs.len() < 8 || x >= max_x { _mm_set1_epi32(1 << (PRECISION - 1)) } else { // Lower part will be added to higher, use only half of the error let mut sss256 = _mm256_set1_epi32(1 << (PRECISION - 2)); let coeffs_by_8 = coeffs.chunks_exact(8); for k in coeffs_by_8 { let tmp = simd_utils::loadu_si128(k, 0); let ksource = _mm256_insertf128_si256::<1>(_mm256_castsi128_si256(tmp), tmp); let s_upper = simd_utils::loadu_si128(src_row, x); let s_lower = simd_utils::loadu_si128(src_row, x + 4); let source = _mm256_inserti128_si256::<1>(_mm256_castsi128_si256(s_upper), s_lower); let pix = _mm256_shuffle_epi8(source, sh1); let mmk = _mm256_shuffle_epi8(ksource, sh2); sss256 = _mm256_add_epi32(sss256, _mm256_madd_epi16(pix, mmk)); let pix = _mm256_shuffle_epi8(source, sh3); let mmk = _mm256_shuffle_epi8(ksource, sh4); sss256 = _mm256_add_epi32(sss256, _mm256_madd_epi16(pix, mmk)); x += 8; if x >= max_x { break; } } // (16 bytes) / (3 bytes per pixel) = 5 whole pixels + 1 bytes let max_x = src_width.saturating_sub(5); if x < max_x { let coeffs_by_4 = coeffs[x - x_start..].chunks_exact(4); for k in coeffs_by_4 { let tmp = simd_utils::loadl_epi64(k, 0); let ksource = 
_mm256_insertf128_si256::<1>(_mm256_castsi128_si256(tmp), tmp); let tmp = simd_utils::loadu_si128(src_row, x); let source = _mm256_insertf128_si256::<1>(_mm256_castsi128_si256(tmp), tmp); let pix = _mm256_shuffle_epi8(source, sh5); let mmk = _mm256_shuffle_epi8(ksource, sh6); sss256 = _mm256_add_epi32(sss256, _mm256_madd_epi16(pix, mmk)); x += 4; if x >= max_x { break; } } } _mm_add_epi32( _mm256_extracti128_si256::<0>(sss256), _mm256_extracti128_si256::<1>(sss256), ) }; // (8 bytes) / (3 bytes per pixel) = 2 whole pixels + 2 bytes let max_x = src_width.saturating_sub(2); if x < max_x { let coeffs_by_2 = coeffs[x - x_start..].chunks_exact(2); for k in coeffs_by_2 { let mmk = simd_utils::mm_load_and_clone_i16x2(k); let source = simd_utils::loadl_epi64(src_row, x); let pix = _mm_shuffle_epi8(source, sh7); sss = _mm_add_epi32(sss, _mm_madd_epi16(pix, mmk)); x += 2; if x >= max_x { break; } } } coeffs = coeffs.split_at(x - x_start).1; for &k in coeffs { let pix = simd_utils::mm_cvtepu8_epi32_u8x3(src_row, x); let mmk = _mm_set1_epi32(k as i32); sss = _mm_add_epi32(sss, _mm_madd_epi16(pix, mmk)); x += 1; } sss = _mm_srai_epi32::(sss); sss = _mm_packs_epi32(sss, sss); let pixel: u32 = i32::cast_unsigned(_mm_cvtsi128_si32(_mm_packus_epi16(sss, sss))); let bytes = pixel.to_le_bytes(); dst_row.get_unchecked_mut(dst_x).0 = [bytes[0], bytes[1], bytes[2]]; } } fast_image_resize-5.3.0/src/convolution/u8x3/mod.rs000064400000000000000000000050331046102023000204140ustar 00000000000000use super::{Coefficients, Convolution}; use crate::convolution::optimisations::Normalizer16; use crate::convolution::vertical_u8::vert_convolution_u8; use crate::pixels::U8x3; use crate::{CpuExtensions, ImageView, ImageViewMut}; #[cfg(target_arch = "x86_64")] mod avx2; mod native; #[cfg(target_arch = "aarch64")] mod neon; #[cfg(target_arch = "x86_64")] mod sse4; #[cfg(target_arch = "wasm32")] mod wasm32; type P = U8x3; impl Convolution for P { fn horiz_convolution( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, offset: u32, coeffs: Coefficients, cpu_extensions: CpuExtensions, ) { debug_assert!(src_view.height() - offset >= dst_view.height()); let normalizer = Normalizer16::new(coeffs); let normalizer_ref = &normalizer; try_process_in_threads_h! { horiz_convolution( src_view, dst_view, offset, normalizer_ref, cpu_extensions, ); } } fn vert_convolution( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, offset: u32, coeffs: Coefficients, cpu_extensions: CpuExtensions, ) { debug_assert!(src_view.width() - offset >= dst_view.width()); let normalizer = Normalizer16::new(coeffs); let normalizer_ref = &normalizer; try_process_in_threads_v! 
{
            vert_convolution_u8(
                src_view,
                dst_view,
                offset,
                normalizer_ref,
                cpu_extensions,
            );
        }
    }
}

fn horiz_convolution(
    src_view: &impl ImageView,
    dst_view: &mut impl ImageViewMut,
    offset: u32,
    normalizer: &Normalizer16,
    cpu_extensions: CpuExtensions,
) {
    match cpu_extensions {
        #[cfg(target_arch = "x86_64")]
        CpuExtensions::Avx2 => avx2::horiz_convolution(src_view, dst_view, offset, normalizer),
        #[cfg(target_arch = "x86_64")]
        CpuExtensions::Sse4_1 => sse4::horiz_convolution(src_view, dst_view, offset, normalizer),
        #[cfg(target_arch = "aarch64")]
        CpuExtensions::Neon => neon::horiz_convolution(src_view, dst_view, offset, normalizer),
        #[cfg(target_arch = "wasm32")]
        CpuExtensions::Simd128 => wasm32::horiz_convolution(src_view, dst_view, offset, normalizer),
        _ => native::horiz_convolution(src_view, dst_view, offset, normalizer),
    }
}
fast_image_resize-5.3.0/src/convolution/u8x3/native.rs000064400000000000000000000022251046102023000211230ustar 00000000000000
use crate::convolution::optimisations::Normalizer16;
use crate::pixels::U8x3;
use crate::{ImageView, ImageViewMut};

#[inline(always)]
pub(crate) fn horiz_convolution(
    src_view: &impl ImageView,
    dst_view: &mut impl ImageViewMut,
    offset: u32,
    normalizer: &Normalizer16,
) {
    let precision = normalizer.precision();
    let coefficients = normalizer.chunks();
    let initial = 1i32 << (precision - 1);
    let src_rows = src_view.iter_rows(offset);
    let dst_rows = dst_view.iter_rows_mut(0);
    for (dst_row, src_row) in dst_rows.zip(src_rows) {
        for (coeffs_chunk, dst_pixel) in coefficients.iter().zip(dst_row.iter_mut()) {
            let first_x_src = coeffs_chunk.start as usize;
            let mut ss = [initial; 3];
            let src_pixels = unsafe { src_row.get_unchecked(first_x_src..) };
            for (&k, src_pixel) in coeffs_chunk.values().iter().zip(src_pixels) {
                for (s, c) in ss.iter_mut().zip(src_pixel.0) {
                    *s += c as i32 * (k as i32);
                }
            }
            dst_pixel.0 = ss.map(|v| unsafe { normalizer.clip(v) });
        }
    }
}
fast_image_resize-5.3.0/src/convolution/u8x3/neon.rs000064400000000000000000000237701046102023000206030ustar 00000000000000
use std::arch::aarch64::*;

use crate::convolution::optimisations::{CoefficientsI16Chunk, Normalizer16};
use crate::neon_utils;
use crate::pixels::U8x3;
use crate::{ImageView, ImageViewMut};

#[inline]
pub(crate) fn horiz_convolution(
    src_view: &impl ImageView,
    dst_view: &mut impl ImageViewMut,
    offset: u32,
    normalizer: &Normalizer16,
) {
    let precision = normalizer.precision();
    macro_rules!
call { ($imm8:expr) => {{ horiz_convolution_p::<$imm8>(src_view, dst_view, offset, normalizer); }}; } constify_imm8!(precision, call); } fn horiz_convolution_p( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, offset: u32, normalizer: &Normalizer16, ) { let coefficients_chunks = normalizer.chunks(); let dst_height = dst_view.height(); let src_iter = src_view.iter_4_rows(offset, dst_height + offset); let dst_iter = dst_view.iter_4_rows_mut(); for (src_rows, dst_rows) in src_iter.zip(dst_iter) { unsafe { horiz_convolution_four_rows::(src_rows, dst_rows, &coefficients_chunks); } } let yy = dst_height - dst_height % 4; let src_rows = src_view.iter_rows(yy + offset); let dst_rows = dst_view.iter_rows_mut(yy); for (src_row, dst_row) in src_rows.zip(dst_rows) { unsafe { horiz_convolution_one_row::(src_row, dst_row, &coefficients_chunks); } } } /// For safety, it is necessary to ensure the following conditions: /// - length of all rows in src_rows must be equal /// - length of all rows in dst_rows must be equal /// - coefficients_chunks.len() == dst_rows.0.len() /// - max(chunk.start + chunk.values.len() for chunk in coefficients_chunks) <= src_row.0.len() /// - precision <= MAX_COEFS_PRECISION #[target_feature(enable = "neon")] unsafe fn horiz_convolution_four_rows( src_rows: [&[U8x3]; 4], dst_rows: [&mut [U8x3]; 4], coefficients_chunks: &[CoefficientsI16Chunk], ) { let initial = vdupq_n_s32(1 << (PRECISION - 1)); let zero_u8x8 = vdup_n_u8(0); for (dst_x, coeffs_chunk) in coefficients_chunks.iter().enumerate() { let mut x: usize = coeffs_chunk.start as usize; let mut sss_a = [initial; 4]; let mut coeffs = coeffs_chunk.values(); let coeffs_by_8 = coeffs.chunks_exact(8); coeffs = coeffs_by_8.remainder(); for k in coeffs_by_8 { let coeffs_i16x8 = neon_utils::load_i16x8(k, 0); for i in 0..4 { sss_a[i] = conv_8_pixels(sss_a[i], coeffs_i16x8, src_rows[i], x, zero_u8x8); } x += 8; } let mut coeffs_by_4 = coeffs.chunks_exact(4); coeffs = coeffs_by_4.remainder(); if let Some(k) = coeffs_by_4.next() { let coeffs_i16x4 = neon_utils::load_i16x4(k, 0); for i in 0..4 { sss_a[i] = conv_4_pixels(sss_a[i], coeffs_i16x4, src_rows[i], x, zero_u8x8); } x += 4; } if !coeffs.is_empty() { let mut four_coeffs = [0i16; 4]; four_coeffs .iter_mut() .zip(coeffs) .for_each(|(d, s)| *d = *s); let coeffs_i16x4 = neon_utils::load_i16x4(&four_coeffs, 0); let mut four_pixels = [U8x3::new([0, 0, 0]); 4]; for i in 0..4 { four_pixels .iter_mut() .zip(src_rows[i].get_unchecked(x..)) .for_each(|(d, s)| *d = *s); sss_a[i] = conv_4_pixels(sss_a[i], coeffs_i16x4, &four_pixels, 0, zero_u8x8); } } sss_a[0] = vshrq_n_s32::(sss_a[0]); sss_a[1] = vshrq_n_s32::(sss_a[1]); sss_a[2] = vshrq_n_s32::(sss_a[2]); sss_a[3] = vshrq_n_s32::(sss_a[3]); for i in 0..4 { store_pixel(sss_a[i], dst_rows[i], dst_x, zero_u8x8); } } } /// For safety, it is necessary to ensure the following conditions: /// - bounds.len() == dst_row.len() /// - coefficients_chunks.len() == dst_row.len() /// - max(chunk.start + chunk.values.len() for chunk in coefficients_chunks) <= src_row.len() /// - precision <= MAX_COEFS_PRECISION #[target_feature(enable = "neon")] unsafe fn horiz_convolution_one_row( src_row: &[U8x3], dst_row: &mut [U8x3], coefficients_chunks: &[CoefficientsI16Chunk], ) { let initial = vdupq_n_s32(1 << (PRECISION - 1)); let zero_u8x8 = vdup_n_u8(0); for (dst_x, coeffs_chunk) in coefficients_chunks.iter().enumerate() { let mut x: usize = coeffs_chunk.start as usize; let mut sss = initial; let mut coeffs = coeffs_chunk.values(); let coeffs_by_8 = 
coeffs.chunks_exact(8); coeffs = coeffs_by_8.remainder(); for k in coeffs_by_8 { let coeffs_i16x8 = neon_utils::load_i16x8(k, 0); sss = conv_8_pixels(sss, coeffs_i16x8, src_row, x, zero_u8x8); x += 8; } let mut coeffs_by_4 = coeffs.chunks_exact(4); coeffs = coeffs_by_4.remainder(); if let Some(k) = coeffs_by_4.next() { let coeffs_i16x4 = neon_utils::load_i16x4(k, 0); sss = conv_4_pixels(sss, coeffs_i16x4, src_row, x, zero_u8x8); x += 4; } if !coeffs.is_empty() { let mut four_coeffs = [0i16; 4]; four_coeffs .iter_mut() .zip(coeffs) .for_each(|(d, s)| *d = *s); let coeffs_i16x4 = neon_utils::load_i16x4(&four_coeffs, 0); let mut four_pixels = [U8x3::new([0, 0, 0]); 4]; four_pixels .iter_mut() .zip(src_row.get_unchecked(x..)) .for_each(|(d, s)| *d = *s); sss = conv_4_pixels(sss, coeffs_i16x4, &four_pixels, 0, zero_u8x8); } sss = vshrq_n_s32::(sss); store_pixel(sss, dst_row, dst_x, zero_u8x8); } } #[inline] unsafe fn store_pixel(sss: int32x4_t, dst_row: &mut [U8x3], dst_x: usize, zero_u8x8: uint8x8_t) { let res_i16x4 = vmovn_s32(sss); let res_u8x8 = vqmovun_s16(vcombine_s16(res_i16x4, vreinterpret_s16_u8(zero_u8x8))); let res_u32 = vget_lane_u32::<0>(vreinterpret_u32_u8(res_u8x8)); let rgbx = res_u32.to_le_bytes(); dst_row.get_unchecked_mut(dst_x).0 = [rgbx[0], rgbx[1], rgbx[2]]; } #[inline] unsafe fn conv_8_pixels( mut sss: int32x4_t, coeffs_i16x8: int16x8_t, src_row: &[U8x3], x: usize, zero_u8x8: uint8x8_t, ) -> int32x4_t { let source = neon_utils::load_u8x8x3(src_row, x); // pixel 0 let pix_i16x4 = vreinterpret_s16_u8(vzip1_u8(source.0, zero_u8x8)); let coeff = vdup_laneq_s16::<0>(coeffs_i16x8); sss = vmlal_s16(sss, pix_i16x4, coeff); // pixel 1 let pix_i16x4 = vreinterpret_s16_u8(vtbl1_u8( source.0, vcreate_u8(u64::from_le_bytes([3, 255, 4, 255, 5, 255, 255, 255])), )); let coeff = vdup_laneq_s16::<1>(coeffs_i16x8); sss = vmlal_s16(sss, pix_i16x4, coeff); // pixel 2 let pix_i16x4 = vreinterpret_s16_u8(vtbl2_u8( uint8x8x2_t(source.0, source.1), vcreate_u8(u64::from_le_bytes([6, 255, 7, 255, 8, 255, 255, 255])), )); let coeff = vdup_laneq_s16::<2>(coeffs_i16x8); sss = vmlal_s16(sss, pix_i16x4, coeff); // pixel 3 let pix_i16x4 = vreinterpret_s16_u8(vtbl1_u8( source.1, vcreate_u8(u64::from_le_bytes([1, 255, 2, 255, 3, 255, 255, 255])), )); let coeff = vdup_laneq_s16::<3>(coeffs_i16x8); sss = vmlal_s16(sss, pix_i16x4, coeff); // pixel 4 let pix_i16x4 = vreinterpret_s16_u8(vtbl1_u8( source.1, vcreate_u8(u64::from_le_bytes([4, 255, 5, 255, 6, 255, 255, 255])), )); let coeff = vdup_laneq_s16::<4>(coeffs_i16x8); sss = vmlal_s16(sss, pix_i16x4, coeff); // pixel 5 let pix_i16x4 = vreinterpret_s16_u8(vtbl2_u8( uint8x8x2_t(source.1, source.2), vcreate_u8(u64::from_le_bytes([7, 255, 8, 255, 9, 255, 255, 255])), )); let coeff = vdup_laneq_s16::<5>(coeffs_i16x8); sss = vmlal_s16(sss, pix_i16x4, coeff); // pixel 6 let pix_i16x4 = vreinterpret_s16_u8(vtbl1_u8( source.2, vcreate_u8(u64::from_le_bytes([2, 255, 3, 255, 4, 255, 255, 255])), )); let coeff = vdup_laneq_s16::<6>(coeffs_i16x8); sss = vmlal_s16(sss, pix_i16x4, coeff); // pixel 7 let pix_i16x4 = vreinterpret_s16_u8(vtbl1_u8( source.2, vcreate_u8(u64::from_le_bytes([5, 255, 6, 255, 7, 255, 255, 255])), )); let coeff = vdup_laneq_s16::<7>(coeffs_i16x8); sss = vmlal_s16(sss, pix_i16x4, coeff); sss } #[inline] unsafe fn conv_4_pixels( mut sss: int32x4_t, coeffs_i16x4: int16x4_t, src_row: &[U8x3], x: usize, zero_u8x8: uint8x8_t, ) -> int32x4_t { // |R0 G0 B0 R1 G1 B1 R2 G2| let source0 = neon_utils::load_u8x8(src_row, x); // |G1 B1 R2 G2 B2 R3 G3 B3| let 
source1 = vld1_u8((src_row.get_unchecked(x..).as_ptr() as *const u8).add(4)); // pixel 0 let pix_i16x4 = vreinterpret_s16_u8(vzip1_u8(source0, zero_u8x8)); let coeff = vdup_lane_s16::<0>(coeffs_i16x4); sss = vmlal_s16(sss, pix_i16x4, coeff); // pixel 1 let pix_i16x4 = vreinterpret_s16_u8(vtbl1_u8( source0, vcreate_u8(u64::from_le_bytes([3, 255, 4, 255, 5, 255, 255, 255])), )); let coeff = vdup_lane_s16::<1>(coeffs_i16x4); sss = vmlal_s16(sss, pix_i16x4, coeff); // pixel 2 let pix_i16x4 = vreinterpret_s16_u8(vtbl1_u8( source1, vcreate_u8(u64::from_le_bytes([2, 255, 3, 255, 4, 255, 255, 255])), )); let coeff = vdup_lane_s16::<2>(coeffs_i16x4); sss = vmlal_s16(sss, pix_i16x4, coeff); // pixel 3 let pix_i16x4 = vreinterpret_s16_u8(vtbl1_u8( source1, vcreate_u8(u64::from_le_bytes([5, 255, 6, 255, 7, 255, 255, 255])), )); let coeff = vdup_lane_s16::<3>(coeffs_i16x4); sss = vmlal_s16(sss, pix_i16x4, coeff); sss } fast_image_resize-5.3.0/src/convolution/u8x3/sse4.rs000064400000000000000000000237031046102023000205170ustar 00000000000000use std::arch::x86_64::*; use crate::convolution::optimisations::Normalizer16; use crate::pixels::U8x3; use crate::{simd_utils, ImageView, ImageViewMut}; #[inline] pub(crate) fn horiz_convolution( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, offset: u32, normalizer: &Normalizer16, ) { let precision = normalizer.precision(); macro_rules! call { ($imm8:expr) => {{ horiz_convolution_p::<$imm8>(src_view, dst_view, offset, normalizer); }}; } constify_imm8!(precision, call); } fn horiz_convolution_p( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, offset: u32, normalizer: &Normalizer16, ) { let dst_height = dst_view.height(); let src_iter = src_view.iter_4_rows(offset, dst_height + offset); let dst_iter = dst_view.iter_4_rows_mut(); for (src_rows, dst_rows) in src_iter.zip(dst_iter) { unsafe { horiz_convolution_four_rows::(src_rows, dst_rows, normalizer); } } let yy = dst_height - dst_height % 4; let src_rows = src_view.iter_rows(yy + offset); let dst_rows = dst_view.iter_rows_mut(yy); for (src_row, dst_row) in src_rows.zip(dst_rows) { unsafe { horiz_convolution_one_row::(src_row, dst_row, normalizer); } } } /// For safety, it is necessary to ensure the following conditions: /// - length of all rows in src_rows must be equal /// - length of all rows in dst_rows must be equal /// - coefficients_chunks.len() == dst_rows.0.len() /// - max(chunk.start + chunk.values.len() for chunk in coefficients_chunks) <= src_row.0.len() /// - precision <= MAX_COEFS_PRECISION #[inline] #[target_feature(enable = "sse4.1")] unsafe fn horiz_convolution_four_rows( src_rows: [&[U8x3]; 4], dst_rows: [&mut [U8x3]; 4], normalizer: &Normalizer16, ) { let zero = _mm_setzero_si128(); let initial = _mm_set1_epi32(1 << (PRECISION - 1)); let src_width = src_rows[0].len(); /* |R G B | |R G B | |R G B | |R G B | |R G B | |R | |00 01 02| |03 04 05| |06 07 08| |09 10 11| |12 13 14| |15| Ignore 12-15 bytes in register and shuffle other components with converting from u8 into i16: x: |-1 -1| |-1 -1| B: |-1 05| |-1 02| G: |-1 04| |-1 01| R: |-1 03| |-1 00| */ #[rustfmt::skip] let sh_lo = _mm_set_epi8( -1, -1, -1, -1, -1, 5, -1, 2, -1, 4, -1, 1, -1, 3, -1, 0, ); /* x: |-1 -1| |-1 -1| B: |-1 11| |-1 08| G: |-1 10| |-1 07| R: |-1 09| |-1 06| */ #[rustfmt::skip] let sh_hi = _mm_set_epi8( -1, -1, -1, -1, -1, 11, -1, 8, -1, 10, -1, 7, -1, 9, -1, 6, ); for (dst_x, chunk) in normalizer.chunks().iter().enumerate() { let x_start = chunk.start as usize; let mut x = x_start; let mut sss_a = 
[initial; 4]; let mut coeffs = chunk.values(); // The next block of code will load source pixels by 16 bytes per time. // We must guarantee that this process won't go beyond // the one row of the image. // (16 bytes) / (3 bytes per pixel) = 5 whole pixels + 1 byte let max_x = src_width.saturating_sub(5); if x < max_x { let coeffs_by_4 = coeffs.chunks_exact(4); for k in coeffs_by_4 { let mmk0 = simd_utils::mm_load_and_clone_i16x2(k); let mmk1 = simd_utils::mm_load_and_clone_i16x2(&k[2..]); for i in 0..4 { let source = simd_utils::loadu_si128(src_rows[i], x); let pix = _mm_shuffle_epi8(source, sh_lo); let mut sss = sss_a[i]; sss = _mm_add_epi32(sss, _mm_madd_epi16(pix, mmk0)); let pix = _mm_shuffle_epi8(source, sh_hi); sss_a[i] = _mm_add_epi32(sss, _mm_madd_epi16(pix, mmk1)); } x += 4; if x >= max_x { break; } } } // The next block of code will load source pixels by 8 bytes per time. // We must guarantee that this process won't go beyond // the one row of the image. // (8 bytes) / (3 bytes per pixel) = 2 whole pixels + 2 bytes let max_x = src_width.saturating_sub(2); if x < max_x { let coeffs_by_2 = coeffs[x - x_start..].chunks_exact(2); for k in coeffs_by_2 { let mmk = simd_utils::mm_load_and_clone_i16x2(k); for i in 0..4 { let source = simd_utils::loadl_epi64(src_rows[i], x); let pix = _mm_shuffle_epi8(source, sh_lo); sss_a[i] = _mm_add_epi32(sss_a[i], _mm_madd_epi16(pix, mmk)); } x += 2; if x >= max_x { break; } } } coeffs = coeffs.split_at(x - x_start).1; for &k in coeffs { let mmk = _mm_set1_epi32(k as i32); for i in 0..4 { let pix = simd_utils::mm_cvtepu8_epi32_u8x3(src_rows[i], x); sss_a[i] = _mm_add_epi32(sss_a[i], _mm_madd_epi16(pix, mmk)); } x += 1; } sss_a[0] = _mm_srai_epi32::(sss_a[0]); sss_a[1] = _mm_srai_epi32::(sss_a[1]); sss_a[2] = _mm_srai_epi32::(sss_a[2]); sss_a[3] = _mm_srai_epi32::(sss_a[3]); for i in 0..4 { let sss = _mm_packs_epi32(sss_a[i], zero); let pixel: u32 = i32::cast_unsigned(_mm_cvtsi128_si32(_mm_packus_epi16(sss, zero))); let bytes = pixel.to_le_bytes(); let dst_pixel = dst_rows[i].get_unchecked_mut(dst_x); dst_pixel.0 = [bytes[0], bytes[1], bytes[2]]; } } } /// For safety, it is necessary to ensure the following conditions: /// - bounds.len() == dst_row.len() /// - coeffs.len() == dst_rows.0.len() * window_size /// - max(bound.start + bound.size for bound in bounds) <= src_row.len() /// - precision <= MAX_COEFS_PRECISION #[inline] #[target_feature(enable = "sse4.1")] unsafe fn horiz_convolution_one_row( src_row: &[U8x3], dst_row: &mut [U8x3], normalizer: &Normalizer16, ) { #[rustfmt::skip] let pix_sh1 = _mm_set_epi8( -1, -1, -1, -1, -1, 5, -1, 2, -1, 4, -1, 1, -1, 3, -1, 0, ); #[rustfmt::skip] let coef_sh1 = _mm_set_epi8( 3, 2, 1, 0, 3, 2, 1, 0, 3, 2, 1, 0, 3, 2, 1, 0, ); #[rustfmt::skip] let pix_sh2 = _mm_set_epi8( -1, -1, -1, -1, -1, 11, -1, 8, -1, 10, -1, 7, -1, 9, -1, 6, ); #[rustfmt::skip] let coef_sh2 = _mm_set_epi8( 7, 6, 5, 4, 7, 6, 5, 4, 7, 6, 5, 4, 7, 6, 5, 4, ); /* Load 8 bytes from memory into low half of 16-bytes register: |R G B | |R G B | |R G | |00 01 02| |03 04 05| |06 07| 08 09 10 11 12 13 14 15 Ignore 06-16 bytes in 16-bytes register and shuffle other components with converting from u8 into i16: x: |-1 -1| |-1 -1| B: |-1 05| |-1 02| G: |-1 04| |-1 01| R: |-1 03| |-1 00| */ let src_width = src_row.len(); for (dst_x, chunk) in normalizer.chunks().iter().enumerate() { let x_start = chunk.start as usize; let mut x = x_start; let mut coeffs = chunk.values(); let mut sss = _mm_set1_epi32(1 << (PRECISION - 1)); // The next block of code will 
load source pixels by 16 bytes per time. // We must guarantee that this process won't go beyond // the one row of the image. // (16 bytes) / (3 bytes per pixel) = 5 whole pixels + 1 byte let max_x = src_width.saturating_sub(5); if x < max_x { let coeffs_by_4 = coeffs.chunks_exact(4); for k in coeffs_by_4 { let ksource = simd_utils::loadl_epi64(k, 0); let source = simd_utils::loadu_si128(src_row, x); let pix = _mm_shuffle_epi8(source, pix_sh1); let mmk = _mm_shuffle_epi8(ksource, coef_sh1); sss = _mm_add_epi32(sss, _mm_madd_epi16(pix, mmk)); let pix = _mm_shuffle_epi8(source, pix_sh2); let mmk = _mm_shuffle_epi8(ksource, coef_sh2); sss = _mm_add_epi32(sss, _mm_madd_epi16(pix, mmk)); x += 4; if x >= max_x { break; } } } // The next block of code will load source pixels by 8 bytes per time. // We must guarantee that this process won't go beyond // the one row of the image. // (8 bytes) / (3 bytes per pixel) = 2 whole pixels + 2 bytes let max_x = src_width.saturating_sub(2); if x < max_x { let coeffs_by_2 = coeffs[x - x_start..].chunks_exact(2); for k in coeffs_by_2 { let mmk = simd_utils::mm_load_and_clone_i16x2(k); let source = simd_utils::loadl_epi64(src_row, x); let pix = _mm_shuffle_epi8(source, pix_sh1); sss = _mm_add_epi32(sss, _mm_madd_epi16(pix, mmk)); x += 2; if x >= max_x { break; } } } coeffs = coeffs.split_at(x - x_start).1; for &k in coeffs { let pix = simd_utils::mm_cvtepu8_epi32_u8x3(src_row, x); let mmk = _mm_set1_epi32(k as i32); sss = _mm_add_epi32(sss, _mm_madd_epi16(pix, mmk)); x += 1; } sss = _mm_srai_epi32::(sss); sss = _mm_packs_epi32(sss, sss); let pixel: u32 = i32::cast_unsigned(_mm_cvtsi128_si32(_mm_packus_epi16(sss, sss))); let bytes = pixel.to_le_bytes(); let dst_pixel = dst_row.get_unchecked_mut(dst_x); dst_pixel.0 = [bytes[0], bytes[1], bytes[2]]; } } fast_image_resize-5.3.0/src/convolution/u8x3/wasm32.rs000064400000000000000000000227771046102023000207670ustar 00000000000000use std::arch::wasm32::*; use std::mem::transmute; use crate::convolution::optimisations::Normalizer16; use crate::pixels::U8x3; use crate::wasm32_utils; use crate::{ImageView, ImageViewMut}; #[inline] pub(crate) fn horiz_convolution( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, offset: u32, normalizer: &Normalizer16, ) { let dst_height = dst_view.height(); let src_iter = src_view.iter_4_rows(offset, dst_height + offset); let dst_iter = dst_view.iter_4_rows_mut(); for (src_rows, dst_rows) in src_iter.zip(dst_iter) { unsafe { horiz_convolution_four_rows(src_rows, dst_rows, normalizer); } } let yy = dst_height - dst_height % 4; let src_rows = src_view.iter_rows(yy + offset); let dst_rows = dst_view.iter_rows_mut(yy); for (src_row, dst_row) in src_rows.zip(dst_rows) { unsafe { horiz_convolution_one_row(src_row, dst_row, normalizer); } } } /// For safety, it is necessary to ensure the following conditions: /// - length of all rows in src_rows must be equal /// - length of all rows in dst_rows must be equal /// - coefficients_chunks.len() == dst_rows.0.len() /// - max(chunk.start + chunk.values.len() for chunk in coefficients_chunks) <= src_row.0.len() /// - precision <= MAX_COEFS_PRECISION #[inline] #[target_feature(enable = "simd128")] unsafe fn horiz_convolution_four_rows( src_rows: [&[U8x3]; 4], dst_rows: [&mut [U8x3]; 4], normalizer: &Normalizer16, ) { const ZERO: v128 = i64x2(0, 0); let precision = normalizer.precision() as u32; let initial = i32x4_splat(1 << (precision - 1)); let src_width = src_rows[0].len(); /* |R G B | |R G B | |R G B | |R G B | |R G B | |R | |00 01 02| 
|03 04 05| |06 07 08| |09 10 11| |12 13 14| |15| Ignore 12-15 bytes in register and shuffle other components with converting from u8 into i16: x: |-1 -1| |-1 -1| B: |-1 05| |-1 02| G: |-1 04| |-1 01| R: |-1 03| |-1 00| */ #[rustfmt::skip] const SH_LO: v128 = i8x16( 0, -1, 3, -1, 1, -1, 4, -1, 2, -1, 5, -1, -1, -1, -1, -1 ); /* x: |-1 -1| |-1 -1| B: |-1 11| |-1 08| G: |-1 10| |-1 07| R: |-1 09| |-1 06| */ #[rustfmt::skip] const SH_HI: v128 = i8x16( 6, -1, 9, -1, 7, -1, 10, -1, 8, -1, 11, -1, -1, -1, -1, -1 ); for (dst_x, coeffs_chunk) in normalizer.chunks().iter().enumerate() { let x_start = coeffs_chunk.start as usize; let mut x = x_start; let mut sss_a = [initial; 4]; let mut coeffs = coeffs_chunk.values(); // Next block of code will be load source pixels by 16 bytes per time. // We must guarantee what this process will not go beyond // the one row of image. // (16 bytes) / (3 bytes per pixel) = 5 whole pixels + 1 byte let max_x = src_width.saturating_sub(5); if x < max_x { let coeffs_by_4 = coeffs.chunks_exact(4); for k in coeffs_by_4 { let mmk0 = wasm32_utils::ptr_i16_to_set1_i32(k, 0); let mmk1 = wasm32_utils::ptr_i16_to_set1_i32(k, 2); for i in 0..4 { let source = wasm32_utils::load_v128(src_rows[i], x); let pix = i8x16_swizzle(source, SH_LO); let mut sss = sss_a[i]; sss = i32x4_add(sss, i32x4_dot_i16x8(pix, mmk0)); let pix = i8x16_swizzle(source, SH_HI); sss_a[i] = i32x4_add(sss, i32x4_dot_i16x8(pix, mmk1)); } x += 4; if x >= max_x { break; } } } // Next block of code will be load source pixels by 8 bytes per time. // We must guarantee what this process will not go beyond // the one row of image. // (8 bytes) / (3 bytes per pixel) = 2 whole pixels + 2 bytes let max_x = src_width.saturating_sub(2); if x < max_x { let coeffs_by_2 = coeffs[x - x_start..].chunks_exact(2); for k in coeffs_by_2 { let mmk = wasm32_utils::ptr_i16_to_set1_i32(k, 0); for i in 0..4 { let source = wasm32_utils::loadl_i64(src_rows[i], x); let pix = i8x16_swizzle(source, SH_LO); sss_a[i] = i32x4_add(sss_a[i], i32x4_dot_i16x8(pix, mmk)); } x += 2; if x >= max_x { break; } } } coeffs = coeffs.split_at(x - x_start).1; for &k in coeffs { let mmk = i32x4_splat(k as i32); for i in 0..4 { let pix = wasm32_utils::i32x4_extend_low_ptr_u8x3(src_rows[i], x); sss_a[i] = i32x4_add(sss_a[i], i32x4_dot_i16x8(pix, mmk)); } x += 1; } sss_a[0] = i32x4_shr(sss_a[0], precision); sss_a[1] = i32x4_shr(sss_a[1], precision); sss_a[2] = i32x4_shr(sss_a[2], precision); sss_a[3] = i32x4_shr(sss_a[3], precision); for i in 0..4 { let sss = i16x8_narrow_i32x4(sss_a[i], ZERO); let pixel: u32 = transmute(i32x4_extract_lane::<0>(u8x16_narrow_i16x8(sss, ZERO))); let bytes = pixel.to_le_bytes(); dst_rows[i].get_unchecked_mut(dst_x).0 = [bytes[0], bytes[1], bytes[2]]; } } } /// For safety, it is necessary to ensure the following conditions: /// - bounds.len() == dst_row.len() /// - coeffs.len() == dst_rows.0.len() * window_size /// - max(bound.start + bound.size for bound in bounds) <= src_row.len() /// - precision <= MAX_COEFS_PRECISION #[inline] #[target_feature(enable = "simd128")] unsafe fn horiz_convolution_one_row( src_row: &[U8x3], dst_row: &mut [U8x3], normalizer: &Normalizer16, ) { #[rustfmt::skip] const PIX_SH1: v128 = i8x16( 0, -1, 3, -1, 1, -1, 4, -1, 2, -1, 5, -1, -1, -1, -1, -1 ); #[rustfmt::skip] const COEF_SH1: v128 = i8x16( 0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3 ); #[rustfmt::skip] const PIX_SH2: v128 = i8x16( 6, -1, 9, -1, 7, -1, 10, -1, 8, -1, 11, -1, -1, -1, -1, -1 ); #[rustfmt::skip] const COEF_SH2: v128 = i8x16( 4, 5, 
6, 7, 4, 5, 6, 7, 4, 5, 6, 7, 4, 5, 6, 7 ); /* Load 8 bytes from memory into low half of 16-bytes register: |R G B | |R G B | |R G | |00 01 02| |03 04 05| |06 07| 08 09 10 11 12 13 14 15 Ignore 06-16 bytes in 16-bytes register and shuffle other components with converting from u8 into i16: x: |-1 -1| |-1 -1| B: |-1 05| |-1 02| G: |-1 04| |-1 01| R: |-1 03| |-1 00| */ let precision = normalizer.precision() as u32; let src_width = src_row.len(); let initial = i32x4_splat(1 << (precision - 1)); for (dst_x, coeffs_chunk) in normalizer.chunks().iter().enumerate() { let x_start = coeffs_chunk.start as usize; let mut x = x_start; let mut coeffs = coeffs_chunk.values(); let mut sss = initial; // Next block of code will be load source pixels by 16 bytes per time. // We must guarantee what this process will not go beyond // the one row of image. // (16 bytes) / (3 bytes per pixel) = 5 whole pixels + 1 bytes let max_x = src_width.saturating_sub(5); if x < max_x { let coeffs_by_4 = coeffs.chunks_exact(4); for k in coeffs_by_4 { let ksource = wasm32_utils::loadl_i64(k, 0); let source = wasm32_utils::load_v128(src_row, x); let pix = i8x16_swizzle(source, PIX_SH1); let mmk = i8x16_swizzle(ksource, COEF_SH1); sss = i32x4_add(sss, i32x4_dot_i16x8(pix, mmk)); let pix = i8x16_swizzle(source, PIX_SH2); let mmk = i8x16_swizzle(ksource, COEF_SH2); sss = i32x4_add(sss, i32x4_dot_i16x8(pix, mmk)); x += 4; if x >= max_x { break; } } } // Next block of code will be load source pixels by 8 bytes per time. // We must guarantee what this process will not go beyond // the one row of image. // (8 bytes) / (3 bytes per pixel) = 2 whole pixels + 2 bytes let max_x = src_width.saturating_sub(2); if x < max_x { let coeffs_by_2 = coeffs[x - x_start..].chunks_exact(2); for k in coeffs_by_2 { let mmk = wasm32_utils::ptr_i16_to_set1_i32(k, 0); let source = wasm32_utils::loadl_i64(src_row, x); let pix = i8x16_swizzle(source, PIX_SH1); sss = i32x4_add(sss, i32x4_dot_i16x8(pix, mmk)); x += 2; if x >= max_x { break; } } } coeffs = coeffs.split_at(x - x_start).1; for &k in coeffs { let pix = wasm32_utils::i32x4_extend_low_ptr_u8x3(src_row, x); let mmk = i32x4_splat(k as i32); sss = i32x4_add(sss, i32x4_dot_i16x8(pix, mmk)); x += 1; } sss = i32x4_shr(sss, precision); sss = i16x8_narrow_i32x4(sss, sss); let pixel: u32 = transmute(i32x4_extract_lane::<0>(u8x16_narrow_i16x8(sss, sss))); let bytes = pixel.to_le_bytes(); dst_row.get_unchecked_mut(dst_x).0 = [bytes[0], bytes[1], bytes[2]]; } } fast_image_resize-5.3.0/src/convolution/u8x4/avx2.rs000064400000000000000000000261751046102023000205300ustar 00000000000000use std::arch::x86_64::*; use std::mem::transmute; use crate::convolution::optimisations::Normalizer16; use crate::pixels::U8x4; use crate::{simd_utils, ImageView, ImageViewMut}; // This code is based on C-implementation from Pillow-SIMD package for Python // https://github.com/uploadcare/pillow-simd #[inline] pub(crate) fn horiz_convolution( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, offset: u32, normalizer: &Normalizer16, ) { let precision = normalizer.precision(); macro_rules! 
call { ($imm8:expr) => {{ horiz_convolution_p::<$imm8>(src_view, dst_view, offset, normalizer); }}; } constify_imm8!(precision, call); } fn horiz_convolution_p( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, offset: u32, normalizer: &Normalizer16, ) { let dst_height = dst_view.height(); let src_iter = src_view.iter_4_rows(offset, dst_height + offset); let dst_iter = dst_view.iter_4_rows_mut(); for (src_rows, dst_rows) in src_iter.zip(dst_iter) { unsafe { horiz_convolution_four_rows::(src_rows, dst_rows, normalizer); } } let yy = dst_height - dst_height % 4; let src_rows = src_view.iter_rows(yy + offset); let dst_rows = dst_view.iter_rows_mut(yy); for (src_row, dst_row) in src_rows.zip(dst_rows) { unsafe { horiz_convolution_one_row::(src_row, dst_row, normalizer); } } } /// For safety, it is necessary to ensure the following conditions: /// - length of all rows in src_rows must be equal /// - length of all rows in dst_rows must be equal /// - coefficients_chunks.len() == dst_rows.0.len() /// - max(chunk.start + chunk.values.len() for chunk in coefficients_chunks) <= src_row.0.len() /// - precision <= MAX_COEFS_PRECISION #[inline] #[target_feature(enable = "avx2")] unsafe fn horiz_convolution_four_rows( src_rows: [&[U8x4]; 4], dst_rows: [&mut [U8x4]; 4], normalizer: &Normalizer16, ) { let zero = _mm256_setzero_si256(); let initial = _mm256_set1_epi32(1 << (PRECISION - 1)); #[rustfmt::skip] let sh1 = _mm256_set_epi8( -1, 7, -1, 3, -1, 6, -1, 2, -1, 5, -1, 1, -1, 4, -1, 0, -1, 7, -1, 3, -1, 6, -1, 2, -1, 5, -1, 1, -1, 4, -1, 0, ); #[rustfmt::skip] let sh2 = _mm256_set_epi8( -1, 15, -1, 11, -1, 14, -1, 10, -1, 13, -1, 9, -1, 12, -1, 8, -1, 15, -1, 11, -1, 14, -1, 10, -1, 13, -1, 9, -1, 12, -1, 8, ); let coefficients_chunks = normalizer.chunks(); for (dst_x, coeffs_chunk) in coefficients_chunks.iter().enumerate() { let mut x = coeffs_chunk.start as usize; let mut sss0 = initial; let mut sss1 = initial; let coeffs = coeffs_chunk.values(); let coeffs_by_4 = coeffs.chunks_exact(4); let reminder1 = coeffs_by_4.remainder(); for k in coeffs_by_4 { let mmk0 = simd_utils::mm256_load_and_clone_i16x2(k); let mmk1 = simd_utils::mm256_load_and_clone_i16x2(&k[2..]); let source = _mm256_inserti128_si256::<1>( _mm256_castsi128_si256(simd_utils::loadu_si128(src_rows[0], x)), simd_utils::loadu_si128(src_rows[1], x), ); let pix = _mm256_shuffle_epi8(source, sh1); sss0 = _mm256_add_epi32(sss0, _mm256_madd_epi16(pix, mmk0)); let pix = _mm256_shuffle_epi8(source, sh2); sss0 = _mm256_add_epi32(sss0, _mm256_madd_epi16(pix, mmk1)); let source = _mm256_inserti128_si256::<1>( _mm256_castsi128_si256(simd_utils::loadu_si128(src_rows[2], x)), simd_utils::loadu_si128(src_rows[3], x), ); let pix = _mm256_shuffle_epi8(source, sh1); sss1 = _mm256_add_epi32(sss1, _mm256_madd_epi16(pix, mmk0)); let pix = _mm256_shuffle_epi8(source, sh2); sss1 = _mm256_add_epi32(sss1, _mm256_madd_epi16(pix, mmk1)); x += 4; } let coeffs_by_2 = reminder1.chunks_exact(2); let reminder2 = coeffs_by_2.remainder(); for k in coeffs_by_2 { let mmk = simd_utils::mm256_load_and_clone_i16x2(k); let source = _mm256_inserti128_si256::<1>( _mm256_castsi128_si256(simd_utils::loadl_epi64(src_rows[0], x)), simd_utils::loadl_epi64(src_rows[1], x), ); let pix = _mm256_shuffle_epi8(source, sh1); sss0 = _mm256_add_epi32(sss0, _mm256_madd_epi16(pix, mmk)); let source = _mm256_inserti128_si256::<1>( _mm256_castsi128_si256(simd_utils::loadl_epi64(src_rows[2], x)), simd_utils::loadl_epi64(src_rows[3], x), ); let pix = _mm256_shuffle_epi8(source, sh1); sss1 = 
_mm256_add_epi32(sss1, _mm256_madd_epi16(pix, mmk)); x += 2; } if let Some(&k) = reminder2.first() { // [16] xx k0 xx k0 xx k0 xx k0 xx k0 xx k0 xx k0 xx k0 let mmk = _mm256_set1_epi32(k as i32); // [16] xx a0 xx b0 xx g0 xx r0 xx a0 xx b0 xx g0 xx r0 let pix = _mm256_inserti128_si256::<1>( _mm256_castsi128_si256(simd_utils::mm_cvtepu8_epi32(src_rows[0], x)), simd_utils::mm_cvtepu8_epi32(src_rows[1], x), ); sss0 = _mm256_add_epi32(sss0, _mm256_madd_epi16(pix, mmk)); let pix = _mm256_inserti128_si256::<1>( _mm256_castsi128_si256(simd_utils::mm_cvtepu8_epi32(src_rows[2], x)), simd_utils::mm_cvtepu8_epi32(src_rows[3], x), ); sss1 = _mm256_add_epi32(sss1, _mm256_madd_epi16(pix, mmk)); } sss0 = _mm256_srai_epi32::(sss0); sss1 = _mm256_srai_epi32::(sss1); sss0 = _mm256_packs_epi32(sss0, zero); sss1 = _mm256_packs_epi32(sss1, zero); sss0 = _mm256_packus_epi16(sss0, zero); sss1 = _mm256_packus_epi16(sss1, zero); *dst_rows[0].get_unchecked_mut(dst_x) = transmute::(_mm_cvtsi128_si32(_mm256_extracti128_si256::<0>(sss0))); *dst_rows[1].get_unchecked_mut(dst_x) = transmute::(_mm_cvtsi128_si32(_mm256_extracti128_si256::<1>(sss0))); *dst_rows[2].get_unchecked_mut(dst_x) = transmute::(_mm_cvtsi128_si32(_mm256_extracti128_si256::<0>(sss1))); *dst_rows[3].get_unchecked_mut(dst_x) = transmute::(_mm_cvtsi128_si32(_mm256_extracti128_si256::<1>(sss1))); } } /// For safety, it is necessary to ensure the following conditions: /// - bounds.len() == dst_row.len() /// - coeffs.len() == dst_rows.0.len() * window_size /// - max(bound.start + bound.size for bound in bounds) <= src_row.len() /// - precision <= MAX_COEFS_PRECISION #[inline] #[target_feature(enable = "avx2")] unsafe fn horiz_convolution_one_row( src_row: &[U8x4], dst_row: &mut [U8x4], normalizer: &Normalizer16, ) { #[rustfmt::skip] let sh1 = _mm256_set_epi8( -1, 7, -1, 3, -1, 6, -1, 2, -1, 5, -1, 1, -1, 4, -1, 0, -1, 7, -1, 3, -1, 6, -1, 2, -1, 5, -1, 1, -1, 4, -1, 0, ); #[rustfmt::skip] let sh2 = _mm256_set_epi8( 11, 10, 9, 8, 11, 10, 9, 8, 11, 10, 9, 8, 11, 10, 9, 8, 3, 2, 1, 0, 3, 2, 1, 0, 3, 2, 1, 0, 3, 2, 1, 0, ); #[rustfmt::skip] let sh3 = _mm256_set_epi8( -1, 15, -1, 11, -1, 14, -1, 10, -1, 13, -1, 9, -1, 12, -1, 8, -1, 15, -1, 11, -1, 14, -1, 10, -1, 13, -1, 9, -1, 12, -1, 8, ); #[rustfmt::skip] let sh4 = _mm256_set_epi8( 15, 14, 13, 12, 15, 14, 13, 12, 15, 14, 13, 12, 15, 14, 13, 12, 7, 6, 5, 4, 7, 6, 5, 4, 7, 6, 5, 4, 7, 6, 5, 4, ); #[rustfmt::skip] let sh5 = _mm256_set_epi8( -1, 15, -1, 11, -1, 14, -1, 10, -1, 13, -1, 9, -1, 12, -1, 8, -1, 7, -1, 3, -1, 6, -1, 2, -1, 5, -1, 1, -1, 4, -1, 0, ); #[rustfmt::skip] let sh6 = _mm256_set_epi8( 7, 6, 5, 4, 7, 6, 5, 4, 7, 6, 5, 4, 7, 6, 5, 4, 3, 2, 1, 0, 3, 2, 1, 0, 3, 2, 1, 0, 3, 2, 1, 0, ); let sh7 = _mm_set_epi8(-1, 7, -1, 3, -1, 6, -1, 2, -1, 5, -1, 1, -1, 4, -1, 0); let coefficients_chunks = normalizer.chunks(); for (dst_x, coeffs_chunk) in coefficients_chunks.iter().enumerate() { let mut x = coeffs_chunk.start as usize; let mut coeffs = coeffs_chunk.values(); let mut sss = if coeffs.len() < 8 { _mm_set1_epi32(1 << (PRECISION - 1)) } else { // Lower part will be added to higher, use only half of the error let mut sss256 = _mm256_set1_epi32(1 << (PRECISION - 2)); let coeffs_by_8 = coeffs.chunks_exact(8); let reminder1 = coeffs_by_8.remainder(); for k in coeffs_by_8 { let tmp = simd_utils::loadu_si128(k, 0); let ksource = _mm256_insertf128_si256::<1>(_mm256_castsi128_si256(tmp), tmp); let source = simd_utils::loadu_si256(src_row, x); let pix = _mm256_shuffle_epi8(source, sh1); let mmk = 
_mm256_shuffle_epi8(ksource, sh2); sss256 = _mm256_add_epi32(sss256, _mm256_madd_epi16(pix, mmk)); let pix = _mm256_shuffle_epi8(source, sh3); let mmk = _mm256_shuffle_epi8(ksource, sh4); sss256 = _mm256_add_epi32(sss256, _mm256_madd_epi16(pix, mmk)); x += 8; } let coeffs_by_4 = reminder1.chunks_exact(4); coeffs = coeffs_by_4.remainder(); for k in coeffs_by_4 { let tmp = simd_utils::loadl_epi64(k, 0); let ksource = _mm256_insertf128_si256::<1>(_mm256_castsi128_si256(tmp), tmp); let tmp = simd_utils::loadu_si128(src_row, x); let source = _mm256_insertf128_si256::<1>(_mm256_castsi128_si256(tmp), tmp); let pix = _mm256_shuffle_epi8(source, sh5); let mmk = _mm256_shuffle_epi8(ksource, sh6); sss256 = _mm256_add_epi32(sss256, _mm256_madd_epi16(pix, mmk)); x += 4; } _mm_add_epi32( _mm256_extracti128_si256::<0>(sss256), _mm256_extracti128_si256::<1>(sss256), ) }; let coeffs_by_2 = coeffs.chunks_exact(2); let reminder1 = coeffs_by_2.remainder(); for k in coeffs_by_2 { let mmk = simd_utils::mm_load_and_clone_i16x2(k); let source = simd_utils::loadl_epi64(src_row, x); let pix = _mm_shuffle_epi8(source, sh7); sss = _mm_add_epi32(sss, _mm_madd_epi16(pix, mmk)); x += 2 } if let Some(&k) = reminder1.first() { let pix = simd_utils::mm_cvtepu8_epi32(src_row, x); let mmk = _mm_set1_epi32(k as i32); sss = _mm_add_epi32(sss, _mm_madd_epi16(pix, mmk)); } sss = _mm_srai_epi32::(sss); sss = _mm_packs_epi32(sss, sss); *dst_row.get_unchecked_mut(dst_x) = transmute::(_mm_cvtsi128_si32(_mm_packus_epi16(sss, sss))); } } fast_image_resize-5.3.0/src/convolution/u8x4/mod.rs000064400000000000000000000050331046102023000204150ustar 00000000000000use super::{Coefficients, Convolution}; use crate::convolution::optimisations::Normalizer16; use crate::convolution::vertical_u8::vert_convolution_u8; use crate::pixels::U8x4; use crate::{CpuExtensions, ImageView, ImageViewMut}; #[cfg(target_arch = "x86_64")] mod avx2; mod native; #[cfg(target_arch = "aarch64")] mod neon; #[cfg(target_arch = "x86_64")] mod sse4; #[cfg(target_arch = "wasm32")] mod wasm32; type P = U8x4; impl Convolution for P { fn horiz_convolution( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, offset: u32, coeffs: Coefficients, cpu_extensions: CpuExtensions, ) { debug_assert!(src_view.height() - offset >= dst_view.height()); let normalizer = Normalizer16::new(coeffs); let normalizer_ref = &normalizer; try_process_in_threads_h! { horiz_convolution( src_view, dst_view, offset, normalizer_ref, cpu_extensions, ); } } fn vert_convolution( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, offset: u32, coeffs: Coefficients, cpu_extensions: CpuExtensions, ) { debug_assert!(src_view.width() - offset >= dst_view.width()); let normalizer = Normalizer16::new(coeffs); let normalizer_ref = &normalizer; try_process_in_threads_v! 
{ vert_convolution_u8( src_view, dst_view, offset, normalizer_ref, cpu_extensions, ); } } } fn horiz_convolution( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, offset: u32, normalizer: &Normalizer16, cpu_extensions: CpuExtensions, ) { match cpu_extensions { #[cfg(target_arch = "x86_64")] CpuExtensions::Avx2 => avx2::horiz_convolution(src_view, dst_view, offset, normalizer), #[cfg(target_arch = "x86_64")] CpuExtensions::Sse4_1 => sse4::horiz_convolution(src_view, dst_view, offset, normalizer), #[cfg(target_arch = "aarch64")] CpuExtensions::Neon => neon::horiz_convolution(src_view, dst_view, offset, normalizer), #[cfg(target_arch = "wasm32")] CpuExtensions::Simd128 => wasm32::horiz_convolution(src_view, dst_view, offset, normalizer), _ => native::horiz_convolution(src_view, dst_view, offset, normalizer), } } fast_image_resize-5.3.0/src/convolution/u8x4/native.rs000064400000000000000000000022471046102023000211300ustar 00000000000000use crate::convolution::optimisations::Normalizer16; use crate::image_view::{ImageView, ImageViewMut}; use crate::pixels::U8x4; #[inline(always)] pub(crate) fn horiz_convolution( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, offset: u32, normalizer: &Normalizer16, ) { let precision = normalizer.precision(); let initial = 1 << (precision - 1); let coefficients = normalizer.chunks(); let src_rows = src_view.iter_rows(offset); let dst_rows = dst_view.iter_rows_mut(0); for (src_row, dst_row) in src_rows.zip(dst_rows) { for (coeffs_chunk, dst_pixel) in coefficients.iter().zip(dst_row.iter_mut()) { let first_x_src = coeffs_chunk.start as usize; let mut ss = [initial; 4]; let src_pixels = unsafe { src_row.get_unchecked(first_x_src..) }; for (&k, &src_pixel) in coeffs_chunk.values().iter().zip(src_pixels) { for (i, s) in ss.iter_mut().enumerate() { *s += src_pixel.0[i] as i32 * (k as i32); } } dst_pixel.0 = ss.map(|v| unsafe { normalizer.clip(v) }); } } } fast_image_resize-5.3.0/src/convolution/u8x4/neon.rs000064400000000000000000000266221046102023000206040ustar 00000000000000use std::arch::aarch64::*; use crate::convolution::optimisations::{CoefficientsI16Chunk, Normalizer16}; use crate::neon_utils; use crate::pixels::U8x4; use crate::{ImageView, ImageViewMut}; #[inline] pub(crate) fn horiz_convolution( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, offset: u32, normalizer: &Normalizer16, ) { let precision = normalizer.precision(); macro_rules! 
call { ($imm8:expr) => {{ horiz_convolution_p::<$imm8>(src_view, dst_view, offset, normalizer); }}; } constify_imm8!(precision, call); } fn horiz_convolution_p( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, offset: u32, normalizer: &Normalizer16, ) { let coefficients_chunks = normalizer.chunks(); let dst_height = dst_view.height(); let src_iter = src_view.iter_4_rows(offset, dst_height + offset); let dst_iter = dst_view.iter_4_rows_mut(); for (src_rows, dst_rows) in src_iter.zip(dst_iter) { unsafe { horiz_convolution_four_rows::(src_rows, dst_rows, &coefficients_chunks); } } let yy = dst_height - dst_height % 4; let src_rows = src_view.iter_rows(yy + offset); let dst_rows = dst_view.iter_rows_mut(yy); for (src_row, dst_row) in src_rows.zip(dst_rows) { unsafe { horiz_convolution_one_row::(src_row, dst_row, &coefficients_chunks); } } } /// For safety, it is necessary to ensure the following conditions: /// - length of all rows in src_rows must be equal /// - length of all rows in dst_rows must be equal /// - coefficients_chunks.len() == dst_rows.0.len() /// - max(chunk.start + chunk.values.len() for chunk in coefficients_chunks) <= src_row.0.len() /// - precision <= MAX_COEFS_PRECISION #[target_feature(enable = "neon")] unsafe fn horiz_convolution_four_rows( src_rows: [&[U8x4]; 4], dst_rows: [&mut [U8x4]; 4], coefficients_chunks: &[CoefficientsI16Chunk], ) { let initial = vdupq_n_s32(1 << (PRECISION - 1)); let zero_u8x16 = vdupq_n_u8(0); let zero_u8x8 = vdup_n_u8(0); for (dst_x, coeffs_chunk) in coefficients_chunks.iter().enumerate() { let mut x: usize = coeffs_chunk.start as usize; let mut sss_a = [initial; 4]; let mut coeffs = coeffs_chunk.values(); let coeffs_by_8 = coeffs.chunks_exact(8); coeffs = coeffs_by_8.remainder(); for k in coeffs_by_8 { let coeffs_i16x8 = neon_utils::load_i16x8(k, 0); let coeff0 = vdup_laneq_s16::<0>(coeffs_i16x8); let coeff1 = vdup_laneq_s16::<1>(coeffs_i16x8); let coeff2 = vdup_laneq_s16::<2>(coeffs_i16x8); let coeff3 = vdup_laneq_s16::<3>(coeffs_i16x8); let coeff4 = vdup_laneq_s16::<4>(coeffs_i16x8); let coeff5 = vdup_laneq_s16::<5>(coeffs_i16x8); let coeff6 = vdup_laneq_s16::<6>(coeffs_i16x8); let coeff7 = vdup_laneq_s16::<7>(coeffs_i16x8); for i in 0..4 { let source = neon_utils::load_u8x16(src_rows[i], x); let mut sss = sss_a[i]; let source_i16 = vreinterpretq_s16_u8(vzip1q_u8(source, zero_u8x16)); let pix = vget_low_s16(source_i16); sss = vmlal_s16(sss, pix, coeff0); let pix = vget_high_s16(source_i16); sss = vmlal_s16(sss, pix, coeff1); let source_i16 = vreinterpretq_s16_u8(vzip2q_u8(source, zero_u8x16)); let pix = vget_low_s16(source_i16); sss = vmlal_s16(sss, pix, coeff2); let pix = vget_high_s16(source_i16); sss = vmlal_s16(sss, pix, coeff3); let source = neon_utils::load_u8x16(src_rows[i], x + 4); let source_i16 = vreinterpretq_s16_u8(vzip1q_u8(source, zero_u8x16)); let pix = vget_low_s16(source_i16); sss = vmlal_s16(sss, pix, coeff4); let pix = vget_high_s16(source_i16); sss = vmlal_s16(sss, pix, coeff5); let source_i16 = vreinterpretq_s16_u8(vzip2q_u8(source, zero_u8x16)); let pix = vget_low_s16(source_i16); sss = vmlal_s16(sss, pix, coeff6); let pix = vget_high_s16(source_i16); sss = vmlal_s16(sss, pix, coeff7); sss_a[i] = sss; } x += 8; } let coeffs_by_4 = coeffs.chunks_exact(4); coeffs = coeffs_by_4.remainder(); for k in coeffs_by_4 { let coeffs_i16x4 = neon_utils::load_i16x4(k, 0); let coeff0 = vdup_lane_s16::<0>(coeffs_i16x4); let coeff1 = vdup_lane_s16::<1>(coeffs_i16x4); let coeff2 = vdup_lane_s16::<2>(coeffs_i16x4); let coeff3 
= vdup_lane_s16::<3>(coeffs_i16x4); for i in 0..4 { let source = neon_utils::load_u8x16(src_rows[i], x); let mut sss = sss_a[i]; let source_i16 = vreinterpretq_s16_u8(vzip1q_u8(source, zero_u8x16)); let pix = vget_low_s16(source_i16); sss = vmlal_s16(sss, pix, coeff0); let pix = vget_high_s16(source_i16); sss = vmlal_s16(sss, pix, coeff1); let source_i16 = vreinterpretq_s16_u8(vzip2q_u8(source, zero_u8x16)); let pix = vget_low_s16(source_i16); sss = vmlal_s16(sss, pix, coeff2); let pix = vget_high_s16(source_i16); sss = vmlal_s16(sss, pix, coeff3); sss_a[i] = sss; } x += 4; } let coeffs_by_2 = coeffs.chunks_exact(2); coeffs = coeffs_by_2.remainder(); for k in coeffs_by_2 { let coeff0 = vdup_n_s16(k[0]); let coeff1 = vdup_n_s16(k[1]); for i in 0..4 { let source = neon_utils::load_u8x8(src_rows[i], x); let mut sss = sss_a[i]; let pix = vreinterpret_s16_u8(vzip1_u8(source, zero_u8x8)); sss = vmlal_s16(sss, pix, coeff0); let pix = vreinterpret_s16_u8(vzip2_u8(source, zero_u8x8)); sss = vmlal_s16(sss, pix, coeff1); sss_a[i] = sss; } x += 2 } if let Some(&k) = coeffs.first() { let coeff = vdup_n_s16(k); for i in 0..4 { let source = neon_utils::load_u8x4(src_rows[i], x); let pix = vreinterpret_s16_u8(vzip1_u8(source, zero_u8x8)); sss_a[i] = vmlal_s16(sss_a[i], pix, coeff); } } sss_a[0] = vshrq_n_s32::(sss_a[0]); sss_a[1] = vshrq_n_s32::(sss_a[1]); sss_a[2] = vshrq_n_s32::(sss_a[2]); sss_a[3] = vshrq_n_s32::(sss_a[3]); for i in 0..4 { let s = vqmovun_s16(vcombine_s16(vqmovn_s32(sss_a[i]), vdup_n_s16(0))); let s = vreinterpret_u32_u8(s); dst_rows[i].get_unchecked_mut(dst_x).0 = vget_lane_u32::<0>(s).to_le_bytes(); } } } /// For safety, it is necessary to ensure the following conditions: /// - bounds.len() == dst_row.len() /// - coefficients_chunks.len() == dst_row.len() /// - max(chunk.start + chunk.values.len() for chunk in coefficients_chunks) <= src_row.len() /// - precision <= MAX_COEFS_PRECISION #[target_feature(enable = "neon")] unsafe fn horiz_convolution_one_row( src_row: &[U8x4], dst_row: &mut [U8x4], coefficients_chunks: &[CoefficientsI16Chunk], ) { let initial = vdupq_n_s32(1 << (PRECISION - 1)); let zero_u8x16 = vdupq_n_u8(0); let zero_u8x8 = vdup_n_u8(0); for (dst_x, coeffs_chunk) in coefficients_chunks.iter().enumerate() { let mut x: usize = coeffs_chunk.start as usize; let mut sss = initial; let mut coeffs = coeffs_chunk.values(); let coeffs_by_8 = coeffs.chunks_exact(8); coeffs = coeffs_by_8.remainder(); for k in coeffs_by_8 { let coeffs_i16x8 = neon_utils::load_i16x8(k, 0); let source = neon_utils::load_u8x16(src_row, x); let source_i16 = vreinterpretq_s16_u8(vzip1q_u8(source, zero_u8x16)); let pix = vget_low_s16(source_i16); sss = vmlal_s16(sss, pix, vdup_laneq_s16::<0>(coeffs_i16x8)); let pix = vget_high_s16(source_i16); sss = vmlal_s16(sss, pix, vdup_laneq_s16::<1>(coeffs_i16x8)); let source_i16 = vreinterpretq_s16_u8(vzip2q_u8(source, zero_u8x16)); let pix = vget_low_s16(source_i16); sss = vmlal_s16(sss, pix, vdup_laneq_s16::<2>(coeffs_i16x8)); let pix = vget_high_s16(source_i16); sss = vmlal_s16(sss, pix, vdup_laneq_s16::<3>(coeffs_i16x8)); let source = neon_utils::load_u8x16(src_row, x + 4); let source_i16 = vreinterpretq_s16_u8(vzip1q_u8(source, zero_u8x16)); let pix = vget_low_s16(source_i16); sss = vmlal_s16(sss, pix, vdup_laneq_s16::<4>(coeffs_i16x8)); let pix = vget_high_s16(source_i16); sss = vmlal_s16(sss, pix, vdup_laneq_s16::<5>(coeffs_i16x8)); let source_i16 = vreinterpretq_s16_u8(vzip2q_u8(source, zero_u8x16)); let pix = vget_low_s16(source_i16); sss = 
vmlal_s16(sss, pix, vdup_laneq_s16::<6>(coeffs_i16x8)); let pix = vget_high_s16(source_i16); sss = vmlal_s16(sss, pix, vdup_laneq_s16::<7>(coeffs_i16x8)); x += 8; } let coeffs_by_4 = coeffs.chunks_exact(4); coeffs = coeffs_by_4.remainder(); for k in coeffs_by_4 { let coeffs_i16x4 = neon_utils::load_i16x4(k, 0); let source = neon_utils::load_u8x16(src_row, x); let source_i16 = vreinterpretq_s16_u8(vzip1q_u8(source, zero_u8x16)); let pix = vget_low_s16(source_i16); sss = vmlal_s16(sss, pix, vdup_lane_s16::<0>(coeffs_i16x4)); let pix = vget_high_s16(source_i16); sss = vmlal_s16(sss, pix, vdup_lane_s16::<1>(coeffs_i16x4)); let source_i16 = vreinterpretq_s16_u8(vzip2q_u8(source, zero_u8x16)); let pix = vget_low_s16(source_i16); sss = vmlal_s16(sss, pix, vdup_lane_s16::<2>(coeffs_i16x4)); let pix = vget_high_s16(source_i16); sss = vmlal_s16(sss, pix, vdup_lane_s16::<3>(coeffs_i16x4)); x += 4; } let coeffs_by_2 = coeffs.chunks_exact(2); coeffs = coeffs_by_2.remainder(); for k in coeffs_by_2 { let source = neon_utils::load_u8x8(src_row, x); let pix = vreinterpret_s16_u8(vzip1_u8(source, zero_u8x8)); sss = vmlal_s16(sss, pix, vdup_n_s16(k[0])); let pix = vreinterpret_s16_u8(vzip2_u8(source, zero_u8x8)); sss = vmlal_s16(sss, pix, vdup_n_s16(k[1])); x += 2 } if let Some(&k) = coeffs.first() { let source = neon_utils::load_u8x4(src_row, x); let pix = vreinterpret_s16_u8(vzip1_u8(source, zero_u8x8)); sss = vmlal_s16(sss, pix, vdup_n_s16(k)); } sss = vshrq_n_s32::(sss); let s = vqmovun_s16(vcombine_s16(vqmovn_s32(sss), vdup_n_s16(0))); let s = vreinterpret_u32_u8(s); dst_row.get_unchecked_mut(dst_x).0 = vget_lane_u32::<0>(s).to_le_bytes(); } } fast_image_resize-5.3.0/src/convolution/u8x4/sse4.rs000064400000000000000000000252221046102023000205160ustar 00000000000000use std::arch::x86_64::*; use std::mem::transmute; use crate::convolution::optimisations::Normalizer16; use crate::pixels::U8x4; use crate::{simd_utils, ImageView, ImageViewMut}; // This code is based on C-implementation from Pillow-SIMD package for Python // https://github.com/uploadcare/pillow-simd #[inline] pub(crate) fn horiz_convolution( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, offset: u32, normalizer: &Normalizer16, ) { let precision = normalizer.precision(); macro_rules! 
call { ($imm8:expr) => {{ horiz_convolution_p::<$imm8>(src_view, dst_view, offset, normalizer); }}; } constify_imm8!(precision, call); } fn horiz_convolution_p( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, offset: u32, normalizer: &Normalizer16, ) { let dst_height = dst_view.height(); let src_iter = src_view.iter_4_rows(offset, dst_height + offset); let dst_iter = dst_view.iter_4_rows_mut(); for (src_rows, dst_rows) in src_iter.zip(dst_iter) { unsafe { horiz_convolution_four_rows::(src_rows, dst_rows, normalizer); } } let yy = dst_height - dst_height % 4; let src_rows = src_view.iter_rows(yy + offset); let dst_rows = dst_view.iter_rows_mut(yy); for (src_row, dst_row) in src_rows.zip(dst_rows) { unsafe { horiz_convolution_one_row::(src_row, dst_row, normalizer); } } } /// For safety, it is necessary to ensure the following conditions: /// - length of all rows in src_rows must be equal /// - length of all rows in dst_rows must be equal /// - coefficients_chunks.len() == dst_rows.0.len() /// - max(chunk.start + chunk.values.len() for chunk in coefficients_chunks) <= src_row.0.len() /// - precision <= MAX_COEFS_PRECISION #[target_feature(enable = "sse4.1")] unsafe fn horiz_convolution_four_rows( src_rows: [&[U8x4]; 4], dst_rows: [&mut [U8x4]; 4], normalizer: &Normalizer16, ) { let initial = _mm_set1_epi32(1 << (PRECISION - 1)); let mask_lo = _mm_set_epi8(-1, 7, -1, 3, -1, 6, -1, 2, -1, 5, -1, 1, -1, 4, -1, 0); let mask_hi = _mm_set_epi8(-1, 15, -1, 11, -1, 14, -1, 10, -1, 13, -1, 9, -1, 12, -1, 8); let mask = _mm_set_epi8(-1, 7, -1, 3, -1, 6, -1, 2, -1, 5, -1, 1, -1, 4, -1, 0); for (dst_x, chunk) in normalizer.chunks().iter().enumerate() { let mut x = chunk.start as usize; let mut sss0 = initial; let mut sss1 = initial; let mut sss2 = initial; let mut sss3 = initial; let coeffs_by_4 = chunk.values().chunks_exact(4); let reminder1 = coeffs_by_4.remainder(); for k in coeffs_by_4 { let mmk_lo = simd_utils::mm_load_and_clone_i16x2(k); let mmk_hi = simd_utils::mm_load_and_clone_i16x2(&k[2..]); // [8] a3 b3 g3 r3 a2 b2 g2 r2 a1 b1 g1 r1 a0 b0 g0 r0 let mut source = simd_utils::loadu_si128(src_rows[0], x); // [16] a1 a0 b1 b0 g1 g0 r1 r0 let mut pix = _mm_shuffle_epi8(source, mask_lo); sss0 = _mm_add_epi32(sss0, _mm_madd_epi16(pix, mmk_lo)); // [16] a3 a2 b3 b2 g3 g2 r3 r2 pix = _mm_shuffle_epi8(source, mask_hi); sss0 = _mm_add_epi32(sss0, _mm_madd_epi16(pix, mmk_hi)); source = simd_utils::loadu_si128(src_rows[1], x); pix = _mm_shuffle_epi8(source, mask_lo); sss1 = _mm_add_epi32(sss1, _mm_madd_epi16(pix, mmk_lo)); pix = _mm_shuffle_epi8(source, mask_hi); sss1 = _mm_add_epi32(sss1, _mm_madd_epi16(pix, mmk_hi)); source = simd_utils::loadu_si128(src_rows[2], x); pix = _mm_shuffle_epi8(source, mask_lo); sss2 = _mm_add_epi32(sss2, _mm_madd_epi16(pix, mmk_lo)); pix = _mm_shuffle_epi8(source, mask_hi); sss2 = _mm_add_epi32(sss2, _mm_madd_epi16(pix, mmk_hi)); source = simd_utils::loadu_si128(src_rows[3], x); pix = _mm_shuffle_epi8(source, mask_lo); sss3 = _mm_add_epi32(sss3, _mm_madd_epi16(pix, mmk_lo)); pix = _mm_shuffle_epi8(source, mask_hi); sss3 = _mm_add_epi32(sss3, _mm_madd_epi16(pix, mmk_hi)); x += 4; } let coeffs_by_2 = reminder1.chunks_exact(2); let reminder2 = coeffs_by_2.remainder(); for k in coeffs_by_2 { // [16] k1 k0 k1 k0 k1 k0 k1 k0 let mmk = simd_utils::mm_load_and_clone_i16x2(k); // [8] x x x x x x x x a1 b1 g1 r1 a0 b0 g0 r0 let mut pix = simd_utils::loadl_epi64(src_rows[0], x); // [16] a1 a0 b1 b0 g1 g0 r1 r0 pix = _mm_shuffle_epi8(pix, mask); sss0 = _mm_add_epi32(sss0, 
_mm_madd_epi16(pix, mmk)); pix = simd_utils::loadl_epi64(src_rows[1], x); pix = _mm_shuffle_epi8(pix, mask); sss1 = _mm_add_epi32(sss1, _mm_madd_epi16(pix, mmk)); pix = simd_utils::loadl_epi64(src_rows[2], x); pix = _mm_shuffle_epi8(pix, mask); sss2 = _mm_add_epi32(sss2, _mm_madd_epi16(pix, mmk)); pix = simd_utils::loadl_epi64(src_rows[3], x); pix = _mm_shuffle_epi8(pix, mask); sss3 = _mm_add_epi32(sss3, _mm_madd_epi16(pix, mmk)); x += 2; } if let Some(&k) = reminder2.first() { // [16] xx k0 xx k0 xx k0 xx k0 let mmk = _mm_set1_epi32(k as i32); // [16] xx a0 xx b0 xx g0 xx r0 let mut pix = simd_utils::mm_cvtepu8_epi32(src_rows[0], x); sss0 = _mm_add_epi32(sss0, _mm_madd_epi16(pix, mmk)); pix = simd_utils::mm_cvtepu8_epi32(src_rows[1], x); sss1 = _mm_add_epi32(sss1, _mm_madd_epi16(pix, mmk)); pix = simd_utils::mm_cvtepu8_epi32(src_rows[2], x); sss2 = _mm_add_epi32(sss2, _mm_madd_epi16(pix, mmk)); pix = simd_utils::mm_cvtepu8_epi32(src_rows[3], x); sss3 = _mm_add_epi32(sss3, _mm_madd_epi16(pix, mmk)); } sss0 = _mm_srai_epi32::(sss0); sss1 = _mm_srai_epi32::(sss1); sss2 = _mm_srai_epi32::(sss2); sss3 = _mm_srai_epi32::(sss3); sss0 = _mm_packs_epi32(sss0, sss0); sss1 = _mm_packs_epi32(sss1, sss1); sss2 = _mm_packs_epi32(sss2, sss2); sss3 = _mm_packs_epi32(sss3, sss3); *dst_rows[0].get_unchecked_mut(dst_x) = transmute::(_mm_cvtsi128_si32(_mm_packus_epi16(sss0, sss0))); *dst_rows[1].get_unchecked_mut(dst_x) = transmute::(_mm_cvtsi128_si32(_mm_packus_epi16(sss1, sss1))); *dst_rows[2].get_unchecked_mut(dst_x) = transmute::(_mm_cvtsi128_si32(_mm_packus_epi16(sss2, sss2))); *dst_rows[3].get_unchecked_mut(dst_x) = transmute::(_mm_cvtsi128_si32(_mm_packus_epi16(sss3, sss3))); } } /// For safety, it is necessary to ensure the following conditions: /// - bounds.len() == dst_row.len() /// - coefficients_chunks.len() == dst_row.len() /// - max(chunk.start + chunk.values.len() for chunk in coefficients_chunks) <= src_row.len() /// - precision <= MAX_COEFS_PRECISION #[target_feature(enable = "sse4.1")] unsafe fn horiz_convolution_one_row( src_row: &[U8x4], dst_row: &mut [U8x4], normalizer: &Normalizer16, ) { let initial = _mm_set1_epi32(1 << (PRECISION - 1)); let sh1 = _mm_set_epi8(-1, 11, -1, 3, -1, 10, -1, 2, -1, 9, -1, 1, -1, 8, -1, 0); let sh2 = _mm_set_epi8(5, 4, 1, 0, 5, 4, 1, 0, 5, 4, 1, 0, 5, 4, 1, 0); let sh3 = _mm_set_epi8(-1, 15, -1, 7, -1, 14, -1, 6, -1, 13, -1, 5, -1, 12, -1, 4); let sh4 = _mm_set_epi8(7, 6, 3, 2, 7, 6, 3, 2, 7, 6, 3, 2, 7, 6, 3, 2); let sh5 = _mm_set_epi8(13, 12, 9, 8, 13, 12, 9, 8, 13, 12, 9, 8, 13, 12, 9, 8); let sh6 = _mm_set_epi8( 15, 14, 11, 10, 15, 14, 11, 10, 15, 14, 11, 10, 15, 14, 11, 10, ); let sh7 = _mm_set_epi8(-1, 7, -1, 3, -1, 6, -1, 2, -1, 5, -1, 1, -1, 4, -1, 0); for (dst_x, chunk) in normalizer.chunks().iter().enumerate() { let mut x = chunk.start as usize; let mut sss = initial; let coeffs_by_8 = chunk.values().chunks_exact(8); let reminder8 = coeffs_by_8.remainder(); for k in coeffs_by_8 { let ksource = simd_utils::loadu_si128(k, 0); let mut source = simd_utils::loadu_si128(src_row, x); let mut pix = _mm_shuffle_epi8(source, sh1); let mut mmk = _mm_shuffle_epi8(ksource, sh2); sss = _mm_add_epi32(sss, _mm_madd_epi16(pix, mmk)); pix = _mm_shuffle_epi8(source, sh3); mmk = _mm_shuffle_epi8(ksource, sh4); sss = _mm_add_epi32(sss, _mm_madd_epi16(pix, mmk)); source = simd_utils::loadu_si128(src_row, x + 4); pix = _mm_shuffle_epi8(source, sh1); mmk = _mm_shuffle_epi8(ksource, sh5); sss = _mm_add_epi32(sss, _mm_madd_epi16(pix, mmk)); pix = _mm_shuffle_epi8(source, 
sh3); mmk = _mm_shuffle_epi8(ksource, sh6); sss = _mm_add_epi32(sss, _mm_madd_epi16(pix, mmk)); x += 8; } let coeffs_by_4 = reminder8.chunks_exact(4); let reminder4 = coeffs_by_4.remainder(); for k in coeffs_by_4 { let source = simd_utils::loadu_si128(src_row, x); let ksource = simd_utils::loadl_epi64(k, 0); let mut pix = _mm_shuffle_epi8(source, sh1); let mut mmk = _mm_shuffle_epi8(ksource, sh2); sss = _mm_add_epi32(sss, _mm_madd_epi16(pix, mmk)); pix = _mm_shuffle_epi8(source, sh3); mmk = _mm_shuffle_epi8(ksource, sh4); sss = _mm_add_epi32(sss, _mm_madd_epi16(pix, mmk)); x += 4; } let coeffs_by_2 = reminder4.chunks_exact(2); let reminder2 = coeffs_by_2.remainder(); for k in coeffs_by_2 { let mmk = simd_utils::mm_load_and_clone_i16x2(k); let source = simd_utils::loadl_epi64(src_row, x); let pix = _mm_shuffle_epi8(source, sh7); sss = _mm_add_epi32(sss, _mm_madd_epi16(pix, mmk)); x += 2 } if let Some(&k) = reminder2.first() { let pix = simd_utils::mm_cvtepu8_epi32(src_row, x); let mmk = _mm_set1_epi32(k as i32); sss = _mm_add_epi32(sss, _mm_madd_epi16(pix, mmk)); } sss = _mm_srai_epi32::(sss); sss = _mm_packs_epi32(sss, sss); *dst_row.get_unchecked_mut(dst_x) = transmute::(_mm_cvtsi128_si32(_mm_packus_epi16(sss, sss))); } } fast_image_resize-5.3.0/src/convolution/u8x4/wasm32.rs000064400000000000000000000241371046102023000207600ustar 00000000000000use std::arch::wasm32::*; use std::mem::transmute; use crate::convolution::optimisations::Normalizer16; use crate::pixels::U8x4; use crate::wasm32_utils; use crate::{ImageView, ImageViewMut}; #[inline] pub(crate) fn horiz_convolution( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, offset: u32, normalizer: &Normalizer16, ) { let dst_height = dst_view.height(); let src_iter = src_view.iter_4_rows(offset, dst_height + offset); let dst_iter = dst_view.iter_4_rows_mut(); for (src_rows, dst_rows) in src_iter.zip(dst_iter) { unsafe { horiz_convolution_four_rows(src_rows, dst_rows, normalizer); } } let yy = dst_height - dst_height % 4; let src_rows = src_view.iter_rows(yy + offset); let dst_rows = dst_view.iter_rows_mut(yy); for (src_row, dst_row) in src_rows.zip(dst_rows) { unsafe { horiz_convolution_one_row(src_row, dst_row, normalizer); } } } /// For safety, it is necessary to ensure the following conditions: /// - length of all rows in src_rows must be equal /// - length of all rows in dst_rows must be equal /// - coefficients_chunks.len() == dst_rows.0.len() /// - max(chunk.start + chunk.values.len() for chunk in coefficients_chunks) <= src_row.0.len() /// - precision <= MAX_COEFS_PRECISION #[target_feature(enable = "simd128")] unsafe fn horiz_convolution_four_rows( src_rows: [&[U8x4]; 4], dst_rows: [&mut [U8x4]; 4], normalizer: &Normalizer16, ) { let precision = normalizer.precision() as u32; let initial = i32x4_splat(1 << (precision - 1)); const MASK_LO: v128 = i8x16(0, -1, 4, -1, 1, -1, 5, -1, 2, -1, 6, -1, 3, -1, 7, -1); const MASK_HI: v128 = i8x16(8, -1, 12, -1, 9, -1, 13, -1, 10, -1, 14, -1, 11, -1, 15, -1); const MASK: v128 = i8x16(0, -1, 4, -1, 1, -1, 5, -1, 2, -1, 6, -1, 3, -1, 7, -1); for (dst_x, coeffs_chunk) in normalizer.chunks().iter().enumerate() { let mut x: usize = coeffs_chunk.start as usize; let mut sss0 = initial; let mut sss1 = initial; let mut sss2 = initial; let mut sss3 = initial; let coeffs = coeffs_chunk.values(); let coeffs_by_4 = coeffs.chunks_exact(4); let reminder1 = coeffs_by_4.remainder(); for k in coeffs_by_4 { let mmk_lo = wasm32_utils::ptr_i16_to_set1_i32(k, 0); let mmk_hi = 
wasm32_utils::ptr_i16_to_set1_i32(k, 2); // [8] a3 b3 g3 r3 a2 b2 g2 r2 a1 b1 g1 r1 a0 b0 g0 r0 let mut source = wasm32_utils::load_v128(src_rows[0], x); // [16] a1 a0 b1 b0 g1 g0 r1 r0 let mut pix = i8x16_swizzle(source, MASK_LO); sss0 = i32x4_add(sss0, i32x4_dot_i16x8(pix, mmk_lo)); // [16] a3 a2 b3 b2 g3 g2 r3 r2 pix = i8x16_swizzle(source, MASK_HI); sss0 = i32x4_add(sss0, i32x4_dot_i16x8(pix, mmk_hi)); source = wasm32_utils::load_v128(src_rows[1], x); pix = i8x16_swizzle(source, MASK_LO); sss1 = i32x4_add(sss1, i32x4_dot_i16x8(pix, mmk_lo)); pix = i8x16_swizzle(source, MASK_HI); sss1 = i32x4_add(sss1, i32x4_dot_i16x8(pix, mmk_hi)); source = wasm32_utils::load_v128(src_rows[2], x); pix = i8x16_swizzle(source, MASK_LO); sss2 = i32x4_add(sss2, i32x4_dot_i16x8(pix, mmk_lo)); pix = i8x16_swizzle(source, MASK_HI); sss2 = i32x4_add(sss2, i32x4_dot_i16x8(pix, mmk_hi)); source = wasm32_utils::load_v128(src_rows[3], x); pix = i8x16_swizzle(source, MASK_LO); sss3 = i32x4_add(sss3, i32x4_dot_i16x8(pix, mmk_lo)); pix = i8x16_swizzle(source, MASK_HI); sss3 = i32x4_add(sss3, i32x4_dot_i16x8(pix, mmk_hi)); x += 4; } let coeffs_by_2 = reminder1.chunks_exact(2); let reminder2 = coeffs_by_2.remainder(); for k in coeffs_by_2 { // [16] k1 k0 k1 k0 k1 k0 k1 k0 let mmk = wasm32_utils::ptr_i16_to_set1_i32(k, 0); // [8] x x x x x x x x a1 b1 g1 r1 a0 b0 g0 r0 let mut pix = wasm32_utils::loadl_i64(src_rows[0], x); // [16] a1 a0 b1 b0 g1 g0 r1 r0 pix = i8x16_swizzle(pix, MASK); sss0 = i32x4_add(sss0, i32x4_dot_i16x8(pix, mmk)); pix = wasm32_utils::loadl_i64(src_rows[1], x); pix = i8x16_swizzle(pix, MASK); sss1 = i32x4_add(sss1, i32x4_dot_i16x8(pix, mmk)); pix = wasm32_utils::loadl_i64(src_rows[2], x); pix = i8x16_swizzle(pix, MASK); sss2 = i32x4_add(sss2, i32x4_dot_i16x8(pix, mmk)); pix = wasm32_utils::loadl_i64(src_rows[3], x); pix = i8x16_swizzle(pix, MASK); sss3 = i32x4_add(sss3, i32x4_dot_i16x8(pix, mmk)); x += 2; } if let Some(&k) = reminder2.first() { // [16] xx k0 xx k0 xx k0 xx k0 let mmk = i32x4_splat(k as i32); // [16] xx a0 xx b0 xx g0 xx r0 let mut pix = wasm32_utils::i32x4_extend_low_ptr_u8x4(src_rows[0], x); sss0 = i32x4_add(sss0, i32x4_dot_i16x8(pix, mmk)); pix = wasm32_utils::i32x4_extend_low_ptr_u8x4(src_rows[1], x); sss1 = i32x4_add(sss1, i32x4_dot_i16x8(pix, mmk)); pix = wasm32_utils::i32x4_extend_low_ptr_u8x4(src_rows[2], x); sss2 = i32x4_add(sss2, i32x4_dot_i16x8(pix, mmk)); pix = wasm32_utils::i32x4_extend_low_ptr_u8x4(src_rows[3], x); sss3 = i32x4_add(sss3, i32x4_dot_i16x8(pix, mmk)); } sss0 = i32x4_shr(sss0, precision); sss1 = i32x4_shr(sss1, precision); sss2 = i32x4_shr(sss2, precision); sss3 = i32x4_shr(sss3, precision); sss0 = i16x8_narrow_i32x4(sss0, sss0); sss1 = i16x8_narrow_i32x4(sss1, sss1); sss2 = i16x8_narrow_i32x4(sss2, sss2); sss3 = i16x8_narrow_i32x4(sss3, sss3); *dst_rows[0].get_unchecked_mut(dst_x) = transmute(i32x4_extract_lane::<0>(u8x16_narrow_i16x8(sss0, sss0))); *dst_rows[1].get_unchecked_mut(dst_x) = transmute(i32x4_extract_lane::<0>(u8x16_narrow_i16x8(sss1, sss1))); *dst_rows[2].get_unchecked_mut(dst_x) = transmute(i32x4_extract_lane::<0>(u8x16_narrow_i16x8(sss2, sss2))); *dst_rows[3].get_unchecked_mut(dst_x) = transmute(i32x4_extract_lane::<0>(u8x16_narrow_i16x8(sss3, sss3))); } } /// For safety, it is necessary to ensure the following conditions: /// - bounds.len() == dst_row.len() /// - coefficients_chunks.len() == dst_row.len() /// - max(chunk.start + chunk.values.len() for chunk in coefficients_chunks) <= src_row.len() /// - precision <= MAX_COEFS_PRECISION 
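// Note: unlike the AVX2/SSE4.1/NEON variants of this routine, the wasm32
// implementation is not monomorphized over the coefficient precision.
// `i32x4_shr` accepts a runtime shift amount, so the helpers simply read
// `normalizer.precision()` instead of going through `constify_imm8!`.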
#[target_feature(enable = "simd128")] unsafe fn horiz_convolution_one_row( src_row: &[U8x4], dst_row: &mut [U8x4], normalizer: &Normalizer16, ) { let precision = normalizer.precision() as u32; let initial = i32x4_splat(1 << (precision - 1)); const SH1: v128 = i8x16(0, -1, 8, -1, 1, -1, 9, -1, 2, -1, 10, -1, 3, -1, 11, -1); const SH2: v128 = i8x16(0, 1, 4, 5, 0, 1, 4, 5, 0, 1, 4, 5, 0, 1, 4, 5); const SH3: v128 = i8x16(4, -1, 12, -1, 5, -1, 13, -1, 6, -1, 14, -1, 7, -1, 15, -1); const SH4: v128 = i8x16(2, 3, 6, 7, 2, 3, 6, 7, 2, 3, 6, 7, 2, 3, 6, 7); const SH5: v128 = i8x16(8, 9, 12, 13, 8, 9, 12, 13, 8, 9, 12, 13, 8, 9, 12, 13); const SH6: v128 = i8x16( 10, 11, 14, 15, 10, 11, 14, 15, 10, 11, 14, 15, 10, 11, 14, 15, ); const SH7: v128 = i8x16(0, -1, 4, -1, 1, -1, 5, -1, 2, -1, 6, -1, 3, -1, 7, -1); for (dst_x, coeffs_chunk) in normalizer.chunks().iter().enumerate() { let mut x: usize = coeffs_chunk.start as usize; let mut sss = initial; let coeffs_by_8 = coeffs_chunk.values().chunks_exact(8); let reminder8 = coeffs_by_8.remainder(); for k in coeffs_by_8 { let ksource = wasm32_utils::load_v128(k, 0); let mut source = wasm32_utils::load_v128(src_row, x); let mut pix = i8x16_swizzle(source, SH1); let mut mmk = i8x16_swizzle(ksource, SH2); sss = i32x4_add(sss, i32x4_dot_i16x8(pix, mmk)); pix = i8x16_swizzle(source, SH3); mmk = i8x16_swizzle(ksource, SH4); sss = i32x4_add(sss, i32x4_dot_i16x8(pix, mmk)); source = wasm32_utils::load_v128(src_row, x + 4); pix = i8x16_swizzle(source, SH1); mmk = i8x16_swizzle(ksource, SH5); sss = i32x4_add(sss, i32x4_dot_i16x8(pix, mmk)); pix = i8x16_swizzle(source, SH3); mmk = i8x16_swizzle(ksource, SH6); sss = i32x4_add(sss, i32x4_dot_i16x8(pix, mmk)); x += 8; } let coeffs_by_4 = reminder8.chunks_exact(4); let reminder4 = coeffs_by_4.remainder(); for k in coeffs_by_4 { let source = wasm32_utils::load_v128(src_row, x); let ksource = wasm32_utils::loadl_i64(k, 0); let mut pix = i8x16_swizzle(source, SH1); let mut mmk = i8x16_swizzle(ksource, SH2); sss = i32x4_add(sss, i32x4_dot_i16x8(pix, mmk)); pix = i8x16_swizzle(source, SH3); mmk = i8x16_swizzle(ksource, SH4); sss = i32x4_add(sss, i32x4_dot_i16x8(pix, mmk)); x += 4; } let coeffs_by_2 = reminder4.chunks_exact(2); let reminder2 = coeffs_by_2.remainder(); for k in coeffs_by_2 { let mmk = wasm32_utils::ptr_i16_to_set1_i32(k, 0); let source = wasm32_utils::loadl_i64(src_row, x); let pix = i8x16_swizzle(source, SH7); sss = i32x4_add(sss, i32x4_dot_i16x8(pix, mmk)); x += 2 } if let Some(&k) = reminder2.first() { let pix = wasm32_utils::i32x4_extend_low_ptr_u8x4(src_row, x); let mmk = i32x4_splat(k as i32); sss = i32x4_add(sss, i32x4_dot_i16x8(pix, mmk)); } sss = i32x4_shr(sss, precision); sss = i16x8_narrow_i32x4(sss, sss); *dst_row.get_unchecked_mut(dst_x) = transmute(i32x4_extract_lane::<0>(u8x16_narrow_i16x8(sss, sss))); } } fast_image_resize-5.3.0/src/convolution/vertical_f32/avx2.rs000064400000000000000000000101551046102023000221720ustar 00000000000000use std::arch::x86_64::*; use crate::convolution::{Coefficients, CoefficientsChunk}; use crate::pixels::InnerPixel; use crate::{simd_utils, ImageView, ImageViewMut}; use super::native; pub(crate) fn vert_convolution( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, offset: u32, coeffs: &Coefficients, ) where T: InnerPixel, { let coefficients_chunks = coeffs.get_chunks(); let src_x = offset as usize * T::count_of_components(); let dst_rows = dst_view.iter_rows_mut(0); for (dst_row, coeffs_chunk) in dst_rows.zip(coefficients_chunks) { unsafe { 
vert_convolution_into_one_row_f32(src_view, dst_row, src_x, coeffs_chunk); } } } #[target_feature(enable = "avx2")] unsafe fn vert_convolution_into_one_row_f32>( src_view: &impl ImageView, dst_row: &mut [T], mut src_x: usize, coeffs_chunk: CoefficientsChunk, ) { let mut c_buf = [0f64; 4]; let mut dst_f32 = T::components_mut(dst_row); let mut dst_chunks = dst_f32.chunks_exact_mut(32); for dst_chunk in &mut dst_chunks { multiply_components_of_rows::<_, 8>(src_view, src_x, coeffs_chunk, dst_chunk, &mut c_buf); src_x += 32; } dst_f32 = dst_chunks.into_remainder(); dst_chunks = dst_f32.chunks_exact_mut(16); for dst_chunk in &mut dst_chunks { multiply_components_of_rows::<_, 4>(src_view, src_x, coeffs_chunk, dst_chunk, &mut c_buf); src_x += 16; } dst_f32 = dst_chunks.into_remainder(); dst_chunks = dst_f32.chunks_exact_mut(8); for dst_chunk in &mut dst_chunks { multiply_components_of_rows::<_, 2>(src_view, src_x, coeffs_chunk, dst_chunk, &mut c_buf); src_x += 8; } dst_f32 = dst_chunks.into_remainder(); if !dst_f32.is_empty() { let y_start = coeffs_chunk.start; let coeffs = coeffs_chunk.values; native::convolution_by_f32(src_view, dst_f32, src_x, y_start, coeffs); } } #[inline] #[target_feature(enable = "avx2")] unsafe fn multiply_components_of_rows, const SUMS_COUNT: usize>( src_view: &impl ImageView, src_x: usize, coeffs_chunk: CoefficientsChunk, dst_chunk: &mut [f32], c_buf: &mut [f64; 4], ) { let mut sums = [_mm256_set1_pd(0.); SUMS_COUNT]; let y_start = coeffs_chunk.start; let mut coeffs = coeffs_chunk.values; let mut y: u32 = 0; let max_rows = coeffs.len() as u32; let coeffs_2 = coeffs.chunks_exact(2); coeffs = coeffs_2.remainder(); for (src_rows, two_coeffs) in src_view.iter_2_rows(y_start, max_rows).zip(coeffs_2) { let src_rows = src_rows.map(|row| T::components(row).get_unchecked(src_x..)); for (&coeff, src_row) in two_coeffs.iter().zip(src_rows) { multiply_components_of_row(&mut sums, coeff, src_row); } y += 2; } if let Some(&coeff) = coeffs.first() { if let Some(s_row) = src_view.iter_rows(y_start + y).next() { let src_row = T::components(s_row).get_unchecked(src_x..); multiply_components_of_row(&mut sums, coeff, src_row); } } let mut dst_ptr = dst_chunk.as_mut_ptr(); for sum in sums { _mm256_storeu_pd(c_buf.as_mut_ptr(), sum); for &v in c_buf.iter() { *dst_ptr = v as f32; dst_ptr = dst_ptr.add(1); } } } #[inline] #[target_feature(enable = "avx2")] unsafe fn multiply_components_of_row( sums: &mut [__m256d; SUMS_COUNT], coeff: f64, src_row: &[f32], ) { let coeff_f64x4 = _mm256_set1_pd(coeff); let mut i = 0; while i < SUMS_COUNT { let comp07_f32x8 = simd_utils::loadu_ps256(src_row, i * 4); let comp03_f64x4 = _mm256_cvtps_pd(_mm256_extractf128_ps::<0>(comp07_f32x8)); sums[i] = _mm256_add_pd(sums[i], _mm256_mul_pd(comp03_f64x4, coeff_f64x4)); i += 1; let comp47_f64x4 = _mm256_cvtps_pd(_mm256_extractf128_ps::<1>(comp07_f32x8)); sums[i] = _mm256_add_pd(sums[i], _mm256_mul_pd(comp47_f64x4, coeff_f64x4)); i += 1; } } fast_image_resize-5.3.0/src/convolution/vertical_f32/mod.rs000064400000000000000000000026051046102023000220720ustar 00000000000000use crate::convolution::Coefficients; use crate::pixels::InnerPixel; use crate::{CpuExtensions, ImageView, ImageViewMut}; #[cfg(target_arch = "x86_64")] pub(crate) mod avx2; pub(crate) mod native; // #[cfg(target_arch = "aarch64")] // mod neon; #[cfg(target_arch = "x86_64")] pub(crate) mod sse4; // #[cfg(target_arch = "wasm32")] // pub mod wasm32; pub(crate) fn vert_convolution_f32>( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, offset: 
u32, coeffs: &Coefficients, cpu_extensions: CpuExtensions, ) { // Check safety conditions debug_assert!(src_view.width() - offset >= dst_view.width()); debug_assert_eq!(coeffs.bounds.len(), dst_view.height() as usize); match cpu_extensions { #[cfg(target_arch = "x86_64")] CpuExtensions::Avx2 => avx2::vert_convolution(src_view, dst_view, offset, coeffs), #[cfg(target_arch = "x86_64")] CpuExtensions::Sse4_1 => sse4::vert_convolution(src_view, dst_view, offset, coeffs), // #[cfg(target_arch = "aarch64")] // CpuExtensions::Neon => neon::vert_convolution(src_view, dst_view, offset, coeffs), // #[cfg(target_arch = "wasm32")] // CpuExtensions::Simd128 => wasm32::vert_convolution(src_view, dst_view, offset, coeffs), _ => native::vert_convolution(src_view, dst_view, offset, coeffs), } } fast_image_resize-5.3.0/src/convolution/vertical_f32/native.rs000064400000000000000000000076101046102023000226020ustar 00000000000000use crate::convolution::Coefficients; use crate::pixels::InnerPixel; use crate::utils::foreach_with_pre_reading; use crate::{ImageView, ImageViewMut}; #[inline(always)] pub(crate) fn vert_convolution( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, offset: u32, coeffs: &Coefficients, ) where T: InnerPixel, { let coefficients_chunks = coeffs.get_chunks(); let src_x_initial = offset as usize * T::count_of_components(); let dst_rows = dst_view.iter_rows_mut(0); let coeffs_chunks_iter = coefficients_chunks.into_iter(); for (coeffs_chunk, dst_row) in coeffs_chunks_iter.zip(dst_rows) { let first_y_src = coeffs_chunk.start; let ks = coeffs_chunk.values; let mut dst_components = T::components_mut(dst_row); let mut x_src = src_x_initial; #[cfg(target_arch = "aarch64")] { (dst_components, x_src) = convolution_by_chunks::<_, 16>(src_view, dst_components, x_src, first_y_src, ks); } #[cfg(not(target_arch = "wasm32"))] { if !dst_components.is_empty() { (dst_components, x_src) = convolution_by_chunks::<_, 8>(src_view, dst_components, x_src, first_y_src, ks); } } #[cfg(target_arch = "wasm32")] { if !dst_components.is_empty() { (dst_components, x_src) = crate::convolution::vertical_f32::native::convolution_by_chunks::<_, 4>( src_view, dst_components, x_src, first_y_src, ks, ); } } if !dst_components.is_empty() { convolution_by_f32(src_view, dst_components, x_src, first_y_src, ks); } } } #[inline(always)] pub(crate) fn convolution_by_f32>( src_view: &impl ImageView, dst_components: &mut [f32], mut x_src: usize, first_y_src: u32, ks: &[f64], ) -> usize { for dst_component in dst_components.iter_mut() { let mut ss = 0.; let src_rows = src_view.iter_rows(first_y_src); for (&k, src_row) in ks.iter().zip(src_rows) { // SAFETY: Alignment of src_row is greater or equal than alignment f32 // because a component of pixel type T is f32. 
let src_ptr = src_row.as_ptr() as *const f32; let src_component = unsafe { *src_ptr.add(x_src) }; ss += src_component as f64 * k; } *dst_component = ss as f32; x_src += 1 } x_src } #[inline(always)] fn convolution_by_chunks<'a, T, const CHUNK_SIZE: usize>( src_view: &impl ImageView, dst_components: &'a mut [f32], mut x_src: usize, first_y_src: u32, ks: &[f64], ) -> (&'a mut [f32], usize) where T: InnerPixel, { let mut dst_chunks = dst_components.chunks_exact_mut(CHUNK_SIZE); for dst_chunk in &mut dst_chunks { let mut ss = [0.; CHUNK_SIZE]; let src_rows = src_view.iter_rows(first_y_src); foreach_with_pre_reading( ks.iter().zip(src_rows), |(&k, src_row)| { let src_ptr = src_row.as_ptr() as *const f32; let src_chunk = unsafe { let ptr = src_ptr.add(x_src) as *const [f32; CHUNK_SIZE]; ptr.read_unaligned() }; (src_chunk, k) }, |(src_chunk, k)| { for (s, c) in ss.iter_mut().zip(src_chunk) { *s += c as f64 * k; } }, ); for (i, s) in ss.iter().copied().enumerate() { dst_chunk[i] = s as f32; } x_src += CHUNK_SIZE; } (dst_chunks.into_remainder(), x_src) } fast_image_resize-5.3.0/src/convolution/vertical_f32/sse4.rs000064400000000000000000000101301046102023000221610ustar 00000000000000use std::arch::x86_64::*; use crate::convolution::{Coefficients, CoefficientsChunk}; use crate::pixels::InnerPixel; use crate::{simd_utils, ImageView, ImageViewMut}; use super::native; pub(crate) fn vert_convolution( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, offset: u32, coeffs: &Coefficients, ) where T: InnerPixel, { let coefficients_chunks = coeffs.get_chunks(); let src_x = offset as usize * T::count_of_components(); let dst_rows = dst_view.iter_rows_mut(0); for (dst_row, coeffs_chunk) in dst_rows.zip(coefficients_chunks) { unsafe { vert_convolution_into_one_row_f32(src_view, dst_row, src_x, coeffs_chunk); } } } #[target_feature(enable = "sse4.1")] unsafe fn vert_convolution_into_one_row_f32>( src_view: &impl ImageView, dst_row: &mut [T], mut src_x: usize, coeffs_chunk: CoefficientsChunk, ) { let mut c_buf = [0f64; 2]; let mut dst_f32 = T::components_mut(dst_row); let mut dst_chunks = dst_f32.chunks_exact_mut(16); for dst_chunk in &mut dst_chunks { multiply_components_of_rows::<_, 8>(src_view, src_x, coeffs_chunk, dst_chunk, &mut c_buf); src_x += 16; } dst_f32 = dst_chunks.into_remainder(); dst_chunks = dst_f32.chunks_exact_mut(8); for dst_chunk in &mut dst_chunks { multiply_components_of_rows::<_, 4>(src_view, src_x, coeffs_chunk, dst_chunk, &mut c_buf); src_x += 8; } dst_f32 = dst_chunks.into_remainder(); dst_chunks = dst_f32.chunks_exact_mut(4); if let Some(dst_chunk) = dst_chunks.next() { multiply_components_of_rows::<_, 2>(src_view, src_x, coeffs_chunk, dst_chunk, &mut c_buf); src_x += 4; } dst_f32 = dst_chunks.into_remainder(); if !dst_f32.is_empty() { let y_start = coeffs_chunk.start; let coeffs = coeffs_chunk.values; native::convolution_by_f32(src_view, dst_f32, src_x, y_start, coeffs); } } #[inline] #[target_feature(enable = "sse4.1")] pub(crate) unsafe fn multiply_components_of_rows< T: InnerPixel, const SUMS_COUNT: usize, >( src_view: &impl ImageView, src_x: usize, coeffs_chunk: CoefficientsChunk, dst_chunk: &mut [f32], c_buf: &mut [f64; 2], ) { let mut sums = [_mm_set1_pd(0.); SUMS_COUNT]; let y_start = coeffs_chunk.start; let mut coeffs = coeffs_chunk.values; let mut y: u32 = 0; let max_rows = coeffs.len() as u32; let coeffs_2 = coeffs.chunks_exact(2); coeffs = coeffs_2.remainder(); for (src_rows, two_coeffs) in src_view.iter_2_rows(y_start, max_rows).zip(coeffs_2) { let src_rows = 
src_rows.map(|row| T::components(row).get_unchecked(src_x..)); for (&coeff, src_row) in two_coeffs.iter().zip(src_rows) { multiply_components_of_row(&mut sums, coeff, src_row); } y += 2; } if let Some(&coeff) = coeffs.first() { if let Some(s_row) = src_view.iter_rows(y_start + y).next() { let src_row = T::components(s_row).get_unchecked(src_x..); multiply_components_of_row(&mut sums, coeff, src_row); } } let mut dst_ptr = dst_chunk.as_mut_ptr(); for sum in sums { _mm_storeu_pd(c_buf.as_mut_ptr(), sum); for &v in c_buf.iter() { *dst_ptr = v as f32; dst_ptr = dst_ptr.add(1); } } } #[inline] #[target_feature(enable = "sse4.1")] unsafe fn multiply_components_of_row( sums: &mut [__m128d; SUMS_COUNT], coeff: f64, src_row: &[f32], ) { let coeff_f64x2 = _mm_set1_pd(coeff); let mut i = 0; while i < SUMS_COUNT { let comp03_f32x4 = simd_utils::loadu_ps(src_row, i * 2); let comp01_f64x2 = _mm_cvtps_pd(comp03_f32x4); sums[i] = _mm_add_pd(sums[i], _mm_mul_pd(comp01_f64x2, coeff_f64x2)); i += 1; let comp23_f64x2 = _mm_cvtps_pd(_mm_movehl_ps(comp03_f32x4, comp03_f32x4)); sums[i] = _mm_add_pd(sums[i], _mm_mul_pd(comp23_f64x2, coeff_f64x2)); i += 1; } } fast_image_resize-5.3.0/src/convolution/vertical_u16/avx2.rs000064400000000000000000000140621046102023000222140ustar 00000000000000use std::arch::x86_64::*; use crate::convolution::optimisations::{CoefficientsI32Chunk, Normalizer32}; use crate::pixels::InnerPixel; use crate::{simd_utils, ImageView, ImageViewMut}; pub(crate) fn vert_convolution( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, offset: u32, normalizer: &Normalizer32, ) where T: InnerPixel, { let coefficients_chunks = normalizer.chunks(); let src_x = offset as usize * T::count_of_components(); let dst_rows = dst_view.iter_rows_mut(0); for (dst_row, coeffs_chunk) in dst_rows.zip(coefficients_chunks) { unsafe { vert_convolution_into_one_row_u16(src_view, dst_row, src_x, coeffs_chunk, normalizer); } } } #[target_feature(enable = "avx2")] pub(crate) unsafe fn vert_convolution_into_one_row_u16( src_view: &impl ImageView, dst_row: &mut [T], mut src_x: usize, coeffs_chunk: &CoefficientsI32Chunk, normalizer: &Normalizer32, ) where T: InnerPixel, { let y_start = coeffs_chunk.start; let coeffs = coeffs_chunk.values(); let mut dst_u16 = T::components_mut(dst_row); /* |R G B | |R G B | |R G | - |B | |R G B | |R G B | |R | |0001 0203 0405| |0607 0809 1011| |1213 1415| - |0001| |0203 0405 0607| |0809 1011 1213| |1415| Shuffle to extract 0-1 components as i64: lo: -1, -1, -1, -1, -1, -1, 3, 2, -1, -1, -1, -1, -1, -1, 1, 0 hi: -1, -1, -1, -1, -1, -1, 3, 2, -1, -1, -1, -1, -1, -1, 1, 0 Shuffle to extract 2-3 components as i64: lo: -1, -1, -1, -1, -1, -1, 7, 6, -1, -1, -1, -1, -1, -1, 5, 4 hi: -1, -1, -1, -1, -1, -1, 7, 6, -1, -1, -1, -1, -1, -1, 5, 4 Shuffle to extract 4-5 components as i64: lo: -1, -1, -1, -1, -1, -1, 11, 10, -1, -1, -1, -1, -1, -1, 9, 8 hi: -1, -1, -1, -1, -1, -1, 11, 10, -1, -1, -1, -1, -1, -1, 9, 8 Shuffle to extract 6-7 components as i64: lo: -1, -1, -1, -1, -1, -1, 15, 14, -1, -1, -1, -1, -1, -1, 13, 12 hi: -1, -1, -1, -1, -1, -1, 15, 14, -1, -1, -1, -1, -1, -1, 13, 12 */ let shuffles = [ _mm256_set_m128i( _mm_set_epi8(-1, -1, -1, -1, -1, -1, 3, 2, -1, -1, -1, -1, -1, -1, 1, 0), _mm_set_epi8(-1, -1, -1, -1, -1, -1, 3, 2, -1, -1, -1, -1, -1, -1, 1, 0), ), _mm256_set_m128i( _mm_set_epi8(-1, -1, -1, -1, -1, -1, 7, 6, -1, -1, -1, -1, -1, -1, 5, 4), _mm_set_epi8(-1, -1, -1, -1, -1, -1, 7, 6, -1, -1, -1, -1, -1, -1, 5, 4), ), _mm256_set_m128i( _mm_set_epi8(-1, -1, -1, -1, -1, -1, 
11, 10, -1, -1, -1, -1, -1, -1, 9, 8), _mm_set_epi8(-1, -1, -1, -1, -1, -1, 11, 10, -1, -1, -1, -1, -1, -1, 9, 8), ), _mm256_set_m128i( _mm_set_epi8( -1, -1, -1, -1, -1, -1, 15, 14, -1, -1, -1, -1, -1, -1, 13, 12, ), _mm_set_epi8( -1, -1, -1, -1, -1, -1, 15, 14, -1, -1, -1, -1, -1, -1, 13, 12, ), ), ]; let precision = normalizer.precision(); let initial = _mm256_set1_epi64x(1 << (precision - 1)); let mut comp_buf = [0i64; 4]; // 16 components in one register let mut dst_chunks_16 = dst_u16.chunks_exact_mut(16); for dst_chunk in &mut dst_chunks_16 { // 16 components / 4 per register = 4 registers let mut sum = [initial; 4]; for (s_row, &coeff) in src_view.iter_rows(y_start).zip(coeffs) { let components = T::components(s_row); let coeff_i64x4 = _mm256_set1_epi64x(coeff as i64); let source = simd_utils::loadu_si256(components, src_x); for i in 0..4 { let comp_i64x4 = _mm256_shuffle_epi8(source, shuffles[i]); sum[i] = _mm256_add_epi64(sum[i], _mm256_mul_epi32(comp_i64x4, coeff_i64x4)); } } for i in 0..4 { _mm256_storeu_si256(comp_buf.as_mut_ptr() as *mut __m256i, sum[i]); let component = dst_chunk.get_unchecked_mut(i * 2); *component = normalizer.clip(comp_buf[0]); let component = dst_chunk.get_unchecked_mut(i * 2 + 1); *component = normalizer.clip(comp_buf[1]); let component = dst_chunk.get_unchecked_mut(i * 2 + 8); *component = normalizer.clip(comp_buf[2]); let component = dst_chunk.get_unchecked_mut(i * 2 + 9); *component = normalizer.clip(comp_buf[3]); } src_x += 16; } dst_u16 = dst_chunks_16.into_remainder(); if !dst_u16.is_empty() { // 16 components / 4 per register = 4 registers let mut sum = [initial; 4]; let mut buf = [0u16; 16]; for (s_row, &coeff) in src_view.iter_rows(y_start).zip(coeffs) { let components = T::components(s_row); for (i, &v) in components .get_unchecked(src_x..) 
.iter() .take(dst_u16.len()) .enumerate() { buf[i] = v; } let coeff_i64x4 = _mm256_set1_epi64x(coeff as i64); let source = simd_utils::loadu_si256(&buf, 0); for i in 0..4 { let comp_i64x4 = _mm256_shuffle_epi8(source, shuffles[i]); sum[i] = _mm256_add_epi64(sum[i], _mm256_mul_epi32(comp_i64x4, coeff_i64x4)); } } for i in 0..4 { _mm256_storeu_si256(comp_buf.as_mut_ptr() as *mut __m256i, sum[i]); let component = buf.get_unchecked_mut(i * 2); *component = normalizer.clip(comp_buf[0]); let component = buf.get_unchecked_mut(i * 2 + 1); *component = normalizer.clip(comp_buf[1]); let component = buf.get_unchecked_mut(i * 2 + 8); *component = normalizer.clip(comp_buf[2]); let component = buf.get_unchecked_mut(i * 2 + 9); *component = normalizer.clip(comp_buf[3]); } for (i, v) in dst_u16.iter_mut().enumerate() { *v = buf[i]; } } } fast_image_resize-5.3.0/src/convolution/vertical_u16/mod.rs000064400000000000000000000026301046102023000221110ustar 00000000000000use crate::convolution::optimisations::Normalizer32; use crate::pixels::InnerPixel; use crate::{CpuExtensions, ImageView, ImageViewMut}; #[cfg(target_arch = "x86_64")] pub(crate) mod avx2; pub(crate) mod native; #[cfg(target_arch = "aarch64")] mod neon; #[cfg(target_arch = "x86_64")] pub(crate) mod sse4; #[cfg(target_arch = "wasm32")] pub mod wasm32; pub(crate) fn vert_convolution_u16>( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, offset: u32, normalizer: &Normalizer32, cpu_extensions: CpuExtensions, ) { // Check safety conditions debug_assert!(src_view.width() - offset >= dst_view.width()); debug_assert_eq!(normalizer.chunks_len(), dst_view.height() as usize); match cpu_extensions { #[cfg(target_arch = "x86_64")] CpuExtensions::Avx2 => avx2::vert_convolution(src_view, dst_view, offset, normalizer), #[cfg(target_arch = "x86_64")] CpuExtensions::Sse4_1 => sse4::vert_convolution(src_view, dst_view, offset, normalizer), #[cfg(target_arch = "aarch64")] CpuExtensions::Neon => neon::vert_convolution(src_view, dst_view, offset, normalizer), #[cfg(target_arch = "wasm32")] CpuExtensions::Simd128 => wasm32::vert_convolution(src_view, dst_view, offset, normalizer), _ => native::vert_convolution(src_view, dst_view, offset, normalizer), } } fast_image_resize-5.3.0/src/convolution/vertical_u16/native.rs000064400000000000000000000066401046102023000226250ustar 00000000000000use crate::convolution::optimisations::Normalizer32; use crate::pixels::InnerPixel; use crate::utils::foreach_with_pre_reading; use crate::{ImageView, ImageViewMut}; #[inline(always)] pub(crate) fn vert_convolution( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, offset: u32, normalizer: &Normalizer32, ) where T: InnerPixel, { let coefficients_chunks = normalizer.chunks(); let precision = normalizer.precision(); let initial: i64 = 1 << (precision - 1); let src_x_initial = offset as usize * T::count_of_components(); let dst_rows = dst_view.iter_rows_mut(0); let coeffs_chunks_iter = coefficients_chunks.iter(); for (coeffs_chunk, dst_row) in coeffs_chunks_iter.zip(dst_rows) { let first_y_src = coeffs_chunk.start; let ks = coeffs_chunk.values(); let dst_components = T::components_mut(dst_row); let mut x_src = src_x_initial; let (_, dst_chunks, tail) = unsafe { dst_components.align_to_mut::<[u16; 16]>() }; x_src = convolution_by_chunks( src_view, normalizer, initial, dst_chunks, x_src, first_y_src, ks, ); if !tail.is_empty() { convolution_by_u16(src_view, normalizer, initial, tail, x_src, first_y_src, ks); } } } #[inline(always)] pub(crate) fn convolution_by_u16>( 
src_view: &impl ImageView, normalizer: &Normalizer32, initial: i64, dst_components: &mut [u16], mut x_src: usize, first_y_src: u32, ks: &[i32], ) -> usize { for dst_component in dst_components.iter_mut() { let mut ss = initial; let src_rows = src_view.iter_rows(first_y_src); for (&k, src_row) in ks.iter().zip(src_rows) { // SAFETY: Alignment of src_row is greater or equal than alignment u16 // because one component of pixel type T is u16. let src_ptr = src_row.as_ptr() as *const u16; let src_component = unsafe { *src_ptr.add(x_src) }; ss += src_component as i64 * (k as i64); } *dst_component = normalizer.clip(ss); x_src += 1 } x_src } #[inline(always)] fn convolution_by_chunks( src_view: &impl ImageView, normalizer: &Normalizer32, initial: i64, dst_chunks: &mut [[u16; CHUNK_SIZE]], mut x_src: usize, first_y_src: u32, ks: &[i32], ) -> usize where T: InnerPixel, { for dst_chunk in dst_chunks { let mut ss = [initial; CHUNK_SIZE]; let src_rows = src_view.iter_rows(first_y_src); foreach_with_pre_reading( ks.iter().zip(src_rows), |(&k, src_row)| { let src_ptr = src_row.as_ptr() as *const u16; let src_chunk = unsafe { let ptr = src_ptr.add(x_src) as *const [u16; CHUNK_SIZE]; ptr.read_unaligned() }; (src_chunk, k) }, |(src_chunk, k)| { for (s, c) in ss.iter_mut().zip(src_chunk) { *s += c as i64 * (k as i64); } }, ); for (i, s) in ss.iter().copied().enumerate() { dst_chunk[i] = normalizer.clip(s); } x_src += CHUNK_SIZE; } x_src } fast_image_resize-5.3.0/src/convolution/vertical_u16/neon.rs000064400000000000000000000172541046102023000223010ustar 00000000000000use std::arch::aarch64::*; use std::mem::transmute; use crate::convolution::optimisations::{CoefficientsI32Chunk, Normalizer32}; use crate::neon_utils; use crate::pixels::InnerPixel; use crate::{ImageView, ImageViewMut}; pub(crate) fn vert_convolution( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, offset: u32, normalizer: &Normalizer32, ) where T: InnerPixel, { let coefficients_chunks = normalizer.chunks(); let precision = normalizer.precision(); let initial = 1i64 << (precision - 1); let start_src_x = offset as usize * T::count_of_components(); let mut tmp_dst = vec![0i64; dst_view.width() as usize * T::count_of_components()]; let tmp_buf = tmp_dst.as_mut_slice(); let dst_rows = dst_view.iter_rows_mut(0); for (dst_row, coeffs_chunk) in dst_rows.zip(coefficients_chunks) { tmp_buf.fill(initial); unsafe { vert_convolution_into_one_row_i64(src_view, tmp_buf, start_src_x, coeffs_chunk); let dst_comp = T::components_mut(dst_row); macro_rules! 
call { ($imm8:expr) => {{ store_tmp_buf_into_dst_row::<$imm8>(tmp_buf, dst_comp, normalizer); }}; } constify_64_imm8!(precision as i64, call); } } } #[target_feature(enable = "neon")] unsafe fn vert_convolution_into_one_row_i64>( src_view: &impl ImageView, dst_buf: &mut [i64], start_src_x: usize, coeffs_chunk: &CoefficientsI32Chunk, ) { let width = dst_buf.len(); let y_start = coeffs_chunk.start; let coeffs = coeffs_chunk.values(); let zero_u16x8 = vdupq_n_u16(0); let zero_u16x4 = vdup_n_u16(0); for (s_row, &coeff) in src_view.iter_rows(y_start).zip(coeffs) { let components = T::components(s_row); let coeff_i32x2 = vdup_n_s32(coeff); let mut dst_x: usize = 0; let mut src_x = start_src_x; while dst_x < width.saturating_sub(31) { let source = neon_utils::load_u16x8x4(components, src_x); for s in [source.0, source.1, source.2, source.3] { let mut accum = neon_utils::load_i64x2x4(dst_buf, dst_x); let pix = vreinterpretq_s32_u16(vzip1q_u16(s, zero_u16x8)); accum.0 = vmlal_s32(accum.0, vget_low_s32(pix), coeff_i32x2); accum.1 = vmlal_s32(accum.1, vget_high_s32(pix), coeff_i32x2); let pix = vreinterpretq_s32_u16(vzip2q_u16(s, zero_u16x8)); accum.2 = vmlal_s32(accum.2, vget_low_s32(pix), coeff_i32x2); accum.3 = vmlal_s32(accum.3, vget_high_s32(pix), coeff_i32x2); neon_utils::store_i64x2x4(dst_buf, dst_x, accum); dst_x += 8; src_x += 8; } } if dst_x < width.saturating_sub(15) { let source = neon_utils::load_u16x8x2(components, src_x); for s in [source.0, source.1] { let mut accum = neon_utils::load_i64x2x4(dst_buf, dst_x); let pix = vreinterpretq_s32_u16(vzip1q_u16(s, zero_u16x8)); accum.0 = vmlal_s32(accum.0, vget_low_s32(pix), coeff_i32x2); accum.1 = vmlal_s32(accum.1, vget_high_s32(pix), coeff_i32x2); let pix = vreinterpretq_s32_u16(vzip2q_u16(s, zero_u16x8)); accum.2 = vmlal_s32(accum.2, vget_low_s32(pix), coeff_i32x2); accum.3 = vmlal_s32(accum.3, vget_high_s32(pix), coeff_i32x2); neon_utils::store_i64x2x4(dst_buf, dst_x, accum); dst_x += 8; src_x += 8; } } if dst_x < width.saturating_sub(7) { let s = neon_utils::load_u16x8(components, src_x); let mut accum = neon_utils::load_i64x2x4(dst_buf, dst_x); let pix = vreinterpretq_s32_u16(vzip1q_u16(s, zero_u16x8)); accum.0 = vmlal_s32(accum.0, vget_low_s32(pix), coeff_i32x2); accum.1 = vmlal_s32(accum.1, vget_high_s32(pix), coeff_i32x2); let pix = vreinterpretq_s32_u16(vzip2q_u16(s, zero_u16x8)); accum.2 = vmlal_s32(accum.2, vget_low_s32(pix), coeff_i32x2); accum.3 = vmlal_s32(accum.3, vget_high_s32(pix), coeff_i32x2); neon_utils::store_i64x2x4(dst_buf, dst_x, accum); dst_x += 8; src_x += 8; } if dst_x < width.saturating_sub(3) { let s = vcombine_u16(neon_utils::load_u16x4(components, src_x), zero_u16x4); let mut accum = neon_utils::load_i64x2x2(dst_buf, dst_x); let pix = vreinterpretq_s32_u16(vzip1q_u16(s, zero_u16x8)); accum.0 = vmlal_s32(accum.0, vget_low_s32(pix), coeff_i32x2); accum.1 = vmlal_s32(accum.1, vget_high_s32(pix), coeff_i32x2); neon_utils::store_i64x2x2(dst_buf, dst_x, accum); dst_x += 4; src_x += 4; } let coeff = coeff as i64; let tmp_tail = dst_buf.iter_mut().skip(dst_x); let comp_tail = components.iter().skip(src_x); for (accum, &comp) in tmp_tail.zip(comp_tail) { *accum += coeff * comp as i64; } } } #[target_feature(enable = "neon")] unsafe fn store_tmp_buf_into_dst_row( mut src_buf: &[i64], dst_buf: &mut [u16], normalizer: &Normalizer32, ) { let mut dst_chunks_8 = dst_buf.chunks_exact_mut(8); let src_chunks_8 = src_buf.chunks_exact(8); src_buf = src_chunks_8.remainder(); for (dst_chunk, src_chunk) in 
dst_chunks_8.by_ref().zip(src_chunks_8) { let mut accum = neon_utils::load_i64x2x4(src_chunk, 0); accum.0 = vshrq_n_s64::(accum.0); accum.1 = vshrq_n_s64::(accum.1); accum.2 = vshrq_n_s64::(accum.2); accum.3 = vshrq_n_s64::(accum.3); let sss0_i32 = vcombine_s32(vqmovn_s64(accum.0), vqmovn_s64(accum.1)); let sss1_i32 = vcombine_s32(vqmovn_s64(accum.2), vqmovn_s64(accum.3)); let sss_u16 = vcombine_u16(vqmovun_s32(sss0_i32), vqmovun_s32(sss1_i32)); let dst_ptr = dst_chunk.as_mut_ptr() as *mut u128; vstrq_p128(dst_ptr, transmute(sss_u16)); } let mut dst_chunks_4 = dst_chunks_8.into_remainder().chunks_exact_mut(4); let src_chunks_4 = src_buf.chunks_exact(4); src_buf = src_chunks_4.remainder(); for (dst_chunk, src_chunk) in dst_chunks_4.by_ref().zip(src_chunks_4) { let mut accum = neon_utils::load_i64x2x2(src_chunk, 0); accum.0 = vshrq_n_s64::(accum.0); accum.1 = vshrq_n_s64::(accum.1); let sss_i32 = vcombine_s32(vqmovn_s64(accum.0), vqmovn_s64(accum.1)); let sss_u16 = vcombine_u16(vqmovun_s32(sss_i32), vqmovun_s32(sss_i32)); let res = vdupd_laneq_u64::<0>(vreinterpretq_u64_u16(sss_u16)); let dst_ptr = dst_chunk.as_mut_ptr() as *mut u64; dst_ptr.write_unaligned(res); } let mut dst_chunks_2 = dst_chunks_4.into_remainder().chunks_exact_mut(2); let src_chunks_2 = src_buf.chunks_exact(2); src_buf = src_chunks_2.remainder(); for (dst_chunk, src_chunk) in dst_chunks_2.by_ref().zip(src_chunks_2) { let mut accum = neon_utils::load_i64x2(src_chunk, 0); accum = vshrq_n_s64::(accum); let sss_i32 = vcombine_s32(vqmovn_s64(accum), vqmovn_s64(accum)); let sss_u16 = vcombine_u16(vqmovun_s32(sss_i32), vqmovun_s32(sss_i32)); let res = vdups_laneq_u32::<0>(vreinterpretq_u32_u16(sss_u16)); let dst_ptr = dst_chunk.as_mut_ptr() as *mut u32; dst_ptr.write_unaligned(res); } let dst_chunk = dst_chunks_2.into_remainder(); for (dst, &src) in dst_chunk.iter_mut().zip(src_buf) { *dst = normalizer.clip(src); } } fast_image_resize-5.3.0/src/convolution/vertical_u16/sse4.rs000064400000000000000000000217301046102023000222120ustar 00000000000000use std::arch::x86_64::*; use crate::convolution::optimisations::{CoefficientsI32Chunk, Normalizer32}; use crate::convolution::vertical_u16::native::convolution_by_u16; use crate::pixels::InnerPixel; use crate::{simd_utils, ImageView, ImageViewMut}; pub(crate) fn vert_convolution( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, offset: u32, normalizer: &Normalizer32, ) where T: InnerPixel, { let coefficients_chunks = normalizer.chunks(); let src_x = offset as usize * T::count_of_components(); let dst_rows = dst_view.iter_rows_mut(0); for (dst_row, coeffs_chunk) in dst_rows.zip(coefficients_chunks) { unsafe { vert_convolution_into_one_row_u16(src_view, dst_row, src_x, coeffs_chunk, normalizer); } } } #[target_feature(enable = "sse4.1")] unsafe fn vert_convolution_into_one_row_u16>( src_view: &impl ImageView, dst_row: &mut [T], mut src_x: usize, coeffs_chunk: &CoefficientsI32Chunk, normalizer: &Normalizer32, ) { let y_start = coeffs_chunk.start; let coeffs = coeffs_chunk.values(); let max_rows = coeffs.len() as u32; let mut dst_u16 = T::components_mut(dst_row); /* |0 1 2 3 4 5 6 7 | |0001 0203 0405 0607 0809 1011 1213 1415| Shuffle to extract 0-1 components as i64: -1, -1, -1, -1, -1, -1, 3, 2, -1, -1, -1, -1, -1, -1, 1, 0 Shuffle to extract 2-3 components as i64: -1, -1, -1, -1, -1, -1, 7, 6, -1, -1, -1, -1, -1, -1, 5, 4 Shuffle to extract 4-5 components as i64: -1, -1, -1, -1, -1, -1, 11, 10, -1, -1, -1, -1, -1, -1, 9, 8 Shuffle to extract 6-7 components as i64: -1, -1, -1, 
-1, -1, -1, 15, 14, -1, -1, -1, -1, -1, -1, 13, 12 */ let c_shuffles = [ _mm_set_epi8(-1, -1, -1, -1, -1, -1, 3, 2, -1, -1, -1, -1, -1, -1, 1, 0), _mm_set_epi8(-1, -1, -1, -1, -1, -1, 7, 6, -1, -1, -1, -1, -1, -1, 5, 4), _mm_set_epi8(-1, -1, -1, -1, -1, -1, 11, 10, -1, -1, -1, -1, -1, -1, 9, 8), _mm_set_epi8( -1, -1, -1, -1, -1, -1, 15, 14, -1, -1, -1, -1, -1, -1, 13, 12, ), ]; let precision = normalizer.precision(); let initial = _mm_set1_epi64x(1 << (precision - 1)); let mut c_buf = [0i64; 2]; let mut dst_chunks_16 = dst_u16.chunks_exact_mut(16); for dst_chunk in &mut dst_chunks_16 { let mut sums = [[initial; 2], [initial; 2], [initial; 2], [initial; 2]]; let mut y: u32 = 0; let coeffs_2 = coeffs.chunks_exact(2); let coeffs_reminder = coeffs_2.remainder(); for (src_rows, two_coeffs) in src_view.iter_2_rows(y_start, max_rows).zip(coeffs_2) { let src_rows = src_rows.map(|row| T::components(row)); for r in 0..2 { let coeff_i64x2 = _mm_set1_epi64x(two_coeffs[r] as i64); for x in 0..2 { let source = simd_utils::loadu_si128(src_rows[r], src_x + x * 8); for i in 0..4 { let c_i64x2 = _mm_shuffle_epi8(source, c_shuffles[i]); sums[i][x] = _mm_add_epi64(sums[i][x], _mm_mul_epi32(c_i64x2, coeff_i64x2)); } } } y += 2; } if let Some(&k) = coeffs_reminder.first() { if let Some(s_row) = src_view.iter_rows(y_start + y).next() { let components = T::components(s_row); let coeff_i64x2 = _mm_set1_epi64x(k as i64); for x in 0..2 { let source = simd_utils::loadu_si128(components, src_x + x * 8); for i in 0..4 { let c_i64x2 = _mm_shuffle_epi8(source, c_shuffles[i]); sums[i][x] = _mm_add_epi64(sums[i][x], _mm_mul_epi32(c_i64x2, coeff_i64x2)); } } } } let mut dst_ptr = dst_chunk.as_mut_ptr(); for x in 0..2 { for sum in sums { _mm_storeu_si128(c_buf.as_mut_ptr() as *mut __m128i, sum[x]); *dst_ptr = normalizer.clip(c_buf[0]); dst_ptr = dst_ptr.add(1); *dst_ptr = normalizer.clip(c_buf[1]); dst_ptr = dst_ptr.add(1); } } src_x += 16; } dst_u16 = dst_chunks_16.into_remainder(); let mut dst_chunks_8 = dst_u16.chunks_exact_mut(8); if let Some(dst_chunk) = dst_chunks_8.next() { let mut sums = [initial, initial, initial, initial]; let mut y: u32 = 0; let coeffs_2 = coeffs.chunks_exact(2); let coeffs_reminder = coeffs_2.remainder(); for (src_rows, two_coeffs) in src_view.iter_2_rows(y_start, max_rows).zip(coeffs_2) { let src_rows = src_rows.map(|row| T::components(row)); let coeffs_i64 = [ _mm_set1_epi64x(two_coeffs[0] as i64), _mm_set1_epi64x(two_coeffs[1] as i64), ]; for r in 0..2 { let source = simd_utils::loadu_si128(src_rows[r], src_x); for i in 0..4 { let c_i64x2 = _mm_shuffle_epi8(source, c_shuffles[i]); sums[i] = _mm_add_epi64(sums[i], _mm_mul_epi32(c_i64x2, coeffs_i64[r])); } } y += 2; } if let Some(&k) = coeffs_reminder.first() { if let Some(s_row) = src_view.iter_rows(y_start + y).next() { let components = T::components(s_row); let coeff_i64x2 = _mm_set1_epi64x(k as i64); let source = simd_utils::loadu_si128(components, src_x); for i in 0..4 { let c_i64x2 = _mm_shuffle_epi8(source, c_shuffles[i]); sums[i] = _mm_add_epi64(sums[i], _mm_mul_epi32(c_i64x2, coeff_i64x2)); } } } let mut dst_ptr = dst_chunk.as_mut_ptr(); for sum in sums { // let mask = _mm_cmpgt_epi64(sums[i], zero); // sums[i] = _mm_and_si128(sums[i] , mask); // sums[i] = _mm_srl_epi64(sums[i] , precision_i64); // _mm_packus_epi32(sums[i] , sums[i] ); _mm_storeu_si128(c_buf.as_mut_ptr() as *mut __m128i, sum); *dst_ptr = normalizer.clip(c_buf[0]); dst_ptr = dst_ptr.add(1); *dst_ptr = normalizer.clip(c_buf[1]); dst_ptr = dst_ptr.add(1); } src_x += 8; } 
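// Tail handling: the row is processed in progressively narrower chunks of
// 16, 8 and finally 4 u16 components per step; whatever remains after the
// 4-component step is handed off to the scalar `convolution_by_u16`
// fallback below. All intermediate sums are kept as i64 and clamped back
// to u16 through `normalizer.clip()`.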
dst_u16 = dst_chunks_8.into_remainder(); let mut dst_chunks_4 = dst_u16.chunks_exact_mut(4); if let Some(dst_chunk) = dst_chunks_4.next() { let mut c01 = initial; let mut c23 = initial; let mut y: u32 = 0; let coeffs_2 = coeffs.chunks_exact(2); let coeffs_reminder = coeffs_2.remainder(); for (src_rows, two_coeffs) in src_view.iter_2_rows(y_start, max_rows).zip(coeffs_2) { let src_rows = src_rows.map(|row| T::components(row)); let coeffs_i64 = [ _mm_set1_epi64x(two_coeffs[0] as i64), _mm_set1_epi64x(two_coeffs[1] as i64), ]; for r in 0..2 { let comp_x4 = src_rows[r].get_unchecked(src_x..src_x + 4); let c_i64x2 = _mm_set_epi64x(comp_x4[1] as i64, comp_x4[0] as i64); c01 = _mm_add_epi64(c01, _mm_mul_epi32(c_i64x2, coeffs_i64[r])); let c_i64x2 = _mm_set_epi64x(comp_x4[3] as i64, comp_x4[2] as i64); c23 = _mm_add_epi64(c23, _mm_mul_epi32(c_i64x2, coeffs_i64[r])); } y += 2; } if let Some(&k) = coeffs_reminder.first() { if let Some(s_row) = src_view.iter_rows(y_start + y).next() { let components = T::components(s_row); let coeff_i64x2 = _mm_set1_epi64x(k as i64); let comp_x4 = components.get_unchecked(src_x..src_x + 4); let c_i64x2 = _mm_set_epi64x(comp_x4[1] as i64, comp_x4[0] as i64); c01 = _mm_add_epi64(c01, _mm_mul_epi32(c_i64x2, coeff_i64x2)); let c_i64x2 = _mm_set_epi64x(comp_x4[3] as i64, comp_x4[2] as i64); c23 = _mm_add_epi64(c23, _mm_mul_epi32(c_i64x2, coeff_i64x2)); } } let mut dst_ptr = dst_chunk.as_mut_ptr(); _mm_storeu_si128(c_buf.as_mut_ptr() as *mut __m128i, c01); *dst_ptr = normalizer.clip(c_buf[0]); dst_ptr = dst_ptr.add(1); *dst_ptr = normalizer.clip(c_buf[1]); dst_ptr = dst_ptr.add(1); _mm_storeu_si128(c_buf.as_mut_ptr() as *mut __m128i, c23); *dst_ptr = normalizer.clip(c_buf[0]); dst_ptr = dst_ptr.add(1); *dst_ptr = normalizer.clip(c_buf[1]); src_x += 4; } dst_u16 = dst_chunks_4.into_remainder(); if !dst_u16.is_empty() { let initial = 1 << (precision - 1); convolution_by_u16( src_view, normalizer, initial, dst_u16, src_x, y_start, coeffs, ); } } fast_image_resize-5.3.0/src/convolution/vertical_u16/wasm32.rs000064400000000000000000000217351046102023000224550ustar 00000000000000use std::arch::wasm32::*; use crate::convolution::optimisations::{CoefficientsI32Chunk, Normalizer32}; use crate::convolution::vertical_u16::native::convolution_by_u16; use crate::pixels::InnerPixel; use crate::wasm32_utils; use crate::{ImageView, ImageViewMut}; pub(crate) fn vert_convolution>( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, offset: u32, normalizer: &Normalizer32, ) { let coefficients_chunks = normalizer.chunks(); let src_x = offset as usize * T::count_of_components(); let dst_rows = dst_view.iter_rows_mut(0); for (dst_row, coeffs_chunk) in dst_rows.zip(coefficients_chunks) { unsafe { vert_convolution_into_one_row_u16(src_view, dst_row, src_x, coeffs_chunk, normalizer); } } } #[target_feature(enable = "simd128")] unsafe fn vert_convolution_into_one_row_u16>( src_view: &impl ImageView, dst_row: &mut [T], mut src_x: usize, coeffs_chunk: &CoefficientsI32Chunk, normalizer: &Normalizer32, ) { let y_start = coeffs_chunk.start; let coeffs = coeffs_chunk.values(); let max_rows = coeffs.len() as u32; let mut dst_u16 = T::components_mut(dst_row); /* |0 1 2 3 4 5 6 7 | |0001 0203 0405 0607 0809 1011 1213 1415| Shuffle to extract 0-1 components as i64: 0, 1, -1, -1, -1, -1, -1, -1, 2, 3, -1, -1, -1, -1, -1, -1 Shuffle to extract 2-3 components as i64: 4, 5, -1, -1, -1, -1, -1, -1, 6, 7, -1, -1, -1, -1, -1, -1 Shuffle to extract 4-5 components as i64: 8, 9, -1, -1, -1, -1, -1, -1, 10, 
11, -1, -1, -1, -1, -1, -1 Shuffle to extract 6-7 components as i64: 12, 13, -1, -1, -1, -1, -1, -1, 14, 15, -1, -1, -1, -1, -1, -1 */ let c_shuffles = [ i8x16(0, 1, -1, -1, -1, -1, -1, -1, 2, 3, -1, -1, -1, -1, -1, -1), i8x16(4, 5, -1, -1, -1, -1, -1, -1, 6, 7, -1, -1, -1, -1, -1, -1), i8x16(8, 9, -1, -1, -1, -1, -1, -1, 10, 11, -1, -1, -1, -1, -1, -1), i8x16( 12, 13, -1, -1, -1, -1, -1, -1, 14, 15, -1, -1, -1, -1, -1, -1, ), ]; let precision = normalizer.precision(); let initial = i64x2_splat(1 << (precision - 1)); let mut c_buf = [0i64; 2]; let mut dst_chunks_16 = dst_u16.chunks_exact_mut(16); for dst_chunk in &mut dst_chunks_16 { let mut sums = [[initial; 2], [initial; 2], [initial; 2], [initial; 2]]; let mut y: u32 = 0; let coeffs_2 = coeffs.chunks_exact(2); let coeffs_reminder = coeffs_2.remainder(); for (src_rows, two_coeffs) in src_view.iter_2_rows(y_start, max_rows).zip(coeffs_2) { let src_rows = src_rows.map(|row| T::components(row)); for r in 0..2 { let coeff_i64x2 = i64x2_splat(two_coeffs[r] as i64); for x in 0..2 { let source = wasm32_utils::load_v128(src_rows[r], src_x + x * 8); for i in 0..4 { let c_i64x2 = i8x16_swizzle(source, c_shuffles[i]); sums[i][x] = i64x2_add(sums[i][x], wasm32_utils::i64x2_mul_lo(c_i64x2, coeff_i64x2)); } } } y += 2; } if let Some(&k) = coeffs_reminder.first() { if let Some(s_row) = src_view.iter_rows(y_start + y).next() { let components = T::components(s_row); let coeff_i64x2 = i64x2_splat(k as i64); for x in 0..2 { let source = wasm32_utils::load_v128(components, src_x + x * 8); for i in 0..4 { let c_i64x2 = i8x16_swizzle(source, c_shuffles[i]); sums[i][x] = i64x2_add(sums[i][x], wasm32_utils::i64x2_mul_lo(c_i64x2, coeff_i64x2)); } } } } let mut dst_ptr = dst_chunk.as_mut_ptr(); for x in 0..2 { for sum in sums { v128_store(c_buf.as_mut_ptr() as *mut v128, sum[x]); *dst_ptr = normalizer.clip(c_buf[0]); dst_ptr = dst_ptr.add(1); *dst_ptr = normalizer.clip(c_buf[1]); dst_ptr = dst_ptr.add(1); } } src_x += 16; } dst_u16 = dst_chunks_16.into_remainder(); let mut dst_chunks_8 = dst_u16.chunks_exact_mut(8); if let Some(dst_chunk) = dst_chunks_8.next() { let mut sums = [initial, initial, initial, initial]; let mut y: u32 = 0; let coeffs_2 = coeffs.chunks_exact(2); let coeffs_reminder = coeffs_2.remainder(); for (src_rows, two_coeffs) in src_view.iter_2_rows(y_start, max_rows).zip(coeffs_2) { let src_rows = src_rows.map(|row| T::components(row)); let coeffs_i64 = [ i64x2_splat(two_coeffs[0] as i64), i64x2_splat(two_coeffs[1] as i64), ]; for r in 0..2 { let source = wasm32_utils::load_v128(src_rows[r], src_x); for i in 0..4 { let c_i64x2 = i8x16_swizzle(source, c_shuffles[i]); sums[i] = i64x2_add(sums[i], wasm32_utils::i64x2_mul_lo(c_i64x2, coeffs_i64[r])); } } y += 2; } if let Some(&k) = coeffs_reminder.first() { if let Some(s_row) = src_view.iter_rows(y_start + y).next() { let components = T::components(s_row); let coeff_i64x2 = i64x2_splat(k as i64); let source = wasm32_utils::load_v128(components, src_x); for i in 0..4 { let c_i64x2 = i8x16_swizzle(source, c_shuffles[i]); sums[i] = i64x2_add(sums[i], wasm32_utils::i64x2_mul_lo(c_i64x2, coeff_i64x2)); } } } let mut dst_ptr = dst_chunk.as_mut_ptr(); for sum in sums { // let mask = _mm_cmpgt_epi64(sums[i], zero); // sums[i] = _mm_and_si128(sums[i] , mask); // sums[i] = _mm_srl_epi64(sums[i] , precision_i64); // _mm_packus_epi32(sums[i] , sums[i] ); v128_store(c_buf.as_mut_ptr() as *mut v128, sum); *dst_ptr = normalizer.clip(c_buf[0]); dst_ptr = dst_ptr.add(1); *dst_ptr = normalizer.clip(c_buf[1]); dst_ptr 
= dst_ptr.add(1); } src_x += 8; } dst_u16 = dst_chunks_8.into_remainder(); let mut dst_chunks_4 = dst_u16.chunks_exact_mut(4); if let Some(dst_chunk) = dst_chunks_4.next() { let mut c01 = initial; let mut c23 = initial; let mut y: u32 = 0; let coeffs_2 = coeffs.chunks_exact(2); let coeffs_reminder = coeffs_2.remainder(); for (src_rows, two_coeffs) in src_view.iter_2_rows(y_start, max_rows).zip(coeffs_2) { let src_rows = src_rows.map(|row| T::components(row)); let coeffs_i64 = [ i64x2_splat(two_coeffs[0] as i64), i64x2_splat(two_coeffs[1] as i64), ]; for r in 0..2 { let comp_x4 = src_rows[r].get_unchecked(src_x..src_x + 4); let c_i64x2 = i64x2(comp_x4[0] as i64, comp_x4[1] as i64); c01 = i64x2_add(c01, wasm32_utils::i64x2_mul_lo(c_i64x2, coeffs_i64[r])); let c_i64x2 = i64x2(comp_x4[2] as i64, comp_x4[3] as i64); c23 = i64x2_add(c23, wasm32_utils::i64x2_mul_lo(c_i64x2, coeffs_i64[r])); } y += 2; } if let Some(&k) = coeffs_reminder.first() { if let Some(s_row) = src_view.iter_rows(y_start + y).next() { let components = T::components(s_row); let coeff_i64x2 = i64x2_splat(k as i64); let comp_x4 = components.get_unchecked(src_x..src_x + 4); let c_i64x2 = i64x2(comp_x4[0] as i64, comp_x4[1] as i64); c01 = i64x2_add(c01, wasm32_utils::i64x2_mul_lo(c_i64x2, coeff_i64x2)); let c_i64x2 = i64x2(comp_x4[2] as i64, comp_x4[3] as i64); c23 = i64x2_add(c23, wasm32_utils::i64x2_mul_lo(c_i64x2, coeff_i64x2)); } } let mut dst_ptr = dst_chunk.as_mut_ptr(); v128_store(c_buf.as_mut_ptr() as *mut v128, c01); *dst_ptr = normalizer.clip(c_buf[0]); dst_ptr = dst_ptr.add(1); *dst_ptr = normalizer.clip(c_buf[1]); dst_ptr = dst_ptr.add(1); v128_store(c_buf.as_mut_ptr() as *mut v128, c23); *dst_ptr = normalizer.clip(c_buf[0]); dst_ptr = dst_ptr.add(1); *dst_ptr = normalizer.clip(c_buf[1]); src_x += 4; } dst_u16 = dst_chunks_4.into_remainder(); if !dst_u16.is_empty() { let initial = 1 << (precision - 1); convolution_by_u16( src_view, normalizer, initial, dst_u16, src_x, y_start, coeffs, ); } } fast_image_resize-5.3.0/src/convolution/vertical_u8/avx2.rs000064400000000000000000000230701046102023000221340ustar 00000000000000use std::arch::x86_64::*; use crate::convolution::optimisations::{CoefficientsI16Chunk, Normalizer16}; use crate::convolution::vertical_u8::native; use crate::image_view::ImageViewMut; use crate::pixels::InnerPixel; use crate::{simd_utils, ImageView}; #[inline] pub(crate) fn vert_convolution( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, offset: u32, normalizer: &Normalizer16, ) where T: InnerPixel, { let precision = normalizer.precision(); macro_rules! 
call { ($imm8:expr) => {{ vert_convolution_p::(src_view, dst_view, offset, normalizer); }}; } constify_imm8!(precision, call); } fn vert_convolution_p( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, offset: u32, normalizer: &Normalizer16, ) where T: InnerPixel, { let coefficients_chunks = normalizer.chunks(); let src_x = offset as usize * T::count_of_components(); let dst_rows = dst_view.iter_rows_mut(0); let dst_row_and_coefs = dst_rows.zip(coefficients_chunks); for (dst_row, coeffs_chunk) in dst_row_and_coefs { unsafe { vert_convolution_into_one_row::( src_view, dst_row, src_x, coeffs_chunk, normalizer, ); } } } #[inline] #[target_feature(enable = "avx2")] unsafe fn vert_convolution_into_one_row( src_view: &impl ImageView, dst_row: &mut [T], mut src_x: usize, coeffs_chunk: &CoefficientsI16Chunk, normalizer: &Normalizer16, ) where T: InnerPixel, { let y_start = coeffs_chunk.start; let coeffs = coeffs_chunk.values(); let max_rows = coeffs.len() as u32; let y_last = (y_start + max_rows).max(1) - 1; let initial = _mm_set1_epi32(1 << (PRECISION as u8 - 1)); let initial_256 = _mm256_set1_epi32(1 << (PRECISION as u8 - 1)); let mut dst_u8 = T::components_mut(dst_row); // 32 components in one register let mut dst_chunks_32 = dst_u8.chunks_exact_mut(32); for dst_chunk in &mut dst_chunks_32 { let mut sss0 = initial_256; let mut sss1 = initial_256; let mut sss2 = initial_256; let mut sss3 = initial_256; let coeffs_chunks = coeffs.chunks_exact(2); let coeffs_reminder = coeffs_chunks.remainder(); for (src_rows, two_coeffs) in src_view.iter_2_rows(y_start, max_rows).zip(coeffs_chunks) { let components1 = T::components(src_rows[0]); let components2 = T::components(src_rows[1]); // Load two coefficients at once let mmk = simd_utils::mm256_load_and_clone_i16x2(two_coeffs); let source1 = simd_utils::loadu_si256(components1, src_x); // top line let source2 = simd_utils::loadu_si256(components2, src_x); // bottom line let source = _mm256_unpacklo_epi8(source1, source2); let pix = _mm256_unpacklo_epi8(source, _mm256_setzero_si256()); sss0 = _mm256_add_epi32(sss0, _mm256_madd_epi16(pix, mmk)); let pix = _mm256_unpackhi_epi8(source, _mm256_setzero_si256()); sss1 = _mm256_add_epi32(sss1, _mm256_madd_epi16(pix, mmk)); let source = _mm256_unpackhi_epi8(source1, source2); let pix = _mm256_unpacklo_epi8(source, _mm256_setzero_si256()); sss2 = _mm256_add_epi32(sss2, _mm256_madd_epi16(pix, mmk)); let pix = _mm256_unpackhi_epi8(source, _mm256_setzero_si256()); sss3 = _mm256_add_epi32(sss3, _mm256_madd_epi16(pix, mmk)); } if let Some(&k) = coeffs_reminder.first() { if let Some(s_row) = src_view.iter_rows(y_last).next() { let components = T::components(s_row); let mmk = _mm256_set1_epi32(k as i32); let source1 = simd_utils::loadu_si256(components, src_x); // top line let source2 = _mm256_setzero_si256(); // bottom line is empty let source = _mm256_unpacklo_epi8(source1, source2); let pix = _mm256_unpacklo_epi8(source, _mm256_setzero_si256()); sss0 = _mm256_add_epi32(sss0, _mm256_madd_epi16(pix, mmk)); let pix = _mm256_unpackhi_epi8(source, _mm256_setzero_si256()); sss1 = _mm256_add_epi32(sss1, _mm256_madd_epi16(pix, mmk)); let source = _mm256_unpackhi_epi8(source1, _mm256_setzero_si256()); let pix = _mm256_unpacklo_epi8(source, _mm256_setzero_si256()); sss2 = _mm256_add_epi32(sss2, _mm256_madd_epi16(pix, mmk)); let pix = _mm256_unpackhi_epi8(source, _mm256_setzero_si256()); sss3 = _mm256_add_epi32(sss3, _mm256_madd_epi16(pix, mmk)); } } sss0 = _mm256_srai_epi32::(sss0); sss1 = _mm256_srai_epi32::(sss1); 
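// The accumulated 32-bit sums are shifted right by PRECISION (the
// fixed-point scale of the coefficients) and then packed with saturation
// down to unsigned 8-bit components before being stored into the
// destination chunk.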
sss2 = _mm256_srai_epi32::(sss2); sss3 = _mm256_srai_epi32::(sss3); sss0 = _mm256_packs_epi32(sss0, sss1); sss2 = _mm256_packs_epi32(sss2, sss3); sss0 = _mm256_packus_epi16(sss0, sss2); let dst_ptr = dst_chunk.as_mut_ptr() as *mut __m256i; _mm256_storeu_si256(dst_ptr, sss0); src_x += 32; } // 8 components in half of SSE register dst_u8 = dst_chunks_32.into_remainder(); let mut dst_chunks_8 = dst_u8.chunks_exact_mut(8); for dst_chunk in &mut dst_chunks_8 { let mut sss0 = initial; // left row let mut sss1 = initial; // right row let coeffs_chunks = coeffs.chunks_exact(2); let coeffs_reminder = coeffs_chunks.remainder(); for (src_rows, two_coeffs) in src_view.iter_2_rows(y_start, max_rows).zip(coeffs_chunks) { let components1 = T::components(src_rows[0]); let components2 = T::components(src_rows[1]); // Load two coefficients at once let mmk = simd_utils::mm_load_and_clone_i16x2(two_coeffs); let source1 = simd_utils::loadl_epi64(components1, src_x); // top line let source2 = simd_utils::loadl_epi64(components2, src_x); // bottom line let source = _mm_unpacklo_epi8(source1, source2); let pix = _mm_unpacklo_epi8(source, _mm_setzero_si128()); sss0 = _mm_add_epi32(sss0, _mm_madd_epi16(pix, mmk)); let pix = _mm_unpackhi_epi8(source, _mm_setzero_si128()); sss1 = _mm_add_epi32(sss1, _mm_madd_epi16(pix, mmk)); } if let Some(&k) = coeffs_reminder.first() { if let Some(s_row) = src_view.iter_rows(y_last).next() { let components = T::components(s_row); let mmk = _mm_set1_epi32(k as i32); let source1 = simd_utils::loadl_epi64(components, src_x); // top line let source2 = _mm_setzero_si128(); // bottom line is empty let source = _mm_unpacklo_epi8(source1, source2); let pix = _mm_unpacklo_epi8(source, _mm_setzero_si128()); sss0 = _mm_add_epi32(sss0, _mm_madd_epi16(pix, mmk)); let pix = _mm_unpackhi_epi8(source, _mm_setzero_si128()); sss1 = _mm_add_epi32(sss1, _mm_madd_epi16(pix, mmk)); } } sss0 = _mm_srai_epi32::(sss0); sss1 = _mm_srai_epi32::(sss1); sss0 = _mm_packs_epi32(sss0, sss1); sss0 = _mm_packus_epi16(sss0, sss0); let dst_ptr = dst_chunk.as_mut_ptr() as *mut __m128i; _mm_storel_epi64(dst_ptr, sss0); src_x += 8; } dst_u8 = dst_chunks_8.into_remainder(); let mut dst_chunks_4 = dst_u8.chunks_exact_mut(4); if let Some(dst_chunk) = dst_chunks_4.next() { let mut sss = initial; let coeffs_chunks = coeffs.chunks_exact(2); let coeffs_reminder = coeffs_chunks.remainder(); for (src_rows, two_coeffs) in src_view.iter_2_rows(y_start, max_rows).zip(coeffs_chunks) { let components1 = T::components(src_rows[0]); let components2 = T::components(src_rows[1]); // Load two coefficients at once let two_coeffs = simd_utils::mm_load_and_clone_i16x2(two_coeffs); let row1 = simd_utils::mm_cvtsi32_si128_from_u8(components1, src_x); // top line let row2 = simd_utils::mm_cvtsi32_si128_from_u8(components2, src_x); // bottom line let pixels_u8 = _mm_unpacklo_epi8(row1, row2); let pixels_i16 = _mm_unpacklo_epi8(pixels_u8, _mm_setzero_si128()); sss = _mm_add_epi32(sss, _mm_madd_epi16(pixels_i16, two_coeffs)); } if let Some(&k) = coeffs_reminder.first() { if let Some(s_row) = src_view.iter_rows(y_last).next() { let components = T::components(s_row); let pix = simd_utils::mm_cvtepu8_epi32_from_u8(components, src_x); let mmk = _mm_set1_epi32(k as i32); sss = _mm_add_epi32(sss, _mm_madd_epi16(pix, mmk)); } } sss = _mm_srai_epi32::(sss); sss = _mm_packs_epi32(sss, sss); let dst_ptr = dst_chunk.as_mut_ptr() as *mut i32; dst_ptr.write_unaligned(_mm_cvtsi128_si32(_mm_packus_epi16(sss, sss))); src_x += 4; } dst_u8 = 
dst_chunks_4.into_remainder(); if !dst_u8.is_empty() { native::convolution_by_u8( src_view, normalizer, 1 << (PRECISION as u8 - 1), dst_u8, src_x, y_start, coeffs, ); } } fast_image_resize-5.3.0/src/convolution/vertical_u8/mod.rs000064400000000000000000000026351046102023000220370ustar 00000000000000use crate::convolution::optimisations::Normalizer16; use crate::pixels::InnerPixel; use crate::{CpuExtensions, ImageView, ImageViewMut}; #[cfg(target_arch = "x86_64")] pub(crate) mod avx2; pub(crate) mod native; #[cfg(target_arch = "aarch64")] mod neon; #[cfg(target_arch = "x86_64")] pub(crate) mod sse4; #[cfg(target_arch = "wasm32")] pub(crate) mod wasm32; pub(crate) fn vert_convolution_u8>( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, offset: u32, normalizer: &Normalizer16, cpu_extensions: CpuExtensions, ) { // Check safety conditions debug_assert!(src_view.width() - offset >= dst_view.width()); debug_assert_eq!(normalizer.chunks_len(), dst_view.height() as usize); match cpu_extensions { #[cfg(target_arch = "x86_64")] CpuExtensions::Avx2 => avx2::vert_convolution(src_view, dst_view, offset, normalizer), #[cfg(target_arch = "x86_64")] CpuExtensions::Sse4_1 => sse4::vert_convolution(src_view, dst_view, offset, normalizer), #[cfg(target_arch = "aarch64")] CpuExtensions::Neon => neon::vert_convolution(src_view, dst_view, offset, normalizer), #[cfg(target_arch = "wasm32")] CpuExtensions::Simd128 => wasm32::vert_convolution(src_view, dst_view, offset, normalizer), _ => native::vert_convolution(src_view, dst_view, offset, normalizer), } } fast_image_resize-5.3.0/src/convolution/vertical_u8/native.rs000064400000000000000000000104161046102023000225420ustar 00000000000000use crate::convolution::optimisations::{CoefficientsI16Chunk, Normalizer16}; use crate::image_view::{ImageView, ImageViewMut}; use crate::pixels::InnerPixel; use crate::utils::foreach_with_pre_reading; #[inline(always)] pub(crate) fn vert_convolution( src_image: &impl ImageView, dst_image: &mut impl ImageViewMut, offset: u32, normalizer: &Normalizer16, ) where T: InnerPixel, { let coefficients_chunks = normalizer.chunks(); let precision = normalizer.precision(); let initial = 1 << (precision - 1); let src_x_initial = offset as usize * T::count_of_components(); let dst_rows = dst_image.iter_rows_mut(0); let coeffs_chunks_iter = coefficients_chunks.iter(); let coefs_and_dst_row = coeffs_chunks_iter.zip(dst_rows); for (coeffs_chunk, dst_row) in coefs_and_dst_row { scale_row( src_image, dst_row, initial, src_x_initial, coeffs_chunk, normalizer, ); } } fn scale_row( src_image: &impl ImageView, dst_row: &mut [T], initial: i32, src_x_initial: usize, coeffs_chunk: &CoefficientsI16Chunk, normalizer: &Normalizer16, ) where T: InnerPixel, { let first_y_src = coeffs_chunk.start; let ks = coeffs_chunk.values(); let mut x_src = src_x_initial; let dst_components = T::components_mut(dst_row); let (_, dst_chunks, tail) = unsafe { dst_components.align_to_mut::<[u8; 16]>() }; x_src = convolution_by_chunks( src_image, normalizer, initial, dst_chunks, x_src, first_y_src, ks, ); if tail.is_empty() { return; } let (_, dst_chunks, tail) = unsafe { tail.align_to_mut::<[u8; 8]>() }; x_src = convolution_by_chunks( src_image, normalizer, initial, dst_chunks, x_src, first_y_src, ks, ); if tail.is_empty() { return; } let (_, dst_chunks, tail) = unsafe { tail.align_to_mut::<[u8; 4]>() }; x_src = convolution_by_chunks( src_image, normalizer, initial, dst_chunks, x_src, first_y_src, ks, ); if !tail.is_empty() { convolution_by_u8(src_image, normalizer, 
initial, tail, x_src, first_y_src, ks); } } #[inline(always)] pub(crate) fn convolution_by_u8( src_image: &impl ImageView, normalizer: &Normalizer16, initial: i32, dst_components: &mut [u8], mut x_src: usize, first_y_src: u32, ks: &[i16], ) -> usize where T: InnerPixel, { for dst_component in dst_components { let mut ss = initial; let src_rows = src_image.iter_rows(first_y_src); for (&k, src_row) in ks.iter().zip(src_rows) { let src_ptr = src_row.as_ptr() as *const u8; let src_component = unsafe { *src_ptr.add(x_src) }; ss += src_component as i32 * (k as i32); } *dst_component = unsafe { normalizer.clip(ss) }; x_src += 1 } x_src } #[inline(always)] fn convolution_by_chunks( src_image: &impl ImageView, normalizer: &Normalizer16, initial: i32, dst_chunks: &mut [[u8; CHUNK_SIZE]], mut x_src: usize, first_y_src: u32, ks: &[i16], ) -> usize where T: InnerPixel, { for dst_chunk in dst_chunks { let mut ss = [initial; CHUNK_SIZE]; let src_rows = src_image.iter_rows(first_y_src); foreach_with_pre_reading( ks.iter().zip(src_rows), |(&k, src_row)| { let src_ptr = src_row.as_ptr() as *const u8; let src_chunk = unsafe { let ptr = src_ptr.add(x_src) as *const [u8; CHUNK_SIZE]; ptr.read_unaligned() }; (src_chunk, k) }, |(src_chunk, k)| { for (s, c) in ss.iter_mut().zip(src_chunk) { *s += c as i32 * (k as i32); } }, ); for (i, s) in ss.iter().copied().enumerate() { dst_chunk[i] = unsafe { normalizer.clip(s) }; } x_src += CHUNK_SIZE; } x_src } fast_image_resize-5.3.0/src/convolution/vertical_u8/neon.rs000064400000000000000000000201431046102023000222110ustar 00000000000000use std::arch::aarch64::*; use std::mem::transmute; use crate::convolution::optimisations::{CoefficientsI16Chunk, Normalizer16}; use crate::neon_utils; use crate::pixels::InnerPixel; use crate::{ImageView, ImageViewMut}; pub(crate) fn vert_convolution( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, offset: u32, normalizer: &Normalizer16, ) where T: InnerPixel, { let coefficients_chunks = normalizer.chunks(); let precision = normalizer.precision(); let initial = 1 << (precision - 1); let start_src_x = offset as usize * T::count_of_components(); let mut tmp_dst = vec![0i32; dst_view.width() as usize * T::count_of_components()]; let tmp_buf = tmp_dst.as_mut_slice(); let dst_rows = dst_view.iter_rows_mut(0); for (dst_row, coeffs_chunk) in dst_rows.zip(coefficients_chunks) { tmp_buf.fill(initial); unsafe { vert_convolution_into_one_row_i32(src_view, tmp_buf, start_src_x, coeffs_chunk); let dst_comp = T::components_mut(dst_row); macro_rules! 
call { ($imm8:expr) => {{ store_tmp_buf_into_dst_row::<$imm8>(tmp_buf, dst_comp, &normalizer); }}; } constify_imm8!(precision as i32, call); } } } #[target_feature(enable = "neon")] unsafe fn vert_convolution_into_one_row_i32( src_view: &impl ImageView, dst_buf: &mut [i32], start_src_x: usize, coeffs_chunk: &CoefficientsI16Chunk, ) where T: InnerPixel, { let width = dst_buf.len(); let y_start = coeffs_chunk.start; let coeffs = coeffs_chunk.values(); let zero_u8x16 = vdupq_n_u8(0); let zero_u8x8 = vdup_n_u8(0); for (s_row, &coeff) in src_view.iter_rows(y_start).zip(coeffs) { let components = T::components(s_row); let coeff_i16x4 = vdup_n_s16(coeff); let mut dst_x: usize = 0; let mut src_x = start_src_x; while dst_x < width.saturating_sub(63) { let source = neon_utils::load_u8x16x4(components, src_x); for s in [source.0, source.1, source.2, source.3] { let mut accum = neon_utils::load_i32x4x4(dst_buf, dst_x); let pix = vreinterpretq_s16_u8(vzip1q_u8(s, zero_u8x16)); accum.0 = vmlal_s16(accum.0, vget_low_s16(pix), coeff_i16x4); accum.1 = vmlal_s16(accum.1, vget_high_s16(pix), coeff_i16x4); let pix = vreinterpretq_s16_u8(vzip2q_u8(s, zero_u8x16)); accum.2 = vmlal_s16(accum.2, vget_low_s16(pix), coeff_i16x4); accum.3 = vmlal_s16(accum.3, vget_high_s16(pix), coeff_i16x4); neon_utils::store_i32x4x4(dst_buf, dst_x, accum); dst_x += 16; src_x += 16; } } if dst_x < width.saturating_sub(31) { let source = neon_utils::load_u8x16x2(components, src_x); for s in [source.0, source.1] { let mut accum = neon_utils::load_i32x4x4(dst_buf, dst_x); let pix = vreinterpretq_s16_u8(vzip1q_u8(s, zero_u8x16)); accum.0 = vmlal_s16(accum.0, vget_low_s16(pix), coeff_i16x4); accum.1 = vmlal_s16(accum.1, vget_high_s16(pix), coeff_i16x4); let pix = vreinterpretq_s16_u8(vzip2q_u8(s, zero_u8x16)); accum.2 = vmlal_s16(accum.2, vget_low_s16(pix), coeff_i16x4); accum.3 = vmlal_s16(accum.3, vget_high_s16(pix), coeff_i16x4); neon_utils::store_i32x4x4(dst_buf, dst_x, accum); dst_x += 16; src_x += 16; } } if dst_x < width.saturating_sub(15) { let s = neon_utils::load_u8x16(components, src_x); let mut accum = neon_utils::load_i32x4x4(dst_buf, dst_x); let pix = vreinterpretq_s16_u8(vzip1q_u8(s, zero_u8x16)); accum.0 = vmlal_s16(accum.0, vget_low_s16(pix), coeff_i16x4); accum.1 = vmlal_s16(accum.1, vget_high_s16(pix), coeff_i16x4); let pix = vreinterpretq_s16_u8(vzip2q_u8(s, zero_u8x16)); accum.2 = vmlal_s16(accum.2, vget_low_s16(pix), coeff_i16x4); accum.3 = vmlal_s16(accum.3, vget_high_s16(pix), coeff_i16x4); neon_utils::store_i32x4x4(dst_buf, dst_x, accum); dst_x += 16; src_x += 16; } if dst_x < width.saturating_sub(7) { let s = vcombine_u8(neon_utils::load_u8x8(components, src_x), zero_u8x8); let mut accum = neon_utils::load_i32x4x2(dst_buf, dst_x); let pix = vreinterpretq_s16_u8(vzip1q_u8(s, zero_u8x16)); accum.0 = vmlal_s16(accum.0, vget_low_s16(pix), coeff_i16x4); accum.1 = vmlal_s16(accum.1, vget_high_s16(pix), coeff_i16x4); neon_utils::store_i32x4x2(dst_buf, dst_x, accum); dst_x += 8; src_x += 8; } if dst_x < width.saturating_sub(3) { let s = neon_utils::create_u8x16_from_one_u32(components, src_x); let mut accum = neon_utils::load_i32x4(dst_buf, dst_x); let pix = vreinterpretq_s16_u8(vzip1q_u8(s, zero_u8x16)); accum = vmlal_s16(accum, vget_low_s16(pix), coeff_i16x4); neon_utils::store_i32x4(dst_buf, dst_x, accum); dst_x += 4; src_x += 4; } let coeff = coeff as i32; let tmp_tail = dst_buf.iter_mut().skip(dst_x); let comp_tail = components.iter().skip(src_x); for (accum, &comp) in tmp_tail.zip(comp_tail) { *accum += coeff * 
comp as i32; } } } #[target_feature(enable = "neon")] unsafe fn store_tmp_buf_into_dst_row( mut src_buf: &[i32], dst_buf: &mut [u8], normalizer: &Normalizer16, ) { let mut dst_chunks_16 = dst_buf.chunks_exact_mut(16); let src_chunks_16 = src_buf.chunks_exact(16); src_buf = src_chunks_16.remainder(); for (dst_chunk, src_chunk) in dst_chunks_16.by_ref().zip(src_chunks_16) { let mut accum = neon_utils::load_i32x4x4(src_chunk, 0); accum.0 = vshrq_n_s32::(accum.0); accum.1 = vshrq_n_s32::(accum.1); accum.2 = vshrq_n_s32::(accum.2); accum.3 = vshrq_n_s32::(accum.3); let sss0_i16 = vcombine_s16(vqmovn_s32(accum.0), vqmovn_s32(accum.1)); let sss1_i16 = vcombine_s16(vqmovn_s32(accum.2), vqmovn_s32(accum.3)); let sss_u8 = vcombine_u8(vqmovun_s16(sss0_i16), vqmovun_s16(sss1_i16)); let dst_ptr = dst_chunk.as_mut_ptr() as *mut u128; vstrq_p128(dst_ptr, transmute(sss_u8)); } let mut dst_chunks_8 = dst_chunks_16.into_remainder().chunks_exact_mut(8); let src_chunks_8 = src_buf.chunks_exact(8); src_buf = src_chunks_8.remainder(); for (dst_chunk, src_chunk) in dst_chunks_8.by_ref().zip(src_chunks_8) { let mut accum = neon_utils::load_i32x4x2(src_chunk, 0); accum.0 = vshrq_n_s32::(accum.0); accum.1 = vshrq_n_s32::(accum.1); let sss_i16 = vcombine_s16(vqmovn_s32(accum.0), vqmovn_s32(accum.1)); let sss_u8 = vcombine_u8(vqmovun_s16(sss_i16), vqmovun_s16(sss_i16)); let res = vdupd_laneq_u64::<0>(vreinterpretq_u64_u8(sss_u8)); let dst_ptr = dst_chunk.as_mut_ptr() as *mut u64; dst_ptr.write_unaligned(res); } let mut dst_chunks_4 = dst_chunks_8.into_remainder().chunks_exact_mut(4); let src_chunks_4 = src_buf.chunks_exact(4); src_buf = src_chunks_4.remainder(); for (dst_chunk, src_chunk) in dst_chunks_4.by_ref().zip(src_chunks_4) { let mut accum = neon_utils::load_i32x4(src_chunk, 0); accum = vshrq_n_s32::(accum); let sss_i16 = vcombine_s16(vqmovn_s32(accum), vqmovn_s32(accum)); let sss_u8 = vcombine_u8(vqmovun_s16(sss_i16), vqmovun_s16(sss_i16)); let res = vdups_laneq_u32::<0>(vreinterpretq_u32_u8(sss_u8)); let dst_ptr = dst_chunk.as_mut_ptr() as *mut u32; dst_ptr.write_unaligned(res); } let dst_chunk = dst_chunks_4.into_remainder(); for (dst, &src) in dst_chunk.iter_mut().zip(src_buf) { *dst = normalizer.clip(src); } } fast_image_resize-5.3.0/src/convolution/vertical_u8/sse4.rs000064400000000000000000000263511046102023000221370ustar 00000000000000use std::arch::x86_64::*; use crate::convolution::optimisations::{CoefficientsI16Chunk, Normalizer16}; use crate::convolution::vertical_u8::native; use crate::pixels::InnerPixel; use crate::{simd_utils, ImageView, ImageViewMut}; #[inline] pub(crate) fn vert_convolution( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, offset: u32, normalizer: &Normalizer16, ) where T: InnerPixel, { let precision = normalizer.precision(); macro_rules! 
call { ($imm8:expr) => {{ vert_convolution_p::(src_view, dst_view, offset, normalizer); }}; } constify_imm8!(precision, call); } fn vert_convolution_p( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, offset: u32, normalizer: &Normalizer16, ) where T: InnerPixel, { let coefficients_chunks = normalizer.chunks(); let src_x = offset as usize * T::count_of_components(); let dst_rows = dst_view.iter_rows_mut(0); let dst_row_and_coefs = dst_rows.zip(coefficients_chunks); for (dst_row, coeffs_chunk) in dst_row_and_coefs { unsafe { vert_convolution_into_one_row::( src_view, dst_row, src_x, coeffs_chunk, normalizer, ); } } } #[target_feature(enable = "sse4.1")] unsafe fn vert_convolution_into_one_row( src_view: &impl ImageView, dst_row: &mut [T], mut src_x: usize, coeffs_chunk: &CoefficientsI16Chunk, normalizer: &Normalizer16, ) where T: InnerPixel, { let y_start = coeffs_chunk.start; let coeffs = coeffs_chunk.values(); let max_rows = coeffs.len() as u32; let y_last = (y_start + max_rows).max(1) - 1; let mut dst_u8 = T::components_mut(dst_row); let initial = _mm_set1_epi32(1 << (PRECISION - 1)); let mut dst_chunks_32 = dst_u8.chunks_exact_mut(32); for dst_chunk in &mut dst_chunks_32 { let mut sss0 = initial; let mut sss1 = initial; let mut sss2 = initial; let mut sss3 = initial; let mut sss4 = initial; let mut sss5 = initial; let mut sss6 = initial; let mut sss7 = initial; let coeffs_chunks = coeffs.chunks_exact(2); let coeffs_reminder = coeffs_chunks.remainder(); for (src_rows, two_coeffs) in src_view.iter_2_rows(y_start, max_rows).zip(coeffs_chunks) { let components1 = T::components(src_rows[0]); let components2 = T::components(src_rows[1]); // Load two coefficients at once let mmk = simd_utils::mm_load_and_clone_i16x2(two_coeffs); let source1 = simd_utils::loadu_si128(components1, src_x); // top line let source2 = simd_utils::loadu_si128(components2, src_x); // bottom line let source = _mm_unpacklo_epi8(source1, source2); let pix = _mm_unpacklo_epi8(source, _mm_setzero_si128()); sss0 = _mm_add_epi32(sss0, _mm_madd_epi16(pix, mmk)); let pix = _mm_unpackhi_epi8(source, _mm_setzero_si128()); sss1 = _mm_add_epi32(sss1, _mm_madd_epi16(pix, mmk)); let source = _mm_unpackhi_epi8(source1, source2); let pix = _mm_unpacklo_epi8(source, _mm_setzero_si128()); sss2 = _mm_add_epi32(sss2, _mm_madd_epi16(pix, mmk)); let pix = _mm_unpackhi_epi8(source, _mm_setzero_si128()); sss3 = _mm_add_epi32(sss3, _mm_madd_epi16(pix, mmk)); let source1 = simd_utils::loadu_si128(components1, src_x + 16); // top line let source2 = simd_utils::loadu_si128(components2, src_x + 16); // bottom line let source = _mm_unpacklo_epi8(source1, source2); let pix = _mm_unpacklo_epi8(source, _mm_setzero_si128()); sss4 = _mm_add_epi32(sss4, _mm_madd_epi16(pix, mmk)); let pix = _mm_unpackhi_epi8(source, _mm_setzero_si128()); sss5 = _mm_add_epi32(sss5, _mm_madd_epi16(pix, mmk)); let source = _mm_unpackhi_epi8(source1, source2); let pix = _mm_unpacklo_epi8(source, _mm_setzero_si128()); sss6 = _mm_add_epi32(sss6, _mm_madd_epi16(pix, mmk)); let pix = _mm_unpackhi_epi8(source, _mm_setzero_si128()); sss7 = _mm_add_epi32(sss7, _mm_madd_epi16(pix, mmk)); } if let Some(&k) = coeffs_reminder.first() { if let Some(s_row) = src_view.iter_rows(y_last).next() { let components = T::components(s_row); let mmk = _mm_set1_epi32(k as i32); let source1 = simd_utils::loadu_si128(components, src_x); // top line let source = _mm_unpacklo_epi8(source1, _mm_setzero_si128()); let pix = _mm_unpacklo_epi8(source, _mm_setzero_si128()); sss0 = 
_mm_add_epi32(sss0, _mm_madd_epi16(pix, mmk)); let pix = _mm_unpackhi_epi8(source, _mm_setzero_si128()); sss1 = _mm_add_epi32(sss1, _mm_madd_epi16(pix, mmk)); let source = _mm_unpackhi_epi8(source1, _mm_setzero_si128()); let pix = _mm_unpacklo_epi8(source, _mm_setzero_si128()); sss2 = _mm_add_epi32(sss2, _mm_madd_epi16(pix, mmk)); let pix = _mm_unpackhi_epi8(source, _mm_setzero_si128()); sss3 = _mm_add_epi32(sss3, _mm_madd_epi16(pix, mmk)); let source1 = simd_utils::loadu_si128(components, src_x + 16); // top line let source = _mm_unpacklo_epi8(source1, _mm_setzero_si128()); let pix = _mm_unpacklo_epi8(source, _mm_setzero_si128()); sss4 = _mm_add_epi32(sss4, _mm_madd_epi16(pix, mmk)); let pix = _mm_unpackhi_epi8(source, _mm_setzero_si128()); sss5 = _mm_add_epi32(sss5, _mm_madd_epi16(pix, mmk)); let source = _mm_unpackhi_epi8(source1, _mm_setzero_si128()); let pix = _mm_unpacklo_epi8(source, _mm_setzero_si128()); sss6 = _mm_add_epi32(sss6, _mm_madd_epi16(pix, mmk)); let pix = _mm_unpackhi_epi8(source, _mm_setzero_si128()); sss7 = _mm_add_epi32(sss7, _mm_madd_epi16(pix, mmk)); } } sss0 = _mm_srai_epi32::(sss0); sss1 = _mm_srai_epi32::(sss1); sss2 = _mm_srai_epi32::(sss2); sss3 = _mm_srai_epi32::(sss3); sss4 = _mm_srai_epi32::(sss4); sss5 = _mm_srai_epi32::(sss5); sss6 = _mm_srai_epi32::(sss6); sss7 = _mm_srai_epi32::(sss7); sss0 = _mm_packs_epi32(sss0, sss1); sss2 = _mm_packs_epi32(sss2, sss3); sss0 = _mm_packus_epi16(sss0, sss2); let dst_ptr = dst_chunk.as_mut_ptr() as *mut __m128i; _mm_storeu_si128(dst_ptr, sss0); sss4 = _mm_packs_epi32(sss4, sss5); sss6 = _mm_packs_epi32(sss6, sss7); sss4 = _mm_packus_epi16(sss4, sss6); let dst_ptr = dst_ptr.add(1); _mm_storeu_si128(dst_ptr, sss4); src_x += 32; } dst_u8 = dst_chunks_32.into_remainder(); let mut dst_chunks_8 = dst_u8.chunks_exact_mut(8); for dst_chunk in &mut dst_chunks_8 { let mut sss0 = initial; // left row let mut sss1 = initial; // right row let coeffs_chunks = coeffs.chunks_exact(2); let coeffs_reminder = coeffs_chunks.remainder(); for (src_rows, two_coeffs) in src_view.iter_2_rows(y_start, max_rows).zip(coeffs_chunks) { let components1 = T::components(src_rows[0]); let components2 = T::components(src_rows[1]); // Load two coefficients at once let mmk = simd_utils::mm_load_and_clone_i16x2(two_coeffs); let source1 = simd_utils::loadl_epi64(components1, src_x); // top line let source2 = simd_utils::loadl_epi64(components2, src_x); // bottom line let source = _mm_unpacklo_epi8(source1, source2); let pix = _mm_unpacklo_epi8(source, _mm_setzero_si128()); sss0 = _mm_add_epi32(sss0, _mm_madd_epi16(pix, mmk)); let pix = _mm_unpackhi_epi8(source, _mm_setzero_si128()); sss1 = _mm_add_epi32(sss1, _mm_madd_epi16(pix, mmk)); } if let Some(&k) = coeffs_reminder.first() { if let Some(s_row) = src_view.iter_rows(y_last).next() { let components = T::components(s_row); let mmk = _mm_set1_epi32(k as i32); let source1 = simd_utils::loadl_epi64(components, src_x); // top line let source = _mm_unpacklo_epi8(source1, _mm_setzero_si128()); let pix = _mm_unpacklo_epi8(source, _mm_setzero_si128()); sss0 = _mm_add_epi32(sss0, _mm_madd_epi16(pix, mmk)); let pix = _mm_unpackhi_epi8(source, _mm_setzero_si128()); sss1 = _mm_add_epi32(sss1, _mm_madd_epi16(pix, mmk)); } } sss0 = _mm_srai_epi32::(sss0); sss1 = _mm_srai_epi32::(sss1); sss0 = _mm_packs_epi32(sss0, sss1); sss0 = _mm_packus_epi16(sss0, sss0); let dst_ptr = dst_chunk.as_mut_ptr() as *mut __m128i; _mm_storel_epi64(dst_ptr, sss0); src_x += 8; } dst_u8 = dst_chunks_8.into_remainder(); let mut dst_chunks_4 = 
dst_u8.chunks_exact_mut(4); if let Some(dst_chunk) = dst_chunks_4.next() { let mut sss = initial; let coeffs_chunks = coeffs.chunks_exact(2); let coeffs_reminder = coeffs_chunks.remainder(); for (src_rows, two_coeffs) in src_view.iter_2_rows(y_start, max_rows).zip(coeffs_chunks) { let components1 = T::components(src_rows[0]); let components2 = T::components(src_rows[1]); // Load two coefficients at once let mmk = simd_utils::mm_load_and_clone_i16x2(two_coeffs); let source1 = simd_utils::mm_cvtsi32_si128_from_u8(components1, src_x); // top line let source2 = simd_utils::mm_cvtsi32_si128_from_u8(components2, src_x); // bottom line let source = _mm_unpacklo_epi8(source1, source2); let pix = _mm_unpacklo_epi8(source, _mm_setzero_si128()); sss = _mm_add_epi32(sss, _mm_madd_epi16(pix, mmk)); } if let Some(&k) = coeffs_reminder.first() { if let Some(s_row) = src_view.iter_rows(y_last).next() { let components = T::components(s_row); let pix = simd_utils::mm_cvtepu8_epi32_from_u8(components, src_x); let mmk = _mm_set1_epi32(k as i32); sss = _mm_add_epi32(sss, _mm_madd_epi16(pix, mmk)); } } sss = _mm_srai_epi32::(sss); sss = _mm_packs_epi32(sss, sss); let dst_ptr = dst_chunk.as_mut_ptr() as *mut i32; dst_ptr.write_unaligned(_mm_cvtsi128_si32(_mm_packus_epi16(sss, sss))); src_x += 4; } dst_u8 = dst_chunks_4.into_remainder(); if !dst_u8.is_empty() { native::convolution_by_u8( src_view, normalizer, 1 << (PRECISION - 1), dst_u8, src_x, y_start, coeffs, ); } } fast_image_resize-5.3.0/src/convolution/vertical_u8/wasm32.rs000064400000000000000000000253161046102023000223750ustar 00000000000000use std::arch::wasm32::*; use crate::convolution::optimisations::{CoefficientsI16Chunk, Normalizer16}; use crate::convolution::vertical_u8::native; use crate::pixels::InnerPixel; use crate::wasm32_utils; use crate::{ImageView, ImageViewMut}; #[inline] pub(crate) fn vert_convolution>( src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, offset: u32, normalizer: &Normalizer16, ) { let coefficients_chunks = normalizer.chunks(); let src_x = offset as usize * T::count_of_components(); let dst_rows = dst_view.iter_rows_mut(0); for (dst_row, coeffs_chunk) in dst_rows.zip(coefficients_chunks) { unsafe { vert_convolution_into_one_row_u8(src_view, dst_row, src_x, coeffs_chunk, normalizer); } } } #[inline] #[target_feature(enable = "simd128")] unsafe fn vert_convolution_into_one_row_u8>( src_view: &impl ImageView, dst_row: &mut [T], mut src_x: usize, coeffs_chunk: &CoefficientsI16Chunk, normalizer: &Normalizer16, ) { const ZERO: v128 = i64x2(0, 0); let y_start = coeffs_chunk.start; let coeffs = coeffs_chunk.values(); let max_rows = coeffs.len() as u32; let precision = normalizer.precision() as u32; let mut dst_u8 = T::components_mut(dst_row); let initial = i32x4_splat(1 << (precision - 1)); let mut dst_chunks_32 = dst_u8.chunks_exact_mut(32); for dst_chunk in &mut dst_chunks_32 { let mut sss0 = initial; let mut sss1 = initial; let mut sss2 = initial; let mut sss3 = initial; let mut sss4 = initial; let mut sss5 = initial; let mut sss6 = initial; let mut sss7 = initial; let mut y: u32 = 0; for src_rows in src_view.iter_2_rows(y_start, max_rows) { let components1 = T::components(src_rows[0]); let components2 = T::components(src_rows[1]); // Load two coefficients at once let mmk = wasm32_utils::ptr_i16_to_set1_i32(coeffs, y as usize); let source1 = wasm32_utils::load_v128(components1, src_x); // top line let source2 = wasm32_utils::load_v128(components2, src_x); // bottom line let source = i8x16_shuffle::<0, 16, 1, 17, 2, 18, 
3, 19, 4, 20, 5, 21, 6, 22, 7, 23>( source1, source2, ); let pix = i16x8_extend_low_u8x16(source); sss0 = i32x4_add(sss0, i32x4_dot_i16x8(pix, mmk)); let pix = i16x8_extend_high_u8x16(source); sss1 = i32x4_add(sss1, i32x4_dot_i16x8(pix, mmk)); let source = i8x16_shuffle::<8, 24, 9, 25, 10, 26, 11, 27, 12, 28, 13, 29, 14, 30, 15, 31>( source1, source2, ); let pix = i16x8_extend_low_u8x16(source); sss2 = i32x4_add(sss2, i32x4_dot_i16x8(pix, mmk)); let pix = i16x8_extend_high_u8x16(source); sss3 = i32x4_add(sss3, i32x4_dot_i16x8(pix, mmk)); let source1 = wasm32_utils::load_v128(components1, src_x + 16); // top line let source2 = wasm32_utils::load_v128(components2, src_x + 16); // bottom line let source = i8x16_shuffle::<0, 16, 1, 17, 2, 18, 3, 19, 4, 20, 5, 21, 6, 22, 7, 23>( source1, source2, ); let pix = i16x8_extend_low_u8x16(source); sss4 = i32x4_add(sss4, i32x4_dot_i16x8(pix, mmk)); let pix = i16x8_extend_high_u8x16(source); sss5 = i32x4_add(sss5, i32x4_dot_i16x8(pix, mmk)); let source = i8x16_shuffle::<8, 24, 9, 25, 10, 26, 11, 27, 12, 28, 13, 29, 14, 30, 15, 31>( source1, source2, ); let pix = i16x8_extend_low_u8x16(source); sss6 = i32x4_add(sss6, i32x4_dot_i16x8(pix, mmk)); let pix = i16x8_extend_high_u8x16(source); sss7 = i32x4_add(sss7, i32x4_dot_i16x8(pix, mmk)); y += 2; } if let Some(&k) = coeffs.get(y as usize) { if let Some(s_row) = src_view.iter_rows(y_start + y).next() { let components = T::components(s_row); let mmk = i32x4_splat(k as i32); let source1 = wasm32_utils::load_v128(components, src_x); // top line let source = i8x16_shuffle::<0, 16, 1, 17, 2, 18, 3, 19, 4, 20, 5, 21, 6, 22, 7, 23>( source1, ZERO, ); let pix = i16x8_extend_low_u8x16(source); sss0 = i32x4_add(sss0, i32x4_dot_i16x8(pix, mmk)); let pix = i16x8_extend_high_u8x16(source); sss1 = i32x4_add(sss1, i32x4_dot_i16x8(pix, mmk)); let source = i16x8_extend_high_u8x16(source1); let pix = i16x8_extend_low_u8x16(source); sss2 = i32x4_add(sss2, i32x4_dot_i16x8(pix, mmk)); let pix = i16x8_extend_high_u8x16(source); sss3 = i32x4_add(sss3, i32x4_dot_i16x8(pix, mmk)); let source1 = wasm32_utils::load_v128(components, src_x + 16); // top line let source = i8x16_shuffle::<0, 16, 1, 17, 2, 18, 3, 19, 4, 20, 5, 21, 6, 22, 7, 23>( source1, ZERO, ); let pix = i16x8_extend_low_u8x16(source); sss4 = i32x4_add(sss4, i32x4_dot_i16x8(pix, mmk)); let pix = i16x8_extend_high_u8x16(source); sss5 = i32x4_add(sss5, i32x4_dot_i16x8(pix, mmk)); let source = i16x8_extend_high_u8x16(source1); let pix = i16x8_extend_low_u8x16(source); sss6 = i32x4_add(sss6, i32x4_dot_i16x8(pix, mmk)); let pix = i16x8_extend_high_u8x16(source); sss7 = i32x4_add(sss7, i32x4_dot_i16x8(pix, mmk)); } } sss0 = i32x4_shr(sss0, precision); sss1 = i32x4_shr(sss1, precision); sss2 = i32x4_shr(sss2, precision); sss3 = i32x4_shr(sss3, precision); sss4 = i32x4_shr(sss4, precision); sss5 = i32x4_shr(sss5, precision); sss6 = i32x4_shr(sss6, precision); sss7 = i32x4_shr(sss7, precision); sss0 = i16x8_narrow_i32x4(sss0, sss1); sss2 = i16x8_narrow_i32x4(sss2, sss3); sss0 = u8x16_narrow_i16x8(sss0, sss2); let dst_ptr = dst_chunk.as_mut_ptr() as *mut v128; v128_store(dst_ptr, sss0); sss4 = i16x8_narrow_i32x4(sss4, sss5); sss6 = i16x8_narrow_i32x4(sss6, sss7); sss4 = u8x16_narrow_i16x8(sss4, sss6); let dst_ptr = dst_ptr.add(1); v128_store(dst_ptr, sss4); src_x += 32; } dst_u8 = dst_chunks_32.into_remainder(); let mut dst_chunks_8 = dst_u8.chunks_exact_mut(8); for dst_chunk in &mut dst_chunks_8 { let mut sss0 = initial; // left row let mut sss1 = initial; // right row let mut 
y: u32 = 0; for src_rows in src_view.iter_2_rows(y_start, max_rows) { let components1 = T::components(src_rows[0]); let components2 = T::components(src_rows[1]); // Load two coefficients at once let mmk = wasm32_utils::ptr_i16_to_set1_i32(coeffs, y as usize); let source1 = wasm32_utils::loadl_i64(components1, src_x); // top line let source2 = wasm32_utils::loadl_i64(components2, src_x); // bottom line let source = i8x16_shuffle::<0, 16, 1, 17, 2, 18, 3, 19, 4, 20, 5, 21, 6, 22, 7, 23>( source1, source2, ); let pix = i16x8_extend_low_u8x16(source); sss0 = i32x4_add(sss0, i32x4_dot_i16x8(pix, mmk)); let pix = i16x8_extend_high_u8x16(source); sss1 = i32x4_add(sss1, i32x4_dot_i16x8(pix, mmk)); y += 2; } if let Some(&k) = coeffs.get(y as usize) { if let Some(s_row) = src_view.iter_rows(y_start + y).next() { let components = T::components(s_row); let mmk = i32x4_splat(k as i32); let source1 = wasm32_utils::loadl_i64(components, src_x); // top line let source = i8x16_shuffle::<0, 16, 1, 17, 2, 18, 3, 19, 4, 20, 5, 21, 6, 22, 7, 23>( source1, ZERO, ); let pix = i16x8_extend_low_u8x16(source); sss0 = i32x4_add(sss0, i32x4_dot_i16x8(pix, mmk)); let pix = i16x8_extend_high_u8x16(source); sss1 = i32x4_add(sss1, i32x4_dot_i16x8(pix, mmk)); } } sss0 = i32x4_shr(sss0, precision); sss1 = i32x4_shr(sss1, precision); sss0 = i16x8_narrow_i32x4(sss0, sss1); sss0 = u8x16_narrow_i16x8(sss0, sss0); let dst_ptr = dst_chunk.as_mut_ptr() as *mut i64; dst_ptr.write_unaligned(i64x2_extract_lane::<0>(sss0)); src_x += 8; } dst_u8 = dst_chunks_8.into_remainder(); let mut dst_chunks_4 = dst_u8.chunks_exact_mut(4); if let Some(dst_chunk) = dst_chunks_4.next() { let mut sss = initial; let mut y: u32 = 0; for src_rows in src_view.iter_2_rows(y_start, max_rows) { let components1 = T::components(src_rows[0]); let components2 = T::components(src_rows[1]); // Load two coefficients at once let mmk = wasm32_utils::ptr_i16_to_set1_i32(coeffs, y as usize); let source1 = wasm32_utils::i32x4_v128_from_u8(components1, src_x); // top line let source2 = wasm32_utils::i32x4_v128_from_u8(components2, src_x); // bottom line let source = i8x16_shuffle::<0, 16, 1, 17, 2, 18, 3, 19, 4, 20, 5, 21, 6, 22, 7, 23>( source1, source2, ); let pix = i16x8_extend_low_u8x16(source); sss = i32x4_add(sss, i32x4_dot_i16x8(pix, mmk)); y += 2; } if let Some(&k) = coeffs.get(y as usize) { if let Some(s_row) = src_view.iter_rows(y_start + y).next() { let components = T::components(s_row); let pix = wasm32_utils::i32x4_extend_low_ptr_u8(components, src_x); let mmk = i32x4_splat(k as i32); sss = i32x4_add(sss, i32x4_dot_i16x8(pix, mmk)); } } sss = i32x4_shr(sss, precision); sss = i16x8_narrow_i32x4(sss, sss); let dst_ptr = dst_chunk.as_mut_ptr() as *mut i32; dst_ptr.write_unaligned(i32x4_extract_lane::<0>(u8x16_narrow_i16x8(sss, sss))); src_x += 4; } dst_u8 = dst_chunks_4.into_remainder(); if !dst_u8.is_empty() { native::convolution_by_u8( src_view, normalizer, 1 << (precision - 1), dst_u8, src_x, y_start, coeffs, ); } } fast_image_resize-5.3.0/src/cpu_extensions.rs000064400000000000000000000036471046102023000175260ustar 00000000000000/// SIMD extension of CPU. /// Specific variants depend on target architecture. /// Look at source code to see all available variants. 
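///
/// # Example
///
/// A minimal sketch (not part of the original docs), using only items
/// defined in this module:
///
/// ```
/// use fast_image_resize::CpuExtensions;
///
/// // `default()` picks the best extension supported by the current CPU,
/// // so the explicit check below always succeeds.
/// let ext = CpuExtensions::default();
/// assert!(ext.is_supported());
/// ```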
#[derive(Debug, Clone, Copy, PartialEq, Eq)] pub enum CpuExtensions { None, #[cfg(target_arch = "x86_64")] /// SIMD extension of x86_64 architecture Sse4_1, #[cfg(target_arch = "x86_64")] /// SIMD extension of x86_64 architecture Avx2, #[cfg(target_arch = "aarch64")] /// SIMD extension of Arm64 architecture Neon, #[cfg(target_arch = "wasm32")] /// SIMD extension of Wasm32 architecture Simd128, } impl CpuExtensions { /// Returns `true` if your CPU support the extension. pub fn is_supported(&self) -> bool { match self { #[cfg(target_arch = "x86_64")] Self::Avx2 => is_x86_feature_detected!("avx2"), #[cfg(target_arch = "x86_64")] Self::Sse4_1 => is_x86_feature_detected!("sse4.1"), #[cfg(target_arch = "aarch64")] Self::Neon => std::arch::is_aarch64_feature_detected!("neon"), #[cfg(target_arch = "wasm32")] Self::Simd128 => true, Self::None => true, } } } impl Default for CpuExtensions { #[cfg(target_arch = "x86_64")] fn default() -> Self { if is_x86_feature_detected!("avx2") { Self::Avx2 } else if is_x86_feature_detected!("sse4.1") { Self::Sse4_1 } else { Self::None } } #[cfg(target_arch = "aarch64")] fn default() -> Self { if std::arch::is_aarch64_feature_detected!("neon") { Self::Neon } else { Self::None } } #[cfg(target_arch = "wasm32")] fn default() -> Self { Self::Simd128 } #[cfg(not(any( target_arch = "x86_64", target_arch = "aarch64", target_arch = "wasm32" )))] fn default() -> Self { Self::None } } fast_image_resize-5.3.0/src/crop_box.rs000064400000000000000000000107551046102023000162710ustar 00000000000000use crate::{CropBoxError, ImageView}; /// A crop box parameters. #[derive(Debug, Clone, Copy)] pub struct CropBox { pub left: f64, pub top: f64, pub width: f64, pub height: f64, } impl CropBox { /// Get a crop box to resize the source image into the /// aspect ratio of destination image without distortions. /// /// `centering` used to control the cropping position. Use (0.5, 0.5) for /// center cropping (e.g. if cropping the width, take 50% off /// of the left side, and therefore 50% off the right side). /// (0.0, 0.0) will crop from the top left corner (i.e. if /// cropping the width, take all the crop off of the right /// side, and if cropping the height, take all of it off the /// bottom). (1.0, 0.0) will crop from the bottom left /// corner, etc. (i.e. if cropping the width, take all the /// crop off the left side, and if cropping the height take /// none from the top, and therefore all off the bottom). pub fn fit_src_into_dst_size( src_width: u32, src_height: u32, dst_width: u32, dst_height: u32, centering: Option<(f64, f64)>, ) -> Self { if src_width == 0 || src_height == 0 || dst_width == 0 || dst_height == 0 { return Self { left: 0., top: 0., width: src_width as _, height: src_height as _, }; } // This function based on code of ImageOps.fit() from Pillow package. 
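//
// Worked example (illustrative numbers, not from the original comments):
// fitting a 100x100 source into a 200x100 destination compares
// image_ratio = 1.0 with the required ratio 2.0; the source is taller
// than needed, so the crop becomes 100x50, and the default (0.5, 0.5)
// centering places it at left = 0.0, top = 25.0.
//
// The Pillow reference implementation: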
// https://github.com/python-pillow/Pillow/blob/master/src/PIL/ImageOps.py let centering = if let Some((x, y)) = centering { (x.clamp(0.0, 1.0), y.clamp(0.0, 1.0)) } else { (0.5, 0.5) }; // calculate aspect ratios let width = src_width as f64; let height = src_height as f64; let image_ratio = width / height; let required_ration = dst_width as f64 / dst_height as f64; let crop_width; let crop_height; // figure out if the sides or top/bottom will be cropped off if (image_ratio - required_ration).abs() < f64::EPSILON { // The image is already the needed ratio crop_width = width; crop_height = height; } else if image_ratio >= required_ration { // The image is wider than what's needed, crop the sides crop_width = required_ration * height; crop_height = height; } else { // The image is taller than what's needed, crop the top and bottom crop_width = width; crop_height = width / required_ration; } let crop_left = (width - crop_width) * centering.0; let crop_top = (height - crop_height) * centering.1; Self { left: crop_left, top: crop_top, width: crop_width, height: crop_height, } } } pub(crate) struct CroppedSrcImageView<'a, T: ImageView> { image_view: &'a T, crop_box: CropBox, } impl<'a, T: ImageView> CroppedSrcImageView<'a, T> { pub fn new(image_view: &'a T) -> Self { Self { image_view, crop_box: CropBox { left: 0.0, top: 0.0, width: image_view.width() as _, height: image_view.height() as _, }, } } pub fn crop(image_view: &'a T, crop_box: CropBox) -> Result { if crop_box.width < 0. || crop_box.height < 0. { return Err(CropBoxError::WidthOrHeightLessThanZero); } let img_width = image_view.width() as _; let img_height = image_view.height() as _; if crop_box.left >= img_width || crop_box.top >= img_height { return Err(CropBoxError::PositionIsOutOfImageBoundaries); } let right = crop_box.left + crop_box.width; let bottom = crop_box.top + crop_box.height; if right > img_width || bottom > img_height { return Err(CropBoxError::SizeIsOutOfImageBoundaries); } Ok(Self { image_view, crop_box, }) } pub unsafe fn crop_unchecked(image_view: &'a T, crop_box: CropBox) -> Self { Self { image_view, crop_box, } } /// Returns a reference to the wrapped image view. 
#[inline] pub fn image_view(&self) -> &T { self.image_view } #[inline] pub fn crop_box(&self) -> CropBox { self.crop_box } } fast_image_resize-5.3.0/src/errors.rs000064400000000000000000000041651046102023000157700ustar 00000000000000use thiserror::Error; #[derive(Error, Debug, Clone, Copy, PartialEq, Eq)] #[non_exhaustive] pub enum ImageError { #[error("Pixel type of image is not supported")] UnsupportedPixelType, } #[derive(Error, Debug, Clone, Copy)] #[error("Size of container with pixels is smaller than required")] pub struct InvalidPixelsSize; #[derive(Error, Debug, Clone, Copy, PartialEq, Eq)] pub enum ImageBufferError { #[error("Size of buffer is smaller than required")] InvalidBufferSize, #[error("Alignment of buffer don't match to alignment of required pixel type")] InvalidBufferAlignment, } #[derive(Error, Debug, Clone, Copy, PartialEq, Eq)] pub enum CropBoxError { #[error("Position of the crop box is out of the image boundaries")] PositionIsOutOfImageBoundaries, #[error("Size of the crop box is out of the image boundaries")] SizeIsOutOfImageBoundaries, #[error("Width or height of the crop box is less than zero")] WidthOrHeightLessThanZero, } #[derive(Error, Debug, Clone, Copy, PartialEq, Eq)] #[non_exhaustive] pub enum ResizeError { #[error("Source or destination image is not supported")] ImageError(#[from] ImageError), #[error("Pixel type of source image does not match to destination image")] PixelTypesAreDifferent, #[error("Source cropping option is invalid: {0}")] SrcCroppingError(#[from] CropBoxError), } #[derive(Error, Debug, Clone, Copy)] #[error( "The dimensions of the source image are not equal to the dimensions of the destination image" )] pub struct DifferentDimensionsError; #[derive(Error, Debug, Clone, Copy, PartialEq, Eq)] pub enum MappingError { #[error("Source or destination image is not supported")] ImageError(#[from] ImageError), #[error("The dimensions of the source image are not equal to the dimensions of the destination image")] DifferentDimensions, #[error("Unsupported combination of pixels of source and/or destination images")] UnsupportedCombinationOfImageTypes, } impl From for MappingError { fn from(_: DifferentDimensionsError) -> Self { MappingError::DifferentDimensions } } fast_image_resize-5.3.0/src/image_view.rs000064400000000000000000000210451046102023000165640ustar 00000000000000use std::num::NonZeroU32; use crate::images::{TypedCroppedImage, TypedCroppedImageMut, UnsafeImageMut}; use crate::pixels::InnerPixel; use crate::{ArrayChunks, ImageError, PixelTrait, PixelType}; /// A trait for getting access to image data. /// /// # Safety /// /// The length of the image rows returned by methods of this trait /// must be equal or greater than the image width. pub unsafe trait ImageView: Sync + Send + Sized { type Pixel: InnerPixel; fn width(&self) -> u32; fn height(&self) -> u32; /// Returns iterator by slices with image rows. fn iter_rows(&self, start_row: u32) -> impl Iterator; /// Returns iterator by arrays with two image rows. fn iter_2_rows( &self, start_y: u32, max_rows: u32, ) -> ArrayChunks, 2> { ArrayChunks::new(self.iter_rows(start_y).take(max_rows as usize)) } /// Returns iterator by arrays with four image rows. fn iter_4_rows( &self, start_y: u32, max_rows: u32, ) -> ArrayChunks, 4> { ArrayChunks::new(self.iter_rows(start_y).take(max_rows as usize)) } /// Returns iterator by slices with image rows selected from /// the image with the given step. 
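/// # Example
///
/// A minimal sketch that reads every second row of a small single-channel image
/// (the sizes and step are illustrative):
///
/// ```
/// use fast_image_resize::images::TypedImageRef;
/// use fast_image_resize::pixels::U8;
/// use fast_image_resize::ImageView;
///
/// let pixels = [U8::new(0); 10];
/// let image = TypedImageRef::new(2, 5, &pixels).unwrap();
/// // With start_y = 0.0 and step = 2.0 the iterator yields rows 0, 2 and 4.
/// let rows: Vec<_> = image.iter_rows_with_step(0., 2., 10).collect();
/// assert_eq!(rows.len(), 3);
/// ```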
fn iter_rows_with_step( &self, start_y: f64, step: f64, max_rows: u32, ) -> impl Iterator { let steps = (self.height() as f64 - start_y) / step; let steps = (steps.max(0.).ceil() as usize).min(max_rows as usize); let mut rows = self.iter_rows(start_y as u32); let mut y = start_y; let mut next_row_y = start_y as usize; let mut cur_row = None; (0..steps).filter_map(move |_| { let req_row_y = y as usize; if next_row_y <= req_row_y { for _ in next_row_y..=req_row_y { cur_row = rows.next(); } next_row_y = req_row_y + 1; } y += step; cur_row }) } /// Takes a part of the image from the given start row with the given height /// and divides it by height into `num_parts` of small images without /// mutual intersections. fn split_by_height( &self, start_row: u32, height: NonZeroU32, num_parts: NonZeroU32, ) -> Option>> { let height = height.get(); let num_parts = num_parts.get(); if num_parts > height || height > self.height() || start_row > self.height() - height { return None; } let mut res = Vec::with_capacity(num_parts as usize); let step = height / num_parts; let mut modulo = height % num_parts; let mut top = start_row; let width = self.width(); for _ in 0..num_parts { let mut part_height = step; if modulo > 0 { part_height += 1; modulo -= 1; } let image = TypedCroppedImage::from_ref(self, 0, top, width, part_height).unwrap(); res.push(image); top += part_height; } debug_assert!(top - start_row == height); Some(res) } /// Takes a part of the image from the given start column with the given width /// and divides it by width into `num_parts` of small images without /// mutual intersections. fn split_by_width( &self, start_col: u32, width: NonZeroU32, num_parts: NonZeroU32, ) -> Option>> { let width = width.get(); let num_parts = num_parts.get(); if num_parts > width || width > self.width() || start_col > self.width() - width { return None; } let mut res = Vec::with_capacity(num_parts as usize); let step = width / num_parts; let mut modulo = width % num_parts; let mut left = start_col; let height = self.height(); for _ in 0..num_parts { let mut part_width = step; if modulo > 0 { part_width += 1; modulo -= 1; } let image = TypedCroppedImage::from_ref(self, left, 0, part_width, height).unwrap(); res.push(image); left += part_width; } debug_assert!(left - start_col == width); Some(res) } } /// A trait for getting mutable access to image data. /// /// # Safety /// /// The length of the image rows returned by methods of this trait /// must be equal or greater than the image width. pub unsafe trait ImageViewMut: ImageView { /// Returns iterator by mutable slices with image rows. fn iter_rows_mut(&mut self, start_row: u32) -> impl Iterator; /// Returns iterator by arrays with two mutable image rows. fn iter_2_rows_mut(&mut self) -> ArrayChunks, 2> { ArrayChunks::new(self.iter_rows_mut(0)) } /// Returns iterator by arrays with four mutable image rows. fn iter_4_rows_mut(&mut self) -> ArrayChunks, 4> { ArrayChunks::new(self.iter_rows_mut(0)) } /// Takes a part of the image from the given start row with the given height /// and divides it by height into `num_parts` of small mutable images without /// mutual intersections. 
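/// # Example
///
/// A minimal sketch that splits a 4x4 image into two non-overlapping horizontal
/// strips (the sizes are illustrative):
///
/// ```
/// use std::num::NonZeroU32;
/// use fast_image_resize::images::TypedImage;
/// use fast_image_resize::pixels::U8;
/// use fast_image_resize::{ImageView, ImageViewMut};
///
/// let mut pixels = vec![U8::new(0); 16];
/// let mut image = TypedImage::from_pixels_slice(4, 4, &mut pixels).unwrap();
/// let height = NonZeroU32::new(4).unwrap();
/// let num_parts = NonZeroU32::new(2).unwrap();
/// let parts = image.split_by_height_mut(0, height, num_parts).unwrap();
/// assert_eq!(parts.len(), 2);
/// assert_eq!(parts[0].height(), 2);
/// ```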
fn split_by_height_mut( &mut self, start_row: u32, height: NonZeroU32, num_parts: NonZeroU32, ) -> Option>> { let height = height.get(); let num_parts = num_parts.get(); if num_parts > height || height > self.height() || start_row > self.height() - height { return None; } let mut res = Vec::with_capacity(num_parts as usize); let step = height / num_parts; let mut modulo = height % num_parts; let mut top = start_row; let width = self.width(); let unsafe_image = UnsafeImageMut::new(self); for _ in 0..num_parts { let mut part_height = step; if modulo > 0 { part_height += 1; modulo -= 1; } let image = TypedCroppedImageMut::new(unsafe_image.clone(), 0, top, width, part_height) .unwrap(); res.push(image); top += part_height; } debug_assert!(top - start_row == height); Some(res) } /// Takes a part of the image from the given start column with the given width /// and divides it by width into `num_parts` of small mutable images without /// mutual intersections. fn split_by_width_mut( &mut self, start_col: u32, width: NonZeroU32, num_parts: NonZeroU32, ) -> Option>> { let width = width.get(); let num_parts = num_parts.get(); if num_parts > width || width > self.width() || start_col > self.width() - width { return None; } let mut res = Vec::with_capacity(num_parts as usize); let step = width / num_parts; let mut modulo = width % num_parts; let mut left = start_col; let height = self.height(); let unsafe_image = UnsafeImageMut::new(self); for _ in 0..num_parts { let mut part_width = step; if modulo > 0 { part_width += 1; modulo -= 1; } let image = TypedCroppedImageMut::new(unsafe_image.clone(), left, 0, part_width, height) .unwrap(); res.push(image); left += part_width; } debug_assert!(left - start_col == width); Some(res) } } /// Conversion into an [ImageView]. pub trait IntoImageView { /// Returns pixel's type of the image. fn pixel_type(&self) -> Option; fn width(&self) -> u32; fn height(&self) -> u32; fn image_view(&self) -> Option>; } /// Conversion into an [ImageViewMut]. pub trait IntoImageViewMut: IntoImageView { fn image_view_mut(&mut self) -> Option>; } /// Returns supported by the crate pixel's type of the image or `ImageError` if the image /// has not supported pixel's type. pub(crate) fn try_pixel_type(image: &impl IntoImageView) -> Result { image.pixel_type().ok_or(ImageError::UnsupportedPixelType) } fast_image_resize-5.3.0/src/images/cropped_image.rs000064400000000000000000000053711046102023000205170ustar 00000000000000use crate::images::{check_crop_box, TypedCroppedImage, TypedCroppedImageMut}; use crate::{ CropBoxError, ImageView, ImageViewMut, IntoImageView, IntoImageViewMut, PixelTrait, PixelType, }; /// It is a wrapper that provides [IntoImageView] for part of wrapped image. pub struct CroppedImage<'a, V: IntoImageView> { image: &'a V, left: u32, top: u32, width: u32, height: u32, } /// It is a wrapper that provides [IntoImageView] and [IntoImageViewMut] for part of wrapped image. 
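/// # Example
///
/// A minimal sketch that wraps the central area of an image so it can be used,
/// for example, as a resize destination (the sizes are illustrative):
///
/// ```
/// use fast_image_resize::images::{CroppedImageMut, Image};
/// use fast_image_resize::{IntoImageView, PixelType};
///
/// let mut dst = Image::new(8, 8, PixelType::U8x4);
/// let cropped = CroppedImageMut::new(&mut dst, 2, 2, 4, 4).unwrap();
/// assert_eq!(cropped.width(), 4);
/// assert_eq!(cropped.height(), 4);
/// ```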
pub struct CroppedImageMut<'a, V: IntoImageView> { image: &'a mut V, left: u32, top: u32, width: u32, height: u32, } impl<'a, V: IntoImageView> CroppedImage<'a, V> { pub fn new( image: &'a V, left: u32, top: u32, width: u32, height: u32, ) -> Result { check_crop_box(image.width(), image.height(), left, top, width, height)?; Ok(Self { image, left, top, width, height, }) } } impl<'a, V: IntoImageView> CroppedImageMut<'a, V> { pub fn new( image: &'a mut V, left: u32, top: u32, width: u32, height: u32, ) -> Result { check_crop_box(image.width(), image.height(), left, top, width, height)?; Ok(Self { image, left, top, width, height, }) } } impl<'a, V: IntoImageView> IntoImageView for CroppedImage<'a, V> { fn pixel_type(&self) -> Option { self.image.pixel_type() } fn width(&self) -> u32 { self.width } fn height(&self) -> u32 { self.height } fn image_view(&self) -> Option> { self.image.image_view().map(|v| { TypedCroppedImage::new(v, self.left, self.top, self.width, self.height).unwrap() }) } } impl<'a, V: IntoImageView> IntoImageView for CroppedImageMut<'a, V> { fn pixel_type(&self) -> Option { self.image.pixel_type() } fn width(&self) -> u32 { self.width } fn height(&self) -> u32 { self.height } fn image_view(&self) -> Option> { self.image.image_view().map(|v| { TypedCroppedImage::new(v, self.left, self.top, self.width, self.height).unwrap() }) } } impl<'a, V: IntoImageViewMut> IntoImageViewMut for CroppedImageMut<'a, V> { fn image_view_mut(&mut self) -> Option> { self.image.image_view_mut().map(|v| { TypedCroppedImageMut::new(v, self.left, self.top, self.width, self.height).unwrap() }) } } fast_image_resize-5.3.0/src/images/image.rs000064400000000000000000000156571046102023000170130ustar 00000000000000use crate::images::{BufferContainer, TypedImage, TypedImageRef}; use crate::pixels::InnerPixel; use crate::{ ImageBufferError, ImageView, ImageViewMut, IntoImageView, IntoImageViewMut, PixelTrait, PixelType, }; /// Simple reference to image data that provides [IntoImageView]. #[derive(Debug, Copy, Clone)] pub struct ImageRef<'a> { width: u32, height: u32, buffer: &'a [u8], pixel_type: PixelType, } impl<'a> ImageRef<'a> { /// Create an image from slice with pixels-data. pub fn new( width: u32, height: u32, buffer: &'a [u8], pixel_type: PixelType, ) -> Result { let size = width as usize * height as usize * pixel_type.size(); if buffer.len() < size { return Err(ImageBufferError::InvalidBufferSize); } if !pixel_type.is_aligned(buffer) { return Err(ImageBufferError::InvalidBufferAlignment); } Ok(Self { width, height, buffer, pixel_type, }) } pub fn from_pixels( width: u32, height: u32, pixels: &'a [P], ) -> Result { let (head, buffer, _) = unsafe { pixels.align_to::() }; if !head.is_empty() { return Err(ImageBufferError::InvalidBufferAlignment); } Self::new(width, height, buffer, P::pixel_type()) } #[inline] pub fn pixel_type(&self) -> PixelType { self.pixel_type } #[inline] pub fn width(&self) -> u32 { self.width } #[inline] pub fn height(&self) -> u32 { self.height } /// Buffer with image pixels data. #[inline] pub fn buffer(&self) -> &[u8] { self.buffer } #[inline] pub fn into_vec(self) -> Vec { self.buffer.into() } /// Get the typed version of the image. 
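/// # Example
///
/// A minimal sketch, assuming the pixel type is passed as an explicit generic
/// parameter of this method:
///
/// ```
/// use fast_image_resize::images::ImageRef;
/// use fast_image_resize::pixels::U8;
/// use fast_image_resize::PixelType;
///
/// let buffer = [0u8; 12];
/// let image = ImageRef::new(4, 3, &buffer, PixelType::U8).unwrap();
/// // The typed view is available only for the matching pixel type.
/// assert!(image.typed_image::<U8>().is_some());
/// ```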
pub fn typed_image(&self) -> Option> { if P::pixel_type() != self.pixel_type { return None; } let typed_image = TypedImageRef::from_buffer(self.width, self.height, self.buffer).unwrap(); Some(typed_image) } } impl<'a> IntoImageView for ImageRef<'a> { fn pixel_type(&self) -> Option { Some(self.pixel_type) } fn width(&self) -> u32 { self.width } fn height(&self) -> u32 { self.height } fn image_view(&self) -> Option> { self.typed_image() } } /// Simple dynamic container of image data that provides [IntoImageView] and [IntoImageViewMut]. #[derive(Debug)] pub struct Image<'a> { width: u32, height: u32, buffer: BufferContainer<'a, u8>, pixel_type: PixelType, } impl Image<'static> { /// Create an empty image with given dimensions and pixel type. pub fn new(width: u32, height: u32, pixel_type: PixelType) -> Self { let pixels_count = width as usize * height as usize; let buffer = BufferContainer::Owned(vec![0; pixels_count * pixel_type.size()]); Self { width, height, buffer, pixel_type, } } /// Create an image from vector with pixels data. pub fn from_vec_u8( width: u32, height: u32, buffer: Vec, pixel_type: PixelType, ) -> Result { let size = width as usize * height as usize * pixel_type.size(); if buffer.len() < size { return Err(ImageBufferError::InvalidBufferSize); } if !pixel_type.is_aligned(&buffer) { return Err(ImageBufferError::InvalidBufferAlignment); } Ok(Self { width, height, buffer: BufferContainer::Owned(buffer), pixel_type, }) } } impl<'a> Image<'a> { /// Create an image from slice with pixels data. pub fn from_slice_u8( width: u32, height: u32, buffer: &'a mut [u8], pixel_type: PixelType, ) -> Result { let size = width as usize * height as usize * pixel_type.size(); if buffer.len() < size { return Err(ImageBufferError::InvalidBufferSize); } if !pixel_type.is_aligned(buffer) { return Err(ImageBufferError::InvalidBufferAlignment); } Ok(Self { width, height, buffer: BufferContainer::Borrowed(buffer), pixel_type, }) } #[inline] pub fn pixel_type(&self) -> PixelType { self.pixel_type } #[inline] pub fn width(&self) -> u32 { self.width } #[inline] pub fn height(&self) -> u32 { self.height } /// Buffer with image pixels data. #[inline] pub fn buffer(&self) -> &[u8] { match &self.buffer { BufferContainer::Borrowed(p) => p, BufferContainer::Owned(v) => v, } } /// Mutable buffer with image pixels data. #[inline] pub fn buffer_mut(&mut self) -> &mut [u8] { match &mut self.buffer { BufferContainer::Borrowed(p) => p, BufferContainer::Owned(ref mut v) => v.as_mut_slice(), } } #[inline] pub fn into_vec(self) -> Vec { match self.buffer { BufferContainer::Borrowed(p) => p.into(), BufferContainer::Owned(v) => v, } } /// Creates a copy of the image. pub fn copy(&self) -> Image<'static> { Image { width: self.width, height: self.height, buffer: BufferContainer::Owned(self.buffer.as_vec()), pixel_type: self.pixel_type, } } /// Get the typed version of the image. pub fn typed_image(&self) -> Option> { if P::pixel_type() != self.pixel_type { return None; } let typed_image = TypedImageRef::from_buffer(self.width, self.height, self.buffer()).unwrap(); Some(typed_image) } /// Get the typed mutable version of the image. 
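/// # Example
///
/// A minimal sketch, assuming the pixel type is passed as an explicit generic
/// parameter of this method:
///
/// ```
/// use fast_image_resize::images::Image;
/// use fast_image_resize::pixels::U8x4;
/// use fast_image_resize::PixelType;
///
/// let mut image = Image::new(2, 2, PixelType::U8x4);
/// let typed = image.typed_image_mut::<U8x4>().unwrap();
/// assert_eq!(typed.pixels().len(), 4);
/// ```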
pub fn typed_image_mut(&mut self) -> Option> { if P::pixel_type() != self.pixel_type { return None; } let typed_image = TypedImage::from_buffer(self.width, self.height, self.buffer_mut()).unwrap(); Some(typed_image) } } impl<'a> IntoImageView for Image<'a> { fn pixel_type(&self) -> Option { Some(self.pixel_type) } fn width(&self) -> u32 { self.width } fn height(&self) -> u32 { self.height } fn image_view(&self) -> Option> { self.typed_image() } } impl<'a> IntoImageViewMut for Image<'a> { fn image_view_mut(&mut self) -> Option> { self.typed_image_mut() } } fast_image_resize-5.3.0/src/images/image_crate.rs000064400000000000000000000055701046102023000201620ustar 00000000000000use std::ops::DerefMut; use crate::image_view::try_pixel_type; use crate::images::{TypedImage, TypedImageRef}; use crate::{ImageView, ImageViewMut, IntoImageView, IntoImageViewMut, PixelTrait, PixelType}; use bytemuck::cast_slice_mut; use image::DynamicImage; impl IntoImageView for DynamicImage { fn pixel_type(&self) -> Option { match self { DynamicImage::ImageLuma8(_) => Some(PixelType::U8), DynamicImage::ImageLumaA8(_) => Some(PixelType::U8x2), DynamicImage::ImageRgb8(_) => Some(PixelType::U8x3), DynamicImage::ImageRgba8(_) => Some(PixelType::U8x4), DynamicImage::ImageLuma16(_) => Some(PixelType::U16), DynamicImage::ImageLumaA16(_) => Some(PixelType::U16x2), DynamicImage::ImageRgb16(_) => Some(PixelType::U16x3), DynamicImage::ImageRgba16(_) => Some(PixelType::U16x4), DynamicImage::ImageRgb32F(_) => Some(PixelType::F32x3), DynamicImage::ImageRgba32F(_) => Some(PixelType::F32x4), _ => None, } } fn width(&self) -> u32 { self.width() } fn height(&self) -> u32 { self.height() } fn image_view(&self) -> Option> { if let Ok(pixel_type) = try_pixel_type(self) { if P::pixel_type() == pixel_type { return TypedImageRef::
<P>
::from_buffer( self.width(), self.height(), self.as_bytes(), ) .ok(); } } None } } impl IntoImageViewMut for DynamicImage { fn image_view_mut(&mut self) -> Option> { if let Ok(pixel_type) = try_pixel_type(self) { if P::pixel_type() == pixel_type { return TypedImage::
<P>
::from_buffer( self.width(), self.height(), image_as_bytes_mut(self), ) .ok(); } } None } } fn image_as_bytes_mut(image: &mut DynamicImage) -> &mut [u8] { match image { DynamicImage::ImageLuma8(img) => (*img).deref_mut(), DynamicImage::ImageLumaA8(img) => (*img).deref_mut(), DynamicImage::ImageRgb8(img) => (*img).deref_mut(), DynamicImage::ImageRgba8(img) => (*img).deref_mut(), DynamicImage::ImageLuma16(img) => cast_slice_mut((*img).deref_mut()), DynamicImage::ImageLumaA16(img) => cast_slice_mut((*img).deref_mut()), DynamicImage::ImageRgb16(img) => cast_slice_mut((*img).deref_mut()), DynamicImage::ImageRgba16(img) => cast_slice_mut((*img).deref_mut()), DynamicImage::ImageRgb32F(img) => cast_slice_mut((*img).deref_mut()), DynamicImage::ImageRgba32F(img) => cast_slice_mut((*img).deref_mut()), _ => &mut [], } } fast_image_resize-5.3.0/src/images/mod.rs000064400000000000000000000032031046102023000164700ustar 00000000000000//! Contains different types of images and wrappers for them. use std::fmt::Debug; pub use cropped_image::*; pub use image::*; pub use typed_cropped_image::*; pub use typed_image::*; pub(crate) use unsafe_image::UnsafeImageMut; mod cropped_image; mod image; mod typed_cropped_image; mod typed_image; mod unsafe_image; #[cfg(feature = "image")] mod image_crate; #[derive(Debug)] enum BufferContainer<'a, T: Copy + Debug> { Borrowed(&'a mut [T]), Owned(Vec), } impl<'a, T: Copy + Debug> BufferContainer<'a, T> { fn as_vec(&self) -> Vec { match self { Self::Borrowed(slice) => slice.to_vec(), Self::Owned(vec) => vec.clone(), } } pub fn borrow(&self) -> &[T] { match self { Self::Borrowed(p_ref) => p_ref, Self::Owned(vec) => vec, } } pub fn borrow_mut(&mut self) -> &mut [T] { match self { Self::Borrowed(p_ref) => p_ref, Self::Owned(vec) => vec, } } } enum View<'a, V: 'a> { Borrowed(&'a V), Owned(V), } impl<'a, V> View<'a, V> { fn get_ref(&self) -> &V { match self { Self::Borrowed(v_ref) => v_ref, Self::Owned(v_own) => v_own, } } } enum ViewMut<'a, V: 'a> { Borrowed(&'a mut V), Owned(V), } impl<'a, V> ViewMut<'a, V> { fn get_ref(&self) -> &V { match self { Self::Borrowed(v_ref) => v_ref, Self::Owned(v_own) => v_own, } } fn get_mut(&mut self) -> &mut V { match self { Self::Borrowed(p_ref) => p_ref, Self::Owned(vec) => vec, } } } fast_image_resize-5.3.0/src/images/typed_cropped_image.rs000064400000000000000000000205751046102023000217270ustar 00000000000000use crate::images::{View, ViewMut}; use crate::{CropBoxError, ImageView, ImageViewMut}; use std::num::NonZeroU32; pub(crate) fn check_crop_box( img_width: u32, img_height: u32, left: u32, top: u32, width: u32, height: u32, ) -> Result<(), CropBoxError> { if left >= img_width || top >= img_height { return Err(CropBoxError::PositionIsOutOfImageBoundaries); } let right = left + width; let bottom = top + height; if right > img_width || bottom > img_height { return Err(CropBoxError::SizeIsOutOfImageBoundaries); } Ok(()) } /// It is a typed wrapper that provides [ImageView] for part of wrapped image. pub struct TypedCroppedImage<'a, V: ImageView + 'a> { image_view: View<'a, V>, left: u32, top: u32, width: u32, height: u32, } /// It is a typed wrapper that provides [ImageView] and [ImageViewMut] for part of wrapped image. 
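/// # Example
///
/// A minimal sketch that creates a mutable typed view of the central part
/// of an image (the sizes are illustrative):
///
/// ```
/// use fast_image_resize::images::{TypedCroppedImageMut, TypedImage};
/// use fast_image_resize::pixels::U8;
/// use fast_image_resize::ImageView;
///
/// let mut pixels = vec![U8::new(0); 64];
/// let mut image = TypedImage::from_pixels_slice(8, 8, &mut pixels).unwrap();
/// let cropped = TypedCroppedImageMut::from_ref(&mut image, 2, 2, 4, 4).unwrap();
/// assert_eq!(cropped.width(), 4);
/// assert_eq!(cropped.height(), 4);
/// ```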
pub struct TypedCroppedImageMut<'a, V: ImageViewMut> { image_view: ViewMut<'a, V>, left: u32, top: u32, width: u32, height: u32, } impl<'a, V: ImageView + 'a> TypedCroppedImage<'a, V> { pub fn new( image_view: V, left: u32, top: u32, width: u32, height: u32, ) -> Result { check_crop_box( image_view.width(), image_view.height(), left, top, width, height, )?; Ok(Self { image_view: View::Owned(image_view), left, top, width, height, }) } pub fn from_ref( image_view: &'a V, left: u32, top: u32, width: u32, height: u32, ) -> Result { check_crop_box( image_view.width(), image_view.height(), left, top, width, height, )?; Ok(Self { image_view: View::Borrowed(image_view), left, top, width, height, }) } } impl<'a, V: ImageViewMut> TypedCroppedImageMut<'a, V> { pub fn new( image_view: V, left: u32, top: u32, width: u32, height: u32, ) -> Result { check_crop_box( image_view.width(), image_view.height(), left, top, width, height, )?; Ok(Self { image_view: ViewMut::Owned(image_view), left, top, width, height, }) } pub fn from_ref( image_view: &'a mut V, left: u32, top: u32, width: u32, height: u32, ) -> Result { check_crop_box( image_view.width(), image_view.height(), left, top, width, height, )?; Ok(Self { image_view: ViewMut::Borrowed(image_view), left, top, width, height, }) } } macro_rules! image_view_impl { ($wrapper_name:ident<$view_trait:ident>) => { unsafe impl<'a, V: $view_trait> ImageView for $wrapper_name<'a, V> { type Pixel = V::Pixel; fn width(&self) -> u32 { self.width } fn height(&self) -> u32 { self.height } fn iter_rows(&self, start_row: u32) -> impl Iterator { let left = self.left as usize; let right = left + self.width as usize; self.image_view .get_ref() .iter_rows(self.top + start_row) .take((self.height - start_row) as usize) // SAFETY: correct values of the left and the right // are guaranteed by new() method. .map(move |row| unsafe { row.get_unchecked(left..right) }) } fn split_by_height( &self, start_row: u32, height: NonZeroU32, num_parts: NonZeroU32, ) -> Option>> { let height_u32 = height.get(); if num_parts > height || height_u32 > self.height() || start_row > self.height() - height_u32 { return None; } let image_view = self.image_view.get_ref(); let images = image_view.split_by_height(start_row + self.top, height, num_parts); images.map(|v| { v.into_iter() .map(|img| { let img_height = img.height(); TypedCroppedImage::new(img, self.left, 0, self.width, img_height) .unwrap() }) .collect() }) } fn split_by_width( &self, start_col: u32, width: NonZeroU32, num_parts: NonZeroU32, ) -> Option>> { let width_u32 = width.get(); if num_parts > width || width_u32 > self.width() || start_col > self.width() - width_u32 { return None; } let image_view = self.image_view.get_ref(); let images = image_view.split_by_width(start_col + self.left, width, num_parts); images.map(|v| { v.into_iter() .map(|img| { let img_width = img.width(); TypedCroppedImage::new(img, 0, self.top, img_width, self.height) .unwrap() }) .collect() }) } } }; } image_view_impl!(TypedCroppedImage); image_view_impl!(TypedCroppedImageMut); unsafe impl<'a, V: ImageViewMut> ImageViewMut for TypedCroppedImageMut<'a, V> { fn iter_rows_mut(&mut self, start_row: u32) -> impl Iterator { let left = self.left as usize; let right = left + self.width as usize; self.image_view .get_mut() .iter_rows_mut(self.top + start_row) .take((self.height - start_row) as usize) // SAFETY: correct values of the left and the right // are guaranteed by new() method. 
.map(move |row| unsafe { row.get_unchecked_mut(left..right) }) } fn split_by_height_mut( &mut self, start_row: u32, height: NonZeroU32, num_parts: NonZeroU32, ) -> Option>> { let height_u32 = height.get(); if num_parts > height || height_u32 > self.height() || start_row > self.height() - height_u32 { return None; } let image_view = self.image_view.get_mut(); let images = image_view.split_by_height_mut(start_row + self.top, height, num_parts); images.map(|v| { v.into_iter() .map(|img| { let img_height = img.height(); TypedCroppedImageMut::new(img, self.left, 0, self.width, img_height).unwrap() }) .collect() }) } fn split_by_width_mut( &mut self, start_col: u32, width: NonZeroU32, num_parts: NonZeroU32, ) -> Option>> { let width_u32 = width.get(); if num_parts > width || width_u32 > self.width() || start_col > self.width() - width_u32 { return None; } let image_view = self.image_view.get_mut(); let images = image_view.split_by_width_mut(start_col + self.left, width, num_parts); images.map(|v| { v.into_iter() .map(|img| { let img_width = img.width(); TypedCroppedImageMut::new(img, 0, self.top, img_width, self.height).unwrap() }) .collect() }) } } fast_image_resize-5.3.0/src/images/typed_image.rs000064400000000000000000000233051046102023000202050ustar 00000000000000use crate::images::BufferContainer; use crate::pixels::InnerPixel; use crate::{ImageBufferError, ImageView, ImageViewMut, InvalidPixelsSize}; use std::fmt::Debug; use std::num::NonZeroU32; /// Generic reference to image data that provides [ImageView]. #[derive(Debug)] pub struct TypedImageRef<'a, P> { width: u32, height: u32, pixels: &'a [P], } impl<'a, P> TypedImageRef<'a, P> { pub fn new(width: u32, height: u32, pixels: &'a [P]) -> Result { let pixels_count = width as usize * height as usize; if pixels.len() < pixels_count { return Err(InvalidPixelsSize); } Ok(Self { width, height, pixels, }) } pub fn from_buffer( width: u32, height: u32, buffer: &'a [u8], ) -> Result { let pixels = align_buffer_to(buffer)?; Self::new(width, height, pixels).map_err(|_| ImageBufferError::InvalidBufferSize) } pub fn pixels(&self) -> &[P] { self.pixels } } unsafe impl<'a, P: InnerPixel> ImageView for TypedImageRef<'a, P> { type Pixel = P; fn width(&self) -> u32 { self.width } fn height(&self) -> u32 { self.height } fn iter_rows(&self, start_row: u32) -> impl Iterator { let width = self.width as usize; if width == 0 { [].chunks_exact(1) } else { let start = start_row as usize * width; self.pixels .get(start..) 
.unwrap_or_default() .chunks_exact(width) } } fn iter_rows_with_step( &self, start_y: f64, step: f64, max_rows: u32, ) -> impl Iterator { let row_size = self.width as usize; let steps = (self.height() as f64 - start_y) / step; let steps = (steps.max(0.).ceil() as u32).min(max_rows); let mut y = start_y; let mut next_row_y = start_y as usize; let mut cur_row = None; (0..steps).filter_map(move |_| { let cur_row_y = y as usize; if next_row_y <= cur_row_y { let start = cur_row_y * row_size; let end = start + row_size; cur_row = self.pixels.get(start..end); next_row_y = cur_row_y + 1; } y += step; cur_row }) } fn split_by_height( &self, start_row: u32, height: NonZeroU32, num_parts: NonZeroU32, ) -> Option>> { let height = height.get(); let num_parts = num_parts.get(); if num_parts > height || height > self.height() || start_row > self.height() - height { return None; } let mut res = Vec::with_capacity(num_parts as usize); let step = height / num_parts; let mut modulo = height % num_parts; let mut top = start_row; let row_size = self.width as usize; let mut remains_pixels = self.pixels.split_at(top as usize * row_size).1; for _ in 0..num_parts { let mut part_height = step; if modulo > 0 { part_height += 1; modulo -= 1; } let parts = remains_pixels.split_at(part_height as usize * row_size); let image = TypedImageRef::new(self.width, part_height, parts.0).unwrap(); res.push(image); remains_pixels = parts.1; top += part_height; } Some(res) } } /// Generic image container that provides [ImageView] and [ImageViewMut]. #[derive(Debug)] pub struct TypedImage<'a, P: Default + Copy + Debug> { width: u32, height: u32, pixels: BufferContainer<'a, P>, } impl TypedImage<'static, P> { pub fn new(width: u32, height: u32) -> Self { let pixels_count = width as usize * height as usize; Self { width, height, pixels: BufferContainer::Owned(vec![P::default(); pixels_count]), } } } impl<'a, P: InnerPixel> TypedImage<'a, P> { pub fn from_pixels(width: u32, height: u32, pixels: Vec
<P>
) -> Result { let pixels_count = width as usize * height as usize; if pixels.len() < pixels_count { return Err(InvalidPixelsSize); } Ok(Self { width, height, pixels: BufferContainer::Owned(pixels), }) } pub fn from_pixels_slice( width: u32, height: u32, pixels: &'a mut [P], ) -> Result { let pixels_count = width as usize * height as usize; if pixels.len() < pixels_count { return Err(InvalidPixelsSize); } Ok(Self { width, height, pixels: BufferContainer::Borrowed(pixels), }) } pub fn from_buffer( width: u32, height: u32, buffer: &'a mut [u8], ) -> Result { let size = width as usize * height as usize * P::size(); if buffer.len() < size { return Err(ImageBufferError::InvalidBufferSize); } let pixels = align_buffer_to_mut(buffer)?; Self::from_pixels_slice(width, height, pixels) .map_err(|_| ImageBufferError::InvalidBufferSize) } pub fn pixels(&self) -> &[P] { self.pixels.borrow() } pub fn pixels_mut(&mut self) -> &mut [P] { self.pixels.borrow_mut() } } unsafe impl<'a, P: InnerPixel> ImageView for TypedImage<'a, P> { type Pixel = P; fn width(&self) -> u32 { self.width } fn height(&self) -> u32 { self.height } fn iter_rows(&self, start_row: u32) -> impl Iterator { let width = self.width as usize; if width == 0 { [].chunks_exact(1) } else { let start = start_row as usize * width; self.pixels .borrow() .get(start..) .unwrap_or_default() .chunks_exact(width) } } fn split_by_height( &self, start_row: u32, height: NonZeroU32, num_parts: NonZeroU32, ) -> Option>> { let height = height.get(); let num_parts = num_parts.get(); if num_parts > height || height > self.height() || start_row > self.height() - height { return None; } let mut res = Vec::with_capacity(num_parts as usize); let step = height / num_parts; let mut modulo = height % num_parts; let mut top = start_row; let row_size = self.width as usize; let mut remains_pixels = self.pixels.borrow().split_at(top as usize * row_size).1; for _ in 0..num_parts { let mut part_height = step; if modulo > 0 { part_height += 1; modulo -= 1; } let parts = remains_pixels.split_at(part_height as usize * row_size); let image = TypedImageRef::new(self.width, part_height, parts.0).unwrap(); res.push(image); remains_pixels = parts.1; top += part_height; } debug_assert!(top - start_row == height); Some(res) } } unsafe impl<'a, P: InnerPixel> ImageViewMut for TypedImage<'a, P> { fn iter_rows_mut(&mut self, start_row: u32) -> impl Iterator { let width = self.width as usize; if width == 0 { [].chunks_exact_mut(1) } else { let start = start_row as usize * width; self.pixels .borrow_mut() .get_mut(start..) 
.unwrap_or_default() .chunks_exact_mut(width) } } fn split_by_height_mut( &mut self, start_row: u32, height: NonZeroU32, num_parts: NonZeroU32, ) -> Option>> { let height = height.get(); let num_parts = num_parts.get(); if num_parts > height || height > self.height() || start_row > self.height() - height { return None; } let mut res = Vec::with_capacity(num_parts as usize); let step = height / num_parts; let mut modulo = height % num_parts; let mut top = start_row; let row_size = self.width as usize; let mut remains_pixels = self .pixels .borrow_mut() .split_at_mut(top as usize * row_size) .1; for _ in 0..num_parts { let mut part_height = step; if modulo > 0 { part_height += 1; modulo -= 1; } let parts = remains_pixels.split_at_mut(part_height as usize * row_size); let image = TypedImage::from_pixels_slice(self.width, part_height, parts.0).unwrap(); res.push(image); remains_pixels = parts.1; top += part_height; } debug_assert!(top - start_row == height); Some(res) } } pub(crate) fn align_buffer_to(buffer: &[u8]) -> Result<&[T], ImageBufferError> { let (head, pixels, _) = unsafe { buffer.align_to::() }; if !head.is_empty() { return Err(ImageBufferError::InvalidBufferAlignment); } Ok(pixels) } pub(crate) fn align_buffer_to_mut(buffer: &mut [u8]) -> Result<&mut [T], ImageBufferError> { let (head, pixels, _) = unsafe { buffer.align_to_mut::() }; if !head.is_empty() { return Err(ImageBufferError::InvalidBufferAlignment); } Ok(pixels) } fast_image_resize-5.3.0/src/images/unsafe_image.rs000064400000000000000000000067501046102023000203460ustar 00000000000000use crate::{ArrayChunks, ImageView, ImageViewMut}; use std::marker::PhantomData; use std::num::NonZeroU32; #[derive(Copy)] pub(crate) struct UnsafeImageMut<'a, V> where V: ImageViewMut, { image: std::ptr::NonNull, p: PhantomData<&'a V>, } impl<'a, V> Clone for UnsafeImageMut<'a, V> where V: ImageViewMut, { fn clone(&self) -> Self { Self { image: self.image, p: PhantomData, } } } unsafe impl<'a, V: ImageViewMut> Send for UnsafeImageMut<'a, V> {} unsafe impl<'a, V: ImageViewMut> Sync for UnsafeImageMut<'a, V> {} impl<'a, V: ImageViewMut> UnsafeImageMut<'a, V> { pub fn new(image: &'a mut V) -> Self { let ptr = std::ptr::NonNull::new(image as *mut V).unwrap(); Self { image: ptr, p: PhantomData, } } fn get(&self) -> &V { unsafe { self.image.as_ref() } } fn get_mut(&mut self) -> &mut V { unsafe { self.image.as_mut() } } } unsafe impl<'a, V: ImageViewMut> ImageView for UnsafeImageMut<'a, V> { type Pixel = V::Pixel; fn width(&self) -> u32 { self.get().width() } fn height(&self) -> u32 { self.get().height() } fn iter_rows(&self, start_row: u32) -> impl Iterator { self.get().iter_rows(start_row) } fn iter_2_rows( &self, start_y: u32, max_rows: u32, ) -> ArrayChunks, 2> { self.get().iter_2_rows(start_y, max_rows) } fn iter_4_rows( &self, start_y: u32, max_rows: u32, ) -> ArrayChunks, 4> { self.get().iter_4_rows(start_y, max_rows) } fn iter_rows_with_step( &self, start_y: f64, step: f64, max_rows: u32, ) -> impl Iterator { self.get().iter_rows_with_step(start_y, step, max_rows) } fn split_by_height( &self, start_row: u32, height: NonZeroU32, num_parts: NonZeroU32, ) -> Option>> { self.get().split_by_height(start_row, height, num_parts) } fn split_by_width( &self, start_col: u32, width: NonZeroU32, num_parts: NonZeroU32, ) -> Option>> { self.get().split_by_width(start_col, width, num_parts) } } unsafe impl<'a, V: ImageViewMut> ImageViewMut for UnsafeImageMut<'a, V> { fn iter_rows_mut(&mut self, start_row: u32) -> impl Iterator { 
self.get_mut().iter_rows_mut(start_row) } fn iter_2_rows_mut(&mut self) -> ArrayChunks, 2> { self.get_mut().iter_2_rows_mut() } fn iter_4_rows_mut(&mut self) -> ArrayChunks, 4> { self.get_mut().iter_4_rows_mut() } fn split_by_height_mut( &mut self, start_row: u32, height: NonZeroU32, num_parts: NonZeroU32, ) -> Option>> { self.get_mut() .split_by_height_mut(start_row, height, num_parts) } fn split_by_width_mut( &mut self, start_col: u32, width: NonZeroU32, num_parts: NonZeroU32, ) -> Option>> { self.get_mut() .split_by_width_mut(start_col, width, num_parts) } } fast_image_resize-5.3.0/src/lib.rs000064400000000000000000000024241046102023000152160ustar 00000000000000#![doc = include_str!("../README.md")] //! //! ## Feature flags #![doc = document_features::document_features!()] pub use alpha::errors::*; pub use array_chunks::*; pub use change_components_type::*; pub use color::mappers::*; pub use color::PixelComponentMapper; pub use convolution::*; pub use cpu_extensions::CpuExtensions; pub use crop_box::*; pub use errors::*; pub use image_view::*; pub use mul_div::MulDiv; pub use pixels::PixelType; pub use resizer::{ResizeAlg, ResizeOptions, Resizer, SrcCropping}; use crate::alpha::AlphaMulDiv; #[macro_use] mod utils; mod alpha; mod array_chunks; mod change_components_type; mod color; mod convolution; mod cpu_extensions; mod crop_box; mod errors; mod image_view; pub mod images; mod mul_div; #[cfg(target_arch = "aarch64")] mod neon_utils; pub mod pixels; mod resizer; #[cfg(target_arch = "x86_64")] mod simd_utils; #[cfg(feature = "for_testing")] pub mod testing; #[cfg(feature = "rayon")] pub(crate) mod threading; #[cfg(target_arch = "wasm32")] mod wasm32_utils; /// A trait implemented by all pixel types from the crate. /// /// This trait must be used in your code instead of [InnerPixel](pixels::InnerPixel). #[allow(private_bounds)] pub trait PixelTrait: Convolution + AlphaMulDiv {} impl PixelTrait for P {} fast_image_resize-5.3.0/src/mul_div.rs000064400000000000000000000243521046102023000161130ustar 00000000000000use crate::cpu_extensions::CpuExtensions; use crate::image_view::{try_pixel_type, ImageViewMut, IntoImageView, IntoImageViewMut}; use crate::pixels::{F32x2, F32x4, U16x2, U16x4, U8x2, U8x4}; use crate::{ImageError, ImageView, MulDivImagesError, PixelTrait, PixelType}; /// Methods of this structure used to multiply or divide color-channels (RGB or Luma) /// by alpha-channel. Supported pixel types: U8x2, U8x4, U16x2, U16x4, F32x2 and F32x4. /// /// By default, the instance of `MulDiv` created with the best CPU-extension provided by your CPU. /// You can change this by using method [MulDiv::set_cpu_extensions]. /// /// # Examples /// /// ``` /// use fast_image_resize::pixels::PixelType; /// use fast_image_resize::images::Image; /// use fast_image_resize::MulDiv; /// /// let width: u32 = 10; /// let height: u32 = 7; /// let src_image = Image::new(width, height, PixelType::U8x4); /// let mut dst_image = Image::new(width, height, PixelType::U8x4); /// /// let mul_div = MulDiv::new(); /// mul_div.multiply_alpha(&src_image, &mut dst_image).unwrap(); /// ``` #[derive(Default, Debug, Clone)] pub struct MulDiv { cpu_extensions: CpuExtensions, } impl MulDiv { pub fn new() -> Self { Default::default() } pub fn cpu_extensions(&self) -> CpuExtensions { self.cpu_extensions } /// # Safety /// This is unsafe because this method allows you to set a CPU-extensions /// that is not actually supported by your CPU. 
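/// # Example
///
/// A minimal sketch, falling back to the universal (non-SIMD) implementation,
/// which is always safe to select:
///
/// ```
/// use fast_image_resize::{CpuExtensions, MulDiv};
///
/// let mut mul_div = MulDiv::new();
/// unsafe { mul_div.set_cpu_extensions(CpuExtensions::None) };
/// assert_eq!(mul_div.cpu_extensions(), CpuExtensions::None);
/// ```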
pub unsafe fn set_cpu_extensions(&mut self, extensions: CpuExtensions) { self.cpu_extensions = extensions; } /// Multiplies color-channels (RGB or Luma) of source image by alpha-channel and store /// result into destination image. pub fn multiply_alpha( &self, src_image: &impl IntoImageView, dst_image: &mut impl IntoImageViewMut, ) -> Result<(), MulDivImagesError> { let src_pixel_type = try_pixel_type(src_image)?; let dst_pixel_type = try_pixel_type(dst_image)?; if src_pixel_type != dst_pixel_type { return Err(MulDivImagesError::PixelTypesAreDifferent); } #[cfg(not(feature = "only_u8x4"))] match src_pixel_type { PixelType::U8x2 => self.multiply::(src_image, dst_image), PixelType::U8x4 => self.multiply::(src_image, dst_image), PixelType::U16x2 => self.multiply::(src_image, dst_image), PixelType::U16x4 => self.multiply::(src_image, dst_image), PixelType::F32x2 => self.multiply::(src_image, dst_image), PixelType::F32x4 => self.multiply::(src_image, dst_image), _ => Err(MulDivImagesError::ImageError( ImageError::UnsupportedPixelType, )), } #[cfg(feature = "only_u8x4")] match src_pixel_type { PixelType::U8x4 => self.multiply::(src_image, dst_image), _ => Err(MulDivImagesError::ImageError( ImageError::UnsupportedPixelType, )), } } #[inline] fn multiply( &self, src_image: &impl IntoImageView, dst_image: &mut impl IntoImageViewMut, ) -> Result<(), MulDivImagesError> { match (src_image.image_view(), dst_image.image_view_mut()) { (Some(src), Some(mut dst)) => self.multiply_alpha_typed::
<P>
(&src, &mut dst), _ => Err(MulDivImagesError::ImageError( ImageError::UnsupportedPixelType, )), } } pub fn multiply_alpha_typed( &self, src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, ) -> Result<(), MulDivImagesError> { if src_view.width() != dst_view.width() || src_view.height() != dst_view.height() { return Err(MulDivImagesError::SizeIsDifferent); } if src_view.width() > 0 && src_view.height() > 0 { P::multiply_alpha(src_view, dst_view, self.cpu_extensions)?; } Ok(()) } /// Multiplies color-channels (RGB or Luma) of image by alpha-channel inplace. pub fn multiply_alpha_inplace( &self, image: &mut impl IntoImageViewMut, ) -> Result<(), ImageError> { let pixel_type = try_pixel_type(image)?; #[cfg(not(feature = "only_u8x4"))] match pixel_type { PixelType::U8x2 => self.multiply_inplace::(image), PixelType::U8x4 => self.multiply_inplace::(image), PixelType::U16x2 => self.multiply_inplace::(image), PixelType::U16x4 => self.multiply_inplace::(image), PixelType::F32x2 => self.multiply_inplace::(image), PixelType::F32x4 => self.multiply_inplace::(image), _ => Err(ImageError::UnsupportedPixelType), } #[cfg(feature = "only_u8x4")] match pixel_type { PixelType::U8x4 => self.multiply_inplace::(image), _ => Err(ImageError::UnsupportedPixelType), } } #[inline] fn multiply_inplace( &self, image: &mut impl IntoImageViewMut, ) -> Result<(), ImageError> { match image.image_view_mut() { Some(mut view) => self.multiply_alpha_inplace_typed::
<P>
(&mut view), _ => Err(ImageError::UnsupportedPixelType), } } pub fn multiply_alpha_inplace_typed( &self, img_view: &mut impl ImageViewMut, ) -> Result<(), ImageError> { if img_view.width() > 0 && img_view.height() > 0 { P::multiply_alpha_inplace(img_view, self.cpu_extensions) } else { Ok(()) } } /// Divides color-channels (RGB or Luma) of source image by alpha-channel and store /// result into destination image. pub fn divide_alpha( &self, src_image: &impl IntoImageView, dst_image: &mut impl IntoImageViewMut, ) -> Result<(), MulDivImagesError> { let src_pixel_type = try_pixel_type(src_image)?; let dst_pixel_type = try_pixel_type(dst_image)?; if src_pixel_type != dst_pixel_type { return Err(MulDivImagesError::PixelTypesAreDifferent); } #[cfg(not(feature = "only_u8x4"))] match src_pixel_type { PixelType::U8x2 => self.divide::(src_image, dst_image), PixelType::U8x4 => self.divide::(src_image, dst_image), PixelType::U16x2 => self.divide::(src_image, dst_image), PixelType::U16x4 => self.divide::(src_image, dst_image), PixelType::F32x2 => self.divide::(src_image, dst_image), PixelType::F32x4 => self.divide::(src_image, dst_image), _ => Err(MulDivImagesError::ImageError( ImageError::UnsupportedPixelType, )), } #[cfg(feature = "only_u8x4")] match src_pixel_type { PixelType::U8x4 => self.divide::(src_image, dst_image), _ => Err(MulDivImagesError::ImageError( ImageError::UnsupportedPixelType, )), } } #[inline] fn divide( &self, src_image: &impl IntoImageView, dst_image: &mut impl IntoImageViewMut, ) -> Result<(), MulDivImagesError> { match (src_image.image_view(), dst_image.image_view_mut()) { (Some(src), Some(mut dst)) => self.divide_alpha_typed::
<P>
(&src, &mut dst), _ => Err(MulDivImagesError::ImageError( ImageError::UnsupportedPixelType, )), } } pub fn divide_alpha_typed( &self, src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, ) -> Result<(), MulDivImagesError> { if src_view.width() != dst_view.width() || src_view.height() != dst_view.height() { return Err(MulDivImagesError::SizeIsDifferent); } if src_view.width() > 0 && src_view.height() > 0 { P::divide_alpha(src_view, dst_view, self.cpu_extensions)?; } Ok(()) } /// Divides color-channels (RGB or Luma) of image by alpha-channel inplace. pub fn divide_alpha_inplace( &self, image: &mut impl IntoImageViewMut, ) -> Result<(), ImageError> { let pixel_type = try_pixel_type(image)?; #[cfg(not(feature = "only_u8x4"))] match pixel_type { PixelType::U8x2 => self.divide_inplace::(image), PixelType::U8x4 => self.divide_inplace::(image), PixelType::U16x2 => self.divide_inplace::(image), PixelType::U16x4 => self.divide_inplace::(image), PixelType::F32x2 => self.divide_inplace::(image), PixelType::F32x4 => self.divide_inplace::(image), _ => Err(ImageError::UnsupportedPixelType), } #[cfg(feature = "only_u8x4")] match pixel_type { PixelType::U8x4 => self.divide_inplace::(image), _ => Err(ImageError::UnsupportedPixelType), } } #[inline] fn divide_inplace( &self, image: &mut impl IntoImageViewMut, ) -> Result<(), ImageError> { match image.image_view_mut() { Some(mut view) => self.divide_alpha_inplace_typed::
<P>
(&mut view), _ => Err(ImageError::UnsupportedPixelType), } } pub fn divide_alpha_inplace_typed( &self, img_view: &mut impl ImageViewMut, ) -> Result<(), ImageError> { if img_view.width() > 0 && img_view.height() > 0 { P::divide_alpha_inplace(img_view, self.cpu_extensions) } else { Ok(()) } } pub fn is_supported(&self, pixel_type: PixelType) -> bool { #[cfg(not(feature = "only_u8x4"))] { matches!( pixel_type, PixelType::U8x2 | PixelType::U8x4 | PixelType::U16x2 | PixelType::U16x4 | PixelType::F32x2 | PixelType::F32x4 ) } #[cfg(feature = "only_u8x4")] { matches!(pixel_type, PixelType::U8x4) } } } fast_image_resize-5.3.0/src/neon_utils.rs000064400000000000000000000331011046102023000166230ustar 00000000000000use std::arch::aarch64::*; use crate::pixels::InnerPixel; #[inline(always)] pub unsafe fn load_u8x1(buf: &[T], index: usize) -> uint8x8_t { let ptr = buf.get_unchecked(index..).as_ptr() as *const u8; vcreate_u8(ptr.read_unaligned() as u64) } #[inline(always)] pub unsafe fn load_u8x2(buf: &[T], index: usize) -> uint8x8_t { let ptr = buf.get_unchecked(index..).as_ptr() as *const u16; vcreate_u8(ptr.read_unaligned() as u64) } #[inline(always)] pub unsafe fn load_u8x4(buf: &[T], index: usize) -> uint8x8_t { let ptr = buf.get_unchecked(index..).as_ptr() as *const u32; vcreate_u8(ptr.read_unaligned() as u64) } #[inline(always)] pub unsafe fn load_u8x8(buf: &[T], index: usize) -> uint8x8_t { vld1_u8(buf.get_unchecked(index..).as_ptr() as *const u8) } #[inline(always)] pub unsafe fn load_u8x8x3(buf: &[T], index: usize) -> uint8x8x3_t { vld1_u8_x3(buf.get_unchecked(index..).as_ptr() as *const u8) } #[inline(always)] pub unsafe fn load_deintrel_u8x8x2(buf: &[T], index: usize) -> uint8x8x2_t { vld2_u8(buf.get_unchecked(index..).as_ptr() as *const u8) } #[inline(always)] pub unsafe fn load_deintrel_u8x8x4(buf: &[T], index: usize) -> uint8x8x4_t { vld4_u8(buf.get_unchecked(index..).as_ptr() as *const u8) } #[inline(always)] pub unsafe fn load_u8x16(buf: &[T], index: usize) -> uint8x16_t { vld1q_u8(buf.get_unchecked(index..).as_ptr() as *const u8) } #[inline(always)] pub unsafe fn load_u8x16x2(buf: &[T], index: usize) -> uint8x16x2_t { vld1q_u8_x2(buf.get_unchecked(index..).as_ptr() as *const u8) } #[inline(always)] pub unsafe fn load_u8x16x4(buf: &[T], index: usize) -> uint8x16x4_t { vld1q_u8_x4(buf.get_unchecked(index..).as_ptr() as *const u8) } #[inline(always)] pub unsafe fn load_deintrel_u8x16x2(buf: &[T], index: usize) -> uint8x16x2_t { vld2q_u8(buf.get_unchecked(index..).as_ptr() as *const u8) } // #[inline(always)] // pub unsafe fn load_deintrel_u8x16x3(buf: &[T], index: usize) -> uint8x16x3_t { // vld3q_u8(buf.get_unchecked(index..).as_ptr() as *const u8) // } #[inline(always)] pub unsafe fn load_deintrel_u8x16x4(buf: &[T], index: usize) -> uint8x16x4_t { vld4q_u8(buf.get_unchecked(index..).as_ptr() as *const u8) } #[inline(always)] pub unsafe fn load_u16x1(buf: &[T], index: usize) -> uint16x4_t { let ptr = buf.get_unchecked(index..).as_ptr() as *const u16; vcreate_u16(ptr.read_unaligned() as u64) } #[inline(always)] pub unsafe fn load_u16x2(buf: &[T], index: usize) -> uint16x4_t { let ptr = buf.get_unchecked(index..).as_ptr() as *const u32; vcreate_u16(ptr.read_unaligned() as u64) } #[inline(always)] pub unsafe fn load_u16x4(buf: &[T], index: usize) -> uint16x4_t { vld1_u16(buf.get_unchecked(index..).as_ptr() as *const u16) } #[inline(always)] pub unsafe fn load_u16x8(buf: &[T], index: usize) -> uint16x8_t { vld1q_u16(buf.get_unchecked(index..).as_ptr() as *const u16) } #[inline(always)] pub 
unsafe fn load_u16x8x2(buf: &[T], index: usize) -> uint16x8x2_t { vld1q_u16_x2(buf.get_unchecked(index..).as_ptr() as *const u16) } #[inline(always)] pub unsafe fn load_u16x8x4(buf: &[T], index: usize) -> uint16x8x4_t { vld1q_u16_x4(buf.get_unchecked(index..).as_ptr() as *const u16) } #[inline(always)] pub unsafe fn load_deintrel_u16x1x3>( buf: &[T], index: usize, ) -> uint16x4x3_t { let mut arr = [0u16; 12]; let src_ptr = buf.get_unchecked(index..).as_ptr() as *const u16; let dst_ptr = arr.as_mut_ptr(); std::ptr::copy_nonoverlapping(src_ptr, dst_ptr, 3); vld3_u16(arr.as_ptr()) } #[inline(always)] pub unsafe fn load_deintrel_u16x2x3>( buf: &[T], index: usize, ) -> uint16x4x3_t { let mut arr = [0u16; 12]; let src_ptr = buf.get_unchecked(index..).as_ptr() as *const u16; let dst_ptr = arr.as_mut_ptr(); std::ptr::copy_nonoverlapping(src_ptr, dst_ptr, 6); vld3_u16(arr.as_ptr()) } #[inline(always)] pub unsafe fn load_deintrel_u16x4x3>( buf: &[T], index: usize, ) -> uint16x4x3_t { vld3_u16(buf.get_unchecked(index..).as_ptr() as *const u16) } #[inline(always)] pub unsafe fn load_deintrel_u16x4x4>( buf: &[T], index: usize, ) -> uint16x4x4_t { vld4_u16(buf.get_unchecked(index..).as_ptr() as *const u16) } #[inline(always)] pub unsafe fn load_deintrel_u16x4x2>( buf: &[T], index: usize, ) -> uint16x4x2_t { vld2_u16(buf.get_unchecked(index..).as_ptr() as *const u16) } #[inline(always)] pub unsafe fn load_deintrel_u16x8x2>( buf: &[T], index: usize, ) -> uint16x8x2_t { vld2q_u16(buf.get_unchecked(index..).as_ptr() as *const u16) } #[inline(always)] pub unsafe fn load_deintrel_u16x8x3>( buf: &[T], index: usize, ) -> uint16x8x3_t { vld3q_u16(buf.get_unchecked(index..).as_ptr() as *const u16) } #[inline(always)] pub unsafe fn load_deintrel_u16x8x4>( buf: &[T], index: usize, ) -> uint16x8x4_t { vld4q_u16(buf.get_unchecked(index..).as_ptr() as *const u16) } #[inline(always)] pub unsafe fn load_i32x1(buf: &[T], index: usize) -> int32x2_t { let ptr = buf.get_unchecked(index..).as_ptr() as *const u32; vcreate_s32(ptr.read_unaligned() as u64) } #[inline(always)] pub unsafe fn load_i32x2(buf: &[T], index: usize) -> int32x2_t { vld1_s32(buf.get_unchecked(index..).as_ptr() as *const i32) } #[inline(always)] pub unsafe fn load_i32x4(buf: &[T], index: usize) -> int32x4_t { vld1q_s32(buf.get_unchecked(index..).as_ptr() as *const i32) } #[inline(always)] pub unsafe fn store_i32x4(buf: &mut [T], index: usize, v: int32x4_t) { vst1q_s32(buf.get_unchecked_mut(index..).as_mut_ptr() as *mut i32, v); } #[inline(always)] pub unsafe fn load_i32x4x2(buf: &[T], index: usize) -> int32x4x2_t { vld1q_s32_x2(buf.get_unchecked(index..).as_ptr() as *const i32) } #[inline(always)] pub unsafe fn store_i32x4x2(buf: &mut [T], index: usize, v: int32x4x2_t) { vst1q_s32_x2(buf.get_unchecked_mut(index..).as_mut_ptr() as *mut i32, v); } #[inline(always)] pub unsafe fn load_i32x4x4(buf: &[T], index: usize) -> int32x4x4_t { vld1q_s32_x4(buf.get_unchecked(index..).as_ptr() as *const i32) } #[inline(always)] pub unsafe fn store_i32x4x4(buf: &mut [T], index: usize, v: int32x4x4_t) { vst1q_s32_x4(buf.get_unchecked_mut(index..).as_mut_ptr() as *mut i32, v); } #[inline(always)] pub unsafe fn load_i64x2(buf: &[T], index: usize) -> int64x2_t { vld1q_s64(buf.get_unchecked(index..).as_ptr() as *const i64) } #[inline(always)] pub unsafe fn load_i64x2x2(buf: &[T], index: usize) -> int64x2x2_t { vld1q_s64_x2(buf.get_unchecked(index..).as_ptr() as *const i64) } #[inline(always)] pub unsafe fn load_i64x2x4(buf: &[T], index: usize) -> int64x2x4_t { 
vld1q_s64_x4(buf.get_unchecked(index..).as_ptr() as *const i64) } #[inline(always)] pub unsafe fn store_i64x2x2(buf: &mut [T], index: usize, v: int64x2x2_t) { vst1q_s64_x2(buf.get_unchecked_mut(index..).as_mut_ptr() as *mut i64, v); } #[inline(always)] pub unsafe fn store_i64x2x4(buf: &mut [T], index: usize, v: int64x2x4_t) { vst1q_s64_x4(buf.get_unchecked_mut(index..).as_mut_ptr() as *mut i64, v); } #[inline(always)] pub unsafe fn load_i16x1(buf: &[T], index: usize) -> int16x4_t { let ptr = buf.get_unchecked(index..).as_ptr() as *const u16; vcreate_s16(ptr.read_unaligned() as u64) } #[inline(always)] pub unsafe fn load_i16x2(buf: &[T], index: usize) -> int16x4_t { let ptr = buf.get_unchecked(index..).as_ptr() as *const u32; vcreate_s16(ptr.read_unaligned() as u64) } #[inline(always)] pub unsafe fn load_i16x4(buf: &[T], index: usize) -> int16x4_t { vld1_s16(buf.get_unchecked(index..).as_ptr() as *const i16) } #[inline(always)] pub unsafe fn load_i16x8(buf: &[T], index: usize) -> int16x8_t { vld1q_s16(buf.get_unchecked(index..).as_ptr() as *const i16) } #[inline(always)] pub unsafe fn load_i16x8x2(buf: &[T], index: usize) -> int16x8x2_t { vld1q_s16_x2(buf.get_unchecked(index..).as_ptr() as *const i16) } /// Moves 32-bit integer from `buf` to the least significant 32 bits of an uint8x16_t object, /// zero extending the upper bits. /// ```plain /// r0 := a /// r1 := 0x0 /// r2 := 0x0 /// r3 := 0x0 /// ``` #[inline(always)] pub unsafe fn create_u8x16_from_one_u32(buf: &[T], index: usize) -> uint8x16_t { let ptr = buf.get_unchecked(index..).as_ptr() as *const u32; vreinterpretq_u8_u32(vsetq_lane_u32::<0>(ptr.read_unaligned(), vdupq_n_u32(0u32))) } /// Multiply the packed unsigned 16-bit integers in a and b, producing /// intermediate 32-bit integers, and store the high 16 bits of the intermediate /// integers in dst. // #[inline(always)] // pub unsafe fn mulhi_u16x8(a: uint16x8_t, b: uint16x8_t) -> uint16x8_t { // let a3210 = vget_low_u16(a); // let b3210 = vget_low_u16(b); // let ab3210 = vmull_u16(a3210, b3210); // let ab7654 = vmull_high_u16(a, b); // vuzp2q_u16(vreinterpretq_u16_u32(ab3210), vreinterpretq_u16_u32(ab7654)) // } /// Multiply the packed unsigned 32-bit integers in a and b, producing /// intermediate 64-bit integers, and store the high 32 bits of the intermediate /// integers in dst. 
// #[inline(always)] // pub unsafe fn mulhi_u32x4(a: uint32x4_t, b: uint32x4_t) -> uint32x4_t { // let a3210 = vget_low_u32(a); // let b3210 = vget_low_u32(b); // let ab3210 = vmull_u32(a3210, b3210); // let ab7654 = vmull_high_u32(a, b); // vuzp2q_u32(vreinterpretq_u32_u64(ab3210), vreinterpretq_u32_u64(ab7654)) // } #[inline] #[target_feature(enable = "neon")] pub unsafe fn mul_color_to_alpha_u8x16( color: uint8x16_t, alpha_u16: uint16x8x2_t, zero: uint8x16_t, ) -> uint8x16_t { let color_u16_lo = vreinterpretq_u16_u8(vzip1q_u8(color, zero)); let mut tmp_res = vmulq_u16(color_u16_lo, alpha_u16.0); tmp_res = vaddq_u16(tmp_res, vrshrq_n_u16::<8>(tmp_res)); let res_u16_lo = vrshrq_n_u16::<8>(tmp_res); let color_u16_hi = vreinterpretq_u16_u8(vzip2q_u8(color, zero)); let mut tmp_res = vmulq_u16(color_u16_hi, alpha_u16.1); tmp_res = vaddq_u16(tmp_res, vrshrq_n_u16::<8>(tmp_res)); let res_u16_hi = vrshrq_n_u16::<8>(tmp_res); vcombine_u8(vqmovn_u16(res_u16_lo), vqmovn_u16(res_u16_hi)) } #[inline] #[target_feature(enable = "neon")] pub unsafe fn mul_color_to_alpha_u8x8( color: uint8x8_t, alpha_u16: uint16x8_t, zero: uint8x8_t, ) -> uint8x8_t { let color_u16_lo = vreinterpret_u16_u8(vzip1_u8(color, zero)); let color_u16_hi = vreinterpret_u16_u8(vzip2_u8(color, zero)); let color_u16 = vcombine_u16(color_u16_lo, color_u16_hi); let mut tmp_res = vmulq_u16(color_u16, alpha_u16); tmp_res = vaddq_u16(tmp_res, vrshrq_n_u16::<8>(tmp_res)); let res_u16 = vrshrq_n_u16::<8>(tmp_res); vqmovn_u16(res_u16) } #[inline(always)] pub unsafe fn multiply_color_to_alpha_u16x8(color: uint16x8_t, alpha: uint16x8_t) -> uint16x8_t { let rounder = vdupq_n_u32(0x8000); let color_lo_u32 = vmlal_u16(rounder, vget_low_u16(color), vget_low_u16(alpha)); let color_hi_u32 = vmlal_high_u16(rounder, color, alpha); let color_lo_u16 = vaddhn_u32(color_lo_u32, vshrq_n_u32::<16>(color_lo_u32)); let color_hi_u16 = vaddhn_u32(color_hi_u32, vshrq_n_u32::<16>(color_hi_u32)); vcombine_u16(color_lo_u16, color_hi_u16) } #[inline(always)] pub unsafe fn multiply_color_to_alpha_u16x4(color: uint16x4_t, alpha: uint16x4_t) -> uint16x4_t { let rounder = vdupq_n_u32(0x8000); let color_u32 = vmlal_u16(rounder, color, alpha); vaddhn_u32(color_u32, vshrq_n_u32::<16>(color_u32)) } unsafe fn mul_color_alpha_u16x8(color: uint16x8_t, alpha: uint16x8_t) -> uint16x8_t { let res_color_lo_u16 = vrshrn_n_u32::<16>(vmull_u16(vget_low_u16(color), vget_low_u16(alpha))); let res_color_hi_u16 = vrshrn_n_u32::<16>(vmull_high_u16(color, alpha)); vcombine_u16(res_color_lo_u16, res_color_hi_u16) } #[inline(always)] pub unsafe fn mul_color_recip_alpha_u8x16( color: uint8x16_t, recip_alpha: uint16x8x2_t, zero: uint8x16_t, ) -> uint8x16_t { let color_u16_lo = vreinterpretq_u16_u8(vzip1q_u8(zero, color)); let color_u16_hi = vreinterpretq_u16_u8(vzip2q_u8(zero, color)); let res_u16_lo = mul_color_alpha_u16x8(color_u16_lo, recip_alpha.0); let res_u16_hi = mul_color_alpha_u16x8(color_u16_hi, recip_alpha.1); vcombine_u8(vqmovn_u16(res_u16_lo), vqmovn_u16(res_u16_hi)) } #[inline(always)] pub unsafe fn mul_color_recip_alpha_u8x8( color: uint8x8_t, recip_alpha: uint16x8_t, zero: uint8x8_t, ) -> uint8x8_t { let color_u16_lo = vreinterpret_u16_u8(vzip1_u8(zero, color)); let color_u16_hi = vreinterpret_u16_u8(vzip2_u8(zero, color)); let color_u16 = vcombine_u16(color_u16_lo, color_u16_hi); let res_color_u16 = mul_color_alpha_u16x8(color_u16, recip_alpha); vqmovn_u16(res_color_u16) } #[inline(always)] pub unsafe fn mul_color_recip_alpha_u16x8( color: uint16x8_t, recip_alpha_lo: 
float32x4_t, recip_alpha_hi: float32x4_t, zero: uint16x8_t, ) -> uint16x8_t { let max_value = vdupq_n_u32(0xffff); let color_lo_f32 = vcvtq_f32_u32(vreinterpretq_u32_u16(vzip1q_u16(color, zero))); let mut res_lo_u32 = vcvtaq_u32_f32(vmulq_f32(color_lo_f32, recip_alpha_lo)); res_lo_u32 = vminq_u32(res_lo_u32, max_value); let color_hi_f32 = vcvtq_f32_u32(vreinterpretq_u32_u16(vzip2q_u16(color, zero))); let mut res_hi_u32 = vcvtaq_u32_f32(vmulq_f32(color_hi_f32, recip_alpha_hi)); res_hi_u32 = vminq_u32(res_hi_u32, max_value); vcombine_u16(vmovn_u32(res_lo_u32), vmovn_u32(res_hi_u32)) } fast_image_resize-5.3.0/src/pixels.rs000064400000000000000000000243671046102023000157660ustar 00000000000000//! Contains types of pixels. use std::fmt::{Debug, Formatter}; use std::marker::PhantomData; use std::mem::size_of; use std::slice; #[derive(Debug, Clone, Copy, PartialEq, Eq)] #[non_exhaustive] pub enum PixelType { U8, U8x2, U8x3, U8x4, U16, U16x2, U16x3, U16x4, I32, F32, F32x2, F32x3, F32x4, } impl PixelType { /// Returns pixel size in bytes. pub fn size(&self) -> usize { match self { Self::U8 => 1, Self::U8x2 => 2, Self::U8x3 => 3, Self::U16 => 2, Self::U16x2 => 4, Self::U16x3 => 6, Self::U16x4 => 8, Self::F32x2 => 8, Self::F32x3 => 12, Self::F32x4 => 16, _ => 4, } } /// Returns `true` if given buffer is aligned by the alignment of pixel. pub(crate) fn is_aligned(&self, buffer: &[u8]) -> bool { match self { Self::U8 => true, Self::U8x2 => unsafe { buffer.align_to::().0.is_empty() }, Self::U8x3 => unsafe { buffer.align_to::().0.is_empty() }, Self::U8x4 => unsafe { buffer.align_to::().0.is_empty() }, Self::U16 => unsafe { buffer.align_to::().0.is_empty() }, Self::U16x2 => unsafe { buffer.align_to::().0.is_empty() }, Self::U16x3 => unsafe { buffer.align_to::().0.is_empty() }, Self::U16x4 => unsafe { buffer.align_to::().0.is_empty() }, Self::I32 => unsafe { buffer.align_to::().0.is_empty() }, Self::F32 => unsafe { buffer.align_to::().0.is_empty() }, Self::F32x2 => unsafe { buffer.align_to::().0.is_empty() }, Self::F32x3 => unsafe { buffer.align_to::().0.is_empty() }, Self::F32x4 => unsafe { buffer.align_to::().0.is_empty() }, } } } pub trait GetCount { fn count() -> usize; } /// Generic type to represent the number of components in single pixel. pub struct Count; impl GetCount for Count { #[inline(always)] fn count() -> usize { N } } pub trait GetCountOfValues { fn count_of_values() -> usize; } /// Generic type to represent the number of available values for a single pixel component. pub struct Values; impl GetCountOfValues for Values { fn count_of_values() -> usize { N } } /// Information about one component of pixel. pub trait PixelComponent where Self: Sized + Copy + Debug + PartialEq + 'static, { /// Type that provides information about a count of /// available values of one pixel's component type CountOfComponentValues: GetCountOfValues; /// Count of available values of one pixel's component fn count_of_values() -> usize { Self::CountOfComponentValues::count_of_values() } } impl PixelComponent for u8 { type CountOfComponentValues = Values<0x100>; } impl PixelComponent for u16 { type CountOfComponentValues = Values<0x10000>; } impl PixelComponent for i32 { type CountOfComponentValues = Values<0>; } impl PixelComponent for f32 { type CountOfComponentValues = Values<0>; } // Prevent users from implementing the InnerPixel trait. mod private { pub trait Sealed {} } /// Inner trait that provides additional information about pixel type. /// /// Don't use this trait in your code. 
You must use the "child" /// trait [PixelTrait](crate::PixelTrait) instead. /// /// This trait is sealed and cannot be implemented for types outside this crate. pub trait InnerPixel: private::Sealed + Copy + Clone + Sized + Debug + PartialEq + Default + Send + Sync + 'static { /// Type of pixel components type Component: PixelComponent; /// Type that provides information about a count of pixel's components type CountOfComponents: GetCount; fn pixel_type() -> PixelType; /// Count of pixel's components fn count_of_components() -> usize { Self::CountOfComponents::count() } /// Count of available values of one pixel's component fn count_of_component_values() -> usize { Self::Component::count_of_values() } fn components_is_u8() -> bool { Self::count_of_component_values() == 256 } /// Size of pixel in bytes /// /// Example: /// ``` /// # use fast_image_resize::pixels::{U8x2, U8x3, U8, InnerPixel}; /// assert_eq!(U8x3::size(), 3); /// assert_eq!(U8x2::size(), 2); /// assert_eq!(U8::size(), 1); /// ``` fn size() -> usize { size_of::() } /// Create slice of pixel's components from slice of pixels fn components(buf: &[Self]) -> &[Self::Component] { let size = buf.len() * Self::count_of_components(); let components_ptr = buf.as_ptr() as *const Self::Component; unsafe { slice::from_raw_parts(components_ptr, size) } } /// Create mutable slice of pixel's components from mutable slice of pixels fn components_mut(buf: &mut [Self]) -> &mut [Self::Component] { let size = buf.len() * Self::count_of_components(); let components_ptr = buf.as_mut_ptr() as *mut Self::Component; unsafe { slice::from_raw_parts_mut(components_ptr, size) } } /// Returns empty pixel value fn empty() -> Self { Self::default() } } /// Generic type of pixel. #[derive(Copy, Clone, PartialEq, Default)] #[repr(C)] pub struct Pixel( pub T, PhantomData<[C; COUNT_OF_COMPONENTS]>, ) where T: Sized + Copy + Clone + PartialEq + 'static, C: PixelComponent; impl Pixel where T: Sized + Copy + Clone + PartialEq + Default + 'static, C: PixelComponent, { #[inline(always)] pub const fn new(v: T) -> Self { Self(v, PhantomData) } } macro_rules! pixel_struct { ($name:ident, $type:tt, $comp_type:tt, $comp_count:literal, $pixel_type:expr, $doc:expr) => { #[doc = $doc] pub type $name = Pixel<$type, $comp_type, $comp_count>; impl private::Sealed for $name {} impl InnerPixel for $name { type Component = $comp_type; type CountOfComponents = Count<$comp_count>; fn pixel_type() -> PixelType { $pixel_type } } impl Debug for $name { fn fmt(&self, f: &mut Formatter<'_>) -> std::fmt::Result { let components_ptr = self as *const _ as *const $comp_type; let components: &[$comp_type] = unsafe { slice::from_raw_parts(components_ptr, $comp_count) }; write!(f, "{}{:?}", stringify!($name), components) } } }; } pixel_struct!(U8, u8, u8, 1, PixelType::U8, "One byte per pixel (e.g. L8)"); pixel_struct!( U8x2, [u8; 2], u8, 2, PixelType::U8x2, "Two bytes per pixel (e.g. LA8)" ); pixel_struct!( U8x3, [u8; 3], u8, 3, PixelType::U8x3, "Three bytes per pixel (e.g. RGB8)" ); pixel_struct!( U8x4, [u8; 4], u8, 4, PixelType::U8x4, "Four bytes per pixel (RGBA8, RGBx8, CMYK8 and other)" ); pixel_struct!( U16, u16, u16, 1, PixelType::U16, "One `u16` component per pixel (e.g. L16)" ); pixel_struct!( U16x2, [u16; 2], u16, 2, PixelType::U16x2, "Two `u16` components per pixel (e.g. LA16)" ); pixel_struct!( U16x3, [u16; 3], u16, 3, PixelType::U16x3, "Three `u16` components per pixel (e.g. RGB16)" ); pixel_struct!( U16x4, [u16; 4], u16, 4, PixelType::U16x4, "Four `u16` components per pixel (e.g. 
RGBA16)" ); pixel_struct!( I32, i32, i32, 1, PixelType::I32, "One `i32` component per pixel" ); pixel_struct!( F32, f32, f32, 1, PixelType::F32, "One `f32` component per pixel" ); pixel_struct!( F32x2, [f32; 2], f32, 2, PixelType::F32x2, "Two `f32` component per pixel (e.g. LA32F)" ); pixel_struct!( F32x3, [f32; 3], f32, 3, PixelType::F32x3, "Three `f32` components per pixel (e.g. RGB32F)" ); pixel_struct!( F32x4, [f32; 4], f32, 4, PixelType::F32x4, "Four `f32` components per pixel (e.g. RGBA32F)" ); pub trait IntoPixelComponent where Self: PixelComponent, { fn into_component(self) -> Out; } impl IntoPixelComponent for C { fn into_component(self) -> C { self } } // u8 impl IntoPixelComponent for u8 { fn into_component(self) -> u16 { u16::from_le_bytes([self, self]) } } impl IntoPixelComponent for u8 { fn into_component(self) -> i32 { (self as i32) << 23 } } impl IntoPixelComponent for u8 { fn into_component(self) -> f32 { (self as f32) / u8::MAX as f32 } } // u16 impl IntoPixelComponent for u16 { fn into_component(self) -> u8 { self.to_le_bytes()[1] } } impl IntoPixelComponent for u16 { fn into_component(self) -> i32 { (self as i32) << 15 } } impl IntoPixelComponent for u16 { fn into_component(self) -> f32 { (self as f32) / u16::MAX as f32 } } // i32 impl IntoPixelComponent for i32 { fn into_component(self) -> u8 { (self.max(0).saturating_add(1 << 22) >> 23) as u8 } } impl IntoPixelComponent for i32 { fn into_component(self) -> u16 { (self.max(0).saturating_add(1 << 14) >> 15) as u16 } } impl IntoPixelComponent for i32 { fn into_component(self) -> f32 { if self < 0 { (self as f32) / i32::MIN as f32 } else { (self as f32) / i32::MAX as f32 } } } // f32 impl IntoPixelComponent for f32 { fn into_component(self) -> u8 { (self.clamp(0., 1.) * u8::MAX as f32).round() as u8 } } impl IntoPixelComponent for f32 { fn into_component(self) -> u16 { (self.clamp(0., 1.) * u16::MAX as f32).round() as u16 } } impl IntoPixelComponent for f32 { fn into_component(self) -> i32 { let max = if self < 0. { i32::MIN as f32 } else { i32::MAX as f32 }; (self.clamp(-1., 1.) * max).round() as i32 } } fast_image_resize-5.3.0/src/resizer.rs000064400000000000000000000562061046102023000161420ustar 00000000000000use crate::convolution::{self, FilterType}; use crate::crop_box::CroppedSrcImageView; use crate::image_view::{try_pixel_type, ImageView, ImageViewMut, IntoImageView, IntoImageViewMut}; use crate::images::TypedImage; use crate::pixels::{self, InnerPixel}; use crate::{ CpuExtensions, CropBox, DifferentDimensionsError, MulDiv, PixelTrait, PixelType, ResizeError, }; #[derive(Debug, Clone, Copy, Eq, PartialEq)] #[non_exhaustive] pub enum ResizeAlg { Nearest, Convolution(FilterType), /// It is like `Convolution` but with a fixed kernel size. /// /// This algorithm can be useful if you want to get a result /// similar to `OpenCV` (except `INTER_AREA` interpolation). Interpolation(FilterType), SuperSampling(FilterType, u8), } impl Default for ResizeAlg { fn default() -> Self { Self::Convolution(FilterType::Lanczos3) } } #[derive(Debug, Default, Clone, Copy)] #[non_exhaustive] pub enum SrcCropping { #[default] None, Crop(CropBox), FitIntoDestination((f64, f64)), } /// Options for configuring a resize process. #[derive(Debug, Clone, Copy)] pub struct ResizeOptions { /// Default: `ResizeAlg::Convolution(FilterType::Lanczos3)` pub algorithm: ResizeAlg, /// Default: `SrcCropping::None`. pub cropping: SrcCropping, /// Enable or disable consideration of the alpha channel when resizing. /// /// Default: `true`. 
pub mul_div_alpha: bool, } impl Default for ResizeOptions { fn default() -> Self { Self { algorithm: ResizeAlg::Convolution(FilterType::Lanczos3), cropping: SrcCropping::None, mul_div_alpha: true, } } } impl ResizeOptions { pub fn new() -> Self { Default::default() } /// Set resize algorythm. pub fn resize_alg(&self, resize_alg: ResizeAlg) -> Self { let mut options = *self; options.algorithm = resize_alg; options } /// Set crop box for source image. pub fn crop(&self, left: f64, top: f64, width: f64, height: f64) -> Self { let mut options = *self; options.cropping = SrcCropping::Crop(CropBox { left, top, width, height, }); options } /// Fit a source image into the aspect ratio of a destination image without distortions. /// /// `centering` is used to control the cropping position. Use (0.5, 0.5) for /// center cropping (e.g. if cropping the width, take 50% off /// of the left side, and therefore 50% off the right side). /// (0.0, 0.0) will crop from the top left corner (i.e. if /// cropping the width, take all the crop off of the right /// side, and if cropping the height, take all of it off the /// bottom). (1.0, 0.0) will crop from the bottom left /// corner, etc. (i.e. if cropping the width, take all the /// crop off the left side, and if cropping the height, take /// none from the top, and therefore all off the bottom). pub fn fit_into_destination(&self, centering: Option<(f64, f64)>) -> Self { let mut options = *self; options.cropping = SrcCropping::FitIntoDestination(centering.unwrap_or((0.5, 0.5))); options } /// Enable or disable consideration of the alpha channel when resizing. pub fn use_alpha(&self, v: bool) -> Self { let mut options = *self; options.mul_div_alpha = v; options } fn get_crop_box( &self, src_view: &impl ImageView, dst_view: &impl ImageView, ) -> CropBox { match self.cropping { SrcCropping::None => CropBox { left: 0., top: 0., width: src_view.width() as _, height: src_view.height() as _, }, SrcCropping::Crop(crop_box) => crop_box, SrcCropping::FitIntoDestination(centering) => CropBox::fit_src_into_dst_size( src_view.width(), src_view.height(), dst_view.width(), dst_view.height(), Some(centering), ), } } } /// Methods of this structure used to resize images. #[derive(Default, Debug, Clone)] pub struct Resizer { cpu_extensions: CpuExtensions, mul_div: MulDiv, alpha_buffer: Vec, convolution_buffer: Vec, super_sampling_buffer: Vec, } impl Resizer { /// Creates an instance of `Resizer`. /// /// By default, an instance of `Resizer` is created with the best CPU /// extensions provided by your CPU. /// You can change this by using the method [Resizer::set_cpu_extensions]. pub fn new() -> Self { Default::default() } /// Resize the source image to the size of the destination image and save /// the result to the latter's pixel buffer. pub fn resize<'o>( &mut self, src_image: &impl IntoImageView, dst_image: &mut impl IntoImageViewMut, options: impl Into>, ) -> Result<(), ResizeError> { let src_pixel_type = try_pixel_type(src_image)?; let dst_pixel_type = try_pixel_type(dst_image)?; if src_pixel_type != dst_pixel_type { return Err(ResizeError::PixelTypesAreDifferent); } use PixelType as PT; macro_rules! 
match_img { ( $src_image: ident, $dst_image: ident, $(($p: path, $pt: path),)* ) => ( match src_pixel_type { $( $p => { match ( $src_image.image_view::<$pt>(), $dst_image.image_view_mut::<$pt>(), ) { (Some(src), Some(mut dst)) => self.resize_typed(&src, &mut dst, options), _ => Err(ResizeError::PixelTypesAreDifferent), } } )* _ => Err(ResizeError::PixelTypesAreDifferent), } ) } #[cfg(not(feature = "only_u8x4"))] #[allow(unreachable_patterns)] let result = match_img!( src_image, dst_image, (PT::U8, pixels::U8), (PT::U8x2, pixels::U8x2), (PT::U8x3, pixels::U8x3), (PT::U8x4, pixels::U8x4), (PT::U16, pixels::U16), (PT::U16x2, pixels::U16x2), (PT::U16x3, pixels::U16x3), (PT::U16x4, pixels::U16x4), (PT::I32, pixels::I32), (PT::F32, pixels::F32), (PT::F32x2, pixels::F32x2), (PT::F32x3, pixels::F32x3), (PT::F32x4, pixels::F32x4), ); #[cfg(feature = "only_u8x4")] let result = match_img!(src_image, dst_image, (PT::U8x4, pixels::U8x4),); result } /// Resize the source image to the size of the destination image /// and save the result to the latter's pixel buffer. pub fn resize_typed<'o, P: PixelTrait>( &mut self, src_view: &impl ImageView, dst_view: &mut impl ImageViewMut, options: impl Into>, ) -> Result<(), ResizeError> { let default_options = ResizeOptions::default(); let options = options.into().unwrap_or(&default_options); let crop_box = options.get_crop_box(src_view, dst_view); if crop_box.width == 0. || crop_box.height == 0. || dst_view.width() == 0 || dst_view.height() == 0 { // Do nothing if any size of the source or destination image is equal to zero. return Ok(()); } let cropped_src_view = CroppedSrcImageView::crop(src_view, crop_box)?; if copy_image(&cropped_src_view, dst_view).is_ok() { // If `copy_image()` returns `Ok` it means that // the size of the destination image is equal to // the size of the cropped source image and // the copy operation has success. return Ok(()); } match options.algorithm { ResizeAlg::Nearest => resample_nearest(&cropped_src_view, dst_view), ResizeAlg::Convolution(filter_type) => self.resample_convolution( &cropped_src_view, dst_view, filter_type, true, options.mul_div_alpha, ), ResizeAlg::Interpolation(filter_type) => self.resample_convolution( &cropped_src_view, dst_view, filter_type, false, options.mul_div_alpha, ), ResizeAlg::SuperSampling(filter_type, multiplicity) => self.resample_super_sampling( &cropped_src_view, dst_view, filter_type, multiplicity, options.mul_div_alpha, ), } Ok(()) } /// Returns the size of internal buffers used to store the results of /// intermediate resizing steps. pub fn size_of_internal_buffers(&self) -> usize { (self.alpha_buffer.capacity() + self.convolution_buffer.capacity() + self.super_sampling_buffer.capacity()) * size_of::() } /// Deallocates the internal buffers used to store the results of /// intermediate resizing steps. pub fn reset_internal_buffers(&mut self) { if self.alpha_buffer.capacity() > 0 { self.alpha_buffer = Vec::new(); } if self.convolution_buffer.capacity() > 0 { self.convolution_buffer = Vec::new(); } if self.super_sampling_buffer.capacity() > 0 { self.super_sampling_buffer = Vec::new(); } } #[inline(always)] pub fn cpu_extensions(&self) -> CpuExtensions { self.cpu_extensions } /// # Safety /// This is unsafe because this method allows you to set a CPU extension /// that is not supported by your CPU. 
pub unsafe fn set_cpu_extensions(&mut self, extensions: CpuExtensions) { self.cpu_extensions = extensions; self.mul_div.set_cpu_extensions(extensions); } fn resample_convolution( &mut self, cropped_src_view: &CroppedSrcImageView>, dst_view: &mut impl ImageViewMut, filter_type: FilterType, adaptive_kernel_size: bool, use_alpha: bool, ) { if use_alpha && self.mul_div.is_supported(P::pixel_type()) { let src_view = cropped_src_view.image_view(); let mut alpha_buffer = std::mem::take(&mut self.alpha_buffer); let mut premultiplied_src = get_temp_image_from_buffer(&mut alpha_buffer, src_view.width(), src_view.height()); if self .mul_div .multiply_alpha_typed(src_view, &mut premultiplied_src) .is_ok() { // SAFETY: `premultiplied_src` has the same size as `src_view` let cropped_premultiplied_src = unsafe { CroppedSrcImageView::crop_unchecked( &premultiplied_src, cropped_src_view.crop_box(), ) }; self.do_convolution( &cropped_premultiplied_src, dst_view, filter_type, adaptive_kernel_size, ); self.mul_div.divide_alpha_inplace_typed(dst_view).unwrap(); self.alpha_buffer = alpha_buffer; return; } self.alpha_buffer = alpha_buffer; } self.do_convolution( cropped_src_view, dst_view, filter_type, adaptive_kernel_size, ); } fn do_convolution( &mut self, cropped_src_view: &CroppedSrcImageView>, dst_view: &mut impl ImageViewMut, filter_type: FilterType, adaptive_kernel_size: bool, ) { let src_view = cropped_src_view.image_view(); let crop_box = cropped_src_view.crop_box(); let (dst_width, dst_height) = (dst_view.width(), dst_view.height()); if dst_width == 0 || dst_height == 0 || crop_box.width <= 0. || crop_box.height <= 0. { return; } let (filter_fn, filter_support) = convolution::get_filter_func(filter_type); let need_horizontal = dst_width as f64 != crop_box.width || crop_box.left != crop_box.left.round(); let horiz_coeffs = need_horizontal.then(|| { test_log!("compute horizontal convolution coefficients"); convolution::precompute_coefficients( src_view.width(), crop_box.left, crop_box.left + crop_box.width, dst_width, filter_fn, filter_support, adaptive_kernel_size, ) }); let need_vertical = dst_height as f64 != crop_box.height || crop_box.top != crop_box.top.round(); let vert_coeffs = need_vertical.then(|| { test_log!("compute vertical convolution coefficients"); convolution::precompute_coefficients( src_view.height(), crop_box.top, crop_box.top + crop_box.height, dst_height, filter_fn, filter_support, adaptive_kernel_size, ) }); match (horiz_coeffs, vert_coeffs) { (Some(mut horiz_coeffs), Some(mut vert_coeffs)) => { if P::components_is_u8() { // For u8-based images, it is faster to do the vertical pass first // instead of the horizontal. 
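// The intermediate image created below spans only the source columns referenced by
// the horizontal bounds (`x_first..x_last`), which keeps the temporary buffer as
// small as possible; the horizontal bounds are then shifted by `x_first` to match.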
let x_first = horiz_coeffs.bounds[0].start; // Last used col in the source image let last_x_bound = horiz_coeffs.bounds.last().unwrap(); let x_last = last_x_bound.start + last_x_bound.size; let temp_width = x_last - x_first; let mut temp_image = get_temp_image_from_buffer( &mut self.convolution_buffer, temp_width, dst_height, ); P::vert_convolution( src_view, &mut temp_image, x_first, vert_coeffs, self.cpu_extensions, ); // Shift bounds for the horizontal pass horiz_coeffs .bounds .iter_mut() .for_each(|b| b.start -= x_first); P::horiz_convolution( &temp_image, dst_view, 0, horiz_coeffs, self.cpu_extensions, ); } else { let y_first = vert_coeffs.bounds[0].start; // Last used row in the source image let last_y_bound = vert_coeffs.bounds.last().unwrap(); let y_last = last_y_bound.start + last_y_bound.size; let temp_height = y_last - y_first; let mut temp_image = get_temp_image_from_buffer( &mut self.convolution_buffer, dst_width, temp_height, ); P::horiz_convolution( src_view, &mut temp_image, y_first, horiz_coeffs, self.cpu_extensions, ); // Shift bounds for the vertical pass vert_coeffs .bounds .iter_mut() .for_each(|b| b.start -= y_first); P::vert_convolution(&temp_image, dst_view, 0, vert_coeffs, self.cpu_extensions); } } (Some(horiz_coeffs), None) => { P::horiz_convolution( src_view, dst_view, crop_box.top as u32, // crop_box.top is exactly an integer if the vertical pass is not required horiz_coeffs, self.cpu_extensions, ); } (None, Some(vert_coeffs)) => { P::vert_convolution( src_view, dst_view, crop_box.left as u32, // crop_box.left is exactly an integer if the horizontal pass is not required vert_coeffs, self.cpu_extensions, ); } _ => {} } } fn resample_super_sampling( &mut self, cropped_src_view: &CroppedSrcImageView>, dst_view: &mut impl ImageViewMut, filter_type: FilterType, multiplicity: u8, use_alpha: bool, ) { let crop_box = cropped_src_view.crop_box(); let dst_width = dst_view.width(); let dst_height = dst_view.height(); if dst_width == 0 || dst_height == 0 || crop_box.width <= 0. || crop_box.height <= 0. { return; } let width_scale = crop_box.width / dst_width as f64; let height_scale = crop_box.height / dst_height as f64; // It makes sense to resize the image in two steps only if the image // size is greater than the required size by multiplicity times. let factor = width_scale.min(height_scale) / multiplicity as f64; if factor > 1.2 { // The first step is resizing the source image by the fastest algorithm. // The temporary image will be about ``multiplicity`` times larger // than required. let tmp_width = (crop_box.width / factor).round() as u32; let tmp_height = (crop_box.height / factor).round() as u32; let mut super_sampling_buffer = std::mem::take(&mut self.super_sampling_buffer); let mut tmp_img = get_temp_image_from_buffer(&mut super_sampling_buffer, tmp_width, tmp_height); resample_nearest(cropped_src_view, &mut tmp_img); // The second step is resizing the temporary image with a convolution. let cropped_tmp_img = CroppedSrcImageView::new(&tmp_img); self.resample_convolution(&cropped_tmp_img, dst_view, filter_type, true, use_alpha); self.super_sampling_buffer = super_sampling_buffer; } else { // There is no point in doing the resizing in two steps. // We immediately resize the original image with a convolution. self.resample_convolution(cropped_src_view, dst_view, filter_type, true, use_alpha); } } } /// Creates an inner image container from part of the given buffer. /// Buffer may be expanded if its size is less than required for the image. 
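// Sketch of the approach used below: the `Vec<u8>` buffer is oversized by one extra
// pixel so that `align_to_mut::<P>()` can yield a correctly aligned slice of at least
// `width * height` pixels for the temporary `TypedImage`.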
fn get_temp_image_from_buffer( buffer: &mut Vec, width: u32, height: u32, ) -> TypedImage<'_, P> { let pixels_count = width as usize * height as usize; // Add pixel size as a gap for alignment of resulted buffer. let buf_size = pixels_count * P::size() + P::size(); if buffer.len() < buf_size { buffer.resize(buf_size, 0); } let pixels = unsafe { buffer.align_to_mut::
<P>
().1 }; TypedImage::from_pixels_slice(width, height, &mut pixels[0..pixels_count]).unwrap() } fn resample_nearest( cropped_src_view: &CroppedSrcImageView>, dst_view: &mut impl ImageViewMut, ) { let (dst_width, dst_height) = (dst_view.width(), dst_view.height()); let src_view = cropped_src_view.image_view(); let crop_box = cropped_src_view.crop_box(); if dst_width == 0 || dst_height == 0 || crop_box.width <= 0. || crop_box.height <= 0. { return; } let x_scale = crop_box.width / dst_width as f64; let y_scale = crop_box.height / dst_height as f64; // Pretabulate horizontal pixel positions let x_in_start = crop_box.left + x_scale * 0.5; let max_src_x = src_view.width() as usize; let x_in_tab: Vec = (0..dst_width) .map(|x| ((x_in_start + x_scale * x as f64) as usize).min(max_src_x)) .collect(); let y_in_start = crop_box.top + y_scale * 0.5; let src_rows = src_view.iter_rows_with_step(y_in_start, y_scale, dst_height); let dst_rows = dst_view.iter_rows_mut(0); #[cfg(feature = "rayon")] { use rayon::prelude::*; let mut row_refs: Vec<(&mut [P], &[P])> = dst_rows.zip(src_rows).collect(); row_refs.par_iter_mut().for_each(|(out_row, in_row)| { for (&x_in, out_pixel) in x_in_tab.iter().zip(out_row.iter_mut()) { // Safety of x_in value guaranteed by algorithm of creating of x_in_tab *out_pixel = unsafe { *in_row.get_unchecked(x_in) }; } }); } #[cfg(not(feature = "rayon"))] { for (out_row, in_row) in dst_rows.zip(src_rows) { for (&x_in, out_pixel) in x_in_tab.iter().zip(out_row.iter_mut()) { // Safety of x_in value guaranteed by algorithm of creating of x_in_tab *out_pixel = unsafe { *in_row.get_unchecked(x_in) }; } } } } /// Copy pixels from src_view into dst_view. pub(crate) fn copy_image( cropped_src_view: &CroppedSrcImageView, dst_view: &mut impl ImageViewMut, ) -> Result<(), DifferentDimensionsError> where S: ImageView, { let crop_box = cropped_src_view.crop_box(); if crop_box.left != crop_box.left.round() || crop_box.top != crop_box.top.round() || crop_box.width != crop_box.width.round() || crop_box.height != crop_box.height.round() { // The crop box has a fractional part in some his part return Err(DifferentDimensionsError); } if dst_view.width() != crop_box.width as u32 || dst_view.height() != crop_box.height as u32 { return Err(DifferentDimensionsError); } if dst_view.width() > 0 && dst_view.height() > 0 { dst_view .iter_rows_mut(0) .zip(iter_cropped_rows(cropped_src_view)) .for_each(|(d, s)| d.copy_from_slice(s)); } Ok(()) } fn iter_cropped_rows<'a, S: ImageView>( cropped_src_view: &'a CroppedSrcImageView, ) -> impl Iterator { let crop_box = cropped_src_view.crop_box(); let rows = cropped_src_view .image_view() .iter_rows(crop_box.top.max(0.) as u32) .take(crop_box.height.max(0.) as usize); let first_col = crop_box.left.max(0.) as usize; let last_col = first_col + crop_box.width.max(0.) 
as usize; rows.map(move |row| unsafe { row.get_unchecked(first_col..last_col) }) } fast_image_resize-5.3.0/src/simd_utils.rs000064400000000000000000000063011046102023000166220ustar 00000000000000use std::arch::x86_64::*; use crate::pixels::{U8x3, U8x4}; #[inline(always)] pub unsafe fn loadu_si128(buf: &[T], index: usize) -> __m128i { _mm_loadu_si128(buf.get_unchecked(index..).as_ptr() as *const __m128i) } #[inline(always)] pub unsafe fn loadu_si256(buf: &[T], index: usize) -> __m256i { _mm256_loadu_si256(buf.get_unchecked(index..).as_ptr() as *const __m256i) } #[inline(always)] pub unsafe fn loadl_epi16(buf: &[T], index: usize) -> __m128i { let mem_addr = buf.get_unchecked(index..).as_ptr() as *const i16; _mm_set_epi16(0, 0, 0, 0, 0, 0, 0, mem_addr.read_unaligned()) } #[inline(always)] pub unsafe fn loadl_epi32(buf: &[T], index: usize) -> __m128i { let mem_addr = buf.get_unchecked(index..).as_ptr() as *const i32; _mm_set_epi32(0, 0, 0, mem_addr.read_unaligned()) } #[inline(always)] pub unsafe fn loadl_epi64(buf: &[T], index: usize) -> __m128i { _mm_loadl_epi64(buf.get_unchecked(index..).as_ptr() as *const __m128i) } #[inline(always)] pub unsafe fn loadu_ps(buf: &[T], index: usize) -> __m128 { _mm_loadu_ps(buf.get_unchecked(index..).as_ptr() as *const f32) } #[inline(always)] pub unsafe fn loadu_ps256(buf: &[T], index: usize) -> __m256 { _mm256_loadu_ps(buf.get_unchecked(index..).as_ptr() as *const f32) } #[inline(always)] pub unsafe fn loadu_pd(buf: &[T], index: usize) -> __m128d { _mm_loadu_pd(buf.get_unchecked(index..).as_ptr() as *const f64) } #[inline(always)] pub unsafe fn loadu_pd256(buf: &[T], index: usize) -> __m256d { _mm256_loadu_pd(buf.get_unchecked(index..).as_ptr() as *const f64) } #[inline(always)] pub unsafe fn mm_cvtepu8_epi32(buf: &[U8x4], index: usize) -> __m128i { let v: i32 = i32::from_ne_bytes(buf.get_unchecked(index).0); _mm_cvtepu8_epi32(_mm_cvtsi32_si128(v)) } #[inline(always)] pub unsafe fn mm_cvtepu8_epi32_u8x3(buf: &[U8x3], index: usize) -> __m128i { let pixel = buf.get_unchecked(index).0; let v: i32 = i32::from_le_bytes([pixel[0], pixel[1], pixel[2], 0]); _mm_cvtepu8_epi32(_mm_cvtsi32_si128(v)) } #[inline(always)] pub unsafe fn mm_cvtepu8_epi32_from_u8(buf: &[u8], index: usize) -> __m128i { let ptr = buf.get_unchecked(index..).as_ptr() as *const i32; _mm_cvtepu8_epi32(_mm_cvtsi32_si128(ptr.read_unaligned())) } #[inline(always)] pub unsafe fn mm_cvtsi32_si128_from_u8(buf: &[u8], index: usize) -> __m128i { let ptr = buf.get_unchecked(index..).as_ptr() as *const i32; _mm_cvtsi32_si128(ptr.read_unaligned()) } #[inline(always)] pub unsafe fn mm_load_and_clone_i16x2(buf: &[i16]) -> __m128i { debug_assert!(buf.len() >= 2); _mm_set1_epi32((buf.as_ptr() as *const i32).read_unaligned()) } #[inline(always)] pub unsafe fn mm256_load_and_clone_i16x2(buf: &[i16]) -> __m256i { debug_assert!(buf.len() >= 2); _mm256_set1_epi32((buf.as_ptr() as *const i32).read_unaligned()) } #[inline(always)] pub unsafe fn ptr_i16_to_set1_epi64x(buf: &[i16], index: usize) -> __m128i { _mm_set1_epi64x((buf.get_unchecked(index..).as_ptr() as *const i64).read_unaligned()) } #[inline(always)] pub unsafe fn ptr_i16_to_256set1_epi64x(buf: &[i16], index: usize) -> __m256i { _mm256_set1_epi64x((buf.get_unchecked(index..).as_ptr() as *const i64).read_unaligned()) } fast_image_resize-5.3.0/src/testing.rs000064400000000000000000000011551046102023000161250ustar 00000000000000use std::cell::RefCell; thread_local!(static TEST_LOGS: RefCell> = const { RefCell::new(Vec::new()) }); pub fn log_message(msg: &str) 
{ TEST_LOGS.with(|f| { let mut logs = f.borrow_mut(); logs.push(msg.to_string()); }); } pub fn logs_contain(msg: &str) -> bool { TEST_LOGS.with(|f| { let logs = f.borrow(); for line in logs.iter() { if line.contains(msg) { return true; } } false }) } pub fn clear_log() { TEST_LOGS.with(|f| { let mut logs = f.borrow_mut(); logs.clear(); }) } fast_image_resize-5.3.0/src/threading.rs000064400000000000000000000075721046102023000164260ustar 00000000000000use std::num::NonZeroU32; use rayon::current_num_threads; use rayon::prelude::*; use crate::pixels::InnerPixel; use crate::{ImageView, ImageViewMut}; const ONE: NonZeroU32 = NonZeroU32::MIN; #[inline] fn non_zero_or_one(num: u32) -> NonZeroU32 { NonZeroU32::new(num).unwrap_or(ONE) } #[inline] pub(crate) fn split_h_two_images_for_threading<'a, P: InnerPixel>( src_view: &'a impl ImageView, dst_view: &'a mut impl ImageViewMut, src_offset: u32, ) -> Option< impl ParallelIterator< Item = ( impl ImageView + 'a, impl ImageViewMut + 'a, ), >, > { debug_assert!(src_view.height() - src_offset >= dst_view.height()); let dst_width = dst_view.width(); let dst_height = dst_view.height(); let num_threads = non_zero_or_one(current_num_threads() as u32); let max_width = dst_width.max(src_view.width()); let num_parts = calculate_max_number_of_horizonal_parts(max_width, dst_height).min(num_threads); if num_parts > ONE { let dst_height = NonZeroU32::new(dst_height)?; if let Some(src_parts) = src_view.split_by_height(src_offset, dst_height, num_parts) { if let Some(dst_parts) = dst_view.split_by_height_mut(0, dst_height, num_parts) { let src_iter = src_parts.into_par_iter(); let dst_iter = dst_parts.into_par_iter(); return Some(src_iter.zip(dst_iter)); } } } None } #[inline] pub(crate) fn split_h_one_image_for_threading( image_view: &mut impl ImageViewMut, ) -> Option + '_>> { let width = image_view.width(); let height = image_view.height(); let num_threads = non_zero_or_one(current_num_threads() as u32); let num_parts = calculate_max_number_of_horizonal_parts(width, height).min(num_threads); if num_parts > ONE { let height = NonZeroU32::new(height)?; let img_parts = image_view.split_by_height_mut(0, height, num_parts); return img_parts.map(|parts| parts.into_par_iter()); } None } #[inline] pub(crate) fn split_v_two_images_for_threading<'a, P: InnerPixel>( src_view: &'a impl ImageView, dst_view: &'a mut impl ImageViewMut, src_offset: u32, ) -> Option< impl ParallelIterator< Item = ( impl ImageView + 'a, impl ImageViewMut + 'a, ), >, > { debug_assert!(src_view.width() - src_offset >= dst_view.width()); let dst_width = dst_view.width(); let dst_height = dst_view.height(); let num_threads = non_zero_or_one(current_num_threads() as u32); let max_height = dst_height.max(src_view.height()); let num_parts = calculate_max_number_of_vertical_parts(dst_width, max_height).min(num_threads); if num_parts > ONE { let dst_width = NonZeroU32::new(dst_width).unwrap(); if let Some(src_parts) = src_view.split_by_width(src_offset, dst_width, num_parts) { if let Some(dst_parts) = dst_view.split_by_width_mut(0, dst_width, num_parts) { let src_iter = src_parts.into_par_iter(); let dst_iter = dst_parts.into_par_iter(); return Some(src_iter.zip(dst_iter)); } } } None } const PIXELS_PER_THREAD: u64 = 1024; // It was selected as a result of simple benchmarking. 
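// Illustrative example (numbers are hypothetical): a 1024 x 512 image has an area of
// 524288 pixels, giving 524288 / PIXELS_PER_THREAD = 512 candidate parts; the result
// is additionally clamped to the splittable dimension and, by the callers above, to
// `rayon::current_num_threads()`.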
fn calculate_max_number_of_horizonal_parts(width: u32, height: u32) -> NonZeroU32 { let area = width as u64 * height as u64; let num_parts = (area / PIXELS_PER_THREAD).min(height as _) as u32; non_zero_or_one(num_parts) } fn calculate_max_number_of_vertical_parts(width: u32, height: u32) -> NonZeroU32 { let area = width as u64 * height as u64; let num_parts = (area / PIXELS_PER_THREAD).min(width as _) as u32; non_zero_or_one(num_parts) } fast_image_resize-5.3.0/src/utils.rs000064400000000000000000000013321046102023000156050ustar 00000000000000/// Pre-reading data from memory increases speed slightly for some operations #[inline(always)] pub(crate) fn foreach_with_pre_reading( mut iter: impl Iterator, mut read_data: impl FnMut(I) -> D, mut process_data: impl FnMut(D), ) { let mut next_data: D; if let Some(src) = iter.next() { next_data = read_data(src); for src in iter { let data = next_data; next_data = read_data(src); process_data(data); } process_data(next_data); } } macro_rules! test_log { ($s:expr) => { #[cfg(feature = "for_testing")] { use crate::testing::log_message; log_message($s); } }; } fast_image_resize-5.3.0/src/wasm32_utils.rs000064400000000000000000000064541046102023000170130ustar 00000000000000use std::arch::wasm32::*; use crate::pixels::{U8x3, U8x4}; #[inline] #[target_feature(enable = "simd128")] pub(crate) unsafe fn load_v128(buf: &[T], index: usize) -> v128 { v128_load(buf.get_unchecked(index..).as_ptr() as *const v128) } #[inline] #[target_feature(enable = "simd128")] pub(crate) unsafe fn loadl_i64(buf: &[T], index: usize) -> v128 { let p = buf.get_unchecked(index..).as_ptr() as *const i64; i64x2(p.read_unaligned(), 0) } #[inline] #[target_feature(enable = "simd128")] pub(crate) unsafe fn loadl_i32(buf: &[T], index: usize) -> v128 { let p = buf.get_unchecked(index..).as_ptr() as *const i32; i32x4(p.read_unaligned(), 0, 0, 0) } #[inline] #[target_feature(enable = "simd128")] pub(crate) unsafe fn loadl_i16(buf: &[T], index: usize) -> v128 { let p = buf.get_unchecked(index..).as_ptr() as *const i16; i16x8(p.read_unaligned(), 0, 0, 0, 0, 0, 0, 0) } #[inline] #[target_feature(enable = "simd128")] pub(crate) unsafe fn ptr_i16_to_set1_i64(buf: &[i16], index: usize) -> v128 { let p = buf.get_unchecked(index..).as_ptr() as *const i64; i64x2_splat(p.read_unaligned()) } #[inline] #[target_feature(enable = "simd128")] pub(crate) unsafe fn ptr_i16_to_set1_i32(buf: &[i16], index: usize) -> v128 { let p = buf.get_unchecked(index..).as_ptr() as *const i32; i32x4_splat(p.read_unaligned()) } #[inline] #[target_feature(enable = "simd128")] pub(crate) unsafe fn i32x4_extend_low_ptr_u8(buf: &[u8], index: usize) -> v128 { let p = buf.get_unchecked(index..).as_ptr() as *const v128; u32x4_extend_low_u16x8(i16x8_extend_low_u8x16(v128_load(p))) } #[inline] #[target_feature(enable = "simd128")] pub(crate) unsafe fn i32x4_extend_low_ptr_u8x4(buf: &[U8x4], index: usize) -> v128 { let v: u32 = u32::from_le_bytes(buf.get_unchecked(index).0); u32x4_extend_low_u16x8(i16x8_extend_low_u8x16(u32x4(v, 0, 0, 0))) } #[inline] #[target_feature(enable = "simd128")] pub(crate) unsafe fn i32x4_extend_low_ptr_u8x3(buf: &[U8x3], index: usize) -> v128 { let pixel = buf.get_unchecked(index).0; i32x4(pixel[0] as i32, pixel[1] as i32, pixel[2] as i32, 0) } #[inline] #[target_feature(enable = "simd128")] pub(crate) unsafe fn i32x4_v128_from_u8(buf: &[u8], index: usize) -> v128 { let p = buf.get_unchecked(index..).as_ptr() as *const i32; i32x4(p.read_unaligned(), 0, 0, 0) } // #[inline] // #[target_feature(enable = 
"simd128")] // pub(crate) unsafe fn u16x8_mul_shr16(a_u16x8: v128, b_u16x8: v128) -> v128 { // let lo_u32x4 = u32x4_extmul_low_u16x8(a_u16x8, b_u16x8); // let hi_u32x4 = u32x4_extmul_high_u16x8(a_u16x8, b_u16x8); // i16x8_shuffle::<1, 3, 5, 7, 9, 11, 13, 15>(lo_u32x4, hi_u32x4) // } pub(crate) unsafe fn u16x8_mul_add_shr16(a_u16x8: v128, b_u16x8: v128, c: v128) -> v128 { let lo_u32x4 = u32x4_extmul_low_u16x8(a_u16x8, b_u16x8); let hi_u32x4 = u32x4_extmul_high_u16x8(a_u16x8, b_u16x8); let lo_u32x4 = u32x4_add(lo_u32x4, c); let hi_u32x4 = u32x4_add(hi_u32x4, c); i16x8_shuffle::<1, 3, 5, 7, 9, 11, 13, 15>(lo_u32x4, hi_u32x4) } #[inline] #[target_feature(enable = "simd128")] pub(crate) unsafe fn i64x2_mul_lo(a: v128, b: v128) -> v128 { const SHUFFLE: v128 = i8x16(0, 1, 2, 3, 8, 9, 10, 11, -1, -1, -1, -1, -1, -1, -1, -1); i64x2_extmul_low_i32x4(i8x16_swizzle(a, SHUFFLE), i8x16_swizzle(b, SHUFFLE)) } fast_image_resize-5.3.0/tests/alpha_tests.rs000064400000000000000000000375421046102023000173430ustar 00000000000000use fast_image_resize::images::{Image, TypedImage, TypedImageRef}; use fast_image_resize::{CpuExtensions, MulDiv, PixelTrait}; use testing::{cpu_ext_into_str, PixelTestingExt}; mod testing; #[derive(Clone, Copy, PartialEq, Eq)] enum Oper { Mul, Div, } fn mul_div_alpha_test( oper: Oper, src_pixels_tpl: &[P], expected_pixels_tpl: &[P], cpu_extensions: CpuExtensions, ) { assert_eq!(src_pixels_tpl.len(), expected_pixels_tpl.len()); if !cpu_extensions.is_supported() { println!( "Cpu Extensions '{}' not supported by your CPU", cpu_ext_into_str(cpu_extensions) ); return; } let height: u32 = 3; for width in [1, 9, 17, 25, 33, 41, 49, 57, 65] { let src_size = width as usize * height as usize; let src_pixels: Vec
<P>
= src_pixels_tpl .iter() .copied() .cycle() .take(src_size) .collect(); let mut dst_pixels = src_pixels.clone(); let src_image = TypedImageRef::new(width, height, &src_pixels).unwrap(); let mut dst_image = TypedImage::from_pixels_slice(width, height, &mut dst_pixels).unwrap(); let mut alpha_mul_div: MulDiv = Default::default(); unsafe { alpha_mul_div.set_cpu_extensions(cpu_extensions); } match oper { Oper::Mul => alpha_mul_div .multiply_alpha_typed(&src_image, &mut dst_image) .unwrap(), Oper::Div => alpha_mul_div .divide_alpha_typed(&src_image, &mut dst_image) .unwrap(), } let oper_str = if oper == Oper::Mul { "multiple" } else { "divide" }; let cpu_ext_str = cpu_ext_into_str(cpu_extensions); let expected_pixels: Vec
<P>
= expected_pixels_tpl .iter() .copied() .cycle() .take(src_size) .collect(); for ((s, r), e) in src_pixels .iter() .zip(dst_pixels) .zip(expected_pixels.iter()) { assert_eq!( r, *e, "failed test for {oper_str} alpha with '{cpu_ext_str}' CPU extensions \ and image width {width}: src={s:?}, result={r:?}, expected_result={e:?}", ); } // Inplace let mut src_pixels_clone = src_pixels.clone(); let mut image = TypedImage::from_pixels_slice(width, height, &mut src_pixels_clone).unwrap(); match oper { Oper::Mul => alpha_mul_div .multiply_alpha_inplace_typed(&mut image) .unwrap(), Oper::Div => alpha_mul_div .divide_alpha_inplace_typed(&mut image) .unwrap(), } for ((s, r), e) in src_pixels.iter().zip(src_pixels_clone).zip(expected_pixels) { assert_eq!( r, e, "failed inplace test for {oper_str} alpha with '{cpu_ext_str}' CPU extensions \ and image width {width}: src={s:?}, result={r:?}, expected_result={e:?}", ); } } } fn run_tests_with_real_image_u8(oper: Oper, expected_checksum: [u64; N]) where P: PixelTrait + PixelTestingExt, { let mut pixels = vec![0u8; 256 * 256 * N]; let mut i: usize = 0; for alpha in 0..=255u8 { for color in 0..=255u8 { let pixel = pixels.get_mut(i..i + N).unwrap(); for comp in pixel.iter_mut().take(N - 1) { *comp = color; } if let Some(c) = pixel.iter_mut().last() { *c = alpha; } i += N; } } let size = 256; let src_image = Image::from_vec_u8(size, size, pixels, P::pixel_type()).unwrap(); let mut dst_image = Image::new(size, size, P::pixel_type()); let mut alpha_mul_div: MulDiv = Default::default(); for cpu_extensions in P::cpu_extensions() { if !cpu_extensions.is_supported() { println!( "Cpu Extensions '{}' not supported by your CPU", cpu_ext_into_str(cpu_extensions) ); continue; } unsafe { alpha_mul_div.set_cpu_extensions(cpu_extensions); } match oper { Oper::Mul => { alpha_mul_div .multiply_alpha(&src_image, &mut dst_image) .unwrap(); } Oper::Div => { alpha_mul_div .divide_alpha(&src_image, &mut dst_image) .unwrap(); } } let oper_str = if oper == Oper::Mul { "multiple" } else { "divide" }; let pixel_type_str = P::pixel_type_str(); let cpu_ext_str = cpu_ext_into_str(cpu_extensions); let name = format!("{oper_str}_alpha_{pixel_type_str}-{cpu_ext_str}"); testing::save_result(&dst_image, &name); let checksum = testing::image_checksum::(&dst_image); assert_eq!( checksum, expected_checksum, "failed test for {oper_str} alpha real image: \ pixel_type={pixel_type_str}, cpu_extensions={cpu_ext_str}", ); } } mod u8_tests { use super::*; fn full_mul_div_alpha_test_u8>( create_pixel: fn(u8, u8) -> P, cpu_extensions: CpuExtensions, ) { const PRECISION: u32 = 8; const ALPHA_SCALE: u32 = 255u32 * (1 << (PRECISION + 1)); const ROUND_CORRECTION: u32 = 1 << (PRECISION - 1); for oper in [Oper::Mul, Oper::Div] { for color in 0u8..=255u8 { for alpha in 0u8..=255u8 { let result_color = if alpha == 0 { 0 } else { match oper { Oper::Mul => { let tmp = color as u32 * alpha as u32 + 128; (((tmp >> 8) + tmp) >> 8) as u8 } Oper::Div => { let recip_alpha = ((ALPHA_SCALE / alpha as u32) + 1) >> 1; let tmp = (color as u32 * recip_alpha + ROUND_CORRECTION) >> PRECISION; tmp.min(255) as u8 } } }; let src = [create_pixel(color, alpha)]; let res = [create_pixel(result_color, alpha)]; mul_div_alpha_test(oper, &src, &res, cpu_extensions); } } } } #[cfg(not(feature = "only_u8x4"))] #[cfg(test)] mod u8x2 { use fast_image_resize::pixels::U8x2; use super::*; type P = U8x2; const fn new_pixel(l: u8, a: u8) -> P { P::new([l, a]) } #[test] fn mul_div_alpha_test() { for cpu_extensions in P::cpu_extensions() { 
full_mul_div_alpha_test_u8(new_pixel, cpu_extensions); } } #[test] fn multiply_real_image() { run_tests_with_real_image_u8::(Oper::Mul, [4177920, 8355840]); } #[test] fn divide_real_image() { run_tests_with_real_image_u8::(Oper::Div, [12452343, 8355840]); } } #[cfg(test)] mod u8x4 { use fast_image_resize::pixels::U8x4; use super::*; type P = U8x4; const fn new_pixel(c: u8, a: u8) -> P { P::new([c, c, c, a]) } #[test] fn mul_div_alpha_test() { for cpu_extensions in P::cpu_extensions() { full_mul_div_alpha_test_u8(new_pixel, cpu_extensions); } } #[test] fn multiply_real_image() { run_tests_with_real_image_u8::(Oper::Mul, [4177920, 4177920, 4177920, 8355840]); } #[test] fn divide_real_image() { run_tests_with_real_image_u8::( Oper::Div, [12452343, 12452343, 12452343, 8355840], ); } } } #[cfg(not(feature = "only_u8x4"))] mod u16_tests { use super::*; struct TestCaseU16 { pub color: u16, pub alpha: u16, pub expected_color: u16, } const fn new_case_16(c: u16, a: u16, e: u16) -> TestCaseU16 { TestCaseU16 { color: c, alpha: a, expected_color: e, } } fn get_mul_test_cases_u16

<P>(create_pixel: fn(u16, u16) -> P) -> (Vec<P>, Vec<P>
) where P: PixelTrait, { let test_cases = [ new_case_16(0xffff, 0x8000, 0x8000), new_case_16(0x8000, 0x8000, 0x4000), new_case_16(0, 0x8000, 0), new_case_16(0xffff, 0xffff, 0xffff), new_case_16(0x8000, 0xffff, 0x8000), new_case_16(0, 0xffff, 0), new_case_16(0xffff, 0, 0), new_case_16(0x8000, 0, 0), new_case_16(0, 0, 0), ]; let mut scr_pixels = vec![]; let mut expected_pixels = vec![]; for case in test_cases { scr_pixels.push(create_pixel(case.color, case.alpha)); expected_pixels.push(create_pixel(case.expected_color, case.alpha)); } (scr_pixels, expected_pixels) } fn get_div_test_cases_u16

<P>(create_pixel: fn(u16, u16) -> P) -> (Vec<P>, Vec<P>
) where P: PixelTrait, { let test_cases = [ new_case_16(0x8000, 0x8000, 0xffff), new_case_16(0x4000, 0x8000, 0x8000), new_case_16(0, 0x8000, 0), new_case_16(0xffff, 0xffff, 0xffff), new_case_16(0x8000, 0xffff, 0x8000), new_case_16(1, 2, 32768), new_case_16(0, 0xffff, 0), new_case_16(0xffff, 0, 0), new_case_16(0x8000, 0, 0), new_case_16(0, 0, 0), new_case_16(0xffff, 0xc0c0, 0xffff), ]; let mut scr_pixels = vec![]; let mut expected_pixels = vec![]; for case in test_cases { scr_pixels.push(create_pixel(case.color, case.alpha)); expected_pixels.push(create_pixel(case.expected_color, case.alpha)); } (scr_pixels, expected_pixels) } #[cfg(test)] mod u16x2 { use fast_image_resize::pixels::U16x2; use super::*; type P = U16x2; const fn new_pixel(l: u16, a: u16) -> P { P::new([l, a]) } #[test] fn multiple_alpha_test() { let (scr_pixels, expected_pixels) = get_mul_test_cases_u16(new_pixel); for cpu_extensions in P::cpu_extensions() { mul_div_alpha_test(Oper::Mul, &scr_pixels, &expected_pixels, cpu_extensions); } } #[test] fn divide_alpha_test() { let (scr_pixels, expected_pixels) = get_div_test_cases_u16(new_pixel); for cpu_extensions in P::cpu_extensions() { mul_div_alpha_test(Oper::Div, &scr_pixels, &expected_pixels, cpu_extensions); } } } #[cfg(test)] mod u16x4 { use fast_image_resize::pixels::U16x4; use super::*; type P = U16x4; const fn new_pixel(c: u16, a: u16) -> P { P::new([c, c, c, a]) } #[test] fn multiple_alpha_test() { let (scr_pixels, expected_pixels) = get_mul_test_cases_u16(new_pixel); for cpu_extensions in P::cpu_extensions() { mul_div_alpha_test(Oper::Mul, &scr_pixels, &expected_pixels, cpu_extensions); } } #[test] fn divide_alpha_test() { let (scr_pixels, expected_pixels) = get_div_test_cases_u16(new_pixel); for cpu_extensions in P::cpu_extensions() { mul_div_alpha_test(Oper::Div, &scr_pixels, &expected_pixels, cpu_extensions); } } } } #[cfg(not(feature = "only_u8x4"))] mod f32_tests { use super::*; struct TestCaseF32 { pub color: f32, pub alpha: f32, pub expected_color: f32, } const fn new_case_f32(c: f32, a: f32, e: f32) -> TestCaseF32 { TestCaseF32 { color: c, alpha: a, expected_color: e, } } fn get_mul_test_cases_f32

<P>(create_pixel: fn(f32, f32) -> P) -> (Vec<P>, Vec<P>
) where P: PixelTrait, { let test_cases = [ new_case_f32(1., 0.5, 0.5), new_case_f32(0.5, 0.5, 0.25), new_case_f32(0., 0.5, 0.), new_case_f32(1., 1., 1.), new_case_f32(0.5, 1., 0.5), new_case_f32(0., 1., 0.), new_case_f32(1., 0., 0.), new_case_f32(0.5, 0., 0.), new_case_f32(0., 0., 0.), ]; let mut scr_pixels = vec![]; let mut expected_pixels = vec![]; for case in test_cases { scr_pixels.push(create_pixel(case.color, case.alpha)); expected_pixels.push(create_pixel(case.expected_color, case.alpha)); } (scr_pixels, expected_pixels) } fn get_div_test_cases_f32

<P>(create_pixel: fn(f32, f32) -> P) -> (Vec<P>, Vec<P>
) where P: PixelTrait, { let test_cases = [ new_case_f32(0.5, 0.5, 1.), new_case_f32(0.25, 0.5, 0.5), new_case_f32(0., 0.5, 0.), new_case_f32(1., 1., 1.), new_case_f32(0.5, 1., 0.5), new_case_f32(0.00001, 0.00002, 0.00001 / 0.00002), new_case_f32(0., 1., 0.), new_case_f32(1., 0., 0.), new_case_f32(0.5, 0., 0.), new_case_f32(0., 0., 0.), // f32 can afford to have a value greater than 1.0 new_case_f32(1., 0.7, 1. / 0.7), ]; let mut scr_pixels = vec![]; let mut expected_pixels = vec![]; for case in test_cases { scr_pixels.push(create_pixel(case.color, case.alpha)); expected_pixels.push(create_pixel(case.expected_color, case.alpha)); } (scr_pixels, expected_pixels) } #[cfg(test)] mod f32x2 { use fast_image_resize::pixels::F32x2; use super::*; type P = F32x2; const fn new_pixel(c: f32, a: f32) -> P { P::new([c, a]) } #[test] fn multiple_alpha_test() { let (scr_pixels, expected_pixels) = get_mul_test_cases_f32(new_pixel); for cpu_extensions in P::cpu_extensions() { mul_div_alpha_test(Oper::Mul, &scr_pixels, &expected_pixels, cpu_extensions); } } #[test] fn divide_alpha_test() { let (scr_pixels, expected_pixels) = get_div_test_cases_f32(new_pixel); for cpu_extensions in P::cpu_extensions() { mul_div_alpha_test(Oper::Div, &scr_pixels, &expected_pixels, cpu_extensions); } } } #[cfg(test)] mod f32x4 { use fast_image_resize::pixels::F32x4; use super::*; type P = F32x4; const fn new_pixel(c: f32, a: f32) -> P { P::new([c, c, c, a]) } #[test] fn multiple_alpha_test() { let (scr_pixels, expected_pixels) = get_mul_test_cases_f32(new_pixel); for cpu_extensions in P::cpu_extensions() { mul_div_alpha_test(Oper::Mul, &scr_pixels, &expected_pixels, cpu_extensions); } } #[test] fn divide_alpha_test() { let (scr_pixels, expected_pixels) = get_div_test_cases_f32(new_pixel); for cpu_extensions in P::cpu_extensions() { mul_div_alpha_test(Oper::Div, &scr_pixels, &expected_pixels, cpu_extensions); } } } } fast_image_resize-5.3.0/tests/color_tests.rs000064400000000000000000000150331046102023000173630ustar 00000000000000use fast_image_resize as fr; use fast_image_resize::images::Image; use fast_image_resize::pixels::*; mod testing; #[cfg(not(feature = "only_u8x4"))] mod gamma_tests { use super::*; #[test] fn gamma22_into_linear_test() { let mapper = fr::create_gamma_22_mapper(); let buffer: Vec = (0u8..=255).collect(); let src_image = Image::from_vec_u8(16, 16, buffer, PixelType::U8).unwrap(); let src_checksum = testing::image_checksum::(&src_image); assert_eq!(src_checksum, [32640]); // into U8 let mut dst_image = Image::new(16, 16, PixelType::U8); mapper.forward_map(&src_image, &mut dst_image).unwrap(); let dst_checksum = testing::image_checksum::(&dst_image); assert_eq!(dst_checksum, [20443]); // into U16 let mut dst_image = Image::new(16, 16, PixelType::U16); mapper.forward_map(&src_image, &mut dst_image).unwrap(); let dst_checksum = testing::image_checksum::(&dst_image); assert_eq!(dst_checksum, [5255141]); } #[test] fn gamma22_into_linear_errors_test() { let mapper = fr::create_gamma_22_mapper(); let buffer: Vec = (0u8..=255).collect(); let src_image = Image::from_vec_u8(16, 16, buffer, PixelType::U8).unwrap(); let mut dst_image = Image::new(16, 1, PixelType::U8); let result = mapper.forward_map(&src_image, &mut dst_image); assert!(matches!(result, Err(fr::MappingError::DifferentDimensions))); let mut dst_image = Image::new(16, 16, PixelType::U8x2); let result = mapper.forward_map(&src_image, &mut dst_image); assert!(matches!( result, Err(fr::MappingError::UnsupportedCombinationOfImageTypes) )); } #[test] fn 
linear_into_gamma22_test() { let mapper = fr::create_gamma_22_mapper(); let buffer: Vec = (0u8..=255).collect(); let src_image = Image::from_vec_u8(16, 16, buffer, PixelType::U8).unwrap(); let mut dst_image = Image::new(16, 16, PixelType::U8); mapper.backward_map(&src_image, &mut dst_image).unwrap(); let src_checksum = testing::image_checksum::(&src_image); assert_eq!(src_checksum, [32640]); let dst_checksum = testing::image_checksum::(&dst_image); assert_eq!(dst_checksum, [44824]); } #[test] fn linear_into_gamma22_errors_test() { let mapper = fr::create_gamma_22_mapper(); let buffer: Vec = (0u8..=255).collect(); let src_image = Image::from_vec_u8(16, 16, buffer, PixelType::U8).unwrap(); let mut dst_image = Image::new(16, 1, PixelType::U8); let result = mapper.backward_map(&src_image, &mut dst_image); assert!(matches!(result, Err(fr::MappingError::DifferentDimensions))); let mut dst_image = Image::new(16, 16, PixelType::U8x2); let result = mapper.backward_map(&src_image, &mut dst_image); assert!(matches!( result, Err(fr::MappingError::UnsupportedCombinationOfImageTypes) )); } } #[cfg(not(feature = "only_u8x4"))] mod srgb_tests { use super::*; #[test] fn srgb_into_rgb_test() { let mapper = fr::create_srgb_mapper(); let buffer: Vec = (0u8..=255).flat_map(|v| [v, v, v]).collect(); let src_image = Image::from_vec_u8(16, 16, buffer, PixelType::U8x3).unwrap(); let src_checksum = testing::image_checksum::(&src_image); assert_eq!(src_checksum, [32640, 32640, 32640]); let mut dst_image = Image::new(16, 16, PixelType::U8x3); mapper.forward_map(&src_image, &mut dst_image).unwrap(); let dst_checksum = testing::image_checksum::(&dst_image); assert_eq!(dst_checksum, [20304, 20304, 20304]); } #[test] fn srgba_into_rgba_test() { let mapper = fr::create_srgb_mapper(); let buffer: Vec = (0u8..=255).flat_map(|v| [v, v, v, 255]).collect(); let src_image = Image::from_vec_u8(16, 16, buffer, PixelType::U8x4).unwrap(); let src_checksum = testing::image_checksum::(&src_image); assert_eq!(src_checksum, [32640, 32640, 32640, 65280]); let mut dst_image = Image::new(16, 16, PixelType::U8x4); mapper.forward_map(&src_image, &mut dst_image).unwrap(); let dst_checksum = testing::image_checksum::(&dst_image); assert_eq!(dst_checksum, [20304, 20304, 20304, 65280]); } #[test] fn srgb_into_rgb_errors_test() { let mapper = fr::create_srgb_mapper(); let buffer: Vec = (0u8..=255).flat_map(|v| [v, v, v]).collect(); let src_image = Image::from_vec_u8(16, 16, buffer, PixelType::U8x3).unwrap(); let mut dst_image = Image::new(16, 1, PixelType::U8x3); let result = mapper.forward_map(&src_image, &mut dst_image); assert!(matches!(result, Err(fr::MappingError::DifferentDimensions))); let mut dst_image = Image::new(16, 16, PixelType::U8x2); let result = mapper.forward_map(&src_image, &mut dst_image); assert!(matches!( result, Err(fr::MappingError::UnsupportedCombinationOfImageTypes) )); } #[test] fn rgb_into_srgb_test() { let mapper = fr::create_srgb_mapper(); let buffer: Vec = (0u8..=255).flat_map(|v| [v, v, v]).collect(); let src_image = Image::from_vec_u8(16, 16, buffer, PixelType::U8x3).unwrap(); let src_checksum = testing::image_checksum::(&src_image); assert_eq!(src_checksum, [32640, 32640, 32640]); let mut dst_image = Image::new(16, 16, PixelType::U8x3); mapper.backward_map(&src_image, &mut dst_image).unwrap(); let dst_checksum = testing::image_checksum::(&dst_image); assert_eq!(dst_checksum, [44981, 44981, 44981]); } #[test] fn rgb_into_srgb_errors_test() { let mapper = fr::create_srgb_mapper(); let buffer: Vec = 
(0u8..=255).flat_map(|v| [v, v, v]).collect(); let src_image = Image::from_vec_u8(16, 16, buffer, PixelType::U8x3).unwrap(); let mut dst_image = Image::new(16, 1, PixelType::U8x3); let result = mapper.backward_map(&src_image, &mut dst_image); assert!(matches!(result, Err(fr::MappingError::DifferentDimensions))); let mut dst_image = Image::new(16, 16, PixelType::U8x2); let result = mapper.backward_map(&src_image, &mut dst_image); assert!(matches!( result, Err(fr::MappingError::UnsupportedCombinationOfImageTypes) )); } } fast_image_resize-5.3.0/tests/image_view.rs000064400000000000000000000074221046102023000171420ustar 00000000000000use fast_image_resize::images::{TypedCroppedImageMut, TypedImage}; use fast_image_resize::pixels::U8; use fast_image_resize::{ImageView, ImageViewMut}; use testing::non_zero_u32; mod testing; mod split_by_width { use super::*; use fast_image_resize::images::{TypedCroppedImage, TypedImageRef}; fn split(img: &T) { for num_parts in 1..16 { let res = img .split_by_width(0, non_zero_u32(512), non_zero_u32(num_parts)) .unwrap(); assert_eq!(res.len() as u32, num_parts); let sum_width = res.iter().map(|v| v.width()).sum::(); assert_eq!(sum_width, 512); } } fn split_mut(img: &mut T) { for num_parts in 1..16 { let res = img .split_by_width_mut(0, non_zero_u32(512), non_zero_u32(num_parts)) .unwrap(); assert_eq!(res.len() as u32, num_parts); let sum_width = res.iter().map(|v| v.width()).sum::(); assert_eq!(sum_width, 512); } } #[test] fn typed_image_ref() { let width = 512; let height = 384; let buffer = vec![U8::new(0); (width * height) as usize]; let img = TypedImageRef::::new(width, height, &buffer).unwrap(); split(&img); } #[test] fn typed_image() { let mut img = TypedImage::::new(512, 384); split(&img); split_mut(&mut img); } #[test] fn typed_cropped_image() { let img = TypedImage::::new(512 + 20, 384 + 20); let cropped_img = TypedCroppedImage::from_ref(&img, 10, 10, 512, 384).unwrap(); split(&cropped_img); } #[test] fn typed_cropped_image_mut() { let mut img = TypedImage::::new(512 + 20, 384 + 20); let mut cropped_img = TypedCroppedImageMut::from_ref(&mut img, 10, 10, 512, 384).unwrap(); split(&cropped_img); split_mut(&mut cropped_img); } } mod split_by_height { use super::*; use fast_image_resize::images::{TypedCroppedImage, TypedImageRef}; fn split(img: &T) { for num_parts in 1..16 { let res = img .split_by_height(0, non_zero_u32(512), non_zero_u32(num_parts)) .unwrap(); assert_eq!(res.len() as u32, num_parts); let sum_height = res.iter().map(|v| v.height()).sum::(); assert_eq!(sum_height, 512); } } fn split_mut(img: &mut T) { for num_parts in 1..16 { let res = img .split_by_height_mut(0, non_zero_u32(512), non_zero_u32(num_parts)) .unwrap(); assert_eq!(res.len() as u32, num_parts); let sum_height = res.iter().map(|v| v.height()).sum::(); assert_eq!(sum_height, 512); } } #[test] fn typed_image_ref() { let width = 384; let height = 512; let buffer = vec![U8::new(0); (width * height) as usize]; let img = TypedImageRef::::new(width, height, &buffer).unwrap(); split(&img); } #[test] fn typed_image() { let mut img: TypedImage = TypedImage::new(384, 512); split(&img); split_mut(&mut img); } #[test] fn typed_cropped_image() { let img = TypedImage::::new(384 + 20, 512 + 20); let cropped_img = TypedCroppedImage::from_ref(&img, 10, 10, 384, 512).unwrap(); split(&cropped_img); } #[test] fn typed_cropped_image_mut() { let mut img: TypedImage = TypedImage::new(384 + 20, 512 + 20); let mut cropped_img = TypedCroppedImageMut::from_ref(&mut img, 10, 10, 384, 512).unwrap(); 
        split(&cropped_img);
        split_mut(&mut cropped_img);
    }
}
fast_image_resize-5.3.0/tests/images_tests.rs000064400000000000000000000165341046102023000175140ustar 00000000000000use fast_image_resize as fr;
use fast_image_resize::images::{
    CroppedImage, CroppedImageMut, Image, ImageRef, TypedCroppedImage, TypedCroppedImageMut,
    TypedImage, TypedImageRef,
};
use fast_image_resize::pixels::{U8x4, U8};
use fast_image_resize::{ImageView, IntoImageView, PixelType, ResizeOptions};

#[test]
fn create_image_ref_from_small_buffer() {
    let width = 64;
    let height = 32;
    let buffer = vec![0; 64 * 30];
    let res = ImageRef::new(width, height, &buffer, PixelType::U8);
    assert_eq!(res.unwrap_err(), fr::ImageBufferError::InvalidBufferSize);
}

#[test]
fn create_image_from_small_buffer() {
    let width = 64;
    let height = 32;
    let mut buffer = vec![0; 64 * 30];
    let res = Image::from_slice_u8(width, height, &mut buffer, PixelType::U8);
    assert_eq!(res.unwrap_err(), fr::ImageBufferError::InvalidBufferSize);
    let res = Image::from_vec_u8(width, height, buffer, PixelType::U8);
    assert_eq!(res.unwrap_err(), fr::ImageBufferError::InvalidBufferSize);
}

#[test]
fn create_image_from_big_buffer() {
    let width = 64;
    let height = 32;
    let mut buffer = vec![0; 65 * 32];
    let res = Image::from_slice_u8(width, height, &mut buffer, PixelType::U8);
    assert!(res.is_ok());
    let res = Image::from_vec_u8(width, height, buffer, PixelType::U8);
    assert!(res.is_ok());
}

#[test]
fn create_type_image_ref_from_small_buffer() {
    let width = 64;
    let height = 32;
    let buffer = vec![U8::new(0); 64 * 30];
    let res = TypedImageRef::<U8>::new(width, height, &buffer);
    assert!(matches!(res, Err(fr::InvalidPixelsSize)));
}

#[test]
fn create_typed_image_from_small_buffer() {
    let width = 64;
    let height = 32;
    let mut buffer = vec![0; 64 * 30];
    let res = TypedImage::<U8>::from_buffer(width, height, &mut buffer);
    assert_eq!(res.unwrap_err(), fr::ImageBufferError::InvalidBufferSize);
    let res = TypedImageRef::<U8>::from_buffer(width, height, &buffer);
    assert_eq!(res.unwrap_err(), fr::ImageBufferError::InvalidBufferSize);
}

#[test]
fn create_typed_image_from_big_buffer() {
    let width = 64;
    let height = 32;
    let mut buffer = vec![0; 65 * 32];
    let res = TypedImage::<U8>::from_buffer(width, height, &mut buffer);
    assert!(res.is_ok());
    let res = TypedImageRef::<U8>::from_buffer(width, height, &buffer);
    assert!(res.is_ok());
}

#[test]
fn typed_cropped_image() {
    const BLACK: U8x4 = U8x4::new([0; 4]);
    const WHITE: U8x4 = U8x4::new([255; 4]);
    let mut source_pixels = Vec::with_capacity(64 * 64);
    source_pixels.extend((0..64 * 64).map(|i| {
        let y = i / 64;
        if (10..54).contains(&y) {
            let x = i % 64;
            if (10..54).contains(&x) {
                return WHITE;
            }
        }
        BLACK
    }));
    // Black source image with white square inside
    let src_image = TypedImage::<U8x4>::from_pixels(64, 64, source_pixels).unwrap();
    // Black destination image
    let mut dst_image = TypedImage::<U8x4>::new(40, 40);
    let cropped_src_image = TypedCroppedImage::from_ref(&src_image, 10, 10, 44, 44).unwrap();
    assert_eq!(cropped_src_image.width(), 44);
    assert_eq!(cropped_src_image.height(), 44);
    let mut resizer = fr::Resizer::new();
    resizer
        .resize_typed(
            &cropped_src_image,
            &mut dst_image,
            &ResizeOptions::new().resize_alg(fr::ResizeAlg::Nearest),
        )
        .unwrap();
    let white_block = vec![WHITE; 40 * 40];
    assert_eq!(dst_image.pixels(), white_block);
}

#[test]
fn cropped_image() {
    const BLACK: U8x4 = U8x4::new([0; 4]);
    const WHITE: U8x4 = U8x4::new([255; 4]);
    let mut source_pixels = Vec::with_capacity(64 * 64);
    source_pixels.extend((0..64 * 64).map(|i| {
        let y = i / 64;
        if (10..54).contains(&y) {
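            // A pixel is white only when both of its coordinates fall in 10..54,
            // which produces a 44x44 white square inside the 64x64 black image.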
            let x = i % 64;
            if (10..54).contains(&x) {
                return WHITE;
            }
        }
        BLACK
    }));
    // Black source image with white square inside
    let src_image = ImageRef::from_pixels(64, 64, &source_pixels).unwrap();
    // Black destination image
    let mut dst_image = Image::new(40, 40, PixelType::U8x4);
    let cropped_src_image = CroppedImage::new(&src_image, 10, 10, 44, 44).unwrap();
    assert_eq!(cropped_src_image.width(), 44);
    assert_eq!(cropped_src_image.height(), 44);
    let mut resizer = fr::Resizer::new();
    resizer
        .resize(
            &cropped_src_image,
            &mut dst_image,
            &ResizeOptions::new().resize_alg(fr::ResizeAlg::Nearest),
        )
        .unwrap();
    let dst_typed_image = dst_image.typed_image().unwrap();
    let dst_pixels: &[U8x4] = dst_typed_image.pixels();
    let white_block = vec![WHITE; 40 * 40];
    assert_eq!(dst_pixels, white_block);
}

#[test]
fn typed_cropped_image_mut() {
    const BLACK: U8x4 = U8x4::new([0; 4]);
    const WHITE: U8x4 = U8x4::new([255; 4]);
    // White source image
    let src_image = TypedImage::from_pixels(64, 32, vec![WHITE; 64 * 32]).unwrap();
    // Black destination image
    let mut dst_image = TypedImage::<U8x4>::new(64, 32);
    let mut cropped_dst_image =
        TypedCroppedImageMut::from_ref(&mut dst_image, 10, 10, 44, 12).unwrap();
    assert_eq!(cropped_dst_image.width(), 44);
    assert_eq!(cropped_dst_image.height(), 12);
    let mut resizer = fr::Resizer::new();
    resizer
        .resize_typed(
            &src_image,
            &mut cropped_dst_image,
            &ResizeOptions::new().resize_alg(fr::ResizeAlg::Nearest),
        )
        .unwrap();
    let dst_pixels = dst_image.pixels();
    let row_size: usize = 64;
    let black_block = vec![BLACK; 10 * row_size];
    // Top border
    assert_eq!(dst_pixels[0..10 * row_size], black_block);
    // Middle rows
    let mut middle_row = vec![BLACK; 10];
    middle_row.extend(vec![WHITE; 44]);
    middle_row.extend(vec![BLACK; 10]);
    for row in dst_pixels.chunks_exact(row_size).skip(10).take(12) {
        assert_eq!(row, middle_row);
    }
    // Bottom border
    assert_eq!(dst_pixels[22 * row_size..], black_block);
}

#[test]
fn cropped_image_mut() {
    const BLACK: U8x4 = U8x4::new([0; 4]);
    const WHITE: U8x4 = U8x4::new([255; 4]);
    // White source image
    let src_pixels = vec![WHITE; 64 * 32];
    let src_image = ImageRef::from_pixels(64, 32, &src_pixels).unwrap();
    // Black destination image
    let mut dst_image = Image::new(64, 32, PixelType::U8x4);
    let mut cropped_dst_image = CroppedImageMut::new(&mut dst_image, 10, 10, 44, 12).unwrap();
    assert_eq!(cropped_dst_image.width(), 44);
    assert_eq!(cropped_dst_image.height(), 12);
    let mut resizer = fr::Resizer::new();
    resizer
        .resize(
            &src_image,
            &mut cropped_dst_image,
            &ResizeOptions::new().resize_alg(fr::ResizeAlg::Nearest),
        )
        .unwrap();
    let dst_typed_image = dst_image.typed_image().unwrap();
    let dst_pixels: &[U8x4] = dst_typed_image.pixels();
    let row_size: usize = 64;
    let black_block = vec![BLACK; 10 * row_size];
    // Top border
    assert_eq!(dst_pixels[0..10 * row_size], black_block);
    // Middle rows
    let mut middle_row = vec![BLACK; 10];
    middle_row.extend(vec![WHITE; 44]);
    middle_row.extend(vec![BLACK; 10]);
    for row in dst_pixels.chunks_exact(row_size).skip(10).take(12) {
        assert_eq!(row, middle_row);
    }
    // Bottom border
    assert_eq!(dst_pixels[22 * row_size..], black_block);
}
fast_image_resize-5.3.0/tests/resize_tests.rs000064400000000000000000001004371046102023000175510ustar 00000000000000use std::cmp::Ordering;
use std::fmt::Debug;

use fast_image_resize::images::Image;
use fast_image_resize::pixels::*;
use fast_image_resize::{
    testing as fr_testing, CpuExtensions, CropBoxError, Filter, FilterType, IntoImageView,
    PixelTrait, PixelType, ResizeAlg, ResizeError, ResizeOptions, Resizer,
};
use
testing::{cpu_ext_into_str, image_checksum, save_result, PixelTestingExt}; mod testing; fn get_new_height(src_image: &impl IntoImageView, new_width: u32) -> u32 { let scale = new_width as f32 / src_image.width() as f32; (src_image.height() as f32 * scale).round() as u32 } const NEW_WIDTH: u32 = 255; const NEW_BIG_WIDTH: u32 = 5016; #[test] fn try_resize_to_other_pixel_type() { let mut resizer = Resizer::new(); let src_image = U8x4::load_big_src_image(); let mut dst_image = Image::new(1024, 256, PixelType::U8); assert!(matches!( resizer.resize( &src_image, &mut dst_image, &ResizeOptions::new().resize_alg(ResizeAlg::Nearest), ), Err(ResizeError::PixelTypesAreDifferent) )); } #[test] fn resize_to_same_size() { let width = 100; let height = 80; let buffer: Vec = (0..8000) .map(|v| (v & 0xff) as u8) .flat_map(|v| [v; 4]) .collect(); let src_image = Image::from_vec_u8(width, height, buffer, PixelType::U8x4).unwrap(); let mut dst_image = Image::new(width, height, PixelType::U8x4); Resizer::new() .resize(&src_image, &mut dst_image, None) .unwrap(); assert!(matches!( src_image.buffer().cmp(dst_image.buffer()), Ordering::Equal )); } #[test] fn resize_to_same_size_after_cropping() { let width = 100; let height = 80; let src_width = 120; let src_height = 100; let buffer: Vec = (0..12000) .map(|v| (v & 0xff) as u8) .flat_map(|v| [v; 4]) .collect(); let src_image = Image::from_vec_u8(src_width, src_height, buffer, PixelType::U8x4).unwrap(); let mut dst_image = Image::new(width, height, PixelType::U8x4); let options = ResizeOptions::new().crop(10., 10., width as _, height as _); Resizer::new() .resize(&src_image, &mut dst_image, &options) .unwrap(); let cropped_buffer: Vec = (0..12000u32) .filter_map(|v| { let row = v / 120; let col = v % 120; if (10..90u32).contains(&row) && (10..110u32).contains(&col) { Some((v & 0xff) as u8) } else { None } }) .flat_map(|v| [v; 4]) .collect(); let dst_buffer = dst_image.into_vec(); assert!(matches!(cropped_buffer.cmp(&dst_buffer), Ordering::Equal)); } /// In this test, we check that resizer won't use horizontal convolution /// if the width of destination image is equal to the width of cropped source image. fn resize_to_same_width( pixel_type: PixelType, cpu_extensions: CpuExtensions, create_pixel: fn(v: u8) -> [u8; C], ) { fr_testing::clear_log(); let width = 100; let height = 80; let src_width = 120; let src_height = 100; // Image columns are made up of pixels of the same color. let buffer: Vec = (0..12000) .flat_map(|v| create_pixel((v % 120) as u8)) .collect(); let src_image = Image::from_vec_u8(src_width, src_height, buffer, pixel_type).unwrap(); let mut dst_image = Image::new(width, height, pixel_type); let mut resizer = Resizer::new(); unsafe { resizer.set_cpu_extensions(cpu_extensions); } resizer .resize( &src_image, &mut dst_image, &ResizeOptions::new() .crop(10., 0., width as _, src_height as _) .use_alpha(false), ) .unwrap(); assert!(fr_testing::logs_contain( "compute vertical convolution coefficients" )); assert!(!fr_testing::logs_contain( "compute horizontal convolution coefficients" )); let expected_result: Vec = (0..8000u32) .flat_map(|v| create_pixel((10 + v % 100) as u8)) .collect(); let dst_buffer = dst_image.into_vec(); assert!( matches!(expected_result.cmp(&dst_buffer), Ordering::Equal), "Resizing result is not equal to expected ones ({:?}, {:?})", pixel_type, cpu_extensions ); } /// In this test, we check that resizer won't use vertical convolution /// if the height of destination image is equal to the height of cropped source image. 
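// Like resize_to_same_width() above, this check relies on log messages captured via
// fast_image_resize::testing (imported as fr_testing): the test expects the
// "compute horizontal convolution coefficients" message to be logged, but not the
// vertical one.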
fn resize_to_same_height( pixel_type: PixelType, cpu_extensions: CpuExtensions, create_pixel: fn(v: u8) -> [u8; C], ) { fr_testing::clear_log(); let width = 100; let height = 80; let src_width = 120; let src_height = 100; // Image rows are made up of pixels of the same color. let buffer: Vec = (0..12000) .flat_map(|v| create_pixel((v / 120) as u8)) .collect(); let src_image = Image::from_vec_u8(src_width, src_height, buffer, pixel_type).unwrap(); let mut dst_image = Image::new(width, height, pixel_type); let mut resizer = Resizer::new(); unsafe { resizer.set_cpu_extensions(cpu_extensions); } resizer .resize( &src_image, &mut dst_image, &ResizeOptions::new() .crop(0., 10., src_width as _, height as _) .use_alpha(false), ) .unwrap(); assert!(!fr_testing::logs_contain( "compute vertical convolution coefficients" )); assert!(fr_testing::logs_contain( "compute horizontal convolution coefficients" )); let expected_result: Vec = (0..8000u32) .flat_map(|v| create_pixel((10 + v / 100) as u8)) .collect(); let dst_buffer = dst_image.into_vec(); assert!( matches!(expected_result.cmp(&dst_buffer), Ordering::Equal), "Resizing result is not equal to expected ones ({:?}, {:?})", pixel_type, cpu_extensions ); } #[test] fn resize_to_same_width_or_height_after_cropping() { let mut cpu_extensions_vec = vec![CpuExtensions::None]; #[cfg(target_arch = "x86_64")] { cpu_extensions_vec.push(CpuExtensions::Sse4_1); cpu_extensions_vec.push(CpuExtensions::Avx2); } #[cfg(target_arch = "aarch64")] { cpu_extensions_vec.push(CpuExtensions::Neon); } #[cfg(target_arch = "wasm32")] { cpu_extensions_vec.push(CpuExtensions::Simd128); } for cpu_extensions in cpu_extensions_vec { if !cpu_extensions.is_supported() { continue; } resize_to_same_width(PixelType::U8x4, cpu_extensions, |v| [v; 4]); #[cfg(not(feature = "only_u8x4"))] { resize_to_same_width(PixelType::U8, cpu_extensions, |v| [v]); resize_to_same_width(PixelType::U8x2, cpu_extensions, |v| [v; 2]); resize_to_same_width(PixelType::U8x3, cpu_extensions, |v| [v; 3]); resize_to_same_width(PixelType::U16, cpu_extensions, |v| [v, 0]); resize_to_same_width(PixelType::U16x2, cpu_extensions, |v| [v, 0, v, 0]); resize_to_same_width(PixelType::U16x3, cpu_extensions, |v| [v, 0, v, 0, v, 0]); resize_to_same_width(PixelType::U16x4, cpu_extensions, |v| { [v, 0, v, 0, v, 0, v, 0] }); } resize_to_same_height(PixelType::U8x4, cpu_extensions, |v| [v; 4]); #[cfg(not(feature = "only_u8x4"))] { resize_to_same_height(PixelType::U8, cpu_extensions, |v| [v]); resize_to_same_height(PixelType::U8x2, cpu_extensions, |v| [v; 2]); resize_to_same_height(PixelType::U8x3, cpu_extensions, |v| [v; 3]); resize_to_same_height(PixelType::U16, cpu_extensions, |v| [v, 0]); resize_to_same_height(PixelType::U16x2, cpu_extensions, |v| [v, 0, v, 0]); resize_to_same_height(PixelType::U16x3, cpu_extensions, |v| [v, 0, v, 0, v, 0]); resize_to_same_height(PixelType::U16x4, cpu_extensions, |v| { [v, 0, v, 0, v, 0, v, 0] }); } } #[cfg(not(feature = "only_u8x4"))] { resize_to_same_width(PixelType::I32, CpuExtensions::None, |v| { (v as i32).to_le_bytes() }); resize_to_same_width(PixelType::F32, CpuExtensions::None, |v| { (v as f32).to_le_bytes() }); resize_to_same_height(PixelType::I32, CpuExtensions::None, |v| { (v as i32).to_le_bytes() }); resize_to_same_height(PixelType::F32, CpuExtensions::None, |v| { (v as f32).to_le_bytes() }); } } trait ResizeTest { fn downscale_test(resize_alg: ResizeAlg, cpu_extensions: CpuExtensions, checksum: [u64; CC]); fn upscale_test(resize_alg: ResizeAlg, cpu_extensions: CpuExtensions, 
checksum: [u64; CC]); } impl ResizeTest for Pixel where Self: PixelTestingExt + PixelTrait, T: Sized + Copy + Clone + Debug + PartialEq + Default + 'static, C: PixelComponent, { fn downscale_test(resize_alg: ResizeAlg, cpu_extensions: CpuExtensions, checksum: [u64; CC]) { if !cpu_extensions.is_supported() { println!( "Cpu Extensions '{}' not supported by your CPU", cpu_ext_into_str(cpu_extensions) ); return; } let image = Self::load_big_src_image(); assert_eq!(image.pixel_type(), Self::pixel_type()); let mut resizer = Resizer::new(); unsafe { resizer.set_cpu_extensions(cpu_extensions); } let new_height = get_new_height(&image, NEW_WIDTH); let mut result = Image::new(NEW_WIDTH, new_height, image.pixel_type()); resizer .resize( &image, &mut result, &ResizeOptions::new().resize_alg(resize_alg).use_alpha(false), ) .unwrap(); let alg_name = match resize_alg { ResizeAlg::Nearest => "nearest", ResizeAlg::Convolution(filter) => match filter { FilterType::Box => "box", FilterType::Bilinear => "bilinear", FilterType::Hamming => "hamming", FilterType::Mitchell => "mitchell", FilterType::CatmullRom => "catmullrom", FilterType::Gaussian => "gaussian", FilterType::Lanczos3 => "lanczos3", _ => "unknown", }, ResizeAlg::Interpolation(filter) => match filter { FilterType::Box => "inter_box", FilterType::Bilinear => "inter_bilinear", FilterType::Hamming => "inter_hamming", FilterType::Mitchell => "inter_mitchell", FilterType::CatmullRom => "inter_catmullrom", FilterType::Gaussian => "inter_gaussian", FilterType::Lanczos3 => "inter_lanczos3", _ => "inter_unknown", }, ResizeAlg::SuperSampling(_, _) => "supersampling", _ => "unknown", }; let name = format!( "downscale-{}-{}-{}", Self::pixel_type_str(), alg_name, cpu_ext_into_str(cpu_extensions), ); save_result(&result, &name); assert_eq!( image_checksum::(&result), checksum, "Error in checksum for {cpu_extensions:?}", ); } fn upscale_test(resize_alg: ResizeAlg, cpu_extensions: CpuExtensions, checksum: [u64; CC]) { if !cpu_extensions.is_supported() { println!( "Cpu Extensions '{}' not supported by your CPU", cpu_ext_into_str(cpu_extensions) ); return; } let image = Self::load_small_src_image(); assert_eq!(image.pixel_type(), Self::pixel_type()); let mut resizer = Resizer::new(); unsafe { resizer.set_cpu_extensions(cpu_extensions); } let new_height = get_new_height(&image, NEW_BIG_WIDTH); let mut result = Image::new(NEW_BIG_WIDTH, new_height, image.pixel_type()); resizer .resize( &image, &mut result, &ResizeOptions::new().resize_alg(resize_alg).use_alpha(false), ) .unwrap(); let alg_name = match resize_alg { ResizeAlg::Nearest => "nearest", ResizeAlg::Convolution(filter) => match filter { FilterType::Box => "box", FilterType::Bilinear => "bilinear", FilterType::Hamming => "hamming", FilterType::Mitchell => "mitchell", FilterType::CatmullRom => "catmullrom", FilterType::Lanczos3 => "lanczos3", _ => "unknown", }, ResizeAlg::SuperSampling(_, _) => "supersampling", _ => "unknown", }; let name = format!( "upscale-{}-{}-{}", Self::pixel_type_str(), alg_name, cpu_ext_into_str(cpu_extensions), ); save_result(&result, &name); assert_eq!( image_checksum::(&result), checksum, "Error in checksum for {:?}", cpu_extensions ); } } #[cfg(not(feature = "only_u8x4"))] mod not_u8x4 { use super::*; #[test] fn downscale_u8() { type P = U8; P::downscale_test(ResizeAlg::Nearest, CpuExtensions::None, [2920348]); for cpu_extensions in P::cpu_extensions() { P::downscale_test( ResizeAlg::Convolution(FilterType::Lanczos3), cpu_extensions, [2923555], ); } } #[test] fn upscale_u8() { type P = 
U8; P::upscale_test(ResizeAlg::Nearest, CpuExtensions::None, [1148754010]); for cpu_extensions in P::cpu_extensions() { P::upscale_test( ResizeAlg::Convolution(FilterType::Lanczos3), cpu_extensions, [1148811829], ); } } #[test] fn downscale_u8x2() { type P = U8x2; P::downscale_test(ResizeAlg::Nearest, CpuExtensions::None, [2920348, 6121802]); for cpu_extensions in P::cpu_extensions() { P::downscale_test( ResizeAlg::Convolution(FilterType::Lanczos3), cpu_extensions, [2923555, 6122718], ); } } #[test] fn upscale_u8x2() { type P = U8x2; P::upscale_test( ResizeAlg::Nearest, CpuExtensions::None, [1146218632, 2364895380], ); for cpu_extensions in P::cpu_extensions() { P::upscale_test( ResizeAlg::Convolution(FilterType::Lanczos3), cpu_extensions, [1146284886, 2364890085], ); } } #[test] fn downscale_u8x3() { type P = U8x3; P::downscale_test( ResizeAlg::Nearest, CpuExtensions::None, [2937940, 2945380, 2882679], ); for cpu_extensions in P::cpu_extensions() { P::downscale_test( ResizeAlg::Convolution(FilterType::Lanczos3), cpu_extensions, [2942547, 2947799, 2885025], ); } } #[test] fn upscale_u8x3() { type P = U8x3; P::upscale_test( ResizeAlg::Nearest, CpuExtensions::None, [1156008260, 1158417906, 1135087540], ); for cpu_extensions in P::cpu_extensions() { P::upscale_test( ResizeAlg::Convolution(FilterType::Lanczos3), cpu_extensions, [1156107445, 1158443938, 1135102297], ); } } #[test] fn resize_u8x3_interpolation() { type P = U8x3; P::downscale_test( ResizeAlg::Interpolation(FilterType::Bilinear), CpuExtensions::None, [2938733, 2946338, 2883813], ); P::upscale_test( ResizeAlg::Interpolation(FilterType::Bilinear), CpuExtensions::None, [1156013474, 1158419787, 1135090328], ); } #[test] fn downscale_u16() { type P = U16; P::downscale_test(ResizeAlg::Nearest, CpuExtensions::None, [750529436]); for cpu_extensions in P::cpu_extensions() { P::downscale_test( ResizeAlg::Convolution(FilterType::Lanczos3), cpu_extensions, [751401243], ); } } #[test] fn upscale_u16() { type P = U16; P::upscale_test(ResizeAlg::Nearest, CpuExtensions::None, [295229780570]); for cpu_extensions in P::cpu_extensions() { P::upscale_test( ResizeAlg::Convolution(FilterType::Lanczos3), cpu_extensions, [295246940755], ); } } #[test] fn downscale_u16x2() { type P = U16x2; P::downscale_test( ResizeAlg::Nearest, CpuExtensions::None, [750529436, 1573303114], ); for cpu_extensions in P::cpu_extensions() { P::downscale_test( ResizeAlg::Convolution(FilterType::Lanczos3), cpu_extensions, [751401243, 1573563971], ); } } #[test] fn upscale_u16x2() { type P = U16x2; P::upscale_test( ResizeAlg::Nearest, CpuExtensions::None, [294578188424, 607778112660], ); for cpu_extensions in P::cpu_extensions() { P::upscale_test( ResizeAlg::Convolution(FilterType::Lanczos3), cpu_extensions, [294597368766, 607776760273], ); } } #[test] fn downscale_u16x3() { type P = U16x3; P::downscale_test( ResizeAlg::Nearest, CpuExtensions::None, [755050580, 756962660, 740848503], ); for cpu_extensions in P::cpu_extensions() { P::downscale_test( ResizeAlg::Convolution(FilterType::Lanczos3), cpu_extensions, [756269847, 757632467, 741478612], ); } } #[test] fn upscale_u16x3() { type P = U16x3; P::upscale_test( ResizeAlg::Nearest, CpuExtensions::None, [297094122820, 297713401842, 291717497780], ); for cpu_extensions in P::cpu_extensions() { P::upscale_test( ResizeAlg::Convolution(FilterType::Lanczos3), cpu_extensions, [297122154090, 297723994984, 291725294637], ); } } #[test] fn downscale_u16x4() { type P = U16x4; P::downscale_test( ResizeAlg::Nearest, CpuExtensions::None, 
[755050580, 756962660, 740848503, 1573303114], ); for cpu_extensions in P::cpu_extensions() { P::downscale_test( ResizeAlg::Convolution(FilterType::Lanczos3), cpu_extensions, [756269847, 757632467, 741478612, 1573563971], ); } } #[test] fn upscale_u16x4() { type P = U16x4; P::upscale_test( ResizeAlg::Nearest, CpuExtensions::None, [296859917949, 296229709231, 288684470903, 607778112660], ); for cpu_extensions in P::cpu_extensions() { P::upscale_test( ResizeAlg::Convolution(FilterType::Lanczos3), cpu_extensions, [296888688348, 296243667797, 288698172180, 607776760273], ); } } // I32 #[test] fn downscale_i32() { type P = I32; P::downscale_test(ResizeAlg::Nearest, CpuExtensions::None, [24593724281554]); for cpu_extensions in P::cpu_extensions() { P::downscale_test( ResizeAlg::Convolution(FilterType::Lanczos3), cpu_extensions, [36889044005199], ); } } #[test] fn upscale_i32() { type P = I32; P::upscale_test(ResizeAlg::Nearest, CpuExtensions::None, [9674237252903955]); for cpu_extensions in P::cpu_extensions() { P::upscale_test( ResizeAlg::Convolution(FilterType::Lanczos3), cpu_extensions, [11090415545881916], ); } } // F32 #[test] fn downscale_f32() { type P = F32; P::downscale_test(ResizeAlg::Nearest, CpuExtensions::None, [28891951209032]); for cpu_extensions in P::cpu_extensions() { P::downscale_test( ResizeAlg::Convolution(FilterType::Lanczos3), cpu_extensions, [41687319249443], ); } } #[test] fn upscale_f32() { type P = F32; P::upscale_test(ResizeAlg::Nearest, CpuExtensions::None, [11165019414549868]); for cpu_extensions in P::cpu_extensions() { P::upscale_test( ResizeAlg::Convolution(FilterType::Lanczos3), cpu_extensions, [12506894762090128], ); } } // F32x2 #[test] fn downscale_f32x2() { type P = F32x2; P::downscale_test( ResizeAlg::Nearest, CpuExtensions::None, [28891951209032, 26023210300788], ); for cpu_extensions in P::cpu_extensions() { P::downscale_test( ResizeAlg::Convolution(FilterType::Lanczos3), cpu_extensions, [41687319249443, 29873206892121], ); } } #[test] fn upscale_f32x2() { type P = F32x2; P::upscale_test( ResizeAlg::Nearest, CpuExtensions::None, [9941292360529429, 10060767588318486], ); for cpu_extensions in P::cpu_extensions() { P::upscale_test( ResizeAlg::Convolution(FilterType::Lanczos3), cpu_extensions, [10426687457795354, 10465695788378423], ); } } // F32x3 #[test] fn downscale_f32x3() { type P = F32x3; P::downscale_test( ResizeAlg::Nearest, CpuExtensions::None, [28271357515050, 34102344731602, 34154875278897], ); for cpu_extensions in P::cpu_extensions() { P::downscale_test( ResizeAlg::Convolution(FilterType::Lanczos3), cpu_extensions, [41676905227663, 40314876714373, 40146438250679], ); } } #[test] fn upscale_f32x3() { type P = F32x3; P::upscale_test( ResizeAlg::Nearest, CpuExtensions::None, [10945976142696393, 13185868359050104, 13307431189096686], ); for cpu_extensions in P::cpu_extensions() { P::upscale_test( ResizeAlg::Convolution(FilterType::Lanczos3), cpu_extensions, [12058434766196853, 14081191473477964, 14079890133382920], ); } } // F32x4 #[test] fn downscale_f32x4() { type P = F32x4; P::downscale_test( ResizeAlg::Nearest, CpuExtensions::None, [ 28271357515050, 34102344731602, 34154875278897, 26023210300788, ], ); for cpu_extensions in P::cpu_extensions() { P::downscale_test( ResizeAlg::Convolution(FilterType::Lanczos3), cpu_extensions, [ 41676905227663, 40314876714373, 40146438250679, 29873206892121, ], ); } } #[test] fn upscale_f32x4() { type P = F32x4; P::upscale_test( ResizeAlg::Nearest, CpuExtensions::None, [ 9939217321658853, 9944847383085392, 
9940281434784023, 10060767588318486, ], ); for cpu_extensions in P::cpu_extensions() { P::upscale_test( ResizeAlg::Convolution(FilterType::Lanczos3), cpu_extensions, [ 10431959951229359, 10423928285570087, 10420567450069105, 10465695788378423, ], ); } } #[test] fn fractional_cropping() { let mut src_buf = [0, 0, 0, 0, 255, 0, 0, 0, 0]; let src_image = Image::from_slice_u8(3, 3, &mut src_buf, PixelType::U8).unwrap(); let mut dst_image = Image::new(1, 1, PixelType::U8); let mut resizer = Resizer::new(); let options = ResizeOptions::new().resize_alg(ResizeAlg::Convolution(FilterType::Box)); // Resize without cropping resizer .resize(&src_image, &mut dst_image, &options) .unwrap(); assert_eq!(dst_image.buffer()[0], (255.0f32 / 9.0).round() as u8); // Resize with fractional cropping resizer .resize(&src_image, &mut dst_image, &options.crop(0.5, 0.5, 2., 2.)) .unwrap(); assert_eq!(dst_image.buffer()[0], (255.0f32 / 4.0).round() as u8); // Resize with integer cropping resizer .resize(&src_image, &mut dst_image, &options.crop(1., 1., 1., 1.)) .unwrap(); assert_eq!(dst_image.buffer()[0], 255); } } mod u8x4 { use std::f64::consts::PI; use fast_image_resize::ResizeError; use image::ImageReader; use super::*; type P = U8x4; #[test] fn invalid_crop_box() { let mut resizer = Resizer::new(); let src_image = Image::new(1, 1, P::pixel_type()); let mut dst_image = Image::new(2, 2, P::pixel_type()); let mut options = ResizeOptions::new().resize_alg(ResizeAlg::Nearest); for (left, top) in [(1., 0.), (0., 1.)] { options = options.crop(left, top, 1., 1.); assert_eq!( resizer.resize(&src_image, &mut dst_image, &options), Err(ResizeError::SrcCroppingError( CropBoxError::PositionIsOutOfImageBoundaries )) ); } for (width, height) in [(2., 1.), (1., 2.)] { options = options.crop(0., 0., width, height); assert_eq!( resizer.resize(&src_image, &mut dst_image, &options), Err(ResizeError::SrcCroppingError( CropBoxError::SizeIsOutOfImageBoundaries )) ); } for (width, height) in [(-1., 1.), (1., -1.)] { options = options.crop(0., 0., width, height); assert_eq!( resizer.resize(&src_image, &mut dst_image, &options), Err(ResizeError::SrcCroppingError( CropBoxError::WidthOrHeightLessThanZero )) ); } } #[test] fn downscale_u8x4() { P::downscale_test( ResizeAlg::Nearest, CpuExtensions::None, [2937940, 2945380, 2882679, 6121802], ); for cpu_extensions in P::cpu_extensions() { P::downscale_test( ResizeAlg::Convolution(FilterType::Gaussian), cpu_extensions, [2939881, 2946811, 2884299, 6122867], ); P::downscale_test( ResizeAlg::Convolution(FilterType::Lanczos3), cpu_extensions, [2942547, 2947799, 2885025, 6122718], ); P::downscale_test( ResizeAlg::SuperSampling(FilterType::Lanczos3, 2), cpu_extensions, [2942426, 2947750, 2884861, 6123019], ); } } #[test] fn upscale_u8x4() { P::upscale_test( ResizeAlg::Nearest, CpuExtensions::None, [1155096957, 1152644783, 1123285879, 2364895380], ); for cpu_extensions in P::cpu_extensions() { P::upscale_test( ResizeAlg::Convolution(FilterType::Lanczos3), cpu_extensions, [1155201879, 1152689565, 1123329272, 2364890085], ); } } #[test] fn custom_filter_u8x4() { std::env::set_var("DONT_SAVE_RESULT", "1"); const LANCZOS3_RESULT: [u64; 4] = [2942547, 2947799, 2885025, 6122718]; const LANCZOS4_RESULT: [u64; 4] = [2943083, 2948315, 2885436, 6122629]; P::downscale_test( ResizeAlg::Convolution(FilterType::Lanczos3), CpuExtensions::None, LANCZOS3_RESULT, ); fn sinc_filter(mut x: f64) -> f64 { if x == 0.0 { 1.0 } else { x *= PI; x.sin() / x } } fn lanczos3_filter(x: f64) -> f64 { if (-3.0..3.0).contains(&x) { 
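                // Lanczos-3 window: sinc(x) * sinc(x / 3) for |x| < 3, zero outside.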
sinc_filter(x) * sinc_filter(x / 3.) } else { 0.0 } } for bad_support in [0.0, -1.0, f64::NAN, f64::INFINITY, f64::NEG_INFINITY] { assert!(Filter::new("bad_support", lanczos3_filter, bad_support).is_err()); } let my_lanczos3 = Filter::new("MyLanczos3", lanczos3_filter, 3.0).unwrap(); P::downscale_test( ResizeAlg::Convolution(FilterType::Custom(my_lanczos3)), CpuExtensions::None, LANCZOS3_RESULT, ); fn lanczos4_filter(x: f64) -> f64 { if (-4.0..4.0).contains(&x) { sinc_filter(x) * sinc_filter(x / 4.) } else { 0.0 } } let my_lanczos4 = Filter::new("MyLanczos4", lanczos4_filter, 4.0).unwrap(); P::downscale_test( ResizeAlg::Convolution(FilterType::Custom(my_lanczos4)), CpuExtensions::None, LANCZOS4_RESULT, ); } #[test] fn cropping() { let img = ImageReader::open("./data/crop_test.png") .unwrap() .decode() .unwrap(); let width = img.width(); let height = img.height(); let src_image = Image::from_vec_u8(width, height, img.to_rgba8().into_raw(), PixelType::U8x4).unwrap(); let options = ResizeOptions::new().crop(521., 1414., 1485., 1486.); let mut dst_image = Image::new(1279, 1280, PixelType::U8x4); let mut resizer = Resizer::new(); let mut results = vec![]; for cpu_extensions in U8x4::cpu_extensions() { unsafe { resizer.set_cpu_extensions(cpu_extensions); } resizer .resize(&src_image, &mut dst_image, &options) .unwrap(); let cpu_ext_str = cpu_ext_into_str(cpu_extensions); save_result(&dst_image, &format!("crop_test-{}.png", cpu_ext_str)); results.push((image_checksum::(&dst_image), cpu_ext_str)); } for (checksum, cpu_ext_str) in results { assert_eq!( checksum, [0, 236287962, 170693682, 417465600], "checksum of result image was resized with cpu_extensions={} is incorrect", cpu_ext_str ); } } } #[cfg(feature = "rayon")] #[test] fn split_image_on_different_number_of_parts() { let src_image = Image::new(2176, 4608, PixelType::U8x4); let mut dst_image = Image::new(582, 552, src_image.pixel_type()); let options = ResizeOptions::new() .use_alpha(false) .resize_alg(ResizeAlg::Convolution(FilterType::Box)) .crop(740.0, 1645.2, 58.200000000000045, 55.299999999999955); for num in 2..32 { let mut builder = rayon::ThreadPoolBuilder::new(); builder = builder.num_threads(num); let pool = builder.build().unwrap(); pool.install(|| { let mut resizer = Resizer::new(); resizer .resize(&src_image, &mut dst_image, &options) .expect("resize failed with {num} threads"); }); } } fast_image_resize-5.3.0/tests/testing.rs000064400000000000000000000350421046102023000165020ustar 00000000000000use std::fs::File; use std::io::BufReader; use std::num::NonZeroU32; use std::ops::Deref; use fast_image_resize::images::Image; use fast_image_resize::pixels::*; use fast_image_resize::{change_type_of_pixel_components, CpuExtensions, PixelTrait, PixelType}; use image::{ColorType, ExtendedColorType, ImageBuffer, ImageReader}; pub fn non_zero_u32(v: u32) -> NonZeroU32 { NonZeroU32::new(v).unwrap() } pub fn image_checksum(image: &Image) -> [u64; N] { let buffer = image.buffer(); let mut res = [0u64; N]; let component_size = P::size() / P::count_of_components(); match component_size { 1 => { for pixel in buffer.chunks_exact(N) { res.iter_mut().zip(pixel).for_each(|(d, &s)| *d += s as u64); } } 2 => { let buffer_u16 = unsafe { buffer.align_to::().1 }; for pixel in buffer_u16.chunks_exact(N) { res.iter_mut().zip(pixel).for_each(|(d, &s)| *d += s as u64); } } 4 => { let buffer_u32 = unsafe { buffer.align_to::().1 }; for pixel in buffer_u32.chunks_exact(N) { res.iter_mut() .zip(pixel) .for_each(|(d, &s)| *d = d.overflowing_add(s as u64).0); } } 
_ => (), }; res } pub trait PixelTestingExt: PixelTrait { type ImagePixel: image::Pixel; type Container: Deref::Subpixel]>; fn pixel_type_str() -> &'static str { match Self::pixel_type() { PixelType::U8 => "u8", PixelType::U8x2 => "u8x2", PixelType::U8x3 => "u8x3", PixelType::U8x4 => "u8x4", PixelType::U16 => "u16", PixelType::U16x2 => "u16x2", PixelType::U16x3 => "u16x3", PixelType::U16x4 => "u16x4", PixelType::I32 => "i32", PixelType::F32 => "f32", PixelType::F32x2 => "f32x2", PixelType::F32x3 => "f32x3", PixelType::F32x4 => "f32x4", _ => unreachable!(), } } fn cpu_extensions() -> Vec { let mut cpu_extensions_vec = vec![CpuExtensions::None]; #[cfg(target_arch = "x86_64")] { cpu_extensions_vec.push(CpuExtensions::Sse4_1); cpu_extensions_vec.push(CpuExtensions::Avx2); } #[cfg(target_arch = "aarch64")] { cpu_extensions_vec.push(CpuExtensions::Neon); } #[cfg(target_arch = "wasm32")] { cpu_extensions_vec.push(CpuExtensions::Simd128); } cpu_extensions_vec } fn img_paths() -> (&'static str, &'static str, &'static str) { match Self::pixel_type() { PixelType::U8 | PixelType::U8x3 | PixelType::U16 | PixelType::U16x3 | PixelType::I32 | PixelType::F32 | PixelType::F32x3 => ( "./data/nasa-4928x3279.png", "./data/nasa-4019x4019.png", "./data/nasa-852x567.png", ), PixelType::U8x2 | PixelType::U8x4 | PixelType::U16x2 | PixelType::U16x4 | PixelType::F32x2 | PixelType::F32x4 => ( "./data/nasa-4928x3279-rgba.png", "./data/nasa-4019x4019-rgba.png", "./data/nasa-852x567-rgba.png", ), _ => unreachable!(), } } fn load_image_buffer( img_reader: ImageReader>, ) -> ImageBuffer; fn load_big_image() -> ImageBuffer { Self::load_image_buffer(ImageReader::open(Self::img_paths().0).unwrap()) } fn load_big_square_image() -> ImageBuffer { Self::load_image_buffer(ImageReader::open(Self::img_paths().1).unwrap()) } fn load_small_image() -> ImageBuffer { Self::load_image_buffer(ImageReader::open(Self::img_paths().2).unwrap()) } fn load_big_src_image() -> Image<'static> { let img = Self::load_big_image(); Image::from_vec_u8( img.width(), img.height(), Self::img_into_bytes(img), Self::pixel_type(), ) .unwrap() } fn load_big_square_src_image() -> Image<'static> { let img = Self::load_big_square_image(); Image::from_vec_u8( img.width(), img.height(), Self::img_into_bytes(img), Self::pixel_type(), ) .unwrap() } fn load_small_src_image() -> Image<'static> { let img = Self::load_small_image(); Image::from_vec_u8( img.width(), img.height(), Self::img_into_bytes(img), Self::pixel_type(), ) .unwrap() } fn img_into_bytes(img: ImageBuffer) -> Vec; } #[cfg(not(feature = "only_u8x4"))] pub mod not_u8x4 { use super::*; impl PixelTestingExt for U8 { type ImagePixel = image::Luma; type Container = Vec; fn load_image_buffer( img_reader: ImageReader>, ) -> ImageBuffer { img_reader.decode().unwrap().to_luma8() } fn img_into_bytes(img: ImageBuffer) -> Vec { img.into_raw() } } impl PixelTestingExt for U8x2 { type ImagePixel = image::LumaA; type Container = Vec; fn load_image_buffer( img_reader: ImageReader>, ) -> ImageBuffer { img_reader.decode().unwrap().to_luma_alpha8() } fn img_into_bytes(img: ImageBuffer) -> Vec { img.into_raw() } } impl PixelTestingExt for U8x3 { type ImagePixel = image::Rgb; type Container = Vec; fn load_image_buffer( img_reader: ImageReader>, ) -> ImageBuffer { img_reader.decode().unwrap().to_rgb8() } fn img_into_bytes(img: ImageBuffer) -> Vec { img.into_raw() } } impl PixelTestingExt for U16 { type ImagePixel = image::Luma; type Container = Vec; fn load_image_buffer( img_reader: ImageReader>, ) -> ImageBuffer { 
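            // Decode to 16-bit luma; img_into_bytes() below serializes each component
            // to the little-endian bytes expected by Image::from_vec_u8().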
img_reader.decode().unwrap().to_luma16() } fn img_into_bytes(img: ImageBuffer) -> Vec { // img.as_raw() // .iter() // .enumerate() // .flat_map(|(i, &c)| ((i & 0xffff) as u16).to_le_bytes()) // .collect() img.as_raw().iter().flat_map(|&c| c.to_le_bytes()).collect() } } impl PixelTestingExt for U16x2 { type ImagePixel = image::LumaA; type Container = Vec; fn load_image_buffer( img_reader: ImageReader>, ) -> ImageBuffer { img_reader.decode().unwrap().to_luma_alpha16() } fn img_into_bytes(img: ImageBuffer) -> Vec { img.as_raw().iter().flat_map(|&c| c.to_le_bytes()).collect() } } impl PixelTestingExt for U16x3 { type ImagePixel = image::Rgb; type Container = Vec; fn load_image_buffer( img_reader: ImageReader>, ) -> ImageBuffer { img_reader.decode().unwrap().to_rgb16() } fn img_into_bytes(img: ImageBuffer) -> Vec { img.as_raw().iter().flat_map(|&c| c.to_le_bytes()).collect() } } impl PixelTestingExt for U16x4 { type ImagePixel = image::Rgba; type Container = Vec; fn load_image_buffer( img_reader: ImageReader>, ) -> ImageBuffer { img_reader.decode().unwrap().to_rgba16() } fn img_into_bytes(img: ImageBuffer) -> Vec { img.as_raw().iter().flat_map(|&c| c.to_le_bytes()).collect() } } impl PixelTestingExt for I32 { type ImagePixel = image::Luma; type Container = Vec; fn cpu_extensions() -> Vec { vec![CpuExtensions::None] } fn load_image_buffer( img_reader: ImageReader>, ) -> ImageBuffer { let image_u16 = img_reader.decode().unwrap().to_luma32f(); ImageBuffer::from_fn(image_u16.width(), image_u16.height(), |x, y| { let pixel = image_u16.get_pixel(x, y); image::Luma::from([(pixel.0[0] * i32::MAX as f32).round() as i32]) }) } fn img_into_bytes(img: ImageBuffer) -> Vec { img.as_raw() .iter() .flat_map(|val| val.to_le_bytes()) .collect() } } impl PixelTestingExt for F32 { type ImagePixel = image::Luma; type Container = Vec; fn load_image_buffer( img_reader: ImageReader>, ) -> ImageBuffer { img_reader.decode().unwrap().to_luma32f() } fn img_into_bytes(img: ImageBuffer) -> Vec { img.as_raw() .iter() .flat_map(|val| val.to_le_bytes()) .collect() } } impl PixelTestingExt for F32x2 { type ImagePixel = image::LumaA; type Container = Vec; fn load_image_buffer( img_reader: ImageReader>, ) -> ImageBuffer { img_reader.decode().unwrap().to_luma_alpha32f() } fn img_into_bytes(img: ImageBuffer) -> Vec { img.as_raw() .iter() .flat_map(|val| val.to_le_bytes()) .collect() } } impl PixelTestingExt for F32x3 { type ImagePixel = image::Rgb; type Container = Vec; fn load_image_buffer( img_reader: ImageReader>, ) -> ImageBuffer { img_reader.decode().unwrap().to_rgb32f() } fn img_into_bytes(img: ImageBuffer) -> Vec { img.as_raw().iter().flat_map(|&c| c.to_le_bytes()).collect() } } impl PixelTestingExt for F32x4 { type ImagePixel = image::Rgba; type Container = Vec; fn load_image_buffer( img_reader: ImageReader>, ) -> ImageBuffer { img_reader.decode().unwrap().to_rgba32f() } fn img_into_bytes(img: ImageBuffer) -> Vec { img.as_raw().iter().flat_map(|&c| c.to_le_bytes()).collect() } } } impl PixelTestingExt for U8x4 { type ImagePixel = image::Rgba; type Container = Vec; fn load_image_buffer( img_reader: ImageReader>, ) -> ImageBuffer { img_reader.decode().unwrap().to_rgba8() } fn img_into_bytes(img: ImageBuffer) -> Vec { img.into_raw() } } pub fn save_result(image: &Image, name: &str) { if std::env::var("SAVE_RESULT") .unwrap_or_else(|_| "".to_owned()) .is_empty() { return; } std::fs::create_dir_all("./data/result").unwrap(); let path = format!("./data/result/{name}.png"); let color_type: ExtendedColorType = match 
image.pixel_type() { PixelType::U8 => ColorType::L8.into(), PixelType::U8x2 => ColorType::La8.into(), PixelType::U8x3 => ColorType::Rgb8.into(), PixelType::U8x4 => ColorType::Rgba8.into(), PixelType::U16 => ColorType::L16.into(), PixelType::U16x2 => ColorType::La16.into(), PixelType::U16x3 => ColorType::Rgb16.into(), PixelType::U16x4 => ColorType::Rgba16.into(), PixelType::I32 | PixelType::F32 => { let mut image_u16 = Image::new(image.width(), image.height(), PixelType::U16); change_type_of_pixel_components(image, &mut image_u16).unwrap(); save_result(&image_u16, name); return; } PixelType::F32x2 => { let mut image_u16 = Image::new(image.width(), image.height(), PixelType::U16x2); change_type_of_pixel_components(image, &mut image_u16).unwrap(); save_result(&image_u16, name); return; } PixelType::F32x3 => { let mut image_u16 = Image::new(image.width(), image.height(), PixelType::U16x3); change_type_of_pixel_components(image, &mut image_u16).unwrap(); save_result(&image_u16, name); return; } PixelType::F32x4 => { let mut image_u16 = Image::new(image.width(), image.height(), PixelType::U16x4); change_type_of_pixel_components(image, &mut image_u16).unwrap(); save_result(&image_u16, name); return; } _ => panic!("Unsupported type of pixels"), }; image::save_buffer( path, image.buffer(), image.width(), image.height(), color_type, ) .unwrap(); } pub const fn cpu_ext_into_str(cpu_extensions: CpuExtensions) -> &'static str { match cpu_extensions { CpuExtensions::None => "rust", #[cfg(target_arch = "x86_64")] CpuExtensions::Sse4_1 => "sse4.1", #[cfg(target_arch = "x86_64")] CpuExtensions::Avx2 => "avx2", #[cfg(target_arch = "aarch64")] CpuExtensions::Neon => "neon", #[cfg(target_arch = "wasm32")] CpuExtensions::Simd128 => "simd128", } }
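// A minimal usage sketch (not from the original test suite) of how the helpers above
// are typically combined in a test; the 256x256 size, the "example" name and the
// image_checksum turbofish are illustrative placeholders:
//
//     let src = U8x4::load_big_src_image();
//     let mut dst = Image::new(256, 256, PixelType::U8x4);
//     fast_image_resize::Resizer::new().resize(&src, &mut dst, None).unwrap();
//     save_result(&dst, "example");
//     let _checksum = image_checksum::<U8x4, 4>(&dst);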