Introduction to the new Vector API
The Vector API in Java, first introduced as an incubator module in Java 16 and still incubating through Java 21, aims to speed up computation-intensive operations by using SIMD (Single Instruction, Multiple Data) instructions. Its goal is data-parallel processing: the same operation is applied to many data elements at once, which can significantly speed up tasks that are performed independently on large sets of data, such as mathematical calculations and data processing. This is particularly beneficial for applications in machine learning, scientific computing, and multimedia processing, where handling vast amounts of data efficiently is crucial.
The Javadoc for the Vector API (package jdk.incubator.vector) provides comprehensive documentation. This article aims to simplify and clarify the core aspects of the API, helping software engineers understand it more easily.
1. What the Vector API is
The Vector API is a new API, incubating since Java 16, that provides access to hardware-level optimizations for processing vectors. Vectors here are fixed-size sequences of primitive values, such as floats or integers, that can be processed in parallel using SIMD instructions. The API provides a number of classes and interfaces that allow you to create, manipulate, and operate on vectors.
2. What the Vector API is NOT
Java also has a legacy “Vector” class (java.util.Vector), which dates back to the earliest Java releases and is not related to the Vector API. That Vector class is a resizable, synchronized array that can store objects of any type. It is not meant for SIMD-style vector processing, and ArrayList is generally preferred over it in new code.
3. Where it comes from
To understand Java’s new Vector API, it helps to first understand the basic kinds of data involved and how today’s computer hardware processes them.
Scalars and CPU operation
A CPU is made up of several processing units, each capable of handling one operation at a time. These operations work on scalar values, which are single data points. An operation can take one operand, like incrementing a number (a unary operation), or two, like adding two numbers together (a binary operation). The time a processing unit needs to complete an operation is measured in cycles and varies from one operation to another.
Parallelism in CPU
Modern CPUs usually include multiple cores, with each core containing many processing units. In this architecture, operations can be carried out in parallel across different threads, which significantly speeds up processing, especially for large calculations. By dividing large datasets into smaller chunks and distributing them across threads, we can achieve faster processing times through parallel execution.
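As a rough illustration of the chunk-per-thread approach described above, here is a minimal sketch in plain Java. The class and method names (ChunkedSum, parallelSum) are made up for this example and have nothing to do with the Vector API; note that each worker still performs one scalar addition at a time.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical helper for illustration; not part of the Vector API.
class ChunkedSum {

    // Splits the array into `threads` contiguous chunks, sums each chunk
    // on its own worker thread, then combines the partial sums.
    static long parallelSum(int[] data, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            int chunk = (data.length + threads - 1) / threads; // ceiling division
            List<Future<Long>> partials = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                final int from = Math.min(data.length, t * chunk);
                final int to = Math.min(data.length, from + chunk);
                Callable<Long> task = () -> {
                    long sum = 0;
                    for (int i = from; i < to; i++) {
                        sum += data[i]; // still one scalar addition per step
                    }
                    return sum;
                };
                partials.add(pool.submit(task));
            }
            long total = 0;
            for (Future<Long> partial : partials) {
                total += partial.get();
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("worker failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        int[] data = new int[1_000];
        for (int i = 0; i < data.length; i++) {
            data[i] = i + 1;
        }
        System.out.println(parallelSum(data, 4)); // 1 + 2 + ... + 1000 = 500500
    }
}
```

This kind of parallelism needs thread management and a final combining step, which is the overhead that SIMD avoids within a single thread.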
Parallelism in SIMD
Parallel computing takes a different turn with SIMD (Single Instruction, Multiple Data) processing, which works without multithreading. In a SIMD unit, multiple processing elements execute the same operation on different data points simultaneously, as a single instruction. This is done by loading a group of values into a wide register and operating on all of them at once. Unlike scalar processing, which handles values one at a time, SIMD works with vectors, i.e. short arrays of data, allowing efficient parallel processing of large datasets without relying on concurrent programming techniques.
The Vector API in Java leverages the power of SIMD processors to perform vector operations, making parallel processing tasks more efficient.
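To make the contrast concrete, the sketch below puts a scalar loop next to a vectorized loop that does the same work. It assumes a JDK with the incubator module enabled (compile and run with --add-modules jdk.incubator.vector); the class name ScalarVersusVector is invented for this example.

```java
import jdk.incubator.vector.FloatVector;
import jdk.incubator.vector.VectorSpecies;

// Hypothetical class name, for illustration only.
class ScalarVersusVector {
    private static final VectorSpecies<Float> SPECIES = FloatVector.SPECIES_PREFERRED;

    // Scalar version: one addition per loop iteration.
    static void scalarAdd(float[] a, float[] b, float[] out) {
        for (int i = 0; i < a.length; i++) {
            out[i] = a[i] + b[i];
        }
    }

    // Vector version: SPECIES.length() additions per loop iteration.
    static void vectorAdd(float[] a, float[] b, float[] out) {
        int i = 0;
        // Largest multiple of the vector length that fits into the array.
        int bound = SPECIES.loopBound(a.length);
        for (; i < bound; i += SPECIES.length()) {
            FloatVector va = FloatVector.fromArray(SPECIES, a, i);
            FloatVector vb = FloatVector.fromArray(SPECIES, b, i);
            va.add(vb).intoArray(out, i);
        }
        // Scalar tail for the elements that do not fill a whole vector.
        for (; i < a.length; i++) {
            out[i] = a[i] + b[i];
        }
    }

    public static void main(String[] args) {
        float[] a = {1f, 2f, 3f, 4f, 5f};
        float[] b = {10f, 20f, 30f, 40f, 50f};
        float[] out = new float[a.length];
        vectorAdd(a, b, out);
        System.out.println(java.util.Arrays.toString(out));
    }
}
```

Both methods produce the same result; the vector version simply handles several elements per iteration, with the exact number depending on the hardware.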
4. Key Concepts
The Vector API allows operations such as adding two arrays to be expressed with vectors, which can be significantly faster than scalar code thanks to parallel processing. This is achieved through methods like fromArray() for creating vectors from arrays, and operations like add() for combining vectors. Let’s look into the API in more detail.
4.1 Vectors and Their Representation
The Vector API in Java allows for efficient representation and manipulation of vectors containing primitive types. It utilizes specific classes for each primitive type: ByteVector, ShortVector, IntVector, LongVector, FloatVector, and DoubleVector. These classes enable operations on vectors that are executed using SIMD instructions on supported CPUs.
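// Compile and run with: --add-modules jdk.incubator.vector
import jdk.incubator.vector.IntVector;
import jdk.incubator.vector.VectorMask;
import jdk.incubator.vector.VectorSpecies;

public class VectorAddition {
    public static void main(String[] args) {
        // Specify the preferred species for a vector of integers. This will depend on the CPU's capabilities.
        VectorSpecies<Integer> species = IntVector.SPECIES_PREFERRED;
        // Example arrays to perform operations on
        int[] array1 = {1, 2, 3, 4, 5, 6, 7, 8};
        int[] array2 = {8, 7, 6, 5, 4, 3, 2, 1};
        int[] resultArray = new int[array1.length];
        // Process the arrays one vector-sized chunk at a time. The preferred species may
        // hold fewer or more than 8 ints, so a mask switches off lanes past the array end.
        for (int i = 0; i < array1.length; i += species.length()) {
            VectorMask<Integer> inRange = species.indexInRange(i, array1.length);
            // Load the current chunk of each array into a vector
            IntVector vector1 = IntVector.fromArray(species, array1, i, inRange);
            IntVector vector2 = IntVector.fromArray(species, array2, i, inRange);
            // Perform an element-wise addition
            IntVector resultVector = vector1.add(vector2);
            // Store the result back into the array
            resultVector.intoArray(resultArray, i, inRange);
        }
        // Print the result
        System.out.println("Result of vector addition: ");
        for (int i : resultArray) {
            System.out.print(i + " ");
        }
    }
}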
4.2 Shapes, Species, and Lanes
The API defines vectors by their size in bits (ranging from 64 to 512 bits) and by their elements, which are called lanes. The "shape" of a vector indicates its bit size, while a "species" is the combination of a shape and an element type; it determines how many lanes a vector has and is used to create vectors of a specific type and size.
import jdk.incubator.vector.IntVector;
import jdk.incubator.vector.VectorSpecies;

public class SpeciesExample {
    public static void main(String[] args) {
        // Define the species with a specific shape.
        // This example uses 256 bits as the vector size, which can fit 8 integers (32 bits each).
        VectorSpecies<Integer> species256 = IntVector.SPECIES_256;
        // Prepare two arrays of integers for demonstration.
        int[] array1 = {1, 2, 3, 4, 5, 6, 7, 8};
        int[] array2 = {8, 7, 6, 5, 4, 3, 2, 1};
        int[] resultArray = new int[species256.length()]; // Ensure the result array matches the species length.
        // Load the arrays into vectors.
        IntVector vector1 = IntVector.fromArray(species256, array1, 0);
        IntVector vector2 = IntVector.fromArray(species256, array2, 0);
        // Perform an element-wise addition of the two vectors.
        IntVector resultVector = vector1.add(vector2);
        // Store the result back into an array.
        resultVector.intoArray(resultArray, 0);
        // Output the results.
        System.out.println("Result of vector addition:");
        for (int i : resultArray) {
            System.out.print(i + " ");
        }
    }
}
In this example, IntVector.SPECIES_256 specifies a species of integers with a 256-bit shape, meaning it can hold 8 integers (since each integer is 32 bits). The example demonstrates how to load data into vectors, perform an addition operation on them, and store the result back into an array. The choice of species directly influences the number of elements (lanes) that can be processed in parallel, showcasing the flexibility and power of the Vector API for optimizing computational tasks.
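The relationship between shape, element type, and lane count can be checked directly. The following sketch assumes the incubator module is enabled (--add-modules jdk.incubator.vector); LaneCounts, lanes, and describe are names invented for this example.

```java
import jdk.incubator.vector.ByteVector;
import jdk.incubator.vector.DoubleVector;
import jdk.incubator.vector.IntVector;
import jdk.incubator.vector.VectorSpecies;

// Hypothetical class name, for illustration only.
class LaneCounts {

    // Lane count = vector size in bits / element size in bits.
    static int lanes(VectorSpecies<?> species) {
        return species.vectorBitSize() / species.elementSize();
    }

    // Short human-readable summary of a species.
    static String describe(VectorSpecies<?> species) {
        return species.elementType().getSimpleName() + " x " + species.length()
                + " (" + species.vectorBitSize() + " bits)";
    }

    public static void main(String[] args) {
        System.out.println(describe(IntVector.SPECIES_128));    // int x 4 (128 bits)
        System.out.println(describe(IntVector.SPECIES_256));    // int x 8 (256 bits)
        System.out.println(describe(DoubleVector.SPECIES_256)); // double x 4 (256 bits)
        System.out.println(describe(ByteVector.SPECIES_512));   // byte x 64 (512 bits)
        // The preferred species depends on the machine the code runs on.
        System.out.println(describe(IntVector.SPECIES_PREFERRED));
    }
}
```

A fixed species such as SPECIES_256 gives predictable lane counts, while SPECIES_PREFERRED adapts to the hardware, so code using it should not assume a particular vector length.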
4.3 Lane Operations
Operations are divided into lane-wise, affecting individual elements, and cross-lane, affecting multiple elements or the vector as a whole, including permutations and reductions.
4.3.1 Lane-wise operation
Lane-wise operations perform the same operation on each corresponding pair of elements (lanes) from two vectors. Here's how you can perform an element-wise addition:
import jdk.incubator.vector.IntVector;
import jdk.incubator.vector.VectorSpecies;

public class LaneWiseExample {
    public static void main(String[] args) {
        // A 128-bit species holds exactly 4 ints, matching the demo arrays.
        // (The preferred species may be wider, and loading 4 elements into it would fail.)
        VectorSpecies<Integer> species = IntVector.SPECIES_128;
        // Arrays for demonstration
        int[] array1 = {10, 20, 30, 40};
        int[] array2 = {1, 2, 3, 4};
        int[] resultArray = new int[species.length()];
        // Load arrays into vectors
        IntVector vector1 = IntVector.fromArray(species, array1, 0);
        IntVector vector2 = IntVector.fromArray(species, array2, 0);
        // Perform a lane-wise addition
        IntVector resultVector = vector1.add(vector2);
        // Store the result back into an array
        resultVector.intoArray(resultArray, 0);
        // Print the result
        System.out.println("Result of lane-wise addition:");
        for (int i : resultArray) {
            System.out.print(i + " ");
        }
    }
}
In this example, add() is used for a lane-wise operation, adding corresponding elements from two vectors.
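Lane-wise operations can also be chained. The sketch below computes max(a[i] * factor + b[i], 0) for every element, combining mul(), add(), and max(). It assumes the incubator module is enabled (--add-modules jdk.incubator.vector); LaneWiseSketch and scaleAddClamp are names invented for this example.

```java
import jdk.incubator.vector.IntVector;
import jdk.incubator.vector.VectorMask;
import jdk.incubator.vector.VectorSpecies;

// Hypothetical class name, for illustration only.
class LaneWiseSketch {
    private static final VectorSpecies<Integer> SPECIES = IntVector.SPECIES_PREFERRED;

    // Computes out[i] = max(a[i] * factor + b[i], 0), lane by lane.
    static int[] scaleAddClamp(int[] a, int[] b, int factor) {
        int[] out = new int[a.length];
        for (int i = 0; i < a.length; i += SPECIES.length()) {
            // Mask off lanes that would fall past the end of the arrays.
            VectorMask<Integer> inRange = SPECIES.indexInRange(i, a.length);
            IntVector va = IntVector.fromArray(SPECIES, a, i, inRange);
            IntVector vb = IntVector.fromArray(SPECIES, b, i, inRange);
            // Each call applies the same operation to every lane independently.
            va.mul(factor).add(vb).max(0).intoArray(out, i, inRange);
        }
        return out;
    }

    public static void main(String[] args) {
        int[] a = {1, -2, 3, -4, 5};
        int[] b = {10, 0, -20, 1, 0};
        // Prints [12, 0, 0, 0, 10]
        System.out.println(java.util.Arrays.toString(scaleAddClamp(a, b, 2)));
    }
}
```

No lane ever reads a value from another lane here, which is what makes these operations lane-wise.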
4.3.2 Cross-lane operation
Cross-lane operations can operate across elements of a vector, such as computing the sum of all elements within a vector. This example demonstrates a reduction operation to sum all elements of a vector:
import jdk.incubator.vector.IntVector;
import jdk.incubator.vector.VectorOperators;
import jdk.incubator.vector.VectorSpecies;

public class CrossLaneExample {
    public static void main(String[] args) {
        // A 128-bit species holds exactly 4 ints, matching the demo array.
        VectorSpecies<Integer> species = IntVector.SPECIES_128;
        // Array for demonstration
        int[] array = {10, 20, 30, 40};
        // Load array into a vector
        IntVector vector = IntVector.fromArray(species, array, 0);
        // Perform a cross-lane reduction to sum all elements of the vector
        int sum = vector.reduceLanes(VectorOperators.ADD);
        // Print the result
        System.out.println("Result of cross-lane reduction (sum): " + sum);
    }
}
In the example above, reduceLanes(VectorOperators.ADD) performs a cross-lane reduction by summing all elements in the vector.
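For arrays longer than a single vector, one common pattern is to accumulate lane-wise inside the loop and reduce across lanes only once at the end. The following is a sketch of that pattern, assuming the incubator module is enabled (--add-modules jdk.incubator.vector); ReductionSketch is a name invented for this example.

```java
import jdk.incubator.vector.IntVector;
import jdk.incubator.vector.VectorOperators;
import jdk.incubator.vector.VectorSpecies;

// Hypothetical class name, for illustration only.
class ReductionSketch {
    private static final VectorSpecies<Integer> SPECIES = IntVector.SPECIES_PREFERRED;

    // Sums all elements of the array.
    static int sum(int[] data) {
        // Each lane of the accumulator holds a running partial sum.
        IntVector acc = IntVector.zero(SPECIES);
        int i = 0;
        int bound = SPECIES.loopBound(data.length);
        for (; i < bound; i += SPECIES.length()) {
            acc = acc.add(IntVector.fromArray(SPECIES, data, i)); // lane-wise
        }
        // Single cross-lane step: collapse the partial sums into one value.
        int total = acc.reduceLanes(VectorOperators.ADD);
        // Scalar tail for the elements that do not fill a whole vector.
        for (; i < data.length; i++) {
            total += data[i];
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sum(new int[]{10, 20, 30, 40, 50})); // 150
    }
}
```

Keeping the reduction outside the loop matters because cross-lane operations are typically more expensive than lane-wise ones.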
4.4 Vector Masks
Vector masks (VectorMask<E>) enable selective operations on vectors: only the lanes that are set in the mask take part in the operation. This is useful, for example, when the length of an array is not an exact multiple of the vector length, and it keeps vector computations both flexible and efficient.
import jdk.incubator.vector.IntVector;
import jdk.incubator.vector.VectorMask;
import jdk.incubator.vector.VectorSpecies;

public class VectorMaskExample {
    public static void main(String[] args) {
        // A 256-bit species holds exactly 8 ints, matching the demo arrays.
        VectorSpecies<Integer> species = IntVector.SPECIES_256;
        // Arrays for demonstration
        int[] array1 = {1, 2, 3, 4, 5, 6, 7, 8};
        int[] array2 = {8, 7, 6, 5, 4, 3, 2, 1};
        int[] resultArray = new int[species.length()];
        // Load arrays into vectors
        IntVector vector1 = IntVector.fromArray(species, array1, 0);
        IntVector vector2 = IntVector.fromArray(species, array2, 0);
        // Create a mask for selecting even elements (index-wise)
        boolean[] evenLanes = new boolean[species.length()];
        for (int lane = 0; lane < evenLanes.length; lane += 2) {
            evenLanes[lane] = true;
        }
        VectorMask<Integer> mask = VectorMask.fromArray(species, evenLanes, 0);
        // Perform an addition only on elements that match the mask (even indices);
        // the unselected lanes keep their values from vector1
        IntVector resultVector = vector1.add(vector2, mask);
        // Store the result back into an array
        resultVector.intoArray(resultArray, 0);
        // Print the result
        System.out.println("Result of selective addition using vector masks:");
        for (int i : resultArray) {
            System.out.print(i + " ");
        }
    }
}
In the example, we first create two vectors from the given arrays. We then define a VectorMask that selects the lanes with even indices. Using this mask, we perform the addition only on the selected lanes; the remaining lanes keep the values from the first vector.
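Masks do not have to be built by hand; they can also come from comparisons or from array bounds. The sketch below replaces negative values with zero, using one mask produced by compare() and another produced by indexInRange() to handle the end of the array. It assumes the incubator module is enabled (--add-modules jdk.incubator.vector); MaskSketch and zeroOutNegatives are names invented for this example.

```java
import jdk.incubator.vector.IntVector;
import jdk.incubator.vector.VectorMask;
import jdk.incubator.vector.VectorOperators;
import jdk.incubator.vector.VectorSpecies;

// Hypothetical class name, for illustration only.
class MaskSketch {
    private static final VectorSpecies<Integer> SPECIES = IntVector.SPECIES_PREFERRED;

    // Replaces every negative element with 0 (in place) and
    // returns how many elements were replaced.
    static int zeroOutNegatives(int[] data) {
        int replaced = 0;
        for (int i = 0; i < data.length; i += SPECIES.length()) {
            // Lanes at or past data.length are switched off.
            VectorMask<Integer> inRange = SPECIES.indexInRange(i, data.length);
            IntVector v = IntVector.fromArray(SPECIES, data, i, inRange);
            // Set for every in-range lane whose value is below zero.
            VectorMask<Integer> negative = v.compare(VectorOperators.LT, 0).and(inRange);
            // blend() puts 0 into the selected lanes and keeps the others.
            v.blend(0, negative).intoArray(data, i, inRange);
            replaced += negative.trueCount();
        }
        return replaced;
    }

    public static void main(String[] args) {
        int[] data = {3, -1, 0, -7, 5};
        int count = zeroOutNegatives(data);
        // Prints 2 [3, 0, 0, 0, 5]
        System.out.println(count + " " + java.util.Arrays.toString(data));
    }
}
```

Because the bounds mask covers the tail of the array, this loop needs no separate scalar clean-up step, at the cost of a masked load and store in every iteration.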
5. Performance
This API can significantly improve the performance of computations that are common in machine learning, data analysis, and scientific computing. Note, however, that its effectiveness depends on hardware support for SIMD instructions, so performance gains vary across architectures.
In summary, the new Vector API provides a mechanism to write complex vector algorithms in Java, which can be reliably compiled at runtime to optimal vector hardware instructions on supported CPU architectures. For developers engaged in high-performance computing, it marks a significant step forward.