
Z Architecture Continuously Improves Application Support

4/10/2019 8:48:32 AM

Three topics have a direct impact on applications that run on IBM Z:

 

1.     Performance improvements for COBOL and other enterprise system development products with vector decimal instructions that operate on zoned or packed decimal data

2.     Single-instruction, multiple-data (SIMD) enhancements that provide performance improvements for Java, PL/I, COBOL and C/C++

3.     IBM Z architectural enhancements that improve the performance of the garbage collection process for Java

 

I’m a software specialist, but since I’ve been attending the annual ECC Conference at Marist, I’ve become interested in IBM Z hardware, as IBM brings a machine to the event each year. I had a chance to examine both the z13 and the z14, and that started me on a journey of discovery to answer a question: How do the recent Z architecture enhancements support applications that run on IBM Z?

 

Performance Improvements With Vector Decimal Instructions

 

In August 2017, Tom Ross presented “Enterprise COBOL V6.2 was Announced! What's New?” at SHARE Providence. He mentioned that the new compiler supports the new IBM z14 hardware and IBM z/OS V2.3, so applications can take advantage of the latest IBM Z architecture and OS features. One of the features he explained was the new vector packed decimal facility of the z14. His mention of that facility sent me first to the latest version of “z/Architecture Principles of Operation,” where I found that Chapter 25 was updated to describe more than 15 new vector decimal instructions. For a visual of these vector decimal instructions, see Figure 1.


Figure 1: Vector Decimal Instructions

 

The introduction to the chapter states, “The vector-packed-decimal facility provides instructions to operate on signed-packed-decimal format data in register operands.”

Operating on data in registers is “better” than operating on data in memory. Why is that? The introduction explains: “Since the delay between instructions encountered to ensure sequential order of operand accesses is likely less between register accesses than between storage accesses, a sequence of vector decimal instructions referencing operands in registers may achieve better performance than a comparable sequence of decimal instructions referencing operands in storage.”

 

What’s the Impact on COBOL Programs?

 

In his 2017 SHARE presentation, Ross discussed results from four test cases. In the unsigned packed decimal add case, the vector instructions were 4.85x faster than the non-vector instructions. For the large decimal divide test, the results were an amazing 135x faster. The large decimal multiply was 39x faster than the legacy implementation. Finally, the zoned decimal computation case was 3.05x faster. All the details are in Ross’ SHARE presentation. The documented results show that these new instructions make a significant difference in the performance of COBOL programs that have computations as a key part of their processing logic.

 

SIMD Enhancements

 

Two concepts are important for appreciating this performance enhancement, which began with the z13 and has been extended in the z14. The first is single instruction, multiple data (SIMD), which enables the same operation to be performed on multiple data points at once. This is useful for companies that want to quickly and efficiently process large amounts of data for analytics, mobile applications, data serving and more, transforming that data rapidly into information.

 

The second concept is vector processing. With SIMD, entire arrays of data can be processed by a single instruction, and that practice is implemented with the Z architecture vector facility. The z13 superscalar processors feature 32 128-bit vector registers and 139 instructions to accelerate processing.
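The “same operation on multiple data points” idea is easiest to see in an elementwise array add. The sketch below (names are illustrative) is ordinary scalar Java; the point is that on SIMD hardware a vectorizing compiler or JIT can apply the add to several elements per instruction, with no change to the source.

```java
// Sketch: the kind of elementwise loop SIMD targets. Conceptually, with
// 128-bit vector registers the hardware can add four 32-bit ints at a time;
// the source code stays a plain scalar loop.
public class ElementwiseAdd {
    public static int[] add(int[] a, int[] b) {
        int[] c = new int[a.length];
        for (int i = 0; i < a.length; i++) {
            c[i] = a[i] + b[i];  // one operation, applied across all elements
        }
        return c;
    }
}
```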


Additional instructions, in support of decimal operations, were introduced with the IBM z14 and are designed to give a performance boost for traditional workloads written in COBOL (I covered this previously, referencing Chapter 25 of “z/Architecture Principles of Operation”).  

 

Looking again at the latest version of “z/Architecture Principles of Operation,” you’ll find a discussion of the vector facility, starting with an overview and extending across multiple chapters that explain the individual instructions by type. See Figure 2 for this vector overview.


Figure 2: Vector overview

 

Using SIMD with vector support is straightforward, as many existing programs need only be recompiled to take advantage of the performance boost. Programs or workloads using IBM Java use SIMD automatically with the newest Java version. For a summary of SIMD support in z13 and z14, see Figure 3.

 

Compilers

- IBM z/OS XL C/C++ V2.1.1
- IBM Enterprise COBOL V5.2
- IBM Enterprise PL/I for z/OS V4.5

Other Uses

- Analytics applications such as Apache Spark for z/OS (see the related IBM Redbooks publication for details)
- Select Public-Key Cryptography Standards #11 clear key operations can take advantage of SIMD

OSes

- z/OS V2.3, and z/OS V2.2 or z/OS V2.1 with PTFs
- z/VM V6.4, and z/VM V6.3 with PTF UM34752, enable guest support, allowing Linux to use SIMD
- KVM executing on IBM Z supports SIMD

Java

- IBM z14, z13s and z13 with IBM Java 8 improve performance with SIMD
- Java applications exploiting SMT-enabled zIIP specialty engines benefit from Java 8 code optimizations and the use of SIMD by Java string operations
- Java uses SIMD to accelerate matrix operations, string processing and other uses
- Auto-vectorization is a just-in-time (JIT) compiler optimization in IBM Java 8 that accelerates simple scalar loops (matrix multiplication operations were up to 60% faster with the new Java 8 JIT in IBM lab tests)

Figure 3: Summary of SIMD support for z13 and z14
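The auto-vectorization entry above targets exactly this style of loop. Below is a hypothetical scalar matrix multiply (the class name is my own) written in i-k-j order so the innermost loop walks both the output row and the B row with unit stride, the access pattern a vectorizing JIT favors.

```java
// Sketch: a plain scalar matrix multiply arranged so the innermost loop is
// unit-stride over c[i] and b[k] -- the shape auto-vectorization can turn
// into SIMD code without any change to the Java source.
public class MatMul {
    public static double[][] multiply(double[][] a, double[][] b) {
        int n = a.length, p = b.length, m = b[0].length;
        double[][] c = new double[n][m];
        for (int i = 0; i < n; i++) {
            for (int k = 0; k < p; k++) {
                double aik = a[i][k];          // invariant across the inner loop
                for (int j = 0; j < m; j++) {
                    c[i][j] += aik * b[k][j];  // unit-stride inner loop
                }
            }
        }
        return c;
    }
}
```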

  

Improved Garbage Collection for Java

Garbage collection affects performance by slowing down applications during processing. The garbage collector manages the memory used by Java and by applications running in the JVM. When the JVM receives a request for storage, the garbage collector sets aside unused memory in the heap; this is allocation. The garbage collector also checks for areas of memory that are no longer referenced and releases them for reuse; this is collection.
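The allocation/collection cycle can be observed from Java itself. This sketch (names are my own, and GC timing is only a hint to the JVM, not a guarantee) drops the last strong reference to an object and watches a WeakReference get cleared once the collector reclaims it.

```java
import java.lang.ref.WeakReference;

// Sketch: observing allocation and collection. A WeakReference does not keep
// its referent alive, so once no strong reference remains, a collection can
// reclaim the object and clear the reference.
public class GcDemo {
    public static boolean wasCollected() {
        // Allocation: heap memory is set aside for a new object, which is
        // immediately only weakly reachable.
        WeakReference<Object> ref = new WeakReference<>(new Object());
        // System.gc() is only a request; retry a few times if the JVM defers it.
        for (int i = 0; i < 10 && ref.get() != null; i++) {
            System.gc();
        }
        return ref.get() == null;  // true once the collector has reclaimed it
    }
}
```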

 

Performance improvements introduced with the z14 are referred to as pauseless garbage collection (PGC). This enhancement is implemented with the guarded storage facility, a hardware feature aimed at runtime environments that use garbage collection, and it increases the efficiency of collection. Before the invention of PGC, garbage collection stopped all threads, effectively bringing application processing to a stop. By making use of the guarded storage facility in IBM z14 hardware, PGC allows more GC-related processing to execute in parallel with application code (see Chapter 4 of “z/Architecture Principles of Operation” for details on the new hardware instructions).
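To see why intercepting loads matters, here is a purely conceptual software simulation of the idea behind guarded storage (this is not the hardware mechanism or IBM's implementation; every name here is invented for illustration). While the collector relocates objects out of a region, an application load that touches that region is redirected through a forwarding table, so application threads keep running instead of waiting out a global pause.

```java
import java.util.HashMap;
import java.util.Map;

// Conceptual sketch only: on z14 the guarded storage facility raises a
// lightweight hardware event on loads from a guarded region; here we simulate
// that with an explicit check and a forwarding table of moved "objects"
// (modeled as integer addresses).
public class GuardedLoadSketch {
    private static final Map<Integer, Integer> forwarded = new HashMap<>();
    private static boolean regionGuarded = false;

    // The collector guards the region it is compacting and records where
    // each relocated object now lives.
    static void guardRegionAndMove(int oldAddr, int newAddr) {
        regionGuarded = true;
        forwarded.put(oldAddr, newAddr);
    }

    // The "load barrier": a load of a stale reference is healed on the spot,
    // so the application thread never blocks on a full relocation pause.
    static int load(int addr) {
        if (regionGuarded && forwarded.containsKey(addr)) {
            return forwarded.get(addr);
        }
        return addr;  // untouched references load normally
    }
}
```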

 

PGC is important for applications with strict response-time agreements and applications with large Java heaps. What’s the performance impact? In “Java With z14 Features: Hardware Facilities for Secure, High-Performance Transaction Processing,” the authors discussed a Java store inventory application that demonstrated a 10x improvement in pause time as a result of PGC.

Next Up

Next month, I’ll follow up this article with another that continues this focus on Z architecture improvements that directly support enterprise applications.   

 

 
