Looking at In-Database Analytics
High-performance data transformation on DB2 for z/OS is possible
April 5, 2017
By Mehmet Cuneyt Goksu
Data transformation is a core process in the operation of most business analytics hubs. However, even experienced data professionals can suffer from certain misconceptions about this practice, believing that data must be moved from source platforms to other platforms, often through several intermediary staging and storage systems, to enable transformation for downstream consumption in analytics applications.
But execution of high-performance data transformations on transactional source systems (e.g., in DB2 for z/OS on z Systems) is possible using a capability known as in-database analytics. Indeed, traditional practice—known as extract–transform–load (ETL)—might not be the ideal deployment approach for executing data transformation in many real-world scenarios, because ETL:
- Can increase data latency to more than a day
- Requires batch windows that lengthen as data volumes grow
- Can take months to implement for new data elements
However, before recognizing in-database analytics on transactional source systems as an alternative to ETL, data professionals must first abandon three common myths:
- Transactional workloads cannot be executed on the same platform as analytics workloads
- Running analytics on transactional data increases your operational cost
- Transactional systems aren’t designed to execute analytics
Indeed, contrary to popular belief, it is eminently possible to run efficient transformations on transactional source systems without affecting transactional workloads or moving the data for staging or final delivery. Options include:
- Offload ETL functions to an accelerator on the source system, such as IBM DB2 Analytics Accelerator—an extension of DB2 for z/OS with Netezza technology—that provides High Performance Storage Saver, query acceleration and accelerator-only table (AoT) capabilities (called temporary tables in DB2 Analytics Accelerator).
- Perform transformations in SQL directly on the transactional source system, exposing the results to consumers through views. This approach separates data into two groups: semistatic (categorical) data and dynamic data. Semistatic content is generated as needed—daily, weekly or monthly—and represents the metadata associated with customers, products or business units that would typically come from a join of dimension tables and fact tables; it also carries the keys needed to join back to the operational tables, but not the fact data itself. Dynamic data, the facts, stays in the operational system and is joined to the semistatic data at query time, using the keys captured when the semistatic content was generated.
This latter approach offers a range of advantages:
- Views can hide SQL complexity from users while still offering the intelligence necessary to retrofit data and simplify access
- Views can reflect existing data warehousing/data mart schemas while keeping existing transactional workloads running
- Views can leverage existing database objects (dimensional structures) to transform and standardize data
- Repetitive transformations from operational data to information data can be standardized by leveraging smart data modeling techniques and objects and by staging prepared data objects before they are joined to fact data
- Removing complex processes and prestaging data can significantly boost performance while enhancing access to operational data
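The view-based pattern described above can be sketched in miniature. This is an illustrative example only—it uses SQLite and hypothetical table and column names (`sales`, `dim_product`, `v_sales`) rather than an actual DB2 for z/OS schema—but the shape of the technique is the same: pre-generated semistatic content is joined to live fact rows by key, and a view hides the join from analytics users:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Operational (dynamic) fact data: raw sales transactions.
cur.execute("CREATE TABLE sales (sale_id INTEGER, product_key INTEGER, amount REAL)")

# Semistatic (categorical) content, regenerated daily/weekly/monthly.
# It carries the keys needed to join back to the operational table,
# but none of the fact data itself.
cur.execute("CREATE TABLE dim_product (product_key INTEGER, product_name TEXT, category TEXT)")

cur.executemany("INSERT INTO dim_product VALUES (?, ?, ?)",
                [(1, "Widget", "Hardware"), (2, "Gadget", "Electronics")])
cur.executemany("INSERT INTO sales VALUES (?, ?, ?)",
                [(100, 1, 9.99), (101, 2, 19.99), (102, 1, 4.50)])

# The view hides the join complexity: analytics users query v_sales
# without needing to know the underlying operational schema.
cur.execute("""
    CREATE VIEW v_sales AS
    SELECT s.sale_id, d.product_name, d.category, s.amount
    FROM sales s
    JOIN dim_product d ON d.product_key = s.product_key
""")

rows = cur.execute(
    "SELECT category, ROUND(SUM(amount), 2) FROM v_sales GROUP BY category ORDER BY category"
).fetchall()
print(rows)  # [('Electronics', 19.99), ('Hardware', 14.49)]
```

On DB2 for z/OS the same `CREATE VIEW` statement would sit directly over the operational and semistatic tables, letting existing warehouse-style queries run against transactional data in place.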
What’s more, during execution of multiple transformations on data that’s essentially categorical (data that only changes periodically, or semistatic data), accelerators such as DB2 Analytics Accelerator can further enhance performance—as can database performance objects, materialized query tables and AoTs.
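The caching idea behind materialized query tables can also be sketched. Again this is a rough illustration in SQLite with hypothetical names (`mqt_category_keys` stands in for a DB2 MQT or AoT): a repetitive transformation over semistatic data is computed once into a summary table and reused until the categorical source changes, rather than recomputed per query:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Semistatic (categorical) source data.
cur.execute("CREATE TABLE dim_product (product_key INTEGER, category TEXT)")
cur.executemany("INSERT INTO dim_product VALUES (?, ?)",
                [(1, "Hardware"), (2, "Electronics"), (3, "Hardware")])

# Summary table standing in for a materialized query table (MQT):
# the repetitive transformation is computed once, then reused by
# every downstream query until the next refresh.
cur.execute("CREATE TABLE mqt_category_keys (category TEXT, key_count INTEGER)")

def refresh_summary(cur):
    """Recompute the cached result, as an MQT refresh would."""
    cur.execute("DELETE FROM mqt_category_keys")
    cur.execute("""
        INSERT INTO mqt_category_keys
        SELECT category, COUNT(*) FROM dim_product GROUP BY category
    """)

refresh_summary(cur)
cached = cur.execute(
    "SELECT category, key_count FROM mqt_category_keys ORDER BY category"
).fetchall()
print(cached)  # [('Electronics', 1), ('Hardware', 2)]
```

Because the source here is semistatic, the refresh runs only on the periodic regeneration cadence (daily, weekly or monthly), which is what makes this caching cheap relative to the queries it serves.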
Mehmet Cuneyt Goksu is zAnalytics technical leader in the Middle East and Asia, focusing on zAnalytics and big data solutions on z Systems. He is an L2 Certified IT Specialist at IBM, specializing in DB2 for z/OS, DB2 tools, data governance, system integration and cognitive solutions. Goksu worked as a DB2 DBA, DB2 system programmer, DB2 Gold Consultant and IBM business partner for 20 years, and brings a wealth of experience to customers, IBMers and business partners in defining, developing and using z Systems platform-based applications and solutions. He served in the International DB2 Users Group for 10 years as an EMEA conference team member and board member, and is a passionate speaker on DB2 and Information Management at international conferences such as IDUG and IBM Insight. He has written numerous papers and articles on DB2 and analytics, and contributes regularly to customer events, workshops and publications. Goksu holds a computer science degree, an MBA and an MS in database systems from IIT, Chicago.