Hadoop’s Presence on the Mainframe
It is part of a solution to provide real-time analytics on the platform
1/7/2015 5:34:33 AM
By Alan Radding
Don’t expect Hadoop for Linux on System z, the elephant on the mainframe, to have a big impact immediately. Announced officially in 2014, its biggest impact will come from its ability to integrate new, non-traditional data sources with traditional corporate data that resides in abundance on the System z platform.
If the thought of Hadoop on the mainframe conjured up visions (or nightmares) of burning MIPS to excess sorting through multiple PB of social data with the System z server to find dissatisfied customers who were dissing your product online, relax. That’s not what IBM had in mind.
“Hadoop on System z is not intended for extreme data volumes; it is best considered for non-traditional exploration of data that largely originates on the System z platform,” says Paul DiMarzio, IBM System z big data and analytics offering manager. DiMarzio called his Enterprise2014 Hadoop presentation “The Elephant on the Mainframe.”
That doesn’t mean you can’t, shouldn’t or won’t pull in some data from Twitter (a new IBM partner), Facebook or any other non-traditional data source to augment an analysis of data you already have on the System z platform. Just keep the volumes from non-traditional sources in the GB-to-TB range. For example, insights from social media, machine-generated data and even email can be integrated with System z data for an effective analysis, notes DiMarzio.
Data that originates on the System z platform works best. This includes non-relational data (e.g., log files, XML), VSAM, other record-oriented files and the large amounts of historical data in transactional systems already residing on the mainframe.
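To make that concrete, here is a minimal, hypothetical sketch of the kind of exploratory pass a Hadoop map task might make over record-oriented log data. The log format, field layout and level names are illustrative assumptions, not any real mainframe log format:

```python
import re
from collections import Counter

# Illustrative log line layout (an assumption for this sketch):
# "<timestamp> <level> <message>"
LOG_PATTERN = re.compile(r"(?P<ts>\S+) (?P<level>\w+) (?P<msg>.*)")

def count_levels(lines):
    """Tally log levels across a batch of log records."""
    counts = Counter()
    for line in lines:
        m = LOG_PATTERN.match(line)
        if m:
            counts[m.group("level")] += 1
    return counts

sample = [
    "2014-10-06T09:00:01 INFO transaction committed",
    "2014-10-06T09:00:02 WARN retry on region 4",
    "2014-10-06T09:00:03 INFO transaction committed",
]
print(count_levels(sample))  # Counter({'INFO': 2, 'WARN': 1})
```

In a real deployment this logic would run as the map side of a Hadoop job over files staged into HDFS; the point is that the records never leave the platform they originated on.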
IBM pitches Hadoop for Linux on System z as just one piece of a more comprehensive, integrated big data and analytics approach. Furthermore, the integration of real-time analytics on the System z platform delivers significant advantages overall, especially if the organization is facing transaction fraud, although real-time processing is not a capability of Hadoop itself, which runs in batch.
Much more appealing to most System z shops will be the economic efficiencies a comprehensive analytics system on the mainframe brings. In the October announcement of real-time analytics for the mainframe, System z General Manager Ross Mauri pointed out the inefficiency of off-loading operational data from the server in order to perform analytics on another platform. “This increases cost and complexity while limiting the ability of businesses to use the insights in a timely manner,” said Mauri.
But with Hadoop, BigInsights, Veristorm’s vStorm Connect and other tools to pull in everything from text analytics to spreadsheet-based data and ANSI-compliant SQL, a mainframe shop can do it all right there while eliminating the cost and delay of moving data and performing Extract, Transform and Load between platforms. Finally, mainframe shops have “an end-to-end solution that makes analytics a part of the flow of transactions and allows organizations to gain real-time insights while improving their business performance with every transaction,” noted Mauri.
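As a rough illustration of querying data in place rather than extracting, transforming and loading it onto another platform, the sketch below uses SQLite purely as a stand-in for the ANSI-compliant SQL layer described above; the table names and columns are assumptions made up for the example:

```python
import sqlite3

# Both "tables" live in one engine, standing in for data that stays on
# one platform: no extract/load step between systems is needed.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE transactions (cust_id TEXT, amount REAL)")
con.execute("CREATE TABLE social_sentiment (cust_id TEXT, score REAL)")
con.executemany("INSERT INTO transactions VALUES (?, ?)",
                [("c1", 120.0), ("c2", 35.5)])
con.executemany("INSERT INTO social_sentiment VALUES (?, ?)",
                [("c1", -0.8)])

# An ANSI-style join: flag transactions by customers with negative
# sentiment, queried where the data already resides.
rows = con.execute("""
    SELECT t.cust_id, t.amount, s.score
    FROM transactions t
    JOIN social_sentiment s ON s.cust_id = t.cust_id
    WHERE s.score < 0
""").fetchall()
print(rows)  # [('c1', 120.0, -0.8)]
```

The design point is the query shape, not the engine: the same join expressed in an ANSI-compliant SQL tool on the mainframe avoids moving either table off the platform.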
With analytics especially, delay is the killer. You want to catch the fraud while the bad guy is at the point-of-sale terminal or while a patient is experiencing complications. In his keynote at Enterprise2014, IBM Senior Vice President Tom Rosamilia introduced the five-minute rule, which defines how fast you have to respond to people who contact you online or through mobile. People, it seems, expect companies to respond to, or at least acknowledge, their comments, questions or problems within five minutes. That means companies need to monitor and analyze the volumes of data they receive in real time and respond appropriately.
Can your mainframe shop pass the five-minute test? The combination of massive amounts of data and consumers who are empowered with mobile access, according to Rosamilia, is creating a difficult challenge for businesses. Consumers now expect near-immediate response to any interaction, at any time, and through their own preferred channel of communication. You have no chance to pass this test without real-time analytics on one fast, single, scalable platform—your mainframe.
To expedite your move to comprehensive, real-time mainframe analytics, IBM has assembled a set of tools. This includes:
- Hadoop for Linux on System z
- Veristorm vStorm Connect
- InfoSphere BigInsights (open source Hadoop with enterprise enhancements)
- BigSheets to include spreadsheet data in your analytics
- Eclipse IDE to speed development of analytic applications
Also, make sure you are running the latest enterprise edition of Linux for System z. As IBM puts it, these tools represent more than five years’ worth of technology development. For mainframe data centers, it all can be delivered on one platform, the System z mainframe.
And what might you do with this tool set? Start with non-relational data that originates outside System z. DiMarzio suggests analyzing sentiment to identify customers who are dissatisfied with the company, as uncovered in Twitter, Facebook and other social media posts. Then join those results with operational data residing on the mainframe to alert the people responsible for customer retention.
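A toy sketch of that workflow follows, with hypothetical handles, fields and a deliberately simple word-list scorer standing in for real text analytics:

```python
# All names and fields below are invented for illustration; a real
# deployment would use a proper sentiment model, not a word list.
NEGATIVE_WORDS = {"broken", "terrible", "cancel", "worst"}

def sentiment(post):
    """Crude stand-in scorer: flag posts containing negative words."""
    words = set(post["text"].lower().split())
    return "negative" if words & NEGATIVE_WORDS else "ok"

# Non-traditional data: social posts pulled in from outside the platform.
posts = [
    {"handle": "@pat", "text": "worst service ever, I want to cancel"},
    {"handle": "@lee", "text": "love the new mobile app"},
]
# Operational data already residing on the mainframe (illustrative).
customers = {"@pat": {"account": "A-1001", "tier": "gold"}}

# Join negative posters against customer records to drive retention alerts.
alerts = [
    {**customers[p["handle"]], "handle": p["handle"]}
    for p in posts
    if sentiment(p) == "negative" and p["handle"] in customers
]
print(alerts)  # [{'account': 'A-1001', 'tier': 'gold', 'handle': '@pat'}]
```

The join step is where the mainframe earns its keep: the customer records never move, and the alert carries operational context (account, tier) that the social feed alone lacks.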
Hadoop on the System z platform really makes up only one piece. The real value comes from the different ways you can combine the analytic and query pieces on one platform with DB2 and other data on System z to achieve a variety of objectives.
Alan Radding is a Newton, Mass.-based freelance writer specializing in business and technology. Over the years his writing has appeared in a wide range of publications including CFO Magazine, CIO Magazine and Information Week. He can be reached through his website, technologywriter.com.