(The Hosting News) – Waterline Data Science and Syncsort, a global leader in high-performance big data software, today announced at Strata+Hadoop World a technology alliance delivering powerful, integrated software to the market for ingesting, discovering, profiling, transforming, and analyzing Big Data using Apache Hadoop, with enterprise-grade governance.
“Many companies have been using Syncsort’s market-leading Apache Hadoop-based software to shift production workloads and their associated data from more expensive, legacy platforms to Apache Hadoop, liberating budget from mediocre legacy products,” said Lonne Jaffe, CEO, Syncsort. “This alliance brings together Syncsort’s technology with Waterline’s advanced data profiling, data inventory and discovery capabilities, making vast quantities of data available for advanced analytics.”
As Apache Hadoop production environments mature, organizations are using Syncsort’s DMX-h product to ingest and process data within their Hadoop-based “data lakes” because of its unique integration with Apache Hadoop, which provides best-in-class ease of use, scalability and performance. As data becomes available in the data lake, data scientists require an easy way to discover, profile and understand the data that is available. Waterline Data Inventory addresses the challenge of keeping track of data from multiple and varied data sources, providing enterprise-grade governance and an ability to easily find the right data for a growing number of analytical use cases.
“Organizations have loaded very large volumes of diverse data into Hadoop, and now realize that it’s not easy to get data out of Hadoop in a form that data scientists can easily consume,” said Alex Gorelik, founder and CEO, Waterline Data Science. “Hadoop previously lacked data lineage, data quality metrics, and business metadata. By integrating Syncsort’s industry-leading data ingest and transformation capabilities with our automated data profiling, inventory and discovery, we’re empowering users to discover business insights previously available only to users of expensive legacy data warehouses.”
The capabilities that Waterline Data Science and Syncsort are delivering include:
- Data Integration. Bringing data into Apache Hadoop from virtually any source, pre-processing the data to make it more consumable and distributing the data to other systems.
- Automated Data Inventory & Discovery. Building a complete inventory of data within Apache Hadoop with metadata tags, and enabling data discovery self-service for finding, understanding, and governing the data.
- Automated Data Profiling. Curating the data to identify subtle relationships so the right data from a wide variety of sources can be selected for analysis.
Without this joint offering, users would need to spend a lot of time searching for the right data in the large volumes of diverse data in Hadoop. This can be a frustrating process that doesn’t scale well. Syncsort and Waterline streamline this process by enabling users, via rich graphical user interfaces, to quickly get the data they need into Hadoop, identify and understand that data, and leverage crowdsourcing for tagging and inventorying the data, all without any coding.
About Syncsort
Syncsort provides fast, secure, enterprise-grade software spanning Big Data solutions in Hadoop to Big Iron on mainframes. We help customers around the world to collect, process and distribute more data in less time, with fewer resources and lower costs. 87 of the Fortune 100 companies are Syncsort customers, and Syncsort’s products are used in more than 85 countries to offload expensive and inefficient legacy data workloads, speed data warehouse and mainframe processing, and optimize cloud data integration. Experience Syncsort at http://www.syncsort.com/en/TestDrive.
About Waterline Data Science
Waterline Data Science is an early-stage Big Data software company, founded in December 2013, backed by Menlo Ventures and Sigma West. The inspiration for the name “Waterline” came from the metaphor of the Big Data Lake. Waterline solves the challenges of data self-service for the Hadoop data lake. It’s easy to get data into Hadoop, but it’s not easy to get it out in a self-service manner and derive business value from it. The idea behind Waterline is that data self-service for Hadoop should be like finding the data you need easily, without having to dive for it — you should be able to Hadoop “above the waterline.”