High‑Performance Data Analytics (HPDA): Architecture, Market Trends, and Fujitsu Reference Model
The article provides a comprehensive overview of High‑Performance Data Analytics (HPDA), detailing its market drivers, technical classifications, integration of HPC with big‑data workloads, Fujitsu's reference architecture, hardware configurations, benchmark results, and the economic benefits of deploying HPDA on existing HPC infrastructures.
High‑Performance Data Analytics (HPDA) is an emerging technology and a major segment of HPC, with leading players such as IBM, Fujitsu, SGI, Oracle, HPE, and Google active in cloud services, big data, storage, servers, and networking.
According to the WGR report, the primary growth driver for the HPDA market is data‑intensive HPC applications, which will soon extend beyond traditional simulation to e‑commerce, finance, and economics.
HPDA can be classified along two dimensions: technology and market application. Technologically it includes Graph Analytics, Streaming Analytics, Compute‑Intensive Analytics, and Novel Architectures. Market‑wise it serves Financial Services, Manufacturing, Scientific, Energy, Healthcare, and Telecommunications sectors.
IDC forecasts a 13.3% CAGR for HPDA‑driven server revenue, rising from $743.8 M in 2012 to $1.4 B in 2017, with storage revenue expected to reach $800 M. The article uses Fujitsu’s HPDA solution as a case study to analyze reference architecture and technical approaches.
The article examines the benefits of merging big‑data analytics with HPC, describing concepts, components, and a generic solution architecture that highlights the economic value of HPDA.
It also presents a cost‑effective reference model that enables enterprises to leverage existing HPC resources for HPDA workloads, outlining a performance‑benchmarking methodology.
The Emergence of HPDA
With explosive data growth (projected to reach 163 ZB globally by 2025), linear scaling of servers and storage becomes untenable. Data processing proceeds in three stages: capture and filtering, analysis, and visualization. Unlocking the data's full potential requires "big compute", i.e., HPC.
Industries that benefit from accelerated, data‑intensive workloads include e‑commerce, weather and climate modeling, and traditional scientific HPC environments.
HPDA Workload Types
Workloads vary by retrieval speed, data flow, dataset size, and I/O characteristics, influencing the required analysis effort.
In data‑ and compute‑intensive applications, workloads are large‑scale, parallel, and heavily dependent on network and storage, handling structured and unstructured data from IoT devices, sensors, etc.
HPDA Process Flow
By running big‑data platforms such as Hadoop on HPC resources, HPDA creates a high‑performance analytics configuration. Data collection and analysis times depend on ingestion rates and processing complexity, mirroring HPC workflows that merge data for complex numerical models.
When HPC and big‑data technologies converge, the HPDA platform runs complex workloads on HPC resources, processing and storing massive datasets.
When to Adopt HPDA Architecture
Standard solutions like Hadoop and Spark dominate the market. Hadoop, a Java‑based open‑source framework, handles large datasets via the Hadoop Distributed File System (HDFS), which is slower for small‑file random reads compared with HPC parallel file systems (e.g., Fujitsu FEFS, GPFS). Spark adds in‑memory processing for higher performance but consumes significant memory.
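To make the in‑memory point concrete, here is a minimal PySpark sketch, assuming a working Spark installation and a hypothetical log dataset: once the working set is cached in executor memory, repeated queries avoid re‑reading from the file system, which is the source of Spark's performance edge and of its memory appetite.

```python
# Minimal sketch: keeping a working set in memory across Spark actions.
# The input path is hypothetical; any large text dataset would do.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("inmemory-demo").getOrCreate()

# Read once from distributed storage (HDFS here; a parallel file system
# exposed through an HDFS connector would use the same API).
logs = spark.read.text("hdfs:///data/web_logs")

# cache() pins the dataset in executor memory, so the two queries below
# avoid a second trip to the file system -- faster, but RAM-hungry.
logs.cache()

errors = logs.filter(logs.value.contains("ERROR")).count()
warnings = logs.filter(logs.value.contains("WARN")).count()
print(errors, warnings)
```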
Integrating HPC hardware, software, and Hadoop/Spark yields a high‑performance, agile, and scalable solution for data‑intensive workloads.
Fujitsu’s extensive experience with Hadoop‑based analytics is leveraged, focusing on users with existing HPC infrastructure to build scalable, agile HPDA environments.
HPDA Reference Model
The model combines big‑data and analytics technologies, exploits existing HPC resources (or Fujitsu PRIMEFLEX), and can be expanded as needed. Traditional HPC clusters are augmented with Hadoop tools, and the HPC parallel file system is equipped with an HDFS connector for seamless data access.
Fujitsu’s approach builds an agile HPDA system where structured and unstructured data processing occurs within the HPDA architecture, efficiently combining HPC and analytics workloads while reducing total cost of ownership.
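As an illustration of what the HDFS connector buys, the sketch below shows the same Spark API addressing data on the parallel file system either through its POSIX mount or through an HDFS‑compatible URI. The paths are hypothetical, and the hdfs:// route assumes the connector is registered in the site's Hadoop configuration.

```python
# Sketch: two routes to the same data on a shared parallel file system.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("pfs-access").getOrCreate()

# Direct POSIX access to a BeeGFS/FEFS mount shared with the HPC cluster.
df_posix = spark.read.parquet("file:///beegfs/projects/sim_output")

# Equivalent access via the file system's HDFS-compatible interface,
# assuming the HDFS connector is configured for this namespace.
df_hdfs = spark.read.parquet("hdfs:///projects/sim_output")

# Both views address the same bytes on the parallel file system.
print(df_posix.count() == df_hdfs.count())
```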
Compute nodes consist of flexible HPC cluster nodes; the Fujitsu PRIMERGY RX2530 1U dual‑processor server offers high flexibility and scalability, with local SSD storage sized at a 3:1 SSD‑to‑memory ratio. A BeeGFS‑based parallel file system with an HDFS connector provides optimal performance for both HPC and HPDA jobs.
InfiniBand/Omni‑Path high‑speed interconnects maximize node‑to‑node communication and data‑movement throughput.
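A quick back‑of‑envelope check of the 3:1 sizing guideline; the node figures are illustrative, though they happen to match the benchmark configuration described later.

```python
# Node sizing per the 3:1 SSD-to-memory guideline; figures illustrative.
ram_per_node_gb = 128
ssd_to_memory_ratio = 3            # 3 GB of local SSD per GB of RAM

ssd_per_node_gb = ram_per_node_gb * ssd_to_memory_ratio
print(f"Local SSD needed per node: {ssd_per_node_gb} GB")
# 384 GB -> a 400 GB SSD per node suffices
```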
HPDA General System Architecture
A shared HPC‑HPDA environment integrates job submission on the head node with batch‑system scheduling, exposing an HDFS‑compatible local PFS for fast data access and a permanent PFS layer for long‑term storage.
Recommended configurations for medium to ultra‑large data sizes are provided to achieve optimal performance at minimal cost.
The main advantages of Fujitsu’s HPDA reference model are:
Deployment on existing HPC platforms.
Unified cluster management (Slurm) for both compute‑intensive and big‑data analytics workloads (see the submission sketch after this list).
Accelerated Hadoop performance via high‑speed interconnects and parallel file systems.
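A hedged sketch of what unified scheduling can look like in practice: the analytics job goes through the same sbatch interface as any HPC job. The partition name and the start-spark-in-slurm.sh launcher are hypothetical, site‑specific pieces, not part of Fujitsu's published model; Slurm itself only sees an ordinary batch job.

```python
# Hedged sketch: submitting a Spark analytics job through the same Slurm
# batch system that schedules the cluster's HPC jobs.
import os
import subprocess
import tempfile

job_script = """#!/bin/bash
#SBATCH --job-name=hpda-analytics
#SBATCH --nodes=8
#SBATCH --partition=analytics

# Hypothetical site launcher: brings up Spark daemons inside the
# Slurm allocation and exports SPARK_MASTER_URL.
source start-spark-in-slurm.sh

# Run the application against the freshly started Spark cluster.
spark-submit --master "$SPARK_MASTER_URL" analytics_job.py
"""

# Write the batch script to a temporary file and hand it to sbatch;
# sbatch copies the script, so it can be removed immediately after.
with tempfile.NamedTemporaryFile("w", suffix=".sh", delete=False) as f:
    f.write(job_script)
    script_path = f.name

subprocess.run(["sbatch", script_path], check=True)
os.remove(script_path)
```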
Performance was validated using the TeraSort benchmark on identical hardware, comparing standard Spark/HDFS setups with the HPDA reference model that employs a Slurm‑driven data‑analysis connector and BeeGFS with an HDFS connector. Results show higher throughput and significantly reduced data generation and analysis time.
The benchmark ran on Fujitsu's PRIMEFLEX platform (8 compute nodes, each with dual Broadwell CPUs and 128 GB RAM, using 400 GB Intel SSDs and an 8‑node parallel file system).
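For readers who want to reproduce the comparison in spirit, the sketch below times the data‑generation and sort phases using the TeraGen/TeraSort implementations shipped in Hadoop's examples jar. The jar path, data size, and output directories are assumptions for a typical installation; the article's Spark variant follows the same generate‑then‑sort, timed pattern.

```python
# Hedged sketch: timing a TeraSort-style run. Output paths are
# hypothetical HDFS (or connector-backed) directories.
import subprocess
import time

JAR = "/opt/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples.jar"
ROWS = 100_000_000  # 100-byte records -> roughly 10 GB of input data

def timed(label, args):
    """Run a Hadoop examples-jar job and report its wall-clock time."""
    t0 = time.time()
    subprocess.run(["hadoop", "jar", JAR] + args, check=True)
    print(f"{label}: {time.time() - t0:.1f} s")

timed("data generation", ["teragen", str(ROWS), "/bench/teragen"])
timed("sort/analysis",   ["terasort", "/bench/teragen", "/bench/terasort"])
```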
Conclusion and Summary
For complex, time‑critical big‑data workloads, many traditional HPC parallel‑file‑system vendors offer HDFS‑compatible big‑data support; Fujitsu's HPDA reference architecture adopts a similar approach with minimal impact on existing networks and low investment cost.
The model leverages existing HPC resources to run Hadoop or other big‑data applications without disrupting current workloads, delivering optimal results for both domains.
Numerous governments, enterprises, and research organizations estimate annual savings of millions of dollars by using HPC for data analytics. Fujitsu's integrated PRIMEFLEX system provides a scalable solution that unifies HPC and data‑analytics workloads on a single infrastructure.