Rethinking Hadoop: When to Use It and How Cloud Computing Changes the Game
This article reviews when Hadoop is appropriate, outlines its core features and limitations, explains cloud computing concepts and service models, and highlights the benefits of pre‑built Hadoop images for accelerating big‑data projects.
These notes are compiled from the UC San Diego Coursera Big Data series.
1. When to Reconsider Hadoop
Hadoop’s ecosystem is growing rapidly and makes previously difficult tasks possible.
Before adopting it, assess whether Hadoop actually addresses your specific problem.
Key Hadoop Features
If you see a massive increase in data volume, Hadoop may be worthwhile; it also offers efficient access to archived data.
It supports multiple applications on the same data store and suits high‑volume or high‑variety data.
Hadoop enables data parallelism, allowing the same operation to run concurrently on many nodes.
Task parallelism lets different functions run simultaneously on the same or different datasets. Hadoop tools differ in how well they support it, so analyze the specific tool before relying on task‑level parallelism.
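The difference between the two kinds of parallelism can be sketched on a single machine. The example below is an analogy only, not Hadoop code: it uses Python's standard `concurrent.futures` thread pool in place of cluster nodes, and the dataset, `count_words`, and `longest_line` are hypothetical names made up for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def count_words(lines):
    """Count the words in one collection of lines."""
    return sum(len(line.split()) for line in lines)


def longest_line(lines):
    """Return the length of the longest line, or 0 if there are none."""
    return max((len(line) for line in lines), default=0)


# Hypothetical dataset, pre-split into partitions the way a cluster
# would spread a large file across nodes.
partitions = [
    ["big data is big", "hadoop scales out"],
    ["cloud computing"],
    ["rent or build", "pay as you go"],
]
all_lines = [line for part in partitions for line in part]

with ThreadPoolExecutor(max_workers=3) as pool:
    # Data parallelism: the SAME function is applied to every partition.
    per_partition_counts = list(pool.map(count_words, partitions))
    total_words = sum(per_partition_counts)

    # Task parallelism: DIFFERENT functions are submitted together
    # and work on the same dataset.
    words_future = pool.submit(count_words, all_lines)
    longest_future = pool.submit(longest_line, all_lines)
    task_results = (words_future.result(), longest_future.result())

print(per_partition_counts, total_words, task_results)
```

Threads in a single Python process stand in for nodes here; on a real cluster the partitions would live on different machines and the per‑partition results would be combined in a separate reduce step.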
What Benefits Do These Tools Offer?
Not all algorithms scale on Hadoop; highly coupled data‑processing algorithms require careful analysis before deployment. Consider whether replacing existing database solutions with Hadoop is appropriate.
Hadoop can serve as a platform to ingest diverse datasets and transform them into database‑friendly formats, but it may not always be the optimal storage solution for every business case.
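As a small illustration of what "database‑friendly" can mean, the sketch below normalizes records from two differently shaped sources into one uniform row layout. It is plain Python rather than a Hadoop job, and the sample data, the field names (`id`, `name`, `amount`), and the helper functions are all assumptions chosen for the example.

```python
import csv
import io
import json

# Two hypothetical sources describing the same kind of record.
CSV_SOURCE = "id,name,amount\n1,alice,10.5\n2,bob,3\n"
JSON_SOURCE = (
    '{"id": 3, "name": "carol", "amount": "7.25"}\n'
    '{"id": 4, "name": "dave"}\n'
)


def normalize(record):
    """Coerce one raw record into a fixed (id, name, amount) row."""
    amount = record.get("amount") or 0.0  # a missing amount becomes 0.0
    return (int(record["id"]), str(record["name"]), float(amount))


def read_csv(text):
    """Parse CSV text with a header line into normalized rows."""
    return [normalize(row) for row in csv.DictReader(io.StringIO(text))]


def read_json_lines(text):
    """Parse one JSON object per line into normalized rows."""
    return [normalize(json.loads(line))
            for line in text.splitlines() if line.strip()]


rows = read_csv(CSV_SOURCE) + read_json_lines(JSON_SOURCE)
for row in rows:
    print(row)
```

Once every source yields rows with the same columns and types, loading them into a relational table becomes a routine bulk insert.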
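HDFS stores data in large blocks (64 MB by default in Hadoop 1.x, 128 MB in Hadoop 2.x and later) and is optimized for sequential reads of large files. Random access to individual records is therefore inefficient, and in some cases the entire file may need to be read.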
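The block arithmetic is simple enough to work through. The sketch below assumes a 128 MiB block size and a hypothetical 1000 MiB file; the function names are illustrative and are not part of any HDFS API.

```python
MIB = 1024 * 1024


def block_count(file_size_bytes, block_size_bytes):
    """Blocks needed to hold a file; the last block may be partly filled."""
    return (file_size_bytes + block_size_bytes - 1) // block_size_bytes


def block_index(offset_bytes, block_size_bytes):
    """Zero-based index of the block containing a given byte offset."""
    return offset_bytes // block_size_bytes


block_size = 128 * MIB   # assumed block size
file_size = 1000 * MIB   # hypothetical file

n_blocks = block_count(file_size, block_size)
target_block = block_index(700 * MIB, block_size)

print(n_blocks, target_block)
```

Under these assumptions, a record of a few hundred bytes located 700 MiB into the file sits inside a 128 MiB block, which is why workloads dominated by small random lookups tend to fit HDFS poorly.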
Hadoop scales many algorithms well, but it is not a good fit for every data‑management problem. Be cautious with small datasets, algorithms that depend on specialized hardware, task‑level parallelism, wholesale replacement of existing infrastructure, and workloads that need random access.
2. Cloud Computing
Cloud computing is a major driver of the big‑data era, offering on‑demand compute resources that can be rented as needed.
The main idea is to commoditize computing infrastructure so developers can focus on application challenges rather than building and maintaining hardware.
Renting compute clusters is like renting a vehicle: you pay only for what you use, avoid large capital expenditures, and can deploy projects quickly.
Should you build your own hardware and software resources?
Or should you rent them from the cloud?
Building an in‑house data center requires hiring staff, purchasing and maintaining hardware, and handling ongoing upgrades and security updates, which can be costly and time‑consuming, especially for startups.
What Can the Cloud Do for Us?
The cloud offers pay‑as‑you‑go pricing, low capital investment, rapid implementation, and the ability to scale quickly as business needs grow, which reduces the need for extensive up‑front resource forecasting.
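The rent‑or‑build trade‑off can be framed as a break‑even calculation. The sketch below uses a deliberately simple linear cost model, and every figure in it (purchase cost, running cost, hourly rental rate) is a made‑up assumption rather than real pricing.

```python
def cloud_cost(hours, hourly_rate):
    """Pay-as-you-go: cost grows only with the hours actually used."""
    return hours * hourly_rate


def on_prem_cost(hours, capital_cost, hourly_running_cost):
    """In-house: an up-front purchase plus ongoing running costs."""
    return capital_cost + hours * hourly_running_cost


def break_even_hours(capital_cost, hourly_running_cost, hourly_rate):
    """Usage at which both options cost the same.

    Returns None when renting is never more expensive per hour than
    running your own hardware, so the purchase is never recovered.
    """
    if hourly_rate <= hourly_running_cost:
        return None
    return capital_cost / (hourly_rate - hourly_running_cost)


# Hypothetical figures, in arbitrary currency units.
capital = 50_000.0   # hardware purchase
running = 1.0        # power, staff, maintenance per cluster-hour
rate = 3.5           # cloud rental per cluster-hour

hours = break_even_hours(capital, running, rate)
print(hours, cloud_cost(1000, rate), on_prem_cost(1000, capital, running))
```

With these numbers, renting is cheaper for anything under 20,000 cluster‑hours, which illustrates why short or uncertain projects tend to favor the cloud.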
Custom cloud machines let you select CPU, GPU, memory, and storage options tailored to your application’s requirements, effectively providing a “self‑service buffet” of compute resources.
3. Cloud Service Models
IaaS – Infrastructure as a Service
IaaS provides the lowest level of rental service, where you manage the operating system and applications on virtualized hardware; Amazon EC2 is a typical example.
PaaS – Platform as a Service
PaaS supplies the full computing platform, including OS and programming languages, allowing you to develop and run applications on top of provided databases and web servers; Google App Engine and Microsoft Azure illustrate this model.
SaaS – Software as a Service
SaaS delivers complete applications over the cloud, with the provider managing both the hardware and the software environment; Dropbox is a well‑known example.
Security Considerations
When deploying cloud services, you must assess and mitigate security risks because your data resides on third‑party servers.
XaaS – Anything as a Service
XaaS extends the model to finer‑grained resources such as storage‑as‑a‑service, communication‑as‑a‑service, and marketing‑as‑a‑service.
Choosing the right service model depends on your team’s skill set, development and maintenance capabilities, and specific project requirements.
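One way to compare the three main models is to list which layers the provider manages and which are left to you. The four‑layer stack below is a simplification assumed for illustration; real offerings draw the boundaries in different places.

```python
# Simplified stack, from the bottom up (an assumption for this sketch).
LAYERS = ["hardware", "operating system", "platform", "application"]

# Layers the provider takes care of under each service model.
PROVIDER_MANAGES = {
    "IaaS": ["hardware"],
    "PaaS": ["hardware", "operating system", "platform"],
    "SaaS": ["hardware", "operating system", "platform", "application"],
}


def customer_manages(model):
    """Return the layers left for the customer to manage, bottom to top."""
    provided = set(PROVIDER_MANAGES[model])
    return [layer for layer in LAYERS if layer not in provided]


for model in ("IaaS", "PaaS", "SaaS"):
    print(model, "->", customer_manages(model))
```

The more layers a model leaves to you, the more control you keep and the more operational skill your team needs.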
4. Value From Hadoop and Pre‑built Hadoop Images
Assembling a Hadoop stack from scratch can be time‑consuming and error‑prone. Pre‑built images provide ready‑to‑run software stacks with OS, libraries, and applications pre‑installed, similar to buying pre‑assembled furniture.
These images, often delivered via virtualization platforms, enable rapid deployment of Hadoop environments, saving significant project time and effort.
Vendors such as Hortonworks and Cloudera offer pre‑configured Hadoop images for various platforms, along with tutorials and step‑by‑step guides for cloud deployment.
Using pre‑built images accelerates prototyping, scaling, and validation of big‑data solutions, and many companies also provide turnkey solutions tailored to specific project needs.
Original source: http://www.jianshu.com/p/52c8fdc07727 Author: YangXuLei
