Installing Elasticsearch and Performing Data Aggregation Queries

This article walks through installing Elasticsearch 5.6.9, configuring system limits, creating indices, inserting and deleting documents, executing complex aggregation queries, and integrating Elasticsearch with Java using the TransportClient, providing a practical guide for building analytics on large‑scale data.

Architecture Digest

May 27, 2018

The author describes a need for efficient data analysis of merchant statistics and chooses Elasticsearch (ES) over MySQL, HBase, and Hadoop MapReduce, then details the step‑by‑step installation and configuration of ES 5.6.9 on a Linux server.

wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-5.6.9.tar.gz
tar -zxvf elasticsearch-5.6.9.tar.gz -C /usr/local/
cd /usr/local/elasticsearch-5.6.9

Key configuration changes in config/elasticsearch.yml include setting the cluster name, node name, data and log paths, network host, and HTTP port. System limits must also be raised to avoid file‑descriptor and virtual‑memory errors at startup: the nofile and nproc entries below go in /etc/security/limits.conf, and the vm.max_map_count entry goes in /etc/sysctl.conf (apply it with sysctl -p).

* soft nofile 65536
* hard nofile 65536
* soft nproc 2048
* hard nproc 4096

vm.max_map_count=262144
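
As a rough pre-flight check before starting ES, the sketch below reads the current kernel value and compares it with the minimum from the sysctl entry above. The class and method names are hypothetical, and the /proc path assumes a Linux host.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of Elasticsearch.
final class MapCountCheck {
    // Minimum taken from the vm.max_map_count entry above.
    static final long REQUIRED_MAX_MAP_COUNT = 262144L;

    /** Parses the raw text of the proc file and compares it with the required minimum. */
    static boolean isSufficient(String rawValue) {
        return Long.parseLong(rawValue.trim()) >= REQUIRED_MAX_MAP_COUNT;
    }

    public static void main(String[] args) throws IOException {
        Path procFile = Paths.get("/proc/sys/vm/max_map_count"); // Linux only
        String raw = new String(Files.readAllBytes(procFile), StandardCharsets.UTF_8);
        System.out.println(isSufficient(raw)
                ? "vm.max_map_count is high enough"
                : "vm.max_map_count is too low; edit /etc/sysctl.conf and run sysctl -p");
    }
}
```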

After creating a non‑root user (e.g., elastic) and granting permissions, ES is started in the foreground with bin/elasticsearch, or in the background using:

nohup bin/elasticsearch > /opt/data/elastic/elastic.log 2>&1 &

A successful start is verified by accessing http://192.168.0.1:9200 in a browser.
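
The same check can be scripted instead of done in a browser. The following is a minimal sketch using only the JDK; the class and method names are hypothetical, and it assumes the root endpoint answers with a JSON body that includes a cluster_name field.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Elasticsearch.
final class EsStartupCheck {
    /** True when the root endpoint answered 200 with a body that names the cluster. */
    static boolean looksStarted(int status, String body) {
        return status == 200 && body != null && body.contains("\"cluster_name\"");
    }

    /** Reads a stream fully into a UTF-8 string (Java 8 compatible). */
    static String readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Address taken from the article; replace with your own network.host and http.port.
        URL url = new URL("http://192.168.0.1:9200/");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setConnectTimeout(3000);
        conn.setReadTimeout(3000);
        int status = conn.getResponseCode();
        try (InputStream in = conn.getInputStream()) {
            String body = readAll(in);
            System.out.println(looksStarted(status, body)
                    ? "Elasticsearch is up"
                    : "Unexpected response, HTTP status " + status);
        }
    }
}
```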

ES operations are performed via HTTP requests. The article shows how to create an index with mappings, insert documents, delete by query, and execute search queries using PUT, POST, and GET methods.

PUT http://192.168.0.1:9200/shopsinfo
{
  "mappings":{
    "shopsOrder":{
      "properties":{
        "shopid":{"type":"keyword"},
        "createdate":{"type":"keyword"},
        "timestamp":{"type":"long"},
        "paymentType":{"type":"keyword"},
        "amount":{"type":"long"}
      }
    }
  }
}
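
The insert and delete‑by‑query requests mentioned earlier are not shown in this summary. The sketch below builds plausible request paths and bodies for them against the index and type defined above. The class name, sample values, and the choice to delete by shopid are assumptions for illustration; the string quoting handles only backslashes and double quotes.

```java
// Hypothetical request builder; it only assembles strings and sends nothing.
final class ShopOrderRequests {
    static final String INDEX = "shopsinfo"; // index name from the mapping above
    static final String TYPE = "shopsOrder"; // mapping type from the mapping above

    /** Quotes a string as a JSON literal, escaping backslashes and double quotes only. */
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    /** Path for POSTing a new document; Elasticsearch generates the document ID. */
    static String insertPath() {
        return "/" + INDEX + "/" + TYPE;
    }

    /** JSON body for one order, using the five mapped fields. */
    static String orderDocument(String shopId, String createDate, long timestamp,
                                String paymentType, long amount) {
        return "{\"shopid\":" + quote(shopId)
                + ",\"createdate\":" + quote(createDate)
                + ",\"timestamp\":" + timestamp
                + ",\"paymentType\":" + quote(paymentType)
                + ",\"amount\":" + amount + "}";
    }

    /** Path for POSTing a delete-by-query request against the same index and type. */
    static String deleteByQueryPath() {
        return "/" + INDEX + "/" + TYPE + "/_delete_by_query";
    }

    /** Body that deletes every document belonging to one shop. */
    static String deleteByShopBody(String shopId) {
        return "{\"query\":{\"term\":{\"shopid\":" + quote(shopId) + "}}}";
    }

    public static void main(String[] args) {
        // Sample values are made up for illustration.
        System.out.println("POST " + insertPath());
        System.out.println(orderDocument("1001", "2018-05-20", 1526774400000L, "alipay", 1500L));
        System.out.println("POST " + deleteByQueryPath());
        System.out.println(deleteByShopBody("1001"));
    }
}
```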

Examples of aggregation queries are provided, demonstrating how to compute sums, group by fields, and combine multiple aggregations. The JSON DSL for aggregations uses the aggs element, with terms for grouping and sum for calculating totals.
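
To make the terms‑plus‑sum idea concrete, the sketch below computes the same kind of grouping in plain Java over a few made‑up orders. It does not call Elasticsearch; the class names and data are hypothetical, and buckets are sorted by key here, whereas a real terms aggregation orders buckets by document count unless told otherwise.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// In-memory illustration only; no Elasticsearch involved.
final class TermsSumIllustration {
    /** A made-up order with only the two fields this illustration needs. */
    static final class ShopOrder {
        final String paymentType;
        final long amount;

        ShopOrder(String paymentType, long amount) {
            this.paymentType = paymentType;
            this.amount = amount;
        }
    }

    /** Mirrors a terms aggregation on paymentType with a sum sub-aggregation on amount. */
    static Map<String, Long> sumByPaymentType(List<ShopOrder> orders) {
        Map<String, Long> buckets = new TreeMap<>(); // one entry per distinct paymentType
        for (ShopOrder o : orders) {
            buckets.merge(o.paymentType, o.amount, Long::sum);
        }
        return buckets;
    }

    public static void main(String[] args) {
        List<ShopOrder> orders = Arrays.asList(
                new ShopOrder("alipay", 1500L),
                new ShopOrder("wechat", 800L),
                new ShopOrder("alipay", 200L));
        System.out.println(sumByPaymentType(orders)); // prints {alipay=1700, wechat=800}
    }
}
```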

Sample aggregation to sum amount for specific shops and timestamps:

{
  "size":0,
  "aggs":{
    "query_amount":{"sum":{"field":"amount"}}
  },
  "query":{...}
}
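
The query part is elided in the sample above. One plausible shape, modeled on the Java example later in this article (a terms filter on shopid combined with a lower bound on timestamp), is sketched here as a small body builder. The class name and sample values are hypothetical, and shop IDs are assumed to need no JSON escaping.

```java
// Hypothetical body builder; it only assembles the JSON string.
final class SumAmountRequest {
    /** Builds a search body: no hits (size 0), a bool query, and one sum aggregation. */
    static String body(long startTimestamp, String... shopIds) {
        StringBuilder ids = new StringBuilder();
        for (int i = 0; i < shopIds.length; i++) {
            if (i > 0) {
                ids.append(',');
            }
            ids.append('"').append(shopIds[i]).append('"'); // assumes IDs need no escaping
        }
        return "{\"size\":0,"
                + "\"query\":{\"bool\":{\"must\":["
                + "{\"terms\":{\"shopid\":[" + ids + "]}},"
                + "{\"range\":{\"timestamp\":{\"gte\":" + startTimestamp + "}}}"
                + "]}},"
                + "\"aggs\":{\"query_amount\":{\"sum\":{\"field\":\"amount\"}}}}";
    }

    public static void main(String[] args) {
        // Sample shop IDs and timestamp are made up for illustration.
        System.out.println(body(1526774400000L, "1001", "1002"));
    }
}
```

The resulting string would be sent as the body of a search request against the shopsinfo index; the sum appears in the response under the aggregation name query_amount.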

More complex examples show daily grouping and nested aggregations by payment type and date, with the resulting JSON containing buckets that hold counts and summed values.
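
The nested request bodies themselves are not reproduced in this summary. As a hedged sketch, the builder below composes one possible body: a total sum, then buckets per payment type, each holding its own sum and a further split by date. Aggregation names follow the Java example later in the article; the class and helper names are hypothetical, and bucket ordering is left at the default.

```java
// Hypothetical body builder; it only assembles the JSON string.
final class NestedAggsSketch {
    /** A sum aggregation named query_amount over the amount field. */
    static String sumAmount() {
        return "\"query_amount\":{\"sum\":{\"field\":\"amount\"}}";
    }

    /** A terms aggregation with an explicit bucket limit and nested sub-aggregations. */
    static String terms(String name, String field, int size, String subAggs) {
        return "\"" + name + "\":{\"terms\":{\"field\":\"" + field + "\",\"size\":" + size + "},"
                + "\"aggs\":{" + subAggs + "}}";
    }

    /** Total sum, plus per-payment-type buckets that each contain per-date buckets. */
    static String body() {
        String byDate = terms("payment_date", "createdate", 100, sumAmount());
        String byPaymentType = terms("paymentType", "paymentType", 100, sumAmount() + "," + byDate);
        return "{\"size\":0,\"aggs\":{" + sumAmount() + "," + byPaymentType + "}}";
    }

    public static void main(String[] args) {
        System.out.println(body());
    }
}
```

Each level of nesting in the request shows up as a level of buckets in the response, so the sum for one payment type on one date sits two bucket levels deep.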

For Java integration, the article lists Maven dependencies for Elasticsearch 5.6.9 and provides a helper class that creates a TransportClient with the appropriate cluster settings.

<dependency>
  <groupId>org.elasticsearch</groupId>
  <artifactId>elasticsearch</artifactId>
  <version>5.6.9</version>
</dependency>
<dependency>
  <groupId>org.elasticsearch.client</groupId>
  <artifactId>transport</artifactId>
  <version>5.6.9</version>
</dependency>

Java code shows how to build a search request, apply term and range filters, define sum and terms aggregations, execute the request, and extract aggregation results such as total amount and per‑payment‑type statistics.

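import org.elasticsearch.action.search.SearchRequestBuilder;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.index.query.QueryBuilder;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.aggregations.AggregationBuilders;
import org.elasticsearch.search.aggregations.bucket.terms.Terms;
import org.elasticsearch.search.aggregations.bucket.terms.TermsAggregationBuilder;
import org.elasticsearch.search.aggregations.metrics.sum.SumAggregationBuilder;

public void getAmountData(Long startTimestamp, String... shopIds) {
    SearchRequestBuilder sbuilder = client.prepareSearch("shopsinfo").setTypes("shopsOrder");
    // Filter: shopid is one of shopIds AND timestamp >= startTimestamp.
    QueryBuilder queryBuilder = QueryBuilders.boolQuery()
            .must(QueryBuilders.termsQuery("shopid", shopIds))
            .must(QueryBuilders.rangeQuery("timestamp").gte(startTimestamp));
    sbuilder.setQuery(queryBuilder).setSize(0); // size 0: return aggregations only, no hits
    // Sum of amount, used at the top level and inside each bucket.
    SumAggregationBuilder amountAgg = AggregationBuilders.sum("query_amount").field("amount");
    // Group by payment type; size(100) raises the default limit of 10 buckets.
    TermsAggregationBuilder paymentAgg = AggregationBuilders.terms("paymentType").field("paymentType");
    paymentAgg.size(100).subAggregation(amountAgg);
    // Within each payment type, group by date in ascending term order.
    TermsAggregationBuilder groupDateAgg = AggregationBuilders.terms("payment_date").field("createdate").order(Terms.Order.term(true));
    groupDateAgg.size(100).subAggregation(amountAgg);
    paymentAgg.subAggregation(groupDateAgg);
    sbuilder.addAggregation(amountAgg).addAggregation(paymentAgg);
    SearchResponse response = sbuilder.execute().actionGet();
    // process response
}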

The article concludes with practical tips, such as explicitly setting the size parameter on terms aggregations, which otherwise return only the top 10 buckets, and acknowledges the author's friend who suggested the Elasticsearch solution.
