Installing Elasticsearch and Performing Data Aggregation Queries
This article walks through installing Elasticsearch 5.6.9, configuring system limits, creating indices, inserting and deleting documents, executing complex aggregation queries, and integrating Elasticsearch with Java using the TransportClient, providing a practical guide for building analytics on large‑scale data.
The author describes a need for efficient data analysis of merchant statistics and chooses Elasticsearch (ES) over MySQL, HBase, and Hadoop MapReduce, then details the step‑by‑step installation and configuration of ES 5.6.9 on a Linux server.
wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-5.6.9.tar.gz
tar -zxvf elasticsearch-5.6.9.tar.gz -C /usr/local/
cd /usr/local/elasticsearch-5.6.9
Key configuration changes in config/elasticsearch.yml include setting the cluster name, node name, data and log paths, network host, and HTTP port. System limits must also be raised, or ES fails its startup checks with file-descriptor and virtual-memory errors: the nofile and nproc entries go in /etc/security/limits.conf (they take effect at the next login), and vm.max_map_count goes in /etc/sysctl.conf (apply it with sysctl -p).
* soft nofile 65536
* hard nofile 65536
* soft nproc 2048
* hard nproc 4096
vm.max_map_count=262144
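Whether the kernel setting has actually taken effect can be checked before starting ES. The following is a minimal sketch, not part of the original article: the class name MaxMapCountCheck and its methods are hypothetical, and it assumes a Linux host where the live value is exposed at /proc/sys/vm/max_map_count.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical pre-flight check; not from the original article.
class MaxMapCountCheck {
    // Threshold taken from the sysctl setting above.
    static final long REQUIRED = 262144L;

    /** Parses "262144" or "vm.max_map_count = 262144" into the numeric value. */
    static long parseValue(String text) {
        int eq = text.indexOf('=');
        String value = eq >= 0 ? text.substring(eq + 1) : text;
        return Long.parseLong(value.trim());
    }

    /** True when the parsed value meets the threshold. */
    static boolean isSufficient(String text) {
        return parseValue(text) >= REQUIRED;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: Linux exposes the live kernel value at this path.
        Path procFile = Paths.get("/proc/sys/vm/max_map_count");
        if (!Files.isReadable(procFile)) {
            System.out.println("Cannot read " + procFile + " (not a Linux host?)");
            return;
        }
        String current = new String(Files.readAllBytes(procFile), StandardCharsets.UTF_8);
        System.out.println("vm.max_map_count = " + current.trim()
                + (isSufficient(current) ? " (ok)" : " (too low, need " + REQUIRED + ")"));
    }
}
```

The parsing is kept separate from the file read so the same check can be applied to a line copied from /etc/sysctl.conf.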
After creating a non-root user (e.g., elastic) and granting it permissions on the ES directories, ES is started in the foreground with bin/elasticsearch, or in the background with nohup bin/elasticsearch > /opt/data/elastic/elastic.log 2>&1 &. The dedicated user is needed because ES refuses to start as root. A successful start is verified by accessing http://192.168.0.1:9200 in a browser.
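The browser check can also be scripted. The sketch below is an illustration rather than the author's code: the class EsPing and its methods are hypothetical names, it uses only the JDK's HttpURLConnection, and it treats any HTTP 200 from the root endpoint as "up" without inspecting the response body.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

// Hypothetical helper; not from the original article.
class EsPing {
    /** Builds the root URL of the node, e.g. http://192.168.0.1:9200/ */
    static String rootUrl(String host, int port) {
        return "http://" + host + ":" + port + "/";
    }

    /** Returns true if a GET on the root endpoint answers with HTTP 200 within the timeout. */
    static boolean isUp(String host, int port, int timeoutMillis) {
        try {
            HttpURLConnection conn =
                    (HttpURLConnection) URI.create(rootUrl(host, port)).toURL().openConnection();
            conn.setRequestMethod("GET");
            conn.setConnectTimeout(timeoutMillis);
            conn.setReadTimeout(timeoutMillis);
            try {
                return conn.getResponseCode() == 200;
            } finally {
                conn.disconnect();
            }
        } catch (IOException e) {
            // Connection refused, timeout, or unreachable host: report the node as down.
            return false;
        }
    }

    public static void main(String[] args) {
        // Host and port are the ones used in the article.
        System.out.println(isUp("192.168.0.1", 9200, 2000) ? "node is up" : "node is down");
    }
}
```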
ES operations are performed via HTTP requests. The article shows how to create an index with mappings, insert documents, delete by query, and execute search queries using PUT, POST, and GET methods.
PUT http://192.168.0.1:9200/shopsinfo
{
  "mappings": {
    "shopsOrder": {
      "properties": {
        "shopid": {"type": "string", "index": "not_analyzed"},
        "createdate": {"type": "string", "index": "not_analyzed"},
        "timestamp": {"type": "long"},
        "paymentType": {"type": "string", "index": "not_analyzed"},
        "amount": {"type": "long"}
      }
    }
  }
}
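The article mentions inserting documents over HTTP, but this summary does not show the request. As a rough sketch under stated assumptions, a document matching the mapping could be posted to the index's type endpoint, letting ES generate the document ID. The class ShopOrderIndexer, its methods, and the sample values (shop ID, date format, millisecond timestamp, payment type) are all hypothetical; only the index name, type name, and field names come from the mapping.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical helper; not from the original article.
class ShopOrderIndexer {
    /**
     * Serializes one order using the field names from the mapping.
     * Assumption: the string values contain no characters that need JSON escaping.
     */
    static String toJson(String shopId, String createDate, long timestamp,
                         String paymentType, long amount) {
        return String.format(Locale.ROOT,
                "{\"shopid\":\"%s\",\"createdate\":\"%s\",\"timestamp\":%d,"
                        + "\"paymentType\":\"%s\",\"amount\":%d}",
                shopId, createDate, timestamp, paymentType, amount);
    }

    /** POSTs the document to /shopsinfo/shopsOrder and returns the HTTP status code. */
    static int index(String baseUrl, String json) throws IOException {
        HttpURLConnection conn = (HttpURLConnection)
                URI.create(baseUrl + "/shopsinfo/shopsOrder").toURL().openConnection();
        conn.setRequestMethod("POST");
        conn.setRequestProperty("Content-Type", "application/json");
        conn.setDoOutput(true);
        try (OutputStream out = conn.getOutputStream()) {
            out.write(json.getBytes(StandardCharsets.UTF_8));
        }
        try {
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        // Sample values are made up for illustration.
        String json = toJson("1001", "2018-05-20", 1526774400000L, "alipay", 2500L);
        System.out.println("HTTP " + index("http://192.168.0.1:9200", json));
    }
}
```

A real client would use a JSON library rather than string formatting; the hand-built string keeps the sketch free of dependencies.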
Examples of aggregation queries are provided, demonstrating how to compute sums, group by fields, and combine multiple aggregations. The JSON DSL for aggregations uses the aggs element, with terms for grouping and sum for calculating totals.
Sample aggregation to sum amount for specific shops and timestamps:
{
  "size": 0,
  "aggs": {
    "query_amount": {"sum": {"field": "amount"}}
  },
  "query": {...}
}
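To make the relationship between aggs, terms, and sum concrete, the request body can be assembled from small pieces. This is a minimal sketch with hypothetical names (AggregationBodies and its methods); it deliberately leaves out the query clause, so the body as written aggregates over every document in the index.

```java
// Hypothetical builder for aggregation request bodies; not from the original article.
class AggregationBodies {
    /** One named sum aggregation: "name":{"sum":{"field":"field"}} */
    static String sumAgg(String name, String field) {
        return "\"" + name + "\":{\"sum\":{\"field\":\"" + field + "\"}}";
    }

    /** One named terms aggregation (group by field) with nested sub-aggregations. */
    static String termsAgg(String name, String field, int size, String subAggs) {
        return "\"" + name + "\":{\"terms\":{\"field\":\"" + field + "\",\"size\":" + size + "},"
                + "\"aggs\":{" + subAggs + "}}";
    }

    /** Wraps aggregations in a search body; size 0 suppresses the hit list. */
    static String searchBody(String aggs) {
        return "{\"size\":0,\"aggs\":{" + aggs + "}}";
    }

    public static void main(String[] args) {
        // Total amount over all matching documents.
        System.out.println(searchBody(sumAgg("query_amount", "amount")));
        // Amount summed per payment type: a sum nested inside a terms aggregation.
        System.out.println(searchBody(
                termsAgg("paymentType", "paymentType", 100, sumAgg("query_amount", "amount"))));
    }
}
```

The explicit size on the terms aggregation matters because ES returns only the top 10 buckets when it is omitted.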
More complex examples show daily grouping and nested aggregations by payment type and date, with the resulting JSON containing buckets that hold counts and summed values.
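What the nested buckets contain is easiest to see by computing the same grouping in memory. The sketch below is only an analogy for the ES result, not how ES computes it: BucketSketch, ShopOrder, and the sample orders are hypothetical, and the sorted maps merely mimic buckets ordered by key.

```java
import java.util.Arrays;
import java.util.List;
import java.util.LongSummaryStatistics;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory analogy of the nested aggregation; not from the original article.
class BucketSketch {
    /** Only the fields the aggregation touches. */
    static final class ShopOrder {
        final String paymentType;
        final String createDate;
        final long amount;

        ShopOrder(String paymentType, String createDate, long amount) {
            this.paymentType = paymentType;
            this.createDate = createDate;
            this.amount = amount;
        }
    }

    /** Groups by payment type, then by date; each leaf holds a count and a sum of amount. */
    static Map<String, Map<String, LongSummaryStatistics>> bucketize(List<ShopOrder> orders) {
        Map<String, Map<String, LongSummaryStatistics>> buckets = new TreeMap<>();
        for (ShopOrder o : orders) {
            buckets.computeIfAbsent(o.paymentType, k -> new TreeMap<>())
                    .computeIfAbsent(o.createDate, k -> new LongSummaryStatistics())
                    .accept(o.amount);
        }
        return buckets;
    }

    public static void main(String[] args) {
        // Sample data is made up for illustration.
        List<ShopOrder> orders = Arrays.asList(
                new ShopOrder("alipay", "2018-05-01", 100L),
                new ShopOrder("alipay", "2018-05-01", 250L),
                new ShopOrder("alipay", "2018-05-02", 40L),
                new ShopOrder("wechat", "2018-05-01", 500L));
        bucketize(orders).forEach((type, byDate) ->
                byDate.forEach((date, stats) -> System.out.println(
                        type + " " + date + " doc_count=" + stats.getCount()
                                + " sum=" + stats.getSum())));
    }
}
```

Each leaf corresponds to one inner bucket in the ES response: the count plays the role of doc_count and the sum plays the role of the nested sum aggregation's value.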
For Java integration, the article lists Maven dependencies for Elasticsearch 5.6.9 and provides a helper class that creates a TransportClient with the appropriate cluster settings.
<dependency>
    <groupId>org.elasticsearch</groupId>
    <artifactId>elasticsearch</artifactId>
    <version>5.6.9</version>
</dependency>
<dependency>
    <groupId>org.elasticsearch.client</groupId>
    <artifactId>transport</artifactId>
    <version>5.6.9</version>
</dependency>
Java code shows how to build a search request, apply term and range filters, define sum and terms aggregations, execute the request, and extract aggregation results such as total amount and per‑payment‑type statistics.
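import org.elasticsearch.action.search.SearchRequestBuilder;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.index.query.QueryBuilder;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.index.query.RangeQueryBuilder;
import org.elasticsearch.index.query.TermsQueryBuilder;
import org.elasticsearch.search.aggregations.AggregationBuilders;
import org.elasticsearch.search.aggregations.bucket.terms.Terms.Order;
import org.elasticsearch.search.aggregations.bucket.terms.TermsAggregationBuilder;
import org.elasticsearch.search.aggregations.metrics.sum.SumAggregationBuilder;

// client is the TransportClient created by the helper class.
public void getAmountData(Long startTimestamp, String... shopIds) {
    SearchRequestBuilder sbuilder = client.prepareSearch("shopsinfo").setTypes("shopsOrder");

    // Filter: shopid is one of shopIds AND timestamp >= startTimestamp.
    TermsQueryBuilder mpq = QueryBuilders.termsQuery("shopid", shopIds);
    RangeQueryBuilder mpq2 = QueryBuilders.rangeQuery("timestamp").gte(startTimestamp);
    QueryBuilder queryBuilder = QueryBuilders.boolQuery().must(mpq).must(mpq2);
    sbuilder.setQuery(queryBuilder).setSize(0); // size 0: aggregations only, no hits

    // Sum of amount, reused at every grouping level.
    SumAggregationBuilder amountAgg = AggregationBuilders.sum("query_amount").field("amount");

    // Group by payment type, then by date (ascending) within each payment type.
    TermsAggregationBuilder paymentAgg = AggregationBuilders.terms("paymentType").field("paymentType");
    paymentAgg.size(100).subAggregation(amountAgg);
    TermsAggregationBuilder groupDateAgg = AggregationBuilders.terms("payment_date").field("createdate").order(Order.term(true));
    groupDateAgg.size(100).subAggregation(amountAgg);
    paymentAgg.subAggregation(groupDateAgg);

    sbuilder.addAggregation(amountAgg).addAggregation(paymentAgg);
    SearchResponse response = sbuilder.execute().actionGet();
    // process response
}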
The article concludes with practical tips, such as setting the size parameter on terms aggregations high enough to retrieve all buckets (only the top 10 are returned by default), and acknowledges the author's friend who suggested the Elasticsearch solution.
Architecture Digest
Focusing on Java backend development, covering application architecture from top-tier internet companies (high availability, high performance, high stability), big data, machine learning, Java architecture, and other popular fields.