Simplify Java Stream with JDFrame: A DataFrame‑Style API for Cleaner Code

This article introduces JDFrame/SDFrame, a JVM‑level DataFrame‑style library that provides a more semantic and concise API for Java 8 Stream processing, covering dependency setup, quick start, filtering, aggregation, distinct, grouping, sorting, joining, and advanced features such as percent conversion, partitioning, ranking, and missing‑data replenishment.

Architect

Introduction

After repeatedly forgetting Stream API details and copy-pasting long pipelines, the author wanted a more semantic API, similar to the DataFrame models in Spark or Pandas, and found JVM‑level DataFrame tools such as tablesaw and joinery.

However, those tools require hard‑coded field names, which is painful for developers who prefer cleaner, refactoring‑friendly code. The author therefore created a JVM‑level DataFrame‑like utility that simplifies Java 8 Stream processing with a more expressive syntax based on method references.

Quick Start

1.1 Add Dependency

<dependency>
  <groupId>io.github.burukeyou</groupId>
  <artifactId>jdframe</artifactId>
  <version>0.0.2</version>
</dependency>

1.2 Example

Sum the scores of students aged 9 to 16 (inclusive) for each school, then keep the two schools with the highest totals.

static List<Student> studentList = new ArrayList<>();
// …populate list…

SDFrame<FI2<String, BigDecimal>> sdf2 = SDFrame.read(studentList)
        .whereNotNull(Student::getAge)
        .whereBetween(Student::getAge, 9, 16)
        .groupBySum(Student::getSchool, Student::getScore)
        .sortDesc(FI2::getC2)
        .cutFirst(2);
sdf2.show();

Output:

c1  c2
三中 10
二中 7
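
For comparison, the same query can be sketched with the plain java.util.stream API. This is only an illustration of what the pipeline above expresses, not JDFrame's implementation; the Student class and the topSchools helper are hypothetical stand-ins that do not appear in the original.

```java
import java.math.BigDecimal;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class QuickStartWithStreams {
    // Hypothetical stand-in for the article's Student class (only the fields used here).
    static class Student {
        private final String school;
        private final Integer age;
        private final BigDecimal score;

        Student(String school, Integer age, BigDecimal score) {
            this.school = school;
            this.age = age;
            this.score = score;
        }

        String getSchool() { return school; }
        Integer getAge() { return age; }
        BigDecimal getScore() { return score; }
    }

    // Sum scores per school for students aged 9..16 (inclusive) and keep the n largest totals.
    static List<Map.Entry<String, BigDecimal>> topSchools(List<Student> students, int n) {
        Map<String, BigDecimal> totals = students.stream()
                .filter(s -> s.getAge() != null)                    // whereNotNull
                .filter(s -> s.getAge() >= 9 && s.getAge() <= 16)   // whereBetween
                .collect(Collectors.groupingBy(                     // groupBySum
                        Student::getSchool,
                        Collectors.reducing(BigDecimal.ZERO, Student::getScore, BigDecimal::add)));
        return totals.entrySet().stream()
                .sorted(Map.Entry.<String, BigDecimal>comparingByValue().reversed()) // sortDesc
                .limit(n)                                                            // cutFirst
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Student> students = Arrays.asList(
                new Student("North", 10, new BigDecimal("4")),
                new Student("North", 12, new BigDecimal("6")),
                new Student("South", 9, new BigDecimal("7")),
                new Student("West", 20, new BigDecimal("100")));
        topSchools(students, 2).forEach(e -> System.out.println(e.getKey() + " " + e.getValue()));
    }
}
```

The five chained JDFrame calls map onto two Stream pipelines here, because the grouped totals have to be collected into a map before they can be sorted.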

API Overview

2.1 Matrix Information

void show(int n);               // print matrix info
List<String> columns();          // get column names
List<R> col(Function<T,R> fn);   // get a column
T head();                        // first element
List<T> head(int n);              // first n elements
T tail();                        // last element
List<T> tail(int n);              // last n elements

2.2 Filtering

.whereBetween(Student::getAge, 3, 6)          // [3,6]
.whereBetweenR(Student::getAge, 3, 6)         // (3,6]
.whereBetweenL(Student::getAge, 3, 6)         // [3,6)
.whereNotNull(Student::getName)               // not null or empty
.whereGt(Student::getAge, 3)                 // >3
.whereGe(Student::getAge, 3)                 // >=3
.whereLt(Student::getAge, 3)                 // <3
.whereIn(Student::getAge, Arrays.asList(3,7,8))
.whereNotIn(Student::getAge, Arrays.asList(3,7,8))
.whereEq(Student::getAge, 3)
.whereNotEq(Student::getAge, 3)
.whereLike(Student::getName, "jay")          // %jay%
.whereLikeLeft(Student::getName, "jay")      // jay%
.whereLikeRight(Student::getName, "jay")     // %jay
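
The interval and pattern comments above can be read as ordinary predicates. The sketch below spells them out with java.util.function.Predicate; the helper names are hypothetical, and rejecting null values and matching case-sensitively are assumptions rather than documented JDFrame behaviour.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

class IntervalFilters {
    // [lo, hi]: both bounds included, as the whereBetween comment indicates.
    static Predicate<Integer> between(int lo, int hi) {
        return v -> v != null && v >= lo && v <= hi;
    }

    // (lo, hi]: lower bound excluded, as the whereBetweenR comment indicates.
    static Predicate<Integer> betweenR(int lo, int hi) {
        return v -> v != null && v > lo && v <= hi;
    }

    // [lo, hi): upper bound excluded, as the whereBetweenL comment indicates.
    static Predicate<Integer> betweenL(int lo, int hi) {
        return v -> v != null && v >= lo && v < hi;
    }

    // %text%: the value contains the text anywhere.
    static Predicate<String> like(String text) {
        return s -> s != null && s.contains(text);
    }

    // text%: the value starts with the text.
    static Predicate<String> likeLeft(String text) {
        return s -> s != null && s.startsWith(text);
    }

    // %text: the value ends with the text.
    static Predicate<String> likeRight(String text) {
        return s -> s != null && s.endsWith(text);
    }

    static <T> List<T> keep(List<T> values, Predicate<T> condition) {
        return values.stream().filter(condition).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> ages = Arrays.asList(3, 4, 5, 6);
        System.out.println(keep(ages, between(3, 6)));
        System.out.println(keep(ages, betweenR(3, 6)));
        System.out.println(keep(ages, betweenL(3, 6)));
    }
}
```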

2.3 Aggregation

Student s1 = frame.max(Student::getAge);
Integer s2 = frame.maxValue(Student::getAge);
Student s3 = frame.min(Student::getAge);
Integer s4 = frame.minValue(Student::getAge);
BigDecimal s5 = frame.avg(Student::getAge);
BigDecimal s6 = frame.sum(Student::getAge);
MaxMin<Student> s7 = frame.maxMin(Student::getAge);
MaxMin<Integer> s8 = frame.maxMinValue(Student::getAge);
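
To make the difference between max (the element) and maxValue (the field value) concrete, here is a rough plain-Stream counterpart. The Student class and method names are hypothetical, and rounding the average to two decimals with HALF_UP is an assumption; the article does not say how avg rounds.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.OptionalInt;

class AgeAggregates {
    // Hypothetical stand-in for the article's Student class.
    static class Student {
        private final String name;
        private final int age;

        Student(String name, int age) {
            this.name = name;
            this.age = age;
        }

        String getName() { return name; }
        int getAge() { return age; }
    }

    // Counterpart of max(...): the whole element holding the largest age.
    static Optional<Student> oldest(List<Student> students) {
        return students.stream().max(Comparator.comparingInt(Student::getAge));
    }

    // Counterpart of maxValue(...): only the largest age itself.
    static OptionalInt maxAge(List<Student> students) {
        return students.stream().mapToInt(Student::getAge).max();
    }

    // Counterpart of sum(...), returned as BigDecimal like the article's example.
    static BigDecimal sumAge(List<Student> students) {
        return BigDecimal.valueOf(students.stream().mapToLong(Student::getAge).sum());
    }

    // Counterpart of avg(...); scale 2 and HALF_UP are assumptions made for this sketch.
    static BigDecimal avgAge(List<Student> students) {
        if (students.isEmpty()) {
            return BigDecimal.ZERO;
        }
        return sumAge(students).divide(BigDecimal.valueOf(students.size()), 2, RoundingMode.HALF_UP);
    }

    public static void main(String[] args) {
        List<Student> students = Arrays.asList(
                new Student("a", 10), new Student("b", 12), new Student("c", 15));
        System.out.println(oldest(students).map(Student::getName).orElse("none"));
        System.out.println(sumAge(students) + " " + avgAge(students));
    }
}
```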

2.4 Distinct

List<Student> std = SDFrame.read(studentList).distinct().toLists();               // by object hash
std = SDFrame.read(studentList).distinct(Student::getSchool).toLists();          // by school
std = SDFrame.read(studentList).distinct(e -> e.getSchool() + e.getLevel()).toLists();
std = SDFrame.read(studentList).distinct(Student::getSchool).distinct(Student::getLevel).toLists();
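
The last two lines above are not equivalent: a composite key keeps one element per (school, level) pair, while chaining two distinct calls first reduces to one element per school and then to one per level among the survivors. The sketch below shows this with a small hypothetical distinctBy helper that keeps the first element seen for each key; whether JDFrame also keeps the first one is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

class DistinctByKey {
    // Hypothetical stand-in for the article's Student class.
    static class Student {
        private final String school;
        private final String level;

        Student(String school, String level) {
            this.school = school;
            this.level = level;
        }

        String getSchool() { return school; }
        String getLevel() { return level; }
    }

    // Keep the first element for each distinct key, preserving encounter order.
    static <T, K> List<T> distinctBy(List<T> items, Function<? super T, ? extends K> key) {
        Map<K, T> firstSeen = new LinkedHashMap<>();
        for (T item : items) {
            firstSeen.putIfAbsent(key.apply(item), item);
        }
        return new ArrayList<>(firstSeen.values());
    }

    public static void main(String[] args) {
        List<Student> students = Arrays.asList(
                new Student("A", "1"), new Student("A", "2"),
                new Student("B", "1"), new Student("B", "2"));

        // Composite key: a list of both fields avoids the collisions that plain
        // string concatenation can produce ("ab" + "c" equals "a" + "bc").
        List<Student> byPair = distinctBy(students, s -> Arrays.asList(s.getSchool(), s.getLevel()));

        // Chained: distinct by school first, then by level on what is left.
        List<Student> chained = distinctBy(distinctBy(students, Student::getSchool), Student::getLevel);

        System.out.println(byPair.size() + " vs " + chained.size());
    }
}
```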

2.5 Simple Group‑by Aggregation

List<FI2<String, BigDecimal>> a = frame.groupBySum(Student::getSchool, Student::getAge).toLists();
List<FI2<String, Integer>> a2 = frame.groupByMaxValue(Student::getSchool, Student::getAge).toLists();
List<FI2<String, Student>> a3 = frame.groupByMax(Student::getSchool, Student::getAge).toLists();
List<FI2<String, Integer>> a4 = frame.groupByMinValue(Student::getSchool, Student::getAge).toLists();
List<FI2<String, Long>> a5 = frame.groupByCount(Student::getSchool).toLists();
List<FI2<String, BigDecimal>> a6 = frame.groupByAvg(Student::getSchool, Student::getAge).toLists();
List<FI3<String, BigDecimal, Long>> a7 = frame.groupBySumCount(Student::getSchool, Student::getAge).toLists();
List<FI3<String, String, BigDecimal>> a8 = frame.groupBySum(Student::getSchool, Student::getLevel, Student::getAge).toLists();
List<FI4<String, String, String, BigDecimal>> a9 = frame.groupBySum(Student::getSchool, Student::getLevel, Student::getName, Student::getAge).toLists();
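
As a point of reference, the single-key and two-key sums and the count can be written with Collectors.groupingBy. This is a sketch of the equivalent Stream code, not of JDFrame internals; the Student class is a hypothetical stand-in, and a List of the key fields plays the role that FI3 plays in the listing above.

```java
import java.math.BigDecimal;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class GroupBySketch {
    // Hypothetical stand-in for the article's Student class.
    static class Student {
        private final String school;
        private final String level;
        private final int age;

        Student(String school, String level, int age) {
            this.school = school;
            this.level = level;
            this.age = age;
        }

        String getSchool() { return school; }
        String getLevel() { return level; }
        int getAge() { return age; }
    }

    // Counterpart of groupBySum(school, age): one total per school.
    static Map<String, BigDecimal> sumAgeBySchool(List<Student> students) {
        return students.stream().collect(Collectors.groupingBy(
                Student::getSchool,
                LinkedHashMap::new, // keep first-seen group order
                Collectors.reducing(BigDecimal.ZERO,
                        (Student s) -> BigDecimal.valueOf(s.getAge()), BigDecimal::add)));
    }

    // Counterpart of groupBySum(school, level, age): one total per (school, level) pair.
    static Map<List<String>, BigDecimal> sumAgeBySchoolAndLevel(List<Student> students) {
        return students.stream().collect(Collectors.groupingBy(
                (Student s) -> Arrays.asList(s.getSchool(), s.getLevel()),
                LinkedHashMap::new,
                Collectors.reducing(BigDecimal.ZERO,
                        (Student s) -> BigDecimal.valueOf(s.getAge()), BigDecimal::add)));
    }

    // Counterpart of groupByCount(school).
    static Map<String, Long> countBySchool(List<Student> students) {
        return students.stream().collect(Collectors.groupingBy(
                Student::getSchool, LinkedHashMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        List<Student> students = Arrays.asList(
                new Student("A", "L1", 10), new Student("A", "L2", 12), new Student("B", "L1", 9));
        System.out.println(sumAgeBySchool(students));
        System.out.println(sumAgeBySchoolAndLevel(students));
        System.out.println(countBySchool(students));
    }
}
```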

2.6 Sorting

SDFrame.read(studentList).sortDesc(Student::getAge);
SDFrame.read(studentList).sortDesc(Student::getAge).sortAsc(Student::getLevel);
SDFrame.read(studentList).sortAsc(Student::getAge);
SDFrame.read(studentList).sortAsc(Comparator.comparing(e -> e.getLevel() + e.getId()));
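
With plain Streams, a two-key ordering such as "age descending, then level ascending" is expressed by chaining comparators. The sketch below shows that form; whether JDFrame's chained sortDesc(...).sortAsc(...) call behaves as a primary and secondary key in the same way is not stated in this article, so treat the correspondence as an assumption. The Student class is a hypothetical stand-in.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

class SortSketch {
    // Hypothetical stand-in for the article's Student class.
    static class Student {
        private final String name;
        private final Integer age;
        private final String level;

        Student(String name, Integer age, String level) {
            this.name = name;
            this.age = age;
            this.level = level;
        }

        String getName() { return name; }
        Integer getAge() { return age; }
        String getLevel() { return level; }
    }

    // Primary key: age descending. Secondary key (ties only): level ascending.
    static List<Student> byAgeDescThenLevelAsc(List<Student> students) {
        return students.stream()
                .sorted(Comparator.comparing(Student::getAge).reversed()
                        .thenComparing(Student::getLevel))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Student> students = Arrays.asList(
                new Student("a", 10, "B"), new Student("b", 12, "B"), new Student("c", 12, "A"));
        byAgeDescThenLevelAsc(students)
                .forEach(s -> System.out.println(s.getName() + " " + s.getAge() + " " + s.getLevel()));
    }
}
```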

2.7 Join

append(T t);                     // add element
union(IFrame<T> other);           // addAll
join(IFrame<K> other, JoinOn<T,K> on, Join<T,K,R> join);          // inner join
leftJoin(...);                    // left join
rightJoin(...);                   // right join

The original source includes an example that inner‑joins two frames and prints the result.
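
The shape of the join signature can be illustrated with a small nested-loop join in plain Java. This sketch assumes that JoinOn acts as a match predicate over a left and a right row and that Join combines a matched pair into a result row, which the signature suggests but the article does not spell out. BiPredicate and BiFunction stand in for those two types, and passing null for the missing right row in leftJoin is likewise an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.BiFunction;
import java.util.function.BiPredicate;

class JoinSketch {
    // Inner join: emit one combined row for every (left, right) pair that matches.
    static <T, K, R> List<R> innerJoin(List<T> left, List<K> right,
                                       BiPredicate<T, K> on, BiFunction<T, K, R> combine) {
        List<R> out = new ArrayList<>();
        for (T l : left) {
            for (K r : right) {
                if (on.test(l, r)) {
                    out.add(combine.apply(l, r));
                }
            }
        }
        return out;
    }

    // Left join: like the inner join, but unmatched left rows are kept with a null right side.
    static <T, K, R> List<R> leftJoin(List<T> left, List<K> right,
                                      BiPredicate<T, K> on, BiFunction<T, K, R> combine) {
        List<R> out = new ArrayList<>();
        for (T l : left) {
            boolean matched = false;
            for (K r : right) {
                if (on.test(l, r)) {
                    out.add(combine.apply(l, r));
                    matched = true;
                }
            }
            if (!matched) {
                out.add(combine.apply(l, null));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Integer> ids = Arrays.asList(1, 2, 3);
        List<String> orders = Arrays.asList("1-x", "1-y", "3-z");
        System.out.println(innerJoin(ids, orders, (id, o) -> o.startsWith(id + "-"), (id, o) -> id + "|" + o));
        System.out.println(leftJoin(ids, orders, (id, o) -> o.startsWith(id + "-"), (id, o) -> id + "|" + o));
    }
}
```

A nested loop costs O(n × m) comparisons; for equality joins on large lists, indexing one side in a HashMap by its join key is the usual alternative.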

2.8 Miscellaneous

Percent conversion :

SDFrame.read(list).mapPercent(Student::getScore, Student::setScore, 2)
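
One plausible reading of this call is "multiply the value by 100 and round it to the given number of decimals". The sketch below shows that arithmetic with BigDecimal; the toPercent helper is hypothetical, and both the interpretation and the HALF_UP rounding mode are assumptions, since the article does not describe them.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

class PercentSketch {
    // Convert a ratio such as 0.12345 into a percentage value such as 12.35 (scale = 2).
    // HALF_UP is an assumption made for this sketch.
    static BigDecimal toPercent(BigDecimal ratio, int scale) {
        return ratio.multiply(BigDecimal.valueOf(100)).setScale(scale, RoundingMode.HALF_UP);
    }

    public static void main(String[] args) {
        System.out.println(toPercent(new BigDecimal("0.12345"), 2));
        System.out.println(toPercent(new BigDecimal("0.5"), 2));
    }
}
```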

Partition :

SDFrame.read(list).partition(5).toLists()

Generate order number :

SDFrame.read(list).sortDesc(Student::getAge).addSortNoCol(Student::setRank).show(30)

Generate ranking (same values share rank) :

SDFrame.read(list).addRankingSameColDesc(Student::getAge, Student::setRank).show(20)
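
The partition and ranking operations can be sketched in plain Java as follows. Two assumptions are made here: partition(5) is read as "split into consecutive chunks of at most 5 elements", and the shared ranking is shown as dense ranking (1, 1, 2, ...). The article does not say whether the rank after a tie skips a number, so the real behaviour may differ. The helper names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class PartitionAndRank {
    // Split a list into consecutive chunks of at most `size` elements.
    static <T> List<List<T>> partition(List<T> items, int size) {
        List<List<T>> chunks = new ArrayList<>();
        for (int start = 0; start < items.size(); start += size) {
            int end = Math.min(start + size, items.size());
            chunks.add(new ArrayList<>(items.subList(start, end)));
        }
        return chunks;
    }

    // Sort values in descending order and return their dense ranks:
    // equal values share a rank, and the next distinct value gets the next rank.
    static List<Integer> denseRanksDesc(List<Integer> values) {
        List<Integer> sorted = new ArrayList<>(values);
        sorted.sort(Comparator.reverseOrder());
        List<Integer> ranks = new ArrayList<>();
        int rank = 0;
        Integer previous = null;
        for (Integer value : sorted) {
            if (!value.equals(previous)) {
                rank++;
            }
            ranks.add(rank);
            previous = value;
        }
        return ranks;
    }

    public static void main(String[] args) {
        System.out.println(partition(Arrays.asList(1, 2, 3, 4, 5, 6, 7), 3));
        System.out.println(denseRanksDesc(Arrays.asList(15, 12, 15, 10)));
    }
}
```

A plain order number, by contrast, is just the 1-based position in the sorted list, so tied values still receive different numbers.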

Replenish missing entries :

SDFrame.read(list).replenish(Student::getSchool, allDim, school -> new Student(school)).show()

Group‑wise replenish :

SDFrame.read(list).replenish(Student::getSchool, Student::getLevel, (s,l) -> new Student(s,l)).show(30)
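
Replenishing can be read as "for every expected dimension value that has no row yet, create a placeholder row". The following sketch implements the single-dimension case in plain Java under that reading. The replenish helper here is hypothetical, and appending the generated rows at the end of the list is an assumption about ordering.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;
import java.util.stream.Collectors;

class ReplenishSketch {
    // Add one generated row for each value in allDims that no existing row maps to.
    static <T, D> List<T> replenish(List<T> rows, Function<T, D> dimension,
                                    Collection<D> allDims, Function<D, T> factory) {
        Set<D> present = rows.stream()
                .map(dimension)
                .collect(Collectors.toCollection(HashSet::new));
        List<T> result = new ArrayList<>(rows);
        for (D dim : allDims) {
            // Set.add returns true only if the value was not present yet.
            if (present.add(dim)) {
                result.add(factory.apply(dim));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Rows are encoded as "school:count" strings to keep the example short.
        List<String> rows = Arrays.asList("A:1", "C:2");
        List<String> allSchools = Arrays.asList("A", "B", "C");
        System.out.println(replenish(rows, r -> r.substring(0, 1), allSchools, school -> school + ":0"));
    }
}
```

The group-wise variant would presumably apply the same idea within each school, filling in the levels that appear elsewhere in the data but are missing for that school.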

Frames Comparison

Two frames are provided. SDFrame behaves like a Java Stream: operations are lazy and run only when a terminal operation is called, and the data must be read again before further processing. JDFrame applies each operation immediately, so intermediate results can be reused without re‑reading. Choose SDFrame for pure stream pipelines and JDFrame when you need intermediate ("mid‑point") data.
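
The same trade-off exists in the JDK itself, which may help to picture the difference. In the sketch below, a java.util.stream.Stream plays the role of the lazy, single-use pipeline, and a collected List plays the role of the reusable mid-point. This is an analogy using only standard library types, not a description of how either frame is implemented.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class LazyVersusEager {
    // A stream pipeline can be consumed by only one terminal operation.
    static boolean secondTerminalCallFails() {
        Stream<Integer> pipeline = Stream.of(1, 2, 3, 4).filter(x -> x > 1); // nothing runs yet
        pipeline.count();                 // terminal operation: the pipeline runs and is consumed
        try {
            pipeline.count();             // a second terminal operation is rejected
            return false;
        } catch (IllegalStateException expected) {
            return true;
        }
    }

    // A materialised intermediate result can feed any number of follow-up computations.
    static long[] reuseMaterialised() {
        List<Integer> midPoint = Stream.of(1, 2, 3, 4)
                .filter(x -> x > 1)
                .collect(Collectors.toList());
        long evens = midPoint.stream().filter(x -> x % 2 == 0).count();
        long odds = midPoint.stream().filter(x -> x % 2 != 0).count();
        return new long[] {evens, odds};
    }

    public static void main(String[] args) {
        System.out.println(secondTerminalCallFails());
        long[] counts = reuseMaterialised();
        System.out.println(counts[0] + " " + counts[1]);
    }
}
```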

Where to Get It

https://github.com/burukeYou/JDFrame
https://central.sonatype.com/artifact/io.github.burukeyou/jdframe

The library aims to bring DataFrame‑style, declarative data manipulation to the JVM, reducing boilerplate and improving readability of Java 8 Stream code.
