Simplify Java Stream Processing with JDFrame – A JVM‑Level DataFrame Library
This article introduces JDFrame, a JVM-level DataFrame-style library that offers a more expressive, SQL-like API on top of Java 8 streams. It shows how to add the Maven dependency, demonstrates common operations such as filtering, grouping, sorting, and joining, and explains the difference between SDFrame and JDFrame with practical code examples.
0. Introduction
I often forget parts of the Stream API and end up copy-pasting long pipelines, so I wanted something more semantic. The DataFrame models in Spark and Pandas were the inspiration. JVM-level DataFrame tools such as tablesaw and joinery already exist, but they require hard-coded field names. To keep things concise, I wrote JDFrame, a DataFrame-like tool for the JVM that simplifies Java 8 Stream processing behind a more expressive API.
1. Quick Start
1.1 Add Dependency
<dependency>
    <groupId>io.github.burukeyou</groupId>
    <artifactId>jdframe</artifactId>
    <version>0.1.7</version>
</dependency>
1.2 Example
Sum the scores per school for students whose age is not null and lies between 9 and 16, then take the two schools with the highest totals.
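For comparison, here is roughly what this query looks like with the plain Java 8 Stream API and no JDFrame; the JDFrame version follows further down in this section. This is only a sketch: the `Pupil` class and the `topSchools` helper are hypothetical stand-ins that carry just the three fields the query touches, and `S1`, `S2`, `S3` stand in for the three school names in the sample data.

```java
import java.math.BigDecimal;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class PlainStreamTopSchools {

    // Hypothetical stand-in for Student with only the fields this query uses.
    static class Pupil {
        private final String school;
        private final Integer age;
        private final BigDecimal score;

        Pupil(String school, Integer age, BigDecimal score) {
            this.school = school;
            this.age = age;
            this.score = score;
        }

        String getSchool() { return school; }
        Integer getAge() { return age; }
        BigDecimal getScore() { return score; }
    }

    // WHERE age IS NOT NULL AND age BETWEEN 9 AND 16
    // GROUP BY school, SUM(score), ORDER BY total DESC, LIMIT n
    static List<Map.Entry<String, BigDecimal>> topSchools(List<Pupil> pupils, int n) {
        Map<String, BigDecimal> totals = pupils.stream()
                .filter(p -> p.getAge() != null && p.getAge() >= 9 && p.getAge() <= 16)
                .collect(Collectors.groupingBy(
                        Pupil::getSchool,
                        Collectors.reducing(BigDecimal.ZERO, Pupil::getScore, BigDecimal::add)));

        return totals.entrySet().stream()
                .sorted(Map.Entry.<String, BigDecimal>comparingByValue().reversed())
                .limit(n)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Same ages and scores as the article's sample data.
        List<Pupil> pupils = Arrays.asList(
                new Pupil("S1", 11, new BigDecimal(1)),
                new Pupil("S1", 11, new BigDecimal(1)),
                new Pupil("S1", 12, new BigDecimal(2)),
                new Pupil("S2", 13, new BigDecimal(3)),
                new Pupil("S2", 14, new BigDecimal(4)),
                new Pupil("S3", 14, new BigDecimal(5)),
                new Pupil("S3", 15, new BigDecimal(5)));
        topSchools(pupils, 2).forEach(e -> System.out.println(e.getKey() + " " + e.getValue()));
        // prints "S3 10" then "S2 7"
    }
}
```

The grouping, the second stream over the map entries, and the explicit type witness on `comparingByValue` are the kind of boilerplate the fluent API is meant to hide.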
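import java.math.BigDecimal;
import lombok.AllArgsConstructor;
import lombok.Data;
import lombok.NoArgsConstructor;

@Data
@AllArgsConstructor
@NoArgsConstructor
public class Student {
    private int id;
    private String name;
    private String school;
    private String level;
    private Integer age;
    private BigDecimal score;
    private Integer rank;

    // The next two constructors are needed by the replenish examples in section 2.8
    public Student(String school) { this.school = school; }
    public Student(String school, String level) { this.school = school; this.level = level; }
    public Student(String level, BigDecimal score) { this.level = level; this.score = score; }
    public Student(int id, String name, String school, String level, Integer age, BigDecimal score) {
        this.id = id; this.name = name; this.school = school; this.level = level; this.age = age; this.score = score;
    }
}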
// Sample data. 一中 / 二中 / 三中 are school names (No. 1 / No. 2 / No. 3 Middle School);
// 一年级 / 二年级 are grade levels (Grade 1 / Grade 2).
static List<Student> studentList = new ArrayList<>();
static {
    studentList.add(new Student(1, "a", "一中", "一年级", 11, new BigDecimal(1)));
    studentList.add(new Student(2, "a", "一中", "一年级", 11, new BigDecimal(1)));
    studentList.add(new Student(3, "b", "一中", "一年级", 12, new BigDecimal(2)));
    studentList.add(new Student(4, "c", "二中", "一年级", 13, new BigDecimal(3)));
    studentList.add(new Student(5, "d", "二中", "一年级", 14, new BigDecimal(4)));
    studentList.add(new Student(6, "e", "三中", "二年级", 14, new BigDecimal(5)));
    studentList.add(new Student(7, "e", "三中", "二年级", 15, new BigDecimal(5)));
}
// Equivalent SQL:
// SELECT school, SUM(score)
// FROM students
// WHERE age IS NOT NULL AND age >= 9 AND age <= 16
// GROUP BY school
// ORDER BY SUM(score) DESC
// LIMIT 2
SDFrame<FI2<String, BigDecimal>> sdf2 = SDFrame.read(studentList)
        .whereNotNull(Student::getAge)
        .whereBetween(Student::getAge, 9, 16)
        .groupBySum(Student::getSchool, Student::getScore)
        .sortDesc(FI2::getC2)
        .cutFirst(2);
sdf2.show();
Output (cutFirst(2) keeps only the two highest totals; 一中, with a total of 4, is cut):
c1 c2
三中 10
二中 7
2. API Cases
2.1 Matrix Information
void show(int n); // Print matrix information to console
List<String> columns(); // Get matrix header field names
<R> List<R> col(Function<T, R> f); // Get the values of one column
T head(); // Get the first element
List<T> head(int n); // Get the first n elements
T tail(); // Get the last element
List<T> tail(int n); // Get the last n elements
2.2 Filtering
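The three whereBetween variants listed in this section differ only in which endpoint they include. The interval shapes can be sketched in plain Java as follows. The helper names `closed`, `openLeft`, and `openRight` are made up for illustration, and rejecting null values is an assumption of this sketch, not documented JDFrame behaviour.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

class BetweenPredicates {

    // [lo, hi]: both endpoints included (the shape of whereBetween)
    static Predicate<Integer> closed(int lo, int hi) {
        return v -> v != null && v >= lo && v <= hi;
    }

    // (lo, hi]: left endpoint excluded (the shape of whereBetweenR)
    static Predicate<Integer> openLeft(int lo, int hi) {
        return v -> v != null && v > lo && v <= hi;
    }

    // [lo, hi): right endpoint excluded (the shape of whereBetweenL)
    static Predicate<Integer> openRight(int lo, int hi) {
        return v -> v != null && v >= lo && v < hi;
    }

    static List<Integer> keep(List<Integer> values, Predicate<Integer> p) {
        return values.stream().filter(p).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> ages = Arrays.asList(2, 3, 4, 5, 6, 7);
        System.out.println(keep(ages, closed(3, 6)));    // [3, 4, 5, 6]
        System.out.println(keep(ages, openLeft(3, 6)));  // [4, 5, 6]
        System.out.println(keep(ages, openRight(3, 6))); // [3, 4, 5]
    }
}
```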
whereBetween(Student::getAge, 3, 6); // age in [3,6]
whereBetweenR(Student::getAge, 3, 6); // age in (3,6]
whereBetweenL(Student::getAge, 3, 6); // age in [3,6)
whereNotNull(Student::getName); // name is not null (the check also covers empty strings)
whereGt(Student::getAge, 3); // age > 3
whereGe(Student::getAge, 3); // age >= 3
whereLt(Student::getAge, 3); // age < 3
whereIn(Student::getAge, Arrays.asList(3,7,8)); // age is 3,7 or 8
whereNotIn(Student::getAge, Arrays.asList(3,7,8)); // age not 3,7,8
whereEq(Student::getAge, 3); // age == 3
whereNotEq(Student::getAge, 3); // age != 3
whereLike(Student::getName, "jay"); // LIKE "%jay%"
whereLikeLeft(Student::getName, "jay"); // LIKE "jay%"
whereLikeRight(Student::getName, "jay"); // LIKE "%jay"
2.3 Aggregation
Student s1 = frame.max(Student::getAge); // student with max age
Integer s2 = frame.maxValue(Student::getAge); // max age value
Student s3 = frame.min(Student::getAge); // student with min age
Integer s4 = frame.minValue(Student::getAge); // min age value
BigDecimal s5 = frame.avg(Student::getAge); // average age
BigDecimal s6 = frame.sum(Student::getAge); // sum of ages
MaxMin<Student> s7 = frame.maxMin(Student::getAge); // students with the max and the min age
MaxMin<Integer> s8 = frame.maxMinValue(Student::getAge); // max and min age values
2.4 Distinct
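De-duplicating by a key rather than by the whole object is something the plain Stream API has no direct call for. A common workaround, shown here as a sketch, is a stateful filter backed by a set. The names `firstSeen` and `distinctBy` are hypothetical, and keeping the first element seen for each key is an assumption of this sketch, not a statement about JDFrame's implementation.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.stream.Collectors;

class DistinctByKey {

    // Stateful predicate: true only the first time a key shows up.
    // Meant for sequential streams; the HashSet is not thread-safe.
    static <T, K> Predicate<T> firstSeen(Function<T, K> keyOf) {
        Set<K> seen = new HashSet<>();
        return item -> seen.add(keyOf.apply(item));
    }

    static <T, K> List<T> distinctBy(List<T> items, Function<T, K> keyOf) {
        return items.stream().filter(firstSeen(keyOf)).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("apple", "avocado", "banana", "blueberry", "cherry");
        // Key = first letter, so only the first word per letter survives.
        System.out.println(distinctBy(words, w -> w.substring(0, 1))); // [apple, banana, cherry]
    }
}
```

In this sketch, chaining two `distinctBy` calls is not the same as de-duplicating on a combined key: the chain first keeps one element per first key and then one per second key among the survivors, while a combined key keeps one element per pair.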
List<Student> std = SDFrame.read(studentList).distinct().toLists(); // distinct by whole object (equals/hashCode)
std = SDFrame.read(studentList).distinct(Student::getSchool).toLists(); // distinct by school name
std = SDFrame.read(studentList).distinct(e -> e.getSchool() + e.getLevel()).toLists(); // distinct by concatenated fields
std = SDFrame.read(studentList).distinct(Student::getSchool).distinct(Student::getLevel).toLists(); // chained distinct: by school, then by level
2.5 Simple Group-by Aggregation
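As a point of reference for the group-by helpers in this section, the JDK can already compute sum, count, max, min, and average per group in one pass with `Collectors.summarizingInt`. The sketch below uses a hypothetical `Row` type and the ages from the sample data, with `S1`, `S2`, `S3` standing in for the three schools. It illustrates the plain-Stream equivalent only, not how JDFrame implements these calls.

```java
import java.util.Arrays;
import java.util.IntSummaryStatistics;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class GroupStatsBySchool {

    // Hypothetical row type: a grouping key plus the value to aggregate.
    static class Row {
        private final String school;
        private final int age;

        Row(String school, int age) {
            this.school = school;
            this.age = age;
        }

        String getSchool() { return school; }
        int getAge() { return age; }
    }

    // SELECT school, SUM(age), COUNT(age), MAX(age), MIN(age), AVG(age) ... GROUP BY school
    static Map<String, IntSummaryStatistics> ageStats(List<Row> rows) {
        return rows.stream().collect(Collectors.groupingBy(
                Row::getSchool,
                TreeMap::new, // sorted keys, so the printed order is deterministic
                Collectors.summarizingInt(Row::getAge)));
    }

    public static void main(String[] args) {
        List<Row> rows = Arrays.asList(
                new Row("S1", 11), new Row("S1", 11), new Row("S1", 12),
                new Row("S2", 13), new Row("S2", 14),
                new Row("S3", 14), new Row("S3", 15));
        ageStats(rows).forEach((school, s) -> System.out.println(
                school + " sum=" + s.getSum() + " count=" + s.getCount()
                        + " max=" + s.getMax() + " min=" + s.getMin()));
        // S1 sum=34 count=3 max=12 min=11
        // S2 sum=27 count=2 max=14 min=13
        // S3 sum=29 count=2 max=15 min=14
    }
}
```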
// SELECT school, SUM(age) ... GROUP BY school
List<FI2<String, BigDecimal>> a = frame.groupBySum(Student::getSchool, Student::getAge).toLists();
// SELECT school, MAX(age) ... GROUP BY school
List<FI2<String, Integer>> a2 = frame.groupByMaxValue(Student::getSchool, Student::getAge).toLists();
// SELECT school, MAX(age) ... GROUP BY school (returns Student object)
List<FI2<String, Student>> a3 = frame.groupByMax(Student::getSchool, Student::getAge).toLists();
// SELECT school, MIN(age) ... GROUP BY school
List<FI2<String, Integer>> a4 = frame.groupByMinValue(Student::getSchool, Student::getAge).toLists();
// SELECT school, COUNT(*) ... GROUP BY school
List<FI2<String, Long>> a5 = frame.groupByCount(Student::getSchool).toLists();
// SELECT school, AVG(age) ... GROUP BY school
List<FI2<String, BigDecimal>> a6 = frame.groupByAvg(Student::getSchool, Student::getAge).toLists();
// SELECT school, SUM(age), COUNT(age) ... GROUP BY school
List<FI3<String, BigDecimal, Long>> a7 = frame.groupBySumCount(Student::getSchool, Student::getAge).toLists();
// 2-level group: SELECT school, level, SUM(age) ... GROUP BY school, level
List<FI3<String, String, BigDecimal>> a8 = frame.groupBySum(Student::getSchool, Student::getLevel, Student::getAge).toLists();
// 3-level group: SELECT school, level, name, SUM(age) ... GROUP BY school, level, name
List<FI4<String, String, String, BigDecimal>> a9 = frame.groupBySum(Student::getSchool, Student::getLevel, Student::getName, Student::getAge).toLists();
2.6 Sorting
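For orientation, a multi-key ordering such as ORDER BY age DESC, level ASC can be written in plain Java as one composed `Comparator`. The sketch below uses a hypothetical `Row` type and assumes non-null ages; it shows the JDK idiom only and makes no claim about how chained JDFrame sort calls are combined internally.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class MultiKeySort {

    // Hypothetical row type with the two sort keys plus a label.
    static class Row {
        private final String name;
        private final int age;
        private final String level;

        Row(String name, int age, String level) {
            this.name = name;
            this.age = age;
            this.level = level;
        }

        String getName() { return name; }
        int getAge() { return age; }
        String getLevel() { return level; }
    }

    // ORDER BY age DESC, level ASC.
    // reversed() is applied before thenComparing, so it flips only the age key.
    static final Comparator<Row> AGE_DESC_THEN_LEVEL_ASC =
            Comparator.comparingInt(Row::getAge).reversed().thenComparing(Row::getLevel);

    // Returns a sorted copy and leaves the input list untouched.
    static List<Row> sorted(List<Row> rows) {
        List<Row> copy = new ArrayList<>(rows);
        copy.sort(AGE_DESC_THEN_LEVEL_ASC);
        return copy;
    }

    public static void main(String[] args) {
        List<Row> rows = Arrays.asList(
                new Row("a", 12, "B"), new Row("b", 14, "B"),
                new Row("c", 14, "A"), new Row("d", 11, "A"));
        sorted(rows).forEach(r -> System.out.println(r.getName() + " " + r.getAge() + " " + r.getLevel()));
        // c 14 A
        // b 14 B
        // a 12 B
        // d 11 A
    }
}
```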
// ORDER BY age DESC
SDFrame.read(studentList).sortDesc(Student::getAge);
// ORDER BY age DESC, level ASC
SDFrame.read(studentList).sortDesc(Student::getAge).sortAsc(Student::getLevel);
// ORDER BY age ASC
SDFrame.read(studentList).sortAsc(Student::getAge);
// Comparator based sorting
SDFrame.read(studentList).sortAsc(Comparator.comparing(e -> e.getLevel() + e.getId()));
2.7 Join Matrix
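As a mental model for the join calls in this section, the difference between an inner join and a left join can be sketched with two nested loops over plain lists. This is an illustration under assumptions, not JDFrame's implementation: `BiPredicate` and `BiFunction` stand in for the library's `JoinOn` and `Join` parameters, and the class and method names are hypothetical.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;
import java.util.function.BiPredicate;

class NestedLoopJoin {

    // Inner join: one output row for every (left, right) pair that satisfies `on`.
    static <T, K, R> List<R> innerJoin(List<T> left, List<K> right,
                                       BiPredicate<T, K> on, BiFunction<T, K, R> combine) {
        List<R> out = new ArrayList<>();
        for (T l : left) {
            for (K r : right) {
                if (on.test(l, r)) {
                    out.add(combine.apply(l, r));
                }
            }
        }
        return out;
    }

    // Left join: same as above, but an unmatched left row is still emitted, paired with null.
    static <T, K, R> List<R> leftJoin(List<T> left, List<K> right,
                                      BiPredicate<T, K> on, BiFunction<T, K, R> combine) {
        List<R> out = new ArrayList<>();
        for (T l : left) {
            boolean matched = false;
            for (K r : right) {
                if (on.test(l, r)) {
                    matched = true;
                    out.add(combine.apply(l, r));
                }
            }
            if (!matched) {
                out.add(combine.apply(l, null));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> schools = Arrays.asList("north", "north", "south", "east");
        List<Map.Entry<String, Integer>> totals = Arrays.<Map.Entry<String, Integer>>asList(
                new SimpleEntry<String, Integer>("north", 4),
                new SimpleEntry<String, Integer>("south", 7));

        System.out.println(innerJoin(schools, totals,
                (s, t) -> s.equals(t.getKey()),
                (s, t) -> s + "=" + t.getValue()));
        // [north=4, north=4, south=7]

        System.out.println(leftJoin(schools, totals,
                (s, t) -> s.equals(t.getKey()),
                (s, t) -> s + "=" + (t == null ? "none" : String.valueOf(t.getValue()))));
        // [north=4, north=4, south=7, east=none]
    }
}
```

The combine function of a left or right join has to tolerate a null argument on the missing side, which is what the "null for missing" remarks in the API list point at.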
API List:
append(T t); // equivalent to Collection.add
union(IFrame<T> other); // equivalent to Collection.addAll
join(IFrame<K> other, JoinOn<T,K> on, Join<T,K,R> join); // inner join
leftJoin(IFrame<K> other, JoinOn<T,K> on, Join<T,K,R> join); // left join (null for missing K)
rightJoin(IFrame<K> other, JoinOn<T,K> on, Join<T,K,R> join); // right join (null for missing T)
Inner join example:
System.out.println("======== Matrix1 =======");
SDFrame<Student> sdf = SDFrame.read(studentList);
sdf.show(20);
SDFrame<FI2<String, BigDecimal>> sdf2 = SDFrame.read(studentList)
        .whereNotNull(Student::getAge)
        .whereBetween(Student::getAge, 9, 16)
        .groupBySum(Student::getSchool, Student::getScore)
        .sortDesc(FI2::getC2)
        .cutFirst(10);
System.out.println("======== Matrix2 =======");
sdf2.show();
// UserInfo is a simple result bean (setKey1 / setKey2 / setKey3); its definition is not shown here
SDFrame<UserInfo> frame = sdf.join(sdf2, (a, b) -> a.getSchool().equals(b.getC1()), (a, b) -> {
    UserInfo ui = new UserInfo();
    ui.setKey1(a.getSchool());
    ui.setKey2(b.getC2().intValue());
    ui.setKey3(String.valueOf(a.getId()));
    return ui;
});
System.out.println("======== Joined Result =======");
frame.show(5);
The join is equivalent to the following SQL; the second lambda then maps each matched pair to a UserInfo:
select a.*, b.* from sdf a inner join sdf2 b on a.school = b.c1
2.8 Other Utilities
// Percent conversion (score * 100, rounded to 2 decimals)
SDFrame<Student> map2 = SDFrame.read(studentList).mapPercent(Student::getScore, Student::setScore, 2);
// Partition every 5 elements into sub‑lists
List<List<Student>> t = SDFrame.read(studentList).partition(5).toLists();
// Generate sequential numbers based on age order (starting from 0)
SDFrame.read(studentList)
        .sortDesc(Student::getAge)
        .addSortNoCol(Student::setRank)
        .show(30);
// Generate ranking numbers (same rank for equal values, start from 0)
SDFrame<Student> df = SDFrame.read(studentList).addRankingSameColDesc(Student::getAge, Student::setRank);
df.show(20);
// Fill in entries for schools that are missing from the data (needs a Student(String school) constructor)
List<String> allDim = Arrays.asList("一中","二中","三中","四中");
SDFrame.read(studentList).replenish(Student::getSchool, allDim, school -> new Student(school)).show();
// Within each school, fill in entries for missing levels (needs a Student(String school, String level) constructor)
SDFrame.read(studentList).replenish(Student::getSchool, Student::getLevel, (school, level) -> new Student(school, level)).show(30);
Final Notes
Code repository: https://github.com/burukeYou/JDFrame
Maven dependency: https://central.sonatype.com/artifact/io.github.burukeyou/jdframe
JDFrame provides two frame types, SDFrame and JDFrame, with identical APIs. JDFrame operations take effect immediately, so an intermediate result can be used without re-reading the data. SDFrame behaves like the native Stream API: operations are lazy and run only when a terminal action is called. Use SDFrame for simple one-pass stream processing; use JDFrame when intermediate results need to serve as the starting point for further calculations, as in a DataFrame model.
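The lazy, single-use behaviour that SDFrame is compared to can be seen directly in the JDK Stream API. The sketch below demonstrates only the JDK side: that a filter does not run until a terminal operation is called, that a consumed stream cannot be reused, and that a materialised list can seed several later computations. It makes no claim about SDFrame or JDFrame internals, and the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class LazyVersusMaterialised {

    // Returns how many times the filter ran before any terminal operation was called.
    static int filterCallsBeforeTerminalOp(List<Integer> data) {
        AtomicInteger calls = new AtomicInteger();
        Stream<Integer> evens = data.stream().filter(n -> {
            calls.incrementAndGet();
            return n % 2 == 0;
        });
        int before = calls.get(); // the pipeline has only been described so far
        evens.count();            // terminal operation: now the filter actually runs
        return before;
    }

    // A JDK stream can feed only one terminal operation.
    static boolean secondUseFails(List<Integer> data) {
        Stream<Integer> evens = data.stream().filter(n -> n % 2 == 0);
        evens.count();
        try {
            evens.count();
            return false;
        } catch (IllegalStateException alreadyConsumed) {
            return true;
        }
    }

    // Materialising the intermediate result lets it seed several later computations.
    static int[] sumAndMaxOfEvens(List<Integer> data) {
        List<Integer> evens = data.stream().filter(n -> n % 2 == 0).collect(Collectors.toList());
        int sum = evens.stream().mapToInt(Integer::intValue).sum();
        int max = evens.stream().mapToInt(Integer::intValue).max().orElse(0);
        return new int[] {sum, max};
    }

    public static void main(String[] args) {
        List<Integer> data = Arrays.asList(1, 2, 3, 4, 5, 6);
        System.out.println(filterCallsBeforeTerminalOp(data));        // 0
        System.out.println(secondUseFails(data));                     // true
        System.out.println(Arrays.toString(sumAndMaxOfEvens(data)));  // [12, 6]
    }
}
```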
