JDFrame/SDFrame: A JVM‑Level DataFrame‑like API for Simplified Stream Processing in Java
This article introduces JDFrame/SDFrame, a Java library that offers a semantic, DataFrame-style API for stream processing on the JVM. It covers setup through the Maven dependency, then walks through examples of filtering, aggregation, deduplication, grouping, sorting, joining, partitioning, ranking, and filling in missing data, with the goal of making data-processing code concise and readable.
Because remembering Java Stream APIs can be cumbersome, the author created JDFrame/SDFrame, a JVM‑level DataFrame‑like tool that offers a more semantic and concise way to work with streams, similar to Spark or Pandas.
0. Introduction
The library aims to simplify stream operations, avoid hard‑coded field names, and allow anonymous functions for field processing.
1. Quick Start
1.1 Add Maven Dependency
<dependency>
    <groupId>io.github.burukeyou</groupId>
    <artifactId>jdframe</artifactId>
    <version>0.0.2</version>
</dependency>
1.2 Example
Calculate the total score of students aged 9‑16 for each school and list the top two schools.
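Before the JDFrame version, here is roughly what the same query takes with plain java.util.stream: two filters, a grouping collector, a sort over map entries and a limit. This is a hedged sketch for comparison only; QuickStartPlainJava and its minimal Student class (school, age, score only) are hypothetical and exist so the example compiles on its own.

```java
import java.math.BigDecimal;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class QuickStartPlainJava {
    // Hypothetical minimal Student: only the fields this query reads
    static final class Student {
        private final String school;
        private final Integer age;
        private final BigDecimal score;

        Student(String school, Integer age, BigDecimal score) {
            this.school = school;
            this.age = age;
            this.score = score;
        }

        String getSchool() { return school; }
        Integer getAge() { return age; }
        BigDecimal getScore() { return score; }
    }

    // Total score per school for students aged 9..16 inclusive, keeping the top n schools
    static List<Map.Entry<String, BigDecimal>> topSchools(List<Student> students, int n) {
        Map<String, BigDecimal> totals = students.stream()
                .filter(s -> s.getAge() != null)                       // like whereNotNull
                .filter(s -> s.getAge() >= 9 && s.getAge() <= 16)      // like whereBetween [9,16]
                .collect(Collectors.groupingBy(Student::getSchool,     // like groupBySum
                        Collectors.reducing(BigDecimal.ZERO, Student::getScore, BigDecimal::add)));
        return totals.entrySet().stream()
                .sorted(Map.Entry.<String, BigDecimal>comparingByValue().reversed()) // like sortDesc
                .limit(n)                                              // like cutFirst
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Student> students = List.of(
                new Student("A", 11, new BigDecimal("4")),
                new Student("B", 12, new BigDecimal("9")),
                new Student("A", 15, new BigDecimal("3")),
                new Student("C", 20, new BigDecimal("50")));   // excluded by the age filter
        System.out.println(topSchools(students, 2));           // [B=9, A=7]
    }
}
```

The JDFrame chain below expresses the same steps as a single fluent call and returns FI2 rows instead of map entries.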
static List<Student> studentList = new ArrayList<>();
static {
    // Student(id, name, school, level, age, score); "一中" = No. 1 Middle School, "一年级" = Grade 1
    studentList.add(new Student(1, "a", "一中", "一年级", 11, new BigDecimal(1)));
    // ... other students omitted for brevity
}
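SDFrame<FI2<String, BigDecimal>> sdf2 = SDFrame.read(studentList)
        .whereNotNull(Student::getAge)                      // drop students without an age
        .whereBetween(Student::getAge, 9, 16)               // 9 <= age <= 16
        .groupBySum(Student::getSchool, Student::getScore)  // (school, total score)
        .sortDesc(FI2::getC2)                               // highest total first
        .cutFirst(2);                                       // keep the top two
sdf2.show();
Output: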
c1 c2
三中 10
二中 7
2. API Examples
2.1 Matrix Information
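The accessors listed below behave like familiar list operations. As a hedged reference point, here is how col, head(n) and tail(n) could be expressed over a plain List with java.util.stream; ColumnOps and its signatures are hypothetical and are not JDFrame's implementation.

```java
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

class ColumnOps {
    // One "column": fn applied to every row, in row order (compare: col)
    static <T, R> List<R> col(List<T> rows, Function<T, R> fn) {
        return rows.stream().map(fn).collect(Collectors.toList());
    }

    // First n rows, or all rows if there are fewer (compare: head(n))
    static <T> List<T> head(List<T> rows, int n) {
        return rows.stream().limit(n).collect(Collectors.toList());
    }

    // Last n rows, or all rows if there are fewer (compare: tail(n))
    static <T> List<T> tail(List<T> rows, int n) {
        return rows.stream().skip(Math.max(0, rows.size() - n)).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> names = List.of("a", "b", "c", "d");
        System.out.println(head(names, 2));              // [a, b]
        System.out.println(tail(names, 2));              // [c, d]
        System.out.println(col(names, String::length));  // [1, 1, 1, 1]
    }
}
```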
void show(int n);                          // print up to n rows as a table
List<String> columns();                    // header names
<R> List<R> col(Function<T, R> function);  // values of one column
T head();                                  // first row
List<T> head(int n);                       // first n rows
T tail();                                  // last row
List<T> tail(int n);                       // last n rows
2.2 Filtering
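The comments in the listing below use interval notation: a square bracket includes the endpoint, a parenthesis excludes it. A hedged plain-Java sketch of those semantics with ordinary java.util.function.Predicate values; IntervalFilters and its methods are hypothetical helpers, not JDFrame's code.

```java
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.stream.Collectors;

class IntervalFilters {
    // [lo, hi]: both endpoints included (compare: whereBetween); null values are rejected
    static <T> Predicate<T> between(Function<T, Integer> f, int lo, int hi) {
        return t -> { Integer v = f.apply(t); return v != null && v >= lo && v <= hi; };
    }

    // (lo, hi]: lower endpoint excluded (compare: whereBetweenR)
    static <T> Predicate<T> betweenR(Function<T, Integer> f, int lo, int hi) {
        return t -> { Integer v = f.apply(t); return v != null && v > lo && v <= hi; };
    }

    // [lo, hi): upper endpoint excluded (compare: whereBetweenL)
    static <T> Predicate<T> betweenL(Function<T, Integer> f, int lo, int hi) {
        return t -> { Integer v = f.apply(t); return v != null && v >= lo && v < hi; };
    }

    // SQL LIKE '%s%', i.e. a substring match (compare: whereLike)
    static <T> Predicate<T> like(Function<T, String> f, String s) {
        return t -> { String v = f.apply(t); return v != null && v.contains(s); };
    }

    static <T> List<T> where(List<T> rows, Predicate<T> p) {
        return rows.stream().filter(p).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> ages = List.of(2, 3, 4, 5, 6, 7);
        Function<Integer, Integer> age = a -> a;
        System.out.println(where(ages, between(age, 3, 6)));   // [3, 4, 5, 6]
        System.out.println(where(ages, betweenR(age, 3, 6)));  // [4, 5, 6]
        System.out.println(where(ages, betweenL(age, 3, 6)));  // [3, 4, 5]
        Function<String, String> name = n -> n;
        System.out.println(where(List.of("jay", "jaylen", "ajay", "bob"), like(name, "jay"))); // [jay, jaylen, ajay]
    }
}
```

Following the same pattern, "jay%" corresponds to String.startsWith and "%jay" to String.endsWith.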
Each method adds one filter condition; they are chained together here only to list them all:
SDFrame.read(studentList)
        .whereBetween(Student::getAge, 3, 6)        // [3,6]
        .whereBetweenR(Student::getAge, 3, 6)       // (3,6]
        .whereBetweenL(Student::getAge, 3, 6)       // [3,6)
        .whereNotNull(Student::getName)             // not null and not empty
        .whereGt(Student::getAge, 3)                // age > 3
        .whereGe(Student::getAge, 3)                // age >= 3
        .whereLt(Student::getAge, 3)                // age < 3
        .whereIn(Student::getAge, Arrays.asList(3, 7, 8))
        .whereNotIn(Student::getAge, Arrays.asList(3, 7, 8))
        .whereEq(Student::getAge, 3)                // age == 3
        .whereNotEq(Student::getAge, 3)             // age != 3
        .whereLike(Student::getName, "jay")         // %jay%
        .whereLikeLeft(Student::getName, "jay")     // jay%
        .whereLikeRight(Student::getName, "jay");   // %jay
2.3 Aggregation
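The calls below separate "return the row" (max, min) from "return the value" (maxValue, minValue). A hedged plain-Java sketch of the same distinction; Aggregates and its Student type are hypothetical, and the two-decimal HALF_UP rounding in avg is this sketch's choice, because the source does not say what scale JDFrame's avg returns.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

class Aggregates {
    // Hypothetical minimal row type; this sketch assumes age is never null
    static final class Student {
        private final String name;
        private final Integer age;

        Student(String name, Integer age) {
            this.name = name;
            this.age = age;
        }

        String getName() { return name; }
        Integer getAge() { return age; }
    }

    // The whole row with the highest age (compare: max)
    static Optional<Student> maxRow(List<Student> rows) {
        return rows.stream().max(Comparator.comparing(Student::getAge));
    }

    // Only the highest age itself (compare: maxValue)
    static Optional<Integer> maxValue(List<Student> rows) {
        return rows.stream().map(Student::getAge).max(Comparator.naturalOrder());
    }

    // Sum of ages as BigDecimal (compare: sum)
    static BigDecimal sum(List<Student> rows) {
        return rows.stream()
                .map(s -> BigDecimal.valueOf(s.getAge()))
                .reduce(BigDecimal.ZERO, BigDecimal::add);
    }

    // Mean rounded to 2 decimal places with HALF_UP; empty when there are no rows
    static Optional<BigDecimal> avg(List<Student> rows) {
        if (rows.isEmpty()) {
            return Optional.empty();
        }
        return Optional.of(sum(rows).divide(BigDecimal.valueOf(rows.size()), 2, RoundingMode.HALF_UP));
    }

    public static void main(String[] args) {
        List<Student> rows = List.of(new Student("a", 11), new Student("b", 14), new Student("c", 9));
        System.out.println(maxRow(rows).get().getName()); // b
        System.out.println(maxValue(rows).get());         // 14
        System.out.println(sum(rows));                    // 34
        System.out.println(avg(rows).get());              // 11.33
    }
}
```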
JDFrame<Student> frame = JDFrame.read(studentList);
Student s1 = frame.max(Student::getAge);                 // student with the highest age
Integer s2 = frame.maxValue(Student::getAge);            // highest age value
Student s3 = frame.min(Student::getAge);                 // student with the lowest age
Integer s4 = frame.minValue(Student::getAge);            // lowest age value
BigDecimal s5 = frame.avg(Student::getAge);              // average age
BigDecimal s6 = frame.sum(Student::getAge);              // total of all ages
MaxMin<Student> s7 = frame.maxMin(Student::getAge);      // students with the max and min age
MaxMin<Integer> s8 = frame.maxMinValue(Student::getAge); // max and min age values
2.4 Distinct
Java's Stream.distinct() only removes duplicate objects (as judged by equals/hashCode); JDFrame adds deduplication by a field or by a computed key.
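Plain Java has no built-in distinct-by-key; the usual workaround remembers the keys already seen. A hedged sketch: DistinctBy is a hypothetical helper, and keeping the first occurrence of each key is this sketch's choice, since the source does not say which duplicate JDFrame keeps.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

class DistinctBy {
    // Keeps the first row seen for each key and drops later rows with the same key
    static <T, K> List<T> distinctBy(List<T> rows, Function<T, K> key) {
        Set<K> seen = new HashSet<>();
        List<T> out = new ArrayList<>();
        for (T row : rows) {
            if (seen.add(key.apply(row))) { // add() returns false if the key was already present
                out.add(row);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> fruits = List.of("apple", "avocado", "banana", "blueberry", "cherry");
        System.out.println(distinctBy(fruits, f -> f.charAt(0)));                     // [apple, banana, cherry]

        List<String[]> pairs = List.of(new String[]{"ab", "c"}, new String[]{"a", "bc"});
        System.out.println(distinctBy(pairs, p -> p[0] + p[1]).size());               // 1: both become "abc"
        System.out.println(distinctBy(pairs, p -> Arrays.asList(p[0], p[1])).size()); // 2: List keys stay distinct
    }
}
```

The last two lines show why a concatenated String key such as e.getSchool() + e.getLevel() can merge different pairs, while a List-valued key cannot.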
List<Student> std = SDFrame.read(studentList).distinct().toLists();                                  // whole-object distinct
std = SDFrame.read(studentList).distinct(Student::getSchool).toLists();                              // one row per school
std = SDFrame.read(studentList).distinct(e -> e.getSchool() + e.getLevel()).toLists();               // one row per school + level
std = SDFrame.read(studentList).distinct(Student::getSchool).distinct(Student::getLevel).toLists();  // by school, then by level
2.5 Simple Group-by Aggregation
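The groupBy* methods below return lists of FI2/FI3/FI4 tuples. With plain java.util.stream, the closest counterpart is Collectors.groupingBy plus a downstream collector, which produces a Map rather than a list of tuples. A hedged sketch of the groupBySum and groupByCount cases; GroupBySketch and its Student type are hypothetical, and the TreeMap is used only so that key order is predictable.

```java
import java.math.BigDecimal;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class GroupBySketch {
    // Hypothetical minimal row type
    static final class Student {
        private final String school;
        private final int age;

        Student(String school, int age) {
            this.school = school;
            this.age = age;
        }

        String getSchool() { return school; }
        int getAge() { return age; }
    }

    // Sum of ages per school as BigDecimal (compare: groupBySum(Student::getSchool, Student::getAge))
    static Map<String, BigDecimal> sumAgeBySchool(List<Student> rows) {
        return rows.stream().collect(Collectors.groupingBy(
                Student::getSchool,
                TreeMap::new,
                Collectors.reducing(BigDecimal.ZERO, (Student s) -> BigDecimal.valueOf(s.getAge()), BigDecimal::add)));
    }

    // Number of rows per school (compare: groupByCount(Student::getSchool))
    static Map<String, Long> countBySchool(List<Student> rows) {
        return rows.stream().collect(Collectors.groupingBy(Student::getSchool, TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        List<Student> rows = List.of(new Student("A", 10), new Student("B", 7), new Student("A", 5));
        System.out.println(sumAgeBySchool(rows)); // {A=15, B=7}
        System.out.println(countBySchool(rows));  // {A=2, B=1}
    }
}
```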
JDFrame<Student> frame = JDFrame.from(studentList);
List<FI2<String, BigDecimal>> a = frame.groupBySum(Student::getSchool, Student::getAge).toLists();        // sum of ages per school
List<FI2<String, Integer>> a2 = frame.groupByMaxValue(Student::getSchool, Student::getAge).toLists();    // highest age per school
List<FI2<String, Student>> a3 = frame.groupByMax(Student::getSchool, Student::getAge).toLists();         // oldest student per school
List<FI2<String, Integer>> a4 = frame.groupByMinValue(Student::getSchool, Student::getAge).toLists();    // lowest age per school
List<FI2<String, Long>> a5 = frame.groupByCount(Student::getSchool).toLists();                           // number of students per school
List<FI2<String, BigDecimal>> a6 = frame.groupByAvg(Student::getSchool, Student::getAge).toLists();      // average age per school
List<FI3<String, BigDecimal, Long>> a7 = frame.groupBySumCount(Student::getSchool, Student::getAge).toLists(); // sum and count per school
List<FI3<String, String, BigDecimal>> a8 = frame.groupBySum(Student::getSchool, Student::getLevel, Student::getAge).toLists(); // per school and level
List<FI4<String, String, String, BigDecimal>> a9 = frame.groupBySum(Student::getSchool, Student::getLevel, Student::getName, Student::getAge).toLists(); // per school, level and name
2.6 Sorting
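In plain Java, ordering by several keys means composing a single Comparator. A hedged sketch of "age descending, then level ascending"; MultiKeySort and its Student type are hypothetical, and this is standard java.util.Comparator usage rather than JDFrame's implementation.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class MultiKeySort {
    // Hypothetical minimal row type
    static final class Student {
        private final String name;
        private final int age;
        private final String level;

        Student(String name, int age, String level) {
            this.name = name;
            this.age = age;
            this.level = level;
        }

        String getName() { return name; }
        int getAge() { return age; }
        String getLevel() { return level; }
    }

    // ORDER BY age DESC, level ASC: reversed() affects only the age key,
    // because thenComparing is appended after the reversal
    static final Comparator<Student> AGE_DESC_THEN_LEVEL =
            Comparator.comparingInt(Student::getAge).reversed()
                    .thenComparing(Student::getLevel);

    // Returns a sorted copy; List.sort is stable, so complete ties keep their input order
    static List<Student> sorted(List<Student> rows) {
        List<Student> copy = new ArrayList<>(rows);
        copy.sort(AGE_DESC_THEN_LEVEL);
        return copy;
    }

    public static void main(String[] args) {
        List<Student> rows = List.of(
                new Student("a", 12, "G2"),
                new Student("b", 14, "G1"),
                new Student("c", 12, "G1"));
        for (Student s : sorted(rows)) {
            System.out.println(s.getName() + " " + s.getAge() + " " + s.getLevel());
        }
        // b 14 G1
        // c 12 G1
        // a 12 G2
    }
}
```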
SDFrame.read(studentList).sortDesc(Student::getAge);                              // age descending
SDFrame.read(studentList).sortDesc(Student::getAge).sortAsc(Student::getLevel);
SDFrame.read(studentList).sortAsc(Student::getAge);                               // age ascending
// custom Comparator; the key is a String, so getId() values compare as text ("10" sorts before "9")
SDFrame.read(studentList).sortAsc(Comparator.comparing(e -> e.getLevel() + e.getId()));
2.7 Joining Matrices
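The join call below takes two lambdas: a match condition and a function that builds an output row from a matched pair. A plain-Java join with that shape is the nested-loop inner join, which tests every left row against every right row. This is a hedged sketch; NestedLoopJoin.join is a hypothetical helper, and the source does not say whether JDFrame's join is an inner join or how it is implemented.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;
import java.util.function.BiPredicate;

class NestedLoopJoin {
    // Inner join: emits mapper(a, b) for every pair (a, b) that satisfies on(a, b)
    static <A, B, R> List<R> join(List<A> left, List<B> right,
                                  BiPredicate<A, B> on, BiFunction<A, B, R> mapper) {
        List<R> out = new ArrayList<>();
        for (A a : left) {
            for (B b : right) {
                if (on.test(a, b)) {
                    out.add(mapper.apply(a, b));
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> students = List.of("Ann@S1", "Bob@S2", "Cid@S1");
        List<String> schools = List.of("S1", "S3");
        List<String> rows = join(students, schools,
                (s, school) -> s.endsWith("@" + school),                          // match condition
                (s, school) -> s.substring(0, s.indexOf('@')) + " -> " + school); // output row
        System.out.println(rows); // [Ann -> S1, Cid -> S1]
    }
}
```

Nested loops cost O(n·m) comparisons, which is the price of accepting an arbitrary predicate. For a plain equality condition such as matching on school, indexing the right side in a HashMap first would bring this down to roughly O(n + m).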
SDFrame<Student> sdf = SDFrame.read(studentList);
SDFrame<FI2<String, BigDecimal>> sdf2 = SDFrame.read(studentList)
        .whereNotNull(Student::getAge)
        .whereBetween(Student::getAge, 9, 16)
        .groupBySum(Student::getSchool, Student::getScore)
        .sortDesc(FI2::getC2)
        .cutFirst(10);
// first lambda: join condition (student's school equals the group key c1)
// second lambda: builds the output row for a matched pair
SDFrame<UserInfo> frame = sdf.join(sdf2, (a, b) -> a.getSchool().equals(b.getC1()), (a, b) -> {
    UserInfo ui = new UserInfo();
    ui.setKey1(a.getSchool());
    ui.setKey2(b.getC2().intValue());        // school's total score as an int
    ui.setKey3(String.valueOf(a.getId()));
    return ui;
});
frame.show(5);
2.8 Other Utilities
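Three of the utilities in this section (fixed-size partitioning, ranking where equal values share a rank, and filling in missing dimension values) have short plain-Java counterparts. A hedged sketch; the Utilities class and its method signatures are hypothetical. The source does not say whether tied ranks skip numbers, so competitionRanks uses standard competition ranking (1, 2, 2, 4), which may differ from what addRankingSameColDesc produces.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

class Utilities {
    // Consecutive sub-lists of `size` elements; the last one may be shorter (compare: partition)
    static <T> List<List<T>> partition(List<T> rows, int size) {
        if (size <= 0) {
            throw new IllegalArgumentException("size must be positive");
        }
        List<List<T>> parts = new ArrayList<>();
        for (int i = 0; i < rows.size(); i += size) {
            parts.add(new ArrayList<>(rows.subList(i, Math.min(i + size, rows.size()))));
        }
        return parts;
    }

    // Ranks for values that are already sorted: equal neighbours share a rank,
    // and the next distinct value jumps to its position (competition ranking)
    static List<Integer> competitionRanks(List<Integer> sorted) {
        List<Integer> ranks = new ArrayList<>();
        for (int i = 0; i < sorted.size(); i++) {
            boolean tie = i > 0 && sorted.get(i).equals(sorted.get(i - 1));
            ranks.add(tie ? ranks.get(i - 1) : i + 1);
        }
        return ranks;
    }

    // Appends factory(value) for every value in allValues that no row has under key (compare: replenish)
    static <T, K> List<T> replenish(List<T> rows, Function<T, K> key, List<K> allValues, Function<K, T> factory) {
        Set<K> present = new HashSet<>();
        for (T row : rows) {
            present.add(key.apply(row));
        }
        List<T> out = new ArrayList<>(rows);
        for (K value : allValues) {
            if (!present.contains(value)) {
                out.add(factory.apply(value));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(partition(List.of(1, 2, 3, 4, 5, 6, 7), 3)); // [[1, 2, 3], [4, 5, 6], [7]]
        System.out.println(competitionRanks(List.of(16, 14, 14, 12)));  // [1, 2, 2, 4]
        System.out.println(replenish(List.of("S1", "S3"), s -> s,
                List.of("S1", "S2", "S3", "S4"), s -> s + "(empty)"));  // [S1, S3, S2(empty), S4(empty)]
    }
}
```

Sequential numbering, as in addSortNoCol, needs no helper of its own: after sorting, the row at index i simply gets i + 1.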
Percentage conversion:
SDFrame<Student> map2 = SDFrame.read(studentList)
        .mapPercent(Student::getScore, Student::setScore, 2);
Partition into sub-lists of size 5:
List<List<Student>> parts = SDFrame.read(studentList).partition(5).toLists();
Generate sequential numbers (sort-no) after sorting by age:
SDFrame.read(studentList)
        .sortDesc(Student::getAge)
        .addSortNoCol(Student::setRank)   // writes 1, 2, 3, ... into rank
        .show(30);
Generate ranking numbers where equal values share the same rank:
SDFrame<Student> df = SDFrame.read(studentList)
        .addRankingSameColDesc(Student::getAge, Student::setRank);  // rank by age, highest first; equal ages share a rank
df.show(20);
Replenish missing dimension entries (e.g., schools or grades):
// Fill in missing schools: add a row for each school in allDim that has no student
List<String> allDim = Arrays.asList("一中", "二中", "三中", "四中");  // No. 1 to No. 4 Middle School
SDFrame.read(studentList).replenish(Student::getSchool, allDim, school -> new Student(school)).show();
// Fill in missing grades within each school
SDFrame.read(studentList).replenish(Student::getSchool, Student::getLevel, (school, level) -> new Student(school, level)).show(30);
Conclusion
JDFrame and SDFrame share the same API. JDFrame applies each operation immediately, whereas SDFrame, like a Java Stream, is lazy and applies its operations only when a terminal operation runs. Choose SDFrame for one-shot pipelines and JDFrame when you need to keep and reuse intermediate results, as in the DataFrame model.
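The lazy/eager split mirrors plain Java: a Stream pipeline does no work until a terminal operation runs and can be consumed only once, whereas a materialized List can be reused freely. A hedged sketch using only java.util.stream (LazyVsEager is a hypothetical class name; it illustrates the two evaluation models, not JDFrame's internals).

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class LazyVsEager {
    // Counts how many times the mapping function actually runs
    static final AtomicInteger calls = new AtomicInteger();

    static Stream<Integer> doubledLazily(List<Integer> xs) {
        return xs.stream().map(x -> {
            calls.incrementAndGet();
            return x * 2;
        });
    }

    public static void main(String[] args) {
        List<Integer> xs = List.of(1, 2, 3);

        Stream<Integer> lazy = doubledLazily(xs);                 // pipeline built, nothing executed
        System.out.println(calls.get());                          // 0
        List<Integer> result = lazy.collect(Collectors.toList()); // terminal operation runs map()
        System.out.println(calls.get());                          // 3
        // Calling lazy.count() now would throw IllegalStateException: a stream is single-use

        // Eager style: the materialized intermediate result can be read as often as needed
        System.out.println(result.size() + " " + result);         // 3 [2, 4, 6]
    }
}
```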
The library still has many less‑used APIs not listed here, and the author hopes for a future JVM‑level "Pandas" in Java.
Top Architecture Tech Stack
Sharing Java and Python tech insights, with occasional practical development tool tips.