
JDFrame/SDFrame: A JVM‑Level DataFrame‑like API for Simplified Stream Processing in Java

This article introduces JDFrame/SDFrame, a Java library that offers a DataFrame-style, semantic API for stream processing inside the JVM. It covers a quick start with the Maven dependency and walks through examples of filtering, aggregation, distinct, grouping, sorting, joining, partitioning, ranking, and data replenishment, with the goal of helping developers write concise, readable data-processing code.


Because remembering Java Stream APIs can be cumbersome, the author created JDFrame/SDFrame, a JVM‑level DataFrame‑like tool that offers a more semantic and concise way to work with streams, similar to Spark or Pandas.

0. Introduction

The library aims to simplify stream operations, avoid hard-coded field names by referring to fields through method references such as Student::getAge, and allow lambda expressions for field processing.

1. Quick Start

1.1 Add Maven Dependency

<dependency>
    <groupId>io.github.burukeyou</groupId>
    <artifactId>jdframe</artifactId>
    <version>0.0.2</version>
</dependency>

1.2 Example

Calculate the total score of students aged 9‑16 for each school and list the top two schools.

static List<Student> studentList = new ArrayList<>();

static {
    // id, name, school ("一中" = No. 1 Middle School), level ("一年级" = Grade 1), age, score
    studentList.add(new Student(1,"a","一中","一年级",11,new BigDecimal(1)));
    // ... other students omitted for brevity
}

SDFrame<FI2<String, BigDecimal>> sdf2 = SDFrame.read(studentList)
        .whereNotNull(Student::getAge)
        .whereBetween(Student::getAge,9,16)
        .groupBySum(Student::getSchool, Student::getScore)
        .sortDesc(FI2::getC2)
        .cutFirst(2);

sdf2.show();

Output:

c1  c2
三中 10
二中 7
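For comparison, the same query can be written with plain java.util.stream. The sketch below is only an illustration of what the chain above computes: the trimmed-down Student class, the sample rows, and the names QuickStartStreamSketch and topSchools are assumptions made for this example and are not part of the library.

```java
import java.math.BigDecimal;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class QuickStartStreamSketch {

    // Hypothetical, trimmed-down stand-in for the article's Student class.
    static final class Student {
        final String school;
        final Integer age;
        final BigDecimal score;

        Student(String school, Integer age, BigDecimal score) {
            this.school = school;
            this.age = age;
            this.score = score;
        }
    }

    // Made-up sample rows, not the article's data set.
    static List<Student> sample() {
        return Arrays.asList(
                new Student("A", 11, new BigDecimal(1)),
                new Student("A", 12, new BigDecimal(2)),
                new Student("B", 10, new BigDecimal(5)),
                new Student("B", 20, new BigDecimal(100)),   // age outside 9..16, ignored
                new Student("C", 9, new BigDecimal(1)),
                new Student("C", null, new BigDecimal(50))); // null age, ignored
    }

    // Total score per school for students aged 9 to 16 (inclusive), top n by total.
    static List<Map.Entry<String, BigDecimal>> topSchools(List<Student> students, int n) {
        Map<String, BigDecimal> totals = students.stream()
                .filter(s -> s.age != null)
                .filter(s -> s.age >= 9 && s.age <= 16)
                .collect(Collectors.groupingBy((Student s) -> s.school,
                        Collectors.reducing(BigDecimal.ZERO, (Student s) -> s.score, BigDecimal::add)));
        return totals.entrySet().stream()
                .sorted(Map.Entry.<String, BigDecimal>comparingByValue().reversed())
                .limit(n)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        for (Map.Entry<String, BigDecimal> e : topSchools(sample(), 2)) {
            System.out.println(e.getKey() + " " + e.getValue());
        }
    }
}
```

The stream version needs an intermediate map and a second pipeline for the sort and cut, which is the kind of ceremony the SDFrame chain hides.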

2. API Examples

2.1 Matrix Information

void show(int n);                         // print matrix
List<String> columns();                   // header names
<R> List<R> col(Function<T,R> function);  // get column values
T head();                                 // first row
List<T> head(int n);                      // first n rows
T tail();                                 // last row
List<T> tail(int n);                      // last n rows

2.2 Filtering

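// API overview: each line shows one filter. Filters chained on the same frame are combined with AND.
SDFrame.read(studentList)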
        .whereBetween(Student::getAge,3,6) // [3,6]
        .whereBetweenR(Student::getAge,3,6) // (3,6]
        .whereBetweenL(Student::getAge,3,6) // [3,6)
        .whereNotNull(Student::getName) // not null or empty
        .whereGt(Student::getAge,3)
        .whereGe(Student::getAge,3)
        .whereLt(Student::getAge,3)
        .whereIn(Student::getAge, Arrays.asList(3,7,8))
        .whereNotIn(Student::getAge, Arrays.asList(3,7,8))
        .whereEq(Student::getAge,3)
        .whereNotEq(Student::getAge,3)
        .whereLike(Student::getName,"jay") // %jay%
        .whereLikeLeft(Student::getName,"jay") // jay%
        .whereLikeRight(Student::getName,"jay"); // %jay
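The interval comments above are easy to mix up, so here is a small plain-Java sketch of the three boundary conventions as predicates. It mirrors only the semantics stated in the comments ([3,6], (3,6], [3,6)); the class FilterSemanticsSketch and its helper names are hypothetical and do not reflect how the library implements its filters.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

final class FilterSemanticsSketch {

    // [lo, hi]: both ends inclusive, like the whereBetween comment.
    static Predicate<Integer> between(int lo, int hi) {
        return v -> v != null && v >= lo && v <= hi;
    }

    // (lo, hi]: left end exclusive, like the whereBetweenR comment.
    static Predicate<Integer> betweenR(int lo, int hi) {
        return v -> v != null && v > lo && v <= hi;
    }

    // [lo, hi): right end exclusive, like the whereBetweenL comment.
    static Predicate<Integer> betweenL(int lo, int hi) {
        return v -> v != null && v >= lo && v < hi;
    }

    static List<Integer> keep(List<Integer> values, Predicate<Integer> condition) {
        return values.stream().filter(condition).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> ages = Arrays.asList(2, 3, 4, 5, 6, 7);
        System.out.println(keep(ages, between(3, 6)));   // [3, 4, 5, 6]
        System.out.println(keep(ages, betweenR(3, 6)));  // [4, 5, 6]
        System.out.println(keep(ages, betweenL(3, 6)));  // [3, 4, 5]
        // Two filters applied one after the other act as a single AND-ed condition.
        System.out.println(keep(keep(ages, between(3, 6)), v -> v > 4)); // [5, 6]
    }
}
```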

2.3 Aggregation

JDFrame<Student> frame = JDFrame.read(studentList);
Student s1 = frame.max(Student::getAge);
Integer s2 = frame.maxValue(Student::getAge);
Student s3 = frame.min(Student::getAge);
Integer s4 = frame.minValue(Student::getAge);
BigDecimal s5 = frame.avg(Student::getAge);
BigDecimal s6 = frame.sum(Student::getAge);
MaxMin<Student> s7 = frame.maxMin(Student::getAge);
MaxMin<Integer> s8 = frame.maxMinValue(Student::getAge);
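To make the difference between the row-returning and value-returning calls concrete (max versus maxValue, for example), the following sketch computes the same kinds of results with plain streams. The Student stand-in, the null handling, and the two-decimal HALF_UP rounding for the average are assumptions chosen for this illustration; the article does not state how the library treats nulls or rounds.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;

final class AggregationSketch {

    // Hypothetical, trimmed-down stand-in for the article's Student class.
    static final class Student {
        final String name;
        final Integer age;

        Student(String name, Integer age) {
            this.name = name;
            this.age = age;
        }
    }

    static List<Student> sample() {
        return Arrays.asList(new Student("a", 10), new Student("b", 12),
                new Student("c", 15), new Student("d", null));
    }

    // The row with the largest age (compare frame.max); null ages are skipped here.
    static Optional<Student> maxByAge(List<Student> students) {
        return students.stream()
                .filter(s -> s.age != null)
                .max(Comparator.comparing((Student s) -> s.age));
    }

    // The largest age itself (compare frame.maxValue).
    static Optional<Integer> maxAge(List<Student> students) {
        return maxByAge(students).map(s -> s.age);
    }

    // Sum of all non-null ages as a BigDecimal (compare frame.sum).
    static BigDecimal sumAge(List<Student> students) {
        return students.stream()
                .filter(s -> s.age != null)
                .map(s -> BigDecimal.valueOf(s.age.longValue()))
                .reduce(BigDecimal.ZERO, BigDecimal::add);
    }

    // Average of the non-null ages, rounded to 2 decimals (rounding is an assumption).
    static BigDecimal avgAge(List<Student> students) {
        List<Student> withAge = students.stream()
                .filter(s -> s.age != null)
                .collect(Collectors.toList());
        if (withAge.isEmpty()) {
            return BigDecimal.ZERO;
        }
        return sumAge(withAge).divide(BigDecimal.valueOf(withAge.size()), 2, RoundingMode.HALF_UP);
    }

    public static void main(String[] args) {
        System.out.println(maxByAge(sample()).get().name); // c
        System.out.println(maxAge(sample()).get());        // 15
        System.out.println(sumAge(sample()));              // 37
        System.out.println(avgAge(sample()));              // 12.33
    }
}
```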

2.4 Distinct

Native streams only support object‑level deduplication; JDFrame adds field‑level distinct.

List<Student> std = SDFrame.read(studentList).distinct().toLists();
std = SDFrame.read(studentList).distinct(Student::getSchool).toLists();
std = SDFrame.read(studentList).distinct(e -> e.getSchool() + e.getLevel()).toLists();
std = SDFrame.read(studentList).distinct(Student::getSchool).distinct(Student::getLevel).toLists();
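As a rough picture of what field-level distinct means, the sketch below deduplicates a list by a derived key in plain Java. It assumes that the first row seen for each key is the one kept, which the article does not confirm for the library; DistinctByFieldSketch and distinctBy are hypothetical names. The composite key is built as a list rather than by string concatenation, because concatenated keys can collide ("ab" + "c" equals "a" + "bc").

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

final class DistinctByFieldSketch {

    // Hypothetical, trimmed-down stand-in for the article's Student class.
    static final class Student {
        final String school;
        final String level;

        Student(String school, String level) {
            this.school = school;
            this.level = level;
        }
    }

    static List<Student> sample() {
        return Arrays.asList(new Student("A", "1"), new Student("A", "2"),
                new Student("B", "1"), new Student("A", "1"));
    }

    // Keep the first element seen for each key, preserving encounter order.
    static <T, K> List<T> distinctBy(List<T> items, Function<? super T, ? extends K> key) {
        Map<K, T> firstSeen = new LinkedHashMap<>();
        for (T item : items) {
            firstSeen.putIfAbsent(key.apply(item), item);
        }
        return new ArrayList<>(firstSeen.values());
    }

    static List<Student> distinctBySchool(List<Student> students) {
        return distinctBy(students, s -> s.school);
    }

    // Composite key as a List, so ("ab", "c") and ("a", "bc") stay distinct.
    static List<Student> distinctBySchoolAndLevel(List<Student> students) {
        return distinctBy(students, s -> Arrays.asList(s.school, s.level));
    }

    public static void main(String[] args) {
        System.out.println(distinctBySchool(sample()).size());         // 2
        System.out.println(distinctBySchoolAndLevel(sample()).size()); // 3
    }
}
```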

2.5 Simple Group‑by Aggregation

JDFrame<Student> frame = JDFrame.read(studentList);
List<FI2<String, BigDecimal>> a = frame.groupBySum(Student::getSchool, Student::getAge).toLists();
List<FI2<String, Integer>> a2 = frame.groupByMaxValue(Student::getSchool, Student::getAge).toLists();
List<FI2<String, Student>> a3 = frame.groupByMax(Student::getSchool, Student::getAge).toLists();
List<FI2<String, Integer>> a4 = frame.groupByMinValue(Student::getSchool, Student::getAge).toLists();
List<FI2<String, Long>> a5 = frame.groupByCount(Student::getSchool).toLists();
List<FI2<String, BigDecimal>> a6 = frame.groupByAvg(Student::getSchool, Student::getAge).toLists();
List<FI3<String, BigDecimal, Long>> a7 = frame.groupBySumCount(Student::getSchool, Student::getAge).toLists();
List<FI3<String, String, BigDecimal>> a8 = frame.groupBySum(Student::getSchool, Student::getLevel, Student::getAge).toLists();
List<FI4<String, String, String, BigDecimal>> a9 = frame.groupBySum(Student::getSchool, Student::getLevel, Student::getName, Student::getAge).toLists();
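Each groupBy call above corresponds to a SQL-style "group by key, aggregate value". A plain-stream sketch of three of them follows; it uses int sums instead of the BigDecimal results shown above, and the Student stand-in, sample rows, and method names are assumptions for illustration only.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

final class GroupBySketch {

    // Hypothetical, trimmed-down stand-in for the article's Student class.
    static final class Student {
        final String school;
        final String level;
        final int age;

        Student(String school, String level, int age) {
            this.school = school;
            this.level = level;
            this.age = age;
        }
    }

    static List<Student> sample() {
        return Arrays.asList(new Student("A", "g1", 10), new Student("A", "g2", 11),
                new Student("B", "g1", 9), new Student("A", "g1", 12));
    }

    // select school, sum(age) ... group by school
    static Map<String, Integer> sumAgeBySchool(List<Student> students) {
        return students.stream().collect(Collectors.groupingBy(
                (Student s) -> s.school, Collectors.summingInt((Student s) -> s.age)));
    }

    // select school, count(*) ... group by school
    static Map<String, Long> countBySchool(List<Student> students) {
        return students.stream().collect(Collectors.groupingBy(
                (Student s) -> s.school, Collectors.counting()));
    }

    // select school, level, sum(age) ... group by school, level
    static Map<List<String>, Integer> sumAgeBySchoolAndLevel(List<Student> students) {
        return students.stream().collect(Collectors.groupingBy(
                (Student s) -> Arrays.asList(s.school, s.level),
                Collectors.summingInt((Student s) -> s.age)));
    }

    public static void main(String[] args) {
        // TreeMap is used only to print the keys in a stable order.
        System.out.println(new TreeMap<>(sumAgeBySchool(sample()))); // {A=33, B=9}
        System.out.println(new TreeMap<>(countBySchool(sample())));  // {A=3, B=1}
        System.out.println(sumAgeBySchoolAndLevel(sample()).get(Arrays.asList("A", "g1"))); // 22
    }
}
```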

2.6 Sorting

SDFrame.read(studentList).sortDesc(Student::getAge);
SDFrame.read(studentList).sortDesc(Student::getAge).sortAsc(Student::getLevel);
SDFrame.read(studentList).sortAsc(Student::getAge);
SDFrame.read(studentList).sortAsc(Comparator.comparing(e -> e.getLevel() + e.getId()));
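The article does not spell out how chained sortDesc and sortAsc calls combine. One plausible reading is a multi-key sort (age descending, then level ascending for ties); the sketch below shows that reading with a standard Comparator chain. Treat the interpretation, the Student stand-in, and the helper names as assumptions rather than documented library behavior.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

final class SortSketch {

    // Hypothetical, trimmed-down stand-in for the article's Student class.
    static final class Student {
        final String name;
        final Integer age;
        final String level;

        Student(String name, Integer age, String level) {
            this.name = name;
            this.age = age;
            this.level = level;
        }
    }

    static List<Student> sample() {
        return Arrays.asList(new Student("a", 10, "g2"), new Student("b", 12, "g1"),
                new Student("c", 12, "g0"), new Student("d", 15, "g9"));
    }

    // Age descending first; rows with the same age are ordered by level ascending.
    static List<Student> byAgeDescThenLevelAsc(List<Student> students) {
        List<Student> copy = new ArrayList<>(students); // leave the input untouched
        copy.sort(Comparator.comparing((Student s) -> s.age).reversed()
                .thenComparing(s -> s.level));
        return copy;
    }

    static String names(List<Student> students) {
        return students.stream().map(s -> s.name).collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        System.out.println(names(byAgeDescThenLevelAsc(sample()))); // d,c,b,a
    }
}
```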

2.7 Joining Matrices

SDFrame<Student> sdf = SDFrame.read(studentList);
SDFrame<FI2<String, BigDecimal>> sdf2 = SDFrame.read(studentList)
        .whereNotNull(Student::getAge)
        .whereBetween(Student::getAge,9,16)
        .groupBySum(Student::getSchool, Student::getScore)
        .sortDesc(FI2::getC2)
        .cutFirst(10);

SDFrame<UserInfo> frame = sdf.join(sdf2, (a,b) -> a.getSchool().equals(b.getC1()), (a,b) -> {
    UserInfo ui = new UserInfo();
    ui.setKey1(a.getSchool());
    ui.setKey2(b.getC2().intValue());
    ui.setKey3(String.valueOf(a.getId()));
    return ui;
});
frame.show(5);
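The join call above takes a match condition and a merge function. A minimal nested-loop version of that idea in plain Java looks like the sketch below. It assumes inner-join semantics (left rows without a match are dropped), which the article does not state explicitly, and all names and sample rows are invented for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.BiFunction;
import java.util.function.BiPredicate;

final class NestedLoopJoinSketch {

    // Inner join: emit merge(a, b) for every pair (a, b) that satisfies the condition.
    static <A, B, R> List<R> join(List<A> left, List<B> right,
                                  BiPredicate<A, B> on, BiFunction<A, B, R> merge) {
        List<R> out = new ArrayList<>();
        for (A a : left) {
            for (B b : right) {
                if (on.test(a, b)) {
                    out.add(merge.apply(a, b));
                }
            }
        }
        return out;
    }

    static List<String> demo() {
        // Made-up rows: {school, studentId} on the left, {school, totalScore} on the right.
        List<String[]> students = Arrays.asList(
                new String[]{"A", "1"}, new String[]{"B", "2"}, new String[]{"C", "3"});
        List<String[]> totals = Arrays.asList(
                new String[]{"A", "10"}, new String[]{"B", "7"});
        return join(students, totals,
                (s, t) -> s[0].equals(t[0]),            // match on school
                (s, t) -> s[0] + "-" + t[1] + "-" + s[1]); // school-total-id
    }

    public static void main(String[] args) {
        System.out.println(demo()); // [A-10-1, B-7-2]
    }
}
```

A nested loop costs O(n × m) comparisons; for large inputs with an equality condition, indexing one side in a map first is the usual alternative.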

2.8 Other Utilities

Percentage conversion:

SDFrame<Student> map2 = SDFrame.read(studentList)
        .mapPercent(Student::getScore, Student::setScore, 2);

Partition into sub‑lists of size 5:

List<List<Student>> parts = SDFrame.read(studentList).partition(5).toLists();
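Partitioning here means cutting the list into consecutive chunks of a fixed size, with a shorter final chunk when the size does not divide evenly. A plain-Java sketch of that behavior, with hypothetical names and under the assumption that the library chunks in encounter order:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class PartitionSketch {

    // Split items into consecutive sub-lists of at most `size` elements.
    static <T> List<List<T>> partition(List<T> items, int size) {
        if (size <= 0) {
            throw new IllegalArgumentException("size must be positive");
        }
        List<List<T>> parts = new ArrayList<>();
        for (int start = 0; start < items.size(); start += size) {
            int end = Math.min(start + size, items.size());
            parts.add(new ArrayList<>(items.subList(start, end))); // copy, not a view
        }
        return parts;
    }

    public static void main(String[] args) {
        List<Integer> ids = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12);
        System.out.println(partition(ids, 5)); // [[1, 2, 3, 4, 5], [6, 7, 8, 9, 10], [11, 12]]
    }
}
```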

Generate sequential numbers (sort‑no) after sorting by age:

SDFrame.read(studentList)
        .sortDesc(Student::getAge)
        .addSortNoCol(Student::setRank)
        .show(30);

Generate ranking numbers where equal values share the same rank:

SDFrame<Student> df = SDFrame.read(studentList)
        .addRankingSameColDesc(Student::getAge, Student::setRank);
df.show(20);
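The difference between sequence numbers and shared ranks is easiest to see on a small sorted array. The article does not say whether the number after a tie skips ahead (1, 2, 2, 4) or continues (1, 2, 2, 3), so the sketch below shows both variants next to plain sequence numbers. All names are hypothetical.

```java
import java.util.Arrays;

final class RankingSketch {

    // Plain sequence numbers 1..n: tied values still get different numbers.
    static int[] sortNo(int[] sortedDesc) {
        int[] no = new int[sortedDesc.length];
        for (int i = 0; i < no.length; i++) {
            no[i] = i + 1;
        }
        return no;
    }

    // Ties share a rank and the next rank skips ahead: 1, 2, 2, 4.
    static int[] competitionRank(int[] sortedDesc) {
        int[] rank = new int[sortedDesc.length];
        for (int i = 0; i < rank.length; i++) {
            boolean tie = i > 0 && sortedDesc[i] == sortedDesc[i - 1];
            rank[i] = tie ? rank[i - 1] : i + 1;
        }
        return rank;
    }

    // Ties share a rank and the next rank continues without a gap: 1, 2, 2, 3.
    static int[] denseRank(int[] sortedDesc) {
        int[] rank = new int[sortedDesc.length];
        for (int i = 0; i < rank.length; i++) {
            if (i == 0) {
                rank[i] = 1;
            } else {
                boolean tie = sortedDesc[i] == sortedDesc[i - 1];
                rank[i] = tie ? rank[i - 1] : rank[i - 1] + 1;
            }
        }
        return rank;
    }

    public static void main(String[] args) {
        int[] agesDesc = {15, 12, 12, 10}; // already sorted in descending order
        System.out.println(Arrays.toString(sortNo(agesDesc)));          // [1, 2, 3, 4]
        System.out.println(Arrays.toString(competitionRank(agesDesc))); // [1, 2, 2, 4]
        System.out.println(Arrays.toString(denseRank(agesDesc)));       // [1, 2, 2, 3]
    }
}
```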

Replenish missing dimension entries (e.g., schools or grades):

// Fill missing schools
List<String> allDim = Arrays.asList("一中","二中","三中","四中");
SDFrame.read(studentList).replenish(Student::getSchool, allDim, school -> new Student(school)).show();

// Fill missing grades per school
SDFrame.read(studentList).replenish(Student::getSchool, Student::getLevel, (school, level) -> new Student(school, level)).show(30);
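Replenishing means adding a placeholder row for every dimension value that has no data yet, which is handy before rendering charts or reports. The single-dimension case can be sketched in plain Java as below. It assumes placeholders are appended after the existing rows; the article does not specify where the library puts them, and the class and method names are invented for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;
import java.util.stream.Collectors;

final class ReplenishSketch {

    // For each value in allDim that no item maps to, append factory(value).
    static <T, K> List<T> replenish(List<T> items, Function<T, K> dim,
                                    List<K> allDim, Function<K, T> factory) {
        Set<K> present = items.stream().map(dim)
                .collect(Collectors.toCollection(HashSet::new));
        List<T> out = new ArrayList<>(items);
        for (K value : allDim) {
            if (present.add(value)) { // true only if the value was missing so far
                out.add(factory.apply(value));
            }
        }
        return out;
    }

    static List<String> demo() {
        // Made-up rows in the form "school/name"; the dimension is the school part.
        List<String> rows = Arrays.asList("A/x", "C/y");
        List<String> allSchools = Arrays.asList("A", "B", "C", "D");
        return replenish(rows,
                row -> row.substring(0, row.indexOf('/')),
                allSchools,
                school -> school + "/-"); // placeholder row for a missing school
    }

    public static void main(String[] args) {
        System.out.println(demo()); // [A/x, C/y, B/-, D/-]
    }
}
```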

Conclusion

JDFrame and SDFrame share the same API; JDFrame applies operations immediately, while SDFrame behaves like Java Stream, applying changes only on terminal operations. Choose SDFrame for one‑shot stream processing and JDFrame when intermediate results are needed, similar to the DataFrame model.
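Since SDFrame is described as behaving like a Java Stream, the two stream properties that matter in practice are worth seeing directly: intermediate operations do nothing until a terminal operation runs, and a stream can be consumed only once. The sketch below demonstrates both with java.util.stream alone; it says nothing about the internals of SDFrame or JDFrame, and the class name is hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class LazyVsEagerSketch {

    // Returns {filter calls before the terminal op, calls after it, number of results}.
    static int[] evaluationCounts() {
        AtomicInteger calls = new AtomicInteger();
        Stream<Integer> lazy = Arrays.asList(1, 2, 3, 4).stream()
                .filter(v -> {
                    calls.incrementAndGet(); // counts how often the filter really runs
                    return v % 2 == 0;
                });
        int before = calls.get();                               // nothing has run yet
        List<Integer> evens = lazy.collect(Collectors.toList()); // terminal operation
        int after = calls.get();                                // filter ran once per element
        return new int[]{before, after, evens.size()};
    }

    // A stream cannot be traversed twice; the second terminal operation throws.
    static boolean streamIsSingleUse() {
        Stream<Integer> stream = Stream.of(1, 2, 3);
        stream.count();
        try {
            stream.count();
            return false;
        } catch (IllegalStateException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(evaluationCounts())); // [0, 4, 2]
        System.out.println(streamIsSingleUse());                 // true
    }
}
```

An eagerly evaluated structure, by contrast, holds its rows after every step, which is why it suits cases where an intermediate result is read more than once.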

The library still has many less‑used APIs not listed here, and the author hopes for a future JVM‑level "Pandas" in Java.
