Simplify Java Stream Operations with JDFrame: A Semantic DataFrame API

This article introduces JDFrame/SDFrame, a JVM-level DataFrame library that offers a more semantic and concise API than raw Java 8 streams. It covers a quick start, walks through the API by category (filtering, aggregation, distinct, grouping, sorting, joining, and cutting), and explains the difference between SDFrame and JDFrame with practical code examples.

Java Architect Essentials

Author "Architect" introduces a JVM‑level DataFrame tool that offers a semantic and simplified API for Java 8 stream processing, inspired by DataFrame models like Spark, tablesaw, and joinery.

1. Quick Start

1.1 Add Dependency

<dependency>
    <groupId>io.github.burukeyou</groupId>
    <artifactId>jdframe</artifactId>
    <version>0.0.4</version>
</dependency>

1.2 Example

Calculate the total score of each school for students whose age is not null and between 9 and 16, then retrieve the top two schools.

static List<Student> studentList = new ArrayList<>();
static {
    studentList.add(new Student(1, "a", "一中", "一年级", 11, new BigDecimal(1)));
    studentList.add(new Student(2, "a", "一中", "一年级", 11, new BigDecimal(1)));
    studentList.add(new Student(3, "b", "一中", "三年级", 12, new BigDecimal(2)));
    studentList.add(new Student(4, "c", "二中", "一年级", 13, new BigDecimal(3)));
    studentList.add(new Student(5, "d", "二中", "一年级", 14, new BigDecimal(4)));
    studentList.add(new Student(6, "e", "三中", "二年级", 14, new BigDecimal(5)));
    studentList.add(new Student(7, "e", "三中", "二年级", 15, new BigDecimal(5)));
}
// Equivalent SQL:
// select school, sum(score) from students
// where age is not null and age >=9 and age <=16
// group by school order by sum(score) desc limit 2
SDFrame<FI2<String, BigDecimal>> sdf2 = SDFrame.read(studentList)
        .whereNotNull(Student::getAge)
        .whereBetween(Student::getAge, 9, 16)
        .groupBySum(Student::getSchool, Student::getScore)
        .sortDesc(FI2::getC2)
        .cutFirst(2);

sdf2.show();

Output:

c1   c2
三中 10
二中 7
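
For comparison, here is a sketch of the same query written against plain java.util.stream, to show what the fluent calls above condense. The Student class is a trimmed, hypothetical stand-in for the article's class (only school, age, and score are kept), and topSchools and sampleStudents are names invented for this example.

```java
import java.math.BigDecimal;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class QuickStartWithStreams {

    // Hypothetical, trimmed-down Student: only the fields this query needs.
    static class Student {
        private final String school;
        private final Integer age;
        private final BigDecimal score;

        Student(String school, Integer age, BigDecimal score) {
            this.school = school;
            this.age = age;
            this.score = score;
        }

        String getSchool() { return school; }
        Integer getAge() { return age; }
        BigDecimal getScore() { return score; }
    }

    // Same rows as studentList above, reduced to (school, age, score).
    static List<Student> sampleStudents() {
        return Arrays.asList(
                new Student("一中", 11, new BigDecimal(1)),
                new Student("一中", 11, new BigDecimal(1)),
                new Student("一中", 12, new BigDecimal(2)),
                new Student("二中", 13, new BigDecimal(3)),
                new Student("二中", 14, new BigDecimal(4)),
                new Student("三中", 14, new BigDecimal(5)),
                new Student("三中", 15, new BigDecimal(5)));
    }

    // where age is not null and age between 9 and 16,
    // group by school, sum(score), order by sum desc, limit n
    static List<Map.Entry<String, BigDecimal>> topSchools(List<Student> students, int limit) {
        Map<String, BigDecimal> sumBySchool = students.stream()
                .filter(s -> s.getAge() != null && s.getAge() >= 9 && s.getAge() <= 16)
                .collect(Collectors.groupingBy(
                        Student::getSchool,
                        Collectors.reducing(BigDecimal.ZERO, Student::getScore, BigDecimal::add)));

        return sumBySchool.entrySet().stream()
                .sorted(Map.Entry.<String, BigDecimal>comparingByValue().reversed())
                .limit(limit)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        for (Map.Entry<String, BigDecimal> e : topSchools(sampleStudents(), 2)) {
            System.out.println(e.getKey() + " " + e.getValue());
        }
    }
}
```

With the sample rows, the two entries returned are 三中 with 10 and 二中 with 7. The grouping, the intermediate map, and the re-streaming of its entries are the parts that groupBySum, sortDesc, and cutFirst fold into single calls.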

2. API Cases

2.1 Matrix View

void show(int n);               // print matrix
List<String> columns();         // get column names
<R> List<R> col(Function<T,R> f); // get a column
T head();                       // first element
List<T> head(int n);            // first n elements
T tail();                       // last element
List<T> tail(int n);           // last n elements
List<T> page(int page, int pageSize); // pagination

2.2 Filtering

// Each filter is listed for reference; in practice, chain only the ones you need.
SDFrame.read(studentList)
    .whereBetween(Student::getAge, 3, 6)          // [3,6]
    .whereBetweenR(Student::getAge, 3, 6)         // (3,6]
    .whereBetweenL(Student::getAge, 3, 6)         // [3,6)
    .whereNotNull(Student::getName)               // name not null
    .whereGt(Student::getAge, 3)                  // >3
    .whereGe(Student::getAge, 3)                  // >=3
    .whereLt(Student::getAge, 3)                  // <3
    .whereIn(Student::getAge, Arrays.asList(3,7,8))      // in (3,7,8)
    .whereNotIn(Student::getAge, Arrays.asList(3,7,8))   // not in (3,7,8)
    .whereEq(Student::getAge, 3)                  // = 3
    .whereNotEq(Student::getAge, 3)               // != 3
    .whereLike(Student::getName, "jay")           // like '%jay%'
    .whereLikeLeft(Student::getName, "jay")       // like 'jay%'
    .whereLikeRight(Student::getName, "jay");     // like '%jay'
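
The three interval variants differ only in which endpoint is included. The sketch below spells out those bounds as plain Java predicates; it is not JDFrame code. The names closed, openLeft, openRight, and keep are invented for this example, and dropping null values is an assumption of the sketch.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

public class BetweenSemantics {

    // [lo, hi]: both endpoints included, matching the whereBetween comment.
    static Predicate<Integer> closed(int lo, int hi) {
        return v -> v != null && v >= lo && v <= hi;
    }

    // (lo, hi]: lower endpoint excluded, matching the whereBetweenR comment.
    static Predicate<Integer> openLeft(int lo, int hi) {
        return v -> v != null && v > lo && v <= hi;
    }

    // [lo, hi): upper endpoint excluded, matching the whereBetweenL comment.
    static Predicate<Integer> openRight(int lo, int hi) {
        return v -> v != null && v >= lo && v < hi;
    }

    // Keeps the values accepted by the predicate, in their original order.
    static List<Integer> keep(List<Integer> values, Predicate<Integer> rule) {
        return values.stream().filter(rule).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> ages = Arrays.asList(2, 3, 4, 5, 6, 7, null);
        System.out.println(keep(ages, closed(3, 6)));    // [3, 4, 5, 6]
        System.out.println(keep(ages, openLeft(3, 6)));  // [4, 5, 6]
        System.out.println(keep(ages, openRight(3, 6))); // [3, 4, 5]
    }
}
```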

2.3 Aggregation

JDFrame<Student> frame = JDFrame.read(studentList);
Student maxAgeStudent = frame.max(Student::getAge);
Integer maxAge = frame.maxValue(Student::getAge);
Student minAgeStudent = frame.min(Student::getAge);
Integer minAge = frame.minValue(Student::getAge);
BigDecimal avgAge = frame.avg(Student::getAge);
BigDecimal sumAge = frame.sum(Student::getAge);
MaxMin<Student> maxMinStudent = frame.maxMin(Student::getAge);
MaxMin<Integer> maxMinValue = frame.maxMinValue(Student::getAge);
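
The calls above come in two flavors: max and min return the whole row, while maxValue, minValue, avg, and sum return only the value. The following plain-stream sketch shows the same distinction using the standard library. The Student class is a hypothetical two-field stand-in, and oldest and ageStats are names made up for this example.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.IntSummaryStatistics;
import java.util.List;

public class AggregationWithStreams {

    // Hypothetical stand-in: name and age only.
    static class Student {
        private final String name;
        private final Integer age;

        Student(String name, Integer age) {
            this.name = name;
            this.age = age;
        }

        String getName() { return name; }
        Integer getAge() { return age; }
    }

    // Names and ages taken from studentList.
    static List<Student> sample() {
        return Arrays.asList(
                new Student("a", 11), new Student("a", 11), new Student("b", 12),
                new Student("c", 13), new Student("d", 14), new Student("e", 14),
                new Student("e", 15));
    }

    // Row-returning aggregate: the Student holding the largest age (null if empty).
    static Student oldest(List<Student> students) {
        return students.stream()
                .max(Comparator.comparing(Student::getAge))
                .orElse(null);
    }

    // Value-returning aggregates: min, max, sum, count, and average in one pass.
    static IntSummaryStatistics ageStats(List<Student> students) {
        return students.stream()
                .mapToInt(Student::getAge)
                .summaryStatistics();
    }

    public static void main(String[] args) {
        System.out.println(oldest(sample()).getName()); // e
        IntSummaryStatistics stats = ageStats(sample());
        System.out.println(stats.getMin() + " " + stats.getMax() + " " + stats.getSum()); // 11 15 90
        System.out.println(stats.getAverage());
    }
}
```

Note that IntSummaryStatistics reports the average as a double, whereas the frame calls above return BigDecimal for avg and sum.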

2.4 Distinct

List<Student> distinctByObject = SDFrame.read(studentList).distinct().toLists();
List<Student> distinctBySchool = SDFrame.read(studentList).distinct(Student::getSchool).toLists();
List<Student> distinctBySchoolAndLevel = SDFrame.read(studentList).distinct(e -> e.getSchool() + e.getLevel()).toLists();
List<Student> multiDistinct = SDFrame.read(studentList).distinct(Student::getSchool).distinct(Student::getLevel).toLists();
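
Distinct-by-key can be pictured as "keep the first row seen for each key". The sketch below implements that idea in plain Java; whether JDFrame keeps the first or a later duplicate is not stated here, so treat first-wins as an assumption. It also shows why a concatenated key such as school + level can merge rows that differ, and how a composite key avoids that. The helper distinctBy and the sample rows are invented for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

public class DistinctByKey {

    // Keeps the first row seen for each key and preserves encounter order.
    static <T, K> List<T> distinctBy(List<T> rows, Function<? super T, ? extends K> key) {
        Map<K, T> firstPerKey = new LinkedHashMap<>();
        for (T row : rows) {
            firstPerKey.putIfAbsent(key.apply(row), row);
        }
        return new ArrayList<>(firstPerKey.values());
    }

    // Two-column rows chosen so that plain concatenation collides.
    static List<String[]> sample() {
        return Arrays.asList(
                new String[] {"ab", "c"},
                new String[] {"a", "bc"},
                new String[] {"ab", "c"});
    }

    public static void main(String[] args) {
        // Concatenated key: "ab" + "c" and "a" + "bc" both become "abc".
        System.out.println(distinctBy(sample(), r -> r[0] + r[1]).size()); // 1

        // Composite key: the two columns are compared separately.
        System.out.println(distinctBy(sample(), r -> Arrays.asList(r[0], r[1])).size()); // 2
    }
}
```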

2.5 Group & Aggregate (SQL‑like)

List<FI2<String, BigDecimal>> sumBySchool = JDFrame.from(studentList)
    .groupBySum(Student::getSchool, Student::getAge).toLists();
List<FI2<String, Integer>> maxBySchool = JDFrame.from(studentList)
    .groupByMaxValue(Student::getSchool, Student::getAge).toLists();
List<FI2<String, Student>> maxObjBySchool = JDFrame.from(studentList)
    .groupByMax(Student::getSchool, Student::getAge).toLists();
List<FI2<String, Integer>> minBySchool = JDFrame.from(studentList)
    .groupByMinValue(Student::getSchool, Student::getAge).toLists();
List<FI2<String, Long>> countBySchool = JDFrame.from(studentList)
    .groupByCount(Student::getSchool).toLists();
List<FI2<String, BigDecimal>> avgBySchool = JDFrame.from(studentList)
    .groupByAvg(Student::getSchool, Student::getAge).toLists();
List<FI3<String, BigDecimal, Long>> sumCountBySchool = JDFrame.from(studentList)
    .groupBySumCount(Student::getSchool, Student::getAge).toLists();
// two‑level grouping
List<FI3<String, String, BigDecimal>> sumBySchoolLevel = JDFrame.from(studentList)
    .groupBySum(Student::getSchool, Student::getLevel, Student::getAge).toLists();
// three‑level grouping
List<FI4<String, String, String, BigDecimal>> sumBySchoolLevelName = JDFrame.from(studentList)
    .groupBySum(Student::getSchool, Student::getLevel, Student::getName, Student::getAge).toLists();
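
For readers used to Collectors, the sketch below shows roughly what a one-level count and a two-level sum look like with the standard library. The Student class is a hypothetical three-field stand-in, and the method names are invented for this example. One visible difference: nested groupingBy yields a map of maps, while the frame calls above return flat rows (FI3, FI4).

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class GroupingWithStreams {

    // Hypothetical stand-in: school, level, and age only.
    static class Student {
        private final String school;
        private final String level;
        private final Integer age;

        Student(String school, String level, Integer age) {
            this.school = school;
            this.level = level;
            this.age = age;
        }

        String getSchool() { return school; }
        String getLevel() { return level; }
        Integer getAge() { return age; }
    }

    // Same rows as studentList, reduced to (school, level, age).
    static List<Student> sample() {
        return Arrays.asList(
                new Student("一中", "一年级", 11),
                new Student("一中", "一年级", 11),
                new Student("一中", "三年级", 12),
                new Student("二中", "一年级", 13),
                new Student("二中", "一年级", 14),
                new Student("三中", "二年级", 14),
                new Student("三中", "二年级", 15));
    }

    // select school, count(*) from students group by school
    static Map<String, Long> countBySchool(List<Student> students) {
        return students.stream()
                .collect(Collectors.groupingBy(Student::getSchool, Collectors.counting()));
    }

    // select school, level, sum(age) from students group by school, level
    static Map<String, Map<String, Integer>> sumAgeBySchoolAndLevel(List<Student> students) {
        return students.stream()
                .collect(Collectors.groupingBy(
                        Student::getSchool,
                        Collectors.groupingBy(
                                Student::getLevel,
                                Collectors.summingInt(Student::getAge))));
    }

    public static void main(String[] args) {
        System.out.println(countBySchool(sample()));
        System.out.println(sumAgeBySchoolAndLevel(sample()));
    }
}
```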

2.6 Sorting

// order by age desc
SDFrame.read(studentList).sortDesc(Student::getAge);
// multi‑level: age desc, level asc
SDFrame.read(studentList).sortAsc(Sorter.sortDescBy(Student::getAge).sortAsc(Student::getLevel));
// order by age asc
SDFrame.read(studentList).sortAsc(Student::getAge);
// using Comparator
SDFrame.read(studentList).sortAsc(Comparator.comparing(e -> e.getLevel() + e.getId()));
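
The multi-level ordering "age desc, level asc" maps onto a chained Comparator in the standard library. The sketch below is plain Java, not the Sorter API; the Student class and the sample rows are hypothetical.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

public class MultiLevelSort {

    // Hypothetical stand-in: id, level, and age only.
    static class Student {
        private final Integer id;
        private final String level;
        private final Integer age;

        Student(Integer id, String level, Integer age) {
            this.id = id;
            this.level = level;
            this.age = age;
        }

        Integer getId() { return id; }
        String getLevel() { return level; }
        Integer getAge() { return age; }
    }

    static List<Student> sample() {
        return Arrays.asList(
                new Student(1, "b", 12),
                new Student(2, "a", 12),
                new Student(3, "c", 15),
                new Student(4, "a", 11));
    }

    // order by age desc, level asc; returns the ids in sorted order.
    static List<Integer> idsByAgeDescLevelAsc(List<Student> students) {
        Comparator<Student> order = Comparator.comparing(Student::getAge).reversed()
                .thenComparing(Student::getLevel);
        return students.stream()
                .sorted(order)
                .map(Student::getId)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(idsByAgeDescLevelAsc(sample())); // [3, 2, 1, 4]
    }
}
```

The two students aged 12 tie on the first key, so the second key (level) decides their order. Null ages would make this comparator throw; Comparator.nullsLast is the usual way to handle them.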

2.7 Join (Matrix Connection)

append(T t);               // like List.add
union(IFrame<T> other);      // like List.addAll
join(IFrame<K> other, JoinOn<T,K> on, Join<T,K,R> join);      // inner join
leftJoin(IFrame<K> other, JoinOn<T,K> on, Join<T,K,R> join);   // left join
rightJoin(IFrame<K> other, JoinOn<T,K> on, Join<T,K,R> join);  // right join

Example of inner join:

System.out.println("======== Matrix1 =======");
SDFrame<Student> sdf = SDFrame.read(studentList);
sdf.show(20);

SDFrame<FI2<String, BigDecimal>> sdf2 = SDFrame.read(studentList)
        .whereNotNull(Student::getAge)
        .whereBetween(Student::getAge, 9, 16)
        .groupBySum(Student::getSchool, Student::getScore)
        .sortDesc(FI2::getC2)
        .cutFirst(10);

System.out.println("======== Matrix2 =======");
sdf2.show();

SDFrame<UserInfo> frame = sdf.join(sdf2,
        (a,b) -> a.getSchool().equals(b.getC1()),
        (a,b) -> {
            UserInfo ui = new UserInfo();
            ui.setKey1(a.getSchool());
            ui.setKey2(b.getC2().intValue());
            ui.setKey3(String.valueOf(a.getId()));
            return ui;
        });
System.out.println("======== Joined Result =======");
frame.show(5);
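
Conceptually, a join takes a match condition and a combiner, the same two roles played by the lambdas passed to join above. The sketch below is a naive nested-loop version in plain Java that illustrates those roles; it says nothing about how JDFrame implements joins. The names innerJoin, leftJoin, demoInner, and demoLeft are invented, and passing null for the unmatched side of a left join is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.BiFunction;
import java.util.function.BiPredicate;

public class JoinSketch {

    // Inner join: one combined row for every (left, right) pair that matches.
    static <T, K, R> List<R> innerJoin(List<T> left, List<K> right,
                                       BiPredicate<T, K> on, BiFunction<T, K, R> combine) {
        List<R> out = new ArrayList<>();
        for (T l : left) {
            for (K r : right) {
                if (on.test(l, r)) {
                    out.add(combine.apply(l, r));
                }
            }
        }
        return out;
    }

    // Left join: an unmatched left row is still emitted once,
    // with null as the right side (assumption of this sketch).
    static <T, K, R> List<R> leftJoin(List<T> left, List<K> right,
                                      BiPredicate<T, K> on, BiFunction<T, K, R> combine) {
        List<R> out = new ArrayList<>();
        for (T l : left) {
            boolean matched = false;
            for (K r : right) {
                if (on.test(l, r)) {
                    out.add(combine.apply(l, r));
                    matched = true;
                }
            }
            if (!matched) {
                out.add(combine.apply(l, null));
            }
        }
        return out;
    }

    // Left side: one school name per student. Right side: (school, total) rows.
    static final List<String> SCHOOLS = Arrays.asList("A", "A", "B", "D");
    static final List<String[]> TOTALS = Arrays.asList(
            new String[] {"A", "4"}, new String[] {"B", "7"});

    static List<String> demoInner() {
        return innerJoin(SCHOOLS, TOTALS,
                (s, t) -> s.equals(t[0]),
                (s, t) -> s + "=" + t[1]);
    }

    static List<String> demoLeft() {
        return leftJoin(SCHOOLS, TOTALS,
                (s, t) -> s.equals(t[0]),
                (s, t) -> s + "=" + (t == null ? "none" : t[1]));
    }

    public static void main(String[] args) {
        System.out.println(demoInner()); // [A=4, A=4, B=7]
        System.out.println(demoLeft());  // [A=4, A=4, B=7, D=none]
    }
}
```

In a match condition like the one above, Objects.equals(a, b) is a safer choice than a.equals(b) whenever the left-hand key may be null.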

2.8 Cutting

cutFirst(int n);   // first n rows
cutLast(int n);    // last n rows
cut(Integer start, Integer end); // sub‑list like List.subList
cutPage(int page, int pageSize); // pagination
cutFirstRank(Sorter<T> sorter, int n); // top n by rank
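
The first, last, and page cuts can be pictured as bounds-checked List.subList calls. The sketch below is plain Java under two stated assumptions: pages are numbered from 1, and an out-of-range request yields a shorter or empty list instead of an exception. The method names mirror the list above for readability only; this is not the library's code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CutSketch {

    // First n rows (fewer if the list is shorter).
    static <T> List<T> cutFirst(List<T> rows, int n) {
        int count = Math.max(0, Math.min(n, rows.size()));
        return new ArrayList<>(rows.subList(0, count));
    }

    // Last n rows (fewer if the list is shorter).
    static <T> List<T> cutLast(List<T> rows, int n) {
        int count = Math.max(0, Math.min(n, rows.size()));
        return new ArrayList<>(rows.subList(rows.size() - count, rows.size()));
    }

    // Page `page` of size `pageSize`; pages start at 1 (assumption of this sketch).
    static <T> List<T> cutPage(List<T> rows, int page, int pageSize) {
        if (page < 1 || pageSize < 1) {
            throw new IllegalArgumentException("page and pageSize must be >= 1");
        }
        int from = Math.min((page - 1) * pageSize, rows.size());
        int to = Math.min(from + pageSize, rows.size());
        return new ArrayList<>(rows.subList(from, to));
    }

    public static void main(String[] args) {
        List<Integer> rows = Arrays.asList(1, 2, 3, 4, 5, 6, 7);
        System.out.println(cutFirst(rows, 2));   // [1, 2]
        System.out.println(cutLast(rows, 2));    // [6, 7]
        System.out.println(cutPage(rows, 2, 3)); // [4, 5, 6]
        System.out.println(cutPage(rows, 3, 3)); // [7]
        System.out.println(cutPage(rows, 4, 3)); // []
    }
}
```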

2.9 Frame Parameter Settings

defaultScale(int scale, RoundingMode roundingMode); // set default decimal precision

2.10 Other Utilities

Percentage conversion: SDFrame.read(list).mapPercent(getScore, setScore, 2)

Partition: split a list into sub-lists of a given size

Row number: add a row-number column based on the current ordering

Replenish: fill in missing dimension entries (e.g., missing schools or grades) using a custom function
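
To make two of these utilities concrete, here is a plain-Java sketch of partitioning and of a percentage conversion. Both helpers are hypothetical, and the percentage rule (multiply a ratio by 100, then round half-up to the given scale) is an assumption about what such a conversion typically does, not a statement of mapPercent's exact behavior.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class UtilitySketch {

    // Splits items into consecutive chunks of at most `size` elements.
    static <T> List<List<T>> partition(List<T> items, int size) {
        if (size <= 0) {
            throw new IllegalArgumentException("size must be positive");
        }
        List<List<T>> chunks = new ArrayList<>();
        for (int from = 0; from < items.size(); from += size) {
            int to = Math.min(from + size, items.size());
            chunks.add(new ArrayList<>(items.subList(from, to)));
        }
        return chunks;
    }

    // Assumed rule: ratio * 100, rounded half-up to `scale` decimal places.
    static BigDecimal toPercent(BigDecimal ratio, int scale) {
        return ratio.multiply(BigDecimal.valueOf(100)).setScale(scale, RoundingMode.HALF_UP);
    }

    public static void main(String[] args) {
        System.out.println(partition(Arrays.asList(1, 2, 3, 4, 5, 6, 7), 3)); // [[1, 2, 3], [4, 5, 6], [7]]
        System.out.println(toPercent(new BigDecimal("0.12345"), 2));          // 12.35
    }
}
```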

3. Window Functions

JDFrame also supports programmable window functions; see the tutorial at https://juejin.cn/post/7367306429054959631 .

Final Notes

Code repository: https://github.com/burukeYou/JDFrame

Maven coordinates: https://central.sonatype.com/artifact/io.github.burukeyou/jdframe

JDFrame provides two frame types with identical APIs. SDFrame behaves like a lazy Java Stream: operations take effect only when a terminal action runs, and each new stage of processing requires a fresh read. JDFrame applies each operation immediately, so intermediate results can be reused without re-reading. Use SDFrame for simple one-pass stream processing and JDFrame when you need to pause, inspect, or branch the data flow.
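
The lazy-versus-immediate distinction is the same one that separates a java.util.stream.Stream from a materialized List. The sketch below demonstrates it with the standard library only; it does not exercise SDFrame or JDFrame, and the method names are invented for this example.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class LazyVersusEager {

    // Intermediate operations do nothing until a terminal operation runs.
    static int predicateCallsBeforeTerminalOp() {
        AtomicInteger calls = new AtomicInteger();
        Stream<Integer> pending = Stream.of(1, 2, 3).filter(n -> {
            calls.incrementAndGet();
            return n > 1;
        });
        return calls.get(); // the filter has only been declared, never executed
    }

    // A Stream can be consumed once; a second terminal operation is rejected.
    static boolean secondTerminalOpFails() {
        Stream<Integer> stream = Stream.of(1, 2, 3).filter(n -> n > 1);
        stream.collect(Collectors.toList());
        try {
            stream.collect(Collectors.toList());
            return false;
        } catch (IllegalStateException alreadyConsumed) {
            return true;
        }
    }

    // A materialized List can be inspected and branched as often as needed.
    static int[] branchFromList() {
        List<Integer> kept = Stream.of(1, 2, 3)
                .filter(n -> n > 1)
                .collect(Collectors.toList());
        int count = kept.size();
        int sum = kept.stream().mapToInt(Integer::intValue).sum();
        return new int[] {count, sum};
    }

    public static void main(String[] args) {
        System.out.println(predicateCallsBeforeTerminalOp()); // 0
        System.out.println(secondTerminalOpFails());          // true
        int[] result = branchFromList();
        System.out.println(result[0] + " " + result[1]);      // 2 5
    }
}
```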
