
Simplify Java Stream Processing with JDFrame: A Semantic DataFrame API

This article introduces JDFrame/SDFrame, a JVM‑level DataFrame‑style library that provides semantic, chainable APIs for Java 8 streams, covering quick start, dependency setup, example use cases, and detailed API categories such as matrix view, filtering, aggregation, distinct, grouping, sorting, joining, slicing, parameter settings, percentage conversion, partitioning, row‑number generation, and data replenishment, all illustrated with concise code snippets.

macrozheng

0 Introduction

Developers who use the Java Stream API heavily often have to look up method signatures (the Collectors factory methods, for instance) and still end up with verbose code. Inspired by the DataFrame APIs of Spark and pandas, JDFrame offers a JVM-level, semantic and concise alternative to raw Java 8 streams.

1 Quick Start

1.1 Add Dependency

<code><dependency>
  <groupId>io.github.burukeyou</groupId>
  <artifactId>jdframe</artifactId>
  <version>0.0.6</version>
</dependency>
</code>


1.2 Example Case

For students aged 9 to 16, compute each school's total score and return the two schools with the highest totals.

<code>import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.List;
// SDFrame and FI2 come from the jdframe dependency added above.

// Student(id, name, school, level, age, score)
// 一中/二中/三中 = No. 1/2/3 Middle School; 一年级/二年级/三年级 = Grade 1/2/3
static List<Student> studentList = new ArrayList<>();
static {
    studentList.add(new Student(1, "a", "一中", "一年级", 11, new BigDecimal(1)));
    studentList.add(new Student(2, "a", "一中", "一年级", 11, new BigDecimal(1)));
    studentList.add(new Student(3, "b", "一中", "三年级", 12, new BigDecimal(2)));
    studentList.add(new Student(4, "c", "二中", "一年级", 13, new BigDecimal(3)));
    studentList.add(new Student(5, "d", "二中", "一年级", 14, new BigDecimal(4)));
    studentList.add(new Student(6, "e", "三中", "二年级", 14, new BigDecimal(5)));
    studentList.add(new Student(7, "e", "三中", "二年级", 15, new BigDecimal(5)));
}

// Equivalent SQL:
// SELECT school, SUM(score) FROM students
// WHERE age IS NOT NULL AND age BETWEEN 9 AND 16
// GROUP BY school ORDER BY SUM(score) DESC LIMIT 2
SDFrame<FI2<String, BigDecimal>> sdf2 = SDFrame.read(studentList)
    .whereNotNull(Student::getAge)                      // age IS NOT NULL
    .whereBetween(Student::getAge, 9, 16)               // age BETWEEN 9 AND 16 (inclusive)
    .groupBySum(Student::getSchool, Student::getScore)  // rows of (school, SUM(score))
    .sortDesc(FI2::getC2)                               // ORDER BY SUM(score) DESC
    .cutFirst(2);                                       // LIMIT 2

sdf2.show();
</code>

Output:

<code>三中 10
二中 7
</code>
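For comparison, here is a minimal sketch of the same query written against the plain Java 8 Stream API, without JDFrame. The nested Student class is a hypothetical stand-in that keeps only the fields the query needs, and Map.Entry plays the role of FI2.

<code>import java.math.BigDecimal;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical plain-Java equivalent of the SDFrame query above (no JDFrame involved).
class PlainStreamTop2 {

    // Minimal stand-in for the article's Student: only school, age and score.
    static class Student {
        private final String school;
        private final Integer age;
        private final BigDecimal score;

        Student(String school, Integer age, BigDecimal score) {
            this.school = school;
            this.age = age;
            this.score = score;
        }

        String getSchool() { return school; }
        Integer getAge() { return age; }
        BigDecimal getScore() { return score; }
    }

    // Same rows as studentList above (id, name and level omitted).
    static List<Student> sample() {
        return Arrays.asList(
                new Student("一中", 11, new BigDecimal(1)),
                new Student("一中", 11, new BigDecimal(1)),
                new Student("一中", 12, new BigDecimal(2)),
                new Student("二中", 13, new BigDecimal(3)),
                new Student("二中", 14, new BigDecimal(4)),
                new Student("三中", 14, new BigDecimal(5)),
                new Student("三中", 15, new BigDecimal(5)));
    }

    // WHERE age IS NOT NULL AND age BETWEEN 9 AND 16
    // GROUP BY school ORDER BY SUM(score) DESC LIMIT 2
    static List<Map.Entry<String, BigDecimal>> top2Schools(List<Student> students) {
        Map<String, BigDecimal> totals = students.stream()
                .filter(s -> s.getAge() != null && s.getAge() >= 9 && s.getAge() <= 16)
                .collect(Collectors.groupingBy(Student::getSchool,
                        Collectors.reducing(BigDecimal.ZERO, Student::getScore, BigDecimal::add)));
        return totals.entrySet().stream()
                .sorted(Map.Entry.<String, BigDecimal>comparingByValue().reversed())
                .limit(2)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Prints one "school total" line per result row.
        for (Map.Entry<String, BigDecimal> e : top2Schools(sample())) {
            System.out.println(e.getKey() + " " + e.getValue());
        }
    }
}
</code>

The grouping collector, the explicit BigDecimal reducer and the typed entry comparator are the boilerplate that the five-call SDFrame chain hides.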

2 API Catalog

2.1 Matrix View

<code>void show(int n);                      // print the first n rows as a table
List<String> columns();                 // header (column) names
<R> List<R> col(Function<T, R> fn);     // extract one column via a getter
T head();                               // first row
List<T> head(int n);                    // first n rows
T tail();                               // last row
List<T> tail(int n);                    // last n rows
List<T> page(int page, int pageSize);   // rows of one page
</code>
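The view methods are easiest to reason about on a materialized list of rows. The sketch below gives plain-Java equivalents of head(n), tail(n) and page(page, pageSize) using List.subList. It assumes that page numbers start at 1 and that out-of-range requests return fewer (or no) rows instead of throwing; both are assumptions for illustration, not confirmed JDFrame behavior.

<code>import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical plain-Java equivalents of head(n), tail(n) and page(page, pageSize).
// Assumptions: pages are 1-based; out-of-range requests return fewer or no rows.
class MatrixViewSketch {

    static <T> List<T> head(List<T> rows, int n) {
        int to = Math.max(0, Math.min(n, rows.size()));
        return new ArrayList<>(rows.subList(0, to));
    }

    static <T> List<T> tail(List<T> rows, int n) {
        int from = Math.max(0, rows.size() - Math.max(0, n));
        return new ArrayList<>(rows.subList(from, rows.size()));
    }

    static <T> List<T> page(List<T> rows, int page, int pageSize) {
        if (page < 1 || pageSize <= 0) {
            return Collections.emptyList();
        }
        long from = (long) (page - 1) * pageSize;   // long avoids int overflow
        if (from >= rows.size()) {
            return Collections.emptyList();
        }
        int to = (int) Math.min(from + pageSize, rows.size());
        return new ArrayList<>(rows.subList((int) from, to));
    }
}
</code>

Returning copies instead of subList views keeps each result independent of later changes to the source list.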

2.2 Filtering

<code>// Catalog of filter methods; in practice chain only the conditions you need
// (whereGt(age, 3) followed by whereLt(age, 3), for example, leaves no rows).
SDFrame.read(list)
    .whereBetween(Student::getAge, 3, 6)          // [3,6]
    .whereBetweenR(Student::getAge, 3, 6)         // (3,6]
    .whereBetweenL(Student::getAge, 3, 6)         // [3,6)
    .whereNotNull(Student::getName)               // name is neither null nor ""
    .whereGt(Student::getAge, 3)                  // age > 3
    .whereGe(Student::getAge, 3)                  // age >= 3
    .whereLt(Student::getAge, 3)                  // age < 3
    .whereIn(Student::getAge, Arrays.asList(3, 7, 8))     // age IN (3, 7, 8)
    .whereNotIn(Student::getAge, Arrays.asList(3, 7, 8))  // age NOT IN (3, 7, 8)
    .whereEq(Student::getAge, 3)                  // age = 3
    .whereNotEq(Student::getAge, 3)               // age != 3
    .whereLike(Student::getName, "jay")           // %jay%
    .whereLikeLeft(Student::getName, "jay")       // jay%
    .whereLikeRight(Student::getName, "jay");     // %jay
</code>
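To make the interval and LIKE comments above concrete, the sketch below expresses several of them as ordinary java.util.function.Predicate instances applied with Stream.filter. The class and method names are hypothetical; they mirror the semantics stated in the comments, not JDFrame's internals.

<code>import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical predicates mirroring the filter comments above.
class FilterSketch {

    static Predicate<Integer> between(int lo, int hi)  { return v -> v != null && v >= lo && v <= hi; } // [lo,hi]
    static Predicate<Integer> betweenR(int lo, int hi) { return v -> v != null && v > lo && v <= hi; }  // (lo,hi]
    static Predicate<Integer> betweenL(int lo, int hi) { return v -> v != null && v >= lo && v < hi; }  // [lo,hi)

    // Treats both null and the empty string as "missing".
    static Predicate<String> notNull() { return s -> s != null && !s.isEmpty(); }

    static Predicate<String> like(String p)      { return s -> s != null && s.contains(p); }   // %p%
    static Predicate<String> likeLeft(String p)  { return s -> s != null && s.startsWith(p); } // p%
    static Predicate<String> likeRight(String p) { return s -> s != null && s.endsWith(p); }   // %p

    // Keep the rows that satisfy the predicate, preserving order.
    static <T> List<T> where(List<T> rows, Predicate<? super T> p) {
        return rows.stream().filter(p).collect(Collectors.toList());
    }
}
</code>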

2.3 Aggregation

<code>JDFrame<Student> frame = JDFrame.read(list);
Student maxAgeStudent = frame.max(Student::getAge);                // row with the largest age
Integer maxAge = frame.maxValue(Student::getAge);                  // the largest age itself
Student minAgeStudent = frame.min(Student::getAge);                // row with the smallest age
Integer minAge = frame.minValue(Student::getAge);                  // the smallest age itself
BigDecimal avgAge = frame.avg(Student::getAge);                    // average age
BigDecimal sumAge = frame.sum(Student::getAge);                    // total age
MaxMin<Student> maxMinStudent = frame.maxMin(Student::getAge);     // rows with the max and min age
MaxMin<Integer> maxMinValue = frame.maxMinValue(Student::getAge);  // max and min age values
</code>
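The difference between max (returns the row) and maxValue (returns the value), and the BigDecimal results of sum and avg, can be sketched with plain streams. The scale of 2 with RoundingMode.HALF_UP used for avg is an assumption for illustration; JDFrame's own default may differ (see defaultScale below).

<code>import java.math.BigDecimal;
import java.math.RoundingMode;
import java.util.Comparator;
import java.util.List;
import java.util.Objects;
import java.util.Optional;
import java.util.function.Function;

// Hypothetical plain-stream versions of max, maxValue, sum and avg.
class AggregationSketch {

    // like frame.max: the whole row with the largest key (rows with a null key are skipped)
    static <T, R extends Comparable<? super R>> Optional<T> max(List<T> rows, Function<T, R> key) {
        return rows.stream()
                .filter(r -> key.apply(r) != null)
                .max(Comparator.comparing(key));
    }

    // like frame.maxValue: only the largest key value
    static <T, R extends Comparable<? super R>> Optional<R> maxValue(List<T> rows, Function<T, R> key) {
        return rows.stream().map(key).filter(Objects::nonNull).max(Comparator.naturalOrder());
    }

    // like frame.sum: total as BigDecimal, ignoring nulls
    static <T> BigDecimal sum(List<T> rows, Function<T, ? extends Number> key) {
        return rows.stream().map(key).filter(Objects::nonNull)
                .map(v -> new BigDecimal(v.toString()))
                .reduce(BigDecimal.ZERO, BigDecimal::add);
    }

    // like frame.avg; scale 2 and HALF_UP are assumed, null when there are no values
    static <T> BigDecimal avg(List<T> rows, Function<T, ? extends Number> key) {
        long count = rows.stream().map(key).filter(Objects::nonNull).count();
        if (count == 0) {
            return null;
        }
        return sum(rows, key).divide(BigDecimal.valueOf(count), 2, RoundingMode.HALF_UP);
    }
}
</code>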

2.4 Distinct

<code>// distinct on the whole Student object
List<Student> distinctByObject = SDFrame.read(list).distinct().toLists();
// one row per school
List<Student> distinctBySchool = SDFrame.read(list).distinct(Student::getSchool).toLists();
// one row per (school, level) combination, keyed by a concatenated string
List<Student> distinctByComposite = SDFrame.read(list).distinct(e -> e.getSchool() + e.getLevel()).toLists();
</code>
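Distinct by key can be sketched as "keep the first row seen for each key". Keeping the first occurrence is an assumption about JDFrame, and distinctBy is a hypothetical helper name.

<code>import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

// Hypothetical helper: keep the first row seen for each key (first-occurrence rule assumed).
class DistinctSketch {

    static <T, K> List<T> distinctBy(List<T> rows, Function<? super T, ? extends K> key) {
        Set<K> seen = new HashSet<>();
        List<T> kept = new ArrayList<>();
        for (T row : rows) {
            if (seen.add(key.apply(row))) { // add returns false if the key was already present
                kept.add(row);
            }
        }
        return kept;
    }
}
</code>

A design note on the composite-key example: a concatenated string key can merge different pairs (school "ab" with level "c" and school "a" with level "bc" both give "abc"). Using a list as the key, for example distinctBy(rows, e -> Arrays.asList(e.getSchool(), e.getLevel())), avoids that, because List.equals compares element by element.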

2.5 Group‑by Aggregation

<code>// group by school and sum age
List<FI2<String, BigDecimal>> sumAge = frame.groupBySum(Student::getSchool, Student::getAge).toLists();
// group by school and max age value
List<FI2<String, Integer>> maxAge = frame.groupByMaxValue(Student::getSchool, Student::getAge).toLists();
// group by school and count
List<FI2<String, Long>> count = frame.groupByCount(Student::getSchool).toLists();
// multi‑level grouping (school, level)
List<FI3<String, String, BigDecimal>> sumBySchoolLevel = frame.groupBySum(Student::getSchool, Student::getLevel, Student::getAge).toLists();
</code>
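For comparison, the same group-by aggregations in plain Java need Collectors.groupingBy with a downstream collector. In this sketch a Map takes the place of FI2, the two-level grouping uses a List<String> composite key in place of FI3, and the nested Student class is a hypothetical stand-in holding the sample data's school, level and age.

<code>import java.math.BigDecimal;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class GroupBySketch {

    // Hypothetical stand-in for the article's Student.
    static class Student {
        private final String school;
        private final String level;
        private final int age;

        Student(String school, String level, int age) {
            this.school = school;
            this.level = level;
            this.age = age;
        }

        String getSchool() { return school; }
        String getLevel() { return level; }
        int getAge() { return age; }
    }

    // School, level and age of the seven sample students.
    static List<Student> sample() {
        return Arrays.asList(
                new Student("一中", "一年级", 11), new Student("一中", "一年级", 11),
                new Student("一中", "三年级", 12), new Student("二中", "一年级", 13),
                new Student("二中", "一年级", 14), new Student("三中", "二年级", 14),
                new Student("三中", "二年级", 15));
    }

    // like groupBySum(Student::getSchool, Student::getAge)
    static Map<String, BigDecimal> sumAgeBySchool(List<Student> rows) {
        return rows.stream().collect(Collectors.groupingBy(Student::getSchool,
                Collectors.reducing(BigDecimal.ZERO, (Student s) -> BigDecimal.valueOf(s.getAge()), BigDecimal::add)));
    }

    // like groupByCount(Student::getSchool)
    static Map<String, Long> countBySchool(List<Student> rows) {
        return rows.stream().collect(Collectors.groupingBy(Student::getSchool, Collectors.counting()));
    }

    // like groupBySum(Student::getSchool, Student::getLevel, Student::getAge):
    // the key is the (school, level) pair
    static Map<List<String>, BigDecimal> sumAgeBySchoolAndLevel(List<Student> rows) {
        return rows.stream().collect(Collectors.groupingBy(
                (Student s) -> Arrays.asList(s.getSchool(), s.getLevel()),
                Collectors.reducing(BigDecimal.ZERO, (Student s) -> BigDecimal.valueOf(s.getAge()), BigDecimal::add)));
    }
}
</code>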

2.6 Sorting

<code>SDFrame.read(list).sortDesc(Student::getAge); // age DESC
SDFrame.read(list).sortAsc(Sorter.sortDescBy(Student::getAge).sortAsc(Student::getLevel)); // age DESC, then level ASC
SDFrame.read(list).sortAsc(Student::getAge); // age ASC
SDFrame.read(list).sortAsc(Comparator.comparing(Student::getLevel).thenComparing(Student::getId)); // custom Comparator: level ASC, then id ASC
</code>
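The multi-level Sorter example (age descending, then level ascending) corresponds to a chained java.util.Comparator. A minimal sketch, with a hypothetical Student stand-in and hypothetical level values "L1" to "L3":

<code>import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

class SortSketch {

    // Hypothetical stand-in with just the fields used for sorting.
    static class Student {
        private final int id;
        private final String level;
        private final int age;

        Student(int id, String level, int age) {
            this.id = id;
            this.level = level;
            this.age = age;
        }

        int getId() { return id; }
        String getLevel() { return level; }
        Integer getAge() { return age; }
    }

    // age DESC, then level ASC: the ordering the Sorter example describes
    static final Comparator<Student> AGE_DESC_LEVEL_ASC =
            Comparator.comparing(Student::getAge, Comparator.reverseOrder())
                      .thenComparing(Student::getLevel);

    // Sort a copy so the input list is left untouched.
    static List<Student> sorted(List<Student> rows, Comparator<Student> order) {
        List<Student> copy = new ArrayList<>(rows);
        copy.sort(order);
        return copy;
    }

    static List<Integer> ids(List<Student> rows) {
        return rows.stream().map(Student::getId).collect(Collectors.toList());
    }
}
</code>

Chaining thenComparing compares key by key; concatenating keys into one String instead would compare digits lexically (for example "10" sorts before "2").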

2.7 Joining

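<code>// Inner join: pair each student with the total score of their school
SDFrame<Student> a = SDFrame.read(list);
SDFrame<FI2<String, BigDecimal>> b = SDFrame.read(list)
    .groupBySum(Student::getSchool, Student::getScore);   // (school, SUM(score))
SDFrame<UserInfo> joined = a.join(b,
    (s, f) -> s.getSchool().equals(f.getC1()),            // join condition: same school
    (s, f) -> {                                           // build one output row per matching pair
        UserInfo ui = new UserInfo();
        ui.setKey1(s.getSchool());
        ui.setKey2(f.getC2().intValue());                 // school total (fraction truncated)
        ui.setKey3(String.valueOf(s.getId()));
        return ui;
    });
joined.show(5);
</code>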
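Semantically, an inner join keeps every (left, right) pair for which the condition holds and maps each such pair to an output row. The following nested-loop sketch shows only that semantics; innerJoin is a hypothetical helper, and it makes no claim about how JDFrame executes joins internally.

<code>import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;
import java.util.function.BiPredicate;

class JoinSketch {

    // Emit combine(l, r) for every pair where on(l, r) is true; unmatched rows are dropped.
    static <L, R, O> List<O> innerJoin(List<L> left, List<R> right,
                                       BiPredicate<? super L, ? super R> on,
                                       BiFunction<? super L, ? super R, ? extends O> combine) {
        List<O> out = new ArrayList<>();
        for (L l : left) {
            for (R r : right) {
                if (on.test(l, r)) {
                    out.add(combine.apply(l, r));
                }
            }
        }
        return out;
    }
}
</code>

Rows without a match on the other side produce no output, which is what separates an inner join from a left join. The nested loop costs O(n × m) condition checks; for a pure equality condition such as the school comparison above, indexing the right side in a HashMap by key brings that down to roughly O(n + m).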

2.8 Slicing

<code>cutFirst(int n);   // first n rows
cutLast(int n);    // last n rows
cut(int start, int end); // sub‑list [start, end)
cutPage(int page, int pageSize); // pagination
cutFirstRank(Sorter<T> sorter, int n); // top‑n by rank
</code>
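In a lazy, stream-backed pipeline, most of these slices map directly onto Stream.skip and Stream.limit, while taking the last n rows requires knowing the total size first. The sketch below shows that contrast with plain streams; the method names mirror the signatures above but are hypothetical, cutPage assumes 1-based page numbers, and cutFirstRank is left out because its tie handling is not described here.

<code>import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical stream-based versions of the slicing methods (n, start, end >= 0; page >= 1).
class SliceSketch {

    // cutFirst(n): limit short-circuits, so rows after the first n are never pulled
    static <T> List<T> cutFirst(Stream<T> rows, int n) {
        return rows.limit(n).collect(Collectors.toList());
    }

    // cut(start, end): rows with index in [start, end)
    static <T> List<T> cut(Stream<T> rows, int start, int end) {
        return rows.skip(start).limit(Math.max(0, end - start)).collect(Collectors.toList());
    }

    // cutPage(page, pageSize): assumes 1-based pages
    static <T> List<T> cutPage(Stream<T> rows, int page, int pageSize) {
        return rows.skip((long) (page - 1) * pageSize).limit(pageSize).collect(Collectors.toList());
    }

    // cutLast(n): must materialize everything, since the size is unknown until the end
    static <T> List<T> cutLast(Stream<T> rows, int n) {
        List<T> all = rows.collect(Collectors.toList());
        return all.subList(Math.max(0, all.size() - n), all.size());
    }
}
</code>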

2.9 Frame Parameter Settings

<code>defaultScale(int scale, RoundingMode roundingMode); // default number of decimal places and rounding mode for BigDecimal results
</code>
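The two arguments of defaultScale correspond to the two settings of BigDecimal division: how many decimal places to keep and how to round. This plain-Java sketch also shows why such a default is needed at all: BigDecimal.divide without a scale throws ArithmeticException when the quotient has a non-terminating decimal expansion, as 10 / 3 does. That JDFrame applies the configured scale to results such as avg is an assumption based on the description above.

<code>import java.math.BigDecimal;
import java.math.RoundingMode;

class ScaleSketch {

    // Divide with an explicit scale (decimal places) and rounding mode,
    // the two settings defaultScale(scale, roundingMode) takes.
    static BigDecimal divide(BigDecimal a, BigDecimal b, int scale, RoundingMode mode) {
        return a.divide(b, scale, mode);
    }

    // True if exact division fails, e.g. 10 / 3 has no finite decimal expansion.
    static boolean throwsWithoutScale(BigDecimal a, BigDecimal b) {
        try {
            a.divide(b);
            return false;
        } catch (ArithmeticException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        BigDecimal ten = BigDecimal.valueOf(10);
        BigDecimal three = BigDecimal.valueOf(3);
        System.out.println(divide(ten, three, 2, RoundingMode.HALF_UP)); // 3.33
        System.out.println(divide(ten, three, 4, RoundingMode.HALF_UP)); // 3.3333
        System.out.println(throwsWithoutScale(ten, three));              // true
    }
}
</code>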

2.10 Miscellaneous

<code>// Percentage conversion: score * 100, rounded to 2 decimal places, written back via setScore
SDFrame<Student> percent = SDFrame.read(list).mapPercent(Student::getScore, Student::setScore, 2);

// Split the rows into consecutive sub-lists of 5 rows (the last one may be shorter)
List<List<Student>> partitions = SDFrame.read(list).partition(5).toLists();

// Sort by age DESC, then write each row's position into its rank field
SDFrame.read(list)
    .sortDesc(Student::getAge)
    .addRowNumberCol(Student::setRank)
    .show(30);

// Replenish missing dimension values: for every school in allSchools that has no rows,
// add one row built by the lambda (with the sample data above, that is 四中, No. 4 Middle School)
List<String> allSchools = Arrays.asList("一中", "二中", "三中", "四中");
SDFrame.read(list).replenish(Student::getSchool, allSchools, school -> new Student(school)).show();
</code>
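The following plain-Java sketch mirrors four of the operations above on ordinary lists: percentage conversion, fixed-size partitioning, row numbering and replenishing missing keys. All helper names are hypothetical, and three behaviors are assumptions rather than documented JDFrame semantics: rounding uses RoundingMode.HALF_UP, row numbers start at 1, and filler rows are appended after the existing ones.

<code>import java.math.BigDecimal;
import java.math.RoundingMode;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.BiConsumer;
import java.util.function.Function;

class MiscSketch {

    // value * 100, rounded to 'scale' decimal places (HALF_UP assumed)
    static BigDecimal toPercent(BigDecimal value, int scale) {
        return value.multiply(BigDecimal.valueOf(100)).setScale(scale, RoundingMode.HALF_UP);
    }

    // Consecutive chunks of 'size' rows; the last chunk may be shorter.
    static <T> List<List<T>> partition(List<T> rows, int size) {
        if (size <= 0) {
            throw new IllegalArgumentException("size must be positive");
        }
        List<List<T>> chunks = new ArrayList<>();
        for (int i = 0; i < rows.size(); i += size) {
            chunks.add(new ArrayList<>(rows.subList(i, Math.min(i + size, rows.size()))));
        }
        return chunks;
    }

    // Write 1, 2, 3, ... into the rows in their current order (1-based numbering assumed).
    static <T> void addRowNumbers(List<T> rows, BiConsumer<? super T, Integer> setter) {
        for (int i = 0; i < rows.size(); i++) {
            setter.accept(rows.get(i), i + 1);
        }
    }

    // Append one filler row for every expected key that has no row yet.
    static <T, K> List<T> replenish(List<T> rows, Function<? super T, ? extends K> key,
                                    List<K> allKeys, Function<? super K, ? extends T> filler) {
        Set<K> present = new HashSet<>();
        for (T row : rows) {
            present.add(key.apply(row));
        }
        List<T> out = new ArrayList<>(rows);
        for (K k : allKeys) {
            if (!present.contains(k)) {
                out.add(filler.apply(k));
            }
        }
        return out;
    }
}
</code>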

Project Source

GitHub: https://github.com/burukeYou/JDFrame

Conclusion

JDFrame applies each operation immediately (eager evaluation), so intermediate results can be inspected after every step. SDFrame follows the lazy evaluation model of Java streams, in which the pipeline only runs when a result is requested. Choose JDFrame when you need to inspect intermediate state and SDFrame for straightforward stream-style processing. Either way, the library brings DataFrame-style semantics to the JVM and points to a more readable direction for Java data-processing APIs.
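The lazy model can be observed directly with plain Java streams: intermediate operations such as filter only describe the pipeline, and no element is processed until a terminal operation such as collect runs. The call counter in this sketch is purely illustrative; it shows Java stream behavior, not SDFrame or JDFrame internals.

<code>import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class LazyVsEager {

    // Returns {filter calls before the terminal operation, calls after it, rows kept}.
    static int[] lazy(List<Integer> ages) {
        AtomicInteger calls = new AtomicInteger();
        Stream<Integer> pipeline = ages.stream()
                .filter(a -> { calls.incrementAndGet(); return a >= 12; }); // only builds the pipeline
        int before = calls.get();                                         // 0: nothing has run yet
        List<Integer> kept = pipeline.collect(Collectors.toList());       // terminal op runs the filter
        return new int[] {before, calls.get(), kept.size()};
    }

    // Eager counterpart: the filter step runs in full immediately, and its
    // intermediate result is an ordinary list that can be inspected right away.
    static int[] eager(List<Integer> ages) {
        AtomicInteger calls = new AtomicInteger();
        List<Integer> kept = new ArrayList<>();
        for (Integer a : ages) {
            calls.incrementAndGet();
            if (a >= 12) {
                kept.add(a);
            }
        }
        return new int[] {calls.get(), kept.size()};
    }
}
</code>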
