JDFrame/SDFrame: A JVM‑Level DataFrame API for Simplified Java Stream Processing

This article introduces JDFrame and SDFrame, the two frame types of the JDFrame Java library, which offers a DataFrame-style, semantic API for simplifying stream operations. It covers dependency setup, a quick-start example, matrix viewing, filtering, aggregation, deduplication, grouping, sorting, joining, pagination, and window functions, compares the two execution models, and links to the source code and documentation.

Top Architect

The author, a senior architect, presents a JVM‑level DataFrame tool designed to make Java 8 stream processing more semantic and concise, inspired by DataFrame models used in big‑data frameworks like Spark.

0. Introduction

Java streams often require verbose API calls. The author proposes a DataFrame-style API that abstracts the common operations and lets fields be selected with lambdas or method references such as Student::getAge.

1. Quick Start

1.1 Add Dependency

<dependency>
    <groupId>io.github.burukeyou</groupId>
    <artifactId>jdframe</artifactId>
    <version>0.0.4</version>
</dependency>

1.2 Example

Calculate the total score of students aged 9‑16 for each school and retrieve the top‑2 schools.

static List<Student> studentList = new ArrayList<>();
// ...populate studentList with Student objects...
SDFrame<FI2<String, BigDecimal>> sdf2 = SDFrame.read(studentList)
    .whereNotNull(Student::getAge)
    .whereBetween(Student::getAge, 9, 16)
    .groupBySum(Student::getSchool, Student::getScore)
    .sortDesc(FI2::getC2)
    .cutFirst(2);

sdf2.show();

Output:

c1    c2
三中  10
二中  7
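
For comparison, here is a rough sketch of the same query written directly against java.util.stream. It does not use JDFrame at all. The Student class, its fields, and the sample rows are hypothetical stand-ins, and the age range is treated as inclusive on both ends.

```java
import java.math.BigDecimal;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class QuickStartStreams {

    // Hypothetical stand-in for the article's Student class (only the fields used here).
    static final class Student {
        private final String school;
        private final Integer age;
        private final BigDecimal score;

        Student(String school, Integer age, String score) {
            this.school = school;
            this.age = age;
            this.score = new BigDecimal(score);
        }

        String getSchool() { return school; }
        Integer getAge() { return age; }
        BigDecimal getScore() { return score; }
    }

    // Sum scores per school for students aged 9 to 16 inclusive, then keep the n largest totals.
    static List<Map.Entry<String, BigDecimal>> topSchools(List<Student> students, int n) {
        Map<String, BigDecimal> totals = students.stream()
            .filter(s -> s.getAge() != null)
            .filter(s -> s.getAge() >= 9 && s.getAge() <= 16)
            .collect(Collectors.groupingBy(
                Student::getSchool,
                Collectors.reducing(BigDecimal.ZERO, Student::getScore, BigDecimal::add)));

        return totals.entrySet().stream()
            .sorted(Map.Entry.<String, BigDecimal>comparingByValue().reversed())
            .limit(n)
            .collect(Collectors.toList());
    }

    // Made-up sample data for illustration.
    static List<Student> sample() {
        return Arrays.asList(
            new Student("A", 10, "6"),
            new Student("A", 12, "4"),
            new Student("B", 9, "7"),
            new Student("C", 16, "3"),
            new Student("B", 20, "100"),   // outside the age range
            new Student("A", null, "50")); // no age recorded
    }

    public static void main(String[] args) {
        for (Map.Entry<String, BigDecimal> e : topSchools(sample(), 2)) {
            System.out.println(e.getKey() + "\t" + e.getValue()); // A 10, then B 7
        }
    }
}
```

The grouping, the reduction, and the re-sorting of map entries are the parts that the frame API folds into groupBySum, sortDesc, and cutFirst.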

2. API Cases

2.1 Matrix View

void show(int n); // print matrix
List<String> columns(); // get column names
<R> List<R> col(Function<T, R> function); // get column values
T head(); // first element
List<T> head(int n); // first n elements
T tail(); // last element
List<T> tail(int n); // last n elements
List<T> page(int page, int pageSize); // pagination

2.2 Filtering

.whereBetween(Student::getAge, 3, 6) // [3,6]
.whereBetweenR(Student::getAge, 3, 6) // (3,6]
.whereBetweenL(Student::getAge, 3, 6) // [3,6)
.whereNotNull(Student::getName)
.whereGt(Student::getAge, 3)
.whereGe(Student::getAge, 3)
.whereLt(Student::getAge, 3)
.whereIn(Student::getAge, Arrays.asList(3,7,8))
.whereNotIn(Student::getAge, Arrays.asList(3,7,8))
.whereEq(Student::getAge, 3)
.whereNotEq(Student::getAge, 3)
.whereLike(Student::getName, "jay")
.whereLikeLeft(Student::getName, "jay")
.whereLikeRight(Student::getName, "jay")
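
The three range filters differ only in which bound is included. The sketch below restates them as plain predicates, treating whereBetween as closed on both ends (as the quick-start's "aged 9-16" usage suggests). The helper names and the choice to reject null values are assumptions for illustration, not the library's implementation.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

final class IntervalFilters {

    // [lo, hi]: both bounds included (assumed behaviour of whereBetween).
    static Predicate<Integer> between(int lo, int hi) {
        return v -> v != null && v >= lo && v <= hi;
    }

    // (lo, hi]: lower bound excluded, as annotated for whereBetweenR.
    static Predicate<Integer> betweenR(int lo, int hi) {
        return v -> v != null && v > lo && v <= hi;
    }

    // [lo, hi): upper bound excluded, as annotated for whereBetweenL.
    static Predicate<Integer> betweenL(int lo, int hi) {
        return v -> v != null && v >= lo && v < hi;
    }

    static List<Integer> keep(List<Integer> values, Predicate<Integer> test) {
        return values.stream().filter(test).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> ages = Arrays.asList(2, 3, 4, 6, 7);
        System.out.println(keep(ages, between(3, 6)));  // [3, 4, 6]
        System.out.println(keep(ages, betweenR(3, 6))); // [4, 6]
        System.out.println(keep(ages, betweenL(3, 6))); // [3, 4]
    }
}
```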

2.3 Aggregation

JDFrame<Student> frame = JDFrame.read(studentList);
Student maxAgeStudent = frame.max(Student::getAge);
Integer maxAge = frame.maxValue(Student::getAge);
Student minAgeStudent = frame.min(Student::getAge);
BigDecimal avgAge = frame.avg(Student::getAge);
BigDecimal sumAge = frame.sum(Student::getAge);
MaxMin<Student> maxMinStudent = frame.maxMin(Student::getAge);
MaxMin<Integer> maxMinAge = frame.maxMinValue(Student::getAge);

2.4 Deduplication

List<Student> distinctByObject = SDFrame.read(studentList).distinct().toLists();
List<Student> distinctBySchool = SDFrame.read(studentList).distinct(Student::getSchool).toLists();
List<Student> distinctByComposite = SDFrame.read(studentList).distinct(e -> e.getSchool() + e.getLevel()).toLists();
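
Deduplicating by a key can be sketched in plain Java as below. This is an illustration only: distinctBy is a hypothetical helper, keeping the first row per key is an assumption, and the rows are made up. The sample also shows a pitfall of building a composite key by string concatenation, where two different (school, level) pairs can collapse into the same key.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

final class DistinctByKey {

    // Keeps the first row seen for each key (an assumption about which duplicate survives).
    static <T, K> List<T> distinctBy(List<T> rows, Function<? super T, ? extends K> key) {
        Set<K> seen = new HashSet<>();
        List<T> kept = new ArrayList<>();
        for (T row : rows) {
            if (seen.add(key.apply(row))) {
                kept.add(row);
            }
        }
        return kept;
    }

    // Made-up {school, level} rows; the first two differ but concatenate to the same string.
    static List<String[]> sample() {
        return Arrays.asList(
            new String[] {"ab", "c"},
            new String[] {"a", "bc"},
            new String[] {"ab", "c"});
    }

    public static void main(String[] args) {
        List<String[]> rows = sample();
        // Concatenated key: "abc" for every row, so only one row survives.
        System.out.println(distinctBy(rows, r -> r[0] + r[1]).size());               // 1
        // List key compares element by element, so the two distinct pairs survive.
        System.out.println(distinctBy(rows, r -> Arrays.asList(r[0], r[1])).size()); // 2
    }
}
```

When the concatenated parts can overlap like this, a separator or a structured key (a list or a small value class) avoids accidental collisions.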

2.5 Group & Aggregate

List<FI2<String, BigDecimal>> sumBySchool = frame.groupBySum(Student::getSchool, Student::getAge).toLists();
List<FI2<String, Integer>> maxBySchool = frame.groupByMaxValue(Student::getSchool, Student::getAge).toLists();
List<FI2<String, Long>> countBySchool = frame.groupByCount(Student::getSchool).toLists();
// multi‑level grouping examples omitted for brevity
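
For readers who want to see what a group-and-aggregate call replaces, here is a minimal sketch of count-per-group and max-per-group using only java.util.stream.Collectors. The Student class and the data are hypothetical, and a LinkedHashMap is used so that groups appear in first-seen order.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class GroupAggregates {

    // Hypothetical stand-in for the article's Student class.
    static final class Student {
        private final String school;
        private final Integer age;

        Student(String school, Integer age) {
            this.school = school;
            this.age = age;
        }

        String getSchool() { return school; }
        Integer getAge() { return age; }
    }

    // Number of rows per school, like COUNT(*) ... GROUP BY school.
    static Map<String, Long> countBySchool(List<Student> students) {
        return students.stream().collect(
            Collectors.groupingBy(Student::getSchool, LinkedHashMap::new, Collectors.counting()));
    }

    // Largest age per school, like MAX(age) ... GROUP BY school (ages must be non-null).
    static Map<String, Integer> maxAgeBySchool(List<Student> students) {
        return students.stream().collect(
            Collectors.toMap(Student::getSchool, Student::getAge, Integer::max, LinkedHashMap::new));
    }

    // Made-up sample data for illustration.
    static List<Student> sample() {
        return Arrays.asList(
            new Student("A", 10),
            new Student("B", 9),
            new Student("A", 12),
            new Student("C", 16),
            new Student("B", 20));
    }

    public static void main(String[] args) {
        System.out.println(countBySchool(sample()));  // {A=2, B=2, C=1}
        System.out.println(maxAgeBySchool(sample())); // {A=12, B=20, C=16}
    }
}
```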

2.6 Sorting

// ORDER BY age DESC
SDFrame.read(studentList).sortDesc(Student::getAge);
// ORDER BY age DESC, level ASC
SDFrame.read(studentList).sortAsc(Sorter.sortDescBy(Student::getAge).sortAsc(Student::getLevel));
// ascending order using a custom Comparator
SDFrame.read(studentList).sortAsc(Comparator.comparing(e -> e.getLevel() + e.getId()));
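
A multi-key ordering such as "age descending, then level ascending" can also be expressed with the standard Comparator combinators. The sketch below is plain Java rather than the Sorter API, and the Student class and data are hypothetical.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

final class MultiKeySort {

    // Hypothetical stand-in for the article's Student class.
    static final class Student {
        private final Integer id;
        private final Integer age;
        private final Integer level;

        Student(Integer id, Integer age, Integer level) {
            this.id = id;
            this.age = age;
            this.level = level;
        }

        Integer getId() { return id; }
        Integer getAge() { return age; }
        Integer getLevel() { return level; }
    }

    // Age descending first; ties on age are broken by level ascending.
    static final Comparator<Student> AGE_DESC_LEVEL_ASC =
        Comparator.comparing(Student::getAge).reversed().thenComparing(Student::getLevel);

    static List<Integer> sortedIds(List<Student> students) {
        return students.stream()
            .sorted(AGE_DESC_LEVEL_ASC)
            .map(Student::getId)
            .collect(Collectors.toList());
    }

    // Made-up sample data for illustration.
    static List<Student> sample() {
        return Arrays.asList(
            new Student(1, 10, 2),
            new Student(2, 12, 3),
            new Student(3, 12, 1),
            new Student(4, 9, 1));
    }

    public static void main(String[] args) {
        System.out.println(sortedIds(sample())); // [3, 2, 1, 4]
    }
}
```

Note that reversed() applies to everything built so far, which is why it is called before thenComparing here.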

2.7 Joining

SDFrame<Student> sdf = SDFrame.read(studentList);
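// Per-school aggregate, e.g. the groupBySum result from section 1.2
SDFrame<FI2<String, BigDecimal>> sdf2 = SDFrame.read(studentList)
    .groupBySum(Student::getSchool, Student::getScore);
SDFrame<UserInfo> frame = sdf.join(sdf2, (a, b) -> a.getSchool().equals(b.getC1()), (a, b) -> {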
    UserInfo ui = new UserInfo();
    ui.setKey1(a.getSchool());
    ui.setKey2(b.getC2().intValue());
    ui.setKey3(String.valueOf(a.getId()));
    return ui;
});
frame.show(5);
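
The join call takes a match condition and a merge function. One way to picture that contract is a nested-loop inner join, sketched below in plain Java. Whether the library joins this way internally is not stated in the article; the join helper, the inner-join semantics, and the data are assumptions for illustration.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;
import java.util.function.BiPredicate;

final class NestedLoopJoin {

    // Inner join: emit merge(a, b) for every pair that satisfies the condition.
    static <A, B, R> List<R> join(List<A> left, List<B> right,
                                  BiPredicate<A, B> on, BiFunction<A, B, R> merge) {
        List<R> out = new ArrayList<>();
        for (A a : left) {
            for (B b : right) {
                if (on.test(a, b)) {
                    out.add(merge.apply(a, b));
                }
            }
        }
        return out;
    }

    // Made-up data: one school name per student on the left, per-school totals on the right.
    static List<String> demo() {
        List<String> studentSchools = Arrays.asList("A", "B", "A", "C");
        List<Map.Entry<String, Integer>> totals = Arrays.asList(
            new SimpleEntry<String, Integer>("A", 10),
            new SimpleEntry<String, Integer>("B", 7));
        return join(studentSchools, totals,
            (school, total) -> school.equals(total.getKey()),
            (school, total) -> school + ":" + total.getValue());
    }

    public static void main(String[] args) {
        System.out.println(demo()); // [A:10, B:7, A:10]
    }
}
```

School "C" has no matching total, so it drops out of the result. A nested loop costs one comparison per pair of rows; for equality conditions on large inputs, indexing one side in a map first is the usual alternative.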

2.8 Cutting (Pagination & Ranking)

cutFirst(int n); // first n rows
cutLast(int n); // last n rows
cut(Integer start, Integer end); // sub‑list
cutPage(int page, int pageSize); // pagination
cutFirstRank(Sorter<T> sorter, int n); // top n by rank
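
The difference between cutting by row count and cutting by rank matters when rows tie. The sketch below shows one possible reading in plain Java: pages are assumed to be numbered from 1, and a rank cut is assumed to keep every row whose rank is at most n, with rows that compare equal sharing a rank. Both helpers are hypothetical and may not match the library's exact behaviour.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

final class CutSketch {

    // Rows of the given page; page numbers are assumed to start at 1.
    static <T> List<T> page(List<T> rows, int page, int pageSize) {
        return rows.stream()
            .skip((long) (page - 1) * pageSize)
            .limit(pageSize)
            .collect(Collectors.toList());
    }

    // Keep every row whose rank is at most n; rows that compare equal share a rank.
    static <T> List<T> firstRank(List<T> rows, Comparator<? super T> order, int n) {
        List<T> sorted = new ArrayList<>(rows);
        sorted.sort(order);
        List<T> kept = new ArrayList<>();
        int rank = 0;
        T previous = null;
        for (T row : sorted) {
            // A new rank starts at the first row and whenever the sort key changes.
            if (rank == 0 || order.compare(previous, row) != 0) {
                rank++;
            }
            if (rank > n) {
                break;
            }
            kept.add(row);
            previous = row;
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Integer> rows = Arrays.asList(1, 2, 3, 4, 5, 6, 7);
        System.out.println(page(rows, 2, 3)); // [4, 5, 6]
        System.out.println(page(rows, 3, 3)); // [7]

        List<Integer> scores = Arrays.asList(70, 90, 80, 90);
        // Two ranks yield three rows because both 90s share rank 1.
        System.out.println(firstRank(scores, Comparator.<Integer>reverseOrder(), 2)); // [90, 90, 80]
    }
}
```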

2.9 Frame Parameter Settings

defaultScale(int scale, RoundingMode roundingMode); // set default decimal precision
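
A default scale and rounding mode matter because results such as averages are BigDecimal values, and BigDecimal division needs both whenever the quotient does not terminate. The sketch below shows this with the standard BigDecimal API only; the helper names are hypothetical and nothing here calls JDFrame.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

final class ScaleSketch {

    // Divide with an explicit scale and rounding mode, as an average calculation must.
    static BigDecimal divide(String a, String b, int scale, RoundingMode mode) {
        return new BigDecimal(a).divide(new BigDecimal(b), scale, mode);
    }

    // True when exact division fails because the quotient has no finite decimal expansion.
    static boolean exactDivisionFails(String a, String b) {
        try {
            new BigDecimal(a).divide(new BigDecimal(b));
            return false;
        } catch (ArithmeticException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(divide("2", "3", 2, RoundingMode.HALF_UP)); // 0.67
        System.out.println(divide("2", "3", 2, RoundingMode.DOWN));    // 0.66
        System.out.println(exactDivisionFails("2", "3"));              // true
    }
}
```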

2.10 Miscellaneous

Percentage conversion: SDFrame.read(list).mapPercent(...)

Partitioning: split list into sub‑lists of a given size.

Generate row numbers based on sorting order.

Replenish missing dimension values in grouping results.
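
Two of the utilities listed above, partitioning and replenishing missing groups, are easy to picture in plain Java. The sketch below is a hypothetical illustration of the ideas, not the library's API: partition splits a list into consecutive chunks, and replenish adds a default value for every expected group key that the grouped result lacks.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class MiscSketch {

    // Split rows into consecutive sub-lists of at most size elements.
    static <T> List<List<T>> partition(List<T> rows, int size) {
        if (size <= 0) {
            throw new IllegalArgumentException("size must be positive");
        }
        List<List<T>> parts = new ArrayList<>();
        for (int from = 0; from < rows.size(); from += size) {
            int to = Math.min(from + size, rows.size());
            parts.add(new ArrayList<>(rows.subList(from, to)));
        }
        return parts;
    }

    // Add a default entry for every expected key that the grouping result lacks.
    static <K, V> Map<K, V> replenish(Map<K, V> grouped, List<K> expectedKeys, V defaultValue) {
        Map<K, V> filled = new LinkedHashMap<>(grouped);
        for (K key : expectedKeys) {
            filled.putIfAbsent(key, defaultValue);
        }
        return filled;
    }

    public static void main(String[] args) {
        System.out.println(partition(Arrays.asList(1, 2, 3, 4, 5), 2)); // [[1, 2], [3, 4], [5]]

        Map<String, Long> counts = new LinkedHashMap<>();
        counts.put("A", 2L);
        counts.put("C", 1L);
        // School "B" had no rows, so it is added with a count of 0.
        System.out.println(replenish(counts, Arrays.asList("A", "B", "C"), 0L)); // {A=2, C=1, B=0}
    }
}
```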

3. Window Functions

JDFrame also supports programmable window functions; see the linked tutorial for details.

4. Final Notes

The library provides two frame types: SDFrame (lazy, stream‑compatible) and JDFrame (eager, intermediate‑state accessible). Choose SDFrame for simple stream pipelines and JDFrame when you need to pause and resume processing, similar to a DataFrame in Python/Pandas.
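
The trade-off mirrors a property of java.util.stream itself: a stream pipeline can be consumed only once, while a materialised list can be queried repeatedly. The sketch below demonstrates that underlying behaviour with the standard library only. It says nothing about JDFrame internals; that SDFrame shares the single-use constraint is an inference from the "lazy, stream-compatible" description above.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class LazyVsEager {

    // A stream pipeline can be consumed by one terminal operation only.
    static boolean secondTerminalCallFails() {
        Stream<Integer> adults = Stream.of(15, 18, 21).filter(age -> age >= 18);
        adults.count();
        try {
            adults.count(); // the pipeline has already been consumed
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    // Materialising into a List keeps the intermediate result available for several queries.
    static int[] reuseMaterialised() {
        List<Integer> adults = Stream.of(15, 18, 21)
            .filter(age -> age >= 18)
            .collect(Collectors.toList());
        int count = adults.size();
        int sum = adults.stream().mapToInt(Integer::intValue).sum();
        return new int[] {count, sum};
    }

    public static void main(String[] args) {
        System.out.println(secondTerminalCallFails()); // true
        int[] result = reuseMaterialised();
        System.out.println(result[0] + " " + result[1]); // 2 39
    }
}
```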

Source code: https://github.com/burukeYou/JDFrame

Maven dependency: https://central.sonatype.com/artifact/io.github.burukeyou/jdframe
