
JDFrame/SDFrame: A Semantic Java Stream DataFrame Library for Simplified Data Processing

This article introduces JDFrame/SDFrame, a JVM‑level DataFrame library that offers a more semantic and concise API than raw Java 8 stream operations. It shows how to add the Maven dependency, walks through practical examples of filtering, grouping, sorting, joining, and pagination, and explains the differences between the mutable JDFrame and the immutable SDFrame.

Top Architect

The author, a senior architect, presents a Java library called JDFrame/SDFrame that mimics DataFrame concepts from Spark/Pandas to make Java 8 stream processing more expressive and less error‑prone.

Quick start: add the Maven dependency

<dependency>
    <groupId>io.github.burukeyou</groupId>
    <artifactId>jdframe</artifactId>
    <version>0.0.4</version>
</dependency>

Next, create a list of Student objects (the original article provides the full POJO definition). The following query keeps students aged 9‑16, sums their scores per school, and returns the two schools with the highest totals:

SDFrame<FI2<String, BigDecimal>> sdf2 = SDFrame.read(studentList)
    .whereNotNull(Student::getAge)
    .whereBetween(Student::getAge, 9, 16)
    .groupBySum(Student::getSchool, Student::getScore)
    .sortDesc(FI2::getC2)
    .cutFirst(2);

sdf2.show();

The output shows the school name and the aggregated score.
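For comparison, here is a rough sketch of the same query written with plain Java 8 streams and no JDFrame dependency. The `Student` class below is a hypothetical stand-in for the article's POJO, `topSchools` and `sampleStudents` are illustrative names, and the sketch assumes the age range is inclusive on both ends and that scores are never null.

```java
import java.math.BigDecimal;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class TopSchoolsSketch {

    // Hypothetical stand-in for the article's Student POJO; only the fields this query reads.
    static class Student {
        private final String school;
        private final Integer age;
        private final BigDecimal score;

        Student(String school, Integer age, BigDecimal score) {
            this.school = school;
            this.age = age;
            this.score = score;
        }

        String getSchool() { return school; }
        Integer getAge() { return age; }
        BigDecimal getScore() { return score; }
    }

    // Filter by age, sum scores per school, sort by total descending, keep the first n.
    static List<Map.Entry<String, BigDecimal>> topSchools(List<Student> students, int n) {
        Map<String, BigDecimal> totals = students.stream()
                .filter(s -> s.getAge() != null)
                // Assumption: both bounds are inclusive.
                .filter(s -> s.getAge() >= 9 && s.getAge() <= 16)
                .collect(Collectors.groupingBy(
                        Student::getSchool,
                        Collectors.reducing(BigDecimal.ZERO, Student::getScore, BigDecimal::add)));

        return totals.entrySet().stream()
                .sorted(Map.Entry.<String, BigDecimal>comparingByValue().reversed())
                .limit(n)
                .collect(Collectors.toList());
    }

    // Made-up sample data for illustration only.
    static List<Student> sampleStudents() {
        return Arrays.asList(
                new Student("A", 10, new BigDecimal("50")),
                new Student("A", 12, new BigDecimal("30")),
                new Student("A", 9, new BigDecimal("5")),
                new Student("B", 11, new BigDecimal("90")),
                new Student("C", 20, new BigDecimal("100")),  // outside the age range
                new Student("D", null, new BigDecimal("70")), // null age is dropped
                new Student("E", 16, new BigDecimal("10")));
    }

    public static void main(String[] args) {
        // Prints "B 90" then "A 85" for the sample data.
        for (Map.Entry<String, BigDecimal> e : topSchools(sampleStudents(), 2)) {
            System.out.println(e.getKey() + " " + e.getValue());
        }
    }
}
```

The stream version needs an intermediate map and a second pipeline over its entries, which is the kind of ceremony the chained SDFrame calls are meant to hide.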

The API catalogue covers matrix viewing, filtering, aggregation, deduplication, grouping, sorting, joining, pagination, frame configuration, and miscellaneous utilities such as percentage conversion, partitioning, and row‑number generation. Example snippets include:

// matrix view
void show(int n);
List<String> columns();
<R> List<R> col(Function<T, R> function);
T head();
List<T> head(int n);
T tail();
List<T> tail(int n);
List<T> page(int page, int pageSize);
// filtering
.whereBetween(Student::getAge, 3, 6)
.whereBetweenR(Student::getAge, 3, 6) // (3,6]
.whereNotNull(Student::getName)
.whereGt(Student::getAge, 3)
.whereIn(Student::getAge, Arrays.asList(3,7,8))
.whereLike(Student::getName, "jay");
// aggregation
frame.max(Student::getAge);
frame.avg(Student::getAge);
frame.sum(Student::getAge);
frame.groupBySum(Student::getSchool, Student::getAge);
frame.groupByCount(Student::getSchool);
// deduplication
SDFrame.read(studentList).distinct().toLists();
SDFrame.read(studentList).distinct(Student::getSchool).toLists();
// sorting
SDFrame.read(studentList).sortDesc(Student::getAge);
SDFrame.read(studentList).sortAsc(Sorter.sortDescBy(Student::getAge).sortAsc(Student::getLevel));
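// joining (fragment): builds one result row from a matched pair,
// with a taken from the left frame and b from the right frame
UserInfo userInfo = new UserInfo();
userInfo.setKey1(a.getSchool());
userInfo.setKey2(b.getC2().intValue());
userInfo.setKey3(String.valueOf(a.getId()));
return userInfo;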
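
The joining fragment above merges one matched pair of rows into an output object. As a minimal plain-Java sketch of that inner-join pattern (not the library's implementation), the hypothetical `innerJoin` helper below takes a match condition and a merge function. The row shapes and sample data are made up for illustration.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;
import java.util.function.BiPredicate;

class InnerJoinSketch {

    // Nested-loop inner join: every (left, right) pair that satisfies `on` yields one merged row.
    static <L, R, O> List<O> innerJoin(List<L> left, List<R> right,
                                       BiPredicate<L, R> on, BiFunction<L, R, O> merge) {
        List<O> out = new ArrayList<>();
        for (L a : left) {
            for (R b : right) {
                if (on.test(a, b)) {
                    out.add(merge.apply(a, b));
                }
            }
        }
        return out;
    }

    static List<String> demo() {
        // Left rows: {school, studentId}. Right rows: school -> total score.
        List<String[]> students = Arrays.asList(
                new String[] {"A", "1"},
                new String[] {"B", "2"},
                new String[] {"C", "3"}); // no matching total, so it is dropped

        List<Map.Entry<String, Integer>> totals = new ArrayList<>();
        totals.add(new SimpleEntry<>("A", 85));
        totals.add(new SimpleEntry<>("B", 90));

        return innerJoin(students, totals,
                (a, b) -> a[0].equals(b.getKey()),
                (a, b) -> a[0] + ":" + b.getValue() + ":" + a[1]);
    }

    public static void main(String[] args) {
        // Prints [A:85:1, B:90:2]
        System.out.println(demo());
    }
}
```

The nested loop is quadratic in the input sizes. Indexing the right side by key in a `HashMap` would be the usual refinement for larger inputs.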

The article also explains the key difference between JDFrame (stateful, operations take effect immediately) and SDFrame (stateless, similar to Java streams, requiring a new read after each terminal operation).
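
The Java stream behaviour that SDFrame is compared to can be shown with the standard library alone. The sketch below does not use JDFrame or SDFrame. It only shows that a `java.util.stream.Stream` is consumed by its first terminal operation, and that re-reading the source (here through a `Supplier`, an illustrative choice) gives a fresh pipeline each time. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Supplier;
import java.util.stream.Stream;

class StreamReuseSketch {

    // Returns true if a second terminal operation on the same stream is rejected.
    static boolean secondTerminalOpFails(List<Integer> ages) {
        Stream<Integer> filtered = ages.stream().filter(a -> a >= 18);
        filtered.count(); // first terminal operation consumes the stream
        try {
            filtered.count(); // reusing the consumed stream
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    // Re-reading the source for each terminal operation works any number of times.
    static long[] countTwiceByRereading(List<Integer> ages) {
        Supplier<Stream<Integer>> filtered = () -> ages.stream().filter(a -> a >= 18);
        return new long[] { filtered.get().count(), filtered.get().count() };
    }

    public static void main(String[] args) {
        List<Integer> ages = Arrays.asList(12, 18, 30);
        System.out.println(secondTerminalOpFails(ages));                  // true
        System.out.println(Arrays.toString(countTwiceByRereading(ages))); // [2, 2]
    }
}
```

By the article's description, SDFrame behaves like the one-shot case and needs a new read after each terminal operation, while JDFrame applies each operation immediately and can keep being used.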

Finally, the author provides links to the source repository, Maven Central, and a tutorial on window functions, and invites readers to discuss further extensions.

