JDFrame/SDFrame: A JVM‑Level DataFrame‑Like Stream API for Java

This article introduces JDFrame/SDFrame, a Java library that offers a DataFrame-style, semantic API on top of stream processing. It covers dependency setup and a quick start, then walks through filtering, aggregation, deduplication, grouping, sorting, joining, and utility functions, with a code snippet for each.

Code Ape Tech Column

1. Quick Start

Include the Maven dependency:

<dependency>
    <groupId>io.github.burukeyou</groupId>
    <artifactId>jdframe</artifactId>
    <version>0.0.2</version>
</dependency>

Example: calculate the total score of students aged 9‑16 for each school and retrieve the top‑2 schools.

static List<Student> studentList = new ArrayList<>();
static {
    studentList.add(new Student(1,"a","一中","一年级",11, new BigDecimal(1)));
    // ... other students omitted for brevity ...
    studentList.add(new Student(7,"e","三中","二年级",15, new BigDecimal(5)));
}

SDFrame<FI2<String, BigDecimal>> sdf2 = SDFrame.read(studentList)
        .whereNotNull(Student::getAge)
        .whereBetween(Student::getAge,9,16)
        .groupBySum(Student::getSchool, Student::getScore)
        .sortDesc(FI2::getC2)
        .cutFirst(2);

sdf2.show();

Output:

c1  c2
三中 10
二中 7
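
For comparison, the same pipeline can be written with plain `java.util.stream`. The sketch below is not how JDFrame is implemented; it only shows the steps the fluent chain replaces. `QsStudent`, `QuickStartWithStreams`, and the sample rows are hypothetical and are not the article's data set.

```java
import java.math.BigDecimal;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical stand-in for the article's Student, reduced to the fields used here.
class QsStudent {
    final String school;
    final Integer age;
    final BigDecimal score;

    QsStudent(String school, Integer age, int score) {
        this.school = school;
        this.age = age;
        this.score = BigDecimal.valueOf(score);
    }
}

class QuickStartWithStreams {
    // Sum scores per school for ages in [9, 16], then keep the n largest totals.
    static List<Map.Entry<String, BigDecimal>> topSchools(List<QsStudent> students, int n) {
        Map<String, BigDecimal> totals = students.stream()
                .filter(s -> s.age != null && s.age >= 9 && s.age <= 16)
                .collect(Collectors.toMap(s -> s.school, s -> s.score, BigDecimal::add));
        return totals.entrySet().stream()
                .sorted(Map.Entry.<String, BigDecimal>comparingByValue().reversed())
                .limit(n)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Made-up rows, not the article's data set.
        List<QsStudent> data = Arrays.asList(
                new QsStudent("A", 11, 1), new QsStudent("A", 20, 9),
                new QsStudent("B", 12, 4), new QsStudent("B", null, 8),
                new QsStudent("C", 15, 5), new QsStudent("C", 14, 2));
        System.out.println(topSchools(data, 2)); // [C=7, B=4]
    }
}
```

The null check, range filter, grouping, sort, and limit each map to one call in the SDFrame chain above.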

2. API Examples

2.1 Matrix Information

void show(int n);               // print matrix to console
List<String> columns();        // get column names
<R> List<R> col(Function<T,R> f); // get a column by mapping function
T head();                      // first element
List<T> head(int n);           // first n elements
T tail();                      // last element
List<T> tail(int n);           // last n elements

2.2 Filtering

SDFrame.read(studentList)
    .whereBetween(Student::getAge,3,6)      // [3,6]
    .whereBetweenR(Student::getAge,3,6)     // (3,6]
    .whereBetweenL(Student::getAge,3,6)     // [3,6)
    .whereNotNull(Student::getName)         // not null and not an empty string
    .whereGt(Student::getAge,3)             // > 3
    .whereGe(Student::getAge,3)             // >= 3
    .whereLt(Student::getAge,3)             // < 3
    .whereIn(Student::getAge, Arrays.asList(3,7,8))     // age is 3, 7, or 8
    .whereNotIn(Student::getAge, Arrays.asList(3,7,8))  // age is none of 3, 7, 8
    .whereEq(Student::getAge,3)             // == 3
    .whereNotEq(Student::getAge,3)          // != 3
    .whereLike(Student::getName,"jay")      // like "%jay%"
    .whereLikeLeft(Student::getName,"jay")  // like "jay%"
    .whereLikeRight(Student::getName,"jay"); // like "%jay"
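
The three `whereBetween` variants differ only in which bound is inclusive. A minimal sketch of those interval semantics as plain predicates follows, taking the bracket notation in the comments above at face value. `IntervalFilters` and its methods are hypothetical names, and dropping nulls is a choice of this sketch that the library may not share.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

class IntervalFilters {
    // [lo, hi]: both ends included
    static Predicate<Integer> between(int lo, int hi) {
        return v -> v != null && v >= lo && v <= hi;
    }

    // (lo, hi]: lower end excluded
    static Predicate<Integer> betweenR(int lo, int hi) {
        return v -> v != null && v > lo && v <= hi;
    }

    // [lo, hi): upper end excluded
    static Predicate<Integer> betweenL(int lo, int hi) {
        return v -> v != null && v >= lo && v < hi;
    }

    static List<Integer> keep(List<Integer> values, Predicate<Integer> p) {
        return values.stream().filter(p).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> ages = Arrays.asList(2, 3, 4, 6, 7);
        System.out.println(keep(ages, between(3, 6)));  // [3, 4, 6]
        System.out.println(keep(ages, betweenR(3, 6))); // [4, 6]
        System.out.println(keep(ages, betweenL(3, 6))); // [3, 4]
    }
}
```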

2.3 Aggregation

JDFrame<Student> frame = JDFrame.read(studentList);
Student maxStudent = frame.max(Student::getAge);
Integer maxAge = frame.maxValue(Student::getAge);
Student minStudent = frame.min(Student::getAge);
Integer minAge = frame.minValue(Student::getAge);
BigDecimal avgAge = frame.avg(Student::getAge);
BigDecimal sumAge = frame.sum(Student::getAge);
MaxMin<Student> maxMin = frame.maxMin(Student::getAge);
MaxMin<Integer> maxMinVal = frame.maxMinValue(Student::getAge);

2.4 Deduplication

The native Stream.distinct() deduplicates whole objects only, using their equals()/hashCode(); JDFrame also provides distinct by field.

List<Student> distinct = SDFrame.read(studentList).distinct().toLists();
List<Student> bySchool = SDFrame.read(studentList).distinct(Student::getSchool).toLists();
List<Student> byComposite = SDFrame.read(studentList).distinct(e -> e.getSchool() + e.getLevel()).toLists();
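
To make the contrast with plain streams concrete, here is one way field-level deduplication is commonly hand-written without the library. It is a sketch, not JDFrame's implementation: `DistinctByKey.distinctBy` is a hypothetical helper, and keeping the first element per key is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

class DistinctByKey {
    // Keeps the first element seen for each key, preserving encounter order.
    static <T, K> List<T> distinctBy(List<T> items, Function<? super T, ? extends K> key) {
        Map<K, T> firstPerKey = new LinkedHashMap<>();
        for (T item : items) {
            firstPerKey.putIfAbsent(key.apply(item), item);
        }
        return new ArrayList<>(firstPerKey.values());
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("apple", "avocado", "banana", "blueberry", "cherry");
        // Deduplicate by first letter.
        System.out.println(distinctBy(names, s -> s.charAt(0))); // [apple, banana, cherry]
    }
}
```

A composite key, like the `e.getSchool() + e.getLevel()` example above, works the same way because the key function can return any value with sensible `equals()`/`hashCode()`.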

2.5 Simple Group‑by Aggregation

JDFrame<Student> frame = JDFrame.from(studentList);
List<FI2<String, BigDecimal>> sumAge = frame.groupBySum(Student::getSchool, Student::getAge).toLists();
List<FI2<String, Integer>> maxAge = frame.groupByMaxValue(Student::getSchool, Student::getAge).toLists();
List<FI2<String, Student>> maxStudent = frame.groupByMax(Student::getSchool, Student::getAge).toLists();
List<FI2<String, Integer>> minAge = frame.groupByMinValue(Student::getSchool, Student::getAge).toLists();
List<FI2<String, Long>> count = frame.groupByCount(Student::getSchool).toLists();
List<FI2<String, BigDecimal>> avgAge = frame.groupByAvg(Student::getSchool, Student::getAge).toLists();
// multi‑level grouping examples omitted for brevity

2.6 Sorting

// order by age desc
SDFrame.read(studentList).sortDesc(Student::getAge);
// order by age desc, level asc
SDFrame.read(studentList).sortDesc(Student::getAge).sortAsc(Student::getLevel);
// order by age asc
SDFrame.read(studentList).sortAsc(Student::getAge);
// sort with a custom Comparator
SDFrame.read(studentList).sortAsc(Comparator.comparing(e -> e.getLevel() + e.getId()));
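
For comparison, a two-key ordering (age descending, then level ascending) can be written with plain `Comparator` chaining. This is a sketch only: `SortRow` and `MultiKeySort` are hypothetical names, and whether the chained `sortDesc(...).sortAsc(...)` call resolves ties in exactly this way is an assumption.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical row type standing in for Student.
class SortRow {
    final String name;
    final int age;
    final int level;

    SortRow(String name, int age, int level) {
        this.name = name;
        this.age = age;
        this.level = level;
    }

    int getAge() { return age; }
    int getLevel() { return level; }
}

class MultiKeySort {
    // Primary key: age, descending. Tie-breaker: level, ascending.
    static List<String> namesByAgeDescLevelAsc(List<SortRow> rows) {
        Comparator<SortRow> order = Comparator.comparingInt(SortRow::getAge).reversed()
                .thenComparingInt(SortRow::getLevel);
        return rows.stream().sorted(order).map(r -> r.name).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<SortRow> rows = Arrays.asList(
                new SortRow("a", 11, 2), new SortRow("b", 15, 1),
                new SortRow("c", 11, 1), new SortRow("d", 15, 2));
        System.out.println(namesByAgeDescLevelAsc(rows)); // [b, d, c, a]
    }
}
```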

2.7 Joining Matrices

API list:

append(T t);                     // add element
union(IFrame<T> other);          // addAll
join(IFrame<K> other, JoinOn<T,K> on, Join<T,K,R> join);   // inner join
leftJoin(...);                  // left join (null for missing side)
rightJoin(...);                 // right join (null for missing side)

Example of inner join between a student matrix and the aggregated score matrix:

SDFrame<Student> sdf = SDFrame.read(studentList);
SDFrame<FI2<String, BigDecimal>> sdf2 = /* aggregation shown earlier */;
SDFrame<UserInfo> frame = sdf.join(sdf2,
    (a,b) -> a.getSchool().equals(b.getC1()),
    (a,b) -> {
        UserInfo ui = new UserInfo();
        ui.setKey1(a.getSchool());
        ui.setKey2(b.getC2().intValue());
        ui.setKey3(String.valueOf(a.getId()));
        return ui;
    });
frame.show(5);
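
The join calls take a match condition and a combiner, which is the shape of a classic nested-loop join. The sketch below illustrates inner and left join semantics in plain Java. It is not the library's algorithm, and `NestedLoopJoin` and its method signatures are hypothetical.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;
import java.util.function.BiPredicate;

class NestedLoopJoin {
    // Inner join: one output row per (left, right) pair that satisfies `on`.
    static <L, R, O> List<O> join(List<L> left, List<R> right,
                                  BiPredicate<L, R> on, BiFunction<L, R, O> combine) {
        List<O> out = new ArrayList<>();
        for (L l : left) {
            for (R r : right) {
                if (on.test(l, r)) out.add(combine.apply(l, r));
            }
        }
        return out;
    }

    // Left join: unmatched left rows are still emitted, with null as the right side.
    static <L, R, O> List<O> leftJoin(List<L> left, List<R> right,
                                      BiPredicate<L, R> on, BiFunction<L, R, O> combine) {
        List<O> out = new ArrayList<>();
        for (L l : left) {
            boolean matched = false;
            for (R r : right) {
                if (on.test(l, r)) {
                    out.add(combine.apply(l, r));
                    matched = true;
                }
            }
            if (!matched) out.add(combine.apply(l, null));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> schools = Arrays.asList("S1", "S2", "S9");
        List<Map.Entry<String, Integer>> totals = Arrays.asList(
                new SimpleEntry<String, Integer>("S1", 10),
                new SimpleEntry<String, Integer>("S2", 7));
        System.out.println(join(schools, totals,
                (s, t) -> s.equals(t.getKey()),
                (s, t) -> s + ":" + t.getValue()));                          // [S1:10, S2:7]
        System.out.println(leftJoin(schools, totals,
                (s, t) -> s.equals(t.getKey()),
                (s, t) -> s + ":" + (t == null ? "none" : t.getValue())));  // [S1:10, S2:7, S9:none]
    }
}
```

Because the unmatched side arrives as null in a left or right join, the combiner has to null-check it before reading fields.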

2.8 Other Utilities

Percentage conversion (equivalent to round(score*100,2)):

SDFrame<Student> pct = SDFrame.read(studentList)
    .mapPercent(Student::getScore, Student::setScore, 2);
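
The arithmetic behind `round(score*100, 2)` can be spelled out with `BigDecimal`. This is a sketch of the formula only: `PercentSketch.toPercent` is a hypothetical helper, and `RoundingMode.HALF_UP` is an assumption, since the rounding mode the library uses is not stated here.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

class PercentSketch {
    // value * 100, rounded to `scale` decimal places (HALF_UP is assumed).
    static BigDecimal toPercent(BigDecimal value, int scale) {
        return value.multiply(BigDecimal.valueOf(100)).setScale(scale, RoundingMode.HALF_UP);
    }

    public static void main(String[] args) {
        System.out.println(toPercent(new BigDecimal("0.12345"), 2)); // 12.35
        System.out.println(toPercent(new BigDecimal("0.5"), 2));     // 50.00
    }
}
```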

Partitioning a list into sub‑lists of size 5:

List<List<Student>> parts = SDFrame.read(studentList).partition(5).toLists();
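
Partitioning means cutting the list into consecutive chunks of at most the given size, with a shorter final chunk when the length is not a multiple of it. A minimal plain-Java sketch of that behavior follows; `PartitionSketch` is a hypothetical name, not the library's code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class PartitionSketch {
    // Splits `items` into consecutive sub-lists of at most `size` elements.
    static <T> List<List<T>> partition(List<T> items, int size) {
        if (size <= 0) throw new IllegalArgumentException("size must be positive");
        List<List<T>> parts = new ArrayList<>();
        for (int i = 0; i < items.size(); i += size) {
            int end = Math.min(i + size, items.size());
            parts.add(new ArrayList<>(items.subList(i, end))); // copy so chunks are independent
        }
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(partition(Arrays.asList(1, 2, 3, 4, 5, 6, 7), 3)); // [[1, 2, 3], [4, 5, 6], [7]]
    }
}
```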

Generating sequential numbers after sorting by age:

SDFrame.read(studentList)
    .sortDesc(Student::getAge)
    .addSortNoCol(Student::setRank)
    .show(30);

Generating ranking numbers where equal values share the same rank:

SDFrame<Student> df = SDFrame.read(studentList)
    .addRankingSameColDesc(Student::getAge, Student::setRank);
df.show(20);
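
The difference between sequential numbers and shared ranks is easiest to see on a small input. The sketch below assigns dense ranks (1, 1, 2, ...) to values already sorted in descending order. `RankingSketch` is a hypothetical name, and dense numbering is an assumption: the library might instead skip numbers after a tie (1, 1, 3, ...).

```java
import java.util.Arrays;

class RankingSketch {
    // Sequential numbers: the 1-based position in the sorted order.
    static int[] sequential(int n) {
        int[] numbers = new int[n];
        for (int i = 0; i < n; i++) numbers[i] = i + 1;
        return numbers;
    }

    // Dense ranks: equal neighbouring values share a rank; input must be sorted descending.
    static int[] denseRanks(int[] sortedDesc) {
        int[] ranks = new int[sortedDesc.length];
        int rank = 0;
        for (int i = 0; i < sortedDesc.length; i++) {
            if (i == 0 || sortedDesc[i] != sortedDesc[i - 1]) rank++;
            ranks[i] = rank;
        }
        return ranks;
    }

    public static void main(String[] args) {
        int[] ages = {15, 15, 12, 11, 11, 9};
        System.out.println(Arrays.toString(sequential(ages.length))); // [1, 2, 3, 4, 5, 6]
        System.out.println(Arrays.toString(denseRanks(ages)));        // [1, 1, 2, 3, 3, 4]
    }
}
```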

Replenishing missing dimension entries (e.g., schools or grades) using a ReplenishFunction:

// Fill missing schools
List<String> allSchools = Arrays.asList("一中","二中","三中","四中");
SDFrame.read(studentList)
    .replenish(Student::getSchool, allSchools, school -> new Student(school))
    .show();

// Fill missing grades within each school
SDFrame.read(studentList)
    .replenish(Student::getSchool, Student::getLevel,
        (school, level) -> new Student(school, level))
    .show(30);
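
The single-dimension case can be pictured as follows: collect the keys that already occur, then generate a placeholder row for each expected key that is missing. This is a plain-Java sketch under assumptions, not the library's implementation. `ReplenishSketch.replenish` is a hypothetical helper, and appending the generated rows at the end is a choice of this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

class ReplenishSketch {
    // Appends one generated row for every expected key that no existing row carries.
    static <T, K> List<T> replenish(List<T> rows, Function<T, K> keyOf,
                                    List<K> allKeys, Function<K, T> makeRow) {
        Set<K> present = new HashSet<>();
        for (T row : rows) present.add(keyOf.apply(row));

        List<T> out = new ArrayList<>(rows);
        for (K key : allKeys) {
            // Set.add returns false for keys already seen, so each gap is filled once.
            if (present.add(key)) out.add(makeRow.apply(key));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList("S1:3", "S3:5");            // "school:count" rows
        List<String> allSchools = Arrays.asList("S1", "S2", "S3", "S4");
        System.out.println(replenish(rows, r -> r.substring(0, 2), allSchools, k -> k + ":0"));
        // [S1:3, S3:5, S2:0, S4:0]
    }
}
```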

Conclusion

The library offers two frame types. SDFrame is lazy and stream-compatible, which suits simple one-pass pipelines. JDFrame is eager: each operation takes effect immediately, so intermediate results can be reused, which is closer to the DataFrame model.
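
The reuse point is easiest to see with the JDK's own `Stream`, which can be consumed only once. The sketch below shows that plain-JDK behavior next to a materialized intermediate result that several computations share. It illustrates the trade-off only and makes no claim about how SDFrame or JDFrame behave internally; `ReuseSketch` is a hypothetical name.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class ReuseSketch {
    // A java.util.stream.Stream is single-use: a second terminal operation throws.
    static boolean secondUseFails(List<Integer> ages) {
        Stream<Integer> adults = ages.stream().filter(a -> a >= 18);
        adults.count(); // first terminal operation consumes the stream
        try {
            adults.count(); // second use of the same stream
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    // Materializing the intermediate result once lets several computations share it.
    static long[] countAndMax(List<Integer> ages) {
        List<Integer> adults = ages.stream().filter(a -> a >= 18).collect(Collectors.toList());
        long count = adults.size();
        long max = adults.stream().mapToInt(Integer::intValue).max().orElse(-1);
        return new long[] {count, max};
    }

    public static void main(String[] args) {
        List<Integer> ages = Arrays.asList(12, 18, 30, 9);
        System.out.println(secondUseFails(ages));                // true
        System.out.println(Arrays.toString(countAndMax(ages))); // [2, 30]
    }
}
```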

Source code and Maven coordinates:

https://github.com/burukeYou/JDFrame
https://central.sonatype.com/artifact/io.github.burukeyou/jdframe

Future work may include richer matrix representations and tighter JVM language support.
