JDFrame/SDFrame: A JVM‑Level DataFrame‑Like Stream API for Java
This article introduces JDFrame/SDFrame, a Java library that provides a DataFrame‑style, semantic API for stream processing. It covers dependency setup and a quick start, then walks through examples of filtering, aggregation, grouping, sorting, joining, and utility functions, with code snippets and usage guidance.
1. Quick Start
Include the Maven dependency:
<dependency>
<groupId>io.github.burukeyou</groupId>
<artifactId>jdframe</artifactId>
<version>0.0.2</version>
</dependency>
Example: calculate the total score of students aged 9‑16 for each school and retrieve the top‑2 schools.
static List<Student> studentList = new ArrayList<>();
static {
// School names: 一中/二中/三中 = No. 1/2/3 Middle School; levels: 一年级/二年级 = Grade 1/Grade 2
studentList.add(new Student(1,"a","一中","一年级",11, new BigDecimal(1)));
// ... other students omitted for brevity ...
studentList.add(new Student(7,"e","三中","二年级",15, new BigDecimal(5)));
}
SDFrame<FI2<String, BigDecimal>> sdf2 = SDFrame.read(studentList)
.whereNotNull(Student::getAge)
.whereBetween(Student::getAge,9,16)
.groupBySum(Student::getSchool, Student::getScore)
.sortDesc(FI2::getC2)
.cutFirst(2);
sdf2.show();
Output:
c1 c2
三中 10
二中 7
2. API Examples
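For orientation, the quick‑start pipeline (drop null ages, keep ages 9 to 16, sum scores per school, sort descending, take the first two) can be written with plain java.util.stream as below. This is only a comparison sketch, not JDFrame code: the Student record is a simplified stand‑in for the article's class, the sample data is made up, and Java 16+ is assumed for records.

```java
import java.math.BigDecimal;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class QuickStartSketch {
    // Simplified stand-in for the article's Student class (only the fields used here).
    record Student(String school, Integer age, BigDecimal score) {}

    // Hypothetical sample data, not the article's student list.
    static List<Student> sample() {
        return Arrays.asList(
                new Student("A", 11, new BigDecimal(1)),
                new Student("A", 20, new BigDecimal(9)),   // outside the age range
                new Student("B", 12, new BigDecimal(4)),
                new Student("B", 15, new BigDecimal(3)),
                new Student("C", 10, new BigDecimal(2)),
                new Student("C", null, new BigDecimal(8))); // null age is dropped
    }

    // Total score per school for students aged 9..16 (inclusive), top n schools by total.
    static List<Map.Entry<String, BigDecimal>> topSchools(List<Student> students, int n) {
        Map<String, BigDecimal> totals = students.stream()
                .filter(s -> s.age() != null)
                .filter(s -> s.age() >= 9 && s.age() <= 16)
                .collect(Collectors.groupingBy(Student::school,
                        Collectors.reducing(BigDecimal.ZERO, Student::score, BigDecimal::add)));
        return totals.entrySet().stream()
                .sorted(Map.Entry.<String, BigDecimal>comparingByValue().reversed())
                .limit(n)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Prints "B 7" then "C 2"
        topSchools(sample(), 2).forEach(e -> System.out.println(e.getKey() + " " + e.getValue()));
    }
}
```

The two‑stage shape (collect into a map, then stream the entries again) is what the chained groupBySum / sortDesc / cutFirst calls hide.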
2.1 Matrix Information
void show(int n); // print matrix to console
List<String> columns(); // get column names
<R> List<R> col(Function<T, R> f); // get a column by mapping function
T head(); // first element
List<T> head(int n); // first n elements
T tail(); // last element
List<T> tail(int n); // last n elements
2.2 Filtering
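The three between‑style filters in this section differ only in which endpoints they include. A plain‑JDK sketch of those interval conventions follows; the helper names mirror the library's method names for readability but are hypothetical, and treating null as "no match" is an assumption rather than documented JDFrame behavior.

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

class BetweenSketch {
    // [lo, hi]: both endpoints included
    static Predicate<Integer> between(int lo, int hi) {
        return v -> v != null && v >= lo && v <= hi;
    }

    // (lo, hi]: left endpoint excluded
    static Predicate<Integer> betweenR(int lo, int hi) {
        return v -> v != null && v > lo && v <= hi;
    }

    // [lo, hi): right endpoint excluded
    static Predicate<Integer> betweenL(int lo, int hi) {
        return v -> v != null && v >= lo && v < hi;
    }

    static List<Integer> keep(List<Integer> values, Predicate<Integer> p) {
        return values.stream().filter(p).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> ages = List.of(2, 3, 4, 6, 7);
        System.out.println(keep(ages, between(3, 6)));   // [3, 4, 6]
        System.out.println(keep(ages, betweenR(3, 6)));  // [4, 6]
        System.out.println(keep(ages, betweenL(3, 6)));  // [3, 4]
    }
}
```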
SDFrame.read(studentList)
.whereBetween(Student::getAge,3,6) // [3,6]
.whereBetweenR(Student::getAge,3,6) // (3,6]
.whereBetweenL(Student::getAge,3,6) // [3,6)
.whereNotNull(Student::getName) // not null or empty
.whereGt(Student::getAge,3) // >3
.whereGe(Student::getAge,3) // >=3
.whereLt(Student::getAge,3) // <3
.whereIn(Student::getAge, Arrays.asList(3,7,8))
.whereNotIn(Student::getAge, Arrays.asList(3,7,8))
.whereEq(Student::getAge,3)
.whereNotEq(Student::getAge,3)
.whereLike(Student::getName,"jay")
.whereLikeLeft(Student::getName,"jay")
.whereLikeRight(Student::getName,"jay");
2.3 Aggregation
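As a baseline for the calls in this section, the same kinds of aggregates can be obtained from the JDK alone. The sketch below uses a made‑up Student record (Java 16+ assumed) and shows the distinction the API draws between the element holding the maximum (max) and the maximum value itself (maxValue).

```java
import java.util.Comparator;
import java.util.IntSummaryStatistics;
import java.util.List;
import java.util.Optional;

class AggregationSketch {
    // Simplified stand-in for the article's Student class.
    record Student(String name, int age) {}

    // Hypothetical sample data.
    static List<Student> sample() {
        return List.of(new Student("a", 11), new Student("b", 15), new Student("c", 13));
    }

    // The element with the largest age (analogous to "max").
    static Optional<Student> oldest(List<Student> students) {
        return students.stream().max(Comparator.comparingInt(Student::age));
    }

    // Min, max, sum and average of the ages in one pass (analogous to the "...Value" calls).
    static IntSummaryStatistics ageStats(List<Student> students) {
        return students.stream().mapToInt(Student::age).summaryStatistics();
    }

    public static void main(String[] args) {
        System.out.println(oldest(sample()).map(Student::name).orElse("none")); // b
        IntSummaryStatistics stats = ageStats(sample());
        System.out.println(stats.getMin() + " " + stats.getMax() + " " + stats.getSum()); // 11 15 39
        System.out.println(stats.getAverage()); // 13.0
    }
}
```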
JDFrame<Student> frame = JDFrame.read(studentList);
Student maxStudent = frame.max(Student::getAge);
Integer maxAge = frame.maxValue(Student::getAge);
Student minStudent = frame.min(Student::getAge);
Integer minAge = frame.minValue(Student::getAge);
BigDecimal avgAge = frame.avg(Student::getAge);
BigDecimal sumAge = frame.sum(Student::getAge);
MaxMin<Student> maxMin = frame.maxMin(Student::getAge);
MaxMin<Integer> maxMinVal = frame.maxMinValue(Student::getAge);
2.4 Deduplication
Native Stream.distinct() deduplicates whole objects via equals()/hashCode(); JDFrame additionally provides field‑level distinct.
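To make the contrast concrete, here is one common plain‑JDK workaround for distinct‑by‑field. The distinctBy helper is hypothetical, and keeping the first element per key is an assumption; the article does not say which duplicate JDFrame retains.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

class DistinctBySketch {
    // Keeps the first element seen for each key, preserving encounter order.
    // Assumes the list contains no null elements.
    static <T, K> List<T> distinctBy(List<T> items, Function<? super T, ? extends K> key) {
        Map<K, T> firstByKey = new LinkedHashMap<>();
        for (T item : items) {
            firstByKey.putIfAbsent(key.apply(item), item);
        }
        return new ArrayList<>(firstByKey.values());
    }

    public static void main(String[] args) {
        List<String> words = List.of("apple", "avocado", "banana", "blueberry", "cherry");
        // Deduplicate by first letter
        System.out.println(distinctBy(words, w -> w.charAt(0))); // [apple, banana, cherry]
    }
}
```

The JDFrame calls that follow express the same idea with a method reference or a composite key lambda.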
List<Student> distinct = SDFrame.read(studentList).distinct().toLists();
List<Student> bySchool = SDFrame.read(studentList).distinct(Student::getSchool).toLists();
List<Student> byComposite = SDFrame.read(studentList).distinct(e -> e.getSchool() + e.getLevel()).toLists();
2.5 Simple Group‑by Aggregation
JDFrame<Student> frame = JDFrame.read(studentList);
List<FI2<String, BigDecimal>> sumAge = frame.groupBySum(Student::getSchool, Student::getAge).toLists();
List<FI2<String, Integer>> maxAge = frame.groupByMaxValue(Student::getSchool, Student::getAge).toLists();
List<FI2<String, Student>> maxStudent = frame.groupByMax(Student::getSchool, Student::getAge).toLists();
List<FI2<String, Integer>> minAge = frame.groupByMinValue(Student::getSchool, Student::getAge).toLists();
List<FI2<String, Long>> count = frame.groupByCount(Student::getSchool).toLists();
List<FI2<String, BigDecimal>> avgAge = frame.groupByAvg(Student::getSchool, Student::getAge).toLists();
// multi‑level grouping examples omitted for brevity
2.6 Sorting
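For comparison with the sort calls in this section: in plain streams, a multi‑key sort is usually written as a single comparator chain, because two consecutive sorted() calls make the last one the primary key (the sort is stable). The sketch below sorts by age descending and breaks ties by level ascending. Whether chained sortDesc/sortAsc calls in JDFrame compose the same way is not verified here; the Student record and data are made up.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

class SortSketch {
    // Simplified stand-in for the article's Student class.
    record Student(String name, int age, String level) {}

    // Hypothetical sample data.
    static List<Student> sample() {
        return List.of(
                new Student("a", 11, "2"),
                new Student("b", 15, "2"),
                new Student("c", 15, "1"));
    }

    // Age descending, ties broken by level ascending.
    static List<Student> byAgeDescThenLevel(List<Student> students) {
        return students.stream()
                .sorted(Comparator.comparingInt(Student::age).reversed()
                        .thenComparing(Student::level))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Prints c, b, a (one per line)
        byAgeDescThenLevel(sample()).forEach(s -> System.out.println(s.name()));
    }
}
```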
SDFrame.read(studentList).sortDesc(Student::getAge);
SDFrame.read(studentList).sortDesc(Student::getAge).sortAsc(Student::getLevel);
SDFrame.read(studentList).sortAsc(Student::getAge);
SDFrame.read(studentList).sortAsc(Comparator.comparing(e -> e.getLevel() + e.getId()));
2.7 Joining Matrices
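The join operations in this section take a match condition and a function that combines a matching pair into an output row. A nested‑loop sketch of inner and left join under those semantics is shown below. The helper names and sample data are hypothetical, and the library's actual implementation may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;
import java.util.function.BiPredicate;

class JoinSketch {
    // Inner join: one output row per (left, right) pair that satisfies `on`.
    static <L, R, O> List<O> innerJoin(List<L> left, List<R> right,
                                       BiPredicate<L, R> on, BiFunction<L, R, O> combine) {
        List<O> out = new ArrayList<>();
        for (L l : left) {
            for (R r : right) {
                if (on.test(l, r)) out.add(combine.apply(l, r));
            }
        }
        return out;
    }

    // Left join: every left row appears at least once; unmatched rows get null on the right.
    static <L, R, O> List<O> leftJoin(List<L> left, List<R> right,
                                      BiPredicate<L, R> on, BiFunction<L, R, O> combine) {
        List<O> out = new ArrayList<>();
        for (L l : left) {
            boolean matched = false;
            for (R r : right) {
                if (on.test(l, r)) {
                    out.add(combine.apply(l, r));
                    matched = true;
                }
            }
            if (!matched) out.add(combine.apply(l, null));
        }
        return out;
    }

    // Hypothetical data: school keys on the left, (school, total score) pairs on the right.
    static final List<String> SCHOOLS = List.of("S1", "S2", "S9");
    static final List<Map.Entry<String, Integer>> TOTALS =
            List.of(Map.entry("S1", 10), Map.entry("S2", 7));

    static List<String> demoInner() {
        return innerJoin(SCHOOLS, TOTALS,
                (s, t) -> s.equals(t.getKey()),
                (s, t) -> s + "=" + t.getValue());
    }

    static List<String> demoLeft() {
        return leftJoin(SCHOOLS, TOTALS,
                (s, t) -> s.equals(t.getKey()),
                (s, t) -> s + "=" + (t == null ? "none" : t.getValue()));
    }

    public static void main(String[] args) {
        System.out.println(demoInner()); // [S1=10, S2=7]
        System.out.println(demoLeft());  // [S1=10, S2=7, S9=none]
    }
}
```

Note that the combiner in a left join has to tolerate a null right‑hand argument, which is the "null for missing side" case mentioned in the API list.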
API list:
append(T t); // add element
union(IFrame<T> other); // addAll
join(IFrame<K> other, JoinOn<T, K> on, Join<T, K, R> join); // inner join
leftJoin(...); // left join (null for missing side)
rightJoin(...); // right join (null for missing side)
Example of inner join between a student matrix and the aggregated score matrix:
SDFrame<Student> sdf = SDFrame.read(studentList);
SDFrame<FI2<String, BigDecimal>> sdf2 = /* aggregation shown earlier */;
SDFrame<UserInfo> frame = sdf.join(sdf2,
(a,b) -> a.getSchool().equals(b.getC1()),
(a,b) -> {
UserInfo ui = new UserInfo();
ui.setKey1(a.getSchool());
ui.setKey2(b.getC2().intValue());
ui.setKey3(String.valueOf(a.getId()));
return ui;
});
frame.show(5);
2.8 Other Utilities
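Several utilities in this section have short plain‑JDK counterparts. As one illustration, partition(5) is described below as splitting a list into sub‑lists of size 5; a minimal sketch of that chunking follows. The helper is hypothetical, and letting the last chunk be shorter is an assumption about the edge case.

```java
import java.util.ArrayList;
import java.util.List;

class PartitionSketch {
    // Splits items into consecutive chunks of at most `size` elements;
    // the last chunk may be shorter.
    static <T> List<List<T>> partition(List<T> items, int size) {
        if (size <= 0) throw new IllegalArgumentException("size must be positive");
        List<List<T>> chunks = new ArrayList<>();
        for (int i = 0; i < items.size(); i += size) {
            int end = Math.min(i + size, items.size());
            chunks.add(new ArrayList<>(items.subList(i, end))); // copy, so chunks are independent
        }
        return chunks;
    }

    public static void main(String[] args) {
        System.out.println(partition(List.of(1, 2, 3, 4, 5, 6, 7), 3)); // [[1, 2, 3], [4, 5, 6], [7]]
    }
}
```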
Percentage conversion (equivalent to round(score*100, 2)):
SDFrame<Student> pct = SDFrame.read(studentList)
.mapPercent(Student::getScore, Student::setScore, 2);
Partitioning a list into sub‑lists of size 5:
List<List<Student>> parts = SDFrame.read(studentList).partition(5).toLists();
Generating sequential numbers after sorting by age:
SDFrame.read(studentList)
.sortDesc(Student::getAge)
.addSortNoCol(Student::setRank)
.show(30);
Generating ranking numbers where equal values share the same rank:
SDFrame<Student> df = SDFrame.read(studentList)
.addRankingSameColDesc(Student::getAge, Student::setRank);
df.show(20);
Replenishing missing dimension entries (e.g., schools or grades) using a ReplenishFunction:
// Fill missing schools
List<String> allSchools = Arrays.asList("一中","二中","三中","四中");
SDFrame.read(studentList)
.replenish(Student::getSchool, allSchools, school -> new Student(school))
.show();
// Fill missing grades within each school
SDFrame.read(studentList)
.replenish(Student::getSchool, Student::getLevel,
(school, level) -> new Student(school, level))
.show(30);
Conclusion
The library offers two frames: SDFrame (lazy, stream‑compatible) and JDFrame (eager, real‑time). Use SDFrame for simple one‑pass stream operations; use JDFrame when intermediate results need to be reused, mirroring the DataFrame model.
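The one‑pass versus reuse distinction has a direct analogue in the JDK itself, sketched below: a java.util.stream.Stream can be consumed only once, while a materialized List can be queried repeatedly. This illustrates the general trade‑off only; it makes no claim about how SDFrame or JDFrame behave internally.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class ReuseSketch {
    // A Stream supports one terminal operation; a second one throws IllegalStateException.
    static boolean secondUseFails() {
        Stream<Integer> stream = Stream.of(1, 2, 3).filter(n -> n > 1);
        stream.count();
        try {
            stream.count();
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    // Materializing the intermediate result makes it reusable.
    static List<Integer> materialize() {
        return Stream.of(1, 2, 3).filter(n -> n > 1).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(secondUseFails());                               // true
        List<Integer> kept = materialize();
        System.out.println(kept.size());                                    // 2
        System.out.println(kept.stream().mapToInt(Integer::intValue).sum()); // 5
    }
}
```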
Source code and Maven coordinates:
https://github.com/burukeYou/JDFrame
https://central.sonatype.com/artifact/io.github.burukeyou/jdframe
Future work may include richer matrix representations and tighter JVM language support.
Code Ape Tech Column
Former Ant Group P8 engineer, pure technologist, sharing full‑stack Java, job interview and career advice through a column. Site: java-family.cn