JDFrame/SDFrame: A JVM‑Level DataFrame API for Simplified Java Stream Processing
This article introduces JDFrame and SDFrame, the two frame types of the JDFrame Java library, which provides a DataFrame-style, semantic API for simplifying stream operations. It covers dependency setup, a quick-start example, matrix viewing, filtering, aggregation, deduplication, grouping, sorting, joining, pagination, window functions, and a comparison of the two execution models, along with links to the source code and documentation.
The author, a senior architect, presents a JVM‑level DataFrame tool designed to make Java 8 stream processing more semantic and concise, inspired by DataFrame models used in big‑data frameworks like Spark.
0. Introduction
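Processing collections with Java streams often takes long, verbose chains of API calls. The author proposes a DataFrame-style API that wraps common operations (filtering, grouping, sorting, joining and so on) behind semantic method names, with fields selected through lambdas or method references.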
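To make the verbosity concrete, here is a minimal sketch of the quick-start query (total score per school for students aged 9 to 16, then the top two schools) written with plain Java 8 streams and no JDFrame at all. The `Student` class and the sample data are hypothetical stand-ins, and ties between equal totals are left in whatever order the sort produces.

```java
import java.math.BigDecimal;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class PlainStreamTopSchools {
    // Hypothetical Student type: only the fields this query touches.
    static final class Student {
        private final String school;
        private final Integer age;
        private final BigDecimal score;

        Student(String school, Integer age, BigDecimal score) {
            this.school = school;
            this.age = age;
            this.score = score;
        }

        String getSchool() { return school; }
        Integer getAge() { return age; }
        BigDecimal getScore() { return score; }
    }

    // Total score per school for students aged 9-16 (inclusive), highest totals first, first n kept.
    static List<Map.Entry<String, BigDecimal>> topSchools(List<Student> students, int n) {
        Map<String, BigDecimal> sumBySchool = students.stream()
                .filter(s -> s.getAge() != null)
                .filter(s -> s.getAge() >= 9 && s.getAge() <= 16)
                .collect(Collectors.groupingBy(Student::getSchool,
                        Collectors.reducing(BigDecimal.ZERO, Student::getScore, BigDecimal::add)));
        return sumBySchool.entrySet().stream()
                .sorted(Map.Entry.<String, BigDecimal>comparingByValue().reversed())
                .limit(n)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Student> students = Arrays.asList(
                new Student("A", 10, new BigDecimal("4")),
                new Student("A", 12, new BigDecimal("6")),
                new Student("B", 15, new BigDecimal("7")),
                new Student("C", 20, new BigDecimal("99")),   // filtered out: age 20
                new Student("C", null, new BigDecimal("50"))  // filtered out: no age
        );
        System.out.println(topSchools(students, 2)); // [A=10, B=7]
    }
}
```

The group-then-re-sort dance here (a `groupingBy` with a `reducing` downstream, then a second stream over the map entries) is what the five chained JDFrame calls in the quick start replace.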
1. Quick Start
1.1 Add Dependency
<dependency>
    <groupId>io.github.burukeyou</groupId>
    <artifactId>jdframe</artifactId>
    <version>0.0.4</version>
</dependency>
1.2 Example
Calculate the total score of students aged 9‑16 for each school and retrieve the top‑2 schools.
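static List<Student> studentList = new ArrayList<>();
// ...populate studentList with Student objects...

SDFrame<FI2<String, BigDecimal>> sdf2 = SDFrame.read(studentList)
        .whereNotNull(Student::getAge)                     // drop students without an age
        .whereBetween(Student::getAge, 9, 16)              // keep ages in [9, 16]
        .groupBySum(Student::getSchool, Student::getScore) // (school, total score) pairs
        .sortDesc(FI2::getC2)                              // highest total first
        .cutFirst(2);                                      // keep the top 2 schools
sdf2.show();
Output (c1 is the school, c2 its total score):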
c1    c2
三中  10
二中  7
2. API Cases
2.1 Matrix View
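The `head`, `tail` and `page` accessors in this section are list slices. A minimal sketch of the same slicing on a plain `java.util.List`; `MatrixViewSketch` is a hypothetical helper, not JDFrame code, and 1-based page numbering is an assumption about `page(page, pageSize)`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class MatrixViewSketch {
    // First n elements (all of them if the list is shorter; negative n is treated as 0).
    static <T> List<T> head(List<T> list, int n) {
        int k = Math.min(Math.max(n, 0), list.size());
        return new ArrayList<>(list.subList(0, k));
    }

    // Last n elements.
    static <T> List<T> tail(List<T> list, int n) {
        int k = Math.min(Math.max(n, 0), list.size());
        return new ArrayList<>(list.subList(list.size() - k, list.size()));
    }

    // One page of at most pageSize elements; page numbers are ASSUMED to start at 1.
    static <T> List<T> page(List<T> list, int page, int pageSize) {
        if (page < 1 || pageSize < 1) {
            return new ArrayList<>();
        }
        long from = (long) (page - 1) * pageSize;
        if (from >= list.size()) {
            return new ArrayList<>(); // page past the end
        }
        int start = (int) from;
        int end = (int) Math.min((long) start + pageSize, list.size());
        return new ArrayList<>(list.subList(start, end));
    }

    public static void main(String[] args) {
        List<Integer> xs = Arrays.asList(1, 2, 3, 4, 5);
        System.out.println(head(xs, 2));    // [1, 2]
        System.out.println(tail(xs, 2));    // [4, 5]
        System.out.println(page(xs, 3, 2)); // [5]
    }
}
```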
void show(int n);                         // print the frame as a matrix (n rows)
List<String> columns();                   // get the column names
<R> List<R> col(Function<T, R> function); // get the values of one column
T head();                                 // first element
List<T> head(int n);                      // first n elements
T tail();                                 // last element
List<T> tail(int n);                      // last n elements
List<T> page(int page, int pageSize);     // one page of elements
2.2 Filtering
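The filter list marks interval endpoints in bracket notation. A small sketch of those bounds as plain `Predicate`s, plus the SQL-style reading of the `whereLike*` family (contains, prefix, suffix), which is an assumption here rather than something taken from JDFrame's source. `FilterSemanticsSketch` and its method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

public class FilterSemanticsSketch {
    // whereBetween: [lo, hi], both ends included
    static Predicate<Integer> between(int lo, int hi) {
        return v -> v != null && v >= lo && v <= hi;
    }

    // whereBetweenR: (lo, hi], lower end excluded
    static Predicate<Integer> betweenR(int lo, int hi) {
        return v -> v != null && v > lo && v <= hi;
    }

    // whereBetweenL: [lo, hi), upper end excluded
    static Predicate<Integer> betweenL(int lo, int hi) {
        return v -> v != null && v >= lo && v < hi;
    }

    // Assumed reading of whereLike / whereLikeLeft / whereLikeRight as LIKE '%p%', 'p%', '%p'
    static Predicate<String> like(String p)      { return s -> s != null && s.contains(p); }
    static Predicate<String> likeLeft(String p)  { return s -> s != null && s.startsWith(p); }
    static Predicate<String> likeRight(String p) { return s -> s != null && s.endsWith(p); }

    // Apply one predicate, keeping encounter order.
    static <T> List<T> where(List<T> xs, Predicate<? super T> p) {
        return xs.stream().filter(p).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> ages = Arrays.asList(2, 3, 4, 6, 7);
        System.out.println(where(ages, between(3, 6)));  // [3, 4, 6]
        System.out.println(where(ages, betweenR(3, 6))); // [4, 6]
        System.out.println(where(ages, betweenL(3, 6))); // [3, 4]

        List<String> names = Arrays.asList("jay", "jayden", "ajay", "bob");
        System.out.println(where(names, like("jay")));      // [jay, jayden, ajay]
        System.out.println(where(names, likeLeft("jay")));  // [jay, jayden]
        System.out.println(where(names, likeRight("jay"))); // [jay, ajay]
    }
}
```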
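// Catalogue of filters; in practice chain only the ones you need (whereGt(3) and whereLt(3) together match nothing)
SDFrame.read(studentList)
        .whereBetween(Student::getAge, 3, 6)    // [3,6]
        .whereBetweenR(Student::getAge, 3, 6)   // (3,6]
        .whereBetweenL(Student::getAge, 3, 6)   // [3,6)
        .whereNotNull(Student::getName)         // name is not null
        .whereGt(Student::getAge, 3)            // age > 3
        .whereGe(Student::getAge, 3)            // age >= 3
        .whereLt(Student::getAge, 3)            // age < 3
        .whereIn(Student::getAge, Arrays.asList(3, 7, 8))    // age is 3, 7 or 8
        .whereNotIn(Student::getAge, Arrays.asList(3, 7, 8)) // age is not 3, 7 or 8
        .whereEq(Student::getAge, 3)            // age == 3
        .whereNotEq(Student::getAge, 3)         // age != 3
        .whereLike(Student::getName, "jay")     // like '%jay%'
        .whereLikeLeft(Student::getName, "jay") // like 'jay%'
        .whereLikeRight(Student::getName, "jay"); // like '%jay'
2.3 Aggregation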
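Without the library, each aggregate is its own stream pass. This sketch reproduces `max`, `sum` and `avg` over an age column with the standard library only. The hypothetical `Student` type, the skipping of null ages, and the scale of 2 with `HALF_UP` rounding for the average are assumptions, not JDFrame's documented defaults.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Objects;
import java.util.Optional;

public class AggregationSketch {
    // Hypothetical Student type with just a name and an age.
    static final class Student {
        private final String name;
        private final Integer age;
        Student(String name, Integer age) { this.name = name; this.age = age; }
        String getName() { return name; }
        Integer getAge() { return age; }
    }

    // Comparable to frame.max(Student::getAge): the element with the largest age, null ages skipped.
    static Optional<Student> maxByAge(List<Student> xs) {
        return xs.stream()
                .filter(s -> s.getAge() != null)
                .max(Comparator.comparing(Student::getAge));
    }

    // Comparable to frame.sum(Student::getAge): the total as a BigDecimal.
    static BigDecimal sumAge(List<Student> xs) {
        long total = xs.stream().map(Student::getAge).filter(Objects::nonNull)
                .mapToLong(Integer::longValue).sum();
        return BigDecimal.valueOf(total);
    }

    // Comparable to frame.avg(Student::getAge); scale 2 and HALF_UP are assumptions. Null if no ages.
    static BigDecimal avgAge(List<Student> xs) {
        long n = xs.stream().map(Student::getAge).filter(Objects::nonNull).count();
        return n == 0 ? null : sumAge(xs).divide(BigDecimal.valueOf(n), 2, RoundingMode.HALF_UP);
    }

    public static void main(String[] args) {
        List<Student> xs = Arrays.asList(new Student("a", 9), new Student("b", 12),
                new Student("c", 10), new Student("d", null));
        System.out.println(maxByAge(xs).map(Student::getName).orElse("none")); // b
        System.out.println(sumAge(xs)); // 31
        System.out.println(avgAge(xs)); // 10.33
    }
}
```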
JDFrame<Student> frame = JDFrame.read(studentList);
Student maxAgeStudent = frame.max(Student::getAge);             // student with the highest age
Integer maxAge = frame.maxValue(Student::getAge);               // highest age value
Student minAgeStudent = frame.min(Student::getAge);             // student with the lowest age
BigDecimal avgAge = frame.avg(Student::getAge);                 // average age
BigDecimal sumAge = frame.sum(Student::getAge);                 // sum of ages
MaxMin<Student> maxMinStudent = frame.maxMin(Student::getAge);  // students with the highest and lowest age
MaxMin<Integer> maxMinAge = frame.maxMinValue(Student::getAge); // highest and lowest age values
2.4 Deduplication
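Key-based deduplication with plain streams usually relies on a stateful filter. This sketch keeps the first element seen for each key, which is an assumption about which duplicate `distinct(key)` retains. It also uses a `List` as a composite key: concatenating fields as in `e.getSchool() + e.getLevel()` can make distinct pairs collide (if `getLevel()` returns a number, ("A1", 2) and ("A", 12) both become "A12"), whereas a list compares field by field. `distinctBy` is a hypothetical helper.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;
import java.util.stream.Collectors;

public class DistinctByKeySketch {
    // Keep the first element seen for each key, preserving encounter order.
    // The stateful filter is only safe on a sequential stream.
    static <T, K> List<T> distinctBy(List<T> xs, Function<? super T, ? extends K> key) {
        Set<K> seen = new HashSet<>();
        return xs.stream()
                .filter(x -> seen.add(key.apply(x)))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> fruits = Arrays.asList("apple", "angle", "avocado", "banana", "berry");
        // One fruit per first letter
        System.out.println(distinctBy(fruits, s -> s.charAt(0))); // [apple, banana]
        // Composite key (first letter, length) compared element by element
        System.out.println(distinctBy(fruits,
                s -> Arrays.<Object>asList(s.charAt(0), s.length()))); // [apple, avocado, banana, berry]
    }
}
```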
List<Student> distinctByObject = SDFrame.read(studentList).distinct().toLists();                                     // whole-object deduplication
List<Student> distinctBySchool = SDFrame.read(studentList).distinct(Student::getSchool).toLists();                   // one student per school
List<Student> distinctByComposite = SDFrame.read(studentList).distinct(e -> e.getSchool() + e.getLevel()).toLists(); // one student per school + level
2.5 Group & Aggregate
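Each `groupBy*` call corresponds to a `Collectors.groupingBy` or `Collectors.toMap` pipeline. A sketch with a hypothetical `Student` type; the `TreeMap` is only there so the printed keys come out sorted, and ages are assumed non-null.

```java
import java.math.BigDecimal;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class GroupAggregateSketch {
    // Hypothetical Student type; ages are assumed non-null.
    static final class Student {
        private final String school;
        private final Integer age;
        Student(String school, Integer age) { this.school = school; this.age = age; }
        String getSchool() { return school; }
        Integer getAge() { return age; }
    }

    // Comparable to groupBySum(school, age): total age per school.
    static Map<String, BigDecimal> sumAgeBySchool(List<Student> xs) {
        return xs.stream().collect(Collectors.groupingBy(Student::getSchool, TreeMap::new,
                Collectors.reducing(BigDecimal.ZERO,
                        (Student s) -> BigDecimal.valueOf(s.getAge().longValue()), BigDecimal::add)));
    }

    // Comparable to groupByMaxValue(school, age): highest age per school.
    static Map<String, Integer> maxAgeBySchool(List<Student> xs) {
        return xs.stream().collect(Collectors.toMap(Student::getSchool, Student::getAge,
                Integer::max, TreeMap::new));
    }

    // Comparable to groupByCount(school): number of students per school.
    static Map<String, Long> countBySchool(List<Student> xs) {
        return xs.stream().collect(Collectors.groupingBy(Student::getSchool, TreeMap::new,
                Collectors.counting()));
    }

    public static void main(String[] args) {
        List<Student> xs = Arrays.asList(new Student("A", 10), new Student("A", 12), new Student("B", 9));
        System.out.println(sumAgeBySchool(xs)); // {A=22, B=9}
        System.out.println(maxAgeBySchool(xs)); // {A=12, B=9}
        System.out.println(countBySchool(xs));  // {A=2, B=1}
    }
}
```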
// frame is the JDFrame<Student> created in 2.3
List<FI2<String, BigDecimal>> sumBySchool = frame.groupBySum(Student::getSchool, Student::getAge).toLists();   // total age per school
List<FI2<String, Integer>> maxBySchool = frame.groupByMaxValue(Student::getSchool, Student::getAge).toLists(); // highest age per school
List<FI2<String, Long>> countBySchool = frame.groupByCount(Student::getSchool).toLists();                       // number of students per school
// multi-level grouping examples omitted for brevity
2.6 Sorting
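The `Sorter` chain reads like SQL `ORDER BY age DESC, level ASC`. With the JDK alone, the same ordering is a composed `Comparator`; the `Student` fields in this sketch are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

public class CompositeSortSketch {
    // Hypothetical Student type with an id, an age and a numeric level.
    static final class Student {
        private final String id;
        private final int age;
        private final int level;
        Student(String id, int age, int level) { this.id = id; this.age = age; this.level = level; }
        String getId() { return id; }
        int getAge() { return age; }
        int getLevel() { return level; }
    }

    // Same ordering as Sorter.sortDescBy(Student::getAge).sortAsc(Student::getLevel):
    // age descending, ties broken by level ascending.
    static final Comparator<Student> AGE_DESC_LEVEL_ASC =
            Comparator.comparingInt(Student::getAge).reversed()
                    .thenComparingInt(Student::getLevel);

    // Sort a copy (the input is left untouched) and return the ids in order.
    static List<String> sortedIds(List<Student> xs) {
        List<Student> copy = new ArrayList<>(xs);
        copy.sort(AGE_DESC_LEVEL_ASC);
        return copy.stream().map(Student::getId).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Student> xs = Arrays.asList(new Student("s1", 10, 2), new Student("s2", 12, 1),
                new Student("s3", 10, 1));
        System.out.println(sortedIds(xs)); // [s2, s3, s1]
    }
}
```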
SDFrame.read(studentList).sortDesc(Student::getAge);                                               // order by age desc
SDFrame.read(studentList).sortAsc(Sorter.sortDescBy(Student::getAge).sortAsc(Student::getLevel)); // order by age desc, level asc
SDFrame.read(studentList).sortAsc(Comparator.comparing((Student e) -> e.getLevel() + e.getId()));  // custom Comparator on a derived key
2.7 Joining
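Conceptually, `join` pairs every left row with every right row that satisfies the predicate and maps each match through the combiner. A minimal nested-loop sketch, assuming inner-join semantics (rows without a match are dropped); `NestedLoopJoinSketch` is a hypothetical helper and says nothing about how JDFrame implements joins internally.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;
import java.util.function.BiPredicate;

public class NestedLoopJoinSketch {
    // Inner join: for every (a, b) pair that satisfies `on`, emit combiner(a, b).
    // Nested loops cost O(|left| * |right|); a hash join is faster for pure equality keys.
    static <A, B, R> List<R> join(List<A> left, List<B> right,
                                  BiPredicate<? super A, ? super B> on,
                                  BiFunction<? super A, ? super B, ? extends R> combiner) {
        List<R> out = new ArrayList<>();
        for (A a : left) {
            for (B b : right) {
                if (on.test(a, b)) {
                    out.add(combiner.apply(a, b));
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> studentSchools = Arrays.asList("A", "B", "C");
        List<Map.Entry<String, Integer>> totals = Arrays.asList(
                new AbstractMap.SimpleEntry<String, Integer>("A", 10),
                new AbstractMap.SimpleEntry<String, Integer>("B", 7));
        List<String> joined = join(studentSchools, totals,
                (a, b) -> a.equals(b.getKey()),
                (a, b) -> a + ":" + b.getValue());
        System.out.println(joined); // [A:10, B:7]  ("C" has no match and is dropped)
    }
}
```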
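SDFrame<Student> sdf = SDFrame.read(studentList);
// Per-school totals, e.g. the aggregation from 1.2 (built from a fresh frame)
SDFrame<FI2<String, BigDecimal>> sdf2 = SDFrame.read(studentList).groupBySum(Student::getSchool, Student::getScore);
// Match each student to its school's total (a.school == b.c1) and build one UserInfo per matching pair
SDFrame<UserInfo> joined = sdf.join(sdf2, (a, b) -> a.getSchool().equals(b.getC1()), (a, b) -> {
    UserInfo ui = new UserInfo();
    ui.setKey1(a.getSchool());
    ui.setKey2(b.getC2().intValue());
    ui.setKey3(String.valueOf(a.getId()));
    return ui;
});
joined.show(5);
2.8 Cutting (Pagination & Ranking)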
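`cutFirstRank` differs from `cutFirst` in how it treats ties. One plausible reading, sketched here under that assumption, is dense ranking: equal values share a rank, so asking for the top 2 ranks can return more than two rows. The helper name `firstRanks` is hypothetical, and JDFrame may rank differently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class TopNByRankSketch {
    // Keep every element whose dense rank is <= n: elements that compare equal share a rank,
    // so the result can hold more than n elements.
    static <T> List<T> firstRanks(List<T> xs, Comparator<? super T> order, int n) {
        List<T> sorted = new ArrayList<>(xs);
        sorted.sort(order);
        List<T> out = new ArrayList<>();
        int rank = 0;
        for (int i = 0; i < sorted.size(); i++) {
            if (i == 0 || order.compare(sorted.get(i - 1), sorted.get(i)) != 0) {
                rank++; // a new distinct value starts a new rank
            }
            if (rank > n) {
                break;
            }
            out.add(sorted.get(i));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Integer> scores = Arrays.asList(90, 85, 90, 70, 85);
        System.out.println(firstRanks(scores, Comparator.<Integer>reverseOrder(), 2)); // [90, 90, 85, 85]
    }
}
```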
cutFirst(int n);                       // first n rows
cutLast(int n);                        // last n rows
cut(Integer start, Integer end);       // sub-list from start to end
cutPage(int page, int pageSize);       // one page of rows
cutFirstRank(Sorter<T> sorter, int n); // rows ranked in the top n by sorter
2.9 Frame Parameter Settings
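A default scale matters because `BigDecimal` division has no natural precision: 10 / 3 does not terminate, and `BigDecimal.divide` without a scale throws. This sketch shows the JDK behavior that a setting like `defaultScale` presumably wraps; how JDFrame applies the scale internally is not covered here, and `ScaleSketch` is a hypothetical name.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

public class ScaleSketch {
    // Division with an explicit scale and rounding mode.
    static BigDecimal divide(BigDecimal a, BigDecimal b, int scale, RoundingMode mode) {
        return a.divide(b, scale, mode);
    }

    // Without a scale, a non-terminating quotient such as 10 / 3 throws ArithmeticException.
    static boolean unscaledDivisionFails(BigDecimal a, BigDecimal b) {
        try {
            a.divide(b);
            return false;
        } catch (ArithmeticException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        BigDecimal ten = new BigDecimal("10");
        BigDecimal three = new BigDecimal("3");
        System.out.println(unscaledDivisionFails(ten, three));                            // true
        System.out.println(divide(ten, three, 2, RoundingMode.HALF_UP));                  // 3.33
        System.out.println(divide(new BigDecimal("2"), three, 2, RoundingMode.HALF_UP));  // 0.67
        System.out.println(divide(new BigDecimal("2"), three, 2, RoundingMode.DOWN));     // 0.66
    }
}
```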
defaultScale(int scale, RoundingMode roundingMode); // set the default decimal scale and rounding mode
2.10 Miscellaneous
Percentage conversion: SDFrame.read(list).mapPercent(...)
Partitioning: split a list into sub-lists of a given size.
Row numbers: generate a row number for each element based on a sort order.
Dimension replenishment: fill in missing dimension values in grouping results.
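Two of these helpers, partitioning and row numbering, are easy to express with the JDK. A sketch under assumed semantics (fixed-size chunks with a shorter last chunk; 1-based row numbers assigned after sorting); the class and method names are hypothetical.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

public class PartitionAndRowNumberSketch {
    // Split into consecutive sub-lists of at most `size` elements; the last one may be shorter.
    static <T> List<List<T>> partition(List<T> xs, int size) {
        if (size <= 0) {
            throw new IllegalArgumentException("size must be positive");
        }
        List<List<T>> out = new ArrayList<>();
        for (int i = 0; i < xs.size(); i += size) {
            out.add(new ArrayList<>(xs.subList(i, Math.min(i + size, xs.size()))));
        }
        return out;
    }

    // Pair each element with a 1-based row number following the given sort order.
    static <T> List<Map.Entry<Integer, T>> withRowNumbers(List<T> xs, Comparator<? super T> order) {
        List<T> sorted = new ArrayList<>(xs);
        sorted.sort(order);
        List<Map.Entry<Integer, T>> out = new ArrayList<>();
        for (int i = 0; i < sorted.size(); i++) {
            out.add(new AbstractMap.SimpleEntry<>(i + 1, sorted.get(i)));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(partition(Arrays.asList(1, 2, 3, 4, 5), 2)); // [[1, 2], [3, 4], [5]]
        System.out.println(withRowNumbers(Arrays.asList("b", "c", "a"),
                Comparator.<String>naturalOrder()));                    // [1=a, 2=b, 3=c]
    }
}
```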
3. Window Functions
JDFrame also supports programmable window functions; see the linked tutorial for details.
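JDFrame's window API itself is not shown in this article. As a reference point only, this sketch computes a SQL-style running total, `SUM(score) OVER (PARTITION BY school ORDER BY age)`, by hand: rows are partitioned by school, ordered by age, and accumulated row by row. `Row` and `runningSum` are hypothetical names, not JDFrame APIs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class RunningSumWindowSketch {
    // Hypothetical row type.
    static final class Row {
        final String school;
        final int age;
        final int score;
        Row(String school, int age, int score) { this.school = school; this.age = age; this.score = score; }
    }

    // SUM(score) OVER (PARTITION BY school ORDER BY age ROWS UNBOUNDED PRECEDING):
    // a running total that restarts for each school.
    static List<String> runningSum(List<Row> rows) {
        Map<String, List<Row>> partitions = rows.stream()
                .collect(Collectors.groupingBy((Row r) -> r.school, TreeMap::new, Collectors.toList()));
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, List<Row>> e : partitions.entrySet()) {
            List<Row> part = new ArrayList<>(e.getValue());
            part.sort(Comparator.comparingInt((Row r) -> r.age));
            int running = 0;
            for (Row r : part) {
                running += r.score;
                out.add(r.school + "/" + r.age + " -> " + running);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Row> rows = Arrays.asList(new Row("A", 10, 5), new Row("A", 9, 3),
                new Row("B", 12, 4), new Row("A", 11, 2));
        System.out.println(runningSum(rows)); // [A/9 -> 3, A/10 -> 8, A/11 -> 10, B/12 -> 4]
    }
}
```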
4. Final Notes
The library provides two frame types: SDFrame, which is lazy and stream-compatible, and JDFrame, which evaluates eagerly so intermediate states can be accessed. Choose SDFrame for simple one-pass stream pipelines and JDFrame when you need to pause and resume processing, much like a DataFrame in Python's pandas.
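The lazy/eager split mirrors plain Java streams. This sketch shows the underlying JDK behavior: intermediate operations run only when a terminal operation is invoked, and a stream accepts a single terminal operation, while a materialized `List` can be read repeatedly. Whether every SDFrame instance is single-use in the same way is an assumption to check against the library; `LazyVsEagerSketch` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class LazyVsEagerSketch {
    // Count how many map() calls have run before and after the terminal operation.
    static int[] evaluationCounts() {
        List<Integer> calls = new ArrayList<>();
        Stream<Integer> lazy = Stream.of(1, 2, 3).map(x -> { calls.add(x); return x * 10; });
        int before = calls.size();                 // 0: intermediate ops have not run yet
        lazy.collect(Collectors.toList());         // the terminal op drives the pipeline
        return new int[] {before, calls.size()};   // {0, 3}
    }

    // A stream supports one terminal operation; a second throws IllegalStateException.
    static boolean secondTerminalOpFails() {
        Stream<Integer> s = Stream.of(1, 2, 3);
        s.count();
        try {
            s.count();
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        int[] counts = evaluationCounts();
        System.out.println(counts[0] + " -> " + counts[1]); // 0 -> 3
        System.out.println(secondTerminalOpFails());        // true
        // A materialized List, by contrast, can be read any number of times.
        List<Integer> eager = Stream.of(1, 2, 3).collect(Collectors.toList());
        System.out.println(eager + " " + eager);            // [1, 2, 3] [1, 2, 3]
    }
}
```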
Source code: https://github.com/burukeYou/JDFrame
Maven dependency: https://central.sonatype.com/artifact/io.github.burukeyou/jdframe
Top Architect
Top Architect focuses on sharing practical architecture knowledge, covering enterprise, system, website, large‑scale distributed, and high‑availability architectures, plus architecture adjustments using internet technologies. We welcome idea‑driven, sharing‑oriented architects to exchange and learn together.