How SPL Transforms Java Data Processing: From CSV to Multi‑JSON with Embedded SQL

This article introduces SPL, an open‑source Java‑embeddable computation library that outperforms traditional embedded databases and DataFrame tools by handling both tabular and nested JSON data, supporting JDBC, SQL‑like queries, multi‑source integration, and persistent .btx files with concise code examples.

IT Architects Alliance

SPL is an open‑source Java library that can be embedded directly into Java applications to perform data calculations, supporting both two‑dimensional structured data and multi‑layer JSON. Compared with embedded databases such as HSQLDB, Derby, H2, or SQLite, and DataFrame libraries like Tablesaw, Joinery, and Morpheus, SPL offers a more convenient and powerful approach.

Calling SPL from Java via JDBC

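SPL is called through the standard JDBC interface, just like a database. Load the driver with Class.forName("com.esproc.jdbc.InternalDriver"), open a connection, and execute an SPL expression as a query. For example, to sort the tab‑separated file Orders.txt by Client in ascending order and by Amount in descending order: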

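import java.sql.*;

Class.forName("com.esproc.jdbc.InternalDriver");  // register the esProc JDBC driver
Connection connection = DriverManager.getConnection("jdbc:esproc:local://");  // embedded, local connection
Statement statement = connection.createStatement();
// SPL expression: read Orders.txt, sort by Client ascending and Amount descending
String str = "=T(\"D:/Orders.txt\").sort(Client,-Amount)";
ResultSet result = statement.executeQuery(str);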

The T function reads the file directly into a table, replacing the create‑table and data‑import steps an embedded database would require, so the workflow is much simpler.
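To see what that one‑line expression saves, here is a minimal plain‑Java sketch of the same sort, written without SPL. It assumes the file's lines have already been read into memory (for example with Files.readAllLines), that the first line is a header containing Client and Amount columns, and it uses a hypothetical Order record; none of these names come from SPL.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class SortOrders {
    // Hypothetical row type: only the two columns the sort needs.
    record Order(String client, double amount) {}

    // Parses tab-separated lines; the first line is assumed to be a header
    // containing columns named "Client" and "Amount".
    static List<Order> parse(List<String> lines) {
        String[] header = lines.get(0).split("\t");
        int clientIdx = -1, amountIdx = -1;
        for (int i = 0; i < header.length; i++) {
            if (header[i].equals("Client")) clientIdx = i;
            if (header[i].equals("Amount")) amountIdx = i;
        }
        List<Order> orders = new ArrayList<>();
        for (String line : lines.subList(1, lines.size())) {
            String[] f = line.split("\t");
            orders.add(new Order(f[clientIdx], Double.parseDouble(f[amountIdx])));
        }
        return orders;
    }

    // Same ordering as sort(Client,-Amount): Client ascending, then Amount descending.
    static List<Order> sortOrders(List<Order> orders) {
        List<Order> sorted = new ArrayList<>(orders);
        sorted.sort(Comparator.comparing(Order::client)
                .thenComparing(Order::amount, Comparator.reverseOrder()));
        return sorted;
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
                "OrderID\tClient\tAmount",
                "1\tBRO\t300",
                "2\tACME\t150",
                "3\tBRO\t900");
        sortOrders(parse(lines)).forEach(System.out::println);
    }
}
```

Even this sketch, which skips type detection and error handling, runs to dozens of lines.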

SQL‑like Syntax Support

SPL also accepts SQL statements prefixed with $. The previous sorting operation can be rewritten as:

str = "$select * from d:/Orders.txt order by Client, Amount desc";

Built‑in Functions

SPL provides hundreds of built‑in functions for common calculations. Examples include:

=T("D:/Orders.txt").select(Amount>1000 && Amount<=3000 && like(Client,"*S*"))
– conditional query

=T("D:/Orders.txt").groups(year(OrderDate); sum(Amount))
– group and aggregate

=join(T("D:/Orders.txt"):O, SellerId; T("D:/data/Employees.txt"):E, EId).new(O.OrderID, O.Client, O.SellerId, O.Amount, O.OrderDate, E.Name, E.Gender, E.Dept)
– join two datasets
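For comparison, the three operations above can be sketched with Java streams. The Order, Employee, and OrderWithSeller records are hypothetical stand‑ins for the file columns, the rows are assumed to be parsed already, and the join is written as an inner join on the assumption that this matches join()'s default behaviour.

```java
import java.time.LocalDate;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class OrderQueries {
    // Hypothetical row types mirroring the columns used in the SPL examples.
    record Order(int orderId, String client, int sellerId, double amount, LocalDate orderDate) {}
    record Employee(int eId, String name, String gender, String dept) {}
    record OrderWithSeller(int orderId, String client, int sellerId, double amount,
                           LocalDate orderDate, String name, String gender, String dept) {}

    // select(Amount>1000 && Amount<=3000 && like(Client,"*S*")):
    // "*S*" is treated here as a case-sensitive "contains S" test.
    static List<Order> conditionalQuery(List<Order> orders) {
        return orders.stream()
                .filter(o -> o.amount() > 1000 && o.amount() <= 3000 && o.client().contains("S"))
                .collect(Collectors.toList());
    }

    // groups(year(OrderDate); sum(Amount)): total Amount per year, years in ascending order.
    static Map<Integer, Double> sumByYear(List<Order> orders) {
        return orders.stream()
                .collect(Collectors.groupingBy(o -> o.orderDate().getYear(),
                        TreeMap::new,
                        Collectors.summingDouble(Order::amount)));
    }

    // join(...:O, SellerId; ...:E, EId).new(...): an inner join, so orders whose
    // SellerId matches no EId are dropped. Assumes EId values are unique.
    static List<OrderWithSeller> joinSellers(List<Order> orders, List<Employee> employees) {
        Map<Integer, Employee> byId = employees.stream()
                .collect(Collectors.toMap(Employee::eId, e -> e));
        return orders.stream()
                .filter(o -> byId.containsKey(o.sellerId()))
                .map(o -> {
                    Employee e = byId.get(o.sellerId());
                    return new OrderWithSeller(o.orderId(), o.client(), o.sellerId(), o.amount(),
                            o.orderDate(), e.name(), e.gender(), e.dept());
                })
                .collect(Collectors.toList());
    }
}
```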

Persisting Data to .btx Collections

Processed data can be persisted as binary .btx files, which are compact and faster to load than text files. For instance, to merge two CSV files, remove duplicate rows, and save the result:

=[T("d:/orders1.csv"), T("d:/orders2.csv")].merge@u()
– union of records, dropping duplicates
file("d:/fast.btx").export@b(A1)
– write the result of A1 to a .btx file

The resulting .btx file can be queried just like a regular file:

str = "=T(\"D:/fast.btx\").sort(Client,-Amount)";
str = "$select * from d:/fast.btx order by Client, Amount desc";
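The union‑with‑deduplication step can be approximated in Java with a LinkedHashSet of value‑typed rows. This is a minimal sketch using a hypothetical Order record; it is a hash‑based union and is not meant to describe how merge@u() is implemented, and writing the .btx format itself is not shown.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class UnionRows {
    // Hypothetical row type; records get value-based equals/hashCode, so two rows
    // with identical fields count as duplicates.
    record Order(int orderId, String client, double amount) {}

    // Union of two row lists with duplicates removed, in first-seen order.
    static List<Order> union(List<Order> first, List<Order> second) {
        Set<Order> rows = new LinkedHashSet<>(first);
        rows.addAll(second);
        return new ArrayList<>(rows);
    }

    public static void main(String[] args) {
        List<Order> a = List.of(new Order(1, "BRO", 300), new Order(2, "ACME", 150));
        List<Order> b = List.of(new Order(2, "ACME", 150), new Order(3, "SUN", 900));
        System.out.println(union(a, b)); // three rows: order 2 appears once
    }
}
```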

Calling SPL Scripts as Stored Procedures

SPL scripts can be stored as files (e.g., bigCustomer.dfx) and invoked from Java:

Class.forName("com.esproc.jdbc.InternalDriver");
Connection connection = DriverManager.getConnection("jdbc:esproc:local://");
Statement statement = connection.createStatement();
ResultSet result = statement.executeQuery("call bigCustomer ()");

A sample bigCustomer script that ranks customers by sales and returns the smallest group of top customers whose combined sales reach half of the total:

= T("D:/data/sales.csv").sort(amount:-1)          // sort by amount, descending
= A1.cumulate(amount)                               // cumulative sums
= A2.m(-1)/2                                        // half of the total (last cumulative value / 2)
= A2.pselect(~>=A3)                                 // first position where half is reached
= A1(to(A4))                                        // rows of the top customers
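The same logic, step for step, in plain Java. This sketch assumes a hypothetical Sale record and non‑negative amounts; the comments map each step to the SPL cell it mirrors, with positions 0‑based in Java rather than 1‑based as in SPL.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class BigCustomers {
    // Hypothetical row type for sales.csv.
    record Sale(String customer, double amount) {}

    // Assumes non-negative amounts, so the cumulative total only grows.
    static List<Sale> bigCustomers(List<Sale> sales) {
        if (sales.isEmpty()) return new ArrayList<>();
        // A1: sort by amount, descending
        List<Sale> sorted = new ArrayList<>(sales);
        sorted.sort(Comparator.comparingDouble(Sale::amount).reversed());
        // A2: running (cumulative) totals
        double[] cumulative = new double[sorted.size()];
        double running = 0;
        for (int i = 0; i < sorted.size(); i++) {
            running += sorted.get(i).amount();
            cumulative[i] = running;
        }
        // A3: half of the grand total (the last cumulative value)
        double half = cumulative[cumulative.length - 1] / 2;
        // A4: first position whose cumulative total reaches half (0-based here)
        int pos = 0;
        while (pos < cumulative.length - 1 && cumulative[pos] < half) pos++;
        // A5: the customers up to and including that position
        return new ArrayList<>(sorted.subList(0, pos + 1));
    }

    public static void main(String[] args) {
        List<Sale> sales = List.of(new Sale("C", 25), new Sale("A", 40), new Sale("B", 35));
        System.out.println(bigCustomers(sales));
    }
}
```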

Handling Complex Calculations

SPL's rich syntax simplifies operations that are awkward to express in stored procedures or embedded databases. For example, finding the longest run of consecutive rising days for a stock, with prices read from an Excel file, takes only two lines of SPL:

=T("d:/AAPL.xlsx")
=a=0, A1.max(a=if(price>price[-1], a+1, 0))
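The SPL expression is a single pass that keeps a running counter. A minimal Java equivalent, assuming the prices are already ordered by date and loaded into an array (the class and method names are hypothetical):

```java
public class LongestRise {
    // One pass with a running counter, like a=0, A1.max(a=if(price>price[-1], a+1, 0)):
    // current grows on each day the price rises and resets to 0 otherwise.
    // Assumption: the first day has no predecessor and is counted as not rising;
    // SPL's treatment of the missing price[-1] on the first row may differ.
    static int longestRise(double[] prices) {
        int current = 0;
        int best = 0;
        for (int i = 1; i < prices.length; i++) {
            current = prices[i] > prices[i - 1] ? current + 1 : 0;
            best = Math.max(best, current);
        }
        return best;
    }

    public static void main(String[] args) {
        double[] prices = {1, 2, 3, 2, 3, 4, 5, 1};
        System.out.println(longestRise(prices)); // 3 (the rises 2->3, 3->4, 4->5)
    }
}
```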

IDE Support

SPL provides a dedicated IDE that allows step‑by‑step debugging and observation of intermediate results.

Processing Irregular Text

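For irregular text files, SPL lets you specify a custom delimiter. This example reads a file whose fields are separated by a double pipe (||), with the @t option treating the first line as column headers: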

= file("D:/Orders.txt").import@t(;,"||")
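In plain Java the same delimiter needs care, because String.split and Pattern treat their argument as a regular expression. A minimal sketch, assuming the lines are already in memory and using hypothetical class and method names:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

public class PipeDelimited {
    // "||" must be quoted: as a raw regex it matches the empty string,
    // and the line would be split between every character.
    private static final Pattern SEP = Pattern.compile(Pattern.quote("||"));

    // Treats the first line as column names and maps each following line
    // to a column-name -> value map. Limit -1 keeps trailing empty fields.
    static List<Map<String, String>> parse(List<String> lines) {
        String[] header = SEP.split(lines.get(0), -1);
        List<Map<String, String>> rows = new ArrayList<>();
        for (String line : lines.subList(1, lines.size())) {
            String[] fields = SEP.split(line, -1);
            Map<String, String> row = new LinkedHashMap<>();
            for (int i = 0; i < header.length; i++) {
                row.put(header[i], i < fields.length ? fields[i] : "");
            }
            rows.add(row);
        }
        return rows;
    }

    public static void main(String[] args) {
        System.out.println(parse(List.of("OrderID||Client||Amount", "1||BRO||300", "2||A|B||5")));
    }
}
```

A single | inside a value (as in A|B above) is left intact, because only the two‑character sequence acts as a separator.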

Multi‑Source Integration

SPL supports many data sources beyond plain text, including Excel, relational databases, NoSQL stores, web services, and RESTful APIs. The following RESTful JSON example flattens each employee's nested Orders and keeps the orders whose Amount is greater than 500 and at most 2000 and whose Client name contains "bro", ignoring case:

=json(httpfile("http://127.0.0.1:6868/api/getEmpOrders").read())
=A1.conj(Orders)
=A2.select(Amount>500 && Amount<=2000 && like@c(Client,"*bro*"))
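The conj‑then‑select pattern corresponds to flatMap followed by filter in Java streams. This sketch assumes the JSON response has already been mapped to the hypothetical Employee and Order records below (JSON parsing is outside the standard library and not shown) and reads like@c as a case‑insensitive match.

```java
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

public class EmpOrders {
    // Hypothetical shapes for the JSON returned by the service: each employee
    // record carries a nested Orders array.
    record Order(int orderId, String client, double amount) {}
    record Employee(String name, List<Order> orders) {}

    // conj(Orders) flattens the nested arrays into one list; the filter mirrors
    // Amount>500 && Amount<=2000 && like@c(Client,"*bro*").
    static List<Order> filter(List<Employee> employees) {
        return employees.stream()
                .flatMap(e -> e.orders().stream())
                .filter(o -> o.amount() > 500 && o.amount() <= 2000
                        && o.client().toLowerCase(Locale.ROOT).contains("bro"))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Employee> emps = List.of(
                new Employee("A", List.of(new Order(1, "Brown", 800), new Order(2, "BRO Inc", 2500))),
                new Employee("B", List.of(new Order(3, "Broad St", 1000), new Order(4, "Acme", 900))));
        System.out.println(filter(emps));
    }
}
```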

Overall, SPL is a powerful Java‑embedded computation engine that surpasses traditional embedded databases in structured calculations, outperforms DataFrame libraries for nested JSON handling, and offers comprehensive multi‑source support.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
