How SPL Transforms Java Data Processing: From CSV to Multi‑JSON with Embedded SQL
This article introduces SPL, an open‑source Java‑embeddable computation library that outperforms traditional embedded databases and DataFrame tools by handling both tabular and nested JSON data, supporting JDBC, SQL‑like queries, multi‑source integration, and persistent .btx files with concise code examples.
SPL is an open‑source Java library that can be embedded directly into Java applications to perform data calculations, supporting both two‑dimensional structured data and multi‑layer JSON. Compared with embedded databases such as HSQLDB, Derby, H2, or SQLite, and DataFrame libraries like Tablesaw, Joinery, and Morpheus, SPL offers a more convenient and powerful approach.
Calling SPL from Java via JDBC
SPL exposes a standard JDBC driver. Once the driver is loaded with Class.forName("com.esproc.jdbc.InternalDriver"), SPL expressions run through ordinary JDBC Statement calls. For example, to sort a tab‑separated file Orders.txt by Client ascending and Amount descending:
Class.forName("com.esproc.jdbc.InternalDriver");
Connection connection = DriverManager.getConnection("jdbc:esproc:local://");
Statement statement = connection.createStatement();
String str = "=T(\"D:/Orders.txt\").sort(Client,-Amount)";
ResultSet result = statement.executeQuery(str);
The T function encapsulates the entire import process that an embedded database would require, making the workflow much simpler.
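For comparison, here is a minimal plain-Java sketch of the ordering that sort(Client,-Amount) expresses: client ascending, then amount descending. It works on rows already held in memory and leaves file parsing out. The Order record, the class name, and the sample values are hypothetical and are not taken from the actual Orders.txt.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class OrderSortSketch {
    // Hypothetical row type; the real Orders.txt may hold more columns.
    record Order(String client, double amount) {}

    // Client ascending, then amount descending within the same client.
    static List<Order> sortByClientThenAmountDesc(List<Order> orders) {
        List<Order> copy = new ArrayList<>(orders); // leave the input untouched
        copy.sort(Comparator.comparing(Order::client)
                .thenComparing(Order::amount, Comparator.reverseOrder()));
        return copy;
    }

    public static void main(String[] args) {
        List<Order> orders = List.of(
                new Order("WVF", 120.0),
                new Order("ABC", 50.0),
                new Order("ABC", 300.0));
        sortByClientThenAmountDesc(orders).forEach(System.out::println);
    }
}
```

The comparator chain carries the same two sort keys as the SPL expression; everything else in the sketch is scaffolding that the T function and the JDBC call make unnecessary.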
SQL‑like Syntax Support
SPL also understands SQL‑style statements. The previous sorting operation can be rewritten as:
str = "$select * from d:/Orders.txt order by Client, Amount desc";
Built‑in Functions
SPL provides hundreds of built‑in functions for common calculations. Examples include:
=T("D:/Orders.txt").select(Amount>1000 && Amount<=3000 && like(Client,"*S*")) – conditional query
=T("D:/Orders.txt").groups(year(OrderDate); sum(Amount)) – group and aggregate
=join(T("D:/Orders.txt"):O, SellerId; T("D:/data/Employees.txt"):E, EId).new(O.OrderID, O.Client, O.SellerId, O.Amount, O.OrderDate, E.Name, E.Gender, E.Dept) – join two datasets
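To make the first two expressions concrete, the sketch below shows roughly what they compute, written with Java streams over in-memory rows. It assumes that the pattern "*S*" means "contains an uppercase S" and that OrderDate is a calendar date. The Order record, the method names, and the sample data are hypothetical.

```java
import java.time.LocalDate;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class OrderQuerySketch {
    // Hypothetical row type covering only the columns used here.
    record Order(String client, double amount, LocalDate orderDate) {}

    // Rough analogue of select(Amount>1000 && Amount<=3000 && like(Client,"*S*")).
    static List<Order> filterOrders(List<Order> orders) {
        return orders.stream()
                .filter(o -> o.amount() > 1000 && o.amount() <= 3000)
                .filter(o -> o.client().contains("S")) // assumed meaning of "*S*"
                .toList();
    }

    // Rough analogue of groups(year(OrderDate); sum(Amount)), keyed by year.
    static Map<Integer, Double> sumByYear(List<Order> orders) {
        return orders.stream().collect(Collectors.groupingBy(
                o -> o.orderDate().getYear(),
                TreeMap::new, // keep years in ascending order
                Collectors.summingDouble(Order::amount)));
    }

    public static void main(String[] args) {
        List<Order> orders = List.of(
                new Order("SAVEA", 1500.0, LocalDate.of(2021, 3, 1)),
                new Order("SAVEA", 5000.0, LocalDate.of(2021, 7, 9)),
                new Order("bro", 2000.0, LocalDate.of(2022, 1, 5)));
        System.out.println(filterOrders(orders));
        System.out.println(sumByYear(orders));
    }
}
```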
Persisting Data to .btx Collections
Processed data can be persisted as .btx collection files, which are compact and offer high performance. For instance, merging two CSV files and removing duplicate rows:
=[T("d:/orders1.csv"), T("d:/orders2.csv")].merge@u() – union of records
file("d:/fast.btx").export@b(A1) – write A1, the merged result, to a .btx file
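The outcome of the union step, all rows from both inputs with duplicates kept only once, can be pictured with a small plain-Java sketch. It illustrates the result only and does not reproduce how SPL's merge works internally or how a .btx file is written. The Order record and the sample rows are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class UnionSketch {
    // Records compare field by field, so identical rows count as duplicates.
    record Order(int orderId, String client, double amount) {}

    // Union of two row lists, keeping the first occurrence of each row.
    static List<Order> union(List<Order> first, List<Order> second) {
        Set<Order> seen = new LinkedHashSet<>(first); // preserves encounter order
        seen.addAll(second);
        return new ArrayList<>(seen);
    }

    public static void main(String[] args) {
        List<Order> a = List.of(new Order(1, "A", 10.0), new Order(2, "B", 20.0));
        List<Order> b = List.of(new Order(2, "B", 20.0), new Order(3, "C", 30.0));
        union(a, b).forEach(System.out::println);
    }
}
```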
The resulting .btx file can be queried just like a regular file:
str = "=T(\"D:/fast.btx\").sort(Client,-Amount)";
str = "$select * from d:/fast.btx order by Client, Amount desc";
Calling SPL Scripts as Stored Procedures
SPL scripts can be stored as files (e.g., bigCustomer.dfx) and invoked from Java:
Class.forName("com.esproc.jdbc.InternalDriver");
Connection connection = DriverManager.getConnection("jdbc:esproc:local://");
Statement statement = connection.createStatement();
ResultSet result = statement.executeQuery("call bigCustomer ()");
A sample script to find the top‑n customers whose cumulative sales reach half of the total:
= T("D:/data/sales.csv").sort(amount:-1) // A1: sort by amount, descending
= A1.cumulate(amount) // A2: cumulative sum
= A2.m(-1)/2 // A3: half of the total (last cumulative value divided by 2)
= A2.pselect(~>=A3) // A4: position where half is reached
= A1(to(A4)) // A5: retrieve top customers
Handling Complex Calculations
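As a yardstick for how much logic a few SPL cells carry, the sales script above (sort descending, accumulate, halve the total, find the position, slice) can be sketched in plain Java as follows. The Sale record, the method name, and the sample figures are hypothetical, and the loop is one straightforward reading of the script rather than a translation of SPL internals.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class TopCustomersSketch {
    // Hypothetical row type for sales.csv.
    record Sale(String customer, double amount) {}

    // Returns the largest customers whose running total first reaches half of all sales.
    static List<Sale> customersReachingHalf(List<Sale> sales) {
        List<Sale> sorted = new ArrayList<>(sales);
        sorted.sort(Comparator.comparingDouble(Sale::amount).reversed()); // descending
        double half = sorted.stream().mapToDouble(Sale::amount).sum() / 2;
        List<Sale> top = new ArrayList<>();
        double running = 0;
        for (Sale s : sorted) {
            top.add(s);
            running += s.amount();
            if (running >= half) {
                break; // the cumulative sum has reached half of the total
            }
        }
        return top;
    }

    public static void main(String[] args) {
        List<Sale> sales = List.of(
                new Sale("C1", 20.0), new Sale("C2", 40.0),
                new Sale("C3", 10.0), new Sale("C4", 30.0));
        customersReachingHalf(sales).forEach(System.out::println);
    }
}
```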
SPL’s rich syntax simplifies operations that are difficult with stored procedures or embedded databases. For example, calculating the longest run of consecutive rising days for a stock from an Excel file requires only two SPL lines:
=T("d:/AAPL.xlsx")
=a=0, A1.max(a=if(price>price[-1], a+1, 0))
IDE Support
SPL provides a dedicated IDE that allows step‑by‑step debugging and observation of intermediate results.
Processing Irregular Text
For irregular text files, SPL can specify custom delimiters. Example using double‑pipe || as separator:
= file("D:/Orders.txt").import@t(;,"||")
Multi‑Source Integration
SPL supports many data sources beyond plain text, including Excel, relational databases, NoSQL stores, web services, and RESTful APIs. A RESTful JSON example that keeps orders whose Amount is greater than 500 and at most 2000 and whose client name contains "bro" (matched case‑insensitively):
=json(httpfile("http://127.0.0.1:6868/api/getEmpOrders").read())
=A1.conj(Orders)
=A2.select(Amount>500 && Amount<=2000 && like@c(Client,"*bro*"))
Overall, SPL is a powerful Java‑embedded computation engine that surpasses traditional embedded databases in structured calculations, outperforms DataFrame libraries for nested JSON handling, and offers comprehensive multi‑source support.
IT Architects Alliance
Discussion and exchange on system, internet, large‑scale distributed, high‑availability, and high‑performance architectures, as well as big data, machine learning, AI, and architecture adjustments with internet technologies. Includes real‑world large‑scale architecture case studies. Open to architects who have ideas and enjoy sharing.