How to Parallel Load Hundreds of Millions of Rows with SQL*Loader: Tips and Pitfalls
This article shares a real‑world case study of loading hundreds of millions of rows from an unsplittable text file into an Oracle table using SQL*Loader, detailing a parallel skip‑and‑load technique, performance results, a critical 32‑bit row‑count limit bug, and practical recommendations for CPU‑bound versus I/O‑bound workloads.
Background and Test Scenario
The author recounts a performance‑testing scenario from several years ago in which a text file containing a few hundred million rows (each with over 200 columns) had to be loaded into an Oracle table using SQL*Loader, without modifying any database configuration. The goal was to measure the total import time: the Exadata X5 benchmark achieved about 40 minutes, while the author's own test finished in roughly 10 minutes, prompting investigation into the techniques used.
Parallel Loading Misconception
Although the test specification prohibited splitting the file, many assumed that SQL*Loader required file splitting to enable parallelism. The author disproved this by using a "skip + load" approach that logically partitions the workload without physically dividing the file.
1. Count the total number of lines with wc -l.
2. Generate multiple load commands using the skip and load parameters to simulate file splitting.
3. Execute the commands in parallel.
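The three steps above can be sketched end to end in plain shell. This is only an illustration of the chunking arithmetic, equivalent to the PL/SQL generator below; data.txt, the control‑file path, and the log names are hypothetical placeholders:

```shell
#!/bin/sh
# Split a known line count into dop chunks and emit one sqlldr command per chunk.
total=348104868                 # step 1: total=$(wc -l < data.txt)
dop=20
load=$(( total / dop ))         # base rows per job
tail=$(( total % dop ))         # remainder rows go to the last job
skip=0
i=1
while [ "$i" -le "$dop" ]; do
  [ "$i" -eq "$dop" ] && load=$(( load + tail ))
  # step 2: each job skips exactly what all earlier jobs load
  echo "nohup sqlldr tester/tester control=load.ctl log=test$i.log" \
       "direct=true parallel=true skip=$skip load=$load &"
  skip=$(( skip + load ))
  i=$(( i + 1 ))
done                            # step 3: pipe this output to sh to run the jobs in parallel
```

Because each job's skip equals the sum of all previous jobs' load values, the chunks tile the file exactly, with no overlap and no gap.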
set serveroutput on
set linesize 1000
set pages 0
declare
  total_line_number number;          -- total rows in the flat file (from wc -l)
  dop               number;          -- degree of parallelism: number of sqlldr jobs
  skip              number;          -- rows the current job skips
  load              number;          -- rows the current job loads
  tail_of_mod       number;          -- remainder rows, given to the last job
  command           varchar2(4000);
  directory         varchar2(4000);
begin
  total_line_number := 348104868;
  directory := '/home/oracle/adam';
  dop  := 20;
  skip := 0;
  load := 0;
  tail_of_mod := mod(total_line_number, dop);
  load := trunc(total_line_number / dop);
  for i in 1 .. dop loop
    -- the last job also picks up the remainder rows
    if i = dop then
      load := load + tail_of_mod;
    end if;
    command := 'nohup sqlldr tester/tester control='||directory||'/load.ctl log='||directory||'/test'||i||'.log READSIZE=20000000 BINDSIZE=20000000 direct=true parallel=true errors=99999 silent=errors,discards skip='||skip||' load='||load||' &';
    dbms_output.put_line(command);
    skip := skip + load;             -- next job starts where this one ends
  end loop;
end;
/
Performance Observations
The bottleneck in this scenario was CPU, not I/O, which made the logical partitioning highly effective. When I/O is the limiting factor, the benefit of parallelism diminishes: each job must still read (and discard) every skipped row, so the skip operation adds redundant reads, and multi‑process loading yields only modest I/O gains.
Improved Approach for Very Large Files
For an even larger test (a 6 TB unsplittable file), the original wc -l line count became impractically slow. The author adopted an estimation technique:
1. Sample the first 50 000 lines with head -n 50000 file > 1.txt.
2. Estimate the total line count by scaling: total lines ≈ 50 000 × (full file size ÷ sample file size).
3. Omit the final load value, letting the last job load all remaining rows.
This estimation allowed the job to complete successfully.
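The scaling step can be done with a couple of shell commands. A minimal sketch, assuming roughly uniform line lengths; data.txt is a hypothetical file name:

```shell
# Estimate the line count of a huge file from a 50 000-line sample.
head -n 50000 data.txt > 1.txt
sample_bytes=$(wc -c < 1.txt)       # bytes in the 50 000-line sample
total_bytes=$(wc -c < data.txt)     # bytes in the whole file
# Multiply before dividing to keep precision; 64-bit shell arithmetic
# comfortably covers a 6 TB file (6e12 * 50000 < 2^63).
est_lines=$(( total_bytes * 50000 / sample_bytes ))
echo "$est_lines"
```

The estimate is exact only if line lengths are uniform; skewed line lengths will bias it, which is why the last job should load "the rest" rather than a computed count.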
Critical Bug in SQL*Loader
During a later run, the imported row count was off by billions. Investigation revealed a hard limit: a single SQL*Loader command cannot process more than 4 294 967 295 (2³² − 1) rows, and the skip values for the later chunks pushed roughly another 6.5 billion rows past that 32‑bit counter. This bug was fixed only in Oracle 12c; the test environment ran 11.2.0.4, so the unsplit file could not be fully loaded.
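On affected versions, a guard in the command‑generation step can catch the overflow before any data moves. A sketch, where MAX_ROWS is the 2³² − 1 cap described above and the skip/load values are illustrative:

```shell
# Refuse to emit any chunk whose skip or skip+load would overflow
# SQL*Loader's 32-bit row counter (an issue up to 11.2, per the article).
MAX_ROWS=4294967295
skip=6400000000        # illustrative values for a multi-billion-row file
load=300000000
if [ "$skip" -gt "$MAX_ROWS" ] || [ $(( skip + load )) -gt "$MAX_ROWS" ]; then
  echo "WARNING: chunk exceeds the 32-bit row limit; split the file or use 12c+" >&2
fi
```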
Conclusion
The "skip + load" parallel loading technique is effective for CPU‑intensive imports with many columns, but it offers limited advantage for I/O‑bound workloads and is constrained by a row‑count bug in older Oracle versions. When possible, using external tables may simplify large‑scale data loads.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
dbaplus Community
Enterprise-level professional community for Database, BigData, and AIOps. Daily original articles, weekly online tech talks, monthly offline salons, and quarterly XCOPS&DAMS conferences—delivered by industry experts.
