
How to Parallel Load Hundreds of Millions of Rows with SQL*Loader: Tips and Pitfalls

This article shares a real-world case study of loading hundreds of millions to billions of rows into an Oracle table with SQL*Loader, from a text file that could not be split. It covers a parallel skip-and-load technique, performance results, a row-count bug caused by a 32-bit (4-byte) limit, and practical recommendations for CPU-bound versus I/O-bound workloads.

dbaplus Community

Background and Test Scenario

The author recounts a performance test from several years ago in which a text file containing a few hundred million rows (over 200 columns) had to be loaded into an Oracle table with SQL*Loader, without modifying any database configuration. The metric was total import time. The Exadata X5 benchmark took about 40 minutes, while the author's own test finished in roughly 10 minutes, a gap that prompted a closer look at the techniques used.

Parallel Loading Misconception

Although the test specification prohibited splitting the file, many assumed that SQL*Loader required file splitting to enable parallelism. The author disproved this by using a "skip + load" approach that logically partitions the workload without physically dividing the file.

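1. Count the total number of lines in the file with wc -l.
2. Generate multiple load commands, each with its own skip and load parameters, to simulate splitting the file.
3. Execute the generated commands in parallel.

The following SQL*Plus script prints one sqlldr command per slice: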

set serveroutput on
set linesize 1000
set pages 0
declare
  total_line_number number;  -- total rows in the file, taken from wc -l
  dop  number;               -- degree of parallelism (number of sqlldr processes)
  skip  number;              -- rows the current process skips
  load  number;              -- rows the current process loads
  tail_of_mod  number;       -- remainder rows, given to the last process
  command varchar2(4000);
  directory varchar2(4000);
begin
  total_line_number := 348104868;
  directory := '/home/oracle/adam';
  dop := 20;
  skip := 0;
  load := 0;
  tail_of_mod := mod(total_line_number,dop);
  load := trunc(total_line_number/dop);
  for i in 1..dop loop
    -- the last process also loads the remainder
    if i = dop then
      load := load + tail_of_mod;
    end if;
    command := 'nohup sqlldr tester/tester control='||directory||'/load.ctl log='||directory||'/test'||i||'.log READSIZE=20000000 BINDSIZE=20000000 direct=true parallel=true  errors=99999 silent=errors,discards skip='||skip||'   load='||load ||' &' ;
    dbms_output.put_line(command);
    -- the next process starts where this one stops
    skip := skip+load;
  end loop;
end;
/
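As a cross-check of the arithmetic in the script above, the following Python sketch recomputes the skip and load values for the same row count and degree of parallelism. It is only an illustration: the function name plan_slices is hypothetical, and the sketch does not run SQL*Loader.

```python
def plan_slices(total_rows, dop):
    """Return (skip, load) pairs that together cover total_rows exactly once."""
    base, remainder = divmod(total_rows, dop)
    slices = []
    skip = 0
    for i in range(1, dop + 1):
        # The last slice also takes the remainder, as in the PL/SQL script.
        load = base + (remainder if i == dop else 0)
        slices.append((skip, load))
        skip += load
    return slices


slices = plan_slices(348104868, 20)
print(slices[0])    # (0, 17405243)
print(slices[1])    # (17405243, 17405243)
print(slices[-1])   # (330699617, 17405251)
```

The last slice ends at 330699617 + 17405251 = 348104868, so every row is assigned to exactly one process.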

Performance Observations

The bottleneck in this scenario was CPU, not I/O, which is why the logical partitioning method was so effective. When I/O is the limiting factor, the benefit of parallelism shrinks: each process has to read past the rows it skips before it starts loading, which adds unnecessary reads, and running multiple processes yields only a modest gain in I/O throughput.
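To get a feel for the extra reads, the Python sketch below counts how many rows all processes touch in total. It assumes that a process reads every row it skips (the text only says that skip adds unnecessary reads) and it ignores operating system caching, so treat the result as a rough upper bound. The function name rows_scanned is hypothetical.

```python
def rows_scanned(total_rows, dop):
    """Total rows read by all processes if each one reads its skipped rows too."""
    base, remainder = divmod(total_rows, dop)
    scanned = 0
    skip = 0
    for i in range(1, dop + 1):
        load = base + (remainder if i == dop else 0)
        # Assumption: skipped rows are read from the file, not jumped over.
        scanned += skip + load
        skip += load
    return scanned


total = 348104868
ratio = rows_scanned(total, 20) / total
print(f"{ratio:.1f}")   # 10.5
```

Under these assumptions, 20 processes read the equivalent of about 10.5 copies of the file, which costs little when CPU is the bottleneck and a lot when the disks are.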

Improved Approach for Very Large Files

For an even larger test (a 6 TB file that also could not be split), counting lines with wc -l became impractically slow. The author switched to estimating the line count:

1. Sample the first 50,000 lines with head -n 50000 file > 1.txt.
2. Estimate the total line count by scaling the sample to the full file: multiply 50,000 by the ratio of the full file size to the size of 1.txt.
3. Omit the load parameter from the final command, so that the last job loads all remaining rows regardless of the estimation error.

With this estimate, the job completed successfully.
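The estimate described above can be sketched in Python as follows. The function names and the byte counts in the example are hypothetical (the article does not give the sample size in bytes), and a load value of None stands for a final command that omits the load parameter.

```python
def estimate_total_lines(file_size_bytes, sample_size_bytes, sample_lines):
    """Scale the sampled line count up to the full file size (integer math)."""
    return file_size_bytes * sample_lines // sample_size_bytes


def plan_estimated_slices(estimated_lines, dop):
    """Return (skip, load) pairs; the last load is None, meaning 'omit load'."""
    base = estimated_lines // dop
    return [(i * base, base if i < dop - 1 else None) for i in range(dop)]


# Hypothetical figures: a 6 TB file and a 50,000-line sample of 50 MB.
estimated = estimate_total_lines(6 * 10**12, 50_000_000, 50_000)
print(estimated)                             # 6000000000
print(plan_estimated_slices(estimated, 4))
# [(0, 1500000000), (1500000000, 1500000000),
#  (3000000000, 1500000000), (4500000000, None)]
```

Because the last job has no load value, an estimate that is too low or too high only changes how many rows that final job ends up loading. In practice the two sizes could be read with a call such as os.path.getsize.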

Critical Bug in SQL*Loader

During a later run, the imported row count was off by billions. Investigation revealed a hard limit: a single SQL*Loader command cannot load more than 4,294,967,295 (2³²-1) rows, and the skip value adds another ~6.5 billion rows. The bug was fixed only in Oracle 12c; because the test environment ran 11.2.0.4, the unsplit file could not be fully loaded there.
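A simple pre-flight check can flag commands that are likely to run into this limit before any load starts. The Python sketch below is a conservative illustration, not a description of SQL*Loader internals: it compares both skip and load against 2³²-1, although the article states that bound only for the rows loaded, so applying it to skip is an assumption. The function name and the slice values are hypothetical.

```python
UINT32_MAX = 2**32 - 1   # 4,294,967,295, the limit reported in the article


def find_risky_slices(slices, limit=UINT32_MAX):
    """Return 1-based positions of slices whose skip or load exceeds limit."""
    risky = []
    for index, (skip, load) in enumerate(slices, start=1):
        # load may be None when the last command omits the load parameter.
        if skip > limit or (load is not None and load > limit):
            risky.append(index)
    return risky


# Hypothetical plan for a file of roughly 9 billion rows, split three ways.
slices = [
    (0, 3_000_000_000),
    (3_000_000_000, 3_000_000_000),
    (6_000_000_000, None),
]
print(find_risky_slices(slices))   # [3]
```

A command without a load value cannot be checked this way for the number of rows it will load, so the last job still needs its row count verified after the run.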

SQL*Loader row limit bug screenshot

Conclusion

The "skip + load" parallel loading technique is effective for CPU-intensive imports with many columns, but it offers limited advantage for I/O-bound workloads, and in Oracle versions before 12c (such as 11.2.0.4) it is constrained by the row-count bug. When possible, using external tables may simplify large-scale data loads.

