How to Prevent Null, Type, and Charset Pitfalls in Oracle‑to‑ADB Data Sync

This article details the common pitfalls encountered when synchronizing Oracle databases to AnalyticDB PostgreSQL, covering null versus empty string handling, data type conversion challenges, character set issues, special character processing, and comprehensive testing strategies to ensure data consistency and performance.

dbaplus Community

Mar 7, 2022

1. Null and Empty String Handling

Oracle treats an empty string as NULL, so checks must use IS NULL; a comparison such as col = '' never evaluates to true. PostgreSQL‑based targets distinguish the two, so confirm how the target should represent Oracle's NULL/empty values and apply the same rule in both the full‑load and incremental phases.

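Additionally, fixed‑length CHAR columns are padded with spaces, so a value that looks empty is missed by both char_col='' and char_col IS NULL. In Oracle, TRIM(char_col) IS NULL identifies such values (trimming an all‑space string yields NULL), and a byte‑length check such as LENGTHB(char_col) > 0 (OCTET_LENGTH in PostgreSQL) confirms that the column still holds padding. Left unhandled, these values cause mismatches after migration.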
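One way to keep the full‑load and incremental phases consistent is to route every string value through the same normalization function. The following is a minimal Python sketch, assuming the project has decided that NULL, empty strings, and all‑space CHAR values all become NULL on the target; the function name normalize_text and the fixed_width flag are hypothetical and not part of any sync tool.

```python
from typing import Optional


def normalize_text(value: Optional[str], fixed_width: bool = False) -> Optional[str]:
    """Map a source string value to one agreed target representation.

    Assumed rule (adjust to the project's own decision): NULL, '' and
    all-space CHAR values all become None (NULL) on the target.
    """
    if value is None:
        return None
    if fixed_width:
        # CHAR columns are right-padded with spaces; strip only that padding.
        value = value.rstrip(" ")
    # An empty result is treated like Oracle treats '': as NULL.
    return value if value != "" else None


if __name__ == "__main__":
    print(normalize_text("ab   ", fixed_width=True))   # padded CHAR value
    print(normalize_text("     ", fixed_width=True))   # all-space CHAR value
    print(normalize_text("  ", fixed_width=False))     # VARCHAR2 spaces are kept
```

Calling the same function from both the full‑load export and the incremental apply path avoids the case where one phase writes '' and the other writes NULL for the same source value.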

2. Data Type Conversion Issues

Cross‑database type conversion involves precision, efficiency, and compatibility. Examples:

Oracle → PostgreSQL: Oracle columns are often declared as NUMBER without precision or scale. Mapping them to NUMERIC is safe but can cost performance; BIGINT is often a better choice when the column holds only integers, though this should be confirmed with the vendor first.

Oracle → DB2: Primary‑key columns cannot contain leading or trailing spaces; spaces cause duplicate‑key conflicts during sync.

Oracle → AnalyticDB (ADB): The distribution‑key columns must be part of the primary key because of the MPP architecture.

Oracle → HBase: HBase needs a unique rowkey for every row; using Oracle's ROWID as the rowkey satisfies this requirement.
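The NUMBER mapping decision in the Oracle → PostgreSQL case can be made explicit in a small helper. This is a sketch under stated assumptions: the digit thresholds are a common heuristic derived from the value ranges of SMALLINT, INTEGER, and BIGINT, not a rule from any specific sync tool, and the function name map_oracle_number is hypothetical.

```python
from typing import Optional


def map_oracle_number(precision: Optional[int], scale: Optional[int]) -> str:
    """Suggest a PostgreSQL type for an Oracle NUMBER(precision, scale) column.

    Heuristic only: thresholds are chosen so that every value with that many
    decimal digits fits into the suggested integer type.
    """
    if precision is None:
        # NUMBER without precision may hold integers or decimals of any size,
        # so NUMERIC is the only safe default without profiling the data.
        return "NUMERIC"
    scale = scale or 0
    if scale > 0:
        return f"NUMERIC({precision},{scale})"
    if scale < 0:
        # Negative scale is an assumption-free fallback: use unconstrained NUMERIC.
        return "NUMERIC"
    if precision <= 4:
        return "SMALLINT"   # max 32767 covers 4 digits
    if precision <= 9:
        return "INTEGER"    # max 2147483647 covers 9 digits
    if precision <= 18:
        return "BIGINT"     # max 9223372036854775807 covers 18 digits
    return f"NUMERIC({precision})"


if __name__ == "__main__":
    for p, s in [(None, None), (10, 0), (10, 2), (38, 0)]:
        print(p, s, "->", map_oracle_number(p, s))
```

For NUMBER columns without precision, switching to BIGINT would still require checking the actual data (no fractional values, no values beyond 18 digits), which is why the sketch defaults to NUMERIC there.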

3. Character Set Conversion Problems

When migrating between different character sets (e.g., BIG5 to UTF‑8), verify the hex representation of Chinese characters using Oracle's DUMP function or its equivalent in the target. Remember that the same character occupies a different number of bytes in each encoding (a Chinese character takes 2 bytes in GBK but typically 3 bytes in UTF‑8), so column widths defined in bytes must be widened accordingly.

Be aware of characters missing in the target set (e.g., BIG5 lacks the character “邨”) and custom‑defined character regions that may cause data loss.
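The byte‑length and missing‑character checks can be rehearsed outside the database before any column widths are changed. The sketch below uses Python's built‑in codecs as a stand‑in for the database character sets, which is an assumption: codec coverage may differ in detail from the database's own conversion tables. The function names are hypothetical.

```python
def byte_lengths(text: str, encodings=("gbk", "utf-8")) -> dict:
    """Return the encoded byte length of text in each encoding."""
    return {enc: len(text.encode(enc)) for enc in encodings}


def hex_dump(text: str, encoding: str) -> str:
    """Hex string of the encoded bytes, for comparison with DUMP-style output."""
    return text.encode(encoding).hex()


def unencodable_chars(text: str, encoding: str) -> list:
    """List the characters of text that the given encoding cannot represent."""
    missing = []
    for ch in text:
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            missing.append(ch)
    return missing


if __name__ == "__main__":
    print(byte_lengths("中文"))              # 2 bytes per character in GBK, 3 in UTF-8
    print(hex_dump("中", "utf-8"))           # e4b8ad
    print(unencodable_chars("中😀", "gbk"))  # the emoji has no GBK code point
```

Running unencodable_chars over a sample of source data against the target encoding gives an early list of characters that would be lost or replaced during conversion.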

4. Special Character Handling

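Special characters such as single/double quotes, newlines, slashes, and backslashes can break full‑load processes when they collide with the field delimiters, quote characters, or line terminators of the export file. Recommended approaches include: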

Using CSV format with proper escaping.

Employing multibyte delimiters.

Performing data cleansing before sync.

Syncing only “normal” data and handling “special” records separately.
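The first approach, CSV with proper escaping, can be checked with a round trip: write the awkward values, read them back, and compare. This is a minimal sketch using Python's standard csv module; the helper names rows_to_csv and csv_to_rows are hypothetical, and a real load tool may use a different CSV dialect that must be matched on both sides.

```python
import csv
import io


def rows_to_csv(rows) -> str:
    """Serialize rows with every field quoted; embedded quotes are doubled."""
    buf = io.StringIO(newline="")
    writer = csv.writer(buf, quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writerows(rows)
    return buf.getvalue()


def csv_to_rows(text: str) -> list:
    """Parse CSV text back into a list of rows (lists of strings)."""
    return [row for row in csv.reader(io.StringIO(text, newline=""))]


if __name__ == "__main__":
    tricky = [["1", 'He said "hi"', "line1\nline2", "a,b", "C:\\temp\\x", "it's"]]
    text = rows_to_csv(tricky)
    print(text)
    print(csv_to_rows(text) == tricky)  # True when the round trip is lossless
```

Quoting every field keeps embedded delimiters and newlines inside a single field, which is what prevents one source row from being split into several target rows.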

5. Abnormal Record Processing

Records that violate database rules (e.g., illegal dates like 0000-00-00 00:00:00 or 2022-02-30 00:00:00, or NaN values) must be identified and corrected before they reach the target. For dates, adding one day and then subtracting one day often normalizes the stored value; if it does not, agree on a replacement value with the business owners.
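Identifying such records can be sketched as a pre‑load check on exported values. The example below assumes timestamps arrive as text in the format YYYY-MM-DD HH:MM:SS and numbers as Python floats; the function names are hypothetical. Note that Python's datetime only covers years 1 to 9999, so this check is stricter than some databases.

```python
import math
from datetime import datetime

# Assumed export format for timestamp columns.
DATE_FORMAT = "%Y-%m-%d %H:%M:%S"


def is_valid_timestamp(text: str) -> bool:
    """True if text parses as a real calendar date and time."""
    try:
        datetime.strptime(text, DATE_FORMAT)
        return True
    except ValueError:
        return False


def is_nan(value: float) -> bool:
    """True if a numeric value is NaN."""
    return math.isnan(value)


def find_abnormal_timestamps(values) -> list:
    """Return (index, value) pairs for timestamps that fail validation."""
    return [(i, v) for i, v in enumerate(values) if not is_valid_timestamp(v)]


if __name__ == "__main__":
    sample = ["2022-02-28 00:00:00", "2022-02-30 00:00:00", "0000-00-00 00:00:00"]
    print(find_abnormal_timestamps(sample))
    print(is_nan(float("nan")))
```

The list of flagged records is what would then be corrected at the source or taken to the business owners.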

6. Full‑Load Testing

Choose test tables that:

Include large tables to expose bottlenecks.

Cover all column types involved in the migration.

Contain multibyte data if the target handles such characters.

Are static or quasi‑static to simplify consistency verification.

7. Incremental Sync Testing

Before full deployment, run incremental sync on high‑change tables using a single‑process approach to surface configuration issues.

8. Data Consistency Verification

Validate consistency by:

Comparing static or quasi‑static snapshots between source and target.

Using built‑in MD5 functions to hash rows and compare hashes.
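The hash comparison can also be done on the client side when the two databases' MD5 functions or text formatting do not line up. The following sketch hashes each row in Python and reports differences by key. It rests on assumptions: rows are fetched as tuples with the key in the first position, and the NULL marker and field separator are arbitrary choices made here for illustration.

```python
import hashlib

NULL_MARKER = "\\N"   # assumed marker so that NULL and '' hash differently
FIELD_SEP = "\x1f"    # unit separator, unlikely to appear in real data


def row_md5(row) -> str:
    """MD5 of a row after rendering each field as text."""
    parts = [NULL_MARKER if v is None else str(v) for v in row]
    return hashlib.md5(FIELD_SEP.join(parts).encode("utf-8")).hexdigest()


def diff_by_key(source_rows, target_rows, key_index: int = 0) -> dict:
    """Compare two row sets by key and report missing, extra, and changed keys."""
    src = {r[key_index]: row_md5(r) for r in source_rows}
    tgt = {r[key_index]: row_md5(r) for r in target_rows}
    return {
        "missing_in_target": sorted(src.keys() - tgt.keys()),
        "extra_in_target": sorted(tgt.keys() - src.keys()),
        "changed": sorted(k for k in src.keys() & tgt.keys() if src[k] != tgt[k]),
    }


if __name__ == "__main__":
    source = [(1, "a", None), (2, "b", "x"), (3, "c", "y")]
    target = [(1, "a", None), (2, "b", "z"), (4, "d", "w")]
    print(diff_by_key(source, target))
```

Values must be rendered identically on both sides before hashing (number formatting, trailing spaces, date formats); otherwise equal data produces different hashes.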

9. Software Limitations and Stress Testing

Identify the tool’s limits (e.g., supported data volume, feature combinations) and conduct stress tests such as:

Large‑transaction tests: Bulk operations to increase log volume and observe impact on sync latency and resource usage.

Long‑transaction tests: Verify that open transactions before incremental sync are handled correctly.

Frequent‑transaction tests: Detect performance degradation caused by many short transactions (e.g., excessive WITH AS usage).

Transaction‑order tests: Ensure the sync preserves the order of updates to avoid stale data overwriting newer data.

Batch DDL tests: Check how large batches of DDL statements affect the source’s parsing speed and sync stability.

Process‑restart tests: Observe behavior on normal and abnormal restarts, ensuring custom parameters persist.
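Of the tests above, the transaction‑order test lends itself to an automated check. As a sketch, assume the test harness can record each change applied on the target as a (key, scn) pair in apply order, where scn is the source system change number; this event shape and the function name find_order_violations are hypothetical. A violation is any change applied after a newer change to the same key.

```python
def find_order_violations(applied_events) -> list:
    """Find changes applied after a newer change to the same key.

    applied_events: iterable of (key, scn) in the order applied on the target.
    Returns a list of (position, key, scn, newest_scn_seen_before).
    """
    latest = {}
    violations = []
    for position, (key, scn) in enumerate(applied_events):
        if key in latest and scn < latest[key]:
            # An older change arrived after a newer one: stale overwrite risk.
            violations.append((position, key, scn, latest[key]))
        else:
            latest[key] = scn
    return violations


if __name__ == "__main__":
    events = [("A", 100), ("B", 101), ("A", 105), ("A", 103), ("B", 110)]
    print(find_order_violations(events))  # [(3, 'A', 103, 105)]
```

An empty result for a high‑change table is evidence, not proof, that update order is preserved; the test is most useful when run under parallel apply settings.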

By systematically addressing these areas, practitioners can reduce the risk of data loss, inconsistency, and performance bottlenecks during heterogeneous database synchronization projects.
