How to Prevent Null, Type, and Charset Pitfalls in Oracle‑to‑ADB Data Sync
This article details the common pitfalls encountered when synchronizing Oracle databases to AnalyticDB PostgreSQL, covering null versus empty string handling, data type conversion challenges, character set issues, special character processing, and comprehensive testing strategies to ensure data consistency and performance.
1. Null and Empty String Handling
Oracle treats an empty string as NULL, so test for it with IS NULL; a comparison such as col = '' never evaluates to true. When syncing to another database, confirm how the target represents Oracle's NULL/empty values (PostgreSQL, for example, distinguishes '' from NULL) and apply the same rule in both the full‑load and incremental phases.
Additionally, fixed‑length CHAR columns are padded with spaces, so a value that looks empty is matched by neither char_col = '' nor char_col IS NULL. In Oracle, TRIM(char_col) IS NULL identifies values that consist only of padding, and a byte‑length check such as LENGTHB(char_col) > 0 (OCTET_LENGTH in PostgreSQL) confirms that the column is not truly empty. Left unhandled, these values cause mismatches after migration.
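The NULL, empty-string, and padding rules above can be sketched as a small normalization step applied to each value before it is written to the target. This is only an illustration: the function names and the policy (treat '' and all-space CHAR values as NULL, strip trailing padding) are assumptions, not something the article or any sync tool prescribes.

```python
from typing import Optional


def normalize_oracle_string(value: Optional[str], fixed_length: bool = False) -> Optional[str]:
    """Return the value the target should store for an Oracle string.

    Assumed policy (hypothetical): NULL and '' both become NULL, and a CHAR
    value that holds only padding spaces becomes NULL as well.
    """
    if value is None or value == "":
        # Oracle does not distinguish '' from NULL, so neither do we.
        return None
    if fixed_length:
        # Drop the trailing spaces that CHAR columns add as padding.
        stripped = value.rstrip(" ")
        return stripped if stripped else None
    return value


def is_blank_padded(value: Optional[str]) -> bool:
    """True for a non-NULL, non-empty value made up solely of spaces."""
    return value is not None and len(value) > 0 and value.strip(" ") == ""
```

Whatever policy is chosen, the important part is that the full-load and incremental paths call the same function, so both phases agree on what an "empty" value is.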
2. Data Type Conversion Issues
Cross‑database type conversion involves precision, efficiency, and compatibility. Examples:
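Oracle → PostgreSQL: Oracle columns are often declared as NUMBER without precision. Mapping them to NUMERIC preserves every value but is slower than a native integer type; BIGINT is often a better choice when the column holds only integers within its range, though this requires vendor confirmation.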
Oracle → DB2: Primary‑key columns cannot contain leading or trailing spaces; spaces cause duplicate‑key conflicts during sync.
Oracle → AnalyticDB (ADB): Because of the MPP architecture, the distribution key columns must be part of the primary key.
Oracle → HBase: HBase requires a primary key; using Oracle's ROWID as rowkey satisfies this requirement.
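For the Oracle → PostgreSQL case, the choice between NUMERIC and an integer type can be sketched as a mapping on the declared precision and scale. The function name and the digit thresholds below are assumptions chosen so that every value of the declared precision fits the target type (4 digits for SMALLINT, 9 for INTEGER, 18 for BIGINT); they are not taken from the article or from any particular migration tool.

```python
from typing import Optional


def map_oracle_number(precision: Optional[int], scale: Optional[int]) -> str:
    """Suggest a PostgreSQL type for an Oracle NUMBER(precision, scale) column."""
    if precision is None:
        # NUMBER without precision: the value range is unknown, so stay exact.
        return "NUMERIC"
    scale = scale or 0
    if scale > 0:
        # Fractional digits are declared, so an integer type would lose data.
        return f"NUMERIC({precision},{scale})"
    if scale < 0:
        # Negative scale (rounding to tens, hundreds, ...) is left unconstrained.
        return "NUMERIC"
    if precision <= 4:
        return "SMALLINT"   # at most 9999, below the SMALLINT limit of 32767
    if precision <= 9:
        return "INTEGER"    # at most 999999999, below 2147483647
    if precision <= 18:
        return "BIGINT"     # 18 digits always fit in a signed 64-bit integer
    return f"NUMERIC({precision},0)"
```

For example, map_oracle_number(18, 0) suggests BIGINT, while an unconstrained NUMBER stays NUMERIC until the actual data has been checked.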
3. Character Set Conversion Problems
When migrating between different character sets (e.g., BIG5 to UTF‑8), verify the hex representation of Chinese characters using Oracle's DUMP function or an equivalent in the target. Remember that the same character occupies a different number of bytes in each encoding (a Chinese character typically takes 2 bytes in GBK and 3 bytes in UTF‑8), so column widths defined in bytes must be adjusted accordingly.
Be aware of characters missing in the target set (e.g., BIG5 lacks the character “邨”) and custom‑defined character regions that may cause data loss.
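The byte-length and missing-character checks can be sketched outside the database with Python's built-in codecs. The helper names are hypothetical, and Python's codec tables may differ in detail from the character sets implemented by Oracle or the target database, so treat the result as a first screening rather than a final answer.

```python
def byte_lengths(text, encodings=("gbk", "utf-8")):
    """Return the encoded size of text, in bytes, for each encoding."""
    return {enc: len(text.encode(enc)) for enc in encodings}


def hex_dump(text, encoding):
    """Return the encoded bytes as a hex string, for comparison with a database dump."""
    return text.encode(encoding).hex()


def unencodable_chars(text, target_encoding):
    """List the characters of text that the target encoding cannot represent."""
    missing = []
    for ch in text:
        try:
            ch.encode(target_encoding)
        except UnicodeEncodeError:
            missing.append(ch)
    return missing


if __name__ == "__main__":
    print(byte_lengths("中文"))       # sizes in GBK and UTF-8
    print(hex_dump("中", "utf-8"))    # bytes to compare against the source's hex dump
    print(unencodable_chars("a中", "latin-1"))
```

Running unencodable_chars over a sample of source data with the target's encoding name gives an early list of characters that would be lost or replaced during conversion.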
4. Special Character Handling
Special characters such as single/double quotes, newlines, slashes, and backslashes can break full‑load processes. Recommended approaches include:
Using CSV format with proper escaping.
Employing multibyte delimiters.
Performing data cleansing before sync.
Syncing only “normal” data and handling “special” records separately.
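As an illustration of the first approach, CSV with proper escaping, the sketch below writes rows with Python's standard csv module and reads them back. Quoting every field and doubling embedded quotes is one reasonable convention, not the only one; the loader on the target side must be configured with the same quoting rules, which is an assumption here.

```python
import csv
import io


def rows_to_csv(rows):
    """Serialize rows to CSV text, quoting every field."""
    buffer = io.StringIO(newline="")
    # QUOTE_ALL wraps each field in double quotes; embedded quotes are doubled.
    writer = csv.writer(buffer, quoting=csv.QUOTE_ALL)
    writer.writerows(rows)
    return buffer.getvalue()


def csv_to_rows(text):
    """Parse CSV text back into a list of rows (lists of strings)."""
    # newline="" keeps line breaks inside quoted fields untouched.
    return list(csv.reader(io.StringIO(text, newline="")))


if __name__ == "__main__":
    sample = [["1", 'He said "hi"', "line1\nline2", "C:\\temp\\a/b", "it's"]]
    assert csv_to_rows(rows_to_csv(sample)) == sample
```

A round-trip check like the one in the usage section is a cheap way to confirm that quotes, newlines, slashes, and backslashes survive the export format before a full load is attempted.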
5. Abnormal Record Processing
Records that violate database rules (e.g., illegal dates such as 0000-00-00 00:00:00 or 2022-02-30 00:00:00, or NaN values) must be identified and corrected before they reach the target. For dates, adding 1 and then subtracting 1 (Oracle date arithmetic works in days) often normalizes the stored value; where that does not work, agree with the business owners on a value to reset them to.
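Identifying such records can be sketched as a validation pass over extracted values. The helpers below are hypothetical and use Python's own calendar rules, which reject year 0 and impossible days; the target database's accepted range may be wider or narrower, so this is a screening aid rather than a definition of validity.

```python
import math
from datetime import datetime

# Assumed text format of extracted timestamps; adjust to the actual export format.
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"


def is_valid_timestamp(text):
    """True if text parses as a real calendar date and time."""
    try:
        datetime.strptime(text, TIMESTAMP_FORMAT)
    except ValueError:
        return False
    return True


def is_nan(value):
    """True only for a float NaN; other types are never reported as NaN."""
    return isinstance(value, float) and math.isnan(value)


def find_abnormal(rows, ts_index, num_index):
    """Return (row_number, reason) pairs for rows with illegal dates or NaN."""
    problems = []
    for row_number, row in enumerate(rows):
        if not is_valid_timestamp(row[ts_index]):
            problems.append((row_number, "illegal timestamp"))
        if is_nan(row[num_index]):
            problems.append((row_number, "NaN value"))
    return problems
```

The resulting list can be handed to the business owners together with the proposed replacement values.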
6. Full‑Load Testing
Choose test tables that:
Include large tables to expose bottlenecks.
Cover all column types involved in the migration.
Contain multibyte data if the target handles such characters.
Are static or quasi‑static to simplify consistency verification.
7. Incremental Sync Testing
Before full deployment, run incremental sync on high‑change tables using a single‑process approach to surface configuration issues.
8. Data Consistency Verification
Validate consistency by:
Comparing static or quasi‑static snapshots between source and target.
Using built‑in MD5 functions to hash rows and compare hashes.
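The hash comparison can be sketched on the client side as follows. The article refers to the databases' built-in MD5 functions; this version instead hashes rows after they have been fetched, using Python's hashlib, purely to show the idea. The serialization rules (a unit-separator delimiter and a distinct marker for NULL) and the function names are assumptions.

```python
import hashlib


def row_md5(row):
    """Hash one row; NULL is encoded differently from '' and from the text 'None'."""
    parts = []
    for value in row:
        parts.append("\\N" if value is None else "v:" + str(value))
    # The unit separator is unlikely to occur in data, so fields cannot blur together.
    joined = "\x1f".join(parts)
    return hashlib.md5(joined.encode("utf-8")).hexdigest()


def diff_by_key(source_rows, target_rows, key_index=0):
    """Compare two row sets by primary key and report the differences."""
    src = {row[key_index]: row_md5(row) for row in source_rows}
    tgt = {row[key_index]: row_md5(row) for row in target_rows}
    return {
        "missing": sorted(k for k in src if k not in tgt),   # in source only
        "extra": sorted(k for k in tgt if k not in src),     # in target only
        "changed": sorted(k for k in src if k in tgt and src[k] != tgt[k]),
    }
```

One caveat with any hash-based comparison: both sides must render values identically (number formatting, trailing spaces, timestamp precision), otherwise equal data produces different hashes.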
9. Software Limitations and Stress Testing
Identify the tool’s limits (e.g., supported data volume, feature combinations) and conduct stress tests such as:
Large‑transaction tests: Bulk operations to increase log volume and observe impact on sync latency and resource usage.
Long‑transaction tests: Verify that transactions opened before incremental sync starts, and still uncommitted at that point, are handled correctly.
Frequent‑transaction tests: Detect performance degradation caused by many short transactions (e.g., excessive WITH AS usage).
Transaction‑order tests: Ensure the sync preserves the order of updates to avoid stale data overwriting newer data.
Batch DDL tests: Check how large batches of DDL statements affect the source’s parsing speed and sync stability.
Process‑restart tests: Observe behavior on normal and abnormal restarts, ensuring custom parameters persist.
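For the transaction-order test in particular, one way to check the result is to record the order in which the target applied changes and look for any key whose updates arrived out of sequence. The sketch below assumes each applied change can be logged as a (key, sequence) pair, where sequence is a hypothetical monotonically increasing commit number from the source (such as an SCN); how that log is obtained depends on the sync tool.

```python
def find_order_violations(applied_events):
    """Return (position, key, newest_seen, sequence) for out-of-order changes.

    applied_events is an iterable of (key, sequence) pairs listed in the
    order the target applied them.
    """
    newest = {}
    violations = []
    for position, (key, sequence) in enumerate(applied_events):
        previous = newest.get(key)
        if previous is not None and sequence < previous:
            # An older change was applied after a newer one for the same key.
            violations.append((position, key, previous, sequence))
        else:
            newest[key] = sequence
    return violations


if __name__ == "__main__":
    log = [("a", 1), ("b", 2), ("a", 3), ("a", 2), ("b", 5)]
    print(find_order_violations(log))
```

An empty result means that, per key, every applied change was at least as new as the one before it, which is the property that prevents stale data from overwriting newer data.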
By systematically addressing these areas, practitioners can reduce the risk of data loss, inconsistency, and performance bottlenecks during heterogeneous database synchronization projects.
dbaplus Community
Enterprise-level professional community for Database, BigData, and AIOps. Daily original articles, weekly online tech talks, monthly offline salons, and quarterly XCOPS&DAMS conferences—delivered by industry experts.