
Mastering Inceptor Data Auditing: Table-Level Error Logging and Permissions

This article explains how Inceptor’s data auditing feature works, showing how to enable or disable table‑level error logging, interpret and query the Log Error Table, and manage its access permissions to ensure accurate query results despite dirty data.

StarRing Big Data Open Lab

Review

Dirty data can affect query execution and accuracy, so analysts must clean data before loading it into a data warehouse. When cleaning is incomplete, data auditing can prevent dirty data from being accessed, improving query reliability.

In the previous article we introduced the concept of data auditing, how Inceptor uses a Log Error Table, and how to enable auditing. This continuation dives into table‑level attribute control, interpreting the Log Error Table, and permission management.

Controlling Data Auditing Attributes for a Specific Table

Inceptor lets users maintain data auditing attributes per table, including turning auditing on or off and modifying its settings.

Disable data auditing for a table

The following statement stops auditing for table_name and stops recording dirty-data information for that table only:

ALTER TABLE table_name SET ERRORS LOG OFF;

Enable data auditing for a table

To re-activate auditing, the OVERWRITE and REJECT options must both be set, where n is an integer:

ALTER TABLE table_name
SET ERRORS LOG ON OVERWRITE [on|off]
REJECT [on|off] LIMIT n ROWS;

This statement can also be used to modify the OVERWRITE and REJECT settings.
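As an illustration of the two statement forms above, here is a minimal Python sketch that only assembles the statement text. The helper names (errors_log_off, errors_log_on) and the strict table-name check are assumptions made for this example and are not part of Inceptor. The sketch does not connect to a cluster.

```python
import re

# Conservative identifier check: an assumption of this sketch, not an Inceptor rule.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def _check_table(table: str) -> str:
    """Reject anything that is not a plain, unqualified table name."""
    if not _IDENTIFIER.match(table):
        raise ValueError(f"unexpected table name: {table!r}")
    return table


def _on_off(flag: bool) -> str:
    return "on" if flag else "off"


def errors_log_off(table: str) -> str:
    """Statement that disables auditing for one table."""
    return f"ALTER TABLE {_check_table(table)} SET ERRORS LOG OFF;"


def errors_log_on(table: str, overwrite: bool, reject: bool, limit: int) -> str:
    """Statement that enables auditing or changes OVERWRITE / REJECT / LIMIT."""
    if isinstance(limit, bool) or not isinstance(limit, int):
        raise ValueError("LIMIT n ROWS expects an integer n")
    return (
        f"ALTER TABLE {_check_table(table)} "
        f"SET ERRORS LOG ON OVERWRITE {_on_off(overwrite)} "
        f"REJECT {_on_off(reject)} LIMIT {limit} ROWS;"
    )
```

Calling errors_log_on("employee", True, True, 2) yields the statement text used for the employee table in Example 2. Both OVERWRITE and REJECT are required arguments here because the syntax requires them whenever auditing is switched on.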

Example 1: Disable auditing for employee using SET ERRORS LOG OFF

After ALTER TABLE employee SET ERRORS LOG OFF; is run, SELECT * FROM employee returns all rows, including dirty data.

COUNT(age) now returns the total number of rows.

The Log Error Table shows no recorded dirty‑data entries.

Example 2: Re‑enable auditing for employee with OVERWRITE on, REJECT on, LIMIT 2 ROWS

After enabling, SELECT * FROM employee excludes dirty data from the result set. COUNT(age) now returns the count of valid rows only.

The employee_error_table records three error entries.

Log Error Table Details

When using data auditing, three aspects of the Log Error Table are often of interest:

Checking whether a Log Error Table is assigned

Run DESC FORMATTED table_name. If the output contains a line like:

ErrorTableSetting{errorTableName='employee_error_table', rejectEnable=true, overwriteOn=false, rowCount=2}

the table has an associated Log Error Table.
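When this check is scripted, the ErrorTableSetting line can be pulled out of the DESC FORMATTED output and turned into a dictionary. The sketch below assumes the line keeps the key=value layout shown above. The function name parse_error_table_setting is hypothetical, and how the output text is obtained from Inceptor is left out.

```python
import re

# Matches the ErrorTableSetting{...} fragment shown in the DESC FORMATTED output.
_SETTING = re.compile(r"ErrorTableSetting\{(?P<body>[^}]*)\}")


def parse_error_table_setting(desc_output: str):
    """Return the settings as a dict, or None if no Log Error Table is assigned."""
    match = _SETTING.search(desc_output)
    if match is None:
        return None
    setting = {}
    for part in match.group("body").split(","):
        if not part.strip():
            continue
        key, _, value = part.strip().partition("=")
        value = value.strip()
        if len(value) >= 2 and value.startswith("'") and value.endswith("'"):
            setting[key] = value[1:-1]          # quoted string
        elif value in ("true", "false"):
            setting[key] = value == "true"      # boolean flag
        elif value.lstrip("-").isdigit():
            setting[key] = int(value)           # integer such as rowCount
        else:
            setting[key] = value                # keep anything else as text
    return setting
```

A return value of None means the output contains no ErrorTableSetting line, which by the rule above indicates that the table has no associated Log Error Table.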

Structure of the Log Error Table

The system automatically creates the table with the following definition:

CREATE TABLE error_table_name (
  sql__time string,
  issue__sql string,
  real__table__name string,
  error__file__name string,
  error__block__offset bigint,
  error__msg string,
  raw__data string
);

Field meanings:

sql__time: execution timestamp of the erroneous statement.
issue__sql: the SQL that caused the error.
real__table__name: the table accessed when dirty data was read.
error__file__name: file containing the dirty data.
error__block__offset: block offset of the dirty data within the file.
error__msg: error message.
raw__data: original erroneous record.

Alternatively, you can run DESC error_table_name; to view the structure.

Viewing Log Error Table contents

The Log Error Table can be queried like any regular table, e.g., SELECT * FROM error_table_name or SELECT count(*) FROM error_table_name to get the number of dirty‑data rows.

Example 3: Query Log Error Table

The query shows the contents of employee_error_table. Because the output can be long, it can be saved to a file (e.g., /tmp/error) for inspection.
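To try such queries without a cluster, the sketch below builds an in-memory SQLite table with the same column list and runs a row count and a per-file breakdown against it. SQLite is only a local stand-in for Inceptor here. The sample rows, file paths, and error messages are invented for illustration, and the function names are hypothetical.

```python
import re
import sqlite3

# Column list copied from the Log Error Table definition in this article.
ERROR_TABLE_DDL = """
CREATE TABLE {name} (
  sql__time string,
  issue__sql string,
  real__table__name string,
  error__file__name string,
  error__block__offset bigint,
  error__msg string,
  raw__data string
)
"""

_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def _check(name: str) -> str:
    if not _IDENTIFIER.match(name):
        raise ValueError(f"unexpected table name: {name!r}")
    return name


def make_demo_error_table(name: str = "employee_error_table") -> sqlite3.Connection:
    """Create a local stand-in table filled with INVENTED sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(ERROR_TABLE_DDL.format(name=_check(name)))
    rows = [  # hypothetical values, not real Inceptor output
        ("2024-01-01 10:00:00", "SELECT * FROM employee", "employee",
         "/demo/employee/part-0", 0, "demo: bad age value", "3,Carol,abc"),
        ("2024-01-01 10:00:00", "SELECT * FROM employee", "employee",
         "/demo/employee/part-0", 64, "demo: bad age value", "7,Dave,n/a"),
        ("2024-01-01 10:00:00", "SELECT * FROM employee", "employee",
         "/demo/employee/part-1", 0, "demo: missing column", "9,Erin"),
    ]
    conn.executemany(f"INSERT INTO {name} VALUES (?, ?, ?, ?, ?, ?, ?)", rows)
    return conn


def count_dirty_rows(conn: sqlite3.Connection, name: str) -> int:
    """Equivalent of SELECT count(*) FROM error_table_name."""
    return conn.execute(f"SELECT count(*) FROM {_check(name)}").fetchone()[0]


def dirty_rows_per_file(conn: sqlite3.Connection, name: str):
    """Number of logged records per source file, ordered by file name."""
    sql = (
        f"SELECT error__file__name, count(*) FROM {_check(name)} "
        "GROUP BY error__file__name ORDER BY error__file__name"
    )
    return conn.execute(sql).fetchall()
```

Grouping by error__file__name is one way to see which source files hold most of the dirty data. The same SELECT text should carry over to a real Log Error Table, although that has not been verified against Inceptor here.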

Log Error Table Permission Control

Inceptor defines the following authorization rules for the Log Error Table:

If SELECT permission on the external table is granted to user B, user B can read the external table and write to its Log Error Table, but cannot read the Log Error Table unless SELECT on the Log Error Table is also granted.

If no permission is granted on the external table, user B has no permissions on the Log Error Table.
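The two rules can be summarized as a small decision function. This is a model of the rules as stated above, not Inceptor's actual authorization code, and the names are hypothetical. One case is not covered by the text: SELECT granted on the Log Error Table alone, without any grant on the external table. The sketch assumes the second rule still applies in that case.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogErrorAccess:
    read_source: bool   # may read the external table
    write_log: bool     # may write to its Log Error Table
    read_log: bool      # may read its Log Error Table


def access_for(select_on_source: bool, select_on_log: bool) -> LogErrorAccess:
    """Model the access user B ends up with under the two rules."""
    if not select_on_source:
        # Rule 2: no grant on the external table, so nothing on the Log Error Table.
        # ASSUMPTION: this also holds when select_on_log is True; the text
        # does not describe that combination.
        return LogErrorAccess(read_source=False, write_log=False, read_log=False)
    # Rule 1: reading the external table implies writing to the Log Error Table;
    # reading the Log Error Table needs its own SELECT grant.
    return LogErrorAccess(read_source=True, write_log=True, read_log=select_on_log)
```

For example, access_for(True, False) describes a user who can query the external table and have dirty records logged but cannot inspect the log afterwards.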

Summary

Through this two‑part series we introduced Inceptor’s data auditing feature, explained why it is needed, and demonstrated how to control its usage at the table level. Data auditing not only improves result accuracy by filtering out dirty data but also ensures smooth query execution, reducing the effort required for manual data cleaning and boosting analysis efficiency.
