Resolving OAT Precheck ulimit Errors by Enabling PAM in SSH Configuration
This article explains why OAT's precheck fails due to mismatched ulimit values when SSH does not load PAM limits, and provides a step‑by‑step solution to enable PAM in sshd_config so the expected limits are applied correctly.
1 Background
A customer encountered an error during OAT server initialization at the precheck step: the current session hard limit of open_files (ulimit -H -n) was 4096 instead of the expected 655350.
ERROR - check current session hard limit of open_files (ulimit -H -n): 4096 != 655350 ... EXPECT 655350 ... FAIL
The error indicates that OAT expects ulimit -H -n on the target server to return 655350, but it actually returned 4096. Other failures in the same precheck output were caused by the virtual machine's low disk and memory specifications and can be ignored for the purpose of this analysis.
2 Investigation Process
① Check the server's ulimit values
Log in to the target server via SSH and inspect the ulimit values.
Result: Not as expected
Switch to the admin user using su and check again.
Result: Matches the expected value
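The two checks above can be reproduced with commands along the following lines. This is only a sketch: report_limit is a hypothetical helper written for this article (it is not part of OAT), and admin and 655350 are the user and expected value from this case.

```shell
# Hypothetical helper (not part of OAT): print a limit and whether it matches the expectation.
report_limit() {
    # $1 = label, $2 = actual value, $3 = expected value
    if [ "$2" = "$3" ]; then
        echo "$1: $2 (OK)"
    else
        echo "$1: $2 (expected $3)"
    fi
}

# Hard open-files limit of the current session (the SSH login in this case).
report_limit "current session" "$(ulimit -H -n)" 655350

# Same limit in a fresh login session for admin. This needs root on the
# target server, so it is left commented out here.
# report_limit "su - admin" "$(su - admin -c 'ulimit -H -n')" 655350
```

If the two lines disagree, the limits are configured correctly but are not being applied to the SSH session.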
This discrepancy is unexpected. Before the precheck step, OAT had already adjusted the limits through the global configuration file oceanbase_limits.conf, which in theory should make the ulimit values consistent for both SSH and su sessions.
② Examine OAT's ulimit checking mechanism
OAT runs the script init_server_with_tag.py during the precheck.
[2024-09-27T16:34:52.775+0800] INFO - Running: ['airflow', 'tasks', 'run', 'init_server_with_tag', 'precheck', 'manual__2024-09-27T08:34:21.675314+00:00', '--job-id', '39950', '--raw', '--subdir', 'DAGS_FOLDER/init_server_with_tag.py', '--cfg-path', '/tmp/tmp5ni__moh']
Inside the container, the script ultimately calls precheck.sh to perform the ulimit check.
# script task_engine/dags/init_server_with_tag.py
def precheck():
    ctx = get_current_context()
    common.server_precheck(ctx, logger=logger)

# script task_engine/plugins/common.py
def server_precheck(ctx, logger):
    init_tag = ctx['params']['init_tag']
    role = _get_server_role(init_tag)
    envs = _get_custom_user_env(ctx['params'])
    with ServerRemoteExecute(server_id=ctx['params']['server_id']) as client:
        precheck_sh = SHELL_PATH / 'precheck.sh'
        ret_code, _ = client.execute_script(
            precheck_sh, args=('-m', role), control_master=False, logger=logger,
            env={'LC_ALL': 'en_US.UTF-8', 'OB_IP': client.server['ip'], **envs})
        if ret_code != 0:
            raise RuntimeError('server precheck failed, please see the summary info above for details')
# script task_engine/shells/precheck.sh
check_limit() {
    limit_type_list=(-H/hard -S/soft)
    for limit in "${EXPECT_LIMITS[@]}"; do
        limit_option=$(echo $limit | awk -F'/' '{print $1}')
        expect_limit=$(echo $limit | awk -F'/' '{print $2}')
        limit_description=$(echo $limit | awk -F'/' '{print $3}')
        limit_item=$(echo $limit | awk -F'/' '{print $4}')
        for limit_type in "${limit_type_list[@]}"; do
            limit_type_option=$(echo $limit_type | awk -F'/' '{print $1}')
            limit_type_description=$(echo $limit_type | awk -F'/' '{print $2}')
            get_limit_cmd="ulimit $limit_type_option $limit_option"
            # check new session
            current_limit=$(runuser - "$EXPECT_USER" -c "$get_limit_cmd")
            if ! compare_ulimit "$current_limit" "$expect_limit"; then
                echo_fail "check permanent $limit_type_description limit of $limit_description ($get_limit_cmd): $current_limit != $expect_limit ... EXPECT $expect_limit"
                echo_hint "modify /etc/security/limits.d/oceanbase_limits.conf\n echo \"* $limit_type_description $limit_item $expect_limit\" >> /etc/security/limits.d/oceanbase_limits.conf"
            else
                echo_pass "check $limit_type_description limit of new session $limit_description ($get_limit_cmd): $current_limit"
            fi
            # check current session
            current_limit=$($get_limit_cmd)
            if ! compare_ulimit "$current_limit" "$expect_limit"; then
                echo_fail "check current session $limit_type_description limit of $limit_description ($get_limit_cmd): $current_limit != $expect_limit ... EXPECT $expect_limit"
                echo_hint "excute: ulimit $limit_type_option $limit_option $expect_limit"
            else
                echo_pass "check $limit_type_description limit of $limit_description ($get_limit_cmd): $current_limit"
            fi
        done
    done
}
The code shows that OAT uses ServerRemoteExecute (SSH) to run precheck.sh on the target server. Its check_limit function compares each limit twice: once in a new session of the expected user (via runuser) and once in the current SSH session. The failure reported in the Background section comes from the current-session check.
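To make the entry format in check_limit concrete, the sketch below splits one entry into the same four fields that the awk calls extract. The sample entry is an assumption reconstructed from the error message; the real EXPECT_LIMITS array is defined elsewhere in precheck.sh and is not shown in this article. parse_limit is a hypothetical helper.

```shell
# Assumed sample entry: option/expected value/description/limits.conf item.
limit="-n/655350/open_files/nofile"

# Hypothetical helper: split an entry on "/" into its four fields.
parse_limit() {
    printf '%s\n' "$1" | {
        IFS='/' read -r option expect description item
        echo "option=$option expect=$expect description=$description item=$item"
    }
}

parse_limit "$limit"

# The commands check_limit would build for the hard and soft variants.
for limit_type in -H -S; do
    echo "ulimit $limit_type ${limit%%/*}"
done
```

Under this assumed entry, the hint printed for a failed permanent check would append a line such as "* hard nofile 655350" to /etc/security/limits.d/oceanbase_limits.conf.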
③ Why does the SSH session report an incorrect ulimit?
Running strace su - admin reveals that the pam_limits.so module reads its configuration from /etc/security/limits.conf and /etc/security/limits.d/oceanbase_limits.conf, which is why the expected ulimit values appear after switching to admin.
System logs (/var/log/secure) confirm that the su operation loads the PAM module, while the SSH operation does not.
Oct 14 17:44:44 10-186-58-85 su: pam_unix(su-l:session): session opened for user admin by root(uid=0)
Inspecting /etc/ssh/sshd_config shows that UsePAM is set to no, meaning the SSH daemon does not load PAM and therefore ignores the limits configuration files.
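A quick way to check this setting is sketched below. get_use_pam is a hypothetical helper that only reads the first active UsePAM line in a single file; it does not follow Include directives. When the directive is absent it reports no, which is the upstream OpenSSH default (distribution packages may ship a different value in their config).

```shell
# Hypothetical helper: print the first active UsePAM value in an sshd_config-style file.
# Commented-out lines such as "#UsePAM yes" are ignored; falls back to "no" if absent.
get_use_pam() {
    value=$(awk 'tolower($1) == "usepam" { print tolower($2); exit }' "$1")
    echo "${value:-no}"
}

if [ -r /etc/ssh/sshd_config ]; then
    get_use_pam /etc/ssh/sshd_config
fi

# As root, the running configuration can also be dumped by sshd itself:
# sshd -T | grep -i usepam
```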
3 Solution
Modify /etc/ssh/sshd_config to set UsePAM yes and restart the SSH service.
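The change can be scripted roughly as follows. This is a sketch under assumptions, not an official OAT procedure: enable_use_pam is a hypothetical helper, it only rewrites an existing uncommented UsePAM line (as in this case, where it was set to no), and the service name in the commented commands may differ by distribution.

```shell
# Hypothetical helper: switch an existing, uncommented UsePAM directive to "yes".
# A backup of the original file is kept next to it with a .bak suffix.
enable_use_pam() {
    conf="$1"
    if ! grep -Eiq '^[[:space:]]*UsePAM[[:space:]]' "$conf"; then
        echo "no active UsePAM directive in $conf; add 'UsePAM yes' manually" >&2
        return 1
    fi
    sed -i.bak -E 's/^[[:space:]]*[Uu][Ss][Ee][Pp][Aa][Mm][[:space:]].*$/UsePAM yes/' "$conf"
}

# On the target server, as root:
# enable_use_pam /etc/ssh/sshd_config
# sshd -t                    # validate the configuration before restarting
# systemctl restart sshd     # service name may differ by distribution
# Then, from a new SSH session:
# ulimit -H -n
```

Keep the existing SSH session open until a new login has been verified, so that a configuration mistake does not lock you out.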
After enabling PAM, SSH sessions load pam_unix(sshd:session) as shown in the logs, and the ulimit values match the expected configuration.
Oct 14 17:51:56 10-186-58-85 sshd[26147]: Accepted publickey for root from 10.186.58.85 port 19480 ssh2: RSA SHA256:+TtbeuvInWm90vrJG7cHHm2G2a2FULFE0Uq+imx2m30
Oct 14 17:51:56 10-186-58-85 sshd[26147]: pam_unix(sshd:session): session opened for user root by (uid=0)
4 Conclusion
OAT precheck fails because the target server's SSH daemon has PAM disabled, preventing it from reading the tuned ulimit configuration.
Enabling PAM in sshd_config resolves the discrepancy and allows OAT to verify the correct limits.
Aikesheng Open Source Community
The Aikesheng Open Source Community provides stable, enterprise‑grade MySQL open‑source tools and services, releases a premium open‑source component each year (1024), and continuously operates and maintains them.