
Docker Container Fails to Start? Common Causes and Troubleshooting Commands

This guide walks operators through a systematic, step‑by‑step process for diagnosing Docker container startup failures, covering status checks, log inspection, detailed use of docker inspect, and categorized troubleshooting of image, configuration, resource, permission, network, and volume issues with concrete commands and examples.

MaGe Linux Operations

Background

Docker containers often fail to start because of image problems, misconfiguration, resource limits, network connectivity, permission restrictions, failing health checks, or dependent services that are not ready. Because docker ps lists only running containers by default, a container that exits right after starting simply disappears from the output; add the -a flag to see it.

Troubleshooting workflow

1. Confirm container status – docker ps -a | grep <container-name>

2. View logs – docker logs <container-id> and docker logs -f <container-id>

3. Inspect details – docker inspect <container-id>

4. Identify root cause – image, configuration, resource, permission, network, dependency, or health-check problems.

5. Fix and verify – re-run or start the container.

Step 1 – Confirm container status

# List all containers, including exited ones
docker ps -a

# Show a specific container’s brief status
docker ps -a | grep <container-name>

# Formatted output for clarity
docker ps -a --format "table {{.ID}}\t{{.Names}}\t{{.Status}}\t{{.Image}}"

Typical Status values:

Exited (1) … – the main process exited with code 1.

Created – created but never started.

Up … – running.

Restarting (1) … – crashing repeatedly under a restart policy; the number is the most recent exit code.
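When many containers need triage at once, the Status column can be classified programmatically. The following is a minimal sketch, assuming the status strings follow the shapes listed above; parse_status is a hypothetical helper name, not part of Docker.

```python
import re


def parse_status(status: str) -> dict:
    """Classify one value from the STATUS column of `docker ps -a`."""
    text = status.strip()
    # "Exited (1) 5 minutes ago" -> exited, code 1
    exited = re.match(r"Exited \((\d+)\)", text)
    if exited:
        return {"state": "exited", "exit_code": int(exited.group(1))}
    # "Restarting (1) 3 seconds ago" -> restarting, last code 1
    restarting = re.match(r"Restarting \((\d+)\)", text)
    if restarting:
        return {"state": "restarting", "exit_code": int(restarting.group(1))}
    if text.startswith("Up"):
        return {"state": "running", "exit_code": None}
    if text.startswith("Created"):
        return {"state": "created", "exit_code": None}
    return {"state": "unknown", "exit_code": None}
```

Feeding it the output of docker ps -a --format '{{.Status}}' line by line gives a quick list of containers that exited with a non-zero code.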

Step 2 – View container logs

# Show stdout + stderr
docker logs <container-id>

# Follow live output
docker logs -f <container-id>

# Show last 100 lines
docker logs --tail 100 <container-id>

# Show timestamps
docker logs --timestamps <container-id>

# Show logs since a specific time
docker logs --since "2024-01-15T10:00:00" <container-id>

docker logs --since 30m <container-id>

# Filter error lines
docker logs <container-id> 2>&1 | grep -i error

Reasons docker logs may be empty:

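The entrypoint redirects output to a file (e.g. CMD python app.py > /var/log/app.log). The redirect only takes effect in shell form; in exec form, "> /var/log/app.log" is passed to the program as a literal argument.

The container uses a log driver other than json-file, such as syslog, fluentd, or awslogs. (Docker Engine 20.10 and later keep a local cache so that docker logs still works with these drivers, unless that cache is disabled.)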

Log files have been rotated or removed.

# Check log driver
docker inspect <container-id> --format='{{.HostConfig.LogConfig.Type}}'

# If json-file, view the raw log file (needs root; LogPath holds the full path)
sudo tail -n 100 "$(docker inspect --format='{{.LogPath}}' <container-id>)"
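The raw json-file log stores one JSON object per line with log, stream, and time keys, so stderr can be separated from stdout after the fact. A small sketch, assuming that layout; stderr_lines is a hypothetical helper and the sample data in the usage note is invented.

```python
import json


def stderr_lines(raw_log: str) -> list:
    """Return 'time message' strings for stderr entries in a json-file log."""
    result = []
    for line in raw_log.splitlines():
        if not line.strip():
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            # Skip lines that were truncated or only partially written.
            continue
        if entry.get("stream") == "stderr":
            message = entry.get("log", "").rstrip()
            result.append(f'{entry.get("time", "")} {message}')
    return result
```

Reading the file named by LogPath and passing its text to stderr_lines leaves only the error stream, which is often where a startup failure is reported.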

Step 3 – Inspect container details

# Full JSON output
docker inspect <container-id>

# Extract State as JSON
docker inspect <container-id> --format='{{json .State}}' | jq .

# Extract Config as JSON
docker inspect <container-id> --format='{{json .Config}}' | jq .

# Extract HostConfig as JSON
docker inspect <container-id> --format='{{json .HostConfig}}' | jq .

# Common fields
docker inspect <container-id> --format='
State: {{.State.Status}}
ExitCode: {{.State.ExitCode}}
OOMKilled: {{.State.OOMKilled}}
Error: {{.State.Error}}
StartedAt: {{.State.StartedAt}}
FinishedAt: {{.State.FinishedAt}}
Path: {{.Path}}
Args: {{.Args}}
WorkingDir: {{.Config.WorkingDir}}
Cmd: {{.Config.Cmd}}
Entrypoint: {{.Config.Entrypoint}}
Env: {{range .Config.Env}}{{.}} {{end}}'
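The same fields can be pulled out of the full JSON in a script. This is a sketch under the assumption that the input is the text printed by docker inspect <container-id> (a JSON array with one object); summarize_state is a hypothetical name.

```python
import json


def summarize_state(inspect_output: str) -> dict:
    """Pick the startup-relevant fields out of `docker inspect` JSON."""
    data = json.loads(inspect_output)
    # docker inspect prints a list, one element per inspected object.
    container = data[0] if isinstance(data, list) else data
    state = container.get("State", {})
    config = container.get("Config", {})
    return {
        "status": state.get("Status"),
        "exit_code": state.get("ExitCode"),
        "oom_killed": state.get("OOMKilled", False),
        "error": state.get("Error", ""),
        # Path + Args is the command line Docker actually executed.
        "command": [container.get("Path", "")] + list(container.get("Args") or []),
        "workdir": config.get("WorkingDir", ""),
    }
```

A non-empty error field usually means Docker could not start the process at all, while an empty error with a non-zero exit_code points at the application itself.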

Common exit codes and typical causes:

0 – normal exit; process completed.

1 – general error; application configuration or parameter error.

125 – docker run itself failed; e.g., an invalid flag or a daemon error before the container's command ran.

126 – command cannot be invoked; the CMD/ENTRYPOINT file exists but is not executable.

127 – command not found; PATH problem or missing binary.

137 – SIGKILL (128 + 9); often an out-of-memory kill, but also docker kill or a docker stop that hit its timeout.

139 – SIGSEGV; segmentation fault.

143 – SIGTERM; graceful stop (e.g., docker stop).
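Exit codes above 128 in this list follow the shell convention of 128 plus the signal number. A minimal sketch of that arithmetic, using Python's standard signal module; signal_from_exit_code is a hypothetical helper, and signal names are those of the platform the script runs on.

```python
import signal


def signal_from_exit_code(code: int):
    """Return the signal name for a 128+N exit code, or None otherwise."""
    number = code - 128
    if number <= 0:
        return None
    try:
        return signal.Signals(number).name
    except ValueError:
        # Not a signal number known on this platform.
        return None
```

For example, 137 - 128 = 9 (SIGKILL) and 143 - 128 = 15 (SIGTERM). An application can also choose such a code deliberately, so treat the result as a hint rather than proof.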

Category‑specific checks

Image issues

Typical symptoms:

Error: image nginx:1.24 not found
Layer already exists
docker: Error response from daemon: manifest for … not found

Diagnostic commands:

# List local images
docker images

# Inspect image metadata
docker inspect nginx:1.24

# Pull the image
docker pull nginx:1.24

# Verify tag existence
docker manifest inspect nginx:1.24

# Remove dangling images
docker image prune -f

Common scenarios:

Tag is latest but the image was not pushed or the tag was not updated.

Registry address typo.

Image was deleted or overwritten; local digest is stale.

Cross‑architecture pull (e.g., ARM host pulling x86 image).

# Check image architecture
docker inspect <image> | grep Architecture

# Show digests
docker images --digests
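The Architecture value reported for an image can be compared with the host. The sketch below assumes the usual mapping between machine names and Docker architecture names (x86_64 to amd64, aarch64 to arm64); both function names are hypothetical, and the mapping table is not exhaustive.

```python
import platform

# Kernel machine names on the left, Docker/OCI architecture names on the right.
_MACHINE_TO_DOCKER_ARCH = {
    "x86_64": "amd64",
    "amd64": "amd64",
    "aarch64": "arm64",
    "arm64": "arm64",
    "armv7l": "arm",
    "i386": "386",
    "i686": "386",
}


def docker_arch(machine=None) -> str:
    """Translate a machine name (default: this host) to Docker's arch name."""
    name = (machine or platform.machine()).lower()
    return _MACHINE_TO_DOCKER_ARCH.get(name, name)


def arch_matches(image_arch: str, machine=None) -> bool:
    """True when the image architecture equals the host architecture."""
    return image_arch.lower() == docker_arch(machine)
```

A mismatch is not always fatal, because hosts with emulation configured can run foreign images, but without it the typical symptom is an "exec format error" at startup.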

Fixes:

# Pull latest image
docker pull <image>:<tag>

# For private registries
docker login registry.example.com
docker pull registry.example.com/my-app:v1.2.3

# Roll back to a known digest
docker images --digests | grep <image>
docker run --rm <image>@sha256:xxxx

Configuration issues

Port conflict

docker: Error response from daemon: Ports are not available: bind address port already in use.

Check which process holds the port:

# Linux
ss -tlnp | grep :80
netstat -tlnp | grep :80

# macOS
lsof -i :80

# Ports published by Docker (one docker-proxy process per published port when the userland proxy is enabled)
ps aux | grep docker-proxy

Fixes:

# Use a different host port
docker run -p 8080:80 nginx

# Stop the occupying service
systemctl stop nginx
kill $(lsof -t -i:80)

# Check other containers using the same port
docker ps --format "{{.Names}} {{.Ports}}"
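Before choosing a host port, a script can test whether it is free by attempting to bind it. This is a sketch using only the standard socket module; port_is_free is a hypothetical helper, and it checks TCP over IPv4 only.

```python
import socket


def port_is_free(port: int, host: str = "0.0.0.0") -> bool:
    """Return True if a TCP socket can currently be bound to host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except PermissionError:
            # Ports below 1024 usually need root; do not report them as busy.
            raise
        except OSError:
            return False
    return True
```

The check is inherently racy, since another process may take the port between the test and docker run, so it is a pre-flight hint rather than a guarantee.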

Missing or wrong environment variables

FATAL: Required environment variable DATABASE_URL is not set

# Inspect env vars
docker inspect <container-id> --format='{{range .Config.Env}}{{.}} {{end}}'

docker inspect <container-id> | jq '.Config.Env'

Fixes:

# Pass env var at run time
docker run -e "DATABASE_URL=postgres://user:pass@host:5432/db" my-app

# Docker‑compose variant
docker-compose run -e "DATABASE_URL=…" app

# Verify .env file
cat .env
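The Config.Env list printed by docker inspect holds KEY=value strings, which makes it easy to check for required variables before digging further. A sketch, with missing_env_vars as a hypothetical helper and REDIS_URL as an invented example name.

```python
def missing_env_vars(config_env, required):
    """Return the required variable names absent from a Config.Env list."""
    # Each entry looks like "KEY=value"; split only on the first "=".
    present = {entry.split("=", 1)[0] for entry in config_env}
    return sorted(name for name in required if name not in present)
```

For instance, passing the parsed output of docker inspect <container-id> | jq '.Config.Env' together with ["DATABASE_URL", "REDIS_URL"] returns whichever of the two was never set on the container.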

Incorrect startup command

# Symptoms: exit code 126/127, "command not found", or immediate container exit.
# Inspect CMD and ENTRYPOINT
docker inspect <container-id> --format='{{.Config.Cmd}}'

docker inspect <container-id> --format='{{.Config.Entrypoint}}'

Test the command manually:

# Run command inside image
docker run --rm <image> <cmd> <args>

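# Open a shell (--entrypoint bypasses an ENTRYPOINT that would otherwise receive "sh" as an argument)
docker run --rm -it --entrypoint sh <image>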

Common mistakes:

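ENTRYPOINT and CMD roles swapped: ENTRYPOINT should name the executable, and CMD supplies its default arguments.

Using shell form (e.g., CMD python app.py), which wraps the process in /bin/sh -c, so the application is not PID 1 and may never receive the SIGTERM sent by docker stop.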

Path errors (e.g., CMD ["app.py"] when WORKDIR is not correct).
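Docker builds the final command line by appending CMD to ENTRYPOINT, and shell-form instructions show up in docker inspect wrapped in /bin/sh -c. The sketch below reproduces both rules on the values of Config.Entrypoint and Config.Cmd; the function names are hypothetical.

```python
def effective_command(entrypoint, cmd):
    """Concatenate ENTRYPOINT and CMD the way Docker does at start time."""
    return list(entrypoint or []) + list(cmd or [])


def is_shell_form(command):
    """True when the instruction was written in shell form (wrapped in sh -c)."""
    return list(command or [])[:2] == ["/bin/sh", "-c"]
```

If effective_command yields something like ["app.py"] with no interpreter in front, or is_shell_form is true for a service that must handle SIGTERM, the Dockerfile instructions are the likely culprit.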

Resource issues

Out‑of‑Memory (OOM)

# Symptoms: no logs, OOMKilled: true, exit code 137.
# Check OOM flag
docker inspect <container-id> | grep OOMKilled

# Host memory
free -h

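# cgroup limits (cgroup v1 paths)
cat /sys/fs/cgroup/memory/docker/<container-id>/memory.limit_in_bytes
cat /sys/fs/cgroup/memory/docker/<container-id>/memory.usage_in_bytes
# On cgroup v2 hosts the files are memory.max and memory.current; their directory depends on the cgroup driver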

# Kernel OOM messages
dmesg | grep -i oom
journalctl | grep -i oom
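The raw numbers from the cgroup files are easier to judge as a percentage. A sketch, assuming the limit file contains either a byte count, the word max (cgroup v2, no limit), or the very large sentinel that cgroup v1 commonly reports for "no limit" on systems with 4 KiB pages; all names here are hypothetical.

```python
# Value cgroup v1 commonly reports when no memory limit is set (assumption:
# 4 KiB pages; anything at or above it is treated as unlimited).
CGROUP_V1_UNLIMITED = 9223372036854771712


def parse_limit(raw: str):
    """Return the limit in bytes, or None when the container is unlimited."""
    text = raw.strip()
    if text == "max":
        return None
    value = int(text)
    return None if value >= CGROUP_V1_UNLIMITED else value


def memory_usage_percent(limit_raw: str, usage_raw: str):
    """Return usage as a percentage of the limit, or None if unlimited."""
    limit = parse_limit(limit_raw)
    if limit is None:
        return None
    return round(100 * int(usage_raw.strip()) / limit, 1)
```

A container that sits close to 100 percent shortly before it disappears is a strong candidate for an OOM kill, even when the application log shows nothing.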

Fixes:

# Increase memory limit
docker run --memory=1g my-app

# Remove the limit by omitting --memory (not recommended for production)
docker run my-app

# Investigate application memory leaks
docker stats --no-stream

Disk‑space exhaustion

# Symptoms: "no space left on device", "disk quota exceeded".
# Disk usage
df -h
df -h /var/lib/docker

# Docker storage usage
docker system df

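# Largest containers (size of each writable layer)
docker ps -as --format '{{.Size}}\t{{.Names}}' | sort -rh | head

# Clean up (destructive: -a also removes every unused image and stopped container)
docker system prune -af
docker builder prune -af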

Fixes:

# Remove unused resources (destructive: --volumes also deletes unused volumes and their data)
docker system prune -a --volumes

# Truncate container log file
truncate -s 0 /var/lib/docker/containers/<container-id>/*-json.log

# Configure log rotation (/etc/docker/daemon.json; restart the daemon afterwards, applies only to newly created containers)
{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "100m",
    "max-file": "3"
  }
}
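With rotation enabled, the worst-case log footprint per container is roughly max-size multiplied by max-file. The sketch below computes that from a daemon.json string, assuming size suffixes k, m, and g are binary multiples (an assumption about how Docker interprets them); the function names are hypothetical.

```python
import json
import re

# Assumption: suffixes are binary multiples (k = 1024 bytes, and so on).
_UNITS = {"": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def parse_size(text: str) -> int:
    """Convert a size such as '100m' into bytes."""
    match = re.fullmatch(r"(\d+)([kmg]?)b?", text.strip().lower())
    if not match:
        raise ValueError(f"unrecognized size: {text!r}")
    return int(match.group(1)) * _UNITS[match.group(2)]


def max_log_bytes_per_container(daemon_json: str):
    """Return the rotation cap in bytes, or None when max-size is unset."""
    opts = json.loads(daemon_json).get("log-opts", {})
    if "max-size" not in opts:
        return None  # no size cap configured, logs can grow without bound
    return parse_size(opts["max-size"]) * int(opts.get("max-file", "1"))
```

For the configuration shown above this gives 3 files of 100 MiB each, so about 300 MiB per container; multiply by the number of containers to size /var/lib/docker.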

Permission issues

SELinux / AppArmor

# SELinux status
getenforce

# SELinux denial logs
ausearch -m AVC -ts recent

# AppArmor status
aa-status

# Reload AppArmor profiles after editing them
apparmor_parser -r /etc/apparmor.d/*

# Privileged mode flag
docker inspect <container-id> | grep Privileged

Fixes:

# Temporarily disable SELinux (not for prod)
setenforce 0

# Run privileged (not recommended)
docker run --privileged my-app

# Proper SELinux label
docker run -v /data:/data:Z my-app

# Disable AppArmor profile for testing
docker run --security-opt apparmor=unconfined my-app

Filesystem permissions

# ReadonlyRootfs flag
docker inspect <container-id> | grep ReadonlyRootfs

# Volume mount permissions
ls -la /var/lib/docker/volumes/<volume-name>/_data

# Container user
docker inspect <container-id> | grep -E "User|WorkingDir"

Fixes:

# Set correct workdir and ownership in Dockerfile
WORKDIR /app
RUN chown -R appuser:appuser /app

# Run as specific user
docker run -u appuser my-app

# Last resort, for debugging only: root plus --privileged removes most container isolation
docker run -u root --privileged my-app

Network issues

Dependent service not ready

# Test connectivity from container (docker exec only works while the container is running)
docker exec <app-container-id> nc -zv db-host 5432
docker exec <app-container-id> curl -v http://api-host:8080/health

# DNS resolution
docker exec <app-container-id> nslookup db-host
docker exec <app-container-id> cat /etc/resolv.conf

# Inspect network settings
docker inspect <container-id> | grep -A 10 "Networks"

Solutions:

# docker‑compose depends_on (order only)
services:
  app:
    image: my-app
    depends_on:
      - db
      - redis
  db:
    image: postgres:15
  redis:
    image: redis:alpine

# healthcheck + depends_on condition
services:
  db:
    image: postgres:15
    healthcheck:
      test: ["CMD-SHELL","pg_isready -U postgres"]
      interval: 5s
      timeout: 3s
      retries: 5
  app:
    image: my-app
    depends_on:
      db:
        condition: service_healthy

# Python example of retry logic
import os
import time

import psycopg2

def connect_with_retry(max_retries=10, delay=5):
    """Retry the connection until the database accepts it or attempts run out."""
    for attempt in range(1, max_retries + 1):
        try:
            return psycopg2.connect(os.environ['DATABASE_URL'])
        except psycopg2.OperationalError as e:
            print(f"Attempt {attempt} failed: {e}")
            if attempt < max_retries:
                # Do not sleep after the final failed attempt.
                time.sleep(delay)
    raise RuntimeError("Could not connect to database after retries")
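The same retry idea works for any TCP dependency, not only PostgreSQL. Below is a sketch using only the standard library, with exponential backoff instead of a fixed delay; wait_for_port and its parameters are hypothetical, and a successful connect only shows that the port accepts connections, not that the service is fully ready.

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0, initial_delay=0.2, max_delay=5.0):
    """Poll host:port until a TCP connect succeeds or the timeout expires."""
    deadline = time.monotonic() + timeout
    delay = initial_delay
    while True:
        try:
            with socket.create_connection((host, port), timeout=2.0):
                return True
        except OSError:
            # Give up if the next sleep would overshoot the deadline.
            if time.monotonic() + delay > deadline:
                return False
            time.sleep(delay)
            delay = min(delay * 2, max_delay)  # exponential backoff, capped
```

Calling wait_for_port("db-host", 5432) at the top of an entrypoint script covers the gap that plain depends_on leaves open, since it orders startup but does not wait for readiness.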

Health‑check failures

# Show health‑check config
docker inspect <container-id> | grep -A 10 "Health"

# Show health‑check state JSON
docker inspect --format='{{json .State.Health}}' <container-id> | jq .

# Show health-check log entries (exit code and output of each probe)
docker inspect --format='{{range .State.Health.Log}}{{.ExitCode}} {{.Output}}{{end}}' <container-id>
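The .State.Health JSON contains a Status, a FailingStreak counter, and a Log list whose entries carry ExitCode and Output. A sketch that extracts the most recent failed probe from that structure; last_failed_probe is a hypothetical helper, and the input is assumed to be the text printed by the --format='{{json .State.Health}}' command above.

```python
import json


def last_failed_probe(health_json: str):
    """Return details of the newest failed health probe, or None."""
    health = json.loads(health_json)
    if not health:
        return None  # "null" is printed when no health check is configured
    failures = [e for e in (health.get("Log") or []) if e.get("ExitCode", 0) != 0]
    if not failures:
        return None
    last = failures[-1]
    return {
        "status": health.get("Status"),
        "failing_streak": health.get("FailingStreak", 0),
        "exit_code": last["ExitCode"],
        "output": (last.get("Output") or "").strip(),
    }
```

The output field usually holds the probe command's own error message, which often distinguishes a missing client binary from an endpoint that is not yet listening.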

Typical fix:

# Example healthcheck in docker‑compose.yml
services:
  app:
    image: my-app
    healthcheck:
      test: ["CMD","curl","-f","http://localhost:8080/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 40s

Common pitfalls:

curl is not installed in the image (common in minimal images); use wget or another client that is available instead.

The health-check endpoint returns an error status (e.g., 401/403).

start_period is too short for slow-starting apps.

# Test healthcheck manually and print the exit code
docker exec <container-id> curl -f http://localhost:8080/health; echo "exit code: $?"

Volume issues

Bind‑mount source path does not exist

docker: Error response from daemon: invalid mount config for type "bind": bind source path does not exist: /data/logs.

# Verify host directory (--mount fails when the source is missing; -v creates the directory instead)
ls -la /data/logs
ls -la /data

# List Docker volumes
docker volume ls
docker volume inspect <volume-name>

Fixes:

# Create directory
mkdir -p /data/logs
chmod 755 /data/logs

# Use a named volume (auto‑creates)
docker volume create my-data
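Bind sources can also be verified in bulk from the Mounts list that docker inspect prints for a container. A sketch, assuming each entry has Type and Source keys and that the script runs on the Docker host itself; missing_bind_sources is a hypothetical helper.

```python
import os


def missing_bind_sources(mounts):
    """Return host paths of bind mounts that do not exist on this machine."""
    return [
        mount["Source"]
        for mount in mounts
        if mount.get("Type") == "bind" and not os.path.exists(mount.get("Source", ""))
    ]
```

Named volumes are skipped on purpose, because Docker creates them on demand and their Source lives under Docker's own data directory.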

Volume content overwritten

# Avoid mounting over important container directories
# Use an empty host directory or a named volume instead
services:
  app:
    volumes:
      - app_data:/var/lib/app

volumes:
  app_data:
    driver: local

Practice scenarios

Scenario 1 – MySQL container fails

# Symptom: container exits immediately, logs show password error
docker ps -a | grep mysql
docker logs <container-id>

# Common causes
# 1. Password contains special characters – env parsing issue
# 2. Init script failure
# 3. Data directory permission problem

# Fix
docker run -e MYSQL_ROOT_PASSWORD='MyPass@123!' mysql:8.0

# Verify
docker exec -it <container-id> mysql -uroot -p'MyPass@123!' -e "SELECT VERSION();"

Scenario 2 – Redis port binding failure

# Symptom: logs show "bind: Cannot assign requested address"
ss -tlnp | grep 6379

# Fix
docker run -p 6379:6379 redis:alpine

Scenario 3 – Nginx 502 Bad Gateway

# Check Nginx config inside container
docker exec <nginx-id> cat /etc/nginx/conf.d/default.conf

# Test connectivity to backend
docker exec <nginx-id> ping backend-app
docker exec <nginx-id> nc -zv backend-app 8080

# Verify network
docker network ls
docker network inspect bridge
docker network inspect <network-name>

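# Fix: place Nginx and backend on the same user-defined network; the backend's container name must match the upstream host in the Nginx config
docker network create my-net
docker run -d --network my-net --name backend-app my-app
docker run -d --network my-net --name nginx -p 80:80 nginx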

Scenario 4 – Java application container crash

# Symptom: NoClassDefFoundError, exit code 1
docker logs <container-id>

docker run --rm -it <image> java -version

# Possible causes
# 1. JRE not installed in image
# 2. Wrong classpath
# 3. Missing JAR dependencies

# Inspect image contents
docker run --rm -it <image> ls -la /app
docker run --rm -it <image> env | grep JAVA

# Fix Dockerfile example
FROM eclipse-temurin:17-jre-alpine
COPY app.jar /app/app.jar
WORKDIR /app
CMD ["java","-jar","app.jar"]

Quick‑reference command cheat sheet

# Container status
docker ps -a
docker ps -a | grep <name>

# Logs
docker logs <id>
docker logs -f <id>
docker logs --tail 100 <id>
docker logs --timestamps <id>

# Details
docker inspect <id>
docker inspect <id> --format='{{.State.Status}}'
docker inspect <id> | jq '.State'

# Images
docker images
docker pull <image>:<tag>
docker rmi <image-id>

# Networks
docker network ls
docker network inspect <network-name>

docker exec <id> cat /etc/resolv.conf

# Resource usage
docker stats --no-stream
docker system df

# Cleanup
docker system prune -af
docker image prune -af
docker builder prune -af
docker container prune

Summary

The most effective first step when a container fails to start is to examine docker logs and docker inspect. These commands reveal exit codes, OOM flags, configuration errors, and other clues that resolve most issues. If the logs are insufficient, run the image interactively (docker run --rm -it <image> sh) to reproduce the startup sequence step by step.
