Exploring Devika AI: An Open‑Source AI Programmer’s Capabilities and Limits

Devika AI, an open‑source AI programmer from Stition AI, is examined for its architecture, supported actions, installation steps, and real‑world performance across tasks such as building a Snake game, Conway’s Game of Life, Vue3 components, and unit‑test generation, highlighting strengths, weaknesses, and future potential.

MoonWebTeam

Apr 23, 2024

1. Introduction

As artificial intelligence advances rapidly, AI programmers such as Devin and AutoDev have become hot topics in software engineering. They promise large gains in development efficiency and automation, and hint at a change in how programmers work.

Devin, the first AI programmer released by Cognition Labs, can independently plan and complete software engineering tasks. It resolved 13.86% of SWE‑Bench problems without human assistance, surpassing previous models. However, Cognition Labs has not disclosed details of the underlying model, and the tool is not publicly available.

Fortunately, Stition AI from India has open‑sourced an alternative called Devika AI, which aims to match Devin’s SWE‑Bench score and eventually surpass it.

This article introduces Devika AI’s actual capabilities, architecture, and user experience.

2. Devika AI Overview

Devika AI, built by the Indian team Stition AI, is an AI software engineer capable of understanding complex human instructions and breaking them into executable steps. It generates code to achieve user goals with minimal manual guidance.

Devika’s basic workflow automatically decomposes a task, executes the resulting steps, and keeps converting the most recent user queries into actionable commands until the instruction is fulfilled.

Devika currently supports six action commands; aligning prompts with these commands yields better results.

- `answer` - Answer a question about the project.
- `run` - Run the project.
- `deploy` - Deploy the project.
- `feature` - Add a new feature to the project.
- `bug` - Fix a bug in the project.
- `report` - Generate a report on the project.
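Before sending a prompt, it can help to check whether it clearly signals one of these six commands. The sketch below is only an illustration: the six action names come from the list above, while the keyword hints and the `guess_action` helper are assumptions made for this example and are not Devika's actual routing logic.

```python
from typing import Optional

# Action names are taken from the list above. The keyword hints are
# illustrative assumptions, not Devika's real implementation.
ACTION_HINTS = {
    "answer": ("what", "why", "how", "explain"),
    "run": ("run", "start", "launch"),
    "deploy": ("deploy", "publish"),
    "feature": ("add", "implement", "support"),
    "bug": ("fix", "bug", "error", "crash"),
    "report": ("report", "summarize", "summary"),
}


def guess_action(prompt: str) -> Optional[str]:
    """Return the first action whose hint words appear in the prompt, else None."""
    words = [w.strip(".,!?") for w in prompt.lower().split()]
    for action, hints in ACTION_HINTS.items():
        if any(word in hints for word in words):
            return action
    return None


if __name__ == "__main__":
    print(guess_action("Add a pause button to the game"))
    print(guess_action("Fix the crash when the snake hits a wall"))
```

A prompt for which this kind of check returns `None` is a hint that the request could be reworded to start with a clearer verb such as "add", "fix", or "deploy".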

[Figure: Devika’s basic flow diagram]

The system architecture consists of the following key components:

User Interface: web‑based chat interface for interaction, file viewing, and agent status monitoring.

Agent Core: core component that orchestrates AI planning, reasoning, and execution.

Large Language Model: uses Claude, GPT‑3.5/4, or local LLMs served through Ollama for natural language understanding and generation.

Planning and Reasoning Engine: breaks high‑level goals into actionable steps and makes decisions based on the current environment.

Research Module: uses keyword extraction and web browsing to gather relevant information for the task.

Code Generation Module: generates code in multiple programming languages based on plans and research results.

Browser Interaction Module: navigates websites, extracts information, and interacts with web elements using Playwright.

Knowledge Base: stores and retrieves project‑specific information, code snippets, and learned knowledge.

Database: persists project data, agent state, and configuration settings.

[Figure: overall architecture diagram]

3. Devika AI Installation & Configuration

Installation follows the instructions in the official GitHub repository, stitionai/devika.

Key points to note:

Environment requirements

- Python >= 3.9 and < 3.12
- Node.js >= 18 (UI runtime)
- bun
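These requirements can be checked up front with a short script. This is a minimal sketch: the helper names are made up for the example, and it only checks that `node` and `bun` are on the PATH, not that Node.js is version 18 or newer.

```python
import shutil
import sys


def python_supported(version=sys.version_info):
    """True when the interpreter satisfies Python >= 3.9 and < 3.12."""
    return (3, 9) <= tuple(version[:2]) < (3, 12)


def missing_tools(tools=("node", "bun")):
    """Return the executables from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    print("Python version OK:", python_supported())
    print("Missing tools:", missing_tools() or "none")
```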

Model configuration – the recommended model is Google Gemini 1.0 Pro (free).

Network access – many LLM APIs restrict access by IP region, so a VPN or proxy may be required depending on where you run Devika. Example proxy settings:

export https_proxy=http://127.0.0.1:7890 http_proxy=http://127.0.0.1:7890 all_proxy=socks5://127.0.0.1:7890

4. Devika AI Usage Experience

4.1 Full video demo: creating a simple Snake game

Devika automatically decomposes the task, browses the web, and generates code. The initial code did not meet the requirements and needed manual bug fixing. Gemini handled Chinese better than GPT‑3.5, while GPT‑4 performed best.

import pygame
import random

pygame.init()
screen = pygame.display.set_mode((640, 480))
clock = pygame.time.Clock()

snake = [(320, 240), (320, 240), (320, 240)]
food = (random.randint(0, 59) * 10, random.randint(0, 47) * 10)
direction = pygame.K_RIGHT

running = True
while running:
    for event in pygame.event.get():
        if event.type == pygame.QUIT:
            running = False
        elif event.type == pygame.KEYDOWN:
            if event.key == pygame.K_UP and direction != pygame.K_DOWN:
                direction = pygame.K_UP
            elif event.key == pygame.K_DOWN and direction != pygame.K_UP:
                direction = pygame.K_DOWN
            elif event.key == pygame.K_LEFT and direction != pygame.K_RIGHT:
                direction = pygame.K_LEFT
            elif event.key == pygame.K_RIGHT and direction != pygame.K_LEFT:
                direction = pygame.K_RIGHT

    if snake[0] == food:
        food = (random.randint(0, 59) * 10, random.randint(0, 47) * 10)
        snake.append((0, 0))
    else:
        for i in range(len(snake) - 1, 0, -1):
            snake[i] = snake[i - 1]

    if direction == pygame.K_UP:
        snake[0] = (snake[0][0], snake[0][1] - 10)
    elif direction == pygame.K_DOWN:
        snake[0] = (snake[0][0], snake[0][1] + 10)
    elif direction == pygame.K_LEFT:
        snake[0] = (snake[0][0] - 10, snake[0][1])
    elif direction == pygame.K_RIGHT:
        snake[0] = (snake[0][0] + 10, snake[0][1])

    if snake[0][0] < 0 or snake[0][0] > 630 or snake[0][1] < 0 or snake[0][1] > 470:
        running = False
    for i in range(1, len(snake)):
        if snake[0] == snake[i]:
            running = False

    screen.fill((0, 0, 0))
    for part in snake:
        pygame.draw.rect(screen, (255, 255, 255), pygame.Rect(part[0], part[1], 10, 10))
    pygame.draw.rect(screen, (255, 0, 0), pygame.Rect(food[0], food[1], 10, 10))

    pygame.display.update()
    clock.tick(10)

pygame.quit()

The generated Snake game is simple; more complex tasks benefit from stronger models such as GPT‑4.
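One issue is visible in the listing above: when the snake eats food, the body is not shifted and a placeholder segment at (0, 0) is appended, so a stray block is drawn in the top‑left corner for one frame. A more conventional update prepends the new head and drops the tail only when the snake is not growing. The sketch below assumes the same head‑first list of (x, y) tuples; the `advance` helper is hypothetical and not part of Devika's output.

```python
def advance(snake, step, grow=False):
    """Return a new snake after moving one cell; the head is snake[0].

    `step` is an (dx, dy) offset such as (10, 0) for "right".
    When `grow` is True the tail is kept, so the snake gets one cell longer.
    """
    head_x, head_y = snake[0]
    new_head = (head_x + step[0], head_y + step[1])
    body = snake if grow else snake[:-1]
    return [new_head] + body


if __name__ == "__main__":
    snake = [(20, 0), (10, 0), (0, 0)]
    print(advance(snake, (10, 0)))
    print(advance(snake, (10, 0), grow=True))
```

Because the function returns a new list instead of mutating segments in place, the growth case needs no placeholder coordinates.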

4.2 Using Pygame to implement Conway’s Game of Life

Conway’s Game of Life is a classic cellular automaton with four simple rules that model birth, survival, and death.

When tested with a local Llama 2 model, Devika returned only a trivial `print("Example")`, because the prompt forced a specific JSON format.

After adjusting the prompt, Gemini 1.0 Pro produced functional code:

import numpy as np
import pygame

# Initialize Pygame
pygame.init()

# Game settings
SCREEN_WIDTH = 500
SCREEN_HEIGHT = 500
CELL_SIZE = 10
FPS = 30

# Colors
BLACK = (0, 0, 0)
WHITE = (255, 255, 255)

# Create the game screen
screen = pygame.display.set_mode((SCREEN_WIDTH, SCREEN_HEIGHT))
pygame.display.set_caption("Conway's Game of Life")

grid = np.zeros((SCREEN_WIDTH // CELL_SIZE, SCREEN_HEIGHT // CELL_SIZE))

running = True
while running:
    # Handle events
    for event in pygame.event.get():
        if event.type == pygame.QUIT:
            running = False

    # Update the game grid
    for i in range(grid.shape[0]):
        for j in range(grid.shape[1]):
            # Count the number of alive neighbors
            neighbors = 0
            for x in range(-1, 2):
                for y in range(-1, 2):
                    if x == 0 and y == 0:
                        continue
                    if i + x >= 0 and i + x < grid.shape[0] and j + y >= 0 and j + y < grid.shape[1] and grid[i + x, j + y] == 1:
                        neighbors += 1

            # Apply the game rules
            if grid[i, j] == 1:
                if neighbors < 2 or neighbors > 3:
                    grid[i, j] = 0
            else:
                if neighbors == 3:
                    grid[i, j] = 1

    # Draw the game grid
    screen.fill(BLACK)
    for i in range(grid.shape[0]):
        for j in range(grid.shape[1]):
            if grid[i, j] == 1:
                pygame.draw.rect(screen, WHITE, (i * CELL_SIZE, j * CELL_SIZE, CELL_SIZE, CELL_SIZE))

    # Update the display
    pygame.display.update()
    pygame.time.Clock().tick(FPS)

pygame.quit()

Running the code launched the terminal but the white cells did not appear; manual intervention was required to fix the issue.
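Two things in the listing above would explain the empty screen. The grid is created with all zeros and nothing ever sets a cell to 1, so no cell can reach three live neighbours. Separately, cells are updated in place, so later neighbour counts see values that were already changed in the same generation. The sketch below shows one possible fix in plain Python, without NumPy: seed the grid randomly and compute each generation into a fresh grid. The function names and the 25% default density are assumptions for illustration.

```python
import random


def seed(rows, cols, density=0.25, rng=random):
    """Create a rows x cols grid where each cell is alive with probability `density`."""
    return [[1 if rng.random() < density else 0 for _ in range(cols)]
            for _ in range(rows)]


def step(grid):
    """Return the next generation without modifying `grid`."""
    rows, cols = len(grid), len(grid[0])
    nxt = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            # Count live neighbours, ignoring cells outside the grid.
            neighbours = sum(
                grid[i + dx][j + dy]
                for dx in (-1, 0, 1)
                for dy in (-1, 0, 1)
                if (dx, dy) != (0, 0)
                and 0 <= i + dx < rows
                and 0 <= j + dy < cols
            )
            if grid[i][j] == 1:
                nxt[i][j] = 1 if neighbours in (2, 3) else 0
            else:
                nxt[i][j] = 1 if neighbours == 3 else 0
    return nxt
```

In the game loop, the grid would be created once with `seed(...)` and replaced with `grid = step(grid)` on every frame; the drawing code from the listing can stay as it is apart from switching the indexing to `grid[i][j]`.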

4.3 Clone a project and write a Vue3 virtual‑scroll list component

Prompt: "Clone *** project and install dependencies, then create a Vue3 virtual scroll list component under packages/***/components/gemini/"

Devika split the task correctly but at first did not place the component in the specified directory; after further prompting it succeeded.

4.4 Provide unit tests for local project code

Prompt: "Write Jest unit tests for all functions in /Users/xxx/xxx/game-private/packages/shared/utils/format.ts"

Devika could not read the local file and reported that the file does not exist, demonstrating limited ability to access local resources.
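One possible workaround, not verified against Devika, is to read the file yourself and paste its contents into the prompt instead of passing a path. The sketch below builds such a prompt; the `build_test_prompt` helper and the prompt wording are assumptions made for this example.

```python
from pathlib import Path


def build_test_prompt(source_path, framework="Jest"):
    """Build a prompt that embeds the file contents instead of only its path."""
    path = Path(source_path)
    if not path.is_file():
        # Fail early on our side rather than letting the agent report a missing file.
        raise FileNotFoundError(f"No such file: {path}")
    code = path.read_text(encoding="utf-8")
    return (
        f"Write {framework} unit tests for all functions in {path.name}.\n"
        f"The file contents are:\n\n{code}"
    )
```

The returned string can then be pasted into the chat interface as the task description.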

4.5 Verify Devika knowledge base

When re‑executing a previously performed task, Devika returned answers from its internal knowledge base instead of fetching fresh web data, showing that the knowledge base currently offers only simple tag‑matching functionality.
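To make the limitation concrete, the sketch below models a knowledge base that matches on exact tags only. It is a simplified stand‑in written for this article, not Devika's actual code; the class and method names are hypothetical.

```python
class TagKnowledgeBase:
    """Toy knowledge base that returns an entry only on an exact tag match."""

    def __init__(self):
        self._entries = {}

    def add(self, tag, contents):
        """Store `contents` under `tag`, replacing any earlier entry."""
        self._entries[tag] = contents

    def lookup(self, tag):
        """Return the stored contents for `tag`, or None when nothing matches."""
        return self._entries.get(tag)


if __name__ == "__main__":
    kb = TagKnowledgeBase()
    kb.add("snake game", "notes from the first run")
    print(kb.lookup("snake game"))            # exact tag: cached notes are reused
    print(kb.lookup("Snake game in Python"))  # reworded tag: no match
```

With exact matching, repeating the same task reuses the stored result, while a slightly reworded task finds nothing, which fits the limited practical value observed here.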

4.6 Summary of experience

Devika provides a planning and reasoning engine that decomposes high‑level goals into actionable steps.

The browser interaction module enables web navigation, data collection, and presents screenshots to the user.

Code generation quality is tightly coupled to the underlying LLM’s capabilities.

Self‑correction exists but is unstable; sometimes a service restart is needed.

Generated code is stored under data/projects/<project‑name>, configurable via settings.

Devika cannot autonomously read local files; manual prompts are required.

The built‑in knowledge base and database perform simple tag equality matching, offering limited practical value.

Logs help locate issues, yet LLM responses are often not actionable without human intervention.

Generated code is not automatically executed for self‑debugging; human intervention is required for error correction.

5. Conclusion

Although still in its early stages, Devika already shows the potential of AI programmers to automate repetitive tasks and improve productivity. With continuous improvement and community contributions, it may become a competitive open‑source alternative to Devin.

As AI programmers evolve, developers can expect higher efficiency, streamlined workflows, and easier handling of complex coding tasks, marking the beginning of a collaborative coding era between humans and AI.

References:

GitHub – stitionai/devika

AutoDev: Automated AI‑Driven Development (arXiv)
