Automate Question Bank Extraction to Excel with Python & Pandas

This article demonstrates how to use Python and pandas to automatically read a text‑based question bank, parse questions and options, and export the data to an Excel file, providing complete code examples and a step‑by‑step explanation of the process.

Python Crawling & Data Mining

Mar 8, 2023

1. Introduction

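Hello, I am PiPi. A member of a Python community asked how to automate the processing of a question bank stored as a text file. The complete solution below reads the file with Python and exports it to Excel. It expects each line to hold a question followed by four options, with the fields separated by two spaces.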

import pandas as pd
import chardet

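# Read the question bank file: detect the encoding from the raw bytes first
with open('未命名.txt', 'rb') as f:
    # chardet reports None when it cannot guess; fall back to UTF-8 in that case
    encoding = chardet.detect(f.read())['encoding'] or 'utf-8'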

with open('未命名.txt', 'r', encoding=encoding) as f:
    lines = f.readlines()

# Separate questions and options into lists
questions = []
options = []
for line in lines:
    elements = line.strip().split('  ')
    if len(elements) == 5:
        q, a, b, c, d = elements
        questions.append(q)
        options.append([a, b, c, d])
    else:
        print(f'Error: invalid line: {line}')

# Store the data in an Excel file (题目 = question, 选项A-D = options A-D)
df = pd.DataFrame({
    '题目': questions,
    '选项A': [o[0] for o in options],
    '选项B': [o[1] for o in options],
    '选项C': [o[2] for o in options],
    '选项D': [o[3] for o in options]
})
df.to_excel('question_bank.xlsx', index=False)

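The script detects the file's encoding with chardet, reads the text, splits each line on two consecutive spaces into a question and four options, and writes the structured rows to question_bank.xlsx. Lines that do not yield exactly five fields are reported and skipped. Writing .xlsx files with pandas requires an Excel engine such as openpyxl to be installed. The original post includes a screenshot of the execution result.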
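The line-splitting step can be made a little more tolerant of uneven spacing. The sketch below is not part of the original solution: `parse_question_lines` and `COLUMNS` are hypothetical names, and it assumes that fields are separated by two or more whitespace characters while single spaces belong to the question text. It uses only the standard library and collects rejected lines with their line numbers instead of printing them.

```python
import re

# Column headers reused from the script above (题目 = question, 选项A-D = options A-D).
COLUMNS = ['题目', '选项A', '选项B', '选项C', '选项D']


def parse_question_lines(lines):
    """Return (rows, rejected) for an iterable of raw text lines.

    Assumption: fields are separated by two or more whitespace characters.
    """
    rows, rejected = [], []
    for number, raw in enumerate(lines, start=1):
        text = raw.strip()
        if not text:
            continue  # ignore blank lines instead of reporting them
        fields = re.split(r'\s{2,}', text)
        if len(fields) == 5:
            rows.append(dict(zip(COLUMNS, fields)))
        else:
            # keep the 1-based line number so the source file can be fixed
            rejected.append((number, text))
    return rows, rejected


sample = ['What is 1 + 1?  1  2  3  4\n', '\n', 'A line without options\n']
rows, rejected = parse_question_lines(sample)
```

The resulting `rows` list can be passed straight to `pd.DataFrame(rows)` and saved with `to_excel`, while `rejected` shows which lines of the source file need manual attention.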

2. Implementation Process

Another contributor provided a slightly different approach. After a minor modification, the code runs correctly and produces the expected Excel file.

import pandas as pd
import chardet

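# Read the question bank file: detect the encoding from the raw bytes first
with open('未命名.txt', 'rb') as f:
    # chardet reports None when it cannot guess; fall back to UTF-8 in that case
    encoding = chardet.detect(f.read())['encoding'] or 'utf-8'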

with open('未命名.txt', 'r', encoding=encoding) as f:
    lines = f.readlines()

# Group every five consecutive lines (one question + four options) into one row
# and store the data in an Excel file (题目 = question, 选项A-D = options A-D)
df = pd.DataFrame([lines[i: i + 5] for i in range(0, len(lines), 5)],
                  columns=['题目', '选项A', '选项B', '选项C', '选项D'])
df = df.apply(lambda x: x.str.strip())
df.to_excel('question_bank.xlsx', index=False)

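The revised script assumes a different layout from the first one: the question and each of its four options sit on their own lines. It groups every five consecutive lines into one DataFrame row, strips the surrounding whitespace, and saves the result. Because the grouping is purely positional, a blank line or a missing option shifts every row after it, so the input file must be clean. The original post includes a screenshot of the final output.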
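Positional grouping is easy to guard against the most common input problems. The following sketch is an illustration and not the contributor's code: `group_lines` is a hypothetical helper that assumes blank lines carry no meaning, drops them, and refuses to continue when the remaining lines do not divide evenly into groups of five.

```python
def group_lines(lines, size=5):
    """Group non-blank lines into lists of `size` items (question + options).

    Assumption: blank lines are padding and can be dropped safely.
    """
    cleaned = [line.strip() for line in lines if line.strip()]
    leftover = len(cleaned) % size
    if leftover:
        # An incomplete final group usually means a missing option somewhere.
        raise ValueError(
            f'{leftover} trailing line(s) do not form a complete group of {size}'
        )
    return [cleaned[i:i + size] for i in range(0, len(cleaned), size)]


sample = ['Question 1\n', 'A\n', 'B\n', '\n', 'C\n', 'D\n']
groups = group_lines(sample)
```

The returned list can be handed to `pd.DataFrame(groups, columns=['题目', '选项A', '选项B', '选项C', '选项D'])` in place of the list comprehension used in the script. Raising an error early is a design choice here: it avoids writing a spreadsheet whose rows are silently misaligned.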

3. Conclusion

This article presented a practical Python automation solution for processing a question bank, covering the problem analysis, complete code implementations, and the resulting Excel file. The approach helps users quickly transform unstructured text data into a structured spreadsheet for further analysis.
