Python XML Parsing with xml.etree.ElementTree and xml.dom.minidom
This tutorial shows how to parse, query, modify, and delete XML data in Python with the built‑in xml.etree.ElementTree and xml.dom.minidom modules. Step‑by‑step examples cover reading files, parsing strings, accessing elements and attributes, and writing updated XML back to disk.
Python provides two built‑in modules for working with XML: xml.etree.ElementTree (a lightweight tree API) and xml.dom.minidom (a minimal DOM implementation). Both can read XML from files or strings, navigate the hierarchical structure, and modify the document.
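As a rough illustration of how the two APIs differ, the sketch below parses the same string with each module and reads the same element. The one-line document is a hypothetical, cut-down version of the tutorial's sample data, and the variable names are chosen only for this example.

```python
import xml.etree.ElementTree as ET
from xml.dom import minidom

# Hypothetical one-item document, modelled on the tutorial's sample data.
XML_TEXT = '<metadata><food><item name="breakfast">Idly</item></food></metadata>'

# ElementTree: each element exposes .tag, .attrib, and .text directly.
root = ET.fromstring(XML_TEXT)
et_item = root.find('food/item')
print(et_item.get('name'), et_item.text)

# minidom: a DOM Document; element text lives in a child text node.
doc = minidom.parseString(XML_TEXT)
dom_item = doc.getElementsByTagName('item')[0]
print(dom_item.getAttribute('name'), dom_item.firstChild.data)
```

Both print statements show the same attribute and text; only the way of reaching them differs.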
What is XML?
XML (eXtensible Markup Language) is a markup language for representing structured data. It looks similar to HTML, but it is designed to store and exchange data between systems (for example, between a client and a server) rather than to display it.
The example XML file sample.xml is used throughout the tutorial:
<?xml version="1.0" encoding="UTF-8"?>
<metadata>
    <food>
        <item name="breakfast">Idly</item>
        <price>$2.5</price>
        <description>Two idly's with chutney</description>
        <calories>553</calories>
    </food>
    ... (other food items) ...
</metadata>
Using xml.etree.ElementTree
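The file-based snippets in this tutorial read sample.xml from the current directory. A minimal sketch for creating that file is shown here, assuming only the single food entry listed in the tutorial (the elided entries are left out). Writing the file this way is an assumption made for convenience, not a step from the original tutorial.

```python
from pathlib import Path
import xml.etree.ElementTree as ET

# Only the one <food> entry shown in the tutorial; the elided items are omitted.
SAMPLE_XML = """<?xml version="1.0" encoding="UTF-8"?>
<metadata>
    <food>
        <item name="breakfast">Idly</item>
        <price>$2.5</price>
        <description>Two idly's with chutney</description>
        <calories>553</calories>
    </food>
</metadata>
"""

# Write the file as UTF-8 so it matches the encoding declaration.
sample_path = Path('sample.xml')
sample_path.write_text(SAMPLE_XML, encoding='utf-8')

# Sanity check: the file parses and has the expected root and first child.
root = ET.parse(str(sample_path)).getroot()
print(root.tag, root[0].tag)  # metadata food
```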
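Parse a file with parse():
import xml.etree.ElementTree as ET
# parse() reads the file and returns an ElementTree object
mytree = ET.parse('sample.xml')
# getroot() returns the top-level <metadata> element
myroot = mytree.getroot()
Parse a string with fromstring():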
import xml.etree.ElementTree as ET
data = '''<?xml version="1.0" encoding="UTF-8"?>
<metadata>
<food>
<item name="breakfast">Idly</item>
<price>$2.5</price>
<description>Two idly's with chutney</description>
<calories>553</calories>
</food>
</metadata>'''
myroot = ET.fromstring(data)
print(myroot.tag)
Access root element, child tags, and text:
print(myroot)  # <Element 'metadata' at 0x...>
print(myroot[0].tag)  # food
for x in myroot[0]:
    print(x.tag, x.attrib)  # item {'name': 'breakfast'} etc.
for x in myroot[0]:
    print(x.text)  # Idly, $2.5, Two idly's with chutney, 553
Find specific elements and attributes:
for x in myroot.findall('food'):
    item = x.find('item').text
    price = x.find('price').text
    print(item, price)
Modify XML – add, update, or delete nodes:
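# These snippets assume mytree and myroot come from ET.parse('sample.xml'),
# so that mytree.write() saves the changes made through myroot.
# Append text to each description and mark it with a new attribute
for description in myroot.iter('description'):
    new_desc = description.text + ' will be served'
    description.text = new_desc
    description.set('updated', 'yes')
mytree.write('new.xml')
# Add a new sub‑element to the first food element and set its text
ET.SubElement(myroot[0], 'speciality')
for x in myroot.iter('speciality'):
    x.text = 'South Indian Special'
mytree.write('output5.xml')
# Delete the name attribute of the first item
myroot[0][0].attrib.pop('name', None)
mytree.write('output5.xml')
# Remove a child element (here, the first child of the first food element)
myroot[0].remove(myroot[0][0])
mytree.write('output6.xml')
# Clear the first food element: removes its children, text, and attributes
myroot[0].clear()
mytree.write('output7.xml')
Using xml.dom.minidom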
Parse a file with parse():
from xml.dom import minidom
p1 = minidom.parse('sample.xml')
print(p1)
Parse a string with parseString():
p3 = minidom.parseString('<myxml>Using<empty/> parseString</myxml>')
print(p3)
Access elements by tag name:
dat = minidom.parse('sample.xml')
item_node = dat.getElementsByTagName('item')[0]
print(item_node)  # <DOM Element: item at ...>
print(item_node.attributes['name'].value)  # breakfast
print(item_node.firstChild.data)  # Idly
Iterate over all items and count them:
items = dat.getElementsByTagName('item')
for x in items:
    print(x.firstChild.data)
print('Total items:', len(items))
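One caveat with the loop over items: firstChild is None for an element with no content, so x.firstChild.data would raise AttributeError on an empty item. The sketch below shows one defensive way to collect names and text. The two-item document and the item_text helper are hypothetical, introduced only for this illustration.

```python
from xml.dom import minidom, Node

# Hypothetical two-item document; the second <item> is deliberately empty.
XML_TEXT = (
    '<metadata>'
    '<food><item name="breakfast">Idly</item></food>'
    '<food><item name="lunch"/></food>'
    '</metadata>'
)

def item_text(node):
    """Return the data of the node's first child if it is a text node, else ''."""
    child = node.firstChild
    if child is not None and child.nodeType == Node.TEXT_NODE:
        return child.data
    return ''

doc = minidom.parseString(XML_TEXT)
items = doc.getElementsByTagName('item')

# Pair each item's name attribute with its (possibly empty) text.
rows = [(node.getAttribute('name'), item_text(node)) for node in items]
print(rows)
print('Total items:', len(items))
```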