Understanding Python’s `is` Operator, re.sub quirks, and list pitfalls
This article explains why Python’s `is` operator returns True for some equal integers and False for others, how integer caching and code‑block scope affect object identity, how passing a flag in place of re.sub’s count parameter silently limits the number of replacements, how lstrip strips by character set rather than by prefix, and why creating nested lists with the `[[]] * n` syntax is a pitfall.
1. Which comparisons are True and which are False?
Three groups of code are examined to see the result of `a is b` for different integer values.
<code># First group
a = 256
b = 256
a is b
# Second group
a = 257
b = 257
a is b
# Third group
a = 257; b = 257
a is b
</code>Entered at the interactive prompt, the first and third groups evaluate to True, while the second evaluates to False. Using `id()` shows that both names refer to the same cached 256 object (the ids are identical), whereas the two 257 literals entered on separate lines produce distinct objects.
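CPython caches small integers in the range [-5, 256], so every occurrence of such a value refers to one shared object. Values outside this range are normally allocated anew each time they are created, unless equal literals appear in the same code block, where the compiler may reuse a single constant object.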
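The cache boundary can be probed at runtime. The following is a minimal sketch, assuming CPython (other implementations may behave differently); the helper name `shares_object` is hypothetical, and the values are built with `int(str(n))` so that the compiler cannot merge them as constants.

```python
def shares_object(n):
    """Return True if two separately created ints equal to n are one object."""
    a = int(str(n))  # built at runtime, so not a compile-time constant
    b = int(str(n))
    return a is b


if __name__ == "__main__":
    # Probe values just inside and just outside the documented range.
    for n in (-6, -5, 256, 257):
        print(n, shares_object(n))
```

Because the identity of equal integers is an implementation detail, value comparisons should use `==`; `is` is best reserved for singletons such as `None`.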
<code># Demonstration inside a function
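a = 257
b = 257

def func():
    c = 257
    print(a is c)  # False in the author's run; newer CPython versions may print True

print(a is b)  # True: both literals are in the same code block
func()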
</code>At the interactive prompt each input line is compiled as a separate code block, so the two 257 objects in the second group differ, while the third group places both assignments in one block and reuses the object.
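The effect of code blocks can be reproduced in a script with `exec()`, which compiles each source string as its own block. This is a sketch under the assumption of CPython; the helper names `same_block` and `separate_blocks` are hypothetical.

```python
def same_block(n):
    """Both literals are compiled together in one code block."""
    ns = {}
    exec(f"a = {n}\nb = {n}", ns)
    return ns["a"] is ns["b"]


def separate_blocks(n):
    """Each literal is compiled in its own code block."""
    ns = {}
    exec(f"a = {n}", ns)
    exec(f"b = {n}", ns)
    return ns["a"] is ns["b"]


if __name__ == "__main__":
    print(same_block(257), separate_blocks(257))
    print(same_block(256), separate_blocks(256))
```

For a cached value such as 256 both helpers are expected to report the same object, since the cache applies regardless of where the literal is compiled.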
2. Using re.sub() correctly
The following function is intended to remove all HTML tags from a string using a regular expression, but it passes `re.S` in the wrong position.
<code>import re
def remove_tag(html):
    # Bug: the fourth positional argument is count, so re.S is not used as a flag
    text = re.sub(r'<.*?>', '', html, re.S)
    return text
</code>The fourth positional parameter of `re.sub` is `count`, not `flags`. Passing `re.S` there sets `count` to 16 (the integer value of `re.S`), so only the first 16 matches are replaced; in the author's test document this left the last two tags (`</body></html>`) untouched.
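A corrected version passes the flag by keyword. The sketch below is illustrative only: `remove_tag_buggy` is a hypothetical name that reproduces the mistake explicitly as `count=16`, and the sample string is made up for the demonstration.

```python
import re


def remove_tag(html):
    """Remove every tag; flags=re.S lets '.' match newlines inside a tag."""
    return re.sub(r"<.*?>", "", html, flags=re.S)


def remove_tag_buggy(html):
    """Same effect as passing re.S positionally: count=16 and no flags."""
    return re.sub(r"<.*?>", "", html, count=int(re.S))


if __name__ == "__main__":
    sample = "<p>x</p>" * 10  # 20 tags in total
    print(remove_tag(sample))
    print(remove_tag_buggy(sample))
```

With 20 tags in the sample, the buggy variant stops after 16 replacements and leaves the last four tags in place.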
<code>print(int(re.S))  # 16 (on modern Python, print(re.S) shows the flag's name instead)
</code>
3. How lstrip() works
`lstrip()` removes characters from the left side of a string. Its argument is treated as a set of characters, not as a prefix: leading characters are stripped one by one until a character outside the set is encountered.
<code>print("aabbcc".lstrip('aa')) # bbcc
print("ababacac".lstrip('ab')) # cac
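</code>To remove a specific prefix rather than any combination of its characters, use `str.removeprefix()` (Python 3.9+) or a `startswith()` check followed by slicing. `replace()` is a weaker fit because it replaces the substring wherever it occurs, not only at the start.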
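The difference between character-set stripping and prefix removal can be sketched as follows. `strip_prefix` is a hypothetical helper for interpreters older than Python 3.9; on newer versions the built-in `str.removeprefix()` covers the same case.

```python
def strip_prefix(text, prefix):
    """Remove prefix once from the start of text, if it is present."""
    if prefix and text.startswith(prefix):
        return text[len(prefix):]
    return text


if __name__ == "__main__":
    s = "ababacac"
    print(s.lstrip("ab"))         # strips any leading 'a' or 'b' characters
    print(strip_prefix(s, "ab"))  # strips the literal prefix "ab" once
```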
4. Pitfalls of nested list creation
Creating a list of three empty lists can be done with a comprehension or by multiplication, but the latter creates references to the same inner list.
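The sharing can be confirmed with `is`, and the comprehension pattern extends naturally to fixed-size grids. This is a minimal sketch; `make_grid` is a hypothetical helper and not part of the original examples.

```python
def make_grid(rows, cols, fill=0):
    """Build a rows x cols grid whose rows are independent lists.

    [fill] * cols is safe here only when fill is immutable (e.g. an int).
    """
    return [[fill] * cols for _ in range(rows)]


if __name__ == "__main__":
    shared = [[]] * 3
    independent = [[] for _ in range(3)]
    print(shared[0] is shared[1])            # identity check on the multiplied list
    print(independent[0] is independent[1])  # identity check on the comprehension

    grid = make_grid(2, 3)
    grid[0][0] = 1
    print(grid)
```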
<code># Option 1
li = [[] for i in range(3)]
# Option 2
li = [[]] * 3
</code>Appending to one sub‑list of the multiplied version modifies all sub‑lists because they reference the same object:
<code>li = [[]] * 3
li[0].append(1)
print(li) # [[1], [1], [1]]
</code>
Raymond Ops