
Understanding Python’s `is` Operator, re.sub quirks, and list pitfalls

This article explains why Python’s `is` operator returns `True` or `False` when comparing integers, and how integer caching and code-block scope affect object identity. It also uncovers a subtle misuse of `re.sub`’s `count` parameter, clarifies that `lstrip()` strips individual characters rather than a prefix, and demonstrates the pitfall of creating nested lists with the `[[]] * n` syntax.

Raymond Ops

1. Which comparisons are True and which are False?

Three groups of code are entered at the interactive prompt to see the result of `a is b` for different integer values.

<code># First group
a = 256
b = 256
a is b

# Second group
a = 257
b = 257
a is b

# Third group
a = 257; b = 257
a is b
</code>

The first and third groups evaluate to `True`, while the second evaluates to `False`. Checking with `id()` shows that for 256 both names refer to the same cached object (the ids are identical), whereas 257 entered on separate lines is not cached, so two distinct objects are created.

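CPython, the reference interpreter, caches small integers in the range [-5, 256], so every occurrence of such a value refers to the same preallocated object. Values outside this range are created as new objects, unless equal literals appear in the same code block: the compiler then stores them as a single constant, and every name assigned from it shares one object. Both behaviors are implementation details, so compare numbers with `==`, not `is`.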
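The cache boundary can be observed without any compile-time constant sharing by building the integers at run time with `int()`. This is a minimal sketch that assumes CPython; other interpreters such as PyPy may report identity differently, and the variable names are illustrative only.

<code># Sketch assuming CPython: int() parses the strings at run time,
# so the compiler's constant sharing plays no part here.
small_a = int("256")
small_b = int("256")
big_a = int("257")
big_b = int("257")

print(small_a is small_b)  # True: 256 comes from the small-integer cache
print(big_a is big_b)      # False: two separately allocated int objects
print(big_a == big_b)      # True: == compares values, which is usually what you want
</code>

Only the `==` result is guaranteed by the language; the two `is` results describe CPython's behavior.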

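<code># Demonstration: run as a script, not line by line at the interactive prompt
a = 257
b = 257            # same code block as a, so both names share one constant
def func():
    c = 257        # the function body is compiled as its own code block
    print(a is c)  # False on older CPython; newer versions may merge equal constants module-wide and print True
print(a is b)      # True
func()
</code>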

At the interactive prompt, each line is compiled as a separate code block, so the two `257` literals in the second group become distinct objects. The third group puts both assignments on one line, which forms a single code block, so the object is reused.
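The prompt's one-block-per-entry behavior can be imitated in a script by compiling each "entry" separately with `compile()` and running it with `exec()`. This sketch assumes CPython; the `entry_1`, `entry_2`, and `entry_3` filenames are arbitrary labels, and the dictionaries simply stand in for the prompt's global namespace.

<code># Sketch assuming CPython: each compile() call yields one code block,
# much like one entry typed at the interactive prompt.
same_block = {}
exec(compile("a = 257; b = 257", "entry_1", "exec"), same_block)
print(same_block["a"] is same_block["b"])  # True: both names load one shared constant

separate_blocks = {}
exec(compile("a = 257", "entry_2", "exec"), separate_blocks)
exec(compile("b = 257", "entry_3", "exec"), separate_blocks)
print(separate_blocks["a"] is separate_blocks["b"])  # usually False: two code blocks, two constants
print(separate_blocks["a"] == separate_blocks["b"])  # True: the values are equal either way
</code>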

2. Using re.sub() correctly

The following function is intended to remove all HTML tags from a string using a regular expression, but it contains a subtle bug.

<code>import re

def remove_tag(html):
    # Bug: the fourth positional argument of re.sub() is count, not flags
    text = re.sub(r'<.*?>', '', html, re.S)
    return text
</code>

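Because `re.S` is passed as the fourth positional argument, and the signature is `re.sub(pattern, repl, string, count=0, flags=0)`, it is interpreted as the `count` parameter rather than as a flag. `re.S` has the integer value 16, so only the first 16 matches are replaced, leaving the last two tags of the sample page (`</body></html>`) untouched.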

<code>import re

print(int(re.S))  # 16: re.S is an alias of re.DOTALL
</code>
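A corrected version passes the flag by keyword, or precompiles the pattern, so the flag can no longer be mistaken for `count`. This is a sketch rather than a robust HTML stripper: a regular expression handles simple markup but is no substitute for a real HTML parser. The `page` string is a made-up sample in which the buggy call leaves the last two tags behind. As of Python 3.13, passing `count` or `flags` positionally to `re.sub()` is also deprecated and emits a `DeprecationWarning`.

<code>import re

# re.S (DOTALL) lets "." match newlines, so tags split across lines are removed too.
TAG_RE = re.compile(r'<.*?>', re.S)

def remove_tag(html):
    """Remove every <...> tag from html (simple regex sketch, not a full HTML parser)."""
    return TAG_RE.sub('', html)

def remove_tag_inline(html):
    # Same result without precompiling: pass the flag by keyword, not positionally.
    return re.sub(r'<.*?>', '', html, flags=re.S)

# Hypothetical sample page with 18 tags: <html>, <body>, 7 <p>...</p> pairs, </body>, </html>.
page = "<html><body>" + "".join(f"<p>item {i}</p>" for i in range(7)) + "</body></html>"

print(remove_tag(page))  # item 0item 1item 2item 3item 4item 5item 6

# What the buggy call effectively did: re.S (16) was used as count=16.
buggy = re.sub(r'<.*?>', '', page, count=int(re.S))
print(buggy.endswith('</body></html>'))  # True: the last two tags survive
</code>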

3. How lstrip() works

`lstrip()` removes characters from the left side of a string. Its argument is treated as a set of characters, not as a prefix: leading characters are stripped one at a time as long as each one is in the set, and stripping stops at the first character that is not.

<code>print("aabbcc".lstrip('aa'))      # bbcc
print("ababacac".lstrip('ab'))     # cac
</code>
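The character-set rule can be made concrete with a small re-implementation. `lstrip_chars` is a hypothetical helper written only for illustration; it mimics `str.lstrip(chars)` for a string argument and does not handle the `None` default, which strips whitespace.

<code>def lstrip_chars(s, chars):
    """Hypothetical helper mimicking str.lstrip(chars) for a string argument."""
    allowed = set(chars)  # 'aa' and 'a' describe the same set {'a'}
    i = 0
    # Advance past every leading character that belongs to the set.
    while i < len(s) and s[i] in allowed:
        i += 1
    return s[i:]

print(lstrip_chars("aabbcc", "aa"))    # bbcc
print(lstrip_chars("ababacac", "ab"))  # cac
</code>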

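To remove a specific prefix instead of any combination of its characters, use `str.removeprefix()` (Python 3.9+) or check `startswith()` and slice. `replace()` is not a substitute, because it also removes matches in the middle of the string.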
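The difference between these approaches shows up clearly on the second example string. `strip_prefix` is a hypothetical helper for interpreters older than Python 3.9, where `str.removeprefix()` is not available.

<code>def strip_prefix(text, prefix):
    # Hypothetical helper: remove prefix once, only if text starts with it.
    if text.startswith(prefix):
        return text[len(prefix):]
    return text

s = "ababacac"
print(s.lstrip("ab"))         # cac: strips any run of 'a' and 'b'
print(s.removeprefix("ab"))   # abacac: exact prefix, removed once (Python 3.9+)
print(strip_prefix(s, "ab"))  # abacac: same result without removeprefix()
print(s.replace("ab", ""))    # acac: also deletes the second "ab"
</code>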

4. Pitfalls of nested list creation

A list of three empty lists can be created with a list comprehension or by multiplying a one-element list, but multiplication copies the reference, not the inner list, so all three elements point to the same list object.

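<code># Option 1: list comprehension, creates three independent inner lists
li = [[] for _ in range(3)]

# Option 2: multiplication, repeats one reference to a single inner list
li = [[]] * 3
</code>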

Appending to one sub-list of the multiplied version appears to change all of them, because every element refers to the same object:

<code>li = [[]] * 3
li[0].append(1)
print(li)  # [[1], [1], [1]]
</code>
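Checking identity with `is`, the operator from section 1, confirms what happens: the multiplied list holds one inner list three times, while the comprehension builds a new list on each iteration. The names `shared` and `independent` are illustrative.

<code># Option 2: one inner list object, referenced three times
shared = [[]] * 3
print(shared[0] is shared[1])            # True

# Option 1: a fresh inner list per iteration
independent = [[] for _ in range(3)]
print(independent[0] is independent[1])  # False

independent[0].append(1)
print(independent)  # [[1], [], []]
</code>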