Python-Based WeChat Friend Data Analysis: Gender, Location, Avatar, and Signature Insights
This tutorial shows how to leverage Python to retrieve and analyze WeChat friend data, focusing on dimensions such as gender, location, avatar, and signature. The results are visualized using charts and word clouds, with additional sentiment analysis and face detection.
Before diving into the analysis, the following third‑party modules are required:
itchat – a Python wrapper for the WeChat web interface, used to fetch friend information.
jieba – Chinese word segmentation library for processing textual data.
matplotlib – plotting library for creating bar charts, pie charts, etc.
snownlp – Chinese sentiment analysis library.
PIL (Pillow) – image processing library, used here to load the word‑cloud mask image.
numpy – numerical computing library, used together with the wordcloud module.
wordcloud – generates word‑cloud images.
TencentYoutuyun – SDK for face detection and image tag extraction.
All modules can be installed via pip. The analysis begins by logging into WeChat and retrieving the friend list:
<code>import itchat</code>
<code>itchat.auto_login(hotReload=True)</code>
<code>friends = itchat.get_friends(update=True)</code>
The returned friends object is a list of dictionaries; the first element represents the logged-in user, so the actual data starts from friends[1:]. Each friend dictionary contains fields such as Sex, City, Province, HeadImgUrl, and Signature, which are the focus of the subsequent analysis.
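To make that structure concrete, here is a hypothetical, minimal stand-in for what get_friends returns. The field names come from the article; every value is invented, and the Sex coding (0 = unknown, 1 = male, 2 = female) follows the label order used in the gender chart later:

```python
# Hypothetical mock of itchat's friends list; index 0 is the logged-in user.
# Sex is coded 0 = unknown, 1 = male, 2 = female.
friends = [
    {'NickName': 'me', 'Sex': 1, 'Province': 'Beijing', 'City': 'Beijing',
     'HeadImgUrl': 'https://example.com/me.jpg', 'Signature': ''},
    {'NickName': 'A', 'Sex': 2, 'Province': 'Zhejiang', 'City': 'Hangzhou',
     'HeadImgUrl': 'https://example.com/a.jpg', 'Signature': 'hello world'},
    {'NickName': 'B', 'Sex': 0, 'Province': '', 'City': '',
     'HeadImgUrl': 'https://example.com/b.jpg', 'Signature': ''},
]

# Every per-friend analysis below skips the account owner at index 0.
sexes = [f['Sex'] for f in friends[1:]]
print(sexes)  # [2, 0]
```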
Gender Analysis
Gender information is extracted from the Sex field, counted, and visualized as a pie chart. The implementation is:
<code>from collections import Counter</code>
<code>import matplotlib.pyplot as plt</code>
<code>def analyseSex(friends):</code>
<code>    sexs = list(map(lambda x: x['Sex'], friends[1:]))</code>
<code>    counts = [Counter(sexs)[sex] for sex in (0, 1, 2)]  # fixed order matching the labels below</code>
<code>    labels = ['Unknown', 'Male', 'Female']</code>
<code>    colors = ['red', 'yellowgreen', 'lightskyblue']</code>
<code>    plt.figure(figsize=(8, 5), dpi=80)</code>
<code>    plt.axes(aspect=1)</code>
<code>    plt.pie(counts, labels=labels, colors=colors, labeldistance=1.1, autopct='%.1f%%', shadow=False, startangle=90, pctdistance=0.6)</code>
<code>    plt.legend(loc='upper right')</code>
<code>    plt.title(f"{friends[0]['NickName']}的微信好友性别组成")  # "<NickName>'s WeChat friend gender composition"</code>
<code>    plt.show()</code>
Location Analysis
Location data is derived from the Province and City fields. The script writes these fields to a CSV file, which can later be imported into a mapping tool (e.g., BDP) for visualisation:
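As an aside, if you want quick numbers without an external mapping tool, the same Province field can be tallied in memory with collections.Counter. A minimal sketch with made-up records (real data comes from itchat.get_friends):

```python
from collections import Counter

# Made-up friend records; index 0 is the account owner and is skipped.
friends = [
    {'NickName': 'me', 'Province': 'Beijing'},
    {'NickName': 'A', 'Province': 'Zhejiang'},
    {'NickName': 'B', 'Province': 'Zhejiang'},
    {'NickName': 'C', 'Province': ''},
]

# Empty provinces (friends who hide their location) become 'Unknown'.
province_counts = Counter((f['Province'] or 'Unknown') for f in friends[1:])
print(province_counts.most_common())  # [('Zhejiang', 2), ('Unknown', 1)]
```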
<code>import csv</code>
<code>def analyseLocation(friends):</code>
<code>    headers = ['NickName', 'Province', 'City']</code>
<code>    with open('location.csv', 'w', encoding='utf-8', newline='') as csvFile:</code>
<code>        writer = csv.DictWriter(csvFile, headers)</code>
<code>        writer.writeheader()</code>
<code>        for friend in friends[1:]:</code>
<code>            row = {}</code>
<code>            row['NickName'] = friend['NickName']</code>
<code>            row['Province'] = friend['Province']</code>
<code>            row['City'] = friend['City']</code>
<code>            writer.writerow(row)</code>
Avatar (Head Image) Analysis
Avatars are downloaded, optionally processed with a face‑detection API, and the results are visualised as a pie chart indicating how many avatars contain recognizable faces. The code also builds a word cloud from the tags extracted by the face API:
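The tallying logic at the heart of this routine can be sketched with a stand-in detector. Note that detect_face below is a hypothetical placeholder, not the TencentYoutuyun API; it exists only so the counting can be demonstrated offline:

```python
def detect_face(image_path):
    """Hypothetical stand-in for a real face-detection call.

    It simply pretends any path containing 'face' depicts a face,
    so the tallying logic can run without the Tencent SDK.
    """
    return 'face' in image_path

def tally_avatars(image_paths):
    # Count avatars with and without a detected face.
    use_face = sum(1 for p in image_paths if detect_face(p))
    return use_face, len(image_paths) - use_face

paths = ['face_1.jpg', 'cat.jpg', 'face_2.jpg']
print(tally_avatars(paths))  # (2, 1)
```

In the real routine the two counts feed straight into plt.pie, exactly as in the gender analysis above.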
<code>import os</code>
<code>import time</code>
<code>import numpy as np</code>
<code>from PIL import Image</code>
<code>from wordcloud import WordCloud</code>
<code>def analyseHeadImage(friends):</code>
<code>    basePath = os.path.abspath('.')</code>
<code>    baseFolder = os.path.join(basePath, 'HeadImages')</code>
<code>    if not os.path.exists(baseFolder):</code>
<code>        os.makedirs(baseFolder)</code>
<code>    faceApi = FaceAPI()  # project wrapper around the TencentYoutuyun SDK (definition not shown here)</code>
<code>    use_face = 0</code>
<code>    not_use_face = 0</code>
<code>    image_tags = ''</code>
<code>    for index in range(1, len(friends)):</code>
<code>        friend = friends[index]</code>
<code>        imgFile = os.path.join(baseFolder, f'Image{index}.jpg')</code>
<code>        imgData = itchat.get_head_img(userName=friend['UserName'])</code>
<code>        if not os.path.exists(imgFile):</code>
<code>            with open(imgFile, 'wb') as file:</code>
<code>                file.write(imgData)</code>
<code>        time.sleep(1)  # throttle calls to the face-detection API</code>
<code>        result = faceApi.detectFace(imgFile)</code>
<code>        if result:</code>
<code>            use_face += 1</code>
<code>        else:</code>
<code>            not_use_face += 1</code>
<code>        result = faceApi.extractTags(imgFile)</code>
<code>        image_tags += ','.join(x['tag_name'] for x in result)</code>
<code>    labels = ['使用人脸头像', '不使用人脸头像']  # "face avatar" / "no face avatar"</code>
<code>    counts = [use_face, not_use_face]</code>
<code>    colors = ['red', 'yellowgreen']</code>
<code>    plt.figure(figsize=(8, 5), dpi=80)</code>
<code>    plt.axes(aspect=1)</code>
<code>    plt.pie(counts, labels=labels, colors=colors, labeldistance=1.1, autopct='%.1f%%', shadow=False, startangle=90, pctdistance=0.6)</code>
<code>    plt.legend(loc='upper right')</code>
<code>    plt.title(f"{friends[0]['NickName']}的头像人脸使用情况")  # "<NickName>'s avatar face usage"</code>
<code>    plt.show()</code>
<code>    # Word cloud for tags</code>
<code>    back_coloring = np.array(Image.open('flower.jpg'))</code>
<code>    wordcloud = WordCloud(font_path='simfang.ttf', background_color='white', max_words=1200, mask=back_coloring, max_font_size=75, random_state=45, width=800, height=480, margin=15)</code>
<code>    wordcloud.generate(image_tags)</code>
<code>    plt.imshow(wordcloud)</code>
<code>    plt.axis('off')</code>
<code>    plt.show()</code>
Signature Analysis
Signatures are cleaned, tokenised, and fed into SnowNLP for sentiment scoring. The extracted keywords form a word cloud, while sentiment scores are aggregated into three categories (positive, neutral, negative) and displayed as a bar chart:
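SnowNLP's sentiments attribute is a float between 0 and 1, where values near 1 are more positive. The three-way bucketing used in the routine below can be sketched without the library itself; the scores here are made-up examples standing in for SnowNLP(signature).sentiments:

```python
def classify_emotion(score):
    # Bucket a sentiment score in [0, 1] into three categories,
    # using the same 0.33 / 0.66 cut-offs as the analysis below.
    if score > 0.66:
        return 'positive'
    if score >= 0.33:
        return 'neutral'
    return 'negative'

# Made-up example scores
scores = [0.9, 0.5, 0.1, 0.7]
print([classify_emotion(s) for s in scores])  # ['positive', 'neutral', 'negative', 'positive']
```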
<code>import re</code>
<code>import jieba.analyse</code>
<code>from snownlp import SnowNLP</code>
<code>def analyseSignature(friends):</code>
<code>    signatures = ''</code>
<code>    emotions = []</code>
<code>    pattern = re.compile(r'1f\d.+')  # emoji residue left behind by WeChat's markup</code>
<code>    for friend in friends[1:]:</code>
<code>        signature = friend['Signature']</code>
<code>        if signature:</code>
<code>            signature = signature.strip().replace('span', '').replace('class', '').replace('emoji', '')</code>
<code>            signature = pattern.sub('', signature)  # strip leftover emoji codes such as 1f60a</code>
<code>            if len(signature) > 0:</code>
<code>                nlp = SnowNLP(signature)</code>
<code>                emotions.append(nlp.sentiments)  # float in [0, 1], higher = more positive</code>
<code>                signatures += ' '.join(jieba.analyse.extract_tags(signature, 5)) + ' '</code>
<code>    with open('signatures.txt', 'wt', encoding='utf-8') as file:</code>
<code>        file.write(signatures)</code>
<code>    # Word cloud for signatures</code>
<code>    back_coloring = np.array(Image.open('flower.jpg'))</code>
<code>    wordcloud = WordCloud(font_path='simfang.ttf', background_color='white', max_words=1200, mask=back_coloring, max_font_size=75, random_state=45, width=960, height=720, margin=15)</code>
<code>    wordcloud.generate(signatures)</code>
<code>    plt.imshow(wordcloud)</code>
<code>    plt.axis('off')</code>
<code>    plt.show()</code>
<code>    # Sentiment statistics</code>
<code>    count_good = len(list(filter(lambda x: x > 0.66, emotions)))</code>
<code>    count_normal = len(list(filter(lambda x: 0.33 <= x <= 0.66, emotions)))</code>
<code>    count_bad = len(list(filter(lambda x: x < 0.33, emotions)))</code>
<code>    labels = ['负面消极', '中性', '正面积极']  # negative / neutral / positive</code>
<code>    values = (count_bad, count_normal, count_good)</code>
<code>    plt.rcParams['font.sans-serif'] = ['simHei']</code>
<code>    plt.rcParams['axes.unicode_minus'] = False</code>
<code>    plt.xlabel('情感判断')  # "sentiment"</code>
<code>    plt.ylabel('频数')  # "frequency"</code>
<code>    plt.xticks(range(3), labels)</code>
<code>    plt.bar(range(3), values, color=['r', 'g', 'b'])</code>
<code>    plt.title(f"{friends[0]['NickName']}的微信好友签名信息情感分析")  # "<NickName>'s friend-signature sentiment analysis"</code>
<code>    plt.show()</code>
The article concludes by directing readers to the full project tutorial, which can be obtained by following the public account mentioned throughout the text.
Python Programming Learning Circle
A global community of Chinese Python developers offering technical articles, columns, original video tutorials, and problem sets. Topics include web full‑stack development, web scraping, data analysis, natural language processing, image processing, machine learning, automated testing, DevOps automation, and big data.