
Build a Free Cloud AI Speed‑Dating Model with Alibaba PAI‑DSW

This article introduces Alibaba Cloud’s free PAI‑DSW cloud IDE for AI development, explains the evolution of machine learning, guides users through creating notebooks, running Python code, and demonstrates a complete speed‑dating dataset analysis and predictive modeling pipeline using logistic regression and data‑balancing techniques.

Alibaba Cloud Developer

Aug 25, 2020


Artificial intelligence (AI) has grown from its beginnings in the 1950s into a dominant technology, with machine learning as its core subfield. Deep learning, powered by frameworks such as TensorFlow and by GPU hardware, has driven breakthroughs in computer vision, speech recognition, and recommendation systems.

Setting up a local deep‑learning environment can be cumbersome due to OS, GPU, driver, and library dependencies. Alibaba Cloud’s PAI‑DSW (Data Science Workshop) offers a cloud‑based IDE that integrates JupyterLab, WebIDE, and terminal access, providing free GPU resources for personal developers.

Getting Started with PAI‑DSW

Visit https://dsw-dev.data.aliyun.com/#/ and log in with an Alibaba Cloud account.

Create a new DSW lab; the interface shows a file explorer, workspace, and resource panel.

Open a Python 3 notebook; a new .ipynb file is created.

Write and run code by selecting a cell and pressing Shift+Enter.

Example: print a welcome message.

print("Welcome to DSW 👏👏👏 You can use this lab for any data experiment you like 😝")

Next, compute the 10th Fibonacci number.

# Compute the Nth Fibonacci number (1-indexed: F(1) = 0, F(2) = 1)
def Fibonacci(n):
    # Reject zero and negative inputs instead of recursing forever
    if n < 1:
        raise ValueError("Invalid input, please provide a positive integer")
    elif n == 1:
        return 0
    elif n == 2:
        return 1
    else:
        return Fibonacci(n-1) + Fibonacci(n-2)

print('The 10th Fibonacci number is:', Fibonacci(10))
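The recursive version recomputes the same subproblems many times, so its running time grows exponentially with n. That is fine for n = 10 but slow for larger inputs. A minimal iterative sketch with the same convention (F(1) = 0, F(2) = 1); the name `fibonacci_iter` is hypothetical:

```python
def fibonacci_iter(n: int) -> int:
    """Nth Fibonacci number, 1-indexed (F(1) = 0, F(2) = 1), in O(n) time."""
    if n < 1:
        raise ValueError("Invalid input, please provide a positive integer")
    a, b = 0, 1  # F(1), F(2)
    # Each step shifts the pair forward by one position in the sequence
    for _ in range(n - 1):
        a, b = b, a + b
    return a

print('The 10th Fibonacci number is:', fibonacci_iter(10))  # prints 34
```

Keeping only the last two values makes memory use constant, and the loop runs n - 1 times.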

Case Study: Predicting Speed‑Dating Success

The dataset contains roughly 8,000 speed‑dating rounds, with features such as gender, age, attractiveness, sincerity, intelligence, and the match outcome.

import pandas as pd
df = pd.read_csv('Speed Dating Data.csv', encoding='gbk')
print(df.shape)

Exploratory analysis shows only 16.47% of rounds result in a match. Gender‑wise success rates differ slightly, with females having a marginally higher match probability.

# Plot the match rate for gender == 0 (female in this dataset's coding)
import matplotlib.pyplot as plt
# sort_index() orders the counts as match == 0 (single), then match == 1 (matched)
size_of_groups = df[df.gender == 0].match.value_counts().sort_index().values
single_pct = round(size_of_groups[0]/sum(size_of_groups)*100, 2)
matched_pct = round(size_of_groups[1]/sum(size_of_groups)*100, 2)
labels = [f'Single: {single_pct}%', f'Matched: {matched_pct}%']
plt.pie(size_of_groups, labels=labels)
plt.show()
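Both the overall match rate and the per-gender comparison are means of the 0/1 `match` column, so they can also be read off directly with a `groupby` instead of a pie chart. A minimal sketch on a made-up eight-row frame (the values and the helper name `match_rates` are hypothetical, not taken from the dataset); on the real data the same calls would run on `df`:

```python
import pandas as pd

# Hypothetical toy data: one row per round, gender coded 0/1,
# match coded 0 (no match) / 1 (match), as in the real CSV
toy = pd.DataFrame({
    'gender': [0, 0, 0, 0, 1, 1, 1, 1],
    'match':  [1, 0, 0, 0, 1, 1, 0, 0],
})

def match_rates(frame: pd.DataFrame) -> pd.Series:
    """Percentage of rounds with match == 1, per gender code."""
    return (frame.groupby('gender')['match'].mean() * 100).round(2)

overall = round(toy['match'].mean() * 100, 2)
per_gender = match_rates(toy)
print(overall)  # prints 37.5
print(per_gender)
```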

Age distribution peaks between 22‑28 years, suggesting most participants are young adults.

# Age histogram
import numpy as np
age = df[np.isfinite(df['age'])]['age']
plt.hist(age, bins=35)
plt.xlabel('Age')
plt.ylabel('Frequency')
plt.show()

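Correlation analysis shows that the ratings each participant received from their partner (the columns ending in _o) for attractiveness (attr_o), sincerity (sinc_o), intelligence (intel_o), fun (fun_o), ambition (amb_o), and shared interests (shar_o) are positively related to the match outcome.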

# Correlation heatmap
import seaborn as sns
selected = df[['attr_o','sinc_o','intel_o','fun_o','amb_o','shar_o','match']]
corr = selected.corr()
sns.heatmap(corr, xticklabels=corr.columns, yticklabels=corr.columns)
plt.show()
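By default, `DataFrame.corr()` computes Pearson correlation coefficients; against a 0/1 column such as `match`, this is the point-biserial correlation. A standard-library sketch of the same calculation for one pair of columns (the helper name `pearson` and the ratings are hypothetical):

```python
import math

def pearson(xs, ys):
    """Pearson correlation: covariance divided by the product of standard deviations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    # A constant column has zero spread; pandas reports NaN, here it raises
    return cov / (sx * sy)

# Hypothetical partner attractiveness ratings against a 0/1 match flag
attr_o = [3, 5, 6, 7, 8, 9]
match = [0, 0, 0, 1, 1, 1]
print(round(pearson(attr_o, match), 3))
```

Values near +1 mean higher ratings go with matches, values near 0 mean no linear relationship; the heatmap colours each cell by this number.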

Modeling steps:

Select the six predictor columns and the target column, match.

Drop rows with missing values.

Apply SVMSMOTE to balance the classes.

Split into training (80%) and test (20%) sets.

Train a logistic regression model.
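The balancing step relies on synthetic oversampling. Plain SMOTE creates a new minority-class point by interpolating between a minority sample and one of its nearest minority-class neighbours; SVMSMOTE is a variant that uses an SVM to choose which minority samples to start from. A standard-library sketch of generating one synthetic point with plain SMOTE (the function `smote_sample` and its parameters are hypothetical, not imbalanced-learn's API):

```python
import random

def smote_sample(minority, k=3, rng=None):
    """Return one synthetic point in the style of plain SMOTE (hypothetical helper).

    minority: list of feature tuples that all belong to the minority class.
    k: how many of the nearest minority neighbours to choose from.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = rng or random.Random(0)
    i = rng.randrange(len(minority))
    base = minority[i]
    # Other minority samples, nearest first (squared Euclidean distance)
    others = [p for j, p in enumerate(minority) if j != i]
    others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
    neighbour = rng.choice(others[:k])
    # Random position on the line segment from base to neighbour
    gap = rng.random()
    return tuple(a + gap * (b - a) for a, b in zip(base, neighbour))

# Hypothetical minority rows with two rating features each
minority = [(7.0, 6.0), (8.0, 7.0), (9.0, 8.0), (6.0, 6.0)]
print(smote_sample(minority))
```

Because each new point lies between two real minority samples, the minority class gains variety rather than exact duplicates.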

# Data preparation
clean_df = df[['attr_o','sinc_o','intel_o','fun_o','amb_o','shar_o','match']].dropna()
X = clean_df[['attr_o','sinc_o','intel_o','fun_o','amb_o','shar_o']]
y = clean_df['match']

# Balance classes by oversampling the minority (match == 1) class
from imblearn.over_sampling import SVMSMOTE
X_res, y_res = SVMSMOTE(random_state=0).fit_resample(X, y)

# Train‑test split
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X_res, y_res, test_size=0.2, random_state=0, stratify=y_res)

# Logistic regression
from sklearn.linear_model import LogisticRegression
from sklearn import metrics
model = LogisticRegression(C=1, random_state=0)
model.fit(X_train, y_train)
train_acc = metrics.accuracy_score(y_train, model.predict(X_train))
test_acc = metrics.accuracy_score(y_test, model.predict(X_test))
print('Training Accuracy:', train_acc)
print('Validation Accuracy:', test_acc)
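Logistic regression models the match probability as the logistic (sigmoid) function of a weighted sum of the six ratings, and `accuracy_score` is the fraction of correct predictions. A standard-library sketch of both; on a fitted scikit-learn model the weights and bias would come from `model.coef_[0]` and `model.intercept_[0]`, while the numbers below are made up:

```python
import math

def sigmoid(z: float) -> float:
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def decision(weights, bias, x):
    """Linear score w . x + b."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias

def predict_proba(weights, bias, x):
    """Estimated P(match = 1 | x)."""
    return sigmoid(decision(weights, bias, x))

def predict(weights, bias, x):
    """Class 1 when the score is positive, i.e. when P(match = 1 | x) > 0.5."""
    return int(decision(weights, bias, x) > 0)

def accuracy(y_true, y_pred):
    """Fraction of predictions equal to the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical weights for two features, not fitted on the dataset
w, b = [0.8, 0.3], -6.0
rows = [[8.0, 5.0], [4.0, 3.0]]
preds = [predict(w, b, r) for r in rows]
print(preds)  # prints [1, 0]
```

The parameter `C=1` in the code above controls regularization strength on these weights (smaller values regularize more); it does not change the prediction formula.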

The balanced logistic regression reaches around 83% validation accuracy on the 20% split held out from the resampled, class-balanced data, suggesting a viable approach to predicting whether a speed‑dating round will result in a match.
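In the pipeline above, SVMSMOTE runs before `train_test_split`, so the validation rows come from the resampled data and synthetic points may be interpolated from rows that end up in the test split. A common alternative is to split first and resample only the training part. The sketch below shows that ordering using only the standard library, with plain random oversampling swapped in for SVMSMOTE and a hand-rolled stratified split standing in for `train_test_split(..., stratify=...)`; both helper names are hypothetical:

```python
import random
from collections import Counter

def stratified_split(labels, test_size=0.2, seed=0):
    """Return (train_idx, test_idx) so each class keeps about the same share in both parts."""
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, lab in enumerate(labels) if lab == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_size)
        test_idx += idx[:n_test]
        train_idx += idx[n_test:]
    return train_idx, test_idx

def random_oversample(rows, labels, seed=0):
    """Duplicate random rows of smaller classes until every class matches the largest one."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, lab in zip(rows, labels) if lab == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels

# Hypothetical imbalanced data: 80 non-matches, 20 matches, one feature each
X = [[float(i)] for i in range(100)]
y = [0] * 80 + [1] * 20

# 1) Split first, stratified on the original labels
train_idx, test_idx = stratified_split(y, test_size=0.2)
X_test, y_test = [X[i] for i in test_idx], [y[i] for i in test_idx]
# 2) Oversample only the training part; the test part keeps the real class ratio
X_train, y_train = random_oversample([X[i] for i in train_idx], [y[i] for i in train_idx])
print(Counter(y_train), Counter(y_test))
```

With the libraries used in the article, the same ordering would mean calling `train_test_split` on `X` and `y` first and then `SVMSMOTE().fit_resample(X_train, y_train)`, leaving the test split untouched; accuracy measured that way may differ from the figure reported above.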


Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please get in touch and we will review it promptly.
