The steps below calculate the correlation matrix with the pandas corr() method and visualize it as a Seaborn heatmap, which helps identify strongly correlated features in the Titanic dataset.
- Load the dataset:
    import pandas as pd
    import seaborn as sns
    import matplotlib.pyplot as plt

    url = 'https://raw.githubusercontent.com/drshahizan/dataset/main/titanic/train.csv'
    titanic = pd.read_csv(url)
- Select only numeric columns for correlation calculation:
    numeric_cols = titanic.select_dtypes(include=['number']).columns
    corr_matrix = titanic[numeric_cols].corr()
- Create the heatmap:
    plt.figure(figsize=(10, 8))
    sns.heatmap(corr_matrix, annot=True, cmap='coolwarm', center=0)
    plt.title('Correlation Matrix of Titanic Dataset')
    plt.show()
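A note on the corr() call used above: by default it computes the Pearson coefficient, which measures linear association. The sketch below compares it with the rank-based Spearman coefficient via the `method` parameter. The `demo` frame and its values are made up for illustration and are not Titanic data.

```python
import pandas as pd

# Made-up values: y grows monotonically with x, but not linearly.
demo = pd.DataFrame({"x": [1, 2, 3, 4, 5]})
demo["y"] = demo["x"] ** 3

# Default method: Pearson, sensitive to how linear the relationship is.
pearson = demo.corr(method="pearson")
# Rank-based alternative: Spearman, sensitive only to monotonic ordering.
spearman = demo.corr(method="spearman")

print("Pearson :", pearson.loc["x", "y"])
print("Spearman:", spearman.loc["x", "y"])
```

For ordinal columns such as Pclass, a rank-based coefficient may be a reasonable alternative; either matrix can be passed to sns.heatmap in the same way.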
Here's the complete code in a single notebook:
    import pandas as pd
    import seaborn as sns
    import matplotlib.pyplot as plt

    # Load the Titanic dataset from the provided URL
    url = 'https://raw.githubusercontent.com/drshahizan/dataset/main/titanic/train.csv'
    titanic = pd.read_csv(url)

    # Select only numeric columns for correlation calculation
    numeric_cols = titanic.select_dtypes(include=['number']).columns
    corr_matrix = titanic[numeric_cols].corr()

    # Create a heatmap to visualize the correlation matrix
    plt.figure(figsize=(10, 8))
    sns.heatmap(corr_matrix, annot=True, cmap='coolwarm', center=0)
    plt.title('Correlation Matrix of Titanic Dataset')
    plt.show()
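The heatmap shows the correlations visually, but strongly correlated pairs can also be listed programmatically. The following is a minimal sketch of one way to do that. The helper name `top_correlated_pairs`, the `threshold` default of 0.3, and the `demo` frame with its made-up values are all assumptions for illustration, not part of the original code.

```python
import numpy as np
import pandas as pd


def top_correlated_pairs(df: pd.DataFrame, threshold: float = 0.3) -> pd.Series:
    """Return column pairs whose absolute Pearson correlation is >= threshold.

    Hypothetical helper; the 0.3 default is an arbitrary illustrative cutoff.
    """
    corr = df.select_dtypes(include=["number"]).corr()
    # Keep only the strict upper triangle so each pair appears once
    # and the diagonal (each column with itself) is excluded.
    upper = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    pairs = corr.where(upper).stack().dropna()
    strong = pairs[pairs.abs() >= threshold]
    # Strongest relationships first, ranked by absolute value; signs are kept.
    return strong.sort_values(key=lambda s: s.abs(), ascending=False)


# Made-up demonstration data (not Titanic rows).
demo = pd.DataFrame({
    "label": list("abcde"),          # non-numeric, ignored by the helper
    "x": [1, 2, 3, 4, 5],
    "double_x": [2, 4, 6, 8, 10],    # perfectly positively correlated with x
    "neg_x": [5, 4, 3, 2, 1],        # perfectly negatively correlated with x
    "noise": [1, -1, 0, -1, 1],      # uncorrelated with x
})
print(top_correlated_pairs(demo, threshold=0.5))
```

With the Titanic data loaded as above, calling `top_correlated_pairs(titanic)` would return the same kind of ranked list for its numeric columns.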
Please open an issue to report errors or suggest improvements to this content.
You can also contact me on LinkedIn with any other questions or feedback.