Data exploration and scatter plot visualization in Python using Pandas and Matplotlib
import pandas as pd
import matplotlib.pyplot as plt
# Load dataset
data = pd.read_csv("dataset.csv")
# Print the first 5 rows of the dataset
print(data.head())
# Print summary statistics of the dataset
print(data.describe())
# Create a scatter plot of two variables
plt.scatter(data['Variable1'], data['Variable2'])
plt.xlabel('Variable 1')
plt.ylabel('Variable 2')
plt.title('Scatter plot of two variables')
plt.show()
In this program, we first import the necessary libraries, including pandas for data manipulation and matplotlib for data visualization. We then load a dataset from a CSV file using the pd.read_csv() function from pandas.
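As a minimal sketch of the loading step, the snippet below reads CSV text from an in-memory buffer so that it runs without a file on disk, then checks that the two columns used later are present. The sample values are made up for illustration, and the column check is an optional safeguard we are assuming here; it is not part of the program above.

```python
import io

import pandas as pd

# Hypothetical sample standing in for dataset.csv; the values are made up.
csv_text = """Variable1,Variable2
1.0,2.1
2.0,3.9
3.0,6.2
4.0,8.1
"""

# pd.read_csv accepts a file path or any file-like object.
data = pd.read_csv(io.StringIO(csv_text))

# Confirm the columns the scatter plot relies on exist before going further.
required = ["Variable1", "Variable2"]
missing = [col for col in required if col not in data.columns]
if missing:
    raise KeyError(f"Missing columns: {missing}")

print(data.shape)
```

With a real file, the io.StringIO(...) argument would simply be replaced by the path, as in pd.read_csv("dataset.csv").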
Next, we print the first 5 rows of the dataset using the head() method to get a quick glimpse of what the data looks like. We also print summary statistics using the describe() method, which by default reports the count, mean, standard deviation, minimum, quartiles, and maximum of each numeric column, to get an idea of the data's distribution.
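To make the exploration step more concrete, here is a small sketch that uses a hand-built DataFrame in place of the CSV file (the values are illustrative only). It shows that head() accepts a row count and that describe() returns a DataFrame whose rows are the individual statistics, so single values can be looked up with .loc.

```python
import pandas as pd

# Hypothetical data; the values are illustrative only.
data = pd.DataFrame({
    "Variable1": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "Variable2": [2.0, 4.0, 6.0, 8.0, 10.0, 12.0],
})

# head() returns the first 5 rows by default; pass a number to change that.
first_three = data.head(3)
print(first_three)

# describe() returns a DataFrame indexed by statistic name.
summary = data.describe()
print(summary)

# Individual statistics can be read out by row label and column name.
mean_v1 = summary.loc["mean", "Variable1"]
print(mean_v1)
```

Reading values out of the summary this way can be handy when a statistic is needed later in the script instead of only on screen.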
Finally, we create a scatter plot of two variables from the dataset using matplotlib. We use the scatter() function to create the plot and specify the x and y variables using the dataset column names. We then add axis labels and a plot title using the xlabel(), ylabel(), and title() functions, and display the plot using the show() function.
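The same plot can also be written with matplotlib's object-oriented interface and saved to an image file, which is useful when the script runs somewhere without a display. The sketch below is one possible variant, not the method used in the program above: the sample data, the Agg backend choice, and the output file name scatter.png are all assumptions made for illustration.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window needed
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data standing in for the two dataset columns.
data = pd.DataFrame({
    "Variable1": [1.0, 2.0, 3.0, 4.0],
    "Variable2": [2.1, 3.9, 6.2, 8.1],
})

# Create a figure and a single set of axes, then draw on the axes object.
fig, ax = plt.subplots()
ax.scatter(data["Variable1"], data["Variable2"])
ax.set_xlabel("Variable 1")
ax.set_ylabel("Variable 2")
ax.set_title("Scatter plot of two variables")

# Save to a temporary directory instead of calling plt.show().
out_path = os.path.join(tempfile.gettempdir(), "scatter.png")  # hypothetical name
fig.savefig(out_path)
plt.close(fig)
print(out_path)
```

The ax.set_xlabel(), ax.set_ylabel(), and ax.set_title() methods correspond to the plt.xlabel(), plt.ylabel(), and plt.title() functions, but they act on an explicit axes object.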