Basic Python | Data Visualization Using matplotlib


Data visualization is a technique to present the data in a pictorial or graphical format.
Several Python data visualization libraries have been introduced in recent years. This post covers one of them: matplotlib.
With matplotlib, visualizing large and complex data becomes much easier.
There are several advantages of using matplotlib to visualize data. They are as follows:
Is a multi-platform data visualization tool built on NumPy, which keeps it fast and efficient
Works well with many operating systems and graphics back ends
Produces high-quality graphics and plots, for print and on screen, across a range of graph types
Integrates with Jupyter Notebook, so developers can spend their time implementing features
Has large community support and cross-platform support, as it is an open source tool
Gives full control over graph and plot styles
The Plot
A plot is a graphical representation of data, which shows the relationship between two variables or the distribution of data.
Steps to Create a Plot
You can create a plot using four simple steps.
Step 01: Import the required libraries
Step 02: Define or import the required dataset
Step 03: Set the plot parameters [using the object-oriented interface or pyplot]
Step 04: Display the created plot
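The four steps can be sketched roughly as follows, using the object-oriented interface. The year and medal values are made-up placeholders, not real data, and the Agg backend line is an assumption that lets the script run on a machine without a display.

```python
# Step 01: import the required libraries
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend; remove this line to open a window
import matplotlib.pyplot as plt

# Step 02: define the dataset (placeholder values, not real data)
years = [2000, 2004, 2008, 2012, 2016]
medals = [10, 14, 9, 17, 12]

# Step 03: set the plot parameters with the object-oriented interface
fig, ax = plt.subplots(figsize=(6, 4))
ax.plot(years, medals)
ax.set_title("Medals per Games (placeholder data)")
ax.set_xlabel("Year")
ax.set_ylabel("Medals")
ax.grid(True)

# Step 04: display the created plot
# With the Agg backend this only emits a warning; in an interactive
# session (Agg line removed) it opens the figure window.
plt.show()
```

The same plot can be built with the pyplot interface by calling plt.plot(), plt.title(), plt.xlabel(), and plt.ylabel() instead of the Axes methods.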
Line Graph and Line Properties:
A line chart or line graph is a type of chart that displays information as a series of data points, called 'markers', connected by straight line segments.
Line styles are used about as often as colors. matplotlib predefines a few line styles, listed below, and custom dash patterns can also be specified.
linestyle    description
'-'          solid
'--'         dashed
'-.'         dashdot
':'          dotted
'None'       draw nothing
' '          draw nothing
''           draw nothing
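As a small illustration of the predefined styles, the sketch below draws one horizontal line per style. The helper name plot_line_styles is hypothetical and exists only for this example.

```python
import matplotlib.pyplot as plt


def plot_line_styles(styles=("-", "--", "-.", ":")):
    """Draw one horizontal line per line style and return the Axes."""
    fig, ax = plt.subplots()
    for offset, style in enumerate(styles):
        # Shift each line vertically so the styles do not overlap.
        ax.plot([0, 1, 2, 3], [offset] * 4, linestyle=style, label=repr(style))
    ax.legend(title="linestyle")
    return ax


ax = plot_line_styles()
```

Call plt.show() afterwards to view the figure in an interactive session.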
Other attributes
Almost every plot you make has attributes that can be modified so the lines and markers suit your needs. For many plotting functions, matplotlib cycles through a set of colors for each dataset you plot, but you are free to state explicitly which color to use for which plot. For plt.plot(), you can combine the color, marker, and line style in a single format string such as 'ro--' (red, circle markers, dashed line). plt.scatter() does not take a format string; pass its color and marker as keyword arguments instead.
Property                  Value Type
alpha                     float
color or c                any matplotlib color
drawstyle                 [ 'default' 'steps' 'steps-pre' 'steps-mid' 'steps-post' ]
linestyle or ls           [ '-' '--' '-.' ':' 'None' ' ' '' ]
                          (set drawstyle separately; combined forms such as
                          'steps--' worked only in older matplotlib releases)
linewidth or lw           float value in points
marker                    [ 0 1 2 3 4 5 6 7 'o' 'd' 'D' 'h' 'H'
                          '' 'None' ' ' None '8' 'p' ','
                          '+' 'x' '.' 's' '*' '_' '|'
                          '1' '2' '3' '4' 'v' '<' '>' '^' ]
markeredgecolor or mec    any matplotlib color
markeredgewidth or mew    float value in points
markerfacecolor or mfc    any matplotlib color
markersize or ms          float
visible                   [ True False ]
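To show how these properties are set in practice, here is a minimal sketch with two lines: one styled through a format string and one through keyword properties. The x and y values are arbitrary placeholders.

```python
import matplotlib.pyplot as plt

x = [0, 1, 2, 3, 4]
y = [0, 1, 4, 9, 16]

fig, ax = plt.subplots()

# Format string: red ("r"), circle markers ("o"), dashed line ("--").
(short_form,) = ax.plot(x, y, "ro--", label="format string")

# Styling spelled out with keyword properties from the table.
(long_form,) = ax.plot(
    x,
    [value + 5 for value in y],
    color="tab:blue",
    linestyle="-.",
    linewidth=2.5,               # in points
    marker="s",                  # square markers
    markersize=8,
    markerfacecolor="white",
    markeredgecolor="tab:blue",
    markeredgewidth=1.5,         # in points
    alpha=0.8,                   # 0 is transparent, 1 is opaque
    label="keyword properties",
)
ax.legend()
```

The format string is convenient for quick plots, while keyword properties are easier to read once more than two or three attributes are involved.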
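A histogram is a graphical representation of the distribution of a set of values. It is a kind of bar chart in which each bar shows how many values fall within an interval (bin). In matplotlib you can create one with the hist() function, or compute the counts yourself and draw them with the bar chart function.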
Advantages of Histogram charts:
They display the number of values within a specified interval.
They are suitable for large datasets, because the values are grouped into intervals.
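A minimal histogram sketch, assuming a synthetic sample generated with NumPy (the sample, the seed, and the choice of 20 bins are illustrative only):

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic sample for illustration only: 1000 draws from a normal distribution.
rng = np.random.default_rng(seed=0)
values = rng.normal(loc=50, scale=10, size=1000)

fig, ax = plt.subplots()
# hist() groups the values into 20 equal-width intervals (bins) and
# returns the count per bin, the bin edges, and the drawn bars.
counts, bin_edges, patches = ax.hist(values, bins=20, edgecolor="black")
ax.set_xlabel("Value")
ax.set_ylabel("Number of values in interval")
ax.set_title("Histogram of a synthetic sample")
```

The returned counts always add up to the number of values, which is a quick way to check that no data fell outside the bins.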
Let's take an Olympic dataset and explore bar plots and scatter plots.
In a bar plot we can compare two data series. Below is the bar plot for the Olympic dataset, comparing the Gold and Silver medal tallies.
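A grouped bar plot of this kind might be sketched as follows. The team names and tallies are placeholders, not values from the Olympic dataset.

```python
import numpy as np
import matplotlib.pyplot as plt

# Placeholder tallies for illustration; not taken from the Olympic dataset.
teams = ["Team A", "Team B", "Team C", "Team D"]
gold = [12, 9, 7, 4]
silver = [8, 11, 5, 6]

positions = np.arange(len(teams))  # one slot per team on the x-axis
width = 0.4                        # width of each bar within a slot

fig, ax = plt.subplots()
# Shift the two series left and right of each slot so the bars sit side by side.
gold_bars = ax.bar(positions - width / 2, gold, width, color="gold", label="Gold")
silver_bars = ax.bar(positions + width / 2, silver, width, color="silver", label="Silver")
ax.set_xticks(positions)
ax.set_xticklabels(teams)
ax.set_ylabel("Medals")
ax.set_title("Gold vs. Silver (placeholder data)")
ax.legend()
```

Placing the bars side by side, rather than stacking them, keeps both series on a common baseline so their heights can be compared directly.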
Setting annotations in a plot.
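One way to set an annotation is the Axes method annotate(), sketched here on a simple line plot with placeholder values. The label text and its offset are arbitrary choices for the example.

```python
import matplotlib.pyplot as plt

# Placeholder values for illustration only.
x = [1, 2, 3, 4, 5]
y = [3, 7, 4, 9, 6]

fig, ax = plt.subplots()
ax.plot(x, y, marker="o")

# Find the highest point and annotate it.
peak_index = y.index(max(y))
peak = (x[peak_index], y[peak_index])
note = ax.annotate(
    "highest value",
    xy=peak,                                # the point being annotated
    xytext=(peak[0] - 2.5, peak[1] + 0.5),  # where the text is placed
    arrowprops=dict(arrowstyle="->"),       # arrow from the text to the point
)
ax.set_ylim(0, 11)  # leave room above the peak for the label
```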
A scatter plot is used to graphically display the relationship between variables. plot() can draw points as well, but the scatter() method is recommended when you need control over the size and color of individual points.
It has several advantages:
Shows the correlation between variables
Is suitable for large datasets
Makes clusters easy to find
Represents each piece of data as a point on the plot
Below is the scatter plot for the Olympic dataset, showing the correlation between Gold, Silver, and Bronze medals.
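A scatter plot relating three medal counts could look roughly like this. The tallies are placeholders rather than values from the Olympic dataset, and encoding Bronze as marker size and color is one possible design, not necessarily the one used for the original figure.

```python
import matplotlib.pyplot as plt

# Placeholder tallies for illustration; not taken from the Olympic dataset.
gold = [12, 9, 7, 4, 2, 1]
silver = [8, 11, 5, 6, 3, 2]
bronze = [10, 7, 9, 3, 4, 1]

fig, ax = plt.subplots()
points = ax.scatter(
    gold,
    silver,
    s=[20 * count for count in bronze],  # marker area grows with the bronze tally
    c=bronze,                            # marker color also encodes the bronze tally
    cmap="viridis",
    alpha=0.7,
)
fig.colorbar(points, ax=ax, label="Bronze medals")
ax.set_xlabel("Gold medals")
ax.set_ylabel("Silver medals")
ax.set_title("Gold vs. Silver, sized and colored by Bronze (placeholder data)")
```

Per-point size and color are what scatter() offers beyond plot(), which is why it fits a three-variable comparison like this one.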
