There is a nice package called mat4py, which can easily be installed using:
pip install mat4py
It is straightforward to use (from the website):
Load data from a MAT-file
The function loadmat loads all variables stored in the MAT-file into a simple Python data structure, using only Python's dict and list objects. Numeric and cell arrays are converted to row-ordered nested lists. Arrays are squeezed to eliminate arrays with only one element. The resulting data structure is composed of simple types that are compatible with the JSON format.
Example: Load a MAT-file into a Python data structure:
from mat4py import loadmat
data = loadmat('datafile.mat')
The variable data is a dict with the variables and values contained in the MAT-file.
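To make "row-ordered nested lists" and "squeezing" concrete, the sketch below reproduces the idea with NumPy. It is only an illustration of the conversion described above, assuming NumPy is installed; it is not mat4py's actual implementation, the helper name to_simple is hypothetical, and whether mat4py flattens a 1 x N row exactly this way is an assumption.

```python
import json

import numpy as np


def to_simple(array):
    """Hypothetical helper illustrating the conversion described for loadmat.

    np.squeeze drops every length-1 dimension, and tolist() turns the result
    into row-ordered nested lists of plain Python numbers (or a bare scalar
    when only a single element is left).
    """
    return np.squeeze(np.asarray(array)).tolist()


matrix = to_simple(np.array([[1, 2, 3], [4, 5, 6]]))  # stays 2 x 3
row = to_simple(np.array([[7, 8, 9]]))                # 1 x 3 -> flat list
scalar = to_simple(np.array([[5]]))                   # 1 x 1 -> plain number

# Plain dicts, lists and numbers serialize directly to JSON.
encoded = json.dumps({'matrix': matrix, 'row': row, 'scalar': scalar})
print(encoded)
# {"matrix": [[1, 2, 3], [4, 5, 6]], "row": [7, 8, 9], "scalar": 5}
```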
Save a Python data structure to a MAT-file
Python data can be saved to a MAT-file with the function savemat. Data has to be structured in the same way as for loadmat, i.e. it should be composed of simple data types, like dict, list, str, int, and float.
Example: Save a Python data structure to a MAT-file:
from mat4py import savemat
savemat('datafile.mat', data)
The parameter data should be a dict with the variables.
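Because savemat expects a dict built only from simple types, it can help to check a structure before saving it. The checker below is a hypothetical helper written for illustration, not part of mat4py; the allowed types simply mirror the list above (dict, list, str, int, float), and requiring string keys is an assumption, since the keys become variable names.

```python
def is_simple(value):
    """Return True if value is built only from dict, list, str, int and float."""
    if isinstance(value, dict):
        # Keys are assumed to be strings, since they become variable/field names.
        return all(isinstance(key, str) and is_simple(item)
                   for key, item in value.items())
    if isinstance(value, list):
        return all(is_simple(item) for item in value)
    return isinstance(value, (str, int, float))


def check_savemat_input(data):
    """Hypothetical pre-flight check: savemat's data should be a dict of simple values."""
    return isinstance(data, dict) and is_simple(data)


good = {'name': 'accordion', 'box': [2, 300, 1, 260], 'meta': {'scale': 1.5}}
bad = {'box': (2, 300, 1, 260)}  # a tuple is not one of the listed types

print(check_savemat_input(good))  # True
print(check_savemat_input(bad))   # False
```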
A large number of datasets for data science and research use .mat files. In this article, we'll learn how to work with .mat files in Python and explore them in detail.
Why do we use .mat files in Python?
The purpose of a .mat file may not seem obvious right off the bat. But when working with large datasets, the information contained within these files can be crucial for data science and machine learning projects!
This is because, in many datasets, the .mat files contain the metadata (such as annotations) for each object or record in the dataset.
While the files are not designed solely for storing annotations, many researchers use MATLAB for their research and data collection, so a lot of the annotations that we use in machine learning are distributed as .mat files.
So, it's important for a data scientist to understand how to use .mat files in their projects. Being able to read them also lets you work directly with training and testing datasets that are distributed as .mat files, instead of only with regular CSV files.
Let’s get started!
By default, Python is not capable of reading .mat files. We need to import a library that knows how to handle the file format.
1. Install scipy
Similar to how we use the csv module to work with .csv files, we'll import the scipy library to work with .mat files in Python.
If you don't already have scipy, you can install it with pip:
pip install scipy
Now that we have scipy set up and ready to use, the next step is to open up your Python script and finally get the data required from the file.
2. Import the loadmat function from scipy.io
In this example, I will be using the accordion annotations provided by Caltech in the 101 Object Categories dataset.
from scipy.io import loadmat

annots = loadmat('annotation_0001.mat')
print(annots)
Upon execution, printing out annots gives us the following output.
{'__header__': b'MATLAB 5.0 MAT-file, Platform: PCWIN, Created on: Tue Dec 14 15:57:03 2004', '__version__': '1.0', '__globals__': [], 'box_coord': array([[ 2, 300, 1, 260]], dtype=uint16), 'obj_contour': array([[ 37.16574586, 61.94475138, 89.47697974, 126.92081031, 169.32044199, 226.03683241, 259.07550645, 258.52486188, 203.46040516, 177.5801105 , 147.84530387, 117.0092081 , 1.37384899, 1.37384899, 7.98158379, 0.82320442, 16.2412523 , 31.65930018, 38.81767956, 38.81767956], [ 58.59300184, 44.27624309, 23.90239411, 0.77532228, 2.97790055, 61.34622468, 126.87292818, 214.97605893, 267.83793738, 270.59116022, 298.67403315, 298.67403315, 187.99447514, 94.93554328, 90.53038674, 77.31491713, 62.44751381, 62.99815838, 56.94106814, 56.94106814]])}
Starting off, you can see that this single .mat file provides information such as the MAT-file format version, the platform it was created on, and the date of its creation.
The parts that we should be focusing on, however, are box_coord and obj_contour.
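Since loadmat mixes the metadata entries (__header__, __version__, __globals__) with the actual variables, it is often convenient to filter them out. The sketch below assumes NumPy and SciPy are installed; it first writes a small stand-in file with scipy.io.savemat (made-up values shaped like the Caltech annotation) so it runs without the real dataset, and the helper name load_variables is hypothetical.

```python
import os
import tempfile

import numpy as np
from scipy.io import loadmat, savemat


def load_variables(path):
    """Hypothetical helper: load a .mat file, dropping the __header__-style metadata keys."""
    contents = loadmat(path)
    return {name: value for name, value in contents.items()
            if not name.startswith('__')}


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'example.mat')
    # Made-up stand-in data, shaped like the Caltech annotation file.
    savemat(path, {
        'box_coord': np.array([[2, 300, 1, 260]], dtype=np.uint16),
        'obj_contour': np.array([[37.2, 61.9, 89.5], [58.6, 44.3, 23.9]]),
    })
    variables = load_variables(path)

print(sorted(variables))                 # ['box_coord', 'obj_contour']
print(variables['box_coord'].tolist())   # [[2, 300, 1, 260]]
print(variables['obj_contour'].shape)    # (2, 3)
```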
3. Parse the .mat file structure
If you've gone through the information regarding the annotations provided by Caltech, you'd know that these numbers describe the object in the corresponding image of the dataset: box_coord holds its bounding box, and obj_contour holds the points that trace its outline.
In a little more detail, this means that the object present in image 0001 is outlined by the points in obj_contour. A little further down in the article, we'll be sorting through the numbers, so don't worry about them for now.
Parsing through this file structure, we could assign all the contour values to a new Python list.
con_list = [[element for element in upperElement] for upperElement in annots['obj_contour']]
If we print out con_list, we get a simple 2D (nested) list, with the x coordinates in the first inner list and the y coordinates in the second.
[[37.16574585635357, 61.94475138121544, 89.47697974217309, 126.92081031307546, 169.32044198895025, 226.03683241252295, 259.0755064456721, 258.52486187845295, 203.4604051565377, 177.58011049723754, 147.84530386740326, 117.0092081031307, 1.3738489871086301, 1.3738489871086301, 7.98158379373848, 0.8232044198894926, 16.24125230202577, 31.65930018416205, 38.81767955801104, 38.81767955801104], [58.59300184162066, 44.27624309392269, 23.90239410681403, 0.7753222836096256, 2.9779005524862328, 61.34622467771641, 126.87292817679563, 214.97605893186008, 267.83793738489874, 270.59116022099454, 298.6740331491713, 298.6740331491713, 187.9944751381216, 94.93554327808477, 90.53038674033152, 77.31491712707185, 62.44751381215474, 62.998158379373876, 56.94106813996319, 56.94106813996319]]
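NumPy arrays also provide a tolist() method, which performs the same nested conversion in one call and returns plain Python floats instead of NumPy scalars. A small check, assuming NumPy is installed and using a made-up 2 x 3 array in place of annots['obj_contour']:

```python
import numpy as np

# Stand-in for annots['obj_contour']: row 0 holds x values, row 1 holds y values.
obj_contour = np.array([[37.17, 61.94, 89.48],
                        [58.59, 44.28, 23.90]])

# The nested comprehension used above...
con_list = [[element for element in upperElement] for upperElement in obj_contour]
# ...and the equivalent built-in conversion.
con_list_fast = obj_contour.tolist()

print(con_list == con_list_fast)              # True
print(type(con_list[0][0]).__name__)          # float64
print(type(con_list_fast[0][0]).__name__)     # float
```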
4. Use Pandas dataframes to work with the data
Now that you have the information and the data retrieved, how would you work with it? Continue to use lists? Definitely not.
We use DataFrames as the structure to work with, since a DataFrame functions much like a table of data: neat to look at, and extremely simple to use.
Now, to work with DataFrames, we'll need to import yet another module, pandas (install it with pip install pandas if you don't already have it).
Pandas is an open-source data analysis tool that is used by machine learning enthusiasts and data scientists throughout the world. Its operations are considered fundamental in a lot of data science applications.
We'll only be working with DataFrames in this article, but keep in mind that pandas offers much more than that.
Working with the data we’ve received above can be simplified by using pandas to construct a data frame with rows and columns for the data.
import pandas as pd

# zip pairs each x value with its y value in a tuple.
newData = list(zip(con_list[0], con_list[1]))
columns = ['obj_contour_x', 'obj_contour_y']
df = pd.DataFrame(newData, columns=columns)
print(df)
Now, we have our data in a neat DataFrame!
    obj_contour_x  obj_contour_y
0       37.165746      58.593002
1       61.944751      44.276243
2       89.476980      23.902394
3      126.920810       0.775322
4      169.320442       2.977901
5      226.036832      61.346225
6      259.075506     126.872928
7      258.524862     214.976059
8      203.460405     267.837937
9      177.580110     270.591160
10     147.845304     298.674033
11     117.009208     298.674033
12       1.373849     187.994475
13       1.373849      94.935543
14       7.981584      90.530387
15       0.823204      77.314917
16      16.241252      62.447514
17      31.659300      62.998158
18      38.817680      56.941068
19      38.817680      56.941068
As you can see, we have the X and Y coordinates for the image’s outline in a simple DataFrame of two columns.
This should provide you with some clarity about the nature of the data in the file.
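Because obj_contour is already a 2 x N NumPy array, the same table can also be built without the intermediate lists by transposing it, so that each row becomes one (x, y) point. A minimal sketch, assuming pandas and NumPy are installed and using the first three contour points from the output above as stand-in data:

```python
import numpy as np
import pandas as pd

# First three points of annots['obj_contour']: row 0 is x, row 1 is y.
obj_contour = np.array([[37.165746, 61.944751, 89.476980],
                        [58.593002, 44.276243, 23.902394]])

columns = ['obj_contour_x', 'obj_contour_y']
# Transposing gives an N x 2 array: one row per (x, y) point.
df = pd.DataFrame(obj_contour.T, columns=columns)

# The zip-based approach produces the same table.
con_list = obj_contour.tolist()
df_zip = pd.DataFrame(list(zip(con_list[0], con_list[1])), columns=columns)

print(df.equals(df_zip))  # True
print(df.shape)           # (3, 2)
```

Transposing avoids building Python tuples, which mainly matters when a file holds many thousands of points.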
The process of creating DataFrames differs for each .mat file, but with experience and practice, creating them out of .mat files should come naturally to you.
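Since annotations like these come as one file per image (such as annotation_0001.mat), a common pattern is to loop over the files and stack the per-file DataFrames, keeping a column that records where each row came from. The sketch below assumes SciPy and pandas are installed; it creates two small stand-in files with made-up coordinates in a temporary directory so it runs on its own, and the helper contour_frame is hypothetical. For real data, you would point glob at the directory holding the annotation files.

```python
import glob
import os
import tempfile

import numpy as np
import pandas as pd
from scipy.io import loadmat, savemat

COLUMNS = ['obj_contour_x', 'obj_contour_y']


def contour_frame(path):
    """Hypothetical helper: one row per contour point, tagged with its file name."""
    contour = loadmat(path)['obj_contour']  # 2 x N: row 0 = x, row 1 = y
    df = pd.DataFrame(contour.T, columns=COLUMNS)
    df['file'] = os.path.basename(path)
    return df


with tempfile.TemporaryDirectory() as tmp:
    # Write two small stand-in annotation files (made-up coordinates).
    savemat(os.path.join(tmp, 'annotation_0001.mat'),
            {'obj_contour': np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])})
    savemat(os.path.join(tmp, 'annotation_0002.mat'),
            {'obj_contour': np.array([[7.0, 8.0], [9.0, 10.0]])})

    paths = sorted(glob.glob(os.path.join(tmp, 'annotation_*.mat')))
    all_contours = pd.concat([contour_frame(p) for p in paths], ignore_index=True)

print(all_contours)
```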
That’s all for this article!
Conclusion
You now know how to work with .mat files in Python, and how to create DataFrames in pandas from their contents.
The next step in working with this data would be to create your own models, or to use existing ones for training or testing on your copy of the dataset.
How do I load a .MAT dataset in Python?
1. Install the package: pip install pymatreader
2. Import the relevant function from this package: from pymatreader import read_mat
3. Use the function to read the MATLAB struct: data = read_mat('matlab_struct.mat')
4. Use data.keys() to locate where the data is actually stored.
How do I open a MATLAB data file in Python?
There are several ways to read .mat files in Python:
- Use the scipy.io module to read .mat files in Python.
- Use the NumPy module to read .mat files in Python.
- Use the mat4py module to read .mat files in Python.
- Use the matlab.engine module to read .mat files in Python.
What can open .MAT files?
MATLAB from MathWorks can open MAT files that are used by that program.
How do I open a .MAT file without MATLAB?
It is not possible to open it with a text editor (unless you have a special plugin, as Dennis Jaheruddin says). Otherwise, you will have to convert it into a text file (CSV, for example) with a script. This could be done with Python, for example, as described in "Read .mat files in Python".
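As a sketch of that conversion, assuming SciPy and pandas are installed: the code below loads one variable from a .mat file and writes it out as CSV, which a text editor or spreadsheet can open. It creates a small stand-in file first so that it runs on its own; the helper name mat_variable_to_csv and the file names are hypothetical, and it only handles 2D numeric variables.

```python
import os
import tempfile

import numpy as np
import pandas as pd
from scipy.io import loadmat, savemat


def mat_variable_to_csv(mat_path, variable, csv_path):
    """Hypothetical helper: write one 2D numeric variable from a .mat file to CSV."""
    values = loadmat(mat_path)[variable]
    if values.ndim != 2:
        raise ValueError(f'{variable} is not a 2D array (ndim={values.ndim})')
    # No header row and no index column: just the raw values, one row per line.
    pd.DataFrame(values).to_csv(csv_path, index=False, header=False)


with tempfile.TemporaryDirectory() as tmp:
    mat_path = os.path.join(tmp, 'example.mat')
    csv_path = os.path.join(tmp, 'example.csv')
    savemat(mat_path, {'box_coord': np.array([[2, 300, 1, 260]], dtype=np.uint16)})

    mat_variable_to_csv(mat_path, 'box_coord', csv_path)
    with open(csv_path) as f:
        csv_text = f.read()

print(csv_text.strip())  # 2,300,1,260
```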
From the website harveymomstudy.com