Should I store a very large amount of data as .mat files or as .csv files? Which format is (1) more efficient to read and (2) more compact on disk?
I think the answer to both (1) and (2) is the .mat file. ASCII formats such as CSV have to be converted to and from the binary representation used in memory on every write and read, which makes them slow. Moreover, each digit in a text file costs one byte, plus a decimal point and a delimiter, so values written with more than about 6 significant figures take more space than the 8 bytes per value of the usual double-precision binary format. Therefore, you should probably only use CSV if either (a) you need to exchange data with software that can read CSV but cannot read MAT (like Excel) or (b) you want to be able to peruse the data yourself in a text editor or CSV editor.
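If you want to check this on your own machine, here is a rough sketch. It assumes R2019a or later (for writematrix and readmatrix); the file names and the 1000-by-10 test matrix are made up for illustration, and the timings will depend on your disk and MATLAB version.

```matlab
% Hypothetical test data and file names, for illustration only.
x = randn(1000, 10);
csvFile = fullfile(tempdir, 'demo_data.csv');
matFile = fullfile(tempdir, 'demo_data.mat');

% Write the same matrix as text and as a MAT file.
writematrix(x, csvFile);
save(matFile, 'x');

% Time reading each file back.
tic; xCsv = readmatrix(csvFile); tCsv = toc;
tic; s = load(matFile);          tMat = toc;

% Compare sizes on disk.
dCsv = dir(csvFile);
dMat = dir(matFile);
fprintf('CSV: %d bytes, read in %.4f s\n', dCsv.bytes, tCsv);
fprintf('MAT: %d bytes, read in %.4f s\n', dMat.bytes, tMat);
```

Note that the values read back from the CSV can differ from the originals in the last digits, because the text form is rounded when it is written, whereas the MAT file returns the doubles bit for bit.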
As an aside, if speed is your absolute goal, consider using
save myfile myvariable -v6
(note that the file name comes first, then the variable name) because both the save() and load() commands are much quicker when compression is disabled like this (compression was not available in Version 6 of MATLAB). Conversely, if small file size is your goal, use the usual save/load commands.
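A quick way to see the difference is to time both variants on the same variable. This is only a sketch: the file names and the test matrix are invented for the example, and how big the gap is will depend on your data and hardware.

```matlab
% Hypothetical test data (about 32 MB of doubles) and file names.
x = randn(2000, 2000);
f6 = fullfile(tempdir, 'demo_v6.mat');
f7 = fullfile(tempdir, 'demo_v7.mat');

% Save without compression (-v6) and with compression (-v7).
tic; save(f6, 'x', '-v6'); tSave6 = toc;
tic; save(f7, 'x', '-v7'); tSave7 = toc;

% Load each file back.
tic; s6 = load(f6); tLoad6 = toc;
tic; s7 = load(f7); tLoad7 = toc;

fprintf('-v6: save %.3f s, load %.3f s\n', tSave6, tLoad6);
fprintf('-v7: save %.3f s, load %.3f s\n', tSave7, tLoad7);
```

One caveat for very large data: the -v6 and -v7 formats cannot hold a single variable of 2 GB or more. For that MATLAB needs -v7.3, so you would either split the data across several variables or files, or switch to that format.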
Depending on the data, you might get some additional savings by first casting to a smaller data type. For instance, if you are storing data as double precision but are confident that single precision will be enough... try this:
a = randn(1000, 1);
save a1 a
a = single(a);
save a2 a
and check the file sizes. Note that a2 is smaller only because you have thrown away information that you yourself judged to be irrelevant.
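If you would rather compare the sizes from within MATLAB than in a file browser, dir() reports them in bytes. A minimal sketch, using made-up file names in the temporary folder so that nothing in your working directory is overwritten:

```matlab
% Hypothetical file names, for illustration only.
fDouble = fullfile(tempdir, 'demo_double.mat');
fSingle = fullfile(tempdir, 'demo_single.mat');

a = randn(1000, 1);      % double precision, 8 bytes per value
save(fDouble, 'a');

a = single(a);           % single precision, 4 bytes per value
save(fSingle, 'a');

% dir() returns a struct whose 'bytes' field is the size on disk.
dDouble = dir(fDouble);
dSingle = dir(fSingle);
fprintf('double: %d bytes\nsingle: %d bytes\n', dDouble.bytes, dSingle.bytes);
```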
My suggestion after years of being angry at reading in a binary file with the wrong dimensions: use .mat files.
I am going to use the files in both MATLAB and R. R does have packages for loading .mat files. So, is .mat or .csv preferred in terms of read speed and file size?
I am talking about, let's say, 500 GB of data.