Sunday, 13 February 2022

How to export and load a list of dataframes without pickles

I had this problem yesterday with pickles from different pandas versions not being compatible. It turned out to be a total disaster. You can read more about it here. The worst part is that I don't think there is anything you can do to fix it; they are just not compatible. That was really bad for me: all my current local files are on pandas 1.2, and a file I calculated on a remote computer was made with pandas 1.4. I couldn't downgrade the remote computer, and I didn't want to upgrade my local pandas because it would render all my existing pickles unusable. In short, a disaster.

At first I tried to load my pickle and turn it into a csv file:

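import pickle as pkl
import pandas as pd

# load whatever was pickled (in my case a list of dataframes)
with open("file.pkl", "rb") as f:
    obj = pkl.load(f)

# wrap the loaded object in a single dataframe and write it to csv
df = pd.DataFrame(obj)
df.to_csv("file.csv")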
and the reverse direction, from csv back to pickle, which is simpler:

df = pd.read_csv("file.csv")
df.to_pickle("file.pkl")

 

But that didn't work. Because I'm exporting a list of dataframes, importing the resulting csv with

df = pd.read_csv('test.csv')

gives you nonsense. I tried a lot of different options, but nothing made my list of dataframes usable; it always produced an error. So after an hour and a half of googling and trial and error, I got to the final solution: just export each dataframe to its own csv file. This doesn't work very well for me, because I have 9 dataframes per model, which means a lot of csv files. That is something I wanted to avoid, and it was exactly the reason to export to pickle in the first place. But for backward and forward compatibility, I think this is the best way to go. Just export everything, then load it back into the same structure, and that's it.

#####

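1. Load the pickle (on the computer whose pandas version can read it)

2. Save each dataframe to its own csv file

# df is the list of 9 dataframes loaded from the pickle
for i in range(9):
    # to_csv opens the file in write mode, so rerunning overwrites instead of appending
    df[i].to_csv("test" + str(i) + ".csv")

3. Copy the files between computers

4. Read them back into a list

dfx = []
for i in range(9):
    # index_col=0 restores the saved index instead of creating an "Unnamed: 0" column
    dfx.append(pd.read_csv("test" + str(i) + ".csv", index_col=0))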
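
The steps above can be wrapped into two small functions, with a check that the round trip gives back the same data. This is only a minimal sketch: save_frames and load_frames are names I made up for illustration, and the toy dataframes stand in for the real list of 9. The comparison uses pandas' own pd.testing.assert_frame_equal.

```python
import os
import tempfile

import pandas as pd


def save_frames(frames, prefix):
    """Write each dataframe in the list to prefix0.csv, prefix1.csv, ..."""
    for i, frame in enumerate(frames):
        frame.to_csv(prefix + str(i) + ".csv")


def load_frames(prefix, count):
    """Read prefix0.csv ... back into a list, restoring the saved index."""
    return [
        pd.read_csv(prefix + str(i) + ".csv", index_col=0)
        for i in range(count)
    ]


# Toy data standing in for the real list of 9 dataframes (an assumption).
frames = [
    pd.DataFrame({"a": [i, i + 1], "b": [0.5 * i, 1.5 * i]})
    for i in range(9)
]

with tempfile.TemporaryDirectory() as tmp:
    prefix = os.path.join(tmp, "test")
    save_frames(frames, prefix)
    dfx = load_frames(prefix, len(frames))

# Raises AssertionError if any dataframe came back different.
for original, loaded in zip(frames, dfx):
    pd.testing.assert_frame_equal(original, loaded)
```

Keep in mind that csv stores everything as text, so columns such as dates come back as plain strings unless you parse them again when reading.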

Enjoy a working list of dataframes!!!!
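
If the pile of loose csv files is the annoying part, one possible workaround is to keep the same csv export but put all the files into a single zip archive, so there is only one file to copy. This is a sketch of that idea using Python's standard zipfile module, not something I tested on my real data. The function names save_frames_zip and load_frames_zip and the member names 0.csv, 1.csv, ... are my own assumptions.

```python
import io
import os
import tempfile
import zipfile

import pandas as pd


def save_frames_zip(frames, zip_path):
    """Store each dataframe as 0.csv, 1.csv, ... inside one zip archive."""
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for i, frame in enumerate(frames):
            # to_csv() with no path returns the csv text as a string
            zf.writestr(str(i) + ".csv", frame.to_csv())


def load_frames_zip(zip_path):
    """Read the csv members back into a list, in numeric order."""
    with zipfile.ZipFile(zip_path) as zf:
        names = sorted(zf.namelist(), key=lambda n: int(n.split(".")[0]))
        return [
            pd.read_csv(io.BytesIO(zf.read(name)), index_col=0)
            for name in names
        ]


# Toy data standing in for the real list of 9 dataframes (an assumption).
frames_to_zip = [pd.DataFrame({"a": [i, i + 1]}) for i in range(9)]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "frames.zip")
    save_frames_zip(frames_to_zip, path)
    dfx_zip = load_frames_zip(path)
```

The archive contains ordinary csv files, so it does not depend on the pandas version and can still be opened by hand if something goes wrong.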

Below are some of the places I read during my all-nighter to reach this solution. I know I probably should have given up on that path earlier, but damn, I wanted to find an elegant solution to just convert pickles from pandas 1.2 to pandas 1.4 and vice versa. And I couldn't. I guess the only other thing I could do is create a new conda environment with the newer pandas and work from there when needed. But considering how much trouble creating my current working conda env was (some Fortran problems, then some other problems, then some other other problems), I just couldn't be bothered with it. Even though it probably would have been easier and quicker.

Some useful info on csv files and pickles:
https://stackoverflow.com/questions/36519086/how-to-get-rid-of-unnamed-0-column-in-a-pandas-dataframe-read-in-from-csv-fil
https://docs.python.org/3/library/csv.html
https://www.listendata.com/2019/06/pandas-read-csv.html
https://www.geeksforgeeks.org/python-read-csv-using-pandas-read_csv/
https://e2eml.school/data_files.html
https://www.analyticsvidhya.com/blog/2021/08/python-tutorial-working-with-csv-file-for-data-science/
https://stackoverflow.com/questions/42168420/how-to-dill-pickle-to-file