Cleaning dirty data in a dataframe

Let's assume I've used pd.read_csv to import a CSV file.

The output of df.head() looks something like this:



Date,"Last","Open","Max","Min","Volume","Delta %"

04.01.2021,"26,65","25,77","27,57","24,32","-","7,75%"
30.12.2020,"24,73","25,00","25,27","24,42","-","-0,34%"
29.12.2020,"24,82","23,73","24,89","23,21","-","3,71%"
28.12.2020,"23,93","25,36","25,42","23,93","-","-5,48%"

Some questions, please:
1) How do I get rid of the " in the heading?
2) How do I convert the Date column to a format like '%Y-%m-%d'? I tried df["Date"].apply(lambda x: x.replace(".", "-")) but with no success.
3) Finally, how do I get rid of the " in the values belonging to the other columns and convert them to float? I tried this, but unsuccessfully: df['Last'].apply(pd.to_numeric())
Thanks

Hi Gian,



1.  You can strip the first and last characters (the leading and trailing ") from each column name, either by string slicing (name[1:-1]) or with name.strip('"').
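
For the header, a minimal sketch of that idea could look like the following. The small frame here is made up for illustration and only assumes that the column names really do contain literal double quotes after import.

```python
import pandas as pd

# Hypothetical frame whose column names still carry literal double quotes.
df = pd.DataFrame({'Date': ['04.01.2021'], '"Last"': ['"26,65"'], '"Open"': ['"25,77"']})

# Strip leading/trailing double quotes from every column name at once.
df.columns = df.columns.str.strip('"')

print(list(df.columns))
```

Using strip('"') rather than slicing leaves names without quotes (like Date) untouched.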



2. You can use a similar lambda function built on Python's datetime.strptime (pandas also offers pd.to_datetime for whole columns):

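from datetime import datetime

datetime_str = '04.01.2021'

# Parse the day.month.year string into a datetime object
datetime_object = datetime.strptime(datetime_str, '%d.%m.%Y')
print(datetime_object)

# Format it as year-month-day, the layout asked for in the question
date_time = datetime_object.strftime('%Y-%m-%d')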
print(date_time)
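
On a whole column, the same conversion can be sketched with pd.to_datetime. The two-row frame and the Date_str column name are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical sample using the day.month.year layout from the question.
df = pd.DataFrame({'Date': ['04.01.2021', '30.12.2020']})

# Parse the strings into real datetime values; an explicit format avoids
# any day/month ambiguity.
df['Date'] = pd.to_datetime(df['Date'], format='%d.%m.%Y')

# Optional: if you need text in '%Y-%m-%d' form rather than datetime values.
df['Date_str'] = df['Date'].dt.strftime('%Y-%m-%d')

print(df)
```

Keeping the column as datetime is usually more convenient for sorting and filtering; the string version is only needed for display or export.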


3. Try applying a function like this to each column you want to convert:

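from re import sub
from decimal import Decimal

money = '6,50989%'
# Swap the decimal comma for a dot, then drop everything except digits,
# the dot and the minus sign (so negative values like '-0,34%' keep their sign).
# Note: a bare '-' (as in the Volume column) has no digits and would raise here.
value = Decimal(sub(r'[^\d.-]', '', money.replace(',', '.')))
float(value)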
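
A pandas-only variant of the same cleanup might look like this. It is just a sketch: to_float is a hypothetical helper name, the sample rows are made up from the question's data, and treating a bare "-" as missing (NaN) is an assumption about what you want for Volume.

```python
import pandas as pd

# Hypothetical sample mirroring the question: quoted values, decimal commas,
# percent signs, and a bare "-" placeholder.
df = pd.DataFrame({
    'Last': ['"26,65"', '"24,73"'],
    'Volume': ['"-"', '"-"'],
    'Delta %': ['"7,75%"', '"-0,34%"'],
})

def to_float(col: pd.Series) -> pd.Series:
    """Strip quotes and '%', swap the decimal comma, and convert to float."""
    cleaned = (
        col.str.strip('"')
           .str.rstrip('%')
           .str.replace(',', '.', regex=False)
    )
    # errors='coerce' turns anything unparseable (such as '-') into NaN.
    return pd.to_numeric(cleaned, errors='coerce')

for name in ['Last', 'Volume', 'Delta %']:
    df[name] = to_float(df[name])

print(df.dtypes)
```

If the file can be re-read, pd.read_csv also accepts decimal=',' which handles the comma at import time, though the percent signs would still need stripping afterwards.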

Hope this helps!