Pupli – Page 74

Select fastest apt mirror on Ubuntu Linux

December 7, 2019

sudo nano /etc/apt/sources.list

replace

deb http://us.archive.ubuntu.com/ubuntu/ xenial main restricted

with

deb mirror://mirrors.ubuntu.com/mirrors.txt xenial main restricted
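The same substitution can be scripted rather than done by hand in an editor. The sketch below is only an illustration: the helper name use_mirror_method is hypothetical, and it assumes simple one-line entries without bracketed options such as [arch=amd64].

```python
# Hypothetical helper: rewrite one sources.list line to use the mirror:// method.
MIRROR_URL = "mirror://mirrors.ubuntu.com/mirrors.txt"


def use_mirror_method(line: str) -> str:
    """Swap the archive URL of a 'deb' or 'deb-src' line for the mirror list URL."""
    parts = line.split()
    # Leave comments, blank lines and anything that is not a deb entry untouched.
    if len(parts) < 3 or parts[0] not in ("deb", "deb-src"):
        return line
    # Only rewrite entries that point at an Ubuntu archive host.
    if "archive.ubuntu.com" not in parts[1]:
        return line
    parts[1] = MIRROR_URL
    return " ".join(parts)


original = "deb http://us.archive.ubuntu.com/ubuntu/ xenial main restricted"
print(use_mirror_method(original))
```

Applying the function to every line of a copy of /etc/apt/sources.list, and reviewing the result before replacing the real file, would be one cautious way to use it.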

Manual apt mirror selection

wget -qO - mirrors.ubuntu.com/mirrors.txt

References
https://linuxconfig.org/how-to-select-the-fastest-apt-mirror-on-ubuntu-linux

Change Data Type of columns in Pandas Dataframe

December 6, 2019

Method #1: Using DataFrame.astype()

# importing pandas as pd 
import pandas as pd 

# sample dataframe 
df = pd.DataFrame({ 
  'A': [1, 2, 3, 4, 5], 
  'B': ['a', 'b', 'c', 'd', 'e'], 
  'C': [1.1, '1.0', '1.3', 2, 5] }) 

# converting all columns to string type 
df = df.astype(str) 
print(df.dtypes)

# importing pandas as pd 
import pandas as pd 

# sample dataframe 
df = pd.DataFrame({ 
  'A': [1, 2, 3, 4, 5], 
  'B': ['a', 'b', 'c', 'd', 'e'], 
  'C': [1.1, '1.0', '1.3', 2, 5] }) 

# using dictionary to convert specific columns 
convert_dict = {'A': int, 
        'C': float
      } 

df = df.astype(convert_dict) 
print(df.dtypes)

Method #2: Using DataFrame.apply()

We can pass pandas.to_numeric, pandas.to_datetime or pandas.to_timedelta as the argument to the apply() function to change the data type of one or more columns to numeric, datetime or timedelta respectively.

# importing pandas as pd 
import pandas as pd 

# sample dataframe 
df = pd.DataFrame({ 
  'A': [1, 2, 3, '4', '5'], 
  'B': ['a', 'b', 'c', 'd', 'e'], 
  'C': [1.1, '2.1', 3.0, '4.1', '5.1'] }) 

# using apply method 
df[['A', 'C']] = df[['A', 'C']].apply(pd.to_numeric) 
print(df.dtypes)
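If a column contains values that cannot be parsed, pd.to_numeric raises an error by default. A minimal sketch, assuming a made-up sample frame, of passing errors='coerce' through apply() so that bad values become NaN instead:

```python
import pandas as pd

# Hypothetical sample: column 'A' contains one value that is not a number.
df = pd.DataFrame({
    'A': [1, '2', 'oops'],
    'C': ['1.5', 2.5, '3.5']})

# Keyword arguments given to apply() are forwarded to pd.to_numeric.
# errors='coerce' turns unparseable values into NaN instead of raising.
df[['A', 'C']] = df[['A', 'C']].apply(pd.to_numeric, errors='coerce')

print(df)
print(df.dtypes)
```

Coercing silently hides bad data, so it may be worth counting the NaN values afterwards with df.isna().sum().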

Method #3: Using DataFrame.infer_objects()

# importing pandas as pd 
import pandas as pd 

# sample dataframe 
df = pd.DataFrame({ 
  'A': [1, 2, 3, 4, 5], 
  'B': ['a', 'b', 'c', 'd', 'e'], 
  'C': [1.1, 2.1, 3.0, 4.1, 5.1] 
  }, dtype ='object') 

# converting datatypes 
df = df.infer_objects() 
print(df.dtypes)
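infer_objects() only detects values that are already numbers stored in an object column; it does not parse strings. A small sketch with hypothetical data to illustrate the difference:

```python
import pandas as pd

# Hypothetical sample: 'A' holds real integers, 'B' holds digits stored as strings.
df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': ['1', '2', '3']}, dtype='object')

# 'A' is inferred as an integer column, 'B' is not converted to a number.
inferred = df.infer_objects()
print(inferred.dtypes)

# Strings need an explicit conversion, for example with pd.to_numeric.
converted = inferred.assign(B=pd.to_numeric(inferred['B']))
print(converted.dtypes)
```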

References
https://www.geeksforgeeks.org/change-data-type-for-one-or-more-columns-in-pandas-dataframe/
https://towardsdatascience.com/my-pandas-cheat-sheet-b71437ab26f

Working on a copy of Pandas DataFrame

December 6, 2019

import pandas as pd

df: pd.DataFrame = pd.read_csv("avocado.csv")

df_cp = df.copy()
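To see why the copy matters, the sketch below changes a value in the copy and checks that the original is untouched. It uses a small made-up frame instead of avocado.csv so that it runs on its own.

```python
import pandas as pd

# Hypothetical stand-in for the CSV data.
df = pd.DataFrame({
    "region": ["Albany", "Boston"],
    "AveragePrice": [1.33, 1.50]})

# copy() makes a deep copy by default, so the two frames do not share data.
df_cp = df.copy()
df_cp.loc[0, "AveragePrice"] = 9.99

print(df.loc[0, "AveragePrice"])     # value in the original frame
print(df_cp.loc[0, "AveragePrice"])  # value in the modified copy
```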

References
https://pythonprogramming.net/introduction-python3-pandas-data-analysis/

Sort Index in Pandas DataFrame

December 6, 2019

import pandas as pd

df: pd.DataFrame = pd.read_csv("avocado.csv")

df.set_index("Date", inplace=True)
df.sort_index(inplace=True)

print(df.index)
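When the index is read from a CSV it usually holds plain strings. A minimal sketch, assuming a made-up frame with ISO-formatted dates, that converts the column with pd.to_datetime first so the index sorts chronologically, and also shows descending order:

```python
import pandas as pd

# Hypothetical sample with dates in no particular order.
df = pd.DataFrame({
    "Date": ["2015-12-27", "2015-01-04", "2015-06-14"],
    "AveragePrice": [1.33, 1.22, 1.05]})

# Parse the strings into real timestamps before using them as the index.
df["Date"] = pd.to_datetime(df["Date"])
df = df.set_index("Date").sort_index()
print(df.index)

# ascending=False puts the most recent rows first.
newest_first = df.sort_index(ascending=False)
print(newest_first)
```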

References
https://pythonprogramming.net/graph-visualization-python3-pandas-data-analysis/

Set Index in Pandas DataFrame

December 6, 2019

import pandas as pd

df: pd.DataFrame = pd.read_csv("avocado.csv")

# index of dataframe
print(df.index)

# set index in dataframe
df.set_index("Date", inplace=True)

# or set the index this way
# df = df.set_index("Date")

# print index
print(df.index)
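Once a column is the index, rows can be looked up by label with .loc, and reset_index() reverses the operation. A short sketch on a made-up frame:

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    "Date": ["2015-01-04", "2015-01-11"],
    "AveragePrice": [1.22, 1.24]})

# Without inplace=True, set_index returns a new frame and leaves df unchanged.
indexed = df.set_index("Date")
print(indexed.loc["2015-01-11", "AveragePrice"])

# reset_index moves 'Date' back to a regular column.
restored = indexed.reset_index()
print(restored.columns.tolist())
```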

References
https://pythonprogramming.net/introduction-python3-pandas-data-analysis/

Filter by values of a column in Pandas DataFrame

December 6, 2019

import pandas as pd

df: pd.DataFrame = pd.read_csv("avocado.csv")
albany_df = df[df["region"] == "Albany"]
# albany_df = df[df["region"] == "Albany"].copy()
print(albany_df)

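To filter on several columns at once, combine the conditions with & (this example assumes a DataFrame with Gender and Year columns):

males = df[(df['Gender'] == 'Male') & (df['Year'] == 2014)]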

To store the filtered DataFrames in a dict using a for loop:

dic = {}
for g in ['Male', 'Female']:
  dic[g] = {}
  for y in [2013, 2014]:
    # store the DataFrames in a dict of dicts, keyed by gender then year
    dic[g][y] = df[(df['Gender'] == g) & (df['Year'] == y)]
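An alternative to the nested loop is to let groupby() do the splitting. This is a sketch on made-up data, not the approach from the referenced answer; the Score column is hypothetical.

```python
import pandas as pd

# Hypothetical sample data with the Gender and Year columns used above.
df = pd.DataFrame({
    "Gender": ["Male", "Female", "Male", "Female"],
    "Year": [2013, 2013, 2014, 2014],
    "Score": [10, 20, 30, 40]})

# Iterating over a groupby yields (key, sub-DataFrame) pairs;
# with two grouping columns each key is a (Gender, Year) tuple.
frames = {key: group for key, group in df.groupby(["Gender", "Year"])}

print(frames[("Male", 2014)])
```

The result is a flat dict keyed by tuples rather than a dict of dicts, which avoids listing the values of each column by hand.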

References
https://pythonprogramming.net/introduction-python3-pandas-data-analysis/
https://stackoverflow.com/questions/22086116/how-do-you-filter-pandas-dataframes-by-multiple-columns

Access specific column in Pandas DataFrame

December 6, 2019

import pandas as pd

df: pd.DataFrame = pd.read_csv("avocado.csv")
print(df["AveragePrice"])
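Single brackets with one column name return a Series, while a list of names returns a DataFrame. A small sketch with made-up data to show the difference:

```python
import pandas as pd

# Hypothetical stand-in for the CSV data.
df = pd.DataFrame({
    "region": ["Albany", "Boston"],
    "AveragePrice": [1.33, 1.50]})

prices = df["AveragePrice"]              # one name: a Series
subset = df[["region", "AveragePrice"]]  # list of names: a DataFrame

print(type(prices).__name__)
print(type(subset).__name__)
```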

References
https://pythonprogramming.net/introduction-python3-pandas-data-analysis/

Add rows to Pandas DataFrame

December 5, 2019

Create two data frames and append the second to the first one

# Importing pandas as pd 
import pandas as pd 

# Creating the first Dataframe using dictionary 
df1 = df = pd.DataFrame({"a":[1, 2, 3, 4], 
            "b":[5, 6, 7, 8]}) 

# Creating the Second Dataframe using dictionary 
df2 = pd.DataFrame({"a":[1, 2, 3], 
          "b":[5, 6, 7]}) 

# Print df1 
print(df1, "\n") 

# Print df2 
print(df2)

Now append df2 at the end of df1. DataFrame.append() was deprecated in pandas 1.4 and removed in pandas 2.0, so the examples below use pd.concat(), which gives the same result.

# to append df2 at the end of df1 dataframe 
pd.concat([df1, df2])

Notice that the index values of the second data frame are kept in the combined data frame. To avoid this, set ignore_index=True.

# A continuous index value will be maintained 
# across the rows in the new appended data frame. 
pd.concat([df1, df2], ignore_index=True)
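The index behaviour described above can be checked directly. The sketch below reuses the two frames from this post and relies on pd.concat, the replacement for DataFrame.append in pandas 2.0 and later.

```python
import pandas as pd

df1 = pd.DataFrame({"a": [1, 2, 3, 4], "b": [5, 6, 7, 8]})
df2 = pd.DataFrame({"a": [1, 2, 3], "b": [5, 6, 7]})

# Default: each frame keeps its own index labels, so labels repeat.
kept = pd.concat([df1, df2])
print(kept.index.tolist())        # [0, 1, 2, 3, 0, 1, 2]

# ignore_index=True renumbers the rows from 0.
renumbered = pd.concat([df1, df2], ignore_index=True)
print(renumbered.index.tolist())  # [0, 1, 2, 3, 4, 5, 6]
```

Repeated labels make label-based lookups such as kept.loc[0] return more than one row, which is the usual reason to renumber.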

References
https://www.geeksforgeeks.org/python-pandas-dataframe-append/

Writing data to Excel with Pandas

December 4, 2019

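Pandas relies on a separate engine module to write Excel files: openpyxl or XlsxWriter for .xlsx files. (The xlwt module handled the legacy .xls format in pandas versions before 2.0.)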

movies.to_excel('output.xlsx')

You can choose to skip the index by passing index=False.

movies.to_excel('output.xlsx', index=False)

For more advanced output options, such as sheet names and cell formatting, create an ExcelWriter object and use it to write to the Excel file.

writer = pd.ExcelWriter('output.xlsx', engine='xlsxwriter')
movies.to_excel(writer, index=False, sheet_name='report')
workbook = writer.book
worksheet = writer.sheets['report']

header_fmt = workbook.add_format({'bold': True})
worksheet.set_row(0, None, header_fmt)

# close() writes the file to disk; writer.save() was removed in pandas 2.0
writer.close()

References
https://www.dataquest.io/blog/excel-and-pandas/

Pandas df.size, df.shape and df.ndim

December 4, 2019

# importing pandas module 
import pandas as pd 
  
# making data frame 
data = pd.read_csv("https://media.geeksforgeeks.org/wp-content/uploads/nba.csv") 
  
# dataframe.size 
size = data.size 
  
# dataframe.shape 
shape = data.shape 
  
# dataframe.ndim 
df_ndim = data.ndim 
  
# series.ndim 
series_ndim = data["Salary"].ndim 
  
# printing size and shape 
print("Size = {}\nShape ={}\nShape[0] x Shape[1] = {}". 
format(size, shape, shape[0]*shape[1])) 
  
# printing ndim 
print("ndim of dataframe = {}\nndim of series ={}". 
format(df_ndim, series_ndim))

Size = 4122
Shape=(458, 9)
Shape[0] x Shape[1] = 4122
ndim of dataframe = 2
ndim of series=1
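The relationship between the three attributes can be verified without downloading the CSV. A minimal sketch on a made-up frame (the Name and Salary values are hypothetical):

```python
import pandas as pd

# Hypothetical sample: 3 rows, 2 columns, one missing salary.
data = pd.DataFrame({
    "Name": ["A", "B", "C"],
    "Salary": [100.0, 200.0, None]})

rows, cols = data.shape

# size counts every cell, including missing values.
print(data.size == rows * cols)

# A DataFrame always has 2 dimensions, a single column (Series) has 1.
print(data.ndim, data["Salary"].ndim)
```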

References
https://www.geeksforgeeks.org/python-pandas-df-size-df-shape-and-df-ndim/