
Pandas Descriptive Statistics

Examples of descriptive statistics in Pandas

A DataFrame supports a wide range of calculations, descriptive statistics, and related operations. Most of these are aggregations such as sum() and mean(), but some, such as cumsum(), produce an object of the same size. In general, these methods take an axis parameter, just like ndarray.{sum, std, ...}, but the axis can be specified either by name or by integer: index (axis=0, the default) or columns (axis=1).
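As a minimal sketch of the axis parameter described above, the following snippet sums a small frame by integer axis and by axis name, and compares an aggregation with cumsum(). The frame and its column names a and b are made up for this illustration and are not part of the chapter's data.

```python
import pandas as pd

# Small illustrative frame (values are assumptions made for this sketch)
df = pd.DataFrame({"a": [1, 2, 3], "b": [10.0, 20.0, 30.0]})

# Aggregate down the rows: one value per column
col_totals_int = df.sum(axis=0)
col_totals_name = df.sum(axis="index")

# Aggregate across the columns: one value per row
row_totals_int = df.sum(axis=1)
row_totals_name = df.sum(axis="columns")

# cumsum() is not a reduction: the result has the same shape as the input
running = df.cumsum()

print(col_totals_int.equals(col_totals_name))
print(row_totals_int.equals(row_totals_name))
print(running.shape == df.shape)
```

The integer and the name forms select the same axis, so either spelling can be used depending on which reads better in context.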

Let us create a DataFrame and use this object throughout this chapter for all the operations.

Example

 import pandas as pd
 import numpy as np

 # Create a dictionary of Series
 d = {'Name':pd.Series(['Tom','James','Ricky','Vin','Steve','Smith','Jack',
    'Lee','David','Gasper','Betina','Andres']),
    'Age':pd.Series([25,26,25,23,30,29,23,34,40,30,51,46]),
    'Rating':pd.Series([4.23,3.24,3.98,2.56,3.20,4.6,3.8,3.78,2.98,4.80,4.10,3.65])
 }

 # Create a DataFrame
 df = pd.DataFrame(d)
 print(df)

Output:

    Age    Name  Rating
0    25     Tom    4.23
1    26   James    3.24
2    25   Ricky    3.98
3    23     Vin    2.56
4    30   Steve    3.20
5    29   Smith    4.60
6    23    Jack    3.80
7    34     Lee    3.78
8    40   David    2.98
9    30  Gasper    4.80
10   51  Betina    4.10
11   46  Andres    3.65

sum()

Returns the sum of the values for the requested axis. By default, the axis is the index (axis=0).

 import pandas as pd
 import numpy as np

 # Create a dictionary of Series
 d = {'Name':pd.Series(['Tom','James','Ricky','Vin','Steve','Smith','Jack',
    'Lee','David','Gasper','Betina','Andres']),
    'Age':pd.Series([25,26,25,23,30,29,23,34,40,30,51,46]),
    'Rating':pd.Series([4.23,3.24,3.98,2.56,3.20,4.6,3.8,3.78,2.98,4.80,4.10,3.65])
 }

 # Create a DataFrame
 df = pd.DataFrame(d)
 print(df.sum())

Output:

Age                                                     382
Name      TomJamesRickyVinSteveSmithJackLeeDavidGasperBe...
Rating                                                44.92
dtype: object

Each column is summed individually; the strings in the Name column are concatenated.

axis=1

Passing axis=1 sums the values across each row:

 import pandas as pd
 import numpy as np

 # Create a dictionary of Series
 d = {'Name':pd.Series(['Tom','James','Ricky','Vin','Steve','Smith','Jack',
    'Lee','David','Gasper','Betina','Andres']),
    'Age':pd.Series([25,26,25,23,30,29,23,34,40,30,51,46]),
    'Rating':pd.Series([4.23,3.24,3.98,2.56,3.20,4.6,3.8,3.78,2.98,4.80,4.10,3.65])
 }

 # Create a DataFrame
 df = pd.DataFrame(d)
 # numeric_only=True skips the Name column; pandas 2.0 and later raise
 # a TypeError when a row sum mixes strings and numbers
 print(df.sum(axis=1, numeric_only=True))

Output:

0     29.23
1     29.24
2     28.98
3     25.56
4     33.20
5     33.60
6     26.80
7     37.78
8     42.98
9     34.80
10    55.10
11    49.65
dtype: float64

mean()

Returns the mean (average) of the values in each numeric column.

 import pandas as pd
 import numpy as np

 # Create a dictionary of Series
 d = {'Name':pd.Series(['Tom','James','Ricky','Vin','Steve','Smith','Jack',
    'Lee','David','Gasper','Betina','Andres']),
    'Age':pd.Series([25,26,25,23,30,29,23,34,40,30,51,46]),
    'Rating':pd.Series([4.23,3.24,3.98,2.56,3.20,4.6,3.8,3.78,2.98,4.80,4.10,3.65])
 }

 # Create a DataFrame
 df = pd.DataFrame(d)
 # numeric_only=True restricts the result to Age and Rating; pandas 2.0
 # and later raise a TypeError for the string column otherwise
 print(df.mean(numeric_only=True))

Output:

Age       31.833333
Rating     3.743333
dtype: float64

std()

Returns the Bessel-corrected (sample) standard deviation of the numeric columns.

 import pandas as pd
 import numpy as np

 # Create a dictionary of Series
 d = {'Name':pd.Series(['Tom','James','Ricky','Vin','Steve','Smith','Jack',
    'Lee','David','Gasper','Betina','Andres']),
    'Age':pd.Series([25,26,25,23,30,29,23,34,40,30,51,46]),
    'Rating':pd.Series([4.23,3.24,3.98,2.56,3.20,4.6,3.8,3.78,2.98,4.80,4.10,3.65])
 }

 # Create a DataFrame
 df = pd.DataFrame(d)
 # numeric_only=True restricts the result to Age and Rating; pandas 2.0
 # and later raise a TypeError for the string column otherwise
 print(df.std(numeric_only=True))

Output:

Age       9.232682
Rating    0.661628
dtype: float64
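To make the Bessel correction used by std() concrete, here is a small sketch that reuses the Age values from this chapter. It compares the pandas default (ddof=1, dividing by n - 1) with the population form (ddof=0, dividing by n, which is also the NumPy default) and with a manual calculation. The variable names are chosen for this illustration only.

```python
import numpy as np
import pandas as pd

# Age values from the example DataFrame in this chapter
ages = pd.Series([25, 26, 25, 23, 30, 29, 23, 34, 40, 30, 51, 46])

sample_std = ages.std()              # pandas default: ddof=1, divide by n - 1
population_std = ages.std(ddof=0)    # divide by n instead
numpy_std = np.std(ages.to_numpy())  # NumPy default: ddof=0

# Manual Bessel-corrected computation for comparison
n = len(ages)
mean = ages.sum() / n
manual_std = (((ages - mean) ** 2).sum() / (n - 1)) ** 0.5

print(sample_std, manual_std)
print(population_std, numpy_std)
```

The sample value is always at least as large as the population value, because the same sum of squared deviations is divided by n - 1 rather than n.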

Functions & Description

The following table lists the important descriptive statistics functions in Python Pandas:

No.   Method      Description
1     count()     Number of non-null values
2     sum()       Sum of values
3     mean()      Mean of values
4     median()    Median of values
5     mode()      Mode of values
6     std()       Standard deviation of values
7     min()       Minimum value
8     max()       Maximum value
9     abs()       Absolute value
10    prod()      Product of values
11    cumsum()    Cumulative sum
12    cumprod()   Cumulative product
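A short sketch of several functions from the table, applied to a single Series. The Age values come from this chapter; the second Series, small, is a hypothetical one introduced only so that abs(), prod(), and cumprod() have easy-to-check results.

```python
import pandas as pd

# Age values from the example DataFrame in this chapter
ages = pd.Series([25, 26, 25, 23, 30, 29, 23, 34, 40, 30, 51, 46])

# Reductions: each call collapses the Series to a single result
n_values = ages.count()        # number of non-null values
middle = ages.median()         # median
modes = ages.mode().tolist()   # mode() returns a Series because ties are possible
lowest, highest = ages.min(), ages.max()

# Hypothetical small Series, introduced only for this illustration
small = pd.Series([-1, 2, -3, 4])
magnitudes = small.abs().tolist()           # element-wise, same length as the input
product = small.prod()                      # reduction to one value
running_product = small.cumprod().tolist()  # element-wise, same length as the input

print(n_values, middle, modes, lowest, highest)
print(magnitudes, product, running_product)
```

Note that mode() can return more than one value: here three ages occur twice each, so all three are reported.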
Note: Since a DataFrame is a heterogeneous data structure, generic operations do not work with all functions.
  • Functions such as sum() and cumsum() work with both numeric and character (or string) data elements without any error. Although aggregating character data is uncommon in practice, these functions do not throw an exception.
  • Functions such as abs() and cumprod() throw an exception when the DataFrame contains character or string data, because such operations cannot be performed.
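The behaviour described in the note can be checked with a sketch like the following. It assumes an object-dtype Series of strings (a shortened copy of the Name column); raises_type_error is a hypothetical helper written for this illustration, not a pandas function.

```python
import pandas as pd

# Object-dtype Series of names (a shortened copy of the chapter's Name column)
names = pd.Series(['Tom', 'James', 'Ricky'], dtype=object)

joined = names.sum()               # strings are concatenated
running = names.cumsum().tolist()  # cumulative concatenation

def raises_type_error(operation):
    """Return True if calling operation() raises a TypeError."""
    try:
        operation()
    except TypeError:
        return True
    return False

# abs() and cumprod() have no meaning for strings
abs_fails = raises_type_error(names.abs)
cumprod_fails = raises_type_error(names.cumprod)

print(joined)
print(running)
print(abs_fails, cumprod_fails)
```

Addition is defined for Python strings, which is why sum() and cumsum() succeed, whereas absolute value and multiplication of two strings are not.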

Summarize data

 import pandas as pd
 import numpy as np

 # Create a dictionary of Series
 d = {'Name':pd.Series(['Tom','James','Ricky','Vin','Steve','Smith','Jack',
    'Lee','David','Gasper','Betina','Andres']),
    'Age':pd.Series([25,26,25,23,30,29,23,34,40,30,51,46]),
    'Rating':pd.Series([4.23,3.24,3.98,2.56,3.20,4.6,3.8,3.78,2.98,4.80,4.10,3.65])
 }

 # Create a DataFrame
 df = pd.DataFrame(d)
 print(df.describe())

Output:

             Age     Rating
count  12.000000  12.000000
mean   31.833333   3.743333
std     9.232682   0.661628
min    23.000000   2.560000
25%    25.000000   3.230000
50%    29.500000   3.790000
75%    35.500000   4.132500
max    51.000000   4.800000

This function reports the count, mean, standard deviation, minimum, maximum, and quartiles of each column; the IQR is the difference between the 75% and 25% values. By default it excludes character columns and summarizes only the numeric ones. The include argument specifies which columns to consider when summarizing. It takes a list of values:

  • object − Summarizes string columns
  • number − Summarizes numeric columns
  • all − Summarizes all columns together (pass it as a plain string, not as a list value)
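As a sketch of how the include argument and the quartile rows can be used together, the snippet below restricts the summary to numeric columns and derives the IQR from the 25% and 75% rows. It rebuilds the chapter's data from plain lists and uses np.number as the numeric selector; the variable names numeric_summary and iqr are chosen for this illustration.

```python
import numpy as np
import pandas as pd

# Same data as the chapter's example, built from plain lists
df = pd.DataFrame({
    'Name': ['Tom', 'James', 'Ricky', 'Vin', 'Steve', 'Smith', 'Jack',
             'Lee', 'David', 'Gasper', 'Betina', 'Andres'],
    'Age': [25, 26, 25, 23, 30, 29, 23, 34, 40, 30, 51, 46],
    'Rating': [4.23, 3.24, 3.98, 2.56, 3.20, 4.6, 3.8, 3.78, 2.98, 4.80, 4.10, 3.65],
})

# Restrict the summary to numeric columns explicitly
numeric_summary = df.describe(include=[np.number])

# describe() labels the quartile rows '25%' and '75%'; their difference is the IQR
iqr = numeric_summary.loc['75%'] - numeric_summary.loc['25%']

print(list(numeric_summary.columns))
print(iqr)
```

Because describe() returns an ordinary DataFrame, any of its rows can be selected with .loc and combined like this.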

The following program passes include=['object'] to summarize only the string columns:

 import pandas as pd
 import numpy as np

 # Create a dictionary of Series
 d = {'Name':pd.Series(['Tom','James','Ricky','Vin','Steve','Smith','Jack',
    'Lee','David','Gasper','Betina','Andres']),
    'Age':pd.Series([25,26,25,23,30,29,23,34,40,30,51,46]),
    'Rating':pd.Series([4.23,3.24,3.98,2.56,3.20,4.6,3.8,3.78,2.98,4.80,4.10,3.65])
 }

 # Create a DataFrame
 df = pd.DataFrame(d)
 print(df.describe(include=['object']))

Output:

         Name
count      12
unique     12
top     Ricky
freq        1

The following program passes include='all' to summarize all columns together:

 import pandas as pd
 import numpy as np

 # Create a dictionary of Series
 d = {'Name':pd.Series(['Tom','James','Ricky','Vin','Steve','Smith','Jack',
    'Lee','David','Gasper','Betina','Andres']),
    'Age':pd.Series([25,26,25,23,30,29,23,34,40,30,51,46]),
    'Rating':pd.Series([4.23,3.24,3.98,2.56,3.20,4.6,3.8,3.78,2.98,4.80,4.10,3.65])
 }

 # Create a DataFrame
 df = pd.DataFrame(d)
 print(df.describe(include='all'))

Output:

              Age   Name     Rating
count   12.000000     12  12.000000
unique        NaN     12        NaN
top           NaN  Ricky        NaN
freq          NaN      1        NaN
mean    31.833333    NaN   3.743333
std      9.232682    NaN   0.661628
min     23.000000    NaN   2.560000
25%     25.000000    NaN   3.230000
50%     29.500000    NaN   3.790000
75%     35.500000    NaN   4.132500
max     51.000000    NaN   4.800000