Pandas Descriptive Statistics Example
A DataFrame provides a large number of methods for computations, descriptive statistics, and related operations. Most of them are aggregations such as sum() and mean(), but some (such as cumsum()) produce an object of the same size as the input. In general, these methods take an axis argument, just like ndarray.{sum, std, ...}, but the axis can be specified either by name or by integer. For a DataFrame: index (axis=0, the default) and columns (axis=1).
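The axis argument can be sketched on a small DataFrame. The columns a and b below are hypothetical and not part of the examples that follow; the sketch shows that axis names and integers are interchangeable and that cumsum() keeps the shape of its input:

```python
import pandas as pd

# Hypothetical numeric DataFrame, used only to illustrate the axis argument
df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

# Reduce down the rows (one value per column): axis=0 and axis="index" are equivalent
col_sums = df.sum(axis=0)
col_sums_named = df.sum(axis="index")

# Reduce across the columns (one value per row): axis=1 and axis="columns" are equivalent
row_sums = df.sum(axis=1)
row_sums_named = df.sum(axis="columns")

# cumsum() is not a reduction: the result has the same shape as the input
running = df.cumsum()

print(col_sums.tolist())          # [6, 60]
print(row_sums.tolist())          # [11, 22, 33]
print(running.shape == df.shape)  # True
```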
```python
import pandas as pd

# Create a dictionary of Series
d = {
    'Name': pd.Series(['Tom', 'James', 'Ricky', 'Vin', 'Steve', 'Smith', 'Jack',
                       'Lee', 'David', 'Gasper', 'Betina', 'Andres']),
    'Age': pd.Series([25, 26, 25, 23, 30, 29, 23, 34, 40, 30, 51, 46]),
    'Rating': pd.Series([4.23, 3.24, 3.98, 2.56, 3.20, 4.6, 3.8, 3.78, 2.98, 4.80, 4.10, 3.65]),
}

# Create a DataFrame
df = pd.DataFrame(d)
print(df)
```

Output:

```
      Name  Age  Rating
0      Tom   25    4.23
1    James   26    3.24
2    Ricky   25    3.98
3      Vin   23    2.56
4    Steve   30    3.20
5    Smith   29    4.60
6     Jack   23    3.80
7      Lee   34    3.78
8    David   40    2.98
9   Gasper   30    4.80
10  Betina   51    4.10
11  Andres   46    3.65
```
sum(): Returns the sum of the values over the requested axis. By default, the axis is the index (axis=0).
```python
import pandas as pd

# Create a dictionary of Series
d = {
    'Name': pd.Series(['Tom', 'James', 'Ricky', 'Vin', 'Steve', 'Smith', 'Jack',
                       'Lee', 'David', 'Gasper', 'Betina', 'Andres']),
    'Age': pd.Series([25, 26, 25, 23, 30, 29, 23, 34, 40, 30, 51, 46]),
    'Rating': pd.Series([4.23, 3.24, 3.98, 2.56, 3.20, 4.6, 3.8, 3.78, 2.98, 4.80, 4.10, 3.65]),
}

# Create a DataFrame
df = pd.DataFrame(d)
print(df.sum())
```

Output:

```
Name      TomJamesRickyVinSteveSmithJackLeeDavidGasperBe...
Age                                                     382
Rating                                                44.92
dtype: object
```
Each column is summed individually; for the string column Name, the values are concatenated into a single string.
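If the concatenated names are not wanted, the numeric_only parameter of sum() restricts the result to numeric columns. A minimal sketch, assuming the same three columns but abbreviated to three rows for brevity:

```python
import pandas as pd

# Same column layout as the examples in this section, shortened to three rows (assumption for brevity)
df = pd.DataFrame({
    "Name": ["Tom", "James", "Ricky"],
    "Age": [25, 26, 25],
    "Rating": [4.23, 3.24, 3.98],
})

# Default: the string column is "summed" by concatenating its values
print(df.sum()["Name"])               # TomJamesRicky

# numeric_only=True skips non-numeric columns entirely
numeric_totals = df.sum(numeric_only=True)
print(numeric_totals.index.tolist())  # ['Age', 'Rating']
print(numeric_totals["Age"])          # 76
```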
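axis=1: sums across each row. Because the Name column holds strings, it is excluded with numeric_only=True (pandas 2.0 and later raise a TypeError here instead of silently dropping the string column):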
```python
import pandas as pd

# Create a dictionary of Series
d = {
    'Name': pd.Series(['Tom', 'James', 'Ricky', 'Vin', 'Steve', 'Smith', 'Jack',
                       'Lee', 'David', 'Gasper', 'Betina', 'Andres']),
    'Age': pd.Series([25, 26, 25, 23, 30, 29, 23, 34, 40, 30, 51, 46]),
    'Rating': pd.Series([4.23, 3.24, 3.98, 2.56, 3.20, 4.6, 3.8, 3.78, 2.98, 4.80, 4.10, 3.65]),
}

# Create a DataFrame
df = pd.DataFrame(d)
# Sum each row; the string column Name is skipped
print(df.sum(axis=1, numeric_only=True))
```

Output:

```
0     29.23
1     29.24
2     28.98
3     25.56
4     33.20
5     33.60
6     26.80
7     37.78
8     42.98
9     34.80
10    55.10
11    49.65
dtype: float64
```
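mean(): Returns the average of the numeric columns. In pandas 2.0 and later, non-numeric columns must be excluded explicitly with numeric_only=True.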
```python
import pandas as pd

# Create a dictionary of Series
d = {
    'Name': pd.Series(['Tom', 'James', 'Ricky', 'Vin', 'Steve', 'Smith', 'Jack',
                       'Lee', 'David', 'Gasper', 'Betina', 'Andres']),
    'Age': pd.Series([25, 26, 25, 23, 30, 29, 23, 34, 40, 30, 51, 46]),
    'Rating': pd.Series([4.23, 3.24, 3.98, 2.56, 3.20, 4.6, 3.8, 3.78, 2.98, 4.80, 4.10, 3.65]),
}

# Create a DataFrame
df = pd.DataFrame(d)
print(df.mean(numeric_only=True))
```

Output:

```
Age       31.833333
Rating     3.743333
dtype: float64
```
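The mean is the sum of the non-missing values divided by their count. The following sketch checks this on the Age values from the example and shows that missing values are skipped by default (skipna=True); the short with_gap Series is a hypothetical illustration:

```python
import pandas as pd

ages = pd.Series([25, 26, 25, 23, 30, 29, 23, 34, 40, 30, 51, 46])

# mean() equals the sum divided by the number of non-null values
manual_mean = ages.sum() / ages.count()
print(ages.mean(), manual_mean)

# Missing values are ignored by default, so they affect neither the sum nor the count
with_gap = pd.Series([1.0, None, 3.0])
print(with_gap.mean())  # 2.0
```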
std(): Returns the Bessel-corrected (sample) standard deviation of the numeric columns.
```python
import pandas as pd

# Create a dictionary of Series
d = {
    'Name': pd.Series(['Tom', 'James', 'Ricky', 'Vin', 'Steve', 'Smith', 'Jack',
                       'Lee', 'David', 'Gasper', 'Betina', 'Andres']),
    'Age': pd.Series([25, 26, 25, 23, 30, 29, 23, 34, 40, 30, 51, 46]),
    'Rating': pd.Series([4.23, 3.24, 3.98, 2.56, 3.20, 4.6, 3.8, 3.78, 2.98, 4.80, 4.10, 3.65]),
}

# Create a DataFrame
df = pd.DataFrame(d)
print(df.std(numeric_only=True))
```

Output:

```
Age       9.232682
Rating    0.661628
dtype: float64
```
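Bessel's correction means the sum of squared deviations is divided by n - 1 instead of n; in pandas this is controlled by the ddof parameter (default 1). A minimal sketch that recomputes both variants by hand on the Rating values:

```python
import math
import pandas as pd

ratings = pd.Series([4.23, 3.24, 3.98, 2.56, 3.20, 4.6, 3.8, 3.78, 2.98, 4.80, 4.10, 3.65])

n = ratings.count()
mean = ratings.mean()
squared_dev = ((ratings - mean) ** 2).sum()

# Sample standard deviation with Bessel's correction: divide by n - 1 (pandas default, ddof=1)
sample_std = math.sqrt(squared_dev / (n - 1))
# Population standard deviation: divide by n (ddof=0)
population_std = math.sqrt(squared_dev / n)

print(ratings.std(), sample_std)
print(ratings.std(ddof=0), population_std)
```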
Now let's look at the functions for descriptive statistics in Python Pandas. The following table lists the important ones:
| No. | Method | Description |
|---|---|---|
| 1 | count() | Number of non-null values |
| 2 | sum() | Sum of values |
| 3 | mean() | Mean of values |
| 4 | median() | Median of values |
| 5 | mode() | Mode of values |
| 6 | std() | Standard deviation |
| 7 | min() | Minimum value |
| 8 | max() | Maximum value |
| 9 | abs() | Absolute value |
| 10 | prod() | Product of values |
| 11 | cumsum() | Cumulative sum |
| 12 | cumprod() | Cumulative product |
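The functions in the table can be tried on a small hypothetical DataFrame (columns x and y are assumptions, not the Name/Age/Rating data; y contains one missing value). The sketch shows that count() ignores NaN, that mode() returns a Series because ties are possible, and that abs(), cumsum(), and cumprod() keep the input's shape:

```python
import pandas as pd

# Hypothetical example data; column y contains one missing value
df = pd.DataFrame({"x": [2, -1, 2, 4], "y": [1.5, None, 0.5, 2.0]})

print(df.count().tolist())         # [4, 3]  (NaN is not counted)
print(df.median().tolist())        # [2.0, 1.5]
print(df["x"].mode().tolist())     # [2]  (mode() returns a Series because of possible ties)
print(df.min().tolist())           # [-1.0, 0.5]
print(df.max().tolist())           # [4.0, 2.0]
print(df["x"].abs().tolist())      # [2, 1, 2, 4]
print(df.prod().tolist())          # [-16.0, 1.5]  (NaN is skipped)
print(df["x"].cumsum().tolist())   # [2, 1, 3, 7]
print(df["x"].cumprod().tolist())  # [2, -2, -4, -16]
```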
The describe() function computes a summary of statistics for the columns of a DataFrame:

```python
import pandas as pd

# Create a dictionary of Series
d = {
    'Name': pd.Series(['Tom', 'James', 'Ricky', 'Vin', 'Steve', 'Smith', 'Jack',
                       'Lee', 'David', 'Gasper', 'Betina', 'Andres']),
    'Age': pd.Series([25, 26, 25, 23, 30, 29, 23, 34, 40, 30, 51, 46]),
    'Rating': pd.Series([4.23, 3.24, 3.98, 2.56, 3.20, 4.6, 3.8, 3.78, 2.98, 4.80, 4.10, 3.65]),
}

# Create a DataFrame
df = pd.DataFrame(d)
print(df.describe())
```

Output:

```
             Age     Rating
count  12.000000  12.000000
mean   31.833333   3.743333
std     9.232682   0.661628
min    23.000000   2.560000
25%    25.000000   3.230000
50%    29.500000   3.790000
75%    35.500000   4.132500
max    51.000000   4.800000
```
This function returns the count, mean, standard deviation, minimum, maximum, and the 25th, 50th, and 75th percentiles; the interquartile range (IQR) is the difference between the 75th and 25th percentiles. By default, it ignores string (character) columns and summarizes only the numeric columns.
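The IQR mentioned above can be read off the describe() result, or computed directly with quantile(), which uses linear interpolation by default. A minimal sketch on the Age values from the example:

```python
import pandas as pd

ages = pd.Series([25, 26, 25, 23, 30, 29, 23, 34, 40, 30, 51, 46])

summary = ages.describe()
# describe() labels its quartiles "25%" and "75%"; their difference is the IQR
iqr = summary["75%"] - summary["25%"]
print(iqr)      # 10.5

# The same quartiles computed directly
q1, q3 = ages.quantile([0.25, 0.75])
print(q3 - q1)  # 10.5
```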
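The include parameter specifies which columns to consider when summarizing. It takes a list of values; by default, only numeric columns are summarized:

- object: summarize string columns
- number: summarize numeric columns
- all: summarize all columns together (pass it as the string 'all', not as a list value)

The following example summarizes only the string columns: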
```python
import pandas as pd

# Create a dictionary of Series
d = {
    'Name': pd.Series(['Tom', 'James', 'Ricky', 'Vin', 'Steve', 'Smith', 'Jack',
                       'Lee', 'David', 'Gasper', 'Betina', 'Andres']),
    'Age': pd.Series([25, 26, 25, 23, 30, 29, 23, 34, 40, 30, 51, 46]),
    'Rating': pd.Series([4.23, 3.24, 3.98, 2.56, 3.20, 4.6, 3.8, 3.78, 2.98, 4.80, 4.10, 3.65]),
}

# Create a DataFrame
df = pd.DataFrame(d)
print(df.describe(include=['object']))
```

Output:

```
        Name
count     12
unique    12
top      Tom
freq       1
```

Because every name occurs exactly once, all names tie for the most frequent value; top shows one of them, and which one can differ between pandas versions.
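Instead of naming the columns to keep, describe() also accepts an exclude parameter. A minimal sketch, assuming the same columns abbreviated to three rows, that summarizes the numeric and the non-numeric columns separately:

```python
import pandas as pd

# Same column layout as above, shortened to three rows (assumption for brevity)
df = pd.DataFrame({
    "Name": ["Tom", "James", "Ricky"],
    "Age": [25, 26, 25],
    "Rating": [4.23, 3.24, 3.98],
})

# include="number" keeps only the numeric columns
numeric_summary = df.describe(include="number")
# exclude="number" drops the numeric columns, leaving the string column
text_summary = df.describe(exclude="number")

print(numeric_summary.columns.tolist())  # ['Age', 'Rating']
print(text_summary.columns.tolist())     # ['Name']
```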
The following example summarizes all columns together:
```python
import pandas as pd

# Create a dictionary of Series
d = {
    'Name': pd.Series(['Tom', 'James', 'Ricky', 'Vin', 'Steve', 'Smith', 'Jack',
                       'Lee', 'David', 'Gasper', 'Betina', 'Andres']),
    'Age': pd.Series([25, 26, 25, 23, 30, 29, 23, 34, 40, 30, 51, 46]),
    'Rating': pd.Series([4.23, 3.24, 3.98, 2.56, 3.20, 4.6, 3.8, 3.78, 2.98, 4.80, 4.10, 3.65]),
}

# Create a DataFrame
df = pd.DataFrame(d)
print(df.describe(include='all'))
```

Output:

```
        Name        Age     Rating
count     12  12.000000  12.000000
unique    12        NaN        NaN
top      Tom        NaN        NaN
freq       1        NaN        NaN
mean     NaN  31.833333   3.743333
std      NaN   9.232682   0.661628
min      NaN  23.000000   2.560000
25%      NaN  25.000000   3.230000
50%      NaN  29.500000   3.790000
75%      NaN  35.500000   4.132500
max      NaN  51.000000   4.800000
```