
Pandas Descriptive Statistics

Examples of descriptive statistics in Pandas

A DataFrame supports a large number of methods for computing descriptive statistics and related operations. Most of them are aggregations, such as sum() and mean(), but some, such as cumsum(), produce an object of the same size as the input. In general, these methods take an axis parameter, just like ndarray.{sum, std, ...}, but the axis can be specified by name or by integer: index (axis=0, the default) or columns (axis=1).
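
The difference between the two axes, and between an aggregation and a same-size result, can be sketched as follows. This is a minimal illustration that assumes a small made-up frame with columns "a" and "b"; it is not part of the chapter's dataset.

```python
import pandas as pd

# Small illustrative frame (values are made up for this sketch)
df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

# axis=0 or axis="index": reduce down the rows, one result per column
col_totals = df.sum(axis=0)
col_totals_by_name = df.sum(axis="index")

# axis=1 or axis="columns": reduce across the columns, one result per row
row_totals = df.sum(axis=1)
row_totals_by_name = df.sum(axis="columns")

# cumsum() is not a reduction: the result has the same shape as the input
running = df.cumsum()

print(col_totals.tolist())        # [6, 60]
print(row_totals.tolist())        # [11, 22, 33]
print(running.shape == df.shape)  # True
```

The integer and the name form are interchangeable; the name form tends to be easier to read in longer programs.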

Let us create a DataFrame and use it for all operations in this chapter.

Example

    import pandas as pd

    # Build a dictionary of Series
    d = {'Name': pd.Series(['Tom', 'James', 'Ricky', 'Vin', 'Steve', 'Smith', 'Jack',
                            'Lee', 'David', 'Gasper', 'Betina', 'Andres']),
         'Age': pd.Series([25, 26, 25, 23, 30, 29, 23, 34, 40, 30, 51, 46]),
         'Rating': pd.Series([4.23, 3.24, 3.98, 2.56, 3.20, 4.6, 3.8, 3.78, 2.98, 4.80, 4.10, 3.65])}

    # Create a DataFrame
    df = pd.DataFrame(d)
    print(df)

Output:

        Age    Name  Rating
    0    25     Tom    4.23
    1    26   James    3.24
    2    25   Ricky    3.98
    3    23     Vin    2.56
    4    30   Steve    3.20
    5    29   Smith    4.60
    6    23    Jack    3.80
    7    34     Lee    3.78
    8    40   David    2.98
    9    30  Gasper    4.80
    10   51  Betina    4.10
    11   46  Andres    3.65

sum()

Returns the sum of the values along the requested axis. By default, the axis is the index (axis=0).

Example

    import pandas as pd

    # Build a dictionary of Series
    d = {'Name': pd.Series(['Tom', 'James', 'Ricky', 'Vin', 'Steve', 'Smith', 'Jack',
                            'Lee', 'David', 'Gasper', 'Betina', 'Andres']),
         'Age': pd.Series([25, 26, 25, 23, 30, 29, 23, 34, 40, 30, 51, 46]),
         'Rating': pd.Series([4.23, 3.24, 3.98, 2.56, 3.20, 4.6, 3.8, 3.78, 2.98, 4.80, 4.10, 3.65])}

    # Create a DataFrame
    df = pd.DataFrame(d)
    print(df.sum())

Output:

    Age                                                     382
    Name      TomJamesRickyVinSteveSmithJackLeeDavidGasperBe...
    Rating                                                44.92
    dtype: object

Each column is summed individually. For the string column, the values are concatenated.
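
When the concatenated string column is not wanted, the sum can be restricted to numeric columns. The following is a minimal sketch on a shortened, made-up version of the frame; it relies on the numeric_only parameter of DataFrame.sum().

```python
import pandas as pd

# Shortened illustrative frame: one string column, one numeric column
df = pd.DataFrame({
    "Name": ["Tom", "James", "Ricky"],
    "Age": [25, 26, 25],
})

# numeric_only=True leaves the string column out of the result
numeric_totals = df.sum(numeric_only=True)
print(numeric_totals)
```

The result contains an entry for Age only, so its dtype stays numeric instead of falling back to object.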

axis=1

With axis=1, the values are summed across each row instead of down each column.

Example

    import pandas as pd

    # Build a dictionary of Series
    d = {'Name': pd.Series(['Tom', 'James', 'Ricky', 'Vin', 'Steve', 'Smith', 'Jack',
                            'Lee', 'David', 'Gasper', 'Betina', 'Andres']),
         'Age': pd.Series([25, 26, 25, 23, 30, 29, 23, 34, 40, 30, 51, 46]),
         'Rating': pd.Series([4.23, 3.24, 3.98, 2.56, 3.20, 4.6, 3.8, 3.78, 2.98, 4.80, 4.10, 3.65])}

    # Create a DataFrame
    df = pd.DataFrame(d)
    # numeric_only=True skips the string column 'Name', which cannot be added to numbers
    print(df.sum(axis=1, numeric_only=True))

Output:

    0     29.23
    1     29.24
    2     28.98
    3     25.56
    4     33.20
    5     33.60
    6     26.80
    7     37.78
    8     42.98
    9     34.80
    10    55.10
    11    49.65
    dtype: float64

mean()

Returns the arithmetic mean of the values along the requested axis.

Example

    import pandas as pd

    # Build a dictionary of Series
    d = {'Name': pd.Series(['Tom', 'James', 'Ricky', 'Vin', 'Steve', 'Smith', 'Jack',
                            'Lee', 'David', 'Gasper', 'Betina', 'Andres']),
         'Age': pd.Series([25, 26, 25, 23, 30, 29, 23, 34, 40, 30, 51, 46]),
         'Rating': pd.Series([4.23, 3.24, 3.98, 2.56, 3.20, 4.6, 3.8, 3.78, 2.98, 4.80, 4.10, 3.65])}

    # Create a DataFrame
    df = pd.DataFrame(d)
    # numeric_only=True skips the string column 'Name', which has no mean
    print(df.mean(numeric_only=True))

Output:

    Age       31.833333
    Rating     3.743333
    dtype: float64

std()

Returns the Bessel-corrected (sample) standard deviation of the numeric columns.

Example

    import pandas as pd

    # Build a dictionary of Series
    d = {'Name': pd.Series(['Tom', 'James', 'Ricky', 'Vin', 'Steve', 'Smith', 'Jack',
                            'Lee', 'David', 'Gasper', 'Betina', 'Andres']),
         'Age': pd.Series([25, 26, 25, 23, 30, 29, 23, 34, 40, 30, 51, 46]),
         'Rating': pd.Series([4.23, 3.24, 3.98, 2.56, 3.20, 4.6, 3.8, 3.78, 2.98, 4.80, 4.10, 3.65])}

    # Create a DataFrame
    df = pd.DataFrame(d)
    # numeric_only=True skips the string column 'Name', which has no standard deviation
    print(df.std(numeric_only=True))

Output:

    Age       9.232682
    Rating    0.661628
    dtype: float64
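
To make the Bessel correction concrete, the sketch below recomputes the Age value by hand and compares it with the uncorrected population form. It reuses the Age values from the example; the variable names are chosen for illustration only.

```python
import numpy as np
import pandas as pd

ages = pd.Series([25, 26, 25, 23, 30, 29, 23, 34, 40, 30, 51, 46])

# pandas default: ddof=1, i.e. divide the squared deviations by n - 1
sample_std = ages.std()

# ddof=0 divides by n instead (population standard deviation)
population_std = ages.std(ddof=0)

# The same sample value computed by hand
squared_dev = (ages - ages.mean()) ** 2
manual_std = np.sqrt(squared_dev.sum() / (len(ages) - 1))

print(round(sample_std, 6))      # 9.232682
print(population_std < sample_std)  # True
```

Note that NumPy's np.std() defaults to ddof=0, so the two libraries give different numbers for the same data unless ddof is set explicitly.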

Functions & Description

Now let us look at the functions available for descriptive statistics in Python Pandas. The following table lists the important ones:

Number	Method	Description
1	count()	Number of non-null observations
2	sum()	Sum of values
3	mean()	Mean of values
4	median()	Median of values
5	mode()	Mode of values
6	std()	Standard deviation of values
7	min()	Minimum value
8	max()	Maximum value
9	abs()	Absolute value
10	prod()	Product of values
11	cumsum()	Cumulative sum
12	cumprod()	Cumulative product
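
A few of the functions from the table can be tried on a single Series. This is a minimal sketch with made-up values, including one missing entry to show that it is skipped by default.

```python
import pandas as pd

# Illustrative values; None becomes NaN and is skipped by the reductions
ratings = pd.Series([4.0, 3.0, 4.0, 2.0, None])

n = ratings.count()              # number of non-null observations
med = ratings.median()           # middle value of the sorted non-null data
modes = ratings.mode().tolist()  # mode() returns a Series, ties give several rows
product = ratings.prod()         # product of the non-null values
running = ratings.cumsum()       # same length as the input, NaN stays NaN

print(n, med, modes, product)    # 4 3.5 [4.0] 96.0
print(running.tolist())
```

count() through prod() each collapse the Series to a single value (mode() to a short Series), while cumsum() keeps one entry per input row.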

Note: Because a DataFrame is a heterogeneous data structure, generic operations do not work with all functions.

Functions such as sum() and cumsum() work with both numeric and string data without raising an error. Aggregating strings is rarely useful in practice, but these functions do not throw an exception.

When the DataFrame contains string data, functions such as abs() and cumprod() raise an exception, because these operations cannot be performed on strings.
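
The failure mode described above, and one way around it, can be sketched as follows. The frame and the column name "Change" are made up for this illustration, and the string column is given object dtype explicitly so that the behavior does not depend on how a particular pandas version stores strings.

```python
import pandas as pd

# Illustrative frame with one string column (object dtype) and one numeric column
df = pd.DataFrame({
    "Name": pd.Series(["Tom", "James"], dtype=object),
    "Change": [-1.5, 2.0],
})

# abs() is undefined for strings, so applying it to the whole frame fails
try:
    df.abs()
    raised = False
except TypeError:
    raised = True

# Selecting the numeric columns first avoids the exception
numeric_abs = df.select_dtypes(include="number").abs()

print(raised)                          # True
print(numeric_abs["Change"].tolist())  # [1.5, 2.0]
```

select_dtypes() is one option; picking the columns by name, for example df[["Change"]].abs(), works as well.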

Summarize data

The describe() function computes a summary of statistics for the DataFrame columns.

Example

    import pandas as pd

    # Build a dictionary of Series
    d = {'Name': pd.Series(['Tom', 'James', 'Ricky', 'Vin', 'Steve', 'Smith', 'Jack',
                            'Lee', 'David', 'Gasper', 'Betina', 'Andres']),
         'Age': pd.Series([25, 26, 25, 23, 30, 29, 23, 34, 40, 30, 51, 46]),
         'Rating': pd.Series([4.23, 3.24, 3.98, 2.56, 3.20, 4.6, 3.8, 3.78, 2.98, 4.80, 4.10, 3.65])}

    # Create a DataFrame
    df = pd.DataFrame(d)
    print(df.describe())

Output:

                 Age     Rating
    count  12.000000  12.000000
    mean   31.833333   3.743333
    std     9.232682   0.661628
    min    23.000000   2.560000
    25%    25.000000   3.230000
    50%    29.500000   3.790000
    75%    35.500000   4.132500
    max    51.000000   4.800000

This function reports the count, mean, standard deviation, minimum, maximum, and quartiles, from which the IQR follows. By default it excludes string columns and summarizes only the numeric ones. The include parameter controls which columns are considered in the summary. It takes a list of values; by default, only numeric columns are included.

object − Summarizes string columns
number − Summarizes numeric columns
all − Summarizes all columns together (pass it as the plain string 'all', not as a list element)
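
Since describe() returns an ordinary DataFrame, the quartile rows can be read out to obtain the IQR. The sketch below assumes a shortened, made-up frame and also shows how the include parameter changes which columns appear.

```python
import pandas as pd

# Shortened illustrative frame
df = pd.DataFrame({
    "Name": ["Tom", "James", "Ricky", "Vin"],
    "Age": [25, 26, 25, 23],
})

# Default: only the numeric column is summarized
summary = df.describe()

# IQR = third quartile minus first quartile, read from the summary rows
iqr = summary.loc["75%", "Age"] - summary.loc["25%", "Age"]

# include='all' adds the string column, include=['number'] matches the default
summary_all = df.describe(include="all")
summary_num = df.describe(include=["number"])

print(summary.columns.tolist())      # ['Age']
print(iqr)                           # 0.75
print(summary_all.columns.tolist())  # ['Name', 'Age']
```

In the include='all' result, statistics that do not apply to a column, such as mean for Name, are filled with NaN.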

The following program uses include=['object'] and prints the result:

Example

    import pandas as pd

    # Build a dictionary of Series
    d = {'Name': pd.Series(['Tom', 'James', 'Ricky', 'Vin', 'Steve', 'Smith', 'Jack',
                            'Lee', 'David', 'Gasper', 'Betina', 'Andres']),
         'Age': pd.Series([25, 26, 25, 23, 30, 29, 23, 34, 40, 30, 51, 46]),
         'Rating': pd.Series([4.23, 3.24, 3.98, 2.56, 3.20, 4.6, 3.8, 3.78, 2.98, 4.80, 4.10, 3.65])}

    # Create a DataFrame
    df = pd.DataFrame(d)
    print(df.describe(include=['object']))

Output:

             Name
    count      12
    unique     12
    top     Ricky
    freq        1

The following program uses include='all' and prints the result:

Example

    import pandas as pd

    # Build a dictionary of Series
    d = {'Name': pd.Series(['Tom', 'James', 'Ricky', 'Vin', 'Steve', 'Smith', 'Jack',
                            'Lee', 'David', 'Gasper', 'Betina', 'Andres']),
         'Age': pd.Series([25, 26, 25, 23, 30, 29, 23, 34, 40, 30, 51, 46]),
         'Rating': pd.Series([4.23, 3.24, 3.98, 2.56, 3.20, 4.6, 3.8, 3.78, 2.98, 4.80, 4.10, 3.65])}

    # Create a DataFrame
    df = pd.DataFrame(d)
    print(df.describe(include='all'))

Output:

                  Age   Name     Rating
    count   12.000000     12  12.000000
    unique        NaN     12        NaN
    top           NaN  Ricky        NaN
    freq          NaN      1        NaN
    mean    31.833333    NaN   3.743333
    std      9.232682    NaN   0.661628
    min     23.000000    NaN   2.560000
    25%     25.000000    NaN   3.230000
    50%     29.500000    NaN   3.790000
    75%     35.500000    NaN   4.132500
    max     51.000000    NaN   4.800000
