P07: Pandas Series and DataFrames¶
Quick review of modules¶
pypi.org has 324k packages. From web servers, to machine learning frameworks, to video games, to bitcoin wallets, to recipe managers, to astronomy databases, etc.
There are millions of different classes, functions, constants, in these packages.
we can’t load them all in memory
we can’t ask someone writing the next python library to do X to check all other previous libraries to make sure no one has used the class names they have in mind.
Modules create namespaces, and modules manage what is loaded into memory.
# import numpy as np, then use np.method to interface with methods
import numpy as np
np.mean([9,4,5])
6.0
# import certain functions from a module
from statistics import stdev
stdev([9,4,5])
2.6457513110645907
Import Pandas!¶
DataFrames are an excellent choice if you’re dealing with mixed data types
Think of them as excel spreadsheets - can have columns of numbers, strings, etc
Powerful built in methods for summarizing and analyzing data
Powerful built in methods for cleaning data (removing outliers, missing values, and so forth)
When importing pandas, always use
pd
unless you have a good reason to do otherwise
Pandas Series¶
Pandas quick start guide for Series
A Series is a 1D array that can hold any type of data (numeric types, non-numeric, Python objects and so forth). Think excel spreadsheet with one column.
Each entry is labeled with an index that is used to keep track of what each entry is, and the label can be used to lookup the value corresponding to each index during analysis (remember keys in dictionaries? similar idea)
These labels are fixed - they will always index the same value unless you explicitly break that link.
The list of labels that forms the index can either be declared upon series creation or, by default, it will range from 0 to len(data)-1.
If you’re going to use Pandas to organize your data, specifying usable and informative labels is a good idea because that’s one of the main advantages over other data types like lists
Pandas DataFrames¶
Pandas quick start guide for DataFrames
A DataFrame (DF) is a labeled data struture that can be thought of as a 2D extension of the Series objects (think excel spreadsheet with more than one column)
A dataframe can accept many types of input, multiple Series, a dictionary of 1D arrays, another DF, etc.
Like a Series, DFs contain data values and their corresponding labels. However, because we’re now dealing with a 2D structure, we call the row labels the index argument and the column labels the column argument.
import pandas as pd
Why tabular data?¶
Suppose you want to store and analyze data about multiple attributes/variables like the height and weight of different people
We could make three lists: one for the names, one for the heights, and one for the weights
Then when we analyze the data, we’d know that
names[0]
corresponds toheights[0]
andweights[0]
…but that is pretty clunky and hard to keep track of…
names = ['john', 'sourabh', 'purva', 'pulkit','panayu']
heights = [68, 70, 67, 72, 64]
weight = [180, 170, 150, 200, 145]
print(f'Name: {names[0]}, Height: {heights[0]}, Weight {weight[0]}')
Name: john, Height: 68, Weight 180
A much more effecient approach is to store all of our data in a single dataframe¶
This will allow us to easily access all attributes associated with each person via the index arguments
And each attribute will be stored as a single pandas series (column) in the resulting dataframe
df = pd.DataFrame({'name':names,
'height':heights,
'weight': weight})
df
name | height | weight | |
---|---|---|---|
0 | john | 68 | 180 |
1 | sourabh | 70 | 170 |
2 | purva | 67 | 150 |
3 | pulkit | 72 | 200 |
4 | panayu | 64 | 145 |
A (potentially) more useful way¶
Note that pandas generated a default list of row labels that go from 0:N-1 where N is the number of rows
However, we can also specify more useful row names using the
index
keyword upon dataframe creation
df = pd.DataFrame({'height':heights,
'weight': weight}, index=names)
df
height | weight | |
---|---|---|
john | 68 | 180 |
sourabh | 70 | 170 |
purva | 67 | 150 |
pulkit | 72 | 200 |
panayu | 64 | 145 |
Basic indexing: Columns¶
First do it using the column names
This works much like you use keys to index into a dictionary
Notice that the index arguments (row labels) stay ‘attached’ to the column that we ask for…
df['height']
john 68
sourabh 70
purva 67
pulkit 72
panayu 64
Name: height, dtype: int64
# weight of just the first two people...
df['weight'][0:2]
john 180
sourabh 170
Name: weight, dtype: int64
Can index using the .field
syntax¶
df.height[0:2]
john 68
sourabh 70
Name: height, dtype: int64
# reverse (just like with a list...)
df.weight[::-1]
panayu 145
pulkit 200
purva 150
sourabh 170
john 180
Name: weight, dtype: int64
Basic Indexing: Rows¶
Can also index based on row labels
This is a bit more complex as there are several methods that make use of either the actual label or the order of entry in the dataframe…
Use df.loc to select a row by its label name
Contrary to usual slicing conventions with lists, both the start and the stop indices are included when using the DF.LOC option…see below for demo. This makes sense because you’re indexing by label name, not by a zero-based integer index.
Use df.iloc to select a row by its integer location (from 0 to length-1 of the axis)
You can use booleans to select a set of rows that satisfy some condition (logical indexing)
# index by row label
df.loc['john':'pulkit']
height | weight | |
---|---|---|
john | 68 | 180 |
sourabh | 70 | 170 |
purva | 67 | 150 |
pulkit | 72 | 200 |
# but this won't return anything because pulkit comes after john...
df.loc['pulkit':'john']
height | weight |
---|
iloc
¶
index by position in dataframe - works just like normal indexing of a list/tuple
# first three people
df.iloc[0:3]
height | weight | |
---|---|---|
john | 68 | 180 |
sourabh | 70 | 170 |
purva | 67 | 150 |
# reverse...
df.iloc[::-1]
height | weight | |
---|---|---|
panayu | 64 | 145 |
pulkit | 72 | 200 |
purva | 67 | 150 |
sourabh | 70 | 170 |
john | 68 | 180 |
Loading csv files using read_csv()
¶
Here we will use the mcd-menu.csv file to create a dataframe, which you can download here:
Can also find file on slack and on canvas
Note that here, I am intentionally using default indices (the numbers 0:N-1) so we don’t have to do so much typing…you’ll see.
mcd = pd.read_csv('mcd-menu.csv')
mcd
Category | Item | Serving Size | Calories | Calories from Fat | Total Fat | Total Fat (% Daily Value) | Saturated Fat | Saturated Fat (% Daily Value) | Trans Fat | ... | Carbohydrates | Carbohydrates (% Daily Value) | Dietary Fiber | Dietary Fiber (% Daily Value) | Sugars | Protein | Vitamin A (% Daily Value) | Vitamin C (% Daily Value) | Calcium (% Daily Value) | Iron (% Daily Value) | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Breakfast | Egg McMuffin | 4.8 oz (136 g) | 300 | 120 | 13.0 | 20 | 5.0 | 25 | 0.0 | ... | 31 | 10 | 4 | 17 | 3 | 17 | 10 | 0 | 25 | 15 |
1 | Breakfast | Egg White Delight | 4.8 oz (135 g) | 250 | 70 | 8.0 | 12 | 3.0 | 15 | 0.0 | ... | 30 | 10 | 4 | 17 | 3 | 18 | 6 | 0 | 25 | 8 |
2 | Breakfast | Sausage McMuffin | 3.9 oz (111 g) | 370 | 200 | 23.0 | 35 | 8.0 | 42 | 0.0 | ... | 29 | 10 | 4 | 17 | 2 | 14 | 8 | 0 | 25 | 10 |
3 | Breakfast | Sausage McMuffin with Egg | 5.7 oz (161 g) | 450 | 250 | 28.0 | 43 | 10.0 | 52 | 0.0 | ... | 30 | 10 | 4 | 17 | 2 | 21 | 15 | 0 | 30 | 15 |
4 | Breakfast | Sausage McMuffin with Egg Whites | 5.7 oz (161 g) | 400 | 210 | 23.0 | 35 | 8.0 | 42 | 0.0 | ... | 30 | 10 | 4 | 17 | 2 | 21 | 6 | 0 | 25 | 10 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
255 | Smoothies & Shakes | McFlurry with Oreo Cookies (Small) | 10.1 oz (285 g) | 510 | 150 | 17.0 | 26 | 9.0 | 44 | 0.5 | ... | 80 | 27 | 1 | 4 | 64 | 12 | 15 | 0 | 40 | 8 |
256 | Smoothies & Shakes | McFlurry with Oreo Cookies (Medium) | 13.4 oz (381 g) | 690 | 200 | 23.0 | 35 | 12.0 | 58 | 1.0 | ... | 106 | 35 | 1 | 5 | 85 | 15 | 20 | 0 | 50 | 10 |
257 | Smoothies & Shakes | McFlurry with Oreo Cookies (Snack) | 6.7 oz (190 g) | 340 | 100 | 11.0 | 17 | 6.0 | 29 | 0.0 | ... | 53 | 18 | 1 | 2 | 43 | 8 | 10 | 0 | 25 | 6 |
258 | Smoothies & Shakes | McFlurry with Reese's Peanut Butter Cups (Medium) | 14.2 oz (403 g) | 810 | 290 | 32.0 | 50 | 15.0 | 76 | 1.0 | ... | 114 | 38 | 2 | 9 | 103 | 21 | 20 | 0 | 60 | 6 |
259 | Smoothies & Shakes | McFlurry with Reese's Peanut Butter Cups (Snack) | 7.1 oz (202 g) | 410 | 150 | 16.0 | 25 | 8.0 | 38 | 0.0 | ... | 57 | 19 | 1 | 5 | 51 | 10 | 10 | 0 | 30 | 4 |
260 rows × 24 columns
DataFrame operations¶
Get basic information about the shape of the dataframe and the contents¶
# shape - rows by columns, so 260 rows, and 24 columns of data per row
mcd.shape
(260, 24)
# turn the column names into a list...
list(mcd.columns)
# can also do: list(mcd.index) but the index is just default numbers here so not super interesting.
['Category',
'Item',
'Serving Size',
'Calories',
'Calories from Fat',
'Total Fat',
'Total Fat (% Daily Value)',
'Saturated Fat',
'Saturated Fat (% Daily Value)',
'Trans Fat',
'Cholesterol',
'Cholesterol (% Daily Value)',
'Sodium',
'Sodium (% Daily Value)',
'Carbohydrates',
'Carbohydrates (% Daily Value)',
'Dietary Fiber',
'Dietary Fiber (% Daily Value)',
'Sugars',
'Protein',
'Vitamin A (% Daily Value)',
'Vitamin C (% Daily Value)',
'Calcium (% Daily Value)',
'Iron (% Daily Value)']
head and tail¶
see the first few rows of dataframe (
head
) or the last few (tail
)head
andtail
also take optional inputs to specify exactly how many rows you want to viewthis is super useful to do when you first create a dataframe to make sure that everything looks right…
# first 3 rows
mcd.head(3)
Category | Item | Serving Size | Calories | Calories from Fat | Total Fat | Total Fat (% Daily Value) | Saturated Fat | Saturated Fat (% Daily Value) | Trans Fat | ... | Carbohydrates | Carbohydrates (% Daily Value) | Dietary Fiber | Dietary Fiber (% Daily Value) | Sugars | Protein | Vitamin A (% Daily Value) | Vitamin C (% Daily Value) | Calcium (% Daily Value) | Iron (% Daily Value) | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Breakfast | Egg McMuffin | 4.8 oz (136 g) | 300 | 120 | 13.0 | 20 | 5.0 | 25 | 0.0 | ... | 31 | 10 | 4 | 17 | 3 | 17 | 10 | 0 | 25 | 15 |
1 | Breakfast | Egg White Delight | 4.8 oz (135 g) | 250 | 70 | 8.0 | 12 | 3.0 | 15 | 0.0 | ... | 30 | 10 | 4 | 17 | 3 | 18 | 6 | 0 | 25 | 8 |
2 | Breakfast | Sausage McMuffin | 3.9 oz (111 g) | 370 | 200 | 23.0 | 35 | 8.0 | 42 | 0.0 | ... | 29 | 10 | 4 | 17 | 2 | 14 | 8 | 0 | 25 | 10 |
3 rows × 24 columns
# last 5 rows..
mcd.tail(5)
Category | Item | Serving Size | Calories | Calories from Fat | Total Fat | Total Fat (% Daily Value) | Saturated Fat | Saturated Fat (% Daily Value) | Trans Fat | ... | Carbohydrates | Carbohydrates (% Daily Value) | Dietary Fiber | Dietary Fiber (% Daily Value) | Sugars | Protein | Vitamin A (% Daily Value) | Vitamin C (% Daily Value) | Calcium (% Daily Value) | Iron (% Daily Value) | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
255 | Smoothies & Shakes | McFlurry with Oreo Cookies (Small) | 10.1 oz (285 g) | 510 | 150 | 17.0 | 26 | 9.0 | 44 | 0.5 | ... | 80 | 27 | 1 | 4 | 64 | 12 | 15 | 0 | 40 | 8 |
256 | Smoothies & Shakes | McFlurry with Oreo Cookies (Medium) | 13.4 oz (381 g) | 690 | 200 | 23.0 | 35 | 12.0 | 58 | 1.0 | ... | 106 | 35 | 1 | 5 | 85 | 15 | 20 | 0 | 50 | 10 |
257 | Smoothies & Shakes | McFlurry with Oreo Cookies (Snack) | 6.7 oz (190 g) | 340 | 100 | 11.0 | 17 | 6.0 | 29 | 0.0 | ... | 53 | 18 | 1 | 2 | 43 | 8 | 10 | 0 | 25 | 6 |
258 | Smoothies & Shakes | McFlurry with Reese's Peanut Butter Cups (Medium) | 14.2 oz (403 g) | 810 | 290 | 32.0 | 50 | 15.0 | 76 | 1.0 | ... | 114 | 38 | 2 | 9 | 103 | 21 | 20 | 0 | 60 | 6 |
259 | Smoothies & Shakes | McFlurry with Reese's Peanut Butter Cups (Snack) | 7.1 oz (202 g) | 410 | 150 | 16.0 | 25 | 8.0 | 38 | 0.0 | ... | 57 | 19 | 1 | 5 | 51 | 10 | 10 | 0 | 30 | 4 |
5 rows × 24 columns
Selecting columns - can do just like with out simpler dataframe example above…¶
use column names…
mcd['Calories']
0 300
1 250
2 370
3 450
4 400
...
255 510
256 690
257 340
258 810
259 410
Name: Calories, Length: 260, dtype: int64
# multiple columns - can be non-contiguous
mcd[['Calories', 'Saturated Fat']]
Calories | Saturated Fat | |
---|---|---|
0 | 300 | 5.0 |
1 | 250 | 3.0 |
2 | 370 | 8.0 |
3 | 450 | 10.0 |
4 | 400 | 8.0 |
... | ... | ... |
255 | 510 | 9.0 |
256 | 690 | 12.0 |
257 | 340 | 6.0 |
258 | 810 | 15.0 |
259 | 410 | 8.0 |
260 rows × 2 columns
Selecting rows by position¶
can also select by row label using
.loc
as described above, but since we’re using numerial index labels here it makes sense to use.iloc
# first two rows...
mcd.iloc[0:2]
Category | Item | Serving Size | Calories | Calories from Fat | Total Fat | Total Fat (% Daily Value) | Saturated Fat | Saturated Fat (% Daily Value) | Trans Fat | ... | Carbohydrates | Carbohydrates (% Daily Value) | Dietary Fiber | Dietary Fiber (% Daily Value) | Sugars | Protein | Vitamin A (% Daily Value) | Vitamin C (% Daily Value) | Calcium (% Daily Value) | Iron (% Daily Value) | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Breakfast | Egg McMuffin | 4.8 oz (136 g) | 300 | 120 | 13.0 | 20 | 5.0 | 25 | 0.0 | ... | 31 | 10 | 4 | 17 | 3 | 17 | 10 | 0 | 25 | 15 |
1 | Breakfast | Egg White Delight | 4.8 oz (135 g) | 250 | 70 | 8.0 | 12 | 3.0 | 15 | 0.0 | ... | 30 | 10 | 4 | 17 | 3 | 18 | 6 | 0 | 25 | 8 |
2 rows × 24 columns
Selecting cells¶
above we selected into specific columns and or rows.
we can also select a subset of rows from a specific column, etc…
# print out a few rows so we can see the column names...
mcd.head(2)
Category | Item | Serving Size | Calories | Calories from Fat | Total Fat | Total Fat (% Daily Value) | Saturated Fat | Saturated Fat (% Daily Value) | Trans Fat | ... | Carbohydrates | Carbohydrates (% Daily Value) | Dietary Fiber | Dietary Fiber (% Daily Value) | Sugars | Protein | Vitamin A (% Daily Value) | Vitamin C (% Daily Value) | Calcium (% Daily Value) | Iron (% Daily Value) | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Breakfast | Egg McMuffin | 4.8 oz (136 g) | 300 | 120 | 13.0 | 20 | 5.0 | 25 | 0.0 | ... | 31 | 10 | 4 | 17 | 3 | 17 | 10 | 0 | 25 | 15 |
1 | Breakfast | Egg White Delight | 4.8 oz (135 g) | 250 | 70 | 8.0 | 12 | 3.0 | 15 | 0.0 | ... | 30 | 10 | 4 | 17 | 3 | 18 | 6 | 0 | 25 | 8 |
2 rows × 24 columns
# this will give us the first entry in the 'Calories' column
mcd['Calories'].iloc[0:10]
0 300
1 250
2 370
3 450
4 400
5 430
6 460
7 520
8 410
9 470
Name: Calories, dtype: int64
# shorthand for above...can leave off the explicit .iloc
mcd['Calories'][0]
300
Getting subset of rows and colums¶
If asking for a non-contiguous set of columns, we can pass in a list defined with
[]
If asking for a contiguous set of columns, we can use
:
Note: need to be super careful using
.loc
(e.g., if you sorted the dataframe, then the following code would return all rows between the one labeled 0 and the one labeled 10…and if they are not in ascending order anymore then that might give you something unexpected!)
# first 10 rows, Calories and Total Fat columns
# index + name
mcd.loc[0:10, ['Calories','Total Fat']]
Calories | Total Fat | |
---|---|---|
0 | 300 | 13.0 |
1 | 250 | 8.0 |
2 | 370 | 23.0 |
3 | 450 | 28.0 |
4 | 400 | 23.0 |
5 | 430 | 23.0 |
6 | 460 | 26.0 |
7 | 520 | 30.0 |
8 | 410 | 20.0 |
9 | 470 | 25.0 |
10 | 430 | 27.0 |
# first 10 rows, all columns between 'Calories' and 'Total Fat'
mcd.loc[0:10, 'Calories':'Total Fat']
Calories | Calories from Fat | Total Fat | |
---|---|---|---|
0 | 300 | 120 | 13.0 |
1 | 250 | 70 | 8.0 |
2 | 370 | 200 | 23.0 |
3 | 450 | 250 | 28.0 |
4 | 400 | 210 | 23.0 |
5 | 430 | 210 | 23.0 |
6 | 460 | 230 | 26.0 |
7 | 520 | 270 | 30.0 |
8 | 410 | 180 | 20.0 |
9 | 470 | 220 | 25.0 |
10 | 430 | 240 | 27.0 |
Filtering rows via logical indexing¶
logical indexing (boolean indexing) will filter dataframes based on whether certain conditions are met
we did some of this with lists, and the concepts are the same, but it can get a little more complex because now we’re dealing with larger and multi-dimensional data sets.
start by filtering the dataframe based on the values of entries in a specific row.
# return a version of the dataframe with just rows
# where 'Calories' are greater than 1000
mcd[ mcd['Calories'] >= 1000 ]
Category | Item | Serving Size | Calories | Calories from Fat | Total Fat | Total Fat (% Daily Value) | Saturated Fat | Saturated Fat (% Daily Value) | Trans Fat | ... | Carbohydrates | Carbohydrates (% Daily Value) | Dietary Fiber | Dietary Fiber (% Daily Value) | Sugars | Protein | Vitamin A (% Daily Value) | Vitamin C (% Daily Value) | Calcium (% Daily Value) | Iron (% Daily Value) | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
31 | Breakfast | Big Breakfast with Hotcakes (Regular Biscuit) | 14.8 oz (420 g) | 1090 | 510 | 56.0 | 87 | 19.0 | 96 | 0.0 | ... | 111 | 37 | 6 | 23 | 17 | 36 | 15 | 2 | 25 | 40 |
32 | Breakfast | Big Breakfast with Hotcakes (Large Biscuit) | 15.3 oz (434 g) | 1150 | 540 | 60.0 | 93 | 20.0 | 100 | 0.0 | ... | 116 | 39 | 7 | 28 | 17 | 36 | 15 | 2 | 30 | 40 |
34 | Breakfast | Big Breakfast with Hotcakes and Egg Whites (La... | 15.4 oz (437 g) | 1050 | 450 | 50.0 | 77 | 16.0 | 81 | 0.0 | ... | 115 | 38 | 7 | 28 | 18 | 35 | 4 | 2 | 25 | 30 |
82 | Chicken & Fish | Chicken McNuggets (40 piece) | 22.8 oz (646 g) | 1880 | 1060 | 118.0 | 182 | 20.0 | 101 | 1.0 | ... | 118 | 39 | 6 | 24 | 1 | 87 | 0 | 15 | 8 | 25 |
4 rows × 24 columns
# return a version of the dataframe with just rows
# where 'Total Fat' is greater than or equal to 40
# if we want to store the output of this filtering operation
# we need to assign the output (LHS)
fat40 = mcd[ mcd['Total Fat'] >= 40 ]
fat40.head(10)
Category | Item | Serving Size | Calories | Calories from Fat | Total Fat | Total Fat (% Daily Value) | Saturated Fat | Saturated Fat (% Daily Value) | Trans Fat | ... | Carbohydrates | Carbohydrates (% Daily Value) | Dietary Fiber | Dietary Fiber (% Daily Value) | Sugars | Protein | Vitamin A (% Daily Value) | Vitamin C (% Daily Value) | Calcium (% Daily Value) | Iron (% Daily Value) | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
27 | Breakfast | Big Breakfast (Regular Biscuit) | 9.5 oz (269 g) | 740 | 430 | 48.0 | 73 | 17.0 | 87 | 0.0 | ... | 51 | 17 | 3 | 12 | 3 | 28 | 15 | 2 | 15 | 25 |
28 | Breakfast | Big Breakfast (Large Biscuit) | 10 oz (283 g) | 800 | 470 | 52.0 | 80 | 18.0 | 90 | 0.0 | ... | 56 | 19 | 4 | 17 | 3 | 28 | 15 | 2 | 15 | 30 |
30 | Breakfast | Big Breakfast with Egg Whites (Large Biscuit) | 10.1 oz (286 g) | 690 | 370 | 41.0 | 63 | 14.0 | 72 | 0.0 | ... | 55 | 18 | 4 | 17 | 4 | 26 | 4 | 2 | 10 | 15 |
31 | Breakfast | Big Breakfast with Hotcakes (Regular Biscuit) | 14.8 oz (420 g) | 1090 | 510 | 56.0 | 87 | 19.0 | 96 | 0.0 | ... | 111 | 37 | 6 | 23 | 17 | 36 | 15 | 2 | 25 | 40 |
32 | Breakfast | Big Breakfast with Hotcakes (Large Biscuit) | 15.3 oz (434 g) | 1150 | 540 | 60.0 | 93 | 20.0 | 100 | 0.0 | ... | 116 | 39 | 7 | 28 | 17 | 36 | 15 | 2 | 30 | 40 |
33 | Breakfast | Big Breakfast with Hotcakes and Egg Whites (Re... | 14.9 oz (423 g) | 990 | 410 | 46.0 | 70 | 16.0 | 78 | 0.0 | ... | 110 | 37 | 6 | 23 | 17 | 35 | 0 | 2 | 25 | 30 |
34 | Breakfast | Big Breakfast with Hotcakes and Egg Whites (La... | 15.4 oz (437 g) | 1050 | 450 | 50.0 | 77 | 16.0 | 81 | 0.0 | ... | 115 | 38 | 7 | 28 | 18 | 35 | 4 | 2 | 25 | 30 |
47 | Beef & Pork | Double Quarter Pounder with Cheese | 10 oz (283 g) | 750 | 380 | 43.0 | 66 | 19.0 | 96 | 2.5 | ... | 42 | 14 | 3 | 11 | 10 | 48 | 10 | 2 | 30 | 35 |
51 | Beef & Pork | Bacon Clubhouse Burger | 9.5 oz (270 g) | 720 | 360 | 40.0 | 62 | 15.0 | 75 | 1.5 | ... | 51 | 17 | 4 | 14 | 14 | 39 | 8 | 25 | 30 | 25 |
81 | Chicken & Fish | Chicken McNuggets (20 piece) | 11.4 oz (323 g) | 940 | 530 | 59.0 | 91 | 10.0 | 50 | 0.0 | ... | 59 | 20 | 3 | 12 | 0 | 44 | 0 | 8 | 4 | 10 |
10 rows × 24 columns
This also works on non-numerical categories…¶
mcd[ mcd['Category'] == 'Breakfast' ]
Category | Item | Serving Size | Calories | Calories from Fat | Total Fat | Total Fat (% Daily Value) | Saturated Fat | Saturated Fat (% Daily Value) | Trans Fat | ... | Carbohydrates | Carbohydrates (% Daily Value) | Dietary Fiber | Dietary Fiber (% Daily Value) | Sugars | Protein | Vitamin A (% Daily Value) | Vitamin C (% Daily Value) | Calcium (% Daily Value) | Iron (% Daily Value) | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Breakfast | Egg McMuffin | 4.8 oz (136 g) | 300 | 120 | 13.0 | 20 | 5.0 | 25 | 0.0 | ... | 31 | 10 | 4 | 17 | 3 | 17 | 10 | 0 | 25 | 15 |
1 | Breakfast | Egg White Delight | 4.8 oz (135 g) | 250 | 70 | 8.0 | 12 | 3.0 | 15 | 0.0 | ... | 30 | 10 | 4 | 17 | 3 | 18 | 6 | 0 | 25 | 8 |
2 | Breakfast | Sausage McMuffin | 3.9 oz (111 g) | 370 | 200 | 23.0 | 35 | 8.0 | 42 | 0.0 | ... | 29 | 10 | 4 | 17 | 2 | 14 | 8 | 0 | 25 | 10 |
3 | Breakfast | Sausage McMuffin with Egg | 5.7 oz (161 g) | 450 | 250 | 28.0 | 43 | 10.0 | 52 | 0.0 | ... | 30 | 10 | 4 | 17 | 2 | 21 | 15 | 0 | 30 | 15 |
4 | Breakfast | Sausage McMuffin with Egg Whites | 5.7 oz (161 g) | 400 | 210 | 23.0 | 35 | 8.0 | 42 | 0.0 | ... | 30 | 10 | 4 | 17 | 2 | 21 | 6 | 0 | 25 | 10 |
5 | Breakfast | Steak & Egg McMuffin | 6.5 oz (185 g) | 430 | 210 | 23.0 | 36 | 9.0 | 46 | 1.0 | ... | 31 | 10 | 4 | 18 | 3 | 26 | 15 | 2 | 30 | 20 |
6 | Breakfast | Bacon, Egg & Cheese Biscuit (Regular Biscuit) | 5.3 oz (150 g) | 460 | 230 | 26.0 | 40 | 13.0 | 65 | 0.0 | ... | 38 | 13 | 2 | 7 | 3 | 19 | 10 | 8 | 15 | 15 |
7 | Breakfast | Bacon, Egg & Cheese Biscuit (Large Biscuit) | 5.8 oz (164 g) | 520 | 270 | 30.0 | 47 | 14.0 | 68 | 0.0 | ... | 43 | 14 | 3 | 12 | 4 | 19 | 15 | 8 | 20 | 20 |
8 | Breakfast | Bacon, Egg & Cheese Biscuit with Egg Whites (R... | 5.4 oz (153 g) | 410 | 180 | 20.0 | 32 | 11.0 | 56 | 0.0 | ... | 36 | 12 | 2 | 7 | 3 | 20 | 2 | 8 | 15 | 10 |
9 | Breakfast | Bacon, Egg & Cheese Biscuit with Egg Whites (L... | 5.9 oz (167 g) | 470 | 220 | 25.0 | 38 | 12.0 | 59 | 0.0 | ... | 42 | 14 | 3 | 12 | 4 | 20 | 6 | 8 | 15 | 15 |
10 | Breakfast | Sausage Biscuit (Regular Biscuit) | 4.1 oz (117 g) | 430 | 240 | 27.0 | 42 | 12.0 | 62 | 0.0 | ... | 34 | 11 | 2 | 6 | 2 | 11 | 0 | 0 | 6 | 15 |
11 | Breakfast | Sausage Biscuit (Large Biscuit) | 4.6 oz (131 g) | 480 | 280 | 31.0 | 48 | 13.0 | 65 | 0.0 | ... | 39 | 13 | 3 | 11 | 3 | 11 | 4 | 0 | 8 | 15 |
12 | Breakfast | Sausage Biscuit with Egg (Regular Biscuit) | 5.7 oz (163 g) | 510 | 290 | 33.0 | 50 | 14.0 | 71 | 0.0 | ... | 36 | 12 | 2 | 6 | 2 | 18 | 6 | 0 | 10 | 20 |
13 | Breakfast | Sausage Biscuit with Egg (Large Biscuit) | 6.2 oz (177 g) | 570 | 330 | 37.0 | 57 | 15.0 | 74 | 0.0 | ... | 42 | 14 | 3 | 11 | 3 | 18 | 10 | 0 | 10 | 20 |
14 | Breakfast | Sausage Biscuit with Egg Whites (Regular Biscuit) | 5.9 oz (167 g) | 460 | 250 | 27.0 | 42 | 12.0 | 62 | 0.0 | ... | 34 | 11 | 2 | 6 | 3 | 18 | 0 | 0 | 8 | 15 |
15 | Breakfast | Sausage Biscuit with Egg Whites (Large Biscuit) | 6.4 oz (181 g) | 520 | 280 | 32.0 | 49 | 13.0 | 65 | 0.0 | ... | 40 | 13 | 3 | 11 | 3 | 18 | 4 | 0 | 8 | 15 |
16 | Breakfast | Southern Style Chicken Biscuit (Regular Biscuit) | 5 oz (143 g) | 410 | 180 | 20.0 | 31 | 8.0 | 41 | 0.0 | ... | 41 | 14 | 2 | 6 | 3 | 17 | 0 | 2 | 6 | 15 |
17 | Breakfast | Southern Style Chicken Biscuit (Large Biscuit) | 5.5 oz (157 g) | 470 | 220 | 24.0 | 37 | 9.0 | 45 | 0.0 | ... | 46 | 15 | 3 | 11 | 4 | 17 | 4 | 2 | 8 | 15 |
18 | Breakfast | Steak & Egg Biscuit (Regular Biscuit) | 7.1 oz (201 g) | 540 | 290 | 32.0 | 49 | 16.0 | 78 | 1.0 | ... | 38 | 13 | 2 | 8 | 3 | 25 | 10 | 2 | 20 | 25 |
19 | Breakfast | Bacon, Egg & Cheese McGriddles | 6.1 oz (174 g) | 460 | 190 | 21.0 | 32 | 9.0 | 44 | 0.0 | ... | 48 | 16 | 2 | 9 | 15 | 19 | 10 | 10 | 20 | 15 |
20 | Breakfast | Bacon, Egg & Cheese McGriddles with Egg Whites | 6.3 oz (178 g) | 400 | 140 | 15.0 | 24 | 7.0 | 34 | 0.0 | ... | 47 | 16 | 2 | 9 | 16 | 20 | 2 | 10 | 15 | 10 |
21 | Breakfast | Sausage McGriddles | 5 oz (141 g) | 420 | 200 | 22.0 | 34 | 8.0 | 40 | 0.0 | ... | 44 | 15 | 2 | 8 | 15 | 11 | 0 | 0 | 8 | 10 |
22 | Breakfast | Sausage, Egg & Cheese McGriddles | 7.1 oz (201 g) | 550 | 280 | 31.0 | 48 | 12.0 | 61 | 0.0 | ... | 48 | 16 | 2 | 9 | 15 | 20 | 10 | 0 | 20 | 15 |
23 | Breakfast | Sausage, Egg & Cheese McGriddles with Egg Whites | 7.2 oz (205 g) | 500 | 230 | 26.0 | 40 | 10.0 | 52 | 0.0 | ... | 46 | 15 | 2 | 9 | 15 | 21 | 2 | 0 | 20 | 10 |
24 | Breakfast | Bacon, Egg & Cheese Bagel | 6.9 oz (197 g) | 620 | 280 | 31.0 | 48 | 11.0 | 56 | 0.5 | ... | 57 | 19 | 3 | 11 | 7 | 30 | 20 | 15 | 20 | 20 |
25 | Breakfast | Bacon, Egg & Cheese Bagel with Egg Whites | 7.1 oz (201 g) | 570 | 230 | 25.0 | 39 | 9.0 | 45 | 0.5 | ... | 55 | 18 | 3 | 12 | 8 | 30 | 10 | 15 | 20 | 15 |
26 | Breakfast | Steak, Egg & Cheese Bagel | 8.5 oz (241 g) | 670 | 310 | 35.0 | 53 | 13.0 | 63 | 1.5 | ... | 56 | 19 | 3 | 12 | 7 | 33 | 20 | 4 | 25 | 25 |
27 | Breakfast | Big Breakfast (Regular Biscuit) | 9.5 oz (269 g) | 740 | 430 | 48.0 | 73 | 17.0 | 87 | 0.0 | ... | 51 | 17 | 3 | 12 | 3 | 28 | 15 | 2 | 15 | 25 |
28 | Breakfast | Big Breakfast (Large Biscuit) | 10 oz (283 g) | 800 | 470 | 52.0 | 80 | 18.0 | 90 | 0.0 | ... | 56 | 19 | 4 | 17 | 3 | 28 | 15 | 2 | 15 | 30 |
29 | Breakfast | Big Breakfast with Egg Whites (Regular Biscuit) | 9.6 oz (272 g) | 640 | 330 | 37.0 | 57 | 14.0 | 69 | 0.0 | ... | 50 | 17 | 3 | 12 | 3 | 26 | 0 | 2 | 10 | 15 |
30 | Breakfast | Big Breakfast with Egg Whites (Large Biscuit) | 10.1 oz (286 g) | 690 | 370 | 41.0 | 63 | 14.0 | 72 | 0.0 | ... | 55 | 18 | 4 | 17 | 4 | 26 | 4 | 2 | 10 | 15 |
31 | Breakfast | Big Breakfast with Hotcakes (Regular Biscuit) | 14.8 oz (420 g) | 1090 | 510 | 56.0 | 87 | 19.0 | 96 | 0.0 | ... | 111 | 37 | 6 | 23 | 17 | 36 | 15 | 2 | 25 | 40 |
32 | Breakfast | Big Breakfast with Hotcakes (Large Biscuit) | 15.3 oz (434 g) | 1150 | 540 | 60.0 | 93 | 20.0 | 100 | 0.0 | ... | 116 | 39 | 7 | 28 | 17 | 36 | 15 | 2 | 30 | 40 |
33 | Breakfast | Big Breakfast with Hotcakes and Egg Whites (Re... | 14.9 oz (423 g) | 990 | 410 | 46.0 | 70 | 16.0 | 78 | 0.0 | ... | 110 | 37 | 6 | 23 | 17 | 35 | 0 | 2 | 25 | 30 |
34 | Breakfast | Big Breakfast with Hotcakes and Egg Whites (La... | 15.4 oz (437 g) | 1050 | 450 | 50.0 | 77 | 16.0 | 81 | 0.0 | ... | 115 | 38 | 7 | 28 | 18 | 35 | 4 | 2 | 25 | 30 |
35 | Breakfast | Hotcakes | 5.3 oz (151 g) | 350 | 80 | 9.0 | 13 | 2.0 | 9 | 0.0 | ... | 60 | 20 | 3 | 10 | 14 | 8 | 0 | 0 | 15 | 15 |
36 | Breakfast | Hotcakes and Sausage | 6.8 oz (192 g) | 520 | 210 | 24.0 | 37 | 7.0 | 36 | 0.0 | ... | 61 | 20 | 3 | 10 | 14 | 15 | 0 | 0 | 15 | 15 |
37 | Breakfast | Sausage Burrito | 3.9 oz (111 g) | 300 | 150 | 16.0 | 25 | 7.0 | 33 | 0.0 | ... | 26 | 9 | 1 | 5 | 2 | 12 | 10 | 2 | 15 | 15 |
38 | Breakfast | Hash Brown | 2 oz (56 g) | 150 | 80 | 9.0 | 14 | 1.5 | 6 | 0.0 | ... | 15 | 5 | 2 | 6 | 0 | 1 | 0 | 2 | 0 | 2 |
39 | Breakfast | Cinnamon Melts | 4 oz (114 g) | 460 | 170 | 19.0 | 30 | 9.0 | 43 | 0.0 | ... | 66 | 22 | 3 | 11 | 32 | 6 | 4 | 0 | 6 | 15 |
40 | Breakfast | Fruit & Maple Oatmeal | 9.6 oz (251 g) | 290 | 35 | 4.0 | 6 | 1.5 | 8 | 0.0 | ... | 58 | 19 | 5 | 19 | 32 | 5 | 2 | 130 | 10 | 10 |
41 | Breakfast | Fruit & Maple Oatmeal without Brown Sugar | 9.6 oz (251 g) | 260 | 40 | 4.0 | 6 | 1.5 | 8 | 0.0 | ... | 49 | 16 | 5 | 22 | 18 | 5 | 2 | 130 | 6 | 10 |
42 rows × 24 columns
And we chain together multiple conditions as well…¶
Example: all entries where Category is Breakfast, Calories greater than 300, and Calories less than 500
mcd[ (mcd['Category'] == 'Breakfast') &
(mcd['Calories'] > 300) &
(mcd['Calories'] < 500) ]
Category | Item | Serving Size | Calories | Calories from Fat | Total Fat | Total Fat (% Daily Value) | Saturated Fat | Saturated Fat (% Daily Value) | Trans Fat | ... | Carbohydrates | Carbohydrates (% Daily Value) | Dietary Fiber | Dietary Fiber (% Daily Value) | Sugars | Protein | Vitamin A (% Daily Value) | Vitamin C (% Daily Value) | Calcium (% Daily Value) | Iron (% Daily Value) | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2 | Breakfast | Sausage McMuffin | 3.9 oz (111 g) | 370 | 200 | 23.0 | 35 | 8.0 | 42 | 0.0 | ... | 29 | 10 | 4 | 17 | 2 | 14 | 8 | 0 | 25 | 10 |
3 | Breakfast | Sausage McMuffin with Egg | 5.7 oz (161 g) | 450 | 250 | 28.0 | 43 | 10.0 | 52 | 0.0 | ... | 30 | 10 | 4 | 17 | 2 | 21 | 15 | 0 | 30 | 15 |
4 | Breakfast | Sausage McMuffin with Egg Whites | 5.7 oz (161 g) | 400 | 210 | 23.0 | 35 | 8.0 | 42 | 0.0 | ... | 30 | 10 | 4 | 17 | 2 | 21 | 6 | 0 | 25 | 10 |
5 | Breakfast | Steak & Egg McMuffin | 6.5 oz (185 g) | 430 | 210 | 23.0 | 36 | 9.0 | 46 | 1.0 | ... | 31 | 10 | 4 | 18 | 3 | 26 | 15 | 2 | 30 | 20 |
6 | Breakfast | Bacon, Egg & Cheese Biscuit (Regular Biscuit) | 5.3 oz (150 g) | 460 | 230 | 26.0 | 40 | 13.0 | 65 | 0.0 | ... | 38 | 13 | 2 | 7 | 3 | 19 | 10 | 8 | 15 | 15 |
8 | Breakfast | Bacon, Egg & Cheese Biscuit with Egg Whites (R... | 5.4 oz (153 g) | 410 | 180 | 20.0 | 32 | 11.0 | 56 | 0.0 | ... | 36 | 12 | 2 | 7 | 3 | 20 | 2 | 8 | 15 | 10 |
9 | Breakfast | Bacon, Egg & Cheese Biscuit with Egg Whites (L... | 5.9 oz (167 g) | 470 | 220 | 25.0 | 38 | 12.0 | 59 | 0.0 | ... | 42 | 14 | 3 | 12 | 4 | 20 | 6 | 8 | 15 | 15 |
10 | Breakfast | Sausage Biscuit (Regular Biscuit) | 4.1 oz (117 g) | 430 | 240 | 27.0 | 42 | 12.0 | 62 | 0.0 | ... | 34 | 11 | 2 | 6 | 2 | 11 | 0 | 0 | 6 | 15 |
11 | Breakfast | Sausage Biscuit (Large Biscuit) | 4.6 oz (131 g) | 480 | 280 | 31.0 | 48 | 13.0 | 65 | 0.0 | ... | 39 | 13 | 3 | 11 | 3 | 11 | 4 | 0 | 8 | 15 |
14 | Breakfast | Sausage Biscuit with Egg Whites (Regular Biscuit) | 5.9 oz (167 g) | 460 | 250 | 27.0 | 42 | 12.0 | 62 | 0.0 | ... | 34 | 11 | 2 | 6 | 3 | 18 | 0 | 0 | 8 | 15 |
16 | Breakfast | Southern Style Chicken Biscuit (Regular Biscuit) | 5 oz (143 g) | 410 | 180 | 20.0 | 31 | 8.0 | 41 | 0.0 | ... | 41 | 14 | 2 | 6 | 3 | 17 | 0 | 2 | 6 | 15 |
17 | Breakfast | Southern Style Chicken Biscuit (Large Biscuit) | 5.5 oz (157 g) | 470 | 220 | 24.0 | 37 | 9.0 | 45 | 0.0 | ... | 46 | 15 | 3 | 11 | 4 | 17 | 4 | 2 | 8 | 15 |
19 | Breakfast | Bacon, Egg & Cheese McGriddles | 6.1 oz (174 g) | 460 | 190 | 21.0 | 32 | 9.0 | 44 | 0.0 | ... | 48 | 16 | 2 | 9 | 15 | 19 | 10 | 10 | 20 | 15 |
20 | Breakfast | Bacon, Egg & Cheese McGriddles with Egg Whites | 6.3 oz (178 g) | 400 | 140 | 15.0 | 24 | 7.0 | 34 | 0.0 | ... | 47 | 16 | 2 | 9 | 16 | 20 | 2 | 10 | 15 | 10 |
21 | Breakfast | Sausage McGriddles | 5 oz (141 g) | 420 | 200 | 22.0 | 34 | 8.0 | 40 | 0.0 | ... | 44 | 15 | 2 | 8 | 15 | 11 | 0 | 0 | 8 | 10 |
35 | Breakfast | Hotcakes | 5.3 oz (151 g) | 350 | 80 | 9.0 | 13 | 2.0 | 9 | 0.0 | ... | 60 | 20 | 3 | 10 | 14 | 8 | 0 | 0 | 15 | 15 |
39 | Breakfast | Cinnamon Melts | 4 oz (114 g) | 460 | 170 | 19.0 | 30 | 9.0 | 43 | 0.0 | ... | 66 | 22 | 3 | 11 | 32 | 6 | 4 | 0 | 6 | 15 |
17 rows × 24 columns
How does this work?¶
If we take a look at the filtering conditionals, they will determine all rows where the given condition is met
If they are met, you’ll get a
True
value returned, and if they are not met, you’ll get aFalse
value returned…Take a look if we just specify the conditionals…
Note that the length of the output matches the length of our dataframe (the number of rows)…i.e., every row has either a
True
or aFalse
(mcd['Category'] == 'Breakfast') & (mcd['Calories'] > 300)
0 False
1 False
2 True
3 True
4 True
...
255 False
256 False
257 False
258 False
259 False
Length: 260, dtype: bool
# now lets assign this list of booleans to another variable
index = (mcd['Category'] == 'Breakfast') & (mcd['Calories'] > 300)
# and by substitution, we get...
mcd [ index ]
Category | Item | Serving Size | Calories | Calories from Fat | Total Fat | Total Fat (% Daily Value) | Saturated Fat | Saturated Fat (% Daily Value) | Trans Fat | ... | Carbohydrates | Carbohydrates (% Daily Value) | Dietary Fiber | Dietary Fiber (% Daily Value) | Sugars | Protein | Vitamin A (% Daily Value) | Vitamin C (% Daily Value) | Calcium (% Daily Value) | Iron (% Daily Value) | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2 | Breakfast | Sausage McMuffin | 3.9 oz (111 g) | 370 | 200 | 23.0 | 35 | 8.0 | 42 | 0.0 | ... | 29 | 10 | 4 | 17 | 2 | 14 | 8 | 0 | 25 | 10 |
3 | Breakfast | Sausage McMuffin with Egg | 5.7 oz (161 g) | 450 | 250 | 28.0 | 43 | 10.0 | 52 | 0.0 | ... | 30 | 10 | 4 | 17 | 2 | 21 | 15 | 0 | 30 | 15 |
4 | Breakfast | Sausage McMuffin with Egg Whites | 5.7 oz (161 g) | 400 | 210 | 23.0 | 35 | 8.0 | 42 | 0.0 | ... | 30 | 10 | 4 | 17 | 2 | 21 | 6 | 0 | 25 | 10 |
5 | Breakfast | Steak & Egg McMuffin | 6.5 oz (185 g) | 430 | 210 | 23.0 | 36 | 9.0 | 46 | 1.0 | ... | 31 | 10 | 4 | 18 | 3 | 26 | 15 | 2 | 30 | 20 |
6 | Breakfast | Bacon, Egg & Cheese Biscuit (Regular Biscuit) | 5.3 oz (150 g) | 460 | 230 | 26.0 | 40 | 13.0 | 65 | 0.0 | ... | 38 | 13 | 2 | 7 | 3 | 19 | 10 | 8 | 15 | 15 |
7 | Breakfast | Bacon, Egg & Cheese Biscuit (Large Biscuit) | 5.8 oz (164 g) | 520 | 270 | 30.0 | 47 | 14.0 | 68 | 0.0 | ... | 43 | 14 | 3 | 12 | 4 | 19 | 15 | 8 | 20 | 20 |
8 | Breakfast | Bacon, Egg & Cheese Biscuit with Egg Whites (R... | 5.4 oz (153 g) | 410 | 180 | 20.0 | 32 | 11.0 | 56 | 0.0 | ... | 36 | 12 | 2 | 7 | 3 | 20 | 2 | 8 | 15 | 10 |
9 | Breakfast | Bacon, Egg & Cheese Biscuit with Egg Whites (L... | 5.9 oz (167 g) | 470 | 220 | 25.0 | 38 | 12.0 | 59 | 0.0 | ... | 42 | 14 | 3 | 12 | 4 | 20 | 6 | 8 | 15 | 15 |
10 | Breakfast | Sausage Biscuit (Regular Biscuit) | 4.1 oz (117 g) | 430 | 240 | 27.0 | 42 | 12.0 | 62 | 0.0 | ... | 34 | 11 | 2 | 6 | 2 | 11 | 0 | 0 | 6 | 15 |
11 | Breakfast | Sausage Biscuit (Large Biscuit) | 4.6 oz (131 g) | 480 | 280 | 31.0 | 48 | 13.0 | 65 | 0.0 | ... | 39 | 13 | 3 | 11 | 3 | 11 | 4 | 0 | 8 | 15 |
12 | Breakfast | Sausage Biscuit with Egg (Regular Biscuit) | 5.7 oz (163 g) | 510 | 290 | 33.0 | 50 | 14.0 | 71 | 0.0 | ... | 36 | 12 | 2 | 6 | 2 | 18 | 6 | 0 | 10 | 20 |
13 | Breakfast | Sausage Biscuit with Egg (Large Biscuit) | 6.2 oz (177 g) | 570 | 330 | 37.0 | 57 | 15.0 | 74 | 0.0 | ... | 42 | 14 | 3 | 11 | 3 | 18 | 10 | 0 | 10 | 20 |
14 | Breakfast | Sausage Biscuit with Egg Whites (Regular Biscuit) | 5.9 oz (167 g) | 460 | 250 | 27.0 | 42 | 12.0 | 62 | 0.0 | ... | 34 | 11 | 2 | 6 | 3 | 18 | 0 | 0 | 8 | 15 |
15 | Breakfast | Sausage Biscuit with Egg Whites (Large Biscuit) | 6.4 oz (181 g) | 520 | 280 | 32.0 | 49 | 13.0 | 65 | 0.0 | ... | 40 | 13 | 3 | 11 | 3 | 18 | 4 | 0 | 8 | 15 |
16 | Breakfast | Southern Style Chicken Biscuit (Regular Biscuit) | 5 oz (143 g) | 410 | 180 | 20.0 | 31 | 8.0 | 41 | 0.0 | ... | 41 | 14 | 2 | 6 | 3 | 17 | 0 | 2 | 6 | 15 |
17 | Breakfast | Southern Style Chicken Biscuit (Large Biscuit) | 5.5 oz (157 g) | 470 | 220 | 24.0 | 37 | 9.0 | 45 | 0.0 | ... | 46 | 15 | 3 | 11 | 4 | 17 | 4 | 2 | 8 | 15 |
18 | Breakfast | Steak & Egg Biscuit (Regular Biscuit) | 7.1 oz (201 g) | 540 | 290 | 32.0 | 49 | 16.0 | 78 | 1.0 | ... | 38 | 13 | 2 | 8 | 3 | 25 | 10 | 2 | 20 | 25 |
19 | Breakfast | Bacon, Egg & Cheese McGriddles | 6.1 oz (174 g) | 460 | 190 | 21.0 | 32 | 9.0 | 44 | 0.0 | ... | 48 | 16 | 2 | 9 | 15 | 19 | 10 | 10 | 20 | 15 |
20 | Breakfast | Bacon, Egg & Cheese McGriddles with Egg Whites | 6.3 oz (178 g) | 400 | 140 | 15.0 | 24 | 7.0 | 34 | 0.0 | ... | 47 | 16 | 2 | 9 | 16 | 20 | 2 | 10 | 15 | 10 |
21 | Breakfast | Sausage McGriddles | 5 oz (141 g) | 420 | 200 | 22.0 | 34 | 8.0 | 40 | 0.0 | ... | 44 | 15 | 2 | 8 | 15 | 11 | 0 | 0 | 8 | 10 |
22 | Breakfast | Sausage, Egg & Cheese McGriddles | 7.1 oz (201 g) | 550 | 280 | 31.0 | 48 | 12.0 | 61 | 0.0 | ... | 48 | 16 | 2 | 9 | 15 | 20 | 10 | 0 | 20 | 15 |
23 | Breakfast | Sausage, Egg & Cheese McGriddles with Egg Whites | 7.2 oz (205 g) | 500 | 230 | 26.0 | 40 | 10.0 | 52 | 0.0 | ... | 46 | 15 | 2 | 9 | 15 | 21 | 2 | 0 | 20 | 10 |
24 | Breakfast | Bacon, Egg & Cheese Bagel | 6.9 oz (197 g) | 620 | 280 | 31.0 | 48 | 11.0 | 56 | 0.5 | ... | 57 | 19 | 3 | 11 | 7 | 30 | 20 | 15 | 20 | 20 |
25 | Breakfast | Bacon, Egg & Cheese Bagel with Egg Whites | 7.1 oz (201 g) | 570 | 230 | 25.0 | 39 | 9.0 | 45 | 0.5 | ... | 55 | 18 | 3 | 12 | 8 | 30 | 10 | 15 | 20 | 15 |
26 | Breakfast | Steak, Egg & Cheese Bagel | 8.5 oz (241 g) | 670 | 310 | 35.0 | 53 | 13.0 | 63 | 1.5 | ... | 56 | 19 | 3 | 12 | 7 | 33 | 20 | 4 | 25 | 25 |
27 | Breakfast | Big Breakfast (Regular Biscuit) | 9.5 oz (269 g) | 740 | 430 | 48.0 | 73 | 17.0 | 87 | 0.0 | ... | 51 | 17 | 3 | 12 | 3 | 28 | 15 | 2 | 15 | 25 |
28 | Breakfast | Big Breakfast (Large Biscuit) | 10 oz (283 g) | 800 | 470 | 52.0 | 80 | 18.0 | 90 | 0.0 | ... | 56 | 19 | 4 | 17 | 3 | 28 | 15 | 2 | 15 | 30 |
29 | Breakfast | Big Breakfast with Egg Whites (Regular Biscuit) | 9.6 oz (272 g) | 640 | 330 | 37.0 | 57 | 14.0 | 69 | 0.0 | ... | 50 | 17 | 3 | 12 | 3 | 26 | 0 | 2 | 10 | 15 |
30 | Breakfast | Big Breakfast with Egg Whites (Large Biscuit) | 10.1 oz (286 g) | 690 | 370 | 41.0 | 63 | 14.0 | 72 | 0.0 | ... | 55 | 18 | 4 | 17 | 4 | 26 | 4 | 2 | 10 | 15 |
31 | Breakfast | Big Breakfast with Hotcakes (Regular Biscuit) | 14.8 oz (420 g) | 1090 | 510 | 56.0 | 87 | 19.0 | 96 | 0.0 | ... | 111 | 37 | 6 | 23 | 17 | 36 | 15 | 2 | 25 | 40 |
32 | Breakfast | Big Breakfast with Hotcakes (Large Biscuit) | 15.3 oz (434 g) | 1150 | 540 | 60.0 | 93 | 20.0 | 100 | 0.0 | ... | 116 | 39 | 7 | 28 | 17 | 36 | 15 | 2 | 30 | 40 |
33 | Breakfast | Big Breakfast with Hotcakes and Egg Whites (Re... | 14.9 oz (423 g) | 990 | 410 | 46.0 | 70 | 16.0 | 78 | 0.0 | ... | 110 | 37 | 6 | 23 | 17 | 35 | 0 | 2 | 25 | 30 |
34 | Breakfast | Big Breakfast with Hotcakes and Egg Whites (La... | 15.4 oz (437 g) | 1050 | 450 | 50.0 | 77 | 16.0 | 81 | 0.0 | ... | 115 | 38 | 7 | 28 | 18 | 35 | 4 | 2 | 25 | 30 |
35 | Breakfast | Hotcakes | 5.3 oz (151 g) | 350 | 80 | 9.0 | 13 | 2.0 | 9 | 0.0 | ... | 60 | 20 | 3 | 10 | 14 | 8 | 0 | 0 | 15 | 15 |
36 | Breakfast | Hotcakes and Sausage | 6.8 oz (192 g) | 520 | 210 | 24.0 | 37 | 7.0 | 36 | 0.0 | ... | 61 | 20 | 3 | 10 | 14 | 15 | 0 | 0 | 15 | 15 |
39 | Breakfast | Cinnamon Melts | 4 oz (114 g) | 460 | 170 | 19.0 | 30 | 9.0 | 43 | 0.0 | ... | 66 | 22 | 3 | 11 | 32 | 6 | 4 | 0 | 6 | 15 |
36 rows × 24 columns
Review of logical indexing…¶
make conjunctions by combining arrays of logical values with & (and operations)
make disjunctions with | (or operations)
and we need to put comparison operations in () before combining
Other handy methods…¶
Basic math operations…¶
describe is a handy way to get summary stats…
mean, std, etc…
mcd.describe()
Calories | Calories from Fat | Total Fat | Total Fat (% Daily Value) | Saturated Fat | Saturated Fat (% Daily Value) | Trans Fat | Cholesterol | Cholesterol (% Daily Value) | Sodium | ... | Carbohydrates | Carbohydrates (% Daily Value) | Dietary Fiber | Dietary Fiber (% Daily Value) | Sugars | Protein | Vitamin A (% Daily Value) | Vitamin C (% Daily Value) | Calcium (% Daily Value) | Iron (% Daily Value) | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
count | 260.000000 | 260.000000 | 260.000000 | 260.000000 | 260.000000 | 260.000000 | 260.000000 | 260.000000 | 260.000000 | 260.000000 | ... | 260.000000 | 260.000000 | 260.000000 | 260.000000 | 260.000000 | 260.000000 | 260.000000 | 260.000000 | 260.000000 | 260.000000 |
mean | 368.269231 | 127.096154 | 14.165385 | 21.815385 | 6.007692 | 29.965385 | 0.203846 | 54.942308 | 18.392308 | 495.750000 | ... | 47.346154 | 15.780769 | 1.630769 | 6.530769 | 29.423077 | 13.338462 | 13.426923 | 8.534615 | 20.973077 | 7.734615 |
std | 240.269886 | 127.875914 | 14.205998 | 21.885199 | 5.321873 | 26.639209 | 0.429133 | 87.269257 | 29.091653 | 577.026323 | ... | 28.252232 | 9.419544 | 1.567717 | 6.307057 | 28.679797 | 11.426146 | 24.366381 | 26.345542 | 17.019953 | 8.723263 |
min | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | ... | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
25% | 210.000000 | 20.000000 | 2.375000 | 3.750000 | 1.000000 | 4.750000 | 0.000000 | 5.000000 | 2.000000 | 107.500000 | ... | 30.000000 | 10.000000 | 0.000000 | 0.000000 | 5.750000 | 4.000000 | 2.000000 | 0.000000 | 6.000000 | 0.000000 |
50% | 340.000000 | 100.000000 | 11.000000 | 17.000000 | 5.000000 | 24.000000 | 0.000000 | 35.000000 | 11.000000 | 190.000000 | ... | 44.000000 | 15.000000 | 1.000000 | 5.000000 | 17.500000 | 12.000000 | 8.000000 | 0.000000 | 20.000000 | 4.000000 |
75% | 500.000000 | 200.000000 | 22.250000 | 35.000000 | 10.000000 | 48.000000 | 0.000000 | 65.000000 | 21.250000 | 865.000000 | ... | 60.000000 | 20.000000 | 3.000000 | 10.000000 | 48.000000 | 19.000000 | 15.000000 | 4.000000 | 30.000000 | 15.000000 |
max | 1880.000000 | 1060.000000 | 118.000000 | 182.000000 | 20.000000 | 102.000000 | 2.500000 | 575.000000 | 192.000000 | 3600.000000 | ... | 141.000000 | 47.000000 | 7.000000 | 28.000000 | 128.000000 | 87.000000 | 170.000000 | 240.000000 | 70.000000 | 40.000000 |
8 rows × 21 columns
# mean Calories...
mcd.Calories.mean()
# or
# mcd['Calories'].mean()
368.2692307692308
# standard deviation
mcd.Calories.std()
240.26988649145852
Sorting entire dataframe based on values in one column¶
here we can easily find the menu item with the fewest calories
sort by the ‘Calories’ column, then pull out the name of the item
default behavior is to not sort in place, so we need to reassign!
# sort - ascending order by default
mcd.sort_values('Calories')
Category | Item | Serving Size | Calories | Calories from Fat | Total Fat | Total Fat (% Daily Value) | Saturated Fat | Saturated Fat (% Daily Value) | Trans Fat | ... | Carbohydrates | Carbohydrates (% Daily Value) | Dietary Fiber | Dietary Fiber (% Daily Value) | Sugars | Protein | Vitamin A (% Daily Value) | Vitamin C (% Daily Value) | Calcium (% Daily Value) | Iron (% Daily Value) | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
125 | Beverages | Diet Dr Pepper (Child) | 12 fl oz cup | 0 | 0 | 0.0 | 0 | 0.0 | 0 | 0.0 | ... | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
138 | Coffee & Tea | Iced Tea (Medium) | 21 fl oz cup | 0 | 0 | 0.0 | 0 | 0.0 | 0 | 0.0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
137 | Coffee & Tea | Iced Tea (Small) | 16 fl oz cup | 0 | 0 | 0.0 | 0 | 0.0 | 0 | 0.0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
136 | Beverages | Dasani Water Bottle | 16.9 fl oz | 0 | 0 | 0.0 | 0 | 0.0 | 0 | 0.0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
145 | Coffee & Tea | Coffee (Small) | 12 fl oz cup | 0 | 0 | 0.0 | 0 | 0.0 | 0 | 0.0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
33 | Breakfast | Big Breakfast with Hotcakes and Egg Whites (Re... | 14.9 oz (423 g) | 990 | 410 | 46.0 | 70 | 16.0 | 78 | 0.0 | ... | 110 | 37 | 6 | 23 | 17 | 35 | 0 | 2 | 25 | 30 |
34 | Breakfast | Big Breakfast with Hotcakes and Egg Whites (La... | 15.4 oz (437 g) | 1050 | 450 | 50.0 | 77 | 16.0 | 81 | 0.0 | ... | 115 | 38 | 7 | 28 | 18 | 35 | 4 | 2 | 25 | 30 |
31 | Breakfast | Big Breakfast with Hotcakes (Regular Biscuit) | 14.8 oz (420 g) | 1090 | 510 | 56.0 | 87 | 19.0 | 96 | 0.0 | ... | 111 | 37 | 6 | 23 | 17 | 36 | 15 | 2 | 25 | 40 |
32 | Breakfast | Big Breakfast with Hotcakes (Large Biscuit) | 15.3 oz (434 g) | 1150 | 540 | 60.0 | 93 | 20.0 | 100 | 0.0 | ... | 116 | 39 | 7 | 28 | 17 | 36 | 15 | 2 | 30 | 40 |
82 | Chicken & Fish | Chicken McNuggets (40 piece) | 22.8 oz (646 g) | 1880 | 1060 | 118.0 | 182 | 20.0 | 101 | 1.0 | ... | 118 | 39 | 6 | 24 | 1 | 87 | 0 | 15 | 8 | 25 |
260 rows × 24 columns
# descending order (ascending = False)
mcd.sort_values('Calories', ascending = False)
Category | Item | Serving Size | Calories | Calories from Fat | Total Fat | Total Fat (% Daily Value) | Saturated Fat | Saturated Fat (% Daily Value) | Trans Fat | ... | Carbohydrates | Carbohydrates (% Daily Value) | Dietary Fiber | Dietary Fiber (% Daily Value) | Sugars | Protein | Vitamin A (% Daily Value) | Vitamin C (% Daily Value) | Calcium (% Daily Value) | Iron (% Daily Value) | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
82 | Chicken & Fish | Chicken McNuggets (40 piece) | 22.8 oz (646 g) | 1880 | 1060 | 118.0 | 182 | 20.0 | 101 | 1.0 | ... | 118 | 39 | 6 | 24 | 1 | 87 | 0 | 15 | 8 | 25 |
32 | Breakfast | Big Breakfast with Hotcakes (Large Biscuit) | 15.3 oz (434 g) | 1150 | 540 | 60.0 | 93 | 20.0 | 100 | 0.0 | ... | 116 | 39 | 7 | 28 | 17 | 36 | 15 | 2 | 30 | 40 |
31 | Breakfast | Big Breakfast with Hotcakes (Regular Biscuit) | 14.8 oz (420 g) | 1090 | 510 | 56.0 | 87 | 19.0 | 96 | 0.0 | ... | 111 | 37 | 6 | 23 | 17 | 36 | 15 | 2 | 25 | 40 |
34 | Breakfast | Big Breakfast with Hotcakes and Egg Whites (La... | 15.4 oz (437 g) | 1050 | 450 | 50.0 | 77 | 16.0 | 81 | 0.0 | ... | 115 | 38 | 7 | 28 | 18 | 35 | 4 | 2 | 25 | 30 |
33 | Breakfast | Big Breakfast with Hotcakes and Egg Whites (Re... | 14.9 oz (423 g) | 990 | 410 | 46.0 | 70 | 16.0 | 78 | 0.0 | ... | 110 | 37 | 6 | 23 | 17 | 35 | 0 | 2 | 25 | 30 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
117 | Beverages | Diet Coke (Child) | 12 fl oz cup | 0 | 0 | 0.0 | 0 | 0.0 | 0 | 0.0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
116 | Beverages | Diet Coke (Large) | 30 fl oz cup | 0 | 0 | 0.0 | 0 | 0.0 | 0 | 0.0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
115 | Beverages | Diet Coke (Medium) | 21 fl oz cup | 0 | 0 | 0.0 | 0 | 0.0 | 0 | 0.0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
114 | Beverages | Diet Coke (Small) | 16 fl oz cup | 0 | 0 | 0.0 | 0 | 0.0 | 0 | 0.0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
146 | Coffee & Tea | Coffee (Medium) | 16 fl oz cup | 0 | 0 | 0.0 | 0 | 0.0 | 0 | 0.0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
260 rows × 24 columns
Chaining together methods¶
Possible to apply multiple methods in one line of code…
Careful here…can be super effecient and compact, but you can get carried away and make your code really confusing and hard to understand (even to yourself!)
# sort and then display the last three rows
mcd.sort_values('Total Fat').tail(3)
Category | Item | Serving Size | Calories | Calories from Fat | Total Fat | Total Fat (% Daily Value) | Saturated Fat | Saturated Fat (% Daily Value) | Trans Fat | ... | Carbohydrates | Carbohydrates (% Daily Value) | Dietary Fiber | Dietary Fiber (% Daily Value) | Sugars | Protein | Vitamin A (% Daily Value) | Vitamin C (% Daily Value) | Calcium (% Daily Value) | Iron (% Daily Value) | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
81 | Chicken & Fish | Chicken McNuggets (20 piece) | 11.4 oz (323 g) | 940 | 530 | 59.0 | 91 | 10.0 | 50 | 0.0 | ... | 59 | 20 | 3 | 12 | 0 | 44 | 0 | 8 | 4 | 10 |
32 | Breakfast | Big Breakfast with Hotcakes (Large Biscuit) | 15.3 oz (434 g) | 1150 | 540 | 60.0 | 93 | 20.0 | 100 | 0.0 | ... | 116 | 39 | 7 | 28 | 17 | 36 | 15 | 2 | 30 | 40 |
82 | Chicken & Fish | Chicken McNuggets (40 piece) | 22.8 oz (646 g) | 1880 | 1060 | 118.0 | 182 | 20.0 | 101 | 1.0 | ... | 118 | 39 | 6 | 24 | 1 | 87 | 0 | 15 | 8 | 25 |
3 rows × 24 columns
# another way to achieve the same outcome...
mcd.sort_values('Total Fat').iloc[-3:]
Category | Item | Serving Size | Calories | Calories from Fat | Total Fat | Total Fat (% Daily Value) | Saturated Fat | Saturated Fat (% Daily Value) | Trans Fat | ... | Carbohydrates | Carbohydrates (% Daily Value) | Dietary Fiber | Dietary Fiber (% Daily Value) | Sugars | Protein | Vitamin A (% Daily Value) | Vitamin C (% Daily Value) | Calcium (% Daily Value) | Iron (% Daily Value) | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
81 | Chicken & Fish | Chicken McNuggets (20 piece) | 11.4 oz (323 g) | 940 | 530 | 59.0 | 91 | 10.0 | 50 | 0.0 | ... | 59 | 20 | 3 | 12 | 0 | 44 | 0 | 8 | 4 | 10 |
32 | Breakfast | Big Breakfast with Hotcakes (Large Biscuit) | 15.3 oz (434 g) | 1150 | 540 | 60.0 | 93 | 20.0 | 100 | 0.0 | ... | 116 | 39 | 7 | 28 | 17 | 36 | 15 | 2 | 30 | 40 |
82 | Chicken & Fish | Chicken McNuggets (40 piece) | 22.8 oz (646 g) | 1880 | 1060 | 118.0 | 182 | 20.0 | 101 | 1.0 | ... | 118 | 39 | 6 | 24 | 1 | 87 | 0 | 15 | 8 | 25 |
3 rows × 24 columns
# three highest fat items in descending order
mcd.sort_values('Total Fat', ascending = False).iloc[:3]
Category | Item | Serving Size | Calories | Calories from Fat | Total Fat | Total Fat (% Daily Value) | Saturated Fat | Saturated Fat (% Daily Value) | Trans Fat | ... | Carbohydrates | Carbohydrates (% Daily Value) | Dietary Fiber | Dietary Fiber (% Daily Value) | Sugars | Protein | Vitamin A (% Daily Value) | Vitamin C (% Daily Value) | Calcium (% Daily Value) | Iron (% Daily Value) | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
82 | Chicken & Fish | Chicken McNuggets (40 piece) | 22.8 oz (646 g) | 1880 | 1060 | 118.0 | 182 | 20.0 | 101 | 1.0 | ... | 118 | 39 | 6 | 24 | 1 | 87 | 0 | 15 | 8 | 25 |
32 | Breakfast | Big Breakfast with Hotcakes (Large Biscuit) | 15.3 oz (434 g) | 1150 | 540 | 60.0 | 93 | 20.0 | 100 | 0.0 | ... | 116 | 39 | 7 | 28 | 17 | 36 | 15 | 2 | 30 | 40 |
81 | Chicken & Fish | Chicken McNuggets (20 piece) | 11.4 oz (323 g) | 940 | 530 | 59.0 | 91 | 10.0 | 50 | 0.0 | ... | 59 | 20 | 3 | 12 | 0 | 44 | 0 | 8 | 4 | 10 |
3 rows × 24 columns
Solving simple tasks - examples¶
Find the highest john_index food¶
often times we’re asked to filter data based on some parameters (e.g., marketing tells you to define some index and find all items that fall into that category).
define some index:
john_index = 12*(protein grams + fiber grams)/calories
make it a new column in our dataframe…then we can sort by it (or filter by it)
All we have to do to make a new column is give it a name and give it some values - just like we can make new
key:value
pairs in a dictionary…
# this will define a new column on the fly and compute the john-index
mcd['john_index'] = 12 * (mcd['Protein'] + mcd['Dietary Fiber'])/mcd['Calories']
Note the resulting column has some funny stuff in it!¶
mcd['john_index'].unique()
array([0.84 , 1.056 , 0.58378378, 0.66666667, 0.75 ,
0.8372093 , 0.54782609, 0.50769231, 0.64390244, 0.58723404,
0.3627907 , 0.35 , 0.47058824, 0.44210526, 0.52173913,
0.48461538, 0.55609756, 0.5106383 , 0.6 , 0.66 ,
0.37142857, 0.48 , 0.552 , 0.63870968, 0.69473684,
0.64477612, 0.5027027 , 0.54375 , 0.46238532, 0.44869565,
0.4969697 , 0.37714286, 0.41538462, 0.52 , 0.24 ,
0.23478261, 0.4137931 , 0.46153846, 0.61132075, 0.76153846,
0.8 , 0.78688525, 0.71111111, 0.816 , 0.65 ,
0.70344828, 0.7255814 , 0.71666667, 0.75789474, 0.79090909,
0.66976744, 0.63529412, 1.06285714, 0.69850746, 1.01176471,
0.68852459, 1.04 , 0.64 , 0.89491525, 0.64186047,
0.53333333, 0.61395349, 0.975 , 0.59016393, 0.88 ,
0.57313433, 0.83076923, 0.57777778, 0.94736842, 0.63157895,
0.61276596, 0.59361702, 0.52307692, 1.02857143, 0.85263158,
1.69090909, 1.40689655, 0.52941176, 0.78461538, 0.54545455,
0.72857143, 0.20869565, 0.28235294, 0.25882353, 0.21818182,
1.2 , 0. , 0.4 , 0.288 , 0.225 ,
0.26666667, 0.32727273, 0.24705882, 0.25714286, nan,
inf, 0.96 , 0.92307692, 0.16 , 0.18947368,
0.17142857, 0.08 , 0.06666667, 0.05454545, 0.70588235,
0.68571429, 0.44444444, 0.42352941, 0.44651163, 0.43636364,
0.45714286, 0.57142857, 0.61818182, 1.32 , 0.624 ,
0.65806452, 0.68 , 0.94285714, 0.91764706, 0.98181818,
0.43902439, 0.432 , 0.58461538, 0.4125 , 0.425 ,
0.528 , 0.54193548, 0.40909091, 0.55714286, 0.08571429,
0.06315789, 0.08888889, 0.09230769, 0.096 , 0.1 ,
0.07058824, 0.15 , 0.37241379, 0.34285714, 0.45 ,
0.45517241, 0.49230769, 0.35294118, 0.36521739, 0.4173913 ,
0.48888889, 0.48648649, 0.21333333, 0.21492537, 0.18666667,
0.19636364, 0.19701493, 0.20377358, 0.19047619, 0.20526316,
0.27272727, 0.32307692, 0.31764706, 0.336 , 0.18461538,
0.21176471, 0.2490566 , 0.25454545, 0.26341463, 0.26181818,
0.26086957, 0.25411765, 0.27857143, 0.29142857, 0.29647059,
0.25846154, 0.28387097, 0.27906977, 0.30588235, 0.27826087,
0.34074074, 0.32195122])
Why?¶
inf
== infinity…happens when you divide by zero!for Diet Dr. Pepper, john_index == (1 + 0) / 0
nan
== not-a-number…happens when the thing you try to do isn’t a number (like 0/0)for Coffee and Tea, john_index == (0 + 0) / 0
mcd.head(2)
Category | Item | Serving Size | Calories | Calories from Fat | Total Fat | Total Fat (% Daily Value) | Saturated Fat | Saturated Fat (% Daily Value) | Trans Fat | ... | Carbohydrates (% Daily Value) | Dietary Fiber | Dietary Fiber (% Daily Value) | Sugars | Protein | Vitamin A (% Daily Value) | Vitamin C (% Daily Value) | Calcium (% Daily Value) | Iron (% Daily Value) | john_index | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Breakfast | Egg McMuffin | 4.8 oz (136 g) | 300 | 120 | 13.0 | 20 | 5.0 | 25 | 0.0 | ... | 10 | 4 | 17 | 3 | 17 | 10 | 0 | 25 | 15 | 0.840 |
1 | Breakfast | Egg White Delight | 4.8 oz (135 g) | 250 | 70 | 8.0 | 12 | 3.0 | 15 | 0.0 | ... | 10 | 4 | 17 | 3 | 18 | 6 | 0 | 25 | 8 | 1.056 |
2 rows × 25 columns
# import numpy for np.isinf() to weed out the inf entries
import numpy as np
(mcd [ ~ mcd['john_index'].isna() & ~np.isinf(mcd['john_index'])]
.sort_values('john_index', ascending=False) )
Category | Item | Serving Size | Calories | Calories from Fat | Total Fat | Total Fat (% Daily Value) | Saturated Fat | Saturated Fat (% Daily Value) | Trans Fat | ... | Carbohydrates (% Daily Value) | Dietary Fiber | Dietary Fiber (% Daily Value) | Sugars | Protein | Vitamin A (% Daily Value) | Vitamin C (% Daily Value) | Calcium (% Daily Value) | Iron (% Daily Value) | john_index | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
86 | Salads | Premium Bacon Ranch Salad with Grilled Chicken | 8.5 oz (241 g) | 220 | 80 | 8.0 | 13 | 4.0 | 20 | 0.0 | ... | 3 | 2 | 10 | 4 | 29 | 110 | 30 | 15 | 8 | 1.690909 |
89 | Salads | Premium Southwest Salad with Grilled Chicken | 11.8 oz (335 g) | 290 | 80 | 8.0 | 13 | 2.5 | 13 | 0.0 | ... | 9 | 7 | 28 | 10 | 27 | 170 | 30 | 15 | 15 | 1.406897 |
163 | Coffee & Tea | Nonfat Latte (Small) | 12 fl oz cup | 100 | 0 | 0.0 | 0 | 0.0 | 0 | 0.0 | ... | 5 | 1 | 3 | 13 | 10 | 10 | 0 | 30 | 0 | 1.320000 |
165 | Coffee & Tea | Nonfat Latte (Large) | 20 fl oz cup | 170 | 0 | 0.5 | 1 | 0.0 | 0 | 0.0 | ... | 8 | 1 | 6 | 21 | 16 | 15 | 0 | 50 | 2 | 1.200000 |
164 | Coffee & Tea | Nonfat Latte (Medium) | 16 fl oz cup | 130 | 0 | 0.0 | 0 | 0.0 | 0 | 0.0 | ... | 6 | 1 | 4 | 16 | 12 | 15 | 0 | 40 | 0 | 1.200000 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
113 | Beverages | Coca-Cola Classic (Child) | 12 fl oz cup | 100 | 0 | 0.0 | 0 | 0.0 | 0 | 0.0 | ... | 9 | 0 | 0 | 28 | 0 | 0 | 0 | 0 | 0 | 0.000000 |
112 | Beverages | Coca-Cola Classic (Large) | 30 fl oz cup | 280 | 0 | 0.0 | 0 | 0.0 | 0 | 0.0 | ... | 25 | 0 | 0 | 76 | 0 | 0 | 0 | 0 | 0 | 0.000000 |
110 | Beverages | Coca-Cola Classic (Small) | 16 fl oz cup | 140 | 0 | 0.0 | 0 | 0.0 | 0 | 0.0 | ... | 13 | 0 | 0 | 39 | 0 | 0 | 0 | 0 | 0 | 0.000000 |
101 | Snacks & Sides | Apple Slices | 1.2 oz (34 g) | 15 | 0 | 0.0 | 0 | 0.0 | 0 | 0.0 | ... | 1 | 0 | 0 | 3 | 0 | 0 | 160 | 2 | 0 | 0.000000 |
144 | Coffee & Tea | Sweet Tea (Child) | 12 fl oz cup | 110 | 0 | 0.0 | 0 | 0.0 | 0 | 0.0 | ... | 9 | 0 | 0 | 27 | 0 | 0 | 0 | 0 | 0 | 0.000000 |
244 rows × 25 columns
Alternate approach using .replace()
¶
can also directly replace nan/infs with other values
mcd['john_index'].replace(np.nan, 0).unique()
array([0.84 , 1.056 , 0.58378378, 0.66666667, 0.75 ,
0.8372093 , 0.54782609, 0.50769231, 0.64390244, 0.58723404,
0.3627907 , 0.35 , 0.47058824, 0.44210526, 0.52173913,
0.48461538, 0.55609756, 0.5106383 , 0.6 , 0.66 ,
0.37142857, 0.48 , 0.552 , 0.63870968, 0.69473684,
0.64477612, 0.5027027 , 0.54375 , 0.46238532, 0.44869565,
0.4969697 , 0.37714286, 0.41538462, 0.52 , 0.24 ,
0.23478261, 0.4137931 , 0.46153846, 0.61132075, 0.76153846,
0.8 , 0.78688525, 0.71111111, 0.816 , 0.65 ,
0.70344828, 0.7255814 , 0.71666667, 0.75789474, 0.79090909,
0.66976744, 0.63529412, 1.06285714, 0.69850746, 1.01176471,
0.68852459, 1.04 , 0.64 , 0.89491525, 0.64186047,
0.53333333, 0.61395349, 0.975 , 0.59016393, 0.88 ,
0.57313433, 0.83076923, 0.57777778, 0.94736842, 0.63157895,
0.61276596, 0.59361702, 0.52307692, 1.02857143, 0.85263158,
1.69090909, 1.40689655, 0.52941176, 0.78461538, 0.54545455,
0.72857143, 0.20869565, 0.28235294, 0.25882353, 0.21818182,
1.2 , 0. , 0.4 , 0.288 , 0.225 ,
0.26666667, 0.32727273, 0.24705882, 0.25714286, inf,
0.96 , 0.92307692, 0.16 , 0.18947368, 0.17142857,
0.08 , 0.06666667, 0.05454545, 0.70588235, 0.68571429,
0.44444444, 0.42352941, 0.44651163, 0.43636364, 0.45714286,
0.57142857, 0.61818182, 1.32 , 0.624 , 0.65806452,
0.68 , 0.94285714, 0.91764706, 0.98181818, 0.43902439,
0.432 , 0.58461538, 0.4125 , 0.425 , 0.528 ,
0.54193548, 0.40909091, 0.55714286, 0.08571429, 0.06315789,
0.08888889, 0.09230769, 0.096 , 0.1 , 0.07058824,
0.15 , 0.37241379, 0.34285714, 0.45 , 0.45517241,
0.49230769, 0.35294118, 0.36521739, 0.4173913 , 0.48888889,
0.48648649, 0.21333333, 0.21492537, 0.18666667, 0.19636364,
0.19701493, 0.20377358, 0.19047619, 0.20526316, 0.27272727,
0.32307692, 0.31764706, 0.336 , 0.18461538, 0.21176471,
0.2490566 , 0.25454545, 0.26341463, 0.26181818, 0.26086957,
0.25411765, 0.27857143, 0.29142857, 0.29647059, 0.25846154,
0.28387097, 0.27906977, 0.30588235, 0.27826087, 0.34074074,
0.32195122])
mcd['john_index'].replace(np.inf, 0).unique()
array([0.84 , 1.056 , 0.58378378, 0.66666667, 0.75 ,
0.8372093 , 0.54782609, 0.50769231, 0.64390244, 0.58723404,
0.3627907 , 0.35 , 0.47058824, 0.44210526, 0.52173913,
0.48461538, 0.55609756, 0.5106383 , 0.6 , 0.66 ,
0.37142857, 0.48 , 0.552 , 0.63870968, 0.69473684,
0.64477612, 0.5027027 , 0.54375 , 0.46238532, 0.44869565,
0.4969697 , 0.37714286, 0.41538462, 0.52 , 0.24 ,
0.23478261, 0.4137931 , 0.46153846, 0.61132075, 0.76153846,
0.8 , 0.78688525, 0.71111111, 0.816 , 0.65 ,
0.70344828, 0.7255814 , 0.71666667, 0.75789474, 0.79090909,
0.66976744, 0.63529412, 1.06285714, 0.69850746, 1.01176471,
0.68852459, 1.04 , 0.64 , 0.89491525, 0.64186047,
0.53333333, 0.61395349, 0.975 , 0.59016393, 0.88 ,
0.57313433, 0.83076923, 0.57777778, 0.94736842, 0.63157895,
0.61276596, 0.59361702, 0.52307692, 1.02857143, 0.85263158,
1.69090909, 1.40689655, 0.52941176, 0.78461538, 0.54545455,
0.72857143, 0.20869565, 0.28235294, 0.25882353, 0.21818182,
1.2 , 0. , 0.4 , 0.288 , 0.225 ,
0.26666667, 0.32727273, 0.24705882, 0.25714286, nan,
0.96 , 0.92307692, 0.16 , 0.18947368, 0.17142857,
0.08 , 0.06666667, 0.05454545, 0.70588235, 0.68571429,
0.44444444, 0.42352941, 0.44651163, 0.43636364, 0.45714286,
0.57142857, 0.61818182, 1.32 , 0.624 , 0.65806452,
0.68 , 0.94285714, 0.91764706, 0.98181818, 0.43902439,
0.432 , 0.58461538, 0.4125 , 0.425 , 0.528 ,
0.54193548, 0.40909091, 0.55714286, 0.08571429, 0.06315789,
0.08888889, 0.09230769, 0.096 , 0.1 , 0.07058824,
0.15 , 0.37241379, 0.34285714, 0.45 , 0.45517241,
0.49230769, 0.35294118, 0.36521739, 0.4173913 , 0.48888889,
0.48648649, 0.21333333, 0.21492537, 0.18666667, 0.19636364,
0.19701493, 0.20377358, 0.19047619, 0.20526316, 0.27272727,
0.32307692, 0.31764706, 0.336 , 0.18461538, 0.21176471,
0.2490566 , 0.25454545, 0.26341463, 0.26181818, 0.26086957,
0.25411765, 0.27857143, 0.29142857, 0.29647059, 0.25846154,
0.28387097, 0.27906977, 0.30588235, 0.27826087, 0.34074074,
0.32195122])
Or you can fill missing values using fillna…¶
For example, can replace all nans with the mean of all other values…
mcd['john_index'].fillna(mcd['john_index'].mean()).unique()
# remember you need to overwrite mcd or make a new dataframe in order for these changes to 'stick'
array([0.84 , 1.056 , 0.58378378, 0.66666667, 0.75 ,
0.8372093 , 0.54782609, 0.50769231, 0.64390244, 0.58723404,
0.3627907 , 0.35 , 0.47058824, 0.44210526, 0.52173913,
0.48461538, 0.55609756, 0.5106383 , 0.6 , 0.66 ,
0.37142857, 0.48 , 0.552 , 0.63870968, 0.69473684,
0.64477612, 0.5027027 , 0.54375 , 0.46238532, 0.44869565,
0.4969697 , 0.37714286, 0.41538462, 0.52 , 0.24 ,
0.23478261, 0.4137931 , 0.46153846, 0.61132075, 0.76153846,
0.8 , 0.78688525, 0.71111111, 0.816 , 0.65 ,
0.70344828, 0.7255814 , 0.71666667, 0.75789474, 0.79090909,
0.66976744, 0.63529412, 1.06285714, 0.69850746, 1.01176471,
0.68852459, 1.04 , 0.64 , 0.89491525, 0.64186047,
0.53333333, 0.61395349, 0.975 , 0.59016393, 0.88 ,
0.57313433, 0.83076923, 0.57777778, 0.94736842, 0.63157895,
0.61276596, 0.59361702, 0.52307692, 1.02857143, 0.85263158,
1.69090909, 1.40689655, 0.52941176, 0.78461538, 0.54545455,
0.72857143, 0.20869565, 0.28235294, 0.25882353, 0.21818182,
1.2 , 0. , 0.4 , 0.288 , 0.225 ,
0.26666667, 0.32727273, 0.24705882, 0.25714286, inf,
0.96 , 0.92307692, 0.16 , 0.18947368, 0.17142857,
0.08 , 0.06666667, 0.05454545, 0.70588235, 0.68571429,
0.44444444, 0.42352941, 0.44651163, 0.43636364, 0.45714286,
0.57142857, 0.61818182, 1.32 , 0.624 , 0.65806452,
0.68 , 0.94285714, 0.91764706, 0.98181818, 0.43902439,
0.432 , 0.58461538, 0.4125 , 0.425 , 0.528 ,
0.54193548, 0.40909091, 0.55714286, 0.08571429, 0.06315789,
0.08888889, 0.09230769, 0.096 , 0.1 , 0.07058824,
0.15 , 0.37241379, 0.34285714, 0.45 , 0.45517241,
0.49230769, 0.35294118, 0.36521739, 0.4173913 , 0.48888889,
0.48648649, 0.21333333, 0.21492537, 0.18666667, 0.19636364,
0.19701493, 0.20377358, 0.19047619, 0.20526316, 0.27272727,
0.32307692, 0.31764706, 0.336 , 0.18461538, 0.21176471,
0.2490566 , 0.25454545, 0.26341463, 0.26181818, 0.26086957,
0.25411765, 0.27857143, 0.29142857, 0.29647059, 0.25846154,
0.28387097, 0.27906977, 0.30588235, 0.27826087, 0.34074074,
0.32195122])
(mcd[mcd['Calories'] > 0]
.sort_values('john_index', ascending = False)
.head(1))
Category | Item | Serving Size | Calories | Calories from Fat | Total Fat | Total Fat (% Daily Value) | Saturated Fat | Saturated Fat (% Daily Value) | Trans Fat | ... | Carbohydrates (% Daily Value) | Dietary Fiber | Dietary Fiber (% Daily Value) | Sugars | Protein | Vitamin A (% Daily Value) | Vitamin C (% Daily Value) | Calcium (% Daily Value) | Iron (% Daily Value) | john_index | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
86 | Salads | Premium Bacon Ranch Salad with Grilled Chicken | 8.5 oz (241 g) | 220 | 80 | 8.0 | 13 | 4.0 | 20 | 0.0 | ... | 3 | 2 | 10 | 4 | 29 | 110 | 30 | 15 | 8 | 1.690909 |
1 rows × 25 columns