P07: Pandas Series and DataFrames

Quick review of modules

pypi.org has 324k packages. From web servers, to machine learning frameworks, to video games, to bitcoin wallets, to recipe managers, to astronomy databases, etc.

There are millions of different classes, functions, constants, in these packages.

  1. we can’t load them all in memory

  2. we can’t ask someone writing the next python library to do X to check all other previous libraries to make sure no one has used the class names they have in mind.

Modules create namespaces, and modules manage what is loaded into memory.

# import numpy as np, then use np.method to interface with methods
import numpy as np
np.mean([9,4,5])
6.0
# import certain functions from a module
from statistics import stdev
stdev([9,4,5])
2.6457513110645907

Import Pandas!

  • DataFrames are an excellent choice if you’re dealing with mixed data types

  • Think of them as excel spreadsheets - can have columns of numbers, strings, etc

  • Powerful built in methods for summarizing and analyzing data

  • Powerful built in methods for cleaning data (removing outliers, missing values, and so forth)

  • When importing pandas, always use pd unless you have a good reason to do otherwise

Pandas Series

Pandas quick start guide for Series

  • A Series is a 1D array that can hold any type of data (numeric types, non-numeric, Python objects and so forth). Think excel spreadsheet with one column.

  • Each entry is labeled with an index that is used to keep track of what each entry is, and the label can be used to lookup the value corresponding to each index during analysis (remember keys in dictionaries? similar idea)

  • These labels are fixed - they will always index the same value unless you explicitly break that link.

  • The list of labels that forms the index can either be declared upon series creation or, by default, it will range from 0 to len(data)-1.

    • If you’re going to use Pandas to organize your data, specifying usable and informative labels is a good idea because that’s one of the main advantages over other data types like lists

Pandas DataFrames

Pandas quick start guide for DataFrames

  • A DataFrame (DF) is a labeled data struture that can be thought of as a 2D extension of the Series objects (think excel spreadsheet with more than one column)

  • A dataframe can accept many types of input, multiple Series, a dictionary of 1D arrays, another DF, etc.

  • Like a Series, DFs contain data values and their corresponding labels. However, because we’re now dealing with a 2D structure, we call the row labels the index argument and the column labels the column argument.

import pandas as pd

Why tabular data?

  • Suppose you want to store and analyze data about multiple attributes/variables like the height and weight of different people

  • We could make three lists: one for the names, one for the heights, and one for the weights

  • Then when we analyze the data, we’d know that names[0] corresponds to heights[0] and weights[0]…but that is pretty clunky and hard to keep track of…

names = ['john', 'sourabh', 'purva', 'pulkit','panayu']
heights = [68, 70, 67, 72, 64]
weight = [180, 170, 150, 200, 145]
print(f'Name: {names[0]}, Height: {heights[0]}, Weight {weight[0]}')
Name: john, Height: 68, Weight 180

A much more effecient approach is to store all of our data in a single dataframe

  • This will allow us to easily access all attributes associated with each person via the index arguments

  • And each attribute will be stored as a single pandas series (column) in the resulting dataframe

df = pd.DataFrame({'name':names, 
                   'height':heights, 
                   'weight': weight})
df
name height weight
0 john 68 180
1 sourabh 70 170
2 purva 67 150
3 pulkit 72 200
4 panayu 64 145

A (potentially) more useful way

  • Note that pandas generated a default list of row labels that go from 0:N-1 where N is the number of rows

  • However, we can also specify more useful row names using the index keyword upon dataframe creation

df = pd.DataFrame({'height':heights, 
                   'weight': weight}, index=names)
df
height weight
john 68 180
sourabh 70 170
purva 67 150
pulkit 72 200
panayu 64 145

Basic indexing: Columns

  • First do it using the column names

  • This works much like you use keys to index into a dictionary

  • Notice that the index arguments (row labels) stay ‘attached’ to the column that we ask for…

df['height']
john       68
sourabh    70
purva      67
pulkit     72
panayu     64
Name: height, dtype: int64
# weight of just the first two people...
df['weight'][0:2]
john       180
sourabh    170
Name: weight, dtype: int64

Can index using the .field syntax

df.height[0:2]
john       68
sourabh    70
Name: height, dtype: int64
# reverse (just like with a list...)
df.weight[::-1]
panayu     145
pulkit     200
purva      150
sourabh    170
john       180
Name: weight, dtype: int64

Basic Indexing: Rows

  • Can also index based on row labels

  • This is a bit more complex as there are several methods that make use of either the actual label or the order of entry in the dataframe…

    • Use df.loc to select a row by its label name

      • Contrary to usual slicing conventions with lists, both the start and the stop indices are included when using the DF.LOC option…see below for demo. This makes sense because you’re indexing by label name, not by a zero-based integer index.

    • Use df.iloc to select a row by its integer location (from 0 to length-1 of the axis)

  • You can use booleans to select a set of rows that satisfy some condition (logical indexing)

# index by row label
df.loc['john':'pulkit']
height weight
john 68 180
sourabh 70 170
purva 67 150
pulkit 72 200
# but this won't return anything because pulkit comes after john...
df.loc['pulkit':'john']
height weight

iloc

  • index by position in dataframe - works just like normal indexing of a list/tuple

# first three people
df.iloc[0:3]
height weight
john 68 180
sourabh 70 170
purva 67 150
# reverse...
df.iloc[::-1]
height weight
panayu 64 145
pulkit 72 200
purva 67 150
sourabh 70 170
john 68 180

Loading csv files using read_csv()

  • Here we will use the mcd-menu.csv file to create a dataframe, which you can download here:

  • mcd-menu.csv

  • Can also find file on slack and on canvas

  • Note that here, I am intentionally using default indices (the numbers 0:N-1) so we don’t have to do so much typing…you’ll see.

mcd = pd.read_csv('mcd-menu.csv')
mcd
Category Item Serving Size Calories Calories from Fat Total Fat Total Fat (% Daily Value) Saturated Fat Saturated Fat (% Daily Value) Trans Fat ... Carbohydrates Carbohydrates (% Daily Value) Dietary Fiber Dietary Fiber (% Daily Value) Sugars Protein Vitamin A (% Daily Value) Vitamin C (% Daily Value) Calcium (% Daily Value) Iron (% Daily Value)
0 Breakfast Egg McMuffin 4.8 oz (136 g) 300 120 13.0 20 5.0 25 0.0 ... 31 10 4 17 3 17 10 0 25 15
1 Breakfast Egg White Delight 4.8 oz (135 g) 250 70 8.0 12 3.0 15 0.0 ... 30 10 4 17 3 18 6 0 25 8
2 Breakfast Sausage McMuffin 3.9 oz (111 g) 370 200 23.0 35 8.0 42 0.0 ... 29 10 4 17 2 14 8 0 25 10
3 Breakfast Sausage McMuffin with Egg 5.7 oz (161 g) 450 250 28.0 43 10.0 52 0.0 ... 30 10 4 17 2 21 15 0 30 15
4 Breakfast Sausage McMuffin with Egg Whites 5.7 oz (161 g) 400 210 23.0 35 8.0 42 0.0 ... 30 10 4 17 2 21 6 0 25 10
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
255 Smoothies & Shakes McFlurry with Oreo Cookies (Small) 10.1 oz (285 g) 510 150 17.0 26 9.0 44 0.5 ... 80 27 1 4 64 12 15 0 40 8
256 Smoothies & Shakes McFlurry with Oreo Cookies (Medium) 13.4 oz (381 g) 690 200 23.0 35 12.0 58 1.0 ... 106 35 1 5 85 15 20 0 50 10
257 Smoothies & Shakes McFlurry with Oreo Cookies (Snack) 6.7 oz (190 g) 340 100 11.0 17 6.0 29 0.0 ... 53 18 1 2 43 8 10 0 25 6
258 Smoothies & Shakes McFlurry with Reese's Peanut Butter Cups (Medium) 14.2 oz (403 g) 810 290 32.0 50 15.0 76 1.0 ... 114 38 2 9 103 21 20 0 60 6
259 Smoothies & Shakes McFlurry with Reese's Peanut Butter Cups (Snack) 7.1 oz (202 g) 410 150 16.0 25 8.0 38 0.0 ... 57 19 1 5 51 10 10 0 30 4

260 rows × 24 columns

DataFrame operations

Get basic information about the shape of the dataframe and the contents

# shape - rows by columns, so 260 rows, and 24 columns of data per row
mcd.shape
(260, 24)
# turn the column names into a list...
list(mcd.columns)

# can also do: list(mcd.index) but the index is just default numbers here so not super interesting. 
['Category',
 'Item',
 'Serving Size',
 'Calories',
 'Calories from Fat',
 'Total Fat',
 'Total Fat (% Daily Value)',
 'Saturated Fat',
 'Saturated Fat (% Daily Value)',
 'Trans Fat',
 'Cholesterol',
 'Cholesterol (% Daily Value)',
 'Sodium',
 'Sodium (% Daily Value)',
 'Carbohydrates',
 'Carbohydrates (% Daily Value)',
 'Dietary Fiber',
 'Dietary Fiber (% Daily Value)',
 'Sugars',
 'Protein',
 'Vitamin A (% Daily Value)',
 'Vitamin C (% Daily Value)',
 'Calcium (% Daily Value)',
 'Iron (% Daily Value)']

head and tail

  • see the first few rows of dataframe (head) or the last few (tail)

  • head and tail also take optional inputs to specify exactly how many rows you want to view

  • this is super useful to do when you first create a dataframe to make sure that everything looks right…

# first 3 rows
mcd.head(3)
Category Item Serving Size Calories Calories from Fat Total Fat Total Fat (% Daily Value) Saturated Fat Saturated Fat (% Daily Value) Trans Fat ... Carbohydrates Carbohydrates (% Daily Value) Dietary Fiber Dietary Fiber (% Daily Value) Sugars Protein Vitamin A (% Daily Value) Vitamin C (% Daily Value) Calcium (% Daily Value) Iron (% Daily Value)
0 Breakfast Egg McMuffin 4.8 oz (136 g) 300 120 13.0 20 5.0 25 0.0 ... 31 10 4 17 3 17 10 0 25 15
1 Breakfast Egg White Delight 4.8 oz (135 g) 250 70 8.0 12 3.0 15 0.0 ... 30 10 4 17 3 18 6 0 25 8
2 Breakfast Sausage McMuffin 3.9 oz (111 g) 370 200 23.0 35 8.0 42 0.0 ... 29 10 4 17 2 14 8 0 25 10

3 rows × 24 columns

# last 5 rows..
mcd.tail(5)
Category Item Serving Size Calories Calories from Fat Total Fat Total Fat (% Daily Value) Saturated Fat Saturated Fat (% Daily Value) Trans Fat ... Carbohydrates Carbohydrates (% Daily Value) Dietary Fiber Dietary Fiber (% Daily Value) Sugars Protein Vitamin A (% Daily Value) Vitamin C (% Daily Value) Calcium (% Daily Value) Iron (% Daily Value)
255 Smoothies & Shakes McFlurry with Oreo Cookies (Small) 10.1 oz (285 g) 510 150 17.0 26 9.0 44 0.5 ... 80 27 1 4 64 12 15 0 40 8
256 Smoothies & Shakes McFlurry with Oreo Cookies (Medium) 13.4 oz (381 g) 690 200 23.0 35 12.0 58 1.0 ... 106 35 1 5 85 15 20 0 50 10
257 Smoothies & Shakes McFlurry with Oreo Cookies (Snack) 6.7 oz (190 g) 340 100 11.0 17 6.0 29 0.0 ... 53 18 1 2 43 8 10 0 25 6
258 Smoothies & Shakes McFlurry with Reese's Peanut Butter Cups (Medium) 14.2 oz (403 g) 810 290 32.0 50 15.0 76 1.0 ... 114 38 2 9 103 21 20 0 60 6
259 Smoothies & Shakes McFlurry with Reese's Peanut Butter Cups (Snack) 7.1 oz (202 g) 410 150 16.0 25 8.0 38 0.0 ... 57 19 1 5 51 10 10 0 30 4

5 rows × 24 columns

Selecting columns - can do just like with out simpler dataframe example above…

  • use column names…

mcd['Calories']
0      300
1      250
2      370
3      450
4      400
      ... 
255    510
256    690
257    340
258    810
259    410
Name: Calories, Length: 260, dtype: int64
# multiple columns - can be non-contiguous
mcd[['Calories', 'Saturated Fat']]
Calories Saturated Fat
0 300 5.0
1 250 3.0
2 370 8.0
3 450 10.0
4 400 8.0
... ... ...
255 510 9.0
256 690 12.0
257 340 6.0
258 810 15.0
259 410 8.0

260 rows × 2 columns

Selecting rows by position

  • can also select by row label using .loc as described above, but since we’re using numerial index labels here it makes sense to use .iloc

# first two rows...
mcd.iloc[0:2]
Category Item Serving Size Calories Calories from Fat Total Fat Total Fat (% Daily Value) Saturated Fat Saturated Fat (% Daily Value) Trans Fat ... Carbohydrates Carbohydrates (% Daily Value) Dietary Fiber Dietary Fiber (% Daily Value) Sugars Protein Vitamin A (% Daily Value) Vitamin C (% Daily Value) Calcium (% Daily Value) Iron (% Daily Value)
0 Breakfast Egg McMuffin 4.8 oz (136 g) 300 120 13.0 20 5.0 25 0.0 ... 31 10 4 17 3 17 10 0 25 15
1 Breakfast Egg White Delight 4.8 oz (135 g) 250 70 8.0 12 3.0 15 0.0 ... 30 10 4 17 3 18 6 0 25 8

2 rows × 24 columns

Selecting cells

  • above we selected into specific columns and or rows.

  • we can also select a subset of rows from a specific column, etc…

# print out a few rows so we can see the column names...
mcd.head(2)
Category Item Serving Size Calories Calories from Fat Total Fat Total Fat (% Daily Value) Saturated Fat Saturated Fat (% Daily Value) Trans Fat ... Carbohydrates Carbohydrates (% Daily Value) Dietary Fiber Dietary Fiber (% Daily Value) Sugars Protein Vitamin A (% Daily Value) Vitamin C (% Daily Value) Calcium (% Daily Value) Iron (% Daily Value)
0 Breakfast Egg McMuffin 4.8 oz (136 g) 300 120 13.0 20 5.0 25 0.0 ... 31 10 4 17 3 17 10 0 25 15
1 Breakfast Egg White Delight 4.8 oz (135 g) 250 70 8.0 12 3.0 15 0.0 ... 30 10 4 17 3 18 6 0 25 8

2 rows × 24 columns

# this will give us the first entry in the 'Calories' column
mcd['Calories'].iloc[0:10]
0    300
1    250
2    370
3    450
4    400
5    430
6    460
7    520
8    410
9    470
Name: Calories, dtype: int64
# shorthand for above...can leave off the explicit .iloc
mcd['Calories'][0]
300

Getting subset of rows and colums

  • If asking for a non-contiguous set of columns, we can pass in a list defined with []

  • If asking for a contiguous set of columns, we can use :

  • Note: need to be super careful using .loc (e.g., if you sorted the dataframe, then the following code would return all rows between the one labeled 0 and the one labeled 10…and if they are not in ascending order anymore then that might give you something unexpected!)

# first 10 rows, Calories and Total Fat columns
# index + name
mcd.loc[0:10, ['Calories','Total Fat']]
Calories Total Fat
0 300 13.0
1 250 8.0
2 370 23.0
3 450 28.0
4 400 23.0
5 430 23.0
6 460 26.0
7 520 30.0
8 410 20.0
9 470 25.0
10 430 27.0
# first 10 rows, all columns between 'Calories' and 'Total Fat'
mcd.loc[0:10, 'Calories':'Total Fat']
Calories Calories from Fat Total Fat
0 300 120 13.0
1 250 70 8.0
2 370 200 23.0
3 450 250 28.0
4 400 210 23.0
5 430 210 23.0
6 460 230 26.0
7 520 270 30.0
8 410 180 20.0
9 470 220 25.0
10 430 240 27.0

Filtering rows via logical indexing

  • logical indexing (boolean indexing) will filter dataframes based on whether certain conditions are met

  • we did some of this with lists, and the concepts are the same, but it can get a little more complex because now we’re dealing with larger and multi-dimensional data sets.

  • start by filtering the dataframe based on the values of entries in a specific row.

# return a version of the dataframe with just rows
# where 'Calories' are greater than 1000
mcd[ mcd['Calories'] >= 1000 ]
Category Item Serving Size Calories Calories from Fat Total Fat Total Fat (% Daily Value) Saturated Fat Saturated Fat (% Daily Value) Trans Fat ... Carbohydrates Carbohydrates (% Daily Value) Dietary Fiber Dietary Fiber (% Daily Value) Sugars Protein Vitamin A (% Daily Value) Vitamin C (% Daily Value) Calcium (% Daily Value) Iron (% Daily Value)
31 Breakfast Big Breakfast with Hotcakes (Regular Biscuit) 14.8 oz (420 g) 1090 510 56.0 87 19.0 96 0.0 ... 111 37 6 23 17 36 15 2 25 40
32 Breakfast Big Breakfast with Hotcakes (Large Biscuit) 15.3 oz (434 g) 1150 540 60.0 93 20.0 100 0.0 ... 116 39 7 28 17 36 15 2 30 40
34 Breakfast Big Breakfast with Hotcakes and Egg Whites (La... 15.4 oz (437 g) 1050 450 50.0 77 16.0 81 0.0 ... 115 38 7 28 18 35 4 2 25 30
82 Chicken & Fish Chicken McNuggets (40 piece) 22.8 oz (646 g) 1880 1060 118.0 182 20.0 101 1.0 ... 118 39 6 24 1 87 0 15 8 25

4 rows × 24 columns

# return a version of the dataframe with just rows
# where 'Total Fat' is greater than or equal to 40

# if we want to store the output of this filtering operation
# we need to assign the output (LHS)
fat40 = mcd[ mcd['Total Fat'] >= 40 ]
fat40.head(10)
Category Item Serving Size Calories Calories from Fat Total Fat Total Fat (% Daily Value) Saturated Fat Saturated Fat (% Daily Value) Trans Fat ... Carbohydrates Carbohydrates (% Daily Value) Dietary Fiber Dietary Fiber (% Daily Value) Sugars Protein Vitamin A (% Daily Value) Vitamin C (% Daily Value) Calcium (% Daily Value) Iron (% Daily Value)
27 Breakfast Big Breakfast (Regular Biscuit) 9.5 oz (269 g) 740 430 48.0 73 17.0 87 0.0 ... 51 17 3 12 3 28 15 2 15 25
28 Breakfast Big Breakfast (Large Biscuit) 10 oz (283 g) 800 470 52.0 80 18.0 90 0.0 ... 56 19 4 17 3 28 15 2 15 30
30 Breakfast Big Breakfast with Egg Whites (Large Biscuit) 10.1 oz (286 g) 690 370 41.0 63 14.0 72 0.0 ... 55 18 4 17 4 26 4 2 10 15
31 Breakfast Big Breakfast with Hotcakes (Regular Biscuit) 14.8 oz (420 g) 1090 510 56.0 87 19.0 96 0.0 ... 111 37 6 23 17 36 15 2 25 40
32 Breakfast Big Breakfast with Hotcakes (Large Biscuit) 15.3 oz (434 g) 1150 540 60.0 93 20.0 100 0.0 ... 116 39 7 28 17 36 15 2 30 40
33 Breakfast Big Breakfast with Hotcakes and Egg Whites (Re... 14.9 oz (423 g) 990 410 46.0 70 16.0 78 0.0 ... 110 37 6 23 17 35 0 2 25 30
34 Breakfast Big Breakfast with Hotcakes and Egg Whites (La... 15.4 oz (437 g) 1050 450 50.0 77 16.0 81 0.0 ... 115 38 7 28 18 35 4 2 25 30
47 Beef & Pork Double Quarter Pounder with Cheese 10 oz (283 g) 750 380 43.0 66 19.0 96 2.5 ... 42 14 3 11 10 48 10 2 30 35
51 Beef & Pork Bacon Clubhouse Burger 9.5 oz (270 g) 720 360 40.0 62 15.0 75 1.5 ... 51 17 4 14 14 39 8 25 30 25
81 Chicken & Fish Chicken McNuggets (20 piece) 11.4 oz (323 g) 940 530 59.0 91 10.0 50 0.0 ... 59 20 3 12 0 44 0 8 4 10

10 rows × 24 columns

This also works on non-numerical categories…

mcd[ mcd['Category'] == 'Breakfast' ]
Category Item Serving Size Calories Calories from Fat Total Fat Total Fat (% Daily Value) Saturated Fat Saturated Fat (% Daily Value) Trans Fat ... Carbohydrates Carbohydrates (% Daily Value) Dietary Fiber Dietary Fiber (% Daily Value) Sugars Protein Vitamin A (% Daily Value) Vitamin C (% Daily Value) Calcium (% Daily Value) Iron (% Daily Value)
0 Breakfast Egg McMuffin 4.8 oz (136 g) 300 120 13.0 20 5.0 25 0.0 ... 31 10 4 17 3 17 10 0 25 15
1 Breakfast Egg White Delight 4.8 oz (135 g) 250 70 8.0 12 3.0 15 0.0 ... 30 10 4 17 3 18 6 0 25 8
2 Breakfast Sausage McMuffin 3.9 oz (111 g) 370 200 23.0 35 8.0 42 0.0 ... 29 10 4 17 2 14 8 0 25 10
3 Breakfast Sausage McMuffin with Egg 5.7 oz (161 g) 450 250 28.0 43 10.0 52 0.0 ... 30 10 4 17 2 21 15 0 30 15
4 Breakfast Sausage McMuffin with Egg Whites 5.7 oz (161 g) 400 210 23.0 35 8.0 42 0.0 ... 30 10 4 17 2 21 6 0 25 10
5 Breakfast Steak & Egg McMuffin 6.5 oz (185 g) 430 210 23.0 36 9.0 46 1.0 ... 31 10 4 18 3 26 15 2 30 20
6 Breakfast Bacon, Egg & Cheese Biscuit (Regular Biscuit) 5.3 oz (150 g) 460 230 26.0 40 13.0 65 0.0 ... 38 13 2 7 3 19 10 8 15 15
7 Breakfast Bacon, Egg & Cheese Biscuit (Large Biscuit) 5.8 oz (164 g) 520 270 30.0 47 14.0 68 0.0 ... 43 14 3 12 4 19 15 8 20 20
8 Breakfast Bacon, Egg & Cheese Biscuit with Egg Whites (R... 5.4 oz (153 g) 410 180 20.0 32 11.0 56 0.0 ... 36 12 2 7 3 20 2 8 15 10
9 Breakfast Bacon, Egg & Cheese Biscuit with Egg Whites (L... 5.9 oz (167 g) 470 220 25.0 38 12.0 59 0.0 ... 42 14 3 12 4 20 6 8 15 15
10 Breakfast Sausage Biscuit (Regular Biscuit) 4.1 oz (117 g) 430 240 27.0 42 12.0 62 0.0 ... 34 11 2 6 2 11 0 0 6 15
11 Breakfast Sausage Biscuit (Large Biscuit) 4.6 oz (131 g) 480 280 31.0 48 13.0 65 0.0 ... 39 13 3 11 3 11 4 0 8 15
12 Breakfast Sausage Biscuit with Egg (Regular Biscuit) 5.7 oz (163 g) 510 290 33.0 50 14.0 71 0.0 ... 36 12 2 6 2 18 6 0 10 20
13 Breakfast Sausage Biscuit with Egg (Large Biscuit) 6.2 oz (177 g) 570 330 37.0 57 15.0 74 0.0 ... 42 14 3 11 3 18 10 0 10 20
14 Breakfast Sausage Biscuit with Egg Whites (Regular Biscuit) 5.9 oz (167 g) 460 250 27.0 42 12.0 62 0.0 ... 34 11 2 6 3 18 0 0 8 15
15 Breakfast Sausage Biscuit with Egg Whites (Large Biscuit) 6.4 oz (181 g) 520 280 32.0 49 13.0 65 0.0 ... 40 13 3 11 3 18 4 0 8 15
16 Breakfast Southern Style Chicken Biscuit (Regular Biscuit) 5 oz (143 g) 410 180 20.0 31 8.0 41 0.0 ... 41 14 2 6 3 17 0 2 6 15
17 Breakfast Southern Style Chicken Biscuit (Large Biscuit) 5.5 oz (157 g) 470 220 24.0 37 9.0 45 0.0 ... 46 15 3 11 4 17 4 2 8 15
18 Breakfast Steak & Egg Biscuit (Regular Biscuit) 7.1 oz (201 g) 540 290 32.0 49 16.0 78 1.0 ... 38 13 2 8 3 25 10 2 20 25
19 Breakfast Bacon, Egg & Cheese McGriddles 6.1 oz (174 g) 460 190 21.0 32 9.0 44 0.0 ... 48 16 2 9 15 19 10 10 20 15
20 Breakfast Bacon, Egg & Cheese McGriddles with Egg Whites 6.3 oz (178 g) 400 140 15.0 24 7.0 34 0.0 ... 47 16 2 9 16 20 2 10 15 10
21 Breakfast Sausage McGriddles 5 oz (141 g) 420 200 22.0 34 8.0 40 0.0 ... 44 15 2 8 15 11 0 0 8 10
22 Breakfast Sausage, Egg & Cheese McGriddles 7.1 oz (201 g) 550 280 31.0 48 12.0 61 0.0 ... 48 16 2 9 15 20 10 0 20 15
23 Breakfast Sausage, Egg & Cheese McGriddles with Egg Whites 7.2 oz (205 g) 500 230 26.0 40 10.0 52 0.0 ... 46 15 2 9 15 21 2 0 20 10
24 Breakfast Bacon, Egg & Cheese Bagel 6.9 oz (197 g) 620 280 31.0 48 11.0 56 0.5 ... 57 19 3 11 7 30 20 15 20 20
25 Breakfast Bacon, Egg & Cheese Bagel with Egg Whites 7.1 oz (201 g) 570 230 25.0 39 9.0 45 0.5 ... 55 18 3 12 8 30 10 15 20 15
26 Breakfast Steak, Egg & Cheese Bagel 8.5 oz (241 g) 670 310 35.0 53 13.0 63 1.5 ... 56 19 3 12 7 33 20 4 25 25
27 Breakfast Big Breakfast (Regular Biscuit) 9.5 oz (269 g) 740 430 48.0 73 17.0 87 0.0 ... 51 17 3 12 3 28 15 2 15 25
28 Breakfast Big Breakfast (Large Biscuit) 10 oz (283 g) 800 470 52.0 80 18.0 90 0.0 ... 56 19 4 17 3 28 15 2 15 30
29 Breakfast Big Breakfast with Egg Whites (Regular Biscuit) 9.6 oz (272 g) 640 330 37.0 57 14.0 69 0.0 ... 50 17 3 12 3 26 0 2 10 15
30 Breakfast Big Breakfast with Egg Whites (Large Biscuit) 10.1 oz (286 g) 690 370 41.0 63 14.0 72 0.0 ... 55 18 4 17 4 26 4 2 10 15
31 Breakfast Big Breakfast with Hotcakes (Regular Biscuit) 14.8 oz (420 g) 1090 510 56.0 87 19.0 96 0.0 ... 111 37 6 23 17 36 15 2 25 40
32 Breakfast Big Breakfast with Hotcakes (Large Biscuit) 15.3 oz (434 g) 1150 540 60.0 93 20.0 100 0.0 ... 116 39 7 28 17 36 15 2 30 40
33 Breakfast Big Breakfast with Hotcakes and Egg Whites (Re... 14.9 oz (423 g) 990 410 46.0 70 16.0 78 0.0 ... 110 37 6 23 17 35 0 2 25 30
34 Breakfast Big Breakfast with Hotcakes and Egg Whites (La... 15.4 oz (437 g) 1050 450 50.0 77 16.0 81 0.0 ... 115 38 7 28 18 35 4 2 25 30
35 Breakfast Hotcakes 5.3 oz (151 g) 350 80 9.0 13 2.0 9 0.0 ... 60 20 3 10 14 8 0 0 15 15
36 Breakfast Hotcakes and Sausage 6.8 oz (192 g) 520 210 24.0 37 7.0 36 0.0 ... 61 20 3 10 14 15 0 0 15 15
37 Breakfast Sausage Burrito 3.9 oz (111 g) 300 150 16.0 25 7.0 33 0.0 ... 26 9 1 5 2 12 10 2 15 15
38 Breakfast Hash Brown 2 oz (56 g) 150 80 9.0 14 1.5 6 0.0 ... 15 5 2 6 0 1 0 2 0 2
39 Breakfast Cinnamon Melts 4 oz (114 g) 460 170 19.0 30 9.0 43 0.0 ... 66 22 3 11 32 6 4 0 6 15
40 Breakfast Fruit & Maple Oatmeal 9.6 oz (251 g) 290 35 4.0 6 1.5 8 0.0 ... 58 19 5 19 32 5 2 130 10 10
41 Breakfast Fruit & Maple Oatmeal without Brown Sugar 9.6 oz (251 g) 260 40 4.0 6 1.5 8 0.0 ... 49 16 5 22 18 5 2 130 6 10

42 rows × 24 columns

And we chain together multiple conditions as well…

  • Example: all entries where Category is Breakfast, Calories greater than 300, and Calories less than 500

mcd[ (mcd['Category'] == 'Breakfast') & 
  (mcd['Calories'] > 300) & 
  (mcd['Calories'] < 500) ]
Category Item Serving Size Calories Calories from Fat Total Fat Total Fat (% Daily Value) Saturated Fat Saturated Fat (% Daily Value) Trans Fat ... Carbohydrates Carbohydrates (% Daily Value) Dietary Fiber Dietary Fiber (% Daily Value) Sugars Protein Vitamin A (% Daily Value) Vitamin C (% Daily Value) Calcium (% Daily Value) Iron (% Daily Value)
2 Breakfast Sausage McMuffin 3.9 oz (111 g) 370 200 23.0 35 8.0 42 0.0 ... 29 10 4 17 2 14 8 0 25 10
3 Breakfast Sausage McMuffin with Egg 5.7 oz (161 g) 450 250 28.0 43 10.0 52 0.0 ... 30 10 4 17 2 21 15 0 30 15
4 Breakfast Sausage McMuffin with Egg Whites 5.7 oz (161 g) 400 210 23.0 35 8.0 42 0.0 ... 30 10 4 17 2 21 6 0 25 10
5 Breakfast Steak & Egg McMuffin 6.5 oz (185 g) 430 210 23.0 36 9.0 46 1.0 ... 31 10 4 18 3 26 15 2 30 20
6 Breakfast Bacon, Egg & Cheese Biscuit (Regular Biscuit) 5.3 oz (150 g) 460 230 26.0 40 13.0 65 0.0 ... 38 13 2 7 3 19 10 8 15 15
8 Breakfast Bacon, Egg & Cheese Biscuit with Egg Whites (R... 5.4 oz (153 g) 410 180 20.0 32 11.0 56 0.0 ... 36 12 2 7 3 20 2 8 15 10
9 Breakfast Bacon, Egg & Cheese Biscuit with Egg Whites (L... 5.9 oz (167 g) 470 220 25.0 38 12.0 59 0.0 ... 42 14 3 12 4 20 6 8 15 15
10 Breakfast Sausage Biscuit (Regular Biscuit) 4.1 oz (117 g) 430 240 27.0 42 12.0 62 0.0 ... 34 11 2 6 2 11 0 0 6 15
11 Breakfast Sausage Biscuit (Large Biscuit) 4.6 oz (131 g) 480 280 31.0 48 13.0 65 0.0 ... 39 13 3 11 3 11 4 0 8 15
14 Breakfast Sausage Biscuit with Egg Whites (Regular Biscuit) 5.9 oz (167 g) 460 250 27.0 42 12.0 62 0.0 ... 34 11 2 6 3 18 0 0 8 15
16 Breakfast Southern Style Chicken Biscuit (Regular Biscuit) 5 oz (143 g) 410 180 20.0 31 8.0 41 0.0 ... 41 14 2 6 3 17 0 2 6 15
17 Breakfast Southern Style Chicken Biscuit (Large Biscuit) 5.5 oz (157 g) 470 220 24.0 37 9.0 45 0.0 ... 46 15 3 11 4 17 4 2 8 15
19 Breakfast Bacon, Egg & Cheese McGriddles 6.1 oz (174 g) 460 190 21.0 32 9.0 44 0.0 ... 48 16 2 9 15 19 10 10 20 15
20 Breakfast Bacon, Egg & Cheese McGriddles with Egg Whites 6.3 oz (178 g) 400 140 15.0 24 7.0 34 0.0 ... 47 16 2 9 16 20 2 10 15 10
21 Breakfast Sausage McGriddles 5 oz (141 g) 420 200 22.0 34 8.0 40 0.0 ... 44 15 2 8 15 11 0 0 8 10
35 Breakfast Hotcakes 5.3 oz (151 g) 350 80 9.0 13 2.0 9 0.0 ... 60 20 3 10 14 8 0 0 15 15
39 Breakfast Cinnamon Melts 4 oz (114 g) 460 170 19.0 30 9.0 43 0.0 ... 66 22 3 11 32 6 4 0 6 15

17 rows × 24 columns

How does this work?

  • If we take a look at the filtering conditionals, they will determine all rows where the given condition is met

  • If they are met, you’ll get a True value returned, and if they are not met, you’ll get a False value returned…

  • Take a look if we just specify the conditionals…

  • Note that the length of the output matches the length of our dataframe (the number of rows)…i.e., every row has either a True or a False

(mcd['Category'] == 'Breakfast') & (mcd['Calories'] > 300)
0      False
1      False
2       True
3       True
4       True
       ...  
255    False
256    False
257    False
258    False
259    False
Length: 260, dtype: bool
# now lets assign this list of booleans to another variable
index = (mcd['Category'] == 'Breakfast') & (mcd['Calories'] > 300)
# and by substitution, we get...
mcd [ index ]
Category Item Serving Size Calories Calories from Fat Total Fat Total Fat (% Daily Value) Saturated Fat Saturated Fat (% Daily Value) Trans Fat ... Carbohydrates Carbohydrates (% Daily Value) Dietary Fiber Dietary Fiber (% Daily Value) Sugars Protein Vitamin A (% Daily Value) Vitamin C (% Daily Value) Calcium (% Daily Value) Iron (% Daily Value)
2 Breakfast Sausage McMuffin 3.9 oz (111 g) 370 200 23.0 35 8.0 42 0.0 ... 29 10 4 17 2 14 8 0 25 10
3 Breakfast Sausage McMuffin with Egg 5.7 oz (161 g) 450 250 28.0 43 10.0 52 0.0 ... 30 10 4 17 2 21 15 0 30 15
4 Breakfast Sausage McMuffin with Egg Whites 5.7 oz (161 g) 400 210 23.0 35 8.0 42 0.0 ... 30 10 4 17 2 21 6 0 25 10
5 Breakfast Steak & Egg McMuffin 6.5 oz (185 g) 430 210 23.0 36 9.0 46 1.0 ... 31 10 4 18 3 26 15 2 30 20
6 Breakfast Bacon, Egg & Cheese Biscuit (Regular Biscuit) 5.3 oz (150 g) 460 230 26.0 40 13.0 65 0.0 ... 38 13 2 7 3 19 10 8 15 15
7 Breakfast Bacon, Egg & Cheese Biscuit (Large Biscuit) 5.8 oz (164 g) 520 270 30.0 47 14.0 68 0.0 ... 43 14 3 12 4 19 15 8 20 20
8 Breakfast Bacon, Egg & Cheese Biscuit with Egg Whites (R... 5.4 oz (153 g) 410 180 20.0 32 11.0 56 0.0 ... 36 12 2 7 3 20 2 8 15 10
9 Breakfast Bacon, Egg & Cheese Biscuit with Egg Whites (L... 5.9 oz (167 g) 470 220 25.0 38 12.0 59 0.0 ... 42 14 3 12 4 20 6 8 15 15
10 Breakfast Sausage Biscuit (Regular Biscuit) 4.1 oz (117 g) 430 240 27.0 42 12.0 62 0.0 ... 34 11 2 6 2 11 0 0 6 15
11 Breakfast Sausage Biscuit (Large Biscuit) 4.6 oz (131 g) 480 280 31.0 48 13.0 65 0.0 ... 39 13 3 11 3 11 4 0 8 15
12 Breakfast Sausage Biscuit with Egg (Regular Biscuit) 5.7 oz (163 g) 510 290 33.0 50 14.0 71 0.0 ... 36 12 2 6 2 18 6 0 10 20
13 Breakfast Sausage Biscuit with Egg (Large Biscuit) 6.2 oz (177 g) 570 330 37.0 57 15.0 74 0.0 ... 42 14 3 11 3 18 10 0 10 20
14 Breakfast Sausage Biscuit with Egg Whites (Regular Biscuit) 5.9 oz (167 g) 460 250 27.0 42 12.0 62 0.0 ... 34 11 2 6 3 18 0 0 8 15
15 Breakfast Sausage Biscuit with Egg Whites (Large Biscuit) 6.4 oz (181 g) 520 280 32.0 49 13.0 65 0.0 ... 40 13 3 11 3 18 4 0 8 15
16 Breakfast Southern Style Chicken Biscuit (Regular Biscuit) 5 oz (143 g) 410 180 20.0 31 8.0 41 0.0 ... 41 14 2 6 3 17 0 2 6 15
17 Breakfast Southern Style Chicken Biscuit (Large Biscuit) 5.5 oz (157 g) 470 220 24.0 37 9.0 45 0.0 ... 46 15 3 11 4 17 4 2 8 15
18 Breakfast Steak & Egg Biscuit (Regular Biscuit) 7.1 oz (201 g) 540 290 32.0 49 16.0 78 1.0 ... 38 13 2 8 3 25 10 2 20 25
19 Breakfast Bacon, Egg & Cheese McGriddles 6.1 oz (174 g) 460 190 21.0 32 9.0 44 0.0 ... 48 16 2 9 15 19 10 10 20 15
20 Breakfast Bacon, Egg & Cheese McGriddles with Egg Whites 6.3 oz (178 g) 400 140 15.0 24 7.0 34 0.0 ... 47 16 2 9 16 20 2 10 15 10
21 Breakfast Sausage McGriddles 5 oz (141 g) 420 200 22.0 34 8.0 40 0.0 ... 44 15 2 8 15 11 0 0 8 10
22 Breakfast Sausage, Egg & Cheese McGriddles 7.1 oz (201 g) 550 280 31.0 48 12.0 61 0.0 ... 48 16 2 9 15 20 10 0 20 15
23 Breakfast Sausage, Egg & Cheese McGriddles with Egg Whites 7.2 oz (205 g) 500 230 26.0 40 10.0 52 0.0 ... 46 15 2 9 15 21 2 0 20 10
24 Breakfast Bacon, Egg & Cheese Bagel 6.9 oz (197 g) 620 280 31.0 48 11.0 56 0.5 ... 57 19 3 11 7 30 20 15 20 20
25 Breakfast Bacon, Egg & Cheese Bagel with Egg Whites 7.1 oz (201 g) 570 230 25.0 39 9.0 45 0.5 ... 55 18 3 12 8 30 10 15 20 15
26 Breakfast Steak, Egg & Cheese Bagel 8.5 oz (241 g) 670 310 35.0 53 13.0 63 1.5 ... 56 19 3 12 7 33 20 4 25 25
27 Breakfast Big Breakfast (Regular Biscuit) 9.5 oz (269 g) 740 430 48.0 73 17.0 87 0.0 ... 51 17 3 12 3 28 15 2 15 25
28 Breakfast Big Breakfast (Large Biscuit) 10 oz (283 g) 800 470 52.0 80 18.0 90 0.0 ... 56 19 4 17 3 28 15 2 15 30
29 Breakfast Big Breakfast with Egg Whites (Regular Biscuit) 9.6 oz (272 g) 640 330 37.0 57 14.0 69 0.0 ... 50 17 3 12 3 26 0 2 10 15
30 Breakfast Big Breakfast with Egg Whites (Large Biscuit) 10.1 oz (286 g) 690 370 41.0 63 14.0 72 0.0 ... 55 18 4 17 4 26 4 2 10 15
31 Breakfast Big Breakfast with Hotcakes (Regular Biscuit) 14.8 oz (420 g) 1090 510 56.0 87 19.0 96 0.0 ... 111 37 6 23 17 36 15 2 25 40
32 Breakfast Big Breakfast with Hotcakes (Large Biscuit) 15.3 oz (434 g) 1150 540 60.0 93 20.0 100 0.0 ... 116 39 7 28 17 36 15 2 30 40
33 Breakfast Big Breakfast with Hotcakes and Egg Whites (Re... 14.9 oz (423 g) 990 410 46.0 70 16.0 78 0.0 ... 110 37 6 23 17 35 0 2 25 30
34 Breakfast Big Breakfast with Hotcakes and Egg Whites (La... 15.4 oz (437 g) 1050 450 50.0 77 16.0 81 0.0 ... 115 38 7 28 18 35 4 2 25 30
35 Breakfast Hotcakes 5.3 oz (151 g) 350 80 9.0 13 2.0 9 0.0 ... 60 20 3 10 14 8 0 0 15 15
36 Breakfast Hotcakes and Sausage 6.8 oz (192 g) 520 210 24.0 37 7.0 36 0.0 ... 61 20 3 10 14 15 0 0 15 15
39 Breakfast Cinnamon Melts 4 oz (114 g) 460 170 19.0 30 9.0 43 0.0 ... 66 22 3 11 32 6 4 0 6 15

36 rows × 24 columns

Review of logical indexing…

  • make conjunctions by combining arrays of logical values with & (and operations)

  • make disjunctions with | (or operations)

  • and we need to put comparison operations in () before combining

Other handy methods…

Basic math operations…

  • describe is a handy way to get summary stats…

  • mean, std, etc…

mcd.describe()
Calories Calories from Fat Total Fat Total Fat (% Daily Value) Saturated Fat Saturated Fat (% Daily Value) Trans Fat Cholesterol Cholesterol (% Daily Value) Sodium ... Carbohydrates Carbohydrates (% Daily Value) Dietary Fiber Dietary Fiber (% Daily Value) Sugars Protein Vitamin A (% Daily Value) Vitamin C (% Daily Value) Calcium (% Daily Value) Iron (% Daily Value)
count 260.000000 260.000000 260.000000 260.000000 260.000000 260.000000 260.000000 260.000000 260.000000 260.000000 ... 260.000000 260.000000 260.000000 260.000000 260.000000 260.000000 260.000000 260.000000 260.000000 260.000000
mean 368.269231 127.096154 14.165385 21.815385 6.007692 29.965385 0.203846 54.942308 18.392308 495.750000 ... 47.346154 15.780769 1.630769 6.530769 29.423077 13.338462 13.426923 8.534615 20.973077 7.734615
std 240.269886 127.875914 14.205998 21.885199 5.321873 26.639209 0.429133 87.269257 29.091653 577.026323 ... 28.252232 9.419544 1.567717 6.307057 28.679797 11.426146 24.366381 26.345542 17.019953 8.723263
min 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 ... 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
25% 210.000000 20.000000 2.375000 3.750000 1.000000 4.750000 0.000000 5.000000 2.000000 107.500000 ... 30.000000 10.000000 0.000000 0.000000 5.750000 4.000000 2.000000 0.000000 6.000000 0.000000
50% 340.000000 100.000000 11.000000 17.000000 5.000000 24.000000 0.000000 35.000000 11.000000 190.000000 ... 44.000000 15.000000 1.000000 5.000000 17.500000 12.000000 8.000000 0.000000 20.000000 4.000000
75% 500.000000 200.000000 22.250000 35.000000 10.000000 48.000000 0.000000 65.000000 21.250000 865.000000 ... 60.000000 20.000000 3.000000 10.000000 48.000000 19.000000 15.000000 4.000000 30.000000 15.000000
max 1880.000000 1060.000000 118.000000 182.000000 20.000000 102.000000 2.500000 575.000000 192.000000 3600.000000 ... 141.000000 47.000000 7.000000 28.000000 128.000000 87.000000 170.000000 240.000000 70.000000 40.000000

8 rows × 21 columns

# mean Calories...
mcd.Calories.mean()

# or
# mcd['Calories'].mean()
368.2692307692308
# standard deviation
mcd.Calories.std()
240.26988649145852

Sorting entire dataframe based on values in one column

  • here we can easily find the menu item with the fewest calories

  • sort by the ‘Calories’ column, then pull out the name of the item

  • default behavior is to not sort in place, so we need to reassign!

# sort - ascending order by default
mcd.sort_values('Calories')
Category Item Serving Size Calories Calories from Fat Total Fat Total Fat (% Daily Value) Saturated Fat Saturated Fat (% Daily Value) Trans Fat ... Carbohydrates Carbohydrates (% Daily Value) Dietary Fiber Dietary Fiber (% Daily Value) Sugars Protein Vitamin A (% Daily Value) Vitamin C (% Daily Value) Calcium (% Daily Value) Iron (% Daily Value)
125 Beverages Diet Dr Pepper (Child) 12 fl oz cup 0 0 0.0 0 0.0 0 0.0 ... 0 0 0 0 0 1 0 0 0 0
138 Coffee & Tea Iced Tea (Medium) 21 fl oz cup 0 0 0.0 0 0.0 0 0.0 ... 0 0 0 0 0 0 0 0 0 0
137 Coffee & Tea Iced Tea (Small) 16 fl oz cup 0 0 0.0 0 0.0 0 0.0 ... 0 0 0 0 0 0 0 0 0 0
136 Beverages Dasani Water Bottle 16.9 fl oz 0 0 0.0 0 0.0 0 0.0 ... 0 0 0 0 0 0 0 0 0 0
145 Coffee & Tea Coffee (Small) 12 fl oz cup 0 0 0.0 0 0.0 0 0.0 ... 0 0 0 0 0 0 0 0 0 0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
33 Breakfast Big Breakfast with Hotcakes and Egg Whites (Re... 14.9 oz (423 g) 990 410 46.0 70 16.0 78 0.0 ... 110 37 6 23 17 35 0 2 25 30
34 Breakfast Big Breakfast with Hotcakes and Egg Whites (La... 15.4 oz (437 g) 1050 450 50.0 77 16.0 81 0.0 ... 115 38 7 28 18 35 4 2 25 30
31 Breakfast Big Breakfast with Hotcakes (Regular Biscuit) 14.8 oz (420 g) 1090 510 56.0 87 19.0 96 0.0 ... 111 37 6 23 17 36 15 2 25 40
32 Breakfast Big Breakfast with Hotcakes (Large Biscuit) 15.3 oz (434 g) 1150 540 60.0 93 20.0 100 0.0 ... 116 39 7 28 17 36 15 2 30 40
82 Chicken & Fish Chicken McNuggets (40 piece) 22.8 oz (646 g) 1880 1060 118.0 182 20.0 101 1.0 ... 118 39 6 24 1 87 0 15 8 25

260 rows × 24 columns

# descending order (ascending = False)
mcd.sort_values('Calories', ascending = False)
Category Item Serving Size Calories Calories from Fat Total Fat Total Fat (% Daily Value) Saturated Fat Saturated Fat (% Daily Value) Trans Fat ... Carbohydrates Carbohydrates (% Daily Value) Dietary Fiber Dietary Fiber (% Daily Value) Sugars Protein Vitamin A (% Daily Value) Vitamin C (% Daily Value) Calcium (% Daily Value) Iron (% Daily Value)
82 Chicken & Fish Chicken McNuggets (40 piece) 22.8 oz (646 g) 1880 1060 118.0 182 20.0 101 1.0 ... 118 39 6 24 1 87 0 15 8 25
32 Breakfast Big Breakfast with Hotcakes (Large Biscuit) 15.3 oz (434 g) 1150 540 60.0 93 20.0 100 0.0 ... 116 39 7 28 17 36 15 2 30 40
31 Breakfast Big Breakfast with Hotcakes (Regular Biscuit) 14.8 oz (420 g) 1090 510 56.0 87 19.0 96 0.0 ... 111 37 6 23 17 36 15 2 25 40
34 Breakfast Big Breakfast with Hotcakes and Egg Whites (La... 15.4 oz (437 g) 1050 450 50.0 77 16.0 81 0.0 ... 115 38 7 28 18 35 4 2 25 30
33 Breakfast Big Breakfast with Hotcakes and Egg Whites (Re... 14.9 oz (423 g) 990 410 46.0 70 16.0 78 0.0 ... 110 37 6 23 17 35 0 2 25 30
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
117 Beverages Diet Coke (Child) 12 fl oz cup 0 0 0.0 0 0.0 0 0.0 ... 0 0 0 0 0 0 0 0 0 0
116 Beverages Diet Coke (Large) 30 fl oz cup 0 0 0.0 0 0.0 0 0.0 ... 0 0 0 0 0 0 0 0 0 0
115 Beverages Diet Coke (Medium) 21 fl oz cup 0 0 0.0 0 0.0 0 0.0 ... 0 0 0 0 0 0 0 0 0 0
114 Beverages Diet Coke (Small) 16 fl oz cup 0 0 0.0 0 0.0 0 0.0 ... 0 0 0 0 0 0 0 0 0 0
146 Coffee & Tea Coffee (Medium) 16 fl oz cup 0 0 0.0 0 0.0 0 0.0 ... 0 0 0 0 0 0 0 0 0 0

260 rows × 24 columns

Chaining together methods

  • Possible to apply multiple methods in one line of code…

  • Careful here…can be super effecient and compact, but you can get carried away and make your code really confusing and hard to understand (even to yourself!)

# sort and then display the last three rows
mcd.sort_values('Total Fat').tail(3)
Category Item Serving Size Calories Calories from Fat Total Fat Total Fat (% Daily Value) Saturated Fat Saturated Fat (% Daily Value) Trans Fat ... Carbohydrates Carbohydrates (% Daily Value) Dietary Fiber Dietary Fiber (% Daily Value) Sugars Protein Vitamin A (% Daily Value) Vitamin C (% Daily Value) Calcium (% Daily Value) Iron (% Daily Value)
81 Chicken & Fish Chicken McNuggets (20 piece) 11.4 oz (323 g) 940 530 59.0 91 10.0 50 0.0 ... 59 20 3 12 0 44 0 8 4 10
32 Breakfast Big Breakfast with Hotcakes (Large Biscuit) 15.3 oz (434 g) 1150 540 60.0 93 20.0 100 0.0 ... 116 39 7 28 17 36 15 2 30 40
82 Chicken & Fish Chicken McNuggets (40 piece) 22.8 oz (646 g) 1880 1060 118.0 182 20.0 101 1.0 ... 118 39 6 24 1 87 0 15 8 25

3 rows × 24 columns

# another way to achieve the same outcome...
mcd.sort_values('Total Fat').iloc[-3:]
Category Item Serving Size Calories Calories from Fat Total Fat Total Fat (% Daily Value) Saturated Fat Saturated Fat (% Daily Value) Trans Fat ... Carbohydrates Carbohydrates (% Daily Value) Dietary Fiber Dietary Fiber (% Daily Value) Sugars Protein Vitamin A (% Daily Value) Vitamin C (% Daily Value) Calcium (% Daily Value) Iron (% Daily Value)
81 Chicken & Fish Chicken McNuggets (20 piece) 11.4 oz (323 g) 940 530 59.0 91 10.0 50 0.0 ... 59 20 3 12 0 44 0 8 4 10
32 Breakfast Big Breakfast with Hotcakes (Large Biscuit) 15.3 oz (434 g) 1150 540 60.0 93 20.0 100 0.0 ... 116 39 7 28 17 36 15 2 30 40
82 Chicken & Fish Chicken McNuggets (40 piece) 22.8 oz (646 g) 1880 1060 118.0 182 20.0 101 1.0 ... 118 39 6 24 1 87 0 15 8 25

3 rows × 24 columns

# three highest fat items in descending order
mcd.sort_values('Total Fat', ascending = False).iloc[:3]
Category Item Serving Size Calories Calories from Fat Total Fat Total Fat (% Daily Value) Saturated Fat Saturated Fat (% Daily Value) Trans Fat ... Carbohydrates Carbohydrates (% Daily Value) Dietary Fiber Dietary Fiber (% Daily Value) Sugars Protein Vitamin A (% Daily Value) Vitamin C (% Daily Value) Calcium (% Daily Value) Iron (% Daily Value)
82 Chicken & Fish Chicken McNuggets (40 piece) 22.8 oz (646 g) 1880 1060 118.0 182 20.0 101 1.0 ... 118 39 6 24 1 87 0 15 8 25
32 Breakfast Big Breakfast with Hotcakes (Large Biscuit) 15.3 oz (434 g) 1150 540 60.0 93 20.0 100 0.0 ... 116 39 7 28 17 36 15 2 30 40
81 Chicken & Fish Chicken McNuggets (20 piece) 11.4 oz (323 g) 940 530 59.0 91 10.0 50 0.0 ... 59 20 3 12 0 44 0 8 4 10

3 rows × 24 columns

Solving simple tasks - examples

Find the breakfast menu item with the fewest calories

mcd[ mcd['Category'] == 'Breakfast'].sort_values('Calories').head(1)
Category Item Serving Size Calories Calories from Fat Total Fat Total Fat (% Daily Value) Saturated Fat Saturated Fat (% Daily Value) Trans Fat ... Carbohydrates Carbohydrates (% Daily Value) Dietary Fiber Dietary Fiber (% Daily Value) Sugars Protein Vitamin A (% Daily Value) Vitamin C (% Daily Value) Calcium (% Daily Value) Iron (% Daily Value)
38 Breakfast Hash Brown 2 oz (56 g) 150 80 9.0 14 1.5 6 0.0 ... 15 5 2 6 0 1 0 2 0 2

1 rows × 24 columns

Find the highest john_index food

  • often times we’re asked to filter data based on some parameters (e.g., marketing tells you to define some index and find all items that fall into that category).

  • define some index: john_index = 12*(protein grams + fiber grams)/calories

  • make it a new column in our dataframe…then we can sort by it (or filter by it)

  • All we have to do to make a new column is give it a name and give it some values - just like we can make new key:value pairs in a dictionary…

# this will define a new column on the fly and compute the john-index
mcd['john_index'] = 12 * (mcd['Protein'] + mcd['Dietary Fiber'])/mcd['Calories']

Note the resulting column has some funny stuff in it!

mcd['john_index'].unique()
array([0.84      , 1.056     , 0.58378378, 0.66666667, 0.75      ,
       0.8372093 , 0.54782609, 0.50769231, 0.64390244, 0.58723404,
       0.3627907 , 0.35      , 0.47058824, 0.44210526, 0.52173913,
       0.48461538, 0.55609756, 0.5106383 , 0.6       , 0.66      ,
       0.37142857, 0.48      , 0.552     , 0.63870968, 0.69473684,
       0.64477612, 0.5027027 , 0.54375   , 0.46238532, 0.44869565,
       0.4969697 , 0.37714286, 0.41538462, 0.52      , 0.24      ,
       0.23478261, 0.4137931 , 0.46153846, 0.61132075, 0.76153846,
       0.8       , 0.78688525, 0.71111111, 0.816     , 0.65      ,
       0.70344828, 0.7255814 , 0.71666667, 0.75789474, 0.79090909,
       0.66976744, 0.63529412, 1.06285714, 0.69850746, 1.01176471,
       0.68852459, 1.04      , 0.64      , 0.89491525, 0.64186047,
       0.53333333, 0.61395349, 0.975     , 0.59016393, 0.88      ,
       0.57313433, 0.83076923, 0.57777778, 0.94736842, 0.63157895,
       0.61276596, 0.59361702, 0.52307692, 1.02857143, 0.85263158,
       1.69090909, 1.40689655, 0.52941176, 0.78461538, 0.54545455,
       0.72857143, 0.20869565, 0.28235294, 0.25882353, 0.21818182,
       1.2       , 0.        , 0.4       , 0.288     , 0.225     ,
       0.26666667, 0.32727273, 0.24705882, 0.25714286,        nan,
              inf, 0.96      , 0.92307692, 0.16      , 0.18947368,
       0.17142857, 0.08      , 0.06666667, 0.05454545, 0.70588235,
       0.68571429, 0.44444444, 0.42352941, 0.44651163, 0.43636364,
       0.45714286, 0.57142857, 0.61818182, 1.32      , 0.624     ,
       0.65806452, 0.68      , 0.94285714, 0.91764706, 0.98181818,
       0.43902439, 0.432     , 0.58461538, 0.4125    , 0.425     ,
       0.528     , 0.54193548, 0.40909091, 0.55714286, 0.08571429,
       0.06315789, 0.08888889, 0.09230769, 0.096     , 0.1       ,
       0.07058824, 0.15      , 0.37241379, 0.34285714, 0.45      ,
       0.45517241, 0.49230769, 0.35294118, 0.36521739, 0.4173913 ,
       0.48888889, 0.48648649, 0.21333333, 0.21492537, 0.18666667,
       0.19636364, 0.19701493, 0.20377358, 0.19047619, 0.20526316,
       0.27272727, 0.32307692, 0.31764706, 0.336     , 0.18461538,
       0.21176471, 0.2490566 , 0.25454545, 0.26341463, 0.26181818,
       0.26086957, 0.25411765, 0.27857143, 0.29142857, 0.29647059,
       0.25846154, 0.28387097, 0.27906977, 0.30588235, 0.27826087,
       0.34074074, 0.32195122])

Why?

  • inf == infinity…happens when you divide by zero!

    • for Diet Dr. Pepper, john_index == (1 + 0) / 0

  • nan == not-a-number…happens when the thing you try to do isn’t a number (like 0/0)

    • for Coffee and Tea, john_index == (0 + 0) / 0

mcd.head(2)
Category Item Serving Size Calories Calories from Fat Total Fat Total Fat (% Daily Value) Saturated Fat Saturated Fat (% Daily Value) Trans Fat ... Carbohydrates (% Daily Value) Dietary Fiber Dietary Fiber (% Daily Value) Sugars Protein Vitamin A (% Daily Value) Vitamin C (% Daily Value) Calcium (% Daily Value) Iron (% Daily Value) john_index
0 Breakfast Egg McMuffin 4.8 oz (136 g) 300 120 13.0 20 5.0 25 0.0 ... 10 4 17 3 17 10 0 25 15 0.840
1 Breakfast Egg White Delight 4.8 oz (135 g) 250 70 8.0 12 3.0 15 0.0 ... 10 4 17 3 18 6 0 25 8 1.056

2 rows × 25 columns

# import numpy for np.isinf() to weed out the inf entries
import numpy as np
(mcd [ ~ mcd['john_index'].isna() & ~np.isinf(mcd['john_index'])]
 .sort_values('john_index', ascending=False) )
Category Item Serving Size Calories Calories from Fat Total Fat Total Fat (% Daily Value) Saturated Fat Saturated Fat (% Daily Value) Trans Fat ... Carbohydrates (% Daily Value) Dietary Fiber Dietary Fiber (% Daily Value) Sugars Protein Vitamin A (% Daily Value) Vitamin C (% Daily Value) Calcium (% Daily Value) Iron (% Daily Value) john_index
86 Salads Premium Bacon Ranch Salad with Grilled Chicken 8.5 oz (241 g) 220 80 8.0 13 4.0 20 0.0 ... 3 2 10 4 29 110 30 15 8 1.690909
89 Salads Premium Southwest Salad with Grilled Chicken 11.8 oz (335 g) 290 80 8.0 13 2.5 13 0.0 ... 9 7 28 10 27 170 30 15 15 1.406897
163 Coffee & Tea Nonfat Latte (Small) 12 fl oz cup 100 0 0.0 0 0.0 0 0.0 ... 5 1 3 13 10 10 0 30 0 1.320000
165 Coffee & Tea Nonfat Latte (Large) 20 fl oz cup 170 0 0.5 1 0.0 0 0.0 ... 8 1 6 21 16 15 0 50 2 1.200000
164 Coffee & Tea Nonfat Latte (Medium) 16 fl oz cup 130 0 0.0 0 0.0 0 0.0 ... 6 1 4 16 12 15 0 40 0 1.200000
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
113 Beverages Coca-Cola Classic (Child) 12 fl oz cup 100 0 0.0 0 0.0 0 0.0 ... 9 0 0 28 0 0 0 0 0 0.000000
112 Beverages Coca-Cola Classic (Large) 30 fl oz cup 280 0 0.0 0 0.0 0 0.0 ... 25 0 0 76 0 0 0 0 0 0.000000
110 Beverages Coca-Cola Classic (Small) 16 fl oz cup 140 0 0.0 0 0.0 0 0.0 ... 13 0 0 39 0 0 0 0 0 0.000000
101 Snacks & Sides Apple Slices 1.2 oz (34 g) 15 0 0.0 0 0.0 0 0.0 ... 1 0 0 3 0 0 160 2 0 0.000000
144 Coffee & Tea Sweet Tea (Child) 12 fl oz cup 110 0 0.0 0 0.0 0 0.0 ... 9 0 0 27 0 0 0 0 0 0.000000

244 rows × 25 columns

Alternate approach using .replace()

  • can also directly replace nan/infs with other values

mcd['john_index'].replace(np.nan, 0).unique()
array([0.84      , 1.056     , 0.58378378, 0.66666667, 0.75      ,
       0.8372093 , 0.54782609, 0.50769231, 0.64390244, 0.58723404,
       0.3627907 , 0.35      , 0.47058824, 0.44210526, 0.52173913,
       0.48461538, 0.55609756, 0.5106383 , 0.6       , 0.66      ,
       0.37142857, 0.48      , 0.552     , 0.63870968, 0.69473684,
       0.64477612, 0.5027027 , 0.54375   , 0.46238532, 0.44869565,
       0.4969697 , 0.37714286, 0.41538462, 0.52      , 0.24      ,
       0.23478261, 0.4137931 , 0.46153846, 0.61132075, 0.76153846,
       0.8       , 0.78688525, 0.71111111, 0.816     , 0.65      ,
       0.70344828, 0.7255814 , 0.71666667, 0.75789474, 0.79090909,
       0.66976744, 0.63529412, 1.06285714, 0.69850746, 1.01176471,
       0.68852459, 1.04      , 0.64      , 0.89491525, 0.64186047,
       0.53333333, 0.61395349, 0.975     , 0.59016393, 0.88      ,
       0.57313433, 0.83076923, 0.57777778, 0.94736842, 0.63157895,
       0.61276596, 0.59361702, 0.52307692, 1.02857143, 0.85263158,
       1.69090909, 1.40689655, 0.52941176, 0.78461538, 0.54545455,
       0.72857143, 0.20869565, 0.28235294, 0.25882353, 0.21818182,
       1.2       , 0.        , 0.4       , 0.288     , 0.225     ,
       0.26666667, 0.32727273, 0.24705882, 0.25714286,        inf,
       0.96      , 0.92307692, 0.16      , 0.18947368, 0.17142857,
       0.08      , 0.06666667, 0.05454545, 0.70588235, 0.68571429,
       0.44444444, 0.42352941, 0.44651163, 0.43636364, 0.45714286,
       0.57142857, 0.61818182, 1.32      , 0.624     , 0.65806452,
       0.68      , 0.94285714, 0.91764706, 0.98181818, 0.43902439,
       0.432     , 0.58461538, 0.4125    , 0.425     , 0.528     ,
       0.54193548, 0.40909091, 0.55714286, 0.08571429, 0.06315789,
       0.08888889, 0.09230769, 0.096     , 0.1       , 0.07058824,
       0.15      , 0.37241379, 0.34285714, 0.45      , 0.45517241,
       0.49230769, 0.35294118, 0.36521739, 0.4173913 , 0.48888889,
       0.48648649, 0.21333333, 0.21492537, 0.18666667, 0.19636364,
       0.19701493, 0.20377358, 0.19047619, 0.20526316, 0.27272727,
       0.32307692, 0.31764706, 0.336     , 0.18461538, 0.21176471,
       0.2490566 , 0.25454545, 0.26341463, 0.26181818, 0.26086957,
       0.25411765, 0.27857143, 0.29142857, 0.29647059, 0.25846154,
       0.28387097, 0.27906977, 0.30588235, 0.27826087, 0.34074074,
       0.32195122])
mcd['john_index'].replace(np.inf, 0).unique()
array([0.84      , 1.056     , 0.58378378, 0.66666667, 0.75      ,
       0.8372093 , 0.54782609, 0.50769231, 0.64390244, 0.58723404,
       0.3627907 , 0.35      , 0.47058824, 0.44210526, 0.52173913,
       0.48461538, 0.55609756, 0.5106383 , 0.6       , 0.66      ,
       0.37142857, 0.48      , 0.552     , 0.63870968, 0.69473684,
       0.64477612, 0.5027027 , 0.54375   , 0.46238532, 0.44869565,
       0.4969697 , 0.37714286, 0.41538462, 0.52      , 0.24      ,
       0.23478261, 0.4137931 , 0.46153846, 0.61132075, 0.76153846,
       0.8       , 0.78688525, 0.71111111, 0.816     , 0.65      ,
       0.70344828, 0.7255814 , 0.71666667, 0.75789474, 0.79090909,
       0.66976744, 0.63529412, 1.06285714, 0.69850746, 1.01176471,
       0.68852459, 1.04      , 0.64      , 0.89491525, 0.64186047,
       0.53333333, 0.61395349, 0.975     , 0.59016393, 0.88      ,
       0.57313433, 0.83076923, 0.57777778, 0.94736842, 0.63157895,
       0.61276596, 0.59361702, 0.52307692, 1.02857143, 0.85263158,
       1.69090909, 1.40689655, 0.52941176, 0.78461538, 0.54545455,
       0.72857143, 0.20869565, 0.28235294, 0.25882353, 0.21818182,
       1.2       , 0.        , 0.4       , 0.288     , 0.225     ,
       0.26666667, 0.32727273, 0.24705882, 0.25714286,        nan,
       0.96      , 0.92307692, 0.16      , 0.18947368, 0.17142857,
       0.08      , 0.06666667, 0.05454545, 0.70588235, 0.68571429,
       0.44444444, 0.42352941, 0.44651163, 0.43636364, 0.45714286,
       0.57142857, 0.61818182, 1.32      , 0.624     , 0.65806452,
       0.68      , 0.94285714, 0.91764706, 0.98181818, 0.43902439,
       0.432     , 0.58461538, 0.4125    , 0.425     , 0.528     ,
       0.54193548, 0.40909091, 0.55714286, 0.08571429, 0.06315789,
       0.08888889, 0.09230769, 0.096     , 0.1       , 0.07058824,
       0.15      , 0.37241379, 0.34285714, 0.45      , 0.45517241,
       0.49230769, 0.35294118, 0.36521739, 0.4173913 , 0.48888889,
       0.48648649, 0.21333333, 0.21492537, 0.18666667, 0.19636364,
       0.19701493, 0.20377358, 0.19047619, 0.20526316, 0.27272727,
       0.32307692, 0.31764706, 0.336     , 0.18461538, 0.21176471,
       0.2490566 , 0.25454545, 0.26341463, 0.26181818, 0.26086957,
       0.25411765, 0.27857143, 0.29142857, 0.29647059, 0.25846154,
       0.28387097, 0.27906977, 0.30588235, 0.27826087, 0.34074074,
       0.32195122])

Or you can fill missing values using fillna…

  • For example, can replace all nans with the mean of all other values…

mcd['john_index'].fillna(mcd['john_index'].mean()).unique()
# remember you need to overwrite mcd or make a new dataframe in order for these changes to 'stick'
array([0.84      , 1.056     , 0.58378378, 0.66666667, 0.75      ,
       0.8372093 , 0.54782609, 0.50769231, 0.64390244, 0.58723404,
       0.3627907 , 0.35      , 0.47058824, 0.44210526, 0.52173913,
       0.48461538, 0.55609756, 0.5106383 , 0.6       , 0.66      ,
       0.37142857, 0.48      , 0.552     , 0.63870968, 0.69473684,
       0.64477612, 0.5027027 , 0.54375   , 0.46238532, 0.44869565,
       0.4969697 , 0.37714286, 0.41538462, 0.52      , 0.24      ,
       0.23478261, 0.4137931 , 0.46153846, 0.61132075, 0.76153846,
       0.8       , 0.78688525, 0.71111111, 0.816     , 0.65      ,
       0.70344828, 0.7255814 , 0.71666667, 0.75789474, 0.79090909,
       0.66976744, 0.63529412, 1.06285714, 0.69850746, 1.01176471,
       0.68852459, 1.04      , 0.64      , 0.89491525, 0.64186047,
       0.53333333, 0.61395349, 0.975     , 0.59016393, 0.88      ,
       0.57313433, 0.83076923, 0.57777778, 0.94736842, 0.63157895,
       0.61276596, 0.59361702, 0.52307692, 1.02857143, 0.85263158,
       1.69090909, 1.40689655, 0.52941176, 0.78461538, 0.54545455,
       0.72857143, 0.20869565, 0.28235294, 0.25882353, 0.21818182,
       1.2       , 0.        , 0.4       , 0.288     , 0.225     ,
       0.26666667, 0.32727273, 0.24705882, 0.25714286,        inf,
       0.96      , 0.92307692, 0.16      , 0.18947368, 0.17142857,
       0.08      , 0.06666667, 0.05454545, 0.70588235, 0.68571429,
       0.44444444, 0.42352941, 0.44651163, 0.43636364, 0.45714286,
       0.57142857, 0.61818182, 1.32      , 0.624     , 0.65806452,
       0.68      , 0.94285714, 0.91764706, 0.98181818, 0.43902439,
       0.432     , 0.58461538, 0.4125    , 0.425     , 0.528     ,
       0.54193548, 0.40909091, 0.55714286, 0.08571429, 0.06315789,
       0.08888889, 0.09230769, 0.096     , 0.1       , 0.07058824,
       0.15      , 0.37241379, 0.34285714, 0.45      , 0.45517241,
       0.49230769, 0.35294118, 0.36521739, 0.4173913 , 0.48888889,
       0.48648649, 0.21333333, 0.21492537, 0.18666667, 0.19636364,
       0.19701493, 0.20377358, 0.19047619, 0.20526316, 0.27272727,
       0.32307692, 0.31764706, 0.336     , 0.18461538, 0.21176471,
       0.2490566 , 0.25454545, 0.26341463, 0.26181818, 0.26086957,
       0.25411765, 0.27857143, 0.29142857, 0.29647059, 0.25846154,
       0.28387097, 0.27906977, 0.30588235, 0.27826087, 0.34074074,
       0.32195122])
(mcd[mcd['Calories'] > 0]
.sort_values('john_index', ascending = False)
.head(1))
Category Item Serving Size Calories Calories from Fat Total Fat Total Fat (% Daily Value) Saturated Fat Saturated Fat (% Daily Value) Trans Fat ... Carbohydrates (% Daily Value) Dietary Fiber Dietary Fiber (% Daily Value) Sugars Protein Vitamin A (% Daily Value) Vitamin C (% Daily Value) Calcium (% Daily Value) Iron (% Daily Value) john_index
86 Salads Premium Bacon Ranch Salad with Grilled Chicken 8.5 oz (241 g) 220 80 8.0 13 4.0 20 0.0 ... 3 2 10 4 29 110 30 15 8 1.690909

1 rows × 25 columns