Loaders
The primary purpose of NFLData.jl is to load data from the NFL into Julia. All functions that load data are prefixed with load_*(), and every data resource is returned as a DataFrame.
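Because every loader follows the load_*() naming convention, the available loaders can be listed programmatically. The following is a minimal sketch in Base Julia. loader_names is a hypothetical helper, not part of NFLData.jl, and FakeLoaders is a stand-in module so the sketch runs without the package installed. It assumes the loaders are exported, which the examples on this page suggest because they call them directly after using NFLData.

```julia
# Hypothetical helper (not part of NFLData.jl): list a module's exported
# names that follow the load_*() naming convention.
function loader_names(mod::Module)
    # names(mod) returns the exported names, plus the module's own name
    exported = names(mod)
    loaders = filter(n -> startswith(String(n), "load_"), exported)
    return sort(loaders)
end

# Stand-in module so the sketch runs without NFLData.jl installed.
module FakeLoaders
export load_players, load_pbp, helper
load_players() = "players"
load_pbp(season = 2024) = "pbp $season"
helper() = nothing
end

println(loader_names(FakeLoaders))  # prints [:load_pbp, :load_players]
```

With NFLData.jl installed, calling loader_names(NFLData) after using NFLData should list the package's exported loaders, under the same assumption that they are exported.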
The first time a load_*() function is called for a given data resource, NFLData.jl downloads that resource as a .parquet, .csv, or .csv.gz file into a Scratch space and then reads it into memory as a DataFrame. If the same resource is requested again, the cached file in the Scratch space is read instead, and the file is not redownloaded unless the cache is cleared. For more information on this behavior, see Caching.
Available resources
The data resources available in NFLData.jl
are maintained by the nflverse
organization (as is this package). These resources are directly stored in a variety of places, but most commonly as releases in the nflverse/nflverse-data
repository. These data resources are typically sourced directly from the NFL and its various APIs, but other third party resources are represented as well, such as data from pro-football-reference.com.
For a complete list of available resources, consult the {nflreadr} documentation. Every load_*() function from {nflreadr} has been implemented in NFLData.jl, with (approximately) the same arguments.
Some examples of how to call and query these load_*() functions are provided below.
Universal data resources
Many loaders take no arguments and simply load a data resource into the Julia session. For example, load_players() returns every player in the NFL's database, past and present.
julia> using NFLData
julia> load_players()
20753×32 DataFrame
Row │ status display_name first_name last_name esb_id gsis_id birth_da ⋯
│ String? String? String? String? String? String? String? ⋯
───────┼────────────────────────────────────────────────────────────────────────────────────
1 │ RET 'Omar Ellison 'Omar Ellison ELL711319 00-0004866 1971-10- ⋯
2 │ ACT A'Shawn Robinson A'Shawn Robinson ROB367960 00-0032889 1995-03-
3 │ DEV A.J. Arcuri A.J. Arcuri ARC716900 00-0037845 1997-08-
4 │ ACT A.J. Barner A.J. Barner BAR235889 00-0039793 2002-05-
5 │ RES A.J. Bouye Arlandus Bouye BOU651714 00-0030228 1991-08- ⋯
6 │ ACT A.J. Brown Arthur Brown BRO413223 00-0035676 1997-06-
7 │ ACT A.J. Cann Aaron Cann CAN364949 00-0032255 1991-10-
8 │ ACT A.J. Cole A.J. Cole COL214396 00-0035190 1995-11-
9 │ RET A.J. Cruz A.J. Cruz CRU779150 00-0032270 missing ⋯
10 │ RET A.J. Dalton A.J. Dalton DAL649400 00-0031108 missing
11 │ RET A.J. Davis A.J. Davis DAV115245 00-0029167 1989-07-
⋮ │ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋱
20744 │ DEV Zion Logue Zion Logue LOG824407 00-0039400 2002-07-
20745 │ RET Zipp Duncan Zipp Duncan DUN383863 00-0027294 missing ⋯
20746 │ RET Zola Davis Zola Davis DAV815538 00-0004071 1975-01-
20747 │ RET Zoltan Mesko Zoltan Mesko MES280733 00-0027749 1986-03-
20748 │ CUT Zonovan Knight Zonovan Knight KNI764772 00-0037157 2001-04-
20749 │ CUT Zuri Henry Zuri Henry HEN713594 00-0039689 2000-04- ⋯
20750 │ RET Zuriel Smith Zuriel Smith SMI828252 00-0022024 1980-01-
20751 │ CUT Zurlon Tipton Zurlon Tipton TIP645432 00-0030855 1989-04-
20752 │ DEV Zyon Gilbert Zyon Gilbert GIL144859 00-0037373 1999-02-
20753 │ ACT Zyon McCollum Zyon McCollum MCC496223 00-0037268 1999-05- ⋯
26 columns and 20732 rows omitted
Queryable by season
Many loaders can be queried by season. By default, these loaders return the most recent season of data.
julia> load_pbp() # 2024 season thru week 2
5288×372 DataFrame
Row │ play_id game_id old_game_id home_team away_team season_type week ⋯
│ Float64? String? String? String? String? String? Int32? ⋯
──────┼─────────────────────────────────────────────────────────────────────────────────────
1 │ 1.0 2024_01_ARI_BUF 2024090801 BUF ARI REG 1 ⋯
2 │ 40.0 2024_01_ARI_BUF 2024090801 BUF ARI REG 1
3 │ 61.0 2024_01_ARI_BUF 2024090801 BUF ARI REG 1
4 │ 83.0 2024_01_ARI_BUF 2024090801 BUF ARI REG 1
5 │ 108.0 2024_01_ARI_BUF 2024090801 BUF ARI REG 1 ⋯
6 │ 133.0 2024_01_ARI_BUF 2024090801 BUF ARI REG 1
7 │ 155.0 2024_01_ARI_BUF 2024090801 BUF ARI REG 1
8 │ 177.0 2024_01_ARI_BUF 2024090801 BUF ARI REG 1
9 │ 199.0 2024_01_ARI_BUF 2024090801 BUF ARI REG 1 ⋯
10 │ 224.0 2024_01_ARI_BUF 2024090801 BUF ARI REG 1
11 │ 265.0 2024_01_ARI_BUF 2024090801 BUF ARI REG 1
⋮ │ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋱
5279 │ 4104.0 2024_02_TB_DET 2024091503 DET TB REG 2
5280 │ 4130.0 2024_02_TB_DET 2024091503 DET TB REG 2 ⋯
5281 │ 4155.0 2024_02_TB_DET 2024091503 DET TB REG 2
5282 │ 4162.0 2024_02_TB_DET 2024091503 DET TB REG 2
5283 │ 4187.0 2024_02_TB_DET 2024091503 DET TB REG 2
5284 │ 4210.0 2024_02_TB_DET 2024091503 DET TB REG 2 ⋯
5285 │ 4233.0 2024_02_TB_DET 2024091503 DET TB REG 2
5286 │ 4256.0 2024_02_TB_DET 2024091503 DET TB REG 2
5287 │ 4279.0 2024_02_TB_DET 2024091503 DET TB REG 2
5288 │ 4301.0 2024_02_TB_DET 2024091503 DET TB REG 2 ⋯
365 columns and 5267 rows omitted
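Because an NFL season spans two calendar years, a "most recent season" default has to be derived from the date. In the 2023 output below, for example, the final game 2023_22_SF_KC has old_game_id 2024021100, i.e. it was played in February 2024. The following is a minimal sketch assuming a simplified rule in which a new season becomes current in September; most_recent_season is a hypothetical helper and not necessarily the rule NFLData.jl uses.

```julia
using Dates

# Hypothetical helper (not necessarily NFLData.jl's rule): a season counts as
# current from September onward; January-August dates are attributed to the
# season that began the previous September.
most_recent_season(date::Date = today()) =
    month(date) >= 9 ? year(date) : year(date) - 1

most_recent_season(Date(2024, 9, 17))  # 2024: early in the 2024 season
most_recent_season(Date(2024, 2, 11))  # 2023: the 2023 season's final game
```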
To load a different season, if it is available, pass the year as an argument:
julia> load_pbp(2023)
49665×372 DataFrame
Row │ play_id game_id old_game_id home_team away_team season_type week ⋯
│ Float64? String? String? String? String? String? Int32? ⋯
───────┼────────────────────────────────────────────────────────────────────────────────────
1 │ 1.0 2023_01_ARI_WAS 2023091007 WAS ARI REG 1 ⋯
2 │ 39.0 2023_01_ARI_WAS 2023091007 WAS ARI REG 1
3 │ 55.0 2023_01_ARI_WAS 2023091007 WAS ARI REG 1
4 │ 77.0 2023_01_ARI_WAS 2023091007 WAS ARI REG 1
5 │ 102.0 2023_01_ARI_WAS 2023091007 WAS ARI REG 1 ⋯
6 │ 124.0 2023_01_ARI_WAS 2023091007 WAS ARI REG 1
7 │ 147.0 2023_01_ARI_WAS 2023091007 WAS ARI REG 1
8 │ 172.0 2023_01_ARI_WAS 2023091007 WAS ARI REG 1
9 │ 197.0 2023_01_ARI_WAS 2023091007 WAS ARI REG 1 ⋯
10 │ 220.0 2023_01_ARI_WAS 2023091007 WAS ARI REG 1
11 │ 245.0 2023_01_ARI_WAS 2023091007 WAS ARI REG 1
⋮ │ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋱
49656 │ 4684.0 2023_22_SF_KC 2024021100 KC SF POST 22
49657 │ 4709.0 2023_22_SF_KC 2024021100 KC SF POST 22 ⋯
49658 │ 4734.0 2023_22_SF_KC 2024021100 KC SF POST 22
49659 │ 4771.0 2023_22_SF_KC 2024021100 KC SF POST 22
49660 │ 4759.0 2023_22_SF_KC 2024021100 KC SF POST 22
49661 │ 4791.0 2023_22_SF_KC 2024021100 KC SF POST 22 ⋯
49662 │ 4813.0 2023_22_SF_KC 2024021100 KC SF POST 22
49663 │ 4835.0 2023_22_SF_KC 2024021100 KC SF POST 22
49664 │ 4860.0 2023_22_SF_KC 2024021100 KC SF POST 22
49665 │ 4881.0 2023_22_SF_KC 2024021100 KC SF POST 22 ⋯
365 columns and 49644 rows omitted
To load multiple seasons, pass a range of years:
julia> load_pbp(2022:2023)
99099×372 DataFrame
Row │ play_id game_id old_game_id home_team away_team season_type week ⋯
│ Float64? String? String? String? String? String? Int32? ⋯
───────┼────────────────────────────────────────────────────────────────────────────────────
1 │ 1.0 2022_01_BAL_NYJ 2022091107 NYJ BAL REG 1 ⋯
2 │ 43.0 2022_01_BAL_NYJ 2022091107 NYJ BAL REG 1
3 │ 68.0 2022_01_BAL_NYJ 2022091107 NYJ BAL REG 1
4 │ 89.0 2022_01_BAL_NYJ 2022091107 NYJ BAL REG 1
5 │ 115.0 2022_01_BAL_NYJ 2022091107 NYJ BAL REG 1 ⋯
6 │ 136.0 2022_01_BAL_NYJ 2022091107 NYJ BAL REG 1
7 │ 172.0 2022_01_BAL_NYJ 2022091107 NYJ BAL REG 1
8 │ 202.0 2022_01_BAL_NYJ 2022091107 NYJ BAL REG 1
9 │ 230.0 2022_01_BAL_NYJ 2022091107 NYJ BAL REG 1 ⋯
10 │ 254.0 2022_01_BAL_NYJ 2022091107 NYJ BAL REG 1
11 │ 275.0 2022_01_BAL_NYJ 2022091107 NYJ BAL REG 1
⋮ │ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋱
99090 │ 4684.0 2023_22_SF_KC 2024021100 KC SF POST 22
99091 │ 4709.0 2023_22_SF_KC 2024021100 KC SF POST 22 ⋯
99092 │ 4734.0 2023_22_SF_KC 2024021100 KC SF POST 22
99093 │ 4771.0 2023_22_SF_KC 2024021100 KC SF POST 22
99094 │ 4759.0 2023_22_SF_KC 2024021100 KC SF POST 22
99095 │ 4791.0 2023_22_SF_KC 2024021100 KC SF POST 22 ⋯
99096 │ 4813.0 2023_22_SF_KC 2024021100 KC SF POST 22
99097 │ 4835.0 2023_22_SF_KC 2024021100 KC SF POST 22
99098 │ 4860.0 2023_22_SF_KC 2024021100 KC SF POST 22
99099 │ 4881.0 2023_22_SF_KC 2024021100 KC SF POST 22 ⋯
365 columns and 99078 rows omitted
To load every available season of a resource, pass true. Be advised that this may take a few seconds.
julia> load_pbp(true)
1186651×372 DataFrame
Row │ play_id game_id old_game_id home_team away_team season_type week ⋯
│ Float64? String? String? String? String? String? Int3 ⋯
─────────┼──────────────────────────────────────────────────────────────────────────────────
1 │ 35.0 1999_01_ARI_PHI 1999091200 PHI ARI REG ⋯
2 │ 60.0 1999_01_ARI_PHI 1999091200 PHI ARI REG
3 │ 82.0 1999_01_ARI_PHI 1999091200 PHI ARI REG
4 │ 103.0 1999_01_ARI_PHI 1999091200 PHI ARI REG
5 │ 126.0 1999_01_ARI_PHI 1999091200 PHI ARI REG ⋯
6 │ 150.0 1999_01_ARI_PHI 1999091200 PHI ARI REG
7 │ 176.0 1999_01_ARI_PHI 1999091200 PHI ARI REG
8 │ 197.0 1999_01_ARI_PHI 1999091200 PHI ARI REG
9 │ 218.0 1999_01_ARI_PHI 1999091200 PHI ARI REG ⋯
10 │ 240.0 1999_01_ARI_PHI 1999091200 PHI ARI REG
11 │ 260.0 1999_01_ARI_PHI 1999091200 PHI ARI REG
⋮ │ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋱
1186642 │ 4104.0 2024_02_TB_DET 2024091503 DET TB REG
1186643 │ 4130.0 2024_02_TB_DET 2024091503 DET TB REG ⋯
1186644 │ 4155.0 2024_02_TB_DET 2024091503 DET TB REG
1186645 │ 4162.0 2024_02_TB_DET 2024091503 DET TB REG
1186646 │ 4187.0 2024_02_TB_DET 2024091503 DET TB REG
1186647 │ 4210.0 2024_02_TB_DET 2024091503 DET TB REG ⋯
1186648 │ 4233.0 2024_02_TB_DET 2024091503 DET TB REG
1186649 │ 4256.0 2024_02_TB_DET 2024091503 DET TB REG
1186650 │ 4279.0 2024_02_TB_DET 2024091503 DET TB REG
1186651 │ 4301.0 2024_02_TB_DET 2024091503 DET TB REG ⋯
366 columns and 1186630 rows omitted
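The argument forms shown so far (no argument, a single year, a range of years, and true) can all be reduced to a concrete list of seasons before any files are fetched. The following is a minimal sketch assuming a hypothetical normalize_seasons helper with caller-supplied earliest and latest bounds (1999 matches the first play-by-play season in the output above); it is not NFLData.jl's internal code. Note that Bool is a subtype of Integer in Julia, so a separate Bool method is needed for true to mean "every season" rather than the integer 1.

```julia
# Hypothetical helper (not NFLData.jl's internal code): turn each accepted
# argument form into a vector of seasons. `earliest` and `latest` are
# assumed to be supplied by the caller.

# No argument: only the most recent season.
normalize_seasons(::Nothing; earliest::Integer, latest::Integer) = [Int(latest)]

# A single year, as in load_pbp(2023).
normalize_seasons(season::Integer; earliest::Integer, latest::Integer) = [Int(season)]

# A range of years, as in load_pbp(2022:2023).
normalize_seasons(seasons::AbstractRange{<:Integer}; earliest::Integer, latest::Integer) =
    collect(Int, seasons)

# `true` means every available season. Bool <: Integer, so this more specific
# method is what keeps `true` from being treated as the year 1.
function normalize_seasons(flag::Bool; earliest::Integer, latest::Integer)
    flag || throw(ArgumentError("pass `true` to request every season"))
    return collect(Int, earliest:latest)
end

normalize_seasons(2022:2023; earliest = 1999, latest = 2024)  # [2022, 2023]
normalize_seasons(true; earliest = 1999, latest = 2024)       # 26 seasons, 1999 through 2024
```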
Querying a resource for a season where no data is available throws an error:
julia> load_pbp(1995)
ERROR: DomainError with 1995:
No NFL PBP data available prior to 1999!
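A validation step that produces an error of this shape can be sketched with Julia's built-in DomainError, which takes the offending value and a message. check_seasons and safe_seasons are hypothetical helpers, not NFLData.jl's code, and the 1999 cutoff is taken from the play-by-play message above; other resources would need their own cutoffs. safe_seasons shows one way a caller might catch the error instead of letting it propagate.

```julia
# Hypothetical helpers (not NFLData.jl's code) modeled on the error above.

# Throw a DomainError for any requested season earlier than `earliest`.
function check_seasons(seasons; earliest::Integer = 1999)
    for s in seasons
        if s < earliest
            # Parentheses matter: "$earliest!" would parse as a variable named `earliest!`.
            throw(DomainError(s, "No NFL PBP data available prior to $(earliest)!"))
        end
    end
    return seasons
end

# Catch only DomainError, log a warning, and return an empty season list.
function safe_seasons(seasons; earliest::Integer = 1999)
    try
        return check_seasons(seasons; earliest = earliest)
    catch err
        err isa DomainError || rethrow()
        @warn "Requested season is unavailable" season = err.val reason = err.msg
        return Int[]
    end
end

check_seasons(2022:2023)    # returns 2022:2023 unchanged
safe_seasons([1995, 2023])  # logs a warning and returns Int[]
```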
Other queries
Some resources can be queried with other parameters. For example, you can load ESPN quarterback rating (QBR) data grouped by season or by week:
julia> load_espn_qbr("season") # by season
1413×23 DataFrame
Row │ season season_type game_week team_abb player_id name_short rank ⋯
│ Int32? String? String? String? String? String? Float64 ⋯
──────┼─────────────────────────────────────────────────────────────────────────────────────
1 │ 2006 Regular Season Total IND 1428 P. Manning 1 ⋯
2 │ 2006 Regular Season Total NE 2330 T. Brady 2
3 │ 2006 Regular Season Total SD 5529 P. Rivers 3
4 │ 2006 Regular Season Total CIN 4459 C. Palmer 4
5 │ 2006 Regular Season Total NO 2580 D. Brees 5 ⋯
6 │ 2006 Regular Season Total BAL 733 S. McNair 6
7 │ 2006 Regular Season Total NYJ 2149 C. Pennington 7
8 │ 2006 Regular Season Total DAL 5209 T. Romo 8
9 │ 2006 Regular Season Total PHI 1753 D. McNabb 9 ⋯
10 │ 2006 Regular Season Total ARI 9596 M. Leinart 10
11 │ 2006 Regular Season Total STL 2299 M. Bulger 11
⋮ │ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋱
1404 │ 2024 Regular Season Total NE 4569173 R. Stevenson missing
1405 │ 2024 Regular Season Total LV 2576336 A. Abdullah missing ⋯
1406 │ 2024 Regular Season Total NO 4243322 J. Haener missing
1407 │ 2024 Regular Season Total CAR 14012 A. Dalton missing
1408 │ 2024 Regular Season Total MIN 4242431 T. Chandler missing
1409 │ 2024 Regular Season Total DAL 2972515 C. Rush missing ⋯
1410 │ 2024 Regular Season Total MIA 4036419 S. Thompson missing
1411 │ 2024 Regular Season Total KC 4361529 I. Pacheco missing
1412 │ 2024 Regular Season Total SF 3126486 D. Samuel Sr. missing
1413 │ 2024 Regular Season Total ARI 4360175 C. Tune missing ⋯
17 columns and 1392 rows omitted
julia> load_espn_qbr("week") # by week
9604×30 DataFrame
Row │ season season_type game_id game_week week_text team_abb player_id name_sh ⋯
│ Int32? String? String? Int32? String? String? String? String? ⋯
──────┼─────────────────────────────────────────────────────────────────────────────────────
1 │ 2006 Regular 260910009 1 Week 1 CHI 4480 R. Gros ⋯
2 │ 2006 Regular 260910034 1 Week 1 PHI 1753 D. McNa
3 │ 2006 Regular 260910010 1 Week 1 NYJ 2149 C. Penn
4 │ 2006 Regular 260910019 1 Week 1 IND 1428 P. Mann
5 │ 2006 Regular 260910029 1 Week 1 ATL 2549 M. Vick ⋯
6 │ 2006 Regular 260907023 1 Week 1 PIT 1490 C. Batc
7 │ 2006 Regular 260910027 1 Week 1 BAL 733 S. McNa
8 │ 2006 Regular 260910030 1 Week 1 JAX 4465 B. Left
9 │ 2006 Regular 260911028 1 Week 1 MIN 331 B. John ⋯
10 │ 2006 Regular 260910017 1 Week 1 BUF 5547 J.P. Lo
11 │ 2006 Regular 260911028 1 Week 1 WSH 445 M. Brun
⋮ │ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋱
9595 │ 2024 Regular 401671659 1 Week 1 LAC 4038941 J. Herb
9596 │ 2024 Regular 401671719 1 Week 1 TEN 4361418 W. Levi ⋯
9597 │ 2024 Regular 401671761 1 Week 1 DAL 2577417 D. Pres
9598 │ 2024 Regular 401671719 1 Week 1 CHI 4431611 C. Will
9599 │ 2024 Regular 401671805 1 Week 1 GB 4036378 J. Love
9600 │ 2024 Regular 401671712 1 Week 1 NYG 3917792 D. Jone ⋯
9601 │ 2024 Regular 401671734 1 Week 1 CAR 4685720 B. Youn
9602 │ 2024 Regular 401671761 1 Week 1 CLE 3122840 D. Wats
9603 │ 2024 Regular 401671807 2 Week 2 BUF 3918298 J. Alle
9604 │ 2024 Regular 401671807 2 Week 2 MIA 4241479 T. Tago ⋯
23 columns and 9583 rows omitted
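Parameters such as "season" and "week" are plain strings, so it can help to validate them before making a request. The following is a minimal sketch assuming a hypothetical check_summary_type helper; it is not NFLData.jl's code, and its tolerance of letter case and surrounding whitespace is an assumption that the real load_espn_qbr may not share.

```julia
# Hypothetical sketch (not NFLData.jl's code): validate the granularity
# string before requesting ESPN QBR data.
const QBR_SUMMARY_TYPES = ("season", "week")

function check_summary_type(summary_type::AbstractString)
    # Assumption: tolerate surrounding whitespace and letter case.
    s = lowercase(strip(summary_type))
    if !(s in QBR_SUMMARY_TYPES)
        throw(ArgumentError("summary_type must be one of $(QBR_SUMMARY_TYPES); got \"$(summary_type)\""))
    end
    return s
end

check_summary_type("season")  # "season"
check_summary_type(" Week ")  # "week"
```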
Data dictionaries
Data dictionaries for almost every data resource available through a load_*() function are available on the {nflreadr} website.