NFLData.jl

A package for intelligently loading NFL data into Julia.

NFLData.jl is a low-level data loader, designed as a native Julia implementation of the popular {nflreadr} package. It makes a number of NFL data resources available quickly and conveniently in DataFrame format, while intelligently caching and updating these data sources to accommodate in-season changes.

Installation

NFLData.jl is available from the Julia package registry and can be installed with the following one-liner:

using Pkg; Pkg.add("NFLData")

You can also add the package from the Pkg REPL. Open an interactive Julia session, press ] to enter Pkg mode, then run add NFLData.

Getting Started

After installation, return to the Julia REPL and type:

julia> using NFLData

You can now load data into your Julia environment. For example, to pull in NFL schedules since 1999, you can use load_schedules():

julia> load_schedules()
6978×46 DataFrame
  Row │ game_id          season  game_type  week   gameday     weekday   gametime  away_te ⋯
      │ String15         Int64   String3    Int64  Date        String15  Time      String3 ⋯
──────┼─────────────────────────────────────────────────────────────────────────────────────
    1 │ 1999_01_MIN_ATL    1999  REG            1  1999-09-12  Sunday    missing   MIN     ⋯
    2 │ 1999_01_KC_CHI     1999  REG            1  1999-09-12  Sunday    missing   KC
    3 │ 1999_01_PIT_CLE    1999  REG            1  1999-09-12  Sunday    missing   PIT
    4 │ 1999_01_OAK_GB     1999  REG            1  1999-09-12  Sunday    missing   OAK
    5 │ 1999_01_BUF_IND    1999  REG            1  1999-09-12  Sunday    missing   BUF     ⋯
    6 │ 1999_01_SF_JAX     1999  REG            1  1999-09-12  Sunday    missing   SF
    7 │ 1999_01_CAR_NO     1999  REG            1  1999-09-12  Sunday    missing   CAR
    8 │ 1999_01_NE_NYJ     1999  REG            1  1999-09-12  Sunday    missing   NE
    9 │ 1999_01_ARI_PHI    1999  REG            1  1999-09-12  Sunday    missing   ARI     ⋯
   10 │ 1999_01_DET_SEA    1999  REG            1  1999-09-12  Sunday    missing   DET
   11 │ 1999_01_BAL_STL    1999  REG            1  1999-09-12  Sunday    missing   BAL
  ⋮   │        ⋮           ⋮         ⋮        ⋮        ⋮          ⋮         ⋮          ⋮   ⋱
 6969 │ 2024_18_CHI_GB     2024  REG           18  2025-01-05  Sunday    13:00:00  CHI
 6970 │ 2024_18_JAX_IND    2024  REG           18  2025-01-05  Sunday    13:00:00  JAX     ⋯
 6971 │ 2024_18_SEA_LA     2024  REG           18  2025-01-05  Sunday    13:00:00  SEA
 6972 │ 2024_18_LAC_LV     2024  REG           18  2025-01-05  Sunday    13:00:00  LAC
 6973 │ 2024_18_BUF_NE     2024  REG           18  2025-01-05  Sunday    13:00:00  BUF
 6974 │ 2024_18_MIA_NYJ    2024  REG           18  2025-01-05  Sunday    13:00:00  MIA     ⋯
 6975 │ 2024_18_NYG_PHI    2024  REG           18  2025-01-05  Sunday    13:00:00  NYG
 6976 │ 2024_18_CIN_PIT    2024  REG           18  2025-01-05  Sunday    13:00:00  CIN
 6977 │ 2024_18_NO_TB      2024  REG           18  2025-01-05  Sunday    13:00:00  NO
 6978 │ 2024_18_HOU_TEN    2024  REG           18  2025-01-05  Sunday    13:00:00  HOU     ⋯
                                                            39 columns and 6957 rows omitted
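Because the result is an ordinary DataFrame, it can be narrowed down with the usual DataFrames.jl tools. The sketch below is illustrative only: regular_season is a hypothetical helper, not part of NFLData.jl, and it assumes the season and game_type columns shown in the output above.

```julia
using DataFrames

# Hypothetical helper (not part of NFLData.jl): keep only the regular-season
# games of a single season. Assumes the `season` and `game_type` columns
# that appear in the schedule output.
function regular_season(schedules::AbstractDataFrame, season::Integer)
    return subset(schedules,
        :season => ByRow(==(season)),
        :game_type => ByRow(==("REG"));
        skipmissing = true)
end

# Usage with the real data (requires NFLData.jl and a network connection):
# using NFLData
# regular_season(load_schedules(), 2024)
```

Passing the DataFrame in as an argument keeps the filter separate from the download, so it can be tried on a small hand-built table first.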

You can load nflfastR play-by-play data into Julia with load_pbp(years). The function can take either one or multiple years as arguments, or you can pass true into the function to pull in all years of PBP data.

julia> load_pbp(2023)
49665×372 DataFrame
   Row │ play_id   game_id          old_game_id  home_team  away_team  season_type  week   ⋯
       │ Float64?  String?          String?      String?    String?    String?      Int32? ⋯
───────┼────────────────────────────────────────────────────────────────────────────────────
     1 │      1.0  2023_01_ARI_WAS  2023091007   WAS        ARI        REG               1 ⋯
     2 │     39.0  2023_01_ARI_WAS  2023091007   WAS        ARI        REG               1
     3 │     55.0  2023_01_ARI_WAS  2023091007   WAS        ARI        REG               1
     4 │     77.0  2023_01_ARI_WAS  2023091007   WAS        ARI        REG               1
     5 │    102.0  2023_01_ARI_WAS  2023091007   WAS        ARI        REG               1 ⋯
     6 │    124.0  2023_01_ARI_WAS  2023091007   WAS        ARI        REG               1
     7 │    147.0  2023_01_ARI_WAS  2023091007   WAS        ARI        REG               1
     8 │    172.0  2023_01_ARI_WAS  2023091007   WAS        ARI        REG               1
     9 │    197.0  2023_01_ARI_WAS  2023091007   WAS        ARI        REG               1 ⋯
    10 │    220.0  2023_01_ARI_WAS  2023091007   WAS        ARI        REG               1
    11 │    245.0  2023_01_ARI_WAS  2023091007   WAS        ARI        REG               1
   ⋮   │    ⋮             ⋮              ⋮           ⋮          ⋮           ⋮         ⋮    ⋱
 49656 │   4684.0  2023_22_SF_KC    2024021100   KC         SF         POST             22
 49657 │   4709.0  2023_22_SF_KC    2024021100   KC         SF         POST             22 ⋯
 49658 │   4734.0  2023_22_SF_KC    2024021100   KC         SF         POST             22
 49659 │   4771.0  2023_22_SF_KC    2024021100   KC         SF         POST             22
 49660 │   4759.0  2023_22_SF_KC    2024021100   KC         SF         POST             22
 49661 │   4791.0  2023_22_SF_KC    2024021100   KC         SF         POST             22 ⋯
 49662 │   4813.0  2023_22_SF_KC    2024021100   KC         SF         POST             22
 49663 │   4835.0  2023_22_SF_KC    2024021100   KC         SF         POST             22
 49664 │   4860.0  2023_22_SF_KC    2024021100   KC         SF         POST             22
 49665 │   4881.0  2023_22_SF_KC    2024021100   KC         SF         POST             22 ⋯
                                                          365 columns and 49644 rows omitted

julia> load_pbp(2022:2023)
99099×372 DataFrame
   Row │ play_id   game_id          old_game_id  home_team  away_team  season_type  week   ⋯
       │ Float64?  String?          String?      String?    String?    String?      Int32? ⋯
───────┼────────────────────────────────────────────────────────────────────────────────────
     1 │      1.0  2022_01_BAL_NYJ  2022091107   NYJ        BAL        REG               1 ⋯
     2 │     43.0  2022_01_BAL_NYJ  2022091107   NYJ        BAL        REG               1
     3 │     68.0  2022_01_BAL_NYJ  2022091107   NYJ        BAL        REG               1
     4 │     89.0  2022_01_BAL_NYJ  2022091107   NYJ        BAL        REG               1
     5 │    115.0  2022_01_BAL_NYJ  2022091107   NYJ        BAL        REG               1 ⋯
     6 │    136.0  2022_01_BAL_NYJ  2022091107   NYJ        BAL        REG               1
     7 │    172.0  2022_01_BAL_NYJ  2022091107   NYJ        BAL        REG               1
     8 │    202.0  2022_01_BAL_NYJ  2022091107   NYJ        BAL        REG               1
     9 │    230.0  2022_01_BAL_NYJ  2022091107   NYJ        BAL        REG               1 ⋯
    10 │    254.0  2022_01_BAL_NYJ  2022091107   NYJ        BAL        REG               1
    11 │    275.0  2022_01_BAL_NYJ  2022091107   NYJ        BAL        REG               1
   ⋮   │    ⋮             ⋮              ⋮           ⋮          ⋮           ⋮         ⋮    ⋱
 99090 │   4684.0  2023_22_SF_KC    2024021100   KC         SF         POST             22
 99091 │   4709.0  2023_22_SF_KC    2024021100   KC         SF         POST             22 ⋯
 99092 │   4734.0  2023_22_SF_KC    2024021100   KC         SF         POST             22
 99093 │   4771.0  2023_22_SF_KC    2024021100   KC         SF         POST             22
 99094 │   4759.0  2023_22_SF_KC    2024021100   KC         SF         POST             22
 99095 │   4791.0  2023_22_SF_KC    2024021100   KC         SF         POST             22 ⋯
 99096 │   4813.0  2023_22_SF_KC    2024021100   KC         SF         POST             22
 99097 │   4835.0  2023_22_SF_KC    2024021100   KC         SF         POST             22
 99098 │   4860.0  2023_22_SF_KC    2024021100   KC         SF         POST             22
 99099 │   4881.0  2023_22_SF_KC    2024021100   KC         SF         POST             22 ⋯
                                                          365 columns and 99078 rows omitted

julia> load_pbp(true)
1186651×372 DataFrame
     Row │ play_id   game_id          old_game_id  home_team  away_team  season_type  week ⋯
         │ Float64?  String?          String?      String?    String?    String?      Int3 ⋯
─────────┼──────────────────────────────────────────────────────────────────────────────────
       1 │     35.0  1999_01_ARI_PHI  1999091200   PHI        ARI        REG               ⋯
       2 │     60.0  1999_01_ARI_PHI  1999091200   PHI        ARI        REG
       3 │     82.0  1999_01_ARI_PHI  1999091200   PHI        ARI        REG
       4 │    103.0  1999_01_ARI_PHI  1999091200   PHI        ARI        REG
       5 │    126.0  1999_01_ARI_PHI  1999091200   PHI        ARI        REG               ⋯
       6 │    150.0  1999_01_ARI_PHI  1999091200   PHI        ARI        REG
       7 │    176.0  1999_01_ARI_PHI  1999091200   PHI        ARI        REG
       8 │    197.0  1999_01_ARI_PHI  1999091200   PHI        ARI        REG
       9 │    218.0  1999_01_ARI_PHI  1999091200   PHI        ARI        REG               ⋯
      10 │    240.0  1999_01_ARI_PHI  1999091200   PHI        ARI        REG
      11 │    260.0  1999_01_ARI_PHI  1999091200   PHI        ARI        REG
    ⋮    │    ⋮             ⋮              ⋮           ⋮          ⋮           ⋮         ⋮  ⋱
 1186642 │   4104.0  2024_02_TB_DET   2024091503   DET        TB         REG
 1186643 │   4130.0  2024_02_TB_DET   2024091503   DET        TB         REG               ⋯
 1186644 │   4155.0  2024_02_TB_DET   2024091503   DET        TB         REG
 1186645 │   4162.0  2024_02_TB_DET   2024091503   DET        TB         REG
 1186646 │   4187.0  2024_02_TB_DET   2024091503   DET        TB         REG
 1186647 │   4210.0  2024_02_TB_DET   2024091503   DET        TB         REG               ⋯
 1186648 │   4233.0  2024_02_TB_DET   2024091503   DET        TB         REG
 1186649 │   4256.0  2024_02_TB_DET   2024091503   DET        TB         REG
 1186650 │   4279.0  2024_02_TB_DET   2024091503   DET        TB         REG
 1186651 │   4301.0  2024_02_TB_DET   2024091503   DET        TB         REG               ⋯
                                                        366 columns and 1186630 rows omitted
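As a quick sanity check on a play-by-play pull, you might count plays per week. This is a minimal sketch under stated assumptions: plays_per_week is a hypothetical helper, not an NFLData.jl function, and it relies only on the season_type and week columns visible in the output above. It is most meaningful on a single-season frame, since week numbers repeat across seasons.

```julia
using DataFrames

# Hypothetical helper (not part of NFLData.jl): count the rows (plays) in
# each season_type/week group of a play-by-play DataFrame.
function plays_per_week(pbp::AbstractDataFrame)
    counts = combine(groupby(pbp, [:season_type, :week]), nrow => :plays)
    return sort(counts, :week)
end

# Usage with the real data (requires NFLData.jl and a network connection):
# using NFLData
# plays_per_week(load_pbp(2023))
```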

NFLData.jl is designed to quickly load large datasets into memory. Here, we start a clean Julia session and pull in all seasons of PBP data, containing 372 columns and, as of the time of writing this documentation, 1,186,638 plays.

julia> @time pbp = load_pbp(true);
 37.171214 seconds (59.72 M allocations: 10.669 GiB, 15.75% gc time, 24.79% compilation time: <1% of which was recompilation)

Not bad for running on my pretty dinky work laptop (16 GB RAM, Intel Core i7-1270P processor)!
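The figure above includes compilation time, which Julia only pays on the first call in a session. One rough way to separate the two is to time the same call twice. The helper below is a hypothetical sketch, not something NFLData.jl provides, and works with any zero-argument function.

```julia
# Hypothetical helper: run `f` twice and report the elapsed seconds of each
# call. The first run includes any JIT compilation; the second does not.
function time_twice(f)
    first_run = @elapsed f()
    second_run = @elapsed f()
    return (first = first_run, second = second_run)
end

# Usage with the real loader (requires NFLData.jl and a network connection):
# using NFLData
# time_twice(() -> load_pbp(2023))
```

Timings will vary with hardware, network speed, and whether the data is already cached.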

Caching

NFLData.jl relies on Julia's JIT compilation to speed up loading large objects into memory. However, JIT compilation does not persist across sessions. NFLData.jl uses Scratch.jl to cache the data objects it references, so subsequent calls that reference the same data objects are faster even if the call comes from a different Julia session. To learn more about this behavior, please visit the Caching chapter of this documentation.
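To give a feel for the general pattern, here is a minimal sketch of a disk cache built on Scratch.jl. It is not NFLData.jl's actual implementation: cached_text, the scratch key, and the file name are all assumptions made for illustration. The only Scratch.jl call used is @get_scratch!, which returns a directory that persists across Julia sessions.

```julia
using Scratch  # assumes Scratch.jl is installed

# Hypothetical helper (not NFLData.jl's real code): return the contents of
# `filename` from a scratch space, calling `produce()` to create the file
# only when it is not already on disk.
function cached_text(produce, filename::AbstractString; key::AbstractString = "nfldata_readme_demo")
    dir = @get_scratch!(key)      # persistent directory managed by Scratch.jl
    path = joinpath(dir, filename)
    if !isfile(path)
        write(path, produce())    # cache miss: do the expensive work once
    end
    return read(path, String)     # cache hit on every later call
end
```

A later call with the same file name, even from a new session, reads the file instead of running produce again. A real loader would also need a rule for refreshing stale files, which this sketch leaves out.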

Logo Attribution

The NFLData.jl logo is sourced from Font Awesome Free 5.2.0 by @fontawesome. It has been remixed from its original form under the Creative Commons Attribution 4.0 International license.