kgl

Command-line utilities for Kaggle competitions

This module requires a Kaggle API token to work. See here for info on how to set that up.

Competition utils

Modified version of setup_comp from fastkaggle. I like to put my data into a data folder so it's easier to exclude it from version control.



setup_comp

 setup_comp (competition, install='')

Get a path to data for competition, downloading it if needed
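A minimal sketch of the data-folder variant described above. It assumes the downloaded archive is named `<competition>.zip`, and it takes the download step as a callable so it can be swapped out; with the real Kaggle API that callable could wrap `api.competition_download_files(competition, path=...)`. The helper name `setup_comp_sketch` and its `base` and `download` parameters are hypothetical, not part of kgl.

```python
import zipfile
from pathlib import Path
from typing import Callable, Optional


def setup_comp_sketch(
    competition: str,
    base: str = "data",
    download: Optional[Callable[[str, Path], None]] = None,
) -> Path:
    """Return <base>/<competition>, downloading and unzipping it if missing (hypothetical helper)."""
    path = Path(base) / competition
    if path.exists():
        # Already downloaded: nothing to do
        return path
    if download is None:
        raise ValueError("no downloader given and the data is not present locally")
    Path(base).mkdir(parents=True, exist_ok=True)
    # Assumption: the downloader writes <base>/<competition>.zip
    download(competition, Path(base))
    archive = Path(base) / f"{competition}.zip"
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(path)
    archive.unlink()  # keep only the extracted files
    return path
```

Keeping every competition under one `data/` folder means a single `data/` line in `.gitignore` keeps all of them out of version control.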

Set up competition projects

api = import_kaggle()
comps = api.competitions_list()
comp = comps[0]
comp.title, comp.url.split("/")[-1]
('AI Mathematical Olympiad - Progress Prize 2',
 'ai-mathematical-olympiad-progress-prize-2')
len("equity-post-HCT-survival-prediction  ")
37
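The slug printed above is the last path segment of `comp.url`. A slightly more defensive sketch (the helper `comp_slug` is hypothetical) also tolerates a trailing slash or a query string, which plain `split("/")[-1]` would not:

```python
from urllib.parse import urlparse


def comp_slug(url: str) -> str:
    """Return the last non-empty path segment of a competition URL (hypothetical helper)."""
    # urlparse drops any ?query or #fragment; strip("/") removes a trailing slash
    return urlparse(url).path.strip("/").split("/")[-1]


print(comp_slug("https://www.kaggle.com/competitions/ai-mathematical-olympiad-progress-prize-2"))
# ai-mathematical-olympiad-progress-prize-2
```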


disp_comp

 disp_comp (comp)
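import itertools as it
from operator import attrgetter
joinedkey = attrgetter("user_has_entered")
# False sorts before True, so the not-yet-entered (active) group comes first
comps.sort(key=joinedkey)
active, entered = (list(g) for _, g in it.groupby(comps, key=joinedkey))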
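`itertools.groupby` only groups adjacent items, which is why the list is sorted first. Unpacking the groups into exactly two names also raises `ValueError` when all competitions fall on one side (for example, if you have not joined any). A sketch of a partition that always returns two lists, using a stand-in `Comp` dataclass in place of the Kaggle API objects (assumed, as in the snippet above, to expose `user_has_entered`); `Comp` and `split_by_entered` are hypothetical names:

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class Comp:
    # Stand-in for a Kaggle API competition object (assumption)
    ref: str
    user_has_entered: bool


def split_by_entered(comps: Iterable[Comp]) -> Tuple[List[Comp], List[Comp]]:
    """Return (active, entered); either list may be empty."""
    active: List[Comp] = []
    entered: List[Comp] = []
    for c in comps:
        # Preserve the original order within each group
        (entered if c.user_has_entered else active).append(c)
    return active, entered
```

No sort is needed here, and the original ordering within each group is kept.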


get_competitions

 get_competitions ()
active, entered = get_competitions()
active[:1], entered[:1]
([{"id": 86023, "ref": "https://www.kaggle.com/competitions/ai-mathematical-olympiad-progress-prize-2", "title": "AI Mathematical Olympiad - Progress Prize 2", "url": "https://www.kaggle.com/competitions/ai-mathematical-olympiad-progress-prize-2", "description": "Solve national-level math challenges using artificial intelligence models", "organizationName": "AI|MO", "organizationRef": "", "category": "Featured", "reward": "2,117,152 Usd", "tags": [{"ref": "nlp", "name": "nlp", "description": "Natural Language Processing gives a computer program the ability to extract meaning human language. Applications include sentiment analysis, translation, and speech recognition.", "fullPath": "analysis > nlp", "competitionCount": 89, "datasetCount": 4512, "scriptCount": 8533, "totalCount": 13134}, {"ref": "mathematics", "name": "mathematics", "description": "", "fullPath": "subject > mathematics", "competitionCount": 4, "datasetCount": 120, "scriptCount": 179, "totalCount": 303}, {"ref": "accuracy score", "name": "accuracy score", "description": "Accuracy classification score. See https://scikit-learn.org/stable/modules/generated/sklearn.metrics.accuracy_score.html", "fullPath": "", "competitionCount": 0, "datasetCount": 0, "scriptCount": 0, "totalCount": 0}], "deadline": "2025-04-01T23:59:00.000Z", "kernelCount": 0, "teamCount": 2162, "userHasEntered": false, "userRank": 0, "mergerDeadline": "2025-03-25T23:59:00.000Z", "newEntrantDeadline": "2025-03-25T23:59:00.000Z", "enabledDate": "2024-10-17T15:00:47.587Z", "maxDailySubmissions": 1, "maxTeamSize": 7, "evaluationMetric": "Accuracy Score", "awardsPoints": true, "isKernelsSubmissionsOnly": true, "submissionsDisabled": false}],
 [{"id": 91714, "ref": "https://www.kaggle.com/competitions/playground-series-s5e3", "title": "Binary Prediction with a Rainfall Dataset", "url": "https://www.kaggle.com/competitions/playground-series-s5e3", "description": "Playground Series - Season 5, Episode 3", "organizationName": "Kaggle", "organizationRef": "", "category": "Playground", "reward": "Swag", "tags": [{"ref": "weather and climate", "name": "weather and climate", "description": "Weather datasets and kernels come in all wind speeds and directions. You have weather data about hurricanes and other inclement phenomena, hourly readings, and general weather for various cities.", "fullPath": "subject > earth and nature > environment > weather and climate", "competitionCount": 13, "datasetCount": 1319, "scriptCount": 624, "totalCount": 1956}, {"ref": "beginner", "name": "beginner", "description": "New to data science? Explore tips, tricks, and beginner friendly work from other Kagglers.", "fullPath": "audience > beginner", "competitionCount": 12902, "datasetCount": 8233, "scriptCount": 42012, "totalCount": 63147}, {"ref": "time series analysis", "name": "time series analysis", "description": "", "fullPath": "technique > time series analysis", "competitionCount": 479, "datasetCount": 2663, "scriptCount": 3716, "totalCount": 6858}, {"ref": "tabular", "name": "tabular", "description": "", "fullPath": "data type > tabular", "competitionCount": 13566, "datasetCount": 11739, "scriptCount": 7106, "totalCount": 32411}, {"ref": "roc auc score", "name": "roc auc score", "description": "Compute Area Under the Receiver Operating Characteristic Curve (ROC AUC)     from prediction scores. 
See https://scikit-learn.org/stable/modules/generated/sklearn.metrics.roc_auc_score.html", "fullPath": "", "competitionCount": 0, "datasetCount": 0, "scriptCount": 0, "totalCount": 0}], "deadline": "2025-03-31T23:59:00.000Z", "kernelCount": 0, "teamCount": 2734, "userHasEntered": true, "userRank": 2040, "mergerDeadline": "2025-03-31T23:59:00.000Z", "newEntrantDeadline": null, "enabledDate": "2025-03-01T00:01:34.057Z", "maxDailySubmissions": 5, "maxTeamSize": 3, "evaluationMetric": "Roc Auc Score", "awardsPoints": false, "isKernelsSubmissionsOnly": false, "submissionsDisabled": false}])


kgl_list

 kgl_list ()

List Kaggle competitions

print(kgl_list())
Joined:
  1   playground-series-s5e3                   Binary Prediction with a Rainfall Datase
  2   store-sales-time-series-forecasting      Store Sales - Time Series Forecasting
Active:
  3   ai-mathematical-olympiad-progress-prize- AI Mathematical Olympiad - Progress Priz
  4   stanford-rna-3d-folding                  Stanford RNA 3D Folding
  5   byu-locating-bacterial-flagellar-motors- BYU - Locating Bacterial Flagellar Motor
  6   march-machine-learning-mania-2025        March Machine Learning Mania 2025
  7   drawing-with-llms                        Drawing with LLMs
  8   birdclef-2025                            BirdCLEF+ 2025
  9   titanic                                  Titanic - Machine Learning from Disaster
  10  home-data-for-ml-course                  Housing Prices Competition for Kaggle Le
  11  house-prices-advanced-regression-techniq House Prices - Advanced Regression Techn
  12  spaceship-titanic                        Spaceship Titanic
  13  digit-recognizer                         Digit Recognizer
  14  nlp-getting-started                      Natural Language Processing with Disaste
  15  connectx                                 Connect X
  16  llm-classification-finetuning            LLM Classification Finetuning
  17  gan-getting-started                      I’m Something of a Painter Myself
  18  contradictory-my-dear-watson             Contradictory, My Dear Watson
  19  tpu-getting-started                      Petals to the Metal - Flower Classificat
  20  konwinski-prize                          Konwinski Prize
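Judging from the output above, each row is the number left-aligned in a 4-character field, the slug padded and cut to 40 characters, then the title cut to 40 characters. A sketch that reproduces this layout with format specs; the helper `fmt_row` is hypothetical and inferred from the printed output, not taken from kgl's source:

```python
def fmt_row(i: int, slug: str, title: str, width: int = 40) -> str:
    """Format one listing row: number, slug padded/truncated to `width`, title truncated to `width`."""
    # "<{w}.{w}" both pads to and truncates at `width`; ".{w}" only truncates
    return f"  {i:<4}{slug:<{width}.{width}} {title:.{width}}"


print(fmt_row(1, "playground-series-s5e3", "Binary Prediction with a Rainfall Dataset"))
```

The truncation explains why long slugs and titles such as `ai-mathematical-olympiad-progress-prize-` are cut off in the listing.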


maybe_int

 maybe_int (x:str)
comp = comps[0]
comp.url
'https://www.kaggle.com/competitions/ai-mathematical-olympiad-progress-prize-2'
comp.title
'AI Mathematical Olympiad - Progress Prize 2'
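maybe_int has no docstring. Going by its name, and by kgl_new accepting either a competition id or a name, a plausible reading is "convert to int when possible, otherwise return the string unchanged". A sketch under that assumption (`maybe_int_sketch` is a hypothetical name, not kgl's code):

```python
from typing import Union


def maybe_int_sketch(x: str) -> Union[int, str]:
    """Return int(x) if x parses as an integer, else x unchanged (assumed behaviour)."""
    try:
        return int(x)
    except ValueError:
        return x
```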


get_competition

 get_competition (n:str)
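get_competition takes a string n, which kgl_new documents as "competition id or name". Assuming the id is the row number printed by kgl_list (joined competitions first, then active ones) and the name is the URL slug, the lookup could be sketched as follows; `resolve_competition` and the plain list of slugs are hypothetical stand-ins:

```python
from typing import List


def resolve_competition(n: str, slugs: List[str]) -> str:
    """Map a 1-based row number or a slug to a slug (hypothetical lookup)."""
    key = n.strip()
    if key.isdigit():
        idx = int(key)
        if not 1 <= idx <= len(slugs):
            raise IndexError(f"no competition number {idx}")
        # Row numbers in the listing start at 1
        return slugs[idx - 1]
    if key in slugs:
        return key
    raise KeyError(f"unknown competition: {key}")
```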


kgl_new

 kgl_new (n:str, save_to:str)

Set up an nbdev environment for a Kaggle competition

Param    Type  Details
n        str   competition id or name
save_to  str   project name to use locally and for GitHub

Adapted from fastkaggle

Changes:
- Allow uploading the current project even if it's not on pip
- The Kaggle API has changed in the 3 years since fastkaggle was written, so the code had to be fixed



create_lib_dataset

 create_lib_dataset (ds_name, lib_source, lib_path, username,
                     clear_after=False)

For each library, create or update a Kaggle dataset with the latest version

Param        Type  Default  Details
ds_name
lib_source
lib_path                    Local path to download/create the dataset
username                    Your username
clear_after  bool  False    Delete local copies after syncing with Kaggle?
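The Kaggle CLI creates a dataset from a folder containing a dataset-metadata.json file (`kaggle datasets create -p <folder>`) and uploads a new version of it with `kaggle datasets version -p <folder> -m <message>`. A sketch of preparing that folder and building the matching command, assuming lib_path is the folder and the dataset id is `<username>/<ds_name>`; `prepare_lib_dataset`, its `exists` flag, and the CC0 license choice are hypothetical, not kgl's implementation:

```python
import json
from pathlib import Path
from typing import List


def prepare_lib_dataset(ds_name: str, lib_path: str, username: str, exists: bool) -> List[str]:
    """Write dataset-metadata.json into lib_path and return the kaggle CLI argv to run."""
    folder = Path(lib_path)
    folder.mkdir(parents=True, exist_ok=True)
    meta = {
        "title": ds_name,
        "id": f"{username}/{ds_name}",
        "licenses": [{"name": "CC0-1.0"}],  # assumption: pick whatever license fits
    }
    (folder / "dataset-metadata.json").write_text(json.dumps(meta, indent=2))
    if exists:
        # Existing dataset: upload a new version of the folder contents
        return ["kaggle", "datasets", "version", "-p", str(folder), "-m", "update libs"]
    # First upload: create the dataset from the folder
    return ["kaggle", "datasets", "create", "-p", str(folder)]
```

Once the library files are in the folder (for example via `pip download <package> -d <folder>`), the returned list can be passed to `subprocess.run(cmd, check=True)`.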