ml_pipeline
About
A centralized pattern for creating Machine Learning pipelines
Description
Contains a host of tools for standardizing the format of machine learning pipelines. Provides methods for querying databases, cleaning data, preprocessing data, model operations, exporting results, and more. Each machine learning project is a “child” of this template and can override any of the default class attributes/methods.
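As a minimal sketch of the “child of this template” idea: a project subclasses the pipeline’s base class and overrides one of its default methods. The class and method names below are illustrative assumptions, not the package’s actual API.

```python
class PipelineTemplate:
    """Stand-in for the ml_pipeline base template (name is hypothetical)."""

    def clean_data(self, rows):
        # Default behaviour: drop empty rows.
        return [r for r in rows if r]


class MyProject(PipelineTemplate):
    """A child project that overrides one default method."""

    def clean_data(self, rows):
        # Override: also strip surrounding whitespace.
        return [r.strip() for r in rows if r and r.strip()]
```

Any attribute or method not overridden falls through to the template’s default, so a child project only has to redefine the behaviour that differs.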
Code Overview
Reading this high-level overview is necessary to understand how the package operates.
Code Hierarchy
Models → Model → Input_Files → Input_File → Features → Feature
Each point in the hierarchy has certain methods and attributes associated with it. These methods give you the functionality for operating the pipeline.
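The hierarchy above can be sketched as nested container classes. This is an assumption about the structure, using only the names from the hierarchy; the real attributes and constructors may differ.

```python
class Feature:
    """Leaf of the hierarchy: a single feature."""
    def __init__(self, name):
        self.name = name


class Features:
    """Container for the Feature objects of one input file."""
    def __init__(self, features):
        self.features = list(features)


class Input_File:
    """One input file, holding its Features."""
    def __init__(self, name, features):
        self.name = name
        self.features = features


class Input_Files:
    """Container for the Input_File objects of one model."""
    def __init__(self, input_files):
        self.input_files = list(input_files)


class Model:
    """One model, holding its Input_Files."""
    def __init__(self, name, input_files):
        self.name = name
        self.input_files = input_files


class Models:
    """Top of the hierarchy: all models in the project."""
    def __init__(self, models):
        self.models = list(models)
```

Walking from the top of the hierarchy down to a single feature then means chaining through each container level in turn.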
First Time Setup
If you have not installed the garden, follow the instructions here: the_garden
Usage
High-level overviews for common operations. For more detailed instructions, check out the Pages.
Initializing a Repo for ml_pipeline
Navigate to a directory in the command prompt
cd C:/Path/to/Repo
Call the package’s main script
python -m ml_pipeline
Query new Raw Data
python main_XXX.py
At the Models options screen, select “Open Model”.
Select any Model from the list (this selection does not matter).
In the Model options screen, select “Open Input Files”.
Select the first option.
In the Input Files options screen, select “Open Input File”.
Select the Input File for which you would like to query new data.
In the Input File options screen, select “Query from Source Database”.
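Every step above follows the same pattern: an options screen presents a numbered menu and dispatches on the selection. A minimal sketch of such a menu (the function and its behaviour are assumptions for illustration, not the package’s actual interface):

```python
def options_screen(title, options, choose):
    """Display a numbered menu and return the label of the chosen option.

    `choose` is the 1-based selection a user would type at the prompt.
    """
    print(title)
    for i, label in enumerate(options, start=1):
        print(f"  {i}. {label}")
    return options[choose - 1]


# Simulating the final step of the walkthrough above:
selection = options_screen(
    "Input File options",
    ["Query from Source Database", "Back"],
    choose=1,
)
```

In the real package the returned selection would trigger the corresponding method on the current hierarchy object (here, the query against the source database).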
Move Query Staged data to Raw Data
1. python main_XXX.py
2. At the Models options screen, select the option for “Open Model”.
3. Select the Model you would like to run.
4. In the Model options screen, select the option for “Run Pipeline”.
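The text does not say what “Run Pipeline” does with query-staged data on disk. Purely as an illustration, moving staged files into a raw-data folder might look like the sketch below; the folder names and behaviour are assumptions, not the package’s implementation.

```python
import shutil
from pathlib import Path


def move_staged_to_raw(staged_dir, raw_dir):
    """Move every file from the staged directory into the raw directory.

    Returns the names of the files that were moved.
    """
    staged, raw = Path(staged_dir), Path(raw_dir)
    raw.mkdir(parents=True, exist_ok=True)
    moved = []
    for f in staged.iterdir():
        if f.is_file():
            shutil.move(str(f), str(raw / f.name))
            moved.append(f.name)
    return moved
```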
Running all Models
python main_XXX.py
At the Models options screen, select the option for “Run Models”
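At a high level, “Run Models” amounts to running the pipeline for each model in turn. The function names below are stand-ins for illustration, not the package’s actual API.

```python
def run_pipeline(model_name):
    """Placeholder for one model's full pipeline run (hypothetical)."""
    return f"{model_name}: pipeline complete"


def run_all_models(model_names):
    """Run the pipeline for every model and collect the results."""
    return [run_pipeline(name) for name in model_names]
```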