Setup
Requirements
- A Windows, Linux or Mac operating system (note that you may need to adjust python to python3, etc. when installing on Mac or Linux)
- Python 3.10 (Python 3.11 isn't fully supported by all ML libraries)
- Docker (if you intend to run the docs locally)
- pip (installed with Python 3)
- Git
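Once Python itself is installed, the rest of the list can be checked from a short script. The following is only a sketch: the `TOOLS` table and the helper names `check_python` and `missing_tools` are made up for illustration, and the script only tests whether each tool is on your PATH, not whether it works.

```python
import shutil
import sys

# Tool name -> executable names to look for on PATH (hypothetical table).
# pip is often installed as pip3 on Mac or Linux.
TOOLS = {
    "git": ("git",),
    "pip": ("pip", "pip3"),
    "docker": ("docker",),  # only needed to run the docs locally
}


def check_python(version_info=sys.version_info):
    """Return True when the interpreter is some Python 3.10.x release."""
    return tuple(version_info[:2]) == (3, 10)


def missing_tools(tools=TOOLS, which=shutil.which):
    """Return the names of tools for which no candidate executable is on PATH."""
    return [
        name
        for name, candidates in tools.items()
        if not any(which(candidate) for candidate in candidates)
    ]


if __name__ == "__main__":
    if not check_python():
        found = f"{sys.version_info.major}.{sys.version_info.minor}"
        print(f"Warning: running Python {found}, this guide expects 3.10")
    for name in missing_tools():
        note = " (only needed to run the docs locally)" if name == "docker" else ""
        print(f"Not found on PATH: {name}{note}")
```

If the script prints nothing, Python 3.10, Git, pip and Docker were all found.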
Install
- Install the requirements listed above.
- Clone the GitHub repo from your Git Bash terminal (note that on the supercomputer you should do this within the projects directory).
- cd into the cloned folder. Note that all your future commands are expected to be run from this directory.
- Create a folder called data.
- Download the data from https://data.transportation.gov/api/views/9k4m-a3jc/rows.csv?accessType=DOWNLOAD and name it data.csv within the data folder. Optionally, you can download wget.exe from https://eternallybored.org/misc/wget/ and put it in your Git Bash directory (the default Windows install directory is C:\Program Files\Git\mingw64\bin). Next, run the following in Git Bash (within the project directory) to download the file:

      while true; do
          wget --retry-connrefused --retry-on-http-error=500 --waitretry=1 --read-timeout=20 --timeout=15 -t 0 --continue -O data/data.csv "https://data.transportation.gov/api/views/9k4m-a3jc/rows.csv?accessType=DOWNLOAD"
          if [ $? = 0 ]; then break; fi  # check return value, break if successful (0)
          echo "error downloading. Trying again!"
          sleep 1s
      done
- Create a virtual environment to install the required modules and then activate it. The activation command depends on your operating system and shell; consult the Python venv documentation for anything other than Windows. Make sure to always activate your venv before running anything in this pipeline.
- Install the required modules (note that on the supercomputer you can use --no-index to install from the local cache).
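The virtual-environment step above can also be sketched with Python's standard venv module. This is an illustration, not the project's own script: the helper names `create_env` and `activate_command`, the file name `make_env.py` and the environment name `venv` are all assumptions. The one detail it relies on is that venv places its activation script in Scripts/ on Windows and in bin/ on other systems.

```python
import sys
import venv
from pathlib import Path


def create_env(env_dir):
    """Create a virtual environment with pip, like `python -m venv <env_dir>`."""
    venv.create(env_dir, with_pip=True)


def activate_command(env_dir, platform=sys.platform):
    """Return the command that activates env_dir from a bash-style shell.

    venv puts its scripts in Scripts/ on Windows and in bin/ elsewhere.
    """
    scripts = "Scripts" if platform == "win32" else "bin"
    return f"source {(Path(env_dir) / scripts / 'activate').as_posix()}"


if __name__ == "__main__":
    # Usage: python make_env.py venv
    # "make_env.py" and "venv" are placeholder names; with no argument
    # nothing is created.
    if len(sys.argv) == 2:
        create_env(sys.argv[1])
        print("Activate with:", activate_command(sys.argv[1]))
```

The printed command is meant for Git Bash or another bash-style shell. The Windows command prompt and PowerShell use the activate.bat and Activate.ps1 scripts in the same Scripts folder instead.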
You're finished installing! Head over to the Development page to learn how to make your first pipeline user.
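If you would rather not install wget.exe, the retry-until-success idea from the optional download step can be approximated with Python's standard library. This is a minimal sketch under stated assumptions: `download_with_retry` and its parameters are hypothetical names, the URL is the one from the wget command, and unlike wget's --continue it restarts the download from scratch after a failure and sets no read timeout.

```python
import time
import urllib.request
from pathlib import Path

# URL taken from the wget command in this guide.
DATA_URL = "https://data.transportation.gov/api/views/9k4m-a3jc/rows.csv?accessType=DOWNLOAD"


def download_with_retry(url, dest, fetch=urllib.request.urlretrieve,
                        wait_seconds=1.0, max_attempts=None):
    """Call fetch(url, dest) until it succeeds; return the number of attempts.

    max_attempts=None keeps retrying forever, like the shell loop.
    """
    dest = Path(dest)
    dest.parent.mkdir(parents=True, exist_ok=True)  # creates data/ if missing
    attempt = 0
    while True:
        attempt += 1
        try:
            fetch(url, str(dest))
            return attempt
        except OSError as err:  # URLError and HTTPError are OSError subclasses
            if max_attempts is not None and attempt >= max_attempts:
                raise
            print(f"error downloading ({err}). Trying again!")
            time.sleep(wait_seconds)
```

Calling `download_with_retry(DATA_URL, "data/data.csv")` from the project directory would save the file where the rest of this guide expects it. The `fetch` parameter exists only so the retry logic can be exercised without network access.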