Release G: AI & Machine Learning

Kubeflow

Kubeflow is a collection of cloud-native tools that covers all stages of the Model Development Life Cycle: data exploration, data preparation, model training, tuning and testing, and model serving.

There is currently a diverse selection of libraries, tools and frameworks for machine learning.

Kubeflow allows you to compose and customize your own stack based on your specific needs.

Kubeflow Pipelines

Kubeflow Pipelines is a platform for building and deploying portable, scalable machine learning (ML) workflows using Docker containers.

The isolation provided by containers allows machine learning stages to be portable and reproducible.

Kubeflow Pipelines is designed to simplify the process of building and deploying machine learning systems at scale.


Kubeflow Pipelines provides:

  • An orchestration engine for running multistep workflows
  • A Python SDK to build and run pipeline components
  • A user interface to visualize your workflows

Kubeflow Pipelines is based on Argo Workflows, an open source, container-native workflow engine for Kubernetes.
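
To make the idea of an orchestration engine for multistep workflows more concrete, the following is a minimal sketch in plain Python. It is not the Kubeflow Pipelines SDK or Argo: the step names and the `run_pipeline` helper are hypothetical, and a real engine would start a container for each step instead of recording its name.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each step maps to the set of steps it depends on.
PIPELINE = {
    "prepare_data": set(),
    "train_model": {"prepare_data"},
    "evaluate_model": {"train_model"},
    "serve_model": {"evaluate_model"},
}


def run_pipeline(steps):
    """Visit every step only after all of its dependencies have been visited."""
    executed = []
    for step in TopologicalSorter(steps).static_order():
        # A real engine would launch a container here; this sketch only records the order.
        executed.append(step)
    return executed


if __name__ == "__main__":
    print(run_pipeline(PIPELINE))
```

The dependency graph is the part that carries over to real pipelines: the engine derives the execution order from declared dependencies rather than from the order in which steps are written.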


You can install Kubeflow Pipelines as a standalone component (the deployment includes MinIO) using the following commands:

Kubeflow Pipelines
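# Install the cluster-scoped resources (CRDs, namespace) first; the environment manifests depend on them.
kubectl create -k "github.com/kubeflow/pipelines/manifests/kustomize/cluster-scoped-resources?ref=1.8.2"
kubectl wait --for condition=established --timeout=60s crd/applications.app.k8s.io
kubectl create -k "github.com/kubeflow/pipelines/manifests/kustomize/env/platform-agnostic-pns?ref=1.8.2"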

Run the following command, then open http://localhost:8080 to view the Pipelines dashboard:

pipeline ui
kubectl port-forward -n kubeflow svc/ml-pipeline-ui 8080:80

See also:

Local Deployment

Experiment with the Pipelines Samples

Kubeflow Pipelines up and running in 5 minutes

MinIO

MinIO Client Complete Guide

MinIO is a high-performance object store that is API-compatible with the Amazon S3 cloud storage service.

Run the following command to make MinIO available on your localhost:

minio ui
kubectl port-forward -n kubeflow svc/minio-service 9000:9000

You can then connect to MinIO, create a bucket and upload a file:

minio commands
mc config host add minio http://localhost:9000 minio minio123
mc mb minio/ml-training
mc cp iris.tar.gz minio/ml-training/data/iris.tar.gz
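
Because MinIO is S3-compatible, the same bucket creation and upload can be sketched from Python with boto3. This is an illustrative sketch, not part of the original setup: it assumes boto3 is installed, that MinIO is reachable on http://localhost:9000 with the credentials used in the `mc` commands above, and that MinIO uses its default `us-east-1` region. The helper names are hypothetical.

```python
def minio_client_kwargs(endpoint="http://localhost:9000",
                        access_key="minio", secret_key="minio123"):
    """Keyword arguments for boto3.client("s3", ...) that point it at MinIO."""
    return {
        "endpoint_url": endpoint,            # MinIO instead of AWS
        "aws_access_key_id": access_key,
        "aws_secret_access_key": secret_key,
        "region_name": "us-east-1",          # assumed MinIO default region
    }


def upload_training_data(path="iris.tar.gz", bucket="ml-training",
                         key="data/iris.tar.gz"):
    """Create the bucket and upload one file; needs boto3 and a reachable MinIO."""
    import boto3  # imported lazily so the helper above works without boto3

    s3 = boto3.client("s3", **minio_client_kwargs())
    s3.create_bucket(Bucket=bucket)  # raises an error if the bucket already exists
    s3.upload_file(path, bucket, key)


if __name__ == "__main__":
    print(minio_client_kwargs())
```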

AWS CLI with MinIO Server

Minio Playground (AWS Access Key ID: Q3AM3UQ867SPQQA43P2F, AWS Secret Access Key: zuf+tfteSlswRu7BJ86wekitnifILbZam1KYY3TG)

MinIO Bucket Notification Guide

Querying data without servers or databases using Amazon S3 Select

How to use s3 select in AWS SDK for Go

AWS Glue 101: All you need to know with a full walk-through

KServe

KServe enables serverless inferencing on Kubernetes and provides performant, high abstraction interfaces for common machine learning (ML) frameworks like TensorFlow, XGBoost, scikit-learn, PyTorch, and ONNX to solve production model serving use cases.

You can install KServe using the following command:

kserve
curl -s "https://raw.githubusercontent.com/kserve/kserve/release-0.8/hack/quick_install.sh" | bash
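
Once a model is deployed, a prediction call under the KServe V1 inference protocol is an HTTP POST to `/v1/models/<model-name>:predict` with a JSON body containing an `instances` list. The sketch below only builds such a request with the Python standard library. The base URL, the `sklearn-iris` model name, the host header and the input values are assumptions for illustration and depend on your own deployment.

```python
import json
import urllib.request


def build_predict_request(base_url, model_name, instances, host_header=None):
    """Build (but do not send) a V1 protocol predict request."""
    url = f"{base_url}/v1/models/{model_name}:predict"
    body = json.dumps({"instances": instances}).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    if host_header:
        # Typically needed when the call goes through the cluster ingress gateway.
        headers["Host"] = host_header
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


if __name__ == "__main__":
    request = build_predict_request(
        "http://localhost:8080",                      # hypothetical gateway address
        "sklearn-iris",                               # hypothetical model name
        [[6.8, 2.8, 4.8, 1.4]],
        host_header="sklearn-iris.default.example.com",
    )
    print(request.get_method(), request.full_url)
    print(request.data.decode("utf-8"))
    # To send it once the service is reachable: urllib.request.urlopen(request)
```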

See also:

Deploy Transformer with InferenceService

KFServing Transformers

Deployment with KServe

Knative: Creating a Service

KServe: A Robust and Extensible Cloud Native Model Server

Deploying a model using the KServe Python SDK

Knative Monitoring

Grafana can be used to monitor your Kubeflow stack.

The Knative monitoring YAML is available here: monitoring-metrics-prometheus.yaml

You can install Knative monitoring using the following commands:

Knative monitoring
kubectl create ns knative-monitoring
kubectl create -f kubeflow/monitoring-metrics-prometheus.yaml

Knative Eventing

Knative Eventing is a collection of APIs that enable you to use an event-driven architecture with your applications.

Knative Eventing can be installed using the following commands:

Knative Eventing
kubectl apply -f https://github.com/knative/eventing/releases/download/knative-v1.6.0/eventing-crds.yaml
kubectl apply -f https://github.com/knative/eventing/releases/download/knative-v1.6.0/eventing-core.yaml

To install the in-memory broker, use:

In Memory Broker
kubectl apply -f https://github.com/knative/eventing/releases/download/knative-v1.6.0/in-memory-channel.yaml
kubectl apply -f https://github.com/knative/eventing/releases/download/knative-v1.6.0/mt-channel-broker.yaml

To install the Kafka broker, use:

Kafka
kubectl apply -f https://github.com/knative-sandbox/eventing-kafka-broker/releases/download/knative-v1.6.0/eventing-kafka-controller.yaml
kubectl apply -f https://github.com/knative-sandbox/eventing-kafka-broker/releases/download/knative-v1.6.0/eventing-kafka-broker.yaml

If you choose to use Kafka, you'll need to install Strimzi before installing the Kafka broker:

strimzi
kubectl create ns kafka
kubectl create -f 'https://strimzi.io/install/latest?namespace=kafka' -n kafka
kubectl apply -f https://strimzi.io/examples/latest/kafka/kafka-persistent-single.yaml -n kafka
kubectl wait kafka/my-cluster --for=condition=Ready --timeout=300s -n kafka

Usage of CloudEvents with KafkaSource:

Knative Eventing uses the CloudEvents format to exchange events. An example of the events exchanged is as follows.

Producing a Kafka message:


Kafka message as cloud event:
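
As a rough illustration of the CloudEvents envelope, the sketch below wraps a Kafka record value in a structured-mode CloudEvent with the attributes required by version 1.0 of the specification (`specversion`, `id`, `source`, `type`). The concrete `source` and `type` values, the source name, the topic and the payload are assumptions made for this example and may differ from what your KafkaSource emits.

```python
import json
import uuid
from datetime import datetime, timezone


def kafka_record_to_cloudevent(topic, payload, source_name="my-kafka-source",
                               namespace="default"):
    """Wrap a Kafka record value in a structured-mode CloudEvent (spec version 1.0)."""
    return {
        "specversion": "1.0",                    # required
        "id": str(uuid.uuid4()),                 # required, unique per source
        # Required; the path layout below is an assumed naming pattern.
        "source": f"/apis/v1/namespaces/{namespace}/kafkasources/{source_name}#{topic}",
        "type": "dev.knative.kafka.event",       # required; assumed event type
        "time": datetime.now(timezone.utc).isoformat(),
        "datacontenttype": "application/json",
        "data": payload,
    }


if __name__ == "__main__":
    event = kafka_record_to_cloudevent("orders", {"order_id": 42, "status": "new"})
    print(json.dumps(event, indent=2))
```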


See also:

Installing Knative Eventing using YAML files

Knative Eventing code samples

Creating a PingSource object

New trigger filters

Strimzi Quick Starts

Processing S3 Files using Knative Eventing

Explore Knative Eventing

Katib

Katib is agnostic to machine learning (ML) frameworks. It can tune the hyperparameters of applications written in any language and natively supports many ML frameworks, such as TensorFlow, MXNet, PyTorch, XGBoost, and others.
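
To show what hyperparameter tuning means in practice, here is a minimal random-search sketch in plain Python. It does not use the Katib API: the `objective` function is a toy stand-in for a real training run, and the search space (`learning_rate`, `num_layers`) is hypothetical.

```python
import random


def objective(learning_rate, num_layers):
    """Toy stand-in for a validation loss returned by a training run; lower is better."""
    return (learning_rate - 0.01) ** 2 + 0.001 * abs(num_layers - 3)


def random_search(trials=50, seed=0):
    """Sample hyperparameters at random and keep the best-scoring trial."""
    rng = random.Random(seed)
    best = None
    for _ in range(trials):
        params = {
            "learning_rate": 10 ** rng.uniform(-4, -1),  # log-uniform between 1e-4 and 1e-1
            "num_layers": rng.randint(1, 6),
        }
        loss = objective(**params)
        if best is None or loss < best[0]:
            best = (loss, params)
    return best


if __name__ == "__main__":
    loss, params = random_search()
    print(f"best loss {loss:.6f} with {params}")
```

A tuning service automates this loop: it proposes parameter sets, runs each trial as a separate job and collects the reported metric.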


Argo

Argo Workflows is an open source, container-native workflow engine for orchestrating parallel jobs on Kubernetes. It is implemented as a Kubernetes CRD, and you define workflows in which each step is a container.
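
Since a workflow is a Kubernetes custom resource, a one-step workflow can be sketched as a plain manifest. The example below builds one as a Python dictionary and prints it as JSON, which `kubectl create -f` accepts as well as YAML. The workflow name prefix, template name, image and message are hypothetical choices for this illustration.

```python
import json


def hello_workflow(image="busybox", message="hello from Argo"):
    """Return a minimal Argo Workflow manifest with a single container step."""
    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "Workflow",
        "metadata": {"generateName": "hello-"},   # Kubernetes appends a random suffix
        "spec": {
            "entrypoint": "say-hello",            # template to run first
            "templates": [
                {
                    "name": "say-hello",
                    "container": {
                        "image": image,
                        "command": ["echo"],
                        "args": [message],
                    },
                }
            ],
        },
    }


if __name__ == "__main__":
    # Save the output to a file and submit it with: kubectl create -f <file>
    print(json.dumps(hello_workflow(), indent=2))
```

Because the manifest uses `generateName`, it has to be submitted with `kubectl create` rather than `kubectl apply`.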

Installing Argo

Quick Start

See also:

Argo Events - The Event-Based Dependency Manager for Kubernetes

MLflow

MLflow is a platform to streamline machine learning development, including tracking experiments, packaging code into reproducible runs, and sharing and deploying models.

MLflow: A Machine Learning Lifecycle Platform

Installing MLflow

Quickstart

Time Series Databases

InfluxDB

InfluxDB is an open-source time series database developed by InfluxData. It is written in the Go programming language and is used for the storage and retrieval of time series data in fields such as operations monitoring, application metrics, Internet of Things sensor data, and real-time analytics.

A time series database is specifically designed to handle time-stamped metrics, events and measurements.

Internet of Things (IoT) is typically defined as a group of devices that are connected to the Internet, all collecting, sharing, and storing data.
Examples include temperature sensors in an air-conditioning unit and pressure sensors installed on a remote oil pump.

Scalability and the capacity to ingest data quickly are the main database requirements for IoT apps. NoSQL systems are well suited to IoT because they are designed for horizontal scalability.

InfluxDB is central to many IoT solutions, providing high-throughput ingestion, compression and real-time querying of the ingested data.

You can run InfluxDB on minikube using the following file: influxdb.yaml

The ConfigMap stores the InfluxDB configuration file, which points to the directory where the data files are stored (in this case /var/influxdb). You may want to change this in your own environment.

Python code example: write a pandas DataFrame to an InfluxDB measurement and read it back

Influxdb Dataframes
from influxdb import DataFrameClient, InfluxDBClient
import pandas as pd  # needed for pd.DataFrame when reading the points back
import pandas_datareader as pdr
from datetime import datetime, timedelta

print("Create pandas DataFrame")
yesterday = (datetime.now() - timedelta(days=1)).strftime('%Y-%m-%d')
lastyear = (datetime.now() - timedelta(days=365)).strftime('%Y-%m-%d')
today = datetime.today().strftime('%Y-%m-%d')

companies = ['AAPL'] #, 'MSFT', 'GOOGL']
df = pdr.DataReader(companies, 'yahoo', start=lastyear, end=yesterday)
print(df)
print(df.index)

# DataReader returns MultiIndex columns of (attribute, symbol) pairs, e.g. ('Close', 'AAPL').
print(list(df.columns))
# Remember the ticker symbol, then flatten the columns to the attribute name only.
symbol = df.columns[-1][1]
df.columns = [column[0] for column in df.columns]
# Store the symbol in its own column.
df['Ticker'] = symbol
print(list(df.columns))

user = 'influxdb'
password = 'influxdb'
dbname = 'shares'
protocol = 'line'
host = 'localhost'
port = 8086
measurement = 'price'

client = DataFrameClient(host, port, user, password, dbname)

print("Create database: " + dbname)
client.create_database(dbname)

print("Write DataFrame")
client.write_points(df, measurement, protocol=protocol)


print("Read DataFrame")
client = InfluxDBClient(host, port, user, password, dbname)
q = "select * from " + measurement
df = pd.DataFrame(client.query(q, chunked=True, chunk_size=10000).get_points())  # Returns all points
print(df.head())

print("Delete database: " + dbname)
client.drop_database(dbname)
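
The `protocol = 'line'` setting above refers to the InfluxDB line protocol, in which each point is written as `measurement,tag_set field_set timestamp`. The following is a simplified sketch of that format for illustration only: the `to_line_protocol` helper is hypothetical, it treats every field as a float, and it handles only the basic escaping of commas, equals signs and spaces. Using `Ticker` as a tag is also an assumption of this example.

```python
def _escape(text, chars):
    """Backslash-escape each of the given characters in text."""
    for ch in chars:
        text = text.replace(ch, "\\" + ch)
    return text


def to_line_protocol(measurement, tags, fields, timestamp_ns):
    """Format one point as a line-protocol string (float fields only)."""
    head = _escape(measurement, ", ")
    for key in sorted(tags):
        head += "," + _escape(key, ",= ") + "=" + _escape(str(tags[key]), ",= ")
    body = ",".join(
        _escape(key, ",= ") + "=" + repr(float(value))
        for key, value in sorted(fields.items())
    )
    return f"{head} {body} {timestamp_ns}"


if __name__ == "__main__":
    print(to_line_protocol(
        "price",
        {"Ticker": "AAPL"},
        {"Close": 150.25, "Volume": 1000},
        1650000000000000000,  # nanoseconds since the Unix epoch
    ))
```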



Feature Engineering

Feature engineering is the process of selecting, manipulating, and transforming raw data into features that can be used in supervised learning. In order to make machine learning work well on new tasks, it might be necessary to design and train better features.
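
As a small example of transforming raw data into features, the sketch below derives calendar features from a raw timestamp string using only the standard library. The feature names and the choice of features are hypothetical; which features help depends on the model and the data.

```python
from datetime import datetime


def timestamp_features(iso_timestamp):
    """Derive simple calendar features from an ISO 8601 timestamp string."""
    ts = datetime.fromisoformat(iso_timestamp)
    return {
        "hour": ts.hour,
        "day_of_week": ts.weekday(),       # Monday = 0 ... Sunday = 6
        "is_weekend": ts.weekday() >= 5,
        "month": ts.month,
    }


if __name__ == "__main__":
    print(timestamp_features("2022-07-16T14:30:00"))
```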

Feature Engine

Feature-engine is a Python library with multiple transformers to engineer and select features to use in machine learning models.

Feature-engine: A Python library for Feature Engineering and Selection

Feature-engine: A new open source Python package for feature engineering

Feature Selection

SelectFromModel Feature Selection Example in Python


Time Series Forecasting

Time series analysis comprises methods for analyzing time-series data in order to extract meaningful statistics and other characteristics of the data. Time series forecasting is the use of a model to predict future values based on previously observed values.
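
Two basic building blocks can be sketched in a few lines of plain Python: turning a series into lagged training pairs, and a naive moving-average forecast that predicts the next value from the most recent observations. The function names and sample values are hypothetical, and a moving average is only a simple baseline, not a recommended model.

```python
def make_lag_features(series, n_lags):
    """Turn a series into (previous n_lags values, next value) training pairs."""
    return [(series[i - n_lags:i], series[i]) for i in range(n_lags, len(series))]


def moving_average_forecast(series, window):
    """Forecast the next value as the mean of the last `window` observations."""
    recent = series[-window:]
    return sum(recent) / len(recent)


if __name__ == "__main__":
    observations = [10, 12, 14, 16, 18]
    for lags, target in make_lag_features(observations, n_lags=2):
        print(lags, "->", target)
    print("next value forecast:", moving_average_forecast(observations, window=3))
```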

Feature engineering for time series forecasting



LSTM (Long short-term memory)

Time Series Forecasting With RNN(LSTM)

Time Series Forecasting Using LSTM


Explore and Interpret ML Models

Dalex

XAI in Python with dalex

Model Explanation With Dalex

Visualizing ML model bias with dalex

AI Fairness 360

AI Fairness 360

Anchors

Ultimate Guide To Model Explainability


Feast

Feast is a feature store: a framework for storing and serving features to machine learning models.
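
The idea of storing and serving features can be sketched with a tiny in-memory store keyed by entity. This is a conceptual illustration only and does not use the Feast API: the class, method names, entity IDs and feature names are all hypothetical.

```python
class InMemoryFeatureStore:
    """Toy feature store: keeps the latest feature values per entity."""

    def __init__(self):
        self._rows = {}

    def write(self, entity_id, features):
        """Store or update the feature values for one entity."""
        self._rows.setdefault(entity_id, {}).update(features)

    def read(self, entity_id, feature_names):
        """Return the requested features for one entity; missing values are None."""
        row = self._rows.get(entity_id, {})
        return {name: row.get(name) for name in feature_names}


if __name__ == "__main__":
    store = InMemoryFeatureStore()
    store.write("customer-1", {"orders_last_30d": 4, "avg_basket": 27.5})
    store.write("customer-1", {"orders_last_30d": 5})
    print(store.read("customer-1", ["orders_last_30d", "avg_basket", "churn_score"]))
```

The point of the pattern is that training and serving code read the same named features from one place, instead of each recomputing them.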

Creating feature store with Feast