1. Introduction
In this use case, we use the "Cell Metrics" (RRU.PrbUsedDl) dataset provided by the O-RAN SC SIM space. The dataset consists of synthetic data generated by a simulator, and every record is timestamped in Unix timestamp format.
The model training process is carried out on the O-RAN SC AI/ML Framework, including GPU support, and considers both traditional machine learning (ML) and deep learning (DL) approaches. For ML models, we use Random Forest and Support Vector Regression (SVR), while for DL models, we employ RNN, LSTM, and GRU architectures.
By managing the ON/OFF state of cells through traffic forecasting, we can reduce power consumption. Additionally, if the AI/ML models used for forecasting are operated in an eco-friendly manner, further power savings can be achieved. In this use case, we measure the carbon emissions and energy consumption during the cell traffic forecasting process using AI/ML to ensure that the forecasting model is not only effective but also environmentally sustainable.
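As a rough illustration of the first idea, the sketch below turns a forecast of normalized PRB usage into a per-interval cell ON/OFF plan. The function name `plan_cell_states` and the threshold value are hypothetical and not part of the O-RAN SC framework; a real policy would also have to consider coverage and neighbor-cell load.

```python
from typing import List


def plan_cell_states(forecast: List[float], off_threshold: float = 0.15) -> List[str]:
    """Return "OFF" for intervals whose forecast load is below the threshold."""
    return ["OFF" if load < off_threshold else "ON" for load in forecast]


# Hypothetical normalized forecast values (0.0 to 1.0), one per interval.
forecast = [0.11, 0.09, 0.31, 0.62, 0.14]
states = plan_cell_states(forecast)
print(states)
```

A fixed threshold is the simplest possible rule; it is used here only to show how a forecast feeds a switching decision.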
...
Configuring GPU Usage in a Machine Learning Pipeline
Info |
---|
This section is based on contributions from Sungjin Lee's GitHub repository. For more details, visit this link. |
Step 1. Install the nvidia-container-toolkit
Code Block language bash
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
  && curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
    sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
    sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
Step 2. Configure containerd
Code Block language bash
sudo nvidia-ctk runtime configure --runtime=containerd
Code Block language bash
sudo vim /etc/containerd/config.toml
Code Block
[plugins."io.containerd.grpc.v1.cri".containerd]
  default_runtime_name = "nvidia" # Change to "nvidia"
  # Additional configurations are omitted

  [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia]
    [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia.options]
      BinaryName = "/usr/bin/nvidia-container-runtime"
      # Additional configurations are omitted
    # Include the following content below
    [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia.env]
      LD_LIBRARY_PATH = "/usr/lib/x86_64-linux-gnu:/usr/local/cuda/lib64"
Code Block language bash
# Restart containerd service
sudo systemctl restart containerd
Step 3. Install the nvidia-device-plugin
Code Block language bash
kubectl create -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v0.16.2/deployments/static/nvidia-device-plugin.yml
Step 4. Build the traininghost/pipelinegpuimage image
We built a new base image that recognizes the configured GPU so that the ML pipeline components can use the GPU properly.
To build the required image, you can refer to the Dockerfile and requirement.txt provided at the following link, or modify the pipeline image available in the existing aimlfw-dep.
Code Block language bash
sudo buildctl --addr=nerdctl-container://buildkitd build \
  --frontend dockerfile.v0 \
  --opt filename=Dockerfile.pipeline_gpu \
  --local dockerfile=pipeline_gpu \
  --local context=pipeline_gpu \
  --output type=oci,name=traininghost/pipelinegpuimage:latest | sudo nerdctl load --namespace k8s.io
Step 5. Verify GPU usage with nerdctl
Run the command below. If it prints the nvidia-smi table listing your GPU, the GPU setup is complete.
Code Block language bash
sudo nerdctl run -it --rm --gpus all --namespace k8s.io -p 8888:8888 -v $(pwd):/app_run traininghost/pipelinegpuimage:latest /bin/bash -c "nvidia-smi"
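The same check can be scripted, for example as an early step inside a pipeline component. The sketch below is one possible approach, not part of the referenced repository: it calls `nvidia-smi` with its CSV query options and parses the result. The helper names `parse_gpu_list` and `detect_gpus` are hypothetical.

```python
import subprocess
from typing import List, Tuple

# nvidia-smi query that prints one "name, memory" line per GPU.
QUERY = ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"]


def parse_gpu_list(csv_text: str) -> List[Tuple[str, str]]:
    """Parse 'name, memory' CSV lines into (name, memory) tuples."""
    gpus = []
    for line in csv_text.strip().splitlines():
        if not line.strip():
            continue
        name, memory = [field.strip() for field in line.split(",", 1)]
        gpus.append((name, memory))
    return gpus


def detect_gpus() -> List[Tuple[str, str]]:
    """Run nvidia-smi; return an empty list if it is missing or fails."""
    try:
        result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    except (OSError, subprocess.CalledProcessError):
        return []
    return parse_gpu_list(result.stdout)


if __name__ == "__main__":
    print(detect_gpus() or "No GPU visible")
```

An empty result inside the container usually means the runtime or device plugin configuration from the earlier steps has not taken effect.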
...
Step 1. Download the pipeline script (pipeline.ipynb) provided in the “File List”
Step 2. Modify the pipeline script to satisfy your own requirements
- Set data features required for model training (using FeatureStoreSdk)
  - We used the RRU_PrbUsedDl column from the Viavi Dataset
  - We extracted data at 30-minute intervals to train the model, focusing on capturing meaningful patterns in traffic data
- Write a TensorFlow-based AI/ML model script
  - We used an LSTM (Long Short-Term Memory) model to predict downlink traffic
  - You can add other prediction accuracy metrics (e.g., RMSE, MAE, MAPE)
- Configure energy and CO2 emission tracking for the Green Network use case using CodeCarbon
  - We provide: training duration, RAM/CPU/GPU energy consumption, CO2 emissions
- Upload the trained model along with its metrics (using ModelMetricsSdk)
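To make the 30-minute extraction step concrete, here is a minimal sketch that groups Unix-timestamped samples into 30-minute buckets. Averaging within each bucket is an assumption made for illustration; the actual pipeline script may instead pick one sample per interval. The function name `downsample_30min` and the sample values are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

INTERVAL_SECONDS = 30 * 60  # 30 minutes


def downsample_30min(samples: List[Tuple[int, float]]) -> List[Tuple[int, float]]:
    """Average (unix_ts, value) samples that fall into the same 30-minute bucket."""
    buckets: Dict[int, List[float]] = defaultdict(list)
    for ts, value in samples:
        bucket_start = ts - (ts % INTERVAL_SECONDS)
        buckets[bucket_start].append(value)
    return [(start, sum(vals) / len(vals)) for start, vals in sorted(buckets.items())]


# Hypothetical samples reported every 15 minutes.
samples = [(0, 10.0), (900, 20.0), (1800, 30.0), (2700, 50.0), (3600, 70.0)]
print(downsample_30min(samples))
```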
Step 3. Compile the pipeline code to generate a Kubeflow pipeline YAML file
...
Set the TrainingJob name in lowercase
Configure the Feature Filter
For the query to work correctly, use backticks(`) to specify a specific cell site for filtering (e.g., `Viavi.Cell.Name` == "S1S10/B13/C1C3")
Refer to the image below
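If the filter string is generated by a script rather than typed by hand, the quoting can be handled in one place. The helper below is a hypothetical convenience, not an AIMLFW API; it only reproduces the format of the example above, with the column name in backticks and the value in double quotes.

```python
def build_cell_filter(column: str, cell_name: str) -> str:
    """Wrap the column name in backticks and the cell name in double quotes."""
    return f'`{column}` == "{cell_name}"'


print(build_cell_filter("Viavi.Cell.Name", "S1S10/B13/C1C3"))
```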
...
5. Result
The results can be reviewed in the logs of the Kubeflow pod created during training execution. The following details can be checked:
...
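The energy and emission figures tracked in this use case are related in a simple way: CodeCarbon derives CO2-equivalent emissions by multiplying the energy consumed by the carbon intensity of the electricity used. The sketch below reproduces only that arithmetic; the function name and all numbers are hypothetical and are not results from this use case.

```python
def estimate_emissions_kg(cpu_kwh: float, gpu_kwh: float, ram_kwh: float,
                          intensity_kg_per_kwh: float) -> float:
    """Total energy (kWh) multiplied by carbon intensity (kg CO2eq per kWh)."""
    total_kwh = cpu_kwh + gpu_kwh + ram_kwh
    return total_kwh * intensity_kg_per_kwh


# Hypothetical readings for a single training run.
emissions = estimate_emissions_kg(cpu_kwh=0.02, gpu_kwh=0.05, ram_kwh=0.01,
                                  intensity_kg_per_kwh=0.4)
print(f"{emissions:.4f} kg CO2eq")
```

Because emissions scale linearly with energy, shortening training time or lowering GPU energy use reduces the reported CO2 figure in the same proportion.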
6. Load Model
Code Block language yaml
apiVersion: "serving.kserve.io/v1beta1"
kind: "InferenceService"
metadata:
  name: traffic-forecasting-model
  namespace: kserve-test
spec:
  predictor:
    tensorflow:
      storageUri: "http://tm.traininghost:32002/model/viavi_lstm_s1b13c3s10b13c3/1/Model.zip"
      runtimeVersion: "2.17.0-gpu"
      resources:
        requests:
          cpu: 0.1
          memory: 1Gi
          nvidia.com/gpu: 1
        limits:
          cpu: 0.1
          memory: 1Gi
          nvidia.com/gpu: 1
Code Block language bash
kubectl create namespace kserve-test
kubectl apply -f deploy.yaml -n kserve-test
...
Code Block language bash
source predict.sh
...
7. Comparison
Code Block language python
import json

import matplotlib.pyplot as plt
import requests

# Load the inference input; each instance is one window of time steps
with open('input.json', 'r') as file:
    data = json.load(file)

instances = data['instances']
# Reference values: the last value of each window, starting from the second window
original_data = [instance[-1][0] for instance in instances[1:]]
print(f"Original: {original_data}")
print(f"Length: {len(original_data)}")

cluster_ip = "172.24.100.123"  # IP of where Kserve is deployed
ingress_port = "32667"  # Port of the Istio Ingress Gateway
url = f"http://{cluster_ip}:{ingress_port}/v1/models/traffic-forecasting-model:predict"
headers = {
    "Host": "traffic-forecasting-model.kserve-test.svc.cluster.local",
    "Content-Type": "application/json"
}

# Send the same input to the deployed model and collect its predictions
response = requests.post(url, headers=headers, json=data)
if response.status_code == 200:
    predictions = [pred[0] for pred in response.json()['predictions']]
else:
    print(f"Error: {response.status_code} - {response.text}")
    predictions = []
print(f"Predictions: {predictions}")

# Plot the original data and the predictions on the same axes
plt.figure(figsize=(35, 7))
plt.plot(original_data, label='Original Data', linestyle='-', marker='o')
if predictions:
    plt.plot(range(len(predictions)), predictions, label='Predictions', linestyle='--', marker='x')
plt.title('Comparison of Original Data and Predictions')
plt.xlabel('Index')
plt.ylabel('Value')
plt.legend()
plt.grid(True)
plt.show()
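Besides the visual comparison, the `original_data` and `predictions` lists from the script above can be summarized with the accuracy metrics mentioned earlier (RMSE, MAE, MAPE). The following is a minimal pure-Python sketch; the function name `forecast_errors` is hypothetical, and skipping zero-valued reference points in MAPE is a choice made here, not something the source specifies.

```python
import math
from typing import Dict, Sequence


def forecast_errors(actual: Sequence[float], predicted: Sequence[float]) -> Dict[str, float]:
    """Compute RMSE, MAE and MAPE (%) over the overlapping part of both series."""
    n = min(len(actual), len(predicted))
    if n == 0:
        raise ValueError("need at least one pair of values")
    pairs = list(zip(actual[:n], predicted[:n]))
    rmse = math.sqrt(sum((a - p) ** 2 for a, p in pairs) / n)
    mae = sum(abs(a - p) for a, p in pairs) / n
    # MAPE is undefined where the actual value is zero, so skip those points.
    nonzero = [(a, p) for a, p in pairs if a != 0]
    if nonzero:
        mape = 100 * sum(abs((a - p) / a) for a, p in nonzero) / len(nonzero)
    else:
        mape = float("nan")
    return {"rmse": rmse, "mae": mae, "mape": mape}


# Hypothetical values; in practice pass original_data and predictions.
print(forecast_errors([1.0, 2.0, 4.0], [1.0, 3.0, 2.0]))
```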
...
4. File list
CellReports.csv: The file contains traffic data from Viavi.
insert.py: The file processes the dataset and inserts the data into InfluxDB. (Changes required: DATASET_PATH, INFLUXDB_IP, INFLUXDB_TOKEN)
pipeline.ipynb: The file defines the model structure and training process.
deploy.yaml: The YAML file is used for deploying the model inference service.
predict.sh: The script is used for executing the model prediction.
input.json: The JSON file is used for inference.
5. Example
Input data
Code Block language json
[
  [
    [
      0.8342111290054597
    ],
    [
      0.6010991527639614
    ],
    [
      0.6010991527639614
    ],
    [
      0.7556675063318373
    ],
    ...
  ]
]
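The nested shape above (a list of windows, each a list of time steps holding a single feature value) can be produced from a flat, normalized series with a sliding window. The sketch below assumes a stride of one step and wraps the result under the `instances` key that the comparison script reads; the function name and the window length are hypothetical, since the window length used by the model is not stated here.

```python
from typing import Dict, List


def make_instances(series: List[float], window: int) -> Dict[str, list]:
    """Build {"instances": [...]}, each instance being `window` steps of [value]."""
    if window <= 0 or window > len(series):
        raise ValueError("window must be between 1 and len(series)")
    instances = [
        [[value] for value in series[start:start + window]]
        for start in range(len(series) - window + 1)
    ]
    return {"instances": instances}


# Hypothetical short series and window length.
payload = make_instances([0.83, 0.60, 0.60, 0.76], window=3)
print(payload)
```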
Output data
Code Block language json
"predictions": [
[0.112432681], [0.0969874561], [0.100156009], [0.308645517],
[0.303829], [0.41753903], [0.539769948], [0.618172765],
[0.634754062], [0.625748396], [0.530416906], [0.333626628],
[0.208134115], [0.137386888], [0.113081336], [0.105671018],
[0.14362359], [0.258447438], [0.55385375], [0.567666292],
[0.680490911], [0.738528132], [0.717839658], [0.653536677],
[0.524426162], [0.453805923], [0.268508255], [0.170119792],
[0.115568757], [0.0889923126], [0.0926849917],[0.143254936],
[0.277066618], [0.629820824], [0.710377932], [0.728215456],
[0.707016706], [0.638304889], [0.527941108], [0.411735266],
[0.345081121], [0.171168745], [0.14139232], [0.0903047472],
[0.231243849],[0.170835257], [0.178088814], [0.255523622],
[0.405331433], [0.626908183], [0.738317609], [0.797301173],
[0.575131953], [0.416216075], [0.296788305], [0.203006953],
[0.15064013], [0.136724383], [0.159134328], [0.414751947],
[0.386262804], [0.395934224], [0.432646126], [0.598033369],
[0.657576084], [0.664435327], [0.585267484], [0.40978539],
[0.28032586], [0.203143746], [0.136435121], [0.102216847],
[0.109813243], [0.164212137], [0.438008606], [0.511521935]
]
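Each prediction is wrapped in its own one-element list, so client code usually flattens the response before using it. The sketch below assumes the full response body is a JSON object with a `predictions` key, as the comparison script does when it reads `response.json()['predictions']`; the function name is hypothetical and the sample reuses the first three values shown above.

```python
import json
from typing import List


def flatten_predictions(response_text: str) -> List[float]:
    """Extract one float per prediction from a {"predictions": [[x], ...]} body."""
    body = json.loads(response_text)
    return [row[0] for row in body["predictions"]]


sample = '{"predictions": [[0.112432681], [0.0969874561], [0.100156009]]}'
values = flatten_predictions(sample)
print(min(values), max(values))
```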
Contributors
Peter Moonki Hong - Samsung
Taewan Kim - Samsung
Corbin(Geon) Kim - Kyunghee Univ. MCL
Sungjin Lee - Kyunghee Univ. MCL
Hyuksun Kwon - Kyunghee Univ. MCL
Hoseong Choi - Kyunghee Univ. MCL
Version History
Date | Ver. | Author | Comment |
---|---|---|---|
2024-12-10 | 1.0.0 | Corbin(Geon) Kim, Sungjin Lee, Hyuksun Kwon, Hoseong Choi | |
2024-01-08 | 1.0.1 | Sungjin Lee, Hyuksun Kwon | Corrected the input.json file |