Introduction.

A Model Catalog is an essential tool for correctly managing the entire life cycle of a Machine Learning (ML) model.

Once you have developed a first stable version of your model, and before moving on to the validation and test phases, you should publish it to the Model Catalog. This ensures that all the members of the team are working on (testing, validating) exactly the same model.

In addition, you should add all the information needed by the different kinds of professionals who will work with that model.
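To make the idea of "exactly the same model" concrete, here is a minimal sketch in plain Python. It is not the OCI Data Science API: the function names and the entry layout are hypothetical, chosen only to illustrate that a catalog entry can pair descriptive metadata with a fingerprint of the model artifact.

```python
import hashlib
from pathlib import Path


def artifact_fingerprint(path: Path) -> str:
    """Return the SHA-256 hex digest of a model artifact file."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read in 1 MiB chunks so large artifacts do not need to fit in memory.
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_catalog_entry(path: Path, name: str, version: str, metadata: dict) -> dict:
    """Build a hypothetical catalog entry: metadata plus artifact fingerprint."""
    return {
        "name": name,
        "version": version,
        "sha256": artifact_fingerprint(path),
        "metadata": metadata,
    }


def is_same_model(path: Path, entry: dict) -> bool:
    """Check that a local artifact matches the one recorded in the entry."""
    return artifact_fingerprint(path) == entry["sha256"]
```

A tester who downloads the artifact can call `is_same_model` before running any validation, so that results are always tied to one published version.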

In this short article, I want to highlight why a Model Catalog is essential and what you can achieve by using it correctly.

Finally, I'll add details on how to use the Model Catalog on Oracle OCI Data Science.

 

MLflow and OCI DataScience

MLflow is a widely used tool for keeping track of your experiments when you are doing hyperparameter optimization.

I have used it often, and it is not difficult to integrate with a Notebook Session.
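As a rough illustration of what this tracking looks like, the sketch below runs a small random search and logs each trial as an MLflow run. The objective function, the search space and the run names are assumptions made up for the example; only `mlflow.start_run`, `mlflow.log_params` and `mlflow.log_metric` are actual MLflow calls. The logger is passed in as a parameter, so the search can also be run without MLflow installed.

```python
import random
from typing import Callable


def validation_loss(learning_rate: float, n_layers: int) -> float:
    """Toy objective standing in for a real train-and-validate step."""
    return (learning_rate - 0.01) ** 2 + 0.001 * abs(n_layers - 3)


def log_to_mlflow(trial: int, params: dict, loss: float) -> None:
    """Record one trial as an MLflow run (parameters plus one metric)."""
    import mlflow  # imported lazily so the search itself has no hard dependency

    with mlflow.start_run(run_name=f"trial-{trial}"):
        mlflow.log_params(params)
        mlflow.log_metric("val_loss", loss)


def run_search(
    n_trials: int = 5,
    seed: int = 0,
    log_trial: Callable[[int, dict, float], None] = log_to_mlflow,
) -> dict:
    """Random search over two hyperparameters, logging every trial."""
    rng = random.Random(seed)
    best = {"loss": float("inf"), "params": None}
    for trial in range(n_trials):
        params = {
            "learning_rate": 10 ** rng.uniform(-4, -1),  # log-uniform sampling
            "n_layers": rng.randint(1, 6),
        }
        loss = validation_loss(**params)
        log_trial(trial, params, loss)
        if loss < best["loss"]:
            best = {"loss": loss, "params": params}
    return best
```

Calling `run_search()` with the default logger writes the runs to whichever tracking server or local store MLflow is configured to use; you can then compare the trials in the MLflow UI.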

Following a request from one of my customers, I have written a document that describes how to set up MLflow in OCI and how to integrate it with a DataScience Notebook.

You can find the document here

 

Videos.

On this page I collect links to a series of videos that I have prepared to explain how to use some advanced features of Oracle DataScience, as well as how to tackle advanced tasks in Data Science and Machine Learning.

I have also added a video, the first in the list, in which I explain the characteristics of an architecture for an end-to-end solution based on Oracle Cloud services.

 

Tools and resources to ease the adoption of Data Science and Machine Learning.

As part of my job, I try to help many of our customers simplify the adoption of the techniques and tools of Machine Learning and Data Science.

Very often I realize that, even though the application domain and the data change, the work consists of things I have already done in other projects. In other words, I realize that code can be reused.

For this reason, I have decided to start collecting, in a GitHub repository, a series of examples that illustrate the most commonly used techniques.

Some examples:

  • How to quickly build the histogram of a set of features
  • Analyzing the cardinality of the features, to determine which ones should be treated as categorical
  • Plotting the Correlation Matrix
  • Implementing K-fold cross validation
  • Accessing the Object Storage
  • Using MLflow together with Optuna
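
To give a flavour of two of the items in this list, here is a minimal sketch in plain Python of a feature cardinality check and of K-fold index splitting. It is not the code from the repository: the function names, the `max_unique` threshold and the row-as-dictionary data layout are assumptions made for illustration.

```python
import random
from typing import Iterator, List, Sequence, Tuple


def categorical_candidates(rows: Sequence[dict], max_unique: int = 10) -> dict:
    """Return {feature: n_unique} for features with at most max_unique values."""
    features = rows[0].keys()
    counts = {name: len({row[name] for row in rows}) for name in features}
    return {name: n for name, n in counts.items() if n <= max_unique}


def k_fold_indices(
    n_samples: int, k: int, seed: int = 42
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, validation_indices) for each of the k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold sizes differ by at most one sample.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        valid = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, valid
        start += size
```

In a real project you would more likely rely on pandas (`nunique`) and scikit-learn (`KFold`) for the same tasks; the version above only shows the logic behind them.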

Introduction.

I have decided to list the most important AI/Machine Learning projects that I have published on GitHub.

Image Classification in Healthcare

CXR-Anomaly-Detector

Development of a model based on a Deep Convolutional Network, trained on a subset of the NIH-CXR-14 dataset, to detect whether a Chest X-Ray (CXR) shows signs of any disease.

The model was trained on TPUs, using resources provided by Kaggle.

I tried to reproduce the results shown in this article from Nature, and the outcome was very good.

link to GitHub repository of the project

 

 

CXR-Pneumonia

Development of a model for Pneumonia detection, again based on the NIH-CXR-14 dataset.

link to the GitHub repository of the project.

 

Diabetic Retinopathy.

Diabetic Retinopathy is one of the most common and dangerous complications of diabetes, and one of the most common causes of blindness in older adults.

Images of the retina can be used to diagnose this disease and to monitor the damage it causes. In 2015, Kaggle launched a competition on this subject.

In this work, I applied Google's EfficientNet to see what kind of improvement can be obtained with a state-of-the-art DNN. The results were really interesting: my score would have been good enough for 14th place in the competition.

link to GitHub

see also: https://luigisaetta.it/index.php/deep-learning-ai/43-learning-from-kaggle-competitions

 

Neural Networks for tabular data.

Deployment of a TF 2.3 model using ONNX

In this project I use TF 2.3, the TF Feature Column API, and Keras to develop a Fully Connected Neural Network for binary classification.

The data come from the Wisconsin Breast Cancer Dataset.

I use ONNX as the serialization format, to explore how easy it is to use ONNX for this kind of model.

see also: https://github.com/luigisaetta/onnxdeployment