.. _Example:

============
Examples
============

All the IPython notebooks used in these examples are available here: https://github.com/ritabratamaiti/RapidML/tree/master/Examples

****************************************
ASD (Autism Spectrum Disorder) Detection
****************************************

.. _Github: https://github.com/ritabratamaiti/Autism-Detection-API

GitHub: https://github.com/ritabratamaiti/Autism-Detection-API

This project uses RapidML to detect ASD cases in adults. The training data consists of responses provided by patients on the AQ-10 questionnaire. RapidML is used for selecting, training, serializing and packaging a high-accuracy classifier. The files directory generated by RapidML, which contains the packaged model, is then uploaded to a WSGI server (see the various deployment options here: http://flask.pocoo.org/docs/1.0/deploying/). PythonAnywhere (https://www.pythonanywhere.com) was used in this project.

The ``builder_script.py`` script uses RapidML.

.. code-block:: python

    import RapidML
    import os
    import pandas as pd

    # This Autism Screening Adult Data Set is from UCI Machine Learning Repository
    # and is available here:
    # https://archive.ics.uci.edu/ml/datasets/Autism+Screening+Adult
    df = pd.read_csv('out.csv')
    # 'Unnamed: 0' is the name pandas gives to an unnamed index column stored in a CSV.
    df = df.drop(columns=['Unnamed: 0'])
    df.head()

    # Select, train and package a classifier; the output directory is named 'ASDapi'.
    ml_model = RapidML.rapid_classifier(df, name='ASDapi')

*Note: The training data is an Autism Screening Adult DataSet from UCI Machine Learning Repository and is available here:* https://archive.ics.uci.edu/ml/datasets/Autism+Screening+Adult

The code generates the following output.

.. code-block:: text

    RapidML, Version: 0.1, Author: Ritabrata Maiti

             .---.        .-----------
            /     \  __  /    ------
           / /     \(  )/    -----
          //////   ' \/ `   ---
         //// / // :    : ---
        // /   /  /`    '--
       //          //..\
              ====UU====UU====
                  '//||\\`
                    ''``

    Warning: xgboost.XGBClassifier is not available and will not be used by TPOT.
    Warning: xgboost.XGBRegressor is not available and will not be used by TPOT.
    Warning: xgboost.XGBRegressor is not available and will not be used by TPOT.
    Warning: xgboost.XGBRegressor is not available and will not be used by TPOT.
    Using the RapidML Classifier; Experimental, For Issues Contact Author: ritabratamaiti@hiretrex.com
    Label Encoding is being done....
    Training....
    Generation 1 - Current best internal CV score: 1.0
    Generation 2 - Current best internal CV score: 1.0
    Generation 3 - Current best internal CV score: 1.0
    Generation 4 - Current best internal CV score: 1.0
    Generation 5 - Current best internal CV score: 1.0

    Best pipeline: DecisionTreeClassifier(input_matrix, criterion=entropy, max_depth=2, min_samples_leaf=4, min_samples_split=6)
    Sample Output from input dataframe: 1,1,0,1,0,0,1,1,0,1,6,35.0,f,White-European,no,yes,United States,no,Self,NO

The generated model, scripts and serialized files are stored in the directory ``ASDapi``. This directory is uploaded to a WSGI server to make cloud predictions.

**Note**: This is a complete project, and some parts (such as the creation of the Android application) are outside the scope of the RapidML documentation. Please visit the project on Github_ for more details.

*******************
Boston House Prices
*******************

Let's say we are building a machine learning model to run on the cloud and predict housing prices in an area, using parameters such as crime rates, business development, pollution metrics etc. We will be using the Boston House Prices dataset, due to its wide availability and usage within machine learning academia. A description of the dataset is available here: https://www.kaggle.com/c/boston-housing

**Note**: We will be using ``sklearn.datasets`` for easy loading of the Boston housing dataset within Python.

Since we are predicting prices, this is a regression problem. We will be using ``RapidML.rapid_regressor_arr`` for this task.

.. code-block:: python

    # coding: utf-8
    # Note: load_boston was removed in scikit-learn 1.2, so this example
    # needs an older scikit-learn release to run as written.
    from sklearn.datasets import load_boston
    from sklearn.model_selection import train_test_split
    import RapidML

    housing = load_boston()
    # Hold out 25% of the samples for scoring the trained model.
    X_train, X_test, y_train, y_test = train_test_split(housing.data, housing.target, train_size=0.75, test_size=0.25)
    model = RapidML.rapid_regressor_arr(X_train, y_train)
    print(model.m_tpot.score(X_test, y_test))

The following output is generated.

.. code-block:: text

    RapidML, Version: 0.1, Author: Ritabrata Maiti

             .---.        .-----------
            /     \  __  /    ------
           / /     \(  )/    -----
          //////   ' \/ `   ---
         //// / // :    : ---
        // /   /  /`    '--
       //          //..\\
              ====UU====UU====
                  '//||\\\`
                    ''``

    Warning: xgboost.XGBClassifier is not available and will not be used by TPOT.
    Warning: xgboost.XGBRegressor is not available and will not be used by TPOT.
    Warning: xgboost.XGBRegressor is not available and will not be used by TPOT.
    Warning: xgboost.XGBRegressor is not available and will not be used by TPOT.
    Using RapidML Regressor with arrays, Inputs will not be label encoded.; Experimental, For Issues Contact Author: ritabratamaiti@hiretrex.com
    Training....
    Generation 1 - Current best internal CV score: -11.913707598413463
    Generation 2 - Current best internal CV score: -11.913707598413463
    Generation 3 - Current best internal CV score: -11.913707598413463
    Generation 4 - Current best internal CV score: -11.913707598413463
    Generation 5 - Current best internal CV score: -11.404014702360742

    Best pipeline: GradientBoostingRegressor(input_matrix, alpha=0.75, learning_rate=0.1, loss=huber, max_depth=3, max_features=1.0, min_samples_leaf=5, min_samples_split=4, n_estimators=100, subsample=0.6000000000000001)
    -10.908425630183695

As we can see in this example, a score of ``-10.908425630183695`` has been achieved. Do note that a different pipeline may be generated on each program run, and hence the scores may fluctuate by a small margin (approximately 1% or so).
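
The negative sign of the score can be confusing at first. A minimal sketch of how to read it follows, assuming that the underlying TPOT regressor uses its default regression metric, ``neg_mean_squared_error``. This is an assumption, since the RapidML output does not name the metric. Under that assumption, the score is the negated mean squared error on the test set, and its square root gives an error in the units of the target. The variable names below are illustrative only.

```python
import math

# Score printed by model.m_tpot.score(X_test, y_test) in the run above.
score = -10.908425630183695

# Assumption: the score is a negated mean squared error (TPOT's default
# regression metric, neg_mean_squared_error), so flipping the sign gives the MSE.
mse = -score
rmse = math.sqrt(mse)

# The Boston target (median home value) is expressed in thousands of dollars,
# so the RMSE is in that same unit.
print(f"MSE:  {mse:.3f}")
print(f"RMSE: {rmse:.3f}")
```

If the assumption holds, a score closer to zero is better, which matches the internal CV score moving from about -11.91 to about -11.40 across the generations.
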
In the directory ``RapidML_files``, the model file and the ``API.py`` script have been generated. These can be uploaded to a WSGI server (with ``Flask`` support) to perform cloud predictions.

********************************************************************************
Using RapidML to build a neural network (For recognizing hand-written digits)
********************************************************************************

This example demonstrates the versatility of RapidML by using UDM (User Defined Models). Do note that we will be using matplotlib to visualise the digit images.

In this example, we use ``RapidML.rapid_udm_arr`` to supply a neural network classifier (``sklearn.neural_network.MLPClassifier``) as the machine learning model. We use the ``digits`` dataset from ``sklearn.datasets`` and train the neural network on half of the data. The other half is used for testing and visualization.

The following are Jupyter Notebook cells and their corresponding output.

.. code:: ipython3

    import RapidML
    from sklearn import datasets
    from sklearn.neural_network import MLPClassifier
    import matplotlib.pyplot as plt

.. code:: ipython3

    digits = datasets.load_digits()

    # The data that we are interested in is made of 8x8 images of digits, let's
    # have a look at the first 4 images, stored in the `images` attribute of the
    # dataset.  If we were working from image files, we could load them using
    # matplotlib.pyplot.imread.  Note that each image must have the same size. For these
    # images, we know which digit they represent: it is given in the 'target' of
    # the dataset.
    images_and_labels = list(zip(digits.images, digits.target))
    for index, (image, label) in enumerate(images_and_labels[:4]):
        plt.subplot(2, 4, index + 1)
        plt.axis('off')
        plt.imshow(image, cmap=plt.cm.gray_r, interpolation='nearest')
        plt.title('Training: %i' % label)

    # To apply a classifier on this data, we need to flatten the image, to
    # turn the data into a (samples, features) matrix:
    n_samples = len(digits.images)
    data = digits.images.reshape((n_samples, -1))

.. code:: ipython3

    clf = MLPClassifier(alpha=1)
    # Train the user-defined model on the first half of the data.
    mclf = RapidML.rapid_udm_arr(data[:n_samples // 2], digits.target[:n_samples // 2], clf)

.. parsed-literal::

    Using RapidML with User Defined Models and Arrays, Inputs will not be label encoded; note that the model provided by the user should be a Scikit_learn model and not a TPOT object.; Experimental, For Issues Contact Author: ritabratamaiti@hiretrex.com
    Training....

.. parsed-literal::

    C:\Users\Ritabrata Maiti\Anaconda3\lib\site-packages\sklearn\neural_network\multilayer_perceptron.py:564: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (200) reached and the optimization hasn't converged yet.
      % self.max_iter, ConvergenceWarning)

.. code:: ipython3

    # Predict on the second half of the data, which was not used for training.
    expected = digits.target[n_samples // 2:]
    predicted = mclf.model.predict(data[n_samples // 2:])

.. code:: ipython3

    from sklearn import metrics
    print("Classification report for classifier %s:\n%s\n"
          % (mclf.model, metrics.classification_report(expected, predicted)))
    print("Confusion matrix:\n%s" % metrics.confusion_matrix(expected, predicted))

.. parsed-literal::

    Classification report for classifier MLPClassifier(activation='relu', alpha=1, batch_size='auto', beta_1=0.9,
           beta_2=0.999, early_stopping=False, epsilon=1e-08,
           hidden_layer_sizes=(100,), learning_rate='constant',
           learning_rate_init=0.001, max_iter=200, momentum=0.9,
           nesterovs_momentum=True, power_t=0.5, random_state=None,
           shuffle=True, solver='adam', tol=0.0001, validation_fraction=0.1,
           verbose=False, warm_start=False):
                 precision    recall  f1-score   support

              0       0.99      0.97      0.98        88
              1       0.95      0.92      0.94        91
              2       0.99      0.98      0.98        86
              3       0.96      0.85      0.90        91
              4       0.99      0.89      0.94        92
              5       0.93      0.96      0.94        91
              6       0.91      0.99      0.95        91
              7       0.95      0.99      0.97        89
              8       0.93      0.94      0.94        88
              9       0.86      0.96      0.91        92

    avg / total       0.95      0.94      0.94       899


    Confusion matrix:
    [[85  0  0  0  1  0  2  0  0  0]
     [ 0 84  0  1  0  1  0  0  0  5]
     [ 1  0 84  1  0  0  0  0  0  0]
     [ 0  0  1 77  0  3  0  4  6  0]
     [ 0  0  0  0 82  0  6  0  0  4]
     [ 0  0  0  0  0 87  1  0  0  3]
     [ 0  1  0  0  0  0 90  0  0  0]
     [ 0  0  0  0  0  1  0 88  0  0]
     [ 0  3  0  0  0  0  0  0 83  2]
     [ 0  0  0  1  0  2  0  1  0 88]]

.. code:: ipython3

    images_and_predictions = list(zip(digits.images[n_samples // 2:], predicted))
    for index, (image, prediction) in enumerate(images_and_predictions[:4]):
        plt.subplot(2, 4, index + 5)
        plt.axis('off')
        plt.imshow(image, cmap=plt.cm.gray_r, interpolation='nearest')
        plt.title('Prediction: %i' % prediction)

    plt.show()

.. image:: output_5_0.png

**Note**: If you wish to use the model as a Flask API, remember to flatten the image, turning the data into a (samples, features) matrix, and then convert it to a URL argument. This method hasn't undergone complete testing and is not guaranteed to work. Alternatively, it is possible to modify the ``API.py`` file to, say, accept an image and then flatten it within the script itself.
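
As a rough sketch of the flattening step described in the note, the snippet below turns one 8x8 image into a flat list of 64 values and encodes it as a URL query string. The endpoint ``http://example.com/predict`` and the parameter name ``data`` are hypothetical placeholders. The query format that the generated ``API.py`` actually expects is not described here and should be checked in that file before use.

```python
from urllib.parse import urlencode

# Stand-in for one 8x8 grayscale digit image (e.g. digits.images[0]),
# written as a nested list so the sketch has no third-party dependencies.
image = [[(row * 8 + col) % 17 for col in range(8)] for row in range(8)]

# Flatten row by row into a single feature vector of 64 values.
# With a NumPy array, image.reshape(-1) gives the same ordering.
features = [pixel for row in image for pixel in row]

# Hypothetical endpoint and parameter name; adjust both to match API.py.
base_url = "http://example.com/predict"
query = urlencode({"data": ",".join(str(v) for v in features)})
url = f"{base_url}?{query}"

print(len(features))
print(url)
```

Sending the values as one comma-separated argument is only one possible design. Modifying ``API.py`` to accept the image itself, as the note suggests, avoids long URLs altogether.
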