RapidML API
Getting started with RapidML is easy. Every RapidML function returns an object of the rml class.
The RapidML.rml Class
class RapidML.rml
rml class attributes
model:
This is the machine learning model generated by RapidML. It has already been trained on the training data and the target supplied by the user. These can be given either as a single DataFrame or as X, y arrays, where X holds the training features and y holds the target variable. This attribute is never null.
m_tpot:
Note: This may be null, depending on which function created the rml object; see the Returns section of each function below. This is a TPOT object, either a TPOTClassifier or a TPOTRegressor. RapidML uses it to find the optimal machine learning model for the supplied data.
You can use the functions and attributes of rml.m_tpot to evaluate the trained model. For example, rml.m_tpot.score(testing_features, testing_classes) evaluates the model on held-out testing data and returns its score (accuracy, by default, for a TPOTClassifier). See the TPOT documentation for all the functions and attributes available on rml.m_tpot.
d:
Note: This may be null, depending on which function created the rml object; see the Returns section of each function below.
This is a defaultdict holding the LabelEncoder used to encode the labels in the table, i.e., the mapping between the original labels and their transformed values. It is populated only if you choose to label encode the table. See sklearn.preprocessing.LabelEncoder for more details.
rml class functions
put(self, mdl, d=None):
The RapidML functions use this method to assign the attributes of rml objects. Here mdl can be either a model supplied by the user or one found by RapidML via TPOT.
If mdl is a TPOT object, the model attribute is set to mdl.fitted_pipeline_ (the best pipeline TPOT found for the training data), and the m_tpot attribute is set to the TPOT object itself. If instead mdl is a fitted (trained) machine learning model, the model attribute is mdl and the m_tpot attribute is null.
If the training data is label encoded, the d attribute is the d supplied as the function argument. Otherwise, the d attribute is null.
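The sketch below shows the branching that put performs. It is not RapidML's source. The class name RmlSketch and the FakeTpot and FakeModel stubs are assumptions made for illustration. So is the hasattr(mdl, "fitted_pipeline_") check used to recognise a TPOT object. Python's None plays the role of null.

```python
class RmlSketch:
    """Hypothetical re-creation of the attribute logic described for rml.put."""

    def __init__(self):
        self.model = None
        self.m_tpot = None
        self.d = None

    def put(self, mdl, d=None):
        # Assumption: a TPOT object is recognised by its fitted_pipeline_ attribute.
        if hasattr(mdl, "fitted_pipeline_"):
            self.model = mdl.fitted_pipeline_  # best pipeline found by TPOT
            self.m_tpot = mdl                  # keep the TPOT object itself
        else:
            self.model = mdl                   # user-supplied fitted model
            self.m_tpot = None                 # no TPOT object involved
        self.d = d  # encoder dictionary, or None when nothing was label encoded


class FakeTpot:
    """Stub standing in for a fitted TPOTClassifier or TPOTRegressor."""
    fitted_pipeline_ = "best-pipeline"


class FakeModel:
    """Stub standing in for an already fitted scikit-learn model."""


from_tpot = RmlSketch()
from_tpot.put(FakeTpot())

plain_model = FakeModel()
from_user = RmlSketch()
from_user.put(plain_model, d={"city": "fitted encoder"})

print(from_tpot.model, from_tpot.d)  # best-pipeline None
print(from_user.m_tpot is None)      # True
```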
le(self, df):
Call this method on an rml object to label encode another dataset with the same encoding table that was used on an earlier, similar dataset.
For example, suppose you want the same label transformation on two DataFrames that have the same columns but different rows. First label encode the first table, then call this method to label encode the second one.
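The following minimal sketch shows what d holds and how le reuses it. It assumes d follows the common defaultdict(LabelEncoder) pattern, keyed by column name. MiniLabelEncoder is a hypothetical pure-Python stand-in for sklearn.preprocessing.LabelEncoder, and plain dicts of lists stand in for DataFrames.

```python
from collections import defaultdict


class MiniLabelEncoder:
    """Hypothetical stand-in for sklearn.preprocessing.LabelEncoder."""

    def fit(self, values):
        # Like LabelEncoder, codes follow the sorted order of the unique labels.
        self.classes_ = sorted(set(values))
        return self

    def transform(self, values):
        index = {label: code for code, label in enumerate(self.classes_)}
        # Unseen labels raise KeyError here (scikit-learn raises ValueError).
        return [index[v] for v in values]


# Two tables with the same columns but different rows (columns keyed by name).
first = {"city": ["Paris", "Oslo", "Rome"], "size": ["S", "L", "M"]}
second = {"city": ["Rome", "Rome", "Paris"], "size": ["M", "S", "S"]}

# One encoder is created per column name on first lookup (assumed to mirror rml.d).
d = defaultdict(MiniLabelEncoder)
encoded_first = {name: d[name].fit(col).transform(col) for name, col in first.items()}

# Reusing the already fitted encoders gives the second table consistent codes.
encoded_second = {name: d[name].transform(col) for name, col in second.items()}
print(encoded_first)   # {'city': [1, 0, 2], 'size': [2, 0, 1]}
print(encoded_second)  # {'city': [2, 2, 1], 'size': [1, 2, 2]}
```

The key design point is that the second table is only transformed, never refitted, so a label such as "Rome" maps to the same code in both tables.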
RapidML.rapid_classifier
RapidML.rapid_classifier(df, le='Yes', model=TPOTClassifier(generations=5, population_size=50, verbosity=2), name='RapidML_files')
rapid_classifier optionally label encodes the input DataFrame df, which contains both the features and the target, depending on the le argument. It then uses a TPOT backend to find and optimize the best classifier for the input data; TPOT searches over machine learning pipelines with genetic programming. Finally, it populates the attributes of an rml object and returns that object.
Parameters
df
Type: pandas.DataFrame
The input DataFrame supplied by the user. The initial columns hold the training features and the last column holds the target.
le
Type: str
The default value is 'Yes'. If le is 'Yes', RapidML label encodes the input DataFrame supplied as df and stores the LabelEncoder in a defaultdict. If le is 'No', no label encoding is done. Any other value of le raises a ValueError.
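The rule for le fits in a small check. check_le is a hypothetical helper, not part of RapidML's API. It takes "any other value" literally, so it is case-sensitive.

```python
def check_le(le):
    """Hypothetical helper mirroring the documented rule for the le argument."""
    if le == "Yes":
        return True   # label encode df and keep the encoders in a defaultdict
    if le == "No":
        return False  # leave df unchanged; rml.d stays null (None)
    # Any other value is rejected, as the parameter description states.
    raise ValueError("le must be 'Yes' or 'No', got %r" % (le,))


print(check_le("Yes"), check_le("No"))  # True False
```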
model
Type: tpot.TPOTClassifier
The default value is tpot.TPOTClassifier(generations=5, population_size=50, verbosity=2). You can pass a TPOTClassifier object configured with different parameters to suit your requirements. In general, increasing generations and population_size tends to improve the model's accuracy, at the cost of a longer search. See TPOTClassifier for more details.
name
Type: str
The default value is 'RapidML_files'. This is the name of the directory that RapidML creates. It stores the machine learning model, the LabelEncoder dictionary, a skeletal copy of the input DataFrame, a list of column data types and a dummy user input, all as serialized dill files. It also holds the API.py and helper.py scripts. Upload this directory to a web server to serve the model generated by RapidML and make predictions through the web API.
Returns
Returns an rml object. If le is 'Yes', rml.d is populated; otherwise it is null. When rapid_classifier is used, rml.model and rml.m_tpot are always populated.
Files Created
model
The machine learning model generated by RapidML, serialized with dill.
d
The defaultdict (a subclass of dict) containing the LabelEncoder used to encode the labels in the DataFrame, serialized with dill.
df
The skeletal DataFrame, which contains only the column headers and no data, serialized with dill.
dt
A list of the data types of the columns in the input DataFrame, serialized with dill.
f
A string containing a dummy input value, which can be passed to the API as a URL argument. It is the second row of the input DataFrame converted to a string, serialized with dill.
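The sketch below reproduces the files above in spirit. It uses the standard-library pickle module as a stand-in for dill, which offers the same dump/load interface but can serialize a wider range of Python objects. Several details are assumptions, and RapidML's actual on-disk layout may differ:

- the file names and the placeholder objects
- the use of Python type names for dt
- the comma-joined format of f

```python
import os
import pickle
import tempfile

# Toy table: the last column is the target.
header = ["age", "city", "label"]
rows = [[31, "Oslo", "yes"], [45, "Rome", "no"], [27, "Paris", "yes"]]

artifacts = {
    "model": {"kind": "placeholder model"},    # stands in for the fitted model
    "d": {},                                   # encoder dictionary (empty: no encoding)
    "df": header,                              # skeletal table: headers, no data
    "dt": [type(v).__name__ for v in rows[0]], # one data-type name per column
    "f": ",".join(str(v) for v in rows[1]),    # second row as a string (assumed format)
}

# Write each artifact to its own file inside the output directory.
name = os.path.join(tempfile.mkdtemp(), "RapidML_files")
os.makedirs(name)
for key, obj in artifacts.items():
    with open(os.path.join(name, key), "wb") as fh:
        pickle.dump(obj, fh)

# A serving script would load the files back the same way.
with open(os.path.join(name, "f"), "rb") as fh:
    dummy = pickle.load(fh)
with open(os.path.join(name, "dt"), "rb") as fh:
    dtypes = pickle.load(fh)
print(dummy)   # 45,Rome,no
print(dtypes)  # ['int', 'str', 'str']
```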
API.py
The Flask API run by the server. It accepts user inputs, makes predictions from them and returns the predictions.
helper.py
A helper module used by API.py that performs the actual predictions with the model generated by RapidML.
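The sketch below covers only the helper side of the split between API.py and helper.py. It turns a comma-separated query string back into typed feature values using the stored data-type list. It then applies any stored label encodings and calls the model's predict method. The function name predict_from_string, the input format, the plain-dict encodings and StubModel are all hypothetical, and RapidML's own helper.py may work differently. A Flask route in API.py would pass the URL argument to such a function and return the result.

```python
def predict_from_string(query, dt, encodings, model):
    """Hypothetical helper: parse a comma-separated input and predict.

    query:     feature values as one string, e.g. "45,Rome"
    dt:        one type name per feature column, e.g. ["int", "str"]
    encodings: {column_index: {label: code}} for label-encoded columns
    model:     any object with a scikit-learn style predict(X) method
    """
    casts = {"int": int, "float": float, "str": str}
    values = [casts[t](raw) for t, raw in zip(dt, query.split(","))]
    # Replace categorical labels with the codes used during training.
    for idx, table in encodings.items():
        values[idx] = table[values[idx]]
    # predict expects 2-D input: a list containing a single sample.
    return model.predict([values])[0]


class StubModel:
    """Stand-in model: predicts 1 when the first feature exceeds 40."""

    def predict(self, X):
        return [1 if row[0] > 40 else 0 for row in X]


city_codes = {1: {"Oslo": 0, "Rome": 1}}
print(predict_from_string("45,Rome", ["int", "str"], city_codes, StubModel()))  # 1
```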
RapidML.rapid_regressor
RapidML.rapid_regressor(df, le='No', model=TPOTRegressor(generations=5, population_size=50, verbosity=2), name='RapidML_files')
rapid_regressor optionally label encodes the input DataFrame df, which contains both the features and the target, depending on the le argument. It then uses a TPOT backend to find and optimize the best regressor for the input data; TPOT searches over machine learning pipelines with genetic programming. Finally, it populates the attributes of an rml object and returns that object.
Parameters
df
Type: pandas.DataFrame
The input DataFrame supplied by the user. The initial columns hold the training features and the last column holds the target.
le
Type: str
The default value is 'No'. If le is 'Yes', RapidML label encodes the input DataFrame supplied as df and stores the LabelEncoder in a defaultdict. If le is 'No', no label encoding is done. Any other value of le raises a ValueError.
model
Type: tpot.TPOTRegressor
The default value is tpot.TPOTRegressor(generations=5, population_size=50, verbosity=2). You can pass a TPOTRegressor object configured with different parameters to suit your requirements. In general, increasing generations and population_size tends to improve the model's predictive performance, at the cost of a longer search. See TPOTRegressor for more details.
name
Type: str
The default value is 'RapidML_files'. This is the name of the directory that RapidML creates. It stores the machine learning model, the LabelEncoder dictionary, a skeletal copy of the input DataFrame, a list of column data types and a dummy user input, all as serialized dill files. It also holds the API.py and helper.py scripts. Upload this directory to a web server to serve the model generated by RapidML and make predictions through the web API.
Returns
Returns an rml object. If le is 'Yes', rml.d is populated; otherwise it is null. When rapid_regressor is used, rml.model and rml.m_tpot are always populated.
Files Created
model
The machine learning model generated by RapidML, serialized with dill.
d
The defaultdict (a subclass of dict) containing the LabelEncoder used to encode the labels in the DataFrame, serialized with dill.
df
The skeletal DataFrame, which contains only the column headers and no data, serialized with dill.
dt
A list of the data types of the columns in the input DataFrame, serialized with dill.
f
A string containing a dummy input value, which can be passed to the API as a URL argument. It is the second row of the input DataFrame converted to a string, serialized with dill.
API.py
The Flask API run by the server. It accepts user inputs, makes predictions from them and returns the predictions.
helper.py
A helper module used by API.py that performs the actual predictions with the model generated by RapidML.
RapidML.rapid_classifier_arr
RapidML.rapid_classifier_arr(X, Y, model=TPOTClassifier(generations=5, population_size=50, verbosity=2), name='RapidML_files')
The rapid_classifier_arr function is similar to rapid_classifier. The difference is that it does not receive the features and target as a single input DataFrame. Instead, it receives the features as X (a numpy.ndarray) and the target as Y (a numpy.ndarray). Another important difference is that this function does not perform label encoding.
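The two input styles carry the same information. The minimal sketch below uses plain Python lists in place of a pandas DataFrame and NumPy arrays. It shows how a table whose last column is the target maps onto the X and Y arguments. With pandas, the same split is commonly written df.iloc[:, :-1].values and df.iloc[:, -1].values. Because the _arr functions skip label encoding, any categorical features in X would likely need to be encoded beforehand.

```python
def split_features_target(rows):
    """Split rows whose last value is the target into (X, Y)."""
    X = [row[:-1] for row in rows]  # every column except the last
    Y = [row[-1] for row in rows]   # the last column only
    return X, Y


table = [
    [5.1, 3.5, "setosa"],
    [6.2, 2.9, "versicolor"],
    [5.9, 3.0, "virginica"],
]
X, Y = split_features_target(table)
print(X)  # [[5.1, 3.5], [6.2, 2.9], [5.9, 3.0]]
print(Y)  # ['setosa', 'versicolor', 'virginica']
```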
Parameters
model
Type: tpot.TPOTClassifier
The default value is TPOTClassifier(generations=5, population_size=50, verbosity=2). You can pass a TPOTClassifier object configured with different parameters to suit your requirements. In general, increasing generations and population_size tends to improve the model's accuracy, at the cost of a longer search. See TPOTClassifier for more details.
name
Type: str
The default value is 'RapidML_files'. This is the name of the directory that RapidML creates. It stores the machine learning model, the LabelEncoder dictionary, a skeletal copy of the input DataFrame, a list of column data types and a dummy user input, all as serialized dill files. It also holds the API.py and helper.py scripts. Upload this directory to a web server to serve the model generated by RapidML and make predictions over the internet.
Returns
Returns an rml object. rml.d is always null; rml.model and rml.m_tpot are always populated.
RapidML.rapid_regressor_arr
RapidML.rapid_regressor_arr(X, Y, model=TPOTRegressor(generations=5, population_size=50, verbosity=2), name='RapidML_files')
The rapid_regressor_arr function is similar to rapid_regressor. The difference is that it does not receive the features and target as a single input DataFrame. Instead, it receives the features as X (a numpy.ndarray) and the target as Y (a numpy.ndarray). Another important difference is that this function does not perform label encoding.
Parameters
model
Type: tpot.TPOTRegressor
The default value is TPOTRegressor(generations=5, population_size=50, verbosity=2). You can pass a TPOTRegressor object configured with different parameters to suit your requirements. In general, increasing generations and population_size tends to improve the model's predictive performance, at the cost of a longer search. See TPOTRegressor for more details.
name
Type: str
The default value is 'RapidML_files'. This is the name of the directory that RapidML creates. It stores the machine learning model, the LabelEncoder dictionary, a skeletal copy of the input DataFrame, a list of column data types and a dummy user input, all as serialized dill files. It also holds the API.py and helper.py scripts. Upload this directory to a web server to serve the model generated by RapidML and make predictions over the internet.
Returns
Returns an rml object. rml.d is always null; rml.model and rml.m_tpot are always populated.
RapidML.rapid_udm
RapidML.rapid_udm(df, model, le='No', name='RapidML_files')
This function makes RapidML a versatile tool in the hands of experienced data scientists and developers. Like rapid_regressor and rapid_classifier, it takes a single DataFrame that contains the input data as well as the target (in the last column).
However, it lets the user provide a scikit-learn model of their choice. Depending on the le argument, label encoding is performed or skipped. The supplied model is then fitted (trained) on the input data and stored in the rml.model attribute.
Parameters
df
Type: pandas.DataFrame
The input DataFrame supplied by the user. The initial columns hold the training features and the last column holds the target.
model
Type: scikit-learn model
Any model that supports the call model.fit(X, y), where X is the input data and y is the target.
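Any object that follows the scikit-learn fit/predict convention should qualify. The hypothetical MajorityClassifier below implements just that interface in pure Python. In practice you would pass a scikit-learn estimator such as sklearn.tree.DecisionTreeClassifier(). This documentation does not say whether RapidML also requires other estimator methods, such as get_params.

```python
from collections import Counter


class MajorityClassifier:
    """Hypothetical minimal model: always predicts the most common target."""

    def fit(self, X, y):
        # Learn the single most frequent label; X is ignored by design.
        self.majority_ = Counter(y).most_common(1)[0][0]
        return self  # scikit-learn convention: fit returns the estimator

    def predict(self, X):
        # One prediction per input row.
        return [self.majority_ for _ in X]


X = [[0.1], [0.4], [0.9]]
y = ["no", "yes", "yes"]
model = MajorityClassifier().fit(X, y)
print(model.predict([[0.5], [0.2]]))  # ['yes', 'yes']
```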
le
Type: str
The default value is 'No'. If le is 'Yes', RapidML label encodes the input DataFrame supplied as df and stores the LabelEncoder in a defaultdict. If le is 'No', no label encoding is done. Any other value of le raises a ValueError.
name
Type: str
The default value is 'RapidML_files'. This is the name of the directory that RapidML creates. It stores the machine learning model, the LabelEncoder dictionary, a skeletal copy of the input DataFrame, a list of column data types and a dummy user input, all as serialized dill files. It also holds the API.py and helper.py scripts. Upload this directory to a web server to serve the model generated by RapidML and make predictions through the web API.
Returns
Returns an rml object. If le is 'Yes', rml.d is populated; otherwise it is null. rml.model is always populated, while rml.m_tpot is always null.
Files Created
model
The machine learning model generated by RapidML, serialized with dill.
d
The defaultdict (a subclass of dict) containing the LabelEncoder used to encode the labels in the DataFrame, serialized with dill.
df
The skeletal DataFrame, which contains only the column headers and no data, serialized with dill.
dt
A list of the data types of the columns in the input DataFrame, serialized with dill.
f
A string containing a dummy input value, which can be passed to the API as a URL argument. It is the second row of the input DataFrame converted to a string, serialized with dill.
API.py
The Flask API run by the server. It accepts user inputs, makes predictions from them and returns the predictions.
helper.py
A helper module used by API.py that performs the actual predictions with the model generated by RapidML.
RapidML.rapid_udm_arr
RapidML.rapid_udm_arr(X, Y, model, name='RapidML_files')
The rapid_udm_arr function is similar to rapid_udm. The difference is that it does not receive the features and target as a single input DataFrame. Instead, it receives the features as X (a numpy.ndarray) and the target as Y (a numpy.ndarray). Another important difference is that this function does not perform label encoding.
Parameters
model
Type: scikit-learn model
Any model that supports the call model.fit(X, y), where X is the input data and y is the target.
name
Type: str
The default value is 'RapidML_files'. This is the name of the directory that RapidML creates. It stores the machine learning model, the LabelEncoder dictionary, a skeletal copy of the input DataFrame, a list of column data types and a dummy user input, all as serialized dill files. It also holds the API.py and helper.py scripts. Upload this directory to a web server to serve the model generated by RapidML and make predictions over the internet.
Returns
Returns an rml object. rml.model is always populated; rml.d and rml.m_tpot are always null.