This sample implements two mlservicewrapper-core services to expose fastText functionality:
- Language identification
- Vectorization
This tutorial assumes you have:
- A git client installed
- Python 3.6+ installed
Clone the repository and enter its directory:

```bash
git clone https://github.com/ml-service-wrapper/ml-service-wrapper-sample-fasttext.git
cd ml-service-wrapper-sample-fasttext
```

Create and activate a virtual environment:

```bash
virtualenv venv
source ./venv/bin/activate
```

Install mlservicewrapper-core. This includes all core modules you'll need to implement a Service, including a basic debug host.

```bash
pip install mlservicewrapper-core
```

You'll need a copy of fastText, which can occasionally have complexities during installation. See their Get Started documentation for details and instructions, and be sure to build and install the Python module.
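On many platforms, installing the Python module directly from PyPI may be enough (note that it builds from source, so a working C++ compiler is required); if that fails, fall back to the fastText documentation linked above:

```bash
pip install fasttext
```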
You'll also need urllib3, which can be installed using pip:
```bash
pip install urllib3
```

With the source code local, everything should work. Simply run the debug module, passing in the appropriate configuration file and sample file as arguments:
```bash
python -m mlservicewrapper.core.debug \
    --config "./config/language_detection.json" \
    --load-params ModelPath=./models/langdetect.bin \
    --input-paths Data=./data/input/multiple_languages.csv \
    --output-dir "./data/output"
```

Open the generated file at ./data/output/Results.csv to see detected languages.
The flow of the debug execution is easily traceable:
- The file at the `--config` argument is parsed. In our case, it only has two parameters:

  ```json
  {
      "modulePath": "../src/service.py",
      "className": "LanguageDetectionService"
  }
  ```

  The debug module looks for the module at `modulePath`, relative to the location of the configuration file. This helps improve portability of the code-base, especially for hosting in production.

  Once our `service.py` script is imported, the debug module looks for a class defined in it that matches the `className` property, `LanguageDetectionService`. The debug module creates an instance of that service, builds out a context (more on that in a moment), then calls `load` on it.
- Because `LanguageDetectionService` inherits from `FastTextServiceBase`, it's really the implementation on `FastTextServiceBase` that gets called. It delegates to overridable methods so subclasses can vary the behavior, but flattened out, its first step is to read the `ModelPath` parameter. Because we're running this in debug, the main source for this value is the `--load-params` argument. Since we defined `./models/langdetect.bin`, it will check whether that file exists.

  If this is your first time running the sample (or if you've deleted that file), the `FastTextServiceBase` implementation will call its own `get_model_url` function. Because fastText has a pre-built model for language identification, we just use that. Notice, however, that the `FastTextVectorizerService`, outside the scope of this walkthrough, would instead look for a parameter `ModelUrl`.

  With that URL in hand, the file is downloaded to the location at `ModelPath`. This step essentially primes the cache for future executions.

  Finally, knowing it has a model accessible, it calls `fasttext.load_model` and stores the result to `self.model` for use later. (A condensed sketch of both `load` and `process` appears after this list.)
- Since this is a debug run, the debug module moves right along into the next phase. In production environments, depending on the host, this may happen immediately, or it could be a while (e.g. if exposing an HTTP API).
- Back in `LanguageDetectionService`, `process` gets called. This is where all of our data-specific prediction logic lives, and it has several steps.

  First, it asks the given `ProcessContext` for a dataframe called `Data`. Hosting environments are left to decide where to source that dataset, but in our run, it will first check the `--input-paths` argument. Since we mapped `Data` to `./data/input/multiple_languages.csv`, the context will simply load that file into a dataframe and return it.

  With that dataframe available, a quick validation check is run: if the dataframe is missing a `Text` column, we want to raise a descriptive error, hence `mlservicewrapper.core.errors.MissingDatasetFieldError`. Fortunately our data file does have that column, and we're free of this error.

  A similar test is also run to ensure a field called `Id` is available. The `Id` field isn't used for prediction, but gets echoed back in the result dataframe as a useful piece of context for callers.

  Now the real ML logic starts, which would vary heavily depending on the model and desired outcome. For our implementation, the first step is to lightly clean the input text field, then call `DataFrame.apply` to build a prediction dataframe for the results, add column labels, and finally re-insert the `Id` field.

  With the result dataframe ready, we make a final call to `ProcessContext.set_output_dataframe`, naming the output `Results` and passing the rows to be returned.

  As with the input dataframe, it's up to the host environment to decide what to do with the output. The debug module sees our `--output-dir` argument and saves the contents of the dataframe to a CSV in that directory, named after the dataset itself: specifically, `Results.csv`.
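To make the flow concrete, here is a condensed, hypothetical sketch of the two classes described above. It is an illustration under assumptions, not the actual source in `src/service.py`: the base class name `mlservicewrapper.core.services.Service`, the async `load`/`process` signatures, the context methods `get_parameter_value` and `get_input_dataframe`, and the `MissingDatasetFieldError` argument order all follow the walkthrough's descriptions and may differ from the real package API. The model URL is fastText's published language-identification model.

```python
import os
import shutil

import fasttext
import pandas as pd
import urllib3

import mlservicewrapper.core.errors
import mlservicewrapper.core.services


class FastTextServiceBase(mlservicewrapper.core.services.Service):
    def get_model_url(self, ctx) -> str:
        # Subclasses decide where a missing model should be downloaded from.
        raise NotImplementedError()

    async def load(self, ctx):
        # In debug, ModelPath is sourced from --load-params; other hosts may
        # source it elsewhere. (get_parameter_value is an assumed method name.)
        model_path = ctx.get_parameter_value("ModelPath", required=True)

        if not os.path.exists(model_path):
            # Prime the cache: download the model once, reuse it on later runs.
            http = urllib3.PoolManager()
            response = http.request("GET", self.get_model_url(ctx), preload_content=False)
            with open(model_path, "wb") as f:
                shutil.copyfileobj(response, f)
            response.release_conn()

        self.model = fasttext.load_model(model_path)


class LanguageDetectionService(FastTextServiceBase):
    def get_model_url(self, ctx) -> str:
        # fastText's pre-built language-identification model.
        return "https://dl.fbaipublicfiles.com/fasttext/supervised-models/lid.176.bin"

    async def process(self, ctx):
        data = await ctx.get_input_dataframe("Data")

        # Raise a descriptive error when a required field is missing.
        for field in ("Text", "Id"):
            if field not in data.columns:
                raise mlservicewrapper.core.errors.MissingDatasetFieldError("Data", field)

        def predict(text: str):
            # Lightly clean the text first; fastText rejects strings with newlines.
            labels, scores = self.model.predict(text.replace("\n", " ").strip())
            return labels[0].replace("__label__", ""), float(scores[0])

        # Build the prediction dataframe, label the columns, and re-insert Id.
        results = pd.DataFrame(
            data["Text"].apply(predict).tolist(), columns=["Label", "Score"]
        )
        results.insert(0, "Id", data["Id"].values)

        await ctx.set_output_dataframe("Results", results)
```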
Now that the service runs under the debug module, it's trivial to deploy it as an HTTP service using the `mlservicewrapper.host.http` module. Simply install the package and run it:

```bash
pip install mlservicewrapper-host-http
```

```bash
python -m mlservicewrapper.host.http --config ./config/language_detection.json --prod
```

In another window, you can use cURL to make calls against the hosted API:
```bash
curl --header "Content-Type: application/json" \
    --request POST \
    --data '{
        "inputs": {
            "Data": [
                { "Id": 1, "Text": "This is a test" },
                { "Id": 2, "Text": "Dies ist ein Test" }
            ]
        }
    }' \
    http://localhost:5000/api/process/batch
```

You should get a response back that looks like this:
```json
{
    "outputs": {
        "Results": [
            { "Id": 1, "Label": "en", "Score": 0.967 },
            { "Id": 2, "Label": "de", "Score": 1.000 }
        ]
    }
}
```
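Any HTTP client works the same way. For instance, here is a minimal Python equivalent of the cURL call above, assuming the `requests` package is installed (it isn't a dependency of the sample itself):

```python
import requests

payload = {
    "inputs": {
        "Data": [
            {"Id": 1, "Text": "This is a test"},
            {"Id": 2, "Text": "Dies ist ein Test"},
        ]
    }
}

# POST the batch to the locally hosted service.
response = requests.post("http://localhost:5000/api/process/batch", json=payload)
response.raise_for_status()

# Each Results row echoes the Id alongside the predicted Label and Score.
for row in response.json()["outputs"]["Results"]:
    print(row["Id"], row["Label"], row["Score"])
```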