Updated May 16th, 2022 by pradeepkumar.palaniswamy

Incorrect results when using documents as inputs

Problem: You have an ML model that takes documents as inputs, specifically, an array of strings. You use a feature extractor like TfidfVectorizer to convert the documents to numeric feature vectors and feed them into the model. The model trains, and predictions work in the notebook, but model serving doesn’t return the expected results for JS...
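As a rough illustration of the workflow this teaser describes, the sketch below fits a TfidfVectorizer on a list of strings and trains a classifier on the result. The corpus, the labels, and the choice of LogisticRegression are hypothetical and not taken from the article; this is not the article's fix, only a picture of the input contract involved.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical toy corpus and labels, for illustration only.
docs = [
    "spark cluster failed to start",
    "cluster start timed out",
    "model accuracy dropped after retraining",
    "retraining the model improved accuracy",
]
labels = ["infra", "infra", "ml", "ml"]

# TfidfVectorizer takes an iterable of raw strings and returns a
# sparse numeric matrix with one row per document.
vectorizer = TfidfVectorizer()
features = vectorizer.fit_transform(docs)

# Bundling the vectorizer and the classifier in one pipeline means
# the fitted object accepts raw strings directly.
pipeline = make_pipeline(TfidfVectorizer(), LogisticRegression())
pipeline.fit(docs, labels)

# Input must again be an iterable of strings, one per document.
predictions = pipeline.predict(["cluster failed to start"])
```

Note that the pipeline expects an iterable of strings; passing a differently shaped input (for example a DataFrame rather than a list or a single column) changes what the vectorizer iterates over.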

Updated May 16th, 2022 by pradeepkumar.palaniswamy

Runs are not nested when SparkTrials is enabled in Hyperopt

Problem: SparkTrials is a Hyperopt extension that distributes runs to Spark workers. When you start an MLflow run with nested=True in the worker function, the results should be nested under the parent run. Sometimes the results are not correctly nested under the parent run, even though you ran SparkTrials with nested=True ...

Updated May 16th, 2022 by pradeepkumar.palaniswamy

KNN model using pyfunc returns ModuleNotFoundError or FileNotFoundError

Problem: You have created a scikit-learn model using KNeighborsClassifier and are using pyfunc to run a prediction. For example:

    %python
    import mlflow.pyfunc
    pyfunc_udf = mlflow.pyfunc.spark_udf(spark, model_uri=model_uri, result_type='string')
    predicted_df = merge.withColumn("prediction", pyfunc_udf(*merge.columns[1:]))
    predicted_df.collect()

The predict...
