  • chenderson92

Evolution and Current State of Applied Analytics in Organizations: MLOps



The applied analytics field has come a long way in terms of its effective adoption within organizations: from prototypes and exploratory models to products, from its informal use as auxiliary tools in decision-making to becoming the core of new services. This journey is mainly reflected in the gradual implementation of new technologies.


However, cultural transformation is the first challenge organizations face when seeking to correctly implement a value generation strategy through applied analytics. This paradigm shift involves understanding the business problems that need to be solved and the habits that need to be reinforced to increase value delivery: the ideal world of data products.

It is essential to start with this aspect because MLOps practices, like DataOps practices, are the tangible instruments of this cultural transformation.


Currently, analytics models (in all their forms, both classical statistical models and deep learning models) play an indispensable role in hundreds of thousands of products that the public interacts with, and that many of us build. After all, in a world that produces ever more data, models are condensed knowledge that is consumed to make decisions without hesitation or paralysis. Well, the main cultural change (from which all others derive) that needs to be understood is that, given the endless flow of data and the transformations society undergoes, all models are partial and none are definitive. Just remember how we all experienced 2020 in this field 😅; we dubbed it "The Year Models Broke."


If the models we build are not definitive, the products that depend on them are bound to degrade over time. The only way to keep them relevant is through constant model updates or replacements. Equally important is that the ability to experiment can shed light on new data products.


From this reality, three main challenges arise:

  1. The speed of experimentation and deployment.

  2. The visibility of experimentation and its results.

  3. Reproducibility, which allows us to use the right tool at the right time and ensure the quality of what has been done.



State of ML: A proper strategy designed to address these challenges also sheds light on more specific questions that everyone designing or leading data teams has faced:

  • How can we implement models more quickly?

  • How do we ensure that models can be moved between environments?

  • How do we free up our scientists' or analysts' time for more valuable tasks, such as business understanding and experimentation, by reducing the effort spent on repetitive tasks like model deployment?

  • How do we reduce the risk associated with investing heavily in models that never make it into production?

  • How do we quickly identify which models are not aligned with the organization's strategy?

  • How do we monitor and update models as data and context evolve over time?

MLOps takes the stage 🎇: MLOps practices aim to answer these questions by unifying data preprocessing, model training, evaluation, monitoring, and deployment into a coherent process that can be easily maintained.
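To make the monitoring question a bit more concrete, here is a minimal sketch of a data drift check using the Population Stability Index (PSI), written in plain Python. The function name, the equal-width binning, the 1e-6 floor, and the sample data are all assumptions made for illustration; real pipelines usually rely on a dedicated monitoring library and on thresholds agreed with the business.

```python
import math


def population_stability_index(expected, actual, bins=10):
    """Compare two samples of one numeric feature; 0.0 means identical bin shares."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins
    if width == 0:
        width = 1.0  # constant feature: every value falls into one bin

    def bin_shares(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            # Values outside the training range are clamped to the edge bins.
            idx = min(max(idx, 0), bins - 1)
            counts[idx] += 1
        # A small floor keeps empty bins from producing log(0).
        return [max(c / len(values), 1e-6) for c in counts]

    e = bin_shares(expected)
    a = bin_shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


# Hypothetical data: the feature as seen at training time vs. in production.
training_sample = [float(v) for v in range(100)]
production_sample = [v + 50.0 for v in training_sample]

psi_same = population_stability_index(training_sample, training_sample)
psi_shifted = population_stability_index(training_sample, production_sample)
print(f"PSI unchanged data: {psi_same:.3f}")
print(f"PSI shifted data:   {psi_shifted:.3f}")
```

A common rule of thumb treats a PSI above roughly 0.25 as a signal that the model should be reviewed or retrained, but that cut-off is a convention and not something this article prescribes.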


MLOps clearly demands automation: it safeguards the quality of data products once the earlier steps of applied analytics (understanding, exploration, hypothesis testing) can be taken for granted. In this sense, the ultimate goal of MLOps is to have models in production that solve real business needs.


However, we must be cautious ❗; as always with automation, there is a risk of overdesigning irrelevant processes.


What problems does MLflow solve?



In recent years, a growing number of tools focused on MLOps practices have emerged, with different emphases: modeling, monitoring, experimentation, and so on. To date, however, one of the most mature is MLflow, for two main reasons:

  1. Its open-source nature allows for continuous incorporation of best practices and developments.

  2. Its technology-agnostic design allows MLflow to be used coherently across vendors (it can easily be used on Azure ML or AWS SageMaker), languages (R, Python), and frameworks (Spark, PyTorch, RAPIDS).

MLflow aims to standardize a working practice, not just a technology.



Technologically, MLflow is a platform designed to manage the complete lifecycle of models through four main components that can be used independently:

  • MLflow Tracking: Primarily focused on the visibility of analytical experiments: control of parameters, environments, metrics, and output files (charts, among others).

  • MLflow Projects: A format for packaging the code used to build experiments so that they are reproducible. Projects integrate directly with Git, which enables versioning and validations (continuous integration cycles).

  • MLflow Models: A standardized, agnostic format that allows models to be deployed for use in subsequent tasks. This can include, for example, REST APIs on containers for real-time models or frameworks like Spark for batch jobs.

  • MLflow Model Registry: A centralized component for model management. It integrates directly with Tracking for versioning and for transitioning experiments to production models.

Why MLflow with Databricks? MLflow is one of Databricks' key initiatives in becoming a unified platform for data science, and it is incorporated into Databricks as a managed platform. This allows us to use MLflow from notebooks with minimal configuration, with full visibility of MLflow from the Databricks Workspace and integration with Git and CI/CD cycles. Additionally, we benefit from scalability (think distributed hyperparameter searches), security (inheriting Databricks' role-based policies), and trust (Databricks takes care of all necessary updates).

 

By Minimalistech's editorial team.


Minimalistech has more than 10 years of experience in providing a wide range of technology solutions based on the latest standards. We have built successful partnerships with several top-performing SF Bay Area companies, enhancing their potential and growth by providing highly skilled IT engineers to work on their developments and projects.
