When building a GenAI chatbot, there are well-defined steps you need to take to get good results. This blog provides a high-level overview of these steps, and the associated video offers practical insights on guiding the process, known as prompt engineering (or setting parameters, if you are using the API). If you’re less technical, you can ignore the “technology used” notes, which typically appear in the last couple of sentences of most steps below.
You need to avoid “Garbage In / Garbage Out”, so you need clean data: accurate, complete, and consistent. If, for example, you have data sourced from different companies, and those companies measure or calculate the data differently, it isn’t consistent. You’ll want to run the standard processes of data normalization, deduplication, and outlier removal. If your data has null (empty) fields, you may want to use synthetic data tools to make the existing data more complete and/or to generate sufficient data volume to improve the precision of the results.
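As a rough illustration of the three cleaning steps named above, here is a minimal sketch in plain Python. The record layout (`id`, `length`, `unit`), the unit table, and the outlier threshold are all assumptions made for the example, not part of any particular tool. Outliers are flagged with a median-based (modified z-score) rule, which is only one of several reasonable choices.

```python
from statistics import median

# Hypothetical conversion factors to a common unit (metres).
TO_METRES = {"m": 1.0, "ft": 0.3048}

def normalize(records):
    """Convert every record to metres so different sources are comparable."""
    return [
        {"id": r["id"], "length_m": round(r["length"] * TO_METRES[r["unit"]], 3)}
        for r in records
    ]

def deduplicate(records):
    """Keep only the first record seen for each id."""
    seen, unique = set(), []
    for r in records:
        if r["id"] not in seen:
            seen.add(r["id"])
            unique.append(r)
    return unique

def remove_outliers(records, threshold=3.5):
    """Drop values whose modified z-score (median/MAD based) exceeds threshold."""
    values = [r["length_m"] for r in records]
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return records  # no spread to measure against
    return [
        r for r in records
        if 0.6745 * abs(r["length_m"] - med) / mad <= threshold
    ]

def clean(records):
    """Run normalization, deduplication and outlier removal in order."""
    return remove_outliers(deduplicate(normalize(records)))
```

In practice you would likely reach for a data library rather than hand-rolled functions, but the order of operations (normalize first, so duplicates and outliers are judged in the same units) is the point of the sketch.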
Use the Train, Validate, Test (TVT) approach. Train the model on one set of data. Then run a second, separate set of data through the model to validate that it is working properly and to tune it. Finally, test the model on a third set it has never seen to confirm that the results hold up. You’ll iterate on this process.
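The split itself can be sketched in a few lines. The 70/15/15 proportions and the fixed seed below are assumptions chosen for the example; the right ratios depend on how much data you have.

```python
import random

def train_validate_test_split(rows, val_frac=0.15, test_frac=0.15, seed=42):
    """Shuffle rows once, then carve out non-overlapping test, validation
    and training sets. The fractions are illustrative defaults."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)  # seeded so the split is reproducible
    n_test = int(len(rows) * test_frac)
    n_val = int(len(rows) * val_frac)
    test = rows[:n_test]
    val = rows[n_test:n_test + n_val]
    train = rows[n_test + n_val:]
    return train, val, test
```

The important property is that the three sets never overlap, so the test set stays a fair check on data the model has not seen.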
You’ll also want to define goals during model training: what do you want it to optimize for? This is done with an evaluation metric, such as the F1-score for classification, or, to oversimplify, a defined measure of success/failure, so the model can iterate and improve over time.
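For readers who want to see what the F1-score measures, here is a small sketch for the binary case. It is the harmonic mean of precision (how many predicted positives were right) and recall (how many actual positives were found). The function name and the label convention (`1` means positive) are assumptions for the example.

```python
def f1_score(y_true, y_pred, positive=1):
    """Binary F1: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives means precision and recall are both 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```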
Then you’ll want to know how the model is working inside its “black box”. You can interpret what the AI is doing behind the scenes using things like Shapley Values (SHAP) and LIME. These are beyond the focus of this blog.
In some scenarios, you’ll need to give the model guidance and constraints based on domain knowledge. For example, you might provide guidance on how precisely the data is measured, or operational constraints on how precisely you can implement its recommendations. You might also have certain physical constraints. For example, when drilling a horizontal oil well, the crew drills down and then horizontally. The length of the horizontal portion, called the lateral, is constrained by physics, namely the weight of the drill pipe on the drill bit. In short, physics constrains how long your lateral can be. I heard about a large machine learning effort to optimize production, and its expensive result: drill longer laterals. The response from the domain experts: “We make them as long as physics allows, so your results are useless”. I talk a bit more about this topic in the associated video.
At this point, you use tools to analyze, weigh, and visualize correlations between various factors/variables and target outcomes, e.g. predictions/results. Feature importance is where the AI ranks the importance of various data elements in achieving the desired optimization, answering the question “Which variables have the most influence on results?” You also want to know which variables are ignored or have inverse correlations. You might use correlation heatmaps or partial dependence plots to achieve these goals.
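A correlation heatmap is essentially a grid of numbers like the ones computed below. This sketch ranks variables by the strength of their Pearson correlation with the target; the sign shows whether the relationship is inverse, and a value near zero suggests the variable has little linear influence. Note that simple correlation is only a stand-in for model-based feature importance, and the feature names used in the usage note are made up for illustration.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / (spread_x * spread_y)

def rank_features(features, target):
    """Return (name, correlation) pairs, strongest relationship first."""
    scores = {name: pearson(values, target) for name, values in features.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)
```

For example, calling `rank_features({"pressure": [...], "depth": [...]}, production)` with your own columns returns the variables ordered by how strongly they move with the result, with negative values marking inverse correlations.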
The most basic command here is “Make it better”. Essentially, you iterate the process using goal-seeking algorithms to continue to improve results toward your goals. You can use grid search or Bayesian optimization for this phase. You might consider using genetic algorithms or reinforcement learning for goal-seeking improvements.
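Of the techniques mentioned, grid search is the simplest to show: try every combination of candidate settings and keep the one that scores best. In this sketch `score_fn` is a hypothetical function you would supply (for instance, one that trains a model with the given settings and returns its validation score), and the parameter names in the usage note are invented for the example.

```python
from itertools import product

def grid_search(score_fn, grid):
    """Evaluate score_fn on every combination in grid and return the
    best (params, score) pair. Higher scores are treated as better."""
    names = list(grid)
    best_params, best_score = None, float("-inf")
    for combo in product(*(grid[name] for name in names)):
        params = dict(zip(names, combo))
        score = score_fn(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score
```

A call might look like `grid_search(evaluate, {"learning_rate": [0.01, 0.1, 1.0], "depth": [1, 3, 5]})`. Because the number of combinations grows quickly with each added setting, Bayesian optimization is often preferred when each evaluation is expensive.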
Depending on your application, confidence and uncertainty can be valuable. For example, if you are running quality assurance on the results, you may want to prioritize the quality assurance effort on results with higher uncertainty or lower confidence. This can be done with Bayesian methods or ensemble models. You might also use confidence intervals and prediction intervals to quantify uncertainty and Monte Carlo dropout to estimate uncertainty.
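One way to picture the ensemble approach: ask several models the same question and treat their disagreement as a measure of uncertainty. The sketch below assumes each model is simply a callable that returns a number, which is a simplification for illustration, and it sorts inputs so the most uncertain results reach the quality assurance queue first.

```python
from statistics import mean, pstdev

def ensemble_predict(models, x):
    """Return (average prediction, spread across models) for one input.
    A larger spread means the models disagree more, i.e. higher uncertainty."""
    predictions = [model(x) for model in models]
    return mean(predictions), pstdev(predictions)

def qa_priority(models, inputs):
    """Return (input, prediction, uncertainty) tuples, most uncertain first."""
    scored = [(x, *ensemble_predict(models, x)) for x in inputs]
    return sorted(scored, key=lambda item: item[2], reverse=True)
```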
How do users interact with the AI engine: via voice, through an existing interface, a chat box, etc.? And when the AI requires user feedback (e.g. making selections or filling in data), how is that interaction handled in the UI? Basically, this phase is about fitting the AI into any existing application and building the workflows in a user-friendly way.
This step should be considered at the very beginning, because you may want to firewall your proprietary data and model from leakage into a public LLM. You might also consider running your own LLM (e.g. Llama) to keep all data and learning within your physical control. Before you deploy, you also need to ensure that data and learning aren’t leaked to users, in case you are deploying the results publicly. And of course, the typical security issues of secrets, vulnerabilities, configuration, etc. should all be locked down. Encryption of data at rest and in transit is also a consideration. You’ll also want to consider data protection regulations like GDPR and CCPA.
You’ll want a robust development pipeline for deploying improvements over time. You’ll also want to monitor and evaluate the results to ensure you don’t get quality degradation over time. Ideally, your results should improve with time and usage.
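Monitoring for quality degradation can start very simply: compare a recent window of quality scores against a baseline and raise a flag when the average drops too far. The function name, the scores, and the 0.05 tolerance below are assumptions for the sketch; a production setup would more likely rely on a dedicated monitoring tool.

```python
from statistics import mean

def quality_degraded(baseline_scores, recent_scores, tolerance=0.05):
    """Return True when the recent average score has fallen more than
    `tolerance` below the baseline average."""
    return mean(recent_scores) < mean(baseline_scores) - tolerance
```

For instance, with a baseline around 0.91 and recent scores around 0.80, the check returns `True`, which could trigger an alert or block a deployment.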
As you can see, there is a fair amount of work required to ensure that your chatbot meets your expectations and those of your company and your users. Hopefully this roadmap helps business people understand the steps and their value, and provides a few pointers to the technical team for accomplishing each step.
Mike Hogan
September 1, 2024