Real-Life Projects – SaM Solutions

Testing an LLM Chatbot in an MCP System

Mikhail Sinkin — Thu, 07 May 2026 14:41:33 +0000

Key Takeaways

Determinism no longer applies: LLM chatbot testing shifts from exact-match assertions to probabilistic, semantic validation, where multiple correct answers can exist for the same input.
Architecture defines test complexity: MCP orchestration, RAG pipelines, tool calls, and streaming responses create multiple failure points, making root-cause analysis inherently multi-layered.
Validation must be multi-dimensional: Combining must-have, must-not, and semantic similarity checks is essential to balance flexibility with control and reduce hallucination risks.
Test results are context- and configuration-dependent: Model version, prompt design, inference settings, and conversation history all influence outcomes, requiring continuous tuning and iterative test refinement.

Introduction: The Paradigm Shift in Quality Assurance for LLM-Based Systems

Testing an LLM chatbot inside an MCP-based system differs from testing classical software. Traditional systems are deterministic: the same input produces the same output. In a typical REST API, a request either returns the expected JSON payload or it does not. Assertions are straightforward.

A chatbot built around a large language model behaves differently. Testing a Large Language Model (LLM) output requires a fundamental paradigm shift. The assumptions that have governed software testing for decades — determinism, exact reproducibility, and binary state validation — break down when confronted with generative AI.

To understand the complexity of testing these applications, we must first look at the underlying architecture, explore why traditional assertions fail, and examine the unique, context-dependent pitfalls and specifics.

System Architecture Overview

A modern LLM-based chatbot is a complex, multi-layered distributed system under the hood where each component introduces new variables into the testing equation.

When a user submits a prompt, it travels through several critical server-side components before a response is generated. Initially, the input is often processed by an orchestrator or reasoning engine. In enterprise environments like ours, this is typically where the Model Context Protocol (MCP) comes into play. MCP allows the LLM to securely interact with external data sources and internal tools without hardcoding integrations.

Simultaneously, the system employs a Retrieval-Augmented Generation (RAG) pattern. Before the LLM generates a response, the user’s query is embedded and sent to a vector database to retrieve semantically relevant context. This retrieved context, along with system instructions and chat history, is dynamically injected into a hidden meta-prompt. Only then is the payload sent to the inference engine (the model serving layer). Finally, the LLM generates tokens sequentially, which are streamed back to the client via a persistent connection, such as WebSockets using SignalR.

Challenge

These architectural decisions directly impact testability. Testing the “chatbot” means simultaneously testing the retrieval mechanisms, the orchestration layer, and the generative model itself. Therefore, failures in such an environment rarely come from a single place. If the chatbot answers incorrectly, the cause may be:

retrieval returned irrelevant documents
the prompt was not optimized for the use case
the correct document was retrieved but the model ignored it
the model invented information not present in the context
the chatbot did not call a tool to trigger a specific action or retrieve specific data
the tool returned an error that was not propagated to the model
the context window truncated relevant information

The chatbot also operates inside a conversation. A response may depend on previous turns, retrieved documents, system prompts, and tool outputs. Testing a single prompt in isolation does not always reproduce the behavior seen in real conversations.

The business context adds pressure. In this system, the chatbot appears on a company website and answers questions from potential clients about the company’s experience and projects. If the bot invents projects or misunderstands a request, the damage goes beyond incorrect information. The bot can also simulate successful lead handling, confirming that a contact request has been sent to the sales team when no downstream process was actually triggered. The result is a broken conversion flow: the user believes a handoff to a human agent has occurred, while no lead is recorded, no notification is sent, and no follow-up ever happens.

Because of this, testing required a combination of traditional QA techniques and evaluation methods designed for LLM systems.

Fundamental differences in testing LLM output vs. deterministic systems

Classical software testing is built on determinism: given state *A* and input *B*, you expect that the system returns output *C*. If it returns *D*, you report a bug.

LLMs are inherently probabilistic. They calculate a probability distribution over the next possible token in a sequence. Consequently, identical inputs can produce different outputs. This non-deterministic nature obliterates traditional regression testing workflows. If you write an exact-match assertion expecting the bot to say, “The application is a web-based SaaS platform,” and the bot instead replies, “The software is an online platform delivered via SaaS,” a deterministic test fails.

This introduces the semantic correctness problem. An LLM’s output can be grammatically distinct, utilize different vocabulary, and be structured entirely differently, yet remain 100% factually accurate and valid.

Because of this, traditional bug classification and reproducibility workflows break down. A QA engineer cannot easily attach a “steps to reproduce” ticket for an LLM hallucination, because following those exact steps five minutes later may yield a perfect response.

Configuration-dependent nature of system output

Even when employing advanced semantic testing, QA teams must navigate a minefield of configuration-dependent variables that make test suites uniquely fragile.

First, test validity is tightly coupled to specific model versions. Each model has its own behavioral quirks, so a test suite becomes a snapshot of expected behavior for a specific model at a specific time.

Second, inference settings like `Temperature` (which controls randomness) and `Top-P` (which restricts sampling to the most probable tokens whose cumulative probability reaches the threshold) act as hidden test variables. A suite that is reasonably stable at Temperature 0.2 may become flaky at Temperature 0.7.
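
To make the effect of these two settings concrete, here is a minimal sketch of temperature scaling and Top-P (nucleus) filtering. The logits are hypothetical scores for four candidate tokens, and the helper names are illustrative only; real inference engines apply the same idea over the whole vocabulary.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities; a lower temperature sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def top_p_filter(probs, top_p):
    """Keep the smallest set of tokens whose cumulative probability reaches top_p, then renormalize."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}

logits = [2.0, 1.0, 0.5, 0.1]  # hypothetical scores for four candidate tokens
cold = softmax_with_temperature(logits, 0.2)
warm = softmax_with_temperature(logits, 0.7)
print(f"top-token probability at T=0.2: {cold[0]:.3f}")
print(f"top-token probability at T=0.7: {warm[0]:.3f}")
print("tokens kept at top_p=0.9 (T=0.7):", sorted(top_p_filter(warm, 0.9)))
```

At the lower temperature almost all probability mass sits on one token, so repeated runs tend to agree; at the higher temperature several tokens remain plausible, which is where run-to-run variation comes from.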

Furthermore, these tests are hyper-sensitive to system configuration. Small adjustments to the system prompt, even seemingly innocuous wording changes, can drastically alter the downstream outputs.

This leads to a persistent challenge: distinguishing system regressions from expected variance. When a test fails, the team must determine if the system actually broke (e.g., the RAG database went offline) or if the model merely generated a statistically improbable, but acceptable, variation of the answer that the semantic evaluator wasn’t tuned to handle.

Finally, multi-turn conversations introduce severe state pollution. Because the model relies on conversation history, an imperfect answer in turn one can corrupt the LLM’s context window for turn three. Testing multi-turn flows requires isolating the state, carefully managing the conversational context, and continuously re-validating the entire suite as the system evolves.

Thus, a test captures a constrained observation window: a single slice of behavior produced by a given model version, decoding configuration, system prompt, input prompt, and retrieval and conversation state. It represents one trajectory through a much larger probabilistic space of possible outputs.

***

The following sections describe how we built a tool to meet these challenges head-on.

Functional Testing

We began by compiling a list of use cases. Functional testing started with the main user scenarios expected on the website.

Typical questions included:

experience in specific industries
technologies used for back-end or front-end development
examples of previous projects
rough project estimates
how to contact the sales team

Visitors usually ask about the company’s experience, technologies, and previous projects. Some conversations also lead to contact requests.

Later this list was expanded into test cases. Each test case follows the usual structure but adds sections specific to AI-powered systems: what the response must contain, what is acceptable in it, and what it must never contain.

A user question such as “Have you built any healthcare platforms before?” should produce an answer based on portfolio data stored in the knowledge base. The answer should mention real projects if they exist and avoid inventing clients.

Here is the story of how we built a custom Python-based test harness for end-to-end validation of streaming chatbot responses.

The challenge: WebSockets and non-deterministic outputs

The chatbot streams tokens sequentially via SignalR over WebSockets. We couldn’t just fire off an HTTP POST and read the JSON response. Therefore we created a modular Python framework broken down into a SignalR client, an evaluation engine, and a streamlined test runner.

It was designed around the separation of concerns principle: test data (JSON-based test cases and validation rules) is decoupled from the transport layer (SignalR/WebSockets), the interpretation logic (NLP analysis), and the execution runner.

Building the SignalR Client

The first step was establishing communication. Since our chatbot works via SignalR, we opted for the lightweight `websocket-client` library in Python rather than pulling in heavy browser automation tools like Playwright or Selenium, as our goal was to test the API/back-end logic directly (Integration/E2E level without the UI overhead).

SignalR has its own quirks. It requires a specific JSON handshake (`{"protocol": "json", "version": 1}`) and terminates every payload with a record separator character (`\x1e`).

Our client script establishes the WebSocket connection, manages the handshake, and enters a `while True` listening loop. Because the LLM streams its response in small chunks, the client parses incoming `ReceiveMessage` events and concatenates the text chunks until it receives an `isComplete: True` flag from the server. At that point it closes the socket gracefully and passes the complete string to our evaluator.
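
The framing and accumulation logic can be sketched without a live connection. This is an illustration under assumptions, not our production client: the `ReceiveMessage` target and the `isComplete` flag come from the description above, while the `text` field name, the single-object argument shape, and the `ResponseAccumulator` class are hypothetical. In a real client, each string passed to `feed` would come from the WebSocket `recv()` call inside the listening loop.

```python
import json

RECORD_SEPARATOR = "\x1e"  # SignalR terminates every JSON message with this character

def handshake_frame():
    """Build the handshake payload the client sends right after connecting."""
    return json.dumps({"protocol": "json", "version": 1}) + RECORD_SEPARATOR

def split_frames(raw):
    """Split one WebSocket text message into the JSON messages it carries."""
    return [json.loads(part) for part in raw.split(RECORD_SEPARATOR) if part]

class ResponseAccumulator:
    """Concatenate streamed chunks until the server marks the response complete."""

    def __init__(self):
        self.chunks = []
        self.complete = False

    def feed(self, raw):
        for message in split_frames(raw):
            # Type 1 is an invocation in the SignalR JSON protocol; pings and the
            # empty handshake acknowledgement ({}) are skipped.
            if message.get("type") != 1 or message.get("target") != "ReceiveMessage":
                continue
            payload = message["arguments"][0]  # assumed shape: {"text": ..., "isComplete": ...}
            self.chunks.append(payload.get("text", ""))
            if payload.get("isComplete"):
                self.complete = True
        return self.complete

    @property
    def text(self):
        return "".join(self.chunks)

def invocation(text, is_complete):
    """Build a server frame the way this sketch assumes the chatbot sends it."""
    body = {"type": 1, "target": "ReceiveMessage",
            "arguments": [{"text": text, "isComplete": is_complete}]}
    return json.dumps(body) + RECORD_SEPARATOR

accumulator = ResponseAccumulator()
accumulator.feed("{}" + RECORD_SEPARATOR)                     # handshake acknowledgement
accumulator.feed(json.dumps({"type": 6}) + RECORD_SEPARATOR)  # keep-alive ping
accumulator.feed(invocation("The application is ", False))
done = accumulator.feed(invocation("web-based.", True))
print(done, accumulator.text)
```

Keeping the parsing separate from the socket handling means the accumulation rules can be unit-tested with canned frames, independently of the network.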

The three-layered validation strategy

Once we had the full text string from the chatbot, we needed to decide if it was “correct”. We implemented a three-tiered quality gate:

The “must-have” check (with synonyms)

While LLMs vary their phrasing, there are often hard business requirements regarding what must be mentioned. Using a JSON-driven test data approach, we define `must_have` arrays. To prevent flakiness, we built a synonym engine.

For example, if the test requires the bot to mention the application is “web-based”, our test data maps “web-based” to `["saas", "online platform", "web application", "AJAX-based"]`. If the bot uses any of those terms, the assertion passes.
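
A minimal sketch of such a check is shown below. The function name `check_must_have` and the dictionary layout for synonyms are assumptions made for illustration; the matching here is a plain case-insensitive substring test.

```python
def check_must_have(response, must_have, synonyms):
    """Return the required terms that appear neither directly nor via a synonym."""
    text = response.lower()
    missing = []
    for term in must_have:
        candidates = [term] + synonyms.get(term, [])
        if not any(candidate.lower() in text for candidate in candidates):
            missing.append(term)
    return missing

synonyms = {"web-based": ["saas", "online platform", "web application", "AJAX-based"]}
response = "The software is an online platform delivered via SaaS."
missing_terms = check_must_have(response, ["web-based"], synonyms)
print("missing:", missing_terms)
```

An empty list means the layer passes. Substring matching is deliberately forgiving; a stricter variant could match on word boundaries at the cost of more synonym maintenance.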

The “must-not” check (hallucination prevention)

Equally important to what the bot says is what it should not say. AI models are prone to hallucination. If a user asks about a legacy accounting web app, the bot shouldn’t invent features. We feed the framework a `must_not` array containing terms like “mobile app”, “blockchain”, or “AI analytics”. If these are detected, the test immediately fails.
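
The forbidden-term check can be sketched in a similar way. The function name is hypothetical, and whole-word matching is an assumption added here so that a short term such as “AI” does not fire inside an unrelated word.

```python
import re

def check_must_not(response, must_not):
    """Return the forbidden terms found in the response (case-insensitive, whole words)."""
    found = []
    for term in must_not:
        # Lookarounds ensure the term is not embedded inside a longer word.
        pattern = r"(?<!\w)" + re.escape(term) + r"(?!\w)"
        if re.search(pattern, response, flags=re.IGNORECASE):
            found.append(term)
    return found

forbidden = ["mobile app", "blockchain", "AI analytics"]
violations = check_must_not("The platform includes AI analytics and a mobile app.", forbidden)
print("violations:", violations)
```

Any non-empty result fails the test case immediately.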

This mechanism forms a baseline validation layer. In most cases it produces stable and predictable results because it operates on explicit lexical constraints.

However, this stability is still superficial. For example, the absence of a forbidden term does not imply correctness. We had to run the test suite multiple times to expose flaky outputs, iteratively expanding the `must_have` set with additional terms until the results were reliable enough to interpret.

The weakest component in this setup is the `must_not` block itself. It assumes that undesired behavior can be exhaustively enumerated. In practice this is impossible.

Semantic similarity (the AI testing the AI)

Keep in mind that even if all keywords are present, the response as a whole can still be wrong or incoherent.

To solve this, we integrated `sentence-transformers` backed by `torch` and `scikit-learn`. We load the `all-MiniLM-L6-v2` model — a fast, lightweight NLP model perfect for calculating sentence embeddings.

When a test runs, we take the bot’s generated response and a pre-defined `expected_answer` from our JSON test cases (taken directly from the data source). We convert both strings into high-dimensional vector embeddings and calculate the cosine similarity. If the similarity score drops below `0.70` (an empirical threshold, set after several iterations of test execution), the test fails. This allows our chatbot to use completely different sentence structures and vocabulary, yet still pass the test as long as the fundamental semantic meaning remains intact.
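
The gate itself is a small amount of arithmetic. The sketch below keeps the encoder pluggable so it runs without downloading a model: `toy_encode` is a stand-in word counter used only for demonstration, and `semantic_gate` is a hypothetical name. Only the `0.70` threshold and the cosine comparison come from the description above.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def semantic_gate(response, expected_answer, encode, threshold=0.70):
    """Embed both strings with `encode` and compare them; returns (passed, score)."""
    response_vec, expected_vec = encode([response, expected_answer])
    score = cosine_similarity(response_vec, expected_vec)
    return score >= threshold, score

def toy_encode(sentences):
    """Stand-in encoder for the demo: counts a few fixed words. Not a real embedding."""
    vocabulary = ["web", "saas", "platform", "mobile"]
    return [[s.lower().count(word) for word in vocabulary] for s in sentences]

passed, score = semantic_gate("A SaaS web platform", "web platform saas", toy_encode)
print(passed, round(score, 2))
```

With the real library, `SentenceTransformer("all-MiniLM-L6-v2").encode` can be passed as `encode`, since it accepts a list of sentences and returns one vector per sentence.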

We consider a test passed only when it passes all three layers.

Decoupling logic from data: The JSON test case structure

One of the most critical architectural decisions we made early on was to strictly separate the test execution logic from the test data and validation rules. Rather than hardcoding test scenarios into Python scripts, we externalized everything into a structured JSON file.

This created a pristine separation of concerns: the Python runner handles the how (transport and interpretation), while the JSON file defines the what (the inputs and the quality gates).

Each test case is a self-contained JSON object that acts as a comprehensive contract for a specific chat interaction.
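
As an illustration, a test case and a small loader might look like the sketch below. The `must_have`, `must_not`, and `expected_answer` fields are the ones described in this article; `id`, `question`, `synonyms`, and the validation helper are hypothetical names chosen for the example.

```python
import json

# Hypothetical contents of a test case file, inlined so the example is self-contained.
TEST_CASES_JSON = """
[
  {
    "id": "TC-001",
    "question": "What kind of application is it?",
    "must_have": ["web-based"],
    "synonyms": {"web-based": ["saas", "online platform", "web application"]},
    "must_not": ["mobile app", "blockchain"],
    "expected_answer": "The application is a web-based SaaS platform."
  }
]
"""

REQUIRED_FIELDS = {"id", "question", "must_have", "must_not", "expected_answer"}

def load_test_cases(raw):
    """Parse the JSON array and fail early if a case lacks a required field."""
    cases = json.loads(raw)
    for case in cases:
        missing = REQUIRED_FIELDS - set(case)
        if missing:
            raise ValueError(f"{case.get('id', '?')}: missing fields {sorted(missing)}")
    return cases

cases = load_test_cases(TEST_CASES_JSON)
print(len(cases), "test case(s) loaded")
```

Validating the structure at load time gives non-developers an immediate, readable error instead of a failure deep inside the runner.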

Advantages and scalability of this approach:

Zero-code onboarding: The primary advantage is accessibility. Business analysts, product managers, or junior QA engineers can write, modify, and review test cases without needing to understand WebSockets, Python, or Sentence Transformers. They just update the JSON.
Infinite horizontal scalability: Because the runner iterates through a standard JSON array, scaling the test suite from 10 cases to 10,000 cases requires zero architectural changes to the underlying Python code.
Version control friendly: JSON files diff beautifully in Git. We can track exactly when a synonym was added or when an expected_answer was updated to reflect a new product feature.

Test runner and reporting

We built a custom CLI runner that parses the `test_cases.json` file and executes the suite.

To aid in debugging, we utilized `colorama` and regular expressions to strip out HTML tags and dynamically highlight detected keywords and synonyms in bright green directly in the terminal output. This allows QA engineers to visually verify why a test passed or failed at a glance.
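
A rough sketch of the cleanup and highlighting step follows. It writes the ANSI escape codes directly so it has no dependencies; these are the same sequences that `colorama` exposes as `Style.BRIGHT`, `Fore.GREEN`, and `Style.RESET_ALL`. The function names and the simple tag-stripping pattern are assumptions for illustration.

```python
import re

GREEN = "\x1b[1m\x1b[32m"  # bright + green, as in colorama's Style.BRIGHT + Fore.GREEN
RESET = "\x1b[0m"          # as in colorama's Style.RESET_ALL

def strip_html(text):
    """Remove anything that looks like an HTML tag."""
    return re.sub(r"<[^>]+>", "", text)

def highlight(text, terms):
    """Wrap every occurrence of the given terms in bright green, ignoring case."""
    if not terms:
        return text
    # Longest terms first so "web application" wins over "web".
    ordered = sorted(terms, key=len, reverse=True)
    pattern = "|".join(re.escape(term) for term in ordered)
    return re.sub(pattern, lambda m: GREEN + m.group(0) + RESET, text, flags=re.IGNORECASE)

clean = strip_html("<p>Our <b>SaaS</b> platform is a web application.</p>")
print(highlight(clean, ["saas", "web application"]))
```

On Windows terminals that do not interpret ANSI codes natively, calling `colorama.init()` first is the usual way to make the colors render.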

Finally, execution metrics (Test ID, Pass/Fail status, and response duration in seconds) are continuously appended to a results log file, allowing us to track performance latency and regression metrics over time.
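
Appending those metrics can be as simple as the sketch below. The CSV layout, column names, and helper functions are assumptions for the example; the article only fixes which three values are recorded. The demo writes to a temporary directory so it leaves nothing behind.

```python
import csv
import tempfile
import time
from pathlib import Path

def append_result(log_path, test_id, passed, duration_seconds):
    """Append one row to the results log, writing a header if the file is new."""
    path = Path(log_path)
    is_new = not path.exists()
    with path.open("a", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        if is_new:
            writer.writerow(["test_id", "status", "duration_seconds"])
        writer.writerow([test_id, "PASS" if passed else "FAIL", f"{duration_seconds:.2f}"])

def timed(run_case):
    """Run a zero-argument callable and return (result, elapsed seconds)."""
    started = time.perf_counter()
    result = run_case()
    return result, time.perf_counter() - started

with tempfile.TemporaryDirectory() as tmp:
    log_file = Path(tmp) / "results.csv"
    _, elapsed = timed(lambda: time.sleep(0.01))  # stands in for one chatbot round trip
    append_result(log_file, "TC-001", True, elapsed)
    append_result(log_file, "TC-002", False, 1.5)
    rows = log_file.read_text(encoding="utf-8").splitlines()

print(rows[0])
```

Because rows are only ever appended, the same file accumulates history across runs and can be loaded into a spreadsheet or plotting tool to follow latency trends.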

Results

Testing AI-powered systems requires thinking beyond traditional binary assertions.

Deploying this custom framework fundamentally transformed how our team approaches AI quality assurance. We moved away from the tedious manual testing that plagues many early-stage AI projects and replaced it with a more deterministic, data-driven pipeline.

By combining strict keyword validation with semantic evaluation, we achieved a safety net that is both flexible and rigorous. This is the foundation for gathering hard metrics, such as latency, similarity scores, and hallucination catch-rates.

What’s next?

While the current architecture handles single-turn queries well, the next frontier is stateful, multi-turn conversations. We can evolve the framework to handle long conversational contexts, evaluating how well the bot remembers facts established three or four messages earlier. Furthermore, we are looking into integrating dynamic LLM-as-a-Judge mechanisms, where a secondary model acts as the final arbiter for chatbot responses.

The system can also be extended to load and concurrency testing. By parallelizing the test suite across multiple independent chat sessions, we can simulate real-world usage patterns and evaluate system behavior under concurrent requests. This enables measurement of performance characteristics such as response latency, throughput, and stability.

Testing AI requires discarding the comfort of absolute determinism. By building frameworks that are as intelligent and adaptable as the systems they evaluate, our QA can stop playing catch-up and start leading the charge in building reliable AI products.

Technologies used: Python, WebSockets, SignalR, PyTorch, Sentence Transformers (NLP), Scikit-learn, JSON, Regex.

Need help with AI testing?

Testing LLM-based systems requires more than traditional QA approaches. A structured validation strategy can help you detect hallucinations and improve response reliability in production AI applications.

Siarhei Nestsiarenka, Chief QA

Upgrading from Umbraco 14 to 17: Step-by-Step Migration Guide

Natallia Sakovich — Tue, 31 Mar 2026 09:43:46 +0000

Key Takeaways

The upgrade from Umbraco 14 to 16 is significantly smoother than 13 to 14. Once the major breaking changes are behind you, the remaining migration steps become more predictable and easier to manage.
Temporarily disabling Razor compilation helps unblock the upgrade. This allows you to regenerate models and fix issues incrementally instead of resolving everything upfront.
Authentication and API changes require careful handling. Updates such as switching to Microsoft Entra ID and changes in ContentService (Save vs Publish) must be addressed to ensure system stability.
Prepare early for future versions like Umbraco 17. Upgrading tools, dependencies, and editors (e.g., TinyMCE to TipTap) in advance reduces complexity and makes future upgrades faster and smoother.

Once the most difficult step, the Umbraco 13 to 14 upgrade, was completed, the remaining migration path (upgrading Umbraco 14 to 15, then to 16, and finally to Umbraco 17) turned out to be significantly smoother.

The main focus of this article is the Umbraco 14 to 16 upgrade, covering API changes, authentication updates, and editor migration. As a bonus, we also touch on our initial experience with Umbraco 17.

Fixing Razor Compilation Issues

We discovered a few helpful tricks while working with Razor Pages. One of them is to disable Razor compilation at the project level.

This is especially useful when upgrading between versions, where models change and many views become invalid at once. Otherwise you end up in a vicious circle: to fix the models you have to comment them out, which breaks the Razor Pages, and to fix the Razor Pages you need working models first.

Temporarily disabling Razor compilation breaks the circle. The project can run and regenerate the published models without every broken reference in the views being fixed first.

This can be done by modifying the .csproj file directly:

     
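    <PropertyGroup>
      <RazorCompileOnBuild>false</RazorCompileOnBuild>
      <RazorCompileOnPublish>false</RazorCompileOnPublish>
    </PropertyGroup>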

After starting the project locally and regenerating the models, you can gradually fix the views and re-enable Razor compilation.

Note that this trick does not address the separate class of problems caused by backward-incompatible database content, or runtime errors and unexpected behavior in previously working logic (for example, null suddenly appearing instead of data).

Authentication Provider Changes

Authentication provider updates

The way the back-office authentication provider is defined has also changed. We temporarily disabled it to stay focused on completing the upgrade, and only updated the authentication setup when we reached version 16.

Until then, we simply commented out all authentication-related code and reverted to basic login authentication. Since this project used Azure AD (now Microsoft Entra ID), we eventually had to rewrite the integration.

Custom authentication plugin

The final solution was to add a custom authentication plugin configured through a YML file. (You can move this step to the end — that’s when we handled it too.)

Example: Description of our provider

We initially ran into issues where Umbraco wouldn’t recognize the plugin definition. After some trial and error, we found that maintaining a strict folder structure and file format was crucial.

We followed this official guide:

Upgrading to Umbraco 15

Compared with the previous step, the upgrade to version 15 was much smoother.

First, update the NuGet packages to the latest version 15 of both Umbraco and uSync. After that, you’ll probably start seeing a series of errors.

Next, comment out all the problematic “red” sections in the code and temporarily disable Razor compilation, as mentioned earlier, to regenerate the published models. Once that’s done, uncomment the code and start fixing the remaining errors.

Finally, remove the parsing logic we discussed earlier, which seemed to be a quirk specific to version 14.

The types were restored in the published models. At this stage, we also started configuring our TipTap rich-text editor, but we won’t go into detail here, since this setup is highly project-specific. Here you can find the official documentation.

Upgrading to Umbraco 16

Update the NuGet packages to the latest version 16 of both Umbraco and uSync. A few errors are likely to show up.

Then, repeat the published models trick mentioned earlier to rebuild them and continue with the upgrade process.

Content service changes in version 16

In this version, the content service removed the combined SaveAndPublish option and split it into two separate methods: Save and Publish. With these methods you also need to specify the culture for which the operation should be performed. This gives you the flexibility to perform several saves first and only publish at the very end, which can be very convenient in many scenarios (ours included).

If you haven’t already switched your editors from TinyMCE to TipTap, Umbraco 16 will effectively force you to do so, because the TinyMCE UI is no longer available in this version.

This is also the point where we implemented our authentication provider as an Umbraco Plugin, using the approach described earlier for integrating with external authentication (in our case, Microsoft Entra ID).

Lessons Learned

After completing the migration, several key lessons became clear.

The hardest step is the upgrade from 13 to 14

Most of the breaking changes occur at this stage, which makes version 14 the least backward-compatible of all the upgrades described in this article. During this transition we encountered several major changes, including:

macros removed
nested types replaced
editor changes
authentication integration breaking and requiring updates

Compared to the later upgrades (14 → 15 → 16), this step required significantly more effort and adjustments.

Automate content migration

Writing a custom migration job saved enormous manual effort. Whenever possible, use existing migration tools; we relied on several of them during our upgrade. However, if no suitable tool is available, it is often worth building your own automation rather than updating thousands of content items manually.

Use database checkpoints

Create database backups after each successful upgrade step. These intermediate checkpoints allow you to roll back to the previous working stage if something breaks, instead of restarting the entire migration from the very beginning. They work like save points in a game: they significantly reduce the risk and effort involved in a long upgrade process.

Expect localization issues

Language codes and culture formats may change between versions. During our upgrade, we also discovered that one of the cultures used in our portal had become invalid in the newer Umbraco version. We are not sure how common this issue is, but it’s something to be aware of, as similar localization inconsistencies may appear during your upgrade process.

Upgrade tools early

Moving to new editors and components early makes later upgrades easier. It’s also important not to postpone upgrades for several versions, as jumping across multiple releases at once increases risk. The larger the version gap, the more breaking changes accumulate, making the upgrade more time-consuming and increasing the chance of missing important migration steps.

Bonus: Upgrading to Umbraco 17 and .NET 10

Step 1: Upgrade to .NET 10

The first step is to upgrade your project to .NET 10. You can find all breaking changes here.

At the very beginning, we also decided to migrate our solution file from .sln to .slnx, which significantly simplifies solution management. The .NET CLI has supported this format since .NET SDK 9.0.200.

To migrate, simply run dotnet solution migrate.

Instead of a 50+ line .sln file full of GUID references, you get a much cleaner file with around 10 lines.

At the same time, don’t forget to:

update all CI/CD pipelines to use .slnx
switch to .NET 10 in your build environment
remove the old .sln file (after verifying everything works correctly)

After that, update all projects to:

.NET 10
C# language version 14.0

Step 2: Update dependencies

Next, update all dependent NuGet packages to their latest 10.0-compatible versions.

One breaking change we encountered was related to Swashbuckle.AspNetCore.

Step 3: Fix ICU runtime issue

After resolving .NET 10-related issues and attempting to run the project, we encountered the following error:

Failed to load app-local ICU: icuuc68.2.0.9.dll
   at System.Environment.FailFast(System.Runtime.CompilerServices.StackCrawlMarkHandle, System.String, System.Runtime.CompilerServices.ObjectHandleOnStack, System.String)
   at System.Environment.FailFast(System.Threading.StackCrawlMark ByRef, System.String, System.Exception, System.String)
   at System.Environment.FailFast(System.String)

This is a known issue in Umbraco.

It is likely related to incomplete support for .NET 10 in Umbraco. We fixed it by updating the ICU version in the .csproj file:

replaced with:

The version must match the installed ICU package.

Step 4: Fix minor breaking changes

After this fix, we only had 3 build errors across 2 files.

Here we just need to replace this:

  var backOfficePath = globalSettings.Value.GetBackOfficePath(hostingEnvironment)

with this:

 var backOfficePath = hostingEnvironment.GetBackOfficePath()

Step 5: Navigation API adjustment

Another change required adjusting navigation logic.

Final version:

    _navigationQueryService.TryGetRootKeys(out var rootKeys);
    var root = rootKeys
        .Select(key => umbracoContextReference.UmbracoContext.Content.GetById(key))
        .FirstOrDefault(x => x?.UrlSegment() == "unrestricted-pages");
    return new Home(root, _publishedValueFallback);

Step 6: Date handling changes (important)

Umbraco finally fixed an issue with dates stored in SQL that affected us as well, since we store and manage events for multiple locations and time zones. In previous versions, Umbraco always tried to reconvert our stored dates, even though we had already stored them in a specific time zone. It returned them as time-zone-specific datetimes marked as UTC, which created a lot of confusion.

That’s why we created a helper method like this to reset UTC kind.

But after we migrated to Umbraco 17 we changed this helper just to:

Since now all dates are returned as Unspecified, we can convert them to any time zone we need without any recalculations, even if they were created before the Umbraco 17 update!

Step 7: Run Umbraco upgrade

After launching the application, you will see the standard Umbraco upgrade installation page.

Click Continue; the process is straightforward and should not cause issues.

Step 8: Final cleanup

After the upgrade, don’t forget to regenerate the published models and rework obsolete method calls, whose number has grown noticeably in Umbraco 17. We saw more than 60 warnings after the first build. This matters because many of these obsolete methods will be removed entirely in Umbraco 18.

Final Thoughts

Upgrading to Umbraco 16 and then to version 17 turned out to be much faster and smoother than all previous ones. Compared to the complexity of the Umbraco 13 to 14 migration, this step required significantly less effort.

Another key takeaway is: don’t delay upgrades. The longer you wait, the more breaking changes accumulate, making the eventual jump across multiple versions far more painful and time-consuming. It’s always better to upgrade incrementally rather than face a large, complex migration later.

And if you’ve already skipped several versions, don’t worry. The team at SaM Solutions is ready to take on that complexity and handle the upgrade for you.

Planning an Umbraco upgrade?

If you’re preparing to move to Umbraco 17, a well-structured migration strategy can save you weeks of rework and prevent critical issues with content and functionality.

Vadim Birkos, Senior Full-Stack .NET Developer, AI Enthusiast

Upgrading from Umbraco 13 to 14: Our Journey from Build Errors to Breakthroughs

Natallia Sakovich — Thu, 19 Mar 2026 16:31:18 +0000

Key Takeaways

The biggest challenge is the Umbraco 13 to 14 jump. Most breaking changes occur at this stage (macros removal, API changes, content structure updates). Once completed, further upgrades become significantly smoother.
Macros migration is the critical blocker. Since macros are removed in Umbraco 14, converting them to blocks (often via custom migration jobs) is essential to preserve existing content.
Preparation and strategy save time. Upgrading .NET in advance, creating backups, freezing content, and using a local-first migration approach help reduce risks and speed up the process.
Incremental upgrades are more reliable than skipping versions. Even if targeting newer versions like Umbraco 17, you’ll likely need to handle the same core challenges introduced in version 14.

Umbraco CMS is a widely used and rapidly evolving .NET-based content management system. At SaM Solutions, we also rely on Umbraco for our internal corporate portal. However, the platform’s fast pace of development has a downside: new versions frequently introduce breaking changes, and upgrades are not always seamless.

Our team has been working with Umbraco since version 9. At the time we started planning the upgrade, our corporate portal was running on Umbraco 13, which is scheduled to reach its end of life on December 14, 2026. Many community members recommend skipping version 16 entirely (its EOL is June 12, 2026) and waiting for Umbraco 17, which has a longer support window until November 27, 2028.

There is logic in that advice. However, our experience showed that the most difficult migration step occurs earlier, when moving from Umbraco 13 to 14. The following upgrades from version 14 to 15 and then to 16 are significantly smoother. In practice, anyone upgrading directly to version 17 will most likely still need to overcome the same obstacles.

This article shares our real-world Umbraco 13 to 14 upgrade experience, including the problems we faced and how we solved them.

Request SaM Solutions’ Umbraco implementation to speed up and optimize your content management workflows.

Initial System Architecture

Before starting the upgrade, our portal had the following technical setup.

Platform and framework

.NET 8
Pages and components partially migrated to Next.js (headless), with production still running on Razor Pages
Hangfire for background job processing

Umbraco ecosystem

Umbraco CMS 13.8.1
uSync for content synchronization
Examine Search
Azure AD authentication for the Umbraco backoffice and for all users of the Umbraco portal, with automatic user creation in the portal

Background jobs

Our portal relied heavily on automated background jobs interacting with Umbraco content.

We had roughly six jobs, including:

Content validation jobs that are triggered on content updates and make some changes to the content (checking fields, setting the author, etc.)
Jobs for content translation across cultures based on a cron schedule
Import jobs that fetch content from external services

One of the more complex jobs translated content between cultures based on a cron schedule. This functionality is described in more detail in another article about our internal AI-powered translation system.

It Should Be Noted

If you try to upgrade to version 16 right away and run it, you will get something like this:

That’s why we decided on the incremental migration, starting with upgrading Umbraco 13 to 14.

Custom Macros that Blocked the Upgrade

One of the biggest blockers when upgrading to Umbraco 14 was the removal of macros: our content managers used several custom macros when creating news articles on the portal.

The most important ones included:

ExternalIframeBlock for displaying external iframe content
IframeMediaWrapper for displaying media inside an iframe based on a link
ScaleImage for resizing images dynamically
SliderImages — a custom image slider

Unfortunately, macros were completely removed in Umbraco 14 and replaced by blocks. To preserve existing content without rewriting thousands of articles manually, we decided to migrate the macros programmatically and turn them into blocks.

Source: Umbraco documentation

Umbraco Migration Strategy

To speed up the upgrade process and be able to deploy only the final version, we decided to perform the entire migration locally on a single machine.

The workflow looked as follows:

Restoring the database from the production environment to a local machine. We created a local copy of the production database to ensure that the migration process would run against real content and real data structures. This helped us detect potential issues early, especially those related to content serialization, macros, and custom blocks.
Introducing a code and content freeze on production until the final deployment. Because the upgrade process was large and involved multiple breaking changes, we temporarily froze both code changes and content updates in the production environment. In theory, content editing could still continue during the migration if content authors were willing to manually repeat those changes after the upgrade. However, to avoid inconsistencies and additional overhead, we chose to freeze the content until the final deployment. We know this is an unusual approach, but in our circumstances we could afford it.
Performing the entire migration locally and restoring the database after the upgrade was complete. Once all migration steps were finished and the upgraded application worked correctly locally, we manually restored the updated database to the target environment and deployed the final version of the portal.

Alternative scenario

In a more traditional scenario, the migration could have been performed incrementally using intermediate environments (for example, development → staging → pre-production) and deploying each intermediate upgrade step along the way.

However, we intentionally chose a local-first migration approach to accelerate the process. Many issues were easier to fix only after reaching the target version (Umbraco 16), so repeatedly deploying intermediate versions would have slowed down the overall workflow.

Another factor that made this approach feasible was our project setup. Our Technical Product Owner was actively involved in the upgrade process and was able to execute the migration steps directly on their local machine. This significantly simplified coordination and allowed us to move through the upgrade stages faster than a typical multi-environment deployment pipeline would allow.

This approach works best if the staging and production databases are identical and if your security policy allows the same steps we took (in our case, because the Technical Product Owner was directly involved).

After restoring the database in staging, some environment-specific configurations (like user permissions) had to be reconfigured.

Branching Strategy for the Umbraco Upgrade

During the migration we created separate Git branches for each final Umbraco version:

upgrade-to-14
upgrade-to-15
upgrade-to-16

This strategy turned out to be extremely helpful. Whenever something broke (which happened quite often) we could simply:

restore a fresh Umbraco 13 database backup
rerun the upgrade path
quickly compare results

We also strongly recommend creating database backups after each successful intermediate upgrade.

Preparing the Upgrade

Before upgrading Umbraco itself, we completed several preparatory steps.

At the very beginning, we recommend doing everything that can be done in advance. In our case, that meant writing a migration job, testing it, and updating to .NET 9. At the time of our migration from Umbraco 13, version 16 was the latest available, and it required .NET 9. This way, when you first upgrade to version 14 and encounter a bunch of errors, you can at least rule out those related to the .NET version.

The .NET upgrade itself went fairly smoothly and, in most cases, came down to simply updating NuGet packages, though it definitely didn’t work perfectly on the first try.

Next, we upgraded Umbraco to 13.10 and uSync to 13.3, because that’s when blocks were introduced, the very ones we needed to migrate our macros to.

Migrating macros to blocks in Umbraco

After that, we updated the existing content and added a draft version of the blocks we wanted to migrate to (essentially, we copied the macro logic with a few small modifications).

Then we started developing the job for migrating macros to blocks, something we had to write ourselves, and it turned out to be one of the most important parts of the migration process.

Check out our version of the job. It can serve as a good starting point, though in your case you might need to cover more conditions.

Details of this migration job:

In the end, there weren’t any major difficulties (though that’s after we figured everything out). During the process, we had to understand how Umbraco stores its data internally to know how to properly convert macros into blocks.

Essentially, our task was to replace each macro in the content with a corresponding block. For example:

News text  
MACRO_WITH_IMAGE  
More news text

What we did:

Scanned content for macros using regular expressions
Identified macro occurrences
Created corresponding block components
Replaced macro markup with block references
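
The steps above can be illustrated with a minimal, language-neutral sketch in Python. It is not the migration job itself: the macro tag format (`<?UMBRACO_MACRO macroAlias="..." />`), the `<block-ref>` placeholder, and the shape of the returned block records are simplifying assumptions made for illustration, so check how your own rich-text values are stored before reusing the idea.

```python
import re
import uuid

# Assumed macro markup inside rich-text values; verify against your own data.
MACRO_RE = re.compile(
    r'<\?UMBRACO_MACRO\s+macroAlias="(?P<alias>[^"]+)"(?P<params>[^>]*?)\s*/>'
)
PARAM_RE = re.compile(r'(\w+)="([^"]*)"')


def convert_macros(markup, alias_to_block):
    """Replace known macros with block placeholders.

    Returns the rewritten markup and the list of block records that
    would have to be stored alongside it.
    """
    blocks = []

    def replace(match):
        block_type = alias_to_block.get(match.group("alias"))
        if block_type is None:
            # Unknown macro: leave it untouched so it can be reviewed manually.
            return match.group(0)
        key = uuid.uuid4().hex
        blocks.append({
            "key": key,
            "type": block_type,
            # Macro parameters become the block's property values.
            "values": dict(PARAM_RE.findall(match.group("params"))),
        })
        # Hypothetical placeholder tag standing in for a real block reference.
        return f'<block-ref key="{key}"></block-ref>'

    return MACRO_RE.sub(replace, markup), blocks
```

Leaving unknown macros in place, rather than dropping them, makes it easy to list the pages that still need attention after the job has run.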

For creating new data types, we could use built-in Umbraco types, which made the work easier. We just had to be careful to follow the correct structure. The job also required that all the necessary custom blocks were already created at the site level to enable conversion.

Our macros were fairly simple, displaying an image, embedding a video in an iframe, or showing multiple images. One advantage of migrating to a newer Umbraco version was that some of these custom blocks became unnecessary, since the new Rich Text Editor already included built-in components for them.

The migration itself was needed mainly to preserve old content.

After that, we ran the migration job. Once it was completed, we reviewed the list of pages that had contained macros and tested them to ensure everything worked correctly. If all looked good, we treated that as a successful upgrade checkpoint.

Migration of Nested Content to BlockList and Grid to BlockGrid

The next step was to install the package uSyncMigrations in the local environment via NuGet.

Next, we followed this sequence of steps:

Go to Settings → uSync → Everything → Clean Export.

This helps avoid situations where uSync files and the current database version are out of sync.

Then go to Settings → uSync Migrations → Convert Site and select the migration plan “Nested to BlockList and Grid to Block…”.

This starts the migration process itself. It runs entirely based on uSync files — it takes the existing files and converts them to the new format.

Once the migration finishes, perform an Import to apply the changes to the database. In our case, the process generated about 4,000 files.

After that, go to uSync → Everything and run the import to load all migrated content (BlockLists and BlockGrid) into the database.

Upgrading to Umbraco 14

To update the Umbraco NuGet package to version 14, you first need to temporarily disable the uSync package; otherwise, the main Umbraco package will not upgrade. Alternatively, change the package version manually in all csproj files, or in Directory.Packages.props if you use Central Package Management.

After the update, reinstall version 14 of uSync and start addressing any errors to get the project running again.

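You also need to remove @inject Smidge.SmidgeHelper SmidgeHelper from _ViewImports, as well as any remaining macro rendering calls in your views (macros no longer exist in Umbraco 14), for example:

@await Umbraco.RenderMacroAsync(macroAlias, parameters)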

What Broke During the Umbraco 13 to 14 Upgrade

Notification API changes (SendingContentNotification removed)

SendingContentNotification no longer exists, which means all related handlers have to be rewritten. We decided to use the newer ContentSavingNotification approach and were able to refactor our handlers accordingly.

The main difference is that handler logic is now culture-dependent, and the structure of some objects has changed slightly.

Example:

V13

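public class SomeMagicContentNotificationHandler(UmbracoHelper umbracoHelper)
    : INotificationHandler<SendingContentNotification>
{
    public void Handle(SendingContentNotification notification)
    {
        // v13: iterate over the variants of the content item sent to the editor
        foreach (var variant in notification.Content.Variants)
        {
            // ...
        }
    }
}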

Final version (v16)

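public class SomeMagicContentNotificationHandler(
    IAppoinmentTimeZoneService appoinmentTimeZoneService,
    UmbracoHelper umbracoHelper)
    : INotificationHandler<ContentSavingNotification>
{
    public void Handle(ContentSavingNotification notification)
    {
        foreach (var content in notification.SavedEntities)
        {
            // Skip items whose template is not the one this handler targets
            if (!ValidationIfItemTemplateIsDesired(content))
                continue;

            // The handler logic now runs once per culture
            foreach (var culture in content.AvailableCultures)
            {
                // ...
            }
        }
    }
}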

In reality, there weren’t many changes but they still needed to be made.

Incompatible classes after migration

We also removed our MacrosMigrationJob, since its mission was complete: it was only needed for the migration from version 13 to 14. After the move to version 14, it relied on several classes that had become incompatible and would have prevented the project from building.

Label type changes in published models

Next, there was an issue with labels. We had Umbraco.Label types, and when we migrated to version 14, our published models changed their types to string, although they should have been int and DateTime.

We handled the parsing manually but that’s actually unnecessary. In later Umbraco versions, this functionality will start working correctly again, and the models will return the proper type. It’s easier to just comment it out temporarily until you upgrade to the final version.

In the future, Umbraco will allow this to be configured directly at the platform level, so you’ll be able to specify a custom Label type if it differs from string.

Rich Text Editor migration (TinyMCE → TipTap)

During the upgrade to version 14, we also switched our editors from TinyMCE to TipTap.

This change didn’t cause any issues, but it made future updates easier, since in later versions, TinyMCE was removed entirely.

Source: Umbraco documentation

Content serialization issues after macro migration

Next, we needed to create another migration job. You can find our version here. In version 13, when we migrated macros to blocks, the properties were serialized in uppercase, which isn’t compatible with what’s expected in version 14 and later. To fix this, we wrote a small job that re-serializes all the created content where these macros or blocks were used.

This step is only necessary if you previously used custom macros. If not, Umbraco will handle everything automatically during the update.
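
The re-serialization idea can be sketched in a few lines of Python. This is only an illustration and not the job mentioned above: it assumes that "uppercase" means JSON property names starting with a capital letter (for example `ContentData` instead of `contentData`) and that lowering the first letter of every key is enough. Both assumptions should be checked against your stored values.

```python
import json


def camel_case_keys(value):
    """Recursively lower-case the first letter of every JSON object key."""
    if isinstance(value, dict):
        return {
            key[:1].lower() + key[1:]: camel_case_keys(inner)
            for key, inner in value.items()
        }
    if isinstance(value, list):
        return [camel_case_keys(item) for item in value]
    # Strings, numbers, booleans and None are returned unchanged.
    return value


def reserialize(raw_json):
    """Parse a stored property value and write it back with camelCase keys."""
    return json.dumps(camel_case_keys(json.loads(raw_json)))
```

Only keys are touched; values such as identifiers or markup strings pass through unchanged, which keeps the operation safe to run more than once.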

To Sum Up

From our experience, this was the most difficult part of the entire Umbraco upgrade process. The transition from Umbraco 13 to 14 introduced the majority of breaking changes, and once we successfully passed this stage, the rest of the migration (14 → 15 → 16 → 17) became significantly faster and more predictable.

The main pain points were migrating macros to blocks and migrating nested types, which in our case also required changes in how we work with models at the code level (we will discuss it in more detail in the following article Upgrading from Umbraco 14 to 17).

It is also worth mentioning that during the upgrade we intentionally commented out certain parts of the code to unblock the process and move forward. Many of these fixes were completed only after reaching the final version. So if you don’t see a specific issue or fix described here, it most likely appears in the second part of this guide.

Planning an Umbraco upgrade?

If you’re preparing to move from Umbraco 13 to newer versions, a well-structured migration strategy can save you weeks of rework and prevent critical issues with content and functionality.

Vadim Birkos, Senior Full-Stack .NET Developer, AI Enthusiast

Developing an AI Assistant Prototype for Automated Lead Discovery and Qualification

Andrey Kopanev — Fri, 23 Jan 2026 09:33:16 +0000

Key Takeaways

Combining structured conversation scripts with LLM adaptability enables scalable, natural lead qualification without sounding robotic.
An AI assistant can accurately assess ICP fit by extracting and validating key business data from free-form conversations.
Respecting user intent and recognizing disengagement is critical for ethical, effective AI-driven outreach.
Production-ready AI lead generation requires not just models, but careful prompt design, validation logic, and cost control.

Finding potential clients for software development services is rarely a straightforward task. In practice, it often means manually monitoring chats, forums, and social networks, identifying promising conversations, and then reaching out to people one by one with similar introductory messages. This process is time-consuming, repetitive, and difficult to scale.

In this article, I explain how I built SaMio, a working prototype of an AI-powered assistant that automates early-stage lead discovery and qualification for software development providers, while keeping conversations natural, respectful, and context-aware.

Need expert guidance on designing and implementing AI solutions for your business?

The Challenge: Manual Lead Discovery Doesn’t Scale

The starting point was a very common real-life workflow.

A company’s employee continuously monitors chats and social platforms to identify people who might be interested in software development services. Once such a person is found, the next step resembles cold outreach, only via modern messengers instead of phone calls. The outreach typically follows a script: greeting, a few qualifying questions, and then a proposal or a polite exit.

The problems with this approach were obvious:

The same introductory messages had to be written manually again and again.
Conversations rarely followed a perfectly linear script.
People could refuse, ignore messages, or shift the topic at any moment.
It was difficult to consistently evaluate whether a person actually fit the ideal customer profile (ICP).

The goal was to automate this process without turning it into a robotic spam machine and without annoying people.

The Core Idea: Scripted Flow and LLM Adaptability

Instead of trying to fully improvise conversations, I started with a structured script approach.

At the core of SaMio lies a conversation flow definition, consisting of:

Initial greeting messages
A sequence of qualifying questions
Final success or failure messages

Each question step has a specific validation goal, for example, identifying the industry, company size, or the person’s role in the organization.

The number of questions is flexible and can be extended depending on business needs.
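
A conversation flow of this kind can be pictured as a small data structure. The sketch below is an illustration in Python, not SaMio's actual code: the class names, field names, and the `next_question` helper are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Question:
    key: str        # validation goal of the step, e.g. "industry"
    variants: list  # alternative phrasings the assistant may choose from


@dataclass
class ConversationFlow:
    greetings: list
    questions: list          # ordered qualifying questions; can be extended
    success_messages: list
    failure_messages: list


def next_question(flow, answers):
    """Return the first question whose goal is still unmet, or None."""
    for question in flow.questions:
        if not answers.get(question.key):
            return question
    return None
```

Because each step is identified by its validation goal rather than by its wording, new questions can be appended to the list without changing the surrounding logic.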

However, a static script alone would never feel natural. Real conversations are messy. People answer indirectly, ask unrelated questions, or clarify earlier statements. This is where AI becomes essential.

Making the Conversation Feel Human

To make dialogues feel natural, I connected a large language model (LLM) that adapts to the interlocutor’s communication style and context.

SaMio does not simply send predefined messages. Instead, it:

Selects appropriate variants from the script
Interprets free-form user responses
Extracts relevant information from those responses
Adjusts tone and pacing to match the conversation

If someone suddenly asks about the weather or shifts the topic, the assistant can respond naturally and then gently steer the conversation back when appropriate.

At the same time, the assistant must recognize hard stops. If a person refuses to continue or clearly disengages, the assistant must respect that decision and end the dialogue without pushing further.

This balance — being adaptive without being intrusive — was one of the most important design goals.

Qualification and Validation Logic

As the conversation progresses, SaMio collects answers to the qualifying questions. Importantly:

Answers are stored separately and can be updated if the user adds something later.
The system avoids asking the same question multiple times.
Each answer contributes to an overall assessment of how well the person fits the target ICP.

Once all relevant data is collected, or the conversation naturally reaches a conclusion, the assistant evaluates the result.

If the potential client matches the ICP, the assistant selects one of several success messages. These are soft, non-pushy proposals, such as offering a short demo, sharing a guide, or showing real examples.

If the person is not a fit or declines further discussion, the assistant selects a failure message, always polite, appreciative, and respectful.
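
The qualification logic described above can be sketched as follows. This is a simplified illustration with hypothetical names: it assumes the LLM has already turned free-form replies into normalized values, and that ICP fit is measured as the share of criteria met, which is only one possible scoring scheme.

```python
import random


def record_answer(answers, key, value):
    """Store or overwrite an extracted answer; later clarifications win."""
    answers[key] = value


def icp_score(answers, criteria):
    """Share of ICP criteria satisfied by the answers so far (0.0 to 1.0)."""
    if not criteria:
        return 0.0
    met = sum(
        1 for key, check in criteria.items()
        if key in answers and check(answers[key])
    )
    return met / len(criteria)


def pick_closing(answers, criteria, success, failure,
                 declined=False, threshold=1.0, rng=random):
    """Choose a closing message; a refusal always leads to a polite exit."""
    fits = not declined and icp_score(answers, criteria) >= threshold
    return rng.choice(success if fits else failure)
```

Treating a refusal as an immediate failure path, regardless of the score, keeps the hard-stop rule independent of how well the person otherwise matches the profile.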

Post-Conversation Automation

SaMio does not stop at messaging.

After the conversation ends:

A summary email with the results of the dialogue is automatically sent.
The full conversation history is stored in the database for later review.
Relevant data is synchronized with Google Sheets, keeping lead tracking up to date without manual input.

This ensures transparency, traceability, and easy handover to sales or marketing teams.

Summary table example

Name	User name	Position	Company	Size	Industry	Date	Last update
Yena Polrix	Ypolrix_po	User is a product owner	N/A	Approximately 300 people	The user works in the composable commerce sector	2025-11-22	2025-11-22
Luma Qentari	lqentari_77	User is a tech strategy lead	N/A	The company has nearly a thousand workers	Cloud automation	2025-10-19	2026-01-10
Nira Solven	nsolven_vp	User is a director of the company	N/A	Over 1200 people	Enterprise platform services	2025-12-10	2025-12-10

Technical Decisions and Architecture

Initially, I experimented with a local LLM (Gemma). While this approach seemed attractive from a cost and privacy perspective, it quickly revealed limitations.

The model struggled to correctly interpret ambiguous responses. If a user replied off-topic but without rejecting the conversation, the model often failed to adapt and continue meaningfully.

As a result, I switched to GPT-4o-mini, hosted on Azure. This model provided significantly better conversational robustness and context handling.

Another key challenge was controlling conversational drift. Since cloud-based models consume paid tokens, letting the assistant engage in long, irrelevant discussions was not an option. We had to carefully balance:

Allowing natural small talk
Gently redirecting the conversation
Preventing unnecessary token consumption
Avoiding obvious “bot-like” behavior

Prompt engineering played a major role here, defining boundaries while preserving flexibility.

Technology stack

The prototype was built using:

Azure
Docker
.NET
PostgreSQL

The Result: A Working, Production-Ready Prototype

SaMio is a fully functional prototype that:

Conducts natural, adaptive conversations
Qualifies leads based on real responses
Respects user boundaries
Automates follow-ups and reporting
Stores and synchronizes data reliably

Most importantly, it demonstrates how LLM-based assistants can move beyond simple chatbots and become practical tools for real business workflows when combined with structured logic, validation rules, and thoughtful constraints.

Read how we implemented an internal AI-powered system for content translation.

Final Thoughts

This project was a strong reminder that successful AI systems are rarely “pure AI.” The real value emerges at the intersection of scripted business logic, human communication patterns, and carefully controlled language models.

For software development providers, this approach opens new possibilities for scalable, respectful lead generation without sacrificing authenticity or trust.

If you’re exploring similar automation challenges, the key takeaway is simple: start with real human workflows, then let AI enhance them — not replace them blindly.

Need to tackle a similar challenge?

For organizations facing challenges in lead generation, AI-powered assistants offer a scalable way to automate outreach, assess ICP fit, and generate qualified leads, while keeping conversations natural, respectful, and efficient.

Andrey Kopanev, Senior .NET Developer, AI Enthusiast

How We Built a Production-Grade RAG Engine for a Website AI Chatbot

Maryia Shapel — Fri, 16 Jan 2026 13:14:36 +0000

Modern websites don’t suffer from a lack of information — on the contrary, they often suffer from information overload. Blogs, news, service pages, event announcements, case studies… everything is useful, everything is scaling, and everything is scattered across navigation paths that make little sense to a visitor who just wants one specific answer.

Key Takeaways

Production-grade AI chatbots require a RAG architecture that retrieves the right content first instead of relying on oversized prompts or fine-tuned models.
Clean content extraction, semantic chunking, and hybrid retrieval (dense + lexical reranking) are critical to answer quality and relevance.
Local, modular RAG components provide stronger control over cost, privacy, and scalability than fully managed cloud ingestion pipelines.
High-quality chatbot answers are an architectural outcome — driven by ingestion, retrieval, and ranking decisions, not by the LLM alone.

That’s exactly why we launched an internal RAG (Retrieval-Augmented Generation) project: to power a website chatbot that can answer questions based on the real site content. The chatbot can do it reliably and privately, without pretending that it “knows” things it has never seen.

Explore the practical story of how we built it, what didn’t work at first, and what finally made the answers noticeably sharper.

Leverage AI to transform your business with custom solutions from SaM Solutions’ expert developers.

Client’s Business Request

The project started with a client request to design and implement a scalable AI platform.

Our ultimate objective was to create a system that would allow users to:

register and access the system;
connect several different models;
embed an AI pop-up chat on any website.

Within the MCP scope, the chatbot required a few specific features:

answer user questions with the help of website content (blogs, articles, news, event pages);
send an email from the chat flow (so a visitor can ask questions, provide missing details, and trigger outreach without having to fill in a classic “Contact us” form).

Understanding the Real Problem

A website chatbot may look simple at first glance, but only until you try to make it accurate. The core constraints were straightforward:

A general LLM doesn’t know the current state of the website by default.
Large cloud-based language models raise significant concerns around cost, privacy, and dependency on external infrastructure. These concerns grow when the chatbot is extended beyond public web content to internal or sensitive knowledge.
As a result, many teams prefer smaller, local models. However, this shift exposes a new bottleneck: limited context windows. Feeding entire documents or pages into a single prompt no longer scales: the inputs get truncated, and irrelevant sections consume valuable context. The “one huge prompt” approach becomes fragile and inefficient.

So instead of “teaching” the model everything at once, we chose the approach that works in real production systems: retrieve the right pieces of content first, then generate the answer from those pieces. That’s RAG.

Exploring Possible Solutions

We looked at several possible directions before committing:

Option 1: “Just prompt it with the page”

Why it failed:

Content pages embed LLM-irrelevant noise (e.g., navigation, repeated sections, hidden elements, cookie notices) unrelated to query-relevant information;
Context limitations mean the model discards content anyway;
Answers may vary depending on what got cut off.

Option 2: Cloud ingestion + managed search (e.g., Azure)

Why we didn’t pick it for this stage:

Cost grows fast once you index frequently and scale usage;
Privacy and control become more complex in the long run, especially for internal extensions.

Option 3: Fine-tuning

Why it didn’t fit:

Constant website updates mean constant re-training or drift;
Fine-tuning doesn’t automatically explain where an answer came from;
It is also computationally expensive and demands deep ML expertise to train, maintain, and debug models effectively.

Option 4: RAG with local components

Why we chose it:

We keep total control over data and cost;
We can prove the concept on public content first and then confidently extend to private knowledge later;
We can continuously update the knowledge base with the help of content re-indexing.

Talk to our AI specialists about building smart, scalable software for your business.

How We Ran the Process

To deliver an internal RAG engine that spans ingestion, retrieval, LLM orchestration, and website embedding, we assembled a cross-functional team:

Project Manager (PM) — aligned stakeholders, defined iterations and milestones, and kept scope under control as requirements evolved;
Architect — owned the target architecture, integration approach, and key technical decisions, such as security, scalability, and data flow;
.NET Engineer — implemented the RAG services, retrieval pipeline, vector database integration, and MCP tools (search, actions);
Frontend Engineer — built the chatbot UI and the user flows needed to embed it on the website and make it usable in real sessions;
2 Java Engineers — supported Java-based backend services and integrations; developed the MCP platform in both .NET and Java, with a unified architecture and the Java implementation selected for production; built and maintained CI/CD, Kubernetes, GitOps, and monitored the platform stability.

This setup let us iterate quickly and improve the final answer quality, while MCP’s formalized communication protocol enabled a multi-language, heterogeneous team to collaborate without depending on a single skill set. We built the solution in iterations — the first one worked “technically,” but not “product-wise.”

Version 1: A quick Python prototype (worked, but messy)

We started with a script that:

crawled pages;
extracted content;
generated JSON files with the extracted content, which were later processed by a separate utility that created embeddings and stored them in the vector database.

What went wrong:

We were embedding entire pages, including HTML and tags, which produced lots of irrelevant semantic noise;
We also hit practical issues around stable text-to-embedding conversion and consistency.

Result: the chatbot could answer, but its responses often felt fuzzy, overly broad, or based on the wrong fragment.

Version 2: A microservice-based pipeline (cleaner, scalable)

Once our microservices team got involved, we redesigned the ingestion approach: instead of “walking” through cross-links, the service used site APIs where possible. The microservice:

pulled content;
cleaned it;
split it into chunks;
embedded those chunks;
and pushed them into the vector database.

This alone improved relevance, because the model stopped “learning” navigation menus and repeated UI blocks.
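
The cleaning step can be approximated with Python's standard `html.parser` module. This is a minimal sketch and not the microservice's implementation: the set of tags treated as noise is an assumption and would need tuning for a real site.

```python
from html.parser import HTMLParser

# Assumed set of elements that carry no answer-relevant text.
NOISE_TAGS = {"script", "style", "nav", "header", "footer", "aside"}


class TextExtractor(HTMLParser):
    """Collects visible text while skipping everything inside noise tags."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in NOISE_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in NOISE_TAGS and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self._skip_depth:
            self.parts.append(text)


def clean_html(html):
    """Return the page text without markup, menus, scripts or styles."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)
```

Only the cleaned text is then passed on to chunking and embedding, so navigation menus and repeated UI blocks never reach the vector database.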

Designing the RAG Strategy

RAG success depends on two things:

How you chunk content
How you rank what you retrieved

Chunking experiments we tested

We explored multiple strategies, since manual, human-driven splitting is slow, hard to maintain with updates, and not a scalable approach.

What we explored included:

Sentence-window chunking
Semantic chunking (split/merge based on embedding similarity)
Hybrid approaches that combined fixed windows and semantic merging

Libraries and components we used in this exploration:

TextChunker (Microsoft.SemanticKernel.Text)
drittich.SemanticSlicer
SemanticChunker.NET
Custom strategies like SemanticDoubleParseMergeStrategy and WindowChunkStrategy

WindowChunkStrategy (the idea):

Take a 3-sentence window
Shift by one sentence → next window
Compare embeddings of neighboring windows
Merge them if they’re semantically close

This helped keep meaning intact without creating giant blobs of text. In the end, we settled on SemanticSlicer.
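
The window idea can be sketched in Python as follows. This is one simplified interpretation, not the custom WindowChunkStrategy itself: `toy_embed` is a bag-of-words stand-in for a real embedding model, the 0.8 threshold is arbitrary, and placing the split between the centre sentences of two dissimilar windows is an assumption.

```python
import math
import re
from collections import Counter


def toy_embed(text):
    """Stand-in embedding: word counts. Swap in a real embedding model."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a, b):
    dot = sum(a[term] * b[term] for term in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
        math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def window_chunks(sentences, embed=toy_embed, window=3, threshold=0.8):
    """Merge sentences while neighbouring windows stay semantically close."""
    n = len(sentences)
    if n <= window:
        return [" ".join(sentences)] if sentences else []
    # One vector per sliding window, shifted by one sentence each time.
    vectors = [embed(" ".join(sentences[i:i + window]))
               for i in range(n - window + 1)]
    chunks, start = [], 0
    for k in range(len(vectors) - 1):
        if cosine(vectors[k], vectors[k + 1]) < threshold:
            # Split between the centre sentences of the two windows.
            cut = k + window // 2 + 1
            chunks.append(" ".join(sentences[start:cut]))
            start = cut
    chunks.append(" ".join(sentences[start:]))
    return chunks
```

Neighbouring windows that stay above the threshold end up in the same chunk, so a chunk grows only as long as the topic stays stable.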

Final retrieval flow (the part that changed everything)

Our first implementation did a basic vector search. It worked, but not as well as we needed. So we added two key steps: query enrichment and reranking.

Here’s the simplified pipeline after the user asks a question in chat (example: “What expertise does your company have?”):

Query enrichment

Before embedding the query, we send it to the LLM that expands it semantically. Example of transformation: “the company’s expertise”→ “the company’s expertise, successful projects, clients, competencies, industries.” We now run retrieval twice: vector search for the original query and vector search for the enriched query.

Reranking (BM25 / lexical relevance)

The system retrieves 15 results from the enriched query and 10 from the original one, then reranks them with BM25 and selects the top 5 for response generation. This favors chunks with strong keyword overlap and rare, distinctive terms. Because the system is built as a modular microservice platform, all retrieval and ranking parameters are configurable on the fly to optimize relevance, performance, and scale. The top 5 chunks become the grounded context for the LLM answer.

This is where responses noticeably improved: tighter answers, fewer irrelevant citations, and better alignment with how humans actually ask questions.
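To make the merge-and-rerank step concrete, here is a self-contained Python sketch of Okapi BM25 scoring over the combined candidate set. It is an illustration under assumptions, not the production service: the function names, the `(chunk_id, text)` tuple shape, the `k1` and `b` defaults, and the choice of which query string to rerank against are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"\w+", text.lower())


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document against the query with Okapi BM25."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n if n else 0.0
    # Document frequency: in how many candidates each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    query_terms = set(tokenize(query))

    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            length_norm = k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores.append(score)
    return scores


def merge_and_rerank(query, original_hits, enriched_hits, top_k=5):
    """Merge two vector-search result lists, drop duplicates, rerank with BM25.

    Each hit is a (chunk_id, text) tuple; returns the top_k chunk ids.
    """
    merged = {}
    for chunk_id, text in enriched_hits + original_hits:
        merged.setdefault(chunk_id, text)  # keep the first copy of a duplicate
    ids = list(merged)
    scores = bm25_scores(query, [merged[i] for i in ids])
    ranked = sorted(zip(ids, scores), key=lambda pair: pair[1], reverse=True)
    return [chunk_id for chunk_id, _ in ranked[:top_k]]
```

In the flow described above, `enriched_hits` would hold the 15 results for the enriched query and `original_hits` the 10 results for the original one. Rare terms get a higher IDF weight, which is why chunks with distinctive keywords move up.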

Key Features Implemented

The following features form the core of the system, enabling reliable actions, accurate retrieval, and traceable knowledge grounding in real production use:

MCP tools for real actions

Within MCP, we implemented tools that the assistant can call programmatically: vector database search (semantic retrieval) and email sending (lead capture / follow-up).

A knowledge base built from real website content

Each chunk is traceable back to its source page. Re-indexing runs on a schedule (initially every few hours / up to twice per day depending on the pre-set configuration).

Hybrid retrieval quality improvements

We added two-pass retrieval (original + enriched query), BM25 reranking, and top-K selection to keep prompts small and relevant. In essence, this is how RAG works overall: its core task is to find the documents most relevant to a given query.

Overcoming Challenges

Content noise and “embedding the wrong things”

Early on, embedding raw page content caused the assistant to “learn” the wrong signals. Fix: cleaner extraction + chunking focused on meaningful text blocks.

Names and multilingual edge cases

Some user questions were not in English (e.g., “Who is the named person?”). Embedding models that aren’t strong in other languages can misfire. For instance, the system can retrieve a different person with the same name, simply because the vector similarity isn’t precise enough. Mitigation ideas we explored:

hybrid sparse+dense retrieval
adding extra signals for person-name detection

Evaluation and testing

Retrieval itself is deterministic for the same inputs; the non-determinism in the pipeline comes from query embedding generation and from how the LLM processes and phrases the retrieved context. We learned that the most stable testing approach is to validate:

retrieval correctness (did we fetch the right chunks?)
answer grounding (did the answer use the provided context?)
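Both checks can be expressed as simple metrics. The Python sketch below is a rough illustration, not the test suite used in the project: `retrieval_recall` assumes each test case lists the chunk ids that should be fetched, and `grounding_ratio` uses plain word overlap as a crude proxy for grounding. A real evaluation would more likely rely on semantic similarity or an LLM judge.

```python
import re


def words(text):
    return re.findall(r"\w+", text.lower())


def retrieval_recall(expected_ids, retrieved_ids):
    """Share of the expected chunks that were actually retrieved."""
    expected = set(expected_ids)
    if not expected:
        return 1.0
    return len(expected & set(retrieved_ids)) / len(expected)


def grounding_ratio(answer, context_chunks, min_word_length=4):
    """Share of the answer's longer words that also occur in the retrieved context.

    Short words are skipped as a cheap stop-word filter (an assumption).
    """
    context_words = set(words(" ".join(context_chunks)))
    answer_words = [w for w in words(answer) if len(w) >= min_word_length]
    if not answer_words:
        return 1.0
    return sum(w in context_words for w in answer_words) / len(answer_words)
```

A test case can then assert thresholds instead of exact strings, for example a recall of 1.0 for the expected chunks and a grounding ratio above some agreed minimum.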

Freshness and indexing latency

New content doesn’t become searchable instantly — ingestion takes time. We had to balance:

infrastructure load
the user expectation that “the chatbot should know what we posted today”
indexing frequency, which was fully configurable (as often as every 15 minutes)

Results and Business Impact

Even at this stage, the impact is already clear:

What changed for users

Visitors can ask questions in natural language instead of hunting through menus;
The chatbot provides faster discovery across blogs, articles, and updates;
The email tool enables a smoother lead flow: ask a question → provide missing details → send a message without the friction of a traditional form.

What changed for the business

Better content utilization: valuable pages are not being “buried” anymore;
Scalable approach: we can extend the same architecture beyond public pages;
Cost and privacy control: the system is built around retrieving only what’s needed, rather than pushing everything into external prompts.

What’s Next

This project is still evolving. The next steps are focused on reliability and scale:

Automated evaluation for retrieval + grounded answers
Improved multilingual handling (especially for names and short queries)
Stronger hybrid retrieval (sparse + dense) to reduce “false friends” in vectors
Finalizing infrastructure pieces (AI server and RAG collection setup)
Revisiting multi-site coverage (the US site may run its own MCP server)

Summary

This project demonstrates how a production-grade RAG architecture can turn an AI chatbot from a surface-level interface into a reliable knowledge access layer for a growing website. By combining structured content ingestion, semantic chunking, hybrid retrieval, and controlled LLM orchestration, we built a system that delivers accurate, grounded answers based on real content — not assumptions or hallucinations.

The solution improves content discoverability for users and establishes a foundation that can be extended to internal knowledge bases, multi-language environments, and data sources. Most importantly, it proves that high-quality AI assistants are not created by prompts alone, but by deliberate architectural decisions across data, retrieval, and infrastructure.

Need to tackle a similar challenge?

For organizations facing similar challenges with corporate content translation and localization, locally deployed AI models offer a powerful alternative to traditional translation methods, balancing autonomy, control, and performance in one integrated solution.

Andrey Kopanev, Senior .NET Developer, AI Enthusiast

Let’s talk about your project

Internal AI Deployment for Seamless Content Translation: A Real-Life Project Story

Natallia Sakovich — Mon, 28 Jul 2025 08:53:35 +0000

Key Takeaways

SaM Solutions implemented an AI-powered translation system to automatically translate more than a thousand internal CMS pages, addressing the operational challenges of manual translation in a growing multilingual environment.
A self-hosted large language model (LLM) was chosen to maximize data security, avoid subscription costs, run within the corporate intranet, and allow deep customization.
The team built a modular architecture integrated into the Umbraco CMS workflow using tools like Hangfire for background job scheduling, balancing automated translation with manual editorial checks where needed.
The system translated large content volumes in hours instead of the estimated weeks of manual work, showing efficiency gains and scalability.

In today’s multilingual work environments, fast and reliable localization is essential. At SaM Solutions, we recently tackled the challenge of translating a large volume of internal content into English by deploying an AI-driven solution. This case study details how our team leveraged a locally hosted large language model (LLM) to automate translation directly within our content management system (CMS), achieving efficiency, cost savings, and full data control.

Leverage AI to transform your business with custom solutions from SaM Solutions’ expert developers.

The Business Need

As SaM Solutions continues to grow internationally, the need for multilingual communication across departments has become more pressing. Our internal portal, powered by Umbraco CMS, serves as the central hub for news, articles, and corporate updates. With over a thousand pages of content in different languages (English, German, Polish, Lithuanian, etc.), we faced the operational challenge of ensuring this material would be accessible to all employees, regardless of the source or target language. This required a scalable solution for cross-translation between all corporate languages.

Manual translation was evaluated but quickly ruled out due to the volume of material and time constraints. We needed an automated solution that could be integrated directly into our corporate portal, preserve data privacy, and ensure quality outputs with minimal manual intervention.

Why We Chose a Locally Deployed LLM

Cloud-based translation services were not considered due to concerns over data confidentiality and ongoing subscription costs. Instead, we opted for a self-hosted LLM deployment. Here are some key benefits of this approach.

Data security

All processing occurs on internal infrastructure with no third-party exposure, preventing critical data leakage.

Lower long-term costs

After the initial on-premises setup, there are no recurring licensing fees or API subscription payments. You can use the model on a long-term basis after a one-time deployment.

Flexible configuration

You can configure every aspect of the system to match your specific needs. It supports locally deployed AI models as well as integrations with external providers like OpenAI, allowing you to choose, combine, or switch between models based on your tasks and infrastructure.

Intranet-based solution

The model runs entirely within the corporate intranet, providing fast, reliable access for internal teams without requiring an internet connection. This setup aligns with strict network policies and supports secure, uninterrupted operation across internal systems.

Full integration

A locally deployed AI model can be connected directly to the company’s internal applications by developing Model Context Protocol (MCP) integrations, so that it fits into different workflows.

Custom tuning

The system is fully configurable to adapt to evolving content and language needs.

Fine-tuning

With local deployment, we gain full control over the training process, enabling precise fine-tuning of the language model on our domain-specific data. This allows the AI to better understand internal terminology, writing style, and context-specific phrasing.

Selecting the Right Model

We evaluated several open-source LLMs with different quantization levels and various versions, including Qwen, Deepseek, Mistral Nemo, Phi4, and Gemma. Each was tested using the same prompt on a representative sample of articles. Our evaluation team assessed output quality across several dimensions: linguistic accuracy, tone consistency, handling of markup and abbreviations, and overall usability.

Gemma 3 emerged as the clear winner. It provided the most consistent and coherent translations and required the least post-processing. Based on these findings, we moved forward with Gemma as the foundation of our localization pipeline.

Technical Implementation

Architecture

We designed a modular and resilient architecture to integrate AI-powered translation seamlessly into our existing CMS infrastructure. The system identifies untranslated documents by scanning Umbraco metadata for content types such as News and Articles. Each qualifying document is assigned a discrete translation job, ensuring traceability and isolation of processing.

A conditional publishing flow was established:

News items are published automatically upon successful translation, provided the original-language version is published.
Articles are held in an unpublished state and submitted for manual review by designated editors. The reason is that articles are typically longer and often include specialized terminology (industry-specific or unique to our company), requiring a higher level of translation accuracy. This is especially important for corporate policies, ISO documentation, and other sensitive materials.

This approach balances automation with quality assurance, allowing rapid content delivery while maintaining editorial oversight where necessary.
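The conditional flow boils down to a small decision function. The Python sketch below only illustrates the rule described above. The `Document` fields and the returned labels are hypothetical and do not correspond to Umbraco API names.

```python
from dataclasses import dataclass


@dataclass
class Document:
    content_type: str        # "News" or "Article" (labels assumed for the example)
    source_published: bool   # is the original-language version published?


def publishing_decision(doc):
    """Decide what happens to a document after a successful translation."""
    if doc.content_type == "News" and doc.source_published:
        return "publish"              # short, time-sensitive content goes live
    if doc.content_type == "Article":
        return "hold_for_review"      # longer texts wait for an editor
    return "keep_unpublished"         # e.g. news whose original is not published
```

Keeping the rule in one place makes it easy to add further content types later without touching the translation job itself.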

Key tools

We selected Hangfire, a robust job scheduling library for .NET, to manage translation workflows. Hangfire provides:

Reliable background job execution
Retry logic for failed tasks
A built-in UI dashboard for monitoring and managing job status

To ensure secure and convenient access, we embedded the Hangfire dashboard directly into the Umbraco CMS interface and configured it with internal authentication controls.

To tailor Hangfire to our specific needs, we introduced several key customizations:

Extended logging capabilities: We integrated a third-party logging library with Hangfire to enable detailed monitoring and easier debugging of background tasks.
Task management extension: We developed additional functionality that allows us to manually add or restart specific tasks (such as translation jobs) directly within Hangfire. These controls were seamlessly embedded into the Hangfire dashboard, giving us better control and visibility over our job queue.

Translation jobs can be scheduled to run during off-peak hours to minimize resource contention and avoid disruptions to other internal processes. This allows us to maintain system performance while processing large volumes of content efficiently.

Overcoming Challenges

Throughout development and testing, several practical challenges emerged. We addressed each of them with targeted engineering decisions.

Handling long documents

Large texts occasionally exceeded the model’s optimal input length. To ensure stability, we implemented a segmentation mechanism that breaks content into chunks that fit within the selected model’s context window.
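A simple version of such segmentation groups paragraphs until a size budget is reached. The Python sketch below is an assumption-laden illustration: it uses a word count as a stand-in for the model's token count, splits on blank lines, and the `max_words` default is arbitrary.

```python
def split_for_translation(text, max_words=400):
    """Group paragraphs into chunks of at most `max_words` words.

    A single paragraph longer than the budget is kept whole; a production
    version would need a second-level split (for example by sentence).
    """
    chunks, current, count = [], [], 0
    for paragraph in text.split("\n\n"):
        size = len(paragraph.split())
        if current and count + size > max_words:
            chunks.append("\n\n".join(current))
            current, count = [], 0
        current.append(paragraph)
        count += size
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```

Splitting on paragraph boundaries keeps each chunk self-contained, and joining the translated chunks with blank lines restores the original layout.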

Managing complex formatting

Articles containing intricate markup or embedded HTML tags sometimes led to hallucinated or malformed output. Although such cases were rare, they prompted us to implement a post-translation validation step. This step checks formatting integrity and ensures consistency. If validation fails, the system automatically generates a log entry, flagging the content for manual review.
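One way to check formatting integrity is to compare the sequence of HTML tags before and after translation. The Python sketch below illustrates that idea and is not the validation step used in the project. The function names are hypothetical, and tag attributes are deliberately ignored.

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Records every opening and closing tag in document order."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(("open", tag))

    def handle_endtag(self, tag):
        self.tags.append(("close", tag))


def tag_sequence(html):
    collector = TagCollector()
    collector.feed(html)
    collector.close()
    return collector.tags


def markup_preserved(source_html, translated_html):
    """True when the translation contains the same tags in the same order."""
    return tag_sequence(source_html) == tag_sequence(translated_html)
```

A strict order comparison can raise false alarms when the target language legitimately reorders inline elements, so a failed check is best treated as a signal for manual review, which matches the flow described above.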

Abbreviation detection

Certain content comprised short strings of characters, such as acronyms or product codes, that do not require translation. A pre-processing filter was added to bypass these cases.
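Such a filter can be as small as one regular expression. The Python sketch below is a heuristic illustration only: the pattern (2 to 12 uppercase letters, digits, and a few separators) and the function name `needs_translation` are assumptions, and a real filter would need tuning against actual content.

```python
import re

# Short, all-uppercase or numeric strings such as acronyms and product codes.
# The length limit and allowed separators are assumptions for this sketch.
CODE_PATTERN = re.compile(r"[A-Z0-9][A-Z0-9\-_/.]{1,11}")


def needs_translation(text):
    """Return False for empty strings and for strings that look like codes."""
    stripped = text.strip()
    if not stripped:
        return False
    return CODE_PATTERN.fullmatch(stripped) is None
```

The heuristic also skips ordinary words written in capitals, so it trades a few missed translations for fewer wasted model calls.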

Prompt tuning

While prompt quality is critical to translation fidelity, we found that even well-optimized prompts could yield unpredictable results. We continue to refine prompts based on observed edge cases.

Retry logic

If a translation attempt fails or returns incomplete content, the job is automatically retried up to three times. Failures are logged for diagnostics.
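The retry rule can be sketched independently of the job scheduler. In the Python illustration below, "retried up to three times" is read as one initial attempt plus three retries, which is an interpretation. The `translate` callable, the `is_complete` check, and the function name are hypothetical, and the project itself relies on Hangfire for retries.

```python
def translate_with_retries(translate, text, max_retries=3, is_complete=bool):
    """Call `translate(text)` until it returns complete output or retries run out.

    Returns (result, failures): result is None when every attempt failed,
    and failures lists one diagnostic message per failed attempt.
    """
    failures = []
    for attempt in range(1, max_retries + 2):  # initial attempt + retries
        try:
            result = translate(text)
        except Exception as exc:  # a model or network error counts as a failure
            failures.append(f"attempt {attempt}: {exc}")
            continue
        if is_complete(result):
            return result, failures
        failures.append(f"attempt {attempt}: incomplete output")
    return None, failures
```

The collected `failures` list is what would be written to the log for diagnostics.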

Post-processing checks

Completed translations are scanned for indicators of failure, such as mixed-language output or untranslated segments. These are flagged for manual review to ensure quality control.
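Two cheap indicators are sentences copied verbatim from the source and an unusual share of non-ASCII letters in English output. The Python sketch below illustrates both under stated assumptions: the target language is English, sentences are split naively on end punctuation, and the `min_words` and `max_ratio` defaults are arbitrary. It is not the project's actual check.

```python
import re


def split_sentences(text):
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def untranslated_segments(source, translation, min_words=4):
    """Sentences that appear unchanged in both the source and the translation."""
    source_sentences = set(split_sentences(source))
    return [
        s for s in split_sentences(translation)
        if s in source_sentences and len(s.split()) >= min_words
    ]


def non_ascii_letter_ratio(text):
    """Share of letters outside ASCII, a rough hint of leftover source text."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    return sum(ord(ch) > 127 for ch in letters) / len(letters)


def needs_review(source, translation, max_ratio=0.02):
    return (
        bool(untranslated_segments(source, translation))
        or non_ascii_letter_ratio(translation) > max_ratio
    )
```

Names with diacritics will occasionally trip the second check, which is acceptable when the outcome is a manual review and not an automatic rejection.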

Performance Metrics

During the initial rollout, the system demonstrated solid performance and processing consistency:

Initial batch processed: more than 1,300 documents successfully translated
Average translation time per document (under 1K words): approximately 30 seconds
Large documents (over 1K words): typically completed in around 2 minutes
Rare outliers: peak durations of 5 to 6 minutes
Operational efficiency: Translation time for the full corpus was reduced from an estimated 2.5 weeks of manual work to a few hours.

This level of performance met our expectations and confirmed the feasibility of ongoing automated localization for internal content workflows.

What’s Next?

To further streamline content localization, we are planning to develop a dedicated plugin for the Umbraco CMS. This plugin will introduce a “Translate with AI” button directly into the editorial interface, allowing users to initiate translation tasks with a single click.

The solution will support both locally hosted and external LLMs, giving editors the flexibility to select the most suitable engine for their needs. Once completed, we plan to release the plugin to the broader community via the official Umbraco marketplace.

Summing Up

This project demonstrates how AI can be deployed responsibly and effectively to solve practical business problems. By combining careful model selection, strong system architecture, and thoughtful integration into existing workflows, we delivered a secure and scalable solution that improves our internal operations while preparing us for future localization demands.

Need to tackle a similar challenge?

Andrey Kopanev, Senior .NET Developer, AI Enthusiast

Let’s talk about your project

How We Built a Risk Management System for an International Company: A Real-Life Project Story

Natallia Sakovich — Fri, 16 May 2025 15:24:41 +0000

Managing risks across multiple international branches while staying compliant with internal security policies is no easy task. That was exactly the challenge our client, a global enterprise with operations spanning several continents, was facing. They needed a centralized Risk Management System that could tie together their security framework, user access rules, and workflow automation across the board.

SaM Solutions’ development team built a custom enterprise risk management (ERM) module based on .NET, which became part of a larger system, to bring visibility, control, and efficiency to how the company handles risks.

Read on to discover how we approached this project step by step. And if you’re tackling something similar, we’d be happy to talk.

Rely on SaM Solutions’ vast expertise in .NET development to deliver your .NET-powered product or custom software.

Client’s Business Request

Our client is a global technology leader in the fields of electrification and automation, operating in over 140 countries with a workforce of around 160,000 employees. With such a massive international presence, they needed smarter tools to manage complexity, especially when it came to risk management and internal workflows.

One of the key challenges was the legacy setup: many processes were built around Lotus Notes forms that had become outdated and difficult to maintain. The client saw this as an opportunity not just to migrate away from the old system, but to rethink and modernize their entire approach.

At the top of the priority list was building a centralized Risk Management Module — a solution that would work across all branches worldwide and integrate tightly with the company’s security systems, user roles and permissions, and approval workflows. The system also needed to provide clear visualizations and intuitive organization of risks, making it easier for teams to assess and act on risk data.

But that was just part of the picture. Our task also included:

Developing an IT solution for the company’s R&D center, tailored to their internal processes and global collaboration needs.
Building a separate module for reporting, manpower budgeting, cross-budgeting, and multistage approval workflows to support financial planning and transparency across departments.

Understanding the Client’s Needs

When we started working with the client, it was clear they weren’t just looking for a new system — they were looking for a way to regain control over a set of critical processes that had become increasingly difficult to manage at scale.

It wasn’t just about replacing a legacy system. They wanted to rethink how risk and process management worked across their global structure.

Like many large enterprises, they were dealing with the following hurdles.

Fragmented data structure: Dozens of unorganized Lotus Notes forms were scattered across departments and countries, each with its own logic and purpose. There was no single source of truth for managing risks.
Access issues: User permissions were inconsistent. Some users had too much access, while others didn’t have enough, leading to both security concerns and bottlenecks in daily operations.
Data integrity problems: Without a unified system, keeping data accurate, up-to-date, and synchronized across branches was a constant challenge. This increased the risk of errors and compliance violations.
Limited visibility: There was no way to manage and prioritize risks visually. Managers couldn’t easily see where issues were occurring, how they were being handled, or what the current risk landscape looked like.
Manual and inconsistent approvals: Especially in the R&D division, approval workflows were overly complicated and handled manually, slowing down projects and introducing unnecessary friction.
No unified risk management strategy: Each branch had its own way of assessing and responding to risks, which led to inconsistencies in how security and compliance standards were applied globally.

All of these issues were dragging the company down — slowing operations and creating unnecessary risks, both operational and regulatory. What they really needed was a single, flexible system that could bring everything together, work across all their international teams, and still be simple enough for anyone to use, both technical and non-technical staff.

Exploring Possible Solutions

At the start of the project, we explored a few different technology options. A complex, enterprise-wide, and security-focused system like this could technically be built using Java frameworks, Python-based back ends, PHP solutions, or even low-code enterprise platforms. Each option had its merits. But with so many moving parts, integration requirements, and a need for long-term scalability and support, we knew the choice of tech stack would have a big impact on the project’s success.

Why .NET Was the Best Option for the Project

In the end, .NET was the clear winner, and not just because it can handle large enterprise-level data efficiently and has built-in authentication and role-based access controls.

The client already had several internal systems running on .NET and a well-established IT department with .NET expertise. They also used Microsoft technologies across the organization, which made integration and ongoing support much smoother. By choosing .NET, we were able to build on their existing ecosystem, ensure maintainability due to long-term support from Microsoft, and keep development efficient by working closely with their in-house team. For this project, continuing the series of .NET-based solutions just made sense.

How We Ran the Process

With the technology stack in place and goals aligned, we followed an agile development process, working in sprints and keeping communication open with the client’s stakeholders. This allowed us to stay adaptable, deliver early value, and continuously refine the system based on real-world feedback.

To deliver a system of this scale and complexity, we assembled a cross-functional team, including:

2 Solution Architects (from the client’s side) — responsible for the overall system design and integration strategy
2 .NET Developers — focused on back-end development and business logic
3 Full-Stack Developers — implemented UI components and data visualizations
1 QA Engineer — handled test automation and manual testing across environments

This team worked in close collaboration with the client’s internal IT department, which also had .NET specialists who were gradually onboarded for support and future development.

Key features implemented

Risk management module (CRUD)

We built a full-featured module to manage risk data. Users can create, edit, and track risks at various organizational levels, with filtering options by region, department, and category. Risk categories in our module include:

External risks arising from outside the organization, often beyond direct control.

Geopolitical instability (e.g., sanctions, war), regulatory changes (e.g., GDPR, tax laws), market shifts or competitor disruption, supply chain breakdowns, currency fluctuations.

Operational risks related to internal processes, systems, or daily operations.

IT system outages or software bugs, manufacturing defects, cyberattacks and data breaches, process delays or human error, inadequate maintenance or equipment failure.

People and culture risks tied to workforce, leadership, and organizational culture.

Talent shortage or high turnover, resistance to organizational change, misaligned global teams, employee burnout or disengagement, leadership conflicts.

Finance and organization risks impacting financial stability and structural alignment.

Budget overruns or inaccurate forecasts, fraud or financial misconduct, compliance failures in reporting, inefficient organizational structure, revenue decline from market or internal issues.

Risk visualization

We developed an interactive interface for visualizing risks, grouped by categories, geographies, severity, and status. Color-coded dashboards, heat maps, and trend indicators help users quickly identify emerging threats and high-priority areas.

Reporting module

A separate module allows users to generate reports for internal use, audits, or compliance reviews. Reports can be exported in various formats and filtered by different organizational levels and risk parameters.

Flexible access control system

We implemented a dynamic, role-based access model that aligns with the client’s organizational hierarchy. Access rights can be configured at object level (e.g., specific risks or reports), ensuring users only see what’s relevant to them.
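To illustrate what object-level, scope-aware access can look like, here is a small Python sketch (the real system is built on .NET, and its permission model is not described in detail). The `Risk` and `Grant` types, the `"*"` wildcard, and the action names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Risk:
    risk_id: int
    region: str
    department: str


@dataclass(frozen=True)
class Grant:
    """Permission to perform `action` on risks within a region/department scope."""
    action: str             # e.g. "read" or "edit"
    region: str = "*"       # "*" means any region
    department: str = "*"   # "*" means any department

    def covers(self, action, risk):
        return (
            self.action == action
            and self.region in ("*", risk.region)
            and self.department in ("*", risk.department)
        )


def can(grants, action, risk):
    """A user may act on a risk if any of their grants covers it."""
    return any(grant.covers(action, risk) for grant in grants)


def visible_risks(grants, risks):
    """Filter a list of risks down to what the user is allowed to read."""
    return [risk for risk in risks if can(grants, "read", risk)]
```

In this model a role is just a named set of grants, so region-specific and department-specific access can be combined without adding new role types.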

IT solution for the R&D center

We also built a dedicated internal system tailored for the company’s R&D center, streamlining project tracking, approvals, and internal communications. The module was designed with flexibility in mind to accommodate cross-functional teams and iterative research workflows.

Manpower budgeting and cross-budgeting module

To support broader operational planning, we delivered a budgeting tool that helps departments manage manpower forecasting, cost planning, and resource allocation. The system supports cross-budgeting scenarios between departments and includes multistage approval chains, making it easier to align financial plans across the organization.

Integration with workflow and approval processes

We connected the risk module with the company’s internal workflow engine, enabling automated routing of risk items through various approval stages. This helped standardize decision-making and reduce manual back-and-forth.

The entire system is built on .NET Core, with an SQL Server back end and a JavaScript front end based on the MVC pattern. We used CI/CD pipelines for streamlined deployments and hosted the system on Azure to ensure global availability, performance, and security.

Overcoming Challenges During the Development Process

Like any large-scale enterprise project, this one came with its own set of technical and organizational challenges. Here’s how we tackled the most critical ones.

Designing a flexible risk visualization component

One of the key requirements was to give users a visual understanding of risks across the organization. But with so many variables — regions, categories, severity levels, timelines — we needed to build a visualization component that was both flexible and user-friendly.

Our team designed a modular, interactive interface that allows risks to be grouped, filtered, and color-coded in real time. The final result gave users the ability to instantly grasp their risk landscape — without being overwhelmed by data.

Fine-tuning access control with the client

Implementing access rights wasn’t just about roles; it had to reflect the client’s organizational structure, operational models, and internal policies. We collaborated with the client to map their real-world hierarchy into a flexible permission system that could handle region-specific and department-specific access.

Through several iterations, we arrived at a model that was granular enough to meet security needs, but still easy to manage from an admin perspective.

Migrating data from legacy systems

The client had years’ worth of risk-related data stored across various Lotus Notes forms and spreadsheets. Migrating this into the new system while maintaining accuracy and relationships between records was a major task.

We created a custom migration pipeline to clean, transform, and import the data into the new structure, testing thoroughly to ensure data integrity at every step. This allowed users to start working in the new system without losing historical context.

Enabling real-time updates without extra overhead

In risk management, timing is everything — risks can appear or change daily. The system needed to support frequent updates without creating friction for users or overload for the support team.

To solve this, we focused on minimizing manual effort through smart defaults, inline editing, and change tracking. We also designed the system to match expected performance metrics and developed a disaster recovery plan.

Results and Business Impact

By the end of the project, the client had a centralized, flexible system that fully met their needs for managing risk across a global enterprise. What started as a fragmented collection of legacy forms evolved into a modern, integrated platform that supports everything from daily operations to high-level strategic decision-making.

The new solution brought several key benefits:

A single source of truth for risk data across all branches and departments
Improved transparency and accountability thanks to clear workflows and visualizations
Faster response to emerging risks, with real-time updates and streamlined approvals
Stronger security and compliance through role-based access and audit trails
Reduced manual work and more efficient collaboration between teams

The R&D and budgeting modules gave internal teams the tools they needed to plan smarter, work more efficiently, and stay aligned across functions and geographies.

For the client, this wasn’t just a new system — it was a strategic upgrade that set the foundation for future innovation and growth.

Need to tackle a similar challenge?

If your organization is facing similar challenges with risk management, process automation, or system modernization — our team can help. At SaM Solutions, we combine deep technical expertise with a practical, business-oriented approach to deliver solutions that scale.

Dzmitry Verasau, Chief .NET Technologist

Let’s talk about your project