This week I was tasked with proving that RAG could work for one of my clients. I must confess I was very sceptical about the possible outcomes, but after four hours my view on Retrieval Augmented Generation (RAG) had changed a lot.
Let's start with the basics: what is RAG?
Retrieval Augmented Generation (RAG) is an advanced framework in natural language processing (NLP) that combines retrieval-based techniques with generative models. It is designed to enhance the quality and accuracy of responses by retrieving relevant knowledge from external sources before generating an output. This approach is particularly effective in scenarios where the generative model alone may not have enough factual or up-to-date information.
How RAG Works
1. Retrieval Component:
When a query or input is given, the system searches an external knowledge base (like a database, document store, or even the web) to find relevant information.
This retrieval component can use models such as BM25 or dense vector search (e.g., using embeddings generated by a model like BERT and indexed with a library like FAISS).
2. Augmented Generation Component:
The retrieved information is then fed into a generative language model (like GPT or BART).
The generative model uses this additional context to produce more accurate, relevant, and factual outputs.
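To make the two components concrete, here is a minimal, self-contained Python sketch. It is illustrative only: the retriever is a naive keyword-overlap ranker standing in for BM25 or vector search, and call_llm is a hypothetical placeholder for whatever generative model you actually use.

```python
# Minimal RAG sketch (illustrative only). The toy retriever ranks documents by
# keyword overlap; a real system would use BM25 or dense embeddings, and
# call_llm is a placeholder for the actual generative model call.

KNOWLEDGE_BASE = [
    "RAG retrieves relevant documents before generating an answer.",
    "Salesforce Data Cloud indexes structured and unstructured content.",
    "Vector search finds semantically similar text via embeddings.",
]

def retrieve(query: str, docs: list[str], top_k: int = 2) -> list[str]:
    """Rank documents by naive keyword overlap with the query."""
    q_terms = set(query.lower().split())
    scored = sorted(docs, key=lambda d: len(q_terms & set(d.lower().split())), reverse=True)
    return scored[:top_k]

def call_llm(prompt: str) -> str:
    """Placeholder for a real LLM call (e.g., an API request)."""
    return f"[LLM response grounded in a prompt of {len(prompt)} chars]"

def rag_answer(query: str) -> str:
    # Retrieval step: fetch relevant context, then augment the prompt with it.
    context = "\n".join(retrieve(query, KNOWLEDGE_BASE))
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
    return call_llm(prompt)

print(rag_answer("How does vector search work?"))
```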
When Did It Start?
Salesforce has begun integrating Retrieval-Augmented Generation (RAG) within its ecosystem, specifically in tools like Einstein GPT and Data Cloud. RAG enhances AI-generated outputs by grounding responses with real-time, contextually relevant information retrieved from Salesforce’s structured and unstructured datasets. This is especially useful because large language models (LLMs) are often trained on static data, limiting their capacity to reflect recent or proprietary knowledge.
How Salesforce Uses RAG
Data Cloud Integration: In Salesforce Data Cloud, RAG retrieves content such as customer interactions, cases, emails, and knowledge articles. This information is indexed and stored using advanced techniques like vectorization, which enables semantic search for efficient retrieval.
Prompt Augmentation in Einstein GPT: Through tools like Prompt Builder, prompts can dynamically include relevant data fetched from the indexed knowledge store. This means the system generates highly contextual responses—crucial for applications like customer service or business insights—without requiring constant retraining of the underlying AI models.
Search Optimization and Flexibility: Salesforce supports vector and hybrid searches (which combine keyword and vector search), allowing users to fine-tune retrieval for specific queries. Developers can create custom retrievers to focus on particular data subsets, optimizing the relevance of responses.
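Salesforce doesn't publish its internal ranking algorithm, but the general idea behind hybrid search can be sketched in a few lines of Python: blend a keyword-relevance score with a vector-similarity score using a tunable weight. The alpha parameter and hybrid_score function below are my own illustrative names, not a Salesforce API.

```python
# Hedged sketch of hybrid scoring: not Salesforce's actual algorithm, just the
# common pattern of blending keyword relevance with semantic similarity.
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def hybrid_score(keyword_score: float, query_vec: list[float],
                 doc_vec: list[float], alpha: float = 0.5) -> float:
    """Blend keyword and vector scores; alpha tunes the mix per query type."""
    return alpha * keyword_score + (1 - alpha) * cosine(query_vec, doc_vec)
```

Tuning alpha toward 1.0 favours exact keyword matches (good for IDs or product names), while tuning it toward 0.0 favours semantic matches (good for natural-language questions).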
By embedding RAG, Salesforce offers enhanced performance in areas like chatbots, virtual assistants, and case management systems, improving both the accuracy and trustworthiness of responses. This integration reduces the need for AI fine-tuning and provides more personalized insights through proprietary data augmentation.
How Do You Make It Work?
Offline Preparation
To implement RAG, start by connecting the structured and unstructured data that RAG will use to ground LLM prompts. Data Cloud uses a search index to manage structured and unstructured content in a search-optimized way.
Offline preparation involves the following tasks in Data Cloud:
Connect your unstructured data.
Create a search index configuration that chunks and vectorizes the content. Data Cloud supports two search options: vector search and hybrid search (beta); hybrid search combines vector and keyword search. Chunking breaks the text into smaller units that reflect passages of the original content, such as sentences or paragraphs. Vectorization converts chunks into numeric representations of the text that capture semantic similarities (see the sketch after this list).
Store and manage the search index.
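Outside of Data Cloud, the same chunk-and-vectorize step looks roughly like the following Python sketch. The paragraph-based chunking and the all-MiniLM-L6-v2 embedding model are assumptions chosen for illustration; Data Cloud makes these choices for you behind the scenes.

```python
# Offline indexing sketch: chunk text into paragraphs and vectorize with an
# open-source embedding model. Illustrative only; the model choice is an
# assumption, not what Data Cloud uses internally.
from sentence_transformers import SentenceTransformer  # pip install sentence-transformers

def chunk(text: str) -> list[str]:
    """Split content into paragraph-level chunks, dropping empty ones."""
    return [p.strip() for p in text.split("\n\n") if p.strip()]

model = SentenceTransformer("all-MiniLM-L6-v2")

document = "First passage about customer cases.\n\nSecond passage about knowledge articles."
chunks = chunk(document)
vectors = model.encode(chunks)          # one embedding per chunk
index = list(zip(chunks, vectors))      # toy in-memory "search index"
print(f"Indexed {len(index)} chunks of dimension {vectors.shape[1]}")
```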
Integration and Run-time Use in Prompts
Once your offline preparation is complete, the final step is to embed the retriever in a prompt template and, optionally, to further customize search settings for that particular prompt.
Each time a prompt template with a retriever is run:
The retriever is invoked with a dynamic query initiated from the prompt template.
The query is vectorized (converted to numeric representations). Vectorization enables search to find semantic matches in the search index (which is already vectorized).
The query retrieves the relevant context from the indexed data in the search index.
The original prompt is populated with the information retrieved from the search index.
The prompt is submitted to the LLM, which generates and returns the prompt response.
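Expressed outside the platform, those run-time steps look roughly like this Python sketch. It reuses the index and model built in the offline sketch above, and call_llm is again a hypothetical placeholder rather than a real API.

```python
# Run-time sketch of the steps above: vectorize the query, find semantic
# matches in the index, populate the prompt, and submit it to the LLM.
import numpy as np

def call_llm(prompt: str) -> str:
    return "[grounded LLM answer]"      # stand-in for the actual model call

def run_prompt(query: str, index, model, top_k: int = 2) -> str:
    q_vec = model.encode(query)         # vectorize the dynamic query
    # Score every indexed chunk by cosine similarity to the query vector.
    scores = [
        (float(np.dot(q_vec, v) / (np.linalg.norm(q_vec) * np.linalg.norm(v))), c)
        for c, v in index
    ]
    # Retrieve the top matches and populate the original prompt with them.
    context = "\n".join(c for _, c in sorted(scores, reverse=True)[:top_k])
    prompt = f"Context:\n{context}\n\nQuestion: {query}"
    return call_llm(prompt)             # submit the grounded prompt to the LLM
```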
As you can see, working with RAG is not really complicated. The hard part of the process was the offline work of ingesting the right data. I really recommend trying it, and if you need a small proof of concept or proof of technology, please don't hesitate to reach out to me directly.
#salesforce #ai #agentforce #wellarchitected #rag