Why Post-RAG?
Semantic Reach describes itself as post-RAG: not another variation of Retrieval Augmented Generation, but what comes after it. Why do we feel justified in that label? We don't mean to be boastful or dismissive of the great work done in the RAG space. We simply believe our technology synthesizes the critical elements in a way that puts an end to the "soup" of countless RAG flavors, and points toward what many RAG approaches have been reaching for all along: a unified, AI-native data layer. First, let's look at the current issues with RAG, and then at why we believe we've definitively resolved them.
Sampling the RAG Soup
Retrieval Augmented Generation (RAG) has emerged as the go-to approach for deploying generative AI in the enterprise: it grounds generation in retrieved documents, yielding more factual and domain-relevant answers without the cost and finickiness of fine-tuning. But naive RAG, which essentially consists of semantic search over a conventional vector database, does not deliver the precision and factuality most enterprises need. This has prompted an ever-growing list of alternative RAG setups, each with its own strengths and drawbacks. To get a feel for the "RAG soup," here's a menu of some RAGs:
- Naive RAG — the simplest form: retrieve the top documents and feed them to the generator, with no extra bells and whistles (see the minimal sketch after this list).
- Advanced RAG — adds improved steps: e.g. query rewriting, reranking of retrieved docs, maybe filtering.
- Agentic RAG — a RAG with an “agent-like” behavior: plans, decides what to investigate, may call tools, etc.
- Graph RAG — uses relationships across entities, knowledge graphs, etc., to assist retrieval or guide what to fetch. Currently in vogue.
- Multimodal RAG — retrieving and using non-textual content (images, videos, audio) in addition to text.
- Speculative RAG — drafts candidate answers in parallel with a smaller specialist model over subsets of the retrieved documents, then has a larger model verify and select the best draft.
- Corrective RAG — adds a retrieval evaluator that grades the retrieved documents and triggers corrective actions (e.g., re-retrieval or a web search) when they look unreliable.
- Modular RAG — different components are modular / replaceable (retriever, reranker, generator, etc.), so you can customize.
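To make the base case concrete, here's a minimal sketch of the naive RAG loop. The `embed`, `vector_index.search`, and `llm.generate` calls are hypothetical stand-ins for whatever embedding model, vector store, and language model you happen to use:

```python
# Minimal naive RAG loop. `embed`, `vector_index`, and `llm` are hypothetical
# stand-ins for your embedding model, vector store, and language model.

def naive_rag(question: str, embed, vector_index, llm, k: int = 5) -> str:
    query_vec = embed(question)                  # embed the user question
    docs = vector_index.search(query_vec, k=k)   # top-k most similar chunks
    context = "\n\n".join(d.text for d in docs)  # stuff them into the prompt
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    return llm.generate(prompt)
```

Every other entry on the menu above is essentially a patch on some stage of this loop: rewrite the query, rerank the documents, wrap the loop in an agent, or check the output afterwards.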
That menu is not exhaustive. On its face, a proliferation of techniques with no clear winner is a red flag that something at the root isn't right. Take SQL for contrast: aside from a few dialects and spinoffs, there's really only one SQL. It works so well in its basic form that there's no reason to overcomplicate it. With RAG we don't see a clear winner, just whatever is trending plus endless experiments. Graph RAG is currently on the rise, but its limits, too, may be revealed in time. For stakeholders this creates a confusing picture. Which RAG do I select? Why invest in this approach if a new, marginally better one will be proposed next month?
That raises the question: what is the root problem with RAG that warrants all these imperfect remixes? We think the answer boils down to the lack of a universal representation.
The Root Problem with RAG
Conventional vector embeddings, as used in vector databases, are good for unstructured data but lack precision and fail to capture relationships. Structured data is orphaned from the flow entirely, demanding separate handling, which fragments backends. Graphs (specifically knowledge graphs and graph databases) were introduced for more fine-grained retrieval, but they add another representational layer on top of the vector store, which you still need for semantic search. Any way you slice it, you have to blunt one edge to sharpen another. Data ends up fragmented, or cast into an arbitrary form just to get things that should come for free. Leave the embedding space and you multiply abstractions, losing the parallelization and AI-native efficiency of working with embeddings. Stay with embeddings, so the story goes, and you're stuck with similarity operations and can't get more precise reasoning.
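A caricature of where this leads in practice: one question fans out to three stores with three different query models, and the application is left to glue the results back together. Every client name below is a hypothetical stand-in.

```python
# Caricature of the fragmented hybrid stack; every client here is a
# hypothetical stand-in. Three stores, three query models, plus glue code.

def hybrid_answer(question, entity, embed, vector_db, graph_db, sql_db, llm):
    chunks = vector_db.search(embed(question), k=5)  # similarity only
    neighbors = graph_db.query(                      # relationships only
        "MATCH (e)-[r]->(n) WHERE e.name = $name RETURN n", name=entity)
    rows = sql_db.execute(                           # exact figures only
        "SELECT region, SUM(revenue) FROM orders GROUP BY region")
    # The application layer must reconcile three different result formats.
    context = "\n\n".join(map(str, [*chunks, *neighbors, *rows]))
    return llm.generate(f"Context:\n{context}\n\nQuestion: {question}")
```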
Semantic Reach: The Post-RAG Solution
That's where Semantic Reach comes in. We've figured out how to do everything you want from a RAG service over one cohesive, compositional embedding space: precise calculations over structured data, semantic search, and graph-like multi-hop reasoning, plus things you didn't even know you wanted, like learnable representations, crystal-clear transparency and traceability, near-constant-time performance regardless of query complexity, and auto-associative memory characteristics. Vector algebra elegantly handles all of this, and you never have to layer on more abstractions. You get a single, differentiable vector space that is symbolically structured according to well-established mathematical principles, backed by decades of research in the hyperdimensional computing literature. So vector embeddings are in fact all you need after all.
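We're not showing Semantic Reach's internals here, but the core idea from the hyperdimensional computing literature is easy to demonstrate with plain numpy: random high-dimensional bipolar vectors can encode a structured record through binding (element-wise multiplication) and bundling (superposition), and a field can be recovered by unbinding followed by a nearest-neighbor cleanup. A toy sketch, not our implementation:

```python
import numpy as np

D = 10_000  # high dimensionality is what makes the algebra work
rng = np.random.default_rng(0)
rand_vec = lambda: rng.choice([-1, 1], size=D)  # random bipolar hypervector

# Codebook of atomic symbols: roles (field names) and fillers (values).
symbols = {s: rand_vec() for s in
           ["name", "role", "city", "alice", "engineer", "berlin"]}

bind = lambda a, b: a * b  # element-wise multiply; its own inverse

# A whole record, encoded as ONE vector in the same space as everything else.
record = np.sign(bind(symbols["name"], symbols["alice"])
                 + bind(symbols["role"], symbols["engineer"])
                 + bind(symbols["city"], symbols["berlin"]))

# Query "what is the city?": unbind the role, then clean up the noise
# with a nearest-neighbor search over the codebook.
noisy = bind(record, symbols["city"])
print(max(symbols, key=lambda s: symbols[s] @ noisy))  # -> berlin
```

Because every step is ordinary vector arithmetic, batches of such queries parallelize like any other embedding workload, which is the property the paragraph above is pointing at.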
We take things a step further by applying machine learning over this embedding space, so that data is dynamically associated and interconnected and the system gets smarter with usage. This creates what we call a "data brain": the data store becomes a living memory of how you use it. That opens the door to agentic systems that continuously learn and adapt to evolving workflows.
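Our learning rule isn't public, but classical auto-associative memory gives a flavor of what a "living memory" can mean: with a simple Hebbian outer-product update, a memory matrix re-completes patterns it has stored when given partial or noisy cues. A toy illustration, not our implementation:

```python
import numpy as np

D = 2_000
rng = np.random.default_rng(1)
M = np.zeros((D, D))  # the associative memory

def remember(pattern):
    """Hebbian outer-product update: usage strengthens associations."""
    global M
    M = M + np.outer(pattern, pattern) / D

def recall(cue, steps=5):
    """Iteratively settle a partial cue toward the nearest stored pattern."""
    x = cue.copy()
    for _ in range(steps):
        x = np.sign(M @ x)
    return x

stored = rng.choice([-1, 1], size=D)
remember(stored)

cue = stored.copy()
cue[: D // 3] = rng.choice([-1, 1], size=D // 3)  # corrupt a third of the cue
print(np.mean(recall(cue) == stored))  # -> 1.0: the memory completes it
```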
Vector embeddings (albeit of a compositional variety) are the best choice for AI applications because they are the lingua franca of AI: the native tongue of language models. By using this representation we bring data backends into closer alignment with the models and the agents built on top of them. Once the view that vector embeddings are only good for unstructured data and are inherently imprecise is disproven, as we believe our results show, there's no more reason to run in circles looking for the next iteration of RAG. We see those results as a strong step toward resolving these tensions, and we're committed to deepening that validation.
The result is a unique system that:
- Natively computes structured queries (we ran it against TPC-H, the decades-old industry standard for benchmarking relational databases!)
- Natively emulates graph databases without leaving the embedding space (see the sketch after this list)
- Uniquely benefits from GPU acceleration, so your database can do some of the thinking for you
- Learns associations, providing a "second brain" for agents
- And much more; we are continually surprised by what we find this system can do.
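To unpack the graph-emulation bullet (again a generic vector-symbolic sketch in the style of the hyperdimensional computing literature, not our engine): encode each edge by binding subject and relation to the object, bundle all edges into a single graph vector, and answer a multi-hop query with repeated unbind-and-cleanup steps.

```python
import numpy as np

D = 10_000
rng = np.random.default_rng(2)
vec = lambda: rng.choice([-1, 1], size=D)

# One codebook for entities and relations alike.
codebook = {s: vec() for s in
            ["alice", "acme", "berlin", "works_at", "located_in"]}
bind = lambda a, b: a * b  # self-inverse binding

# The whole graph as one vector:
#   alice -works_at-> acme,  acme -located_in-> berlin
graph = np.sign(
    bind(bind(codebook["alice"], codebook["works_at"]), codebook["acme"])
    + bind(bind(codebook["acme"], codebook["located_in"]), codebook["berlin"]))

def cleanup(noisy):
    """Snap a noisy result vector to the nearest known symbol."""
    return max(codebook, key=lambda s: codebook[s] @ noisy)

# Two-hop query: in which city does alice's employer sit?
employer = cleanup(bind(graph, bind(codebook["alice"], codebook["works_at"])))
city = cleanup(bind(graph, bind(codebook[employer], codebook["located_in"])))
print(employer, city)  # -> acme berlin
```

Each hop costs a fixed number of vector operations (capacity permitting), which is the sense in which multi-hop reasoning never has to leave the embedding space.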
Seeing that we're saturating benchmarks like LongMemEval and LoCoMo, we think we're earning the right to describe ourselves as post-RAG. We plan to publish more benchmarks and demos to further support the claim that our solution is the universal one the industry seeks.