Content Retrievers

Once you have stored your Documents, Content Retrievers are how you get that content back out and inject it into your prompts.

Adding a retriever to your agent

The @PeoplelogicAgent annotation supports a single Content Retriever per agent. Pass the retriever's Spring bean name to the annotation's contentRetriever parameter, and the agent will automatically search that retriever each time you query it.

@PeoplelogicAgent(value="thirdAgent",
        name = "Third Agent", contentRetriever = "apmContentRetriever",
        persona = "Funny but a bit snarky")
@PeoplelogicAgentInstructions("Your job is to say the current date.")
public interface ThirdAgent extends WorkerAgent {
    @SystemMessage(BASE_WORKER_PROMPT)
    Result<PeoplelogicResult> acceptWork(@MemoryId String userId, @UserMessage String query, @V("PreviousResponse") String agentResponse);
}

You can find the bean names of the built-in retrievers below.

A future version will add support for including multiple retrievers in a default query router.

Retrieval Augmentors and Query Routers

A Retrieval Augmentor is a Langchain4J concept that allows you to route queries between multiple Content Retrievers. Because the Agent SDK handles multi-tenancy, we provide an implementation called PeoplelogicRetrievalAugmentor that automatically carries your tenant across the different threads.
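To illustrate what carrying a tenant across threads involves, here is a minimal sketch in plain Java. It is not the SDK's implementation: TenantHolder and TenantPropagation are hypothetical names, and the assumption is simply that the tenant lives in a thread-local that has to be captured on the calling thread and restored on the worker thread that runs the retrieval.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical stand-in for a thread-local tenant holder; not an SDK class.
final class TenantHolder {
    private static final ThreadLocal<String> CURRENT = new ThreadLocal<>();

    static void set(String tenant) { CURRENT.set(tenant); }
    static String get() { return CURRENT.get(); }
    static void clear() { CURRENT.remove(); }
}

// Hypothetical helper that wraps tasks so they see the caller's tenant.
final class TenantPropagation {

    static <T> Callable<T> wrap(Callable<T> task) {
        // Captured on the submitting thread, where the tenant is set.
        final String tenant = TenantHolder.get();
        return () -> {
            String previous = TenantHolder.get();
            TenantHolder.set(tenant);
            try {
                return task.call();
            } finally {
                // Restore the worker's previous value so pooled threads do not leak tenants.
                if (previous == null) {
                    TenantHolder.clear();
                } else {
                    TenantHolder.set(previous);
                }
            }
        };
    }

    // Runs a lookup on a separate thread and returns the tenant that thread observed.
    static String tenantSeenByWorker(String tenant, boolean propagate) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        TenantHolder.set(tenant);
        try {
            Callable<String> task = TenantHolder::get;
            Callable<String> toRun = propagate ? wrap(task) : task;
            return pool.submit(toRun).get();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            TenantHolder.clear();
            pool.shutdown();
        }
    }
}
```

Without the wrapper, the worker thread sees no tenant at all, which is the failure mode a tenant-aware augmentor is meant to prevent.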

Creating a new RetrievalAugmentor is just like creating any other Spring bean (add this to a Configuration class or anywhere else you define your Beans):

@Bean(value = "PeoplelogicKnowledgeRetrievalAugmentor")
public RetrievalAugmentor retrievalAugmentor() {
    // Let's create a query router that will route each query to both retrievers.
    // This does a quick lookup to verify that we need to use the RAG first
    return PeoplelogicRetrievalAugmentor.builder()
            .queryTransformer(ExpandingQueryTransformer.builder()
                    .chatModel(model)
                    .build())
            .queryRouter(new DefaultQueryRouter(apmContentRetriever, trainingContentRetriever))
            .build();
}

You'll notice something new: the queryTransformer method. This particular Query Transformer, ExpandingQueryTransformer, uses the LLM to generate several variations of the query so that retrieval can find the best matches. Langchain4J provides these transformers for you, and you can read more about them in their docs.
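The flow described above (expand the query, send each variation to every retriever, merge the results) can be sketched in plain Java. This is a simplified model, not Langchain4J code: Retriever, retrieveAll, and the fixed expander function are hypothetical stand-ins, and where the real ExpandingQueryTransformer asks the LLM for variations, this sketch uses a hard-coded rephrasing.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

// Hypothetical, simplified model of the retrieval flow; these are not Langchain4J types.
final class RetrievalFlowSketch {

    // A retriever maps a query string to matching snippets.
    interface Retriever extends Function<String, List<String>> {}

    // Sends every query variation to every retriever and merges the results,
    // dropping duplicates while keeping first-seen order.
    static List<String> retrieveAll(String query,
                                    Function<String, List<String>> expander,
                                    List<Retriever> retrievers) {
        Set<String> merged = new LinkedHashSet<>();
        for (String variation : expander.apply(query)) {
            for (Retriever retriever : retrievers) {
                merged.addAll(retriever.apply(variation));
            }
        }
        return new ArrayList<>(merged);
    }

    static List<String> demo() {
        // Stand-in for the LLM-based expansion: the original query plus one rephrasing.
        Function<String, List<String>> expander =
                q -> List.of(q, q.replace("OKR", "objective"));

        // Two toy retrievers that match on different wording.
        Retriever apm = q -> q.contains("OKR")
                ? List.of("apm: OKR check-ins") : List.of();
        Retriever training = q -> q.contains("objective")
                ? List.of("training: writing objectives") : List.of();

        return retrieveAll("How do I set an OKR?", expander, List.of(apm, training));
    }
}
```

In this toy setup the training retriever only matches the rephrased variation, which is the benefit of expanding the query before routing it. The real augmentor also aggregates and ranks what comes back; the sketch only de-duplicates.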

Now just add your new RetrievalAugmentor to your agent instead of the ContentRetriever directly and you're off to the races.

Pre-built Components

The Agent SDK ships with several Content Retrievers that come with built-in content you can use as-is. All of the Content Retrievers are Spring beans and can be autowired into your classes.

| Bean | Type | Description |
| --- | --- | --- |
| apmContentRetriever | PeoplelogicClasspathContentRetriever | Contains a series of articles around Agile Performance Management. Great for building HR coaching applications. |
| trainingContentRetriever | PeoplelogicClasspathContentRetriever | Contains presentations and content around OKRs and leadership training. |
| handbookContentRetriever | PeoplelogicClasspathContentRetriever | Contains multiple samples of different handbooks to help facilitate the handbook creation tool. |
| policyContentRetriever | PeoplelogicClasspathContentRetriever | Contains a collection of policy examples and individual development plans. |
| customer-content-retriever | CustomerKnowledgeContentRetriever | Searches any of the content that your organization has uploaded that was shared with the organization. |
| personal-content-retriever | PersonalContentRetriever | Searches content that was uploaded in the course of executing a particular task (like an OKR cycle export for analysis). |

Building your own Content Retriever

In addition to the built-in retrievers, we have made it easier to build new content retrievers that can handle multiple users and even multiple customers. These are handled through several abstract implementations that you can extend.

NamespaceAwareContentRetriever

Building a NamespaceAwareContentRetriever typically involves passing in some values in a constructor and overriding a single method:

public abstract String getNamespaceKey();

This namespace key is then used to separate documents as they are ingested. By default this retriever limits results to documents that belong to your organization (your tenant). Supplying your own constructor is what allows you to modify some of the other default settings, such as the filter and the size of the segments that get embedded. Let's take a look at the PersonalContentRetriever:

public PersonalContentRetriever(@Value("${peoplelogic.agent.rag.path:/tmp/personal-uploads}")
                                    String ragBasePath,
                                    @Value("${peoplelogic.agent.rag.store.type:memory}")
                                    String ragStoreType,
                                    @Value("${peoplelogic.agent.rag.store.key:}")
                                    String ragStoreKey,
                                    @Value("${peoplelogic.agent.rag.store.host:}")
                                    String ragStoreHost,
                                    EmbeddingModel embeddingModel,
                                    DirectoryUtils directoryUtils) {
        this.ragBasePath = ragBasePath;
        this.ragStoreType = ragStoreType;
        this.ragStoreKey = ragStoreKey;
        this.ragStoreHost = ragStoreHost;
        this.embeddingModel = embeddingModel;
        this.indexName = "personal-content-retriever";
        this.minScore = 0.0;
        this.maxResults = 100; // Let's increase this just to be sure we get the whole review for example
        this.maxCharsInSegment = 5000; // Longer documents typically
        this.directoryUtils = directoryUtils;
        this.defaultFilter = (query) -> metadataKey("user").isEqualTo("" + TokenSecurityUtils.getCurrentUserId())
                .and(metadataKey("file_name").isIn(SearchFileContext.getCurrentFiles()));
    }

    @Override
    public String getNamespaceKey() {
        return TenantContext.getCurrentTenant() + "-" + TokenSecurityUtils.getCurrentUserId();
    }

The main piece to pay attention to here is the defaultFilter. This filter is what limits queries to certain sections of the vector store. In this example, the filter requires that the user metadata matches the current user AND that the segment comes from one of the specific files that have been uploaded. To see how that is used in practice, let's take a look at a tool:

String userPrompt = "Summarize the files named '" + filenames + "'.  Multiple files are separated by a comma.";

// We build a new instance of this agent so we can empty out the tools.
SearchFileContext.setCurrentFiles(filenames.split(",")); // This lets us narrow the search to a specific set of files
if (!waitForUpload(userPrompt, personalContentRetriever)) {
    return "There was a problem uploading the files to analyze.  Please try again.";
}

return getAgent().answerWithPrompt(memoryId +"_summary", userPrompt, systemPrompt);

public HRAnalystAgent getAgent() {
    if (hrAnalystAgent == null) {
        hrAnalystAgent = AiServices.builder(HRAnalystAgent.class)
                .retrievalAugmentor(PeoplelogicRetrievalAugmentor.builder()
                        .queryTransformer(ExpandingQueryTransformer.builder().chatModel(chatLanguageModel).build())
                        .queryRouter(new DefaultQueryRouter(personalContentRetriever, apmContentRetriever, trainingContentRetriever))
                        .build())
                .chatMemoryProvider(chatMemoryProvider)
                .chatModel(chatLanguageModel)
                .tools(Collections.emptyList())
                .build();
    }

    return hrAnalystAgent;
}

What you're seeing here is a specialized instance of the HRAnalystAgent, designed to look up content without calling tools (for use when we're already inside a tool). We wait for the upload (because Pinecone can take a bit of time to finish processing!) and then have the agent build a response using *just* the recently uploaded files.
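To make the combined effect of getNamespaceKey() and the defaultFilter concrete, here is a minimal sketch that models both with plain Java collections. It is not SDK code: Segment, namespaceKey, and search are hypothetical names, and the real retriever delegates this work to the vector store's namespaces and metadata filters rather than filtering a list in memory.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical model of namespace separation plus a metadata filter; not SDK code.
final class PersonalFilterSketch {

    // A stored segment: the namespace it was ingested into, its metadata, and its text.
    record Segment(String namespace, Map<String, String> metadata, String text) {}

    // Same shape as the getNamespaceKey() shown earlier: tenant and user joined by a dash.
    static String namespaceKey(String tenant, String userId) {
        return tenant + "-" + userId;
    }

    // Same logic as the defaultFilter shown earlier: the segment must belong to the user
    // AND come from one of the currently selected files.
    static Predicate<Segment> defaultFilter(String userId, Set<String> currentFiles) {
        return s -> userId.equals(s.metadata().get("user"))
                && currentFiles.contains(s.metadata().getOrDefault("file_name", ""));
    }

    static List<String> search(List<Segment> store, String tenant, String userId, Set<String> files) {
        String namespace = namespaceKey(tenant, userId);
        return store.stream()
                .filter(s -> s.namespace().equals(namespace)) // namespace separation
                .filter(defaultFilter(userId, files))         // metadata filter
                .map(Segment::text)
                .collect(Collectors.toList());
    }

    static List<String> demo() {
        List<Segment> store = List.of(
                new Segment("acme-42", Map.of("user", "42", "file_name", "okrs.csv"), "Q1 OKR export"),
                new Segment("acme-42", Map.of("user", "42", "file_name", "notes.txt"), "Old notes"),
                new Segment("acme-7", Map.of("user", "7", "file_name", "okrs.csv"), "Another user's export"));
        // Only user 42's segment from okrs.csv passes both the namespace and the filter.
        return search(store, "acme", "42", Set.of("okrs.csv"));
    }
}
```

The namespace keeps one user's uploads apart from everyone else's, while the file filter narrows a single query to the files set through SearchFileContext.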

PeoplelogicClasspathContentRetriever

This ContentRetriever is much simpler. It takes a file on the classpath (usually in the resources folder) and loads it into the in-memory embedding store. It can then be used anywhere that ContentRetrievers are used. Files loaded this way are still run through all DocumentProcessor instances, but keep in mind that the in-memory vector store is less sophisticated, so results may be simpler.
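As a rough illustration of the loading step, the sketch below reads a text resource from the classpath and cuts it into bounded segments before they would be embedded. It is an assumption about the general approach, not the SDK's actual code: ClasspathLoadingSketch, readResource, and split are hypothetical names, and the whitespace-based splitting rule is chosen only for simplicity.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of classpath loading and segment splitting; not SDK code.
final class ClasspathLoadingSketch {

    // Reads a classpath resource (for example a file under src/main/resources) as UTF-8 text.
    static String readResource(String path) {
        try (InputStream in = ClasspathLoadingSketch.class.getResourceAsStream(path)) {
            if (in == null) {
                throw new IllegalArgumentException("Resource not found: " + path);
            }
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Splits text into segments of at most maxChars characters, breaking on whitespace where possible.
    static List<String> split(String text, int maxChars) {
        if (maxChars <= 0) {
            throw new IllegalArgumentException("maxChars must be positive");
        }
        List<String> segments = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String word : text.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue;
            }
            // Flush the current segment if adding this word (plus a space) would exceed the limit.
            if (current.length() > 0 && current.length() + 1 + word.length() > maxChars) {
                segments.add(current.toString());
                current.setLength(0);
            }
            // A single word longer than the limit is cut into fixed-size pieces.
            while (word.length() > maxChars) {
                segments.add(word.substring(0, maxChars));
                word = word.substring(maxChars);
            }
            if (current.length() > 0) {
                current.append(' ');
            }
            current.append(word);
        }
        if (current.length() > 0) {
            segments.add(current.toString());
        }
        return segments;
    }
}
```

Each resulting segment would then be embedded and added to the in-memory store, which is why a limit such as maxCharsInSegment affects how much context a single match can carry.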
