AI Resources

a16z AI Town based on the Stanford Smallville project. Run these agents on local machines.

GPTeam: collaborative AI agents

Content Creator

GPT-Author: utilizes a chain of GPT-4 and Stable Diffusion API calls to generate an original fantasy novel, output an EPUB file.

Media agent Media Agent scrapes Twitter and Reddit submissions, summarizes them, and chats with them in an interactive terminal. Podcast Copilot generate a social media post for a new podcast episode based on the audio file for the podcast.

DoppelBot is a Slack app that scrapes a target user's messages in Slack and fine-tunes a large language model (OpenLLaMa) to learn how to respond like them.

Coding specific

MetaGPT takes a one line requirement as input and outputs user stories / competitive analysis / requirements / data structures / APIs / documents, etc.

Aider Chat

Aider is a command line tool that lets you pair program with GPT-3.5/GPT-4, to edit code stored in your local git repository. You can start a new project or work with an existing repo. And you can fluidly switch back and forth between the aider chat where you ask GPT to edit the code and your own editor to make changes yourself. Aider makes sure edits from you and GPT are committed to git with sensible commit messages. Aider is unique in that it works well with pre-existing, larger codebases.

GPT-Engineer an autonomous agent that build a code repository based on a specified project description. A personal code-generation toolbox.

RIFT

GPT-Migrate helps you migrate your codebase from one framework or language to another.

Personoids Lite for AI Chat provides examples of autonomous agent using chatGPT to solve Github issues (e.g., add documentation, bug fix). The repo is in Javascript.

Local LLMs and Privacy

PrivateGPT

How is LLaMa.cpp possible

Model fine-tuning

Finetune Llama 2 for Text-to-SQL how to fine-tune LLaMa 2 7B on a Text-to-SQL dataset, and then use it for inference against any database of structured data using LlamaIndex.

Misc interesting tools

roboflow reusable computer vision tools. Whether you need to load your dataset from your hard drive, draw detections on an image or video, or count how many detections are in a zone.

AI autonomy: from co-pilots to captains

AI autonomy can be understood as a spectrum of how much human involvement is needed for an AI system to perform a task. For instance, the Society for Automotive Engineers (SAE) defines six levels of driving automation, from no automation (level 0) to full vehicle autonomy (level 5). (src: 1) Most of today’s AI systems are like co-pilots that assist humans with their tasks. Humans are still in charge and provide guidance along the way. These systems are in the middle of the autonomy spectrum, but they are gradually moving towards higher levels. However, there are also some emerging AI systems that offer a glimpse into the future of full autonomy, where AI is able to complete certain tasks without any human intervention. This is a vision we outlined in our previous article, Web 3.0 will be an Autonomous Spatial Web. Full autonomy is not only challenging but also risky, as it requires AI systems to be reliable, consistent, and aware of their surroundings.

AI Co-pilot Stack diagram by Microsoft is a good framework for today's vast number of AI co-pilot applications.

Retrieval techniques for LLM applications

One of the key steps in search is retrieval, which is the process of finding and returning relevant documents in response to a user’s query. To perform retrieval efficiently, we need an index, which is a data structure that supports fast searching. A retriever is the component that uses the index to access and rank the documents according to their similarity or relevance to the query.

There are different ways to do retrieval, depending on the needs and preferences of the users. For example, some users may want to do semantic search, which is based on the meaning and context of the query and the documents. Others may want to do hybrid search, which combines semantic search with other criteria such as metadata filters or maximal marginal relevance. Metadata filters are used to narrow down the results before doing semantic search, while maximal marginal relevance is a method that balances similarity and diversity in the results.

Another variation in retrieval is whether to use a retriever or not. A retriever-less approach is one that does not rely on an index, but instead uses a large language model (LLM) to directly generate answers from a large context window of text. This approach has some advantages, such as simplicity and promising performance on some challenges. However, it also has some drawbacks, such as higher latency and limited scalability. Therefore, it may be more suitable for applications where latency is not critical and the corpus is reasonably small. For other applications, a retriever-based approach may be more efficient and effective.

Data Extraction

Camelot: extract tables from PDFs for Humans

Prompt engineering tools

Microsoft Azure AI released its Prompt flow tool in Build 2023.

end-to-end prompt development: construction, orchestration, testing, evaluation and deployment.
Help to integrate personalized and contextually relevant data from various different sources, it allows developers to call APIs through LangChain, Microsoft's Semantic Kernel and many other tools to pull data into the prompt
prompts can be constructed as they flow through different nodes in the graph and retrieve more relevant data
It allows calling any LLMs of the developer's choice and provide the responses with detailed trace information, so the developer can compare and engineer the prompts based on the results.

Microsoft Azure's Prompt Flow tool (Source: Microsoft Build 2023)

LLM Evaluation

OpenAI Eval

An auto evaluate tool for grading LLM Questions and Answers

Pick random selections of the input context and generate QA pairs
Auto generating a QA test set and auto-grading the result of the specfied QA chain.

Llama Index Question Generation and Evaluation module:

The module first automatically generate questions from the document (and does not require ground-truth answers for these questions)
It then generates answer with source for questions
The evaluation module will then access: 1) whether the response matches the source (nodes), 2) whether response and source matches query. 3) which specific source nodes of all retrieved source nodes are used to generate the response.

All about evaluating Large language models

Tracing support

Safety and security

In our prior article https://www.inweb3.com/generative-ai-a-blessing-or-a-curse-for-cybersecurity/ we discussed various possibilities that could put a generative AI system into risk, both at the model training phase and model usage (inference) phase.

When we are developing an application that uses existing models, two of the main attacks we face are prompt injection and model abuse. The way to handle that is to put guardrails both before and after the LLM interaction.

Layered approach to safety:

Model level, by model providers (e.g., OpenAI): detecting and filtering inputs
Content safety, wrapped around the model: detecting and filtering model input and output
Metaprompt: system level prompt that can be controlled by application deverlopers (Example from Microsoft Build 2023 src: https://www.youtube.com/watch?v=KMOV1Zy8YeM&t=14s )

Metaprompt for safety (source: Microsoft)

Human-in-the-loop Tool validation can enforce manual human approval of inputs to any tool in use through Langchain callbacks

Rebuff is a tool against prompt injection

Self-critique Constitutional AI chain can be used to enforce content moderation

Watermarking - cryptographic methods to mark and sign AI-generated content with metadata about its origin.

Langchain notes

Chain, agent, LLM

"A chain is just an end-to-end wrapper around multiple individual components." can fufill certain tasks, one simple and popular chain is LLMChain. It wraps an LLM model, a prompt template and an input/output guardrail, has an LLM with a particular prompt. So it is capable of interacting with the LLM to perform tasks that the prompt specifies. For example, an LLMChain can be asked to check the syntax of a particular SQL query.
an agent has an LLMChain and a list of tool names. It has the "Planning" capability. It normally receives an updated input and context of prior step results, it supplies those input and context, along with the list of tool names, to its LLMChain. The LLMChain then consult the LLM to determine which of the tools should be used in the next step.
By default, an LLM is isolated. So an LLMChain only generates answers based on what was provided in the input and what it has been trained on. A more general chain is capable of leveraging not only the LLM but also additional tools (e.g., search the Internet, consulting additional contexts it has not been trained on, perform scientific computations, etc.).
an agent executor is a popular example of a generalized chain. It coordinates between an agent and the execution of the tools associated with the agent. In this case, the chain contains an agent and all the tools on the agent's list of tool names. The agent in the agent executor also has an LLMChain, as usual. This LLMChain's prompt would also plan the tools used in the next steps and the corresponding tool input. The agent executor chain will receives these information and actually invoke the tool. Then the tool output will be fed back to the agent to plan for the next step.
Types of agents: Single-action agents, Multi-action agents, Plan and Execute agent (inspired by BabyAGI and then the Plan-and-Solve paper)

"TL;DR: we've introduced a BaseSingleActionAgent as the highest level abstraction for an agent that can be used in our current AgentExecutor. We've added a more practical LLMSingleActionAgent that implements this interface in a simple and extensible way (PromptTemplate + LLM + OutputParser)."

Langchain agentExecutor flow

When we do agent_executor(input), what happens?

chains/base.py has Chain's __call__ which calls its _call

        try:
            outputs = (
                self._call(inputs, run_manager=run_manager)
                if new_arg_supported
                else self._call(inputs)
            )
        except (KeyboardInterrupt, Exception) as e:
            run_manager.on_chain_error(e)
            raise e
        ...
        final_outputs: Dict[str, Any] = self.prep_outputs(
            inputs, outputs, return_only_outputs
        )

E.g., agents/agent.py has AgentExecutor(Chain)'its __call__ calls its _call which further calls its _take_next_step

        while self._should_continue(iterations, time_elapsed):
            next_step_output = self._take_next_step(
                name_to_tool_map,
                color_mapping,
                inputs,
                intermediate_steps,
                run_manager=run_manager,
            )
            if isinstance(next_step_output, AgentFinish):
                return self._return(
                    next_step_output, intermediate_steps, run_manager=run_manager
                )

            intermediate_steps.extend(next_step_output)
            if len(next_step_output) == 1:
                next_step_action = next_step_output[0]
                # See if tool should return directly
                tool_return = self._get_tool_return(next_step_action)
                if tool_return is not None:
                    return self._return(
                        tool_return, intermediate_steps, run_manager=run_manager
                    )
            iterations += 1
            time_elapsed = time.time() - start_time
        output = self.agent.return_stopped_response(
            self.early_stopping_method, intermediate_steps, **inputs
        )
        return self._return(output, intermediate_steps, run_manager=run_manager)

its _take_next_step further calls its agent's planner

            # Call the LLM to see what to do.
            output = self.agent.plan(
                intermediate_steps,
                callbacks=run_manager.get_child() if run_manager else None,
                **inputs,
            )
            ...
            
            if isinstance(output, AgentFinish):
                return output
            ...
            actions: List[AgentAction]
            if isinstance(output, AgentAction):
                actions = [output]
            for agent_action in actions:
                if agent_action.tool in name_to_tool_map:
                    tool = name_to_tool_map[agent_action.tool]
                    return_direct = tool.return_direct
                    tool_run_kwargs = self.agent.tool_run_logging_kwargs()
                    if return_direct:
                        tool_run_kwargs["llm_prefix"] = ""
                    # We then call the tool on the tool input to get an observation
                    observation = tool.run(
                        agent_action.tool_input,
                        verbose=self.verbose,
                        color=color,
                        callbacks=run_manager.get_child() if run_manager else None,
                        **tool_run_kwargs,
                    )
            ...
                result.append((agent_action, observation))
            return result

agents/agent.py has Agent(BaseSingleActionAgent)'s plan

        full_inputs = self.get_full_inputs(intermediate_steps, **kwargs)
        full_output = self.llm_chain.predict(callbacks=callbacks, **full_inputs)
        return self.output_parser.parse(full_output)

agents/chat/output_parser.py class ChatOutputParser(AgentOutputParser):

chains/llm.py has LLMChain(Chain)'s predict

        return self(kwargs, callbacks=callbacks)[self.output_key] # output key is "text"

Which calls back the LLMChain(Chain)'s __call__ and then its _call and prep_outputs

Unless output parsing failed, Check if output is AgentFinish, if not, run AgentAction tool. observation = tool.run

Improve resiliency

Handle tool error: if true, we need to raise a ToolException

e.g.

from flipside.errors import (
    QueryRunCancelledError,
    QueryRunExecutionError,
    QueryRunTimeoutError,
    SDKError,
)
from langchain.tools.base import ToolException

...
except Exception as e:
	raise ToolException(str(e))
...

The exception could be

e: ToolException('QUERY_RUN_EXECUTION_ERROR: an error has occured while executing your query. errorName=OperationFailedError, errorMessage=syntax error line 1 at position 167 unexpected \'TOK_INTERVAL\'., errorData={\'code\': \'001002\', \'data\': {\'age\': 0, \'pos\': -1, \'line\': -1, \'type\': \'COMPILATION\', \'queryId\': \'01af08d5-0603-c7a6-3d4f-83015d4e31df\', \'sqlState\': \'42601\', \'errorCode\': \'001002\', \'internalError\': False}, \'name\': \'OperationFailedError\', \'message\': "syntax error line 1 at position 167 unexpected \'TOK_INTERVAL\'.", \'sqlState\': \'42601\'}')

e.args[0]: 'QUERY_RUN_EXECUTION_ERROR: an error has occured while executing your query. errorName=OperationFailedError, errorMessage=syntax error line 1 at position 167 unexpected \'TOK_INTERVAL\'., errorData={\'code\': \'001002\', \'data\': {\'age\': 0, \'pos\': -1, \'line\': -1, \'type\': \'COMPILATION\', \'queryId\': \'01af08d5-0603-c7a6-3d4f-83015d4e31df\', \'sqlState\': \'42601\', \'errorCode\': \'001002\', \'internalError\': False}, \'name\': \'OperationFailedError\', \'message\': "syntax error line 1 at position 167 unexpected \'TOK_INTERVAL\'.", \'sqlState\': \'42601\'}'

If we have set self.handle_tool_error to True, we will have langchain.tools.base

        try:
            tool_args, tool_kwargs = self._to_args_and_kwargs(parsed_input)
            observation = (
                self._run(*tool_args, run_manager=run_manager, **tool_kwargs)
                if new_arg_supported
                else self._run(*tool_args, **tool_kwargs)
            )
        except ToolException as e:
            if not self.handle_tool_error:
                run_manager.on_tool_error(e)
                raise e
            elif isinstance(self.handle_tool_error, bool):
                if e.args:
                    observation = e.args[0]
                else:
                    observation = "Tool execution error"
            elif isinstance(self.handle_tool_error, str):
                observation = self.handle_tool_error
            elif callable(self.handle_tool_error):
                observation = self.handle_tool_error(e)
            else:
                raise ValueError(
                    f"Got unexpected type of `handle_tool_error`. Expected bool, str "
                    f"or callable. Received: {self.handle_tool_error}"
                )
            run_manager.on_tool_end(
                str(observation), color="red", name=self.name, **kwargs
            )
            return observation

So it merely set observation to e.args[0] , alternatively, it can call self.handle_tool_error to obtain a string return value and assign that to the observation.

Handle parsing errors | 🦜️🔗 Langchain

Occasionally the LLM cannot determine what step to take because its outputs are not correctly formatted to be handled by the output parser. In this case, by default the agent errors. But you can easily control this functionality with handleparsingerrors! Let’s explore how.

🦜️🔗 LangChain

Using Plugins as tools

Retrieve and select from a list of plugins as tools

We can use ChatGPT Plugin as tools
supply a list of ChatGPT Plugin urls (custom_agent_with_plugin_retrieval), or get based on a list from active plugins from https://plugplai.com directory (custom_agent_with_plugin_retrieval_using_plugnplai)
embed each Plugin description in a vectorstore, and perform a semantic search over a query to identify the relevant plugins to use.

Research

Instruction fine-tuning

Self-Instruct: Aligning Language Models with Self-Generated Instructions

Large "instruction-tuned" language models (i.e., finetuned to respond to instructions) have demonstrated a remarkable ability to generalize zero-shot to new tasks. Nevertheless, they depend heavily on human-written instruction data that is often limited in quantity, diversity, and creativity, therefore hindering the generality of the tuned model. We introduce Self-Instruct, a framework for improving the instruction-following capabilities of pretrained language models by bootstrapping off their own generations. Our pipeline generates instructions, input, and output samples from a language model, then filters invalid or similar ones before using them to finetune the original model. Applying our method to the vanilla GPT3, we demonstrate a 33% absolute improvement over the original model on Super-NaturalInstructions, on par with the performance of InstructGPT-001, which was trained with private user data and human annotations. For further evaluation, we curate a set of expert-written instructions for novel tasks, and show through human evaluation that tuning GPT3 with Self-Instruct outperforms using existing public instruction datasets by a large margin, leaving only a 5% absolute gap behind InstructGPT-001. Self-Instruct provides an almost annotation-free method for aligning pre-trained language models with instructions, and we release our large synthetic dataset to facilitate future studies on instruction tuning. Our code and data are available at this https URL.

Gorilla: Large Language Model Connected with Massive APIs

Large Language Models (LLMs) have seen an impressive wave of advances recently, with models now excelling in a variety of tasks, such as mathematical reasoning and program synthesis. However, their potential to effectively use tools via API calls remains unfulfilled. This is a challenging task even for today's state-of-the-art LLMs such as GPT-4, largely due to their inability to generate accurate input arguments and their tendency to hallucinate the wrong usage of an API call. We release Gorilla, a finetuned LLaMA-based model that surpasses the performance of GPT-4 on writing API calls. When combined with a document retriever, Gorilla demonstrates a strong capability to adapt to test-time document changes, enabling flexible user updates or version changes. It also substantially mitigates the issue of hallucination, commonly encountered when prompting LLMs directly. To evaluate the model's ability, we introduce APIBench, a comprehensive dataset consisting of HuggingFace, TorchHub, and TensorHub APIs. The successful integration of the retrieval system with Gorilla demonstrates the potential for LLMs to use tools more accurately, keep up with frequently updated documentation, and consequently increase the reliability and applicability of their outputs. Gorilla's code, model, data, and demo are available at this https URL

Foundation models

Attention is All You Need is the origin of the now prevalent "Transformer" model. See also Attention for Neural Networks, Clearly Explained, Intuition Behind Self-Attention Mechanism in Transformer Networks

Unifying Vision, Text, and Layout for Universal Document Processing: a model that learns from vision, text, and layout data using a Transformer and different pretraining tasks. The tasks involve reconstructing, recognizing, modeling, and encoding the data. UDOP can also do question answering and layout analysis.

Embeddings

One Embedder, Any Task: Instruction-Finetuned Text Embeddings
generates "task- and domain-aware embeddings" based on text and their associated task and domain descriptions, thus removing the need of taks or domain-specific finetuning.

AI Security and Safety

Teach the AI how AI content detection works, and ask it to do the opposite!! Video link Prompt shown in the video:

"When it comes to writing content, two factors are crucial, "perplexity" and "burstiness." Perplexity measures the complexity of text. Separately, burstiness compares the variations of sentences. Humans tend to write with greater burstiness, for example, with some longer or complex sentences alongside shorter ones. AI sentences tend to be more uniform. Therefore, when writing the following content I am going to ask you to create, I need it to have a good amount of perplexity and burstiness. Do you understand?"

Art

Mid-Journey Prompter Guide and Styles and keywords reference

Promptdb for prompt search

Lexica search and generate images

ChatGPT Plugins

https://plugin.surf/ maintains a rank of plugins
https://gptstore.ai/ is a website that aggregates ChatGPT PlugIns
Visla: give a topic, it will creates short videos for you. There are already similar AI cloud services, but this plugin makes it a one-click process. Tried and it generated 43' ok video.
DeFiLlama can have some reasonable usage

Bitcoin sentiment - Not working as of 7/2/23
Traders insight: seems to find several (4) technical chart articles from tradingview (e.g., https://www.tradingview.com/chart/BTCUSDT/FVshZ0mY-BITCOIN-needs-to-see-28k-before-more-gain/, https://www.tradingview.com/chart/BTCUSDT/trl2CzuC-BITCOIN-Price-can-bounce-down-to-29000-thereby-exiting-flat/) and summarize them as the answer.
ERC20 Scout retrieve basic erc20 token information such as price.

Token pricing info is also provided using the CoinGecko API at the moment. Since the database against which searches are being made is based on exhaustive token lists some new or very niche tokens might not be retrievable. However that should do it for most tokens. Here are some prompts that illustrate the plugin capability: - Get me the most popular stablecoins on ethereum mainnet - Fetch the token info associated with the contract address 0x000000000000000000000000000 - List all the chains where the MATIC token can be found. - Get me all the token lists. - Is the BUSD token available on Polygon ? - What's the address of the USDC token on ethereum mainnet ? - What's the current price of MANA on Polygon ?

Smarter Contracts: just a basic code description of the smart contract along with some additional web check (such as Honeypot Status)
CheckTheChain: it seems to be a chat to SQL translator and uses Google Big Query DB.

Social Agents