Are you interacting with Generative AI models directly or just through consumer apps/interfaces?
Many people I talk to don't realize that GPT itself has no memory of your ongoing conversation; everything that feels like memory is tricks performed in front of the model: summarization, knowledge extraction, and so on. When you have to implement parts of that yourself, even using libraries like Semantic Kernel and LangChain, you end up having to figure out a lot. Most freely available tools have very poor memory support, expect you to do most of the work, or just remember everything verbatim (which is unmanageable and less useful than it sounds). Vector DBs are great for storing memories, but they don't do much to help you figure out what to remember (beyond similarity tests). Doing it all economically at scale (without relying on the most expensive models for everything) complicates things even more.
It's interesting learning about how to extract the best memories from many ongoing interactions over time, how to consolidate those memories, and how to forget the right things over time. So much to learn.
I need to read more research on how to imitate human memory in believable ways and how to be better than human memory but still believable. Is the best research in cognitive science or has someone captured this well in computer science?
I've experimented a little with different front-ends for local models, but only shallowly. I thought it was interesting that the front-matter (or pre-prompt, or whatever one calls it - the "You are a chatbot" part of things) seemed to be fairly sensitive to the model. I'd have thought that a prompt that worked well in one model would work well in most models, but it didn't seem that way.
Depending on how the model was trained, you can get a lot of power out of the system prompt (OpenAI terminology). The model needs to be specifically trained to trust that prompt more than any other input. gpt-3.5-turbo and gpt-4 treat the system prompt a bit like (but not completely like) outside text. I haven't used other models enough to have a good idea of how much weight they give it.
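For anyone who hasn't used the API directly, here's a minimal sketch of what setting a system prompt looks like (older openai Python SDK style; the model name and prompt text are just examples):

```python
# Minimal sketch, older openai SDK (<1.0) style; model and prompt are examples only.
import openai

openai.api_key = "sk-..."  # your key

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[
        # The system message is the "pre-prompt"; how strongly the model
        # privileges it over user messages depends on how it was trained.
        {"role": "system", "content": "You are a concise assistant for cooking questions."},
        {"role": "user", "content": "How long do I boil an egg?"},
    ],
)
print(response["choices"][0]["message"]["content"])
```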
I want to find time to try out some of the new open source ones like Falcon and see how well they perform for different tasks.
But I haven't looked at summarization etc. at all. Are any of the big walled-garden AIs known to be using it? It seems like the momentum of the research right now is all on scaling laws and massive context sizes, so I wonder whether summarization and memory will stay issues in the long run?
Most of my experience is with OpenAI model APIs. For those the model/API itself has zero memory of ongoing interactions. If you want it to remember things from one interaction to the next, you have to provide those memories as part of the context. So you are essentially passing in either previous messages/responses or some combination of that and extracted knowledge. Similarly, if you want a model to leverage knowledge it was not trained on, you have to include that knowledge in the context. Lots of use cases being openly discussed seem to store their knowledge in a vector DB and then query for the parts most similar to each new user ask, using vector similarity or other methods to select a subset of the overall knowledge. Then that subset is included in the context of the new request.
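A rough sketch of what that assembly looks like on every call (the retrieve_similar helper is hypothetical, standing in for whatever vector DB query you use):

```python
# Rough sketch of how "memory" gets into a stateless chat model: you rebuild
# the message list yourself on every request.
def build_messages(system_prompt, history, retrieved_chunks, user_ask):
    context_block = "\n\n".join(retrieved_chunks)
    messages = [{"role": "system",
                 "content": system_prompt + "\n\nRelevant knowledge:\n" + context_block}]
    messages += history[-6:]          # last few turns verbatim (or a summary instead)
    messages.append({"role": "user", "content": user_ask})
    return messages

# history = [{"role": "user", ...}, {"role": "assistant", ...}, ...]
# chunks  = retrieve_similar(user_ask, k=3)   # hypothetical vector-DB lookup
```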
There is a version of GPT-4 with a context window of 32k tokens (gpt-4-32k), which is massive, but as with other models, the more tokens you submit and the more tokens you expect as output, the slower and more expensive the call is. That will improve, sure, but it will likely always be in your best interest to include only the most important tokens in each request if speed/cost matter to you. So you have to do something to reduce all the known tokens down to the ones most likely to get the results you want.
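One common trick is to pack chunks into a fixed token budget, most important first. A hedged sketch using OpenAI's tiktoken tokenizer (the budget number is arbitrary):

```python
# Sketch: greedily pack the highest-priority chunks into a token budget.
import tiktoken

enc = tiktoken.encoding_for_model("gpt-4")

def pack_context(chunks_by_priority, budget_tokens=3000):
    selected, used = [], 0
    for chunk in chunks_by_priority:          # most important first
        cost = len(enc.encode(chunk))
        if used + cost > budget_tokens:
            break
        selected.append(chunk)
        used += cost
    return "\n\n".join(selected)
```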
LangChain is a popular OSS library for building applications on top of generative AI models. You can see the prompts they've written for memory extraction here:
https://github.com/hwchase17/langcha...
Semantic Kernel is another OSS library for the same purpose. There are a lot of examples in that codebase for how you might extract and use memory. SK uses the term "memory" for anything pulled into a context from various sources, not just things extracted from an ongoing interaction. IIRC this is one place to dig around for an example of SK simulating a chat copilot like ChatGPT:
https://github.com/microsoft/semanti...
And this sample app shows some summary features:
https://github.com/microsoft/semanti...
Here are some SK skills that extract knowledge:
https://github.com/microsoft/semanti...
From my reading and playing around with features of OSS libs like those, none of them seem as good as OpenAI's ChatGPT at maintaining context for a conversation. I don't know what tricks ChatGPT or other closed solutions are using. I'd love to find out and learn from them. I very much doubt it is just using verbatim previous messages for context, considering how good it is at context across long chats and that gpt-3.5-turbo has such a small context window.
Thanks, very informative rundown.
Question about vector DB - how does it work specifically? Like what vectors do you store and query against - embeddings of entire prompts, or something more complicated? And how do you then go about using it in practice when you're adding memory/context to a new prompt?
Depends on the size of your memories and what works for your specific use case. In general you slice memories into small chunks (what makes a good size/split is very app dependent) and then store each chunk with its embedding. On each ask you embed the relevant parts of the ask, then pull the K most relevant documents from the vector DB. You then inject some combination of those relevant docs into the context of the prompt.
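As a toy illustration (plain numpy cosine similarity standing in for a real vector DB, and embed() standing in for whatever embedding model you use):

```python
# Toy stand-in for a vector DB: store (text, embedding) pairs and rank by
# cosine similarity. Real vector DBs do the same search, just faster and at scale.
import numpy as np

store = []  # list of (text, embedding) pairs

def remember(chunk, embed):
    store.append((chunk, embed(chunk)))

def recall(ask, embed, k=3):
    q = embed(ask)
    def cos(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
    ranked = sorted(store, key=lambda item: cos(q, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]
```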
For one of my personal toy projects I want to be able to ask a copilot to answer questions based on my personal notes. I have a folder tree full of Markdown files containing a bunch of my notes. The files are variable length, some tiny and some massive. I maintain a vector DB of notes and their embeddings. For any long notes I split them into chunks. One day I'll improve that so the splitter knows about similarity and does a better job of separating discrete parts of a note into separate chunks, rather than best-effort sequential splits. Whenever I ask the copilot a question, my code embeds the ask, queries the vector DB for some note chunks that are similar to the ask, then injects those chunks into the context of the prompt before submitting to the model. So think of my notes like knowledge docs that add to the knowledge of the model without the model needing to be fine-tuned on my notes (or re-tuned on every change).
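The splitting itself is currently just the naive sequential approach, roughly like this (chunk size is arbitrary):

```python
# Naive chunker for Markdown notes: split on blank lines and pack paragraphs
# into roughly fixed-size chunks. Purely sequential, not similarity-aware.
def chunk_note(text, max_chars=1000):
    chunks, current = [], ""
    for para in text.split("\n\n"):
        if current and len(current) + len(para) > max_chars:
            chunks.append(current.strip())
            current = ""
        current += para + "\n\n"
    if current.strip():
        chunks.append(current.strip())
    return chunks
```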
That's good enough if I don't want to ask any follow-up questions or refinements. For that I need to capture some amount of info from all the user and agent messages in the current conversation and include it in each future ask. I could just chunk the user and agent messages into the vector DB like knowledge docs, but that doesn't work well in my experience, and those messages shouldn't be treated the same as the knowledge docs. Better to keep the messages in memory, away from the knowledge docs. What I really need is one document that captures the relevant agent memories about the ongoing conversation, gets refined on every new message in the conversation, and then gets injected into the context. I'm trying to learn how to make that work better AND feel convincing in its ability to capture the right things and forget the right things.
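The rolling-document idea, as a very rough sketch (the prompt wording and the chat() wrapper are placeholders for whatever model and prompt you actually use, not something I'm claiming works well yet):

```python
# Sketch of a rolling conversation memory: after every turn, ask a (cheap)
# model to fold the new exchange into the running summary document.
MEMORY_PROMPT = """You maintain a running memory of a conversation.
Current memory:
{memory}

New exchange:
User: {user}
Assistant: {assistant}

Rewrite the memory: keep facts, goals, and preferences worth remembering,
drop small talk and anything superseded. Stay under 200 words."""

def refine_memory(memory, user_msg, assistant_msg, chat):
    prompt = MEMORY_PROMPT.format(memory=memory, user=user_msg, assistant=assistant_msg)
    return chat(prompt)  # hypothetical LLM call; returns the updated memory text
```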
So with the DB you're storing the text of data documents, and you're querying against that with the text of a prompt or question, and in both cases the text has just been tokenized and run through the embedding layers of whichever model, is that the idea?
And does OpenAI have APIs enabling any of this, or are you doing it with LangChain locally, or what?
I’m storing the text of documents and their embedding vectors in the database. To query I turn the user ask into an embedding and I query the DB with that embedding. The DB uses vector similarity to figure out which documents most closely match. If I'm doing knowledge extraction I may do that before embedding.
Embedding models tokenize and embed. You can just tokenize separately without using the model to embed but I rarely ever need the tokens.
OpenAI's best embedding model is currently ADA2 (text-embedding-ada-002), but I often use a local sBERT model for my embeddings instead since it's free and fast. It's often good enough or can be made good enough through fine-tuning.
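If anyone wants to try the local route, sentence-transformers makes it a couple of lines (the model name here is just one common default; any sBERT checkpoint works the same way):

```python
# Local embeddings with sentence-transformers (sBERT).
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
vectors = model.encode(["note chunk one", "note chunk two"])  # numpy array, one row per text
print(vectors.shape)
```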
OpenAI provides basic libraries for using its model APIs. LangChain and SK use those libraries behind the scenes.
I’ve been using SK lately but I’ve tried others. I use the sentence-transformers library for sBERT models and various other libraries for other models or data science tasks.
It's no longer a QR code, though. Although you could probably put a faint layer of one over the houses to Rickroll people.
Yep, a page of a lot of Japanese text and other QaRt
Tried it on my Galaxy and it did not recognize it as a code.
My iPhone didn't recognize it either.
Yep, a page of a lot of Japanese text and other QaRt
looks like it's traditional Chinese actually. Works on my S23
My iPhone didn't recognize it either.
Mine did.
I spent some time last week with CodeWhisperer (disclaimer: I work at Amazon, but not in AWS). I found it's a lot like when you're typing a text message and your phone is predicting what you're writing and giving you a shortcut to complete the word.
A big difference: your phone also sometimes tries to predict the next word and if you just keep accepting the next word you don't ever actually end up making a sentence. This helper has more context and has seen a lot of patterns and code blocks for accomplishing things. It reads your comments and even seems to follow your style. I don't like how fast it pops up with a suggestion, because this interrupts my train of thought, but I suspect this is configurable.
The best thing it did for me all day was suggest a code block for writing to a .csv file, which is what I wanted to do (I wrote a comment about it) but had never done before. It whipped out 4 lines of code following the data structure I had set up. I made 2 small adjustments, tested it, and it worked.
I think once I've tuned it to suggest only when I want a suggestion then it will actually save me time and help me write better code. Until I do that it's pretty disruptive to my workflow.
Thanks for sharing your experience. I’d be curious to see a comparison to GitHub Copilot from someone experienced with both. And the various other alternatives I guess. I haven’t tried any except copilot.
(Disclaimer: I work at Microsoft but not GitHub) Now that I’m getting used to what GitHub Copilot is good/bad at it’s becoming far more useful to me. Learning the tricks to give it the right context to align with my goals easily. Using the chat interface to work through things instead of just code completion. It is absolutely making me more productive. I use a combination of writing a plan for what I’m accomplishing (often with some help from copilot if there are parts I’m not exactly sure on yet) and then executing on that plan with copilot. I use it like a pair programmer but it helps a ton that I am very experienced in the domain I’m coding in and I’m just using copilot to accelerate. It’s not the primary programmer. I have copilot at work but I also pay for a personal account that I use outside work. It’s been worth it to me.
I'm really curious about Copilot, but when they announced it they said they'd be gifting access to popular open source projects, so I've been waiting to see if my slightly popular one accrues enough stars for senpai to notice :D
What project is it? Would us all starring it actually help?
I've been tempted to try the Copilot alternatives for personal projects to get a more balanced impression of how they work. I think I'll try one of the free ones next; I think the Replit one and the Salesforce one are both free.
It would be nice to have one I can use from an API so I can build my own processes and tools on top. AFAICT GitHub Copilot doesn’t allow that. Which ones have an API?
Really apparent that ChatGPT is a product of Silicon Valley.
Yikes.
Really apparent that most people don’t understand transformer models or transformer architecture.
This is equivalent to judging a spoon on its ability to cut steak.
Really apparent that ChatGPT is a product of Silicon Valley.
Sigh, doesn't surprise me at all.
Yeah, it's more a reflection of online society. Nice to see what we think of ourselves without (much) bias.
I asked it if it would rather save a man or a woman; it said woman. I also asked if it would save a gay man or a straight man; it said gay. So... I guess it's elitist but also progressive?
Long ago I heard some variation of:
“Computers are not smart. They do exactly what you tell them.”
Now I wonder if there is a modern version like:
“Generative AI is not smart. It does exactly what is statistically likely to come after what it thinks you told it.”
Perhaps relevant video:
Yeah, it's more a reflection of online society. Nice to see what we think of ourselves without (much) bias.
Less society as a whole and more the datasets its creators decided to train it on, propagating their own biases into the language model and subsequently into the results it gives.