
So as not to clog up other places, here is a thread for the general discussion of AI, its advances, implications, and suchlike. This thread is meant primarily for the use of non-AIs (e.g. humans), but any AIs that happen by are of course welcome to join in. (If we're not sure whether someone is an AI or not, John Searle can presumably sort that out for us.)
As a starter, this article from today:
Do Large Language Models learn world models or just surface statistics?
It's a fun read, but the tl;dr is that the authors trained a GPT on transcripts of Othello games ("E5, D6, .." etc), so that it could successfully predict legal moves. Then they did various probing of the GPT's internals, and were able to show that it had developed a "world model" of the game - i.e. that Othello is played on a grid of connected squares whose states change after each move - even though the GPT had only ever seen game transcripts.
Not a surprising result, exactly, but it's impressive that it can already be demonstrated, and it seems like this sort of poking around in AI models, and attempting to understand or intervene in their weights, will be an interesting thing to watch from here on out.
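(For anyone curious what that poking around looks like in practice: the basic trick is a "probing classifier" - train something simple to read the board state back out of the model's hidden activations. A minimal sketch below, with made-up shapes and random stand-in data; the paper's actual probes are fancier, but the idea is the same.)

```python
# Toy sketch of a "probing classifier": can a simple model read the board
# state out of the GPT's hidden activations? (Shapes and data are invented.)
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Pretend data: one hidden-state vector per move (d_model = 512), and the
# label for one board square after that move (0=empty, 1=black, 2=white).
activations = rng.normal(size=(10_000, 512))    # stand-in for real GPT activations
square_state = rng.integers(0, 3, size=10_000)  # stand-in for real board labels

X_train, X_test, y_train, y_test = train_test_split(
    activations, square_state, test_size=0.2, random_state=0)

probe = LogisticRegression(max_iter=1000)
probe.fit(X_train, y_train)

# With real activations, accuracy well above chance suggests the board state
# is decodable from the model's internals - i.e. an internal "world model".
print("probe accuracy:", probe.score(X_test, y_test))
```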
Trying again, with some distance...
Yet another explication of the Chinese Room thought experiment.
My issue with this is that the key elements of the CRA are extraneous to the question being examined. If a computer program can pass an arbitrarily hard Turing test then there are plenty of interesting questions we can ask about it, but putting that program in a room with John Searle illuminates none of those questions. Searle thinks that the human in the room is the system we're interested in ("me, for example" above), but this is clearly not the case - whatever the program's mental capabilities are or aren't, they don't change when Searle enters or leaves the room.
To see this, try reading Searle's two quotes about the CRA (section 1 in my link), and remove the bits about the human. The result boils down to: "imagine a room containing books of data and instructions which would allow someone to pass the Turing test in Chinese. I assert without argument that those books cannot understand Chinese."
In the analogy, this is like simply turning off the computer and then asserting that it addresses no interesting questions. Without input and output passing through a Black Box, we don't even have a basis for discussion of whether the Black Box, based on the output, can be said to "understand Chinese". So I don't buy this as a useful objection.
Also, would you assert that a simple Chinese-English dictionary, grammar and phrase book - a very large shelf of books, I guess - would understand Chinese? Because that's the conclusion that your argument supports.
Later...
What I'm pointing out is that that last claim - that the computer can't understand anything the man doesn't understand - isn't a conclusion Searle has reached, it's an assumption he makes but doesn't support. If he's already assuming that, why invoke the man or the room at all? He could just have said "Turing tests are flawed because functionalism is wrong" and called it a day.
But this is not what Searle is saying. Searle is saying that the human *also* does not understand Chinese; that there is literally nothing in the Room that actually *does* "understand" Chinese. But he needs an agent - a notional computer program or, for people less familiar, a person who can't speak or read Chinese - to actually make the engine run. Without it, as above, it's just a room full of books. His assumption is that no one believes that a room full of books "understands Chinese", and the corollary is that the operative agent - the program (ignorant human) making the system work - is defined as not understanding Chinese.
And later...
If a system passes a test for having some property, then by definition it has the property it was tested for. Asking about "thinking like a human being" rather rigs the game, since that's very precisely what a Turing test tries not to examine, but consider the property "understands Chinese". If you were designing the most rigorous test you could think of for whether someone understands Chinese, wouldn't that test look fairly similar to a Turing test? If so, and if a chatbot then passes the test you designed, on what basis could anyone claim that the chatbot nonetheless doesn't understand Chinese? To do so would be to claim that the property cannot be detected by tests at all - in which case it's phlogiston and we can safely dispense with it.
I guess the issue here is, as a programmer I don't have any pre-existing opinion on whether chatbots can think/understand/etc. It's like asking whether they can glorp - we need a definition for glorp before we can even consider the question, and then we resolve the question by applying the definition. But Searle seems to view it the other way around - he seems to be saying "no no, as a member of the AUPSLOPTP I already know which systems can glorp. Humans obviously can, and chatbots definitely can't - not even hypothetical future ones. I can't tell you precisely what glorping is, or what effects it has on the world, or how you can test for it yourself. But chatbots can't do it, so if a chatbot ever passes your glorping test then that proves the test was flawed."
Is that what's going on here?
The bolded section above is flawed, and you actually are heading where Searle was. The answer is Searle's answer - that the Turing Test is *not sufficient* to judge whether the computer (for simplicity's sake) you are interacting with actually *understands* Chinese. It can't tell us that. We have to find some other way.
The "glorp" argument is an argument from ignorance. It asserts that since we don't completely understand thought, we can't speak about it usefully. The problem with that stance is that science and philosophy are pragmatic; they follow the best available evidence on a topic with hopes to improve our understanding of it (science) and discover whether it's actually a useful thing to study in a particular way (philosophy of science, as well as epistemology and other related topics).
If your argument is "First we have to define what thought is in such detail that no one can dispute it", then you're going to be waiting a long time to be able to pitch in on the discussion. However, if you can accept that humans and many animals have varieties of conscious cognition, then the assumptions Searle makes are good and useful for our current state of knowledge. That's the Pragmatic approach.
In the interest of keeping things, well, interesting, here's a brief discussion of a technology that has shown utility in judging *why* an AI trained to do a certain task makes the choices it does. It ranks choices on how closely they follow an aggregate human response (that was used in training via expert human input); that is, on whether its process of selection uses the same types of reasoning that a human does in making the choice, or whether it uses some method not accessible to or used by humans to reach the choice.
This of course means that we could build and train/tune AIs that reasonably *could* be said to understand tasks and the associated reasoning in the same way humans do - even though they're not using a human brain to do it. It's still a long step from there to a general-purpose "think like a human" AI, and we'd end up recreating some human foibles of thought along the way (though not emotions or other biochemically driven reactions), but it does at least hold out the promise that a useful follow-on to the Turing test can be developed.
Edit - (Note that this test *extends* the Turing test in useful ways, by examining both the data and the processes of thought (the reasoning across the data set). So in this sense, it's no longer a Turing test; it's a derivation that includes methods used to determine not just whether the answers are plausible (trustworthy), but *why* they are trustworthy, and when even seemingly good answers should be tossed out because they are not arrived at by methods validated by human expert experience.)
Edit 2 - And I see that your friend's paper is using similar ideas as the MIT one.
I didn't think you actually believed the stuff your arguments led to, but I had to put them out there. Please don't take that the wrong way.
Thanks! I still find the last bit interesting. It points to a possible framework for trusting AI judgements in specific fields, through what is essentially periodic (or continual) auditing and comparison with human expert judgements. I find that encouraging.
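To make that concrete, the auditing could be as simple as periodically scoring the model's decisions against expert judgements on the same cases and flagging drift. Nothing below comes from the MIT work - it's just a hypothetical sketch of the bookkeeping:

```python
# Hypothetical sketch of a periodic audit: compare a model's decisions on a
# batch of cases against expert human judgements and flag low agreement.
from collections import Counter

def audit(model_decisions, expert_decisions, alert_threshold=0.85):
    """Return the agreement rate, an alert flag, and a tally of disagreements."""
    assert len(model_decisions) == len(expert_decisions)
    agree = sum(m == e for m, e in zip(model_decisions, expert_decisions))
    rate = agree / len(model_decisions)
    disagreements = Counter(
        (m, e) for m, e in zip(model_decisions, expert_decisions) if m != e)
    return rate, rate < alert_threshold, disagreements

# Example run with made-up labels for ten cases:
model  = ["approve", "deny", "approve", "deny", "approve",
          "approve", "deny", "approve", "deny", "approve"]
expert = ["approve", "deny", "deny",    "deny", "approve",
          "approve", "deny", "approve", "deny", "deny"]
rate, alert, diffs = audit(model, expert)
print(rate, alert, diffs)  # 0.8, True, Counter({('approve', 'deny'): 2})
```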
They're not saying it mashes bits of the source images into the ai generated one, they're saying that the source images were used to train the ai on what certain things should look like. So if an ai produces a picture of a woman in a white Victorian dress (one of the examples I saw on their site), proper attribution would list every source image that went into teaching the ai what a woman in a white Victorian dress could look like.
Ideally, every ai program like the ones their site talks about should have a publicly viewable list of all the source material it was trained on, but the creators will never do that, since pretty much all of them ignored copyright when scraping the internet for more material.
I would like to make a proposal that we change the terminology of this "Artificial Intelligence", AI, stuff to "Stochastic Intelligence", SI, as that is a far more accurate descriptive term for what it does.
But then are we not also stochastically intelligent? After all, there is random input into everything cognitive, human or otherwise. I think it's a sliding scale, and while AI and human cognition land at different points of involvement of randomness, all cognition is on the scale.
Artificial vs Human/Natural is the opposition that more accurately describes the intent behind the distinction. Randomness is present in both but is not the main point of either. AI is artificial because it is created by humans. That's the meaning of the label.
They're certainly implying that, but they're not doing anything along those lines. The site basically looks for visually similar (but otherwise arbitrary) images from SD's training data and shows them. If you give it a newly-taken photo of yourself it'll find some similar images and claim that yours was created from them.
But more generally, I don't think their central claim - that an AI's output is "based on" visually similar images in its input - is true in any meaningful sense. Or rather, we don't understand how AIs think well enough to even talk meaningfully about the question. I mean - the latent space is there, regardless of what you have and haven't trained on, but do certain images allow the AI to reach certain parts? Is it even the images per se, as opposed to the CLIP data? I don't think anyone can answer such things yet.
I'm deeply confused by this. The latent space is created by a process of reducing the dimensionality of features of objects which are themselves *related*; that is, similar. The latent space thus, to one degree or another, encodes *semantic* information that is a result of the similarities (defined by a similarity function) between the objects.
This semantic information is then applied in later steps (adding noise and then denoising) to create a *new* image that is distinct from all the previous images it was trained on in the defined similarity space. That's why it is *not* simply noising images and denoising them. The variety of output, including on repeated runs with the same parameters, would not be possible if the function worked as you describe above.
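To illustrate why it isn't "noise a training image and denoise it back": generation starts from pure noise and is steered only by the trained denoiser plus the text embedding. Here's a bare-bones sketch of that loop - the denoiser below is a meaningless stand-in, just to show the shape of the process; in a real model the training images only enter through the denoiser's learned weights.

```python
# Toy sketch of diffusion-style sampling: start from pure noise and repeatedly
# denoise, conditioned on a text embedding. No training image is ever loaded.
import numpy as np

rng = np.random.default_rng(42)

def denoise_step(latent, text_embedding, total_steps):
    """Stand-in for a trained denoiser (a real one is a big U-Net or transformer
    whose weights encode what it learned from the training data)."""
    predicted_noise = 0.1 * latent + 0.01 * text_embedding  # placeholder math
    return latent - predicted_noise / total_steps

text_embedding = rng.normal(size=(4, 64))  # pretend CLIP-style embedding of the prompt
latent = rng.normal(size=(4, 64))          # pure noise - the only "starting image"

total_steps = 50
for _ in range(total_steps):
    latent = denoise_step(latent, text_embedding, total_steps)

# In a real pipeline this latent would now be decoded to pixels.
print(latent.shape)
```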
No matter whether we understand what the "black box" of the algorithm setting up the latent space does, we can put our fingers on the starting images in the similarity space, and so we can say what images were available to the algorithm that created the final image.
(It seems to me that this process is trying to be similar to what humans do when they say they know what a chair is. We can generalize from all the chairs we've seen, and generate new, unique ones; we are not limited to imagining only blurred versions of the ones we've seen before. We create a semantic understanding of the concept of "chair" from repeated viewings of different chairs. It does not matter if we do it in exactly the same way.)
You can say, however, that the output is related to related inputs in some way. That is, if you ask for a "chair with a horse sitting on it" you're not going to get a picture of a fish swimming in a lake, right?
When I say "related to related inputs", I'm referring to the fact that there is a step in creating the latent space that associates descriptors with multiple images, and so the latent space is *semantically* organized. That is, if you ask for a horse as part of your picture, that will be used to identify the latent space subset that contains or connects to the "horse" labels.
So it should be possible to say that within your picture, any training data labeled "horse" could be involved. But it's not taking one of those pictures, noising it and denoising it, then presenting it to you. It's using elements of those pictures to assemble a new image.
This is done by the code, not by some kind of "mind", but the effect is to create an image similar to the one requested, yet distinct from anything it was trained on. The training data prepares the model, and the pictures aren't based on anything outside that data set, so there has to be a set of training images closely related to the output; that's just how the model works. So when you say there is no way to trace the output back to a set of input data, I'd say that's incorrect: simply by going to the referenced part of the latent space, you're looking at lower-dimensional representations of the similarly tagged input images, by definition. By extension, if you can access the training data and search for everything labeled "horse", you will find the set of possible input images for the output.
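That kind of lookup is basically a nearest-neighbour search in embedding space. A rough sketch of what I mean (the embeddings and tags here are random stand-ins; a real index would embed the actual training images):

```python
# Rough sketch of tracing an output back to "closely related" training images:
# embed everything into the same space and rank training items by cosine similarity.
import numpy as np

rng = np.random.default_rng(1)

def cosine_similarity(a, b):
    return (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Stand-ins: embeddings for 1,000 training images (some tagged "horse") and one output.
training_embeddings = rng.normal(size=(1000, 128))
training_tags = [["horse"] if i % 10 == 0 else ["other"] for i in range(1000)]
output_embedding = rng.normal(size=128)

# Restrict to the "horse"-tagged subset, then rank by similarity to the output.
horse_indices = [i for i, tags in enumerate(training_tags) if "horse" in tags]
scores = [(i, cosine_similarity(training_embeddings[i], output_embedding))
          for i in horse_indices]
closest = sorted(scores, key=lambda pair: pair[1], reverse=True)[:5]
print(closest)  # the five "horse" training images most similar to the output
```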
This means that any requested image that contains a "horse" should resemble the training data labeled "horse", right? So they actually are training the model to find visual similarities, just not with a human-style cognitive process (that we know of). It's still trained to find visual similarities. That's the actual purpose of the model...
Right?
I've recently seen ads for both air conditioners and cram schools that claim to be AI-driven, so I think it's okay to just use the term AI for any concept at all, and conversely to refer to AI by any label.
As such I vote we start referring to AI and ML systems as "blockchains".
Do you actually consider that AI/ML/DL have any utility at all?
Stengah wrote:they're saying that the source images were used to train the ai on what certain things should look like.
fenomas wrote:They're certainly implying that, but they're not doing anything along those lines. The site basically looks for visually similar (but otherwise arbitrary) images from SD's training data and shows them. If you give it a newly-taken photo of yourself it'll find some similar images and claim that yours was created from them.
That's because you're deliberately misleading it. If you falsely claim the uploaded image was made by an AI, it will trust you and do what it was set up to do. It's not a reasonable expectation to think that it will tell you the image you submitted was not made by an AI.
There was an AI announcement last week that actually had me excited, rather than annoyed by all the... I'll call them Content Remixing AI.
DeepMind Announces Minecraft-Playing AI DreamerV3 (link to the paper)
The AI is made up of three neural networks working in tandem:
a world model which predicts the result of actions, a critic which predicts the value of world model states, and an actor which chooses actions to reach valuable states.
The model the AI used was generated over a few days, with the three agents experimenting in isolation on some pretty beefy hardware. Together they built a model that could reach the goal of digging up diamonds in Minecraft. That major goal of course has multiple sub-goals, like 'make an iron pickaxe', 'learn how crafting works', 'dig', and, you know... 'walk'.
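For a sense of how the three pieces fit together, here's a very stripped-down sketch of the loop - everything is a toy stub, nothing from DeepMind's code, just the way the world model, critic and actor feed each other:

```python
# Conceptual sketch of the world-model / critic / actor loop in Dreamer-style agents.
# Everything here is a stub; it only shows how the three parts interact.
import random

class WorldModel:
    def predict(self, state, action):
        """Predict the next (imagined) state and reward for an action."""
        next_state = (state + action) % 100
        return next_state, 1.0 if next_state == 50 else 0.0

class Critic:
    def value(self, state):
        """Estimate how valuable a predicted state is."""
        return -abs(50 - state)  # toy: states near 50 are "valuable"

class Actor:
    def choose(self, state, world_model, critic, actions=(1, 2, 3)):
        """Pick the action whose imagined outcome the critic rates highest."""
        return max(actions, key=lambda a: critic.value(world_model.predict(state, a)[0]))

world_model, critic, actor = WorldModel(), Critic(), Actor()
state = random.randint(0, 99)
for _ in range(10):
    action = actor.choose(state, world_model, critic)   # actor consults imagined futures
    state, reward = world_model.predict(state, action)  # toy: we roll forward inside the
                                                        # model; the real agent also acts
                                                        # in Minecraft itself
print("final state:", state, "last reward:", reward)
```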
A previous AI + Minecraft milestone was last year, when OpenAI had an AI agent observe about 70,000 hours of Minecraft YouTube content and use that knowledge to go in-game and craft a diamond pickaxe from scratch. Cool, but holy crap that time investment, even sped up and parallelized.
Stengah wrote:fenomas wrote:The site basically looks for visually similar (but otherwise arbitrary) images from SD's training data and shows them. If you give it a newly-taken photo of yourself it'll find some similar images and claim that yours was created from them.
That's because you're deliberately misleading it. If you falsely claim the uploaded image was made by an AI, it will trust you and do what it was set up to do. It's not a reasonable expectation to think that it will tell you the image you submitted was not made by an AI.
I didn't say the site should detect non-AI images, I said that giving it one is an easy way to verify for yourself that what it's actually doing is an image similarity search.
Your complaint isn't making sense to me then. They explicitly say the site is performing a similarity search within the training set and then displaying the closest results, so what about the premise do you find absurd?
How does Stable Attribution work?
When an A.I. model is trained to create images from text, it uses a huge dataset of images and their corresponding captions. The model is trained by showing it the captions, and having it try to recreate the images associated with each one, as closely as possible.
The model learns both general concepts present in millions of images, like what humans look like, as well as more specific details like textures, environments, poses and compositions which are more uniquely identifiable.
Version 1 of Stable Attribution’s algorithm decodes an image generated by an A.I. model into the most similar examples from the data that the model was trained with. Usually, the image the model creates doesn’t exist in its training data - it’s new - but because of the training process, the most influential images are the most visually similar ones, especially in the details.
The data from models like Stable Diffusion is publicly available - by indexing all of it, we can find the most similar images to what the model generates, no matter where they are in the dataset.
So I guess your concern about generative AI is the copyright one?
Recent studies have shown around a 2% incidence of what would be legally considered "copying", although it's early days and only small subsets of the models could be examined in detail for the studies. Here's an overview of one.
But I don't see researchers or IP folks arguing that models structured like StableDiffusion are not transformative. It's also obvious that these models depend entirely on their training databases for input, but I don't see how asking "which images were used to make this one" is useful from a technical, rather than legal stance. The question I'd ask is "which input images are closest to the output image", instead, and that's something that can be answered with a similarity search on the training database without much difficulty. It's entirely possible that the transformations could yield false similarities to images not used in the training process. What should we then conclude? I'd attribute that simply to the wide variety of images created by humans so far in history.
I think that, as with many new technologies, the exigencies of how AIs are trained and their eventual (extreme) utility will force a change in the law, similar to the way musicians had to accommodate streaming music services. I suspect artists and many others will view this as "being screwed", but how can you realistically limit the use of images of the world, which are needed for training but in every direction contain the copyrighted or private material we deal with every day?
Just as a human could be influenced, on purpose or incidentally, by the things they see daily, I suppose.
Stengah, if you don't agree here I'm not trying to change your mind or anything. But my complaint, again, is that the site is claiming to be way more than just a similarity search. On the top page and elsewhere in its FAQ it's claiming to show which training images a given output is most influenced by - to the extent that the output should be attributed to those specific inputs, and share revenue with their authors. That's a much bigger claim than just saying which images are visually similar.
That said, ironically it seems to me like the site argues rather more forcefully against its conclusions than for them. Even for its built-in samples, the inputs and outputs look much less similar than I would have expected - which would seem to argue for StableDiffusion being far more transformative (in the copyright law sense) than its critics believe.
I can't agree or disagree since I still don't understand your complaint. Is it that you think they're just guessing that the images they select from the training data were the main influences on the AI-generated image? Or is it that they're only picking a handful of the most visually similar images, when they ought to be making an exhaustive list of all the possible source images that went into the AI's understanding of the terms applicable to the image it created?
Okay, but we know exactly what images they used to train it, so where is the issue in saying that those most similar to the output had the greatest influence on the output?
They're pretending that the question they want to answer (which images a target was derived from) and the question they can answer (which images a target is similar to) are the same question.
Still lost on what the ultimate problem is. Should we not give the creators of the model the benefit of the doubt as regards their model? Given that you and I are not going to trawl the code. But it strikes me that "derived from" and "similar to" are close enough in meaning that either of them could be argued in court as showing that the original was the source of the output. (I assume that's still your concern?)
Maybe this will help?
"Substantial similarity" is actually the term for the test used to determine whether something has violated copyright...
Kind of a click-baity splash image, but overall a good video discussing the new lawsuits alleging copyright infringement, whether a work is derivative or transformative, and implications for fair use.