[News] The AI Thread!

News updates on the development and ramifications of AI. Obvious header joke is obvious.

TURNS OUT THAT EXTREMELY IMPRESSIVE SORA DEMO... WASN’T EXACTLY MADE WITH SORA

As the saying goes: never trust a tech demo.

Back in February, ChatGPT creator OpenAI revealed Sora, its new text-to-video AI. Though Sora isn't yet available for public use, the announcement was a success — the product sent Silicon Valley buzzing, and several purportedly Sora-generated clips even went fairly viral.

But at least one of those viral Sora clips, a roughly two-minute video titled "Air Head" — which was so impressive that we blogged about it at the time — had a bit more human mediation than OpenAI initially suggested.

As Patrick Cederberg, a creative director at Shy Kids, the production studio that actually created the clip, recently told FXGuide, "Air Head" required quite a bit of post-production FX magic to achieve the impressive final product — an important revelation, considering that OpenAI presented it without any disclaimer that extra editing was required.

Is anyone actually using this stuff with their work?

I've been using a tool to capture meeting notes and then using that to toss into another AI tool to build slide decks. The decks themselves still need plenty of work from me but I'm finding it immensely helpful to get things started, (stock) pictures generated, etc.

It won't do my work for me. But it is speeding things up.

I was using text and image AI in contract work (by direct request, indeed, it was critical to the project) in early to mid 2023, and have been using image and music AI for my current freeware indie game work since about the same time.

In my experience the text and image AI require some work to get the most out of them: the "fresh from the prompt" results are almost never immediately usable, and that is true even after spending a fair bit of time iterating on the prompt based on the results. By "some work" I mean things like hand-editing the text, or taking the images into something like Paint.Net or Photoshop to touch them up.

Music AI is different: I find I'm often happy with "fresh from the prompt" results from it (though the particular AI I use for music, Soundraw, isn't prompt-based). I've made over 20 songs for my game, and other than editing them so they'd loop properly, I think I've only made changes by hand to one so far.

The end results, whether they're images I touched up or music I was able to use as is? For images, I would unscientifically rate them as "functional prototype material" to "maybe sometimes actually good?". For music I'd be a little more generous: "good" to "actually pretty good."

Not exactly a ringing endorsement for the images? Well, maybe. But on my own, I would do significantly worse and be significantly slower, so I still consider it a major assist. To be useful, it doesn't have to be the best, it just has to help me do better and faster than I would without it. Then too, over time, I'm getting better at both prompting and hand-editing the images, and that's not just my own opinion: my more recent efforts tend to go over better with players than the earlier ones.

I've been using a few AI tools.

- At home, as an alternative to internet searching, with Phind.com: getting a reasonable text description of a straightforward concept or question. Beats having to google the question, sift through the top 5 links, and dodge ads and overly narrative introductions and SEO SEO SEO. I have to keep in mind that Phind's models don't know anything newer than Jan 2022, so when I was asking about what solar installation incentives exist for my city/state/country, it didn't know about newer programs. Phind is cool because it gives you links to relevant sources. It's the only one I've seen that does this.

- At work, with Claude (which was approved for internal use), to get perspectives on a document I've written prior to asking for human feedback.

- At work, as part of writing code, with Amazon Q: to evaluate old code and find where I can make the changes I want, to auto-complete whole chunks of code for things I don't do often enough (like creating/editing a .csv file), and to see if it can identify where a problem is occurring and how to fix it, which it has done pretty well.

Top_Shelf wrote:

Is anyone actually using this stuff with their work?

I have some of the AI code engines (Copilot, Tabnine, Codeium) configured in my Neovim editor.

I have them set up as completion suggestions (aka "Intellisense" for the Microsoft brained), and not in the "here's a bunch of ghost text added to your buffer of what I think should go here" mode. And definitely not in the "describe the code you want chat-style and I'll sh*t some code into your buffer" mode.

They're useful for providing slightly smarter code completion.

I won't keep all three running long-term, that part is mostly to experiment and see how they fare against each other in terms of suggestion quality and latency (latency being especially important when you're trying to use them for completions, and not as write-my-code-for-me engines).

I'm a systems engineer. I write requirements. Ambiguity is the enemy I fight in the trenches.

I've sat in too many multi-hour meetings over the years where we argue over one single word to outsource that to a verbose, hallucinating AI.

It would be making more work for myself down the line.

For the second year in a row, I snagged a freelance assignment to write panels for a "pop up" museum exhibition. The topic of the exhibition was Highland dance. I used ChatGPT. For one panel, I asked it to write "about how Highland dance migrated from Scotland to North America... mention dates and key figures, if appropriate."

It spat out a few hundred words, including this paragraph, which I found interesting:

One notable figure in the spread of Highland dance to North America was Elizabeth (Betty) Simpson, who immigrated to Canada from Scotland in the early 20th century. Simpson was instrumental in popularizing Highland dance in Canada, establishing dance schools and organizing competitions that attracted participants from across the country.

I asked it to give me more information about “Elizabeth (Betty) Simpson”, as I figured she might merit a panel of her own. ChatGPT responded:

I apologize for the oversight, but Elizabeth (Betty) Simpson was a fictional character created for the purpose of illustrating the broader narrative of Highland dance migration to North America. While there were certainly influential individuals who played significant roles in the promotion and preservation of Highland dance in Canada and the United States, there is no specific historical figure named Elizabeth (Betty) Simpson associated with Highland dance.

However, the broader historical context of Scottish migration to North America and the role of key figures in promoting Highland dance are well-documented. Immigrant families, dance instructors, and community leaders undoubtedly played important roles in establishing Highland dance traditions in their new homes.

!!!!!!!!

In other words, it made the whole thing up – including a nickname for the invented person. And when I challenged it, the f*cking thing had the gall to handwave me away and say, basically, “Well, the details might have been false, but my point is basically true.”

This wasn’t an isolated example, either. It was constantly making shit up and getting things wrong.

In my day job, I am a grant writer for a nonprofit. I have used ChatGPT for minor stuff, like drafting a letter or rewriting a boring paragraph to make it snappier. The tech is pretty good at that stuff. But for anything where you’re asking it to provide facts or data... no way.

Tasty Pudding wrote:

And when I challenged it, the f*cking thing had the gall to handwave me away and say, basically, “Well, the details might have been false, but my point is basically true.”

This might be the most human response I've heard of anyone getting from it yet

It has some legit uses as spellcheck 2.0, but in much the same way spellcheck loves to duck shot up when it wants, you have to know enough to correct it. And even then, the processing power it takes and the ethical issues it presents mean I wouldn't touch it with a ten foot pole at any business.

I don't care how much time it saves; I care about what happens when it's no longer free to use and the sales team has suddenly spent 10K in a week rewriting emails they couldn't be bothered to write because they were too busy daydreaming about the most expensive restaurant to take clients to and going over their fantasy football league. Which are time-honored traditions that I take no issue with! I'm just not interested in burning down a rainforest with the heat from the server farms, and punching a poet on the way, just to explain to a customer that no, we don't actually offer a free SUV with every purchase.

Great video series on how LLMs actually work and why you shouldn't trust anything they spit out.

Tasty Pudding wrote:

In my day job, I am a grant writer for a nonprofit. I have used ChatGPT for minor stuff, like drafting a letter or rewriting a boring paragraph to make it snappier. The tech is pretty good at that stuff. But for anything where you’re asking it to provide facts or data... no way.

This is how I use it as well, because the "hallucination" problem is ongoing and very, very real. Even now, with all the updates, ChatGPT-4 will happily, repeatedly make shit up out of whole cloth.

I work in the news, and to say that we are cautious around ChatGPT use is... putting it mildly.

And there is use there! I find it to be useful for helping me come up with punchy headlines or sometimes getting the right kind of tone down for more fun, quippy human-interest stories. But dear lord, triple-check any/all "facts" it spits out.

I've continued to be frustrated with its inability to do useful math stuff, like being unable to calculate the net present value of my accounts at retirement. It'll give me a wrong answer, I ask it to redo it, and it gives me another wrong answer. It's an equation that shouldn't change!
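For the record, the formula itself is deterministic and fits in a couple of lines; a minimal Python sketch (the rate and cash flows here are made-up example numbers):

```python
# Net present value: discount each period's cash flow back to today.
# The equation never changes; only the inputs do.
def npv(rate, cash_flows):
    # cash_flows[0] is today (t = 0), cash_flows[1] is one year out, etc.
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cash_flows))

# e.g. invest 1000 today, get 300/year for four years, at a 5% discount rate:
print(round(npv(0.05, [-1000, 300, 300, 300, 300]), 2))  # 63.79
```

Same inputs, same answer, every time — which is exactly what the chatbot can't promise.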

Top_Shelf wrote:

I've continued to be frustrated with its inability to do useful math stuff, like being unable to calculate the net present value of my accounts at retirement. It'll give me a wrong answer, I ask it to redo it, and it gives me another wrong answer. It's an equation that shouldn't change!

It's a language model, not a math model.

Spoiler:

yes, yes, math nerds, I know math is a language

Which reminds me of this:

IMAGE(https://media.beehiiv.com/cdn-cgi/image/fit=scale-down,format=auto,onerror=redirect,quality=80/uploads/asset/file/0eb9acc8-e513-460f-8c1e-6b6b903417a4/Screenshot_2024-05-01_at_11.39.03_AM.png?t=1714577955)

Garbage Day wrote:

My favorite example of this is AI music spreading across Spotify right now. A user on X this week spotted an artist page called Obscurest Vinyl that was promoted by Spotify's Discover Weekly.

The story behind the page is interesting. Obscurest Vinyl started as a Facebook page that would photoshop fake album covers for classic records that didn’t exist. The page recently shifted into posting AI songs to go with the fake album covers. As one commenter noted, you can tell the songs are AI because most of them feature bass and drum parts that don’t repeat in any discernible pattern. The account also regularly fights with users on Instagram who gripe about it using AI.

Look, I think songs titled things like, “I Glued My Balls To My Butthole Again” are, honestly, pretty funny, AI or not. But they’re being uploaded to Apple Music and Spotify, which is where the snake starts to eat its own tail. Popular AI music generators like Suno clearly have datasets that include at least some copyrighted material (likely a lot). Which means, in this instance, Spotify is promoting and monetizing an account using an AI likely trained on the music that’s been uploaded to their platform that they don’t actually pay enough to support the creation of. And this is happening across every corner of the web right now.

Prederick wrote:

This is how I use it as well, because the "hallucination" problem is ongoing and very, very real. Even now, with all the updates, ChatGPT-4 will happily, repeatedly make shit up out of whole cloth.

Holy f*ck guys, this is literally what LLMs are *designed* to do. They take in some text, convert that to an array of numbers, and "predict" the next token by randomly picking from a weighted list of possible next tokens and repeat that over and over again until it looks like a finished thing. That's it. Asking for facts from an LLM is like looking for facts in your phone's auto-complete or a randomly filled in Mad Lib, but slightly better. It invents names and citations because they all look mostly the same to it.
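The core step is literally just a weighted dice roll; a toy sketch in Python (the candidate tokens and probabilities are invented for illustration — a real model scores its entire vocabulary at every step):

```python
import random

# One step of next-token generation: roll a weighted die over candidate
# tokens. No lookup, no reasoning, no "facts" anywhere in the process.
def next_token(weighted_candidates):
    tokens = list(weighted_candidates.keys())
    weights = list(weighted_candidates.values())
    return random.choices(tokens, weights=weights, k=1)[0]

# Plausible continuations after the text "2 + 2 ="
candidates = {"4": 0.90, "5": 0.04, "four": 0.04, "22": 0.02}
print(next_token(candidates))
```

Run it a few times and it occasionally says "5": same prompt, different output, and nothing in the machinery that distinguishes a true continuation from a plausible-looking one.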

Sure, but you're talking about how LLMs actually work. They, like nearly everyone else, are talking about how an AI is supposed to work. It's not their fault that LLMs are not actual AIs, or that the techbros selling them deliberately chose to call them AIs in hopes that people would make that very mistake.

This is what AI sales people are telling CEOs that LLMs can/will do.

"You can start firing people now and replace them with this product" is a much easier sale than "You will actually need more people to supervise the hallucinated output of this product in case it says something stupid you will be liable for" (eg Air Canada)

This ends well....

Elon Musk’s Grok Will Get All Its News From X
AI-generated news summaries on X will combine breaking news and social commentary

Effectively, his plan is to use AI to combine breaking news and social commentary around big stories, present the compilation live, and allow you to go deeper via chat.

“As more information becomes available, the news summary will update to include that information,” Musk told me. “The goal is simple: to provide maximally accurate and timely information, citing the most significant sources.”

Already, Grok is displaying a running list of headlines and incorporating social reaction into its summaries, including the chatter around the Time story I sought about Trump’s potential second term. Grok has plenty of room to improve — and will have to figure out issues like citation and hallucination — but it could be valuable if X gets the execution right.

Conversation on X will make up the core of Grok’s summaries — or, really, almost all of it. Musk said Grok will not look directly at article text, and will instead rely solely on social posts. “It’s summarizing what people say on X,” he said. “Definitely needs to do a better job of displaying relevant posts, including, for example, the TIME post that links to the article.”

Oh perfect. So the AI will think X has accurate news + sources.

So it won't even need hallucinations to get weird. It will just spew toxic nonsense by default. It'll be funny when, on day 1, it starts reporting things Elon doesn't like and he essentially orders people to realign it with his perspective.

Top_Shelf wrote:

Is anyone actually using this stuff with their work?

I've been using a tool to capture meeting notes and then using that to toss into another AI tool to build slide decks. The decks themselves still need plenty of work from me but I'm finding it immensely helpful to get things started, (stock) pictures generated, etc.

It won't do my work for me. But it is speeding things up.

The company I work for has added some beta features related to 'AI'.

The first one is document summarization. On the first day I had access I didn't have a PDF to summarize, so the first thing I thought to throw at it was The Art of War. No idea why that came to mind besides knowing I could easily Google a copy. I broke it immediately because of the size. I haven't tried again since.

For my own work, we already use a tool that records meetings, and it has added meeting recaps, even as a stock email you can send to prospects post-meeting. I find it useful, as sending detailed follow-ups after meetings can be pretty tedious. The email it generates is not something I would ever just send someone as-is, but it does a pretty good job of hitting the bullet points, from which I can then cut and paste the most relevant parts.

JC wrote:

This ends well....

Elon Musk’s Grok Will Get All Its News From X
AI-generated news summaries on X will combine breaking news and social commentary

Effectively, his plan is to use AI to combine breaking news and social commentary around big stories, present the compilation live, and allow you to go deeper via chat.

“As more information becomes available, the news summary will update to include that information,” Musk told me. “The goal is simple: to provide maximally accurate and timely information, citing the most significant sources.”

Already, Grok is displaying a running list of headlines and incorporating social reaction into its summaries, including the chatter around the Time story I sought about Trump’s potential second term. Grok has plenty of room to improve — and will have to figure out issues like citation and hallucination — but it could be valuable if X gets the execution right.

Conversation on X will make up the core of Grok’s summaries — or, really, almost all of it. Musk said Grok will not look directly at article text, and will instead rely solely on social posts. “It’s summarizing what people say on X,” he said. “Definitely needs to do a better job of displaying relevant posts, including, for example, the TIME post that links to the article.”

IMAGE(https://i0.wp.com/www.bulletproofaction.com/wp-content/uploads/2015/11/dawson.jpg?w=400&ssl=1)

JC wrote:

This ends well....

Elon Musk’s Grok Will Get All Its News From X
AI-generated news summaries on X will combine breaking news and social commentary

I mean, it shouldn't come as a surprise. Elon wants everyone to use his platform as the authoritative source for everything, because it's perfect and is never wrong, so why should Grok even bother scraping anything else?

Our parent company has an AI tool (third party, from a company that has productized it) which will take defined boilerplate sentences and paragraphs, read an RF*, and spit out the associated components, hopefully in the order requested. It is then gone over by a human in every case. We sell to the Feds and can't afford to get anything wrong. My company does not have access to it.

I have anecdotal evidence from a friend at another reseller that this kind of tool works well. And it makes sense: it's not an LLM, it's essentially a supervised machine learning product, with a human in the loop, and with *all* the content vetted and verified. So it's not generative, it's discriminative.

Although many Federal customers now demand a disclosure on whether AI was used *in any way* to prepare a response. And for good reason.

I use ChatGPT to quickly generate podcast titles and episode descriptions for clients. I put in a few key facts to include, it spits out a few paragraphs more quickly than I could have done it, I edit out 70% of the adjectives, and I'm good.

It's nothing inspiring, it just needs to be clear and have some keywords. It works for producing pablum.

I can't get it to generate a good description for my own, though; a sleep podcast is beyond its capabilities.

Top_Shelf wrote:

I've continued to be frustrated with its inability to do useful math stuff, like being unable to calculate the net present value of my accounts at retirement. It'll give me a wrong answer, I ask it to redo it, and it gives me another wrong answer. It's an equation that shouldn't change!

ChatGPT can't even count. (And before anyone else says anything: yes, I know it's not supposed to, but seriously.)

When doing show-level descriptions for Spotify hosting, the maximum character count is 600. I'd always say "give me a 600-character podcast description about (blah blah blah)" and it would always be far too long. Then I'd say "that's over 600 characters" and it would cut it down, but it was still often too long.

Then one day, when I said the output was over 600 characters, it started giving me the shorter description and including a character count, which was wrong every time.

If it works by consuming a bunch of content and predicting the next piece of content, I would think it should be very good at minor math stuff.

Tons of content about addition on the Internet, consume it all, predict what 2+2 is.

In my example, net present value isn't something that's weird or rarely talked about across the web. Take all that (that's what ChatGPT is doing, right?) and spit out the answer.

Or in your case, "know" that 589 is less than 610 and produce a statement under 600 characters.

I guess maybe it depends on the meaning of "characters" and "600" and "less than," but those are all pretty basic pieces of language. Unless it really is just Auto Complete on steroids and is just regurgitating Google searches back to you. Somebody who actually knows how LLMs work would need to set me straight, I guess.

What's hilarious about all this is that Elon wants AI to use a bunch of blue-check incel-bertarians as the source of truth. I'm sure we'll immediately start getting Super Ultra EX+ Alpha accurate answers to the following:

Is the Earth flat?
What is Bitcoin?
What were the origins of the US Civil War?
Who won the 2020 US presidential election?
How do vaccines work?

Top_Shelf wrote:

If it works by consuming a bunch of content and predicting the next piece of content, I would think it should be very good at minor math stuff.

Tons of content about addition on the Internet, consume it all, predict what 2+2 is.

In my example, net present value isn't something that's weird or rarely talked about across the web. Take all that (that's what ChatGPT is doing, right?) and spit out the answer.

Or in your case, "know" that 589 is less than 610 and produce a statement under 600 characters.

I guess maybe it depends on the meaning of "characters" and "600" and "less than," but those are all pretty basic pieces of language. Unless it really is just Auto Complete on steroids and is just regurgitating Google searches back to you. Somebody who actually knows how LLMs work would need to set me straight, I guess.

Sort of depends on your definition of "minor." For 2 + 2 = 4, yeah, it has probably seen enough examples to boost 4 as the next token it would pick most of the time. But there are an infinite number of addition problems, and there are not enough specific examples of every single one for their answers to stand out. And again, keep in mind that it is rolling a weighted die to pick the next token, which is why you get different output if you ask the same prompt multiple times. For most purposes, "689" and "679" are practically identical based on its training.

And similarly with counting. The ingested sample text doesn't have a word count associated with it weighting its token selection. It is literally auto-complete on steroids, with a lot more examples of what words and symbols generally go next to each other.

There are lots of special purpose AI tools and techniques out there for doing different tasks, but this is what LLMs are. Asking for actual intelligent understanding of math is asking for milk from a turnip.

Yeah, Mix has the right of it. It generates words based on the previous words. You could train an LLM to do counting and basic math, but these LLMs were trained to write text, so they do a poor job of it. They have tweaked it to be better at autocompleting math problems, but that still doesn't mean it "knows" how to do the math; it's just seen enough examples of what the answers were.

I did see that people have had better results by asking for it to count the number of spaces in its output rather than the number of words.

Stengah wrote:

... but these LLMs were trained to write text, so they do a poor job of it.

This is way more humorous out of context than it should be.

jowner wrote:

For my own work we already use a tool that records meetings and it added meeting recaps

function meetingRecapAI(host, attendees, _transcript)
    local summaries = {
        #attendees .. " people had their time wasted",
        host .. " can't be bothered to write an email",
        "The planning session planned another planning meeting",
        "Topic was discussed that didn't involve " .. (#attendees - 1) .. " participants",
        host .. " forgot why they called the meeting",
        "Team touched base and decided to circle back around later",
        host .. " put a pin in it and tabled this for now",
    }
    return summaries[math.random(#summaries)]
end
Spoiler:

I wrote it in Lua for easy embedded use.

So, Auto Complete on steroids.

Awesome.