The Joys Of Programming

PragProg's Christmas-in-July sale is on:

https://media.pragprog.com/newslette...

Happy July 24, or as the gerbils like to call it, "Christmas in July!"

Use coupon code XMAS724 to save 40% on all ebooks, screencasts, and audio books (yes! we have audio books too!) from pragprog.com.

IMAGE(https://i.redd.it/uf6rjj174ef31.png)

I'm working on a fun little project right now that I had exactly zero background on when I started, but I've learned quite a bit! I'm running all the GWJ Conference Calls through a Speech-to-Text pipeline to generate transcripts and then put those through an LDA topic model to see the evolution of the podcast and topics over time.

It took me quite a while to get my Google Cloud account set up right and feed it all the correct credentials, but once I did, I've been rolling along! So far I've spent almost $200 of my $400 in free GCP credits and processed over 5000 minutes of podcast. I'll definitely continue to document my progress as I go and share the results in a forum thread once I get them. At the current pace it's going to take another 5 days of processing and at least 4 more free Google trials with alternate emails to get everything parsed.
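For anyone curious what the LDA step actually does, here is a minimal plain-Python sketch using collapsed Gibbs sampling. It is not the poster's pipeline (that reuses a work project, and a real run would more likely use a library such as gensim or scikit-learn); the function names, the toy documents, and the `alpha`/`beta` defaults are assumptions for illustration.

```python
import random


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit LDA to tokenized documents with collapsed Gibbs sampling (sketch).

    docs is a list of token lists. Returns the vocabulary plus the
    document-topic and topic-word count tables after the final sweep.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    num_words = len(vocab)
    topics = range(num_topics)

    doc_topic = [[0] * num_topics for _ in docs]    # n(d, k)
    topic_word = [[0] * num_words for _ in topics]  # n(k, w)
    topic_total = [0] * num_topics                  # n(k)

    # Start every token in a random topic and record it in the counts.
    assignments = []
    for d, doc in enumerate(docs):
        doc_z = []
        for w in doc:
            k = rng.randrange(num_topics)
            doc_z.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(doc_z)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], assignments[d][i]
                # Take this token out of the counts before resampling it.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Unnormalized full conditional p(z = t | all other tokens).
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta)
                    / (topic_total[t] + num_words * beta)
                    for t in topics
                ]
                k = rng.choices(topics, weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1

    return vocab, doc_topic, topic_word


def top_words(vocab, topic_word, n=5):
    """The n words with the highest counts in each topic."""
    return [
        [vocab[v] for v in sorted(range(len(vocab)), key=lambda v: -row[v])[:n]]
        for row in topic_word
    ]
```

Fed one bag of words per episode, each row of `doc_topic` gives that episode's topic mix, which is what you would plot against air date to see topics evolve.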

That’s awesome! I’m keen to see some results.

staygold wrote:

I'll definitely continue to document my progress as I go and share the results in a forum thread once I get them.

I'm looking forward to this. Awesome project!

CPWilson wrote:
staygold wrote:

I'll definitely continue to document my progress as I go and share the results in a forum thread once I get them.

I'm looking forward to this. Awesome project!

I can only hope to approach the level of your awesome infographics!

Update 1: I'm through nearly 100 podcasts for speech-to-text. Here are some intermediate descriptive statistics:

  • 86 podcasts
  • 115 hours, 11 minutes, 28 seconds, 93 milliseconds of podcast time
  • 815 895 spoken words (± about 15%; the speech-to-text accuracy isn't superb)
  • 3 965 324 characters in the words
  • $222.89 CDN in compute cost
  • $28.99 CDN in federal/provincial taxes
  • 11 693 mentions of "game" or "games"
  • 31 mentions of "nerd" or "nerds"
  • 8 mentions of "booth babes"
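Tallies like these mostly come down to tokenizing each transcript and counting. A minimal sketch, assuming the transcripts are plain strings; the regexes, labels, and function names are hypothetical rather than the code used for the numbers above. The `\b` word boundaries keep, for example, "gamer" from counting as "game".

```python
import re

WORD_RE = re.compile(r"[a-z']+")  # apostrophes count as part of a word


def transcript_stats(transcripts, phrases):
    """Word count, characters inside words, and phrase mentions across transcripts.

    phrases maps a label to a regex applied to the lowercased text.
    """
    words = [w for text in transcripts for w in WORD_RE.findall(text.lower())]
    mentions = {
        label: sum(len(re.findall(pattern, text.lower())) for text in transcripts)
        for label, pattern in phrases.items()
    }
    return {
        "words": len(words),
        "characters": sum(len(w) for w in words),
        "mentions": mentions,
    }


def format_duration(total_ms):
    """Render milliseconds as 'H hours, M minutes, S seconds, MS milliseconds'."""
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    seconds, millis = divmod(rest, 1_000)
    return f"{hours} hours, {minutes} minutes, {seconds} seconds, {millis} milliseconds"


PHRASES = {
    "game/games": r"\bgames?\b",
    "nerd/nerds": r"\bnerds?\b",
    "booth babes": r"\bbooth babes\b",
    "Legion": r"\blegion\b",
}
```

For example, `format_duration(414688093)` returns "115 hours, 11 minutes, 28 seconds, 93 milliseconds", the podcast-time figure in the list.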

This afternoon I started and finished my web crawler to go through and download all the podcasts, so I am now the unofficial historian and keeper of GWJ podcast history (except episodes 3-52 and 55, which I think rabbit had hosted locally and which aren't available anymore).
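The core of a crawler like this is pulling the audio links out of a listing page. A minimal stdlib sketch, assuming episodes are linked as plain `<a href="...">` tags pointing at .mp3 files; the class and function names and any page URL you pass in are hypothetical, and a real crawler would also follow pagination and pause between requests.

```python
import os
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen, urlretrieve


class Mp3LinkParser(HTMLParser):
    """Collect absolute URLs of <a href="..."> links that point at .mp3 files."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Ignore any query string when checking the file extension.
        if href and href.split("?")[0].lower().endswith(".mp3"):
            url = urljoin(self.base_url, href)
            if url not in self.links:  # skip duplicate links on the page
                self.links.append(url)


def find_mp3_links(html, base_url):
    parser = Mp3LinkParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


def download_all(page_url, dest_dir):
    """Fetch one listing page and save every linked .mp3 into dest_dir."""
    with urlopen(page_url) as response:
        html = response.read().decode("utf-8", errors="replace")
    for url in find_mp3_links(html, page_url):
        filename = os.path.basename(urlparse(url).path)
        urlretrieve(url, os.path.join(dest_dir, filename))
```

Keeping the parsing separate from the downloading means the link extraction can be checked against a saved page without touching the network.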

I found a slick solution using R to convert .mp3 files to .wav, so I'm running that, using Python to talk to Google Cloud, and fine-tuning my topic modelling algorithm, which is a lift-and-shift from a work project. Hopefully I'll have some early results for the first 100 or so podcasts by the end of the week.
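On the Python side, it can help to inspect each converted .wav before sending it off, both to confirm the sample rate and channel count match whatever you put in the Speech-to-Text request config and to total up podcast time. A minimal stdlib sketch, assuming the R step produces uncompressed PCM WAV; `wav_info` and `make_silent_wav` are hypothetical helpers, and the Google Cloud call itself is left out.

```python
import io
import wave


def wav_info(source):
    """Basic properties of a PCM WAV file, given a path or a binary file object."""
    with wave.open(source, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),  # 2 bytes = 16-bit samples
            "sample_rate_hz": rate,
            "duration_ms": round(frames * 1000 / rate),
        }


def make_silent_wav(seconds, rate=16000):
    """Build an in-memory mono 16-bit WAV of silence (handy for testing)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(seconds * rate))
    buf.seek(0)
    return buf
```

Summing `duration_ms` over every file gives a total you could pass to a formatter like the one for the statistics list.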

P.S. I have no classical training in coding (Bachelor's in Chemical Engineering, Master's in Business Management), so if I'm not proof that you can teach anyone to code, I don't know what more you need!

How many mentions of "Legion" though?

*Legion* wrote:

How many mentions of "Legion" though?

Legion? Omnipresent, of course.

*Legion* wrote:

How many mentions of "Legion" though?

Legion > Booth Babes
(12 mentions of Legion)

Your code must be buggy. No way am I losing to "nerds"!

IMAGE(https://media.giphy.com/media/8SjVrE66V6WVG/source.gif)

staygold wrote:

I'm working on a fun little project right now that I had exactly zero background on when I started, but I've learned quite a bit! I'm running all the GWJ Conference Calls through a Speech-to-Text pipeline to generate transcripts and then put those through an LDA topic model to see the evolution of the podcast and topics over time.

Figure out what percentage of words are "Like" and how it's changed over time.

-BEP
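That "like" share is easy to compute once transcripts exist. A minimal sketch, assuming each episode is an (air date, transcript) pair; `like_share` is a hypothetical name. Plain word counting cannot tell filler "like" from the verb, and the roughly 15% speech-to-text error noted above applies here too.

```python
import re
from datetime import date
from typing import Iterable, List, Tuple

WORD_RE = re.compile(r"[a-z']+")


def like_share(episodes: Iterable[Tuple[date, str]]) -> List[Tuple[date, float]]:
    """Percentage of words that are "like" in each episode, oldest first."""
    result = []
    for air_date, text in sorted(episodes, key=lambda ep: ep[0]):
        words = WORD_RE.findall(text.lower())
        pct = 100 * words.count("like") / len(words) if words else 0.0
        result.append((air_date, pct))
    return result
```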

Nice project! I'm interested to see the results. It's something you could write a front-page article about; I'm sure the writers would be interested if you want to go that route.

OK, I should know this, but my brain isn't providing me with answers.
I have a Google Sheet full of data that I need to turn into multiple sheets, each filled with data from the main one.
What's the tool or method to go about this?
My brain jumps to Python, but I'm hoping to do this all within the online Google-verse if possible.
Is it Google Apps Script?

For your needs, Apps Script could be appropriate. A few years ago I found the limited tools available for developing with Apps Script very constraining, but I was trying to update a company's financials that they kept in Google Sheets, hooked together with secret Apps Script hobgoblins. I certainly do not recommend that use case.

If this is a repeatable need you would like to hand off to someone else (a new set of input Sheets arrives and a drone clicks a menu inside Sheets to make the magic happen), Apps Script can do that pretty well. If it's a one-off that's just too complicated to handle with in-spreadsheet filters, my brain also defaults to Python.
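For the one-off Python route, a minimal sketch, assuming the main sheet has been exported from Sheets as CSV and that a single column decides which new sheet each row belongs to; `split_rows` and the example column names are hypothetical. Getting the pieces back into Google Sheets (by importing each CSV or via the Sheets API) is left out.

```python
import csv
import io
from collections import defaultdict


def split_rows(csv_text, key_column):
    """Group the rows of one CSV export by the value in key_column.

    Returns {key_value: csv_text_for_that_group}, each with the header row.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    groups = defaultdict(list)
    for row in reader:
        groups[row[key_column]].append(row)

    outputs = {}
    for key, rows in groups.items():
        out = io.StringIO()
        writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
        writer.writeheader()
        writer.writerows(rows)
        outputs[key] = out.getvalue()
    return outputs
```

Each value can be written to its own file and imported as a new tab, which keeps the original sheet untouched.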