The Joys Of Programming

So here's a thing that happened. I have a voxel game engine that I've been working on for 4-5 years. It's MIT licensed and on GitHub, but as far as I know nobody else has ever done anything with it.

Then last week, Mojang apparently released a new retro version of Minecraft built on it.

Somebody who poked around the source was kind enough to email and let me know, otherwise I suppose I might not have found out...

That's awesome!

neat!

It's weird... ostensibly it's great but I'm kind of down about it. I wish I'd known sooner, so I could have participated in the Reddit threads and Hacker News comments and whatever, instead of picking through them a week later.

I mean, if Mojang has further plans for it then maybe there's more to come. But somehow it has the look and feel of a one-time marketing goof that's now over with.

Gah. Well, I get to bother you folks about it, anyway.

I guess you can add that to the project's GitHub page. Minecraft approved!

Speaking of that, here's a reminder everybody - put proper attribution stuff in your open source work! Mojang built their files in such a way that attribution comment blocks would be retained in the source bundle, but I never got around to adding one so my engine went uncredited.

Pushed a fix today, but considering they're using an engine version from mid 2017 I have a feeling they may not immediately update.

I mentioned this in the GWJ Slack, but like many before me I have embarked on the project of parsing PDF files to extract information. Unfortunately the format and locations of the data of interest will vary, even within files from the same provider.

The existing solution we use is built on the .NET port of iText. It reads through a PDF, infers a row/column structure, gathers atomic text objects, and logs the points defining the rectangular cells that contain them. Finally, the page, content of the text object, and the coordinates defining the cell are dumped to SQLite.
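To make that last step concrete, here is a minimal sketch in Python of what such a table could look like. The table name, column names, and helper functions are hypothetical; the actual schema the iText-based tool writes isn't described in this thread. It assumes PDF-style coordinates, with the origin at the bottom-left of the page.

```python
import sqlite3

# Hypothetical schema: one row per text object, with the page it came from
# and the corners of the inferred cell that contains it.
SCHEMA = """
CREATE TABLE IF NOT EXISTS text_cells (
    id    INTEGER PRIMARY KEY,
    page  INTEGER NOT NULL,
    text  TEXT    NOT NULL,
    x0    REAL NOT NULL,  -- left edge of the cell
    y0    REAL NOT NULL,  -- bottom edge of the cell
    x1    REAL NOT NULL,  -- right edge of the cell
    y1    REAL NOT NULL   -- top edge of the cell
)
"""


def store_cells(conn, cells):
    """Insert (page, text, x0, y0, x1, y1) tuples and return how many were given."""
    conn.execute(SCHEMA)
    rows = list(cells)
    conn.executemany(
        "INSERT INTO text_cells (page, text, x0, y0, x1, y1) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        rows,
    )
    conn.commit()
    return len(rows)


def cells_on_page(conn, page):
    """Return one page's text, ordered top to bottom and then left to right."""
    cur = conn.execute(
        "SELECT text FROM text_cells WHERE page = ? ORDER BY y1 DESC, x0 ASC",
        (page,),
    )
    return [row[0] for row in cur]
```

Sorting on `y1 DESC` reads from the top of the page down only because of the bottom-left origin assumed here; a tool that flips the y-axis would need the opposite order.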

From there my team queries the database and attempts to make sense of the data, transforming them into Excel and JSON files. It’s curiously messy and at present error-prone because, for instance, a given text object may overlap columns and so we need to find the orphans and merge them.
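One way to picture the orphan problem is as an interval-overlap question: give each text fragment to the column it overlaps the most, then join whatever lands in the same column. The sketch below is only an illustration of that idea in Python, not the team's actual code; the function names and the `(text, x0, x1)` fragment shape are assumptions, and it handles a single row at a time.

```python
from collections import defaultdict


def overlap(a0, a1, b0, b1):
    """Length of the overlap between [a0, a1] and [b0, b1], or 0 if disjoint."""
    return max(0.0, min(a1, b1) - max(a0, b0))


def assign_to_columns(fragments, columns):
    """Group one row's fragments by the column each overlaps most.

    fragments: list of (text, x0, x1)
    columns:   list of (left, right) x-ranges, one per column
    Returns one string per column, with its fragments joined left to right.
    Fragments that overlap no column at all are left out of the result.
    """
    if not columns:
        return []
    buckets = defaultdict(list)
    for text, x0, x1 in fragments:
        widths = [overlap(x0, x1, left, right) for left, right in columns]
        best = max(range(len(columns)), key=lambda i: widths[i])
        if widths[best] > 0:
            buckets[best].append((x0, text))
    return [
        " ".join(text for _, text in sorted(buckets[i]))
        for i in range(len(columns))
    ]
```

A fragment that straddles a boundary goes wherever most of its width sits, which is a heuristic rather than a guarantee; tables with centered or right-aligned text may need a different rule.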

I’ve resolved to read the PDF spec as far as I need to make sense of the control codes that define fonts and font sizes and whatnot. I’ve used qpdf to uncompress a PDF to see how the actual text of interest is codified, thinking that there’s got to be a more direct way than inferring inflexible rectilinear structure to parse the information.
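For a sense of what those control codes look like once a page's content stream is uncompressed: `Tf` selects a font and size, while `Tj` and `TJ` show text. The Python sketch below is a rough illustration under heavy assumptions, not a real parser. It only handles simple literal strings in parentheses, ignores hex strings, nested parentheses, and positioning operators, and does no font-encoding or ToUnicode mapping, so the "text" it returns is only readable for simple fonts. The function names are made up for this example.

```python
import re

# Matches a font selection ("/F1 12 Tf") or a text-showing operator
# ("(Hello) Tj" or "[(Hel) -20 (lo)] TJ").
TOKEN = re.compile(
    r"/(?P<font>\S+)\s+(?P<size>[\d.]+)\s+Tf"
    r"|(?P<tj>\((?:\\.|[^\\()])*\))\s*Tj"
    r"|\[(?P<arr>(?:\\.|[^\]\\])*)\]\s*TJ"
)
# A literal string without nested, unescaped parentheses.
STRING = re.compile(r"\((?:\\.|[^\\()])*\)")


def unescape(literal):
    """Strip the outer parentheses and undo escaped parentheses and backslashes."""
    return re.sub(r"\\([()\\])", r"\1", literal[1:-1])


def text_runs(stream):
    """Yield (font, size, text) for each Tj/TJ operator, in stream order."""
    font, size = None, None
    for m in TOKEN.finditer(stream):
        if m.group("font"):
            # Remember the current font until the next Tf changes it.
            font, size = m.group("font"), float(m.group("size"))
        elif m.group("tj"):
            yield font, size, unescape(m.group("tj"))
        elif m.group("arr") is not None:
            # A TJ array mixes strings with kerning numbers; keep the strings.
            pieces = STRING.findall(m.group("arr"))
            yield font, size, "".join(unescape(p) for p in pieces)
```

Even in this toy form it shows why the coordinate-based approach is so common: the stream says which glyphs to paint and where, but nothing about rows, columns, or reading order.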

That said, many far smarter people have tried this and done the same thing we do, more or less. It just seems like the abundance of PDF readers/renderers suggests we could make more efficient sense of the underlying data than that.

Anyone know of better approaches (not including “Don’t use PDF”, since that’s not an option yet)?

muraii wrote:

I’ve resolved to read the PDF spec as far as I need...

And he was never seen again. Heard he was still gibbering when they gently closed the door to the padded room.

One can only hope.

I actually wrote some Elixir code a few years back at work to convert landscape pages in a PDF to portrait for faxing. Doctors kept uploading entire patient histories, ImageMagick would use more than the allowed RAM, it'd kill the job and put it right back on the front of the queue, killing the system over and over til I figured it out.
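The post doesn't say what the eventual fix was, but a common guard against this kind of poison job is to cap the number of attempts and requeue failures at the back rather than the front. Here is a minimal sketch of that idea in Python (not the original Elixir); `run_queue`, `handler`, and `max_attempts` are hypothetical names, and a Python exception stands in for the out-of-memory kill described above.

```python
from collections import deque


def run_queue(jobs, handler, max_attempts=3):
    """Run jobs in order; requeue failures at the back, up to max_attempts each.

    Returns (done, dead): jobs that succeeded and jobs that were given up on.
    """
    queue = deque((job, 0) for job in jobs)
    done, dead = [], []
    while queue:
        job, attempts = queue.popleft()
        try:
            handler(job)
            done.append(job)
        except Exception:
            attempts += 1
            if attempts >= max_attempts:
                dead.append(job)  # set aside for a human to look at
            else:
                queue.append((job, attempts))  # back of the line, not the front
    return done, dead
```

With a cap in place, one oversized upload costs a few wasted attempts instead of stalling everything queued behind it.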

I don't envy you having to look at PDF internals at all.

Feel free to PM me any questions, though, and I'll see what I can do to help.

Obliged!

libvips seems to be the new hotness for image and PDF manipulation. Is it more sane than ImageMagick? I haven't had a chance to use it yet, but I have some tools in production putting ImageMagick to heavy use.

Mr Crinkle wrote:
muraii wrote:

I’ve resolved to read the PDF spec as far as I need...

And he was never seen again. Heard he was still gibbering when they gently closed the door to the padded room.

One of the lucky ones.

Mr Crinkle wrote:
muraii wrote:

I’ve resolved to read the PDF spec as far as I need...

And he was never seen again. Heard he was still gibbering when they gently closed the door to the padded room.

Which is why when my work needed to scrape PDFs, we used the iTextSharp library.

Quintin_Stone wrote:

Which is why when my work needed to scrape PDFs, we used the iTextSharp library.

Yup. We rolled our own and eventually scrapped it, switching to iTextSharp.

-BEP

I'm wondering if anyone has recommendations on reputable training courses. I'm feeling a little rusty on some specific technical skills and thinking of brushing up because of the possibility I might be on the job market in the future. In the past I would skim or read big technical books, but I haven't done that for a while.

I've been working at ad agency jobs for the past 5 years. By their nature, ad agencies are fairly reactive, unfortunately, so you generally learn in the moment by simply looking up language or framework documentation. What I'm thinking of is something more structured and guided to get the cobwebs out of the corners of my skillset.

bepnewt wrote:
Quintin_Stone wrote:

Which is why when my work needed to scrape PDFs, we used the iTextSharp library.

Yup. We rolled our own and eventually scrapped it, switching to iTextSharp.

-BEP

We use it too, but the other new hire and I came into this project with neither of us particularly familiar with PDFs. We’ve learned a lot and tried a bunch of different parsers/scrapers, mostly in Python, but we may need to step back a bit.

I know this gets asked like twice annually but maybe we're due? Anyone have particular affinity for reading technical books on electronic devices? So, like, some of the Humble Bundle/No Starch Press stuff on an iPad, or a second display?

I prefer the ability to flick through an actual book, but part of the reason I got an iPad Pro with a Pencil is to read digital textbooks. Those bundles are just so cheap, plus I can get online access to stuff through work.

muraii wrote:

I know this gets asked like twice annually but maybe we're due? Anyone have particular affinity for reading technical books on electronic devices? So, like, some of the Humble Bundle/No Starch Press stuff on an iPad, or a second display?

I do a lot of technical reading on my laptop. The tools in Preview are pretty nice for being able to highlight text and take searchable notes.

PWAlessi wrote:
muraii wrote:

I know this gets asked like twice annually but maybe we're due? Anyone have particular affinity for reading technical books on electronic devices? So, like, some of the Humble Bundle/No Starch Press stuff on an iPad, or a second display?

I do a lot of technical reading on my laptop. The tools in Preview are pretty nice for being able to highlight text and take searchable notes.

This.

I don't typically read through tech books without being on my laptop. On my laptop I can alt-tab between reading the book and trying examples out.

If I want to just burn through a tech book without trying out examples an iPad works.

In the past I would have said I preferred paper tech books for the sake of reference, but these days I use the Internet for reference more than tech books.

WAR IS PEACE
PROD IS TEST
IGNORANCE IS STRENGTH

muraii wrote:

I know this gets asked like twice annually but maybe we're due? Anyone have particular affinity for reading technical books on electronic devices? So, like, some of the Humble Bundle/No Starch Press stuff on an iPad, or a second display?

All digital, all the time. Paper books are heavy, space-eating, obsolete junk.

This attitude, though, comes with a strong preference for tablets with large screens. 7" and 8" tablets are mostly useless IMO. 10" is the bare minimum for a legitimate replace-all-print-media tablet for me, and really I would like something larger. There aren't enough good large tablets out there.

I read a lot of tech books on a Kindle Paperwhite in landscape mode, mainly those from PragProg.

It's just wide enough to be feasible, and something larger would be better, but I make do with what I have.

I read a lot of technical material (math, physics, theoretical computer science, etc.) in PDF on a 10" iPad. It's great but just barely big enough. The newer 11" and 12.9" iPad Pros would be even better.

I also use my laptop, but like to read in bed with the iPad.

Does anyone here program on a Mac? My current personal laptop is a 2013 MacBook Pro 13". I've gotten a lot of mileage out of it, but it's starting to show its age in terms of compilation time, etc.

I've been programming on Macs only since about 2007, 2008, so I'm inclined to just get another MacBook Pro. This one has lasted me 6 years and is in good enough shape that I could definitely sell it or pass it on without trouble.

I don't love the modern MacBooks. I hate that they're entirely USB-C. I don't like the keyboard, but I don't really want to make a switch either. I'm wondering what others in this boat have done.

Right now we have just stuck with our 'mid 2014' 15" model. I agree with you that the keyboards have stopped me from seriously considering one of the new ones. I don't know what I will do in a year or two if they insist on staying with their current design.

I was in the same boat as you a year ago and spent months looking for an alternative that I either a) liked the look of or b) didn't like the look of, but maybe it was cheaper or had better components or something. In the end, I couldn't find anything, so I ended up with a late 2018 15" MacBook Pro (coming from a late 2013 13"). Note, though, that I basically never work on it without an external monitor, mouse, and keyboard. If I did, I'd have stuck with the 2013 because I hate both the trackpad and keyboard on the newer laptops.

I only program on a laptop if I have to (i.e., when remote). Desktops are way faster and can drive more monitors. If I have to do remote development, as long as the internet is fast enough I usually remote to my desktop. At that point whatever feels best is what matters.

I can't help from the Mac side, as the last Mac I used was running Mac OS 8, but I assume you can remote to a Mac desktop.

I have a 2017 15" MacBook Pro for programming. At work I attach a USB dongle for keyboard and mouse, and two monitors with DVI-to-USB-C connectors. All the actual work is done on the two monitors; the laptop screen is for my email and Slack desktop and my non-work browser. This works pretty well. It can handle most things I want to do as long as I don't get too click-happy firing up Docker containers.

Working at home the keyboard is adequate. I can get stuff done but I would not want to do a lot of serious fast typing on it. I don't use any external monitors.