The Joys Of Programming

Python has really started to grow on me.
I went to edit the library's website a bit today and ended up going through and reformatting the code for every page.
And yes, I have been informed that HTML is not programming; please don't lynch me.

Mixolyde wrote:

Have you learned the power of the greatest remote shell tool ever: Screen? It cured my cancer AND made me more attractive to the opposite sex.

I was a Screen user and now am a tmux user.

In a lot of ways, tmux feels like a cleaner, more modern Screen. The config file syntax is much more sensible and readable.

It's also much easier to Google search for.

I just learned the miracle of screen two days ago. Effing VPN.

Back to recommendations for messing with *nix: I am a huge fan of running Linux from an external USB drive. In fact, I am typing this on an Ubuntu install that resides on a 500 GB USB drive sitting on my gaming rig.

I get the modern hardware of my gaming rig, I get a non-VM install (I find the general unresponsiveness of VMs off-putting, though they can be incredibly useful), and I didn't have to crack my case, or repartition a drive, or mess with boot records. I just enter my BIOS' boot menu on startup, and select the USB drive.

Funny thing is, I haven't unplugged the drive in three days... my transformation may be nearing completion.

AND

I just started messing with KVM this week. Ubuntu has a graphical VM manager that makes creation and installation a snap. And it runs over SSH. So simple.

(I find the general unresponsiveness of VMs off-putting, though they can be incredibly useful)

Under Windows, VMware is pretty good. Running games is chancy at best, and really drive-intensive stuff will be a lot slower, but most of the time it feels almost native speed.

I just started messing with KVM this week.

KVM is super-responsive, and extremely solid. In another year or two, I think it may become one of the best virtualization systems going.

My "joy" the past couple of days was trying to get a demo together for a product that is at least two months from being done. My boss dropped by my office on Wednesday and casually dropped that he was going to a trade show and *had to* demo what we are working on. So, less than 2.5 days to hack a bunch of crap together.

This did beat the previous "joy" where one of our sales guys sold this same product with a go-live date of 10/10. Didn't find out about that until the week before. Was told we could "slip" a week or two. My boss seemed offended when I got angry and shouty then laughed in his face. Luckily, once the "slip" date came and went, expectations with the customer were reset and we could get back to some level of sanity. For a couple of days.

Between offshore devs (not my choice!), clueless salespeople and bosses, and floaty requirements, this month has just been a great big ball of good times.

I'm pretty excited about C++11. I've been playing with C++0x features as they've been rolled out to GCC, and they're pretty uniformly awesome.

Any suggestions for books, websites, techniques, etc. for learning how to really approach a problem in an object-oriented way? I'm not a developer but I've taken a couple of Java courses just to try to test the waters, and while I don't really have much trouble with the exercises I'm given in class, when I try to approach a real-world issue I just can't "objectify" it. I write a lot of Perl in my day-to-day job but when I've tried to use either Java or OO Perl to do the same I end up giving up and writing a long inelegant script just to get it done.

Honestly, I think that "object-oriented design" is overemphasized a lot. The real goal is to design things in a way that makes clear sense, and gets the problem solved. Sometimes OO is a good fit for that. Frequently it's not. One thing that contributes to the problem is that most OO languages conflate inheritance (code re-use) with subtyping (API compatibility), and some (like Java) also mix in code compartmentalization (often called modules or packages, although in some languages modules are about APIs).

The feeling I get from your question is that part of what you're asking is about OO, but a more important part (from my POV) is that you want to write better, more modular code (and part of why you're asking about OO is because you've heard it's good at that, and part is because languages like Java force you into using it). So I'll ramble on about both a little bit.

If I'm mistaken and you already write the world's awesomest modular non-OO code, then all you need to hear is "Write your modules as classes with object state so that you can be using the module in different ways at the same time. When you need a new type of value, make that a class, too." If that's not enough, read on.

--

The key features to look for in a problem to see that OO design is truly a good fit for it are "identity" and "state". Let's look at those two in terms of files in the file system, and objects used to represent various things related to them in a programming environment.

First, let's look at a concept that's a really good match for OO: an object that represents a file open for IO. In this case, whenever you create a new object, you're "opening" a file. The object contains state about which file is open, where the "read head" is in the file, possibly buffers to support IO on it. If you have two different "file handle" objects, each one can have a different state: it can be open at a different position in the file, its buffers will contain different things. Because of that, each "file handle" has its own identity--if you have two different objects, their state is going to diverge, even if they're pointed at the same file. If you have two different routines that you want to work on a given file from front to back, you should give them different file handle objects, because each needs to have its state tracked separately. Contrariwise, if you want to call a routine that will read some data from a file, and then afterwards you want to continue work from where that routine left off, you should use the same file handle object so that you pick up with the new state.
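To make the two-handles point concrete, here is a minimal Python sketch. The file contents and the helper name read_header are made up for illustration; the only thing it relies on is that each open() call returns a handle with its own position.

```python
import os
import tempfile

# Create a small throwaway file to read from (the contents are made up).
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("first\nsecond\nthird\n")
    path = tmp.name

def read_header(handle):
    """Consume one line; the handle remembers where we stopped."""
    return handle.readline().rstrip("\n")

a = open(path)   # two handles on the same file,
b = open(path)   # each with its own read position

header = read_header(a)        # advances a's position only
rest_of_a = a.read().split()   # continues from where read_header left off
all_of_b = b.read().split()    # b still starts from the top

a.close()
b.close()
os.remove(path)

print(header, rest_of_a, all_of_b)
```

Passing the same handle to read_header and then reading on picks up the new state, while the second handle is unaffected.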

A concept that doesn't fit as well is "file paths in the file system"--which is a common way to work with files that are not open. In this case, the objects don't really have their own identity: if you have two different objects each referring to the same path in the file system, they should be functionally identical. In addition, they don't really have their own internal state--any query you make to the object is only providing a reflection of information that's actually stored in the FS. As far as implementation goes, however, you might actually want to keep some state around: requesting details from the OS about a file (its permissions, its size, etc.) might be somewhat costly (particularly with a network FS). So maybe you want to store state in the object. But... if you do, and somebody else makes another object with the same FS path, it won't share state with the first object. Because of that, you might want to store the actual state in some central location so that it can be shared. Or you might want the standard "give me a file path object" function to return an *existing* object if one exists, etc. This all gets messy. In reality, a "file path" object is generally made to not contain state, but to make OS requests--it's easier to implement this correctly, and the cost of not caching information is pretty tiny. As a result, there's really no benefit from using such an object over simply passing around a string containing the FS path.
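As a rough illustration in Python, using the standard pathlib module as one example of a "file path" object (the file name is made up): two path objects for the same path compare equal, and neither caches anything, so every query goes back to the OS. A plain string does the same job.

```python
import os
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as d:
    name = os.path.join(d, "notes.txt")   # hypothetical file name
    p1 = Path(name)
    p2 = Path(name)

    same_value = (p1 == p2)     # equal paths are interchangeable
    same_object = (p1 is p2)    # distinct objects, but nothing depends on that

    p1.write_text("hello")
    size_before = p2.stat().st_size      # p2 asks the OS; it holds no state

    p1.write_text("hello, world")
    size_after = os.path.getsize(name)   # a plain string works just as well

print(same_value, same_object, size_before, size_after)
```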

(Now, if your language's library had a way to refer to a file by inode, that might be different. In the case of simply referring to a path, you can't "follow" the file if it gets renamed. If you tracked the actual identifier the OS uses ... but I digress. There are a lot of problems if you try to do that, too.)

--

More usefully, here are some ideas that in my opinion might allow you to structure your code better:

Think in terms of "communicating subsystems"--services talking back and forth to each other like you might have over the network. Each subsystem has its own state, and has its own functionality that can be called upon. Some services are things you call, they do their thing, tell you they're done, and you continue. "Sort this file for me." "Okay, done." Some are things you call and they give you back a value. "Here's a file, leave the file on disk alone, but give me back the lines of the file after sorting them." "Okay, here are the lines." And some are things where you talk to the service multiple times. "Here's a file. Whenever I ask, please give me the next line in sorted order." "Okay." "Now give me the next (first) line." "Here it is." etc.
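The three styles of "service" described above might look something like this in Python. All of the names here (sort_file_in_place, sorted_lines, SortedLineReader) are hypothetical, and the sketch assumes the file is small enough to sort in memory.

```python
import os
import tempfile

def sort_file_in_place(path):
    """'Sort this file for me.' Does its thing and returns nothing."""
    with open(path) as f:
        lines = sorted(f.readlines())
    with open(path, "w") as f:
        f.writelines(lines)

def sorted_lines(path):
    """'Leave the file alone, but give me back the sorted lines.'"""
    with open(path) as f:
        return sorted(line.rstrip("\n") for line in f)

class SortedLineReader:
    """'Whenever I ask, give me the next line in sorted order.'"""
    def __init__(self, path):
        self._lines = iter(sorted_lines(path))

    def next_line(self):
        # Returns None once every line has been handed out.
        return next(self._lines, None)

# Throwaway input file for the demonstration.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("pear\napple\nfig\n")
    path = tmp.name

as_list = sorted_lines(path)       # one call, one value back
reader = SortedLineReader(path)    # a conversation with kept state
first = reader.next_line()
second = reader.next_line()

sort_file_in_place(path)           # one call, side effect only
with open(path) as f:
    on_disk = f.read()
os.remove(path)
```

Only the third style needs to remember anything between calls, which is why it is the one that naturally becomes an object.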

Sometimes, all of the above things make sense to implement just as single functions (methods), but as they get more complicated it can be useful to have them be objects. For example: Imagine a template system. The primary functionality here is to say "I have a template file F, and a bunch of name to value mappings M. I want to expand the template with those value mappings." A single-call way to do this is to have a function "expand_template(F, M)" that takes two parameters: the filename of the template, and the dictionary/hash/whatever with the mappings. You call it, and it returns the string value of the output.
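A minimal sketch of the single-call version in Python. The $name placeholder syntax is an assumption (it is simply what the standard library's string.Template understands), and the template text is made up.

```python
import os
import string
import tempfile

def expand_template(filename, mapping):
    """Read the template file and substitute $name placeholders in one call."""
    with open(filename) as f:
        text = f.read()
    return string.Template(text).substitute(mapping)

# Throwaway template file for the demonstration.
with tempfile.NamedTemporaryFile("w", suffix=".tmpl", delete=False) as tmp:
    tmp.write("Dear $name, your total is $total.")
    F = tmp.name

M = {"name": "Ann", "total": 42}
result = expand_template(F, M)
os.remove(F)

print(result)
```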

How can this be elaborated in an object-oriented way?

First, consider that parsing the template might be costly, and you might want to expand the same template many times. This suggests that you might want a call "T = parse_template(F)" that takes a template filename, and returns a value that represents the results of parsing the template (a template object). Now you can tell the template object "T.expand_template(M)" to expand itself with a given mapping, and it doesn't have to re-parse the template file each time. You might have different methods you can call on the template object, depending on whether you want to get the results as a string, or write the results out to an open file handle.
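Here is one way that split could look in Python. It is only a sketch: the {name} placeholder syntax and the write_to method name are assumptions made for the example, and "parsing" here is nothing more than one regular-expression split done up front.

```python
import io
import os
import re
import tempfile

class Template:
    """Result of parsing: literal text and placeholder names, alternating."""
    def __init__(self, parts):
        self._parts = parts

    def expand_template(self, mapping):
        # Even positions are literal text, odd positions are placeholder names.
        return "".join(
            part if i % 2 == 0 else str(mapping[part])
            for i, part in enumerate(self._parts)
        )

    def write_to(self, handle, mapping):
        # Same expansion, delivered to an open file handle instead.
        handle.write(self.expand_template(mapping))

def parse_template(filename):
    """Parse the file once; the returned object can be expanded many times."""
    with open(filename) as f:
        return Template(re.split(r"\{(\w+)\}", f.read()))

with tempfile.NamedTemporaryFile("w", suffix=".tmpl", delete=False) as tmp:
    tmp.write("Hello {name}!")
    F = tmp.name

T = parse_template(F)
os.remove(F)   # the file is no longer needed once it has been parsed

greetings = [T.expand_template({"name": n}) for n in ("Ann", "Bob")]
out = io.StringIO()
T.write_to(out, {"name": "Cy"})
```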

Next, imagine that there are a variety of options you can set for how the template should be parsed. A simple toggle would be whether or not to allow the template to run arbitrary code. If your templates are all coming from a secure location, then of course you might want to call a function from inside the template to decide what to do. But if your templates are coming from an untrusted source, you only want to let them do pre-defined things. You need to know which option you're using at parse time, since you should report an error when you try to parse a template that does things it shouldn't. With only one option, you could just change the parse_template call: "T = parse_template(F, allow_functions)". But if you have a lot more options that are needed at parse time, that gets really messy. At this point, it might make sense to make the options "state" for the template parser, and that implies having a template parser object. So now: "P = template_parser()", and then "P.allow_functions(true)", and "T = P.parse_template(F)".
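A rough Python sketch of a parser object that carries its options as state. Everything specific here is an assumption for illustration: the {call:...} syntax for function calls, the TemplateError name, and the fact that parse_template takes the template text directly rather than a filename (to keep the example short).

```python
import re

class TemplateError(Exception):
    """Raised at parse time when a template does something it shouldn't."""

class TemplateParser:
    def __init__(self):
        self._allow_functions = False   # safe default for untrusted templates

    def allow_functions(self, allowed):
        self._allow_functions = allowed

    def parse_template(self, text):
        # Placeholders look like {name} or {call:function_name}.
        parts = re.split(r"\{([\w:]+)\}", text)
        for name in parts[1::2]:   # odd positions hold the placeholder names
            if name.startswith("call:") and not self._allow_functions:
                raise TemplateError("function call not allowed: " + name)
        return parts

untrusted = "Total: {call:total}"

P = TemplateParser()
try:
    P.parse_template(untrusted)
    rejected = False
except TemplateError:
    rejected = True     # reported at parse time, before any expansion

P.allow_functions(True)
T = P.parse_template(untrusted)
```

Adding another parse-time option means adding another setter on the parser, not another parameter on every parse_template call.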

Back to the template object: It makes sense for this to be an object, as well, if you have questions you can ask the template. For example "what MIME type do you produce?" The template object might also have internal logic to let it go back to the parser and say "Hey, has the file I came from changed on disk? If so, re-parse me." so that updates to the template file are reflected in the output without the code that's using the template having to ask for updates.
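Both ideas can be sketched in a few lines of Python. This version is simplified on purpose: the template object reloads itself rather than going back through a parser, change detection is a bare modification-time check, and the mime_type attribute and $name syntax are assumptions for the example.

```python
import os
import string
import tempfile

class Template:
    def __init__(self, path, mime_type="text/plain"):
        self._path = path
        self.mime_type = mime_type   # answers "what MIME type do you produce?"
        self._load()

    def _load(self):
        self._mtime = os.path.getmtime(self._path)
        with open(self._path) as f:
            self._text = f.read()

    def expand_template(self, mapping):
        # Re-read the file if it changed on disk since we last loaded it.
        if os.path.getmtime(self._path) != self._mtime:
            self._load()
        return string.Template(self._text).substitute(mapping)

with tempfile.NamedTemporaryFile("w", suffix=".tmpl", delete=False) as tmp:
    tmp.write("Hi $name")
    path = tmp.name

T = Template(path, mime_type="text/plain")
before = T.expand_template({"name": "Ann"})

# Edit the template on disk, and push the timestamp forward so the change
# is visible even on filesystems with coarse modification times.
with open(path, "w") as f:
    f.write("Bye $name")
later = os.path.getmtime(path) + 10
os.utime(path, (later, later))

after = T.expand_template({"name": "Ann"})   # caller never asked for a reload
os.remove(path)
```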

And finally, once you have a concept of template parser objects, this opens you up to the possibility of having *different* template parsers, perhaps for different templating languages. Now when one customer wants language X and another wants language Y, you could compile different code for them. Or alternatively, your code could instantiate one of many parsers depending on your application's configuration.
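As a sketch of the "instantiate one of many parsers depending on configuration" option in Python: the two template languages, the class names, and the template_language configuration key are all invented for the example. The point is only that the calling code is identical for both.

```python
import string

class DollarTemplate:
    """Templates written with $name placeholders."""
    def __init__(self, text):
        self._template = string.Template(text)

    def expand_template(self, mapping):
        return self._template.substitute(mapping)

class BraceTemplate:
    """Templates written with {name} placeholders."""
    def __init__(self, text):
        self._text = text

    def expand_template(self, mapping):
        return self._text.format_map(mapping)

class DollarParser:
    def parse_template(self, text):
        return DollarTemplate(text)

class BraceParser:
    def parse_template(self, text):
        return BraceTemplate(text)

# Hypothetical registry mapping a configuration value to a parser class.
PARSERS = {"dollar": DollarParser, "brace": BraceParser}

def make_parser(config):
    return PARSERS[config["template_language"]]()

M = {"name": "Ann"}
out_x = make_parser({"template_language": "dollar"}).parse_template("Hi $name").expand_template(M)
out_y = make_parser({"template_language": "brace"}).parse_template("Hi {name}").expand_template(M)
```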

--

Anyway, hope this has been some food for thought for you. The above is *something* like how I build systems in OO languages. Not exactly, because I'm also thinking in terms of modularity: I start right out with the question of "what do I want to do with a template?" and design a template API, then think "how do I get the template?" and design a template factory (parser) API, and go from there. I frequently start out with a very simple breakdown, using functions rather than objects, and then move to an object-based API when stuff starts to get complicated. In Java, I'm likely to switch to objects earlier, simply because it's hard to get anything done in Java without objects--and particularly, you can't define a type that isn't a class or (preferably) interface. In Python, I'm likely to keep the question of "what is a template value?" abstract--I might start out with it simply being a string that the expand_template function uses, then turn it into an object later. I frequently like function-call APIs over method-call APIs when I can use them, but I think that's a lot because I have a strong background in functional programming. (Practically, I think that they're better a lot of the time because they don't force you to lock in early what operations should be methods of what objects.)

In short: Don't stress OO. But *do* think about modularity. Think about what parts of the code you're writing are modular pieces (subsystems) that you might be able to re-use, and break those off into separate pieces of code. As your system grows more elaborate, you'll naturally tend towards a more OO approach in the design of those pieces. And even with simple systems, your code will be easier to follow if you can say "this piece does the templates", "this piece does the calculations", and "this piece is the program you call from the shell that asks for the calculations and then puts them into the template to produce output".

Sadly, there's no easy lesson for any of these things: learning how to design good APIs is all about experience. Ideas like OO design can give you a bit of a leg up at times, but they can also lead you into traps like thinking every problem demands an OO solution. Over the years, I've worked with a lot of people with different backgrounds, and the rules I've been able to observe are: a greater breadth of experience leads to much saner APIs. The people who've done Java (OO) *and* C (low-level) *and* Perl (scripting) *and* Lisp (functional) are the ones who write the most logical APIs, in which every part of the API uses the paradigm best suited to it. And, a greater depth of experience leads to much cleaner APIs. The people who've been working for over a decade have already made all of the mistakes of over-elaboration and over-simplification and know a bad design before they get too far into it, so they're the ones who write APIs that do just exactly what they need to without leaving anything necessary out or including anything fancy that isn't needed yet.

Finally: The best resource to look at is other people's code. That includes both the APIs of libraries you're using, and the programs that other people write. Read through other people's stuff and see how they get things done. Steal ideas from them. When you're using an API, think about how it's done, and what you like and don't like about it. When you write your own, think about what you're doing and try to do the things you liked and avoid the things you didn't.

Good luck.

So now I'm trying to figure out the best way to structure our RoR application. We're going to have Amazon S3 hosting for file uploads, and then use the AWS services for database support (not sure if there is a preference of which to use instead, looks like options mainly of RDS or SimpleDB).

We definitely want to use a cloud-based provider for hosting the core RoR app, I'm just not sure if we should go with someplace like Heroku or EngineYard, or roll-our-own using Amazon's cloud options. Scalability is definitely a concern, and being a production application that we're aiming for no less than tens of thousands of users, I need to be confident that we make the right decision. Then I wonder if I should just dive into the Amazon EC2 option...

Any thoughts?

Hypatian, that is some of the best writing on programming I have ever read. Great job.

Wow, Hypatian. I think you just distilled the essence of everything I've ever read on OO programming (and development in general) into a single post. You didn't misread my question at all. Until recently "code reuse" for me has meant copying and pasting sections of one script into another, (please don't choke me) which has led to all the issues you might predict, and is the impetus behind my deciding enough is enough and really trying to address the way I approach my solutions. I'm going to have to reread it a few times to digest it fully but I really appreciate your response.

trueheart78 wrote:

So now I'm trying to figure out the best way to structure our RoR application. We're going to have Amazon S3 hosting for file uploads, and then use the AWS services for database support (not sure if there is a preference of which to use instead, looks like options mainly of RDS or SimpleDB).

We definitely want to use a cloud-based provider for hosting the core RoR app, I'm just not sure if we should go with someplace like Heroku or EngineYard, or roll-our-own using Amazon's cloud options. Scalability is definitely a concern, and being a production application that we're aiming for no less than tens of thousands of users, I need to be confident that we make the right decision. Then I wonder if I should just dive into the Amazon EC2 option...

Any thoughts?

If you want easy and quick deployment you can't go wrong with Heroku. I'd consider your budget and delivery timeline. Heroku is great for supporting steep curves on the time and feature graph...but the monetary cost will be more than building your own. If you want total control over your environment and less cost, roll your own. I've deployed both ways, but whenever possible (or when I can convince a client that the cost difference is worth it), I prefer Heroku just because I can spend more time delivering features vs. building up and configuring/troubleshooting app servers. The other nice attraction of Heroku is all of their available add-ons and supported options. You need Redis or MongoDB? It is dead simple to get it added into your stack. You need workers for back-end processes? Just fire up a new dyno.

As a primarily C++ programmer (15 years) I would agree that picking the right tool for the job is paramount. However. I'd also warn you that use of functional programming when OO would make more sense is as bad as the converse. The industry is full of programmers trying to make every problem look like the target for their particular hammer variant.

My history has generally been on the application side, dealing with the handling and display of sensor data (e.g. aviation/marine/sports systems); you tend to know the kinds of things you'll be doing up front, and what kind of data you're going to be dealing with. The requirements creep tends to be towards doing more with the same data.

The codebases I've dealt with have been littered with abstract interfaces with one derivative, and horribly complex functional metaprogramming sections that are only used once each. I've found that often in the rush to make no assumptions about what data you'll be dealing with and what you're going to be doing with it, you make the code much more complicated than it needs to be. My philosophy has always been KISS; the simple answer is often the best one (simplicity is in the eye of the beholder, of course, and I suppose some people do find templated functions of two other templates easier to read).

Consider the hatred functional programmers have for the humble fragment:

for (ListType_t::const_iterator itr = m_List.begin(); itr != m_List.end(); ++itr) { _Function(*itr); }

Oh, the functor objects they want to write:

(in a header file somewhere else in the code tree)
struct Functor { void operator()(const element_type& element) { _Function(element); } };
(in the place you want to use it)
std::for_each(m_List.begin(), m_List.end(), Functor());

for_each loops are great in theory (disconnecting that function from that list, allowing you to apply any function you want to the list, and the function to any list you like), but in practice I don't process the same loops multiple times with different functions. I generally don't apply the same operation to different lists of things. 99% of the time I'm doing specific operations to specific lists, and this all becomes stuff that's harder to debug, and harder to maintain.

If at any point in the process you've written code that you (or the majority of your colleagues) have any doubts about being able to understand 6 months from now, then you've done it wrong, in my view. Either you need more comments, or you've coded it wrong, or both.

Oof. Yeah, going that far outside the idiom of the language you're writing in is pretty awful. That's one reason I don't use C++ when I can help it. ;> (Man, I really don't like C++'s idioms. :D)

The broad form of what you're saying is this: When you're writing in language X as if you're writing in language Y, you have *f*cked up*. You shouldn't try to re-make C++ into ML, or Perl into C++, or Lisp into Perl, or whatever. Write in the language you're writing in. Once you've been using that language a while, then you can afford to subvert its idioms and bring in stuff from outside. (e.g. "LOL, now I am going to implement monads in C++.") But you don't want to be doing that until you're experienced with the language in its own right.

The concern I was thinking of is more basic—along the lines of "should I make operation X a function or a method?" In my opinion, making it a method can take you down a rathole of design choices you don't necessarily want to deal with right away. (I have three different "things" I could attach this to—which one makes the most sense? What else should that class do? etc.)

jakeleg wrote:
trueheart78 wrote:

So now I'm trying to figure out the best way to structure our RoR application. We're going to have Amazon S3 hosting for file uploads, and then use the AWS services for database support (not sure if there is a preference of which to use instead, looks like options mainly of RDS or SimpleDB).

We definitely want to use a cloud-based provider for hosting the core RoR app, I'm just not sure if we should go with someplace like Heroku or EngineYard, or roll-our-own using Amazon's cloud options. Scalability is definitely a concern, and being a production application that we're aiming for no less than tens of thousands of users, I need to be confident that we make the right decision. Then I wonder if I should just dive into the Amazon EC2 option...

Any thoughts?

If you want easy and quick deployment you can't go wrong with Heroku. I'd consider your budget and delivery timeline. Heroku is great for supporting steep curves on the time and feature graph...but the monetary cost will be more than building your own. If you want total control over your environment and less cost, roll your own. I've deployed both ways, but whenever possible (or when I can convince a client that the cost difference is worth it), I prefer Heroku just because I can spend more time delivering features vs. building up and configuring/troubleshooting app servers. The other nice attraction of Heroku is all of their available add-ons and supported options. You need Redis or MongoDB? It is dead simple to get it added into your stack. You need workers for back-end processes? Just fire up a new dyno.

How has the performance on Heroku been?

jakeleg wrote:

Before making a decision, I'd suggest using their free plan to host something and see how you like it.

I will do that, and that will at least let me offload the worry of getting something up and running. In fact, I just got the cue from management today to start on the new web app, and I'm stoked.

Here's one for you guys, how do I make the transition from writing little scripts in python that do parts of my job for me to be a full on, big boy developer? Answers can relate to professional and technical aspects.

boogle wrote:

Here's one for you guys, how do I make the transition from writing little scripts in python that do parts of my job for me to be a full on, big boy developer? Answers can relate to professional and technical aspects.

Learn VB?

It has been great for me. You can scale the dynos (resources) you have available for your app to meet spikes in concurrent requests. This is more a benefit of the cloud platform and something you can also do with EngineYard or EC2. Use of memcache can also greatly help performance, though it is also not a Heroku specific technique. Heroku just makes it extremely easy to implement.

Before making a decision, I'd suggest using their free plan to host something and see how you like it.

Boogle, it's just like any other skill. Take it one step at a time. Practice deliberately. Learn new things, don't just repeat what you already know. Ask questions if you get stuck. The bigger the code base gets, the more important it becomes to be able to read code and write readable code. There are simply an amazing number of resources on the internet now, but there are also an amazing number of things you could be learning. Don't get overwhelmed. It's all just integer arithmetic and Boolean logic underneath.

Since the topic has veered towards coding practices and writing simple and modular code, here is a good read. http://www.laputan.org/mud/

My problem is I'm not really in a dev or even programming role, I just use that to accomplish some of my tasks.
I'm the only programmer in my group, and one of few in my whole department.
I can read and write readable code, but it's mostly just one-off or occasional-use scripts, not 'real' programming, that is, stuff with self in the args and so on.
So to further iterate on this, where can I pick up CS terminology that seems to be thrown around on places I'm trying to learn and how can I push myself to learn to program more complex concepts and programs?

Um. Hmm. CLR, as a start, maybe? It's a tough question. What kinds of CS terminology?

boogle wrote:

My problem is I'm not really in a dev or even programming role, I just use that to accomplish some of my tasks.
I'm the only programmer in my group, and one of few in my whole department.
I can read and write readable code, but it's mostly just one-off or occasional-use scripts, not 'real' programming, that is, stuff with self in the args and so on.
So to further iterate on this, where can I pick up CS terminology that seems to be thrown around on places I'm trying to learn and how can I push myself to learn to program more complex concepts and programs?

From my experience, school then work. If you are looking to skip on the school side of it, then the best way to learn is dive in. Very much like learning a new spoken language, immersion will give you the best results.

Hypatian wrote:

Um. Hmm. CLR, as a start, maybe? It's a tough question. What kinds of CS terminology? :)

I think that should work. I'm fine as long as things stay in the realms of math or formal logic (which is a lot), but there are some CS terms like monkey patching that are a little difficult to grok at first.

And kazar, I guess I would then ask how to immerse myself in more deep, complex programs.

boogle wrote:
Hypatian wrote:

Um. Hmm. CLR, as a start, maybe? It's a tough question. What kinds of CS terminology? :)

I think that should work. I'm fine as long as things stay in the realms of math or formal logic (which is a lot), but there are some CS terms like monkey patching that are a little difficult to grok at first.

And kazar, I guess I would then ask how to immerse myself in more deep, complex programs.

The hardest but most effective way is to get a job working on a deeper, more complex program. You would surround yourself with people who are knowledgeable in the subject and you would be getting first-hand experience. Outside of that, you could get involved with an open source project. There are some very large projects out there (Linux being the best example), but I wouldn't consider the open source community a friendly community.

Truthfully, the best thing you can do might be to start getting into the app world. All these phones have SDKs, use mostly mainstream languages and have lots of good tutorials. You would also get lots of instant gratification, as you will see everything you do on your phone.

You can only learn by doing. You see a massive difference between people who have used a language in anger compared to people who've just read a book. Give yourself a project that interests you and try and implement it. Solve the problems that come up etc.

Good examples obviously depend on which language you're trying to learn, but I implemented a Sudoku solver in Java to learn that, with a GUI so I'd pick up AWT. For Python I did a marine simulator (tide effects, poor GPS HDOP etc.) because I actually needed one of those to test some of my work code. I don't know Erlang, but if I wanted to I might implement an AI system or a matrix linear regression/Cholesky decomposition algorithm in it.

If you've used some Python and want to do something math related, then playing around with SciPy (NumPy) is probably rewarding. You can then transition from NumPy to running calculations on your GPU using PyOpenCL.

Already using numpy, scipy and matplotlib for some reservoir sim back and front end work. It looks like I need to find a project to work on in an open source context, as my workload here is lighter now that I do with scripts what would otherwise be done by hand.