The Joys Of Programming

*Legion* wrote:

F**k it.

We'll do it in HyperCard.

IMAGE(http://images.granneman.com/tech-history/1987_HyperCard.jpg)

That's basically exactly how my Senior project in college panned out.

Mmm. HyperCard was awesome. Not in any way a database, though.

complexmath wrote:

Oops, you're right. FoxPro is Microsoft's old DB thing. For single-table stuff, Filemaker is passable, but it's terrible at doing multi-table stuff and the reporting support is surprisingly weak. The last time I used it I wanted to report on a single table with grouped subtotals and as far as I can tell this is impossible in Filemaker.

And Excel CAN do this! Pivot Tables are f*cking awesome-sauce once you learn the interface.

Mixolyde wrote:
complexmath wrote:

Oops, you're right. FoxPro is Microsoft's old DB thing. For single-table stuff, Filemaker is passable, but it's terrible at doing multi-table stuff and the reporting support is surprisingly weak. The last time I used it I wanted to report on a single table with grouped subtotals and as far as I can tell this is impossible in Filemaker.

And Excel CAN do this! Pivot Tables are f*cking awesome-sauce once you learn the interface.

Bingo.

Mixolyde wrote:
complexmath wrote:

Oops, you're right. FoxPro is Microsoft's old DB thing. For single-table stuff, Filemaker is passable, but it's terrible at doing multi-table stuff and the reporting support is surprisingly weak. The last time I used it I wanted to report on a single table with grouped subtotals and as far as I can tell this is impossible in Filemaker.

And Excel CAN do this! Pivot Tables are f*cking awesome-sauce once you learn the interface.

Nth this.

Question for anyone with Java web services (Jersey) experience, or maybe even RESTful knowledge:

I've got a POST method that generates a 201 Created response, and I want the Location header to refer to a different top-level resource, e.g.

    @Path("/model-request")
    public class ModelRequestResource {
        @POST
        public Response create(...) {
            ...
            return ResponseBuilder.created(
                UriBuilder.fromResource(ModelResource.class).path("{id}").build(modelID)
            ).build();
        }
    }

However, this always seems to end up relative to the "/model-request" path, so I get "/model-request/model/123" instead of "/model/123". The URI from UriBuilder seems to be correct (with leading slash), so is there any way to get ResponseBuilder not to treat the response as relative to the containing resource?

My solution was to set the Location header to "/model-request/123" and expose a corresponding GET endpoint which simply redirects to the desired URI with a 301 Moved Permanently. Is this roughly in line with good RESTful HTTP behavior?
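For reference, here is a minimal sketch of how plain java.net.URI resolution treats a path with and without a leading slash. It uses only the JDK, not Jersey, and the host example.com and the class name LocationDemo are made up for illustration. Whether ResponseBuilder resolves a relative Location the same way depends on the JAX-RS implementation and version, so treat this as a way to reason about the symptom rather than a description of Jersey's internals.

```java
import java.net.URI;

// Illustration only: plain JDK URI resolution, not Jersey's ResponseBuilder.
// The host name and class name are hypothetical.
class LocationDemo {
    // Resolve a (possibly relative) location against the URI of the request.
    static URI resolve(String requestUri, String location) {
        return URI.create(requestUri).resolve(location);
    }

    public static void main(String[] args) {
        String request = "http://example.com/model-request/";
        // No leading slash: resolved underneath the request path.
        System.out.println(resolve(request, "model/123"));
        // Leading slash: resolved against the host root.
        System.out.println(resolve(request, "/model/123"));
    }
}
```

If the URI handed to created() loses its leading slash somewhere along the way, the first case would match the "/model-request/model/123" result described above.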

Cyranix wrote:

Question for anyone with Java web services (Jersey) experience, or maybe even RESTful knowledge:

I've got a POST method that generates a 201 Created response, and I want the Location header to refer to a different top-level resource, e.g.

    @Path("/model-request")
    public class ModelRequestResource {
        @POST
        public Response create(...) {
            ...
            return ResponseBuilder.created(
                UriBuilder.fromResource(ModelResource.class).path("{id}").build(modelID)
            ).build();
        }
    }

However, this always seems to end up relative to the "/model-request" path, so I get "/model-request/model/123" instead of "/model/123". The URI from UriBuilder seems to be correct (with leading slash), so is there any way to get ResponseBuilder not to treat the response as relative to the containing resource?

My solution was to set the Location header to "/model-request/123" and expose a corresponding GET endpoint which simply redirects to the desired URI with a 301 Moved Permanently. Is this roughly in line with good RESTful HTTP behavior?

I'm not totally sure about good RESTful behavior, that's actually something I need to educate myself on a lot as I have a job interview in 11 days and that's highly relevant.

That said, I think the typical RESTful way to do this is to not use the model-request part of the URI in the first place. Simply POST to /model/123 (or better yet /model if you don't have an id yet).

I believe that's how it's done at my work. I'm too tired to look it up but if I see this thread tomorrow at work I'll confirm it.

Cyranix wrote:

Question for anyone with Java web services (Jersey) experience, or maybe even RESTful knowledge:

I've got a POST method that generates a 201 Created response, and I want the Location header to refer to a different top-level resource, e.g.

    @Path("/model-request")
    public class ModelRequestResource {
        @POST
        public Response create(...) {
            ...
            return ResponseBuilder.created(
                UriBuilder.fromResource(ModelResource.class).path("{id}").build(modelID)
            ).build();
        }
    }

However, this always seems to end up relative to the "/model-request" path, so I get "/model-request/model/123" instead of "/model/123". The URI from UriBuilder seems to be correct (with leading slash), so is there any way to get ResponseBuilder not to treat the response as relative to the containing resource?

My solution was to set the Location header to "/model-request/123" and expose a corresponding GET endpoint which simply redirects to the desired URI with a 301 Moved Permanently. Is this roughly in line with good RESTful HTTP behavior?

REST is supposed to present the world roughly in terms of "resources". Model-request is a more technical term so I think you should be just using /model/123 as Sixteen said. I do REST services for a living and this would be how I'd approach it. Down the road you may want to get a model, update a model, etc. Having the url be simply /model/{id} means it should work across all the different HTTP methods and be a fairly consistent API.

And now I know who to talk to if I have REST questions.

Thanks for the feedback. I really like the idea of reifying the actions that happen in this service -- especially since non-trivial processing is happening behind the scenes, so the response from the POST may indicate a Location for an entity that doesn't wholly exist yet but has merely been allocated -- but I think I will be able to make it all fit into one resource (which I agree is the standard way of doing things). I've condensed a lot of the model representation already, so this should be doable.

Cyranix wrote:

Thanks for the feedback. I really like the idea of reifying the actions that happen in this service -- especially since non-trivial processing is happening behind the scenes, so the response from the POST may indicate a Location for an entity that doesn't wholly exist yet but has merely been allocated -- but I think I will be able to make it all fit into one resource (which I agree is the standard way of doing things). I've condensed a lot of the model representation already, so this should be doable.

The one thing I would say is not to be dogmatic just for its own sake. Many people adopt REST without understanding that the most useful thing about REST is that your API is predictable for the people who consume it. But if you have something that doesn't fit that REST paradigm I wouldn't hesitate to break it out and do it differently.

You point to something that's an issue with web services in general. It's not uncommon to have a web service make a call behind the scenes that's suboptimal for performance in terms of the caller waiting for the HTTP response. I've seen approaches where you send back a 200 or 201 even if there is further processing happening on the server. You're basically saying that nothing failed on initial unmarshalling of the JSON or XML. The contract has been met and preliminary validation has been a success. Then you send off an asynchronous call to do the heavier lifting. This isn't unheard of. I'm not sure if others would consider it RESTful to not actually get the state back from a resource call, but I've seen it used that way. Typically if you did this you'd then make a second call later to try and GET the information you PUT or POSTed.
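A rough sketch of that "accept now, process later" flow, using only the JDK. Everything here is hypothetical (the ModelJobs class, the Status values, the method names), and the in-memory maps stand in for whatever the real service would persist. The submit method is what a POST handler might call, and status is what a later GET would call.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of "accept now, process later". All names here are hypothetical.
class ModelJobs {
    enum Status { PENDING, DONE, FAILED }

    private final Map<String, Status> statuses = new ConcurrentHashMap<>();
    private final Map<String, CompletableFuture<Void>> running = new ConcurrentHashMap<>();

    // What the POST handler would do after preliminary validation:
    // allocate an id, start the heavy work in the background, return at once.
    String submit(Runnable heavyWork) {
        String id = UUID.randomUUID().toString();
        statuses.put(id, Status.PENDING);
        CompletableFuture<Void> done = CompletableFuture.runAsync(heavyWork)
                .handle((ok, err) -> {
                    // Record the outcome once the background work ends.
                    statuses.put(id, err == null ? Status.DONE : Status.FAILED);
                    return null;
                });
        running.put(id, done);
        return id;
    }

    // What the later GET handler would do: report the current state,
    // or null if the id is unknown.
    Status status(String id) {
        return statuses.get(id);
    }

    // Demo helper: block until the background work for this id has ended.
    void awaitCompletion(String id) {
        running.get(id).join();
    }
}
```

HTTP also defines 202 Accepted for a request that has been received but not yet processed, which may be worth considering alongside the 200 or 201 mentioned above.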

DanB wrote:
DSGamer wrote:
Cyranix wrote:

Thanks for the feedback. I really like the idea of reifying the actions that happen in this service -- especially since non-trivial processing is happening behind the scenes, so the response from the POST may indicate a Location for an entity that doesn't wholly exist yet but has merely been allocated -- but I think I will be able to make it all fit into one resource (which I agree is the standard way of doing things). I've condensed a lot of the model representation already, so this should be doable.

The one thing I would say is not to be dogmatic just for its own sake. Many people adopt REST without understanding that the most useful thing about REST is that your API is predictable for the people who consume it. But if you have something that doesn't fit that REST paradigm I wouldn't hesitate to break it out and do it differently.

You point to something that's an issue with web services in general. It's not uncommon to have a web service make a call behind the scenes that's suboptimal for performance in terms of the caller waiting for the HTTP response. I've seen approaches where you send back a 200 or 201 even if there is further processing happening on the server. You're basically saying that nothing failed on initial unmarshalling of the JSON or XML. The contract has been met and preliminary validation has been a success. Then you send off an asynchronous call to do the heavier lifting. This isn't unheard of. I'm not sure if others would consider it RESTful to not actually get the state back from a resource call, but I've seen it used that way. Typically if you did this you'd then make a second call later to try and GET the information you PUT or POSTed.

This is essentially how our services work because the processing we're doing for users takes from 20 minutes to 2 hours. So a user sends a request and we return a 200 that essentially just says we've caught the request and hands back some information they can use to check progress of the data generation. At the end of the day your RESTful interface should expose a consistent and predictable API to clients. I wouldn't get immensely hung up on always having to satisfy all the REST constraints if they're not applicable to your application, although I don't see that the kind of asynchronous webserver behaviour described actually breaks them.

Thanks for the additional points of view. I'm trying to conform to RESTful practices fairly well because I'm cooking up a small prototype and some of the people who will be evaluating it love having things by-the-book. I think in this case, even though the API is reliable and predictable as it stands, I will avail myself of the opportunity to try condensing the aforementioned activities into a single resource class. POSTing to /model and getting back a "201 Created" with minimal information should be feasible -- the way the corresponding GET is set up, that info should be sufficient even if the resulting file is not available for download (and the info will indicate the status of the heavy processing, plus attempts to download an unavailable file are gracefully rejected).

DSGamer wrote:
Cyranix wrote:

Thanks for the feedback. I really like the idea of reifying the actions that happen in this service -- especially since non-trivial processing is happening behind the scenes, so the response from the POST may indicate a Location for an entity that doesn't wholly exist yet but has merely been allocated -- but I think I will be able to make it all fit into one resource (which I agree is the standard way of doing things). I've condensed a lot of the model representation already, so this should be doable.

The one thing I would say is not to be dogmatic just for its own sake. Many people adopt REST without understanding that the most useful thing about REST is that your API is predictable for the people who consume it. But if you have something that doesn't fit that REST paradigm I wouldn't hesitate to break it out and do it differently.

You point to something that's an issue with web services in general. It's not uncommon to have a web service make a call behind the scenes that's suboptimal for performance in terms of the caller waiting for the HTTP response. I've seen approaches where you send back a 200 or 201 even if there is further processing happening on the server. You're basically saying that nothing failed on initial unmarshalling of the JSON or XML. The contract has been met and preliminary validation has been a success. Then you send off an asynchronous call to do the heavier lifting. This isn't unheard of. I'm not sure if others would consider it RESTful to not actually get the state back from a resource call, but I've seen it used that way. Typically if you did this you'd then make a second call later to try and GET the information you PUT or POSTed.

This is essentially how our services work because the processing we're doing for users takes from 20 minutes to 2 hours. So a user sends a request and we return a 200 that essentially just says we've caught the request and hands back some information they can use to check progress of the data generation. At the end of the day your RESTful interface should expose a consistent and predictable API to clients. I wouldn't get immensely hung up on always having to satisfy all the REST constraints if they're not applicable to your application, although I don't see that the kind of asynchronous webserver behaviour described actually breaks them.

SixteenBlue wrote:

I'm not totally sure about good RESTful behavior, that's actually something I need to educate myself on a lot as I have a job interview in 11 days and that's highly relevant.

That said, I think the typical RESTful way to do this is to not use the model-request part of the URI in the first place. Simply POST to /model/123 (or better yet /model if you don't have an id yet).

I believe that's how it's done at my work. I'm too tired to look it up but if I see this thread tomorrow at work I'll confirm it.

Still hiring senior and non-senior engineers on my team, if you're interested. Just filled the UI engineer position with a guy who only knew PHP and a bit of Python.

Bonus_Eruptus wrote:
SixteenBlue wrote:

I'm not totally sure about good RESTful behavior, that's actually something I need to educate myself on a lot as I have a job interview in 11 days and that's highly relevant.

That said, I think the typical RESTful way to do this is to not use the model-request part of the URI in the first place. Simply POST to /model/123 (or better yet /model if you don't have an id yet).

I believe that's how it's done at my work. I'm too tired to look it up but if I see this thread tomorrow at work I'll confirm it.

Still hiring senior and non-senior engineers on my team, if you're interested. Just filled the UI engineer position with a guy who only knew PHP and a bit of Python.

Austin, right? That's not really an option for me anymore.

SixteenBlue wrote:
Bonus_Eruptus wrote:
SixteenBlue wrote:

I'm not totally sure about good RESTful behavior, that's actually something I need to educate myself on a lot as I have a job interview in 11 days and that's highly relevant.

That said, I think the typical RESTful way to do this is to not use the model-request part of the URI in the first place. Simply POST to /model/123 (or better yet /model if you don't have an id yet).

I believe that's how it's done at my work. I'm too tired to look it up but if I see this thread tomorrow at work I'll confirm it.

Still hiring senior and non-senior engineers on my team, if you're interested. Just filled the UI engineer position with a guy who only knew PHP and a bit of Python.

Austin, right? That's not really an option for me anymore.

You know, since the "incident"...

IMAGE(http://2.bp.blogspot.com/_MfEUzep_GRE/SgvY70L91NI/AAAAAAAAEjo/THdPqCKPHhU/s400/lost-incident-juliet.jpg)

SixteenBlue wrote:
Bonus_Eruptus wrote:
SixteenBlue wrote:

I'm not totally sure about good RESTful behavior, that's actually something I need to educate myself on a lot as I have a job interview in 11 days and that's highly relevant.

That said, I think the typical RESTful way to do this is to not use the model-request part of the URI in the first place. Simply POST to /model/123 (or better yet /model if you don't have an id yet).

I believe that's how it's done at my work. I'm too tired to look it up but if I see this thread tomorrow at work I'll confirm it.

Still hiring senior and non-senior engineers on my team, if you're interested. Just filled the UI engineer position with a guy who only knew PHP and a bit of Python.

Austin, right? That's not really an option for me anymore.

C'mon. We have the only live boogle in captivity.

Tempting, if only for the epic IT Crowd re-enactments we would stage.

I only started using REST recently so don't take my word as gospel. I think the way you should deal with long-running actions is to change the resources that you are posting. Say you want to buy a product, and it takes 5 days for a product to be created (my job actually has this type of scenario). Instead of creating the product with an HTTP POST, create an order with an HTTP POST. An order can be created instantly, it can be retrieved instantly, and it can have state that changes as the product is being created (say a status showing the different stages of the creation of said product). When the order is complete, you can do an HTTP GET on the product to get its details.

The biggest challenge I see when writing RESTful services is properly defining resources. Sometimes the resources your system already defines shouldn't be the resources that get exposed in REST. Sometimes the resources exposed by REST are more conceptual than a 1:1 mapping of what is in the datastore.

Basically, when you have a long-running process it's good to have the "start" be some kind of token that the requester can reuse in any status update (or additional start requests), ensuring that the long-running process doesn't get executed multiple times.

The architecture pattern for this is Idempotent messaging. You ensure that multiple requests return a consistent state. Not only useful for long running processes, but for handling potential communication failure scenarios.
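As a minimal sketch of that idea, assuming the client supplies the token and the server keeps a token-to-job map in memory (both are assumptions for illustration, as are the class and method names): repeated start requests with the same token get the same job back, and the job is only started once.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Sketch of an idempotent "start" keyed by a client-supplied token.
// Class, field, and method names are hypothetical.
class IdempotentStarter {
    private final Map<String, String> jobByToken = new ConcurrentHashMap<>();
    // Counts how many times a job was actually started (for the demo).
    final AtomicInteger starts = new AtomicInteger();

    // Returns the job id for this token, starting the job only on first use.
    // ConcurrentHashMap.computeIfAbsent runs the function at most once per key.
    String start(String token, Supplier<String> startJob) {
        return jobByToken.computeIfAbsent(token, t -> {
            starts.incrementAndGet();
            return startJob.get();
        });
    }
}
```

A retry after a dropped response then just looks up the existing job instead of kicking off a second run. A real service would need the map to survive restarts, which this sketch ignores.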

Just wanted to pop in and thank you guys for the discussion a page or two back on non-RDBMS and scalability. Sorry I don't have much to contribute, but it gave me some good things to think about. My group at work is struggling with the limits of what our environment can handle as we start offering more on-demand functionality through the web on top of a system that was in no way designed for that.

For the read operations, a caching layer can usually pick up a lot of slack. Maybe use memcached with a timeout on each entry. For writes, queue up operations on a separate process to commit them to the DB or whatever. If the operation is atomic and has to return a result to a client, give them a transaction ID when they submit the request and use that to track request state. The writer can wipe or update entries in the read cache as well, if you're writing a lot and don't want to frequently fall back to the DB on cache lookup failure.

In fact, all this can really be encapsulated into a single process that comprises a cache and a write queue. Then you may be able to throw out memcached and just use an in-memory hashtable, provided you don't care about synchronizing the cache across an array of these middleware processes. The only real trick then is persisting state so if the process crashes you can recover gracefully. Once all of this is done, you can upgrade storage without breaking clients.
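Something like the following is what I have in mind, as a bare-bones sketch. The CachedStore name is made up, a plain Map stands in for the slow database, and entry timeouts and crash recovery are left out entirely, so this only shows the shape of the cache plus write queue.

```java
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

// Sketch of a read cache in front of a slow store, with queued writes.
// The "slow store" is just a Map standing in for the real database.
class CachedStore {
    private final Map<String, String> slowStore;
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private final BlockingQueue<String[]> writes = new LinkedBlockingQueue<>();

    CachedStore(Map<String, String> slowStore) {
        this.slowStore = slowStore;
    }

    // Reads hit the cache first and fall back to the store on a miss.
    String read(String key) {
        return cache.computeIfAbsent(key, slowStore::get);
    }

    // Writes update the cache right away and are queued for the store.
    void write(String key, String value) {
        cache.put(key, value);
        writes.add(new String[] { key, value });
    }

    // Run by the writer: commits queued writes to the store, in order.
    // Returns how many writes were committed.
    int flush() {
        int committed = 0;
        String[] w;
        while ((w = writes.poll()) != null) {
            slowStore.put(w[0], w[1]);
            committed++;
        }
        return committed;
    }
}
```

Clients see their own writes immediately through the cache, while the store catches up whenever flush runs.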

Anyway, the fix for a slow storage mechanism is putting something faster between you and it. It could mean rewriting portions of client code if you don't want to fake the existing protocol though.

Gunner wrote:

Just wanted to pop in and thank you guys for the discussion a page or two back on non-RDBMS and scalability. Sorry I don't have much to contribute, but it gave me some good things to think about. My group at work is struggling with the limits of what our environment can handle as we start offering more on-demand functionality through the web on top of a system that was in no way designed for that.

If you want, shoot me some specifics, architecture is my thing.

I posted about this in the @Roguelike thread, but I think it's interesting to post here, too. Someone wrote a relatively simple, browser-based star wars roguelike and put it up here:

http://ondras.github.com/star-wars/

AFAI can tell, the whole thing is written in client-side JavaScript, which means you can see the source for the whole game (in Chrome: view-source:http://ondras.github.com/star-wars/), and in particular this neat roguelike toolkit for JavaScript, http://ondras.github.com/star-wars/j.... It uses simplex noise for generating the terrain, which is much smarter/faster than cellular automata for this use case. The background starfield on the main page is randomly generated on refresh, too.

Pretty amazing what you can just grab off the internet some days.

Do the files change frequently? If not, I think just getting the files on all the nodes before you run the job is a reasonable solution.

SixteenBlue wrote:

Do the files change frequently? If not, I think just getting the files on all the nodes before you run the job is a reasonable solution.

Agreed, and just checking to make sure they're there before your first job runs.

Another option might be to set up a job queue with something like Hazelcast, a distributed queue solution, where Hadoop puts file-moving jobs on the queue and other worker applications do the move and report their success on another queue that Hadoop would check. This is probably over-engineering for your case, but Hazelcast looks so awesome that I just want to see it used for any distributed problem.

SixteenBlue wrote:

I know you said online resource but I learned Hadoop from this book and it was surprisingly helpful. Not sure when the last time I learned a new technology from a book was.

Thanks for this suggestion. I now have the skeleton of our new Hadoop project here up and running, with most of the main technical problems solved/addressed. While I do a lot of Java programming at work, I'm not really a Java dev so some of that was a bit of an uphill struggle. It would be really helpful if every tutorial didn't assume that you wanted to count words in a document store. Also most of our jobs need to go Map->Map->Map->Reduce and hooking that together with what little I could scrape from the internet took a while.

One question I've not seen addressed is that our first map task needs local access to 5 GB of ancillary data files. These can't be served over HDFS, or what is already a time-consuming process will take a significant performance hit. Does Hadoop provide an easy way to push files to the compute nodes' local disks?

My first thought was, copy the data files to HDFS then have the first map task which runs on a node check for the files locally and if they aren't there then copy them from HDFS to the local disk. The trouble with this is that a multicore machine may have competing initial map tasks all trying to copy the files and dealing with lock files and whatnot seems a little complex. Does Hadoop provide a solution for this use case or should I just write a shell script to copy the data files to each compute node before I start the job?
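For what it's worth, here is roughly what the lock-file version of that first thought would look like with just the JDK. This is a sketch under assumptions: the LocalStage name is made up, the source is a plain local path standing in for a copy out of HDFS, and each map task is assumed to run in its own JVM (lock attempts that overlap inside one JVM throw OverlappingFileLockException, and file lock behaviour varies by platform).

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;

// Sketch: make sure a data file exists on local disk, copying it at most
// once even if several processes on the node ask for it at the same time.
class LocalStage {
    static Path ensureLocal(Path source, Path target) throws IOException {
        Path lockFile = target.resolveSibling(target.getFileName() + ".lock");
        try (FileChannel ch = FileChannel.open(lockFile,
                     StandardOpenOption.CREATE, StandardOpenOption.WRITE);
             FileLock lock = ch.lock()) { // waits while another process holds it
            if (!Files.exists(target)) {
                // Copy to a side file first so a half-copied file never
                // shows up under the final name.
                Path part = target.resolveSibling(target.getFileName() + ".part");
                Files.copy(source, part, StandardCopyOption.REPLACE_EXISTING);
                Files.move(part, target, StandardCopyOption.ATOMIC_MOVE);
            }
        }
        return target;
    }
}
```

The first task to get the lock does the copy, and the others wait and then find the file already in place. That is more moving parts than a shell script run before the job, which is part of why I'm asking.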

For any given analysis the files won't change in any shortish time scale so the quick script option is probably the best then.

The Hazelcast thing looks neat, but is definitely way over-engineering this issue. I suppose if we were planning to have this thing as some kind of always-on service then it might be a good idea, but for now I don't think it's needed.

Are you sure the time to copy the files locally isn't going to be longer than the time to read them from HDFS? Will you be running the jobs multiple times so you're trying to limit the data transfer to just the first time? That's what I assumed but wanted to make sure.