What's wrong with this picture?

 

class PageStoreDB {
    Page getPage(int pageId) throws PageNotFound, IOException {//...
    boolean deletePage(int pageId) throws IOException {//...
    boolean pageExists(int pageId) throws IOException {//...
    //etc...
}
There might well be more than I can see, even in this simple snippet, but I can see at least two problems--one pretty obvious and one maybe not so obvious.  Obvious (assuming we're in Java): PageStoreDB should be an interface, to make it easier to mock for tests, swap out implementations, etc.  Less obvious: I don't think pageId should be an int.

Almost everyone makes it an int, of course.  I've done it plenty of times myself, because it's natural--usually you're using an id to read from a database, and the id in the database is typically an int.  As programmers we also tend to think of ints as "primitive"--about as simple as you can get.  I'd like to suggest that this isn't so.  What if our snippet looked like this?
class PageID {      
      PageID invertSign(PageID id) {//...
      PageID add(PageID id) {//...
      PageID subtract(PageID id) {//...
      PageID multiply(PageID id) {//...
      PageID divide(PageID id) {//...
      PageID leftShift(PageID id) {//...
      PageID arithmeticRightShift(PageID id) {//...
      PageID logicalRightShift(PageID id) {//...
      PageID greaterThan(PageID id) {//...
      PageID lessThan(PageID id) {//...
      PageID equalTo(PageID id) {//...
      PageID notEqualTo(PageID id) {//...
      PageID lessThanOrEqualTo(PageID id) {//...
      //...and much, much more...
}
class PageStoreDB {
    Page getPage(PageID pageId) throws PageNotFound, 
                                       IOException {//...
    boolean deletePage(PageID pageId) throws IOException {//...
    boolean pageExists(PageID pageId) throws IOException {//...
    //etc...
}


You'd probably think the programmer who created this was borderline insane.  About the only methods in PageID that you'll actually use are equalTo and notEqualTo.  The rest are not just superfluous, they're distracting and potentially dangerous.  Do you really want to be able to logically right shift an identifier by accident?  But this is exactly the interface you're getting when you use an int for an identifier.  What you're really looking for in an identifier from the client side is something more like this:

 

abstract class PageID {
   abstract boolean equals(Object o);
}

The rest of the implementation details, mapping this object representation to the id field in the database, are exactly that--implementation details--and should be handled in implementing classes in the PageStoreDB's purview.

So, you say, is that it?  I have to write at least two classes to handle the simple identification of a Page, a job I could do with no work using an int, and all I get in return is that I can't accidentally right shift my id?  I still think you could argue that was enough to justify the work, but I think we actually get more.

Let's say we're using ints for our ids and, somewhere down the road, we have to start merging pages in the page store--that is, we want to essentially forward one id to another id.  With the int approach, we have some problems.  Just testing id1 == id2 doesn't really work any more.  You have to start writing methods like PageStoreDB.idForwardsTo, and remembering to call them when you want to compare ids.  If we're using the class as an id, however, it's pretty easy to handle.  We have each object keep a list of forwards, and consult these in the equals method.  This lets us maintain a nice invariant of our system--that all ids that identify the same page test as equal to each other--without altering client code, because we encapsulated early on.

We get a similar benefit if we stop identifying pages with ints in the database, perhaps because generating them in sequential fashion is causing a bottleneck, and start using uuids instead.  Very little code needs to be changed, and all of it is in the provider class, where it belongs, instead of in the client code.

There are a host more benefits as well: logging, checking for goodness, control over creation.  If and as these become needed, they can all be done in one place.

I'm coming to believe that the basic principle here applies pretty widely: namely, that you shouldn't use "primitive" types to represent just about anything in your system.  Use a URL class or an AnchorText class instead of a raw string, for instance, or an enum with custom methods for a set of bit fields.  It doesn't take much extra work once you get into the habit, and the aggregate work you save yourself down the road is usually more than worth it.

so-hiring

Recommended Articles

Join our subscribers

Sign up here and we'll keep you updated on the latest in product, UX, and engineering from HubSpot.

Subscribe to the newsletter