February 22, 2014

New Mexico? No, You mean Michigan! There -- fixed it for you!

"Just because you can doesn't mean you should."

We've heard that -- maybe we even said that -- yet we still do what we can, even when we shouldn't.

Today's case in point: AAA, the well-known automobile club in the US, won't let me look at New Mexico's AAA website because I'm in Michigan.

Continue reading "New Mexico? No, You mean Michigan! There -- fixed it for you!" »

July 01, 2013

First, Philosophy, then Data Science

One of the most telling articles on "Data Science" appeared in the NYTimes in April[1]. We are facing a massive shortage of data scientists, it read. "There will be almost half a million jobs in five years, and a shortage of up to 190,000 qualified data scientists."

Trouble is, the same article says, "Because data science is so new, universities are scrambling to define it and develop curriculums."

So -- we don't know what they are, but we need 500,000 of them.

Continue reading "First, Philosophy, then Data Science" »

April 13, 2013

Surviving That First Meeting

In a recent on-line forum, someone asked for guidance on the "basic questions" a Data Modeler should ask of their business customers in the initial meetings. It's a good question to ask, and it's encouraging that this person should think about getting this right.

Rather than provide a list of questions -- which, I think, depend very much on the circumstances of the project, the organization, the meeting invitees and your own standing in the group -- I thought it better to offer some advice about preparation.

Few things are relegated to the dustbin of wasted time faster than a meeting with an unprepared analyst -- especially the "outside consultant" who arrives expecting to be given a history of your business. When you arrive at that meeting, the chances are that there will be 4-5 people who have been with the company for years and been with each other for years, and 1 or 2 outsiders who just got off the boat. If the meeting dissolves into a history of the business, 3 people will be interested and 3-4 will be bored. That's a waste of time.

So -- advice to the consultant:

Continue reading "Surviving That First Meeting" »

November 03, 2012

Immovable Objects

There are many things that we treat every day as immovable objects. Immovable? More correctly, they are things that we believe to exist in a particular context, and to always exist in that context. We see Mount Rushmore and quickly -- unconsciously -- put it in the context of South Dakota, the United States, or perhaps the movie 'North By Northwest'. We see hula dancers, and unconsciously think of Hawaii. We see steamboats going along a river, and unconsciously think of the Mississippi, the Missouri or perhaps the Ohio river; maybe we think of Mark Twain or 'Showboat'.

We make these cognitive leaps because we have become accustomed to certain signals in the data -- in this case, visual data -- that trigger assumptions about other related things we know.

When we are dealing with data, software and even business processes, we often make these same unconscious leaps. They are unconscious because we don't recognize that we are making the connection. And because we don't recognize that, we don't challenge it, demand proof of its validity, or question it. We make the leap and move on.

That's a reasonable thing to do in most cases. Challenging every assumption would waste an enormous amount of time, become extremely tiresome and tedious and annoying, and cost a lot of money.

Fair enough. We don't challenge obvious things. But we need to always be prepared to discover that the obvious connection is invalid. We must always be prepared to discover that some unquestionable belief is in doubt -- in fact, that it is wrong, and that it is leading us to make bad decisions.

Case in point -- telephone numbers.

Some years ago, the United States telephone system adopted the concept of an "area code". This 3-digit prefix to the US phone number (which, by then, was standardized on a 7-digit number format) was assigned to specific geographical areas of the country. Hawaii's area code was 808. Michigan had three area codes (517, 616 and 313). Other states, and Canadian provinces, got their selection of area codes.

Over time, as populations grew and telephone usage became more concentrated, the number of area codes expanded. High-density areas were divided into two area codes, then divided again, and yet again, as population -- or more specifically, as telephone usage -- grew. In my case, over the course of 35 years, I have lived in three different area codes, yet neither I nor my phone moved.

In due course, the area code became recognized not only as part of your telephone number but -- more interestingly -- as an indicator of your location. Your phone number starts with '212'? You're a New Yorker. '415'? San Francisco Bay Area. '313'? Detroiter. '312'? Chicago.

Enter the profilers and aggregators. Given a 10-digit phone number, we can now derive your location. From your location, we may be able to infer something else about you -- your demographic, your interests (downhill skiing if you're in Colorado; surfing in Southern California; theater if you're in New York; country music if you're in Nashville). A small piece of data intended to make the phone numbering system more manageable is now revealing something about you -- because we made the (now unconscious) leap from area code to location to probable interest.
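The leap from phone number to location can be sketched in a few lines of Python. This is only an illustration: the lookup table holds just the area codes mentioned in this post, and the function name guess_location is hypothetical, not taken from any real profiling tool.

```python
import re

# Area codes mentioned in this post; a real table would be far larger.
AREA_CODE_HINTS = {
    "212": "New York",
    "415": "San Francisco Bay Area",
    "313": "Detroit",
    "312": "Chicago",
    "808": "Hawaii",
}


def guess_location(phone_number):
    """Guess a location from the area code of a US phone number.

    The result is a guess built on convention, not a fact about
    where the phone (or its owner) actually is.
    """
    digits = re.sub(r"\D", "", phone_number)  # keep digits only
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]  # drop the leading country code
    if len(digits) != 10:
        return None  # not a 10-digit number, so no area code to read
    return AREA_CODE_HINTS.get(digits[:3])


detroit_guess = guess_location("(313) 555-0100")
hawaii_guess = guess_location("+1 808 555 0100")
no_guess = guess_location("555-0100")
```

Notice that nothing in the function knows whether the number belongs to a landline or to a phone that travels. The assumption is baked into the table.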

Enter the mobile phone.

Before the entry of the consumer mobile phone, a telephone number was associated with a location. It was in my house at a particular address, or in an office in a particular building, or even in a phone booth on a particular street corner. The phone number pointed to a physical, geographical location. Thus it was completely reasonable to jump from phone number (specifically, area code) to location, and from location to interests.

But the mobile phone has no location. It moves from place to place. It is -- what's that word? -- right, it's "mobile".

With the mobile phone, you can't make any assumption about its location at a point in time. However, you can still make an assumption about the owner of a mobile phone. People will buy a mobile phone at a store near their home. That store will assign a number based on its location which is likely to be "close enough" to the customer's regular location. So, for a time, the assumption is safe, though a little shaky.

Enter the ability to "port" your landline phone number to a mobile phone.

In time, those phone numbers that did have a physical location -- your landline phone number -- became mobile. It was possible to replace a landline with a mobile phone, but keep the number. And, of course, the mobile phone was mobile, so a mobile phone with a landline phone number made that landline phone number mobile as well.

Suddenly, phone numbers which were reliably associated with a location became disassociated. In my case, I have a phone number that had been a landline in a location 5,000 miles away from my home. Beyond that, I took my mobile phone to a third location -- neither my home nor the original location of the phone number -- for an extended period of time.

Needless to say, the systems and users who had already made the phone number-to-location assumption did not change their assumptions. The 5 or 6 hour time difference between my phone number's apparent location and my actual location created some particular annoyance -- that telemarketing call at dinner time became a call waking me up at midnight. When the charity was scheduling a route to pick up discards in "my neighborhood", I would get a call asking if I had a donation to put on a porch 5,000 miles away. And -- this is, after all, election season -- the robo-calls would wake me in the middle of the night to urge me to vote for someone who did not represent me in any government.

Immovable objects sometimes move. That's the obvious lesson.

The more important lesson goes to our readiness to recognize that an assumption that served us well in the past has begun to fail. If we are ready for that shift, then we keep careful track of the assumptions we've made; as importantly, we keep track of the conclusions we've drawn that are dependent on those assumptions. When the shift comes, and the assumption must be abandoned, all the dependent conclusions also have to be adjusted -- either abandoned in turn, or defended on the basis of some other known and reliable information.
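One way to keep track of assumptions and the conclusions that hang on them is a small registry. What follows is a minimal sketch in Python, not a prescribed method: the class names, method names and the example entries are all hypothetical, chosen to mirror the phone number story above.

```python
from dataclasses import dataclass, field


@dataclass
class Assumption:
    name: str
    statement: str
    valid: bool = True
    dependents: list = field(default_factory=list)  # conclusions built on this


class AssumptionRegistry:
    """Records assumptions and the conclusions that depend on them."""

    def __init__(self):
        self._assumptions = {}

    def assume(self, name, statement):
        self._assumptions[name] = Assumption(name, statement)

    def conclude(self, conclusion, based_on):
        # Link the conclusion to every assumption it relies on.
        for name in based_on:
            self._assumptions[name].dependents.append(conclusion)

    def invalidate(self, name):
        # Mark the assumption as failed and report what must be revisited.
        assumption = self._assumptions[name]
        assumption.valid = False
        return list(assumption.dependents)


registry = AssumptionRegistry()
registry.assume("area_code_is_location", "A phone's area code tells us where it is")
registry.conclude("caller_time_zone", based_on=["area_code_is_location"])
registry.conclude("caller_neighborhood", based_on=["area_code_is_location"])

# Number portability arrives: the assumption has to go.
to_review = registry.invalidate("area_code_is_location")
```

Here to_review lists every conclusion that now has to be abandoned or defended on other grounds.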

When we build our software or data models, we usually base the model on the business rules that we are given. It is nonetheless our responsibility, as software architects and data architects, to be vigilant about inferences, assumptions and conclusions that are built on convention rather than on a solid and lasting characteristic. If we design according to convention or common usage, we have to be prepared for that convention to be discarded, and with it, significant portions of our design.

Software Metaphysics calls for the careful vetting of all assumptions, all inferences, all derived information and practice that may -- however reliable it may seem at the time -- be one day relegated to the dustbin of history.

October 11, 2012

What's the best answer?

In a recent dialog on a web forum, someone asked how to find the "best" way to construct an SQL query. They had posed a problem and received many solutions, each different in syntax, in approach, and in detail. Reasonably, they asked "how do I find the 'best' solution?"

That's a question we should always be asking when we craft a solution to any problem. In IT, and in SQL particularly, I use these criteria to evaluate an answer:

(1) it must return the correct answer on the target platform
(2) it should use standard syntax and construct
(3) it should be easy to understand by someone who didn't write it
(4) it should be as fast as necessary... it needn't be as fast as possible
(5) it should play well with others in a multi-user environment
(6) it must be ready-to-run on time and under budget

The "best" answer is the one which blends all of these. Notice that only two of these are "must"; the others are "should".

Let's take these one by one:

(1) it must return the correct answer on the target platform

You would think this goes without saying, but an astonishing number of solutions don't solve the problem, or solve a different problem. Thankfully, that doesn't happen frequently, yet it does happen often enough that you are well-advised to verify the result before you worry about whether the solution is a 'quality' solution or not.

It's important also to make sure the solution works on the target platform. Testing in a lab or a test-bed or in a database 'just like' the real one is not assurance that the solution works on the target. Why not? Usually because of the scale of the problem -- a table of tens of millions of rows partitioned over multiple servers can turn your simply elegant solution into an out-of-control monster. If scale is not the problem, data often is -- your test-bed data is carefully crafted and follows all of the rules, but the real-life platform may have outliers, offenders and other difficulties.
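Verifying the result before judging quality can be as simple as running each candidate against data whose correct answer you already know. The sketch below uses Python's built-in sqlite3 module with an in-memory database purely as a stand-in; the table, the data and the two candidate queries are invented for illustration, and the same check would still have to be repeated on the real target platform.

```python
import sqlite3


def matches_expected(conn, sql, expected):
    """Return True when the query's rows equal the known-correct rows."""
    return conn.execute(sql).fetchall() == expected


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    # The NULL amount is a deliberate outlier of the kind real data contains.
    [("alice", 10), ("alice", 15), ("bob", 7), ("bob", None)],
)

# The answer we worked out by hand: total order amount per customer.
expected = [("alice", 25), ("bob", 7)]

candidates = {
    "sum_per_customer": (
        "SELECT customer, SUM(amount) FROM orders "
        "GROUP BY customer ORDER BY customer"
    ),
    "max_per_customer": (
        "SELECT customer, MAX(amount) FROM orders "
        "GROUP BY customer ORDER BY customer"
    ),
}

# Map each candidate name to whether it returned the correct answer.
results = {
    name: matches_expected(conn, sql, expected)
    for name, sql in candidates.items()
}
```

Only the candidates that pass this check are worth comparing on the remaining criteria.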

(2) it should use standard syntax and construct

This is where the first 'religious' argument begins. On one side are those who argue that you should take best advantage of the platform-specific features afforded by the vendor. By the same token, they might argue that the vendor's ANSI SQL syntax is just a wrapper over its native implementation, and that the wrapper itself is a performance penalty.

The counter-argument is, of course, that the adoption of standard syntax and construct enables the query to work with any tool, or any tool user -- including any SQL programmer. That provides a long-term value and benefit, as tools and staff are replaced over time.

Ultimately, this question resolves to whether you are taking a short-term or long-term view, whether the solution is a single-use item (to be replaced when needed) or it is the foundation for re-use (to be adopted and extended when needed).

Having been an eyewitness to decades of churn in both technology and staff, I am always an advocate for standards, reuse and long-term viability.

(3) it should be easy to understand by someone who didn't write it

By the same token -- taking the long-term view -- the syntax that is used, and the solution strategy that is adopted, need to be intelligible to anyone who might chance to come across it. In the immediate case, you will be obligated to demonstrate to someone -- the user, your team partner, a peer review -- that your solution works on paper, that it is a sensible approach, that it is safe and will cause no harmful side-effects. In the longer case, you may be called upon to debug it six months from now, or someone else may have to do so, when it suddenly stops working, when a user wants a small change to it, or when the vendor upgrades their platform. If the solution is not easily understood, then a minor maintenance task may turn into hours or days of experimentation.

Being easy to understand might mean adding appropriate documentation to the programmer's manual, or embedding inline comments at appropriate places in the code. My preference, however, is to use common and recognizable idioms, to avoid "magic tricks", to use verbose language and names, to organize the code visually, and to use standard syntax (see previous item).

(4) it should be as fast as necessary... it needn't be as fast as possible

This is religious argument, the sequel -- how fast is fast enough?

On the forum previously mentioned, the first answer predictably said: "In the world of databases, usually, the fastest solution is the best answer." Is it? My answer is 'No' (obviously -- or else there would only be one criterion here, not six).

Ask anyone who has tried to set a speed record -- on the Bonneville Salt Flats, for example -- and they will tell you that it takes some very heroic, and very expensive, effort to be "the fastest". Even there, the goal is never to be "the fastest" -- instead, these speed-seekers only try to be faster than the last guy -- to be as fast as necessary to take the first position.

Why not go for the fastest possible? Because it takes too long, costs too much, and carries too much risk. Instead, as fast as necessary is a reasonable -- though difficult -- approach.

In IT, there is an unhealthy, even wasteful, focus on speed. Granted, in some cases, speed is critical -- but these are the rare exception, not the rule. The rule is that your solution is fixing a real-life problem that has real goals and timelines and costs, and that probably has little to do with technology. It probably has a lot to do with the business' domain -- banking, health care, government, education, manufacturing, retail or some other concern.

When your attention is directed to getting the solution that is "the fastest", you are likely distracted from the business that you are there to serve. That business will tell you that a solution you think is "too slow" is "fast enough" to solve their problem. When they tell you that, stop working on the solution.
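"Fast enough" can be made concrete by measuring against the limit the business gives you rather than against the fastest possible run. A minimal sketch in Python follows; the function name fast_enough and the budget values are assumptions for illustration only.

```python
import time


def fast_enough(run, budget_seconds):
    """Time one call of run() and compare it with the agreed budget.

    Returns (within_budget, elapsed_seconds).
    """
    start = time.perf_counter()
    run()
    elapsed = time.perf_counter() - start
    return elapsed <= budget_seconds, elapsed


# The business says two seconds is acceptable for this task.
within_budget, elapsed = fast_enough(lambda: sum(range(100_000)), budget_seconds=2.0)

# A task that sleeps for 50 ms cannot meet a 10 ms budget.
too_slow, _ = fast_enough(lambda: time.sleep(0.05), budget_seconds=0.01)
```

Once within_budget is True, further tuning is effort spent on a problem the business does not have.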

(5) it should play well with others in a multi-user environment

It's very common to hear of a solution to a poorly-behaved database query that begins with "Your database table is wrong. You need to create a new index/column/table. Then you can simply...".

And it is equally common for the poor programmer to reply: "I can't change the database" -- the DBA or the vendor or the legacy system has already decided how the database will be structured. Or, more likely, implementing a database change is a months-long effort involving planning by four or five organizations, refactoring many tools and programs, and planning a data migration. By contrast, the user needs a solution by Monday morning.

Playing well with others requires that you take notice that your little problem is a small part of a much larger, more complex system. Therefore, your solution must fit within its allotted space -- memory-wise, space-wise, processor-wise and network-wise -- within that system complex.

(6) it must be ready-to-run on time and under budget

For those of us who don't have the luxury of working with an unlimited budget, without a clock or calendar, who are not doing problem-solving just for the pure joy of it -- this last criterion might be the first, or only, criterion on the list.

Every problem that is presented to you comes with the questions: "When can you get this fixed?" and "How much will it cost?". We are fond of saying that anything is possible, it just takes time and money. This is where time and money have their way with your beautiful, elegant, all-encompassing, everlasting solution.

Someone is waiting -- impatiently -- for your solution. And someone else is waiting for that person to finish what he was asked for so they can get on with their job. You, the solution provider, the expert, are at the end of a long chain of people waiting to do their jobs.

I have never seen a solution that couldn't use just a few more tweaks. That wouldn't be just a little better if we could make one more change.

But inevitably, the time and the money run out. Your solution must be ready and deliverable before that happens.

So -- how do you know which solution is the best solution? Make sure of these two things -- that the solution works and that you've spent no more time and money than you can afford. If you've met those criteria, then you can use the others to evaluate how much better one solution is than another. If you've got reasonable-to-high scores in all criteria, you can be comfortable knowing you've got as good an answer as you might expect.... until the next time.

May 10, 2012

CONFIDENTIAL (in a web forum?)

This appeared at the end of someone's post on a discussion group in LinkedIn:

"CONFIDENTIAL AND ATTORNEY/CLIENT PRIVILEGED: This e-mail and any attachments are confidential, intended for the addressee only and may be attorney/client privileged. If you are not the addressee, then please DO NOT read, copy or distribute the message or any attachment. Please reply to the sender that you received the message in error and delete it. Thank you."

Now, I know that it's not always obvious that your email system is attaching these boilerplate texts to every email you send. So I can forgive the user's carelessness in using email to reply to a discussion group.

But isn't there something the software developers could do to make this less obviously wrong?

I had earlier posted about "Software for Idiots", asking software makers to allow smart users to jump quickly past the step-by-step installation. I said that "How we envision our users defines how we build our software."

So could we understand how our helpful shortcut -- to automatically attach a signature message to every email we send -- can make our user seem ridiculous? And, if we understood that, could we find a way to make this more respectful of how the user presents him/herself?

Some of our software has the specific purpose of presenting our user to others. We don't always know who the other will be, or whether they will be business-like or friend-and-family.

Yes, it would be smart of the user to pay attention to how they use the software, to take the extra care to make sure only appropriate messages are added to their emails. It's not hard to do. But the truth is many users won't take that extra step. More to the point -- that extra step is actually several steps and easy to forget.

A thoughtful software designer might burn a few brain cycles in search of some face-saving, embarrassment-avoiding help for the users.

May 05, 2012

Why "Metaphysics"?

A very short blog entry this time. Read this article: "Data Management Is Based on Philosophy, Not Science"

My field of study at the university was philosophy (after a couple years of political theory and constitutional democracy). I still keep the works of Aristotle on my shelf. I've always thought philosophy was the perfect way to prepare for a career in information systems. I make that recommendation to parents asking what their computer-savvy child should major in. They always look at me like I've grown an extra eye!

But philosophy is the study of ... well, of everything, and metaphysics is the precursor to philosophy.

Hence -- software metaphysics -- the fundamental thinking that prepares the ground for all software.

Worth a read.