By apk | Wednesday, 7 March 2018 16:13 | 1 Comments
As we all know, computers are actually pretty dumb.  They can't count past 1.  On the other hand, they're very loyal.  They do exactly what they're told to do.  When you're debugging code, that's great.  Enter the same set of data, step through the debugger, and it's guaranteed to work the same way every time.  Except for when it doesn't.  We've all experienced the frustration of finding something wrong during testing, but no matter what you do, when you step through the debugger, it doesn't happen.

Sure, we usually chalk this up to something with the message queue or interaction with another program that's being interfered with because the debugger now has focus.  But, sometimes, we know we're just fooling ourselves, and we start to wonder if maybe this machine has a mind of its own.

Recently, we here at Sprezz Towers found ourselves with a machine where the only obvious conclusion was that the system had become sentient.  Before welcoming our new silicon overlords, we decided to give things a second look.  And a third.  Probably a fourth as well.

The problem was reasonably simple.  While attempting to calculate a shipping rate, we were receiving an on-screen message that the shipping rate could not be found.  We expected that because we were testing how the system responded to missing shipping rates.  This had been working for years before the modification we were testing.  And by years, I mean decades.  We don't know when the code was originally written because there's no date in the header, but there's a comment dated June 1993.  Yes, 25 years ago. The most recent comment regarding a change is from 2001.  The code has been recompiled since in the move from ARev to OI, but it's effectively unchanged for 17 years.

The change itself was relatively simple.  We added a set of date ranges to indicate when the shipping rate profile was active.  For the overly technical readers, we added three MV (multi-valued) fields: HIST_START_DATE, HIST_END_DATE, and HIST_ACTIVE.  The system is full of this type of code.  There's a routine that's been in use for 10 years:

validRange = checkDateRange( checkDate, startDates, endDates, activeFlags )

which just returns TRUE$ or FALSE$ indicating if the passed checkDate is inside a valid date range.  This isn't rocket science nor brain surgery.  And it was working fine.  The system displayed the invalid rate message when the date was invalid, and didn't display the message when the date was valid.  That wasn't the problem.  With one specific test case, the system was displaying a failure message we weren't expecting.  The error was related to the customer's configuration, and not the changed code.  It was a valid error, and the correct error was displaying on screen.  However, the wrong error text was being placed into the error log.
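For readers who don't live in multi-valued fields, here is a rough Python sketch of the kind of check a routine like checkDateRange performs.  This is not the actual routine, whose source isn't shown here.  The three parallel lists stand in for the three MV fields, and the treatment of a missing start or end date as an open-ended range is purely an assumption made for the illustration.

```python
from datetime import date
from typing import Optional, Sequence


def check_date_range(
    check_date: date,
    start_dates: Sequence[Optional[date]],
    end_dates: Sequence[Optional[date]],
    active_flags: Sequence[bool],
) -> bool:
    """Return True if check_date falls inside any active [start, end] range.

    The three sequences are parallel, like associated multi-valued fields:
    position i of each one describes the same date range.
    """
    for start, end, active in zip(start_dates, end_dates, active_flags):
        if not active:
            continue  # inactive ranges never validate a date
        # Assumption: a missing start or end date leaves that side open.
        if start is not None and check_date < start:
            continue
        if end is not None and check_date > end:
            continue
        return True
    return False


# Two ranges: an inactive one covering 2017, and an active open-ended one from 2018.
starts = [date(2017, 1, 1), date(2018, 1, 1)]
ends = [date(2017, 12, 31), None]
flags = [False, True]

in_range = check_date_range(date(2018, 3, 7), starts, ends, flags)
out_of_range = check_date_range(date(2017, 6, 1), starts, ends, flags)
```

The first call lands in the active 2018 range, while the second falls only inside the inactive 2017 range and so is rejected.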

Setting a debug break point just before the shipping rate code, we stepped through the code and verified that all the error flags are being set correctly.  And they are.  Everything is working as expected.  Including logging the correct error text.

Going back to our opening premise, computers are dumb.  There are no UI threads or messages interfering here.  The error message bubbles through the system and is returned at the end of the process.  But we're clearly getting different results when stepping through the debugger as opposed to just running the code.  It's time to start thinking critically and logically about this and we start looking over all the code carefully.

When working with systems like this (RevG systems converted to ARev, then to OI), there are times when you feel a bit like you're practicing a programming version of anthropology.  You look at the code and how the style has changed and how variables are named and you can see the evolution of Rev coding styles and standards.  Generally, you can pinpoint about when this code was originally written, just on the use of CASE vs. IF/THEN statements or how things are capitalized.

As mentioned earlier, this code is old.  Even though the earliest comment date is 1993, we know the code is much older by the style and function.  It's been around since the origin of the system, a RevG application from 1982.  And as RevG code it uses RevG style syntax.  And RevG style calling parameters.  And RevG style error handling.

Ahh... error handling.  I could write a whole new post on inconsistencies in error handling, and whether FALSE$ means success or failure as we move from system to system (and programming epochs).  But in this case, it wasn't how the status was returned.  It was how the status was checked.  Most of us probably forget now, because OpenInsight (in its modern form) is 25 years old, but in the ARev days, we didn't have get_status() and set_status().  We had status().  And we had...well, that was pretty much it.  There was @FILE.ERROR, but that always seemed too embedded with LH and the filing system to piggy-back on for your own use.

This code used status().  And it used get_status() and set_status() as well.  Needless to say, it was a complicated mess.  But, status() was the key and the source of our error.

It turns out that when you step through the debugger and return from a subroutine or function call, the contents of status() are cleared.  During normal execution, the system found a value in status(), and jumped into the error routine related to a failed shipping rate call.  However, stepping through the debugger, status() was set to 0, and the system continued on.  Here it checked get_status() for additional errors, and we received the warning about the mis-configured client.

The OI programming documentation says:

Be aware, however, that between the time you set the value of Status() and next check it, some other procedure might have set its own value.

This is not related to the Get_Status and Set_Status functions.


While we expect this when calling multiple functions and routines, we didn't expect it to be cleared by the debugger when popping back on the return stack.
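The failure mode is easier to see in a toy model.  The Python sketch below is not OpenInsight code and says nothing about how the debugger works internally.  Every name in it is hypothetical: a single module-level variable plays the part of status(), and on_return_hook() plays the part of whatever resets it when a call returns under the debugger.

```python
_status = 0  # stands in for one shared, system-wide status register


def set_shared_status(value: int) -> None:
    global _status
    _status = value


def get_shared_status() -> int:
    return _status


def lookup_shipping_rate(rates: dict, key: str):
    """Hypothetical callee that reports failure through the shared register."""
    if key not in rates:
        set_shared_status(1)
        return None
    set_shared_status(0)
    return rates[key]


def on_return_hook() -> None:
    """Models something outside our code resetting the register on return."""
    set_shared_status(0)


def fragile_caller(rates: dict, key: str, stepping: bool = False) -> str:
    lookup_shipping_rate(rates, key)
    if stepping:
        on_return_hook()  # the register is wiped before we get to look at it
    # Relies on the shared register still holding the callee's value.
    return "rate not found" if get_shared_status() else "ok"


def robust_caller(rates: dict, key: str, stepping: bool = False) -> str:
    rate = lookup_shipping_rate(rates, key)
    if stepping:
        on_return_hook()
    # Relies only on the value handed back by the call itself.
    return "rate not found" if rate is None else "ok"


normal_run = fragile_caller({}, "ZONE9")
debug_run = fragile_caller({}, "ZONE9", stepping=True)
robust_debug_run = robust_caller({}, "ZONE9", stepping=True)
```

In this model the fragile caller reports the missing rate in a normal run but sails past it when "stepping", which is the same shape as what we saw.  The robust caller gives the same answer either way because it never consults the shared register.  Returning the result explicitly is only one possible way out, and it is shown here as an illustration rather than as the fix we applied.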

The lesson to be learned here is that just because your old ARev code recompiles in OI doesn't mean it's still working.  ARev is ARev and OI is OI, and while they might be very similar and compatible, they're not the same thing.  Something to keep in mind for those hoping to just ARev/32 their way into OpenInsight.  It's not all just cut and paste.
