Friday, March 04, 2016

Can't make my mind up about "Feuerstein refactoring"

When writing large PL/SQL processes I do like to try to make the code as readable as possible.  One way is to follow Steven Feuerstein's advice as exemplified here in a blog post and here in a Youtube video  to refactor the code into small chunks. I have done that, but then find I have my doubts about it.  My problem with it is that it breaks the code into small local procedures and functions which then access variables declared at a higher scope.  That seems to break a commandment of structured programming and reminds me of my early FORTRAN days and the "common block".

An example from the blog post above is varable l_required_info: this is declared in the main function can_show_information and then used within subprograms like player_can:

      FUNCTION player_can (moment_in IN VARCHAR2)
         RETURN BOOLEAN
      IS
         l_return   BOOLEAN;
      BEGIN
         l_return :=
            CASE moment_in
               ...
               WHEN qdb_competition_mgr.c_resavail_closed
               THEN
                     c_closed_or_ranked
                  OR (    info_type_in = c_see_correctness
                      AND l_required_info.players_accept_quizzes =
                             qdb_config.c_yes)
      ...
            END;

         RETURN l_return;
      END;

(I removed some of the code to focus on the bit I'm interested in).

So this function player_can takes in one parameter moment_in and returns a value, but within it accesses variables from "outside" itself, e.g. l_required_info.  I don't mind the constants like c_see_correctness, because constants are, well, constants.  But the variables disturb me. Another procedure get_required_info changes the value of this variable, also without it being passed in as a parameter:

      PROCEDURE get_required_info
      IS
      BEGIN
         OPEN required_info_cur;

         FETCH required_info_cur INTO l_required_info;

         CLOSE required_info_cur;

My structured programming head makes me want to avoid this by passing all the values that each subroutine uses explicitly as parameters.  But then it all becomes a lot more verbose of course.  I have found myself writing code in the Feuerstein style, but then feeling edgy about having to defend it when others have to maintain it! (Most people are quite happy to write a single procedure hundreds or thousands of lines long of course!)

What do you think?  Are my concerns legitimate or am I just behind the times with my "structured programming" tendency?

3 comments:

Steven Feuerstein said...

Tony, you raise good points. Let's see if I can address them to anyone's satisfaction, including my own. :-)

1. "Global” reference inside nested subprogram: yep, you are right. It would be better to make any such references parameters. You can use SQL Dev’s Refactoring-> Extract Procedure feature to do that for you automatically.

But then the parameter list gets “more verbose” - well, I suppose and I suppose in the extreme case it might be a problem. In most cases, it will not be a problem, I believe. Or you might look at the long parameter list and realize “Heck I shouldn’t have 12 variables. I should have a record.” So you refactor into a single record, thereby generally tightening up your code.

Furthermore, my general response to critiques of this approach applies here: is the alternative any better? That is, one endless sequence of statements, which I find harder to read and maintain.

You definitely don’t have to give up structured programming techniques when following this piece of advice. In fact, I find it hard to see how you can really write well-structured code UNLESS you make extensive use of nested (or peer-level private) subprograms.

2. Do people really complain about maintaining this code? Are most people really “quite happy” to write spaghetti code? My impression is that everyone knows they shouldn’t do it, that it makes them nervous and a little embarrassed, but they never really thought about how to STOP doing it.

Bottom line: if you and I were on the same team, doing code reviews, I would say “Thanks, Tony, I need to move l_required_info into the parameter list.” Make the change, improve the code. Still benefit from improved readabiilty.

Well, I suppose you didn’t expect me to add a comment saying “You are right and I am wrong. My idea is terrible.”

So what’s really important is to hear from some others.

Thanks for taking the time to comment on my post!

Jeffrey Kemp said...

I find myself in the same boat, using local functions/procedures to simplify the main code block or allowing re-use without the inconvenience of having to define and pass all the parameters it needs.

Then, that method gets more and more complex, until it gets to a point where it really needs to be separated out to a self-contained method; by which time it's complexity means that working out all the parameters it needs is a lot of work and introduces regression risk so you avoid it.

One might say "better to avoid local methods in the first place" but half the time those local methods stay nice and small forever.

I'm using records more and more from the get-go now, passing them around everywhere, which often makes this type of refactoring a breeze.

Tony Andrews said...

Thanks for your details comments, Steven. I'm glad to find that on reflection you would make l_required_info a parameter to the subprograms - so we are not of different minds about the importance of structured programming techniques! To answer your second point, no one has actually complained about maintaining this sort of code; I just fear that they might, and if I'm not happy about the "common data" aspect myself I'd find it hard to stand up for my code.

I'm afraid I do think that a lot of developers ARE happy with writing repetitive and monolithic code, and in some sense they find it simpler because all the variables are declared in one place at the top (often lots of unused variables too) and they can start reading the code at the top and work their way down like reading a book (War and Peace), rather than encountering subprogram calls that they then have to reverse back up to find, iteratively. If I am being "odd" by writing code in small chunks then I need to be comfortable with explaining why it is an improvement to a hypothetical doubter. I should add that I am a contractor hired to write code, so not empowered to set standards or impose my way of doing things - rather I have to write good code and try to gently persuade others of the improvements I think I (and they) can make.

I think Jeffrey's comment is spot on about the complexity of working out all the parameters later if you have started by building small subprograms that share common data. That's the position I'm in at the moment with some freshly written code: now I think using parameters rather than "common" data would be better it is already quite difficult to refactor, as I have to check each variable's scope and ensure that I don't break too much while changing it. The great thing when all subprograms are self-contained is that you can move them about at will, even to a different package, without much work.

Record structures as parameters are a key helper, as both of you have said. I particularly detest procedures like get_employee with 15 OUT parameters for each attribute. These get called from 57 different places and then someone adds a new column to the table and a 16th OUT parameter to the procedure, and there's a ton of rework even though 56 of the callers don't care about the new column. However I've actually had resistance on that idea from a very good senior developer, who said he'd prefer that all the calls would break and need to be revisited, so that the impact of the change is considered in each case. We had to agree to disagree on that!

Thanks again for your comments, and I hope to get some more. But already I feel more motivated to stick with the approach while improving on the structure.