conveylive.com

Excerpt from the article: 'How to be a Programmer' - Part 3

By Robert L Read

Penelope Truce
Penelope Truce
Jun 7, 2011
0 Comments | 2331 Views | 0 Hits


Chapter 3: Intermediate

(...continued  from http://www.conveylive.com/a/How_to_be_a_Programmer_2)

3.1 Personal Skills

    How to Stay Motivated
    How to be Widely Trusted
    How to Tradeoff Time vs. Space
    How to Stress Test
    How to Balance Brevity and Abstraction
    How to Learn New Skills
    Learn to Type
    How to Do Integration Testing
    Communication Languages
    Heavy Tools
    How to analyze data


3.2 Team Skills

    How to Manage Development Time
    How to Manage Third-Party Software Risks
    How to Manage Consultants
    How to Communicate the Right Amount
    How to Disagree Honestly and Get Away with It

3.3 Judgement

    How to Tradeoff Quality Against Development Time
    How to Manage Software System Dependence
    How to Decide if Software is Too Immature
    How to Make a Buy vs. Build Decision
    How to Grow Professionally
    How to Evaluate Interviewees
    How to Know When to Apply Fancy Computer Science
    How to Talk to Non-Engineers


3.1 Personal Skills

How to Stay Motivated

It is a wonderful and surprising fact that programmers are highly motivated by the desire to create artifacts that are beautiful, useful, or nifty. This desire is not unique to programmers nor universal but it is so strong and common among programmers that it separates them from others in other roles.

This has practical and important consequences. If programmers are asked to do something that is not beautiful, useful, or nifty, they will have low morale. There's a lot of money to be made doing ugly, stupid, and boring stuff; but in the end, fun will make the most money for the company.

Obviously, there are entire industries organized around motivational techniques some of which apply here. The things that are specific to programming that I can identify are:

    ► Use the best language for the job.

   
Look for opportunities to apply new techniques, languages, and technologies.

   
Try to either learn or teach something, however small, in each project.

Finally, if possible, measure the impact of your work in terms of something that will be personally motivating. For example, when fixing bugs, counting the number of bugs that I have fixed is not at all motivational to me, because it is independent of the number that may still exist, and is also affects the total value I'm adding to my company's customers in only the smallest possible way. Relating each bug to a happy customer, however, is personally motivating to me.

How to be Widely Trusted

To be trusted you must be trustworthy. You must also be visible. If know one knows about you, no trust will be invested in you. With those close to you, such as your teammates, this should not be an issue. You establish trust by being responsive and informative to those outside your department or team. Occasionally someone will abuse this trust, and ask for unreasonable favors. Don't be afraid of this, just explain what you would have to give up doing to perform the favor.

Don't pretend to know something that you don't. With people that are not teammates, you may have to make a clear distinction between ``not knowing right off the top of my head'' and ``not being able to figure it out, ever.''

How to Tradeoff Time vs. Space

You can be a good programmer without going to college, but you can't be a good intermediate programmer without knowing basic computational complexity theory. You don't need to know ``big O'' notation, but I personally think you should be able to understand the difference between ``constant-time'',``n log n'' and ``n squared''. You might be able to intuit how to tradeoff time against space without this knowledge, but in its absence you will not have a firm basis for communicating with your colleagues.

In designing or understanding an algorithm, the amount of time it takes to run is sometimes a function of the size of the input. When that is true, we can say an algorithm's worst/expected/best-case running time is ``n log n'' if it is proportional to the size ($n$) times the logarithm of the size. The notation and way of speaking can be also be applied to the space taken up by a data structure.

To me, computational complexity theory is beautiful and as profound as physics---and a little bit goes a long way!

Time (processor cycles) and space (memory) can be traded off against each other. Engineering is about compromise, and this is a fine example. It is not always systematic. In general, however, one can save space by encoding things more tightly, at the expense of more computation time when you have to decode them. You can save time by caching, that is, spending space to store a local copy of something, at the expense of having to maintain the consistency of the cache. You can sometimes save time by maintaining more information in a data structure. This usually cost a small amount of space but may complicate the algorithm.

Improving the space/time tradeoff can often change one or the other dramatically. However, before you work on this you should ask yourself if what you are improving is really the thing that needs the most improvement. It's fun to work on an algorithm, but you can't let that blind you to the cold hard fact that improving something that is not a problem will not make any noticeable difference and will create a test burden.

Memory on modern computers appears cheap, because unlike processor time, you can't see it being used until you hit the wall; but then failure is catastrophic. There are also other hidden costs to using memory, such as your effect on other programs that must be resident, and the time to allocate and deallocate it. Consider this carefully before you trade away space to gain speed.

How to Stress Test

Stress testing is fun. At first it appears that the purpose of stress testing is to find out if the system works under a load. In reality, it is common that the system does work under a load but fails to work in some way when the load is heavy enough. I call this hitting the wall or bonking[1]. There may be some exceptions, but there is almost always a ‘wall’. The purpose of stress testing is to figure out where the wall is, and then figure out how to move the wall further out.

A plan for stress testing should be developed early in the project, because it often helps to clarify exactly what is expected. Is two seconds for a web page request a miserable failure or a smashing success? Is 500 concurrent users enough? That, of course, depends, but one must know the answer when designing the system that answers the request. The stress test needs to model reality well enough to be useful. It isn't really possible to simulate 500 erratic and unpredictable humans using a system concurrently very easily, but one can at least create 500 simulations and try to model some part of what they might do.

In stress testing, start out with a light load and load the system along some dimension---such as input rate or input size---until you hit the wall. If the wall is too close to satisfy your needs, figure out which resource is the bottleneck (there is usually a dominant one.) Is it memory, processor, I/O, network bandwidth, or data contention? Then figure out how you can move the wall. Note that moving the wall, that is, increasing the maximum load the system can handle, might not help or might actually hurt the performance of a lightly loaded system. Usually performance under heavy load is more important than performance under a light load.

You may have to get visibility into several different dimensions to build up a mental model of it; no single technique is sufficient. For instance, logging often gives a good idea of the wall-clock time between two events in the system, but unless carefully constructed, doesn't give visibility into memory utilization or even data structure size. Similarly, in a modern system, a number of computers and many software systems may be cooperating. Particularly when you are hitting the wall (that is, the performance is non-linear in the size of the input) these other software systems may be a bottleneck. Visibility into these systems, even if only measuring the processor load on all participating machines, can be very helpful.

Knowing where the wall is is essential not only to moving the wall, but also to providing predictability so that the business can be managed effectively.

How to Balance Brevity and Abstraction

Abstraction is key to programming. You should carefully choose how abstract you need to be. Beginning programmers in their enthusiasm often create more abstraction than is really useful. One sign of this is if you create classes that don't really contain any code and don't really do anything except serve to abstract something. The attraction of this is understandable but the value of code brevity must be measured against the value of abstraction. Occasionally, one sees a mistake made by enthusiastic idealists: at the start of the project a lot of classes are defined that seem wonderfully abstract and one may speculate that they will handle every eventuality that may arise. As the project progresses and fatigue sets in, the code itself becomes messy. Function bodies become longer than they should be. The empty classes are a burden to document that is ignored when under pressure. The final result would have been better if the energy spent on abstraction had been spent on keeping things short and simple. This is a form of speculative programming. I strongly recommend the article ``Succinctness is Power'' by Paul Graham[PGSite].

There is a certain dogma associated with useful techniques such as information hiding and object oriented programming that are sometimes taken too far. These techniques let one code abstractly and anticipate change. I personally think, however, that you should not produce much speculative code. For example, it is an accepted style to hide an integer variable on an object behind mutators and accessors, so that the variable itself is not exposed, only the little interface to it. This does allow the implementation of that variable to be changed without affecting the calling code, and is perhaps appropriate to a library writer who must publish a very stable API. But I don't think the benefit of this outweighs the cost of the wordiness of it when my team owns the calling code and hence can recode the caller as easily as the called. Four or five extra lines of code is a heavy price to pay for this speculative benefit.

Portability poses a similar problem. Should code be portable to a different computer, compiler, software system or platform, or simply easily ported? I think a non-portable, short-and-easily-ported piece of code is better than a long portable one. It is relatively easy and certainly a good idea to confine non-portable code to designated areas, such as a class that makes database queries that are specific to a given DBMS.

How to Learn New Skills

Learning new skills, especially non-technical ones, is the greatest fun of all. Most companies would have better morale if they understood how much this motivates programmers.

Humans learn by doing. Book-reading and class-taking are useful. But could you have any respect for a programmer who had never written a program? To learn any skill, you have to put yourself in a forgiving position where you can exercise that skill. When learning a new programming language, try to do a small project it in before you have to do a large project. When learning to manage a software project, try to manage a small one first.

A good mentor is no replacement for doing things yourself, but is a lot better than a book. What can you offer a potential mentor in exchange for their knowledge? At a minimum, you should offer to study hard so their time won't be wasted.

Try to get your boss to let you have formal training, but understand that it often not much better than the same amount of time spent simply playing with the new skill you want to learn. It is, however, easier to ask for training than playtime in our imperfect world, even though a lot of formal training is just sleeping through lectures waiting for the dinner party.

If you lead people, understand how they learn and assist them by assigning them projects that are the right size and that exercise skills they are interested in. Don't forget that the most important skills for a programmer are not the technical ones. Give your people a chance to play and practice courage, honesty, and communication.

Learn to Type

Learn to touch-type. This is an intermediate skill because writing code is so hard that the speed at which you can type is irrelevant and can't put much of a dent in the time it takes to write code, no matter how good you are. However, by the time you are an intermediate programmer you will probably spend a lot of time writing natural language to your colleagues and others. This is a fun test of your commitment; it takes dedicated time that is not much fun to learn something like that. Legend has it that when Michael Tiemann[2] was at MCC people would stand outside his door to listen to the hum generated by his keystrokes which were so rapid as to be indistinguishable.

How to Do Integration Testing

Integration testing is the testing of the integration of various components that have been unit tested. Integration is expensive and it comes out in the testing. You must include time for this in your estimates and your schedule.

Ideally you should organize a project so that there is not a phase at the end where integration must explicitly take place. It is far better to gradually integrate things as they are completed over the course of the project. If it is unavoidable estimate it carefully.

Communication Languages

There are some languages, that is, formally defined syntactic systems, that are not programming languages but communication languages---they are designed specifically to facillitate communication through standardization. In 2003 the most important of these are UML, XML, and SQL. You should have some familiarity with all of these so that you can communicate well and decide when to use them.

UML is a rich formal system for making drawings that describe designs. It's beauty lines in that is both visual and formal, capable of conveying a great deal of information if both the author and the audience know UML. You need to know about it because designs are sometimes communicated in it. There are very helpful tools for making UML drawings that look very professional. In a lot of cases UML is too formal, and I find myself using a simpler boxes and arrows style for design drawings. But I'm fairly sure UML is at least as good for you as studying Latin.

XML is a standard for defining new standards. It is not a solution to data interchange problems, though you sometimes see it presented as if it was. Rather, it is a welcome automation of the most boring part of data interchange, namely, structuring the representation into a linear sequence and parsing back into a structure. It provides some nice type- and correctness-checking, though again only a fraction of what you are likely to need in practicen.

SQL is a very powerful and rich data query and manipulation language that is not quite a programming language. It has many variations, typically quite product-dependent, which are less important than the standardized core. SQL is the lingua franca of relational databases. You may or may not work in any field that can benefit from an understanding of relational databases, but you should have a basic understanding of them and they syntax and meaning of SQL.

Heavy Tools

As our technological culture progresses, software technology moves from inconceivable, to research, to new products, to standardized products, to widely available and inexpensive products. These heavy tools can pull great loads, but can be intimidating and require a large investment in understanding. The intermediate programmer has to know how to manage them and when they should be used or considered.

To my mind right some of the best heavy tools are:

   
Relational Databases,

   
Full-text Search Engines,

   
Math libraries,

   
OpenGL,

   
XML parsers, and

   
Spreadsheets.

How to analyze data

Data analysis is a process in the early stages of software development, when you examine a business activity and find the requirements to convert it into a software application. This is a formal definition, which may lead you to believe that data analysis is an action that you should better leave to the systems analysts, while you, the programmer, should focus on coding what somebody else has designed. If we follow strictly the software engineering paradigm, it may be correct. Experienced programmers become designers and the sharpest designers become business analysts, thus being entitled to think about all the data requirements and give you a well defined task to carry out. This is not entirely accurate, because data is the core value of every programming activity. Whatever you do in your programs, you are either moving around or modifying data. The business analyst is analyzing the needs in a larger scale, and the software designer is further squeezing such scale so that, when the problem lands on your desk, it seems that all you need to do is to apply clever algorithms and start moving existing data.

Not so.

No matter at which stage you start looking at it, data is the main concern of a well designed application. If you look closely at how a business analyst gets the requirements out of the customer?s requests, you?ll realize that data plays a fundamental role. The analyst creates so called Data Flow Diagrams, where all data sources are identified and the flow of information is shaped. Having clearly defined which data should be part of the system, the designer will shape up the data sources, in terms of database relations, data exchange protocols, and file formats, so that the task is ready to be passed down to the programmer. However, the process is not over yet, because you ? the programmer ? even after this thorough process of data refinement, are required to analyze data to perform the task in the best possible way. The bottom line of your task is the core message of Niklaus Wirth, the father of several languages. ?Algorithms + Data Structures = Programs.? There is never an algorithm standing alone, doing something to itself. Every algorithm is supposed to do something to at least one piece of data.

Therefore, since algorithms don't spin their wheels in a vacuum, you need to analyze both the data that somebody else has identified for you and the data that is necessary to write down your code. A trivial example will make the matter clearer. You are implementing a search routine for a library. According to your specifications, the user can select books by a combination of genre, author, title, publisher, printing year, and number of pages. The ultimate goal of your routine is to produce a legal SQL statement to search the back-end database. Based on these requirements, you have several choices: check each control in turn, using a "switch" statement, or several "if" ones; make an array of data controls, checking each element to see if it is set; create (or use) an abstract control object from which inherit all your specific controls, and connect them to an event-driven engine. If your requirements include also tuning up the query performance, by making sure that the items are checked in a specific order, you may consider using a tree of components to build your SQL statement. As you can see, the choice of the algorithm depends on the data you decide to use, or to create. Such decisions can make all the difference between an efficient algorithm and a disastrous one. However, efficiency is not the only concern. You may use a dozen named variables in your code and make it as efficient as it can ever be. But such a piece of code might not be easily maintainable. Perhaps choosing an appropriate container for your variables could keep the same speed and in addition allow your colleagues to understand the code better when they look at it next year. Furthermore, choosing a well defined data structure may allow them to extend the functionality of your code without rewriting it. In the long run, your choices of data determines how long your code will survive after you are finished with it. Let me give you another example, just some more food for thought. Let's suppose that your task is to find all the words in a dictionary with more than three anagrams, where an anagram must be another word in the same dictionary. If you think of it as a computational task, you will end up with an endless effort, trying to work out all the combinations of each word and then comparing it to the other words in the list. However, if you analyze the data at hand, you'll realize that each word may be represented by a record containing the word itself and a sorted array of its letters as ID. Armed with such knowledge, finding anagrams means just sorting the list on the additional field and picking up the ones that share the same ID. The brute force algorithm may take several days to run, while the smart one is just a matter of a few seconds. Remember this example the next time you are facing an intractable problem.

 


Dedication to the programmers of Hire.com.

Copyright 2002, 2003 Robert L. Read

by Robert L. Read. Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation; with one Invariant Section being ‘History (As of February, 2003)’, no Front-Cover Texts, and one Back-Cover Text: ‘The original version of this document was written by Robert L. Read without renumeration and dedicated to the programmers of Hire.com.’ A copy of the license is included in the section entitled ‘GNU Free Documentation License’.


 

Author's note: Original Article: How To Be A Programmer
Keywords: Programmer, How to, Motivated, Trusted, tradeoff, stress test, abstraction, skills, type, integration testing, languages, tools, analyze data, part 3



Please Signup to comment on this article